Big Data Management and Analysis: The Key Driver of Diverse Applications


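The book Highlighting the Importance of Big Data Management and Analysis for Various Applications examines how cloud computing and big data technologies have reshaped infrastructure, platforms, software, and business processes. Its central concern is the growing complexity and demand in data science, for which it presents effective and efficient processing methods. Written by experts in data management and analysis, it includes case studies from banking, social networks, bioinformatics, healthcare, transportation, and criminology. The volume, edited by Mohammad Moshirpour, Behrouz Far, and Reda Alhajj, is intended to publish new developments and frontier results in big data quickly.

The book belongs to the series "Studies in Big Data", whose series editor is Dr. Janusz Kacprzyk of the Polish Academy of Sciences, Warsaw (kacprzyk@ibspan.waw.pl). The series aims for broad coverage of big data theory, research, and development, and of applications in engineering, computer science, physics, economics, and the life sciences. It emphasizes the analysis and understanding of large, complex, or distributed data sets generated by sensors, simulations, crowdsourcing, social networks, and other Internet transactions such as emails and video click streams.

The book pays particular attention to applying these theories and practices in real scenarios, showing how data management and analysis are carried out in different application domains. Whether the topic is data-driven strategy in financial decision-making, insight into interpersonal behavior from social media analysis, or personalized treatment in healthcare, it offers examples and practical guidance for researchers and practitioners. Readers can learn the core principles of data management and how big data can be used to optimize business processes and strengthen competitiveness and innovation. It combines theoretical depth with practical value and serves as a reference for professionals working in this fast-moving field.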

Big Data Analytics of Social Network Data 9

which are then fed into an R add-on package called arules from the Comprehensive R Archive Network (CRAN). The arules package executes an R implementation of the Apriori algorithm to mine frequent patterns and to learn those association rules from the input Facebook dataset extracted by Rfacebook. A key difference between the original proposal of the Apriori algorithm and this R implementation of the Apriori algorithm is that the former uses confidence to measure the interestingness of association rules. The confidence, which is a conditional probability value measuring the chance of having the consequence of the association rule given the antecedent of the association rule, can be defined as follows:

\begin{equation}
\mathrm{confidence}(A \Rightarrow C) = \frac{\mathrm{sup}(A \cup C)}{\mathrm{sup}(A)}, \tag{1}
\end{equation}

where (i) $A$ is a frequent pattern representing the antecedent of the association rule $A \Rightarrow C$, (ii) $C$ is a frequent pattern representing the consequence of the association rule $A \Rightarrow C$, and (iii) $\mathrm{sup}(A)$ is the support (i.e., occurrence, frequency) of $A$. In contrast, this R implementation of the Apriori algorithm uses lift to measure the interestingness of association rules. The lift, which measures the dependence between the antecedent and the consequence of the association rule, can be defined as follows:

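\begin{align}
\mathrm{lift}(A \Rightarrow C) &= \frac{\mathrm{sup}(A \cup C)}{\mathrm{sup}(A)\,\mathrm{sup}(C)} \tag{2} \\
&= \frac{\mathrm{confidence}(A \Rightarrow C)}{\mathrm{sup}(C)} \tag{3}
\end{align}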
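To make the two measures concrete, the following minimal Python sketch evaluates Eqs. (1)-(3) on a toy transaction set. It is an illustration only and does not use the arules package. The data, the friend names, and the function names are hypothetical, and sup is taken here as relative support (the fraction of transactions containing a pattern), which is an assumption made so that lift equals 1 under independence.

```python
from fractions import Fraction

# Hypothetical toy data: each transaction is the set of friends who
# interacted with one post of the primary user.
transactions = [
    {"alice", "bob"},
    {"alice", "bob", "carol"},
    {"alice"},
    {"bob", "carol"},
]

def sup(pattern, transactions):
    """Relative support: fraction of transactions containing every item of pattern."""
    pattern = set(pattern)
    hits = sum(1 for t in transactions if pattern <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequence, transactions):
    """Eq. (1): sup(A u C) / sup(A). Assumes sup(A) > 0."""
    both = set(antecedent) | set(consequence)
    return sup(both, transactions) / sup(antecedent, transactions)

def lift(antecedent, consequence, transactions):
    """Eq. (3): confidence(A => C) / sup(C). Assumes sup(C) > 0."""
    return confidence(antecedent, consequence, transactions) / sup(consequence, transactions)

print(confidence({"alice"}, {"bob"}, transactions))  # 2/3
print(lift({"alice"}, {"bob"}, transactions))        # 8/9
```

For the rule {alice} ⇒ {bob}, sup(A ∪ C) = 1/2 and sup(A) = sup(C) = 3/4, so the confidence is 2/3 and the lift is 8/9 whether it is computed via Eq. (2) or Eq. (3). A lift below 1 suggests that the two friends interact together slightly less often than independence would predict.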

Although we used Rfacebook and the arules package for the discovery of the most interactive friends of the user of interest (i.e., primary user), we are not confined to Rfacebook and the arules package. For instance, we can use B-mine [13] as an alternative frequent pattern mining algorithm. To handle big social networks, we can also use FoP-Miner [10, 20] for dense networks and CFoP-Miner [19] for sparse networks. Key ideas behind these three algorithms can be described as follows. They capture the big social data by an uncompressed or a compressed bitmap structure, from which frequent patterns (e.g., patterns revealing the “following” relationships in social networks) are recursively mined in a depth-first fashion using the MapReduce model.
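The bitmap idea can be sketched in a few lines of Python. This is not B-mine, FoP-Miner, or CFoP-Miner: it is a single-process illustration that assumes an uncompressed bitmap stored as a Python integer per item, uses absolute support counts, and omits both compression and the MapReduce distribution. All names are hypothetical.

```python
def build_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains the item."""
    bitmaps = {}
    for i, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps

def mine_depth_first(transactions, minsup):
    """Return {frozenset(pattern): absolute support} for patterns with support >= minsup."""
    bitmaps = build_bitmaps(transactions)
    # Keep only frequent single items, in a fixed order, so each pattern is visited once.
    items = sorted(i for i, bits in bitmaps.items() if bin(bits).count("1") >= minsup)
    frequent = {}

    def extend(prefix, prefix_bits, start):
        for k in range(start, len(items)):
            # Bitwise AND keeps the transactions containing the prefix and the new item.
            bits = prefix_bits & bitmaps[items[k]]
            count = bin(bits).count("1")
            if count >= minsup:
                pattern = prefix + (items[k],)
                frequent[frozenset(pattern)] = count
                extend(pattern, bits, k + 1)  # go deeper before trying siblings

    all_rows = (1 << len(transactions)) - 1  # empty prefix matches every transaction
    extend((), all_rows, 0)
    return frequent

# Hypothetical example: each set lists the accounts followed by one user.
follows = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
print(mine_depth_first(follows, minsup=2))
```

The support of a longer pattern is obtained by intersecting bitmaps rather than rescanning the data, which is the property that makes the vertical representation attractive for large networks.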

Furthermore, when handling big social data, the number of discovered association rules or frequent patterns can be large. Consequently, these rules or patterns may not be easily comprehended by users. To resolve this problem, we apply another package, namely arulesViz, to visualize association rules returned by the arules package.

arules: http://lyle.smu.edu/IDA/arules/, http://cran.r-project.org/package=arules, and/or https://github.com/mhahsler/arules.

arulesViz: http://cran.r-project.org/package=arulesViz, https://github.com/mhahsler/arulesViz.


thousands of friends. Among these friends, some care about the users of interest (i.e., primary users) by responding to the primary users’ posts (e.g., liking these posts, adding comments to the posts, or tagging the primary users), while others are lurkers who just observe and do not actively participate in any social network activities. How can one distinguish those who care about you from those lurkers? To answer this question, the key contribution of this book chapter is our big data analytics techniques on social network data. Specifically, our techniques help users discover the most interactive users, i.e., those who care most about the primary users on social networking sites such as Facebook. We first used Rfacebook to access Facebook’s API via the R project for extracting relevant social data from Facebook. We then executed the arules package (which implements a variant of the well-known Apriori algorithm) from the Comprehensive R Archive Network (CRAN) to mine frequent patterns and learn association rules with confidence and lift measures. Afterwards, the discovered knowledge, in the form of association rules, is visualized by using the arulesViz package. Hence, the knowledge discovered from this big data analytics of social network data reveals who cares most about you on Facebook.
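The mining step summarized above can be illustrated with a small pure-Python sketch of level-wise Apriori search followed by rule generation. It is not the arules API and makes no claim about how arules is implemented internally. The function names, the thresholds, and the toy data are hypothetical, support is taken as relative support, and the input is assumed to be non-empty.

```python
from fractions import Fraction
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise search: return {frozenset: relative support} for frequent patterns."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)  # assumed to be > 0

    def support(pattern):
        return Fraction(sum(1 for t in transactions if pattern <= t), n)

    # Level 1: frequent single items.
    singles = {frozenset([i]) for t in transactions for i in t}
    level = {p: s for p in singles if (s := support(p)) >= minsup}
    frequent = dict(level)
    k = 2
    while level:
        # Join step: merge pairs of frequent (k-1)-patterns into k-patterns.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c: s for c in candidates if (s := support(c)) >= minsup}
        frequent.update(level)
        k += 1
    return frequent

def rules(frequent, minconf):
    """Return (antecedent, consequence, confidence, lift) for rules with confidence >= minconf."""
    out = []
    for pattern, s in frequent.items():
        for r in range(1, len(pattern)):
            for a in combinations(sorted(pattern), r):
                a = frozenset(a)
                c = pattern - a
                conf = s / frequent[a]  # subsets of a frequent pattern are frequent
                if conf >= minconf:
                    out.append((a, c, conf, conf / frequent[c]))
    return out

# Hypothetical data: friends who responded to each post of the primary user.
posts = [{"alice", "bob"}, {"alice", "bob", "carol"}, {"alice"}, {"bob", "carol"}]
for a, c, conf, lft in rules(apriori(posts, Fraction(1, 2)), Fraction(3, 5)):
    print(set(a), "=>", set(c), "confidence", conf, "lift", lft)
```

Sorting the resulting rules by confidence or lift gives one simple way to rank which friends tend to respond together. How the chapter itself ranks friends is not specified in this excerpt.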

As ongoing work, we are adjusting the weights on different posts or activities. For instance, we applied a time-fading model to assign lighter weights to older posts and heavier weights to more recent posts. Moreover, we are applying sentiment analysis to identify and categorize the relevance of tag posts.
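The excerpt does not give the formula of the time-fading model. One common choice, assumed here purely for illustration, is exponential fading: a post that is `age` time units old receives weight `alpha ** age` for a fading factor `alpha` in (0, 1]. The function names, the value of `alpha`, and the sample data are hypothetical.

```python
def fading_weight(age, alpha=0.9):
    """Weight of an interaction on a post that is `age` time units old.

    alpha is a hypothetical fading factor: 1 means no fading, and smaller
    values discount old posts more strongly.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if age < 0:
        raise ValueError("age must be non-negative")
    return alpha ** age

def weighted_interaction_scores(interactions, alpha=0.9):
    """Sum the fading weights per friend; `interactions` is a list of (friend, age) pairs."""
    scores = {}
    for friend, age in interactions:
        scores[friend] = scores.get(friend, 0.0) + fading_weight(age, alpha)
    return scores

# One recent interaction by alice versus two older interactions by bob.
sample = [("alice", 0), ("bob", 2), ("bob", 3)]
print(weighted_interaction_scores(sample, alpha=0.5))  # {'alice': 1.0, 'bob': 0.375}
```

With `alpha = 0.5`, a single interaction today outweighs two interactions that are two and three time units old, which matches the stated intent of giving heavier weights to more recent posts.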

Acknowledgements This project is partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the University of Manitoba.

References

1. Agrawal R, Srikant R. Fast algorithms for mining association rules. In: VLDB 1994; 1994. p. 487–99.
2. Bayrak AE, Polat F. Examining place categories for link prediction in location based social networks. In: IEEE/ACM ASONAM 2016; 2016. p. 976–79.
3. Cuzzocrea A, Folino F, Pizzuti C. DynamicNet: an effective and efficient algorithm for supporting community evolution detection in time-evolving information networks. In: IDEAS 2013; 2013. p. 148–53.
4. Dai BT, Kwee AT, Lim EP. ViStruclizer: a structural visualizer for multi-dimensional social networks. In: PAKDD 2013, Part I. LNCS (LNAI), vol. 7818; 2013. p. 49–60.
5. del Carmen Contreras Chinchilla L, Ferreira KAR. Analysis of the behavior of customers in the social networks using data mining techniques. In: IEEE/ACM ASONAM 2016; 2016. p. 623–25.
6. Ferrara A, Genta L, Montanelli S. Linked data classification: a feature-based approach. In: EDBT/ICDT workshops 2013; 2013. p. 75–82.
7. Fowkes JM, Sutton CA. A subsequence interleaving model for sequential pattern mining. In: ACM KDD 2016; 2016. p. 835–44.
8. Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: ACM SIGMOD 2000; 2000. p. 1–12.
9. Jiang F, Leung CK. A business intelligence solution for frequent pattern mining on social networks. In: IEEE ICDM workshops 2014; 2014. p. 789–96.
