Research Paper - Hot Event Detection in the Stock Market Based on Stock Time Series Data and Online Public Opinion Text Data
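As the Internet and the real world become highly integrated, Internet information not only provides financial investors with real-time, effective data but also helps them understand market dynamics and quickly identify financial events that may lead to stock market volatility. However, event detection research in the financial field has concentrated on microblogs, news, and other online text; few scholars have studied the characteristics of financial time series data. Because an event in the financial field often affects both the online public opinion space and the real trading space, this paper proposes a multi-source heterogeneous information detection method that uses stock trading time series data and online public opinion text data to detect hot events in the stock market. The method applies an outlier detection algorithm based on multi-member fusion to extract the times of stock market hot events. It then calculates keyword weights for the online public opinion information with the feature-item weight formula proposed in the paper to obtain the core content of each hot event. Finally, accurate detection of stock market hot events is achieved.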

Journal of Data Analysis and Information Processing, 2019, 7, 174-189
https://www.scirp.org/journal/jdaip
ISSN Online: 2327-7203
ISSN Print: 2327-7211
DOI: 10.4236/jdaip.2019.74011 (Sep. 30, 2019)
Hot Events Detection of Stock Market Based on Time Series Data of Stock and Text Data of Network Public Opinion
Beibei Cao
Department of Publishing and Dissemination, Shanghai Publishing and Printing College, Shanghai, China
Abstract
With the deep integration of the Internet world and the real world, Internet information not only provides real-time and effective data for financial investors, but also helps them understand market dynamics and quickly identify relevant financial events that may lead to stock market volatility. However, in event detection research in the financial field, many studies focus on micro-blogs, news, and other network text information, and few scholars have studied the characteristics of financial time series data. Considering that, in the financial field, the occurrence of an event often affects both the online public opinion space and the real transaction space, this paper proposes a multi-source heterogeneous information detection method based on stock transaction time series data and online public opinion text data to detect hot events in the stock market. The method uses an outlier detection algorithm based on multi-member fusion to extract the time of hot events in the stock market. Then, according to the feature-item weight calculation formula proposed in this paper, it calculates the keyword weights of network public opinion information to obtain the core content of hot events in the stock market. Finally, accurate detection of stock market hot events is achieved.
Keywords
Relationship, Network Public Opinion, Stock Trading Behavior, Stock Market Hot Events
How to cite this paper: Cao, B.B. (2019) Hot Events Detection of Stock Market Based on Time Series Data of Stock and Text Data of Network Public Opinion. Journal of Data Analysis and Information Processing, 7, 174-189. https://doi.org/10.4236/jdaip.2019.74011
Received: July 11, 2019
Accepted: September 27, 2019
Published: September 30, 2019
Copyright © 2019 by author(s) and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/
Open Access
1. Introduction
In the securities industry, once the market fluctuates, investors first look to Internet information for an explanation. However, the geometric expansion of Internet information makes it increasingly difficult for people to extract useful information. If investors cannot obtain timely and accurate information about the events that lead to financial market volatility, the resulting losses can be incalculable. Quickly finding valuable topics and events in large volumes of Internet data is therefore particularly important.
Over time, numerous research methods for event discovery have been put forward [1]-[11]. However, most of these methods perform event discovery on text data [1]-[11] or on time series data [12]-[19] separately. To the best of our knowledge, few scholars have studied the characteristics of financial time series data and text data together [20] [21] [22]. As a record of real behavior in financial markets, time series data such as stock trading data and market data are often affected by events and can better reflect changes before and after events. Therefore, this paper studies the discovery of financial events by combining network text information and financial time series data, so as to help investors quickly identify hot events and correctly grasp market dynamics.
2. Post’s Influence of Network Public Opinion Space
2.1. Definition and Quantification of Post’s Activity
In web forums, netizens express their concern for specific information by posting, reading, and replying. This degree of attention is an important external feature of the emotional tendency of network public opinion; in this paper, we call it post’s activity. To quantify users’ attention to topic information intuitively, we calculate it from a post’s readings amount and comments amount. The readings amount reflects how widely the information contained in the post has spread, and it represents users’ instinctive concern. The comments amount reflects the attention paid to the information contained in the post; it shows that users value interaction on the topic, and it carries stronger emotional intensity. We therefore choose the readings amount and the comments amount as the indicators of post’s activity. The specific definitions are as follows:
Definition 2-1 Post’s activity: Assume that within a time period $t$, a total of $N$ posts $\{p_1, p_2, \ldots, p_i, \ldots, p_N\}$ are posted in the online public opinion space. The readings amount of the $i$-th post is $p_i\_r$, and its comments amount is $p_i\_c$. The total readings amount of the $N$ posts in the time period $t$ is $r = \sum_{j=1}^{N} p_j\_r$, and the average readings amount of each post is $avg\_r = r / N$. We define the propagation coefficient $p_i\_p$ of $p_i$ as the ratio of the readings amount of $p_i$ in time period $t$ to the average readings amount of each post in the same time period:

$$p_i\_p = \frac{p_i\_r}{avg\_r}$$

Similarly, the attention coefficient of $p_i$ is defined as the ratio of the comments amount of $p_i$ in time period $t$ to the average comments amount of each post in the same time period, that is, $p_i\_c / avg\_c$, where

$$avg\_c = \frac{c}{N} = \frac{1}{N} \sum_{j=1}^{N} p_j\_c$$

Finally, the activity of $p_i$ is defined as the sum of its propagation coefficient and its attention coefficient, namely:

$$p_i\_Ac = \frac{p_i\_r}{avg\_r} + \frac{p_i\_c}{avg\_c} = \frac{p_i\_r}{\frac{1}{N} \sum_{j=1}^{N} p_j\_r} + \frac{p_i\_c}{\frac{1}{N} \sum_{j=1}^{N} p_j\_c} \quad (1)$$
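To make Definition 2-1 concrete, the following is a minimal Python sketch of Formula (1). The function name `post_activity`, the `(readings, comments)` tuple layout, and the fallback of 0.0 when an average is zero are assumptions made for illustration; the paper does not specify an implementation or how to treat a period with no readings or comments.

```python
from typing import List, Tuple


def post_activity(posts: List[Tuple[int, int]]) -> List[float]:
    """Return the activity of each post in one time period t.

    Each element of `posts` is a (readings, comments) pair, i.e. the
    readings amount and comments amount of one post in the period.
    """
    n = len(posts)
    if n == 0:
        return []

    # Average readings and comments per post over the N posts of the period.
    avg_r = sum(r for r, _ in posts) / n
    avg_c = sum(c for _, c in posts) / n

    activities = []
    for r, c in posts:
        # Propagation coefficient: readings relative to the period average.
        propagation = r / avg_r if avg_r else 0.0  # zero fallback is an assumption
        # Attention coefficient: comments relative to the period average.
        attention = c / avg_c if avg_c else 0.0    # zero fallback is an assumption
        activities.append(propagation + attention)
    return activities


if __name__ == "__main__":
    # Hypothetical counts for three posts in the same period.
    print(post_activity([(100, 10), (300, 30), (200, 20)]))  # [1.0, 3.0, 2.0]
```

Because both coefficients are normalized by the period average, a post with exactly average readings and comments has activity 2, and the activities of all posts in a period sum to $2N$ whenever both averages are nonzero.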
2.2. Definition and Quantification of User’s Influence
A user’s influence in the stock bar forum refers to the user’s popularity index in the stock bar. It is mainly affected by the user’s age, the amount of comments the user posts, the amount of forwarding, and other factors. In this paper, we therefore use user’s power, user’s activity, and user’s attention to measure a user’s influence in the stock bar forum.
2.2.1. User’s Power
User’s power is the potential energy that a user has under static conditions. It is mainly reflected in three factors: the user’s age, the amount of fans, and the amount of people that the user follows.
Definition 2-2 User’s power: The user’s power of the $i$-th user $a_i$ is defined as

$$a_i\_P = \frac{1}{3} \left( \frac{Pa_i}{\overline{Pa}} + \frac{Pfr_i}{\overline{Pfr}} + \frac{Pfe_i}{\overline{Pfe}} \right)$$

where $Pa_i$ is the age of the user $a_i$, $\overline{Pa}$ is the average age of all users, $Pfr_i$ is the amount of fans of the user $a_i$, $\overline{Pfr}$ is the average amount of fans of all users, $Pfe_i$ is the amount of people that the user $a_i$ follows, and $\overline{Pfe}$ is the average amount of people that all users follow.
2.2.2. User’s Activity
User’s activity reflects the degree of a user’s initiative, which is mainly determined by the amount of postings and the amount of comments. A new post usually contains a new topic, so the more posts a user publishes, the easier it is for other users to notice them, and the greater the user’s influence. Comments express the user’s views and opinions on other people’s information, and they are also a manifestation of user’s activity.
Definition 2-3 User’s activity: The user’s activity of the $i$-th user $a_i$ is defined as

$$a_i\_Ac = \frac{1}{2} \left( \frac{Pp_i}{\overline{Pp}} + \frac{Pd_i}{\overline{Pd}} \right)$$

where $Pp_i$ represents the total number of posts for the user $a_i$, $\overline{Pp}$ represents the average number of posts for all users, $Pd_i$ represents the total number of comments for the user, and $\overline{Pd}$ represents the average number of comments for all users.
2.2.3. User’s Attention
User’s attention mainly reflects how strongly a user attracts the attention of other users in the online forum. When a user’s posts are commented on by a large number of other users, it shows that these posts are of high quality and attractive, which further indicates that the user has great influence. In addition, some users are not used to commenting but express their concern about posts through reading, which also shows that the posts attract them. Therefore, the total amount of readings also needs to be regarded as a factor in user’s attention.
Definition 2-4 User’s attention: The user’s attention of the $i$-th user $a_i$ is defined as

$$a_i\_At = \frac{1}{2} \left( \frac{Pr_i}{\overline{Pr}} + \frac{Pc_i}{\overline{Pc}} \right)$$

where $Pr_i$ represents the average readings amount of all posts of user $a_i$, $\overline{Pr}$ represents the average readings amount of all users’ posts, $Pc_i$ represents the average comments amount of all posts of user $a_i$, and $\overline{Pc}$ represents the average comments amount of all users’ posts.
Based on the above three user indicators, the paper calculates the user’s influence as follows:

$$a_i\_I = a_i\_P + a_i\_Ac + a_i\_At$$
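The three user indicators and their sum can be sketched in Python as below. This is an illustration under stated assumptions, not the paper’s implementation: the `UserStats` record, its field names, the helper functions, and the zero-average fallback are all hypothetical. The population averages of readings and comments per post are recovered by weighting each user’s per-post average by that user’s post count, which is one reading of "the average readings amount of all users’ posts".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserStats:
    """Hypothetical per-user record; field names are illustrative."""
    age: float          # Pa_i
    fans: float         # Pfr_i
    following: float    # Pfe_i
    posts: float        # Pp_i, total number of posts
    comments: float     # Pd_i, total number of comments written
    avg_reads: float    # Pr_i, average readings per post of this user
    avg_replies: float  # Pc_i, average comments received per post


def ratio(value: float, average: float) -> float:
    # Returning 0.0 for a zero average is an assumption, not from the paper.
    return value / average if average else 0.0


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def user_influence(users: List[UserStats]) -> List[float]:
    """Return a_i_I = a_i_P + a_i_Ac + a_i_At for every user."""
    if not users:
        return []
    avg_age = mean([u.age for u in users])
    avg_fans = mean([u.fans for u in users])
    avg_following = mean([u.following for u in users])
    avg_posts = mean([u.posts for u in users])
    avg_comments = mean([u.comments for u in users])
    # Averages over all posts of all users, weighted by post counts.
    total_posts = sum(u.posts for u in users)
    avg_reads = ratio(sum(u.avg_reads * u.posts for u in users), total_posts)
    avg_replies = ratio(sum(u.avg_replies * u.posts for u in users), total_posts)

    result = []
    for u in users:
        power = (ratio(u.age, avg_age) + ratio(u.fans, avg_fans)
                 + ratio(u.following, avg_following)) / 3
        activity = (ratio(u.posts, avg_posts)
                    + ratio(u.comments, avg_comments)) / 2
        attention = (ratio(u.avg_reads, avg_reads)
                     + ratio(u.avg_replies, avg_replies)) / 2
        result.append(power + activity + attention)
    return result


if __name__ == "__main__":
    # Two hypothetical users; the second is three times as "large" on the
    # static and activity indicators, with identical per-post attention.
    a = UserStats(1, 10, 10, 2, 2, 100, 10)
    b = UserStats(3, 30, 30, 6, 6, 100, 10)
    print(user_influence([a, b]))
```

Since each indicator is an average of ratios to the population mean, a user who is exactly average on every factor has influence 3; in the example the two users score about 2 and 4.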
In the end, we calculate the post’s influence in the network public opinion space by combining the activity of the post and the influence of the poster. The calculation formula is as follows:

$$p_i\_I = 0.7 \times p_i\_Ac + 0.3 \times a_i\_I \quad (2)$$

where $a_i\_I$ is the influence of the user who published $p_i$.
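Formula (2) is a fixed weighted sum, which the short Python sketch below illustrates. The function name `post_influence` and the choice to expose the weights as parameters are assumptions for illustration; the paper only gives the values 0.7 and 0.3, and the sample scores are hypothetical.

```python
def post_influence(post_activity: float, poster_influence: float,
                   w_activity: float = 0.7, w_influence: float = 0.3) -> float:
    """Combine a post's activity with its poster's influence.

    The defaults 0.7 and 0.3 are the weights given in Formula (2);
    making them parameters is an illustrative choice.
    """
    return w_activity * post_activity + w_influence * poster_influence


if __name__ == "__main__":
    # Hypothetical (post activity, poster influence) pairs, ranked by influence.
    scores = {"post_a": (3.0, 2.0), "post_b": (1.0, 4.0), "post_c": (2.0, 4.0)}
    ranked = sorted(scores, key=lambda k: post_influence(*scores[k]), reverse=True)
    print(ranked)  # ['post_a', 'post_c', 'post_b']
```

With these weights, the post’s own activity dominates: a very active post by a low-influence user (`post_a`, about 2.7) still ranks above a moderately active post by a high-influence user (`post_c`, about 2.6).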
3. Hot Event Detection of Stock Market
3.1. Definition of Stock Market Hot Events
The events studied in this paper are hot events that relate to the stock market and can lead to changes in stock trading behavior. This paper defines them as stock market hot events. They have the following three characteristics:
1) The events corresponding to popular posts on web forums, that is, posts that have been read and commented on many times and have high influence.
2) The events corresponding to online hot news, that is, news reported or reproduced by multiple news websites.
3) The events that can have a significant impact on the stock market.
The first two characteristics identify stock market hot events from the feedback of the online public opinion space, and the third identifies them from the information fed back by the real trading conditions of the stock market. These three characteristics are combined below, and on this basis, we conduct research on the detection of hot events in the stock market.