融合市场新闻与股票价格提升日内股票收益预测精度

120 浏览量更新于2024-08-29 收藏 602KB PDF 举报

本文探讨了股票价格过程与市场新闻之间的相互作用在金融市场中的重要性，尤其是在日内股票回报预测中的应用。传统研究通常将市场新闻视为外生因素，认为其对股价有显著影响，或者侧重于过去股价变动如何影响未来收益。然而，本研究旨在通过量化融合市场新闻和股票价格信息，提升日内股票回报预测的准确性。作者们采用多核学习方法（Multiple Kernel Learning, MKL），这是一种强大的机器学习技术，它能够同时利用多个特征空间中的信息，从而更好地捕捉到新闻和股价数据中的复杂关系。多核学习允许模型在不同的特征表示之间进行灵活转换，提高了预测模型的泛化能力。具体来说，研究者首先收集实时的市场新闻数据，包括宏观经济指标、公司公告、行业动态等，这些数据被认为是预测股票价格的重要信号。同时，他们也会分析历史股票价格数据，包括开盘价、收盘价、最高价、最低价和交易量等关键指标，以捕捉价格趋势和波动模式。在融合这两类信息的过程中，作者构建了一个混合核函数，这个函数将市场新闻和股票价格数据的不同特征空间结合在一起。通过MKL算法，模型能够自动学习如何最优地组合不同特征，从而提取出最有价值的信息用于预测。这种方法能够减少过度拟合的风险，并提高预测的稳定性和精度。实验部分展示了在实际金融市场环境中，这种整合策略相较于仅依赖单一数据源（如新闻或价格）的模型，显著提升了日内股票回报预测的F1分数、平均绝对误差（MAE）和均方根误差（RMSE）等性能指标。这表明，通过同时考虑市场新闻和股票价格，可以更全面地理解影响股价动态的因素，从而提高预测的精准度。总结来说，这篇研究在股票市场预测领域的一个重要突破是，它提出了一个系统性的框架，通过多核学习技术，有效地整合了市场新闻和股票价格数据，为投资者提供了更精确、实时的日内股票回报预测。这一方法对于量化交易、风险管理以及投资者决策具有实际应用价值。

three baseline methods (which use only one information source

or use a naive combination of the two sources) on real tick-by-

tick data.

The rest of this paper is organized as follows. Section 2 reviews

the literature on traditional approaches for stock market predic-

tion, and the formulation of MKL and its applications. Section 3

presents our proposed system and describes the design of

the experiments. Section 4 shows the experimental results and

discussions. Section 5 gives our conclusion and future work

directions.

2. Related work and background

2.1. Traditional approaches

Some useful observations have been made in the ﬁnance

domain. Ederington and Lee [8] observe that there is always a

big increment of the standard deviation of ﬁve-minutes returns on

a day when a government announcement on a ﬁnancial relevant

policy is released at 8:30 am. Tetlock [44] analyzes the content of

the “Abreast of the Market” column in Wall Street Journal, and

ﬁnds that pessimistic words predict low stock returns. Tetlock

et al. [45] also ﬁnd that ﬁrm-speciﬁc future earnings and returns

could be predicted by news when they analyze the tone of ﬁrm-

speciﬁc news.

Analysis of news articles has been reported in the literature of

computer sciences. Following the approach of text mining on news

articles, Seo et al. [39] build a multi-agent system for intelligent

portfolio management, which can assess the risk associated with

companies by analyzing news articles. Yu et al. [52] propose a

four-stage Support Vector Machine (SVM) based on the multi-

agent ensemble learning approach for credit risk evaluation. Fung

et al. [13] classify news articles into different categories and

predict the directional impact of newly released news articles.

The AZFinText system, built by Schumaker and Chen [37], makes

not only directional prediction but also quantiﬁed estimation of

prices.

As illustrated in Fig. 2, the processing pipeline of the

approaches that use the news information source could be

summarized below:

1. Representation of news articles. A piece of news is usually

represented as a term vector by using the Vector Space Model

after the removal of stop words and feature selection [11,21,25,

26,46,48]. Sentiment analysis is occasionally employed to

analyze news at a semantic level [15,24,30,31].

2. Labeling. Each piece of news is then assigned with a label. In a

classiﬁ

cation model, pieces of news are sorted by their time

stamps, and labeled with nominal values, such as positive/

neutral/negative. While in regression approaches, news is sim-

ply labeled with a continuous value, such as daily stock price

return. In this paper, we use short-term price return for intra-

day prediction.

3. Model training and evaluation. Machine learning models are

employed in this step. Except for evaluating models by means

of standard metrics, such as Precision, Recall, and Mean Square

Error (MSE), some researchers conduct preliminary simulations

[12,36], where strategies trade stocks in a virtual market using

historical real data or data that was generated by simulated

agents. Return rate is the measure of the trading performance.

However, how to design a functional trading strategy that could

make full use of the predictions of a model (not strategies that

simply buy-and-hold) is beyond the scope of this paper.

Besides the work on news, there are also many published

papers on mining signals from stock prices. Guo et al. [17] focus on

the architecture of the neural network and develop a sparsely

connected network model, which achieves the better performance

than traditional neural networks with respect to three data sets

(Microata, Finance Data and Telecommunication Data). For fore-

casting the changes of long-term index, Hung and Lin [20] develop

an intuitionistic fuzzy least-squares support vector machine with

genetic algorithms (IFLS-SVRGAs). Lin et al. [29] recently apply

FLS-SVR-GA to forecast the seasonal revenue of a company. Gestel

et al. [47] use a Bayesian evidence framework and apply it to least-

squares support vector machine regression for price and volatility

prediction. Tay et al. [42,43] and Cao et al. [3] modify the SVM

objective function and make C an adaptive parameter for non-

stationary time series. For predicting index prices, Kim [23]

concludes that the performance of SVM is better than that of

back-propagation neural networks. Applying SVM to predict S&P

500 daily prices, Cao et al. [4,5] also ﬁnd that SVM has the better

performance based on the metrics of normalized mean square

error and mean absolute error. Huang et al. [19] predict the price

directional movement of NIKKEI 225 index using SVM. After

comparing SVM with linear discriminant analysis, quadratic dis-

criminant analysis, and back-propagation neural networks, they

reach the same conclusion. To summarize, the steps of the

approaches in this category are (1) Preprocessing raw prices. To

make historical prices more indicative, prices in absolute price

levels are sometimes translated into price indicators. (2) Patten

classiﬁcation. Patterns of prices are then classiﬁed (or regressed)

by models into predetermined categories (or estimated values).

Note that previous work made use of only one type of

information sources, i.e., either ﬁnance news or historical prices.

In contrast, our proposed approach use both information sources

which are integrated for market prediction by MKL.

2.2. Multiple Kernel learning

In order to have a deep understanding of the problem and to

interpret the derived decision function in a simple way, we

describe MKL used in our system in this section.

The objective function of a single kernel SVR is

min

w;b

〈w; w〉 þC ∑

i ¼ 1

ðξ

þξ

Þ; s:t: ð〈w; ϕðx

Þ〉þ bÞy

r ε þξ

;

ð〈w; ϕðx

Þ〉þ bÞr ε þξ

; ξ

Z 0; i ¼ 1; …; N; ð3Þ

where N is the number of training instances, and C is a penalty

term which balances between training error and model complex-

ity. ξ

ðnÞ

(ξ

ðnÞ

refer to ξ

and ξ

) are slack variables, where ξ

means

News

articles

Historical

prices

Textual

processing

Time series

processing

News

labeling

Model training

Classification

or regression

Evaluation

Accuracy

Closeness

Fig. 2. Architecture of traditional approaches.

X. Li et al. / Neurocomputing 142 (2014) 228–238230

剩余10页未读，继续阅读

weixin_38663516

粉丝: 6
资源: 932

融合市场新闻与股票价格提升日内股票收益预测精度

Enhancing-the-performance-of-a-tensioned-meta_2018_Nuclear-Instruments-and-M

ExtremeLearningMachine资源共享-Towards-enhancing-centroid-classifier-for-text-classification_2013_Neurocomp.pdf

1901_Enhancing-Bluetooth-Location-Service_FINAL.pdf

Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform

enhancing mqtt-based machine-to-machine communication with python in iot sys

Small but Mighty: Enhancing 3D Point Clouds Semantic Segmentation with U-Next Framework

用英文回答面试问题：你如何理解信息管理与信息系统专业。两分钟左右

matlab exposure

write an IELTS sample article on the topic of university tuition fees with advanced vocabulary and phrases

请帮我创建一个pyradiomics提取特征的配置yaml文件

最新资源