Team # 2009116 Page 5 of 24
1 Introduction
1.1 Statement of the problem
Online marketplaces, led by Amazon, have become the most important trading
platform for most companies. As the mechanism through which customers share their
shopping experience with each other, reviews and ratings of various kinds provide
potential consumers with subjective evidence. They can also guide new entrants in
choosing a target market and a target product. In addition, in the social media era,
consumers have louder and stronger voices than ever before: over 50% of consumers
frequently factor in online reviews before buying a product [1]. Reading reviews can
be seen as a direct link to former customers, which gives first-hand information about
a product and helps buyers avoid exaggeration by the seller.
However, at this stage, it is not clear how these measures interact or how they
influence corporate strategy. Thus, we aim to explore the internal relationships among
the three variables and design a composite data measure to assist companies in
selecting potentially successful products. Meanwhile, we discuss how the reviews of the
customer group influence the attitude orientation of individual reviews. Finally, we
rank the alternative products according to our method and offer a recommendation list
to Sunshine Company.
1.2 Literature Review
At present, there are two main methods for sentiment analysis: one is based on
machine learning, and the other is based on a lexicon. Several machine learning
approaches have come into the spotlight in recent years, and most of them rely on
large volumes of personalized reviews from social media. Because reviews on social
media are written not only in English but also in other languages such as Arabic [2],
many researchers from non-English-speaking regions also apply this method, which
suggests that it is quite mature.
The other method is the lexicon-based approach. This method also uses text in
which people express sentiment or emotion on social media, such as Twitter posts and
micro-blog text [3,4]. Recent research on sentiment polarity lexicons also shows that
a lexicon can be extended to a particular domain, the stock market, without human
intervention while addressing the scaling and thresholding problem [5]. Therefore, we
believe that this method has the potential to be widely used.
After comparing the two methods, we chose the lexicon-based approach. We believe
that this method is the most applicable when time and resources are limited, which is
similar to the conclusion reached in a study of sentiment analysis for Urdu [6].
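To make the lexicon-based idea concrete, the following is a minimal sketch rather than the procedure used in this paper. The word list `TOY_LEXICON`, the negation set `NEGATIONS` and the function `lexicon_score` are hypothetical names introduced only for illustration; a real study would substitute a published sentiment lexicon and a more careful tokenizer.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only):
# each word maps to a polarity weight in [-1, 1].
TOY_LEXICON = {
    "great": 1.0,
    "love": 1.0,
    "good": 0.5,
    "poor": -0.5,
    "broken": -1.0,
    "terrible": -1.0,
}

# Words that flip the polarity of the sentiment word directly after them.
NEGATIONS = {"not", "never", "no"}


def lexicon_score(review, lexicon=TOY_LEXICON):
    """Return the mean polarity of the lexicon words found in a review.

    A review with no lexicon words is treated as neutral (score 0.0).
    """
    tokens = re.findall(r"[a-z']+", review.lower())
    total = 0.0
    hits = 0
    for i, token in enumerate(tokens):
        if token not in lexicon:
            continue
        weight = lexicon[token]
        # Simple negation handling: "not good" counts as negative.
        if i > 0 and tokens[i - 1] in NEGATIONS:
            weight = -weight
        total += weight
        hits += 1
    return total / hits if hits else 0.0


if __name__ == "__main__":
    for text in ("I love it, great product",
                 "not good, arrived broken",
                 "it arrived on time"):
        print(text, "->", lexicon_score(text))
```

Because the score depends only on a fixed word list, no labeled training data is needed, which is the main reason such an approach suits a setting with limited time and resources.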
After introducing the time factor, we propose a time series model to discuss how
the three variables change over time. After comparing several applicable models, we
chose traditional time series models and the Autoregressive Integrated Moving Average
(ARIMA) model to observe the influence of time on different reviews and to predict
future trends in the indicators. We expect that, in the forecasting phase, the ARIMA
model can achieve quite accurate forecasts, as reported by Matyjaszek, M. et al. [7]
At the same time, we also forecast each product separately. This is in line with
Nguyen, H. et al., whose model makes similar predictions for the prices of different
products [8].
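For reference, the general ARIMA(p, d, q) model can be written in its standard textbook form as below. The symbols are conventional choices and an assumption here, not notation taken from this paper: y_t is the observed indicator at time t, B is the backshift operator (B y_t = y_{t-1}), c is a constant, and \varepsilon_t is white noise.

```latex
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^{i}\Bigr)\,(1 - B)^{d}\, y_t
  \;=\; c + \Bigl(1 + \sum_{j=1}^{q} \theta_j B^{j}\Bigr)\,\varepsilon_t
```

Here p is the order of the autoregressive part, d is the number of differences applied to make the series stationary, and q is the order of the moving-average part. With d = 0 the model reduces to an ARMA(p, q) model.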
1.3 Overview of Our Work
Our main goals are to address three issues: (1) the sentiment analysis of text
reviews; (2) the relationship between reviews and rating indicators; and (3) the
impact of time. We then propose a product ranking method that combines these three
components.
To solve these problems, we first analyze the data characteristics of each
numerical parameter, and then apply the sentiment lexicon-based method to analyze the
emotional attitude of the text reviews, extract the emotional words in the