import numpy as np
import pandas as pd

# Load the analyzed comments and group the polarity labels by calendar date
df = pd.read_csv('stock_comments_analyzed.csv', parse_dates=['created_time'])
grouped = df['polarity'].groupby(df.created_time.dt.date)

# Simple index: (pos - neg) / (pos + neg)
def BI_Simple_func(row):
    pos = row[row == 1].count()
    neg = row[row == 0].count()
    return (pos-neg)/(pos+neg)

BI_Simple_index = grouped.apply(BI_Simple_func)

# BI index: ln((1 + pos) / (1 + neg))
def BI_func(row):
    pos = row[row == 1].count()
    neg = row[row == 0].count()
    bi = np.log(1.0 * (1+pos) / (1+neg))
    return bi

BI_index = grouped.apply(BI_func)

sentiment_idx = pd.concat([BI_index.rename('BI'), BI_Simple_index.rename('BI_Simple')], axis=1)

# Load the index quotes and align both tables on a datetime index
quotes = pd.read_csv('./data/sh000001.csv', parse_dates=['date'])
quotes.set_index('date', inplace=True)
sentiment_idx.index = pd.to_datetime(sentiment_idx.index)
merged = pd.merge(sentiment_idx, quotes, how='left', left_index=True, right_index=True)

# Forward-fill gaps (fillna(method='ffill') is deprecated in recent pandas)
merged = merged.ffill()

# 10-day moving averages of both indices
merged['BI_MA'] = merged['BI'].rolling(window=10, center=False).mean()
merged['BI_Simple_MA'] = merged['BI_Simple'].rolling(window=10, center=False).mean()
merged.to_csv('merged_sentiment_idx.csv')
Time: 2024-03-29 19:36:04 Views: 334
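This code computes and post-processes stock sentiment indices. It reads the analyzed stock comments and groups the polarity labels by calendar date, then defines two functions: one computes a simple sentiment index (BI_Simple) and the other a log-ratio index (BI). Both indices are merged with the quotes from sh000001.csv using a left join, missing values are forward-filled, 10-day moving averages are added, and the result is saved to a CSV file. BI is an indicator of market sentiment. In this code it is the natural logarithm of (1 + positive comments) / (1 + negative comments), so it reflects the balance between positive and negative comments on a given day, not actual buy and sell orders.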
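To make the two formulas concrete, here is a minimal sketch that applies them to a handful of made-up comments. The toy data and the helper names `bi_simple` and `bi_log` are assumptions for illustration only; the coding of polarity (1 for positive, 0 for negative) follows the code in the question. The guard against an empty day is an addition that the original code does not have.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: polarity 1 = positive comment, 0 = negative comment.
comments = pd.DataFrame({
    "created_time": pd.to_datetime([
        "2024-01-02 09:30", "2024-01-02 10:00", "2024-01-02 14:00",
        "2024-01-03 09:45", "2024-01-03 11:00", "2024-01-03 15:00",
    ]),
    "polarity": [1, 1, 0, 0, 0, 0],
})


def bi_simple(polarity: pd.Series) -> float:
    """(pos - neg) / (pos + neg); NaN when the day has no labelled comments."""
    pos = int((polarity == 1).sum())
    neg = int((polarity == 0).sum())
    total = pos + neg
    return (pos - neg) / total if total else float("nan")


def bi_log(polarity: pd.Series) -> float:
    """ln((1 + pos) / (1 + neg)); the +1 terms keep the ratio finite."""
    pos = int((polarity == 1).sum())
    neg = int((polarity == 0).sum())
    return float(np.log((1 + pos) / (1 + neg)))


# One group per calendar date, as in the question's code.
grouped = comments["polarity"].groupby(comments["created_time"].dt.date)
daily = pd.DataFrame({
    "BI": grouped.apply(bi_log),
    "BI_Simple": grouped.apply(bi_simple),
})
print(daily)
```

On the first day (2 positive, 1 negative) BI is ln(3/2) and BI_Simple is 1/3; on the second day (0 positive, 3 negative) BI is ln(1/4) and BI_Simple is -1. BI_Simple is bounded in [-1, 1], while BI is unbounded but dampened by the logarithm.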
Related questions
result = pd.DataFrame({'第一次打卡时间': first_checkin, '最后一次打卡时间': last_checkin})
The snippet does not show how the data is produced, but the line creates a pandas DataFrame with two columns, '第一次打卡时间' (first check-in time) and '最后一次打卡时间' (last check-in time), filled from the variables first_checkin and last_checkin. Neither variable is defined in the given code.
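Since the question does not define `first_checkin` and `last_checkin`, the following is only a sketch of one plausible way they could be obtained: taking the earliest and latest timestamp per person. The `records` table and its column names `employee` and `checkin_time` are hypothetical.

```python
import pandas as pd

# Hypothetical check-in log; column names are assumptions, not from the question.
records = pd.DataFrame({
    "employee": ["A", "A", "B", "B", "B"],
    "checkin_time": pd.to_datetime([
        "2024-03-01 08:55", "2024-03-01 18:02",
        "2024-03-01 09:10", "2024-03-01 12:30", "2024-03-01 17:45",
    ]),
})

# Earliest and latest check-in per employee.
by_employee = records.groupby("employee")["checkin_time"]
first_checkin = by_employee.min()
last_checkin = by_employee.max()

# The line from the question: both Series share the same index, so they align.
result = pd.DataFrame({'第一次打卡时间': first_checkin, '最后一次打卡时间': last_checkin})
print(result)
```

Because both Series are indexed by `employee`, the DataFrame constructor aligns them row by row without an explicit merge.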
sp = data_train.target
data_train.target1 = np.power(1.5, sp)
print(data_train.target1.describe())
This code calculates the target1 variable for the data_train dataset as 1.5 raised to the power of the target variable (sp), that is, 1.5 ** sp, because np.power takes the base as its first argument and the exponent as its second. Then, it uses the describe() method to get descriptive statistics of the target1 variable.
The describe() method provides information such as the count, mean, standard deviation, minimum value, 25th percentile, median (50th percentile), 75th percentile, and maximum value of the target1 variable.
It is important to note that the interpretation of these descriptive statistics depends on the specific dataset and problem being analyzed.
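The argument order of `np.power` is easy to misread, so here is a small sketch contrasting the two possibilities. The three-row `data_train` frame and the variable names `exp_of_target` and `target_pow` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_train with a numeric target column.
data_train = pd.DataFrame({"target": [0.0, 1.0, 2.0]})
sp = data_train["target"]

# np.power(base, exponent): this is 1.5 ** sp, as in the question's code.
exp_of_target = np.power(1.5, sp)

# Swapping the arguments gives sp ** 1.5, which is a different transform.
target_pow = np.power(sp, 1.5)

# Bracket assignment creates a real column; attribute assignment
# (data_train.target1 = ...) would not add one to the DataFrame.
data_train["target1"] = exp_of_target
print(data_train["target1"].describe())
```

For targets 0, 1, and 2 the first form yields 1.0, 1.5, and 2.25, while the swapped form yields 0.0, 1.0, and about 2.83.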