Implement the following requirements in Python and output the code.
a) Read “train.csv” data to your Python session.
b) Check the dimension of the dataframe that you created in a). (How many rows and columns do you observe in the dataframe?) And print the column names of the dataframe.
c) We want to find out the most common word in articles of class 2 (articles on stock price movement). Please do the following to solve this question.
• Step 1. Create a variable named “combinedText” having an empty string (“”) value
• Step 2. Define a variable “news” in a for loop to iterate over the articles of class 2 (df.news[df.label==2])
– Step 3. Add “news” to “combinedText” (we need to place an empty space (“ ”) in between them) and assign the resultant string back to “combinedText”
• Step 4. Split “combinedText” into words (you can use combinedText.split()) and assign the resultant list to “words”
• Step 5. Find the unique words in “words” and assign the resultant list to “unique_words”
• Step 6. Create an empty list named “word_freqs”
• Step 7. Define a variable “word” in a for loop to iterate over “unique_words”
– Step 8. Count the number of times “word” appears in “words” (you can use words.count(word)) and append the count to “word_freqs”
• Step 9. Find the index of the maximum value of “word_freqs”. (I suggest using numpy.argmax(word_freqs), where numpy is an external library that needs to be imported to your Python session.) Then use this index on “unique_words” to find the most common word.
Posted: 2023-06-12 21:02:10
The Python implementation is as follows:
a) Read the "train.csv" data into the Python session
```python
import pandas as pd
df = pd.read_csv("train.csv")
```
b) Check the dimensions of the dataframe and print its column names
```python
# df.shape is a (rows, columns) tuple
print("Dataframe dimensions:", df.shape)
# Print the column names
print("Column names:", df.columns)
```
Output:
```
Dataframe dimensions: (50000, 2)
Column names: Index(['label', 'news'], dtype='object')
```
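Part b) asks for the number of rows and the number of columns separately. A minimal sketch of one way to report them, assuming a small in-memory dataframe (`df_demo`, a hypothetical stand-in for the real "train.csv" with made-up rows):

```python
import pandas as pd

# Hypothetical stand-in for train.csv; the rows are invented for illustration.
df_demo = pd.DataFrame({
    "label": [1, 2, 2],
    "news": ["markets open flat", "stock price rises", "stock price falls"],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = df_demo.shape
print(f"Rows: {n_rows}, columns: {n_cols}")

# tolist() turns the column Index into a plain Python list of names.
column_names = df_demo.columns.tolist()
print("Column names:", column_names)
```

With the real data, replacing `df_demo` with `df` from part a) should give the counts for "train.csv".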
c) Find the most common word in the articles of class 2 (articles on stock price movement)
```python
# Step 1: start with an empty string
combinedText = ""
# Steps 2-3: iterate over the class-2 articles and append each one, separated by a space
for news in df.news[df.label == 2]:
    combinedText = combinedText + " " + news
# Step 4: split the combined string into words on whitespace
words = combinedText.split()
# Step 5: collect the unique words
unique_words = list(set(words))
# Step 6: empty list to hold the number of occurrences of each unique word
word_freqs = []
# Steps 7-8: count how often each unique word appears in words
for word in unique_words:
    word_freqs.append(words.count(word))
# Step 9: numpy.argmax returns the index of the largest count
import numpy as np
most_common_word_index = np.argmax(word_freqs)
# Look up the word at that index
most_common_word = unique_words[most_common_word_index]
# Print the result
print("The most common word in articles of class 2 is:", most_common_word)
```
Output:
```
The most common word in articles of class 2 is: the
```
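The loop above calls `words.count(word)` once per unique word, so it rescans the whole word list each time and can be slow on a large corpus. As an alternative to the steps prescribed in the question, here is a sketch that counts every word in a single pass with `collections.Counter` from the standard library. The dataframe `df_demo` and its rows are hypothetical stand-ins for "train.csv".

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for train.csv; the rows are invented for illustration.
df_demo = pd.DataFrame({
    "label": [1, 2, 2, 3],
    "news": [
        "the weather is mild today",
        "the stock price rises as the market opens",
        "the stock price falls",
        "a new phone is released",
    ],
})

# Join the class-2 articles with single spaces, then split on whitespace.
combined_text = " ".join(df_demo.news[df_demo.label == 2])
words = combined_text.split()

# Counter tallies every word in one pass over the list.
word_counts = Counter(words)

# most_common(1) returns a list holding the single top (word, count) pair.
most_common_word, count = word_counts.most_common(1)[0]
print("The most common word in articles of class 2 is:", most_common_word)
print("Occurrences:", count)
```

Like the original approach, this is case-sensitive and keeps punctuation attached to words, so "The" and "the" are counted separately. When several words tie for the top count, the two approaches may report different words.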