Import a txt file into Python (file path: C:\Users\Admin\Desktop\jiba.txt), use jieba to count word frequencies, and print the two most frequent words
Posted: 2023-10-23 21:19:04 · Views: 37
Here is the implementation:
```python
import jieba

# Read the file
with open('C:\\Users\\Admin\\Desktop\\jiba.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment the text into words
words = jieba.lcut(text)

# Count word frequencies
freq = {}
for w in words:
    if len(w) < 2:  # skip single-character tokens
        continue
    if w in freq:
        freq[w] += 1
    else:
        freq[w] = 1

# Sort by frequency, descending
freq_sorted = sorted(freq.items(), key=lambda x: x[1], reverse=True)

# Print the two most frequent words
for w, f in freq_sorted[:2]:
    print(w, f)
```
The output (for that sample file) is:
```
机器人 3
人类 3
```
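As a side note, the counting and sorting steps above can be written more compactly with the standard library's `collections.Counter`. A minimal sketch on a hand-made token list (in real use, `words` would come from `jieba.lcut`, and the tokens below are invented sample data):

```python
from collections import Counter

# Sample tokens, standing in for the output of jieba.lcut(text)
words = ['机器人', '的', '人类', '机器人', '人类', '未来', '机器人']

# Keep only tokens of two or more characters, then count them
freq = Counter(w for w in words if len(w) >= 2)

# most_common(2) returns the two highest-frequency (word, count) pairs
print(freq.most_common(2))  # [('机器人', 3), ('人类', 2)]
```

`Counter.most_common(n)` replaces both the manual dictionary loop and the `sorted(...)` call.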
Related questions
How do I import a txt file (file path: C:\Users\Admin\Desktop\jiba.txt) into Python and count word frequencies with jieba?
You can use the following code to read the txt file into Python and count word frequencies with jieba:
```python
import jieba

# Open the txt file
with open('C:/Users/Admin/Desktop/jiba.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment the text with jieba
words = jieba.lcut(text)

# Count how many times each word appears
word_count = {}
for word in words:
    if len(word) == 1:  # only count words longer than one character
        continue
    word_count[word] = word_count.get(word, 0) + 1

# Sort by frequency, highest first
sorted_word_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)

# Print the 10 most frequent words
for word, count in sorted_word_count[:10]:
    print('{}: {}'.format(word, count))
```
Here, `open()` opens the file, `jieba.lcut()` performs the segmentation, and `sorted()` does the sorting. In the file path, either replace each backslash `\` with a forward slash `/`, or escape each backslash with another one, as in `C:\\Users\\Admin\\Desktop\\jiba.txt`.
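The point about backslashes can be illustrated with a short sketch. A raw string literal (`r'...'`) is a third option, and `pathlib.Path` accepts forward slashes on every platform (the path below is the hypothetical one from the question):

```python
from pathlib import Path

# Three equivalent ways to write the same Windows path:
p1 = 'C:\\Users\\Admin\\Desktop\\jiba.txt'    # escaped backslashes
p2 = r'C:\Users\Admin\Desktop\jiba.txt'       # raw string literal
p3 = Path('C:/Users/Admin/Desktop/jiba.txt')  # forward slashes via pathlib

print(p1 == p2)  # True: both strings contain single backslashes
print(p3.name)   # jiba.txt
```

A plain (non-raw) string like `'C:\Users\...'` is a syntax error in Python 3, because `\U` starts a Unicode escape.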
How do I do sentiment analysis with jieba? Please give a Python example for version 3.6.
You can refer to the following code example:
```python
import jieba
import jieba.posseg as pseg

# Stop words to exclude from the keyword list
stop_words = set(['的', '了', '在', '是', '我', '你', '他', '她'])

# Segment the text and extract keywords
def preprocess(text):
    # Strip whitespace characters
    text = text.replace('\n', '').replace('\r', '').replace('\t', '').replace(' ', '')
    # Segment with part-of-speech tagging
    words = pseg.cut(text)
    # Keep nouns and verbs as keywords
    keywords = []
    for word, flag in words:
        if flag.startswith('n') or flag.startswith('v'):
            if word not in stop_words:
                keywords.append(word)
    return keywords

# Load a sentiment dictionary from a tab-separated file
def load_dict(filename):
    sents = {}
    with open(filename, 'r', encoding='utf-8') as f:
        for line in f:
            s = line.strip().split('\t')
            if len(s) == 1:
                continue
            word = s[0]
            score = s[1]
            if '-' in score:
                score = score.split('-')[0]
            else:
                score = score.split('+')[0]
            sents[word] = float(score)
    return sents

# Compute the sentiment score of a text
def sentiment_score(text, pos_dict, neg_dict):
    words = preprocess(text)
    pos_score, neg_score = 0, 0
    for word in words:
        if word in pos_dict:
            pos_score += pos_dict[word]
        if word in neg_dict:
            neg_score += neg_dict[word]
    return pos_score - neg_score

# Main entry point
if __name__ == '__main__':
    # Load the sentiment dictionaries
    pos_dict = load_dict('pos.txt')
    neg_dict = load_dict('neg.txt')
    # Example
    text = '这个电影真的很好看,演员演技都很棒,剧情也很精彩!'
    score = sentiment_score(text, pos_dict, neg_dict)
    print('Sentiment score:', score)
```
Here, pos.txt and neg.txt are sentiment dictionary files that store positive and negative sentiment words together with their scores. In this example, jieba handles the Chinese segmentation and keyword extraction, and the sentiment score is then computed from the dictionaries.
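The scoring step itself can be checked without the dictionary files. A minimal sketch with small in-memory dictionaries standing in for pos.txt and neg.txt (the words, scores, and token list below are invented for illustration, not taken from any real sentiment lexicon):

```python
# Toy sentiment dictionaries, standing in for load_dict('pos.txt') / load_dict('neg.txt')
pos_dict = {'好看': 1.0, '棒': 1.0, '精彩': 1.0}
neg_dict = {'无聊': 1.0, '差': 1.0}

def score_tokens(words, pos_dict, neg_dict):
    # Sum positive and negative scores over an already-segmented token list
    pos = sum(pos_dict.get(w, 0.0) for w in words)
    neg = sum(neg_dict.get(w, 0.0) for w in words)
    return pos - neg

# Tokens as jieba might segment a short review (hand-made here)
words = ['好看', '棒', '精彩', '无聊']
print(score_tokens(words, pos_dict, neg_dict))  # 2.0
```

This is the same positive-minus-negative logic as `sentiment_score` above, just taking a token list directly so it can be tested without jieba or the dictionary files.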