A Python function for Chinese speech-to-text with Wav2Vec2
Posted: 2024-02-03 22:03:25
The following Python function records a short clip from the microphone and transcribes Chinese speech to text with Wav2Vec2:
```
import numpy as np
import pyaudio
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor


def speech_to_text():
    # Load the pretrained model and its processor (feature extractor + tokenizer)
    model_name = "jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn"
    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name)
    model.eval()

    # Recording parameters: 16 kHz, mono, 16-bit PCM
    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16000
    RECORD_SECONDS = 5

    # Record from the default input device
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    frames = []
    print("Recording, please speak...")
    for _ in range(int(RATE / CHUNK * RECORD_SECONDS)):
        frames.append(stream.read(CHUNK))
    print("Recording finished.")
    stream.stop_stream()
    stream.close()
    p.terminate()

    # Convert 16-bit PCM samples to float32 in [-1.0, 1.0), then build model inputs
    audio = np.frombuffer(b"".join(frames), dtype=np.int16).astype(np.float32) / 32768.0
    input_values = processor(audio, sampling_rate=RATE, return_tensors="pt").input_values

    # Run inference and decode the most likely token at each frame
    with torch.no_grad():
        logits = model(input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.batch_decode(predicted_ids)[0]
    return transcription
```
Call `speech_to_text()` to record and transcribe Chinese speech, for example:
```
text = speech_to_text()
print("Transcription:", text)
```
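The decoding step in the function (an `argmax` over the logits followed by `batch_decode`) is greedy CTC decoding: consecutive repeated IDs are merged, then the blank token is dropped. The sketch below illustrates that collapse rule in plain Python. The toy vocabulary and the blank ID of `0` are assumptions made for illustration only; the real tokenizer has its own vocabulary and also handles details such as word delimiters.

```python
from itertools import groupby


def ctc_greedy_collapse(ids, blank_id=0):
    """Merge consecutive repeated IDs, then drop the blank ID."""
    merged = [key for key, _ in groupby(ids)]
    return [i for i in merged if i != blank_id]


# Hypothetical toy vocabulary; ID 0 plays the role of the CTC blank.
vocab = {0: "<pad>", 1: "你", 2: "好"}

# One predicted ID per audio frame, as argmax over the logits would give.
frame_ids = [0, 1, 1, 0, 0, 2, 2, 2, 0]

token_ids = ctc_greedy_collapse(frame_ids)
text = "".join(vocab[i] for i in token_ids)
print(text)  # 你好
```

Because repeats are merged before blanks are removed, a blank between two identical IDs keeps them as two separate characters: `[1, 0, 1]` collapses to `[1, 1]`.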
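Note that the function relies on a Wav2Vec2 model, so the input audio must meet the model's requirements. In particular, this checkpoint expects speech sampled at 16 kHz, which is why the recording code uses `RATE = 16000` with a single channel. Audio that does not meet these requirements, for example a recording made at a different sample rate, is likely to reduce recognition accuracy.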
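If the audio comes from a file rather than the microphone, it can help to check its format before transcription. The following is a minimal sketch using only the standard-library `wave` module. The function name `check_wav_for_wav2vec2` is hypothetical, and the 16-bit check mirrors the `paInt16` recording format used in the function above rather than a requirement of the model itself.

```python
import wave


def check_wav_for_wav2vec2(path, expected_rate=16000):
    """Return a list of format problems found in a WAV header (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != expected_rate:
            problems.append(
                f"sample rate is {wf.getframerate()} Hz, expected {expected_rate} Hz"
            )
        if wf.getnchannels() != 1:
            problems.append(f"{wf.getnchannels()} channels, expected mono")
        # Assumption: 16-bit PCM, matching the recording code in this answer.
        if wf.getsampwidth() != 2:
            problems.append(
                f"sample width is {wf.getsampwidth()} bytes, expected 2 (16-bit PCM)"
            )
    return problems
```

An empty list means the header matches the expected format; otherwise the file would need to be resampled or downmixed before it is passed to the model.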