Note: the environment is macOS 10.13.3 on a MacBook Pro (Retina, 15-inch, Mid 2015).
DeepSpeech is a speech recognition engine that Mozilla implemented with TensorFlow; see https://github.com/mozilla/DeepSpeech.
1) Create the project directory
mkdir deepspeech
cd deepspeech
2) Create a virtual environment
virtualenv env-deepspeech --system-site-packages
3) Activate the virtual environment
source env-deepspeech/bin/activate
4) Install deepspeech
pip install deepspeech
At this point you can run deepspeech -h to check that the environment works. When I ran it, I got the following error:
RuntimeError: module compiled against API version 0xc but this version of numpy is 0x9
Upgrading numpy is enough to fix it:
pip install --upgrade numpy
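If the error persists after the upgrade, it can help to see which numpy the interpreter actually imports. Because the virtual environment was created with --system-site-packages, an older system-wide numpy may still be visible; that is my assumption, not something the error message states. A minimal Python sketch (describe_module is a hypothetical helper name, not part of deepspeech or numpy):

```python
import importlib


def describe_module(name):
    """Return (version, file path) for an importable module, or None if it is missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    version = getattr(module, "__version__", "unknown")
    location = getattr(module, "__file__", "unknown")
    return version, location


# Shows the numpy version and where it was loaded from (system-wide or the virtualenv).
print(describe_module("numpy"))
```

A path outside env-deepspeech would suggest that the system copy is the one being picked up.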
5) Download the pre-trained model and extract it
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz
tar xvzf deepspeech-0.1.1-models.tar.gz
A models folder then appears in the current directory, containing the trained DeepSpeech model files:
-rw-r--r-- 1 none staff 329 11 18 03:25 alphabet.txt
-rw-r--r-- 1 none staff 1601028778 11 18 03:25 lm.binary
-rw-r--r-- 1 none staff 490978889 1 17 22:09 output_graph.pb
-rw-r--r-- 1 none staff 43550345 11 18 03:25 trie
6) Prepare a 16 kHz, 16-bit, mono WAV file
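One way to confirm that a file really meets these three requirements is to read its header with Python's standard wave module. This is only a sketch: check_wav and write_silence are hypothetical helper names, and the demo file is generated silence rather than real speech.

```python
import os
import tempfile
import wave


def check_wav(path, rate=16000, sample_width=2, channels=1):
    """Return a list of format mismatches; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append("sample rate is %d Hz, expected %d" % (wav.getframerate(), rate))
        if wav.getsampwidth() != sample_width:
            problems.append("sample width is %d bytes, expected %d" % (wav.getsampwidth(), sample_width))
        if wav.getnchannels() != channels:
            problems.append("channel count is %d, expected %d" % (wav.getnchannels(), channels))
    return problems


def write_silence(path, rate, sample_width, channels, seconds=1):
    """Write a silent WAV file, used here only to demonstrate check_wav."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (rate * sample_width * channels * seconds))


# Demo: a 44.1 kHz stereo file fails the sample rate and channel checks.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silence(demo_path, rate=44100, sample_width=2, channels=2)
for problem in check_wav(demo_path):
    print(problem)
```

Calling check_wav("test.wav") on the converted file should return an empty list if the conversion worked.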
I converted an MP3 from my daughter's English listening exercises into a 16 kHz, 16-bit, mono file and ran recognition on it:
deepspeech models/output_graph.pb models/alphabet.txt models/lm.binary models/trie test.wav
The run fails and exits with the following error:
libc++abi.dylib: terminating with uncaught exception of type lm::FormatLoadException: native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece &, std::vector &) threw FormatLoadException.
first non-empty line was "1414678853" not \data\. Byte: 11
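I suspected that the deepspeech build installed through pip was at fault, so I cloned the source repository and ran its util/taskcluster.py script, which downloads a prebuilt native client rather than compiling one locally:
git clone https://github.com/mozilla/DeepSpeech
cd DeepSpeech/
python util/taskcluster.py --arch osx --target .
A deepspeech binary now appears in the directory, which means the download succeeded. Copy it into the original project directory and run it again: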
./deepspeech models/output_graph.pb models/alphabet.txt models/lm.binary models/trie test.wav
The recognition result is as follows:
he aihtbebyureunittwoo smell and taste lisenanti one the water melon is big and round to here are two parts on the tall tree lets get them three taste these grapes are the nice for what a nice lemon it smells good five what would you like id like some oranges six what of those they are strawverieslisenancircle one is it lemenjuce or oringtuce to taste these grapes or they taste three taste what is it here are some strowberies for you five or those pairs sweet or sour six what would you like watermolanjuceororangejucepage nine beat yo year listen choose and complete one its nice here lets ever pigknack what do you have jo guess they are round they are or range they smell nice what are the kitty oh they are oringers i like sweetorringers here you are thank you to we do you have alice close your eyes now taste it oudesitase its soar is it a lemon yes youre right it sour but nice we can make some lemenjece three im thirsty what do you have beat look a big water melon wow thats great i like what i mean for what do you have in your pack kitty at them are the grape so strawberies there small and round their grapes i think yes they are grapes do you like grapes pen no i like stroperies five what do you like del uatamolans or apples guess they are sweet big and round a pulse no youre wrong i like water melons
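To put a number on the quality of a transcript like this, the usual measure is word error rate (WER): the word-level edit distance between a reference text and the recognizer output, divided by the number of reference words. The sketch below is a plain Levenshtein implementation, not DeepSpeech's own evaluation code, and the reference sentence is my assumption of what was spoken, since I have no ground-truth script for the recording.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


# Assumed reference versus a fragment of the actual output above.
assumed_reference = "the watermelon is big and round"
recognized = "the water melon is big and round"
print("WER: %.2f" % word_error_rate(assumed_reference, recognized))
```

Here "watermelon" split into "water melon" counts as one substitution plus one insertion, so two errors against six reference words.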
Frankly, the recognition accuracy is still rather low, and the run takes a very long time:
real 4m31.706s
user 8m35.168s
sys 0m14.004s
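The three figures are easier to compare once they are converted to seconds. The sketch below parses the bash time format and computes the ratio of CPU time (user + sys) to wall-clock time; to_seconds is a hypothetical helper name. Dividing the wall-clock time by the length of the audio would give a real-time factor, but I did not record the clip's duration, so that step is left out.

```python
import re


def to_seconds(text):
    """Convert a bash `time` value such as '4m31.706s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError("unrecognized time value: %r" % text)
    return int(match.group(1)) * 60 + float(match.group(2))


real = to_seconds("4m31.706s")
user = to_seconds("8m35.168s")
sys_time = to_seconds("0m14.004s")

# CPU time divided by wall-clock time: the average number of busy cores.
cpu_ratio = (user + sys_time) / real
print("wall clock %.1f s, CPU %.1f s, ratio %.2f" % (real, user + sys_time, cpu_ratio))
```

The ratio comes out at roughly 1.95, meaning that on average about two cores were busy during the run.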
Note: running deepspeech may fail because the libsox2 library cannot be found; installing sox with brew (brew install sox) fixes this.