1. Development environment setup
pip3 install SpeechRecognition
# for macOS
brew install portaudio
pip3 install pyaudio
# for Ubuntu
sudo apt-get install python-pyaudio python3-pyaudio
sudo apt-get install portaudio19-dev python-all-dev python3-all-dev
sudo pip3 install pyaudio
# for the Vosk and Whisper engines
python3 -m pip install vosk
python3 -m pip install git+https://github.com/openai/whisper.git soundfile
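Before running the examples, it can help to confirm that each installed package is importable. The following is a minimal sketch, assuming the usual import names for the packages above (`speech_recognition`, `pyaudio`, `vosk`, `whisper`, `soundfile`); the helper name `check_modules` is hypothetical and not part of any of these libraries.

```python
import importlib.util

# Import names assumed for the packages installed above.
REQUIRED_MODULES = ["speech_recognition", "pyaudio", "vosk", "whisper", "soundfile"]


def check_modules(names):
    """Return {module_name: bool} telling whether each module can be found."""
    status = {}
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED_MODULES).items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Any module reported as MISSING points back to the matching install command in this section.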
2. Example code
1. Google API
- Code
import speech_recognition as sr

r = sr.Recognizer()
# Capture up to 10 seconds of speech from the default microphone
with sr.Microphone() as source:
    print('listening...')
    audio = r.listen(source, timeout=10, phrase_time_limit=10)
    print("......")
try:
    text = r.recognize_google(audio, language='ko')
    print(text)
except sr.UnknownValueError:
    print("Recognizer Failed..")
except sr.RequestError as e:
    print("Request Failed...", e)
2. Vosk
- Environment setup
- Download the model
- After downloading the model file, create a model folder under the project folder and extract the archive into it.
- Code
import speech_recognition as sr

r = sr.Recognizer()
# Capture up to 10 seconds of speech from the default microphone
with sr.Microphone() as source:
    print('listening...')
    audio = r.listen(source, timeout=10, phrase_time_limit=10)
    print("......")
try:
    text = r.recognize_vosk(audio, language='ko')
    print(text)
except sr.UnknownValueError:
    print("Recognizer Failed..")
except sr.RequestError as e:
    print("Request Failed...", e)
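The Vosk example relies on two details: the extracted model must sit in a `model` folder under the project folder, and, as the test code in section 3 shows, the result is a JSON string whose `text` field holds the transcript. The sketch below covers both points using only the standard library. The helper names `vosk_model_ready` and `extract_vosk_text` are hypothetical, and treating an empty folder as "not ready" is an assumption made for illustration.

```python
import json
from pathlib import Path


def vosk_model_ready(project_dir="."):
    """Return True if a non-empty 'model' folder exists under project_dir."""
    model_dir = Path(project_dir) / "model"
    # An empty folder usually means the archive was never extracted into it.
    return model_dir.is_dir() and any(model_dir.iterdir())


def extract_vosk_text(raw_result):
    """Pull the transcript out of a Vosk JSON result string."""
    try:
        return json.loads(raw_result).get("text", "")
    except (json.JSONDecodeError, TypeError, AttributeError):
        # Not a JSON object (for example, a plain error message): no transcript.
        return ""
```

Calling `vosk_model_ready()` before `r.recognize_vosk(...)` gives a clearer failure message than a missing-model error raised later in the run.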
3. Whisper
- Code
import speech_recognition as sr

r = sr.Recognizer()
# Capture up to 10 seconds of speech from the default microphone
with sr.Microphone() as source:
    print('listening...')
    audio = r.listen(source, timeout=10, phrase_time_limit=10)
    print("......")
try:
    text = r.recognize_whisper(audio, language='ko')
    print(text)
except sr.UnknownValueError:
    print("Recognizer Failed..")
except sr.RequestError as e:
    print("Request Failed...", e)
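The three examples in this section differ only in which `recognize_*` method they call on the recognizer. One way to avoid repeating the code is a small dispatcher that picks the method by engine name. This is a sketch, not part of SpeechRecognition: the function name `recognize_with` and the `ENGINES` tuple are hypothetical, and it assumes the recognizer object exposes `recognize_google`, `recognize_vosk`, and `recognize_whisper` as used above.

```python
# Engine names used in this section; each maps to recognizer.recognize_<name>.
ENGINES = ("google", "vosk", "whisper")


def recognize_with(recognizer, audio, engine, language="ko"):
    """Call recognizer.recognize_<engine>(audio, language=...) and return its result."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    # Look up the method by name, e.g. recognizer.recognize_whisper.
    method = getattr(recognizer, f"recognize_{engine}")
    return method(audio, language=language)
```

With a real `sr.Recognizer()` named `r` and captured `audio`, `recognize_with(r, audio, "whisper")` would make the same call as `r.recognize_whisper(audio, language='ko')`.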
3. Test
- Test file
- 안녕하세요. 이것은 테스트 문장입니다. ("Hello. This is a test sentence.")
- Code
import json

import speech_recognition as sr

r = sr.Recognizer()
# Capture up to 10 seconds of speech from the default microphone
with sr.Microphone() as source:
    print('listening...')
    audio = r.listen(source, timeout=10, phrase_time_limit=10)
    print("......")
try:
    # show_all=True returns the raw response; use the first alternative if there is one
    result_google = r.recognize_google(audio, language='ko', show_all=True)
    if isinstance(result_google, dict) and result_google.get('alternative'):
        text_google = result_google['alternative'][0]['transcript']
    else:
        text_google = ""
    # recognize_vosk returns a JSON string; the transcript is in its 'text' field
    text_vosk = json.loads(r.recognize_vosk(audio, language='ko'))['text']
    text_whisper = r.recognize_whisper(audio, language='ko')
    print("[Google]", text_google)
    print("[Vosk]", text_vosk)
    print("[whisper]", text_whisper)
except sr.UnknownValueError:
    print("Recognizer Failed..")
except sr.RequestError as e:
    print("Request Failed...", e)
- Recognition results
[Google] 안녕하세요 이것은 테스트 문장입니다
[Vosk] 륜 리아의 아이버슨 테스트 문자 입니다
[whisper] 안녕하세요 이것은 테스트 문장입니다
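To put a number on the comparison above, one common option is a character error rate (CER): the edit distance between the reference sentence and each transcript, divided by the reference length. The sketch below is an illustration and not part of the original test. It assumes that whitespace and punctuation should be ignored, and the function names are hypothetical.

```python
import unicodedata


def normalize(text):
    """Apply NFC, then keep only letters and digits (drops spaces and punctuation)."""
    text = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in text if ch.isalnum())


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """Edit distance between normalized strings, divided by reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)


reference = "안녕하세요. 이것은 테스트 문장입니다."
results = {
    "Google": "안녕하세요 이것은 테스트 문장입니다",
    "Vosk": "륜 리아의 아이버슨 테스트 문자 입니다",
    "whisper": "안녕하세요 이것은 테스트 문장입니다",
}
for engine, text in results.items():
    print(f"[{engine}] CER = {char_error_rate(reference, text):.2f}")
```

Under this normalization the Google and whisper transcripts match the reference exactly, so only the Vosk transcript gets a non-zero error rate. A single sentence is too small a sample to rank the engines in general.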