How can I transcribe large audio files asynchronously with the Google Speech API without hitting the error "Operation not complete and retry limit reached."?

Possible solution

If the operation has not completed, you can poll the endpoint by repeatedly making the GET request until the done property of the response is true.

Is this feasible in Python? Or should I split the file into smaller files and retry?
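The polling idea quoted above could be sketched in Python roughly as follows. This is a minimal sketch, not the library's own mechanism: it assumes only that the operation object exposes a `complete` attribute and a `poll()` method, as the one in my script further down does. The helper name `wait_for_operation` and its parameters `timeout_s` and `interval_s` are hypothetical.

```python
import time


def wait_for_operation(operation, timeout_s=3600.0, interval_s=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll `operation` until it completes or `timeout_s` seconds elapse.

    Assumption: `operation` has a boolean `complete` attribute and a
    `poll()` method that refreshes it. Returns True if the operation
    completed, False if the deadline passed first.
    """
    deadline = clock() + timeout_s
    while not operation.complete:
        if clock() >= deadline:
            return False
        sleep(interval_s)
        operation.poll()  # only called while the operation is incomplete
    return True
```

Bounding the wait by elapsed time instead of a fixed retry count makes it easier to scale the limit to the length of the audio.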

Known issues with the Speech API

  • Encoding.

WHAT I HAVE TRIED SO FAR


Encoding command

ffmpeg -i 2017-06-13-17_48_51.flac -ac 1 mono.flac

Why ffmpeg over sox?

I chose ffmpeg because I got this error when using sox:

sox 2017-06-13-17_48_51.flac --channels=1 --bits=16 2017-06-13-17_48_51_more_stable.flac

sox WARN dither: dither clipped 55 samples; decrease volume?

Input audio file

Input File     : '2017-06-13-17_48_51.flac'
Channels       : 2
Sample Rate    : 48000
Precision      : 16-bit
Duration       : 00:21:18.40 = 61363200 samples ~ 95880 CDDA sectors
File Size      : 60.7M
Bit Rate       : 380k
Sample Encoding: 16-bit FLAC

After running this command

ffmpeg -i 2017-06-13-17_48_51.flac -ac 1 mono.flac

Output audio file

Input File     : 'mono.flac'
Channels       : 1
Sample Rate    : 48000
Precision      : 16-bit
Duration       : 00:21:18.40 = 61363200 samples ~ 95880 CDDA sectors
File Size      : 59.9M
Bit Rate       : 375k
Sample Encoding: 16-bit FLAC
Comment        : 'encoder=Lavf56.40.101'
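If splitting turns out to be the way to go, the chunk boundaries could be computed from the duration reported above (61363200 samples at 48000 Hz is 1278.4 seconds). The sketch below only builds ffmpeg command lines using the standard `-ss` (start) and `-t` (duration) options; the function name `ffmpeg_split_commands`, the 300-second chunk length, and the `chunk_NNN.flac` naming are assumptions made for illustration.

```python
def ffmpeg_split_commands(src, total_seconds, chunk_seconds=300.0):
    """Build one ffmpeg command per chunk of at most `chunk_seconds`."""
    commands = []
    start = 0.0
    index = 0
    while start < total_seconds:
        length = min(chunk_seconds, total_seconds - start)
        out = "chunk_{:03d}.flac".format(index)  # hypothetical naming scheme
        commands.append([
            "ffmpeg", "-i", src,
            "-ss", "{:.2f}".format(start),   # chunk start, in seconds
            "-t", "{:.2f}".format(length),   # chunk length, in seconds
            "-ac", "1",                      # mono, as in the command above
            out,
        ])
        start += chunk_seconds
        index += 1
    return commands


# Duration taken from the file info above: samples / sample rate.
duration_seconds = 61363200 / 48000
commands = ffmpeg_split_commands("mono.flac", duration_seconds)
```

Each entry in `commands` could then be passed to `subprocess.run(cmd, check=True)`. Fixed boundaries may cut a word in half, so overlapping chunks or cutting at silences might work better in practice.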

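Python file

Google Speech API Asynchronous Ex. w/ Explicit Credentials
I changed the FLAC hertz to "48000" and entered an explicit environment path.

import argparse
import io
import time
import os

# Point the client library at the service account key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "cloud_speech_service_keys.json"
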
def transcribe_file(speech_file):
    """Transcribe the given audio file asynchronously."""
    from google.cloud import speech
    speech_client = speech.Client()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()
        audio_sample = speech_client.sample(
            content,
            source_uri=None,
            encoding='LINEAR16',
            sample_rate_hertz=16000)

    operation = audio_sample.long_running_recognize('en-US')

    # Poll every 2 seconds, at most 100 times.
    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()

    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return

    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))
        print('Confidence: {}'.format(alternative.confidence))
    # [END send_request]


def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    speech_client = speech.Client()

    audio_sample = speech_client.sample(
        content=None,
        source_uri=gcs_uri,
        encoding='FLAC',
        sample_rate_hertz=48000)

    operation = audio_sample.long_running_recognize('en-US')

    # Poll every 2 seconds, at most 100 times.
    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()

    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return

    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))
        print('Confidence: {}'.format(alternative.confidence))
    # [END send_request_gcs]


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument(
        'path', help='File or GCS path for audio file to be recognized')
    args = parser.parse_args()
    # Paths starting with gs:// are treated as Google Cloud Storage URIs.
    if args.path.startswith('gs://'):
        transcribe_gcs(args.path)
    else:
        transcribe_file(args.path)
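
For reference, the polling loop in the script gives up after a fixed time that is easy to work out and compare with the length of the audio. The short calculation below uses only numbers from this post; the helper names are hypothetical, and how long the service actually needs for a file of this length is not something I know.

```python
import math


def polling_budget_seconds(retries, sleep_seconds):
    """Total time the retry loop waits before giving up."""
    return retries * sleep_seconds


def retries_needed(target_seconds, sleep_seconds):
    """Retries required to keep polling for at least `target_seconds`."""
    return math.ceil(target_seconds / sleep_seconds)


budget = polling_budget_seconds(100, 2)       # values from the script
audio_seconds = 61363200 / 48000              # values from the file info
needed = retries_needed(audio_seconds, 2)     # to wait as long as the audio lasts

print("polling budget: {} s".format(budget))
print("audio length:   {:.1f} s".format(audio_seconds))
print("retries to cover the audio length: {}".format(needed))
```

So the loop waits about 200 seconds in total for a recording of roughly 1278 seconds, which may simply be too short.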