Why am I losing so much text with the Google Speech API?

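I have spent a day working out the best practices for using the Google Speech API.

This is my latest attempt. I use an online resource here so that we are all working with the same audio. The other requirement is ffmpeg, which converts the mp3 into a format the Google API accepts.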

Audio information:

Singer: Adele
Song: Chasing Pavements
Possible language: en-GB (Adele's origin) or en-US
Sample rate: 44100 Hz
Channels: stereo (2 channels)
Format: mp3

What I did:

Used flac or wav as the format
Used the original sample rate (44100) or 16000
Always used mono (1 channel)
Used both the en-GB and en-US languages
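The format and sample-rate combinations listed above can be generated from one place. The following is only a sketch: `buildFfmpegCommand` is a hypothetical helper name, the output file names are assumptions, and the ffmpeg flags are the same ones used in the transcrib.php script in this post. Wrapping the file names in `escapeshellarg` is an addition that keeps unusual file names from breaking the shell command.

```php
<?php
/**
 * Build the ffmpeg command for one format / sample-rate combination.
 * Hypothetical helper for illustration; output is always mono (-ac 1).
 */
function buildFfmpegCommand(string $input, string $type, int $sampleRate): array
{
    if ($type === 'wav') {
        $output = 'test.wav';          // assumed output name
        $encoding = 'LINEAR16';        // 16-bit PCM
        $codec = '-acodec pcm_s16le';
    } elseif ($type === 'flac') {
        $output = 'test.flac';         // assumed output name
        $encoding = 'FLAC';
        $codec = '-c:a flac';
    } else {
        throw new InvalidArgumentException("Unsupported type: $type");
    }

    // Quote the file names so spaces or special characters are safe.
    $command = sprintf(
        'ffmpeg -i %s %s -ar %d -ac 1 %s -y',
        escapeshellarg($input),
        $codec,
        $sampleRate,
        escapeshellarg($output)
    );

    return ['command' => $command, 'encoding' => $encoding, 'output' => $output];
}
```

Calling `buildFfmpegCommand('test.mp3', 'flac', 44100)` returns the command string together with the matching `encoding` value, so the conversion settings and the recognition options cannot drift apart.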

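Output I want: word-level text alignment. That is a secondary goal, though, because right now I am focused on why so much of the transcript is missing.

Note: run it from bash / cmd.

Script: basic synchronous transcrib.php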

<?php
set_time_limit(300); //5min
//google speech php library
require __DIR__ . '/vendor/autoload.php';

# Imports the Google Cloud client library
use Google\Cloud\Speech\SpeechClient;
//use Google\Cloud\Storage\StorageClient;
use Google\Cloud\Core\ExponentialBackoff;


//json credential path
$google_json_credential = 'cloud-f7cd1957f36a.json';
putenv("GOOGLE_APPLICATION_CREDENTIALS=$google_json_credential"); 
# Your Google Cloud Platform project ID
$projectId = 'cloud-178108';
//$languageCode = 'en-US'; // not good (too many words missing)
$languageCode = 'en-GB'; // Adele's country

$oldFile = "test.mp3";
//flac or wav??
$typeFile = 'wav';
$sampleRate = 16000;

if($typeFile === 'wav'){ // compare, do not assign: "=" always selected the wav branch
    $newFile = "test.wav";
    $encoding='LINEAR16';
    $ffmpeg_command = "ffmpeg -i $oldFile -acodec pcm_s16le -ar $sampleRate -ac 1 $newFile -y";
}else{
    $newFile = "test.flac";
    $encoding='FLAC';
    $ffmpeg_command = "ffmpeg -i $oldFile -c:a flac -ar $sampleRate -ac 1 $newFile -y";
}

//download file
//original audio info: adele - chasing pavements, stereo (2 channel) 44100Hz mp3
$rawFile = file_get_contents("http://www.karaokebuilder.com/pix/toolkit/sam01.mp3");
//save file
file_put_contents($oldFile, $rawFile);

//convert to google cloud format using ffmpeg
shell_exec($ffmpeg_command);

# The audio file's encoding and sample rate
$options = [
    'encoding' => $encoding,
    'sampleRateHertz' => $sampleRate,
    'enableWordTimeOffsets' => true,
];

// Create the speech client
$speech = new SpeechClient([
    'projectId' => $projectId,
    'languageCode' => $languageCode,
]);

// Make the API call
$results = $speech->recognize(
    fopen($newFile, 'r'),
    $options
);

// Print the results
foreach ($results as $result) {
    $alternative = $result->alternatives()[0];
    printf('Transcript: %s' . PHP_EOL, $alternative['transcript']);
    print_r($result->alternatives());
}

Result:

en-US:

wav: even if it leads nowhere [confidence: 0.86799717]
flac: even if it leads nowhere [confidence: 0.92401636]

en-GB:

wav: happy birthday balloons delivered Leeds Norway [confidence: 0.4939031] 
flac: happy birthday balloons delivered Leeds Norway [confidence: 0.5762244]

expected:

Should I give up
Or should I just keep chasing pavements?
Even if it leads nowhere
Or would it be a waste?
Even If I knew my place should I leave it there?
Should I give up
Or should I just keep chasing pavements?
Even if it leads nowhere

If you compare the result with the expected text, you can see that not only is a lot of text missing, but some of the words are also wrong.
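To put a number on how much text is missing, the transcript can be compared with the expected lyrics word by word. This is a rough sketch only: `wordCoverage` is a hypothetical helper, it ignores word order, and it treats any run of letters or apostrophes as a word.

```php
<?php
/**
 * Fraction of the expected words that also appear in the transcript.
 * Hypothetical helper for illustration; not part of the Speech client library.
 */
function wordCoverage(string $expected, string $transcript): float
{
    // Lowercase the text and keep only runs of letters/apostrophes as words.
    $tokenize = function (string $text): array {
        preg_match_all("/[a-z']+/", strtolower($text), $matches);
        return $matches[0];
    };

    $expectedCounts = array_count_values($tokenize($expected));
    $transcriptCounts = array_count_values($tokenize($transcript));

    $total = array_sum($expectedCounts);
    if ($total === 0) {
        return 0.0;
    }

    $matched = 0;
    foreach ($expectedCounts as $word => $count) {
        // Credit a word at most as often as it occurs in the transcript.
        $matched += min($count, $transcriptCounts[$word] ?? 0);
    }

    return $matched / $total;
}
```

Passing the expected lyrics and the `transcript` string of an alternative gives a value between 0 and 1, which makes it easier to compare the en-US and en-GB runs, or the wav and flac runs, than reading the output by eye.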

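To be honest, I don't know whether the machine (Google Cloud) can hear my converted audio clearly, but I tried to send the best converted audio I could.

Am I missing something in my script? Or am I converting the audio incorrectly?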
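One way to rule out the conversion step is to read the header of the converted WAV file and confirm that it really is 16-bit mono PCM at the sample rate sent in `sampleRateHertz`. The sketch below assumes a standard RIFF/WAVE layout; `readWavFormat` is a hypothetical helper, not part of any Google library.

```php
<?php
/**
 * Parse the "fmt " chunk from the leading bytes of a RIFF/WAVE file.
 * Returns null when the bytes do not look like a WAV header.
 * Hypothetical helper for illustration.
 */
function readWavFormat(string $bytes): ?array
{
    // A WAV file starts with "RIFF", a 4-byte size, then "WAVE".
    if (strlen($bytes) < 12
        || substr($bytes, 0, 4) !== 'RIFF'
        || substr($bytes, 8, 4) !== 'WAVE') {
        return null;
    }

    $offset = 12;
    // Walk the chunk list until the "fmt " chunk is found.
    while ($offset + 8 <= strlen($bytes)) {
        $chunkId = substr($bytes, $offset, 4);
        $chunkSize = unpack('V', substr($bytes, $offset + 4, 4))[1];

        if ($chunkId === 'fmt ') {
            if ($offset + 8 + 16 > strlen($bytes)) {
                return null; // header is truncated
            }
            // Little-endian fields of the PCM format block.
            return unpack(
                'vformat/vchannels/VsampleRate/VbyteRate/vblockAlign/vbitsPerSample',
                substr($bytes, $offset + 8, 16)
            );
        }

        // Chunks are padded to an even number of bytes.
        $offset += 8 + $chunkSize + ($chunkSize % 2);
    }

    return null;
}

// Example use (assumes test.wav exists after the ffmpeg step):
// $info = readWavFormat(file_get_contents('test.wav', false, null, 0, 4096));
// For the settings in this post one would expect format 1 (PCM),
// channels 1, sampleRate 16000 and bitsPerSample 16.
```

If the values do not match the options passed to `recognize()`, the mismatch is in the conversion; if they do match, the conversion is unlikely to be the cause of the missing text.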

1 Answer

2

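I checked your script, and the code appears to follow the documented usage: https://cloud.google.com/speech/docs/reference/libraries#using_the_client_library .

Also, the fact that a few words were picked up shows that the Google Cloud Speech API can read the converted audio. Although the Speech API can handle noisy audio and recognizes more than 110 languages and variants, I think the problem with music files has to do with how the speech recognizer works. I suggest testing with a simple (non-music) audio file.

Answered 2024-04-26T16:36:41+08:00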
