Distinguishing commands from normal speech with SAPI

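I am working on a personal project involving microphones in my apartment that I can issue verbal commands to. To accomplish this, I have been using the Microsoft Speech API, specifically the SpeechRecognitionEngine from System.Speech.Recognition in C#. I construct a grammar as follows: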

// validCommands is a Choices object containing all valid command strings
// recognizer is a SpeechRecognitionEngine
GrammarBuilder builder = new GrammarBuilder(recognitionSystemName);
builder.Append(validCommands);
recognizer.SetInputToDefaultAudioDevice();
recognizer.LoadGrammar(new Grammar(builder));
recognizer.RecognizeAsync(RecognizeMode.Multiple);

// etc ...

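This seems to work quite well in the cases where I actually give it a command. The trouble is that it also triggers on conversation that is not directed at it. I tried to improve this by prefixing the command Choices object with a "name" (recognitionSystemName) that I address the system by. Oddly, that did not help, even though the conversation does not match any of the strings. My best guess is that it assumes all sound is a command and picks the best match from the command set. Any advice on improving this system so that it no longer triggers on conversation not directed at it would be very helpful.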

Edit: I have moved the name recognition into a separate SpeechRecognitionEngine, but the accuracy is poor. Here is some test code I wrote to check the accuracy:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Speech.Recognition;

namespace RecognitionAccuracyTest
{
    class RecognitionAccuracyTest
    {
        static int recogcount;
        [STAThread]
        static void Main()
        {
            recogcount = 0;
            System.Console.WriteLine("Beginning speech recognition accuracy test.");

            SpeechRecognitionEngine recognizer;
            recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
            recognizer.SetInputToDefaultAudioDevice();
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder("Octavian")));
            recognizer.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(recognizer_SpeechHypothesized);
            recognizer.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            while (true) ;
        }

        static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            System.Console.WriteLine("Recognized @ " + e.Result.Confidence);
            try
            {
                if (e.Result.Audio != null)
                {
                    System.IO.FileStream stream = new System.IO.FileStream("audio" + ++recogcount + ".wav", System.IO.FileMode.Create);
                    e.Result.Audio.WriteToWaveStream(stream);
                    stream.Close();
                }
            }
            catch (Exception) { }
        }

        static void recognizer_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            System.Console.WriteLine("Hypothesized @ " + e.Result.Confidence);
        }
    }
}

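With the name "Octavian", it recognizes things like "Octopus", "Octagon", "Volkswagen", and "Wow, really?". I can clearly hear the difference in the associated audio clips. Any ideas for making this less terrible would be great.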

4 Answers

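Let me make sure I understand: you want a phrase that addresses commands to the system, like "Butler" or "Siri". So you would say "Butler, turn on the TV". You can build this into your grammar.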

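Below is an example of a simple grammar that requires an opening phrase before it recognizes a command. It uses semantic results to help you work out what was said. In this case the user must say "Open", "Please open", or "Can you open".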

private Grammar CreateTestGrammar()
    {
        // item
        Choices item = new Choices();
        SemanticResultValue itemSRV;
        itemSRV = new SemanticResultValue("I E", "explorer");
        item.Add(itemSRV);
        itemSRV = new SemanticResultValue("explorer", "explorer");
        item.Add(itemSRV);
        itemSRV = new SemanticResultValue("firefox", "firefox");
        item.Add(itemSRV);
        itemSRV = new SemanticResultValue("mozilla", "firefox");
        item.Add(itemSRV);
        itemSRV = new SemanticResultValue("chrome", "chrome");
        item.Add(itemSRV);
        itemSRV = new SemanticResultValue("google chrome", "chrome");
        item.Add(itemSRV);
        SemanticResultKey itemSemKey = new SemanticResultKey("item", item);

        //build the permutations of choices...
        GrammarBuilder gb = new GrammarBuilder();
        gb.Append(itemSemKey);


        //now build the complete pattern...
        GrammarBuilder itemRequest = new GrammarBuilder();
        // preamble: "Can you open", "Open" or "Please open"
        itemRequest.Append(new Choices("Can you open", "Open", "Please open"));

        itemRequest.Append(gb);

        Grammar TestGrammar = new Grammar(itemRequest);
        return TestGrammar;
    }

You can then process the speech with something like:

RecognitionResult result = myRecognizer.Recognize();

and check the semantic results like this:

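// Recognize() returns null when nothing was recognized, so check for that first
if (result != null && result.Semantics.ContainsKey("item"))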
{
   string s = (string)result.Semantics["item"].Value;
}

Answered 2024-05-03T14:55:07+08:00

0

I have the same problem. I am using the Microsoft Speech Platform, so accuracy and other details may differ.

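I am using "Claire" as the wake-up command, but the engine does recognize other words as "Claire". The problem is that the engine hears you speaking and searches for the closest match.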

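I have not found a really good solution. You could try filtering the recognized speech using the Confidence field, but with the recognizer engine I chose it is not very reliable. What I did was put every word I want to recognize into one big SRGS.xml and set the repeat value to 0-. I only accept a recognized sentence when "Claire" is its first word. This is not the solution I wanted, because it does not work as well as I had hoped, but it is still something of an improvement.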
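The two checks described above (a confidence cut-off and "the wake word must come first") can be sketched as a small helper. This is only an illustration under assumptions: `WakeWordFilter`, `ShouldAccept`, and the `minConfidence` threshold are hypothetical names and values that do not come from the answer, and the helper does not touch the speech engine at all.

```csharp
using System;

// Hypothetical helper, not part of System.Speech: decides whether a
// recognition result should be treated as a command for the system.
public static class WakeWordFilter
{
    // minConfidence is an assumed threshold; tune it against your own recordings.
    public static bool ShouldAccept(string text, float confidence,
                                    string wakeWord, float minConfidence)
    {
        // Reject empty results and anything below the confidence cut-off.
        if (string.IsNullOrWhiteSpace(text) || confidence < minConfidence)
        {
            return false;
        }

        // Take the first whitespace-separated word and strip trailing punctuation.
        string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        string first = words[0].TrimEnd(',', '.', '!', '?');

        // Accept only when the sentence starts with the wake word.
        return string.Equals(first, wakeWord, StringComparison.OrdinalIgnoreCase);
    }
}
```

In a SpeechRecognized handler this would be called with `e.Result.Text` and `e.Result.Confidence`. Since the answer notes that confidence was not very reliable with that engine, the threshold should be treated as something to measure rather than a fixed value.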

I am busy working on this at the moment, and I will post more information as I make progress.

Edit 1: In reply to what Dims said: in an SRGS grammar you can add a "Garbage" rule. You may want to look into that. http://www.w3.org/TR/speech-grammar/
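For reference, a garbage rule can also be expressed from C# through the System.Speech.Recognition.SrgsGrammar classes instead of a hand-written XML file. The following is a minimal sketch, assuming a grammar of "wake word followed by anything"; the class name, the rule id `wakeThenAnything`, and that grammar shape are illustrative assumptions, and the thread does not confirm how well a given engine honors the GARBAGE rule.

```csharp
using System.Speech.Recognition;
using System.Speech.Recognition.SrgsGrammar;

// Hypothetical sketch: a rule that matches the wake word followed by
// the special SRGS GARBAGE rule, which is meant to absorb arbitrary speech.
public static class GarbageGrammarSketch
{
    public static SrgsDocument BuildDocument(string wakeWord)
    {
        SrgsRule root = new SrgsRule("wakeThenAnything");
        root.Scope = SrgsRuleScope.Public;
        root.Add(new SrgsItem(wakeWord));
        root.Add(SrgsRuleRef.Garbage); // corresponds to <ruleref special="GARBAGE"/>

        SrgsDocument document = new SrgsDocument();
        document.Rules.Add(root);
        document.Root = root;
        return document;
    }

    public static Grammar BuildGrammar(string wakeWord)
    {
        // The result can be passed to SpeechRecognitionEngine.LoadGrammar.
        return new Grammar(BuildDocument(wakeWord));
    }
}
```

System.Speech is Windows-only, so this compiles only with a reference to that assembly. A sentence matched this way still says nothing about which command followed the wake word, so it would be combined with a command grammar or a second recognition pass.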

Answered 2024-05-03T14:55:07+08:00
0

In principle, you need to update your grammar or dictionary to include an "empty" or "anything" entry.

Answered 2024-05-03T14:55:07+08:00
2

Is it possible that you need to call UnloadAllGrammars() before creating and loading the grammar you want to use?

Answered 2024-05-03T14:55:07+08:00
