/****************************************************************************
**
** Copyright (C) 2015 The Qt Company Ltd.
** Contact: http://www.qt.io/licensing/
**
** This file is part of the documentation of the Qt Toolkit.
**
** $QT_BEGIN_LICENSE:FDL$
** Commercial License Usage
** Licensees holding valid commercial Qt licenses may use this file in
** accordance with the commercial license agreement provided with the
** Software or, alternatively, in accordance with the terms contained in
** a written agreement between you and The Qt Company. For licensing terms
** and conditions see http://www.qt.io/terms-conditions. For further
** information use the contact form at http://www.qt.io/contact-us.
**
** GNU Free Documentation License Usage
** Alternatively, this file may be used under the terms of the GNU Free
** Documentation License version 1.3 as published by the Free Software
** Foundation and appearing in the file included in the packaging of
** this file. Please review the following information to ensure
** the GNU Free Documentation License version 1.3 requirements
** will be met: http://www.gnu.org/copyleft/fdl.html.
** $QT_END_LICENSE$
**
****************************************************************************/

/*!
   \page qspeechrecognition-pocketsphinx.html
   \title PocketSphinx Speech Recognition Plugin
   \brief Speech recognition plug-in that uses the PocketSphinx engine.

   \keyword PocketSphinx

   The engine provider name for this plug-in is "pocketsphinx".

   The plug-in supports only JSGF-format grammars. Grammars can be loaded from Qt resources.

  A directory containing PocketSphinx acoustic model files needs to be installed in the
  locale-specific sub-directory under the configured engine resource directory. The name
  of the acoustic model directory should be one of the following:
  \list 1
    \li \c acmodel_SAMPLERATE
        where \c SAMPLERATE is the configured audio sample rate, for example \c acmodel_16000.
        Use this format if multiple sample rates need to be supported.
    \li \c acmodel
  \endlist
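
  The naming rule above can be sketched in plain C++17. This is only an illustration,
  not the plug-in's implementation: the helper name \c findAcousticModelDir is hypothetical,
  and preferring the sample-rate-specific directory over the plain \c acmodel directory
  is an assumption.

```cpp
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper, not part of the plug-in API.
// Looks for an acoustic model directory under <resourceDir>/<localeDir>.
// Assumption: "acmodel_<sampleRate>" takes precedence over "acmodel".
std::optional<fs::path> findAcousticModelDir(const fs::path &resourceDir,
                                             const std::string &localeDir,
                                             int sampleRate)
{
    const fs::path base = resourceDir / localeDir;
    const fs::path candidates[] = {
        base / ("acmodel_" + std::to_string(sampleRate)),
        base / "acmodel",
    };
    for (const fs::path &candidate : candidates) {
        if (fs::is_directory(candidate))
            return candidate;
    }
    // Neither directory exists for this locale.
    return std::nullopt;
}
```

  With a sample rate of 16000 and both directories present under \c en, the sketch
  returns the \c acmodel_16000 path. With a sample rate of 8000 and no \c acmodel_8000
  directory, it falls back to \c acmodel.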

  The engine continuously adapts to certain audio path features and stores the
  adaptation state in the configured data directory. Separate adaptation states are
  stored for engines with different names (given in QSpeechRecognition::createEngine()).

  When an engine is created, the adaptation state is automatically restored from the
  file that was previously created. If the file does not exist, a default initial
  state is used. In this case, it may take a few utterances until the speech
  recognition starts returning good results.

  \section1 Supported Parameters

  The following table lists the supported engine parameters.
  See QSpeechRecognitionEngine::supportedParameters() for generic details of the
  parameters. The minimum required parameters are \l {QSpeechRecognitionEngine::}{ResourceDirectory}
  and \l {QSpeechRecognitionEngine::}{Locale}.
  The \l {QSpeechRecognitionEngine::}{Dictionary} parameter is required only if the default
  dictionary name is not used.

  \table
  \header
    \li Key
    \li Value type
    \li Description
  \row
    \li \l {QSpeechRecognitionEngine::}{Locale}
    \li QLocale
    \li
  \row
    \li \l {QSpeechRecognitionEngine::}{Dictionary}
    \li QUrl
    \li PocketSphinx (CMU) format dictionary file.
        If not given, the file "lexicon.dict" in the locale-specific resource directory is used.
        Loading the dictionary from Qt resources is not supported.
  \row
    \li \l {QSpeechRecognitionEngine::}{ResourceDirectory}
    \li QString
    \li
  \row
    \li \l {QSpeechRecognitionEngine::}{DataDirectory}
    \li QString
    \li
  \row
    \li \l {QSpeechRecognitionEngine::}{DebugAudioDirectory}
    \li QString
    \li
  \row
    \li \l {QSpeechRecognitionEngine::}{AudioSampleRate}
    \li int
    \li
  \row
    \li \l {QSpeechRecognitionEngine::}{AudioInputFile}
    \li QString
    \li
  \row
    \li \l {QSpeechRecognitionEngine::}{AudioInputDevices}
    \li QStringList
    \li
  \row
    \li \l {QSpeechRecognitionEngine::}{AudioInputDevice}
    \li QString
    \li
  \endtable

  \section1 Quick Start Guide

  The following instructions are for setting up the PocketSphinx engine for US English.

  \section2 1. Make sure you have the PocketSphinx plug-in for QtSpeechRecognition

  The directory plugins/speechrecognition in your Qt installation directory should
  contain a library file with "pocketsphinx" in its name. If you compiled Qt from
  source, you may need to compile the PocketSphinx libraries manually by following
  the instructions in the QtSpeech source directory.

  \section2 2. Set up a resource directory for PocketSphinx

  Create a directory into which PocketSphinx resources can be copied. The path to this directory is
  passed to QSpeechRecognition::createEngine() in the \l {QSpeechRecognitionEngine::}{ResourceDirectory} parameter.

  Under this directory, create the subdirectory "en" for US English resources.

  \section2 3. Download US English acoustic model

  Download one of the \l {http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/}
  {CMU Sphinx US English Acoustic Models} and extract it in the US English resource
  directory created above. Rename the extracted directory to "acmodel".
  After this step, the directory \e{ResourceDirectory}/en/acmodel should contain all the individual
  acoustic model files, such as "mdef".

  The default audio sampling rate supported by the models is 16 kHz. Use the 8 kHz models only if
  no higher sampling rate is available in the audio input. In that case, the audio sampling rate
  must also be changed in the engine parameters.

  At the time of writing, the most accurate acoustic model is \l {http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/cmusphinx-en-us-5.2.tar.gz}
  {the continuous model version 5.2}. PTM models are smaller but slightly less accurate.

  \section2 4. Download CMU Pronouncing Dictionary

  Download the \l {http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/sphinxdict/cmudict_SPHINX_40}
  {Sphinx-compatible version of CMUdict} and copy it to the US English
  resource directory created above. Rename the file to "lexicon.dict".
  After this step, the file \e{ResourceDirectory}/en/lexicon.dict should contain close to 130k
  lines, with each line containing a pronunciation rule for one English word.

  \section2 5. Configure the engine

  For US English speech recognition with 16 kHz audio input, pass at least the following
  parameters to QSpeechRecognition::createEngine():

  \table
  \header
    \li Key
    \li Value type
    \li Value
  \row
    \li QSpeechRecognitionEngine::Locale
    \li QLocale
    \li \c QLocale("en_US")
  \row
    \li QSpeechRecognitionEngine::ResourceDirectory
    \li QString
    \li Path to the PocketSphinx resource directory created in step 2 above.
  \endtable
*/