Using phrases instead of sentences to record data is a better method of documenting data.
Skip to main content This browser is no longer supported. Show
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support. Training and testing datasets
In this articleIn a Custom Speech project, you can upload datasets for training, qualitative inspection, and quantitative measurement. This article covers the types of training and testing data that you can use for Custom Speech. Text and audio that you use to test and train a custom model should include samples from a diverse set of speakers and scenarios that you want your model to recognize. Consider these factors when you're gathering data for custom model testing and training:
Data typesThe following table lists accepted data types, when each data type should be used, and the recommended quantity. Not every data type is required to create a model. Data requirements will vary depending on whether you're creating a test or training a model.
Training with plain text or structured text usually finishes within a few minutes. Tip Start with plain-text data or structured-text data. This data will improve the recognition of special terms and phrases. Training with text is much faster than training with audio (minutes versus days). Start with small sets of sample data that match the language, acoustics, and hardware where your model will be used. Small datasets of representative data can expose problems before you invest in gathering larger datasets for training. For sample Custom Speech data, see this GitHub repository. If you will train a custom model with audio data, choose a Speech resource region with dedicated hardware for training audio data. See footnotes in the regions table for more information. In regions with dedicated hardware for Custom Speech training, the Speech service will use up to 20 hours of your audio training data, and can process about 10 hours of data per day. In other regions, the Speech service uses up to 8 hours of your audio data, and can process about 1 hour of data per day. After the model is trained, you can copy the model to another region as needed with the CopyModelToSubscriptionToSubscription REST API. Consider datasets by scenarioA model that's trained on a subset of scenarios can perform well in only those scenarios. Carefully choose data that represents the full scope of scenarios that you need your custom model to recognize. The following table shows datasets to consider for some speech recognition scenarios:
To help determine which dataset to use to address your problems, refer to the following table:
Audio + human-labeled transcript data for training or testingYou can use audio + human-labeled transcript data for both training and testing purposes. You must provide human-labeled transcriptions (word by word) for comparison:
For a list of base models that support training with audio data, see Language support. Even if a base model does support training with audio data, the service might use only part of the audio. And it will still use all the transcripts. Important If a base model doesn't support customization with audio data, only the transcription text will be used for training. If you switch to a base model that supports customization with audio data, the training time may increase from several hours to several days. The change in training time would be most noticeable when you switch to a base model in a region without dedicated hardware for training. If the audio data is not required, you should remove it to decrease the training time. Audio with human-labeled transcripts offers the greatest accuracy improvements if the audio comes from the target use case. Samples must cover the full scope of speech. For example, a call center for a retail store would get the most calls about swimwear and sunglasses during summer months. Ensure that your sample includes the full scope of speech that you want to detect. Consider these details:
A large training dataset is required to improve recognition. Generally, we recommend that you provide word-by-word transcriptions for 1 to 20 hours of audio. However, even as little as 30 minutes can help improve recognition results. Although creating human-labeled transcription can take time, improvements in recognition will only be as good as the data that you provide. You should upload only high-quality transcripts. Audio files can have silence at the beginning and end of the recording. If possible, include at least a half-second of silence before and after speech in each sample file. Although audio with low recording volume or disruptive background noise is not helpful, it shouldn't limit or degrade your custom model. Always consider upgrading your microphones and signal processing hardware before gathering audio samples. Custom Speech projects require audio files with these properties:
Plain-text data for trainingYou can add plain text sentences of related text to improve the recognition of domain-specific words and phrases. Related text sentences can reduce substitution errors related to misrecognition of common words and domain-specific words by showing them in context. Domain-specific words can be uncommon or made-up words, but their pronunciation must be straightforward to be recognized. Provide domain-related sentences in a single text file. Use text data that's close to the expected spoken utterances. Utterances don't need to be complete or grammatically correct, but they must accurately reflect the spoken input that you expect the model to recognize. When possible, try to have one sentence or keyword controlled on a separate line. To increase the weight of a term such as product names, add several sentences that include the term. But don't copy too much - it could affect the overall recognition rate. Note Avoid related text sentences that include noise such as unrecognizable characters or words. Use this table to ensure that your plain text dataset file is formatted correctly:
You must also adhere to the following restrictions:
Structured-text data for trainingNote Structured-text data for training is in public preview. Use structured text data when your data follows a particular pattern in particular utterances that differ only by words or phrases from a list. To simplify the creation of training data and to enable better modeling inside the Custom Language model, you can use a structured text in Markdown format to define lists of items and phonetic pronunciation of words. You can then reference these lists inside your training utterances. Expected utterances often follow a certain pattern. One common pattern is that utterances differ only by words or phrases from a list. Examples of this pattern could be:
For a list of supported base models and locales for training with structured text, see Language support. You must use the latest base model for these locales. For locales that don't support training with structured text, the service will take any training sentences that don't reference any classes as part of training with plain-text data. The structured-text file should have an .md extension. The maximum file size is 200 MB, and the text encoding must be UTF-8 BOM. The syntax of the Markdown is the same as that from the Language Understanding models, in particular list entities and example utterances. For more information about the complete Markdown syntax, see the Language Understanding Markdown. Here are key details about the supported Markdown format:
Here's an example structured text file:
Pronunciation data for trainingSpecialized or made up words might have unique pronunciations. These words can be recognized if they can be broken down into smaller words to pronounce them. For example, to recognize "Xbox", pronounce it as "X box". This approach won't increase overall accuracy, but can improve recognition of this and other keywords. You can provide a custom pronunciation file to improve recognition. Don't use custom pronunciation files to alter the pronunciation of common words. For a list of languages that support custom pronunciation, see language support. Note You can use a pronunciation file alongside any other training dataset except structured text training data. To use pronunciation data with structured text, it must be within a structured text file. The spoken form is the phonetic sequence spelled out. It can be composed of letters, words, syllables, or a combination of all three. This table includes some examples:
You provide pronunciations in a single text file. Include the spoken utterance and a custom pronunciation for each. Each row in the file should begin with the recognized form, then a tab character, and then the space-delimited phonetic sequence.
Refer to the following table to ensure that your pronunciation dataset files are valid and correctly formatted.
Audio data for training or testingAudio data is optimal for testing the accuracy of Microsoft's baseline speech-to-text model or a custom model. Keep in mind that audio data is used to inspect the accuracy of speech with regard to a specific model's performance. If you want to quantify the accuracy of a model, use audio + human-labeled transcripts. Note Audio only data for training is available in preview for the Custom Speech projects require audio files with these properties:
Note When you're uploading training and testing data, the .zip file size can't exceed 2 GB. If you require more data for training, divide it into several .zip files and upload them separately. Later, you can choose to train from multiple datasets. However, you can test from only a single dataset. Use SoX to verify audio properties or convert existing audio to the appropriate formats. Here are some example SoX commands:
Next steps
FeedbackSubmit and view feedback for What are the way for validating and documenting findings?Terms in this set (15). recheck your data via repeat assessment.. clarify data with client by asking addtl questions.. verify data with another health care professional.. compare objective findings with subjective findings to uncover discrepancies.. What are the ways to validate data quizlet?What are the methods of validation? Repeat assessment. Clarify data with client. Verify with another health care professional.
How do nurses validate data?Validating data is done by comparing subjective and objective data, clarifying ambiguous statements, making sure that the data consist of what the clients says, and by using references, such as textbooks, journals, and research reports.
Which of the following may be used to document unusual events Responses significant observations or interactions?Chs. 1, 2, 3, 4, 5, 10. |