ISO IEC 19794-13 pdf download

Information technology — Biometric data interchange formats
1 Scope
This document specifies a data interchange format that can be used for storing, recording and transmitting digitized acoustic human voice data (speech) assumed to be from a single speaker recorded in a single session. The format is designed specifically to support a wide variety of Speaker Identification and Verification (SIV) applications, both text-dependent and text-independent, with minimal assumptions about the voice data capture conditions or the collection environment. Other uses for the data encapsulated in this format, such as automated speech recognition (ASR), may be possible but are not addressed in this document.

This document does not address the handling of data that has been processed to the feature or voice-model level, nor does it address any application-specific requirements, equipment or features. It supports the optional inclusion of non-standardized extended data, and it allows both the originally captured data and digitally processed (enhanced) voice data to be exchanged. A description of any processing applied to the original source input is intended to be included in the metadata associated with the voice representations (VRs).

This document does not address data streaming. Provisions that stored and transmitted biometric data be time-stamped, and that cryptographic techniques be used to protect their authenticity, integrity and confidentiality, are out of the scope of this document. Information formatted in accordance with this document can be recorded on machine-readable media or transmitted by data communication between systems.

A general content-oriented subclause describing the voice data interchange format is followed by a subclause addressing an XML schema definition. This document uses vocabulary in common use in the speech and speaker recognition community, as well as terminology from other ISO standards.
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 19794-1 and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp

3.1
analog-to-digital converter (ADC) resolution
exponent of the base-2 representation (i.e. the number of bits) of the number of discrete amplitudes that the analog-to-digital converter is capable of producing
Note 1 to entry: Common ADC resolutions for sound cards are 8, 16, 20 and 24 bits.

3.2
audio duration
duration of the complete audio containing all voice representation utterances, e.g. a whole call recording

3.3
audio encoding
encoding used by the data capture subsystem, e.g. a cellphone
Note 1 to entry: The voice signal is encoded before being transmitted over a channel. Many formats are in use today, and the set is likely to keep changing as telephones and transmission channels evolve. Formats include PCM (ITU-T G.711) and ADPCM (ITU-T G.726) for waveform encoding, and ACELP (ITU-T G.723.1) and CS-ACELP (ITU-T G.729 Annex A) for analysis-by-synthesis (AbS) encoding. A-law PCM and mu-law PCM are both included in ITU-T G.711.
Note 2 to entry: A comprehensive overview list is provided in 7.4.3.2.

3.4
compression
process that reduces the size of a digital file and, accordingly, the data rate required for transmission
Note 1 to entry: Some audio encodings include compression and some do not. Compression is almost always "lossy" and therefore has an impact on the speech signal.

3.5
cut-off frequency (lower/upper)
frequency below (lower) or above (upper) which the acoustic energy drops 3 dB below the average energy in the pass band

3.6
far-field
region far enough from the source that the angular field distribution is independent of the distance from the source

3.7
interactive voice response
IVR
predicate title for a telephony-based computer that is used to control the flow of telephone calls and to provide voice-based self-service
Note 1 to entry: IVR is the technology that allows a computer to detect voice and keypad inputs.
Note 2 to entry: IVR systems deal with several real-world and constrained-content effects, such as emotional voices, varying environmental noise and recordings of free speech, but also hotwords (e.g. yes, no, digits, keywords).
Note 3 to entry: IVRs apply ASR for user navigation; in secure applications, such as financial transactions by telephone, SIV also becomes relevant. IVR systems may combine ASR and SIV to detect replayed audio samples and to verify user liveness by presenting the user with information generated at the time of the call, which the user is then asked to speak.
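The arithmetic behind ADC resolution (3.1) and the data-rate reduction described under compression (3.4) can be sketched in a few lines of Python. This is an illustrative sketch, not part of the standard. The function names are hypothetical. The dynamic-range figure is the ideal ratio between full scale and one quantization step, 20·log10(2^n), or about 6.02 dB per bit. The codec rates are the nominal rates of the ITU-T recommendations cited in 3.3. G.726 and G.723.1 each support several rates, and only one mode of each is shown.

```python
"""Illustrative arithmetic for ADC resolution (3.1) and compression (3.4).

Assumption: uncompressed linear PCM, so data rate = sample rate x bits x channels.
Function names are hypothetical; codec rates are nominal ITU-T rates.
"""
import math


def quantization_levels(adc_bits: int) -> int:
    """Number of discrete amplitudes an ADC with `adc_bits` resolution can produce."""
    return 2 ** adc_bits


def ideal_dynamic_range_db(adc_bits: int) -> float:
    """Ideal ratio of full scale to one quantization step, in dB: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** adc_bits)


def pcm_bit_rate(sample_rate_hz: int, adc_bits: int, channels: int = 1) -> int:
    """Bit rate in bit/s of uncompressed linear PCM."""
    return sample_rate_hz * adc_bits * channels


# Nominal bit rates (bit/s) of some encodings named in 3.3 (one mode each).
NOMINAL_CODEC_RATES = {
    "G.711 (PCM)": 64_000,
    "G.726 (ADPCM, 32 kbit/s mode)": 32_000,
    "G.729 Annex A (CS-ACELP)": 8_000,
    "G.723.1 (ACELP, 5.3 kbit/s mode)": 5_300,
}

if __name__ == "__main__":
    # Levels and ideal dynamic range for the common sound-card resolutions in 3.1.
    for bits in (8, 16, 20, 24):
        print(f"{bits:2d} bits: {quantization_levels(bits):>10d} levels, "
              f"~{ideal_dynamic_range_db(bits):.1f} dB")
    # Reference: narrowband (8 kHz) 16-bit linear PCM, i.e. 128 kbit/s.
    reference = pcm_bit_rate(8_000, 16)
    for name, rate in NOMINAL_CODEC_RATES.items():
        print(f"{name}: {rate / 1000:g} kbit/s, "
              f"{reference / rate:.1f}x smaller than 16-bit linear PCM")
```

For example, a 16-bit ADC yields 65 536 levels (about 96.3 dB), and G.729 Annex A at 8 kbit/s carries one sixteenth of the bit rate of 8 kHz, 16-bit linear PCM. That reduction is why the note on 3.4 warns that lossy coding affects the speech signal.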
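Definition 3.5 can be turned into a simple measurement procedure. The Python sketch below is hypothetical and not taken from the standard. It assumes the power spectrum is already available as frequency bins (Hz) with levels in dB, that the caller knows the nominal pass band, and that the "average energy in the pass band" is the mean of the linear power over the in-band bins. Its resolution is limited to the bin spacing. Interpolating between adjacent bins would give a finer estimate.

```python
"""Sketch: estimate lower/upper cut-off frequencies (definition 3.5).

Assumptions (not from the standard): spectrum given as parallel lists of
frequencies (Hz) and power levels (dB); the pass band is supplied by the caller;
the pass-band average is the mean of linear power, converted back to dB.
"""
import math
from typing import Optional, Sequence, Tuple


def cutoff_frequencies(
    freqs_hz: Sequence[float],
    power_db: Sequence[float],
    passband: Tuple[float, float],
    drop_db: float = 3.0,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (lower, upper) cut-off frequencies; None where no drop is found."""
    lo_edge, hi_edge = passband
    # Average the in-band energy in linear units, not in dB.
    in_band = [10 ** (p / 10) for f, p in zip(freqs_hz, power_db)
               if lo_edge <= f <= hi_edge]
    if not in_band:
        raise ValueError("no spectral bins inside the pass band")
    threshold_db = 10 * math.log10(sum(in_band) / len(in_band)) - drop_db

    # Lower cut-off: the highest bin below the pass band that is under the threshold.
    below = [f for f, p in zip(freqs_hz, power_db) if f < lo_edge and p < threshold_db]
    # Upper cut-off: the lowest bin above the pass band that is under the threshold.
    above = [f for f, p in zip(freqs_hz, power_db) if f > hi_edge and p < threshold_db]
    lower = max(below) if below else None
    upper = min(above) if above else None
    return lower, upper
```

For a synthetic telephone-band response, `cutoff_frequencies([100, 200, 300, 1000, 2000, 3400, 3600, 3800], [-20, -10, 0, 0, 0, 0, -10, -20], (300, 3400))` returns `(200, 3600)`. The pass-band average is 0 dB, so the threshold is -3 dB, and 200 Hz and 3600 Hz are the first bins outside the band that fall below it.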