Overview

Description

The Knomi Voice Matcher package provides software for comparing voice data. Its capabilities include (but are not limited to):

  • One-to-one comparison of voice audio samples.
  • Enrolling a voice audio sample into a gallery.
  • Comparing a voice audio sample against an enrolled entry.

Knomi Voice Matcher provides this functionality in a robust solution with the following features:

  • Multi-core Matching - Knomi Voice Matcher provides multi-core support for improved speed and throughput. It can be configured to take advantage of as many cores as are present on a system.
  • Ease of Integration - Knomi Voice Matcher has a simple, easy-to-learn REST API.
  • Robust Scoring - Knomi Voice Matcher uses a comparison score that can be mapped to expected false positive rates. This allows match thresholds to be selected with confidence in the outcome.
  • WAV Audio Support - Knomi Voice Matcher supports processing of audio in the WAV format.

Security

The Knomi Voice Matcher supports TLS 1.2.

Instructions on how to enable TLS can be found in section 5.1.
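On the client side, a connection to a TLS-enabled server can be restricted to TLS 1.2 or newer. The following is a minimal Python sketch using the standard `ssl` module; the function name is hypothetical, and the `ca_file` argument is an assumed placeholder for whatever certificate authority signed the server certificate in a given deployment.

```python
import ssl
from typing import Optional


def tls12_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a client TLS context that will not negotiate below TLS 1.2.

    ca_file: hypothetical path to the CA certificate that signed the server's
    certificate; None falls back to the system trust store.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    # Certificate and hostname verification stay enabled (the defaults of
    # create_default_context); only protocol versions older than TLS 1.2
    # are ruled out.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The returned context can be passed to standard-library HTTPS clients, for example `urllib.request.urlopen(url, context=ctx)` or `http.client.HTTPSConnection(host, context=ctx)`. The server URL itself depends on how TLS was configured.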

Voice Sample Types

Nexa|Voice supports matching of 2 types of voice samples.

Text Dependent

In text-dependent voice matching, it is recommended that the user first enroll with three readings of the same phrase. A voice probe can then be made from a single reading of that phrase, and matching is performed between the enrolled samples and the probe. In the request JSON, this type is specified as STATIC_PHRASE.

"voiceSamples": [ {  "data": "<base64 encoded @data_type@>",
                      "voiceSampleType": "STATIC_PHRASE"} ]

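The Python sketch below shows one way to assemble the voiceSamples array for text-dependent enrollment: each of the three recommended readings is base64-encoded and tagged STATIC_PHRASE. It only builds the voiceSamples fragment (wrapped in a JSON object so it serializes on its own); the helper names are hypothetical, and the rest of the request body and the REST endpoint are not covered here.

```python
import base64
import json
import warnings

RECOMMENDED_READINGS = 3  # enrollment guidance from the text above


def voice_sample(wav_bytes: bytes, sample_type: str) -> dict:
    """Wrap one WAV recording as a voiceSamples entry with base64-encoded data."""
    return {
        "data": base64.b64encode(wav_bytes).decode("ascii"),
        "voiceSampleType": sample_type,
    }


def static_phrase_samples_json(readings: list) -> str:
    """Serialize text-dependent readings as a {"voiceSamples": [...]} object.

    Each element of `readings` holds the raw bytes of one WAV recording of
    the same phrase. Other fields of the full request are not built here.
    """
    if len(readings) != RECOMMENDED_READINGS:
        # Three readings are a recommendation, so warn instead of failing.
        warnings.warn(
            f"{RECOMMENDED_READINGS} readings are recommended, got {len(readings)}"
        )
    samples = [voice_sample(r, "STATIC_PHRASE") for r in readings]
    return json.dumps({"voiceSamples": samples})
```

For a probe, the same `voice_sample` helper can wrap a single reading of the phrase.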
Text Independent

In text-independent voice matching, enrollment is much simpler because the user does not need to read out any particular phrase. For example, in a call center scenario, a user can be enrolled beforehand from any single audio sample. Later, when the user dials in and begins speaking, the user can be matched on the fly. In the request JSON, this type is specified as TEXT_INDEPENDENT.

"voiceSamples": [ {  "data": "<base64 encoded @data_type@>",
                      "voiceSampleType": "TEXT_INDEPENDENT"} ]

Note: The TEXT_INDEPENDENT voiceSampleType requires the audio file to satisfy all of the following conditions; otherwise, the client will receive a ‘not supported exception’.

  • The input audio file must contain at least 30 seconds of speech. For example, a 45-second file can contain no more than 15 seconds of silence.
  • The input audio file must have exactly one channel (mono).
  • The input audio file must have a sample rate of 8 kHz, 16 kHz, 32 kHz, or 48 kHz.

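Because the 30-second requirement concerns speech rather than total file length, a client may want a rough check before uploading. The Python sketch below estimates speech duration with a simple energy threshold over short frames. This is a swapped-in approximation, not the matcher's own speech detection; the frame length, RMS threshold, and function names are illustrative assumptions, and the sketch handles mono 16-bit PCM WAV only.

```python
import array
import io
import math
import sys
import wave

MIN_SPEECH_SECONDS = 30.0  # requirement stated above


def estimate_speech_seconds(wav_bytes: bytes, frame_ms: int = 30,
                            rms_threshold: float = 500.0) -> float:
    """Roughly estimate how many seconds of a mono 16-bit PCM WAV contain speech.

    A frame counts as speech when its RMS amplitude exceeds rms_threshold.
    frame_ms and rms_threshold are hypothetical defaults, not product values.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("this sketch handles mono 16-bit PCM only")
        rate = w.getframerate()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM samples are stored little-endian
    frame_len = max(1, rate * frame_ms // 1000)
    speech_samples = 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms > rms_threshold:
            speech_samples += len(frame)
    return speech_samples / rate


def likely_meets_speech_minimum(wav_bytes: bytes) -> bool:
    """Client-side pre-check against the 30-second speech requirement."""
    return estimate_speech_seconds(wav_bytes) >= MIN_SPEECH_SECONDS
```

Under this heuristic, a 45-second recording passes only if at least 30 seconds of its frames exceed the threshold, in line with the example above. Background noise can push silent frames over a fixed threshold, so the server's own check remains the deciding one.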
Text Independent Sample Constraints

  • Currently, only mono audio (number of channels = 1) is supported for text-independent samples.
  • The currently supported sample rates for text-independent audio are 8000, 16000, 32000, and 48000 Hz.
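The channel and sample-rate constraints can be checked from the WAV header before a sample is base64-encoded and submitted. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files; the function name is hypothetical, and it does not check the 30-second speech requirement.

```python
import io
import wave

SUPPORTED_RATES_HZ = (8000, 16000, 32000, 48000)


def text_independent_header_problems(wav_bytes: bytes) -> list:
    """List reasons a WAV file would fail the text-independent constraints.

    Only the channel count and sample rate from the WAV header are checked;
    an empty list means both constraints are satisfied.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        channels = w.getnchannels()
        rate = w.getframerate()
    problems = []
    if channels != 1:
        problems.append(f"expected 1 channel (mono), found {channels}")
    if rate not in SUPPORTED_RATES_HZ:
        problems.append(f"sample rate {rate} Hz is not one of {SUPPORTED_RATES_HZ}")
    return problems
```

Returning a list rather than raising on the first failure lets a client report every violated constraint at once.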