ABSTRACT

This chapter describes a number of signal-processing and statistical modeling techniques commonly used to calculate likelihood ratios in human-supervised automatic approaches to forensic voice comparison. The techniques described include mel-frequency cepstral coefficient (MFCC) feature extraction, Gaussian mixture model-universal background model (GMM-UBM) systems, i-vector-probabilistic linear discriminant analysis (i-vector PLDA) systems, deep neural network (DNN) based systems (including senone posterior i-vectors, bottleneck features, and embeddings/x-vectors), mismatch compensation, and score-to-likelihood-ratio conversion (also known as calibration). Empirical validation of forensic-voice-comparison systems is also covered.

The aim of the chapter is to bridge the gap between general introductions to forensic voice comparison and the highly technical automatic-speaker-recognition literature from which the signal-processing and statistical modeling techniques are mostly drawn. Knowledge of the likelihood-ratio framework for the evaluation of forensic evidence is assumed. The material should be of value to students of forensic voice comparison and to researchers interested in learning about statistical modeling techniques that could be applied to data from other branches of forensic science.