Detecting Deepfake Audio by Modeling the Human Acoustic Tract
Specifically, we apply fluid dynamics to estimate the arrangement of the human vocal tract during speech generation and show that deepfakes often model impossible or highly unlikely anatomical arrangements.
The first step in differentiating speech produced by humans from speech generated by deepfakes is understanding how to acoustically model the vocal tract.
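As a rough illustration of this kind of modeling (not the fluid-dynamics pipeline used in the actual research), the classical source-filter view lets one recover a crude tube-model area function from a speech frame via linear predictive coding (LPC). The minimal Python sketch below assumes `librosa` is available and a hypothetical input file `speech_sample.wav`; the tube-area sign convention shown is one common choice among several.

```python
# Sketch: estimate a crude vocal tract "area function" from one speech frame
# using LPC reflection coefficients (a Wakita-style lossless tube model).
# Illustrates acoustic tract modeling in general, NOT the paper's method.
import numpy as np
import librosa

def lpc_to_reflection(a):
    """Step-down (backward Levinson) recursion: LPC polynomial -> reflection coeffs."""
    a = np.asarray(a, dtype=float) / a[0]
    p = len(a) - 1
    rc = np.zeros(p)
    for i in range(p, 0, -1):
        k = a[i]
        rc[i - 1] = k
        if i > 1:
            a = (a[:i] - k * a[i:0:-1]) / (1.0 - k * k)
    return rc

def tube_areas(rc, lip_area=1.0):
    """Relative cross-sectional areas of a lossless tube model, lips -> glottis.
    Sign conventions vary across texts; this is one common choice."""
    areas = [lip_area]
    for k in rc:
        # Each reflection coefficient relates the areas of adjacent tube sections.
        areas.append(areas[-1] * (1.0 + k) / (1.0 - k))
    return np.array(areas)

y, sr = librosa.load("speech_sample.wav", sr=16000)  # hypothetical input file
frame = y[4000:4400] * np.hamming(400)               # one ~25 ms voiced frame
frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])  # pre-emphasis
a = librosa.lpc(frame, order=12)                     # ~1 pole per kHz heuristic
rc = lpc_to_reflection(a)
print("relative tube areas:", np.round(tube_areas(rc), 3))
```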
From here, we hypothesized that deepfake audio samples would not be constrained by the same anatomical limitations that human speakers have. In other words, vocal tract shapes estimated from deepfaked audio samples would correspond to anatomy that does not exist in people.
When extracting vocal tract estimations from deepfake audio, we found that the estimations were often comically incorrect.
It was common for deepfake audio to result in vocal tracts with the same relative diameter and consistency as a drinking straw, in contrast to human vocal tracts, which are much wider and more variable in shape.
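To make the anatomical argument concrete, a plausibility check over estimated tract geometry might look something like the sketch below. The diameter bounds and uniformity threshold are illustrative placeholders, not values from the study.

```python
import numpy as np

# Illustrative thresholds only; the researchers' actual anatomical bounds differ.
MIN_HUMAN_DIAMETER_CM = 1.0   # hypothetical lower bound
MAX_HUMAN_DIAMETER_CM = 4.5   # hypothetical upper bound

def looks_anatomically_implausible(areas_cm2):
    """Flag a vocal tract estimate whose segments are straw-thin or near-uniform."""
    diameters = 2.0 * np.sqrt(np.asarray(areas_cm2) / np.pi)  # area -> diameter
    too_thin = np.any(diameters < MIN_HUMAN_DIAMETER_CM)
    too_wide = np.any(diameters > MAX_HUMAN_DIAMETER_CM)
    # A real tract varies along its length; a near-constant profile is suspect.
    too_uniform = np.std(diameters) < 0.1 * np.mean(diameters)
    return too_thin or too_wide or too_uniform

# Example: a straw-like, nearly uniform profile is flagged as implausible.
straw = [0.3] * 15  # ~0.6 cm diameter segments, a hypothetical deepfake output
print(looks_anatomically_implausible(straw))  # -> True
```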