The Screener That Cannot Listen: What AI Interview Bots Actually Measure
Source: hackernews
A recent piece in The Verge documents a reporter going through an AI-conducted job interview, and the Hacker News thread that followed accumulated over 400 comments almost immediately. The volume of reaction is telling. Most people in software have either encountered one of these systems or heard about them from someone who has, and the experience tends to produce a specific kind of unease that is difficult to articulate but easy to recognize.
The unease is not about technology replacing humans in the abstract. It is about a specific mismatch: the system presents itself as an evaluator of competence, but it is measuring something considerably narrower than that.
What These Systems Are Actually Doing
The main commercial platforms in this space (HireVue; Paradox, which makes the Olivia chatbot; and Modern Hire, since acquired by HireVue) take slightly different approaches, but they share a common architecture. A candidate records video responses to structured questions, and the system processes the recording across several channels: the transcript is passed through NLP models that score word choice, vocabulary range, use of filler words, and semantic coherence, while the audio signal is analyzed for pace, fluency, and tonal variation. Historically, video frames were also processed for facial micro-expressions and eye movement, though HireVue publicly dropped its visual analysis component in 2021 after sustained pressure from researchers and a 2019 FTC complaint filed by the Electronic Privacy Information Center.
What remains after removing facial analysis is still a bundle of proxy signals for traits that are themselves only loosely predictive of job performance. Speaking pace and tonal variation are correlated with confidence and practiced communication. Vocabulary range correlates with educational exposure. Semantic coherence in a five-minute structured response correlates with the ability to prepare and deliver a structured response. None of these is a bad proxy per se, but the distance between “articulate under pressure in a novel environment” and “good at the job being applied for” is significant and varies enormously by role.
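To make the proxy-signal point concrete, here is a minimal sketch of what the transcript channel of such a pipeline might compute. The feature names, filler list, and formulas are invented for illustration; no vendor publishes its actual model.

```python
# Hypothetical sketch of transcript-level proxy features like those the
# article describes. Everything here is illustrative, not a vendor's model.

FILLERS = {"um", "uh", "er", "ah"}  # crude single-token filler list

def transcript_features(transcript: str, duration_seconds: float) -> dict:
    """Compute proxy signals: speaking pace, filler ratio, vocabulary range."""
    words = transcript.lower().split()
    n = len(words)
    filler_count = sum(1 for w in words if w in FILLERS)
    return {
        "words_per_minute": n / (duration_seconds / 60) if duration_seconds else 0.0,
        "filler_ratio": filler_count / n if n else 0.0,
        # type-token ratio as a rough stand-in for "vocabulary range"
        "vocab_range": len(set(words)) / n if n else 0.0,
    }
```

Notice what the features reward: a rehearsed delivery with no fillers and varied word choice scores well regardless of whether the content of the answer is true or relevant, which is precisely the gap between "articulate under pressure" and "good at the job."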
The vendors would argue that their models are validated against actual employee performance data at client companies. That validation is real but limited: it captures what made existing employees successful at that company, under that management, in that economic environment, which is a narrow slice of what predicts future performance in a changed context. It also captures whatever biases existed in who got hired and who got promoted historically.
Phase Two of Algorithmic Filtering
Applicant tracking systems have been filtering resumes algorithmically for over a decade. The first generation of ATS filtering was blunt: keyword matching against job descriptions, which produced a well-documented incentive for candidates to stuff their resumes with terms that parse correctly rather than terms that accurately describe their experience. The signal degraded as candidates learned the rules of the game.
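First-generation keyword filtering can be sketched in a few lines, which is part of why it was so easy to game. The scoring function and threshold below are invented for the example, not drawn from any real ATS.

```python
# Illustrative sketch of first-generation ATS keyword matching, as described
# above. The term extraction (naive whitespace split) and the threshold are
# made up for the example.

def keyword_score(resume: str, job_description: str) -> float:
    """Fraction of job-description terms that also appear in the resume."""
    jd_terms = set(job_description.lower().split())
    resume_terms = set(resume.lower().split())
    return len(jd_terms & resume_terms) / len(jd_terms)

def passes_screen(resume: str, job_description: str, threshold: float = 0.5) -> bool:
    return keyword_score(resume, job_description) >= threshold
```

A candidate who pastes the job description's terms verbatim scores perfectly whether or not those terms describe real experience, which is exactly the stuffing incentive the paragraph above describes.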
AI video interviews are the second phase of the same dynamic, just with a higher-bandwidth signal. The candidate is now optimizing not just their written words but their recorded presence. Several coaching services have emerged specifically to train candidates on how to perform well in HireVue assessments: slow your pace, make explicit use of structured frameworks like STAR, vary your vocabulary, minimize filler words. YouTube has no shortage of tutorial videos.
This creates the familiar Goodhart’s Law problem: once a measure becomes a target, it ceases to be a good measure. A candidate who has practiced specifically for AI interview scoring looks different from a candidate who has not, but that difference is a measure of interview preparation, not job competence. For roles that require interviewing skills, that might be relevant. For most roles, it is noise.
The Regulatory Response Is Uneven
Illinois passed the Artificial Intelligence Video Interview Act, one of the first state-level laws specifically addressing AI in hiring, which took effect in January 2020. It requires employers to notify candidates that AI is being used to evaluate their video responses, explain how the AI works and what characteristics it evaluates, and obtain consent before the video is analyzed. It also prohibits sharing video with third parties except for the purpose of AI analysis.
This is a disclosure-and-consent framework, which is a reasonable first step, but disclosure without recourse is limited protection. A candidate in Illinois knows an AI is evaluating their video, but they cannot see the score, cannot request an explanation of a rejection, and cannot effectively appeal. The information asymmetry remains nearly total.
The EEOC has engaged with algorithmic hiring tools through its existing framework for disparate impact discrimination. Technical assistance it issued in 2022 and 2023 made clear that standard anti-discrimination law applies to algorithmic tools: employers can be held liable if an AI screening tool has a disparate impact on protected classes even without discriminatory intent. This creates real legal exposure, but the burden of discovering and proving disparate impact falls on candidates and advocacy organizations, not on vendors.
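The standard first-pass check in disparate impact analysis is the "four-fifths rule" from the Uniform Guidelines on Employee Selection Procedures. A sketch of that check makes the asymmetry concrete: the group labels and counts below are made up, and the point is that running even this simple test requires selection data that candidates never see.

```python
# Sketch of the conventional "four-fifths rule" check used in disparate
# impact analysis. The counts are invented for illustration.

def selection_rate(selected: int, applied: int) -> float:
    """Fraction of applicants from a group who pass the screen."""
    return selected / applied

def adverse_impact_ratio(rate_group: float, rate_reference: float) -> float:
    """Ratio of a group's selection rate to the highest group's rate."""
    return rate_group / rate_reference

# Example: group A passes the AI screen at 30%, group B at 12%.
ratio = adverse_impact_ratio(selection_rate(12, 100), selection_rate(30, 100))
# A ratio below 0.8 (four-fifths) flags potential disparate impact.
flagged = ratio < 0.8
```

In this hypothetical, the ratio is 0.4, well under the four-fifths threshold, but no individual rejected candidate could compute it, because no individual candidate knows the selection rates.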
The Asymmetry That Matters Most
In a human interview, both parties are evaluating each other. The candidate is assessing whether the role, the manager, and the company culture are worth committing to. That mutual evaluation is not just a courtesy; it produces information that improves matching on both sides. A candidate who asks sharp questions, expresses genuine skepticism about something in the job description, or pushes back on an unclear expectation reveals things about themselves that a scripted recording cannot capture, and they learn things about the role that a one-way system cannot communicate.
An AI screening interview collapses this to a one-directional assessment. The bot does not answer questions. It does not respond to context. It does not adjust when a candidate explains that a question is ambiguous. The candidate is performing into a void, knowing only that something is watching and scoring.
This is not inherently disqualifying as a technology. There are real costs to early-stage human screening at scale, and some proxy filtering is necessary when ten thousand people apply for one position. The problem is the gap between what these systems claim to do and what they actually do. Vendors frame AI interviews as objective and evidence-based, which is technically true in a narrow sense and misleading in the broader sense that matters. The system is evidence-based about whatever its training data happened to capture, which reflects historical hiring decisions made by humans with their own biases and constraints.
The candidate in The Verge article walked away feeling evaluated by something that could not hear them. That feeling is not irrational. These systems are good at processing audio. They are not good at listening.