The Verge article on being interviewed by an AI bot went to the top of Hacker News almost immediately, pulling in over 400 comments from people who had been through similar experiences or found the concept alarming enough to engage with. The discomfort it captures is genuine: talking into a camera, answering structured questions, receiving no social feedback, and wondering what the system is actually measuring. That last part is the one worth sitting with.
Job interviews have always had hidden evaluation criteria. Every interviewer has things they care about that they will not say explicitly. Cultural fit is the classic euphemism, but there are subtler ones: how you respond to silence, whether you ask questions, how you frame failure. Candidates have always had to perform for criteria they could not fully see. AI screening makes this explicit in a way that strips away any remaining pretense.
The coaching industry it spawned
A small industry has built up specifically around gaming AI interview systems, and it tells you something about how these tools function. It started with applicant tracking systems. Resume optimization services and tools like Jobscan built businesses on helping candidates pass ATS keyword filters: include exact phrases from the job description, use clean formatting, avoid tables or graphics that confuse parsers. Knowing a system’s limitations and writing around them became routine job-seeking practice.
AI interview coaching is the next layer. There are now consultants, courses, and YouTube channels advising candidates on how to perform for asynchronous video screening tools. The advice includes: speak at roughly 130 to 150 words per minute, because systems like HireVue reportedly score candidates lower when they speak unusually slowly or quickly. Use structured response frameworks like STAR (Situation, Task, Action, Result), because these systems parse narrative coherence, and meandering answers score poorly regardless of content. Avoid filler words. Maintain eye contact with the camera lens rather than the preview window. Modulate your tone; flat affect reads poorly to prosody analyzers even when the semantic content of your answers is strong.
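The coached targets are easy to self-check even though no vendor's actual scoring function is public. A toy sketch, using only the standard library; the thresholds and filler-word list here come from the coaching advice above, not from any real system:

```python
# Hypothetical self-check against coached targets. The 130-150 wpm band
# and the filler list are taken from coaching advice, not a vendor rubric.
FILLERS = {"um", "uh", "like", "basically", "actually", "so"}

def practice_report(transcript: str, duration_sec: float) -> dict:
    words = transcript.lower().split()
    wpm = len(words) / duration_sec * 60
    fillers = sum(1 for w in words if w.strip(".,") in FILLERS)
    return {
        "wpm": round(wpm),
        "in_coached_band": 130 <= wpm <= 150,  # the reported sweet spot
        "filler_count": fillers,
    }

report = practice_report(
    "So I led the migration and, um, basically shipped it two weeks early",
    10.0,
)
```

A candidate rehearsing with something like this is optimizing exactly the surface features the next section describes.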
Whether any of this advice is accurate is hard to verify, because the scoring functions are not public. The existence of this coaching market is itself informative. It suggests that candidates believe, or have found empirically, that these systems can be gamed with surface-level performance optimization, independent of actual knowledge or experience.
What features are being extracted
When I think about building something like a HireVue-style scoring system, the feature space is not hard to imagine. You have the audio transcription, which gives you text for an NLP pipeline. From there: semantic similarity between the candidate’s answer and a reference answer (either human-generated or derived from aggregated high-scorer responses), lexical diversity, answer length, topical coherence across sentences, use of domain-relevant vocabulary, and structural markers like whether the candidate followed a recognizable narrative format.
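To make the point concrete, here is a minimal sketch of the text side of such a pipeline. The "semantic similarity" here is plain bag-of-words cosine similarity against a reference answer; a production system would more plausibly use sentence embeddings, and the function name is my own invention:

```python
import math
from collections import Counter

def text_features(answer: str, reference: str) -> dict:
    tokens = answer.lower().split()
    a, b = Counter(tokens), Counter(reference.lower().split())
    # Cosine similarity between bag-of-words vectors: a crude stand-in
    # for semantic similarity to a reference answer.
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return {
        "length": len(tokens),
        "lexical_diversity": len(set(tokens)) / len(tokens),  # type-token ratio
        "reference_similarity": dot / norm if norm else 0.0,
    }
```

Each output is one column in a feature matrix; nothing in it touches what the answer actually claims the candidate did.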
From the audio itself: speech rate, pause frequency and duration, pitch variation as a proxy for engagement, and filler word frequency. From video, until most platforms dropped it following sustained criticism from organizations like the AI Now Institute and the ACLU: facial expression analysis and gaze direction. HireVue removed facial analysis in 2021 after pressure from researchers who pointed out that emotion recognition from facial geometry has no validated scientific basis and correlates with race and disability in ways that create disparate impact. Most of the remaining features can be computed with off-the-shelf libraries. The scoring model is likely a gradient-boosted classifier or a shallow neural network, trained to predict some combination of “did a human recruiter advance this person” and “did this person succeed in the role.”
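The timing features are similarly mechanical once an ASR system emits word-level timestamps. A sketch, assuming a hypothetical transcript format of (token, start, end) triples; the 0.3-second pause threshold is an arbitrary illustrative choice:

```python
def timing_features(words):
    """words: list of (token, start_sec, end_sec) triples, as emitted by
    a hypothetical ASR system with word-level timestamps."""
    spoken = words[-1][2] - words[0][1]
    gaps = [nxt[1] - cur[2] for cur, nxt in zip(words, words[1:])]
    pauses = [g for g in gaps if g > 0.3]  # arbitrary illustrative threshold
    return {
        "speech_rate_wpm": len(words) / spoken * 60,
        "pause_count": len(pauses),
        "mean_pause_sec": sum(pauses) / len(pauses) if pauses else 0.0,
    }

feats = timing_features([("i", 0.0, 0.2), ("led", 0.3, 0.6), ("it", 1.2, 1.4)])
```

From there it is a feature vector into whatever classifier the vendor trained.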
Each of these features is a proxy, not a direct measure of job performance. Speech rate is not job performance. Lexical diversity is not job performance. Structured narrative format is not job performance. These features might correlate with performance for certain roles, but that correlation needs empirical validation, and independent validation is thin. Studies published in venues like the Journal of Applied Psychology have found that automated video-interview scores correlate only weakly with assessor ratings and performance outcomes when the evaluation is done by researchers independent of the vendors.
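The proxy question is checkable in principle, and the check is not exotic. A sketch of the basic step, Pearson correlation between one proxy feature and a later performance rating, on fabricated numbers chosen only to illustrate the shape of the test:

```python
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Fabricated illustration: words-per-minute at screening vs. a later
# performance rating for six hypothetical hires.
wpm = [142, 118, 155, 130, 148, 125]
rating = [3.1, 3.4, 2.9, 3.8, 3.0, 3.5]
r = pearson(wpm, rating)  # negative in this toy data: the proxy points
                          # the wrong way, which only validation reveals
```

The point is not the number; it is that almost no one with access to both columns of data has an incentive to compute it.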
What you are demonstrating when you optimize
When you practice speaking at 140 words per minute and structuring answers using STAR format, you are demonstrating something real: the ability to identify an optimization target and change your behavior systematically to hit it, even without direct feedback on how you are doing. For some roles, that is a genuinely useful trait. But it is not what the interview is supposed to measure, and it is not what most employers think they are measuring when they deploy these tools.
The employers deploying these systems believe they are getting signal about communication quality, relevant experience, and cultural alignment. What they are getting is signal about candidates’ familiarity with the optimization game, combined with whatever the model learned from historical hires. Candidates who are well-coached for AI interviews, or who naturally speak in the patterns the system rewards, advance. Candidates who do not, regardless of their qualifications, do not.
This pattern appears in every standardized selection process: SAT tutoring, interview coaching for management consulting, LinkedIn optimization for professional visibility. The AI version is more problematic for two reasons. First, the criteria are proprietary, so candidates cannot study the rubric directly. Second, there is no human in the loop who might notice and discount a too-polished answer the way a skilled interviewer would.
The developer’s vantage point
I build software. I have never built a hiring system, but I could. The feature extraction is not technically hard; the scoring model is not hard; the API integrations are not hard. What would be hard is answering whether any of it works, in the sense of producing better hires at the end. That question is a research project, not an engineering project. It requires controlled experiments that most companies will not run because the incentive to validate the tool, as opposed to just deploying it, is low.
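The validation question even has a simple quantitative form: do screener scores rank the hires who later succeeded above the ones who did not? A sketch using rank-sum AUC, where 0.5 means the score carries no predictive signal at all; the outcome data is fabricated for illustration:

```python
def auc(pos_scores, neg_scores):
    """Probability the screener scored a successful hire above an
    unsuccessful one (rank-sum AUC); 0.5 means coin-flip, no signal."""
    pairs = len(pos_scores) * len(neg_scores)
    wins = sum(1 for p in pos_scores for n in neg_scores if p > n)
    ties = sum(0.5 for p in pos_scores for n in neg_scores if p == n)
    return (wins + ties) / pairs

# Fabricated outcomes: screener scores for hires who later succeeded
# in the role vs. those who struggled.
succeeded = [0.71, 0.55, 0.80, 0.62]
struggled = [0.68, 0.74, 0.59]
signal = auc(succeeded, struggled)
```

Running this requires hiring some candidates the screener scored poorly, tracking outcomes for a year or more, and accepting that the answer might be 0.5. That is the experiment nobody runs.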
Amazon found this out directly. In 2018, Reuters reported that Amazon had quietly scrapped an internal AI recruiting tool after discovering it systematically downgraded resumes from women. The system had trained on a decade of hiring data during which Amazon predominantly hired men, and it learned to treat signals of maleness as proxies for hirability. The team tried correcting for gender-specific language but could not fully remove the bias because it was encoded across many features, not just explicit ones. That case became public. Most do not.
The vendors have incentives to sell products, not to publish embarrassing accuracy numbers. The buyers have incentives to process candidates at scale, not to audit whether that processing actually works. The candidates have no data at all: they do not see the scoring rubric, the score they received, or any comparison to who else applied. No party in the system has both the data and the incentives to answer whether it works. The system persists because it is useful operationally (it reduces recruiter workload) regardless of whether it is useful epistemically (it identifies better candidates). Those are different claims that get treated as the same claim.
What this reveals
Regulators have started paying attention. The Illinois Artificial Intelligence Video Interview Act, in effect since 2020, requires employers to notify candidates before using AI to evaluate video interviews, explain how the AI works, obtain consent, and delete recordings within 30 days of a candidate’s request. New York City’s Local Law 144, which took effect in 2023, requires employers using automated employment decision tools to conduct and publish annual bias audits conducted by independent auditors. These laws are a start, but they create disclosure requirements, not validity requirements. You can audit a system and publish that it has disparate impact without being required to stop using it.
The candidate who sat for an AI interview, as documented by The Verge and discussed in the Hacker News thread that followed, had no way to know what they were being scored on. They could not ask for feedback. They could not appeal. They could not find out whether the system decided based on the content of their answers, the pitch of their voice, or the rate of their speech.
The companies using these systems have not necessarily thought harder about what good hiring looks like. They have delegated that question to a vendor, who encoded an approximate answer in a scoring model and called it a process. That may not be worse than what it replaced in every case. But it means consequential decisions about people are being made using criteria those people cannot see, validated by data they cannot access, with accountability distributed thin enough that no single party is responsible for whether it works.