We Are Starting to Sound Like the Thing We Built

“Stay on the road. Keep clear of the moors.”

July 05, 2026

There's a particular kind of horror in noticing you've started saying "delve" unironically. Not typing it – saying it, out loud, mid-conversation, with no irony available to reach for once you clock what you've just done.

It's not a word that was in common circulation two years ago, at least not outside a fairly narrow band of academic writing. The honest answer for where it came from is a chatbot, by way of several million other people's mouths, and apparently none of us really noticed it happening until someone went and measured it.

This isn't a post about AI writing code, or AI writing poetry badly, or any of the ground I've already gone over on this blog. This is about something quieter and, I think, more unsettling.

The actual words coming out of actual human mouths are starting to drift toward whatever a language model prefers. And the drift is measurable, recent, and ongoing while I type this.

The Study That Made Me Feel Watched

Researchers at the Max Planck Institute for Human Development did something fairly clever to test whether this was real or just a feeling. They fed millions of pages of e-mails, essays, and articles into ChatGPT with instructions to "polish" the text.

Then they noted which words the model kept reaching for on its own – "delve," "realm," "meticulous," "underscore," "boast," "garner" – and called this set GPT words.

Then they went looking for those exact words across roughly 740,000 hours of spoken English: over 360,000 academic YouTube talks and 771,000 podcast episodes, comparing usage from before and after ChatGPT's release.

The lead researcher, Hiromu Yakura, apparently started this whole investigation because he'd noticed the shift in himself first. He says he realised he was using "delve" more about a year after ChatGPT came out, and wanted to know if it was happening to other people too.

It was. The team found the increase held up even after building in comparisons against synonyms nobody had accused of being AI-flavoured, and the effect wasn't confined to scripted, formal content either. It turned up in spontaneous, unscripted conversation as well.
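For anyone who'd rather see the shape of that before-and-after comparison than take it on trust, here's a rough sketch in Python. It's a toy illustration, not the researchers' actual pipeline: the function names, the sample transcripts, and the plain per-million rate are all assumptions made up for the example, and it matches exact word forms only, so "delves" wouldn't count.

```python
import re

# The six "GPT words" quoted above; the study's full list is longer.
GPT_WORDS = {"delve", "realm", "meticulous", "underscore", "boast", "garner"}


def rate_per_million(transcripts, words):
    """Occurrences of any word in `words` per million tokens of text."""
    hits = 0
    total = 0
    for text in transcripts:
        # Crude tokeniser (an assumption): lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        total += len(tokens)
        hits += sum(1 for token in tokens if token in words)
    if total == 0:
        return 0.0
    return hits * 1_000_000 / total


def relative_change(before, after, words):
    """After-rate divided by before-rate, or None if the before-rate is zero."""
    base = rate_per_million(before, words)
    if base == 0:
        return None
    return rate_per_million(after, words) / base


# Hypothetical transcripts, invented purely for illustration.
before = ["we delve into the data", "we look into the data and check it with care"]
after = ["we delve into the data", "a meticulous look at the data"]

change = relative_change(before, after, GPT_WORDS)
print(f"GPT-word rate changed by a factor of {change:.2f}")
```

Running the same two functions over a set of ordinary synonyms would give the control comparison the paragraph above describes: if only the GPT words climb, it's harder to put the rise down to general drift.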

What I find genuinely strange isn't that the words spread. Words always spread – that's just how language works, and I'm not about to get precious about it as a man who's spent a chunk of this year learning Gàidhlig, a language that's been absorbing and shedding loanwords for a thousand years without asking my permission first.

What's strange is the mechanism. This isn't slang moving from one group of people to another. It's a cultural feedback loop – we train the machines, they talk back to us, and then we talk like them, in the words of one of the study's co-authors.

The words didn't originate in anyone's actual life. They originated in a statistical average of everyone else's writing, got handed back to us with a coat of algorithmic polish, and now they're ours. Except they were never really anyone's to begin with.

It's Not Even Consistent, Which Is the Bit That Gets Me

Here's the detail that stuck with me longer than the headline finding: the spread isn't even.

The paper itself found the effect statistically significant in Science and Technology, Business, and Education podcasts, but not in Religion and Spirituality or Sports. Podcasters talking about football are apparently immune in a way podcasters talking about start-ups aren't.

That's not a coincidence, and I don't think it's really about topic at all. It's about who's spending their working day reading and half-consciously absorbing the specific register that AI writing assistants default to, then carrying it into a microphone a few hours later without meaning to.

Which means the words aren't spreading evenly through the population. They're spreading through the professional and educational classes first – the people whose jobs already involve a laptop and a Slack window and a "can you make this sound more polished" request typed into a text box at 4pm on a Thursday.

If that's where the infection vector runs, it says something slightly bleak about which register of English gets treated as the aspirational one. And it isn't the one people actually reach for when they're talking about something they love without a deadline attached.

The Same Thing Is Already Visible in Print

None of this is confined to speech, either, and the paper trail on the written side is honestly clearer.

A separate piece of research out of Tübingen, published in Science Advances, looked at more than 15 million biomedical abstracts indexed on PubMed between 2010 and 2024. It tracked which words suddenly started appearing far more often than their pre-2022 baseline would predict.

The jump was abrupt, dated precisely to ChatGPT's release, and large enough that the researchers put a lower bound of 13.5% of 2024 abstracts as likely processed through an LLM at some stage – climbing to 40% in some sub-fields.

That's scientific writing, supposedly the most carefully checked prose that exists, quietly absorbing a machine's fingerprints into its vocabulary at a scale big enough to out-measure a pandemic. The paper genuinely draws that comparison.

It's the kind of sentence that should make anyone doing academic writing pause for a second before they hit "improve clarity" on whatever tool they've got open.

Why the Word Choice Actually Matters

I'll admit the instinct here is to shrug. So what if "delve" is having a moment.

Words go in and out of fashion constantly, and I'd be a hypocrite to complain about linguistic drift while learning a Celtic language that's spent centuries drifting itself.

But I don't think this is ordinary drift, and the reason is the direction of travel. Ordinary linguistic change moves outward from someone who actually needed a word for something – a community, a subculture, a specific bit of lived experience that didn't have vocabulary yet, so it made some.

GPT words move the other way. They started as a statistical preference inside a model trained to sound authoritative and inoffensive to as many readers as possible at once, and now they're moving into the mouths of people who never needed them – replacing words that were doing the job fine already.

"Meticulous" isn't filling a gap in the language. It's just quietly nudging out "careful," "thorough," and "precise," because a machine optimised for sounding a certain way said it first, and enough of us were reading that output long enough to catch it like a cold.

One computer scientist quoted in coverage of the podcast study put it more bluntly than I would have dared: the language of ChatGPT is infectious, and people are drawn to it because it feels authoritative.

Feels authoritative isn't the same as is precise, or is honest, or is actually yours. It's a texture people are borrowing because it sounds like competence, in the same way a suit sounds like a job interview whether or not the person wearing it can actually do the work.

The Bit Where I Admit I'm Not Innocent Here

I write a lot of things. Poetry that's genuinely, non-negotiably mine – I've said before that I won't let AI anywhere near that, and I'm not walking it back here.

But also code comments, commit messages, blog posts like this one, README files, the odd Slack-adjacent message to people I barely know. I'd be lying if I said none of that prose has ever leaned on a phrase I picked up from somewhere I can't fully trace back.

I don't think I've said "delve" in writing recently, mercifully. But I wouldn't put money on "underscore" being entirely clean.

The uncomfortable thing about a study like this is that it doesn't give you a way to audit yourself after the fact. You don't get a little pop-up telling you which of your words came from you and which ones you absorbed off a machine's polished draft three scrolls back.

It just sits there, quietly, as a fact about how language works now. A large enough chunk of everyone's reading diet has been machine-smoothed prose that some of the machine's habits are leaking backward into people who never touched the machine directly, through the people who did.

What I Actually Want to Say About This

I don't think this makes anyone's speech fake, exactly – not in the way I do think AI-generated poetry is a category error rather than a lesser version of the real thing.

Nobody's faking anything by saying "delve." They picked the word up honestly, the way you pick up any word, by hearing it enough times that it started to feel available. That's just how language absorption has always worked.

What bothers me is the source doing the seeding. Normally the words filtering into common use came from someone who lived through something and needed a name for it – slang from a subculture, a borrowing from a language that got somewhere first, a coinage from someone clever enough to notice a gap.

This time the words are filtering in from a machine's best statistical guess at what sounds authoritative to as many people as possible – trained on writing it never lived through and doesn't understand, handed to us in a tone calibrated to sound trustworthy regardless of whether there's anything underneath it.

We are, slowly, starting to sound like the average of everyone else's polished first draft. And the average of everyone else's polished first draft was never a person.

I don't have a tidy fix for this, and I'm suspicious of anyone who claims they do. I'm not going to stop saying "delve" out of some performance of purity – I'd have to police myself constantly and I'd almost certainly miss half of it anyway.

The best I've got is noticing, which is a fairly small thing to offer given the size of the problem. But it's the same thing I'd tell a friend who'd picked up a verbal tic from someone they spend too much time around: you don't have to stop the tic. You just have to know it isn't originally yours.

Ewan’s Blog

I ramble, enjoy.