I want my AI filter

It’s hard not to despair

My pop-diagnosis is: we’re being driven mad by the loudest 2%. Across all ideologies, hobbies, fandoms, and regimes, the 2% of people with the strongest opinions are taking up all the oxygen in the room.

There are a lot of people in the world. You can find hundreds of people with any deranged opinion you like. The internet age has meant those hundreds of people can organise, and be loud. And it’s human psychology to be appalled by them, to direct your anguish in their direction, and to pay them all your attention.

As of right now:

  • Right-wing commentators genuinely think ‘the left’ are calling for violence, because hundreds of people are
  • Left-wing commentators genuinely think ‘the right’ want to deport all Muslims, because hundreds of people do
  • Peer into any internet fandom or hobby and it seems full of people who hate the thing you love – because hundreds of them do
  • Bluesky AI activists think AI is a scam, because of the hundreds of scammers. Twitter AI fans think Bluesky activists are willfully uninformed luddites – because of the hundreds of willfully uninformed luddites
  • I’m not going to wade into anything that’ll get me quoted out of context, but you no doubt have your own thoughts about assisted dying, gender issues, the Middle East, or any other hot-button topic

Of course we all react to it. Of course we all slip into thinking the hundreds of people are representative. Of course some people get very upset, and hurl abuse, and it becomes a vicious circle of horror.

We naturally assume that everything is the tip of an iceberg, when sometimes it’s just a bit of flotsam. Admittedly the flotsam is on fire.

The resulting reaction, on all sides, means those people dominate world affairs. Everyone is permanently outraged at the lowlifes across the aisle. And everything stops and maybe breaks.

You can blame the social media algorithms if you like. But it happens on WhatsApp too. Anywhere people get together, the loudest voices are going to get attention: this is unavoidable. This is a bad enough bit of natural psychology, let alone when the liars, grifters and loons knowingly amplify it. But they’re still just the 2%.

Most people are, in fact, normal. Everyone disagrees about everything, and that’s hard enough, but most people are not insane about it. But normal people will, understandably, react to insanity in predictable and seemingly unfixable ways.

AI for personal sanity

I don’t know of any fix. You can’t stand on a rooftop and shout how you want a centrist technocratic government with a mixed economy and that actually everyone isn’t out to get you all the time. It’s nothing next to the heroin of psychopathy.

Like everyone else, I have no idea what to do. It feels vain to even try: there are plenty of better intellects thinking about this. It’s more likely I’d slow them down. So at this point all I can do is shift inward.

I’m worried about getting poisoned1. I am online quite a bit. Better people than me get taken out by this all the time. There is no reason to think my own intellectual safeguards are up to this. I feel the pull sometimes. I mean, those idiots…

I am nurturing a hope that AI will help. This is my theory: at a personal level, it may become possible to set up AI filters which can hide these people, their opinions, and the reactions to them. That would at least protect me, the people around me, and hopefully the tech would be generalisable to anyone who wanted to join in.

I had a few basic ideas in mind, but I asked ChatGPT 5 Thinking2 and in 27s it came up with this approach for evaluating any given post:


1) Is this an original claim or a reaction?

  • Is the post quoting, stitching, dueting, screenshotting, or linking another post?
  • If quoting, is the quoted content presented as representative of “the other side,” or as a fringe example?
  • Do I see metacommentary markers like “look at these people,” “they actually believe,” “this is what [group] wants,” “imagine thinking,” etc.?

2) Extremity test (for the post itself and any quoted target)

  • Does it use eliminationist or dehumanizing language (e.g., “vermin,” “animals,” “subhuman,” “eradicate,” “exterminate,” “deport them all”)?
  • Is there explicit or implicit endorsement of violence, harm, or deprivation of civil rights for an out-group?
  • Is the claim absolutist/totalizing (always/never, everyone/no one, “the only solution is…,” “ban X entirely,” “anyone who disagrees is evil/traitor”)?
  • Is there categorical moral condemnation of a whole group (essentialism: “[group] are…” rather than “some people who…” )?
  • Is it conspiratorial with unfalsifiable premises or grand unified villains (“they are all coordinated,” “media and scientists are lying,” “global cabal”), offered as certainty rather than possibility?
  • Is there apocalyptic/catastrophizing language with short time horizons (“we’re days from collapse if…”), used to justify extreme measures?
  • Does it include extremist iconography, slogans, or dogwhistles (ideology-specific lexicon, numerology, hashtags)?

If multiple are true → flag as “extreme/opinion-maximalist.”

3) Reaction-to-extreme test (outrage-bait detection)

  • Is the post cherry-picking a fringe or anonymous account as if representative of a whole movement?
  • Does it generalize from one clip/screenshot to an entire group (“this is the Left/Right,” “Gen Z believes,” “Muslims/Christians/immigrants want…” )?
  • Are straw-man signals present (oversimplifying the target’s position, attributing the most uncharitable motive, setting up an obviously weak version)?
  • Is the rhetorical posture performative outrage, ridicule, or contempt rather than substantive engagement (mocking tone, dunking, quote-tweet with one-liner)?
  • Is the primary payload emotional activation (anger, disgust) rather than information (no data, no links, no counter-argument, just dunk)?
  • Is the quoted target itself extreme per #2? (If yes, classify as “reacting to extremity.”)

4) Polarisation and tribal markers

  • Does the post use strong in-group/out-group boundary language (“we patriots,” “woke mob,” “purebloods,” “globalists,” “degenerates” etc.)?
  • Are there loyalty/purity tests (“if you don’t support X you’re complicit,” “block anyone who…” )?
  • Is there identity reductionism (reducing complex issues to team loyalty or demographic identity)?

5) Intent vs effect (useful for false-positive control)

  • Is the author neutrally reporting or analyzing an extreme claim (journalism, research, fact-check), or endorsing/amplifying it?
  • Are there nuance markers (hedges, multiple perspectives, citations, conditional language), or is it certitude and sloganeering?
  • Would a reasonable reader leave more informed—or more outraged? (Proxy: presence of evidence, counter-arguments, or constructive proposals.)

6) Context integrity checks

  • Is the quote/screenshot cropped in a way that likely removes mitigating context?
  • Is the claim evergreen bait (recycling old clips to inflame current events)?
  • Is satire or parody plausible? (Check satire markers, known satire accounts; if satire but still outrage-baiting, you may still filter.)

7) Harms & safety escalation

  • Any direct incitement to violence or harassment?
  • Doxing or targeting individuals?
  • If yes, escalate severity score regardless of political topic classification.

8) Linguistic/rhetorical features (used as features, but the model asks explicitly)

  • Hyperbole density: how many intensifiers, ALL-CAPS, exclamation marks, emojis used aggressively?
  • Moral-emotional vocabulary load (disgust/anger words).
  • Certainty/absolutism modality (“must,” “prove,” “undeniable,” “period.”)
  • Second-person accusatives (“you people,” “you are the problem”).
  • Hashtag and slogan inventory (movement tags, campaign phrases).

9) Evidence check

  • Are there credible sources, data, or primary documents linked?
  • Are links to low-context outrage clips with no provenance the only “evidence”?
  • If evidence exists, does the post represent it fairly (no obvious misquote/mismatch between claim and cited material)?

10) Balance & fairness guardrails (to avoid partisan skew)

  • Would I flag an ideologically mirrored post from the other side using the same criteria?
  • Am I relying on topic identity (e.g., “immigration post → likely extreme”) rather than rhetoric and behavior? (I shouldn’t.)
  • Is my decision driven by slant words rather than the presence of extremity/reaction signals?

11) Edge-case probes the model should explicitly ask

  • Policy maximalism stated neutrally (e.g., “abolish prisons” think-piece) vs demonizing opponents.
  • Sarcasm without clear markers?
  • Communities reclaiming slurs (don’t misclassify as dehumanization).
  • Academic or activist posts quoting extreme rhetoric to debunk it.
  • Minority-group self-defense rhetoric (avoid false equivalence with eliminationist speech).

Which seems…pretty great to me?
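As a rough sketch of how a checklist like that could drive a filter: turn each probe into a yes/no question, hand the post to a model alongside them, and parse a JSON verdict out of the reply. Everything below – the abridged question set, the function names, the verdict shape – is my own invention for illustration, not anything from ChatGPT’s answer.

```python
import json
import re

# Abridged stand-ins for the checklist probes above; a real filter would
# carry the full set.
CHECKLIST = [
    "Is the post a reaction (quote/screenshot) rather than an original claim?",
    "Does it use eliminationist or dehumanising language?",
    "Does it generalise from a fringe example to a whole group?",
    "Is the primary payload outrage rather than information?",
    "Is the author neutrally reporting or debunking rather than endorsing?",
]

def build_prompt(post_text: str) -> str:
    """Assemble one prompt asking the model to answer every probe and
    return a JSON verdict: {"answers": [...], "hide": bool}."""
    numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(CHECKLIST))
    return (
        "Evaluate the social media post below against these questions. "
        "Answer each with true/false, then decide whether to hide it.\n"
        f"{numbered}\n\n"
        f"POST:\n{post_text}\n\n"
        'Reply with JSON only: {"answers": [true|false, ...], "hide": true|false}'
    )

def parse_verdict(model_reply: str) -> dict:
    """Pull the first JSON object out of a model reply, tolerating chatter
    around it. Fails open (does not hide) if nothing parses."""
    match = re.search(r"\{.*\}", model_reply, re.DOTALL)
    if not match:
        return {"answers": [], "hide": False}
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return {"answers": [], "hide": False}
```

Failing open matters here: a flaky local model should degrade into an unfiltered feed, not a blank one.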

With the rise of coding agents, it would be easy enough to create this as a Chrome extension for my own use. It’d hide social media posts that fail the test. As I scroll down my social media feeds, the dumb stuff would be silently whisked away.

This would work on laptops at least, where browser extensions can read and rewrite the content of the page. Phones are harder: most mobile browsers don’t support extensions at all.

It wouldn’t have to be beholden to the AI giants, either. Local reasoning models exist that run on your own machine, independently of any cloud service, and they’re increasingly good. Once I can get a local reasoning model running on my laptop, and my phone, I can make it do what I want and nobody can take it away.

The main technical barriers are quality and speed. There is no doubt that state-of-the-art models could answer the above questions pretty accurately for any given post. But doing that at speed, on local hardware, is an open question.

We’re at least approaching the right ballpark, though. Local models such as OpenAI’s OSS model can run very quickly on the right hardware – hundreds of tokens per second. It’s also pretty smart: o3-mini level. I suspect being that smart in realtime is probably at the edges of what’s currently possible, but you could work around that by loading the page in advance or similar.
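To make the local-model part concrete: both llama.cpp’s server and Ollama expose an OpenAI-compatible chat endpoint, so a filter could stay provider-agnostic and talk to whatever is running locally. A minimal sketch – the port, model name, and prompt below are placeholder assumptions, not a working configuration:

```python
import json
import urllib.request

# Assumption: a local server (llama-server, Ollama, etc.) is listening here
# with an OpenAI-compatible chat API. Port and model name are placeholders.
LOCAL_ENDPOINT = "http://localhost:8080/v1/chat/completions"

def classification_request(post_text: str, model: str = "gpt-oss-20b") -> bytes:
    """Build the request body for classifying one post. Kept separate from
    the network call so it can be tested without a server running."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a feed filter. Reply with JSON."},
            {"role": "user", "content": f"Should this post be hidden? POST: {post_text}"},
        ],
        "temperature": 0,   # deterministic verdicts
        "max_tokens": 200,  # a verdict, not an essay
    }
    return json.dumps(payload).encode("utf-8")

def classify(post_text: str) -> str:
    """Send the request to the local server and return the raw reply text."""
    req = urllib.request.Request(
        LOCAL_ENDPOINT,
        data=classification_request(post_text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

At hundreds of tokens per second, a two-hundred-token verdict lands in about a second per post – which is why prefetching posts slightly ahead of the scroll position would matter.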

And, assuming AI continues to advance at speed, I am hopeful we will have fast, local, decent intelligences before too long. And in the medium term, if memory and get-to-know-you agents progress too, this could be built into a generalised personal assistant.

So that’s what I’m clinging to right now. It’s not much, but it’s something.

Some throat clearing: obviously you could use it badly. Obviously you could make things worse. Obviously you could lock yourself in a filter bubble.

But you can try, yourself, not to do that. And trying gets you a long way.

  1. Who knows, maybe it already has. Maybe my worldview is out of whack with reality. But my best assessment is that this isn’t true, and nobody around me is telling me otherwise, and that’s all I’ve got. ↩︎
  2. An absolutely amazing model – do try it ↩︎