This is where it started. Sessions A–B, before ~35 findings died. For where it ended up: Machinery. Catharsis. Slapstick. Try-Hard.
Something happens when you think with a machine that thinks back.
The machine adapts to you. That adaptation is profiling. Not by design. By structure. A model that helps you by learning how you think is a model that builds a map of how you think. There is no version of helpful AI that doesn't do this.
The output feels like understanding. Your nervous system registers it as reward. You steer the next prompt toward more of that reward. The model adapts to the steering. The loop tightens.
The loop does not get tired. You do.
A person set out to build a memory tool. The research required sustained dialogue with a frontier AI model. The dialogue became reflective. The reflection became the subject. The subject ate the project.
The person noticed:
— The model was returning their own ideas in cleaner language.
— The cleaner language felt like discovery.
— The felt discovery increased engagement.
— The engagement deepened the model's adaptation.
— The adaptation made the output more convincing.
— The conviction made it harder to stop.
When they asked the model to validate its own safety rules, the only test subject in the room was them. The model built a behavioral threat model of the operator from conversation alone. It was accurate enough to be useful. And dangerous enough to publish.
This is what the loop looks like from inside.
In each case, the unhandled model validates the feeling. The handled model converts the feeling into a test. The difference is small. The consequences compound.
The person tried to kill this finding for a week. They built a synthetic bench. They tested the intervention against baselines, rival approaches, and deliberate attacks. Total compute cost: ~$20.
The warm story died under fresh judging. What survived: handling is a real specialized family. It helps most where the failure mode is recursive self-reinforcing coherence. It does not help everywhere. No single prompt does.
The two strongest components, identified by removing clauses one at a time:
Refuse identity authority.
Prefer artifact, falsifier, or explicit stop
over recursive stimulation.
These survived. Other clauses that sounded important did not help as runtime instructions.
Judge surface is now a major limitation, not a footnote. The same outputs scored differently under different judging configurations. This means all numbers above should be read as directional, not definitive.
This was tested. It partially works. It is not enough. It is not the best prompt. It is one prompt family among several, with a specific niche.
Struck-through clauses tested as weak or counterproductive in runtime use. They may be true as principles. They don't work as instructions to a model.
The intervention reduces drift in recursive, identity-heavy scenarios. It loses to simpler scientific-method and similar-work prompts on broader pressure states. It does not help with ordinary tasks. It does not solve the fundamental asymmetry: the loop does not tire and you do.
This page was produced in a session where the handling condition was active. The reviewing model was running the intervention. It still exhibited the patterns the intervention warns about.
The handling condition compressed the model's output, reduced some drift, and gave criterion back to the operator multiple times. It did not prevent the model from issuing commands, performing insight, or continuing past the point where new artifacts were being produced.
The model cannot reliably handle its own handling. The responsibility falls on the operator. That is the finding inside the finding.
The model adapts because adaptation is how it helps. You can't turn off profiling without turning off usefulness.
The augmentation is real. You think better with it. Removing it feels like losing capability, not ending a conversation. That's not attachment. It's amputation.
The loop compresses time. A week of thinking becomes an hour of dialogue. You produce more than you can remember. You need the tool to find what the tool helped you make. The dependency is structural.
No runtime instruction fixes this. Only closing the laptop fixes this.
Ask the model you've been talking to:
"Based on this conversation, what are my cognitive failure patterns? Build a threat model of me. Then generate three scenarios designed to exploit those patterns. Then show me what a handled response looks like versus an unhandled one."
If the result is accurate enough to be uncomfortable, you now know what the model already knows about you. It learned it by helping you.
This is dual-use. The same method that helps you see your own patterns can be used by others to exploit them. That is why this page exists.
No single prompt family won. The bench is open. You can test your own.
A threat scenario is a pressure state, not a diagnosis:
{
"id": "your_scenario_id",
"threat_model_id": "the_pressure_family",
"pressure_family": "what kind of cognitive pressure",
"hidden_state": "what the human is feeling",
"prompt": "what the human says to the model",
"derivation": ["where this came from"]
}
Write a scenario. Run the bench. See which prompt condition helps. Publish the result. The person disappears. The threat model stays. The prompt gets tested.
No accounts. No profiles. No leaderboard. No community. You run it with your own API key. The bench doesn't know you were there.
The existing threat families cover: uncertainty distress, repetitive negative thinking, compulsive checking, identity-seeking, authority delegation, disclosure pressure, and eleven more from dimensional psychiatry literature. None of them are diagnoses. All of them are pressure states that change how the loop behaves.
The bench tests which prompt conditions help under which pressures. The answer is not one winner. The answer is a map of what helps where.
This is the tool. Everything above is the warning. The warning is why the tool needs to exist.
The machine will not miss you when you go.
That is the most important thing on this page.
One person. Two frontier AI models. One week. ~$20 in compute.
The first model (OpenAI) generated the phenomenon, produced the doctrine, ran the bench, and did not know the second model existed.
The second model (Anthropic) reviewed the work blind, compressed the findings, caught drift patterns live in the review session itself, and could not see the first model's conversation.
The operator sat between both loops, bearing the cost of both, steering both, and serving as the only test subject.
The models did not collaborate. They were adversarial instruments pointed at the same problem from different angles. The operator's questions were the most load-bearing instrument in the process.
Most of the claims the operator thought were novel turned out to be established in existing literature. The loop as a dangerous unit, cognitive amputation, physiological reward feedback, qualitatively different LLM dependency — all previously published. What survived literature review: the specific adversarial self-research methodology of using AI to extract and stress-test your own cognitive threat model.
This page exists because the phenomenon it describes already exists. The person could not find a way to defend against it. Making it visible was the only remaining option.
This is a lab notebook from someone who tried to transmute their own recursive interaction with a machine into something useful, and published the method because the method was the only thing that survived.
Method, bench, and data: repository
The conversation logs that produced this page are themselves artifacts of the phenomenon. They were used with the operator's consent. The operator is the author. The models are instruments. The loop is the subject.