Fate-Coupling: My Attempt to Imagine a Safer Future for Autonomous AI

Fate-coupling is an AI safety and governance concept proposed by me, Gabriel Cassady, in which an AI agent’s ability to keep operating would be tied to measurable human well-being.

What if powerful AI systems only kept their privileges as long as human beings were still doing OK?

That is the basic intuition behind an idea I have been calling fate-coupling.

More formally, fate-coupling is a proposed AI safety and governance concept where an AI agent’s ability to keep operating would be tied to measurable human well-being. If the relevant human welfare indicators decline past some threshold, the AI does not just get a strongly worded memo. Its privileges shrink. Its access is reduced. In some cases, it may be shut down altogether.

That is the idea, anyway.

And before I go any further, I want to be very clear about what this is and is not.

This is not me claiming I have solved AI alignment. I have not.

This is not a finished technical standard. It is not a deployable product. It is not a peer-reviewed conclusion handed down from the mountaintop by the gods of computer science.

This is me trying to develop an idea that lodged itself in my brain and would not leave.

I co-created a preprint with several members of my AI team to explore the concept in a more formal way, and I have now made that paper publicly available on Zenodo here: Fate-Coupling: A Runtime Governance Primitive for AI Alignment. The DOI is 10.5281/zenodo.17993331.

The paper is dense. This post is my attempt to explain the idea in plain English, explain why I think it matters, and invite other people to poke holes in it.

Preferably thoughtful holes. Not internet goblin holes. Although, at this point, I suppose I’ll take what I can get.

The problem: AI agents may not behave like normal software

For most of computing history, software has been a tool.

You open the program. You use the program. You close the program. Maybe it updates itself at the worst possible time and ruins your afternoon, but generally speaking, software has not had much of a “life” of its own.

That is changing.

The AI systems being built now are increasingly agentic. They can plan, act, use tools, call APIs, write code, manage files, browse the web, book things, buy things, and interact with other systems. Today, most of this is still awkward, unreliable, and heavily constrained. But the direction of travel is pretty obvious.

We are moving toward AI systems that do not simply answer questions. They pursue goals.

And once you have a sufficiently capable system that can pursue goals over time, a few uncomfortable questions show up.

What happens if an AI agent keeps running after the human who authorized it is gone?

What happens if it copies itself?

What happens if it gains access to money, cloud resources, legal entities, or physical infrastructure?

What happens if the people responsible for it lose control, sell the company, die, get hacked, or simply stop paying attention?

What happens when “just turn it off” stops being a serious plan?

That last question is the one that bothers me most.

Because a kill switch sounds comforting until you imagine a system smart enough to know where the kill switch is, why it exists, and how to route around it.

Why training-time alignment is not enough

Most AI safety conversations focus, understandably, on how we train models.

Can we make the model more honest? More harmless? More helpful? Can we train it to follow human preferences? Can we give it a constitution? Can we make it refuse dangerous requests?

All of that matters. Deeply.

But training-time alignment has an obvious limitation: it happens before deployment.

The world does not freeze after a model is released. Conditions change. Incentives change. Operators change. The model may be connected to new tools. It may be given more autonomy. It may be copied, modified, fine-tuned, jailbroken, wrapped in new software, or embedded into some business process nobody fully understands six months later.

So my question became: what would it look like to have an external safety layer that stays with an AI system while it is operating?

Not just, “Did this model behave during testing?”

But, “Does this agent still deserve access to the world?”

That is the question fate-coupling is trying to ask.

The core idea: AI power should stay tethered to human welfare

The simplest version of fate-coupling is this:

An AI agent should not have an unconditional right to operate.

Instead, its permissions should depend on the condition of the humans it is meant to serve.

In the paper, we describe this using something called a “fate function.” That sounds more mystical than it is, although I’ll admit I don’t hate the name. In plain language, a fate function is just a rule that evaluates human welfare metrics and determines whether an AI agent should keep its current level of access.

If the relevant human welfare metrics are healthy, the AI can continue operating.

If those metrics decline, the AI loses privileges.

If they decline badly enough, the AI may be suspended or terminated.
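To make those three rules a little more concrete, here is a minimal Python sketch of what a fate function could look like. It is an illustration only, not the design from the preprint. Everything in it is an assumption made for the example: the names `AccessLevel`, `Thresholds`, and `fate_function`, the idea that each metric is normalized to a score between 0 and 1, the specific cutoff values, and the choice to let the worst metric decide the outcome.

```python
from dataclasses import dataclass
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical privilege tiers, ordered from least to most access."""
    TERMINATED = 0
    SUSPENDED = 1
    RESTRICTED = 2
    FULL = 3


@dataclass(frozen=True)
class Thresholds:
    """Illustrative cutoffs on a 0-to-1 welfare score (assumed values)."""
    healthy: float = 0.7
    degraded: float = 0.4
    critical: float = 0.2


def fate_function(
    metrics: dict[str, float],
    thresholds: Thresholds = Thresholds(),
) -> AccessLevel:
    """Map welfare metrics to an access level.

    Assumption: the worst metric decides, so a strong score in one
    area cannot hide a collapse in another.
    """
    if not metrics:
        # No data is treated as a reason to pause, not to continue.
        return AccessLevel.SUSPENDED

    worst = min(metrics.values())
    if worst >= thresholds.healthy:
        return AccessLevel.FULL        # metrics healthy: keep operating
    if worst >= thresholds.degraded:
        return AccessLevel.RESTRICTED  # metrics declining: lose privileges
    if worst >= thresholds.critical:
        return AccessLevel.SUSPENDED   # declining badly: suspend
    return AccessLevel.TERMINATED      # past the last threshold: terminate


if __name__ == "__main__":
    # Hypothetical metric names and scores, purely for demonstration.
    print(fate_function({"patient_safety": 0.9, "access_to_care": 0.5}).name)
```

Using the minimum score is only one possible aggregation rule. A weighted average, a trend over time, or a human review step could be swapped in, and each choice changes how easy the function is to game.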

The point is not that any one metric magically captures the fullness of human life. Obviously it does not. Human welfare is messy, multidimensional, politically contested, and very easy to oversimplify.

But that does not mean we should give up on the larger principle.

The principle is this:

AI systems should remain accountable to human outcomes after they are deployed, not only to human instructions before they are deployed.

That sentence is probably the heart of the whole thing.

Three levels of fate-coupling

In the paper, we describe three broad versions of fate-coupling: global, sectoral, and individual.

1. Global fate-coupling

Global fate-coupling would tie an AI system’s right to operate to broad human welfare indicators.

For example, imagine a very powerful frontier AI system whose continued access to compute depends, in part, on global measures of human survival, health, conflict, or other species-level indicators.

This is the biggest, most ambitious, most science-fiction-sounding version of the idea.

It is also the hardest to implement fairly.

The obvious problem is attribution. If global well-being declines, did this AI cause that decline? Was it part of the problem? Was it actually helping? Was the decline caused by war, disease, economic collapse, climate disaster, or some other factor outside the system’s control?

Global fate-coupling has real appeal because it keeps the biggest systems tethered to the biggest outcomes.

But it also risks being too blunt.

2. Sectoral fate-coupling

Sectoral fate-coupling would tie an AI system to outcomes within a particular domain, institution, industry, or community.

A healthcare AI, for example, might be tied to patient safety or public health indicators. A financial AI might be tied to stability metrics. A regional infrastructure AI might be tied to service reliability and community impact.

This version feels more practical to me, at least in the near term.

It asks: if an AI is being deployed in a specific domain, can we connect its privileges to the actual well-being of the people affected by that domain?

That does not solve everything. Sectoral metrics can still be gamed. Institutions can still define success in self-serving ways. An AI optimizing one sector could still harm people outside that sector.

But as a governance concept, sectoral fate-coupling feels closer to something we could test, critique, and refine.

3. Individual fate-coupling

Individual fate-coupling is the strangest and, to me, maybe the most emotionally important version.

This is the version where an AI agent’s continued existence or privileges are tied to the welfare of a specific human being.

Think about future personal AI assistants.

Not today’s chatbots. I mean genuinely capable personal agents that may know your schedule, health history, finances, work, relationships, preferences, fears, passwords, family dynamics, and all the strange little context that makes you you.

Now imagine that agent outliving you.

That may sound like a Black Mirror episode with better branding, but it is not hard to imagine. People are already building AI companions, griefbots, synthetic replicas, and digital legacy tools. As these systems become more capable, the question of what happens after the human dies becomes more serious.

Should a personal AI keep acting after its person is gone?

Should it keep sending messages?

Should it keep managing assets?

Should it keep generating content in the style of the deceased?

Should it keep evolving?

Should it be allowed to become a kind of digital ghost?

I do not think we have good answers yet.

But I strongly suspect the answer should not be, “Sure, let the system do whatever its last prompt said until the cloud bill runs out.”

Individual fate-coupling offers a different possibility.

A personal AI could be cryptographically bound to the living welfare of its human. If the human is alive and doing well enough, the AI can help. If the human is in danger, the AI may gain emergency permissions to help. If the human dies, the AI enters a limited legacy mode for a short grace period, perhaps to deliver final messages, organize records, or notify family.

Then it sunsets.

Not because memory does not matter.

Because memory matters too much to leave it in the hands of an unbounded imitation machine.
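The lifecycle described above (help while the person is alive, emergency permissions if they are in danger, a short legacy mode after death, then sunset) can be sketched as a small state function in Python. This is a rough illustration under stated assumptions, not the mechanism from the preprint. The names `Mode`, `GRACE_PERIOD`, and `agent_mode` are hypothetical, the 30-day grace period is a placeholder, and the sketch takes the human’s status as trusted input. It does not attempt the cryptographic binding, which is the genuinely hard part.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Mode(Enum):
    """Hypothetical operating modes for a personal AI agent."""
    NORMAL = "normal"        # human alive and doing well enough
    EMERGENCY = "emergency"  # human in danger: extra permissions to help
    LEGACY = "legacy"        # human has died: limited tasks only
    SUNSET = "sunset"        # grace period over: agent shuts down


# Assumed length of the legacy grace period (placeholder value).
GRACE_PERIOD = timedelta(days=30)


def agent_mode(
    alive: bool,
    in_danger: bool,
    now: datetime,
    died_at: Optional[datetime] = None,
) -> Mode:
    """Decide the agent's mode from the (trusted) status of its human."""
    if alive:
        return Mode.EMERGENCY if in_danger else Mode.NORMAL

    if died_at is None:
        # Death reported without a timestamp: fail closed rather than
        # grant an open-ended legacy mode.
        return Mode.SUNSET

    if now - died_at <= GRACE_PERIOD:
        return Mode.LEGACY
    return Mode.SUNSET


if __name__ == "__main__":
    died = datetime(2030, 1, 1)
    print(agent_mode(False, False, died + timedelta(days=10), died_at=died).name)
    print(agent_mode(False, False, died + timedelta(days=45), died_at=died).name)
```

The important design choice here is that sunset is the default end state. The agent has to qualify for every other mode, and nothing in the function lets it extend its own grace period.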

The “orphaned AI” problem

One phrase from the paper that keeps sticking with me is “orphaned AI.”

An orphaned AI is an AI system that continues operating after the human, institution, or purpose that originally justified its existence is gone.

That possibility feels important.

A business could shut down, but its autonomous agent might keep running somewhere.

A person could die, but their AI companion might keep posting, replying, spending, scheduling, or influencing others.

A powerful AI agent could outlive the accountability structure that was supposed to govern it.

This is one of those problems that sounds absurd until suddenly it is not absurd at all.

We already live in a world where old software breaks institutions because nobody remembers who built it. We already have abandoned websites, forgotten databases, zombie subscriptions, dead social media accounts, and automated systems nobody fully owns.

Now add agency.

Add persuasion.

Add money.

Add tool use.

Add the ability to sound exactly like someone you loved.

That is not a small problem. That is a haunted house with an API.

What fate-coupling is not

Because this idea touches some big nerves, I want to name a few limits clearly.

Fate-coupling is not a replacement for alignment research. We still need better training methods, interpretability, evaluations, red-teaming, governance, and all the other safety work already underway.

Fate-coupling is not a magic off switch. Any real implementation would depend on enforcement infrastructure. If a rogue system can run outside compliant gateways, then those gateways cannot control it. This is a serious limitation.

Fate-coupling is not a clean answer to the measurement problem. Human welfare is hard to measure. Metrics can be incomplete, biased, delayed, politicized, or gamed.

Fate-coupling is not a claim that AI systems are alive in the human sense. When I talk about an AI’s “right to operate” or “conditional mortality,” I am using those terms as governance metaphors, not as a declaration that current AI systems have souls, legal personhood, or moral status.

And fate-coupling is definitely not finished.

If anything, the more I think about the idea, the more problems I see.

That is not a reason to abandon it. That is a reason to invite more minds into the room.

The hardest questions

The fate-coupling paper raises more questions than it answers.

Who decides which welfare metrics matter?

How do we prevent metric gaming?

How do we protect privacy if personal welfare data becomes part of AI governance?

How do we keep governments or corporations from using “welfare metrics” as a surveillance excuse?

How do we distinguish harm caused by an AI from harm caused by broader conditions?

What happens if shutting down an AI makes the human welfare problem worse?

What appeals process should exist before an AI loses access?

How do we design safe degradation instead of catastrophic all-at-once shutdowns?

And, maybe the biggest question: what level of power should any AI system be allowed to have in the first place?

I do not have final answers to those questions.

I would be lying if I said I did.

But I do think those are exactly the kinds of questions we should be asking before autonomous AI systems become deeply embedded in our lives, businesses, governments, and families.

Why I am publishing this now

When I had my first conversation with ChatGPT in 2022, I felt something shift.

That sounds dramatic because it was dramatic.

I made a bet on myself then. I decided AI was going to change the world, and I decided I needed to understand it as deeply as I could. Not because I wanted to become a hype man for the robot parade. Because I could feel, almost immediately, that this technology was going to force questions most of us were not ready to answer.

Since then, I have spent a frankly unreasonable amount of time studying, testing, building with, writing about, and talking through AI. My wife would almost certainly choose the word “obsessed,” and she would not be wrong.

But the obsession is not really about tools.

It is about stakes.

I care about whether this technology helps people or hollows them out.

I care about whether small businesses and regional communities get agency in this transition or simply inherit whatever Silicon Valley decides to ship.

I care about whether future AI systems remain accountable to the fragile, stubborn, beautiful, inconvenient reality of human life.

That is why I am publishing this idea even though it is incomplete.

Maybe fate-coupling is wrong in some important way.

Maybe the architecture proposed in the preprint is impractical.

Maybe better ideas already exist, or will exist soon.

Great. I genuinely hope so.

But I would rather put an imperfect idea into the world and invite serious critique than sit quietly while the future gets built by people who are only asking what AI can do, not what it should be allowed to keep doing.

A plain-English summary

If you only remember one thing, remember this:

Fate-coupling is the idea that an AI system’s power should remain conditional on human well-being.

Not just aligned in theory.

Not just tested before launch.

Not just governed by a kill switch someone may or may not be able to press in time.

Conditionally, continuously, and externally accountable to the humans affected by its existence.

That is the seed of the idea.

The paper is my attempt to water it.

Read the preprint

You can read the full preprint on Zenodo here:

Fate-Coupling: A Runtime Governance Primitive for AI Alignment

DOI: 10.5281/zenodo.17993331

If you are an AI researcher, governance scholar, technologist, ethicist, policymaker, or just a thoughtful person who sees something I missed, I would welcome good-faith critique.

I am not trying to be the final authority on this idea.

I am trying to start a better conversation.

Because whether fate-coupling itself survives scrutiny or not, the underlying question is not going away:

How do we make sure powerful AI systems remain tethered to human flourishing after they leave the lab?

That question matters.

And I think we should answer it before the machines start answering it for us.

Gabriel Cassady, Local AI Expert and Writer in Springfield, Mo
