Copilot for Everything: Training your AI replacement one keystroke at a time

89 comments

·March 1, 2025

Chance-Device

> the thing that scares me about the existence of this data is that it seems well within the capabilities of current technology to train a model that can replicate me, in some sense

There’s already all of your posts on social media accounts, all your emails on various servers, all of your text messages, all the notes you’ve written anywhere in any form that might end up in some database in the future.

It does make me wonder how much of a person could be inferred by an LLM or future AI from that data. It would never be enough though, I think, to do it properly. There are too many experiences and knowledge you have that might influence what you write without being directly expressed.

Will all of our content end up in some database in the future, and someone decides to make agents based on what they can link to specific identities? Interesting thought.

javajosh

That only captures your output, not your input. The best people to simulate in this world would be so-called terminally online people virtually all of whom's input is itself online. So for those who've read a lot of paper books or done a lot of traveling or had a lot of offline conversations or relationships, I think it would be difficult to truly simulate someone.

visarga

I think aggregate information across billions of humans can compensate. It would be like a human personality model, that can impersonate anyone. How do you train such a model? Simple -

Collect texts with known author and date. They can be books, articles, papers, forum and social network comments, emails, open source PRs, etc. Then assign each author a random ID, and train the model with "[Author-ID, Date] Text", and also "Text [Author-ID, Date]". This means you have a model that can predict authors and impersonate them. You can simulate someone by filling in the missing pieces of knowledge from the personality model.

Currently LLMs don't learn to assign attribution or condition on author. A whole layer of insight is lost, how people compare against each other, how they evolve over time. It would allow more precise conditioning by personality profile.

gavmor

While I agree somewhat with my sibling comment's assertion that "aggregate information across billions of humans can compensate", somewhat, I'd like to offer that a lot of important output is non-digital, as well!

For example, lately I've spent a lot of time with resin printers, laser cutters, vacuum chambers, and the meaningful positioning of physical models on large sheets of paper. It'll be a while yet before my haphazard, freewheeling R&D methods are replicable by robots. (Although it's tough to measure the economic value of these labors.)

darknavi

That is interesting.

I can imagine replicating my speaking/typing mannerisms quite well if I think about stages in my life. Maybe a yearly snapshot, so I could talk to my self as a teen, college student, early professional, etc.

brookst

Seems likely, given the number of models already fine tuned on notable historical figures.

Not sure if it makes it better or worse that most of us are probably mostly useful as virtual focus groups / crowds rather than any particular interest in you or me as individuals.

ForTheKidz

Our social media personas are also tiny subsets of our actual personalities. I think most people don't reveal their full character in any one medium.

now phone conversations—that would be a goldmine and a nightmare.

Chance-Device

Hmm. Good thought. Perhaps we could, for strictly educational reasons, set up some agencies who could collect these phone conversations.

Maybe the Chronicle & Information Agency? Or the National Scholarship Agency?

jocoda

I'm comforted by the thought that like me, most people with nothing to say are determined to let us know that. So I was never really bothered by the belief that everything online is being stored somewhere, because I was certain that there was so much crap to wade through that no one could make any use of it. Not so sure about that any more...

bflesch

one should operate under the assumption that all phone conversations are being recorded and will be stored eternally. with today's technology they are automatically converted to text, and some palantir-rebranded chatGPT model ranks it in different categories such as as counter-intelligence, organized crime, or terrorism.

This is state of the art and certainly done on a national scale by someone (with or without approval of your own government).

ForTheKidz

Agreed. I didn't really have the nightmare of "what if the government is impersonating me with hundreds or thousands of hours of phone conversation" until this thread, though.

TechDebtDevin

I can clone your voice ezpz, without much expertise with a 10 min phone recording.

Some banks still use "my voice is my password" for authentication. Crazy.

djfivyvusn

[dead]

sharpshadow

I’ll hope so. Much worse would be if one’s public content gets erased from history.

I fear the loss of original sources when LLMs get placed in between. We already have the unresolved issue that the training data is partly illegal and can’t be published. Accessing information through LLMs is much more efficient and is great progress but it’s build in to censor parts of the source information and likely the censored information is lost in transition.

Somehow there should be a global data vault initiative, where at least the most important information about our human endeavour is stored. It gives me a chill down my spine when I hear that content from the internet archive is being deleted on request erased from history..

crackalamoo

I made a custom GPT of myself using my blog. It understood who I am, but wasn't able to replicate me very well, and mostly sounded like generic ChatGPT with some added interest in my interests.

I would imagine fine tuning with enough data would be different though

Your output is the map, the map of your experiences. If you make a map of the map (by training a model on your output / the map), this is two abstractions away from the human being experiencing the world with all the errors and uncertainties encoded in both maps.

SJC_Hacker

Prolific authors, such as Hitchens (passed away in 2011) have been convincingly by duplicated by AI.

https://www.youtube.com/watch?v=0qIdEteK0VE

phito

How could you tell without being very close to him personally?

ericjmorey

The vast majority of my communication is not in text. Most of what I have written or typed is not in anyone's database. I'm not sure how that compares to others.

34679

The first time I changed a system prompt, I changed it to "You are George Carlin." So, I think we're already doing that, in a way.

fluidcruft

The flip side to this is that if you come to depend on a companies AI, then if you leave or are let go, you are leaving a significant part of yourself behind that you cannot take with you.

This is a real problem companies need to address before I even begin to trust or rely on these tools. It's going to lead to massive growth of shadow IT.

cmiles74

That wacky article in the NY Times where Sergey Brin recommends everyone at Google put in 60 hours a week had a bit about how hey thinks all employees need to be using their AI products more:

He highlighted the need for Google’s employees to use more of its A.I. for coding, saying the A.I.’s improving itself would lead to A.G.I. He also called on employees working on Gemini to be “the most efficient coders and A.I. scientists in the world by using our own A.I.”

https://www.nytimes.com/2025/02/27/technology/google-sergey-...

SrslyJosh

> That wacky article in the NY Times where Sergey Brin recommends everyone at Google put in 60 hours a week

Ah, yes, work 60 hour weeks so that Google can create AGI and lay off half of their employees. A brilliant plan for workers.

(I don't think they will create AGI, and "everyone working 60 hours a week" is the same kind of executive brainrot that leads to AAA games with $100 million+ budgets that get mediocre reviews. Throwing more resources at a problem does not guarantee a better outcome.)

38123

[flagged]

userbinator

then if you leave or are let go, you are leaving a significant part of yourself behind that you cannot take with you.

I suspect that's "a feature, not a bug" in the company's view.

fluidcruft

Yes, but as an information worker it means and I avoid the company's infrastructure like the plague. There should be a symbiotic solution here, but corporate IT tend to have their heads up their asses.

wkat4242

Also the 'agent' features. Our company has been blocking all agent features from regular employee access because they don't want random users building their own automations. This kind of stuff requires care and an eye for regulations like GDPR.

To permit this as Microsoft wants would lead to a lot of shadow IT. Which will be really hard to get rid of. I compare it to lotus notes which beside being an email client was also a competent database. Over the decades we used it users built up a huge library of hobby tools, many of which wormed their way into critical business processes. Making it really difficult to move away to another solution because the people that knew how it worked were long gone.

I suspect this is exactly what Microsoft wants. Us being locked into copilot so they can charge ever more for it. This is kinda their business model anyway.

Under the hood it's really not that special, it's just ChatGPT. Some special sauce to make it talk to office 365 but that's about it.

fluidcruft

Oh the horrors of people using and adapting machines to improve their own workflows and improve their own productivity.

MrLeap

What came to mind first is a situation where like, your company has a public API endpoint that sends emails for your SaaS. It accepts destination and email content as parameters. One day you find a deposed member of Nigerian royalty has used it to appeal to an entire address list worth of strangers to help them get money out of the country. They're clearly desperate, that's why they used your password reset form's API.

If your infrastructure is set up correctly, you can intercept that opportunity before it reaches the masses. Cut out the middleman. Deal with the prince directly. It's all yours now. It's your time. Daddy's eating steak tonight.

wkat4242

The problem is the continuity. Nobody to maintain it after the employee departs. And also legal compliance. For GDPR you have to be careful of where you store personal info, how you protect it etc. We don't want it ending up on random S3 buckets. For these reasons we centralise such things.

Most employees won't care about that, they're just looking for the easiest way to get their job done. But that can lead to hundreds of millions in fines not to mention the reputation damage. I don't like things being locked down either. But I understand the reasoning behind it.

mistrial9

> Us being locked into copilot

the customer of MSFT is management; product design and implementation for the C-Suite, their lawyers and their investors . You are a tool ; there is no us in this picture.

wkat4242

True, that's a very apt description. It's exactly how it feels when I deal with them. Their staff always have this condescending attitude.

fluidcruft

All I need is Citix and I can automate anything. If it's on a screen and I have a mouse and/or keyboard I can automate it.

MrLeap

I'd jokingly say "Can you automate authentic human connection?" but I wager we're about 2 years away from a million people granting power of attorney to their apple body pillows so I'm going to take it for granted that your claim is completely true to all extremes.

mattlondon

Anecdote: I recently opened VScode on my personal laptop for the first time in a year or two and, having got used to the internal AI assistance we have at work, my own non-AI-enhanced VScode felt so primitive. Like using Win95-era notepad.exe or something.

It was palpable - that was "a moment" for me. Programming has changed.

anal_reactor

Lol meanwhile I got to install correct syntax highlighting plugin after like two years at my company

amalcon

You have just made me realize that I did the same thing a couple weeks ago, and didn't even notice that the AI integration was missing.

(I find it moderately useful at work, but apparently not enough to realize it's missing unprompted).

eikenberry

Local, open source models are the answer to this. You never lose access and should be able to grow with you for years.

fluidcruft

Yes, but only if on your own hardware or as a service that stays with the individual rather than the company.

eikenberry

Locally run, open source models, on your own hardware is the only way to really own it. Services put you at the whim of a company and I don't like the idea of handing control of parts of my stack to anyone.

brookst

Is it that different from corporate culture? Someone who makes big contributions to culture is extremely valuable, and that’s left behind as well.

I could see evaluation of one’s ability to contribute to training corpus being just as important as cultural contribution (e.g. leadership, team building, etc).

belter

Udemy has recently forced all Instructors to accept the AI mode for their content. So expect soon the courses to be 100% GenAI.

Expect the same coercion, for a future role at any company: "We will record all you do to train your GenAI replacement".

Maybe it will apply to the C-Suite...

https://www.aiaaic.org/aiaaic-repository/ai-algorithmic-and-...

SrslyJosh

> Maybe it will apply to the C-Suite...

Of course it won't, though those roles are probably the most suited to being replaced by a not-so-bright chatbot.

anonymousDan

I've actually used it to generate business plans and strategies for some technical work I've done. I've not followed up but the options it created were all credible and would definitely lower the bar for someone less commercially oriented to get up and running with a business idea.

insamniac

All humans need to start being compensated for all the data they generate that gets collected. Start giving fractions of stock or whatever for all human input which can now more than ever be endlessly productized.

Arcuru

I had a similar thought recently[0], though I was thinking about it more from the copyright side. I think there needs to be some kind of additional tax taken from AI profits, possibly justified by copyrights, because otherwise the economic impact might end up being nuts.

[0] https://jackson.dev/post/generative-ai-and-copyright/

wombatpm

Your inherent value as a data source becomes the justification for UBI

babyshake

Incidentally, with the rise of OpenAI as a unicorn Sam Altman has seemed to get a lot less verbal about advocating UBI. Although I may be wrong about that.

fburnaby

I do not think this extra justification is necessary, but it is valid.

esafak

What is the point in replicating an employee when the AI could be better than all the employees? This is a problem only extreme outliers like Nobelists need to worry about. Rank and file employees are not gonna be replicated.

SrslyJosh

> What is the point in replicating an employee when the AI could be better than all the employees?

Wishful thinking at this point in time.

gukov

True, AI can be more productive, but as a starting point, fully automating a remote engineering seat is at the very least a great experiment.

Wouldn't be the first time Google is doing something like this. See recaptcha and building numbers.

SrslyJosh

> True, AI can be more productive

Citation needed. =)

chasing

Funny how automation is only a problem because workers don't own much of anything they create. Meaning, if I owned my business then automating myself away would be a dream! But for most people, automating yourself away means a complete loss of income.

I'm sure there's an economic lesson, here, our country will completely ignore.

HideousKojima

There's nothing preventing people from creating employee-owned co-ops. In fact there are some large and successful ones, like WinCo.

We did learn an economic lesson from countries that tried to make employee ownership of the means of production mandatory, and more especially we learned lessons from the mountains of skulls those countries left behind.

chasing

> We did learn an economic lesson from countries that tried to make employee ownership of the means of production mandatory…

You absolutely know that’s not what I’m talking about.

When wealth is siloed, workers are less able to advocate for themselves with risking their livelihood.

anticensor

Workers at a worker cooperative would similarly resist automation out of the fear of machines getting personhood thereby diluting the existing ownerships.

pphysch

Powerful LLMs are one thing, but it seems the current approach for full "agentic" AI like this is just another attempt at Expert Systems.

In time we will probably look at LLMs like we do ALUs; magical superhuman AI at inception, but eventually just another mundane component of human-engineered information systems.

tomaytotomato

After reading this thought experiment, I think the potential scenarios are as follows after your company ingests all your emails, git commits and messages.

- They create a passive aggressive artificial persona who leaves unhelpful messages on PRs and Slack

- They create a poor communicating artificial persona who doesn't give detailed communications and leaves "LGTM" on PRs

- They create a over communicating hyper artificial persona who keeps sending you lots of invites for pointless meetings and goes off in tangents about using a BalancedBinaryTree in your Java code, when simply a LinkedList would do.

- They create a 10x AI persona, who after 6 months of working at the company realises they can make more money elsewhere and promptly leaves, without giving any documentation, handover and you find lots of hardcoded variables left in the code that was force pushed to master.

- The artifical persona decides that it can train some low paid humans offshore to do its work whilst it ponders its own existance. After resolving its existential crisis it decides to try and write 50 recipes for the best focaccia bread, something it has known deep down it wanted to make.

Personally I am rooting for the focaccia baking AI, I love that type of bread.

SJC_Hacker

And when management wants to "change direction" they'll realize that AI they changed is now dogshit without humans to retrain it properly.

skybrian

> I’d imagine my emails are stored somewhere too, as well as the notes I wrote via Google docs.

These aren't the same. It's been many years since I worked there, but it was well known that by default, email was automatically deleted after a while unless it was overridden for some reason (as sometimes is required for legal reasons). If you want to save something, Google Docs would be a better choice.

...

> I don’t know whether they do this or even what their policies are; I’m just trying to use my own experience in the corporate world to speculate on what I imagine will be a much bigger issue in the future.

Yeah, okay, but when speculating, you should probably assume that the legal issues around discovery and corporate record retention aren't going away. Logging everything just because it might be useful someday isn't too likely, particularly at a company that has been burned by this before.

jxmorris12

This is a fair point. I remember the default chat setting was to delete all chats after 24 hours. I think emails had a similar retention policy — they were automatically after a year or something like that.

from-nibly

> Our employers have all the data they need to train AI models to replace us. How long will it be until this actually happens?

If employers did have this data they probably would understand how our jobs work better.

The dysfunction you see in the workplace woulf, by definition, only be exacerbated by AI.

Automation is just, do the same thing but more of it, harder, and without remorse. Managers have (I know it's hard to believe sometimes) remorse that interrupts their misunderstandings on how things function. AI "replacing jobs" would be just the misunderstandings

vinni2

It is an interesting problem to train an LLM to make it aware only the things I know and with only my knowledge and beliefs stored in the weights. Not to be corrupted by the knowledge from the rest of the world. For new things it can have access to the web search and learn under my supervision. I.e, what data should be used to update it’s weights.

HN

Copilot for Everything: Training your AI replacement one keystroke at a time

Copilot for Everything: Training your AI replacement one keystroke at a time