Copilot for Everything: Training your AI replacement one keystroke at a time
85 comments
· March 1, 2025
Chance-Device
darknavi
That is interesting.
I can imagine replicating my speaking/typing mannerisms quite well if I think about stages in my life. Maybe a yearly snapshot, so I could talk to myself as a teen, college student, early professional, etc.
brookst
Seems likely, given the number of models already fine-tuned on notable historical figures.
Not sure if it makes it better or worse that most of us are probably mostly useful as virtual focus groups/crowds, rather than there being any particular interest in you or me as individuals.
ForTheKidz
Our social media personas are also tiny subsets of our actual personalities. I think most people don't reveal their full character in any one medium.
Now phone conversations—that would be a goldmine and a nightmare.
jocoda
I'm comforted by the thought that like me, most people with nothing to say are determined to let us know that. So I was never really bothered by the belief that everything online is being stored somewhere, because I was certain that there was so much crap to wade through that no one could make any use of it. Not so sure about that any more...
Chance-Device
Hmm. Good thought. Perhaps we could, for strictly educational reasons, set up some agencies who could collect these phone conversations.
Maybe the Chronicle & Information Agency? Or the National Scholarship Agency?
TechDebtDevin
I can clone your voice ezpz, without much expertise, with a 10-minute phone recording.
Some banks still use "my voice is my password" for authentication. Crazy.
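Something like this, with Coqui's open XTTS model (`pip install TTS`; the file names here are placeholders, and this is just a sketch of one open approach, not a recipe anyone's bank should feel safe about):

```python
from TTS.api import TTS

# Zero-shot voice cloning sketch: a short reference clip of the target speaker
# is enough to synthesize arbitrary speech in (roughly) their voice.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text="Hi, it's really me. My voice is my password.",
    speaker_wav="ten_minute_phone_recording.wav",  # the cloned speaker
    language="en",
    file_path="cloned_voice.wav",
)
```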
bflesch
One should operate under the assumption that all phone conversations are being recorded and will be stored eternally. With today's technology they are automatically converted to text, and some Palantir-rebranded ChatGPT model ranks them into categories such as counter-intelligence, organized crime, or terrorism.
This is state of the art and certainly done on a national scale by someone (with or without approval of your own government).
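The pieces are all commodity parts today. A toy sketch of that pipeline (openai-whisper for transcription; the categories and keyword scoring are made-up stand-ins for whatever a real system actually does):

```python
import whisper

# Speech-to-text: transcribe a recorded call.
model = whisper.load_model("base")
text = model.transcribe("call_recording.wav")["text"].lower()

# Crude category ranking; a real system would use a classifier or an LLM.
CATEGORIES = {
    "counter-intelligence": ["embassy", "asset", "handler"],
    "organized crime": ["shipment", "launder", "protection"],
    "terrorism": ["device", "target", "cell"],
}
scores = {cat: sum(kw in text for kw in kws) for cat, kws in CATEGORIES.items()}
print(max(scores, key=scores.get), scores)
```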
ForTheKidz
Agreed. I didn't really have the nightmare of "what if the government is impersonating me with hundreds or thousands of hours of phone conversation" until this thread, though.
sharpshadow
I’ll hope so. Much worse would be if one’s public content gets erased from history.
I fear the loss of original sources when LLMs get placed in between. We already have the unresolved issue that the training data is partly illegal and can't be published. Accessing information through LLMs is much more efficient and is great progress, but censorship of parts of the source information is built in, and the censored information is likely lost in the transition.
Somehow there should be a global data vault initiative, where at least the most important information about our human endeavour is stored. It sends a chill down my spine when I hear that content from the Internet Archive is being deleted on request, erased from history.
crackalamoo
I made a custom GPT of myself using my blog. It understood who I am, but wasn't able to replicate me very well, and mostly sounded like generic ChatGPT with some added interest in my interests.
I would imagine fine-tuning with enough data would be different, though.
WA
Your output is the map, the map of your experiences. If you make a map of the map (by training a model on your output / the map), this is two abstractions away from the human being experiencing the world with all the errors and uncertainties encoded in both maps.
SJC_Hacker
Prolific authors, such as Hitchens (who died in 2011), have been convincingly duplicated by AI.
phito
How could you tell without being very close to him personally?
javajosh
That only captures your output, not your input. The best people to simulate in this world would be the so-called terminally online, virtually all of whose input is itself online. For those who've read a lot of paper books, done a lot of traveling, or had a lot of offline conversations or relationships, I think it would be difficult to truly simulate them.
visarga
I think aggregate information across billions of humans can compensate. It would be like a human personality model that can impersonate anyone. How do you train such a model? Simple:
Collect texts with known author and date. They can be books, articles, papers, forum and social network comments, emails, open source PRs, etc. Then assign each author a random ID, and train the model with "[Author-ID, Date] Text", and also "Text [Author-ID, Date]". This means you have a model that can predict authors and impersonate them. You can simulate someone by filling in the missing pieces of knowledge from the personality model.
Currently LLMs don't learn to assign attribution or condition on author. A whole layer of insight is lost: how people compare against each other, how they evolve over time. It would allow more precise conditioning by personality profile.
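A minimal sketch of that data prep (the ID scheme and record format here are illustrative, not anyone's actual pipeline):

```python
import random
import string

def make_author_id(rng: random.Random) -> str:
    # Opaque random ID, so the model learns the persona rather than the public name.
    return "A-" + "".join(rng.choices(string.ascii_uppercase + string.digits, k=8))

def build_examples(records, seed=0):
    # records: iterable of (author, date, text) tuples.
    # Each text is emitted twice: once conditioned on the author (impersonation)
    # and once followed by the author (attribution), as described above.
    rng = random.Random(seed)
    ids = {}
    examples = []
    for author, date, text in records:
        aid = ids.setdefault(author, make_author_id(rng))
        examples.append(f"[{aid}, {date}] {text}")
        examples.append(f"{text} [{aid}, {date}]")
    return examples

print(build_examples([("alice", "2014-06-01", "I think maps lie.")]))
```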
gavmor
While I agree somewhat with my sibling comment's assertion that "aggregate information across billions of humans can compensate", I'd like to offer that a lot of important output is non-digital as well!
For example, lately I've spent a lot of time with resin printers, laser cutters, vacuum chambers, and the meaningful positioning of physical models on large sheets of paper. It'll be a while yet before my haphazard, freewheeling R&D methods are replicable by robots. (Although it's tough to measure the economic value of these labors.)
34679
The first time I changed a system prompt, I changed it to "You are George Carlin." So, I think we're already doing that, in a way.
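For anyone who hasn't tried it: a minimal sketch with the current OpenAI Python client (the model name is illustrative):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are George Carlin."},  # the persona prompt
        {"role": "user", "content": "What do you think of smartphones?"},
    ],
)
print(resp.choices[0].message.content)
```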
ericjmorey
The vast majority of my communication is not in text. Most of what I have written or typed is not in anyone's database. I'm not sure how that compares to others.
fluidcruft
The flip side to this is that if you come to depend on a company's AI, then if you leave or are let go, you are leaving a significant part of yourself behind that you cannot take with you.
This is a real problem companies need to address before I even begin to trust or rely on these tools. It's going to lead to massive growth of shadow IT.
cmiles74
That wacky article in the NY Times where Sergey Brin recommends everyone at Google put in 60 hours a week had a bit about how he thinks all employees need to be using their AI products more:
> He highlighted the need for Google’s employees to use more of its A.I. for coding, saying the A.I.’s improving itself would lead to A.G.I. He also called on employees working on Gemini to be “the most efficient coders and A.I. scientists in the world by using our own A.I.”
https://www.nytimes.com/2025/02/27/technology/google-sergey-...
SrslyJosh
> That wacky article in the NY Times where Sergey Brin recommends everyone at Google put in 60 hours a week
Ah, yes, work 60 hour weeks so that Google can create AGI and lay off half of their employees. A brilliant plan for workers.
(I don't think they will create AGI, and "everyone working 60 hours a week" is the same kind of executive brainrot that leads to AAA games with $100 million+ budgets that get mediocre reviews. Throwing more resources at a problem does not guarantee a better outcome.)
userbinator
> then if you leave or are let go, you are leaving a significant part of yourself behind that you cannot take with you.
I suspect that's "a feature, not a bug" in the company's view.
fluidcruft
Yes, but as an information worker it means I avoid the company's infrastructure like the plague. There should be a symbiotic solution here, but corporate IT tend to have their heads up their asses.
wkat4242
Also the 'agent' features. Our company has been blocking all agent features from regular employee access because they don't want random users building their own automations. This kind of stuff requires care and an eye for regulations like GDPR.
To permit this as Microsoft wants would lead to a lot of shadow IT, which will be really hard to get rid of. I compare it to Lotus Notes, which besides being an email client was also a competent database. Over the decades we used it, users built up a huge library of hobby tools, many of which wormed their way into critical business processes, making it really difficult to move to another solution because the people who knew how they worked were long gone.
I suspect this is exactly what Microsoft wants. Us being locked into copilot so they can charge ever more for it. This is kinda their business model anyway.
Under the hood it's really not that special; it's just ChatGPT, plus some special sauce to make it talk to Office 365, but that's about it.
fluidcruft
Oh the horrors of people using and adapting machines to improve their own workflows and improve their own productivity.
MrLeap
What came to mind first is a situation where like, your company has a public API endpoint that sends emails for your SaaS. It accepts destination and email content as parameters. One day you find a deposed member of Nigerian royalty has used it to appeal to an entire address list worth of strangers to help them get money out of the country. They're clearly desperate, that's why they used your password reset form's API.
If your infrastructure is set up correctly, you can intercept that opportunity before it reaches the masses. Cut out the middleman. Deal with the prince directly. It's all yours now. It's your time. Daddy's eating steak tonight.
fluidcruft
All I need is Citrix and I can automate anything. If it's on a screen and I have a mouse and/or keyboard, I can automate it.
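e.g. a toy sketch with pyautogui (the screenshot file is hypothetical; since it drives the real mouse and keyboard, it works on anything rendered to the screen, Citrix included):

```python
import pyautogui

try:
    # Find a control by screenshot matching, click it, and type like a human.
    button = pyautogui.locateCenterOnScreen("submit_button.png")
    pyautogui.click(button)
    pyautogui.typewrite("automated input", interval=0.05)
except pyautogui.ImageNotFoundException:
    print("Button not on screen; nothing to automate.")
```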
MrLeap
I'd jokingly say "Can you automate authentic human connection?" but I wager we're about 2 years away from a million people granting power of attorney to their apple body pillows so I'm going to take it for granted that your claim is completely true to all extremes.
mistrial9
> Us being locked into copilot
the customer of MSFT is management; product design and implementation are for the C-suite, their lawyers, and their investors. You are a tool; there is no "us" in this picture.
wkat4242
True, that's a very apt description. It's exactly how it feels when I deal with them. Their staff always have this condescending attitude.
mattlondon
Anecdote: I recently opened VScode on my personal laptop for the first time in a year or two and, having got used to the internal AI assistance we have at work, my own non-AI-enhanced VScode felt so primitive. Like using Win95-era notepad.exe or something.
It was palpable - that was "a moment" for me. Programming has changed.
amalcon
You have just made me realize that I did the same thing a couple weeks ago, and didn't even notice that the AI integration was missing.
(I find it moderately useful at work, but apparently not enough to realize it's missing unprompted).
anal_reactor
Lol, meanwhile I got to install the correct syntax highlighting plugin after like two years at my company.
eikenberry
Local, open source models are the answer to this. You never lose access, and they should be able to grow with you for years.
fluidcruft
Yes, but only if on your own hardware or as a service that stays with the individual rather than the company.
eikenberry
Locally run, open source models, on your own hardware is the only way to really own it. Services put you at the whim of a company and I don't like the idea of handing control of parts of my stack to anyone.
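For example, a minimal local-inference sketch with llama-cpp-python (the GGUF path is a placeholder for whatever weights you've downloaded):

```python
from llama_cpp import Llama

# Everything runs on your own hardware; no service dependency.
llm = Llama(model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf", n_ctx=4096)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize my notes on shadow IT."}]
)
print(out["choices"][0]["message"]["content"])
```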
brookst
Is it that different from corporate culture? Someone who makes big contributions to culture is extremely valuable, and that’s left behind as well.
I could see evaluation of one's ability to contribute to the training corpus becoming just as important as cultural contribution (e.g. leadership, team building, etc.).
insamniac
All humans need to start being compensated for all the data they generate that gets collected. Start giving fractions of stock or whatever for all human input which can now more than ever be endlessly productized.
Arcuru
I had a similar thought recently[0], though I was thinking about it more from the copyright side. I think there needs to be some kind of additional tax taken from AI profits, possibly justified by copyrights, because otherwise the economic impact might end up being nuts.
wombatpm
Your inherent value as a data source becomes the justification for UBI
babyshake
Incidentally, with the rise of OpenAI as a unicorn, Sam Altman seems to have become a lot less vocal about advocating UBI. Although I may be wrong about that.
fburnaby
I do not think this extra justification is necessary, but it is valid.
from-nibly
> Our employers have all the data they need to train AI models to replace us. How long will it be until this actually happens?
If employers did have this data they probably would understand how our jobs work better.
The dysfunction you see in the workplace would, by definition, only be exacerbated by AI.
Automation is just: do the same thing, but more of it, harder, and without remorse. Managers have (I know it's hard to believe sometimes) remorse that interrupts their misunderstandings of how things function. AI "replacing jobs" would just be those misunderstandings, automated.
esafak
What is the point in replicating an employee when the AI could be better than all the employees? This is a problem only extreme outliers like Nobelists need to worry about. Rank and file employees are not gonna be replicated.
SrslyJosh
> What is the point in replicating an employee when the AI could be better than all the employees?
Wishful thinking at this point in time.
gukov
True, AI can be more productive, but as a starting point, fully automating a remote engineering seat would at the very least be a great experiment.
Wouldn't be the first time Google has done something like this. See reCAPTCHA and house numbers.
SrslyJosh
> True, AI can be more productive
Citation needed. =)
SJC_Hacker
And when management wants to "change direction", they'll realize that the AI is now dogshit without humans to retrain it properly.
tomaytotomato
After reading this thought experiment, I think the potential scenarios are as follows after your company ingests all your emails, git commits and messages.
- They create a passive aggressive artificial persona who leaves unhelpful messages on PRs and Slack
OR
- They create a poor communicating artificial persona who doesn't give detailed communications and leaves "LGTM" on PRs
OR
- They create an over-communicating, hyperactive artificial persona who keeps sending you lots of invites for pointless meetings and goes off on tangents about using a BalancedBinaryTree in your Java code, when a simple LinkedList would do.
OR
- They create a 10x AI persona, who after 6 months of working at the company realises they can make more money elsewhere and promptly leaves, without giving any documentation, handover and you find lots of hardcoded variables left in the code that was force pushed to master.
OR
- The artificial persona decides that it can train some low-paid humans offshore to do its work whilst it ponders its own existence. After resolving its existential crisis, it decides to try and write 50 recipes for the best focaccia bread, something it has known deep down it wanted to make.
Personally I am rooting for the focaccia baking AI, I love that type of bread.
skybrian
> I’d imagine my emails are stored somewhere too, as well as the notes I wrote via Google docs.
These aren't the same. It's been many years since I worked there, but it was well known that by default, email was automatically deleted after a while unless it was overridden for some reason (as sometimes is required for legal reasons). If you want to save something, Google Docs would be a better choice.
...
> I don’t know whether they do this or even what their policies are; I’m just trying to use my own experience in the corporate world to speculate on what I imagine will be a much bigger issue in the future.
Yeah, okay, but when speculating, you should probably assume that the legal issues around discovery and corporate record retention aren't going away. Logging everything just because it might be useful someday isn't too likely, particularly at a company that has been burned by this before.
jxmorris12
This is a fair point. I remember the default chat setting was to delete all chats after 24 hours. I think emails had a similar retention policy — they were automatically deleted after a year or something like that.
vinni2
It is an interesting problem to train an LLM so that it is aware of only the things I know, with only my knowledge and beliefs stored in the weights, not corrupted by knowledge from the rest of the world. For new things it could have access to web search and learn under my supervision, i.e., I would decide what data should be used to update its weights.
godelski
I have a problem with many text replacement systems. Enough that I turn every one off (I still can't figure out iPhone [0]). The major problem is that it'll replace a word while I'm typing, so I don't notice and keep going, only to have to go back. All under a second. Beginning the next word appears to lock in the prior word's replacement. So now "and" I always type manually, because it always appears as "ABs", "Ave", or something else. Words which are far less[1] common. It increases my workload, not decreases it.
[0] Apple settings make no sense. How do I still get two-word replacements (see [1]) when I turned off autocorrect? Predictive text? Check spelling? I've tried them all! And why the fuck does "auto capitalize" not capitalize the letter i? It's so prolific you can identify an iPhone user by that single error. How the fuck does no one think to fix this?
[1] "far" was accidentally swiped as "fast", then as I typed "less" it became "day less". Stop changing two words! 95% of the time it is erroneously changing a word. 4% of the time it's a case like this, where the first word was wrong anyway but it never gets the right word. Give me a prompt to change and let me accept, not force this on me. Idk, a blue squiggle. Jesus fuck, why is typing on a phone harder than it was a decade ago when swipe was introduced?!
arjie
This is overwrought. I have a lot of content on the Internet and am not replicable by machine, sadly. I would gladly have an immortal MMAcevedo of myself running around. I would be quite happy to be able to snapshot my current self to talk to him later and so on, but the best I can do is write as much of myself down as possible. Currently, with a few MCPs, I can have Claude on the desktop search my written stuff and my CRM and all that and have some useful conversations, but impersonating me seems out of reach.
I have no doubt it will happen one day and I'm happy to submit myself to it. If nothing else, if I die prematurely my children will be able to get some sense of the kind of person I am and the kind of people I want them to be.
nwhnwh
I just saw this design update on Substack. For everything's sake, whyyyyy?
> the thing that scares me about the existence of this data is that it seems well within the capabilities of current technology to train a model that can replicate me, in some sense
There are already all of your posts on social media accounts, all your emails on various servers, all of your text messages, and all the notes you've written anywhere, in any form, that might end up in some database in the future.
It does make me wonder how much of a person could be inferred by an LLM or future AI from that data. I think it would never be enough to do it properly, though. There are too many experiences and too much knowledge that might influence what you write without being directly expressed.
Will all of our content end up in some database in the future, with someone deciding to make agents based on what they can link to specific identities? Interesting thought.