How to Train an AI Image Model on Yourself

ramoz

FYI: For Flux, there is a lot more power in the text encoder, and you can prompt with more meaningful and comprehensive sentences. So there is less need for the traditional comma-separated, concise phrasing we saw in Stable Diffusion.

You should do the same with your training images. Caption everything you do not want the model to remember as "you" (what you're doing, wearing, accompanied by, accessories, etc).
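A minimal sketch of that captioning step, assuming the trainer reads a sidecar `.txt` caption with the same basename as each image (a common convention, but check what your trainer expects). The trigger word `TOK` and the helper names `build_caption` and `write_captions` are made up for illustration.

```python
from pathlib import Path

TRIGGER = "TOK"  # hypothetical trigger word that stands in for "you"
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def build_caption(trigger: str, description: str) -> str:
    """Build a full-sentence caption.

    `description` should cover everything that is NOT the subject
    (clothing, activity, companions, accessories, setting), so the
    model does not learn those details as part of the trigger word.
    """
    description = description.strip().rstrip(".")
    if not description:
        return f"A photo of {trigger}."
    return f"A photo of {trigger} {description}."


def write_captions(image_dir: Path, notes: dict[str, str], trigger: str = TRIGGER) -> list[Path]:
    """Write one caption file next to each image and return the paths written."""
    written = []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not an image
        caption_path = image.with_suffix(".txt")
        caption = build_caption(trigger, notes.get(image.name, ""))
        caption_path.write_text(caption, encoding="utf-8")
        written.append(caption_path)
    return written
```

For example, `write_captions(Path("photos"), {"beach.jpg": "wearing a red rain jacket, standing next to a friend on a beach at sunset"})` would write `photos/beach.txt` with a sentence-style caption, and a bare `A photo of TOK.` for any image without notes.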

isoprophlex

I did this for our beloved, dead cat... On replicate, too. I loved the results, until at one point I suddenly got really creeped out about the thing I was doing.

ryandvm

This is going to be big business, I think. I have probably sent hundreds of thousands of emails, texts, chats, etc. It would be well within the realm of possibility to train an LLM on a loved one's communications corpus and allow you to chat with "them" after they're gone.

Possible? Yes. Convincing results? Probably. Good idea? I doubt it.

mipmap04

Oh man, I did this with my dad's voice after he died and set up a thing where I could talk with an LLM-backed assistant and have it respond in his voice and mannerisms. It was a very weird coping and grief period and I ultimately hit a point where I got really weirded out about what I was doing.

portaouflop

I think that was 1:1 a Black Mirror episode

PaulDavisThe1st

Episode title was "Be Right Back"

toomuchtodo

Apple Intelligence + Legacy Contact. One's phone could perform this compute while plugged in at night, or offload to a Mac in your iCloud account.

"Enable Avatar Training" "Enable Avatar Transfer to Legacy Contact" Always optional of course.

waspleg

Literally a Black Mirror episode.

slig

I remember seeing it here on HN that someone did that with a group chat and it would reply as each friend.

knicholes

This is exactly what I'd want to do for my "smart urn."

mystified5016

Forget an urn, I want my digital ghost to haunt a furby.

thefourthchime

I did this a while back, though it was pictures of my wife in lingerie.

- I asked Grok to generate a list of racy prompts.
- Had Replicate generate them via script. About 10-20% were very poor; I filtered those out manually.
- It also has NSFW guardrails, but a simple retry or word juggle gives you a chance to get around it.

I think I spent $10

Der_Einzige

There is a parallel "underground" AI research world of stuff like this, with its hub on "civit.ai" instead of huggingface.

Often the innovations from that world are ahead of mainstream AI research by years. You should see what coomers did for LLM sampling in order to get over issues with "slop" responses just for their own pervy interests. That was several years before the mainstream crowd ever cared.

ok_dad

Porn has always pushed the boundaries of media on the internet. I don't know why people are surprised! Since sex is something nearly everyone does, it would make sense that a lot of human progress was the result of trying to integrate sex and whatever new tech is out there at the time. I am sure a hundred years ago some inventors were pushing the boundaries of motors in sex toys, and in another hundred years some other inventor will be pushing the boundaries on putting sex in holograms.

petercooper

Replicate does make this particularly easy while still being somewhat developer focused. I've used it for a few people in our group chat so we can make silly in-joke memes and stuff, and the results are quite stunning. Replicate then offers the model up over a simple API (shown in the post) if you wanted to let people generate right from the chat, etc. Replicate is worth poking around a bit more broadly, too; they have some interesting models on there (though the pricing tends not to be very competitive if you were going to do it at scale).
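For anyone who hasn't seen the post, here is a rough sketch of what calling a fine-tuned model through the `replicate` Python client can look like. `MODEL_REF` is a placeholder, not a real model, and the trigger word `TOK` and the helper `build_input` are assumptions for illustration; the input fields your model accepts are listed on its Replicate page.

```python
import os

# Placeholder: substitute the "owner/name:version" string from your own model page.
MODEL_REF = "your-username/your-model:version-id"


def build_input(prompt: str, trigger: str = "TOK") -> dict:
    """Return the input payload, making sure the trigger word is in the prompt.

    The trigger word is assumed to be whatever was used during training;
    without it the fine-tuned subject may not show up in the image.
    """
    if trigger not in prompt:
        prompt = f"{trigger} {prompt}"
    return {"prompt": prompt}


def generate(prompt: str):
    """Run the model once and return whatever the client returns for it."""
    if not os.environ.get("REPLICATE_API_TOKEN"):
        raise RuntimeError("Set REPLICATE_API_TOKEN before calling generate()")
    # Imported lazily so build_input() works without the package installed.
    import replicate  # pip install replicate

    return replicate.run(MODEL_REF, input=build_input(prompt))
```

A chat bot handler would only need to pass the user's message to `generate()` and post the resulting image back to the group.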

ge96

What I want is to be able to feed in a bunch of videos and generate an animatable (from talking) 3D face from that data. I suppose you in theory only need 3 images (front and sides). But mapping pixels to motion is interesting (facial expressions).

There wouldn't be depth data so it would be inferred from shadows

ilaksh

Replicate has Hunyuan video training now. https://replicate.com/blog/fine-tune-video

Also, Kling 1.6 Elements works pretty okay if you use the same person/face for each element.

Kling also has lip sync.

Or this lip sync with replicate: https://replicate.com/bytedance/latentsync

Or there is HeyGen or D-ID or Synthesia, or tavus.io for full interactive digital twins.

ge96

Thanks

timdiggerm

Why do you want to do that?

ge96

My case is not directly nefarious. For example: taking the content of an old popular YouTuber who streamed in the early 2000s and making a model of them for personal use, like a 3D chatbot but with that person's quirks.

Edit: when I say "nefarious" I mean you can use that tech to impersonate someone (e.g. for political reasons), but my case is more the creeper type: cloning someone for personal use, e.g. Replika.

Tangent: the holo vtuber industry is interesting since they build up these characters with some unique persona/theme and then people follow that specific model; they could make themselves into an AI easily since it's a rigged 3D asset but of course it would be boring compared to the real thing

GaggiX

>they could make themselves into an AI easily since it's a rigged 3D asset but of course it would be boring compared to the real thing

The most popular vtuber on Twitch is an AI tho

deadbabe

I’m imagining something where an influencer trains AI to make and post images of themselves on social media, then the influencer dies but the AI keeps going forever.

ge96

The impact is kind of interesting, how do you know someone's legit, the person doing basejumping or whatever

Thanos/NFTs: where did that take you? right back to me

Thinking hardware with built in chain interface for proof

Oh man dating apps too

That's true love though, two people meet up IRL they're both like wtf who are you

manishsharan

This is fantastic, but now you need to train a model to tell AI-generated images from actual photos. Then, of course, a model to beat the detector model, and then a model to catch the model that beats the detector model, and so on.

Thank you from people holding NVDA.

beng-nl

You may have re-invented GANs :-)

DoodahMan

is something like this possible to do with video yet?