Hunyuan3D 2.0 – High-Resolution 3D Assets Generation

geuis

Question related to 3D mesh models in general: has any significant work been done on models oriented towards photogrammetry?

Case in point, I have a series of photos (48) that capture a small statue. The photos are high quality, the object was on a rotating platform. Lighting is consistent. The background is solid black.

These are normally ideal conditions for photogrammetry, but none of the common applications and websites I've tried produces a mesh that isn't super low-poly and/or full of holes.

I've been casually scanning huggingface for relevant models to try out but haven't really found anything.

troymc

Check out RealityCapture [1]. I think it's what's used to create the Quixel Megascans [2]. (They're both under the Epic corporate umbrella now.)

[1] https://www.capturingreality.com/realitycapture

[2] https://quixel.com/megascans/

jocaal

Recently, a lot of development in this area has been in Gaussian splatting, and from what I've seen the new methods are super effective.

https://en.wikipedia.org/wiki/Gaussian_splatting

https://www.youtube.com/watch?v=6dPBaV6M9u4

geuis

Yeah, some very impressive stuff with splats going on. But I haven't seen much about going from splats to high-quality 3D meshes. I've tried one or two tools, with pretty poor results.

tzumby

I'm not an expert, I've only dabbled in photogrammetry, but it seems to me that the crux of the problem is identifying common pixels across images in order to triangulate points in 3D space. It doesn't sound like something an LLM would be good at.
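Roughly, that classic pipeline looks like the sketch below (Python with OpenCV; the filenames and camera intrinsics are placeholder assumptions, and a real structure-from-motion tool does much more, e.g. bundle adjustment over all 48 views):

  import cv2
  import numpy as np

  # Two views of the object; hypothetical filenames.
  img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
  img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

  # 1. Find "common pixels": detect and match local features across views.
  orb = cv2.ORB_create(5000)
  kp1, des1 = orb.detectAndCompute(img1, None)
  kp2, des2 = orb.detectAndCompute(img2, None)
  matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
  pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
  pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

  # 2. Recover the relative camera pose from the matches (intrinsics K are
  #    assumed known; these values are made up).
  K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])
  E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
  _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

  # 3. Triangulate each matched pixel pair into a 3D point.
  P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
  P2 = K @ np.hstack([R, t])
  pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
  pts3d = (pts4d[:3] / pts4d[3]).T  # (N, 3) sparse point cloud

None of that is language modeling; it's feature matching plus projective geometry, although learned feature matchers and neural surface reconstruction are now being applied to the same steps.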

falloon

[dead]

MikeTheRocker

Generative AI is going to drive the marginal cost of building 3D interactive content to zero. Unironically this will unlock the metaverse, cringe as that may sound. I'm more bullish than ever on AR/VR.

jsheard

I can only speak for myself, but a Metaverse consisting of infinite procedural slop sounds about as appealing as reading infinite LLM-generated books, that is, not at all. "Cost to zero" implies drinking directly from the AI firehose with no human in the loop (humans cost money), and entertainment produced in that manner is still dire, even in the relatively mature field of pure text generation.

MikeTheRocker

IMO current-generation models are capable of creating content significantly better than "slop" quality. You need only look at NotebookLM's output. As models continue to improve, this will only get better. Look at the rate of improvement of video generation models in the last 12-24 months. It's obvious to me we're rapidly approaching acceptable, or even excellent, quality for on-demand generated content.

jsheard

I feel like you're conflating quality with fidelity. Video generation models have better fidelity than they did a year ago, but they are no closer to producing any kind of compelling content without a human directing them, and the latter is what you would actually need to make the "infinite entertainment machine" happen.

The fidelity of a video generation model is comparable to an LLM's ability to nail spelling and grammar: it's a start, but there's more to being an author than that.

deeznuttynutz

This is exactly why I'm building my app now, with the expectation that these assets will get exponentially better in the short term.

jdietrich

I can only speak for myself, but a large and growing proportion of the text I read every day is LLM output. If Claude and DeepSeek produce slop, then it's a far higher calibre of slop than most human writers could aspire to.

hex4def6

I think it has its place. For 'background filler' I think it makes a lot of sense; stuff which you don't need to care about, but whose absence can make something feel less real.

To me, this takes the place of (or augments) procedural generation. NPC crowds in which none of the participants are needed for the plot, but in which each can have unique clothing, appearance, and lines, aren't "needed" for a game, but can flesh it out when done thoughtfully.

Recall the lambasting Cyberpunk 2077 got for its NPCs that cycled through a seemingly very limited number of appearances, to the point that you'd see clones right next to each other. This would solve that sort of problem, for example.

bufferoverflow

Minecraft is procedurally generated slop, yet it's insanely popular.

noch

> a Metaverse consisting of infinite procedural slop sounds about as appealing as reading infinite LLM generated books

Take a look at the ImgnAI gallery (https://app.imgnai.com/) and tell me: can you paint better and more imaginatively than that? Do you know anyone in your immediate vicinity who can?

Read this satirical speech by Claude, in French (https://x.com/pmarca/status/1881869448275177764) and in English (https://x.com/pmarca/status/1881869651329913047), and tell me: can you write fiction more entertaining or imaginative than that? Is there someone in your vicinity who can?

Perhaps that's mundane, so is there someone in your vicinity who can reason about a topic in mathematics/physics as well as this: https://x.com/hsu_steve/status/1881696226669916408 ?

Probably your answer is "yes, obviously!" to all the above.

My point: deep learning works, and the era of slop ended ages ago; some people are just still living in the past, or with some cartoon image of the state of the art.

> "Cost to zero" implies drinking directly from the AI firehose with no human in the loop

No. It means the marginal cost of production tends towards 0. If you can think it, then you can make it instantly and iterate a billion times to refine your idea with as much effort as it took to generate a single concept.

Your fixation on "content without a human directing them" is bizarre and counterproductive, and it's confounding your reasoning. Why would "no human in the loop" be a prerequisite for productivity?

Philpax

> Take a look at the ImgnAI gallery (https://app.imgnai.com/) and tell me: can you paint better and more imaginatively than that?

So while I generally agree with you, I think this was a bad example to use: a lot of these are slop, with the kind of AI sheen we've come to glaze over. I'd say less than 20% are actually artistically impressive / engaging / thought-provoking.

deadbabe

I think you're being short-sighted. Imagine feeding your favorite TV shows into a generative AI and being able to walk around in that world and talk to the characters, or explore it with other people.

bschwindHN

That's still AI slop, in my opinion.

slt2021

Do you find it interesting talking to NPCs in games?

echelon

You're too old and jaded [1]. It's for kids inventing infinite worlds to role play and adventure. They're going to have a blast.

[1] Not meant as an insult. Working professionals don't have time for this stuff.

wizzwizz4

Object permanence and a communications channel are enough for this. Give children (who get along with each other) a pile of sticks and leave them alone for half an hour, and there's half a chance their game will ignore the sticks. Most children wouldn't want to have their play mediated by the computer in the way you describe, because the ergonomics are so poor.

taejavu

Jeez, I'd love to know what Apple's R&D debt on the Vision Pro is, based on sales to date. I really, really hope they continue to push for a headset that's within reach of average people, but the hole must be so deep at this point that I wouldn't be surprised if they cut their losses.

EncomLab

As Carmack pointed out, the problem with AR/VR right now isn't the hardware, it's the software. Until the "VisiCalc" must-have killer app shows up to move the hardware, there is little incentive for general users to make the investment.

PittleyDunkin

> As Carmack pointed out, the problem with AR/VR right now isn't the hardware, it's the software.

The third option is people's expectations for AR/VR itself: it could remain a highly niche and expensive industry, unlikely to spread to the general population.

PittleyDunkin

Maybe eventually. Based on this quality, I don't see it happening any time soon.

slt2021

[flagged]

pella

Ouch; the license excludes the European Union, United Kingdom and South Korea:

  TENCENT HUNYUAN 3D 2.0 COMMUNITY LICENSE AGREEMENT
  Tencent Hunyuan 3D 2.0 Release Date: January 21, 2025
  THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW.
https://github.com/Tencent/Hunyuan3D-2?tab=License-1-ov-file

EMIRELADERO

I assume it's safe to ignore, as model weights probably aren't copyrightable.

slt2021

You don't know what kind of backdoors are hidden in the model weights.

gruez

Is this tied to EU regulations around AI models?

denkmoon

For the AI-uninitiated: is this something you could feasibly run at home, e.g. on a 4090? (How can I tell how "big" the model is from the GitHub or Hugging Face page?)

sorenjan

The hunyuan3d-dit-v2-0 model is 4.93 GB. ComfyUI support is on their roadmap, so it might be best to wait for that, although the model doesn't look complicated to use via their example code.

https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan...
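If you want to check sizes yourself before downloading, one option (a sketch assuming the huggingface_hub Python package; the repo id comes from the link above) is to list the repo's file metadata:

  from huggingface_hub import HfApi

  # files_metadata=True asks the Hub API to include per-file sizes.
  info = HfApi().model_info("tencent/Hunyuan3D-2", files_metadata=True)
  for f in info.siblings:
      if f.size:
          print(f"{f.rfilename}: {f.size / 1e9:.2f} GB")

As a rough rule of thumb, fp16 weights take about 2 bytes per parameter, so a ~5 GB checkpoint should fit comfortably in a 4090's 24 GB of VRAM, with room left over for activations.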

sebzim4500

Interesting. One of the diagrams suggests that the mesh is generated with the marching cubes algorithm, but the geometry of the meshes shown above is clearly not generated that way.

GrantMoyer

To me, the bird mesh actually does look like marching cubes output. Note the abundance of almost-square triangle pairs on the front and sides. Also note that marching cubes doesn't necessarily create stairstep-like artifacts; it can generate a smooth-looking mesh from signed distance field input by slightly adjusting the locations of vertices based on the relative magnitudes of the field at the surrounding lattice points.
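That vertex adjustment is just linear interpolation along each cube edge where the field changes sign. A tiny sketch of that one step (illustrative only; the full algorithm also needs the per-cube triangulation table):

  import numpy as np

  def edge_vertex(p0, p1, d0, d1):
      """Place a vertex on the edge p0->p1 where the SDF crosses zero.
      d0 and d1 are field values at the lattice points (opposite signs)."""
      t = d0 / (d0 - d1)  # fraction of the way toward p1 where the field is 0
      p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
      return p0 + t * (p1 - p0)

  # Field is -0.2 at one corner and +0.6 at the next, so the vertex lands 25%
  # of the way along the edge instead of snapping to a corner or midpoint.
  print(edge_vertex([0, 0, 0], [1, 0, 0], -0.2, 0.6))  # [0.25 0.   0.  ]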

wumeow

The meshes generated by the huggingface demo definitely look like the product of marching cubes.

godelski

As with any generative model, trust but verify: try it yourself. Frankly, as a generative-modeling researcher myself, I can say there's a lot of reason not to trust what you see in papers and project pages.

They link a Huggingface page (great sign!): https://huggingface.co/spaces/tencent/Hunyuan3D-2

I tried to replicate the objects they show on their project page (https://3d-models.hunyuan.tencent.com/). The full prompts are there but truncated, so you can just inspect the element and grab the text.

  Here's what I got
  Leaf
     PNG: https://0x0.st/8HDL.png
     GLB: https://0x0.st/8HD9.glb
  Guitar
     PNG: https://0x0.st/8HDf.png  other view: https://0x0.st/8HDO.png
     GLB: https://0x0.st/8HDV.glb
  Google Translate of Guitar:
     Prompt: A brown guitar is centered against a white background, creating a realistic photography style. This photo captures the culture of the instrument and conveys a tranquil atmosphere.
     PNG: https://0x0.st/8HDt.png   and  https://0x0.st/8HDv.png
     Note: Weird thing on top of the guitar. But at least this time the strings aren't fusing into the sound hole. 
I haven't tested my own prompts or the Google translations of the Chinese prompts because I'm getting an over-usage error (I'll edit this comment if I get them). That said, these look pretty good. The paper and page images definitely look better, but the gap isn't like Stable Diffusion 1 paper vs. Stable Diffusion 1 reality.

But these are long and detailed prompts, with lots of prompt engineering. That should raise some suspicion: the real world has higher variance. Let's get an idea of how hard it is to use by trying some simpler things :)

  Prompt: A guitar
    PNG: https://0x0.st/8HDg.png
    Note: Not bad! Definitely overfit but does that matter here? A bit too thick for an electric guitar but too thin for acoustic.
  Prompt: A Monstera leaf
    PNG: https://0x0.st/8HD6.png  
         https://0x0.st/8HDl.png  
         https://0x0.st/8HDU.png
    Note: A bit wonkier. I picked this because it looked like the leaf in the example but this one is doing some odd things. 
          It's definitely a leaf and monstera-like, but a bit of a mutant. 
  Prompt: Mario from Super Mario Bros
    PNG: https://0x0.st/8Hkq.png
    Note: Now I'm VERY suspicious....
  Prompt: Luigi from Super Mario Bros
    PNG: https://0x0.st/8Hkc.png
         https://0x0.st/8HkT.png  
         https://0x0.st/8HkA.png
    Note: Highly overfit[0]. This is what I suspected. Luigi isn't just tall Mario. 
          Where is the tie coming from? The suspender buttons are all messed up. 
          Really went uncanny valley here. So this suggests the model is really brittle. 
  Prompt: Peach from Super Mario Bros
    PNG: https://0x0.st/8Hku.png  
         https://0x0.st/8HkM.png
    Note: I'm fucking dying over here this is so funny. It's just a peach with a cute face hahahahaha
  Prompt: Toad from Super Mario Bros
    PNG: https://0x0.st/8Hke.png 
         https://0x0.st/8Hk_.png
         https://0x0.st/8HkL.png
    Note: Lord have mercy on this toad, I think it is a mutated Squirtle.  
The paper can be found here (the arXiv badge on the page leads to a PDF in the repo, which GitHub is slow to render): https://arxiv.org/abs/2411.02293

(If you want to share images like I did, all I'm doing is `curl -F'file=@foobar.png' https://0x0.st`)

[0] Overfit is a weird thing now. Maybe the model doesn't generalize well, but sometimes that's not a problem. I think this is one of the bigger lessons we've learned with recent ML models. My viewpoint is "sometimes you want a database with a human-language interface; sometimes you want to generalize". So we have to be more context-driven here. But certainly there are a lot of things we should be careful about when we're talking about generation. These things are trained on A LOT of data. If you're more "database-like", then certainly there are potential legal ramifications...

Edit: For context, by "look pretty good" I mean in comparison to other works I've seen. I think it is likely a ways from being useful in production. I'm not sure how much human labor would be required to fix the issues.

godelski

Oops, I ran out of edit time when posting my last two:

  Prompt: A hawk flying in the sky
    PNG: https://0x0.st/8Hkw.png
         https://0x0.st/8Hkx.png
         https://0x0.st/8Hk3.png
    Note: This looks like it would need more work. I tried a few specific birds and a generic one too. They all seem to have a similar form. 
  Prompt: A hawk with the head of a dragon flying in the sky and holding a snake
    PNG: https://0x0.st/8HkE.png
         https://0x0.st/8Hk6.png
         https://0x0.st/8HkI.png
         https://0x0.st/8Hkl.png
    Note: This one really isn't great. Just a normal hawk head. Not how a bird holds a snake either...
This last one is really key for judging where the tech is at, btw. Most of the generations are assets you could download freely from the internet, and you could probably get better ones from some artist on Fiverr or something. But the last example is closer to a realistic use case: something that is relatively reasonable, probably not in the set of easy-to-download assets, and that someone might actually want. It isn't too crazy an ask given the Chimera and how similar a dragon is to a bird in the first place; this should be on the "easier" end. I'm sure you could prompt engineer your way to it, but then we have to have the discussion of what costs more: a prompt engineer or an artist? And do you need a prompt engineer who can also repair models? Because these look like they need repairs.

This can make it hard to really tell if there's progress or not. It is really easy to make compelling images in a paper and beat benchmarks while not actually creating something that is __or will become__ a usable product. All the little details matter, and little errors quickly compound... That said, I do much more on generative imagery than generative 3D objects, so take this with a grain of salt.

Keep in mind: generative models (of any kind) are incredibly difficult to evaluate. You really only have a good idea after you've generated hundreds or thousands of samples yourself and have looked at a lot of them with high scrutiny.

BigJono

Yeah, this is absolutely light years off being useful in production.

People just see fancy demos and start crapping on about the future, but look at Stable Diffusion. It's been around for how long, and which serious professional game developers are using it as a core part of their workflow? Maybe some concept artists? Consistent style is such an important thing for any half-decent game, and these generative tools shit the bed on consistency in a way that's difficult to paper over.

I've spent a lot of time thinking about game design and experimenting with SD/Flux, and the only thing I think I could now get close to production that I couldn't before is maybe an MTG-style card game, where gameplay is far more important than graphics, and flashy static artwork is far more important than consistency. That's a fucking small niche, and I don't see a lot of paths to generalisation.

Kelvin506

The first guitar has one of its strings ending at the sound hole, and six tuning knobs for five strings.

The second has similar problems: it has tuning knobs with missing winding posts, then five strings becoming four at the bridge. It also has a pickup under the fretboard.

Are these considered good capability examples?

godelski

I take back a fair amount of what I said.

It is pretty good with some easier assets that I suspect there are lots of samples of (and we're comparing to other generative models, not to what humans make; humans probably still win by a good margin). But moving beyond obvious assets that we could easily find, I'm not seeing good performance at all. Probably a lot can be done with heavy prompt engineering, but that just makes things more complicated to evaluate.

keyle

Thanks for this. After trying it myself, I find the results quite impressive.

artemonster

[flagged]