I am giving up on Intel and have bought an AMD Ryzen 9950X3D

augustl

Happy 9950X user here. Super happy with it; everything is crazy fast. I'm not a gamer, and according to the internet and benchmarks, the extra cost of the X3D variant is only worth it for gaming workloads.

I use Arch, btw ;)

c0l0

I realize this has not much to do with CPU choice per se, but I'm still gonna leave this recommendation here for people who like to build PCs to get stuff done with :) Since I've been able to afford it and the market has had them available, I've been buying desktop systems with proper ECC support.

I've been chasing flimsy but very annoying stability problems (some, of course, due to overclocking during my younger years, when it still had a tangible payoff) enough times on systems I had built that taking this one BIG potential cause out of the equation is worth the few dozens of extra bucks I have to spend on ECC-capable gear many times over.

Trying to validate an ECC-less platform's stability is surprisingly hard, because memtest and friends just don't reliably detect more subtle problems. In my experience, PRIME95, y-cruncher and linpack (in increasing order of effectiveness) are better than specialized memory-testing software, but they are not perfect either.

Most AMD CPUs (but not their APUs with potent iGPUs - there, you will have to buy the "PRO" variants) these days have full support for ECC UDIMMs. If your mainboard vendor also plays ball - annoyingly, only a minority of them enables ECC support in their firmware, so always check for that before buying! - there's not much that can prevent you from having that stability enhancement and reassuring peace of mind.

Quoth DJB (around the very start of this millennium): https://cr.yp.to/hardware/ecc.html :)

wer232essf

Good points. I’ve also found ECC worth the extra cost; chasing stability issues without it can be a real time sink. Taking random memory errors out of the equation makes troubleshooting much simpler and saves a ton of frustration when you’re working on something important.

You’re right that tools like memtest often miss the subtle problems, and even stress tests like Prime95 or y-cruncher don’t catch everything. That’s why having ECC as a built-in safeguard feels so valuable. With most modern AMD CPUs supporting ECC UDIMMs, the main variable really is whether the motherboard firmware enables it, which makes it important to double-check before buying. For me, the added stability and peace of mind have been well worth the small premium.

storus

Now where can I get 64GB ECC UDIMM DDR5 modules so that my X870E board can have 256GB RAM? The largest I found were just 48GB ECC UDIMMs or 64GB non-ECC UDIMMs.

c0l0

I don't think 64GB ECC UDIMM is commercially available yet. I use Geizhals to check for EU availability: https://geizhals.eu/?cat=ramddr3&xf=7500_DDR5~7501_DIMM~7761...

In my experience, it's generally unwise to push the platform you're on to the outer limits of its spec. At work, we bought several 5950X-based Zen3 workstations with 128GB of 3200MT/s ECC UDIMM, and two of these boxes will only ever POST when you manually downclock memory to 3000MT/s. Past a certain point, it's the silicon lottery that decides whether you can make reality live up to the datasheets' promises.

jml7c5

The big problem with ECC for me is that the sticks are so much more expensive. You'd expect ECC UDIMMs to have a price premium of just over 12.5% (because there are 9 chips instead of 8), but it's usually at least 100%. I don't mind paying a reasonable premium for ECC, but paying double is too hard to swallow.
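The 12.5% figure is just the chip count: one extra chip per eight data chips. A minimal sketch of the arithmetic, assuming price scales only with the number of DRAM chips (a simplification that ignores binning, volume and market segmentation); the prices in the usage lines are hypothetical, not real quotes:

```python
def ecc_premium(data_chips: int = 8, ecc_chips: int = 1) -> float:
    """Expected premium if module cost scaled only with DRAM chip count."""
    return ecc_chips / data_chips


def observed_premium(ecc_price: float, non_ecc_price: float) -> float:
    """Actual markup of an ECC stick over a comparable non-ECC stick."""
    return ecc_price / non_ecc_price - 1


print(f"{ecc_premium():.1%}")  # 12.5%
# Hypothetical prices for a "double the price" ECC stick.
print(f"{observed_premium(200.0, 100.0):.0%}")  # 100%
```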

sippeangelo

Yeah, with that kind of markup you might as well just buy new ones IF they break, or just spend the extra budget on better quality parts. Just having to pick a very specific motherboard that probably is very much not optimal for your build will blow the costs up even more, and for what gain?

I've been building my own gaming and productivity rigs for 20 years and I don't think memory has ever been a problem. Maybe survivorship bias, but surely even budget parts aren't THIS bad.

lmm

> with that kind of markup you might as well just buy new ones IF they break

Assuming you can tell, and assuming you don't end up silently corrupting your data before then.

devnullbrain

I like the warning not to buy a motherboard from a manufacturer that has been defunct for 17 years

dijit

> only a minority of them enables ECC support in their firmware, so always check for that before buying!

This is the annoying part.

That AMD permits ECC is a truly fantastic situation, but motherboard support is often lacking and, worse, often isn't advertised even when it's available.

I have an ASUS PRIME TRX40 PRO, and the tech specs say that it can run ECC and non-ECC, but not whether ECC will be available to the operating system, merely that the DIMMs will work.

It's much more hit and miss in reality than it should be, though this motherboard was a pricey one: one can't use price as a proxy for features.

sunshowers

If you're on Linux, dmesg containing

    EDAC MC0: Giving out device to module amd64_edac

is a pretty reliable indication that ECC is working.

See my blog post about it (it was top of HN): https://sunshowers.io/posts/am5-ryzen-7000-ecc-ram/
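A sketch that does the same check programmatically. It only looks for the registration message quoted above, so treat a hit as a hint rather than proof; the per-controller error counters and the firmware's ECC setting are worth checking too. Reading the kernel log may require root on distributions that set `kernel.dmesg_restrict`.

```python
import re
import subprocess

# Kernel message printed when an EDAC driver claims a memory controller, e.g.
#   EDAC MC0: Giving out device to module amd64_edac
EDAC_RE = re.compile(r"EDAC (MC\d+): Giving out device to module (\S+)")


def edac_controllers(dmesg_text: str) -> dict[str, str]:
    """Map each memory controller (MC0, MC1, ...) to the EDAC module that claimed it."""
    return {m.group(1): m.group(2) for m in EDAC_RE.finditer(dmesg_text)}


if __name__ == "__main__":
    try:
        text = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    except OSError:  # dmesg not installed or not executable
        text = ""
    found = edac_controllers(text)
    for mc, module in sorted(found.items()):
        print(f"{mc}: {module}")
    if not found:
        print("No EDAC memory controller registered in the kernel log.")
```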

oneshtein

My `dmesg` says:

    EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)
    EDAC MC1: Giving out device to module igen6_edac controller Intel_client_SoC MC#1: DEV 0000:00:00.0 (INTERRUPT)

but `dmidecode --type 16` says:

    Error Correction Type: None
    Error Information Handle: Not Provided
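`dmidecode` only reports what the firmware's SMBIOS tables claim, so its "Error Correction Type" can disagree with what the kernel's EDAC driver sees. A small sketch that pulls that field out of `dmidecode --type 16` output (running dmidecode itself needs root), so the two views can be compared side by side:

```python
import re


def ecc_types(dmidecode_output: str) -> list[str]:
    """Return the 'Error Correction Type' of each Physical Memory Array entry."""
    return re.findall(
        r"^\s*Error Correction Type:\s*(.+?)\s*$", dmidecode_output, flags=re.MULTILINE
    )


# The output quoted above, used as sample input.
sample = """\
    Error Correction Type: None
    Error Information Handle: Not Provided
"""
print(ecc_types(sample))  # ['None']
```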

adrian_b

This is weird. I have used many ASUS MBs specified as "can run ECC and non-ECC", and this has always meant that there was an ECC enabling option in the BIOS settings; then, if the OS had an appropriate EDAC driver for the installed CPU, ECC worked fine.

I am writing this message on such an ASUS MB with a Ryzen CPU and working ECC memory. You must check that you actually have a recent enough OS to know your Threadripper CPU and that you have installed any software package required for this (e.g. on Linux "edac-utils" or something with a similar name).
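Once an EDAC driver is loaded, the kernel exposes per-controller corrected (CE) and uncorrected (UE) error counters in sysfs, which is roughly what `edac-util` from edac-utils summarizes. A sketch that reads them directly, assuming the standard `/sys/devices/system/edac/mc/mc*/` layout:

```python
from pathlib import Path

EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def edac_counts(root: Path = EDAC_ROOT) -> dict[str, tuple[int, int]]:
    """Return {controller: (corrected, uncorrected)} error counts."""
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        ce = int((mc / "ce_count").read_text())
        ue = int((mc / "ue_count").read_text())
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = edac_counts()
    if not counts:
        print("No EDAC memory controllers found; ECC reporting is not active.")
    for name, (ce, ue) in counts.items():
        print(f"{name}: {ce} corrected, {ue} uncorrected")
```

An empty result means no EDAC driver has claimed a memory controller, which is worth ruling out before trusting that ECC is on.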

c0l0

Usually, if a vendor's spec sheet for a (SOHO/consumer-grade) motherboard mentions ECC-UDIMM explicitly in its memory compatibility section, and (but this is a more recent development afaict) DOES NOT specify something like "operating in non-ECC mode only" at the same time, then you will have proper ECC (and therefore EDAC and RAS) support in Linux, if the kernel version you have can already deal with ECC on your platform in general.

I would assume your particular motherboard to operate with proper SECDED+-level ECC if you have capable, compatible DIMM, enable ECC mode in the firmware, and boot an OS kernel that can make sense of it all.
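To make "SECDED" concrete: single-error correction, double-error detection. Classic ECC DIMMs widen the 64-bit bus to 72 bits (hence 9 chips instead of 8) and typically apply a SECDED code to each 64-bit word. The sketch below is a toy extended Hamming(8,4) code, not the exact code any memory controller uses, but it behaves the same way: one flipped bit is corrected, two flipped bits are detected but cannot be fixed.

```python
def encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit, indices 1, 2 and 4 hold the
    Hamming parity bits, and indices 3, 5, 6, 7 hold the data bits.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: list[int]) -> tuple[str, list[int]]:
    """Return (status, data bits); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one, located by the syndrome
        # (a zero syndrome means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: a two-bit error.
        status = "uncorrectable"
    return status, [code[3], code[5], code[6], code[7]]


word = encode([1, 0, 1, 1])
one_flip = word.copy()
one_flip[6] ^= 1
two_flips = word.copy()
two_flips[2] ^= 1
two_flips[5] ^= 1
print(decode(word))      # ('ok', [1, 0, 1, 1])
print(decode(one_flip))  # ('corrected', [1, 0, 1, 1])
print(decode(two_flips)[0])  # uncorrectable
```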

consp

Isn't it mostly a peace-of-mind thing? I've never seen an ECC error on my home server, which has plenty of memory in use and runs longer than my desktop. Maybe it's more common with higher-clocked, near-the-limit desktop PCs.

Also: DDR5 has some false ECC marketing due to the memory standard having an on-die error correction scheme built in. Don't fall for it.

adrian_b

Whether you will see ECC errors depends a lot on how much memory you have and how old it is.

A computer with 64 GB of memory is 4 times more likely to encounter memory errors than one with 16 GB of memory.
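The 4x figure follows if errors occur independently at a roughly constant rate per gigabyte, so the expected count scales linearly with capacity. A sketch under that assumption; `RATE` is a made-up placeholder, not a measured value:

```python
import math


def expected_errors(capacity_gb: float, rate_per_gb_year: float, years: float = 1.0) -> float:
    """Expected error count, assuming a constant per-GB error rate."""
    return capacity_gb * rate_per_gb_year * years


def p_at_least_one(expected: float) -> float:
    """Probability of one or more errors if errors follow a Poisson process."""
    return 1.0 - math.exp(-expected)


RATE = 0.01  # hypothetical errors per GB per year, for illustration only
for gb in (16, 64):
    lam = expected_errors(gb, RATE)
    print(f"{gb} GB: {lam:.2f} expected errors/year, P(at least one) = {p_at_least_one(lam):.0%}")
```

Under this model the expected count is exactly 4x, while the probability of seeing at least one error grows by somewhat less than 4x once it stops being small.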

When DIMMs are new, at the usual amounts of memory for desktops, you will see at most a few errors per year, sometimes only an error after a few years. With old DIMMs, some of them will start to have frequent errors (such modules presumably had a borderline bad fabrication quality and now have become worn out, e.g. due to increased leakage leading to storing a lower amount of charge on the memory cell capacitors).

For such bad DIMMs, the frequency of errors will increase, and may reach several errors per day, or even per hour.

For me, a very important advantage of ECC has been the ability to detect such bad memory modules (in computers that have been used for 5 years or more) and replace them before corrupting any precious data.

I also had a case with an HP laptop with ECC where memory errors had become frequent after it was stored for a long time (more than a year) in a rather humid place, which might have caused some oxidation of the SODIMM socket contacts: removing the SODIMMs, scrubbing the sockets and reinserting the SODIMMs made the errors disappear.

wpm

I had a somewhat dodgy stick of used RAM (DDR4 UDIMM) in a Supermicro X11 board. This board is running my NAS, all ZFS, so RAM corruption can equal data corruption. The OS alerted me to recoverable errors on DIMM B2. Swapped it and another DIMM, rebooted, saw DIMM error on slot B1. Swapped it for a spare stick. No more errors.

This was running at like, 1866 or something. It's a pretty barebones 8th gen i3 with a beefier chipset, but ECC still came in clutch. I won't buy hardware for server purposes without it.

c0l0

I see a particular ECC error at least weekly on my home desktop system, because one of my DIMMs doesn't like the (out of spec) clock rate that I make it operate at. Looks like this:

    94 2025-08-26 01:49:40 +0200 error: Corrected error, no action required., CPU 2, bank Unified Memory Controller (bank=18), mcg mcgstatus=0, mci CECC, memory_channel=1,csrow=0, mcgcap=0x0000011c, status=0x9c2040000000011b, addr=0x36e701dc0, misc=0xd01a000101000000, walltime=0x68aea758, cpuid=0x00a50f00, bank=0x00000012
    95 2025-09-01 09:41:50 +0200 error: Corrected error, no action required., CPU 2, bank Unified Memory Controller (bank=18), mcg mcgstatus=0, mci CECC, memory_channel=1,csrow=0, mcgcap=0x0000011c, status=0x9c2040000000011b, addr=0x36e701dc0, misc=0xd01a000101000000, walltime=0x68b80667, cpuid=0x00a50f00, bank=0x00000012

(this is `sudo ras-mc-ctl --errors` output)

It's always the same address, and always a Corrected Error (obviously, otherwise my kernel would panic). However, operating my system's memory at this clock and latency boosts x265 encoding performance (just one of the benchmarks I picked when trying to figure out how to handle this particular tradeoff) by about 12%. For an improvement of that size, I am willing to stomach the extra risk of effectively overclocking the memory module beyond its comfort zone, given that I can fully mitigate it by virtue of properly working ECC.
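When the same location keeps showing up, as in the two log lines above, that points at one weak cell or one marginal module rather than random events. A small sketch that tallies corrected errors by channel, chip-select row and address from `ras-mc-ctl --errors` output; it assumes the AMD-style field layout shown above, and other platforms may format these lines differently:

```python
import re
from collections import Counter

# Location fields as they appear in the quoted ras-mc-ctl lines.
FIELDS_RE = re.compile(
    r"memory_channel=(?P<channel>\d+),csrow=(?P<csrow>\d+).*?addr=(?P<addr>0x[0-9a-f]+)"
)


def error_locations(log_text: str) -> Counter:
    """Count corrected errors per (channel, csrow, address)."""
    hits = Counter()
    for line in log_text.splitlines():
        if "Corrected error" not in line:
            continue
        m = FIELDS_RE.search(line)
        if m:
            hits[(int(m["channel"]), int(m["csrow"]), m["addr"])] += 1
    return hits
```

Feed it the output of `sudo ras-mc-ctl --errors`; a single key with a growing count is the "same physical location failing predictably" pattern.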

Hendrikto

Running your RAM so far out of spec that it breaks down regularly, where do you get the confidence that ECC will still work correctly?

Also: couldn't you have just bought slightly faster RAM, given the premium for ECC?

ainiriand

I think you've found a particularly weak memory cell, I would start thinking about replacing that module. The consistent memory_channel=1, csrow=0 pattern confirms it's the same physical location failing predictably.

kderbe

I would loosen the memory timings a bit and see if that resolves the ECC errors. x265 performance shouldn't fall since it generally benefits more from memory clock rate than latency.

Also, could you share some relevant info about your processor, mainboard, and UEFI? I see many internet commenters question whether their ECC is working (or ask if a particular setup would work), and far fewer that report a successful ECC consumer desktop build. So it would be nice to know some specific product combinations that really work.

BikiniPrince

I pick up old servers for my garage system. With EDAC it is a dream to isolate the fault and be instantly aware of it. It also lets you determine the severity of the issue. DIMMs can run for years with just the one error, or overnight explode into streams of corrections. I keep spares, so it’s fairly easy to isolate any faults. It’s just a question of how you want to spend your time.

hedora

You have to go pretty far down the rabbit hole to make sure you’ve actually got ECC with [LP]DDR5

Some vendors use hamming codes with “holes” in them, and you need the CPU to also run ECC (or at least error detection) between ram and the cache hierarchy.

Those things are optional in the spec, because we can’t have nice things.

Scramblejams

I run a handful of servers and I have a couple that pop ECC errors every year or three, so YMMV.

immibis

I saw a corrected memory error logged every few hours when my current machine was new. It seems to have gone away now, so either some burn-in effect, or ECC accidentally got switched off and all my data is now corrupted. Threadripper 7000 series, 4x64GB DDR5.

Edit: it's probably because I switched it to "energy efficiency mode" instead of "performance mode" because it would occasionally lock up in performance mode. Presumably with the same root cause.

rendaw

So I'm trying to learn more about this stuff, but aren't there multiple ECC flavors and the AMD consumer CPUs only support one of them (not the one you'd have on servers?)

Does anyone maintain a list with de-facto support of amd chips and mainboards? That partlist site only shows official support IIRC, so it won't give you any results.

adrian_b

The difference between the "unbuffered" ECC DIMMs (ECC UDIMMs), which you must use in desktop motherboards (and in some of those advertised as "workstation" MBs) and the "registered" ECC DIMMs (ECC RDIMMs), which you must use in server motherboards (and in some of the "workstation" MBs), has existed for decades.

However in the past there have existed very few CPU models and MBs that supported either kind of DIMMs, while today this has become completely impossible, as the mechanical and electrical differences between them have increased.

In any case, today, like also 20 years ago, when searching for ECC DIMMs you must always search only the correct type, e.g. unbuffered ECC DIMMs for desktop CPUs.

In general, registered ECC DIMMs are easier to find, because wherever "server memory" is advertised, that is what is meant. For desktop ECC memory, you must be careful to see both "ECC" and "unbuffered" mentioned in the module description.

hungmung

Seconding this. I'm looking for a fanless industrial mini PC with out of band ECC and I'm having a hell of a time.

adrian_b

Had you been looking for "in-band ECC", the cheap ODROID H4 PLUS ($150) or the cheaper ODROID H4 ($110) would have been fine, or for something more expensive some of the variants of Asus NUC 13 Rugged support in-band ECC.

For out-of-band ECC, e.g. with standard ECC SODIMMs, all the embedded SBCs that I have seen used only CPUs that are very obsolete nowadays, i.e. ancient versions of Intel Xeon or old AMD industrial Ryzen CPUs (AMD's series of industrial Ryzen CPUs are typically at least one or two generations behind their laptop/desktop CPUs).

Moreover all such industrial SBCs with ECC SODIMMs were rather large, i.e. either in the 3.5" form factor or in the NanoITX form factor (120 mm x 120 mm), and it might have been necessary to replace their original coolers with bigger heatsinks for fanless operation.

In-band ECC causes a significant decrease of the performance, but for most applications of such mini-PCs the performance is completely acceptable.

swinglock

Excellent point. It's a shame and a travesty that data integrity is still mostly locked away inside servers, leaving most other computing devices effectively toys, the early prototype demo thing but then never finished and sold forever at inflated prices.

I wish AMD would make ECC a properly advertised feature with clear motherboard support. At least DDR5 has some level of ECC.

kevin_thibedeau

> At least DDR5 has some level of ECC.

That is mostly to assist manufacturers in selling marginal chips with a few bad bits scattered around. It's really a step backwards in reliability.

wpm

I wish AMD wouldn't gate APU ECC support behind unobtainium "PRO" SKUs they only give out, seemingly, to your typical "business" OEMs and the rare Chinese miniPC company.

enronmusk

If OP's CPU cooler (Noctua NH-D15 G2) wasn't able to keep his CPU below 100C, he must have overclocked it (intentionally, or unintentionally via ASUS MultiCore Enhancement). Or he didn't apply thermal paste properly, or didn't remove the plastic film from the cooler?

I have followed his blog for years and hold him in high respect so I am surprised he has done that and expected stability at 100C regardless of what Intel claim is okay.

Not to mention that you rapidly hit diminishing returns past 200W with current-gen Intel CPUs, although he mentions caring about idle power usage. Why go from 150W to 300W for a 20% performance increase?

dahauns

The cpu temps are one thing, but if - as you said - even a beast like the D15 G2 has it pegged at 100C, this very much sounds like bad ventilation and other parts of the system being toasted as well - VRMs in particular, for which the "PRIME" (actually being the low-end series) mainboards from Asus, as used here, don't exactly have a stellar reputation.

And yeah, having Arrow Lake running at its defaults is just a waste of energy. Even halving your TDP just loses you roughly 15% performance in highly MT scenarios...
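Both claims in this subthread are easiest to compare as performance per watt. A back-of-the-envelope sketch using only the figures quoted in the thread (150W to 300W for about 20% more performance; half the power for about 15% less):

```python
def perf_per_watt_change(perf_ratio: float, power_ratio: float) -> float:
    """Relative change in performance per watt, given performance and power ratios."""
    return perf_ratio / power_ratio - 1


# 150 W -> 300 W for ~20% more performance
print(f"{perf_per_watt_change(1.20, 300 / 150):+.0%}")  # -40%
# Halving the power limit while losing ~15% performance
print(f"{perf_per_watt_change(0.85, 0.5):+.0%}")  # +70%
```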

magicalhippo

He did have the Fractal Define 7 Compact case, and the pictures[1] only show a single 140mm case fan. From personal experience, the Fractal Define cases are great at sound reduction due to the sound-dampening padding, but those pads also insulate well.

Given the motherboard and RAM will also generate quite some heat, if the case fan profile was conservative (he does mention he likes low noise), could be the insides got quite toasty.

Back when I got my 2080 Ti, I had this issue when gaming. The internal temps would get so hot due to the blanket effect of the padding I couldn't touch the components after a gaming session. Had to significantly tweak my fan profiles. His CPU at peak would generate about the same amount of heat as my 2080 Ti + CPU I had then, and I had the non-Compact case with two case fans.

[1]: https://michael.stapelberg.ch/posts/2025-05-15-my-2025-high-...

enronmusk

Excellent point. A single case fan is highly atypical and concerning.

I also have a fractal define case with anti noise padding material and dust filters, but my temperatures are great and the computer is almost inaudible even when compiling code for hours with -j $(nproc). And my fans and cooler are much cheaper than his.

Dunedan

> […] so I am surprised he has done that and expected stability at 100C regardless of what Intel claim is okay.

Intel specifies a max operating temperature of 105°C for the 285K [1]. Also modern CPUs aren't supposed to die when run with inadequate cooling, but instead clock down to stay within their thermal envelope.

[1]: https://www.intel.com/content/www/us/en/products/sku/241060/...

epolanski

I always wonder: how many sensors are registering that temp?

Because CPUs can get much hotter in specific spots, at specific pins, no? Just because you're reading 100 doesn't mean there aren't spots that are way hotter.

My understanding is that modern Intel CPUs have a temp sensor per core + one at package level, but which one is being reported?
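On Linux, Intel's `coretemp` driver exposes both kinds: a "Package id 0" input plus one "Core N" input per core, and which one a given monitoring tool displays varies. A sketch that lists every labelled temperature through the standard hwmon sysfs files (`temp*_label`, `temp*_input` in millidegrees Celsius); on AMD the driver is `k10temp`, with different labels:

```python
from pathlib import Path


def _read(path: Path) -> str:
    return path.read_text().strip()


def read_temps(root: Path = Path("/sys/class/hwmon")) -> list[tuple[str, str, float]]:
    """Return (driver, sensor label, degrees C) for every labelled temperature input."""
    readings = []
    for hwmon in sorted(root.glob("hwmon*")):
        try:
            driver = _read(hwmon / "name")
        except OSError:
            driver = hwmon.name
        for label_file in sorted(hwmon.glob("temp*_label")):
            input_file = hwmon / label_file.name.replace("_label", "_input")
            try:
                label = _read(label_file)
                millideg = int(_read(input_file))
            except (OSError, ValueError):
                continue  # missing or unreadable sensor; skip it
            readings.append((driver, label, millideg / 1000))
    return readings


if __name__ == "__main__":
    for driver, label, temp in read_temps():
        print(f"{driver:10} {label:15} {temp:5.1f} °C")
```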

enronmusk

Yes, I have read the article and I agree Intel should be shamed (and even sued) for inaccurate statements. But it doesn't change the fact it has never been a good idea to run desktop processors at their throttling temperature -- it's not good for performance, it's not good for longevity and stability, and it's also terrible for efficiency (performance per watt).

Anyway, OP's cooler should be able to cool down 250W CPUs below 100C. He must have done something wrong for this to not happen. That's my point -- the motherboard likely overclocked the CPU and he failed to properly cool it down or set a power limit (PL1/PL2). He could have easily avoided all this trouble.
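On Linux you can at least inspect the package power limits the firmware applied: the `intel_rapl` powercap driver typically exposes PL1 and PL2 as the "long_term" and "short_term" constraints of the `intel-rapl:0` zone. A read-only sketch, assuming that driver is loaded (some powercap files may need root); it changes nothing, and setting the limits is normally done in the UEFI setup:

```python
from pathlib import Path


def rapl_limits(zone: Path = Path("/sys/class/powercap/intel-rapl:0")) -> dict[str, float]:
    """Return {constraint name: limit in watts} for one RAPL zone."""
    limits = {}
    for name_file in sorted(zone.glob("constraint_*_name")):
        prefix = name_file.name[: -len("name")]  # e.g. "constraint_0_"
        limit_file = zone / f"{prefix}power_limit_uw"
        try:
            limits[name_file.read_text().strip()] = int(limit_file.read_text()) / 1e6
        except (OSError, ValueError):
            continue
    return limits


if __name__ == "__main__":
    for name, watts in rapl_limits().items():
        print(f"{name}: {watts:.0f} W")  # long_term ~ PL1, short_term ~ PL2
```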

danieldk

I feel like both Intel and AMD are not doing great in the desktop CPU stability department. I made a machine with a Ryzen 9900X a while back and it had the issue that it would freeze when idling. A few years before I had a 5950X that would regularly crash under load (luckily it was a prebuilt, so it was ultimately fixed).

When you do not have a bunch of components ready to swap out it is also really hard to debug these issues. Sometimes it’s something completely different like the PSU. After the last issues, I decided to buy a prebuilt (ThinkStation) with on-site service. The cooling is a bit worse, etc., but if issues come up, I don’t have to spend a lot of time debugging them.

Random other comment: when comparing CPUs, a sad observation was that even a passively cooled M4 is faster than a lot of desktop CPUs (typically single-threaded, sometimes also multi-threaded).

seec

Your comment about the passively cooled M4 is misleading. Sure, in single-thread it will definitely be faster. In multi-thread, unless you are comparing against low-end or older CPUs, it's basically a lie. A 10-core M4 will score around the same as a 14th-gen mobile i5. It will consume much less power, but the argument is about performance, so that's beside the point.

And if we are talking about a passively cooled M4 (MacBook Air basically) it will quite heavily throttle relatively quickly, you lose at the very least 30%.

So, let's not misrepresent things: Apple CPUs are very power efficient, but they are not magic; if you hit them hard, they still need good cooling. Plenty of people have had this experience with their M4 Max, discovering that if they actually use the laptop as a workstation, it generates a good amount of fan noise; there is no way around it.

Apple stuff is good because most people actually have bursty workload (especially graphic design, video editing and some audio stuff) but if you hammer it for hours on end, it's not that good and the power efficiency point becomes a bit moot.

Dennip

I have a 5950X system that will just randomly shut down. I've RMA'd the CPU and tried swapping the RAM, GPU, PSU and motherboard in different combinations. I cannot track down a specific issue, and it just won't be stable. I've given up and decided to discard the PC of Theseus and build a new one -_-.

bob1029

I've got a 5950x that I can reliably crater with a very specific .NET 8 console app when it would otherwise be stable 24/7/365, even under some pretty crazy workloads like Unity.

I think a lot of it boils down to load profile and power delivery. My 2500VA double conversion UPS seems to have difficulty keeping up with the volatility in load when running that console app. I can tell because its fans ramp up and my lights on the same circuit begin to flicker very perceptibly. It also creates audible PWM noise in the PC which is crazy to me because up til recently I've only ever heard that from a heavily loaded GPU.

neRok

> I think a lot of it boils down to load profile and power delivery

You said the right words but with the wrong meaning! On Gigabyte mobo you want to increase the "CPU Vcore Loadline Calibration" and the "PWM Phase Control" settings, [see screenshot here](https://forum.level1techs.com/t/ddr4-ram-load-line-calibrati...).

When I first got my Ryzen 3900X cpu and X570 mobo in 2019, I had many issues for a long time (freezes at idle, not waking from sleep, bios loops, etc). Eventually I found that bumping up those settings to ~High (maybe even Extreme) was what was required, and things worked for 2 years or so until I got a 5950X on clearance last year.

I slotted that in to the same mobo and it worked fine, but when I was looking at HWMon etc, I noticed some strange things with the power/voltage. After some mucking about and theorising with ChatGPT (it's way quicker than googling for uncommon problems), it became apparent that the ~High LLC/power settings I was still using were no good. ChatGPT explained that my 3900X was probably a bit "crude" in relative quality, and so it needed the "stronger" power settings to keep itself in order. Then when I've swapped to 5950X, it happens to be more "refined" and thus doesn't need to be "manhandled" — and in fact, didn't like being manhandled at all!

heelix

I wonder if cooling/power is really the key here. I've got a 5950x that ended up getting the water loop I'd intended for my next threadripper - only to find they were not selling the blasted things to anyone but a few companies. With the cooling sized for almost twice what the 5950x could put out, it has been a very stable machine for some crazy workloads. That old dog will likely keep the setup when a zen 5 TR gets swapped in.

For a long time, my Achilles' heel was my bride's vacuum. Her Dyson pulled enough amps that the UPS would start singing and trigger the auto-shutdown sequence for the half rack. It took way too long to figure out, as I was usually not around when she did it.

esseph

I have a 5700X with an AIO water cooler and it runs 65C under load. Never seems to crash. Been like this for years.

486sx33

My 5950 didn’t like liquid cooling and lives very well with air cooling :)

shrubble

If you have a double conversion UPS that is complaining about less than 100W deviation, I would recommend you check the UPS for a component that is out of spec or on the way to failure.

bob1029

The concern isn't the average rated TDP. It's the high di/dt (change in current over time) transients of certain workload profiles cascading through the various layers of switch-mode power supplies. Every layer of power delivery has some reactivity to it. I'd agree this would be no problem if all our power supplies were purely linear (and massively inefficient).
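Why di/dt matters more than average power: any inductance in the delivery path produces a voltage V = L·di/dt that opposes a fast change in current. A back-of-the-envelope sketch; the inductance and current step below are hypothetical round numbers, not measurements of any real board, PSU or UPS:

```python
def inductive_droop(inductance_h: float, delta_i_a: float, delta_t_s: float) -> float:
    """Voltage across an inductance for a linear current ramp: V = L * di/dt."""
    return inductance_h * delta_i_a / delta_t_s


# Hypothetical: 1 nH of effective path inductance, load steps by 100 A in 1 us.
v = inductive_droop(1e-9, 100.0, 1e-6)
print(f"{v * 1000:.0f} mV")  # 100 mV
```

The same average wattage delivered as slow ramps barely registers, while fast steps of the same size produce transient droop at every stage that has to respond.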

bell-cot

I'm sure there are specs for how fast a PSU should be able to ramp up in response to spikes in demand, how a motherboard should handle sudden load changes, etc.

But if your UPS (or just the electrical outlet you're plugged into) can't cope - dunno if I'd describe that as cratering your CPU.

66fm472tjy7

Occasionally occurring issues are so annoying. I lived with these issues for years before becoming able to reliably reproduce them by accident and thus making a good guess on the cause:

My system would randomly freeze for ~5 seconds, usually while gaming with a video running in the browser at the same time. Then it started happening reliably in Titanfall 2, and I noticed there were always AHCI errors in the Windows logs at the same time, so I switched to an NVMe drive.

The system would also shut down occasionally (~ once every few hours) in certain games only. Then, I managed to reproduce it 100% of the time by casting lightning magic in Oblivion Remastered. I had to switch out my PSU, the old one probably couldn't handle some transient load spike, even though it was a Seasonic Prime Ultra Titanium.

etempleton

My experience is similar. Modern enthusiast CPUs and hardware compatibility are going backwards. I have a 5900X that randomly crashes at idle, but not under load. My 285K has so far been rock solid and generally feels snappier. I feel like both Intel and AMD are really trying to push the envelope to look good on benchmarks, and this is the end result.

InMice

Have you tried using powerprofilesctl to change the power profile to 'performance' instead of 'balanced' or 'power saver'? I think this would at least prevent the lowest idle states. Just a guess; I've never had this problem myself.

My modern CPU problems are with DDR5 and the pre-boot memory training never completing. So a 9700X build I did that WAS supposed to be located remotely from me has to sit in my office and have its hand held through every reboot, because you never quite know when it's going to decide it needs to retrain and then never come back. That requires pulling the plug at the back, waiting a few minutes, powering back on, and then waiting 30 minutes for 64GB of DDR5 to finish training.

naasking

Crash on idle, interesting. Must be some timing issue related to down clocking, or maybe a voltage issue related to shutting off a core.

sunmag

I have had three systems (two 5800Xs, one 3600X) that reboot/freeze due to WHEA errors. It started after about 3 years of problem-free use. One of the 5800Xs did it so frequently that it was trashed.

protocolture

>I made a machine with a Ryzen 9900X a while back and it had the issue that it would freeze when idling

I also have this issue.

InMice

You could try using powerprofilesctl to change the mode from 'balanced' or 'power saver' to 'performance', since I think this may prevent the CPU from ever entering the low idle states that your freezing happens in. You may also be able to add some flags to your GRUB config file, assuming you are using Linux.
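A small wrapper around the `powerprofilesctl get` and `powerprofilesctl set <profile>` commands, as a sketch. The profile names are the ones power-profiles-daemon uses, and "performance" is only offered on hardware that supports it. Whether that profile actually keeps the CPU out of its deepest idle states is the untested guess above, not documented behavior.

```python
import shutil
import subprocess
from typing import Optional

# Profile names used by power-profiles-daemon.
VALID_PROFILES = ("power-saver", "balanced", "performance")


def set_profile_cmd(profile: str) -> list[str]:
    """Build the command line that switches the active power profile."""
    if profile not in VALID_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    return ["powerprofilesctl", "set", profile]


def current_profile() -> Optional[str]:
    """Return the active profile, or None if powerprofilesctl is unavailable."""
    if shutil.which("powerprofilesctl") is None:
        return None
    result = subprocess.run(["powerprofilesctl", "get"], capture_output=True, text=True)
    return result.stdout.strip() or None


if __name__ == "__main__":
    print("current profile:", current_profile())
    # To actually switch (may require appropriate permissions):
    # subprocess.run(set_profile_cmd("performance"), check=True)
```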

c0balt

If you are on Linux, there are long time known problems with low power cpu states. These states can be entered by your CPU when under low/no load.

A common approach is to go into the BIOS/UEFI settings and check that C6 is disabled. To verify and/or temporarily turn C6 off, see https://github.com/r4m0n/ZenStates-Linux
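The kernel's generic cpuidle sysfs interface (`/sys/devices/system/cpu/cpuN/cpuidle/stateM/`) shows which idle states are in use. As an assumption to keep in mind, these are the kernel's (often ACPI) states, whose numbering does not necessarily map one-to-one onto the hardware C6 state that ZenStates toggles, so this sketch is an inspection aid, not a replacement for the firmware setting. Writing 1 to a state's `disable` file as root turns that state off for that CPU until reboot.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def _read(path: Path) -> str:
    return path.read_text().strip()


def idle_states(cpu: int = 0, root: Path = CPU_ROOT) -> list[dict]:
    """List the cpuidle states of one CPU as the kernel sees them."""
    base = root / f"cpu{cpu}" / "cpuidle"
    states = []
    # Sort numerically so that state10 comes after state9.
    for state in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[len("state"):])):
        states.append({
            "state": state.name,
            "name": _read(state / "name"),                # e.g. POLL, C1, C2
            "latency_us": int(_read(state / "latency")),  # exit latency in microseconds
            "usage": int(_read(state / "usage")),         # times this state was entered
            "disabled": _read(state / "disable") == "1",
        })
    return states


if __name__ == "__main__":
    for s in idle_states():
        flag = " (disabled)" if s["disabled"] else ""
        print(f'{s["state"]}: {s["name"]}, exit latency {s["latency_us"]} us, used {s["usage"]} times{flag}')
```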

hedora

It’s also worth checking all the autodetected stuff that can be overclocked, like ram speed. That stuff can be wrong, and then you get crazy but similar bugs in linux and windows.

johnisgood

This does not fill me with much hope. What am I even supposed to buy at this point, I wonder? I have a ~13-year-old Intel CPU which lacks AVX2 (and I need it by now), and I thought of buying a new desktop (parts separately, of course), but it's crazy to me that it freezes because the CPU goes idle. That was never an issue in my case. I guess I can only hope it is not going to be a problem once I've finished building my PC. :|

What metric should I use to pick a CPU these days? Should I care about reviews? I am fine with a mid-range CPU, for what it's worth, and I thought of the AMD Ryzen 7 5700 or AMD Ryzen 5 5600GT or anything with a similar price tag. They might even be lower-end by now?

hhh

Just buy an AMD CPU. One person’s experience isn’t the world. Nobody in my circle has had an issue with any chip from AMD in recent time (10 years).

Intel is just bad at the moment and not even worth touching.

danieldk

I agree that Intel is bad at the moment (especially with the 13th and 14th gen self-destruct issues). But unfortunately I also know plenty of people with issues with AMD systems.

And it's not bad power quality on the mains, as someone suggested (it's excellent here), or something 'in the air' (whatever that means), if it happens very quickly after buying.

I would guess that a lot of it comes from bad firmware/mainboards, etc., like the recent issue with ASRock mainboards destroying Ryzen 9000-series CPUs: https://www.techspot.com/news/108120-asrock-confirms-ryzen-9... Anyone who uses Linux and has dealt with bad ACPI bugs, etc. knows that a lot of these mainboards probably have crap firmware.

I should also say that I had a Ryzen 3700X and 5900X many years back and two laptops with a Ryzen CPU and they have been awesome.

hedora

I went further and got an AMD system on chip machine with an integrated gpu. It’s fine for gaming and borderline for LLM inference (I should have put 64GB in instead of 32GB).

The only issues are with an intel Bluetooth chipset, and bios auto detection bugs. Under Linux, the hardware is bug for bug compatible with Windows, and I’m down to zero known issues after doing a bit of hardware debugging.

tester756

This is funny because recently my AMD Ryzen 7 5700X3D died and I've decided that my next CPU will be Intel

https://news.ycombinator.com/item?id=45043269

johnisgood

That is what I thought, thanks.

ahofmann

I wouldn't be so hopeless. Intel and AMD CPUs are used in millions of builds and most of them just work.

dahcryn

Indeed. I feel so weird reading this discussion section.

My home server is on a 5600G. I turned it on, installed Home Assistant, Jellyfin, etc., and it hasn't been off since. It's been chugging along completely unattended, no worries.

Yes, it's in a basement where temperature is never above 21C, and it's almost never pushed to 100%, and certainly never for extended periods of time.

But it has the stock cooler, a cheap motherboard, cheap RAM and a cheap SSD (with expensive NAS-grade mechanical hard drives).

danieldk

However, the vast majority of PCs out there are not hobbyist builds but Dell/Lenovo/HP/etc. [1] with far fewer possible configurations (and much more testing as a byproduct). I am not saying these machines never have issues, but a high failure rate would not be acceptable to their business customers.

[1] Well, most non-servers are probably laptops today, but the same reasoning applies.

homebrewer

It's either bad luck, bad power quality from the mains, or something in the air in that particular area. I know plenty of people running AM5 builds, have done so myself for the last couple of years, and there were no problems with any of them apart from the usual amdgpu bugs in the latest kernels (which are "normal" since I'm running mainline kernels; it's easy to solve by just sticking to LTS, and it has seemingly improved anyway since 6.15).

scns

> I thought of AMD Ryzen 7 5700

Definitely not that one if you plan to pair it with a dedicated GPU! The 5700X has twice the L3 cache. All Ryzen 5000 chips with an integrated GPU have only 16MB of L3, and the 5700 is one of those, just with the GPU disabled.

PartiallyTyped

3 of my last 4 machines have been AMD x NVDA and I have been very happy. The intel x NVDA machine has been my least stable one.

orthoxerox

It's interesting how Intel has been surviving in smaller and smaller market niches these days:

  - cheap ULV chips like N100, N150, N300
  - ultrabook ULV chips (I hope Lunar Lake is not a fluke)
  - workstation chips that aren't too powerful (mainstream Core CPUs)
  - inexpensive GPUs (a surprising niche, but excruciatingly small)
AMD has been dominating them in all other submarkets.

Without a mainstream halo product Intel has been forced to compete on price, which is not something they can afford. They have to make a product that leapfrogs either AMD or Nvidia and successfully (and meaningfully) iterate on it. The last time they tried something like that was in 2021 with the launch of Alder Lake, but AMD overtook them with 3D V-Cache in 2022.

norman784

AFAIK most (if not all) business laptops, e.g. Dell's, are Intel-based? Also, I believe Intel is still big in the server market.

guardian5x

Dell has been very loyal to Intel all these years, but I guess that is under pressure as well, as more and more customers look for AMD CPUs nowadays. I guess the CPU doesn't matter much in standard office laptops, and price is more important.

noisem4ker

I'm not sure whether a "Dell Pro 16 Plus" is considered a "business laptop" (although I think so), but I'm using one right now and it has an AMD Ryzen AI 5 Pro CPU inside.

casenmgreen

Curiously, libgmp reported something similar recently, but with AMD.

https://gmplib.org/gmp-zen5

devnullbrain

In my experience building PCs this is not so curious. There are just a lot of duds, from individual SKUs to entire generations, and both manufacturers and retailers will do anything to prevent you RMAing them.

I also find that, as performance improves and tolerances get tighter throughout the system, the set of 'things that can screw your build' grows bigger.

kd913

General consensus on that case seems to be they picked a budget motherboard and skimped on the cooler.

devnullbrain

That ASUS motherboard is far from the cheapest available. If using it makes the user liable for failure, a large part of the market is unsuitable.

For both the cooler and the motherboard, AMD have too much control to look the other way. The chip can measure its own temperature and the conceit of undermining partners by moving things on chip and controlling more of the ecosystem is that things perform better. They should at least perform.

bubblebeard

I have the same CPU in my primary system, and if you can afford it, it’s so choice.

A big surprise for me, having owned both a Ryzen gen 1 & 3 previously, was that this time my system posted without me needing to flash my BIOS or play around with various RAM configurations. Felt like magic.

baobabKoodaa

Why is the author showing a chart of room temperatures? CPU temperature is what matters here. Expecting a CPU to be stable at 100C is just asking for problems. Issue probably could have been avoided by making improvements to case airflow.

Jolter

I would expect the CPU to start throttling at high temperatures in order to avoid damage. Allegedly, it never did, and instead died. Do you think that’s acceptable in 2025?

ACCount37

Thermal throttling originated as a safety feature. The early implementations were basically a "thermal fuse" in function, and cut all power to the system to prevent catastrophic hardware damage. Only later did the more sophisticated versions that do things like "cut down clocks to prevent temps from rising further" appear.

On desktop PCs, thermal throttling is often set up as "just a safety feature" to this very day. Which means: the system does NOT expect to stay at the edge of its thermal limit. I would not trust thermal throttling with keeping a system running safely at a continuous 100C on die.

100C is already a "danger zone", with elevated error rates and faster circuit degradation, and a die has only so many thermal sensors. Some under-sensored hotspots may be running a few degrees hotter than that. That may not be enough to kill the die outright, but it is more than enough to put those hotspots into a "fuck around" zone of increased instability and massively accelerated degradation.

If you're relying on thermal throttling to balance your system's performance, as laptops and smartphones often do, then you seriously need to dial in better temperature thresholds. 100C is way too spicy.
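To make the difference between a fuse-style cutoff and clock-reducing throttling concrete, here is a toy simulation. Everything in it is an assumption for illustration: the linear heating/cooling model, the 110C trip point, the 90C throttle target, the 0.5 GHz step and all the other constants are invented, not taken from any real CPU or firmware.

```python
# Toy model only: every constant below is a made-up, illustrative value,
# not a real CPU's thermal characteristics.
HEAT_PER_GHZ = 4.0   # degrees C added per step per GHz of clock
COOLING = 0.2        # fraction of (temp - ambient) shed per step
AMBIENT_C = 25.0
SHUTDOWN_C = 110.0   # hypothetical "thermal fuse" trip point
MAX_GHZ, MIN_GHZ, STEP_GHZ = 5.0, 1.0, 0.5


def simulate(throttle_target_c=None, steps=300):
    """Return (per-step temperatures, whether the system is still powered).

    throttle_target_c=None models a fuse-only design: run flat out until
    SHUTDOWN_C, then cut power. Otherwise the clock is stepped down while the
    die is above the target and stepped back up once it is 5C below it.
    """
    temp, clock, powered = 40.0, MAX_GHZ, True
    temps = []
    for _ in range(steps):
        heat = HEAT_PER_GHZ * clock if powered else 0.0
        temp += heat - COOLING * (temp - AMBIENT_C)
        temps.append(temp)
        if powered and temp >= SHUTDOWN_C:
            powered = False  # last-resort cutoff, present in both modes
        elif powered and throttle_target_c is not None:
            if temp > throttle_target_c:
                clock = max(MIN_GHZ, clock - STEP_GHZ)
            elif temp < throttle_target_c - 5.0:
                clock = min(MAX_GHZ, clock + STEP_GHZ)
    return temps, powered


fuse_temps, fuse_alive = simulate()
thr_temps, thr_alive = simulate(throttle_target_c=90.0)
print(f"fuse only: peak {max(fuse_temps):.1f}C, still running: {fuse_alive}")
print(f"throttled: peak {max(thr_temps):.1f}C, still running: {thr_alive}")
```

In the fuse-only run the die heads toward an equilibrium above the trip point, so the system simply shuts off; with throttling, the clock drops until heating and cooling roughly balance, and the temperature oscillates around the target well clear of the cutoff. Real boost and throttle controllers are far more sophisticated than a fixed step like this.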

baobabKoodaa

What does room temperature have to do with any of this? Yes, you can lower your CPU temperature by lowering your room temperature. But you can also lower your CPU temperature by a variety of other means; particularly by improving case airflow. CPU temperature is the interesting metric here, not room temperature.
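A rough first-order way to see how room temperature and CPU temperature relate, assuming a single lumped thermal resistance θ from die to room air (a simplification; real coolers and cases are not this linear, and the numbers are illustrative, not measured):

$$T_{\text{die}} \approx T_{\text{ambient}} + P\,\theta_{\text{die}\to\text{air}}$$

$$250\,\text{W} \times 0.3\,^{\circ}\text{C/W} = 75\,^{\circ}\text{C} \;\Rightarrow\; T_{\text{die}} \approx 25 + 75 = 100\,^{\circ}\text{C}\ \text{(or } 103\,^{\circ}\text{C at } 28\,^{\circ}\text{C ambient)}$$

In this model, room temperature shifts die temperature roughly one-for-one, while better airflow or a better cooler lowers θ and therefore the whole rise.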

FeepingCreature

No, but it's also important to realize that this CPU was running at an insane temperature that should never happen in normal operation. I have a laptop with an undersized fan, and if I max out all my cores with full load, I barely cross 80C. 100C is mental. It doesn't matter if the manufacturer set the peak temperature wrong; a computer whose CPU reaches 100 degrees Celsius is simply built incorrectly.

If nothing else, it very clearly indicates that you can boost your performance significantly by sorting out your cooling, because otherwise your CPU will be stuck permanently in emergency throttling.

izacus

I somehow doubt that; are you looking at the same temperature? I haven't seen a laptop with a thermal limit below 95C in a long time, and any gaming laptop will run its package at 95C under load.

userbinator

> Expecting a CPU to be stable at 100C is just asking for problems.

I had an 8th-gen i7 sitting at the thermal limit (~100C) in a laptop for half a decade 24/7 with no problem. As sibling comments have noted, modern CPUs are designed to run "flat-out against the governor".

Voltage-dependent electromigration is the biggest problem, and it is what led to the failures in Intel CPUs not long ago. Perhaps ironically, those were caused by cooling that was "too good": the CPU found that there was still plenty of thermal headroom, so it boosted frequency, and with it voltage, to reach the limit, and it went too far with the voltage. Had it hit the thermal limit, it would have backed off on voltage and frequency.

swinglock

The text clearly explains all of this.

baobabKoodaa

No it does not. Which part of the text do you feel explains this?

chmod775

First off, there's a chart for CPU temperature at the very top and they do talk about it:

> I also double-checked if the CPU temperature of about 100 degrees celsius is too high, but no: [..] Intel specifies a maximum of 110 degrees. So, running at “only” 100 degrees for a few hours should be fine.

Secondly, the article reads:

> Tom’s Hardware recently reported that “Intel Raptor Lake crashes are increasing with rising temperatures in record European heat wave”, which prompted some folks to blame Europe’s general lack of Air Conditioning.

> But in this case, I actually did air-condition the room about half-way through the job (at about 16:00), when I noticed the room was getting hot. Here’s the temperature graph:

> [GRAPH]

> I would say that 25 to 28 degrees celsius are normal temperatures for computers.

So apparently a Tom's Hardware article connected a recent heat wave with crashing computers containing Intel CPUs. They brought that up to rule it out by presenting a graph showing reasonable room temperatures.

I hope this helps.

formerly_proven

Strange, laptop CPUs and their thermal solutions are designed in concert to stay at Tjmax when under sustained load and throttle appropriately to maintain maximum temperature (~ power ~ performance).

ACCount37

And those mobile devices have much more conservative limits, and much more aggressive throttling behavior.

Smartphones have no active cooling and are fully dependent on thermal throttling for survival, but they can start throttling at as low as 50C easily. Laptops with underspecced cooling systems generally try their best to avoid crossing into triple digits - a lot of them max out at 85C to 95C, even under extreme loads.

dezgeg

For handhelds the temperature of the device's case is one factor as well when deciding the thermal limits (so you don't burn the user's hands) - less of a problem on laptops.

perching_aix

> Expecting a CPU to be stable at 100C is just asking for problems.

No. High performance gaming laptops will routinely do this for hours on end for years.

If it can't take it, it shouldn't allow it.

bell-cot

I've not looked at the specifics here, but "stable at X degrees, Y% duty cycle, for Z years" is just another engineering spec.

Intel's basic 285K specs (https://www.intel.com/content/www/us/en/products/sku/241060/...) say "Max Operating Temperature 105 °C".

So, yes - running the CPU that close to its maximum is really not asking for stability, nor longevity.

No reason to doubt your assertion about gaming laptops, but chip binning is a thing, and the manufacturers of those laptops have every reason to pay Intel a premium for CPUs which test to better values of X, Y, and Z.

Fr0styMatt88

Alright so two CPUs failing in the same system has gotta be strange; mobo issue?

Secondly, what BIOS settings should I be using to run safely? Is XMP (or EXPO, the AMD equivalent) safe? If I don't enable it, my RAM runs at default speeds that are way below the stick's rated spec.

Anyone know of a good guide for this stuff?

fmajid

I generally prefer AMD Zen5 to Intel due to AVX512 not being gimped by crippled E-cores that really don't belong on a desktop system, SMT (hyperthreading) that actually works and using TSMC processes, but they've also had their issues recently:

https://www.theregister.com/2025/08/29/amd_ryzen_twice_fails...

whyoh

It's crazy how unreliable CPUs have become in the last 5 years or so, both AMD and Intel. And it seems they're all running at their limit from the factory, whereas 10-20 years ago they usually had ample headroom for overclocking.

stavros

That's good, isn't it? I don't want the factory leaving performance on the table.

topspin

I do. I've been buying Intel for the same reason as the author: I build machines that don't have glitches and mysterious failures and driver issues and all the rest of the garbage one sees PC assemblers inflict on themselves. Make conservative choices and leave ample headroom and you get a solid machine with no problems.

I've never overclocked anything and I've never felt I've missed out in any way. I really can't imagine spending even one minute trying to squeeze 5% or whatnot tweaking voltages and dealing with plumbing and roaring fans. I want to use the machine, not hotrod it.

I would rather Intel et al. leave a few percent "on the table" and sell things that work, for years on end without failure and without a lot of care and feeding. Lately it looks like a crapshoot trying to identify components that don't kill themselves.

stavros

So underclock your CPU.

bell-cot

Depends on your priorities. That "performance on the table" might also be called "engineering safety factor for stability".

devnullbrain

Yep. Redundancy and headroom are antonyms of efficiency.

stavros

Given that there used to be plenty of room to overclock the cores while still keeping them stable, I think it was more "performance on the table".

makeitdouble

TBF using more conservative energy profiles will bring stability and safety. To that effect in Windows the default profile effectively debuffs the CPU and most people will be fine that way.

techpression

The 7800X3D is amazing here: it runs extremely cool and stable, and you can push it far above its defaults and it still won’t get to 80C, even with air cooling. Mine was running between 60-70C under load with PBO set to high. Unfortunately, it seems its successor is not that great :/

k4rli

Same with my 7900X. System uptimes of a month or more are common, and it nearly always runs at 5.0GHz. It never goes above 80C or so, either.

hu3

Same for 9800X3D here, which is basically the same CPU. Watercooled. Silent. Stupidly fast.

williamDafoe

The 7000 series of CPUs is NOT known for running cool, unlike the AMD 5000 series (which are basically server CPUs repurposed for desktop usage). In the 7000 series, AMD decided to just increase the power of each CPU and that's where most of the performance gains are coming from - but power consumption is 40-50% higher than with similar 5000-series CPUs.

scns

When you use EcoMode with them you only lose ~5% performance, but are still ~30% ahead of the corresponding 5000-series CPU. You can reduce PPT/TDP even further while still ahead.

https://www.computerbase.de/artikel/prozessoren/amd-ryzen-79...

techpression

I specifically singled out the 7800X3D though, it runs incredibly cool and at a very low power draw for the performance you get.

mldbk

> You know, I'm something of a CPU engineer myself :D

Actually, almost everything you wrote is untrue, and the commenter above already sent you some links.

7800X3D is the GOAT, very power efficient and cool.

mrheosuper

We have unstable "code" generators, so unstable CPUs would be natural.

cjpartridge

Try enabling PBO and finding a setting for the curve optimizer that works for you, each CPU is different but -10/-15 is generally achievable - should reduce temperatures across the board and potentially give you some more performance.

steve1977

The problem is that stable curve optimizer settings can vary hugely across cores.

I had differences of like 20 or more between different cores... i.e. one core might work fine at -20, the other maybe only at +5.

cjpartridge

Most definitely - you should always do your own stress testing with your specific CPU (and system) to find out what's stable.

And while an all-core CO might not be optimal, based on personal experience and what I've seen across multiple enthusiast communities, more often than not you can get a worthwhile improvement to temps/perf with an all-core CO.

That being said, there are ways to find and set the best CO values per core, but it will certainly take more effort, stress testing and time.
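The per-core approach can be sketched as a simple search loop. This is a minimal illustration under loud assumptions: `fake_stress_test` and `HIDDEN_LIMITS` are hypothetical stand-ins for hours of real per-core stress testing (which this sketch does not implement), the +10 to -30 offset range and step of 5 are illustrative, and it assumes stability is monotonic in the offset, which real silicon does not guarantee.

```python
from typing import Callable, Dict, Iterable, Optional


def tune_core(
    core: int,
    is_stable: Callable[[int, int], bool],
    conservative: int = 10,
    aggressive: int = -30,
    step: int = 5,
) -> Optional[int]:
    """Walk from a conservative offset toward an aggressive one and return
    the last offset that passed, stopping at the first failure.
    Returns None if even the most conservative offset fails."""
    last_good = None
    for offset in range(conservative, aggressive - 1, -step):
        if not is_stable(core, offset):
            break
        last_good = offset
    return last_good


def tune_all(cores: Iterable[int], is_stable) -> Dict[int, Optional[int]]:
    return {core: tune_core(core, is_stable) for core in cores}


# Hypothetical per-core limits, standing in for real stress-test results.
HIDDEN_LIMITS = {0: -20, 1: 5, 2: -15, 3: -25}


def fake_stress_test(core: int, offset: int) -> bool:
    # Pretend a core is stable at any offset at or above its hidden limit.
    return offset >= HIDDEN_LIMITS[core]


per_core = tune_all(HIDDEN_LIMITS, fake_stress_test)
# A single all-core value has to satisfy the weakest core
# (assumes every core found a passing offset).
all_core = max(per_core.values())
print(per_core, "all-core:", all_core)
```

The last line shows the trade-off discussed above: one weak core forces the all-core value up to the least aggressive setting, while per-core values let the better cores keep their larger undervolt.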