Tangentially: I bought a nearly identically-specced model (didn't spring for the 8 TB SSD; in retrospect, had I kept it, the 4 TB would've been fine) and returned it yesterday due to thermal throttling. I have an M4 Pro with 48 GB RAM, and since the M5 Max was touted as being quite a bit faster for various local LLM uses, I decided I'd try it.
Turns out the heatsink in the 14" isn't nearly enough to handle the Max with all cores pegged. I'd get about 30 seconds of full power before frequency would drop like a rock.
I haven't really had a problem with thermal throttling, but my highest-compute activity is inference. The main performance fall-off I've observed is that token output rate degrades with cache/context size much more steeply than I expected given the memory bandwidth, compared to GPU-based inference I've done on a PC. But other than the fans spinning up during prompt processing, I'm able to stay at peak CPU usage without the clock speed dropping. Generally, though, I only sustain peak compute utilization for around 2-3 minutes at a time.
I'm wondering if there was something wrong with your particular unit?
CPU performance was acceptable; the GPU was the one falling off a cliff.
Re: particular unit, I’m not sure - it was perfectly fine during anything “normal,” and admittedly, asking a laptop to run at 100% for any extended period of time is already a big ask. But it’s possible, I suppose.
I’m waiting for the Studios to get the Max and/or Ultra, and will reconsider then whether I want one, or whether I really need to play with local LLMs at this time.