The best Chinese RAM on the market is 50% larger and requires more power and thus emits more heat, as it is a 16nm feature size. If they can get to competitive sizes, then of course data centers will purchase it.
The DGX Spark is the same chip, and most of those are in the low $3k to $5k range depending on manufacturer, storage config, etc. The DGX Sparks also have ConnectX-7 cards in them to support the 200Gbps networking for RoCE.
So I would expect the mini PCs to come in less than the sparks. Laptops I assume will be close in price with the addition of all the other laptop stuff.
Prefill is another advantage vs. Apple. It's way way way way faster on a spark than it is even on an m5 max.
Same model, same quant, same query, with settings matched as closely as I can get them with vLLM. For workloads with large prompts and low cacheability, one of my Sparks will often be done responding before the MBP is done with prefill.
Not true. This is aimed squarely at the Strix Halo and Mac markets. It's basically just strictly better than the Strix, and it's not clear-cut vs. the Macs in any sort of blanket statement.
My M5 Max 128GB MBP decodes faster than one of my Sparks, but the Spark's prefill is so much faster it can often answer the same query before the Mac's prefill is finished. If you have large prompts, low cacheability, etc., a Spark might be a very good option.
Not to mention you can get two Sparks, and the MBP will be 85%+ of the cost at half the RAM.
I'm kind of tempted to pick one up. Leave running big models to my dual dgx setup, and all the misc. random stuff on an rtx.
Prefill will be a huge deal if batched unattended inference of SOTA models (on consumer platforms) becomes viable, because at that point it's the main remaining bottleneck. If running 30 inferences together boosts your decode throughput to 3x (that's consistent with some very rough experiments, though these haven't even looked at trying to mask SSD offload latency just yet), the batch takes 10x the decode time of a single inference but 30x the prefill time, because prefill workloads are fully compute bound already on consumer platforms and don't benefit from batching much at all.
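Spelling out that arithmetic as a quick sketch (the `batch_cost` helper and its parameter names are just for illustration, and the 3x figure is the rough estimate from above, not a benchmark):

```python
def batch_cost(n_requests, decode_speedup, prefill_speedup=1.0):
    """Total (prefill, decode) time for a batch, in units of the time
    one request takes when run alone.

    decode_speedup / prefill_speedup are aggregate throughput gains from
    batching; prefill defaults to 1.0 (no gain) on the assumption that it
    is already compute bound.
    """
    total_prefill = n_requests / prefill_speedup
    total_decode = n_requests / decode_speedup
    return total_prefill, total_decode


prefill_x, decode_x = batch_cost(n_requests=30, decode_speedup=3.0)
print(prefill_x, decode_x)  # 30.0 10.0
```

So batching cuts the decode bill for 30 requests from 30 units to 10, while the prefill bill stays at 30 units, which is why prefill ends up as the part that dominates.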
Anything where you're dealing with a large volume of records/documents. Lots of people are using these for large-scale digitization of documents - scanned stuff being OCR'ed and summarized, generating embeddings, etc. Large scale translation.
Anywhere where you might have a large backlog of data to work with can end up in this sort of situation.
For these specifically, they appear basically transparently to the GPU. There's a lot of software/firmware stuff for this, but also a different hardware architecture - while the RAM is on the CPU die, the NVLink-C2C gives it extremely low latency and 600GB/s bandwidth between the GPU and CPU.
Which is part of the problem with all of this nonsense right now - everyone is running off of emotion and not looking to see if what is being said is actually true. Which is somewhat ironic, considering the message of the article.
Never had this issue with any of my MBPs. I never even have MagSafe available, couldn't tell you where a USB-C to MagSafe cable is in my house if my life depended on it.
While not quite what you are saying here, I have found that I like usbc charging on my 15in m3 MacBook Air way more than its magsafe. The magsafe is always falling out as soon as I move the computer on and off my lap and is another cable that I can only use for one purpose. Nowhere near as good as it was back in the early 2010s macbook days...
They are saying they don’t own a MagSafe cable and have never experienced the above anecdote of a MacBook that was too dead to be charged by USBC alone. Likewise, my M1 MacBook Pro has only ever been charged by USBC and I let it die all the way often.
I remember getting my first CalDigit TB dock and being excited - everyone seemed to love them. I expected it to largely Just Work.
That thing Didn't Work more than it Worked, but options were slim. Eventually it fully died about 14 months in. I didn't even bother checking to see what the warranty terms were. TS3 Plus, back in 17 or 18. What a piece of shit.
Sounds like it's a good thing I didn't bother trying again in the early 2020s and only recently bought a new dock.
Very similar story here. Went through two CalDigit TB hubs, most recently a TB4. Soooo many issues. The same Ethernet issue described above, a failure to provide the rated PD power, and the TB linked monitor connection was dodgy af. A very expensive lesson. Add to this the confusing (and deceptive) jumble of TB cable standards. I have so many supposed TB3, 4, and 5 rated cables I could probably circle my house. You have to hand Apple one thing and that’s the consistency of their hardware due to tight control of the stack and supply chain. You get far fewer of these sorts of issues.
I have a TS3+ that broke about 18 months in. I talked to support, set up a repair, and before I could send it the dock unbroke and has worked since. Truly mysterious, and it left me with a sense of unease about that thing given the cost.
My CalDigit TS4 dock is so close to being perfect, except that my secondary monitor only turns on maybe 50% of the time if I connect it to the dock via USB-C. I've given up and now have USB-C from the secondary monitor going straight to my laptop, but let me be entirely clear in saying I hate that I have to do that.
There are a variety of inference engines that support this, regardless of whether or not there is native FP8 in Ampere - llama.cpp will do it quite happily. In vLLM you can do W8A16 quant too.
There are a whole lot of ways to quantize models in general.
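For anyone unfamiliar with the W8A16 idea (8-bit weights, 16-bit activations), here is a toy sketch in plain Python. To be clear, this is not how llama.cpp or vLLM implement it; it's a generic symmetric per-row int8 scheme I'm assuming for illustration, with Python floats standing in for the 16-bit activations and both function names made up.

```python
def quantize_row(weights):
    """Symmetric per-row int8 quantization: returns (int values, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp into the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def w8a16_dot(q_row, scale, activations):
    """Dot product with weights stored as int8 and dequantized on the fly.

    Activations are left in higher precision; only the weights are 8-bit.
    """
    return scale * sum(q * a for q, a in zip(q_row, activations))


# Made-up example values.
weights = [0.50, -1.27, 0.03, 0.90]
activations = [1.0, 2.0, -3.0, 0.5]

q_row, scale = quantize_row(weights)
approx = w8a16_dot(q_row, scale, activations)
exact = sum(w * a for w, a in zip(weights, activations))
print(q_row, approx, exact)
```

The point is that the weights only need to be stored in 8 bits; the math itself can run in whatever precision the hardware supports natively, so no hardware FP8 is required.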
Rising supply from China will impact prices even in countries where there are tariffs.