Hacker News | cthalupa's comments

Plenty of demand outside the US. Why would the hyperscalers not buy the Chinese RAM for all of their datacenters around the world, other than the ones in the US?

Rising supply from China will impact prices even in countries where there are tariffs.


The best Chinese RAM on the market has a 16nm feature size, so it is 50% larger, requires more power, and thus emits more heat. If they can get to competitive sizes, then of course data centers will purchase it.

> The cuda cores can be broken up into compute complexes with larger blocks of memory directly attached to the cores.

Perhaps in theory, but for the GB10 stuff the memory is all on the CPU die and connected to the GPU die via NVLink-C2C.


The DGX Spark is the same chip, and most of those are in the low $3,000s to $5,000 range depending on manufacturer, storage config, etc. The DGX Sparks also have ConnectX-7 cards in them to support the 200Gbps networking for RoCE.

So I would expect the mini PCs to come in at less than the Sparks. Laptops I assume will be close in price with the addition of all the other laptop stuff.


Prefill is another advantage vs. Apple. It's way way way way faster on a Spark than it is even on an M5 Max.

Same model, same quant, same query, with settings matched as closely as I can get them in vLLM, and for workloads with large prompts + low cacheability, one of my Sparks will often be done responding before the MBP is done with prefill.
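
A rough way to see why this can happen is to model response time as prefill time plus decode time. This is only a back-of-the-envelope sketch: the token rates below are made-up placeholders chosen for illustration, not measurements of a Spark or an MBP, and `time_to_answer` is a hypothetical helper.

```python
def time_to_answer(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Rough latency model: a prefill phase followed by a decode phase (seconds)."""
    prefill_s = prompt_tokens / prefill_tps
    decode_s = output_tokens / decode_tps
    return prefill_s, prefill_s + decode_s

# Hypothetical rates in tokens/second, for illustration only.
FAST_PREFILL = {"prefill_tps": 2000.0, "decode_tps": 20.0}  # slower decode
FAST_DECODE = {"prefill_tps": 200.0, "decode_tps": 40.0}    # slower prefill

# A large, uncached prompt with a short answer.
prompt_tokens, output_tokens = 60_000, 500

a_prefill, a_total = time_to_answer(prompt_tokens, output_tokens, **FAST_PREFILL)
b_prefill, b_total = time_to_answer(prompt_tokens, output_tokens, **FAST_DECODE)

print(f"fast-prefill box: answer complete at {a_total:.1f}s")
print(f"fast-decode box:  prefill alone takes {b_prefill:.1f}s, answer at {b_total:.1f}s")
```

With a short prompt the ordering flips, since decode time dominates and the faster decoder wins.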


Not true. This is aimed squarely at the Strix Halo and Mac markets. It's basically just strictly better than the Strix, and it's not clear-cut vs. the Macs in any sort of blanket statement.

My M5 Max 128GB MBP decodes faster than one of my Sparks, but the Spark's prefill is so much faster it can often answer the same query before the Mac's prefill is finished. If you have large prompts, low cacheability, etc., a Spark might be a very good option.

Not to mention you can get two Sparks, and the MBP will be 85%+ of the cost with half the RAM.

I'm kind of tempted to pick one up. Leave running big models to my dual DGX setup, and all the misc. random stuff on an RTX.


Prefill will be a huge deal if batched unattended inference of SOTA models (on consumer platforms) becomes viable, because at that point it's the main remaining bottleneck. Say running 30 inferences together boosts your decode throughput to 3x (that's consistent with some very rough experiments, though these haven't even looked at trying to mask SSD offload latency just yet). Relative to a single inference, that's 10x the total decode time but 30x the total prefill time, because prefill workloads are already fully compute bound on consumer platforms and don't benefit much from batching at all.
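
The arithmetic behind those two figures can be sketched as follows. This assumes, as the comment does, that batching gives prefill no speedup at all; `batched_totals` is a hypothetical helper, and the 3x decode figure is the commenter's rough estimate rather than a benchmark.

```python
def batched_totals(n_requests, decode_speedup, prefill_speedup=1.0):
    """Total time for n requests, in multiples of one unbatched request's phase time."""
    decode_total = n_requests / decode_speedup
    prefill_total = n_requests / prefill_speedup
    return decode_total, prefill_total

# 30 requests, decode throughput boosted to 3x by batching, prefill unchanged.
decode_total, prefill_total = batched_totals(n_requests=30, decode_speedup=3.0)
print(f"decode: {decode_total:.0f}x, prefill: {prefill_total:.0f}x")
```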

Fair, but I don’t see what use case you have with this. Mind sharing?

Seems niche to be both uncacheable and long context?


Anything where you're dealing with a large volume of records/documents. Lots of people are using these for large-scale digitization of documents - scanned stuff being OCR'ed and summarized, generating embeddings, etc. Large scale translation.

Anywhere you might have a large backlog of data to work with can end up in this sort of situation.


For these specifically, they appear basically transparently to the GPU. There's a lot of software/firmware stuff for this, but also a different hardware architecture - while the RAM is on the CPU die, the NVLink-C2C gives it extremely low latency and 600GB/s bandwidth between the GPU and CPU.

> If I see code like the one above posted by the OP, that the author wouldn't have written, I start to pay attention.

Except the author did write it. https://github.com/RsyncProject/rsync/issues/959#issuecommen...

Which is part of the problem with all of this nonsense right now - everyone is running off of emotion and not looking to see if what is being said is actually true. Which is somewhat ironic, considering the message of the article.


Never had this issue with any of my MBPs. I never even have MagSafe available, couldn't tell you where a USB-C to MagSafe cable is in my house if my life depended on it.

While not quite what you are saying here, I have found that I like USB-C charging on my 15-inch M3 MacBook Air way more than its MagSafe. The MagSafe is always falling out as soon as I move the computer on and off my lap, and it is another cable that I can only use for one purpose. Nowhere near as good as it was back in the early 2010s MacBook days...

They are saying they don’t own a MagSafe cable and have never experienced the above anecdote of a MacBook that was too dead to be charged by USB-C alone. Likewise, my M1 MacBook Pro has only ever been charged by USB-C and I often let it die all the way.

In my specific case, the laptop was already at low (<10%) battery when I closed the lid, and it sat for several days without a top-up.

I remember getting my first CalDigit TB dock and being excited - everyone seemed to love them. I expected it to largely Just Work.

That thing Didn't Work more than it Worked, but options were slim. Eventually it fully died about 14 months in. I didn't even bother checking to see what the warranty terms were. TS3 Plus, back in '17 or '18. What a piece of shit.

Sounds like it's a good thing I didn't bother trying again in the early 2020s and only recently bought a new dock.


Very similar story here. Went through two CalDigit TB hubs, most recently a TB4. Soooo many issues: the same Ethernet issue described above, a failure to provide the rated PD power, and a TB-linked monitor connection that was dodgy af. A very expensive lesson. Add to this the confusing (and deceptive) jumble of TB cable standards. I have so many supposedly TB3-, TB4-, and TB5-rated cables I could probably circle my house. You have to hand Apple one thing, and that’s the consistency of their hardware due to tight control of the stack and supply chain. You get far fewer of these sorts of issues.

I have a TS3+ that broke about 18 months in. I talked to support, set up a repair, and before I could send it the dock unbroke and has worked since. Truly mysterious, and it left me with a sense of unease about that thing given the cost.

My CalDigit TS4 dock is so close to being perfect, except that my secondary monitor only turns on maybe 50% of the time if I connect it to the dock via USB-C. I've given up and now have USB-C from the secondary monitor going straight to my laptop, but let me be entirely clear: I hate that I have to do that.

You can split tensors across an AMD GPU and Nvidia GPU - different architectures are not an issue. People run LLMs across some pretty crazy setups.

It depends, but you cannot directly mix, for example, Ampere with Ada because of the lack of support for native FP8 in Ampere.

There are a variety of inference engines that support this, regardless of whether or not there is native FP8 in Ampere - llama.cpp will do it quite happily. With vLLM you can do W8A16 quantization too.

There are a whole lot of ways to quantize models in general.
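
As a toy illustration of what W8A16 means (8-bit weights, 16-bit activations), here is a minimal pure-Python sketch of symmetric per-row weight-only quantization. It is not how llama.cpp or vLLM implement it; the function names are hypothetical, and plain Python floats stand in for 16-bit activations. The point is that the weights are stored as integers and the arithmetic happens in the activation precision, so the scheme does not rely on native FP8 hardware.

```python
def quantize_row_int8(row):
    """Symmetric per-row quantization: floats -> (int8-range values, scale)."""
    max_abs = max(abs(w) for w in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale

def w8a16_matvec(q_rows, scales, activations):
    """Dot each int8 weight row with float activations, then rescale the result."""
    out = []
    for q, scale in zip(q_rows, scales):
        acc = sum(qi * a for qi, a in zip(q, activations))
        out.append(acc * scale)
    return out

# Tiny made-up weight matrix and activation vector.
weights = [[0.50, -1.00, 0.25], [2.00, 0.10, -0.40]]
activations = [1.0, 2.0, 3.0]  # stand-in for 16-bit activations

quantized = [quantize_row_int8(row) for row in weights]
q_rows = [q for q, _ in quantized]
scales = [s for _, s in quantized]

approx = w8a16_matvec(q_rows, scales, activations)
exact = [sum(w * a for w, a in zip(row, activations)) for row in weights]
print("quantized result:", approx)
print("full-precision result:", exact)
```

The quantized result should land close to the full-precision one, with the gap set by the rounding error in each row's weights.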


Yeah, you'd need to use asymmetric quantization and other software techniques.
