Are there multiple instances of each model running? Or at least more than 1? I'd be fascinated to see how multiple Claude instances would fare: would they all be up, or did this instance just get lucky?
While space has always interested me quite a bit, I've never looked into the toilet situation and I had this scene [0] from an unrealistic kids movie firmly fixed in my brain as "this is how they use the restroom in space, or something better since that movie is old".
YouTube Premium is the best ~~$11.99~~ ~~$13.00~~ $15.17 I spend per month...
I actually don't care that much about YouTube content and it's not a place I go and hang out, but I'll pay just about anything to avoid seeing ads. Yes, I know ad blockers exist, but getting them to work on laptop, phone, Apple TV directly, Apple TV via casting, etc is not easy or even possible in some cases. If I had to guess I watch ~1hr of YouTube a week, if I watch more than that I'm watching longer-form content, but mostly it's because everyone hosts their videos there so the 3-7min here and there add up to ~1hr (tutorials, product launch, help/documentation video, etc). As much as it pains me to fork over $15+/mo for that, I hate being interrupted by ads more.
Firefox + uBlock Origin on smartphone and desktop = no ads (though I personally prefer Vivaldi on desktop, but for simplicity just recommend Firefox)
SmartTube on Android TV has no ads and skips sponsors
so all you need to remember is two apps, FF (+uBO) and SmartTube on TV. You install them once and don't have to care anymore, which is comparable in effort to setting up payment for YT
As with all of these cure-alls, I'm wary. Mostly I'm wary because I anticipate the developer will lose interest in very little time and also because it will just get subsumed into CC at some point if it actually works. It might take longer but changing my workflow every few days for the new thing that's going to reduce MCP usage, replace it, compress it, etc is way too disruptive.
I'm generally happy with the base Claude Code and I think running a near-vanilla setup is the best option currently with how quickly things are moving.
Agreed. Projects like these tend to feel shortsighted.
Lately, I lean towards keeping a vanilla setup until I’m convinced the new thing will last beyond being a fad (and not been subsumed by an AI lab) or beyond being just for niche use cases.
For example, I still have never used worktrees and I barely use MCPs. But, skills, I love.
In my view an unappreciated benefit of the vanilla setup is you can get really accustomed to the model’s strengths and weaknesses. I don’t need a prompt to try to steer around these potholes when I can navigate on my own just fine. I love skills too because they can be out of the way until I decide to use them.
I also share something of an "efficient market hypothesis" with regards to Claude Code. Given that Anthropic is basically a hothouse of geniuses recursively dogfooding their own product, the market pressure to make the vanilla setup be the one that performs best at writing code is incredibly high. I just treat CLAUDE.md like my first draft memo to a very smart remote colleague, let Claude do all its various quirks, and it works really well.
The "efficient market" framing assumes Anthropic wants to minimize output, but they don't. They charge per token, so the defaults being verbose isn't a bug they haven't gotten around to fixing.
That said, most of this repo is solving the wrong problem. "Answer before reasoning" actively hurts quality, and the benchmark is basically meaningless. But the anti-sycophancy rules should just be default. "Great Question!" has never really helped anyone debug anything.
> "because it will just get subsumed into CC at some point if it actually works."
This is the sharp-bladed axe of reason I've used against all of these massive "prompt frameworks" and "superprompts".
Anthropic's survival depends on Claude Code performing as well as it can, by all metrics.
If the Very Smart People working on CC haven't integrated a feature or put text into the System Prompt, it's probably because it doesn't improve performance.
Put another way: The product is probably as optimized as it can get when it comes out of the box, and I'm skeptical about claims otherwise without substantial proof.
Claude also has its own md optimizer that I believe is continually updated.
So you could run these 'cure-alls' that may be relevant today; as long as you are constantly updating your md files, you should be ahead of the curve [for lack of a better term]
I'm hoping that was just the blog version of what they did (since it's more succinct), but yes, I have so many "-CURRENTDATE-EXPLANATION.ext" files for any flat-file databases I interact with (keychain, sqlite, db4, etc). It's saved me more times than I can count.
Going in to fix a service that uses sqlite and seeing 5 other times I recovered data or was making a change is always fun.
Do these posts just get upvoted due to the graphics/animations? I find this site incredibly difficult to read with things re-playing as you scroll up and down and the articles I've read from here are often light on details. The graphics seem very AI-generated (overlapping text and other little issues) which makes me think the whole thing is from an LLM.
While this post does have some interesting information, I have to wade through distracting animations that seem "off", which makes me question all of it.
> Do these posts just get upvoted due to the graphics/animations?
I don't think so. It's more likely that they're upvoted as a signal-boost; convene here to talk about bad government tech.
Some submissions are less about the subject matter than they are about providing a space to talk about the subject in general. I've found this to be the case when the content is AI-generated.
That may be true in some cases, but I disagree about that in this case. TFA links to numerous sources for its data (e.g. https://reports.exodus-privacy.eu.org/en/reports/723186/ for the White House app, and AFAICT Exodus Privacy is a legit service), and discussions around government applications that are loaded with surveillance tech (and in many cases it seems like the apps' primary, and sometimes only, purpose is data harvesting) seem very on-topic for HN.
Also, FWIW, while I found the layout of the top section of the article to be weird, the actual text body and linked sources were easy to read for me.
I was referring to the graphics/animations that the GP comment mentioned. I was more confident that those were AI-generated than the actual text. Upon further scrutiny I'm having second thoughts.
There are multiple cases of inconsistencies between certain claims and the sources that they linked to:
> The acting IRS Commissioner, Melanie Krause, resigned in protest.
> DHS's own internal documents admit Mobile Fortify can be used to amass biographical information of "individuals regardless of citizenship or immigration status", and CBP confirmed it will "retain all photographs" including those of U.S. citizens, for 15 years.
> ICE Homeland Security Investigations signed a $9.2 million contract with Clearview AI in September 2025, giving agents access to over 50 billion facial images scraped from the internet.
If I really wanted to force the claim that the body text is AI-generated (or assisted) then I'd guess that the LLM (likely Claude) counted the "dangerous" icon from its appearance in "The icon [Red exclamation mark] indicates a 'Dangerous' or 'Special' level according to Google's protection levels."
> And the whole CBP ecosystem, from CBP One to CBP Home to Mobile Passport Control, feeds data into a network that retains your faceprints for up to 75 years and shares it across DHS, ICE, and the FBI.
This makes it appear that there are separate apps running concurrently, namely CBP One and CBP Home. They aren't. From the linked source, "CBP One is no longer available". It was replaced with CBP Home. The source does not mention Mobile Passport Control: https://www.americanimmigrationcouncil.org/fact-sheet/cbp-on...
> ...discussions around government applications that are loaded with surveillance tech (and in many cases it seems like the apps' primary, and sometimes only, purpose is data harvesting) seem very on-topic for HN.
Which is exactly why I said: "Some submissions are less about the subject matter than they are about providing a space to talk about the subject in general."
The article in its entirety reads more like a desperate attempt at spinning the recent release of the "White House app" into a story about state surveillance. The problem is that it doesn't have a cogent conclusion or point to make except for a "Surveillance Data Pipeline" graphic that depicts ICE as the central destination for all of this data and the following:
> The federal government publishes content available through standard web protocols and RSS feeds, then wraps that content in applications that demand access to your location, biometrics, storage, contacts, and device identity. They embed advertising trackers in FBI apps. They sell the line that you need their app to receive their propaganda while the app quietly collects data that flows into the same surveillance pipeline feeding ICE raids and warrantless location tracking. Every single one of these apps could be replaced by a web page, and they know that. The app exists because a web page can't read your fingerprint, track your GPS in the background, or inventory the other accounts on your device.
>
> You don't need their app. You don't need their permission to access public information. You already have a browser, an RSS reader, and the ability to decide for yourself what runs on your own hardware. Use them.
What is the link between the two? Who is the "You" being addressed here? We have apps that are apparently used only by ICE, apps meant for foreign travelers into the US, apps only someone's conservative/veteran grandfather would be caught using—these are disparate demographics to me.
If my initial impression of all this information was "So what?", how would this article convince me that it's actually meaningful? Submissions like this aren't about discussing anything novel or critical about the subject matter (with the exception of the Huawei thing, which is a missed opportunity from an editorial point of view). They are signal boosts to talk about bad government and technology in general.
I've spent enough of my morning trying to make actual sense of this story. That's not to say it's not informative (albeit unsurprising), but the quality of the writing, irrespective of whether it's "readable", makes me question whether the submission was popular because of its substance or because it's supposed to be a proxy for r/politics.
These posts get upvoted because the content itself is big news (government apps having insane amounts of spyware is, imo, something worth discussing.) I think if the frontend was just plain HTML/CSS, it would still get a comparable number of upvotes.
Brave has a feature (free) that lets you hit a button and literally remove every animation if you want.
The White House app tracker list comes directly from Exodus Privacy's independent audit, verifiable by anyone.
What the cards look like on mobile has nothing to do with it.
> RAM prices are crashing because new models won’t need as much
Reality begs to differ [0], and the link on that text goes to an article [1] about Google's TurboQuant, which supposedly will lower RAM requirements. Now if that means RAM prices come down (as speculated, not reported on, in the link) or the AI companies just do more things with their extra RAM is yet to be determined. The fact this article links there with the text "RAM prices are crashing" throws the entire rest of the article into doubt for me.
RAM prices are most certainly not crashing (yet), and treating it as a foregone conclusion because _one_ lab found gains could be made, and hasn't even reported on the efficiency of their method, is just irresponsible. It's almost as bad as when LLMs link things to prove their point, you visit the link, and find it says nothing of the sort or even the opposite.
> Now if that means RAM prices come down (as speculated, not reported on, in the link) or the AI companies just do more things with their extra RAM is yet to be determined.
Not if the bigger models have diminishing returns. Let's say you figure out a way to reduce RAM requirements 100x, but increasing RAM usage by 2x only gets you a 1% increase in effectiveness, and 3x does not get you any noticeable increase over 2x at all. Sure, you can reduce the price per token, but you might have already saturated the market. Even if you haven't saturated the market, your hardware-based moat just got smaller, and this is going to reduce your margins even more.
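To make that concrete, here's a toy calculation in Python; every number in it (the saturating quality curve, the cost unit, the 100x reduction) is invented for illustration, not anyone's real economics:

```python
# Toy model: quality saturates with RAM while a 100x efficiency gain
# slashes everyone's serving cost, shrinking the hardware moat.
cost_per_gb = 1.0                                      # invented cost unit
quality_by_ram_multiple = {1: 1.00, 2: 1.01, 3: 1.01}  # gains saturate

for mult, quality in quality_by_ram_multiple.items():
    old_cost = mult * 100 * cost_per_gb   # before the 100x reduction
    new_cost = mult * 1 * cost_per_gb     # after it
    print(f"{mult}x RAM: quality {quality:.2f}, "
          f"cost {old_cost:.0f} -> {new_cost:.0f}")

# Tripling RAM buys ~0% extra quality, so once the small configuration
# is cheap to serve, rivals can match your quality at commodity prices.
```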
Jevons paradox only applies if demand hasn't already been saturated.
The fact that public LLM usage is leveling off at a price of $0 and Jensen "we make the shovels in this gold rush" Huang is rather desperately claiming that you need to spend $250k/year in tokens to be taken seriously suggests that demand saturation may not be that far off.
Whether Jevons' paradox applies to software engineers I think is another open question. I'm constantly being told that it doesn't and that LLMs make half of us redundant now, but I'm skeptical; so much automation I see is broken or badly done.
LLMs haven't remotely begun to be integrated into the lives of the typical person. Not even close. The typical person is using LLMs not at all as it pertains to their daily life tasks. They're using them almost entirely for limited discussion matters (eg having a discussion with GPT about a medical issue, or a work related matter).
This is the first or second inning in the LLM rollout. It'll take 15-20 more years for full integration of AI agents into the life of the typical person.
The claw experiments for example can just barely be considered alpha stage. They're early AI garbage unfit for the average person to utilize safely. That new world hasn't gotten near the typical person yet.
The compute requirements to get to full integration of AI agents into the life of the average person - billions of them - is far beyond 10x where we're at now.
> LLMs haven't remotely begun to be integrated into the lives of the typical person. Not even close. The typical person is using LLMs not at all as it pertains to their daily life tasks. They're using them almost entirely for limited discussion matters
This is an argument in favor of demand having leveled off.
Only if nothing changes. Right now, people are running agent frameworks like OpenClaw on their own hardware or a VPS, and the frameworks are often single-person projects. This results in all sorts of problems, but history offers an easy solution: create a walled-garden service for running these agents where you can provide security and standardization. If that platform also allows trusted services to integrate, then they can provide end-to-end security guarantees. They also benefit from improvements to the models themselves making them more difficult to subvert. Creating something that is secure enough for the average person to entrust their credit card to is not an impossible task.
>The typical person is using LLMs not at all as it pertains to their daily life tasks.
This doesn't track at all with my experience. Everybody is using it everywhere.
Moreover, people are using them for daily life tasks even when it is not an appropriate use of LLMs, e.g. getting medical advice as you referred to, or writing emails which are clearly pissing off their coworkers.
In this respect I see it as akin to radium - a new technology that got a little too fashionable for its own good when it first emerged and which will likely have many use cases scaled back.
In my experience people vastly overestimate the competence of doctors. Getting medical advice from LLMs could be life saving.
Personally I experienced this when a specialist doctor believed a drug interaction to be the opposite of what it is, thinking A hinders the absorption of B, when it actually hinders the clearance, tripling the concentration of B.
Without AI, I would have been clueless about this and could not have spotted the mistake. I don't know if it would truly have been critical, but it did shake my confidence in doctors.
I'd be careful stating this is an inappropriate use of LLMs. I'm semi tapped in to the medical literature community, and there is a lot of serious discussion and research going into the usage of LLMs for medical advice, and most of it is showing that LLMs are barely worse than doctors, and much, much cheaper/more convenient. They definitely aren't ready to completely replace doctors, but it seems they can provide competent medical advice in a pinch. Look out for the literature on this in the coming year; it's only in the last few months that researchers seem to be taking LLMs seriously.
I am surprised that people are surprised by this finding, and support your position.
Anecdotally, doctors get things wrong quite frequently. Almost everybody has a bad medical diagnosis/advice story. The amount of reference material that a doctor needs to know off-hand and the data that they are given to make a diagnosis makes it a really difficult job. They also seldom have the ability to know whether their diagnosis/treatment worked, so have a limited ability to 'learn' from outcomes. (I did some work for cancer research and one of the most difficult problems was trying to get 'end of treatment' data because the end of treatment was often an unknown, to the researchers, death).
The ability to have a 'prompt' that includes lab data is likely to be better than the opinions of a doctor who has only one person's professional experience, a limited ability to interpret 'prompts', and the need to map it all to an in-memory conditions database.
Well, the thing is that it often isn't worse than a doctor's; that's the point of the research here. I get that sounds crazy, just watch out for the coming literature I guess.
A significant portion of Americans detest the medical industry and deeply dislike going to the doctor, so I don't even think the product needs to be very good to disrupt the way the system works; just different and accessible is likely enough. Funnily enough, restaurants where the food is bad but the portions are big are actually decently popular. Priorities can vary so widely that many people are unable to even comprehend the priorities a significant number of people truly hold.
I like that this comment is below, and posted after, an example where somebody had to pay extra money to clear up a misdiagnosis of stage 4 cancer by the “barely worse” software
There are many examples of doctors misdiagnosing a wide variety of things, which is largely the point here. People think of doctors as infallible when that is not even close to true.
I'm certainly not saying fire all the radiologists, just advising an open mind when the actual literature starts saying that LLMs are as good as doctors in some areas.
There are many examples of people into homeopathy, Chinese medicine, and even witchcraft using an identical (not similar, identical) argument to the one you just used to push it.
No one in our Auto shop is using AI. One of the new diagnostic tools was demo'd with AI, and none of us were having it. It's about as accurate as Googling your symptoms.
My mother had an AI powered lung scan that came back with Stage 4 Cancer. The Oncologist got called in (for a fee!) to tell us it was just early stage COPD.
It is quite hard to imagine how the demand is saturated now. I think any company that uses a sliver of AI will happily increase their token consumption 100x if it's free.
Are you assuming a brute force "burn tokens until it passes the tests" model, or is there a really sweet approach on the horizon that is impractical at current token costs?
I'm asking 'cos while I'm philosophically opposed to the first option, I'd love to hear about anything that resembles the second.
One idea I've heard is prototype-first design reviews. If the cost of code genuinely trends to zero, there's no reason why most technical disagreements about product functionality couldn't come with prototypes to illustrate each side of the debate. Today, that's not always practical between token costs and usage limits.
Then hopefully the reviewers will notice that the first prototype's flaws are correctable. Sometimes they won't, and they'll end up making a bad decision, just as they sometimes make bad decisions today with no prototypes to look at. But having prototypes allows for a lot of debates that are today vague and meandering to be reduced to "which of these assertions at the end of this integration test do you think is the correct behavior?".
Executive FOMO disease is being exploited by the model providers to push for maximal token usage even when it is pointless.
This includes encouraging people to set up elaborate multi-model setups (e.g. "gas town") for coding that do not meaningfully improve productivity but which certainly do cause token usage to explode.
It also includes encouraging execs to use token consumption as a proxy for productivity - almost akin to SLOC.
AI has a halo right now and the managerial class seem to be willing to forgive almost any failure because the promise is so enticing. We're at peak expectations right now. They will soon start to be less forgiving when the warts which are intrinsic to LLMs remain unsolved.
nobody knows how to measure software productivity + AI is supposed to mean productivity goes up = more AI means more productivity
As best as I can tell, that's the thinking. It's one number, it's very easy to find and manage, and there is a belief that it directly measures productivity.
I disagree that it does; it seems to me the throughput of useful features is a better measure, but I'm not in the driver's seat on this one
Incremental revenue and cost savings, at least for enterprises, is where it would show up. There's also a present-value consideration: if LLMs make those dollars come into existence closer to the present, they are worth more.
The personal use case stuff is messy and subjective.
Attributing incremental revenue to gross engineering effort is challenging, imo.
Cost savings is primarily a function of headcount here. Which is also easy to measure, and so if we take my thesis that easy to measure stuff is prioritized...
Yep - it’s impossible to separate experimental tokens from value-creating ones.
Ultimately the performance will be assessed via the income statement and cash flows of customers of the model producers.
Frankly, in the pre-IPO window it’s in the best interests of OAI et al. to show a line going up and to the right in relation to tokens in their prospectus. What does that mean?
Demand for top models is definitely not saturated, at least when it comes to programming. If I could afford to use 5x more Claude Opus 4.6 tokens, I would!
Demand is relative. How many Claude tokens would you buy if they had a 10x price hike?
The market has achieved its current saturation level with loss-leader prices that remind me of the Chinese bike-share bubble [0]. Once those prices go up to break-even levels (let alone profitable levels), the number of people who can afford to pay will go down dramatically (and that's not even accounting for the bubble pop further constricting people's finances).
There is no evidence that labs are losing money on inference subscriptions. The labs have massive fixed costs, but as long as inference spend is higher than what the datacenters they use for inference cost, all they need to do to become profitable is scale up. Right now software engineers are basically the only ones actually paying for inference; the labs just need to create coding-assistant equivalents for everything that are good enough that every white-collar worker in the country (world?) is paying a $1000/yr subscription. Certainly there's a lot of risk: will models become commoditized and everyone switches to open models? Can they actually get non-software-engineers to pay for inference en masse? But it's not like there's no path.
If they've already built themselves a loyal customer base (which is usually the point of fighting a price war) and the customers are happy with the technology they have, then if funding is tight and turning a profit is more important, why wouldn't they pivot to optimizing inference: stopping further training, freezing the model versions, burning the weights into silicon, and building better caching strategies and improved harnesses and tools that lower their cost and increase their margin?
If all they do is hike prices then they'll lose customers to competitors who don't or who find a way to serve a similar model cheaper.
The demand isn't going to go away purely through higher prices. Once people know something is possible they will demand it whether supply is constrained or not. That's a huge bounty for anyone who can figure out how to service that demand.
Easier said than done. What you're describing can take years to implement. Can OpenAI et al. keep burning cash at the same rate for two years while they wait for the salvation of custom silicon if the investments dry up?
I thought we were going to hit token saturation years ago, but they keep inventing new ways to use tokens. Like, instead of asking a chat model to write something and getting ~1000 tokens out of it, you now have an agent producing ~10,000 tokens - or, worse, spawning 10 subagents that collectively burn ~100,000 tokens. All for marginally better answers with significantly higher compute usage.
Personally, I would have used all those tokens to generate synthetic data for IDA (iterated distillation and amplification) so that the more efficient 1000 token/answer chat model can answer more questions, but apparently that doesn't justify an insane datacenter buildout.
"Demand is stagnating" only applies to the B2C segment, where people are already bored of generating poems and funny pictures. In B2B, the demand hasn't even started yet because corporations are still terrified of shoving their NDA data into public APIs. The second local models and secure private clouds get cheaper, the enterprise is going to devour literally any amount of available compute just to automate internal document workflows.
We’re not even close to demand saturation with tokens. Have you seen the people rending their garments with rage that Anthropic and Google won’t let them use their flat-rate subscriptions to burn millions of tokens per hour on OpenClaw? And that’s a tiny set of die-hard tinkerers.
The ceiling of token use when everyone has something akin to OpenClaw just running as a background process on their phone is way higher than there’s supply for right now. Jevons paradox is still in full force.
Is that not appealing to those users _because_ it's a subsidised flat rate? Those users could go and swap to API pricing right now if they wanted to, but at API pricing they don't want to.
TurboQuant has a specific benefit: compressing the KV cache at a negligible cost to quality. That mainly means context lengths can go up for the same amount of memory; however, the KV cache only accounts for something like 20% of the overall memory footprint, and this will not dramatically decrease memory demands in the way that some of the more sensationalist reporting has stated.
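For a sense of scale, here's a back-of-envelope KV-cache sizing in Python. The dimensions are assumptions (roughly a Llama-3-70B-class model with grouped-query attention), not figures from the TurboQuant paper:

```python
# Back-of-envelope KV-cache sizing for a Llama-3-70B-class model with
# grouped-query attention (dimensions are assumed, not TurboQuant's).
n_layers, n_kv_heads, head_dim = 80, 8, 128

def kv_bytes(seq_len, bytes_per_elem):
    # Keys and values (the factor of 2), per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

for label, bytes_per_elem in [("fp16", 2), ("4-bit", 0.5)]:
    gib = kv_bytes(128_000, bytes_per_elem) / 2**30
    print(f"{label}: ~{gib:.0f} GiB of KV cache at 128k context")

# fp16 gives ~39 GiB, 4-bit ~10 GiB: a big win for context length, but
# the 70B weights alone still need ~130 GiB at fp16, hence KV cache
# being only a minority share of serving memory.
```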
The open source tooling got quantization support 3 years ago! It was a lesser type of quantization, but more than enough to prove that the savings just go to bigger models.
I’m not disagreeing with you, but consumer RAM prices are lagging indicators. If commercial RAM prices are dropping, consumers will see those price drops last, especially given that several consumer-market manufacturers have turned to commercial-only production.
Is there a source that says commercial RAM prices are dropping? I was recently told (without a source, so I am not sure if it is true or not) that OpenAI never even bought any of the RAM they signed deals on last year, and that those deals were just letters of intent. So if prices are coming down I wouldn't be shocked but the economy is pretty well vibe coded these days so who even knows.
Well, all manufacturers of RAM have publicly stated that they're sold out for 2026
RAM prices falling during 2026 is insanely unlikely unless AI crashes so hard it starts to actually kill companies. And not just any companies, but big tech
I'm not seeing that in 2026. Maybe 2027 (I'd sincerely doubt that too, honestly), but definitely not within the next 9 months. Their runway is _way_ too large for things to spiral out of control within such a short period of time
If the claims the GP made about letters of intent to buy vs actual purchases are true, that brings additional questions. Like, if you send a letter of intent but do not follow through, are there financial penalties? How hard is it for the chip maker to sell the chips allotted based on that letter of intent? Would someone like Apple buy up the extra, or would they not need it as they've already bought enough for the units they expect to sell? If someone like Apple suddenly had an influx of RAM, that does not mean they would have extra CPU capacity to match. If the supply chain is this closely apportioned, what is the most likely result of a sudden surplus?
> unlikely unless AI crashes so hard it starts to actually kill companies. And not just any companies, but big tech. I'm not seeing that in 2026
A month ago an AI crash was looking unlikely, but with the Strait of Hormuz being de facto blocked, many predict global stagflation, which could affect AI too.
If they see them. Plenty of businesses are still charging pandemic prices for all kinds of goods and simply pocketing the difference.
Cars come to mind instantly. Prices exploded in 2020/1, due to legitimate shortages, most of which have been plus or minus resolved, but the prices for new (and used!) cars never came back down.
While the pandemic chip shortage resolved around 2024, a new chip shortage started in 2025 when the Dutch government took control of Nexperia (who are owned by China's Wingtech) and China retaliated by creating export restrictions. Honda, Nissan, Mercedes-Benz and others cut production. With less inventory, manufacturers and dealers are raising prices to compensate.
Also the cost of shipping never came down and lots of cars and/or their components need to cross oceans. Plus we have a new energy crisis...
Actually, prices for new cars now seem to be lower than in 2022 where I live in Europe. Though this could also be attributed to competition from Chinese manufacturers.
Honestly, you're both wrong. RAM prices spiked speculatively, and they're going down for the same reason. Market people always want to argue in fundamentals, when in practice *ALL* the high-frequency components of the signal are down to a bunch of traders trying to guess where it's going in the short term.
At best those guesses are informed by ground truth ("AI needs a lot of RAM!", "Sam cornered the market!", "TurboQuant needs less RAM!"), but they remain guesses, and even then you can't tell the difference between that and random motion.
Then note how wide the gray bands are. That makes it very easy to cherry-pick a few examples to present as "supporting evidence" that prices are doing whatever you want to believe they are doing.
It's showing $999 now, which seems about median for similarly-spec'd memory on Amazon. The cheapest slot-and-capacity-compatible equivalent I can find is around $570, even. So 3-5x increase, at minimum.
It's true that that's a high error bar. It's absolutely not true that the trend is ambiguous.
Can you cherry pick me a $141 kit, please? I mean, it's not an abstract question! I'd buy it from you right now if you had it or could get it, in whatever quantity you can source. No joke.
I’ll believe they’re going down when it doesn’t cost $550 for the $105 RAM I purchased 1 year ago. Yes, consumer prices lag commercial prices, yada yada; I think any hot takes are pointless until we see lower prices or far more convincing evidence it’s coming. When it costs basically a MacBook for 32 GB of DDR5 RAM, it’s hard to hear “RAM is coming down for sure”
No, they signed a bunch of contracts for future deliveries. That's not a supply constraint. The factories making RAM continued operating and serving their existing deliveries, and in fact they still are.
Freshman economics would say that supply is fine and that prices shouldn't move. But they did anyway. And the reason is speculation.
I don't get it tbh. What market participants were speculating here? There aren't futures markets in RAM as far as I know, though I certainly don't know much. And the supply constraints appear to have been pretty real (though maybe not immediate) if e.g. Valve was begging publicly for RAM consignments. Were there pure-play speculators filling warehouses with DDR5?
>There aren't futures markets in RAM as far as I know
Sure there is. Not formally, but if you hold a contract for x units of future production, you can sell that contract to somebody else who wants those units more than you do.
It's still speculative whether OpenAI will go bankrupt and have to release it back to the market, but if it is holding them unfinished, it is a supply constraint on finished RAM chips even if not on wafer output.
Have we gotten any more word on the potential helium constraints that SK Hynix was making noise about after the strike on the helium plant in the Middle East that supplied 60% of S. Korea's helium? Because that could definitely put a kink in things, since SKH is one of the 3 remaining big DRAM producers.
I do wonder how closely consumer RAM kit prices follow the wholesale prices manufacturers see internally for memory chips. The pcpartpicker graphs you linked show consumer prices have leveled out and may even be starting to fall. Depending on how the economics shake out, this could mean we've hit an inflection point.
My personal prediction is that once the VC bill comes due and prices for frontier models start to climb, competition for efficiency will heat up. The main AI use-cases seem to be falling into buckets, and I doubt serving gigantic, do-it-all general models for every use-case under the sun is remotely cost-effective.
If common use-cases start to be more efficiently served by smaller, more efficient purpose-built models (or systems thereof), it'd make the big frontier models increasingly niche. Cursor's Composer 2 model is a great example of this.
In any case, I think it's pretty fair to speculate we may be seeing RAM prices start falling sooner rather than later.
Consumer vs wholesale is an absolutely fair distinction to make; I'm not sure how to track those prices. My main issue is the article saying "RAM prices are crashing" (which I can't find any evidence of) and linking to an article that doesn't even repeat that claim; it instead just speculates that maybe RAM will come down in price due to this new idea.
> In any case, I think it's pretty fair to speculate we may be seeing RAM prices start falling sooner rather than later.
I sure hope so. RAM, HDDs, and SSDs are all crazy-high right now and I was in the market for literally all 3 but have paused all my buying because I can't justify the costs as they stand today.
> My main issue is the article saying "RAM prices are crashing" (which I can't find any evidence of) and linking to an article that doesn't even repeat that claim
That's totally fair. The article is written in a very odd way where it makes a bunch of authoritative, factual-sounding claims and then throws a "this is all very speculative" line right at the end.
It's very interesting speculation, but can't really be considered anything more than that, despite the prose it chose.
RAM prices haven't crashed yet and it'll take time because it has to propagate within the supply chain. Micron is -20% from the top already https://www.investing.com/equities/micron-tech
Stock price is the best forward indicator I can think of
Yeah, good point, although it's just one of all the catalysts I mentioned. In fact I had written most of the post already before I saw the news about RAM.
I would think that we are going to see RAM prices increase even more, given, among other things, disruptions to the pure-helium supply and increased electricity prices.
I haven't looked closely into TurboQuant, but perhaps it will revolutionize just as much as the 1-bit llm did...
Consumer RAM is starved by production capacity shifting to HBM. HBM dropping in price would not affect consumer RAM on any immediate timeline. Also, as pointed out by many: Jevons paradox.
Thank you, there are two things I would like to point out:
1) Google releasing something probably means they don't see it as important. 4-bit KV-cache quantization has been known for a long time. The fact that there is almost mass hysteria about this paper makes me think there is a lack of skepticism in this AI mania, even in a relatively tech-savvy crowd.
2) "But prices for memory companies are crashing!" Look around, the whole market is crashing.
Bingo. Even if some magic drops tomorrow that compresses the KV cache down to literally zero bits, that saved VRAM will instantly get swallowed up by bumping the batch size or pushing the context window to 10 million tokens. There is no such thing as "excess memory" in ML, only under-trained models
There is also demand for RAM in other areas of data centers. As we are all pushed deeper into clouds, I can see the rise of RAM for data storage (RAM drives) continuing to eat into the supply. A module of DDR5 will be more useful in a Netflix rack streaming movies 24/7 than in a gaming PC where it may only be used an hour or two every day.
> > RAM prices are crashing because new models won’t need as much
> Reality begs to differ [0], and the link on that text goes to an article [1] about Google's TurboQuant, which supposedly will lower RAM requirements. Now if that means RAM prices come down (as speculated, not reported on, in the link) or the AI companies just do more things with their extra RAM is yet to be determined. The fact this article links there with the text "RAM prices are crashing" throws the entire rest of the article into doubt for me.
I find it fascinating how extremely reactive things have become. One research paper which, to my knowledge, hasn't been externally replicated yet, nor implemented, generates tons of hyperbolic articles, tweets and such, and actually manages to move the market at least temporarily. Not just this, but a simple message in full caps lock by the president of the U.S., who is in the habit of lying through his teeth constantly, and the same thing happens. It's like there is a big bubble that threw any form of critical thinking out of the window and is in a hurry to react to anything, even if it is not even remotely believable.
Now I understand why it happens: there is a lot of money that can be made by capitalizing on FOMO, either by driving traffic to their website, socials, etc., or by simply insider trading (which feels like it has been legalized these days). But I still find the proportions it has started to take incredible.
My favorite was when Google revealed Project Genie a month ago (which lets you generate video game worlds with AI, basically) and stocks for game companies immediately dropped. Anyone familiar with games and gaming knows that what Project Genie offers (essentially empty worlds with minimal interactivity that you can just kind of wander around in, and which struggle with simple things like object permanence if you look away) isn't real competition for actual games, but the markets reacted anyways.
I've always seen the stock market as a mix of mass hysteria and pyramid scheme. With actual value underlying it of course, but actual stock values are frequently irrational.
You get more Claude tokens from a Google subscription via Antigravity than from Anthropic. Especially if you use the 5 other "family" accounts you can share the subscription with...
They've all avoided loading up their LLMs with ads to this point. That is going to change dramatically over the next 2-3 years. All of them will be loaded with ads, and Google will partake as expected given their ad network & capabilities in that realm. They'll match GPT's ad roll-out.
Where do I find the paid option? I cannot find it on their product page.
There are only two options I can see: one "Available at no charge" and another "Coming soon - For organizations"
Can you upgrade in the IDE? It would be strange that Google has a performance problem for paid users while I do not experience any such issues at all with Claude and Codex.
Even worse, 3 memory companies control well over 90% of the international market, with a history of cartel collaboration that's going to be ever harder to prove with fewer companies.
Some also argue that the RAM price keeps rising because of the bullwhip effect. I was wondering if there's any way for us to differentiate sustained demand from the bullwhip effect.
This article and the title are total clickbait filled with emotional hooks. And it worked. You totally debunked it but look how it still became so popular.
Not crashing yet. The article is looking 1 to 5 years out.
Given Nvidia's CEO's agitation, I would give credit to the prediction, and if it's correct the price will go back to what it was, or even lower if investments in capacity are made today.
Yeah, I also stopped reading at that point. If I want a bunch of random made-up facts to sell lukewarm opinions or steer the uneducated masses, I'll tune in to a Trump press conference. Why does this feel like someone desperately trying to make reality mirror his flailing market bets?
This feels similar to when Deepseek first debuted with claims of ultra-low cost training, and all the pundits exclaimed that Nvidia was finished, the bubble had burst, etc.
I find that incredibly buggy. Literally 50%+ of the time CC complains it can’t connect to the MCP. When it does work it can be magical, but my success rate is tiny. I’m not going to restart all my chrome windows every time I turn around because CC can’t talk to it for some unknown reason, especially since I’ve restarted Chrome before and CC still couldn’t connect.
There should be a better way to restart just the MCP.
This smells of the same "I got kicked off Stripe" posts where we later find out it was porn or something else illegal or not allowed by Stripe's ToS. None of these people have said what they were using the API access for, and it seems that they are getting banned by the backend providers based on what they are trying to generate.
I like this idea. I wrote a tiny CRM-like thing for my mom a month or so ago (an upgrade from a Google Sheet that I had previously upgraded from an Apple Note) and it uses Sheets as its backend. She can always go back to the Sheet and lose nothing, but the web interface she uses is tailored to exactly what she needs. I couldn't be happier with it and she loves it.
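For anyone wanting to copy the pattern, here's a minimal sketch in Python of the Sheets-as-backend idea (not the commenter's actual code; the spreadsheet ID, tab name, and columns are hypothetical), using the Sheets v4 API via google-api-python-client:

```python
# Minimal Sheets-as-backend sketch: the spreadsheet stays the source of
# truth; the app just reads and appends rows through the Sheets v4 API.
# Assumes a service-account credential and a tab named "Contacts".
from google.oauth2.service_account import Credentials
from googleapiclient.discovery import build

SPREADSHEET_ID = "your-spreadsheet-id"  # hypothetical
creds = Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/spreadsheets"],
)
sheets = build("sheets", "v4", credentials=creds).spreadsheets()

def list_contacts():
    # Each row is one contact; the header row (row 1) defines the columns.
    rows = sheets.values().get(
        spreadsheetId=SPREADSHEET_ID, range="Contacts!A2:D"
    ).execute().get("values", [])
    return [dict(zip(["name", "email", "phone", "notes"], r)) for r in rows]

def add_contact(name, email, phone="", notes=""):
    # Appending plain rows keeps the sheet fully usable by hand.
    sheets.values().append(
        spreadsheetId=SPREADSHEET_ID, range="Contacts!A:D",
        valueInputOption="USER_ENTERED",
        body={"values": [[name, email, phone, notes]]},
    ).execute()
```

The nice property is exactly what the parent describes: the sheet remains hand-editable, so the custom front end can disappear tomorrow without losing any data.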
I'll have to remember this for my next little one-off tool.