1. Factory limits, basically. There's a limit to the number of fabrication lines that can make RAM. Add to that the current market incentive to make high-bandwidth memory (HBM) over server memory (DRAM): HBM starts as DRAM dies, so it competes with normal DRAM for wafer starts and cleanroom fab capacity.
2. Eventually more plants will come on line. Most of the main manufacturers have announced expansions but these can take O(years) to come online.
Are more plants coming? I think I heard it won't be many of them, because it's risky.
If the bubble bursts and RAM demand drops, then they'll have big losses. And that's not an impossible scenario over the few years that it takes to build a plant.
Not particularly. I'm not yet convinced people's mouse movements are unique enough to our identity that they're useful as a fingerprint, whereas it's very easy to classify whether something looks bezier or looks human.
Eventually I'm hoping to collect enough data here to train a biased decoding model, so you could input some randomized personality vector (which implicitly encodes slow movement, jerky motion, trackpad, mouse, etc) and have that impact the RNN generation. So in theory there would be infinite combinations from the larger subspace we're sampling from.
So much of what Apple has lost over the last 10 years comes down to a lower bar for what counts as good enough.
You see this most obviously in software and marketing - the kinds of decisions where only a few people sign off at the end, and where "good enough" is whatever those few people decide it is. You see it less in hardware and procurement where there's a powerful review cycle and scrutiny at every level of the stack. Work there is more immediately measurable: benchmarks for performance, dollars for cost.
The "vibe" of software, or of a PDF [^1], is much harder to catch that way. There's no benchmark that flags it and most conventional executives aren't drilling down in that level of detail to see it either.
You want distributed decision-making, of course. But that only works well if it's distributed to people who've cultivated their own taste and who will make good calls under pressure. I'm not sure how much of that gets fixed by leadership change at the top. Taste isn't really something a CEO can decree into a 60,000 person org. But I've only heard good things about Ternus, so I'm optimistic. Fingers crossed for a bright new chapter.
Digitizing my old tapes was one of the most rewarding side projects that I did over the last year. I managed to get in under the wire (pun intended) of Firewire compatibility on Sequoia and a long daisy-chain of adapters. But it was clear the days of this approach were numbered. I'm optimistic these 3rd party accessories will become more standardized into self-contained cheap boxes where people can easily transfer over their stuff before camcorders degrade.
My pipeline went camera -> dvrescue -> ffmpeg -> clip chunking -> gemini for auto tagging of family members and locations where things were shot.
We now have all our family's footage hosted on a NAS with Jellyfin serving over Tailscale to my parents' MacBooks. I found the clip chunking in particular made the footage a lot more watchable than just importing the two-hour-long tapes, although ymmv.
I am going to finish such a project soon myself, including some old Video8 tapes! Sounds like you're on macOS. Any reason you didn't use iMovie for the capture itself?
The Video8 tapes have already been digitized via a Digital8 camcorder, but apparently you can get even better quality out of old analog tapes with the vhsdecode project. Let's see if I ever get around to that, but at least it bypasses Firewire entirely:
https://github.com/oyvindln/vhs-decode
https://www.reddit.com/r/vhsdecode/
Mostly wanted to fully automate the pipeline (auto-rewind tape, scan tape head position, etc.) and iMovie is just using the same AVFoundation APIs under the hood that you can call manually. Took some notes here if helpful:
https://pierce.dev/notes/automating-our-home-video-imports
Wish vhsdecode was easier to use in practice! Such a cool idea but a bit too inconvenient to hack your own hardware like this...
I used dvgrab to ingest my old tapes, and ffmpeg and avisynth/QTGMC to deinterlace and encode files for easy viewing (though I keep the original .dv files).
The biggest issue I ran into was that while the audio and video were properly synced up in the original .dv file (due to it being an interleaved format), when I re-encoded the videos, the audio and video would drift out of sync as the video went on.
I was able to fix the sync issues by using dvgrab to split the original dv file into a bunch of 3 minute chunks. I then wrote a script to extract the audio track from each chunk, pad the end of the audio with milliseconds of silence to the exact length of the video track, combine the padded audio tracks, encode the combined track, and mux the fixed audio track with the encoded video. This worked really well: the silence padding is imperceptible, and the audio and video stay in sync, even after 2 hours.
A final point that needs making is that doing anything with dv files in ffmpeg (even -c:v copy) destroys the SMPTE timecodes embedded in the original file, making it much harder to split by scene.
Just because I've dealt with this exact issue in the past, it may have been a 30fps vs 29.97fps issue. For me the audio was a fixed length, but the frame rate was SLIGHTLY too fast. The problem can manifest as either too slow or too fast depending on which side is expecting 30fps vs 29.97fps.
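For a sense of scale, here is a rough sketch of how much offset that kind of mismatch would build up. It assumes the only error is one side treating 30000/1001 fps ("29.97") material as exactly 30 fps; the function name and defaults are made up for illustration.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)  # the exact rate behind "29.97"
NOMINAL_FPS = Fraction(30)


def drift_seconds(duration_s, assumed_fps=NOMINAL_FPS, actual_fps=NTSC_FPS):
    """Offset built up when frames counted at assumed_fps are played at actual_fps."""
    frames = duration_s * assumed_fps          # frames the audio side expects
    video_duration = frames / actual_fps       # how long those frames really take
    return float(video_duration - duration_s)  # positive: video runs long


print(drift_seconds(3 * 60))       # one 3 minute chunk -> 0.18
print(drift_seconds(2 * 60 * 60))  # a 2 hour tape      -> 7.2
```

If the offset on a real tape grows at a different rate than this, that points away from a 30 vs 29.97 mix-up and towards something else, such as clock drift in the recorder.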
I think it was just clock drift on the camcorder during the initial recording, as I'm pretty sure I tried adjusting the frequency of the audio track to make it the same duration as the video track, and the A/V sync was still wrong.
I'm so glad the audio and video tracks are stored interleaved, as it made my solution possible, and the results I got were great. By splitting the interleaved video into small enough chunks, padding the audio, and cutting it exactly to video length, the padding was practically imperceptible.
The only issue I ran into was that ffmpeg can't cut audio with any real precision. I eventually figured out that I could dump the audio track to a headerless PCM file, calculate the exact byte offsets for my cut points, and cut them with perfect precision using the head and tail commands from GNU coreutils. This was perfect because I was able to use the cat command to combine all of the padded audio chunks into a single raw PCM file, which I then made an AAC encode of with ffmpeg to mux with my original encoded video track.
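The byte-offset arithmetic can be sketched roughly like this. It is not the script described above, just an illustration that assumes 48 kHz, 16-bit signed, stereo PCM (check what your tapes actually use); `pcm_bytes_for` and `fit_pcm` are hypothetical helper names.

```python
def pcm_bytes_for(duration_s, sample_rate=48000, channels=2, bytes_per_sample=2):
    """Exact byte length of headerless PCM covering duration_s seconds."""
    frame_size = channels * bytes_per_sample      # bytes per sample frame
    n_frames = round(duration_s * sample_rate)    # whole sample frames only
    return n_frames * frame_size


def fit_pcm(pcm, duration_s, **fmt):
    """Trim or zero-pad raw PCM so it is exactly duration_s long.

    Zero bytes are silence for signed PCM such as s16le; unsigned
    8-bit audio would need 0x80 instead (not handled here).
    """
    target = pcm_bytes_for(duration_s, **fmt)
    if len(pcm) >= target:
        return pcm[:target]                       # same idea as `head -c target`
    return pcm + b"\x00" * (target - len(pcm))    # pad the tail with silence


# Joining the fitted chunks is plain concatenation, like `cat`.
chunks = [fit_pcm(b"\x01\x00" * 1000, 0.5), fit_pcm(b"\x02\x00" * 90000, 0.5)]
combined = b"".join(chunks)
print(len(combined))  # 192000 bytes: 1 second at 48 kHz, 16-bit, stereo
```

Because every cut lands on a multiple of the sample frame size, the channels never get swapped or misaligned at a chunk boundary.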
Ffmpeg's dvvideo implementation is unfortunately just broken and mangles timecodes, even if just doing a stream copy from dvvideo to dvvideo without any re-encoding.
Fortunately, dvgrab does allow you to take the original .dv file and generate a .srt subtitle track with time stamps that you can mux into your encoded files.
If you are capturing I find dvgrab is pretty good. It's what I've been using for about 25 years now!
In the olden days, when I got paid to shoot real video on a VX2000 and edit it for people, I captured using a PCI Firewire card and dvgrab in Slackware, rewrapped with probably mencoder (shading towards ffmpeg when it became more popular, and developed!), dual-booted into Windows 2000 and cut in Premiere 5.0, then went back into Linux to transcode back to DV if I wanted to write it out to DV tape.
These days I shoot on a PD150 or DSR500 (and quite often some HDV cameras), capture via a PCIe Firewire card and dvgrab in Ubuntu, rewrap with ffmpeg, and edit in Resolve, without the dual-booting step.
If you use dvgrab it will split the capture up into separate clips on shot boundaries based on the pause/unpause markers on the tape. I have not found a way to extract good/no good from the stream, but if you're not shooting on a broadcast camera you don't have this anyway. Timecode is preserved though!
When you load it all up in Resolve, one of the options in the Cut page is "Source Tape View" which runs all your clips together by timecode, and lets you view them as though they were a continuous tape of your rushes, which is how we used to do basic assemble editing in the olden days of clunky tape decks and edit controllers with big rows of red 7-segment displays.
Edit your old home videos. You can do that now, and they'll be far more watchable.
Went through a very similar journey recently as well. In my case using a Macbook was a non-starter, as certain adapters are prohibitively expensive these days, if you can even get your hands on one. Thankfully my son has a desktop Windows PC and Firewire PCI cards are cheap and plentiful, so getting connected that way worked out. Much better than an earlier attempt via RCA cables (simple but digital -> analog -> digital is not the way to go).
My pipeline was camera -> WinDV -> DVdate (to extract exact datetimes into srt subtitles) -> Handbrake (to convert to mp4).
> Digitizing my old tapes was one of the most rewarding side projects
I also wanted to do that, but then I realised I needed to invest more time and might need some hardware, so one day I simply had enough, went to a commercial shop and had them turn all the old stuff into digital. The cost wasn't that huge either, so considering the time I saved by not doing it myself, I am ok with that investment. Hopefully the future has digital everywhere. Storage to be cheaper too, ideally.
As far as I've seen, local OSS video understanding models just really aren't there yet. I briefly looked at facial recognition models but a good amount of signal was actually in the video's audio instead of the raw video frames. Depends on the accuracy you're looking for at the end of the day.
Waymo is such an interesting case study. For most other ~AI deployments you have strong public reaction to the proliferation of slop, non-human failure modes, cost cutting at the expense of quality, etc. But I haven't met a single person who doesn't like the experience of Waymo. They ended up cracking the code on what I suspect people really want:
- consistent car quality
- safety of the drive (conservative driving and potential fear of drivers)
- no randomly chatty driver
All of those feel like a breath of fresh air especially when stacked up against the current state of Uber & Lyft rides. People really just want consistency. I don't actually think you needed AI to get there (I've had occasional rides in black cars that provided the same experience). Waymo was just right time, right place, right price.
> but I haven't met a single person who doesn't like the experience of Waymo.
Just last week a Waymo was driving on train tracks and the rider had to jump out of the car and run because the car stopped while trains came at it. (https://www.youtube.com/watch?v=26KJvL2clTs) I bet that guy'd have something to say about the experience.
Yeah that's obviously not great but that video is nothing like what you described. You made it sound like it drove onto a mainline train track with a train barreling down the tracks that couldn't stop with the guy diving out of the car to avoid getting clobbered. It did not, it got stuck on a tram track. Not quite the same thing.
I've had Waymos in SF take very strange routes. It seemed to really strongly avoid ever using Market St, generally preferring a long right-angle route over the perfect hypotenuse. Sometimes this delayed me very considerably, doubling my ride time compared to the Google Maps estimated time.
That said, I've never felt unsafe or uncomfortable. But I have jumped out halfway through the ride and grabbed an eScooter instead.
Back when I had to drive/walk in SF, I would also go quite out of my way to avoid market or mission. Especially near 6th. Self-preservation and whatnot...
I'm not commenting on the externalities. For that I'd also cite economic impact, job loss, occasional emergency services issues, etc. I'm saying the experience when you yourself are taking a ride. I haven't met a single person who's said "this sucked - I'm going back to Uber".
My first and only Waymo ride was super sketch. Car slowed down to ~5mph in a 35mph zone and stayed that way for 5+ minutes as other cars were swerving around us. Felt like it was going to come to a complete stop in the middle of the road. I prefer real humans.
At the risk of being overly pedantic, toxicologists would typically classify this as venom.
Venom is inert if digested; it's only a problem if it gets in your blood stream. So arrows that were laced with venom and thereby contaminated meat were actually perfectly safe to eat.
Poison is different. If ingested, inhaled, or absorbed it will kill you.
We Dutch solve this problem by having a single word for "poison", "venom" and "toxin"¹. Everybody still knows what you mean and nobody gets to be pedantic.
Although there are plenty of other opportunities for pedantry, especially when we take regionalisms, and other Portuguese speaking countries into account.
> Funny that in English gift is a word but entirely different meaning.
In English it maintains its original Germanic meaning derived from the verb give.
The sense of "poison" in German comes from a euphemistic use of "gift". (Literally 'something given' but actually used to calque Greek "dosis", which also literally meant 'something given', but was used to mean 'dose [of medicine]'.)
Summing up, the reason gift is a word in English with an entirely different meaning from what it has in German is that everyone in Germany forgot what gift meant.
(The reason it's gift and not something more like yift is the Danelaw.)
It's probably the same; for example, in Afrikaans it's just gif. Vergif is the verb for the action of doing it, and vergiftig is the past tense, for it having happened previously.
Magyar (Hungarian) and Finnish are both Uralic languages along with Estonian and the Sámi languages, but none of these are related to the Indo-European languages common in the other parts of Europe.
And while most of Europe’s extant languages are in the Indo-European language family, there’s still a fair number of differences between Albanian, Germanic, Hellenic, Celtic, Romance and Slavic languages.
Oh for sure there are many differences, that comes with them being different languages, countries, ethnicity. You can do this on many levels.
The point was essentially what you're showing here: People focusing on all the differences instead of shared history, languages influencing each other and how we're all not that different in the end.
If you want to, even within what are nowadays countries and what outsiders would say is "one language" and "one ethnicity", you can start focusing on differences and make people dislike each other.
At the very least, they'd complain about accuracy, if not time zone, or even how we should all be on UTC (do not get one started on the difference between GMT and UTC if you value your... time)
Obviously I know "jad" but I don't see any issue with calling venom "trucizna". Natural languages aren't C++ and you don't get compiler errors when you speak - to me, there is no issue calling both venoms and poison trucizna. Polish dictionary doesn't seem to contradict it either:
Nobody would say „trujący wąż” (poisonous snake) or „jadowity grzyb” (venomous mushroom). The distinction is similar to English. There are exceptions and contexts where it can be used interchangeably but arguably the same is true for English.
Italy, the core remnant of the Roman Empire, has unmatched language diversity, which often varies even from town to town. It's a colorful mosaic of micro cultures and customs where people from one region using different words for venom/poison in their local dialect is completely normal. Everyone speaks standard Italian though.
You've never visited Italy? They're not that far away and I'm sure you'll love it.
> The point is, both are correct(afaik) while in English venom and poison are definitely two different things.
No, the situation in English matches your description exactly: all of these things are called poison. The word venom is almost never used in natural speech.
Furthermore, if you ask English speakers what the difference between poison and venom is, by far the two most common responses will be "there isn't one" and "I don't know". icyfox is just looking to be annoying.
(Another popular option will probably be "it's called venom when you're talking about snakes", which explains roughly 100% of use of venom in natural speech.)
And in Russian we use "jad" ("яд" in Cyrillic) for both. Although there is the word "отрава", which can be used for poisons, and "яд" is closer to "venom", the difference is almost nonexistent and both are often used interchangeably.
TIL. I always thought that "If it bite you -> you die = venom" and "If you eat, bite, touch -> you die = poison". But your differentiation makes more sense
>a venomous creature that bites you will release its venom into your bloodstream
unless it's a bee, wasp, hornet, scorpion, stingray, jellyfish, man-of-war, platypus, lionfish, stonefish, sea urchin, or catfish, which all have venom instead of poison, but the delivery mechanism of said venom isn't biting
If a venomous snake bites you, you die. If you bite a venomous snake, you live.
If a poisonous snake bites you, you live. If you bite a poisonous snake, you die.
Or Hamlet's mother died by drinking poisoned wine. Hamlet died by being stabbed with an envenomed sword.
You're mixing up phōnē (voice) and phonos (slaughter), but the truth about Persephone is actually more metal.
Her name predates Greek contacts with Persians, so the timeline doesn't fit. Instead, it comes from perthein (to destroy) + phonos, making her the "Bringer of Destruction". With a caveat that the etymology of her name is uncertain: https://en.wikipedia.org/wiki/Persephone#Name
I do like "killer of distance" for telephone, though. :)
> Instead, it comes from perthein (to destroy) + phonos, making her the "Bringer of Destruction". With a caveat that the etymology of her name is uncertain:
But... of all the theories listed there, perthein isn't among them.
And if the roots are "destroy" and "death", what would make her the "bringer" of destruction?
Fair point about the source, but the classification usually follows the mode of delivery, not the organism of origin.
Many plant-derived compounds function as venoms once introduced into the bloodstream (arrow coatings, darts, etc.), even if they’re also toxic when ingested. Curare is one example of a plant-based compound - lethal in blood, but largely harmless if eaten.
So while Boophone is absolutely a poison in the ecological sense, using it on arrows still fits the venom/toxin distinction better than a purely ingested poison. Otherwise why would people hunt with this if they got sick the second they ate the meat?
What things are more important than the study of meanings in a linguistic context?
Well, semantics only covers an infinitesimal fraction of all meaning. Consider: if I inject arsenic into a snake’s venom sac, is it now a venom? Nothing about your answer changes anything about what’s going on, yet you could still debate the question.
So when you say “what could be more important” I can only say that just about everything is more important.
Exactly half of these HN usernames actually exist. So either there are enough people on HN that follow common conventions for Gemini to guess from a more general distribution, or Gemini has memorized some of the more popular posters. The ones that are missing:
Before the AI stuff Google had those pop up quick answers when googling. So I googled something like three years ago, saw the answer, realized it was sourced from HN. Clicked the link, and lo and behold, I answered my own question. Look mah! Im on google! So I am not surprised at all that Google crawls HN enough to have it in their LLM.
I did chuckle at the 100% Rust Linux kernel. I like Rust, but that felt like a clever joke by the AI.
I wouldn't be surprised if it went towards the LaTeX model instead, where there's essentially never another major version release. There's only so much functionality you need in a local-only database engine. I bet they're getting close to complete.
I'd love to see more ALTER TABLE functionality, and maybe MERGE, and definitely better JSON validation. None of that warrants a version bump, though.
You know what I'd really like, that would justify a version bump? CRDT. Automatically syncing local changes to a remote service, so e.g. an Android app could store data locally in SQLite, but the user could also log into a web site on their desktop and all the data is right there. The remote service need not be SQLite - in fact I'd prefer postgres. The service would also have to merge databases from all users into a single database... Or should I actually use postgres for authorisation but open each user's data in a replicated SQLite file? This is such a common issue, I'm surprised there isn't a canonical solution yet.
I think the unified syncing, while neat, is way beyond what SQLite is really meant for. You'd get into so many niche situations dealing with out-of-sync master and slave 'databases' that it's hard to make an automated solution that covers them effectively, unless you force the schema into a transactional design for everything just to sort out update conflicts. E.g.: your user has the app on two devices, uses one while it doesn't have an internet connection (altering the state), and then uses the app on another device before the original has a chance to sync.
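To make the two-device scenario concrete, here is a minimal sketch of a last-writer-wins merge, about the simplest CRDT-style rule there is. It is purely illustrative and not something SQLite or postgres provides; `Versioned`, `merge`, the timestamps and the device ids are all made-up names, and a real system would need a trustworthy clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int   # assumed logical clock; wall clocks can lie
    device_id: str   # tie-breaker so every replica picks the same winner


def merge(a, b):
    """Merge two {key: Versioned} replicas; the newest write per key wins."""
    merged = dict(a)
    for key, theirs in b.items():
        ours = merged.get(key)
        if ours is None or (theirs.timestamp, theirs.device_id) > (
            ours.timestamp,
            ours.device_id,
        ):
            merged[key] = theirs
    return merged


# Phone edits offline at t=5; laptop edits the same row later at t=7.
phone = {"title": Versioned("Groceries", 5, "phone")}
laptop = {
    "title": Versioned("Shopping list", 7, "laptop"),
    "done": Versioned("false", 3, "laptop"),
}

# Merge order doesn't matter, so both devices converge on the same state.
assert merge(phone, laptop) == merge(laptop, phone)
print(merge(phone, laptop)["title"].value)  # Shopping list
```

Note what happened to the phone's offline edit: it was silently discarded. That is exactly the kind of conflict that is hard to paper over automatically without designing the schema around it.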
Every few years I stumble across the same java or mongodb issue.
I google for it, find it on stackoverflow, and figure that it was me who wrote that very answer. Always have a good laugh when it happens.
Usually my memory regarding such things is quite good, but this one I keep forgetting, so much so that I don't remember what the issue is actually about xD
ziggy42 is both a submitter of a story on the actual front page at the moment, and also in the AI generated future one.
See other comment where OP shared the prompt. They included a current copy of the front page for context. So it’s not so surprising that ziggy42 for example is in the generated page.
And for other usernames that are real but not currently on the home page, the LLM definitely has plenty occurrences of HN comments and stories in its training data so it’s not really surprising that it is able to include real usernames of people that post a lot. Their names will be occurring over and over in the training data.
In 2032 new HN usernames must use underscores. It was part of the grandfathering process to help with moderating accounts generated after the AI singularity spammed too many new accounts.
my hypothesis is they trained it to snake case for lower case and that obsession carried over from programming to other spheres. It can't bring itself to make a lowercaseunseparatedname
Most LLMs, including Gemini (AFAIK), operate on tokens. lowercaseunseparatedname would be literally impossible for them to generate, unless they went out of their way to enhance the tokenizer. E.g. the LLM would need a special invisible separator token that it could output, and when preprocessing the training data the input would then be tokenized as "lowercase unseparated name" but with those invisible separators.
edit: It looks like it probably is a thing given it does sometimes output names like that. So the pattern is probably just too rare in the training data that the LLM almost always prefers to use actual separators like underscore.
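A toy illustration of why an unseparated name isn't impossible: subword tokenizers can spell out a string they've never seen by chaining smaller pieces. The sketch below uses greedy longest-match over a made-up vocabulary, which is not how Gemini's or any real BPE tokenizer works internally, so treat the split as illustrative only.

```python
def greedy_tokenize(text, vocab):
    """Split text into the longest vocab entries, falling back to single chars."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest candidate first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])           # unknown char: emit it on its own
            i += 1
    return tokens


# Hypothetical vocabulary; real ones have on the order of 100k+ entries.
TOY_VOCAB = {"lower", "case", "un", "separated", "name", "_"}

print("|".join(greedy_tokenize("lowercaseunseparatedname", TOY_VOCAB)))
# lower|case|un|separated|name
```

No invisible separator token is needed: the pieces simply concatenate back into the original string. Whether the model prefers to emit such a sequence is a separate question of how often the pattern shows up in training data.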
The tokenization can represent uncommon words with multiple tokens. Inputting your example on https://platform.openai.com/tokenizer (GPT-4o) gives me (tokens separated by "|"):
You can straight up ask Google to look for reddit, hackernews users post history. Some of it is probably just via search because it's very recent, as in last few days. Some of the older corpus includes deleted comments so they must be scraping from reddit archive apis too or using that deprecated google history cache.
It does memorize. But that's not really news... I remember ChatGPT 3.5 or the old 4.0 remembering some users on some subreddits and all. It could even name the top users for each subreddit.
The thing is, most of the models were heavily post-trained to limit this...
That’s a lot more underscores than the actual distribution (I counted three users with underscores in their usernames among the first five pages of links atm).
Aw, I was actually a bit disappointed by how on the nose the usernames were, relative to their postings. Like the "Rust Linux Kernel" by rust_evangelist, "Fixing Lactose Intolerance" by bio_hacker, fixing a 2024 Framework by retro_fix, etc...
- Uses existing model backbones for text encoding & semantic tokens (why reinvent the wheel if you don't need to?)
- Trains on a whole lot of synthetic captions of different lengths, ostensibly generated using some existing vision LLM
- Solid text generation support is facilitated by training on all OCR'd text from the ground truth image. This seems to match how Nano Banana Pro got so good as well; I've seen its thinking tokens sketch out exactly what text to say in the image before it renders.
I used Serp via API many moons ago. The most interesting part of the company imo is their legal defense of different plans:
Production - $150
15,000 searches / month
U.S. Legal Shield
ie. "Our U.S. Legal Shield protects your right to crawl and parse public search engine data under the First Amendment. We assume scraping and parsing liability for customers on most recurring plans unless your usage is illegal."
I imagine at least some portion of companies use them just for this liability shield.
Sounds a lot like the old guarantee paid SSL certificate providers used to offer; pretty words, but meaningless in practice. (IIRC, no one ever got a payout from any of them.)
"We assume scraping and parsing liabilities for both domestic and foreign companies unless your usage is otherwise illegal" seems like a big loophole in it.
Couldn't this be laid out as, We assume scraping and parsing liability unless it is ruled as being illegal, in which case your use would be illegal and our liability shield wouldn't help you?