Hacker News | maxrumpf's comments

are you a maintainer on npm?

lol thankfully no. My GitHub is in bio

congratulations!


We think this is a pretty sad day for research: Some context about Chroma's model. https://x.com/maxrumpf/status/2037365748973384154?s=20


Can you share actual links to your published research and theirs?


This is the tech report for a model I helped work on. I'm biased, but it turned out very well.

We essentially let the model learn to retrieve like a human would: Make a first search, read the results, and then make another. This lets the model be vastly better than pre-programmed pipelines. We test this extensively and compare against implementing this with API models (like Sonnet 4.5 and GPT-5.1). SID-1 compares favorably.
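To make the "search, read, search again" loop concrete, here is a minimal sketch. It is not the SID-1 implementation: `iterative_retrieve`, `toy_search`, `toy_policy`, and the tiny corpus are all hypothetical stand-ins, and in a real system the learned model would play the role of the policy that picks the next query.

```python
from typing import Callable, List, Optional


def iterative_retrieve(
    question: str,
    search: Callable[[str], List[str]],
    next_query: Callable[[str, List[str]], Optional[str]],
    max_steps: int = 4,
) -> List[str]:
    """Search, read the results, then decide on the next query (or stop)."""
    seen: List[str] = []
    query: Optional[str] = question  # the first search uses the question itself
    for _ in range(max_steps):
        if query is None:  # the policy decided it has read enough
            break
        for doc in search(query):
            if doc not in seen:
                seen.append(doc)
        query = next_query(question, seen)
    return seen


# Hypothetical toy corpus, keyed by query string.
CORPUS = {
    "where was the author of dune born": ["Dune was written by Frank Herbert."],
    "frank herbert birthplace": ["Frank Herbert was born in Tacoma."],
}


def toy_search(query: str) -> List[str]:
    return CORPUS.get(query.lower(), [])


def toy_policy(question: str, seen: List[str]) -> Optional[str]:
    """Hand-written stand-in for the model's choice of follow-up query."""
    text = " ".join(seen)
    if "Tacoma" in text:
        return None  # answer found, stop searching
    if "Frank Herbert" in text:
        return "frank herbert birthplace"
    return None


docs = iterative_retrieve("Where was the author of Dune born", toy_search, toy_policy)
print(docs)
```

A fixed pipeline would issue only the first query here and never reach the second document. The follow-up query depends on what the first search returned.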

Happy to answer any questions or get feedback. First and foremost: Enjoy the read. It's much more detailed than most tech reports.


This comment section is scary. Hacker News advocating FOR nanny technology?!


Excel is awful in almost every way, but I just wish more software was as customizable.

I can get (even more) customization by using pandas etc., but it's usually much slower and you get much less of an intuition about the data.


I imagine color consistency will be such a pain here.


I'd hope that per-pixel calibration would solve that, but I wonder how much that calibration would drift over time.


Whatever the drift would be, inorganics would drift less than organic materials.


The weirdest thing people do is make up criteria that YC supposedly uses to reject people. There was such a huge diversity in our batch: From 20 y/o to 40+. Foreign, domestic. Credentialed, not credentialed. $1M rev run rate, $0 run rate. Just apply.


This comment scares me that YC is desperate for applications now after burning through so many early stage founders for years. Has YC peaked?


The abstract and the rest of the paper don't really match imo. It's not really allocating more to some sequences, but just introducing ~dropout. Might be two sides of the same coin, but it was still a weird read.


We spent a fair bit of effort ensuring we were accurate with the language and claims, so we're happy to take any feedback and make updates in subsequent versions. However, I don't see where we claim that MoD allocates more to some sequences and not others (specifically, the abstract says "transformers can instead learn to dynamically allocate FLOPs (or compute) to specific positions in a sequence").

That said, it's a pretty simple change to make the approach work in the way you describe (allocating more to some sequences and not others) by changing the group across which the top-k works. In the paper we use the time (sequence) dimension, but one could also use the batch * time dimension, which would result in asymmetric allocation across sequences.
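A toy illustration of the difference between the two top-k groupings, in plain Python. This is only a sketch, not the paper's code: the function names, the `capacity` parameter, and the router scores are all made up for the example.

```python
def top_k_mask(scores, k):
    """Return a 0/1 mask marking the k largest entries of a flat list."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]


def route_per_sequence(scores, capacity):
    """Top-k over the time dimension: each sequence gets `capacity` tokens."""
    return [top_k_mask(row, capacity) for row in scores]


def route_across_batch(scores, capacity):
    """Top-k over batch * time: the same total budget, shared by all sequences."""
    batch, time = len(scores), len(scores[0])
    flat = [s for row in scores for s in row]
    mask = top_k_mask(flat, capacity * batch)
    return [mask[b * time:(b + 1) * time] for b in range(batch)]


# Hypothetical router scores for 2 sequences of 4 tokens each.
scores = [
    [0.9, 0.8, 0.7, 0.6],
    [0.5, 0.1, 0.2, 0.3],
]

per_seq = route_per_sequence(scores, capacity=2)
shared = route_across_batch(scores, capacity=2)

print([sum(row) for row in per_seq])  # tokens routed per sequence: [2, 2]
print([sum(row) for row in shared])   # tokens routed per sequence: [4, 0]
```

Both variants route four tokens in total. With top-k over time, each sequence gets two. With top-k over batch * time, the first sequence takes all four because its scores are higher, which is the asymmetric allocation across sequences.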


Dropout applies at train time; this applies at inference time. Dropout is random; this is deterministic. You can't compare them.

