Hacker News | mateiz's comments

This is a cool paper showing there is value in using an LLM multiple times, but in recent research we showed that with majority voting, quality can decrease past some point as you make more calls. Check out https://arxiv.org/pdf/2403.02419.pdf. It raises the natural question of how to design the best inference algorithm given an LLM you can call multiple times.
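Here is a minimal Python sketch of that setup. It rests on some heavy simplifying assumptions. `call_llm` is a hypothetical stand-in for any model API. The accuracy model treats each call as independently correct with probability p and assumes all wrong answers coincide. This is not the paper's analysis, but it reproduces the qualitative effect. With a mix of "easy" (p > 0.5) and "hard" (p < 0.5) queries, more votes help the easy ones and hurt the hard ones, so overall accuracy can peak and then decline.

```python
from collections import Counter
from math import comb
from typing import Callable, Sequence, Tuple


def majority_vote(call_llm: Callable[[str], str], prompt: str, n_calls: int) -> str:
    """Call the model n_calls times and return the most frequent answer.
    Ties go to the answer seen first (Counter.most_common keeps first-seen order on ties)."""
    answers = [call_llm(prompt) for _ in range(n_calls)]
    return Counter(answers).most_common(1)[0][0]


def p_vote_correct(p: float, n: int) -> float:
    """Chance that a strict majority of n independent calls is correct, assuming each call
    is correct with probability p and all wrong answers coincide (a simplification).
    Use odd n so there are no ties."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


def expected_accuracy(mix: Sequence[Tuple[float, float]], n: int) -> float:
    """Accuracy over a query mix given as (fraction_of_queries, per_call_p) pairs."""
    return sum(frac * p_vote_correct(p, n) for frac, p in mix)


if __name__ == "__main__":
    mix = [(0.5, 0.9), (0.5, 0.4)]  # hypothetical: half easy queries, half hard ones
    for n in (1, 3, 5, 25, 101):
        print(n, round(expected_accuracy(mix, n), 3))
```

Consider the hypothetical 50/50 mix of p = 0.9 and p = 0.4 queries. Expected accuracy is about 0.650 with one call, 0.662 with three, and 0.654 with five, and it drifts toward 0.5 as n grows.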


Don't MLflow Projects exactly meet this use case? A project lives in a Git repo, which can include both code and data, and specifies its software environment (currently Conda but will eventually also support Docker): https://www.mlflow.org/docs/latest/projects.html. You can then run it wherever you want to run code: CI system, Kubernetes, cloud, etc. The reason MLflow doesn't force people to use Projects is because many users like to develop ML in notebooks, but we definitely expect engineering teams to use it with Projects.


I could go on at length about why the MLflow / Databricks understanding of ML projects is bad to a bonkers degree. I’ll give just one example, which has mattered considerably for several production projects my team works on and tried to manage in MLflow for a while.

The project was a suite of neural network models that provided face & object detection results in a low-latency web interface where customers can manipulate photos and want automated metadata about people or objects.

To optimize for performance, we need to frequently experiment with compile-time details of the runtime environment (in our case a container) where the application will run in production.

So the axis of our experiments was not usually anything to do with neural network layers, data, or parameters. It was different compiler optimization flags, different precision approximations, and GPU settings that needed to be rolled into a huge number of different underlying runtime environments. Then, for each distinct runtime environment, the more mundane experiments would be carried out for layer topology, number of neurons, width of CNN filters, etc.
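This small Python sketch makes the shape of that search concrete as a two-level grid. The outer axis holds build-time settings, each of which would need its own runtime image. The inner axis holds the ordinary hyperparameter runs carried out inside each build. The axis names and values are hypothetical. Nothing here is compiled or tied to MLflow; it only shows how the run count multiplies.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple

Config = Dict[str, object]


def grid(axes: Dict[str, Sequence[object]]) -> Iterator[Config]:
    """Yield every combination of the given axes as a dict (a plain Cartesian product)."""
    keys = list(axes)
    for values in product(*(axes[k] for k in keys)):
        yield dict(zip(keys, values))


def plan(build_axes: Dict[str, Sequence[object]],
         model_axes: Dict[str, Sequence[object]]) -> List[Tuple[Config, List[Config]]]:
    """Outer level: one runtime build (e.g. a container image) per build config.
    Inner level: the hyperparameter runs to execute inside that build."""
    model_runs = list(grid(model_axes))
    return [(build, model_runs) for build in grid(build_axes)]


if __name__ == "__main__":
    # Hypothetical axes; the flag strings are only labels here, nothing is compiled.
    build_axes = {"cflags": ["-O2", "-O3 -ffast-math"], "precision": ["fp32", "fp16"]}
    model_axes = {"num_layers": [4, 8], "filter_width": [3, 5]}
    schedule = plan(build_axes, model_axes)
    for build, runs in schedule:
        print(build, "->", len(runs), "model runs")
    print("total runs:", sum(len(runs) for _, runs in schedule))
```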

We found that unless you basically build your own entire “meta” version of MLflow that wraps around MLflow, it falls apart in use cases where custom compile-time details of the runtime are themselves aspects of the experiment. Not to mention that the Projects format violates good practices, such as the 12-Factor guidance on injecting settings from the environment. That again leads to wasted effort on special-case deployment handling for MLflow jobs.

Whatever deploys and measures your tasks should not also impose any type of special-case packaging structure, which is a big reason why MLflow conceptually fails. Any attempt to make anything at all like a DSL packaging layer for experiments that causes it to diverge from “regular deployment of any old job” is immediately a failed idea. The only thing it’s good for is creating unwitting vendor lock-in once you’re highly dependent on this bespoke, weird packaging template for Projects that makes your ML jobs weirdly (and needlessly) different from other deployment tasks.


While MLflow doesn't submit jobs to Kubernetes for you, it should be possible to integrate it with your favorite scheduler to do that. MLflow is designed to accept experiment results from wherever you are running your code, so you can just submit an "mlflow run ..." command to Kubernetes and have it report results to your tracking server.
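A minimal sketch of that pattern builds a Kubernetes `batch/v1` Job whose container runs `mlflow run`. The container reports to the tracking server through the `MLFLOW_TRACKING_URI` environment variable. It assumes a container image that already has MLflow installed, along with whatever environment manager the project needs. The image, repository, and server URLs are hypothetical placeholders. `mlflow_run_job` is an illustrative helper, not part of MLflow.

```python
import json
from typing import Dict, List, Optional


def mlflow_run_job(name: str, image: str, project_uri: str, tracking_uri: str,
                   params: Optional[Dict[str, str]] = None,
                   entry_point: str = "main") -> dict:
    """Build a Kubernetes batch/v1 Job manifest whose single container executes `mlflow run`."""
    command: List[str] = ["mlflow", "run", project_uri, "-e", entry_point]
    for key, value in (params or {}).items():
        command += ["-P", f"{key}={value}"]  # one -P flag per project parameter
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # don't retry failed training runs automatically
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "mlflow-run",
                        "image": image,
                        "command": command,
                        # MLflow reads the tracking server location from this variable.
                        "env": [{"name": "MLFLOW_TRACKING_URI", "value": tracking_uri}],
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    job = mlflow_run_job(
        name="train-alpha-0-5",
        image="registry.example.com/ml/trainer:latest",           # hypothetical image
        project_uri="https://github.com/example/ml-project.git",  # hypothetical repo
        tracking_uri="http://mlflow-tracking.example.com:5000",    # hypothetical server
        params={"alpha": "0.5"},
    )
    print(json.dumps(job, indent=2))
```

Piping the printed JSON into `kubectl apply -f -` would submit the Job. The choice of scheduler stays outside MLflow, which only sees the run reporting back to the tracking server.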


We'd love to hear about the size of the payloads used. With the code we posted, larger records take 10-15 ns more per query, which is still faster than the numbers in the paper, but of course YMMV based on the table implementation and other factors.


You can see the table here, it's not a lot of code: https://github.com/stanford-futuredata/index-baselines/blob/.... Each bucket just has 8 keys and you want to test whether one of them is equal to the key you're searching for.
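The following is a scalar Python sketch of the lookup idea, not the linked code. Each key has two candidate buckets of 8 slots. A lookup compares the key against every slot of a bucket, emulated here by building a bit mask the way an AVX2 compare followed by a movemask would. The splitmix64-based hash functions, the `EMPTY` marker, and the function names are assumptions for illustration.

```python
from typing import List, Optional, Tuple

MASK64 = (1 << 64) - 1
SLOTS = 8     # keys per bucket, matching the 8-key buckets described above
EMPTY = None  # marker for an unused slot in this sketch


def mix64(x: int) -> int:
    """splitmix64 finalizer, used here as a stand-in hash function."""
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def bucket_indexes(key: int, num_buckets: int) -> Tuple[int, int]:
    """The two candidate buckets for a key."""
    return mix64(key) % num_buckets, mix64(key ^ 0x9E3779B97F4A7C15) % num_buckets


def match_mask(bucket: List[Optional[int]], key: int) -> int:
    """Scalar stand-in for an 8-lane SIMD compare + movemask: bit i is set iff slot i == key."""
    mask = 0
    for i, slot in enumerate(bucket):
        if slot == key:
            mask |= 1 << i
    return mask


def lookup(table: List[List[Optional[int]]], key: int) -> Optional[Tuple[int, int]]:
    """Probe at most two buckets; return (bucket, slot) holding the key, or None if absent."""
    for b in bucket_indexes(key, len(table)):
        mask = match_mask(table[b], key)
        if mask:
            # Index of the lowest set bit, like a count-trailing-zeros instruction.
            return b, (mask & -mask).bit_length() - 1
    return None


if __name__ == "__main__":
    table = [[EMPTY] * SLOTS for _ in range(16)]
    _, second = bucket_indexes(42, len(table))
    table[second][5] = 42        # pretend 42 was displaced into its second bucket
    print(lookup(table, 42))     # found in `second`, slot 5
    print(lookup(table, 43))     # None: both candidate buckets checked, no match
```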


That's pretty intelligent. It uses an 8-way SIMD comparison that fills the 256-bit AVX2 register.

So in this cuckoo hash, it's not one item per bucket and two buckets per hash (as in the classical "textbook" cuckoo hash). Instead, there are 8 items per bucket, compared very quickly with 8-way 256-bit AVX2 comparisons. That gives 16 effective locations for each key (2 buckets, 8 slots each).

I was wondering how you were getting 99% occupancy and other such numbers. Apparently I'm not up to date on the latest cuckoo techniques. But even so, this simple implementation of yours is far deeper and more useful than the one discussed on, say, Wikipedia.

---------

A few notes for those who aren't up-to-date with the latest architectures:

1. Intel machines can perform two 256-bit loads per clock cycle, but ONLY using AVX, AVX2, or AVX-512 instructions. Two loads per clock can be sustained through the L1 and L2 caches, only slowing down at the L3 cache. Latency characteristics differ, of course, between the L1 and L2 levels.

2. Most AVX2 instructions have superb numbers: they execute in a single clock, and several can often issue per clock. Skylake supports two 256-bit comparisons per clock cycle, simultaneously with the two loads. Ports 0 and 1 can each do a 256-bit comparison while ports 2 and 3 do loads, and Skylake still has ports 4, 5, and 6 open to perform other tasks.

So effectively, the code checks ~16 bucket locations of a cuckoo hash in roughly the same time as the total latency of a DDR4 RAM access. In fact, their implementation's ~36ns is damn close to the total memory latency of the Skylake system as a whole. The implementation is likely bottlenecked by the memory controller and can't be much faster.

http://www.corsair.com/~/media/corsair/blog/2015-09/ddr3_vs_...

Note that this Corsair chart is from 2015, and DDR4 RAM latencies as well as memory controllers have gotten better since.

Good job with the implementation! That's an outstanding level of speed you've achieved. I stand by what I said earlier: you've basically made a 400MPH truck and barely spent any sentences on it, lol. That's the really interesting part of your post IMO.

-----------------

I think I better understand the conclusion of the blogpost as well:

> Does all this mean that learned indexes are a bad idea? Not at all: the paper makes a great observation that, when cycles are cheap relative to memory accesses, compute-intensive function approximations can be beneficial for lookups, and ML models may be better at approximating some functions than existing data structures.

The Google machine-learning paper is an indicator of how slow RAM is, more than anything else. RAM is so slow that spending a ton of cycles on machine learning may (in some cases) be more efficient than accessing memory and being wrong.
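As a rough worked example, assume a 3 GHz core (an illustrative clock speed, not one reported in this thread). The ~36ns per lookup mentioned above then corresponds to about a hundred cycles that the core would otherwise spend waiting on memory:

$$
36\,\mathrm{ns} \times 3\,\tfrac{\mathrm{cycles}}{\mathrm{ns}} = 108\ \mathrm{cycles}
$$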

The cuckoo hash has two major benefits. First, it's basically checking 16 locations at once (AVX2 instructions + speculatively looking at two memory locations at the same time).

Second, a cuckoo-hash lookup probes at most two buckets, so it is provably wrong at most twice. Two memory accesses (which the processor can perform "speculatively") are the worst that could happen. Indeed, the latency numbers reported here are close to the practical limits of the x86 processor! A machine-learning algorithm can't do much better than that.


Yup, this is a great explanation. Cuckoo hashes actually do much better in terms of load if you use buckets with multiple items, as we did here. The classical one with one item per location can get "stuck" with unresolvable cycles between elements at 50% load, but when you have these buckets, there are many more ways to resolve collisions, which lets you greatly increase the load. 99% is extreme but definitely doable. Writing a fuller post on this is definitely a good idea -- we may do it in the future.
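The small Python experiment below illustrates the point. It assumes two candidate buckets per key, random-walk eviction capped at `max_kicks` displacements, and splitmix64-based hashes. On failure, the insert simply reports it rather than rehashing. The experiment fills tables of equal total capacity with random 64-bit keys until the first failed insert and reports the load factor reached. The class and parameter names are hypothetical.

```python
import random
from typing import List, Tuple

MASK64 = (1 << 64) - 1


def mix64(x: int) -> int:
    """splitmix64 finalizer, used as a stand-in hash function."""
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


class BucketCuckoo:
    """Cuckoo table with two candidate buckets per key and `slots` keys per bucket."""

    def __init__(self, num_buckets: int, slots: int, max_kicks: int = 500, seed: int = 0):
        self.num_buckets = num_buckets
        self.slots = slots
        self.max_kicks = max_kicks
        self.buckets: List[List[int]] = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)
        self.size = 0

    def _candidates(self, key: int) -> Tuple[int, int]:
        return mix64(key) % self.num_buckets, mix64(key ^ 0x9E3779B97F4A7C15) % self.num_buckets

    def insert(self, key: int) -> bool:
        """Random-walk insertion. Returns False if no free slot is found within max_kicks;
        in that case the last evicted key is dropped (a real table would rehash or grow)."""
        b1, b2 = self._candidates(key)
        for b in (b1, b2):
            if len(self.buckets[b]) < self.slots:
                self.buckets[b].append(key)
                self.size += 1
                return True
        b = self.rng.choice((b1, b2))
        for _ in range(self.max_kicks):
            # Swap `key` with a random resident of the full bucket `b`...
            i = self.rng.randrange(self.slots)
            key, self.buckets[b][i] = self.buckets[b][i], key
            # ...then try to place the evicted key in its other candidate bucket.
            c1, c2 = self._candidates(key)
            b = c2 if b == c1 else c1
            if len(self.buckets[b]) < self.slots:
                self.buckets[b].append(key)
                self.size += 1
                return True
        return False


def load_at_first_failure(slots: int, capacity: int = 8192, seed: int = 1) -> float:
    """Insert random 64-bit keys until an insert fails; return the fraction of slots in use."""
    table = BucketCuckoo(capacity // slots, slots, seed=seed)
    rng = random.Random(seed + 1)
    while table.insert(rng.getrandbits(64)):
        pass
    return table.size / capacity


if __name__ == "__main__":
    for slots in (1, 8):
        print(f"{slots} slot(s) per bucket: {load_at_first_failure(slots):.3f} load")
```

On a typical run, the one-slot table stops at or a bit below 50% load, while the 8-slot table gets well above 90%. Exact numbers depend on the seed and table size.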


Matei Zaharia (one of the PIs on DAWN) here. Snorkel, MacroBase and ASAP are already being used in production at several companies, and we intend to continue publishing everything as open source. We only started this lab a year ago, so a lot of the projects listed are still new.


I'm trying to find sample projects to learn from. If possible, please share some.


Shoot, now we have to implement all of those before someone else does :).



Great book, except for the bit about being able to boot an Apple ][ with a binary from a SETI download.

[I've often wondered what a computer and OS designed by an extraterrestrial civilization would look like. But it's impossible for us to project even our own systems out any reasonable amount of time. It's unclear that we'll even have keyboards and displays 100 years from now, or that we'll have replaced C with something better <-- troll :-) ]


At least you defined C as the language to beat.


Well, and several years later you could upload a virus from a Mac computer, so I guess they kept updating

/s


For all of Independence Day's faults, I couldn't forgive the fact that Will Smith flew up to the alien mother ship and plugged in a USB key (containing the virus which destroyed the ship (er, spoiler)).


> I couldn't forgive the fact that Will Smith flew up to the alien mother ship and plugged in a USB key (containing the virus which destroyed the ship (er, spoiler)).

Only that this is not what happened in the movie. Jeff Goldblum tried to filter out of "his" satellite links the pirate signal (= hijacked alien communications) that was distorting the TV transmissions. Between his arrival at the TV station and his lunch break, he found and partly reverse engineered its line coding, identifying a recurring pattern that he (correctly) recognized as a countdown.

Later, at Area 51 (that scene is only in the extended edition), he's shown a piece of alien tech from the crashed ship that the A51 scientists can make no sense of. It turns out to transceive (and display) the same pattern as the pirate signal used by the aliens.

Eventually he arrives at the idea of engineering a signal pattern that, when injected into the aliens' communications network will (temporarily) shut down the ships' shields, simply by glitching the responsible controllers. It's never stated how that signal pattern is generated, but if it makes you feel better, assume it was some kind of fuzzing method.

When they go up to the mothership they don't "plug in" a USB storage device. Watch that sequence again. Goldblum is first running a program on his computer that synchronizes with the aliens' communication network's signal pattern and then uses that link to inject the glitching signal.

----

Yes, the timeframe is more than optimistically short. But it's not logically impossible. In fact, it's one of the better parts of the movie IMHO. And all the people who complain "how did he do this without knowing their computer architecture?" have definitely never looked into, or reverse engineered, a completely black-box piece of computing equipment. It's definitely possible; yes, it takes a lot of patience, but it can be done.


That's because all of our tech was based on the Roswell crash; that's the hand-wavy way of explaining the compatibility. That, and a telepathic species can afford to be complacent about internal security.

The one I can't forgive is the aliens having telepathy in the first place. So much sci fi has this and it ruins the story for me as much as it would if a wizard showed up.


Why not a telepathy consistent with physics? Electromagnetic radiation emitted into the brain of a recipient and whatnot.


For one, I've never seen the telepathy be interfered with by other electromagnetic radiation, or anyone build a Faraday cage to limit the communication. Telepathy goes right through the deflector shield in Star Trek.


Well it is a universal serial bus!


Databricks -- San Francisco -- https://databricks.com

* Software Engineer (ONSITE)

* Software Engineer Intern (ONSITE)

* Product Manager (ONSITE)

Databricks was founded in 2013 by the team that started Apache Spark, meaning you might not only use Spark but also get to work on it :). We provide a cloud data processing platform based on Apache Spark, used by customers including top-5 banks and healthcare and media companies. We've also done some really cool technical stuff, such as setting the 2014 GraySort record (http://www.wired.com/2014/10/startup-crunches-100-terabytes-...).

We are hiring engineers in the following areas:

* Backend (JVM, AWS, database engine)

* Frontend (React, D3)

* Machine learning

List of available positions: https://databricks.com/company/careers


Hey, I'm unsure whether the positions you listed first are supposed to appear in the list of available positions. If they are, is there a reason the internship position doesn't appear? Thanks for the help!

