kriro · 2026-06-01T17:13:59

Unfortunately, DDG is still horrible for non-English results, as are most "smaller" search engines. I rotate through them every now and then to try them out. Can anyone recommend a meta search engine that uses country-specific engines depending on the search?

kriro · 2026-05-29T18:15:10

We're talking about enterprise customers. The trivial answer is that Mistral has sales teams and consultants from the same company that builds the models, and that company is from the EU.

doctorpangloss · 2026-05-29T18:39:11

I can invest in public markets in a lot of $10B sales-and-consulting businesses that can also put Mistral on premises (or do whatever the hell people ask for). That makes Mistral sound like yet another one of those, not a growing $1T business.

kriro · 2026-05-28T13:06:07

I don't see it mentioned explicitly in the methods section, but I assume you prompted each model only once per question? Did you consider prompting n times from a blank state to see whether the models even agree with themselves?

It would also be interesting to add a virtual model that is simply the majority vote of all models, and to see how much the individual models differ from that "consensus".
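Both suggestions boil down to simple label statistics. A minimal sketch, assuming each model's verdicts are stored as a list of labels aligned by claim index; the model names, labels and data are made up for illustration, and this is not the study's actual method.

```python
from collections import Counter
from typing import Dict, List

Labels = List[str]


def majority(labels: Labels) -> str:
    """Most common label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def self_agreement(runs: List[Labels]) -> float:
    """Fraction of claims on which all n runs of ONE model gave the same label."""
    per_claim = list(zip(*runs))
    consistent = sum(len(set(labels)) == 1 for labels in per_claim)
    return consistent / len(per_claim)


def consensus(predictions: Dict[str, Labels]) -> Labels:
    """A virtual 'majority model': the per-claim majority vote over all models."""
    return [majority(list(labels)) for labels in zip(*predictions.values())]


def agreement(reference: Labels, labels: Labels) -> float:
    """Fraction of claims on which `labels` matches `reference`."""
    return sum(a == b for a, b in zip(reference, labels)) / len(reference)


# Hypothetical data: three repeated runs of one model on four claims.
runs_model_a = [
    ["true", "false", "mixed", "true"],
    ["true", "false", "true", "true"],
    ["true", "false", "mixed", "true"],
]
print(self_agreement(runs_model_a))  # 0.75 (the runs disagree on the third claim)

# Hypothetical single-run verdicts of three models on the same claims.
predictions = {
    "model_a": ["true", "false", "mixed", "true"],
    "model_b": ["true", "true", "mixed", "false"],
    "model_c": ["true", "false", "false", "true"],
}
virtual = consensus(predictions)
for name, labels in predictions.items():
    print(f"{name}: {agreement(virtual, labels):.2f}")
# model_a: 1.00, model_b: 0.50, model_c: 0.75
```

Because each model also votes in the consensus, its agreement with the majority is biased upward; computing the majority over the other models only (leave-one-out) avoids that. With an even number of models, the tie-breaking rule also starts to matter.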

Do you plan to add sources to the related work section with baseline numbers for human expert disagreement in fact-checking tasks? (I'm assuming such studies exist.)

kostaj · 2026-05-28T13:43:41

Indeed, I prompted each model once, plus one retry on errors. Very good point about measuring the inter-model disagreement! I'll add it in the next version.

Section "4.2 Agreement w/ peer majority" shows the level of agreement of each model with the majority.

Yes, I'm planning to human-label the same corpus of 1,000 claims and publish a second study measuring the models' performance against the human labels on a corpus the models have not seen during training.

kriro · 2026-05-24T16:13:06

I agree. Once upon a time I was quite interested in FPGAs, but the generally uninviting infrastructure made me move on completely. Fairly recently I was involved in quantizing neural networks with FINN (AMD), and let's just say... that was a pretty bad experience overall.

tremon · 2026-05-24T17:25:51

Yes, same here. I did my thesis on reconfigurable co-processors in the 00s, then quickly moved away from that market once I was no longer a student, due to the atrocious tooling availability and OS support.

kriro · 2026-05-22T07:38:17

Nice analysis. I would have loved a short overview of the kinds of experiments that were run on the machine (I know the results are given).

I find the "independent researcher" business model quite interesting. In the linked post he writes: "DFT is a proprietary training algorithm, however, I’m currently offering a beta for a model training service where I will train your model for you using DFT." I'm curious how successful this is. Essentially, it means marketing an AI breakthrough as a service instead of publishing a paper, which is what my academic brain is trained to do.

As an aside, one thing I always loved about our field was that the startup cost for many business ideas was "a laptop, an internet connection and some grit". In the age of AI it's quite a bit more, and I feel one of the sad side effects is that it crowds out poorer and younger developers.

kriro · 2026-05-20T09:38:12

Thinking only about non-programming business applications: anecdotally, Mistral is certainly a player in the European enterprise market. For most German companies I have interacted with, Mistral was the first point of contact for the corporate AI rollout. For the "small potatoes" day-to-day minutiae, Copilot is probably #1.

kriro · 2026-05-19T14:36:27

I used it a bit and had it installed for a while on a G4 PowerBook (must have been early-ish 2000s). I like the no-nonsense attitude towards blobs and the security focus. Overall the experience was very good, and the bits of code I read were nicely written. I'll always endorse it and should really install it somewhere again in the near future.

This is also the 60th release. Congrats, team.

kriro · 2026-05-17T09:04:35

Seems like textbook "Inside the Tornado" marketing: pick a country as the first bowling pin, show some success, then go for a different/bigger country. Presumably they'll cover the EU first this way, and be the first to offer licenses for all citizens.

kriro · 2026-05-05T07:30:37

I did it back in the day when fast.ai was relatively new, with ULMFiT. This must have been when BERT was SOTA. The architecture lets you train a base language model and then specialize it with a head. I used the entire Wikipedia for the base and then some GBs of tweets I had collected through the firehose. I had access to a lab with 20 game dev computers; they must have been roughly RTX 2080s. One training cycle on the tokenized Wikipedia took about half a day, so I tuned hyperparameters by running a different setting on each computer and then moving on with the winner as the starting point for the next day. It was always fun to come to work the next morning and check the results.
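That daily tuning loop is essentially a greedy, round-based search: each machine trains one variant of the current best configuration, and the winner seeds the next round. A minimal sketch, assuming a hypothetical `evaluate` callable that stands in for a full (half-day) training run and returns a validation score where higher is better; the parameter names and the toy scoring function are made up for illustration.

```python
from typing import Callable, Dict, List

Config = Dict[str, float]


def daily_round(best: Config, changes: List[Config],
                evaluate: Callable[[Config], float]) -> Config:
    """One 'day': each machine trains `best` with one override applied.

    The current best is kept in the comparison so a bad day never makes
    the next starting point worse.
    """
    trials = [best] + [{**best, **change} for change in changes]
    return max(trials, key=evaluate)


def greedy_tune(start: Config, schedule: List[List[Config]],
                evaluate: Callable[[Config], float]) -> Config:
    """Run one round per entry in `schedule`; the winner seeds the next round."""
    best = start
    for changes in schedule:
        best = daily_round(best, changes, evaluate)
    return best


# Hypothetical stand-in for "train overnight, check the validation score in the morning".
def toy_evaluate(config: Config) -> float:
    return -abs(config["lr"] - 0.01) - abs(config["dropout"] - 0.3)


schedule = [
    [{"lr": 0.1}, {"lr": 0.01}, {"lr": 0.001}],  # day 1: vary the learning rate
    [{"dropout": 0.1}, {"dropout": 0.3}],         # day 2: vary dropout
]
best = greedy_tune({"lr": 0.003, "dropout": 0.5}, schedule, toy_evaluate)
print(best)  # {'lr': 0.01, 'dropout': 0.3}
```

On a real 20-machine setup the trials of one round would run in parallel rather than through sequential `evaluate` calls; `max` just picks the winner once all scores are in. Varying one parameter per day is cheap but can miss interactions between parameters.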

The engineering was horrible and very ad hoc, but I learned a lot. Results were OK-ish (I classified tweets), but it gave me a good perspective on the sheer GPU power (and the engineering challenges) one would need to do this seriously. I didn't fully grasp the potential of generating output, but I spent quite some time chuckling at generated tweets (I was just curious to try it).

kriro · 2026-04-20T12:54:23

Innocent mistakes and frustrating back and forth are also very common, especially in interdisciplinary teams. The mismatch in tooling and workflows, plus the manual copy & paste conversion between them, is a thing to behold. Add multiple countries and Excel to the mix (decimal dot vs. comma, formulas being language-specific), and maybe a couple of Chinese, Japanese, Russian or Arabic speaking researchers in the group for some extra UTF-8 magic. Then there are line endings on Linux vs. OS X vs. Windows.
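The dot-vs-comma, encoding and line-ending issues usually surface when a CSV saved by a localized Excel is read somewhere else. A minimal sketch, assuming a German-style export (semicolon delimiter, comma as decimal separator, dot as thousands separator) saved as UTF-8 with a byte-order mark; the column names and values are made up for illustration.

```python
import csv
import io
from typing import List, Tuple


def normalize_newlines(text: str) -> str:
    """Turn Windows CRLF and stray lone CRs into plain LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def parse_number(value: str, decimal_sep: str = ",", thousands_sep: str = ".") -> float:
    """Parse a locale-formatted number such as '1.234,5' into a float."""
    cleaned = value.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


def read_localized_csv(raw: bytes, encoding: str = "utf-8-sig",
                       delimiter: str = ";") -> Tuple[List[str], List[List[str]]]:
    """Decode (dropping a UTF-8 BOM if present), normalize newlines, split rows."""
    text = normalize_newlines(raw.decode(encoding))
    header, *rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    return header, rows


# Hypothetical export: CRLF line endings, BOM, non-ASCII names, decimal commas.
raw = "Name;Wert\r\nMüller;1.234,5\r\n王;0,25\r\n".encode("utf-8-sig")
header, rows = read_localized_csv(raw)
values = [parse_number(row[1]) for row in rows]
print(header, values)  # ['Name', 'Wert'] [1234.5, 0.25]
```

The separators are explicit parameters on purpose: "1.234" means 1234 in a German export but 1.234 in an English one, so the locale has to be known (or agreed on) up front. The `utf-8-sig` codec also decodes files without a BOM, but not files saved in a legacy code page such as cp1252; for those, pass the matching encoding.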