
Worked at Figma for 5 years. The author uses Figma as an example, but I think they miss the point. They're so close, though. Note these quotes:

> Both are very well-designed from first principles, but do not conform to what other interfaces the user might be familiar with

> The lack of homogeneous interfaces means that I spend most of my digital time not in a state of productive flow

There are generally two types of apps - general apps and professional tools. While I strongly agree with the author that general apps should align with trends, from a pure time-spent PoV Figma is a professional tool. The design editor in particular is designed for users who are in it every day for multiple hours a day. In this scenario, small delays in common actions stack up significantly.

I'll use the Variables project in Figma as an example (mainly because that was my baby while I was there). Variables were used on the order of billions of times. An increase of 1s in the time it took to pick a variable was a net loss of around 100 human years in aggregate. We could have used more standardized patterns for picking them (i.e. Illustrator's palette approach), or unified patterns for picking them (making styles and variables the same thing), but in the end we picked slightly different behavior because at the end of the day it was faster.
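The back-of-envelope math, with illustrative numbers rather than exact internal figures:

    picks = 3_000_000_000                  # "order of billions" of variable picks
    delay_s = 1.0                          # hypothetical 1s regression per pick
    seconds_per_year = 60 * 60 * 24 * 365  # 31,536,000

    print(picks * delay_s / seconds_per_year)  # ~95 human years, in aggregate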

In the end it's about minimizing friction of an experience. Sometimes minimizing friction for one audience impacts another - in the case of Figma minimizing it for pro users increased the friction for casual users, but that's the nature of pro tools. Blender shouldn't try and adopt idiomatic patterns - it doesn't make sense for it, as it would negatively impact their core audience despite lowering friction for casual users. You have to look at net friction as a whole.


Barring an Internet giant suing them in court, it really feels like this is unlikely to change, as most people just don't understand the why or the effect.

Someone needs to write a heist movie set in Spain where a key part of the plan is they steal something while La Liga is blocking some key security route.


I've been using linux as a daily driver since the start of the year.

There's still a long way to go before things "just work". It's about equivalent to Windows right now in terms of frustrations, it's just that the frustrations are more along the lines of "this is a bit wonky" instead of "this is malicious / this was intended behavior". It's gotten a LOT better, don't get me wrong, but it's still far off from what a typical user would need.

I'd love to see either Valve or Nvidia really put effort into creating their own hardware/software integration on the level that Apple does. I think it'd go a long way to legitimizing it.


Thank you for saying something I've been saying for a while: Linux definitely has jank, but I'm not convinced it's more janky than Windows.

I think people are so used to Windows' awfulness that they kind of forget about how much bullshit is associated with it. Linux has bullshit too, though it's getting better, but when people talk about Linux jank they're always smuggling in an implication of Windows having less jank, which I don't concede at all.


I replaced my last Windows install a few years ago, and checking Windows 11 on a friend's PC a few weeks ago was a nightmare. I considered myself a power user back in the day, and I really struggled. So now I have perspective from the other end, and it fits the picture - Windows is also jank, it's just familiar jank for most people.

There is another point too. The trend with Linux is up, improving slowly over decades. For Windows it seems to be the reverse, and faster.


Ah, the age-old classic: go into the registry, change these 3 keys that seemingly have zero relation to the problem at hand, restart your machine TWICE, and then it's fixed.

Out of the box, most popular distros require less tweaking and hammering into shape than a Windows 11 install, and that is a very important "feature".


I don’t think it’s a question that Linux has more jank. I recently installed a fedora spin on a laptop that came with regular Fedora installed originally and the WiFi didn’t work. That’s some janky stuff right there.

I've had wifi drivers not work with fresh installs of Windows as well, so that's hardly a unique Linux thing. I've also had to reboot Windows into special modes because apparently a driver from a Broadcom WiFi card was "unsigned", so I had to disable the check for that.

I've also had registry corruptions, and I've had unprompted updates brick my hard drive, because Windows Update is a terrible piece of software. As far as I can tell, the Windows "repair tools" have never worked for any human in history, and neither has System Restore.

I've had updates in Linux break things, but never so thoroughly as the time my mom got an automatic update where she literally could not boot at all (because, I think, the automatic update to Windows 11 that she did not want or ask for screwed up the boot keys).


I haven't had a Windows driver not work in decades.

On the other hand, Linux doesn't try to copy my home directory to RedHat's cloud, or force some AI assistant that I don't want onto me.


As much as I am a NixOS user myself, I think regular users should be directed to use atomic, immutable distros (as is the case with most of the distros growing in popularity) because of the robust update system, along with the ease of rollback should something go wrong. Regular distros (it really comes down to the package manager of choice) are much more brittle, perhaps even worse than Windows Update.

Meanwhile I haven’t had a wireless issue on Linux since 2010 or so.

> fedora spin

Installing the equivalent of OS "slop" isn't Linux's fault... For better or worse, the choice afforded by OSS licenses means that many of those choices will be bad.


I've been using Linux on the desktop off-and-on for 20 years. I used OSX for a while from 2008 to 2015, when they clearly had the best hardware and the OS was pretty nice. I've been using KDE since then, and I recently installed Bazzite (Fedora+KDE-based) on my sans-Windows gaming PC. I also started a new job this year, where I have to use the company-provided MBP for compliance reasons, after having not used MacOS since 2015. So all this is pretty fresh in my mind, and I'll say that 2025+ KDE is by far the best out-of-box experience for power users. It mostly just works, and anything you want to tweak is easy to find in the settings. Setting up modern MacOS with things like more keyboard shortcuts for window management, focus-follows-mouse, or even remembering where windows were after waking up from sleep requires you to buy an app or pay a subscription.

Linux may break more often, but you can almost always fix it with a quick google search. If it doesn't do what you want, there's certainly a setting or config or free app you can install that does.

MacOS may break less often, but when it does you're mostly out of luck. It may do what you want more often, but if it doesn't you have to buy an app, if it's even possible at all.


> Linux may break more often, but you can almost always fix it with a quick google search.

And that’s where the problem is: a quick google search. Laughably trivial for technical users. Non-trivial for the majority of the population.

I love Linux and it is completely viable as a desktop operating system, but it’s far from ready for mainstream without better support.

For a rough analogy, I’d compare it to an old car before electronics. An old car is easy to work on and reliable if you do the maintenance. But an old car wouldn’t be reliable for somebody who doesn’t do any work on a car and outsources the maintenance.

Linux excels when things go right. The failure modes are substantially worse and far more likely to occur. It doesn’t matter if they’re rare. They’re not rare enough. And there isn’t support when things go wrong.

For example: It’s difficult to make the macOS UI fail to start through configuration. You never need to directly touch configuration. (And you can’t modify or delete macOS system files.)

With Linux, some normal problems just have to be solved in the terminal. This allows you to put the system into a configuration where the GUI does not start.


I have also been using Bazzite since March on my home desktop, and you are spot on. I think the main reason Linux is difficult for the average person these days is laptops with weird hardware configurations.

I use MacOS at work, and although it is miles better than Windows, if I had a choice I would also use Linux for work.


Me too. I was a 30-year Windows developer and electronics engineer, so I went pretty conservative with Kubuntu LTS, and it's been a pretty slick experience. Gemini has been great tech support for all the CLI stuff and for getting all of my weirder hardware projects interfaced (100% success rate to date). Just considering whether to delete my Windows partition to put my MP3s on, as realistically I'm not going to get any more Windows programming gigs.

Yeah, for example a bunch of my system updates began showing scary error notes because somehow there is a header inconsistency between the amdgpu driver and the kernel.

I'm not regretting my choice, but it's also something where the average user can't just call Linux Support and get a "run X and it'll fix it" solution.


One can call Windows support? And get help?

Arguably there are more support options for Windows because it's got fewer derivatives than Linux, and was historically more common on desktop.

If you’re using Fedora or Ubuntu, there may be some bumps.

Use Debian or AlmaLinux and the ride is smoother.


Do typical users care that much about a bit of jank, though? All the “typical users” I know are on spyware infested Windows laptops and just interpret the horrible shabbiness of the whole experience as being normal.

This is the saddest part - they actually think computers suck that much and don't know their lives could be a lot easier.

To add: it is jarring for me when I occasionally get to use someone's browser that does not have an ad blocker. It is indeed surprising what users have accepted as the norm.

Additionally, if you provide any service that offers image diffusion, you WILL get CSAM* being generated. Make sure you set up multiple layers to catch this. I built out Figma's safety pipeline and procedures for generated content. You'd be amazed what people try and make.

* Not going to debate whether or not AI imagery is CSAM here, but the point being you'll get users trying to generate AI images with subjects < 18yrs old.


I read the entire thing fwiw (pseudo-retired life helps with time here).

It looks like it was a collaborative effort across multiple teams, where each team (research, security, psychology, etc.) submitted ~10 pages or so. It doesn't feel like slop.


Did anything stand out across those 244 pages? Perhaps you have some of your take away thoughts written up somewhere?

Sorry very late reply to this, but ya. I posted here: https://x.com/pwnies/status/2041658034087457236

I'll copy the highlights here, but the tweets have imagery as well:

> The obvious hype - It crushes benchmarks across the board, and it does so with fewer tokens per task.

> Despite this, they don’t think it can self-improve on its own. There are still areas your average engineer does better with, and despite it accelerating tasks by 4x, that only translates to <2x increase in overall progress.

> They’re probably right to hold this back - its ability to exploit things is unprecedented. Any site running on an old stack right now or any traditional industry with outdated software should be terrified if this becomes accessible.

> Counterintuitively, while it’s the most dangerous model, it’s also the safest. They’ve also seen significant additional improvements in safety between their early versions of Mythos and the preview version.

> Anthropic does a really good job of documenting some of the rare dangerous behaviors the early models had.

> Interestingly, Mythos itself leaked a recent internal “code related artifact” on github.

> Mythos is also RUTHLESS in Vending Bench. Agent-as-a-CEO might be viable?

> The last thing: Mythos has emergent humor. One of the first models I’ve seen that’s witty. The examples are puns it came up with and witty slack responses it had when operating as a bot.


AI writing has stopped feeling like slop around Opus 4.5, though.

This is obviously in response to Mythos, but I'll actually defend their statement at that time - they were right to take a pause.

Think about how much things have changed in our industry since GPT-2 dropped - it WAS that dangerous, not in itself, but because it was the first that really signaled a change in the field of play. GPT-2 was where the capabilities of these models were really proven; up until that point it was a neat research project.

Mythos is similar. It's showing things we haven't seen before. I read the full 250-page whitepaper today (joys of being pseudo-retired, had the hours to do it), and I was blown away. Its capabilities for hacking are unparalleled, but more importantly they've shown that they've made significant improvements in safety for this model just in the last month, and taking more time to make sure it doesn't negatively affect society is a net positive.


Gates Law

A great first step. I'd love to see a sin tax associated with this as well - i.e., for adverts that do run, they should have to pay a % of the ad fee to the government.

I don't think people understand just how ingrained in the culture gambling is in Australia. Among the primary third spaces for people in Australia are RSLs, which are technically clubs for veterans to get co-op-like services, but have evolved into a third space for everyone, offering food, alcohol, entertainment, and of course, sports gambling and "pokies" (poker/slot machines).


As a West Australian this is so interesting to me, because gambling culture is extremely niche here - but WA law is that pokies are only allowed at the casino, nowhere else. And thank fuck for that.

That's one of the myths the gambling dens propagate: that they are there for the veterans. There is no technicality about it.

https://www.rslaustralia.org/rsl-sub-branches-and-rsl-clubs-...

The "RSL sub-branch" is a not-for-profit welfare organisation, that looks after veterans. For the most part they are small and if they are lucky they get the use of a meeting room in the RSL club.

The "RSL Club" is a multimillion dollar commercial enterprise that looks after its own interests, conducts political lobbying, makes millions of dollars off gambling addicts and hands out token grants in the community to give the impression that they are there to benefit the community. Typically nothing to do with the RSL sub-branch.


What % of a speedup should I be expecting vs just running this with the standard PyTorch approach?

Technically not in this case, or not effectively. The 0s and 1s are multiplied by an FP16 scaling factor shared by each group of 128 bits. The scale fluctuates between groups.

1 bit with a FP16 scale factor every 128 bits. Fascinating that this works so well.
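In rough code, the dequantization looks something like this (a minimal NumPy sketch of my reading of the scheme; the {0,1} to {-1,+1} mapping and packing order are assumptions, not confirmed details):

    import numpy as np

    def dequantize(bits, scales, group_size=128):
        # bits: flat {0, 1} array, length a multiple of group_size
        # scales: one FP16 value per group of 128 bits
        groups = bits.reshape(-1, group_size).astype(np.float16)
        signs = groups * 2 - 1           # map {0, 1} -> {-1, +1}
        return signs * scales[:, None]   # each group shares its own scale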

I tried a few things with it. Got it driving Cursor, which in itself was impressive - it handled some tool usage. Via Cursor I had it generate a few web page tests.

On a Monte Carlo simulation of pi, it got the logic correct but failed to build an interface to start the test. Requesting changes mostly worked, but it left behind some stray symbols which caused things to fail. Required a bit of manual editing.
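For reference, the core task is tiny - a standard Monte Carlo estimate. My own reference version (not the model's output):

    import random

    def estimate_pi(samples=1_000_000):
        # Fraction of random points in the unit square that land inside
        # the quarter circle approximates pi / 4.
        inside = sum(random.random() ** 2 + random.random() ** 2 <= 1.0
                     for _ in range(samples))
        return 4 * inside / samples

    print(estimate_pi())  # ~3.14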

Tried a Simon Willison pelican as well - very abstract, not recognizable at all as a bird or a bicycle.

Pictures of the results here: https://x.com/pwnies/status/2039122871604441213

There doesn't seem to be a demo link on their webpage, so here's llama.cpp running on my local desktop if people want to try it out. I'll keep this running for a couple of hours past this post: https://unfarmable-overaffirmatively-euclid.ngrok-free.dev


Thanks for sharing the link to your instance. It was blazing fast in responding. Tried throwing a few things at it, with the following results:

1. Generating an R script to take a city and country name, find its lat/long, and map it using ggmaps. Generated a pretty decent script (could be more optimal, but impressive for the model size) with warnings about using geojson if possible.

2. Generating a LaTeX script to display the Gaussian integral equation. It generated a (I think) non-standard version using probability distribution functions instead of the general version, but I still give it points for that. It gave explanations of the formula and parameters, as well as instructions on how to compile the script using bash, etc.

3. Generating a LaTeX script to display the Euler identity equation - this one it nailed.

Strongly agree that the knowledge density is impressive for a 1-bit model with such a small size and blazing-fast responses.


> Was blazing fast in responding.

I should note this is running on an RTX 6000 pro, so it's probably at the max speed you'll get for "consumer" hardware.


consumer hardware?

That... pft. Nevermind, I'm just jealous


Look it was my present to myself after the Figma IPO (worked there 5 years). If you want to feel less jealous, look at the stock price since then.

Well in this context it's a 5090 with extra unused memory.

Holy hell ... that's a monster of a card

I must add that I also tried out the standard "should I walk or drive to the carwash 100 meters away for washing the car" and it made the usual error of suggesting a walk, given the distance and health reasons etc. But then this does not claim to be a reasoning model and I did not expect, in the remotest case, for this to be answered correctly. Even previous-generation, larger reasoning models struggle with this.

I ran it through a rudimentary thinking harness, and it still failed, fwiw:

    The question is about the best mode of transportation to a car wash located 100 meters away. Since the user is asking for a recommendation, it's important to consider practical factors like distance, time, and convenience.

    Walking is the most convenient and eco-friendly option, especially if the car wash is within a short distance. It avoids the need for any transportation and is ideal for quick errands.
    Driving is also an option, but it involves the time and effort of starting and stopping the car, parking, and navigating to the location.
    Given the proximity of the car wash (100 meters), walking is the most practical and efficient choice. If the user has a preference or if the distance is longer, they can adjust accordingly.
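The harness itself was nothing fancy, roughly this shape (a sketch assuming llama-server's OpenAI-compatible endpoint; the port and prompt wording here are illustrative):

    import requests

    URL = "http://localhost:8080/v1/chat/completions"

    def ask(prompt):
        # Single turn against the local llama-server chat endpoint
        r = requests.post(URL, json={
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.2,
        })
        return r.json()["choices"][0]["message"]["content"]

    def answer_with_thinking(question):
        # Pass 1: free-form reasoning; pass 2: final answer conditioned on it
        thoughts = ask(f"Think step by step about: {question}")
        return ask(f"{question}\n\nNotes to consider:\n{thoughts}\n\nFinal answer:")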

And to be fair, you asked about traveling to a location. It just so happens that location is a car wash. You didn't say anything about wanting to wash the car; that's an inference on your part. A reasonable inference based on human experience, sure, but still an inference. You could just as easily want to go to the car wash because that's where you work, or you are meeting somebody there.

Honestly, the fact that we have models that can coherently reason about this problem at all is a technological miracle. And to have it runnable in a 1.15GB memory footprint? Is insanity.

Exactly. It's not that the pig dances poorly, or that the dog's stock tips never seem to pan out. It's the fact that it's happening at all.

But the fact that we have convinced a pig to dance, and trained a dog to provide stock tips? That can be improved upon over time. We've gotten here, haven't we? It really is a miracle, and I'll stick to that opinion.

Here's the Google Colab link, https://colab.research.google.com/drive/1EzyAaQ2nwDv_1X0jaC5... since the ngrok link likely got DDoSed by the number of individuals coming along.

Thanks, that works. I only tested the 1.7B. It has that original GPT-3 feel to it. Hallucinates like crazy when it doesn't know something. For something that will fit on a GTX 1080, though, it's solid.

We're only a couple of years into optimization tech for LLMs. How many other optimizations are we yet to find? Just how small can you make a working LLM that doesn't emit nonsense? With the right math could we have been running LLMs in the 1990s?


I think that not only could we have had them, we should have.

As far as I understand, neural networks were very hyped in the 60s and 70s, and when the hype went bust, they fell out of focus. Hardware was not there yet.

Then they were neglected for many years, and truly pioneering science was apparently only done by Google. Theoretical breakthroughs came in the 2010s; after GPT-2, mass attention caught up and we (over)focused on neural networks again. GPT-2 was way below the capabilities of the hardware of its time; we quickly caught up and now we're optimising.

Had it not been for the burst of the previous hype bubble, NNs wouldn't have been essentially forgotten, and we'd have had a steady stream of optimisations and improvements while using the maximum of the available hardware.

Something like a voice translation model running locally should have been possible by the end of the 1990s. That way we'd have had a steady increase in LLM capabilities, no hype, and time to adapt and understand how to properly use them, with no disruption.


Good call. Right now though traffic is low (1 req per min). With the speed of completion I should be able to handle ~100x that, but if the ngrok link doesn't work, defo use the Google Colab link.

The link didn't work for me personally, but that may be a bandwidth issue with me fighting for a connection in the EU

As someone whose brain was addled by exposure to art history, I strongly support the suggested pelican on bicycle.

Thanks. Did you need to use Prism's llama.cpp fork to run this?

Yep.

Could you elaborate on what you did to get it working? I built it from source, but couldn't get it (the 4B model) to produce coherent English.

Sample output below (the model's response to "hi" in the forked llama-cli):

X ( Altern as the from (.. Each. ( the or,./, and, can the Altern for few the as ( (. . ( the You theb,’s, Switch, You entire as other, You can the similar is the, can the You other on, and. Altern. . That, on, and similar, and, similar,, and, or in


I have an older M1 Air with 8GB, but I'm still getting over 23 t/s on the 4B model... and the quality of outputs is on par with top models of similar size.

1. Clone their forked repo: `git clone https://github.com/PrismML-Eng/llama.cpp.git`

2. Then (assuming you already have Xcode build tools installed):

  cd llama.cpp
  cmake -B build -DGGML_METAL=ON
  cmake --build build --config Release -j$(sysctl -n hw.logicalcpu)
3. Finally, run it with (you can adjust arguments):

  ./build/bin/llama-server -m ~/Downloads/Bonsai-8B.gguf --port 80 --host 0.0.0.0 --ctx-size 0 --parallel 4 --flash-attn on --no-perf --log-colors on --api-key some_api_key_string
Model was first downloaded from: https://huggingface.co/prism-ml/Bonsai-8B-gguf/tree/main
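Once it's up, a quick sanity check (a sketch against llama-server's OpenAI-compatible chat endpoint; the port and API key match the flags above):

    import requests

    resp = requests.post(
        "http://localhost:80/v1/chat/completions",
        headers={"Authorization": "Bearer some_api_key_string"},  # matches --api-key
        json={"messages": [{"role": "user", "content": "hi"}]},
    )
    print(resp.json()["choices"][0]["message"]["content"])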

To the author: why is this taking 4.56GB? I was expecting this to be under 1GB for a 4B model. https://ibb.co/CprTGZ1c

And this is while I'm serving zero prompts... just loaded the model (using llama-server).


I did this: https://image.non.io/2093de83-97f6-43e1-a95e-3667b6d89b3f.we...

Literally just downloaded the model into a folder, opened Cursor in that folder, and told it to get it running.

Prompt: The gguf for bonsai 8b are in this local project. Get it up and running so I can chat with it. I don't care through what interface. Just get things going quickly. Run it locally - I have plenty of vram. https://huggingface.co/prism-ml/Bonsai-8B-gguf/tree/main

I had to ask it to increase the context window size to 64k, but other than that it got it running just fine. After that I just told ngrok the port I was serving it on and voila.


It reminds me of very early ChatGPT, with mostly correct answers but some nonsense. Given its speed, it might be interesting to run it through a "thinking" phase where it double-checks its answers, and/or use search grounding, which would make it significantly more useful.

The speed is impressive. I wish it could be set up for something similar to speculative decoding.

man, that is really really quick. What is your desktop setup??? GPU?

It is fast, but I do have good hardware. A few people have asked for my local inference build, so I have an existing guide that mirrors my setup: https://non.io/Local-inference-build

Thanks, I tested it; it failed the strawberry test. Qwen 3.5 0.8B, at a similar size, passes it and is far more usable.

Does asking it to think step by step, or character by character, improve the answer? It might be tokenization + unawareness of its own tokenization shortcomings.

No, it did not; with character by character it concluded 2 :-)

I hope you are kidding - how is that a test of any capabilities? It's a miracle that any model can learn strawberry, because it cannot see the actual characters, and ALSO it's likely misspelled a lot in the corpus. I've been playing with this model and I'm pleasantly surprised; it certainly knows a lot, quite a lot, for 1.1G.

Interesting. Qwen 3.5 0.8B failed the test for me.

wow that was cooler than I expected, curious to embed this for some lightweight semantic workflows now
