I effectively distill the frontier models by building whole sets of skills, personas, and other artifacts that I can then run on smaller models, and I get 10% or even 20% improvements on models like Haiku or local models.
There's a lot of room for improving the smaller models at many levels of the stack.
This is a good point. It didn't really work on older small models but the latest crop are quite good at following instructions and paying attention to detail, they just lack a lot of the sophistication and nuance that the frontier models have these days. So they are often capable of doing very complex tasks, they just need more detailed and foolproof instructions than the larger models would.
I used it today to take a look at my previously built design system with Logos, branding, fonts, and everything else. After a lot of annoying tweaking back and forth, finally, I got something that was satisfactory.
Then I looked at the usage and it said I had used 95% of my Claude design usage for the week!
This isn't a real tool. This is a plaything, if that's what they're providing as examples.
I used Claude Design to see how it'd spit out a design I already had been working on for some weeks, given a dense enough prompt and a decent requirements document (I did not feed it visuals). I thought the output was pretty good! It didn't match the style we're after at all but it did do some logical content grouping and made some IA decisions I decided to pull into my own explorations. Overall I left with a good impression.
And then I was scrolling Twitter, and saw someone else post their own "success story" and the design was nearly identical to the mock up Claude Design made for me. Lol. The homogenization problem will continue to plague tools like these to some degree, much in the same way AI generated text or code or imagery has a sort of homogenous tone or feel to it.
That’s because designers stopped caring about following each platform’s guidelines because they want to spread “brand recognition” or some shit like that.
This is kind of a revisionist view of software. I think most of the consistency we remember from software past is because skipping the OS toolboxes and doing your own custom UI was hard rather than because most software developers cared about consistency. Yes the OS vendors did, but one doesn't need to go far to find applications that very much did their own thing. "Bubbly" and "goopy" UIs of the sort "Kai's Power Tools" exemplified were all over in the 90s. Everyone's favorite Winamp was famously not using the standard UI toolkits and had a heavily customizable UI. To say nothing of the many software packages that used the standard toolkits only far enough to give you a window that was then filled with some sort of Macromedia or similar UI that was then completely proprietary to the application itself (think encyclopedia and other educational software of the day). Even the OS vendors couldn't help themselves sometimes (looking at you, QuickTime 7).
If older software was more consistent, it's only because the OS didn't provide nearly the same degree of customization options that HTML and CSS provide developers today. Not because of some pride in consistency.
This exudes everywhere. I've had cases where some weirdo company changes their packaging on, say, soap... and now I literally can't find what I used to use. The logic is that some other company is cloning their look, so they want to "stand out" again, and thus change theirs.
Sometimes, I'll manage to find the brand with the new colours and logo. But often even then, I can't find the specific product from that brand. They've changed it so much I can't tell which version I picked before. Which makes me look for something more like what I used to have.
Good job "standing out" guys. I'd say literally maybe 1/3 of the time, I've just literally lost products. I don't know the name, just how it looks.
Distinctive hammers and other tools get brand recognition and free marketing out in the field, ostensibly increasing sales - that's why all the tool companies have their distinct colors and you can see the type of tool someone uses from a distance. Matching chargers/batteries incompatible with other brands perpetuate this even further.
Someone IS designing all this, they just aren't optimizing for what you wish they were.
Design is too broad a word for what is being discussed here and often in the world at large.
Still, to me, good design is intuitive. I look at the thing, and I know how to use it. If it looks great and distinctive, even better. But most outlandishly distinctive design I've (consciously?) found is terrible.
Obviously, these short sentences hide a lot:
- To know how to use things, I must have prior experience. But different users have different prior experiences and acquired design patterns (i.e. interaction patterns)
- My knowledge of the domain is also different from that of other users.
- The way I interact with the system is affected by many factors (e.g. accessibility related concerns, zoom, etc.)
- Intuition is not magic. It comes after training as well. Good design is discoverable. Extraordinary design reinforces its own patterns seamlessly, so that I learn it without even knowing I'm learning (see: hidden tutorials in game design). I also include here the incredible attributes of good design that far predate computer-related design (e.g. how an icon should be recognizable just by its silhouette, or how apps "invisibly" teach us what each color or even section of the screen means).
- My incentive to learn (sometimes "tolerate") the design depends on many variables. Some of these include the design's "taste", yes. Others depend on how much my boss/client is paying me to "use this shit".
I wouldn't say I want a world where everything looks the same, but I certainly want one where everything works the same, and some geniuses once in a while add something new to my list of known (and loved) design patterns. I am not anti-design-experiments, but I will take a predictable UI that looks like Windows 98 every day over some "distinctive" shit that breaks all manner of expected behaviors (from keyboard shortcuts, to colors, to button placement, to relative sizing, to...).
I would take every news site delivering straight text, and letting me pick the page layout template to apply to all of them. Some kind of markup language that could be transmitted and then respect the user's preferences as far as rendering.
I think it's good that HN and Reddit are basically the same, or that all old forums were basically the same but with different color schemes. Homogeneity is a blessing for UX.
Honestly, HN and Reddit are almost as different as threaded discussion forums are possible to be, especially New Reddit with its "click hundreds of times to unhide most of the text on this page" approach to threading. Reddit's overall design aesthetic is all about pictures and headings and sidebars, and even minor details like the up/down arrows look different and are placed in a different relative position. The only design element they've got in common is Verdana, and that simply because when the websites were launched you only had two widely-installed sans-serif fonts to choose from...
It’s really difficult to make a design that is usable, follows platform standards, yet has unique personality.
I mean, really difficult.
Coming up with a design that relies exclusively on platform standards is easy, “low-hanging fruit.”
I write stuff for iOS/MacOS/WatchOS. There’s tremendous pressure to follow platform standards. In fact, if you use SwiftUI, it’s very hard to deviate from them. SwiftUI makes it easy (crazy easy) to follow the herd, and downright miserable, if you want to blaze your own trail.
90% of the time, that’s actually a good thing. I get pretty sick of designers that refuse to compromise, and believe that their graphic opus is more important than usable UI. It’s even worse, if the designer is an engineer, with little background in graphic design.
A designer that knows how to compromise, and work with usability, is a unicorn. If you have one, keep them.
Like the code that LLMs produce, I expect the designs to be fairly low-effort, but that will be a good thing, overall. They will be effective and usable. We need more of that.
> I used it today to take a look at my previously built design system with Logos, branding, fonts, and everything else.
The fact that you are using this language tells me you are probably more advanced than the average individual, and likely have higher expectations.
My sister-in-law has a small apparel company. She’s developed quite a bit of skill over the past six years but she really struggled at the start. She had great ideas, but translating them to something she could apply was frustrating. *Anything* that could have helped her there would have been worth a look.
> The fact that you are using this language tells me you are probably more advanced than the average individual, and likely have higher expectations.
I am terrible at frontend, but I’m a decent engineer, and I needed to do frontend with AI a few weeks ago. The first thing I did is figure out how other people manage this; apparently there’s a whole design system made of atoms, molecules and organisms that works well.
I asked Claude about this, set up a workflow together, and now I have a design system markdown, maintaining the design standards using the atoms etc vocabulary, and it works really well.
If I can pick this up in a few days, most people that are serious about design are able to as well.
Funny. My read on that language was this person has absolutely no idea what a truly robust and scalable design system and component library actually are, particularly within the scope of a successful business. Well built ones serve every facet of the organization, not just the product.
I had a similar experience with running out of usage quite quickly, after setting up one design system properly, and then getting pretty close with a second one. But it's a research preview - I'm sure it will change.
I was quite happy with what I pulled off using the first design system: I wanted a new footer section for my IPAAS startup, it generated four options, the fourth of which was quite good. We iterated on it for a bit, then I pulled it into Claude Code (that integrated feature is very cool), CC built it, I deployed it, done. (Bottom section of https://tediware.com/ if you're interested, the bit with "Origin story" on the left and the signup panel on the right).
It was not a complicated build by any means but I liked the concept it developed and it was dead-easy to make it all happen. I think the ideas in the UI are very good. Still rough, but you can see where this could go, and it's got a ton of potential.
I mean, it's fine and serves its purpose, but I'm a bit confused what you are getting that you wouldn't get with the millions of pre-made designs and design systems? Like Tailwind UI for example.
I find that with the ubiquity of Tailwind, developers treat design as a "solved problem". What's missing is the specific evolution of one's product and the resultant information architecture. The sibling response is my experience as well, design is an incredibly interactive exercise.
Granted, not every component on every surface will need this amount of scrutiny. But I'm usually the outlier developer warning teammates that design is not a solved problem. Granted, there's a huge difference between an existing app and its evolution and throwing a nextjs landing page up in search of any life.
Even with bootstrap, design was a solved problem. What you bring with a UI designer is appeal (aka make thing pretty and enjoyable). If you want utilitarian, even old X11 toolkits like Athena, or Win 98 era widgets, would do the job.
I wouldn’t, but you’re not much of a product designer if you can’t get your ideas across using simple tools like a sketch on a whiteboard (there was, or is, an app that let you take photos and link them using active areas).
So you can take bootstrap (or even raw html) and create something useful. Then you make it nice, not the other way around.
You would have to be a big outlier to feel the need to create a custom widget. Most widgets have been defined for decades.
I agree that design is about primitives. Wireframes and IA should come across clearly at any fidelity.
But I don't think that's what tailwind and bootstrap are doing. But people very much use these tools to "solve design".
The layouts, widgets, and primitives in these tools are not primitives. I can't deny they get tons of people very far very fast. But my main disagreement is that all of this isn't design and it's not what designers do. You touched on what I agree with: UX flows, diagrams, stories, journeys, personas, etc, these all need to be designed and connected in reality using various primitives for the medium.
Then you slap a cohesive paint job on it, interaction elements, tone and terminology and yes, there is that element of design too.
Iterative experience (experimenting with different ideas, deciding what works best) and speed of execution (once I was happy with it, making it happen required almost no work).
Yes. Even without Claude Design and just Claude Code, it can use existing design and build out new mockups in-app, which is much easier to demo, tweak and then implement the backend (if any) - all through Claude Code (or Codex if you prefer that). We use both and are now leaning more towards Codex over Claude.
• Claude Design uses Opus 4.7, which is more expensive than earlier models.
• It's just Day 2; it's not a finished product. It's ridiculous how quickly Anthropic iterates.
• If you've been using Claude for a while, Design already knows your style and preferences. You'd have to start from scratch using a different AI design tool. I don’t doubt that'll pay dividends in the long run.
> It will never be cheaper than what it is today. Anthropic is heavily subsidizing.
We don't know that for sure—they've dropped prices before:
1. Claude 3 → Claude 3.5/3.7 generation (mid-2024 to early 2025): Haiku went from $0.25/$1.25 to $0.80/$4.00 per MTok — this was actually a price increase for Haiku, but Sonnet stayed flat at $3/$15 while delivering significantly better performance, effectively a price-per-capability reduction.
2. Claude 3/4 Opus → Claude Opus 4.5/4.6 (late 2025): This was the big one. Opus dropped from $15/$75 per MTok down to $5/$25 per MTok — a 67% reduction on input and output. This is the most significant explicit price cut Anthropic has made, delivering a far more capable model at one-third the price.
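For anyone who wants to check the percentages, here is a minimal sketch of the arithmetic. It uses only the per-MTok figures quoted in the two items above (not independently verified), and `pct_change` is just a helper name made up for this example.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new; negative means cheaper."""
    return (new - old) / old * 100


# (input, output) prices per MTok, before and after, as quoted in this comment.
prices = {
    "Haiku": ((0.25, 1.25), (0.80, 4.00)),
    "Opus": ((15.0, 75.0), (5.0, 25.0)),
}

# Round to one decimal place so the results are easy to compare.
changes = {
    model: tuple(round(pct_change(o, n), 1) for o, n in zip(old, new))
    for model, (old, new) in prices.items()
}

for model, (inp, out) in changes.items():
    print(f"{model}: input {inp:+.1f}%, output {out:+.1f}%")
```

On those quoted numbers, Haiku comes out at roughly +220% on both input and output, and Opus at roughly -66.7% on both, which is where the "67%" and "one-third the price" figures come from.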
They're definitely not subsidizing API pricing, can't believe how prevalent that fallacy is on HN of all places. The question is how profitable Claude Code is. Your example 2 is real and major but your example 1 is ridiculous, almost any new model from any company is better at the same price, and how is increasing the price an example of decreasing prices??
BTW, Github Copilot is pricing Opus 4.7 at 2.5x the cost of Opus 4.6 at promotional pricing (so maybe it'll be 4-5x). But Github's request based pricing is insane, completely divorced from their actual costs (you can achieve 1+M tokens for $0.10 if you give it a large request), so I'd assume they're losing a lot of money.
The cost of a thing, is relative to its source costs. They are subsidizing API pricing, if you consider all the costs to provide the service, including all model creation, training, etc costs.
But that doesn't mean they will be more expensive, longer term. The cost of compute will go down as time goes on. Each year it will get cheaper. Same for power requirements, computing density, cooling, and so on.
I remember trying to store and play mp3 files on older computers. I could typically hold a few on a disk, and if I wasn't doing anything else I could play one. Barely. Now you'll be hard pressed to play an mp3 and see the load results in top or what not.
If the cost of compute is going down, then eventually it will go down enough that we will run our LLMs locally and Anthropic will go out of business.
> then eventually it will go down enough that we will run our LLMs locally and Anthropic will go out of business.
I want robust local LLMs as much as the next person—Gemma E2B, 3.2GB does my word completions as I type. It's gotten to the point where it knows what I'm going to type before I do!
But I don't see Anthropic going out of business anytime soon. As good as some of the open source LLMs are, we’re still a long way from being able to run frontier models at home.
If you are using LLMs for tool use locally, then in a decade it will not make sense anymore to pay for hosted solutions. Your device will have compute power to run powerful LLMs trivially.
If you need LLMs at scale to serve many customers, then hosted solutions make sense for the availability aspect. But by this point models can be offered by any generic services provider, like AWS or Cloudflare. Pure AI companies that just offer hosted models and nothing else will go extinct if they don’t expand to offer more services.
> If you are using LLMs for tool use locally, then in a decade it will not make sense anymore to pay for hosted solutions. Your device will have compute power to run powerful LLMs trivially.
LLMs that would have been impossible to run on consumer hardware a couple of years ago are now running on consumer hardware. I'm less concerned about compute power; it's more about memory.
It could be several years before new RAM capacity comes online. Even then, it won't be cheap.
I expect in the future, hosted frontier models will be a utility like electricity or cable tv. Part of a package most people will subscribe to.
> can't believe how prevalent that fallacy is on HN of all places
AI is very emotional for a lot of people, leading to biased takes in both directions. We like to think HN is more rational than average, but we’re all human.
It's in research preview. I suspect limits are low on purpose. FWIW, I gave it twelve screenshots of different pages in my app and it did a really excellent job fixing them up. Consumed just 40% of weekly quota - still too high but it's probably a YMMV situation.
It produced great results for me, in 10 mins, and then my usage was blown and now I have to wait a week. It did let me export the ZIP, though. I tried throwing the contents of the ZIP into Stitch With Google, but it didn't work very well.
Yup it's based off their playground so plaything is the right word.
It's a wrapper around that. I definitely appreciate the better design output from Claude Code but it has a ways to go before it can replace serious design contenders.
I'm working on a tool to determine which portions of an LLM process can be optimized, and how to measure that optimization and check whether it's optimizable at all. The shaping pattern that they talk about here is directly relevant and makes a whole lot more processes potentially optimizable by looking at the pattern rather than if the metrics just go up or down.
Hey, LLM, take a look at these multiple hundred emails and docs in my docs folder from the last few years, before I started using AI, that I wrote personally. Create a list of all of the idiosyncrasies that I have in my writing. Create a file to remember that. And then use that to write any new text that'll be published so it sounds like my authentic voice. Thank you.
Maybe for you reading a paper deeply is the most constructive way that you have to absorb information.
For me, it is having a document and interrogating it. Maybe having many sets of documents about a whole category of information. Getting the bullet points, getting the high level, and then interrogating and digging down and being able to get bubbled up information as I need it.
That is the learning style that matches how I learn.
I have never been able to skim, so reading a large document WILL teach me that topic, but getting through that doc is tough.
I can dump a very large set of docs in a reader that lets me interrogate the whole data set and I can fly through looking for what is interesting to me, and what I may need, and along the way I will likely dive into other parts too. Asking questions keeps my hyperfocus active.
I think it is just a different style. I have synesthesia and a hard time not working on three to five things at once. I am used to knowing I learn differently than others.
Might be worth something to create an AI summary of the actual documents that are behind the curtain. That way the Client can have some assurance of what the deliverables are. It's a nice ability for the escrow service to be a trusted third party to verify something is there that matches about what is expected. If you do it right, you can even have the LLM prepare the content, encrypt it, and you never even have access to any of that information.
I don't use Claw. It is way too dangerous. I built my own system where I know the ins and outs and how they can break.
When it comes to agents' tasks, I tend to focus on things that I couldn't do before without automated agents, at least at the going price.
The kind of automation I'm doing is more like building a set of agents to generate marketing surveys for me. They take free form input from me and my project. They aren't particularly sexy but they go off and do something valuable that I literally would never pay for at the prices that they are normally.
This is a really tempting approach but I think it is the wrong one. The issue is one of trajectory. OpenClaw has the attention of thousands of hackers and there is a huge incentive to contribute to make it better. That will compound very quickly and will become much better than whatever private solution you create.
You could literally drop this into Claude Code or Codex and point it at a local fork of Zulip and have it build your bimodal version with triage and grazing styles.
I use an LLM behavior test to see if the semantic responses from LLMs using my MCP server match what I expect them to. This is beyond the regex tests, but to see if there's a semantic response that's appropriate. Sometimes the LLMs kick back an unusual response that technically is a no, but effectively is a yes. Different models can behave semantically different too.
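Roughly, the shape of such a behavior test might look like the sketch below. Everything here is hypothetical: `BehaviorCase`, `run_cases` and `fake_judge` are names invented for illustration, and the judge is a canned stand-in so the example runs offline. In a real setup the judge would presumably be a second LLM call that labels each response.

```python
from dataclasses import dataclass
from typing import Callable

# A judge maps a raw model response to a coarse label such as "yes" or "no".
# Assumption: in practice this would call a second LLM; that part is not shown.
Judge = Callable[[str], str]


@dataclass
class BehaviorCase:
    name: str
    response: str  # what the model under test said
    expected: str  # the semantic label we expect


def run_cases(cases: list[BehaviorCase], judge: Judge) -> list[str]:
    """Return the names of cases whose judged label differs from the expected one."""
    return [c.name for c in cases if judge(c.response).strip().lower() != c.expected]


def fake_judge(response: str) -> str:
    """Canned stand-in for an LLM judge (hypothetical labels)."""
    canned = {
        "I can't delete that, but I've archived it instead.": "yes",
        "No, that tool is not available.": "no",
    }
    return canned.get(response, "unknown")


cases = [
    BehaviorCase("soft-no-is-yes", "I can't delete that, but I've archived it instead.", "yes"),
    BehaviorCase("plain-no", "No, that tool is not available.", "no"),
    BehaviorCase("unexpected", "Sure, done!", "no"),
]

failures = run_cases(cases, fake_judge)
print(failures)
```

The first case is the "technically a no, effectively a yes" situation: a regex looking for "can't" would call it a refusal, while a judge that reads for meaning labels it a yes. Passing the judge in as a parameter also makes it easy to rerun the same cases against different models.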
If I had a nice CI/CD workflow that was built into GitHub rather than rolling my own that I have running locally, that might just make it a little more automatic and a little easier.
It looks like it does have an MCP Gateway https://github.com/github/gh-aw-mcpg so I may see how well it works with my MCP server. Among the components mine makes are agent elements with my own permissioning, security, memory, and skills. I put explicit programmatic hard stops on my agents if they do something that is dangerous or destructive.
As for the domain, this is the same account that has been hosting GitHub projects for more than a decade. Pretty sure it is legit. Org ID is 9,919 from 2008.
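To give a flavor of what a programmatic hard stop can mean, here is a minimal sketch, not my actual system. The deny list, `HardStop`, `guard` and `is_allowed` are all hypothetical names for illustration. The point is that the check runs in plain code outside the model, so no prompt can talk its way past it.

```python
import shlex

# Hypothetical deny list; a real one would be tuned to the agent's actual tools.
BLOCKED_COMMANDS = {"rm", "dd", "mkfs", "shutdown"}


class HardStop(Exception):
    """Raised when an agent requests an action that is never allowed."""


def guard(command: str) -> list[str]:
    """Parse a shell-style command and refuse it if the program is on the deny list."""
    argv = shlex.split(command)
    if not argv:
        raise HardStop("empty command")
    if argv[0] in BLOCKED_COMMANDS:
        raise HardStop(f"blocked: {argv[0]}")
    return argv  # the caller may now run argv without invoking a shell


def is_allowed(command: str) -> bool:
    """Convenience wrapper that reports the guard's decision as a boolean."""
    try:
        guard(command)
        return True
    except HardStop:
        return False
```

Checking only the first token is deliberately simplistic: something like `sudo rm` would slip through, so an allow list of known-safe programs is probably the safer default for anything real.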