Hacker News | pdyc's comments

What bug, and what does it affect?

it's a prompt cache invalidation bug that causes all input to be reprocessed instead of getting preloaded
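
To make the cost of that kind of bug concrete, here is a toy sketch of prefix reuse. It is not llama.cpp's actual implementation. The function names and token IDs are made up for illustration. The idea is only that a server can skip the tokens a new prompt shares with the cached one, and that a mismatch near the start forces nearly everything to be reprocessed.

```python
from typing import Sequence


def common_prefix_len(cached: Sequence[int], new: Sequence[int]) -> int:
    """Count the leading tokens shared by the cached prompt and the new prompt."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def tokens_to_reprocess(cached: Sequence[int], new: Sequence[int]) -> int:
    """Tokens that cannot be served from the cache and must be processed again."""
    return len(new) - common_prefix_len(cached, new)


# Hypothetical token IDs, for illustration only.
cached = [1, 2, 3, 4, 5]
follow_up = [1, 2, 3, 4, 5, 6, 7]   # same conversation, two new tokens
invalidated = [9, 2, 3, 4, 5, 6, 7]  # first token differs, so the cache is useless

print(tokens_to_reprocess(cached, follow_up))    # 2
print(tokens_to_reprocess(cached, invalidated))  # 7
```

With prompt processing at 50-150 t/s, the difference between reprocessing two tokens and reprocessing a whole long context is what makes such a bug noticeable.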

There are other reasons to prefer vLLM over llama.cpp as well.


yes

harness - pi+custom extension for subagents

model - qwen3.6 35ba3b q4km

hardware - intel arrow lake with 32gb ram

server - llama.cpp vulkan

performance - 15-18 t/s generation, 50-150 t/s prompt processing (pp)

Planning and task creation still use Claude/GPT, but they don't touch the code. All coding is done with this setup.

An example of a project made with this setup: easyanalytica.com. It's of medium complexity.


I am still working on the easyanalytica tool to auto-generate dashboards without AI. I recently added a comparison feature, and figuring that out was fun. There are a lot of interesting ideas on the execution side, but for the end user it's a simple product: just give it data and see the dashboard.

html snippet playground - for testing HTML/React snippets

token speed calculator - for estimating an AI model's tg/s from RAM speed and model size/params. This helps in comparing different hardware and estimating the speeds I am likely to get on a given machine.

prompt assembler - to create a prompt and context once and reuse them in different AIs, picking and choosing the context in a prompt, creating agent.md, etc.

dashboard builder - for viewing GSC, GA, and Stripe data in one place
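
For the token speed calculator mentioned above, a common rule of thumb (not necessarily the formula the tool itself uses) is that generation is memory-bandwidth bound: each generated token requires reading roughly all active weights once, so tg/s is at most bandwidth divided by the bytes of active weights. The sketch below assumes dual-channel DDR5-5600, 3B active parameters, and about 4.5 bits per weight. All three numbers are illustrative assumptions, not measurements from this thread.

```python
def memory_bandwidth_gbps(mt_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: transfers/s * bytes per transfer * channels."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate bytes occupied by the weights at a given quantization."""
    return params_billion * 1e9 * bits_per_weight / 8


def estimate_tg_per_s(bandwidth_gbps: float, active_params_billion: float,
                      bits_per_weight: float) -> float:
    """Upper bound on tokens/s if every active weight is read once per token."""
    return bandwidth_gbps * 1e9 / weight_bytes(active_params_billion, bits_per_weight)


# Assumed hardware and model figures (hypothetical, for illustration only).
bandwidth = memory_bandwidth_gbps(5600)           # dual-channel DDR5-5600
upper_bound = estimate_tg_per_s(bandwidth, 3.0, 4.5)

print(f"{bandwidth:.1f} GB/s")      # 89.6 GB/s
print(f"{upper_bound:.1f} tok/s")   # about 53.1 under these assumptions
```

Real throughput is usually well below this bound, since the estimate ignores KV cache reads, compute limits, and achievable (rather than theoretical) bandwidth. It is still useful for comparing hardware against each other.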


AFAIK, enterprise plans are not subsidized. It's $20/seat plus API pricing, unless you are saying API pricing itself is subsidized.

This is introductory market pricing that hasn't factored in cost recovery. Most of it has been run on early investment, with the assumption that they will recover costs in the long run. The prices are subsidized across the board, and they will need to go up significantly to recover those costs.

Assuming this were accurate, then presumably the AI companies would be betting that inference costs come down before the bill is due - I don't see enterprises being willing to absorb another ~10x price increase for tokens (as they've just done going from subscription prices to per-token pricing)

For Claude shops this was a huge hit. But let's back this up. There are some companies that haven't even built a break-even model at this price because they are funded by investment. As soon as those investors lose patience, the first dominoes will fall. For those who have somewhat of a business model, will it survive a price increase? The bigger question is whether the base model providers have enough runway and a way to keep going while they need to recover costs.

It's mostly R&D though, not inference. If LLMs effectively become a commodity then they are screwed anyway.

Aren’t the Chinese labs quickly turning them into a commodity?

The open-weight models will have a steady race to the bottom on inference costs just by dint of competition between providers. They aren’t at the frontier yet, but they are rapidly eating the flash market.


Yeah, that's not going to work if you can get e.g. 80% of value by using 10-20x or more cheaper open models. At some point it would just make sense for large companies to rent compute and deploy their version of DeepSeek or whatever (if they don't trust Chinese providers)

None of what you said is true

And you know this how?

Burden of proof is on you

Depends on how clear your instructions are. If there is no ambiguity you can even use gemma4 e2b/e4b.


I use a smaller model, gemma e2b, for most of my editing and it works surprisingly well. The workflow is planning with SOTA models and execution via small models. If you plan properly and don't leave ambiguity for the smaller model, it works well.


Out of curiosity have you tried other small models? The e2b for me was unusable. Llama3.2 3b was better and that thing is a year old and I rarely use it now too.


Yes, I keep trying small models. I have also tried qwen 3.5 0.8B, 2B, 4B and gemma4 e4B, but they either did not work reliably (thinking loops, trouble following instructions) or had performance issues (prompt speed, tg speed, too much RAM). e2b was the sweet spot where I could give it a plan and it could edit files properly.


That makes sense. It sounds like your computer isn't super powerful. Whatever works for you.


How did e2b compare to e4b ?


I did not see much improvement for my use case, i.e. file editing tasks, and tg/s is lower with e4b, so I stick with e2b.


- Tool for organizing files, pasted data, and prompts into markdown snippets you can copy into different AI chats.

- Calculator that gives tg/s and VRAM required based on model params and DDR settings.

- Auto-create dashboards from CSV/JSON files or APIs: Easyanalytica.com

- Snippet viewer for HTML/React that allows annotation and sharing based on URL fragments.


Why do people want to keep using Anthropic despite their shitty service? It's not like they have some kind of lock-in: it is still a new company, and it has shown its colors before we got stuck with it, unlike Google/Meta etc.


Totally agree. This is why open source models and tooling are so important for the ecosystem. I would not want these companies to decide what we can or cannot do.


That's a great question. Maybe other services have flaws too.


I did a Show HN with a similar idea (it got a whopping 1 point and was flagged as spam, which was later removed by mods). You paste your HTML and it encodes it into the URL, so you can share the URL without server involvement. I even added a URL shortener because, while technically feasible, the encoded URL becomes long and a QR code no longer works reliably. I also added annotation so you can add your comments and pass it to colleagues.

https://easyanalytica.com/tools/html-playground/
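
The encode-into-the-URL idea can be sketched in a few lines. This is only an illustration of the general technique, not how the playground above is implemented: the compression scheme (zlib plus URL-safe base64) and the base URL `https://example.com/playground` are assumptions. Since the payload lives after the `#`, browsers do not send it to the server.

```python
import base64
import zlib


def encode_fragment(html: str) -> str:
    """Compress the HTML and encode it as URL-safe base64 without padding."""
    compressed = zlib.compress(html.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(compressed).decode("ascii").rstrip("=")


def decode_fragment(fragment: str) -> str:
    """Reverse encode_fragment: restore padding, decode, and decompress."""
    padded = fragment + "=" * (-len(fragment) % 4)
    return zlib.decompress(base64.urlsafe_b64decode(padded)).decode("utf-8")


def share_url(base_url: str, html: str) -> str:
    """Build a shareable link that carries the snippet in the URL fragment."""
    return f"{base_url}#{encode_fragment(html)}"


snippet = "<h1>Hello</h1><p>A shareable snippet</p>"
url = share_url("https://example.com/playground", snippet)  # hypothetical base URL
print(len(url))
print(decode_fragment(url.split("#", 1)[1]) == snippet)  # True
```

Compression helps with repetitive markup, but the link still grows with the snippet, which is consistent with the point above about long URLs and unreliable QR codes.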


1. How does this work? window.open('about:blank'); and then a document write?

2. The share svg icons look very broken.

