pdyc · 2026-06-16T05:06:46 1781586406

What bug, and what does it affect?

verdverm · 2026-06-16T15:19:16 1781623156

It's a prompt cache invalidation bug: the whole input gets reprocessed on every request instead of being reused from the cache.

There are other reasons to prefer vLLM over llama.cpp as well.
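
To make the cost concrete, here is a minimal sketch, assuming the server can reuse cached state only for the longest token prefix shared with the previous request. That is a simplification, and `shared_prefix_len`, `tokens_to_reprocess`, and the token ids are made up for illustration, not llama.cpp internals.

```python
def shared_prefix_len(cached: list[int], prompt: list[int]) -> int:
    """Count the leading tokens that are identical in both sequences."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def tokens_to_reprocess(cached: list[int], prompt: list[int]) -> int:
    """Tokens that must go through prompt processing again."""
    return len(prompt) - shared_prefix_len(cached, prompt)


# Hypothetical token ids: the new request appends two tokens to the old one.
previous = [1, 2, 3, 4, 5]
follow_up = [1, 2, 3, 4, 5, 6, 7]
print(tokens_to_reprocess(previous, follow_up))  # 2 when the cache is honored
print(tokens_to_reprocess([], follow_up))        # 7 when the cache was invalidated
```

With an agent harness the shared prefix is usually most of the conversation, so losing it means paying prompt processing for the full context on every turn.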

pdyc · 2026-06-16T04:32:47 1781584367

yes

harness - pi+custom extension for subagents

model - Qwen3.6 35B-A3B, Q4_K_M

hardware - Intel Arrow Lake with 32 GB RAM

server - llama.cpp (Vulkan)

performance - 15-18 t/s generation, 50-150 t/s prompt processing (pp)

Planning and task creation still use Claude/GPT, but they don't touch the code. All coding is done with this setup.

Example of a project made with this setup: easyanalytica.com. It's of medium complexity.
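
For a feel of what those rates mean per request, here is a rough sketch that adds prompt-processing time to generation time. The request size (6000 tokens in, 600 out) is a hypothetical figure, and `estimate_seconds` is an illustrative helper, not part of any tool mentioned here.

```python
def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     pp_rate: float, tg_rate: float) -> float:
    """Prompt-processing time plus generation time, in seconds."""
    return prompt_tokens / pp_rate + output_tokens / tg_rate


# Hypothetical request: 6000 prompt tokens in, 600 tokens out,
# evaluated at the slow and fast ends of the quoted ranges.
slow = estimate_seconds(6000, 600, pp_rate=50, tg_rate=15)
fast = estimate_seconds(6000, 600, pp_rate=150, tg_rate=18)
print(f"{slow:.0f}s worst case, {fast:.0f}s best case")  # 160s worst case, 73s best case
```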

pdyc · 2026-06-15T05:49:42 1781502582

I am still working on the easyanalytica tool to auto-generate dashboards without AI. I recently added a comparison feature, and figuring that out was fun. There are a lot of interesting ideas on the execution side, but for the end user it's a simple product: just give it data and see the dashboard.

pdyc · 2026-06-09T05:41:15 1780983675

html snippet playground - for testing html/react snippets

token speed calculator - for estimating tg/s (token generation speed) from RAM speed and model size/params. This helps in comparing different hardware and estimating the speeds I am likely to get.

prompt assembler - for creating a prompt and context once and reusing them across different AIs, picking and choosing the context in a prompt, creating agent.md, etc.

dashboard builder - for viewing GSC, GA, and Stripe data in one place
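
A common back-of-the-envelope rule for this kind of estimate (not necessarily what the calculator itself does): token generation is roughly memory-bandwidth bound, so tg/s tops out near memory bandwidth divided by the bytes of weights read per token. The sketch below uses hypothetical function names and assumed example figures, and it ignores KV cache reads, compute limits, and real-world bandwidth efficiency, so treat the result as an upper bound.

```python
def memory_bandwidth_gbs(mt_per_s: float, channels: int = 2,
                         bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: transfers/s * channels * bytes per transfer."""
    return mt_per_s * 1e6 * channels * bus_bytes / 1e9


def estimate_tg_per_s(active_params_b: float, bits_per_weight: float,
                      bandwidth_gbs: float) -> float:
    """Upper-bound tokens/s, assuming every active weight is read once per token."""
    # Params are in billions, so this is directly in GB per token.
    gb_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gbs / gb_per_token


# Assumed example: dual-channel DDR5-5600, a MoE model with about 3B active
# params, and roughly 4.5 bits per weight for a 4-bit quant.
bw = memory_bandwidth_gbs(5600)
print(f"bandwidth ~ {bw:.1f} GB/s")
print(f"tg/s upper bound ~ {estimate_tg_per_s(3, 4.5, bw):.1f}")
```

For mixture-of-experts models the active parameter count, not the total, drives the per-token read, which is why a large MoE can generate faster than a dense model of the same file size.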

pdyc · 2026-06-03T13:18:56 1780492736

afaik, enterprise plans are not subsidized. It's $20/seat + API pricing. Unless you are saying API pricing itself is subsidized.

LurkandComment · 2026-06-03T13:21:01 1780492861

This is introductory market pricing that hasn't factored in cost recovery. Most of it has been run on early investment with the assumption they will recover costs in the long run. The prices are subsidized across the board and they will need to go up significantly to recover those costs.

swiftcoder · 2026-06-03T13:24:45 1780493085

Assuming this were accurate, then presumably the AI companies would be betting that inference costs come down before the bill is due - I don't see enterprises being willing to absorb another ~10x price increase for tokens (as they've just done going from subscription prices to per-token pricing)

LurkandComment · 2026-06-03T14:41:42 1780497702

For Claude shops this was a huge hit. But let's back this up. There are some companies that haven't even built a break-even model at this price because they are funded by investment. As soon as those investors lose patience the first dominoes will fall. For those who have somewhat of a business model, will it survive a price increase? The bigger question is whether the base model providers have enough runway and a way to keep going while they need to recover costs.

pqtyw · 2026-06-03T18:50:00 1780512600

It's mostly R&D though, not inference. If LLMs effectively become a commodity then they are screwed anyway.

swiftcoder · 2026-06-03T19:15:03 1780514103

Aren’t the Chinese labs quickly turning them into a commodity?

The open-weight models will have a steady race to the bottom on inference costs just by dint of competition between providers. They aren’t at the frontier yet, but they are rapidly eating the flash market.

pqtyw · 2026-06-03T18:49:09 1780512549

Yeah, that's not going to work if you can get e.g. 80% of the value from open models that are 10-20x (or more) cheaper. At some point it would just make sense for large companies to rent compute and deploy their own version of DeepSeek or whatever (if they don't trust Chinese providers).

logancbrown · 2026-06-03T13:23:51 1780493031

None of what you said is true

rimliu · 2026-06-03T13:28:46 1780493326

And you know this how?

logancbrown · 2026-06-09T17:43:09 1781026989

Burden of proof is on you

pdyc · 2026-05-26T05:07:07 1779772027

Depends on how clear your instructions are. If there is no ambiguity, you can even use gemma4 e2b/e4b.

pdyc · 2026-05-11T06:28:08 1778480888

I use a smaller model, gemma e2b, for most of my editing and it works surprisingly well. The workflow is planning with SOTA models and execution via small models. If you plan properly and don't leave any ambiguity for the smaller model, it works well.

2ndorderthought · 2026-05-11T10:05:17 1778493917

Out of curiosity have you tried other small models? The e2b for me was unusable. Llama3.2 3b was better and that thing is a year old and I rarely use it now too.

pdyc · 2026-05-11T12:33:12 1778502792

Yes, I keep trying small models. I have also tried qwen 3.5 0.8B, 2B, 4B and gemma4 e4B, but they either did not work reliably (thinking loops, trouble following instructions) or had performance issues (prompt speed, tg speed, too much RAM). e2b was the sweet spot where I could give it a plan and it could edit files properly.

2ndorderthought · 2026-05-11T12:51:20 1778503880

That makes sense; it sounds like your computer isn't super powerful. Whatever works for you.

Melatonic · 2026-05-12T01:45:08 1778550308

How did e2b compare to e4b?

pdyc · 2026-05-12T04:33:14 1778560394

I did not see much improvement for my use case, i.e. file editing tasks, but tg/s is lower with e4b, so I stick with e2b.

pdyc · 2026-05-11T05:05:15 1778475915

- Tool for organizing files, pasted data, and prompts into markdown snippets you can copy into different AI chats.

- Calculator that gives tg/s and VRAM required based on model params and DDR settings.

- Auto-create dashboards from CSV/JSON files or APIs: Easyanalytica.com

- Snippet viewer for HTML/React that allows annotation and sharing based on URL fragments.

pdyc · 2026-04-30T16:01:41 1777564901

Why do people want to keep using Anthropic despite their shitty service? It's not like they have some kind of lock-in: it is still a new company, and unlike Google/Meta etc. it has shown its true colors before we are stuck with it.

0xpiguy · 2026-04-30T16:07:11 1777565231

Totally agree. This is why open source models and tooling are so important for the ecosystem. I would not want these companies to decide what we can or cannot do.

AtNightWeCode · 2026-04-30T17:20:59 1777569659

That's a great question. Maybe other services have flaws too.

pdyc · 2026-04-24T13:34:47 1777037687

I did a Show HN with a similar idea (it got a whopping 1 point and picked up a spam flag, which the mods later removed): you paste your HTML and it encodes it into the URL, so you can share the URL without any server involvement. I even added a URL shortener because, while technically feasible, the encoded URL becomes long and a QR code no longer works reliably. I also added annotation so you can add your comments and pass it to colleagues.

https://easyanalytica.com/tools/html-playground/
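
The general technique can be sketched as follows. This is only an illustration of packing HTML into a URL fragment; the actual encoding the tool uses is not stated here, and `BASE_URL`, `encode_html_to_url`, and `decode_url_to_html` are hypothetical names. It assumes zlib compression plus URL-safe base64, and relies on the fact that browsers do not send the part after `#` to the server.

```python
import base64
import zlib

BASE_URL = "https://example.com/playground"  # hypothetical placeholder


def encode_html_to_url(html: str) -> str:
    """Compress the HTML and store it in the URL fragment."""
    compressed = zlib.compress(html.encode("utf-8"), level=9)
    # URL-safe alphabet; padding is dropped to keep the link shorter.
    token = base64.urlsafe_b64encode(compressed).decode("ascii").rstrip("=")
    return f"{BASE_URL}#{token}"


def decode_url_to_html(url: str) -> str:
    """Reverse of encode_html_to_url: read the fragment and decompress it."""
    token = url.split("#", 1)[1]
    padded = token + "=" * (-len(token) % 4)  # restore base64 padding
    return zlib.decompress(base64.urlsafe_b64decode(padded)).decode("utf-8")


snippet = "<h1>Hello</h1><p>shared without a server</p>"
link = encode_html_to_url(snippet)
print(link)
print(decode_url_to_html(link) == snippet)  # True
```

Even with compression the link grows with the snippet, which is consistent with the note above about long URLs and unreliable QR codes.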

kilroy123 · 2026-04-24T15:53:51 1777046031

1. How does this work? window.open('about:blank'); and then a document write?

2. The share svg icons look very broken.