esafak's comments

Your sibling post estimated it pretty well :)

We need to promote alignment and other ethics benchmarks; we can't change what we don't measure. I don't even know any off the top of my head.

> It was told to escape a sandbox and notify a researcher. It did. The researcher found out via an unexpected email while eating a sandwich in a park.

Now that they have a lead, I hope they double down on alignment. We are courting trouble.


Or will they rapidly become indistinguishable since they both get the job done?

I'm on their Lite plan and I see some of this too. It is also slow. I use it as a backup.

An application of their specification language, https://juxt.github.io/allium/

It seems the difference between this and conventional specification languages is that Allium's specs are in natural language, and enforcement is by LLM. This places it in a middle ground between unstructured plan files, and formal specification languages. I can see this as a low friction way to improve code quality.
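To make the middle ground concrete, here is a hypothetical sketch of the "LLM as spec checker" idea. The function name, prompt, and wiring are all invented for illustration; this is not Allium's actual interface.

```python
# Hypothetical sketch: a natural-language spec enforced by asking an LLM
# to judge a change against it (invented names, not Allium's API).
def check_spec(spec: str, diff: str, judge) -> bool:
    """judge is any callable mapping a prompt string to a 'yes'/'no' answer."""
    prompt = (
        f"Specification:\n{spec}\n\n"
        f"Change:\n{diff}\n\n"
        "Does the change satisfy the specification? Answer yes or no."
    )
    return judge(prompt).strip().lower().startswith("yes")

# In real use, judge would call an LLM; a stub shows the wiring.
stub = lambda prompt: "yes"
ok = check_spec(
    "All SQL must use parameterized queries.",
    "cursor.execute('SELECT * FROM t WHERE id = %s', (uid,))",
    stub,
)
```

Unlike a formal checker, the LLM only estimates compliance, which is exactly the trade-off: lower friction, weaker guarantees.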


Is this a good way to improve performance (frame rate, latency, CPU load)?

Yea!

* Do video playback out of the browser. You can render a subset of frames, use a different pipeline for decoding, etc.

* Pull video from a different source. Join Google Meet on current computer, but stream from another host.



How does it select what to forget? Let's say I land a PR that introduces a sharp change, migrating from one thing to another. An exponential decay won't catch this. Biological learning makes sense when we observe similar things repeatedly in order to learn patterns. I am skeptical that it applies to learning the commits of one code base.
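A toy illustration of the concern, with all names my own: exponentially decayed averaging of a signal that flips abruptly (say, "uses convention A" becomes "uses convention B" after a single migration PR) only drifts toward the new value.

```python
# Exponentially weighted moving average over a sequence of observations.
def ewma(signal, decay=0.8):
    est = 0.0
    out = []
    for x in signal:
        est = decay * est + (1 - decay) * x  # old evidence decays geometrically
        out.append(est)
    return out

# 0 = old convention, 1 = new convention introduced by one PR
signal = [0] * 5 + [1] * 3
estimates = ewma(signal)
# final estimate ~ 0.49: three observations after the migration, the
# "memory" still leans toward the obsolete convention
```

A step change in the code base is a single decisive event, not a gradually accumulating pattern, so decay-based forgetting lags it badly.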

I think this is a very important question, and it makes it clear that memory systems are less about fact retrieval and more about knowledge classification. Memory systems are not document stores -- which, to be fair, this hippo system does recognize, motivating its exponential decay, recall strengthening, and "sleep" consolidation.

I personally don't think a memory system should try to "select what to forget"; it should store everything and live with the contradictions inherent in history. Having said that, we need to ascribe a certain confidence to each memory at storage time, where something uncertain is described as such, and when contradicting information gets stored, the confidence is reduced further -- this on top of time decay and retrieval bumps in confidence. E. T. Jaynes argued that this could be achieved in machines through Bayesian updating: say a beta distribution is stored for each memory, and upon storing knowledge that confirms the memory, the beta distribution is updated to have more confidence (the original is the prior).

If every memory has a Bayesian prior denoting confidence, and this is surfaced when recalling, then the LLM itself can decide how to synthesize the different memories. Together with a "remembered on" field, the LLM can grok that the database schema was changed, or a certain design pattern was discarded (for example).
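A minimal sketch of this beta-distribution bookkeeping, purely my own illustration (not hippo's implementation): each memory carries Beta(alpha, beta) pseudo-counts over "is this still true?", starting from a uniform prior.

```python
from dataclasses import dataclass
import datetime as dt

@dataclass
class Memory:
    text: str
    remembered_on: dt.date
    alpha: float = 1.0  # pseudo-count of confirmations (uniform Beta(1,1) prior)
    beta: float = 1.0   # pseudo-count of contradictions

    def confirm(self):
        self.alpha += 1  # confirming evidence shifts the posterior toward 1

    def contradict(self):
        self.beta += 1   # contradicting evidence shifts it toward 0

    @property
    def confidence(self) -> float:
        return self.alpha / (self.alpha + self.beta)  # posterior mean

m = Memory("schema uses snake_case", dt.date(2024, 1, 5))
m.confirm(); m.confirm(); m.contradict()
# posterior mean = 3 / 5 = 0.6
```

Surfacing `confidence` and `remembered_on` alongside the text at recall time is enough for the LLM to weigh contradictory memories itself.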

(Full disclosure, I have developed a memory system myself which I will post here in a couple days, with a slightly different target audience than hippo).


No, they are not. People just want the model. Let people bring their own harness to their Anthropic subscription and see who's still using CC.

You can literally do this right now if you want. The masses have spoken: they want CC.

You literally can't use your Anthropic subscription that you paid for with any agent other than CC; you have to pay by the token. We've talked about this a lot; check the history.

Saying "you can use any other agent, just pay 20x more through the API!" does not demonstrate a realistic choice.


Every Claude Enterprise customer is choosing that. Claude Enterprise bills at API rates (presumably with a discount if you're big enough).
