
cpard · 2026-06-10T18:28:56 1781116136

Most framework vendors don’t have an incentive to make things less obscure. The agent framework is free/open source and they make money primarily from selling observability products for agents. Even if they don’t intentionally obscure things, they just don’t have the motivation to optimize that part.

cpard · 2026-06-09T16:45:10 1781023510

I'm personally interested in this problem and it's quite an active research area right now.

My feeling is that the research is converging on what the paper claims: that the combination of the two is the right way to do it, and that how you combine them as part of the harness you build is what makes the difference.

At the AID-Wild / ACM CAIS 2026 workshop that happened recently, there are plenty of examples in the accepted papers on that.

A great example is AI-PROPELLER: Warehouse-Scale Interprocedural Code Layout Optimization with AlphaEvolve. It uses AlphaEvolve and Vizier to evolve compiler code-layout heuristics. (https://arxiv.org/abs/2606.00131)

_alternator_ · 2026-06-09T16:55:29 1781024129

The combination approach jibes well with my use of the models in a number of areas. I guide models to use best-in-class algorithmic approaches as available. (E.g. using constraint solvers for a particular problem where pure Monte Carlo rarely gives "in-bounds" data.)

I find it odd that frontier models often don't suggest the most powerful methods for crushing problems, but it may be that the training data doesn't actually have "good enough" experts on the problems I encounter. If the experts don't know about the best ways to solve the problem, they'll get dinged in training for even trying.

cpard · 2026-06-09T17:34:51 1781026491

Do you enumerate the options of the algorithms to the models? I've tried to do "algorithmic discovery" with these systems, e.g. openevolve, and to be honest the models didn't really focus on that part.

Instead they were focusing more on optimizations of the existing algorithm that has been implemented. Maybe it's an artifact of the problem I was throwing at them (I was asking to optimize the implementation of select_k in Arrow, which is currently using a max-heap streaming algorithm).
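For readers unfamiliar with the pattern, here is a minimal sketch of what a "max-heap streaming" selection of the k smallest values generally looks like. It is a generic illustration in plain Python, not Arrow's actual select_k implementation; the function name `select_k_smallest` is made up for this example.

```python
import heapq
from typing import Iterable, List


def select_k_smallest(values: Iterable[float], k: int) -> List[float]:
    """Return the k smallest values in ascending order using a bounded max-heap."""
    if k <= 0:
        return []
    # heapq is a min-heap, so store negated values to emulate a max-heap.
    heap: List[float] = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, -v)
        elif v < -heap[0]:
            # v is smaller than the largest value currently kept: swap it in.
            heapq.heapreplace(heap, -v)
    return sorted(-x for x in heap)
```

The heap never holds more than k items, so the input can be consumed as a stream in O(n log k) time. An "algorithmic discovery" result would replace this strategy altogether, rather than tune its inner loop.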

I've started documenting my journey with this here: https://www.kostasp.net/posts/16-ai-experiments-apache-arrow in case you want to take a look. Any advice would be highly appreciated, I'm looking for more inspiration on how to torture myself with that stuff.

cpard · 2026-06-07T03:56:59 1780804619

This is really neat. I’m working on something similar but for data artifacts, not just code. It’s very encouraging to see that this kind of tooling helps both humans and models; that was what made me start working on it.

rohanucla · 2026-06-07T03:58:49 1780804729

Thanks! The data artifacts angle is really interesting. In some ways the problem is even harder there because data pipelines have less explicit structure than code, I guess.

gwerbin · 2026-06-07T04:08:36 1780805316

The artifacts themselves have more structure, but diffing is hard because of size: what exactly do you show in the diff? Row-level? Summary statistics? How do you keep it from getting slow on bigger datasets?

Then there are plots saved as images which have basically no structure at all exposed.

cpard · 2026-06-07T05:20:39 1780809639

Row level and summary stats are both diffs over values that can tell you that something changed but not whether the *meaning* has changed. What I'm working on is providing more information on how the meaning changes.

The questions I'd like to answer with the diffing are more like: will the grain go from one-row-per-user to one-row-per-user-per-day, will a key stop being unique, will a join start fanning out and quietly double a measure, will something additive become non-additive.

This diff is over structure, but this structure is latent in the transformation that produces it. To make things harder, if we are talking about a declarative language (e.g. SQL), the code doesn't even describe how things get done, only what the output would be.

What I've ended up doing is recovering the structure from the code by analyzing it and then using *cheap* profiling rather than a full row compare.

As an example, my equivalent impact sub-command output would be something like this: "this change makes account_id non-unique three models downstream"
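To make the kind of question above concrete, here is a rough sketch of two cheap structural checks: whether a key is still unique, and whether a join would fan out. It is only an illustration over in-memory rows, not the tool described in this comment; `is_unique`, `join_fans_out`, and the sample column names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

Row = Dict[str, Hashable]


def is_unique(rows: Iterable[Row], key: Tuple[str, ...]) -> bool:
    """True if no two rows share the same values for the key columns."""
    seen = set()
    for row in rows:
        k = tuple(row[c] for c in key)
        if k in seen:
            return False
        seen.add(k)
    return True


def join_fans_out(left: Iterable[Row], right: Iterable[Row], on: str) -> bool:
    """True if some left row matches more than one right row on the join column."""
    right_counts = Counter(r[on] for r in right)
    return any(right_counts[l[on]] > 1 for l in left)


# Before the change: one row per user. After: one row per user per day.
before = [{"user_id": 1, "spend": 10}, {"user_id": 2, "spend": 5}]
after = [
    {"user_id": 1, "day": "d1", "spend": 4},
    {"user_id": 1, "day": "d2", "spend": 6},
    {"user_id": 2, "day": "d1", "spend": 5},
]
```

Here `is_unique(after, ("user_id",))` is false while `is_unique(after, ("user_id", "day"))` is true, which is the grain change. Joining `before` to `after` on `user_id` fans out, so any measure carried from the left side would be double counted.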

gwerbin · 2026-06-07T04:09:36 1780805376

There is still no good "data diff" tool that I can run on, say, a big pile of CSV or Parquet. Something with DVC integration would be especially welcome.

appplication · 2026-06-07T04:40:15 1780807215

I would imagine that's because, at the scales where most folks use parquet files, you’re generally no longer really thinking in terms of individual diffs to your data (and it also implies some level of batch processing, vs. e.g. a DB).

We have some custom data diff tools at my ultracorp that provide a browsable interface, but the customer tends to be more operations folk than engineers or DS etc who would be more familiar with actual version control concepts. But these work against the data store and not on something like csv or parquet.

gwerbin · 2026-06-08T03:40:15 1780890015

Sorta? Maybe I'm weird. I tend to use Parquet files inside my project instead of reading directly from and writing directly to our data warehouse. That lets me cut out a lot of overhead spent on just waiting for data to flow over the network, and also as a side benefit lets me track everything with DVC, which itself has a lot of benefits like being able to summon all project data with `dvc pull`.

I consider that a completely distinct use case from, say, Iceberg tables in S3.

cpard · 2026-06-07T03:51:29 1780804289

Curious to see when a post from OpenAI will appear with the corrected theory or something. This seems to be an ideal scenario for them to go after another scientific case. They have the theory, they have the experimental proof that it’s wrong, exactly what you need for an agentic loop to do its work.

Or maybe what works in math doesn’t work with chemistry?

vi_sextus_vi · 2026-06-07T03:52:34 1780804354

>[theorists disagree] that the discrepancy is as significant as it appears.

It was predicted by a decade-old "theory" (with a single equation, and it seems that the original paper has no equations at all)

so OAI/DeepMind can quietly check if it's in the training or if they can extrapolate, yes

https://arxiv.org/abs/0803.2752

https://cen.acs.org/articles/85/i18/Boron-buckyball-predicte...

cpard · 2026-06-07T05:26:50 1780810010

But is it worth the effort from a PR perspective for them? I guess we will have to wait and see.

cpard · 2026-06-07T03:48:08 1780804088

I don’t think the flex here is the amount of code alone. Their goal is to show that AI can improve productivity, the number of lines is just the proxy to that. This article is a marketing piece after all.

Now someone can argue that lines of code are not a good proxy of engineering productivity, but I wouldn’t be surprised if the audience they target with this content is not the HN commenters of this thread.

zbrock · 2026-06-07T14:23:25 1780842205

Correct on the first part, partially correct on the second. LOC is a bad metric, but it is at least a legible one. Lots of people working on better ways to measure Software Productivity!

cpard · 2026-06-06T22:11:58 1780783918

*…tools, UMP does for memory - negotiated operations over a portable, signed, bi-temporal record…*

What is a bi-temporal record? I don’t think I’ve heard the term before and I’d love to learn more.

msteffen · 2026-06-06T22:16:57 1780784217

IIUC, the most basic version is when you have a log where every entry has both “date added” and “effective date,” so you can add stuff to the log retroactively. For example, “the user just informed us yesterday that they moved last year” -> address date added=yesterday, date effective=last year
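A minimal sketch of that idea, assuming a simple in-memory log: each entry carries both an effective date and a recorded date, and a query asks "what was true on day X, according to what we knew on day Y". The `AddressFact` type, `address_as_of` function, and the sample addresses are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class AddressFact:
    address: str
    effective: date  # when the fact became true in the real world
    recorded: date   # when the fact was added to the log


def address_as_of(log: List[AddressFact], valid_on: date, known_on: date) -> Optional[str]:
    """Address that held on `valid_on`, using only entries recorded by `known_on`."""
    candidates = [f for f in log if f.recorded <= known_on and f.effective <= valid_on]
    if not candidates:
        return None
    # Latest effective date wins; ties go to the most recently recorded entry.
    return max(candidates, key=lambda f: (f.effective, f.recorded)).address


log = [
    AddressFact("1 Old Street", effective=date(2020, 1, 1), recorded=date(2020, 1, 1)),
    # Recorded in 2026 but effective a year earlier: a retroactive correction.
    AddressFact("2 New Avenue", effective=date(2025, 6, 1), recorded=date(2026, 6, 5)),
]
```

Asking about December 2025 gives different answers depending on `known_on`: before the correction was recorded the log says "1 Old Street", afterwards it says "2 New Avenue". Nothing is overwritten, so both views stay reproducible.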

skeledrew · 2026-06-06T22:49:35 1780786175

I have a similar setup in Orgzly (kinda in Emacs too but it's buggy and not as useful there) where a note has a "created time" property that's always automatically applied. And then there's the "closed time" applied when I set the note's state to "done", which I sometimes modify depending on what the note is for and thus what "done" means.

cpard · 2026-06-07T03:39:48 1780803588

Thank you!

cpard · 2026-06-05T00:50:07 1780620607

It’s clear that Anthropic is building harnesses for specific use cases now and turns them into products.

This is the equivalent of Claude Design but for security.

Different harness, different packaging and obviously different distribution because the persona is different.

It’s funny because from all the posts I’ve read from companies reporting on Mythos, everyone is building their own harness for it.

Cisco even published a specification for one.

But Anthropic is the one who has figured out how to package and distribute this. Great GTM!

ElijahLynn · 2026-06-05T01:23:51 1780622631

This post is misleading and so is the GitHub org. Anthropics vs Anthropic.

Zetaphor · 2026-06-05T04:43:01 1780634581

That is their actual account. We have this discussion every time they post something sadly

ElijahLynn · 2026-06-05T14:56:35 1780671395

Oh, bummer. That is really confusing.

cpard · 2026-06-04T16:44:43 1780591483

SQL, JS, Excel are really hard to substitute because of how widely used they are by people. Even if something new comes up that's objectively better, so far it has always failed to gain traction because of this reality.

I wonder though, is such a dialect better for agents? Have you tried to measure if an agent performs better expressing queries in such a language instead of SQL?

remywang · 2026-06-04T16:48:17 1780591697

Claude had no problem translating SQL into Prela, and because you have fine grained control over the query plan (a Prela query is a plan), it was able to optimize queries to be very fast

cpard · 2026-06-04T16:56:22 1780592182

I'm more curious about going from text to Prela instead of going from text to SQL and measuring any difference in the performance there. On one hand, models have been trained on a lot of SQL; on the other hand, they are really good at mathematical reasoning too, so thinking in Prela might be a natural fit for them.

remywang · 2026-06-04T17:38:16 1780594696

There are fewer footguns in Prela, in particular no NULLs, which should help both humans and robots.

joelthelion · 2026-06-04T16:52:47 1780591967

Having control over the execution plan is super interesting! This is a very common frustration when writing SQL.

Do you think it would be possible to offer Prela as a direct interface to a relational database?

remywang · 2026-06-04T17:36:35 1780594595

Yes, maybe not the language itself, but the ideas behind it. Tarski's Algebra of Relations is actually a better model for modern column stores than the standard relational algebra, because a column is a binary relation from the primary key into its value.
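As a rough illustration of "a column is a binary relation from the primary key into its value", here is a sketch using plain Python sets of pairs. It is not Prela syntax; the `converse` and `compose` helpers and the sample `users` columns are assumptions made for this example.

```python
from typing import Hashable, Set, Tuple

Relation = Set[Tuple[Hashable, Hashable]]


def converse(r: Relation) -> Relation:
    """Swap the two sides of every pair."""
    return {(b, a) for a, b in r}


def compose(r: Relation, s: Relation) -> Relation:
    """Relational composition: (a, c) whenever (a, b) is in r and (b, c) is in s."""
    by_left = {}
    for b, c in s:
        by_left.setdefault(b, set()).add(c)
    return {(a, c) for a, b in r for c in by_left.get(b, ())}


# Two columns of a hypothetical `users` table, each a relation from key to value.
name = {(1, "ada"), (2, "bob"), (3, "cy")}
city = {(1, "Paris"), (2, "Oslo"), (3, "Paris")}

# "Names by city": go from city back to the key, then from the key to the name.
city_to_name = compose(converse(city), name)
```

The query touches only the two columns it needs and never builds full rows, which mirrors how a column store reads data.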

ted_dunning · 2026-06-04T19:25:07 1780601107

It would be pretty easy to put a DuckDB data source into this code.

It might be pretty easy to use overloading to get special case implementations that form SQL queries progressively until the results need to be materialized as something like a dataframe for the function code to work on.
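One way to picture "forming SQL queries progressively" through overloading is a lazy query object whose operators only accumulate SQL text until something asks for the result. The sketch below is purely illustrative: the `Col` and `Query` classes are hypothetical, it only builds a string rather than talking to DuckDB, and it uses `repr` for literals, which is not safe against SQL injection.

```python
from typing import Tuple


class Col:
    """A column reference whose comparisons build SQL text instead of evaluating."""

    def __init__(self, name: str):
        self.name = name

    def __gt__(self, other) -> str:
        return f"{self.name} > {other!r}"

    def __lt__(self, other) -> str:
        return f"{self.name} < {other!r}"


class Query:
    """Immutable, lazily built query; nothing runs until the SQL text is consumed."""

    def __init__(self, table: str, columns: Tuple[str, ...] = ("*",),
                 predicates: Tuple[str, ...] = ()):
        self.table = table
        self.columns = columns
        self.predicates = predicates

    def where(self, predicate: str) -> "Query":
        return Query(self.table, self.columns, self.predicates + (predicate,))

    def select(self, *columns: str) -> "Query":
        return Query(self.table, columns, self.predicates)

    def to_sql(self) -> str:
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        return sql
```

For example, `Query("orders").where(Col("amount") > 100).select("id", "amount").to_sql()` produces a single statement that a database could run in one pass, and only its result would need to be materialized as a dataframe.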

cpard · 2026-05-31T21:44:59 1780263899

Replicating the Postgres WAL to S3 and Iceberg reliably is a hard problem but it’s not accurate to say that no ETL is needed here.

Maybe you can say it’s more of an ELT pattern, but anyone who’s interested in using this for realistic analytics will have to transform the data at some point.

If an org is early enough to think that they can use a solution like this and just get in duckdb and start spitting out reports, they will be in for a really bad experience.

Please educate people to do the right thing and realize the scope of the work they are facing. It might feel that it hurts your growth in the short term, but it will benefit you greatly in the mid-to-long term as a vendor.

kikimora · 2026-05-31T22:09:02 1780265342

IDK, AWS Zero ETL from Aurora into Redshift really helped us at some point. You're right that data transformation is very limited, if not impossible. But having data in an analytical store, being able to experiment with queries, understand what is wrong with your OLTP schema and then build ETL is way better than doing an upfront design.

cpard · 2026-06-01T00:51:43 1780275103

Of course it is. What you describe is one of the reasons that ELT became popular, if you couple it with a variant type and schema on read, you have a very powerful and flexible architecture.

But there’s no free lunch: building and maintaining data infrastructure that is reliable requires work. Many companies don’t realise that when they start their analytical journey, and aggressive marketing doesn’t help. That’s the point I was trying to make.

kikimora · 2026-06-01T01:05:03 1780275903

I don’t disagree, just placing emphasis on a different aspect.

In an ideal world there is a tool that moves your schema into an analytical store “as is” with a single click. Then the same tool lets you add arbitrary transformations of the data. Surprisingly I have not come across such a tool. It is either “one click to move your data” or “any transformation you want” but only after a significant upfront investment :(

cpard · 2026-06-01T01:28:04 1780277284

I think I didn’t articulate myself very well in my reply. I actually wanted to say that I agree with you and emphasise again the need for educating users about the complexity of these projects.

What you describe has been pitched by many different products for different parts of the data platform. Fivetran for example claims to do that for the extraction and loading part, good old Informatica was offering the ETL in a graphical interface etc.

The problem that many teams ended up having is the explosion of the tooling needed by data teams.

cpard · 2026-05-31T04:16:37 1780200997

The comments are definitely not worth reading. It’s a very sad thread, you literally had to go through all of them to find one that wasn’t about hate and stated some facts about the issues of the code.

wjnc · 2026-05-31T04:30:02 1780201802

I found them worth reading, because the following set of thoughts came up:

- programmers had problems with delivering quality long before LLMs

- a lot of research and tooling went into that, bringing us {Git, libraries, VSCode, reviews, …} but the human factor stayed the same (and more pronounced imho than in other fields of engineering)

- LLMs democratized programming, enhancing a few, dropping the bottom to no skill programming

- the tools and practices created for the quality problems from the past turn out to be wholly incapable of maintaining quality in the present

The main problem behind this is that those delivering the QA tools of the past are central in the AI race. Old school engineering would separate these concerns.