This particular painting was reinterpreted based on specific descriptions of the colours in a letter from the painter.
As far as I'm aware, there is no way to know for sure what the colours originally looked like, especially if the information is limited. There are so many variables that we can only guess.
> Most useful async blocks are big enough that the overhead for the error cases disappears.
Is it really though?
In my experience, many Rust applications/libraries can be quite heavy on indirection. One of the points from the article is that, unlike in sync Rust, each indirection in async Rust has a runtime cost. Example from the article:
I would naively expect the above to be a 'free' indirection, paying only a compile-time cost for the compiler to inline the code. But after reading the article I understand this is not true, and it has a runtime cost as well.
It's not possible to learn anything about other elements when performing binary search, _except_ the only thing there is to learn: if the target is before or after the recently compared element.
If we guess that there is a bias in the distribution based on recently seen elements, the guess is at least as likely to be wrong as it is to be right. And if we guess incorrectly, in the worst case the algorithm degrades to a linear scan.
Unless we have prior knowledge. For example: if there is a particular distribution, or if we know we're dealing with integers without any repetition (i.e. each element is strictly greater than the previous one), etc.
> It's not possible to learn anything about other elements when performing binary search, _except_ the only thing there is to learn: if the target is before or after the recently compared element.
You have another piece of information: you don't only know whether the target is before or after the compared element. You also know the delta between what you looked at and what you're looking for, and you have the delta from the previous item you looked at.
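That delta is exactly what interpolation search uses to choose its next probe. A rough Python sketch (the function name and the assumption of sorted, roughly uniformly distributed numeric keys are mine, purely for illustration):

    def interpolation_search(a, target):
        # 'a' is a sorted list of numbers; guess the probe position from the
        # value deltas instead of always splitting the range in the middle.
        lo, hi = 0, len(a) - 1
        while lo <= hi and a[lo] <= target <= a[hi]:
            if a[hi] == a[lo]:                      # flat run: avoid dividing by zero
                return lo if a[lo] == target else -1
            # How far the target sits between a[lo] and a[hi], scaled to an index.
            mid = lo + (hi - lo) * (target - a[lo]) // (a[hi] - a[lo])
            if a[mid] == target:
                return mid
            if a[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    print(interpolation_search([1, 4, 9, 16, 25, 36], 25))  # -> 4

On roughly uniform keys this needs only O(log log n) probes; on skewed data it can degrade toward the linear scan mentioned above, which is exactly the trade-off being discussed.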
Assuming your key space is anything like randomly distributed.
Thinking about it: yeah, if you can anticipate anything like a random distribution, it's only a few extra instructions to reduce the number of values looked up. In the old days that would have been very unlikely to be a good deal, but with so many algorithms now dominated by the cache (I've seen more than one case where a clearly less efficient algorithm that reduced memory reads turned out better), I suspect there are a lot of such things that don't go the way we learned them in the stone age.
Is the disconnect here that in many datasets there is some implicit distribution? For example, if we are searching for English words we can assume that the number of words or sentences starting with "Q" or "Z" is very small while the ones starting with "T" are many. Or, if the first three lookups in a binary search all start with "T", we are probably being asked to search just the "T" section of a dictionary.
Depending on the problem space such assumptions can prove right enough to be worth using despite sometimes being wrong. Of course if you've got the compute to throw at it (and the problem is large) take the Contact approach: why do one when you can do two in parallel for twice the price (cycles)?
> If we guess that there is a bias in the distribution based on recently seen elements, the guess is at least as likely to be wrong as it is to be right.
This is true for abstract and random data. I don't think it's true for real world data.
For example, Python's sort function "knows nothing" about the data you're passing in. But it does look for some shortcuts, and these end up saving time on average.
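A quick way to see this (an illustrative timing sketch; the exact numbers will vary by machine):

    import random
    import timeit

    presorted = list(range(1_000_000))
    shuffled = presorted[:]
    random.shuffle(shuffled)

    # Timsort first scans for already-ordered runs, so presorted (or mostly
    # sorted) real-world input is handled far faster than a random permutation.
    print("presorted:", timeit.timeit(lambda: sorted(presorted), number=10))
    print("shuffled: ", timeit.timeit(lambda: sorted(shuffled), number=10))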
Just tried it out and it works great and is really fast! It's a breath of fresh air compared to VS Code. Lots of other editors are fast, but this seems feature complete as well as fast.
Migrating from VS Code was also super simple, and the AI assistant integrations seem to just work.
I can definitely appreciate the engineering work that went into it. Loving it so far! Thanks!
I think I was taking 3 grams a day, 1g every 8 hours; the day I finally decided to go seek medical help, I felt extremely tired... when I traced back how much I had taken, it was 3 grams in less than 8 hours, but this was due to being extremely tired and exhausted from the fever, which made me "forget". Lesson learned, I now keep a strict journal with all medicine I take.
It's only a matter of time before laptops get 5G. MacBooks have been rumoured for a while to get cellular modems. [1]
This will probably help adoption. On the one hand it will generate more IPv6 traffic. On the other hand it will expose more developers to IPv6, which will expose them to any lack of support for IPv6 within their own products.
I can confirm this. I work at an e-waste recycling company, and the vast majority of my inventory is corporate IT decommissioned gear. About 1 out of 10 laptops I tear down has a cellular modem, going back to about Intel Core 5th gen.
I've had laptops with cellular modems built in going back to Pentium IIIs. The Compaq N600c had a "multiport" bay on the lid; one of the options was a GSM modem.
I've never had a modern laptop with a cellular modem, but every one I've owned has supported them internally. Even when they aren't provisioned with them, they're usually still supported as aftermarket options.
That's quite a surprising thing to me, and yet weirdly obvious.
If you are single and have a phone contract, you would otherwise need an extra contract for landline internet plus a WiFi router, because that's what a lot of people just do; now they can simply add an eSIM and pay a little bit more.
Interesting that this sounds/feels a lot more right or useful than it did 5 years ago.
I can't imagine a worse privacy nightmare. An always-on, backdoored 5G baseband with a unique permanent IPv6 address assigned to the machine. Okay, maybe it could be worse if each user account is assigned its own unique IPv6 perma-cookie.
You're thinking of MAC addresses. Machines don't have permanently-assigned v6 addresses, rather the IP is assigned by whatever network they're currently attached to and will change based on that network's whims, just like it does in v4.
> I'd argue self-hosting is the right choice for basically everyone, with the few exceptions at both ends of the extreme:
> If you're just starting out in software & want to get something working quickly with vibe coding, it's easier to treat Postgres as just another remote API that you can call from your single deployed app
> If you're a really big company and are reaching the scale where you need trained database engineers to just work on your stack, you might get economies of scale by just outsourcing that work to a cloud company that has guaranteed talent in that area. The second full freight salaries come into play, outsourcing looks a bit cheaper.
This is funny. I'd argue the exact opposite. I would self host only:
* if I were on a tight budget and trading an hour or two of my time for a cost saving of a hundred dollars or so is a good deal; or
* at a company that has reached the scale where employing engineers to manage self-hosted databases is more cost effective than outsourcing.
I have nothing against self-hosting PostgreSQL. Do whatever you prefer. But to me outsourcing this to cloud providers seems entirely reasonable for small and medium-sized businesses. According to the author's article, self hosting costs you between 30 and 120 minutes per month (after setup, and if you already know what to do). It's easy to do the math...
> employing engineers to manage self-hosted databases is more cost effective than outsourcing
Every company out there is using the cloud and yet still employs infrastructure engineers to deal with its complexity. The "cloud" reducing staff costs is and was always a lie.
PaaS platforms (Heroku, Render, Railway) can legitimately be operated by your average dev and not have to hire a dedicated person; those cost even more though.
Another limitation of both the cloud and PaaS is that they are only responsible for the infrastructure/services you use; they will not touch your application at all. Can your application automatically recover from a slow/intermittent network, a DB failover (that you can't even test because your cloud providers' failover and failure modes are a black box), and so on? Otherwise you're waking up at 3am no matter what.
> Every company out there is using the cloud and yet still employs infrastructure engineers
Every company beyond a particular size surely? For many small and medium sized companies hiring an infrastructure team makes just as little sense as hiring kitchen staff to make lunch.
For small companies, things like Vercel, Supabase, Firebase, ... wipe the floor with Amazon RDS.
For medium sized companies you need "devops engineers". And in all honesty, more than you'd need sysadmins for the same deployment.
For large companies, AWS responsibilities get split up into entire departments of teams (for example, all clouds have made auth so damn difficult that most large companies have not one but multiple departments just dealing with authorization, before you so much as start your first app).
You're paying people to do the role either way; if it's not dedicated staff, then it's taking time away from your application developers so they can play the role of underqualified architects, sysadmins, and security engineers.
From experience (because I used to do this), it’s a lot less time than a self-hosted solution, once you’re factoring in the multiple services that need to be maintained.
It depends entirely on your use case. If all you need is a DB and Python/PHP/Node server behind Nginx then you can get away with that for a long time. Once you throw in a task runner, emails, queue systems, blob storage, user-uploaded content, etc. you can start running beyond your own ability or time to fix the inevitable problems.
As I pointed out above, you may be better served mixing and matching so you spend your time on the critical aspects but offload those other tasks to someone else.
Of course, I’m not sitting at your computer so I can’t tell you what’s right for you.
Yeah, and nobody is looking at the other side of this. There just are not a lot of good DBA/sysop types who even want to work for some non-tech SMB. So this either gets outsourced to the cloud, or some junior dev or desktop-support guy hacks it together. And then who knows if the backups are even working.
Fact is a lot of these companies are on the cloud because their internal IT was a total fail.
When you need to scale up and don't want that $160 to increase 10x to handle the additional load, the numbers start making more sense: three months' worth of the projected increase paid upfront is around $4.3k, which is good money for a few days' work on the setup/migration, and it remains a good deal for you since you break even after three months and keep pocketing the savings indefinitely from that point on.
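Spelled out with the same (assumed) numbers:

    # Back-of-the-envelope figures from the comment above; all of them assumptions.
    current_bill = 160                        # $/month on the cloud today
    scaled_bill = current_bill * 10           # bill under 10x the load
    monthly_increase = scaled_bill - current_bill
    migration_fee = 3 * monthly_increase      # "three months' worth", the break-even point
    print(monthly_increase, migration_fee)    # 1440 4320  (~$4.3k)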
Of course, my comment wasn't aimed at those who successfully keep their cloud bill in the low 3-figures, but the majority of companies with a 5-figure bill and multiple "infrastructure" people on payroll futzing around with YAML files. Even half the achieved savings should be enough incentive for those guys to learn something new.
But initial setup is maybe 10% of the story. The day-2 operations of monitoring, backups, scaling, and failover still need to happen, and they still require expertise.
If you bring that expertise in house, it costs much more than 10x ($3/day -> $30/day = $10,950/year).
If you get the expertise from experts who are juggling you along with a lot of other clients, you get something like PlanetScale or CrunchyData, which are also significantly more expensive.
Most monitoring solutions support Postgres and don't actually care where your DB is hosted. Of course this only applies if someone was actually looking at the metrics to begin with.
> backups
Plenty of options to choose from depending on your recovery time objective. From scheduled pg_dumps to WAL shipping to disk snapshots and a combination of them at any schedule you desire. Just ship them to your favorite blob storage provider and call it a day.
> scaling
That's the main reason I favor bare-metal infrastructure. There is no way anything on the cloud (at a price you can afford) can rival the performance of even a mid-range server, so scaling is effectively never an issue; if you're outgrowing that, the conversation we're having is not about getting a bigger DB but about using multiple DBs and sharding at the application layer.
> failover still needs to happen
Yes, get another server and use Patroni/etc. Or just accept the occasional downtime and up to 15 mins of data loss if the machine never comes back up. You'd be surprised how many businesses are perfectly fine with this. Case in point: two major clouds had hour-long downtimes recently and everyone basically forgot about it a week later.
> If you bring that expertise in house
Infrastructure should not require continuous upkeep/repair. You wouldn't buy a car that requires you to have a full-time mechanic in the passenger seat at all times. If your infrastructure requires this, you should ask for a refund and buy from someone who sells more reliable infra.
A server will run forever once set up unless hardware fails (and some hardware can be redundant with spares provisioned ahead of time to automatically take over and delay maintenance operations). You should spend a couple hours a month max on routine maintenance which can be outsourced and still beats the cloud price.
I think you're underestimating the amount of tech all around you that is essentially *nix machines that somehow just... work, despite having zero upkeep or maintenance. Modern hardware is surprisingly reliable, and most outages are caused by operator error when people are (potentially unnecessarily) messing with stuff rather than by the hardware failing.
At 160/mo you are using so little you might as well host off of a raspberry pi on your desk with a USB3 SSD attached. Maintenance and keeping a hot backup would take a few hours to set up, and you're more flexible too. And if you need to scale, rent a VPS or even dedicated machine from Hetzner.
An LLM could set this up for you, it's dead simple.
I'm not going to put customer data on a USB-3 SSD sitting on my desk. Having a small database doesn't mean you can ignore physical security and regulatory compliance, particularly if you've still got reasonable cash flow. Just as one example, some of our regulatory requirements involve immutable storage - how am I supposed to make an SSD that's literally on my desk immutable in any meaningful way? S3 handles this in seconds. Same thing with geographically distributed replicas and backups.
I also disagree that the ongoing maintenance, observability, and testing of a replicated database would take a few hours to set up and then require zero maintenance and never ping me with alerts.
At my last two places it very quickly got to the point where the technical complexity of deployments, managing environments, dealing with large piles of data, etc. meant that we needed to hire someone to deal with it all.
They actually preferred managing VMs and self hosting in many cases (we kept the cloud web hosting for features like deploy previews, but that’s about it) to dealing with proprietary cloud tooling and APIs. Saved a ton of money, too.
On the other hand, the place before that was simple enough to build and deploy using cloud solutions without hiring someone dedicated (up to at least some pretty substantial scale that we didn’t hit).
> Every company out there is using the cloud and yet still employs infrastructure engineers to deal with its complexity. The "cloud" reducing staff costs is and was always a lie.
This doesn’t make sense as an argument. The reason the cloud is more complex is because that complexity is available. Under a certain size, a large number of cloud products simply can’t be managed in-house (and certainly not altogether).
Also your argument is incorrect in my experience.
At a smaller business I worked at, I was able to use these services to achieve uptime and performance that I couldn’t achieve self-hosted, because I had to spend time on the product itself. So yeah, we’d saved on infrastructure engineers.
At larger scales, what your false dichotomy suggests also doesn’t actually happen. Where I work now, our data stores are all self-managed on top of EC2/Azure, where performance and reliability are critical. But we don’t self-host everything. For example, we use SES to send our emails and we use RDS for our app DB, because their performance profiles and uptime guarantees are more than acceptable for the price we pay. That frees up our platform engineers to spend their energy on keeping our uptime on our critical services.
>At a smaller business I worked at, I was able to use these services to achieve uptime and performance that I couldn’t achieve self-hosted, because I had to spend time on the product itself. So yeah, we’d saved on infrastructure engineers.
How sure are you about that one? All of my Hetzner VMs reach an uptime of 99.9-something percent.
I could see more than one small-business stack fitting onto a single one of those VMs.
100% certain because I started by self hosting before moving to AWS services for specific components and improved the uptime and reduced the time I spent keeping those services alive.
A Django+Celery app behind Nginx back in the day. Most maintenance would be discovering a new failure mode:
- certificates not being renewed in time
- Celery eating up all RAM and having to be recycled
- RabbitMQ getting blocked requiring a forced restart
- random issues with Postgres that usually required a hard restart of PG (running low on RAM maybe?)
- configs having issues
- running out of inodes
- DNS not updating when upgrading to a new server (no CDN at the time)
- data centre going down, taking the provider’s email support with it (yes, really)
Bear in mind I’m going back a decade now, my memory is rusty. Each issue was solvable but each would happen at random and even mitigating them was time that I (a single dev) was not spending on new features or fixing bugs.
Er… what? Even in today’s world with Docker, you have differences between dev and prod. For a start, one is accessed via the internet and requires TLS configs to work correctly. The other is accessed via localhost.
Just FYI: you can put whatever you want in /etc/hosts; it gets consulted before the resolver. So you can run your website on localhost with your regular hostname over HTTPS.
Just because your VM is running doesn't mean the service is accessible. Whenever there's a large AWS outage it's usually not because the servers turned off. It also doesn't guarantee that your backups are working properly.
If you have a server where everything is on the server, the server being on means everything is online... There is not a lot of complexity going on inside a single server infrastructure.
I mean just because you have backups does not mean you can restore them ;-)
We do test backup restoration automatically, and also manually on a quarterly basis, but you should do the same with AWS.
Otherwise, how do you know you can restore system A without impacting its other dependencies, D and C?
Yes, mix-and-match is the way to go, depending on what kind of skills are available in your team. I wouldn't touch a mail server with a 10-foot pole, but I'll happily self-manage certain daemons that I'm comfortable with.
Just be careful not to accept more complexity just because it is available, which is what the AWS evangelists often try to sell. After all, we should always make an informed decision when adding a new dependency, whether in code or in infrastructure.
Of course AWS are trying to sell you everything. It’s still on you and your team to understand your product and infrastructure and decide what makes sense for you.
> Do you account for frequency and variety of wakeups here?
Yes. In my career I've dealt with way more failures due to unnecessary distributed systems (that could have been one big bare-metal box) rather than hardware failures.
You can never eliminate wake-ups, but I find bare-metal systems have far fewer moving parts, which means you eliminate a whole bunch of failure scenarios and are only left with actual hardware failure (and HW is pretty reliable nowadays).
If this isn't the truth. I just spent several weeks, on and off, debugging a remote hosted build system tool thingy because it was in turn made of at least 50 different microservice type systems and it was breaking in the middle of two of them.
There was, I have to admit, a log message that explained the problem... once I could find the specific log message and understand the 45 steps in the chain that got to that spot.
In-house vs Cloud Provider is largely a wash in terms of cost. Regardless of the approach, you are going to need people to maintain stuff, and people cost money. Similarly, compute and storage cost money, so what you lose on the swings you gain on the roundabouts.
In my experience you typically need fewer people when using a Cloud Provider than in-house (or the same number of people can handle more instances) due to increased leverage. Whether you can maximize that leverage depends on how good your team is.
US companies typically like to minimize headcount (either through accounting tricks or outsourcing) so usually using a Cloud Provider wins out for this reason alone. It's not how much money you spend, it's how it looks on the balance sheet ;)
I don’t think it’s a lie, it’s just perhaps overstated. The number of staff needed to manage a cloud infrastructure is definitely lower than that required to manage the equivalent self-hosted infrastructure.
Whether or not you need that equivalence is an orthogonal question.
> The number of staff needed to manage a cloud infrastructure is definitely lower than that required to manage the equivalent self-hosted infrastructure.
There's probably a sweet spot where that is true, but because cloud providers offer more complexity (self-inflicted problems) and use PR to encourage you to use it ("best practices" and so on), in every cloud-hosted shop I've been in over a decade of experience I've seen multiple full-time infra people being busy with... something?
There was always something to do, whether to keep up with cloud provider changes/deprecations, implementing the latest "best practice", debugging distributed systems failures or self-inflicted problems and so on. I'm sure career/resume polishing incentives are at play here too - the employee wants the system to require their input otherwise their job is no longer needed.
Maybe in a perfect world you can indeed use cloud-hosted services to reduce/eliminate dedicated staff, but in practice I've never seen anything but solo founders actually achieve that.
Exactly. Companies with cloud infra often still have to hire infra people or even an infra team, but that team will be smaller than if they were self-hosting everything, in some cases radically smaller.
I love self-hosting stuff and even have a bias towards it, but the cost/time tradeoff is more complex than most people think.
Working in a university lab, self-hosting is the default for almost everything. While I would agree that costs are quite low, I would sometimes be really happy to throw money at problems to make them go away. Not having had the chance (and thus being no expert), I really do see the appeal of scaling up and down quickly in the cloud. We ran a Postgres database of a few hundred GB with multiple read replicas and we managed somehow, but we really hit the limits of our expertise at some point. Eventually we stopped migrating to newer database schemas because it was just such a hassle keeping availability. If I had the money as a company, I guess I would have paid for a hosted solution.
The fact that as many engineers are on payroll doesn't mean that "cloud" is not an efficiency improvement. When things are easier and cheaper, people don't do less or buy less. They do more and buy more until they fill their capacity. The end result is the same number (or more) of engineers, but they deal with a higher level of abstraction and achieve more with the same headcount.
I can't talk about staff costs, but as someone who's self-hosted Postgres before, using RDS or Supabase saves weeks of time on upgrades, replicas, tuning, and backups (yeah, you still need independent backups, but PITRs make life easier). Databases and file storage are probably the most useful cloud functionality for small teams.
If you have the luxury of spending half a million per year on infrastructure engineers then you can of course do better, but this is by no means universal or cost-effective.
Well sure you still have 2 or 3 infra people but now you don’t need 15. Comparing to modern Hetzner is also not fair to “cloud” in the sense that click-and-get-server didn’t exist until cloud providers popped up. That was initially the whole point. If bare metal behind an API existed in 2009 the whole industry would look very different. Contingencies Rule Everything Around Me.
You are missing that most services don't have high availability needs and don't need to scale.
Most projects I have worked on in my career have never seen more than a hundred concurrent users. If something goes down on Saturday, I am going to fix it on Monday.
I have worked on internal tools where I just added a Postgres DB to the Docker setup and that was it. Five minutes of work and no issues at all. Sure, if you have something customer-facing you need to do a bit more and set up a good backup strategy, but that really isn't magic.
> at a company that has reached the scale where employing engineers to manage self-hosted databases is more cost effective than outsourcing.
This is the crux of one of the most common fallacies in software engineering decision making today. I've participated in a bunch of architecture / vendor evaluations that concluded managed services are more cost effective almost purely because they underestimated (or even discarded entirely) the internal engineering cost of vendor management. Black-box debugging is one of the most time-consuming engineering pursuits, & even when it's something widely documented & well supported like RDS, it's only really tuned for the lowest common denominator - the complexities of tuning someone else's system at scale can really add up to only marginally less effort than self-hosting (if there's any difference at all).
But most importantly - even if it's significantly less effort than self-hosting, it's never effectively costed when evaluating trade-offs - that's what leads to this persistent myth about the engineering cost of self-hosting. "Managing" managed services is a non-zero cost.
Add to that the ultimate trade-off of accountability vs availability (internal engineers care less about availability when it's out of their hands - but it's still a loss to your product either way).
> Black-box debugging is one of the most time-consuming engineering pursuits, & even when it's something widely documented & well supported like RDS, it's only really tuned for the lowest common denominator - the complexities of tuning someone else's system at scale can really add up to only marginally less effort than self-hosting (if there's any difference at all).
I'm really not sure what you're talking about here. I manage many RDS clusters at work. I think in total, we've spent maybe eight hours over the last three years "tuning" the system. It runs at about 100kqps during peak load. Could it be cheaper or faster? Probably, but it's a small fraction of our total infra spend and it's not keeping me up at night.
Virtually all the effort we've ever put in here has been making the application query the appropriate indexes. But you'd do that no matter how you host your database.
Hell, even the metrics that RDS gives you for free make the thing pay for itself, IMO. The thought of setting up grafana to monitor a new database makes me sweat.
> even the metrics that RDS gives you for free make the thing pay for itself, IMO. The thought of setting up grafana to monitor a new database makes me sweat.
Sure, and I can install something to do RDS performance insights without querying PG stats, and something to schedule backups to another region, and something to aggregate the logs, and then I have N more things that can break.
Ultimately, it depends on your stack & your bottlenecks. If you can afford to run slower queries then focusing your efforts elsewhere makes sense for you. We run ~25kqps average & mostly things are fine, but when on-call pages come in, query performance is a common culprit. The time we've spent on that hasn't been significantly different from the self-hosted persistence backends I've worked with (probably less time spent, but far from orders of magnitude - certainly not worthy of a bullet point in the "pros" column when costing application architectures).
But that almost certainly has to do with index use and configuration, not whether you're self hosting or not. RDS gives you essentially all of the same Postgres configuration options.
It's not. I've been in a few shops that use RDS because they think their time is better spent doing other things.
Except now they are stuck trying to maintain and debug Postgres without the same visibility and agency they would have if they hosted it themselves. The situation isn't at all clear-cut.
One thing unaccounted for, if you've only ever used cloud-hosted DBs, is just how slow they are compared to a modern server with NVMe storage.
This leads the developers to do all kinds of workarounds and reach for more cloud services (and then integrating them and - often poorly - ensuring consistency across them) because the cloud hosted DB is not able to handle the load.
On bare-metal, you can go a very long way with just throwing everything at Postgres and calling it a day.
This is the reason I manage SQL Server on a VM in Azure instead of their PaaS offering. The fully managed SQL has terrible performance unless you drop many thousands a month. The VM I built is closer to 700 a month.
Running on IaaS also gives you more scalability knobs to tweak: SSD IOPS and bandwidth, multiple drives for logs/partitions, memory-optimized VMs, and there are a lot of low-level settings that aren't accessible in managed SQL. Licensing costs are also horrible with managed SQL Server, where it seems like you pay the Enterprise level, whereas running it yourself offers lower-cost editions like Standard or Web.
I use Google Cloud SQL for PostgreSQL and it's been rock solid. No issues; troubleshooting works fine; all extensions we need already installed; can adjust settings where needed.
It's more of a general condition - it's not that RDS is somehow really faulty, it's just that when things do go wrong, it's not really anybody's job to introspect the system because RDS is taking care of it for us.
In the limit I don't think we should need DBAs, but as long as we need to manage indices by hand, think more than 10 seconds about the hot queries, manage replication, tune the vacuumer, track updates, and all the other rot - then actually installing PG on a node of your choice is really the smallest of the problems you face.
I also encourage people to just use managed databases. After all, it is easy to replace such people. Heck actually you can fire all of them and replace the demand with genAI nowadays.
The discussion isn't "what is more effective". The discussion is "who wants to be blamed in case things go south". If you push the decision to move to self-hosted and then one of the engineers fucks up the database, you have a serious problem. If same engineer fucks up cloud database, it's easier to save your own ass.
Agreed. As someone in a very tiny shop, all us devs want to do as little context switching to ops as possible. Not even half a day a month. Our hosted services are in aggregate still way cheaper than hiring another person. (We do not employ an "infrastructure engineer").
grok/claude/gpt: "Write a concise Bash script for setting up an automated daily PostgreSQL database backup using pg_dump and cron on a Linux server, with error handling via logging and 7-day retention by deleting older backups."
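For what it's worth, a rough Python equivalent of what that prompt asks for (the paths, database name, and retention below are placeholders; connection details are assumed to come from the usual PG* environment variables):

    #!/usr/bin/env python3
    """Sketch of a daily PostgreSQL backup with logging and 7-day retention."""
    import datetime
    import logging
    import pathlib
    import subprocess

    BACKUP_DIR = pathlib.Path("/var/backups/postgres")  # placeholder
    DATABASE = "appdb"                                   # placeholder
    RETENTION_DAYS = 7

    def main() -> None:
        BACKUP_DIR.mkdir(parents=True, exist_ok=True)
        logging.basicConfig(filename=str(BACKUP_DIR / "backup.log"), level=logging.INFO,
                            format="%(asctime)s %(levelname)s %(message)s")
        stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        target = BACKUP_DIR / f"{DATABASE}-{stamp}.dump"
        try:
            # Custom-format dump so pg_restore can later restore selectively.
            subprocess.run(
                ["pg_dump", "--format=custom", f"--file={target}", DATABASE],
                check=True, capture_output=True, text=True,
            )
            logging.info("backup written: %s", target)
        except subprocess.CalledProcessError as exc:
            logging.error("pg_dump failed: %s", exc.stderr)
            raise
        # Drop dumps older than the retention window.
        cutoff = datetime.datetime.now() - datetime.timedelta(days=RETENTION_DAYS)
        for old in BACKUP_DIR.glob(f"{DATABASE}-*.dump"):
            if datetime.datetime.fromtimestamp(old.stat().st_mtime) < cutoff:
                old.unlink()

    if __name__ == "__main__":
        main()  # schedule daily from cron, e.g. 0 3 * * * /usr/bin/python3 /opt/pg_backup.py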
What is needed to evaluate OCR for most business applications (above everything else) is accuracy.
Some results look plausible but are just plain wrong. That is worse than useless.
Example: the "Table" sample document contains chemical substances and their properties. How many numbers did the LLM output and associate correctly? That is all that matters. There is no "preference" aspect that is relevant until the data is correct. Nicely formatted incorrect data is still incorrect.
I reviewed the output from Qwen3-VL-8B on this document. It mixes up the rows, resulting in many values associated with the wrong substance. I presume using its output for any real purpose would be incredibly dangerous. This model should not be used for such a purpose. There is no winning aspect to it. Does another model produce worse results? Then both models should be avoided at all costs.
Are there models available that are accurate enough for this purpose? I don't know. It is very time consuming to evaluate. This particular table seems pretty legible. A real production grade OCR solution should probably need a 100% score on this example before it can be adopted. The output of such a table is not something humans are good at reviewing. It is difficult to spot errors. It either needs to be entirely correct, or the OCR has failed completely.
I am confident we'll reach a point where a mix of traditional OCR and LLM models can produce correct and usable output. I would welcome a benchmark where (objective) correctness is rated separately from the (subjective) output structure.
Edit: Just checked a few other models for errors on this example.
* GPT 5.1 is confused by the column labelled "C4" and mismatches the last 4 columns entirely. And almost all of the numbers in the last column are wrong.
* olmOCR 2 omits the single value in column "C4" from the table.
* Gemini 3 produces "1.001E-04" instead of "1.001E-11" as viscosity at T_max for Argon. Off by 7 orders of magnitude! There is zero ambiguity in the original table. On the second try it got it right. Which is interesting! I want to see this in a benchmark!
There might be more errors! I don't know, I'd like to see them!
This is a philosophy, one to which many people who write Ruby subscribe. The fundamental idea is: create a DSL that makes it very easy to implement your application. It is what made Rails different when it was created: it is a DSL that makes expressing web applications easy.
I don't know its history well enough, but it seems to originate from Lisp. PG wrote about it before [1].
It can result in code that is extremely easy to read and reason about. It can also be incredibly messy. I have seen lots of examples of both over the years.
It is the polar opposite of Go's philosophy (be explicit & favour predictability across all codebases over expressiveness).
For some additional context: many old pigments were not stable at all.
https://www.vangoghstudio.com/what-were-the-original-colors-...