Rob Pike’s Rules of Programming (1989) (utexas.edu)
631 points by gjvc on Aug 12, 2020 | 323 comments


> Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.

I wish people would follow this rule and just let stuff work. I recently encountered the most extreme version of this I've ever seen in my career: a design review where a guy proposed a Redis caching layer and a complex custom lookup scheme for a <1GB, moderate read volume, super low write volume MySQL database. And of course he wants to put the bulk of the data in JSON fields and manage any schema evolution in our application code.

Can't we just let stuff work? I'm no fan of MySQL, but can't we admit that a ubiquitous and battle-tested piece of technology, applied to a canonical use case, on tiny data under near-ideal circumstances, is probably going to work just fine? At least give it a chance before you spend days designing and documenting a bunch of fancy tricks to save MySQL from being crushed under a few megabytes of data.


The case you've seen speaks to the guy's inexperience and lack of understanding.

I have a problem with this rule because what I see happening is people taking it to heart and no longer thinking about what they're doing performance-wise. And then the program runs 1000x slower than it should, at no extra gain (and often a loss) of readability or safety, just because someone decided to use an O(n) data structure where O(1) would do, or keeps repeating the same computation a thousand times instead of ensuring it's done once.

So to your "let stuff work", I want to also add: "understand the work you're putting the computer through", and "don't do stupid things".


Architecture is not optimization. This gets fuzzy with the decision to use caching, static generation, etc. Ideally, redis or memcached can be added when needed. It will require some changing of the app but hopefully in limited places.

This rule a la Pike is about doing things like writing assembly or manually unrolling loops in noncritical parts of code. However, in some code, almost everything is on the critical path, and that requires architecture. I’m thinking of Carmack’s single function game loop here.

I remember working on a project that claimed to want 10,000 transactions per second. “Okay”, I said, “how much can be accomplished in 100us?”

They looked at me like I was an idiot. “No, it’s going to be clustered and pipelined.”

“Ok, if you can do that maybe you have a budget of 1-5ms.”

“No, we’re going to give each transaction up to a second to get done”

I smiled and admitted that they were obviously a lot smarter than me. Oddly enough the product never saw the light of day.


Don’t worry, we’ll use serverless! So we can support an unlimited amount of transactions per second!


Latency and throughput are not the same thing. A CPU can process multiple instructions per clock cycle, yet an individual instruction takes longer than a clock cycle.


That’s not obviously absurd. If Amdahl’s law permits then Little’s law tells you whether or not it’s possible.
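For anyone who hasn't seen it, a rough Python sketch of that back-of-the-envelope math (the numbers are just the ones from the anecdote above):

    # Little's law: L = lambda * W
    # (average number in flight = arrival rate * average time in system)
    target_tps = 10_000                      # desired throughput

    for latency_s in (100e-6, 5e-3, 1.0):    # the per-transaction budgets discussed above
        in_flight = target_tps * latency_s   # concurrent transactions you must sustain
        print(f"{latency_s}s per transaction -> {in_flight:.0f} in flight at once")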


Yep. As I said I’m not smart.


Wow, you are really good at this.

Let the "geniuses" figure out that to do 10k transactions/sec with processes that can handle 1 transaction/sec would take.....10k processes. Sure, doable, in specific contexts, given enough resources. But they sure as heck aren't free resources!


I think maybe you are misreading the rule, it doesn't say don't optimize, it says when optimizing, don't guess from the code where the bottleneck is, go measure it.


Yeah, but some things, like how you structure your data (which then drives CPU cache misses), aren't something you can easily adjust/tune.

Usually when you encounter one of those it's a rewrite/rearchitecture of a whole module/subsystem before you see any gains. Been there done that, not excited to repeat it again.


The other rule is choose your data structures wisely.


The number of people who don't understand when to use a list/array vs. dictionary/hash table vs. lookup object is too damn high. A huge amount of basic optimization is constantly at their fingertips, and they nearly always choose to make a list/array and use LINQ to join/merge multiple relational data sets instead of optimized standard objects.
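The simplest illustration of the kind of win that's sitting at people's fingertips (Python here rather than LINQ, purely as a sketch):

    import random, time

    ids = list(range(100_000))
    wanted = random.sample(ids, 1_000)

    t0 = time.perf_counter()
    hits = [w for w in wanted if w in ids]        # O(n) scan per lookup
    t_list = time.perf_counter() - t0

    id_set = set(ids)                             # build the hash table once
    t0 = time.perf_counter()
    hits = [w for w in wanted if w in id_set]     # O(1) average per lookup
    t_set = time.perf_counter() - t0

    print(f"list membership: {t_list:.4f}s, set membership: {t_set:.6f}s")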


sure, sometimes your approach is at fault, but as the rule says, sometimes it's just a surprising bottleneck. Having done a lot of embedded work, I've found that many of those can often be fixed in a straightforward manner. Sometimes, you have to change your approach though.


It is all about lack of knowledge because checking where the bottlenecks are is one thing. Knowing why they are there is another.

A retailer app was very slow. The developers proposed a newer faster server with a newer Oracle version.

Then I took a look at one of the slowest queries. Changed it so it could use the indexes in a better way and the query went from over 20 seconds to 0.7 seconds.

The developers measured the query was slow, they checked the query used indexes and they were right that a server with more RAM and a newer Oracle version could improve the speed. But they missed that the query had to fetch a lot of data (using indexes) before it could start filtering the data. The only thing I did was to change the query so it could filter the data first.


Bolt - 95 cents

knowing where to put the bolt - 95 bucks per hour.


I've had many similar experiences with Postgres. I always seem to be the one adding the query logging to the apps. Then I feed the prod queries into the query planner. Then run the results through this tool: http://tatiyants.com/pev/#/plans/new Tweak the queries, add/remove indexes. Voilà.


The developers didn't actually know where the problem was because they didn't root-cause it. They knew it lay within a certain pretty large bounding box (a particular query), but didn't drill into it further, beyond checking that indices were used (eliminating some inadvertent unindexed search as the root cause).

If you actually know where (or each one of the multiple wheres if several places collude), that is usually very close to knowing why; often the same.

They should have had the intuition that if a query takes 20 seconds, even in a testing scenario where the system is not bogged down, it must be churning through a lot of data all over the place. Then think: does the query actually need to be looking at a lot of data? Maybe it's wastefully looking at more records than necessary. They didn't imagine what the machine might have to do to satisfy the query, just accepting it as a black box that the DB has optimized as well as it can be (so just throw hardware at it).


Sounds like ops vs dev (or DBA) right there; it does make sense to a point that ops would grow into that kind of behaviour, given that they can't really allocate resources into performance enhancements for software (especially if it's from an external party or off-the-shelf). To them it's a black box.

But if it's an in-house thing then there should be open communication lines. The SRE paradigm makes sense there, where I see a SRE as part dev, part ops. They can identify perf issues as a software problem and either fix it or send it back to the authors.


It was more like dev & dev.

I am not criticizing anyone because I was just lucky to notice the query could be rewritten. But it was also the fact that I had a little more knowledge about how the db engine handles queries.

So we all looked in the right direction, we all found the bottleneck but the solution was different based on different knowledge.


I would go so far as to say, their "solution" was not a solution at all. Granted, in the absence of being able to recognize alternatives, it may have been the best available option left, but it fundamentally missed the root cause of the query slowness.

Your experience exactly shows why having a diversity of opinions/backgrounds/expertise on a team is a very valuable trait. Had no one realized you could rewrite the query, would scaling vertically and upgrading Oracle been a fatal mistake for the team? Probably not, but damn if it wouldn't have been a big waste of time/money.


This is particularly exasperating for me. I can't tell you how many times in my professional career I've ended up speeding up systems by removing two or three layers of improperly-implemented "caching" and using good ol' MySQL and a basic understanding of algorithmic time complexity to simplify things.


Me too. I've seen a few systems where a simple request requiring 10,000 queries was "optimized" into requiring 10,000 cache lookups, when they should have just added some joins. The bottleneck is the network latency, not the database. The worst I've seen is an NHibernate cache stored in a session variable; half the database was being serialized/deserialized on every HTTP request. Fortunately that was a small database.

Even with in memory caches I've seen systems grind to a halt by death of a thousand cuts, dictionary based entity attribute systems where each attribute is looked up individually. There seems to be a mentality that constant lookup == free lookup and devs don't seem to realize that constant * $bignumber == $biggerNumber. Caching shouldn't be granular.

Obligatory latency numbers every programmer should know: https://gist.github.com/jboner/2841832
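The shape of the fix is usually something like this (hypothetical sketch; fetch_order / fetch_orders_with_details stand in for whatever data-access layer is in play -- the point is the round trips):

    def load_orders_n_plus_1(db, order_ids):
        # One network round trip per order: 10,000 ids -> 10,000 * ~1 ms of latency,
        # regardless of whether each hop goes to the database or to a cache.
        return [db.fetch_order(oid) for oid in order_ids]

    def load_orders_joined(db, order_ids):
        # One round trip; the database does the join/filter work and streams the rest.
        return db.fetch_orders_with_details(order_ids)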


It very much depends. I've seen too many cases when business logic was moved to the database layer, with absolutely disastrous effects. Scaling application backends doing map-reduce on pre-filtered datasets is way easier than scaling a database to handle five-way joins. Yet I've seen opposing cases too, when an RDBMS was used as a huge key-value store and was pushing hundreds of thousands of rows to tens of backends, sucking its bandwidth dry.


Having worked with Hibernate recently on a smallish database (that isn't expected to grow too big), my takeaway is to minimize the network trips. If you have to check something for a list/set of things and all of that can be done from, say, two not-so-big tables, fetch the superset of records via a join with as many criteria pushed into predicates as possible and handle the rest of the logic in application code. It is so much quicker. Of course, like most things in software, conditions apply: network latency, the total number of records being fetched by the query, the potential for data to suddenly grow, etc.


Not caching simple trips to the database is pretty uncontroversial; there are more controversial cases elsewhere.

Say you consume a RESTful API to hydrate order data in another system. Do you fetch orders as:

  /orders?ids=id-1,id-2,id-3
or separate calls to

  /orders/id-x
which can be cached and retrieved by id as a memoized function?

Well if you had to pick only one the second is probably better, but the best would be abstracting away order fetching in application code to always fetch single orders and behind the scenes looking up the cache for singles and pooling all the misses into a single request to the plural endpoint.
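Something like this, as a rough sketch (cache is any dict-like store, and fetch_orders_bulk is a stand-in for a call to the plural /orders?ids=... endpoint):

    def get_orders(order_ids, cache, fetch_orders_bulk):
        found = {oid: cache[oid] for oid in order_ids if oid in cache}
        misses = [oid for oid in order_ids if oid not in found]

        if misses:
            # one request to the plural endpoint for everything we don't have yet
            for order in fetch_orders_bulk(misses):
                cache[order["id"]] = order
                found[order["id"]] = order

        return [found[oid] for oid in order_ids]   # preserve the caller's ordering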


If this is a common occurrence the former would almost always be better, the second would be absolutely glacial.

> but the best would be abstracting away order fetching in application code to always fetch single orders and behind the scenes looking up the cache for singles and pooling all the misses into a single request to the plural endpoint.

Unless a significant amount of requests have 100% cache hits then I doubt a local cache will make much of a difference at all, all it's saving is a bit of bandwidth.


I had worked on an application that saw less than 100 writes per minute and about 1k reads per min and used a caching layer in front of the DB. Not only was the cache actually slowing us down, it was also inconsistent with the DB. Can't even begin to express the amount of lost dev time and productivity. We couldn't get rid of it, because someone above was convinced that we would need it to scale in the future.


There are only two hard things in computer science: naming things, cache invalidation, and off-by-one errors.

I'm trying to never implement any caching if I can help it. The database itself does caching already as well.

And if you DO need caching, keep your hands off of the application; add a cache layer in front, or between the application and the database. But don't invent it yourself.


  And if you DO need caching, keep your hands off of the application; add a cache layer in front, or between the application and the database. But don't invent it yourself.

so, redis?


My tech lead once sped up our system by just turning off Memcached. :-)


My problem was people using ORM frameworks and having very little knowledge of the database they were using. Slow as molasses, difficult to figure out what is going on because of many layers of extra "stuff".


Same boat. Basically every day.


It’s insane how often people stick Memcached/Redis in front of MySQL/Postgres, completely unnecessarily. So many otherwise decent developers assume that MySQL/Postgres is too slow/doesn’t scale, without even trying. Or similarly, how frequently people shard databases unnecessarily. A single MySQL/Postgres server, on a beefy machine, can handle an absolutely massive amount of work, with great performance, assuming you’ve indexed well and don’t have to run super crazy queries.

I’ve worked on some pretty high traffic systems, with pretty large data volumes, and very, very rarely have I actually needed to cache simple DB reads. And it’s hard to properly cache complex ones anyways, because cache invalidation is so hard for complex queries. Likewise, it’s very rare that I’ve needed sharding. SOMETIMES you need these things, but mostly people are adding a lot of extra complexity and cost for no good reason.


I don’t think people realize that databases themselves have a caching layer internally. Redis isn’t magical, you still have to send network packets to the other server.

Even when reading from disk via mmap, hot pages are in memory.

Sometimes postgresql is faster than redis because all it needs to do is read something from memory and spit it out in the right format.


Totally. If the hot pages of your indexes and hot pages of your records mostly fit in memory, DBs mostly read from memory anyways. If they don’t mostly fit in memory, things can slow down a bit, but you’re likely better off vertically scaling your DB (more RAM) than adding a cache.

Looking up a normal sized record by id, with good networking in your data centre, takes ~1-2 ms round trip whether you’re reading from Redis/Memcached or MySQL/Postgres, and either can handle massive read load if sized properly. The cache just ~doubles your costs, is one more thing to patch/maintain, and introduces new sorts of bugs/outages.


I ran into this early in my career when a senior engineer at a startup reviewed my Python PR for operating on a CSV dataset and insisted I rewrite it in Pandas for performance. It was a very simple program with naive Python (a handful of lines), but the Pandas version was far longer and more complex (I had to have the senior engineer help because it took some advanced Pandas-fu, and he spent nearly a full day on it), and it ended up being an order of magnitude slower because we ultimately had to call back into Python for each cell.
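(For illustration only, with made-up column names, the shape of the two versions was roughly:)

    import csv

    # The naive version: one pass over the file with the standard library.
    def total_discounted(path):
        total = 0.0
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                total += float(row["price"]) * (1 - float(row["discount"]))
        return total

    # The "optimized" Pandas version: .apply with a Python lambda still calls
    # back into the interpreter per row, so the DataFrame machinery buys nothing here.
    #   df = pd.read_csv(path)
    #   total = df.apply(lambda r: r["price"] * (1 - r["discount"]), axis=1).sum()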

I actually don’t think “premature optimization” is all that bad in the general case, but you have to be educated about it (and I’ve met a lot of people in Python shops who think that Pandas or multiprocessing will cure all performance ails), and you should specifically think about how costly is a given optimization going to be to maintain or back out of if you’re wrong about the performance benefits.

In general, I’ve never worked on a Python project where we didn’t have to do weird, inordinately expensive things to work around performance issues (though I’ve never worked on a rudimentary CRUD app either) nor has “just throw Pandas/C/multiprocessing at it” ever adequately addressed our most significant performance bottlenecks (usually the solution looks something like Spark, all to do something that would have been sufficiently performant with naive Java or Go). This might just be my experience working on nontrivial SaaS apps; maybe if you’re just doing straight data science or CRUD apps or workloads that aren’t latency-sensitive (mind you, we struggled to keep per-request performance in the tens of seconds, so I use “latency-sensitive” very loosely), Python/Pandas will be just fine. We also ran into other problems with Python, such as packaging and distribution; notably our lambda functions were routinely too large because the pandas branch of the dependency tree was itself more than half of the permitted artifact size (to work around, we switched to Fargate tasks, which have a much larger size limit but take 30s or minutes to boot up).


"I ran into this early in my career when a senior engineer at a startup reviewed my Python PR for operating on a CSV dataset and insisted I rewrite it in Pandas for performance."

Did a performance problem exist?

If not, then I would never promote that person to senior engineer. A working program should only be re-engineered for performance if it wasn't meeting the performance contract agreed upon when the program was written.


No, the performance problem didn’t exist. The engineer in question was really smart in many other areas, but “engineering” per se wasn’t his strength. Things are fast-and-loose in startup world.


Yes, startups are the wild, wild west. I spent 1994-97 in startups. I loved the lack of bureaucracy, but we spent too much time in firefighting mode. Through trial and error I found that medium-small organizations in the non-profit world were my best fit.

Early on I had a senior guy mentoring me on a project involving a tool called PowerBuilder. He chose a design that didn't fit the problem-space well but it fit the "PowerBuilder Best Practices" so he implemented it. The performance was abominable and he should have known it would be: he too was a smart guy. But he had a hard time seeing "big picture" design.


Do you work at my company? Because I have a coworker that is always proposing _exactly_ that solution. No matter what issues the code has, for him its usage of MySQL is always "the worst moment".

Asking for benchmark just gets a repeat of "our worst moment is MySQL and we can solve that with some NoSQL cache".


My dude's just trying to get a promo, I guarantee it.


We all start out this way, don't we? I think what drove it out of me was having to maintain things. Okay, you wrote this thing, now you need to change the way a few features work and add some others. Especially effective is taking turns being on call, getting paged in the middle of the night when your multilayer architecture has a hiccup and now you have to troubleshoot it.

I wish there was a quick remedy for such people. But usually they are new, based on what I said. So they should be Junior Developer, not Architect. Sure, you can suggest things, but the answer is nope.

A lot of people, though, may never get much experience supporting their own architectures, because they swoop in, then swoop out, never staying at a job long enough. Or support gets assigned to another group. One thing I like about Agile is that the team that creates the code is the one that supports it.


Yeah. I got told to put in a caching for a system that made very slow calls through a handful of proxies to the other side of the world and back.

We went with redis for the "PoC". I'd tried explaining that if a query is only made once every 24 hours and the data can't be cached longer than that because it's considered too out of date then a cache is pointless extra complexity.

He wasn't having it though, so he made us build it and demo it to a room full of people. Fortunately the room understood the simple explanation and he listened to them where he wouldn't listen to the dev team, so a few weeks of work was scrapped there and then.


It seems that a subset of developers are somehow convinced that adding complexity is a good thing for performance and should be done first, when it should really be a last resort. Their usual rebuttal is "it's scalable" and other buzzword-laden phrases. I wonder if it's a form of "big data envy" or just regular "architecture astronautism".


Honestly, it's the other way around usually; simplicity is scalable. A stateless service talking to a database is fine. You can scale out the service, and scale up the database. It's even easier when you use a cloud provider, their relational database offerings can scale from a wordpress blog's database to enterprise scale, tens of thousands of transactions / second.

The problem is that it's boring, and there's a lot of developers that create work and complexity to make their own jobs interesting.


No, no. We need all these layers of proxies, caches and various databases to make things more reliable!


This is taken to the other extreme many times.

Example: the Google Chrome codebase was allocating a lot of std::strings, and someone also used a Set to check membership of a single item. [1]

I mean, to put it like this, many people don't even care about algorithmic complexity.

Doesn't help that people want to write Python in the monster that is C++.

https://groups.google.com/a/chromium.org/forum/m/#!msg/chrom...


I very much agree that many people use rules like these as excuses to write shitty code.

From the post you linked though:

> Not reserving space in a vector when the size is known

std::vector::reserve() is actually not something you should always use when you are adding a number of elements, as it will (typically) grow the vector to exactly the capacity you ask for. If your function then gets called in a loop to append to the same vector multiple times, you end up with quadratic run time, which is normally avoided by the geometric growth you get when you just append without reserving.


>and also someone used a Set to check membership of single item. [1]

Depending on where it was done (fast path or not), this could be just fine.


You don't know when something ends up on the hot path.

    if (std::set(itr.begin(), itr.end()).count(element)) { _____ }

Is that more tempting for someone than something like

    if (std::find(itr.begin(), itr.end(), element) != itr.end()) { _____ }

? I don't know. That said, the C++ STL is quite undiscoverable.


>You don't know when something comes to hot path.

That's the point of the first advice though...


Wholeheartedly agree. What you're describing is like the ideal MySQL use case. Also, even the most basic next steps performance-wise will probably tide you over well into the TBs of data. The headroom for MySQL is quite a bit higher than most people think unless you're operating at Facebook scale.


I see this behavior a lot too.

Inexperienced engineers will nitpick about what is often minor “performance optimization”, clearly not seeing the bigger picture. For example, why should we spend precious developer time rewriting some code into something that is often less readable when it’s called once every 30 seconds?

To the folks who do this: you are better off spending the time making data-driven decisions and optimizing for the big picture. In other words, measure first, then come up with an optimization that has a large impact on the system. Not this micro-optimization, I-love-to-tickle-myself stuff.

Learn to see the bigger picture.


> a Redis caching layer and a complex custom lookup scheme

Bane of my life. A few gigs ago, they had a global Redis cache, a Redis cache per server, an in-app cache, and then MySQL. Needless to say, there were many, MANY bugs that came down to cache coherency and race conditions between them.

[edit: it was a global Redis, not clustered.]


Did we both work at a certain edtech company? Seems oddly specific and exactly how I remember our monolith being structured


Alas, that wasn't edtech, it was photos related.


No comment on your particular example, but the opposite approach is more common than I'd personally like and also not ideal -- if you're doing anything nontrivial it's probably worth doing a back of the envelope calculation to ensure you have enough disk, memory, and network throughput for your favorite solution to handle the expected load.


> Can't we just let stuff work? I'm no fan of MySQL, but can't we admit that a ubiquitous and battle-tested piece of technology, applied to a canonical use case, on tiny data under near-ideal circumstances, is probably going to work just fine?

I call that "separation of error-domains", meaning trying to find interfaces where you can separate off parts of your application, so debugging gets easier:

Is the bug in the database, or in our code?


I feels ya.

My most recent team had our hottest dataset in dynamodb. Because cloud. So much effort to get our web service's P99 <50ms.

The whole dataset fit easily into RAM.

Thru (way too much) effort, I was able to introduce Redis. First shared. Then eventually one mini-instance per EC2, running alongside nginx & nodejs.


> <1GB, moderate read volume, super low write volume MySQL database

It looks like an excellent candidate to put it entirely in RAM and trigger sync on writes only, why on Earth would you need anything else.


That's effectively what MySQL is going to do.


For the most part, yes, but I was thinking about explicitly preloading all tables and indexes into RAM or even using the MEMORY engine, if it makes sense in that particular case.


Hehehe, I have actually put some data that doesn't change more often than once a week hardcoded in the code base and committed to source control... whenever a change needs to happen in the data, the CI/CD runs and deploys a new version of the app.

You don't wanna know how that was done before. It is like < 1 GB of JSON as well.


I wish we had a deployment process trivial enough to support that. On the other hand, eventually you want to farm that job out to customer success or an implementation engineer, and then it's not so fun having the data in source control.


Please tell me you have git LFS configured to handle this.


Yup !


Never read the release notes for MySQL. Unless phrases like "server exit" or 'regression' don't bother you.


In The Mythical Man Month Fred Brooks said "Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious."

I first read that on Guy Steele's site: http://www.dreamsongs.com/ObjectsHaveNotFailedNarr.html


> Show me your tables, and I won't usually need your flowchart

A couple of years ago I spent quite some time trying to evaluate the tech stack (and general engineering culture) of merger/acquisition targets of my employer. It was quite a fun exercise, all said and done. I encountered all sorts, from a small startup team who had their tech more or less sorted out, to a largish organisation who relied on IBM's ESB, which exactly one person on their team understood!!

I discovered this exact method during the third tech evaluation exercise. When the team began explaining various modules top-down and user flows etc., I politely interrupted them and asked for the DB schema. It was just on a whim, because I was bored of the typical one-way session interrupted by me asking minor questions. Once I had a hang of their schema, the rest of the session was literally me telling them what their control and user flows were and them validating it.

Since then it's become my magic wand to understand a new company or team. Just go directly to the schema and work backwards.

Conversely, I've begun paying more attention to data modelling, because once a data model is fixed it's very hard to change, and once enough data accumulates the inertia just increases; instead of changing the data model (for fear of data migration etc.), the tendency is to beat the use cases to fit the data model. It's not your usual fail-fast-and-iterate thing.


I have learned to spend a good chunk of my effort and focus on the data model - it is literally the heart of the application. Once that is done correctly, I've seen that the code almost just falls into place by itself.


Based on your experience it seems that #5 ought to be first: "Data dominates"


Restated by Linus with a bit more modern nomenclature (and Linus's trademark bluntness):

> Bad programmers worry about the code. Good programmers worry about data structures and their relationships


> Bad programmers worry about the code

And yet, I see a whole swath of the industry hyper-focused on various linters/styling/rules.


> And yet, I see a whole swath of the industry hyper-focused on various linters/styling/rules.

It seems to me that what you're actually seeing is an entire industry trying to eliminate all code-related issues, specially bike-shedding ones.

This is patently obvious to anyone who was forced to waste their time in code review iterations discussing, say, where a brace should go and how many spaces someone should have added.


This. The point of the style guide, now reaching its best embodiment in clang-format, gofmt, and others, is that you don't want to waste time arguing about, or even considering for a moment the formatting of anything.


the people who might not follow explicit guides are also the ones who don't typically spend the time 'arguing' about it in the first place though.


This is usually a good time to apply the When In Rome rule. Do not reformat needlessly; follow the code style of the code you're modifying. Done.

(If multiple people are arguing back and forth in code review -- when following the WIR rule -- tell them about the WIR rule and that should settle it. If not, you have bigger problems in your team.)


Sort of. You shouldn't combine style and non-style changes in one change set. But if your project has mixed styles from file to file you should assimilate the locals over time, just as the Romans did. A consistent style has value.


I agree (in principle!) with the idea that consistency across the whole code base has value, but personally I find it extremely marginal, unless you're literally dealing with e.g. hundreds of people needing to read/modify the code. The value proposition skews heavily towards situations where you have a lot of people needing to read/modify the code.


Nobody was ever "forced to waste their time" on this stuff. I have a simple rule - I don't comment on other people's style, and if people comment on my style, I just go with their suggestions. Problem solved.


You've never been on a team with two people with opposing opinions I guess.


Just out of morbid curiosity... have you actually experienced multiple 'seniors' giving conflicting code review comments about code style (of all things)?

That sounds quite dysfunctional.

(EDIT: Sure, nitpicks may differ, but...)


> That sounds quite dysfunctional.

No, it doesn't. It sounds like the expected outcome of not enforcing an established style with automated tools.

All it takes is someone posting a merge request with a bracket out of place, or tabs instead of spaces that screw up the layout, because yes, IDEs have custom settings and a dude happened to open a source file with an editor that wasn't properly configured.

Boom, merge request receives two comments pointing out the bracket and how indentation is off.

Congrats, about 20 minutes of your team's day are wasted because that's the time it takes to receive feedback from the merge request, be briefed on the remarks, go through the code and fix whitespaces, commit your change, push those changes, update the merge request, and wait for a team member to review your update.

No drama. No dysfunctional team. No disagreement, even. But those 20 minutes of your life are lost forever.


> It sounds like the expected outcome of not enforcing an established style with automated tools.

Unfortunately those also have significant downsides around large-scale refactoring.

What I always do (and advise other code reviewers to do) is to just ask themselves: Does this code follow the local style in the file being edited?

That simplifies things greatly, IME.


Style yes, formatting no.

Worked with two developers who endlessly argued whether or not we should handle a certain bit of complexity in a certain layer or the next layer over, so we ended up handling it in both layers with the downsides of both.


Having different people using different coding styles adds a lot of noise to the history of your repo, especially if those differences are not only about whitespace (e.g. one developer insisting on opening brackets on the same line and another on a new line).


> I just go with their suggestions.

Why though? I am not going to go with suggestions if they make the code less readable for me!


I think the whole world would benefit if we’d make it so that code formatting happened separately from what was committed. Then everyone could have their local checkouts formatted the way they wanted and there would be nothing to argue about in terms of coding style.


Why though? It seems like an elaborate technical solution just to avoid someone making a decision on fruitless bike-shedding discussions. It will only work for white space formatting not for other conventions prone to bikeshedding like naming/casing conventions.

Just make decision and get on with your lives. Have some kind of linter check for inconsistencies before any human review.

If an organization cannot make a decision on inconsequential bike-shedding it is dysfunctional.


After pre-commit linting, you get post-pull linting... If you kept the code in a RAM drive this might even be quick. 8)


Smart tabs is halfway there.


This would be my number one wish for tooling.


Every have more than one reviewer in your CR?


Doing any kind of style discussion in a code review means you’ve already failed.

I personally get super annoyed when people keep pointing out style issues, but our CI tool can notify me of issues with my commit until the end of time without me getting frustrated with it.


> Doing any kind of style discussion in a code review means you’ve already failed

This sort of baseless assertion has no bearing in reality. In a project that hasn't adopted any linting tools and automatic style checks, all it takes is a misconfigured editor to post a change request that fails to comply with style guides. These sorts of absolutes show a complete detachment from reality and absence of any practical experience in the field.


> These sorts of absolutes show a complete detachment from reality and absence of any practical experience in the field.

But you are making these baseless assertions yourself?

Obviously you can have issues if you are not using automated linting (both in the editor and on CI). That’s part of the failure.


I completely agree. One reason I like prettier is that it only has about 6 options you can change. It removes the bike-shedding. Just let it do its thing and worry about more important things.

It also removes all debate in PRs about style and formatting.

(note: before prettier, I was fairly particular about how I formatted my code, and I disagreed with prettier in some cases, but now, I love having one less thing to think about)


> > Bad programmers worry about the code

> And yet, I see a whole swath of the industry hyper-focused on various linters/styling/rules.

I've come to have a severe distaste for this good programmer/bad programmer mentality I've seen on the internet for, I guess, decades now.

There is skill in programming, yes, obviously. But this simplistic divide seems to me to be more about putting one's own ego on the superior side. It leads to simplistic heuristics and flames rather than nuanced discussion

In this case, in my opinion, linters/styling/rules help people to focus on what matters. And sure, with sufficient skill you might not need any of that to help you focus on what matters. But so what? It's better if we can make the trade more accessible, and can make it so people can focus on what matters with less experience


Couldn’t agree more. That’s the main reason why simple and accessible projects are praised most by general people.


And that’s because it’s Bad Programmers who need help!


but... they need help on data structures/relations and up front thinking about those issues, not where curly braces should go, or tabs-v-spaces.


Right, because the hyper-focus on linting of the industry is the symptom here, it's not a misguided treatment for the underlying problem of bad programmers.


That stuff is hard! Better to just shove your data somewhere unstructured and then you don't have to worry about data structures and relations.


... and "rules" for programming.


Bad programmers inflicting their worry upon the others.


I do like to use a linter to make code easier to read. But yeah, most of the architecture is actually based on how you store/structure your data. Code is just a result of how you implement the data.


>I first read that on Guy Steele's site.

It isn't Guy Steele's website. That page was written by him but the website is owned by Richard P Gabriel.


I've recently started a job in a very complex business domain, but sadly they're using NoSQL for everything. I've known for a while about the technical tradeoffs of NoSQL, but until now I'd never experienced how the lack of expressiveness in the data store is a major obstacle to understanding what kind of data the code deals with and how it's related. The data's all there, but exploring it without a real schema is much more difficult.


An early mentor put it as “learn the data, which won’t change, before learning the fancy stuff on top, which will”

That carried me very well.


Plenty of professional developers would benefit greatly if they read Domain-Driven Design.


Interesting. What’s the best resource beyond the Wikipedia page?


"The term was coined by Eric Evans in his book of the same title."

https://en.m.wikipedia.org/wiki/Domain-driven_design


That’s Dick Gabriel’s site; he posted gls’s essay there with attribution (so you didn’t realize which site it is). He and quux are friends and collaborators.


as someone in ML, I see myself wanting the opposite.

ML researchers drown their algorithms in huge tables of results, effectively spending time on "how well" rather than the "what".

It often leads to things being added as long as they are better, with the end result being a gargantuan monster of models and hand-engineered changes, all with no one understanding how the whole thing works as a single unit.

Flow charts are incredibly effective as the top most layer of abstraction. Does the whole process, when viewed in an end-2-end manner, make sense ? We dive into the details only if it passes that sniff test of a flow chart.

I might be missing the point being made here, but they can claw flowcharts from my cold dead hands.


"Flowchart" has historically, in Brook's time, meant "flow-of-control chart", and these usually degenerate into vast webs of minutia -- useless as abstractions.

But perhaps you meant "flow-of-data between structures" -- in which case we have agreement on engineering, but a muddle on semantics.


When Brooks says tables, I believe he means the internal data representation, rather than "tables of results".


Rule 5 seems to mirror one of my favorite insights from Alexander Stepanov:

> In 1976, still back in the USSR, I got a very serious case of food poisoning from eating raw fish. While in the hospital, in the state of delirium, I suddenly realized that the ability to add numbers in parallel depends on the fact that addition is associative. (So, putting it simply, STL is the result of a bacterial infection.) In other words, I realized that a parallel reduction algorithm is associated with a semigroup structure type. That is the fundamental point: algorithms are defined on algebraic structures.

This is also exemplified in the analytics infrastructure used at stripe: https://www.infoq.com/presentations/abstract-algebra-analyti...


In case you've wondered what a monoid is, that's a monoid. Something with an associative operation (and an identity), so you can do the operation on chunks in parallel, like addition.
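A minimal sketch of that idea in Python (the chunks could just as well be handled by separate workers or machines; associativity is what guarantees the combine step gives the same answer as a serial fold):

    from functools import reduce
    import operator

    def chunked_reduce(op, xs, chunks=4):
        # Split the input, reduce each chunk independently, then combine the
        # partial results. Only valid because `op` is associative.
        size = max(1, (len(xs) + chunks - 1) // chunks)
        parts = [xs[i:i + size] for i in range(0, len(xs), size)]
        partials = [reduce(op, part) for part in parts]   # these could run in parallel
        return reduce(op, partials)

    assert chunked_reduce(operator.add, list(range(1_000))) == sum(range(1_000))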


Every monoid is a semigroup, but it's only a monoid if there is also a value that serves as an identity.


Yep. And if what you have is an Abelian Group, then you also get distributed computation as well (thanks to commutativity).


While true, that's too strict. An Abelian group (like any group) needs inverses. You get distributed computation if you've got an Abelian semigroup.


Thanks for the correction. I think that in Avi Bryant's talk (that I linked to above) Stripe ended up using Abelian groups for some reason, rather than Abelian semigroups, though if so I forget the reason why.


Inverses don't show up as much as I'd (aesthetically?) like in computing. There was an interesting application here: https://www.reddit.com/r/haskell/comments/9x684a/edward_kmet...


To be fair, Abel did not know (or care) about semigroups.


That matches my understanding, but the terminology is still (IME) common.


You can distribute the computation on just a monoid as well but it needs more bookkeeping. In particular, your reduce function should know

* lhs is before rhs

* There is no data between lhs and rhs


One way of looking at it is that equipping our data with that bookkeeping gives us something that commutes.


Hmm, sure, but it is not a requirement that your underlying algebraic structure should commute, so I think the original phrasing was misleading. The bookkeeping allows you to commute a specific list of objects, even though the underlying operation is anti-commutative (i.e. there exist a, b with a.b != b.a).

At the moment of computation, you can build a new structure that commutes by enumerating the data. I guess it's true that you need a commuting intermediate data structure to be able to distribute.


Yeah, I think it's informative that you "need commutativity" but important that you can build it yourself. It's nice (mostly from an efficiency standpoint, sometimes from a complexity standpoint) when you can get it "for free" because the underlying type is commutative, and the fact that you're shooting for commutativity can inform how you build and test the bookkeeping.

As an interesting nit, "anticommutative" specifically means that a.b = -(b.a), which is different than simply not being commutative.

A group might be commutative, anticommutative, neither, or even both (trivially true of the empty group and the group with one element, but I think it can be true of larger groups).


Ah, my terminology is rusty as it's been years since my math classes. I guess I need to review my algebra books again. I do have some personal code that called "exists a,b a.b != b.a" anticommutative, didn't even realize that's wrong terminology!


One of the ideas in the talk I linked is how you can represent typically non-commutative data (like averages) in a data structure that does support commutative operations (a numerator/denominator pair), to take advantage of generic analytics infrastructure.
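The trick, roughly (a toy sketch): averages don't merge directly, but (sum, count) pairs merge with plain addition, which is associative and commutative, so partial results can be combined in any order.

    def merge(a, b):                      # combine two (sum, count) partials
        return (a[0] + b[0], a[1] + b[1])

    def average(acc):
        total, count = acc
        return total / count

    shard_1 = (sum([1, 2, 3]), 3)         # partial result from one worker
    shard_2 = (sum([10, 20]), 2)          # partial result from another

    assert average(merge(shard_1, shard_2)) == average(merge(shard_2, shard_1)) == 7.2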


But adding floating point numbers isn't associative, in general. Sometimes you need to do it the right way to avoid catastrophic cancellation.

I guess the key is to know how to deal with things that are only mostly true.


> But adding floating point numbers isn't associative, in general. Sometimes you need to do it the right way to avoid catastrophic cancellation.

That exactly proves his point. Systems that are associative can be processed by the parallel algorithm he was thinking of. Floating point numbers, if you care about their non-associativity, cannot be processed by that algorithm. So the validity of that algorithm depends on whether the system is associative.


That's true about floating point numbers. I assume that depending upon the context, it may not be a big issue (e.g. GPU compute)?

In any case, the point Stepanov was making is that if you want to be able to use a certain algorithm, then you have to make a choice to represent your data in a way that enables that algorithm, and the way you know whether the structure is appropriate for that algorithm is the algebraic properties of the structure.


No, the key is to use good abstractions unless you have a really good reason. New code should not be using IEEE 754 floating point unless you have benchmarks and profiles showing that you actually need to.


That’s why in C++ we have traits and overloading.


Could you explain where you see traits and overloading helping with floating point operations?


And that was 10 years before Haskell went huge in that idea.


I'm not super familiar with either the C++ or Haskell communities, but Stepanov's notion of Generic Programming[1] certainly seems to fit with the Haskell ethos.

[1]http://www.generic-programming.org/


Recently I searched the Web, trying to find out the origin of monoids as an approach to distributed computing, and couldn’t find it. This quote is a great find for me! Is this the origin?


If you're thinking about map-reduce, the original Google paper talks about associativity.


Everyone building no code tools is learning or will learn that the problem most businesses have is not a lack of coding skill, or the inability to build the algorithm, but rather how to structure and model data in a sensible way in the first place.


Modeling the data and structuring the program are indeed the harder tasks, but orgs have lots of smart people who have those skills but not the familiarity with various existing syntaxes and standard libraries and so on that a programmer learns over the decades of their career. Further, those same orgs probably have many people with experience in the latter but without any special ability to think abstractly. This significantly limits the ability to create tools. Further, the no code tools often abstract at a more appropriate level than general purpose programming languages’ standard libraries because these tools aren’t trying to be general purpose (at least not to the same degree as general purpose programming languages). Lastly, I’ve seen business people use certain no code tools to build internal solutions quickly that would have taken a programmer considerable (but not crazy) time to crank out, especially considering things like CI/CD pipelines, etc. Nocode won’t replace Python, but it serves a valuable niche.


If no code tools are anything like ORMs, there will be some interesting surprises when one encounters non-normalized data structures.


I've been preaching this to all my non-coder creative marketing type buddies who have ideas all the time and think they can just whip up a product using all the latest and greatest no-code tooling.

They are destined for failure.


> Tony Hoare's famous maxim "Premature optimization is the root of all evil."

Actually that was Donald Knuth - it's an urban legend that it's an urban legend that it was originally Knuth. Hoare was quoting Knuth, but Knuth forgot he said it, and re-mis-attributed the quote to Hoare.


And it is usually quoted out of its context.

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."


It's also often interpreted literally.

Premature complex optimization is a bad idea, but simple (read, cheap to code) optimization for common bottleneck patterns is a perfectly reasonable thing to do.


I like what Chandler Carruth said about why code is slow: "the death of a thousand cuts".


It's also (ab)used far too often to justify performing no optimisation at all.


...or even pessimisation.


That is a great word.


I don't think that changes the meaning. Once that 3% matters to you and you've invested the work to measure that 3%, it's not premature anymore.

That "premature" and "optimization" are undefined and left up for debate is what makes it trite.


It does change the meaning IMO. It means that 3% of the time you should be doing "premature" optimization.

The point of most people referring to this quote is never try to optimize anything as you write it. First build your system(?), then measure it, then optimize it. Knuth's point is that this attitude is ok most of the time, but sometimes it's not ok. Another way of putting this is that most of the code, for most applications, isn't performance critical. But some code is.

Sure, you can't always tell in advance, but sometimes you can. This is sort of the difference between terrible software that will always suck and well crafted software; no amount of measurement or after-the-fact optimization will turn that terrible software into well crafted software.

The other aspect that I think is often missed is that these observations are often made at different scales. You can look at a relatively short algorithm (let's say merge sort) and it may not be obvious which instructions are the ones that need to be optimized and what the bottlenecks will be, e.g. execution units or data access. So you start with a reasonable but maybe naive implementation and then you optimize from there. That's a pretty solid idea. But taking that idea to a higher scale level, e.g. saying we're going to build this huge system with a billion lines of code and so we'll just throw something together and measure it, isn't exactly the same thing; that's a pretty problematic idea. You need to be able to anticipate what the bottlenecks in your billion-line system are going to be, because finding that out after you've written a billion lines could be a big deal.

[EDIT: and really this whole long story is why these sort of rules don't work. Because the people who know (have the experience/craftsmanship) don't need the rule and the people who don't know won't understand it. It's like reading a book about sword fighting and then trying to go into a sword fight... The reading can complement your training but can't be substitute...]


This reminds me of that Woody Allen joke about someone translating all the T.S. Eliot’s poems into English after some vandals had broken into the school library and translated them into French.


I've always been uncomfortable with these kinds of ideas. The odds that the idea will be correctly applied are heavily tied to intelligence, culture, and situation. Instead of reducing the space of options you must consider, all it says is that you should "do it this way when you should and do it the other way when you shouldn't." I suppose perhaps it is useful to highlight that the decision exists, but I would be surprised if anyone working in the space is unaware of the existence of the decision.

The scientific method has a similar problem. A scientist should form their hypothesis before gathering data to evaluate the hypothesis. If a scientist fails to do this, and starts engaging in p-hacking or data dredging, the quality of their research greatly declines. But that a hypothesis was formed before data was collected is not usually provable when just looking at the publication itself. And further, there are ways that data dredging can unintentionally sneak into the scientific process, especially around the phase that precedes hypothesis formation: observation.

This kind of idea has large technical impact, but doesn't have a solid technical reason. Its proof is closer to aesthetics than reason. And much like other aesthetic beliefs, a population believes it based on no deeper reasoning. Only exclusion or indoctrination can ensure the population's view, and only illogical rhetoric will change it.


Ha. So the truth is that Knuth did quote Hoare, not aware that he was quoting Knuth--indirectly Knuth was quoting himself.


In long-lived systems (systems that run for many years) it's almost impossible to choose the "right data structures" for the ages. The sources and uses of your data will not last nearly as long as the data itself.

What to do about this? Two things:

STORE YOUR TIMESTAMPS IN UTC. NOT US Pacific or any other local timezone. If you start out with the wrong timezone you'll never be able to fix it. And generations of programmers will curse your name.

Keep your data structures simple enough to adapt to the future. Written another way: respect the programmers who have to use your data when you're not around to explain it.

And, a rule that's like the third law of thermodynamics. You can never know when you're designing data how long it will last. Written another way: your kludges will come back to bite you in the xxx.
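On the first point, the minimal version in Python looks something like this; conversion to a user's zone is purely a presentation concern:

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo          # Python 3.9+

    created_at = datetime.now(timezone.utc)            # store this
    print(created_at.isoformat())                      # ...T17:03:05.123456+00:00

    # Render in whatever zone the viewer cares about, at display time only.
    print(created_at.astimezone(ZoneInfo("US/Pacific")))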


Sometimes storing in UTC is simply not correct. For example a shop opening time. The shop opens 10am local time, whether DST or not. Their opening time is 10am local time all year but their UTC opening time actually changes depending on the time of year!


But a shop opening time is not a timestamp, so I think the original advice is still good. A timestamp is the time at which some event happened, which is different than a date/time used for specifying a schedule.

For example, if you wanted to track the history of when the shop actually opened, it would make sense to store a UTC timestamp.


> A timestamp is the time at which some event happened, which is different than a date/time used for specifying a schedule.

Correct, but that makes this a rule with much more limited applications than many people are going to interpret it as.


Yes, but scheduled times can look like timestamps. It might be tempting to store a date+time+location as just a UTC timestamp, but timezones can and do change, so the UTC timestamp for that scheduled time is not fixed.


> A timestamp is the time at which some event happened,

It's important to the advice to make explicit that the use of “timestamp” in that sense is intended, because “timestamp” is also in many contexts “the data type that combines date and time of day and, perhaps optionally, time zone information”. The application of “timestamps” in the latter sense is not limited to when they represent “timestamps” in the former sense.


And the time offset that was in effect when the event happened allows you to easily answer questions like "Did the shop open late, i.e. after 10 AM local time?".
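A small sketch of that (the times are made up): store the UTC instant plus the offset recorded when the event happened, and the wall-clock question falls out directly.

    from datetime import datetime, timedelta, timezone, time

    opened_utc = datetime(2020, 8, 12, 17, 4, tzinfo=timezone.utc)   # hypothetical event
    local_offset = timedelta(hours=-7)                               # offset in effect at the time

    local_wall_clock = opened_utc.astimezone(timezone(local_offset)).time()
    print(local_wall_clock)                 # 10:04 local
    print(local_wall_clock > time(10, 0))   # True: the shop opened late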


The most interesting case of this I encountered was for photo 'timestamps' on a global sharing site. UTC was being used and I was proposing a change to local time. There was great debate as many drank the UTC juice and stopped thinking.

It was when I showed them that we also have a 'shot at' location and then proceeded to show Christmas Eve photos with the UTC time converted to the viewer's local timezone (not always evening, not always Dec 24) alongside where the photo was taken. Just as in space-time, a photo needs both a time and a place.


Sounds like the problem was images being uploaded with a timestamp without a timezone, in which case neither solution would work.


The timezone could be inferred from uploader's geoip as a fallback. The problem was that even if the timezone was known at time of upload it was converted to UTC and lost when stored.


For historical events, where the local time is important, the combination of "UTC timestamp" and "local time offset in effect at the moment the timestamp was taken" seems to be the choice. Allows you to easily learn what time the wall clock was showing at the moment.


Databases have support for a single type that encodes exactly like this. In postgresql a timestamptz shows as 'yyyy-mm-dd hh:mm:ss.123456+1234' but internally it's stored as UTC unixtime and tz offset.


Doesn't it actually store the timezone's IANA name and uses the tzdata to do conversions? That implies slightly more work than storing just the effective timezone offset, but is probably more correct when it comes to the timestamps in the future.


Docs aren't 100% explicit but it seems that it uses tzdata etc to convert to offset as necessary and stores offset--that wouldn't affect correctness as the conversion is done now not in the future.


Totally. "Store everything in UTC" is just another flavor of "pick a timezone to store everything." In a lot of cases, you probably need to go ahead and just store the fully qualified date including timezone/offset for each record.


Even storing offset or timezone might not be enough if what you really want is some future date and time at a particular location. Timezones do change, including the regions they cover.

Still, for things that have already happened, storing them as a UTC timestamp is almost always the correct thing to do.
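For the future-schedule case, one common approach is to store the local wall time plus the IANA zone name, and only resolve it to an instant when you actually need one. A hedged Python sketch (the zone and date are invented):

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo  # Python 3.9+

    # Stored: the intended local wall time and the IANA zone name.
    scheduled_local = datetime(2025, 3, 9, 10, 0)
    zone_name = "America/Los_Angeles"

    # Resolved only when an instant is actually needed.
    instant = scheduled_local.replace(tzinfo=ZoneInfo(zone_name))
    print(instant.astimezone(timezone.utc))

    # If the region's rules change before the date arrives, re-running this
    # with updated tzdata gives the corrected UTC instant; a UTC value
    # computed and frozen today could silently be an hour off.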


I made that mistake early in my career following this exact advice, and I ended up with a lot of things that were randomly 1 hour off depending on when the record was created and the date entered.


That is the difference between a clock reading and a timestamp.


> STORE YOUR TIMESTAMPS IN UTC. NOT US Pacific or any other local timezone.

What difference does it make if the timestamp includes the timezone? The UTC value can be recovered. In some applications the timezone is useful e.g. when intraday times matter.


Odds are that you forgot to record the timezone at all, and it didn't matter until you already have users in different timezones who have saved data and they don't remember which timezone they saved everything in.

If you record the timezone you can convert. Even then, it is easier to use UTC just because everyone else does and so you can feed UTC into any third party library and it will work.


Daylight savings might get in the way, especially if daylight savings rules change some time later.


People ask "why UTC"? Good question. Here's why:

You can always translate UTC to a local time in a given timezone. With IANA zoneinfo, you can do that correctly even for historical data in places where timezone rules changed in the past.

You can always calculate elapsed times correctly by taking differences between UTC timestamps. With local times you can't. Because daylight time transitions.

If you started with a local service, UTC lets you expand globally without explaining to your new customers why your timestamps are not in their timezones.

Daylight time transition days. Because daylight time.

Oddball daylight transition rules. Because Indiana USA, from the legislature that almost wrote a law declaring the value of π to be 22/7.

Because almost everybody will understand your decision, even after you're gone.
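To make the elapsed-time point above concrete, here's a small Python illustration of a "day" that spans the US spring-forward transition (dates picked arbitrarily):

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo

    tz = ZoneInfo("America/New_York")
    start = datetime(2020, 3, 8, 0, 0, tzinfo=tz)   # midnight before spring-forward
    end = datetime(2020, 3, 9, 0, 0, tzinfo=tz)     # midnight after

    # Naive wall-clock arithmetic says 24 hours...
    naive_hours = (datetime(2020, 3, 9) - datetime(2020, 3, 8)).total_seconds() / 3600
    # ...but the difference between the actual instants (i.e. in UTC) is 23.
    real_hours = (end.astimezone(timezone.utc) - start.astimezone(timezone.utc)).total_seconds() / 3600
    print(naive_hours, real_hours)   # 24.0 23.0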


PostgreSQL stores it as UTC if the data type is "timestamp with time zone" --- https://www.postgresql.org/docs/current/datatype-datetime.ht...


In our system, contracts have an expiration date/time. The only actors are Swedish. 12:59:59 must always be 12:59:59 on a certain date. It may never become 6:59:59 when someone in the company traveled to NY and prints out the contract.


A quote that combines the rules about optimization with the rules about data structures: "NoSQL databases that only have weak consistency are enforcing a broadly applied premature optimization on the entire system." --- Alexander Lloyd, Google

Also in support of Rule 5, see Eric Raymond's treatment of Data-Driven Programming:

"Even the simplest procedural logic is hard for humans to verify, but quite complex data structures are fairly easy to model and reason about. To see this, compare the expressiveness and explanatory power of a diagram of (say) a fifty-node pointer tree with a flowchart of a fifty-line program. Or, compare an array initializer expressing a conversion table with an equivalent switch statement. The difference in transparency and clarity is dramatic. See Rob Pike's Rule 5.

"Data is more tractable than program logic. It follows that where you see a choice between complexity in data structures and complexity in code, choose the former. More: in evolving a design, you should actively seek ways to shift complexity from code to data."

http://www.catb.org/~esr/writings/taoup/html/ch01s06.html#id...

http://www.catb.org/~esr/writings/taoup/html/generationchapt...
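The conversion-table comparison above is easy to show in any language; here's a minimal Python version of the contrast (the status codes are chosen arbitrarily):

    # Data-driven: the mapping is plain data, easy to audit and extend.
    HTTP_REASONS = {200: "OK", 301: "Moved Permanently", 404: "Not Found"}

    def reason(code):
        return HTTP_REASONS.get(code, "Unknown")

    # Equivalent logic-driven version: the same facts buried in branches.
    def reason_branchy(code):
        if code == 200:
            return "OK"
        elif code == 301:
            return "Moved Permanently"
        elif code == 404:
            return "Not Found"
        else:
            return "Unknown"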


A quote from one of our founders that I've always liked:

If you make an optimization that was not at a bottleneck, you did not make an optimization.


That's a bit broad, I think. If you're talking about the pure speed of your task you are right, but if you're talking about energy consumption it's not always the case. Also, small but often-repeated tasks should be optimized no matter whether they are bottlenecks; when the system grows they will become bottlenecks. "Optimize like a Vulcan" is what my boss once said: be logical and nothing else (my interpretation).


"optimize like a Vulcan"... classic!


Best boss ever!! He even had his best whisky refilled into a bottle labeled "Saurian brandy", so everyone who said that this is illegal, or "ohh, that's Star Trek", got one... well, not a bottle, but a glass :)


Read The Goal by Eliyahu Goldratt. While it's possible your founder came upon the idea independently, this is one of many that are repeated in that book. It's relatively short and entertaining to read and has definitely survived the 36 years since first publishing quite well.


This is what instantly came to mind for me. Making a station faster generally only matters when it's a bottleneck.

It doesn't matter how optimized your computations are if you're spending the whole time waiting in IO. And don't forget that the program is generally just a piece of a larger process.


This was adapted into a novel about IT/devops called The Phoenix Project. It's an excellent read.


I second this. Quite the entertaining read, honestly. I also enjoyed the completely unnecessary transformation of the security dork into a security ubermensch.


I've read The Goal and The Phoenix Project. While I did enjoy the stories, I'm uncertain, perhaps due to inexperience, what the main lesson/s are supposed to be.

Anyone want to share their main takeaways from these books?


Reduce feedback cycles


That's less true when you're paying the cloud for compute by the second.


Not all optimization candidates are about bottlenecks. Reducing allocation is also optimization, for example.


Peak memory or garbage collection throughput can become a bottleneck. But if you know you have more memory than you need, further reducing allocation is arguably a waste of your time.

This can become a tragedy of the commons in desktop and mobile apps, where you don't know how much memory the end user has or needs, but you do know you aren't paying for it.


> But if you know you have more memory than you need, further reducing allocation is arguably a waste of your time.

This is absolutely not true. Just because you have enough memory does not mean that wasted memory couldn't be better used - e.g. for disk cache or to run more tasks.


You made an optimization for the future when enough bottlenecks have been fixed such that this one part becomes the bottleneck.


Except that there are infinite such non-bottlenecks, and all the effort you spend on there is effort not spent on the real bottlenecks.

In other words, all engineering is time- and cost-constrained. Anybody can build a good chair for $10,000 or a good PC for $100,000. Doesn't mean it's good engineering.


"Anybody can build a good chair for $10,000 or a good PC for $100,000."

And some people can build a great PC for $1,000 that runs circles around the good PC for $100,000.

There's so much more to engineering than thinking in terms of time and cost constraints. Those are real constraints, but they're not the most important.

Engineering is design. If you have good design, good insight, you can do things that people with infinite time and budget could never dream to achieve. You can start making a product that's a hundred times more powerful for a tenth of the price in a fraction of the time. If you don't have good design, good insight, then no amount of time or budget can help you.



>not spent on the real bottlenecks

BE LOGICAL! Of course you first fix the big bottlenecks.

>good PC for $100,000. Doesn't mean it's good engineering.

Of course it is... or can you gold-plate a PC case?


There are several assumptions that are far from a given with premature optimization.

1. Adding the optimization didn’t make the code more complicated.

2. Adding the optimization didn’t introduce a bug.

3. This part of the code will be a bottleneck in the future. The time spent optimizing is a write-off if the project is canceled or that portion is replaced.


All true, but if you have a time budget and you know this thing will prevent you from meeting it AND making this change now will inform the architecture, then it's the right optimization to make.


Amdahl would agree.


> Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.

I agree with the general thrust of this. But it's worth pointing out that often the easiest way to prove where a bottleneck is (or at least isn't) is to try an optimization and see if it helps. I like profiling tools immensely, and this kind of trial-and-error optimization doesn't scale well to widespread performance problems. But there's something to be said for doing a couple of quick optimizations as tracer bullets, to see if you get lucky and find the problem before bringing in the big guns.

The last three rules bug me. I wish we had a name for aphorisms that perfectly encapsulate an idea once you already have the wisdom to understand it, but that don't actually teach anything. They may help you remember a concept—a sort of aphoristic mnemonic—but don't illuminate it. The problem with these is that espousing them is more a way of bragging ("look how smart I am for understanding this!") than really helping others.

For example:

> Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

OK, well what are the "right" data structures? The answer is "the ones that let you perform the operations you need to perform easily or efficiently". So you still need to know what the code is doing too. And the algorithms are only "self-evident" because you chose data structures expressly to give you the luxury of using simple algorithms.


I think the point is more to direct how you write/refactor code: focus more on the data structures than on the code. E.g. if you're struggling to write the code, then maybe you need to take a step back and reconsider your data structures.


> Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

Is "data structures" the correct term here? Assuming I'm not misinterpreting, the usage of "data structures" can be misleading - one usually thinks of things like BST's and hash tables, which are inherently tied to algorithms. I feel like "data modeling" better captures the intended meaning here.


A custom type is also a data structure and that is usually what I think quotes like these refer to.


These rules are good as far as they go. However, they are notably silent on what happens when they stop working.

If you're a celebrity computer scientist and a problem has been 'squashed' to the point where there aren't any more 80-99% "hot spots" I guess you can just parachute out of there and on to a more interesting problem.

However, if you're paid to work on a particular thing, sooner or later, you will fire up a profile and say "oh, great, I don't have a hot spot anymore, just a bunch of 10-30% 'warm spots'". At that point you need to attack problems that aren't traditional bottlenecks if you still want to get speedups.

Moreover, the things that you learn from repeatedly attacking those 10-30% 'warm spots' might be fundamentally different from the learning you get from, I dunno, taking some O(N^2) monstrosity out of the "99% of the profile" and Declaring Victory.


Rule 1 and 2 depend on context, whether you're working on an existing program or a new program. They can be true or false. They can really help or they can really hurt. Are you going into an existing system to do performance optimization? Sure, don't guess, measure. Are you designing a new system? Throw out those parroted premature optimization mantras... you are responsible for designing for performance upfront. You will always measure but depending on context you will design for speed first and then test your prototype with measurements. There's no way around an initial hypothesis when you're designing new systems. You have to start somewhere. That's where Jeff Dean's rule always to do back of the envelope guesses will pay off in orders of magnitude, many times over.

Rule 3 and 4 are gold and always true.

Rule 5 is the key to good design.


I've always taken a very practical, results-oriented approach to software development.

That makes sense, to me.

One of the first things that we learned, when optimizing our code, was to use a profiler.

Bottlenecks could be in very strange places, like blowing the L2 cache. That would happen when data was just a bit too big, or a method was called, forcing a stack frame update.

We wouldn't see this kind of thing until we looked at a profiler; sometimes, a rather advanced one, provided by Intel.


It turns out rule 5 (Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident) is true but also hard.

Eric Evans' Domain Driven Design is a good book on the topic.


I’m such a huge fan of DDD. IMO, if you only ever read one book on Software Architecture, that’s the one to pick.

A key point, though, is that you learn the right domain models/abstractions over time. Refactoring is critical as you gain more insight into the domain. If you're constantly questioning your modelling of the domain, and refactoring towards a better one, you'll end up with a great model and thus a clean, understandable, easy to extend/modify system. If you stick with whatever abstractions you chose at the start, when you knew way less about the domain/business problems, you'll likely end up with poor abstractions, and a code base that's slow, tedious and error prone to modify.

Convincing the business that it’s worth setting aside time to constantly refactor towards better domain models is often the hardest part, but crucial.


> If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident) is true but also hard.

The problem is not the self-evident algorithm, but the delicate implementation (or god forbid, at scale).

Take in 1000 web requests per second. The data is all strictly validated and has about 60 fields per record/request, plus dealing with errors.

How does that go from webserver to (rolls dice) kafka to a (rolls dice) cassandra that can be queried accurately and timely? How much does that cost?

Oh, that's not a programmer problem. Except it is. Creating a fantasy niche of describing problems as data vs algorithm is the canonical ivory tower disconnect.


It seems you are arguing something different, although I am having a hard time understanding what you have written. I think you are saying algorithms and data structures aren't hard, distributed systems are hard. In my experience choosing the correct data structures and algorithms in your services/programs/whatever can dramatically simplify the design of your systems overall.


> It seems you are arguing something different,

I'm making an argument that the stark reality of what's hard in software development is not the simplistic "Rules of programming", which have limited utility.

The reason the "rules" aren't self evident (or followed), is because we live in the reality of disparate functionality paired with an ever-changing technical landscape. You can't just make a DB KISS abstraction and expect it to hold with all the different repository types like (rolls dice) Athena after using (rolls dice) CockroachDB. There are concerns that are not purely algorithmic vs data structure that are far more influential and important to understand. Even knowing these details and cases, becomes less useful as time goes on and new technologies emerge.

> I am having a hard time understanding what you have written

If you're not interacting with new environments, tooling, and problems, regularly (every year or 2) you don't encounter the real pain which is far more important to your career and your ability to produce functional software. Reading almost every postmortem, the number of lines attributed to "we changed the data structure to O" is dwarfed by "we learned that technology X does Y, so we had to do Z".

This is only incidentally related to distributed systems, which is indicative of a disconnect with the problem described. Of course when you sit around in the same environment for a long time, you can observe and optimize on structure and algorithm, but that's not getting you to market (you're already there) and that's the nature of maintenance...not just being a fire extinguisher who is on call.


> we live in the reality of disparate functionality paired with an ever-changing technical landscape

It is the responsibility of tech leaders to minimize this (accurate) stereotype. Choose boring technology, and only build your own or choose something exotic when it gives you competitive advantage - because the reality I also see is that 99% of devs aren't working on anything new or unseen in the field. Even at the FAMANG companies, most people I know are working on boring problems.

So when your CTO or architect or whomever buys into the hype for X technology, make a good argument against it by proposing a better solution.


Not only is it hard, it's the one thing that if you get it right, your technical foundation will be rock solid. But it's the thing most teams and organizations neglect to spend enough time on. I often wonder why this is the case -- my first mentors taught me that logical data modeling was a really important skill. But I never talk about third normal form or any such things with my peers.


Modeling the problem domain is so important that I don't know why the first year of every compsci undergrad program isn't entirely dedicated to teaching the idea.

Instead, day 1 is installing python or java, running hello world and talking about pointers, binary, encoding, logic gates, etc.

We should be teaching students on day 1 that code is a liability and to be avoided whenever it is convenient to do so.


IMO Matthias Felleisen (building on work of Sussman (IIRC) et al.) gets first-year CS education right with How to Design Programs (https://htdp.org/2020-5-6/Book/index.html). Literally step 1 in the design recipe presented in the preface is:

   "1. From Problem Analysis to Data Definitions
    Identify the information that must be represented and how  it is represented in the chosen programming language.
    Formulate data definitions and illustrate them with examples."
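To give a rough flavor of that first step in Python (the book itself works in a Scheme-family teaching language, and these names are invented):

    from dataclasses import dataclass

    # Data definition: a Shop has a name and an opening hour (0-23, local).
    @dataclass
    class Shop:
        name: str
        opens_at: int

    # Illustrate the definition with examples before writing any logic.
    corner_store = Shop("Corner Store", 9)
    night_bar = Shop("Night Bar", 18)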


I recommend Domain Modeling Made Functional by Scott Wlaschin. It's analysis first, then DDD, then coding (in F#). Best intro to functional programming imo.


All this advice against "premature optimization" has created generations of programmers that don't understand how to use hardware efficiently.

Here's the problem: If you profile software that is 100x slower than it needs to be on every level, there are no obvious bottlenecks. Your whole program is just slow across the board, because you used tons of allocations, abstractions and indirections at every step of the way.

Rob Pike probably has never written a program where performance really mattered, because if he did, he would've found that you need to think about performance right from the beginning and all the way during development, because making bad decisions early on can force you to rewrite almost everything.

For instance, if you start writing a Go program with the mindset that you can just heap-allocate all the time, the garbage collector will eventually come back to bite you and the "bottleneck" will be your entire codebase.


> Here's the problem: If you profile software that is 100x slower than it needs to be on every level, there are no obvious bottlenecks. Your whole program is just slow across the board, because you used tons of allocations, abstractions and indirections at every step of the way.

Oh this hurts. I work with a system in Perl that is just kinda slow. Too slow to be good but not slow enough to be useless. Slow enough that if it crashes we have trouble getting things reprocessed in a reasonable time, there's no fat built in to our timelines.

Anyway I've profiled it many times and found exactly what you said. Layers and layers of OO soup, functions calling functions calling functions. There are no obvious improvements. It's overhead, not code.


> Rob Pike probably has never written a program where performance really mattered

Rob Pike has written window system software which ran in what now would be called a "thin client" over a 9600 baud modem and rendered graphics using a 2MHz CPU. He probably knows a thing or two about performance tuning.


By "rendered graphics" you mean "place characters on screen"? If the bottleneck for that is a 9600 baud modem throughput, there's not a lot you need to optimize, even on a 2MHZ CPU.

Also, having programmed more constrained systems decades ago doesn't magically make you knowledgeable on performance on modern hardware with completely different capabilities. In fact, it's probably what causes you to develop a "computers are so fast now, no need to think about performance"-mindset, because everything you want to do could be done in an arbitrarily inefficient way on modern hardware. Performance doesn't matter to you anymore.


> By "rendered graphics" you mean "place characters on screen"?

It was a fully graphical 800x1024 (or 1024x1024) system running on 1982 processors.

https://en.wikipedia.org/wiki/Blit_(computer_terminal)

> having programmed more constrained systems decades ago doesn't magically make you knowledgeable on performance

Perhaps not but it does mean you've "written a program where performance really mattered" which I believe was the original claim?


> It was a fully graphical 800x1024 (or 1024x1024) system running on 1982 processors.

I've looked into that. The Blit was monochrome, had an 8MHz processor, and a relatively large 256KB framebuffer which could be directly written to. There were only a handful of commands, mostly concerned with copying (blitting) bitmaps around.

Rob Pike only wrote the first version of the graphics routines - the slowest version, in C(!). It was rewritten another four times over, by Locanthi and finally Reiser.

I don't think any credit should go to Pike for implementing the performance-critical parts of that system.

https://9p.io/cm/cs/doc/87/archtr.ps.gz

> Perhaps not but it does mean you've "written a program where performance really mattered" which I believe was the original claim?

No, it doesn't mean that. You can be wasteful on constrained hardware as well, performance doesn't necessarily matter even on the simplest chips, if what you want to do doesn't need the full capabilities of the system.

However, I am specifically replying to the claim that "Rob Pike probably knows a thing or two about performance". As you can see, Rob Pike handed off performance-critical work to someone else. He probably didn't know how to write optimal code for that particular platform, but even if he did, most of that knowledge wouldn't transfer over to modern systems.

At the very least, he didn't care about optimizing that stuff, or he wouldn't have handed it off. He would've enjoyed optimizing that stuff. And that's all completely fine, not every programmer needs to care about performance. I just refuse to take advice from these people about performance or "premature optimization", because it is uninformed.


Writeup of the blit terminal's operating system is here, it consists of a lot more than the bitblt primitive, with many whole-system performance concerns at play:

http://a.papnet.eu/UNIX/bltj/06771910.pdf

Suggest you read this before denigrating Rob Pike's bona fides. Not sure what axe you are trying to grind but it is ugly and unbecoming of a professional.


I'm not denigrating his bona fides, I'm questioning his credentials on performance-oriented computing. For all I know, if Rob Pike had been a performance freak, Blit might've never shipped. He may indeed have chosen all the right trade-offs.

Nevertheless, the advice he gives on performance is wrong, plain and simple, for the reason that I gave you: If you have overhead everywhere, there is no bottleneck that you can observe - your software is just slower than it needs to be across the board. If you write software without performance in mind from the very beginning, you can never get all of it back by optimizing later - without major rewrites that is.

How does one give wrong advice? By not having the required experience to give correct advice. I don't care if you're Rob Pike, Dennis Ritchie or Donald Knuth. If you're wrong, you're wrong.


> In fact, it's probably what causes you to develop a "computers are so fast now, no need to think about performance"-mindset

This is the complete opposite of Rob’s mindset, which you’d know if you had any familiarity with his work.


Your reply here saddens me.

I suggest you look up Rob Pike and reconsider some of your hypotheticals about what he knows about. (https://en.wikipedia.org/wiki/Rob_Pike)


> Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

That one hits me in the feels because I think a lot of folks focus on algorithms (including myself), and code patterns, before their data and as a result a lot of things end up being harder than they need to be. I've always liked this quote from Torvalds on the subject speaking on git's design (first line is for some context):

> … git actually has a simple design, with stable and reasonably well-documented data structures.

then continues:

> In fact, I'm a huge proponent of designing your code around the data, rather than the other way around, and I think it's one of the reasons git has been fairly successful […] I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

When I have good data structures most things just sort of fall into place. I honestly can't think of a time where I've figuratively (or literally) said "my data structure really whips the llamas ass" and then immediately said "it's going to be horrible to use." On the contrary, I have written code that is both so beautiful and esoteric, its pedantry would be lauded for the ages-- had only I glanced over at my data model during my madness. No, instead, I awaken to find I spent my time quite aptly digging a marvelous hole, filling said hole with shit, and then hopping in hoping to not get shitty.

One thing that really has helped me make better data structures and models is taking advanced courses on things like multivariate linear regression analysis, specifically going over identifying things like multicollinearity and heteroskedasticity. Statistical tools are incredibly powerful in this field, even if you aren't doing statistical analysis every day. Making good data models isn't necessarily easy, nor obvious, and I've watched a lot of experienced folks make silly mistakes simply because they didn't want something asinine like two fields instead of one.


The counter argument would be that git is the poster-child of poor UX, which could be blamed on the fact that it exposes too much of its internal data structure and general inner-workings to the user.

I.e. too much focus has been put on data structures and not enough on the rest of the tool.

A less efficient data structure, but more focus on UX could have saved millions of man hours by this point.


It's difficult, because git's exposition of its data structures enables you to use it in ways that would not otherwise be possible.

I think git is more of a power-tool than people sometimes want it to be. It's more like vi than it is like MS Word, but its ubiquity makes people wish it had an MS-Word mode.

So, I think that it's hard to fault git's developers for where it is today. It's a faithful implementation of its mission.

FWIW, I have never used a tool with better documentation than git in 2020 (it hasn't always had good --help documentation, but it absolutely does today).


Or perhaps learning Git just requires a different approach: you understand the model first, not the interface. Once you understand the model (which is quite simple), the interface is easy.


People keep repeating this, but it's not true. The interface has so many "this flag in this case" but "this other flag in that case" and "that command doesn't support this flag like that" etc. There's no composability or orthogonality or suggestiveness. It's nonsensical and capricious and unmemorable, even though I understand the "simple" underlying model and have for years.


Sorry, I was replying to this:

> The counter argument would be that git is the poster-child of poor UX, which could be blamed on the fact that it exposes too much of its internal data structure and general inner-workings to the user.

I agree with you that the UI is inconsistent, however I don't agree that it's the result of git exposing too much of the internal data structure.


Has anyone attempted to re-engineer a superior UX on top of the git data structure? Would it even be possible?


Yes, I think so. There are many git clients which offer superior UX already, but they only provide a subset of the functionality available with the data structure. I'd personally love to experiment with showing and editing the data structure more 'directly', instead of relying on a battery of CLI commands and options.


Magit with emacs solves git's UX problem IMO. Discoverability is/was git's real problem.


This is true, but the trouble is that you need to know what git will do before the magit commands and options make sense.


It makes sense, when we bring in another aphorism "code is data". It's easier to write good code with good libraries. And it's easier to write good data models that extend good data models. The main distinction is that code is very dynamic, flexible, and malleable, whereas data models need not be.

Data models are the "bones" of an application, as part of the application as code is. Data models fundamentally limit the application's growth, but if they're well-placed, they can allow you to do things that are really powerful.

You always want to have good bones. But the Anna Karenina Principle is a thing [0].

So, applying this, I think baby ideas should not have many constraints on the bones, to allow them to move around in the future. Instead, there should be a ton of crap code implementing the idea's constraints, because they change every week, month, quarter, and the implementer is still learning the domain.

Once the implementer reaches a certain point of maturity in the domain, all of the lessons learned writing that crap code can be compressed into a very clever data model that minimizes the amount of "code" necessary, and simultaneously makes the project more maintainable, interface-stable, and extensible: in other words, making it an excellent platform to build on. The crap code can be thrown out, because it was designed to halfway-ensure invariants that the database can now take care of.

I think most software we consider "good" these days followed this development cycle. multics -> unix, <kversion_control> -> git, ed -> vi -> vim.


It's worth noting the same holds true for UI: data dominates. Design your widgets, layout, and workflow around the data.


> It's worth noting the same holds true for UI: data dominates. Design your widgets, layout, and workflow around the data.

I couldn't agree more.

I think the current state of UI programming is like the pathological case to be honest. Too often folks are concerned with representing their database 1-to-1 in their UI instead of representing their view.

If anyone is suffering from brittle UI code, where somehow caching issues and stale data are affecting your application, this is very likely why. You have muddled your persistence and view concerns together and it's not manageable or pretty. What this means for folks using something like React- don't directly use your persistence models in your views, create "view models" which directly represent whatever the hell it is you're trying to display. Bind your data in your view models, and not your views, and then pass the view model in as props.
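A language-agnostic sketch of that separation, written in Python for brevity (all names here are invented for illustration):

    from dataclasses import dataclass

    # Persistence model: mirrors the stored record.
    @dataclass
    class UserRecord:
        id: int
        first_name: str
        last_name: str
        created_at_utc: str          # e.g. "2018-06-01T12:00:00Z"

    # View model: exactly what this one screen displays, nothing more.
    @dataclass
    class UserCardView:
        display_name: str
        member_since_year: str

    def to_user_card(r: UserRecord) -> UserCardView:
        return UserCardView(
            display_name=f"{r.first_name} {r.last_name}",
            member_since_year=r.created_at_utc[:4],
        )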


Amen. My 23 years experience in webdev says React (the paradigm, not the lib per se) is dominating web UI precisely because it is all about unidirectional data flow.


"Write stupid code that uses smart objects"

That's a good one. It's amazing how much complexity can be created by using the wrong abstractions.


FWIW I find this is especially important for compilers and interpreters.

It's not an exaggeration to say that such programs are basically big data structures, full of compromises to accommodate the algorithms you need to run on them.

For example LLVM IR is just a big data structure. Lattner has been saying for awhile that a major design mistake in Clang is not to have its own IR (in the talks on the new MLIR project).

SSA is a data structure with some invariants that make a bunch of algorithms easier to write (and I think it improves their computational complexity over naive algorithms in several cases).

----

In Oil I used a DSL to describe an elaborate data structure that describes all of shell:

What is Zephyr ASDL? http://www.oilshell.org/blog/2016/12/11.html

https://www.oilshell.org/release/0.8.pre9/source-code.wwz/fr...

I added some nice properties that algebraic data types in some languages don't have, e.g. variants are "first class" unlike in Rust.

Related: I noticed recently that Rust IDE support has a related DSL for its data structure representation: https://internals.rust-lang.org/t/announcement-simple-produc...


> FWIW I find this is especially important for compilers and interpreters.

Totally. I'm building a relational language and it's becoming very obvious why RDBMSs don't fit certain purity ideals of the relational model (like all relations being sets, not bags).

I'm stuck deciding which structures to provide by default. I'm dancing between flat vectors or ndarrays, or a split between flat vectors (columns) and HashMaps/BTrees with n-values (this is my intuition now).

---

> I added some nice properties that algebraic data types in some languages don't have, e.g. variants are "first class" unlike in Rust.

This sounds cool, where can I learn about this?


FWIW I found this post thought provoking in thinking about data models of languages.

https://news.ycombinator.com/item?id=13293290

---

About first class variants:

https://lobste.rs/s/77nu3d/oil_s_parser_is_160x_200x_faster_...

https://github.com/rust-lang/rfcs/pull/2593

Another way I think of this is "types vs. tags": https://oilshell.zulipchat.com/#narrow/stream/208950-zephyr-... (Zulip, requires login)

Basically variant types can stand alone, and have a unique tag. Tags are discriminated at RUNTIME with "pattern matching".

But a variant can belong to multiple sum types, and that's checked statically. This is modeled with multiple inheritance in OOP, but there's no implementation inheritance. Related: https://pling.jondgoodwin.com/post/when-sum-types-inherit/

So basically in the ASDL and C++ and Python type system I can model:

- a Token type is a leaf in an arithmetic expression

- a Token type is a leaf in a word expression

But it's not a leaf in say what goes in a[i], or dozens of other sum types. Shell is a big composition of sublanguages, so this is very useful and natural. Another construct that appears in multiple places is ${x}.

So having these invariants modeled by the type system is very useful, and actually C++ and MyPy are surprisingly more expressive than Rust! (due to multiple inheritance)

Search for %Token here, the syntax I made up for including a first class variant into a sum type:

https://www.oilshell.org/release/0.8.pre9/source-code.wwz/fr...

There is a name for the type, and a name for the tag (and multiple names for the same integer tag). Tags (dynamic) and types (static) are decoupled.


I frequently find that, when refactoring especially, you find a lot of big ol' god objects that are incomprehensible. But when you break them down into 5-10 small objects, suddenly the operation they were trying to do makes perfect sense.


mmmm very true with Redux --> context+hook state


Yes! We should replace DRY (don't repeat yourself) with AHA (avoid hasty abstractions), as the dominant rule of thumb.


It's key to write code that can be easily re-factored, because most of the time you come up with a much better idea a week later.


Just yesterday I removed an obsolete doxygen tag from some 300 files and my Visual Studio went catatonic.

Well, I suspect it's not about Studio per se but rather the git integration but still. Someone avoided a "fancy algorithm" and wasted both my time and the product reputation with a "small n" workaround. Because, git is about small commits right?

I'd like to restate the first rule as "You can't tell where a program is going to spend its life".


"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

— Linus Torvalds


Amateur mathematicians worry about patterns, professionals worry about numbers


I'm neither an amateur nor professional mathematician, so I can't tell; is this statement tongue-in-cheek? If not, what does it actually mean?



Interesting that the first 3 are all about performance. Which strikes me as a bit ironic given rule #1, which could be summarized as don't worry about performance until you have to.


Pike himself says "there's a 6th rule": https://twitter.com/rob_pike/status/998681790037442561?lang=...

(6. There is no Rule 6.)

And points to the best source he finds for it on the web: http://doc.cat-v.org/bell_labs/pikestyle


dang, if this sticks to the frontpage, can you change the title to "Rob Pike's 5 rules of programming in C"? As the title is now, it misrepresents Rob Pike's words.


> "write stupid code that uses smart objects".

Writing stupid code is actually really difficult.

For me, it takes a little bit of iterating before I know just the right place to insert stupid.


I'm reminded of the old adage "I'm writing you a long letter, because I don't have time to write a short one."

(Often attributed to Mark Twain, but similar sentiments were expressed by many before him.)


I have two projects that I consider to have nearly-perfect code. Both are their third iterations, and I think they're stable at this point.

Kinda goes: 1) Make a bad solution exploring the problem 2) Explore a good idea for how to solve the now-understood problem 3) Mature the good idea through usage.


Do you use TDD? I'm not religious about it in general, but when I'm lost, confused, and easily distracted, I start with TDD to write the dumbest possible code.


No.

It's really an issue of not being sure at first what needs to be flexible & data-driven vs handled in code. If you make everything data driven, then it becomes this horrible mess where your input is basically a program and your actual code ends up being a terrible interpreter.

I tend to just build things bottom-up, and start with a small bit of functionality, then when I have enough small bits, I bolt them together and decide what I need to abstract at that point, do refactoring on the smaller bits and provide data to them from the caller. Then repeat that continuously until I have all of the functionality I need.

It might be different for other people, but I need to have working code before I can abstract it.


I don't understand. Tdd gets you to working code (if on a subset of all your data) extremely fast. There are both bottom-up and top-down styles, neither of which is particularly wrong.


The TDD aspect is not really here nor there.

To provide a more concrete example, say I have a function that performs a transformation on a piece of data. From what I currently know about the data, I can parse it using a regex. So I code up the function that accepts data, runs the regex, and returns the transformed result. Great.

Now as I'm continuing my work, I notice that some other data requires a similar, but not exactly the same transformation. It can be done with a slightly different regex. So rather than duplicating functionality, I modify the transform function above to take a regex as input along with the data. Everything works as expected.

I get further along in the project and I realize another piece of data needs a somewhat similar transformation, but this time it's just slightly too complicated for a stand-alone regex, it needs to be a function.

The "smart code" way to handle this would be to create another transformation function and call that instead. The "dumb code" way of handling this is to generalize the transformation function such that I can pass in some descriptor for the transformation, and have the transform function return the correct result.

That's the crux of the issue. I rarely have enough information at the time of writing to know just how generalized to make a function. If I created this hyper-generalized transformation at the beginning, but never needed anything beyond the original simple regex, I would have wasted a bunch of time creating code that's needlessly confusing.

TDD is perfectly applicable for development, and would help tremendously with the refactoring aspect, but what it doesn't help with is information that you don't yet know about.
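To make the progression concrete, here's a hypothetical Python version of it (the names and regexes are made up, just to show the shape of each step):

    import re

    # Step 1: one concrete transformation.
    def strip_dashes(value):
        return re.sub(r"-", "", value)

    # Step 2: a second similar case appears, so the caller supplies the pattern.
    def apply_pattern(value, pattern, replacement=""):
        return re.sub(pattern, replacement, value)

    # Step 3, the generalized version: the caller supplies an arbitrary
    # callable as the "descriptor", and the transform function itself
    # barely does anything anymore.
    def apply_transform(value, transform):
        return transform(value)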


TDD is good for features (like web apps) but not so much for algorithms.

The difference is that you only need to support a tiny fraction of possible features / use cases, but your algorithms need to be correct for a wide range of inputs.


I disagree. You code your algo for one input, come up with a corner case, write a failing test, refactor, repeat.

For an algo, let's say it operates on a list, I'll start with test f([]) == 0, and implement f to output the constant 0.

And then go from there.


That only works if the algorithm isn't already understood and so you don't know what optimal is. If you TDD sort, there is no way to arrive at "divide the list into small chunks, insertion sort each small chunk, then merge sort the chunks". (20 years ago this is what GNU sort did; I assume the algorithm has changed now that cache is a bigger factor in performance.) Even knowing that the above is your end goal, TDD to get there is wrong, because you don't know how the best algorithm will change when something else comes up as important.

Tests are good, but they need to be universal for any implementation, which means you often cannot test the internal details that prove you didn't use bogosort (picking a pathological example to make the point).


Ok but honestly, most people are not coding raw low level data structure algorithms. They are coding business algorithms. "Here's a table of accounts, calculate what they owe"

When one says "wide range of inputs", as gp did, one is not talking about low-level algorithms; one is talking about a business algorithm. You should be using TDD for this.


Why does it have to be this polarized - stupid code & smart objects. Writing smart code that uses smart objects is better than stupid code. Writing stupid code that uses smart objects is indeed difficult.


Rule 5 (data dominates) is exactly the same as lie 3 (code is more important than data) of Mike Acton's Three Big Lies (https://cellperformance.beyond3d.com/articles/2008/03/three-...). Rob Pike's rule 3 (don't use fancy algorithms) is closely related to Mike Acton's lie 1 (software is a platform) as well.

A part of me wonders if Mike was influenced by Rob Pike's rules directly or indirectly. It's also something that an experienced programmer can discover independently easily enough. Mike Acton was clearly heavily influenced by some of the failures of classic OOP design (lie 2 of Three Big lies).


I feel like #5 is why many people like GraphQL, and #1-#4 are why many hate it.

It becomes much easier to understand what data exists and build tools on top of its schema. But the extra kludge it adds to optimise the specific case of returning less data in a field is often counterproductive.


> But the extra kludge it adds to optimise the specific case of returning less data in a field is often counterproductive.

Could you elaborate?


While the rules make sense, it is often not realistic due to business constraints.

What tends to happen in real life is that the profiling and optimization of bottlenecks is forgotten once the software is "ready".

I would propose Rule 0: Developers are irrelevant, only end users of your software matter.


One of the big problems with fancy algorithms is that they either access data out of order and/or do pointer chasing. Simple algorithms tend to access the data in order.

CPUs have a lot of logic for making in-order data access very fast.


This is even more relevant after 30 years.

The Data Dominates principle is the key; everything else follows from it, including "n is usually small" and the preference for simple algorithms.

Nowadays we have exactly the opposite, which is understandable since the field has been invaded by uneducated amateur posers.

https://karma-engineering.com/lab/blog/NodeModules


Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.

True, but emphasis on the speed hack. It shouldn't stop you from thinking about performance in your design. If it means doing less, do less now. If it means adding a speed hack (usually some sort of cache), don't do it until you are sure that you need it.

Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.

I agree completely. But you have to realize that measuring is as much of an art form as optimization is. Especially on today's ridiculously complex systems, the bottleneck may not be obvious. For example a function may take a long time to run, but the real cause may be another function flushing the cache.

Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)

Disagree, unless you can prove that n will stay small. You don't know how your users will abuse your program. For example, you may have designed your program to handle shopping lists of a few dozen items, and then someone decides to import the entire McMaster-Carr catalogue, which has more than half a million items. It may be a great use case you didn't think of, and that fancy algorithm permits it by scaling well. There are also vulnerabilities that exploit high algorithmic complexity and worst cases. Don't overdo it, but N^2 is rarely a good idea if you can avoid it.

Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.

True, but with the caveats of Rule 3

Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

Agree, for me there is a hierarchy in code. From most important to least important: data (structures), code (algorithms), comments.

The order comes from the fact that if you change your data, you need to change your code too, and if you change your code, you also need to change your comments. Going the other way, you can freely change comments, and changing your code will not require you to change your data. Data is the cornerstone.


> Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident

What does this have to say about the careers and roles of data scientists vs programmers? A data scientist's entire job is to categorize and model data in a useful way. In the future, will they be fundamentally more important than coders, or will the two roles just merge?


I think you're conflating two things: in my mind, working on the shape of data is different than pulling inferences out of that data.


Am I wrong to avoid writing O(n^2) code if at all possible when it is fairly easy to use hash tables for a better time complexity? Sure when n is small the O(n^2) one will be faster but when n is small /anything/ you do is fast in absolute terms so I'm trying to not leave traps in my code just waiting for n to get bigger than initially expected.


I think, in most cases, if you have n items in memory, and you want to find m of them by id, or some such thing, your default should be to represent the n items as a hashmap, not a list, and look them up from the hashmap. There will be cases where this isn’t the right choice, but it’s a good default. And it’s almost always virtually no extra complexity to represent them this way, i.e. often something as simple as: myMap = myList.groupBy(_.id)

Don’t write complex optimizations until you know you need them, but I think defaulting to code with good O(N) complexity, where it’s simple to do so, is a good default.
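The Python equivalent of that default is just a couple of lines (assuming the items have an id attribute):

    from collections import defaultdict

    def group_by_id(items):
        by_id = defaultdict(list)
        for item in items:
            by_id[item.id].append(item)     # O(1) amortized per item
        return by_id

    # If ids are unique, a plain dict comprehension is even simpler:
    # by_id = {item.id: item for item in items}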


Keep in mind that a lot of this was written during an era where even modestly complex data structures/algorithms had to be rolled by hand.

The philosophy is really to not waste time implementing optimizations that may not be necessary. Naturally you should reach for the best tool you have in your toolbox. So if your language of choice has a hashmap that can be used with no additional work, go for it. But don't waste two days rolling your own red-black tree because it might be better.


> Am I wrong to avoid writing O(n^2) code if at all possible when it is fairly easy to use hash tables for a better time complexity

Are you sure that std::unordered_map is faster than std::vector? Did you measure?

Every time you access an element in std::vector, you also access nearby ones (thanks to L1 cache, as well as CPU-prefetching of in-line data).

In contrast, your std::unordered_map or hash-table has almost no benefits to L1 cache. (It should be noted that linear-probing, despite being a O(N^2) version of hash-tables worst-case, is actually one of the better performers due to L1 cache + prefetching)


To be fair std::unordered_map is one of the slowest hash tables in any programming language.


Worrying about performance of small collections is premature optimization.

Using maps or sets nowadays is mostly for clarity, as they are used to solve certain kind of problems.


I agree with you. But what you're talking about is completely different from what I was responding to originally.

If you need a set, use a set. But don't assume that its faster than a std::vector.

Even then, std::vector has set-like operations through binary_search or std::make_heap in C++, so it really isn't that hard using a sorted (or make_heap'd) std::vector in practice.

--------

Even if you don't plan on doing optimization work, its important to have a proper understanding of a modern CPU. The effects of L1 and prefetching are non-trivial, and make simple arrays and std::vectors extremely fast data structures, far faster than compared to 80s or 90s computers anyway. A lot of optimization advice from the past has become outdated because of the evolution of CPUs.

So its important to bring up these changes in discussion, from time to time, to remind others to restudy computers. Things change.


> sorted std::vector

But inserting into a sorted std::vector is O(n) in the worst case, right? (You binary search for the position, then have to move the other elements.) But a hash set with linear probing can give O(1) (amortized) access, while usually maintaining an invariant like total_size > 2*n. I don't think that would be such an impact on cache locality. Linear probing doesn't require linked lists.

Of course this is given that you have a data structure in standard library. But at this point I think hash sets are pretty standard.


Yeah, linear probing helps hash-sets a lot on modern CPUs.

Perhaps linear probing is a better example of how BigO analysis can go wrong on modern architectures. Inserting into a hashset with linear-probing is O(n) worst-case, while inserting into a linked list is always O(1) (best case, worst case, and average case).

And yet, linear probing seems to work out best in practice (with a bit of rigging. The total_size > 2*n invariant is one, but so does Robin-hood hashing if you want to keep the table small)

Linear probing vs Linked List implementations of hash-sets seems to be a more clear example of an O(1) vs O(n) anomaly, where the O(n) example is superior.
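For reference, the mechanics of linear probing are tiny. A minimal Python sketch (logic only; the cache behavior that makes it win in practice obviously doesn't show up at this level, and there's no delete support):

    class LinearProbingSet:
        def __init__(self, capacity=8):
            self._slots = [None] * capacity
            self._count = 0

        def _probe(self, item):
            i = hash(item) % len(self._slots)
            # Walk forward until we find the item or an empty slot.
            while self._slots[i] is not None and self._slots[i] != item:
                i = (i + 1) % len(self._slots)
            return i

        def add(self, item):
            if 2 * (self._count + 1) > len(self._slots):   # keep load factor < 50%
                self._grow()
            i = self._probe(item)
            if self._slots[i] is None:
                self._slots[i] = item
                self._count += 1

        def __contains__(self, item):
            return self._slots[self._probe(item)] == item

        def _grow(self):
            old = [x for x in self._slots if x is not None]
            self._slots = [None] * (2 * len(self._slots))
            self._count = 0
            for x in old:
                self.add(x)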


I don't think it is the same thing.

Inserting into a linked list assumes you have already found the node to insert at.

I don't remember the exact details, but in a hash set with linear probing, the worst case happens quite rarely given the hash function is a good one (and they are quite sophisticated these days). It is O(1) amortized. The same applies for a hash table with chaining: all of your keys may go to the same bucket given a sufficiently bad hash function.

Given the other choices, like (as far as I know) skip-list-based or tree-based variants, hash sets are the obvious choice.


I mean "Hash-set with Linked List" vs "Hash-set with linear probing". I realize I was getting lazy with my typing, so lemme try to be more clear this time.

Hash-set with Linked List is O(1) all cases.

Hash-set with linear-probing is O(n) worst-case insertion. But happens to be faster in practice with circa 2020-style CPUs (especially with Robin Hood insertion)

Assume the load-factor to be 90%+, so that we actually get a reasonable difference between the two strategies. We have a situation where O(n) is better than O(1).


> Even then, std::vector has set-like operations through binary_search or std::make_heap in C++, so it really isn't that hard using a sorted (or make_heap'd) std::vector in practice.

Except std::vector does not enforce the sorted or heap constraints when adding or removing elements so eventually someone working on your code will break them.


Make a wrapper class that enforces the invariant.


Also, creating a hash isn't free. And often ordering is required.


> Also, creating a hash isn't free.

Hmmm... I argue that the hash is nearly free actually.

An unordered_map traversal is probably DDR4 latency bound. That's ~50-nanoseconds (200 clock ticks) per access. What's the CPU going to do in that time?

Well, spending 10 to 20 clock ticks on a typical hash algorithm is fine. Then it will wait the other 180 clock ticks for RAM. If you got hyperthreading, maybe the CPU will go to another thread and do meaningful work while waiting for RAM... but... I think you get the gist.

Even IF the hash were free, the CPU is waiting for RAM anyway. So you got plenty of time to make that hash worthwhile. Even an integer division/modulo operator (worst case ~80 clock ticks) can fit in there while waiting for RAM, with plenty of room to spare.

I guess if everything was in L1 cache, the story is a bit different. A lot of "depends", depends on the data, the access frequency, etc. etc.


Yes. If you don't measure, you will never know when the time complexity moves in your favor due to the constant and other factors. For example, quicksort switches to an O(n^2) algorithm (insertion sort) for the last few iterations of a branch of work (typically n=1000) because this reduces the total sort time.
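A toy version of that hybrid in Python; the cutoff value is an assumption, since real implementations pick it by measurement:

    CUTOFF = 32   # assumed value; tune by measuring

    def insertion_sort(a, lo, hi):
        for i in range(lo + 1, hi + 1):
            x, j = a[i], i - 1
            while j >= lo and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x

    def hybrid_quicksort(a, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        if hi - lo < CUTOFF:
            insertion_sort(a, lo, hi)       # O(n^2), but faster on tiny ranges
            return
        p = _partition(a, lo, hi)
        hybrid_quicksort(a, lo, p - 1)
        hybrid_quicksort(a, p + 1, hi)

    def _partition(a, lo, hi):              # Lomuto partition around a[hi]
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        return i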


I would not read these rules as anything like suggesting you write O(n^2) code. I think at best I'd read them as favoring O(n) instead of O(1) for small n, but I really think that "fancy algorithms" in this case doesn't mean obvious optimizations like this.

If you can't guarantee n is small, I think it is entirely sensible to use a dictionary/hash table instead of looping over an array or list. As long as n is small the overhead is probably irrelevant, and it'll prevent surprises if n gets large as you said. And if the difference actually matters, you get back to rule 1 and 2 and measure first anyway.


If it's fairly easy, then I think it still fits the spirit of the rules. KISS and all.

On the other hand, the idea that one might be setting traps is slightly weird... if you _know_ 90% that n will be large then pick an algorithm that's efficient (and since it's easy to implement, it's a win-win). If n is always going to be small, then does the choice really matter?


If N is known to be small, you should favor simplicity and readability over complex optimization.


That depends.

If you are writing the code yourself, then the maintenance cost of everyone after you trying to understand it makes it the wrong choice. However, most programming languages have generic programming features, so you can just use a library algorithm and you aren't writing the algorithm either way. In that case the code calling the fast hash is no more complex than the O(n^2) code, so of course you select the faster one, as an application of the "don't prematurely pessimise your code" rule.

If your programming language doesn't already have built-in generic algorithms for your data, then you are using the wrong language (unless your job is to write the generic algorithms for the language, in which case this doesn't apply, because you can assume your algorithm will be used in a performance-critical part at some point).


Yeah, I do well writing dumb, inefficient code by default and optimizing it when needed, which is almost never.

If I know beforehand we'll handle a lot of data, I can pick something fast and complex to begin with, but that effort is probably mostly a waste.


Use whatever makes for clearer code, unless you are certain it's the bottleneck.


It depends. If you don't need to worry about performance, then it will work. But in some performance-sensitive situations the brute-force algorithm will beat the hash table, and the only way to know is to measure which one is better.


Refer to the rule about measuring first. It's a pessimisation because N may not even be large enough to overcome the slowdown from a hash table's cache misses. You can always profile and fix it later.


You can never profile with all the data your users are going to feed the program.


What about when n is small but you do it m times, and m is large?


Using array-based data structures also gets you more cache hits. For smaller data sets, this may well be faster than a fancy algorithm.


This is true only if you are iterating over the entire array often. If you only rarely need to access one element, the array will not be in cache anyway, so you still have to go to memory. Depending on how your data is structured, the array may or may not save time even at small sizes.


Prefetching may still make it way faster.


Of course temporal and spatial locality are important.


These resonate, especially #1, but I'm not so sure about #5. Although it makes sense to choose good data structures, I don't think that guarantees a simpler algorithm. For example, you can store your data in a heap (tree) and still need to write a traversal algorithm to print out the elements in order.
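For what it's worth, here is a sketch of mine of that exact example, using std::priority_queue for illustration: the data structure does most of the work, but "print in order" is still its own loop (essentially heapsort):

    // With the data in a min-heap, printing in order is a repeated pop.
    #include <functional>
    #include <iostream>
    #include <queue>
    #include <vector>

    void print_sorted(const std::vector<int>& values) {
        // Build a min-heap over the data.
        std::priority_queue<int, std::vector<int>, std::greater<int>> heap(
            values.begin(), values.end());
        while (!heap.empty()) {
            std::cout << heap.top() << '\n';  // smallest remaining element
            heap.pop();
        }
    }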


Rob Pike's rules of programming simplified :

- Start with stupid code built on smart data (after Fred Brooks)

- When in doubt use brute force (Ken Thompson)

- Premature optimization is the root of all evil (Tony Hoare)


Why don’t interviewers at tech companies just follow these rules while grading candidates?


Over time, the rule I've "discovered" is:

"Focus on the Nouns, not the Verbs"


Wait, wasn't it Knuth who said that premature optimization is the root of all evil?


When people say "worry about the data structures", what do they mean?


> Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

:O

Epiphany


There are actually six rules!



These rules look silly when you already know that your tight loop, the one that waits on IO or redundantly recomputes things, needs caching. No, you don't need to measure that; you know that your tight-loop function is going to be the bottleneck. Everyone knows that.

Now, the rules do make sense when you introduce an entire constraint library instead of just looping over 3-4 variables with a small search space. But again, you know it is a small search space. You know you don't have to optimize it.

I really don't get these rules.

Edit: Go ahead and roast me, but keep in mind I've probably been there and back.


> No, you don't need to measure that

Just this evening I came across some Go code that pasted an image on top of a blue background by setting every individual pixel of the destination to the background colour, then reading every pixel from the source and setting the corresponding destination pixel to that colour. I figured it'd be quicker to draw the source onto the destination with `image/draw`.

Turns out, if you're using NRGBA images, it's 40-50% slower. That's definitely an "obvious optimisation" that was proven wrong by measurement.

(If you're using RGBA images, though, the pasting method is 300% faster. Because obviously.)



