Rob Pike’s Rules of Programming (1989) (utexas.edu)
631 points by gjvc on Aug 12, 2020 | 323 comments


> Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.

I wish people would follow this rule and just let stuff work. I recently encountered the most extreme version of this I've ever seen in my career: a design review where a guy proposed a Redis caching layer and a complex custom lookup scheme for a <1GB, moderate read volume, super low write volume MySQL database. And of course he wants to put the bulk of the data in JSON fields and manage any schema evolution in our application code.

Can't we just let stuff work? I'm no fan of MySQL, but can't we admit that a ubiquitous and battle-tested piece of technology, applied to a canonical use case, on tiny data under near-ideal circumstances, is probably going to work just fine? At least give it a chance before you spend days designing and documenting a bunch of fancy tricks to save MySQL from being crushed under a few megabytes of data.


The case you've seen speaks to the guy's inexperience and lack of understanding.

I have a problem with this rule because what I see happening is people taking it to heart and no longer thinking about what they're doing performance-wise. And then the program runs 1000x slower than it should, at no extra gain (and often a loss) of readability or safety, just because someone decided to use an O(n) data structure where O(1) would do, or keeps repeating the same computation a thousand times instead of ensuring it's done once.

So to your "let stuff work", I want to also add: "understand the work you're putting the computer through", and "don't do stupid things".


Architecture is not optimization. This gets fuzzy with the decision to use caching, static generation, etc. Ideally, redis or memcached can be added when needed. It will require some changing of the app but hopefully in limited places.

This rule a la Pike is about doing things like writing assembly or manually unrolling loops in noncritical parts of code. However, in some code, almost everything is on the critical path, and that requires architecture. I’m thinking of Carmack’s single function game loop here.

I remember working on a project that claimed to want 10,000 transactions per second. “Okay”, I said, “how much can be accomplished in 100us?”

They looked at me like I was an idiot. “No, it’s going to be clustered and pipelined.”

“Ok, if you can do that maybe you have a budget of 1-5ms.”

“No, we’re going to give each transaction up to a second to get done”

I smiled and admitted that they were obviously a lot smarter than me. Oddly enough the product never saw the light of day.


Don’t worry, we’ll use serverless! So we can support an unlimited amount of transactions per second!


Latency and throughput are not the same thing. A CPU can process multiple instructions per clock cycle, yet an individual instruction takes longer than a clock cycle.


That’s not obviously absurd. If Amdahl’s law permits then Little’s law tells you whether or not it’s possible.
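For anyone who hasn't seen it, a rough Python sketch of that back-of-the-envelope math (the numbers are just the ones from the anecdote above):

    # Little's law: L = lambda * W
    # (average number in flight = arrival rate * average time in system)
    target_tps = 10_000                      # desired throughput

    for latency_s in (100e-6, 5e-3, 1.0):    # the per-transaction budgets discussed above
        in_flight = target_tps * latency_s   # concurrent transactions you must sustain
        print(f"{latency_s}s per transaction -> {in_flight:.0f} in flight at once")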


Yep. As I said I’m not smart.


Wow, you are really good at this.

Let the "geniuses" figure out that to do 10k transactions/sec with processes that can handle 1 transaction/sec would take.....10k processes. Sure, doable, in specific contexts, given enough resources. But they sure as heck aren't free resources!


I think maybe you are misreading the rule, it doesn't say don't optimize, it says when optimizing, don't guess from the code where the bottleneck is, go measure it.


Yeah, but some things, like how you structure your data (which then drives CPU cache misses), aren't something you can easily adjust/tune.

Usually when you encounter one of those it's a rewrite/rearchitecture of a whole module/subsystem before you see any gains. Been there done that, not excited to repeat it again.


The other rule is choose your data structures wisely.


The number of people who don't understand when to use a list/array vs. dictionary/hash table vs. lookup object is too damn high. A huge amount of basic optimization is constantly at their fingertips, and they nearly always choose to make a list/array and use LINQ to join/merge multiple relational data sets instead of optimized standard objects.
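The simplest illustration of the kind of win that's sitting at people's fingertips (Python here rather than LINQ, purely as a sketch):

    import random, time

    ids = list(range(100_000))
    wanted = random.sample(ids, 1_000)

    t0 = time.perf_counter()
    hits = [w for w in wanted if w in ids]        # O(n) scan per lookup
    t_list = time.perf_counter() - t0

    id_set = set(ids)                             # build the hash table once
    t0 = time.perf_counter()
    hits = [w for w in wanted if w in id_set]     # O(1) average per lookup
    t_set = time.perf_counter() - t0

    print(f"list membership: {t_list:.4f}s, set membership: {t_set:.6f}s")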


sure, sometimes your approach is at fault, but as the rule says, sometimes it's just a surprising bottleneck. Having done a lot of embedded work, I've found that many of those can often be fixed in a straightforward manner. Sometimes, you have to change your approach though.


It is all about lack of knowledge because checking where the bottlenecks are is one thing. Knowing why they are there is another.

A retailer app was very slow. The developers proposed a newer faster server with a newer Oracle version.

Then I took a look at one of the slowest queries. Changed it so it could use the indexes in a better way and the query went from over 20 seconds to 0.7 seconds.

The developers measured the query was slow, they checked the query used indexes and they were right that a server with more RAM and a newer Oracle version could improve the speed. But they missed that the query had to fetch a lot of data (using indexes) before it could start filtering the data. The only thing I did was to change the query so it could filter the data first.


Bolt - 95 cents

knowing where to put the bolt - 95 bucks per hour.


I've had many similar experiences with Postgres. I always seem to be the one adding the query logging to the apps. Then I feed the prod queries into the query planner. Then run the results through this tool: http://tatiyants.com/pev/#/plans/new Tweak the queries, add/remove indexes. Voilà.


The developers didn't actually know where the problem was because they didn't root-cause it. They knew it lay within a certain pretty large bounding box (a particular query), but didn't drill into it further, beyond checking that indices were used (eliminating some inadvertent unindexed search as the root cause).

If you actually know where (or each one of the multiple wheres if several places collude), that is usually very close to knowing why; often the same.

They should have had the intuition that if a query takes 20 seconds, even in a testing scenario where the system is not bogged down, it must be churning through a lot of data all over the place. Then think: does the query actually need to be looking at a lot of data? Maybe it's wastefully looking at more records than necessary. They didn't imagine what the machine might have to do to satisfy the query, just accepting it as a black box that the DB has optimized as well as it can be (so just throw hardware at it).


Sounds like ops vs dev (or DBA) right there; it does make sense to a point that ops would grow into that kind of behaviour, given that they can't really allocate resources into performance enhancements for software (especially if it's from an external party or off-the-shelf). To them it's a black box.

But if it's an in-house thing then there should be open communication lines. The SRE paradigm makes sense there, where I see a SRE as part dev, part ops. They can identify perf issues as a software problem and either fix it or send it back to the authors.


It was more like dev & dev.

I am not criticizing anyone because I was just lucky to notice the query could be rewritten. But it was also the fact that I had a little more knowledge about how the db engine handles queries.

So we all looked in the right direction, we all found the bottleneck but the solution was different based on different knowledge.


I would go so far as to say, their "solution" was not a solution at all. Granted, in the absence of being able to recognize alternatives, it may have been the best available option left, but it fundamentally missed the root cause of the query slowness.

Your experience exactly shows why having a diversity of opinions/backgrounds/expertise on a team is a very valuable trait. Had no one realized you could rewrite the query, would scaling vertically and upgrading Oracle been a fatal mistake for the team? Probably not, but damn if it wouldn't have been a big waste of time/money.


This is particularly exasperating for me. I can't tell you how many times in my professional career I've ended up speeding up systems by removing two or three layers of improperly-implemented "caching" and using good ol' MySQL and a basic understanding of algorithmic time complexity to simplify things.


Me too. I've seen a few systems where a simple request requiring 10,000 queries was "optimized" into requiring 10,000 cache lookups, when they should have just added some joins. The bottleneck is the network latency, not the database. The worst I've seen is an NHibernate cache stored in a session variable; half the database was being serialized/deserialized on every HTTP request. Fortunately that was a small database.

Even with in memory caches I've seen systems grind to a halt by death of a thousand cuts, dictionary based entity attribute systems where each attribute is looked up individually. There seems to be a mentality that constant lookup == free lookup and devs don't seem to realize that constant * $bignumber == $biggerNumber. Caching shouldn't be granular.

Obligatory latency numbers every programmer should know: https://gist.github.com/jboner/2841832
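The shape of the fix is usually something like this (hypothetical sketch; fetch_order / fetch_orders_with_details stand in for whatever data-access layer is in play -- the point is the round trips):

    def load_orders_n_plus_1(db, order_ids):
        # One network round trip per order: 10,000 ids -> 10,000 * ~1 ms of latency,
        # regardless of whether each hop goes to the database or to a cache.
        return [db.fetch_order(oid) for oid in order_ids]

    def load_orders_joined(db, order_ids):
        # One round trip; the database does the join/filter work and streams the rest.
        return db.fetch_orders_with_details(order_ids)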


It very much depends. I've seen too many cases when business logic was moved to the database layer, with absolutely disastrous effects. Scaling application backends doing map-reduce on pre-filtered datasets is way easier than scaling a database to handle five-way joins. Yet I've seen opposing cases too, when an RDBMS was used as a huge key-value store and was pushing hundreds of thousands of rows to tens of backends, sucking its bandwidth dry.


Having worked with Hibernate recently on a smallish database (that isn't expected to grow too big), my takeaway is to minimize the network trips. If you have to check something for a list/set of things and all of that can be done from, say, two not-so-big tables, fetch the superset of records via a join with as many criteria pushed into predicates as possible and handle the rest of the logic in application code. It is so much quicker. Of course, like most things in software, conditions apply: network latency, the total number of records being fetched by the query, the potential for data to suddenly grow, etc.


Not caching simple trips to the database is pretty uncontroversial; there are more controversial cases elsewhere.

Say you consume a RESTful API to hydrate order data in another system. Do you fetch orders as:

  /orders?ids=id-1,id-2,id-3
or separate calls to

  /orders/id-x
which can be cached and retrieved by id as a memoized function?

Well if you had to pick only one the second is probably better, but the best would be abstracting away order fetching in application code to always fetch single orders and behind the scenes looking up the cache for singles and pooling all the misses into a single request to the plural endpoint.
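Something like this, as a rough sketch (cache is any dict-like store, and fetch_orders_bulk is a stand-in for a call to the plural /orders?ids=... endpoint):

    def get_orders(order_ids, cache, fetch_orders_bulk):
        found = {oid: cache[oid] for oid in order_ids if oid in cache}
        misses = [oid for oid in order_ids if oid not in found]

        if misses:
            # one request to the plural endpoint for everything we don't have yet
            for order in fetch_orders_bulk(misses):
                cache[order["id"]] = order
                found[order["id"]] = order

        return [found[oid] for oid in order_ids]   # preserve the caller's ordering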


If this is a common occurrence the former would almost always be better, the second would be absolutely glacial.

> but the best would be abstracting away order fetching in application code to always fetch single orders and behind the scenes looking up the cache for singles and pooling all the misses into a single request to the plural endpoint.

Unless a significant amount of requests have 100% cache hits then I doubt a local cache will make much of a difference at all, all it's saving is a bit of bandwidth.


I had worked on an application that saw less than 100 writes per minute and about 1k reads per min and used a caching layer in front of the DB. Not only was the cache actually slowing us down, it was also inconsistent with the DB. Can't even begin to express the amount of lost dev time and productivity. We couldn't get rid of it, because someone above was convinced that we would need it to scale in the future.


There are only two hard things in computer science: naming things, cache invalidation, and off-by-one errors.

I'm trying to never implement any caching if I can help it. The database itself does caching already as well.

And if you DO need caching, keep your hands off of the application; add a cache layer in front, or between the application and the database. But don't invent it yourself.


  And if you DO need caching, keep your hands off of the application; add a cache layer in front, or between the application and the database. But don't invent it yourself.

so, redis?


My tech lead once sped up our system by just turning off Memcached. :-)


My problem was people using ORM frameworks and having very little knowledge of the database they were using. Slow as molasses, difficult to figure out what is going on because of many layers of extra "stuff".


Same boat. Basically every day.


It’s insane how often people stick Memcached/Redis in front of MySQL/Postgres, completely unnecessarily. So many otherwise decent developers assume that MySQL/Postgres is too slow/doesn’t scale, without even trying. Or similarly, how frequently people shard databases unnecessarily. A single MySQL/Postgres server, on a beefy machine, can handle an absolutely massive amount of work, with great performance, assuming you’ve indexed well and don’t have to run super crazy queries.

I’ve worked on some pretty high traffic systems, with pretty large data volumes, and very, very rarely have I actually needed to cache simple DB reads. And it’s hard to properly cache complex ones anyways, because cache invalidation is so hard for complex queries. Likewise, it’s very rare that I’ve needed sharding. SOMETIMES you need these things, but mostly people are adding a lot of extra complexity and cost for no good reason.


I don’t think people realize that databases themselves have a caching layer internally. Redis isn’t magical, you still have to send network packets to the other server.

Even when reading from disk via mmap, hot pages are in memory.

Sometimes postgresql is faster than redis because all it needs to do is read something from memory and spit it out in the right format.


Totally. If the hot pages of your indexes and hot pages of your records mostly fit in memory, DBs mostly read from memory anyways. If they don’t mostly fit in memory, things can slow down a bit, but you’re likely better off vertically scaling your DB (more RAM) than adding a cache.

Looking up a normal sized record by id, with good networking in your data centre, takes ~1-2 ms round trip whether you’re reading from Redis/Memcached or MySQL/Postgres, and either can handle massive read load if sized properly. The cache just ~doubles your costs, is one more thing to patch/maintain, and introduces new sorts of bugs/outages.


I ran into this early in my career when a senior engineer at a startup reviewed my Python PR for operating on a CSV dataset and insisted I rewrite it in Pandas for performance. It was a very simple program with naive Python (a handful of lines), but the Pandas version was far longer and more complex (I had to have the senior engineer help because it took some advanced Pandas-fu, and he spent nearly a full day on it), and it ended up being an order of magnitude slower because we ultimately had to call back into Python for each cell.
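(For illustration only, with made-up column names, the shape of the two versions was roughly:)

    import csv

    # The naive version: one pass over the file with the standard library.
    def total_discounted(path):
        total = 0.0
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                total += float(row["price"]) * (1 - float(row["discount"]))
        return total

    # The "optimized" Pandas version: .apply with a Python lambda still calls
    # back into the interpreter per row, so the DataFrame machinery buys nothing here.
    #   df = pd.read_csv(path)
    #   total = df.apply(lambda r: r["price"] * (1 - r["discount"]), axis=1).sum()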

I actually don’t think “premature optimization” is all that bad in the general case, but you have to be educated about it (and I’ve met a lot of people in Python shops who think that Pandas or multiprocessing will cure all performance ails), and you should specifically think about how costly is a given optimization going to be to maintain or back out of if you’re wrong about the performance benefits.

In general, I’ve never worked on a Python project where we didn’t have to do weird, inordinately expensive things to work around performance issues (though I’ve never worked on a rudimentary CRUD app either) nor has “just throw Pandas/C/multiprocessing at it” ever adequately addressed our most significant performance bottlenecks (usually the solution looks something like Spark, all to do something that would have been sufficiently performant with naive Java or Go). This might just be my experience working on nontrivial SaaS apps; maybe if you’re just doing straight data science or CRUD apps or workloads that aren’t latency-sensitive (mind you, we struggled to keep per-request performance in the tens of seconds, so I use “latency-sensitive” very loosely), Python/Pandas will be just fine. We also ran into other problems with Python, such as packaging and distribution; notably our lambda functions were routinely too large because the pandas branch of the dependency tree was itself more than half of the permitted artifact size (to work around, we switched to Fargate tasks, which have a much larger size limit but take 30s or minutes to boot up).


"I ran into this early in my career when a senior engineer at a startup reviewed my Python PR for operating on a CSV dataset and insisted I rewrite it in Pandas for performance."

Did a performance problem exist?

If not, then I would never promote that person to senior engineer. A working program should only be re-engineered for performance if it wasn't meeting the performance contract agreed upon when the program was written.


No, the performance problem didn’t exist. The engineer in question was really smart in many other areas, but “engineering” per se wasn’t his strength. Things are fast-and-loose in startup world.


Yes, startups are the wild, wild west. I spent 1994-97 in startups. I loved the lack of bureaucracy, but we spent too much time in firefighting mode. Through trial and error I found that medium-small organizations in the non-profit world were my best fit.

Early on I had a senior guy mentoring me on a project involving a tool called PowerBuilder. He chose a design that didn't fit the problem-space well but it fit the "PowerBuilder Best Practices" so he implemented it. The performance was abominable and he should have known it would be: he too was a smart guy. But he had a hard time seeing "big picture" design.


Do you work at my company? Because I have a coworker that is always proposing _exactly_ that solution. No matter what issues the code has, for him its usage of MySQL is always "the worst moment".

Asking for benchmark just gets a repeat of "our worst moment is MySQL and we can solve that with some NoSQL cache".


My dude's just trying to get a promo, I guarantee it.


We all start out this way, don't we? I think what drove it out of me was having to maintain things. Okay, you wrote this thing, now you need to change the way a few features work and add some others. Especially effective is taking turns being on call, getting paged in the middle of the night when your multilayer architecture has a hiccup and now you have to troubleshoot it.

I wish there was a quick remedy for such people. But usually they are new, based on what I said. So they should be Junior Developer, not Architect. Sure, you can suggest things, but the answer is nope.

A lot of people, though, may never get much experience supporting their own architectures, because they swoop in, then swoop out, never staying at a job long enough. Or support gets assigned to another group. One thing I like about Agile is that the team that creates the code is the one that supports it.


Yeah. I got told to put in a caching for a system that made very slow calls through a handful of proxies to the other side of the world and back.

We went with redis for the "PoC". I'd tried explaining that if a query is only made once every 24 hours and the data can't be cached longer than that because it's considered too out of date then a cache is pointless extra complexity.

He wasn't having it though, so he made us build it and demo it to a room full of people. Fortunately the room understood the simple explanation and he listened to them where he wouldn't listen to the dev team, so a few weeks of work was scrapped there and then.


It seems that a subset of developers are somehow convinced that adding complexity is a good thing for performance and should be done first, when it should really be a last resort. Their usual rebuttal is "it's scalable" and other buzzword-laden phrases. I wonder if it's a form of "big data envy" or just regular "architecture astronautism".


Honestly, it's the other way around usually; simplicity is scalable. A stateless service talking to a database is fine. You can scale out the service, and scale up the database. It's even easier when you use a cloud provider, their relational database offerings can scale from a wordpress blog's database to enterprise scale, tens of thousands of transactions / second.

The problem is that it's boring, and there's a lot of developers that create work and complexity to make their own jobs interesting.


No, no. We need all these layers of proxies, caches and various databases to make things more reliable!


This is taken to the other extreme many times.

Example: the Google Chrome codebase was allocating a lot of std::strings, and someone also used a Set to check membership of a single item. [1]

I mean, to put it like this, many people don't even care about algorithmic complexity.

Doesn't help that people want to write Python in the monster that is C++.

https://groups.google.com/a/chromium.org/forum/m/#!msg/chrom...


I very much agree that many people use rules like these as excuses to write shitty code.

From the post you linked though:

> Not reserving space in a vector when the size is known

std::vector::reserve() is actually not something you should always use when you are adding a number of elements, as it will (typically) grow the vector to exactly the capacity you ask for. If your function then gets called in a loop to append to the same vector multiple times, you end up with quadratic run time, which is normally avoided by the geometric growth you get when you just append without reserving.


>and also someone used a Set to check membership of single item. [1]

Depending on where it was done (fast path or not), this could be just fine.


You don't know when something ends up on the hot path.

    if (std::set(itr.begin(), itr.end()).count(element)) { _____ }

Is that more tempting for someone than something like

    if (std::find(itr.begin(), itr.end(), element) != itr.end()) { _____ }

? I don't know. That said, the C++ STL is quite undiscoverable.


>You don't know when something comes to hot path.

That's the point of the first advice though...


Wholeheartedly agree. What you're describing is like the ideal MySQL use case. Also, even the most basic next steps performance-wise will probably tide you over well into the TBs of data. The headroom for MySQL is quite a bit higher than most people think unless you're operating at Facebook scale.


I see this behavior a lot too.

Inexperienced engineers will nitpick about what is often minor “performance optimization”, clearly not seeing the bigger picture. For example, why should we spend precious developer time rewriting some code into something that is often less readable when it’s called once every 30 seconds?

To the folks who do this: you are better off spending the time making data-driven decisions and optimizing for the big picture. In other words, measure first, then come up with an optimization that has a large impact on the system. Not this micro-optimization, I-love-to-tickle-myself stuff.

Learn to see the bigger picture.


> a Redis caching layer and a complex custom lookup scheme

Bane of my life. A few gigs ago, they had a global Redis cache, a Redis cache per server, an in-app cache, and then MySQL. Needless to say, there were many, MANY bugs that came down to cache coherency and race conditions between them.

[edit: it was a global Redis, not clustered.]


Did we both work at a certain edtech company? Seems oddly specific and exactly how I remember our monolith being structured


Alas, that wasn't edtech, it was photos related.


No comment on your particular example, but the opposite approach is more common than I'd personally like and also not ideal -- if you're doing anything nontrivial it's probably worth doing a back of the envelope calculation to ensure you have enough disk, memory, and network throughput for your favorite solution to handle the expected load.


> Can't we just let stuff work? I'm no fan of MySQL, but can't we admit that a ubiquitous and battle-tested piece of technology, applied to a canonical use case, on tiny data under near-ideal circumstances, is probably going to work just fine?

I call that "separation of error-domains", meaning trying to find interfaces where you can separate off parts of your application, so debugging gets easier:

Is the bug in the database, or in our code?


I feels ya.

My most recent team had our hottest dataset in dynamodb. Because cloud. So much effort to get our web service's P99 <50ms.

The whole dataset fit easily into RAM.

Thru (way too much) effort, I was able to introduce Redis. First shared. Then eventually one mini-instance per EC2, running alongside nginx & nodejs.


> <1GB, moderate read volume, super low write volume MySQL database

It looks like an excellent candidate to put it entirely in RAM and trigger sync on writes only, why on Earth would you need anything else.


That's effectively what MySQL is going to do.


For the most part, yes, but I was thinking about explicitly preloading all tables and indexes into RAM or even using the MEMORY engine, if it makes sense in that particular case.


Hehehe, I have actually put some data that doesn't change more often than once a week hardcoded in the code base and committed to source control... whenever a change needs to happen in the data, the CI/CD runs and deploys a new version of the app.

You don't wanna know how that was done before. It is like < 1 GB of JSON as well.


I wish we had a deployment process trivial enough to support that. On the other hand, eventually you want to farm that job out to customer success or an implementation engineer, and then it's not so fun having the data in source control.


Please tell me you have git LFS configured to handle this.


Yup !


Never read the release notes for MySQL. Unless phrases like "server exit" or 'regression' don't bother you.


In The Mythical Man Month Fred Brooks said "Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious."

I first read that on Guy Steele's site: http://www.dreamsongs.com/ObjectsHaveNotFailedNarr.html


> Show me your tables, and I won't usually need your flowchart

A couple of years ago I spent quite some time trying to evaluate the tech stack (and general engineering culture) of merger/acquisition targets of my employer. It was quite a fun exercise, all said and done. I encountered all sorts, from a small startup team who had their tech more or less sorted out, to a largish organisation who relied on IBM's ESB, which exactly one person on their team understood!!

I discovered this exact method during the third tech evaluation exercise. When the team began explaining various modules top-down and user flows etc., I politely interrupted them and asked for the DB schema. It was just on a whim, because I was bored of the typical one-way session interrupted by me asking minor questions. Once I had a hang of their schema, the rest of the session was literally me telling them what their control and user flows were and them validating it.

Since then it's become my magic wand to understand a new company or team. Just go directly to the schema and work backwards.

Conversely, I've begun paying more attention to data modelling, because once a data model is fixed it's very hard to change, and once enough data accumulates the inertia just increases; instead of changing the data model (for fear of data migration etc.), the tendency is to beat the use cases to fit the data model. It's not your usual fail-fast-and-iterate thing.


I have learned to spend a good chunk of my effort and focus on the data model - it is literally the heart of the application. Once that is done correctly, I've seen that the code almost just falls into place by itself.


Based on your experience it seems that #5 ought to be first: "Data dominates"


Restated by Linus with a bit more modern nomenclature (and Linus's trademark bluntness):

> Bad programmers worry about the code. Good programmers worry about data structures and their relationships


> Bad programmers worry about the code

And yet, I see a whole swath of the industry hyper-focused on various linters/styling/rules.


> And yet, I see a whole swath of the industry hyper-focused on various linters/styling/rules.

It seems to me that what you're actually seeing is an entire industry trying to eliminate all code-related issues, specially bike-shedding ones.

This is patently obvious to anyone who was forced to waste their time in code review iterations discussing, say, where a brace should go and how many spaces someone should have added.


This. The point of the style guide, now reaching its best embodiment in clang-format, gofmt, and others, is that you don't want to waste time arguing about, or even considering for a moment the formatting of anything.


the people who might not follow explicit guides are also the ones who don't typically spend the time 'arguing' about it in the first place though.


This is usually a good time to apply the When In Rome rule. Do not reformat needlessly; follow the code style of the code you're modifying. Done.

(If multiple people are arguing back and forth in code review -- when following the WIR rule -- tell them about the WIR rule and that should settle it. If not, you have bigger problems in your team.)


Sort of. You shouldn't combine style and non-style changes in one change set. But if your project has mixed styles from file to file you should assimilate the locals over time, just as the Romans did. A consistent style has value.


I agree (in principle!) with the idea that consistency across the whole code base has value, but personally I find it extremely marginal, unless you're literally dealing with e.g. hundreds of people needing to read/modify the code. The value proposition skews heavily towards situations where you have a lot of people needing to read/modify the code.


Nobody was ever "forced to waste their time" on this stuff. I have a simple rule - I don't comment on other people's style, and if people comment on my style, I just go with their suggestions. Problem solved.


You've never been on a team with two people with opposing opinions I guess.


Just out of morbid curiosity... have you actually experienced multiple 'seniors' giving conflicting code review comments about code style (of all things)?

That sounds quite dysfunctional.

(EDIT: Sure, nitpicks may differ, but...)


> That sounds quite dysfunctional.

No, it doesn't. It sounds like the expected outcome of not enforcing an established style with automated tools.

All it takes is someone posting a merge request with a bracket out of place, or tabs instead of spaces that screw up the layout, because yes, IDEs have custom settings and a dude happened to open a source file with an editor that wasn't properly configured.

Boom, merge request receives two comments pointing out the bracket and how indentation is off.

Congrats, about 20 minutes of your team's day are wasted because that's the time it takes to receive feedback from the merge request, be briefed on the remarks, go through the code and fix whitespaces, commit your change, push those changes, update the merge request, and wait for a team member to review your update.

No drama. No dysfunctional team. No disagreement, even. But those 20 minutes of your life are lost forever.


> It sounds like the expected outcome of not enforcing an established style with automated tools.

Unfortunately those also have significant downsides around large-scale refactoring.

What I always do (and advise other code reviewers to do) is to just ask themselves: Does this code follow the local style in the file being edited?

That simplifies things greatly, IME.


Style yes, formatting no.

Worked with two developers who endlessly argued whether or not we should handle a certain bit of complexity in a certain layer or the next layer over, so we ended up handling it in both layers with the downsides of both.


Having different people using different coding styles adds a lot of noise to the history of your repo, especially if those differences are not only about whitespace (e.g. one developer insisting on opening brackets on the same line and another on a new line).


> I just go with their suggestions.

Why though? I am not going to go with suggestions if they make the code less readable for me!


I think the whole world would benefit if we’d make it so that code formatting happened separately from what was committed. Then everyone could have their local checkouts formatted the way they wanted and there would be nothing to argue about in terms of coding style.


Why though? It seems like an elaborate technical solution just to avoid someone making a decision on fruitless bike-shedding discussions. It will only work for white space formatting not for other conventions prone to bikeshedding like naming/casing conventions.

Just make decision and get on with your lives. Have some kind of linter check for inconsistencies before any human review.

If an organization cannot make a decision on inconsequential bike-shedding it is dysfunctional.


After pre-commit linting, you get post-pull linting... If you kept the code in a RAM drive this might even be quick. 8)


Smart tabs is halfway there.


This would be my number one wish for tooling.


Every have more than one reviewer in your CR?


Doing any kind of style discussion in a code review means you’ve already failed.

I personally get super annoyed when people keep pointing out style issues, but our CI tool can notify me of issues with my commit until the end of time without me getting frustrated with it.


> Doing any kind of style discussion in a code review means you’ve already failed

This sort of baseless assertion has no bearing in reality. In a project that hasn't adopted any linting tools and automatic style checks, all it takes is a misconfigured editor to post a change request that fails to comply with style guides. These sorts of absolutes show a complete detachment from reality and absence of any practical experience in the field.


> These sorts of absolutes show a complete detachment from reality and absence of any practical experience in the field.

But you are making these baseless assertions yourself?

Obviously you can have issues if you are not using automated linting (both in the editor and on CI). That’s part of the failure.


I completely agree. One reason I like prettier is that it only has about 6 options you can change. It removes the bike-shedding. Just let it do its thing and worry about more important things.

It also removes all debate in PRs about style and formatting.

(note: before prettier, I was fairly particular about how I formatted my code, and I disagreed with prettier in some cases, but now, I love having one less thing to think about)


> > Bad programmers worry about the code

> And yet, I see a whole swath of the industry hyper-focused on various linters/styling/rules.

I've come to have a severe distaste for this good programmer/bad programmer mentality I've seen on the internet for, I guess, decades now.

There is skill in programming, yes, obviously. But this simplistic divide seems to me to be more about putting one's own ego on the superior side. It leads to simplistic heuristics and flames rather than nuanced discussion

In this case, in my opinion, linters/styling/rules help people to focus on what matters. And sure, with sufficient skill you might not need any of that to help you focus on what matters. But so what? It's better if we can make the trade more accessible, and can make it so people can focus on what matters with less experience


Couldn’t agree more. That’s the main reason why simple and accessible projects are praised most by general people.


And that’s because it’s Bad Programmers who need help!


but... they need help on data structures/relations and up front thinking about those issues, not where curly braces should go, or tabs-v-spaces.


Right, because the hyper-focus on linting of the industry is the symptom here, it's not a misguided treatment for the underlying problem of bad programmers.


That stuff is hard! Better to just shove your data somewhere unstructured and then you don't have to worry about data structures and relations.


... and "rules" for programming.


Bad programmers inflicting their worry upon the others.


I do like to use a linter to make code easier to read. But yeah, most of the architecture is actually based on how you store/structure your data. Code is just a result of how you implement the data.


>I first read that on Guy Steele's site.

It isn't Guy Steele's website. That page was written by him but the website is owned by Richard P Gabriel.


I've recently started a job in a very complex business domain, but sadly they're using NoSQL for everything. I've known for a while about the technical tradeoffs of NoSQL, but until now I'd never experienced how the lack of expressiveness in the data store is a major obstacle to understanding what kind of data the code deals with and how it's related. The data's all there, but exploring it without a real schema is much more difficult.


An early mentor put it as “learn the data, which won’t change, before learning the fancy stuff on top, which will”

That carried me very well.


Plenty of professional developers would benefit greatly if they read Domain-Driven Design.


Interesting. What’s the best resource beyond the Wikipedia page?


"The term was coined by Eric Evans in his book of the same title."

https://en.m.wikipedia.org/wiki/Domain-driven_design


That’s Dick Gabriel’s site; he posted gls’s essay there with attribution (so you didn’t realize which site it is). He and quux are friends and collaborators.


as someone in ML, I see myself wanting the opposite.

ML researchers drown their algorithms in huge tables of results, effectively spending time on "how well" rather than the "what".

It often leads to things being added as long as they are better, with the end result being a gargantuan monster of models and hand-engineered changes, all with no one understanding how the whole thing works as a single unit.

Flow charts are incredibly effective as the top most layer of abstraction. Does the whole process, when viewed in an end-2-end manner, make sense ? We dive into the details only if it passes that sniff test of a flow chart.

I might be missing the point being made here, but they can claw flowcharts from my cold dead hands.


"Flowchart" has historically, in Brook's time, meant "flow-of-control chart", and these usually degenerate into vast webs of minutia -- useless as abstractions.

But perhaps you meant "flow-of-data between structures" -- in which case we have agreement on engineering, but a muddle on semantics.


When Brooks says tables, I believe he means the internal data representation, rather than "tables of results".


Rule 5 seems to mirror one of my favorite insights from Alexander Stepanov:

> In 1976, still back in the USSR, I got a very serious case of food poisoning from eating raw fish. While in the hospital, in the state of delirium, I suddenly realized that the ability to add numbers in parallel depends on the fact that addition is associative. (So, putting it simply, STL is the result of a bacterial infection.) In other words, I realized that a parallel reduction algorithm is associated with a semigroup structure type. That is the fundamental point: algorithms are defined on algebraic structures.

This is also exemplified in the analytics infrastructure used at stripe: https://www.infoq.com/presentations/abstract-algebra-analyti...


In case you've wondered what a monoid is, that's a monoid. Something with an associative operation (and an identity), so you can do the operation on chunks in parallel, like addition.
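A minimal sketch of that idea in Python (the chunks could just as well be handled by separate workers or machines; associativity is what guarantees the combine step gives the same answer as a serial fold):

    from functools import reduce
    import operator

    def chunked_reduce(op, xs, chunks=4):
        # Split the input, reduce each chunk independently, then combine the
        # partial results. Only valid because `op` is associative.
        size = max(1, (len(xs) + chunks - 1) // chunks)
        parts = [xs[i:i + size] for i in range(0, len(xs), size)]
        partials = [reduce(op, part) for part in parts]   # these could run in parallel
        return reduce(op, partials)

    assert chunked_reduce(operator.add, list(range(1_000))) == sum(range(1_000))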


Every monoid is a semigroup, but it's only a monoid if there is also a value that serves as an identity.


Yep. And if what you have is an Abelian Group, then you also get distributed computation as well (thanks to commutativity).


While true, that's too strict. An Abelian group (like any group) needs inverses. You get distributed computation if you've got an Abelian semigroup.


Thanks for the correction. I think that in Avi Bryant's talk (that I linked to above) Stripe ended up using Abelian groups for some reason, rather than Abelian semigroups, though if so I forget the reason why.


Inverses don't show up as much as I'd (aesthetically?) like in computing. There was an interesting application here: https://www.reddit.com/r/haskell/comments/9x684a/edward_kmet...


To be fair, Abel did not know (or care) about semigroups.


That matches my understanding, but the terminology is still (IME) common.


You can distribute the computation on just a monoid as well but it needs more bookkeeping. In particular, your reduce function should know

* lhs is before rhs

* There is no data between lhs and rhs


One way of looking at it is that equipping our data with that bookkeeping gives us something that commutes.


Hmm, sure, but it is not a requirement that your underlying algebraic structure should commute, so I think the original phrasing was misleading. The bookkeeping allows you to commute a specific list of objects, even though the underlying operation is anti-commutative (i.e. there exist a, b with a.b != b.a).

At the moment of computation, you can build a new structure that commutes by enumerating the data. I guess it's true that you need a commuting intermediate data structure to be able to distribute.


Yeah, I think it's informative that you "need commutativity" but important that you can build it yourself. It's nice (mostly from an efficiency standpoint, sometimes from a complexity standpoint) when you can get it "for free" because the underlying type is commutative, and the fact that you're shooting for commutativity can inform how you build and test the bookkeeping.

As an interesting nit, "anticommutative" specifically means that a.b = -(b.a), which is different than simply not being commutative.

A group might be commutative, anticommutative, neither, or even both (trivially true of the empty group and the group with one element, but I think it can be true of larger groups).


Ah, my terminology is rusty as it's been years since my math classes. I guess I need to review my algebra books again. I do have some personal code that called "exists a,b a.b != b.a" anticommutative, didn't even realize that's wrong terminology!


One of the ideas in the talk I linked is how you can represent typically non-commutative data (like averages) in a data structure that does support commutative operations (a numerator/denominator pair), to take advantage of generic analytics infrastructure.
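The trick, roughly (a toy sketch): averages don't merge directly, but (sum, count) pairs merge with plain addition, which is associative and commutative, so partial results can be combined in any order.

    def merge(a, b):                      # combine two (sum, count) partials
        return (a[0] + b[0], a[1] + b[1])

    def average(acc):
        total, count = acc
        return total / count

    shard_1 = (sum([1, 2, 3]), 3)         # partial result from one worker
    shard_2 = (sum([10, 20]), 2)          # partial result from another

    assert average(merge(shard_1, shard_2)) == average(merge(shard_2, shard_1)) == 7.2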


But adding floating point numbers isn't associative, in general. Sometimes you need to do it the right way to avoid catastrophic cancellation.

I guess the key is to know how to deal with things that are only mostly true.


> But adding floating point numbers isn't associative, in general. Sometimes you need to do it the right way to avoid catastrophic cancellation.

That exactly proves his point. Systems that are associative can be processed by the parallel algorithm he was thinking of. Floating point numbers, if you care about their non-associativity, cannot be processed by that algorithm. So the validity of that algorithm depends on whether the system is associative.


That's true about floating point numbers. I assume that depending upon the context, it may not be a big issue (e.g. GPU compute)?

In any case, the point Stepanov was making is that if you want to be able to use a certain algorithm, then you have to make a choice to represent your data in a way that enables that algorithm, and the way you know whether the structure is appropriate for that algorithm is the algebraic properties of the structure.


No, the key is to use good abstractions unless you have a really good reason. New code should not be using IEEE 754 floating point unless you have benchmarks and profiles showing that you actually need to.


That’s why in C++ we have traits and overloading.


Could you explain where you see traits and overloading helping with floating point operations?


And that was 10 years before Haskell went huge in that idea.


I'm not super familiar with either the C++ or Haskell communities, but Stepanov's notion of Generic Programming[1] certainly seems to fit with the Haskell ethos.

[1]http://www.generic-programming.org/


Recently I searched the Web, trying to find out the origin of monoids as an approach to distributed computing, and couldn’t find it. This quote is a great find for me! Is this the origin?


If you're thinking about map-reduce, the original Google paper talks about associativity.


Everyone building no code tools is learning or will learn that the problem most businesses have is not a lack of coding skill, or the inability to build the algorithm, but rather how to structure and model data in a sensible way in the first place.


Modeling the data and structuring the program are indeed the harder tasks, but orgs have lots of smart people who have those skills but not the familiarity with various existing syntaxes and standard libraries and so on that a programmer learns over the decades of their career. Further, those same orgs probably have many people with experience in the latter but without any special ability to think abstractly. This significantly limits the ability to create tools. Further, the no code tools often abstract at a more appropriate level than general purpose programming languages’ standard libraries because these tools aren’t trying to be general purpose (at least not to the same degree as general purpose programming languages). Lastly, I’ve seen business people use certain no code tools to build internal solutions quickly that would have taken a programmer considerable (but not crazy) time to crank out, especially considering things like CI/CD pipelines, etc. Nocode won’t replace Python, but it serves a valuable niche.


If no code tools are anything like ORMs, there will be some interesting surprises when one encounters non-normalized data structures.


I've been preaching this to all my non-coder creative marketing type buddies who have ideas all the time and think they can just whip up a product using all the latest and greatest no-code tooling.

They are destined for failure.


> Tony Hoare's famous maxim "Premature optimization is the root of all evil."

Actually that was Donald Knuth - it's an urban legend that it's an urban legend that it was originally Knuth. Hoare was quoting Knuth, but Knuth forgot he said it, and re-mis-attributed the quote to Hoare.


And it is usually quoted out of its context.

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."


It's also often interpreted literally.

Premature complex optimization is a bad idea, but simple (read, cheap to code) optimization for common bottleneck patterns is a perfectly reasonable thing to do.


I like what Chandler Carruth said about why code is slow: "the death of a thousand cuts".


It's also (ab)used far too often to justify performing no optimisation at all.


...or even pessimisation.


That is a great word.


I don't think that changes the meaning. Once that 3% matters to you and you've invested the work to measure that 3%, it's not premature anymore.

That "premature" and "optimization" are undefined and left up for debate is what makes it trite.


It does change the meaning IMO. It means that 3% of the time you should be doing "premature" optimization.

The point of most people referring to this quote is never try to optimize anything as you write it. First build your system(?), then measure it, then optimize it. Knuth's point is that this attitude is ok most of the time, but sometimes it's not ok. Another way of putting this is that most of the code, for most applications, isn't performance critical. But some code is.

Sure, you can't always tell in advance, but sometimes you can. This is sort of the difference between terrible software that will always suck and well crafted software; no amount of measurement or after-the-fact optimization will turn that terrible software into well crafted software.

The other aspect that I think is often missed is that these observations are often made at different scales. You can look at a relatively short algorithm (let's say merge sort) and it may not be obvious which instructions are the ones that need to be optimized and what the bottlenecks will be, e.g. execution units or data access. So you start with a reasonable but maybe naive implementation and then you optimize from there. That's a pretty solid idea. But taking that idea to a higher scale level, e.g. saying we're going to build this huge system with a billion lines of code and so we'll just throw something together and measure it, isn't exactly the same thing; that's a pretty problematic idea. You need to be able to anticipate what the bottlenecks in your billion-line system are going to be, because finding that out after you've written a billion lines could be a big deal.

[EDIT: and really this whole long story is why these sort of rules don't work. Because the people who know (have the experience/craftsmanship) don't need the rule and the people who don't know won't understand it. It's like reading a book about sword fighting and then trying to go into a sword fight... The reading can complement your training but can't be substitute...]


This reminds me of that Woody Allen joke about someone translating all the T.S. Eliot’s poems into English after some vandals had broken into the school library and translated them into French.


I've always been uncomfortable with these kinds of ideas. The odds that the idea will be correctly applied are heavily tied to intelligence, culture, and situation. Instead of reducing the space of options you must consider, all it says is that you should "do it this way when you should and do it the other way when you shouldn't." I suppose perhaps it is useful to highlight that the decision exists, but I would be surprised if anyone working in the space is unaware of the existence of the decision.

The scientific method has a similar problem. A scientist should form their hypothesis before gathering data to evaluate the hypothesis. If a scientist fails to do this, and starts engaging in p-hacking or data dredging, the quality of their research greatly declines. But that a hypothesis was formed before data was collected is not usually provable when just looking at the publication itself. And further, there are ways that data dredging can unintentionally sneak into the scientific process, especially around the phase that precedes hypothesis formation: observation.

This kind of idea has large technical impact, but doesn't have a solid technical reason. Its proof is closer to aesthetics than reason. And much like other aesthetic beliefs, a population believes it based on no deeper reasoning. Only exclusion or indoctrination can ensure the population's view, and only illogical rhetoric will change it.


Ha. So the truth is that Knuth did quote Hoare, not aware that he was quoting Knuth--indirectly Knuth was quoting himself.


In long-lived systems (systems that run for many years) it's almost impossible to choose the "right data structures" for the ages. The sources and uses of your data will not last nearly as long as the data itself.

What to do about this? Two things:

STORE YOUR TIMESTAMPS IN UTC. NOT US Pacific or any other local timezone. If you start out with the wrong timezone you'll never be able to fix it. And generations of programmers will curse your name.

Keep your data structures simple enough to adapt to the future. Written another way: respect the programmers who have to use your data when you're not around to explain it.

And, a rule that's like the third law of thermodynamics. You can never know when you're designing data how long it will last. Written another way: your kludges will come back to bite you in the xxx.
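On the first point, the minimal version in Python looks something like this; conversion to a user's zone is purely a presentation concern:

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo          # Python 3.9+

    created_at = datetime.now(timezone.utc)            # store this
    print(created_at.isoformat())                      # ...T17:03:05.123456+00:00

    # Render in whatever zone the viewer cares about, at display time only.
    print(created_at.astimezone(ZoneInfo("US/Pacific")))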


Sometimes storing in UTC is simply not correct. For example a shop opening time. The shop opens 10am local time, whether DST or not. Their opening time is 10am local time all year but their UTC opening time actually changes depending on the time of year!


But a shop opening time is not a timestamp, so I think the original advice is still good. A timestamp is the time at which some event happened, which is different than a date/time used for specifying a schedule.

For example, if you wanted to track the history of when the shop actually opened, it would make sense to store a UTC timestamp.


> A timestamp is the time at which some event happened, which is different than a date/time used for specifying a schedule.

Correct, but that makes this a rule with much more limited applications than many people are going to interpret it as.


Yes, but scheduled times can look like timestamps. It might be tempting to store a date+time+location as just a UTC timestamp, but timezones can and do change, so the UTC timestamp for that scheduled time is not fixed.


> A timestamp is the time at which some event happened,

It's important to the advice to make explicit that the use of “timestamp” in that sense is intended, because “timestamp” is also in many contexts “the data type that combines date and time of day and, perhaps optionally, time zone information”. The application of “timestamps” in the latter sense is not limited to when they represent “timestamps” in the former sense.


And the time offset that was in effect when the event happened allows you to easily answer questions like "Did the shop open late, i.e. after 10 AM local time?".
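A small sketch of that (the times are made up): store the UTC instant plus the offset recorded when the event happened, and the wall-clock question falls out directly.

    from datetime import datetime, timedelta, timezone, time

    opened_utc = datetime(2020, 8, 12, 17, 4, tzinfo=timezone.utc)   # hypothetical event
    local_offset = timedelta(hours=-7)                               # offset in effect at the time

    local_wall_clock = opened_utc.astimezone(timezone(local_offset)).time()
    print(local_wall_clock)                 # 10:04 local
    print(local_wall_clock > time(10, 0))   # True: the shop opened late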


The most interesting case of this I encountered was for photo 'timestamps' on a global sharing site. UTC was being used and I was proposing a change to local time. There was great debate as many drank the UTC juice and stopped thinking.

It was when I showed them that we also have a 'shot at' location and then proceeded to show Christmas Eve photos with the UTC time converted to the viewer's local timezone (not always evening, not always Dec 24) alongside where the photo was taken. Just as in space-time, a photo needs both a time and a place.


Sounds like the problem was images being uploaded with a timestamp without a timezone, in which case neither solution would work.


The timezone could be inferred from uploader's geoip as a fallback. The problem was that even if the timezone was known at time of upload it was converted to UTC and lost when stored.


For historical events, where the local time is important, the combination of "UTC timestamp" and "local time offset in effect at the moment the timestamp was taken" seems to be the choice. Allows you to easily learn what time the wall clock was showing at the moment.


Databases have support for a single type that encodes exactly like this. In postgresql a timestamptz shows as 'yyyy-mm-dd hh:mm:ss.123456+1234' but internally it's stored as UTC unixtime and tz offset.


Doesn't it actually store the timezone's IANA name and uses the tzdata to do conversions? That implies slightly more work than storing just the effective timezone offset, but is probably more correct when it comes to the timestamps in the future.


Docs aren't 100% explicit but it seems that it uses tzdata etc to convert to offset as necessary and stores offset--that wouldn't affect correctness as the conversion is done now not in the future.


Totally. "Store everything in UTC" is just another flavor of "pick a timezone to store everything." In a lot of cases, you probably need to go ahead and just store the fully qualified date including timezone/offset for each record.


Even storing offset or timezone might not be enough if what you really want is some future date and time at a particular location. Timezones do change, including the regions they cover.

Still, for things that have already happened, storing them as a UTC timestamp is almost always the correct thing to do.
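For the future-schedule case, one common approach is to store the local wall time plus the IANA zone name, and only resolve it to an instant when you actually need one. A hedged Python sketch (the zone and date are invented):

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo  # Python 3.9+

    # Stored: the intended local wall time and the IANA zone name.
    scheduled_local = datetime(2025, 3, 9, 10, 0)
    zone_name = "America/Los_Angeles"

    # Resolved only when an instant is actually needed.
    instant = scheduled_local.replace(tzinfo=ZoneInfo(zone_name))
    print(instant.astimezone(timezone.utc))

    # If the region's rules change before the date arrives, re-running this
    # with updated tzdata gives the corrected UTC instant; a UTC value
    # computed and frozen today could silently be an hour off.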


I made that mistake early in my career following this exact advice, and I ended up with a lot of things that were randomly 1 hour off depending on when the record was created and the date entered.


That is the difference between a clock reading and a timestamp.


> STORE YOUR TIMESTAMPS IN UTC. NOT US Pacific or any other local timezone.

What difference does it make if the timestamp includes the timezone? The UTC value can be recovered. In some applications the timezone is useful e.g. when intraday times matter.


Odds are that you forgot to record the timezone at all, and it didn't matter until you already have users in different timezones who have saved data and they don't remember which timezone they saved everything in.

If you record the timezone you can convert. Even then, it is easier to use UTC just because everyone else does and so you can feed UTC into any third party library and it will work.


Daylight savings might get in the way, especially if daylight savings rules change some time later.


People ask "why UTC"? Good question. Here's why:

You can always translate UTC to a local time in a given timezone. With IANA zoneinfo, you can do that correctly even for historical data in places where timezone rules changed in the past.

You can always calculate elapsed times correctly by taking differences between UTC timestamps. With local times you can't. Because daylight time transitions.

If you started with a local service, UTC lets you expand globally without explaining to your new customers why your timestamps are not in their timezones.

Daylight time transition days. Because daylight time.

Oddball daylight transition rules. Because Indiana USA, from the legislature that almost wrote a law declaring the value of π to be 22/7.

Because almost everybody will understand your decision, even after you're gone.
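To make the elapsed-time point above concrete, here's a small Python illustration of a "day" that spans the US spring-forward transition (dates picked arbitrarily):

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo

    tz = ZoneInfo("America/New_York")
    start = datetime(2020, 3, 8, 0, 0, tzinfo=tz)   # midnight before spring-forward
    end = datetime(2020, 3, 9, 0, 0, tzinfo=tz)     # midnight after

    # Naive wall-clock arithmetic says 24 hours...
    naive_hours = (datetime(2020, 3, 9) - datetime(2020, 3, 8)).total_seconds() / 3600
    # ...but the difference between the actual instants (i.e. in UTC) is 23.
    real_hours = (end.astimezone(timezone.utc) - start.astimezone(timezone.utc)).total_seconds() / 3600
    print(naive_hours, real_hours)   # 24.0 23.0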


PostgreSQL stores it as UTC if the data type is "timestamp with time zone" --- https://www.postgresql.org/docs/current/datatype-datetime.ht...


In our system, contracts have an expiration date/time. The only actors are Swedish. 12:59:59 must always be 12:59:59 on a certain date. It may never become 6:59:59 when someone in the company traveled to NY and prints out the contract.


A quote that combines the rules about optimization with the rules about data structures: "NoSQL databases that only have weak consistency are enforcing a broadly applied premature optimization on the entire system." --- Alexander Lloyd, Google

Also in support of Rule 5, see Eric Raymond's treatment of Data-Driven Programming:

"Even the simplest procedural logic is hard for humans to verify, but quite complex data structures are fairly easy to model and reason about. To see this, compare the expressiveness and explanatory power of a diagram of (say) a fifty-node pointer tree with a flowchart of a fifty-line program. Or, compare an array initializer expressing a conversion table with an equivalent switch statement. The difference in transparency and clarity is dramatic. See Rob Pike's Rule 5.

"Data is more tractable than program logic. It follows that where you see a choice between complexity in data structures and complexity in code, choose the former. More: in evolving a design, you should actively seek ways to shift complexity from code to data."

http://www.catb.org/~esr/writings/taoup/html/ch01s06.html#id...

http://www.catb.org/~esr/writings/taoup/html/generationchapt...
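The conversion-table comparison above is easy to show in any language; here's a minimal Python version of the contrast (the status codes are chosen arbitrarily):

    # Data-driven: the mapping is plain data, easy to audit and extend.
    HTTP_REASONS = {200: "OK", 301: "Moved Permanently", 404: "Not Found"}

    def reason(code):
        return HTTP_REASONS.get(code, "Unknown")

    # Equivalent logic-driven version: the same facts buried in branches.
    def reason_branchy(code):
        if code == 200:
            return "OK"
        elif code == 301:
            return "Moved Permanently"
        elif code == 404:
            return "Not Found"
        else:
            return "Unknown"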


A quote from one of our founders that I've always liked:

If you make an optimization that was not at a bottleneck, you did not make an optimization.


That's a bit broad, I think. If you're talking about the pure speed of your task you are right, but if you're talking about energy consumption it's not always the case. Also, small but often-repeated tasks should be optimized no matter whether they are bottlenecks; when the system grows they will become bottlenecks. "Optimize like a Vulcan" is what my boss once said: be logical and nothing else (my interpretation).


"optimize like a Vulcan"... classic!


Best boss ever!! He even had his best whisky refilled into a bottle labeled "Saurian brandy", so everyone who said that this is illegal, or "ohh, that's Star Trek", got one... well, not a bottle, but a glass :)


Read The Goal by Eliyahu Goldratt. While it's possible your founder came upon the idea independently, this is one of many that are repeated in that book. It's relatively short and entertaining to read and has definitely survived the 36 years since first publishing quite well.


This is what instantly came to mind for me. Making a station faster generally only matters when it's a bottleneck.

It doesn't matter how optimized your computations are if you're spending the whole time waiting in IO. And don't forget that the program is generally just a piece of a larger process.


This was adapted into a novel about IT/devops called The Phoenix Project. It's an excellent read.


I second this. Quite the entertaining read, honestly. I also enjoyed the completely unnecessary transformation of the security dork into a security ubermensch.


I've read The Goal and The Phoenix Project. While I did enjoy the stories, I'm uncertain, perhaps due to inexperience, what the main lesson/s are supposed to be.

Anyone want to share their main takeaways from these books?


Reduce feedback cycles


That's less true when you're paying the cloud for compute by the second.


Not all optimization candidates are about bottlenecks. Reducing allocation is also optimization, for example.


Peak memory or garbage collection throughput can become a bottleneck. But if you know you have more memory than you need, further reducing allocation is arguably a waste of your time.

This can become a tragedy of the commons in desktop and mobile apps, where you don't know how much memory the end user has or needs, but you do know you aren't paying for it.


> But if you know you have more memory than you need, further reducing allocation is arguably a waste of your time.

This is absolutely not true. Just because you have enough memory does not mean that wasted memory couldn't be better used - e.g. for disk cache or to run more tasks.


You made an optimization for the future when enough bottlenecks have been fixed such that this one part becomes the bottleneck.


Except that there are infinite such non-bottlenecks, and all the effort you spend on there is effort not spent on the real bottlenecks.

In other words, all engineering is time- and cost-constrained. Anybody can build a good chair for $10,000 or a good PC for $100,000. Doesn't mean it's good engineering.


"Anybody can build a good chair for $10,000 or a good PC for $100,000."

And some people can build a great PC for $1,000 that runs circles around the good PC for $100,000.

There's so much more to engineering than thinking in terms of time and cost constraints. Those are real constraints, but they're not the most important.

Engineering is design. If you have good design, good insight, you can do things that people with infinite time and budget could never dream to achieve. You can start making a product that's a hundred times more powerful for a tenth of the price in a fraction of the time. If you don't have good design, good insight, then no amount of time or budget can help you.



>not spent on the real bottlenecks

BE LOGICAL! Of course you first fix the big bottlenecks.

>good PC for $100,000. Doesn't mean it's good engineering.

Of course it is... or can you gold-plate a PC case?


There are several assumptions that are far from a given with premature optimization.

1. Adding the optimization didn’t make the code more complicated.

2. Adding the optimization didn’t introduce a bug.

3. This part of the code will be a bottleneck in the future. The time spent optimizing is a write-off if the project is canceled or that portion is replaced.


All true, but if you have a time budget and you know this thing will prevent you from meeting it AND making this change now will inform the architecture, then it's the right optimization to make.


Amdahl would agree.


> Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.

I agree with the general thrust of this. But it's worth pointing out that often the easiest way to prove where a bottleneck is (or at least isn't) is to try an optimization and see if it helps. I like profiling tools immensely, and this kind of trial-and-error optimization doesn't scale well to widespread performance problems. But there's something to be said for doing a couple of quick optimizations as tracer bullets, to see if you get lucky and find the problem before bringing in the big guns.

The last three rules bug me. I wish we had a name for aphorisms that perfectly encapsulate an idea once you already have the wisdom to understand it, but that don't actually teach anything. They may help you remember a concept—a sort of aphoristic mnemonic—but don't illuminate it. The problem with these is that espousing them is more a way of bragging ("look how smart I am for understanding this!") than really helping others.

For example:

> Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

OK, well what are the "right" data structures? The answer is "the ones that let you perform the operations you need to perform easily or efficiently". So you still need to know what the code is doing too. And the algorithms are only "self-evident" because you chose data structures expressly to give you the luxury of using simple algorithms.


I think the point is more to direct how you write/refactor code: focus more on the data structures than on the code. E.g. if you're struggling to write the code, then maybe you need to take a step back and reconsider your data structures.


> Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

Is "data structures" the correct term here? Assuming I'm not misinterpreting, the usage of "data structures" can be misleading - one usually thinks of things like BST's and hash tables, which are inherently tied to algorithms. I feel like "data modeling" better captures the intended meaning here.


A custom type is also a data structure and that is usually what I think quotes like these refer to.


These rules are good as far as they go. However, they are notably silent on what happens when they stop working.

If you're a celebrity computer scientist and a problem has been 'squashed' to the point where there aren't any more 80-99% "hot spots" I guess you can just parachute out of there and on to a more interesting problem.

However, if you're paid to work on a particular thing, sooner or later, you will fire up a profile and say "oh, great, I don't have a hot spot anymore, just a bunch of 10-30% 'warm spots'". At that point you need to attack problems that aren't traditional bottlenecks if you still want to get speedups.

Moreover, the things that you learn from repeatedly attacking those 10-30% 'warm spots' might be fundamentally different from the learning you get from, I dunno, taking some O(N^2) monstrosity out of the "99% of the profile" and Declaring Victory.


Rule 1 and 2 depend on context, whether you're working on an existing program or a new program. They can be true or false. They can really help or they can really hurt. Are you going into an existing system to do performance optimization? Sure, don't guess, measure. Are you designing a new system? Throw out those parroted premature optimization mantras... you are responsible for designing for performance upfront. You will always measure but depending on context you will design for speed first and then test your prototype with measurements. There's no way around an initial hypothesis when you're designing new systems. You have to start somewhere. That's where Jeff Dean's rule always to do back of the envelope guesses will pay off in orders of magnitude, many times over.

Rule 3 and 4 are gold and always true.

Rule 5 is the key to good design.


I've always taken a very practical, results-oriented approach to software development.

That makes sense, to me.

One of the first things that we learned, when optimizing our code, was to use a profiler.

Bottlenecks could be in very strange places, like blowing the L2 cache. That would happen when data was just a bit too big, or a method was called, forcing a stack frame update.

We wouldn't see this kind of thing until we looked at a profiler; sometimes, a rather advanced one, provided by Intel.


It turns out rule 5 (Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident) is true but also hard.

Eric Evans' Domain Driven Design is a good book on the topic.


I’m such a huge fan of DDD. IMO, if you only ever read one book on Software Architecture, that’s the one to pick.

A key point, though, is that you learn the right domain models/abstractions over time. Refactoring is critical as you gain more insight into the domain. If you're constantly questioning your modelling of the domain, and refactoring towards a better one, you'll end up with a great model and thus a clean, understandable, easy to extend/modify system. If you stick with whatever abstractions you chose at the start, when you knew way less about the domain/business problems, you'll likely end up with poor abstractions, and a code base that's slow, tedious and error prone to modify.

Convincing the business that it’s worth setting aside time to constantly refactor towards better domain models is often the hardest part, but crucial.


> If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident) is true but also hard.

The problem is not the self-evident algorithm, but the delicate implementation (or god forbid, at scale).

Take in 1000 web requests per second. The data is all strictly validated and has about 60 fields per record/request, plus dealing with errors.

How does that go from webserver to (rolls dice) kafka to a (rolls dice) cassandra that can be queried accurately and timely? How much does that cost?

Oh, that's not a programmer problem. Except it is. Creating a fantasy niche of describing problems as data vs algorithm is the canonical ivory tower disconnect.


It seems you are arguing something different, although I am having a hard time understanding what you have written. I think you are saying algorithms and data structures aren't hard, distributed systems are hard. In my experience choosing the correct data structures and algorithms in your services/programs/whatever can dramatically simplify the design of your systems overall.


> It seems you are arguing something different,

I'm making an argument that the stark reality of what's hard in software development is not the simplistic "Rules of programming", which have limited utility.

The reason the "rules" aren't self evident (or followed), is because we live in the reality of disparate functionality paired with an ever-changing technical landscape. You can't just make a DB KISS abstraction and expect it to hold with all the different repository types like (rolls dice) Athena after using (rolls dice) CockroachDB. There are concerns that are not purely algorithmic vs data structure that are far more influential and important to understand. Even knowing these details and cases, becomes less useful as time goes on and new technologies emerge.

> I am having a hard time understanding what you have written

If you're not interacting with new environments, tooling, and problems, regularly (every year or 2) you don't encounter the real pain which is far more important to your career and your ability to produce functional software. Reading almost every postmortem, the number of lines attributed to "we changed the data structure to O" is dwarfed by "we learned that technology X does Y, so we had to do Z".

This is only incidentally related to distributed systems, which is indicative of a disconnect with the problem described. Of course when you sit around in the same environment for a long time, you can observe and optimize on structure and algorithm, but that's not getting you to market (you're already there) and that's the nature of maintenance...not just being a fire extinguisher who is on call.


> we live in the reality of disparate functionality paired with an ever-changing technical landscape

It is the responsibility of tech leaders to minimize this (accurate) stereotype. Choose boring technology, and only build your own or choose something exotic when it gives you competitive advantage - because the reality I also see is that 99% of devs aren't working on anything new or unseen in the field. Even at the FAMANG companies, most people I know are working on boring problems.

So when your CTO or architect or whomever buys into the hype for X technology, make a good argument against it by proposing a better solution.


Not only is it hard, it's the one thing that if you get it right, your technical foundation will be rock solid. But it's the thing most teams and organizations neglect to spend enough time on. I often wonder why this is the case -- my first mentors taught me that logical data modeling was a really important skill. But I never talk about third normal form or any such things with my peers.


Modeling the problem domain is so important that I don't know why the first year of every compsci undergrad program isn't entirely dedicated to teaching the idea.

Instead, day 1 is installing python or java, running hello world and talking about pointers, binary, encoding, logic gates, etc.

We should be teaching students on day 1 that code is a liability and to be avoided whenever it is convenient to do so.


IMO Matthias Felleisen (building on work of Sussman (IIRC) et al.) gets first-year CS education right with How to Design Programs (https://htdp.org/2020-5-6/Book/index.html). Literally step 1 in the design recipe presented in the preface is:

   "1. From Problem Analysis to Data Definitions
    Identify the information that must be represented and how  it is represented in the chosen programming language.
    Formulate data definitions and illustrate them with examples."
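To give a rough flavor of that first step in Python (the book itself works in a Scheme-family teaching language, and these names are invented):

    from dataclasses import dataclass

    # Data definition: a Shop has a name and an opening hour (0-23, local).
    @dataclass
    class Shop:
        name: str
        opens_at: int

    # Illustrate the definition with examples before writing any logic.
    corner_store = Shop("Corner Store", 9)
    night_bar = Shop("Night Bar", 18)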


I recommend Domain Modeling Made Functional by Scott Wlaschin. It's analysis first, then DDD, then coding (in F#). Best intro to functional programming imo.


All this advice against "premature optimization" has created generations of programmers that don't understand how to use hardware efficiently.

Here's the problem: If you profile software that is 100x slower than it needs to be on every level, there are no obvious bottlenecks. Your whole program is just slow across the board, because you used tons of allocations, abstractions and indirections at every step of the way.

Rob Pike probably has never written a program where performance really mattered, because if he did, he would've found that you need to think about performance right from the beginning and all the way during development, because making bad decisions early on can force you to rewrite almost everything.

For instance, if you start writing a Go program with the mindset that you can just heap-allocate all the time, the garbage collector will eventually come back to bite you and the "bottleneck" will be your entire codebase.


> Here's the problem: If you profile software that is 100x slower than it needs to be on every level, there are no obvious bottlenecks. Your whole program is just slow across the board, because you used tons of allocations, abstractions and indirections at every step of the way.

Oh this hurts. I work with a system in Perl that is just kinda slow. Too slow to be good but not slow enough to be useless. Slow enough that if it crashes we have trouble getting things reprocessed in a reasonable time, there's no fat built in to our timelines.

Anyway I've profiled it many times and found exactly what you said. Layers and layers of OO soup, functions calling functions calling functions. There are no obvious improvements. It's overhead, not code.


> Rob Pike probably has never written a program where performance really mattered

Rob Pike has written window system software which ran in what now would be called a "thin client" over a 9600 baud modem and rendered graphics using a 2MHz CPU. He probably knows a thing or two about performance tuning.


By "rendered graphics" you mean "place characters on screen"? If the bottleneck for that is a 9600 baud modem throughput, there's not a lot you need to optimize, even on a 2MHZ CPU.

Also, having programmed more constrained systems decades ago doesn't magically make you knowledgeable on performance on modern hardware with completely different capabilities. In fact, it's probably what causes you to develop a "computers are so fast now, no need to think about performance"-mindset, because everything you want to do could be done in an arbitrarily inefficient way on modern hardware. Performance doesn't matter to you anymore.


> By "rendered graphics" you mean "place characters on screen"?

It was a fully graphical 800x1024 (or 1024x1024) system running on 1982 processors.

https://en.wikipedia.org/wiki/Blit_(computer_terminal)

> having programmed more constrained systems decades ago doesn't magically make you knowledgeable on performance

Perhaps not but it does mean you've "written a program where performance really mattered" which I believe was the original claim?


> It was a fully graphical 800x1024 (or 1024x1024) system running on 1982 processors.

I've looked into that. The Blit was monochrome, had an 8MHz processor, and a relatively large 256KB framebuffer which could be directly written to. There were only a handful of commands, mostly concerned with copying (blitting) bitmaps around.

Rob Pike only wrote the first version of the graphics routines - the slowest version, in C(!). It was rewritten another four times over, by Locanthi and finally Reiser.

I don't think any credit should go to Pike for implementing the performance-critical parts of that system.

https://9p.io/cm/cs/doc/87/archtr.ps.gz

> Perhaps not but it does mean you've "written a program where performance really mattered" which I believe was the original claim?

No, it doesn't mean that. You can be wasteful on constrained hardware as well, performance doesn't necessarily matter even on the simplest chips, if what you want to do doesn't need the full capabilities of the system.

However, I am specifically replying to the claim that "Rob Pike probably knows a thing or two about performance". As you can see, Rob Pike handed off performance-critical work to someone else. He probably didn't know how to write optimal code for that particular platform, but even if he did, most of that knowledge wouldn't transfer over to modern systems.

At the very least, he didn't care about optimizing that stuff, or he wouldn't have handed it off. He would've enjoyed optimizing that stuff. And that's all completely fine, not every programmer needs to care about performance. I just refuse to take advice from these people about performance or "premature optimization", because it is uninformed.


Writeup of the blit terminal's operating system is here, it consists of a lot more than the bitblt primitive, with many whole-system performance concerns at play:

http://a.papnet.eu/UNIX/bltj/06771910.pdf

Suggest you read this before denigrating Rob Pike's bona fides. Not sure what axe you are trying to grind but it is ugly and unbecoming of a professional.


I'm not denigrating his bona fides, I'm questioning his credentials on performance-oriented computing. For all I know, if Rob Pike had been a performance freak, Blit might've never shipped. He may indeed have chosen all the right trade-offs.

Nevertheless, the advice he gives on performance is wrong, plain and simple, for the reason that I gave you: If you have overhead everywhere, there is no bottleneck that you can observe - your software is just slower than it needs to be across the board. If you write software without performance in mind from the very beginning, you can never get all of it back by optimizing later - without major rewrites that is.

How does one give wrong advice? By not having the required experience to give correct advice. I don't care if you're Rob Pike, Dennis Ritchie or Donald Knuth. If you're wrong, you're wrong.


> In fact, it's probably what causes you to develop a "computers are so fast now, no need to think about performance"-mindset

This is the complete opposite of Rob’s mindset, which you’d know if you had any familiarity with his work.


Your reply here saddens me.

I suggest you look up Rob Pike and reconsider some of your hypotheticals about what he knows about. (https://en.wikipedia.org/wiki/Rob_Pike)


> Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

That one hits me in the feels because I think a lot of folks focus on algorithms (including myself), and code patterns, before their data and as a result a lot of things end up being harder than they need to be. I've always liked this quote from Torvalds on the subject speaking on git's design (first line is for some context):

> … git actually has a simple design, with stable and reasonably well-documented data structures.

then continues:

> In fact, I'm a huge proponent of designing your code around the data, rather than the other way around, and I think it's one of the reasons git has been fairly successful […] I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

When I have good data structures most things just sort of fall into place. I honestly can't think of a time where I've figuratively (or literally) said "my data structure really whips the llamas ass" and then immediately said "it's going to be horrible to use." On the contrary, I have written code that is both so beautiful and esoteric, its pedantry would be lauded for the ages-- had only I glanced over at my data model during my madness. No, instead, I awaken to find I spent my time quite aptly digging a marvelous hole, filling said hole with shit, and then hopping in hoping to not get shitty.

One thing that really has helped me make better data structures and models is taking advanced courses on things like multivariate linear regression analysis, specifically going over identifying things like multicollinearity and heteroskedasticity. Statistical tools are incredibly powerful in this field, even if you aren't doing statistical analysis every day. Making good data models isn't necessarily easy, nor obvious, and I've watched a lot of experienced folks make silly mistakes simply because they didn't want something asinine like two fields instead of one.


The counter argument would be that git is the poster-child of poor UX, which could be blamed on the fact that it exposes too much of its internal data structure and general inner-workings to the user.

I.e. too much focus has been put on data structures and not enough on the rest of the tool.

A less efficient data structure, but more focus on UX could have saved millions of man hours by this point.


It's difficult, because git's exposition of its data structures enables you to use it in ways that would not otherwise be possible.

I think git is more of a power-tool than people sometimes want it to be. It's more like vi than it is like MS Word, but its ubiquity makes people wish it had an MS-Word mode.

So, I think that it's hard to fault git's developers for where it is today. It's a faithful implementation of its mission.

FWIW, I have never used a tool with better documentation than git in 2020 (it hasn't always had good --help documentation, but it absolutely does today).


Or perhaps learning Git just requires a different approach: you understand the model first, not the interface. Once you understand the model (which is quite simple), the interface is easy.


People keep repeating this, but it's not true. The interface has so many "this flag in this case" but "this other flag in that case" and "that command doesn't support this flag like that" etc. There's no composability or orthogonality or suggestiveness. It's nonsensical and capricious and unmemorable, even though I understand the "simple" underlying model and have for years.


Sorry, I was replying to this:

> The counter argument would be that git is the poster-child of poor UX, which could be blamed on the fact that it exposes too much of its internal data structure and general inner-workings to the user.

I agree with you that the UI is inconsistent, however I don't agree that it's the result of git exposing too much of the internal data structure.


Has anyone attempted to re-engineer a superior UX on top of the git data structure? Would it even be possible?


Yes, I think so. There are many git clients which offer superior UX already, but they only provide a subset of the functionality available with the data structure. I'd personally love to experiment with showing and editing the data structure more 'directly', instead of relying on a battery of CLI commands and options.


Magit with emacs solves git's UX problem IMO. Discoverability is/was git's real problem.


This is true, but the trouble is that you need to know what git will do before the magit commands and options make sense.


It makes sense, when we bring in another aphorism "code is data". It's easier to write good code with good libraries. And it's easier to write good data models that extend good data models. The main distinction is that code is very dynamic, flexible, and malleable, whereas data models need not be.

Data models are the "bones" of an application, as part of the application as code is. Data models fundamentally limit the application's growth, but if they're well-placed, they can allow you to do things that are really powerful.

You always want to have good bones. But the Anna Karenina Principle is a thing [0].

So, applying this, I think baby ideas should not have many constraints on the bones, to allow them to move around in the future. Instead, there should be a ton of crap code implementing the idea's constraints, because they change every week, month, quarter, and the implementer is still learning the domain.

Once the implementer reaches a certain point of maturity in the domain, all of the lessons learned writing that crap code can be compressed into a very clever data model that minimizes the amount of "code" necessary, and simultaneously makes the project more maintainable, interface-stable, and extensible: in other words, making it an excellent platform to build on. The crap code can be thrown out, because it was designed to halfway-ensure invariants that the database can now take care of.

I think most software we consider "good" these days followed this development cycle. multics -> unix, <kversion_control> -> git, ed -> vi -> vim.


It's worth noting the same holds true for UI: data dominates. Design your widgets, layout, and workflow around the data.


> It's worth noting the same holds true for UI: data dominates. Design your widgets, layout, and workflow around the data.

I couldn't agree more.

I think the current state of UI programming is like the pathological case to be honest. Too often folks are concerned with representing their database 1-to-1 in their UI instead of representing their view.

If anyone is suffering from brittle UI code, where somehow caching issues and stale data are affecting your application, this is very likely why. You have muddled your persistence and view concerns together and it's not manageable or pretty. What this means for folks using something like React- don't directly use your persistence models in your views, create "view models" which directly represent whatever the hell it is you're trying to display. Bind your data in your view models, and not your views, and then pass the view model in as props.
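A language-agnostic sketch of that separation, written in Python for brevity (all names here are invented for illustration):

    from dataclasses import dataclass

    # Persistence model: mirrors the stored record.
    @dataclass
    class UserRecord:
        id: int
        first_name: str
        last_name: str
        created_at_utc: str          # e.g. "2018-06-01T12:00:00Z"

    # View model: exactly what this one screen displays, nothing more.
    @dataclass
    class UserCardView:
        display_name: str
        member_since_year: str

    def to_user_card(r: UserRecord) -> UserCardView:
        return UserCardView(
            display_name=f"{r.first_name} {r.last_name}",
            member_since_year=r.created_at_utc[:4],
        )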


Amen. My 23 years experience in webdev says React (the paradigm, not the lib per se) is dominating web UI precisely because it is all about unidirectional data flow.


"Write stupid code that uses smart objects"

That's a good one. It's amazing how much complexity can be created by using the wrong abstractions.


FWIW I find this is especially important for compilers and interpreters.

It's not an exaggeration to say that such programs are basically big data structures, full of compromises to accommodate the algorithms you need to run on them.

For example LLVM IR is just a big data structure. Lattner has been saying for awhile that a major design mistake in Clang is not to have its own IR (in the talks on the new MLIR project).

SSA is a data structure with some invariants that make a bunch of algorithms easier to write (and I think it improves their computational complexity over naive algorithms in several cases).

----

In Oil I used a DSL to describe an elaborate data structure that describes all of shell:

What is Zephyr ASDL? http://www.oilshell.org/blog/2016/12/11.html

https://www.oilshell.org/release/0.8.pre9/source-code.wwz/fr...

I added some nice properties that algebraic data types in some languages don't have, e.g. variants are "first class" unlike in Rust.

Related: I noticed recently that Rust IDE support has a related DSL for its data structure representation: https://internals.rust-lang.org/t/announcement-simple-produc...


> FWIW I find this is especially important for compilers and interpreters.

Totally. I'm building a relational language and it's becoming very obvious why RDBMSs don't fit certain purity ideals of the relational model (like all relations being sets, not bags).

I'm stuck deciding which structures to provide by default. I'm dancing between flat vectors or ndarrays, or a split between flat vectors (columns) and HashMaps/BTrees with n-values (this is my intuition now).

---

> I added some nice properties that algebraic data types in some languages don't have, e.g. variants are "first class" unlike in Rust.

This sounds cool, where can I learn about this?


FWIW I found this post thought provoking in thinking about data models of languages.

https://news.ycombinator.com/item?id=13293290

---

About first class variants:

https://lobste.rs/s/77nu3d/oil_s_parser_is_160x_200x_faster_...

https://github.com/rust-lang/rfcs/pull/2593

Another way I think of this is "types vs. tags": https://oilshell.zulipchat.com/#narrow/stream/208950-zephyr-... (Zulip, requires login)

Basically variant types can stand alone, and have a unique tag. Tags are discriminated at RUNTIME with "pattern matching".

But a variant can belong to multiple sum types, and that's checked statically. This is modeled with multiple inheritance in OOP, but there's no implementation inheritance. Related: https://pling.jondgoodwin.com/post/when-sum-types-inherit/

So basically in the ASDL and C++ and Python type system I can model:

- a Token type is a leaf in an arithmetic expression

- a Token type is a leaf in a word expression

But it's not a leaf in say what goes in a[i], or dozens of other sum types. Shell is a big composition of sublanguages, so this is very useful and natural. Another construct that appears in multiple places is ${x}.

So having these invariants modeled by the type system is very useful, and actually C++ and MyPy are surprisingly more expressive than Rust! (due to multiple inheritance)

Search for %Token here, the syntax I made up for including a first class variant into a sum type:

https://www.oilshell.org/release/0.8.pre9/source-code.wwz/fr...

There is a name for the type, and a name for the tag (and multiple names for the same integer tag). Tags (dynamic) and types (static) are decoupled.


I frequently find that, when refactoring especially, you find a lot of big ol' god objects that are incomprehensible. But when you break them down into 5-10 small objects, suddenly the operation they were trying to do makes perfect sense.


mmmm very true with Redux --> context+hook state


Yes! We should replace DRY (don't repeat yourself) with AHA (avoid hasty abstractions), as the dominant rule of thumb.


It's key to write code that can be easily re-factored, because most of the time you come up with a much better idea a week later.


Just yesterday I removed an obsolete doxygen tag from some 300 files and my Visual Studio went catatonic.

Well, I suspect it's not about Studio per se but rather the git integration but still. Someone avoided a "fancy algorithm" and wasted both my time and the product reputation with a "small n" workaround. Because, git is about small commits right?

I'd like to restate the first rule as "You can't tell where a program is going to spend its life".


"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

— Linus Torvalds


Amateur mathematicians worry about patterns, professionals worry about numbers


I'm neither an amateur nor professional mathematician, so I can't tell; is this statement tongue-in-cheek? If not, what does it actually mean?



Interesting that the first 3 are all about performance. Which strikes me as a bit ironic given rule #1, which could be summarized as don't worry about performance until you have to.


Pike himself says "there's a 6th rule": https://twitter.com/rob_pike/status/998681790037442561?lang=...

(6. There is no Rule 6.)

And points to the best source he finds for it on the web: http://doc.cat-v.org/bell_labs/pikestyle


dang, if this sticks to the frontpage, can you change the title to "Rob Pike's 5 rules of programming in C"? As the title is now, it misrepresents Rob Pike's words.


> "write stupid code that uses smart objects".

Writing stupid code is actually really difficult.

For me, it takes a little bit of iterating before I know just the right place to insert stupid.


I'm reminded of the old adage "I'm writing you a long letter, because I don't have time to write a short one."

(Often attributed to Mark Twain, but similar sentiments were expressed by many before him.)


I have two projects that I consider to have nearly-perfect code. Both are their third iterations, and I think they're stable at this point.

Kinda goes: 1) Make a bad solution exploring the problem 2) Explore a good idea for how to solve the now-understood problem 3) Mature the good idea through usage.


Do you use TDD? I'm not religious about it in general, but when I'm lost, confused, and easily distracted, I start with TDD to write the dumbest possible code.


No.

It's really an issue of not being sure at first what needs to be flexible & data-driven vs handled in code. If you make everything data driven, then it becomes this horrible mess where your input is basically a program and your actual code ends up being a terrible interpreter.

I tend to just build things bottom-up, and start with a small bit of functionality, then when I have enough small bits, I bolt them together and decide what I need to abstract at that point, do refactoring on the smaller bits and provide data to them from the caller. Then repeat that continuously until I have all of the functionality I need.

It might be different for other people, but I need to have working code before I can abstract it.


I don't understand. Tdd gets you to working code (if on a subset of all your data) extremely fast. There are both bottom-up and top-down styles, neither of which is particularly wrong.


The TDD aspect is not really here nor there.

To provide a more concrete example, say I have a function that performs a transformation on a piece of data. From what I currently know about the data, I can parse it using a regex. So I code up the function that accepts data, runs the regex, and returns the transformed result. Great.

Now as I'm continuing my work, I notice that some other data requires a similar, but not exactly the same transformation. It can be done with a slightly different regex. So rather than duplicating functionality, I modify the transform function above to take a regex as input along with the data. Everything works as expected.

I get further along in the project and I realize another piece of data needs a somewhat similar transformation, but this time it's just slightly too complicated for a stand-alone regex, it needs to be a function.

The "smart code" way to handle this would be to create another transformation function and call that instead. The "dumb code" way of handling this is to generalize the transformation function such that I can pass in some descriptor for the transformation, and have the transform function return the correct result.

That's the crux of the issue. I rarely have enough information at the time of writing to know just how generalized to make a function. If I created this hyper-generalized transformation at the beginning, but never needed anything beyond the original simple regex, I would have wasted a bunch of time creating code that's needlessly confusing.

TDD is perfectly applicable for development, and would help tremendously with the refactoring aspect, but what it doesn't help with is information that you don't yet know about.
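To make the progression concrete, here's a hypothetical Python version of it (the names and regexes are made up, just to show the shape of each step):

    import re

    # Step 1: one concrete transformation.
    def strip_dashes(value):
        return re.sub(r"-", "", value)

    # Step 2: a second similar case appears, so the caller supplies the pattern.
    def apply_pattern(value, pattern, replacement=""):
        return re.sub(pattern, replacement, value)

    # Step 3, the generalized version: the caller supplies an arbitrary
    # callable as the "descriptor", and the transform function itself
    # barely does anything anymore.
    def apply_transform(value, transform):
        return transform(value)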


TDD is good for features (like web apps) but not so much for algorithms.

The difference is that you only need to support a tiny fraction of possible features / use cases, but your algorithms need to be correct for a wide range of inputs.


I disagree. You code your algo for one input, come up with a corner case, write a failing test, refactor, repeat.

For an algo, let's say it operates on a list, I'll start with test f([]) == 0, and implement f to output the constant 0.

And then go from there.


That only works if the algorithm isn't already understood and so you don't know what optimal is. If you TDD sort, there is no way to arrive at "divide the list into small chunks, insertion sort each small chunk, then merge sort the chunks". (20 years ago this is what GNU sort did; I assume the algorithm has changed now that cache is a bigger factor in performance.) Even knowing that the above is your end goal, TDD to get there is wrong, because you don't know how the best algorithm will change when something else comes up as important.

Tests are good, but they need to be universal for any implementation, which means you often cannot test the internal details that prove you didn't use bogosort (picking a pathological example to make the point).


Ok but honestly, most people are not coding raw low level data structure algorithms. They are coding business algorithms. "Here's a table of accounts, calculate what they owe"

When one says "wide range of inputs", as gp did, one is not talking about low-level algorithms; one is talking about a business algorithm. You should be using TDD for this.


Why does it have to be this polarized - stupid code & smart objects. Writing smart code that uses smart objects is better than stupid code. Writing stupid code that uses smart objects is indeed difficult.


Rule 5 (data dominates) is exactly the same as lie 3 (code is more important than data) of Mike Acton's Three Big Lies (https://cellperformance.beyond3d.com/articles/2008/03/three-...). Rob Pike's rule 3 (don't use fancy algorithms) is closely related to Mike Acton's lie 1 (software is a platform) as well.

A part of me wonders if Mike was influenced by Rob Pike's rules directly or indirectly. It's also something that an experienced programmer can discover independently easily enough. Mike Acton was clearly heavily influenced by some of the failures of classic OOP design (lie 2 of Three Big lies).


I feel like #5 is why many people like GraphQL, and #1-#4 are why many hate it.

It becomes much easier to understand what data exists and build tools on top of its schema. But the extra kludge it adds to optimise the specific case of returning less data in a field is often counterproductive.


> But the extra kludge it adds to optimise the specific case of returning less data in a field is often counterproductive.

Could you elaborate?


While the rules make sense, it is often not realistic due to business constraints.

What tends to happen in real life is that the profiling and optimization of bottlenecks is forgotten once the software is "ready".

I would propose Rule 0: Developers are irrelevant, only end users of your software matter.


One of the big problems with fancy algorithms is that they either access data out of order and/or do pointer chasing. Simple algorithms tend to access the data in order.

CPUs have a lot of logic for making in-order data access very fast.


This is even more relevant after 30 years.

The Data Dominates principle is the key; everything else follows from it, including "n is usually small" and the preference for simple algorithms.

Nowadays we have exactly the opposite, which is understandable since the field has been invaded by uneducated amateur posers.

https://karma-engineering.com/lab/blog/NodeModules


Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.

True, but emphasis on the speed hack. It shouldn't stop you from thinking about performance in your design. If it means doing less, do less now. If it means adding a speed hack (usually some sort of cache), don't do it until you are sure that you need it.

Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.

I agree completely. But you have to realize that measuring is as much of an art form as optimization is. Especially on today's ridiculously complex systems, the bottleneck may not be obvious. For example a function may take a long time to run, but the real cause may be another function flushing the cache.

Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)

Disagree, unless you can prove that n will stay small. You don't know how your users will abuse your program. For example, you may have designed your program to handle shopping lists of a few dozen items, and then someone decides to import the entire McMaster-Carr catalogue, which has more than half a million items. It may be a great use case you didn't think of, and that fancy algorithm permits it by scaling well. There are also vulnerabilities that exploit high algorithmic complexity and worst cases. Don't overdo it, but N^2 is rarely a good idea if you can avoid it.

Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.

True, but with the caveats of Rule 3

Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

Agree, for me there is a hierarchy in code. From most important to least important: data (structures), code (algorithms), comments.

The order comes from the fact that if you change your data, you need to change your code too, and if you change your code, you also need to change your comments. Going the other way, you can freely change comments, and changing your code will not require you to change your data. Data is the cornerstone.


> Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident

What does this have to say about the careers and roles of data scientists vs programmers? A data scientist's entire job is to categorize and model data in a useful way. In the future, will they be fundamentally more important than coders, or will the two roles just merge?


I think you're conflating two things: in my mind, working on the shape of data is different than pulling inferences out of that data.


Am I wrong to avoid writing O(n^2) code if at all possible when it is fairly easy to use hash tables for a better time complexity? Sure when n is small the O(n^2) one will be faster but when n is small /anything/ you do is fast in absolute terms so I'm trying to not leave traps in my code just waiting for n to get bigger than initially expected.


I think, in most cases, if you have n items in memory, and you want to find m of them by id, or some such thing, your default should be to represent the n items as a hashmap, not a list, and look them up from the hashmap. There will be cases where this isn’t the right choice, but it’s a good default. And it’s almost always virtually no extra complexity to represent them this way, i.e. often something as simple as: myMap = myList.groupBy(_.id)

Don’t write complex optimizations until you know you need them, but I think defaulting to code with good O(N) complexity, where it’s simple to do so, is a good default.
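The Python equivalent of that default is just a couple of lines (assuming the items have an id attribute):

    from collections import defaultdict

    def group_by_id(items):
        by_id = defaultdict(list)
        for item in items:
            by_id[item.id].append(item)     # O(1) amortized per item
        return by_id

    # If ids are unique, a plain dict comprehension is even simpler:
    # by_id = {item.id: item for item in items}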


Keep in mind that a lot of this was written during an era where even modestly complex data structures/algorithms had to be rolled by hand.

The philosophy is really to not waste time implementing optimizations that may not be necessary. Naturally you should reach for the best tool you have in your toolbox. So if your language of choice has a hashmap that can be used with no additional work, go for it. But don't waste two days rolling your own red-black tree because it might be better.


> Am I wrong to avoid writing O(n^2) code if at all possible when it is fairly easy to use hash tables for a better time complexity

Are you sure that std::unordered_map is faster than std::vector? Did you measure?

Every time you access an element in std::vector, you also access nearby ones (thanks to L1 cache, as well as CPU-prefetching of in-line data).

In contrast, your std::unordered_map or hash-table has almost no benefits to L1 cache. (It should be noted that linear-probing, despite being a O(N^2) version of hash-tables worst-case, is actually one of the better performers due to L1 cache + prefetching)


To be fair std::unordered_map is one of the slowest hash tables in any programming language.


Worrying about performance of small collections is premature optimization.

Using maps or sets nowadays is mostly for clarity, as they are used to solve certain kind of problems.


I agree with you. But what you're talking about is completely different from what I was responding to originally.

If you need a set, use a set. But don't assume that its faster than a std::vector.

Even then, std::vector has set-like operations through binary_search or std::make_heap in C++, so it really isn't that hard using a sorted (or make_heap'd) std::vector in practice.

--------

Even if you don't plan on doing optimization work, its important to have a proper understanding of a modern CPU. The effects of L1 and prefetching are non-trivial, and make simple arrays and std::vectors extremely fast data structures, far faster than compared to 80s or 90s computers anyway. A lot of optimization advice from the past has become outdated because of the evolution of CPUs.

So its important to bring up these changes in discussion, from time to time, to remind others to restudy computers. Things change.


> sorted std::vector

But inserting into a sorted std::vector is O(n) in the worst case, right? (You binary search for the position, then have to move the other elements.) But a hash set with linear probing can give O(1) (amortized) access, while usually maintaining an invariant like total_size > 2*n. I don't think that would be such an impact on cache locality. Linear probing doesn't require linked lists.

Of course this is given that you have a data structure in standard library. But at this point I think hash sets are pretty standard.


Yeah, linear probing helps hash-sets a lot on modern CPUs.

Perhaps linear probing is a better example of how BigO analysis can go wrong on modern architectures. Inserting into a hashset with linear-probing is O(n) worst-case, while inserting into a linked list is always O(1) (best case, worst case, and average case).

And yet, linear probing seems to work out best in practice (with a bit of rigging. The total_size > 2*n invariant is one, but so does Robin-hood hashing if you want to keep the table small)

Linear probing vs Linked List implementations of hash-sets seems to be a more clear example of an O(1) vs O(n) anomaly, where the O(n) example is superior.
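For reference, the mechanics of linear probing are tiny. A minimal Python sketch (logic only; the cache behavior that makes it win in practice obviously doesn't show up at this level, and there's no delete support):

    class LinearProbingSet:
        def __init__(self, capacity=8):
            self._slots = [None] * capacity
            self._count = 0

        def _probe(self, item):
            i = hash(item) % len(self._slots)
            # Walk forward until we find the item or an empty slot.
            while self._slots[i] is not None and self._slots[i] != item:
                i = (i + 1) % len(self._slots)
            return i

        def add(self, item):
            if 2 * (self._count + 1) > len(self._slots):   # keep load factor < 50%
                self._grow()
            i = self._probe(item)
            if self._slots[i] is None:
                self._slots[i] = item
                self._count += 1

        def __contains__(self, item):
            return self._slots[self._probe(item)] == item

        def _grow(self):
            old = [x for x in self._slots if x is not None]
            self._slots = [None] * (2 * len(self._slots))
            self._count = 0
            for x in old:
                self.add(x)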


I don't think it is the same thing.

Inserting into a linked list assumes you have already found the node to insert at.

I don't remember the exact details, but in a hash set with linear probing, the worst case happens quite rarely given the hash function is a good one (and they are quite sophisticated these days). It is O(1) amortized. The same applies for a hash table with chaining: all of your keys may go to the same bucket given a sufficiently bad hash function.

Given the other choices, like (as far as I know) skip-list-based or tree-based variants, hash sets are the obvious choice.


I mean "Hash-set with Linked List" vs "Hash-set with linear probing". I realize I was getting lazy with my typing, so lemme try to be more clear this time.

Hash-set with Linked List is O(1) all cases.

Hash-set with linear-probing is O(n) worst-case insertion. But happens to be faster in practice with circa 2020-style CPUs (especially with Robin Hood insertion)

Assume the load-factor to be 90%+, so that we actually get a reasonable difference between the two strategies. We have a situation where O(n) is better than O(1).


> Even then, std::vector has set-like operations through binary_search or std::make_heap in C++, so it really isn't that hard using a sorted (or make_heap'd) std::vector in practice.

Except std::vector does not enforce the sorted or heap constraints when adding or removing elements so eventually someone working on your code will break them.


Make a wrapper class that enforces the invariant.


Also, creating a hash isn't free. And often ordering is required.


> Also, creating a hash isn't free.

Hmmm... I argue that the hash is nearly free actually.

An unordered_map traversal is probably DDR4 latency bound. That's ~50-nanoseconds (200 clock ticks) per access. What's the CPU going to do in that time?

Well, spending 10 to 20 clock ticks on a typical hash algorithm is fine. Then it will wait the other 180 clock ticks for RAM. If you got hyperthreading, maybe the CPU will go to another thread and do meaningful work while waiting for RAM... but... I think you get the gist.

Even IF the hash were free, the CPU is waiting for RAM anyway. So you got plenty of time to make that hash worthwhile. Even an integer division/modulo operator (worst case ~80 clock ticks) can fit in there while waiting for RAM, with plenty of room to spare.

I guess if everything was in L1 cache, the story is a bit different. A lot of "depends", depends on the data, the access frequency, etc. etc.


Yes. If you don't measure, you will never know when the time complexity moves in your favor due to the constant and other factors. For example, quicksort switches to an O(n^2) algorithm (insertion sort) for the last few iterations of a branch of work (typically n=1000) because this reduces the total sort time.
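A toy version of that hybrid in Python; the cutoff value is an assumption, since real implementations pick it by measurement:

    CUTOFF = 32   # assumed value; tune by measuring

    def insertion_sort(a, lo, hi):
        for i in range(lo + 1, hi + 1):
            x, j = a[i], i - 1
            while j >= lo and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x

    def hybrid_quicksort(a, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        if hi - lo < CUTOFF:
            insertion_sort(a, lo, hi)       # O(n^2), but faster on tiny ranges
            return
        p = _partition(a, lo, hi)
        hybrid_quicksort(a, lo, p - 1)
        hybrid_quicksort(a, p + 1, hi)

    def _partition(a, lo, hi):              # Lomuto partition around a[hi]
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        return i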


I would not read these rules as anything like suggesting you write O(n^2) code. I think at best I'd read them as favoring O(n) instead of O(1) for small n, but I really think that "fancy algorithms" in this case doesn't mean obvious optimizations like this.

If you can't guarantee n is small, I think it is entirely sensible to use a dictionary/hash table instead of looping over an array or list. As long as n is small the overhead is probably irrelevant, and it'll prevent surprises if n gets large as you said. And if the difference actually matters, you get back to rule 1 and 2 and measure first anyway.


If it's fairly easy, then I think it still fits the spirit of the rules. KISS and all.

On the other hand, the idea that one might be setting traps is slightly weird... if you _know_ 90% that n will be large then pick an algorithm that's efficient (and since it's easy to implement, it's a win-win). If n is always going to be small, then does the choice really matter?


If N is known to be small, you should favor simplicity and readability over complex optimization.


That depends.

If you are writing the code yourself, then the maintenance cost of everyone after you trying to understand it makes it the wrong choice. However, most programming languages have generic programming features, so you can just use a library algorithm and you aren't writing the algorithm either way. In that case the code calling the fast hash is no more complex than the O(n^2) code, so of course you select the faster one, as an application of the "don't prematurely pessimise your code" rule.

If your programming language doesn't already have built-in generic algorithms for your data, then you are using the wrong language (unless your job is to write the generic algorithms for the language, in which case this doesn't apply, because you can assume your algorithm will be used in a performance-critical part at some point).


Yeah, I do well writing dumb, inefficient code by default and optimizing it when needed, which is almost never.

If I know beforehand we'll handle a lot of data, I can pick something fast and complex to begin with, but that effort is probably mostly a waste.


Use whatever makes for clearer code, unless you are certain it's the bottleneck.


It depends. If you don't need to worry about performance, then it will work. But in some performance-sensitive situations the brute-force algorithm will beat the hash table, and the only way to know is to measure which one is better.


Refer to the rule about measuring first. It's a pessimisation because N may not even be large enough to overcome the slowdown from a hash table's cache misses. You can always profile and fix it later.


You can never profile with all the data your users are going to feed the program.


What about when n is small but you do it m times, and m is large?


Using array-based data structures also gets you more cache hits. For smaller data sets, this may well be faster than a fancy algorithm.


This is true only if you are iterating over the entire array often. If you only rarely need to access one element, the array will not be in cache anyway, so you still have to go to memory. Depending on how your data is structured, the array may or may not save time even at small sizes.


Prefetching may still make it way faster.


Of course temporal and spatial locality are important.


These resonate, especially #1, but I'm not so sure about #5. Although it makes sense to choose good data structures, I don't think that guarantees a simpler algorithm. For example, you can store your data in a heap (tree) and still need to write a traversal algorithm to print out the elements in order.
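For what it's worth, here is a sketch of mine of that exact example, using std::priority_queue for illustration: the data structure does most of the work, but "print in order" is still its own loop (essentially heapsort):

    // With the data in a min-heap, printing in order is a repeated pop.
    #include <functional>
    #include <iostream>
    #include <queue>
    #include <vector>

    void print_sorted(const std::vector<int>& values) {
        // Build a min-heap over the data.
        std::priority_queue<int, std::vector<int>, std::greater<int>> heap(
            values.begin(), values.end());
        while (!heap.empty()) {
            std::cout << heap.top() << '\n';  // smallest remaining element
            heap.pop();
        }
    }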


Rob Pike's rules of programming simplified :

- Start with stupid code built on smart data (after Fred Brooks)

- When in doubt use brute force (Ken Thompson)

- Premature optimization is the root of all evil (Tony Hoare)


Why don’t interviewers at tech companies just follow these rules while grading candidates?


Over time, the rule I've "discovered" is:

"Focus on the Nouns, not the Verbs"


Wait, wasn't it Knuth who said that premature optimization is the root of all evil?


When people say "worry about the data structures", what do they mean?


> Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

:O

Epiphany


There are actually six rules!



These rules look silly when you already know that your tight loop, the one that waits on IO or redundantly recomputes things, needs caching. No, you don't need to measure that; you know that your tight-loop function is going to be the bottleneck. Everyone knows that.

Now, the rules do make sense when you introduce an entire constraint library instead of just looping over 3-4 variables with a small search space. But again, you know it is a small search space. You know you don't have to optimize it.

I really don't get these rules.

Edit: Go ahead and roast me, but keep in mind I've probably been there and back.


> No, you don't need to measure that

Just this evening I came across some Go code that pasted an image on top of a blue background by setting every individual pixel of the destination to the background colour, then reading every pixel from the source and setting the corresponding destination pixel to that colour. I figured it'd be quicker to draw the source onto the destination with `image/draw`.

Turns out, if you're using NRGBA images, it's 40-50% slower. That's definitely an "obvious optimisation" that was proven wrong by measurement.

(If you're using RGBA images, though, the pasting method is 300% faster. Because obviously.)



