
Your general point stands, but with corrections:

- sqrt actually can be a single instruction on x86_64, for example. Look at how musl implements it: it's just a single-line inline asm statement, `__asm__ ("sqrtsd %1, %0" : "=x"(x) : "x"(x));`: https://github.com/ifduyue/musl/blob/master/src/math/x86_64/... Of course, not every x86_64 implementation must use that instruction, and not every architecture must match Intel's implementation. I've never looked into it, but I wouldn't be surprised if even Intel and AMD have some differences.

- operator precedence is well-defined, and IEEE 754-compliant compilation modes respect the non-associativity. In most popular compilers you need to pass specific flags to allow the compiler to change the associativity of operations (-ffast-math implies -funsafe-math-optimizations, which implies -fassociative-math, which actually breaks strict adherence to the program text). A somewhat similar issue does arise with the floating-point environment, though: statements may be reordered, the order of evaluation of function arguments may differ, etc.

The fact that compilers respect the non-associativity of the program text is a huge reason why they are so limited in how much auto-vectorization they will do for floating point. The classic example is a sum of squares, where the compiler bars itself from even unrolling the reduction, never mind SIMD with FMAs. To get any of that, you have to fiddle with compiler options that are often problematic when enabled globally, or with __attribute__((optimize("..."))) or __pragma(optimize("...")), or, probably best of all, explicitly vectorize using intrinsics.


