The fact many languages don't overflow check by default really saddens me. Integer overflow is the cause of so many bugs (many user-facing: http://www.reddit.com/r/softwaregore/), and yet people keep making new languages which don't check overflow. They check buffer overruns, they check bounds, and yet not integer overflow. Why? The supposed performance penalty.
The reckless removal of safety checks in the pursuit of performance would be considered alarming were it not commonplace.
(Disclaimer: I really, really care about integer overflow for some odd reason, going so far as to be going through the entire PHP codebase to add big integer support...)
This is the main reason I don't trust the mob of people working on Rust to get their act together and make it a good language. You can see some awareness of the value of integer overflow detection among some individuals that work on Rust but despite that, here we are without that feature. Swift, on the other hand, is one language that has put some thought to the matter.
> You can see some awareness of the value of integer overflow detection among some individuals that work on Rust but despite that, here we are without that feature. Swift, on the other hand, is one language that has put some thought to the matter.
We have put a lot of thought into the matter, and the conclusion was that (a) the performance overhead, as measured by benchmarks, could be significant; (b) the relative benefits of doing the checking in Rust would be much less than in C and C++, since overflow in (safe) Rust cannot result in memory safety problems. However, we're looking into ways that we could allow folks to opt into the feature globally in the future should the performance story change or if they want to panic rather than overflow. You can already use the special checked types for arithmetic on a case-by-case basis if you'd like overflow checking.
Not to re-relitigate old discussion, but (as I'm sure you know) it's now more probable that you might overflow your integers and then pass them to unsafe code or external APIs, and there are many other bugs or undesirable behavior that could be killed dead instead of allowing them to behave badly. For a not contrived example, I don't want weird stuff to happen when my semaphore counter overflows, or underflows, I want it to just crash, right there. (My perspective and desires come from a project where I didn't care about "raw" performance that much -- the project being a storage engine where C++ was chosen with the performance concerns in mind being lack of GC and resultant traffic shockwaves, so my preferences lie a bit away from CPU integer operation performance and more towards crashing obviously at the spot of the foul. It's the worst kind of bug when you hang mysteriously in production or to write garbage to disk when the error could be caught, and I had many C++ integer underflow worries at that job.)
I also think overflow errors should be conveniently handleable without putting a try! or match expression on every operation, and so should other errors, but I'll spare you that rant :-)
> Not to re-relitigate old discussion, but (as I'm sure you know) it's now more probable that you might overflow your integers and then pass them to unsafe code
Unsafe code needs to validate its inputs. That's part of the contract of unsafe code. If your unsafe code causes memory safety errors when given MAX_INT as an input, then your unsafe code is dangerously broken regardless of how easy it is for safe code to accidentally create a MAX_INT value.
The relevant contract boundary isn't specifically at the edge of the unsafe block, it's at some other level like the public face of your ADT, inside which your unsafe and technically safe regions of code share the same set of invariants. It's immaterial whether the integer operation that results in overflow happens inside or outside what is technically the unsafe block, it should still be trapped. (Sorry about being unclear in that part of my previous comment.)
Does Rust do saturating arithmetic by default? If not, an integer overflow won't pass INT_MAX, it will pass some other int that might be a valid input to the unsafe code.
Rust does "computer" (silently overflowing) arithmetic by default. Saturating[0] and checked[1] are optionally available but require converting to methods (and some other changes for checked as it returns an Option)
Unsafe code should not have memory safety errors when passed valid input. The point is that while integer overflow in Rust might cause incorrect program behaviour, it's not going to cause undefined behaviour (and therefore e.g. executing user input) the way it does in C.
I don't understand your criticism. Anyone who claims anything is "safe" has to be specific about the definition of "safe", or else they'd be making vague unsubstantiated claims.
Rust is "memory safe" per the commonly used Saraswat definition [1] (not proven yet, but the proof is in progress). This is not something we just made up.
I think I misread you. It's certainly reasonable to say that memory safety will reduce the harm, on average, of integer overflows and that this should affect any cost/benefit calculation for implementing overflow checking. What would have been unreasonable is to imply that the safety implications of integer overflow can't matter much or are somehow out of scope because of Rust memory-safety. No matter how nice and precisely-defined the memory-safety criteria are, they're only part (an important one ofc) of what people reasonably expect in terms of safety improvements from a language which aims to be a better, saner successor to C/C++ and is evangelised as such. "Not a deathtrap like C" may be a much fuzzier standard than "memory safe per the Saraswat definition", but that doesn't change the fact that the latter is just a partial means to the former.
> This is not something we just made up.
I deliberately did not say 'made up' or 'created', but rather 'picked' meaning 'selected'. The fact that the map is pre-existing and commonly-used (and rightly so) does not alter the fact that it is not the terrain.
> What would have been unreasonable is to imply that the
> safety implications of integer overflow can't matter much
> or are somehow out of scope because of Rust memory-safety.
pcwalton is not suggesting this. The Rust devs care about reducing the amount of unsafe code in the wild, which means that they care about checked arithmetic operations. But making the world a safer place first requires displacing C++ where it is used in safety-conscious applications, which means being competitive with C++ in terms of speed. And yes, this is all about defaults, given that Rust already offers checked and saturating arithmetic types. That Rust leans on the side of speed here is a pragmatic choice (and one that is rooted in actual benchmarking, corroborating the 2-6x slowdown mentioned in the article), and is only possible because it doesn't violate memory safety by any accepted definition (see also how Rust gladly accepts the cost of checked array indexing, while also going to great lengths to provide elaborate abstractions (iterators) to eliminate that same cost). It's also an unfortunate accident of our modern historical context: once compilers get better at optimizing overflow checkes, and/or once hardware gains better support for efficient overflow checks, and/or once Rust begins to actually demonstrate a credible ability to replace C++ in safety-conscious applications (and thus doesn't need to beat the drum of performance quite so vigorously), then checked arithmetic by default can be trivially added to Rust after the fact (though it would obviously be a breaking change, necessitating a version bump).
(For completeness, I'd like to point out here that checked arithmetic is not a panacea. For any range that you wish to express in an integral type, it is vanishingly unlikely that INT_MAX is actually the logical upper bound. If I have an RPG where characters can quiver 100 arrows but I forget to enforce the upper bound, checked arithmetic won't save me from the fact that my program can now exist in somewhere between 28 invalid states (signed eight-bit variable) and nine sextillion invalid states (for a signed pointer-sized variable on a 64-bit platform, as is so common for "default" integral types). The real solution is reifying integral bounds in the type system, as I expect the language that someday replaces Rust to do.)
Other ways in which the choice of what constitutes safety is convenient: you can recurse until the stack overflows, and you can make infinite loops instead of having to prove through the type system that everything terminates. These are reasonable design decisions. If allowing silent integer overflow is the wrong choice for Rust, it's barely the wrong choice, because the costs and benefits are clearly arguable and user-specific.
> These are reasonable design decisions. If allowing silent integer overflow is the wrong choice for Rust, it's barely the wrong choice, because the costs and benefits are clearly arguable and user-specific.
Seems reasonable to me, and I certainly didn't demand overflow-checking by default in Rust. (OTOH, twelve hours ago you were so disappointed with that decision that it undermined your confidence in the Rust team...) But I don't see what that has to with picking the definitions for desirable adjectives like 'safety' when it comes to promoting or describing a system.
So inaction on overflow checking in hardware is regularly justified by claims that overflow checking in software imposes only a minor performance burden (viz. OP), while inaction on overflow checking in software is justified by claims that that it imposes an unacceptable performance burden. I feel that something doesn't add up here.
No, Rust does exactly zero implicit coversions between numeric types. For that matter, it doesn't allow you to implicitly use an i8 where, say, an i32 is expected (forcing you to explicitly cast instead), even though the former represents a range that is a strict subset of the latter. Whether or not this is a feature or a flaw depends on the observer. :)
Asides from pcwalton's argument, I think any successful (or to-be-successful) language is trying to solve some problems very well and other problems reasonably well. For Rust, the former includes the memory safety and the latter includes the integer overflow detection (but you can always name a lot more). It would be very hard to solve all problems at once (especially when known solutions to some problems worsen other problems, as seen in Rust). That can be said to Swift as well, it just picked different problems to solve very well.
Both problems are undergoing active research. They are indeed orthogonal (albeit one can somehow reduce the severity of another and vice versa), but with a finite resource and time you sometimes have to choose.
It seems to me that, despite all the flexibility people demand, they only really want (and use) exactly three integer data-types in a language:
1. machine bit-string of size 8(2^n) bits (e.g. b/w/d/q fields) with no concept of being a signed/unsigned integer, but where you can apply both signed and unsigned integer operations to it in an optimized manner. Such code will be shimmed with sets of clever shifting ops if the target architecture doesn't actually have a storage type of that width, but you have to be explicit about what you'll allow (e.g. if an integer is of type "d|2w", then it will compile fine on a 16-bit architecture (and operate using shimmed ops), but fail to compile on an 8-bit architecture.
2. unsigned integer of fixed exactly-specified bitsize, which exactly matches the semantics of a having a value on a modular-arithmetic ring. You go up, it wraps around at 2^bitsize; you go down, it wraps back. This stays true even if the code is compiled on an architecture where the exactly-specified bitsize isn't a clean processor storage-width: a 27-bit ring on a 32-bit processor is A-OK, and shims will be inserted to enforce the semantics. Shims will also be inserted if you're targeting an ISA that doesn't have an integer type with wrapping semantics (e.g. the JVM.)
3. signed arbitrary-precision integer (i.e. optimal-machine-word-size-minus-a-tag-bit with bignum promotion checks). The optimizer might convert one of these to something of type #1 if it can be very, very sure of the value range you're operating within.
#1 is for doing pointer math in unsafe regions; #2 is for implementing cryptographic primitives; #3 is for everyone else.
(There's also a variant on #1—let's call it #1A—which is a fixed-size array of #1s you can repeat signed/unsigned integer transformations over, where this will generate optimized SIMD code. This would be the "buffer" type for wire protocol packing/unpacking, and also the backing store for non-sparse matrix ADTs, bloom filters, etc.)
The only place integer overflow raising an exception would make sense, to me, are if you want some sort of hybrid type between #1 and #3: a value that pretends to be arbitrary-precision, but in fact only has a constant-size bitstring to operate within. I could see the use of this in something like Cap'n Proto, where you're keeping wire-encoded integer bitstrings (#1s from a #1A) around and pretending they're fully-functional integers (#3) for the sake of zero-copy—but do you really gain that much by not letting a data structure be recreated resized on the stack, if you're not also throwing down optimizations like intrusive lists in fixed arenas?
I have a small nit about #2 that doesn't affect your main argument. Similar to the 'overflow' flag the processor automatically sets to signal signed overflow, the processor flag for 'carry' is already being set by the processor when you add or subtract unsigned numbers that overflow. There is no separate signed or unsigned addition in x86/x64 assembly --- the processor just sets the flags for both cases, and they can either be used or ignored.
This is the reason that overflow checking can be so inexpensive --- no additional checking is necessary, just a conditional branch on an existing flag. Correctly predicted branches not taken are very inexpensive, and this branch will almost always be correctly predicted. The issue is that standard C doesn't allow any direct defined way to make use of the overflow flag. Instead you have to write something awkward and hope the compiler optimizes it down for you.
Irritatingly, GCC lacks a "branch on overflow" intrinsic or "add and branch on overflow" intrinsic, forcing you to write inline asm to do fast overflow checking.
> A new set of built-in functions for arithmetics with overflow checking has been added: __builtin_add_overflow, __builtin_sub_overflow and __builtin_mul_overflow and for compatibility with clang also other variants
Kind of makes sense, though. Since the C standard says that integer overflow is undefined, you can think of the "C abstract machine" as being built on an assumption of an underlying architecture that provides magic fixed-width-yet-arbitrary-precision integers (and/or an assumption of omniscient programmers who always know what their inputs will be and never let math happen if it would result in overflow.) Basically, you can't talk about overflow in C, or even to a C compiler, because as soon as you make the fact that overflow is happening explicit, you've stepped outside the C abstract machine.
It's like trying to talk to Haskell about explicit thunks, or Lisp about stack-vs-heap allocation. The language encapsulates the concept away from you, so if you wrench it back, you're no longer quite writing the language; you're writing a union of it and something else.
Yes, but in practice, that's what people do: write in a language more powerful than the standard gives them. You kind of have to if you want to get anything done without driving yourself crazy. For example:
long mega_byte = 1024 * 1024 * 8; // bits per MB
long mega_bit = 1000 * 1000; // bits per megabit
If you follow the C90 standard, this code has all sorts of issues. First, // comments are not allowed. Second, identifiers are only guaranteed to have six significant characters, so the above two variables might actually be the same variable. Finally, ints are only guaranteed to be 2 bytes, so the multiplication may overflow and the program is undefined. (Assigning to longs doesn't help: the multiplication already happened.)
You are correct that such things are outside of the C abstract machine, but GCC provides a lot of extensions that are outside of the C abstract machine. For example, __sync_synchronize and other atomic intrinsics (https://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Atomic-Builtins...). C had no memory model when these were introduced (which, of course, is why they were introduced), so they were outside of the C abstract machine.
"The result is the 32 low-order bits of the true mathematical result in a sufficiently wide two's-complement format, represented as a value of type int. If overflow occurs, then the sign of the result may not be the same as the sign of the mathematical sum of the two values.
Despite the fact that overflow may occur, execution of an iadd instruction never throws a run-time exception."
With this definition the JVM doesn't need distinct instructions for unsigned arithmetic since they're identical to signed arithmetic in two's complement (with some exceptions like shifting for which there are both ishr / iushr instructions).
The important thing when giving people primitives to work with is that they come with time and space guarantees. People are writing low-level code basically in order to be able to explicitly make that choice, rather than having the compiler make it for them.
So, #2 is just a few extra guarantees about the intrinsics of regular machine-words, with minimal overhead for the fast path (even on the slower path of e.g. a 30-bit ring on 32-bit storage, there's still an O(1) set of barrel-shifts and masks that can be used to implement that.)
A fully-general ringed arithmetic value, on the other hand, is more of an ADT. It could certainly be implemented in the same language and act like a number, but it wouldn't be useful as backing storage for other types in the way #1, #2, and #3 are. (Note that #2 is a better backing store than #1 for many use-cases, e.g. encoding IP addresses and subnet masks.)
Time guarantees on a modern out-of-order processor with some complicated cache structure of three or more levels, with virtual memory enabled? Forget it.
Also, if you include bigints in your list of things people want, I do not find it unreasonable to add timestamps (maybe not the best example because of leap seconds) or 'day of the week', 'month of the year' to that list. Most language libraries get those (often including those perfect 86400 second days) before they get bigints.
What do you expect to happen, then, in cases where you have a "can't fail" transparent deserialization operation from a wire format that doesn't limit the precision of its integers? Say, a web request that receives and decodes a JSON object. Now, JSON itself has no integer type, just a double type—but when people are manually generating JSON, they tend to forget this and shove arbitrary-precision integers in as naked \d+ tokens; and when people are receiving JSON, they also tend to expect to receive integers, not doubles. So what would you expect this JSON-decoding library to do, in a sensible language:
• crash the request thread?
• output a floating-point value with some of the precision thrown away?
• output a machine integer with the overflow flag set?
• output an arbitrary-precision integer?
Basically, arbitrary-precision integers are a necessary part of any platform that needs to be able to hold onto, and do math to, externally-generated numbers that didn't arrive packed in #1A type structs.
To insist, in the modern day, on attempting to do arbitrary math with arbitrary numbers using strictly machine words, is like insisting on defining a Postgres column as a VARCHAR, when TEXT is just as optimal for the average case, while automatically coping with edge-cases.
Most numbers computers touch don't have bit patterns that fit in 8(2^n) bits, but not 8(2^n) - 1 bits. Tag bits are cheap, especially if you have the other two integer types to fall back on.
I would expect the JSON-decoding library to do one of the following, depending on the library author's or callee's wishes:
- output a floating-point value with some lost precision
- output an arbitrary-precision integer, or some general "number" type that can be an arbitrary-precision integer or double-precision float
- output an "integer too precise to decode" placeholder in that location in the data structure
- fail to decode the document entirely
In no case should it crash.
I don't know what you mean by "a machine integer with the overflow flag set", since it doesn't exactly work that way, so the "integer too precise to decode" value is my alternative.
For example, suppose you're making a "JSON document"-based database engine and its query language only supports double-precision numbers. In that case, I would output a correctly-rounded floating point value, if I were decoding JSON.
Usually in other cases of decoding an integer, unless you're hosting an arbitrary precision integer calculator web application, there should be maximum acceptable values on any integer you'd read off the wire. For example, there's a maximum data size, maximum timeout duration, and such.
> fail to decode the document entirely [...] in no case should it crash
Note that I was using the Erlang sense of "crash"—a crash from a pattern-matching failure is the default clause of most case statements, and you want those to be bubbled up by crashing linked and supervising processes until they hit something that has generic "something went wrong; I'm going to report this to the user" code in it (like e.g. Rack does in Ruby.) You don't want to specifically be throwing, immediately catching, and then converting exceptions into 500 errors in every part of your code.
Also, I was assuming in this example that you're using a modern web framework, where the JSON decoding is handled by the framework—not by you—so the framework likely doesn't expose any configuration for something like its JSON deserialization library, which it wants you to just consider an implementation detail. So you won't get to specify anything like "maximum data size". Instead, the JSON library needs to "just work" in all cases, returning its best-effort interpretation of the wire message to you. (The parallel for HTML parsing would be libraries like Beautiful Soup.)
"JSON itself has no integer type, just a double type"
Nit - actually the JSON standard doesn't specify a precision for the number type, but most JSON implementations don't support arbitrary precision numbers.
Though the more recent RFC does have some implementation notes for interoperability.
> "This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available."
> "Note that when such software is used, numbers that are integers and are in the range [-(253)+1, (253)-1] are interoperable in the sense that implementations will agree exactly on their numeric values."
What is missing is a good and easy way of doing the check:
- Without need checking after every operation
- Without impeding easy use of overflow (and/or allowing for saturation) when it's needed (in a lot of places)
So, I see a narrow use for it, IF you validated your inputs AND overflow is unexpected (or, it is expected and you do something different, but it's still on narrow cases)
The reckless removal of safety checks in the pursuit of performance would be considered alarming were it not commonplace.
(Disclaimer: I really, really care about integer overflow for some odd reason, going so far as to be going through the entire PHP codebase to add big integer support...)