Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

For computing being 'engineering', that is an old sore point with me. I believe that computing should be engineering but so far really misses some key points.

Here, close to the article, is an example of some of what is missing:

In engineering for, say, a bridge, the design engineer can get quite comprehensive data on the 'engineering properties' of the materials, components, and processes he is intending to use. E.g., he knows the stiffness of steel beams, the tensile strength of steel cables. and how much weight a reinforced concrete column can carry.

Now in computing, for an algorithm, what do I know about the resources it requires? E.g., I just wrote a Web site session state store using TCP/IP for communications, my own code to convert the TCP/IP streams to the 'messages' I needed, and Microsoft's .NET collection class SortedDictionary to store the pairs of session keys and session values. So, how do I know fast my code will be and how much storage it will need? That is, could I get data on TCP/IP and SortedDictionary so that I could tell before writing and running the code?

That is, do I have solid engineering information about the components I used that will let me know, before writing and running the code, the final performance of my software?

NO!!!

Instead, about all I can do is run the code and measure the execution time. For the storage needed, I don't have a good solution even given the running code.

So, by analogy, if bridge building was like software writing, then a bridge engineer would not know how strong his bridge was until he built it and tested it with loads.

So, net, the bridge designer can 'engineer' his bridge just on paper before any construction has started, but I can't do the same for my session state store.

Again, a big difference is that the bridge engineer has detailed engineering information on his components and I do not.



You make a very good point. I had to do a trade study of different microprocessors (for embedded software) when I worked at a more engineering-focused organization, because one of our requirements was that a certain algorithm had to complete in under 30 seconds (or something like that---I don't remember the exact time requirement). One of the things we had to do was try to quantify how fast the algorithm would run on each processor, which as you mentioned was nearly impossible.

We ended up looking up the MIPS [1] and FLOPS [2] for each processor, whether it supported SIMD instructions, and how many pieces of data its SIMD instructions could process in a single instruction. Some processors didn't support floating point math, so we had to estimate how fast it could be emulated with integer registers. Then we had to have some formula to estimate the run time based on this data and the nature of the algorithm (what percentage of the code uses floating point vs integer instructions, what percentage can take advantage of SIMD, etc).

The problem with this is that even the published performance data, such as MIPS, aren't a measure of a single attribute that we can apply in any meaningful way. For example, MIPS is affected by clock speed, which instructions are executed (some may execute in a single cycle, others take several cycles), branch prediction, and cache coherency. How do you quantify stalled instructions? The final execution time is a result of an interplay between the hardware's characteristics and the algorithm's characteristics (as compiled by a certain compiler) over time.

Perhaps a more appropriate approach for software engineering would be statistics. Given normal distributions for MIPS, missed branch predictions, etc, similar characteristics of an algorithm, and perhaps the language and compiler used, then maybe we can give a confidence interval for an algorithm's run-time.

But like you said, in many cases it's probably easier (and cheaper) to just build the algorithm and then measure it.

[1] Millions of (integer) instructions per second.

[2] Floating point operations per second.

[3] Single instruction multiple data.

[4] Not to down-play the difficulty of that task. I'm sure it's very complicated as well.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: