I can speak to the value of multivariate optimization in the oil and gas business.

Specifically, the U.S. onshore natural gas business is currently focused on what are called shale gas "resource plays". These are large contiguous blocks of acreage -- hundreds of thousands of acres -- over which thousands of wells are drilled one-after-another, in a manufacturing process. What makes this possible is a large, contiguous, underlying resource of natural gas reserves, trapped in tight rocks.

The key to profitably exploiting these resource plays is finding the right "formula" that can be repetitively applied to drilling and completing each well. There is no time to individually sit down and engineer each well. All you can do is analyze the data after the fact.

The profitability of each gas well is influenced by hundreds of variables. For instance, you have to "fracture stimulate" a gas well using a variable combination of hundreds of chemicals, each in a varying volume, with a variable pressure, to a variable depth. The well itself may have a variable number of frac stages, with a variable number of horizontal lateral wells connecting it to the reservoir, with a variable length for each lateral.

The question you have to answer is: given thousands of data points over thousands of individual wells, what is the most profitable "formula" for drilling a well in a particular resource play? Profitability is defined as the present value of the well's future cash flow, discounted at 10% a year -- you'll hear people talk about "the PV10". You know how each well has performed after it was drilled, you know what its profitability was, and you know what formula was applied to drill and to complete it. And you have this data for hundreds of wells -- each using a different combination of variables. And there's new data added every day.

There is a _lot_ of money riding on the right answer to this question.
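
For anyone who hasn't run into PV10 before, here's a rough sketch of the arithmetic in Python. It assumes annual cash flows received at the end of each year, which is a simplification. The function name and the cash flow numbers are made up for illustration and are not from any real well.

```python
def pv10(cash_flows, rate=0.10):
    """Present value of a series of year-end cash flows.

    cash_flows[0] is assumed to arrive at the end of year 1,
    cash_flows[1] at the end of year 2, and so on.
    """
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


# Hypothetical net cash flows (in $ thousands) for a well whose
# production declines steeply after the first year.
example_flows = [900.0, 500.0, 300.0, 200.0, 150.0]
print(f"PV10: {pv10(example_flows):.1f}")
```

With a number like this in hand for every well, you can rank the different drilling and completion "formulas" against each other on one scale.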

There is a _massive_ amount of data in the oil and gas business. And every year there are entirely new classes of data being generated: 2D seismic, then 3D seismic, then microseismic, completion data, well logs, magnetic, gravity, surface linears, and so on.

It's not surprising that one of the largest customers of the supercomputing industry -- after defense -- is the oil and gas business.

One catch here is that this data is usually locked up inside proprietary databases shared among companies in different industry consortia. So you really have to be a player in the business to have access to the data. That doesn't mean you have any clue how to analyze it, however.

And there are big QA problems with this data, too. So, you have to deal with that as well.