Saturday, September 15, 2018

Machine Learning and Python: Interview with Patrick Ng

The United States is now the world's largest producer of oil and gas, and machine learning played a large role in the transformation, which has occurred because of new techniques and technologies.

Welcome to an interview with Patrick Ng, geoscientist and pioneer of innovative ways to use analytics, and specifically machine learning, to find new oil and gas reserves and to produce them more efficiently and sustainably.

https://youtu.be/6uQR8PO3l3A


LIFE EDGE with Patrick Ng Chat 2018 Q&A Notes

Background - I am a geophysicist by training, with experience from A to Z in the geosciences: 1) the A's: AVO (amplitude versus offset) to reduce risk, and azimuthal features to map natural fractures; 2) transforming seismic to rock properties; 3) prestack depth imaging and model building to map subsalt reservoirs, leading to 3 giant discoveries totaling over 2.5 billion boe in the Gulf of Mexico; and 4) the Z: drilling wells and learning from the drill bit all the way to total depth (Z).

And I learned through the drill bit that we drill anything but an average well; what we get is a range of initial production (IP) rates. The risk lies in the spread, and I have made a business of managing that risk at Real Core Energy.

Q1: How about examples of using Python in industry?

The focus of the hackathon was forecasting the production of a well. Participants were given flow rate data (courtesy of Halliburton, the sponsor), a Python notebook as a template, and a bootcamp to bring everyone up to speed. The exercise was to try using geoscience in machine learning, play with the number of layers and neurons in a neural network, and improve the forecast accuracy.
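A minimal sketch of that kind of exercise follows. It is not the hackathon notebook: it assumes scikit-learn's MLPRegressor as the neural network, and it uses a synthetic exponential decline curve in place of the sponsor's flow rate data. The hidden_layer_sizes tuple is where one would play with the number of layers and neurons; fit_and_score is a hypothetical helper name.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic monthly flow rates (made-up numbers, not real well data):
# an exponential decline with a little noise.
rng = np.random.default_rng(0)
months = np.arange(48, dtype=float)
rate = 1000.0 * np.exp(-0.05 * months) + rng.normal(0.0, 10.0, months.size)

# Train on the first 36 months, hold out the last 12 for scoring.
X_train, y_train = months[:36].reshape(-1, 1), rate[:36]
X_test, y_test = months[36:].reshape(-1, 1), rate[36:]


def fit_and_score(hidden_layer_sizes):
    """Fit a small network on log(rate) and return the held-out mean absolute error."""
    model = make_pipeline(
        StandardScaler(),
        MLPRegressor(
            hidden_layer_sizes=hidden_layer_sizes,
            solver="lbfgs",
            max_iter=5000,
            random_state=0,
        ),
    )
    model.fit(X_train, np.log(y_train))
    predicted = np.exp(model.predict(X_test))
    return float(np.mean(np.abs(predicted - y_test)))


# Try a few layer/neuron combinations and compare the forecast error.
errors = {sizes: fit_and_score(sizes) for sizes in [(8,), (16, 16), (32, 16, 8)]}
for sizes, mae in errors.items():
    print(sizes, round(mae, 1))
```

Which configuration scores best depends on the data and the random seed, so the comparison is the point of the exercise rather than any single number.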

Q2: Why Python?

Python is like the foundation my teenage daughter uses for makeup. Depending on the event, she will put on other colors and things (not sure what to call those… so I won’t). The real power of Python comes from a set of libraries. For example:

1) NumPy (numeric Python) for vectorized numerical computation
2) pandas for handling lots of columns and rows
3) scikit-learn (sklearn) for machine learning algorithms, ready to plug and play.

Think of a plain Python example: we write a few lines of code that loop over an array and do something to each element, one at a time.

NumPy can collapse that into a single line that operates on an entire time series as a vector, all in one go.
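To make the contrast concrete, here is a small sketch with made-up daily rates for a single well. The loop converts barrels per day to cubic meters per day one element at a time; the NumPy version does the same conversion on the whole series in one line.

```python
import numpy as np

# Hypothetical daily oil rates for one well, in barrels per day.
rates_bpd = [520.0, 505.0, 498.0, 470.0, 455.0]
M3_PER_BARREL = 0.158987  # one oil barrel in cubic meters

# Plain Python: loop over the array, one element at a time.
rates_m3_loop = []
for r in rates_bpd:
    rates_m3_loop.append(r * M3_PER_BARREL)

# NumPy: the same conversion on the entire series at once.
rates_m3_vec = np.array(rates_bpd) * M3_PER_BARREL

print(rates_m3_vec)
```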

Often we may have a thousand wells, each with its own production profile. Think of wells as columns across the top, with the number of barrels per day, week or month hanging down. Pandas can operate on the entire collection of data series all at once, like getting the mean, median and other statistics with one line on an entire group of data. We also get the quartiles: the top 25%, the middle 50% and the bottom 25%. Quickly we get a feel for how well the producing assets perform.
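The layout described above can be sketched as follows, with wells as columns and months as rows. The production numbers are synthetic and the well names are invented for illustration; describe() and quantile() are the one-line calls that produce the statistics and the 25% / 50% / 25% split.

```python
import numpy as np
import pandas as pd

# Synthetic data: 1000 wells (columns) by 24 months (rows), barrels per month.
rng = np.random.default_rng(42)
wells = [f"well_{i:04d}" for i in range(1000)]
initial_rate = rng.lognormal(mean=6.0, sigma=0.5, size=len(wells))
decline = np.exp(-0.06 * np.arange(24))
production = pd.DataFrame(
    np.outer(decline, initial_rate),
    index=pd.RangeIndex(1, 25, name="month"),
    columns=wells,
)

# One line: count, mean, std, min, 25%, 50%, 75% and max for every well.
summary = production.describe()

# Rank wells by cumulative production and split them into three groups.
cumulative = production.sum()
q25, q75 = cumulative.quantile([0.25, 0.75])
tier = pd.cut(
    cumulative,
    bins=[-np.inf, q25, q75, np.inf],
    labels=["bottom 25%", "middle 50%", "top 25%"],
)

print(summary.iloc[:, :3])
print(tier.value_counts())
```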

Q3: Why is Python so popular with machine learning?

It has to do with the availability of powerful libraries like Keras and TensorFlow, which are well suited to neural networks and deep learning. While scikit-learn has been around for some time, TensorFlow was released by Google as open source in November 2015.

Let’s take deep learning as an example. Microsoft had success using 152 layers in a deep neural network. Using Keras, we specify one layer at a time, so we’d have 152 lines of code.

But with TensorFlow, we can do that in one line, albeit a long one, by listing the number of neurons in all 152 layers at once. Again, fewer lines of code. And if we want to customize and tune each layer, we can do so with Python in a more granular way.
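As one possible illustration of listing all the layer sizes at once, the sketch below assumes TensorFlow 2.x with its bundled Keras API. It does not reproduce any specific call from the interview: hidden_units and build_network are hypothetical names, and the layer sizes are arbitrary. The list drives the whole architecture, while each layer is still an ordinary Python object that can be tuned individually.

```python
import tensorflow as tf

# One entry per hidden layer: the number of neurons in that layer.
# (Arbitrary sizes; a very deep network would simply use a longer list.)
hidden_units = [64, 64, 32, 32, 16]


def build_network(n_features, hidden_units):
    """Build a fully connected regression network from a list of layer sizes."""
    layers = [tf.keras.Input(shape=(n_features,))]
    layers += [tf.keras.layers.Dense(n, activation="relu") for n in hidden_units]
    layers.append(tf.keras.layers.Dense(1))  # single output, e.g. a forecast rate
    return tf.keras.Sequential(layers)


model = build_network(n_features=8, hidden_units=hidden_units)
model.compile(optimizer="adam", loss="mse")
model.summary()
```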

So we go from Python (the foundation) to NumPy, pandas, Keras and TensorFlow; each provides the tools to do more, faster, with fewer lines of code. In a nutshell, Python opens up a whole new way for geoscientists to explore data, do rapid experiments and gain new insights.

Q4: Can machine learning make the industry safer and cleaner?

Here are two examples. First, predictive maintenance: we can better anticipate and schedule downtime for routine maintenance and repair of equipment, just as we do an annual checkup for our AC units in Houston to keep them running in top shape. That helps prevent potential leaks and minimizes surprises, which keeps us safe.

On a cleaner environment, one possibility is that we drill fewer wells and produce the same volume, if we can better predict the outcome with machine learning. In doing so, we reduce the footprint and the impact on the environment.

(One more thought that came after the chat is refracking. If we can use machine learning to better identify candidate wells for refracking, we can improve the recovery factor and may also drill fewer new wells. Again, that reduces the footprint and lessens the impact on the environment.)

Q5: Is there a benefit to combining data reprocessing and machine learning?

Yes. It has been standard business practice that every few years, as algorithms improve, we reprocess the data to get higher resolution and a more detailed look. It is like going from 4K to 8K HDTV: instead of 80 to 100 feet of resolution in seismic, we may get that down to 40 ft. With higher resolution data, we’d retrain the machine learning model and get better results. Both go hand in hand.

That brings up a good point. In the world of geoscience, if we change the model, we also get different imaged data. That is unlike the typical data fed to machine learning algorithms: what I bought from Amazon or streamed from Netflix, what I read and watched, became a record that won’t change. But when imaging seismic, the model and the resulting data are tightly coupled. Change one, and we change the other.

So learning with machine beats machine learning alone.

Before 1995, the thinking in the Gulf of Mexico was that salt bodies would become detached because of buoyancy (salt is less dense than the surrounding rocks). So over geologic time (millions of years, not weeks), salt moved up from great depth and ended up looking like cupcakes (picture the inside of a lava lamp). But with the Crazy Horse (now called Thunder Horse) discovery, we learned there is a salt mountain that goes forty-five thousand feet deep below the seafloor. No cupcakes.

Python is a tool that can help geoscientists explore and test their ideas with data, leading to a better understanding of the geology and more production. Last but not least, while Python is really powerful for numerically intense applications, it can go all the way to voice. Using Python and the Flask library, I put together a numerically rigorous app and delivered it via Alexa. That, I see, can get more high school students interested in geoscience.
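For a sense of what a Flask app behind a voice assistant can look like, here is a minimal sketch. It is not the app described above: the /alexa route, the "years" slot, the forecast_rate helper and the decline parameters are all assumptions made for illustration. The endpoint reads a number from an Alexa-style intent request, runs a simple exponential decline calculation, and answers in the JSON shape Alexa expects for spoken plain text.

```python
import math

from flask import Flask, jsonify, request

app = Flask(__name__)


def forecast_rate(initial_rate, annual_decline, years):
    """Exponential decline: q(t) = q_i * exp(-D * t)."""
    return initial_rate * math.exp(-annual_decline * years)


@app.route("/alexa", methods=["POST"])
def alexa():
    body = request.get_json(force=True)
    # Hypothetical slot named "years" on an intent request.
    slots = body.get("request", {}).get("intent", {}).get("slots", {})
    years = float(slots.get("years", {}).get("value", 1))
    # Made-up well parameters: 1000 barrels per day, 50% nominal annual decline.
    rate = forecast_rate(initial_rate=1000.0, annual_decline=0.5, years=years)
    speech = f"After {years:g} years the forecast rate is about {rate:.0f} barrels per day."
    return jsonify(
        {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText", "text": speech},
                "shouldEndSession": True,
            },
        }
    )
```

A production skill would also need to verify that requests really come from Alexa and handle launch and session-end requests, which this sketch leaves out.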

Closing

As a closing thought, remember the old saying “The journey of a thousand miles begins with one step.” I see learning Python as the first step. Just do it!

 Thank you, Patrick! 



