December 6th, 2017

Data Analytics - What and Why?

Data analytics, among a host of other terms, has been widely touted as the next “big thing” in oil and gas, but the nature of the term is often left ill-defined by marketing materials. The benefits of the approach are often not well understood, nor are the business circumstances in which it is useful. Other industries have been using these techniques for years with considerable success. Upstream oil and gas is in the initial stages of adoption. While some in the industry have successfully deployed data analytics solutions, there remains a great deal of confusion about the nature of the field and its potential uses.

What is it?

Before discussing the potential uses of data analytics, we need to know what the term means. At first glance, data analytics seems like a hopelessly general term. Everyone in the industry performs analysis on data, whether they are doing financial analysis, market analysis, or technical analysis. Adding to the confusion is the plethora of ancillary terms gathered under the data analytics umbrella: artificial intelligence, machine learning, Industrial Internet of Things (IIoT), data science, smart manufacturing, algorithms, big data, data warehousing, Hadoop, R, etc.

Data analytics refers to a specific class of analytical methods used to study complex systems which are not tractable to traditional approaches. Specifically, data analytics refers to data-driven techniques which are applications of inductive reasoning. In the purely “data-driven” case, the system under consideration is assumed to be a black box with unknown, possibly unknowable, internal mechanics. Data is collected on how the black box behaves under a large variety of circumstances, as well as the nature of those circumstances. This data is then used to build a statistical model describing what the black box is likely to do in any given situation. For the “pure” case, at no point in the process is any attention given to the mechanisms driving the behavior of the system.

If data-driven analysis seems like regression analysis, or curve fitting, it should. Regression analysis is one of the simplest methods in the data analytics toolbox. Other data analytics approaches can deal with much more complex relationships than regression analysis. Some can self-correct for new information over time or even self-determine what model is most appropriate to use (machine learning and artificial intelligence).
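To make the black-box idea concrete, here is a minimal sketch in Python of the simplest case mentioned above, a straight-line regression. The function black_box and every number in it are hypothetical stand-ins invented for illustration; in practice the observations would come from measurements, and the fitting step would never see the system's internals.

```python
import numpy as np

def black_box(x, rng):
    # Hypothetical stand-in for a system whose internals are treated as unknown.
    # The fitting code below only ever sees its inputs and outputs.
    return 3.0 * x + 2.0 + rng.normal(0.0, 0.5, size=x.shape)

rng = np.random.default_rng(seed=0)

# Observe the system under a variety of circumstances.
x_obs = rng.uniform(0.0, 10.0, size=200)
y_obs = black_box(x_obs, rng)

# Fit y ~ slope * x + intercept by ordinary least squares.
slope, intercept = np.polyfit(x_obs, y_obs, deg=1)

def predict(x):
    # Statistical model of what the black box is likely to do for input x.
    return slope * x + intercept

print(f"fitted slope={slope:.2f}, intercept={intercept:.2f}")
```

The fit recovers an approximate input-output relationship purely from observations, which is the inductive step the text describes. More elaborate methods replace the straight line with more flexible models.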

In contrast, most traditional modeling and analysis schemes use deductive reasoning. The analyst builds up a model of the system from simple components with known function and tries to deduce the system’s overall behavior from those known parts. The reasoning runs: this part behaves this way, which affects another part, and so on until the overall behavior of the system emerges. The relationships between the parts are defined by first-principles physical equations.

When is it appropriate?

Data-driven analysis is not a replacement for traditional analysis methods. In most cases, it is used in conjunction with traditional analysis methods and with the input of experts in the subject field in question. While the data-driven analysis methods will find patterns, finding the right patterns and ascribing meaning to them is a matter of finesse and experience. These analysis methods are tools whose effectiveness is greatly enhanced by expert guidance.

Data analytics techniques are well suited to systems whose internal mechanics are poorly understood or difficult to probe. Many E&P systems are not directly observable because they lie deep in the earth and under water. Data analytics methods examine actual results from multiple situations, which provides a systematic way to manage uncertainty. They are also suitable for systems which are too big and complicated to model in detail.




A high-level schematic of the development process indicating general lines of influence and feedback. Green indicates a driving influence, while purple indicates a feedback channel. For instance, the chosen objective drives what data is required and how it should be organized, but the restrictions on what data is available or can be feasibly collected provide feedback which guides the choice of objective. Similar relationships exist among the other segments of the development process. The personnel and skill relationships are not illustrated here.


Important Considerations

Data analytics is not a cure for everything that ails a business. It is a set of analysis tools which, much like any other, has its strengths and weaknesses. There are important requirements for the tools to be used and function properly.

Data-driven approaches, unsurprisingly, require data resources. Organizing data, which comes in a variety of formats and from a variety of locations, is perhaps the hardest aspect of getting data analytics solutions implemented and effective. The data used must be available for efficient and rapid access by machines. It must also be accurate, consistently acquired, calibrated, and properly identified so the data scientist knows what it is. Without this organization, even if an analyst finds something interesting, the results can be suspect and may be hard to extend beyond the initial pilot case. Many current efforts are stalled at this transition into production.

There needs to be sufficient data about the system in question (or about other, appropriately similar, systems) to satisfy any statistical assumptions made in the analysis.

A common moniker for the field is “big data”. The data for data analytics approaches is called “big” because the tools often require, or at least greatly benefit from, a large volume of existing information to learn about the system being analyzed and verify the model produced. Such systems are usually built, trained, and validated using a large amount of data; ideally each of these three steps will use its own independent large dataset. Once validated, data-driven models can be used with the expectation of acceptable functionality. Even then, a smart implementation will provide continuous feedback so the model can “learn” with ongoing use.
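The idea of keeping the datasets independent can be sketched as follows. This assumes the three steps map onto the conventional training, validation, and test split, and the function name and the 60/20/20 proportions are illustrative choices, not something prescribed here.

```python
import numpy as np

def split_three_ways(n_samples, train_frac=0.6, valid_frac=0.2, seed=0):
    """Return three non-overlapping index arrays covering all samples."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)  # shuffle so each subset is representative
    n_train = round(train_frac * n_samples)
    n_valid = round(valid_frac * n_samples)
    train = order[:n_train]
    valid = order[n_train:n_train + n_valid]
    test = order[n_train + n_valid:]  # whatever remains
    return train, valid, test

train_idx, valid_idx, test_idx = split_three_ways(1000)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Because no record appears in more than one subset, performance measured on the held-out data is not inflated by records the model has already seen. For time-ordered field data, a chronological split may be more appropriate than a random shuffle.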

Building a useful solution with data analytics tools generally involves comparing results and assumptions with other approaches, including traditional methods, and checking with human experts in the field to make sure the conclusion makes sense. Verification and validation steps are critically important, but once done, the model can be used with confidence.

Data analytics benefits from expertise about the system under examination and about the inherent assumptions of the statistical tools being used. Not all engineers and not all data scientists have both. The right balance of skills on your data analytics teams is important for success.

Potential Benefits/Applications

As noted, data analytics methods can provide insights into highly complex, poorly defined systems through relatively simple statistical models. These complex, intractable systems are relatively common in the oil and gas industry. Downhole environments are difficult to probe and often very complicated. Between the multi-phase flow problems and the imperfect knowledge of reservoir geometry, composition, and extent, such systems can be difficult, if not impossible, to describe with traditional physical models. However, there are often analogous reservoirs for which more historical behavior is available. Data analytics provides mathematical tools to identify patterns in that historical data which may be applicable to the system under analysis.

Dealing with uncertainty in traditional models is normally done by running a large ensemble of model systems with differing initial conditions. Simulating a large ensemble of configurations requires considerable time and computer resources. Due to their structure, data-driven models tend to run much faster, since the effective “ensemble” of results is already “stored” in the data observations rather than generated from scratch.
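As a rough illustration of reading uncertainty directly from stored observations, the sketch below summarizes a set of outcomes with empirical percentiles instead of generating them by simulation. The numbers are made up for the example and carry no physical meaning.

```python
import numpy as np

# Hypothetical outcomes observed on analogous systems (arbitrary units).
observed = np.array([120.0, 95.0, 150.0, 110.0, 130.0,
                     105.0, 140.0, 100.0, 125.0, 115.0])

# The spread of past results acts as the "ensemble"; no simulation runs are needed.
low, median, high = np.percentile(observed, [10, 50, 90])

print(f"10th percentile={low}, median={median}, 90th percentile={high}")
```

The calculation is essentially instantaneous, but it is only as trustworthy as the similarity between the observed systems and the one under analysis.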

Similar difficulties can arise when dealing with large, interconnected systems of equipment. Behavior of these machines can be mapped out from first principles. However, between the unexpectedly changing inputs and conditions (common for oil and gas production systems), uneven equipment wear, and the potential for emergent phenomena arising from interactions between equipment, doing so can be problematic. Accurate representation of all the interactions requires feedback on each possible equipment interaction, and properly accounting for the different possible inputs requires exploring a large phase space with numerous simulation runs. Additionally, each equipment reconfiguration could prompt an entirely new set of simulation runs.

In the field, experienced equipment operators can generally pick up a feel for how such a rig is going to behave despite this complexity. Data analytics and machine learning can potentially provide ways to capture that experience in a quantifiable and repeatable way. And that may be a huge opportunity!

Even for single pieces of equipment, data-driven analysis can be useful for predicting failures and performing real-time monitoring. The emphasis here is on real time. A pump, for instance, can be simulated in principle, but if it is moving a mixed-phase fluid slurry, the simulation work is very computationally intensive. Ensemble runs cannot take place fast enough to guide the operation of the pump in real time. By contrast, a data-driven algorithm might search a pre-compiled set of prior states of similar pumps to make “educated” guesses at its future behavior, and that search can be run fast enough to give near-real-time feedback.
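One simple way such a search could work is a nearest-neighbor lookup, sketched below. This is an assumed technique chosen for illustration, not a description of any particular product. The feature columns, the stored states, and the failure labels are all hypothetical.

```python
import numpy as np

# Hypothetical library of prior pump states. Columns are
# [flow rate, discharge pressure, vibration], pre-scaled to comparable ranges.
prior_states = np.array([
    [0.9, 0.8, 0.1],
    [0.8, 0.7, 0.2],
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.9],
    [0.9, 0.9, 0.1],
])
# Hypothetical label recorded after each state: 1 = failed soon afterwards, 0 = did not.
failed_soon = np.array([0, 0, 1, 1, 0])

def estimate_failure_risk(current_state, k=3):
    """Fraction of the k most similar prior states that were followed by a failure."""
    distances = np.linalg.norm(prior_states - current_state, axis=1)
    nearest = np.argsort(distances)[:k]
    return failed_soon[nearest].mean()

risk = estimate_failure_risk(np.array([0.45, 0.55, 0.8]))
print(f"estimated failure risk: {risk:.2f}")
```

The lookup needs only a distance calculation and a sort over stored records, so it is cheap next to a multiphase flow simulation. Very large libraries would likely call for an indexed search structure instead of a brute-force scan.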

As noted before, data organization and management are critical to success in data analysis projects. Once organized, that data is also available for other analysis methods, including more traditional models. A well-organized data backend will usually scale well for other applications and support almost any kind of research easily. The shifts in business practice necessary to make the various data silos available for any business use can be hurdles to implementation in and of themselves.


Data analytics is about applying statistical methods to the complex mechanical and chemical systems in upstream oil and gas. It leverages the vast amount of information already collected, and still being collected, to provide timely guidance for business decisions. The answers provided are not necessarily better than those given by traditional modeling methods. However, they can be provided more quickly and with less knowledge of the internal workings of the system. This comes at the cost of requiring better-managed and better-recorded behavioral data. Effective data analytics requires expertise in the technical and business systems being modeled, as well as the mathematical tools being used, and the IT systems supporting them.