Public Data and Baseline Model Development

Q: How does public data fit into the Petro.ai DSU Design Service?

A: “We can build models with only public data,” Kyle LaMotta, VP of Analytics, explains. “There’s enough information reported in the public data to create predictions. While that information is limited, we can build a model using features like total proppant, total fluid, lateral length, latitude, and longitude. That’s really the most information that we can get out of public data.
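The five public features named above are enough to sketch a baseline predictor. The example below is a minimal illustration, not Petro.ai's method: the interview does not say which model type is used, so a simple k-nearest-neighbors average stands in for it. The field names, units, target variable (`cum_oil_bbl`), and every well value are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, pstdev

# The five public-data features mentioned in the interview.
# Names and units are assumptions made for this sketch.
FEATURES = (
    "total_proppant_lb",
    "total_fluid_bbl",
    "lateral_length_ft",
    "latitude",
    "longitude",
)


@dataclass
class Well:
    total_proppant_lb: float
    total_fluid_bbl: float
    lateral_length_ft: float
    latitude: float
    longitude: float
    cum_oil_bbl: float  # hypothetical prediction target


def feature_vector(well):
    """Return the well's features in a fixed order."""
    return [getattr(well, name) for name in FEATURES]


def fit_scaler(wells):
    """Compute per-feature mean and standard deviation for standardization."""
    columns = list(zip(*(feature_vector(w) for w in wells)))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread
    return means, stds


def scale(vector, means, stds):
    """Standardize a feature vector so no single unit dominates the distance."""
    return [(v - m) / s for v, m, s in zip(vector, means, stds)]


def predict_knn(train, query_features, k=2):
    """Average the target of the k training wells closest to the query."""
    means, stds = fit_scaler(train)
    query = scale(query_features, means, stds)

    def distance(well):
        scaled = scale(feature_vector(well), means, stds)
        return sqrt(sum((a - b) ** 2 for a, b in zip(scaled, query)))

    nearest = sorted(train, key=distance)[:k]
    return mean(w.cum_oil_bbl for w in nearest)


# Invented training wells, for illustration only.
TRAIN = [
    Well(10_000_000.0, 250_000.0, 5_000.0, 31.90, -102.10, 150_000.0),
    Well(12_000_000.0, 300_000.0, 6_000.0, 31.92, -102.12, 180_000.0),
    Well(20_000_000.0, 500_000.0, 10_000.0, 32.10, -101.90, 320_000.0),
    Well(22_000_000.0, 520_000.0, 10_500.0, 32.12, -101.88, 340_000.0),
]

prediction = predict_knn(
    TRAIN, [11_000_000.0, 275_000.0, 5_500.0, 31.91, -102.11], k=2
)
print(prediction)  # 165000.0, the average of the two most similar wells
```

Standardizing the features first matters here because proppant mass and latitude differ by many orders of magnitude; without it, the distance would be driven almost entirely by proppant.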

“When we use public data, we know that the models are also learning from data that inherently has errors based on how it was reported: allocation errors on production, for example. These allocation errors are one of the biggest issues that Petro.ai and all of our operator clients deal with, and they are inherent to the public data. They are a result of lease-level reporting, rather than well-level reporting.
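To see why lease-level reporting produces allocation errors, consider a lease that reports only one total volume for two wells. The sketch below splits that total evenly across the wells, which is a deliberately simplified, hypothetical allocation rule (data providers each use their own method), and compares the result with invented metered volumes.

```python
def allocate_evenly(lease_total_bbl, well_ids):
    """Split a lease-level total equally across its wells (a simplifying assumption)."""
    share = lease_total_bbl / len(well_ids)
    return {well_id: share for well_id in well_ids}


def allocation_error(allocated, actual):
    """Allocated minus actual volume per well; positive means over-allocated."""
    return {well_id: allocated[well_id] - actual[well_id] for well_id in actual}


# Hypothetical metered monthly volumes that only the operator would know.
actual = {"well_A": 9_000.0, "well_B": 3_000.0}

# The public record contains only the lease total.
lease_total = sum(actual.values())

allocated = allocate_evenly(lease_total, list(actual))
errors = allocation_error(allocated, actual)

print(allocated)  # {'well_A': 6000.0, 'well_B': 6000.0}
print(errors)     # {'well_A': -3000.0, 'well_B': 3000.0}
```

The lease total is preserved, so the errors cancel in aggregate, but each individual well is off by 3,000 bbl. A model trained on the allocated values would learn from those per-well errors.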

“A model built with public data alone can generate predictions that are 70% accurate. Really exciting things start to happen as these models are supplemented with additional data types. That’s when the accuracy of the model grows into the mid-to-high nineties.
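The interview does not say how accuracy is measured. One common convention, assumed here purely for illustration, is to report accuracy as one minus the mean absolute percentage error (MAPE). The values below are invented to show the arithmetic.

```python
from statistics import mean


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction (0.3 means 30%)."""
    return mean(abs(p - a) / abs(a) for a, p in zip(actual, predicted))


def accuracy_from_mape(actual, predicted):
    """Accuracy defined as 1 - MAPE. This definition is an assumption of the sketch."""
    return 1.0 - mape(actual, predicted)


# Invented actual and predicted volumes for two wells.
actual = [100.0, 200.0]
predicted = [70.0, 260.0]

# Each prediction is off by 30% of the actual value.
accuracy = accuracy_from_mape(actual, predicted)
print(f"{accuracy:.0%}")  # 70%
```

Under this definition, moving from 70% to the mid nineties would mean cutting the average percentage error from roughly 30% to roughly 5%.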

“Private data is going to include things like daily production from the operator. At the well level, each operator knows exactly how much production that well is making every day. As a result, the production data is more accurate when it’s private.

“Other geological data types help inform the model. If an operator has petrophysical logs or if they have maps of porosity, structure tops, and so on, this mosaic of data adds very important information to the model and makes it dramatically more accurate.

“We usually supplement public data with private data. Operators will only have private data on their own wells or on offset wells that they’ve traded for. Usually, the private data is either more detailed well-level information or more regional geologic data (e.g., various grids and well logs).
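Supplementing public data with private data can be pictured as a join on a well identifier: every well keeps its public record, and the wells an operator holds private data for gain extra fields. The sketch below is a minimal illustration; the identifiers, field names, and values are all hypothetical.

```python
def supplement(public_records, private_records):
    """Overlay private fields on public records, keyed by well identifier.

    Wells without private data keep only their public fields. Where both
    sources supply the same field, the private value takes precedence.
    """
    merged = {}
    for well_id, public in public_records.items():
        private = private_records.get(well_id, {})
        merged[well_id] = {
            **public,
            **private,
            "has_private_data": bool(private),
        }
    return merged


# Hypothetical public records: available for every well.
public_records = {
    "W1": {"lateral_length_ft": 7_500.0, "monthly_oil_bbl": 6_000.0},
    "W2": {"lateral_length_ft": 9_800.0, "monthly_oil_bbl": 8_200.0},
}

# Hypothetical private records: only the operator's own or traded-for wells.
private_records = {
    "W1": {"daily_oil_bbl": [310.0, 295.0, 300.0], "porosity": 0.08},
}

merged = supplement(public_records, private_records)
print(merged["W1"]["has_private_data"])  # True
print(merged["W2"]["has_private_data"])  # False
```

Keeping a `has_private_data` flag makes it easy to see which wells a model is learning about from richer inputs and which rely on public fields alone.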

“We always start with public data because everyone has a subscription to a supporting data provider. Operators internally have their own ways of dealing with data errors. The reason operators prefer one provider over another is how that provider does its allocation and how it handles any data quality issues. Public data is good, but it becomes dramatically better when we can add private data and increase the accuracy to above 90%.”