Feature Variation Influences Accuracy in Petro.ai DSU Design Service

Q: Finding similar wells can be difficult. How many wells does it take to build accuracy?

A: “This question comes up in one form or another with almost every client.” Kyle LaMotta nods as he explains. “Companies want to know: What data should we give you? What wells do you need? Do you need one county, a ten-mile radius, or all the wells in the basin?

“My response is always that we’re trying to build a model that accurately predicts one small AOI or a single DSU. We’re not trying to build a basin model that generalizes across thousands of wells; we want to build the most accurate model for a very particular purpose.

“The answer to ‘how many wells do you need?’ depends on a number of different factors.

“We start with a few basic questions: What are you trying to design? Do you have diagnostic data near the DSU that you’re planning? Where is it located? Are you thinking about completion designs that are within the bounds of what other operators have done before? For example, are you going to pump 2,500 lbs per foot like everyone else has, or are you trying to pump 4,000 lbs per foot, which no one has done?

“Another aspect of the design is whether you’re offsetting parent wells. Are you targeting multiple benches? Are you really concerned about sibling interaction, or is there one parent well in the DSU that you’re really worried about?

“Once we know those things, then we know the nature of what we’re trying to predict. Then we can go see how many wells are representative of that. In some cases, a hundred wells that have a good variation across the features that the model’s learning from could be enough.

“It really comes down to asking, ‘What’s the variation of the data for the specific features that are going into the model?’

“What’s the variation in the features that go into the model, like proppant per foot, fluid per foot, petrophysical variables like porosity, hydrocarbon pore volume, thickness, or relative landings? In order to make a prediction that’s accurate, the model needs to have seen data that’s similar. And in order to run sensitivities, we need to have data across a wide range of those features. If we try to predict outside of that window, then the model hasn’t learned the relationship of all those features to one another. There simply wouldn’t be enough variation in the data.
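The idea of predicting "outside of that window" can be illustrated with a simple range check. The sketch below is not Petro.ai's method; it is a minimal illustration that assumes each well is a dictionary of numeric features. The feature names, the well values, and the helpers `feature_ranges` and `out_of_range_features` are all hypothetical.

```python
from typing import Dict, List, Tuple

def feature_ranges(wells: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Return the (min, max) observed for each feature across the training wells."""
    ranges: Dict[str, Tuple[float, float]] = {}
    for well in wells:
        for name, value in well.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges

def out_of_range_features(design: Dict[str, float],
                          ranges: Dict[str, Tuple[float, float]]) -> List[str]:
    """List the features of a proposed design that fall outside the observed window."""
    return [name for name, value in design.items()
            if name in ranges and not (ranges[name][0] <= value <= ranges[name][1])]

# Hypothetical training wells (values invented for illustration only).
training_wells = [
    {"proppant_lb_per_ft": 1800.0, "fluid_bbl_per_ft": 35.0, "lateral_length_ft": 7500.0},
    {"proppant_lb_per_ft": 2500.0, "fluid_bbl_per_ft": 50.0, "lateral_length_ft": 10000.0},
    {"proppant_lb_per_ft": 3000.0, "fluid_bbl_per_ft": 60.0, "lateral_length_ft": 12500.0},
]

ranges = feature_ranges(training_wells)

# A proposed design that pumps more proppant than any training well.
proposed = {"proppant_lb_per_ft": 4000.0, "fluid_bbl_per_ft": 55.0, "lateral_length_ft": 10000.0}
flagged = out_of_range_features(proposed, ranges)
print(flagged)  # ['proppant_lb_per_ft']
```

A per-feature check like this is only a first pass: a design can sit inside every individual range and still be an unseen combination of features, so it does not by itself show that the model has learned how the features relate to one another.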

“The most important factor is capturing variation in the features that the model is learning from; this determines the number of wells that we might need.

“Many of our clients are in new areas that have new or different challenges. They’ll want to include a larger data set to provide as much information as possible about the region. By increasing the well count, you can capture more variation across the parent-child relationships, more changes across completion designs, and differences in lateral length. There are wells that are 3,000 feet, all the way up to 15,000 feet. There’s a lot of variation in there!

“Really what we’re trying to capture is a data set that has variations for the features that the model is learning from. That way it can really understand this complex multivariate relationship.
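One rough way to see how much variation a candidate data set carries is to compute a spread statistic for each feature. The sketch below uses the coefficient of variation (sample standard deviation divided by the mean) as an assumed, illustrative metric; the interview does not say which measure Petro.ai uses, and the feature names and values are hypothetical.

```python
import statistics
from typing import Dict, List

def coefficient_of_variation(values: List[float]) -> float:
    """Sample standard deviation divided by the absolute mean; 0.0 means no variation."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / abs(mean)

def variation_report(wells: List[Dict[str, float]]) -> Dict[str, float]:
    """Compute the coefficient of variation of every feature across the wells."""
    names = wells[0].keys()
    return {name: coefficient_of_variation([w[name] for w in wells]) for name in names}

# Hypothetical wells: lateral length varies widely, proppant loading does not.
wells = [
    {"lateral_length_ft": 3000.0, "proppant_lb_per_ft": 2500.0},
    {"lateral_length_ft": 9000.0, "proppant_lb_per_ft": 2500.0},
    {"lateral_length_ft": 15000.0, "proppant_lb_per_ft": 2500.0},
]

report = variation_report(wells)
for name, cv in report.items():
    print(f"{name}: {cv:.2f}")
```

In this toy data set the proppant feature has zero variation, so a model trained on it could not learn how production responds to proppant loading, no matter how many such wells were added.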

“We’ve built models that have high accuracy on as few as 80 wells and as many as 3,000 wells. There’s no single answer for how many wells is enough.

“It depends on how mature the area is that we’re studying and the amount of variation in the framework the client wants to study. A final lesson is that quality trumps quantity. An accurate but small data set is informative and can provide a high level of accuracy in the model formation.”