Data Analytics & Shale-Gas Well-Design Optimization

Hydraulic fracturing, also called fracking, is a well-stimulation technique involving the fracturing of bedrock formations by a pressurized liquid (see the illustration above). The process involves the high-pressure injection of "fracking fluid" (primarily water, containing sand or other proppants suspended with the aid of thickening agents) into a wellbore to create cracks in deep-rock formations through which natural gas, petroleum, and brine flow more freely. When the hydraulic pressure is removed from the well, small grains of hydraulic fracturing proppants (either sand or aluminum oxide) hold the fractures open.

Hydraulic fracturing is used to increase the rate at which petroleum or natural gas can be recovered from subterranean natural reservoirs. Reservoirs, as depicted in Figure 1 below, are typically porous sandstones, limestones, or dolomite rocks, but also include "unconventional reservoirs" such as shale rock and tight sands. Hydraulic fracturing enables the extraction of natural gas and oil from rock formations deep below the earth's surface (generally 2,000–6,000 m (5,000–20,000 ft)). At such depths, there may be insufficient permeability or reservoir pressure to allow natural gas and oil to flow from the rock into the wellbore at economically viable rates. Thus, creating conductive fractures in the rock is instrumental, particularly for extraction from naturally impermeable shale reservoirs.

Figure 1: Conceptual model of conventional and non-conventional (shale-gas) deposits.

Since the early 2000s, advances in drilling and completion technology ("plug-and-perf", "sliding sleeve", etc.) have made horizontal wellbores, like the ones illustrated in Figure 1 above, much more economical. This is particularly useful in shale formations (Vaca Muerta in Argentina, for example) that don't have sufficient permeability to produce economically with a vertical well. The type of wellbore completion determines how many times a formation is fractured, and at what locations along the horizontal section.

The key variables related to the typical (unconventional) reservoir fracking process can be summarized as:
  • The pressure (in psi) used in the injection process.
  • The power consumption (in hp) of the pumps and other equipment.
  • The volume (in m3) of water injected.
  • The quantity (in tons) of sand (or other proppants).
  • The type of completion ("plug-and-perf" or "sliding sleeve").
  • The well's horizontal-section length (in m).
  • The number of fractures.
In general, fracking is quite a complex and expensive process; therefore, any methodology and/or procedure that reduces costs, optimizes well designs, and enhances the whole process would be very welcome to operators. (A minimal sketch of how one such per-well record might look follows below.)
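
For concreteness, here is a minimal, purely illustrative sketch (in R, the language used for the analyses later in this post) of how one such per-well record might be organized; all column names and values are hypothetical, not the dataset's actual headers:

    # Hypothetical per-well record layout; names and values are illustrative.
    wells <- data.frame(
      well_id    = c("W-001", "W-002"),
      pre        = c(9500, 10800),             # injection pressure (psi)
      w          = c(15000, 17500),            # pump power (hp)
      ainy       = c(42000, 61000),            # injected water volume (m3)
      proppant   = c(2100, 3400),              # sand / other proppants (tons)
      completion = c("plug-and-perf", "sliding sleeve"),
      hsl        = c(1800, 2450),              # horizontal-section length (m)
      numfra     = c(28, 37)                   # number of fractures
    )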

So when, a few days ago, while presenting an analytics solution for conventional-reservoir data to an O&G company, someone asked me whether something similar could be built on publicly available fracking data, I immediately started wondering how to modify and adapt the End-to-End Analytics Solution of my last post to unlock this very interesting new real-world use case.

To clarify the exposition, I've divided the current post into three sections:
  • Data Preparation Tool, Principal Component Analysis (PCA), and Visual Exploration
  • Weighted Association Rules Mining: Optimizing Wells' Design Parameters
  • Implementing a Content-Based Recommendation Engine: Optimizing Wells' Design Parameters
Let's move on, hoping this prototype analytics solution can help answer some key business questions in the fracking realm and be a springboard for future, deeper analyses.

Data Preparation Tool, Principal Component Analysis (PCA), and Visual Exploration

In previous posts, I've strongly emphasized the key role of data preparation, and of the TRIFACTA platform, in any data analytics workflow. As already mentioned, the platform is available in two versions, on-premise and in the cloud; the latter is part of the Google Cloud Platform ecosystem. To tackle the data challenges of the current real-world use case, TRIFACTA Cloud Dataprep will be unleashed; as illustrated in Figure 2 below, it allows us to take full advantage of the tool's scalability, connectivity, and data-pipeline capabilities.

Figure 2: TRIFACTA - Data Quality, Data Transformation, and Data Pipeline.

The data is a publicly available dataset from CAP-IV Secretaria de Energia. It comprises each well's name, company, area, and reservoir, along with the well's horizontal-section length, number of fractures, type of completion, pressure, etc., for more than 1,850 wells in unconventional reservoirs in the Neuquen and Rio Negro provinces in Argentina. Additional data regarding each well's producing life (in days), or vu, and accumulated production volumes (oil, gas, water, etc.) was also available and uploaded together with the fracking data into a bucket previously created in Google Cloud Storage (GCS).
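
As a hedged illustration, the refined data could later be pulled from GCS into R along these lines (the bucket and object names are hypothetical, and googleCloudStorageR authentication is assumed to be already configured):

    library(googleCloudStorageR)  # assumes GCS authentication is already set up

    # Hypothetical bucket/object names; use your own project's paths.
    gcs_global_bucket("my-fracking-data-bucket")
    gcs_get_object("capiv_unconventional_wells.csv",
                   saveToDisk = "capiv_unconventional_wells.csv")
    frack <- read.csv("capiv_unconventional_wells.csv")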

Figure 3: Recipe to refine data for advanced visual exploration.

Figure 3 above illustrates part of the recipe of the data-preparation process in TRIFACTA, carried out to generate the input required to perform a Principal Component Analysis (PCA) and identify the most relevant and informative variables. (Since the dataset mixes quantitative and categorical variables, the analysis was carried out as a Factor Analysis of Mixed Data, or FAMD, a PCA extension.) To learn more about PCA, please take a look at my third post. Some of the PCA results can be visually explored in IMAGE#1 and IMAGE#2. The first image illustrates both quantitative and categorical variables in a general Variables-FAMD plot, and the second a Quantitative (Numeric) Variables-FAMD plot; it's pretty clear from the latter which numeric variables, depicted as the longest and thickest red arrows, account for the greatest variability in the data. In summary, the 15 most informative (out of 20) variables were selected for further analyses.
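
For reference, a minimal sketch of how such an analysis could be reproduced in R with the FactoMineR and factoextra packages ('frack' stands for the refined dataset and is an assumed name):

    library(FactoMineR)   # FAMD: PCA-like analysis for mixed variable types
    library(factoextra)   # visualization helpers for FactoMineR results

    # 'frack' is assumed to hold the refined mixed-type well data.
    res_famd <- FAMD(frack, ncp = 5, graph = FALSE)

    # All-variables plot (quantitative + categorical), as in IMAGE#1.
    fviz_famd_var(res_famd, repel = TRUE)

    # Quantitative-variables plot, as in IMAGE#2; longer/darker arrows
    # indicate the variables contributing the most variability.
    fviz_famd_var(res_famd, "quanti.var", col.var = "contrib", repel = TRUE)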

Figure 4: Advanced Visual Exploration of PCA-selected variables.

The freshly refined data, also saved in GCS, was connected directly to Data Studio, where fully interactive visualizations and tables, as shown in Figure 4 above, were built, allowing a more in-depth visual exploration of the most relevant variables identified by the PCA process.

The visual exploration of the most informative variables also surfaced potentially interesting relationships between single variables and combinations of them. Following this promising lead, the next step, presented in the following section, was to perform a full Weighted Association Rules Mining (WARM) analysis, applying Market Basket Analysis techniques using, for example, the apriori and hits algorithms. To learn more about the apriori and hits algorithms, please read the related discussion in my third post.

Uncovering possible relationships between the most informative variables and, in particular, the well's PRODUCING LIFE vu and the ACCUMULATED GAS PRODUCTION agp, would be of paramount interest.

Weighted Association Rules Mining: Optimizing Wells' Design Parameters

Indeed, to perform the WARM analysis and successfully extract actionable relationships between the PCA's most relevant variables using the apriori and hits algorithms, it was imperative to implement an advanced data-preparation recipe in TRIFACTA Cloud Dataprep to generate the necessary input in the required format.

Figure 5: Recipe to refine data for Predictive Analytics and recommendations.

First, the refined data for the 15 most informative variables was blended with other relevant information, particularly the already-mentioned agp and vu. Next, all numerical variables (integer and decimal) had to be converted into categorical variables, as illustrated in Figure 5 above. So, for example, the (decimal) variable corresponding to the well's horizontal-section length, or hsl, with values distributed between 0 m and 2,800 m, was transformed into a six-valued categorical variable, i.e., 0m<=hsl<100m, 100m<=hsl<1000m, ..., hsl>=2500m. A similar procedure was applied to the pressure, the volume of injected water, the number of fractures, agp, vu, etc. (A minimal sketch of this binning step follows below.)
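
A minimal sketch of this binning step in base R; the intermediate break points (1,000–2,500 m) are assumptions, since the text spells out only the first two and the last bins:

    # Bin the horizontal-section length into six categories; the breaks
    # between 1,000 m and 2,500 m are illustrative assumptions.
    frack$hsl_cat <- cut(
      frack$hsl,
      breaks = c(0, 100, 1000, 1500, 2000, 2500, Inf),
      labels = c("0m<=hsl<100m", "100m<=hsl<1000m", "1000m<=hsl<1500m",
                 "1500m<=hsl<2000m", "2000m<=hsl<2500m", "hsl>=2500m"),
      right  = FALSE   # left-closed intervals: lower bound included
    )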

Figure 6: Recipe to refine data for Predictive Analytics and recommendations.

Finally, the blended-transformed data had to be unpivoted (as shown in Figure 6 above) to obtain a dataset that can be ingested directly by the Market Basket Analysis algorithms, so the WARM process could be successfully carried out. With the data ready, it was time to unleash the power of the apriori and hits algorithms and get the most value out of the freshly prepared data!
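
In R, the equivalent reshaping is the coercion of the all-categorical data frame into the transactions format expected by the arules implementations of apriori and hits; a minimal sketch ('frack_binned' is the hypothetical binned dataset from the previous step):

    library(arules)

    # Every column must be a factor before coercion to transactions.
    frack_factors <- as.data.frame(lapply(frack_binned, factor))

    # Each well becomes one "transaction" whose items are "variable=level" pairs.
    trans <- as(frack_factors, "transactions")
    summary(trans)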

Before engaging with the WARM, it could be useful to visually explore the dataset obtained so far. Using the visualization tools of the arules and igraph R packages, a graph, shown in IMAGE#3, was built; by tuning the graph-plot function's parameters (size, colors, etc.), it was possible to visually identify a few interesting features, as well as uncover some important relationships, for example, between 150KMm3<=agp<250KMm3 (accumulated gas production agp between 150K Mm3 and 250K Mm3) and 25Km3<=ainy<50Km3 (volume of injected water between 25K m3 and 50K m3), lrh>=2500m (well's horizontal-section length greater than or equal to 2,500 m), etc.
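
The post's exact plotting code is not shown, but one hedged way to build such a graph is to compute pairwise item co-occurrence counts with arules::crossTable() and plot the weighted graph with igraph, pruning weak edges (the pruning threshold below is illustrative):

    library(arules)
    library(igraph)

    # Pairwise co-occurrence counts between items across all wells.
    ct <- crossTable(trans, measure = "count")
    diag(ct) <- 0                                  # ignore self-co-occurrence

    g <- graph_from_adjacency_matrix(ct, mode = "undirected", weighted = TRUE)
    g <- delete_edges(g, E(g)[E(g)$weight < 50])   # illustrative threshold

    plot(g,
         vertex.size      = 8,
         vertex.label.cex = 0.7,
         edge.width       = E(g)$weight / max(E(g)$weight) * 5)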

With the latter as a guide, the WARM analysis started by evaluating the weights corresponding to each variable association, using the hits algorithm. Then, the weights obtained were plugged into the apriori algorithm, and the association rules (variable associations) were induced. As I've already mentioned in previous posts, rules generated in the WARM process are made of antecedent (lhs) and consequent (rhs) sets of variables or factors. So, by properly tuning apriori's parameters, it is possible to induce rules, for example:
  1. with a target or specific element in the CONSEQUENT rhs-set,
  2. or with a group of particular elements or factors in the ANTECEDENT lhs-set.
When the rhs (consequent) was, for example, set to {150KMm3<=agp<250KMm3} (a target accumulated gas production agp between 150K Mm3 and 250K Mm3, values indeed very attractive from an economic-return point of view), the TOP-30 induced rules, sorted in decreasing order of lift, are those shown in Figure 7 below (a hedged code sketch follows the figure). The green rectangle highlights the five combinations with the highest lift. It's important to notice that the elements appearing in the lhs sets are not random combinations at all. Remember: the further the lift value rises above 1.00, the smaller the likelihood that the induced rule occurs by chance. So, the induced rules are strong rules, and the recommended values (appearing in the lhs) of pressure, number of fractures, volume of water injected, completion type, etc., could be used directly by the domain expert or engineer to improve the design of new unconventional wells.

Figure 7: WARM - optimizing wells' design parameters.
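
A hedged sketch of this step with the arules package (item labels are illustrative and must match those actually present in the transactions; note that in arules the hits() weights are consumed by the weighted miners such as weclat(), while the rhs constraint is passed to apriori() through its appearance argument):

    library(arules)

    # HITS-derived transaction weights, stored where arules' weighted
    # mining routines (e.g., weclat) look for them.
    transactionInfo(trans)$weight <- hits(trans)
    wit <- weclat(trans, parameter = list(support = 0.01))  # weighted itemsets

    # Rule induction targeting a specific consequent (rhs); support and
    # confidence thresholds are illustrative.
    rules_agp <- apriori(
      trans,
      parameter  = list(supp = 0.01, conf = 0.5, minlen = 2),
      appearance = list(rhs = "agp=150KMm3<=agp<250KMm3", default = "lhs")
    )

    # TOP-30 rules by decreasing lift, as in Figure 7.
    inspect(head(sort(rules_agp, by = "lift"), 30))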

Next, the rhs (consequent) was set, for example, to {600d<=vu<900d} (a target well producing life vu between 600 and 900 days, again very attractive from an economic point of view). Figure 8 below depicts the TOP-30 induced rules (again sorted in decreasing order of lift). As before, the green rectangle highlights the four combinations with the highest lift. Again, these are strong rules, and the recommended values (appearing in the lhs) of the number of fractures, horizontal-section length, pressure, etc., could be used directly by the engineer to improve the design of new unconventional wells.

Figure 8: WARM - optimizing wells' design parameters.

Finally, if a group of specific elements is included in the ANTECEDENT lhs-set, for example, {2000m<=lrh<2500m, Tapon-disparo, 50Km3<=ainy<75Km3, 10.5Kpsi<=pre<11Kpsi, 16Khp<=w<18Khp, 30<=NUMFRA<40} (horizontal-section length between 2,000 m and 2,500 m, completion type plug-and-perf, volume of water injected between 50K m3 and 75K m3, pressure between 10.5K psi and 11K psi, power between 16K hp and 18K hp, and number of fractures between 30 and 40), the TOP-30 induced rules, sorted in decreasing order of lift, are those shown in Figure 9 below (a hedged code sketch follows the figure).

Figure 9: WARM - optimizing wells' design parameters.
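
The corresponding hedged sketch for this antecedent-constrained mining (again, item labels and thresholds are illustrative):

    # Restrict the antecedent (lhs) to the chosen design-parameter items;
    # any other item may then appear as the consequent (rhs).
    lhs_items <- c("lrh=2000m<=lrh<2500m",   "completion=Tapon-disparo",
                   "ainy=50Km3<=ainy<75Km3", "pre=10.5Kpsi<=pre<11Kpsi",
                   "w=16Khp<=w<18Khp",       "numfra=30<=numfra<40")

    rules_design <- apriori(
      trans,
      parameter  = list(supp = 0.005, conf = 0.5, minlen = 2),
      appearance = list(lhs = lhs_items, default = "rhs")
    )

    # TOP-30 rules by decreasing lift, as in Figure 9.
    inspect(head(sort(rules_design, by = "lift"), 30))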

Interestingly, as depicted particularly by the rules highlighted by the green rectangles, the results in this scenario are also quite consistent, with CONSEQUENT values of agp and vu that are very attractive from an economic-return point of view. As before, the actionable insights unearthed by the WARM analysis could be used directly by the domain expert/engineer to recommend actions aimed at improving the design of new (unconventional) wells.

Implementing a Content-Based Recommendation Engine: Optimizing Wells' Design Parameters

It would also be of interest to explore additional ways to optimize the proposed designs of new fracking wells, using the data and information already available for well-known wells (even ones possibly located in a different geographic area).

In this scenario, consider a group of new locations or proposed wells (let's call them the "REFERENCE WELLs"), whose design parameters have been estimated or generated synthetically... Could they be directly compared with well-known wells (let's call them the "SIMILAR WELLs"), allowing the domain expert/engineer to make adjustments to the new wells' design parameters in advance? This is where implementing a Content-Based Recommendation Engine (CBRE) came in handy.

First, from the original refined data for the 15 most informative variables and the REFERENCE WELLs' synthetic data (both uploaded to and living in Google Cloud Storage), one-hot encoding could be used to transform the categorical variables into numeric 0/1 codes, generating the input (in TRIFACTA) required to evaluate a Similarity Matrix (using the Pearson correlation coefficient). The Similarity Matrix is a square matrix where each row (and column) corresponds to a well; each cell contains a Similarity Index value; the diagonal is filled with values equal to 1 (each well is identical to itself), and the off-diagonal elements take values distributed between approximately -1 (very different) and approximately 1 (very similar). The evaluation of the Similarity Matrix can be performed in the R language framework, and the result must also be saved to Google Cloud Storage. (A minimal sketch follows.)
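
A minimal sketch of the encoding and similarity computation in R ('wells_df' and its 'well_id' column are assumed names):

    # 'wells_df' holds one row per well (REFERENCE + known wells) with a
    # 'well_id' column plus the 15 selected mixed-type variables.
    feat <- wells_df[, setdiff(names(wells_df), "well_id")]

    # One-hot encode categoricals into 0/1 dummy columns; numeric columns
    # pass through unchanged.
    X <- model.matrix(~ . - 1, data = feat)
    rownames(X) <- wells_df$well_id

    # Pearson similarity between wells: cor() works column-wise, so transpose.
    sim <- cor(t(X), method = "pearson")   # diagonal = 1; off-diagonal in ~[-1, 1]

    write.csv(sim, "similarity_matrix.csv")  # then upload back to GCS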

Second, back in TRIFACTA Cloud Dataprep, the Similarity Matrix must be UNPIVOTED and blended with additional relevant information. The resulting refined dataset, containing both REFERENCE and SIMILAR WELL data, is the core of the recommender system. These latest results are also saved in Google Cloud Storage, ready to be connected to Data Studio.
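
For reference, an equivalent of that UNPIVOT step in R, sketched with tidyr/dplyr:

    library(dplyr)
    library(tidyr)

    # Unpivot the square matrix into (reference_well, similar_well, similarity)
    # rows, mirroring the UNPIVOT performed in TRIFACTA.
    sim_long <- as.data.frame(sim) |>
      mutate(reference_well = rownames(sim)) |>
      pivot_longer(-reference_well,
                   names_to = "similar_well", values_to = "similarity") |>
      filter(reference_well != similar_well) |>
      arrange(reference_well, desc(similarity))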

The final step in the implementation of the Content-Based Recommendation Engine (CBRE) consisted of making the latest blended-transformed dataset easily accessible and usable by the end users (engineers and/or domain experts), serving it in Data Studio as a fully interactive table and an easy-to-digest informative visualization. Figure 10 below depicts the built solution.

Figure 10: Example Content-Based Recommender Engine - CBRE.

Referring again to Figure 10 above: after picking a REFERENCE WELL and typing or selecting its name in the REFERENCE dropdown filter, the engine displays, in the table at the bottom, a list of well-known SIMILAR WELLs and their related data and parameters, sorted in descending order of Similarity Index; their names are also displayed, to the right, in a compelling visualization. The SLIDER control can be used to easily adjust the Similarity Index's upper and lower bounds. If required, the user can apply additional filters and export the filtered data in a convenient format, or save it as a Google Sheet.

The recommendations delivered by the CBRE can be used immediately by the engineer and/or domain expert to make adjustments and customize the estimated parameters of the new well(s), or to prescribe and directly implement, in the new well's design, values taken from SIMILAR WELLs' parameters that have already been tested and deployed successfully; among other important practical applications. The fully interactive report is available here:

SUMMARY

In this post, a functional prototype of a scalable cloud-based End-to-End Data Analytics solution was presented and discussed. It comprised:
  • construction of a data repository in Google Cloud Storage,
  • implementation of complex and advanced data preparation recipes in TRIFACTA Cloud Dataprep, to transform and reshape a publicly available hydraulic fracking dataset,
  • PCA and Weighted Association Rules Mining Analysis carried out by applying algorithms and methods available in the R language framework,
  • and implementation of a Content-Based Recommendation Engine served in Data Studio.
The results of the analyses presented in this post can be used directly by engineers and/or domain experts, for example, to prescribe and directly implement, in new unconventional well(s), values of SIMILAR WELLs' parameters that have been extensively tested and deployed successfully; among other important practical applications.

I hope the presented example of an analytics solution can help answer some key business questions in the fracking realm, and that it will be, in the short term, a springboard to broader and deeper analyses. In future posts, I'll continue unlocking and presenting more real-life use cases. Please stay tuned and don't miss them. And kindly leave your comments below and share. Thank you!
