My Data Journey: From Maxwell's Equations to Prescriptive Analytics
After extensive careers in academic research and in geomodeling for the Oil and Gas industry, I shifted my attention to Artificial Intelligence/Machine Learning (AI/ML), Advanced Visualization, Cloud Computing, Automation, and other cutting-edge technologies. All the while, my passion and goal have remained the same: to get the most value out of data by solving complex, diverse, real-world problems that can have the biggest impact. Now, I would like to share with you a few stories from this data journey. It hasn't been a straight, easy walk, but a tortuous and sometimes painful path. To ease the exposition, I have divided it into four main milestones:
- The Origins
- Traffic Jams Modeling
- Geomodeling
- Data Analytics: Huge Challenges and Limitless Opportunities
Please relax and follow me for a while; I hope you enjoy these stories as much as I have enjoyed writing them.
1. The Origins
In my last year in college and after earning my degree in Physics, armed with perhaps my most powerful weapon, namely SIMILAR PROBLEMS IDENTICAL SOLUTIONS, I was particularly interested in electromagnetic theory and Maxwell's Equations: in how to solve these partial differential equations and obtain closed-form solutions, in order to understand electromagnetic wave propagation and several other problems related to its real-life applications.
Figure 1: “God said... (Maxwell's Equations) and there was light”.
Indeed, as Figure 1 above illustrates, understanding and solving this set of partial differential equations LITERALLY shed light upon nature's secrets; it was absolutely fascinating and I was determined to keep digging into the matter. Certainly, the results of the lengthy and difficult calculations must be checked against experimental measurements (DATA!). However, for me at that point in time, dealing with measurements/DATA was the job of other people.
Unfortunately, the problems in Mechanics, Electromagnetism, Fluid Dynamics, etc., that have closed-form solutions, especially real-world problems that can have the biggest impact, represent only a small fraction of the total. Nature always likes to hide her secrets and so enjoys challenging the human intellect... Always!
2. Traffic Jams Modeling
So, to broaden the scope and increase a little the number of real-life issues I could address, I realized that it was imperative to incorporate additional methods and techniques (and more measurements/DATA!). Here is where the tools of statistics come in handy together with mathematical formalism, as in Quantum Mechanics, Statistical Mechanics, and Quantum Statistical Mechanics. Not to mention that I was dragged into Programming, Data Visualization, and Numerical Analysis, among other interesting topics.
Figure 2: Cellular Automata and Traffic Jam Modeling.
Again, nature never stops raising the stakes and challenging the human intellect; there are a great number of extremely complex problems, like the Modeling of Traffic Jams, in which the best approach is, most of the time, from the perspective of measurements/DATA, stochastic modeling, and computational BRUTE FORCE; pretty far from the mathematical elegance of Hilbert spaces and exact solutions of partial differential equations. However, it's worth a try.
Figure 2 above depicts a Cellular Automata simulation of traffic jams (image taken from A Cellular Automaton Model for Freeway Traffic by K. Nagel and M. Schreckenberg) on a synthetic single-lane road where the “cars” (represented by numbers) move from left to right. Jams appear as groups of zeros `0`. And yes, I couldn't help diving deeper into this fascinating subject, working on a few projects and publishing a handful of papers (see, for example, Inappropriate Use of the Shoulder in Highways - Impact over the Increase of Gas Consumption by A. Aponte, et al). And yes, my secret weapon, SIMILAR PROBLEMS IDENTICAL SOLUTIONS, was quite useful and very effective in achieving these goals.
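For readers who want to experiment, the update rules of the Nagel-Schreckenberg model can be sketched in a few lines of Python. This is only a minimal sketch, not the code behind Figure 2 or the papers mentioned above: the circular road, the function names (`nasch_step`, `render`), and the parameter values (`v_max=5`, `p_slow=0.3`) are assumptions chosen for illustration.

```python
import random


def nasch_step(road, v_max=5, p_slow=0.3, rng=random):
    """Advance a circular single-lane road by one time step.

    `road` is a list whose cells are None (empty) or an int (speed of the
    car occupying that cell). A new list is returned; `road` is not modified.
    """
    n = len(road)
    new_road = [None] * n
    for i, v in enumerate(road):
        if v is None:
            continue
        # Count the empty cells ahead of this car (at most v_max of them).
        gap = 0
        while gap < v_max and road[(i + gap + 1) % n] is None:
            gap += 1
        v = min(v + 1, v_max)              # 1. accelerate
        v = min(v, gap)                    # 2. brake to avoid the car ahead
        if v > 0 and rng.random() < p_slow:
            v -= 1                         # 3. random slowdown
        new_road[(i + v) % n] = v          # 4. move forward v cells
    return new_road


def render(road):
    """Draw the road: each car is shown as its speed, empty cells as dots."""
    return "".join("." if cell is None else str(cell) for cell in road)


if __name__ == "__main__":
    rng = random.Random(42)                # fixed seed, illustrative only
    road = [0 if rng.random() < 0.3 else None for _ in range(60)]
    for _ in range(20):
        print(render(road))
        road = nasch_step(road, rng=rng)
```

Printing one line per time step gives a space-time picture in which stopped cars show up as runs of `0`, which is how the jams can be spotted by eye.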
3. Geomodeling
While enjoying Cellular Automata and traffic jam modeling, I was hired by an Oil and Gas company. Taking advantage of my experience in mathematical and stochastic modeling, I started working on Geomodeling, or Geologic Modeling. Figure 3 below illustrates a typical geomodeling workflow. But what exactly is a Geomodel? A Geomodel is, in its simplest terms, a spatial representation of the rock porosity, permeability, and hydrocarbon saturation in a reservoir.
Figure 3: Geologic Modeling or Geostatistical Reservoir Modeling workflow.
Ultimately, geomodels are consistent 3D representations of a WIDE RANGE of DATA and knowledge relevant to the understanding of hydrocarbon systems.
Geomodeling is commonly used to manage natural resources, identify natural hazards, and quantify geological processes, with main applications to Oil and Gas fields, groundwater aquifers, and ore deposits.
For example, in the Oil and Gas industry, REALISTIC geologic models are required as input to reservoir simulator programs, which PREDICT the behavior of the rocks under various hydrocarbon recovery scenarios. A reservoir CAN ONLY BE DEVELOPED and PRODUCED ONCE; therefore, making a mistake by selecting a site with poor conditions for development is tragic and WASTEFUL.
As mentioned before, the geomodeling process involves a wide range of data and specialized software packages, which in general pose similar difficulties in model construction and offer similar standalone user experiences. So, now I was in a situation where the most complex and time-consuming step wasn't necessarily the modeling process itself but the gathering, cleaning, and conditioning of the (typically incomplete) relevant inputs. The center of gravity had shifted toward data. Certainly, building a REALISTIC geomodel under such circumstances is pretty challenging. Realistic geomodels are a must for reliable PREDICTIONS: making mistakes is wasteful. This was pivotal in my data journey, a point of no return, and I was determined to go forward and tackle this and future DATA CHALLENGES.
How were these and other data challenges addressed? First, with Excel formulas and obscure macros; then with cryptic scripts in C or Matlab (and lately, scripts in R and/or Python). A messy file agony... At best, these were very limited, non-scalable, time-consuming solutions that quickly became cumbersome and impractical once the volume of data, namely the number of wells in the project, went beyond a few tens. Time to get out of the box and explore other new data-knowledge-domains!
4. Data Analytics: Huge Challenges and Limitless Opportunities
Exploring new data knowledge domains really sounded very exciting. Great! But where to start? I had no idea that to answer this question I would have to embark on the most extraordinary quest I had ever dreamed of. Indeed, Data Analytics is a very comprehensive topic that comprises a huge number of other, equally comprehensive topics, like Data Visualization, Programming, Statistics, Machine Learning, Artificial Intelligence, Probability Theory, etc.; roughly grouped into Descriptive Analytics (about the past), Predictive Analytics (about the future), and Prescriptive Analytics (advice based on predictions); each of them a vast knowledge domain by itself. And the whole subject is always evolving and fast-changing (amid the COVID-19 pandemic, it is evolving and changing even faster). It was (and continues to be) absolutely overwhelming!
So, I was required to commit all my experience and my secret weapon's full power (SIMILAR PROBLEMS IDENTICAL SOLUTIONS!) to address the challenges ahead. After a while (several months, indeed) of reading many articles and papers, watching a lot of videos, and attending several webinars, online courses, etc., I realized that the best strategy was to keep it as simple as possible and focus only on a HANDFUL of methods and techniques. That way, I was able to build a basic but solid background to unlock a few relevant use cases in a reasonable amount of time. And, if required, I go back, read a little more, watch more videos/webinars, and move forward again; back-and-forth, back-and-forth... In summary: move on, and iterate a few times if necessary. And IT WORKED!
The next task was to pin down a few relevant and interesting real-life use cases and gather the necessary data to carry them out. However, I realized that to effectively apply any Machine Learning/Artificial Intelligence method or technique and obtain practical and usable results, it was imperative to first clean, transform, reshape, and refine the raw data, so that it is truly ready for analysis.
And forget Excel macros and cryptic scripts! The time had come to GO to the NEXT LEVEL and start using a powerful cloud-based, scalable, interactive data tool like Trifacta; it is the best choice to tackle any data challenge and get the job done in a short time and at scale. To learn more about Trifacta, please take a look at the great ebook by Ulrika Jägare, From messy file agony to automated analytics glory, for a detailed introduction. Things were set and it was time to unlock some real-world use cases. Let's get this done!
After a careful search, I identified some interesting use cases in the Oil and Gas (O&G), Healthcare, Public Security, and Fitness industries. As a preview of future discussions, I'll end this post with an example of the application of Market Basket Analysis and Association Rules Mining techniques to address well productivity in mature oil fields. In future posts, I'll present and discuss other interesting real-life examples. Please stay tuned, and don't miss them!
Market Basket Analysis is one of the key techniques used by RETAILERS to uncover associations (or rules) between items. It can be carried out using the Apriori algorithm; using Apriori together with the HITS algorithm to perform Weighted Association Rules Mining; or using a Machine Learning tool such as Neural Designer. It works by looking for combinations of items that occur together frequently in transactions. Simply put, it allows retailers to identify relationships or rules between the items that people (customers) buy.
During the life of a mature oil/gas field, it is necessary at specific times to apply particular actions/interventions, or WELL-EVENTS, to some of its wells, in order to maintain or increase each well's production and the overall productivity of the field. Now, if one imagines the RETAIL STORE as the mature field, the CUSTOMER as the well, and the well-events as the ITEMS the customer has BOUGHT, it is not hard to conclude that the same techniques and algorithms used by retailers can be applied to uncover relationships or rules between the well-events. This analogy process (remember my secret weapon?) was also applied to address other interesting use cases that will be discussed in future posts.
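To make "combinations of items that occur together frequently" concrete, here is a minimal, pure-Python sketch of the level-wise search that the Apriori algorithm is based on. It is an illustration under stated assumptions, not the tooling used for the results in this post: the function name `frequent_itemsets`, the toy grocery baskets, and the `min_support` threshold are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for b in baskets for item in b}
    result = {}
    k = 1
    while candidates:
        # Keep only the candidates that are frequent enough.
        level = {}
        for c in candidates:
            support = sum(1 for b in baskets if c <= b) / n
            if support >= min_support:
                level[c] = support
        result.update(level)
        # Build (k+1)-item candidates by joining frequent k-itemsets, and
        # prune any candidate that has an infrequent k-item subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return result


if __name__ == "__main__":
    # Hypothetical toy baskets, for illustration only.
    toy = [
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    ]
    for itemset, supp in sorted(frequent_itemsets(toy, 0.5).items(),
                                key=lambda kv: -kv[1]):
        print(sorted(itemset), round(supp, 2))
```

The pruning step is what keeps the search tractable: an itemset can only be frequent if all of its smaller subsets are frequent too, so most combinations never need to be counted.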
Indeed, once the associations of interventions/well-events or rules have been uncovered, it's possible to go a step further and, after a similarity analysis, prescribe some of them, for example, to newer wells with no interventions/well-events, located in the same field; or even deliver recommendations to wells located in other fields.
At this point, the effort invested in a more in-depth understanding of data preparation techniques paid off: the key step, the secret ingredient, in the successful implementation of the workflow described above (and others to be presented in future blog posts) was indeed to clean, reshape, and blend the available raw well-events data into an input file with a format suited to Market Basket Analysis algorithms. This made it possible to quickly extract additional actionable insight from the well-events data; new valuable knowledge that can be used directly and/or integrated smoothly into other traditional O&G workflows.
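The reshaping step can be sketched as follows. This is not the Trifacta recipe used in the project, only a minimal Python illustration of the idea: one row per (well, event) record is cleaned and grouped into one "basket" per well. The well identifiers (`W-001`, ...), the record layout, and the function names are hypothetical; the event names are the ones that appear in this post.

```python
from collections import defaultdict


def events_to_baskets(records):
    """Group (well_id, event) records into one set of events per well.

    Rows with a missing well id or event are dropped, surrounding
    whitespace is stripped, and repeated events collapse into one.
    """
    baskets = defaultdict(set)
    for well_id, event in records:
        well_id = (well_id or "").strip()
        event = (event or "").strip()
        if not well_id or not event:
            continue  # incomplete row: skip it
        baskets[well_id].add(event)
    return dict(baskets)


def to_basket_lines(baskets):
    """Render baskets as comma-separated lines, one line per well."""
    return [",".join(sorted(events)) for _, events in sorted(baskets.items())]


if __name__ == "__main__":
    # Hypothetical raw records, for illustration only.
    raw = [
        ("W-001", "Punzar_Ensayo"),
        (" W-001 ", "Punzar_Ensayo_Fracturar"),
        ("W-001", "Punzar_Ensayo"),      # duplicate record
        ("W-002", "Ensayo_Estimular"),
        ("W-003", ""),                   # missing event
    ]
    for line in to_basket_lines(events_to_baskets(raw)):
        print(line)
```

Each output line plays the role of one customer's shopping basket, which is the transaction-style layout that association rule mining tools typically expect.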
Figure 4: Weighted Association Rules Mining application to well oil productivity.
The table in Figure 4 above depicts the results of the analysis, ranked by the metric Lift. Lift summarizes the strength of association between the well-events on the left-hand side (lhs: Precedence) and the right-hand side (rhs: Consequence) of the rule: the larger the Lift, the stronger the link between the combinations of interventions appearing on both sides of the " ==> " symbol. A Lift above 1 means the two sides occur together more often than would be expected if they were independent.
A practical interpretation of these results is as follows. Referring, for example, to the rule labeled [2] in the table: if a well has already received (to increase its productivity) the combination of events {"Punzar_Ensayo, Punzar_Ensayo_Fracturar"}, then the next recommended intervention would be {"Ensayo_Estimular"}; and so on. This is one example of how Data Analytics methods and techniques can be adapted and applied to other relevant problems and add significant value. Now the production engineer has at her/his disposal additional tools to analyze well-events data, and extra knowledge to support the decision-making process and, for example, optimize budgets.
To explicitly take into account the fractional production increment related to each intervention or well-event and include it in the calculations (Weighted Association Rules Mining), it is necessary to first quantify the weights corresponding to each "transaction"/intervention, or to estimate these weights using, for example, the HITS algorithm (already mentioned above). But this is another story, a story to be told in a future post...
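For reference, Lift is built from two simpler quantities: the support of an itemset (the fraction of baskets containing it) and the confidence of a rule (how often the right-hand side appears among baskets that contain the left-hand side). Lift is the confidence divided by the support of the right-hand side. The sketch below computes all three; the four baskets are hypothetical and are not the data behind Figure 4, and the function names are my own.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for b in baskets if itemset <= frozenset(b)) / len(baskets)


def rule_metrics(lhs, rhs, baskets):
    """Support, confidence, and lift of the rule lhs ==> rhs."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    supp_lhs = support(lhs, baskets)
    supp_rhs = support(rhs, baskets)
    if supp_lhs == 0 or supp_rhs == 0:
        raise ValueError("both sides of the rule must occur in the data")
    supp_both = support(lhs | rhs, baskets)
    confidence = supp_both / supp_lhs
    return {
        "support": supp_both,
        "confidence": confidence,
        "lift": confidence / supp_rhs,
    }


if __name__ == "__main__":
    # Hypothetical well baskets, for illustration only.
    wells = [
        {"Punzar_Ensayo", "Punzar_Ensayo_Fracturar", "Ensayo_Estimular"},
        {"Punzar_Ensayo", "Punzar_Ensayo_Fracturar", "Ensayo_Estimular"},
        {"Punzar_Ensayo"},
        {"Punzar_Ensayo_Fracturar"},
    ]
    print(rule_metrics({"Punzar_Ensayo", "Punzar_Ensayo_Fracturar"},
                       {"Ensayo_Estimular"}, wells))
```

In this toy data the right-hand side appears in half of all wells but in every well that received both left-hand events, so the rule comes out with a lift of 2.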
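As a small taste of that story, one common way to bring weights into the calculation is to replace the plain count of transactions with a sum of transaction weights. The sketch below assumes that definition of weighted support and assumes the weights are already known; it does not implement HITS, and it is not necessarily the exact formulation behind Figure 4. The weights and baskets shown are hypothetical.

```python
def weighted_support(itemset, baskets, weights):
    """Weighted support: weight of baskets containing `itemset`,
    divided by the total weight of all baskets.

    With all weights equal this reduces to the ordinary support.
    """
    if len(baskets) != len(weights):
        raise ValueError("one weight per basket is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    itemset = frozenset(itemset)
    hit = sum(w for b, w in zip(baskets, weights) if itemset <= frozenset(b))
    return hit / total


if __name__ == "__main__":
    # Hypothetical baskets and weights (e.g. fractional production
    # increments per well), for illustration only.
    wells = [
        {"Punzar_Ensayo", "Ensayo_Estimular"},
        {"Punzar_Ensayo", "Ensayo_Estimular"},
        {"Punzar_Ensayo"},
        {"Punzar_Ensayo_Fracturar"},
    ]
    weights = [3.0, 1.0, 1.0, 1.0]
    print(weighted_support({"Punzar_Ensayo", "Ensayo_Estimular"},
                           wells, weights))
```

Here the first well carries three times the weight of the others, so the combination it contains counts for more than it would under a simple tally.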
Wrapping up, this was a fast-track introduction to my data journey: from its origins, a time when I was fascinated by and focused only on mathematical formalisms and problems with elegant closed-form solutions; through the challenging and fast-growing learning experience of addressing the very complex issue of simulating traffic jams; until now, the ongoing, tireless pursuit of an in-depth understanding of Data Analytics methods and techniques, to get the most value out of data and solve real-world problems that can have the biggest impact. I really hope you've enjoyed reading these paragraphs as much as I enjoyed writing each one of them.
In future blog posts, I'll present and discuss more interesting and relevant real-life use cases. Please stay tuned and don't miss them. And kindly leave your comments below and share. Thank you!
