Lessons for Geoscientists from the book Real World AI: A Practical Guide for Responsible Machine Learning

In this blog article, Enthought Energy Solutions Vice President Mason Dykstra looks at the recently published book “Real World AI: A Practical Guide for Responsible Machine Learning” in the context of the technical challenges geoscientists face and of how to scale AI across an organization.

Author: Mason Dykstra, Ph.D., Vice President, Energy Solutions 

In the newly released book “Real World AI: A Practical Guide for Responsible Machine Learning,” Alyssa Simpson Rochwerger and Wilson Pang share a number of examples of how organizations have succeeded – and failed – at integrating machine learning initiatives. Among their observations: “Only 20 percent of AI in pilot stages at major companies make it to production, and many fail to serve their customers as well as they could. In some cases, it’s because they’re trying to solve the wrong problem. In others, it’s because they fail to account for all the variables—or latent biases—that are crucial to a model’s success or failure.”

Artificial intelligence is pushing the boundaries of what is possible: live facial recognition, autocorrecting language editors, fraud detection, customer service chatbots and more. Advances in autonomous technologies are changing the way people and machines operate. And people are increasingly reliant on the power of AI technology to improve decision-making, safety, communication and productivity. In fact, today’s AI has advanced so much that most people are impacted by some form of AI every day without realizing it.

But using AI to solve complex, real-world business problems comes with a unique set of challenges that are absent in academic and scientific research settings. Overcoming them requires a mix of resources, skills, collaboration and knowledge, with data science only one ingredient. Used appropriately, applied machine learning can help companies accelerate and transform their businesses in ways unimagined a few years ago.

With today’s combination of open source big data, faster processing speeds and transformative advances in cloud computing, businesses have the potential to implement and scale their AI faster than ever. However, deploying AI initiatives is a huge undertaking and a treacherous process. According to industry analysts, more than 80% of AI projects never make it past the pilot stage. Why? 

The 4 Biggest Business Challenges of Deploying AI and the Geoscientist

“Real World AI” highlights four key challenges organizations face when leveraging AI. This categorization offers useful guidance to geoscientists working to deliver value to their organizations through AI/machine learning technologies and workflows.

1. Defining the problem

The first step to overcoming any challenge is identifying the problem. AI is no exception. Once you have a defined objective, you can implement a specialized approach that takes existing data, operational constraints and risk into consideration.

In “Real World AI,” defining the problem also includes determining how well you want to solve the problem. In health care, a cancer detection neural network requires human-machine interaction and a high degree of certainty to reduce risk and improve final outcomes, while many other life sciences applications carry lower health risks.

In the energy industry, the geoscientist must use practical knowledge and experience to define both the problem and the level of accuracy the solution requires. Shallow drilling hazards and pore pressure predictions are critical to operations and safety, and field development plans and reserves calculations drive major financial decisions, so results for problems such as these need to be accurate. By contrast, less rigor may be acceptable for the many problems that do not carry a high safety or financial risk profile.

For geoscientists working to introduce AI into their workflows, it is best to start small, understand the risks and how critical accuracy is, and have a clear line of sight to business value for the selected problem. It also helps to develop a plan early on for scaling the new workflow across the organization, something often missing from AI pilot projects.

2. Gathering training data

One of the biggest challenges associated with machine learning is gathering and organizing the right data to train models. In the real world, high-quality, accurate data is incredibly important – and incredibly difficult to collect. 

“When creating AI in the real world, the data used to train the model is far more important than the model itself,” Rochwerger and Pang write. “This is a reversal of the typical paradigm represented by academia, where data science PhDs spend most of their focus and effort on creating new models. But the data used to train models in academia are only meant to prove the functionality of the model, not solve real problems. Out in the real world, high-quality and accurate data that can be used to train a working model is incredibly tricky to collect.”

In upstream oil and gas, accessing and integrating accurate, up-to-date data is one of the industry’s biggest challenges, particularly for geoscientists. Data on reservoirs is added continuously at highly varying scales in space and time, from initial seismic and other remotely sensed data, to exploration log and core data, to production data. AI workflows and data infrastructure must be built with these physical and time scales in mind. The OSDU Data Platform is a significant step in addressing this challenge. 
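To make the scale problem concrete, here is a minimal sketch, assuming a well log sampled every 0.5 ft that must be averaged onto a coarser 10 ft grid before it can be paired with lower-resolution data. The sample intervals, the synthetic gamma-ray values and the helper name `block_average` are hypothetical illustrations, not part of the OSDU Data Platform or any Enthought product.

```python
import numpy as np


def block_average(depth, values, bin_size):
    """Average finely sampled log values into fixed-size depth bins.

    Hypothetical helper: returns the bin centres and the mean value per bin.
    Bins that receive no samples are reported as NaN.
    """
    depth = np.asarray(depth, dtype=float)
    values = np.asarray(values, dtype=float)
    top = depth.min()
    # Index of the coarse bin each fine sample falls into.
    idx = np.floor((depth - top) / bin_size).astype(int)
    n_bins = idx.max() + 1
    # Sum and count the samples in each bin, then divide to get the mean.
    sums = np.bincount(idx, weights=values, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    means = np.full(n_bins, np.nan)
    filled = counts > 0
    means[filled] = sums[filled] / counts[filled]
    centres = top + (np.arange(n_bins) + 0.5) * bin_size
    return centres, means


# Hypothetical example: a synthetic gamma-ray log every 0.5 ft over 100 ft.
depth = np.arange(1000.0, 1100.0, 0.5)
gamma_ray = 50.0 + 0.1 * (depth - 1000.0)
centres, gr_10ft = block_average(depth, gamma_ray, bin_size=10.0)
print(centres[:3], gr_10ft[:3])
```

Simple block averaging is only one way to reconcile scales; it ignores how a given measurement physically responds across depth, which a production workflow would need to account for.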

In the future, it is reasonable to expect the energy industry to develop analogues to ImageNet, large labeled training data sets, for any number of geoscience applications. As more open source data becomes available, companies with exclusive access to massive data sets will find that access less and less of a competitive advantage.

Given the rapid advances in AI, if exclusive data presents a business opportunity, it is best to act on it quickly before the advantage disappears.

3. Maintaining machine learning models

To provide maximum value and maintain confidence in results, AI requires continuous human-machine collaboration. Regardless of the algorithm, a machine learning model is only as accurate as its labeled data and the human input behind it. If the data is of questionable accuracy or obsolete, the resulting model will be of limited use.

Rochwerger and Pang continue, “Don’t forget to allocate resources for the ongoing training of your model. Models have to be trained continually, or they’ll become less accurate over time as the real world changes around them.”
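One common way to act on this advice is to track a model's accuracy on newly labeled data and flag it for retraining when performance slips. The sketch below illustrates that idea under stated assumptions: a rolling window of recent expert-checked predictions and a fixed accuracy floor. The class name `RetrainMonitor`, its parameters and the "fault"/"no_fault" labels are hypothetical choices, not something the book prescribes.

```python
from collections import deque


class RetrainMonitor:
    """Hypothetical monitor that flags a model for retraining when recent accuracy drops."""

    def __init__(self, window: int = 100, min_accuracy: float = 0.85):
        self.min_accuracy = min_accuracy
        # Keep only the most recent outcomes (True = prediction matched the label).
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, label) -> None:
        """Store whether a prediction matched the expert-provided label."""
        self.outcomes.append(prediction == label)

    def accuracy(self) -> float:
        """Fraction of correct predictions in the current window."""
        if not self.outcomes:
            return float("nan")
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        # Wait for a full window so a few early errors do not trigger retraining.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.min_accuracy


monitor = RetrainMonitor(window=10, min_accuracy=0.8)
# Hypothetical stream: the first 10 predictions agree with the interpreter...
for _ in range(10):
    monitor.record("fault", "fault")
print(monitor.needs_retraining())  # False
# ...then conditions change and the next 3 predictions are wrong.
for _ in range(3):
    monitor.record("fault", "no_fault")
print(monitor.accuracy(), monitor.needs_retraining())  # 0.7 True
```

A fixed window and threshold are about the simplest possible trigger; teams may instead monitor drift in the input data itself or retrain on a fixed schedule.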

Machine learning models are increasingly becoming commodities. More and more data is becoming available, and analogues to ImageNet are beginning to emerge for various industries and domains. For subsurface data, ImageNet analogues are on the horizon, albeit initially only for domain-specific workflows such as seismic or well log interpretation.

More sophisticated models that take into account all available subsurface data are the next challenge for research and pilot projects; if those succeed, the challenge becomes scaling them across organizations. Major operators with broad and deep data sets may hold a competitive advantage here. How long such an advantage will persist is another question.

A critical element for maintaining any machine learning model will be a highly intuitive user interface, specifically for integrating and labeling data for new generations of workflows. 

A first step in this direction is our machine learning application, SubsurfaceAI Seismic, in which geoscientists train the AI to perform the interpretation the way they want it done. In this cloud-enabled application, domain experts provide near real-time QC feedback on predictions made by the machine learning models. No data science or IT knowledge is required of the domain expert, another feature we expect future applications to share.

4. Gathering the right team

The fourth main challenge “Real World AI” sets out is gathering the right team. Successfully solving real-world problems with AI, including scaling across the organization, requires deep, cross-functional collaboration. This collaboration extends from domain experts who ultimately discover the value, to parts of the organization often not considered by those closest to the challenge.  

“A business problem that can be solved by a model alone is very unusual. Most problems are multifaceted and require an assortment of skills—data pipelines, infrastructure, UX, business risk analysis,” Rochwerger and Pang observe. “Even with a wonderful business strategy, a well-articulated, specific problem, and a great team, it’ll be impossible to achieve success without access to the data, tools, and infrastructure necessary to ingest each dataset, save it, move it to the right place, and manipulate it.” 

Geoscientists today are transitioning from machine learning pilot projects to scaling those projects across the organization. Domain experts often spend significant time working with data scientists and IT departments rather than on their domain challenges. Deploying external software with a strong AI component can be more challenging than deploying traditional software because of platform integration issues, particularly if plans include migrating to the cloud. The OSDU Data Platform is important to consider when planning to scale new-generation workflows.

In summary, this book is a worthwhile read for anyone significantly involved in scaling advanced scientific software and new workflows across an organization. One lesson to add from Enthought's experience across multiple industries: start with a small project, have a clear line of sight to business value and develop plans to scale to the organization as the project progresses. Look for pilot locations within the business where there is clear value and where domain experts are committed to achieving success.

We’d welcome a conversation about our experience. 


The Challenges of Applied Machine Learning on TechTalks 

Real World AI: A Practical Guide for Responsible Machine Learning on Amazon 

VentureBeat article

The OSDU Data Platform – open source, standards-based, technology-agnostic data platform 

The Enthought SubsurfaceAI Seismic custom deep learning application 

About the Author

Mason Dykstra is Enthought’s Vice President of Energy Solutions. As an intuitive thought leader with previous experience in academia, Statoil and Anadarko, he helps oil and gas companies connect the dots between science, engineering, technology and business needs. Mason leads the Enthought team of experts in tackling problems that contribute to the bottom lines of its customers. Connect with Mason on LinkedIn at linkedin.com/in/mason-dykstra-a304b25/ to join his online conversations.
