Problem Solving with Data

For those just getting started with advanced scientific computing techniques, here are four steps to efficiently turn data into decisions with business value.

Author: Ryan Swindeman, Scientific Software Developer

In the four-minute video below, Enthought scientist Ryan Swindeman puts data into context as foundational to any digital transformation initiative and sets out four fundamental steps for applying data to science problems.

1. Data Preparation (or Data Conditioning): This is the essential first step in a digital project. Data must be clean and accessible, and access to it must be quick and reliable. The data must be cataloged or categorized so that it is reached and integrated into projects consistently. Data preparation must serve a business need or objective and solve a specific problem, rather than being a case of ‘we need to organize our data’.

2. Data Visualization: Visualizing data is an important starting point for understanding a problem. This involves looking at the data in its native domain, identifying trends, and from there possibly transforming it to a different domain, cross-plotting to look for relationships, or running statistics as a way to discover features. Visualization is also a reliable way to increase efficiency in problem-solving. The understanding gained through visualization is essential for deep learning: if you do not understand the underlying trends or relationships in the data, you will not understand the outcomes produced by any AI/ML/Deep Learning.

3. Modeling and Optimization: This step uses the underlying dynamics or physics of the problem, and the applications are endless. (In geophysics, this is often called forward modeling and inversion.) Most critically, modeling and optimization allow scientists to prove (or disprove) hypotheses very quickly, enabling teams to test, iterate and change strategy, which often means problems are solved faster.

4. AI/ML/Deep Learning: These advanced computing techniques are related but differ in important ways. Unlike modeling and optimization, or inversion (which is a physics-based approach), AI/ML/Deep Learning is a data-driven approach. These techniques are beneficial if forward modeling and optimization are not possible because the underlying physics is not well understood, or if the physics leads to too many approximations. The problem-solving and analytical power of AI/ML/Deep Learning becomes obvious in pattern recognition or texture analysis.
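
To make the physics-based approach of step 3 concrete, here is a minimal sketch of forward modeling and inversion. It is an illustration under stated assumptions and not the method shown in the video: the physics is a simple free-fall law, the "measurements" are synthetic values generated inside the script, and the function names (`forward_model`, `misfit`, `invert_for_g`) are hypothetical. The inversion uses a golden-section search, chosen here only because the misfit has a single minimum.

```python
import random


def forward_model(g, times):
    """Forward model: predicted fall distance (m) at each time (s) for acceleration g."""
    return [0.5 * g * t ** 2 for t in times]


def misfit(g, times, observed):
    """Sum of squared differences between predicted and observed distances."""
    predicted = forward_model(g, times)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def invert_for_g(times, observed, lo=0.0, hi=20.0, tol=1e-6):
    """Inversion: golden-section search for the g in [lo, hi] that minimizes the misfit."""
    ratio = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if misfit(c, times, observed) < misfit(d, times, observed):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2


# Synthetic observations (assumption): generated from g = 9.81 plus Gaussian noise.
rng = random.Random(0)
times = [0.1 * i for i in range(1, 21)]
observed = [d + rng.gauss(0, 0.05) for d in forward_model(9.81, times)]

g_est = invert_for_g(times, observed)
print(f"estimated g = {g_est:.2f} m/s^2")
```

Because the forward model encodes the physics, a single parameter is enough to explain the data, and a poor fit would count as evidence against the hypothesis. A data-driven method, as in step 4, would instead learn the relationship between time and distance directly from examples.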

These four steps provide a robust sequence for solving problems with data, whether the data set is small or large, and are fundamental to digital transformation projects.
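
The first two steps can be sketched in a few lines as well. The example below is a rough illustration only: the records are made-up values (hypothetical depth and porosity readings, with `None` marking a missing measurement), and the helper names `condition` and `pearson` are assumptions. It drops incomplete records, then computes summary statistics and a correlation coefficient, which is the numerical counterpart of inspecting a cross-plot.

```python
import math
import statistics

# Hypothetical raw records (made-up values): (depth_m, porosity_percent).
raw_records = [
    (1000, 24.1), (1200, 22.8), (1400, None), (1600, 20.2),
    (1800, 19.5), (None, 18.0), (2000, 17.9), (2200, 16.4),
]


def condition(records):
    """Data preparation: keep only complete records and convert values to floats."""
    return [(float(x), float(y)) for x, y in records if x is not None and y is not None]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


clean = condition(raw_records)
depths = [d for d, _ in clean]
porosities = [p for _, p in clean]

print(f"records kept: {len(clean)} of {len(raw_records)}")
print(f"mean porosity: {statistics.fmean(porosities):.2f}")
print(f"porosity standard deviation: {statistics.stdev(porosities):.2f}")
print(f"depth/porosity correlation: {pearson(depths, porosities):.3f}")
```

A strong correlation in a check like this is a prompt to look at the cross-plot itself and to ask what mechanism could explain the trend before moving on to modeling or machine learning.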

 

About the Author

Ryan Swindeman, Scientific Software Developer, holds an M.S. in geophysics from the University of Texas at Austin and a B.S. in physics from the University of Illinois at Urbana-Champaign, with graduate research in computational seismology.
