Common Data Science Challenges

In a previous newsletter, I pointed out some of the key differences that separate data science from traditional project management. While traditional project management focuses on goals, planning, and tangible deliverables, data science is a more open-ended operation focused on discovery and innovation. Its deliverables are less tangible but no less valuable.

Data Science Challenges

To arrive at a deeper understanding of the differences between traditional project management and data science, consider the unique challenges of a data science project:

  • Unlike traditional projects, data science "projects" have a much broader scope and are much less constrained by cost and schedule requirements. As a result, data science teams are more susceptible to wandering: losing focus and spending too much time trying to answer irrelevant or unimportant questions. A narrow scope, a limited budget, and strict deadlines just aren't compatible with the scientific method that data science teams should follow, but these teams still need to produce something of value to the organization.

  • While traditional projects can benefit from having a narrow and well-defined scope, data science teams often must resist forces in the organization that attempt to "box them in." The process must be empirical and exploratory. A data science team functioning as it should thinks outside the box. If a team is forced to engage in setting goals and achieving milestones, it is likely to look for what it already knows. A team is unlikely to discover anything new when it is forced to explore within the confines of a well-defined box.

  • Data science teams must also break away from traditional organizational structure and language. The language of most organizations still hinges on terms such as "mission," "objectives," and "outcomes." Meetings still usually revolve around setting goals and objectives, planning, and progress reports. Many organizations find it difficult to imagine a team devoted solely to exploration and discovery. As a result, data science teams often struggle to swim upstream against a very strong current.

Comparing a Traditional and a Data Science "Project"

Let’s look at a traditional project and compare it to what a data science team does. Then, we'll look at what often happens when traditional project management is applied to a data science team.

Consider a typical software project. Your organization wants to develop a human resources (HR) self-help portal for its employees. The project charter is to create the portal as a way to lower costs and improve overall employee satisfaction. The project will have a set cost, but the organization will save money by reducing HR costs and employee turnover. The estimated return on investment (ROI) for this project is substantial. The plan lays out all the features in a requirements document and includes a development schedule and detailed budget. The project manager will oversee development and update the plan to account for any changes in schedule, budget, or product requirements.

In contrast, consider how a data science team operates. The team is small, four to five people, including a research lead, a couple of data analysts, and a project manager. Their "mission" is to help the organization come to a better understanding of its customers’ needs and behaviors in the hope that this deeper understanding reveals opportunities to generate more revenue.

The research lead starts by asking questions such as these:

  • What do we know about our customer?

  • What do we assume about our customer?

  • Why does our customer shop with us instead of our competitors?

  • What might make our customers shop with us even more?

The data analysts analyze the data to come up with answers to these questions. They deliver the answers in the form of data visualizations, which are graphic summaries of the data. For example, the data visualizations may be graphs that shed light on customer income and spending, as shown here. The x-axis (horizontal) represents income, and the y-axis (vertical) represents spending. Note that customers with higher incomes don’t necessarily spend more. Those who have an income around $20k–$30k seem to spend the most.

The analysts could also look at data from social media platforms and create a word cloud of feedback from thousands of customers, as shown below. For example, some of the largest words in the word cloud are “travel,” “recipe,” and “restaurant.”
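
A word cloud is driven by simple word counts: the more often a word appears, the larger it is drawn. The sketch below shows one way those counts might be produced. The comments, the stop-word list, and the function name word_frequencies are all made up for illustration and are not the team's actual data or tooling.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "to", "for", "of", "i", "my",
              "we", "our", "is", "in", "on", "this", "that", "it"}

def word_frequencies(comments, top_n=3):
    """Count how often each non-stop word appears across all comments."""
    counts = Counter()
    for comment in comments:
        # Lowercase the text and keep only runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", comment.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Made-up feedback, for illustration only.
sample_comments = [
    "Great travel deals and a recipe for every trip",
    "Found this restaurant through your travel guide",
    "The recipe section is my favorite",
    "Travel tips were helpful",
]

top_words = word_frequencies(sample_comments)
print(top_words)
```

In this made-up sample, "travel" is the most frequent word, so it would be drawn largest in the cloud.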

Based on the knowledge and insight gleaned from these data visualizations, the team is likely to ask more questions, such as "Why do customers in a certain income bracket spend more than customers in higher or lower income brackets?" and "Why do our customers like to travel?" and "When our customers travel, where are they most likely to go?"

As you can imagine, knowing more about customers can lead to higher sales. The team could then share its discoveries with others in the organization. Marketing may decide to advertise more in travel magazines. Product development may shift its focus to products that are more closely related to travel. Sales might focus more of its efforts on customers in a specific income bracket.

Then again, the team may hit a dead end. A data visualization created to analyze spending patterns among customers who travel and those who don't is inconclusive, as shown below. It reveals only that customers who travel outspend those who don't by a relatively small margin, that customers who do travel visit a variety of destinations around the world, and that the total spend by customers who travel to those destinations is no greater than the total spend by customers who don't travel. The data visualization doesn't provide sufficient evidence to support a change in what the company is doing, so the team abandons this line of inquiry and shifts direction.

Applying Traditional Project Management to Data Science

Imagine trying to shoehorn data science into a traditional project management framework. How would you define the scope of the project when your exploration can lead you in so many different directions? How can you meet predetermined milestones when you're building an ever-increasing body of knowledge and insight about your customers? How can you possibly meet a deadline when you don't know, specifically, what you're looking for? How do you budget for time when you have no idea how long it will take to find the answers?

Data science is all about learning, and "learning" is a verb. Specifically, it is a verb in the form of a present participle, which conveys continuous action. Data science is engaged in ongoing discovery and innovation. It doesn't conform to the traditional project management framework. Don't try to force it to.

Frequently Asked Questions

What are some common data science challenges faced by a data scientist?

Common data science challenges include managing large amounts of data, ensuring data quality, integrating data from multiple data sources, and maintaining data privacy and security. Additionally, data scientists often face challenges in clearly defining a business problem and communicating findings effectively.

How can data scientists address the issue of data quality in their projects?

Data scientists can improve data quality by implementing robust data validation processes, cleaning raw data before analysis, and setting up thorough data collection protocols. Regularly auditing data sets to identify and correct inaccuracies also helps maintain high standards of data quality.
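
As a rough illustration of the validation and auditing steps described above, the sketch below checks each record against a few rules and separates clean rows from rejected ones. The field names (customer_id, income, spend), the rules, and the sample records are assumptions made up for this example, not a standard.

```python
def validate_record(record):
    """Return a list of problems found in one customer record (empty if clean)."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    income = record.get("income")
    if not isinstance(income, (int, float)) or income < 0:
        problems.append("income must be a non-negative number")
    spend = record.get("spend")
    if not isinstance(spend, (int, float)) or spend < 0:
        problems.append("spend must be a non-negative number")
    return problems

def audit(records):
    """Split records into clean rows and rejected rows with their reasons."""
    clean, rejected = [], []
    seen_ids = set()
    for record in records:
        problems = validate_record(record)
        # A customer_id already accepted earlier counts as a duplicate.
        if record.get("customer_id") in seen_ids:
            problems.append("duplicate customer_id")
        if problems:
            rejected.append((record, problems))
        else:
            seen_ids.add(record["customer_id"])
            clean.append(record)
    return clean, rejected

# Made-up records, for illustration only.
records = [
    {"customer_id": "c1", "income": 25000, "spend": 1200},
    {"customer_id": "c2", "income": -5, "spend": 300},
    {"customer_id": "c1", "income": 25000, "spend": 1200},
    {"customer_id": "", "income": 40000, "spend": None},
]

clean, rejected = audit(records)
for record, problems in rejected:
    print(record, "->", problems)
```

Keeping the rejected rows together with their reasons, rather than silently dropping them, gives the team something concrete to review during a regular audit.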

What is the importance of data privacy and security in data science?

Data privacy and security are critical in data science because they protect sensitive data from unauthorized access and misuse. Ensuring data privacy and security helps the organization build trust with users and stakeholders and comply with legal and regulatory requirements, safeguarding it from potential breaches and legal consequences.

How do data scientists handle multiple data sources in a data science project?

Data scientists handle multiple data sources by integrating data into a single, cohesive data set for analysis. This involves using tools and techniques for data merging, transformation, and consistent formatting. Ensuring that data from different sources aligns properly is crucial for accurate analysis and insights.
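
A minimal sketch of that kind of integration, assuming two hypothetical sources: a CRM export and an order log that spell customer IDs differently. The field names (customer_id, cust, amount, income) and the normalization rule are invented for illustration. In practice a library such as pandas would often do this work, but the idea is the same.

```python
def normalize_id(raw_id):
    """Bring IDs from different systems into one format (trimmed, upper case)."""
    return str(raw_id).strip().upper()

def merge_sources(crm_rows, order_rows):
    """Combine CRM profiles with order totals, keyed by normalized customer ID."""
    merged = {}
    for row in crm_rows:
        key = normalize_id(row["customer_id"])
        merged[key] = {"customer_id": key, "income": row["income"], "total_spend": 0.0}
    unmatched = []
    for row in order_rows:
        key = normalize_id(row["cust"])
        if key in merged:
            # Amounts may arrive as strings or numbers, so convert before adding.
            merged[key]["total_spend"] += float(row["amount"])
        else:
            # Keep orders with no matching profile for later review.
            unmatched.append(row)
    return list(merged.values()), unmatched

# Made-up rows, for illustration only.
crm = [
    {"customer_id": "c1 ", "income": 25000},
    {"customer_id": "C2", "income": 60000},
]
orders = [
    {"cust": "C1", "amount": "20.5"},
    {"cust": "c1", "amount": 5},
    {"cust": "c9", "amount": 10},
]

combined, unmatched = merge_sources(crm, orders)
print(combined)
print(unmatched)
```

Rows that fail to match are worth tracking, because they usually point to formatting differences between sources or to gaps in one of them.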

What role does a data strategy play in overcoming data science challenges?

A well-defined data strategy helps data science teams align their efforts with business objectives, prioritize projects, and allocate resources efficiently. It addresses challenges by establishing clear guidelines for data collection, storage, analysis, and governance, ultimately improving the quality and impact of data science solutions.

How can data science professionals ensure effective communication of their findings?

Data science professionals can ensure effective communication by using clear visual representations of data, such as charts and graphs, tailoring their messaging to the intended audience, and focusing on actionable insights. Simplifying complex technical details without losing the essence of the findings helps in better understanding and decision-making.

What are some methods to manage and analyze a lot of data efficiently?

To manage and analyze a lot of data efficiently, data scientists can use scalable data storage solutions, such as cloud storage and big data technologies. Employing advanced analytics tools and techniques, parallel processing, and effective data preprocessing steps is also crucial for handling large data sets.

Why is it important to define the business problem clearly in a data science project?

Clearly defining the business problem is important because it sets the direction for the entire data science project. It helps data scientists design their data collection, analysis, and interpretation processes to ensure the solutions directly address the business challenge. Without a clear understanding of the business problem, the data science efforts may not yield actionable or relevant results.

How do data science experts deal with data privacy and security regulations?

Data science experts stay compliant with data privacy and security regulations by implementing data encryption, access controls, and anonymization techniques. They also stay informed about relevant laws and protocols to ensure that their practices adhere to legal requirements, thus protecting sensitive data and maintaining user trust.
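
One of these techniques can be sketched with the Python standard library. The example below replaces an email address with a keyed hash and drops a name field before analysis. Strictly speaking this is pseudonymization rather than full anonymization, since anyone holding the key can recompute the mapping. The key, field names, and record are hypothetical, and whether this approach satisfies a given regulation is a legal question the code cannot answer.

```python
import hashlib
import hmac

def pseudonymize(value, secret_key):
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def strip_identifiers(record, secret_key, id_field="email", drop_fields=("name",)):
    """Return a copy with the ID field pseudonymized and direct identifiers dropped."""
    cleaned = {k: v for k, v in record.items() if k not in drop_fields}
    cleaned[id_field] = pseudonymize(record[id_field], secret_key)
    return cleaned

# Hypothetical key and record; a real key belongs in a secrets manager, not in code.
KEY = b"example-key-not-for-production"
record = {"name": "Pat Example", "email": "pat@example.com", "spend": 1200}

safe_record = strip_identifiers(record, KEY)
print(safe_record)
```

Because the same input and key always produce the same digest, analysts can still group records by customer without seeing the underlying email address.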

This is my weekly newsletter that I call The Deep End because I want to go deeper than the results you’ll see from searches or AI, incorporating insights from the history of data and data science. Each week I’ll go deep to explain a topic that’s relevant to people who work with technology. I’ll be posting about artificial intelligence, data science, and data ethics.

This newsletter is 100% human written 💪 (* aside from a quick run through grammar and spell check).
