Data science will be around

The field of data science has evolved over time, and so has the role title. When I started my career (at the end of 2012) it was called something different: data mining. The idea was that we would “mine data” from databases. The focus was mainly on working with tabular datasets. Then came the “data scientist” role title, but the work remained relatively unchanged.

The reality is that the vast majority of companies in the US are so far behind in data science that there is still a massive need for data scientists, even if the hype around the field has cooled off.

The need for data scientists will be there in 10 years without a doubt. It’s still very difficult for companies to implement even simple machine learning models. We hear constantly on social media that machine learning for tabular data is simple and can be automated with AutoML tools, but the reality at most companies is far from that.

It’s possible that the role name will change, but there is still a massive difference between what a data scientist and a data analyst do today in their day-to-day work. It’s possible that these roles will merge into one down the road, but this transition will take a relatively long time.

Will ChatGPT-like tools replace data scientists or data analysts?

I’ve been working with data for more than 10 years, and the reality is that ChatGPT is not useful to me. I’ve been using GPT-3 for projects since mid-2022, so I know how to prompt it!

If you know a topic well, you don’t need it. If you don’t, how are you going to check whether the answers it gives you are correct? Also, because it’s built on “old data”, you won’t be able to ask about new topics.

But that’s just the coding part. There is a lot more than coding in the data science work we do on a daily basis.

Here are some tasks that are impossible to automate with AI:

  • Project Planning
  • Brainstorming project ideas
  • Communicating with stakeholders
  • Data preprocessing
  • Solution architecture

And I could go on for days listing tasks where AI would have no clue what to do. Some of these tasks involve working with other teams, but you still need someone in a data science role.

What is going to change?

I think the first ones to take a hit will be NLP engineers. Some NLP tasks might simply be done with ChatGPT or any other available LLM.

In some cases, it will be more cost effective to use an API rather than do all the work to have a customized machine learning model for a particular task.

There will still be NLP engineers but I think the need will decrease over time. One exception would be situations where performance and cost matter. In these situations, calling an API is not cost effective and there will be a need for an NLP developer.
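To make the API versus custom model trade-off concrete, here is a rough break-even sketch. Every number in it (price per 1,000 requests, build cost, amortization period, hosting cost) is a made-up assumption for illustration, not a real quote from any provider, and the function names are hypothetical.

```python
def monthly_api_cost(requests_per_month: int, cost_per_1k_requests: float) -> float:
    """Cost of sending every request to a paid API (pure pay-per-use)."""
    return requests_per_month / 1000 * cost_per_1k_requests


def monthly_custom_cost(build_cost: float, amortization_months: int,
                        hosting_per_month: float) -> float:
    """Fixed monthly cost of a custom model: amortized build cost plus hosting.

    Simplifying assumption: the cost does not grow with request volume.
    """
    return build_cost / amortization_months + hosting_per_month


def break_even_requests(cost_per_1k_requests: float, build_cost: float,
                        amortization_months: int, hosting_per_month: float) -> float:
    """Monthly request volume at which both options cost the same."""
    fixed = monthly_custom_cost(build_cost, amortization_months, hosting_per_month)
    return fixed / cost_per_1k_requests * 1000


def cheaper_option(requests_per_month: int, cost_per_1k_requests: float,
                   build_cost: float, amortization_months: int,
                   hosting_per_month: float) -> str:
    """Return "api" or "custom", whichever is cheaper at this volume."""
    api = monthly_api_cost(requests_per_month, cost_per_1k_requests)
    custom = monthly_custom_cost(build_cost, amortization_months, hosting_per_month)
    return "api" if api <= custom else "custom"


# Hypothetical inputs: $2 per 1,000 API requests, $60,000 to build a custom
# model amortized over 24 months, $500 per month to host it.
ASSUMPTIONS = dict(cost_per_1k_requests=2.0, build_cost=60_000.0,
                   amortization_months=24, hosting_per_month=500.0)

print(break_even_requests(**ASSUMPTIONS))
print(cheaper_option(100_000, **ASSUMPTIONS))
print(cheaper_option(5_000_000, **ASSUMPTIONS))
```

Under these assumed numbers, low volumes favor the API and high volumes favor the custom model, which is the exception described above. Real decisions would also need to account for latency, data privacy and maintenance effort.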

This will still take some time to happen!

Which areas of data science will be around without a doubt?

I don’t think it will be possible to automate the tabular aspect of data science. You can automate the machine learning aspect of a project if you are willing to accept a lower quality product, but I don’t think you can automate all the steps needed to go from raw data to a data product (deployed model or insights).

1. Statistics and Causality

I think a deep understanding of statistics and causality will be very valuable in the future. It’s possible that statisticians and economists will be well positioned for this type of task.

2. Soft Skills

Being able to interact with multiple stakeholders and add value to an organization will always be in demand. Communication is also a great skill to have. If you want to move your career forward, this is probably one of the most important skills to improve. I’ll also add leadership skills to this bucket. I don’t see AI leading teams anytime soon! 🙂

3. Going from Idea to a Working Project

Generally, senior data scientists get some sort of requirement from the business and figure out a way to redefine it as a project that can be solved with data analysis or a machine learning model.

There is zero chance that AI will be able to do this.

4. Data Preparation

Understanding a database, joining multiple tables and then building an analysis or model on top of the result will be practically impossible to automate. It’s possible that tools will be created to help do this work more efficiently, but I doubt they will be very useful.

Again, there is a need for a human to think and go from an idea to a query that is useful to answer business questions or develop a model.
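As a small illustration of that step, here is a sketch that turns a business question (“how much revenue does each customer segment bring in?”) into a query over two joined tables. The `customers` and `orders` tables, their columns and the data are all invented for this example, and it uses an in-memory SQLite database so it runs anywhere.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT);
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'retail'), (2, 'retail'), (3, 'enterprise');
        INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 30.0), (12, 3, 500.0);
    """)
    return conn


def revenue_by_segment(conn: sqlite3.Connection) -> list:
    """Answer the business question with a join plus an aggregation.

    LEFT JOIN keeps customers with no orders, so they still count towards
    the number of customers in their segment.
    """
    query = """
        SELECT c.segment,
               COUNT(DISTINCT c.customer_id) AS customers,
               COALESCE(SUM(o.amount), 0.0)  AS revenue
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.segment
        ORDER BY c.segment
    """
    return conn.execute(query).fetchall()


for segment, customers, revenue in revenue_by_segment(build_demo_db()):
    print(segment, customers, revenue)
```

Even in this toy case, someone had to decide that customers without orders should still be counted, which is why the query uses a `LEFT JOIN` rather than an inner join. Choices like that come from understanding the business question, not from the schema.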

5. Model Deployment

This area is mostly related to software engineering. The trend has been that this is becoming simpler as time passes but it’s still a significant amount of effort to deploy a model.

The bar gets much higher if the model is deployed on a website and you need results in real time. There will be a need for some interaction between a data scientist and data engineers or a solution architect.

6. Deciding which machine learning model to use

For tabular data, even if the model training code gets abstracted away, there are still cases where it makes sense to use some models rather than others.

This logic could be automated also, but the need for supervision will be there still.
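Here is a minimal sketch of what “automated but supervised” could look like: a k-fold cross-validation loop picks the candidate with the lowest error, but it also returns every score so a person can review the choice. The two candidates (a mean baseline and a one-feature least-squares line), the toy data and all function names are assumptions made for this example, not a recommendation of specific models.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """One-feature least-squares line: y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def cv_mse(fit, xs, ys, k=3):
    """Mean squared error over k folds (fold f holds rows f, f+k, f+2k, ...)."""
    n = len(xs)
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return mean(errors)


def pick_model(candidates, xs, ys, k=3):
    """Return the best candidate name plus all scores for human review."""
    scores = {name: cv_mse(fit, xs, ys, k) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data: roughly y = 2x + 1 with a small deterministic wobble.
xs = list(range(1, 10))
ys = [2 * x + 1 + (0.1 if x % 2 else -0.1) for x in xs]

best, scores = pick_model({"mean": fit_mean, "line": fit_line}, xs, ys)
print(best, scores)
```

The selection itself is a single call to `min`, which is the part that can be automated. The supervision is in looking at the returned scores and asking whether the winner makes sense for the problem.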

In a way, when I started my career most companies were using a GUI tool (SAS miner). However, we always needed some custom code to make it work well.

I don’t think there is much need for these tools, since this is the part of the job that almost everyone already knows how to do. This is the “Kaggle” aspect of data science, and everyone knows it by now.

Final thoughts

To summarize, I think the safest bet to be relevant in 10 years is to focus on working with tabular data. NLP and computer vision will likely get automated more easily and the number of people working on these topics will be smaller.

Investing in soft skills (leadership, public speaking, storytelling), advanced statistics and problem solving will probably pay off better than becoming an AI expert.

