Let’s put it plainly: data scientists are not going to disappear soon. Their roles are evolving as new generative AI tools emerge in data science.
Siddhartha Sharan, a senior data and applied scientist at Microsoft, said in a podcast that these tools (LLMs) can help with efficiency and problem-solving, but they cannot replace data science or data engineering jobs.
Vin Vashishta, an AI expert, agreed. He said that generative AI tools can augment people but are not substitutes for them, and that most tools are still in the proof-of-concept stage, with bugs to fix before anyone needs to worry about AI taking over people’s jobs.
Generative AI in Data Science: Boosting Data Scientists?
Generative AI in data science is a powerful tool that can help data scientists with various tasks. For example, data scientists often have to deal with data that is messy, incomplete, or insufficient for their needs. Generative AI in data science can automate the process of data cleaning and formatting, as well as generate synthetic data that resembles the real data in terms of properties and distribution. This way, data scientists can have more data to work with and focus on more complex problems. Generative AI can also enhance the quality of data by removing noise, imputing missing values, or fixing errors.
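As a rough illustration of what "imputing missing values" and "synthetic data that resembles the real data" mean in practice, here is a minimal sketch using only the Python standard library. It is a plain statistical stand-in, not a generative AI model: the gaps are filled with the median, and new samples are drawn from a normal distribution fitted to the cleaned values. The readings, the function names, and the choice of a normal distribution are all assumptions made for the example.

```python
import random
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def synthesize(values, n, seed=0):
    """Draw n synthetic samples from a normal distribution fitted to values."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Hypothetical readings with gaps (None marks a missing value).
raw = [12.1, None, 11.8, 12.4, None, 12.0, 11.9]
clean = impute_median(raw)
synthetic = synthesize(clean, n=1000)

print("cleaned:", clean)
print("real mean:", round(statistics.mean(clean), 3))
print("synthetic mean:", round(statistics.mean(synthetic), 3))
```

A real generative approach would learn a far richer distribution than a single mean and standard deviation, but the goal is the same: new records whose statistical properties track the original data.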
Another benefit of generative AI in data science is that it can reduce the workload of repetitive tasks that data scientists face. For instance, data scientists may have to explain the same concepts or answer the same questions over and over again. Generative AI can create small models that can automate these use cases and provide consistent and accurate responses. This can save time and resources for data scientists and allow them to take on more challenging work. As Vashishta said, “We spend a lot of time explaining the same things or answering the same questions. As the business scales, that work scales too, and those repetitive tasks add significant overhead.
“Small generative AI models make automating those use cases very simple. Offloading simple tasks frees people’s time to take on more complex work.”
Generative AI in data science is transforming the way data scientists work with data. Instead of spending hours or days collecting and cleaning data, they can use algorithms to create synthetic data that resembles real-world situations. This saves time and resources, and enables data scientists to focus more on the analysis and interpretation of results.
According to Gartner, 60% of data for AI will be synthetic by 2024, up from just 1% in 2021. Synthetic data can help simulate reality and future scenarios and de-risk AI projects.
Furthermore, generative AI in data science can inspire data scientists to explore data in new ways. “Data scientists are becoming ‘solution scientists’, designing creative solutions using the GenAI toolset, or business automation architects, leveraging AI to build automated solutions for business functions,” said Ruban Phukan, co-founder & CEO at GoodGist.com, a skill development and education co-pilot for corporations. Generative AI is a game-changer for data science and business.
Generative AI has made impressive strides in creating realistic and diverse content, but it is not a substitute for the expertise and creativity of data scientists. On its own, it cannot tailor solutions to specific business needs, account for human factors, or acquire the relevant domain knowledge.
For example, Sharan explained that sentiment analysis still requires human intervention, saying, “It is hard to imagine a fully automated system at this point because our approach is that AI does the first three passes, and then a human validates the results.”
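One way to picture a human-in-the-loop workflow like this is sketched below. It is not Sharan's actual pipeline: the keyword-based scorer is a toy stand-in for a real sentiment model, and the word lists, confidence rule, and threshold are all hypothetical. As a further assumption, only low-confidence predictions are queued for a person here, whereas in the process Sharan describes a human validates the results.

```python
from dataclasses import dataclass

# Hypothetical word lists standing in for a real sentiment model.
POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}

@dataclass
class Prediction:
    text: str
    label: str
    confidence: float

def automated_pass(text):
    """Toy scorer: label by counting sentiment words; confidence is the margin."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    if total == 0:
        return Prediction(text, "neutral", 0.0)
    label = "positive" if pos >= neg else "negative"
    return Prediction(text, label, abs(pos - neg) / total)

def triage(texts, threshold=0.6):
    """Split predictions into auto-accepted and queued-for-human-review."""
    accepted, review = [], []
    for text in texts:
        pred = automated_pass(text)
        (accepted if pred.confidence >= threshold else review).append(pred)
    return accepted, review

accepted, review = triage([
    "great product love it",
    "good screen but terrible battery",
    "arrived on tuesday",
])
print("auto-accepted:", [(p.text, p.label) for p in accepted])
print("needs human review:", [p.text for p in review])
```

The mixed review ("good screen but terrible battery") is the kind of case where an automated pass is least reliable, which is why it ends up in the review queue rather than being accepted outright.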
For Aspiring Data Scientists
Sharan, an expert in generative AI, shared some advice for aspiring data scientists who want to keep up with the latest developments in this field. He said that data scientists should learn about different generative models, such as variational autoencoders, generative adversarial networks, and transformers, and their pros and cons, so that they can provide guidance on choosing the best model for a given problem, deploying it efficiently, and ensuring its long-term effectiveness.
He also stressed the importance of understanding the cost implications of using various language models and of finding ways to optimize the product’s cost-benefit ratio. For example, data scientists should not default to GPT-4 for summarisation tasks, where it may be unnecessarily expensive. He added that data scientists should be aware of the changing expectations of employers, who are looking for candidates with skills and experience in generative AI, and cited a data scientist role at HP that requires a focus on generative AI and large language models.
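To make the point about model cost concrete, the back-of-the-envelope comparison below estimates the monthly bill for a summarisation workload on two models. The model names, the per-token prices, and the workload figures are hypothetical placeholders, not real vendor pricing; actual prices vary by provider and change over time.

```python
# Hypothetical prices in USD per 1,000 tokens (NOT real vendor pricing).
PRICE_PER_1K = {
    "large-model": {"input": 0.03, "output": 0.06},
    "small-model": {"input": 0.0005, "output": 0.0015},
}

def monthly_cost(model, docs_per_month, input_tokens, output_tokens):
    """Estimate monthly spend for summarising docs_per_month documents."""
    price = PRICE_PER_1K[model]
    per_doc = (input_tokens / 1000) * price["input"] \
        + (output_tokens / 1000) * price["output"]
    return docs_per_month * per_doc

# Assumed workload: 100,000 documents a month, each about 2,000 tokens in
# and a 200-token summary out.
for model in PRICE_PER_1K:
    cost = monthly_cost(model, docs_per_month=100_000,
                        input_tokens=2_000, output_tokens=200)
    print(f"{model}: ${cost:,.2f} per month")
```

Even with made-up numbers, the exercise shows why the choice matters: when a smaller model's summaries are good enough for the use case, the gap in cost per document is multiplied across every document processed.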
Sharan also pointed to AWS, which expects its senior data scientists to work with customers and understand their needs and challenges related to generative AI. AWS also provides various tools and services for generative AI, such as Amazon SageMaker, Amazon Comprehend, and Amazon Lex.
Another example is IBM, which values the ability to stay current with the latest research and innovations in generative AI, especially in the domains of foundation models and large language models. IBM also offers various solutions and platforms for generative AI, such as IBM Watson, IBM Cloud Pak for Data, and IBM CodeNet.
To help data scientists learn and apply generative AI, IBM has collaborated with Coursera to launch a course called Generative AI for Data Scientists Specialization. This course teaches how to use generative AI to solve real-world problems and generate novel data. The course consists of three modules that cover the basics of generative AI, prompt engineering, and data science applications. The course is available on Coursera’s website and costs $49 per month. The course duration is about one month, with 10 hours of study per week.