Spend less time cleaning sparse and patchy data sets while still ensuring that your AI and ML models are best in class.
The real world rarely delivers data in as pristine a form as we would like, which makes data wrangling a key component of world-class AI solutions. It is estimated that data scientists spend a staggering 82% of their time on activities other than model development, principally cleaning and preparing data (38%) and reporting and presenting analysis (29%). Developing an effective data pipeline strategy will not only cut that time down, but also ensure that the time you do spend is spent effectively.
This is the first in a two-part series on EDA (Exploratory Data Analysis). In this series you will learn the basics of dealing with missing data, handling outliers, and developing effective strategies for the numerical encoding of text-based data.
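To give a flavour of the first of those topics, here is a minimal sketch of a first-pass check for missing data in pandas. The toy DataFrame and the two strategies shown are illustrative assumptions on our part, not necessarily the exact approach taken in the videos.

    import pandas as pd

    # Illustrative toy DataFrame with gaps; substitute your own data
    df = pd.DataFrame({"age": [34, None, 51], "income": [52000, 61000, None]})

    print(df.isna().sum())  # count of missing values per column

    # Two common first-pass strategies:
    df_dropped = df.dropna()                             # drop incomplete rows
    df_filled = df.fillna(df.median(numeric_only=True))  # impute numeric columns with the median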
In this first video we look principally at outliers in numerical features. We explain the 'IQR method' (inter-quartile range method) used to identify and eliminate them; a short code sketch follows below.
Click on the 'Play' button to view the first video in this series on EDA.
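For reference alongside the video, here is a minimal sketch of the IQR method in pandas. The function name, the 1.5 fence multiplier, and the column name in the usage line are our own illustrative choices, not anything fixed by the method itself.

    import pandas as pd

    def remove_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
        """Drop rows whose value in `column` falls outside the IQR fences."""
        q1 = df[column].quantile(0.25)   # first quartile
        q3 = df[column].quantile(0.75)   # third quartile
        iqr = q3 - q1                    # inter-quartile range
        lower = q1 - k * iqr             # lower fence
        upper = q3 + k * iqr             # upper fence
        return df[df[column].between(lower, upper)]

    # Example usage on a hypothetical numeric column:
    # clean = remove_outliers_iqr(df, "Credit_Limit")

The 1.5 multiplier is the conventional default; widening it (e.g. to 3) makes the method more conservative, flagging only extreme outliers.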
AI and Machine Learning have been widely (and successfully) deployed in Financial Markets for a variety of solutions. Two common use cases are Loan Default Prediction and Customer Churn Prediction. In this series we use two representative data sets, which can be downloaded from the URLs below:
Customer Churn
https://www.kaggle.com/datasets/syviaw/bankchurners?select=BankChurners.csv
Loan Default
https://drive.google.com/file/d/1WFvu8dnVwZV5WuluHFS_eCMJv3qOaXr1/view
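Once downloaded, both data sets can be loaded with pandas as in the sketch below. The local file names are assumptions based on the download pages (the Google Drive file name in particular may differ on your machine).

    import pandas as pd

    # File names are assumed; adjust the paths to wherever you saved the downloads
    churn = pd.read_csv("BankChurners.csv")   # customer churn data set
    loans = pd.read_csv("Loan_Default.csv")   # loan default data set (name may differ)

    # Quick sanity check of shapes and columns before any cleaning
    print(churn.shape, loans.shape)
    print(churn.columns.tolist())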

