protectdream's personal blog http://blog.sciencenet.cn/u/protectdream


How to learn Data Science from Beginners to Masters

2022-3-21 05:31 | Category: Technology

In this article, we will discuss how a complete beginner can start their journey in the vast field of Machine Learning and Data Science, from learning core concepts and writing basic code all the way to cracking interviews and gaining experience over time. There is so much content available on the internet and in books, but what should you read and what should you skip? It is easy to get totally confused! Let’s step back and go back in time to start everything from the beginning.

If you are a beginner with a great passion for learning Data Science, then trust me, this article will help you build your plan to learn Machine Learning and Data Science.

Before that, just a small tip: this article is going to be short (😅 not so short), simple, and precise. It discusses the exact approach I would follow if I could go back in time and start preparing to become a Data Scientist again, told as a cool time-travel story!

So, as I said, let’s go back in time one year and start our journey.


First, let us create a checklist of the things we should think about before jumping directly into exploring and learning. My simple descriptive checklist is below:

  1. Learn a Programming Language (R or Python).

  2. Get familiar with applied mathematics (LA, Stats, Prob).

  3. Start reading blogs on ML/AI and listen to podcasts.

  4. Read some amazing books to build up foundations.

  5. Learn Machine Learning and Deep Learning.

  6. Work on experimenting with the skills and do some hands-on.

  7. Build some end-to-end projects and take part in competitive Data Science.

  8. Apply and crack interviews.

Now, the checklist would be useless unless we know the best and most useful resources to start with and keep ourselves pumped up about learning.

These two things, if executed well, could really make me a good data scientist in just one year!

Learning a Programming Language

To become a data scientist, you don't need to be a pro coder or a 5-star programmer on CodeChef or TopCoder; you just need to know how to write well-optimized code in your preferred language. People from various backgrounds, including many with zero coding experience, have become good data scientists in just one year by learning to code smartly.

Google Trends R (red) vs Python (blue)

Choose a language: Python and R have been among the best-supported languages for Machine Learning and Data Science since 2014, thanks to their ease of use and an exhaustive set of libraries that let you do almost anything in just a few lines of code. The Google Trends graph above shows how popular these languages have been on Google Search. You can try out both languages and see which one suits you better and which one is likely to help you more in your future job profile.

Some people learn both, but for the time being I would choose Python based on my requirements. In my experience, good resources for learning Python are the YouTube channels Sentdex and Corey Schafer. Beyond that, I would take a one-month subscription on DataCamp and work through its hands-on Python lessons; as a free alternative, LearnPython.org would serve the same purpose.

Get familiar with applied mathematics

Anyone can make a machine learning model work with just 3–4 lines of code, but have you ever thought about what is going on behind the scenes? At the core of every machine learning algorithm is the mathematics that makes it work for us. Libraries have made our work easier, but we still need to be clear about how the algorithms work and how we could build our own models.
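To make "building our own model" concrete, here is a minimal sketch of simple linear regression written from scratch, using only the Python standard library. The function name `fit_line` and the toy data are made up for illustration; a library such as scikit-learn does the same job in fewer lines, but hides these steps.

```python
# A from-scratch sketch of simple linear regression (ordinary least squares).
# The function name and the toy data are illustrative assumptions.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The least-squares slope is covariance(x, y) divided by variance(x).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data generated from y = 2x + 1 without noise, so the fit should recover it.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Writing even one model this way makes it much easier to read what a library is doing when it fits the same line for you.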

To do that, we need to be very clear about the core mathematics. Linear Algebra gives me the geometric intuition behind the algorithms and shows how I could build my own model from an understanding of algebra and vector spaces. Topics such as matrix algebra and vector spaces are used in Principal Component Analysis (PCA), Support Vector Machines (SVMs), and many other mathematical models.
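As a rough illustration of how matrix algebra shows up in PCA, the sketch below finds the first principal component of a few 2-D points. It builds the 2x2 covariance matrix and uses power iteration (one of several possible methods, chosen here for simplicity) to approximate its dominant eigenvector. The function name and the data points are assumptions for illustration only.

```python
import math

# Sketch: first principal component of 2-D data via power iteration.
# The function name and sample points are illustrative assumptions.

def first_principal_component(points, iterations=100):
    """Approximate the unit vector along which the 2-D points vary the most."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Power iteration: multiply by the covariance matrix, then renormalize.
    vx, vy = 1.0, 0.5
    for _ in range(iterations):
        wx = sxx * vx + sxy * vy
        wy = sxy * vx + syy * vy
        norm = math.hypot(wx, wy)
        vx, vy = wx / norm, wy / norm
    return vx, vy


# Points scattered close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
direction = first_principal_component(points)
print(direction)
```

Because the points lie close to the line y = x, the returned direction should point roughly along the diagonal, which is the geometric intuition PCA is built on.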

Secondly, Statistics and Probability are really important for understanding the patterns in data and finding insights from them. The statistics and probability theory needed for machine learning include various distributions such as the Gaussian and Binomial distributions, the rules of probability, Bayes' Theorem (which is based on conditional probability), the Pareto law, etc.
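A small worked example can help with Bayes' Theorem. The sketch below computes the probability of a hypothesis given some evidence; the function name and all the numbers (a 1% prior, a 95% true-positive rate, a 5% false-positive rate) are hypothetical and chosen only to show the arithmetic.

```python
# Sketch of Bayes' Theorem: P(H | E) = P(E | H) * P(H) / P(E).
# The function name and every number below are hypothetical.

def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E).

    prior               -- P(H), probability of the hypothesis before seeing evidence
    likelihood          -- P(E | H), probability of the evidence if H is true
    false_positive_rate -- P(E | not H), probability of the evidence if H is false
    """
    # Total probability of the evidence across both cases.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical test: 1% of people have a condition, the test catches 95% of
# true cases and wrongly flags 5% of healthy people.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))
```

With these made-up numbers the posterior comes out at roughly 16%, far lower than the 95% many people expect, because the condition is rare to begin with.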

Calculus also plays a very important role in understanding mathematical models. Some of the necessary topics include Differential and Integral Calculus, the Laplacian, the Jacobian, Partial Derivatives, Directional Derivatives and Gradients, the Lagrangian, etc.
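To see what partial derivatives mean in practice, here is a minimal sketch that approximates the gradient of a function numerically with central differences. The helper name `numerical_gradient` and the example function are assumptions for illustration; real libraries usually compute gradients analytically or with automatic differentiation instead.

```python
# Sketch: approximate partial derivatives with central differences.
# The helper name and the example function are illustrative assumptions.

def numerical_gradient(f, point, h=1e-5):
    """Approximate the gradient of f at point, one coordinate at a time."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        # Central difference: (f(x + h) - f(x - h)) / (2h) along coordinate i.
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(v):
    x, y = v
    return x ** 2 + 3 * x * y


# Analytically, the gradient of f is [2x + 3y, 3x], which is [8, 3] at (1, 2).
grad = numerical_gradient(f, [1.0, 2.0])
print(grad)
```

Comparing the numerical result with the hand-derived gradient is a handy way to check your calculus.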

Finally, algorithms and optimization form another very important branch of mathematics, required for the computational efficiency and scalability of our machine learning algorithms. A solid understanding of algorithms helps in writing efficient code for tasks such as model building and data normalization.
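As one example of an optimization algorithm, the sketch below runs plain gradient descent on a simple bowl-shaped function. The function names, the learning rate, and the step count are all assumptions picked for illustration, not recommended settings.

```python
# Sketch: plain gradient descent on f(x, y) = (x - 3)^2 + (y + 1)^2.
# Names, learning rate, and step count are illustrative assumptions.

def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, starting from start."""
    x = list(start)
    for _ in range(steps):
        g = grad(x)
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x


def grad_loss(v):
    """Gradient of (x - 3)^2 + (y + 1)^2, derived by hand."""
    x, y = v
    return [2 * (x - 3), 2 * (y + 1)]


minimum = gradient_descent(grad_loss, [0.0, 0.0])
print(minimum)
```

The result should land very close to the true minimum at (3, -1). Trying a much larger learning rate is a quick way to see how the same loop can diverge.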

Some good resources are the ISLR book, the Mathematics for Machine Learning Specialization, or Khan Academy’s Linear Algebra, Probability & Statistics, Multivariable Calculus, and Optimization courses.

Reading blogs, articles and listening to podcasts

There are tons of amazing blogs being published by highly experienced people each day, and I can choose to read a few every day. I need to plan how many blogs or articles I can digest in a day without breaking my flow, whether to understand what's going on in the world of Data Science, to keep up with AI tech news, or anything else. Some really good publishers who regularly put out amazing content are Towards Data Science, Data Driven Investor, Analytics Vidhya, and KDnuggets.

Listening to a few great podcasts could drastically improve my understanding of the science behind amazing projects and of how researchers are disrupting the field of Machine Learning and AI. Podcasts also help us build the data literacy needed to communicate our data stories to everyone. Of the many good podcasts one can listen to regularly, I personally prefer DataFramed by Hugo Bowne-Anderson, the SuperDataScience Podcast by Kirill Eremenko, or sometimes DataHack Radio on SoundCloud for day-to-day listening.

Reading Machine Learning Books

“Reading is essential for those who seek to rise above the ordinary.” — Jim Rohn

Reading books is one of the most important things I should do this year to enhance my learning. If you are a book lover, books are one of the best sources of knowledge. I can read freely available e-books or purchase hard copies; either works. There are a lot of books one can buy and start reading, but some good ones I would suggest are:

  1. Python for Data Analysis,

  2. An Introduction to Statistical Learning,

  3. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow,

  4. Pattern Recognition and Machine Learning by Christopher M. Bishop,

  5. Deep Learning by Ian Goodfellow,

  6. Some other O'Reilly books are also great, etc.

Learn Machine Learning and Deep Learning

Online courses (MOOCs) can be a very good way to learn ML and DL in less time while keeping the journey interactive. I would take the following courses to learn ML and DL:

  1. Machine Learning by Stanford University,

  2. Applied Data Science with Python Specialization by the University of Michigan,

  3. Deep Learning Specialization by deeplearning.ai,

  4. Applied AI course by Srikanth Varma,

  5. Data Science and ML courses by Kirill Eremenko at Udemy,

  6. CS231n by Stanford University, a beautiful course on CNNs and Computer Vision,

  7. CS224n by Stanford University, an equally beautiful course on NLP, etc.

Apart from these, some well-known universities across the globe also offer amazing online and offline courses in ML and Data Science. You can check them out on their official websites if you want to explore more, or read the related article on top universities listed in the references below.

Skill Gym and hands-on practice

‘Tell me and I forget. Teach me and I remember. Involve me and I learn.’ — Benjamin Franklin

Well said! If one keeps on learning without getting involved, the knowledge won't stay in persistent memory for long. Hands-on practice is a very important part of learning. I would undoubtedly prefer either writing code on my local machine for each concept I learn and pushing it to GitHub or some cloud storage for safekeeping, or using a good online platform such as DataCamp or Dataquest to do the same in their cloud. This would not only strengthen my concepts but also improve my coding skills over time, and I could revise any concept at any time just by looking at the code.

Building end-to-end Projects and participating in Competitions

Working on projects to apply what I have learned in the past is very important for building a sense of how ML projects are built in the real industry. You can find exhaustive lists of projects to work on at DataFlair or Simplilearn, and any random article on the internet can help you get started with basic projects. Secondly, maintaining a well-documented public repository of my project code on GitHub or any similar platform would definitely help me build a good portfolio.

Participating in competitions would also help me become good at coding, apply my skills to real-world problems, and see where I stand globally on the leaderboard, which can help me correct my mistakes and build a better self. I would take part in competitions on Kaggle, Analytics Vidhya, DrivenData, HackerEarth, etc., which host amazing global research-level as well as industry-level competitions. But make sure you do not end up overfitting😂!

You can read this article below to know more about my journey on Kaggle from the future(😅 don’t forget we are in the past right now).

How I became Kaggle 3X-Expert in Just 1 Month and a Master in 3 months!

Let’s talk about my story on becoming a Kaggle Expert in just one month of joining the platform and how I managed to…

towardsdatascience.com

Crack an Interview

By now I should be completely ready to start applying for data science roles, while working on my communication skills, data storytelling, and other soft skills. I should understand my past work and projects to the fullest and be able to explain them better than anyone else.

Your knowledge won’t get you a good job unless you have a good portfolio that says a lot about you. Read this article if you are interested in learning more about building a good portfolio for yourself!

How to Build a Data Science Portfolio that can get you a Job?

Learn to Make a Strong Portfolio that Speaks about you!

towardsdatascience.com

If I follow all of this now, within a year I would be able to break into the ML and Data Science field and bag a good job. The same goes for you, my friend, who walked through the complete journey with me in this article😉!

Let’s come back to 2020 after our cool time travel and make the most of the time we have amid the uncertainty of the COVID pandemic to learn and become good at Machine Learning and Data Science.

Bonus tip for you!

Congratulations 🎉! If you have read this far, you clearly have the persistence and determination to become a Data Scientist, and you will eventually become one!

If you are a beginner and were looking for something to start with, I hope you now have a clear picture for choosing and planning your own custom checklist.

Want to know more about Statistics, Machine Learning, and Data Science?

You can read my other similar blogs here:

How to become Super Powerful and Successful?

Unleash the Super Powers within you!

medium.com

How to Evaluate Machine Learning Model Performance in Python?

A Practical Approach to Compute the Model’s Performance and Implementation in Python covering all Mathematical…

towardsdatascience.com

The Powers of “Normal Distribution”

Understanding the Science behind a Bell Curve!

towardsdatascience.com

Thank you and Best of Luck for your new journey!

Additional Resources and References

  1. https://medium.com/towards-artificial-intelligence/top-universities-to-pursue-a-masters-in-machine-learning-ml-in-the-us-ai-d4a461229fbb

  2. https://towardsdatascience.com/python-vs-r-for-data-science-6a83e4541000

  3. https://towardsdatascience.com/the-mathematics-of-machine-learning-894f046c568

  4. https://towardsdatascience.com/top-competitive-data-science-platforms-other-than-kaggle-2995e9dad93c

  5. https://towardsdatascience.com/8-ml-ai-projects-to-make-your-portfolio-stand-out-bfc5be94e063




https://wap.sciencenet.cn/blog-2866696-1330329.html
