This list started as a way for me to keep track of data science resources I've found helpful. However, other data scientists and friends looking to break into data science frequently ask me for resource recommendations, so I've continued to add to it, with a focus on beginner- and intermediate-level resources. Where possible, I've included links to the (legitimate) free versions of books. One of the great things about the data science community is its willingness to open-source work and make things available for free. Within each category or sub-category, the resources are ordered very loosely from most useful and introductory to more advanced.
This list is far from complete, but I'll try to continue to add to it. Hopefully you find it helpful.
Non-exhaustive list of additional topics to add:
- Spark
- time series forecasting
- Docker

- course: Coursera - Introduction to Data Science in Python
- course: Codecademy - Learn Python 3
- ebook: Python Like You Mean It
- book: Automate the Boring Stuff with Python
- book: Python for Everybody
- book: Learn Python the Hard Way (maybe not my favorite resource, but was still useful)
- book: Python Data Science Handbook
- video series: Calm Code
- Google Python style guide

- course: Khan Academy - Statistics
- Stanford Experimental Design course and course notes
- course: Coursera - Statistics with Python Specialization
- book: OpenIntro Statistics
- book: Introduction to Empirical Bayes by David Robinson
- book: Think Bayes
- book: Think Stats

- course (MIT OCW): Introduction to Computer Science and Programming in Python
- ebook: Problem Solving with Algorithms and Data Structures using Python
- course: Khan Academy - Algorithms
- course: Coursera - Algorithms Specialization
- course (MIT OCW): Introduction to Algorithms
- HackerRank 30 days of code
- github repo: Awesome Algorithms
- book: Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein
- article: Learn X in Y minutes: Bash
- article: Bash scripting cheatsheet

- course: Coursera - Machine Learning by Andrew Ng (foundational knowledge of machine learning)
- course: Applied Data Science with Python Specialization (more immediately applicable than the previous course)
- book: An Introduction to Statistical Learning with Applications in R (ISLR), 2nd edition by James, Witten, Hastie, Tibshirani
- book: The Hundred-Page Machine Learning Book
- book: Approaching (Almost) Any Machine Learning Problem
- book: Mining of Massive Datasets and course: edX/Stanford - Mining Massive Datasets
- article series: Machine Learning Mastery
- article: How to Train a Final Machine Learning Model on Machine Learning Mastery
- book: (advanced material) Probabilistic Machine Learning: An Introduction by Kevin Murphy
- book: (advanced material) Elements of Statistical Learning
- Papers With Code
- ArXiv Sanity Preserver

- course (Harvard): CS 109 Data Science
- course (Cornell): CS 4780 Machine Learning lecture notes and lecture YouTube videos
- course (MIT): Intro to Machine Learning
- course (Wisconsin): Machine Learning by Sebastian Raschka

- paper: UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction by McInnes et al.
- blog: Understanding UMAP by Andy Coenen and Adam Pearce
- article: How Exactly UMAP Works
- paper: Visualizing Data using t-SNE by van der Maaten and Hinton
- blog: How to Use t-SNE Effectively

- blog: Visualizing DBSCAN by Naftali Harris
- API documentation: How HDBSCAN Works
- paper: Accelerated Hierarchical Density Clustering by McInnes and Healy, 2017
- blog: Understanding HDBSCAN and Density-Based Clustering by Pepe Berba
- stackoverflow: How to select a clustering method? How to validate a cluster solution?
- stackoverflow: Evaluation measures of goodness or validity of clustering
- paper: What are the true clusters? by Christian Hennig
- paper: Density-Based Clustering Validation by Moulavi et al., 2014

- paper: On the Surprising Behavior of Distance Metrics in High Dimensional Space by Aggarwal et al., 2001
- blog: Escaping the Curse of Dimensionality by Peter Gleeson (FreeCodeCamp)

- article: Learning from Imbalanced Classes

- github repo: Curated papers, articles, and blogs on data science & machine learning in production
- course: Stanford CS 329S: Machine Learning Systems Design
- article: Overview of the different approaches to putting Machine Learning (ML) models in production
- article: A Practical Guide to Maintaining Machine Learning in Production
- github repo and tutorials: Made With ML by Goku Mohandas
- github repo: Awesome ML Ops
- article: ML Ops: Machine Learning as an Engineering Discipline

- course: Coursera - deeplearning.ai Deep Learning Specialization
- book: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow and its associated github repo (the first ~200 pages are about general ML, so this book could go under that section, but it's probably better suited for someone looking to learn about DL)
- course: Google's Machine Learning Crash Course
- site: Neural Network Playground
- blog post: The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy
- github repo: Deep Learning Drizzle (giant list of university DL courses)
- course: Stanford CS230 - Deep Learning
- course: Stanford CS231n - Convolutional Neural Networks for Visual Recognition
- course: Yann LeCun's NYU course - DS-GA 1008 · Spring 2020
- course: MIT Intro to Deep Learning

- github repo: Deep Learning Papers Reading Roadmap
- paper: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He et al., 2015
- paper: A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay by Smith, 2018

- book: Deep Learning with Python, 2nd edition by François Chollet
- course: Coursera - DeepLearning.AI TensorFlow Developer Professional Certificate

- course: Coursera - Reinforcement Learning Specialization
- book: Reinforcement Learning by Sutton and Barto

- course: Coursera - deeplearning.ai Natural Language Processing Specialization
- course: CS224n: Natural Language Processing with Deep Learning
- course: Advanced NLP with spaCy
- course: Hugging Face course
- book: Natural Language Processing with Python by Bird, Klein and Loper
- course: Michigan NLP course videos and github
- article: FROM Pre-trained Word Embeddings TO Pre-trained Language Models - Focus on BERT
- book: Speech and Language Processing (3rd ed. draft) by
- article: How to get started in NLP
- article: Introduction to Word Embeddings
- article: Document Embedding Techniques
- paper: word2vec: Efficient Estimation of Word Representations in Vector Space by Mikolov et al.
- paper: GloVe: Global Vectors for Word Representation by Pennington et al. and Stanford website for GloVe
- paper: fastText: Bag of Tricks for Efficient Text Classification by Joulin et al.
- paper: Universal Sentence Encoder by Cer et al., 2018

- paper: LDA: Latent Dirichlet Allocation by Blei et al.
- paper: Anchored CorEx: Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge by Gallagher et al., 2017 and github
- github: Top2Vec and paper: Top2Vec: Distributed Representations of Topics by Dimo Angelov
- github: BERTopic and article: Topic Modeling with BERT by Maarten Grootendorst
- blog post (StitchFix): Introducing our Hybrid lda2vec Algorithm by Chris Moody

- paper: Transformers: Attention Is All You Need by Vaswani et al., 2017
- article: The Illustrated GPT-2 (Visualizing Transformer Language Models) by Jay Alammar
- article: How GPT3 Works - Visualizations and Animations by Jay Alammar

- book: Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing by Kohavi et al.
- course: Microsoft Experimentation Platform
- Evan Miller's A/B test tools
- paper: Three Key Checklists and Remedies for Trustworthy Analysis of Online Controlled Experiments at Scale
- paper: Top Challenges from the first Practical Online Controlled Experiments Summit
- paper: Controlled experiments on the web: survey and practical guide by Ron Kohavi et al., 2009
- article: Guidelines for A/B Testing
- article: A/B Testing: 29 Guidelines for Online Experiments (Plus a Checklist)
- course: Udacity - A/B testing by Google
- article: How Not To Run an A/B Test by Evan Miller
- article: Simple Sequential A/B testing by Evan Miller
- paper: Overlapping Experiment Infrastructure: More, Better, Faster Experimentation by Tang et al., 2010
- blog post: A/B Testing Tutorial
- presentation: Online Controlled Experiments: Lessons from Running A/B/n Tests for 12 years by Ron Kohavi
- articles: Microsoft's Experimentation Platform

- article: Understanding Bayesian A/B testing by David Robinson
- article: Is Bayesian A/B Testing Immune to Peeking? Not Exactly by David Robinson
- article: Agile A/B testing with Bayesian Statistics and Python by Chris Stucchio
- article: The Power of Bayesian A/B Testing

- paper: Best arm identification in multi-armed bandits with delayed feedback
- paper: Generalized Thompson Sampling for Contextual Bandits
- paper: Analysis of Thompson Sampling for the Multi-armed Bandit Problem
- paper: A Contextual-Bandit Approach to Personalized News Article Recommendation
- article: A/B testing - Is there a better way? An exploration of multi-armed bandits

- course: Coursera - Introduction to Git and GitHub
- ebook: Pro Git

- article: Structuring Your Project: The Hitchhiker's Guide to Python
- github: Cookiecutter data science
- blog post: How to Set Up a Python Project For Automation and Collaboration by Eugene Yan
- article: The importance of structure, coding style, and refactoring in notebooks
- tutorial: Production Data Science
- article: Coding habits for data scientists

- article: Effective Python Testing With Pytest (Real Python)
- article: Becoming a Better Data Scientist: Testing with pytest by Chang Hsin Lee
- article: Unit Testing for Data Scientists

- tutorial: PyPA Packaging Python projects tutorial
- e-book: Python Packages e-book
- e-book: The Joy of Packaging
- poetry
- article: How to Build Your First Python Package

- course: Coursera - Getting Started with AWS Machine Learning
- course: Coursera - AWS Cloud Technical Essentials
- course: Coursera - Practical Data Science Specialization
- AWS Ramp up guide

- Flask Mega-tutorial by Miguel Grinberg

- article: Parameter Tuning with Hyperopt by District Data Labs
- article: On Using Hyperopt: Advanced Machine Learning by Tanay Agrawal
- article: An Introductory Example of Bayesian Optimization in Python with Hyperopt by Will Koehrsen

- github repo: EthicalML Awesome Production ML

- package: Fairlearn

- article: Data science learning resources by Microsoft Data Science team
- blog: End-to-End Machine Learning by Brandon Rohrer (some good free resources, some paid)
- github repo: Awesome Machine Learning
- blog: Free online machine learning curriculum by Chip Huyen

- book: Build a Career in Data Science by Emily Robinson and Jacqueline Nolis
- article: 80000 hours: Data Science career review
- Quora: As a data scientist, what career advice changed your life?
- blog post: A Framework for Career Decisions by Conor Dewey
- ApplyingML - Mentor interviews by Eugene Yan
- talks by Angela Bassa

- blog post: Applied / Research Scientist, ML Engineer: What’s the Difference? by Eugene Yan
- reddit: Difference between DS and MLE
- article: Machine Learning Engineer vs Data Scientist (Is Data Science Over?)

- Open-Source Data Science Masters

- article: How to Build a Data Science Portfolio
- github: Awesome Data Science

- article: Unpopular Opinion - Data Scientists Should be More End-to-End by Eugene Yan
- article (Stitch Fix): Beware the data science pin factory: The power of the full-stack data science generalist and the perils of division of labor through function by Eric Colson

- blog post: Finding Answers to your Career Questions
- blog post: Engineering Management: The Pendulum or the ladder by Charity Majors
- blog post: The Engineer/Manager Pendulum by Charity Majors
- blog post: Senior engineer and then what? by Ju Yang

- article: Models for integrating data science teams within organizations
- article (Coursera): Analytics at Coursera: three years later
- article (Coursera): What is the most effective way to structure a data science team?
- article (AirBnB): At Airbnb, Data Science Belongs Everywhere
- article: Embedding Data Science In Cross-Functional Teams
- blog: Building a data team at a mid-stage startup: a short story by Erik Bernhardsson
- blog (StitchFix): Let Curiosity Drive: Fostering Innovation in Data Science

- book: Introduction to Machine Learning Interviews Book by Chip Huyen
- article: Data science career advice to my younger self by Schaun Wheeler
- blog post: How to Break Into the Tech Industry - a Guide to Job Hunting and Tech Interviews by Haseeb Qureshi
- article: Mastering the Data Science Interview Loop by Andrei Lyskov
- blog post: Reverse Interviewing Your Future Manager and Team by Gergely Orosz
- blog post: Red Flags to Look Out for When Joining a Data Team by Eugene Yan
- blog post: Red Flags in Data Science Interviews by Emily Robinson

- article: How to manage Machine Learning and Data Science projects
- article: Data Science and Agile (What works, and what doesn't) and Data Science and Agile (Frameworks for effectiveness)

- article: Jobs To Be Done Framework
- article series (Sequoia): Data-Informed Product Building
- article: Sequoia Data Science Team Measuring Product Health
- article: Sequoia Data Science Team Retention

- article: 10 Reads for Data Scientists Getting Started with Business Models by Conor Dewey