Curated papers, articles, and blogs on data science & machine learning in production.
Figuring out how to implement your ML project? Learn how other organizations did it:
- How the problem is framed (e.g., personalization as recsys vs. search vs. sequences)
- What machine learning techniques worked (and sometimes, what didn't)
- Why it works: the science behind it, with research, literature, and references
- What real-world results were achieved (so you can better assess ROI)
P.S., Want a summary of ML advancements? See ml-surveys
Table of Contents
- Data Quality
- Data Engineering
- Data Discovery
- Classification
- Regression
- Recommendation
- Search & Ranking
- Embeddings
- Natural Language Processing
- Sequence Modelling
- Forecasting
- Computer Vision
- Reinforcement Learning
- Anomaly Detection
- Graph
- Optimization
- Information Extraction
- Weak Supervision
- Generation
- Efficiency
- Validation and A/B Testing
- Ethics
- Practices
- Fails
Data Quality
- Monitoring Data Quality at Scale with Statistical Modeling - Uber
- An Approach to Data Quality for Netflix Personalization Systems - Netflix
- Automating Large-Scale Data Quality Verification (Paper) - Amazon
- Meet Hodor: Gojek's Upstream Data Quality Tool - Gojek
- Reliable and Scalable Data Ingestion at Airbnb - Airbnb
- Data Management Challenges in Production Machine Learning (Paper) - Google
- Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters (Paper) - Facebook
Data Engineering
- Zipline: Airbnb's Machine Learning Data Management Platform - Airbnb
- Sputnik: Airbnb's Apache Spark Framework for Data Engineering - Airbnb
- Introducing Feast: an open source feature store for machine learning (Code) - Gojek
- Feast: Bridging ML Models and Data - Gojek
- Unbundling Data Science Workflows with Metaflow and AWS Step Functions - Netflix
Data Discovery
- Amundsen: Lyft's Data Discovery & Metadata Engine - Lyft
- Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code) - Lyft
- Using Amundsen to Support User Privacy via Metadata Collection at Square - Square
- Democratizing Data at Airbnb - Airbnb
- Databook: Turning Big Data into Knowledge with Metadata at Uber - Uber
- Metacat: Making Big Data Discoverable and Meaningful at Netflix - Netflix
- DataHub: A Generalized Metadata Search & Discovery Tool - LinkedIn
- How We Improved Data Discovery for Data Scientists at Spotify - Spotify
- How We're Solving Data Discovery Challenges at Shopify - Shopify
Classification
- High-Precision Phrase-Based Document Classification on a Modern Scale (Paper) - LinkedIn
- Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper) - WalmartLabs
- Large-scale Item Categorization for e-Commerce (Paper) - DianPing, eBay
- Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper) - NAVER
- Categorizing Products at Scale - Shopify
- Learning to Diagnose with LSTM Recurrent Neural Networks (Paper) - Google
- Discovering and Classifying In-app Message Intent at Airbnb - Airbnb
- How We Built the Good First Issues Feature - GitHub
- Teaching Machines to Triage Firefox Bugs - Mozilla
- Testing Firefox More Efficiently with Machine Learning - Mozilla
- Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper) - Microsoft
- Prediction of Advertiser Churn for Google AdWords (Paper) - Google
- Scalable Data Classification for Security and Privacy (Paper) - Facebook
Regression
- Using Machine Learning to Predict Value of Homes On Airbnb - Airbnb
- Using Machine Learning to Predict the Value of Ad Requests - Twitter
- Open-Sourcing Riskquant, a Library for Quantifying Risk (Code) - Netflix
Recommendation
- Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper) - Amazon
- Temporal-Contextual Recommendation in Real-Time (Paper) - Amazon
- Recommending Complementary Products in E-Commerce Push Notifications (Paper) - Alibaba
- Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper) - Alibaba
- TPG-DNN: A Method for User Intent Prediction with Multi-task Learning (Paper) - Alibaba
- Session-based Recommendations with Recurrent Neural Networks (Paper) - Telefonica
- How 20th Century Fox uses ML to predict a movie audience (Paper) - 20th Century Fox
- Deep Neural Networks for YouTube Recommendations - YouTube
- Personalized Recommendations for Experiences Using Deep Learning - TripAdvisor
- E-commerce in Your Inbox: Product Recommendations at Scale - Yahoo
- Product Recommendations at Scale (Paper) - Yahoo
- Powered by AI: Instagram's Explore recommender system - Facebook
- Netflix Recommendations: Beyond the 5 stars (Part 1) (Part 2) - Netflix
- Learning a Personalized Homepage - Netflix
- Artwork Personalization at Netflix - Netflix
- To Be Continued: Helping you find shows to continue watching on Netflix - Netflix
- Calibrated Recommendations (Paper) - Netflix
- Food Discovery with Uber Eats: Recommending for the Marketplace - Uber
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations - Uber
- How Music Recommendation Works, And Doesn't Work - Spotify
- Music recommendation at Spotify - Spotify
- Recommending Music on Spotify with Deep Learning - Spotify
- For Your Ears Only: Personalizing Spotify Home with Machine Learning - Spotify
- Reach for the Top: How Spotify Built Shortcuts in Just Six Months - Spotify
- Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper) - Spotify
- The Evolution of Kit: Automating Marketing Using Machine Learning - Shopify
- Using Machine Learning to Predict what File you Need Next (Part 1) - Dropbox
- Using Machine Learning to Predict what File you Need Next (Part 2) - Dropbox
- Personalized Recommendations in LinkedIn Learning - LinkedIn
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1) - LinkedIn
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2) - LinkedIn
- Learning to be Relevant: Evolution of a Course Recommendation System (PAPER NEEDED) - LinkedIn
- How TikTok recommends videos #ForYou - ByteDance
- A Meta-Learning Perspective on Cold-Start Recommendations for Items (Paper) - Twitter
- Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval (Paper) - Google
- Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems (Paper) - Google
- Personalized Channel Recommendations in Slack - Slack
Search & Ranking
- Amazon Search: The Joy of Ranking Products (Paper, Video, Code) - Amazon
- Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper) - Amazon
- How Lazada Ranks Products to Improve Customer Experience and Conversion - Lazada
- Using Deep Learning at Scale in Twitter's Timelines - Twitter
- Machine Learning-Powered Search Ranking of Airbnb Experiences - Airbnb
- Applying Deep Learning To Airbnb Search (Paper) - Airbnb
- Managing Diversity in Airbnb Search (Paper) - Airbnb
- Ranking Relevance in Yahoo Search (Paper) - Yahoo
- An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper) - Etsy
- Learning to Rank Personalized Search Results in Professional Networks (Paper) - LinkedIn
- Entity Personalized Talent Search Models with Tree Interaction Features (Paper) - LinkedIn
- In-session Personalization for Talent Search (Paper) - LinkedIn
- The AI Behind LinkedIn Recruiter search and recommendation systems - LinkedIn
- Quality Matches Via Personalized AI for Hirer and Seeker Preferences - LinkedIn
- Understanding Dwell Time to Improve LinkedIn Feed Ranking - LinkedIn
- AI at Scale in Bing - Microsoft
- Query Understanding Engine in Traveloka Universal Search - Traveloka
- The Secret Sauce Behind Search Personalisation - Gojek
- Food Discovery with Uber Eats: Building a Query Understanding Engine - Uber
- Neural Code Search: ML-based Code Search Using Natural Language Queries - Facebook
- Bayesian Product Ranking at Wayfair - Wayfair
- COLD: Towards the Next Generation of Pre-Ranking System (Paper) - Alibaba
- Understanding Searches Better Than Ever Before (Paper) - Google
Embeddings
- Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper) - Alibaba
- Embeddings@Twitter - Twitter
- Listing Embeddings in Search Ranking (Paper) - Airbnb
- Understanding Latent Style - Stitch Fix
- Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper) - LinkedIn
- Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper) - Sears
- Machine Learning for a Better Developer Experience - Netflix
- Announcing ScaNN: Efficient Vector Similarity Search (Paper, Code) - Google
Natural Language Processing
- Abusive Language Detection in Online User Content (Paper) - Yahoo
- How Natural Language Processing Helps LinkedIn Members Get Support Easily - LinkedIn
- Building Smart Replies for Member Messages - LinkedIn
- DeText: A deep NLP Framework for Intelligent Text Understanding (Code) - LinkedIn
- Smart Reply: Automated Response Suggestion for Email (Paper) - Google
- Gmail Smart Compose: Real-Time Assisted Writing (Paper) - Google
- SmartReply for YouTube Creators - Google
- Using Neural Networks to Find Answers in Tables (Paper) - Google
- A Scalable Approach to Reducing Gender Bias in Google Translate - Google
- Assistive AI Makes Replying Easier - Microsoft
- AI Advances to Better Detect Hate Speech - Facebook
- A State-of-the-Art Open Source Chatbot (Paper) - Facebook
- A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs - Facebook
- Deep Learning to Translate Between Programming Languages (Paper, Code) - Facebook
- Deploying Lifelong Open-Domain Dialogue Learning (Paper) - Facebook
- Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper) - Amazon
- How Gojek Uses NLP to Name Pickup Locations at Scale - Gojek
- Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want - Stitch Fix
- The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper) - Baidu
- PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper, Code) - Google
- Photon: A Robust Cross-Domain Text-to-SQL System (Paper) (Demo) - Salesforce
- Applying Topic Modeling to Improve Call Center Operations - RICOH
Sequence Modelling
- Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper) - Alibaba
- Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction (Paper) - Alibaba
- Learning to Diagnose with LSTM Recurrent Neural Networks (Paper) - Google
- Deep Learning for Understanding Consumer Histories (Paper) - Zalando
- Continual Prediction of Notification Attendance with Classical and Deep Networks (Paper) - Telefonica
- Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper) - Sutter Health
- Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper) - Sutter Health
- How Duolingo uses AI in every part of its app - Duolingo
- Leveraging Online Social Interactions For Enhancing Integrity at Facebook (Paper) - Facebook
Forecasting
- Forecasting at Uber: An Introduction - Uber
- Engineering Extreme Event Forecasting at Uber with RNN - Uber
- Transforming Financial Forecasting with Data Science and Machine Learning at Uber - Uber
- Under the Hood of Gojek's Automated Forecasting Tool - Gojek
Computer Vision
- Categorizing Listing Photos at Airbnb - Airbnb
- Amenity Detection and Beyond: New Frontiers of Computer Vision at Airbnb - Airbnb
- Powered by AI: Advancing product understanding and building new shopping experiences - Facebook
- Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning - Dropbox
- How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors - Deepomatic
- A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper) - Google
- Machine Learning-based Damage Assessment for Disaster Relief (Paper) - Google
- RepNet: Counting Repetitions in Videos (Paper) - Google
- Converting Text to Images for Product Discovery (Paper) - Amazon
- How Disney Uses PyTorch for Animated Character Recognition - Disney
- Image Captioning as an Assistive Technology (Video) - IBM
- AI for AG: Production machine learning for agriculture - Blue River
- AI for Full-Self Driving at Tesla - Tesla
- On-device Supermarket Product Recognition - Google
- Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings (Paper) - Google
Reinforcement Learning
- Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper) - Alibaba
- Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper) - Alibaba
- Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper) - Alibaba
- Productionizing Deep Reinforcement Learning with Spark and MLflow - Zynga
- Deep Reinforcement Learning in Production (Part 1) (Part 2) - Zynga
- Building AI Trading Systems - Denny Britz
Anomaly Detection
- Detecting Performance Anomalies in External Firmware Deployments - Netflix
- Detecting and Preventing Abuse on LinkedIn using Isolation Forests (Code) - LinkedIn
- Preventing Abuse Using Unsupervised Learning - LinkedIn
- The Technology Behind Fighting Harassment on LinkedIn - LinkedIn
- Uncovering Insurance Fraud Conspiracy with Network Learning (Paper) - Ant Financial
- How Does Spam Protection Work on Stack Exchange? - Stack Exchange
- Auto Content Moderation in C2C e-Commerce - Mercari
- Blocking Slack Invite Spam With Machine Learning - Slack
- Cloudflare Bot Management: Machine Learning and More - Cloudflare
- Anomalies in Oil Temperature Variations in a Tunnel Boring Machine - SENER
- Using Anomaly Detection to Monitor Low-Risk Bank Customers - Rabobank
Graph
- Building The LinkedIn Knowledge Graph - LinkedIn
- Retail Graph: Walmart's Product Knowledge Graph - Walmart
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations - Uber
- AliGraph: A Comprehensive Graph Neural Network Platform (Paper) - Alibaba
- Scaling Knowledge Access and Retrieval at Airbnb - Airbnb
- Traffic Prediction with Advanced Graph Neural Networks - DeepMind
Optimization
- How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats - Uber
- Next-Generation Optimization for Dasher Dispatch at DoorDash - DoorDash
- Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3) - Lyft
- The Data and Science behind GrabShare Carpooling (PAPER NEEDED) - Grab
- Optimization of Passengers Waiting Time in Elevators Using Machine Learning - Thyssen Krupp AG
Information Extraction
- Unsupervised Extraction of Attributes and Their Values from Product Description (Paper) - Rakuten
- Information Extraction from Receipts with Graph Convolutional Networks - Nanonets
- Using Machine Learning to Index Text from Billions of Images - Dropbox
- Extracting Structured Data from Templatic Documents (Paper) - Google
Weak Supervision
- Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper) - Google
- Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (Paper) - Intel
- Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper) - Apple
- Bootstrapping Conversational Agents with Weak Supervision (Paper) - IBM
Generation
- Better Language Models and Their Implications (Paper) - OpenAI
- Language Models are Few-Shot Learners (Paper) (GPT-3 Blog post) - OpenAI
- Image GPT (Paper, Code) - OpenAI
- Deep Learned Super Resolution for Feature Film Production (Paper) - Pixar
Validation and A/B Testing
- The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (Paper) - Google
- Detecting Interference: An A/B Test of A/B Tests - LinkedIn
- Experimenting to Solve Cramming - Twitter
- Announcing a New Framework for Designing Optimal Experiments with Pyro (Paper) (Paper) - Uber
- Enabling 10x More Experiments with Traveloka Experiment Platform - Traveloka
- Large Scale Experimentation at Stitch Fix (Paper) - Stitch Fix
- Multi-Armed Bandits and the Stitch Fix Experimentation Platform - Stitch Fix
- Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (Code) - Better
- Computational Causal Inference at Netflix (Paper) - Netflix
- Key Challenges with Quasi Experiments at Netflix - Netflix
Ethics
- Building Inclusive Products Through A/B Testing (Paper) - LinkedIn
- LiFT: A Scalable Framework for Measuring Fairness in ML Applications (Paper) - LinkedIn
Practices
- Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper) - Yoshua Bengio
- Machine Learning: The High Interest Credit Card of Technical Debt (Paper) (Paper) - Google
- Rules of Machine Learning: Best Practices for ML Engineering - Google
- On Challenges in Machine Learning Model Management - Amazon
- Machine Learning in Production: The Booking.com Approach - Booking
- 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper) - Booking
- Engineers Shouldn't Write ETL: A Guide to Building a High Functioning Data Science Department - Stitch Fix
- Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist - Stitch Fix
- Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank - Rabobank
Fails
- 160k+ High School Students Will Graduate Only If a Model Allows Them to - International Baccalaureate
- When It Comes to Gorillas, Google Photos Remains Blind - Google
- An Algorithm That "Predicts" Criminality Based on a Face Sparks a Furor - Harrisburg University
- It's Hard to Generate Neural Text From GPT-3 About Muslims - OpenAI
- A British AI Tool to Predict Violent Crime Is Too Flawed to Use - United Kingdom
- More in awful-ai
P.S., Want a summary of ML advancements? Get up to speed with survey papers: ml-surveys