PRACTICAL MACHINE LEARNING ON DATABRICKS: seamlessly transition ML models and MLOps on Databricks

Published
Birmingham, UK : Packt Publishing Ltd., 2023.
Status
Available Online

Description

Take your machine learning skills to the next level by mastering Databricks and building robust ML pipeline solutions for future ML innovations.

Key Features
Learn to build robust ML pipeline solutions for your Databricks transition
Master commonly available features like AutoML and MLflow
Leverage data governance and model deployment using the MLflow Model Registry
Purchase of the print or Kindle book includes a free PDF eBook

Book Description
Unleash the potential of Databricks for end-to-end machine learning with this comprehensive guide, tailored for experienced data scientists and developers transitioning from DIY or other cloud platforms. Building on a strong foundation in Python, Practical Machine Learning on Databricks serves as your roadmap from development to production, covering all intermediary steps using the Databricks platform. You'll start with an overview of machine learning applications, Databricks platform features, and MLflow. Next, you'll dive into data preparation, model selection, and training essentials and discover the power of Databricks Feature Store for precomputing feature tables. You'll also learn to kickstart your projects using Databricks AutoML and automate retraining and deployment through Databricks Workflows. By the end of this book, you'll have mastered MLflow for experiment tracking, collaboration, and advanced use cases like model interpretability and governance. The book is enriched with hands-on example code at every step. While primarily focused on generally available features, the book equips you to easily adapt to future innovations in machine learning, Databricks, and MLflow.

What you will learn
Transition smoothly from DIY setups to Databricks
Master AutoML for quick ML experiment setup
Automate model retraining and deployment
Leverage Databricks Feature Store for data prep
Use MLflow for effective experiment tracking
Gain practical insights for scalable ML solutions
Find out how to handle model drift in production environments

Who this book is for
This book is for experienced data scientists, engineers, and developers proficient in Python, statistics, and the ML lifecycle who are looking to transition to Databricks from DIY clouds. Introductory Spark knowledge is a must to make the most of this book; however, end-to-end ML workflows will be covered. If you aim to accelerate your machine learning workflows and deploy scalable, robust solutions, this book is an indispensable resource.

More Details

Edition
1st edition.
Language
English
ISBN
9781801818292, 1801818290

Notes

Local note
O'Reilly Online Learning: Academic/Public Library Edition

Table of Contents

Cover
Title page
Copyright and credits
Contributors
About the author
About the reviewers
Table of Contents
Preface
Part 1: Introduction
Chapter 1: The ML Process and Its Challenges
Understanding the typical machine learning process
Discovering the roles associated with machine learning projects in organizations
Challenges with productionizing machine learning use cases in organizations
Understanding the requirements of an enterprise-grade machine learning platform
Scalability: the growth catalyst
Performance: ensuring efficiency and speed
Security: safeguarding data and models
Governance: steering the machine learning life cycle
Reproducibility: ensuring trust and consistency
Ease of use: balancing complexity and usability
Exploring Databricks and the Lakehouse architecture
Scalability: the growth catalyst
Performance: ensuring efficiency and speed
Security: safeguarding data and models
Governance: steering the machine learning life cycle
Reproducibility: ensuring trust and consistency
Ease of use: balancing complexity and usability
Simplifying machine learning development with the Lakehouse architecture
Summary
Further reading
Chapter 2: Overview of ML on Databricks
Technical requirements
Setting up a Databricks trial account
Exploring the workspace
Repos
Exploring clusters
Single user
Shared
No isolation shared
Single-node clusters
Exploring notebooks
Exploring data
Exploring experiments
Discovering the feature store
Discovering the model registry
Libraries
Storing libraries
Managing libraries
Databricks Runtime and libraries
Library usage modes
Unity Catalog limitations
Installation sources for libraries
Summary
Further reading
Part 2: ML Pipeline Components and Implementation
Chapter 3: Utilizing the Feature Store
Technical requirements
Diving into feature stores and the problems they solve
Discovering feature stores on the Databricks platform
Feature table
Offline store
Online store
Training Set
Model packaging
Registering your first feature table in Databricks Feature Store
Summary
Further reading
Chapter 4: Understanding MLflow Components on Databricks
Technical requirements
Overview of MLflow
MLflow Tracking
MLflow Models
MLflow Model Registry
Example code showing how to track ML model training in Databricks
Summary
Chapter 5: Create a Baseline Model Using Databricks AutoML
Technical requirements
Understanding the need for AutoML
Understanding AutoML in Databricks
Sampling large datasets
Imbalance data detection
Splitting data into train/validation/test sets
Enhancing semantic type detection
Shapley value (SHAP) for model explainability
Feature Store integration
Running AutoML on our churn prediction dataset
Summary
Further reading
Part 3: ML Governance and Deployment
Chapter 6: Model Versioning and Webhooks
Technical requirements
Understanding the need for the Model Registry
Registering your candidate model to the Model Registry and managing access
Diving into the webhooks support in the Model Registry
Summary
Further reading
Chapter 7: Model Deployment Approaches
Technical requirements
Understanding ML deployments and paradigms
Deploying ML models for batch and streaming inference
Batch inference on Databricks
Streaming inference on Databricks
Deploying ML models for real-time inference
In-depth analysis of the constraints and capabilities of Databricks Model Serving
Incorporating custom Python libraries into MLflow models for Databricks deployment
Deploying custom models with MLflow and Model Serving
Packaging dependencies with MLflow models
Summary
Further reading
Chapter 8: Automating ML Workflows Using Databricks Jobs
Technical requirements
Understanding Databricks Workflows
Utilizing Databricks Workflows with Jobs to automate model training and testing
Summary
Further reading
Chapter 9: Model Drift Detection and Retraining
Technical requirements
The motivation behind model monitoring
Introduction to model drift
Introduction to Statistical Drift
Techniques for drift detection
Hypothesis testing
Statistical tests and measurements for numeric features
Statistical tests and measurements for categorical features
Statistical tests and measurements on models
Implementing drift detection on Databricks
Summary
Chapter 10: Using CI/CD to Automate Model Retraining and Redeployment
Introduction to MLOps
Delta Lake: more than just a data lake
Comprehensive model management with Databricks MLflow
Integrating DevOps and MLOps for robust ML pipelines with Databricks
Fundamentals of MLOps and deployment patterns
Navigating environment isolation in Databricks: multiple strategies for MLOps
Understanding ML deployment patterns
The deploy models approach
The deploy code approach
Summary
Further reading
Index
Other Books You May Enjoy

Citations

APA Citation, 7th Edition (style guide)

Sinha, D. (2023). PRACTICAL MACHINE LEARNING ON DATABRICKS: seamlessly transition ML models and MLOps on Databricks (1st ed.). Packt Publishing Ltd.

Chicago / Turabian - Author Date Citation, 17th Edition (style guide)

Sinha, Debu. 2023. PRACTICAL MACHINE LEARNING ON DATABRICKS: Seamlessly Transition ML Models and MLOps on Databricks. Birmingham, UK: Packt Publishing Ltd.

Chicago / Turabian - Humanities (Notes and Bibliography) Citation, 17th Edition (style guide)

Sinha, Debu. PRACTICAL MACHINE LEARNING ON DATABRICKS: Seamlessly Transition ML Models and MLOps on Databricks. Birmingham, UK: Packt Publishing Ltd., 2023.

Harvard Citation (style guide)

Sinha, D. (2023). PRACTICAL MACHINE LEARNING ON DATABRICKS: seamlessly transition ML models and MLOps on Databricks. 1st edn. Birmingham, UK: Packt Publishing Ltd.

MLA Citation, 9th Edition (style guide)

Sinha, Debu. PRACTICAL MACHINE LEARNING ON DATABRICKS: Seamlessly Transition ML Models and MLOps on Databricks. 1st ed., Packt Publishing Ltd., 2023.

Note! Citations contain only title, author, edition, publisher, and year published. Citations should be used as a guideline and should be double checked for accuracy. Citation formats are based on standards as of August 2021.

Staff View

Grouped Work ID
e5fdfd95-c7b0-cfd9-f736-83399ccd3ae1-eng

Grouping Information

Grouped Work ID: e5fdfd95-c7b0-cfd9-f736-83399ccd3ae1-eng
Full title: practical machine learning on databricks seamlessly transition ml models and mlops on databricks
Author: sinha debu
Grouping Category: book
Last Update: 2025-01-24 12:33:29PM
Last Indexed: 2025-05-22 03:43:05AM

Book Cover Information

Image Source: default
First Loaded: May 30, 2025
Last Used: May 30, 2025

Marc Record

First Detected: Dec 16, 2024 11:27:19 PM
Last File Modification Time: Dec 17, 2024 08:26:44 AM
Suppressed: Record had no items

MARC Record

LEADER 09029cam a22003977a 4500
001 on1407093858
003 OCoLC
005 20241217082456.0
006 m     o  d        
007 cr |n|||||||||
008 231103s2022    enk     o     000 0 eng d
020 |a 9781801818292|q (electronic bk.)
020 |a 1801818290|q (electronic bk.)
035 |a (OCoLC)1407093858
037 |a 9781801812030|b O'Reilly Media
040 |a YDX|b eng|c YDX|d OCLCO|d ORMDA|d DXU
049 |a MAIN
050 4|a Q325.5
08204|a 006.3/1|2 23/eng/20231205
1001 |a Sinha, Debu,|e author.
24510|a PRACTICAL MACHINE LEARNING ON DATABRICKS|h [electronic resource] :|b seamlessly transition ML models and MLOps on Databricks /|c Debu Sinha.
250 |a 1st edition.
260 |a Birmingham, UK :|b Packt Publishing Ltd.,|c 2023.
300 |a 1 online resource
5050 |a Cover -- Title page -- Copyright and credits -- Contributors -- About the author -- About the reviewers -- Table of Contents -- Preface -- Part 1: Introduction -- Chapter 1: The ML Process and Its Challenges -- Understanding the typical machine learning process -- Discovering the roles associated with machine learning projects in organizations -- Challenges with productionizing machine learning use cases in organizations -- Understanding the requirements of an enterprise-grade machine learning platform -- Scalability -- the growth catalyst -- Performance -- ensuring efficiency and speed -- Security -- safeguarding data and models -- Governance -- steering the machine learning life cycle -- Reproducibility -- ensuring trust and consistency -- Ease of use -- balancing complexity and usability -- Exploring Databricks and the Lakehouse architecture -- Scalability -- the growth catalyst -- Performance -- ensuring efficiency and speed -- Security -- safeguarding data and models -- Governance -- steering the machine learning life cycle -- Reproducibility -- ensuring trust and consistency -- Ease of use -- balancing complexity and usability -- Simplifying machine learning development with the Lakehouse architecture -- Summary -- Further reading -- Chapter 2: Overview of ML on Databricks -- Technical requirements -- Setting up a Databricks trial account -- Exploring the workspace -- Repos -- Exploring clusters -- Single user -- Shared -- No isolation shared -- Single-node clusters -- Exploring notebooks -- Exploring data -- Exploring experiments -- Discovering the feature store -- Discovering the model registry -- Libraries -- Storing libraries -- Managing libraries -- Databricks Runtime and libraries -- Library usage modes -- Unity Catalog limitations -- Installation sources for libraries -- Summary -- Further reading.
5058 |a Part 2: ML Pipeline Components and Implementation -- Chapter 3: Utilizing the Feature Store -- Technical requirements -- Diving into feature stores and the problems they solve -- Discovering feature stores on the Databricks platform -- Feature table -- Offline store -- Online store -- Training Set -- Model packaging -- Registering your first feature table in Databricks Feature Store -- Summary -- Further reading -- Chapter 4: Understanding MLflow Components on Databricks -- Technical requirements -- Overview of MLflow -- MLflow Tracking -- MLflow Models -- MLflow Model Registry -- Example code showing how to track ML model training in Databricks -- Summary -- Chapter 5: Create a Baseline Model Using Databricks AutoML -- Technical requirements -- Understanding the need for AutoML -- Understanding AutoML in Databricks -- Sampling large datasets -- Imbalance data detection -- Splitting data into train/validation/test sets -- Enhancing semantic type detection -- Shapley value (SHAP) for model explainability -- Feature Store integration -- Running AutoML on our churn prediction dataset -- Summary -- Further reading -- Part 3: ML Governance and Deployment -- Chapter 6: Model Versioning and Webhooks -- Technical requirements -- Understanding the need for the Model Registry -- Registering your candidate model to the Model Registry and managing access -- Diving into the webhooks support in the Model Registry -- Summary -- Further reading -- Chapter 7: Model Deployment Approaches -- Technical requirements -- Understanding ML deployments and paradigms -- Deploying ML models for batch and streaming inference -- Batch inference on Databricks -- Streaming inference on Databricks -- Deploying ML models for real-time inference -- In-depth analysis of the constraints and capabilities of Databricks Model Serving.
5058 |a Incorporating custom Python libraries into MLflow models for Databricks deployment -- Deploying custom models with MLflow and Model Serving -- Packaging dependencies with MLflow models -- Summary -- Further reading -- Chapter 8: Automating ML Workflows Using Databricks Jobs -- Technical requirements -- Understanding Databricks Workflows -- Utilizing Databricks Workflows with Jobs to automate model training and testing -- Summary -- Further reading -- Chapter 9: Model Drift Detection and Retraining -- Technical requirements -- The motivation behind model monitoring -- Introduction to model drift -- Introduction to Statistical Drift -- Techniques for drift detection -- Hypothesis testing -- Statistical tests and measurements for numeric features -- Statistical tests and measurements for categorical features -- Statistical tests and measurements on models -- Implementing drift detection on Databricks -- Summary -- Chapter 10: Using CI/CD to Automate Model Retraining and Redeployment -- Introduction to MLOps -- Delta Lake -- more than just a data lake -- Comprehensive model management with Databricks MLflow -- Integrating DevOps and MLOps for robust ML pipelines with Databricks -- Fundamentals of MLOps and deployment patterns -- Navigating environment isolation in Databricks -- multiple strategies for MLOps -- Understanding ML deployment patterns -- The deploy models approach -- The deploy code approach -- Summary -- Further reading -- Index -- Other Books You May Enjoy.
520 |a Take your machine learning skills to the next level by mastering databricks and building robust ML pipeline solutions for future ML innovations Key Features Learn to build robust ML pipeline solutions for databricks transition Master commonly available features like AutoML and MLflow Leverage data governance and model deployment using MLflow model registry Purchase of the print or Kindle book includes a free PDF eBook Book Description Unleash the potential of databricks for end-to-end machine learning with this comprehensive guide, tailored for experienced data scientists and developers transitioning from DIY or other cloud platforms. Building on a strong foundation in Python, Practical Machine Learning on Databricks serves as your roadmap from development to production, covering all intermediary steps using the databricks platform. You'll start with an overview of machine learning applications, databricks platform features, and MLflow. Next, you'll dive into data preparation, model selection, and training essentials and discover the power of databricks feature store for precomputing feature tables. You'll also learn to kickstart your projects using databricks AutoML and automate retraining and deployment through databricks workflows. By the end of this book, you'll have mastered MLflow for experiment tracking, collaboration, and advanced use cases like model interpretability and governance. The book is enriched with hands-on example code at every step. While primarily focused on generally available features, the book equips you to easily adapt to future innovations in machine learning, databricks, and MLflow. 
What you will learn Transition smoothly from DIY setups to databricks Master AutoML for quick ML experiment setup Automate model retraining and deployment Leverage databricks feature store for data prep Use MLflow for effective experiment tracking Gain practical insights for scalable ML solutions Find out how to handle model drifts in production environments Who this book is for This book is for experienced data scientists, engineers, and developers proficient in Python, statistics, and ML lifecycle looking to transition to databricks from DIY clouds. Introductory Spark knowledge is a must to make the most out of this book, however, end-to-end ML workflows will be covered. If you aim to accelerate your machine learning workflows and deploy scalable, robust solutions, this book is an indispensable resource.
590 |a O'Reilly|b O'Reilly Online Learning: Academic/Public Library Edition
650 0|a Machine learning|x Computer programs.
77608|i Print version:|z 9781801818292
77608|i Print version:|z 1801812039|z 9781801812030|w (OCoLC)1337143940
85640|u https://library.access.arlingtonva.us/login?url=https://learning.oreilly.com/library/view/~/9781801812030/?ar|x O'Reilly|z eBook
938 |a YBP Library Services|b YANK|n 305792890
994 |a 92|b VIA
999 |c 360015|d 360015