Practical Data Science with R, Second Edition

Name: Practical Data Science with R, Second Edition
Availability: Online only
Author: Mount, John

Author

Mount, John

Published

Manning Publications, 2019.

Status

Available Online

Links

O'Reilly

More Details

Edition

2nd edition.

Language

English

UPC

9781617295874

Notes

Description

Practical Data Science with R, Second Edition is a task-based tutorial that leads readers through dozens of useful data analysis practices using the R language. By concentrating on the most important tasks you'll face on the job, this friendly guide suits both business analysts and data scientists. Because data is only useful if it can be understood, you'll also find tips for organizing and presenting data in tables, as well as snappy visualizations.

Issuing Body

Made available through: Safari, an O'Reilly Media Company.

Local note

O'Reilly, O'Reilly Online Learning: Academic/Public Library Edition

Subjects

LC Subjects

Data mining.
Mathematical statistics -- Data processing.
R (Computer program language)

Citations

APA Citation, 7th Edition

Mount, J., & Zumel, N. (2019). Practical Data Science with R, Second Edition (2nd ed.). Manning Publications.

Chicago / Turabian - Author Date Citation, 17th Edition

Mount, John, and Nina Zumel. 2019. Practical Data Science with R, Second Edition. Manning Publications.

Chicago / Turabian - Humanities (Notes and Bibliography) Citation, 17th Edition

Mount, John, and Nina Zumel. Practical Data Science with R, Second Edition. Manning Publications, 2019.

Harvard Citation

Mount, J. and Zumel, N. (2019). Practical data science with R, second edition. 2nd edn. Manning Publications.

MLA Citation, 9th Edition

Mount, John, and Nina Zumel. Practical Data Science with R, Second Edition. 2nd ed., Manning Publications, 2019.

Note: Citations contain only title, author, edition, publisher, and year published. Use them as a guideline and double-check them for accuracy. Citation formats are based on standards as of August 2021.

Staff View

Grouped Work ID

8f9828fb-5504-c634-2756-77281b3c3dfa-eng

Grouping Information

Grouped Work ID	8f9828fb-5504-c634-2756-77281b3c3dfa-eng
Full title	practical data science with r
Author	mount john
Grouping Category	book
Last Update	2024-12-17 08:30:41AM
Last Indexed	2024-12-17 08:34:45AM

Book Cover Information

Image Source	contentCafe
First Loaded	Aug 5, 2023
Last Used	Dec 16, 2024

Marc Record

First Detected	Mar 21, 2023 01:15:33 PM
Last File Modification Time	Dec 17, 2024 08:11:54 AM
Suppressed	Record had no items

MARC Record

LEADER	11534cam a22004937a 4500
001	on1139769167
003	OCoLC
005	20241217080935.0
006	m o d
007	cr cnu\|\|\|\|\|\|\|\|
008	200208s2019 xx o 000 0 eng
024	8		\|a 9781617295874
035			\|a (OCoLC)1139769167
040			\|a AU@\|b eng\|e pn\|c AU@\|d TOH\|d OCLCO\|d CZL\|d OCLCO\|d OCLCQ\|d OCLCO
049			\|a MAIN
082	0	4	\|a 004\|q OCoLC\|2 23/eng/20230216
100	1		\|a Mount, John,\|e author.
245	1	0	\|a Practical Data Science with R, Second Edition /\|c Mount, John.
250			\|a 2nd edition.
264		1	\|b Manning Publications,\|c 2019.
300			\|a 1 online resource (568 pages)
336			\|a text\|b txt\|2 rdacontent
337			\|a computer\|b c\|2 rdamedia
338			\|a online resource\|b cr\|2 rdacarrier
347			\|a text file
505	0		\|a Intro -- Practical Data Science with R, Second Edition -- Nina Zumel and John Mount -- Copyright -- Dedication -- Brief Table of Contents -- Table of Contents -- Praise for the First Edition -- front matter -- Foreword -- Preface -- Acknowledgments -- About This Book -- What is data science? -- Roadmap -- Audience -- What is not in this book? -- Code conventions and downloads -- Working with this book -- Downloading the book's supporting materials/repository -- Book forum -- About the Authors -- About the Foreword Authors -- About the Cover Illustration -- Part 1. Introduction to data science -- Chapter 1. The data science process -- 1.1. The roles in a data science project -- 1.1.1. Project roles -- 1.2. Stages of a data science project -- 1.2.1. Defining the goal -- 1.2.2. Data collection and management -- 1.2.3. Modeling -- 1.2.4. Model evaluation and critique -- 1.2.5. Presentation and documentation -- 1.2.6. Model deployment and maintenance -- 1.3. Setting expectations -- 1.3.1. Determining lower bounds on model performance -- Summary -- Chapter 2. Starting with R and data -- 2.1. Starting with R -- 2.1.1. Installing R, tools, and examples -- 2.1.2. R programming -- 2.2. Working with data from files -- 2.2.1. Working with well-structured data from files or URLs -- 2.2.2. Using R with less-structured data -- 2.3. Working with relational databases -- 2.3.1. A production-size example -- Summary -- Chapter 3. Exploring data -- 3.1. Using summary statistics to spot problems -- 3.1.1. Typical problems revealed by data summaries -- 3.2. Spotting problems using graphics and visualization -- 3.2.1. Visually checking distributions for a single variable -- 3.2.2. Visually checking relationships between two variables -- Summary -- Chapter 4. Managing data -- 4.1. Cleaning data -- 4.1.1. Domain-specific data cleaning -- 4.1.2. Treating missing values.
505	8		\|a 4.1.3. The vtreat package for automatically treating missing variables -- 4.2. Data transformations -- 4.2.1. Normalization -- 4.2.2. Centering and scaling -- 4.2.3. Log transformations for skewed and wide distributions -- 4.3. Sampling for modeling and validation -- 4.3.1. Test and training splits -- 4.3.2. Creating a sample group column -- 4.3.3. Record grouping -- 4.3.4. Data provenance -- Summary -- Chapter 5. Data engineering and data shaping -- 5.1. Data selection -- 5.1.1. Subsetting rows and columns -- 5.1.2. Removing records with incomplete data -- 5.1.3. Ordering rows -- 5.2. Basic data transforms -- 5.2.1. Adding new columns -- 5.2.2. Other simple operations -- 5.3. Aggregating transforms -- 5.3.1. Combining many rows into summary rows -- 5.4. Multitable data transforms -- 5.4.1. Combining two or more ordered data frames quickly -- 5.4.2. Principal methods to combine data from multiple tables -- 5.5. Reshaping transforms -- 5.5.1. Moving data from wide to tall form -- 5.5.2. Moving data from tall to wide form -- 5.5.3. Data coordinates -- Summary -- Part 2. Modeling methods -- Chapter 6. Choosing and evaluating models -- 6.1. Mapping problems to machine learning tasks -- 6.1.1. Classification problems -- 6.1.2. Scoring problems -- 6.1.3. Grouping: working without known targets -- 6.1.4. Problem-to-method mapping -- 6.2. Evaluating models -- 6.2.1. Overfitting -- 6.2.2. Measures of model performance -- 6.2.3. Evaluating classification models -- 6.2.4. Evaluating scoring models -- 6.2.5. Evaluating probability models -- 6.3. Local interpretable model-agnostic explanations (LIME) for explaining model predictions -- 6.3.1. LIME: Automated sanity checking -- 6.3.2. Walking through LIME: A small example -- 6.3.3. LIME for text classification -- 6.3.4. Training the text classifier -- 6.3.5. Explaining the classifier's predictions -- Summary.
505	8		\|a Chapter 7. Linear and logistic regression -- 7.1. Using linear regression -- 7.1.1. Understanding linear regression -- 7.1.2. Building a linear regression model -- 7.1.3. Making predictions -- 7.1.4. Finding relations and extracting advice -- 7.1.5. Reading the model summary and characterizing coefficient quality -- 7.1.6. Linear regression takeaways -- 7.2. Using logistic regression -- 7.2.1. Understanding logistic regression -- 7.2.2. Building a logistic regression model -- 7.2.3. Making predictions -- 7.2.4. Finding relations and extracting advice from logistic models -- 7.2.5. Reading the model summary and characterizing coefficients -- 7.2.6. Logistic regression takeaways -- 7.3. Regularization -- 7.3.1. An example of quasi-separation -- 7.3.2. The types of regularized regression -- 7.3.3. Regularized regression with glmnet -- Summary -- Chapter 8. Advanced data preparation -- 8.1. The purpose of the vtreat package -- 8.2. KDD and KDD Cup 2009 -- 8.2.1. Getting started with KDD Cup 2009 data -- 8.2.2. The bull-in-the-china-shop approach -- 8.3. Basic data preparation for classification -- 8.3.1. The variable score frame -- 8.4. Advanced data preparation for classification -- 8.4.1. Using mkCrossFrameCExperiment() -- 8.4.2. Building a model -- Building a multivariable model -- Evaluating the model -- 8.5. Preparing data for regression modeling -- 8.6. Mastering the vtreat package -- 8.6.1. The vtreat phases -- 8.6.2. Missing values -- 8.6.3. Indicator variables -- 8.6.4. Impact coding -- 8.6.5. The treatment plan -- 8.6.6. The cross-frame -- Summary -- Chapter 9. Unsupervised methods -- 9.1. Cluster analysis -- 9.1.1. Distances -- 9.1.2. Preparing the data -- 9.1.3. Hierarchical clustering with hclust -- 9.1.4. The k-means algorithm -- 9.1.5. Assigning new points to clusters -- 9.1.6. Clustering takeaways -- 9.2. Association rules.
505	8		\|a 9.2.1. Overview of association rules -- 9.2.2. The example problem -- 9.2.3. Mining association rules with the arules package -- 9.2.4. Association rule takeaways -- Summary -- Chapter 10. Exploring advanced methods -- 10.1. Tree-based methods -- 10.1.1. A basic decision tree -- 10.1.2. Using bagging to improve prediction -- 10.1.3. Using random forests to further improve prediction -- 10.1.4. Gradient-boosted trees -- 10.1.5. Tree-based model takeaways -- 10.2. Using generalized additive models (GAMs) to learn non-monotone relationships -- 10.2.1. Understanding GAMs -- 10.2.2. A one-dimensional regression example -- 10.2.3. Extracting the non-linear relationships -- 10.2.4. Using GAM on actual data -- 10.2.5. Using GAM for logistic regression -- 10.2.6. GAM takeaways -- 10.3. Solving "inseparable" problems using support vector machines -- 10.3.1. Using an SVM to solve a problem -- 10.3.2. Understanding support vector machines -- 10.3.3. Understanding kernel functions -- 10.3.4. Support vector machine and kernel methods takeaways -- Summary -- Part 3. Working in the real world -- Chapter 11. Documentation and deployment -- 11.1. Predicting buzz -- 11.2. Using R markdown to produce milestone documentation -- 11.2.1. What is R markdown? -- 11.2.2. knitr technical details -- 11.2.3. Using knitr to document the Buzz data and produce the model -- 11.3. Using comments and version control for running documentation -- 11.3.1. Writing effective comments -- 11.3.2. Using version control to record history -- 11.3.3. Using version control to explore your project -- 11.3.4. Using version control to share work -- 11.4. Deploying models -- 11.4.1. Deploying demonstrations using Shiny -- 11.4.2. Deploying models as HTTP services -- 11.4.3. Deploying models by export -- 11.4.4. What to take away -- Summary -- Chapter 12. Producing effective presentations.
505	8		\|a 12.1. Presenting your results to the project sponsor -- 12.1.1. Summarizing the project's goals -- 12.1.2. Stating the project's results -- 12.1.3. Filling in the details -- 12.1.4. Making recommendations and discussing future work -- 12.1.5. Project sponsor presentation takeaways -- 12.2. Presenting your model to end users -- 12.2.1. Summarizing the project goals -- 12.2.2. Showing how the model fits user workflow -- 12.2.3. Showing how to use the model -- 12.2.4. End user presentation takeaways -- 12.3. Presenting your work to other data scientists -- 12.3.1. Introducing the problem -- 12.3.2. Discussing related work -- 12.3.3. Discussing your approach -- 12.3.4. Discussing results and future work -- 12.3.5. Peer presentation takeaways -- Summary -- Appendix A. Starting with R and other tools -- A.1. Installing the tools -- A.1.1. Installing Tools -- A.1.2. The R package system -- A.1.3. Installing Git -- A.1.4. Installing RStudio -- A.1.5. R resources -- A.2. Starting with R -- A.2.1. Primary features of R -- A.2.2. Primary R data types -- A.3. Using databases with R -- A.3.1. Running database queries using a query generator -- A.3.2. How to think relationally about data -- A.4. The takeaway -- Appendix B. Important statistical concepts -- B.1. Distributions -- B.1.1. Normal distribution -- B.1.2. Summarizing R's distribution naming conventions -- B.1.3. Lognormal distribution -- B.1.4. Binomial distribution -- B.1.5. More R tools for distributions -- B.2. Statistical theory -- B.2.1. Statistical philosophy -- B.2.2. A/B tests -- B.2.3. Power of tests -- B.2.4. Specialized statistical tests -- B.3. Examples of the statistical view of data -- B.3.1. Sampling bias -- B.3.2. Omitted variable bias -- B.4. The takeaway -- Appendix C. Bibliography -- Practical Data Science with R -- Index -- List of Figures -- List of Tables -- List of Listings.
520			\|a Practical Data Science with R, Second Edition is a task-based tutorial that leads readers through dozens of useful, data analysis practices using the R language. By concentrating on the most important tasks you'll face on the job, this friendly guide is comfortable both for business analysts and data scientists. Because data is only useful if it can be understood, you'll also find fantastic tips for organizing and presenting data in tables, as well as snappy visualizations.
542			\|f © 2019 Manning Publications Co. All rights reserved.\|g 2019
550			\|a Made available through: Safari, an O'Reilly Media Company.
588			\|a Online resource; Title from title page (viewed December 23, 2019)
590			\|a O'Reilly\|b O'Reilly Online Learning: Academic/Public Library Edition
650		0	\|a R (Computer program language)\|9 74517
650		0	\|a Data mining.\|9 71797
650		0	\|a Mathematical statistics\|x Data processing.\|9 46679
700	1		\|a Zumel, Nina,\|e author.\|9 429050
710	2		\|a Safari, an O'Reilly Media Company.
856	4	0	\|u https://library.access.arlingtonva.us/login?url=https://learning.oreilly.com/library/view/~/9781617295874/?ar\|x O'Reilly\|z eBook
936			\|a BATCHLOAD
994			\|a 92\|b VIA
999			\|c 290425\|d 290425
