Parallel Computing for Data Science



Table of Contents

Introduction to Parallel Processing in R
Recurring Theme: The Principle of Pretty Good Parallelism
A Note on Machines
Recurring Theme: Hedging One's Bets
Extended Example: Mutual Web Outlinks

"Why Is My Program So Slow?": Obstacles to Speed
Obstacles to Speed
Performance and Hardware Structures
Memory Basics
Network Basics
Latency and Bandwidth
Thread Scheduling
How Many Processes/Threads?
Example: Mutual Outlink Problem
"Big O" Notation
Data Serialization
"Embarrassingly Parallel" Applications

Principles of Parallel Loop Scheduling
General Notions of Loop Scheduling
Chunking in Snow
A Note on Code Complexity
Example: All Possible Regressions
The partools Package
Example: All Possible Regressions, Improved Version
Introducing Another Tool: multicore
Issues with Chunk Size
Example: Parallel Distance Computation
The foreach Package
Another Scheduling Approach: Random Task Permutation
Debugging snow and multicore Code

The Shared Memory Paradigm: A Gentle Introduction through R
So, What Is Actually Shared?
Clarity and Conciseness of Shared-Memory Programming
High-Level Introduction to Shared-Memory Programming: Rdsm Package
Example: Matrix Multiplication
Shared Memory Can Bring a Performance Advantage
Locks and Barriers
Example: Finding the Maximal Burst in a Time Series
Example: Transformation of an Adjacency Matrix
Example: k-Means Clustering

The Shared Memory Paradigm: C Level
Example: Finding the Maximal Burst in a Time Series
OpenMP Loop Scheduling Options
Example: Transforming an Adjacency Matrix
Example: Transforming an Adjacency Matrix, R-Callable Code
Speedup in C
Run Time vs. Development Time
Further Cache/Virtual Memory Issues
Reduction Operations in OpenMP
Intel Thread Building Blocks (TBB)
Lockfree Synchronization

The Shared Memory Paradigm: GPUs
Another Note on Code Complexity
Goal of This Chapter
Introduction to NVIDIA GPUs and CUDA
Example: Mutual Inlinks Problem
Synchronization on GPUs
R and GPUs
The Intel Xeon Phi Chip

Thrust and Rth
Hedging One's Bets
Thrust Overview
Skipping the C++
Example: Finding Quantiles
Introduction to Rth

The Message Passing Paradigm
Message Passing Overview
The Cluster Model
Performance Issues
Example: Pipelined Method for Finding Primes
Memory Allocation Issues
Message-Passing Performance Subtleties

MapReduce Computation
Apache Hadoop
Other MapReduce Systems
R Interfaces to MapReduce Systems
An Alternative: "Snowdoop"

Parallel Sorting and Merging
The Elusive Goal of Optimality
Sorting Algorithms
Example: Bucket Sort in R
Example: Quicksort in OpenMP
Sorting in Rth
Some Timing Comparisons
Sorting on Distributed Data

Parallel Prefix Scan
General Formulation
General Strategies for Parallel Scan Computation
Implementations of Parallel Prefix Scan
Parallel cumsum() with OpenMP
Example: Moving Average

Parallel Matrix Operations
Tiled Matrices
Example: Snowdoop Approach to Matrix Operations
Parallel Matrix Multiplication
BLAS Libraries
Example: A Look at the Performance of OpenBLAS
Example: Graph Connectedness
Solving Systems of Linear Equations
Sparse Matrices

Inherently Statistical Approaches: Subset Methods
Chunk Averaging
Bag of Little Bootstraps
Subsetting Variables

Appendix A: Review of Matrix Algebra
Appendix B: R Quick Start
Appendix C: Introduction to C for R Programmers

About the Author

Dr. Norman Matloff is a professor of computer science at the University of California, Davis, where he was a founding member of the Department of Statistics. He is a statistical consultant and a former database software developer. He has published numerous articles in prestigious journals, such as the ACM Transactions on Database Systems, ACM Transactions on Modeling and Computer Simulation, Annals of Probability, Biometrika, Communications of the ACM, and IEEE Transactions on Data Engineering. He earned a PhD in pure mathematics from UCLA, specializing in probability/functional analysis and statistics.


"From my reading of the book, Matloff achieves his goals, and in doing so he has provided a volume that will be immensely useful to a very wide audience. I can see it being used as a reference by data analysts, statisticians, engineers, econometricians, biometricians, etc. This would apply to both established researchers and graduate students. This book provides exactly the sort of information that this audience is looking for, and it is presented in a very accessible and friendly manner."
-Econometrics Beat: Dave Giles' Blog, July 2015

"The author has correctly recognized that there is a pressing need for a thorough, but readable guide to parallel computing - one that can be used by researchers and students in a wide range of disciplines. In my view, this book will meet that need. ... For me and colleagues in my field, I would see this as a 'must-have' reference book - one that would be well thumbed!"
-David E. Giles, University of Victoria

"This is a book that I will use, both as a reference and for instruction. The examples are poignant and the presentation moves the reader directly from concept to working code."
-Michael Kane, Yale University

"Matloff's Parallel Computing for Data Science: With Examples in R, C++ and CUDA can be recommended to colleagues and students alike, and the author is to be congratulated for taming a difficult and exhaustive body of topics via a very accessible primer."
-Dirk Eddelbuettel, Debian and R Projects
