Table of Contents

* Preface
* Introduction
* Chapter 1 Data Exploration as a Process
* Chapter 2 The Nature of the World and Its Impact on Data Preparation
* Chapter 3 Data Preparation as a Process
* Chapter 4 Getting the Data: Basic Preparation
* Chapter 5 Sampling, Variability and Confidence
* Chapter 6 Handling Non-Numerical Variables
* Chapter 7 Normalizing and Redistributing Variables
* Chapter 8 Replacing Missing and Empty Values
* Chapter 9 Series Variables
* Chapter 10 Preparing the Data Set
* Chapter 11 The Data Survey
* Chapter 12 Using Prepared Data
* Appendix A Using the Demonstration Code on the CD
* Appendix B Further Reading
* Index

Promotional Information

* Offers in-depth coverage of an essential but largely ignored subject.
* Goes far beyond theory, leading you step by step through the author's own data preparation techniques.
* Provides practical illustrations of the author's methodology using realistic sample data sets.
* Includes algorithms you can apply directly to your own project, along with instructions for understanding when automation is possible and when greater intervention is required.
* Explains how to identify and correct data problems that may be present in your application.
* Prepares miners, helping them head into preparation with a better understanding of data sets and their limitations.

About the Author

Dorian Pyle is Chief Scientist and Founder of PTI (www.pti.com), which develops and markets Powerhouse™ predictive and explanatory analytics software. He has over 20 years of experience in the artificial intelligence and machine learning techniques used in what is known today as "data mining" or "predictive analytics." He has applied this knowledge as a consultant with Knowledge Stream Partners, Xchange, Naviant, Thinking Machines, and Data Miners, with various companies directly involved in credit card marketing for banks, and with manufacturing companies using industrial automation. In 1976 he was involved in building artificially intelligent machine learning systems using the pioneering technologies now known as neural computing and associative memories. He is current in the most advanced data mining technologies, including entropic analysis (information theory), chaotic and fractal decomposition, neural technologies, evolution and genetic optimization, algebra evolvers, case-based reasoning, concept induction, and other advanced statistical techniques.
