Chapter 2

Chapter 2, Integrity and Inspection, explains the importance of cleaning data through recipes about trimming whitespace, lexing, and regular expression matching.

This is the accompanying source code for Haskell Data Analysis Cookbook. Refer to the book for step-by-step explanations.

Recipes

  • Code01: Trimming excess whitespace
  • Code02: Ignoring punctuation and specific characters
  • Code03: Coping with unexpected or missing input
  • Code04: Validating records by matching regular expressions
  • Code05: Lexing and parsing an e-mail address
  • Code06: Deduplication of nonconflicting data items
  • Code07: Deduplication of conflicting data items
  • Code08: Implementing a frequency table using Data.List
  • Code09: Implementing a frequency table using Data.MultiSet
  • Code10: Computing the Manhattan distance
  • Code11: Computing the Euclidean distance
  • Code12: Comparing scaled data using the Pearson correlation coefficient
  • Code13: Comparing sparse data using cosine similarity
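To give a flavor of what a few of these recipes cover, here is a minimal sketch of whitespace trimming, a Data.List frequency table, and the Manhattan and Euclidean distances. It is not the code from the recipe directories: the function names (trim, frequency, manhattan, euclidean) and the sample inputs are assumptions chosen for illustration, and the book's own implementations may differ.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, group, sort)

-- Hypothetical helper: strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Hypothetical helper: count how often each distinct item occurs.
-- Sorting first places equal items next to each other so that
-- 'group' can collect them; the result is ordered by item.
frequency :: Ord a => [a] -> [(a, Int)]
frequency xs = [(x, length g) | g@(x:_) <- group (sort xs)]

-- Manhattan distance: sum of absolute coordinate differences.
-- 'zipWith' stops at the shorter list, so both points are assumed
-- to have the same number of dimensions.
manhattan :: [Double] -> [Double] -> Double
manhattan xs ys = sum (zipWith (\x y -> abs (x - y)) xs ys)

-- Euclidean distance: square root of the sum of squared differences.
euclidean :: [Double] -> [Double] -> Double
euclidean xs ys = sqrt (sum (zipWith (\x y -> (x - y) ^ (2 :: Int)) xs ys))

main :: IO ()
main = do
  print (trim "  hello world \n")   -- "hello world"
  print (frequency "mississippi")   -- [('i',4),('m',1),('p',2),('s',4)]
  print (manhattan [0, 0] [3, 4])   -- 7.0
  print (euclidean [0, 0] [3, 4])   -- 5.0
```

The sketch uses only the base library, so it should compile with a plain ghc invocation. Sorting before grouping costs O(n log n); a dedicated structure such as Data.MultiSet, the subject of Code09, is an alternative for larger inputs.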

How to use

Setting up the environment

Install the Haskell Platform.

$ sudo apt-get install haskell-platform

Alternatively, install GHC 7.6 (or above) and Cabal.

$ sudo apt-get install ghc cabal-install

Running the code

Each recipe directory includes a Makefile. From inside that directory, compile the recipe's executable by running make.

$ make

Run the resulting executable. For example:

$ ./Code01

To clean up the directory:

$ make clean