

| I: x | I: y | II: x | II: y | III: x | III: y | IV: x | IV: y |
|---|---|---|---|---|---|---|---|
| 10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
| 8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
| 13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
| 9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
| 11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
| 14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
| 6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
| 4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
| 12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
| 7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
| 5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |

| Property | Value |
|---|---|
| Mean of x | 9 |
| Variance of x | 11 |
| Mean of y | 7.50 |
| Variance of y | 4.125 |
| Correlation between x and y | 0.816 |
| Linear regression | y = 3.00 + 0.500x |
| Coefficient of determination of the linear regression | 0.67 |
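
The summary values can be checked directly from the four datasets. The following is a minimal sketch using only the Python standard library; the helper name `summarize` is hypothetical, and the data are transcribed from the first table. It assumes "variance" means the sample variance (dividing by n - 1).

```python
from math import sqrt
from statistics import mean, variance

# Datasets I-III share the same x values; dataset IV has its own.
x123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
x4 = [8.0] * 7 + [19.0] + [8.0] * 3

quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics listed in the Property/Value table."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx                 # least-squares slope
    intercept = my - slope * mx       # least-squares intercept
    r = sxy / sqrt(sxx * syy)         # Pearson correlation
    return {
        "mean_x": mx,
        "var_x": variance(xs),        # sample variance (n - 1)
        "mean_y": my,
        "var_y": variance(ys),
        "r": r,
        "intercept": intercept,
        "slope": slope,
        "r2": r * r,                  # coefficient of determination
    }


for name, (xs, ys) in quartet.items():
    s = summarize(xs, ys)
    print(name, {k: round(v, 3) for k, v in s.items()})
```

All four datasets print nearly identical numbers, which is the point of the example: the statistics alone do not reveal how different the datasets look when plotted.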


Ask friends and family
That's inferring statistics from a sample of n = 1
Data-based decisions


Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
... to work effectively with heterogeneous, real-world data and to extract insights from the data using the latest tools and analytical methods.
Develop locally
Too ambiguous!! 🤦 Let's go beyond that
Can you fit it on one computer?
Yes? Then it is not really big 🤷
Big data: big overhead
Big Data?
How do you compute this?
80+ trillion photos (80,000,000,000,000)
That's big data
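
A back-of-the-envelope check makes the "does it fit on one computer?" test concrete. This is a rough sketch only: the photo count comes from the slide, while the average photo size and the disk capacity are hypothetical values chosen for illustration, not measured figures.

```python
PHOTO_COUNT = 80 * 10**12      # "80+ trillion photos", from the slide
AVG_PHOTO_BYTES = 2 * 10**6    # ASSUMPTION: 2 MB per photo (hypothetical)
DISK_BYTES = 16 * 10**12       # ASSUMPTION: one 16 TB disk (hypothetical)


def disks_needed(total_bytes, disk_bytes):
    """Number of whole disks required, using integer ceiling division."""
    return -(-total_bytes // disk_bytes)


total_bytes = PHOTO_COUNT * AVG_PHOTO_BYTES
print(f"total size: {total_bytes // 10**18} EB")
print(f"disks needed: {disks_needed(total_bytes, DISK_BYTES):,}")
```

Under these assumptions the collection is far beyond what a single machine can hold, so the computation has to be distributed.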
How do you compute this?
Big Data? Only if it doesn't fit on one 💻
⚠️ Use it only if you must ⚠️
My wife tells me that all the time!
Machine Learning?
A distributed computing alternative to MapReduce.

| Data type | Example tools |
|---|---|
| 1-D Linear | Document Lens, SeeSoft, Info Mural |
| 2-D Map | GIS, ArcView, PageMaker, Medical imagery |
| 3-D World | CAD, Medical, Molecules, Architecture |
| Multi-Var | Spotfire, Tableau, GGobi, TableLens, ParCoords |
| Temporal | LifeLines, TimeSearcher, Palantir, DataMontage, LifeFlow |
| Tree | Cone/Cam/Hyperbolic, SpaceTree, Treemap, Treeversity |
| Network | Gephi, NodeXL, Sigmajs |


Task: Changes in a drug's adverse-effect reports
User: FDA Analysts
Task: Detect fraud networks
User: Undisclosed Analysts