John Alexis Guerra Gómez
@duto_guerra
https://johnguerra.co/slides/questForInsights2
Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data
... to work effectively with heterogeneous, real-world data and to extract insights from the data using the latest tools and analytical methods.
I: x | I: y | II: x | II: y | III: x | III: y | IV: x | IV: y |
---|---|---|---|---|---|---|---|
10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |
Property | Value |
---|---|
Mean of x | 9 |
Variance of x | 11 |
Mean of y | 7.50 |
Variance of y | 4.125 |
Correlation between x and y | 0.816 |
Linear regression | y = 3.00 + 0.500x |
Coefficient of determination of the linear regression | 0.67 |
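The four datasets above are Anscombe's quartet. A minimal stdlib-only Python sketch that recomputes the Property/Value table for each dataset; `DATASETS` and `summarize` are illustrative names, and the variances are sample variances (dividing by n - 1), which is what reproduces the listed 11 and 4.125.

```python
from statistics import mean, variance

# Anscombe's quartet, copied from the table above: (x values, y values) per dataset.
X_123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
DATASETS = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0] * 7 + [19.0] + [8.0] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(xs, ys):
    """Return the statistics listed in the Property/Value table."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / (sxx * syy) ** 0.5   # Pearson correlation
    slope = sxy / sxx              # least-squares slope
    intercept = my - slope * mx    # least-squares intercept
    return {
        "mean_x": mx, "var_x": variance(xs),  # sample variance (n - 1)
        "mean_y": my, "var_y": variance(ys),
        "r": r, "intercept": intercept, "slope": slope, "r2": r * r,
    }

for name, (xs, ys) in DATASETS.items():
    s = summarize(xs, ys)
    print(f"{name:>3}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.2f} "
          f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.3f} r={s['r']:.3f} "
          f"y={s['intercept']:.2f}+{s['slope']:.3f}x R2={s['r2']:.2f}")
```

All four rows come out nearly identical, even though plotting the four datasets shows very different shapes.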
Ask friends and family
That's inferring statistics from a sample of size n = 1
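A small simulation of why n = 1 is a poor sample, assuming a hypothetical population in which 30% of people would answer "yes" (the 0.3 is made up for illustration). It repeats the "survey" many times and measures how much the estimate jumps around for each sample size; `sample_share` and `spread_of_estimates` are hypothetical helper names.

```python
import random
from statistics import pstdev

# Hypothetical population: a share TRUE_SHARE of people answer "yes".
# This value is an illustrative assumption, not data from the slides.
random.seed(42)
TRUE_SHARE = 0.3

def sample_share(n: int) -> float:
    """Estimate the 'yes' share by asking n randomly chosen people."""
    yes = sum(1 for _ in range(n) if random.random() < TRUE_SHARE)
    return yes / n

def spread_of_estimates(n: int, trials: int = 2000) -> float:
    """Standard deviation of the estimate across many repeated samples of size n."""
    return pstdev([sample_share(n) for _ in range(trials)])

for n in (1, 10, 100):
    print(f"n={n:>3}: typical error of the estimate ~ {spread_of_estimates(n):.3f}")
```

With one person the estimate can only be 0 or 1, so its typical error is about sqrt(0.3 × 0.7) ≈ 0.46; asking 100 people shrinks it by roughly a factor of 10.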
Data-based decisions
Data type | Example tools |
---|---|
1-D Linear | Document Lens, SeeSoft, Info Mural |
2-D Map | GIS, ArcView, PageMaker, Medical imagery |
3-D World | CAD, Medical, Molecules, Architecture |
Multi-Var | Spotfire, Tableau, GGobi, TableLens, ParCoords |
Temporal | LifeLines, TimeSearcher, Palantir, DataMontage, LifeFlow |
Tree | Cone/Cam/Hyperbolic, SpaceTree, Treemap, Treeversity |
Network | Gephi, NodeXL, Sigma.js |
Too ambiguous!! 🤦🏽‍♂️ Let's go beyond that
Can you fit it in one computer?
Yes? 👉🏼 Then it's not really big 🤷🏽‍♂️
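A back-of-the-envelope version of the "does it fit?" question, as a minimal sketch. Assumptions (not from the slides): every value takes 8 bytes, and the data should use at most half of the machine's RAM to leave room for intermediate results; `fits_on_one_machine` is a hypothetical helper.

```python
# Rough "can you fit it in one computer?" check for a tabular dataset.
# ASSUMPTIONS (hypothetical): 8 bytes per value, and the raw data may use
# at most `headroom` (default half) of the machine's RAM.

def fits_on_one_machine(rows: int, cols: int, ram_bytes: int,
                        bytes_per_value: int = 8, headroom: float = 0.5) -> bool:
    """Return True if rows x cols values fit within headroom * ram_bytes."""
    dataset_bytes = rows * cols * bytes_per_value
    return dataset_bytes <= headroom * ram_bytes

GB = 10**9
# 100 million rows x 20 columns = 16 GB of raw values, under half of 64 GB.
print(fits_on_one_machine(100_000_000, 20, ram_bytes=64 * GB))
# 10 billion rows x 20 columns = 1.6 TB of raw values, far over half of 64 GB.
print(fits_on_one_machine(10_000_000_000, 20, ram_bytes=64 * GB))
```

The first call prints `True` and the second `False`; real data with strings, indexes, or intermediate copies can need several times the raw size.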
Big data 👉🏼 Big overhead
Big Data?
How do you compute this?
80+ trillion photos (80,000,000,000,000)
That's big data
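Rough arithmetic on the photo count above, as a sketch. The 1 MB average photo size and the 20 TB of disk per machine are hypothetical assumptions, not figures from the slides.

```python
# Back-of-the-envelope storage estimate for the photo count on the slide.
# ASSUMPTIONS (hypothetical): average photo size and disk capacity per machine.
PHOTOS = 80 * 10**12           # "80+ trillion photos"
AVG_PHOTO_BYTES = 1 * 10**6    # hypothetical: 1 MB per compressed photo
DISK_BYTES = 20 * 10**12       # hypothetical: one machine with 20 TB of disk

total_bytes = PHOTOS * AVG_PHOTO_BYTES
machines_needed = -(-total_bytes // DISK_BYTES)  # ceiling division on integers

print(f"total: {total_bytes / 10**18:.0f} EB")
print(f"machines needed just to store it: {machines_needed:,}")
```

Under these assumptions that is 80 EB, or 4,000,000 machines just for storage, before any computation.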
How do you compute this?
Big Data? 👉🏼 Only if it doesn't fit on one 💻
⚠️ Use it only if you must ⚠️
My wife tells me that all the time!
Machine Learning?
Develop locally
Task: Detect changes in reports of a drug's adverse effects
User: FDA Analysts
Task: Detect fraud networks
User: Undisclosed Analysts
A distributed computing alternative to MapReduce.
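For reference, a single-process Python sketch of the MapReduce pattern that such an alternative replaces, using word count as the example job. It only illustrates the map, shuffle, and reduce phases; a real framework runs them distributed across many machines. The function names are illustrative, not any framework's API.

```python
from collections import defaultdict
from itertools import chain

# Single-process illustration of the three MapReduce phases (map, shuffle,
# reduce), with word count as the example job.

def map_phase(line: str):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

lines = ["big data big overhead", "use it only if you must"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, vs) for k, vs in shuffle_phase(mapped).items())
print(counts)
```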
Traditional | Data Mining/ML | InfoVis |
---|---|---|
Pros: | Pros: | Pros: |
Cons: | Cons: | Cons: |