John Alexis Guerra Gómez
@duto_guerra
Can you fit it in one computer?
Yes? -> Then it's not really big
Let's call it big data only if it doesn't fit on one computer (and has the 3Vs: volume, velocity, variety)
If it fits on one computer, you don't need all the overhead of big data technologies; just use a traditional relational database.
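When the data does fit on one machine, a single SQL query in an ordinary relational database already answers a counting question. Below is a minimal sketch using Python's built-in `sqlite3` module; the `photos` table and its `user_id` column are hypothetical, made up only for illustration.

```python
import sqlite3

# Hypothetical schema: one row per photo, tagged with its owner's user id.
conn = sqlite3.connect(":memory:")  # in-memory DB; a file path works the same way
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany(
    "INSERT INTO photos (user_id) VALUES (?)",
    [(1,), (1,), (2,), (3,), (3,), (3,)],
)

# Single-machine aggregation: total photos, and photos per user.
total = conn.execute("SELECT COUNT(*) FROM photos").fetchone()[0]
per_user = conn.execute(
    "SELECT user_id, COUNT(*) FROM photos GROUP BY user_id ORDER BY user_id"
).fetchall()

print(total)     # 6
print(per_user)  # [(1, 2), (2, 1), (3, 3)]
```

No cluster, no replication, no coordination: this is the overhead you avoid when the data fits on one computer.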
How do you compute this?
Put all your photos on one computer
Go through the whole collection and count
80+ trillion photos (80,000,000,000,000)
That's big data
How do you compute this?
Distribute the data among hundreds of thousands of computers (a cluster).
Compute subtotals on each chunk of the data. (Map)
Aggregate the subtotals into one big total. (Reduce)
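The three steps above can be simulated on a single machine, using lists as stand-ins for the computers in the cluster. This is a simplified sketch, not the API of Hadoop or any other framework: the round-robin chunking helper, the `(user_id, photo_name)` records, and the per-user counting are assumptions for illustration. On a real cluster each `map_chunk` call would run on a different computer.

```python
from collections import Counter
from functools import reduce

def split_into_chunks(items, n_chunks):
    """Deal items round-robin into n_chunks lists (stand-in for distributing data to machines)."""
    chunks = [[] for _ in range(n_chunks)]
    for i, item in enumerate(items):
        chunks[i % n_chunks].append(item)
    return chunks

def map_chunk(chunk):
    """Map: each machine computes a subtotal (photos per user) over its own chunk only."""
    return Counter(user_id for user_id, _photo in chunk)

def reduce_counts(a, b):
    """Reduce: merge two partial subtotals into one."""
    return a + b

# Hypothetical data: (user_id, photo_name) pairs.
photos = [(1, "a.jpg"), (2, "b.jpg"), (1, "c.jpg"), (3, "d.jpg"), (3, "e.jpg"), (1, "f.jpg")]

chunks = split_into_chunks(photos, n_chunks=3)
subtotals = list(map(map_chunk, chunks))  # would run in parallel on a real cluster
totals = reduce(reduce_counts, subtotals, Counter())

print(sum(totals.values()))    # 6
print(sorted(totals.items()))  # [(1, 3), (2, 1), (3, 2)]
```

The result matches a sequential count over the whole list; the gain is that no single machine ever has to see all the data.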
How many computers? total data size / one computer's capacity?
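A back-of-the-envelope version of this division is sketched below. Only the 80 trillion figure comes from the slides; the average photo size and the per-machine capacity are hypothetical round numbers chosen for illustration.

```python
def machines_needed(n_items, bytes_per_item, bytes_per_machine):
    """Minimum machines to hold one copy of the data: ceil(total / one computer's capacity)."""
    total_bytes = n_items * bytes_per_item
    return -(-total_bytes // bytes_per_machine)  # integer ceiling division, no float rounding

# Hypothetical figures: 80 trillion photos (from the slides), ~2 MB each, 10 TB per machine.
N_PHOTOS = 80 * 10**12
PHOTO_BYTES = 2 * 10**6
MACHINE_BYTES = 10 * 10**12

print(machines_needed(N_PHOTOS, PHOTO_BYTES, MACHINE_BYTES))  # 16000000
```

Integer arithmetic is used on purpose: at these magnitudes floating-point division can round, and a partially filled machine still counts as a whole machine.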
What if one computer breaks down?
We need redundancy -> each photo is stored on many computers
How do we control versions? How do we keep records? What goes where?
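One simple answer to "what goes where?" is to hash each photo's id to pick a primary machine and store extra copies on the next machines in the list. The sketch below is hypothetical: the node names, the `place` helper, and the use of SHA-256 are illustrative assumptions, and three copies is used only because it matches HDFS's default replication factor. Real systems such as HDFS also keep a separate metadata service (the NameNode) that records where every block lives, which covers the "how do we keep records?" question.

```python
import hashlib

def place(photo_id, nodes, replicas=3):
    """Pick `replicas` distinct nodes: hash the id to a primary node, then take the following nodes."""
    if replicas > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    # Stable hash (Python's built-in hash() of a str changes between runs).
    digest = hashlib.sha256(photo_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + k) % len(nodes)] for k in range(replicas)]

nodes = [f"node-{i}" for i in range(5)]  # hypothetical cluster of 5 machines
owners = place("photo-42.jpg", nodes)
print(owners)  # three distinct node names

# If one computer breaks down, the photo is still readable from the other copies.
failed = owners[0]
survivors = [n for n in owners if n != failed]
print(len(survivors))  # 2
```

Plain modulo placement moves most photos whenever a machine is added or removed; consistent hashing is the usual refinement when the cluster size changes often.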
That's why we need big data!!
A distributed computing alternative to MapReduce.
Traditional
Pros:
Cons:
Data Mining/ML
Pros:
Cons:
InfoVis
Pros:
Cons:
Adapted from: Tamara Munzner book chapter
Data type | Example systems |
1-D Linear | Document Lens, SeeSoft, Info Mural |
2-D Map | GIS, ArcView, PageMaker, Medical imagery |
3-D World | CAD, Medical, Molecules, Architecture |
Multi-Var | Spotfire, Tableau, GGobi, TableLens, ParCoords |
Temporal | LifeLines, TimeSearcher, Palantir, DataMontage, LifeFlow |
Tree | Cone/Cam/Hyperbolic, SpaceTree, Treemap, Treeversity |
Network | Gephi, NodeXL, Sigma.js |