Big Data
22 August 2015
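## CAP Theorem
A distributed data store can guarantee at most two of these three properties at the same time:
- Consistency: every read sees the most recent write
- Availability: every request receives a response
- Partition Tolerance: the system keeps working when the network splits nodes apart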
- CA -> SQL Server, MySQL, PostgreSQL
- AP -> Cassandra, CouchDB, DynamoDB
- CP -> MongoDB, MemcacheDB, HBase, Redis
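To make the CP vs. AP trade-off concrete, here is a minimal toy model, assuming two replicas that normally copy every write to each other. It is not how any of the databases listed above work internally; the class names `Replica` and `ToyCluster` and the `mode` flag are hypothetical. When a partition is simulated, the CP variant refuses writes to stay consistent, while the AP variant keeps accepting them and lets the replicas diverge.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a key/value store (hypothetical toy model)."""
    data: dict = field(default_factory=dict)


class ToyCluster:
    """Two replicas that normally copy every write to each other.

    mode="CP": refuse writes while partitioned (consistent, but not available).
    mode="AP": accept writes on either side (available, but replicas may diverge).
    """

    def __init__(self, mode: str):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, node: Replica, key: str, value: str) -> bool:
        if not self.partitioned:
            # Healthy network: replicate synchronously to both copies.
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.mode == "CP":
            # The other replica is unreachable, so reject rather than diverge.
            return False
        # AP: accept locally; the other side won't see it until the partition heals.
        node.data[key] = value
        return True

    def consistent(self) -> bool:
        return self.a.data == self.b.data


cp = ToyCluster("CP")
cp.write(cp.a, "x", "1")
cp.partitioned = True
print(cp.write(cp.a, "x", "2"), cp.consistent())  # False True

ap = ToyCluster("AP")
ap.write(ap.a, "x", "1")
ap.partitioned = True
print(ap.write(ap.a, "x", "2"), ap.consistent())  # True False
```

A CA system has no branch here because it assumes partitions never happen, which is why that label usually fits single-node relational databases.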
## Big Data Definition
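- Hundreds of terabytes up to several petabytes
- Typically processed with Hadoop
- Three Vs: Volume, Velocity, Variety
- Too big for traditional OLTP databases
- Requires distributed/parallel processing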
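A quick back-of-the-envelope calculation shows why data at this scale needs parallel processing. The sketch below assumes each node reads sequentially at 100 MB/s (an illustrative figure, not a measurement) and that work splits perfectly evenly across nodes, ignoring coordination overhead; the function name `scan_hours` is hypothetical.

```python
def scan_hours(data_tb: float, mb_per_s_per_node: float, nodes: int) -> float:
    """Hours to read data_tb terabytes once, assuming perfectly even parallelism."""
    total_mb = data_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    seconds = total_mb / (mb_per_s_per_node * nodes)
    return seconds / 3600


print(round(scan_hours(100, 100, 1), 1))     # ~277.8 hours, about 11.6 days on one node
print(round(scan_hours(100, 100, 1000), 2))  # ~0.28 hours, about 17 minutes on 1000 nodes
```

Real clusters scale less than linearly, but the orders of magnitude explain why a single OLTP server is not the right tool.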
## MapReduce
Hadoop is an implementation of MapReduce.
- Map step: split the input data and pre-process each piece into intermediate key/value pairs
- Reduce step: aggregate the values for each key into the final result
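The classic illustration is word count. The following is a single-process sketch in plain Python, not Hadoop's Java API; the function names are hypothetical. It also shows the shuffle (group-by-key) step that the framework performs between Map and Reduce.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_step(line: str) -> Iterator[tuple[str, int]]:
    # Pre-process one input split: normalize case and emit (word, 1) pairs.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key (Hadoop does this between map and reduce).
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key: str, values: list[int]) -> tuple[str, int]:
    # Aggregate all values collected for one key.
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_step(line))
    grouped = shuffle(intermediate)
    return dict(reduce_step(k, v) for k, v in grouped.items())


print(word_count(["the cat", "The dog"]))  # {'the': 2, 'cat': 1, 'dog': 1}
```

In a real cluster each `map_step` call runs on the node holding that piece of data, and each key's values are sent to one reducer.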
## Hadoop
It’s an Apache project combining a MapReduce engine with a distributed file system (HDFS).
Hadoop stack:
- Log file integration
- Machine Learning / Data Mining
- RDBMS Import/Export
- Query: HiveQL and Pig Latin
- Database
- MapReduce, HDFS
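HDFS stores each file as fixed-size blocks, each replicated to several nodes; by default MapReduce typically schedules one map task per block. The sketch below uses the HDFS defaults of 128 MB blocks (Hadoop 2.x and later; 64 MB in Hadoop 1.x) and a replication factor of 3. The function `hdfs_footprint` is hypothetical and only does the arithmetic; it does not talk to a cluster.

```python
import math

BLOCK_SIZE_MB = 128  # HDFS default in Hadoop 2.x and later (dfs.blocksize); 64 MB in 1.x
REPLICATION = 3      # HDFS default replication factor (dfs.replication)


def hdfs_footprint(file_mb: float, block_mb: int = BLOCK_SIZE_MB,
                   replication: int = REPLICATION) -> tuple[int, float]:
    """Return (number of blocks, raw MB stored across the cluster) for one file."""
    blocks = math.ceil(file_mb / block_mb)
    # The last block only occupies as much disk as the data it holds,
    # so raw usage is file size times replication, not blocks * block size.
    raw_mb = file_mb * replication
    return blocks, raw_mb


print(hdfs_footprint(300))   # (3, 900)
print(hdfs_footprint(1024))  # (8, 3072)
```

Replication is what lets Hadoop tolerate node failures and also gives the scheduler a choice of nodes where a block's data is already local.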
MPP (Massively Parallel Processing) databases: often use column stores
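The benefit of a column store for analytics is that an aggregate over one column only reads that column. This is a toy sketch with a hypothetical three-row table, not any particular MPP product; the round-robin split in `mpp_total` is an assumed distribution scheme used to show each node computing a partial result that is then combined.

```python
from typing import Any

# Hypothetical sales table with three columns.
rows: list[dict[str, Any]] = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
    {"id": 3, "region": "EU", "amount": 50.0},
]


def total_row_store(table: list[dict[str, Any]]) -> float:
    # Row store: each record is stored together, so summing one column
    # still walks every full record.
    return sum(record["amount"] for record in table)


# Column store: each column is kept as its own contiguous list.
columns: dict[str, list[Any]] = {name: [r[name] for r in rows] for name in rows[0]}


def total_column_store(cols: dict[str, list[Any]]) -> float:
    # Only the "amount" column is touched; "id" and "region" are never read.
    return sum(cols["amount"])


def mpp_total(cols: dict[str, list[Any]], nodes: int) -> float:
    amounts = cols["amount"]
    # Each node holds every nodes-th value (round-robin) and sums its own slice;
    # a coordinator then adds up the partial sums.
    partials = [sum(amounts[i::nodes]) for i in range(nodes)]
    return sum(partials)


print(total_row_store(rows), total_column_store(columns), mpp_total(columns, 2))
# 250.0 250.0 250.0
```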
NoSQL+Big Data
R language