22 August 2015

Excel Basics

Big Data

CAP Theorem

  • Consistency
  • Availability
  • Partition Tolerance

CA -> SQL Server, MySQL, PostgreSQL

AP -> Cassandra, CouchDB, DynamoDB

CP -> MongoDB, Memcached, HBase, Redis

##Big Data Definition

  • 100s of TB up to multiple PB
  • Uses Hadoop
  • Three Vs: volume, velocity, variety
  • Too big for OLTP
  • Use distributed/parallel processing

##MapReduce

Hadoop is an implementation of MapReduce

  • Map step: split the data and pre-process each piece
  • Reduce step: aggregate the results
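
The two steps above can be sketched in plain Python as a word count. This is only a single-process illustration, not how Hadoop itself runs jobs; the function names (`map_step`, `shuffle`, `reduce_step`, `word_count`) and the sample input are assumptions made up for this example.

```python
from collections import defaultdict


def map_step(chunk):
    """Map: pre-process one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key, the job a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(key, values):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


def word_count(chunks):
    # The input arrives already split into chunks; each chunk is mapped
    # independently, which is what allows the work to be distributed.
    mapped = [pair for chunk in chunks for pair in map_step(chunk)]
    return dict(reduce_step(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    # Hypothetical sample input, one string per "split".
    print(word_count(["big data big", "data hadoop"]))
```

In a real cluster each chunk would be mapped on a different node and the grouping would happen over the network; here everything runs in one process to keep the idea visible.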

Hadoop

It’s an Apache project combining a MapReduce engine with a distributed file system (HDFS).

Hadoop stack:

  • Log file integration
  • Machine Learning/Data Mining
  • RDBMS Import/Export
  • Query: HiveQL and Pig Latin
  • Database
  • MapReduce, HDFS

MPP: Massively Parallel Processing. Uses column stores.
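
A rough sketch of why a column store can help with aggregation: summing one field only needs to scan that field's values, not every whole row. The table contents and field names below are hypothetical, and real column stores add compression and on-disk layouts that this toy example ignores.

```python
# Hypothetical sample table in row-oriented form.
rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 20.0},
    {"id": 3, "region": "east", "amount": 5.0},
]


def to_columns(rows):
    """Pivot a row-oriented table into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole row is visited to sum a single field.
total_from_rows = sum(row["amount"] for row in rows)

# Column layout: only the "amount" column is scanned.
total_from_columns = sum(columns["amount"])
```

Both totals are the same; the difference is how much data has to be read to get there.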

NoSQL+Big Data

R language


