MR (Hadoop) Job Execution Patterns
The execution of an MR job involves 6 steps. The steps labeled in the diagram:
1: Job submission (the MR program is submitted to the master node)
3: Map phase (concurrent map tasks on worker nodes)
4: Shuffle phase
5: Reduce phase (concurrent reduce tasks on worker nodes)
6: Reduce output is stored back to the distributed file system
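The map, shuffle, and reduce phases named on this slide can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not Hadoop code: the word-count workload, the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`), and the byte-sum partitioner are all hypothetical choices made for the example.

```python
from collections import defaultdict


def map_phase(splits):
    """Step 3: each map task turns one input split into (key, value) pairs."""
    return [[(word, 1) for word in split.split()] for split in splits]


def shuffle_phase(map_outputs, num_reducers):
    """Step 4: route every pair to a reducer partition and group values by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for output in map_outputs:
        for key, value in output:
            # Hypothetical deterministic partitioner: sum of the key's bytes.
            index = sum(key.encode("utf-8")) % num_reducers
            partitions[index][key].append(value)
    return partitions


def reduce_phase(partitions):
    """Step 5: each reduce task aggregates the values of the keys it owns."""
    return [{key: sum(values) for key, values in part.items()}
            for part in partitions]


def run_job(splits, num_reducers=2):
    """Run the three phases in order and merge the reduce outputs."""
    reduce_outputs = reduce_phase(shuffle_phase(map_phase(splits), num_reducers))
    # Step 6: a real job writes each reduce output to the distributed file
    # system; this sketch just merges them into one dictionary.
    result = {}
    for output in reduce_outputs:
        result.update(output)
    return result


if __name__ == "__main__":
    print(run_job(["a b a", "b c"]))
```

In a real cluster the map tasks and the reduce tasks each run concurrently on worker nodes; the lists of per-task outputs above stand in for that parallelism.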
A MapReduce (MR) job is resource-consuming:
1: Input data scan in the Map phase => local or remote I/Os
2: Store intermediate results of Map output => local I/Os
3: Transfer data across nodes in the Shuffle phase => network costs
4: Store final results of this MR job => local I/Os + network costs (replicate data)
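The four cost sources above can be tallied with a rough back-of-the-envelope model. The sketch below is an illustration only: the function name `estimate_job_costs`, its parameters, and the simplifications (all shuffle data crosses the network, one local write of the final output plus one network transfer per extra replica) are assumptions, not measurements or Hadoop's own accounting.

```python
def estimate_job_costs(input_bytes, map_output_bytes, final_output_bytes,
                       remote_read_fraction=0.0, replication=3):
    """Return a rough byte count per cost category for one MR job.

    All parameters are hypothetical inputs to this sketch. The default
    replication of 3 is assumed as a typical HDFS setting.
    """
    # 1: input scan in the Map phase => local or remote I/Os
    remote_read = round(input_bytes * remote_read_fraction)
    local_read = input_bytes - remote_read

    # 2: intermediate Map output is stored on local disk => local I/Os
    map_spill = map_output_bytes

    # 3: Shuffle phase => network costs (assumes every byte crosses the network)
    shuffle_transfer = map_output_bytes

    # 4: final results => one local write plus network copies for the replicas
    final_write = final_output_bytes
    replica_transfer = final_output_bytes * max(replication - 1, 0)

    return {
        "local_io": local_read + map_spill + final_write,
        "remote_io": remote_read,
        "network": shuffle_transfer + replica_transfer,
    }


if __name__ == "__main__":
    print(estimate_job_costs(100, 40, 10, remote_read_fraction=0.25))
```

Even in this simplified model, a job that shrinks its data early (small map output) pays far less in the shuffle and output steps than one that carries the full input through every phase.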
A big data warehouse is built on top of Hadoop: Hive
Hive is a data warehouse system
• Accepts SQL queries
• Handles large data volumes, e.g. 300 PB at Facebook
• Scalable and fault-tolerant through Hadoop
• Runs on large-scale clusters
An open source system for production use
• Major users: eBay, Facebook, LinkedIn, Spotify, Netflix, Taobao, Tencent, Yahoo!, and ...
  – Plus major software vendors: IBM, Microsoft, Teradata, ...
• Active open source development community
  – 100+ developers have worked on 3000+ tickets in recent years
• Latest release: Hive 0.13.1