Course slides for "System Software and Software Security" (PPT lecture notes, in English): Lecture-3-MR-model-and-systems

Major Issues of Big Data Systems
• Access patterns are unpredictable
  – Data analytics can be done in different ways
• Locality may not be a major concern
  – Every data item may be important (e.g. a keyword search)
• Major concerns
  – Scale out: throughput increases as the number of nodes increases
  – Fault tolerance (nodes fail frequently)
  – Low-cost processing for increasingly large volumes
• These are not largely considered in existing systems

MapReduce Data Processing Engine
• A simple but effective programming model designed to process huge volumes of data concurrently
• Two unique properties
  – Minimum dependency among tasks (almost sharing nothing)
  – Simple task operations in each node (low-cost machines are sufficient)
• Two strong merits for big data analytics
  – Scalability (Amdahl's Law): increase throughput by increasing the number of nodes
  – Fault tolerance (quick and low-cost recovery from task failures)
• Hadoop is a widely used implementation of MapReduce
  – Used in thousands of society-dependent corporations and organizations for big data analytics: AOL, Baidu, eBay, Facebook, IBM, NY Times, Yahoo!
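The scalability merit refers to Amdahl's Law, which bounds the speedup of a job by the fraction of its work that can run in parallel. The following is a minimal sketch of that formula and is not part of the original slides; the function name `amdahl_speedup` and the sample parallel fractions are assumptions chosen for illustration.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Speedup predicted by Amdahl's Law when a job runs on `nodes` nodes.

    parallel_fraction is the share of the work that can be spread across
    nodes; the remainder is assumed to run serially on one node.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Hypothetical parallel fractions; tasks that share almost nothing
    # push the parallel fraction toward 1.0.
    for p in (0.90, 0.99, 1.0):
        print(p, [round(amdahl_speedup(p, n), 1) for n in (1, 10, 100)])
```

The closer the parallel fraction is to 1.0, the closer the speedup tracks the node count, which is why minimal dependency among tasks matters for scaling out.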

MapReduce Operations on Hadoop
• Get the average salary of each of 2 organizations in a huge file.
  Input: {name: (org., salary)}    Output: {org.: avg. salary}

Original key/value pairs: all the person names, each associated with an org name and a salary

  Key (Name)    Value (dept., salary)
  Alice         (Org-1, 3000)
  Bob           (Org-2, 3500)
  ...           ...

Result key/value pairs: two entries showing the org name and the corresponding average salary

  Key (dept.)   Value (avg. salary)
  Org-1         ...
  Org-2         ...
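The transformation from {name: (org., salary)} to {org.: avg. salary} can be sketched in plain Python as a map step, a grouping step, and a reduce step. This is an in-memory illustration of the programming model only, not Hadoop code. The Alice and Bob records come from the slide; Carol, Dave, and all function names are hypothetical, and the grouping and reduce steps follow the usual MapReduce pattern rather than anything shown in this excerpt.

```python
from collections import defaultdict

# Alice and Bob are from the slide; Carol and Dave are made-up records
# added so that each organization has more than one salary to average.
records = {
    "Alice": ("Org-1", 3000),
    "Bob": ("Org-2", 3500),
    "Carol": ("Org-1", 5000),
    "Dave": ("Org-2", 2500),
}


def map_record(name, value):
    """Map: drop the person name and re-key the record by organization."""
    org, salary = value
    return org, salary


def group_by_key(pairs):
    """Group: collect all salaries that share the same organization key."""
    groups = defaultdict(list)
    for org, salary in pairs:
        groups[org].append(salary)
    return groups


def reduce_org(org, salaries):
    """Reduce: collapse one organization's salaries into their average."""
    return org, sum(salaries) / len(salaries)


def average_salary(all_records):
    mapped = [map_record(name, value) for name, value in all_records.items()]
    grouped = group_by_key(mapped)
    return dict(reduce_org(org, salaries) for org, salaries in grouped.items())


if __name__ == "__main__":
    print(average_salary(records))  # {'Org-1': 4000.0, 'Org-2': 3000.0}
```

Each call to `map_record` depends only on its own record, which is the "almost sharing nothing" property that lets the map step run on many nodes at once.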

MapReduce Operations on Hadoop
• Calculate the average salary of every organization
  Input: {name: (org., salary)}    Output: {org.: avg. salary}
[Figure: the Hadoop Distributed File System (HDFS), with one HDFS block labeled]

MapReduce Operations on Hadoop
• Calculate the average salary of every department
  Input: {name: (org., salary)}    Output: {org.: avg. salary}
• 3 Map tasks concurrently process the input data
• Each map task takes 4 HDFS blocks as its input and extracts {org.: salary} as new key/value pairs, e.g. {Alice: (org-1, 3000)} to {org-1: 3000}
[Figure: HDFS blocks feeding 3 Map tasks; outputs labeled Records of "org-1" and Records of "org-2"]
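The map phase on this slide can be imitated on a single machine. The sketch below assumes 12 blocks, so that 4 blocks per task yields 3 map tasks, and uses a thread pool as a stand-in for separate cluster nodes. The block contents, `map_task`, and `run_map_phase` are hypothetical names for illustration; this is not the Hadoop API.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCKS_PER_TASK = 4  # from the slide: each map task takes 4 HDFS blocks


def map_task(blocks):
    """Extract (org, salary) pairs from every record in the given blocks."""
    pairs = []
    for block in blocks:
        for _name, (org, salary) in block:
            pairs.append((org, salary))
    return pairs


def run_map_phase(blocks, blocks_per_task=BLOCKS_PER_TASK):
    """Split the blocks into groups and run one map task per group."""
    splits = [
        blocks[i:i + blocks_per_task]
        for i in range(0, len(blocks), blocks_per_task)
    ]
    # Threads stand in for cluster nodes; map() keeps the task order.
    with ThreadPoolExecutor(max_workers=max(1, len(splits))) as pool:
        return list(pool.map(map_task, splits))


if __name__ == "__main__":
    # 12 made-up blocks holding one (name, (org, salary)) record each.
    blocks = [
        [(f"person-{i}", (f"org-{i % 2 + 1}", 3000 + 100 * i))]
        for i in range(12)
    ]
    outputs = run_map_phase(blocks)
    print(len(outputs))  # 3 map tasks
    print(outputs[0])
```

Because each map task reads only its own blocks, a failed task could be rerun on the same 4 blocks without touching the other tasks' output.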
