LSbM-tree:一个读写兼优的大数据存储结构
LSbM-tree:一个读写兼优的大数据存储结构 1
2MajorDataFormatsinStorageSystems: Sequentially archived data- Indexed data, e.g. sorted data by a defined key, ...- Read/write largely by B+-tree and LSM-tree.Relational tables- Structured data formats for relational databases, e.g. MySQL- Read/write operations by relational algebra/calculus·Key-value store-Apair ofkey/value fora data item, e.g.redis, memcached-Read/write: request -> index -> fetching data·Graph-databases? Free-style text files-A file may be retrieved by KV-store, indexed directory,
Major Data Formats in Storage Systems • Sequentially archived data – Indexed data, e.g. sorted data by a defined key, . – Read/write largely by B+-tree and LSM-tree • Relational tables – Structured data formats for relational databases, e.g. MySQL – Read/write operations by relational algebra/calculus • Key-value store – A pair of key/value for a data item, e.g. redis, memcached – Read/write: request -> index –> fetching data • Graph-databases • Free-style text files – A file may be retrieved by KV-store, indexed directory, . 2
3New Challenges to AccessPerformance in Big Data.Sequentially archived data-Canwemassivelyprocessbothreadsandwritesconcurrently?-butLSM-treefavorswritesandB+-treefavorsreads.Relationaltables- Tables must partitioned/placed among many nodes,e.g.Apache Hive- How to minimize data transfers among nodes and from local disks?·Key-valuestore-Keyindexingbecomesabottleneckas#concurrentrequestsincrease-Howtoacceleratedataaccessesforin-memorykey-valuestore?
New Challenges to Access Performance in Big Data • Sequentially archived data – Can we massively process both reads and writes concurrently? – but LSM-tree favors writes and B+-tree favors reads • Relational tables – Tables must partitioned/placed among many nodes, e.g. Apache Hive – How to minimize data transfers among nodes and from local disks? • Key-value store – Key indexing becomes a bottleneck as # concurrent requests increase – How to accelerate data accesses for in-memory key-value store? 3
Fast Accesses to Sequentially/ArchivedDatain both memory and disks
Fast Accesses to Sequentially Archived Data in both memory and disks 4
What is LSM-tree?It is a Log-structured merge-tree (1996):Multiple levels of sorted data, (e.g. each by a B+ tree)Each level increases exponentially, forming a “pyramid'The smallest level is in memory and the rest on diskCoDRAMHDDC1C2
What is LSM-tree? It is a Log-structured merge-tree (1996): • Multiple levels of sorted data, (e.g. each by a B+ tree) • Each level increases exponentially, forming a “pyramid” • The smallest level is in memory and the rest on disk C0 C1 C2 C0 C1 C2 HDD DRAM 5