Workload Analysis of a Large-Scale Key-Value Store

Berk Atikoglu (Stanford, Facebook) atikoglu@stanford.edu
Yuehai Xu (Wayne State, Facebook) yhxu@wayne.edu
Eitan Frachtenberg∗ (Facebook) etc@fb.com
Song Jiang (Wayne State) sjiang@wayne.edu
Mike Paleczny (Facebook) mpal@fb.com

ABSTRACT

Key-value stores are a vital component in many scale-out enterprises, including social networks, online retail, and risk analysis. Accordingly, they are receiving increased attention from the research community in an effort to improve their performance, scalability, reliability, cost, and power consumption. To be effective, such efforts require a detailed understanding of realistic key-value workloads. And yet little is known about these workloads outside of the companies that operate them. This paper aims to address this gap.

To this end, we have collected detailed traces from Facebook's Memcached deployment, arguably the world's largest. The traces capture over 284 billion requests from five different Memcached use cases over several days. We analyze the workloads from multiple angles, including: request composition, size, and rate; cache efficacy; temporal patterns; and application use cases. We also propose a simple model of the most representative trace to enable the generation of more realistic synthetic workloads by the community.

Our analysis details many characteristics of the caching workload. It also reveals a number of surprises: a GET/SET ratio of 30:1 that is higher than assumed in the literature; some applications of Memcached behave more like persistent storage than a cache; strong locality metrics, such as keys accessed many millions of times a day, do not always suffice for a high hit rate; and there is still room for efficiency and hit rate improvements in Memcached's implementation. Toward the last point, we make several suggestions that address the exposed deficiencies.
Categories and Subject Descriptors

C.2.4 [Distributed Systems]: Distributed Databases; D.4.8 [Performance]: Modeling and Prediction; D.4.2 [Storage Management]: Distributed Memories

Keywords

Workload Analysis, Workload Modeling, Key-Value Store

∗Corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGMETRICS'12, June 11–15, 2012, London, England, UK.
Copyright 2012 ACM 978-1-4503-1097-0/12/06 ...$10.00.

1. INTRODUCTION

Key-value (KV) stores play an important role in many large websites. Examples include: Dynamo at Amazon [15]; Redis at GitHub, Digg, and Blizzard Interactive [27]; Memcached at Facebook, Zynga and Twitter [18, 26]; and Voldemort at LinkedIn [1]. All these systems store ordered (key, value) pairs and are, in essence, a distributed hash table.

A common use case for these systems is as a layer in the data-retrieval hierarchy: a cache for expensive-to-obtain values, indexed by unique keys. These values can represent any data that is cheaper or faster to cache than re-obtain, such as commonly accessed results of database queries or the results of complex computations that require temporary storage and distribution.

Because of their key role in large website performance, KV stores are carefully tuned for low response times and high hit rates. But like all caching heuristics, a KV-store's performance is highly dependent on its workload. It is therefore imperative to understand the workload's characteristics.
Additionally, analyzing and understanding large-scale cache workloads can also: provide insights into topics such as the role and effectiveness of memory-based caching in distributed website infrastructure; expose the underlying patterns of user behavior; and provide difficult-to-obtain data and statistical distributions for future studies.

In this paper, we analyze five workloads from Facebook's Memcached deployment. Aside from the sheer scale of the site and data (over 284 billion requests over a period of 58 sample days), this case study also introduces to the community several different usage scenarios for KV stores. This variability serves to explore the relationship between the cache and various data domains: where overall site patterns are adequately handled by a generalized caching infrastructure, and where specialization would help. In addition, this paper offers the following key contributions and findings:

1. A workload decomposition of the traces that shows how different applications of Memcached can have extreme variations in terms of read/write mix, request sizes and rates, and usage patterns (Sec. 3).

2. An analysis of the caching characteristics of the traces and the factors that determine hit rates. We found that different Memcached pools can vary significantly in their locality metrics, but surprisingly, the best predictor of hit rates is actually the pool's size (Sec. 6).
3. An examination of various performance metrics over time, showing diurnal and weekly patterns (Sec. 3.3, 4.2.2, 6).

4. An analytical model that can be used to generate more realistic synthetic workloads. We found that the salient size characteristics follow power-law distributions, similar to other storage and Web-serving systems (Sec. 5).

5. An exposition of a Memcached deployment that can shed light on real-world, large-scale production usage of KV-stores (Sec. 2.2, 8).

The rest of this paper is organized as follows. We begin by describing the architecture of Memcached, its deployment at Facebook, and how we analyzed its workload. Sec. 3 presents the observed experimental properties of the trace data (from the request point of view), while Sec. 4 describes the observed cache metrics (from the server point of view). Sec. 5 presents a simple analytical model of the most representative workload. The next section brings the data together in a discussion of our results, followed by a section surveying previous efforts on analyzing cache behavior and workload analysis.

2. MEMCACHED DESCRIPTION

2.1 Architecture

Memcached1 is a simple, open-source software package that exposes data in RAM to clients over the network. As data size grows in the application, more RAM can be added to a server, or more servers can be added to the network. Additional servers generally only communicate with clients. Clients use consistent hashing [9] to select a unique server per key, requiring only the knowledge of the total number of servers and their IP addresses. This technique presents the entire aggregate data in the servers as a unified distributed hash table, keeps servers completely independent, and facilitates scaling as data size grows.

Memcached's interface provides the basic primitives that hash tables provide—insertion, deletion, and retrieval—as well as more complex operations built atop them. Data are stored as individual items, each including a key, a value, and metadata.
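The client-side server selection described above can be sketched with a minimal consistent-hashing ring. This is an illustration, not Facebook's client code: the server addresses, the replica count, and the use of MD5 are all assumptions made for the example.

```python
import hashlib
from bisect import bisect

# Minimal consistent-hashing sketch (illustrative only). Each server is
# mapped to several points on a hash ring; a key is routed to the first
# server point at or after the key's own position on the ring.
class HashRing:
    def __init__(self, servers, replicas=100):
        self.ring = sorted(
            (self._hash(f"{s}#{i}"), s)
            for s in servers for i in range(replicas)
        )
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def server_for(self, key):
        # bisect wraps around the ring via the modulo.
        idx = bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
# Every client computes the same server for a given key, with no coordination.
assert ring.server_for("user:1234") == ring.server_for("user:1234")
```

Because each client only needs the server list to compute placements, adding or removing a server remaps only the keys whose ring segment changed, which is what makes the scheme scale as data grows.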
Item size can vary from a few bytes to over 100 KB, heavily skewed toward smaller items (Sec. 3). Consequently, a naïve memory allocation scheme could result in significant memory fragmentation. To address this issue, Memcached adopts a slab allocation technique, in which memory is divided into slabs of different sizes. The slabs in a class store items whose sizes are within the slab's specific range. A newly inserted item obtains its memory space by first searching the slab class corresponding to its size. If this search fails, a new slab of the class is allocated from the heap. Symmetrically, when an item is deleted from the cache, its space is returned to the appropriate slab, rather than the heap. Memory is allocated to slab classes based on the initial workload and its item sizes, until the heap is exhausted. Consequently, if the workload characteristics change significantly after this initial phase, we may find that the slab allocation is inappropriate for the workload, resulting in memory underutilization.

1http://memcached.org/

Table 1: Memcached pools sampled (in one cluster). These pools do not match their UNIX namesakes, but are used for illustrative purposes here instead of their internal names.

  Pool  Size      Description
  USR   few       user-account status information
  APP   dozens    object metadata of one application
  ETC   hundreds  nonspecific, general-purpose
  VAR   dozens    server-side browser information
  SYS   few       system data on service location

A new item arriving after the heap is exhausted requires the eviction of an older item in the appropriate slab. Memcached uses the Least-Recently-Used (LRU) algorithm to select the items for eviction. To this end, each slab class has an LRU queue maintaining access history on its items.
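The slab-class selection described above can be sketched as follows. The 96 B base chunk and 1.25 growth factor roughly mirror Memcached's documented defaults, but the rounding rule and the cutoff used here are simplifications for illustration.

```python
# Sketch of slab-class sizing with geometrically growing chunk sizes
# (illustrative; base size, growth factor, and rounding are assumptions).
def build_slab_classes(base=96, factor=1.25, max_size=1024 * 1024, limit=44):
    sizes, size = [], base
    while len(sizes) < limit and size <= max_size:
        sizes.append(size)
        # Grow by the factor and round up to an 8-byte boundary.
        size = (int(size * factor) + 7) // 8 * 8
    return sizes

def slab_class_for(item_size, classes):
    # An item goes into the smallest class whose chunk can hold it;
    # the unused tail of the chunk is internal fragmentation.
    for chunk in classes:
        if item_size <= chunk:
            return chunk
    return None  # larger than the largest class: cannot be cached

classes = build_slab_classes()
# A 100-byte item lands in the 120-byte class, wasting 20 bytes per item.
assert slab_class_for(100, classes) == 120
```

This geometric progression is what bounds per-item waste to roughly the growth factor, at the cost of the workload-drift problem the text describes: memory committed to one class's slabs cannot easily serve another class later.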
Although LRU decrees that any accessed item be moved to the top of the queue, this version of Memcached coalesces repeated accesses of the same item within a short period (one minute by default) and only moves this item to the top the first time, to reduce overhead.

2.2 Deployment

Facebook relies on Memcached for fast access to frequently accessed values. Web servers typically try to read persistent values from Memcached before trying the slower backend databases. In many cases, the caches are demand-filled, meaning that generally, data is added to the cache after a client has requested it and failed.

Modifications to persistent data in the database often propagate as deletions (invalidations) to the Memcached tier. Some cached data, however, is transient and not backed by persistent storage, requiring no invalidations.

Physically, Facebook deploys front-end servers in multiple datacenters, each containing one or more clusters of varying sizes. Front-end clusters consist of both Web servers, running primarily HipHop [31], and caching servers, running primarily Memcached. These servers are further subdivided based on the concept of pools. A pool is a partition of the entire key space, defined by a prefix of the key, and typically represents a separate application or data domain. The main reason for separate domains (as opposed to one all-encompassing cache) is to ensure adequate quality of service for each domain. For example, one application with high turnover rate could evict keys of another application that shares the same server, even if the latter has high temporal locality but lower access rates. Another reason to separate domains is to facilitate application-specific capacity planning and performance analysis.

In this paper, we describe traces from five separate pools—one trace from each pool (traces from separate machines in the same pool exhibit similar characteristics).
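The coalesced-LRU promotion described in Sec. 2.1 can be sketched as a single LRU queue whose hits promote an item only if its last promotion is older than a window (one minute by default, per the text). This is a simplified, single-slab-class illustration, not Memcached's actual implementation; the class and parameter names are made up.

```python
import time
from collections import OrderedDict

# Sketch of coalesced LRU: a hit moves an item to the MRU end only if the
# item was last promoted more than `window` seconds ago, so bursts of
# accesses to a hot key reorder the queue once instead of every time.
class CoalescedLRU:
    def __init__(self, capacity, window=60.0, clock=time.monotonic):
        self.capacity, self.window, self.clock = capacity, window, clock
        self.queue = OrderedDict()  # key -> (value, last_promotion_time)

    def get(self, key):
        entry = self.queue.get(key)
        if entry is None:
            return None  # miss
        value, promoted = entry
        now = self.clock()
        if now - promoted >= self.window:
            self.queue.move_to_end(key)      # promote to MRU end
            self.queue[key] = (value, now)
        return value

    def set(self, key, value):
        if key not in self.queue and len(self.queue) >= self.capacity:
            self.queue.popitem(last=False)   # evict the LRU item
        self.queue[key] = (value, self.clock())
        self.queue.move_to_end(key)
```

The trade-off is visible in the sketch: within the window a hot item is not protected by its extra hits, which is usually harmless for genuinely hot keys but is the overhead/accuracy compromise the text describes.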
These pools represent a varied spectrum of application domains and cache usage characteristics (Table 1). One pool in particular, ETC, represents general cache usage of multiple applications, and is also the largest of the pools; the data collected from this trace may be the most applicable to general-purpose KV-stores.

The focus of this paper is on workload characteristics, patterns, and relationships to social networking, so the exact details of server count and components have little relevance
here. It is important to note, however, that all Memcached instances in this study ran on identical hardware.

2.3 Tracing Methodology

Our analysis called for complete traces of traffic passing through Memcached servers for at least a week. This task is particularly challenging because it requires nonintrusive instrumentation of high-traffic-volume production servers. Standard packet sniffers such as tcpdump2 have too much overhead to run under heavy load. We therefore implemented an efficient packet sniffer called mcap. Implemented as a Linux kernel module, mcap has several advantages over standard packet sniffers: it accesses packet data in kernel space directly and avoids additional memory copying; it introduces only 3% performance overhead (as opposed to tcpdump's 30%); and unlike standard sniffers, it handles out-of-order packets correctly by capturing incoming traffic after all TCP processing is done. Consequently, mcap has a complete view of what the Memcached server sees, which eliminates the need for further processing of out-of-order packets. On the other hand, its packet parsing is optimized for Memcached packets, and would require adaptations for other applications.

The captured traces vary in size from 3 TB to 7 TB each. This data is too large to store locally on disk, adding another challenge: how to offload this much data (at an average rate of more than 80,000 samples per second) without interfering with production traffic. We addressed this challenge by combining local disk buffering and dynamic offload throttling to take advantage of low-activity periods in the servers.

Finally, another challenge is this: how to effectively process these large data sets? We used Apache HIVE3 to analyze Memcached traces. HIVE is part of the Hadoop framework that translates SQL-like queries into MapReduce jobs.
We also used the Memcached "stats" command, as well as Facebook's production logs, to verify that the statistics we computed, such as hit rates, are consistent with the aggregated operational metrics collected by these tools.

2http://www.tcpdump.org/
3http://hive.apache.org/

Figure 1: Distribution of request types per pool, over exactly 7 days. UPDATE commands aggregate all non-DELETE writing operations, such as SET, REPLACE, etc.

3. WORKLOAD CHARACTERISTICS

This section describes the observed properties of each trace in terms of the requests that comprise it, their sizes, and their frequencies.

3.1 Request Composition

We begin by looking at the basic data that comprises the workload: the total number of requests in each server, broken down by request types (Fig. 1). Several observations delineate the different usage of each pool:

USR handles significantly more GET requests than any of the other pools. GET operations comprise over 99.8% of this pool's workload. One reason for this is that the pool is sized large enough to maximize hit rates, so refreshing values is rarely necessary. These values are also updated at a slower rate than some of the other pools. The overall effect is that USR is used more like RAM-based persistent storage than a cache.

APP has high GET rates too—owing to the popularity of this application—but also a large number of DELETE operations. DELETE operations occur when a cached database entry is modified (but not required to be set again in the cache). SET operations occur when the Web servers add a value to the cache. The relatively high number of DELETE operations shows that this pool represents database-backed values that are affected by frequent user modifications.

ETC has similar characteristics to APP, but with an even higher rate of DELETE requests (of which some may not be currently cached).
ETC is the largest and least specific of the pools, so its workloads might be the most representative to emulate. Because it is such a large and heterogeneous workload, we pay special attention to this workload throughout the paper.

VAR is the only pool sampled that is write-dominated. It stores short-term values such as browser-window size for opportunistic latency reduction. As such, these values are not backed by a database (hence, no invalidating DELETEs are required). But they change frequently, accounting for the high number of UPDATEs.

SYS is used to locate servers and services, not user data. As such, the number of requests scales with the number of servers, not the number of user requests, which is much larger. This explains why the total number of SYS requests is much smaller than the other pools'.

It is interesting to note that the ratio of GETs to UPDATEs in ETC (approximately 30:1) is significantly higher than most synthetic workloads typically assume (Sec. 7). For demand-filled caches like USR, where each miss is followed by an UPDATE, the ratios of GET to UPDATE operations mentioned above are related to hit rate in general and the sizing of the cache to the data in particular. So in theory, one could justify any synthetic GET to UPDATE mix by controlling the cache size. But in practice, not all caches or keys are demand-filled, and these caches are already sized to fit a real-world workload in a way that successfully trades off hit rates against cost.
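For a strictly demand-filled pool, the relationship between the GET:UPDATE mix and the hit rate noted above can be made precise: every miss triggers exactly one refill UPDATE, so the ratio follows directly from the hit rate. A small illustrative sketch (the function names are ours, not from the paper):

```python
# In a purely demand-filled cache, each GET miss is followed by one UPDATE:
#     updates ~= gets * (1 - h)   =>   gets / updates = 1 / (1 - h)
# where h is the GET hit rate.
def get_update_ratio(hit_rate):
    return 1.0 / (1.0 - hit_rate)

def hit_rate_for_ratio(ratio):
    return 1.0 - 1.0 / ratio

# A 30:1 GET/UPDATE mix, as observed in ETC, would correspond to roughly a
# 96.7% hit rate if, and only if, the pool were strictly demand-filled.
assert abs(hit_rate_for_ratio(30.0) - 0.9667) < 1e-3
```

This is exactly the sense in which "any synthetic GET to UPDATE mix" could in theory be justified by resizing the cache; the caveat in the text is that real pools also contain non-demand-filled keys, so the observed 30:1 cannot be read as a pure hit-rate statement.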
Figure 2: Key and value size distributions for all traces (three CDFs per pool: key sizes by appearance, value sizes by appearance, and value sizes by total weight). The leftmost CDF shows the sizes of keys, up to Memcached's limit of 250 B (not shown). The center plot similarly shows how value sizes distribute. The rightmost CDF aggregates value sizes by the total amount of data they use in the cache; so, for example, values under 320 B or so in SYS use virtually no space in the cache, 320 B values weigh around 8% of the data, and values close to 500 B take up nearly 80% of the entire cache's allocation for values.

3.2 Request Sizes

Next, we look at the sizes of keys and values in each pool (Fig. 2), based on SET requests. All distributions show strong modalities. For example, over 90% of APP's keys are 31 bytes long, and value sizes around 270 B show up in more than 30% of SET requests. USR is the most extreme: it has only two key sizes (16 B and 21 B) and virtually just one value size (2 B). Even in ETC, the most heterogeneous of the pools, requests with 2-, 3-, or 11-byte values add up to 40% of the total requests. On the other hand, it also has a few very large values (around 1 MB) that skew the weight distribution (rightmost plot in Fig. 2), leaving less caching space for smaller values.

Small values dominate all workloads, not just in count, but especially in overall weight. Except for ETC, 90% of all cache space is allocated to values of less than 500 B. The implications for caching and system optimizations are significant. For example, the network overhead of processing multiple small packets can be substantial, which explains why Facebook coalesces as many requests as possible into as few packets as possible [9].
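The "CDF by total weight" in the rightmost plot of Fig. 2 differs from an ordinary size CDF in that each value is weighted by the bytes it occupies. A minimal sketch of that computation (the input format is an assumption):

```python
from collections import Counter

def weight_cdf(value_sizes):
    """Cumulative fraction of total cache bytes contributed by values
    up to each size -- the 'CDF by total weight' of Fig. 2."""
    counts = Counter(value_sizes)
    total = sum(size * n for size, n in counts.items())
    cdf, cum = {}, 0
    for size in sorted(counts):
        cum += size * counts[size]
        cdf[size] = cum / total
    return cdf

# Tiny illustration: one 1000-byte value outweighs ten 10-byte values,
# even though small values dominate by count (91% of requests).
sizes = [10] * 10 + [1000]
cdf = weight_cdf(sizes)
print(round(cdf[10], 3))  # 0.091 -- small values hold only ~9% of the bytes
```

This is why a few ~1 MB values in ETC can skew the weight distribution despite being vanishingly rare by count.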
Another example is memory fragmentation. The strong modality of each workload implies that different Memcached pools could optimize memory allocation by modifying the slab size constants to fit each distribution. In practice, this is an unmanageable and unscalable solution, so instead Memcached uses many (44) slab classes with exponentially growing sizes, in the hope of reducing allocation waste, especially for small sizes.

3.3 Temporal Patterns

To understand how production Memcached load varies over time, we look at each trace's transient request rate over its entire collection period (Fig. 3). All traces clearly show the expected diurnal pattern, but with different values and amplitudes. If we zoom in further (as in the last plot), we notice that traffic in ETC bottoms out around 08:00 and has two peaks, around 17:00 and 03:00. Not surprisingly, the hours immediately preceding 08:00 UTC (midnight in Pacific Time) represent night time in the Western Hemisphere. The first peak, on the other hand, occurs as North America starts its day, while it is evening in Europe, and continues until the later peak time for North America. Although different traces (and sometimes even different days in the same trace) differ in which of the two peaks is higher, the entire period between them, representing the Western Hemisphere day, sees the highest traffic volume. In terms of weekly patterns, we observe a small traffic drop on most Fridays and Saturdays, with traffic picking up again on Sundays and Mondays.

The diurnal cycle represents load variation on the order of 2×. We also observe occasional traffic spikes. Typically, these represent a swift surge in user interest in one topic, as occurs with major news or media events. Less frequently, these spikes stem from programmatic or operational causes.
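Returning briefly to the slab-class point above: a geometrically growing set of class sizes can be generated with a few lines. The constants below (96 B minimum chunk, 1.25 growth factor, 1 MB item cap) are illustrative assumptions in the spirit of Memcached's allocator, not the exact production configuration.

```python
def slab_class_sizes(min_chunk=96, growth=1.25, max_item=1024 * 1024):
    """Geometrically growing slab class sizes, in the spirit of
    Memcached's allocator. The constants are illustrative, not the
    exact production values."""
    sizes, size = [], min_chunk
    while size < max_item:
        sizes.append(size)
        size = int(size * growth)
    sizes.append(max_item)  # final class holds the largest items
    return sizes

classes = slab_class_sizes()
# A few dozen classes span 96 B to 1 MB; each allocation is rounded up
# to its class size, so waste per item is bounded by the growth factor.
print(classes[:5])  # [96, 120, 150, 187, 233]
```

The trade-off the paper describes falls out directly: a smaller growth factor means less rounding waste per item but more classes to manage.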
Either way, the implication for Memcached development and deployment is that one must budget individual node capacity to allow for these spikes, which can easily double or even triple the normal peak request rate. Although such budgeting underutilizes resources during normal traffic, it is nevertheless imperative; otherwise, the many Web servers exposed to this sudden traffic would fail to get a prompt response from Memcached and would all query the same database nodes. This scenario could be debilitating, so it must remain hypothetical.

4. CACHE BEHAVIOR

The main metric used in evaluating cache efficacy is hit rate: the percentage of GET requests that return a value. The overall hit rate of each server, as derived from the traces and verified with Memcached's own statistics, is shown in Table 2. This section takes a deeper look at the factors that influence these hit rates and how they relate to cache locality, user behavior, temporal patterns, and Memcached's design.

Table 2: Mean cache hit rate over entire trace.
Pool      APP     VAR     SYS     USR     ETC
Hit rate  92.9%   93.7%   98.7%   98.2%   81.4%

4.1 Hit Rates over Time

When looking at how hit rates vary over time (Fig. 4), almost all traces show diurnal variance within a small band of a few percentage points. USR's plot is curious: it appears to be monotonically increasing (with diurnal undulation). This behavior stems from the usage model for USR. Recall
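Computing hit rate over time, as in Fig. 4, amounts to bucketing GET outcomes by time window. A minimal sketch, assuming a stream of `(timestamp_sec, hit)` events (this event format is ours, for illustration):

```python
def hourly_hit_rates(get_events):
    """Hit rate per hour from a stream of (timestamp_sec, hit) GET
    events: hits / total GETs in each one-hour bucket."""
    buckets = {}
    for ts, hit in get_events:
        hour = int(ts // 3600)
        hits, total = buckets.get(hour, (0, 0))
        buckets[hour] = (hits + (1 if hit else 0), total + 1)
    return {h: hits / total for h, (hits, total) in sorted(buckets.items())}

# Tiny illustration: hour 0 has 2 hits out of 3 GETs; hour 1 has 1/1.
events = [(0, True), (10, True), (20, False), (3700, True)]
rates = hourly_hit_rates(events)
print(round(rates[0], 3), rates[1])  # 0.667 1.0
```

Averaging the per-hour rates over a whole trace, weighted by GET count, yields the per-pool means of Table 2.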
[Figure 3 comprises per-pool time-series plots of Requests/sec against date and time of day, one panel each for APP, VAR, SYS, USR, and ETC, plus a 24-hour zoom of ETC.]

Figure 3: Request rates at different dates and times of day, Coordinated Universal Time (UTC). Each data point counts the total number of requests in the preceding second. Except for USR and VAR, different traces were collected at different times. The last plot zooms in on a 24-hour period from the ETC trace for greater detail.
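The per-second rate series plotted in Figure 3 can be derived from raw request timestamps by one-second binning. A sketch, assuming timestamps in (possibly fractional) seconds:

```python
from collections import Counter

def requests_per_second(timestamps):
    """Bin raw request timestamps into one-second counts, yielding a
    rate series like the ones plotted in Figure 3. Seconds with no
    requests appear as explicit zeros."""
    counts = Counter(int(ts) for ts in timestamps)
    lo, hi = min(counts), max(counts)
    return [counts.get(sec, 0) for sec in range(lo, hi + 1)]

# Tiny illustration: three requests in second 0, one in second 1,
# none in second 2, one in second 3.
print(requests_per_second([0.1, 0.5, 0.9, 1.2, 3.7]))  # [3, 1, 0, 1]
```

Down-sampling this series (e.g., taking per-minute maxima) gives the smoother multi-day curves of Fig. 3, while keeping the one-second resolution preserves the spikes discussed in Sec. 3.3.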