Figure 4: GET hit rates over time for all pools (days start at midnight UTC).

Figure 5: CDFs of key appearances, depicting how many keys account for how many requests, in relative terms. Keys are ranked from least popular to most popular.

from Sec. 2.2 that USR is sized large enough to minimize the number of misses—in other words, to contain almost all possible keys. When a USR Memcached server starts, it contains no data and misses on all requests. But over time, as clients add values to it while the pressure to evict is nonexistent, hit rates climb upwards. Thus, USR’s transient hit rate is correlated not only with time of day, but primarily with the server’s uptime, reaching 99.8% after several weeks. Like USR, SYS has a relatively bounded data domain, so it can easily be sized to keep hit rates high and stable. But unlike the other four workloads, SYS does not react directly to user load, so its performance is less cyclical and regular.

4.2 Locality Metrics

This section looks at three ways to measure locality in GET requests: (1) how often and how much some keys repeat in requests; (2) the amount of unique keys and how it varies over time; and (3) reuse period, as a measure of temporal locality. These metrics, unlike hit rates, are an inherent property of the request stream of each pool; changing the server’s hardware or server count will not affect them. Consequently, this data could provide insights toward the workload, in isolation of implementation choices.
4.2.1 Repeating Keys

We start by looking at the distribution of key repeats (Fig. 5). All workloads exhibit the expected long-tail distributions, with a small percentage of keys appearing in most of the requests, and most keys repeating only a handful of times. So, for example, 50% of ETC’s keys occur in only 1% of all requests, meaning they do not repeat many times, while a few popular keys repeat in millions of requests per day. This high concentration of repeating keys provides the justification for caching them in the first place.

All curves are remarkably similar, except for SYS’s, which has two distinct sections. The first, up to about 65% of the keys, represents keys that are repeated infrequently—conceivably those that are retrieved when one or more clients start up and fill their local cache. The second part, representing the last 25% of keys and more than 90% of the requests, may account for the normal SYS scenario, when a value is added or updated in the cache and all the clients retrieve it.

4.2.2 Locality over Time

It is also interesting to examine how key uniqueness varies over time by counting how many keys do not repeat in close time proximity (Fig. 6). To interpret this data, note that a lower percentage indicates that fewer keys are unique, and therefore suggests a higher hit rate. Indeed, note that the diurnal dips correspond to increases in hit rates in Fig. 4. An immediately apparent property is that in any given pool, this percentage remains relatively constant over time—especially with hour-long bins, with only small diurnal variations and few spikes. Data for 5-minute bins are naturally noisier, but even here most samples remain confined to a narrow range. This suggests that different pools have not only different traffic patterns, but also different caching properties that can benefit from different tuning, justifying the choice to segregate workloads to pools. Each pool exhibits its characteristic locality band and average.
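This per-bin uniqueness metric is straightforward to compute from a raw request log. The sketch below (with toy `keys`/`times` inputs, not data from the actual trace) bins requests by timestamp and reports the ratio of distinct keys to total requests in each bin:

```python
def unique_key_ratio(keys, times, bin_seconds=3600):
    """Fig. 6-style metric: per time bin, the ratio of distinct keys to
    total requests. Lower values mean more repetition within the bin."""
    bins = {}
    for key, t in zip(keys, times):
        bins.setdefault(int(t // bin_seconds), []).append(key)
    return [len(set(ks)) / len(ks) for _, ks in sorted(bins.items())]

# toy trace: a repetitive first hour, then an all-unique second hour
keys = ["a", "b", "a", "a", "c", "d", "e", "f"]
times = [0, 600, 1200, 1800, 3600, 3700, 3800, 3900]
ratios = unique_key_ratio(keys, times)  # [0.5, 1.0]
```
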
SYS’s low average rate of 3.3% unique keys per hour, for example, suggests that different clients request roughly the same service information. In contrast, USR’s much higher average rate of 34.6% unique keys per hour suggests that the per-user data it represents spans a much more disparate range. Generally, we would assume that lower bands translate to higher overall hit rates, all other things being equal. This turns out not to be the case. In fact, the Pearson correlation coefficient between average unique key ratios with 60-minute bins (taken from Fig. 6) and the average hit rates (Table 2) is negative as expected, but small: −0.097. Indeed, not all other things are equal, as discussed in Sec. 4.1.

Comparing 5-minute bins to hour-long bins reveals that unique keys in the former appear in significantly higher concentrations than in the latter. This implies a rapid rate of decay in interest in most keys. But does this rate continue to drop very fast over a longer time window? The next section sets out to answer this question.
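For reference, the Pearson coefficient used throughout this section is the covariance of the two series normalized by the product of their standard deviations. A minimal implementation, shown on toy inputs rather than the trace data:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient: cov(x, y) / (std(x) * std(y))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# a moderate negative relationship on made-up points
r = pearson([1, 2, 3], [6, 4, 5])  # about -0.5
```
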
Figure 6: Ratio of unique keys over time. Each data point on the top (bottom) plot shows how many unique keys were requested in the preceding 5 (60) minutes, as percentage of all keys. The label for each pool, at the top right corner of the data, also includes the average ratio throughout the entire pool’s trace.

4.2.3 Temporal Locality: Reuse Period

Temporal locality refers to how often a key is re-accessed. One metric to quantify temporal locality of any given key is the reuse period—the time between consecutive accesses to the key. Fig. 7 counts all key accesses in the five traces, and bins them according to the time duration from the previous key’s access. Unique keys (those that do not repeat at all within the trace period) are excluded from this count.

The answer to the question from the previous section is therefore positive: count of accesses in each reuse period continues to decay quickly after the first hour. For the ETC trace, for example, 88.5% of the keys are reused within an hour, but only 4% more within two, and within six hours, 96.4% of all nonunique keys have already repeated. It continues to decay at a slower rate. This access behavior suggests a pattern for Facebook’s users as well, with some users visiting the site more frequently than others and reusing the keys associated with their accounts. Another interesting sub-pattern occurs every day.
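A reuse-period histogram like Fig. 7’s can be built in one pass over a timestamped request log. The sketch below (toy input, not the trace itself) tracks each key’s previous access and bins the gaps by hour:

```python
from collections import Counter

def reuse_period_histogram(keys, times, bin_hours=1):
    """Fig. 7-style histogram: for every repeat access, bin the elapsed
    time since the same key's previous access. Keys seen only once
    contribute nothing, matching the exclusion of unique keys."""
    last_seen = {}
    hist = Counter()
    for key, t in zip(keys, times):
        if key in last_seen:
            hist[int((t - last_seen[key]) // (bin_hours * 3600))] += 1
        last_seen[key] = t
    return hist

# toy trace: "a" reused after 30 minutes and again after 25 hours
hist = reuse_period_histogram(["a", "b", "a", "a"],
                              [0, 100, 1800, 1800 + 25 * 3600])
# hist == Counter({0: 1, 25: 1}); "b" never repeats, so it is excluded
```
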
Figure 7: Reuse period histogram per pool. Each hour-long bin n counts keys that were first requested n hours after their latest appearance. The inset zooms in on the five hours after the first.

Note the periodic peaks on even 24 hours in four of the five traces, especially in the VAR pool that is associated with browser usage. These peaks suggest that a noteworthy number of users log in to the site at approximately the same time of day each time. Once more, these increased-locality indications also correspond to increased hit rates in Fig. 4. As in the previous section, the SYS pool stands out. It does not show the same 24-hour periodicity, because its keys
relate to servers and services, not users. It also decays precipitously compared to the others. As in Sec. 4.2.1, we find that since its data are cached locally by clients, it is likely that most of SYS’s GET requests represent data that are newly available, updated or expired from the client cache; these are then requested by many clients concurrently. This would explain why 99.9% of GET requests are repeated within an hour of the first key access. Later, such keys would be cached locally and accessed rarely, perhaps when a newly added client needs to fill its own cache.

Nevertheless, there is still value in reuse period to predict hit rates. Since all pools have sufficient memory for over an hour of fresh data, the percentage of keys reused within an hour correlates positively with the overall hit rates in Table 2 (with a Pearson coefficient of 0.17). The correlation is stronger—with a coefficient of 0.44—if we omit USR and SYS, which have atypical cache behavior (minimum evictions in the former and local caching in the latter).

4.3 Case Study: ETC Hit Rates

We turn our attention to ETC’s hit/miss rates, because frequent misses can noticeably hurt user experience. At this point, one might expect ETC’s hit rate to exceed the 96% 6-hour key-reuse rate, since it is provisioned with more than enough RAM to store the fresh data of the preceding 6 hours. Unfortunately, this is not the case, and the observed hit rate is significantly lower at 81%. To understand why, we analyzed all the misses in the last 24 hours of the trace (Table 3). The largest number of misses in ETC comes from keys that are accessed for the first time (at least in a 10-day period). This is the long tail of the locality metrics we analyzed before. Sec. 4.2.1 showed that ≈ 50% of ETC’s keys are accessed in only 1% of requests, and therefore benefit little or not at all from a demand-filled cache. The many deletions in the cache also hinder the cache’s efficacy.
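The three-way miss taxonomy used in this analysis (detailed in Table 3) can be expressed as a simple decision rule. The function below is an illustrative sketch, with `last_set` and `last_delete` as hypothetical per-key timestamp indexes rather than tooling from the study:

```python
TEN_DAYS = 10 * 86400  # classification window, in seconds

def classify_miss(key, now, last_set, last_delete, window=TEN_DAYS):
    """Classify one missed GET: compulsory if no SET for this key landed
    within the window; invalidation if a DELETE followed the latest SET;
    eviction otherwise (the value was SET recently but is gone)."""
    if key not in last_set or now - last_set[key] > window:
        return "compulsory"
    if key in last_delete and last_delete[key] > last_set[key]:
        return "invalidation"
    return "eviction"

last_set = {"x": 1000, "y": 1000}
last_delete = {"y": 2000}
kinds = [classify_miss(k, 5000, last_set, last_delete) for k in ("z", "y", "x")]
# kinds == ["compulsory", "invalidation", "eviction"]
```
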
ETC is a very diverse pool with many applications, some with limited reusability. But the other half of the keys that show up in 99% of the requests are so popular (some repeating millions of times) that Memcached can satisfy over 4 in 5 requests to the ETC pool.

Table 3: Miss categories in last 24 hours of the ETC trace. Compulsory misses count GETs with no matching SET in the preceding 10 days (meaning, for all practical purposes, new keys to the cache). Invalidation misses count GETs preceded by a matching DELETE request. Eviction misses count all other missing GETs.

Miss category     Compulsory   Invalidation   Eviction
Ratio of misses   70%          8%             22%

5. STATISTICAL MODELING

This section describes the salient workload characteristics of the ETC trace using simple distribution models. The ETC trace was selected because it is both the most representative of large-scale, general-purpose KV stores, and the easiest to model, since it is not distorted by the idiosyncratic aberrations of application-specific pools. We also think that its mixed workload is easier to generalize to other general-purpose caches with a heterogeneous mix of requests and sizes. The more Facebook-specific workloads, such as USR or SYS, may be interesting as edge cases, but probably not so much as models for synthetic workloads.

The functional models presented here prioritize parsimonious characterization over fidelity. As such, they obviously do not capture all the nuances of the trace, such as its bursty nature or the inclusion of one-off events. But barring access to the actual trace, they can serve the community as a better basis for synthetic workload generation than assumptions based on guesswork or small-scale logs.

Methodology

We modeled independently the three main performance properties that would enable simple emulation of this trace: key sizes, value sizes, and inter-arrival rates. The rate and ratio between GET/SET/DELETE requests can be derived from Sec. 3.1.
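The independence assumption makes emulation simple: each synthetic request draws its key size, value size, and inter-arrival gap from three unrelated samplers. A structural sketch, where the concrete samplers are trivial stand-ins rather than the fitted models:

```python
import random

def synthesize(n, key_size, value_size, gap_us, rng=random.Random(1)):
    """Emit n synthetic (arrival_time_us, key_size, value_size) requests,
    drawing the three properties independently, per the methodology above."""
    t, out = 0.0, []
    for _ in range(n):
        t += gap_us(rng)
        out.append((t, key_size(rng), value_size(rng)))
    return out

# stand-in samplers; the fitted GEV/GP models would slot in here instead
reqs = synthesize(3,
                  key_size=lambda r: 30,
                  value_size=lambda r: r.choice([2, 11, 300]),
                  gap_us=lambda r: 20.0)
# arrival times advance by the sampled gaps: 20.0, 40.0, 60.0
```
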
For cache analysis, additional properties can be gleaned from Sec. 4. To justify the assumption that the three properties are independent, we picked a sample of 1,000,000 consecutive requests and measured the Pearson coefficient between each pair of variables. The pairwise correlations, as shown in Table 4, are indeed very low.

Table 4: Pearson correlation coefficient between each pair of modeled variables.

Variable pair                     Correlation
Inter-arrival gap ↔ Key size      −0.0111
Inter-arrival gap ↔ Value size     0.0065
Key size ↔ Value size             −0.0286

We created functional models by fitting various distributions (such as Weibull, Gamma, Extreme Value, Normal, etc.) to each data set and choosing the distribution that minimizes the Kolmogorov-Smirnov distance. All our data resemble power-law distributions, and fit the selected models quite well, with the exception of a handful of points (see Fig. 8). To deal with these outliers and improve the fit, we removed these points as necessary and modeled the remaining samples. The few removed points are tabulated separately as a histogram, so a more accurate synthetic workload of the entire trace should combine the functional model with the short list of value-frequency outliers.

Key-Size Distribution

We found the model that best fits key sizes in bytes (with a Kolmogorov-Smirnov distance of 10.5) to be the Generalized Extreme Value distribution with parameters μ = 30.7984, σ = 8.20449, k = 0.078688. We have verified that these parameters remain fairly constant, regardless of time of day.

Value-Size Distribution

We found the model that best fits value sizes in bytes (with a Kolmogorov-Smirnov distance of 10.5), starting from 15 bytes, to be Generalized Pareto with parameters θ = 0, σ = 214.476, k = 0.348238 (this distribution is also independent of the time of day). The first 15 value lengths and their probabilities can be modeled separately as a discrete probability distribution whose values are given in Table 5.
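A synthetic value-size sampler can combine the discrete head (the Table 5 probabilities) with the fitted Generalized Pareto tail via inverse-CDF sampling. This is a sketch of one plausible combination, not code from the study; in particular, shifting the tail to start at 15 bytes is our interpretation of "starting from 15 bytes":

```python
import random

# discrete head: byte-lengths 0-14 with the Table 5 probabilities
HEAD = [0.00536, 0.00047, 0.17820, 0.09239, 0.00018, 0.02740, 0.00065,
        0.00606, 0.00023, 0.00837, 0.00837, 0.08989, 0.00092, 0.00326, 0.01980]
SIGMA, K = 214.476, 0.348238  # Generalized Pareto fit (theta = 0)

def sample_value_size(rng=random):
    """Draw one synthetic ETC value size: discrete below 15 bytes,
    Generalized Pareto (shifted to start at 15 bytes) above."""
    u = rng.random()
    for size, p in enumerate(HEAD):
        if u < p:
            return size
        u -= p
    # GP inverse CDF: x = sigma/k * ((1 - v)^-k - 1), then shift by 15
    v = rng.random()
    return 15 + SIGMA / K * ((1.0 - v) ** -K - 1.0)

sizes = [sample_value_size() for _ in range(10_000)]
```

About 44% of the probability mass sits in the discrete head; the remainder falls to the heavy Pareto tail, which is what makes byte-weighted metrics so sensitive to a few large values.
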
Figure 8: PDF (left), CDF (middle), and CDF residuals (right) plots for the distribution of ETC’s key size (top), value size (center), and inter-arrival gap (bottom). Note that some axes are logarithmic, and that PDF plots limit the X-axis to the area of interest for greater detail.

Table 5: Probability distribution for first few value lengths, in bytes.

Value size   Probability
0            0.00536
1            0.00047
2            0.17820
3            0.09239
4            0.00018
5            0.02740
6            0.00065
7            0.00606
8            0.00023
9            0.00837
10           0.00837
11           0.08989
12           0.00092
13           0.00326
14           0.01980

Inter-arrival Rate Distribution

We found the model that best describes the time gap in microseconds between consecutive received requests (with a Kolmogorov-Smirnov distance of 2.0) to be Generalized Pareto with parameters θ = 0, σ = 16.0292, k = 0.154971, starting from the second value.
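Sampling synthetic gaps from this model can again use the Generalized Pareto inverse CDF. The sketch below reinstates the separately tabulated point mass at 0 μs (probability 0.1159, per the text); whether the fitted tail should additionally be shifted to start at 1 μs is an interpretation choice we leave out here:

```python
import random

P_ZERO = 0.1159               # measured probability of a same-microsecond gap
SIGMA, K = 16.0292, 0.154971  # Generalized Pareto fit (theta = 0)

def sample_gap_us(rng=random):
    """Draw one synthetic inter-arrival gap in microseconds: a point mass
    at 0, otherwise the fitted Generalized Pareto tail (inverse CDF)."""
    if rng.random() < P_ZERO:
        return 0.0
    u = rng.random()
    return SIGMA / K * ((1.0 - u) ** -K - 1.0)

gaps = [sample_gap_us() for _ in range(20_000)]
```
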
One excluded point from this model is the first value, representing a gap of 0 μsec (in other words, multiple requests at the same microsecond time slot), with a probability of 0.1159. This is likely an artifact of our measurement granularity and aggregation by the network stack, and not of concurrent requests, since they are all serialized by the single networking interface. In addition, the model is most accurate up to about a gap of 1000 μsec. But the total number of sampled points not covered by this model (i.e., those requests that arrive more than 1msec after the previous request) represents less than 0.002% of the total samples and their residual error is negligible. Unlike the previous two distributions, inter-arrival rate— the reciprocal function of offered load—is highly dependent on time of day, as evident in Fig. 3. For those wishing to capture this diurnal variation, this complete-trace model may be too coarse. To refine this distribution, we divided the
raw data into 24 hourly bins and modeled each separately. Fortunately, they all fit a Generalized Pareto distribution with θ = 0 rather well. The remaining two parameters are distributed over time in Table 6.

Table 6: Hourly distributions for inter-arrival gap. The columns represent (in order): start time of each hourly bin (in UTC), the two Generalized Pareto parameters (with θ = 0), the fraction of samples under 1 μs gap, and the Kolmogorov-Smirnov distance of the fit.

Time    σ        k         < 1 μs   KS
0:00    16.2868  0.155280  0.1158   2.18
1:00    15.8937  0.141368  0.1170   2.14
2:00    15.6345  0.137579  0.1174   2.09
3:00    15.7003  0.142382  0.1174   2.16
4:00    16.3231  0.160706  0.1176   2.32
5:00    17.5157  0.181278  0.1162   2.52
6:00    18.6748  0.196885  0.1146   2.64
7:00    19.5114  0.202396  0.1144   2.64
8:00    20.2050  0.201637  0.1123   2.58
9:00    20.2915  0.193764  0.1116   2.46
10:00   19.5577  0.178386  0.1122   2.35
11:00   18.2294  0.161636  0.1130   2.17
12:00   17.1879  0.140461  0.1138   2.00
13:00   16.2159  0.119242  0.1146   1.88
14:00   15.6716  0.104535  0.1152   1.76
15:00   15.2904  0.094286  0.1144   1.72
16:00   15.2033  0.096963  0.1136   1.72
17:00   14.9533  0.098510  0.1140   1.74
18:00   15.1381  0.096155  0.1128   1.67
19:00   15.3210  0.094156  0.1129   1.65
20:00   15.3848  0.100365  0.1128   1.68
21:00   15.7502  0.111921  0.1127   1.80
22:00   16.0205  0.131946  0.1129   1.96
23:00   16.3238  0.147258  0.1148   2.14

6. DISCUSSION

One pertinent question is: what are the factors that affect and predict hit rates? Since all hosts have the same amount of RAM, we should be able to easily explain the relative differences between traces using the locality data we have gathered so far. But as Sec. 4 discusses, hit rates do not actually correlate very well with most locality metrics; rather, they correlate inversely with the size of the pool (compare Tables 1 and 2). Does correlation imply causation in this case?

Probably not. A more likely explanation invokes a third, related parameter: the size of the application domain.
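The hourly fits in Table 6 are straightforward to exercise, for example to generate synthetic inter-arrival gaps. A minimal sketch using SciPy, under the assumptions that the table follows the (k, σ, θ) parameterization (SciPy's `genpareto` calls the shape `c`, and θ = 0 maps to `loc=0`) and that gaps are in microseconds; the 0:00 row is used for illustration:

```python
import numpy as np
from scipy.stats import genpareto

# Generalized Pareto parameters for the 0:00 UTC bin (Table 6).
# SciPy's genpareto takes the shape as `c`; theta = 0 maps to loc = 0.
k, sigma = 0.155280, 16.2868
gap_dist = genpareto(c=k, loc=0.0, scale=sigma)

# Draw one million synthetic inter-arrival gaps (in microseconds).
rng = np.random.default_rng(42)
gaps_usec = gap_dist.rvs(size=1_000_000, random_state=rng)

# Sanity check: for k < 1 the mean gap is sigma / (1 - k), about 19.3 usec
# here, i.e., an offered load of roughly 50K requests per second.
mean_gap = gaps_usec.mean()
print(f"mean gap: {mean_gap:.1f} usec")
```

A full generator would select the (k, σ) pair for the current hour of simulated time, and separately inject the roughly 11.6% of zero-gap samples that the model excludes.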
Both in USR’s and SYS’s cases, these sizes are more or less capped, and the bound is small enough that a limited number of servers can cover virtually the entire domain, so locality no longer plays a factor. On the other extreme, ETC has a varying and growing number of applications using it, some with unbounded data. If any single application grows enough in importance to require a certain quality of service, and has the size limitations to enable this quality, given enough servers, then it is separated out of ETC to its own pool. So the applications that end up using ETC are precisely those that cannot or need not benefit from hit-rate guarantees.

Nevertheless, improving hit rates is important for these applications, or we would not need a cache in the first place. One way to improve ETC’s hit rates, at least in theory, is to increase the total amount of RAM (or servers) in the pool so that we can keep a longer history. But in practice, beyond a couple of days’ worth of history, the number of keys that would benefit from the longer memory is vanishingly small, as Fig. 7 shows. And of course, adding hardware adds cost.

A more fruitful direction may be to focus on the cache replacement policy. Several past studies demonstrated replacement policies with reduced eviction misses, compared to LRU, such as LIRS [19]. Table 3 puts an upper limit on the number of eviction misses that can be eliminated, at around 22%, meaning that eviction policy changes could improve hit rates by another 0.22 × (1 − 0.814) = 4.1%. This may sound modest, but it represents over 120 million GET requests per day per server, with noticeable impact on service latency. Moreover, the current cache replacement scheme and its implementation are suboptimal when it comes to multithreaded performance [9], with its global lock protecting both hash table and slab LRUs. We therefore perceive great potential in alternative replacement policies, not only for better hit rates but also for better performance.
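The 4.1% figure is simple arithmetic, but it is worth making the derivation explicit. A small sketch; the daily GET volume is a hypothetical input, chosen only to match the scale of the "over 120 million GET requests per day per server" impact quoted above:

```python
# Back-of-the-envelope for the eviction-policy upper bound discussed above.
hit_rate = 0.814   # ETC's observed hit rate
evictable = 0.22   # fraction of misses that are avoidable eviction misses (Table 3)

improvement = evictable * (1.0 - hit_rate)   # absolute hit-rate gain
print(f"potential hit-rate improvement: {improvement:.1%}")  # -> 4.1%

# Hypothetical daily GET volume per server, chosen so the gain lands in the
# ballpark of the 120-million-requests-per-day figure in the text.
daily_gets = 3.0e9
extra_hits = improvement * daily_gets
print(f"extra hits per day: {extra_hits / 1e6:.0f} million")  # -> 123 million
```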
Another interesting question is whether we should optimize Memcached for hit rates or byte hit rates. To answer it, we estimated the penalty of each miss in the ETC workload by measuring the duration between the missing GET and the subsequent SET that reinstates the value (presumably, this duration represents the time cost to recalculate the value, and is already highly optimized, emphasizing the importance of improved hit rates). We found that it is roughly proportional to the value size, so whether we fill the cache with few large items or many small items of the same aggregate size should not affect recalculation time much. On the other hand, frequent misses do noticeably increase the load on the back-end servers and hurt the user experience, which explains why this cache does not prioritize byte hit rate. Memcached optimizes for small values, because they are by far the most common values. It may even be worthwhile to investigate not caching large objects at all, to increase overall hit rates.

7. RELATED WORK

To the best of our knowledge, this is the first detailed description of a large-scale KV-store workload. Nevertheless, there are a number of related studies on other caching systems that can shed light on the relevance of this work and its methodology.

The design and implementation of any storage or caching system must be optimized for its workload to be effective. Accordingly, there is a large body of work on the collection, analysis, and characterization of workloads on storage systems, including enterprise computing environments [2, 20, 21] and high-performance computing environments [11, 22, 30]. These observations can be of great importance to system design, engineering, and tuning. For example, in a study of file system workloads for large-scale scientific computing applications, Wang et al. collected and analyzed file accesses on an 800-node cluster running the Lustre file system at Lawrence Livermore National Laboratory [30].
One of their findings is that in some workloads, small requests account for more than 90% of all requests, but almost all data are accessed by large requests. In a study on file sys-