TDBv0.2: Cache background revalidation and eviction #515

Open · 4 of 22 tasks
krizhanovsky opened this issue May 25, 2016 · 3 comments

Labels: cache, crucial, enhancement, TDB (Tempesta DB module and related issues)

krizhanovsky (Contributor) commented May 25, 2016

Depends on #1869

Scope

The tfw_cache_mgr thread must traverse the Web cache and evict stale records under memory pressure, or revalidate them otherwise. The thread must be accurately scheduled and throttled so as not to impact system performance, while still freeing the required memory efficiently. #500 must be kept in mind as well.

The validation logic is defined by RFC 7234 4.3 and requires the implementation of conditional requests.
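
A hedged illustration, not the actual Tempesta API: a background revalidation pass could build a conditional request from the validators stored with a cache entry. The cache_entry_meta struct and the build_revalidation_hdrs() helper below are hypothetical names.

#include <stdio.h>

/* Hypothetical per-entry metadata; real TDB records differ. */
struct cache_entry_meta {
	char etag[128];         /* ETag from the origin, "" if absent */
	char last_modified[64]; /* Last-Modified from the origin, "" if absent */
};

/*
 * Emit validator headers per RFC 7234 4.3: prefer If-None-Match over
 * If-Modified-Since. A 304 response then just refreshes the stored
 * entry's metadata, while a 200 response replaces the entry.
 */
static int build_revalidation_hdrs(const struct cache_entry_meta *ce,
				   char *buf, size_t len)
{
	if (ce->etag[0])
		return snprintf(buf, len, "If-None-Match: %s\r\n", ce->etag);
	if (ce->last_modified[0])
		return snprintf(buf, len, "If-Modified-Since: %s\r\n",
				ce->last_modified);
	return 0; /* no validators: a full fetch is required */
}

int main(void)
{
	struct cache_entry_meta ce = { .etag = "\"5e15-1a2b\"" };
	char hdrs[256];

	if (build_revalidation_hdrs(&ce, hdrs, sizeof(hdrs)) > 0)
		fputs(hdrs, stdout);
	return 0;
}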

Keep in mind the DoS attack from #520. The following items, linked with #516 (TDB v0.3), must be implemented:

  • Revalidate cache entries by a specified per-vhost timeout (like the S3 lifecycle).
  • TDB tables must be dynamically extensible and should not be restricted to power-of-2 sizes, e.g. 7GB should be fine. See the comment in "Huge pages allocation issue and the crash on cache sizes >=2GB" #1515.
  • UPDATE and DELETE operators must be implemented. The lock-free index should probably be immutable, with deletion implemented via tombstones and an update being a new copy of the data plus a tombstone for the old data (see the tombstone sketch after this list).
  • Properly implement the reinsert and lookup-and-insert (tdb_rec_get_alloc()) logic from "Temporal client accounting" #1115 (temporarily implemented in #1178).
  • Race-free interface for large insertions. E.g. __cache_add_node() creates a TDB entry, which immediately becomes visible to other threads, and only later does tfw_cache_copy_resp() insert the actual data, so concurrent threads may read incomplete or corrupted data. This can be done in 2 phases (soft updates): (1) allocate space in the TDB data area and (2) do the actual insert (index update) to link the data; see the soft-update sketch after this list. The tfw_client_obtain() modifications from "Temporal client accounting" #1178, the similar HTTP sessions storage ("Sticky cookies load balancing" #685), and __cache_add_node() must be changed to use soft updates. This also implies some versioning: while a softirq is sending data for the current cached object (possibly very slowly, with "Redesign of TCP synchronous sending and data caching" #391.1 in mind), the object may become stale and/or be replaced by a new version, so new scans must fetch only the new version, while the old version must reside in TDB until it is fully transmitted, after which it should be evicted.
  • Support/fix constant address placement for small records, see the comment in "Temporal client accounting" #1178.
  • Generic item removal. On removal the HTrie must be shrunk. With record locking and/or reference counting, tombstone-based removal should probably be implemented.
  • There must be locks or reference counters for stored entries so that entries being processed are not deleted (see e.g. "Servicing stale cached responses and immediate purging" #522).
  • A custom eviction strategy must be implemented (e.g. the Web cache should register its own callbacks for freshness calculation) such that different tables can use different eviction strategies or no eviction at all. Custom triggers must be supported, e.g. the TLS cache should be able to cap the number of stored sessions at 50 (see ssl_cache.c).
  • Besides the creation timestamp used for eviction, entries must have minimum and maximum lifetimes honored by the eviction strategy.
  • The number of memset() calls must be reduced.
  • Fix data persistence on clean restart. Introduce non-persistent tables: the sessions ("Sticky cookies load balancing" #685) and client ("Temporal client accounting" #1115) tables should be non-persistent. Probably for Beta we should go with non-persistent tables only (as now). We definitely need a configuration option for whether to read the full database into RAM on start or to throw away all the data (or do it in the background, per "TDBv0.3: transactions, indexes, durability" #516).
  • Web cache data for different vhosts must be stored in different tables to prevent full-path collisions and to improve concurrency and security (table separation plus tdbfs user/group access control instead of chroot isolation).
  • The current maximum TDB table size is 128GB, which is too small for a web cache on modern hardware. This is the subject of "NUMA-aware cache modes" #400.
  • At the moment we have a very limited number of tables, but we might need to scale to thousands of tables, e.g. for logging ("Custom logging" #537).
  • We need to create Tempesta DB tables at runtime (e.g. to reconfigure a hash table for a bot protection algorithm) in order to load Tempesta Language ("HTTPtables migration to eBPF" #102) scripts at runtime.
  • Cache tables must be per-vhost to get rid of unnecessary contention and index splitting between vhosts; this is important for the CDN use case. However, large tables must still be supported for single-resource cases.
  • Avoid the __cache_entry_size() call, which introduces an extra response traversal. It seems we can just allocate new TDB data blocks and later reuse any extra space, or simply ignore the tail if it is unusable.
  • Consider sending cached content as compound pages, just as high-speed NICs do (e.g. see the discussion in "Random memory corruptions when modifying HTTP header data in an SKB" #447).
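
A minimal userspace sketch of the tombstone idea from the UPDATE/DELETE item above; the append-only log, the types, and the newest-first scan are illustrative assumptions, not TDB code:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical append-only index record. */
struct tdb_idx_rec {
	uint64_t key;
	bool tombstone;   /* true: the key is logically deleted */
	const void *data; /* payload of a live record */
};

/*
 * Scan newest-to-oldest so the most recent version wins: a tombstone
 * shadows older copies (DELETE), and an UPDATE is a newer copy plus a
 * tombstone for the old data. Compaction later reclaims the space.
 */
static const void *tdb_lookup(const struct tdb_idx_rec *recs, size_t n,
			      uint64_t key)
{
	for (size_t i = n; i-- > 0; ) {
		if (recs[i].key != key)
			continue;
		return recs[i].tombstone ? NULL : recs[i].data;
	}
	return NULL;
}

int main(void)
{
	static const char v1[] = "v1", v2[] = "v2";
	struct tdb_idx_rec log[] = {
		{ .key = 7, .data = v1 },        /* INSERT */
		{ .key = 7, .tombstone = true }, /* DELETE of the old copy */
		{ .key = 7, .data = v2 },        /* UPDATE: the new copy */
	};

	/* Returns v2: the newest live copy shadows everything older. */
	return tdb_lookup(log, 3, 7) == v2 ? 0 : 1;
}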

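A companion sketch for the race-free-insertion item: the two-phase "soft update". tdb_entry_reserve() and tdb_entry_publish() are hypothetical names, and a single atomic pointer stands in for the real HTrie index:

#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct tdb_rec {
	size_t len;
	char data[];
};

/* One index slot standing in for the HTrie; NULL means "no record". */
static _Atomic(struct tdb_rec *) index_slot;

/* Phase 1: allocate space in the data area; readers cannot see it yet. */
static struct tdb_rec *tdb_entry_reserve(size_t len)
{
	struct tdb_rec *r = malloc(sizeof(*r) + len);

	if (r)
		r->len = len;
	return r;
}

/*
 * Phase 2: link the fully written record into the index. The release
 * store orders all data writes before the index update, so a concurrent
 * reader doing an acquire load never observes a half-copied record.
 */
static void tdb_entry_publish(struct tdb_rec *r)
{
	atomic_store_explicit(&index_slot, r, memory_order_release);
}

int main(void)
{
	struct tdb_rec *r = tdb_entry_reserve(5);

	if (!r)
		return 1;
	memcpy(r->data, "hello", 5); /* fill the body before linking */
	tdb_entry_publish(r);
	return 0;
}
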
This task is required to fix #803.

UPD. Since filtering (#731) and QoS (#488) also require eviction, the job should be done in a tdb_mgr thread instead.

UPD. TDB was designed to provide access to stored data in a zero-copy fashion, so that a cached response body can be sent directly to a socket. This property imposed several design limitations and introduced many difficulties. However, with TLS we always have to copy data, so the TDB design can be significantly simplified by copying. Hence this depends on #634.

Cache eviction

While CART is a well-known, good adaptive replacement algorithm, there are a number of caching algorithms based on machine learning which provide a much better cache hit ratio. See, for example, the survey and Cacheus. Some of the algorithms require access to columnar storage for statistics (a common practice in CDNs).

At least some interface for a user-space algorithm is required. Probably just CART with weights, where the weights are loaded from user space into the kernel, would be enough; see the sketch below.
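
A minimal sketch of that idea, assuming hypothetical fixed-point weights pushed from user space into the kernel through some knob (the names and the scoring formula are illustrative; this shows the weighting only, it is not CART itself):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-entry access statistics. */
struct cache_stats {
	uint64_t last_access; /* timestamp of the last hit */
	uint64_t hits;        /* total access count */
};

/* User-space tunable weights, fixed point with a 10-bit fraction. */
static uint32_t w_freq = 512;    /* 0.5: how much frequency matters */
static uint32_t w_recency = 512; /* 0.5: how much recency matters */

/* Lower score = evict first: rarely hit, long-idle entries go out. */
static int64_t eviction_score(const struct cache_stats *s, uint64_t now)
{
	int64_t age = (int64_t)(now - s->last_access);

	return ((int64_t)w_freq * (int64_t)s->hits
		- (int64_t)w_recency * age) / 1024;
}

int main(void)
{
	struct cache_stats hot = { .last_access = 1000, .hits = 900 };
	struct cache_stats cold = { .last_access = 10, .hits = 3 };
	uint64_t now = 1010;

	/* The cold entry scores lower, so it is evicted first. */
	printf("hot=%lld cold=%lld\n",
	       (long long)eviction_score(&hot, now),
	       (long long)eviction_score(&cold, now));
	return 0;
}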

The cache must implement per-vhost eviction strategies and space quotas to provide caching QoS for CDN cases. Probably 2-layer quotas are required to guard against poor configuration, e.g. a bad Vary specification on the application side, which may consume too much space (linked with #733). Different eviction strategies are required to handle, for example, chunks of live streams (huge data volume; outdated chunks are removed immediately) and rarely updated web content like CSS (stale entries may be served).

It must be possible to 'lock' some records in evictable data sets (see #858 and #471); a minimal pinning sketch follows.
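
The pinning could look like the following hedged sketch (hypothetical names): a per-record reference counter that the eviction scan honors, which also protects records still being transmitted:

#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-record eviction state. */
struct tdb_rec_ref {
	atomic_int pins; /* 0 = evictable, >0 = locked in the cache */
};

/* Take a pin before reading or transmitting the record. */
static void tdb_rec_pin(struct tdb_rec_ref *r)
{
	atomic_fetch_add(&r->pins, 1);
}

/* Drop the pin once the record is no longer referenced. */
static void tdb_rec_unpin(struct tdb_rec_ref *r)
{
	atomic_fetch_sub(&r->pins, 1);
}

/* The eviction scan skips any record with an elevated pin count. */
static bool tdb_rec_evictable(struct tdb_rec_ref *r)
{
	return atomic_load(&r->pins) == 0;
}

int main(void)
{
	struct tdb_rec_ref r = { .pins = 0 };

	tdb_rec_pin(&r);           /* e.g. a softirq starts sending it */
	if (tdb_rec_evictable(&r)) /* false: the scan must skip it */
		return 1;
	tdb_rec_unpin(&r);
	return tdb_rec_evictable(&r) ? 0 : 1;
}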

Purging

Once this feature is implemented, we should be able to update site content normally, without a Tempesta restart or memory leaks. It is hard to track which new pages appear and which are deleted during a site content update, so in this task we need:

  1. full web content purging;
  2. an immediate purging strategy (purge in the original "[Cache] purging" #501); we still need the mode that leaves stale responses in the cache for "Servicing stale cached responses and immediate purging" #522.

Documentation

The https://github.com/tempesta-tech/tempesta/wiki/Caching-Responses#manual-cache-purging wiki page needs to be updated.

Testing

  • Measure throughput on large cached objects and compare with Nginx.
  • Test web content purging with both the invalidate and immediate strategies.
  • Test with a web cache larger than 4GB on 1 and 2 NUMA nodes, with cache modes 1 and 2.
@krizhanovsky krizhanovsky added this to the 0.5.0 Web Server milestone May 25, 2016
@krizhanovsky krizhanovsky changed the title [Cache] background revalidation and eviction [Cache] tfw_cache_mgr: background revalidation and eviction May 25, 2016
@krizhanovsky krizhanovsky self-assigned this May 25, 2016
@krizhanovsky krizhanovsky changed the title [Cache] tfw_cache_mgr: background revalidation and eviction [Cache] background revalidation and eviction May 13, 2017
@krizhanovsky krizhanovsky modified the milestones: 1.0 Web Server, 0.8 TDB v0.2 Jan 9, 2018
@krizhanovsky krizhanovsky modified the milestones: 1.1 TDB v0.2, 1.0 Beta Jul 15, 2018
@krizhanovsky krizhanovsky changed the title [Cache] background revalidation and eviction TDBv0.2: Cache background revalidation and eviction Nov 29, 2018
krizhanovsky added a commit that referenced this issue Dec 25, 2018
krizhanovsky added a commit that referenced this issue Dec 31, 2018
@krizhanovsky krizhanovsky modified the milestones: 1.0 Beta, 0.8 TDBv0.2 Feb 2, 2019
krizhanovsky (Contributor, Author) commented:

It seems there is some race in the lock-free index, or we actually hit the #500 problem in the scenario from #1435: multiple parallel requests to a large file

./wrk -d 3600 -c 16000 -t 8 -H 'connection: close' https://debian:443/research/web_acceleration_mechanics.pdf

combined with Tempesta restarts in the VM

# while :; do ./scripts/tempesta.sh --restart; sleep 30; done

sometimes produces warnings like

[ 1103.775556] [tdb] ERROR: out of free space
[ 1103.810415] [tdb] ERROR: out of free space
[ 1103.845177] [tdb] ERROR: Cannot allocate cache entry for key=0x37bed983985f3ea7
[ 1103.929897] [tdb] ERROR: out of free space
[ 1103.949002] [tdb] ERROR: Cannot allocate cache entry for key=0x37bed983985f3ea7
[ 1103.984315] [tdb] ERROR: out of free space
[ 1104.010543] [tdb] ERROR: Cannot allocate cache entry for key=0x37bed983985f3ea7
[ 1104.070816] [tdb] ERROR: out of free space
[ 1104.080997] [tdb] ERROR: Cannot allocate cache entry for key=0x37bed983985f3ea7
[ 1104.151540] [tdb] ERROR: out of free space
[ 1104.158845] [tdb] ERROR: Cannot allocate cache entry for key=0x37bed983985f3ea7
[ 1104.199489] [tdb] ERROR: out of free space
[ 1104.231891] [tdb] ERROR: Cannot allocate cache entry for key=0x37bed983985f3ea7
....

krizhanovsky (Contributor, Author) commented Oct 25, 2021

The task must be split. After #788, the most crucial part is removing cache entries for #522 plus some basic eviction to make the cache usable, i.e. to get rid of the memory leaking.

const-t (Contributor) commented Dec 30, 2022

I've made a few rough benchmarks of HTTP/2 with caching enabled.

h2load -c700 -m100 --duration=30 -t2 https://debian

Tempesta

1kb response

finished in 30.14s, 337279.80 req/s, 393.06MB/s
requests: 10118394 total, 10188394 started, 10118394 done, 10118394 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 10118394 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 11.52GB (12364696856) total, 1.70GB (1821310920) headers (space savings 23.08%), 9.65GB (10361235456) data
                     min         max         mean         sd        +/- sd
time for request:      391us    404.11ms     69.33ms     52.31ms    64.69%
time for connect:    70.24ms    229.04ms    169.16ms     56.50ms    61.71%
time to 1st byte:   195.61ms    323.51ms    252.20ms     27.06ms    79.96%
req/s           :       0.00     4462.36      803.41      771.99    59.29%

5kb response

finished in 30.23s, 229514.40 req/s, 1.14GB/s
requests: 6885532 total, 6955433 started, 6885532 done, 6885432 succeeded, 100 failed, 100 errored, 0 timeout
status codes: 6885469 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 34.16GB (36684160200) total, 1.16GB (1244661614) headers (space savings 23.00%), 32.83GB (35253572326) data
                     min         max         mean         sd        +/- sd
time for request:    17.12ms    698.47ms    103.21ms     39.88ms    90.88%
time for connect:    73.25ms    237.29ms    165.14ms     56.21ms    69.57%
time to 1st byte:   210.69ms    299.74ms    253.76ms     25.23ms    58.53%
req/s           :       0.00      603.40      366.27      247.73    69.86%

128kb response

finished in 30.36s, 17200.80 req/s, 2.11GB/s
requests: 516024 total, 586024 started, 516024 done, 516024 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 516273 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 63.24GB (67904607399) total, 90.50MB (94901121) headers (space savings 22.71%), 63.01GB (67651755146) data
                     min         max         mean         sd        +/- sd
time for request:    47.50ms      18.31s    998.10ms       1.12s    95.44%
time for connect:    70.58ms    254.74ms    159.74ms     56.57ms    68.43%
time to 1st byte:   203.41ms    474.57ms    360.97ms     78.33ms    58.21%
req/s           :       0.00      181.65       31.60       47.24    77.14%

128kb response with HTTP/1

finished in 30.37s, 21665.00 req/s, 2.65GB/s
requests: 649950 total, 719750 started, 649950 done, 649950 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 650181 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 79.52GB (85388074799) total, 142.43MB (149350032) headers (space savings 0.00%), 79.95GB (85844417328) data
                     min         max         mean         sd        +/- sd
time for request:    27.77ms       2.64s    510.89ms    293.07ms    85.45%
time for connect:    76.97ms    210.16ms    152.70ms     47.93ms    69.34%
time to 1st byte:   187.62ms    302.22ms    253.48ms     39.83ms    54.62%
req/s           :       0.00      336.64       48.35       78.75    82.86%

Nginx (nginx/1.23.3)

1kb response

finished in 30.15s, 135510.73 req/s, 150.56MB/s
requests: 4065322 total, 4135322 started, 4065322 done, 4065322 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 4065322 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 4.41GB (4736134430) total, 476.87MB (500034606) headers (space savings 33.15%), 3.88GB (4162889728) data
                     min         max         mean         sd        +/- sd
time for request:     1.45ms       1.54s    530.87ms    307.86ms    70.73%
time for connect:    15.54ms    374.44ms    123.50ms     85.68ms    77.57%
time to 1st byte:   179.61ms    909.80ms    359.37ms    165.22ms    86.00%
req/s           :     109.97      366.27      193.44       80.16    71.71%

5kb response

finished in 30.16s, 168594.90 req/s, 846.10MB/s
requests: 5057847 total, 5127847 started, 5057847 done, 5057847 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 5065270 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 24.79GB (26616104602) total, 599.00MB (628093480) headers (space savings 32.97%), 24.12GB (25896832020) data
                     min         max         mean         sd        +/- sd
time for request:      359us       5.39s    432.35ms    460.44ms    87.07%
time for connect:    22.18ms    265.32ms    123.70ms     63.49ms    57.29%
time to 1st byte:   219.39ms       2.17s    803.55ms    511.62ms    59.57%
req/s           :      55.85      558.71      240.58      163.94    72.29%

128kb response

finished in 30.27s, 16222.27 req/s, 2.05GB/s
requests: 486668 total, 556668 started, 486668 done, 486668 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 548023 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 61.56GB (66099265904) total, 65.85MB (69050898) headers (space savings 32.98%), 61.42GB (65952787645) data
                     min         max         mean         sd        +/- sd
time for request:    21.49ms      29.62s       3.73s       3.07s    71.63%
time for connect:    23.21ms    310.06ms    147.42ms     71.60ms    57.86%
time to 1st byte:   247.08ms       1.68s    754.43ms    418.40ms    52.57%
req/s           :       3.10      175.05       23.13       21.80    88.00%

FYI: sometimes h2load freezes at the end of benchmarking Tempesta; it looks like Tempesta holds the connection open.
