TDBv0.2: Cache background revalidation and eviction #515
Comments
It seems there is a race in the lock-free index, or we are actually hitting the #500 problem in the scenario from #1435: multiple parallel requests to a large file, combined with a Tempesta restart in the VM, sometimes produce warnings.
I've run a few rough HTTP/2 benchmarks with caching enabled.
Tempesta
1kb response
5kb response
128kb response
128kb response with HTTP/1
Nginx (nginx/1.23.3)
1kb response
5kb response
128kb response
Depends on #1869
Scope
The `tfw_cache_mgr` thread must traverse the Web cache and evict stale records under memory pressure, or revalidate them otherwise. The thread must be accurately scheduled and throttled so that it does not impact system performance while still efficiently freeing the required memory. #500 must be kept in mind as well.
Validation logic is defined by RFC 7234 4.3 and requires an implementation of conditional requests.
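The evict-or-revalidate decision and the conditional request headers from RFC 7234 4.3 can be sketched in user-space C as follows. This is only an illustration under stated assumptions: `struct cache_rec`, `mgr_decide()` and `build_cond_headers()` are hypothetical names and not part of Tempesta, and evicting a stale record that has no validator is a choice made for this sketch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical, simplified view of a cached response. */
struct cache_rec {
	time_t      stored_at;      /* when the response was stored */
	time_t      lifetime;       /* freshness lifetime, seconds */
	const char *etag;           /* ETag validator or NULL */
	const char *last_modified;  /* Last-Modified validator or NULL */
};

enum mgr_action { MGR_KEEP, MGR_EVICT, MGR_REVALIDATE };

/* Fresh records stay; stale ones are evicted under memory pressure
 * and revalidated otherwise (assumption: no validator means evict). */
static enum mgr_action
mgr_decide(const struct cache_rec *r, time_t now, bool mem_pressure)
{
	if (now - r->stored_at < r->lifetime)
		return MGR_KEEP;
	if (mem_pressure || (!r->etag && !r->last_modified))
		return MGR_EVICT;
	return MGR_REVALIDATE;
}

/* Write the conditional headers into buf.
 * Returns the number of bytes written or -1 if buf is too small. */
static int
build_cond_headers(const struct cache_rec *r, char *buf, size_t len)
{
	size_t n = 0;
	int k;

	if (!len)
		return -1;
	buf[0] = '\0';
	if (r->etag) {
		k = snprintf(buf + n, len - n, "If-None-Match: %s\r\n",
			     r->etag);
		if (k < 0 || (size_t)k >= len - n)
			return -1;
		n += (size_t)k;
	}
	if (r->last_modified) {
		k = snprintf(buf + n, len - n, "If-Modified-Since: %s\r\n",
			     r->last_modified);
		if (k < 0 || (size_t)k >= len - n)
			return -1;
		n += (size_t)k;
	}
	return (int)n;
}
```

A 304 reply to such a request would refresh the stored record, while a full response would replace it.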
Keep in mind the DoS attack from #520. The following items linked with #516 (TDB v0.3) must be implemented:
- `reinsert` and `lookup & insert` (`tdb_rec_get_alloc()`) logic from Temporal client accounting #1115 (temporarily implemented in Temporal client accounting #1178).
- `__cache_add_node()` creates a TDB entry, which immediately becomes visible to other threads, and later `tfw_cache_copy_resp()` inserts the actual data, so concurrent threads may get incomplete or corrupted data. It can be done in 2 phases (soft updates): (1) allocate space in the TDB data area and (2) do the actual insert (index update) to link the data.
- `tfw_client_obtain()` modifications from Temporal client accounting #1178, as well as the similar HTTP sessions storage (Sticky cookies load balancing #685), and `__cache_add_node()` must be changed to use the soft updates. This also implies some versioning: while a softirq is sending data for the current cached object (probably very slowly, with Redesign of TCP synchronous sending and data caching #391 .1 in mind), the object may become stale and/or be replaced by a new version, so only the new version must be fetched by new scans, while the old version must reside in TDB until it is fully transmitted and then be evicted.
- `chroot` isolation).
- The current maximum TDB table size is 128GB, which is too small for a web cache on modern hardware. This is the subject of NUMA-aware cache modes #400.
- `__cache_entry_size()` call which introduces an extra response traversal. It seems we can just allocate new TDB data blocks and later reuse them if we have extra space, or just ignore the tail if it's unusable.

The task is required to fix #803.
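The two-phase (soft update) insert from the list above can be illustrated with a small user-space sketch using C11 atomics. It is not TDB code: the direct-mapped `idx` array and the `rec_alloc_fill()`, `rec_publish()` and `rec_lookup()` helpers are hypothetical, and reclaiming the old version is left to the caller, who must wait until no transmission still reads it.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define IDX_SLOTS 64

/* Hypothetical record: key, payload length and payload. */
struct rec {
	unsigned long key;
	size_t        len;
	char          data[];
};

/* Toy direct-mapped index: one atomic pointer per slot. */
static _Atomic(struct rec *) idx[IDX_SLOTS];

/* Phase 1: allocate space and copy the data. The record is not
 * linked into the index yet, so no reader can observe it. */
static struct rec *
rec_alloc_fill(unsigned long key, const void *data, size_t len)
{
	struct rec *r = malloc(sizeof(*r) + len);

	if (!r)
		return NULL;
	r->key = key;
	r->len = len;
	memcpy(r->data, data, len);
	return r;
}

/* Phase 2: link the fully built record into the index. The release
 * ordering makes the payload visible before the pointer is.
 * Returns the previous version (or NULL); it must stay allocated
 * until all in-flight readers have finished with it. */
static struct rec *
rec_publish(struct rec *r)
{
	return atomic_exchange_explicit(&idx[r->key % IDX_SLOTS], r,
					memory_order_acq_rel);
}

/* Readers see either no record or a complete one. */
static struct rec *
rec_lookup(unsigned long key)
{
	struct rec *r = atomic_load_explicit(&idx[key % IDX_SLOTS],
					     memory_order_acquire);

	return (r && r->key == key) ? r : NULL;
}
```

New lookups only ever see the newest published version, while a sender that fetched the old pointer earlier can finish transmitting it.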
UPD. Since filtering (#731) and QoS (#488) also require eviction, the job should be done in the `tdb_mgr` thread instead.
UPD. TDB was designed to provide access to stored data in a zero-copy fashion, such that a cached response body can be sent directly to a socket. This property imposed several design limitations and introduced many difficulties. However, with TLS we always have to copy data, so the TDB design can be significantly simplified by copying. So this depends on #634.
Cache eviction
While CART is a well-known and good adaptive replacement algorithm, there are a number of caching algorithms based on machine learning which provide a much better cache hit ratio. See for example the survey and Cacheus. Some of these algorithms require access to columnar storage for statistics (a common practice in CDNs).
At least some interface for a user-space algorithm is required. Probably just CART with some weights, where the weights are loaded from user space into the kernel, would be enough.
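To show how externally loaded weights could steer eviction, here is a user-space sketch. It is deliberately not CART: it uses a much simpler generalized-CLOCK-style sweep, where each access refills an entry's credit from a per-class weight. All names (`weights_load()`, `entry_touch()`, `evict_pick()`) and the fixed table sizes are hypothetical.

```c
#include <stddef.h>

#define N_CLASSES 4
#define N_ENTRIES 8

/* Per-class weights; in the real system these would be loaded
 * from user space (assumption of this sketch). */
static unsigned weights[N_CLASSES] = { 1, 1, 1, 1 };

struct entry {
	unsigned cls;     /* content class index, < N_CLASSES */
	unsigned credit;  /* sweeps the entry survives */
	int      used;    /* slot holds a cached object */
};

static struct entry ents[N_ENTRIES];
static size_t hand;       /* clock hand position */

/* Replace the weights with up to N_CLASSES values. */
static void
weights_load(const unsigned *w, size_t n)
{
	for (size_t i = 0; i < n && i < N_CLASSES; i++)
		weights[i] = w[i];
}

/* On access, refill the credit from the class weight. */
static void
entry_touch(struct entry *e)
{
	e->credit = weights[e->cls];
}

/* Sweep the clock: spend one credit per pass, evict the first
 * entry with no credit left. Returns its index or -1 if empty. */
static int
evict_pick(void)
{
	size_t used = 0;

	for (size_t i = 0; i < N_ENTRIES; i++)
		used += ents[i].used != 0;
	if (!used)
		return -1;

	for (;;) {
		size_t cur = hand;
		struct entry *e = &ents[cur];

		hand = (hand + 1) % N_ENTRIES;
		if (!e->used)
			continue;
		if (e->credit) {
			e->credit--;
			continue;
		}
		e->used = 0;
		return (int)cur;
	}
}
```

A class with a higher weight survives more sweeps, so a user-space model only has to update the weight table to change which content is kept longer.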
The cache must implement per-vhost eviction strategies and space quotas to provide caching QoS for CDN cases. Probably 2-layer quotas are required to guard against poor configuration, such as a bad Vary specification on the application side, which may take too much space (linked with #733). Different eviction strategies are required to handle e.g. chunks of live streams (huge data volume, outdated chunks must be removed immediately) and rarely updated web content like CSS (stale entries may be served).
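One possible reading of the 2-layer quotas is a per-vhost byte limit plus a smaller limit on all variants of a single resource, so that a bad Vary specification cannot consume the whole vhost quota. The sketch below illustrates that reading only; the structures and the `cache_charge()` and `cache_uncharge()` helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Byte accounting with the invariant used <= limit. */
struct quota {
	size_t used;
	size_t limit;
};

/* Layer 1: total cache space of one vhost. */
struct vhost_cache {
	struct quota total;
};

/* Layer 2: space taken by all variants of one resource. */
struct resource {
	struct vhost_cache *vh;
	struct quota        variants;
};

static bool
quota_fits(const struct quota *q, size_t n)
{
	return n <= q->limit - q->used;
}

/* Charge n bytes to both layers, or to neither of them. */
static bool
cache_charge(struct resource *res, size_t n)
{
	if (!quota_fits(&res->variants, n)
	    || !quota_fits(&res->vh->total, n))
		return false;
	res->variants.used += n;
	res->vh->total.used += n;
	return true;
}

/* Return n previously charged bytes, e.g. after an eviction. */
static void
cache_uncharge(struct resource *res, size_t n)
{
	res->variants.used -= n;
	res->vh->total.used -= n;
}
```

A failed charge is the point where the per-vhost eviction strategy would be asked to free space before retrying.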
It must be possible to 'lock' some records in evictable data sets (see #858 and #471).
Purging
Once this feature is implemented, we should be able to update the site content normally, without a Tempesta restart or memory leaks. It's hard to track which new pages appeared and which were deleted during a site content update, so in this task we need:
- `immediate` (`purge` in the original [Cache] purging #501) strategy for purging (we still need the mode that leaves stale responses in the cache for Servicing stale cached responses and immediate purging #522);
Documentation
Need to update https://github.com/tempesta-tech/tempesta/wiki/Caching-Responses#manual-cache-purging wiki page.
Testing
- `invalidate` and `immediate` strategies