Vespa content pod created snapshot folder owned by nobody user in a high RAM situation #31024

nehajatav opened this issue Apr 24, 2024 · 3 comments

Describe the bug
After a period of high memory pressure, the following snapshot directory ended up owned by the nobody user (uid/gid 65534) instead of the vespa user, which prevents vespa-start-services from starting the node:

var/db/vespa/search/cluster.lexdoc/n0/documents/testDocument/0.ready/attribute/testField13_s_attr/snapshot-2298
Size: 148 Blocks: 0 IO Block: 4096 directory
Device: fd09h/64777d Inode: 447003 Links: 1
Access: (0755/drwxr-xr-x) Uid: (65534/ nobody) Gid: (65534/ nobody)

To Reproduce
Push the content node's RAM usage beyond its limits and you may be able to reproduce this.
Our RAM usage was just under 80% when we pushed a schema change converting numeric fields to non-paged attributes. This led to multiple hours of searchnode restarts, initially without any warning and with a spike in RAM, i.e. a crash; subsequently, warnings like the ones below appeared during a few of the restarts. Finally we reached the state described above, where the podman container itself crashed.
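To see the memory pressure building up while the change deploys, something like the following can be watched on the content host (the container name vespa-content is a placeholder, not our actual name):

  # Watch host memory and the content container's memory every few seconds
  # ("vespa-content" is a placeholder container name)
  watch -n 5 'free -m; podman stats --no-stream vespa-content'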

Expected behavior
We understand that RAM usage needs to be kept in check, but the last thing we expect is files being created by a user other than the vespa user.

Screenshots
1713931119.902770 vespa.content 10 - vespa-start-services warning Could not create a file in directory 'var/db/vespa/search/cluster.lexdoc/n0/documents/testDocument/0.ready/attribute/testField13_s_attr/snapshot-2298'
1713931119.902798 vespa.content 10 - vespa-start-services warning problem: open var/db/vespa/search/cluster.lexdoc/n0/documents/testDocument/0.ready/attribute/testField13_s_attr/snapshot-2298/tmp.probe.656871207.tmp: permission denied
1713931119.902802 vespa.content 10 - vespa-start-services warning Currently running as user: vespa
1713931119.902805 vespa.content 10 - vespa-start-services warning Wanted var/db/vespa/search/cluster.lexdoc/n0/documents/testDocument/0.ready/attribute/testField13_s_attr/snapshot-2298 to be owned by user id: 1000
1713931119.902807 vespa.content 10 - vespa-start-services warning Wanted var/db/vespa/search/cluster.lexdoc/n0/documents/testDocument/0.ready/attribute/testField13_s_attr/snapshot-2298 to have group id: 1000
1713931119.902811 vespa.content 10 - vespa-start-services warning Wanted var/db/vespa/search/cluster.lexdoc/n0/documents/testDocument/0.ready/attribute/testField13_s_attr/snapshot-2298 to have permissions: -rwxr-xr-x
1713931119.902813 vespa.content 10 - vespa-start-services warning current status of var/db/vespa/search/cluster.lexdoc/n0/documents/testDocument/0.ready/attribute/testField13_s_attr/snapshot-2298 is:
1713931119.904633 vespa.content 10 - vespa-start-services warning File: var/db/vespa/search/cluster.lexdoc/n0/documents/testDocument/0.ready/attribute/testField13_s_attr/snapshot-2298
Size: 148 Blocks: 0 IO Block: 4096 directory
Device: fd09h/64777d Inode: 447003 Links: 1
Access: (0755/drwxr-xr-x) Uid: (65534/ nobody) Gid: (65534/ nobody)
Access: 2024-04-23 22:18:09.444566411 +0000
Modify: 2024-04-23 22:18:09.429566725 +0000
Change: 2024-04-23 22:18:09.429566725 +0000
Birth: 2024-04-23 22:18:09.406567205 +0000

1713931119.904645 vespa.content 10 - vespa-start-services warning this is a fatal error!
open var/db/vespa/search/cluster.lexdoc/n0/documents/testDocument/0.ready/attribute/testField13_s_attr/snapshot-2298/tmp.probe.656871207.tmp: permission denied
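A possible manual workaround, assuming the directory contents are otherwise intact, is to restore the ownership vespa-start-services expects (uid/gid 1000, i.e. the vespa user) before restarting; a sketch:

  # Run as root inside the container; uid/gid 1000 is the vespa user per the warnings above
  chown -R 1000:1000 var/db/vespa/search/cluster.lexdoc/n0/documents/testDocument/0.ready/attribute/testField13_s_attr/snapshot-2298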

Environment (please complete the following information):

  • OS: RHEL 8
  • Infrastructure: Podman 4.4.1

Vespa version
8.221.29

Additional context
Just before this problem occurred, node restarts were accompanied for several hours by warnings like these, due to high RAM usage:

[2024-04-23 18:12:47.075] WARNING searchnode vespa-proton-bin Wanted 18446744073709551615 as limit for core file size but cannot exceed current hard limit: 0
[2024-04-23 18:18:18.792] WARNING searchnode proton.proton.server.disk_mem_usage_filter Write operations are now blocked: 'memoryLimitReached: { action: "add more content nodes", reason: "memory used (0.925613) > memory limit (0.9)", stats: { mapped: { virt: 52275617792, rss: 4988928}, anonymous: { virt: 58651811840, rss: 46603702272}, physicalMemory: 50349035520, memoryUsed: 0.925613, memoryLimit: 0.9}}'

[2024-04-24 03:17:59.062] WARNING searchnode proton.persistence.util Slow processing of message GetIterCommand(). Processing time: 154.5 s (>=5.0 s)

[2024-04-24 03:19:40.088] WARNING searchnode proton.deadlock.detector Thread filestormanager has gone 255035 milliseconds without registering a tick. Global slack not expended yet. Warning for now. Attempting to dump stack of thread...\nBacktrace:\n /opt/vespa/lib64/libsearchlib.so(+0x77704e) [0x7f4d6010204e]\n /opt/vespa/sbin/vespa-proton-bin() [0x6a0d22]\n /opt/vespa/sbin/vespa-proton-bin(proton::DocumentRetriever::populate(unsigned int, document::Document&, document::Field::Set const&) const+0x91) [0x5f3281]\n /opt/vespa/sbin/vespa-proton-bin() [0x5f36ec]\n /opt/vespa/lib64/libsearchlib.so(+0x90150e) [0x7f4d6028c50e]\n /opt/vespa/lib64/libsearchlib.so(search::FileChunk::read(__gnu_cxx::__normal_iterator<search::LidInfoWithLid const*, std::vector<search::LidInfoWithLid, std::allocatorsearch::LidInfoWithLid > >, unsigned long, search::FileChunk::ChunkInfo, search::IBufferVisitor&) const+0xf1) [0x7f4d60291591]\n /opt/vespa/lib64/libsearchlib.so(search::FileChunk::read(__gnu_cxx::__normal_iterator<search::LidInfoWithLid const*, std::vector<search::LidInfoWithLid, std::allocatorsearch::LidInfoWithLid > >, unsigned long, search::IBufferVisitor&) const+0xa1) [0x7f4d602916f1]\n /opt/vespa/lib64/libsearchlib.so(search::LogDataStore::read(std::vector<unsigned int, std::allocator > const&, search::IBufferVisitor&) const+0x2d5) [0x7f4d60296e25]\n /opt/vespa/lib64/libsearchlib.so(search::docstore::BackingStore::visit(std::vector<unsigned int, std::allocator > const&, document::DocumentTypeRepo const&, search::IDocumentVisitor&) const+0x29) [0x7f4d6028b899]\n /opt/vespa/sbin/vespa-proton-bin(proton::DocumentRetriever::visitDocuments(std::vector<unsigned int, std::allocator > const&, search::IDocumentVisitor&, storage::spi::ReadConsistency) const+0x4e) [0x5f1f8e]\n /opt/vespa/sbin/vespa-proton-bin(proton::DocumentIterator::fetchCompleteSource(proton::IDocumentRetriever const&, std::vector<std::unique_ptr<storage::spi::DocEntry, std::default_deletestorage::spi::DocEntry >, std::allocator<std::unique_ptr<storage::spi::DocEntry, std::default_deletestorage::spi::DocEntry > > >&)+0x765) [0x619335]\n /opt/vespa/sbin/vespa-proton-bin(proton::DocumentIterator::iterate(unsigned long)+0x62) [0x619c62]\n /opt/vespa/sbin/vespa-proton-bin(proton::PersistenceEngine::iterate(storage::spi::IteratorId, unsigned long) const+0x155) [0x60b8a5]\n /opt/vespa/lib64/libstorage.so(storage::ProviderErrorWrapper::iterate(storage::spi::IteratorId, unsigned long) const+0x37f) [0x7f4d5e84801f]\n /opt/vespa/lib64/libstorage.so(storage::SimpleMessageHandler::handleGetIter(storage::GetIterCommand&, std::unique_ptr<storage::MessageTracker, std::default_deletestorage::MessageTracker >) const+0x5e) [0x7f4d5e84934e]\n /opt/vespa/lib64/libstorage.so(storage::PersistenceHandler::handleCommandSplitByType(storage::api::StorageCommand&, std::unique_ptr<storage::MessageTracker, std::default_deletestorage::MessageTracker >) const+0x59f) [0x7f4d5e83df2f]\n /opt/vespa/lib64/libstorage.so(storage::PersistenceHandler::processMessage(storage::api::StorageMessage&, std::unique_ptr<storage::MessageTracker, std::default_deletestorage::MessageTracker >) const+0x14b) [0x7f4d5e83e16b]\n /opt/vespa/lib64/libstorage.so(storage::PersistenceHandler::processLockedMessage(storage::FileStorHandler::LockedMessage) const+0x1a5) [0x7f4d5e83e675]\n /opt/vespa/lib64/libstorage.so(storage::PersistenceThread::run(storage::framework::ThreadHandle&)+0xd2) [0x7f4d5e83ec62]\n /opt/vespa/lib64/libstorage.so(storage::framework::defaultimplementation::ThreadImpl::run()+0x53) 
[0x7f4d5e983f83]\n /lib64/libstdc++.so.6(+0xc2b23) [0x7f4d5113bb23]\n /opt/vespa/lib64/vespa/malloc/libvespamalloc.so(+0x8c30) [0x7f4d60888c30]\n /lib64/libpthread.so.0(+0x81ca) [0x7f4d5c8d31ca]\n /lib64/libc.so.6(clone+0x43) [0x7f4d50753e73]\n [(nil)] (Repeated 5 times since 1713928779.703860)
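As a side note on the first warning above: the container's hard limit for core file size was 0, so proton could not raise it. A sketch of how to inspect and lift that hard limit when starting the container with podman (the flag value is illustrative and may vary with the podman version):

  # Check the hard core-file-size limit inside the running container ("unlimited" is needed for core dumps)
  podman exec <container> sh -c 'ulimit -Hc'

  # Illustrative: start the container with an unlimited hard limit for core file size
  podman run --ulimit core=-1:-1 ... docker.io/vespaengine/vespa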

@vekterli (Member)

As far as I know, the search core does not set or modify the ownership of its data directories or files. However, is this by any chance running on an XFS filesystem? We have observed this exact scenario, very sporadically, on XFS during periods with lots of IO and file syncing, but only when the kernel's file system buffer cache and/or directory entry cache is being concurrently flushed/evicted.

The observed result was a directory with partially bogus inode metadata, in particular an owner that does not make sense (we've seen uid/gid of both 65534 and 0, neither of which is correct). Also note the Links: 1 field: I would expect a correct directory inode to have at least 2 links, since a directory normally includes a hard link to itself (note: this does not apply to all file systems; Btrfs, for instance, seems to have just 1).

The scenario you describe is one where the kernel is under very high memory pressure and is likely to try to evict file system caches to get more available memory for applications. If so, the preconditions are similar to what we have observed.

Note: this is inherently just a theory, and we have not had the time to do a deep dive into figuring this out (particularly since doing so would involve reading lots of kernel code). I've tried reproducing this issue by writing a program that stresses the kernel file system cache management through lots of concurrent nested directory creation, fsyncing and forced cache eviction, but never managed to trigger it.
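Roughly, the kind of stress loop attempted looked like the following sketch (not the actual program; it assumes running as root, with /mnt/xfs-test as a placeholder for a scratch XFS mount):

  # Concurrent nested directory creation + fsync while repeatedly forcing
  # eviction of the page/dentry/inode caches
  while true; do
    for i in $(seq 1 64); do
      (
        d="/mnt/xfs-test/stress.$$.$i/a/b/c"
        mkdir -p "$d" && touch "$d/probe" && sync -f "$d"
      ) &
    done
    echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes
    wait
    rm -rf /mnt/xfs-test/stress.$$.*
  done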

@nehajatav (Author)

It is XFS indeed

@vekterli (Member)

Entirely anecdotally, we have not observed this issue for some time on our systems. We do regular fleet-wide OS upgrades, so it's possible the underlying issue has been fixed in the upstream kernel.

What kernel version are you running, i.e. the output of uname -rv?

For instance, torvalds/linux@28b4b05 (which I have no idea whether it is related to this, but it does sound strikingly familiar...) appears to be part of kernel v6.2 and beyond. After some digging, it looks like this particular fix was backported into RHEL8 kernel version 4.18.0-513.5.1.
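For reference, a quick way to check whether a host already has that backport (assuming an RPM-based RHEL 8 host):

  # Kernel currently running; compare against 4.18.0-513.5.1 or newer
  uname -rv

  # Installed kernel packages, sorted by version
  rpm -q kernel | sort -V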

kkraune added this to the later milestone on May 8, 2024