Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QUESTION] There may be sth wrong in func raxRemove when oom occurs #13180

Open
wudilun123 opened this issue Mar 29, 2024 · 5 comments
Open

[QUESTION] There may be sth wrong in func raxRemove when oom occurs #13180

wudilun123 opened this issue Mar 29, 2024 · 5 comments

Comments

@wudilun123
Copy link

In func raxRemove, it calls raxLowWalk with a stack to save parents.
But when oom occurs, raxStackPush will not be able to grow in size and save new node, and there is no handling of return value in raxLowWalk:
if (ts) raxStackPush(ts,h); /* Save stack of parent nodes. */

Therefore, when function raxRemove calls raxStackPop after oom, an incomplete stack may be obtained. This will omit processing of some nodes and may cause a memory leak.

    if (h->size == 0) {
        debugf("Key deleted in node without children. Cleanup needed.\n");
        raxNode *child = NULL;
        while(h != rax->head) {
            child = h;
            debugf("Freeing child %p [%.*s] key:%d\n", (void*)child,
                (int)child->size, (char*)child->data, child->iskey);
            rax_free(child);
/
            rax->numnodes--;
            h = raxStackPop(&ts);
             /* If this node has more then one child, or actually holds
              * a key, stop here. */
            if (h->iskey || (!h->iscompr && h->size != 1)) break;
        }

What makes me more confused is that there is a judgment about oom in the code later to avoid using an incomplete stack.

    /* Don't try node compression if our nodes pointers stack is not
     * complete because of OOM while executing raxLowWalk() */
    if (trycompress && ts.oom) trycompress = 0;
@sundb
Copy link
Collaborator

sundb commented Mar 29, 2024

In func raxRemove, it calls raxLowWalk with a stack to save parents. But when oom occurs, raxStackPush will not be able to grow in size and save new node, and there is no handling of return value in raxLowWalk: if (ts) raxStackPush(ts,h); /* Save stack of parent nodes. */

Therefore, when function raxRemove calls raxStackPop after oom, an incomplete stack may be obtained. This will omit processing of some nodes and may cause a memory leak.

there's no memory leak, it's just that when a node is deleted, some nodes might not be cleaned up, i.e. a node doesn't have any children anymore, but it's still there.
btw, this only affects users using rax as dependance, but not redis, because redis oom is taken over by zmalloc_default_oom.

What makes me more confused is that there is a judgment about oom in the code later to avoid using an incomplete stack.

    /* Don't try node compression if our nodes pointers stack is not
     * complete because of OOM while executing raxLowWalk() */
    if (trycompress && ts.oom) trycompress = 0;

Because ts is incomplete, the parent-child relationship is actually unclear, so it can't be compressed.

@wudilun123
Copy link
Author

Assume that the current tree structure is a->b->c->d->e, and you want to delete the string "abcde."
After executing this code, b c d e is cleaned up normally:

if (h->size == 0) {
    debugf("Key deleted in node without children. Cleanup needed.\n");
    raxNode *child = NULL;
    while(h != rax->head) {
        child = h;
        debugf("Freeing child %p [%.*s] key:%d\n", (void*)child,
            (int)child->size, (char*)child->data, child->iskey);
        rax_free(child);
        rax->numnodes--;
        h = raxStackPop(&ts);
         /* If this node has more then one child, or actually holds
          * a key, stop here. */
        if (h->iskey || (!h->iscompr && h->size != 1)) break;
    }

But if you get an incomplete stack after oom, only a & b in the stack.
In this case, b &e will be cleaned up after the above code executed, c & d are still there and we lose the reference to node c (it has no chance to free).

Although it doesn't affect the use of redis, I think a possible solution is to move the oom judgment to the beginning of raxRemove, like:

int raxRemove(rax *rax, unsigned char *s, size_t len, void **old) {
    ...
    size_t i = raxLowWalk(rax,s,len,&h,NULL,&splitpos,&ts);
    if (ts.oom || i != len || (h->iscompr && splitpos != 0) || !h->iskey) {
        raxStackFree(&ts);
        return 0;
    }

@sundb
Copy link
Collaborator

sundb commented Mar 29, 2024

@wudilun123 If the structure is a->b->c->d->e, then the abc and abcd nodes exist and are not deleted after deleting abcde.
If they don't have a key, the structure would be head -> abcd

@wudilun123
Copy link
Author

Thank you for your reply~
If the structure likes this, abc is a key but ab is not:

(gdb) call raxShow($r)
[a] -> [bd]
        `-(b) [c] -> []=(nil)
        `-(d) []=(nil)

Now call raxRemove to delete node abc, assume that we get an incomplete stack without node ab.
Func raxStackPop will return node a after freeing node abc, which results in a mismatch between parent and child.
After that, we call raxRemoveChild and get stuck in a dead loop, because parent node a doesn't have child node abc:

/* 2. Search the child pointer to remove inside the array of children
 *    pointers. */
while(1) {
    raxNode *aux;
    memcpy(&aux,c,sizeof(aux));
    if (aux == child) break;
    c++;
    e++;
    }

@sundb
Copy link
Collaborator

sundb commented Mar 30, 2024

@wudilun123 you're right, in theory it's possible.
maybe you can make a unittest to test it if you want.
btw, the default static stack size if RAX_STACK_STATIC_ITEMS(32), so to test it we need to fake a rax that go deeper than 32, then manually randomize to make it go OOM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants