Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.Fix the bug that after calling zsys_shutdown() for the first time, … #2274

Merged
merged 1 commit into from
Mar 11, 2024

Conversation

ZhuXxxx
Copy link
Contributor

@ZhuXxxx ZhuXxxx commented Sep 20, 2023

…s_mutex is not available even if zsys_init() is called again.

2.Fix the bug that caused the exit callback function table to be full after calling zsys_init() and zsys_shutdown() multiple times.

Pull Request Notice

Before sending a pull request make sure each commit solves one clear, minimal,
plausible problem. Further each commit should have the following format:

Problem: X is broken

Solution: do Y and Z to fix X

Please avoid sending a pull request with recursive merge nodes, as they
are impossible to fix once merged. Please rebase your branch on
zeromq/czmq master instead of merging it.

git remote add upstream git@github.com:zeromq/czmq.git
git fetch upstream
git rebase upstream/master
git push -f

In case you already merged instead of rebasing you can drop the merge commit.

git rebase -i HEAD~10

Now, find your merge commit and mark it as drop and save. Finally rebase!

If you are a new contributor please have a look at our contributing guidelines:
CONTRIBUTING.md

…s_mutex is not available even if zsys_init() is called again.

2.Fix the bug that caused the exit callback function table to be full after calling zsys_init() and zsys_shutdown() multiple times.
@sphaero
Copy link
Contributor

sphaero commented Mar 8, 2024

Sorry for the delay but I've never seen this problem. I'm willing to merge this but the CI logs have expired so I can't check whether the failures are due to this change.

It seems you are talking about a bug when zsys_shutdown is called before zsys_init is called?

@ZhuXxxx
Copy link
Contributor Author

ZhuXxxx commented Mar 8, 2024

Sorry for the delay but I've never seen this problem. I'm willing to merge this but the CI logs have expired so I can't check whether the failures are due to this change.

It seems you are talking about a bug when zsys_shutdown is called before zsys_init is called?

In Unix, variable 's_mutex' in file 'zsys.c' is initialized only once during the whole process lifetime by 'pthread_once()', and it will be destroyed when calling function 'zsys_shutdown()', If you call sys_init , and then call sys_shutdown, and call sys_init for the second time, variable ‘s_mutex’ is not initialized again, so 's_mutex' is invalid because it has been destroyed in function sys_shutdown, it can not play a role in ensuring thread safety, now if you call sys_shutdown for the second time in a multi-threaded environment, and other threads are using z_sock, you will most likely observe a program crash。

Of cause, if sys_shutdown is called only once at the end of the process, it's not necessary to make change, but you provided a demo of calling sys_init and sys_shutdown multiple times in the file 'sys.c' function 'zsys_test()',so to make sure ‘s_mutex’ is initialized only once and will be destroyed only once,I made this modifications.

@sphaero
Copy link
Contributor

sphaero commented Mar 8, 2024

So doing this causes a potential error you say?

zsys_init();
zsys_shutdown();
zsys_init();
zsys_shutdown();
zsys_init();

It must be an edge case. What should I do to make it crash?

I'm not sure what this fixes. Isn't it that your fix only guarantees that the mutex is destroyed only once the first time it is called?

@ZhuXxxx
Copy link
Contributor Author

ZhuXxxx commented Mar 8, 2024

So doing this causes a potential error you say?

zsys_init();
zsys_shutdown();
zsys_init();
zsys_shutdown();
zsys_init();

It must be an edge case. What should I do to make it crash?

I'm not sure what this fixes. Isn't it that your fix only guarantees that the mutex is destroyed only once the first time it is called?

My first fix is replace 'atexit (zsys_shutdown);' with 'pthread_once(&register_atexit_shutdown, zsys_register_atexit_shutdown);', it guarantees ‘sys_shutdown()’ will only be registered once, if not fix it, calling sys_init and sys_shutdown multiple times(about 32) will cause the atexit function table to become full.

My second fix is add 'atexit (zsys_destroy_mutex);' to function 'zsys_initialize_mutex()', and remove 'ZMUTEX_DESTROY (s_mutex);' in Unix environment, it guarantees 's_mutex' is always valid during the whole process lifetime, it will be destroyed after 'main()', if not fix it, 's_mutex' will become invalid after calling 'sys_shutdown', even if the process will continue to run.

My business is too complex to show, but I will show you another deme to prove this bug is causing some problems:

static void func()
{   
    ::std::vector<void*> vec;
    while (0 == zsys_interrupted) {
        while (vec.size() < 100000 && 0 == zsys_interrupted) {
            void* const handle = zsys_socket(ZMQ_PULL, nullptr, 0);
            if (nullptr == handle) {
                continue;
            }
            vec.push_back(handle);
        }
        for (void* const handle : vec) {
            zsys_close(handle, nullptr, 0);
        }
        vec.clear();
    }
}
int main(const int argc, const char* const argv[])
{
    zsys_init();
//    zsys_shutdown();
//    zsys_init();
    //Create two threads to call zsys_sockect and  zsys_close frequently
    ::std::thread t0(func);
    t0.detach();
    ::std::thread t1(func); 
    t1.detach();                            
    ::std::this_thread::sleep_for(::std::chrono::milliseconds(1000));
    //End two other threads
    zsys_interrupted = 1;
    ::std::this_thread::sleep_for(::std::chrono::milliseconds(1000));                   
    //Shutdown
    zsys_shutdown();                                                
    return 0;
}

If you run this demo on Linux OS, it works fine.

But if you 'zsys_init(); zsys_shutdown(); zsys_init();', like this:

int main(const int argc, const char* const argv[])
{
    zsys_init();
    zsys_shutdown();
    zsys_init();
    //Create two threads to call zsys_sockect and  zsys_close frequently
    ::std::thread t0(func);
    t0.detach();
    ::std::thread t1(func);
    t1.detach();
    ::std::this_thread::sleep_for(::std::chrono::milliseconds(1000));
    //End two other threads
    zsys_interrupted = 1;
    ::std::this_thread::sleep_for(::std::chrono::milliseconds(1000));
    //Shutdown
    zsys_shutdown();   
                       
    return 0;
}

You will see a error like 'E: 24-03-09 01:18:29 dangling sockets: cannot terminate ZMQ safely 12'.

Both zsys_socket and zsys_close have critical sections protected by s_mutex, so if s_mutex is invalid, some problems will occur.

There was no crash in this example, but the crash did happen in my business.(Complex scenes using zactor and zloop)

In the final analysis, it is all because s_mutex invalid forever after calling sys_shutdown at the first time, even if you called sys_init a second time.

@sphaero
Copy link
Contributor

sphaero commented Mar 11, 2024

I guess you have given this thought. I'll merge it. If it comes back with trouble we can revert it.

Btw, if you are using this lib we can use some help in maintaining this. Mostly consists of merging PRs. If you want to help out have a look at C4 which describes how we work.

@sphaero sphaero merged commit db94044 into zeromq:master Mar 11, 2024
6 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants