Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout Event Callback Triggered Later Than Expected Even With EVENT_BASE_FLAG_PRECISE_TIMER Flag Set #1484

Open
tonyyxliu opened this issue Jul 5, 2023 · 3 comments

Comments

@tonyyxliu
Copy link

Dear libevent developers,

I am using libevent-2.1.12-stable in my project to do latency measurement for latency-critical applications like Memcached, and I wish that the callback function of the timeout event could be triggered exactly as I expected even when the timeout is with a precision of microseconds. However, I found that

The callback function of timeout event is sometimes triggered much later than expected

The following screenshot shows what happened, and the logging messages are printed by google-log.

image

Since I am sending requests to Memcached at a certain frequency (for example, 500 microseconds as shown in the screenshot), I compute the time that I have to wait before sending the next request (466 microseconds in this case), setting it as the timeout of the event by the following command, and finally send the request and prepare for the next tick event when the callback function of the timeout event is triggered.

// Initialize issue_event
m_issue_event = evtimer_new(m_base, issue_callback, this);
// Set event with the computed timeout
evtimer_add(m_issue_event, &time_before_next_tick);

However, as the screenshots shows, the timeout event callback is triggered after around 10000 microseconds, which is much later than expected (466 microseconds). As all the operations above are handled by the client, there is no server-side issue. Besides, the event_base is created with EVENT_BASE_FLAG_PRECISE_TIMER flag in the following way, which seems fine as well.

event_config *m_event_config = event_config_new();
event_config_set_flag(m_event_config, EVENT_BASE_FLAG_PRECISE_TIMER);
event_base *m_base = event_base_new_with_config(m_event_config);

I have tried some other EVENT_BASE_FLAGs such as NO_CACHE_TIME but none of them solved my problem.

I am sorry that the reproducible scripts cannot be provided since only a very small number of timeout events behave unexpected, while others are completely fine. It makes me confused why the timeout event suddenly failed.

Does anyone have any idea about what caused this problem? Thank you very much for your kindness!

Yuxuan

@azat
Copy link
Member

azat commented Jul 9, 2023

Looks like everything should work.

How do you calculate the timeout? Maybe you are affected by this bug #1474? Also can you try upstream libevent?

@tonyyxliu
Copy link
Author

tonyyxliu commented Jul 11, 2023

Hi @azat ,
Thank you so much for your timely reply.

How do you calculate the timeout?

I have written a test file in C++ explaining what I have done with libevent, and the problem of unexpected delay of timeout event callback seems to be reproducible at least in my Intel NUC. Could you please have a try and see if the problem is reproducible in your computer? The miss request should be 0 if the timeout event of libevent performs as unexpected. Thank you in advance!

//
// Created by tonyyxliu on 7/11/23.
//
// This is a test for the problem related to the unexpected delay of libevent
// timeout event callback trigger
//
// Compile this program in release mode and link it with two libraries: 'event' and 'glog'
//

#include <iostream>
#include <event2/buffer.h>
#include <event2/event.h>
#include <event2/util.h>
#include <glog/logging.h>

unsigned int num_issue_attempts = 0;
unsigned int num_missed_requests = 0;
double request_interval =
    0.0001; // in seconds (=100 microseconds, equivalent to QPS=10k)
double test_duration = 2; // in seconds
event_base *m_base = nullptr;
event *m_issue_event = nullptr;
event *m_wrapup_event = nullptr;
timeval next_issue_time{};
timeval next_wrapup_time{};

/**
 * @brief Transfer from timeval struct to double (in seconds)
 * @param tv
 * @return
 */
double timeval_to_double(timeval *tv)
{
    return tv->tv_sec + static_cast<double>(tv->tv_usec) / 1e+06;
}

/**
 * @brief Transfer from double (in seconds) to timeval struct
 * @param time_val
 * @param tv
 * @return
 */
void double_to_timeval(double time_val, timeval *tv)
{
    auto secs = static_cast<long long>(time_val);
    tv->tv_sec = secs;
    tv->tv_usec = static_cast<int>((time_val - secs) * 1e+06);
}

/**
 * Add a event for listening and update the next time to tick this event
 * @param timeout_event
 * @param next_tick_time
 * @param tick_interval
 * @return true on success, false on error (timeout later than expected)
 */
void add_timeout_event(event *const timeout_event, timeval &next_tick_time,
                       const double next_tick_interval)
{
    timeval now{}, time_before_next_tick{};
    gettimeofday(&now, nullptr);
    // Trigger the timeout event immediately if the tick interval is 0
    if (next_tick_interval == 0)
    {
        event_active(timeout_event, 0, 0);
        return;
    }
    // Update the next time of tick
    double_to_timeval(timeval_to_double(&next_tick_time) + next_tick_interval,
                      &next_tick_time);
    evutil_timersub(&next_tick_time, &now, &time_before_next_tick);
#ifdef DEBUG
    LOG(INFO) << "tick_interval is " << next_tick_interval << " seconds.";
    LOG(INFO) << "Time before next tick is: " << time_before_next_tick.tv_sec
              << "s and " << time_before_next_tick.tv_usec << "usec.";
#endif
    // It may happen that the current time is later than time next_tick_time,
    // resulting in a request miss
    if (time_before_next_tick.tv_sec < 0)
    {
        num_missed_requests++;
        add_timeout_event(timeout_event, next_tick_time, next_tick_interval);
    }
    else
    {
        // No need to delete the event since new timeout will replace the old
        // one
        evtimer_add(timeout_event, &time_before_next_tick);
    }
}

/**
 * Callback function for tick event.
 * It sends the requests with the help of protocol and trigger the tick()
 * function to set another tick timeout event.
 * @param fd: socket
 * @param events: events
 * @param arg: can be transformed to the pointer of Connection class
 */
void timeout_callback(evutil_socket_t fd, short events, void *arg)
{
    num_issue_attempts++;
#ifdef DEBUG
    LOG(INFO) << "Issue_callback function triggered.";
#endif
    // Set timeout event again for the next issue
    add_timeout_event(m_issue_event, next_issue_time, request_interval);
    // Send requests here
}

/**
 * Callback function for the wrapup event. The timeout is set as the duration
 * of the connection / load generation. When this event is triggered, the
 * connection should be closed and load generation finished.
 * @param fd: socket
 * @param events: events
 * @param arg: can be transformed to the pointer of Connection class
 */
void wrapup_callback(evutil_socket_t fd, short events, void *arg)
{
#ifdef DEBUG
    LOG(INFO) << "Test Wrapup";
#endif
    // Delete timeout events
    evtimer_del(m_issue_event);
    evtimer_del(m_wrapup_event);
    // Stop listening for events
    event_base_loopbreak(m_base);
    // Resource destruction
    event_free(m_issue_event);
    event_free(m_wrapup_event);
    event_base_free(m_base);
    // Statistics logging
    LOG(INFO) << "Number of issue attempts: " << num_issue_attempts;
    LOG(INFO) << "Number of missed requests: " << num_missed_requests;
    LOG(INFO) << "Miss request: "
              << 100.0 * num_missed_requests /
                     (num_missed_requests + num_issue_attempts)
              << "%";
    // Print the version of libevent
    LOG(INFO) << "libevent linked version: " << event_get_version() << "\n";
    LOG(INFO) << "libevent headers version: " << LIBEVENT_VERSION << "\n";
    LOG(INFO) << "libevent version number: " << std::hex
              << LIBEVENT_VERSION_NUMBER << "\n";
}

int main()
{
    event_config *m_event_config = event_config_new();
    event_config_set_flag(m_event_config, EVENT_BASE_FLAG_PRECISE_TIMER);
    m_base = event_base_new_with_config(m_event_config);
    m_issue_event = evtimer_new(m_base, timeout_callback, nullptr);
    m_wrapup_event = evtimer_new(m_base, wrapup_callback, nullptr);
    // Initialize the next time to tick timeout events as the current time,
    // which will be updated in each iteration
    gettimeofday(&next_issue_time, nullptr);
    gettimeofday(&next_wrapup_time, nullptr);
    // Activate the timeout events
    add_timeout_event(m_wrapup_event, next_wrapup_time,
                      test_duration); // test lasts for 5 seconds
    event_active(m_issue_event, 0, 0);
    // Start listening for events
    event_base_dispatch(m_base);
}

This is the result that I got in my Intel NUC. (BTW, the version of linking and headers are different for libevent-2.1.12-stable)
image

Maybe you are affected by this bug #1474?

Sorry that I am not sure what this issue (lazy gettimeofday) is really about. It seems that I used the native gettimeofday function to get the currect timeval instead of evutil_gettimeofday.

Also can you try upstream libevent?

The problem still exists after upgrade libevent to the latest version (2.2.1-alpha).
image

@azat
Copy link
Member

azat commented Aug 2, 2023

FWIW I've looked into this problem, and it looks like the problem is that epoll sometimes takes too much time, look:

azat:.../tmp/1484 [1]$ sudo perf trace --duration 0.1 env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./test_azat
     0.308 ( 0.291 ms): test_azat/21738 execve(filename: 0xffffe66b, argv: 0x7fffffffe3f8, envp: 0x555555560520) = 0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20230716 21:55:53.198009 21738 test_azat.cpp:88] tick_interval is 3000000 us.
I20230716 21:55:53.198060 21738 test_azat.cpp:89] Time before next tick is: 1s and 1999947usec.
     2.553 I20230716 21:55:53.198233 21738 test_azat.cpp:111] Issue_callback function triggered.
( 0.155 ms): test_azat/21738 epoll_pwait2(epfd: 3<anon_inode:[eventpoll]>, events: 0x55555556e5a0, maxevents: 32, timeout: 0x7fffffffe1d8, sigsetsize: 8) = 0
I20230716 21:55:53.198246 21738 test_azat.cpp:88] tick_interval is 100 us.
I20230716 21:55:53.198256 21738 test_azat.cpp:89] Time before next tick is: -1s and 999809usec.
test_azat: test_azat.cpp:93: void add_timeout_event(event *const, timeval &, const long, bool): Assertion `time_before_next_tick.tv_sec >= 0' failed.

P.S. sorry for the long delay, I thought that I will take a look at why epoll is takes that much time, but it is already 2 weeks left and I did not did this yet...

Here is the full sources - https://gist.github.com/azat/f60871c83a32b7e31d8dfecb8f10f4a9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants