memory usage increases after fork the process #112

Open
gladtosee opened this issue May 20, 2019 · 6 comments

@gladtosee
Contributor

@WojciechMula
Hi, I have been using pyahocorasick and it works well, but I have run into a problem.

After a fork, the child process takes minor page faults that trigger copy-on-write, so its memory usage grows.
(https://en.wikipedia.org/wiki/Copy-on-write)
I called gc.freeze() before forking, but the page faults still occur.
(https://docs.python.org/3/library/gc.html#gc.freeze)

What should I do?
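
The freeze-then-fork setup described above can be reproduced with a small script. This is only a sketch under assumptions: the helper name `minor_faults_in_child` and the payload are hypothetical, it does not use pyahocorasick, and it needs a POSIX system with `os.fork`. `gc.freeze()` keeps the collector from touching existing objects, but it does not stop reference-count updates, so simply iterating over inherited objects in the child can still dirty shared pages.

```python
import gc
import os
import resource


def minor_faults_in_child(objects):
    """Fork, touch every object in the child, and return the minor faults that loop caused."""
    gc.freeze()  # move tracked objects to the permanent generation (Python 3.7+)
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: binding each object to a name bumps its refcount, which
        # writes to the object header and can dirty the shared page.
        status = 1
        try:
            os.close(read_fd)
            before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
            for obj in objects:
                pass
            after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
            os.write(write_fd, str(after - before).encode())
            status = 0
        finally:
            os._exit(status)  # never fall through into the parent's code path
    # Parent: read the child's measurement and reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        faults = int(pipe.read())
    os.waitpid(pid, 0)
    gc.unfreeze()
    return faults


if __name__ == "__main__":
    payload = [str(i) * 20 for i in range(200_000)]  # hypothetical stand-in for loaded data
    print("minor faults in child:", minor_faults_in_child(payload))
```

The reported number varies between runs and systems; the point is only that it can be well above zero even though the child never modifies the data.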

I used perf to get the following results.
perf record -e minor-faults -g -p PID

In trace_begin:
                                                            
In trace_end:                                               
                                                            
There is 386377 records in gen_events table                 
Statistics about the general events grouped by thread/symbol/dso: 
                                                            
                                                            
            comm   number        histogram                  
==========================================                  
          python   340070     ###################           
         python3    46307     ################              
                                                            
                          symbol   number        histogram  
==========================================================  
      automaton_search_iter_next   242330     ################## 
          automaton_build_output    30382     ############### 
                      do_mktuple    23836     ############### 
                 PyObject_Malloc    17587     ###############  
                     _int_malloc    14089     ##############
               trienode_get_next    10282     ##############
                 PyMember_GetOne    10084     ##############
        _PyEval_EvalFrameDefault     7855     ############# 
        lookdict_unicode_nodummy     5679     ############# 
                      do_mkvalue     5643     ############# 
                  dict_subscript     3694     ############  
                         collect     1668     ###########   
                    visit_decref     1440     ###########   
          PyObject_GetAttrString     1376     ###########   
                   PyMem_Realloc     1178     ###########   
                pymalloc_realloc      837     ##########    
                  bytearray_init      830     ##########    
                    PyMem_Malloc      728     ##########    
                   PyList_Append      632     ##########    
                  _PyList_Extend      627     ##########    
                   dict_traverse      626     ##########    
                  tupleiter_next      564     ##########    
              malloc_consolidate      412     #########     
       PyBytes_FromStringAndSize      375     #########     
                   List_iterNext      271     #########     
            stringlib_bytes_join      225     ########      
                         set_add      219     ########      
                     PyTuple_New      213     ########      
                    PyMem_Calloc      211     ########      
         Object_beginTypeContext      176     ########      
            _PyFrame_New_NoTrack      171     ########      
             PyObject_GC_UnTrack      128     ########      
                        dict_get      116     #######       
                       sysmalloc       99     #######       
                PyObject_GetAttr       81     #######       
                   PyObject_Free       71     #######       
@WojciechMula
Owner

@gladtosee To be honest, I wasn't aware of this problem; you are the first to mention it. I need to learn a bit more about this issue. Thanks for these articles.

@gladtosee
Contributor Author

gladtosee commented May 30, 2019

@WojciechMula
After the data is loaded in the master process, the child process increments reference counts on the stored objects, which causes copy-on-write.
This happens because Py_BuildValue() increments the reference count of the stored object before automaton_build_output returns.

#define Py_INCREF(op) (                         \
    _Py_INC_REFTOTAL  _Py_REF_DEBUG_COMMA       \
    ((PyObject *)(op))->ob_refcnt++)
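
The same effect is visible from Python. The sketch below is an analogy rather than the extension's actual code path: the helper name `refcount_delta_when_packed` is hypothetical, and packing a value into a tuple stands in for what `Py_BuildValue("iO", ...)` does in C, namely storing a new strong reference to an existing object.

```python
import sys


def refcount_delta_when_packed(value):
    """Return how much value's reference count grows while a tuple holds it."""
    before = sys.getrefcount(value)
    pair = (0, value)  # the tuple stores a new strong reference to value
    after = sys.getrefcount(value)
    del pair  # dropping the tuple releases that reference again
    return after - before


# Built at run time so it is an ordinary heap object, not a compile-time constant.
stored = "".join(["pattern", "-", "output"])
print(refcount_delta_when_packed(stored))
```

Every such increment is a write to the object's header, so in a forked child it touches memory that was shared with the parent until then.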

If I modify the code like this, copy-on-write no longer happens:

//copy from cpython source - https://github.com/python/cpython/blob/v3.7.3/Objects/unicodeobject.c#L2380
PyObject*
_PyUnicode_Copy(PyObject *unicode)
{
    Py_ssize_t length;
    PyObject *copy;
 
    if (!PyUnicode_Check(unicode)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if (PyUnicode_READY(unicode) == -1)
        return NULL;
 
    length = PyUnicode_GET_LENGTH(unicode);
    copy = PyUnicode_New(length, PyUnicode_MAX_CHAR_VALUE(unicode));
    if (!copy)
        return NULL;
    assert(PyUnicode_KIND(copy) == PyUnicode_KIND(unicode));
 
    memcpy(PyUnicode_DATA(copy), PyUnicode_DATA(unicode),
           length * PyUnicode_KIND(unicode));
//    assert(_PyUnicode_CheckConsistency(copy, 1));
    return copy;
}

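static int automaton_build_output(PyObject* self, PyObject** result);

// Changed branch inside automaton_build_output():
case STORE_ANY:
    if (PyUnicode_Check(node->output.object)) {
        // "N" is the same as "O", except that it doesn't increment the reference count on the object.
        // The tuple takes ownership of the fresh copy, so the reference count of the original is not modified.
        *result = F(Py_BuildValue)("iN", idx, _PyUnicode_Copy(node->output.object));
    }
    else {
        // Non-string values still go through "O", which increments their reference count.
        *result = F(Py_BuildValue)("iO", idx, node->output.object);
    }
    return OutputValue;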

WojciechMula added the bug label and removed the question label on Oct 28, 2019
@WojciechMula
Owner

@gladtosee Could you please provide a patch for this?

@yuanchaofa

I applied your code and reinstalled, but I get an error:
Symbol not found: _PyUnicode_DATA
Could you give more details about how you solved the problem?

@pombredanne
Collaborator

@yuanchaofa would you mind providing a PR or a patch? It would be much easier to review. Thanks!
