Adelhard Krämer edited this page Apr 12, 2022 · 6 revisions

How to use?

Suppose you have a large number of HTML files that need to be parsed quickly and searched for certain elements.

First, create a myhtml_t object. Its only job is to initialize threads, so create it once and forget about it. If you need it later, you can always retrieve it from a myhtml_tree_t object with the myhtml_tree_get_myhtml function. With multi-threaded parsing, new threads are created for every new myhtml_t object, which is why I recommend creating it only once in your program. In other words, the myhtml_t object stores information about threads and is used to create myhtml_tree_t objects.

It is worth noting that the myhtml_t object is thread-safe. You can create it in the main thread and then use it to create myhtml_tree_t objects in other threads.

Next, create a myhtml_tree_t object. This can be done in different ways:

  1. Create one myhtml_tree_t object and reuse it for each new HTML file. This method is the fastest, but memory consumption will always match what the largest HTML file parsed so far required.

For example, suppose you have four HTML files:

  1. 400KB first.html
  2. 4032KB second.html
  3. 100KB third.html
  4. 260KB fourth.html

The occupied memory after parsing each file (the sizes are illustrative):

  1. 400KB
  2. 4032KB
  3. 4032KB
  4. 4032KB

I use this method on a daily basis, with one exception: if an HTML file is larger than 5MB, I recreate the myhtml_tree_t object after parsing it.

myhtml_t* myhtml = myhtml_create();
myhtml_init(myhtml, MyHTML_OPTIONS_DEFAULT, 1, 0);

myhtml_tree_t* tree = myhtml_tree_create();
myhtml_tree_init(tree, myhtml);

for(size_t i = 0; i < 10000; i++) {
    myhtml_parse(tree, MyHTML_ENCODING_UTF_8, res[i].html, res[i].size);
    
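    // do what you need with the tree
    
    // if a large html came in, recreate the tree
    // to free memory
    if(res[i].size > 5000000) {
        // myhtml_tree_destroy also frees the tree object itself,
        // so a new one has to be created before init
        myhtml_tree_destroy(tree);
        tree = myhtml_tree_create();
        myhtml_tree_init(tree, myhtml);
    }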
}

myhtml_tree_destroy(tree);
myhtml_destroy(myhtml);
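
The memory behaviour of the first method can be sketched with a small model that does not depend on MyHTML at all. It is only an illustration under a simplifying assumption: a reused tree never shrinks, so its footprint equals the largest input seen since the tree was last recreated. The function name `simulate_reused_tree` and its parameters are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model (an assumption, not MyHTML's real allocator):
 * a reused tree keeps the memory of the largest file parsed since
 * it was last recreated.
 *
 * sizes_kb          - input file sizes, in KB
 * count             - number of files
 * recreate_above_kb - files larger than this trigger a tree recreate
 * occupied_kb       - output: modelled footprint right after parsing each file
 */
void simulate_reused_tree(const size_t *sizes_kb, size_t count,
                          size_t recreate_above_kb, size_t *occupied_kb)
{
    assert(sizes_kb != NULL && occupied_kb != NULL);

    size_t high_water = 0;

    for (size_t i = 0; i < count; i++) {
        /* Parsing a bigger file grows the tree; a smaller one does not shrink it. */
        if (sizes_kb[i] > high_water)
            high_water = sizes_kb[i];

        occupied_kb[i] = high_water;

        /* Oversized input: the tree is recreated, so the next file starts fresh. */
        if (sizes_kb[i] > recreate_above_kb)
            high_water = 0;
    }
}
```

With the four example files and a limit of 5000KB, the model reproduces the figures listed above, because no file exceeds the limit. Lowering the limit shows the trade-off: the tree is recreated more often, but a single large file no longer determines the memory footprint for the rest of the run.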

  2. The second method is simple: create a new myhtml_tree_t object for every file. This method is not much slower than the previous one.

myhtml_t* myhtml = myhtml_create();
myhtml_init(myhtml, MyHTML_OPTIONS_DEFAULT, 1, 0);

for(size_t i = 0; i < 10000; i++) {
    myhtml_tree_t* tree = myhtml_tree_create();
    myhtml_tree_init(tree, myhtml);
    
    myhtml_parse(tree, MyHTML_ENCODING_UTF_8, res[i].html, res[i].size);
    
    // do what you need with the tree

    myhtml_tree_destroy(tree);
}

myhtml_destroy(myhtml);

I prefer to reuse memory rather than initialize it every time, that is, the first method.
