Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add APIs to read and destroy sections #3377

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

eloparco
Copy link
Contributor

@eloparco eloparco force-pushed the eloparco/read-and-destroy-sections branch from 7578068 to a2d24fe Compare April 30, 2024 08:36
@eloparco eloparco force-pushed the eloparco/read-and-destroy-sections branch from a2d24fe to 8efda94 Compare April 30, 2024 09:07
@yamt
Copy link
Collaborator

yamt commented Apr 30, 2024

is this basically a copy from wamr-app-framework?

@eloparco
Copy link
Contributor Author

is this basically a copy from wamr-app-framework?

Yes, I should have mentioned it in the PR description.

destroy_all_wasm_sections, destroy_all_aot_sections, destroy_part_wasm_sections and destroy_part_aot_sections are exactly the same. wasm_runtime_read_to_sections is adapted from wasm_app_module_on_install_request_byte_arrive.

We want to be able to free as many resources that are not necessary anymore after module instantiation or loading (especially for constrained embedded devices).

core/iwasm/common/wasm_runtime_common.c Outdated Show resolved Hide resolved
core/iwasm/common/wasm_runtime_common.c Outdated Show resolved Hide resolved
core/iwasm/common/wasm_runtime_common.c Outdated Show resolved Hide resolved
core/iwasm/common/wasm_runtime_common.c Outdated Show resolved Hide resolved
@eloparco eloparco force-pushed the eloparco/read-and-destroy-sections branch 2 times, most recently from 63e2571 to edb8046 Compare May 1, 2024 00:31
core/iwasm/aot/aot_loader.c Outdated Show resolved Hide resolved
core/iwasm/aot/aot_loader.c Outdated Show resolved Hide resolved
{
AOTModule fake_module; // Unused, passed since required by create_sections
return create_sections(&fake_module, buf, size, p_section_list, error_buf,
error_buf_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here in create_sections, the section->section_body may be referring to the buf, which is supposed to be freed to reduce the footprint after loading, per my understanding, the loading process can be somewhat like:

buf = read_file_to_buffer(..., &size);

wasm_runtime_read_to_sections(buf, size, &section_result);

wasm_runtime_free(buf);

wasm_runtime_load_from_sections(section_result.section_list, section_result.is_aot, ...);

wasm_runtime_destroy_sections(..); // destroy part of sections

wasm_runtime_instantiate();
...

wasm_runtime_destroy_sections(...); // destroy all of sections

So we should allocate memory for section->section_body and copy the data from the buf. How about adding a new argument to create_sections:

static bool
create_sections(AOTModule *module, const uint8 *buf, uint32 size,
                bool alloc_mem_for_section_body,
                AOTSection **p_section_list, char *error_buf,
                uint32 error_buf_size)

And pass true to alloc_mem_for_section_body, and allocate memory for section->section_body:

    if (section_type == AOT_SECTION_TYPE_TEXT) {
        ... original code ...
    }
    else {
        ...
    }

And should also change destroy_sections accordingly, add a new argument to indicate whether to free section->section_body or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I ran into the problem this morning when I was trying the code from an embedded with a sanitizer and I was getting an use-after-free error.

I'll adjust the code. Probably worth adding an example showing the usage as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I was able to release all wasm sections after module instantiation when testing in fast interpreter mode, including the code section that is left out here https://github.com/bytecodealliance/wamr-app-framework/blob/main/app-mgr/app-manager/module_wasm_app.c#L728-L746

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, and yes, it would be great if a sample is added. And we can also mention the method in the document Tune the memory usage, it should benefit to some developers of the Embdded/IoT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, let me do it

core/iwasm/interpreter/wasm_loader.c Outdated Show resolved Hide resolved
core/iwasm/common/wasm_runtime_common.c Show resolved Hide resolved
@eloparco eloparco force-pushed the eloparco/read-and-destroy-sections branch 2 times, most recently from cdae702 to 59aafe4 Compare May 1, 2024 13:15
core/iwasm/aot/aot_loader.c Show resolved Hide resolved
core/iwasm/common/wasm_runtime_common.c Outdated Show resolved Hide resolved
core/iwasm/common/wasm_runtime_common.c Outdated Show resolved Hide resolved
core/iwasm/common/wasm_runtime_common.c Outdated Show resolved Hide resolved
core/iwasm/common/wasm_runtime_common.h Outdated Show resolved Hide resolved
core/iwasm/common/wasm_runtime_common.h Outdated Show resolved Hide resolved
core/iwasm/common/wasm_runtime_common.c Outdated Show resolved Hide resolved
section->section_body_size = section_size;
if (alloc_mem_for_section_body) {
section->section_body = loader_malloc(
section->section_body_size, error_buf, error_buf_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as aot, should change destroy_sections in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it needed? destroy_sections is only called by load, not by create_sections

Copy link
Contributor

@wenyongh wenyongh May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is to avoid memory leak, destroy_sections is called when create_sections failed: some section_bodies may have been allocated when a section body is allocated failed. See L6423.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But when create_sections is called from L6423 it has alloc_mem_for_section_body set to false, meaning that no memory has to be freed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know why now, in this function, when creating section failed, it doesn't free the section nodes created like AOT, and the below code in function load will call destroy_sections to free them even when create_sections failed:

    if (!create_sections(buf, size, false, &section_list, error_buf,
                         error_buf_size)
        || !load_from_sections(module, section_list, true, error_buf,
                               error_buf_size)) {
        destroy_sections(section_list);
        return false;
    }

But I think it is not a good coding style, and it will lead to memory leak in the wasm_read_to_sections, which doesn't call destroy_sections when creating sections failed:

    return create_sections(buf, size, true, p_section_list, error_buf,
                           error_buf_size);

My suggestion is to do like aot: call destroy_sections in the lable fail when failing to create_sections, and change return false in this function to goto fail. And also add an argument free_section_body like aot.

And change below code to be like:

    if (!create_sections(buf, size, false, &section_list, error_buf,
                         error_buf_size))
        return false;

    if (!load_from_sections(module, section_list, true, error_buf,
                               error_buf_size)) {
        destroy_sections(section_list, false); //free_section_body = 
        return false;
    }

    destroy_sections(section_list, false); //free_section_body = false
    return true;

core/iwasm/common/wasm_runtime_common.c Outdated Show resolved Hide resolved
core/iwasm/common/wasm_runtime_common.c Outdated Show resolved Hide resolved
@eloparco eloparco force-pushed the eloparco/read-and-destroy-sections branch 2 times, most recently from e765899 to d1c132d Compare May 1, 2024 14:01
@eloparco eloparco force-pushed the eloparco/read-and-destroy-sections branch 2 times, most recently from b0b80d6 to c2ff590 Compare May 1, 2024 14:29
@eloparco eloparco force-pushed the eloparco/read-and-destroy-sections branch from c2ff590 to e8187f2 Compare May 1, 2024 14:48
@yamt
Copy link
Collaborator

yamt commented May 2, 2024

is this basically a copy from wamr-app-framework?

Yes, I should have mentioned it in the PR description.

please update the description then.

destroy_all_wasm_sections, destroy_all_aot_sections, destroy_part_wasm_sections and destroy_part_aot_sections are exactly the same. wasm_runtime_read_to_sections is adapted from wasm_app_module_on_install_request_byte_arrive.

We want to be able to free as many resources that are not necessary anymore after module instantiation or loading (especially for constrained embedded devices).

if you just want to save some memory, why do you need this complicated api?
i suspect what you need is an option to wasm_runtime_load etc.

@yamt
Copy link
Collaborator

yamt commented May 2, 2024

sometimes it's useful to have an api to examine sections. (especially custom sections)
eg. https://github.com/yamt/toywasm/tree/master/libdyld#dylink0-custom-section
but it isn't the purpose of this PR, is it?

@wenyongh
Copy link
Contributor

wenyongh commented May 2, 2024

is this basically a copy from wamr-app-framework?

Yes, I should have mentioned it in the PR description.

please update the description then.

destroy_all_wasm_sections, destroy_all_aot_sections, destroy_part_wasm_sections and destroy_part_aot_sections are exactly the same. wasm_runtime_read_to_sections is adapted from wasm_app_module_on_install_request_byte_arrive.
We want to be able to free as many resources that are not necessary anymore after module instantiation or loading (especially for constrained embedded devices).

if you just want to save some memory, why do you need this complicated api? i suspect what you need is an option to wasm_runtime_load etc.

wasm_runtime_load has been used for a long time, we had better not change its prototype to add an option? And we don't know what the wasm/aot file buf is allocated, normally it can be freed only after wasm_runtime_load, so if we also create sections in wasm_runtime_load and allocate memory for section->section_body, there will be almost two copies of wasm/aot file at the same time together with the memory for module data structures, the peak memory usage may be large (though many of them may be freed later) which may not meet the requirement of some embedded/iot devices. And since we have documented that the buffer for wasm_runtime_load is referred after loading, it is an ABI change if we change the behavior and developer may not know that the buf can freed. So I think it is reasonable to provide another way, the PR just introduces two API wasm_runtime_read_to_sections and wasm_runtime_destroy_sections, it should be acceptable with document and sample added.

@yamt
Copy link
Collaborator

yamt commented May 2, 2024

is this basically a copy from wamr-app-framework?

Yes, I should have mentioned it in the PR description.

please update the description then.

destroy_all_wasm_sections, destroy_all_aot_sections, destroy_part_wasm_sections and destroy_part_aot_sections are exactly the same. wasm_runtime_read_to_sections is adapted from wasm_app_module_on_install_request_byte_arrive.
We want to be able to free as many resources that are not necessary anymore after module instantiation or loading (especially for constrained embedded devices).

if you just want to save some memory, why do you need this complicated api? i suspect what you need is an option to wasm_runtime_load etc.

wasm_runtime_load has been used for a long time, we had better not change its prototype to add an option?

we have introduced wasm_runtime_load_ex very recently.

And we don't know what the wasm/aot file buf is allocated, normally it can be freed only after wasm_runtime_load, so if we also create sections in wasm_runtime_load and allocate memory for section->section_body, there will be almost two copies of wasm/aot file at the same time together with the memory for module data structures, the peak memory usage may be large (though many of them may be freed later) which may not meet the requirement of some embedded/iot devices. And since we have documented that the buffer for wasm_runtime_load is referred after loading, it is an ABI change if we change the behavior and developer may not know that the buf can freed. So I think it is reasonable to provide another way, the PR just introduces two API wasm_runtime_read_to_sections and wasm_runtime_destroy_sections, it should be acceptable with document and sample added.

iiuc, there are three memory allocations involved:

  1. user buffer (as large as the module)
  2. sections (almost same size)
  3. loaded module/instance

with this PR, the peak memory usage is max(1+2, 2+3), right?

if the peak memory usage is a problem, it's better to introduce a stream-style api
so that a user can use a fixed-sized buffer.
wasm module format is designed to be easily loaded that way.
that is,

  1. user buffer (1 byte at minimum)
  2. loaded module/instance

and/or, maybe we can simply make a user specify a "free the buffer" callback so that some buffers can be freed a bit earlier.

}

/* Destroy all sections after instantiation. For AOT, the text sections must
* be kept because referenced during execution */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, an api should not hardcode this kind of implementation details.

@wenyongh
Copy link
Contributor

wenyongh commented May 2, 2024

is this basically a copy from wamr-app-framework?

Yes, I should have mentioned it in the PR description.

please update the description then.

destroy_all_wasm_sections, destroy_all_aot_sections, destroy_part_wasm_sections and destroy_part_aot_sections are exactly the same. wasm_runtime_read_to_sections is adapted from wasm_app_module_on_install_request_byte_arrive.
We want to be able to free as many resources that are not necessary anymore after module instantiation or loading (especially for constrained embedded devices).

if you just want to save some memory, why do you need this complicated api? i suspect what you need is an option to wasm_runtime_load etc.

wasm_runtime_load has been used for a long time, we had better not change its prototype to add an option?

we have introduced wasm_runtime_load_ex very recently.

Yes, it is good to modify wasm_runtime_load_ex only, we may add an option for wasm_runtime_load_ex to choose whether to create sections, load and free sections. And even add another option of freeing buffer callback to allow wasm_runtime_load_ex to free the buffer, so it can create sections, free wasm/aot file buffer, load and free sections to reduce the peak memory usage.

And we don't know what the wasm/aot file buf is allocated, normally it can be freed only after wasm_runtime_load, so if we also create sections in wasm_runtime_load and allocate memory for section->section_body, there will be almost two copies of wasm/aot file at the same time together with the memory for module data structures, the peak memory usage may be large (though many of them may be freed later) which may not meet the requirement of some embedded/iot devices. And since we have documented that the buffer for wasm_runtime_load is referred after loading, it is an ABI change if we change the behavior and developer may not know that the buf can freed. So I think it is reasonable to provide another way, the PR just introduces two API wasm_runtime_read_to_sections and wasm_runtime_destroy_sections, it should be acceptable with document and sample added.

iiuc, there are three memory allocations involved:

  1. user buffer (as large as the module)
  2. sections (almost same size)
  3. loaded module/instance

with this PR, the peak memory usage is max(1+2, 2+3), right?

Yes, the user buffer can be freed after the sections are created.

if the peak memory usage is a problem, it's better to introduce a stream-style api so that a user can use a fixed-sized buffer. wasm module format is designed to be easily loaded that way. that is,

  1. user buffer (1 byte at minimum)
  2. loaded module/instance

and/or, maybe we can simply make a user specify a "free the buffer" callback so that some buffers can be freed a bit earlier.

How about letting the developer pass a read stream callback to read n bytes from the wasm/aot file, somewhat like

typedef uint32 (*read_file_callback_t)(char *buf, uint32 count);  //read count bytes to buffer

wasm_module_t
wasm_runtime_load_from_stream(read_file_callback_t, char *error_buf, uint32_t error_buf_size);

Or just extend wasm_runtime_load_ex to allow developer to pass the read file callback to LoadArgs structure.

Then the loader can then read and create sections first (read section type and section size, create section body with section size, and then read data to section body), and then load from sections, then destroy sections after loading. In this way, we can reduce both the peak memory usage and memory usage after loading.

@eloparco
Copy link
Contributor Author

eloparco commented May 3, 2024

The specific problem I'm trying to solve is the one described in #3354.

I'm using the wasm-c-api and module and module instance are saved into the store, that can't be freed until the end of wasm execution. And the module in the store has a copy of the module binary

wasm_byte_vec_copy(module_ex->binary, binary);
which is kept until the store is destroyed.

By using the APIs described in this PR, I can free the sections after module instantiation. Especially the code and data sections, that can be quite large, saving some memory before starting the wasm execution.
I don't think it's possible to obtain the same result with existing APIs.

@eloparco
Copy link
Contributor Author

eloparco commented May 3, 2024

Actually I was thinking, what if we only save the pointer here

module_ex->binary = malloc_internal(sizeof(wasm_byte_vec_t));
if (!module_ex->binary)
goto free_module;
wasm_byte_vec_copy(module_ex->binary, binary);
if (!module_ex->binary->data)
goto free_binary;
instead of copying? That would allow more control and fix the problem I have when using the wasm c api.
We could specify that behavior in LoadArgs from wasm_module_new_ex.

@wenyongh
Copy link
Contributor

wenyongh commented May 6, 2024

Actually I was thinking, what if we only save the pointer here

module_ex->binary = malloc_internal(sizeof(wasm_byte_vec_t));
if (!module_ex->binary)
goto free_module;
wasm_byte_vec_copy(module_ex->binary, binary);
if (!module_ex->binary->data)
goto free_binary;

instead of copying? That would allow more control and fix the problem I have when using the wasm c api.
We could specify that behavior in LoadArgs from wasm_module_new_ex.

@eloparco it seems to be a common issue so I think we can have a try. And we can also try to enable other methods in the future:

  • Enable LoadArgs->no_wasm_binary_copy or LoadArgs->clone_wasm_binary for wasm_runtime_load_ex, like yamt mentioned. For example, create sections internally and free parts of sections at the end of loading. And if free callback is provided in LoadArgs, runtime can call it to free the input buffer after creating sections to reduce peak memory usage.

  • Create sections from a stream when a read stream callback is provided, in this way, no need to read the whole wasm file to a buffer, so as to reduce the peak memory usage and total memory usage after loading.

  • Use wasm_runtime_read_to_sections and wasm_runtime_destroy_sections that this PR implements. Could you check the comment for destroy_sections and help address it if it is good to you? My suggestion is to merge this PR after it is well reviewed, but had better get others' feedback.

@yamt
Copy link
Collaborator

yamt commented May 7, 2024

How about letting the developer pass a read stream callback to read n bytes from the wasm/aot file, somewhat like

typedef uint32 (*read_file_callback_t)(char *buf, uint32 count);  //read count bytes to buffer

wasm_module_t
wasm_runtime_load_from_stream(read_file_callback_t, char *error_buf, uint32_t error_buf_size);

i feel that a callback based api in this case is a bit cumbersome to use for many use cases.
consider a loop with non-blocking recv and poll, which loads a module received via a socket.
in that case, a context-based apl similar to what we have in wamr-app-framework would be much simpler to use.

@wenyongh
Copy link
Contributor

wenyongh commented May 9, 2024

How about letting the developer pass a read stream callback to read n bytes from the wasm/aot file, somewhat like

typedef uint32 (*read_file_callback_t)(char *buf, uint32 count);  //read count bytes to buffer

wasm_module_t
wasm_runtime_load_from_stream(read_file_callback_t, char *error_buf, uint32_t error_buf_size);

i feel that a callback based api in this case is a bit cumbersome to use for many use cases. consider a loop with non-blocking recv and poll, which loads a module received via a socket. in that case, a context-based apl similar to what we have in wamr-app-framework would be much simpler to use.

Yes, that is better and it would be great if it supports receiving wasm file buffer from socket, uart or from a disk file. Not sure what the context-based api will be like?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants