Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llext: Proper support for Thumb32 relocations #72832

Closed
edersondisouza opened this issue May 16, 2024 · 3 comments · Fixed by #73425
Closed

llext: Proper support for Thumb32 relocations #72832

edersondisouza opened this issue May 16, 2024 · 3 comments · Fixed by #73425
Assignees
Labels
area: Linkable Loadable Extensions Enhancement Changes/Updates/Additions to existing features

Comments

@edersondisouza
Copy link
Collaborator

Is your enhancement proposal related to a problem? Please describe.
LLEXT doesn't seem to support certain Thumb32 relocation instructions on ARM targets. One can check that by building the samples/subsys/llext/edk from #69831 (follow instructions at the README.rst there). If one removes the first non-comment line from samples/subsys/llext/edk/app/CMakeLists.txt, and build the EDK, extensions and app, everything will be fine until one runs the application.

Without that line, there'll be a fault when the app tries to run the first extension:

[00:00:00.950,000] <err> os: ***** DATA ABORT *****
[00:00:00.950,000] <err> os: Permission Fault @ 0x00000000
[00:00:00.950,000] <err> os: r0/a1:  0x00022060  r1/a2:  0x00000000  r2/a3:  0x00022060
[00:00:00.960,000] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000001 r14/lr:  0x00010b98
[00:00:00.960,000] <err> os:  xpsr:  0x0000013f
[00:00:00.960,000] <err> os: Faulting instruction address (r15/pc): 0x0002b50a
[00:00:00.960,000] <err> os: >>> ZEPHYR FATAL ERROR 48: Unknown error on CPU 0
[00:00:00.960,000] <err> os: Current thread: 0x20840 (unknown)

Looking at the generated assembly for the offending instruction, one sees:

 4fc:   f7ff fded       bl      da <k_object_alloc>
 500:   4602            mov     r2, r0
 502:   f240 0300       movw    r3, #0
                        502: R_ARM_THM_MOVW_ABS_NC      my_sem
 506:   f2c0 0300       movt    r3, #0
                        506: R_ARM_THM_MOVT_ABS my_sem
 50a:   601a            str     r2, [r3, #0]

Last line is the offending instruction, it tries to write to memory pointed to byr3, but r3 is 0. Looking at the instructions above (at 502 and 506), r3 is set to be the address of my_sem object, which should be obtained via R_ARM_THM_MOVW_ABS_NC and R_ARM_THM_MOVT_ABS relocation instructions. The fact that r3 is zero indicates that the relocation is somehow not working.

Looking at the debug log from LLEXT, one sees several interesting messages related to my_sem symbol:

[00:00:00.690,000] <dbg> llext: llext_link: relocation 2:30 info 172f (type 47, sym 23) offset 1248 sym_name my_sem sym_type 0 sym_bind 0 sym_ndx 4
[00:00:00.690,000] <dbg> llext: llext_link: not relocated
[00:00:00.700,000] <dbg> llext: llext_link: relocation 2:31 info 1730 (type 48, sym 23) offset 1252 sym_name my_sem sym_type 0 sym_bind 0 sym_ndx 4
[00:00:00.700,000] <dbg> llext: llext_link: not relocated
[00:00:00.700,000] <dbg> llext: llext_link: relocation 2:32 info 172f (type 47, sym 23) offset 1282 sym_name my_sem sym_type 0 sym_bind 0 sym_ndx 4
[00:00:00.710,000] <dbg> llext: llext_link: not relocated
[00:00:00.710,000] <dbg> llext: llext_link: relocation 2:33 info 1730 (type 48, sym 23) offset 1286 sym_name my_sem sym_type 0 sym_bind 0 sym_ndx 4
[00:00:00.710,000] <dbg> llext: llext_link: not relocated
[00:00:00.710,000] <dbg> llext: llext_link: relocation 2:34 info 172f (type 47, sym 23) offset 1292 sym_name my_sem sym_type 0 sym_bind 0 sym_ndx 4
[00:00:00.720,000] <dbg> llext: llext_link: not relocated
[00:00:00.720,000] <dbg> llext: llext_link: relocation 2:35 info 1730 (type 48, sym 23) offset 1296 sym_name my_sem sym_type 0 sym_bind 0 sym_ndx 4
[00:00:00.720,000] <dbg> llext: llext_link: not relocated

The not relocated message is the final else branch for the if starting at https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/llext/llext.c#L825. It seems that LLEXT doesn't consider to be a needed relocation there and moves on. Interestingly enough, the arch relocation handler seems to be aware of Thumb32 relocation instructions, but that code is never called due to failing the if above.

Describe the solution you'd like
Support for the Thumb32 relocation instructions. It seems there's support on the arch to honour those relocation instructions, but they seem to not be recognised as such.

Describe alternatives you've considered
One workaround is to somehow build the extension without using those instructions, as it is currently done at #69831. However, that is done by an ugly hack that seems totally unrelated to the problem (dropping compiler flags -mcpu=cortex-r5 and -mfloat-abi=hard. This seems (judging from the objdump description) to change the architecture to armv4t (as opposed to armv7) and avoid those instructions. By doing that, the sample at #69831 just works, but it's far from ideal.

Additional context
While my analysis here could be way off (like, the relocation not being the root cause of the issue), there still the issue of the sample failing with the seemingly unrelated compiler flags.

@mathieuchopstm
Copy link
Contributor

mathieuchopstm commented May 24, 2024

Taking the Linux kernel's linker as a reference point - relevant parts being:

The main difference with Zephyr, as far as I can tell - and ignoring unrelated details - seems to be that Linux completely ignores the ELF_ST_TYPE of symbols; on the other hand, Zephyr only relocates symbols if they have specific values for it, as seen here:

zephyr/subsys/llext/llext.c

Lines 840 to 852 in 8c8e2f6

} else if (ELF_ST_TYPE(sym.st_info) == STT_SECTION ||
ELF_ST_TYPE(sym.st_info) == STT_FUNC ||
ELF_ST_TYPE(sym.st_info) == STT_OBJECT) {
/* Link address is relative to the start of the section */
link_addr = (uintptr_t)ext->mem[ldr->sect_map[sym.st_shndx]]
+ sym.st_value;
LOG_INF("found section symbol %s addr 0x%lx", name, link_addr);
} else {
/* Nothing to relocate here */
LOG_DBG("not relocated");
continue;
}

@edersondisouza could you try if the following patch fixes the issue on your side?

diff --git a/subsys/llext/llext.c b/subsys/llext/llext.c
index c92f49abd60..16748d941b4 100644
--- a/subsys/llext/llext.c
+++ b/subsys/llext/llext.c
@@ -835,18 +835,15 @@ static int llext_link(struct llext_loader *ldr, struct llext *ext, bool do_local
                                } else {
                                        LOG_INF("found symbol %s at 0x%lx", name, link_addr);
                                }
-                       } else if (ELF_ST_TYPE(sym.st_info) == STT_SECTION ||
-                                  ELF_ST_TYPE(sym.st_info) == STT_FUNC ||
-                                  ELF_ST_TYPE(sym.st_info) == STT_OBJECT) {
+                       } else if (IN_RANGE(sym.st_shndx, 0xFF00, 0xFFFF)) {
+                               //0xFF00 => SHN_LORESERVE, 0xFFFF => SHN_HIRESERVE
+                               LOG_DBG("section index in RESERVED range - not relocating");
+                               continue;
+                       } else {
                                /* Link address is relative to the start of the section */
                                link_addr = (uintptr_t)ext->mem[ldr->sect_map[sym.st_shndx]]
                                        + sym.st_value;
 
                                LOG_INF("found section symbol %s addr 0x%lx", name, link_addr);
-                       } else {
-                               /* Nothing to relocate here */
-                               LOG_DBG("not relocated");
-                               continue;
                        }
 
                        LOG_INF("writing relocation symbol %s type %zd sym %zd at addr 0x%lx "

Edit: re-reading the comment thread in EDK PR, I have seen the issue appeared on qemu_cortex_r5, so I tested on my side. Using ./scripts/twister -p qemu_cortex_r5 --testsuite-root tests/subsys/llext -K (-K required because qemu_cortex_r5 is excluded from LLEXT tests):

  • Without patch, 8 tests are performed: one is skipped, one passes and the other six fail
  • With the patch, one is still skipped but all other tests pass 😄

@edersondisouza
Copy link
Collaborator Author

edersondisouza commented May 24, 2024

@mathieuchopstm I've tested it here with the sample and can confirm that your patch does work!!! Will you submit a PR, please?

mathieuchopstm added a commit to mathieuchopstm/zephyr that referenced this issue May 28, 2024
In the current implementation, the LLEXT linker will only apply
relocations for a symbol if it has a specfic symbol type.
This is overzealous and causes issues on some platforms as legitimate
relocations are not performed because they are marked with a "bad" type.

Ignore the symbol type when performing relocation to solve this problem,
but exclude symbols in special (reserved) sections from being relocated.

This commit also updates an outdated comment.

Fixes zephyrproject-rtos#72832.

Signed-off-by: Mathieu Choplain <mathieu.choplain@st.com>
@mathieuchopstm
Copy link
Contributor

@edersondisouza PR with fix submitted

mathieuchopstm added a commit to mathieuchopstm/zephyr that referenced this issue May 30, 2024
In the current implementation, the LLEXT linker will only apply
relocations targeting a given symbol if it has a specfic symbol type.
This is overzealous and causes issues on some platforms, as some symbols
that need to be relocated are skipped due to being of a "bad" type.

Ignore the symbol type when performing relocation to solve this problem,
but also add checks to ensure we don't attempt to relocate symbols with
an invalid section index. If such a relocation is found, error out
instead of ignoring the relocation entry to ensure that it is impossible
to execute code from a (partially) unrelocated LLEXT.

Also remove all hacks added to circumvent this issue:
* qemu_cortex_r5 exclusion from test cases
* unnecessary exclusion of some flags when building with LLEXT EDK

Fixes zephyrproject-rtos#72832.

Signed-off-by: Mathieu Choplain <mathieu.choplain@st.com>
mathieuchopstm added a commit to mathieuchopstm/zephyr that referenced this issue May 31, 2024
In the current implementation, the LLEXT linker will only apply
relocations targeting a given symbol if it has a specfic symbol type.
This is overzealous and causes issues on some platforms, as some symbols
that need to be relocated are skipped due to being of a "bad" type.

Ignore the symbol type when performing relocation to solve this problem,
but also add checks to ensure we don't attempt to relocate symbols with
an invalid section index. If such a relocation is found, return an error
instead of ignoring the relocation entry to ensure that it is impossible
to execute code from a (partially) unrelocated LLEXT.

Also remove all hacks added to circumvent this issue:
* qemu_cortex_r5 exclusion from test cases
* unnecessary exclusion of some flags when building with LLEXT EDK

Fixes zephyrproject-rtos#72832.

Signed-off-by: Mathieu Choplain <mathieu.choplain@st.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Linkable Loadable Extensions Enhancement Changes/Updates/Additions to existing features
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants