Encode type-/layout- vs. region-based alias info separately in LLVM IR #54395

Open
topolarity opened this issue May 7, 2024 · 0 comments
Labels
compiler:codegen Generation of LLVM IR and native code compiler:llvm For issues that relate to LLVM status:help wanted Indicates that a maintainer wants help on an issue or pull request

Currently we redundantly encode region information in both !tbaa and !alias.scope metadata for LLVM.

We'd like to separate these so that !tbaa encodes only the layout-/type-based non-aliasing information and !alias.scope encodes only the region-based information.
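To make the intended split concrete, here is a minimal sketch in plain C++ with no LLVM headers. The names `Region`, `TypeTag`, `AccessInfo`, and `provably_no_alias` are hypothetical and do not exist in Julia's codegen, and the flat type tags are a simplification (real TBAA is a tree). It only illustrates that the two kinds of facts can be stored independently and that either one alone may be enough to rule out aliasing.

```cpp
#include <cassert>

// Hypothetical model of region facts (the kind of thing !alias.scope would carry).
enum class Region { Unknown, GCFrame, Stack, Data, Const };

// Hypothetical model of type/layout facts (the kind of thing !tbaa would carry).
// Flat tags are an assumption made for brevity.
enum class TypeTag { Unknown, ArrayBuf, ArraySize, Binding };

struct AccessInfo {
    Region region;
    TypeTag type;
};

// Two accesses are provably disjoint if either independent source says so.
// Unknown means "no information", so it never proves anything.
inline bool provably_no_alias(const AccessInfo &a, const AccessInfo &b) {
    const bool regions_differ = a.region != Region::Unknown &&
                                b.region != Region::Unknown &&
                                a.region != b.region;
    const bool types_differ = a.type != TypeTag::Unknown &&
                              b.type != TypeTag::Unknown &&
                              a.type != b.type;
    return regions_differ || types_differ;
}

// Usage: region facts and type facts each contribute on their own.
inline void region_vs_type_demo() {
    const AccessInfo stack_slot{Region::Stack, TypeTag::Unknown};
    const AccessInfo array_data{Region::Data, TypeTag::ArrayBuf};
    const AccessInfo array_size{Region::Data, TypeTag::ArraySize};
    assert(provably_no_alias(stack_slot, array_data));  // regions differ
    assert(provably_no_alias(array_data, array_size));  // same region, types differ
    assert(!provably_no_alias(array_data, array_data)); // nothing separates them
}
```

In this model, no region ever has to be expressed as a type tag, which is the redundancy the issue wants to remove.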

For reference, here's the existing TBAA hierarchy:

julia/src/codegen.cpp

Lines 351 to 375 in 5f7bfc0

struct jl_tbaacache_t {
    // type-based alias analysis nodes. Indentation of comments indicates hierarchy.
    MDNode *tbaa_root; // Everything
    MDNode *tbaa_gcframe; // GC frame
    // LLVM should have enough info for alias analysis of non-gcframe stack slot
    // this is mainly a place holder for `jl_cgval_t::tbaa`
    MDNode *tbaa_stack; // stack slot
    MDNode *tbaa_unionselbyte; // a selector byte in isbits Union struct fields
    MDNode *tbaa_data; // Any user data that `pointerset/ref` are allowed to alias
    MDNode *tbaa_binding; // jl_binding_t::value
    MDNode *tbaa_value; // jl_value_t, that is not jl_array_t or jl_genericmemory_t
    MDNode *tbaa_mutab; // mutable type
    MDNode *tbaa_datatype; // datatype
    MDNode *tbaa_immut; // immutable type
    MDNode *tbaa_ptrarraybuf; // Data in an array of boxed values
    MDNode *tbaa_arraybuf; // Data in an array of POD
    MDNode *tbaa_array; // jl_array_t or jl_genericmemory_t
    MDNode *tbaa_arrayptr; // The pointer inside a jl_array_t (to memoryref)
    MDNode *tbaa_arraysize; // A size in a jl_array_t
    MDNode *tbaa_arrayselbyte; // a selector byte in a isbits Union jl_genericmemory_t
    MDNode *tbaa_memoryptr; // The pointer inside a jl_genericmemory_t
    MDNode *tbaa_memorylen; // The length in a jl_genericmemory_t
    MDNode *tbaa_memoryown; // The owner in a foreign jl_genericmemory_t
    MDNode *tbaa_const; // Memory that is immutable by the time LLVM can see it
    bool initialized;

The first step is probably to (1) refactor the code so that it no longer uses ::fromTBAA. Instead, the region information should be passed around as its own piece of aliasing-related information. Most likely that means creating a jl_aliasinfo_t much earlier and updating jl_cgval_t to carry it instead of just TBAA metadata. Then (2) remove tbaa_gcframe, tbaa_stack, tbaa_const, and tbaa_data from the TBAA hierarchy, since those four nodes describe regions rather than types or layouts.
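A rough sketch of what step (1) could look like, written without LLVM so it compiles on its own. `FakeMDNode`, `AliasInfoSketch`, `CgValSketch`, and `metadata_for_load` are hypothetical stand-ins invented for illustration; they are not the real `jl_aliasinfo_t` or `jl_cgval_t` definitions, and the string `data_region` is a placeholder rather than an actual scope name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for llvm::MDNode so the sketch builds without LLVM headers.
struct FakeMDNode {
    std::string name;
};

// Hypothetical bundle: type/layout facts and region facts travel side by side.
struct AliasInfoSketch {
    const FakeMDNode *tbaa = nullptr;  // what is accessed (type/layout)
    const FakeMDNode *scope = nullptr; // where it lives (region)
};

// Hypothetical value descriptor that carries the whole bundle
// instead of a bare TBAA node.
struct CgValSketch {
    AliasInfoSketch alias;
};

// Lists the metadata a load of `v` would be tagged with. Each kind comes
// from its own field; neither is derived from the other.
inline std::vector<std::string> metadata_for_load(const CgValSketch &v) {
    std::vector<std::string> out;
    if (v.alias.tbaa)
        out.push_back("!tbaa " + v.alias.tbaa->name);
    if (v.alias.scope)
        out.push_back("!alias.scope " + v.alias.scope->name);
    return out;
}

// Usage: a value known to be array data in the (placeholder) data region.
inline void load_metadata_demo() {
    const FakeMDNode arraybuf{"tbaa_arraybuf"};
    const FakeMDNode data{"data_region"};
    CgValSketch v;
    v.alias.tbaa = &arraybuf;
    v.alias.scope = &data;
    assert(metadata_for_load(v).size() == 2);
}
```

Under this shape, step (2) would follow naturally: once the region travels in its own field, no TBAA node is needed to stand in for it.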

Afterwards, a good follow-up would be to expand the TBAA hierarchy so that it encodes a much broader set of types, ideally including user-defined structs.
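To illustrate why a richer hierarchy helps, here is a simplified model of the scalar TBAA rule: two accesses may alias only if one type node is the other or an ancestor of it. The `TypeNode` type, the helper names, and the `StructA`/`StructB` nodes are hypothetical; where user-defined structs would actually sit in Julia's tree is an assumption made for the example, and struct-path TBAA in LLVM is more involved than this.

```cpp
#include <cassert>
#include <string>

// Hypothetical TBAA-like type node; parent is nullptr for the root.
struct TypeNode {
    std::string name;
    const TypeNode *parent;
};

// Walks from `n` up to the root looking for `anc`.
inline bool is_ancestor_or_self(const TypeNode *anc, const TypeNode *n) {
    for (; n != nullptr; n = n->parent)
        if (n == anc)
            return true;
    return false;
}

// Simplified scalar-TBAA rule: possible aliasing requires that one node
// lies on the other's path to the root.
inline bool may_alias(const TypeNode &a, const TypeNode &b) {
    return is_ancestor_or_self(&a, &b) || is_ancestor_or_self(&b, &a);
}

// Usage: two sibling struct nodes are distinguishable, while the shared
// parent still conservatively aliases both.
inline void tbaa_tree_demo() {
    const TypeNode root{"root", nullptr};
    const TypeNode value{"value", &root};
    const TypeNode struct_a{"StructA", &value}; // assumed placement
    const TypeNode struct_b{"StructB", &value}; // assumed placement
    assert(may_alias(value, struct_a));
    assert(!may_alias(struct_a, struct_b));
}
```

Without per-struct nodes, both accesses would be tagged with the shared parent and could not be told apart by type alone.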
