Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

About Add Union Types #5

Open
penguin-wwy opened this issue Dec 30, 2018 · 6 comments
Open

About Add Union Types #5

penguin-wwy opened this issue Dec 30, 2018 · 6 comments

Comments

@penguin-wwy
Copy link

Hi, i want to add union type. However, i rarely use js, so some concepts which i understand may be error.
I will describe my thoughts about union type and if you approve it, then i will commit pr when done.

Example
ts code

function foo(): string | number | undefined;

converted to llvm-ir

%string = type { i8*, i32 }
%string.number.box = type { i32, %string, double}

so, the first i32 means the real return type's index, and then 0 means undefined.

Thus, we can use type { i32, [ 0 x i8 ] } to indicate undefined and use i32* null to indicate null in ts.

When union type is used on the parameter, i only get a certain type from parameter because function emitted when called, the type of parameter passed in at this time is ok. In this way, function will be emitted for each combination of parameter types. This can cause serious code bloat problem. Must do something to merge these function.

@emlai
Copy link
Owner

emlai commented Jan 1, 2019

Hi and welcome! Here are my thoughts on this:

Converting string | number | undefined to type { i32, %string, double } seems OK to me as an initial implementation. Later we would want to replace it with something like type { i32, [ N x i8 ] } where N is the size of the largest union member. This way we can reuse the same memory for each member, since only one of them is active at a time, like a C union.

I think we can use i8 for the discriminator instead of i32 since 256 members should be enough for anyone.

For undefined and null, in my opinion we don't need to support both from the start if it makes implementing this easier.

Also it might be easier to treat undefined as just a regular union member like the string and number, and generate a placeholder case for it in the LLVM struct: type { i32, %string, double, i8 }. Then the LLVM values would look like { 0, "abc", undef, undef } when the string is active, { 1, undef, 42, undef } when the number is active, and { 2, undef, undef, undef } when the undefined is active. Or, we could mark undefined and null with negative discriminators like { -1, undef, undef } and { -2, undef, undef } respectively. Whatever is easiest to implement, we can always optimize it later.

For function parameters I think we should use the same LLVM type as for return types, probably easier that way. Then we don't have to think about merging different instantiations of the same functions.

When you have something working done, feel free to make a WIP pull request, so maybe I can give more concrete feedback on the implementation.

@emlai emlai mentioned this issue Jan 1, 2019
47 tasks
@Kingwl
Copy link

Kingwl commented Jan 2, 2019

In fact, I am more worried about another thing:
Union of Object and type guard

for example:

type T = { a:number, b: string} | {b: string, c: number}

declare const t: T;
if ('c' in t) {
// ...
} else {
// ...
}

what is the expected memory layout for that?

@emlai
Copy link
Owner

emlai commented Jan 2, 2019

I think the memory layout would ideally be:

%ab = type { double, %string }
%bc = type { %string, double }
%T = type { i8, i8* }

When the first member is active, the i8 is 0 and we treat the i8* as a %ab*.
When the second member is active, the i8 is 1 and we treat the i8* as a %bc*.

But we can also initially implement it as %T = type { i8, %ab, %bc }, i.e. without sharing the memory for the two members, if it makes things easier.


Regarding 'c' in t:

I think ts-llvm will not allow more properties than is written in the type, so e.g. objects of type { a:number, b: string } cannot have other properties than a and b. This means that if type T = { a:number, b: string } then 'c' in t is resolved at compile-time to false.

When T is a union however, 'c' in t must be compiled to a switch to check which union member is active, and only then resolve to the constant true or false.

However I don't think we need to support the in operator for the initial release.

@Kingwl
Copy link

Kingwl commented Jan 2, 2019

Yes, that is the normally union type implement,but the ts are structural subtyping and do not have pattern match, that means we cannot access an object with the actually type and memory layout (if i'm correct),maybe an object is only a part of the other bigger object?

Of course we could limit some syntax such as all types for a union must have same members layout.
But IMO, if we wanna to implement full feature of that, we should consider that early

However I don't think we need to support the in operator for the initial release.

Actually,nearly all type guard have the same problem :XD

@penguin-wwy
Copy link
Author

penguin-wwy commented Jan 2, 2019

@emlai For function parameters i think you are right, no need to consider about merging different instantiations of the same functions.

Then, for undefined, replace it to type { i8, i8* } , reuse the same memory for each member like C union, i think this way may lead to difficulties in codegen. Any type can cast to i8, but which type should be cased from i8 when it used? In some cases, relying solely on the index does not completely save the type information.
For example

define internal { i8, i8* } @foo {
    ...
    ret_lable1:
        %s = bitcast %string* ... to i8*
        %3 = insertvalue { i8, i8* } undef, %0, 0
        %ret = insertvalue { i8, i8* } %3, %s, 1 
    ret_lable2:
        %d = bitcast i32* ... to i8*
        %4 = insertvalue { i8, i8* } undef, %1, 0
        %ret = insertvalue { i8, i8* } %4, %d, 1
    ...
    label3:
        ret { i8, i8* } %ret
}

Then, how can I know this fact that if first elem is 0, i8* cast to %string* else first elem is 1, i8* cast to i32*. We may create a large map to save { function : this function' return type } and other flag to different states.

If want to implemented like C, we need an extra layer middle-ir or high-ir, which as an intermediary to tell us how to merge different type to i8 and this action is safe.

So, i suggest that when transform ts to llvm, as an initial implementation, we must save all types of information as much as possible

for example

var one = undefined;

Marking the variable one type is { i8 0, i8* undef }.

when

function foo(): string { ... }
one = foo();

The one should be marked type { i8 1, %string* %s }, instead of { i8 1, i8* %s }

Then we can use directly api such as StructType.getElementType to get the variable type without other.

After that, according to the actual effect to optimize by other intermediate implementations. Minimize the complexity and code coupling we need to face in the implementation process.

Thinks :)

@emlai
Copy link
Owner

emlai commented Jan 11, 2019

Then, how can I know this fact that if first elem is 0, i8* cast to %string* else first elem is 1, i8* cast to i32*.

I think we can get that information from the TypeScript's AST and type-checker: we ask it what foo's return type is and then we know it is string | number. Then from that we can see that if the first element is 0, the second is %string, and so on.

My type { i8, i8* } was just an example if both of the union members are represented as pointers, which is the case for objects. But if one of the members is a string (which ts-llvm currently represents as a struct, not as a pointer), then the union memory layout would need to accommodate for the largest member. So then the union would be mapped to e.g. type { i8, %string } for string | number. And then cast the %string member to double if the number is active. We just need to compare the sizes of the union members and pick the largest for the second member, just like when implementing a C union in LLVM: https://mapping-high-level-constructs-to-llvm-ir.readthedocs.io/en/latest/basic-constructs/unions.html. I can't remember if we can cast the %string member to double directly in LLVM IR, if not then cast the whole union pointer like in the article linked above.

For your example where the type of a variable changes over time, e.g.:

var one = undefined;
...
one = foo();

I don't know if the TypeScript compiler API allows getting the "full" type of the variable i.e. undefined | string here. If it does then we can just use that type when converting one to LLVM IR. If not then I don't think we need to support that kind of construct at first, since the user can always declare the type explicitly, and then it's easy for us to get the correct type:

var one: undefined | string = undefined;
...
one = foo();

I personally don't see a need for a middle-IR or high-IR yet based on these examples..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants