Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structured representation #5

Open
antiagainst opened this issue Feb 11, 2017 · 12 comments
Open

Structured representation #5

antiagainst opened this issue Feb 11, 2017 · 12 comments

Comments

@antiagainst
Copy link
Collaborator

Currently there is only a plain data representation (DR) implemented in this project. While it introduces little overheads when working with the binary parser and is straightforward for investigating the contents of SPIR-V binaries, it has quite limited functionalities. Instructions are not interconnected; types are not hierarchicalized; metadata (decorations, debug instructions, etc.) are not linked to the targets; etc. It is inconvenient to build future tasks like validation, optimization, and translating from front end languages on top of this DR. So a structured representation (SR) should be introduced, containing:

  • A type hierarchy
  • A constant hierarchy
  • Common language constructs: Module, Function, BasicBlock, Instruction, etc.
  • Representing all SPIR-V instructions
  • Decoration struct. This will be attached to types, instructions, etc.

Type hierarchy

We can either implement it using 1) enums or 2) marker traits with multiple implementing structs.

  1. Types are frequently used, it would be nice to reduce the overheads. This favors enums since marker traits leads to trait objects.
  2. It reads better and is more easy to use (without excessive matchs). Also easy to attach data and methods (for like, getting constants from a type, composing types, etc.) on each type.

I'm leaning towards 2. Let's worry about performance later.

// sketch

trait Type {
  is_numerical(&self) -> bool;
  is_aggregate(&self) -> bool;
  is_composite(&self) -> bool;
  ...
}
trait ScalarType { ... }
trait AggregateType { ... }

struct FloatType {
  bitwidth: u32,
}
struct VectorType {
  element_type: &ScalarType,
  size: u32,
}
struct ArraryType { ... }

impl ScalarType for IntType { ... }
impl AggreateType for VectorType { ... }

Representing SPIR-V instructions

A single flexible instruction struct that can represent all SPIR-V instructions is inferior to a dedicated instruction struct for each SPIR-V instruction: it loses the structure naturally encoded in dedicated instruction struct and will need validation for operand layout correctness, etc. A huge enum containing all possible instructions is ugly. It seems there is no way to go around trait objects, which is avoided by the Rust community to the best. But it's the natural way of fulfilling this kind of task.

// Sketch
 
trait Instruction {
  result_id(&self) -> Option<u32>;
  result_type(&self) -> Option<&Type>;
  ...
}

struct IAddInst {
  result_type: &Type,
  result_id: u32,
  op1: &Instruction,
  op2: &Instruction
}

impl Instruction for IAddInst { ... }

There are quite some instructions to implement, but most of them can be generated from the grammar.

Decoration struct

Decorations can have associated data.

Special care should be taken when handling struct types since decorations on struct members cannot penetrate to the underlying base types.

@kvark
Copy link
Member

kvark commented Sep 13, 2018

Please please make this happen! We've started writing the SPIRV->MSL/DXBC/HLSL/GLSL/etc transpiler in Rust, and processing the raw SPIRV modules (that are just a bunch of Instruction things) is a huge pain.

We can either implement it using 1) enums or 2) marker traits with multiple implementing structs.

I honestly don't see a compelling argument to 2. My choice would certainly be 1), given that:

  • you have the spec defining all the things, so you don't need to arbitrary expand the enums
  • when you update the spec, you can expand the enums and bump the breaking change version, it's fine
  • no overhead
  • explicit matching, instead of opaque accessor methods
  • you can always add those convenient accessors to enums if you want to

@antiagainst
Copy link
Collaborator Author

Thanks @kvark! I hear you. I'll look into putting some effort into implementing the structured representation. :)

@kvark
Copy link
Member

kvark commented Mar 14, 2019

@antiagainst please don't forget about this. As we delay, spirv-cross is getting developed further, and it will be harder for us to catch up.

@antiagainst
Copy link
Collaborator Author

Hey @kvark, I'm sorry I kept saying that I would like to devote some time to this but didn't really fulfilled. I recently moved onto working on TensorFlow and MLIR, so I'm not sure when I'll have time on this in the near future. But it's definitely on my plate. In the meantime, feel free to contribute! :)

@kvark
Copy link
Member

kvark commented Jul 3, 2019

@antiagainst I'd really like to move this forward. If you don't have time and energy to do this, could you outline a rough plan for contributors to follow when implementing this? Would you also be able to review the changes?

@antiagainst
Copy link
Collaborator Author

Hey @kvark, sorry for the late reply. I'm correctly busy with MLIR work so likely won't have the bandwidth here in the near future. But I'd certainly happy to help answering questions and reviewing pull requests.

@kvark
Copy link
Member

kvark commented Jul 15, 2019

Looked at the state of this today. Apparently, most of the infrastructure is already in place:

  • types
  • decorations
  • constants
  • instructions
  • modules, functions, etc

Do I understand correctly that we only need to get those root-level items (modules, functions, etc) now, after which we can implement conversion to/from the data representation? Or did you plan on working with bytes directly and skip the "mr" layer?

@antiagainst
Copy link
Collaborator Author

Firstly, regarding the current existing code for structured representation, it actually deviates from the design proposal as in the original post. Instead of doing trait objects, the existing code just uses memory arena and vectors and indexes into vectors. (I read several blog post regarding how to better model graphs using Rust; that seems the suggested way. But things may change now with the Rust ecosystem. So better suggestions certainly welcome!)

Yes structured representation classes for SPIR-V module, function, basic block are missing. We will need to add that. Aside from those, instruction class is half baked. We need to properly wire up instructions so that we can trace an operand back to its original generating instruction. So instead of using spirv::Word, which is from the original binary, we need something like a instructions vector inside the Context and have InstructionToken indexing into it. The tricky part is that there are a lots of churn for instructions: we want to create/delete instructions frequently for optimizations. So maybe a map is more appropriate for this case then. (Feel free to propose better ideas given that I haven't tracked Rust evolving for quite some time so there maybe new awesome stuff there that we can leverage!)

Besides, module-level instructions (e.g., OpEntryPoint, OpExecutionModel) are actually "metadata", compared to function-level instructions (e.g., OpFMul), which are real computation. We should consider "fold" these metadata instructions into the module representation itself so that we don't need to go through the same instruction processing mechanism to rediscover these facts.

Similarly for decorations, they are also metadata to the decorated instructions. We should also consider folding them into the instruction they decorate. The sr::Decoration class isn't wired up with anything yet given that sr::Instruction is half baked. So this is also something we need to do.

Regarding the translation path, my original thought is to have binary -> mr -> sr because I think it's easier to bootstrap. Going from binary to mr we can see in the future.

@kvark
Copy link
Member

kvark commented Jul 16, 2019

Thanks for the detailed response!

Instead of doing trait objects, the existing code just uses memory arena and vectors and indexes into vectors.

I think it's working well. If we have, say, Type contain other types, we still have to have an indirection, because otherwise the compiler can't figure out the size (reasonably), i.e.

enum Foo {
  One,
  Two(Foo), // NOPE
}

We could use Box though when referring to other types. This would make the memory access pattern worse, but ownership and general convenience would improve.
I'm planning on evaluating if it's feasible to use boxes at some point, but currently continuing the "token" based approach you started.

A lot of the things you describe are something I was already half-way doing, i.e.

instruction class is half baked
So instead of using spirv::Word, which is from the original binary, we need something like a instructions vector inside the Context and have InstructionToken indexing into it.
We should consider "fold" these metadata instructions into the module representation itself so that we don't need to go through the same instruction processing mechanism to rediscover these facts.

I'm splitting the instruction generator to put some stuff as separate structures (the mode setting ops, other module stuff) and the others as an enum variants (real instructions).

Similarly for decorations, they are also metadata to the decorated instructions. We should also consider folding them into the instruction they decorate.

Agreed. Similarly to what #38 is doing.

Regarding the translation path, my original thought is to have binary -> mr -> sr because I think it's easier to bootstrap. Going from binary to mr we can see in the future.

Great, thank you! I'm glad we are on the same page :)

antiagainst pushed a commit that referenced this issue Jul 19, 2019
Part of #5

Also refines the decoration support with:

1. associating member decorations directly with the members
2. supporting multiple decorations
@antiagainst
Copy link
Collaborator Author

I'm splitting the instruction generator to put some stuff as separate structures (the mode setting ops, other module stuff) and the others as an enum variants (real instructions).

Yup, a progressive approach SGTM. We can gradually make more and more instructions stand-alone. :)

@jpryne
Copy link

jpryne commented Oct 20, 2019

I've begun working through rspirv, & will devote my time to moving this forward.

@kvark
Copy link
Member

kvark commented Oct 20, 2019

Awesome, welcome to the party @jpryne ! You probably want to start off #106

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants