Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: add generic parameters to types to distinguish different data structures on different architectures #118

Open
Lancern opened this issue Oct 29, 2021 · 3 comments · May be fixed by #143

Comments

@Lancern
Copy link

Lancern commented Oct 29, 2021

Motivation

The link between core data types (such as Capstone, Insn, InsnDetail, etc.) and architecture-specific data types (the data types under crate::arch) is not that "explicit" on the typing. For example, to get the instruction ID on the x86 arch, we have to code:

let cs = Capstone::new().x86().build().unwrap();
let insns = cs.disasm_all(CODE, 0x1000).unwrap();
for i in insns.as_ref() {
    let insn_id = unsafe {
        // Neither typing nor documentation mention this!
        std::mem::transmute::<_, X86Insn>(i.id().0)
    };
    // do something with insn_id
}

which is not easier for beginners to catch.

We can simply resolve the problem above by adding a method arch_insn_id that returns the corresponding instruction ID enum variants just like the InsnDetail::arch_detail method:

pub enum ArchInsnId {
    X86InsnId(X86Insn),
    // other architectures
}

impl<'a> Insn<'a> {
    pub fn arch_insn_id(&self) -> ArchInsnId {
        // ... code
    }
}

This leads to another slightly disturbing problem. We have to match against the return value of arch_insn_id to extract the x86-specific instruction ID, given that we are already confident about the architecture. This problem also arises when we call the InsnDetail::arch_detail method or other methods with a similarly-typed return value.

The Proposal

The proposal posted here is only an (too-)early draft and more details may be missing for further considerations and discussions.

First of all, we can add a new trait that abstracts a specific architecture:

pub trait Arch {
    type InsnId;
    type InsnDetail;
    // ... other stuff
}

Then, we add a generic parameter to Capstone, Insn and InsnDetail that represents the architecture:

pub struct Capstone<A: Arch> {
    // ... fields
}

pub struct Insn<'a, A: Arch> {
    // ... fields
}

pub struct InsnDetail<'a, A: Arch> {
    // ... fields
}

Then, the methods mentioned in the motivation section can be typed in a more straight-forward way:

impl<A: Arch> Capstone<A> {
    pub fn insn_detail<'s, 'i: 's>(
        &'s self, 
        insn: &'i Insn<'_, A>
    ) -> CsResult<InsnDetail<'i, A>> {
        // ... code
    }
}

impl<'a, A: Arch> Insn<'a, A> {
    pub fn id(&self) -> A::InsnId {
        // ... code
    }
}

impl<'a, A: Arch> InsnDetail<'a, A> {
    pub fn arch_detail(&self) -> A::InsnDetail {
        // ... code
    }
}

No more matches, as long as we're targeting a specific architecture. Also, beginners can find corresponding architecture-specific implementations just by looking at the typings. The instruction ID problem can be resolved accordingly.

Unresolved Problems

When the target architecture cannot be determined during compile-time (when the disassembler is created by the Capstone::new_raw method), the generic parameters cannot be set to represent specific architecture. To resolve this problem, maybe we need to introduce a DynamicArch that implements Arch and represents the target architecture is determined during runtime.


This proposal may be pre-mature, but I do think that it reveals some (possibly minor) problems.

@tmfink
Copy link
Member

tmfink commented Oct 30, 2021

Thanks for filing the issue! I have thought about doing something like this before. I did not do this originally because I wanted to support "dynamic architecture". I like your idea of a DynamicArch.

I do not have the bandwidth to implement this myself, but feel free to start a PR (even if it's just a work-in-progress). I am happy to help without pointers/review.

@Lancern
Copy link
Author

Lancern commented Oct 30, 2021

Thanks for your reply. I'm glad to try to help implement this draft.

I think I can work on a new sub-module named experimental where all the new stuff goes. When everything is ready, we merge the sub-module into the root module. Is that sounds OK?

@tmfink
Copy link
Member

tmfink commented Oct 30, 2021

There's not need to make a separate submodule--just put new code in whatever module makes sense. It's always easy to move it around anyway.

Lancern added a commit to Lancern/capstone-rs that referenced this issue Aug 6, 2023
This commit is the first commit among a series of patches that implement the
compile-time architecture tag proposal in capstone-rust#118. It contains the following major
changes:

- Add the ArchTag trait. This trait indicates a "tag" type that represents a
  specific architecture. Its associated types specify specific data types
  corresponding to the architecture it represents.

- Add ArchTag implementations for supported architectures, and a special
  DynamicArchTag that indicates the target architecture is unknown at compile
  time.

- Update existing general type definitions (e.g. Capstone, Insn, etc.) to
  acquire a general arch tag type parameter.
@Lancern Lancern linked a pull request Aug 6, 2023 that will close this issue
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants