Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regs_write() and regs_read() do not return complete results #84

Open
bcantrill opened this issue May 6, 2020 · 6 comments
Open

regs_write() and regs_read() do not return complete results #84

bcantrill opened this issue May 6, 2020 · 6 comments
Assignees

Comments

@bcantrill
Copy link

The instruction detail provided via InsnDetail's regs_write() and regs_read() only return a subset of the registers written to or read from.

For example, running the C-based cstool:

$ cstool -d thumb "f0bd"
 0  f0 bd  pop	{r4, r5, r6, r7, pc}
	ID: 423 (pop)
	op_count: 5
		operands[0].type: REG = r4
		operands[0].access: WRITE
		operands[1].type: REG = r5
		operands[1].access: WRITE
		operands[2].type: REG = r6
		operands[2].access: WRITE
		operands[3].type: REG = r7
		operands[3].access: WRITE
		operands[4].type: REG = pc
		operands[4].access: WRITE
	Registers read: sp
	Registers modified: sp r4 r5 r6 r7 pc
	Groups: thumb thumb1only 

In contrast, running the capstone-rs-based cstool:

$ cstool -d --arch arm --mode thumb --code "f0bd"
      1000:  f0 bd                               pop     {r4, r5, r6, r7, pc}
             insn id:     423
             read regs:   sp
             write regs:  sp
             insn groups: thumb, thumb1only

The problem -- alluded to in #63 -- is that the C-based consumers are explicitly calling cs_regs_access() while the Rust-based consumers have no way of knowing that this must be called. There are a number of ways of fixing this, but it seems that calling insn_detail() should return an InsnDetail for which regs_write()/regs_read() return correct data -- and therefore that insn_detail() should explicitly make the call to cs_regs_access().

Here is a diff that implements this:

diff --git a/capstone-rs/src/capstone.rs b/capstone-rs/src/capstone.rs
index 9e77f9d..1ae313d 100644
--- a/capstone-rs/src/capstone.rs
+++ b/capstone-rs/src/capstone.rs
@@ -374,6 +374,53 @@ impl Capstone {
         } else if Self::is_diet() {
             Err(Error::IrrelevantDataInDiet)
         } else {
+            /*
+             * To assure that our returned InsnDetail's regs_*() accessors
+             * return the correct data, we need to call into cs_regs_access().
+             * This interface is unsafe even by C's own poor standards for
+             * itself: it takes no bounds and has essentially undefined
+             * behavior if the arrays passed in correspond to the (same-sized)
+             * arrays in the detail member of the cs_insn.  We therefore call
+             * into this interface with a scratch buffer on our stack and then
+             * manually copy it into the same structures in our underlying
+             * insn.  This could presumably be made less verbose, but given
+             * the degree of unsafety, we have erred on the side of caution...
+             */
+            unsafe {
+                let mut rbuffer: [u16; 32] = [0; 32];
+                let mut wbuffer: [u16; 32] = [0; 32];
+                let mut rcnt: u8 = 0;
+                let mut wcnt: u8 = 0;
+                let mut detail = insn.insn.detail;
+
+                let regs_read: *mut u16 = &mut rbuffer[0];
+                let regs_write: *mut u16 = &mut wbuffer[0];
+                let regs_read_count: *mut u8 = &mut rcnt;
+                let regs_write_count: *mut u8 = &mut wcnt;
+
+                let err = cs_regs_access(
+                    self.csh(), &insn.insn,
+                    regs_read, regs_read_count,
+                    regs_write, regs_write_count
+                );
+
+                if err != cs_err::CS_ERR_OK {
+                    return Err(err.into());
+                }
+
+                for i in 0..rcnt {
+                    (*detail).regs_read[i as usize] = rbuffer[i as usize];
+                }
+
+                (*detail).regs_read_count = rcnt;
+
+                for i in 0..wcnt {
+                    (*detail).regs_write[i as usize] = wbuffer[i as usize];
+                }
+
+                (*detail).regs_write_count = wcnt;
+            }
+
             Ok(unsafe { insn.detail(self.arch) })
         }
     }

Running with this diff, the capstone-rs-based cstool has the correct results:

$ cstool -d --arch arm --mode thumb --code "f0bd"
      1000:  f0 bd                               pop     {r4, r5, r6, r7, pc}
             insn id:     423
             read regs:   sp
             write regs:  sp, r4, r5, r6, r7, pc
             insn groups: thumb, thumb1only
@tmfink tmfink self-assigned this May 7, 2020
@tmfink tmfink added the bug label May 7, 2020
@tmfink
Copy link
Member

tmfink commented May 7, 2020

Thanks for filing the issue! As you pointed out, this will mostly be solved once we fix #68.

It seems like you probably have the right idea that we should always call cs_regs_access when in detail mode. However, cs_regs_access only seems to be supported on x86, arm, and arm64, so we don't want to error out if a call fails (just ignore).

leoetlino added a commit to leoetlino/capstone-rs that referenced this issue Jul 27, 2021
@vn-ki
Copy link

vn-ki commented Sep 21, 2021

Just hit this. Are you planning on fixing this? If you are busy, is the fix something I can help with?

@tmfink
Copy link
Member

tmfink commented Sep 22, 2021

@vn-ki I did not make any plans to implement this myself. I would greatly appreciate help with this!
Would you like to open a PR? I would be happy to help with any questions you have.

The reason I have not gotten to this is that the support is limited to only a few architectures, which potentially complicates the implementation. Some open questions:

  1. Should regs_read()/regs_write() always call cs_regs_access?
  2. How should we manage the lifetimes? Should we just copy into Rust-managed memory?

If you open a PR, please reference this issue (#84) and the original #63.

@vn-ki
Copy link

vn-ki commented Sep 23, 2021

@tmfink I'll be happy to open a PR.

Some ideas:

  1. We can keep the current API (and approach), and add two new fields to capstone::InsnDetail corresponding to regs_read and regs_write. Depending on the architecture, we can return the one from cs_detail or the one from cs_regs_access. The initialization of these new two arrays would be at the time of construction when calling capstone::Capstone::insn_detail.
  2. We can do what the other bindings does and expose an extra Capstone::reg_access(&Insn) function which will call the cs_regs_access and return a InsnRegs(CapstoneRegs, CapstoneRegs) (lifetime shouldn't be an issue here since we are returning two fixed length arrays. Unless we want the regs to not live longer than the instruction). This is what the python binding seem to be doing, see details api and regs_access API.

I think approach 1 would be less confusing for the end user while approach 2 would be more inline with what upstream thinks is idiomatic (which might mean less headaches with future updates). Also with approach 2, maybe the user will have less of a headache if they want these regs to outlive the instruction?

Tell me if you think of any of these ones are good or if you had some other ideas?

@tmfink
Copy link
Member

tmfink commented Sep 27, 2021

@vn-ki Thanks for the proposals. My thoughts after thinking for a few days:

  • Approach 1: transparent to user
    • easier for user
    • requires extra run-time checks based on whether the architecture supports cs_regs_access() when calling
  • Approach 2: explicit cs_regs_access() API
    • annoying for user (need to manually track which arch supports the extra API)
    • closer to what other bindings (including C API)
  • Same for both
    • Need extra storage for regs info, probably equivalent of 2 pointers either in Capstone or InsnDetail

I think I favor approach 1. We already bend over backwards to adapt to Capstone's "interesting" C API.

Some notes when implementing:

  • For storage for the regs_read/regs_write, I suggest using a Option<Box<cs_regs>>
    • This won't require any special drop logic since Rust will own the allocation.
    • You only need to initialize the Option when one of InsnDetail::{regs_read, regs_write} are called.
    • It's probably not worth using MaybeUnit to avoid the "zeroizing" cost; initialization only happens once per Capstone instance

bcantrill added a commit to oxidecomputer/capstone-rs that referenced this issue Jan 17, 2022
@Tsn0w
Copy link

Tsn0w commented Mar 1, 2022

Hi, is @vn-ki still working on this?

leoetlino added a commit to leoetlino/capstone-rs that referenced this issue Jan 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants