Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index trait bound makes it impossible to return an owned value #33

Open
rocurley opened this issue Oct 7, 2021 · 12 comments
Open

Index trait bound makes it impossible to return an owned value #33

rocurley opened this issue Oct 7, 2021 · 12 comments

Comments

@rocurley
Copy link

rocurley commented Oct 7, 2021

I'm working on json diffing. I have some representation of two jsons in memory, like a serde_json::Value. Obviously these cannot be directly diffed, since they're not array-like. So to diff them I need to flatten them into an array of json-lines. I'd rather not actually construct the array, however: I don't want the memory usage. Instead, I'd like to generate the lines on the fly.

I can almost do this by implementing the Index trait. It doesn't quite work, however, since I need to return a reference. Ideally, I'd like to be able to pass in an FnMut(usize) -> T or something along those lines.

I'm not sure if anybody but me will find this useful, TBH, but I figured there's no harm in asking.

@mitsuhiko
Copy link
Owner

@rocurley Why can you not produce a &str (or similar). What the Index trait expands to does need to be a reference, but I did not find it massively to be a problem so far. Do you have an example where it does not work for you currently?

@rocurley
Copy link
Author

rocurley commented Nov 10, 2021

So my actual use case is diffing jsons. But you don't need the full complexity of json here to have this issue. Say you want to diff Vec<Vec<u8>>s. Now in principle you could just pass that in directly and be fine. But the diff will then consider the inner Vec<u8>s to be the unit of diffing, so the diffs will be too coarse. [[1,2]] -> [[1,2,3]] shouldn't say the entire input has changed, for example.

So one way to do this is to flatten the input into a vector representation that allows you to reconstruct the original input:

enum DiffElem {
    VecStart,
    Elem(u8),
}

fn to_diffable(xss : &Vec<Vec<u8>>) -> Vec<DiffElem> {
    let mut out : Vec<DiffElem> = Vec::new();
    for xs in xss {
        out.push(DiffElem::VecStart);
        out.extend(xs.iter().map(|x| DiffElem::Elem(*x)));
    }
    out
}

Now this will work, but requires making a copy of the entire input. But it doesn't seem strictly necessary. You could "index" into the diffable representation without ever constructing it:

// This implementation is ineffecient. To get a more effecient one, we'd
// annotate each subvector with the index of its VecStart, and use
// binary search to find the subvector we care about.
fn index_diffable(mut i : usize, xss : &Vec<Vec<u8>>) -> Option<DiffElem> {
    for xs in xss {
        if i == 0 {
            return Some(DiffElem::VecStart);
        }
        if i < xs.len() {
            return Some(DiffElem::Elem(xs[i]));
        }
        i -= xs.len();
    }
    None
}

But you can't do this with the Index trait.

@mitsuhiko
Copy link
Owner

I agree that this case is not great. Generally I'm not a huge fan of the Index trait as base. I have already considered alternatives before but never with too much success.

@kirawi
Copy link

kirawi commented Mar 6, 2022

I've been trying to implement a wrapper type to get ropey::RopeSlice to work with similar using the iterators it provides, but the problem is that they only return owned values with the exception of the Chunks iterator (which isn't what I need).

I have already considered alternatives before but never with too much success.

If you don't mind me asking, what did you try and why weren't they successful?

@mitsuhiko
Copy link
Owner

The original issue I had with Index wasn't even that it returns references but that it can be hard to get the bounds checks reliably eliminated (#19). So originally I was trying to do that.

To address this particular thing with borrows I tried to have a custom trait but making both borrows and owned values work is surprisingly complex and I did not go very far with it because I did not at all like what it did to the API. I'm not sure if there are clever ideas I haven't been thinking of yet that would make both cases work.

@cessen
Copy link

cessen commented Mar 15, 2022

We're running into this in Helix (using Ropey) again.

Might I suggest adding a variant of the capture_diff() and capture_diff_deadline() functions that take a |usize| -> Item closure? That would give a lot of flexibility to client code to work with whatever weird containers/situations they might have.

Edit: or rather a |usize| -> Option<Item> closure, since the collections are presumably not infinitely long, ha ha. :-)

@mitsuhiko
Copy link
Owner

mitsuhiko commented Mar 15, 2022

@cessen the issue is that this Index based API is all over the place. This goes really deep in the library.

I will play with it. Maybe I just break the internal algorithms API.

@cessen
Copy link

cessen commented Mar 17, 2022

Ah, got it. If I get some time and the mood strikes, I might take a crack at that as well. If we're lucky, even if the API goes deep, it still might not be too hard to separate from the algorithm code itself. If we're lucky... :-)

@mitsuhiko
Copy link
Owner

The other issue with the callback API is that it can't borrow. So it's not even that the callback based API could be the base of the API.

@kirawi
Copy link

kirawi commented Mar 24, 2022

Perhaps a macro that could create two versions of the internal functions; one that borrows and one that returns an owned value?

@mitsuhiko
Copy link
Owner

So one could move somewhat to improving this by going all in on the IdentifyDistinct type. So something like this could be done:

let old = Rope::from_str("...");
let new = Rope::from_str("...");
let h = IdentifyDistinct::<u32>::new_from_iter(old.lines(), new.lines());
let diff = capture_diff(
    Algorithm::Myers,
    h.old_lookup(),
    h.old_range(),
    h.new_lookup(),
    h.new_range(),
);

The new_from_iter is easy to add, but the returned ops are still annoying to work with. For instance iter_changes again requires an Index bound.

@mitsuhiko
Copy link
Owner

So now that there are GATs that issue might be worth toying around with again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants