Investigate approximate matching #14

Open
ESultanik opened this issue May 18, 2020 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@ESultanik
Collaborator

Allow the user to specify an epsilon on matching cost, and find a matching whose cost is at most epsilon above the cost of the optimal matching.
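As a rough illustration of the epsilon guarantee, here is a minimal sketch that certifies a matching without knowing the optimal one. It assumes a square cost matrix and uses the sum of per-row minima as a lower bound on the optimal cost. All function names are hypothetical and none of this is Graphtage's actual API or algorithm.

```python
from typing import List, Sequence


def matching_cost(costs: Sequence[Sequence[float]], assignment: Sequence[int]) -> float:
    """Total cost when row i is matched to column assignment[i]."""
    return sum(costs[row][col] for row, col in enumerate(assignment))


def lower_bound(costs: Sequence[Sequence[float]]) -> float:
    """No matching can cost less than the sum of each row's cheapest entry."""
    return sum(min(row) for row in costs)


def greedy_matching(costs: Sequence[Sequence[float]]) -> List[int]:
    """Cheap heuristic: each row takes its cheapest still-unused column."""
    used = set()
    assignment = []
    for row in costs:
        free_columns = [col for col in range(len(row)) if col not in used]
        best_col = min(free_columns, key=lambda col: row[col])
        used.add(best_col)
        assignment.append(best_col)
    return assignment


def within_epsilon(
    costs: Sequence[Sequence[float]], assignment: Sequence[int], epsilon: float
) -> bool:
    """True if the matching is provably within epsilon of the optimal cost.

    Because lower_bound <= optimal cost, a gap to the lower bound of at most
    epsilon implies a gap to the optimum of at most epsilon. A False result
    only means this particular bound could not prove it.
    """
    return matching_cost(costs, assignment) - lower_bound(costs) <= epsilon
```

The check is conservative: it can reject a matching that is in fact close enough, but it never accepts one that is further than epsilon from optimal.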

@ESultanik ESultanik added the enhancement New feature or request label May 18, 2020
@ESultanik ESultanik self-assigned this May 18, 2020
@ivanistheone

How do you plan to make approximate matching work? Would the user provide a function that takes two nodes and returns a "distance" factor?

I have a use case in mind for a tree of content nodes that have a special content_id attribute on them which I can use for exact matching, e.g. if nodeA.content_id == nodeB.content_id the match is 100% (or distance 0).
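To make the idea concrete, here is a minimal sketch of the kind of distance function I mean. `ContentNode` and `node_distance` are hypothetical names for illustration, not part of Graphtage, and the title-similarity fallback for non-matching IDs is just one possible assumption.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class ContentNode:
    """Hypothetical content node; only content_id matters for exact matching."""
    content_id: str
    title: str = ""


def node_distance(a: ContentNode, b: ContentNode) -> float:
    """Return a distance in [0.0, 1.0], where 0.0 means a 100% match."""
    if a.content_id == b.content_id:
        # Same content_id: treat as an exact match regardless of other fields.
        return 0.0
    # Assumed fallback: dissimilarity of the titles.
    return 1.0 - SequenceMatcher(None, a.title, b.title).ratio()
```

For example, `node_distance(ContentNode("abc", "Intro"), ContentNode("abc", "Introduction"))` is `0.0` because the IDs agree.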

I posted some links about that here and am looking forward to trying graphtage on the tree fixtures I have.

@ESultanik
Collaborator Author

Graphtage already has an internal notion of edit distance, which is what it uses to output its progress bar when run from a TTY. The idea would be to:

  1. allow the user to specify a maximum edit distance (defaulting to zero), and produce a result that is at most that distance from optimal; and/or
  2. immediately print out the best solution found thus far when Graphtage receives a SIGTERM.
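The two options above could be combined roughly as follows. This is only a sketch under stated assumptions, not how Graphtage is implemented: `AnytimeMatcher` and `install_sigterm_handler` are hypothetical names, the search is a brute-force enumeration over a square cost matrix, and the caller is assumed to supply a valid lower bound on the optimal cost.

```python
import itertools
import signal
from typing import Optional, Sequence, Tuple


class AnytimeMatcher:
    """Hypothetical exhaustive matcher that can stop early with its best result."""

    def __init__(self) -> None:
        self.stop_requested = False

    def request_stop(self, signum: Optional[int] = None, frame: object = None) -> None:
        # Keep the signal handler tiny: only set a flag that the loop checks.
        self.stop_requested = True

    def run(
        self,
        costs: Sequence[Sequence[float]],
        lower_bound: float,
        max_distance: float = 0.0,
    ) -> Tuple[Optional[Tuple[int, ...]], float]:
        """Return (assignment, cost) for the best matching found so far."""
        best: Optional[Tuple[int, ...]] = None
        best_cost = float("inf")
        for candidate in itertools.permutations(range(len(costs))):
            cost = sum(costs[row][col] for row, col in enumerate(candidate))
            if cost < best_cost:
                best, best_cost = candidate, cost
            # Option 1: stop once the result is provably close enough.
            # Option 2: stop as soon as a SIGTERM has been received.
            if self.stop_requested or best_cost - lower_bound <= max_distance:
                break
        return best, best_cost


def install_sigterm_handler(matcher: AnytimeMatcher) -> None:
    """Route SIGTERM to the matcher (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, matcher.request_stop)
```

With `max_distance` left at its default of zero and a loose lower bound, the loop simply runs to completion and returns the optimal matching, which mirrors the proposed default.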

It sounds like you might be trying to do something slightly different, though. You might actually be able to do what you want using Graphtage's not-yet-very-well-documented --match-if argument. If you can provide a more detailed example of input files, I could try and give you an example of its usage.


2 participants