Relations #284

Open
percevalw opened this issue Apr 3, 2024 · 1 comment

Comments

@percevalw
Member

percevalw commented Apr 3, 2024

It's time we tackled the task of predicting the relationships between entities. We first need to decide how to represent them in our documents, and then which components (rule-based or machine learning) can solve this task.

Representations

Basic relationships

We could add a rel dictionary attribute to spans, such as:

ent._.rel == {
    # if we only know that the two entities are related, but not how (which may be fine in most cases)
    ent1: True,
    # if we know the relationship between the two entities
    ent2: "is_located_in",
    # should we instead store several tags per related entity?
    ent3: ["lives_with", "is_parent_of"],
}

and have getters that look through this dictionary to find entities with a given label or a certain type of relationship:

ent._.date == next((other._.date for other in ent._.rel if other.label_ == "date"), None)
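
To make the dictionary idea concrete, here is a minimal sketch in plain Python. The `Ent` class is a hypothetical stand-in for an entity span (it is not the spaCy or edsnlp API), and `relation_tags` and `first_related` are illustrative helper names. It only shows how the three value shapes (True, a single tag, a list of tags) could be normalized and queried.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

RelValue = Union[bool, str, List[str]]


@dataclass(eq=False)  # eq=False keeps identity hashing, so entities can be dict keys
class Ent:
    """Hypothetical stand-in for an entity span (not the spaCy/edsnlp API)."""
    text: str
    label_: str
    rel: Dict["Ent", RelValue] = field(default_factory=dict)


def relation_tags(value: RelValue) -> List[str]:
    """Normalize True / "tag" / ["tag", ...] into a list of tags."""
    if isinstance(value, bool):
        return []  # related, but the kind of relation is unknown
    if isinstance(value, str):
        return [value]
    return list(value)


def first_related(ent: Ent, label: Optional[str] = None,
                  relation: Optional[str] = None) -> Optional[Ent]:
    """Return the first related entity matching a label and/or a relation tag."""
    for other, value in ent.rel.items():
        if label is not None and other.label_ != label:
            continue
        if relation is not None and relation not in relation_tags(value):
            continue
        return other
    return None


# Placeholder entities mirroring ent1, ent2 and ent3 from the example above.
ent = Ent("ent", "person")
ent1 = Ent("ent1", "date")
ent2 = Ent("ent2", "location")
ent3 = Ent("ent3", "person")
ent.rel = {ent1: True, ent2: "is_located_in", ent3: ["lives_with", "is_parent_of"]}

related_date = first_related(ent, label="date")
related_child = first_related(ent, relation="is_parent_of")
```

Always storing a list of tags (possibly empty) would remove the need for the normalization step, at the cost of a slightly more verbose representation.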

Scopes

Another interesting approach is that of scopes and scope relationships. For example, in the following:

[The patient came in on [04/05/10] (date). We prescribed paracetamol because 
of headaches.] [On the [following day] (date) the headaches disappeared].

where we predict the span of text (scope) over which a given cue/trigger entity conveys its meaning. Scopes can overlap. For instance, a section title conveys its meaning over its whole section, which can itself contain several entities with smaller scopes.

This is already done indirectly by qualification components such as eds.negation, eds.hypothesis and eds.history (implementations of the NegEx/ConText algorithms).

This could take the following form:

Span("following day")._.scope == Span("On the following day the headaches disappeared")
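
As a rough illustration of the scope idea, the sketch below represents spans as character offsets and looks up which cue governs a given entity. `Span`, `find` and `governing_cues` are hypothetical names for this example only, and the scopes are written by hand here rather than predicted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical character-offset span; `scope` is the text the cue governs."""
    start: int
    end: int
    label: str = ""
    scope: Optional["Span"] = None

    def contains(self, other: "Span") -> bool:
        return self.start <= other.start and other.end <= self.end


def find(text: str, sub: str, label: str = "",
         scope: Optional[Span] = None, last: bool = False) -> Span:
    """Build a span from the first (or last) occurrence of `sub` in `text`."""
    start = text.rindex(sub) if last else text.index(sub)
    return Span(start, start + len(sub), label, scope)


def governing_cues(ent: Span, cues: List[Span]) -> List[Span]:
    """Return the cues whose scope fully covers the entity."""
    return [c for c in cues if c.scope is not None and c.scope.contains(ent)]


text = ("The patient came in on 04/05/10. We prescribed paracetamol because "
        "of headaches. On the following day the headaches disappeared.")

# Hand-written scopes matching the bracketed example above.
scope1 = find(text, "The patient came in on 04/05/10. We prescribed "
                    "paracetamol because of headaches.")
scope2 = find(text, "On the following day the headaches disappeared.")

date1 = find(text, "04/05/10", "date", scope1)
date2 = find(text, "following day", "date", scope2)
drug = find(text, "paracetamol", "drug")
symptom1 = find(text, "headaches", "symptom")
symptom2 = find(text, "headaches", "symptom", last=True)

cues = [date1, date2]
```

With overlapping scopes, `governing_cues` would return several cues for the same entity, so a real component would need a rule to rank them (for instance, preferring the smallest scope).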

Frames

Frames, as described in https://aclanthology.org/2023.bionlp-1.13.pdf#page=2, could be an interesting end result, but I think it would be too restrictive to implement only frames.

Dependency and constituency analysis

We could also exploit dependency and constituency parsing relationships to infer relationships between entities, but note that the example above would not be directly resolved since the scope of the first entity spans two sentences, which are not syntactically related.

Components

Machine learning

I'm currently reimplementing a generic version of the https://github.com/percevalw/breast-imaging-frame-extraction method (https://aclanthology.org/2023.bionlp-1.13/)

Rule-based

By mixing sections, subsections (e.g. enumerations and bullets), sentences and Context-like algorithms, we could probably already get good results on highly requested tasks such as detecting relationships between dates and surrounding entities.
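
A deliberately naive sketch of such a rule, assuming regex-based sentence splitting and a toy date pattern (neither is what edsnlp does): each term is linked to the date found in its own sentence, or otherwise to the most recent date seen earlier in the text. The names `DATE_RE`, `SENT_RE` and `link_dates` are made up for this example.

```python
import re
from typing import List, Optional, Tuple

# Toy patterns, for illustration only.
DATE_RE = re.compile(r"\b\d{2}/\d{2}/\d{2,4}\b|\bfollowing day\b")
SENT_RE = re.compile(r"[^.!?]+[.!?]?")


def link_dates(text: str, terms: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Link each term occurrence to the date governing its sentence."""
    links = []
    current_date = None  # most recent date seen so far, carried across sentences
    for match in SENT_RE.finditer(text):
        sentence = match.group()
        date = DATE_RE.search(sentence)
        if date:
            current_date = date.group()
        for term in terms:
            if term in sentence:
                links.append((term, current_date))
    return links


TEXT = ("The patient came in on 04/05/10. We prescribed paracetamol because "
        "of headaches. On the following day the headaches disappeared.")

links = link_dates(TEXT, ["paracetamol", "headaches"])
```

A real component would presumably reset `current_date` at section boundaries and resolve relative dates such as "following day" against the previous absolute date.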

@cvinot
Contributor

cvinot commented Apr 10, 2024

I'm working on the same subject. I'm sure you already know most of this, but in case it helps the thinking, here are my two cents:

Structure

  • Microsoft has done the same job of extracting relations, and they opted for a list of "relation_type".

To reflect the Microsoft approach, you could also opt for a structure like this:

ent1._.rel == [
    {
        "object": ent2,
        "nature": "is_located_in",
    },
    {
        "object": ent3,
        "nature": "has_frequency",
    },
]
ent2._.rel == [
    {
        "object": ent1,
        "nature": "is_location_of",
    },
]

A getter could then build the document-level list of relations, such as:
doc._.relations == [
    {"subject": ent1, "object": ent2, "nature": "is_located_in"},
    {"subject": ent1, "object": ent3, "nature": "has_frequency"},
    {"subject": ent2, "object": ent1, "nature": "is_location_of"},
]

This is very much like your proposal, except that there are fewer relation-specific attributes, which could otherwise overlap with the other span attributes in the context of a package. The cost is a slightly more complex way to fetch values.

It then raises the question of the exact definition of each relation, whether each one is reciprocal, etc.
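
For what it's worth, the per-entity list and the document-level getter described above could be sketched as follows, together with one possible answer to the reciprocity question: an explicit inverse table. `Ent`, `INVERSE`, `add_relation` and `doc_relations` are hypothetical names, and the table only covers the relation names used in the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical inverse table; relations absent from it are treated as one-way.
INVERSE: Dict[str, str] = {
    "is_located_in": "is_location_of",
    "is_location_of": "is_located_in",
}


@dataclass(eq=False)
class Ent:
    """Hypothetical stand-in for an entity span (not the spaCy/edsnlp API)."""
    text: str
    rel: List[dict] = field(default_factory=list)


def add_relation(subject: Ent, obj: Ent, nature: str) -> None:
    """Store the relation on the subject, and its inverse on the object if known."""
    subject.rel.append({"object": obj, "nature": nature})
    inverse = INVERSE.get(nature)
    if inverse is not None:
        obj.rel.append({"object": subject, "nature": inverse})


def doc_relations(ents: List[Ent]) -> List[dict]:
    """Flatten per-entity relation lists into subject/object/nature triples."""
    return [
        {"subject": ent, "object": r["object"], "nature": r["nature"]}
        for ent in ents
        for r in ent.rel
    ]


ent1, ent2, ent3 = Ent("ent1"), Ent("ent2"), Ent("ent3")
add_relation(ent1, ent2, "is_located_in")
add_relation(ent1, ent3, "has_frequency")  # no inverse registered

relations = doc_relations([ent1, ent2, ent3])
```

Storing the inverse eagerly keeps lookups simple on both sides, but it duplicates information. Computing it lazily in the getter would be an alternative.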

Basic Relationship ideas

  • The Relation Extraction task can be split into two tasks: Entity Pairing and Relationship classification.

  • Entity Pairing: create pairs of eligible entities using scopes, a dependency parser, sections, sentences, frames, etc.

  • Relationship classification: attempt to find relations among the possible relations for a given entity type (e.g. a condition can have a BODY_SITE_OF_CONDITION, so attempt to find a potential body site; a FREQUENCY_OF_CONDITION, so attempt to find a frequency modifier; etc.).

  • ML/DL approaches like this one attempt to create relations between eligible entities, and assign a type according to the relations deduced (e.g. a relation between a condition and a body site would be deduced as BODY_SITE_OF_CONDITION). In this example, Entity Pairing is reduced to a distance threshold.
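
The two-step decomposition above can be sketched like this, with Entity Pairing reduced to a distance threshold and Relationship classification reduced to a lookup on entity types. The schema, the token offsets and the example entities are all made up for illustration. A real classifier would score each candidate pair instead of looking it up.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Iterator, List, Optional, Tuple

# Hypothetical schema: (head label, tail label) -> relation type.
RELATION_SCHEMA = {
    ("condition", "body_site"): "BODY_SITE_OF_CONDITION",
    ("condition", "frequency"): "FREQUENCY_OF_CONDITION",
}


@dataclass(frozen=True)
class Ent:
    text: str
    label: str
    start: int  # token offsets, end exclusive
    end: int


def pair_entities(ents: List[Ent], max_distance: int) -> Iterator[Tuple[Ent, Ent]]:
    """Entity Pairing: yield ordered pairs separated by at most `max_distance` tokens."""
    for head, tail in permutations(ents, 2):
        gap = max(head.start, tail.start) - min(head.end, tail.end)
        if gap <= max_distance:
            yield head, tail


def classify(head: Ent, tail: Ent) -> Optional[str]:
    """Relationship classification: here a plain lookup on the entity types."""
    return RELATION_SCHEMA.get((head.label, tail.label))


def extract_relations(ents: List[Ent], max_distance: int = 5) -> List[Tuple[str, str, str]]:
    relations = []
    for head, tail in pair_entities(ents, max_distance):
        relation = classify(head, tail)
        if relation is not None:
            relations.append((head.text, relation, tail.text))
    return relations


# Illustrative entities: "daily" is too far from "pain" to be paired.
ents = [
    Ent("pain", "condition", 2, 3),
    Ent("left knee", "body_site", 5, 7),
    Ent("daily", "frequency", 20, 21),
]
relations = extract_relations(ents)
```

Swapping `pair_entities` for a scope-based or section-based pairing would leave the classification step unchanged, which is the main appeal of splitting the task in two.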

For dates only, I use a scope-based approach with a rule-based construction (mixing sections, subsections and sentences, as well as implicit context changes, contextual vs. enunciation, etc., which is the reason I asked for #253) and already reach a decent F1, so I can back the short-term approach. It can benefit from a wide variety of rules, and dates tend to "cascade" through sections, so to speak (as in your example), which could be quite different from, say, measurement or other relation scopes, so I felt it made sense to work on those separately.

I would love to see frames and an ML-based scope deduction approach at work, though.

Very interested to see how you approach this.
