node.shift_after_node() does not update enhanced dependencies #95

Open
dan-zeman opened this issue Oct 29, 2021 · 2 comments

@dan-zeman
Collaborator

I have the following code to fix tokenization issues in Spanish AnCora (UniversalDependencies/UD_Spanish-AnCora#6):

        import re

        from udapi.core.block import Block


        class SplitExclamation(Block):  # hypothetical name; the snippet ran inside a block like this
            def process_node(self, node):
                if re.search(r'\w[¡!]$', node.form):
                    # Separate the punctuation and attach it to the rest.
                    punct = node.create_child()
                    punct.shift_after_node(node)
                    punct.form = node.form[-1:]
                    node.form = node.form[:-1]
                    punct.lemma = punct.form
                    punct.upos = 'PUNCT'
                    punct.xpos = 'faa' if punct.form == '¡' else 'fat'
                    punct.feats['PunctType'] = 'Excl'
                    punct.feats['PunctSide'] = 'Ini' if punct.form == '¡' else 'Fin'
                    punct.misc['SpaceAfter'] = node.misc['SpaceAfter']
                    node.misc['SpaceAfter'] = 'No'
                    punct.deprel = 'punct'

The method shift_after_node() correctly updates the IDs and basic heads of the nodes that follow the new position of the shifted node. Unfortunately, it does not also update the enhanced heads (the DEPS column) when the enhanced representation is present. Hence, the following source

-19     Yahoo!  Yahoo!  PROPN   np0000o _       16      appos   16:appos        ClusterId=CESS-CAST-A-20000503-1687-s5.sn.51|ClusterType=Spec.organization|MentionSpan=19
-20     con     con     ADP     sps00   _       21      case    21:case _
-21     intenciones     intención       NOUN    ncfp000 Gender=Fem|Number=Plur  8       obl     8:obl   ClusterId=CESS-CAST-A-20000503-1687-s5.sn.57|ClusterType=Gen|MentionSpan=21-22

results in the following (note the mismatch in the parent of the preposition con: its basic head is correctly renumbered to 22, but its enhanced relation still says 21:case):

+19     Yahoo   Yahoo!  PROPN   np0000o _       16      appos   16:appos        ClusterId=CESS-CAST-A-20000503-1687-s5.sn.51|ClusterType=Spec.organization|MentionSpan=19|SpaceAfter=No
+20     !       !       PUNCT   fat     PunctSide=Fin|PunctType=Excl    19      punct   _       _
+21     con     con     ADP     sps00   _       22      case    21:case _
+22     intenciones     intención       NOUN    ncfp000 Gender=Fem|Number=Plur  8       obl     8:obl   ClusterId=CESS-CAST-A-20000503-1687-s5.sn.57|ClusterType=Gen|MentionSpan=21-22
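
For illustration, the renumbering that the enhanced DEPS column would need can be sketched on plain CoNLL-U strings. This is a minimal sketch, not Udapi code: `insertion_map` and `remap_deps` are hypothetical helpers, exactly one new word is assumed to be inserted right after an existing word, and heads that are empty nodes (IDs like `8.1`) are not handled.

```python
def insertion_map(n_words: int, after: int) -> dict[str, str]:
    """Old -> new word IDs when one word is inserted right after word `after`.

    Words up to and including `after` keep their IDs; later words move by one.
    """
    return {str(i): str(i + 1) for i in range(after + 1, n_words + 1)}


def remap_deps(deps: str, old_to_new: dict[str, str]) -> str:
    """Rewrite the head IDs in a DEPS value such as '21:case' or '0:root|8:obl'.

    Heads missing from `old_to_new` (e.g. those before the insertion point,
    or the artificial root 0) are left unchanged.
    """
    if deps == '_':
        return deps
    edges = []
    for edge in deps.split('|'):
        # Split only at the first colon: enhanced deprels may contain colons.
        head, deprel = edge.split(':', 1)
        edges.append(f'{old_to_new.get(head, head)}:{deprel}')
    return '|'.join(edges)


# The example from this issue: "!" is inserted after Yahoo (word 19).
# n_words=21 only covers the words shown above; the real sentence may be longer.
mapping = insertion_map(n_words=21, after=19)
print(remap_deps('21:case', mapping))  # 22:case
print(remap_deps('8:obl', mapping))    # 8:obl
```

Because a single insertion shifts every later ID by the same amount, the relative order of the heads is preserved, so a DEPS value that was sorted stays sorted.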

It just occurred to me that MentionSpan would also need updating (e.g., MentionSpan=21-22 of intenciones should become 22-23), but for that one would probably need to activate the CorefUD sub-API first?
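
MentionSpan values could in principle be renumbered the same way, directly on the MISC string. A sketch under assumptions: the value is taken to be a comma-separated list of single IDs or `start-end` ranges, `old_to_new` maps old word IDs to new ones as strings, and `remap_span` is a hypothetical helper that bypasses the CorefUD sub-API entirely.

```python
def remap_span(span: str, old_to_new: dict[str, str]) -> str:
    """Renumber a MentionSpan-like value such as '21-22' or '3-5,7'.

    Each comma-separated part is either one ID or a 'start-end' range;
    IDs missing from `old_to_new` are kept as they are.
    """
    parts = []
    for part in span.split(','):
        bounds = part.split('-')
        parts.append('-'.join(old_to_new.get(b, b) for b in bounds))
    return ','.join(parts)


# Words 20-22 move by one after inserting a word right after word 19.
mapping = {'20': '21', '21': '22', '22': '23'}
print(remap_span('21-22', mapping))  # 22-23
print(remap_span('19', mapping))     # 19
```

Since only the endpoints are remapped, a range that spans the insertion point (say `19-21` with a word inserted after 19) grows to include the new word, which may or may not be wanted for a punctuation token.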

@dan-zeman dan-zeman added the bug label Oct 29, 2021
@dan-zeman
Collaborator Author

Related: if there is an empty node in the sentence, its ID is not updated, so after the shift it ends up between different nodes than where it was originally.
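
For empty nodes, one way to keep each empty node between the same words is to renumber only the integer part of its ID. The sketch below assumes `old_to_new` maps old word IDs to new ones as strings; `remap_id` is a hypothetical helper working on plain CoNLL-U IDs, not on Udapi objects.

```python
def remap_id(node_id: str, old_to_new: dict[str, str]) -> str:
    """Renumber a word ID ('20') or an empty-node ID ('20.1').

    An empty node k.n follows word k, so only k is remapped and the
    decimal index n is kept; the node stays right after the same word.
    """
    word, dot, index = node_id.partition('.')
    return old_to_new.get(word, word) + dot + index


# A word was inserted right after word 19, so words 20 and 21 move by one.
mapping = {'20': '21', '21': '22'}
print(remap_id('20.1', mapping))  # 21.1 (a hypothetical empty node after "con")
print(remap_id('21', mapping))    # 22
```

One design consequence: an empty node `19.1` stays `19.1`, so a word inserted right after word 19 lands after the empty node rather than before it; whether that is right depends on where the new word is meant to go. The same function can also be applied to DEPS heads, which may refer to empty nodes.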

@dan-zeman
Collaborator Author

dan-zeman commented Jul 1, 2023

After two years, I ran into this issue again. The following ugly workaround seems to help with the enhanced relations but not with the position of the empty nodes. It might be useful until the bug is fixed properly.

            # Bug in Udapi: shift_before_node() does not update enhanced relations.
            # Before using the method, deserialize the whole graph, i.e., convert
            # parent node ids to parent node object references.
            egraph = []
            if node.deps:
                for n in node.root.descendants_and_empty:
                    edeps = [{'parent': ed['parent'], 'deprel': ed['deprel']} for ed in n.deps]
                    egraph.append((n, edeps))
            # Now shift the node, which will update numeric IDs (ords) of all
            # subsequent node objects.
            numbernode.shift_before_node(node)
            # Not sure if this is needed: Re-set the edeps to make sure that
            # Udapi will have to serialize the egraph with the updated numbers.
            for n, edeps in egraph:
                n.deps = edeps
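
The core idea of this workaround, keeping enhanced edges as references to node objects and turning them into numbers only when the output is written, can be illustrated with a toy model. `ToyNode`, `renumber` and `deps_string` are hypothetical stand-ins that only mimic this behavior; they are not Udapi classes and do not reproduce Udapi's actual serialization code.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, like graph objects
class ToyNode:
    """Hypothetical stand-in for a tree node (not Udapi's Node class)."""
    form: str
    ord: int = 0
    # Enhanced edges as (parent node object, deprel) pairs, not numeric IDs.
    deps: list = field(default_factory=list)


def renumber(nodes: list) -> None:
    """Assign 1-based ords in list order, as a shift does for word IDs."""
    for i, node in enumerate(nodes, start=1):
        node.ord = i


def deps_string(node: ToyNode) -> str:
    """Serialize enhanced edges using the parents' *current* ords."""
    if not node.deps:
        return '_'
    return '|'.join(f'{parent.ord}:{deprel}' for parent, deprel in node.deps)


nodes = [ToyNode('Yahoo'), ToyNode('con'), ToyNode('intenciones')]
renumber(nodes)
con, intenciones = nodes[1], nodes[2]
con.deps = [(intenciones, 'case')]
print(deps_string(con))  # 3:case

# Insert a new word after "Yahoo" and renumber; the reference still resolves.
nodes.insert(1, ToyNode('!'))
renumber(nodes)
print(deps_string(con))  # 4:case
```

Because the edge stores the parent object, renumbering cannot leave it pointing at a stale number. A stored numeric ID would still say 3 after the insertion and silently point to the wrong word, which is the kind of mismatch reported in this issue.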
