If there are HTML tags within XML tags, @JacksonXmlText will assign incorrect values to the content. #623

Open
Suny95 opened this issue Dec 14, 2023 · 7 comments
Labels
has-failing-test Indicates that there exists a test case (under `failing/`) to reproduce the issue

Comments

@Suny95

Suny95 commented Dec 14, 2023

If there are HTML tags within XML tags, the Jackson XML parser will assign incorrect values to the content.

    @Data
    public static class Abstract {

        @JacksonXmlElementWrapper(useWrapping = false)
        @JacksonXmlProperty(localName = "AbstractText")
        private List<AbstractText> abstractTextList;

    }

    @Data
    public static class AbstractText {

        @JacksonXmlProperty(isAttribute = true)
        private String Label;

        @JacksonXmlProperty(isAttribute = true, localName = "NlmCategory")
        private String category;

        @JacksonXmlText
        private String value;

    }
@Suny95
Author

Suny95 commented Dec 14, 2023

version:'com.fasterxml.jackson.dataformat:jackson-dataformat-xml:2.15.0'

@cowtowncoder
Member

Although a textual description can be helpful, what is needed is a full (but ideally minimal) reproduction that shows the exact problem.

@ronnoceel

ronnoceel commented Apr 25, 2024

I have a working example.

xml:

<Abstract>
   <AbstractText><i>Objective</i>. Holographic mixed reality (HMR) allows for the superimposition of computer-generated virtual objects onto the operator's view of the world. Innovative solutions can be developed to enable the use of this technology during surgery. The authors developed and iteratively optimized a pipeline to construct, visualize, and register intraoperative holographic models of patient landmarks during spinal fusion surgery. <i>Methods.</i> The study was carried out in two phases. In phase 1, the custom intraoperative pipeline to generate patient-specific holographic models was developed over 7 patients. In phase 2, registration accuracy was optimized iteratively for 6 patients in a real-time operative setting. <i>Results.</i> In phase 1, an intraoperative pipeline was successfully employed to generate and deploy patient-specific holographic models. In phase 2, the registration error with the native hand-gesture registration was 20.2 &#xb1; 10.8&#xa0;mm (n = 7 test points). Custom controller-based registration significantly reduced the mean registration error to 4.18 &#xb1; 2.83&#xa0;mm (n = 24 test points, <i>P</i> &lt; .01). Accuracy improved over time (B = -.69, <i>P</i> &lt; .0001) with the final patient achieving a registration error of 2.30 &#xb1; .58&#xa0;mm. Across both phases, the average model generation time was 18.0 &#xb1; 6.1&#xa0;minutes (n = 6) for isolated spinal hardware and 33.8 &#xb1; 8.6&#xa0;minutes (n = 6) for spinal anatomy. <i>Conclusions.</i> A custom pipeline is described for the generation of intraoperative 3D holographic models during spine surgery. Registration accuracy dramatically improved with iterative optimization of the pipeline and technique. While significant improvements and advancements need to be made to enable clinical utility, HMR demonstrates significant potential as the next frontier of intraoperative visualization.</AbstractText>
</Abstract>

Java:

Abstract.java

import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlElementWrapper;

import java.util.List;

public class Abstract {
    @JacksonXmlElementWrapper(useWrapping = false)
    public List<AbstractText> getAbstractText() {
        return this.AbstractText;
    }

    public void setAbstractText(List<AbstractText> AbstractText) {
        this.AbstractText = AbstractText;
    }

    List<AbstractText> AbstractText;

    public String getCopyrightInformation() {
        return this.CopyrightInformation;
    }

    public void setCopyrightInformation(String CopyrightInformation) {
        this.CopyrightInformation = CopyrightInformation;
    }

    String CopyrightInformation;
}

AbstractText.java

package articlemetadata.pubmed.efetch;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonRawValue;
import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty;
import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlText;

@JsonIgnoreProperties(value = { "i" , "b", "sup", "sub", "u"})
public class AbstractText {

    @JacksonXmlProperty(isAttribute = true)
    public String getLabel() {
        return this.Label;
    }

    public void setLabel(String Label) {
        this.Label = Label;
    }

    String Label;

    @JacksonXmlProperty(isAttribute = true)
    public String getNlmCategory() {
        return this.NlmCategory;
    }

    public void setNlmCategory(String NlmCategory) {
        this.NlmCategory = NlmCategory;
    }

    String NlmCategory;

    public String getText() {
        return text;
    }

    public void setText(String text) {
        this.text = text;
    }

    @JacksonXmlText
    @JsonRawValue
    String text;
}

driver

    final XmlMapper xmlMapper = XmlMapper.xmlBuilder()
            .propertyNamingStrategy(PropertyNamingStrategies.UPPER_CAMEL_CASE)
            .build();

    String input = "                <Abstract>\n" +
            "                    <AbstractText><i>Objective</i>. Holographic mixed reality (HMR) allows for the superimposition of computer-generated virtual objects onto the operator's view of the world. Innovative solutions can be developed to enable the use of this technology during surgery. The authors developed and iteratively optimized a pipeline to construct, visualize, and register intraoperative holographic models of patient landmarks during spinal fusion surgery. <i>Methods.</i> The study was carried out in two phases. In phase 1, the custom intraoperative pipeline to generate patient-specific holographic models was developed over 7 patients. In phase 2, registration accuracy was optimized iteratively for 6 patients in a real-time operative setting. <i>Results.</i> In phase 1, an intraoperative pipeline was successfully employed to generate and deploy patient-specific holographic models. In phase 2, the registration error with the native hand-gesture registration was 20.2 &#xb1; 10.8&#xa0;mm (n = 7 test points). Custom controller-based registration significantly reduced the mean registration error to 4.18 &#xb1; 2.83&#xa0;mm (n = 24 test points, <i>P</i> &lt; .01). Accuracy improved over time (B = -.69, <i>P</i> &lt; .0001) with the final patient achieving a registration error of 2.30 &#xb1; .58&#xa0;mm. Across both phases, the average model generation time was 18.0 &#xb1; 6.1&#xa0;minutes (n = 6) for isolated spinal hardware and 33.8 &#xb1; 8.6&#xa0;minutes (n = 6) for spinal anatomy. <i>Conclusions.</i> A custom pipeline is described for the generation of intraoperative 3D holographic models during spine surgery. Registration accuracy dramatically improved with iterative optimization of the pipeline and technique. While significant improvements and advancements need to be made to enable clinical utility, HMR demonstrates significant potential as the next frontier of intraoperative visualization.</AbstractText>\n" +
            "                </Abstract>\n";

    Abstract abs = xmlMapper.readValue(input, Abstract.class);
    String totalAbstract = abs.getAbstractText().get(0).getText();
    System.out.println(totalAbstract);

This only prints the value of the abstract text AFTER the "Conclusions" italics. Removing the <i> tags produces the entire string.

@cowtowncoder added the has-failing-test label and removed the test-needed label on Apr 26, 2024
@cowtowncoder
Member

This does not seem like valid usage, for a couple of reasons:

  1. You are marking "i" (etc.) as properties to ignore, so all text inside <i> is skipped, as requested. Ignoring does not mean that only the XML tag itself is ignored; it means the property implied by the tag, including its contents, is ignored.
  2. List<AbstractText> won't work the way you perhaps expect since there is only one <AbstractText> element -- it does bind content from multiple text segments.

In general, this kind of mixed content is very difficult to make work with data binding.
You may be able to work around some issues by using setters for the content and combining it like so:

    private String text = "";

    @JacksonXmlText
    public void setText(String text) {
        this.text = this.text + text;
    }

but you would probably also need to have something like:

   @JsonProperty("i")
   @JsonAlias({ "other", "tags", "here" })
   public void setTextFromTags(String text) {
      this.text = this.text + text;
   }

@ronnoceel

ronnoceel commented May 1, 2024

Thank you for your answer.

I am reading this data in from the NCBI pubmed efetch API. <AbstractText> can sometimes be a list, but is often a list of one (like in my example).

I see the problem with why the data binding might not work in this scenario. If it changes in the future I would be happy to know, but for my use case I am using the following (lossy) workaround which I will record here for posterity:

HttpResponse<String> fetchResponse = httpClient.send(fetchRequest, HttpResponse.BodyHandlers.ofString());
String body = Optional.ofNullable(fetchResponse.body())
              .map(i -> i.replaceAll("<i>", ""))
              .map(i -> i.replaceAll("</i>", ""))
              .map(i -> i.replaceAll("<b>", ""))
              .map(i -> i.replaceAll("</b>", ""))
              .map(i -> i.replaceAll("<sup>", ""))
              .map(i -> i.replaceAll("</sup>", ""))
              .map(i -> i.replaceAll("<sub>", ""))
              .map(i -> i.replaceAll("</sub>", ""))
              .map(i -> i.replaceAll("<u>", ""))
              .map(i -> i.replaceAll("</u>", ""))
              .orElse("");

return xmlMapper.readValue(body, clazz);

There is perhaps a more elegant way of doing this but it is working for me for now. I hope this helps anyone in the future who stumbles upon this.
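
For anyone adapting the workaround above, the chain of replaceAll calls could probably be collapsed into a single precompiled pattern. This is only a minimal sketch: the class name InlineTagStripper and the method strip are hypothetical, and it assumes the tags never carry attributes, just as the original chain does. It is equally lossy, since the formatting information is discarded.

```java
import java.util.regex.Pattern;

public class InlineTagStripper {

    // Matches the opening or closing form of the inline tags handled above: i, b, sup, sub, u.
    // Assumption: the tags have no attributes (e.g. <i class="x"> would not match).
    private static final Pattern INLINE_TAGS = Pattern.compile("</?(?:i|b|sup|sub|u)>");

    // Returns the input with the inline tags removed, or "" for null input
    // (mirroring the Optional...orElse("") behaviour of the workaround).
    public static String strip(String xml) {
        if (xml == null) {
            return "";
        }
        return INLINE_TAGS.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<AbstractText><i>Objective</i>. H<sub>2</sub>O</AbstractText>";
        System.out.println(strip(xml)); // prints <AbstractText>Objective. H2O</AbstractText>
    }
}
```

The stripped string can then be passed to xmlMapper.readValue(...) exactly as in the snippet above.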

@ronnoceel

ronnoceel commented May 1, 2024

In my ideal world, I would like to be able to specify something like @MixedContent, which would let the mapper know that any content in that text field should be interpreted as a string in its entirety. This is what I thought @JsonRawValue might do, but I was mistaken.

@cowtowncoder
Member

@JsonRawValue is sort of the opposite: it allows injecting pre-formatted content on serialization (and should also work for XML, although I am not 100% sure that it does) but does nothing on deserialization (reading). Since XML parsers (and JSON parsers, for that matter) rarely have any way to return un-parsed/un-decoded content, there is not really a reliable way to get the "original" content anyway, so I don't think this would ever be supported. I am also not sure it would be a good idea for most usage, even if it could be.

But one idea I have had for a while (with no solid plan to implement it) is the possibility of something like XmlNode, a subtype of JsonNode, as a binding target.
Or, alternatively, supporting use of DOM Node as a binding target?
In both cases there would be a target type into which an XML-native representation could be bound, and custom code could then process it in whatever way makes sense.
Challenging parts include the separation of concerns between databinding (where JsonNode and (de)serializers are implemented), which is format-agnostic for the most part (by design), and the streaming level (JsonParser, FromXmlParser etc.), where format-specific differences are implemented.
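
Until something like that exists in the mapper, custom code can already process mixed content through the JDK's own DOM API, outside of Jackson. The following is a rough sketch under that assumption, not a Jackson feature: the class MixedContentDom and the method abstractTexts are hypothetical names. It relies on Node.getTextContent(), which concatenates the decoded text of all descendant nodes, so text inside <i> and similar tags is kept while the tags themselves are dropped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MixedContentDom {

    // Returns the full text of every <AbstractText> element, including text nested
    // inside child elements such as <i> or <sub>. Attributes (Label, NlmCategory)
    // are not read here; this sketch only covers the text content.
    public static List<String> abstractTexts(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Conservative parser limits, since the XML comes from a remote API.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            NodeList nodes = doc.getElementsByTagName("AbstractText");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() joins all descendant text nodes in document order.
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Abstract><AbstractText><i>Objective</i>. Error was <i>P</i> &lt; .01</AbstractText></Abstract>";
        System.out.println(abstractTexts(xml).get(0)); // prints Objective. Error was P < .01
    }
}
```

Like the string-replacement workaround earlier in the thread, this loses the inline formatting, but it does not depend on knowing the set of tag names in advance.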
