How to simply parse/serialize a Map to/from XML that has keys as tag names wrapping text values. #192

Open
gladapps opened this issue Dec 18, 2023 · 5 comments

Comments

@gladapps

I'm trying to simply parse a format like this:

<metadata>
  <fieldA>this is the text value for fieldA</fieldA>
  <fieldB>this is the text value for fieldB</fieldB>
  <fieldC>
    <fieldD>
      <fieldE>this is the text value for fieldE</fieldE>
    </fieldD>
  </fieldC>
</metadata>

into a Map<String, String> with the entry keys being the tag names:

mapOf(
  "fieldA" to "this is the text value for fieldA",
  "fieldB" to "this is the text value for fieldB",
  "fieldE" to "this is the text value for fieldE"
)

I've made my own policy that overrides handleUnknownContentRecovering and just puts the keys and values into a single Map like this:
dataMap[input.name.toCName()] = input.elementContentToFragment().contentString
and returns the Map for elementIndex = 0. But I'm not sure what to do about the nested tags.

I also need to serialize such a Map.

I've tried using the existing MapEncoder, but I don't need keys and values to have their own tags. Maybe it can work if the entry name could use the key name and omit the key element, with the value collapsed. But I couldn't figure out how to get it to do that.

Any help would be greatly appreciated.

@pdvrieze
Owner

The challenge is that this is not quite valid XML. Tag names are intended to be well-defined. I would consider custom parsing the best solution (if you do this in a custom serializer you can still parse the values using serialization). However, there are other options:

  • Probably best: use a custom serializer on the container that parses the XML manually and then uses serialization for the values.
  • You could do some rewriting as in https://github.com/pdvrieze/xmlutil/blob/master/examples/DYNAMIC_TAG_NAMES.md
  • You could also use the fact that a list of Node instances will work (note that text element serialization is broken, and marking it as XmlValue also breaks).
  • Your handling of the content would work (have a look at the depth property to "deal" with nesting).

@gladapps
Author

Thank you! I've actually already started implementing the dynamic tag names approach, serializing a Map instead of a List. Serializing the Map without nesting worked straightaway, but I have not yet managed to parse the values back out with MapEntrySerializer and DynamicTagReader.
However, I'm coming to the realization that this approach is probably more work than it is worth, given that it is a minor part of the overall data model (the XML is the metadata of one object type in a vast sea of JSON). Since our structure is not quite valid XML, I'm thinking maybe I should just revert to doing dumb string building and parsing, and use expect/actual declarations for the parts that don't have pure Kotlin solutions (StringEscapeUtils.escapeXml11 in Apache Commons Text, for example).
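For the serialization half, the string-building idea can be sketched in pure Kotlin without any expect/actual split. This is only an illustration under stated assumptions: `escapeXml` and `mapToXml` are hypothetical helper names, the escaping covers just the five predefined XML entities (it does not filter characters that are illegal in XML, which StringEscapeUtils.escapeXml11 also handles), and map keys are assumed to already be legal tag names.

```kotlin
/** Hypothetical helper: replaces the five characters with predefined XML entities. */
fun escapeXml(text: String): String = buildString(text.length) {
    for (ch in text) {
        when (ch) {
            '&' -> append("&amp;")
            '<' -> append("&lt;")
            '>' -> append("&gt;")
            '"' -> append("&quot;")
            '\'' -> append("&apos;")
            else -> append(ch)
        }
    }
}

/**
 * Hypothetical helper: writes each map entry as <key>value</key> inside a root tag.
 * Keys are NOT validated, so the caller must make sure they are legal XML names.
 * The output is flat; any nesting present in the original document is not recreated.
 */
fun mapToXml(rootTag: String, entries: Map<String, String>): String = buildString {
    append('<').append(rootTag).append(">\n")
    for ((key, value) in entries) {
        append("  <").append(key).append('>')
        append(escapeXml(value))
        append("</").append(key).append(">\n")
    }
    append("</").append(rootTag).append('>')
}
```

Calling `mapToXml("metadata", dataMap)` would produce one child element per entry. Whether flat output is acceptable depends on whether consumers rely on the fieldC/fieldD wrappers.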

@pdvrieze
Owner

pdvrieze commented Dec 19, 2023

@gladapps You don't need to go to raw parsing with regexes or something similar. You can use the (separate) XML parsing support from the core library. You create your parser and then read events; when an event is a tag, you handle it: read the value (recursing if needed), then add it to your list. It is serialization that doesn't like this format (it makes too many assumptions), not the XML parser. However, parsing a list of Nodes "should" work (but it doesn't due to a bug).

@susrisha

susrisha commented Feb 4, 2024

@pdvrieze Is there an example of creating your own parser that reads events and handles tags? I would love to have a look at it.

@pdvrieze
Owner

pdvrieze commented Feb 4, 2024

@susrisha The way to go is to use the XmlStreaming object (I will be transitioning to an accessor function, xmlStreaming, due to changes in multiplatform expect/actual). This object lets you create parser and serializer instances in a platform-independent way. The "generic" variants create platform-independent implementations, so you get consistent behaviour everywhere; those can also be created directly as KtXmlReader and KtXmlWriter. Then you can use next to get the next event and nextTag to get the next tag event (note that the latter will verify that there is nothing but ignorable content in between). You can retrieve the current event as eventType, and depending on the event type you can retrieve the name, attributes, text content, etc. Have a look at the documentation: https://pdvrieze.github.io/xmlutil/core/nl.adaptivity.xmlutil/-xml-reader/index.html .

Note that you don't need the serialization part of the library for this, only core.
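
The event loop described above (advance to the next event, branch on its type, read the name or text) might look roughly like the sketch below. To stay dependency-free it uses the JDK's StAX XMLStreamReader as a stand-in, not xmlutil's XmlReader, so it is JVM-only and the calls shown are StAX calls. The xmlutil equivalents should be taken from the linked documentation. `parseLeafElements` is a hypothetical name, and the "leaf" rule (an element with no child elements) is an assumption about what the original question wants.

```kotlin
import java.io.StringReader
import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamConstants

/**
 * Hypothetical helper: maps the tag name of every leaf element (an element
 * without child elements) to its text content, ignoring wrapper elements.
 * If a tag name occurs more than once, the last occurrence wins.
 */
fun parseLeafElements(xml: String): Map<String, String> {
    val factory = XMLInputFactory.newInstance()
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, false) // no DTDs needed here
    val reader = factory.createXMLStreamReader(StringReader(xml))
    val result = LinkedHashMap<String, String>()
    val text = StringBuilder()
    var currentIsLeaf = false
    try {
        while (reader.hasNext()) {
            when (reader.next()) {
                XMLStreamConstants.START_ELEMENT -> {
                    // A new element starts: whatever was open is now a wrapper.
                    text.setLength(0)
                    currentIsLeaf = true
                }
                XMLStreamConstants.CHARACTERS, XMLStreamConstants.CDATA ->
                    text.append(reader.text) // text may arrive in several chunks
                XMLStreamConstants.END_ELEMENT -> {
                    // Only an element closed with no child in between is a leaf.
                    if (currentIsLeaf) result[reader.localName] = text.toString()
                    currentIsLeaf = false
                    text.setLength(0)
                }
            }
        }
    } finally {
        reader.close()
    }
    return result
}
```

Applied to the metadata document from the original question, this would keep fieldA, fieldB and fieldE and skip the fieldC and fieldD wrappers. An empty root element would be recorded as a leaf with an empty value, which may or may not be desired.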
