Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement removeWhitespaceElements on XMLDecoder #222

Merged
merged 12 commits into from Jul 30, 2021

Conversation

wooj2
Copy link
Contributor

@wooj2 wooj2 commented Jul 27, 2021

This updates the XMLStackParser to accept a parameter called removeWhitespaceElements.

The purpose of the XMLStackParser is to call the XML parser and create a tree of XMLCoderElement representing the structure of the parsed XML.

Assuming that XMLStackParser has trimValueWhitespaces set to false, when attempting to parse a nested data structure like the following:

<SomeType>
    <nestedStringList>
        <member>
            <member>foo</member>
            <member>bar</member>
        </member>
        <member>
            <member>baz</member>
            <member>qux</member>
        </member>
    </nestedStringList>
</SomeType>

... then there will multiple XMLCoderElements in the tree which will have elements set to elements that are either:
a. Purely whitespaced elements or
b. The child elements

These purely whitespaced elements are problematic for users who are implementing custom Decoder logic, as they are interpreted as regular child elements. Therefore, setting removeWhitespaceElements to true while trimValueWhitespaces is set to false, will remove these whitespace elements during the construction of the XMLCoderElement tree.

An in-depth analysis of the original problem can be found here.

For historical purposes, a separate approach was implemented. It uses a similar algorithm in a different part of the code. #221

@wooj2 wooj2 changed the title chore: filter out pure whitespace with nested elements version 2 chore: filter out pure whitespace with nested elements Jul 29, 2021
@wooj2 wooj2 marked this pull request as ready for review July 29, 2021 01:19
@wooj2 wooj2 changed the title chore: filter out pure whitespace with nested elements Add ability to filter out pure whitespace elements w/ nested structures Jul 29, 2021
wooj2 and others added 7 commits July 29, 2021 09:23
Co-authored-by: Max Desiatov <max@desiatov.com>
Co-authored-by: Max Desiatov <max@desiatov.com>
Co-authored-by: Max Desiatov <max@desiatov.com>
Co-authored-by: Max Desiatov <max@desiatov.com>
Co-authored-by: Max Desiatov <max@desiatov.com>
Co-authored-by: Max Desiatov <max@desiatov.com>
@codecov
Copy link

codecov bot commented Jul 29, 2021

Codecov Report

Merging #222 (02da507) into main (a469f60) will increase coverage by 0.37%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #222      +/-   ##
==========================================
+ Coverage   73.50%   73.87%   +0.37%     
==========================================
  Files          46       46              
  Lines        2404     2431      +27     
==========================================
+ Hits         1767     1796      +29     
+ Misses        637      635       -2     
Impacted Files Coverage Δ
...urces/XMLCoder/Auxiliaries/String+Extensions.swift 100.00% <100.00%> (ø)
Sources/XMLCoder/Auxiliaries/XMLCoderElement.swift 96.39% <100.00%> (+0.04%) ⬆️
Sources/XMLCoder/Auxiliaries/XMLStackParser.swift 93.93% <100.00%> (+1.16%) ⬆️
Sources/XMLCoder/Decoder/XMLDecoder.swift 78.72% <100.00%> (+2.08%) ⬆️
...es/XMLCoder/Decoder/XMLDecoderImplementation.swift 67.74% <0.00%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a469f60...02da507. Read the comment docs.

Copy link
Collaborator

@MaxDesiatov MaxDesiatov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@MaxDesiatov
Copy link
Collaborator

MaxDesiatov commented Jul 29, 2021

@wooj2 one last request, I promise 🙂

Can you add a subsection to the "Advanced features" section in README.md about this new flag? You can state it's available starting with the 0.13.0 release, which I'm going to tag after this is merged.

Just an example of your use case and a short description of the feature, similar to other subsections that describe advanced features. Thank you!

@MaxDesiatov MaxDesiatov linked an issue Jul 29, 2021 that may be closed by this pull request
@MaxDesiatov MaxDesiatov changed the title Add ability to filter out pure whitespace elements w/ nested structures Implement trimValueWhitespaces on XMLDecoder Jul 29, 2021
@wooj2 wooj2 changed the title Implement trimValueWhitespaces on XMLDecoder Implement removeWhitespaceElements on XMLDecoder Jul 29, 2021
@wooj2
Copy link
Contributor Author

wooj2 commented Jul 29, 2021

@wooj2 one last request, I promise 🙂

Can you add a subsection to the "Advanced features" section in README.md about this new flag? You can state it's available starting with the 0.13.0 release, which I'm going to tag after this is merged.

Just an example of your use case and a short description of the feature, similar to other subsections that describe advanced features. Thank you!

No prob at all! Just updated it. Thanks again for all the support

README.md Outdated Show resolved Hide resolved
@MaxDesiatov
Copy link
Collaborator

@kneekey23 @wooj2 perfect, thanks for uncovering this edge case! Your contribution is very appreciated 👏

@MaxDesiatov MaxDesiatov merged commit 487ece5 into CoreOffice:main Jul 30, 2021
spotlightishere pushed a commit to WiiLink24/XMLCoder that referenced this pull request Aug 6, 2021
This updates the `XMLStackParser` to accept a parameter called `removeWhitespaceElements`.

The purpose of the `XMLStackParser` is to call the XML parser and create a tree of `XMLCoderElement` representing the structure of the parsed XML.

Assuming that XMLStackParser has `trimValueWhitespaces` set to `false`, when attempting to parse a nested data structure like the following:
```xml
<SomeType>
    <nestedStringList>
        <member>
            <member>foo</member>
            <member>bar</member>
        </member>
        <member>
            <member>baz</member>
            <member>qux</member>
        </member>
    </nestedStringList>
</SomeType>
```

... then there will multiple `XMLCoderElement`s in the tree which will have `elements` set to elements that are either:
a.  Purely whitespaced elements or
b.  The child elements

These purely whitespaced elements are problematic for users who are implementing custom `Decoder` logic, as they are interpreted as regular child elements.  Therefore, setting `removeWhitespaceElements` to `true` while `trimValueWhitespaces` is set to `false`, will remove these whitespace elements during the construction of the `XMLCoderElement` tree.

An in-depth analysis of the original problem can be found [here](CoreOffice#219).

For historical purposes, a separate approach was implemented.  It uses a similar algorithm in a different part of the code. CoreOffice#221
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Decoding special whitespace characters
3 participants