Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(#887): complete strictMode for JSONArray #888

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

rikkarth
Copy link
Contributor

@rikkarth rikkarth commented Apr 14, 2024

Description

If in strict mode, and having reached the end of the JSONArray construction life-cycle with no errors, we then perform an input validation.
image

In this validation we explicitly check if (1.) the last bracket was found and (2.) if more characters exist after that.

If the next character is not the end of the array (0), then it means we have more characters after the final closing square bracket, so we throw an error message informing the array is not compliant and we give back the complete input to the user for easy debugging.

image

To ensure this logic doesn't affect 2D arrays I also created a new test for it.
image

From the examples provided in the issue created: #887

Additional Note

Since the logic is fully encapsulated inside validateInput() which is only run if strictMode is true, it is guaranteed that previous implementations are not affected when strictMode is false and that this implementation is modular and can be maintained, improved or removed by simply removing validateInput().

Test cases added

image

image

@rikkarth rikkarth marked this pull request as ready for review April 14, 2024 22:17
@stleary
Copy link
Owner

stleary commented Apr 19, 2024

@rikkarth Sorry for taking so long to complete the review. Comments added.

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 21, 2024

  1. Fixed issue with double array breaking JSONTokener.nextValue() which now forwards jsonParserConfiguration object when creating a new JSONArray.
    image

  2. Changed input validation in JSONArray, removing collection of complete input in favor of leveraging x.syntaxError().
    image

image

@stleary
Copy link
Owner

stleary commented Apr 21, 2024

@rikkarth Please merge from master to pick up the latest changes:
image

@stleary
Copy link
Owner

stleary commented Apr 23, 2024

@rikkarth Looks like there is a regression when parsing array elements with non-string simple values. Please add this unit test:

    @Test
    public void shouldHandleNumericArray() {
        String expected = "[10]";
        JSONArray jsonArray = new JSONArray(expected, new JSONParserConfiguration().withStrictMode(true));
        assertEquals(expected, jsonArray.toString());
    }

This test passes when strict mode is false, but fails with this output when strict mode is true:
image

Similar problems occur in strict mode when the array elements are true/false, and null.

@rikkarth
Copy link
Contributor Author

Also boolean values are producing the same behaviour, I will fix it.
image

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 23, 2024

I changed how I validated numbers, boolean and string values using JSONObject.stringToValue existing logic, I've also added what I believe a few, more robust, test cases to prevent more regressions.

I hope the implementation looks cleaner now.

Sorry I didn't catch this earlier.

Correction - regression when parsing array elements with non-string simple values.

    /**
     * Parses unquoted text from the JSON input. This could be the values true, false, or null, or it can be a number.
     * Non-standard forms are also accepted. Characters are accumulated until the end of the text or a formatting
     * character is reached.
     *
     * @param c The starting character.
     * @return The parsed object.
     * @throws JSONException If the parsed string is empty.
     */
    private Object parsedUnquotedText(char c, boolean strictMode) {
        StringBuilder sb = new StringBuilder();
        while (c >= ' ' && ",:]}/\\\"[{;=#".indexOf(c) < 0) {
            sb.append(c);
            c = this.next();
        }
        if (!this.eof) {
            this.back();
        }

        String string = sb.toString().trim();

        if (string.isEmpty()) {
            throw this.syntaxError("Missing value");
        }

        Object stringToValue = JSONObject.stringToValue(string);

        return strictMode ? getValidNumberOrBooleanFromObject(stringToValue) : stringToValue;
    }

    private Object getValidNumberOrBooleanFromObject(Object value) {
        if (value instanceof Number || value instanceof Boolean) {
            return value;
        }

        throw new JSONException(String.format("Value is not surrounded by quotes: %s", value));
    }

@stleary
Copy link
Owner

stleary commented Apr 23, 2024

null without quotes is a valid JSON value, just like true and false (see https://json.org). Strict mode should allow this value when it appears without quotes. Please add this test case for coverage:

    @Test
    public void allowNullInStrictMode() {
        String expected = "[null]";
        JSONArray jsonArray = new JSONArray(expected, new JSONParserConfiguration().withStrictMode(true));
        assertEquals(expected, jsonArray.toString());
    }

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 23, 2024

Done as requested. Now it was simple as adding it to the this method:

    private Object getValidNumberBooleanOrNullFromObject(Object value) {
        if (value instanceof Number || value instanceof Boolean || value.equals(JSONObject.NULL)) {
            return value;
        }

        throw new JSONException(String.format("Value is not surrounded by quotes: %s", value));
    }

@stleary
Copy link
Owner

stleary commented Apr 25, 2024

@rikkarth, This string fails in strict mode because there are chars after the end of an empty array:
String s = "[[]]";
This string is allowed in strict mode because the chars at the end are not validated:
String s = "[1],";
You might be able to address the first problem by making the parser code block starting at line 160 exactly the same as the code block starting at line 134, as I mentioned sometime back. But it will probably still need to be reworked to handle values after the closing bracket, like a closing JSONObject brace. The second problem means that the parser is still not reliably detecting invalid trailing chars. If the current implementation is unable to do that, it might be necessary to revisit the design for this feature.

@rikkarth
Copy link
Contributor Author

@rikkarth, This string fails in strict mode because there are chars after the end of an empty array: String s = "[[]]"; This string is allowed in strict mode because the chars at the end are not validated: String s = "[1],"; You might be able to address the first problem by making the parser code block starting at line 160 exactly the same as the code block starting at line 134, as I mentioned sometime back. But it will probably still need to be reworked to handle values after the closing bracket, like a closing JSONObject brace. The second problem means that the parser is still not reliably detecting invalid trailing chars. If the current implementation is unable to do that, it might be necessary to revisit the design for this feature.

Yes you are absolutely right. I will come back to you with a more solid, robust solution and design.

Thank you for your patience.

@rikkarth
Copy link
Contributor Author

Mr @stleary,

After a few days of serious research of the codebase and work I believe I came up to a nice solution.

Ideally I'd like to present you with the changes I performed over a call, but I'll try to do my best through here.

Will present it in the next comment.

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 27, 2024

Switch from "endless for loop" into predictive recursive call

I transformed the "endless loop" into a more predictive controlled recursive call.

This was highly necessary for this feature since the implementation of a stable, working strict mode required touching a lot of key points in the JSONArray and JSONTokener validation process.

before and after of JSONArray main constructor
image

I did my best to maintain the codebase without regressions, during this process it was imperative that I made some "naming" changes in order to achieve my vision for this implementation.

example of variable naming change from nextChar to cursor
image

Strict Mode validations in JSONArray were improved. Two new key implementations were added to the JSONTokener:

  1. arrayLevel
  2. smallCharMemory.

Small Char Memory and ArrayLevel

new fields in JSONTokener
image

Small Char Memory

Small Char Memory is a small List of Characters which is designed to only contain 2 characters, this was extremely necessary because I required a wider "predictive" tool to perform certain validations and achieve a stronger and more stable "strict mode".

Why List?

I decided to use the List implementation because it provides the exact API I needed to enforce the smallest memory footprint in this implementation without losing the tools the List implementation provides.

By adopting the List we can also make this "memory" larger in the future, meaning we can broaden the "predictive" power of this tool with very little change.

image

Why Character and not primitive char ?

Because char holds no reference since it's primitive.

I had to make sure I was comparing the same reference of a character without reinventing the wheel in order to update the smallCharMemory with the right reference (not character).

image

Array Level

Knowing if I was at "level 0" was crucial to understand. I had to make sure if I was evaluating a (1.) double(or deeper) array case or (2.) a single/base level array case.

Every time nextValue() is called the arrayLevel goes up. We can replace this with AtomicInteger in the future if there is a need, but at the moment I didn't find a use-case that warranted using it.

image

The array level can only be accessed with JSONTokener.getArraylevel
image

Array Level Use Case

This is used to find rogue characters after an array at base level ( arrayLevel == 0 ).

Example_1: [],
Example_2: []asd
Example_3: []]

image

PROS & CONS

PROS

  • Increased performance due to recursive nature of parseTokener, higher predictability and control over JSONArray construction.
  • Cleaner implementation, some steps defined more clearly.
  • New Tools: arrayLevel and smallCharMemory adding more predict ability to the implementation and future use-cases.
  • Improved Strict Mode Implementation
  • More Unit Tests

CONS

  • Smaller but higher memory footprint due to the nature of smallCharMemory. (Around 4 bytes for two Characters + ArrayList overhead)
  • Small Adjustments to existing codebase, could lead to potential regression.

Future Recommendations

Since we are in "Maintenance Mode" it's important to keep things maintainable, so I'd like to propose some recommendations for the future. I can create the tickets for each point to keep the PRs focused and small for easier review. I only made extremely necessary changes in this PR with hopes that I can apply more "cosmetic" changes in future PRs with the goal to improve the maintainability of this project.

JSONArray refactor and cleanup

Semantic Refactor

Semantic refactor in JSONArray, which doesn't aim to change any of the existing functionality but only to clean the implementation and make it more maintainable.

Decrease Mutability / Increase Immutability && Decrease Side-Effects

Increase JSONArray internal immutability by removing all unjustified mutability from the class.
image

Decrease Cognitive Complexity

JSONArray Helper class to decrease the cognitive complexity of the implementation.

Increase Performance and Memory Footprint

Increase the performance of JSONArray through increasing the eligibility of reference garbage collection by improving the scope of certain references.

@stleary
Copy link
Owner

stleary commented Apr 28, 2024

@rikkarth Please see #891. Let's pause this review until the unit tests have been updated to give a better view of strict mode behavior.

@stleary
Copy link
Owner

stleary commented Apr 28, 2024

Branch strict-mode-unit-tests contains all of your code from this branch. In the JSONParserConfiguration default constructor, I forced strictMode to be true, and then modified the broken unit tests or replaced @test with @ignore.

  • Most of the broken tests were due to inadvertently forgetting to use double quotes in otherwise valid JSON docs. You can ignore those changes.
  • A few tests were set to @ignore because they intentionally used unquoted text. They will be restored later, and are not a concern.
  • There were a few grey areas, e.g. when an invalid numeric value is supposed to be converted to string, but instead throws an exception. It could be argued either way - should strictMode allow the conversion or not? No decision has been made yet, no action needed at this time.
  • There is a regression in single quote handling within double-quote delimited strings that must be fixed in strictMode.
  • There are also several errors in JSONTokenerTest that suggest a regression.

You can fix these issues on your current branch, or you can take a new fork off of the strict-mode-unit-tests branch.

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 28, 2024

I will review all this points and will come back to you with new feedback.

At first glance, this should be relatively easy to achieve/fix.

Thank you for this analysis.

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 28, 2024

@stleary

Update:
I fixed all the test cases in JSONTokener apart from the last which I require more feedback:

image

The Bug

The issue was, I didn't perform the most simple of validations which checks for end of file after finding ']' at 2nd, which I included now. After doing it, all tests pass accordingly.
image

As for this test case

@Test
public void testValid() {
    // ...
    checkValid("1 2", String.class);
}

throws JSONException which is indeed valid, since you manually set strictMode true in JSONParserConfiguration.

The test case value is not inside an array;
The test case value is not inside quotes;
The test case value is not valid according to strictMode guidelines (or is it?)

What should I do with this case?

org.json.JSONException: Value '1 2' is not surrounded by quotes at 3 [character 4 line 1]

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 28, 2024

This test is failing because the expected testCase is actually malformed, strictMode working accordingly. Not a regression.
Test name: toJSONObjectToJSONArray
image
image

image

inline
image

I'm not sure if you'd like me to keep the tests like they currently are (even if malformed).

In the mean time I will isolate the tests as requested. I will correct and externalize the test cases into JSON files which I believe will be more compliant since modern IDEs check for syntax errors for us in cases like this.

- JSONArray now evaluates EOF accordingly for empty Array inputs.
- JSONTokener fixed indentation
- externalized two JSONMLTest cases
@rikkarth
Copy link
Contributor Author

Branch strict-mode-unit-tests contains all of your code from this branch. In the JSONParserConfiguration default constructor, I forced strictMode to be true, and then modified the broken unit tests or replaced @test with @ignore.

All of the points you've highlighted were fixed. There are a few cases (e.g checkValid("1 2", String.class);) where I don't exactly know what I should do there, and will require some more feedback on this.

I removed the TODO's, comments, etc that you provided in strict-mode-unit-tests as I fixed each issue.

I left some of the "important" TODO's the one's you wrote "TBD later" or "will revert later" so you could easily navigate back to them later.

There is a regression in single quote handling within double-quote delimited strings that must be fixed in strictMode.

Fixed.

There are also several errors in JSONTokenerTest that suggest a regression.

Fixed.

You can fix these issues on your current branch, or you can take a new fork off of the strict-mode-unit-tests branch.

I merged them into this branch to keep a linear history of what was affected in this PR.

@rikkarth
Copy link
Contributor Author

I added this TODO's because I strongly believe the test cases here are super bloated and could be hugely simplified.

I will make my case in a future PR, leaving this here for reference.
image

@stleary
Copy link
Owner

stleary commented Apr 29, 2024

@rikkarth Nice work.
The 1 2 test result signals a regression - in this case, a change in the behavior of a public API, which is always a concern. Might be good to see if the existing behavior can be retained.
Will review the code, and add some comments.

@rikkarth
Copy link
Contributor Author

rikkarth commented Apr 29, 2024

@rikkarth Nice work. The 1 2 test result signals a regression - in this case, a change in the behavior of a public API, which is always a concern. Might be good to see if the existing behavior can be retained. Will review the code, and add some comments.

The test for this use-case passes in strictMode=false just not in strictMode=true which follows JSON rules.

The tokener is trying to understand if the value is enclosed within an array or object.
image

And then the tokener in strictMode sees this value is not a Number, null or boolean ... it's a string without quotes.
image

The question is: Why should this test pass in strictMode? The only answer I could get was - it shouldn't.

The return when strictMode=false.

checkValid("1 2", String.class); //strictMode false

output

1 2

@rikkarth
Copy link
Contributor Author

rikkarth commented May 2, 2024

Could the label for this PR be updated? It's no longer pending redesign but pending review.

Thank you.

@stleary
Copy link
Owner

stleary commented May 2, 2024

Could the label for this PR be updated? It's no longer pending redesign but pending review.

I think the label still applies. The goal of issue #887 was just to catch trailing chars at the end of the JSON doc. There might be a simpler solution that hasn't been found yet. For example, adding properties to the tokener does not seem like the right direction, or the way that JSONArray() was rewritten.

@rikkarth
Copy link
Contributor Author

rikkarth commented May 2, 2024

Fair enough.

Did you have time to read my previous comment?

It is the last point I have to complete the points you've defined previously.

I can make this last test pass, I'm just wondering why should this test pass in strict mode?

Do you have an opinion on it?

@stleary
Copy link
Owner

stleary commented May 2, 2024

Did you have time to read my previous comment?
It is the last point I have to complete the points you've defined previously.
I can make this last test pass, I'm just wondering why should this test pass in strict mode?
Do you have an opinion on it?

Do you mean the "1 2" validity test? Do you think the result difference is due to code changes to enforce quote-delimited strings, or to detecting trailing chars? In the former case, no additional changes are needed. In the latter case, it should be handled by the redesign. In either case, you don't need to make it work now.

@rikkarth
Copy link
Contributor Author

rikkarth commented May 3, 2024

Did you have time to read my previous comment?
It is the last point I have to complete the points you've defined previously.
I can make this last test pass, I'm just wondering why should this test pass in strict mode?
Do you have an opinion on it?

Do you mean the "1 2" validity test? Do you think the result difference is due to code changes to enforce quote-delimited strings, or to detecting trailing chars? In the former case, no additional changes are needed. In the latter case, it should be handled by the redesign. In either case, you don't need to make it work now.

Yes, the error is exactly that. Quote-delimited strings, which is coherent with the design of strictMode.

If you change the test and put quotes around the string, while strictMode is true, then it passes the test.

Please check screenshots bellow.

image

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants