Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add geojson parsing support #655

Merged
merged 2 commits into from Mar 3, 2022
Merged

Conversation

lognaturel
Copy link
Member

@lognaturel lognaturel commented Jan 25, 2022

Work towards getodk/collect#4943

This PR adds support for a geojson external secondary instance with:

  • a .geojson extension
  • a FeatureCollection at the top level
  • all features with Point geometry

If the file is parsed successfully, then an external secondary instance is available for use by clients. At that point there's no trace of it having been geojson previously. This matches the XML and CSV external instance implementations.

What has been done to verify that this works as intended?

Ran it in Collect!
Screenshot_1643130113

Why is this the best possible solution? Were any other approaches considered?

First I looked to Collect to see how json is parsed there. The org.json library is imported so I thought that's what was being used but it's not (getodk/collect#4997). Once I found that out I looked at gson or jackson for parsing. I went with Jackson because it's slightly more broadly used and Gson is in maintenance mode. Some of the folks who worked on Gson are working on moshi at Square but I'd rather go with the venerable Jackson for now.

I then had to pick a parsing strategy. I decided to stream then using data binding for each feature to reduce memory usage (compared to parsing the whole thing in memory). This matches what the CSV parser does.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

This is isolated. The only risk to existing behavior is that I messed up the conditional that chooses which parser to use.

Do we need any specific form for testing your changes? If so, please attach one.

https://test.getodk.cloud/#/projects/149/forms/geojson

geojson.zip

Does this change require updates to documentation? If so, please file an issue here and include the link below.

getodk/docs#1423

@lognaturel lognaturel marked this pull request as draft January 25, 2022 23:51
String path = getPath(instanceSrc);
return instanceSrc.contains("file-csv") ?
CsvExternalInstance.parse(instanceId, path) : XmlExternalInstance.parse(instanceId, path);
return instanceSrc.contains("file-csv") ? CsvExternalInstance.parse(instanceId, path)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered pulling this out into a more conventional factory so it could be tested but I don't see a big benefit given how everything else is structured at the moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem here is that it doesn't look like the new case is covered by any tests. Do we maybe want an outside-in/black-box/smoke test level test for this feature to check everything is in place?

@lognaturel lognaturel force-pushed the geojson-jackson branch 3 times, most recently from 734cb1f to 5b34936 Compare January 31, 2022 17:36
@lognaturel lognaturel marked this pull request as ready for review January 31, 2022 17:41
@lognaturel
Copy link
Member Author

lognaturel commented Jan 31, 2022

Apologies for all the force pushes! @seadowg we'd talked about leaving the node-based parsing but I looked at the CSV implementation and it streams so I changed my mind. It seemed like we'd want to go for lower memory usage sooner than later no matter what.

Copy link
Member

@seadowg seadowg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good to me after an initial scan through. There are a few pieces of testing that could be improved but the overall structure is nice and simple and makes sense!

build.gradle Show resolved Hide resolved
String path = getPath(instanceSrc);
return instanceSrc.contains("file-csv") ?
CsvExternalInstance.parse(instanceId, path) : XmlExternalInstance.parse(instanceId, path);
return instanceSrc.contains("file-csv") ? CsvExternalInstance.parse(instanceId, path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem here is that it doesn't look like the new case is covered by any tests. Do we maybe want an outside-in/black-box/smoke test level test for this feature to check everything is in place?

}
}

private static final String GEOMETRY_COLLECTION_GEOJSON = "{\n" +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the JSON here is difficult enough to read that we should maybe pull out a fixture file or use a JSON framework to put it together inline.

Copy link
Member Author

@lognaturel lognaturel Mar 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled out files since that addresses your comment about the protected method too. I used the constant because it was easier for me to jump to as I was writing the tests but I also see your point about the visual noise. Too bad we can't use Java 15 """.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:neckbeard: Kotlin has multiline strings with support all the way back to Java 6 (https://kotlinlang.org/docs/coding-conventions.html#strings).


public String getOdkCoordinates() {
if (!(type.equals("Point") && coordinates.size() == 2)) {
throw new UnsupportedOperationException("Only Points are currently supported");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not going to be a problem for long, but maybe we should just skip unsupported elements rather than exploding? That would let us test in Collect with .geojson files without worrying about it crashing the app.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we likely will have a release with only points. In that case it feels better to fail for reasons similar to the "fast external itemsets" bug you identified -- it's not great if a subset of user features are silently filtered out. I don't feel strongly about it but that was my reasoning for exploding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Let's use a checked exception then, so there's not a risk of Collect missing this and it resulting in a crash.

@lognaturel lognaturel requested a review from seadowg March 2, 2022 22:30
seadowg
seadowg previously approved these changes Mar 3, 2022
Copy link
Member

@seadowg seadowg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! As I commented inline, I think we should add a checked exception to deal with the unsupported feature type case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants