Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strategy for dealing with invalid data #1481

Open
stof opened this issue Jun 29, 2023 · 10 comments
Open

Strategy for dealing with invalid data #1481

stof opened this issue Jun 29, 2023 · 10 comments

Comments

@stof
Copy link
Member

stof commented Jun 29, 2023

Currently, the way invalid data is handled in value objects and results is quite messy:

  • getters are typed with the expected type of the value
  • when creating the object, missing required fields are initialized to null, which will then throw a TypeError when calling the getter
  • for required timestamp fields, passing a wrong date string will assign false to the property, which also leads to a TypeError when calling the getter
  • for optional timestamp fields, passing a wrong date string will assign null to the property, which means it removes the date
  • for integer/long fields transmitted in headers (only happening in the S3 service among the services we support currently), an invalid integer string assigns false to the property (which also leads to a TypeError when calling the getter)

Additionally, for JSON response parsers, we use filter_var(..., FILTER_VALIDATE_BOOLEAN), which means we first cast the boolean value from the json to string before validating it again. This is quite inefficient way to ensure we have a boolean in the property.

This inconsistency between the expected property type and the assigned value gets caught by static analysis tools in #1467 once I actually define the expected property type in phpdoc.

Note: input objects are already clean in that regard. They define all fields as nullable in a consistent way (getters are also nullable) and validate things when creating the request body.

What should be the strategy to handle those values ?

@stof
Copy link
Member Author

stof commented Jun 29, 2023

Here is my proposal:

  • for json parsers, use a (bool) cast instead of FILTER_VALIDATE_BOOLEAN. This still provides safety in case someone passes an unexpected value but makes it a no-op when we receive the AWS response (which is expected to return a value of the right type)
  • for integer and long fields in headers, use a (int) cast (as done when getting an integer field in XML)
  • add an helper in Core to convert date strings to DateTimeImmutable (with one static method per supported timestamp format of AWS) which would throw an exception in case of invalid value instead of returning false and use it in generated code (both for optional and required fields)
  • for missing required fields, throw an exception instead of assigning null.

what do you think about this proposal ?

@jderusse
Copy link
Member

when creating the object, missing required fields are initialized to null, which will then throw a TypeError when calling the getter

That's a wrong assumption. Let's take the CreateBucketRequest::$bucket property.
This property is required AND is type hinted string|null. The getter is type hinted ?string. That means, users are allowed to create an instance of CreateBucketRequest without that property and they can call getBucket() without error.
But, when trying to use the request, it will throw a clear exception throw new InvalidArgument(sprintf('Missing parameter "Bucket" for "%s". The value cannot be null.', __CLASS__));.

for required timestamp fields, passing a wrong date string will assign false to the property, which also leads to a TypeError when calling the getter
for optional timestamp fields, passing a wrong date string will assign null to the property, which means it removes the date
for integer/long fields transmitted in headers (only happening in the S3 service among the services we support currently), an invalid integer string assigns false to the property (which also leads to a TypeError when calling the getter)

The current code uses DateTimeImmutable::createFromFormat in responses only.

Unless I missed something, the cases you describe in this issue might occur only when the server returns invalid data.

When we started working on AsyncAws we decided to trust the provider.
We presume that the response will match the contract therefore we don't need (we don't want) adding extra checks / assertions everywhere. Because that would be a waste of resources/CPU in nearly 100% of cases. And, as you mention, it will fail later when the user calls the type hinted getter

@stof
Copy link
Member Author

stof commented Jun 29, 2023

That's a wrong assumption. Let's take the CreateBucketRequest::$bucket property.

Please read the ticket again. I'm explicitly saying that this applies to ValueObject and Result classes. CreateBucketRequest is an Input class.

@stof
Copy link
Member Author

stof commented Jun 29, 2023

When we started working on AsyncAws we decided to trust the provider.
We presume that the response will match the contract therefore we don't need (we don't want) adding extra checks / assertions everywhere. Because that would be a waste of resources/CPU in nearly 100% of cases. And, as you mention, it will fail later when the user calls the type hinted getter

Well, as explained here, you only trust it partially (as you still try to deal with missing array keys for invalid fields.
And the inconsistent state of the code in that regard makes the CI fail once static analysis tools know the type of the property (which I'm trying to add in #1467 and would be even worse if a future version uses typed properties when dropping support for PHP < 7.4).

@jderusse
Copy link
Member

you only trust it partially (as you still try to deal with missing array keys for invalid fields.

Do you have an example? As far as I can see

  • When the field is required, we don't check if the key exists. For example CreateScheduleGroupOutput::$scheduleGroupArn
  • When the field is optional, the getter allows returning null. For example ListSchedulesOutput::$nextToken

For ValueObject I don't remember which strategy is used given object might be used either in request (user are allowed to temporary enter null value) and response (we trust the provider and now the data will be populated with the correct value)

@stof
Copy link
Member Author

stof commented Jun 29, 2023

For the existence check of required fields, this is indeed something done in ValueObject only, not in Result.

user are allowed to temporary enter null value

this is not true actually. Value objects are immutable. They don't have any setter. So if they fill in invalid data, they have no opportunity to fix that later.

@jderusse
Copy link
Member

indeed.

When the object is used in INPUT:

  • move checks in the constructor to prevent users from building invalid objects (drawback: perf overhead when class is also used in response).
  • provider setter and allow returning null in getter (drawback: user don't know easily if a property is required by checking the type)

@stof
Copy link
Member Author

stof commented Jun 29, 2023

@jderusse the performance overhead will not be bigger than the current ?? null which also checks for undefined keys today.

about making the getter nullable, this would be a huge pain for usage in response (as static analysis tools will tell you that the value is possibly null and you need to handle it). and there are objects used in both input and response.

So I'm in favor of adding an early check (I can do the PR).

@jderusse
Copy link
Member

So I'm in favor of adding an early check (I can do the PR).

works for me. Thank.

Would it fix the issue with int/long cast and timestamp you talked about? or it's all about trusting the provider's response?

@stof
Copy link
Member Author

stof commented Jun 29, 2023

The issue for int would be fixed by using a cast instead of trying to validate the provider response, as done elsewhere.

for timestamp, I will check what is the easiest way to handle those false values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants