Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CERT-TEST-FAILURE] Test_TC_ACL_2_3/4 are very (too?) project-chip implementation specific #33487

Open
Apollon77 opened this issue May 16, 2024 · 13 comments
Labels
bug Something isn't working cert blocker needs triage spec Mismatch between spec and implementation

Comments

@Apollon77
Copy link
Contributor

Apollon77 commented May 16, 2024

The test cases for ACL (stareting with ACL2_3 and 2_4, but also other, are build very implementation specific to "how project chip" has implemented functionalities like chunked list writes and also ACL list even processing and sending.
The tests currently relying on the facts that:

  • writes are always process in exactly the chunks that chiptool sends in and each chunked array change is processed for it's alone inluding validation and event sending for the whole ACL list (like also resetting the event sends all former entries as deleted). The counter example would be a logic that collects all list chunks from one write request and processes them as one list change - but this would lead to completely different events in several cases
  • acl list change processing is validated implementation specific as chip has implemented it. (see [CERT-TEST-FAILURE] Test_TC_ACL_2_3/4 are very (too?) project-chip implementation specific #33487 (comment)) ... so an other implementation, e.g. processing the deletions also "in increasing order" and not decreasing would also lead to a different event order
  • even if chip has support for "unchunked lists" in ACl (which would also behave differently as the chunked variant regarding sent events and such) this functionality is never really tested with these tests because chiptool always send list in a chunked way (which formally is against the specs recommendation - yes only "SHOULD", but would still be better regarding the amount of data to be transported over the network especially also with missing tag compression ;-) ...)

This implementation specific tests makes it difficult for alternative implementations, unless all these details would be part of the specification and so become requirements.
Sure with the right argumentation vendors using alternative implementations could request exceptions from tests, but it adds a difficulty for alternative implementations of the standard.

So I will leave this open as a "Cert blocker" here and I'm available to chat about the topic.

If I missed specification details where some of the topics are defined then please provide details.

Original issue context

Feature Area

Other

Test Case

Test_TC_ACL_2_3/4

Reproduction steps

Test ACL 2.3:
Formally the last successful write is in "step 9" of the test ... but Step 18 expects another value.

It seems Step 18 expects the "first of the second entries" from step 17 ... which gave an ConstraintError because tried to write 2 entries while only one is allowed.

I assume that this is because of the chunked write handling added in https://github.com/project-chip/connectedhomeip/blob/master/src/app/clusters/access-control-server/access-control-server.cpp#L353 which is very chip specific because formally there is no requiremenst that I know that any array needs to be transported in a chunked way ... and also not that writes needs to be handled chunked at all.

In fat it is really unexpected in case of acl (and especially in case of ACL where consistence should be the most important topic) that a write command like in step 17 returns an error, BUT had an effect by writing the first of the two chunked parts ...

It would be great to get some background information for this, thank you!

In general the test will also have issues as soon as a controller implementation is not sending the list "always chunked" as chip does, because the logic in the access-control-server.cpp would return an Constraint error without saving any value when it is >1 entry in the array. So this behavior is not consistent.

The same topic seems to be contained in Test 2.4 - here for ACL entries themself in Step 29/30
https://github.com/project-chip/connectedhomeip/blob/master/src/app/tests/suites/certification/Test_TC_ACL_2_4.yaml#L1068

Bug prevalence

always

GitHub hash of the SDK that was being used

master

Platform

core

Anything else?

PS: matter.js would be such an controller which is not automatically chunks every value. We try to get the array unchunked into the data report and only if this does not fit we fallback to chunk the array.

Follow up question aout behavior because the tests only do two entries.
What should actually happen when someone wrtes 3 entries - first valid, second add invalid, third add valid ... Then I would assume with the current logic that the second entry in the target list will have the value from the third chunk, right?
Whats the reponse? Still ConstraintError because the second entry had that? Or should the third entry "addition" not be processed because the second failed? Would be cool to get some insights here

@Apollon77 Apollon77 added bug Something isn't working cert blocker needs triage labels May 16, 2024
@cecille
Copy link
Contributor

cecille commented May 16, 2024

@mlepage-google @hasty

@Apollon77 Apollon77 changed the title [CERT-TEST-FAILURE] Test_TC_ACL_2_3 implicitely requires chip chunked array writes [CERT-TEST-FAILURE] Test_TC_ACL_2_3/4 implicitely requires chip chunked array writes May 16, 2024
@Apollon77
Copy link
Contributor Author

Ok the more I look into it the more it becomes clear to me that the expectation with chunked writes is to really process them also in their chunks one by one. Matter.js is currently not doing this that way. I can change that.

With this the question (that is unanswered for me also after consulting specs - or I am too blind) I added above is the one left: If on one chunk an error happens are more chunks for the same path also processed or not? I assume yes. In fact the result would be then one response per chunk, e.g. one ok, one error and one ok ... and it is up to the caller to decide whats now up ;-) Correct?

Thank you for your support.

@Apollon77
Copy link
Contributor Author

Another follow up question: how exactly data version protected writes work with chinked writes? On acl cluster the specs tell the administrator to ideally protect writes with data version filters. So… when each chunked part is handled as own write then dataversion changes after each. So after first write that will change unless the controller makes assumptions on how be re ion increases. Is that also correct? So data version protected writes should never be chunked ?!

@mlepage-google
Copy link
Collaborator

mlepage-google commented May 16, 2024

Hello Appollon.

Some clarifications...

First, the extension attribute is optional and if present, must be of length 1. That is why there is a constraint error when attempting to write a second element to that list.

The ACL attribute is mandatory, and must support at least 3 entries per fabric. Therefore, writing 3 entries will never result in a constraint error due to exhausting list length; writing 4 or more entries might (depending on actual supported length) result in such errors, but they would always be at the end, and never have a successful entry written after exhausting the supported length.

An ACL entry may fail other data validity checks, and fail to write for that reason.

In the SDK, when a list is written (say with 3 elements), it is done as follows:

  1. List cleared via list replace (with empty list) action.
  2. First element added via list item append action.
  3. Second element added via list item append action.
  4. Third element added via list item append action.

In that case, if the second item resulted in a failure (e.g. due to invalid item data), the list would still be of length 1 after having attempted to append the second item (and failed). Processing would continue, and the third item would be appended, resulting in a list of length 2. It is as if only the two valid items were sent, except that a write error occurred for the second item.

@Apollon77
Copy link
Contributor Author

All right. Thank you for confirmation. In fact this is then also no testing issue and could be closed or moved.

Nevertheless the specification lack details on chunked write operations and how this should work. It was completely unclear to me till now.

In my eyes this behavior is maximum inconsistent because if I send an unchunked array with the 3 items where second is invalid then nothing is written and the constraint error happens for all three. But unassigned that this ship has sailed ;-)

Would you also have an info how chunked writes work with dataversion check on the write? (Also because acl specs propose to use that). In my understanding each processed chunk increase the version so the data version check only works for the first chunk.
I know that chip always send array full and so resets the array with the first chunk but the protocol does not dictate that. So someone could really just add items to an existing list. ;-)

@Apollon77
Copy link
Contributor Author

Ps: (and slightly off topic) is there a reason why Chip always chunks the data in data reports and writes? If the array would fit into the data report then sending it unchanged would need less bytes then always chunked.
Would that not be important for low power devices?

But yes then the test here would get issues because they could behave differently ;-))

@mlepage-google
Copy link
Collaborator

mlepage-google commented May 16, 2024

The CHIP SDK always chunks writes primarily for implementation simplicity, if I recall correctly.

@mlepage-google
Copy link
Collaborator

Regarding the question of how exactly DataVersion works with chunked writes, it's likely the DataVersion increment happens when the last chunk is received, but I'm not sure I recall exactly.

Spec section 10.6.4.7. Collection Types (List) seems to have an example showing a list write as both one IB (interaction block) and multiple IBs (the multiple replace/append blocks I mentioned earlier), each with DataVersion=1, so that would match that behaviour.

@Apollon77
Copy link
Contributor Author

The CHIP SDK always chunks writes primarily for implementation simplicity, if I recall correctly.

Ok. Still … in regard to best possibly minimizing traffic that’s maybe nor the best choice. I remember matter 1.3 announcement also contained that less data are transferred as an optimization… that here could be another low hanging fruit ;-) and trust me it is not that complex as it feels. Matter.js does that. ;-)

Additionally now that I think about it - it has the consequence that the current tests might now test all relevant code paths … as example in acl code is there for both options but the „full array“ code path should always ever be executed with zero or one element …

…. Just as thoughts from my opinion.

@Apollon77
Copy link
Contributor Author

Apollon77 commented May 16, 2024

Re your infos on dataversion: ok thank you for the reference. Puuhh that makes things interesting.

So we have kind of a „transaction“ (summarize multiple changes and „commit“ at the end to increase version) but without having a real „transaction“ (aka rollback everything in case of an error).

And all this over multiple potentially chunked write messages ;-)

Good. Will be fun to implement that. ;-)

@Apollon77
Copy link
Contributor Author

When I think deeper then I also ask myself why the spec 1.3 in "10.6.4.3.1. Lists" just tals about "full list" or "clear the list and send appends" ... Combined with a dataVersion security approach it should be easiely possible to just send list modifications. Also the spec examples kind of contain all these examples, just the spec tells is "SHOULD" not be done ...
This should be even less bytes needed and should be (because of the dataVersion security guard) end in the same final list ... or?! But yes thats more "spec and protocol feedback" and I have no idea if chip has implemented that ... I assume not becuse eg acl just handled additions beside "full array" and last time I checked in the c++ code I only saw places that handle these two cases, nothing else... Sorry to place it here, but I have no other ways to bring such thoughts/questions to any attention ;-)

@mlepage-google
Copy link
Collaborator

… as example in acl code is there for both options but the „full array“ code path should always ever be executed with zero or one element …

In access-control-server.cpp for writing the Extension attribute with an entire list, it only ever handles a list of length 0 or 1 because that attribute list can only ever be length 0 or 1.

return CHIP_IM_GLOBAL_STATUS(ConstraintError);

For the ACL attribute, the length of the list being written is not assumed.

@Apollon77
Copy link
Contributor Author

Apollon77 commented May 17, 2024

I did most adjustments needed and are currently finalizing all the tests but I need to tell you that the tests here (especially ACE 2_5, 2_6 and such) are veeery chip implementation specific. I now need to mimic the event order based on the code how chip is managing the entries in the relevant cases and send out events. Also the events have the "latestValue" which is not really described in the specs, but in the end is a different value depending on if it is a Removal or other action. All this is nowhere mentioned in the specification. In the end thats all valid and ok to require it, but if such things like the exact event behaviour, orders and such are checked in certification in this depth then also the specification should contain these implementation details as requirements. Thats at least my opinion :-)

In fact I spentsome hours today to just run tests, analyze why it broke, adjust my code even if specs did not contain these details, and do this in loops ;) Same for the specific write request handling for "splitted/chunked list processing". The specification (at least not clear to me) is not containing all the details whats required and tested. I think all details in specification would have saved me a lot of this time.

Please see this as feedback from a developer that implements the protocol mainly after specs (as it should work out).

PS: just as example what I mean with implementation specific: see

size_t i = 0;
while (iterator.Next())
{
if (i < oldCount)
{
ReturnErrorOnFailure(GetAccessControl().UpdateEntry(&aDecoder.GetSubjectDescriptor(), accessingFabricIndex, i,
iterator.GetValue().GetEntry()));
}
else
{
ReturnErrorOnFailure(GetAccessControl().CreateEntry(&aDecoder.GetSubjectDescriptor(), accessingFabricIndex, nullptr,
iterator.GetValue().GetEntry()));
}
++i;
}
ReturnErrorOnFailure(iterator.GetStatus());
while (i < oldCount)
{
--oldCount;
ReturnErrorOnFailure(GetAccessControl().DeleteEntry(&aDecoder.GetSubjectDescriptor(), accessingFabricIndex, oldCount));
}

New values are added up-counting 0,1,2 ... old values then removed down-counting 5,4,3 ... and tests check exactly this behavior and order of events ;-)

PS: If someone is interested I could try to collect all the things I would love to get added to the specs and post here, (at least collecting the topics)

@Apollon77 Apollon77 changed the title [CERT-TEST-FAILURE] Test_TC_ACL_2_3/4 implicitely requires chip chunked array writes [CERT-TEST-FAILURE] Test_TC_ACL_2_3/4 are very (too?) project-chip implementation specific May 22, 2024
@woody-apple woody-apple added the spec Mismatch between spec and implementation label May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cert blocker needs triage spec Mismatch between spec and implementation
Projects
Status: Open Cert Blockers
Development

No branches or pull requests

4 participants