Question: How/Where to Implement a Data Store Delete function? #556

Open
brettforbes opened this issue Sep 8, 2022 · 11 comments

Comments

@brettforbes

brettforbes commented Sep 8, 2022

Hi,

We are producing an extension module for Stix2 that makes Vaticle TypeDB a standard datasource/datasink for your library. We currently have a minimal setup that has the init() and add() methods for DataSink, and the init() and get() methods for DataSource. They all work perfectly and we have integration tests to demonstrate that a valid Stix 2.1 object can be added to TypeDB, then retrieved using the get() method, converted into a valid Stix2 Python object and asserted to be equivalent (equal except for ordering of lists).

We anticipate extending your Pydantic model to suit both MITRE ATT&CK and OASIS CACAO, as built-in Stix dialects (i.e. produce a valid Stix2 Python object). This enables us to validate the ATT&CK and CACAO objects on both import and export of data. This model looks like the image below, and we would arrange open-source licensing to suit.

image

Before presenting it to you, we want to build in the entire Stix 2.1 certification process as integration tests (they were a lot easier than we were expecting). In order to develop this cleanly, we need to implement a delete() method. We notice you do not currently have one in your library.

Where should we develop a delete() method? In the Data Source or Sink?
We assume Sink for the moment, but thought we would ask, as it seems a sensible method for more general use.

Thanks

@chisholm
Contributor

Some of my thoughts:

It seems to me that a "delete" function isn't consistent with typical source/sink abstractions, where the former is read-only and the latter write-only. Sources and sinks are nice simple abstractions, but don't fit every use case. I don't know what the thinking was behind the design of data stores. We have some team members out this week, so some answers will be delayed. All I can do for now is offer some opinions.

At their most general, the source/sink abstractions don't require any sort of persistence or pool of objects from which an object might be deleted. The stix2 library's data source is more complex though, having some querying/searching ability, which seems to imply that there must be some kind of pool of objects to be searched. The data sink remains very simple, and doesn't seem to imply any persistence. (I think the docstring on DataSink.add doesn't appreciate the generality of the API.)

So one could argue that, with the source being more database-like than the sink, a "delete" function makes more sense in the source. The contract could be that after having been deleted, the source must not return those deleted objects in subsequent queries (unless they are re-added). Of course, that means sources would no longer be read-only, and it would drift farther from being a traditional "source". Assuming a sink did correspond to some kind of collection of objects, I'm not sure what semantics a delete method on it would have, since there is no way to examine its objects. Maybe it could just be completely arbitrary, with no API-visible effects?
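As a rough illustration of that contract, here is a minimal sketch assuming a hypothetical in-memory source. The class name, its methods and the placeholder ids are invented for illustration and are not part of the stix2 library.

```python
class InMemorySource:
    """Hypothetical source backed by a dict keyed by STIX id (illustration only)."""

    def __init__(self, objects=()):
        self._objects = {obj["id"]: obj for obj in objects}

    def get(self, stix_id):
        # Return the stored object, or None if it is unknown or was deleted.
        return self._objects.get(stix_id)

    def query(self, type_=None):
        # Tiny stand-in for filtering: match on the "type" property only.
        return [
            obj for obj in self._objects.values()
            if type_ is None or obj["type"] == type_
        ]

    def delete(self, stix_id):
        # Contract: once this returns, get() and query() no longer see the object.
        return self._objects.pop(stix_id, None) is not None


# Placeholder ids, not valid STIX identifiers.
source = InMemorySource([
    {"type": "indicator", "id": "indicator--1"},
    {"type": "malware", "id": "malware--1"},
])
deleted = source.delete("indicator--1")
remaining = source.query()
```

The sketch makes the trade-off visible: the source stays queryable, but it is no longer read-only.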

Another idea might be to add it to data stores, keeping sources read-only and sinks write-only. But the current data store design is just a pairing of a source and sink, and those could encapsulate two different pools of objects (reading from one place and writing to another). In that case, deletion may not be well defined. I think a delete method would need to be optional.

We also would probably need to handle both dropping a single version of an object, and dropping all of the known object history (all versions). Maybe "drop" would be a better method name? The end of a STIX object's lifecycle is a revocation, not a deletion. Maybe that would avoid some misconceptions?
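The difference between dropping one version and dropping the whole history could look roughly like the sketch below. It assumes a hypothetical pool that keys objects by id and then by their "modified" timestamp; `VersionedPool` and `drop` are illustrative names, not stix2 APIs.

```python
from collections import defaultdict


class VersionedPool:
    """Hypothetical pool keyed by STIX id, then by "modified" timestamp."""

    def __init__(self):
        self._versions = defaultdict(dict)

    def add(self, obj):
        self._versions[obj["id"]][obj["modified"]] = obj

    def versions(self, stix_id):
        # Sorted "modified" timestamps still held for this id.
        return sorted(self._versions.get(stix_id, {}))

    def drop(self, stix_id, modified=None):
        """Drop one version if `modified` is given, otherwise the whole history.

        Returns the number of versions removed.
        """
        history = self._versions.get(stix_id)
        if not history:
            return 0
        if modified is None:
            removed = len(history)
            del self._versions[stix_id]
            return removed
        if history.pop(modified, None) is None:
            return 0
        if not history:
            # Last version gone: forget the id entirely.
            del self._versions[stix_id]
        return 1


pool = VersionedPool()
for ts in ("2022-09-08T00:00:00.000Z", "2022-09-20T00:00:00.000Z"):
    pool.add({"type": "indicator", "id": "indicator--1", "modified": ts})

dropped_one = pool.drop("indicator--1", modified="2022-09-08T00:00:00.000Z")
left = pool.versions("indicator--1")
dropped_rest = pool.drop("indicator--1")
```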

@brettforbes
Author

brettforbes commented Sep 20, 2022

thanks Chisholm,

You raise some excellent points. Thanks for taking the time to review the idea.

I agree that the current concepts behind a DataSink and DataSource seem somewhat flimsy, in that they are not suited to an actual database. It is true that it is the combined DataStore functionality that we are targeting. In the combined DataStore functionality, it does make sense to have a delete() method. Given the feedback we have had from industry, we also feel that you guys undervalue the clever idea behind your Pydantic interface. Being able to validate incoming data is critical, in our view, nice work anticipating this.

Our particular use case is the Stix 2.1 certification tests, as we want to test our ability to consume, and then produce, various packets. Clearly, to run these Stix 2.1 certification tests as integration tests, it is also desirable to properly delete any of the test data at test conclusion. So our initial use case was really about completing certification, and bundling it in as integration tests.

At the moment we have 3 verbs, and we must applaud the original design team, cos it's freaking cool. You may not have considered it in the original design, but it was always extensible, so kudos to you guys. Consider that I can now do:

  1. Add() - add any stix object to the database
  2. Get() - get any stix object based on id, and
  3. Delete() - delete any set of stix objects, based on a list of ids

In short, your brilliant design enables us to do super-simple python commands, to move data in and out of the knowledge graph. Tremendous stuff. You can do Stix 2.1 on Jupyter Lab, and combine it with a knowledge graph visualisation, super cool.

At the moment I have attached the delete() method to the DataSink, but I admit it could have easily been attached to the DataSource, or indeed each one could have a copy of the same method. Interestingly, I use a get() method from the DataSource to check that the object is really loaded, and define its sub-object shape, before running the delete() method on the DataSink, but they are the same database.
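The check-then-delete flow described above might look something like this sketch. It is not the TypeDB extension's actual code: `Backend`, `Source` and `Sink` are hypothetical stand-ins, with a plain dict in place of the database that both halves share.

```python
class Backend:
    """Hypothetical shared storage; stands in for a real database connection."""

    def __init__(self):
        self.rows = {}


class Source:
    def __init__(self, backend):
        self._backend = backend

    def get(self, stix_id):
        return self._backend.rows.get(stix_id)


class Sink:
    def __init__(self, backend):
        self._backend = backend

    def add(self, obj):
        self._backend.rows[obj["id"]] = obj

    def delete(self, stix_ids):
        # Remove each listed id and report the ids that were not present.
        missing = []
        for stix_id in stix_ids:
            if self._backend.rows.pop(stix_id, None) is None:
                missing.append(stix_id)
        return missing


backend = Backend()
source, sink = Source(backend), Sink(backend)
sink.add({"type": "identity", "id": "identity--1", "name": "Example"})

# Confirm the object is loaded via the source before deleting via the sink.
loaded = source.get("identity--1") is not None
missing = sink.delete(["identity--1", "identity--2"])
```

Because both objects wrap the same backend, a delete issued through the sink is immediately visible through the source.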

I am just going through the final part of the testing program, and can't wait to show it to you guys. Thanks!!!

@priamai

priamai commented Sep 20, 2022

Hi both, this is very good design thinking.
Taking @chisholm's comments, for @brettforbes (if you have enough cycles) what about also:

  • hard delete all versions of an object with a seed STIX ID
  • revoke an object (not really a delete of course) with a STIX ID
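To make the contrast with a hard delete concrete, here is a minimal sketch of revocation on plain dicts. In STIX, revoking an object means publishing a new version with `revoked` set to true and a later `modified` timestamp. The `revoke` helper below is a hypothetical illustration rather than the stix2 library's own function, and it skips validation.

```python
from datetime import datetime, timezone


def revoke(obj, now=None):
    """Return a new version of `obj` marked as revoked; the input is not mutated."""
    if obj.get("revoked"):
        raise ValueError("object is already revoked")
    now = now or datetime.now(timezone.utc)
    new_version = dict(obj)
    new_version["revoked"] = True
    # Same id, later "modified" timestamp (millisecond precision, UTC).
    new_version["modified"] = now.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return new_version


original = {
    "type": "indicator",
    "id": "indicator--1",  # placeholder id, not a valid STIX identifier
    "modified": "2022-09-08T00:00:00.000Z",
}
history = [original]
history.append(revoke(original, now=datetime(2022, 9, 20, tzinfo=timezone.utc)))
```

Unlike a hard delete, both versions stay in `history`; consumers learn the object is no longer valid from the latest version.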

@clenk
Contributor

clenk commented Sep 26, 2022

The Datastores concept was developed only to be a shortcut for specific use cases, and deletion was not considered at the time. Considering that TAXII has a delete endpoint, we probably should have considered it. @chisholm also brings up excellent points.

I generally like methods on the source or sink to be callable from the datastore, too, but in this case it might cause confusion if the source and sink don't point to the same pool of objects. If you know your custom datastore classes will always have source and sink use the same pool of objects, delete() could just be a custom method on your custom datastore classes. Alternatively if it's added to the DataStoreMixin, maybe it would throw an error if source and sink do not point to the same object pool.
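The "throw an error if source and sink differ" idea could be sketched as follows. This assumes hypothetical classes that each expose the pool they wrap as a `pool` attribute; the real DataStoreMixin has no such attribute, so treat the names as placeholders.

```python
class PoolSource:
    def __init__(self, pool):
        self.pool = pool

    def get(self, stix_id):
        return self.pool.get(stix_id)


class PoolSink:
    def __init__(self, pool):
        self.pool = pool

    def add(self, obj):
        self.pool[obj["id"]] = obj


class Store:
    """Hypothetical source/sink pairing with an optional delete()."""

    def __init__(self, source, sink):
        self.source = source
        self.sink = sink

    def delete(self, stix_id):
        # Deletion is only well defined when both halves share one pool.
        if self.source.pool is not self.sink.pool:
            raise NotImplementedError("source and sink use different object pools")
        return self.sink.pool.pop(stix_id, None) is not None


shared = {}
store = Store(PoolSource(shared), PoolSink(shared))
store.sink.add({"type": "tool", "id": "tool--1"})
ok = store.delete("tool--1")

# A store that reads from one place and writes to another refuses to delete.
split = Store(PoolSource({}), PoolSink({}))
try:
    split.delete("tool--1")
    error = None
except NotImplementedError as exc:
    error = str(exc)
```

An identity check is the simplest possible test; a real backend would more likely compare connection details or collection names.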

@frank7y
Contributor

frank7y commented Jan 27, 2023

@brettforbes hey there, I'm jumping into this thread just to ask you if there was any tentative timeline on when you will make your model public, and if that's still something you're considering.

We're also currently in need of a CACAO model, and are actually working on an internal standalone CACAO library prototype. However we'd prefer to adopt and/or work on some already established or almost ready solution, instead of starting from scratch.

Sorry for the intrusion. Thanks

@brettforbes
Author

Hey @frank7y, thanks for the question. It's funny you should ask as I am just finishing the refactoring to make custom objects easy, like ATT&CK and some case/feed management objects. So it should be easy to add CACAO. It is definitely on our roadmap, but the way the system is structured is that you can definitely do it yourself if you want.

We are also including a fully documented help system (except it doesn't document how to customise yet), and we hope to finalise the whole library for release later next week. We already have:

  • add
  • get
  • delete

Now it should be much easier to add custom objects and dialects (like CACAO) without much coding, including validation of input data. Hopefully we should get back to you before next Friday.

@frank7y
Contributor

frank7y commented Jan 27, 2023

@brettforbes nice to hear that. I'm looking forward to the release, and will certainly take it into consideration for our internal use cases.

@brettforbes
Author

@frank7y, note that we will be happy/keen to add CACAO support to it with you guys. One of the massive benefits of our system is we aim for fully normalised records, so for example, cyber observables or ATT&CK records are only written once, and then all uses of them link to the original record.

We are also releasing an open source UI, and this is designed for easy modification, so we can definitely support your use case.

@frank7y
Contributor

frank7y commented Jan 28, 2023

@brettforbes for the record, I'm just an external contributor, thus not affiliated with OASIS, nor involved in their decision processes. That said, despite appreciating your open-source approach, I cannot 100% guarantee that any contribution to your library on my side will be shared with the community, since it's for internal usage (and could require much work separating internal logic from the library). And by the way, for us it would be a medium/long term effort. Should we come up with some shareable contribution I'll reach out.

However I think it would be fruitful for the project you're working on if you also shared it on the OASIS STIX/CACAO mailing list and their official WGs other than here on GitHub.

@brettforbes
Author

@frank7y no worries mate, we're Australian so we are pretty chilled. We would of course appreciate any contributions to our code base, but would more appreciate being able to collaborate with you to build the CACAO extension to suit your purposes. If it does this, then it helps other people by default. Thus your help is more to be the use case and test site for our CACAO layer.

@frank7y
Contributor

frank7y commented Jan 31, 2023

@brettforbes Great, let's keep each other updated.
