AI: EU AI Act: data original purpose, model intended task, and system intended purpose #653

Open
bact opened this issue Feb 29, 2024 · 0 comments

Background

The EU AI Act [1] will require AI providers to state the intended purpose or intended task of the system or model they place on the market. If personal data is used in the training, validation, or testing data sets, the original purpose of that data should also be provided.

  • High-risk AI system provider information obligations (per Article 16(a)):
    • Data original purpose - Article 10(2)(aa)
    • System intended purpose - Article 11(1), detailed in Annex IV(1)(a)
  • General purpose AI (GPAI) model provider information obligations:
    • Model intended task - Article 52c(1)(b)(ii), detailed in Annex IXb(1)(a)

Example:

  • The intended task of GPAI model A is facial recognition.
  • The intended purpose of high-risk AI system B is user authentication.
  • System B can use model A to perform a facial recognition task in order to fulfill its authentication purpose.

Relevant fields in 3.0

  • primaryPurpose and additionalPurpose properties in the SoftwareArtifact class of the Software Profile provide information about the purposes of the software artifact. The purpose can be any entry from SoftwarePurpose (for example, "configuration, data, executable, library, model").
  • domain property in AIPackage class of AI Profile describes "the domain in which the AI model contained in the AI software can be expected to operate successfully. Examples include computer vision, natural language etc."
  • intendedUse property in the Dataset class of the Dataset Profile describes "what the given dataset should be used for." Its description gives the example: "if a dataset is collected for building a facial recognition model, the intendedUse field would specify that."

Possible gaps and proposal

System intended purpose

  • SoftwareArtifact primaryPurpose and additionalPurpose are for purposes of the element within the system, not purposes of the system.
  • Need a property for system intended purpose. "System" in this case could be a Package (distribution of software).

Model intended task

  • AIPackage domain looks too broad compared to what we're looking for.
  • Take one of the examples given for domain, "computer vision": a computer vision domain has many tasks, such as pose estimation, facial recognition, and optical character recognition. So domain alone may not be sufficient.
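
To make the gap concrete, here is a minimal sketch of why a domain value does not pin down a task. The mapping and the function names are hypothetical and built only from the examples in this issue; nothing here is part of SPDX.

```python
from typing import Dict, List

# Hypothetical mapping, built only from the examples named in this issue.
DOMAIN_TASKS: Dict[str, List[str]] = {
    "computer vision": [
        "pose estimation",
        "facial recognition",
        "optical character recognition",
    ],
}


def candidate_tasks(domain: str) -> List[str]:
    """Return the tasks that a given domain value is compatible with."""
    return DOMAIN_TASKS.get(domain, [])


def domain_identifies_task(domain: str) -> bool:
    """True only when the domain narrows the model down to exactly one task."""
    return len(candidate_tasks(domain)) == 1
```

With this mapping, `domain_identifies_task("computer vision")` is false, because the domain is compatible with three different tasks.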

Data original purpose

  • Dataset intendedUse may be sufficient for this.

Proposal

  • It may be possible to use the intendedUse property for all three information items mentioned above (system intended purpose, model intended task, and data original purpose).
  • We could move intendedUse property from Dataset class in Dataset Profile to Package class in Software Profile.
  • Then add that property to AIPackage class in AI Profile and Dataset class in Dataset Profile.
  • If the package intendedUse or model intendedUse is different from the data intendedUse, the data intendedUse is considered the data original purpose.
  • Open question: how can we make it more convenient for machines to parse the information inside intendedUse?
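
The proposal can be sketched roughly as follows. This is a simplified illustration, not the SPDX model: the Python classes are hypothetical stand-ins, it assumes AIPackage and Dataset both derive from Package so they inherit intendedUse, and the helper `data_original_purpose` is an invented name for the comparison rule described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Package:
    """Simplified stand-in for the Software Profile Package class."""
    name: str
    # Proposed: intendedUse lives on Package instead of only on Dataset.
    intended_use: Optional[str] = None


@dataclass
class AIPackage(Package):
    """Simplified stand-in for the AI Profile AIPackage class."""
    domain: Optional[str] = None


@dataclass
class Dataset(Package):
    """Simplified stand-in for the Dataset Profile Dataset class."""


def data_original_purpose(consumer: Package, dataset: Dataset) -> Optional[str]:
    """Return the dataset's intendedUse when it differs from the consumer's.

    Assumption: "different" is decided by plain string comparison, which is
    the only option available while intendedUse is free text.
    """
    if dataset.intended_use is None:
        return None
    if dataset.intended_use == consumer.intended_use:
        return None
    return dataset.intended_use
```

For instance, a model declared as `AIPackage(name="model-a", intended_use="facial recognition")` trained on `Dataset(name="faces", intended_use="photo tagging")` (a made-up dataset) would report "photo tagging" as the data original purpose. The exact-string comparison also shows why the open question matters: two differently worded descriptions of the same purpose would be treated as different.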

Note on "model"

SPDX AIPackage's current description is "Metadata information that can be added to a package to describe an AI application or trained AI model."

So a "model" can be either:

  • a SoftwareArtifact with primaryPurpose=model (a model file alone); or
  • an AIPackage (a model with inference code or similar, as a package?)

For example:

  1. ggml-model-gpt-2-117M.bin is a model file
  2. The code at https://github.com/ggerganov/ggml/tree/master/examples/gpt-2 is inference code for the GPT-2 model

(1)+(2) together can be an AIPackage (a "trained AI model" according to the AIPackage description).

The proposal above may not work for a SoftwareArtifact with primaryPurpose=model, as SoftwareArtifact has neither an intendedUse nor a domain property.
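
A minimal sketch of this limitation, using simplified hypothetical classes rather than the actual SPDX model (in particular, AIPackage derives directly from SoftwareArtifact here for brevity, and the package name "gpt-2-package" is made up):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SoftwareArtifact:
    """Simplified stand-in: has a purpose, but no intendedUse or domain."""
    name: str
    primary_purpose: Optional[str] = None


@dataclass
class AIPackage(SoftwareArtifact):
    """Simplified stand-in: the only place intendedUse and domain exist."""
    intended_use: Optional[str] = None
    domain: Optional[str] = None


def can_state_intended_use(element: SoftwareArtifact) -> bool:
    """Check whether the element's class offers an intendedUse slot at all."""
    return hasattr(element, "intended_use")


# (1) the model file alone, and (1)+(2) bundled together as an AIPackage
model_file = SoftwareArtifact(name="ggml-model-gpt-2-117M.bin", primary_purpose="model")
bundle = AIPackage(name="gpt-2-package")
```

Here `can_state_intended_use(model_file)` is false while `can_state_intended_use(bundle)` is true, so a model distributed as a bare file would have nowhere to record its intended task.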

References

[1] EU AI Act, latest draft, 2 Feb 2024. https://www.europarl.europa.eu/meetdocs/2014_2019/plmrep/COMMITTEES/CJ40/AG/2024/02-13/1296003EN.pdf

@goneall goneall added this to the 3.1 milestone Mar 4, 2024
@bact bact changed the title from "[3.1] [AI] EU AI Act: data original purpose, model intended task, and system intended purpose" to "AI: EU AI Act: data original purpose, model intended task, and system intended purpose" Mar 21, 2024