Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws-stepfunctions-tasks: state machine role is missing sagemaker:AddTags permission for SageMakerCreateTransformJob task #26012

Closed
l3ku opened this issue Jun 16, 2023 · 4 comments · Fixed by #27264
Labels
@aws-cdk/aws-stepfunctions-tasks bug This issue is a bug. effort/medium Medium work item – several days of effort p2

Comments

@l3ku
Copy link

l3ku commented Jun 16, 2023

Describe the bug

Using Python, I am trying to create a step functions state machine that runs an AWS SageMaker batch transform job using the .sync version of the API like this:

batch_inference_job = sfn_tasks.SageMakerCreateTransformJob(
    self,
    "BatchInferenceTransformJob",
    integration_pattern=sfn.IntegrationPattern.RUN_JOB,
    transform_job_name=sfn.JsonPath.string_at("$.transform_job_name"),
    model_name="xxx",
    ...
)

state_machine = sfn.StateMachine(
    self,
    "BatchInferencePipeline",
    state_machine_name="xxx"
    definition=batch_inference_job
)

Note that I am not providing anything in the role parameter when I instantiate the StateMachine. When I try to execute the state machine, I get the following type of error:

User: arn:aws:sts::xxx:assumed-role/xxx is not authorized to perform: sagemaker:AddTags on resource: arn:aws:sagemaker:eu-west-1:xxx:transform-job/xxx because no identity-based policy allows the sagemaker:AddTags action (Service: AmazonSageMaker; Status Code: 400; Error Code: AccessDeniedException; Request ID: xxx; Proxy: null)

When I take a closer look at what tags step functions is trying to set for the transform job (I am not setting any tags for the job myself), I see some AWS managed tags, which presumably the step functions service appends:

{
    "Key": "MANAGED_BY_AWS",
    "Value": "STARTED_BY_STEP_FUNCTIONS"
}

So from my viewpoint it seems that the role generated by CDK for the state machine should already by default include a policy that allows the sagemaker:AddTags action. When I tried spinning up the batch transform job with sfn.IntegrationPattern.REQUEST_RESPONSE, step functions didn't try to set any tags and submitting the job worked as expected.

Expected Behavior

The default role generated by cdk for the step functions state machine should have all the necessary permissions to start a job when using integration_pattern=sfn.IntegrationPattern.RUN_JOB, including sagemaker:AddTags.

Current Behavior

Got an error when step functions tried to create the batch transform job:

User: arn:aws:sts::xxx:assumed-role/xxx is not authorized to perform: sagemaker:AddTags on resource: arn:aws:sagemaker:eu-west-1:xxx:transform-job/xxx because no identity-based policy allows the sagemaker:AddTags action (Service: AmazonSageMaker; Status Code: 400; Error Code: AccessDeniedException; Request ID: xxx; Proxy: null)

Reproduction Steps

batch_inference_job = sfn_tasks.SageMakerCreateTransformJob(
    self,
    "BatchInferenceTransformJob",
    integration_pattern=sfn.IntegrationPattern.RUN_JOB,
    transform_job_name=sfn.JsonPath.string_at("$.transform_job_name"),
    model_name="xxx",
    ...
)

state_machine = sfn.StateMachine(
    self,
    "BatchInferencePipeline",
    state_machine_name="xxx"
    definition=batch_inference_job
)

Possible Solution

Not tested, but it seems that the policies for the role are added in: https://github.com/aws/aws-cdk/blob/main/packages/aws-cdk-lib/aws-stepfunctions-tasks/lib/sagemaker/create-transform-job.ts#L273. So simply adding a policy statement that allows sagemaker:AddTags there.

Additional Information/Context

No response

CDK CLI Version

2.81.0

Framework Version

No response

Node.js Version

v18.16.0

OS

MacOS 12.5

Language

Python

Language Version

3.10.9

Other information

No response

@l3ku l3ku added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 16, 2023
@l3ku
Copy link
Author

l3ku commented Jun 16, 2023

I was able to work around this by adding the policy statement manually but IMO it should be there by default:

state_machine.add_to_role_policy(
    iam.PolicyStatement(
        actions=["sagemaker:AddTags"],
        resources=[
            f"arn:aws:sagemaker:{self.region}:{self.account}:transform-job/*"
        ],
    )
)

@pahud
Copy link
Contributor

pahud commented Jun 16, 2023

Thanks for the report. I think we probably should add the missing permissions here:

private makePolicyStatements(): iam.PolicyStatement[] {
const stack = Stack.of(this);
// create new role if doesn't exist
if (this._role === undefined) {
this._role = new iam.Role(this, 'SagemakerTransformRole', {
assumedBy: new iam.ServicePrincipal('sagemaker.amazonaws.com'),
managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonSageMakerFullAccess')],
});
}
// https://docs.aws.amazon.com/step-functions/latest/dg/sagemaker-iam.html
const policyStatements = [
new iam.PolicyStatement({
actions: ['sagemaker:CreateTransformJob', 'sagemaker:DescribeTransformJob', 'sagemaker:StopTransformJob'],
resources: [
stack.formatArn({
service: 'sagemaker',
resource: 'transform-job',
resourceName: '*',
}),
],
}),
new iam.PolicyStatement({
actions: ['sagemaker:ListTags'],
resources: ['*'],
}),
new iam.PolicyStatement({
actions: ['iam:PassRole'],
resources: [this.role.roleArn],
conditions: {
StringEquals: { 'iam:PassedToService': 'sagemaker.amazonaws.com' },
},
}),
];

@tam0ri
Copy link
Contributor

tam0ri commented Sep 24, 2023

Previous PR was closed over 2 months ago, and there has been no activity after that. So I submitted the new PR.

@mergify mergify bot closed this as completed in #27264 Dec 7, 2023
mergify bot pushed a commit that referenced this issue Dec 7, 2023
…Tags permission for SageMakerCreateTransformJob task (#27264)

If we specified RUN_JOB as IntegrationPattern prop for SageMakerCreateTransformJob construct, StepFunctions executes SageMaker batch transform job [synchronously](https://docs.aws.amazon.com/step-functions/latest/dg/connect-to-resource.html#connect-sync). In this case, StepFunctions add a tag (key: MANAGED_BY_AWS, value: STARTED_BY_STEP_FUNCTIONS) to the job, so state machine role needs the permission to do that. However, currently CDK does not add the permission automatically.

This PR solves the issue by adding `sagemaker:AddTags` permission to state machine role when RUN_JOB is specified as IntegrationPattern prop.

Closes #26012

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Copy link

github-actions bot commented Dec 7, 2023

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-stepfunctions-tasks bug This issue is a bug. effort/medium Medium work item – several days of effort p2
Projects
None yet
3 participants