Noel Martin Llevares edited this page Mar 11, 2024 · 2 revisions

glue-utils

Description

Reusable utilities for working with Glue PySpark jobs

Installation

As a runtime (or production) dependency...

pip install glue-utils

For development...

This library does not list pyspark and aws-glue-libs as dependencies because both are pre-installed in Glue's runtime environment.

For local development in your IDE, install pyspark and aws-glue-libs yourself. aws-glue-libs is not available on PyPI, so it has to be installed from its git repository.

pip install pyspark==3.3.0
pip install git+https://github.com/awslabs/aws-glue-libs.git@master
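
After installing, a quick check can confirm that both packages are importable in your local environment. The helper below is only a sketch and is not part of glue-utils: the function name check_local_glue_deps is hypothetical, and it assumes aws-glue-libs exposes the top-level module awsglue (the name used by the import in the Usage section).

```python
from importlib.util import find_spec


def check_local_glue_deps(modules=("pyspark", "awsglue")):
    """Return a mapping of module name -> whether it can be found locally.

    Hypothetical helper for illustration; not provided by glue-utils.
    """
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Report each development dependency without actually importing it.
    for name, available in check_local_glue_deps().items():
        status = "found" if available else "MISSING"
        print(f"{name}: {status}")
```

Using find_spec avoids importing pyspark just to see whether it is installed, so the check stays fast.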

To make your local environment as close to Glue's runtime as possible, use the versions specified in this document:

Usage

glue_utils currently has the following features:

  • ManagedGlueContext

ManagedGlueContext

ManagedGlueContext instantiates a GlueContext, initializes a Job, and wraps both in a context manager so that the Job is committed when the with block ends.

The example below reads a public JSON dataset from S3 and prints its schema.

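import sys

from awsglue.utils import getResolvedOptions
from glue_utils.context import ManagedGlueContext

# Resolve the arguments passed to the job; no extra options are required here.
options = getResolvedOptions(sys.argv, [])

# The Job is initialized on entry and committed when the with block ends.
with ManagedGlueContext(job_options=options) as glue_context:
    # Read the sample dataset from S3 into a DynamicFrame.
    dynamicframe = glue_context.create_dynamic_frame_from_options(
        connection_type="s3",
        connection_options={
            "paths": ["s3://awsglue-datasets/examples/us-legislators/all/persons.json"],
            "recurse": True,
        },
        format="json",
    )
    # Print the inferred schema.
    dynamicframe.printSchema()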
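
To make the init-then-commit pattern concrete, here is a minimal sketch of a context manager with the same shape. It is not the glue-utils implementation: FakeJob and ManagedJobSketch are hypothetical names, FakeJob merely records calls in place of a real Glue Job, and the use of a JOB_NAME key in the options is an assumption. This sketch commits on every exit, including when the block raises; whether ManagedGlueContext behaves the same way on errors is not stated here.

```python
class FakeJob:
    """Hypothetical stand-in for a Glue Job; it only records calls."""

    def __init__(self):
        self.calls = []

    def init(self, name, options):
        self.calls.append(("init", name))

    def commit(self):
        self.calls.append(("commit",))


class ManagedJobSketch:
    """Illustrative context manager: init on entry, commit on exit."""

    def __init__(self, job, job_options=None):
        self.job = job
        self.job_options = job_options or {}

    def __enter__(self):
        # Initialize the job before the body of the with block runs.
        self.job.init(self.job_options.get("JOB_NAME", ""), self.job_options)
        return self.job

    def __exit__(self, exc_type, exc_value, traceback):
        # Commit when the block ends; returning False lets exceptions propagate.
        self.job.commit()
        return False


if __name__ == "__main__":
    job = FakeJob()
    with ManagedJobSketch(job, {"JOB_NAME": "demo"}):
        print("doing work inside the block")
    print(job.calls)
```

The point of the pattern is that the caller cannot forget the commit step, because it is tied to leaving the with block rather than to a separate call at the end of the script.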