dataflow-xml-pubsub-to-gcs

Overview

This walkthrough creates a Dataflow streaming pipeline that reads XML-encoded messages from Pub/Sub.

Architecture

Pipeline

This pipeline is developed with the Apache Beam Python SDK.

  • Please refer to the Python codebase
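As a rough illustration of the XML decoding step such a pipeline needs, the sketch below parses one message payload into a flat dictionary using only the Python standard library. It is not taken from the codebase: the function name `parse_xml_message` and the sample `<order>` document are hypothetical, and the sketch assumes each message is a small XML document whose direct children carry the fields of interest.

```python
import xml.etree.ElementTree as ET


def parse_xml_message(payload: bytes) -> dict:
    """Parse one XML-encoded payload into a dict of child tag -> text.

    Assumption: the fields of interest are direct children of the root element.
    """
    root = ET.fromstring(payload)
    return {child.tag: (child.text or "").strip() for child in root}


# Hypothetical sample payload, for illustration only.
sample = b"<order><id>42</id><item>widget</item></order>"
print(parse_xml_message(sample))  # {'id': '42', 'item': 'widget'}
```

In a Beam pipeline, a function like this could be applied to each element with `beam.Map(parse_xml_message)` after a `beam.io.ReadFromPubSub(...)` step, since that source emits message payloads as bytes by default.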

A compatible Qwiklabs environment

If you wish to execute this code within a Qwiklabs environment, you can use this one: Stream messages from Pub/Sub by using Dataflow

Recommendations and next steps

Best practices

As a best practice, a Dataflow job should:

  1. Use a dedicated worker service account to access the pipeline's files and resources
  2. Grant that worker service account only the minimally necessary IAM permissions
  3. Enable only the minimally required Google Cloud services
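One way to apply the first recommendation is to pass the worker service account explicitly when launching the job. The sketch below only assembles the command-line arguments that the Beam Python SDK accepts for a streaming Dataflow job. The helper name `dataflow_args` and all example values (project, region, bucket, service account) are hypothetical placeholders, not values from this repository.

```python
def dataflow_args(project: str, region: str, temp_location: str, worker_sa: str) -> list:
    """Build pipeline arguments that run the job under a dedicated worker service account."""
    return [
        "--runner=DataflowRunner",
        "--streaming",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Workers run as this account instead of the default Compute Engine account.
        f"--service_account_email={worker_sa}",
    ]


# Placeholder values, for illustration only.
args = dataflow_args(
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    worker_sa="dataflow-worker@my-project.iam.gserviceaccount.com",
)
print(args)
```

The resulting list can be handed to `PipelineOptions(args)` from `apache_beam.options.pipeline_options`. Which IAM roles the account needs depends on the resources the pipeline touches, so they are deliberately not listed here.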

Exactly once and in order processing

For redundancy, Beam may sometimes process a message more than once, and message ordering is not guaranteed by design. However, in-order and exactly-once processing of messages is possible when using Pub/Sub and Dataflow together. If this is a solution requirement, please refer to the following Google Cloud Blog entry: After Lambda: Exactly-once processing in Google Cloud Dataflow, Part 1
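To make the two guarantees concrete, the sketch below shows what a consumer would otherwise have to do by hand on a small batch: drop redelivered messages and restore their order. It is a plain Python illustration and not how Dataflow implements either guarantee. The field names `id` and `seq` are hypothetical and assume that each message carries a unique identifier and a publisher-assigned sequence number.

```python
def dedupe_and_order(messages: list) -> list:
    """Drop messages whose 'id' was already seen, then sort the rest by 'seq'."""
    seen = set()
    unique = []
    for message in messages:
        if message["id"] in seen:
            continue  # duplicate delivery, ignore it
        seen.add(message["id"])
        unique.append(message)
    return sorted(unique, key=lambda m: m["seq"])


# Hypothetical batch: message "b" arrives twice and "a" arrives late.
batch = [
    {"id": "b", "seq": 2},
    {"id": "c", "seq": 3},
    {"id": "b", "seq": 2},
    {"id": "a", "seq": 1},
]
print([m["id"] for m in dedupe_and_order(batch)])  # ['a', 'b', 'c']
```

In an unbounded stream this approach needs bounded state and a rule for how long to wait for late messages, which is where the mechanisms described in the blog entry come in.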