This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

Null is returned when trying to get all the table cells for BigQuery TableRow #614

Open
polleyg opened this issue Oct 23, 2017 · 0 comments

polleyg commented Oct 23, 2017

The pipeline is: BigQuery -> ParDo -> GCS

I'm using SDK 2.1.0. Inside the ParDo, calling c.element().getF() to get the List<TableCell> for the current TableRow returns null:

public class BigQueryTableToOneFile {
    private static final String BIGQUERY_TABLE = "bigquery-samples:wikipedia_benchmark.Wiki1M";
    private static final String GCS_OUTPUT_FILE = "gs://bigquery-table-to-one-file/output/wiki_1M.csv";

    public static void main(String[] args) throws Exception {
        DataflowPipelineOptions options = PipelineOptionsFactory
                .fromArgs(args)
                .withValidation()
                .as(DataflowPipelineOptions.class);
        options.setAutoscalingAlgorithm(THROUGHPUT_BASED);
        Pipeline pipeline = Pipeline.create(options);
        pipeline.apply(BigQueryIO.read().from(BIGQUERY_TABLE))
                .apply(ParDo.of(new DoFn<TableRow, String>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) throws Exception {
                        for(TableCell tableCell : c.element().getF()){ //<--this returns null
                            System.out.println(tableCell.getV().toString());
                        }
                    }
                }))
                .apply(TextIO.write().to(GCS_OUTPUT_FILE)
                        .withoutSharding()
                        .withWritableByteChannelFactory(GZIP)
                );
        pipeline.run();
    }
}
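
A possible workaround, sketched under assumptions: TableRow extends GenericJson, which makes it a Map<String, Object>, so the values can be read by key or through entrySet() instead of through getF(). This sketch assumes that the rows produced by BigQueryIO.read() are populated as key/value pairs keyed by column name, which this report has not confirmed. The class name RowToCsv, the helper toCsvLine, and the field names "title" and "views" are hypothetical, and a plain Map stands in for TableRow so that the example runs without the Beam SDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: not part of the Beam SDK or of the pipeline above.
public class RowToCsv {

    // Joins the row's values with commas, in the map's iteration order.
    // Null values become empty strings. No CSV quoting or escaping is done.
    static String toCsvLine(Map<String, Object> row) {
        StringJoiner line = new StringJoiner(",");
        for (Map.Entry<String, Object> field : row.entrySet()) {
            Object value = field.getValue();
            line.add(value == null ? "" : value.toString());
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a TableRow; the field names are made up for illustration.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("title", "Main_Page");
        row.put("views", 42);
        System.out.println(toCsvLine(row)); // prints Main_Page,42
    }
}
```

If the assumption holds, the body of processElement could call c.output(RowToCsv.toCsvLine(c.element())) instead of looping over getF(). Whether the iteration order of the row matches the table schema is also an assumption, so reading each column explicitly with c.element().get("columnName") may be the safer choice when column order matters.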

[Two screenshots attached: 2017-10-23 at 1:23:51 pm and 1:24:03 pm]
