Getting minio to work with docker demo #338

Open

alberttwong opened this issue Feb 23, 2024 · 2 comments

alberttwong (Contributor) commented Feb 23, 2024

Docker Compose configuration:

  jupyter:
    container_name: jupyter
    hostname: jupyter
    image: 'almondsh/almond:latest'
    ports:
      - '8888:8888'
    volumes:
      - ./notebook:/home/jovyan/work
      - ./jars:/home/jars
      - ./data:/home/data
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - MINIO_ACCESS_KEY=admin
      - MINIO_SECRET_KEY=password
      - MINIO_URL=http://minio:9000

  minio:
    image: minio/minio
    environment:
      - MINIO_ROOT_USER=admin
      - MINIO_ROOT_PASSWORD=password
      - MINIO_DOMAIN=minio
    networks:
      default:
        aliases:
          - warehouse.minio
    ports:
      - 9001:9001
      - 9000:9000
    command: ["server", "/data", "--console-address", ":9001"]
  mc:
    depends_on:
      - minio
    image: minio/mc
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
    entrypoint: >
      /bin/sh -c "
      until (/usr/bin/mc config host add minio http://minio:9000 admin password) do echo '...waiting...' && sleep 1; done;
      /usr/bin/mc rm -r --force minio/warehouse;
      /usr/bin/mc mb minio/warehouse;
      /usr/bin/mc policy set public minio/warehouse;
      tail -f /dev/null
      "

Added to the notebook:

import $ivy.`com.amazonaws:aws-java-sdk:1.12.661`
import $ivy.`org.apache.hadoop:hadoop-aws:3.3.1`
import java.util.HashMap // needed for hudiWriteOptions below
val hudiTableName = "hudi_dimCustomer"
val hudiBasePath = "s3a://warehouse/db/hudi_dimCustomer"
val writerSchema = "{\"type\":\"record\",\"name\":\"Sample\",\"fields\":[{\"name\":\"_c0\",\"type\":\"string\"},{\"name\":\"CustomerKey\",\"type\":\"string\"},{\"name\":\"GeographyKey\",\"type\":\"string\"},{\"name\":\"FirstName\",\"type\":\"string\"},{\"name\":\"LastName\",\"type\":\"string\"},{\"name\":\"BirthDate\",\"type\":\"string\"},{\"name\":\"MaritalStatus\",\"type\":\"string\"},{\"name\":\"Gender\",\"type\":\"string\"},{\"name\":\"YearlyIncome\",\"type\":\"string\"},{\"name\":\"TotalChildren\",\"type\":\"string\"},{\"name\":\"NumberChildrenAtHome\",\"type\":\"string\"},{\"name\":\"Education\",\"type\":\"string\"},{\"name\":\"Occupation\",\"type\":\"string\"},{\"name\":\"HouseOwnerFlag\",\"type\":\"string\"},{\"name\":\"NumberCarsOwned\",\"type\":\"string\"}]}"
val hudiWriteOptions = new HashMap[String, String]()
hudiWriteOptions.put("hoodie.table.name", hudiTableName)
hudiWriteOptions.put("hoodie.datasource.write.recordkey.field", "CustomerKey")
hudiWriteOptions.put("hoodie.datasource.write.partitionpath.field", "")
hudiWriteOptions.put("hoodie.datasource.write.precombine.field", "_c0")
hudiWriteOptions.put("hoodie.datasource.write.operation", "insert")
hudiWriteOptions.put("hoodie.write.schema", writerSchema)
hudiWriteOptions.put("hoodie.populate.meta.fields", "false")
hudiWriteOptions.put("hoodie.parquet.small.file.limit", "1")

val deltaTableName = "delta_dimGeography"
val deltaBasePath = "s3a://warehouse/db/delta_dimGeography"

val spark = org.apache.spark.sql.SparkSession.builder()
    .appName("demo")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
    .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
    .config("spark.kryo.registrator", "org.apache.spark.HoodieSparkKryoRegistrar")
    .config("fs.s3a.endpoint", "http://minio:9000")
    .config("fs.s3a.connection.ssl.enabled", false)
    .config("fs.s3a.access.key", "admin")
    .config("fs.s3a.secret.key", "password")
    .config("fs.s3a.path.style.access", true)
    .config("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .config("fs.s3a.awsAccessKeyId", "admin")
    .config("fs.s3a.awsSecretAccessKey", "password")
    .config("fs.s3.awsAccessKeyId", "admin")
    .config("fs.s3.awsSecretAccessKey", "password")
    .config("spark.executorEnv.AWS_ACCESS_KEY_ID", "admin")
    .config("spark.executorEnv.AWS_SECRET_ACCESS_KEY", "password")
    .master("local")
    .getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
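
(The write step itself isn't shown in this issue; for context, with the options above it would look roughly like the sketch below. The CSV path and read options are assumptions based on the ./data mount in the compose file, not taken from the demo.)

// Hypothetical write step producing the Hudi table that OneTable later reads.
val df = spark.read.csv("/home/data/dimCustomer.csv") // assumed location of the demo data
df.write.format("hudi")
  .options(hudiWriteOptions) // java.util.Map overload of DataFrameWriter.options
  .mode("overwrite")
  .save(hudiBasePath)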
alberttwong (Contributor, Author) commented

None of those fs.s3a configs work; it only works when I add environment variables. It still fails with this error:
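
(Aside, not from the thread: bare fs.s3a.* keys passed to SparkSession.builder().config(...) are not copied into the SparkContext's Hadoop Configuration, which is what S3AFileSystem and libraries like Hudi's meta client often read; the usual pattern is to prefix them with spark.hadoop. or set them on hadoopConfiguration directly. A minimal sketch reusing the same endpoint and credentials:)

// Option 1: prefix the keys so Spark propagates them into the Hadoop Configuration.
val sparkS3a = org.apache.spark.sql.SparkSession.builder()
    .appName("demo")
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "admin")
    .config("spark.hadoop.fs.s3a.secret.key", "password")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
    .master("local")
    .getOrCreate()

// Option 2: set the same keys on the existing session's Hadoop configuration after creation.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.endpoint", "http://minio:9000")
hadoopConf.set("fs.s3a.access.key", "admin")
hadoopConf.set("fs.s3a.secret.key", "password")
hadoopConf.set("fs.s3a.path.style.access", "true")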

org.apache.hudi.exception.HoodieIOException: Could not check if s3a://warehouse/db/hudi_dimCustomer is a valid table
  org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:59)
  org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:140)
  org.apache.hudi.common.table.HoodieTableMetaClient.newMetaClient(HoodieTableMetaClient.java:692)
  org.apache.hudi.common.table.HoodieTableMetaClient.access$000(HoodieTableMetaClient.java:85)
  org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:774)
  io.onetable.hudi.HudiSourceClientProvider.getSourceClientInstance(HudiSourceClientProvider.java:42)
  io.onetable.hudi.HudiSourceClientProvider.getSourceClientInstance(HudiSourceClientProvider.java:31)
  io.onetable.client.OneTableClient.sync(OneTableClient.java:90)
  ammonite.$sess.cell5$Helper.<init>(cell5.sc:21)
  ammonite.$sess.cell5$.<init>(cell5.sc:7)
  ammonite.$sess.cell5$.<clinit>(cell5.sc:-1)
java.nio.file.AccessDeniedException: s3a://warehouse/db/hudi_dimCustomer/.hoodie: getFileStatus on s3a://warehouse/db/hudi_dimCustomer/.hoodie: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: EZ7706X8A0FJ41T4; S3 Extended Request ID: Avnviu89tVENUd7vmOuoh6Au5JlVPWqh08Ue5uFJL8d0EUwxGvLCITgECVIkffnPYcIT+KhYC4Q=; Proxy: null), S3 Extended Request ID: Avnviu89tVENUd7vmOuoh6Au5JlVPWqh08Ue5uFJL8d0EUwxGvLCITgECVIkffnPYcIT+KhYC4Q=:403 Forbidden
  org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:249)
  org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170)
  org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3286)
  org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
  org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3053)
  org.apache.hudi.common.fs.HoodieWrapperFileSystem.lambda$getFileStatus$17(HoodieWrapperFileSystem.java:410)
  org.apache.hudi.common.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:114)
  org.apache.hudi.common.fs.HoodieWrapperFileSystem.getFileStatus(HoodieWrapperFileSystem.java:404)
  org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:51)
  org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:140)
  org.apache.hudi.common.table.HoodieTableMetaClient.newMetaClient(HoodieTableMetaClient.java:692)
  org.apache.hudi.common.table.HoodieTableMetaClient.access$000(HoodieTableMetaClient.java:85)
  org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:774)
  io.onetable.hudi.HudiSourceClientProvider.getSourceClientInstance(HudiSourceClientProvider.java:42)
  io.onetable.hudi.HudiSourceClientProvider.getSourceClientInstance(HudiSourceClientProvider.java:31)
  io.onetable.client.OneTableClient.sync(OneTableClient.java:90)
  ammonite.$sess.cell5$Helper.<init>(cell5.sc:21)
  ammonite.$sess.cell5$.<init>(cell5.sc:7)
  ammonite.$sess.cell5$.<clinit>(cell5.sc:-1)
com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: EZ7706X8A0FJ41T4; S3 Extended Request ID: Avnviu89tVENUd7vmOuoh6Au5JlVPWqh08Ue5uFJL8d0EUwxGvLCITgECVIkffnPYcIT+KhYC4Q=; Proxy: null), S3 Extended Request ID: Avnviu89tVENUd7vmOuoh6Au5JlVPWqh08Ue5uFJL8d0EUwxGvLCITgECVIkffnPYcIT+KhYC4Q=
  com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879)
  com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418)
  com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387)
  com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
  com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
  com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
  com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
  com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
  com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
  com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
  com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
  com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5520)
  com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5467)
  com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1402)
  org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$6(S3AFileSystem.java:2066)
  org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:412)
  org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:375)
  org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2056)
  org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2032)
  org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3273)
  org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
  org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3053)
  org.apache.hudi.common.fs.HoodieWrapperFileSystem.lambda$getFileStatus$17(HoodieWrapperFileSystem.java:410)
  org.apache.hudi.common.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:114)
  org.apache.hudi.common.fs.HoodieWrapperFileSystem.getFileStatus(HoodieWrapperFileSystem.java:404)
  org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:51)
  org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:140)
  org.apache.hudi.common.table.HoodieTableMetaClient.newMetaClient(HoodieTableMetaClient.java:692)
  org.apache.hudi.common.table.HoodieTableMetaClient.access$000(HoodieTableMetaClient.java:85)
  org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:774)
  io.onetable.hudi.HudiSourceClientProvider.getSourceClientInstance(HudiSourceClientProvider.java:42)
  io.onetable.hudi.HudiSourceClientProvider.getSourceClientInstance(HudiSourceClientProvider.java:31)
  io.onetable.client.OneTableClient.sync(OneTableClient.java:90)
  ammonite.$sess.cell5$Helper.<init>(cell5.sc:21)
  ammonite.$sess.cell5$.<init>(cell5.sc:7)
  ammonite.$sess.cell5$.<clinit>(cell5.sc:-1)

alberttwong (Contributor, Author) commented Feb 29, 2024

MinIO support suggested trying the following; I tried it in my Spark config and it didn't work:

  <property>
    <name>fs.s3a.aws.credentials.provider</name>
    <value>io.minio.credentials.AWSEnvironmentProvider</value>
  </property>

I also tried it with io.minio.credentials.MinioEnvironmentProvider, and that didn't work either.
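
(One note from my side, not from the MinIO thread: with hadoop-aws 3.3.1, fs.s3a.aws.credentials.provider expects implementations of com.amazonaws.auth.AWSCredentialsProvider, and the io.minio.credentials classes implement MinIO's own Provider interface instead, so s3a can't load them. A sketch using the AWS SDK's environment-variable provider, which does match the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY variables set in the compose file:)

// Hypothetical alternative: point s3a at an AWS-SDK credentials provider class.
spark.sparkContext.hadoopConfiguration.set(
  "fs.s3a.aws.credentials.provider",
  "com.amazonaws.auth.EnvironmentVariableCredentialsProvider")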
