getting error in "Streaming ETL pipelines" showcase #48

Open
xHishamSaeedx opened this issue May 6, 2024 · 5 comments
Labels
question Further information is requested

Comments

@xHishamSaeedx

I tried to run the "Streaming ETL pipelines in Python with Airbyte and Pathway" showcase
with several sources, and I kept getting the following error:

Traceback (most recent call last):
  File "/home/hisham/Documents/GitHub/Hubspot-activity/test.py", line 3, in <module>
    commits_table = pw.io.airbyte.read(
  File "/usr/local/lib/python3.10/dist-packages/pathway/io/airbyte/__init__.py", line 276, in read
    for stream in source.configured_catalog["streams"]:
  File "/usr/local/lib/python3.10/dist-packages/airbyte_serverless/sources.py", line 169, in __getattr__
    return getattr(self.source, name)
  File "/usr/local/lib/python3.10/dist-packages/airbyte_serverless/sources.py", line 102, in configured_catalog
    configured_catalog = self.catalog
  File "/usr/local/lib/python3.10/dist-packages/airbyte_serverless/sources.py", line 97, in catalog
    message = self._run_and_return_first_message('discover')
  File "/usr/local/lib/python3.10/dist-packages/airbyte_serverless/sources.py", line 73, in _run_and_return_first_message
    message = next(
  File "/usr/local/lib/python3.10/dist-packages/airbyte_serverless/sources.py", line 74, in <genexpr>
    (message for message in messages if message['type'] not in ['LOG', 'TRACE']),
  File "/usr/local/lib/python3.10/dist-packages/airbyte_serverless/sources.py", line 68, in _run
    raise AirbyteSourceException(json.dumps(message['trace']['error']))
airbyte_serverless.sources.AirbyteSourceException: {"message": "Something went wrong in the connector. See the logs for more details.", "internal_message": "[Errno 2] No such file or directory: '/mnt/temp/config.json'", "stack_trace": "Traceback (most recent call last):\n File "/airbyte/integration_code/main.py", line 8, in \n run()\n File "/airbyte/integration_code/source_hubspot/run.py", line 14, in run\n launch(source, sys.argv[1:])\n File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py", line 235, in launch\n for message in source_entrypoint.run(parsed_args):\n File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py", line 108, in run\n raw_config = self.source.read_config(parsed_args.config)\n File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/connector.py", line 51, in read_config\n config = BaseConnector._read_json_file(config_path)\n File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/connector.py", line 61, in _read_json_file\n with open(file_path, "r") as file:\nFileNotFoundError: [Errno 2] No such file or directory: '/mnt/temp/config.json'\n", "failure_type": "system_error", "stream_descriptor": null}
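The exception text above is a JSON object, so the useful fields can be pulled out instead of reading the whole escaped stack trace. Below is a minimal sketch: `summarize_airbyte_error` is a hypothetical helper (not part of Pathway or airbyte-serverless), and the sample payload is a shortened copy of the one above with `stack_trace` left out.

```python
import json


def summarize_airbyte_error(payload: str) -> dict:
    """Return the most informative fields of a JSON-encoded connector error."""
    error = json.loads(payload)
    return {
        "message": error.get("message"),
        "internal_message": error.get("internal_message"),
        "failure_type": error.get("failure_type"),
    }


# Shortened copy of the payload from the traceback (stack_trace omitted).
sample = json.dumps({
    "message": "Something went wrong in the connector. See the logs for more details.",
    "internal_message": "[Errno 2] No such file or directory: '/mnt/temp/config.json'",
    "failure_type": "system_error",
    "stream_descriptor": None,
})

summary = summarize_airbyte_error(sample)
print(summary["failure_type"], "-", summary["internal_message"])
```

When catching the exception in a script, `str(exc)` would probably be the string to pass in, since the traceback shows the exception being raised with `json.dumps(...)` as its only argument. That assumes the exception class does not customize its string form.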

@xHishamSaeedx xHishamSaeedx added the question Further information is requested label May 6, 2024
@janchorowski
Member

Hi, we are investigating the issue. Can you clarify if:

  1. all connectors are failing, or
  2. some connectors work, but others fail?

What is your setup (operating system, architecture)?

@berkecanrizai
Contributor

Hey @xHishamSaeedx, the issue comes from the auto-generated config; I was able to replicate the error.
First, make sure version 0.23 of airbyte-serverless is installed. You can check with:
pip show airbyte-serverless
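The same check can be done from Python, for instance at the top of the pipeline script. This is only a sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name, and the exact version string that pip reports (for example "0.23" versus "0.23.0") is not something this thread confirms.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("airbyte-serverless")
if version is None:
    print("airbyte-serverless is not installed in this environment")
else:
    print(f"airbyte-serverless {version} is installed")
```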

Create the config with abs create github --source "airbyte/source-github" (I believe you have already done this).

The error comes from several places in the config:

  • the repository entry (right above the repositories entry) shouldn't be left empty; remove it or comment it out.
  • start_date also triggers the error when it is left empty; remove it or populate it.
  • lastly, repositories should be a YAML list (rather than a single string); replace the autogenerated snippet with:
repositories: 
  - "pathwaycom/pathway"

If you want to customize the app for yourself, the changes above should fix the error.
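The three points above can be sketched as a small check run on the connector settings before starting the pipeline. This is an illustration only: `find_config_problems` is a hypothetical helper, and it assumes the settings under `source: config:` have already been loaded into a plain dict (for example with a YAML parser).

```python
from typing import Any


def find_config_problems(connector_config: dict[str, Any]) -> list[str]:
    """List the config issues described above for a GitHub connector config."""
    problems = []
    # An empty `repository` key should be removed or commented out.
    if "repository" in connector_config and not connector_config["repository"]:
        problems.append("repository is present but empty")
    # An empty `start_date` should be removed or given a value.
    if "start_date" in connector_config and not connector_config["start_date"]:
        problems.append("start_date is present but empty")
    # `repositories` has to be a list, not a single string.
    if not isinstance(connector_config.get("repositories"), list):
        problems.append("repositories must be a list")
    return problems


autogenerated = {"repository": "", "start_date": None, "repositories": "pathwaycom/pathway"}
fixed = {"repositories": ["pathwaycom/pathway"]}

print(find_config_problems(autogenerated))
print(find_config_problems(fixed))
```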

If you just want to run the pipeline as in the showcase, create the config and then remove the contents of the file completely.

Then copy and paste the config defined in the showcase:

source:
  docker_image: "airbyte/source-github"  # Here the airbyte connector type is specified
  config: 
    credentials:
      option_title: "PAT Credentials"  # The second authentication option you've uncommented
      personal_access_token: "<TOKEN>"  # Taken from https://github.com/settings/tokens
    repositories:
      - pathwaycom/pathway  # Pathway repository
    api_url: "https://api.github.com/"
  streams: commits

Make sure to replace <TOKEN> with your own GitHub token (and make sure that the "read public repo" permission is selected when creating the token).
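One way to avoid pasting the token into a file that might get committed is to keep the `<TOKEN>` placeholder in a template and fill it in at run time. The sketch below is an assumption-heavy illustration: `fill_token`, `token_from_env`, and the `GITHUB_TOKEN` variable name are all hypothetical, and nothing in this thread says Pathway does such substitution itself.

```python
import os

PLACEHOLDER = "<TOKEN>"


def token_from_env(name: str = "GITHUB_TOKEN") -> str:
    """Read the token from an environment variable (hypothetical variable name)."""
    token = os.environ.get(name, "")
    if not token:
        raise RuntimeError(f"environment variable {name} is not set")
    return token


def fill_token(template: str, token: str) -> str:
    """Replace the <TOKEN> placeholder in a config template with a real token."""
    if PLACEHOLDER not in template:
        raise ValueError("template does not contain the <TOKEN> placeholder")
    if not token:
        raise ValueError("token is empty")
    return template.replace(PLACEHOLDER, token)


# Demo with a dummy value; a real run would call token_from_env() instead.
template = 'personal_access_token: "<TOKEN>"\n'
rendered = fill_token(template, "example-token")
print(rendered, end="")
```

The rendered text would then be written to a YAML file that stays out of version control (for example via `.gitignore`).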

Make sure that the path passed to pw.io.airbyte.read points to the YAML file above:

commits_table = pw.io.airbyte.read(
    "./connections/github.yaml",  # our yaml config
    streams=["commits"],
)

This should successfully run when you call pw.run().
Hope this helps.
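Since the path in the read call is relative, it depends on the directory the script is started from. A small pre-flight check can make a wrong path obvious before the connector starts. This is a sketch only: `resolve_config_path` is a hypothetical helper, and the temporary file just stands in for `./connections/github.yaml`.

```python
import tempfile
from pathlib import Path


def resolve_config_path(path: str) -> Path:
    """Return the absolute config path, or fail early with a readable message."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(
            f"Airbyte config not found: {resolved} (current directory: {Path.cwd()})"
        )
    return resolved


# Demo: a temporary file stands in for ./connections/github.yaml.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "github.yaml"
    config.write_text("source: {}\n")
    found = resolve_config_path(str(config))
    print(found.name)
```

In the script, the resolved path would be passed as the first argument of `pw.io.airbyte.read`, converted with `str(...)`.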

@xHishamSaeedx
Author

It still doesn't work.

YAML file:

source:
  docker_image: "airbyte/source-github"  # Here the airbyte connector type is specified
  config: 
    credentials:
      option_title: "PAT Credentials"  # The second authentication option you've uncommented
      personal_access_token: "<TOKEN>"  # Taken from https://github.com/settings/tokens
    repositories:
      - pathwaycom/pathway  # Pathway repository
    api_url: "https://api.github.com/"
  streams: commits

The file I ran was:

import pathway as pw

commits_table = pw.io.airbyte.read(
    "./connections/github.yaml",
    streams=["commits"],
)

pw.io.jsonlines.write(commits_table, "commits.jsonlines")
pw.run()

It was running perfectly before. It stopped working only a few days ago, without me making any changes to the YAML file. At that point, GitHub issues weren't being pulled, but comments and commits somehow still were.

@xHishamSaeedx
Author

I got this error a few days ago. Before that it was running smoothly, and the error appeared without any changes to the YAML file. I checked other streams such as commits and comments, and those were still working; then even they stopped.

I tried the HubSpot connector too, but that gave the same error.

The operating system is Linux (Ubuntu) and the architecture is x86_64.

@berkecanrizai
Contributor

Hey, I just ran the code from your test-pathway repository without any error.
A few things:

  • Please remove your personal_access_token from the public repo. GitHub may have already revoked the token, but please make sure you remove it.
  • The token being revoked, or lacking the "read public repos" permission, is the likely cause of the error.
  • Make sure Pathway is up to date; you can upgrade with pip install -U pathway.
  • If you have done all of this, could you send the new error? I believe it is due to the token.
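Whether a token has been revoked can be checked directly against the GitHub REST API, independently of Pathway. A minimal sketch, assuming a classic personal access token: a request to `https://api.github.com/user` returns 401 for a rejected token, and for classic tokens the `X-OAuth-Scopes` response header lists the granted scopes (fine-grained tokens do not report scopes this way). The function names are hypothetical helpers.

```python
import urllib.error
import urllib.request
from typing import Optional

API_URL = "https://api.github.com/user"


def build_token_check_request(token: str) -> urllib.request.Request:
    """Build an authenticated request for the current-user endpoint."""
    return urllib.request.Request(
        API_URL,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def interpret_token_check(status: int, scopes_header: Optional[str]) -> str:
    """Turn the HTTP status and scopes header into a readable verdict."""
    if status == 401:
        return "token rejected (revoked, expired, or mistyped)"
    if status != 200:
        return f"unexpected status {status}"
    scopes = [s.strip() for s in (scopes_header or "").split(",") if s.strip()]
    return "token accepted; scopes: " + (", ".join(scopes) or "none reported")


def check_token(token: str) -> str:
    """Perform the request; needs network access, so it is not called here."""
    request = build_token_check_request(token)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return interpret_token_check(
                response.status, response.headers.get("X-OAuth-Scopes")
            )
    except urllib.error.HTTPError as err:
        return interpret_token_check(err.code, None)


print(interpret_token_check(401, None))
```

Calling `check_token(...)` with the token from the YAML file should say whether GitHub still accepts it. Avoid printing or committing the token itself.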
