RFileOp and PythonFileOp log stdout and stderr output to stdout instead of S3 file #3227
Fixes #3222
I am marking this WIP because of a pending answer regarding whether log output should be written to a file and uploaded to S3, which currently results in a lint error in the checks. However, the change as it stands already solves the issue of no logs appearing in Airflow or KFP for R and Python generic pipeline nodes.
What changes were proposed in this pull request?
Hello! I have a problem with my Elyra pipelines not showing logs.
I have looked for issues on GitHub but couldn't find anything related, and I am not sure if I am missing some needed config or if this is a bug.
When I run an Elyra pipeline with two nodes, one a Jupyter notebook and one a Python script, only the Jupyter notebook node shows its logging. I am not sure why this is the case, as they are both managed the same way.
This error happens in both KFP and Airflow pipeline runs. It is related to the bootstrapper.py file that executes the scripts behind generic pipeline nodes: in this case, PythonFileOp and RFileOp use similar syntax (a subprocess.run call) in both the KFP and Airflow bootstrapper.py:
https://github.com/elyra-ai/elyra/blob/main/elyra/kfp/bootstrapper.py#L521
https://github.com/elyra-ai/elyra/blob/main/elyra/airflow/bootstrapper.py#L332
i.e.

```python
with open(r_script_output, "w") as log_file:
    subprocess.run(run_args, stdout=log_file, stderr=subprocess.STDOUT, check=True)
```
We only see the name of the processed file in stdout and no logs in the case of Python or R nodes.
This change uses the subprocess.run mechanism in such a way that all stderr and stdout output is captured in a variable whose content (i.e. the script run output) is then logged.
The stdout content is no longer written to the log output file in S3 bucket storage; or, better said, that file is now empty.
The open question remains why Notebook, Python, and R script run stdout was even put into a file and uploaded to S3 storage before this change. Right now, a file with empty content is uploaded to S3.
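The capture-and-log approach described above can be sketched as follows. This is a minimal illustration, not the actual patch; `run_args` is a stand-in for the command the real bootstrapper builds from the node's script (e.g. `["Rscript", r_script]` or `[sys.executable, python_script]`).

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger(__name__)

# Hypothetical run_args; in the real bootstrapper this points at the
# user's R or Python script rather than an inline command.
run_args = [sys.executable, "-c", "print('hello from script')"]

# Capture stdout and stderr together in memory instead of redirecting
# them to a file, then log the combined output so it shows up in the
# container log output that Airflow and KFP surface to the user.
result = subprocess.run(
    run_args,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
    check=True,
)
logger.info(result.stdout)
```

Merging stderr into stdout via `stderr=subprocess.STDOUT` preserves the interleaving the old file-based redirect had, while `logger.info` routes it to the process's own stdout where the orchestrator can collect it.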
How was this pull request tested?
Added the changed bootstrapper.py file for Airflow to a runtime image and used it in conjunction with Elyra and Airflow 2.8.2 to check whether the logs now appear in the container log output / Airflow Logs tab.
Results were positive and as intended, documented here:
#3222 (comment)
The question as to why script run log output is written back to S3 is posed there and in the Elyra Slack general channel as well.