Fix GCP step logging #2366
| GitGuardian id | Secret | Commit | Filename |
| --- | --- | --- | --- |
| - | Google Cloud Keys | 82c26cb | `zenml-key.json` |
| - | Google Cloud Keys | c57dd68 | `zenml-key.json` |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn the best practices here.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following these best practices for managing and storing secrets, including API keys and other credentials;
- installing secret detection on pre-commit to catch secrets before they leave your machine and to ease remediation.
I accidentally included my GCP keys in this PR. I later made another commit to delete them, but GitGuardian still shows an error. I have not contributed to open source before, so I would appreciate some assistance with this.
Thanks for the PR!
Two questions:
- Will this affect other artifact stores like S3?
- Is this still performant? Somehow I feel like we are doing too many IO operations... Maybe we should slow it down a bit?
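One way to put numbers on the I/O concern: if every flush re-uploads the entire log file (old contents plus the new buffer) instead of appending, the bytes written grow quadratically with the number of flushes, while a true append grows linearly. The sketch below assumes a fixed chunk size per flush; `bytes_uploaded` is a hypothetical helper for illustration, not part of ZenML.

```python
def bytes_uploaded(num_flushes: int, chunk_bytes: int, rewrite: bool) -> int:
    """Return the total bytes written after `num_flushes` flushes of `chunk_bytes` each.

    rewrite=True models re-uploading the whole log (old contents + new chunk)
    on every flush; rewrite=False models a true append that writes only the chunk.
    """
    if rewrite:
        # Flush i re-uploads i chunks, so the total is
        # chunk_bytes * (1 + 2 + ... + n) = chunk_bytes * n * (n + 1) / 2.
        return chunk_bytes * num_flushes * (num_flushes + 1) // 2
    return chunk_bytes * num_flushes


# Hypothetical workload: 100 flushes of 100 bytes each.
print(bytes_uploaded(100, 100, rewrite=True))   # 505000
print(bytes_uploaded(100, 100, rewrite=False))  # 10000
```

With these assumed numbers the rewrite strategy writes about 50 times more data than appending, and the gap widens as a step logs more often.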
Thanks for reviewing, @htahir1!
Thanks
@adtygan As the keys remain in the git history, please invalidate them in GCP ASAP to keep your cloud account safe. For performance, a good approach might be to benchmark some pipeline runs with varying amounts of logging. I have noticed that using a rich or TQDM progress bar slows things down a LOT, and I'd love some benchmarks of the local store vs. the GCS store for varying scripts :-)
Thanks for the suggestion, @htahir1. I have invalidated my key. Regarding the benchmarks, please give me some time; I will get back to this and update you.
Hello @htahir1, I want to confirm that I understood you correctly. I'm planning to measure the running time for logging 100, 1,000, and 10,000 lines. For each of these, I'm going to measure the running time for local, GCP, local with TQDM, and GCP with TQDM, which gives 12 runtime values in total. I need to measure these 12 values for both the current version of the code and my PR version. Is this correct? Thanks.
Yes, this is correct!
Hamza Tahir, Co-Creator & CTO, ZenML
In reply to Aditya Ganesh Kumar's email of Sat 3 Feb 2024, 14:56 (the question above).
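The agreed benchmark plan (3 line counts × {local, GCP} × {without, with TQDM} = 12 configurations, measured once on the current code and once on the PR branch) can be organised with a small harness. This is a minimal sketch under stated assumptions: the constant names, the default of 10 repeats, and the placeholder workload are hypothetical; a real run would replace the placeholder with a pipeline that logs the given number of lines to the given store.

```python
import itertools
import statistics
import time
from typing import Callable, List, Tuple

# Dimensions of the benchmark plan: 3 line counts x 2 stores x 2 progress-bar settings.
LINE_COUNTS = [100, 1_000, 10_000]
STORES = ["local", "gcp"]
USE_TQDM = [False, True]


def benchmark_matrix() -> List[Tuple[int, str, bool]]:
    """Return every (line_count, store, use_tqdm) combination (12 in total)."""
    return list(itertools.product(LINE_COUNTS, STORES, USE_TQDM))


def mean_runtime(workload: Callable[[], object], repeats: int = 10) -> float:
    """Run `workload` `repeats` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


if __name__ == "__main__":
    for lines, store, tqdm_on in benchmark_matrix():
        # Placeholder workload (hypothetical): swap in a pipeline run that logs
        # `lines` lines to `store`, with or without a TQDM progress bar.
        elapsed = mean_runtime(lambda: sum(range(lines)), repeats=3)
        print(f"{lines:>6} lines | {store:<5} | tqdm={tqdm_on!s:<5} | {elapsed:.6f}s")
```

Averaging over several repeats smooths out network jitter, which matters most for the GCS runs; running the same script on both branches gives the 24 numbers needed for the comparison.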
I did part of the benchmarking and noticed several issues. Here are the details of the run writing 100 lines of logs (averaged over 10 runs):
I'm noticing two big issues with my fix.
I need more time to look into this.
@adtygan note that you're getting some linting failures on the CI. If you could fix those as well, that'd be great!
Hello @strickvl, I don't think my current code can be optimized to improve performance. Instead, I looked at the potential solution you mentioned in the initial post of the issue (#2211 (comment)), and it looks like the best choice. However, I want to clarify how to incorporate it. If I understand correctly, you are suggesting opening the log file in write mode and then proceeding with logging, which writes all the contents at once. I have tested this on a GCP stack and it works. The only issue I see is that it will overwrite past logs. Can I work on a solution where we create a temporary file to store the logs and then, in the exit() method, append this file's contents to the main log file? Thanks
@adtygan this sounds like a reasonable plan to try out! Would love to see how this new approach benchmarks against the old one.
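The temporary-file plan can be sketched as a context manager: log lines are appended cheaply to a local temp file while the step runs, and the main log is rewritten only once, on exit, so past logs are kept. This is a hypothetical illustration, not ZenML code; the class name is invented, and local `open`/`os.path.exists` stand in for whatever artifact-store file I/O the real fix would use.

```python
import os
import tempfile


class BufferedStepLog:
    """Hypothetical context manager: buffer a step's log lines in a local
    temporary file, then merge them into the main log once, on exit."""

    def __init__(self, main_log_path: str) -> None:
        self.main_log_path = main_log_path
        self._tmp = None

    def __enter__(self) -> "BufferedStepLog":
        # Appends go to a local temp file, which is cheap while the step runs.
        self._tmp = tempfile.NamedTemporaryFile(
            "w+", suffix=".log", delete=False, encoding="utf-8"
        )
        return self

    def write(self, line: str) -> None:
        self._tmp.write(line + "\n")

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Read back everything buffered during the step, then drop the temp file.
        self._tmp.seek(0)
        new_logs = self._tmp.read()
        self._tmp.close()
        os.remove(self._tmp.name)

        # Keep past logs: one read of the old contents and one full write of
        # old + new, so even a store without in-place append works.
        old_logs = ""
        if os.path.exists(self.main_log_path):
            with open(self.main_log_path, encoding="utf-8") as f:
                old_logs = f.read()
        with open(self.main_log_path, "w", encoding="utf-8") as f:
            f.write(old_logs + new_logs)
        return False  # do not swallow exceptions raised inside the step
```

The design trade-off is that logs only reach the main file when the step exits, in exchange for a single upload per step instead of one per flush.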
Sorry @htahir1, I took a break from work for a few weeks and did not keep you updated. I will get back to working on the issue.
Describe changes
The issue arises because objects in GCS are immutable (see this Stack Overflow thread), so a log file cannot be appended to in place. To fix the issue, I rewrite the existing file with its old contents and the buffer's contents appended together.
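The read-then-rewrite fix can be sketched against a toy in-memory store. The class and function names are hypothetical and only model the behaviour described here (an object is replaced as a whole and never appended to); they are not ZenML's artifact-store API.

```python
from typing import Dict


class ImmutableObjectStore:
    """Toy stand-in for an object store such as GCS (hypothetical class):
    an object can be created or replaced as a whole, never appended to in place."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def exists(self, path: str) -> bool:
        return path in self._objects

    def read(self, path: str) -> bytes:
        return self._objects[path]

    def write(self, path: str, data: bytes) -> None:
        # Every write replaces the whole object.
        self._objects[path] = data


def append_by_rewrite(store: ImmutableObjectStore, path: str, buffer: bytes) -> None:
    """Emulate an append: read the old contents, then rewrite old + buffer."""
    old = store.read(path) if store.exists(path) else b""
    store.write(path, old + buffer)


store = ImmutableObjectStore()
for chunk in (b"step started\n", b"step finished\n"):
    append_by_rewrite(store, "logs/step.log", chunk)
print(store.read("logs/step.log"))  # b'step started\nstep finished\n'
```

Each call re-uploads everything written so far, which is why this approach is correct on an immutable store but gets more expensive the more often the buffer is flushed.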
Code to test the change (credits: @strickvl):
Pre-requisites
Please ensure you have done the following:
- My branch is based on `develop`, and the open PR is targeting `develop`. If your branch wasn't based on `develop`, read the Contribution guide on rebasing your branch to `develop`.
Types of changes