Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: performance and memory optimisations #493

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

nothingbutlucas
Copy link
Contributor

@nothingbutlucas nothingbutlucas commented Apr 24, 2024

Hi!, how are you?

Well, I was looking at the project roadmap and one of the points is: Performance and memory optimisations of the Python implementation

This PR intends to work on this point. I started with the utilities. I'm uploading this PR now so I don't suddenly fall with 100 commits and a lot of changes.
I hope the commits are explanatory and clear. If there is something that can be fixed or done differently, please let me know.

I think best way to move forward (If you want that), would be to keep working on this PR/Branch and keep adding commits... What do you think?

I keep working on the utilities or is there any particular module that consumes a lot of memory or performs badly?

By the way, I ran the tests and the 34 ran well with these new changes.

Also, I did not see any style guide defined, I saw that some files are formatted as black would do and others in another way, I think it would be good to define this, at least to start and that there is a coherence between the files of the repo.

Thank you!

EDIT: Ok, as far as I see, the tests do not cover the utilities, maybe it would be good to add tests for the utilities as well, right?

Not only more concise and shorter, it's notably faster

Signed-off-by: nothingbutlucas <69118979+nothingbutlucas@users.noreply.github.com>
According PEP8 this is the recommended way to do it. I'm 99% sure that this does not speed up the code, but we are comparing identity, not equality, so "is" and "is not" must be used.

Signed-off-by: nothingbutlucas <69118979+nothingbutlucas@users.noreply.github.com>
Signed-off-by: nothingbutlucas <69118979+nothingbutlucas@users.noreply.github.com>
Strings in Python are immutable so, the "+" operation means that we need to create the string again and attaching the old content. Per "+". We can replace this by using join() or built-in functions or arguments of functions, like "sep" in print()

Signed-off-by: nothingbutlucas <69118979+nothingbutlucas@users.noreply.github.com>
This will make it easier for us to maintain and understand the main function. Also deleting duplicated code

Signed-off-by: nothingbutlucas <69118979+nothingbutlucas@users.noreply.github.com>
This will remove the unnecesary import of time, and a little bit of format

Signed-off-by: nothingbutlucas <69118979+nothingbutlucas@users.noreply.github.com>
This may seems like a lot, but the only think that I change here was the if/else block. I move the else statement at the beginning and if that condition is true, will exit the function. This is just for code clarity, I think that this change should not improve the speed or performance of the code.

Signed-off-by: nothingbutlucas <69118979+nothingbutlucas@users.noreply.github.com>
This will improve code execution because nested loops are slower than list/dictionary comprehensions.

Signed-off-by: nothingbutlucas <69118979+nothingbutlucas@users.noreply.github.com>
The variable sorting can have only 1 value at a time. So does not make sense to ask on every if statement the value of sorting. If one statement is True, then, the other would be False.

Signed-off-by: nothingbutlucas <69118979+nothingbutlucas@users.noreply.github.com>
Same as previous commits. "".join() is better that "string" + "another_string" + "another_string_2", because strings are immutable.
Also, changed for loops with list comprehension. This is much faster and would improve performance.

Signed-off-by: nothingbutlucas <69118979+nothingbutlucas@users.noreply.github.com>
print(msg+" ", end=" ")
while (timeout == None or time.time()<timeout) and not until():
print(msg, end=" ")
while (timeout is None or time.time()<timeout) and not until():
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this until() isn't actually a function, so the ()'s should be removed. unless I'm missing something about the default value None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!, yeah, I notice that, actually, is defined like this, with a default value of None:

imagen

The function spincheck declared in the middle of the code and is the value used on until()

imagen

Probably this needs a refactor to...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Digging further, I see there's the spin() function in rnx.py that uses the until kwarg and calls to it (only in rnx.py) are using a lambda. This may be where that pattern came from.

@markqvist
Copy link
Owner

Ok, as far as I see, the tests do not cover the utilities, maybe it would be good to add tests for the utilities as well, right?

YES. Adding tests to cover the utilities would be VERY welcome too :)

I think best way to move forward (If you want that), would be to keep working on this PR/Branch and keep adding commits... What do you think?

Agreed, that is a good approach.

Also, I did not see any style guide defined

As more developers are joining the project lately, I agree it's a good idea to have a more consistently defined style. Currently the definition has been "whatever @markqvist figured was the correct style". To give myself a bit of credit, though, it's pretty consistent ;)

Either way, actually coming up with a sound decision that can be applied to everything here will have to wait a bit, since it will require me looking through the style specs for all the contenders and making an informed decision, which I don't have the time and resources for currently.

We'll get there, and I'm glad you brought it up, but for now, I'm OK with you keeping this work in the Black style, if that is what you prefer, since it seems pretty close to the style that has been in use already.

@markqvist
Copy link
Owner

markqvist commented May 1, 2024

All of the above being said, though, I don't find it particularly useful at present to simply create a lot of commits that change code style. I've rejected this before, and I will likely continue to do so for the foreseeable future.

PRs that actually create performance and memory improvements are very, very welcome, and keeping those additions and changes in a specific style is totally acceptable.

But altering hundreds of lines of codes solely to match a given style is not a very good idea, since I will still have to meticulously trawl through every changed line and evaluate it for potential logic and/or security problems. This uses huge amounts of my time as a maintainer, and is not a very good way to move the project forward.

Someone reading this may think "well, but it's just formatting and style changes, that can't be a problem", but in reality it really can be a huge problem, and overwhelming maintainers with large amounts of seemingly benevolent and/or functionally ambivalent code changes is a common way to inject vulnerabilities and problems into projects.

Please note, I am in no way implying that anyone here is intending to do that, but that no matter what, it is something I need to be aware of, and exact the same amount of vigilance towards.

This means that commits that contain a lot of formatting or code style changes are likely to be heavily down-prioritized in favor of commits and PRs that provide functional benefits or improvements.

TL;DR: Don't change code style and formatting of code that is already working, it's likely to have your PRs rejected or put at the bottom of the queue for merging.

@nothingbutlucas
Copy link
Contributor Author

...common way to inject vulnerabilities and problems into projects.

Yes, you're right, especially with everything that's been going on with xz...

Are there any commits in this PR that you would like to see reversed where you see the changes are mostly style or format?

I tried to do very little of that and focus on improving performance, but some unnecessary changes may have been introduced...

I'm asking so I can move forward and keep adding useful commits and that it is not a burden to merge. Thanks!

@markqvist
Copy link
Owner

Are there any commits in this PR that you would like to see reversed where you see the changes are mostly style or format?

No, don't worry about that for now. I've reviewed most of it already, and it looks fine, so let's not waste time on changing it back. Better to focus on the improvements :)

Signed-off-by: nothingbutlucas <69118979+nothingbutlucas@users.noreply.github.com>
@markqvist
Copy link
Owner

Okay, wait. Hold on. :)

Sorry, I should have been more clear apparently. Changing string concatenation is not going to create any real performance or memory usage improvements, and is something I would consider purely stylistic and formatting-oriented.

I know this is all done in the very best intentions, and that in theory, it is more performant to do it this way, but in practice, it will have no impact here, and it is again just many, many lines of code I have to spend time reviewing, for a gain that is realistically very limited.

Refactoring and optimisations that can be objectively demonstrated to have a meaningful improvement on performance or resource consumption is what we should focus our time and effort on.

Thanks a lot for your willingness to create these PRs, but I have to prioritise my time and resources quite tightly at the moment, and I think we need to stay very focused on effective use everyone's development time.

@nothingbutlucas
Copy link
Contributor Author

Ah okey. I didn't understand, sorry. I'll revert the last commit then

@markqvist
Copy link
Owner

No worries, I should have been clearer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants