Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration to Gevent #2029

Closed
wants to merge 8 commits into from
Closed

Conversation

Constantin1489
Copy link
Contributor

@Constantin1489 Constantin1489 commented Dec 3, 2023

  1. If you check the https://github.com/Constantin1489/changedetection.io/actions/runs/7076892148, it shows every test failed randomly. (sometimes failed early, sometimes went further luckily. I love it.)
image image image
  1. I turned off the monkey patch for threads. Our major threads are mostly lxml-dependent(lxml is rooted in C. https://stackoverflow.com/a/25275718/20307768). Also, as we already noticed our threads run lxml-dependent modules. But the problem is I don't know whether "I turned off the monkey patch for threads" is right. When I remove all monkeys or fully bring monkeys, the test also failed, BTW. for those cases, I assume the reason will also differ. To be honest , I don't know really this kind of the level.
image

I just assume and believe I know the reason. I don't know what to do currently.

@Constantin1489
Copy link
Contributor Author

Also for the longer test, there is some stackoverflow answer. But I didn't test it.

@Constantin1489 Constantin1489 changed the title [WIP] Migration to Gevent [WIP] Migration to Gevent (without python3.12) Dec 3, 2023
eventlet.wsgi.server(eventlet.listen((host, int(port)), s_type), app)

ssl_args = {}
sock = WSGIServer.get_listener((host, port), family=s_type)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested with Netstat, ipv6 also works(?). FYI, http://[::1]:5000

sys.exit()
cd_server.stop()
cd_server.close()
print('Shutdown Success', file=sys.stderr)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just for logging.

@dgtlmoon
Copy link
Owner

dgtlmoon commented Dec 3, 2023

by the way, the application MUST stay as a single-threaded application

this is because the data-structure is stored as a single dict, and running separate threads in gevent/flask means each thread gets its own datastructure, which is bad.

this explains your random failures

it is more like Redis than a traditional system where it has an external database

@@ -1,4 +1,4 @@
eventlet>=0.33.3 # related to dnspython fixes
gevent
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related to dnspython fixes

also means we can maybe un-pin some package versions.. i wonder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh the dnspython I will apply it later.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eventlet/eventlet#781 is that 'dnspython' problem i believe, but actually it may not be a 'thing' anymore

@Constantin1489
Copy link
Contributor Author

Constantin1489 commented Dec 3, 2023

then the reason why my CD works well is I don't do massive things frequently, haha.

@dgtlmoon
Copy link
Owner

maybe gunicorn is an alternative too? --threads=1

@Constantin1489
Copy link
Contributor Author

Constantin1489 commented Dec 23, 2023

@dgtlmoon gunicorn won't work with Windows. https://github.com/benoitc/gunicorn/blob/26aba9ed9d55f6465fd2271169b71bc119ea8186/pyproject.toml#L17
Also, gunicorn introduces itself as "Gunicorn 'Green Unicorn' is a Python WSGI HTTP Server for UNIX." (https://github.com/benoitc/gunicorn/blob/26aba9ed9d55f6465fd2271169b71bc119ea8186/README.rst?plain=1#L20)

@Constantin1489
Copy link
Contributor Author

Constantin1489 commented Jan 19, 2024

I'm not sure about when I can finish this (maybe 2-3 months?).
I'm currently trying to expand my knowledge about concurrency with books and so on. (BTW I finally figured out what context the single-threaded means.)
If you have good resources, I'd be glad to look at them.

Please ignore any CI activities related to this PR until I'm prepared. Thanks.
If there is any question, please ask me to make my understanding better.

@Constantin1489 Constantin1489 marked this pull request as draft January 21, 2024 12:02
@dgtlmoon
Copy link
Owner

dgtlmoon commented Jan 22, 2024

Currently I'm focusing threading and asyncio

The main problem here is that one thread is holding all the data, usually one would use something like mysql or whatever to keep the data in a resource that's accessible to any thread..

I would prefer to keep the application running without any need for mysql/mongo/etc/etc

The main solution would be that the data structure needs to be in some shared-memory resource between each thread, but this is unreliable a little bit in windows/mac.. unless theres some python library that will help

https://docs.python.org/3/library/multiprocessing.shared_memory.html for example

or move to gevent but keep exactly 1 thread only (like it is now)

i dont think there's much to win right now... better to move to multilang or some other feature perhaps (but ofcourse its up to you :) )

@Constantin1489
Copy link
Contributor Author

Constantin1489 commented Jan 22, 2024

Yeah because Python is the single-threaded model, one process-one main thread (and C extension module..). That's something tempting. Okay, why not Babel first? I will look at it. If my intuition is right, I can lay the foundation this week?

@dgtlmoon
Copy link
Owner

Okay, why not Babel first? I will look at it. If my intuition is right, I can lay the foundation this week?

sure :) thanks a billion :)

Co-authored-by: dgtlmoon <leigh@morresi.net>
@dgtlmoon
Copy link
Owner

https://wsgi.readthedocs.io/en/latest/definitions.html

since gevent just implements the WSGI standard, the flags are the same.

https://wsgi.readthedocs.io/en/latest/definitions.html#envvar-wsgi.multithread

https://wsgi.readthedocs.io/en/latest/definitions.html#envvar-wsgi.multiprocess

I think wsgi.multiprocess should be True because one changedetection.io process can have some child processes, like when it launches playwright

@dgtlmoon
Copy link
Owner

https://code.google.com/archive/p/modwsgi/wikis/ProcessesAndThreading.wiki

ahh multiprocess means something else in this context, so yeah, leave it False

@dgtlmoon
Copy link
Owner

Some history on the pinned dnspython version over at #1314 , it could be nice to remove the pin and let the version float - most OSs usually have this installed by default and any modern version should work

@Constantin1489
Copy link
Contributor Author

See also: https://eventlet.readthedocs.io/en/latest/migration.html#alternatives

If you really want to continue with Eventlet’s pretend-to-be-blocking approach, you can use gevent. But keep in mind that the same technical issues that make Eventlet maintenance unsustainable over the long term also apply to Gevent.

@dgtlmoon
Copy link
Owner

aaah... yeah thanks for the find, it is true that async code is better

Pretending to look like a blocking API while actually using an event loop underneath requires exact emulation of an ever-changing and ever-increasing API footprint, which is fundamentally unsustainable for a volunteer-driven open source project. This is why Eventlet is discouraging new users.

ok so eventlet and gevent are actually blocking-style wrappers ontop of what is really a asyncio on the "ground level" in python... so yeah thats kinda tough

@dgtlmoon
Copy link
Owner

I still think gevent is the way to go, it looks well maintained https://www.gevent.org/changelog.html

we can add some comment that "gevent is just a blocking wrapper ontop of what is really a asyncio interface at its heart, but due to our not-shared-across-processes datastructure (so you dont ever have to waste time setting up mysql etc) we have a single threaded model"

@dgtlmoon
Copy link
Owner

I'm learning more about doing fun stuff with python dicts, so i hope we can move to maybe even a sqlite backend in the future

@Constantin1489 Constantin1489 marked this pull request as ready for review February 16, 2024 10:38
@Constantin1489 Constantin1489 changed the title [WIP] Migration to Gevent (without python3.12) Migration to Gevent Feb 16, 2024
@Constantin1489
Copy link
Contributor Author

I closed this because I saw the gevent won't work for future python version too.(there was CPython issue similar to eventlet)

@dgtlmoon
Copy link
Owner

I closed this because I saw the gevent won't work for future python version too.(there was CPython issue similar to eventlet)

Understood, I'm sort of thinking it may be time to start migrating to SQLite in the backend and do things more traditionally so that we can run it multithreaded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants