aiohttp-ip-rotator #6

Closed
vincenthawke opened this issue Sep 23, 2021 · 10 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@vincenthawke

vincenthawke commented Sep 23, 2021

Hey, I am back again :) I was happily parsing away for the last few days and have already blown past the free tier, but I want to scale up my operation further to make it even faster. I am limited to a maximum of 60 workers in the pool, so I decided to rewrite it from multiprocessing to asynchronous concurrency. That is when I realized that I can't use the Requests module, but your module was made to work only with Requests.

How difficult would it be to rewrite it to make it compatible with aiohttp? Is there any way to make it work with Requests, even though the module is inherently blocking at the socket level? The higher the latency, the more beneficial it would be to move this work from multiprocessing to an asynchronous loop. I could try to work on it, but I am new to Python, so I would appreciate any kind of feedback or advice you can give me.

I just found Dask and am looking into whether it could help me keep using Requests.
Another possibility is to rent a server that has enough virtual cores to go beyond 60 workers.
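
To illustrate the latency point above, here is a minimal sketch that simulates latency-bound requests with `asyncio.sleep` instead of real network calls. The `fake_request` coroutine, the 0.05 s latency, and the count of 200 tasks are assumptions chosen for illustration; nothing here uses requests-ip-rotator or aiohttp.

```python
import asyncio
import time


async def fake_request(i: int, latency: float = 0.05) -> int:
    """Stand-in for a network call: waits `latency` seconds, then returns i."""
    await asyncio.sleep(latency)
    return i


async def run_all(n: int) -> list[int]:
    # All n coroutines wait concurrently on a single thread.
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


def timed_run(n: int = 200) -> tuple[list[int], float]:
    # Returns the results in submission order plus the elapsed wall time.
    start = time.perf_counter()
    results = asyncio.run(run_all(n))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_run()
    print(f"{len(results)} simulated requests in {elapsed:.2f}s")
```

Run one after another, the same 200 waits would take about 200 × 0.05 s = 10 s. Because the waits overlap in the event loop, the total stays close to a single latency, which is why higher latency favors the asynchronous approach.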

@Ge0rg3
Owner

Ge0rg3 commented Sep 23, 2021

Hi!
If you go to the PR tab, the latest PR shows a version which supports this. I plan to complete the changes and merge it early next month if you'd rather wait; if not, I'd recommend cloning the PR'd fork and using the example given in the thread.
Hope this helps! :)

@vincenthawke
Author

vincenthawke commented Sep 23, 2021

Great stuff! I already installed the fork and am now going through the class Harry wrote. I am slightly lost due to also being new to every other module in the project, but I think I'll eventually get it, emphasis on eventually. A usage example for the new class would be much appreciated and would explain what everything does at a surface level.

I'm currently stuck on where labeled_urls comes from; it's really hard to follow when there are no variable or return type annotations.

@ZOV-code

Hi!
I was about to start writing something similar but then found this project. Great job!
How is your new aiohttp project going? Is a release close?

@Ge0rg3
Owner

Ge0rg3 commented Nov 16, 2021

Hey @ZOV-code, still very much a WIP and nothing publicly available for 2-4 months I'm afraid!

@ZOV-code

@Ge0rg3 thank you for the information

@Ge0rg3 added the help wanted and enhancement labels Nov 18, 2021
@Ge0rg3 mentioned this issue Dec 1, 2021
@Ge0rg3 pinned this issue Dec 30, 2021

@jherrerogb98

Hi! Is there any update on this? I am trying to run my code asynchronously, but I can't find a way to do it with this approach. Thank you.

@Ge0rg3
Owner

Ge0rg3 commented May 14, 2022

Hi @jherrerogb98, I probably won't get around to implementing this for another couple of months at the very least. However, multithreading via threading, or other parallel approaches such as multiprocessing, should work fine. I hope this helps 😄

@jherrerogb98

Okay, thank you!

@D4rkwat3r

Hello! I also needed a similar asynchronous library, so I wrote an implementation called aiohttp-ip-rotator. You can find it here: https://github.com/D4rkwat3r/aiohttp-ip-rotator

@Ge0rg3
Owner

Ge0rg3 commented Jan 16, 2024

Hi all, closing this issue as the aiohttp code will not be merged into this project. If you are set on aiohttp, then the aiohttp-ip-rotator lib is probably a good fit.

Depending on your use case, aiohttp may be faster than running requests in threads.

However, you can also run requests concurrently with this lib using a thread pool:

import requests as rq
import concurrent.futures
from requests_ip_rotator import ApiGateway

site = "https://bbc.co.uk"
gateway = ApiGateway(site)
gateway.pool_connections = 30
gateway.pool_maxsize = 30
gateway.start()

session = rq.Session()
session.mount(site, gateway)

with concurrent.futures.ThreadPoolExecutor(max_workers=25) as executor:
    futures_map = {}
    # Trigger 100 requests
    for i in range(100):
        url = site + "/" + str(i)
        future = executor.submit(session.get, url)
        futures_map[future] = url

    # Collect results
    for future in concurrent.futures.as_completed(futures_map):
        # Check for error
        error = future.exception()
        url = futures_map[future]
        if error:
            print(f"Error for {url}: {error}")
            continue

        # Get response
        response = future.result()
        print(f"{url} - {response.status_code}")
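
If the surrounding program is already built on asyncio, the same kind of blocking call can be offloaded to worker threads from a coroutine. This is a minimal sketch, assuming Python 3.9+ for `asyncio.to_thread`; `blocking_get` is a hypothetical stand-in for `session.get` so the example runs without a gateway, and the limit of 25 simply mirrors `max_workers` in the thread pool example.

```python
import asyncio
import time


def blocking_get(url: str) -> str:
    """Hypothetical stand-in for a blocking call such as session.get(url)."""
    time.sleep(0.01)  # simulate network latency
    return f"fetched {url}"


async def fetch(url: str, limit: asyncio.Semaphore) -> str:
    # The semaphore caps how many blocking calls are in flight at once.
    async with limit:
        # Run the blocking function in a worker thread so the event loop stays free.
        return await asyncio.to_thread(blocking_get, url)


async def fetch_all(urls: list[str], max_concurrent: int = 25) -> list[str]:
    limit = asyncio.Semaphore(max_concurrent)
    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(fetch(u, limit) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(100)]
    for line in asyncio.run(fetch_all(urls)):
        print(line)
```

The calls still run in threads, so this is not a replacement for a native aiohttp client; it only lets blocking code be awaited. The default executor used by `asyncio.to_thread` has its own thread limit, which may be lower than the semaphore value.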
