This repository has been archived by the owner on Apr 15, 2020. It is now read-only.

Is it possible to use chrome-prerender as a squid parent in a proxy sandwich setup? #35

Open
olidietzel opened this issue Dec 28, 2017 · 5 comments

@olidietzel

I would like to use chrome-prerender as the parent in a proxy sandwich configuration (to cache as much as possible), but Squid, acting as the client, sends its GET requests in a different form than prerender expects. Any ideas what to configure where?

Curling works fine:
[2017-12-28 17:33:27 +0100] - (sanic.access)[INFO][1:2]: GET http://127.0.0.1:3000/http://www.nytimes.com/ 200 446977
2017-12-28 17:33:27,944 INFO sanic.access.log_response:325

Squid fails:
[2017-12-28 17:34:06 +0100] - (sanic.access)[INFO][1:2]: GET http://www.nytimes.com/ 400 11
2017-12-28 17:34:06,510 INFO sanic.access.log_response:325
[2017-12-28 17:34:11 +0100] [23436] [INFO] KeepAlive Timeout. Closing connection.
2017-12-28 17:34:11,510 INFO root.keep_alive_timeout_callback:193 KeepAlive Timeout. Closing connection.
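The difference between the two log entries can be reproduced offline. The following is a minimal sketch, assuming (this is not confirmed by the thread) that prerender reads the page to render from the request path after the leading slash. The helper name `embedded_target` is hypothetical and not part of chrome-prerender.

```python
from urllib.parse import urlsplit


def embedded_target(logged_url: str) -> str:
    """Return whatever follows the leading "/" of the request path.

    Assumption: the service expects the page URL to be embedded in the
    path, as in the curl request from the log above.
    """
    path = urlsplit(logged_url).path
    return path[1:] if path.startswith("/") else path


# curl style: the page URL is embedded in the path of the prerender URL.
curl_style = embedded_target("http://127.0.0.1:3000/http://www.nytimes.com/")

# Squid style: the request URL is the page URL itself, so the path is just "/".
squid_style = embedded_target("http://www.nytimes.com/")

print(repr(curl_style), urlsplit(curl_style).hostname)
print(repr(squid_style), urlsplit(squid_style).hostname)
```

In the Squid case nothing is left after the slash, so no hostname can be parsed from it, which would be consistent with the 400 response in the log.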

@olidietzel
Author

This minimal patch in app.py was good enough for a proof of concept. :)

if not parsed_url.hostname:
    url = request.url
    #return response.text('Bad Request', status=400)

@messense
Member

messense commented Jan 2, 2018

I haven't used Squid with chrome-prerender before, so I am not sure what's wrong. Could you elaborate?

@olidietzel
Author

I wanted to see the render quality of the Chrome engine with my own eyes in a browser, and to do a test spider run on an existing Angular SPA with an old-school tool, HTTrack, to see whether the whole Angular app is crawlable.

For both I needed a regular proxy interface, so I put a Squid proxy in front of prerender. Squid is configured to fetch all static file requests directly and to send the remaining requests to its chrome-prerender parent proxy.

As a proxy client, Squid sends requests to its "parent proxy" (prerender in this case) in a different form than prerender expects, so I had to make prerender understand them:

[2017-12-28 17:34:06 +0100] - (sanic.access)[INFO][1:2]: GET http://www.nytimes.com/ 400 11

Worked by replacing

if not parsed_url.hostname:
    return response.text('Bad Request', status=400)

with

if not parsed_url.hostname:
    url = request.url
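One side effect of falling back to the request URL unconditionally is that a request aimed at prerender itself (for example its own root URL) would then be rendered too. A slightly more defensive version of the same idea might look like the sketch below. It is an illustration under assumptions, not the project's code: the function name `resolve_render_url` and the `own_hosts` parameter are hypothetical, and the set of hosts that count as "prerender itself" has to be adapted to the actual deployment.

```python
from typing import Optional
from urllib.parse import urlsplit


def resolve_render_url(
    embedded: str,
    request_url: str,
    own_hosts: frozenset = frozenset({"127.0.0.1", "localhost"}),
) -> Optional[str]:
    """Pick the URL to render, or return None if the request is unusable.

    embedded    -- URL taken from the request path (curl-style requests)
    request_url -- full request URL (what a proxy client such as Squid sends)
    own_hosts   -- hypothetical list of hostnames the service itself answers on
    """
    # Normal case: the page URL was embedded in the path.
    if urlsplit(embedded).hostname:
        return embedded

    # Proxy case: fall back to the request URL, but only for http(s) targets
    # that do not point back at the rendering service itself.
    parts = urlsplit(request_url)
    if (
        parts.scheme in ("http", "https")
        and parts.hostname
        and parts.hostname not in own_hosts
    ):
        return request_url

    # Caller should answer with 400 Bad Request, as the original code did.
    return None
```

A handler could call this helper and return the original `Bad Request` response whenever it yields `None`.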

@messense
Member

messense commented Jan 2, 2018

Would you like to send a PR to fix it?

@olidietzel
Author

olidietzel commented Jan 2, 2018

First: thanks a lot for this great piece of software!

Second: I am just a dino admin, and this would be my first PR here. What I did was a crude hack; it should be done properly by a more competent coder in order to minimize potential side effects! :)

If someone wants to do this and needs to configure a Squid proxy for testing, this is the squid.conf I used. The relevant parts are the cache_peer directive (Squid runs locally on the same VM as prerender) and the ACLs named "static" and "direct", which the always_direct rules refer to:

[root@prerender ~]# cat /etc/squid/squid.conf

###
### Recommended minimum configuration:
###
cache_peer 127.0.0.1 parent 8000 0 no-query no-digest
### Example rule allowing access from your local networks.
### Adapt to list your (internal) IP networks from where browsing
### should be allowed
acl localnet src 10.0.0.0/8 # RFC1918 possible internal network
acl localnet src 172.16.0.0/12 # RFC1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
acl localnet src fc00::/7 # RFC 4193 local private network range
acl localnet src fe80::/10 # RFC 4291 link-local (directly plugged) machines

acl SSL_ports port 443
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT

acl static urlpath_regex \.(html|htm|css|ico|js|gif|jpg|jpeg|png|xml|json|woff|JPG|JPEG|woff2|ttf|eot|svg)(\?.*)?$
acl direct dstdomain fonts.googleapis.com
###
### Recommended minimum Access Permission configuration:
###
### Deny requests to certain unsafe ports
http_access deny !Safe_ports

### Deny CONNECT to other than secure SSL ports
http_access deny CONNECT !SSL_ports

### Only allow cachemgr access from localhost
http_access allow localhost manager
http_access deny manager

### We strongly recommend the following be uncommented to protect innocent
### web applications running on the proxy server who think the only
### one who can access services on "localhost" is a local user
### http_access deny to_localhost

###
### INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
###

http_access allow localnet
http_access allow localhost

### And finally deny all other access to this proxy
http_access deny all

### Squid normally listens to port 3128
http_port 3128

### Uncomment and adjust the following to add a disk cache directory.
### cache_dir ufs /var/spool/squid 100 16 256

### Leave coredumps in the first cache dir
coredump_dir /var/spool/squid

###
### Add any of your own refresh_pattern entries above these.
###
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
refresh_pattern .               0       20%     4320
always_direct allow static
always_direct allow direct
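To check which URLs the "static" and "direct" ACLs are meant to keep away from the parent, the two rules can be approximated outside Squid. The sketch below copies the regex and the domain from the config above. It assumes that urlpath_regex is matched against the path including the query string and that dstdomain without a leading dot matches the exact host only. The function name `goes_direct` is hypothetical, and the result only describes the intent of the always_direct rules, not everything Squid considers when choosing a route.

```python
import re
from urllib.parse import urlsplit

# Same pattern as "acl static urlpath_regex ..." in the squid.conf above.
STATIC_RE = re.compile(
    r"\.(html|htm|css|ico|js|gif|jpg|jpeg|png|xml|json|woff|JPG|JPEG|woff2|ttf|eot|svg)(\?.*)?$"
)

# Same host as "acl direct dstdomain ..." in the squid.conf above.
DIRECT_DOMAINS = {"fonts.googleapis.com"}


def goes_direct(url: str) -> bool:
    """Return True if either always_direct ACL would match this URL."""
    parts = urlsplit(url)
    # Assumption: Squid matches urlpath_regex against path plus query string.
    path_and_query = parts.path + ("?" + parts.query if parts.query else "")
    is_static = STATIC_RE.search(path_and_query) is not None
    is_direct_domain = parts.hostname in DIRECT_DOMAINS
    return is_static or is_direct_domain


for candidate in (
    "http://www.nytimes.com/",
    "http://www.nytimes.com/app.js?v=3",
    "http://fonts.googleapis.com/css?family=Roboto",
):
    print(candidate, "direct" if goes_direct(candidate) else "parent candidate")
```

Note that the pattern also treats `.html` and `.htm` URLs as static, so such pages would bypass prerender under these rules.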
