[Bug] Warnings for good links #4

Open
Kristinita opened this issue Apr 10, 2017 · 4 comments

Kristinita commented Apr 10, 2017

1. Summary

  1. deadlinks marks working links as dead.
  2. deadlinks marks as dead links that work but are blocked from my IP (see Internet censorship in Russia).

2. Settings

My project: https://github.com/Kristinita/KristinitaPelican

Part of my pelicanconf.py file:

PLUGIN_PATHS = ['pelican-plugins']
PLUGINS = [
    'pagefixer',
    'pelican_javascript',
    'section_number', 'interlinks', 'deadlinks'
]

DEADLINK_VALIDATION = True

DEADLINK_OPTS = {
    'archive': True,
    'classes': ['custom-class1', 'disabled'],
    'labels': True
}

3. Steps to reproduce

I run this command in a terminal:

pelican content --debug > DeadlinkDebug.txt 2>&1

See the full output in this Gist: https://gist.github.com/Kristinita/63c81829c196afd7dc68cbe5e3dba12a

4. Expected behavior

deadlinks should detect and replace only links that really return 403/404, not the links described in items 1.1 and 1.2 of this issue.

5. Actual behavior

Links marked as dead:

https://rsdn.ru/article/patterns/framework.xml#EKB
http://vaden-pro.ru/blog/laravel/laravel-chto-eto
http://web.archive.org/web/20150615162941/http://www.xpomo.com/ruskolan/tolpa/piramida.htm
http://www.spy-soft.net/chto-takoe-rat/
http://loveread.ec/read_book.php?id=45782&p=12
http://archive.is/20160611162905/http://liwihelp.ru/sistema/avtomaticheskoe_vklyuchenie_kompyutera.html
https://learn.javascript.ru/window-methods
http://javascript.ru/window-location
https://colocat.ru/texts/realip.html
http://dizems.ru/v-chem-otlichie-staticheskix-sajtov-ot-dinamicheskix
http://www.Is
http://optimakomp.ru/virustotal-totalnoe-skanirovanie-fajjlov-i-sajjtov-desyatkami-antivirusov/
http://wolandblog.com/3-pochemu-ya-ne-ispolzuyu-dnsbl-v-pomoshh-nachinayushhemu-postmasteru/
https://www.projecthoneypot.org/
https://urlquery.net/
http://www.dnsbl.info/dnsbl-database-check.php
http://wikireality.ru/wiki/MDK
http://archive.is/20160518165040/https://www.youtube.com/watch?v=qet1ypk3qDM&lc=z13owrebxvn2vt3e422du3wowrmzz5xxz04
http://archive.is/20160522103717/https://www.youtube.com/watch?v=8Lsrvn7oa60&lc=z12bz5axmkngxx10i22ucr15rtvnsjpyy04
http://archive.is/
http://web.archive.org/web/20150615162941/http://www.xpomo.com/ruskolan/tolpa/piramida.htm
http://archive.is/20160518125518/https://www.facebook.com/permalink.php?story_fbid=517539018447713&id=100005748574402%23
http://archive.is/20160601035438/http://www.sports.ru/profile/1021517009/comments/?p=30
http://archive.is/20160601041255/http://www.sports.ru/profile/70045047/comments/?p=38
http://web.archive.org/web/20150615162941/http://www.xpomo.com/ruskolan/tolpa/piramida.htm
http://alternativeto.net/software/resource-hacker/
https://www.google.ru/search?q=status+bar&newwindow=1&source=lnms&tbm=isch&sa=X&ved=0ahUKEwi-j9WygojTAhVGiSwKHfRhATYQ_AUIBigB&biw=1173&bih=729

I can successfully visit these links without a proxy or other anonymization tools:

Some links work but are blocked by the government of my country (Russia), for example:

6. Environment

Operating system and version: Windows 10 Enterprise LTSB 64-bit EN
Python: 3.6.1
Pelican: 3.7.1
BeautifulSoup4: 4.5.3

Thanks.

@silentlamb
Collaborator

I've checked the links from the "Actual behavior" section myself, using the most recent master and both a raw connection and a VPN connection (Russian server), and here's what I found.

Some links to web.archive.org open in a web browser, but under the hood a 403 status code is returned and the page says "cannot archive due to robots.txt on http://xxx.xxx.xxx". For these, the plugin seems to almost work properly. Almost, because I forgot to exclude links to web.archive.org from being checked (there's no reason to replace a web.archive.org link with another web.archive.org link), so that's a different bug.
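The exclusion described above could look roughly like the sketch below. It is only an illustration: `ARCHIVE_HOSTS`, `is_archive_link` and `links_to_check` are hypothetical names, not part of the plugin, and the host set contains only the one host mentioned in this comment.

```python
from urllib.parse import urlsplit

# Hypothetical host set; only web.archive.org is named in this thread.
ARCHIVE_HOSTS = {"web.archive.org"}


def is_archive_link(url):
    """Return True if the URL already points at a known archive host."""
    host = urlsplit(url).hostname or ""
    return host.lower() in ARCHIVE_HOSTS


def links_to_check(urls):
    """Keep only the links that are worth validating."""
    return [url for url in urls if not is_archive_link(url)]
```

Matching on the parsed hostname rather than on a substring avoids skipping ordinary links that merely contain an archive URL in their path or query string.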

In Firefox (checked using the Network tab of the developer tools), links to archive.is work properly (code 200 is returned) over the raw connection (Poland), but time out after switching to the VPN connection (Russia).
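A timeout like this never produces an HTTP status code, so a checker has to report it separately from a real 4xx response. A minimal sketch of that distinction, using only standard-library exception types (the function name and the labels are assumptions made for illustration, not the plugin's behavior):

```python
import socket
import ssl


def classify_failure(error):
    """Map an exception raised while fetching a link to a short label."""
    # ssl.SSLError is a subclass of OSError, so it must be tested first.
    if isinstance(error, ssl.SSLError):
        return "ssl-error"
    if isinstance(error, (socket.timeout, TimeoutError)):
        return "timeout"
    if isinstance(error, OSError):
        return "connection-error"
    return "unknown"
```

None of these labels means that the page is gone; they only say that this particular connection did not get a response.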

For some websites a connection cannot be made due to SSL errors:

Kristinita commented Apr 30, 2017

@silentlamb,

1. Summary

In the latest Deadlinks version, dead links are not replaced with archive links, even though 'archive': True is set.

2. Settings

Same Pelican configuration as in the first post.

Full output: https://gist.github.com/86cb35b6d9c445a81eadd1db2cf5b319
Warnings: https://gist.github.com/c2a96ee8da4027ac763b3c0ecb017af4

3. Steps to reproduce

Same as in the first post.

4. Expected behavior

Dead links are replaced with archive links.

5. Actual behavior

Links are skipped with "Skipping… (not available)" warnings, for example:

DEBUG: Starting new HTTPS connection (1): esquire.ru
WARNING: Skipping: https://esquire.ru/coined-word (not available)

DEBUG: Starting new HTTPS connection (1): colocat.ru
WARNING: Skipping: https://colocat.ru/texts/realip.html (not available)
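To collect every skipped link from a debug log, something like the following sketch may help. It assumes that all such warnings follow the exact format of the two examples above; `skipped_urls` is a hypothetical helper, not something the plugin provides.

```python
import re

# Assumed format, taken from the two log lines quoted above.
SKIP_PATTERN = re.compile(r"^WARNING: Skipping: (\S+) \(not available\)$")


def skipped_urls(log_text):
    """Return the URLs from 'Skipping: ... (not available)' warnings, in order."""
    urls = []
    for line in log_text.splitlines():
        match = SKIP_PATTERN.match(line.strip())
        if match:
            urls.append(match.group(1))
    return urls
```

The function takes the log as a string, so the content of a file such as DeadlinkDebug.txt can be read and passed in directly.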

6. Environment

Same as in the first post.

Thanks.

@Kristinita
Author

1. Question

Could you make Deadlinks replace a link only if it returns a 403/404 status code, and not in other cases?

2. Argumentation

In this issue I showed that Deadlinks can replace links that open fine for me. I think this is unexpected behavior.
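The rule requested here can be sketched as follows. This is a hypothetical helper written for illustration, not existing Deadlinks code; it assumes that the checker passes `None` when no HTTP response was received at all (timeout, SSL error, blocked connection).

```python
# Hypothetical helper; not part of the Deadlinks plugin.
DEAD_STATUS_CODES = frozenset({403, 404})


def should_replace(status_code):
    """Return True only for a 403 or 404 response.

    status_code is None when no HTTP response was received.
    """
    return status_code in DEAD_STATUS_CODES
```

With this rule, links that merely time out or fail on SSL from one network are left untouched instead of being replaced.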

Thanks.

@Kristinita
Author

@silentlamb, this is still relevant.

Thanks.
