Any reason valid links to pdf files might raise false alarms #105

Open
markcmiller86 opened this issue Feb 1, 2024 · 35 comments

Comments

@markcmiller86

I am getting two false positives, both involving links to .pdf files: https://github.com/betterscientificsoftware/bssw.io/actions/runs/7734371239/job/21088244489?pr=1633

Any reason to suspect the checker has trouble with .pdf files?

@vsoch
Collaborator

vsoch commented Feb 1, 2024

A PDF file is not a machine-readable text file, so you should not ask the checker to parse it (or you should add it to the ignore list).

@markcmiller86
Author

markcmiller86 commented Feb 1, 2024

Hmm...did you follow the link to the failed tests? I am not using it to check links in pdf files. It is failing on links to PDF files which I can browse to fine.

@vsoch
Collaborator

vsoch commented Feb 1, 2024

My apologies - I did not! It looks like it has nothing to do with the PDF files, those servers have bad certificates:

HTTPSConnectionPool(host='iitj.ac.in', port=443): Max retries exceeded with url: /uploaded_docs/cc/HPC_training/mcmuserguide.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))

You can reproduce with two lines of python:

import requests
requests.get('https://iitj.ac.in/uploaded_docs/cc/HPC_training/mcmuserguide.pdf')

You can ask the webmasters to update their certs, and if you don't have that control, you'll have to add them to the skiplist.
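For anyone who wants to see what "checking" versus "not checking" the certificate means in code, here is a minimal sketch using only the Python standard library. The helper names `make_context` and `fetch_status` are hypothetical and are not part of urlchecker; `requests` exposes the same choice through its `verify` argument.

```python
import ssl
import urllib.request


def make_context(check_certs=True):
    """Build an SSL context with certificate verification on or off."""
    ctx = ssl.create_default_context()
    if not check_certs:
        # Order matters: hostname checking must be disabled before
        # verify_mode can be set to CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_status(url, check_certs=True, timeout=10):
    """Return the HTTP status code for url (performs a real network request)."""
    context = make_context(check_certs)
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return resp.status
```

With the default `check_certs=True`, a server whose certificate chain cannot be validated raises an error wrapping `ssl.SSLCertVerificationError`; with `check_certs=False` the chain is not validated at all, which is why skipping verification is a "use at your own risk" choice.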

@markcmiller86
Author

> You can ask the webmasters to update their certs, and if you don't have that control, you'll have to add them to the skiplist.

Any chance you'd be willing to add a feature to ignore bad certs (maybe even make it the default)? It's a common scenario, and asking people to populate skip lists for such a common scenario seems onerous. It's also confusing that my browser can follow links fine that the urlchecker action deems "broken".

@vsoch
Collaborator

vsoch commented Feb 1, 2024

The browser, and actually depending on the browser, does a lot of wonky things to "just make the page load." If you use command line / core tools that enforce best practices to check certificates, you tend to see the truth. And actually, we go to some lengths to try to emulate a web driver, but it's not perfect.

We can definitely consider that feature. You'll still have the timeout issue on the second PDF, however.
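Since the two failures have different causes (a certificate problem and a timeout), it can help to tell them apart when triaging. The following is a rough sketch using standard-library exception types; `classify_failure` is a hypothetical helper, and this is not how urlchecker itself reports errors.

```python
import socket
import ssl
import urllib.error


def classify_failure(exc):
    """Return a short label describing why a URL check failed."""
    # urllib wraps low-level errors in URLError; unwrap to the underlying cause.
    reason = getattr(exc, "reason", None)
    if isinstance(exc, urllib.error.URLError) and isinstance(reason, BaseException):
        exc = reason
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "bad-certificate"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    return "other"
```

Skipping certificate checks can only help with the first category; a link that times out would still need a longer timeout, more retries, or an entry in the skip list.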

@vsoch
Collaborator

vsoch commented Feb 1, 2024

okay, I have a branch for you to test, will post shortly. Note that this is an action for urlchecker-python, so you can run the tool manually on your directory to check.

@vsoch
Collaborator

vsoch commented Feb 1, 2024

Here you go! Please test this out locally, and let me know if the new option works.

urlstechie/urlchecker-python#89

urlchecker check --branch master --no-check-certs --no-print --verbose --file-types .md --exclude-patterns http://localhost:4000,https://preview.bssw.io,https://github.com/<your-github-handle> --retry-count 3 --timeout 10 --files .github/workflows/check-urls.yml,.github/workflows/README.md,Articles/Blog/2020-01-usrse.md,Articles/Blog/2020-11-PSIP4HDF5.md,Articles/Blog/2021-09-CollegevilleReportDay1.md,Articles/Blog/2021-12-sc21-swe-cse-bof.md,Articles/Blog/ConnectingSoftwareDevelopers.md,Articles/Blog/Covid19WorkstationCleanliness.md,Articles/Blog/HowToEnablePerformancePortability.md,Articles/Blog/HowToWriteGoodDocumentation.md,Articles/Blog/URSSI.md,CuratedContent/GoodEnoughPracticesInScientificComputing.md,CuratedContent/LanguageReferenceOnLine.md,CuratedContent/TeamOfTeamsUNPUB.md,CuratedContent/kitchen-sink-TEST.md,Site/BSSwFellowshipProgram/People/2020-F-Eisty.md .

@markcmiller86
Author

@vsoch thanks so much!

Lemme give this a try.

@vsoch
Collaborator

vsoch commented Feb 1, 2024

Thank you!!

Heads up I'm breaking for dinner, but will be back later.

@markcmiller86
Author

Am running into ssl version issues...

(myenv) sh-3.2$ urlchecker check ../bssw.io/CuratedContent/LanguageReferenceOnLine.md 
/Users/miller86/ideas-ecp/urlchecker-python/myenv/lib/python3.8/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020

@vsoch
Collaborator

vsoch commented Feb 1, 2024

You need to add the --no-check-certs flag as I showed above. I'm not going to take off checking by default, it's a "use at your own risk" feature.

@markcmiller86
Author

I am using that flag, though the command and error I pasted didn't include it. It's an issue with macOS ssl, python and virtual env.

@vsoch
Collaborator

vsoch commented Feb 1, 2024

I don't have a Mac that I use for programming, but I'd follow that GitHub link and see if you can track down the issue. This is unrelated to urlchecker and the PR - it seems like it's an issue with the Python/ssl versions on your system.

@markcmiller86
Author

Ok, well it might help if I was in the correct branch of the clone. I've done that now. And, I built a docker ubuntu container..., but strangely, I am getting cert errors...

# urlchecker check  --branch master --no-check-certs --no-print --verbose --file-types .md /tmp
           original path: /tmp
              final path: /tmp
               subfolder: None
                  branch: master
                 cleanup: False
                  serial: False
              file types: ['.md']
                   files: []
               print all: False
                 verbose: True
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
https://yaml.org/spec/1.2.2/
https://docs.python.org/dev/reference/
https://en.wikipedia.org/wiki/POSIX
HTTPSConnectionPool(host='iitj.ac.in', port=443): Max retries exceeded with url: /uploaded_docs/cc/HPC_training/mcmuserguide.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))
https://iitj.ac.in/uploaded_docs/cc/HPC_training/mcmuserguide.pdf
HTTPSConnectionPool(host='iitj.ac.in', port=443): Max retries exceeded with url: /uploaded_docs/cc/HPC_training/mcmuserguide.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))
https://iitj.ac.in/uploaded_docs/cc/HPC_training/mcmuserguide.pdf
https://docs.python.org/3/reference/
https://docs.microsoft.com/en-us/cpp/standard-library/cpp-standard-library-reference?view=msvc-170
https://www.open-mpi.org/doc/v4.0/
https://wg5-fortran.org/N1151-N1200/N1191.pdf
https://chapel-lang.org/docs/language/spec/index.html

@markcmiller86
Author

Ok, so I think --branch needs to be set to add-skip-check-certs, right? Well, that still isn't working though...

# urlchecker check --branch add-skip-check-certs --no-check-certs --no-print --verbose --file-types .md /tmp
           original path: /tmp
              final path: /tmp
               subfolder: None
                  branch: add-skip-check-certs
                 cleanup: False
                  serial: False
              file types: ['.md']
                   files: []
               print all: False
                 verbose: True
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
https://zsh.sourceforge.io/Guide/zshguide.html
https://parallel-netcdf.github.io/wiki/Documentation.html
https://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf
https://docs.python.org/3.10/extending/extending.html
https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf
https://julialang.org/blog/2019/07/multithreading/
https://en.wikipedia.org/wiki/C_standard_library#Implementations
https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2020/n4849.pdf
https://access.redhat.com/articles/5594481
https://en.wikipedia.org/wiki/Data_parallelism
https://cplusplus.com/reference/multithreading/
https://docs.oracle.com/cd/E19048-01/chorus4/806-3328/6jcg1bm05/index.html
https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top.html
https://hpx-docs.stellar-group.org/latest/html/index.html
https://kokkos.org/kokkos-core-wiki/
https://gcc.gnu.org/onlinedocs/cpp/
https://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
https://www.lrde.epita.fr/~adl/autotools.html
https://web.archive.org/web/20070205092427/http://www.fortran.com/fortran/F77_std/rjcnf0001.html
https://support.hpe.com/hpesc/public/docDisplay?docId=a00115116en_us&docLocale=en_US&page=The_Cray_Compiling_Environment.html
https://www.mpi-forum.org/docs/mpi-1.3/mpi-report-1.3-2008-05-30.pdf
https://clang.llvm.org
https://www.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/language-reference.html
https://www.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/language-reference.html
https://svnbook.red-bean.com
http://port70.net/~nsz/c/c89/c89-draft.html
https://docs.globus.org/cli/
https://support.nag.com/nagware/np/r71_doc/compiler.pdf
https://legion.stanford.edu/pdfs/legion-manual.pdf
https://gcc.gnu.org/onlinedocs/libc/
https://developer.download.nvidia.com/compute/DevZone/docs/html/OpenCL/doc/OpenCL_Programming_Guide.pdf
https://github.com/markcmiller86
https://docs.oracle.com/cd/E36784_01/html/E36870/ksh-1.html
https://www.openacc.org/sites/default/files/inline-files/openacc-guide.pdf
https://www.gnu.org/software/libc/manual/html_mono/libc.html#I_002fO-Overview
https://www.mpi-forum.org/
https://docs.microsoft.com/en-us/cpp/preprocessor/c-cpp-preprocessor-reference?view=msvc-170
https://www.latex-project.org/help/documentation/
https://libc.llvm.org/
https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf
https://docs.python.org/2/reference/
https://docs.python.org/2.7/extending/extending.html
https://www.mpich.org/static/docs/v1.5.x/
https://spack.readthedocs.io/en/latest/
https://www.open-mpi.org/doc/v3.1/
https://www.extremetech.com/extreme/289423-it-took-half-a-ton-of-hard-drives-to-store-eht-black-hole-image-data
https://www.khronos.org/sycl/resources
https://en.wikipedia.org/wiki/Distributed_memory
https://docs.readthedocs.io/en/stable/tutorial/
https://www.json.org/json-en.html
https://hpc.pnl.gov/globalarrays/documentation.shtml
https://docs.microsoft.com/en-us/cpp/standard-library/cpp-standard-library-reference?view=msvc-170
https://yaml.org/spec/1.2.2/
https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_API.html
https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf?#page=683
https://en.wikipedia.org/wiki/Virtual_private_network
https://docs.python.org/3/reference/
https://github.com/KhronosGroup/OpenCL-Guide
http://www.lahey.com/docs/LangRefEXP73_revG05.pdf
https://cgns.github.io/CGNS_docs_current/user/index.html
https://docs.hdfgroup.org/hdf5/v1_12/index.html
https://docs.hdfgroup.org/hdf5/v1_12/_r_m.html
https://docs.daos.io/v2.2/user/workflow/
https://cplusplus.com/reference/clibrary/
https://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf
https://support.hpe.com/hpesc/public/docDisplay?docId=a00115296en_us&page=About_the_Cray_Fortran_Reference_Manual.html
https://thrust.github.io/doc/modules.html
https://wg5-fortran.org/N1601-N1650/N1601.pdf
https://learn.microsoft.com/en-us/cpp/c-runtime-library/c-run-time-library-reference?view=msvc-170
https://www.mpich.org/static/docs/v3.4.x/
https://www.mpich.org/
https://www.gnu.org/software/make/manual/make.html
https://linux.die.net/man/1/tcsh
https://www.ibm.com/support/pages/system/files/support/swg/swgdocs.nsf/0/7e46ea600b6646d0852579dc00331978/$FILE/langref.pdf
https://j3-fortran.org/doc/year/18/18-007r1.pdf
https://hpc-tutorials.llnl.gov/posix/AppendixA/
https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
https://visit-dav.github.io/visit-website/
https://github.com/RadeonOpenCompute/ROCm/raw/rocm-4.5.2/AMD_HIP_Programming_Guide.pdf
https://llnl-conduit.readthedocs.io/en/latest/blueprint.html
https://clang.llvm.org/cxx_status.html
https://learn.microsoft.com/en-us/cpp/c-runtime-library/run-time-routines-by-category?view=msvc-170
https://www.w3.org/TR/xml/
https://docs.python.org/dev/reference/
https://gcc.gnu.org/onlinedocs/libstdc++/
https://www.intel.com/content/www/us/en/develop/documentation/iocl_rt_ref/top.html
https://www.intel.com/content/www/us/en/develop/documentation/iocl_rt_ref/top.html
https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf
https://www.doxygen.nl/manual/
https://en.wikipedia.org/wiki/POSIX
https://www.pgroup.com/resources/docs/17.10/x86/fortran-ref-guide/index.htm
https://www.ibm.com/docs/en/STXKQY_5.1.5/pdf/scale_cpr.pdf
https://github.com/python/cpython
https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html
https://wg5-fortran.org/N1151-N1200/N1191.pdf
https://doc.lustre.org/lustre_manual.xhtml#file_striping.lfs_setstripe
https://www.computerhope.com/unix/scp.htm
https://docs.unidata.ucar.edu/nug/current/
https://en.wikipedia.org/wiki/Reference_implementation
https://man7.org/linux/man-pages/man1/make.1p.html
https://www.ibm.com/docs/en/i/7.3?topic=c-ile-cc-runtime-library-functions
https://www.gnu.org/software/bash/manual/bash.html
https://www.ibm.com/docs/en/ssw_ibm_i_71/rzarg/sc097852.pdf
https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
https://en.cppreference.com/w/cpp/experimental/parallelism
https://open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf
https://docs.microsoft.com/en-us/cpp/cpp/cpp-language-reference?view=msvc-170
https://www.hdfgroup.org/2017/03/mif-parallel-io-with-hdf5/
https://docs.github.com/en/pages/getting-started-with-github-pages/about-github-pages
https://google.github.io/googletest/
https://www.amd.com/content/dam/amd/en/documents/developer/version-4-1-documents/aocc/aocc-4.1-user-guide.pdf
https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2013/n3797.pdf
https://libcxx.llvm.org/
https://raja.readthedocs.io/en/develop/sphinx/user_guide/index.html
https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2001/n1316/
https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2001/n1316/
https://docs.microsoft.com/en-us/cpp/c-language/c-language-reference?view=msvc-170
https://www.markdownguide.org/tools/github-pages/
HTTPSConnectionPool(host='iitj.ac.in', port=443): Max retries exceeded with url: /uploaded_docs/cc/HPC_training/mcmuserguide.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))
https://iitj.ac.in/uploaded_docs/cc/HPC_training/mcmuserguide.pdf
HTTPSConnectionPool(host='iitj.ac.in', port=443): Max retries exceeded with url: /uploaded_docs/cc/HPC_training/mcmuserguide.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))
https://iitj.ac.in/uploaded_docs/cc/HPC_training/mcmuserguide.pdf
https://gcc.gnu.org/onlinedocs/cpp/Pragmas.html
https://j3-fortran.org/doc/year/10/10-007r1.pdf
https://chapel-lang.org/docs/language/spec/index.html
https://www.open-mpi.org/doc/v2.1/
https://numpy.org/doc/stable/reference/index.html#reference
https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.pdf
https://man.openbsd.org/ssh
https://docutils.sourceforge.io/rst.html
https://www.cplusplus.com/reference/
https://www.open-mpi.org/doc/v4.0/
https://cmake.org/cmake/help/latest/
https://adios2.readthedocs.io/en/latest/
https://docs.python.org/3.8/library/
https://git-scm.com/docs/user-manual
https://www.open-mpi.org/doc/v4.1/
https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2017/n4659.pdf
https://docs.gitlab.com
https://wg5-fortran.org/N001-N1100/N692.pdf
https://charm.readthedocs.io/en/latest/charm++/manual.html
https://spec.oneapi.io/versions/latest/elements/oneTBB/source/nested-index.html
https://docs.python.org/2.7/library/
https://en.wikipedia.org/wiki/C%2B%2B_Standard_Library#Implementations
https://man7.org/linux/man-pages/man2/syscalls.2.html
https://devdocs.io/gnu_fortran/
https://support.google.com/a/users/answer/9282958?hl=en
https://ftp.mcs.anl.gov/pub/cobalt/archive/cobalt-0.95.2-manual.pdf
https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2011/n3242.pdf
https://web.archive.org/web/20181230041359if_/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf
https://www.mpich.org/static/docs/v4.0.3/
https://hpss-collaboration.org/wp-content/uploads/2023/09/hpss_10.3_users_guide.pdf?#page=9
https://docs.github.com/en
https://cmake.org/cmake/help/latest/manual/ctest.1.html
https://www.openmp.org/wp-content/uploads/OpenMP3.1.pdf
https://en.wikipedia.org/wiki/CPython
https://github.com/fortran-lang/stdlib
https://slurm.schedmd.com
https://www.opengl.org/
https://rocmdocs.amd.com/_/downloads/en/latest/pdf/
https://llnl-conduit.readthedocs.io/en/latest/index.html
https://en.wikipedia.org/wiki/List_of_compilers
https://docs.nvidia.com/cuda/cuda-runtime-api/index.html

🤔 Uh oh... The following urls did not pass:
/tmp/LanguageReferenceOnLine.md:
     ❌️ https://www.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/language-reference.html
     ❌️ https://www.intel.com/content/www/us/en/develop/documentation/iocl_rt_ref/top.html
     ❌️ https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2001/n1316/
     ❌️ https://iitj.ac.in/uploaded_docs/cc/HPC_training/mcmuserguide.pdf

@vsoch
Collaborator

vsoch commented Feb 1, 2024

I would leave out branch and just run with --no-check-certs, and ensure the urlchecker "executable" is installed from the branch you cloned. Branch is only when you are cloning something, not when you have files locally. Do a urlchecker --help | grep no-check-certs to ensure you are hitting the right one.

@markcmiller86
Author

I don't have confidence I am running the correct version in my docker container. I am still messing with it.

@vsoch
Collaborator

vsoch commented Feb 1, 2024

Let me know if you want some help to write a Dockerfile for it.

@markcmiller86
Author

Ok, I am quite confident I've got it installed and am using the correct branch/version of urlchecker and I am not able to get it to work...

# which urlchecker
/usr/local/bin/urlchecker
# urlchecker --help
usage: urlchecker [-h] [--version] {version,check} ...

urlchecker python

options:
  -h, --help       show this help message and exit
  --version        suppress additional output.

actions:
  actions for urlchecker

  {version,check}  urlchecker python actions
    version        show software version
    check          check urls in static files (documentation or code)
# urlchecker --version
0.0.35
# urlchecker check /tmp
           original path: /tmp
              final path: /tmp
               subfolder: None
                  branch: main
                 cleanup: False
                  serial: False
              file types: ['.md', '.py']
                   files: []
               print all: True
                 verbose: False
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
          no check certs: False
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
2024-02-01 20:51:39,908 - urlchecker - ERROR - Error running task
🤔 There were no URLs to check.


🤷. No urls were collected.
# ls /tmp
LanguageReferenceOnLine.md
# cat /tmp/LanguageReferenceOnLine.md | grep http
#### Contributed by [Mark C. Miller](https://github.com/markcmiller86 "Mark C. Miller GitHub Profile")
Instead, they rely solely on a [*reference implementation*](https://en.wikipedia.org/wiki/Reference_implementation).
Python's reference implementation is [CPython](https://en.wikipedia.org/wiki/CPython).
The *implementation* of a programming language is typically embodied in a [compiler](https://en.wikipedia.org/wiki/List_of_compilers) or, for interpretive languages like Python (or Basic), an *interpreter*.
[POSIX](https://en.wikipedia.org/wiki/POSIX) compliance was introduced in the 1990's to address this not only for the C standard library but also for many other aspects of how programs and humans (e.g. command-line *shells*) interact with an operating system.
For example, the GNU compiler collection (GCC) often supports a number of [language *extensions*](https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html) some of which eventually make their way into the formal language standard.
The [VisIt](https://visit-dav.github.io/visit-website/) project decided to permit C++11 constructs (specific to the 2011 C++ standard) into the code base only in 2018, a full 7 years after the language standard had been released.
Nonetheless, one critical differentiator is [shared memory vs. distributed memory](https://en.wikipedia.org/wiki/Distributed_memory) parallelism.
Another critical differentiator is whether parallelism manifests as the same computational task running simultaneously everywhere except on different data (e.g. [Data parallelism](https://en.wikipedia.org/wiki/Data_parallelism)) or something more generalized than this where computational tasks which can be wholly disparate are queued and divvied out to resources as they become available (e.g. Task parallelism).
The canonical example of an API that is managed in this way is the [Message Passing Interface (MPI)](https://www.mpi-forum.org/).
Another example is [OpenGL](https://www.opengl.org/), a graphics programming API (the *L* in OpenGL stands for *Library* but many often treat it as thought it stands for *Language*).
[MPICH](https://www.mpich.org/) serves as a *reference* implementation of MPI.
[3]: #a3 "The most formal resource for Python is the [language reference](https://docs.python.org/dev/reference/) and the *reference* implementation, [CPython](https://github.com/python/cpython)"
<a name="a3"></a><sup>3</sup>The most formal resource for Python is the *reference* implementation, [CPython](https://en.wikipedia.org/wiki/CPython)<br>
<a name="a4"></a><sup>4</sup>CPP is sometimes used to process other kinds of text files including those of other languages. CPP [`#pragma`](https://gcc.gnu.org/onlinedocs/cpp/Pragmas.html) directives are a common way for compiler vendors to extend the language.<br>
<a name="a7"></a><sup>7</sup>*USPSnet* is wordplay for sending physical storage media through the US Mail. Another name is *FootNet*. Sometimes, its the [best way](https://www.extremetech.com/extreme/289423-it-took-half-a-ton-of-hard-drives-to-store-eht-black-hole-image-data) to move a lot of data.
[c89-spec]: http://port70.net/~nsz/c/c89/c89-draft.html
[c99-spec]: https://open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf
[c11-spec]: https://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf
[c18-spec]: https://web.archive.org/web/20181230041359if_/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf
[c++03-spec]: https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2001/n1316/
[c++11-spec]: https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2011/n3242.pdf
[c++14-spec]: https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2013/n3797.pdf
[c++17-spec]: https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2017/n4659.pdf
[c++20-spec]: https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2020/n4849.pdf
[f77-spec]: https://web.archive.org/web/20070205092427/http://www.fortran.com/fortran/F77_std/rjcnf0001.html
[f90-spec]: https://wg5-fortran.org/N001-N1100/N692.pdf
[f95-spec]: https://wg5-fortran.org/N1151-N1200/N1191.pdf
[f03-spec]: https://wg5-fortran.org/N1601-N1650/N1601.pdf
[f08-spec]: https://j3-fortran.org/doc/year/10/10-007r1.pdf
[f18-spec]: https://j3-fortran.org/doc/year/18/18-007r1.pdf
[ocl1.2-spec]: https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf
[ocl2.2-spec]: https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_API.html
[ocl3.0-spec]: https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html
[py2-spec]: https://docs.python.org/2/reference/
[py3-spec]: https://docs.python.org/3/reference/
[cpp-gnu]: https://gcc.gnu.org/onlinedocs/cpp/
[cpp-ms]: https://docs.microsoft.com/en-us/cpp/preprocessor/c-cpp-preprocessor-reference?view=msvc-170
[c-gnu]: https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.pdf
[c-cray]: https://support.hpe.com/hpesc/public/docDisplay?docId=a00115116en_us&docLocale=en_US&page=The_Cray_Compiling_Environment.html
[c-ibm]: https://www.ibm.com/docs/en/ssw_ibm_i_71/rzarg/sc097852.pdf
[c-ms]: https://docs.microsoft.com/en-us/cpp/c-language/c-language-reference?view=msvc-170
[c-clang]: https://clang.llvm.org
[c-amd]: https://www.amd.com/content/dam/amd/en/documents/developer/version-4-1-documents/aocc/aocc-4.1-user-guide.pdf
[c++-intel]: https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top.html
[c++-cray]: https://support.hpe.com/hpesc/public/docDisplay?docId=a00115116en_us&docLocale=en_US&page=The_Cray_Compiling_Environment.html
[c++-ibm]: https://www.ibm.com/docs/en/ssw_ibm_i_71/rzarg/sc097852.pdf
[c++-ms]: https://docs.microsoft.com/en-us/cpp/cpp/cpp-language-reference?view=msvc-170
[c++-amd]: https://www.amd.com/content/dam/amd/en/documents/developer/version-4-1-documents/aocc/aocc-4.1-user-guide.pdf 
[c++-clang]: https://clang.llvm.org/cxx_status.html
[f-pg]: https://www.pgroup.com/resources/docs/17.10/x86/fortran-ref-guide/index.htm "Portland Group Compilers"
[f-lf]: http://www.lahey.com/docs/LangRefEXP73_revG05.pdf "Lahey/Fujitsu Fortran 95"
[f-intel]: https://www.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/language-reference.html "All Fortran standards 90-18"
[f-cray]: https://support.hpe.com/hpesc/public/docDisplay?docId=a00115296en_us&page=About_the_Cray_Fortran_Reference_Manual.html
[f-ibm]: https://www.ibm.com/support/pages/system/files/support/swg/swgdocs.nsf/0/7e46ea600b6646d0852579dc00331978/$FILE/langref.pdf
[f-nag]: https://support.nag.com/nagware/np/r71_doc/compiler.pdf
[f-gnu]: https://devdocs.io/gnu_fortran/
[opencl-amd]: https://github.com/KhronosGroup/OpenCL-Guide
[opencl-intel]: https://www.intel.com/content/www/us/en/develop/documentation/iocl_rt_ref/top.html
[opencl-nvidia]: https://developer.download.nvidia.com/compute/DevZone/docs/html/OpenCL/doc/OpenCL_Programming_Guide.pdf
[py2]: https://docs.python.org/2/reference/
[py3]: https://docs.python.org/3/reference/
[c-stdlib-0]: https://cplusplus.com/reference/clibrary/
[c++-stdlib-0]: https://www.cplusplus.com/reference/
[c-stdlib-gnu]: https://gcc.gnu.org/onlinedocs/libc/
[c++-stdlib-gnu]: https://gcc.gnu.org/onlinedocs/libstdc++/
[c-stdlib-llvm]: https://libc.llvm.org/
[c++-stdlib-llvm]: https://libcxx.llvm.org/
[c-stdlib-ms]: https://learn.microsoft.com/en-us/cpp/c-runtime-library/c-run-time-library-reference?view=msvc-170
[c++-stdlib-ms]: https://docs.microsoft.com/en-us/cpp/standard-library/cpp-standard-library-reference?view=msvc-170
[c-stdlib-ibm]: https://www.ibm.com/docs/en/i/7.3?topic=c-ile-cc-runtime-library-functions
[c++-stdlib-ibm]: https://www.ibm.com/docs/en/i/7.3?topic=c-ile-cc-runtime-library-functions
[py-stdlib-2]: https://docs.python.org/2.7/library/
[py-stdlib-3]: https://docs.python.org/3.8/library/
[f-stdlib-0.2.1]: https://github.com/fortran-lang/stdlib
[imp-stdlib-c]: https://en.wikipedia.org/wiki/C_standard_library#Implementations
[imp-stdlib-c++]: https://en.wikipedia.org/wiki/C%2B%2B_Standard_Library#Implementations
[smpar-pthreads]: https://hpc-tutorials.llnl.gov/posix/AppendixA/
[smpar-tbb]: https://spec.oneapi.io/versions/latest/elements/oneTBB/source/nested-index.html
[smpar-c++mt]: https://cplusplus.com/reference/multithreading/
[smpar-cuda]: https://docs.nvidia.com/cuda/cuda-runtime-api/index.html
[smpar-hip]: https://github.com/RadeonOpenCompute/ROCm/raw/rocm-4.5.2/AMD_HIP_Programming_Guide.pdf
[smpar-omp-3.1]: https://www.openmp.org/wp-content/uploads/OpenMP3.1.pdf
[smpar-omp-4.5]: https://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
[smpar-omp-5.2]: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf
[smpar-openacc]: https://www.openacc.org/sites/default/files/inline-files/openacc-guide.pdf
[dmpar-mpi-1.3]: https://www.mpi-forum.org/docs/mpi-1.3/mpi-report-1.3-2008-05-30.pdf
[dmpar-mpi-2.2]: https://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf
[dmpar-mpi-3.1]: https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
[dmpar-mpi-4.0]: https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf
[dmpar-mpich-1.5]: https://www.mpich.org/static/docs/v1.5.x/
[dmpar-mpich-3.4]: https://www.mpich.org/static/docs/v3.4.x/
[dmpar-mpich-4.0.3]: https://www.mpich.org/static/docs/v4.0.3/
[dmpar-ompi-4.1]: https://www.open-mpi.org/doc/v4.1/
[dmpar-ompi-4.0]: https://www.open-mpi.org/doc/v4.0/
[dmpar-ompi-3.1]: https://www.open-mpi.org/doc/v3.1/
[dmpar-ompi-2.1]: https://www.open-mpi.org/doc/v2.1/
[pparc-stl]: https://en.cppreference.com/w/cpp/experimental/parallelism
[pparc-hpx]: https://hpx-docs.stellar-group.org/latest/html/index.html
[pparc-thrust]: https://thrust.github.io/doc/modules.html
[pparc-raja]: https://raja.readthedocs.io/en/develop/sphinx/user_guide/index.html
[pparc-sycl]: https://www.khronos.org/sycl/resources
[pparc-rocm]: https://rocmdocs.amd.com/_/downloads/en/latest/pdf/
[ppard-kokkos]: https://kokkos.org/kokkos-core-wiki/
[ppard-ga]: https://hpc.pnl.gov/globalarrays/documentation.shtml
[ppard-legion]: https://legion.stanford.edu/pdfs/legion-manual.pdf
[ppard-charm++]: https://charm.readthedocs.io/en/latest/charm++/manual.html
[ppard-chapel]: https://chapel-lang.org/docs/language/spec/index.html
[ppard-julia]: https://julialang.org/blog/2019/07/multithreading/
[api-pyc-2]: https://docs.python.org/2.7/extending/extending.html 
[api-pyc-3]: https://docs.python.org/3.10/extending/extending.html
[api-py-numpy]: https://numpy.org/doc/stable/reference/index.html#reference
[api-sys-linux]: https://man7.org/linux/man-pages/man2/syscalls.2.html
[api-sys-posix]: https://docs.oracle.com/cd/E19048-01/chorus4/806-3328/6jcg1bm05/index.html
[api-sys-windows]: https://learn.microsoft.com/en-us/cpp/c-runtime-library/run-time-routines-by-category?view=msvc-170
[api-mifio]: https://www.hdfgroup.org/2017/03/mif-parallel-io-with-hdf5/
[api-posixio]: https://www.gnu.org/software/libc/manual/html_mono/libc.html#I_002fO-Overview
[api-hdf5-1.12]: https://docs.hdfgroup.org/hdf5/v1_12/index.html
[api-lustre]: https://doc.lustre.org/lustre_manual.xhtml#file_striping.lfs_setstripe
[api-gpfs]: https://www.ibm.com/docs/en/STXKQY_5.1.5/pdf/scale_cpr.pdf
[api-daos]: https://docs.daos.io/v2.2/user/workflow/
[api-adios]: https://adios2.readthedocs.io/en/latest/
[api-pnetcdf]: https://parallel-netcdf.github.io/wiki/Documentation.html
[api-mpiio]: https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf?#page=683
[api-sftp]: https://access.redhat.com/articles/5594481
[api-scp]: https://www.computerhope.com/unix/scp.htm
[api-hpss]: https://hpss-collaboration.org/wp-content/uploads/2023/09/hpss_10.3_users_guide.pdf?#page=9
[api-gdrive]: https://support.google.com/a/users/answer/9282958?hl=en
[api-globus]: https://docs.globus.org/cli/
[api-zsh]: https://zsh.sourceforge.io/Guide/zshguide.html
[api-bash]: https://www.gnu.org/software/bash/manual/bash.html
[api-ksh]: https://docs.oracle.com/cd/E36784_01/html/E36870/ksh-1.html
[api-tcsh]: https://linux.die.net/man/1/tcsh
[api-ssh]: https://man.openbsd.org/ssh
[api-vpn]: https://en.wikipedia.org/wiki/Virtual_private_network
[api-make]: https://man7.org/linux/man-pages/man1/make.1p.html
[api-gmake]: https://www.gnu.org/software/make/manual/make.html
[api-cmake]: https://cmake.org/cmake/help/latest/
[api-spack]: https://spack.readthedocs.io/en/latest/
[api-autotools]: https://www.lrde.epita.fr/~adl/autotools.html
[api-ctest]: https://cmake.org/cmake/help/latest/manual/ctest.1.html
[api-gtest]: https://google.github.io/googletest/
[api-yaml]: https://yaml.org/spec/1.2.2/
[api-json]: https://www.json.org/json-en.html
[api-xml]: https://www.w3.org/TR/xml/
[api-conduit]: https://llnl-conduit.readthedocs.io/en/latest/index.html
[api-hdf5]: https://docs.hdfgroup.org/hdf5/v1_12/_r_m.html
[api-netcdf]: https://docs.unidata.ucar.edu/nug/current/
[api-cgns]: https://cgns.github.io/CGNS_docs_current/user/index.html
[api-blueprint]: https://llnl-conduit.readthedocs.io/en/latest/blueprint.html
[api-latex]: https://www.latex-project.org/help/documentation/
[api-gfm]: https://www.markdownguide.org/tools/github-pages/
[api-rest]: https://docutils.sourceforge.io/rst.html
[api-doxygen]: https://www.doxygen.nl/manual/
[api-rtd]: https://docs.readthedocs.io/en/stable/tutorial/
[api-ghpages]: https://docs.github.com/en/pages/getting-started-with-github-pages/about-github-pages
[api-git]: https://git-scm.com/docs/user-manual
[api-svn]: https://svnbook.red-bean.com
[api-gitlab]: https://docs.gitlab.com
[api-github]: https://docs.github.com/en
[api-slurm]: https://slurm.schedmd.com
[api-cobalt]: https://ftp.mcs.anl.gov/pub/cobalt/archive/cobalt-0.95.2-manual.pdf
[api-moab]: https://iitj.ac.in/uploaded_docs/cc/HPC_training/mcmuserguide.pdf
# urlchecker check  --branch master --no-check-certs --no-print --verbose --file-types .md /tmp
           original path: /tmp
              final path: /tmp
               subfolder: None
                  branch: master
                 cleanup: False
                  serial: False
              file types: ['.md']
                   files: []
               print all: False
                 verbose: True
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
          no check certs: True
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
2024-02-01 20:54:46,353 - urlchecker - ERROR - Error running task
🤔 There were no URLs to check.
# urlchecker check --branch master --no-check-certs --no-print --verbose --file-types .md /tmp/LanguageReferenceOnLine.md
           original path: /tmp/LanguageReferenceOnLine.md
              final path: /tmp/LanguageReferenceOnLine.md
               subfolder: None
                  branch: master
                 cleanup: False
                  serial: False
              file types: ['.md']
                   files: []
               print all: False
                 verbose: True
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
          no check certs: True
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
🤔 There were no URLs to check.


🤷. No urls were collected.

@vsoch
Collaborator

vsoch commented Feb 1, 2024

A few things:

  • no check certs: False should be True (your last run)
  • --branch should not be set for a local check
  • You can't target an individual file, just the directory with the files (this is a bug, but it's the current reality)

@markcmiller86
Author

Right, I listed every scenario I tried, both with and without --branch. I didn't worry about certs (other than confirming that the version I was running handles that command-line argument) because I was never getting any checks to begin with.

All that being said, it's still not working (could be my container setup). I don't think it's doing the task launch in the loop over files.

# urlchecker check --no-check-certs --no-print --verbose --file-types .md /tmp
           original path: /tmp
              final path: /tmp
               subfolder: None
                  branch: main
                 cleanup: False
                  serial: False
              file types: ['.md']
                   files: []
               print all: False
                 verbose: True
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
          no check certs: True
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
2024-02-01 21:24:39,081 - urlchecker - ERROR - Error running task
🤔 There were no URLs to check.


🤷. No urls were collected.

@vsoch
Collaborator

vsoch commented Feb 1, 2024

Try removing the file types? I tested it on a directory in tmp with one markdown file and a link (in markdown too, that's important) and it worked, but I have since added a raw string and that might have broken it. We also have a bug where the regex is not working as it did before. Pinging @SuperKogito, who was going to look into that today.
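
When no URLs are collected, a quick way to rule out the input file is to scan the Markdown yourself. The sketch below is only an illustration: `find_urls` and `URL_PATTERN` are hypothetical names, and the pattern is not the regex urlchecker uses.

```python
import re

# Hypothetical pattern, not urlchecker's own: an http(s) scheme followed by
# any run of characters other than whitespace, quotes, angle brackets,
# ")" or "]".
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def find_urls(text):
    """Return every http(s) URL found in text, in order of appearance."""
    return URL_PATTERN.findall(text)


sample = """\
[api-svn]: https://svnbook.red-bean.com
[api-mpiio]: https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf?#page=683
See the [GitLab docs](https://docs.gitlab.com).
"""

for url in find_urls(sample):
    print(url)
```

If a scan like this finds links in the file but the checker still reports none, the problem is more likely in how files are discovered or parsed than in the Markdown itself.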

@markcmiller86
Author

no change...

# ls /tmp
LanguageReferenceOnLine.md
# urlchecker check --no-check-certs --no-print --verbose /tmp
           original path: /tmp
              final path: /tmp
               subfolder: None
                  branch: main
                 cleanup: False
                  serial: False
              file types: ['.md', '.py']
                   files: []
               print all: False
                 verbose: True
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
          no check certs: True
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
2024-02-01 21:36:24,363 - urlchecker - ERROR - Error running task
🤔 There were no URLs to check.


🤷. No urls were collected.

@vsoch
Collaborator

vsoch commented Feb 1, 2024

Let me try removing the raw string I added and I'll let you know. Then re-pull, install, and try again.

@vsoch
Collaborator

vsoch commented Feb 1, 2024

okay pushed.

@markcmiller86
Author

markcmiller86 commented Feb 1, 2024

Ok, it's going now. Getting a ton of error messages...

/usr/local/lib/python3.10/dist-packages/urllib3-2.2.0-py3.10.egg/urllib3/connectionpool.py:1103: InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.khronos.org'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings

Any way to silence that? I mean, maybe one at the beginning or end would be good...but it's echoing on every link. Not urgent. Put it on the todo list.
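
One possible way to quiet this from Python, sketched with only the standard `warnings` module, is to filter on the message text shown above. This assumes you control the Python process that makes the requests (for example, a wrapper script); nothing in this thread establishes whether urlchecker exposes an option for it, and `silence_insecure_request_warnings` is a hypothetical helper name.

```python
import warnings


def silence_insecure_request_warnings():
    """Ignore warnings whose message starts with 'Unverified HTTPS request'."""
    # The message argument is a regex matched against the start of the
    # warning text, so it covers whatever host name follows.
    warnings.filterwarnings("ignore", message=r"Unverified HTTPS request")


if __name__ == "__main__":
    silence_insecure_request_warnings()
    # Stand-in for the warning emitted on each unverified request.
    warnings.warn("Unverified HTTPS request is being made to host 'www.khronos.org'.")
    # A single notice replaces the per-link warnings.
    print("Certificate verification is disabled; per-request warnings are hidden.")
```

When urllib3 is importable in the same process, `urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)` has a similar effect.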

@markcmiller86
Author

Ok, that worked...now trying with certs enabled to confirm a difference in behavior.

@vsoch
Collaborator

vsoch commented Feb 1, 2024

Yeah no worries about that - this is a non-work, for-fun open source project, so I'm good to prioritize based on that! I can usually add comments like this during the day and then do the actual work during non-work hours.

@markcmiller86
Author

Ok, what I am seeing withOUT --no-check-certs doesn't make sense. Almost all links are failing due to certs. Here are the last bits of output from the run...

.
.
.
https://clang.llvm.org
HTTPSConnectionPool(host='svnbook.red-bean.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://svnbook.red-bean.com
HTTPSConnectionPool(host='svnbook.red-bean.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://svnbook.red-bean.com
https://docs.gitlab.com
https://ftp.mcs.anl.gov/pub/cobalt/archive/cobalt-0.95.2-manual.pdf
HTTPSConnectionPool(host='www.khronos.org', port=443): Max retries exceeded with url: /registry/OpenCL/specs/opencl-1.2.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf
HTTPSConnectionPool(host='www.khronos.org', port=443): Max retries exceeded with url: /registry/OpenCL/specs/opencl-1.2.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf
https://docs.python.org/dev/reference/
HTTPSConnectionPool(host='www.openmp.org', port=443): Max retries exceeded with url: /wp-content/uploads/OpenMP-API-Specification-5-2.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf
HTTPSConnectionPool(host='www.openmp.org', port=443): Max retries exceeded with url: /wp-content/uploads/OpenMP-API-Specification-5-2.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf
HTTPSConnectionPool(host='j3-fortran.org', port=443): Max retries exceeded with url: /doc/year/18/18-007r1.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://j3-fortran.org/doc/year/18/18-007r1.pdf
HTTPSConnectionPool(host='j3-fortran.org', port=443): Max retries exceeded with url: /doc/year/18/18-007r1.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://j3-fortran.org/doc/year/18/18-007r1.pdf
HTTPSConnectionPool(host='web.archive.org', port=443): Max retries exceeded with url: /web/20070205092427/http://www.fortran.com/fortran/F77_std/rjcnf0001.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://web.archive.org/web/20070205092427/http://www.fortran.com/fortran/F77_std/rjcnf0001.html
HTTPSConnectionPool(host='web.archive.org', port=443): Max retries exceeded with url: /web/20070205092427/http://www.fortran.com/fortran/F77_std/rjcnf0001.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://web.archive.org/web/20070205092427/http://www.fortran.com/fortran/F77_std/rjcnf0001.html
https://github.com/fortran-lang/stdlib
HTTPSConnectionPool(host='www.gnu.org', port=443): Max retries exceeded with url: /software/gnu-c-manual/gnu-c-manual.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.pdf
HTTPSConnectionPool(host='www.gnu.org', port=443): Max retries exceeded with url: /software/gnu-c-manual/gnu-c-manual.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.pdf

🤔 Uh oh... The following urls did not pass:
/tmp/LanguageReferenceOnLine.md:
     ❌️ https://www.openmp.org/wp-content/uploads/OpenMP3.1.pdf
     ❌️ https://adios2.readthedocs.io/en/latest/
     ❌️ https://support.hpe.com/hpesc/public/docDisplay?docId=a00115116en_us&docLocale=en_US&page=The_Cray_Compiling_Environment.html
     ❌️ https://www.gnu.org/software/libc/manual/html_mono/libc.html#I_002fO-Overview
     ❌️ https://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
     ❌️ https://www.json.org/json-en.html
     ❌️ https://www.khronos.org/sycl/resources
     ❌️ https://gcc.gnu.org/onlinedocs/cpp/
     ❌️ https://www.open-mpi.org/doc/v3.1/
     ❌️ https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
     ❌️ https://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf
     ❌️ https://www.open-mpi.org/doc/v4.0/
     ❌️ https://hpx-docs.stellar-group.org/latest/html/index.html
     ❌️ https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html
     ❌️ https://zsh.sourceforge.io/Guide/zshguide.html
     ❌️ https://slurm.schedmd.com
     ❌️ https://www.computerhope.com/unix/scp.htm
     ❌️ https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2011/n3242.pdf
     ❌️ https://docs.readthedocs.io/en/stable/tutorial/
     ❌️ https://man7.org/linux/man-pages/man2/syscalls.2.html
     ❌️ https://wg5-fortran.org/N1151-N1200/N1191.pdf
     ❌️ https://libc.llvm.org/
     ❌️ https://www.open-mpi.org/doc/v2.1/
     ❌️ https://legion.stanford.edu/pdfs/legion-manual.pdf
     ❌️ https://en.wikipedia.org/wiki/Reference_implementation
     ❌️ https://en.wikipedia.org/wiki/Virtual_private_network
.
.
.

@markcmiller86
Author

@vsoch by the way...if you need a project/task to charge some time to for this, I think I can accommodate. Let me know.

@vsoch
Collaborator

vsoch commented Feb 2, 2024

@markcmiller86 that might be reflecting the setup on your Mac?

I appreciate that, but this project has a FUNDING.yml meaning folks can find it with GitHub sponsors, and is clearly scoped outside of lab work. I have this registered as an outside business agreement and I set a pretty clear line between lab work and these projects, so I don't think that would work.

I'm pretty good at getting stuff done, so I can say I will be able to work on the underlying issues sooner than later, but absolutely not on lab time (I'm taking a quick break and drinking hot chocolate right now). ☕

@vsoch
Collaborator

vsoch commented Feb 2, 2024

Also double check you installed ca-certificates in the container, and try using --network=host too. Likely that won't fix it (I am terrible with Macs and know they are terrible with docker) but just a suggestion!

@markcmiller86
Author

I checked ca-certificates,

# apt-get install ca-certificates
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ca-certificates is already the newest version (20230311ubuntu0.22.04.1).
ca-certificates set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 19 not upgraded.

I installed the links browser and went to one of the URLs urlchecker deemed invalid. It works but claims the cert is invalid...

[Screenshot, 2024-02-01]

But, I get your point. Maybe the container is misconfigured. I certainly don't have much experience with them and I didn't launch it with --network=host.

@markcmiller86
Author

Ok, I gave up on docker. Installed on pascal. Asked ChatGPT for known sites with bad certs...

  • expired.badssl.com is set up with an expired SSL certificate.
  • self-signed.badssl.com uses a self-signed certificate.
  • wrong.host.badssl.com has a certificate that does not match the domain name.

Created this file

https://www.cultureco-op.com
https://expired.badssl.com
https://self-signed.badssl.com
https://wrong.host.badssl.com
https://www.sandia.gov

With --no-check-certs all pass. Without it, urlchecker flags the wrong-host and expired cases. So, I think this is working. Thanks for adding the feature!
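
The same with/without comparison can be sketched outside urlchecker using only the Python standard library. This is a rough illustration under stated assumptions, not how urlchecker performs its checks: `fetch_status` and `compare_cert_modes` are hypothetical names, and the unverified mode only approximates what --no-check-certs implies.

```python
import ssl
import urllib.request


def fetch_status(url, verify=True, timeout=5):
    """Open url and return its HTTP status; raises an OSError subclass on failure."""
    context = ssl.create_default_context()
    if not verify:
        # Skip both the host-name check and certificate chain validation.
        # check_hostname must be disabled before verify_mode is relaxed.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.status


def compare_cert_modes(urls, fetch=fetch_status):
    """Map each URL to (ok_with_verification, ok_without_verification)."""
    results = {}
    for url in urls:
        outcome = []
        for verify in (True, False):
            try:
                fetch(url, verify=verify)
                outcome.append(True)
            except OSError:
                # URLError, HTTPError and ssl.SSLError all derive from OSError.
                outcome.append(False)
        results[url] = tuple(outcome)
    return results
```

Calling `compare_cert_modes` with the five URLs listed above needs network access. The `fetch` parameter exists so the comparison logic can be exercised with a stand-in function when no network is available.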

@markcmiller86
Author

Also, not sure what you are doing on the back end as far as testing urlchecker, but I asked ChatGPT about useful URLs to use for testing...

Yes, for testing tools that check the validity and functionality of URLs in text files, it's helpful to use a variety of test websites and addresses that simulate different scenarios. Here are several categories and examples:

  1. HTTP Status Codes: Websites that return various HTTP status codes can help you test how your tool handles success, redirection, client errors, and server errors.

    • httpstat.us provides specific status codes (e.g., http://httpstat.us/200 for OK, http://httpstat.us/404 for Not Found, http://httpstat.us/500 for Internal Server Error).

  2. Invalid URLs: To test the handling of invalid URLs, you can construct URLs that are clearly malformed or unlikely to exist.

    • Example: http://thisisnotarealwebsite.invalid, https://123.456.789.012, or ftp://invalid.url.example.

  3. Timeout and Delay: To check how your tool handles timeouts and slow responses.

    • http://httpbin.org/delay/5 delays the response by 5 seconds, which can be used to simulate a slow server.

  4. DNS Errors: URLs that simulate DNS resolution errors can test how your tool handles domain names that cannot be resolved.

    • Example: http://domain.notfound.example; .example is a reserved TLD, so the name never resolves.

  5. Redirects: Testing how your tool handles HTTP redirects is crucial for ensuring it follows or respects redirects correctly.

    • http://httpbin.org/redirect/1 redirects to another page, which can be used to test redirect handling.

  6. SSL/TLS Issues: As previously mentioned, badssl.com hosts various subdomains with specific SSL/TLS issues, which is useful for testing secure connection errors.

  7. Large Response Bodies: To test how your tool handles large data transfers.

    • http://httpbin.org/stream/20 streams 20 lines of JSON, which can be useful for testing how your tool handles streaming data or large responses.

  8. WebSockets: Testing WebSocket connections can be important for tools that need to verify real-time communication protocols.

    • wss://echo.websocket.org provides a WebSocket server that echoes messages sent to it, useful for testing WebSocket connections.

When using these resources, it's important to consider the impact of your testing on third-party services. Ensure that your testing complies with any usage policies or terms of service to avoid causing undue load or other issues.
