codesprout-com/lychee

 
 


lychee


⚡ A fast, async, stream-based link checker written in Rust.
Finds broken hyperlinks and mail addresses inside Markdown, HTML, reStructuredText, or any other text file or website!

Available as a command-line utility, a library and a GitHub Action.


Installation

Arch Linux

pacman -S lychee-link-checker

macOS

brew install lychee

Docker

docker pull lycheeverse/lychee

NixOS

nix-env -iA nixos.lychee

FreeBSD

pkg install lychee

Scoop

scoop install lychee

Termux

pkg install lychee

Pre-built binaries

We provide binaries for Linux, macOS, and Windows for every release.
You can download them from the releases page.

Cargo

Build dependencies

On APT/dpkg-based Linux distros (e.g. Debian, Ubuntu, Linux Mint and Kali Linux) the following commands will install all required build dependencies, including the Rust toolchain and cargo:

curl -sSf 'https://sh.rustup.rs' | sh
apt install gcc pkg-config libc6-dev libssl-dev

Compile and install lychee

cargo install lychee

Features

This comparison is made on a best-effort basis. Please create a PR to fix outdated information.

| Feature | lychee | awesome_bot | muffet | broken-link-checker | linkinator | linkchecker | markdown-link-check | fink |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Language | Rust | Ruby | Go | JS | TypeScript | Python | JS | PHP |
| Async/Parallel | yes | yes | yes | yes | yes | yes | yes | yes |
| JSON output | yes | no | yes | yes | yes | maybe¹ | yes | yes |
| Static binary | yes | no | yes | no | no | no | no | no |
| Markdown files | yes | yes | no | no | no | yes | yes | no |
| HTML files | yes | no | no | yes | yes | no | yes | no |
| Text files | yes | no | no | no | no | no | no | no |
| Website support | yes | no | yes | yes | yes | yes | no | yes |
| Chunked encodings | yes | maybe | maybe | maybe | maybe | no | yes | yes |
| GZIP compression | yes | maybe | maybe | yes | maybe | yes | maybe | no |
| Basic Auth | yes | no | no | yes | no | yes | no | no |
| Custom user agent | yes | no | no | yes | no | yes | no | no |
| Relative URLs | yes | yes | no | yes | yes | yes | yes | yes |
| Skip relative URLs | yes | no | no | maybe | no | no | no | no |
| Include patterns | yes | yes | no | yes | no | no | no | no |
| Exclude patterns | yes | no | yes | yes | yes | yes | yes | yes |
| Handle redirects | yes | yes | yes | yes | yes | yes | yes | yes |
| Ignore insecure SSL | yes | yes | yes | no | no | yes | no | yes |
| File globbing | yes | yes | no | no | yes | no | yes | no |
| Limit scheme | yes | no | no | yes | no | yes | no | no |
| Custom headers | yes | no | yes | no | no | no | yes | yes |
| Summary | yes | yes | yes | maybe | yes | yes | no | yes |
| HEAD requests | yes | yes | no | yes | yes | yes | no | no |
| Colored output | yes | maybe | yes | maybe | yes | yes | no | yes |
| Filter status code | yes | yes | no | no | no | no | yes | no |
| Custom timeout | yes | yes | yes | no | yes | yes | no | yes |
| E-mail links | yes | no | no | no | no | yes | no | no |
| Progress bar | yes | yes | no | no | no | yes | yes | yes |
| Retry and backoff | yes | no | no | no | yes | no | yes | no |
| Skip private domains | yes | no | no | no | no | no | no | no |
| Use as library | yes | yes | no | yes | yes | no | yes | no |
| Quiet mode | yes | no | no | no | yes | yes | yes | yes |
| Config file | yes | no | no | no | yes | yes | yes | no |
| Recursion | no | no | yes | yes | yes | yes | yes | no |
| Amazing lychee logo | yes | no | no | no | no | no | no | no |

¹ Other machine-readable formats like CSV are supported.

Commandline usage

Recursively check all links in supported files inside the current directory:

lychee .

You can also specify various types of inputs:

# check links in specific local file(s):
lychee README.md
lychee test.html info.txt

# check links on a website:
lychee https://endler.dev

# check links in directory but block network requests
lychee --offline path/to/directory

# check links in a remote file:
lychee https://raw.githubusercontent.com/lycheeverse/lychee/master/README.md

# check links in local files via shell glob:
lychee ~/projects/*/README.md

# check links in local files (lychee supports advanced globbing and ~ expansion):
lychee "~/projects/big_project/**/README.*"

# ignore case when globbing and check result for each link:
lychee --glob-ignore-case --verbose "~/projects/**/[r]eadme.*"

# check links from epub file (requires atool: https://www.nongnu.org/atool)
acat -F zip {file.epub} "*.xhtml" "*.html" | lychee -

lychee parses other file formats as plaintext and extracts links using linkify. This generally works well if there are no format or encoding specifics, but in case you need dedicated support for a new file format, please consider creating an issue.

Docker Usage

Here's how to mount a local directory into the container and check some input with lychee. The --init parameter is passed so that lychee can be stopped from the terminal. We also pass -it to start an interactive terminal, which is required to show the progress bar.

docker run --init -it -v `pwd`:/input lycheeverse/lychee /input/README.md

GitHub Token

To avoid getting rate-limited while checking GitHub links, you can optionally set an environment variable with your GitHub token (e.g. GITHUB_TOKEN=xxxx) or use the --github-token CLI option. The token can also be set in the config file. Here is an example config file.

The token can be generated in your GitHub account settings page. A personal token with no extra permissions is enough to check links to public repositories.
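
If you launch lychee from another Rust program (for instance a small CI helper), the token can be passed through the child's environment rather than on the command line. The following is a minimal sketch using only the standard library. The helper name `lychee_with_token` and the placeholder token are illustrative assumptions, not part of lychee; the command is only built here, not executed.

```rust
use std::process::Command;

/// Build (but do not run) a lychee invocation that hands the token over via
/// the `GITHUB_TOKEN` environment variable instead of `--github-token`.
/// `lychee_with_token` is a hypothetical helper, not part of lychee itself.
fn lychee_with_token(token: &str, inputs: &[&str]) -> Command {
    let mut cmd = Command::new("lychee");
    cmd.env("GITHUB_TOKEN", token)
        // Recommended for non-interactive shells such as CI.
        .arg("--no-progress")
        .args(inputs);
    cmd
}

fn main() {
    // "xxxx" is a placeholder; read the real token from a secret store.
    let cmd = lychee_with_token("xxxx", &["README.md"]);
    let args: Vec<_> = cmd.get_args().collect();
    println!("would run: lychee {:?}", args);
    // Call `cmd.status()` on a mutable `cmd` to actually spawn lychee.
}
```

Passing the token via the environment keeps it out of the argument list, which may be visible to other processes on the same machine.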

Commandline Parameters

There is an extensive list of commandline parameters to customize the behavior. See below for a full list.

A fast, async link checker

Finds broken URLs and mail addresses inside Markdown, HTML, `reStructuredText`, websites and more!

Usage: lychee [OPTIONS] <inputs>...

Arguments:
  <inputs>...
          The inputs (where to get links to check from). These can be: files (e.g. `README.md`), file globs (e.g. `"~/git/*/README.md"`), remote URLs (e.g. `https://example.com/README.md`) or standard input (`-`). NOTE: Use `--` to separate inputs from options that allow multiple arguments

Options:
  -c, --config <CONFIG_FILE>
          Configuration file to use
          
          [default: ./lychee.toml]

  -v, --verbose...
          Set verbosity level; more output per occurrence (e.g. `-v` or `-vv`)

  -q, --quiet...
          Less output per occurrence (e.g. `-q` or `-qq`)

  -n, --no-progress
          Do not show progress bar.
          This is recommended for non-interactive shells (e.g. for continuous integration)

      --cache
          Use request cache stored on disk at `.lycheecache`

      --max-cache-age <MAX_CACHE_AGE>
          Discard all cached requests older than this duration
          
          [default: 1d]

      --dump
          Don't perform any link checking. Instead, dump all the links extracted from inputs that would be checked

  -m, --max-redirects <MAX_REDIRECTS>
          Maximum number of allowed redirects
          
          [default: 5]

      --max-retries <MAX_RETRIES>
          Maximum number of retries per request
          
          [default: 3]

      --max-concurrency <MAX_CONCURRENCY>
          Maximum number of concurrent network requests
          
          [default: 128]

  -T, --threads <THREADS>
          Number of threads to utilize. Defaults to number of cores available to the system

  -u, --user-agent <USER_AGENT>
          User agent
          
          [default: lychee/0.10.3]

  -i, --insecure
          Proceed for server connections considered insecure (invalid TLS)

  -s, --scheme <SCHEME>
          Only test links with the given schemes (e.g. http and https)

      --offline
          Only check local files and block network requests

      --include <INCLUDE>
          URLs to check (supports regex). Has preference over all excludes

      --exclude <EXCLUDE>
          Exclude URLs and mail addresses from checking (supports regex)

      --exclude-file <EXCLUDE_FILE>
          Deprecated; use `--exclude-path` instead

      --exclude-path <EXCLUDE_PATH>
          Exclude file path from getting checked

  -E, --exclude-all-private
          Exclude all private IPs from checking.
          Equivalent to `--exclude-private --exclude-link-local --exclude-loopback`

      --exclude-private
          Exclude private IP address ranges from checking

      --exclude-link-local
          Exclude link-local IP address range from checking

      --exclude-loopback
          Exclude loopback IP address range and localhost from checking

      --exclude-mail
          Exclude all mail addresses from checking

      --remap <REMAP>
          Remap URI matching pattern to different URI

      --header <HEADER>
          Custom request header

  -a, --accept <ACCEPT>
          Comma-separated list of accepted status codes for valid links

  -t, --timeout <TIMEOUT>
          Website timeout in seconds from connect to response finished
          
          [default: 20]

  -r, --retry-wait-time <RETRY_WAIT_TIME>
          Minimum wait time in seconds between retries of failed requests
          
          [default: 1]

  -X, --method <METHOD>
          Request method
          
          [default: get]

  -b, --base <BASE>
          Base URL or website root directory to check relative URLs e.g. https://example.com or `/path/to/public`

      --basic-auth <BASIC_AUTH>
          Basic authentication support. E.g. `username:password`

      --github-token <GITHUB_TOKEN>
          GitHub API token to use when checking github.com links, to avoid rate limiting
          
          [env: GITHUB_TOKEN]

      --skip-missing
          Skip missing input files (default is to error if they don't exist)

      --include-verbatim
          Find links in verbatim sections like `pre`- and `code` blocks

      --glob-ignore-case
          Ignore case when expanding filesystem path glob inputs

  -o, --output <OUTPUT>
          Output file of status report

  -f, --format <FORMAT>
          Output format of final status report (compact, detailed, json, markdown)
          
          [default: compact]

      --require-https
          When HTTPS is available, treat HTTP links as errors

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version

Exit codes

  • 0 for success (all links checked successfully or excluded/skipped as configured)
  • 1 for missing inputs and any unexpected runtime failures or config errors
  • 2 for link check failures (if any non-excluded link failed the check)
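
When wrapping lychee in your own tooling, it can help to turn these exit codes into a typed value. Here is a small sketch of how that might look; the `LycheeOutcome` enum and `classify_exit_code` function are hypothetical names introduced for illustration and are not provided by lychee.

```rust
/// Outcome of a lychee run, derived from the documented exit codes.
/// Hypothetical type for illustration; not part of lychee or lychee_lib.
#[derive(Debug, PartialEq, Eq)]
enum LycheeOutcome {
    /// Exit code 0: all links checked successfully or excluded/skipped.
    Success,
    /// Exit code 1: missing inputs, unexpected runtime failures or config errors.
    RuntimeOrConfigError,
    /// Exit code 2: at least one non-excluded link failed the check.
    LinkCheckFailure,
    /// Any other code, which the list above does not describe.
    Unknown(i32),
}

fn classify_exit_code(code: i32) -> LycheeOutcome {
    match code {
        0 => LycheeOutcome::Success,
        1 => LycheeOutcome::RuntimeOrConfigError,
        2 => LycheeOutcome::LinkCheckFailure,
        other => LycheeOutcome::Unknown(other),
    }
}

fn main() {
    for code in [0, 1, 2, 42] {
        println!("{code} -> {:?}", classify_exit_code(code));
    }
}
```

A CI wrapper could, for example, treat `LinkCheckFailure` as a warning on draft branches while still failing hard on `RuntimeOrConfigError`.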

Ignoring links

You can exclude links from getting checked by specifying regex patterns with --exclude (e.g. --exclude example\.(com|org)). If a file named .lycheeignore exists in the current working directory, its contents are excluded as well. The file allows you to list multiple regular expressions for exclusion (one pattern per line).

To exclude files or directories from being scanned, set exclude_path in lychee.toml:

exclude_path = ["some/path", "*/dev/*"]

Caching

If the --cache flag is set, lychee will cache responses in a file called .lycheecache in the current directory. If the file exists and the flag is set, then the cache will be loaded on startup. This can greatly speed up future runs. Note that by default lychee will not store any data on disk.
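
Cached entries are subject to --max-cache-age (default 1d): entries older than that duration are discarded. The rule can be sketched as below. This only illustrates the documented behavior under the assumption that an entry exactly at the limit is still kept; it is not lychee's actual implementation, and `is_fresh` and `DEFAULT_MAX_CACHE_AGE` are hypothetical names.

```rust
use std::time::Duration;

/// Mirrors the documented default of `--max-cache-age` (1d).
/// Hypothetical constant for illustration only.
const DEFAULT_MAX_CACHE_AGE: Duration = Duration::from_secs(24 * 60 * 60);

/// An entry is kept as long as it is not older than `max_age`.
/// Assumption: an entry exactly `max_age` old still counts as fresh.
fn is_fresh(entry_age: Duration, max_age: Duration) -> bool {
    entry_age <= max_age
}

fn main() {
    let one_hour = Duration::from_secs(60 * 60);
    let two_days = Duration::from_secs(2 * 24 * 60 * 60);
    for age in [one_hour, two_days] {
        println!("{:?} fresh: {}", age, is_fresh(age, DEFAULT_MAX_CACHE_AGE));
    }
}
```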

Library usage

You can use lychee as a library for your own projects! Here is a "hello world" example:

use lychee_lib::Result;

#[tokio::main]
async fn main() -> Result<()> {
  let response = lychee_lib::check("https://github.com/lycheeverse/lychee").await?;
  println!("{response}");
  Ok(())
}

This is equivalent to the following snippet, in which we build our own client:

use lychee_lib::{ClientBuilder, Result};

#[tokio::main]
async fn main() -> Result<()> {
  let client = ClientBuilder::default().client()?;
  let response = client.check("https://github.com/lycheeverse/lychee").await?;
  assert!(response.status().is_success());
  Ok(())
}

The client builder is very customizable:

let client = lychee_lib::ClientBuilder::builder()
    .includes(includes)
    .excludes(excludes)
    .max_redirects(cfg.max_redirects)
    .user_agent(cfg.user_agent)
    .allow_insecure(cfg.insecure)
    .custom_headers(headers)
    .method(method)
    .timeout(timeout)
    .github_token(cfg.github_token)
    .scheme(cfg.scheme)
    .accepted(accepted)
    .build()
    .client()?;

All options that you set will be used for all link checks. See the builder documentation for all options. For more information, check out the examples folder.

GitHub Action Usage

A GitHub Action that uses lychee is available as a separate repository, lycheeverse/lychee-action, which includes usage instructions.

Contributing to lychee

We'd be thankful for any contribution.
We try to keep the issue-tracker up-to-date so you can quickly find a task to work on.

Try one of these links to get started:

For more detailed instructions, head over to CONTRIBUTING.md.

Debugging and improving async code

Lychee makes heavy use of async code to be resource-friendly while still being performant. Async code can be difficult to troubleshoot with most tools, however. Therefore we provide experimental support for tokio-console. It provides a top(1)-like overview for async tasks!

If you want to give it a spin, download and start the console:

git clone https://github.com/tokio-rs/console
cd console
cargo run

Then run lychee with some special flags and features enabled.

RUSTFLAGS="--cfg tokio_unstable" cargo run --features tokio-console -- <input1> <input2> ...

If you find a way to make lychee faster, please do reach out.

Troubleshooting and Workarounds

We collect a list of common workarounds for various websites in our troubleshooting guide.

Users

If you are using lychee for your project, please add it here.

Credits

The first prototype of lychee was built in episode 10 of Hello Rust. Thanks to all GitHub and Patreon sponsors for supporting the development since the beginning. Also, thanks to all the great contributors who have since made this project more mature.

License

lychee is licensed under either of

at your option.
