feat: add --server-root flag #191

Merged: 1 commit, Nov 29, 2020
43 changes: 25 additions & 18 deletions README.md
@@ -2,10 +2,11 @@
> A super simple site crawler and broken link checker.

[![npm version](https://img.shields.io/npm/v/linkinator.svg)](https://www.npmjs.org/package/linkinator)
[![Build Status](https://api.cirrus-ci.com/github/JustinBeckwith/linkinator.svg)](https://cirrus-ci.com/github/JustinBeckwith/linkinator)
[![Build Status](https://github.com/JustinBeckwith/linkinator/workflows/ci/badge.svg)](https://github.com/JustinBeckwith/linkinator/actions)
[![codecov](https://codecov.io/gh/JustinBeckwith/linkinator/branch/master/graph/badge.svg)](https://codecov.io/gh/JustinBeckwith/linkinator)
[![Dependency Status](https://img.shields.io/david/JustinBeckwith/linkinator.svg)](https://david-dm.org/JustinBeckwith/linkinator)
[![Known Vulnerabilities](https://snyk.io/test/github/JustinBeckwith/linkinator/badge.svg)](https://snyk.io/test/github/JustinBeckwith/linkinator)
[![Code Style: Google](https://img.shields.io/badge/code%20style-google-blueviolet.svg)](https://github.com/google/gts)
[![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg)](https://github.com/semantic-release/semantic-release)


@@ -26,7 +27,7 @@ $ npm install linkinator

You can use this as a library, or as a CLI. Let's see the CLI!

```
$ linkinator LOCATION [ --arguments ]

Positional arguments
@@ -36,35 +37,39 @@ $ linkinator LOCATION [ --arguments ]

Flags

--concurrency
The number of connections to make simultaneously. Defaults to 100.

--config
Path to the config file to use. Looks for `linkinator.config.json` by default.

--format, -f
Return the data in CSV or JSON format.

--help
Show this command.

--include, -i
List of urls in regexy form to include. The opposite of --skip.

--markdown
Automatically parse and scan markdown if scanning from a location on disk.

--recurse, -r
Recursively follow links on the same root domain.

--server-root
When scanning a local directory, customize the location on disk
where the server is started. Defaults to the path passed in [LOCATION].

--silent
Only output broken links.

--skip, -s
List of urls in regexy form to not include in the check.

--timeout
Request timeout in ms. Defaults to 0 (no timeout).
```

### Command Examples
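The new flag pairs a scan target with a separate web root. As a sketch (mirroring the CLI test added in `test/zcli.ts` below), scanning a markdown file while serving a different directory as the server root looks like:

```
$ linkinator --markdown --server-root test/fixtures/markdown README.md
```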
@@ -142,6 +147,8 @@ Asynchronous method that runs a site wide scan. Options come in the form of an object:
- `concurrency` (number) - The number of connections to make simultaneously. Defaults to 100.
- `port` (number) - When the `path` is provided as a local path on disk, the `port` on which to start the temporary web server. Defaults to a random high range order port.
- `recurse` (boolean) - By default, all scans are shallow. Only the top level links on the requested page will be scanned. By setting `recurse` to `true`, the crawler will follow all links on the page, and continue scanning links **on the same domain** for as long as it can go. Results are cached, so no worries about loops.
- `serverRoot` (string) - When scanning a local directory, customize the location on disk where the server is started. Defaults to the path passed in `path`.
- `timeout` (number) - By default, requests made by linkinator do not time out (or follow the settings of the OS). This option (in milliseconds) will fail requests after the configured amount of time.
- `markdown` (boolean) - Automatically parse and scan markdown if scanning from a location on disk.
- `linksToSkip` (array | function) - An array of regular expression strings that should be skipped, OR an async function that's called for each link with the link URL as its only argument. Return a Promise that resolves to `true` to skip the link or `false` to check it.
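A minimal sketch of the new `serverRoot` option through the API, mirroring the test added in `test/test.ts` below; the named `import` from the package entry point is an assumption, since the tests import `check` from the source tree:

```ts
// Sketch only: assumes `check` is exported from the package entry point,
// as it is from src/index.ts in this repo.
import {check} from 'linkinator';

async function scanReadme() {
  const results = await check({
    path: 'README.md',
    markdown: true,
    // Serve this directory as the web root instead of the directory of `path`.
    serverRoot: 'test/fixtures/markdown',
  });
  console.log(`${results.links.length} links checked, passed: ${results.passed}`);
}

scanReadme().catch(console.error);
```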
33 changes: 20 additions & 13 deletions src/cli.ts
@@ -25,33 +25,38 @@ const cli = meow(
Required. Either the URLs or the paths on disk to check for broken links.

Flags

--concurrency
The number of connections to make simultaneously. Defaults to 100.

--config
Path to the config file to use. Looks for \`linkinator.config.json\` by default.

--format, -f
Return the data in CSV or JSON format.

--help
Show this command.

--markdown
Automatically parse and scan markdown if scanning from a location on disk.

--recurse, -r
Recursively follow links on the same root domain.

--server-root
When scanning a local directory, customize the location on disk
where the server is started. Defaults to the path passed in [LOCATION].

--silent
Only output broken links

--skip, -s
List of urls in regexy form to not include in the check.

--timeout
Request timeout in ms. Defaults to 0 (no timeout).

Examples
$ linkinator docs/
$ linkinator https://www.google.com
@@ -69,6 +74,7 @@ const cli = meow(
silent: {type: 'boolean'},
timeout: {type: 'number'},
markdown: {type: 'boolean'},
serverRoot: {type: 'string'},
},
booleanDefault: undefined,
}
@@ -121,6 +127,7 @@ async function main() {
timeout: Number(flags.timeout),
markdown: flags.markdown,
concurrency: Number(flags.concurrency),
serverRoot: flags.serverRoot,
};
if (flags.skip) {
if (typeof flags.skip === 'string') {
1 change: 1 addition & 0 deletions src/config.ts
@@ -12,6 +12,7 @@ export interface Flags {
silent?: boolean;
timeout?: number;
markdown?: boolean;
serverRoot?: string;
}

export async function getConfig(flags: Flags) {
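Because `serverRoot` is now part of the `Flags` interface that `getConfig` consumes, it should also be settable from a config file. A hypothetical `linkinator.config.json` sketch (the camelCase key name is an assumption based on the interface, not something this PR documents):

```json
{
  "markdown": true,
  "serverRoot": "test/fixtures/markdown"
}
```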
58 changes: 46 additions & 12 deletions src/index.ts
@@ -24,6 +24,7 @@ export interface CheckOptions {
timeout?: number;
markdown?: boolean;
linksToSkip?: string[] | ((link: string) => Promise<boolean>);
serverRoot?: string;
}

export enum LinkState {
@@ -64,28 +65,20 @@ export class LinkChecker extends EventEmitter {
* @param options Options to use while checking for 404s
*/
async check(options: CheckOptions) {
this.validateOptions(options);
options.linksToSkip = options.linksToSkip || [];
options.path = path.normalize(options.path);
let server: http.Server | undefined;
if (!options.path.startsWith('http')) {
const serverOptions = await this.getServerRoot(options);
const port = options.port || 5000 + Math.round(Math.random() * 1000);
server = await this.startWebServer(
serverOptions.serverRoot,
port,
options.markdown
);
enableDestroy(server);
options.path = `http://localhost:${port}${serverOptions.path}`;
}

const queue = new PQueue({
@@ -118,6 +111,47 @@
return result;
}

/**
* Validate the provided flags all work with each other.
* @param options CheckOptions passed in from the CLI (or API)
*/
private validateOptions(options: CheckOptions) {
if (options.serverRoot && options.path.startsWith('http')) {
throw new Error(
"'serverRoot' cannot be defined when the 'path' points to an HTTP endpoint."
);
}
}

/**
* Figure out which directory should be used as the root for the web server,
* and how that impacts the path to the file for the first request.
* @param options CheckOptions passed in from the CLI or API
*/
private async getServerRoot(options: CheckOptions) {
if (options.serverRoot) {
const filePath = options.path.startsWith('/')
? options.path
: '/' + options.path;
return {
serverRoot: options.serverRoot,
path: filePath,
};
}
let localDirectory = options.path;
let localFile = '';
const s = await stat(options.path);
if (s.isFile()) {
const pathParts = options.path.split(path.sep);
localFile = path.sep + pathParts[pathParts.length - 1];
localDirectory = pathParts.slice(0, pathParts.length - 1).join(path.sep);
}
return {
serverRoot: localDirectory,
path: localFile,
};
}

/**
* Spin up a local HTTP server to serve static requests from disk
* @param root The local path that should be mounted as a static web server
20 changes: 20 additions & 0 deletions test/test.ts
@@ -289,4 +289,24 @@ describe('linkinator', () => {
assert.strictEqual(results.links.length, 3);
assert.ok(results.passed);
});

it('should throw an error if you pass server-root and an http based path', async () => {
await assert.rejects(
check({
path: 'https://jbeckwith.com',
serverRoot: process.cwd(),
}),
/cannot be defined/
);
});

it('should allow overriding the server root', async () => {
const results = await check({
serverRoot: 'test/fixtures/markdown',
markdown: true,
path: 'README.md',
});
assert.strictEqual(results.links.length, 3);
assert.ok(results.passed);
});
});
11 changes: 11 additions & 0 deletions test/zcli.ts
@@ -61,4 +61,15 @@ describe('cli', () => {
]);
assert.strictEqual(res.stdout.indexOf('['), -1);
});

it('should accept a server-root', async () => {
const res = await execa('npx', [
'linkinator',
'--markdown',
'--server-root',
'test/fixtures/markdown',
'README.md',
]);
assert.ok(res.stdout.includes('Successfully scanned'));
});
});