Preserve directive prologues fixes #5 #7 (#6)
* Preserve directive prologues, fixes #5

The most important goal is correctness. Directives were destroyed by placing
imports above them. The spec defines a directive prologue as follows:

> A Directive Prologue is the longest sequence of ExpressionStatements occurring as
> the initial StatementListItems or ModuleItems of a FunctionBody, a ScriptBody, or
> a ModuleBody and where each ExpressionStatement in the sequence consists entirely
> of a StringLiteral token followed by a semicolon. The semicolon may appear explicitly
> or may be inserted by automatic semicolon insertion (12.9). A Directive Prologue may
> be an empty sequence.
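To illustrate the quoted definition, here is a minimal TypeScript sketch that computes the length of a directive prologue. The `Stmt` type and `directivePrologueLength` function are hypothetical and use a simplified statement model, not Babel's AST (Babel exposes parsed directives separately, as `ast.program.directives`). The sketch also shows why moving an import above `"use strict";` destroys the directive: the string literal no longer belongs to the prologue.

```typescript
// Hypothetical, simplified statement model (not Babel's AST): each statement
// records whether it is an ExpressionStatement made of a single string literal.
type Stmt =
  | { kind: "expression"; onlyStringLiteral: boolean; text: string }
  | { kind: "other"; text: string };

/** Length of the longest leading run of string-literal-only expression statements. */
function directivePrologueLength(statements: readonly Stmt[]): number {
  let count = 0;
  for (const stmt of statements) {
    // The prologue ends at the first statement that is not a directive.
    if (stmt.kind !== "expression" || !stmt.onlyStringLiteral) {
      break;
    }
    count += 1;
  }
  return count;
}

const body: Stmt[] = [
  { kind: "expression", onlyStringLiteral: true, text: '"use strict";' },
  { kind: "other", text: "import a from 'a';" },
  // Same shape as a directive, but it follows a non-directive statement.
  { kind: "expression", onlyStringLiteral: true, text: '"not a directive";' },
];

console.log(directivePrologueLength(body)); // 1

// Placing the import first pushes "use strict" out of the prologue.
const reordered: Stmt[] = [body[1], body[0], body[2]];
console.log(directivePrologueLength(reordered)); // 0
```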

* Remove nodes from code via proper algorithm, fixes #7

Previously, a regex string replacement was used, and regex is an inadequate
tool for transforming entire JavaScript files: the pattern built from a node's
text removes the first line-initial occurrence of that text, which is not
necessarily the node itself. Instead, this commit switches to an algorithm that
removes the code at the source position ranges of the nodes to be removed.
This will not work properly if the Babel parser ever returns incorrect source
positions, but that is not a regression, since the former algorithm already
relied on the positions being correct.
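The sketch below contrasts the two approaches on a hypothetical input where the text of an import also appears at the start of a line inside a template literal. `removeByRegex` mimics the removed regex-based code. `removeByRange` cuts out a single `[start, end)` range. Here the position is computed by hand instead of being taken from the Babel parser. Both helpers are illustrative only and are not part of the plugin.

```typescript
/** Escapes regex metacharacters, using the same pattern as the removed helper. */
const escapeRegExp = (s: string): string =>
  s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

/** Old approach (mimicked): drop the first line-initial match of the node's text. */
function removeByRegex(code: string, nodeText: string): string {
  return code.replace(new RegExp("^\\s*" + escapeRegExp(nodeText), "m"), "");
}

/** Position-based approach: cut out exactly the range [start, end). */
function removeByRange(code: string, start: number, end: number): string {
  return code.slice(0, start) + code.slice(end);
}

const importText = "import a from 'a';";
// The import text also appears at the start of a line inside a template literal.
const source = "const s = `\n" + importText + "`;\n" + importText + "\n";

// A parser would report the real import's position; here it is computed by hand.
const importStart = source.lastIndexOf(importText);
const importEnd = importStart + importText.length;

console.log(JSON.stringify(removeByRegex(source, importText)));
// "const s = `\n`;\nimport a from 'a';\n"  (template literal damaged, import kept)
console.log(JSON.stringify(removeByRange(source, importStart, importEnd)));
// "const s = `\nimport a from 'a';`;\n\n"  (only the real import removed)
```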

* Remove comments from directives we add back at the top, #5

The comments attached to directives are emitted again together with the
directives placed at the top, so we must remove them from the original code
that we append after the directives and the imports. Otherwise they would
appear twice.
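The following is a simplified, string-level sketch of the resulting output order. It works on text snippets instead of Babel nodes and emits the interpreter (if any), then the directives together with their comments, then the sorted imports, then the remaining original code. The `Parts` interface and `assemble` function are hypothetical. The plugin itself builds a new AST and prints it with `@babel/generator`, and its exact newline handling may differ.

```typescript
// Hypothetical, string-level model of the pieces the output is built from.
interface Parts {
  interpreter?: string; // e.g. "#!/usr/bin/env node"
  directives: string[]; // each directive including its leading comments
  sortedImports: string[];
  rest: string; // original code with all of the above (and their comments) removed
}

/** Concatenates the pieces in output order: interpreter, directives, imports, rest. */
function assemble(parts: Parts): string {
  const header: string[] = [
    ...(parts.interpreter !== undefined ? [parts.interpreter] : []),
    ...parts.directives,
    ...parts.sortedImports,
  ];
  return header.join("\n") + "\n" + parts.rest;
}

const output = assemble({
  directives: ['// keep me\n"use strict";'],
  sortedImports: ["import a from 'a';", "import z from 'z';"],
  rest: "\nconsole.log(a, z);\n",
});
console.log(output);
```

If `rest` still contained `// keep me`, the comment would be emitted twice. For this reason, directive comments are also removed from the original code.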
blutorange committed Apr 27, 2022
1 parent 9f5a0ba commit b5c23c7
Showing 19 changed files with 453 additions and 37 deletions.
10 changes: 7 additions & 3 deletions src/preprocessor.ts
@@ -7,7 +7,7 @@ import { getCodeFromAst } from './utils/get-code-from-ast';
import { getExperimentalParserPlugins } from './utils/get-experimental-parser-plugins';
import { getSortedNodes } from './utils/get-sorted-nodes';

export function preprocessor(code: string, options: PrettierOptions) {
export function preprocessor(code: string, options: PrettierOptions): string {
const {
importOrderParserPlugins,
importOrder,
@@ -24,6 +24,8 @@ export function preprocessor(code: string, options: PrettierOptions) {
};

const ast = babelParser(code, parserOptions);

const directives = ast.program.directives;
const interpreter = ast.program.interpreter;

traverse(ast, {
@@ -38,7 +40,9 @@ export function preprocessor(code: string, options: PrettierOptions) {
});

// short-circuit if there are no import declarations
if (importNodes.length === 0) return code;
if (importNodes.length === 0) {
return code;
}

const allImports = getSortedNodes(importNodes, {
importOrder,
@@ -48,5 +52,5 @@ export function preprocessor(code: string, options: PrettierOptions) {
importOrderSortSpecifiers,
});

return getCodeFromAst(allImports, code, interpreter);
return getCodeFromAst(allImports, code, directives, interpreter);
}
2 changes: 1 addition & 1 deletion src/utils/__tests__/get-code-from-ast.spec.ts
@@ -22,7 +22,7 @@ import a from 'a';
importOrderGroupNamespaceSpecifiers: false,
importOrderSortSpecifiers: false,
});
const formatted = getCodeFromAst(sortedNodes, code, null);
const formatted = getCodeFromAst(sortedNodes, code, [], undefined);
expect(format(formatted, { parser: 'babel' })).toEqual(
`// first comment
// second comment
5 changes: 4 additions & 1 deletion src/utils/__tests__/remove-nodes-from-original-code.spec.ts
@@ -1,11 +1,12 @@
import { parse as babelParser } from '@babel/parser';
import { format } from 'prettier';

import { getAllCommentsFromNodes } from '../get-all-comments-from-nodes';
import { getImportNodes } from '../get-import-nodes';
import { getSortedNodes } from '../get-sorted-nodes';
import { removeNodesFromOriginalCode } from '../remove-nodes-from-original-code';

const code = `// first comment
const code = `"some directive";// first comment
// second comment
import z from 'z';
import c from 'c';
@@ -18,6 +19,7 @@ import a from 'a';
`;

test('it should remove nodes from the original code', () => {
const ast = babelParser(code, { sourceType: 'module' });
const importNodes = getImportNodes(code);
const sortedNodes = getSortedNodes(importNodes, {
importOrder: [],
@@ -31,6 +33,7 @@ test('it should remove nodes from the original code', () => {
const commentAndImportsToRemoveFromCode = [
...sortedNodes,
...allCommentsFromImports,
...ast.program.directives,
];
const codeWithoutImportDeclarations = removeNodesFromOriginalCode(
code,
4 changes: 2 additions & 2 deletions src/utils/get-all-comments-from-nodes.ts
@@ -1,6 +1,6 @@
import { CommentBlock, CommentLine, Statement } from '@babel/types';
import { CommentBlock, CommentLine, Directive, Statement } from '@babel/types';

export const getAllCommentsFromNodes = (nodes: Statement[]) =>
export const getAllCommentsFromNodes = (nodes: (Directive | Statement)[]) =>
nodes.reduce((acc, node) => {
if (
Array.isArray(node.leadingComments) &&
17 changes: 13 additions & 4 deletions src/utils/get-code-from-ast.ts
@@ -1,26 +1,35 @@
import generate from '@babel/generator';
import { InterpreterDirective, Statement, file } from '@babel/types';
import { Directive, InterpreterDirective, Statement, file } from '@babel/types';

import { newLineCharacters } from '../constants';
import { getAllCommentsFromNodes } from './get-all-comments-from-nodes';
import { removeNodesFromOriginalCode } from './remove-nodes-from-original-code';

/**
* This function generates a code string from the passed nodes.
* @param nodes all imports
* @param originalCode
* @param nodes All imports, in the sorted order in which they should appear in
* the generated code.
* @param originalCode The original input code that was passed to this plugin.
* @param directives All directive prologues from the original code (e.g.
* `"use strict";`).
* @param interpreter Optional interpreter directives, if present (e.g.
* `#!/bin/node`).
*/
export const getCodeFromAst = (
nodes: Statement[],
originalCode: string,
directives: Directive[],
interpreter?: InterpreterDirective | null,
) => {
const allCommentsFromImports = getAllCommentsFromNodes(nodes);
const allCommentsFromDirectives = getAllCommentsFromNodes(directives);

const nodesToRemoveFromCode = [
...nodes,
...allCommentsFromImports,
...allCommentsFromDirectives,
...(interpreter ? [interpreter] : []),
...directives,
];

const codeWithoutImportsAndInterpreter = removeNodesFromOriginalCode(
@@ -31,7 +40,7 @@ export const getCodeFromAst = (
const newAST = file({
type: 'Program',
body: nodes,
directives: [],
directives: directives,
sourceType: 'module',
interpreter: interpreter,
sourceFile: '',
5 changes: 4 additions & 1 deletion src/utils/get-import-nodes.ts
@@ -2,7 +2,10 @@ import { ParserOptions, parse as babelParser } from '@babel/parser';
import traverse, { NodePath } from '@babel/traverse';
import { ImportDeclaration, isTSModuleDeclaration } from '@babel/types';

export const getImportNodes = (code: string, options?: ParserOptions) => {
export const getImportNodes = (
code: string,
options?: ParserOptions,
): ImportDeclaration[] => {
const importNodes: ImportDeclaration[] = [];
const ast = babelParser(code, {
...options,
5 changes: 4 additions & 1 deletion src/utils/get-sorted-nodes.ts
@@ -17,7 +17,10 @@ import { getSortedNodesByImportOrder } from './get-sorted-nodes-by-import-order'
* @param nodes All import nodes that should be sorted.
* @param options Options to influence the behavior of the sorting algorithm.
*/
export const getSortedNodes: GetSortedNodes = (nodes, options) => {
export const getSortedNodes: GetSortedNodes = (
nodes,
options,
): ImportOrLine[] => {
const { importOrderSeparation } = options;

// Split nodes at each boundary between a side-effect node and a
197 changes: 173 additions & 24 deletions src/utils/remove-nodes-from-original-code.ts
@@ -1,20 +1,165 @@
import {
CommentBlock,
CommentLine,
Directive,
ImportDeclaration,
InterpreterDirective,
Statement,
} from '@babel/types';

/** Escapes a string literal to be passed to new RegExp. See: https://stackoverflow.com/a/6969486/480608.
* @param s the string to escape
/** A range between a start position (inclusive) and an end position (exclusive). */
type Range = readonly [start: number, end: number];

/** An optional range between a start position (inclusive) and an end position (exclusive). */
type OptionalRange = readonly [start: number | null, end: number | null];

/** Compares two ranges by their start position. */
function compareRangesByStart(range1: Range, range2: Range): number {
return range1[0] < range2[0] ? -1 : range1[0] > range2[0] ? 1 : 0;
}

/** Type predicate that checks whether a range has a defined start and end. */
function hasRange(range: OptionalRange): range is Range {
return (
range[0] !== null &&
range[1] !== null &&
Number.isSafeInteger(range[0]) &&
Number.isSafeInteger(range[1])
);
}

/**
* @param range1 One range to check.
* @param range2 Another range to check.
* @returns `true` if both ranges have some overlap. This overlap may consist of
* a single point, e.g. `[2, 5)` and `[4, 8)` are considered overlapping.
*/
const escapeRegExp = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
function hasOverlap(range1: Range, range2: Range): boolean {
return range1[1] > range2[0] && range2[1] > range1[0];
}

/**
* Removes imports from original file
* @param code the whole file as text
* @param nodes to be removed
* Given two ranges that are known to overlap, constructs a new range
* representing the single range enclosing both ranges.
* @param range1 One range to process.
* @param range2 Another range to process.
* @returns A single range representing the union of both ranges.
*/
function mergeOverlappingRanges(range1: Range, range2: Range): Range {
return [Math.min(range1[0], range2[0]), Math.max(range1[1], range2[1])];
}

/**
* Given a list of ordered, possibly overlapping (non-disjoint) ranges,
* constructs a new list of ranges that consists entirely of disjoint ranges.
* The new list is also ordered.
* @param ranges A list of ranges that may be overlapping, but are ordered by their
* start position.
* @return A list of disjoint ranges that are also ordered by their start
* position.
*/
function mergeRanges(ranges: Range[]): Range[] {
// Start with a result list initialized to the empty list
// Iterate over all given ranges. If a range overlaps the last item in
// the result list, replace the last item with the merger between that item
// and the range. Otherwise, just add the item to the result list.
// For comparison, see also
// https://www.geeksforgeeks.org/merging-intervals/
const merged: Range[] = [];
for (const range of ranges) {
const currRange = merged[merged.length - 1];
if (currRange !== undefined && hasOverlap(currRange, range)) {
merged[merged.length - 1] = mergeOverlappingRanges(
currRange,
range,
);
} else {
merged.push(range);
}
}
return merged;
}

/**
* Takes a list of ordered, disjoint, non-overlapping ranges and a range
* `[0, totalLength)` that encloses all those ranges.
*
* Constructs a new list of ranges representing the negation of the ranges with
* respect to the enclosing range `[0, totalLength)`. Put in other words,
* subtracts the given ranges from the range `[0, totalLength)`.
*
* More formally, let `r_1`, `r_2`, ..., `r_n` denote the sets represented by
* the given ranges; and let `r` be the set `[0, totalLength)`. Then this
* function returns a list of ranges representing the set
*
* > r \ r_1 \ r_2 \ ... \ r_n
*
* (where `\` is the set difference operator)
* @param ranges A list of disjoint (non-overlapping) ranges ordered by
* their start position.
* @param totalLength The end of the enclosing range from which to subtract
* the given ranges.
* @returns A list of ranges representing the inverse of the given ranges with
* respect to the enclosing range.
*/
function invertRanges(ranges: Range[], totalLength: number): Range[] {
// We'd run into out-of-bounds checks if we performed the rest of the
// algorithm with an empty array, and would have to insert additional
// checks. So just return immediately to keep the code simpler.
if (ranges.length === 0) {
return ranges;
}

const resultRanges: Range[] = [];
const isValidRange = (start: number, end: number) => end > start;

// Add the part from the start of the enclosing range to the start of the
// first range to exclude.
//
// |-----------xxxxx-----xxxx-----xxxx-----------|
// ^---------^
// This part
const firstRange = ranges[0];
if (isValidRange(0, firstRange[0])) {
resultRanges.push([0, firstRange[0]]);
}

// Iterate over the parts between the ranges to remove and add those parts.
//
// |----------xxxxx-----xxxx------xxxx-----------|
// ^---^ ^----^
// These parts
for (let index = 0; index < ranges.length - 1; index += 1) {
const prevRange = ranges[index];
const nextRange = ranges[index + 1];
const start = prevRange[1];
const end = nextRange[0];
if (isValidRange(start, end)) {
resultRanges.push([start, end]);
}
}

// Add the part from the end of the last range to exclude to the end of the
// enclosing range.
//
// |----------xxxxx-----xxxx-----xxxx------------|
// ^----------^
// This part
const lastRange = ranges[ranges.length - 1];
if (isValidRange(lastRange[1], totalLength)) {
resultRanges.push([lastRange[1], totalLength]);
}

return resultRanges;
}

/**
* Given a piece of code and a list of nodes that appear in that code, removes
* all those nodes from the code.
* @param code The whole file as text
* @param nodes List of nodes to be removed from the code.
* @return The given code with all parts of the code removed that correspond to
* one of the given nodes.
*/
export const removeNodesFromOriginalCode = (
code: string,
@@ -24,23 +169,27 @@ export const removeNodesFromOriginalCode = (
| CommentLine
| ImportDeclaration
| InterpreterDirective
| Directive
)[],
) => {
let text = code;
for (const node of nodes) {
const start = Number(node.start);
const end = Number(node.end);
if (Number.isSafeInteger(start) && Number.isSafeInteger(end)) {
text = text.replace(
// only replace imports at the beginning of the line (ignoring whitespace)
// otherwise matching commented imports will be replaced
new RegExp(
'^\\s*' + escapeRegExp(code.substring(start, end)),
'm',
),
'',
);
}
}
return text;
): string => {
// A list of ranges we should remove from the code
// Each range [start, end) consists of a start position in the code
// (inclusive) and an end position in the code (exclusive)
const excludes = nodes
.map((node) => [node.start, node.end] as const)
.filter(hasRange)
.sort(compareRangesByStart);

// In case there are overlapping ranges, merge all overlapping ranges into
// a single range.
const mergedExcludes = mergeRanges(excludes);

// Find the code ranges we want to keep by inverting the excludes with
// respect to the entire range [0, code.length)
const includes = invertRanges(mergedExcludes, code.length);

// Extract all code parts we want to keep and join them together again
return includes
.map((include) => code.substring(include[0], include[1]))
.join('');
};
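A compact, self-contained sketch of the merge-and-invert pipeline above, applied to a toy string, can help check the edge cases. `CodeRange` and `removeRanges` are names chosen for this example. `invertRanges` here uses a running cursor instead of the three explicit cases, which should give the same result for a non-empty list. Duplicate ranges are merged. This covers, for example, a comment that may be collected both as a trailing comment of one node and as a leading comment of the next. Adjacent ranges such as `[0, 5)` and `[5, 9)` stay separate but leave no empty gap between them. There is one difference: for an empty list this sketch keeps the whole string, whereas the committed `invertRanges` returns an empty list. The preprocessor only calls `getCodeFromAst` when the file contains at least one import, so that case should not arise there.

```typescript
/** Half-open range [start, end) into the source text. */
type CodeRange = readonly [start: number, end: number];

/** Merges overlapping ranges; the input must be sorted by start position. */
function mergeRanges(ranges: readonly CodeRange[]): CodeRange[] {
  const merged: CodeRange[] = [];
  for (const range of ranges) {
    const last = merged[merged.length - 1];
    if (last !== undefined && last[1] > range[0] && range[1] > last[0]) {
      merged[merged.length - 1] = [
        Math.min(last[0], range[0]),
        Math.max(last[1], range[1]),
      ];
    } else {
      merged.push(range);
    }
  }
  return merged;
}

/** Complement of sorted, disjoint ranges within [0, totalLength); empty gaps are skipped. */
function invertRanges(ranges: readonly CodeRange[], totalLength: number): CodeRange[] {
  const result: CodeRange[] = [];
  let cursor = 0; // end of the previous excluded range
  for (const [start, end] of ranges) {
    if (start > cursor) {
      result.push([cursor, start]);
    }
    cursor = end;
  }
  if (totalLength > cursor) {
    result.push([cursor, totalLength]);
  }
  return result;
}

/** Removes the given (possibly unsorted, overlapping) ranges from the code. */
function removeRanges(code: string, ranges: readonly CodeRange[]): string {
  const sorted = [...ranges].sort((a, b) => a[0] - b[0]);
  return invertRanges(mergeRanges(sorted), code.length)
    .map(([start, end]) => code.substring(start, end))
    .join("");
}

// Toy input: AAAAA = [0, 5), BBBB = [5, 9), ccc = [9, 12), DDDDDDDD = [12, 20), eeeee = [20, 25)
const toy = "AAAAABBBBcccDDDDDDDDeeeee";
console.log(removeRanges(toy, [[12, 20], [0, 5], [5, 9], [0, 5]])); // ccceeeee
console.log(removeRanges(toy, [])); // AAAAABBBBcccDDDDDDDDeeeee
```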
