(fix) fixed the tests + (feat) added castor tooling and a Symfony bridge
* added castor and some tooling
* updated tests
* added a Symfony bundle for easy configuration
xavierlacot committed Aug 21, 2023
1 parent 8306969 commit 04ab340
Showing 139 changed files with 4,082 additions and 433 deletions.
78 changes: 78 additions & 0 deletions .castor/qa.php
@@ -0,0 +1,78 @@
<?php

/*
 * This file is part of JoliCode's "markdown fixer" project.
 *
 * (c) JoliCode <coucou@jolicode.com>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

namespace qa;

use Castor\Attribute\AsOption;
use Castor\Attribute\AsTask;
use Symfony\Component\Console\Input\InputOption;

use function Castor\get_context;
use function Castor\run;

#[AsTask(description: 'Runs all QA tasks')]
function all(): void
{
    install();
    cs(false);
    phpstan();
    rector();
    phpunit();
}

#[AsTask(description: 'Installs tooling')]
function install(): void
{
    run('composer install --working-dir tools/php-cs-fixer');
    run('composer install --working-dir tools/phpstan');
    run('composer install --working-dir tools/phpunit');
    run('composer install --working-dir tools/rector');
}

#[AsTask(description: 'Fix coding standards')]
function cs(
    #[AsOption(name: 'dry-run', description: 'Do not make changes and outputs diff', mode: InputOption::VALUE_NONE)]
    bool $dryRun,
): int {
    $command = 'tools/php-cs-fixer/vendor/bin/php-cs-fixer fix';

    if ($dryRun) {
        $command .= ' --dry-run --diff';
    }

    $c = get_context()
        ->withAllowFailure(true)
    ;

    return run($command, context: $c)->getExitCode();
}

#[AsTask(description: 'Run the phpstan analysis')]
function phpstan(): int
{
    return run('tools/phpstan/vendor/bin/phpstan analyse')->getExitCode();
}

#[AsTask(description: 'Run the phpunit tests')]
function phpunit(): int
{
    $c = get_context()
        ->withAllowFailure(true)
    ;

    return run('tools/phpunit/vendor/bin/simple-phpunit', context: $c)->getExitCode();
}

#[AsTask(description: 'Run the rector upgrade')]
function rector(): int
{
    return run('tools/rector/vendor/bin/rector process')->getExitCode();
}
52 changes: 52 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,52 @@
name: Continuous Integration

'on':
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  ci:
    name: Run the tests suite
    runs-on: ubuntu-latest
    strategy:
      matrix:
        php-versions:
          - '8.1'
          - '8.2'

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Setup PHP
        uses: shivammathur/setup-php@v2
        with:
          php-version: '${{ matrix.php-versions }}'
          extensions: mbstring, dom
          tools: jolicode/castor

      - name: Validate composer.json and composer.lock
        run: composer validate --strict

      - name: Install dependencies
        run: castor install

      -
        name: Install quality tools
        run: castor qa:install

      -
        name: Check coding standards
        run: castor qa:cs --dry-run

      -
        name: Run PHPStan
        run: castor qa:phpstan

      -
        name: Run tests
        run: castor qa:phpunit
16 changes: 15 additions & 1 deletion .gitignore
@@ -1,3 +1,17 @@
/.phpunit.result.cache
/composer.lock
/vendor/

/.castor.stub.php

# tools dependencies
tools/*/vendor/

###> symfony/phpunit-bridge ###
.phpunit.result.cache
/phpunit.xml
###< symfony/phpunit-bridge ###

###> friendsofphp/php-cs-fixer ###
/.php-cs-fixer.php
/.php-cs-fixer.cache
###< friendsofphp/php-cs-fixer ###
44 changes: 44 additions & 0 deletions .php-cs-fixer.dist.php
@@ -0,0 +1,44 @@
<?php

/*
 * This file is part of JoliCode's "markdown fixer" project.
 *
 * (c) JoliCode <coucou@jolicode.com>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

$finder = (new PhpCsFixer\Finder())
    ->ignoreVCSIgnored(true)
    ->ignoreDotFiles(false)
    ->in(__DIR__)
    ->append([
        __FILE__,
    ])
;

$header = <<<'EOF'
This file is part of JoliCode's "markdown fixer" project.
(c) JoliCode <coucou@jolicode.com>
For the full copyright and license information, please view the LICENSE
file that was distributed with this source code.
EOF;

return (new PhpCsFixer\Config())
    ->setRiskyAllowed(true)
    ->setRules([
        '@PHP81Migration' => true,
        '@Symfony' => true,
        '@Symfony:risky' => true,
        'concat_space' => ['spacing' => 'one'],
        'header_comment' => ['header' => $header],
        'ordered_class_elements' => true, // Symfony (PSR12) overrides the default value, which we don't want
        'blank_line_before_statement' => true, // Symfony (PSR12) overrides the default value, which we don't want
        'phpdoc_to_comment' => ['ignored_tags' => ['var']],
        'trailing_comma_in_multiline' => ['elements' => ['arrays', 'match', 'parameters']],
    ])
    ->setFinder($finder)
;
19 changes: 19 additions & 0 deletions LICENSE.md
@@ -0,0 +1,19 @@
Copyright (c) 2023 JoliCode

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is furnished
to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
147 changes: 144 additions & 3 deletions README.md
@@ -2,12 +2,153 @@

## Usage

TODO
```php
use JoliMarkdown\MarkdownFixer;

$markdown = <<<MARKDOWN
# A sample Markdown document

Some paragraph here with an image <img src="/image.png" alt="description" /> inside.
MARKDOWN;

$markdownFixer = new MarkdownFixer();
$fixedMarkdown = $markdownFixer->fix($markdown);
```

The code above will return a "markdownized" version of the input string:

```md
# A sample Markdown document

Some paragraph here with an image ![description](/image.png) inside.
```

If you are using Symfony, you may want to read the [documentation for the associated bundle](src/Bridge/Symfony/README.md).

## Configuration

Several configuration options are available as [League CommonMark](https://commonmark.thephpleague.com/) environment configuration options, to customize the behavior of the Markdown fixer:

```php
use JoliMarkdown\MarkdownFixer;
use League\CommonMark\Environment\Environment;

$markdown = <<<MARKDOWN
- some
- list
MARKDOWN;

$markdownFixer = new MarkdownFixer(new Environment([
    'joli_markdown' => [
        'unordered_list_marker' => '*',
    ],
]));
$fixedMarkdown = $markdownFixer->fix($markdown);

// outputs:
// * some
// * list
```

- `internal_domains`: an array of domains that are considered internal to the website. Whenever an image or link URL belongs to one of the listed domains, it is converted to a relative URL. Defaults to `[]`.
- `prefer_asterisk_over_underscore`: a boolean to indicate whether to prefer `*` over `_` for emphasis. Defaults to `true`.
- `unordered_list_marker`: a string to use as the marker for unordered lists. Defaults to `-`.
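
To make the `internal_domains` behavior more concrete, here is a minimal, standalone sketch of the kind of URL rewriting described above. It is not the library's implementation: the `relativizeUrl()` function and the `example.com` domain are hypothetical names used only for illustration.

```php
<?php

/**
 * Hypothetical helper (not part of the library): turns an absolute URL into a
 * relative one when its host is listed in $internalDomains.
 *
 * @param list<string> $internalDomains
 */
function relativizeUrl(string $url, array $internalDomains): string
{
    $parts = parse_url($url);

    // Leave malformed, already relative, or external URLs untouched.
    if (false === $parts || !isset($parts['host']) || !in_array($parts['host'], $internalDomains, true)) {
        return $url;
    }

    // Rebuild the URL from its path, query string and fragment only.
    $relative = $parts['path'] ?? '/';

    if (isset($parts['query'])) {
        $relative .= '?' . $parts['query'];
    }

    if (isset($parts['fragment'])) {
        $relative .= '#' . $parts['fragment'];
    }

    return $relative;
}

echo relativizeUrl('https://example.com/images/logo.png?v=2', ['example.com']), \PHP_EOL;
echo relativizeUrl('https://other.example.org/page', ['example.com']), \PHP_EOL;
```

With these inputs, the first URL is reduced to `/images/logo.png?v=2`, while the second one is returned unchanged because its host is not listed.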

## Tests

To run the tests:
Tests are located under the `tests` directory and are written using [PHPUnit](https://phpunit.de/):

```bash
php vendor/bin/simple-phpunit
castor qa:install
castor qa:phpunit
```

## Context

Markdown is a simple text syntax for writing structured documents. Since its creation in 2004, this syntax has aimed to offer an alternative, faster and simpler way of writing HTML documents for Web publishing. Over the ensuing years, Markdown syntax has evolved iteratively, without any formal, perfectly standardized specification. Various variants have emerged, but none has become a de facto standard.

One of the most robust alternatives, however, is [CommonMark](https://commonmark.org/), a Markdown variant that was formally specified in 2014 and has been evolving ever since.

Markdown and CommonMark are frequently used in the development world (documentation in the form of a Markdown README file, adoption by many publishing platforms) and are often also employed for web publishing. Markdown was, for example, the syntax chosen when the [JoliCode website](https://jolicode.com) was created in 2012, and is still used today to structure the various bodies of content (blog posts, customer references, technologies, team sheets, etc.).

However, over the last 12 years, our way of transforming Markdown content into HTML has changed: we started by writing a few articles in pure HTML, then used a *client-side* JavaScript Markdown pre-processor (in the Web browser), and finally, over the last few years, migrated to the [`league/commonmark`](https://commonmark.thephpleague.com/) library, which transforms Markdown into HTML on the server side, in PHP. This library was chosen because it is particularly complete, well-maintained, extensible and robust.

During the development of `league/commonmark`, extension mechanisms were added, to support different Markdown "extensions", i.e. to support syntax elements that are not part of the CommonMark standard, but bring syntactic flexibility to writers. For example, the ["tables" extension](https://commonmark.thephpleague.com/2.4/extensions/tables/#syntax) makes it possible to write tables in Markdown, with a lighter, more readable syntax, which is not possible in "standard" CommonMark.

One of the founding features of Markdown is its compatibility with HTML: in Markdown, it's perfectly valid to insert HTML tags into text, and these will simply be passed on as they are in the final HTML document. For example, you can write:

```markdown
# A Markdown document

<p>An HTML paragraph.</p>

A paragraph in Markdown.
```

Such a document will be rendered, in HTML, as follows:

```html
<h1>A Markdown document</h1>
<p>An HTML paragraph.</p>
<p>A paragraph in Markdown.</p>
```

CommonMark's extension mechanism is therefore interesting, as it allows syntactic elements to be added that the extension will be able to interpret to generate rich, complex HTML output, without the end user (the editor) having to write HTML. This notion of extension is provided for in CommonMark (the [CommonMark specification](https://spec.commonmark.org/0.30/) is itself [written in CommonMark](https://github.com/commonmark/commonmark-spec/blob/master/spec.txt) and uses an extension to generate side-by-side rendering of Markdown syntax and the corresponding HTML output, as can be seen, for example, in the ["Tabs"](https://spec.commonmark.org/0.30/#tabs) section).

On the JoliCode site, we've taken advantage of the flexibility of `league/commonmark` to enrich HTML rendering, over the years, so that we can write richer, more expressive, more visual Markdown documents. For example, we've added an extension to write footnotes, HTML tables, strikethrough text, add HTML attributes to external links, automatically add attributes to `<img>` tags, and so on.

In spite of this, over the past 12 years we have frequently written HTML code within Markdown articles, in order to meet certain needs:

- add CSS classes to HTML elements, to be able to style them differently (centering an image on the page, for example);
- insert code with CSS classes, to use a syntax highlighting library;
- create the HTML structure to position two images side by side;
- etc.

Sometimes HTML code has been added because the author of an article was uncomfortable with certain arcana of Markdown, and chose the most direct approach to be able to publish their content. The use of HTML may have been appropriate at the time, but as the possibilities offered by HTML change, so do its limits: for elements written in Markdown, we can now make the program in charge of HTML rendering evolve to take on board new HTML functionalities, whereas we can't do this for elements written directly in HTML, which will remain frozen in time in the form their author has chosen.

For example, we'd like to be able to offer images in modern, higher-performance formats (such as webp, which is both smaller and of better quality) than those used just a few years ago. For these images, we also want to move away from the use of the `<img>` tag, and take advantage of `<picture>`, `<source>` tags, and attributes like `srcset`. For images that have been inserted into articles using Markdown syntax, we can upgrade the HTML rendering program to support these new formats and tags. For images that have been inserted in HTML, we can't do this, and so have to replace them manually - or leave them as they are, with the inconvenience of having to accept that the articles concerned use dated, less efficient technologies, which have an impact on both speed and the comfort offered to site users.

So we're looking for an approach to *correct* existing Markdown articles, replacing the HTML elements they contain with equivalent Markdown elements wherever possible without distorting the final HTML rendering.

An extension, available in `league/commonmark` [for a few years now](https://github.com/thephpleague/commonmark/pull/489), can specifically help us with this task: it's the ["Attributes" extension](https://commonmark.thephpleague.com/2.4/extensions/attributes/), which lets you add HTML attributes to Markdown elements. For example, you can write:

```markdown
{.block-class}
![An image](/path/to/image.jpg)

![Another image](/path/to/image.jpg){.image-class}
```

which will be rendered in HTML as follows:

```html
<p class="block-class"><img src="/path/to/image.jpg" alt="An image"></p>
<p><img src="/path/to/image.jpg" alt="Another image" class="image-class"></p>
```

With the help of this extension, we'd like to be able to write a program which, for each Markdown article on the site, will:

- analyze the Markdown content of the article;
- identify HTML elements that can be replaced by equivalent Markdown elements;
- replace these HTML elements with Markdown elements, adding the necessary HTML attributes so that the final HTML rendering is identical to that of the original article.

This repository proposes a tool to achieve this goal, using the following overall approach:

- from an existing string (the initial Markdown content of the article), an abstract syntax tree (AST) is generated using the `league/commonmark` Markdown parser. This parser is specifically configured, with few extensions enabled, to be as close as possible to standard CommonMark syntax and to obtain an AST that contains (almost) only the basic syntactic elements of CommonMark;
- the `league/commonmark` parser returns a `Document`, which is a hierarchy of `Nodes` (a `Node` being an element of the AST). Each node is typed (for example, a `Node` of type `Paragraph` represents a paragraph, a `Node` of type `Image` represents an image, etc.), and the HTML code parts are parsed in the form of `HtmlBlock` or `HtmlInline` nodes;
- this Document is then **corrected** via a set of correction classes. For example:
  - if a Node of type `FencedCode` has a CSS class attribute `language-php`, then this class is removed and instead the Node's `Info` attribute is updated with the value `php`;
  - if a node of type `Image` has an absolute URL while the image is served by the JoliCode site, then this URL is replaced by a relative URL.
    Numerous tests can be used to check special cases and different situations.
  - ditto for HTML links;
  - for `HtmlBlock` and `HtmlInline` nodes, a special treatment is applied:
    - HTML content is loaded into a DOM tree;
    - this tree is then recursively traversed in an attempt to reconstitute equivalent pure Markdown nodes. For example, if we find a `<p>` element in the DOM, we'll try to replace it with a Markdown Node of type `Paragraph`. Each time, the HTML attributes are transformed into attributes as proposed by the `league/commonmark` "Attributes" extension;
    - as a last resort, if at a given level of recursion we are unable to reconstitute a Markdown Node:
      - we use the `league/html-to-markdown` library to try and convert HTML into Markdown. This step is necessary to transform HTML elements into Markdown that are not supported by the correction classes we've implemented (for example, HTML tables: we don't offer a "Fixer" for the DOM element `<table>`);
      - the resulting string is returned as a new `HtmlBlock` or `HtmlInline` node, depending on the type of first-level node;
- finally, as a last step, the new `Document` thus corrected is "rendered" as a string, which is the corrected Markdown content of the original article. For this purpose, a set of Renderer classes have been written, heavily inspired by the [wnx/commonmark-markdown-renderer library](https://github.com/stefanzweifel/commonmark-markdown-renderer).
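
The DOM-based step of the approach above can be illustrated with a small, standalone sketch. It is not the code of this repository: the `imageElementToMarkdown()` function is a hypothetical helper that only handles `<img>` elements, and it relies solely on PHP's DOM extension. It shows how an HTML element and its `id` and `class` attributes might be turned into a Markdown image followed by an "Attributes" extension block.

```php
<?php

/**
 * Hypothetical helper (not the repository's code): converts an <img> DOM
 * element into a Markdown image, keeping its id and CSS classes using the
 * "Attributes" extension syntax.
 */
function imageElementToMarkdown(DOMElement $img): string
{
    $markdown = sprintf('![%s](%s)', $img->getAttribute('alt'), $img->getAttribute('src'));

    $attributes = [];

    if ($img->hasAttribute('id')) {
        $attributes[] = '#' . $img->getAttribute('id');
    }

    // A class attribute may hold several space-separated class names.
    foreach (preg_split('/\s+/', $img->getAttribute('class'), -1, \PREG_SPLIT_NO_EMPTY) as $class) {
        $attributes[] = '.' . $class;
    }

    if ([] === $attributes) {
        return $markdown;
    }

    return $markdown . '{' . implode(' ', $attributes) . '}';
}

// Load an HTML fragment into a DOM tree, then convert the first image found.
$dom = new DOMDocument();
$dom->loadHTML('<p>An image <img src="/image.png" alt="description" class="image-class"> inside.</p>');

$img = $dom->getElementsByTagName('img')->item(0);
assert($img instanceof DOMElement);

echo imageElementToMarkdown($img), \PHP_EOL;
```

For this fragment, the sketch prints `![description](/image.png){.image-class}`. A real fixer also has to deal with nested elements, unsupported attributes and fallbacks, which this example deliberately leaves out.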

## License

This library is under MIT License. See the LICENSE file.
23 changes: 23 additions & 0 deletions castor.php
@@ -0,0 +1,23 @@
<?php

/*
 * This file is part of JoliCode's "markdown fixer" project.
 *
 * (c) JoliCode <coucou@jolicode.com>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

use Castor\Attribute\AsTask;

use function Castor\import;
use function Castor\run;

import(__DIR__ . '/.castor');

#[AsTask(description: 'Installs the application (composer, yarn, ...)')]
function install(): void
{
    run('composer install -n --prefer-dist --optimize-autoloader');
}
