❗️ REFACTOR markdown links #613

Closed
wants to merge 32 commits into from

Member

@chrisjsewell chrisjsewell commented Aug 31, 2022

The Issue

In MyST, currently, there is limited capability to specify "document-level" references, which work independently of the larger project.

Currently, all sphinx reference roles are project wide, e.g. {any}, {ref}, {doc}, ...
Note, these roles also have two limitations: (a) they are maybe not so "Markdownic", and (b) they cannot support nested syntax text.

Then for Markdown links:

  1. [text](https://example.com) is an external link
  2. [text](#target) links only to local "myst anchors", created by the anchor extension
  3. [text](path/to/doc.md) links to another document
  4. [text](path/to/doc.md#target) links only to myst anchors in another document
  5. [text](target) links to anything in the project

Note also that (2), (3), and (4) do not work for (docutils) single page builds, and (5) acts differently depending on whether the build is single page (docutils) or project (sphinx).

The Goals

  1. Have [text](#target) link to any target in the local document, and work for docutils and sphinx
  2. Replace [text](target) with a more "specific" syntax, for what the target is targeting

Aside: anatomy of a CommonMark link

[Explicit _Markdown_ text](URI "optional explicit title")

The Uniform Resource Identifier (URI), should generally follow the specification in:

URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]

Note, if your URI has spaces in, then it can be enclosed in <>, e.g.

[text](<URI with space> "title")
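As a rough illustration of how this grammar applies to the link destinations discussed in this PR, Python's standard `urllib.parse.urlsplit` separates the components. This is only a sketch of generic URI splitting, not the parser that MyST itself uses:

```python
from urllib.parse import parse_qs, urlsplit

# Split one of the proposed link destinations into its URI components.
parts = urlsplit("myst:doc?t=target#file")

print(parts.scheme)    # myst
print(parts.path)      # doc
print(parts.query)     # t=target
print(parts.fragment)  # file

# The query string can be decoded into a mapping of key -> list of values.
query = parse_qs(parts.query)
```

Because the destination is a well-formed URI, no custom tokenizer is needed to pull out the scheme, path, query, and fragment.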

Proposal

  1. This PR makes [text](#target) and [text](relative/path/file.md#target) work to reference any "standard" local target (plus anchors).
  2. [text](target) is replaced with a myst scheme, that can have different specificity [text](myst:reftype[?refquery]#target)

The implemented link types are currently as follows:

| Link Type | Auto | Inline | Docutils* |
| :-- | :-- | :-- | :-- |
| External URL | `<https://example.com>` | `[](https://example.com)` | |
| Local file | - | `[](file.txt)` | |
| Project document | `<myst:doc#file>` | `[](file.md)` | |
| Local target | `<myst:local#target>` | `[](#target)` | |
| Target in document | `<myst:doc?t=target#file>` | `[](file.md#target)` | |
| Target in project | `<myst:project#target>` | `[](myst:project#target)` | |
| Target in inventory | `<myst:inv#target>` | `[](myst:inv#target)` | |

* these have logic that relies on handling a full project, and so cannot be used when parsing a single document


Questions:

  • How best to have the syntax represent reftype, reftarget, and refquery?

Note about:

  • difficulty of not wanting to replicate/maintain sphinx stuff, but all those roles don't allow nested text
  • specificity
  • what's allowed by docutils
  • "auto" creation of link text
  • anchor creation

TODO:

  • document path always posix?
  • files / documents relative to project source?
  • Have [text](=target) be an inline target

@chrisjsewell chrisjsewell marked this pull request as draft August 31, 2022 16:40
@codecov

codecov bot commented Aug 31, 2022

Codecov Report

Base: 89.86% // Head: 89.27% // Decreases project coverage by -0.58% ⚠️

Coverage data is based on head (ec01c85) compared to base (28725fc).
Patch coverage: 88.72% of modified lines in pull request are covered.

❗ Current head ec01c85 differs from pull request most recent head f468ad0. Consider uploading reports for the commit f468ad0 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #613      +/-   ##
==========================================
- Coverage   89.86%   89.27%   -0.59%     
==========================================
  Files          21       24       +3     
  Lines        2150     2826     +676     
==========================================
+ Hits         1932     2523     +591     
- Misses        218      303      +85     
| Flag | Coverage Δ |
| :-- | :-- |
| pytests | 89.27% <88.72%> (-0.59%) ⬇️ |

| Impacted Files | Coverage Δ |
| :-- | :-- |
| myst_parser/config/main.py | 85.96% <72.72%> (-0.96%) ⬇️ |
| myst_parser/mdit_to_docutils/local_links.py | 82.25% <82.25%> (ø) |
| myst_parser/mdit_to_docutils/base.py | 90.70% <85.83%> (-1.43%) ⬇️ |
| myst_parser/sphinx_ext/references.py | 86.80% <86.80%> (ø) |
| myst_parser/sphinx_ext/main.py | 90.19% <91.66%> (-0.43%) ⬇️ |
| myst_parser/mdit_to_docutils/inventory.py | 92.37% <92.37%> (ø) |
| myst_parser/warnings.py | 96.42% <96.42%> (ø) |
| myst_parser/mdit_to_docutils/html_to_nodes.py | 90.90% <100.00%> (+0.16%) ⬆️ |
| myst_parser/mdit_to_docutils/sphinx_.py | 98.92% <100.00%> (+4.74%) ⬆️ |
| myst_parser/parsers/docutils_.py | 83.62% <100.00%> (+2.53%) ⬆️ |

... and 6 more

@rowanc1
Member

rowanc1 commented Aug 31, 2022

I quite like the [text](myst:project#target) syntax.

@choldgraf
Member

This seems like a nice direction to me - I like the idea of myst: being an extension point in the links.

A few quick thoughts:

  • Could the @ symbol be useful here? It has a common connotation as a "reference" symbol for citations in Pandoc. Maybe @target can be a short-hand for just [](#target)? @mydoc.md#target -> [](mydoc.md#target), etc.?
  • Could we use a short-hand for myst: to avoid the extra verbosity (which I think would be a bigger deal if there are lots of in-line references like this)? E.g. []($project#target) would be short-hand for [](myst:project#target). Maybe that's something to think about for the future though, probably not needed now.
  • From a design perspective, I think we should try to disentangle "what is the most intuitive / flexible design for reference syntax" and "what is possible in Docutils / Sphinx". I know that there are obviously important relationships between the two from a practicality perspective, but I feel like our target should be "the best implementation-agnostic MyST spec".
  • Here's Obsidian's design around references, it seems to have a pretty happy/loyal following around it. Although they use wiki-style link syntax (ref: Wikilinks [[ ]] syntax for cross-references #421)

@chrisjsewell
Member Author

chrisjsewell commented Sep 3, 2022

Thanks for the comments @choldgraf

Could the @ symbol be useful here? It has a common connotation as a "reference" symbol for citations in Pandoc.

This is exactly why I didn't want to use it here, since it is specifically reserved for citations, which are not the same as internal links. In the future it is likely that we will want to use @ specifically for this purpose of citation referencing, i.e. referencing a "key" in an external file.

Could we use a short-hand for myst: to avoid the extra verbosity

"terseness" is certainly a design goal 👍, but yeh I'd just worry about introducing too many "magic" symbols without thinking through it properly. I'd say we want a balance between:

  • commonmark compliance: where possible we should "re-use" the already available syntax, or at least have it degrade nicely
  • rememberability: having a syntax that is easy to remember
  • readability: having a syntax which people can understand at a glance
  • terseness: limiting "boilerplate" syntax
  • extensibility: having syntaxes that will not limit us from adding features in the future

@choldgraf
Member

Makes sense re: citations. I agree that is a different thing.

For short-hand symbols, sounds like something we can just track in a separate issue and revisit if many users report verbosity as a pain point

@chrisjsewell
Member Author

For short-hand symbols, sounds like something we can just track in a separate issue and revisit if many users report verbosity as a pain point

Yeh exactly, I also would rather not introduce too many ways of doing the same thing

@rowanc1
Member

rowanc1 commented Sep 5, 2022

Just a note that pandoc @my-label works for both citations and inline references, which as an author makes this very simple to write.

Question on a cross-project link: are you intending the syntax to be [text](myst:project?o=label#target) where project is filled in with the name of the project, keyed off the config? For example, [](myst:spec#directive) if I wanted to target this, where the spec name/id was defined somewhere in my config?

From the updated description it looks like you might be adding this as the i part instead? Going with something like the above might cut back on the verbosity and number of keys/query params you have to remember (i.e. the local project is named project) and the last two link examples are the same.

What is the o= query param supposed to do? Assuming d is for domain? And i could disappear if using the name of the project after myst:.

@chrisjsewell
Member Author

@rowanc1
Member

rowanc1 commented Sep 6, 2022

Had an hour long convo with @chrisjsewell today, some notes below!

Summary

We should simplify some of the syntax:

  • Targets [](#target) look up locally, then project wide (but not externally)
    • Targets can be explicitly done under the <project:> protocol, which can have an optional file path.
    • These are really only used for completeness, and documentation points people towards markdown links
    • This is nice, because vscode autocompletes, and the syntax is really terse and we don't lose anything (I don't think)
  • The myst: protocol is followed by the project, rather than inv
  • Relative and absolute paths work. Absolute paths are from the project root. The path separator is posix /
  • This should work with external objects.inv from intersphinx, and these are named explicitly in the config.yml or config.py and can be looked up.
    • For example: [](myst:jupyterbook#getting-started)

Scratch Notes:


```yaml
intersphinx:
    jupyterbook: (https://..., None)
```

[see external figure](#equation)
[see external figure](#equation)

<project:#equation> % This tries local and then the project
<project:file.md#equation> % This only tries the specific file
<project:/file/path.md#equation> % You can do the local file in the project, but it is a bit awkward
[](./abstract.md) % strips the md
[](_toc.yml) % downloads the thing, split the fragment off, (maybe warn?)
[](/) % This is from the root of the project.

file1.md
# introduction

[](#introduction)            % links locally, always, warns if it is implicit

file2.md
(introduction)=              % this should warn (this is a sphinx warning)
# some other header


* resolves explicit local
* resolves implicit local     (warn if you are trying to link to implicit)
* resolves explicit project


[see external figure](myst:jupyterbook#equation)

<myst:jupyterbook#equation>

{external+jupyterbook:py:class}`equation`


% File part is posix

<myst:doc#file> --> <project:file>
<myst:doc?t=target#file> --> <project:file#target>

<myst:inv#target> --> <myst:jupyterbook#target>
[](myst:inv#target) --> <myst:jupyterbook#target>
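
The renames in these notes are mechanical enough to express as a small function. The sketch below is purely illustrative: `migrate_link` is a hypothetical helper (not part of myst-parser), and the inventory name has to be supplied by the caller because the old `myst:inv` form did not carry one.

```python
from urllib.parse import parse_qs, urlsplit


def migrate_link(uri: str, inventory: str = "jupyterbook") -> str:
    """Rewrite an old-style ``myst:`` URI into the form proposed in the notes."""
    parts = urlsplit(uri)
    if parts.scheme != "myst":
        return uri  # not an old-style link, leave it untouched
    if parts.path == "doc":
        # Old form kept the file in the fragment and the target in ?t=...
        target = parse_qs(parts.query).get("t", [""])[0]
        new = f"project:{parts.fragment}"
        return f"{new}#{target}" if target else new
    if parts.path == "inv":
        # Old form had no inventory name; the new one names it explicitly.
        return f"myst:{inventory}#{parts.fragment}"
    return uri


print(migrate_link("myst:doc?t=target#file"))
```

The main point it makes visible is that the file moves from the fragment to the path, which is where a reader of a normal URL would expect it.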

@chrisjsewell
Member Author

Had an hour long convo with @chrisjsewell today! Some notes below, will clean up this in a sec!

Yep cheers, actually turned in to 2.5 hours 😅 with plenty of actionable items 👌

rowanc1 added a commit to curvenote/curvenote that referenced this pull request Sep 7, 2022
Member

@choldgraf choldgraf left a comment

I gave the docs a pass and think that in general this is a really nice direction to head in. I really like that we are moving away from Sphinx-specific role names, and instead defining a markdownic structure that is flexible enough to be translatable into the respective Sphinx roles under the hood.

I had a few questions and thoughts on how we could try to make the syntax more memorable - some of the wording feels unintuitive to me (particularly around using the word "project" and "myst" to refer to "things in a project" and "other projects", respectively). I threw out a few ideas there but am still not sure what makes the most sense, happy to discuss more.


MyST, supports the following destination types:

| Link Type | Auto | Inline | Single Page[^sp] |
Member

It is a little bit confusing to know when you need to use project: / myst: etc and when you may omit these. I think it might be easier to learn if we separated out the "shorthand" examples a bit.

For example, could we add a third column that shows the short-hand variants for some of these, and then use inline to show explicit URI scheme being used?

So something like the following table:

| Link type | Autolinks | Inline | Short-hand |
| :-- | :-- | :-- | :-- |
| Project document | `<project:file.md>` | `[](project:file.md)` | `[](file.md)` (if file.md is in the project document store) |
| Local file path | `<path:file.txt>` | `[](path:file.txt)` | `[](file.txt)` (if file.txt is a valid path in the filesystem) |

Or, a separate section dedicated to describing the link short-hands in general?

Member

I wonder if it also might be helpful to split this out into a "spec" which could be our first enhancement proposal, which could go into all of these details, and the user facing docs.

In the user facing docs, I don't think there is ever really a use case where we should encourage people to use the project: form, as that is only necessary to suffice the autolink requiring a protocol.

That is, some of these things are important to discuss from a spec side, but not that useful to encourage or even show to users (or in the worst case, they just confuse people!).

Member

@choldgraf choldgraf Sep 12, 2022

I agree - I am coming at this from a user's perspective, not a spec perspective, so not sure if my comments are helpful or not. I think for the user docs we should choose the 1-3 most important workflows that are simplest and cover 80% of use-cases, and have them dig deeper if they really want to know all the things that MyST could do.

Member Author

In the user facing docs, I don't think there is ever really a use case where we should encourage people to use the project: form, as that is only necessary to suffice the autolink requiring a protocol.

I wouldn't say this is necessarily the case, because if you are not using explicit text, then e.g. [](file.md) just "disappears" in a commonmark renderer (it is here: ), whereas for <project:file.md> you can at least actually see it rendered: project:file.md.
In general I feel it may be better to use auto-links, for things with implicit text

Member Author

I feel that all the things I've put in this new "Links and Referencing" section are all things that I use all the time as a user, and wish I'd known more easily how to do.
I also see lots of other people asking how to do them in the forums.

I guess I'd ask; what part of this documentation do you feel should not be there?

Member Author

For example, before I wrote this, it was really unclear how sphinx numbering worked, and how to work out what to reference from an external project.

Just having the myst-inv cmdline tool and sphinx-build -b myst_refs builder, I feel are going to be so helpful, in working out what references are in your project.

| :----------------- | :------------------------- | :------------------------ | :--------------: |
| External URL | `<https://example.com>` | `[](https://example.com)` | ✅ |
| Local file path | `<path:file.txt>` | `[](file.txt)` | ❌ |
| Project document | `<project:file.md>` | `[](file.md)` | ❌ |
Member

The word project: feels a little bit confusing to me because it suggests the target should be a project rather than a document within the project.

Can we define a word that is more semantically tied to the thing being referenced? e.g.:

  • doc
  • ref
  • key
  • id
  • target

I know some of these are maybe "too specific" (e.g. doc:file.md#mylabel isn't strictly a document reference)...

What about just id? This hints that the thing you're referencing is "known" to the system (thus it has an ID) e.g. <id:file.md> is valid because file.md is indexed by the build system, <id:file.md#label> is valid because label is indexed under file.md. This feels a bit more natural to me than <project:file.md> since the thing I'm referencing is an id in the system, not a project. What do you think?

Member

I also feel that this isn't yet a great name, we could also make this longer as it is unlikely to be widely used? localproject:? local:?

Of the above list I am most partial to target: (doc, ref, id and key, seem either too specific or too general to me).

Member

I like target. <target#sometarget> seems reasonable...maybe specifying the project name could be a kwarg? E.g., <target?project=someproject#sometarget>?

Member Author

<target#sometarget> seems reasonable

Just to note, this is not valid Markdown (you'll see if you try it <target#sometarget>); auto-links are only recognised if they have a scheme: scheme:target#sometarget

Member Author

Can we define a word that is more semantically tied to the thing being referenced?

From https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax

Each URI begins with a scheme name that refers to a specification for assigning identifiers within that scheme. As such, the URI syntax is a federated and extensible naming system wherein each scheme's specification may further restrict the syntax and semantics of identifiers using that scheme.

I would say project should go to myst, because it is the "MyST specification" we are using to assign the identifiers

| Local target | `<project:.#target>` | `[](.#target)` | ✅ |
| Target in document | `<project:file.md#target>` | `[](file.md#target)` | ❌ |
| Target in project | `<project:#target>` | `[](#target)` | ~[^hash] |
| Cross-project | `<myst:key#target>` | `[](myst:key#target)` | ❌ |
Member

Following from the above comment, the myst syntax as used here confuses me a bit, because here we're explicitly referring to a target in another project, but now we don't use the project scheme, we use a myst scheme. To me, the syntax <project:key#target> reads like "reference target in project key". I'm not sure how to similarly interpret the myst: scheme.

Member Author

well @rowanc1 wanted to use myst: here, so I'll let you take it up with him 😉

I think the logic for handling internal references and external inventory references is perhaps different enough though to warrant a different scheme

Member

I took another user through this and they also were confused by the syntax.

We could potentially merge what is currently project: and myst: into a single project:, and if the key isn't given we only look to the local project? That would still keep the logic separate.

I liked myst: because it was short, reinforced the branding of the project and the concept could be extended to other, richer, content links in the future. I am not super tied to it, and I do think that <project:jupyterbook#getting-started> is maybe much clearer?

Member

@choldgraf choldgraf Sep 12, 2022

I'm fine with calling it myst: if we can have a clear and easily memorable rationale for what the naming means and what behavior it should evoke. The confusing thing to me is that [](myst:), [](file:), and [](project:) all mean different kinds of things. file: is the thing file: refers to, project: is "a thing within the thing specified after project:", and myst: is "anything that has something to do with MyST".

If we used myst: it feels more natural to me to use the myst scheme for everything and define the first word after (the "path" in URI language) as the way to interpret what comes next. e.g. [](myst:target?project=executablebooks#mytarget), [](myst:file#hello.txt), [](myst:someotherthing?foo=bar).

Maybe in that case, if a path is not specified (e.g. [](myst:#hello)) then the project path is implied ([](myst:project#hello)).

Member Author

if the key isn't given we only look to the local project?

what if you want to search through all external projects?

Member Author

So how would you reference the MyClass autodoc class above?

autodoc references are not special-cases in any way, they just create a target, wherever the MyClass is documented, and a reference name to it being the fully qualified name (e.g. module.submodule.MyClass).

If you want to reference it you do the same as any other target [](#module.submodule.MyClass) or, with the logic I've put in, you can use [](#*MyClass) to signify a simple regex match of any target ending in MyClass.

Also as with any target, you can scope by domain and object type, in this case py:class

Member Author

So then is this the logic?

Nearly:

  • If there's a markdown link without a scheme, then check if it is a valid local path.
  • If the path exists, check if it has a known document file extension. If so, treat it as [](myst:myfile.md)
    • if the document is not in the document store, then it will fail and emit a warning
  • If path exists but doesn't have a known extension, then treat it as a raw download [](myst:myfile.txt?kind=file)
  • If the link starts with # (or .#, ?, or .?) then assume that it is a MyST target.
  • If the link starts with a scheme, get that and try to match it:
    • if it's in the "external list" (http, https etc) treat it as external
    • if it's known to MyST (e.g. myst) use that logic
    • Potentially in the future allow plugins to extend to handling other schemes
  • If there is still no match, then emit a warning.
    • Currently then, mainly for back-compatibility considerations, [](unknown) is treated as if it was actually [](#unknown), and progresses along that logic
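
The dispatch described in this list can be sketched in a few lines. This is only an illustration under stated assumptions: `classify_link`, the returned labels, the set of external schemes, and the set of document suffixes are all hypothetical, and warnings are not modelled.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

EXTERNAL_SCHEMES = {"http", "https", "mailto", "ftp"}  # assumed "external list"
MYST_SCHEMES = {"myst"}
DOC_SUFFIXES = {".md", ".rst"}  # assumed "known document" extensions


def classify_link(href: str, existing_paths: set[str]) -> str:
    """Return a label describing how ``href`` would be handled."""
    # Links that start with a target marker are treated as MyST targets.
    if href.startswith(("#", ".#", "?", ".?")):
        return "target"
    parts = urlsplit(href)
    if not parts.scheme:
        # No scheme: see whether the path part is a real local path.
        if parts.path in existing_paths:
            if PurePosixPath(parts.path).suffix in DOC_SUFFIXES:
                return "document"
            return "download"
        # Back-compatibility: [](unknown) is handled like [](#unknown).
        return "fallback-target"
    if parts.scheme in EXTERNAL_SCHEMES:
        return "external"
    if parts.scheme in MYST_SCHEMES:
        return "myst"
    return "unmatched"  # the point at which a warning would be emitted


print(classify_link("file.md#target", {"file.md"}))
```

Passing the set of existing paths in as an argument keeps the sketch free of filesystem access, so the decision order is easy to test in isolation.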

Member Author

the basic thing is if you have a name and you say it can relate to any sphinx/intersphinx target, then you have potentially four variables to filter by:

  • the project of the target, i.e. a key in the intersphinx mapping
  • the docname of the target
    • this information is not specifically available from objects.inv, only for local targets
    • and even then, sphinx actually overwrites any conflicting targets (in the same domain/object type group) from different documents, so filtering by document does not work that well (except for MyST anchors, which are a different thing)
  • the domain of the target
  • the object type of the target

You need some way to specify those filters in a nice syntax,
plus also allowing for any of them to not be specified (i.e. search all)
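
One way to picture these four optional filters is as a small record where `None` means "not specified, search all". The classes below are a hypothetical sketch for discussion, not the data model used in this PR:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    project: Optional[str]  # intersphinx key, or None for the local project
    docname: Optional[str]  # only known for local targets
    domain: str
    objtype: str
    name: str


@dataclass(frozen=True)
class TargetFilter:
    name: str
    project: Optional[str] = None
    docname: Optional[str] = None
    domain: Optional[str] = None
    objtype: Optional[str] = None

    def matches(self, target: Target) -> bool:
        """True if every filter that was specified agrees with ``target``."""
        if target.name != self.name:
            return False
        for field in ("project", "docname", "domain", "objtype"):
            wanted = getattr(self, field)
            if wanted is not None and wanted != getattr(target, field):
                return False
        return True


targets = [
    Target(None, "api/reference", "std", "label", "directive"),
    Target("sphinx", None, "py", "class", "directive"),
]
py_only = [t for t in targets if TargetFilter("directive", domain="py").matches(t)]
```

Note that this sketch cannot express "local project only", since `None` already means "any project"; a real design would need a separate marker for that case.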

Member Author

So you could say that for [](myst:xxx#name), xxx is either a local document path or key to an objects.inv path/URL.

You then would have to make sure obviously that no key conflicts with a file path, but this should not be too difficult, as long as your key does not end in .md or something.

I think [](#name) (i.e. [](myst:#name)) should always search through everything both locally and all objects.inv, warn if there are multiple matches, then can e.g. choose the match based on some rules like local first.

You don't have a syntax in this way to say only search in objects.inv's, and not locally. For this you could do something like [](*#name) 🤷

You then need to provide a filter for domain/objects.
As I've mentioned already, the most terse format would be [](?domain:object#name), and if you want all domains use [](?*:object#name).
Less terse, but keeping with the "usual" query format, would be [](?f=domain:object#name)
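
To make the terse form concrete, here is a sketch of how such a destination could be split into its parts. `parse_ref` and `ParsedRef` are hypothetical names, and treating `*` as "any" is taken from the description above rather than from the implementation:

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class ParsedRef(NamedTuple):
    project: Optional[str]  # path part: document path or inventory key
    domain: Optional[str]   # None means "any domain"
    objtype: Optional[str]  # None means "any object type"
    name: str               # the fragment


def parse_ref(uri: str) -> ParsedRef:
    """Split ``[scheme:][path]?domain:object#name`` into its components."""
    parts = urlsplit(uri)
    domain = objtype = None
    if parts.query:
        # The query is "domain:object" rather than key=value pairs.
        raw_domain, _, raw_objtype = parts.query.partition(":")
        domain = None if raw_domain in ("", "*") else raw_domain
        objtype = None if raw_objtype in ("", "*") else raw_objtype
    return ParsedRef(parts.path or None, domain, objtype, parts.fragment)


print(parse_ref("myst:sphinx?py:class#ref"))
```

The generic URI splitter still does most of the work; only the query string needs custom handling, which is the trade-off being weighed in this thread.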

Member Author

Note for the above, links to "unknown" local files (aka ones to be downloaded) are not part of this; I don't think that should be part of the myst scheme, as it's quite different

Also [](myst:key) could either be meaningless (as opposed to [](myst:file.md) creating a link to the top of the document), i.e. you just emit a warning, or you do something like create a link to the "base" URL

| External URL | `<https://example.com>` | `[](https://example.com)` | ✅ |
| Local file path | `<path:file.txt>` | `[](file.txt)` | ❌ |
| Project document | `<project:file.md>` | `[](file.md)` | ❌ |
| Local target | `<project:.#target>` | `[](.#target)` | ✅ |
Member

What's the benefit of making the target explicitly local via .#? Is that just in case you have duplicated target IDs in different documents of a project? We should document this behavior explicitly, e.g. a section like ### Restrict target search to the local document

Member Author

mainly for targeting auto-generated heading anchors, which could indeed be duplicated across the project

Member

Is the logic what @choldgraf specified, it only looks to the local document?

Do other links still look to all implicit/explicit references in the local document first regardless?

Member Author

@rowanc1

So firstly to note, there are no implicit targets stored by sphinx, only a unique set of target names is stored, per domain/object_type, e.g.

std:
  label:
    api/directive:
      docname: api/reference
      id: api-directive
      text: Directive and role processing

If sphinx encounters a domain/object_type/target that is already in the "database", then it will simply emit a warning.
But essentially, we don't have that information at resolution time.

With the MyST heading anchors extension, which the user has to explicitly activate with e.g. myst_heading_anchors = 2, we create an exception to this rule, by explicitly storing all anchors by docname/domain/object_type/target, i.e. this is the only time when we might have duplicate target names for the same domain/object_type

For e.g. [](#target) then:

  1. If heading anchors are not activated, it will simply look "project-wide" to find matches.
    If multiple matches are found, then a warning will be emitted, and (arbitrarily) the first match will be selected, e.g.
    WARNING: Multiple targets found for '*:*:target': 'std:label:target','std:doc:target' [myst.xref_ambiguous]
    
  2. If heading anchors are activated, it will look for both matches in the anchors "database" (for that docname) and matches in the "project" database (no docname filter).
    If both a heading anchor match and other matches are present, then it will emit a warning and select the anchor, e.g.
    <src>/test.md:6: WARNING: 'target' anchor superseding other matches: 'std:label:target' [myst.xref_anchor]
    

The difference with using [](other.md#target), is that (a) project matches will be filtered by that docname and (b) heading anchors will be matched by that docname.

If you use [](.#target) and e.g. the file that the reference is in is test.md, then this is the same as doing [](test.md#target)

Does that make sense?

A key design consideration here, is that I didn't want any "silent" selection between multiple matches, i.e. there is always a warning.

But then the user can choose to ignore/suppress this warning, by type e.g.

suppress_warnings = ["myst.xref_ambiguous", "myst.xref_anchor"]
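
The "no silent selection" rule described above can be sketched as a function that always returns its warnings alongside the chosen match. Everything here is a simplified assumption for illustration (the `resolve` name, the in-memory data shapes, and the returned strings); only the two warning types are taken from the messages quoted above:

```python
from typing import Optional


def resolve(
    target: str,
    docname: str,
    anchors: dict[str, set[str]],
    project: list[tuple[str, str, str]],
    heading_anchors: bool = True,
) -> tuple[Optional[str], list[str]]:
    """Return (match, warning types) for a ``[](#target)`` link in ``docname``.

    ``anchors`` maps docname -> heading anchors in that document;
    ``project`` lists (domain, object_type, name) entries known project-wide.
    """
    warnings: list[str] = []
    matches = [f"{d}:{o}:{n}" for d, o, n in project if n == target]
    if heading_anchors and target in anchors.get(docname, set()):
        if matches:
            # The anchor wins over other matches, but never silently.
            warnings.append("myst.xref_anchor")
        return f"anchor:{docname}#{target}", warnings
    if len(matches) > 1:
        # The first match is selected arbitrarily, again with a warning.
        warnings.append("myst.xref_ambiguous")
    return (matches[0] if matches else None), warnings


project = [("std", "label", "target"), ("std", "doc", "target")]
print(resolve("target", "test", {"test": {"target"}}, project))
```

Returning the warning types rather than logging them directly is what makes a `suppress_warnings` style filter straightforward to apply afterwards.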

}
```

You can then use `myst:key#target` to reference targets in an external inventory, in a similar fashion to the [project-wide targets](#syntax/referencing/myst-project).
Member

Crazy idea, what if we re-used the project (or renamed to id) scheme here, and let people give an external project as a query parameter? e.g. <project:?name=my-proj#mytarget> then it'd be one fewer scheme people would have to remember.

Member Author

I'm actually changing how the text in the query works, because I felt it was a bit verbose to specify the domain/object type.

Instead of:

[](project:?d=py&o=class#ref)
[](myst:sphinx?d=py&o=class#ref)

I was moving to :

[](?py:class#ref)
[](myst:sphinx?py:class#ref)

(note you can also now drop the scheme if the URL starts with ?)

This tracks more closely with the current "role" way of doing it:

{py:class}`ref`

so the query string no longer maps to the "semi-standard" (it's not actually in the spec) key1=val&key2=val format, but the trade-off is that it's easier to write 🤷

I've also replaced the pattern matching ?pat#*target with just recognising that if the target starts with a * then it means match the end, e.g.

(areallylonglonglongtarget)=
# title
[](#*target)
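
As described, the leading `*` acts as an "ends with" wildcard. A minimal sketch of that rule, using a hypothetical `target_matches` helper rather than the code in this PR:

```python
def target_matches(pattern: str, name: str) -> bool:
    """Match ``name`` exactly, or by suffix when ``pattern`` starts with ``*``."""
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    return name == pattern


print(target_matches("*target", "areallylonglonglongtarget"))
```

Keeping the rule to a single leading `*` avoids needing a regex engine, at the cost of not supporting wildcards elsewhere in the name.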

Member Author

do you think that makes sense?

Member Author

(it would obviously mean that, no, you could not allow for <project:?name=my-proj#mytarget>)

Member

I think that staying within standard links is a positive. It means, for example, that you can use standard URL parsing libraries, which all support query parameters. For example, in JavaScript:

const u = new URL('myst:sphinx?d=py&o=class#ref')
u.protocol                 // 'myst:'
u.hash                     // '#ref'
u.searchParams.get('o')    // 'class'

These are likely supported in any major language, so it is nice not to have to reinvent the parsing.
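For comparison, a sketch of the same lookup with Python's standard library (`urllib.parse.urlsplit` and `parse_qs`); the URL is the one from the JavaScript example above.

```python
from urllib.parse import parse_qs, urlsplit

u = urlsplit("myst:sphinx?d=py&o=class#ref")
params = parse_qs(u.query)  # each key maps to a list of values

print(u.scheme)        # myst
print(u.fragment)      # ref
print(params["o"][0])  # class
```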

Can we just change the d/o --> a type to make it less verbose?

Is it very often that we will have to even use this syntax? The default target lookup will just go through until it finds the reference, so it is only when that fails that you have to become more specific.

I am coming around to @choldgraf's idea of merging the two syntaxes into a project: protocol.

Member Author

> Can we just change the d/o --> a type to make it less verbose?

I'm not quite sure what you mean by "a type" here?

> Is it very often that we will have to even use this syntax

In the sphinx world, I would say yes, it will be widely used.

Member Author

It's just, err, this looks horrible to me: <project:?name=my-proj&d=py&o=class#mytarget>. It's not very readable, whereas <project:my-proj?py:class#mytarget> feels so much more concise and easy to follow?

Member

> Can we just change the d/o --> a type to make it less verbose?

By that I mean: 'myst:jupyterbook?t=py:class#ref', where t is named something sensible. In your example I guess you are saying that the t= may even be removed?

I agree that putting too much in query parameters makes it pretty unreadable. I am unsure, though, how much we would actually use the ?t=py:class when the implementation by default looks through all ids regardless of domain, which is different than sphinx. So maybe the filter won't be needed as often, and optimizing for terseness matters less.

Member Author

> by default looks through all ids regardless of domain, which is different than sphinx

There is an {any} role which does this, but most people choose(?) to use a specific role like {ref}.

Member Author

I guess, if we were to still use this, I would go for ?f=py:class, since it's a filter; you can also do ?f=py:* and ?f=*:class
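A sketch of how such a filter could be applied, assuming each target is a `(domain, object_type, name)` tuple. `filter_targets` is hypothetical, and it leans on the standard library's `fnmatch.fnmatchcase`, which also understands `?` and `[...]` patterns that the comment above does not propose.

```python
from fnmatch import fnmatchcase


def filter_targets(pattern: str, targets: list) -> list:
    """Keep the targets whose "domain:object_type" matches ``pattern``.

    ``targets`` holds (domain, object_type, name) tuples.
    Hypothetical helper for illustration; not code from this PR.
    """
    return [
        target
        for target in targets
        if fnmatchcase(f"{target[0]}:{target[1]}", pattern)
    ]


targets = [
    ("py", "class", "Foo"),
    ("py", "function", "bar"),
    ("cpp", "class", "Baz"),
]
print(filter_targets("py:*", targets))     # the two py targets
print(filter_targets("*:class", targets))  # the two class targets
```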

| Project document | `<project:file.md>` | `[](file.md)` | ❌ |
| Local target | `<project:.#target>` | `[](.#target)` | ✅ |
| Target in document | `<project:file.md#target>` | `[](file.md#target)` | ❌ |
| Target in project | `<project:#target>` | `[](#target)` | ~[^hash] |

Member

IMO this doesn't need the qualifier - it basically means the same thing as the Local target row, right? It'll work as long as the target exists and emit a warning if it doesn't?

Member Author

I've removed the Local Target row

@chrisjsewell
Member Author

Just to add here a part of the design spec I was working on, describing how sphinx internal targets work:

Sphinx internal target specification

At a minimum, a target must have fields: domain, object_type, docname, name and id.

The name should be unique per domain and object_type.
The user should be able to reference the target using name, and optionally filter by domain and/or object_type (these should contain only a-z).
Names are lower-cased and whitespace-normalised (each run of whitespace is replaced with a single space).

The id should be unique per docname.
This is generated internally and need not be exposed to the user.
It should comply with the regex [a-z](-?[a-z0-9]+)*.
(Tip: to make it unique, append env.new_serialno().)
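The name and id rules above can be sketched as follows. `normalise_name` and `is_valid_id` are hypothetical helpers; the regex is the one quoted in the spec.

```python
import re

# The id pattern quoted above, applied to the whole string.
ID_PATTERN = re.compile(r"[a-z](-?[a-z0-9]+)*")


def normalise_name(name: str) -> str:
    """Lower-case ``name`` and collapse each run of whitespace to one space.

    Note: this also strips leading/trailing whitespace, which is an
    assumption beyond what the spec states.
    """
    return " ".join(name.lower().split())


def is_valid_id(value: str) -> bool:
    """Check ``value`` against the id regex."""
    return ID_PATTERN.fullmatch(value) is not None


print(normalise_name("  My   Target\nName "))  # my target name
print(is_valid_id("my-target-1"))              # True
print(is_valid_id("my--target"))               # False
```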

Each target name can optionally have an implicit text field,
which is the default text used when referencing the target, if no explicit text is provided by the user.

Each target can also have enum_type and number fields.
All number fields must be unique per enum_type.

A Domain class is responsible for storing and retrieving targets for its object_type and enforcing the above uniqueness constraints.

A Domain should implement the get_objects() method, which returns an iterator of all targets for the domain:
(name, text, object_type, docname, id, priority).
This should be available on reference resolution, after all documents have been parsed.
text can be empty and priority is used to resolve conflicts when multiple targets have the same name.
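A toy sketch of the tuple shape and of priority-based conflict resolution. `Target`, `DemoDomain` and `pick_target` are hypothetical and do not subclass the real Sphinx `Domain`. Treating a lower priority number as "more important" is an assumption borrowed from Sphinx's existing priority convention; the spec above does not say which direction wins.

```python
from typing import Iterator, List, NamedTuple, Optional


class Target(NamedTuple):
    """One entry as yielded by get_objects()."""

    name: str
    text: str
    object_type: str
    docname: str
    id: str
    priority: int


class DemoDomain:
    """Toy stand-in for a domain; it only stores targets in a list."""

    def __init__(self) -> None:
        self._targets: List[Target] = []

    def add(self, target: Target) -> None:
        self._targets.append(target)

    def get_objects(self) -> Iterator[Target]:
        yield from self._targets


def pick_target(domain: DemoDomain, name: str) -> Optional[Target]:
    """Pick the candidate with the lowest priority number (assumption)."""
    candidates = [t for t in domain.get_objects() if t.name == name]
    return min(candidates, key=lambda t: t.priority, default=None)
```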

New for myst-parser:
A Domain can optionally implement a get_object_enum(docname, otype, name) method, which returns the (enum_type, number) for the target or (None, None) if not available.
This should be available on reference resolution, after all documents have been parsed.
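A minimal sketch of the proposed lookup, including the "number unique per enum_type" rule. `EnumRegistry` and `note_enum` are hypothetical names; only the `get_object_enum(docname, otype, name)` signature and its `(None, None)` fallback come from the text above. Representing a number as a tuple of ints is also an assumption.

```python
from typing import Dict, Optional, Set, Tuple

Number = Tuple[int, ...]  # assumption: e.g. (1, 2) for "1.2"


class EnumRegistry:
    """Toy store for enumerated targets; not a real Sphinx Domain."""

    def __init__(self) -> None:
        self._by_target: Dict[Tuple[str, str, str], Tuple[str, Number]] = {}
        self._used: Set[Tuple[str, Number]] = set()

    def note_enum(
        self, docname: str, otype: str, name: str, enum_type: str, number: Number
    ) -> None:
        """Record a number, enforcing uniqueness per enum_type."""
        if (enum_type, number) in self._used:
            raise ValueError(f"duplicate number {number} for {enum_type!r}")
        self._used.add((enum_type, number))
        self._by_target[(docname, otype, name)] = (enum_type, number)

    def get_object_enum(
        self, docname: str, otype: str, name: str
    ) -> Tuple[Optional[str], Optional[Number]]:
        """Return (enum_type, number), or (None, None) if not available."""
        return self._by_target.get((docname, otype, name), (None, None))
```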

Output formats:

  • html: id is used as the id attribute of the target element, and reference anchors use href="#<id>".
  • latex: docname and id are used to generate the label \label{identifier}, where identifier is an escaped version of <docname>:<id>.
    In numbered references, rather than explicitly adding the name or number, \nameref{identifier} and \ref{identifier} are used, so that latex can handle the numbering.
  • singlehtml: Not currently working (see sphinx-doc/sphinx#4814, "In singlehtml output, anchors are non-unique"), but it should work the same way as latex, combining docname and id.
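The html and latex rules above can be sketched as simple string builders. All function names here are hypothetical, and the escaping step is deliberately left out because the spec does not say how the identifier is escaped.

```python
def html_href(target_id: str) -> str:
    """Reference anchor for html output: href="#<id>"."""
    return f"#{target_id}"


def latex_identifier(docname: str, target_id: str) -> str:
    """Combine docname and id as <docname>:<id>.

    The real builder would escape this string; that step is omitted here.
    """
    return f"{docname}:{target_id}"


def latex_label(docname: str, target_id: str) -> str:
    """Label emitted at the target."""
    return f"\\label{{{latex_identifier(docname, target_id)}}}"


def latex_numbered_ref(docname: str, target_id: str) -> str:
    """Numbered reference; latex itself resolves the number."""
    return f"\\ref{{{latex_identifier(docname, target_id)}}}"


print(html_href("my-target"))                    # #my-target
print(latex_label("intro", "my-target"))         # \label{intro:my-target}
print(latex_numbered_ref("intro", "my-target"))  # \ref{intro:my-target}
```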
