❗️ REFACTOR markdown links #613
Codecov Report
Base: 89.86% // Head: 89.27% // Decreases project coverage by
Additional details and impacted files
@@ Coverage Diff @@
## master #613 +/- ##
==========================================
- Coverage 89.86% 89.27% -0.59%
==========================================
Files 21 24 +3
Lines 2150 2826 +676
==========================================
+ Hits 1932 2523 +591
- Misses 218 303 +85
I quite like the |
This seems like a nice direction to me - I like the idea of A few quick thoughts:
|
Thanks for the comments @choldgraf
This is exactly why I didn't want to use it here, since it is specifically reserved for citations, which are not the same as internal links. In the future it is likely that we will want to use
"terseness" is certainly a design goal 👍, but yeh I'd just worry about introducing too many "magic" symbols without thinking it through properly. I'd say we want a balance between:
|
Makes sense re: citations. I agree that is a different thing. For short-hand symbols, sounds like something we can just track in a separate issue and revisit if many users report verbosity as a pain point |
Yeh exactly, I also would rather not introduce too many ways of doing the same thing |
Just a note that pandoc Question on a cross-project link: are you intending the syntax to be From the updated description it looks like you might be adding this as the What is the |
New documentation is ready: https://myst-parser--613.org.readthedocs.build/en/613/syntax/syntax.html#links-and-referencing |
Had an hour long convo with @chrisjsewell today, some notes below!
Summary
We should simplify some of the syntax:
Scratch Notes:
|
Yep cheers, actually turned into 2.5 hours 😅 with plenty of actionable items 👌 |
I gave the docs a pass and think that in general this is a really nice direction to head in. I really like that we are moving away from Sphinx-specific role names, and instead defining a markdownic structure that is flexible enough to be translatable into the respective Sphinx roles under the hood.
I had a few questions and thoughts on how we could try to make the syntax more memorable - some of the wording feels unintuitive to me (particularly around using the word "project" and "myst" to refer to "things in a project" and "other projects", respectively). I threw out a few ideas there but am still not sure what makes the most sense, happy to discuss more.
MyST, supports the following destination types:

| Link Type | Auto | Inline | Single Page[^sp] |
It is a little bit confusing to know when you need to use `project:` / `myst:` etc. and when you may omit these. I think it might be easier to learn if we separated out the "shorthand" examples a bit.
For example, could we add a third column that shows the `Short-hand` variants for some of these, and then use `Inline` to show the explicit URI scheme being used?
So something like the following table:
| Link type | Autolinks | Inline | Short-hand |
|---|---|---|---|
| Project document | `<project:file.md>` | `[](project:file.md)` | `[](file.md)` (if `file.md` is in the project document store) |
| Local file path | `<path:file.txt>` | `[](path:file.txt)` | `[](file.txt)` (if `file.txt` is a valid path in the filesystem) |
Or, a separate section dedicated to describing the link short-hands in general?
I wonder if it also might be helpful to split this out into a "spec" which could be our first enhancement proposal, which could go into all of these details, and the user facing docs.
In the user facing docs, I don't think there is ever really a use case where we should encourage people to use the `project:` form, as that is only necessary to satisfy the autolink requiring a protocol.
That is, some of these things are important to discuss from a spec side, but not that useful to encourage or even show to users (or in the worst case, they just confuse people!).
I agree - I am coming at this from a user's perspective, not a spec perspective, so not sure if my comments are helpful or not. I think for the user docs we should choose the 1-3 most important workflows that are simplest and cover 80% of use-cases, and have them dig deeper if they really want to know all the things that MyST could do.
> In the user facing docs, I don't think there is ever really a use case where we should encourage people to use the project: form, as that is only necessary to suffice the autolink requiring a protocol.

I wouldn't say this is necessarily the case, because if you are not using explicit text, then e.g. `[](file.md)` just "disappears" in a CommonMark renderer (it is here: ), whereas for `<project:file.md>` you can at least actually see it rendered: project:file.md.
In general I feel it may be better to use auto-links for things with implicit text.
I feel that all the things I've put in this new "Links and Referencing" section are all things that I use all the time as a user, and wish I'd known more easily how to do.
I also see lots of other people asking how to do them in the forums.
I guess I'd ask: what part of this documentation do you feel should not be there?
For example, before I wrote this, it was really unclear how sphinx numbering worked, and how to work out what to reference from an external project.
Just having the `myst-inv` cmdline tool and the `sphinx-build -b myst_refs` builder is, I feel, going to be so helpful in working out what references are in your project.
| :----------------- | :------------------------- | :------------------------ | :--------------: |
| External URL | `<https://example.com>` | `[](https://example.com)` | ✅ |
| Local file path | `<path:file.txt>` | `[](file.txt)` | ❌ |
| Project document | `<project:file.md>` | `[](file.md)` | ❌ |
The word `project:` feels a little bit confusing to me because it suggests the target should be a project rather than a document within the project.
Can we define a word that is more semantically tied to the thing being referenced? e.g.:
- `doc`
- `ref`
- `key`
- `id`
- `target`

I know some of these are maybe "too specific" (e.g. `doc:file.md#mylabel` isn't strictly a document reference)...
What about just `id`? This hints that the thing you're referencing is "known" to the system (thus it has an ID), e.g. `<id:file.md>` is valid because `file.md` is indexed by the build system, and `<id:file.md#label>` is valid because `label` is indexed under `file.md`. This feels a bit more natural to me than `<project:file.md>` since the thing I'm referencing is an `id` in the system, not a project. What do you think?
I also feel that this isn't yet a great name; we could also make this longer, as it is unlikely to be widely used? `localproject:`? `local:`?
Of the above list I am most partial to `target:` (doc, ref, id and key seem either too specific or too general to me).
I like `target`. `<target#sometarget>` seems reasonable... maybe specifying the project name could be a kwarg? E.g., `<target?project=someproject#sometarget>`?
> `<target#sometarget>` seems reasonable

Just to note, this is not valid Markdown (you'll see if you try it <target#sometarget>); auto-links are only recognised if they have a scheme: `scheme:target#sometarget`
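
To illustrate the point about autolinks, here is a minimal Python sketch of the CommonMark rule that the text between `<` and `>` must be an absolute URI, i.e. start with a scheme of 2 to 32 characters followed by a colon. The function name `is_uri_autolink` is hypothetical, and email autolinks are deliberately not covered.

```python
import re

# CommonMark absolute URI: a scheme (an ASCII letter, then 1-31 letters,
# digits, "+", "." or "-"), a colon, then any characters except ASCII
# control characters, space, "<" and ">".
_URI_AUTOLINK = re.compile(r"<[A-Za-z][A-Za-z0-9+.\-]{1,31}:[^\x00-\x20\x7f<>]*>")


def is_uri_autolink(text: str) -> bool:
    """Return True if ``text`` is a URI autolink such as ``<scheme:rest>``."""
    return _URI_AUTOLINK.fullmatch(text) is not None


# No scheme, so a CommonMark parser does not treat this as an autolink.
print(is_uri_autolink("<target#sometarget>"))
# A scheme is present, so this one is recognised.
print(is_uri_autolink("<project:file.md#label>"))
```

This is why every autolink form discussed in this thread carries an explicit scheme, while the inline `[](...)` form can omit it.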
> Can we define a word that is more semantically tied to the thing being referenced?

From https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax

> Each URI begins with a scheme name that refers to a specification for assigning identifiers within that scheme. As such, the URI syntax is a federated and extensible naming system wherein each scheme's specification may further restrict the syntax and semantics of identifiers using that scheme.

I would say `project` should go to `myst`, because it is the "MyST specification" we are using to assign the identifiers
| Local target | `<project:.#target>` | `[](.#target)` | ✅ |
| Target in document | `<project:target.md#file>` | `[](file.md#target)` | ❌ |
| Target in project | `<project:#target>` | `[](#target)` | ~[^hash] |
| Cross-project | `<myst:key#target>` | `[](myst:key#target)` | ❌ |
Following from the above comment, the `myst` syntax as used here confuses me a bit, because here we're explicitly referring to a target in another project, but now we don't use the `project` scheme, we use a `myst` scheme. To me, the syntax `<project:key#target>` reads like "reference `target` in project `key`". I'm not sure how to similarly interpret the `myst:` scheme.
well @rowanc1 wanted to use `myst:` here, so I'll let you take it up with him 😉
I think the logic for handling internal references and external inventory references is perhaps different enough though to warrant a different scheme
I took another user through this and they also were confused by the syntax.
We could potentially merge what is currently `project:` and `myst:` into a single `project:`, and if the key isn't given we only look to the local project? That would still keep the logic separate.
I liked `myst:` because it was short, reinforced the branding of the project and the concept could be extended to other, richer, content links in the future. I am not super tied to it, and I do think that `<project:jupyterbook#getting-started>` is maybe much clearer?
I'm fine with calling it `myst:` if we can have a clear and easily memorable rationale for what the naming means and what behavior it should evoke. The confusing thing to me is that `[](myst:)`, `[](file:)`, and `[](project:)` all mean different kinds of things. `file:` is the thing `file:` refers to, `project:` is "a thing within the thing specified after `project:`" and `myst:` is "anything that has something to do with MyST".
If we used `myst:` it feels more natural to me to use the myst scheme for everything and define the first word after (the "path" in URI language) as the way to interpret what comes next, e.g. `[](myst:target?project=executablebooks#mytarget)`, `[](myst:file#hello.txt)`, `[](myst:someotherthing?foo=bar)`.
Maybe in that case, if a `path` is not specified (e.g. `[](myst:#hello)`) then the `project` path is implied (`[](myst:project#hello)`).
> if the key isn't given we only look to the local project?

what if you want to search through all external projects?
> So how would you reference the MyClass autodoc class above?

autodoc references are not special-cased in any way, they just create a target, wherever the `MyClass` is documented, and a reference name to it being the fully qualified name (e.g. `module.submodule.MyClass`).
If you want to reference it you do the same as any other target `[](#module.submodule.MyClass)` or, with the logic I've put in, you can use `[](#*MyClass)` to signify a simple match of any target ending in `MyClass`.
Also as with any target, you can scope by domain and object type, in this case `py:class`
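
As a rough illustration of the `#*MyClass` behaviour described above, the following Python sketch treats a leading `*` as "match any target name ending in the rest". The function `match_targets` and the sample names are hypothetical and are not taken from myst-parser.

```python
def match_targets(pattern: str, names: list[str]) -> list[str]:
    """Return the target names selected by ``pattern``.

    A leading "*" means "match the end of the name"; otherwise the
    name has to match exactly.
    """
    if pattern.startswith("*"):
        suffix = pattern[1:]
        return [name for name in names if name.endswith(suffix)]
    return [name for name in names if name == pattern]


# Hypothetical target names, as autodoc might register them.
names = ["module.submodule.MyClass", "module.submodule.other_function"]

print(match_targets("*MyClass", names))
print(match_targets("module.submodule.MyClass", names))
```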
> So then is this the logic?

Nearly:
- If there's a markdown link without a schema, then check if it is a valid local path.
  - If the path exists, check if it has a known document file extension. If so, treat it as `[](myst:myfile.md)`
    - if the document is not in the document store, then it will fail and emit a warning
  - If path exists but doesn't have a known extension, then treat it as a raw download `[](myst:myfile.txt?kind=file)`
- If the link starts with `#` (or `.#` / `?` / `.?`) then assume that it is a MyST target.
- If the link starts with a `schema:` get that and try to match it:
  - if it's in the "external list" (`http`, `https` etc) treat it as external
  - if it's known to MyST (e.g. `myst`) use that logic
  - Potentially in the future allow plugins to extend to handling other schemes
- If there is still no match, then emit a warning.
  - Currently then, mainly for back-compatibility considerations, `[](unknown)` is treated as if it was actually `[](#unknown)`, and progresses along that logic
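
The decision list above could be sketched roughly as follows. This is only an illustration of the described order of checks, not the actual myst-parser implementation: the function `classify_link`, the scheme sets, the extension tuple and the returned labels are all assumptions, and the file system and document store are stood in for by plain sets.

```python
import re

# All names below are illustrative assumptions, not myst-parser internals.
EXTERNAL_SCHEMES = {"http", "https", "mailto", "ftp"}
MYST_SCHEMES = {"myst"}
DOCUMENT_EXTENSIONS = (".md", ".rst")
SCHEME_RE = re.compile(r"([A-Za-z][A-Za-z0-9+.\-]+):")


def classify_link(dest, existing_paths, document_store):
    """Return a label describing how a link destination would be handled."""
    scheme_match = SCHEME_RE.match(dest)
    if scheme_match:
        scheme = scheme_match.group(1).lower()
        if scheme in EXTERNAL_SCHEMES:
            return "external"
        if scheme in MYST_SCHEMES:
            return "myst"
        return "warning: unknown scheme"
    if dest.startswith(("#", ".#", "?", ".?")):
        return "myst target"
    # No scheme: drop any query/fragment and test for a local path.
    path = re.split(r"[?#]", dest, maxsplit=1)[0]
    if path in existing_paths:
        if path.endswith(DOCUMENT_EXTENSIONS):
            if path in document_store:
                return "document"
            return "warning: document not in store"
        return "download"
    # Back-compatibility: [](unknown) is handled like [](#unknown).
    return "myst target"


paths = {"file.md", "notes.md", "data.txt"}  # files that exist on disk
store = {"file.md"}                          # documents known to the build
for dest in ["https://example.com", "file.md#target", "data.txt", "unknown"]:
    print(dest, "->", classify_link(dest, paths, store))
```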
the basic thing is if you have a `name` and you say it can relate to any sphinx/intersphinx target, then you have potentially four variables to filter by:
- the project of the target, i.e. a key in the intersphinx mapping
- the docname of the target
  - this information is not specifically available from objects.inv, only for local targets
  - and even then, sphinx actually overwrites any conflicting targets (in the same domain/object type group) from different documents, so filtering by document does not work that well (except for MyST anchors, which are a different thing)
- the domain of the target
- the object type of the target

You need some way to specify those filters in a nice syntax,
plus also allowing for any of them to not be specified (i.e. search all)
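
A minimal Python sketch of that idea, where each of the four filters is optional and an unspecified filter means "search all". The `Target` record, the `find_targets` function and the use of an empty string for the local project are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    project: str            # intersphinx key; "" stands for the local project (assumption)
    docname: Optional[str]  # only known for local targets
    domain: str
    objtype: str
    name: str


def find_targets(targets, name, project=None, docname=None, domain=None, objtype=None):
    """Return targets called ``name``; a filter left as None matches anything."""
    wanted = {"project": project, "docname": docname, "domain": domain, "objtype": objtype}
    return [
        target
        for target in targets
        if target.name == name
        and all(value is None or getattr(target, key) == value for key, value in wanted.items())
    ]


targets = [
    Target("", "api", "py", "class", "MyClass"),
    Target("sphinx", None, "py", "class", "MyClass"),
    Target("", "intro", "std", "label", "MyClass"),
]
print(len(find_targets(targets, "MyClass")))               # no filters: search all
print(len(find_targets(targets, "MyClass", domain="py")))  # filter by domain only
```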
So you could say that for `[](myst:xxx#name)`, `xxx` is either a local document path or a key to an `objects.inv` path/URL.
You then would have to make sure, obviously, that no key conflicts with a file path, but this should not be too difficult, as long as your key does not end in `.md` or something.
I think `[](#name)` (i.e. `[](myst:#name)`) should always search through everything, both locally and in all `objects.inv`, warn if there are multiple matches, and then e.g. choose the match based on some rules like local first.
You don't have a syntax in this way to say only search in `objects.inv`'s, and not locally. For this you could do something like `[](*#name)` 🤷
You then need to provide a filter for domain/objects.
As I've mentioned already, the most terse format would be `[](?domain:object#name)`, and if you want all domains use `[](?*:object#name)`.
Less terse, but keeping with the "usual" query format, would be `[](?f=domain:object#name)`
Note for the above, links to "unknown" local files (aka ones to be downloaded) are not part of this; I don't think that should be part of the `myst` scheme as it's quite different.
Also `[](myst:key)` could either be meaningless (as opposed to `[](myst:file.md)` creating a link to the top of the document), i.e. you just emit a warning, or you do something like create a link to the "base" URL
docs/syntax/syntax.md
Outdated
| External URL | `<https://example.com>` | `[](https://example.com)` | ✅ |
| Local file path | `<path:file.txt>` | `[](file.txt)` | ❌ |
| Project document | `<project:file.md>` | `[](file.md)` | ❌ |
| Local target | `<project:.#target>` | `[](.#target)` | ✅ |
What's the benefit of making the target explicitly local via `.#`? Is that just in case you have duplicated target IDs in different documents of a project? We should document this behavior explicitly, e.g. a section like `### Restrict target search to the local document`
mainly for targeting auto-generated heading anchors, which could indeed be duplicated across the project
Is the logic what @choldgraf specified, it only looks to the local document?
Do other links still look to all implicit/explicit references in the local document first regardless?
So firstly to note, there are no implicit targets stored by sphinx, only a unique set of target names is stored, per `domain/object_type`, e.g.

    std:
      label:
        api/directive:
          docname: api/reference
          id: api-directive
          text: Directive and role processing

If sphinx encounters a `domain/object_type/target` that is already in the "database", then it will simply emit a warning.
But essentially, we don't have that information at resolution time.
With the MyST heading anchors extension, which the user has to explicitly activate with e.g. `myst_heading_anchors = 2`, we create an exception to this rule, by explicitly storing all anchors by `docname/domain/object_type/target`, i.e. this is the only time when we might have duplicate target names for the same `domain/object_type`
For e.g. `[](#target)` then:
- If heading anchors are not activated, it will simply look "project-wide" to find matches.
  If multiple matches are found, then a warning will be emitted, and (arbitrarily) the first match will be selected, e.g. `WARNING: Multiple targets found for '*:*:target': 'std:label:target','std:doc:target' [myst.xref_ambiguous]`
- If heading anchors are activated, it will look for both matches in the anchors "database" (for that docname) and matches in the "project" database (no docname filter).
  If both a heading anchor match and other matches are present, then it will emit a warning and select the anchor, e.g. `<src>/test.md:6: WARNING: 'target' anchor superseding other matches: 'std:label:target' [myst.xref_anchor]`
The difference with using `[](other.md#target)` is that (a) project matches will be filtered by that docname and (b) heading anchors will be matched by that docname.
If you use `[](.#target)` and e.g. the file that the reference is in is `test.md`, then this is the same as doing `[](test.md#target)`
Does that make sense?
A key design consideration here, is that I didn't want any "silent" selection between multiple matches, i.e. there is always a warning.
But then the user can choose to ignore/suppress this warning, by type e.g.
suppress_warnings = ["myst.xref_ambiguous", "myst.xref_anchor"]
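
The resolution behaviour described in this comment could be sketched in Python roughly like this. It is a simplified illustration under stated assumptions, not the real implementation: `resolve_target`, its arguments and the "no target found" message are hypothetical, while the two warning texts are modelled on the examples quoted above.

```python
def resolve_target(name, docname, project_targets, heading_anchors=None):
    """Resolve ``[](#name)`` written in ``docname``; return (match, warnings).

    project_targets: dict mapping a target name to a list of
        "domain:objtype:name" identifiers found project-wide.
    heading_anchors: dict mapping a docname to its set of anchor names,
        or None when heading anchors are not activated.
    """
    warnings = []
    matches = project_targets.get(name, [])
    if heading_anchors is not None and name in heading_anchors.get(docname, set()):
        if matches:
            others = ",".join(f"'{m}'" for m in matches)
            warnings.append(
                f"'{name}' anchor superseding other matches: {others} [myst.xref_anchor]"
            )
        return f"{docname}#{name}", warnings
    if not matches:
        warnings.append(f"no target found for '{name}'")  # hypothetical message
        return None, warnings
    if len(matches) > 1:
        found = ",".join(f"'{m}'" for m in matches)
        warnings.append(
            f"Multiple targets found for '*:*:{name}': {found} [myst.xref_ambiguous]"
        )
    # Never silent: with several matches the first is used, plus a warning.
    return matches[0], warnings


project_targets = {"target": ["std:label:target", "std:doc:target"]}
print(resolve_target("target", "test", project_targets))
print(resolve_target("target", "test", project_targets, {"test": {"target"}}))
```

The point of returning the warnings alongside the match is the design consideration stated above: a selection between multiple matches is never made silently, and suppressing the warning is left to the user.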
docs/syntax/syntax.md
Outdated
}
```

You can then use `myst:key#target` to reference targets in an external inventory, in a similar fashion to the [project-wide targets](#syntax/referencing/myst-project).
Crazy idea, what if we re-used the `project` (or renamed to `id`) scheme here, and let people give an external project as a query parameter? e.g. `<project:?name=my-proj#mytarget>`, then it'd be one fewer scheme people would have to remember.
I'm actually changing how the text in the query works, because I felt it was a bit verbose to specify the domain/object type.
Instead of:
[](project:?d=py&o=class#ref)
[](myst:sphinx?d=py&o=class#ref)
I was moving to:
[](?py:class#ref)
[](myst:sphinx?py:class#ref)
(note you can also now drop the scheme if the URL starts with `?`)
This tracks closer with the current "role" way of doing it:
{py:class}`ref`
so the query string no longer maps to the "semi-standard" (it's not actually in the spec) `key1=val&key2=val` format, but the trade-off is that it's easier to write 🤷
I've also replaced the pattern matching `?pat#*target` with just recognising that if the target starts with a `*` then it means match the end, e.g.
(areallylonglonglongtarget)=
# title
[](#*target)
do you think that makes sense?
(it would obviously mean that, no, you could not allow for `<project:?name=my-proj#mytarget>`)
I think that staying within standard links is a positive. This means, for example, that you can use standard URL parsing libraries, which all support query parameters; in JavaScript:
const u = new URL('myst:sphinx?d=py&o=class#ref')
u.protocol
u.hash
u.searchParams.get('o')
These are things that are likely supported in any major language, which is a nice thing to not have to reimagine and parse.
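
For comparison, a short Python sketch of the same idea using only the standard library's `urllib.parse`. It also parses the terser `?py:class` form proposed earlier in this thread, to show that the standard splitter still separates scheme, path, query and fragment there; only the `key=value` interpretation of the query is lost.

```python
from urllib.parse import parse_qs, urlsplit

# The "standard" key=value query form.
parts = urlsplit("myst:sphinx?d=py&o=class#ref")
print(parts.scheme, parts.path, parts.fragment)
print(parse_qs(parts.query)["o"])

# The terse form still splits cleanly, but the query is an opaque string.
terse = urlsplit("myst:sphinx?py:class#ref")
print(terse.scheme, terse.path, terse.query, terse.fragment)
```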
Can we just change the d/o --> a type to make it less verbose?
Is it very often that we will have to even use this syntax, because the default target will just go through til it finds the reference, so it is only when that fails that you have to become more specific?
I am coming around to @choldgraf's idea of merging the two syntaxes into a `project:` protocol.
> Can we just change the d/o --> a type to make it less verbose?

I'm not quite sure what you mean by "a type" here?

> Is it very often that we will have to even use this syntax

In the sphinx world, I would say yes, it will be widely used
It's just, err, this looks horrible to me: `<project:?name=my-proj&d=py&o=class#mytarget>`, and not very readable, whereas `<project:my-proj?py:class#mytarget>` feels so much more concise, and easy to follow?
> Can we just change the d/o --> a type to make it less verbose?

By that I mean: `'myst:jupyterbook?t=py:class#ref'` where `t` is named something sensible. In your example I guess you are saying that the `t=` may even be removed?
I agree that putting too much in query parameters makes it pretty unreadable. I am unsure, though, how much we would actually use the `?t=py:class` when the implementation by default looks through all ids regardless of domain, which is different from sphinx, so maybe it won't be as necessary, and optimizing for terseness isn't as necessary.
> by default looks through all ids regardless of domain, which is different than sphinx

there is an `{any}` role which does this, but most people choose(?) to use a specific role like `{ref}`
I guess, if we were to still use this, I would go for `?f=py:class`, since it's a filter; you can also do `?f=py:*` and `?f=*:class`
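
A small Python sketch of how such an `f` filter with `*` wildcards might be read from a link, using the standard `urllib.parse` module. The functions `parse_filter` and `filter_matches` are hypothetical names, and treating a missing filter as `*:*` is an assumption.

```python
from urllib.parse import parse_qs, urlsplit


def parse_filter(url: str):
    """Return (domain, objtype) from a ``?f=domain:objtype`` query; "*" if unset."""
    query = parse_qs(urlsplit(url).query)
    value = query.get("f", ["*:*"])[0]
    domain, _, objtype = value.partition(":")
    return domain or "*", objtype or "*"


def filter_matches(url: str, domain: str, objtype: str) -> bool:
    """Check a target's domain and object type against the link's filter."""
    want_domain, want_objtype = parse_filter(url)
    return want_domain in ("*", domain) and want_objtype in ("*", objtype)


print(parse_filter("myst:sphinx?f=py:class#ref"))
print(filter_matches("?f=py:*#ref", "py", "function"))
print(filter_matches("?f=*:class#ref", "std", "label"))
```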
docs/syntax/syntax.md
Outdated
| Project document | `<project:file.md>` | `[](file.md)` | ❌ |
| Local target | `<project:.#target>` | `[](.#target)` | ✅ |
| Target in document | `<project:target.md#file>` | `[](file.md#target)` | ❌ |
| Target in project | `<project:#target>` | `[](#target)` | ~[^hash] |
IMO this doesn't need the qualifier - it basically means the same thing as the `Local target` row, right? It'll work as long as the target exists and emit a warning if it doesn't?
I've removed the `Local Target` row
Just to add here a part of the design spec I was working on, how sphinx internal targets work:
Sphinx internal target specification
At a minimum, a target must have fields: The The Each target name can optionally have an implicit Each A A New for myst-parser: Output formats:
|
with myst_suppress_warnings option, which works the same as sphinx suppress_warnings.
The Issue
In MyST, currently, there is limited capability to specify "document-level" referencing, which works independently of the larger project.
Currently, all sphinx reference roles are project wide, e.g. `{any}`, `{ref}`, `{doc}`, ...
Note, these roles also have two limitations: (a) they are maybe not so "Markdownic", and (b) they cannot support nested syntax text.
Then for Markdown links:
1. `[text](https://example.com)` is an external link
2. `[text](#target)` links only to local "myst anchors", created by the anchor extension
3. `[text](path/to/doc.md)` links to another document
4. `[text](path/to/doc.md#target)` links only to myst anchors in another document
5. `[text](target)` links to anything in the project

Note also, that (2), (3), and (4) do not work for (docutils) single page builds, and (5) acts differently dependent on single page (docutils) or project (sphinx) builds.
The Goals
- `[text](#target)` link to any target in the local document, and work for docutils and sphinx
- `[text](target)` with a more "specific" syntax, for what the target is targeting

Aside: anatomy of a CommonMark link
The Uniform Resource Identifier (URI), should generally follow the specification in:
Note, if your URI has spaces in, then it can be enclosed in `<>`, e.g.
Proposal
- `[text](#target)` and `[text](relative/path/file.md#target)` work to reference any "standard" local target (plus anchors).
- `[text](target)` is replaced with a `myst` scheme, that can have different specificity: `[text](myst:reftype[?refquery]#target)`
The implemented link types are currently as follows:

| Auto | Inline |
| :--- | :--- |
| `<https://example.com>` | `[](https://example.com)` |
| | `[](file.txt)` |
| `<myst:doc#file>` | `[](file.md)` |
| `<myst:local#target>` | `[](#target)` |
| `<myst:doc?t=target#file>` | `[](file.md#target)` |
| `<myst:project#target>` | `[](myst:project#target)` |
| `<myst:inv#target>` | `[](myst:inv#target)` |

\* these have logic that relies on handling a full project, and so cannot be used when single document parsing
Questions:
- `reftype`, `reftarget`, `refquery`?

Note about:
TODO:
- `[text](=target)` be an inline target