Add adjacent punctuation modification rules to the spec #81

bdarcus · 2020-06-05T13:14:37Z

Picking up on this comment from @fbennett and this reply from @ndw, I propose we document the adjacent punctuation modification rules for inclusion in 1.0.2. ~~explicitly disallow modification of adjacent punctuation in v1.1 of the spec.~~

As @fbennett indicated, in existing implementations like citeproc-js and pandoc-citeproc:

one of two choices had to be made: to control for such "accidental" duplicate punctuation in the implementations; or to address the issue in the styles by refactoring them to set joining punctuation exclusively as delimiters.

Current expectations, however, are unspecified in the documentation.

~~This change would instead be the latter, and so would force styles to be updated to fix what are bugs in the styles.~~

~~I believe, though am not 100% sure, that @fbennett already removed these tests from the test-suite.~~

I am also unsure whether we can automate updating the styles to fix these problems, perhaps alongside the csl-update.xsl file, but I think it should be possible. It might help to have a handful of example styles to test with.

@ndw also suggested we add language to the current spec that required modification, to match existing behavior. I am unsure of whether we need this, particularly if we release 1.1 this summer and can get the styles updated quickly and easily.

Closes citation-style-language/test-suite#13

fbennett · 2020-06-05T13:31:46Z

I could be wrong, but I don't think a move from affix to delimiter joins can be easily automated. Macros often express different joins depending on the result of conditional branching, and so would need to be disaggregated. Locators are also a thorny problem, since the joining punctuation for them can vary. It would probably need to be done gradually over time, and it would be good to back up the more popular styles with bespoke test suites to protect against regressions.

bdarcus · 2020-06-05T13:36:19Z

I hope you're wrong (but you're probably not!), but we could at least automate simple conversions, and flag more complex examples of styles that need manual work?

If yes, we should do this sooner rather than later so we can evaluate?

Are there reasonably simple rules we could use to identify the issues?

fbennett · 2020-06-05T13:50:29Z

A generic set of input data and test fixtures that exercise a style through the most common permutations might be used to turn up punctuation and space pairs. If duplicates are not that common, they might be controlled well enough by adjusting the use of affixes, rather than wholesale refactoring. It's a partial solution, since potential duplicates might still be lurking in untested forms, but might be less demanding.
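A rough sketch of the kind of scan this suggests, assuming the rendered output of each fixture is available as a plain string. The pattern and the name `suspect_pairs` are hypothetical, and the check is deliberately coarse: legitimate sequences such as "et al.," would also be reported, so it is a review aid rather than a pass/fail test.

```python
import re

# Hypothetical heuristic: two punctuation marks separated only by whitespace,
# a run of two or more spaces, or a space directly before joining punctuation.
SUSPECT = re.compile(r"[,.;:!?]\s*[,.;:!?]| {2,}| [,.;:]")


def suspect_pairs(rendered):
    """Return the suspicious punctuation/space sequences found in rendered output."""
    return [match.group(0) for match in SUSPECT.finditer(rendered)]


if __name__ == "__main__":
    for hit in suspect_pairs("Doe, J.. Title , 2020"):
        print(repr(hit))
```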


bdarcus · 2020-06-05T14:18:34Z

OK.

To illustrate what I was initially thinking ...

Perhaps one simple rule for checking the style files themselves is to look for elements that have a prefix but no suffix, or vice versa?

If we look at this example:

<group delimiter=" ">
  <text variable="URL" prefix=" Available from: "/>
  <date variable="accessed" prefix="[accessed " suffix="]" form="text"/>
</group>

That rule as stated above would skip the date element, but would flag the text element.
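As a minimal sketch of that first rule, assuming the fragment can be parsed on its own with Python's standard `xml.etree.ElementTree` (the function name `one_sided_affixes` is made up for illustration, and real styles would carry the CSL namespace on their tags):

```python
import xml.etree.ElementTree as ET

EXAMPLE = """
<group delimiter=" ">
  <text variable="URL" prefix=" Available from: "/>
  <date variable="accessed" prefix="[accessed " suffix="]" form="text"/>
</group>
"""


def one_sided_affixes(xml_text):
    """Flag elements that carry a prefix without a suffix, or the reverse."""
    flagged = []
    for node in ET.fromstring(xml_text).iter():
        has_prefix = "prefix" in node.attrib
        has_suffix = "suffix" in node.attrib
        if has_prefix != has_suffix:
            flagged.append((node.tag, node.get("variable")))
    return flagged


if __name__ == "__main__":
    print(one_sided_affixes(EXAMPLE))
```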

So maybe only look for punctuation in the affixes, or content outside of group, in which case the above would pass, but this would be flagged:

<text variable="title" suffix=", "/>
<text variable="URL" suffix=", "/>

... and could even be converted automatically to:

<group delimiter=", " suffix=", ">
  <text variable="title"/>
  <text variable="URL"/>
</group>
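A sketch of how the simplest case might be converted mechanically, assuming that every child of a parent element carries the same suffix and no prefix, and that the shared suffix is reused as both the group delimiter and the group suffix. The helper `wrap_shared_suffix` is hypothetical; anything involving conditionals or mixed affixes is left untouched for manual review.

```python
import xml.etree.ElementTree as ET


def wrap_shared_suffix(parent):
    """Wrap all children of `parent` in one group when they share a suffix."""
    children = list(parent)
    suffixes = {child.get("suffix") for child in children}
    # Only handle the trivial case: two or more children, one common suffix.
    if len(children) < 2 or len(suffixes) != 1 or None in suffixes:
        return parent
    # Prefixes would change the joins, so leave those for a human.
    if any(child.get("prefix") for child in children):
        return parent
    shared = suffixes.pop()
    group = ET.Element("group", {"delimiter": shared, "suffix": shared})
    for child in children:
        parent.remove(child)
        del child.attrib["suffix"]
        group.append(child)
    parent.append(group)
    return parent


if __name__ == "__main__":
    layout = ET.fromstring(
        '<layout><text variable="title" suffix=", "/>'
        '<text variable="URL" suffix=", "/></layout>'
    )
    print(ET.tostring(wrap_shared_suffix(layout), encoding="unicode"))
```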

Do you think an approach like that could work, or must one look at the actual output?

The advantage, if we could do it, is that it could be integrated into the styles repo CI for changes going forward.
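One way the refined rule could look as a check script, under the assumption that "punctuation" means joining punctuation (comma, period, semicolon, colon) and that anything inside a group is exempt. Both choices, and the name `flag_affixes`, are guesses made for illustration, not an agreed rule.

```python
import xml.etree.ElementTree as ET

JOINING = set(",.;:")  # assumed set of joining punctuation


def local_name(tag):
    """Drop an XML namespace prefix such as '{uri}text' -> 'text'."""
    return tag.rsplit("}", 1)[-1]


def flag_affixes(xml_text):
    """List affixes containing joining punctuation on elements outside any group."""
    flagged = []

    def walk(node, in_group):
        name = local_name(node.tag)
        if not in_group and name != "group":
            for attr in ("prefix", "suffix"):
                value = node.get(attr, "")
                if any(ch in JOINING for ch in value):
                    flagged.append((name, node.get("variable"), attr, value))
        for child in node:
            walk(child, in_group or name == "group")

    walk(ET.fromstring(xml_text), False)
    return flagged
```

A CI job could run this over each changed style and fail, or only warn, when the returned list is non-empty.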

I don't think I'll be able to do either, but just so we know the right path or paths.

georgd · 2020-06-05T14:36:18Z

I’ve done such conversions several times manually because I try to follow a punctuation-in-delimiter-attribute-only policy and the template styles often use affixes for that. My impression is that you’ll often have to intervene manually to get group nesting right and on some occasions you’d have to introduce if/else-blocks and duplicate elements to account for cases of omitted variables that influence delimiter use.

That said, I believe that in the long run the delimiter approach, while more verbose, should be the way to go in the future. IMO it reduces the cases of unexpected punctuation clashes.

bwiernik · 2020-06-05T17:26:32Z

I think the punctuation/spacing correction has fairly little to do with style coding practices. That's one case, but there are numerous others.

Some punctuation modification absolutely needs to be permitted. For example, many styles place a period after the title. But this period should not be added when the title already ends in punctuation. That should be handled by the processor.
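To make the title case concrete, here is a minimal sketch of a suffix join that drops the style's period when the text already ends in terminal punctuation. It illustrates the idea only and is not taken from citeproc-js or pandoc-citeproc; `append_suffix` and the choice of `.!?` as terminal marks are assumptions.

```python
TERMINAL = ".!?"  # assumed set of title-ending punctuation


def append_suffix(text, suffix):
    """Append a style suffix, skipping its leading period after terminal punctuation."""
    if text and suffix and text[-1] in TERMINAL and suffix[0] == ".":
        return text + suffix[1:]
    return text + suffix


if __name__ == "__main__":
    print(append_suffix("Who cares?", ". "))
    print(append_suffix("A plain title", ". "))
```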

For the question of affixes vs. groups, I'm really not sure that most of the changes that would be required could be automated. I've refactored a bunch of styles over the years to use groups instead of chained affixes, and it usually is a fairly bespoke affair.

There is also the issue that style-coded affixes and user-entered affixes in citation data are different beasts. We could avoid the need for affix correction in styles through enforcing best practices there, but there isn't a way to prevent, for example, a user from entering extra/omitting necessary spaces in citation affixes. I don't think it would be a good user experience at all to have (Jones, 1985 , though he later recanted) appear rather than (Jones, 1985, though he later recanted) just due to minor sloppiness during the editing process. As I mentioned in the other thread, this is particularly an issue in GUI clients, where things like trailing spaces on a prefix are often hard to see, but it's also the case in text editors (e.g., it's easy to accidentally leave a double space during repeated editing).
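The user-affix example can be sketched as a small cleanup pass over the assembled citation. The two regular expressions below are an assumed minimal rule set (remove whitespace before joining punctuation, collapse repeated spaces) and `tidy_citation` is a hypothetical name; a real processor would need more care, for instance around text that legitimately has a space before a period.

```python
import re


def tidy_citation(text):
    """Remove whitespace before joining punctuation and collapse repeated spaces."""
    text = re.sub(r"\s+([,;:.])", r"\1", text)
    text = re.sub(r" {2,}", " ", text)
    return text


if __name__ == "__main__":
    print(tidy_citation("(Jones, 1985 , though he later recanted)"))
```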

I suggest we explicitly go in the opposite direction--add the current citeproc-js/pandoc-citeproc modification behavior to the processor schema.

bdarcus · 2020-06-05T17:32:57Z

Thanks for the thorough explanation. To be clear, I don't really care. I just presented an emphatic statement, with the plea for us to decide.

bdarcus · 2020-06-05T17:35:57Z

PS - advantage of status quo is we could just amend the 1.0 spec to include the behavior, and be done.

georgd · 2020-06-05T17:47:01Z

@bwiernik I agree with you that punctuation correction is necessary. When I commented, I had already forgotten that this was the initial proposal and concentrated only on moving punctuation from affixes to group delimiters, which I understood to be the consequence.

I think it’s a good idea to add the current behavior to the 1.0 spec. In a later version, control over it could perhaps be made available to style authors, to allow for styles that don’t merge title-ending punctuation with the following punctuation character. Citavi presents the user with a substitution table for punctuation character combinations in the style editor.
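A table-driven join along those lines might look like the sketch below. The table contents, the `join` function, and the idea of passing a per-style table are all hypothetical; this does not reproduce Citavi's table or any existing processor's rules.

```python
# Hypothetical default table: (last char of left, first char of right) -> result.
DEFAULT_MERGE = {
    ("?", "."): "?",
    ("!", "."): "!",
    (".", "."): ".",
    (",", ","): ",",
}


def join(left, right, table=DEFAULT_MERGE):
    """Join two rendered strings, merging the boundary pair if the table lists it."""
    if left and right:
        merged = table.get((left[-1], right[0]))
        if merged is not None:
            return left[:-1] + merged + right[1:]
    return left + right


if __name__ == "__main__":
    print(join("Really?", ". Next"))
    # A style that opts out of merging could supply an empty table.
    print(join("Really?", ". Next", table={}))
```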

bdarcus · 2020-06-05T19:16:09Z

I think it’s a good idea to add the current behavior to the 1.0 spec.

Let's do this then.

Is there existing documentation of the behavior in citeproc-js and/or pandoc-citeproc that we could use as the basis?

Could someone with comfort-level with the feature (not me), and with the specification.rst file (also not me), please put together a PR that we could include in the 1.0.2 release?

I'll re-title this issue in the meantime to be more narrow.

fbennett · 2020-06-05T23:52:35Z

For citeproc-js, there are only the processor-specific tests, unfortunately. In addition to suppression of spaces and duplicate punctuation, there are issues around quotation marks and moving punctuation, where decisions must be made about the distribution of various combinations of punctuation marks (comma, period, semicolon, colon, question mark, exclamation point) on either side of a closing quote.
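The simplest of those decisions, moving a comma or period that directly follows a closing quote to the inside of it, can be sketched as below. This covers one combination only and is not citeproc-js's logic; `move_into_quote` is a made-up name, and the other marks listed above would each need their own rule.

```python
import re

# A closing double quote (U+201D) immediately followed by a comma or period.
AFTER_QUOTE = re.compile("(\u201d)([,.])")


def move_into_quote(text):
    """Swap a closing quote with the comma or period that follows it."""
    return AFTER_QUOTE.sub(r"\2\1", text)


if __name__ == "__main__":
    print(move_into_quote("\u201cA title\u201d, 2020."))
```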

bdarcus · 2020-06-06T00:21:59Z

This appears to be the pandoc-citeproc code responsible for this, but alas there are no docstrings on the functions, and I don't find Haskell the most easy-to-understand language.

The citeproc-rs code seems to be in this directory, but is also undocumented, and I don't understand the code.

For citeproc-js, there are only the processor-specific tests, unfortunately.

Which tests?

I've seen specs, BTW, that embed test content within them.

Maybe we could start from the test examples, and build the language around them?

But to do that, we need to know which tests.

bdarcus added 1.1 deprecate documentation labels Jun 5, 2020

bdarcus changed the title ~~Disallow modifying adjacent punctuation on v1.1~~ Add adjacent punctuation modification rules to the spec Jun 5, 2020

bdarcus added 1.0.2 and removed 1.1 deprecate labels Jun 5, 2020
