Add adjacent punctuation modification rules to the spec #81

Open
bdarcus opened this issue Jun 5, 2020 · 12 comments

Comments

@bdarcus
Member

bdarcus commented Jun 5, 2020

Picking up on this comment from @fbennett and this reply from @ndw, I propose we document adjacent punctuation modification rules for inclusion in 1.0.2. (The original proposal was to explicitly disallow modification of adjacent punctuation in v1.1 of the spec.)

As @fbennett indicated, in existing implementations like citeproc-js and pandoc-citeproc:

one of two choices had to be made: to control for such "accidental" duplicate punctuation in the implementations; or to address the issue in the styles by refactoring them to set joining punctuation exclusively as delimiters.

Current expectations, however, are unspecified in the documentation.

This change would instead be the latter, and so would force styles to be updated to fix what are bugs in the styles.

I believe, though am not 100% sure, that @fbennett already removed these tests from the test-suite.

I am also unsure whether we can automate updating of styles to fix these problems, though I think it should be possible, perhaps along with the csl-update.xsl file. It might help if we had a handful of example styles to test with.

@ndw also suggested we add language to the current spec that requires modification, to match existing behavior. I am unsure whether we need this, particularly if we release 1.1 this summer and can get the styles updated quickly and easily.

Closes citation-style-language/test-suite#13

@fbennett
Member

fbennett commented Jun 5, 2020

I could be wrong, but I don't think a move from affix to delimiter joins can be easily automated. Macros often express different joins depending on the result of conditional branching, and so would need to be disaggregated. Locators are also a thorny problem, since the joining punctuation for them can vary. It would probably need to be done gradually over time, and it would be good to back up the more popular styles with bespoke test suites to protect against regressions.

@bdarcus
Member Author

bdarcus commented Jun 5, 2020

I hope you're wrong (but you're probably not!), but we could at least automate simple conversions, and flag more complex examples of styles that need manual work?

If yes, we should do this sooner rather than later so we can evaluate?

Are there reasonably simple rules we could use to identify the issues?

@fbennett
Member

fbennett commented Jun 5, 2020 via email

@bdarcus
Member Author

bdarcus commented Jun 5, 2020

OK.

To illustrate what I was initially thinking ...

Perhaps one simple rule for checking the style files themselves is to look for elements that have a prefix but no suffix, or vice versa?

If we look at this example:

<group delimiter=" ">
  <text variable="URL" prefix=" Available from: "/>
  <date variable="accessed" prefix="[accessed " suffix="]" form="text"/>
</group>

That rule as stated above would skip the date element, but would flag the text element.

So maybe only look for punctuation in the affixes, or content outside of group, in which case the above would pass, but this would be flagged:

<text variable="title" suffix=", "/>
<text variable="URL" suffix=", "/>

... and could even be converted automatically to:

<group delimiter=", " suffix=", ">
  <text variable="title"/>
  <text variable="URL"/>
</group>
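
A minimal Python sketch of that simple conversion, for evaluation only. It assumes the easy case from the example: every child of a parent element carries the same `suffix`, so the shared value can become both the group's `delimiter` and its trailing `suffix`. The function name `wrap_shared_suffix` is hypothetical, and conditional branches, nested groups, and locators are deliberately not handled.

```python
import xml.etree.ElementTree as ET


def wrap_shared_suffix(parent):
    """Wrap all children of `parent` in a <group> if they share one suffix.

    Returns True when the children were wrapped, False when the parent was
    left alone because it is not the simple case.
    """
    children = list(parent)
    suffixes = {child.get("suffix") for child in children}
    # Only handle the simple case: two or more children, one shared suffix.
    if len(children) < 2 or len(suffixes) != 1 or None in suffixes:
        return False
    (shared,) = suffixes
    # The shared suffix joins the children and also trails the whole group.
    group = ET.Element("group", {"delimiter": shared, "suffix": shared})
    for child in children:
        parent.remove(child)
        del child.attrib["suffix"]
        group.append(child)
    parent.append(group)
    return True
```

Anything the function declines to wrap (it returns `False`) would be a candidate for the "needs manual work" list.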

Do you think an approach like that could work, or must one look at the actual output?

Advantage if we could do it is it could be integrated in the styles repo CI for changes going forward.

I don't think I'll be able to do either, but I'd like us to know the right path or paths.
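
As a rough illustration of the static check described above, here is a Python sketch that scans a style for affixes made up only of joining punctuation and whitespace. That reading of "punctuation in the affixes" is an assumption, chosen so that `" Available from: "` and `"[accessed "` pass while `", "` is flagged. The names `JOINING_PUNCTUATION`, `is_joining_affix`, and `flag_punctuation_affixes` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the marks that act as joins between rendered elements.
JOINING_PUNCTUATION = set(",.;:")


def is_joining_affix(value):
    """True if the affix contains only joining punctuation and whitespace."""
    marks = [ch for ch in value if not ch.isspace()]
    return bool(marks) and all(ch in JOINING_PUNCTUATION for ch in marks)


def flag_punctuation_affixes(style_xml):
    """Return (tag, variable, attribute, value) for each suspicious affix."""
    flagged = []
    for element in ET.fromstring(style_xml).iter():
        # Drop the XML namespace, if any, so real CSL files read cleanly.
        tag = element.tag.rsplit("}", 1)[-1]
        for attribute in ("prefix", "suffix"):
            value = element.get(attribute)
            if value is not None and is_joining_affix(value):
                flagged.append((tag, element.get("variable"), attribute, value))
    return flagged
```

Run against the two snippets above, the first yields an empty list and the second flags both `suffix=", "` attributes. A check like this only inspects the style source, so it cannot tell whether the rendered output actually contains a punctuation clash.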

@georgd
Contributor

georgd commented Jun 5, 2020

I’ve done such conversions several times manually because I try to follow a punctuation-in-delimiter-attribute-only policy and the template styles often use affixes for that. My impression is that you’ll often have to intervene manually to get group nesting right and on some occasions you’d have to introduce if/else-blocks and duplicate elements to account for cases of omitted variables that influence delimiter use.

That said, I believe that in the long run the delimiter approach, while more verbose, is the way to go. IMO it reduces the cases of unexpected punctuation clashes.

@bwiernik
Member

bwiernik commented Jun 5, 2020

I think the punctuation/spacing correction has fairly little to do with style coding practices. That's one case, but there are numerous others.

Some punctuation modification absolutely needs to be permitted. For example, many styles place a period after the title. But this period should not be added when the title already ends in punctuation. That should be handled by the processor.
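
To make the title case concrete, here is a small Python sketch of the kind of rule a processor might apply. It is not the documented behavior of citeproc-js or pandoc-citeproc. The set of terminal marks and the function name `append_suffix` are assumptions for illustration.

```python
# Assumption: these marks make a following period redundant.
TERMINAL_PUNCTUATION = ".?!"


def append_suffix(text, suffix):
    """Append `suffix`, dropping its leading period if `text` already ends
    in terminal punctuation."""
    if text and suffix.startswith(".") and text[-1] in TERMINAL_PUNCTUATION:
        return text + suffix[1:]
    return text + suffix
```

With a suffix of `". "`, the title `Who did it?` keeps its question mark and gains only the space, while `A plain title` gains the period as usual.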

For the question of affixes vs. groups, I'm really not sure that most of the changes that would be required could be automated. I've refactored a bunch of styles over the years to use groups instead of chained affixes, and it usually is a fairly bespoke affair.

There is also the issue that style-coded affixes and user-entered affixes in citation data are different beasts. We could avoid the need for affix correction in styles through enforcing best practices there, but there isn't a way to prevent, for example, a user from entering extra/omitting necessary spaces in citation affixes. I don't think it would be a good user experience at all to have (Jones, 1985 , though he later recanted) appear rather than (Jones, 1985, though he later recanted) just due to minor sloppiness during the editing process. As I mentioned in the other thread, this is particularly an issue in GUI clients, where things like trailing spaces on a prefix are often hard to see, but it's also the case in text editors (e.g., it's easy to accidentally leave a double space during repeated editing).
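
The spacing cleanup in that example could be sketched in Python as follows. This is only an assumed rule set (collapse runs of spaces, drop a space before a comma, semicolon, or colon), not a description of what any existing processor does. `tidy_spacing` is a hypothetical name.

```python
import re


def tidy_spacing(text):
    """Collapse repeated spaces and remove a space before , ; or :"""
    # "a  b" -> "a b"
    text = re.sub(r" {2,}", " ", text)
    # "1985 ," -> "1985,"
    text = re.sub(r" +([,;:])", r"\1", text)
    return text
```

Applied to `(Jones, 1985 , though he later recanted)`, this yields `(Jones, 1985, though he later recanted)`.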

I suggest we explicitly go in the opposite direction: add the current citeproc-js/pandoc-citeproc modification behavior to the processor schema.

@bdarcus
Member Author

bdarcus commented Jun 5, 2020 via email

@bdarcus
Member Author

bdarcus commented Jun 5, 2020 via email

@georgd
Contributor

georgd commented Jun 5, 2020

@bwiernik I agree with you that punctuation correction is necessary. When I commented, I had already forgotten that this was the initial proposal and concentrated only on moving punctuation from affixes to group delimiters, which I understood to be the consequence.

I think it’s a good idea to add the current behavior to the 1.0 spec. In a later version, it could perhaps be exposed to style authors, to allow for styles that don’t merge title-ending punctuation with the following punctuation character. Citavi presents the user with a substitution table for punctuation character combinations in the style editor.

@bdarcus
Member Author

bdarcus commented Jun 5, 2020

I think it’s a good idea to add the current behavior to the 1.0 spec.

Let's do this then.

Is there existing documentation of the behavior in citeproc-js and/or pandoc-citeproc that we could use as the basis?

Could someone who is comfortable with the feature (not me) and with the specification.rst file (also not me) please put together a PR that we could include in the 1.0.2 release?

I'll re-title this issue in the meantime to be more narrow.

bdarcus changed the title from "Disallow modifying adjacent punctuation on v1.1" to "Add adjacent punctuation modification rules to the spec" on Jun 5, 2020
@fbennett
Member

fbennett commented Jun 5, 2020

For citeproc-js, there are only the processor-specific tests, unfortunately. In addition to suppression of spaces and duplicate punctuation, there are issues around quotation marks and moving punctuation, where decisions must be made about the distribution of various combinations of punctuation marks (comma, period, semicolon, colon, question mark, exclamation point) on either side of a closing quote.
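
As a sketch of the simplest of these decisions, the Python below moves a following comma or period inside a closing quotation mark when a flag modeled on the CSL `punctuation-in-quote` locale option is true. It is an assumption-laden illustration, not citeproc-js's actual logic: the function name `join_after_quote` is hypothetical, and semicolons, colons, question marks, and exclamation points are deliberately left untouched because their handling is exactly what remains to be specified.

```python
def join_after_quote(quoted, following, punctuation_in_quote, close_quote="\u201d"):
    """Join quoted text with what follows it.

    When `punctuation_in_quote` is true and `following` starts with a comma
    or period, that mark is moved inside the closing quotation mark.
    """
    if (
        punctuation_in_quote
        and quoted.endswith(close_quote)
        and following[:1] in (",", ".")
    ):
        inner = quoted[: -len(close_quote)]
        return inner + following[0] + close_quote + following[1:]
    return quoted + following
```

For a quoted title followed by `", 2020"`, the comma lands before the closing quote when the flag is true and stays after it when the flag is false.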

@bdarcus
Member Author

bdarcus commented Jun 6, 2020

This appears to be the pandoc-citeproc code responsible for this, but alas there are no docstrings on the functions, and I don't find Haskell the easiest language to understand.

The citeproc-rs code seems to be in this directory, but is also undocumented, and I don't understand the code.

For citeproc-js, there are only the processor-specific tests, unfortunately.

Which tests?

I've seen specs, BTW, that embed test content within them.

Maybe we could start from the test examples, and build the language around them?

But to do that, we need to know which tests.
