-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cases API + 4 implementations (Pascal, Camel, Kebab, Snake) #450
base: master
Are you sure you want to change the base?
Conversation
Also meant to add... As an experiment I implemented a Case that uses the CaseUtils logic to parse tokens such that one could theoretically deprecate that class and use this api like |
@elharo |
IMO: You should follow the YAGNI principle and only make public and protected what you must, especially in a first cut. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I like this proposal in general. I'd love to get some beta feedback before we lock everything in though. Does Apache commons have any sort of beta or incubation phase for APIs we can use?
-
Is there any good source of typical strings in other formats we could use for more extensive test coverage? E.g. test cases for a similar library in a different language?
"Does Apache commons have any sort of beta or incubation phase for APIs we can use?" No. The best we can do is to issue a release with a -beta1 or -alpha1 or -M1 version suffix like 1.12.0-beta1. See the Commons FileUpload and Commons Imaging components for recent examples. |
Guava has |
This feels unwise to me and I would discourage it. Based on what I've seen in Commons, once something is public or protected, it will be used and complained about when changed. This would also make JApiCmp more complex to configure: How do you tell it to ignore classes, methods, and fields annotated with @foo?. I certainly would not want to individually list each in-flux element. Same issue with using a ".beta." package name. If you put an Alpha, Beta or M label on the whole release then you make it obvious from the onset that you are offering an unstable platform, which is the approach Commons has already taken with Commons FileUpload and Commons Imaging. Using a Beta release requires buy-in from the user that there are unstable elements, as opposed to using a normal full release that happens to have unstable parts in it. Commons has been very good at maintaining binary compatibility in normal releases, so I think it wise to keep maintaining this expectation. |
Yes people will complain, but if it was explicitly marked beta, you tell them to pound sand (more politely). The problem with a complete beta release in a case like this is there's so much more in commons that isn't beta. We need a way to ship stable features while also experimenting with less stable APIs. Binary compatibility would be maintained for anyone who doesn't use explicitly Beta APIs. I'm not sure what JApiCmp is, but this approach has been tried and proven in Guava and other Google open source Java libraries. Probably worth moving this discussion to the list. |
I converted the cases API to use StringTokenizer as suggested by @elharo. In general this works out fine and greatly simplifies those classes, but necessitates a couple of other changes that I'd request your comments on...
If there's an alternative to using these new classes I'm all for it, just tried to keep it simple but clear.
|
src/main/java/org/apache/commons/text/TokenFormatterFactory.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/commons/text/TokenFormatterFactory.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/commons/text/TokenFormatterFactory.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/commons/text/matcher/AbstractStringMatcher.java
Show resolved
Hide resolved
src/main/java/org/apache/commons/text/matcher/AbstractStringMatcher.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/commons/text/matcher/AbstractStringMatcher.java
Show resolved
Hide resolved
Honestly, I'm not sure I'm in favor of splitting CaseUtils into a bunch of separate classes. Actually, I'm sure I'm not in favor of this. Just sayin' |
That suffers from all the same problems of the original implementation. This PR wasn't just to refactor the existing features, it was to enhance them to allow for other cases. The old feature can't handle certain scenarios, and doesn't allow the user to parse the string back into it tokens. e.g. How would one convert from a camel cased string to a kebab case string using the methods you mentioned? |
Funny... I was just starting to play with my clone of your project, I want to run some tests using: But no, you can't convert back and forth to and from camelCase without tokens. |
Well, at least this one fails, it's the first test I ran The text is 𝑡ℎ𝑒_𝑞𝑢𝑖𝑐𝑘_𝑏𝑟𝑜𝑤𝑛_𝑓𝑜𝑥
AlphabetConverter returns java: illegal character: '\ufffd' |
That line of code runs fine for me. Can you send the stacktrace and/or the full unit test method? |
It's not giving me a stack trace, it's giving a build failure. I used Zulu 21 then Temurin 21 under IntelliJ Idea Ultimate Here is the method:
Maybe I fat-fingered something wrong into the method |
Yea that's something wrong with your build. The test can't be failing because its not being run to begin with. That class wasn't changed as part of this PR so I'm not sure what is causing the illegal character issue. |
Yeah, I don't know what I did, but it works, now, including my new test.
You might need to rebase your project and do a push, I don't know if it has to do with 1.12RC, or something, but I tried re-cloning from this PR, and none of your work was in it. Then I tried cloning directly from your repository, same thing, then I tried downloading the zip file from your repository and all your work was gone. I had to manually copy cases directory, the matcher files you edited, all the tokenizer stuff, etc. from the one I cloned a couple days ago into the new clone, and git thinks they are new and changed files. The release notes in what I downloaded say 1.11.0 Weird. This PR and my PR 528 are not incompatible with each other. They serve different purposes. Mine is designed to get everything into lower ASCII and spit out the string, while yours is designed to keep the Unicode characters intact as tokens for future manipulation. I don't think mine can handle UTF32 characters, so 𝑡ℎ𝑒_𝑞𝑢𝑖𝑐𝑘_𝑏𝑟𝑜𝑤𝑛_𝑓𝑜𝑥 would fail as an input on 528. I'll have to try it. I haven't, yet. I frittered away my morning playing with this pull (most of it trying to re-clone) so I will have to revisit it because I have a bunch of other stuff to do today. I'll probably re-visit Sunday. |
I got everything re-merged into the initial clone, and git likes it. It finds differences in StringTokenizerTest, but that's it. I think I must have accidentally pasted utf32 characters directly into that AlphabetConverter (which I wasn't editing, but I was nosing around in there), and Java didn't know what to do with it, so the compile burped. OK, I'm impressed. This passes:
The string is: "𝚖𝚢𝙲𝚊𝚖𝚎𝚕𝚅𝚊𝚛𝚒𝚊𝚋𝚕𝚎". The tokens are "𝚖𝚢", "𝙲𝚊𝚖𝚎𝚕", "𝚅𝚊𝚛𝚒𝚊𝚋𝚕𝚎". You'd only see a variable like this in something like Julia, but danged if your API didn't work like a charm on it. I'm liking this. |
The provided implementations are fairly simple (even more so after relying on StringTokenizer) and aim to impose as few restrictions as possible, relying on the java.lang.Character class to make most case determinations. Using the API is a bit more verbose than a single util method call, but it comes with more than a few advantages. |
This is a more formal API for parsing and formatting strings of various "Cases".
Similar logic is present in
org.apache.commons.text.CaseUtils
as well as #360 to add two additional case functions, but that code is based around the initial string having delimiters, and cannot translate between cases. This API aims to be slightly more formal, expecting the inputs to the parse method to abide by the syntax of the case. No assumptions (other than the Case syntax) are made about the inputs, and removal of unwanted characters is left to the user. Case classes can be sub classed as necessary to support that.The Case interface exposes two methods...
with parse() returning a List of String tokens, which can then be passed to another Case instance format() method. Allowing users to change the case of a string e.g.
CamelCase.INSTANCE.format(KebabCase.INSTANCE.parse("my-kebab-string"))
. I followed the pattern of other commons-text classes in using code points rather than primitive chars (I assume to support UTF32).