Generalize to vectors #49

Open
phoe opened this issue May 18, 2020 · 14 comments

Comments

@phoe

phoe commented May 18, 2020

From a quick glance, it seems that almost all (or even all) of the operations here can apply to vectors of any type, not just (vector character) which is what strings are defined to be. Therefore, str could easily become a vector manipulation library, as opposed to just a string manipulation library.
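As a minimal illustration of the point above, the standard Common Lisp sequence functions already treat strings and other vectors uniformly. The function name `sequence-ops-demo` is hypothetical and exists only to group the calls; nothing here uses str itself.

```lisp
;; Sketch: the same standard functions applied to a string and to a
;; general vector. SEQUENCE-OPS-DEMO is a made-up name for illustration.
(defun sequence-ops-demo ()
  (let ((text "hello world")
        (nums (vector 1 2 0 3 4)))
    (list (subseq text 0 5)                      ; prefix of a string
          (subseq nums 0 2)                      ; prefix of a vector
          (search "world" text)                  ; subsequence search in a string
          (search (vector 3 4) nums)             ; subsequence search in a vector
          (position #\Space text)                ; element lookup in a string
          (position 0 nums)                      ; element lookup in a vector
          (concatenate 'string "foo" "bar")      ; concatenation to a string
          (concatenate 'vector #(1 2) #(3 4))))) ; concatenation to a vector
```

Only the requested result type differs between the string and vector calls.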

@vindarel
Owner

+1 I would like this. Then we add a local package nickname that makes sense to work on vectors (vec) (or another package?) and we're good.

@phoe
Author

phoe commented May 20, 2020

Honestly, I find the name cl-str or str to be somewhat unfortunate in this context, since all of the operations that you list work on vectors as well as they do on strings. (Perhaps with an exception of the char-case operations, since - even though they are technically possible - they may have little practical meaning on vectors that hold characters but are nonetheless not specialized strings, like #(#\H #\e #\l #\l #\o).)
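The distinction drawn here can be checked directly: a general vector that happens to hold characters is not a string, although character-wise operations still apply to it. A small sketch, where `*plain*` and `upcase-elements` are hypothetical names:

```lisp
;; A general (non-specialized) vector whose elements are all characters.
(defparameter *plain* (vector #\H #\e #\l #\l #\o))

;; Hypothetical helper: upcase element by element, keeping a general vector.
(defun upcase-elements (vector)
  (map 'vector #'char-upcase vector))

;; STRINGP is false for *PLAIN*, even though every element is a character.
(defparameter *plain-is-string* (stringp *plain*))

;; Coercing to STRING works because all elements are characters.
(defparameter *as-string* (coerce *plain* 'string))
```

Whether upcasing such a vector is meaningful in practice is exactly the open question raised above.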

If you can afford that, I'd suggest renaming the library from str to anything that mentions vectors rather than just strings; if not, I'd leave a note that explains why str is called str and that programmers are free to use local nicknames to nickname the package.

(If you decide to rename the package, I'd also suggest using a longer package name rather than just str, since that's a very short name and other people might want to locally nickname another string library of sorts with it. But that's a mostly off-topic note.)
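For readers unfamiliar with local nicknames, here is a sketch of the idea. It assumes an implementation that supports the `:local-nicknames` option of `defpackage`, which is an extension offered by several implementations rather than part of the ANSI standard. The package names `vector-utilities-sketch` and `my-app-sketch` and the functions in them are hypothetical.

```lisp
;; Hypothetical library with a long, descriptive package name.
(defpackage #:vector-utilities-sketch
  (:use #:cl)
  (:export #:first-element))

(defun vector-utilities-sketch:first-element (vector)
  "Return the first element of VECTOR."
  (aref vector 0))

;; A client package that refers to the library by a short local nickname.
;; The nickname VEC is visible only while reading code in MY-APP-SKETCH.
(defpackage #:my-app-sketch
  (:use #:cl)
  (:local-nicknames (#:vec #:vector-utilities-sketch)))

(in-package #:my-app-sketch)

(defun head (vector)
  (vec:first-element vector))

(in-package #:cl-user)
```

Because the nickname is local to the client package, two applications can use the same short prefix for different libraries without clashing.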

@Akhetopnu

Not to necrobump, but has there been any progress in this matter?

@phoe
Author

phoe commented Feb 6, 2022

I've looked into it.

I think that most uses of cl-ppcre in the source code can be modified to use something else instead, and only some would require coercing non-string vectors into strings and back away from them. The question is, how do we do that? Should we move the real implementation into some sort of cl-vec library that is guaranteed to work on all vectors, and then have cl-str become a shim that reexports stuff from cl-vec?
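The "coerce into strings and back" step could look roughly like the sketch below. `call-via-string` is a hypothetical helper, not part of cl-str or cl-ppcre, and it assumes that every element of the input is a character and that the wrapped function returns a string.

```lisp
;; Hypothetical helper: run a string-only FUNCTION on VECTOR by coercing
;; to a string first, then coerce the result back to a general vector
;; unless the input was already a string.
;; Assumptions: every element of VECTOR is a character (COERCE signals a
;; TYPE-ERROR otherwise) and FUNCTION returns a string.
(defun call-via-string (function vector)
  (let ((result (funcall function (coerce vector 'string))))
    (if (stringp vector)
        result
        (coerce result 'simple-vector))))
```

For example, `(call-via-string #'string-upcase (vector #\a #\b))` yields a general vector of the upcased characters, while a string input comes back as a string. The round trip costs two copies, which is one reason to limit it to the operations that really need a string.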

There is also the issue of symbols with "string" in their names, so, substring, non-empty-string-p, non-blank-string-p, string-case, and count-substring. I guess that these should get their own more generic names like subsequence in cl-vec, and then cl-str can export the old names for backwards compatibility.
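One possible shape for this split, as a sketch only: a generic package holds the implementation under a generic name, and the string package keeps exporting the old name as a thin wrapper. The package names `vec-sketch` and `str-sketch` are hypothetical, and the clamping behavior shown is an assumption for illustration, not a description of the current str:substring.

```lisp
;; Hypothetical generic package holding the real implementation.
(defpackage #:vec-sketch
  (:use #:cl)
  (:export #:subsequence))

;; Hypothetical string package that keeps the old name for compatibility.
(defpackage #:str-sketch
  (:use #:cl)
  (:export #:substring))

(defun vec-sketch:subsequence (start end sequence)
  "Return the elements of SEQUENCE from START up to END.
END may be NIL, meaning the end of SEQUENCE. Both bounds are clamped
to the valid range instead of signaling an error."
  (let* ((len (length sequence))
         (end (max 0 (min len (or end len))))
         (start (min (max 0 start) end)))
    (subseq sequence start end)))

(defun str-sketch:substring (start end s)
  "Backwards-compatible name that delegates to the generic version."
  (vec-sketch:subsequence start end s))
```

Existing callers would keep using the old name, while new code could call the generic function on any sequence.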

What would be the best way to proceed here?

@mdbergmann

Regarding splitting generic vectors. Isn't cl-str specifically a string utility?

@phoe
Author

phoe commented Feb 6, 2022

> Isn't cl-str specifically a string utility?

Yes, right now it is, hence my original proposal from the first post in this thread. Many of the operations defined here can be generalized to work on arbitrary vectors (or even sequences).

@mdbergmann

> if not, I'd leave a note that explains the reason why str is called str

IMO this doesn't need to be explained. Most people probably can figure that the library provides string utilities, hence the name 'str'.

@mdbergmann

> > Isn't cl-str specifically a string utility?
>
> Yes, right now it is, hence my original proposal from the first post in this thread. Many of the operations defined here can be generalized to work on arbitrary vectors (or even sequences).

Yeah, ok. But should it? I find it good that there is a library only for strings, with a specific purpose.

@phoe
Author

phoe commented Feb 6, 2022

Personally, I cannot find a good reason why e.g. (join #(a b c) #(1 2 3)) should signal a type error rather than return #(1 a b c 2 a b c 3). If such a function doesn't belong in cl-str, then perhaps it should belong in a library that operates on all sequence types and that cl-str can then depend on.
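A sketch of what such a generalized join might look like. `join-seq` is a hypothetical function, not the existing str:join. To reproduce the example above, it assumes that items which are not themselves sequences (such as the numbers 1, 2, 3) count as one-element sequences.

```lisp
;; Hypothetical generalization of a string JOIN to arbitrary sequences.
;; SEPARATOR is a sequence; ITEMS is a sequence whose elements are either
;; sequences or atoms. Atoms are treated as one-element sequences (an
;; assumption made to match the example in the comment above). Note that
;; NIL is the empty list, so it contributes no elements.
(defun join-seq (separator items &key (result-type 'vector))
  (let ((parts '())
        (first-p t))
    (map nil
         (lambda (item)
           ;; Put the separator before every item except the first.
           (unless first-p
             (push separator parts))
           (setf first-p nil)
           (push (if (typep item 'sequence) item (list item)) parts))
         items)
    ;; APPLY is subject to CALL-ARGUMENTS-LIMIT; acceptable for a sketch.
    (apply #'concatenate result-type (nreverse parts))))
```

With these definitions, `(join-seq #(a b c) #(1 2 3))` returns `#(1 A B C 2 A B C 3)`, and `(join-seq ", " '("a" "b") :result-type 'string)` returns `"a, b"`. The sketch ignores the performance considerations that a production join would need.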

@mdbergmann

> I cannot find a good reason why e.g. (join #(a b c) #(1 2 3)) should signal a type error rather than return #(1 a b c 2 a b c 3).

I would say it raises a type error because the arguments are not strings.
From a user perspective (or my perspective as a user), I find it comforting that I don't need to think of other use cases when using cl-str. It deals with strings, so all inputs and outputs are strings, and that's it. That reduces the cognitive load. It also reduces how often one needs to check the documentation for which argument types a function supports.

@phoe
Author

phoe commented Feb 6, 2022

OK, that works and suggests that a cl-str fork should be made, into a version that deals with all sequence types.

@mdbergmann

I wouldn't mind if there were more use cases beyond strings. But it would be good if the current public interface could be maintained, and maybe an additional, more generic one could be added?

@phoe
Author

phoe commented Feb 6, 2022

I think it's possible to maintain the current interface and expose a more generic one elsewhere. I'll try doing that in a spare while.

@vindarel
Owner

vindarel commented Feb 8, 2022

str could have a cousin library. With a similar interface, why not, to make users comfortable.

Or, we would quickload "str" and that would give us two packages, say str and seq? Which one to use would be at the discretion of the user.

I am dubious of a generalized library (despite my first comment two years ago). We have many general libraries. This one wants to be straightforward, for strings. I know when I am working with strings (and when I know I am working with sequences, I appreciate that str:substring works with them too! Very useful.). I fear that extending to vectors would complicate the code too much, and that we would lose advantages specific to strings. That remains to be seen.

Probably something can be done with generic-cl.

ps: str:join has been worked on for performance. That doesn't appear in unit tests, and might be easy to do for both versions, but it should be kept in mind. See #67.
