Some useful Seq operations are deprecated on String #11676

Closed
11 tasks
julienrf opened this issue Aug 13, 2019 · 9 comments

Comments

@julienrf

julienrf commented Aug 13, 2019

A bunch of operations have been deprecated on String, although they make perfect sense there. I don't see the reason for these deprecations. The suggestion in the deprecation message is often awkward and inefficient (e.g., “Use s.toSeq.groupBy(...).view.mapValues(_.unwrap).toMap instead of s.groupBy(...)”).

I think that it should be possible to provide an efficient implementation for all these operations. The work can be done in separate PRs, though. Here is the list of operations:

  • diff
  • intersect
  • distinct
  • distinctBy
  • sorted
  • sortWith
  • sortBy
  • groupBy
  • sliding
  • combinations
  • permutations
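
For illustration, here is a minimal sketch of what the suggested workarounds look like for three of the listed operations, assuming Scala 2.13. The object name and sample strings are hypothetical; the groupBy form is the one quoted from the deprecation message above, and the other two follow the same toSeq/unwrap pattern.

```scala
object StringSeqWorkaround {
  val s = "hello world"

  // Route through WrappedString, then unwrap back to a String.
  val distinctChars: String = s.toSeq.distinct.unwrap
  val sortedChars: String = s.toSeq.sorted.unwrap

  // The form quoted in the deprecation message for groupBy.
  val grouped: Map[Char, String] =
    "banana".toSeq.groupBy(identity).view.mapValues(_.unwrap).toMap

  def main(args: Array[String]): Unit = {
    println(distinctChars) // helo wrd
    println(sortedChars)
    println(grouped)
  }
}
```

Each call wraps the String, runs the Seq operation, and unwraps the result, which is the extra ceremony this issue objects to.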
@julienrf julienrf changed the title Seq operations are deprecated on String Some useful Seq operations are deprecated on String Aug 13, 2019
@szeiger
Member

szeiger commented Aug 13, 2019

The reason for the deprecation is that these methods don't make sense for Unicode strings in general. They could be useful for limited character sets but are almost guaranteed to lead to incorrect results when applied to arbitrary Unicode text.
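
A small sketch of the kind of incorrect result described here, assuming Scala 2.13 on a JVM where String is UTF-16. The object name and the choice of U+1F600 and U+1F601 are illustrative only; both characters encode as surrogate pairs that share the same high surrogate.

```scala
object SurrogateSplit {
  // U+1F600 and U+1F601: D83D DE00 and D83D DE01 in UTF-16.
  val twoEmoji = "\uD83D\uDE00\uD83D\uDE01"

  // distinct works on code units, so the second D83D is dropped
  // and the trailing DE01 is left without its high surrogate.
  val distinctUnits: String = twoEmoji.toSeq.distinct.unwrap

  // Working on code points keeps both characters intact.
  val distinctCodePoints: Array[Int] = twoEmoji.codePoints().toArray.distinct

  def main(args: Array[String]): Unit = {
    println(distinctUnits.length)      // 3 code units
    println(distinctCodePoints.length) // 2 code points
  }
}
```

Both characters are already distinct, so a code-point-aware distinct would return the input unchanged; the code-unit version returns a string that is no longer well-formed UTF-16.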

@smarter
Member

smarter commented Aug 13, 2019

But this is also true of existing methods on String, like length or substring.
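
To make that concrete, a short sketch (object name and sample character are hypothetical) showing that length and substring, which are not deprecated, also operate on UTF-16 code units rather than characters:

```scala
object CodeUnitMethods {
  // U+1F600 encoded as a single surrogate pair.
  val emoji = "\uD83D\uDE00"

  val unitLength: Int = emoji.length                           // counts code units
  val codePointLength: Int = emoji.codePointCount(0, emoji.length) // counts code points

  // substring can cut a surrogate pair in half.
  val half: String = emoji.substring(0, 1)

  def main(args: Array[String]): Unit = {
    println(unitLength)      // 2
    println(codePointLength) // 1
    println(Character.isHighSurrogate(half.charAt(0))) // true
  }
}
```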

@bjornregnell

bjornregnell commented Oct 12, 2020

I was hit by this in a lecture today when trying to do distinct on "hello world". It seems like a natural thing to do IMHO, and I was perplexed when trying to explain this to the beginner programmers in my class. The problem with Unicode, I guess, does not go away just because "hello world".toSeq.distinct turns it into a WrappedString. Or is WrappedString somehow immune to Unicode string problems? (Sorry if my question is just due to my possible misunderstanding of the underlying problem...)

@bjornregnell

My main point is that this removes regularity and makes things difficult for beginners.

@SethTisue
Member

I would support removing these deprecations.

@martijnhoekstra

If a string is nothing more than a sequence of code units, then sliding is a perfectly sensible operation, yielding an iterator over subsequences of code units. Only when you insist it's actually a sequence of arbitrary code points representing characters does sliding stop making sense. But that's a starting position that's not enforced by the data type at all.

You can only sensibly work with strings if you know what can and can't be in them; otherwise all you can do is treat them as opaque blobs of bytes. It's up to the user to know whether they can do something else with them (and what).
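
A sketch of both readings, assuming Scala 2.13 and going through toSeq so it does not depend on the deprecation status of sliding on String. The object name and inputs are hypothetical.

```scala
object SlidingOverCodeUnits {
  // Known ASCII content: windows of code units are windows of characters.
  val asciiWindows: List[String] =
    "abcd".toSeq.sliding(2).map(_.unwrap).toList

  // 'a' followed by U+1F600: the first window ends in a lone high surrogate.
  val mixedWindows: List[String] =
    "a\uD83D\uDE00".toSeq.sliding(2).map(_.unwrap).toList

  def main(args: Array[String]): Unit = {
    println(asciiWindows) // List(ab, bc, cd)
    println(mixedWindows.map(_.length))
  }
}
```

Treated as code units, both results are well defined; whether the second one is meaningful depends on what the caller knows about the contents.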

@som-snytt

Let's ditch java.lang.String and embrace scala.String[Char], scala.String[CodePoint] where CodePoint is an opaque alias of Int.

"hello, world".diff("bollywood") is natural. For a beginner class, use "hullo, world".diff("hollywood") to reveal URL.
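
For readers who want to try the two examples, a minimal sketch assuming Scala 2.13. It goes through toSeq so the result does not depend on whether diff on String is deprecated in the version at hand; the object name is hypothetical. diff is a multiset difference, so each character of the argument cancels at most one matching occurrence, scanning from the left.

```scala
object DiffOnStrings {
  val first: String =
    "hello, world".toSeq.diff("bollywood".toSeq).unwrap

  val second: String =
    "hullo, world".toSeq.diff("hollywood".toSeq).unwrap

  def main(args: Array[String]): Unit = {
    println(first)  // he, rl
    println(second) // u, rl
  }
}
```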

This might make a good lint, and an illustration where a lint is preferred to deprecation. Indeed, where a scalafix lint (opt-in) is preferable to an -Xlint (built-in).

@bjornregnell

I've made a bold undeprecation in this PR: scala/scala#9246

@dwijnand
Member

Fixed in scala/scala#9246

8 participants