Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow specifying locale in lexicographicSortSchema #2869

Closed
vhenzl opened this issue Dec 8, 2020 · 2 comments · Fixed by #2876
Closed

Allow specifying locale in lexicographicSortSchema #2869

vhenzl opened this issue Dec 8, 2020 · 2 comments · Fixed by #2876

Comments

@vhenzl
Copy link

vhenzl commented Dec 8, 2020

I use lexicographicSortSchema() to get a stable output when generating TS types from GraphQL schema and recently I discovered a problem when a field called chemicalName is misplaced and causing a diff.

The reason is that one of my machines has non-English, Czech locale and in Czech alphabet, the digraph Ch is handled as a single letter and sorted between H and I.

Example
const items = ['xx', 'aa', 'dd', 'cx', 'ch', 'cc'];

items.sort((a, b) => a.localeCompare(b, 'en'));
// ["aa", "cc", "ch", "cx", "dd", "xx"]

items.sort((a, b) => a.localeCompare(b, 'cs'));
// ["aa", "cc", "cx", "dd", "ch", "xx"]

Would be great if lexicographicSortSchema() could accept locales and possibly also options parameters for localeCompare to be able to force a particular (English) locale for the schema sorting independently on the machine's locale.

@IvanGoncharov
Copy link
Member

IvanGoncharov commented Dec 30, 2020

@vhenzl Thanks for taking the time to open this issue.
Since we sort only identifiers and GraphQL restricts them to A-Za-z0-9_ I think supporting custom locales is overkill and complicates APIs.
As a solution to different default locales, we can write a super simple function that compares characters based on position in ASCII tables.
What do you think?

@vhenzl
Copy link
Author

vhenzl commented Jan 8, 2021

Hi @IvanGoncharov! First of all, sorry for the late answer. And thanks for looking into it.

You are probably right, supporting locales is likely overkill. Using a simple character comparison would for sure work for my use case. My objective here is to get a stable order regardless of system locale, but not necessarily alphabetically/lexicographically correct order. Which I guess should suit most of the graphql users; but possibly not all.

Since the alphabet is restricted to English letters, only digraphs (trigraphs?) and letter case are interesting in terms of sorting. There is a handful of languages that sort digraphs in special ways, most other languages threat them as combinations of separate letters.

With a simple character comparison, words with digraphs would be placed differently than they should be in the given language. However, for someone to be really affected by that, it would mean that names in the sorted GraphQL schema are in that non-English language (who doesn't use English for programming?) and that the order really matters for the given use case (maybe something like a table of contents). So to cut a long story short, the chances that it would affect someone negatively are IMO minimal.

I'm not so sure about different order for letter cases tho, because it would affect all languages:

function ascii(x, y) {
  if(x > y) return 1; 
  if(x < y) return -1; 
  return 0
}

function asciiUpperCase(xx, yy) {
  const x = xx.toUpperCase(); 
  const y = yy.toUpperCase(); 
  if(x > y) return 1; 
  if(x < y) return -1; 
  return 0
}

const a = ['c', 'b', 'a', 'Ab', 'ab', 'AB', 'aB', '10', '5', 'C', 'B', 'A'];

[...a].sort(asciiUpperCase) 
// ["10", "5", "a", "A", "Ab", "ab", "AB", "aB", "b", "B", "c", "C"]

[...a].sort(ascii) // same as [...a].sort() 
// ["10", "5", "A", "AB", "Ab", "B", "C", "a", "aB", "ab", "b", "c"]

[...a].sort((x, y) => x.localeCompare(y, 'cs')) // same as 'en' locale
// ["10", "5", "a", "A", "ab", "aB", "Ab", "AB", "b", "B", "c", "C"]

So names using camelCase would be sorted differently then they are now.


As for the #2876 PR, I'm not sure I understand what is findBreakingChanges.js about, but it seems that you are sorting only fields. But the problem here applies to all names (types, inputs,…), not just fields. So I believe that sortBy function in lexicographicSortSchema.js should be modified. Before I noticed your PR, I was about to create my own with something like this:

function sortBy<T>(
  array: $ReadOnlyArray<T>,
  mapToKey: (T) => string,
): Array<T> {
  return array.slice().sort((obj1, obj2) => {
    const key1 = mapToKey(obj1).toUpperCase();
    const key2 = mapToKey(obj2).toUpperCase();
    if (key1 > key2) {
      return 1;
    }
    if (key1 < key2) {
      return -1;
    }
    return 0;
  });
}

I tried the change locally on my project and it works just fine. Using toUpperCase gives me the same order (apart from names with the ch digraph) like the original algorithm with localeCompare; comparing keys in the original case leads to a different order for some names.

IvanGoncharov added a commit to IvanGoncharov/graphql-js that referenced this issue Jan 15, 2021
IvanGoncharov added a commit to IvanGoncharov/graphql-js that referenced this issue Jan 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants