Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List.sort() very slow if list is already sorted or nearly so. #1141

Open
PureFox48 opened this issue Feb 25, 2023 · 27 comments
Open

List.sort() very slow if list is already sorted or nearly so. #1141

PureFox48 opened this issue Feb 25, 2023 · 27 comments

Comments

@PureFox48
Copy link
Contributor

PureFox48 commented Feb 25, 2023

I noticed recently that List.sort(), which is based on the Quicksort algorithm, is very slow indeed when applied to lists which are already sorted or nearly so.

For example, this code takes around 5.7 seconds to run on my Core i7 machine:

var list = (1..10000).toList
list.sort()

I tracked down the problem to the choice of pivot in the private partition_ method in wren_core.wren:

partition_(low, high, comparer) {
    var p = this[high]
    // blah
}

If instead of always using the high value we use the mid value:

partition_(low, high, comparer) {
    var mid = ((low + high)/2).floor
    this.swap(mid, high)
    var p = this[high]
    // blah
}

then the code now runs in around 0.014 seconds. Moreover, the effect of adding these two lines has negligible effect on the time taken to sort unsorted lists.

The implementation also has a problem when the list contains repeated elements. For example this code takes around 2.8 seconds to run whatever the pivot choice:

var list = [1] * 10000
list.sort()

However, I can't think of a simple way to solve this particular problem which probably occurs much less often than the 'already sorted' one in practice. If anyone has an idea on this, please post it.

These, of course, are known issues with Quicksort generally which is also very slow when sorting small lists. The latter could be solved by using an insertion sort for lists under 10 elements say but, frankly, I doubt whether it's worth the effort in a simple scripting language such as Wren.

@ruby0x1
Copy link
Member

ruby0x1 commented Feb 25, 2023

For the last snippet I was wondering. I'm assuming so but did you measure only the sort or both lines? If not what does this version measure at?

var list = List.filled(10000, 1)
list.sort()

@ruby0x1
Copy link
Member

ruby0x1 commented Feb 25, 2023

Thanks for digging btw, we can probably add some tests around this kinda thing to give us a baseline to compare with and validate against.

@PureFox48
Copy link
Contributor Author

I just measured the sort.

The time to create the list is virtually the same at 0.0183 seconds whether one uses: List.filled or the List.* operator so possibly they're using the same code under the hood.

Incidentally, the Wikipedia article on Quicksort suggests that using a 'median of three' may be better than a simple midpoint. It was actually a shade slower in my own tests possibly because the cost of interpreting the extra code outweighed any efficiency gained.

@PureFox48
Copy link
Contributor Author

PureFox48 commented Feb 25, 2023

Oops, sorry, I messed up the timings for the list creation.

List.* takes 0.00092 seconds but List.filled only takes around 1 millionth of a second. However, either way, it's still insignificant compared to the sort time of 2.8 seconds.

@mhermier
Copy link
Contributor

mhermier commented Feb 25, 2023 via email

@ruby0x1
Copy link
Member

ruby0x1 commented Feb 25, 2023

here's one I had in my libs laying around

    static bubble_sort(list, compare) {
      if(list.count == 0) return
      var done = false
      while(!done) {
        done = true
        for(i in 0 ... list.count-1) {
          var comp = compare.call(list[i], list[i+1])
          if(comp > 0) {
            var tmp = list[i]
            list[i] = list[i+1]
            list[i+1] = tmp
            done = false
          }
        }
      }
    } //bubble_sort

@PureFox48
Copy link
Contributor Author

PureFox48 commented Feb 25, 2023

Using the following optimized implementation of an ascending bubble sort:

var bubbleSort = Fn.new { |a|
    var n = a.count
    while (true) {
        var n2 = 0
        for (i in 1...n) {
            if (a[i - 1] > a[i]) {
                a.swap(i, i - 1)
                n2 = i
            }
        }
        n = n2
        if (n == 0) break
    }
}

the results are a bit variable but it's typically taking around 0.0008 seconds for both the 'already sorted' and 'all the same' cases.

@ruby0x1's version is slightly slower than this.

As expected, this is much quicker than Quicksort even with the modified pivot but, of course, it would be a completely different story if the list contained random values.

@ruby0x1
Copy link
Member

ruby0x1 commented Feb 25, 2023

one has a predicate so not a direct comparison (and not that it matters grand scheme, interesting none the less!)

@PureFox48
Copy link
Contributor Author

After reading through the salient parts of the Wikipedia article, I wonder whether the best solution here would be to change the partition scheme from Lomuto (which we seem to have at present) to Hoare?

On the face of it, this would give reasonable (albeit sub-optimal) performance for the 'already sorted' and 'all the same' cases and should be no slower for unsorted data. The interface of List.sort would be the same and so this change would be invisible to users.

The alternative would be to stick with what we have, changing the pivot choice to midpoint (or median of three) to give reasonable performance for already sorted lists and just accept the hit on cases where a lot of the data is the same on the grounds that this shouldn't occur very often in practice. We could perhaps put a warning in the docs about this so that knowledgeable users could use an alternative algorithm for this kind of scenario.

@ruby0x1
Copy link
Member

ruby0x1 commented Feb 25, 2023

If the Hoare version works better in all cases @PureFox48, then it makes sense to consider it. A PR would be the ideal path then, if you're up for it.

@PureFox48
Copy link
Contributor Author

OK, when I have a clearer head, I'll try and do a Hoare version and see how it compares.

@mhermier
Copy link
Contributor

mhermier commented Feb 26, 2023 via email

@mhermier
Copy link
Contributor

mhermier commented Feb 26, 2023 via email

@PureFox48
Copy link
Contributor Author

Both bubble sort and insertion sort are hopelessly slow when sorting random data even for lists as small as 100 elements.

Check out the Wren results for this Rosetta code task, comparing the performance of various sorting methods, which I did some time ago.

To keep things simple, I think we must stick with quicksort for the List.sort method which is easily the fastest algorithm for random data of non-trivial size and should (with Hoare partitioning which I'm working on just now) still give reasonable results even when the data is already sorted or all the same.

@PureFox48
Copy link
Contributor Author

OK, I've done the Hoare version and it seems to be working OK.

To make it easier to play about with them, I've put it (and also the modified Lomuto code) in separate classes for now.

import "random" for Random

// Lomuto partitioning (as per List.sort()) but with midpoint pivot.
class List2 {
  static sort(a) { sort(a) {|low, high| low < high } }

  static sort(a, comparer) {
    if (!(comparer is Fn)) {
      Fiber.abort("Comparer must be a function.")
    }
    quicksort_(a, 0, a.count - 1, comparer)
    return a
  }

  static quicksort_(a, low, high, comparer) {
    if (low < high) {
      var p = partition_(a, low, high, comparer)
      quicksort_(a, low, p - 1, comparer)
      quicksort_(a, p + 1, high, comparer)
    }
  }

  static partition_(a, low, high, comparer) {
    var mid = ((low + high)/2).floor // added this line
    a.swap(mid, high)                // ditto
    var p = a[high]
    var i = low - 1
    for (j in low..(high-1)) {
      if (comparer.call(a[j], p)) {
        i = i + 1
        var t = a[i]
        a[i] = a[j]
        a[j] = t
      }
    }
    var t = a[i+1]
    a[i+1] = a[high]
    a[high] = t
    return i+1
  }
}

// Hoare partitioning as per Wikpedia article.
class List3 {
  static sort(a) { sort(a) {|low, high| low < high } }

  static sort(a, comparer) {
    if (!(comparer is Fn)) {
      Fiber.abort("Comparer must be a function.")
    }
    quicksort_(a, 0, a.count - 1, comparer)
    return a
  }

  static quicksort_(a, low, high, comparer) {
    if (low >= 0 && high >= 0 && low < high) {
      var p = partition_(a, low, high, comparer)
      quicksort_(a, low, p, comparer)
      quicksort_(a, p+1, high, comparer)
    }
  }

  static partition_(a, low, high, comparer) {
    var mid = ((low + high)/2).floor
    var p = a[mid]
    var i = low - 1
    var j = high + 1
    while (true) {
      while (true) {
        i = i + 1
        if (!comparer.call(a[i], p)) break
      }
      while (true) {
        j = j - 1
        if (!comparer.call(p, a[j])) break
      }
      if (i >= j) return j
      a.swap(i, j)
    }
  }
}

var n = 20
var sorted = (1..n).toList
var same = List.filled(n, 1)
var rand = Random.new()
var unsorted = sorted.toList
rand.shuffle(unsorted)

System.print("Tests for List3.sort (Hoare):\n")

System.print(sorted)
List3.sort(sorted)
System.print(sorted)
System.print()

System.print(same)
List3.sort(same)
System.print(same)
System.print()

var unsorted2 = unsorted.toList // make a copy
System.print(unsorted)
List3.sort(unsorted) { |a, b| a < b }
System.print(unsorted)
System.print()

System.print(unsorted2)
List3.sort(unsorted2) { |a, b| a > b } // descending sort
System.print(unsorted2)

/* output:
Tests for List3.sort (Hoare):

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

[8, 12, 14, 19, 15, 1, 20, 11, 3, 6, 2, 16, 13, 18, 5, 10, 9, 4, 17, 7]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

[8, 12, 14, 19, 15, 1, 20, 11, 3, 6, 2, 16, 13, 18, 5, 10, 9, 4, 17, 7]
[20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
*/

I'm doing some comparative benchmarks and will post those shortly.

@PureFox48
Copy link
Contributor Author

PureFox48 commented Feb 26, 2023

OK, here's the performance figures for the existing List.sort method (Lomuto) with the midpoint pivot:

Benchmarks for List2.sort (modified Lomuto):

Running 'Sorted data size 10' over 10 iteration(s):

Best 0.004 ms
Mean 0.004 ms
Worst 0.007 ms

Running 'Same data size 10' over 10 iteration(s):

Best 0.005 ms
Mean 0.007 ms
Worst 0.019 ms

Running 'Random data size 10' over 10 iteration(s):

Best 0.004 ms
Mean 0.008 ms
Worst 0.024 ms

Running 'Sorted data size 100' over 10 iteration(s):

Best 0.060 ms
Mean 0.064 ms
Worst 0.096 ms

Running 'Same data size 100' over 10 iteration(s):

Best 0.304 ms
Mean 0.309 ms
Worst 0.340 ms

Running 'Random data size 100' over 10 iteration(s):

Best 0.076 ms
Mean 0.080 ms
Worst 0.110 ms

Running 'Sorted data size 1000' over 10 iteration(s):

Best 0.840 ms
Mean 0.850 ms
Worst 0.908 ms

Running 'Same data size 1000' over 10 iteration(s):

Best 27.987 ms
Mean 28.121 ms
Worst 28.585 ms

Running 'Random data size 1000' over 10 iteration(s):

Best 1.144 ms
Mean 1.159 ms
Worst 1.208 ms

Running 'Sorted data size 10000' over 10 iteration(s):

Best 11.382 ms
Mean 11.455 ms
Worst 11.632 ms

Running 'Same data size 10000' over 10 iteration(s):

Best 2773.198 ms
Mean 2777.167 ms
Worst 2792.362 ms

Running 'Random data size 10000' over 10 iteration(s):

Best 16.246 ms
Mean 16.354 ms
Worst 16.724 ms

@PureFox48
Copy link
Contributor Author

PureFox48 commented Feb 26, 2023

...and here's the performance figures for the Hoare partitioning version:

Benchmarks for List3.sort (Hoare):

Running 'Sorted data size 10' over 10 iteration(s):

Best 0.005 ms
Mean 0.005 ms
Worst 0.007 ms

Running 'Same data size 10' over 10 iteration(s):

Best 0.005 ms
Mean 0.006 ms
Worst 0.007 ms

Running 'Random data size 10' over 10 iteration(s):

Best 0.005 ms
Mean 0.006 ms
Worst 0.007 ms

Running 'Sorted data size 100' over 10 iteration(s):

Best 0.065 ms
Mean 0.067 ms
Worst 0.068 ms

Running 'Same data size 100' over 10 iteration(s):

Best 0.080 ms
Mean 0.081 ms
Worst 0.083 ms

Running 'Random data size 100' over 10 iteration(s):

Best 0.084 ms
Mean 0.085 ms
Worst 0.086 ms

Running 'Sorted data size 1000' over 10 iteration(s):

Best 0.839 ms
Mean 0.846 ms
Worst 0.901 ms

Running 'Same data size 1000' over 10 iteration(s):

Best 1.080 ms
Mean 1.100 ms
Worst 1.167 ms

Running 'Random data size 1000' over 10 iteration(s):

Best 1.183 ms
Mean 1.217 ms
Worst 1.303 ms

Running 'Sorted data size 10000' over 10 iteration(s):

Best 10.046 ms
Mean 10.114 ms
Worst 10.231 ms

Running 'Same data size 10000' over 10 iteration(s):

Best 13.034 ms
Mean 13.129 ms
Worst 13.248 ms

Running 'Random data size 10000' over 10 iteration(s):

Best 14.275 ms
Mean 14.372 ms
Worst 14.620 m

@PureFox48
Copy link
Contributor Author

PureFox48 commented Feb 26, 2023

For good measure, here are some figures for lists containing 100,000 and 1,000,000 random elements.

Benchmarks for existing list.sort (Lomuto):

Running 'Random data size 100000' over 10 iteration(s):

Best 175.197 ms
Mean 180.517 ms
Worst 184.312 ms

Running 'Random data size 1000000' over 10 iteration(s):

Best 2273.706 ms
Mean 2315.149 ms
Worst 2338.914 ms

Benchmarks for List2.sort (modified Lomuto):

Running 'Random data size 100000' over 10 iteration(s):

Best 188.390 ms
Mean 190.232 ms
Worst 192.531 ms

Running 'Random data size 1000000' over 10 iteration(s):

Best 2380.935 ms
Mean 2383.646 ms
Worst 2392.719 ms

Benchmarks for List3.sort (Hoare):

Running 'Random data size 100000' over 10 iteration(s):

Best 174.246 ms
Mean 174.686 ms
Worst 175.152 ms

Running 'Random data size 1000000' over 10 iteration(s):

Best 2028.855 ms
Mean 2031.921 ms
Worst 2036.394 ms

This demonstrates just how fast the quicksort algorithm is, even for large amounts of data and using an interpreted language such as Wren.

@PureFox48
Copy link
Contributor Author

Whilst we're at it, I think we should also add ordering operators to the String class so that (amongst other things) we can easily sort strings lexicographically by codepoint. This code should do it:

class String is Sequence {
  /* existing stuff */
  
  // Compares this to another string lexicographically by codepoint.
  // Returns -1, 0 or +1 depending on whether
  // this < other, this == other or this > other respectively.
  compareTo(other) {
     if (!(other is String)) Fiber.abort("Other must be a string")
     if (this == other) return 0
     var cp1 = this.codePoints
     var cp2 = other.codePoints
     var len = (cp1.count <= cp2.count) ? cp1.count : cp2.count
     for (i in 0...len) {
        if (cp1[i] < cp2[i]) return -1
        if (cp1[i] > cp2[i]) return 1
     }
     return (cp1.count < cp2.count) ? -1 : 1
  }

  // Comparison operators.
  <  (other) { compareTo(other) <  0 }
  <= (other) { compareTo(other) <= 0 }
  >  (other) { compareTo(other) >  0 }
  >= (other) { compareTo(other) >= 0 } 
}

@mhermier
Copy link
Contributor

mhermier commented Feb 26, 2023 via email

@PureFox48
Copy link
Contributor Author

Done. See #1142.

@ruby0x1
Copy link
Member

ruby0x1 commented Feb 26, 2023

Thanks for the detailed posts @PureFox48, it's always appreciated ✨

@mhermier
Copy link
Contributor

For reference, can you benchmark the algorithm provided by List, and the Hoare applied to it (It should optimize a little bit more, because functional is a little bit sub-optimal in wren).

@PureFox48
Copy link
Contributor Author

The only one we haven't benchmarked so far is the existing list.sort() method which is based on Lomuto partitioning but with the high value pivot. Consequently, it's very slow when the data is already sorted or is the same.The figures for that are given below:

Benchmarks for existing list.sort (Lomuto):

Running 'Sorted data size 10' over 10 iteration(s):

Best 0.007 ms
Mean 0.008 ms
Worst 0.012 ms

Running 'Same data size 10' over 10 iteration(s):

Best 0.005 ms
Mean 0.005 ms
Worst 0.006 ms

Running 'Random data size 10' over 10 iteration(s):

Best 0.003 ms
Mean 0.004 ms
Worst 0.004 ms

Running 'Sorted data size 100' over 10 iteration(s):

Best 0.578 ms
Mean 0.583 ms
Worst 0.595 ms

Running 'Same data size 100' over 10 iteration(s):

Best 0.296 ms
Mean 0.300 ms
Worst 0.304 ms

Running 'Random data size 100' over 10 iteration(s):

Best 0.069 ms
Mean 0.070 ms
Worst 0.070 ms

Running 'Sorted data size 1000' over 10 iteration(s):

Best 53.978 ms
Mean 55.671 ms
Worst 57.360 ms

Running 'Same data size 1000' over 10 iteration(s):

Best 27.065 ms
Mean 27.468 ms
Worst 28.218 ms

Running 'Random data size 1000' over 10 iteration(s):

Best 1.151 ms
Mean 1.161 ms
Worst 1.213 ms

Running 'Sorted data size 10000' over 10 iteration(s):

Best 5471.917 ms
Mean 5604.434 ms
Worst 5644.828 ms

Running 'Same data size 10000' over 10 iteration(s):

Best 2754.123 ms
Mean 2757.615 ms
Worst 2764.081 ms

Running 'Random data size 10000' over 10 iteration(s):

Best 14.600 ms
Mean 14.767 ms
Worst 15.056 ms

List2.sort differs from the above in that it uses a mid-value pivot to remedy the 'already sorted' problem.

List3.sort uses Hoare partitoning to remedy both the 'already sorted' and 'are the same' problems.

To ensure all 3 methods use exactly the same data, I've updated the figures for the other two in my earlier posts.

My overall conclusion is that we should change to the Hoare method which gives reasonable results across the board and seems a shade faster than Lomuto for large lists.

PureFox48 added a commit to PureFox48/wren that referenced this issue Mar 3, 2023
wren-lang#1141 identified two problems with the existing implementation, namely that it was very slow when presented with lists which were either already sorted or all the same.

The purpose of this PR is to solve those problems by changing from Lomuto to Hoare partitioning. As shown by the figures in the above issue, this should not lead to performance degradation for small 'random' lists and may even be a little quicker for large lists.

This change would be invisible to users as only private methods are affected and does not require any changes to the documentation or tests.
PureFox48 added a commit to PureFox48/wren that referenced this issue Mar 3, 2023
wren-lang#1141 identified two problems with the existing implementation, namely that is was very slow when presented with lists which were already sorted or all the same.

The purpose of this PR is to fix these problems by switching from Lomuto to Hoare partitioning. As far as 'random' lists are concerned, the figures in the above issue indicate that there will be no noticeable change in performance for small lists and possibly a small improvement for large lists.

This change will be invisible to users as only two private methods are affected. No change is required to either the documentation or tests.
@PureFox48
Copy link
Contributor Author

As it seems clear from these figures that a change to Hoare partitioning will be beneficial, I've now opened a PR for it.

@mhermier
Copy link
Contributor

mhermier commented Mar 8, 2023 via email

@PureFox48
Copy link
Contributor Author

The only time when high is less than 0 is when the list is empty and it's then equal to -1. However, low is then 0 and the condition low < high then fails and the method just returns without doing anything.

So, yes, we can get rid of the low >= 0 && high >= 0 part and I'll change the code in the PR when I have time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants