Skip to content
This repository has been archived by the owner on Jul 23, 2021. It is now read-only.

SortedList type #183

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

mcclure
Copy link

@mcclure mcclure commented Oct 17, 2020

This adds a new data type to immutable-js named SortedList, a Collection.Set subtype. Addresses #181. My description in the docs:

SortedLists are ordered dense collections that are kept in a sort order. Random access is not available, only insert in sort order, iteration, and access or removal at either end. SortedLists can thus be used as a priority queue.

SortedLists are immutable and fully persistent with O(1) reads from either end, O(N) iteration, and insert and removal (from either end) between O(log16 N) and O(log32 N).

Sorting is managed by two optional callbacks provided at list creation, a "key function" and a "less-than function". When comparing two values, the key function is called on both values and the result is passed to the less-than function. By default the key function returns the value, and less-than uses normal JS <. This means you have two ways to customize sort order. You can use the key function to reduce your values to a sortable value (for example, if your values are objects, your key function could return an "id" member), or you can implement less-than.

This patch is usable in practice, but is not ready to be merged. add(), first(), last(), shift(), pop() and ES6 iterate all work but other functions either are absent or retain the List.js implementation and will error.

SortedList works using a tree of list-of-lists, like List.js, but because insertion is at arbitrary location by design, instead of all nodes being kept full the list-max is treated as a maximum and if a node tries to exceed the maximum it splits into two halves. Because the node list length is variable, each node tracks the minimum and maximum value in its branch of the tree to make navigation possible. Pointers are maintained to the least and greatest leaf nodes for constant first/last value access.

For my own amusement, and because I have had trouble getting the immutable-js automated tests to run, I wrote a standalone Preact app for testing and debugging SortedList:

Screen Shot 2020-10-17 at 1 39 27 PM

To use this, check out both the sortedlist-pr branch and the test app, run npm run build in the sortedlist-pr dir, and run npm link ../path/to/sortedlist in the test app dir.

To test SortedList before submitting this PR, I temporarily reduced the node list max size to 4 and used this app to run 40,000 random operations on a list while verifying invariants (list is in sorted order; node min and max memos are accurate; head and tail point to actual head and tail) at each step. (This test was on immutable.js mainline; I also tested 20,000 operations on the -oss branch with the standard node list size of 32. See note at end.)

Here are the issues that I think need to be resolved before accepting this PR:

  • “Real” automated tests / perf tests
  • Implement all Collection.Set methods
  • I haven’t written anything like proofs that my O() claims above hold. I don’t know if you want something more concrete than “well I thought about it real carefully, and I’m pretty sure”.

Here are some things I think would be “nice to have” when including this in a release:

  • I would like a function for removing values at arbitrary positions by value. (This would probably break the O() claims a little because it would be hard to implement it in a way that doesn’t result in unnecessary depth levels in edge cases.)
  • I think adding a list of N already-sorted numbers could potentially be in-practice more efficient than calling add() N times.
  • “Mutable” currently doesn’t work. I think it would be cool if setting “mutable” on SortedList caused it to go into a batching mode where added elements are queued, and when they are needed are sorted and mass-added only then. (I think this would be superior to actually mutating the tree, which would be ungainly considering how complex the add/push/shift operations turned out to be.)
  • I would like some “range” iterators, for example “give me a Seq of all values between key X and key Y” or “give me the 10 keys greater than or equal to key X”.
  • I think it would be interesting to have a “naive map” implementation which assumes the map function is order-preserving and does not re-sort.

Here are the things I think I need help with:

  • What is __hash?
  • What is @@transducer? How do I implement it? (Conrad is helping me with this in Slack.)
  • What are __iterator and __iterate? What protocol do implementations of these two functions follow? (I think I understand __iterator but not whether “type 2” is mandatory, and I don’t understand what __iterate does.)
  • How do I run the automated tests? How do I run the perf tests? (I am already receiving some help with this in Slack.)
  • What is the correct list max size? I used 32 since that’s what List.js uses, but maybe the performance characteristics of SortedList are different from List. (Being able to run perf tests would help with this.)
  • Before this is merged I would like clarity on what the CLA and contributor-crediting policies are for this new -oss project.

Finally, I have one problems which is unique to the -oss repo, and which I don't know how to fix:

  • I got this error when running yarn build:

      [15:03:17] Error: Unknown type kind: NullKeyword
          at parseType (/Users/mcc/work/gh/other/immutable-js/pages/lib/genTypeDefData.js:421:11)
          at parseParam (/Users/mcc/work/gh/other/immutable-js/pages/lib/genTypeDefData.js:427:13)
          at /Users/mcc/work/gh/other/immutable-js/pages/lib/genTypeDefData.js:260:55
          at Array.map (<anonymous>)
          at parseCallSignature (/Users/mcc/work/gh/other/immutable-js/pages/lib/genTypeDefData.js:260:46)
          at visitFunctionDeclaration (/Users/mcc/work/gh/other/immutable-js/pages/lib/genTypeDefData.js:119:27)
          at visit (/Users/mcc/work/gh/other/immutable-js/pages/lib/genTypeDefData.js:38:16)
          at visitNodes (/Users/mcc/work/gh/other/immutable-js/node_modules/typescript/lib/typescript.js:15129:30)
          at Object.forEachChild (/Users/mcc/work/gh/other/immutable-js/node_modules/typescript/lib/typescript.js:15353:24)
          at visit (/Users/mcc/work/gh/other/immutable-js/pages/lib/genTypeDefData.js:48:19)
      [15:03:17] 'default' errored after 360 ms
    

    I don't get this building mainline immutable-js with npm. I don't get this building your normal master with yarn. Despite the error, I was still able to run my test app by, after running yarn build, deleting node_modules and then running npm link in the test app directory.

    I think this error might be because I typed a parameter in a function overload in the d.ts in my patch as the literal "null". If your tool thinks that's an error, your tool is bugged. That's legal TypeScript.

I have also submitted a version of this to immutable-js. My working branch with non-squashed commits is here.

Supports:
Constructor with custom key and less-than functions
add(), shift() and pop() between O(log16 N) and O(log32 N)
ES6 iteration at O(N)
first() and last() at O(1)

Many other functions are either absent or will crash when run. See my pull request for more information.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant