Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SortedList type #1794

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

SortedList type #1794

wants to merge 2 commits into from

Conversation

mcclure
Copy link
Contributor

@mcclure mcclure commented Oct 17, 2020

This adds a new data type to immutable-js named SortedList, a Collection.Set subtype. Addresses #1791. My description in the docs:

SortedLists are ordered dense collections that are kept in a sort order. Random access is not available, only insert in sort order, iteration, and access or removal at either end. SortedLists can thus be used as a priority queue.

SortedLists are immutable and fully persistent with O(1) reads from either end, O(N) iteration, and insert and removal (from either end) between O(log16 N) and O(log32 N).

Sorting is managed by two optional callbacks provided at list creation, a "key function" and a "less-than function". When comparing two values, the key function is called on both values and the result is passed to the less-than function. By default the key function returns the value, and less-than uses normal JS <. This means you have two ways to customize sort order. You can use the key function to reduce your values to a sortable value (for example, if your values are objects, your key function could return an "id" member), or you can implement less-than.

This patch is usable in practice, but is not ready to be merged. add(), first(), last(), shift(), pop() and ES6 iterate all work but other functions either are absent or retain the List.js implementation and will error.

SortedList works using a tree of list-of-lists, like List.js, but because insertion is at arbitrary location by design, instead of all nodes being kept full the list-max is treated as a maximum and if a node tries to exceed the maximum it splits into two halves. Because the node list length is variable, each node tracks the minimum and maximum value in its branch of the tree to make navigation possible. Pointers are maintained to the least and greatest leaf nodes for constant first/last value access.

For my own amusement, and because I have had trouble getting the immutable-js automated tests to run, I wrote a standalone Preact app for testing and debugging SortedList:

Screen Shot 2020-10-17 at 1 39 27 PM

To use this, check out both the sortedlist-pr branch and the test app, run npm run build in the sortedlist-pr dir, and run npm link ../path/to/sortedlist in the test app dir.

To test SortedList before submitting this PR, I temporarily reduced the node list max size to 4 and used this app to run 40,000 random operations on a list while verifying invariants (list is in sorted order; node min and max memos are accurate; head and tail point to actual head and tail) at each step.

Here are the issues that I think need to be resolved before accepting this PR:

  • “Real” automated tests / perf tests
  • Implement all Collection.Set methods
  • I notice there's a Flow type file, but I didn't add SortedList to it
  • I haven’t written anything like proofs that my O() claims above hold. I don’t know if you want something more concrete than “well I thought about it real carefully, and I’m pretty sure”.

Here are some things I think would be “nice to have” when including this in a release:

  • I would like a function for removing values at arbitrary positions by value. (This would probably break the O() claims a little because it would be hard to implement it in a way that doesn’t result in unnecessary depth levels in edge cases.)
  • I think adding a list of N already-sorted numbers could potentially be in-practice more efficient than calling add() N times.
  • “Mutable” currently doesn’t work. I think it would be cool if setting “mutable” on SortedList caused it to go into a batching mode where added elements are queued, and when they are needed are sorted and mass-added only then. (I think this would be superior to actually mutating the tree, which would be ungainly considering how complex the add/push/shift operations turned out to be.)
  • I would like some “range” iterators, for example “give me a Seq of all values between key X and key Y” or “give me the 10 keys greater than or equal to key X”.
  • I think it would be interesting to have a “naive map” implementation which assumes the map function is order-preserving and does not re-sort.

Here are the things I think I need help with:

  • What is __hash?
  • What is @@transducer? How do I implement it? (The immutable-js-oss folks are helping me with this.)
  • What are __iterator and __iterate? What protocol do implementations of these two functions follow? (I think I understand __iterator but not whether “type 2” is mandatory, and I don’t understand what __iterate does.)
  • How do I run the automated tests? How do I run the perf tests? (The immutable-js-oss folks are helping me with this.)
  • What is the correct list max size? I used 32 since that’s what List.js uses, but maybe the performance characteristics of SortedList are different from List. (Being able to run perf tests would help with this.)
  • In general, what is the submission and contributor crediting process for this repo? I have not signed the Facebook CLA, but I am confused why that CLA is needed since my understanding is @lee now controls this repo fully. Because this is a relatively major new feature, before this is merged I would like to ask we put some kind of process in place for contributors to be credited within the repo (rather than merely in git histories).

I will also be submitting a version of this to immutable-js-oss. My working branch with non-squashed commits is here.

Supports:
Constructor with custom key and less-than functions
add(), shift() and pop() between O(log16 N) and O(log32 N)
ES6 iteration at O(N)
first() and last() at O(1)

Many other functions are either absent or will crash when run. See my pull request for more information.
@jdeniau
Copy link
Member

jdeniau commented Jul 20, 2021

I may not be the best person for now to help, but I will try :

__hash

__hash: I think that you can watch Lee's conference about immutable, probably at that timecode: https://youtu.be/I7IdS-PbEgI?t=442 where it explain how to get data from a hash.

transducer

the @@transducer think : this article seems nice to understand the principle. You can see #1203 which initiated the code, and its reference : https://github.com/cognitect-labs/transducers-js#transformer-protocol. Regarding the implementation, it seems quite simple once you understand the principle.

iterator

__iterator and __iterate seems to be internal method to iterate or to get an iterator on a collection, __iterate in particular seems to execute the function passed as argument.

testing

automated tests should be easier now with the oss fork merged, you should be able to run with npm install first, and then npm run test, or npm run testonly -- --watch for unit test only. About the perf tests, I'm not really sure for now. Maybe @Methuselah96 can help as he put some effort IIRC.

list size

correct list size : according to the conference, I would guess 32 too, but that's really just guessing

license and attribution

immutable-js has now been splitted from facebook and is now under MIT license. There is no facebook CLA to sign anymore.
See #1814 for more.

About the attribution, there is no special attribution for now, but I think you are right that it's something that we should discuss.
There is some initiatives, like https://github.com/all-contributors/all-contributors for example, but it's a global think in the readme file, there is no special attribution for a specific feature that I know.

I add your PR to the 4.1 milestone, as it is not required in 4.0, but it can be in here if your PR is ready to merge

naming

One last think that troubles me : the name SortedList seems to me that the List is sorted, ie: SortedList(['a', 'd', 'b']).push('c') would be stored (or rendered) as ['a', 'b', 'c', 'd'], but this is not really explained in the documentation (the .d.ts file).

You mentioned that only operation can be done at the start or the end of the queue, which makes me thinks that it's more a Queue than a List. I will probably understand better once the unit tests are there.

@jdeniau jdeniau added this to the 4.1 milestone Jul 20, 2021
@jdeniau jdeniau modified the milestones: 4.1, 5.0, 5.1 Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants