Partitioning of Data into OrbitDB databases as a means of providing useful metaphors #993

Open
CSDUMMI opened this issue Apr 28, 2022 · 2 comments

@CSDUMMI
Contributor

CSDUMMI commented Apr 28, 2022

In #975 I described the notion that OrbitDB stores should be seen not as equivalents of an SQL table but rather as an individual row in an SQL database or an individual document in a NoSQL database.

Here I want to elaborate on this idea and provide a more apt metaphor to use in relation to OrbitDB stores.

OrbitDB stores are all based on the operations log: an immutable, append-only list of "operations" which, when consumed by a state machine, produces the same state regardless of which peer executes the state machine.

The oplog and its state machine thus form another version of the Turing Machine: the operations log is the tape, the peer is the scanner, and the m-configurations of the Turing Machine are all the states the peer can assume while parsing the oplog. It can therefore be used to execute any algorithm that a Turing Machine can execute.
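To make the oplog-plus-state-machine view concrete, here is a minimal sketch of two peers replaying the same entries. It assumes a simplified entry shape and a simplified ordering rule (Lamport time first, then writer id as a tie-breaker); `compareEntries`, `applyEntry` and `replay` are hypothetical names, not OrbitDB API, and OrbitDB's real sorting differs in detail.

```javascript
// Hypothetical oplog entry shape: { clock: { time, id }, payload: { op, key, value } }.
// Simplification: order entries by Lamport time, breaking ties by writer id.
// This is not OrbitDB's exact sorting function.
function compareEntries(x, y) {
  if (x.clock.time !== y.clock.time) return x.clock.time - y.clock.time;
  if (x.clock.id < y.clock.id) return -1;
  if (x.clock.id > y.clock.id) return 1;
  return 0;
}

// Pure state transition: the same state and entry always yield the same next state.
function applyEntry(state, entry) {
  const { op, key, value } = entry.payload;
  if (op === "PUT") return { ...state, [key]: value };
  if (op === "DEL") {
    const { [key]: _removed, ...rest } = state;
    return rest;
  }
  return state; // unknown operations leave the state unchanged
}

// Replaying the log is a fold of the transition function over the ordered entries.
function replay(entries) {
  return [...entries].sort(compareEntries).reduce(applyEntry, {});
}

const e1 = { clock: { time: 1, id: "alice" }, payload: { op: "PUT", key: "x", value: 1 } };
const e2 = { clock: { time: 2, id: "bob" }, payload: { op: "PUT", key: "x", value: 2 } };
const e3 = { clock: { time: 2, id: "alice" }, payload: { op: "DEL", key: "x" } };

// Two peers that received the same entries in different orders.
const peerA = replay([e1, e2, e3]);
const peerB = replay([e3, e1, e2]);
console.log(peerA, peerB); // { x: 2 } { x: 2 }
```

Because the transition function is pure and the ordering is total, the order in which a peer happened to receive the entries does not affect the final state.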

Both the operations log with its finite-state machine and the Turing Machine are plausible ways of looking at an OrbitDB Store. But I want to ask: What is the most useful way of thinking about OrbitDB Stores? What is the most apt metaphor for an OrbitDB Store?

I have already suggested two possible metaphors, the Turing Machine and the finite-state machine, but I would like to offer some further metaphors that are commonly used in computing:

  • Files
  • Objects
  • Tables in SQL Databases
  • Ports
  • Addresses
  • Streams
  • Flags
  • Trees
  • Graphs
  • Buffers
  • Heaps
  • and many more.

All of these use imagery to explain the underlying technology, imagery that has nothing to do with how the technology is actually implemented beyond sharing some metaphorical properties. Yet these images are so easy to grasp that beginners and experts alike use them as a common language instead of describing the underlying technology directly, mostly because that underlying technology is rarely helpful when building on top of it, and is often itself another metaphor or layer of abstraction. If you wanted to describe computing without using any metaphors, you couldn't even use bits and bytes; you would have to talk about voltages and quantum physics.

When you work with a Tree, is it easier to think about tree nodes, branches and leaves, or about the heap, pointers and the single continuous space of memory allotted to your program by the operating system (another metaphor)? Anyone who wants to reason about what happens to the data structure, rather than about the details of how it is represented inside another data structure, would clearly choose the first option.

I'd argue the same holds true for OrbitDB. Thinking about the underlying oplog may be closer to what actually happens inside an OrbitDB store, but it is like thinking about the heap, pointers and structs when working with a Tree, or with any other data structure for that matter.

We can use several existing metaphors for OrbitDB. But determining which metaphors are useful and which are not is a matter of reasoned debate and experience. Importantly, more than one metaphor may ultimately be adopted: just look at the functional and procedural paradigms, two diverging schools of programming based on two different metaphors for what programs are.

With that out of the way I want to detail how the Object metaphor might be used to think about a store.

A Store as an Object

We can borrow from the Object Oriented paradigm and declare each store to represent a single object with some properties, values and methods.

The values should not require explanation: they are much like today's keyvalue store, except that the values could be made typed.

All values should be private to the object's methods, and the developer has to implement getters and setters that also perform validation.
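As a minimal sketch, assuming JavaScript private class fields stand in for values that are "private to the object's methods", a store object with validating getters and setters might look like this. `Profile` and its validation rules are hypothetical.

```javascript
// Hypothetical store object: state lives in private class fields and can only be
// changed through setter methods that validate their arguments first.
class Profile {
  #name = "";
  #age = 0;

  getName() {
    return this.#name;
  }

  setName(name) {
    if (typeof name !== "string" || name.length === 0) {
      throw new TypeError("name must be a non-empty string");
    }
    this.#name = name;
  }

  getAge() {
    return this.#age;
  }

  setAge(age) {
    if (!Number.isInteger(age) || age < 0) {
      throw new RangeError("age must be a non-negative integer");
    }
    this.#age = age;
  }
}

const profile = new Profile();
profile.setName("alice");
profile.setAge(30);
try {
  profile.setAge(-1); // rejected by validation; the stored age is unchanged
} catch (err) {
  console.log(err.message); // age must be a non-negative integer
}
console.log(profile.getName(), profile.getAge()); // alice 30
```

Plain `setX` methods, rather than JavaScript `set` accessors, keep every write expressible as a method name plus an argument list.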

The problem with the Object model is that it does not offer a cohesive way to also include Access Controllers and the peer-to-peer properties of a store.

Oplog to Object Mapping

To parse the oplog into an Object, the payload of each oplog entry should be formatted as:

{
  "method": <setter method of the store to call>,
  "args": [<argument list to supply to the setter>]
}

The setters can thus be used both to create and to parse the oplog, without any further involvement from the developer.

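class Obj {
  // `init` can contain any information from the store's manifest.
  constructor(init) {
    this._a = 1;
    this._b = 2;
  }

  setA(a) {
    this._a = a;
  }

  setB(b) {
    this._b = b;
  }
}

// Proposed API (does not exist in OrbitDB today): create a store type backed
// by Obj whose oplog entries may only call the listed setters.
OrbitDB.Store.create_and_add(Obj, { setters: ["setB", "setA"] });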

This should create a store that supports two operations, "setA" and "setB", which result in calls to setA and setB on a newly constructed Obj instance every time the oplog is parsed. The constructor is passed an init object, which can contain any information from the store's manifest.
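The writing side of this mapping could be sketched with a `Proxy` that records each whitelisted setter call before running it. This assumes a local array stands in for the replicated oplog; `recordSetters` is a hypothetical helper, not part of OrbitDB.

```javascript
// Hypothetical sketch: wrap an object so that every call to a whitelisted setter
// is appended to a local array (standing in for the oplog) before it runs.
function recordSetters(target, setters, log) {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      if (typeof value !== "function" || !setters.includes(prop)) {
        return value; // plain properties and non-setter methods pass through
      }
      return (...args) => {
        log.push({ method: prop, args }); // payload format: { method, args }
        return value.apply(obj, args);
      };
    },
  });
}

class Obj {
  constructor(init) {
    // `init` would carry manifest information; unused in this sketch.
    this._a = 1;
    this._b = 2;
  }

  setA(a) {
    this._a = a;
  }

  setB(b) {
    this._b = b;
  }
}

const oplog = [];
const obj = recordSetters(new Obj({}), ["setA", "setB"], oplog);
obj.setA(245);
obj.setB(523);
console.log(JSON.stringify(oplog));
// [{"method":"setA","args":[245]},{"method":"setB","args":[523]}]
```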

A possible oplog for the Obj store might look like the following (showing only the payload field of each operation):

[
  {
    "method": "setA",
    "args": [245]
  },
  {
    "method": "setB",
    "args": [523]
  }
]

This results in an object with _a = 245 and _b = 523.
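The parsing side can be sketched in the same spirit: construct a fresh instance and apply each payload in log order. `replayInto` is a hypothetical helper, and the choice to skip (rather than reject) undeclared methods is an assumption.

```javascript
// Hypothetical parsing side of the mapping: construct a fresh instance and apply
// each { method, args } payload in log order. Payloads naming anything other than
// a declared setter are skipped rather than executed.
class Obj {
  constructor(init) {
    // `init` would carry information from the store's manifest; unused here.
    this._a = 1;
    this._b = 2;
  }

  setA(a) {
    this._a = a;
  }

  setB(b) {
    this._b = b;
  }
}

function replayInto(Cls, setters, init, payloads) {
  const instance = new Cls(init);
  for (const { method, args } of payloads) {
    if (!setters.includes(method) || typeof instance[method] !== "function") {
      continue; // not a declared setter: ignore this operation
    }
    instance[method](...args);
  }
  return instance;
}

const payloads = [
  { method: "setA", args: [245] },
  { method: "setB", args: [523] },
  { method: "constructor", args: [] }, // not a declared setter, so it is skipped
];
const result = replayInto(Obj, ["setB", "setA"], {}, payloads);
console.log(result._a, result._b); // 245 523
```

Skipping seems the safer default for a replicated log, since an entry that cannot be applied would otherwise stop every peer from rebuilding the object; throwing instead could be useful for local debugging.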

I'd like to ask anyone who has read through this entire issue not just to adopt my proposal but to try to develop alternative metaphors, ones that also explain access controllers and peer-to-peer behavior in a cohesive form.

Otherwise, I hope this issue offered a useful perspective on OrbitDB from a higher-level viewpoint.

@LucaPanofsky

It seems to me that the idea that we should see an OrbitDB Store as a row rather than a table depends more on the performance issue raised in #975 than on a truly convenient metaphor. Indeed, in my opinion this kind of metaphor might be good and useful at the domain level of an application, whereas it might turn out to be cumbersome when reasoning about the core of the library.

My understanding up to now is that the oplog is, somehow, the source of truth of our database, whereas the Stores provide us with a representation of the oplog that fits our needs. Indeed, Stores share the same data structure, from which you can nevertheless build different representations: a keyvalue store, a feed store, a more complex and custom store and so on. In my opinion this is very good, and it is one of the reasons I really like OrbitDB.
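A toy illustration of one log backing several representations, assuming each payload is just a key and a value; `toKeyValue` and `toFeed` are hypothetical and only loosely mirror what OrbitDB's keyvalue and feed indexes do.

```javascript
// Hypothetical shared log (payloads only). Both views below are derived from it.
const log = [
  { key: "alice", value: "hello" },
  { key: "bob", value: "hi" },
  { key: "alice", value: "hello again" },
];

// Key-value view: a later write to a key replaces the earlier value.
function toKeyValue(entries) {
  const index = new Map();
  for (const { key, value } of entries) index.set(key, value);
  return index;
}

// Feed view: every entry is kept, in log order.
function toFeed(entries) {
  return entries.map(({ key, value }) => `${key}: ${value}`);
}

const kv = toKeyValue(log);
const feed = toFeed(log);
console.log(kv.get("alice")); // hello again
console.log(feed.length); // 3
```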

Because the same oplog can back many representations, I also see the available stores as "examples". The point is that at the application level we will likely have to customize many aspects of the database, and OrbitDB makes that quite simple.
At the moment it is possible to create:

  • a custom store with a custom index
  • a custom keystore
  • a custom access controller
  • a custom identity provider

Finally, I am trying to build a social dapp on top of OrbitDB. I have an "origin" database which spawns other databases sharing the same access controller. The origin is a kind of supervisor, and this led me to think about stores as actors. However, this is specific to my app rather than being, so to speak, a general metaphor.

@CSDUMMI
Contributor Author

CSDUMMI commented May 15, 2022

"My understanding up to now is that the oplog is, somehow, the source of truth of our database, whereas the Stores provide us with a representation of the oplog that fits our needs. Indeed, Stores share the same data structure, from which you can nevertheless build different representations: a keyvalue store, a feed store, a more complex and custom store and so on. In my opinion this is very good, and it is one of the reasons I really like OrbitDB."

That is a pretty apt description of the OrbitDB Store. We may want to use it in the field manual.

"It seems to me that the idea that we should see an OrbitDB Store as a row rather than a table depends more on the performance issue raised in #975 than on a truly convenient metaphor"

This is a better way of using OrbitDB not just for performance reasons but also because of a fundamental design decision in OrbitDB's architecture.

OrbitDB databases are distributed. Devices with limited resources (such as mobile phones, low-connectivity devices and IoT devices) have to load OrbitDB stores to interact with OrbitDB. Thus OrbitDB stores have to be small and limited to a single purpose: you can't load an entire users database just to look up a single user entry, so you need a separate store for each user.
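The cost argument can be illustrated with a small in-memory model, assuming that reading a store means loading its whole log. The store names and entry counts below are made up for illustration and are not OrbitDB addresses or measurements.

```javascript
// Hypothetical in-memory model: a "store" is just an array of entries held in a
// Map under its name. Names such as "user/bob" are illustrative only, and the
// entry counts are arbitrary.
const userIds = ["alice", "bob", "carol"];
const ENTRIES_PER_USER = 100;

const singleStore = new Map([["users", []]]); // layout 1: one store for all users
const perUserStores = new Map(); // layout 2: one store per user

for (const id of userIds) {
  const name = `user/${id}`;
  perUserStores.set(name, []);
  for (let i = 0; i < ENTRIES_PER_USER; i++) {
    const entry = { user: id, field: "status", value: `update ${i}` };
    singleStore.get("users").push(entry);
    perUserStores.get(name).push(entry);
  }
}

// Reading anything from a store requires loading (and replaying) its entire log.
function loadStore(stores, name) {
  return stores.get(name) ?? [];
}

const loadedShared = loadStore(singleStore, "users");
const bobFromShared = loadedShared.filter((e) => e.user === "bob");
const loadedBob = loadStore(perUserStores, "user/bob");

console.log(loadedShared.length); // 300: every user's entries had to be loaded
console.log(loadedBob.length); // 100: only bob's entries
console.log(bobFromShared.length === loadedBob.length); // true
```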

What I'm suggesting here is to create other models for thinking about oplogs and stores that don't rely on an understanding of the oplog, CRDTs or PubSub.

One way would be to create an OrbitDB Object-Relational Mapper, or rather an Object-Oplog Mapper (ORM/OOM), perhaps inspired by the Python ORM peewee:

// Proposed peewee-style syntax: keyword arguments such as `default = "creator"`
// are borrowed from Python and are not valid JavaScript as written.
class User extends OrbitDBBase {
  id = IdentityField(default = "creator", setter = true)
  name = StringField()
  posts = ForeignStoreField(type = Posts, constraints = {
    user: "id",
  })
}

class Posts extends OrbitDBBase {
  user = IdentityField(default = "creator", setter = true)
  posts = ArrayField(StringField())
}

Some peculiarities:

  1. Both classes have field properties. These fields have a type, such as IdentityField (an OrbitDB identity), StringField or ArrayField. Their types can determine how they are encoded in the oplog.
  2. The ForeignStoreField permits connecting two stores by referring to one of them in the other. In the oplog, this would be stored as an OrbitDB address.
  3. The fields take arguments that can be used for access control, defaults, options and validation.
    • setter = true on the IdentityField gives that identity write access to the fields of the store.
    • default = "creator" sets the default value of the IdentityField to the identity of the store creator. May also provide other ids here.
    • type = Posts allows validating that any address assigned to the ForeignStoreField belongs to a store of the Posts type.
    • constraints = { user: "id" } on the ForeignStoreField validates that Posts.user equals the "id" field of the User store.
    • ArrayField takes another field as argument - the type of the values of the array.
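A rough JavaScript sketch of how such field declarations could drive validation, defaults and write access. Because JavaScript has no keyword arguments, options are passed as objects (`{ default: "creator", setter: true }`); `ForeignStoreField` and `constraints` are left out, identities are assumed to be plain strings, and `checkRecord` and `canWrite` are hypothetical helpers, not OrbitDB API.

```javascript
// Hypothetical field classes mirroring the proposal's names; each field knows
// how to validate a value. None of this is existing OrbitDB API.
class StringField {
  validate(value) {
    return typeof value === "string";
  }
}

class ArrayField {
  constructor(itemField) {
    this.itemField = itemField; // field type of every element
  }
  validate(value) {
    return Array.isArray(value) && value.every((v) => this.itemField.validate(v));
  }
}

class IdentityField {
  // Assumption: identities are plain id strings; `setter: true` marks the
  // identity stored in this field as allowed to write to the store.
  constructor({ default: defaultValue = "creator", setter = false } = {}) {
    this.defaultValue = defaultValue;
    this.setter = setter;
  }
  validate(value) {
    return typeof value === "string" && value.length > 0;
  }
}

// Validate a record against a schema, filling in IdentityField defaults;
// the special default "creator" resolves to the creator's id.
function checkRecord(schema, record, creatorId) {
  const out = {};
  for (const [name, field] of Object.entries(schema)) {
    let value = record[name];
    if (value === undefined && field instanceof IdentityField) {
      value = field.defaultValue === "creator" ? creatorId : field.defaultValue;
    }
    if (!field.validate(value)) throw new TypeError(`invalid value for "${name}"`);
    out[name] = value;
  }
  return out;
}

// A writer is allowed if some IdentityField declared with setter: true holds its id.
function canWrite(schema, record, writerId) {
  return Object.entries(schema).some(
    ([name, field]) => field instanceof IdentityField && field.setter && record[name] === writerId
  );
}

const PostsSchema = {
  user: new IdentityField({ default: "creator", setter: true }),
  posts: new ArrayField(new StringField()),
};

const postsRecord = checkRecord(PostsSchema, { posts: ["first post"] }, "alice-id");
console.log(postsRecord.user); // alice-id
console.log(canWrite(PostsSchema, postsRecord, "alice-id")); // true
console.log(canWrite(PostsSchema, postsRecord, "bob-id")); // false
```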

The syntax of the User and Posts classes is that of peewee, with some modifications. Through this syntax I hope to provide a succinct yet informative way of expressing custom stores, indexes and access controllers.
