Database Encryption & Read Protection #931

Open
CSDUMMI opened this issue Oct 16, 2021 · 15 comments

@CSDUMMI
Contributor

CSDUMMI commented Oct 16, 2021

Discussion for: Database Encryption Project.

Let me give a brief summary of the project, the current state of the discussion
gathered from #819 and an SCP proposal I produced a few months back.

What happened until now?

OrbitDB Stores are publicly readable.
And initially the OrbitDB development was not focused on the
issue of private stores, though ACLs were implemented with a
mind towards the future possibility of Read Access Control.

Why use encryption?

Because OrbitDB stores the manifest and oplog on IPFS
and uses IPFS PubSub for communication, both of which
can be accessed by anyone in the IPFS network
with low enough latency, encryption is the only known method
of implementing read access control.

What problems exist for implementing encryption?

In order to define Read AC in a specification, I think a few
questions need to be answered. (Gathered from #819 and my own considerations)

What security guarantees should encryption give to the user?

What security guarantees does OrbitDB Encryption give the users?

To be secure, OrbitDB must have properly defined security
guarantees that its users can rely on and that can be verified by independent
parties.

What Library to use?

How will crypto libraries be supplied by the user?

No standard crypto library has been chosen.
It should be left to the user to choose the library
that is most secure according to their research.

OrbitDB has to provide an API for injecting such
libraries into OrbitDB encryption.
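
A minimal sketch of what such an injection point might look like, in TypeScript. The `EncryptionModule` interface and the `EncryptedLog` class are hypothetical names for illustration and are not part of any OrbitDB release. AES-256-GCM from Node's built-in `node:crypto` stands in for whatever library a user would choose to inject.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical contract that a user-supplied crypto library would satisfy.
interface EncryptionModule {
  encrypt(plaintext: Buffer): Buffer;
  decrypt(ciphertext: Buffer): Buffer;
}

// One possible implementation: AES-256-GCM with a fresh random nonce per call.
// Output layout: nonce (12 bytes) | auth tag (16 bytes) | ciphertext.
function aesGcmModule(key: Buffer): EncryptionModule {
  if (key.length !== 32) throw new Error("AES-256-GCM needs a 32-byte key");
  return {
    encrypt(plaintext) {
      const nonce = randomBytes(12);
      const cipher = createCipheriv("aes-256-gcm", key, nonce);
      const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
      return Buffer.concat([nonce, cipher.getAuthTag(), body]);
    },
    decrypt(ciphertext) {
      const nonce = ciphertext.subarray(0, 12);
      const tag = ciphertext.subarray(12, 28);
      const body = ciphertext.subarray(28);
      const decipher = createDecipheriv("aes-256-gcm", key, nonce);
      decipher.setAuthTag(tag);
      // final() throws if the tag does not match, i.e. on tampering or a wrong key.
      return Buffer.concat([decipher.update(body), decipher.final()]);
    },
  };
}

// Hypothetical store that only ever persists what the injected module returns.
class EncryptedLog {
  private readonly entries: Buffer[] = [];
  private readonly codec: EncryptionModule;

  constructor(codec: EncryptionModule) {
    this.codec = codec;
  }

  append(value: string): void {
    this.entries.push(this.codec.encrypt(Buffer.from(value, "utf8")));
  }

  read(index: number): string {
    return this.codec.decrypt(this.raw(index)).toString("utf8");
  }

  // What an outside observer would see for a given entry.
  raw(index: number): Buffer {
    const stored = this.entries[index];
    if (stored === undefined) throw new Error("no such entry");
    return stored;
  }
}
```

With this shape, swapping the crypto library would only mean passing a different `EncryptionModule` to the constructor. How keys reach the module is deliberately left open here.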

What should be encrypted?

Should the entire oplog or only the entries be encrypted?

  • Encrypt only the payload of the entries, leaving the structure of the oplog for all to see.
  • Encrypt the entire oplog, including the metadata inside the entries. The structure of the oplog would thus be hidden.

Tangentially related to this question is the question of hiding (i.e. encrypting or hashing) the manifest and the PubSub channel name and communication.
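
To make the difference between the two options concrete, here is a small TypeScript sketch. The `Entry` shape is a simplified, hypothetical stand-in for a real oplog entry (which carries more fields), and `seal` stands for whatever encryption function ends up being used.

```typescript
// Simplified, hypothetical entry shape; real oplog entries carry more fields.
interface Entry {
  id: string;        // database name / address
  identity: string;  // identity of the writer
  clock: number;     // logical time of the write
  next: string[];    // links to earlier entries, i.e. the oplog structure
  sig: string;       // signature over the entry
  payload: string;   // the application data
}

// Placeholder for any real cipher: takes plaintext, returns ciphertext.
type Seal = (plaintext: string) => string;

// Option 1: only the payload is sealed; every other field stays readable.
function sealPayloadOnly(entry: Entry, seal: Seal): Entry {
  return { ...entry, payload: seal(entry.payload) };
}

// Option 2: the whole entry is serialized and sealed as one opaque blob.
function sealWholeEntry(entry: Entry, seal: Seal): { sealed: string } {
  return { sealed: seal(JSON.stringify(entry)) };
}

// Names of the fields an observer without the key can still read in cleartext.
function visibleFields(stored: object): string[] {
  return Object.keys(stored).filter((k) => k !== "payload" && k !== "sealed");
}
```

Under option 1, `visibleFields` still lists `id`, `identity`, `clock`, `next` and `sig`. Under option 2 it lists nothing, but peers without the key can then no longer follow the `next` links either.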

How should encryption work?

What scheme will OrbitDB use for Encryption and granting read access?

There are three considerations that could help answer this question:

  1. What cryptographic primitives can be used?
  2. How should read access be granted? For the entire oplog vs. for an individual entry.
  3. How many keys should be handled and how? What should the key store look like?

Security Audits & Analysis

How will it be ensured that OrbitDB Encryption is secure?

Next steps would in my opinion be to define an SCP for OrbitDB Encryption
followed by a reference implementation.

That implementation and SCP should then be verified based on the
security guarantees given by OrbitDB to ensure no bugs or errors
in the protocol remove the security guarantees.

  • Will Audits be conducted before releasing encryption on OrbitDB?
  • And who will be responsible for vulnerabilities, exploits and mitigations
    in the future after encryption has been released to OrbitDB?

Should this be part of OrbitDB 1.0 or rather a later release?

This is a very large project and I don't yet see how it can be implemented
in a timely fashion alongside the many other projects for 1.0.
Maybe Encryption should be scheduled for a later release of OrbitDB 1.x?

I would like to ask for feedback on this issue from @tabcat @aphelionz @haadcode
as well as anyone else who has been discussing and thinking about this
project and possible feature.

@chrispanag

Maybe Encryption should be scheduled for a later release of OrbitDB 1.x?

IMHO, it should be scheduled for a later release. I believe this would be better, given that the 1.0 release IMO should focus on making the current functionality more stable and performant. Given that devs can already implement some kind of encryption on top of Orbit, this is not something crucial for the 1.0 release.

@CSDUMMI
Contributor Author

CSDUMMI commented Oct 22, 2021

I agree. Still, work and discussion should start on providing this out of the box, because consolidating encryption development reduces security risks.

Not every developer will be able to write secure encryption, or to appreciate what data is leaked when only the payload is encrypted and not the identity, time, dbname and signature of the entry, nor the manifest data, pubsub messages and other data I am unaware of.

All this data could lead to accurate guesses about the content of an entry in specific use cases.

@CSDUMMI
Contributor Author

CSDUMMI commented Oct 22, 2021

And this data, created and published for OrbitDB should be encrypted by OrbitDB,
because for me as a user, it is a lot harder to encrypt these internals.

@ccokee

ccokee commented Nov 18, 2021

I was about to open a request for something like this. I have some thoughts I'd like to share on securing the database:

In a classic, centralized scenario, security is achieved through an access-layering architecture, i.e.:

USER <- encrypted connection -> BACKEND <- encrypted connection -> DATABASE

In this scenario, ideally, the user is never aware of where the database is located.

Besides IPFS files being public and unencrypted, one of the main issues I see in the security of OrbitDB is that the client is aware of the location of the db. So my approach is to somehow replicate this access layering within the current OrbitDB specs.

The main idea is to develop this "backend" layer as a decentralized/p2p cluster.

But, as DBs are mostly used in apps, some kind of centralized authority must be established. This could be done by declaring database "genesis" blocks/manifests. This could help not only in the read scenarios: by creating access control mechanisms (e.g., tokenizing data access), you could also better secure create, delete and update operations.

I know this would be hard to address but some interesting features to add might be:

  • Database snapshotting
  • Database cache

@julienmalard
Contributor

One challenge I see with encrypting databases is a potential sense of "false security". By that I mean that, in a traditional centralised system, the password provides access to the data, so that if a password is lost, it can be changed as quickly as the leak is discovered and from that moment onwards no (or at least no further!) data can be accessed by anyone with the leaked password.

But in the case of OrbitDB, all the data would be permanently public, only encrypted. So even if one loses one's key (e.g., device is stolen or hacked) and then rotates the key, as I understand it all data previously encrypted with that key would be forever readable, with no clear manner of recalling that now not-so-encrypted information spread across the network.

This is mainly what has prevented me from using encryption in my apps with OrbitDB, so as to avoid unintentionally misleading users about the different risks of such an approach as compared to a centralised server approach...

@CSDUMMI
Contributor Author

CSDUMMI commented Dec 14, 2021

That problem is extremely severe, because the solution for it requires technology missionaries to clearly, precisely and understandably explain the security of OrbitDB to (pretty much) anyone.

A more technical solution to the problem would be the creation of a secret key infrastructure, where multiple types of secret keys are used together to encrypt a file instead of a single secret key.

For example:
You could set up a key manager with a master key and several ephemeral, application-specific keys.

Now every encryption is performed not with just one key but with a key derived from both the master key and the ephemeral key.

Meaning that the loss of either the master or the ephemeral key would not lead to a leaking of the entire database. Only the two together could lead to a leak of data.
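
One way this derivation could be sketched in TypeScript is with HKDF from Node's built-in `node:crypto`. The function name `deriveEntryKey`, the `context` label and the fixed 32-byte key sizes are assumptions made for this illustration; no particular scheme has been decided on in this discussion.

```typescript
import { hkdfSync } from "node:crypto";

// Derive a 32-byte encryption key that depends on BOTH input keys.
// Knowing only the master key or only the ephemeral key is not enough
// to recompute the result.
function deriveEntryKey(masterKey: Buffer, ephemeralKey: Buffer, context: string): Buffer {
  // Fixed lengths keep the concatenation below unambiguous.
  if (masterKey.length !== 32 || ephemeralKey.length !== 32) {
    throw new Error("both keys are expected to be 32 bytes");
  }
  const inputKeyMaterial = Buffer.concat([masterKey, ephemeralKey]);
  // Empty salt; `context` is bound in as the HKDF info string so that
  // different applications end up with different keys.
  const derived = hkdfSync("sha256", inputKeyMaterial, Buffer.alloc(0), context, 32);
  return Buffer.from(derived);
}
```

Note that this only limits the damage of losing a single key. Data already encrypted under a derived key stays readable to anyone who later obtains both inputs.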

But you are right: In this decentralized system the security of the system depends on the security of keys more so than anything else.

And yet, how is that very different from our current situation with passwords? If I lose a password for some service, I can change that password by logging in to my email provider (with my email password) and changing the password for the service from there.

But if I lost my E-Mail password, it'd be almost game over. I'd only be able to maybe contact the email service and get them to change the password for me - on the basis that they somehow know it's me who owns that Account and is requesting the change.

Thus the security of password security systems still depends entirely on passwords or some other means of ID. And if I lost all my passwords and means of ID today, I'd not be able to access any service or data tomorrow.

Similarly with keys: If I lose all the keys that I used to encrypt a certain file and all keys I used to encrypt those keys, then it'd be game over for me too. The only thing I can do is to divide this liability among as many keys as possible in as many different locations as possible to reduce the likelihood of such an attack happening.

@CSDUMMI
Contributor Author

CSDUMMI commented Dec 14, 2021

In short: to improve the security of password systems, the reliance on a single password was replaced by the reliance on multiple (often two: Service + Email Password).

The same should be done with secret keys - have many keys - to reduce the risk and the gain from having access to a single key.

This kind of key management should in my opinion not be the job of OrbitDB but OrbitDB Encryption should allow for the modular injection of Key management chosen by the user.

@Rock-Lee-520

Rock-Lee-520 commented Dec 30, 2021

Hi, guys!
I have an idea: maybe the client could produce temporary secret keys for the other subscribing servers.
For example:

1. Servers need to apply for a new secret key when their temporary secret key expires.
2. If a password is lost, it can be changed as quickly as the leak is discovered.

That is to say, the client always authorizes a temporary secret key for servers to read and write data.
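
A rough TypeScript sketch of how such expiring grants might be tracked on the client side. `KeyIssuer`, `TemporaryKey` and the method names are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and how the key is actually delivered to a server is out of scope here.

```typescript
import { randomBytes } from "node:crypto";

// A secret key that is only valid until `expiresAt` (milliseconds).
interface TemporaryKey {
  key: Buffer;
  expiresAt: number;
}

// Hypothetical client-side issuer: hands out one temporary key per server.
class KeyIssuer {
  private readonly grants = new Map<string, TemporaryKey>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Called when a server applies for a key, including after expiry.
  // Any earlier grant for the same server is replaced.
  issue(serverId: string, now: number): TemporaryKey {
    const grant: TemporaryKey = { key: randomBytes(32), expiresAt: now + this.ttlMs };
    this.grants.set(serverId, grant);
    return grant;
  }

  // Returns the key while the grant is valid, undefined once it has
  // expired or been revoked.
  current(serverId: string, now: number): Buffer | undefined {
    const grant = this.grants.get(serverId);
    return grant !== undefined && now < grant.expiresAt ? grant.key : undefined;
  }

  // Withdraw a grant as soon as a leak is discovered.
  revoke(serverId: string): void {
    this.grants.delete(serverId);
  }
}
```

Expiry and revocation only affect future use of a key. Entries that were already encrypted with a leaked key remain readable by whoever holds it.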

By the way, I am very happy to contribute code to orbit-db or other decentralized databases.

@tabcat
Member

tabcat commented Jan 10, 2022

@rock-liyi I like this, it sounds like each participant has some granted authentication with each other; and this includes renewing recovery keys.

This would be useful for advertising updates securely. Also, an encryption key could be created and shared with all participants for encrypting db entries. Doing encryption this way would have some downsides when encrypting the entire database, though, and MLS has always seemed like a better option for the future. However, with ratchet encryption, keys are usually discarded, and we still need to be able to read old entries; so either the keys will need to be kept, or a decrypted entry or payload would need to be copied locally.

I'm also interested to see if keeping the data private via controlling who can receive it is feasible with ipfs as an additional guard.

@julienmalard
Contributor

@tabcat I'm very new to this, but if I understand correctly, would it then be possible to add encryption only to the "sharing" part of OrbitDB? In other words, the OrbitDB entries stored locally would remain unencrypted, but a new Encryption module would be added that would allow for encrypting entries just before they are shared on PubSub (and decrypted upon reception). Just a thought, but Local-web-first/auth might be useful for the key generation and group sharing part.

@tabcat
Member

tabcat commented Jan 11, 2022

@julienmalard only entry CIDs are shared via pubsub, the entries are fetched with ipld. everything could be done with a more active replication where entry data is sent directly between peers, and replicas are kept in a private repo. this would be quite the change and i'm less enthusiastic about it, but it might be necessary in some cases.

@julienmalard
Contributor

@tabcat Ah, I see. Thanks for the clarification! Could we simply encrypt the CIDs for a minimum level of protection, supposing that it is practically infeasible for an unauthorised person to guess the CID and access a (to them unknown) OrbitDB entry?

@tabcat
Member

tabcat commented Jan 11, 2022

it's safe to assume that entry content ids might be advertised by the ipfs node, and they would also be made public by anyone requesting them from the network. this isn't a good security model; it seems like anything worthwhile would be centered around 1) encrypting the actual entry data with participants, or 2) moving replication of entry data from ipfs to encrypted channels with peers. really not that psyched about the 2nd one but it might be necessary in some cases where data must not be public even if encrypted.

@julienmalard
Contributor

Ah, I see. I'd completely forgotten about the DHT...

@MichaelJCole

MichaelJCole commented Nov 18, 2023

Hi, I need this for my users. I can't implement this at the application layer because I want the keys to be encrypted as well.

NOT implementing this is worse security than implementing it because putting the encryption in the application layer makes it less secure. It's not obvious from the README that the data is in the clear, which is pretty bad security.

Some of the questions asked are interesting, but things like this can get over-complicated without a use case. Here's my basic use case:

"Developers can encrypt user data so it's not available in the clear for the entire internet, so they can have some basic privacy."

Based on that use case, here are some of my answers:

What security guarantees should encryption give to the user?

None. The cryptographic details should be left up to the developer.

Instead, I'd like an API to encrypt/decrypt blocks going in/out of the transport layer IPFS. It can be pluggable like the Storage interface, or a pair of functions passed during initialization.
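
As a rough illustration of the "pair of functions" variant, here is a TypeScript sketch. `BlockStorage`, `MemoryStorage` and `withEncryption` are hypothetical names only loosely modeled on the idea of a pluggable storage. They are not OrbitDB's actual Storage interface, and everything is synchronous here for brevity, whereas a real storage layer would most likely be asynchronous.

```typescript
// A transform applied to a block on its way in or out of storage.
type BlockTransform = (block: Uint8Array) => Uint8Array;

// Hypothetical minimal storage contract (assumed, not OrbitDB's real one).
interface BlockStorage {
  put(hash: string, block: Uint8Array): void;
  get(hash: string): Uint8Array | undefined;
}

// Plain in-memory storage, standing in for the transport layer.
class MemoryStorage implements BlockStorage {
  private readonly blocks = new Map<string, Uint8Array>();

  put(hash: string, block: Uint8Array): void {
    this.blocks.set(hash, block);
  }

  get(hash: string): Uint8Array | undefined {
    return this.blocks.get(hash);
  }
}

// Wraps any storage so that blocks are transformed with `encrypt` before
// they are written and with `decrypt` after they are read back.
function withEncryption(
  inner: BlockStorage,
  encrypt: BlockTransform,
  decrypt: BlockTransform,
): BlockStorage {
  return {
    put(hash, block) {
      inner.put(hash, encrypt(block));
    },
    get(hash) {
      const stored = inner.get(hash);
      return stored === undefined ? undefined : decrypt(stored);
    },
  };
}
```

The developer would pass in `encrypt` and `decrypt` from whatever crypto library they already use. One open design point this sketch glosses over: in a content-addressed store, the hash would presumably have to be computed over the encrypted bytes rather than the plaintext.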

Re: forward/backward secrecy, those are advanced use cases and expectations. Interested users should use MLS (Messaging Layer Security), which is complicated beyond this use case.

What Library to use?

Dev teams will already have a crypto library selected, and don't want another library for reasons of policy, application size, or preference.

What should be encrypted?

Yes, the entire oplog

How should encryption work?

With an API, it could be handled through asymmetric encryption or PKI, depending on the developer.

OrbitDB could include implementations (like the Storage API), or leave it to some blogs to show how to implement.

Security Audits & Analysis

Not having to do this is the very reason devs use existing crypto libraries.

That problem is extremely severe, because the solution for it requires technology missionaries to clearly, precisely and understandably explain the security of OrbitDB to (pretty much) anyone.

If that were true, OrbitDB would be blaring a warning in its README that data is in the clear. Designing a universal solution is an impossible problem. Providing an API allows developers to implement what they need. Including a couple of examples would be enough.

IPFS is enough of a paradigm shift, allowing interested evangelists an opportunity to use the API and write a blog post would be a solution for this.

This is already a bit long for a post, so sorry if I missed anything from the convo above.

What do you think? Would you accept a PR for this?
