Zero Knowledge Cloud Storage

Cubbit's infrastructure revolves around three players: the user, the Swarm, and the coordinator.

The User: You access Cubbit directly via your chosen device (computer or phone).
The Swarm: A distributed, P2P network of Cubbit Cells where data is stored. 
The Coordinator: A suite of machine learning algorithms that optimize the payload distribution on the network, while also taking care of security and metadata. It’s also in charge of triggering the recovery procedure for files on the Swarm (see Recovery section).

These three components interact to enable safe and private cloud storage inside a zero-knowledge architecture, ensuring that no one in the system, not even the coordinator, can access the users’ data.

The Path of a File

    Client-side encryption

    The client generates a new AES-256 key and uses it to encrypt the file. This “file key” is, in turn, encrypted with a master key. The master key is stored on the Coordinator and can only be retrieved with the user's password, which allows the user to sign in and retrieve their keys from any device.

    File redundancy

    The encrypted file is split into N chunks, which are then processed into K additional redundancy shards using Reed-Solomon error-correcting codes. This procedure allows the payload to be retrieved even if individual Cells go offline, as long as any N of the N+K Cells holding a shard can be reached. The parameters are chosen dynamically and optimized so that the probability of downtime is lower than 10^-6.
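
    The "any N out of N+K" property can be illustrated with a toy Reed-Solomon erasure code. The sketch below is not Cubbit's codec: it assumes one-byte chunks, works over the prime field GF(257) rather than the GF(2^8) field that production codecs typically use, and all function names are hypothetical. The N data symbols define a polynomial of degree below N, the K redundancy shards are further evaluations of that polynomial, and any N shards determine the polynomial again.

```python
# Toy Reed-Solomon erasure code over the prime field GF(257).
# Illustrative assumption only; not the codec used by Cubbit.
P = 257  # prime modulus, large enough to hold any byte value 0..255


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse of den (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(chunks, k):
    """Turn N data symbols into N + K shards, as (index, value) pairs."""
    n = len(chunks)
    data = list(enumerate(chunks))  # shard i carries chunk i unchanged
    parity = [(x, _interpolate(data, x)) for x in range(n, n + k)]
    return data + parity


def decode(shards, n):
    """Recover the N original symbols from any N surviving shards."""
    if len(shards) < n:
        raise ValueError("need at least N shards")
    subset = shards[:n]
    return [_interpolate(subset, x) for x in range(n)]


chunks = list(b"Cubbit")              # N = 6 one-byte chunks
shards = encode(chunks, k=3)          # N + K = 9 shards
survivors = shards[2:5] + shards[6:]  # shards 0, 1 and 5 are lost
recovered = bytes(decode(survivors, n=len(chunks)))
```

    In a real system each chunk holds many bytes, and the same encoding is applied position by position across the chunks.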

    Authorization and peer selection

    Next, the owner of the file asks the coordinator for authorization to upload it to Cubbit. The coordinator, in addition to taking care of this, assigns a location to the file inside the Swarm, determining which hosting peers are most suitable.

    To do so, the coordinator runs a fitness function that aims both to minimize the probability of losing files to natural disasters and to keep network performance constant. In other words, the coordinator spreads the chunks as far apart as possible while also minimizing network latency and accounting for other factors (bandwidth usage, storage optimization, etc.).
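
    The actual fitness function is not described here, so the following is only a sketch of the trade-off under stated assumptions: peers are picked greedily, each candidate is rewarded for its distance from the peers already chosen and penalized for its latency. The `Peer` fields, the weights, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Peer:
    peer_id: str
    lat: float         # degrees
    lon: float         # degrees
    latency_ms: float  # round-trip time to the uploader (assumed metric)


def distance_km(a, b):
    """Great-circle distance between two peers (haversine formula)."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(min(1.0, sqrt(h)))


def fitness(candidate, chosen, spread_weight=1.0, latency_weight=5.0):
    """Hypothetical score: reward distance from chosen peers, penalize latency."""
    spread = min((distance_km(candidate, c) for c in chosen), default=0.0)
    return spread_weight * spread - latency_weight * candidate.latency_ms


def select_peers(candidates, count):
    """Greedily pick `count` peers, re-scoring the pool after every pick."""
    if count > len(candidates):
        raise ValueError("not enough candidate peers")
    chosen, pool = [], list(candidates)
    for _ in range(count):
        best = max(pool, key=lambda peer: fitness(peer, chosen))
        chosen.append(best)
        pool.remove(best)
    return chosen
```

    With these weights, two Cells in the same city are unlikely to be picked together, because the second one adds almost no geographic spread.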

    File Distribution

    Each of the N+K shards is stored on a different Cubbit Cell, called a ‘hosting peer’. This means that a Cell doesn’t contain its owner's files, but encrypted shards of other people's files.

    To make this possible, the coordinator facilitates peer-to-peer connections when needed, acting as a handshake server. Thanks to Reed-Solomon coding, the file remains available as long as at least N hosting peers are online at the same time.
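
    The link between N, K and the 10^-6 downtime target can be made concrete with a simple binomial model. This is a sketch under a strong assumption that is not taken from Cubbit's design: every hosting peer is online independently with the same probability `p_online`. The file is then unavailable exactly when fewer than N of the N+K shards are reachable. Function names are hypothetical.

```python
from math import comb


def downtime_probability(n, k, p_online):
    """Probability that fewer than n of the n + k shards are reachable."""
    total = n + k
    return sum(
        comb(total, i) * p_online**i * (1 - p_online) ** (total - i)
        for i in range(n)  # i = number of online shards, 0 .. n-1
    )


def min_redundancy(n, p_online, target=1e-6):
    """Smallest k that brings the downtime probability below `target`."""
    if not 0 < p_online <= 1:
        raise ValueError("p_online must be in (0, 1]")
    k = 0
    while downtime_probability(n, k, p_online) >= target:
        k += 1
    return k
```

    For example, `min_redundancy(12, 0.9)` returns the number of redundancy shards needed for 12 data chunks if each Cell were online 90% of the time; both figures are illustrative, not Cubbit's parameters.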

    Network self-healing

    The coordinator monitors the uptime status of each Cell and triggers a recovery procedure when the total number of online shards falls to a certain threshold, namely N + K/2. In other words, once half of the K redundancy shards are offline, the coordinator alerts the remaining hosting peers, which in turn contact other Cells via peer-to-peer, end-to-end encrypted channels to restore the number of online shards to its maximum. It is worth noting that peers can rebuild the missing shards without the intervention of the original owner, since they work on encrypted payloads.

    While the redundancy parameters alone are tuned to guarantee a statistical uptime of ca. 99.9999%, this recovery procedure pushes the uptime to virtually 100% by handling long-term effects such as permanently disconnected peers and by redistributing missing shards to new Cells in the Swarm. This is how our zero-knowledge cloud storage works.
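
    The trigger condition can be written down in a few lines. This is a minimal sketch, assuming that recovery starts as soon as the online shard count is at or below N + K/2; the function names and the example values of N and K are hypothetical.

```python
def needs_recovery(online_shards, n, k):
    """True once the online shard count has fallen to N + K/2 or below."""
    return online_shards <= n + k / 2


def shards_to_restore(online_shards, n, k):
    """Number of shards to regenerate to get back to the full N + K."""
    if online_shards < n:
        raise ValueError("fewer than N shards online: file cannot be rebuilt")
    if not needs_recovery(online_shards, n, k):
        return 0
    return (n + k) - online_shards
```

    With N = 12 and K = 8, for instance, the threshold is 16 online shards: at 17 nothing happens, while at 16 the four missing shards are regenerated on other Cells.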

    Environmental impact

    Internet infrastructure is currently responsible for 10% of total worldwide energy demand [1,2]. Data centers account for one third of it, making “the Cloud”, despite its ephemeral name, an ecological monster that consumes as much energy as the entire United Kingdom (66 million inhabitants and the world's 5th largest economy). Cubbit is based on small, optimized single-board computers, whose impact per GB is 10 times smaller than that of data center racks. Moreover, it can leverage geographical proximity to avoid long data transfers, which in certain cases can consume as much energy as storage itself [3].

     

    The result, detailed in our green paper, is that choosing Cubbit over traditional cloud storage for an average storage plan of ~5 TB saves the equivalent of the yearly consumption of an always-on fridge.

    [1] Greenpeace International. How clean is your cloud? Catalysing an energy revolution. Technical report, 2012.
    [2] Mark P. Mills. The cloud begins with coal. Digital Power Group, 2013.
    [3] Jayant Baliga, Robert W. A. Ayre, Kerry Hinton, and Rodney S. Tucker. Green cloud computing: Balancing energy in processing, storage, and transport. Proceedings of the IEEE, 99(1):149–167, 2011.