Storage Engine

The Storage Engine helps us handle file storage and the local caching process; it also indexes files for later retrieval.

IPFS Storage Engine

The IPFS Storage Engine is a distributed storage engine based on IPFS. StorageEngineIPFS is an implementation of IFileSystem and IFileIndex that handles all I/O operations and indexing.

/**
 * An interface for the file engine; depending on the
 * environment, the file engine implementation may differ
 */
export interface IFileSystem<S, T, R> {
  writeBytes(_data: R): Promise<T>;
  write(_filename: S, _data: R): Promise<T>;
  read(_filename: S): Promise<R>;
  remove(_filename: S): Promise<boolean>;
}

/**
 * Methods for indexing and looking up files
 */
export interface IFileIndex<S, T, R> {
  publish(_contentID: T): Promise<R>;
  republish(): void;
  resolve(_peerID?: S): Promise<T>;
}

/**
 * IPFS file system
 */
export type TIPFSFileSystem = IFileSystem<string, CID, Uint8Array>;

/**
 * IPFS file index
 */
export type TIPFSFileIndex = IFileIndex<PeerId, CID, IPNSEntry>;
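
For illustration, a class satisfying both instantiated interfaces would have a surface like the skeleton below. This is a sketch only, not the actual StorageEngineIPFS, and the import paths for PeerId and IPNSEntry vary across library versions:

import { CID } from "multiformats/cid";
import type { PeerId } from "@libp2p/interface-peer-id";
import type { IPNSEntry } from "ipns";

// Skeleton only: every method throws instead of performing real I/O.
// TIPFSFileSystem and TIPFSFileIndex are the types defined above.
class StorageEngineSketch implements TIPFSFileSystem, TIPFSFileIndex {
  async writeBytes(_data: Uint8Array): Promise<CID> {
    throw new Error("sketch only");
  }
  async write(_filename: string, _data: Uint8Array): Promise<CID> {
    throw new Error("sketch only");
  }
  async read(_filename: string): Promise<Uint8Array> {
    throw new Error("sketch only");
  }
  async remove(_filename: string): Promise<boolean> {
    throw new Error("sketch only");
  }
  async publish(_contentID: CID): Promise<IPNSEntry> {
    throw new Error("sketch only");
  }
  republish(): void {}
  async resolve(_peerID?: PeerId): Promise<CID> {
    throw new Error("sketch only");
  }
}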

The relationship between StorageEngineIPFS and other classes/interfaces is shown below:

classDiagram
  LibP2pNode -- StorageEngineIPFS
  Helia -- StorageEngineIPFS
  UnixFS -- StorageEngineIPFS
  IPNS -- StorageEngineIPFS
  IFileSystem <|-- StorageEngineIPFS
  IFileIndex <|-- StorageEngineIPFS
  IFileSystem : writeBytes(data Uint8Array) CID
  IFileSystem : write(filename string, data Uint8Array) CID
  IFileSystem : read(filename string) Uint8Array
  IFileSystem : remove(filename string) boolean
  IFileIndex : publish(contentID CID) IPNSEntry
  IFileIndex : republish() void
  IFileIndex : resolve(peerID PeerId) CID
  StorageEngineIPFS : static getInstance(basePath, config)

In our implementation we use datastore-fs and blockstore-fs to persist changes to the local file system. For now, browsers lack the performance to handle the necessary connections and I/O, so the best available solution is to provide a local node that handles all I/O and connections.
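
A minimal sketch of wiring up such a local node, assuming the createHelia, FsDatastore, and FsBlockstore APIs from the helia, datastore-fs, and blockstore-fs packages (the paths here are illustrative, not the actual zkDatabase layout):

import { createHelia } from "helia";
import { unixfs } from "@helia/unixfs";
import { FsDatastore } from "datastore-fs";
import { FsBlockstore } from "blockstore-fs";

// Persist the node's key-value records and raw blocks on the local disk
const datastore = new FsDatastore("./data/nodedata");
const blockstore = new FsBlockstore("./data/helia");

const helia = await createHelia({ datastore, blockstore });
const fs = unixfs(helia);

// Bytes added through this node are persisted locally and served to peers
const cid = await fs.addBytes(new TextEncoder().encode("hello zkDatabase"));
console.log(cid.toString());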

Usage of IPFS Storage Engine

The database is cached locally to make sure the records remain available even when they are no longer live on the IPFS network. To start an instance of StorageEngineIPFS we need to provide a basePath and a config (the config is omitted in this example):

const storageIPFS = await StorageEngineIPFS.getInstance(
  "/Users/chiro/GitHub/zkDatabase/zkdb/data"
);

The basePath is the path to the local cache folder; the folder will be created if it does not exist. The config is the configuration of the IPFS node; the default config is used if none is provided. After we get the instance of StorageEngineIPFS, we can use it to perform I/O operations.

// Switch to collection `test`
storageIPFS.use("test");

// Write a document to the current collection
await storageIPFS.writeBSON({ something: "stupid" });

// Read BSON data back from IPFS
console.log(
  BSON.deserialize(
    await storageIPFS.read(
      "bbkscciq5an6kqbwixefbpnftvo34pi2jem3e3rjppf3hai2gyifa"
    )
  )
);

The process of updating the collection metadata and the master metadata is described in the following sections.

File mutability

DAG nodes are immutable, so every update produces a new CID, and we cannot distribute a new CID each time. IPNS solves this: it creates a record that maps the node's PeerID to a CID. Since the PeerID is unchanged, as long as we keep the IPNSEntry up to date, other people can always resolve the current CID of the zkDatabase.
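
A minimal sketch of this publish/resolve cycle, assuming the @helia/ipns package. Its exact signatures vary across versions; early releases key records by PeerID as shown here, and router configuration is omitted for brevity:

import { createHelia } from "helia";
import { ipns } from "@helia/ipns";
import { CID } from "multiformats/cid";

const helia = await createHelia();
const name = ipns(helia);

// Publish: map our PeerID to the CID of the latest master metadata
const cid = CID.parse("bafkreibbdesmz6d4fp2h24d6gebefzfl2i4fpxseiqe75xmt4fvwblfehu");
await name.publish(helia.libp2p.peerId, cid);

// Resolve: anyone who knows the PeerID can look up the current CID
const resolved = await name.resolve(helia.libp2p.peerId);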

Metadata

The metadata file holds a mapping from each datum's Poseidon hash to its CID, which allows us to retrieve the data from IPFS. It is also used to reconstruct the Merkle tree. Metadata is stored on IPFS, and we also keep a copy on the local file system.
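
As a rough sketch, both metadata files can be modeled as plain string-to-string maps; the type names below are illustrative and match the JSON examples in the Metadata structure section:

/** Collection metadata: Poseidon hash (base32) -> CID of the document */
type TCollectionMetadata = Record<string, string>;

/** Master metadata: collection name -> CID of the collection's metadata */
type TMasterMetadata = Record<string, string>;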

IPFS Storage Engine folder structure

The structure of the data folder is shown below:

├── helia
├── nodedata
│   ├── info
│   ├── peers
│   └── pkcs8
└── storage
    └── default

The helia folder holds the Helia node's information. The nodedata folder holds the IPFS node's information, including the node identity, peers, and additional info. The storage folder holds the data of our zkDatabase; every child folder of storage is named after a collection, and in this case we have only one collection, called default.

Metadata structure

There is a metadata file at the root of the storage folder that contains the index records for all children's metadata; we call it the master metadata.

{
  "default": "bafkreibbdesmz6d4fp2h24d6gebefzfl2i4fpxseiqe75xmt4fvwblfehu"
}

Here, default is the name of the collection and bafkreibbdesmz6d4fp2h24d6gebefzfl2i4fpxseiqe75xmt4fvwblfehu is the CID of the collection's metadata file. We use IPNS to point the current node's PeerID to the CID of the master metadata file, from which we can retrieve the CIDs of every collection's metadata file.
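
A hypothetical end-to-end lookup, assuming a Helia node together with the @helia/unixfs and @helia/ipns packages; in practice the PeerID would be the database node's, and all variable names here are illustrative:

import { createHelia } from "helia";
import { unixfs } from "@helia/unixfs";
import { ipns } from "@helia/ipns";

const helia = await createHelia();
const fs = unixfs(helia);
const name = ipns(helia);

// In practice this is the database node's PeerID; we use our own for the sketch
const peerId = helia.libp2p.peerId;

// 1. Resolve the PeerID to the CID of the master metadata file
const masterCID = await name.resolve(peerId); // early @helia/ipns returns a CID

// 2. Fetch and parse the master metadata
const chunks: Uint8Array[] = [];
for await (const chunk of fs.cat(masterCID)) {
  chunks.push(chunk);
}
const master = JSON.parse(Buffer.concat(chunks).toString("utf8"));

// 3. The CID of the `default` collection's metadata file
const collectionMetadataCID = master["default"];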

There is also a metadata file in each collection folder; we call it the collection metadata.

{
  "bbkscciq5an6kqbwixefbpnftvo34pi2jem3e3rjppf3hai2gyifa": "bafkreifnz52i6ssyjqsbeogetwhgiabsjnztuuy6mshke5uemid33dsqny"
}

You might notice that the key of the collection metadata is the Poseidon hash of the database document in base32 encoding, and the value is the CID of the document. The collection metadata is used to retrieve the CID of a document by its Poseidon hash. There is also a file in the collection folder named bbkscciq5an6kqbwixefbpnftvo34pi2jem3e3rjppf3hai2gyifa.zkdb that contains the content of the document, encoded with BSON.

BSON Document

BSON, or Binary JSON, is the data format that we use to encode and decode documents. Documents are grouped into collections.
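
A small round-trip sketch using the bson npm package, which is one common way to serialize and deserialize BSON documents in TypeScript:

import { BSON } from "bson";

const document = { something: "stupid" };

// Encode a document to BSON bytes before writing it to a collection
const bytes: Uint8Array = BSON.serialize(document);

// Decode the bytes back into a plain object after reading them from IPFS
const decoded = BSON.deserialize(bytes);
console.log(decoded); // { something: 'stupid' }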