overhide.io blog
21 May 2019
by Jakub Ner

Reference Implementation Deployment as of May 2019

This is a discussion of overhide deployments for services running in support of https://overhide.io and https://ohledger.com, as well as the future direction of components yet to be completed.

All overhide services are deployed on a Kubernetes cluster running in the cloud.

Remuneration API

The remuneration API implementations are simple standalone cloud-native / 12-factor services. The API doesn't require session state and API requests are round-robin load-balanced. The APIs depend on underlying ledgers for indexing of transactions; the services use managed data or leverage third parties for the indexing. The services themselves are stateless.
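
To make the shape of these deployments concrete, below is a minimal sketch of what a stateless, round-robin load-balanced set of replicas can look like as Kubernetes objects; the names, image, and ports are hypothetical placeholders, not our actual manifests.

# Hypothetical sketch: a stateless remuneration API as interchangeable replicas
# behind a round-robin Service (all names and the image are placeholders).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: remuneration-api
spec:
  replicas: 3                                 # any replica can serve any request
  selector:
    matchLabels:
      app: remuneration-api
  template:
    metadata:
      labels:
        app: remuneration-api
    spec:
      containers:
        - name: api
          image: example/remuneration-api:1.0 # placeholder image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: remuneration-api
spec:
  selector:
    app: remuneration-api
  ports:
    - port: 80
      targetPort: 8080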

overhide-ledger

The service is available at:

The deployment is modelled below.

Any cloud-native/12-factor Node.js instance (green) of the service can be hit with requests in any order as the instances are round-robin load-balanced and have no session state. The layer-7 load-balancer (nginx reverse-proxy) terminates SSL for incoming TCP and keeps persistent sessions on the cluster.
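
As a rough sketch, terminating SSL at a layer-7 nginx ingress in front of the instances can be expressed as follows; the host and secret names are hypothetical and the actual proxy configuration in our cluster may differ.

# Hypothetical sketch: nginx ingress terminating TLS for the service
# (API version as of Kubernetes releases current in 2019).
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: overhide-ledger
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  tls:
    - hosts:
        - ledger.example.org                  # placeholder host
      secretName: ledger-tls                  # TLS certificate + key
  rules:
    - host: ledger.example.org
      http:
        paths:
          - path: /
            backend:
              serviceName: overhide-ledger    # placeholder Service name
              servicePort: 80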

Each instance has simple HTTP liveness and readiness probing for Kubernetes cluster management.
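
A minimal sketch of such probing on the container spec is below; the probe path, port, and timings are assumptions for illustration.

# Hypothetical sketch: HTTP liveness and readiness probes for an instance.
containers:
  - name: ledger
    image: example/overhide-ledger:1.0        # placeholder
    ports:
      - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /status.json                    # assumed health endpoint
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 15
    readinessProbe:
      httpGet:
        path: /status.json
        port: 8080
      periodSeconds: 5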

Overall liveness probing and alerting is done by https://uptimerobot.com, which sends emails should problems arise in getting answers from the cluster.

The cluster is instrumented with Prometheus and visualization–for tuning and introspecting the cluster–is done with Grafana.
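
One common way to wire services into Prometheus is annotation-based scrape discovery, sketched below; whether our Prometheus configuration relies on these particular annotations is an assumption, and the metrics path and port are placeholders.

# Hypothetical sketch: pod-template annotations picked up by a typical
# annotation-driven Prometheus kubernetes_sd scrape configuration
# (a convention, not a Kubernetes built-in).
template:
  metadata:
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/path: /metrics            # assumed metrics endpoint
      prometheus.io/port: "8080"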

The ledger data is written to a managed database independent of the cluster. Replication, backups, and availability guarantees are as per our agreement with the database provider.
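
In 12-factor fashion the managed database's connection details reach the instances as configuration rather than code; a hypothetical sketch of injecting such credentials is below (the secret name, key, connection string, and environment variable are all placeholders).

# Hypothetical sketch: managed-database credentials provided to instances
# as 12-factor configuration.
apiVersion: v1
kind: Secret
metadata:
  name: ledger-db                             # placeholder
type: Opaque
stringData:
  connection-string: postgres://user:pass@db.example.com:5432/ledger  # placeholder
---
# ...consumed from the container spec:
env:
  - name: DATABASE_URL                        # assumed variable name
    valueFrom:
      secretKeyRef:
        name: ledger-db
        key: connection-string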

Future considerations, should the need arise based on cluster observation, are indicated in faded red. Since overhide-ledger has a high ratio of reads to writes, it may make sense to add database read-replicas and front them with a read-cache.

Managed database backups may not be granular enough to fully guarantee that all ledger-processed transaction receipts have been captured and none are lost during a database fail-over. As such, we run a logs server that writes out a log stream of all transactions. The logs server sits in our Kubernetes cluster. At the time of this writing, reconciling the database with the logs server would be a manual process in the event of an outage.

Ethereum

The service is available at:

The deployment is modelled below.

Any cloud-native/12-factor Node.js instance (green) of the service can be hit with requests in any order as the instances are round-robin load-balanced and have no session state. The layer-7 load-balancer (nginx reverse-proxy) terminates SSL for incoming TCP and keeps persistent sessions on the cluster.

Each instance has simple HTTP liveness and readiness probing for Kubernetes cluster management.

Overall liveness probing and alerting is done by https://uptimerobot.com, which sends emails should problems arise in getting answers from the cluster.

The cluster is instrumented with Prometheus and visualization–for tuning and introspecting the cluster–is done with Grafana.

The service is a simple abstraction leveraging https://etherscan.io. There is no data stored by the service.

Scaling

As of this writing the Node.js instances go straight to https://etherscan.io. As such, this remuneration provider is limited by the rate-limits imposed by https://etherscan.io.

If these rate-limits prove insufficient in the future, we will cache Etherscan results locally in our cluster: see the faded-red nodes in the diagram above. The cache database can be read-replicated as needed, based on observation, when scaling the cluster.

overhide.io B(ack-end)aaS Broker

The rest of this write-up is forward-looking, as this work is yet to be done.

The architectural choices that went into the overhide.io BaaS broker ecosystem attempt to balance simplicity of implementation with allowing for high availability and enabling scalability.

Broker instances store user data as files. Each user reads and writes their files through their single nominated broker instance: files are coupled to each broker as the broker’s state. A user nominates a single broker in the whole ecosystem–not just our cluster–for all of their data (see active stewardship).

A user works with one nominated broker in the whole ecosystem. A broker works with a single file-system volume. Therefore one or more users share a broker and share a broker’s data-storage capacity. A user’s maximum data-storage capacity is the broker’s file-system volume capacity: at this limit a single user is assigned exclusive use of a broker.

As such all the users are effectively sharded across broker instances.

The deployment is modelled below.

Kubernetes maintains regular liveness and readiness probing on the pod to ensure there is always a broker instance running and that it’s mounted to its file-system volume.

As with other services, overall liveness probing and alerting is done by https://uptimerobot.com, which sends emails should problems arise in getting answers from the cluster. Note that each broker instance is its own deployment with its own unique address through the load-balancer. As such each broker instance is uniquely identifiable from the outside and needs its own external probing.

Load-balancing is layer-7, terminating SSL as well as websocket connections for the brokers' wire-protocol. Each load-balancer IP is dictated by the broker-lookup peer mesh; hence the mesh itself is an important component in load-balancing. Each load-balancer is tuned to have a Kubernetes service (virtual IP) per broker-instance, allowing sufficient file-handles and ephemeral ports to support all web-socket connections within the cluster.
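
A sketch of what one Kubernetes service (virtual IP) per broker instance can look like is below; the labels and names are hypothetical.

# Hypothetical sketch: each broker instance gets its own Service, making it
# individually addressable through the load-balancer.
apiVersion: v1
kind: Service
metadata:
  name: broker-0                              # one such Service per broker instance
spec:
  selector:
    app: broker
    instance: "0"                             # label distinguishing this broker's pod
  ports:
    - port: 80
      targetPort: 8080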

The cluster is instrumented with Prometheus and visualization–for tuning and introspecting the cluster–is done with Grafana.

Single Pod Deployments; One Persistent Volume per Broker Instance

Within the Kubernetes cluster each broker instance runs in a single-pod deployment that stores user data on a one-to-one mapped, mounted file-system. The broker instance reads and writes files directly to its specific mounted volume. The file-system is provided by a Kubernetes Persistent Volume Claim (PVC). The architecture is constrained to allow PVCs in the ReadWriteOnce access-mode: the ReadWriteOnce constraint means only a single pod in the cluster can write to the volume mount at any given time. Designing for this constraint enables re-deployment with a good range of cloud providers and retains good performance, as this access-mode is well supported and performant.
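
A minimal sketch of the one-broker-to-one-volume pairing under the ReadWriteOnce constraint is below; the claim name, size, image, and mount path are placeholders.

# Hypothetical sketch: a ReadWriteOnce claim mounted by a single-replica broker.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: broker-0-data                         # placeholder
spec:
  accessModes:
    - ReadWriteOnce                           # only one pod may mount it read-write
  resources:
    requests:
      storage: 10Gi                           # assumed size
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: broker-0
spec:
  replicas: 1                                 # single pod, matching the RWO constraint
  selector:
    matchLabels:
      app: broker
      instance: "0"
  template:
    metadata:
      labels:
        app: broker
        instance: "0"
    spec:
      containers:
        - name: broker
          image: example/broker:1.0           # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/broker          # assumed mount path for user files
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: broker-0-data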

Additional Reading

Performance tests with DigitalOcean’s ReadWriteOnce PVCs as well as non-critical-path writes to IPFS can be found in the performance-of-go-ipfs repo.

An introduction to the file organization on the file-system can be found in the data decentralization write-up.

Storage

Each user (identity) has a specific broker instance assigned to them for data-storage. The specific broker’s address is stored in the broker-lookup peer mesh. This is a distributed hash-table that dereferences a user’s identification to a broker instance.

The broker-lookup peer mesh exists as a standard lookup mechanism for the whole ecosystem–irrespective of who the actual broker instance host is. A user can switch providers and have their data migrated out of one cluster and into another. An initial registration with a broker or a migration updates the lookup mesh.

As such, for our users the lookup mesh directs them to a broker instance inside our cluster. Data-access connections are forwarded through the load-balancer to the actual instance.

Backup

All broker state is stored in the file-system of the mounted volume and the reliability of the mounted volume is delegated to the cloud provider: these are managed volumes.

As an additional safeguard we have an opt-in IPFS integration. Persisted files can be eventually consistent on a decentralized persistence network. Please reference the data decentralization write-up for more information.

Failover

Broker instance pods undergo regular Kubernetes failover as per liveness probing.

Scaling

Storage Threshold – Multi-Tenant

In the normal case each broker is multi-tenant: many users share the broker to access their data.

Should a storage threshold be reached, some users need to be moved off the threshold-triggering broker instance and have their data relocated to a new broker instance. Likely a broker's users will be split evenly in half by data-storage size, with the users comprising half of the data moved off to a fresh node.

There is no service interruption in such a move. When a decision is made to move a user to another broker instance, an export+fill migration is enacted, except the migration is not dictated by a client from the outside but driven by the cluster from the inside. Instead of copying data over-the-top, the data is copied internally within the cluster.

The same migration can be done in the reverse direction: two or more near-empty broker instances can be consolidated into a single broker instance.

The migrations leverage the lookup mesh as per the export+fill migration write-up.

Storage Threshold – Single Tenant

This is the case where a single user's data fills up a storage volume, is reaching its storage threshold, and it is not feasible to simply increase the file-system PVC size with the cloud provider hosting the cluster.

Looking at the file folder organization as per the decentralization write-up we can imagine many opportunities for sharding a user’s data between multiple file-system volumes.

Compute Threshold

Broker instances should be I/O-intensive rather than compute-intensive; as such, reaching this threshold and necessitating compute scaling is not anticipated.

Should a broker instance reach a compute threshold, the easiest solution would be to look into leveraging ReadWriteMany mounted volume semantics. Switching to ReadWriteMany PVC access-mode would allow multi-pod deployments across multiple nodes, distributing compute load.
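
For illustration, the same claim re-expressed in ReadWriteMany mode is sketched below; this assumes a storage class that actually supports ReadWriteMany (e.g. an NFS-backed one), which is a provider-dependent assumption.

# Hypothetical sketch: a ReadWriteMany claim that a multi-replica broker
# deployment could mount across nodes, spreading compute load.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: broker-0-data
spec:
  accessModes:
    - ReadWriteMany                           # multiple pods may mount it read-write
  resources:
    requests:
      storage: 10Gi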