Introduction

(Document status: draft)

Apocryph is a decentralized network for verifiable, confidential, general compute. This network powers its unique ecosystem visualised in the diagram below:

Apocryph ecosystem

The ecosystem consists of autonomous parties exchanging value under the governance of blockchain technology. There are three fundamental groups of parties: Hardware Providers, End Users, and Application Publishers (Developers); and one group intrinsic to the network: Autonomous Applications.

Hardware Providers

They provide confidential computing services to the ecosystem and receive payment for them. The value chain of these confidential computing services is targeted towards the End User as its final destination.

End Users

They consume software in a private / confidential way based on a SaaS model and pay for the computing service they consume. End Users are the final destination within the ecosystem; however, upon the End User's explicit decision, there may be other end users outside the ecosystem using the Private SaaS, provided it supports some form of built-in multi-tenancy / multi-user capabilities (light blue bubble). In this way, the Private SaaS itself can give rise to its own market powered by the Apocryph ecosystem (a very basic example is when the End User is a company that runs a service for its employees).

Application Publishers (Developers)

They provide cloud software in the ecosystem, expanding the range of use cases that the End User can fulfill by using the ecosystem. It is important to note that the Application Publisher role can be shared with the End User role: the End User can bring a use case (cloud software) into the ecosystem, pay for its execution, and consume it. In a similar way, the Application Publisher role can be shared with the Hardware Provider role, where the Hardware Provider also brings in the use case / application service.

Autonomous Applications

They provide end-user or platform services in the ecosystem; these services are metered and paid for by the end-users or by other applications using them. They have their own wallet and an associated DAO governing their updates. They are typically packaged and launched in the network by Application Publishers, who typically represent the governing DAO. After the initial launch they are no longer dependent on the Application Publishers, as they can perform self-updates based on the governing DAO's decisions.

App Publishing

(Document status: mostly complete)

Much of the architecture of Apocryph revolves around two key pieces, the "Publisher Client" and the "Provider Client". A publisher is a buyer in the Apocryph network seeking to provision their pod/container on the network. A provider is a seller seeking to offer their hardware for rent.

The planned version of Apocryph takes care of matching the two and provisioning the pod on a specific target provider. While it offers a way to manually pick a provider, it defaults to automatically picking one for the user. However, it does not take care of operational issues that might subsequently arise; specifically, it does not take care of rescheduling pods when a provider becomes unavailable nor does it make any guarantees about uptime or availability of data—those will be handled in later versions. Such concerns are kept track of in the backlog.

Bird's-eye view

flowchart
  classDef Go stroke:#0ff
  classDef Chain stroke:#f0f
  classDef Lib stroke:#000
  classDef User stroke:#ff0

  subgraph Publisher machine
    RegistryPub[K8s Registry]:::Lib
    PubCl[Publisher Client]:::Go
  end
  External[External Clients]:::User
  Network[libp2p/IPFS]:::Lib
  subgraph TEE
    ProCl[Provider Client]:::Go
    KubeCtl[K8s Control Plane]:::Lib
    RegistryPro[K8s Registry]:::Lib
    Http[HTTP Facade]:::Lib
    App[Application Pod]:::User
  end
  subgraph On-chain
    RegistryC[Registry Contract]:::Chain
    PaymentC[Payment Contract]:::Chain
  end

  %% KubeCtl -- Execute --> ProCl
  
  RegistryC -- List Providers --> PubCl -- Create --> PaymentC -- Monitor --> ProCl
  RegistryPub -- App Container --> PubCl
  PubCl -- Encrypted App Container --> Network -- Execution request --> ProCl
  ProCl -- App Container --> RegistryPro --> App
  ProCl -- Configuration --> KubeCtl
  KubeCtl -- Execute --> Http -- Metrics --> KubeCtl -- Execute --> App
  External -- Requests --> Http -- Requests --> App

As a sequence of steps (an illustrative publisher-side sketch follows the list):

  1. The user starts the Publisher Client to deploy a container
  2. The Publisher Client collects the Pod Manifest
  3. The Publisher Client gets the list of providers from the Registry Contract
  4. The Publisher Client selects a provider, using the configured strategy (automatic or by asking the user to manually make a choice)
  5. The Publisher Client creates a Payment Contract and transfers the initial payment amount (possibly in parallel with step 6)
  6. The Publisher Client bundles up the Pod Manifest, any related resources, and the Payment Contract's address and sends them to the Provider Client over the Deployment sub-protocol, encrypted
  7. The Provider Client creates the relevant configurations for the Pod using the Kubernetes API, including an HTTP Scaler, an Application Pod, and Monitoring
  8. The Provider Client confirms receiving the manifest and resources
  9. When HTTP requests come in, the HTTP Scaler contacts the Kubernetes API in order to scale the Application Pod up
  10. The Application Pod is started using the configuration from earlier
  11. The Application Pod handles the incoming requests
  12. After a period of no requests the Scaler uses the Kubernetes API to scale the Application Pod down
  13. The Monitoring component keeps track of how many e.g. CPU-seconds the Application Pod has run for, and forwards these metrics to the Provider Client
  14. The Provider Client submits the metrics to the Payment Contract and is then able to claim the payment due
  15. Whenever the Payment Contract runs out of funds, the Provider Client removes the related configurations from Kubernetes
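
The publisher-side portion of steps 1-6 can be sketched in Go. The sketch below is purely illustrative: the Registry, PaymentContract, and ProviderConn interfaces are hypothetical stand-ins and do not correspond to the actual client APIs.

// Hypothetical illustration of steps 1-6; none of these types exist verbatim in the codebase.
package deploy

import (
	"context"
	"errors"
)

type Provider struct {
	Address   string
	Endpoints []string
}

type Registry interface {
	ListProviders(ctx context.Context) ([]Provider, error) // step 3: read the Registry Contract
}

type PaymentContract interface {
	CreateAndFund(ctx context.Context, provider Provider, amount uint64) (address string, err error) // step 5
}

type ProviderConn interface {
	ProvisionPod(ctx context.Context, manifest []byte, paymentAddress string) error // step 6
}

// DeployPod walks through the publisher side of the bird's-eye view above.
func DeployPod(ctx context.Context, reg Registry, pay PaymentContract, dial func(Provider) (ProviderConn, error), manifest []byte, initialFunds uint64) error {
	providers, err := reg.ListProviders(ctx)
	if err != nil {
		return err
	}
	if len(providers) == 0 {
		return errors.New("no providers registered")
	}
	selected := providers[0] // step 4: stand-in for the configured selection strategy

	paymentAddress, err := pay.CreateAndFund(ctx, selected, initialFunds) // step 5
	if err != nil {
		return err
	}

	conn, err := dial(selected) // connect to the provider over libp2p/IPFS
	if err != nil {
		return err
	}
	return conn.ProvisionPod(ctx, manifest, paymentAddress) // step 6: send manifest + payment address
}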

Sequence Diagram

sequenceDiagram
  box
  actor User as User
  participant PubCl as Publisher Client
  end
  box
  participant RegistryC as Registry Contract
  participant PaymentC as Payment Contract
  participant IPFS as IPFS Network
  end
  box
  participant ProCl as Provider Client
  participant RegistryD as Docker Registry
  participant K8s as Kubernetes API
  participant Monitoring as Monitoring
  participant HTTP as HTTP Scaler
  participant App as Application Pod
  end
  %% 0.
  %% 1.
  User ->>+ PubCl: Pod Manifest
  User ->> PubCl: Pod Container
  %% 2.
  PubCl -->>+ RegistryC: List Providers
  RegistryC ->>- PubCl: 
  %% 3.
  PubCl ->> PubCl: Select Provider
  %% 4.
    %%PubCl ->>+ ProCl: Initial execution request
    %%ProCl -->>- PubCl: Confirmation
  PubCl ->> PaymentC: Create & Configure
  %% 5.
  PubCl ->>+ IPFS: Upload Encrypted Container
  PubCl ->>+ ProCl: Execution request
  %% 6.
  ProCl -->> IPFS: Download Container
  IPFS ->>- ProCl: 
  ProCl ->>+ PaymentC: Monitor
  ProCl ->> K8s: Configure Application & Scaler
  K8s ->>+ HTTP: Start Intercepting
  %% 7.
  ProCl -->>- PubCl: Confirmation
  PubCl -->>- User: 

  %% 8.
  User ->>+ HTTP: HTTP Request
  HTTP ->>+ K8s: Scale up
  %% 9.
  K8s ->>- App: Start w/ Secrets
  K8s ->>+ Monitoring: App up
  App ->>+ RegistryD: Fetch Image
  RegistryD ->>+ IPFS: 
  IPFS ->>- RegistryD: OCI Image
  RegistryD ->>- App: 
  App ->> App: Decrypt Image
  %% 10.
  HTTP ->>+ App: Request
  App ->>- HTTP: Response
  HTTP ->>- User: 
  %% 11.
  note over HTTP: Time passes
  HTTP ->>+ K8s: Scale down
  K8s -x- App: Stop
  Monitoring ->>- K8s: App down

  %% 12.
  ProCl ->>+ Monitoring: Get Metrics
  Monitoring ->>- ProCl: 
  %% 13.
  ProCl ->>+ PaymentC: Submit Metrics
  PaymentC -->>- ProCl: 

  %% 14.
  User ->>+ PaymentC: Unlock Funds
  PaymentC -->> User: 
  PaymentC -->>+ ProCl: 
  deactivate PaymentC
  opt insufficient funds left
  ProCl ->>- K8s: Remove Configurations
  K8s ->> HTTP: Stop intercepting
  deactivate HTTP
  end
  deactivate PaymentC

Application Package

(Document status: work-in-progress)

Manifest file

The manifest file, tentatively trustedpods.yml, is read by the Publisher Client and used to assemble the on-wire manifest that is then sent to the Provider Client. What follows is an example manifest file as it was initially conceptualized; the current implementation simply reads off a yaml-encoded protobuf object using the on-wire manifest format and does not follow the format below. (Use trustedpods init to get a sample file generated.)

# WARNING: Pseudo-code
type: "trustedpods"
version: "1.0"
containers:
- image: localregistryname:tag
  command: override command # or command: ["override", "command"] # $(VAR_NAME) as in K8s -- ENTRYPOINT
  args: override args # or args: ["override", "args"] # $(VAR_NAME) as in K8s -- CMD
  workingDir: /override/pwd/ # as in K8s
  port: 80 # HTTP port (must have only one per pod)
  host: example.com # HTTP hostname used to route requests to the container (must have only one per pod)
  ports:
  - 123:321 # port mapping, as in docker-compose
  - 123 # port mapping, as in docker-compose
  - port: 123 # as in K8s services
    targetPort: 321 # as in K8s services
    protocol: TCP # or UDP, as in K8s
    hostIP: false # request that the port be exposed to the external world; otherwise it will be accessible only using k8s DNS
  env:
  - name: XX
    value: VAL
  volumes:
  - mountPath: /vol # as in K8s
    name: vol # alternatively - without name, copy the same fields from the volume definition here.
    readOnly: false
  resources:
    cpu: 1000m # in milliCPU; equivalent to K8s Requests
    memory: 1Gi
    nvidia.com/gpu: 1
replicas:
  min: 0
  max: 1
volumes:
- name: vol
  type: volume # or emptyDir or secret
  resources: # for type: volume
    storage: 8Gi
  source: ./publisher/local/file.json # for type: secret

Note that the Publisher Client might eventually gain functions for reading other kinds of manifests, such as for directly consuming docker-compose files.
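
For illustration, reading such a yaml-encoded protobuf manifest could look roughly like the following sketch; the pb.Pod message name and import path are assumptions rather than the exact identifiers used in the repository.

// Sketch: read a yaml-encoded protobuf pod manifest (message name and import path are assumed).
package manifest

import (
	"os"

	"google.golang.org/protobuf/encoding/protojson"
	"sigs.k8s.io/yaml"

	pb "example.com/trustedpods/proto" // hypothetical import path for the generated protobufs
)

func ReadManifest(path string) (*pb.Pod, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	jsonData, err := yaml.YAMLToJSON(data) // protojson only understands JSON, so convert first
	if err != nil {
		return nil, err
	}
	pod := &pb.Pod{}
	if err := protojson.Unmarshal(jsonData, pod); err != nil {
		return nil, err
	}
	return pod, nil
}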

Application Registry

The Registry connects publishers and providers in a decentralized environment, enabling service discovery and competitive, transparent pricing.

Smart contract

  • Holds a list of providers
  • Each provider has a name, region(s), contact information, request endpoint(s), and attestation details
  • Providers can submit their own custom pricing tables
  • Each pricing table is identified by a unique ID
  • An event is emitted for each submission of a new pricing table (this could be valuable for publishers interested in exploring competitive pricing, or for providers looking to compete with one another)
  • Each pricing table is associated with a set of available providers
  • A provider can unsubscribe from a pricing table and move to another one; if a pricing table is left with no providers, it is deleted (a rough sketch of this data model follows the list)
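
As a rough illustration of the data the registry contract needs to track, here is a hypothetical Go mirror of that state; the field names are made up for the sketch and are not taken from the actual contract sources.

// Hypothetical Go mirror of the registry contract's storage layout; names are illustrative only.
package registry

type PricingTableID uint64

type Provider struct {
	Name         string
	Regions      []string
	Contact      string   // contact information
	Endpoints    []string // pod execution request endpoints
	Attestation  []byte   // attestation details
	PricingTable PricingTableID
}

type PricingTable struct {
	ID        PricingTableID
	Prices    map[string]uint64 // resource name -> price per unit
	Providers []string          // provider addresses subscribed to this table; the table is deleted when empty
}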

Provider

  • Gets the list of available pricing tables from the registry
  • Creates or retrieves attestation details
  • Registers in the contract by providing the following information:
    • Contact information
    • Name
    • Pod execution request endpoints
    • Attestation details
    • Available region(s) for pod execution
    • Support for edge computing (optional)
  • Chooses an existing pricing table or creates a custom one

Publisher

  • Retrieves the list of available pricing tables from the contract
  • Creates a configuration which includes:
    • The chosen pricing table
    • The region(s) in which the pod will be hosted
    • Optionally, the amount of funds the publisher is prepared to allocate to the pod, or the desired duration of its execution; the client application will then automatically propose pricing table(s) tailored to that preference
  • If automatic selection is enabled, the client selects a provider filtered by the configuration
  • The publisher pings the selected provider and checks its availability (see the selection sketch after this list)
    • If the provider is offline, the publisher iterates through the provider list associated with the pricing table until it identifies an available provider
  • The publisher creates a payment channel configured with the selected pricing table and initiates the pod execution request protocol
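
A hedged sketch of the selection loop described above; the types and the ping callback are stand-ins meant only to make the fallback behaviour concrete, not the real client code.

// Illustrative provider-selection loop; Ping and the types are stand-ins, not the real client code.
package selection

import (
	"context"
	"errors"
)

type Provider struct {
	Address string
	Regions []string
}

type PricingTable struct {
	ID        uint64
	Providers []Provider
}

// SelectProvider walks the providers subscribed to the chosen pricing table and
// returns the first one that serves a wanted region and responds to a liveness check.
func SelectProvider(ctx context.Context, table PricingTable, wantedRegions []string, ping func(context.Context, Provider) error) (Provider, error) {
	for _, p := range table.Providers {
		if !servesRegion(p, wantedRegions) {
			continue
		}
		if err := ping(ctx, p); err != nil {
			continue // provider offline: fall through to the next one in the table
		}
		return p, nil
	}
	return Provider{}, errors.New("no available provider for the selected pricing table")
}

func servesRegion(p Provider, wanted []string) bool {
	if len(wanted) == 0 {
		return true // no region preference configured
	}
	for _, r := range p.Regions {
		for _, w := range wanted {
			if r == w {
				return true
			}
		}
	}
	return false
}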

Application Deployment

(Document status: barebones)

When the Publisher Client connects to the Provider Client, it makes use of a libp2p connection, likely through the IPFS DHT, unless the Provider has advertised a stable IP earlier. These connections use the protocols /apocryph/attest/0.0.1, /apocryph/provisioning-capacity/0.0.1, and /apocryph/provision-pod/0.0.1, which are based on Protobuf definitions in the :/proto/ folder.

The basic structure of this protocol is the following:

  1. The Publisher requests an attestation from the Provider using the /apocryph/attest/0.0.1 libp2p protocol.

  2. The Provider replies with an attestation (and optionally, the resource capacity available), proving that the whole Provider stack (including the endpoint of the current stream) is running inside a TEE which is trusted by the Publisher.

  3. Optionally, the Publisher inquires about the resources that will be requested, using /apocryph/provisioning-capacity/0.0.1. Resource requirements can include amounts of CPU cores, RAM memory, GPU presence, specific CPU models, and even certain numbers of external IPs available.

  4. Optionally, the Provider replies with the resources that would be offered / that are available; along with the prices (and payment address) at which the Provider is willing to offer those.

  5. The Publisher sends a message that includes the on-wire Manifest and payment channel information using /apocryph/provision-pod/0.0.1.

  6. The Provider provisions the requested services and replies with a status message (see the connection sketch below).
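
For instance, opening the provisioning stream from the Publisher side might look roughly like the sketch below; only the go-libp2p calls are real API, while the peer addressing and the actual message exchange are simplified placeholders.

// Sketch: dialing a provider over libp2p and opening the provisioning protocol stream.
package deployment

import (
	"context"

	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/core/peer"
)

const (
	AttestProtocol    = "/apocryph/attest/0.0.1"
	CapacityProtocol  = "/apocryph/provisioning-capacity/0.0.1"
	ProvisionProtocol = "/apocryph/provision-pod/0.0.1"
)

func ProvisionPod(ctx context.Context, providerAddr string, manifest []byte) error {
	h, err := libp2p.New()
	if err != nil {
		return err
	}
	defer h.Close()

	addrInfo, err := peer.AddrInfoFromString(providerAddr) // e.g. a /p2p/... multiaddr from the registry
	if err != nil {
		return err
	}
	if err := h.Connect(ctx, *addrInfo); err != nil {
		return err
	}

	stream, err := h.NewStream(ctx, addrInfo.ID, ProvisionProtocol)
	if err != nil {
		return err
	}
	defer stream.Close()

	_, err = stream.Write(manifest) // in practice this is a length-prefixed Protobuf message, not raw bytes
	return err
}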

Manifest wire format

When the pod manifest is transferred between the Publisher Client and Provider Client, it uses a modified version of the usual manifest format, based on protocol buffers, in order to reduce the ambiguity of using YAML.

Main changes:

  • There are no longer multiple ways to define volumes; all volumes must be in an array at the end.
  • There are no longer multiple ways to define ports; all ports use the (port, targetPort, protocol, hostIP) fields.
  • Likewise for the command and args fields.

Finally, the image field is possibly converted to an IPFS hash and key, containing respectively the output of imgcrypt uploaded with ipdr and the private key that can be used to decrypt it.
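
As an illustration of the port normalization, the sketch below converts the docker-compose-style shorthands into the single wire representation; the Port struct is a stand-in for the corresponding Protobuf message.

// Sketch: normalizing the shorthand port syntaxes from the manifest file into the single wire representation.
package wire

import (
	"fmt"
	"strconv"
	"strings"
)

// Port mirrors the wire format's single way of expressing a port mapping.
type Port struct {
	Port       int32
	TargetPort int32
	Protocol   string // "TCP" or "UDP"
	HostIP     bool   // whether to expose the port externally
}

// parseShorthand turns "123" or "123:321" (docker-compose style) into the wire representation.
func parseShorthand(s string) (Port, error) {
	parts := strings.SplitN(s, ":", 2)
	p, err := strconv.ParseInt(parts[0], 10, 32)
	if err != nil {
		return Port{}, fmt.Errorf("invalid port %q: %w", s, err)
	}
	port := Port{Port: int32(p), TargetPort: int32(p), Protocol: "TCP"}
	if len(parts) == 2 {
		t, err := strconv.ParseInt(parts[1], 10, 32)
		if err != nil {
			return Port{}, fmt.Errorf("invalid target port %q: %w", s, err)
		}
		port.TargetPort = int32(t)
	}
	return port, nil
}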

Pricing

(Document status: wip)

A Pricing Table defines the terms under which a given provider is willing to rent its compute resources for running pods. The Pricing Table is typically represented using a Protobuf object encoding the prices for reserving or using certain resources.

resource | typical usage | suggested/example metrics
--- | --- | ---
cpu | reserve (while application is running) | kube_pod_resource_request{pod, resource=cpu}
memory | reserve (while application is running) | kube_pod_resource_request{pod, resource=memory}
storage | reserve (incl. while application is stopped) | kubelet_volume_stats_available_bytes{namespace}
nvidia.com/gpu(|shared) | reserve | kube_pod_resource_request{pod, resource=nvidia.com/gpu(|.shared)}
apocryph.network/ip | reserve | kube_service_spec_type{namespace, type=NodePort}
kubernetes.io/(in|e)gress-bandwidth | usage | nginx_ingress_controller_nginx_process_(read|write)_bytes_total

The pricing table must encompass all essential billing details for clients, and it might be as straightforward as this:

Resource | Description | Price
--- | --- | ---
CPU | N° of Cores / N° of vCPUs | $0.0001 per vCPU (min/s/ms)
RAM | Capacity (e.g. GB) | $0.00001 per GB (min/s/ms)
Storage | Type (e.g., Block, Object) | $0.00001 per GB (min/s/ms)
GPU (optional) | Model | $0.0001 per execution (min/s/ms)

Or it could be split into categories with more detailed information:

Compute Pricing

  • CPU

    Resource | Description | Number of Cores | vCPUs | Model | TEE Type | Price per Unit
    --- | --- | --- | --- | --- | --- | ---
    CPU | Processing power | Cores | vCPUs | Intel, AMD, ARM, etc. | Enclaves, CVMs, etc. | $0.0001 per vCPU (min/s/ms)
  • RAM

    Resource | Description | Capacity | Price
    --- | --- | --- | ---
    RAM | Memory capacity | Capacity (e.g. 1GB) | $0.00001 per GB (min/s/ms)

Storage Pricing

Resource | Description | Capacity | Storage Type | Price per Unit
--- | --- | --- | --- | ---
Storage | Storage resources | Capacity (e.g. 10GB) | Block, Object, etc. | $0.00001 per GB (min/s/ms)
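
To make the billing arithmetic concrete, here is a minimal sketch; the per-unit-second price representation is an assumption, not the actual Protobuf pricing table encoding.

// Sketch: computing the amount owed from metered resource usage and a per-unit price list.
package pricing

// PriceTable maps a resource name to its price per unit-second (e.g. per vCPU-second, per GB-second).
type PriceTable map[string]float64

// Usage is the integral of each resource over the billing period, in unit-seconds.
type Usage map[string]float64

func AmountOwed(prices PriceTable, usage Usage) float64 {
	total := 0.0
	for resource, amount := range usage {
		total += prices[resource] * amount // resources without a listed price cost nothing in this sketch
	}
	return total
}

// Example: 0.5 vCPU for an hour at $0.0001 per vCPU-second:
//   AmountOwed(PriceTable{"cpu": 0.0001}, Usage{"cpu": 0.5 * 3600}) == 0.18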

Autoscaler

The Autoscaler autonomous application is designed to enable automatic redeployment across various providers, ensuring flexibility and resilience in deployment strategies. Below is a detailed explanation of its key features and functionalities:

Provider Selection Configuration

This feature allows users to configure criteria for selecting deployment providers and determining the number of instances to run. Users can specify detailed selection parameters to optimize deployment according to their requirements.

Deployment Modes

  1. Standalone Mode:

    • Operates as a sidecar container.
    • Accepts pod configuration(s) for the base protocol.
    • Utilizes the provider selection configuration and a self-contained wallet for redeployment.
  2. Network Mode:

    • Functions as a separate network or autonomous application.
    • Accepts the same pod and provider selection configurations, but likely through an additional specific protocol.
    • Includes a payment system to manage the autoscaler, which tracks per-application balances.

Routing and Service Discovery

The application provides robust service discovery and DNS hosting for the deployed applications. It includes the following capabilities:

  • DNS of Autoscaler Network:
    • Linked to a real domain, making it discoverable as an authoritative DNS nameserver.
    • Facilitates efficient routing and service discovery within the network.

Load Balancing

The application offers potential load balancing features, useful for redistributing load across application containers. Such a front-end balancer is required for "true" scale-to-zero, though one would typically rather have the application deployed on a provider and scaled to zero within that provider's environment.

Key Management

This system manages keys for decrypting application secrets, enabling secure and effective deployments. It ensures that all necessary security credentials are handled correctly during the deployment process and can be used by other applications requiring secure key management.

Key Management is a general, reusable component that could be split out into a separate application.

Marketplace

(Document status: draft)

The Marketplace is an autonomous application through which a publisher (the second B) can publish a software package (pod) without actually deploying it. Instead, users (X) can take the software and deploy it to providers (the first B) themselves (for a fixed price paid to the developer, a percentage of execution costs, or under some other agreement). The Marketplace allows end-users to browse different applications and hardware providers, which in turn enables the standard marketplace monetization strategies: sponsored apps, promotions, etc. This provides the marketplace with its own revenue stream.

To describe this behavior, we add one extra end user to the rest of the architecture: the "developer". The developer is the one creating pod images, while providers are still the people running Apocryph clusters, and publishers are still the people provisioning the exact pods on Apocryph providers.

Default implementation (provided by Apocryph)

If the developer were to upload a complete pod, we run into the problem that this won't allow the publisher to interact with or configure the leased software except through the remote connections exposed by that pod -- disallowing, crucially, small modifications meant to ease the usage of the leased software. Hence, in the proposed implementation, the leased code should be provided as a series of encrypted OCI images, and not as a complete encrypted pod manifest. (Developers are still encouraged to upload sample manifests in addition to the encrypted images.)

Furthermore, if the developer were to upload a pod image unencrypted, it would be trivial for users to download the image, repackage it, and then upload it as if they were the developers themselves -- sidestepping the whole Marketplace in the process. Therefore, the developer should upload the pod image encrypted, perhaps using the ocicrypt library.

To avoid running into the whole debate of DRM and Software Licensing, instead of letting developers attest providers' TEEs on-demand, we have developers attest providers beforehand, and then encrypt the image keys for those particular providers (or equivalently, and then upload the decryption keys to those providers).
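
Conceptually, "encrypting the image keys for those particular providers" amounts to sealing the symmetric image key to each attested provider's public key. The sketch below uses NaCl box purely for illustration; the actual implementation is expected to go through ocicrypt's key wrapping instead.

// Conceptual sketch only: wrap an image decryption key for an attested provider's public key.
package marketplace

import (
	"crypto/rand"

	"golang.org/x/crypto/nacl/box"
)

// WrapImageKey encrypts imageKey so that only the provider holding the matching private key
// (i.e. the previously-attested provider) can recover it.
func WrapImageKey(imageKey []byte, providerPub, developerPriv *[32]byte) ([]byte, error) {
	var nonce [24]byte
	if _, err := rand.Read(nonce[:]); err != nil {
		return nil, err
	}
	// Prepend the nonce so the provider can later decrypt with box.Open.
	sealed := box.Seal(nonce[:], imageKey, &nonce, providerPub, developerPriv)
	return sealed, nil
}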

sequenceDiagram
    participant Publisher
    participant Provider
    participant Developer
    participant Blockchain
    participant Docker/IPDR
    Developer->>Docker/IPDR: Upload encrypted image, referencing self
    Developer->>+Provider: Attest
    Provider->>-Developer: Report
    Developer->>Docker/IPDR: Publish decryption key,<br/>encrypted with Provider's key
    note over Developer: later...
    Publisher->>+Provider: Attest
    Provider->>-Publisher: Report
    Publisher->>Blockchain: Create payment with reference to Developer
    Publisher->>+Provider: Provision pod
    Docker/IPDR->>Provider: Download encrypted image info
    Provider-->>+Blockchain: Verify payment,<br/>including that it references Developer
    Blockchain-->>-Provider: 
    Provider->>Provider: Deploy
    Provider->>-Publisher: Deployment info
    note over Developer: later...
    Publisher->>+Provider: Use deployed pod
    Provider->>-Publisher: Result
    Provider->>+Blockchain: Withdraw payment
    Blockchain-->>-Provider: Funds
    Developer->>+Blockchain: Withdraw payment
    Blockchain-->>-Developer: Funds

In terms of what will happen in a realistic user story:

  • The Developer uses Apocryph tooling to upload and encrypt a Docker image they have built locally, selects a few providers (double-checking attestations) where it would be runnable, and then markets their unique software solution through whatever conventional channels they have at their disposal.
  • The Publisher (perhaps a "normal" user with a web browser, perhaps a power user using the CLI) finds out about the Developer's solution and includes it as a CID reference in their pod manifest (or uses a manifest premade by the developer). They configure any environment variables, volumes, resource quotas, and additional containers specific to their pod, and, once happy with the results, deploy the pod manifest as usual.
    • The CLI or browser interface warns the user that they are using the Marketplace and includes any related costs in the price breakdown, along with an explanation of what is being paid to the Developer and what to the Provider.
  • The Provider receives the pod manifest, and upon encountering the encrypted image, confirms that the uploaded image can be decrypted by them and that any payment requirements outlined in the image are met by the payment contract details specified in the request.
  • The Provider decrypts the container image and starts executing it as normal.
  • Upon the Provider submitting a withdraw request to the payment channel, part of the funds gets distributed to the Developer, in a way that allows the Developer to withdraw their part independently of the Provider's withdrawal.

Edge cases and attacks

  • If someone extends a developer's image with extra layers, they can trivially include binaries and modify commands in a way that would allow them to extract the whole contents of that image after it's been decrypted. Hence, the provider must double-check that the decryption keys and payment details refer to the same image id/hash.
  • If someone modifies the commandline used to start the image, they can similarly run commands that would extract the whole contents of the image, trivially. Hence, we should disallow changing the container's entrypoint when Marketplace is used (as well as warn developers that the arguments passed to the entrypoint can be customized by the publisher).
  • If a provider can only see a stale fork of ethereum, they might be able to run the developer's software without the developer getting paid... though in that case, the provider won't get paid either. This can be prevented through e.g. some variation of the uptime monitoring.
  • If someone uses containers from multiple developers in the same pod, the payment channel should be designed in a way that allows specifying multiple developers.

Road to MVP

Base Protocol

A Publisher connects to a Provider to run a pod. No understanding of a "network".

  • Pricing: Publisher requests pricing details and attestation from Provider. ✅
  • Payment: Publisher transfers payment to Provider, typically through on-chain escrow. ✅
  • Provisioning: Publisher communicates image details to the Provider, including exact resource requirements, external port/routing information, and any volumes/secrets. Also allows modifying an already running pod, likely with a restart. ✅ (also includes encryption of images/secrets with an extra layer of keys for easier upload-once-deploy-anywhere)
  • Monitoring: Publisher requests monitoring data from the Provider. ✅
  • Staking: Allows publishers to get repaid in case the provider goes dark. ⏳
  • Pared-Down Constellation: For providers, with monitoring and an admin interface. Addons? ⏳ (partially done)
  • Tooling for Publishers: To create and deploy pod configuration. ✅

Autoscaler Protocol

Once deployed, allows an app to automatically redeploy itself on various providers.

  • Provider Selection Configuration: Allows configuring criteria (still to be specified) for selecting providers to deploy to and for deciding how many instances to run. ⏳
  • Standalone Mode: Works as a sidecar container; accepts pod configuration(s) for base protocol and redeploys that using the provider selection config and a self-contained wallet. ⏳
  • Network Mode: Works as a separate network/autonomous application; accepts same pod config and provider selection config, but likely through an extra protocol specifically for that. ⏳
    • Payment: Allows paying the autoscaler; the autoscaler then keeps track of per-application balances. ⏳
  • Routing: Provides service discovery and hosts DNS for the applications deployed through it. ⏳
    • DNS of Autoscaler Network: Linked to a real domain where it's discoverable as a real authoritative DNS nameserver. ⏳
  • Potential Load Balancing: If one would rather have something redistributing the load in front of their application containers (required for "true" scale-to-zero; but probably not what one really wants - one would rather have things deployed on a provider and scaled to zero on that provider). ⏳
  • Key Management: Manages keys for application secret decryption (so it can actually deploy something). ⏳

Marketplace Autonomous Application

  • Primary Function: Exists as an autonomous application, where providers can get themselves listed, and publishers (/autoscalers) can query using various criteria (still to be specified). ⏳
  • Reverse Market: Applications get listed and providers can bid on them. ⏳
  • Integration: Integrated with publisher (tooling) and provider (addon? so it's outside the trusted base) out-of-the-box. ⏳

Backlog

This file seeks to document the tasks left to do in this repository, as well as design flaws and accumulated tech debt present in the current implementation.

Rationale for not shoving all of this into GitHub Issues: while Issues are a great way for users and contributors to voice concerns and problems, for huge planned milestones it feels simpler to just have them listed in a format more suited to conveying information than to discussing the minutiae.

Features yet to be implemented

  • Integrate attestation
  • Support private docker registries
  • Registry support in web frontend

Technical debt accumulated

Prometheus metrics used for billing

Status: Alternative prototyped

Prometheus's documentation explicitly states that Prometheus metrics are not suitable for billing since Prometheus is designed for availability (an AP system by the CAP theorem) and not for consistency / reliability of results (which is what a CP system would be).

Despite that, the current implementation uses Prometheus and kube-state-metrics to fetch the list of metrics that billing is based on. A prototype was created in the metrics-monitor branch to showcase an alternative way of fetching the same data from Kubernetes directly and avoiding any possible inconsistencies in the result; however, it was decided that it is better to iterate quickly with Prometheus first and come back to this idea later.
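
For reference, the Prometheus-based path boils down to instant queries against metrics like the ones listed in the Pricing section; roughly along these lines (the endpoint address and label selector are placeholders):

// Sketch: fetching a billing-relevant metric from Prometheus via its HTTP API.
package monitoring

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func QueryCPURequests(ctx context.Context, namespace string) error {
	client, err := api.NewClient(api.Config{Address: "http://prometheus.monitoring:9090"}) // placeholder address
	if err != nil {
		return err
	}
	promAPI := v1.NewAPI(client)

	query := fmt.Sprintf(`kube_pod_resource_request{namespace=%q, resource="cpu"}`, namespace)
	result, warnings, err := promAPI.Query(ctx, query, time.Now())
	if err != nil {
		return err
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(result) // per-pod CPU requests, to be multiplied by running time and the pricing table
	return nil
}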

A single monolithic tpodserver service

Status: Correct as needed

Currently, the whole of the Apocryph Provider client/node is implemented as a pair of long-running processes deployed within Kubernetes -- one listening for incoming Pod deployments and one monitoring them. Going forward, it could be beneficial to make more parts of that service reusable by splitting off libp2p connections, actual deployments, metrics collection, and smart contract invoicing into their own processes/services that can be changed or reused on their own.

Payment contract is one contract and not multiple

Status: Still evaluating, alternative prototyped

The payment contract currently takes care of absolutely all payments that pass through Apocryph. However, it might be worth splitting it into a factory/library contract and small "flyweight" contracts instead. That approach is prototyped in the contract-factory branch, but it ended up using way more gas for deployment, so it was temporarily scrapped.

Using Kubo/IPFS p2p feature marked experimental

Status: Requires research

Kubo's p2p API is marked as an experimental feature, and is predictably rather finicky to work with. Moreover, it may very well be removed one day, with or without alternative, as is happening with the pubsub feature.

As such, it would be prudent to move away from using the p2p features of Kubo (and away from requiring Kubo-based IPFS nodes), and instead roll out an alternative, likely based on libp2p. This will likely be easier once the planned Amino/DHT refactor lands.

ipfs-p2p-helper is a sidecar

Status: Correct as needed

Currently, the ipfs-p2p-helper is a small piece of code responsible for registering p2p listeners in Kubo. Doing so is a bit tricky, as the Kubo daemon does not persist p2p connections between restarts, and hence we have to re-register them every time the IPFS container restarts.

This is currently done using a sidecar container (a container in the same pod), so the helper gets restarted together with IPFS -- and to top that off, it just watches the list of Services for ones that are labeled correctly. Ideally, if we keep using the p2p feature of Kubo, we would rewrite ipfs-p2p-helper to be a "proper" Kubernetes operator with a "proper" custom resource definition.
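
In essence, the current sidecar amounts to a label-filtered watch over Services, roughly like the sketch below; the label key and the listener-registration callback are placeholders, not the helper's actual code.

// Sketch: watching labeled Services and (re-)registering Kubo p2p listeners for them.
package helper

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func WatchServices(ctx context.Context, clientset *kubernetes.Clientset, registerListener func(*corev1.Service) error) error {
	watcher, err := clientset.CoreV1().Services("").Watch(ctx, metav1.ListOptions{
		LabelSelector: "apocryph.network/p2p-forward=true", // placeholder label
	})
	if err != nil {
		return err
	}
	defer watcher.Stop()

	for event := range watcher.ResultChan() {
		svc, ok := event.Object.(*corev1.Service)
		if !ok {
			continue
		}
		// Re-register the p2p listener every time a matching Service changes.
		if err := registerListener(svc); err != nil {
			fmt.Println("failed to register p2p listener for", svc.Name, ":", err)
		}
	}
	return nil
}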

Custom HTTP client implementation in web frontend

Status: Correct as needed

Currently, the Libp2p Connect transport implemented in the repo ends up reimplementing a whole HTTP client, just for the sake of sending ConnectRPC messages over a libp2p connection. This is not ideal, as HTTP clients are notoriously complicated to implement correctly, and while it's unlikely that ours is rife with vulnerabilities, it's also unlikely that implementing one ourselves is the best way forward.

The two main options here would be to either drop ConnectRPC completely and implement framing ourselves (thus reimplementing ConnectRPC/gRPC while still using Protobufs for the message serialization itself), or to use an existing implementation of the HTTP client, such as Node's HTTP package. Alternatively, if we use the Kubo/IPFS p2p feature instead of importing libp2p into the browser, we might be able to directly use ConnectRPC with the correct port numbers, at the cost of losing encryption and (currently) authenticity of the requests, unless the user is running their own Kubo node.

Secret encryption done with AESGCM directly

Status: Correct as needed

Currently, we encrypt secrets' data (see EncryptWith/DecryptWith) with AES-GCM directly, forgoing any libraries that could do this for us and give us a more generic encrypted package. Ideally, given that the rest of the code uses go-jose, we would use go-jose's encryption facilities directly -- however, JWE objects base64-encode the whole ciphertext, making them ~33% less space-efficient on the wire! Hence, we opt to write the bytes out ourselves and save some space.
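
For context, the direct approach is essentially the standard-library AES-GCM pattern; the sketch below is generic and not the exact EncryptWith implementation.

// Sketch: sealing secret data with AES-GCM, prepending the nonce to the ciphertext.
package secrets

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
)

func encrypt(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Store nonce || ciphertext, avoiding the base64 overhead a JWE wrapping would add.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}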

Some ways to improve the situation would be to contribute BSON functionality to go-jose (unfortunately, such functionality would not be standards-compliant unless someone goes the whole way and proposes BSON (or another binary) serialization for RFC 7516), to switch to using PKCS#11 instead of JSON Web Keys, or to implement our own key provider for ocicrypt (which was the reason to start using JSON Web Keys in the first place), perhaps one based on ERC-5630. Alternatively, we could look into other standards for storing encrypted secrets, such as IPFS/Ceramic's dag-jose, WNFS, or any of the other nascent IPFS encryption standards.

Code duplication in cmd/trustedpods

Status: Correct as needed

A lot of the code in cmd/trustedpods has to do with setting up the environment for things like Ethereum connections, IPFS connections, deployment files, provider addresses, etc., and even IPFS uploads are in a sense a dependency of sending requests to the provider. It would be nice if we could express everything as a pipeline of dependencies that each inserts its own flags into the command parser and then gets processed in turn so as to create the whole desired environment in the end.

This has been attempted in the past (outside of Git history), but the result was even less manageable. Perhaps this is something cobra is not particularly well suited for, and an additional (homegrown?) dependency management system would help. Either way, the code duplication is not horrible, and the repo will survive as-is for a long time before it becomes problematic.

Missing features

Constellation cluster recovery not handled

Status: Solutions outlined

Constellation, the confidential Kubernetes solution we have opted to use, works by bootstrapping additional nodes on top of an existing cluster through its JoinService -- whereby a new node asks the old node's JoinService for the keys used to encrypt Kubernetes state, while the old node confirms the new node's attestation through aTLS. This makes it excellent for autoscaling scenarios; however, in the case of a full-cluster power outage, it leaves the cluster in a hung state, as there is no initial node to bootstrap off of, and requires manually re-attesting the cluster and inputting the key that was backed up when the cluster was initially provisioned -- as documented in the recovery procedure documentation.

For Apocryph, however, we cannot trust the provider with a key that decrypts the whole state of the cluster - as that will destroy the confidentiality of the pods running within Apocryph. Hence, when recovering an existing cluster, or when initially provisioning a cluster, we would need a securely-stored key that can only be accessed from an attested TEE that is part of the cluster.

There are multiple ways to do so. A simple one would be to generate and store the key within a TPM, making sure the TPM only reveals the key to the attested TEE; this still leaves attesting that the key was generated there as an open task. Another would be to modify Constellation to allow the master secret to be stored encrypted with the TEE's own key (inasmuch as one exists), so that the same machine, when rebooted, can bootstrap on its own. And finally, a more involved solution would be to use Integritee or an equivalent thereof to generate and store cluster keys in a cloud of working attested enclaves.

Apocryph cluster attestation

Status: Correct as needed

Constellation allows attesting a cluster... however, upon closer inspection, the attestation features provided only allow attesting that the whole machine is running a real Constellation cluster in a real TEE enclave, and say nothing about the containers running inside that cluster. This is only fair, perhaps, given that the containers can be configured in ways that could allow them to escape the confines of their sandboxes; however, it does mean that attestation, if implemented, will not be sufficient to convince the publisher that the peer they are talking to is an Apocryph node.

The main solution to this, other than switching away from Constellation (to, e.g., Confidential Containers, despite them not being fully ready yet), would be to modify the base Constellation image so that it includes an additional API, running either inside or outside a container, whose hash is verified in the boot process, and which allows querying, and hence attesting, the rest of the Kubernetes state. Alternatively, the image could be modified to attest the Apocryph server container as part of the boot process; however, this feels like too much hardcoding.

Apocryph cluster hardening

Status: Known issue

In line with the two notes above about Constellation's cluster recovery and attestation features, a third departure of an Apocryph cluster from what Constellation provides out of the box is the fact that Constellation issues an admin-level kubectl access token upon installation; however, we would like to keep parts of the Apocryph cluster inaccessible even to the administrator.

For that, we would likely need to issue a Kubectl access token with lesser privileges, allowing for only partial configuration of the Apocryph cluster. The customizable features should be selected carefully to align with Provider needs, to allow for things like configuring backups and some kinds of dashboards and monitoring, while minimizing the leaking of user privacy.

Storage reliability

See the respective document for an in-depth storage reliability design proposal.

Uptime reliability

See the respective document for a more in-depth uptime reliability design proposal.

Software licensing

See the respective document for a more in-depth software licensing design proposal.

Individual TEEs

Status: Needs more usecases

Currently, the architecture of Constellation uses a single TEE encompassing all Kubernetes pods running in the cluster. However, for extra isolation of individual tenants, it could be beneficial to have separate TEEs for each publisher / pod / application. To implement that, we will likely end up scrapping Constellation and revamping the whole attestation process. As this is quite a bit of design and implementation work while the gains at this stage are minimal, we have opted to let the idea ruminate for the moment.

Forced scale-down

Status: Conceptualized

It would be great if we didn't just rely on KEDA's built-in scaling down after a certain time, but also allowed Pods to request their own scaling down. See also this issue.