If the math doesn't render on the website, read this
Hi there, it’s Agnish again. After racking my brain over A LOT of topics, and even more GitHub issues, I am finally able to narrow down what I REALLY want to contribute to and work on for this EPF Cohort.
So, this week I have dived deep into Verkle trees and how the roadmap for Stateless Ethereum is being planned. This is an overall update on whatever I’ve learnt in this domain over the past week.
I started off with Vitalik’s first Verkle Tree post, where he repeatedly refers to the original Verkle Tree paper by Kuszmaul from 2018. He also discusses the merits of VKTs over Merkle trees, the most important one being the much smaller proof size: a Verkle proof only needs to include the index of the particular child and its path leading towards the root, rather than all the sibling hashes at every level. So here’s a basic comparison table stating the details of PROOF SIZES for both MKTs and VKTs:
| Trie | Contents of each proof | Levels | Proof size | Cryptographic requirements |
| --- | --- | --- | --- | --- |
| Merkle Patricia Trie | leaf data + 15 siblings of 32 bytes each, per level | ~7 | ~3.5 MB for 1,000 leaves | Collision-resistant hash functions |
| Verkle Trie | leaf data + polynomial commitment (32 bytes) + value (32 bytes) + index of child (1 byte) per level + small constant-size proof data | ~4 | ~150 KB for 1,000 leaves | Earlier (for the Eth1 state proposal): KZG polynomial commitments over the BLS12_381 curve; now: Pedersen commitments over the Bandersnatch curve |
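Just to sanity-check those numbers, here’s a rough back-of-envelope calculation in Python. The per-level and per-leaf byte counts are my own approximations lifted from the table above, not exact protocol constants:

```python
# Rough back-of-envelope check of the proof sizes quoted in the table above.
# The per-level / per-leaf figures are approximations, not exact protocol constants.

LEAVES = 1_000

# Merkle Patricia Trie: each proof carries ~15 sibling hashes of 32 bytes per level
mpt_levels = 7
mpt_bytes_per_leaf = mpt_levels * 15 * 32          # ~3,360 bytes per leaf
mpt_total = mpt_bytes_per_leaf * LEAVES            # ~3.4 MB for 1,000 leaves

# Verkle Trie: a 32-byte commitment and a 1-byte child index per level, a 32-byte value,
# plus a small constant-size opening proof (ignored here)
vkt_levels = 4
vkt_bytes_per_leaf = vkt_levels * (32 + 1) + 32    # ~164 bytes per leaf
vkt_total = vkt_bytes_per_leaf * LEAVES            # ~164 KB for 1,000 leaves

print(f"MPT proof size: {mpt_total / 1_000_000:.1f} MB")   # ~3.4 MB
print(f"VKT proof size: {vkt_total / 1_000:.0f} KB")       # ~164 KB
```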
In real-world scenarios, we can use a polynomial commitment as a vector commitment to generate the proof for the root of a Verkle Trie. Polynomial commitments let us hash a polynomial and make a proof for its evaluation at any point; we usually call that evaluation proof an opening.
First, we need to agree upon a standardized set of coordinates, let us call them $[c_1,c_2,c_3,…,c_n]$. Given a list of final points $[y_1,y_2,y_3,…,y_n]$, we can commit to the polynomial $P$ such that $P(c_i) = y_i$ for all $i ∈ [1…n]$. We can find this particular polynomial using Lagrange’s interpolation method.
This method is used to interpolate a given dataset and find the lowest possible degree polynomial that passes through the points of the dataset.
For values $[y_0,y_1,…,y_k]$ given at distinct coordinates $[x_0,x_1,…,x_k]$, the Lagrange interpolation is the linear combination
$$P(x) = \sum_{j=0}^{k} y_j\,\ell_j(x)$$
where $\ell_j(x)$ is the Lagrange basis polynomial
$$ \ell_j(x) = \prod_{\substack{0 \le m \le k \\ m \neq j}}\frac{x - x_m}{x_j - x_m}. $$
To understand this better, you can play around with this Desmos graphing calculator.
Here, I have plotted 4 arbitrary points, $a,b,c,d$, along with the Lagrange basis for point $a$.
This was the given polynomial, with $L_a$:
And hence, the curve would look like this:
This Lagrange Interpolation method can be used for computing polynomial commitments for both KZG and Bulletproof-style commitments (IPA).
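Before moving on to KZG, here’s a minimal Python sketch of the interpolation formula above. The four points are arbitrary made-up values (like the $a,b,c,d$ in the plot), just to show that the interpolant really passes through the data:

```python
# Minimal Lagrange interpolation: evaluate the unique lowest-degree polynomial
# passing through the given points at an arbitrary x.

def lagrange_eval(points, x):
    """points: list of (x_j, y_j) pairs with distinct x_j; returns P(x)."""
    total = 0.0
    for j, (xj, yj) in enumerate(points):
        # l_j(x) = product over all m != j of (x - x_m) / (x_j - x_m)
        lj = 1.0
        for m, (xm, _) in enumerate(points):
            if m != j:
                lj *= (x - xm) / (xj - xm)
        total += yj * lj
    return total

# Four arbitrary points, analogous to a, b, c, d in the plot above
pts = [(0, 1), (1, 3), (2, 2), (4, 5)]
print(lagrange_eval(pts, 1))    # 3.0 -- the interpolant passes through (1, 3)
print(lagrange_eval(pts, 3))    # value of the degree-3 interpolant between the data points
```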
VKTs for Eth1 were largely based on KZG polynomial commitment schemes. Initially, in a Merkle tree of depth $d$, we could compute a commitment to the vector of elements in the Merkle tree, which is $a_0,a_1,…,a_{2^d-1}$.
By the definition of polynomial commitments, we define a polynomial $p(X)$ of degree $n$ as the function:
$$p(X) = \sum_{i=0}^{n}p_iX^i$$
where $p_i$ are the coefficients of the corresponding polynomial.
Also, here the degree of the polynomial is tied to the depth of the tree: $n = 2^d - 1$. By assigning $a_i = p_i$, the prover wants to show the verifier that $p(z) = y$ for some $z$. The prover can do this naively by sending the verifier all the $p_i$, and the verifier can then check that indeed $p(z) = y$.
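Here’s a tiny sketch of that naive scheme, where the prover simply reveals all the coefficients and the verifier re-evaluates $p(z)$ itself; the toy prime field and the values of $d$ and $z$ below are for illustration only:

```python
# Naive version of the scheme above: the prover reveals all coefficients p_i
# and the verifier recomputes p(z) directly. A small toy prime field is used here.

FIELD_MODULUS = 2**31 - 1          # toy prime, NOT the real BLS12_381 scalar field

def eval_poly(coeffs, z, modulus=FIELD_MODULUS):
    """Evaluate p(z) = sum_i coeffs[i] * z**i using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * z + c) % modulus
    return acc

d = 3                              # tree depth
a = list(range(1, 2**d + 1))       # vector a_0 ... a_{2^d - 1} (toy data)
p = a                              # assign a_i = p_i, so deg(p) = 2^d - 1 = 7

z = 12345
y = eval_poly(p, z)                # prover's claim: p(z) = y

# Verifier, given all the p_i, checks the claim by recomputing p(z):
assert eval_poly(p, z) == y
```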
Let $\mathbb{G_1}$ and $\mathbb{G_2}$ be two elliptic curve groups with a pairing $e : \mathbb{G_1} \times \mathbb{G_2} \to \mathbb{G_T}$. Let $p$ be the order of $\mathbb{G_1}$ and $\mathbb{G_2}$, and let $G$ and $H$ be generators of $\mathbb{G_1}$ and $\mathbb{G_2}$ respectively. We can represent that in the following ways:
$$[x]_1 = xG ∈ \mathbb{G_1}$$
and
$$[x]_2 = xH ∈ \mathbb{G_2}$$
for any $x ∈ \mathbb{F}_p$.
Now, let us assume that we have a trusted setup, so that for some secret $s$, the elements $[s^i]_1$ and $[s^i]_2$ are available to both the prover and the verifier for all $i = 0,…,n-1$.
Now, because of the hardness of the elliptic-curve discrete logarithm problem, it’s impossible to find out what $s$ actually is from the trusted-setup group elements. It’s a number in the finite field, but the prover cannot actually recover the original number; they can only perform a certain set of group operations with the setup elements.
Hence, given a polynomial $p(X) = \sum_{i=0}^{n} p_iX^i$, the prover can compute the following:
$$[p(s)]_1 = \left[\sum_{i=0}^{n}p_is^i\right]_1 = \sum_{i=0}^{n}p_i[s^i]_1$$
Hence the commitment can be represented as a single group element.
In this way, everyone can evaluate the polynomial at $s$ without ever knowing what $s$ is; they only get an elliptic curve point such that $[p(s)]_1 = p(s)G$.
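Here’s a rough sketch of that computation in Python, assuming the py_ecc library for the BLS12_381 group operations (G1, add, multiply, curve_order). The “trusted setup” below is generated locally with a known $s$ purely for illustration, which is exactly what a real ceremony must prevent:

```python
# Toy KZG-style commitment: commit to p(X) as [p(s)]_1 = sum_i p_i * [s^i]_1.
# In reality the powers [s^i]_1 come from a trusted setup ceremony and nobody knows s.

from py_ecc.bls12_381 import G1, add, multiply, curve_order

# --- toy "trusted setup" (for illustration only: here we obviously know s!) ---
s = 123456789 % curve_order
n = 8
setup_g1 = [multiply(G1, pow(s, i, curve_order)) for i in range(n)]   # the points [s^i]_1

# --- prover commits to a polynomial using only the setup points ---
p_coeffs = [5, 3, 7, 1, 4, 3, 2, 9]                                   # coefficients p_i

commitment = None
for p_i, s_i_g1 in zip(p_coeffs, setup_g1):
    term = multiply(s_i_g1, p_i % curve_order)                        # p_i * [s^i]_1
    commitment = term if commitment is None else add(commitment, term)

# Sanity check (only possible because this toy setup leaked s):
p_at_s = sum(p_i * pow(s, i, curve_order) for i, p_i in enumerate(p_coeffs)) % curve_order
assert commitment == multiply(G1, p_at_s)                             # [p(s)]_1 = p(s)·G
```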
Pedersen commitments, which the current design uses instead, work differently. Let $g$ and $h$ be elements of $G_q$ such that nobody knows $\log_g(h)$. These elements can either be chosen by a trusted center when the system is initialized, or by (some of) the participants using a randomizing protocol.
This entire process DOES NOT need a trusted setup ceremony like KZG does.
Note that here $p$ and $q$ denote large primes such that $q$ divides $p-1$, $G_q$ is the unique subgroup of $\mathbb{Z^*_p}$ of order $q$, and $g$ is a generator of $G_q$. It can easily be tested whether an element $a ∈ \mathbb{Z^*_p}$ is in $G_q$, since $a ∈ G_q \iff a^q = 1$. As any element $b \neq 1$ in $G_q$ generates the same group, the discrete log of $a ∈ G_q$ with respect to the base $b$ is well defined, and it is denoted by $\log_b(a)$.
The committer commits herself to an $s ∈ \mathbb{Z_q}$ by choosing $t ∈ \mathbb{Z_q}$ at random and computing
$$E(s,t) = g^sh^t.$$
Such a commitment can later be opened by revealing $s$ and $t$. The following theorem is then very easy to prove, and shows that $E(s,t)$ reveals no information about $s$ at all, and that the committer cannot open a commitment to $s$ as some $s' \neq s$ unless she can find $\log_g(h)$.
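Here’s a toy Python version of this commitment with tiny hand-picked values ($p = 23$, $q = 11$, $g = 2$, $h = 3$) so the arithmetic is easy to follow; real instantiations of course use cryptographically large groups (or elliptic-curve groups like Bandersnatch):

```python
# Toy Pedersen commitment E(s, t) = g^s * h^t mod p over a tiny group.
# p and q are deliberately small so the numbers are easy to follow;
# real deployments use large primes or elliptic-curve groups.

import secrets

p = 23                      # prime (large in practice; tiny here for readability)
q = 11                      # prime with q | p - 1   (22 = 2 * 11)
g = 2                       # generator of the order-q subgroup G_q of Z_p^*
h = 3                       # another element of G_q; nobody should know log_g(h)

assert pow(g, q, p) == 1 and pow(h, q, p) == 1    # both really lie in G_q

def commit(s: int, t: int) -> int:
    """E(s, t) = g^s * h^t mod p."""
    return (pow(g, s, p) * pow(h, t, p)) % p

# The committer picks the secret s and a random blinding factor t
s = 7
t = secrets.randbelow(q)
c = commit(s, t)

# Later the commitment is opened by revealing (s, t); anyone can re-check it
assert c == commit(s, t)
print(f"commitment to s={s} with blinding t={t}: {c}")
```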
I am yet to read more about the merits of the Bandersnatch curve over the BLS12_381 curve, how it is a 42% performance improvement over the Jubjub curve, how it is implemented using the GLV multi-scalar multiplication method, and the deeper math behind how multiproofs are computed using Inner Product Arguments.
Hoping to cover all of that by this week! I hope to dive deeper and work on the Hyperledger Besu implementation of Verkle Tries in this GitHub repo, and to draft improvements to that repo in my project proposal as well. I am also looking forward to understanding more about State Expiry, Address Space Extension for several periods of data on Ethereum, and finally Portal Networks.
Happy Reading!