This document briefly describes the technical details of Advogato's trust metric.
The basic trust metric evaluates a set of peer certificates, resulting in a set of accounts accepted. These certificates are represented as a graph, with each account as a node, and each certificate as a directed edge. The goal of the trust metric is to accept as many valid accounts as possible, while also reducing the impact of attackers.
Advogato performs certification to three different levels: Apprentice, Journeyer, and Master. This is actually done by running the basic trust metric three times, using the "level" value in the certificate as a threshold. Thus, certification of Apprentices is computed using all certificates, while Master is computed using Master certificates only.
The computation of the trust metric is performed relative to a "seed" of trusted accounts. At the time of this writing (22 Feb 2000), the seed consists of raph, miguel, federico, and alan.
The core of the trust metric is a network flow operation. Informally, if there is a rich web of interconnections, flow reaches almost all the nodes. However, only a few accounts would be accepted from a large nest of bogus accounts, as long as there are only a few certificates from the "good" web to the bogus accounts. Those certificates represent a bottleneck in the network flow.
The remainder of this document presents the actual trust metric in more detail, as well as an argument for the security against attackers.
Mapping into graph
The mapping of certificates into a graph is dependent on a parameter: the certification level l. Each account on Advogato corresponds to a node in the graph. An edge exists from node s to node t when account s has certified account t at level l or higher.
In addition, there is a distinguished "seed" node, with predefined edges to accounts. These are currently configured in the mod_virgule source, file tmetric.c.
Assignment of capacities
The next step is to assign a capacity to each node in the graph. This is done by breath-first searching the graph from the seed, computing a shortest distance from the seed to each node.
Then, capacities are assigned simply as a function of this distance. Nodes closer to the root have high capacity, which diminishes with distance. Currently:
cap(0) = 800
cap(1) = 200
cap(2) = 200
cap(3) = 50
cap(4) = 12
cap(5) = 4
cap(6) = 2
cap(i) = 1 for i > 6
An example of the capacity assignment is shown below, although with much smaller capacities than used on Advogato:
Conversion into single source, single sink
At this point, the network flow problem is defined as a single source, multiple sink problem. In addition, capacities are specified on nodes rather than edges. Since standard network flow algorithms are specified as a single source, single sink problem with capacity constraints on edges, we modify the graph slightly. A new node, the "supersink" is added to serve as a single sink for the network flow algorithm.
Each node x is split into two nodes, x- and x+. For a node x with capacity c, an edge is added from x- to x+ with capacity c - 1. For each edge from s to t in the original graph, we add an infinite capacity edge from s+ to t- in the new graph. Finally, from each node x, we add a unit capacity edge from x- to the supersink node.
This conversion is shown graphically below:
Computation of network flow
At this point, computation of the network flow is fairly straightforward, using a standard algorithm to compute a maximum flow from the seed to the supersink, subject to the capacity constraints placed on the edges. An additional constraint is placed on the result: if there is flow on the x- to x+ edge, there must also be flow on the x- to supersink edge.
Happily, when using the standard Ford-Fulkerson algorithm (repeatedly finding an augmenting path through the residual graph until no such path exists), this constraint is easily satisfied: simply use the heuristic of always choosing the shortest augmenting path. The fact that this heuristic satisfies the constraint also proves that the maximum flow satisfying the additional constraint is equal to the maximum flow.
An example of such a flow is shown below:
Security proof
To set up the security proof, we need a model of the attack. Here, we present a simple one.
The nodes are split into three categories: good, confused, and bad. The bad nodes are under the attackers control. The confused nodes themselves represent valid accounts, but may contain certificates to the bad nodes. The good nodes are both valid accounts and have certificates only for other good nodes and confused nodes. This partition is shown graphically below:
In addition, we represent the capacity of node x as cx. We are now ready to prove the following theorem:
Theorem 1
The number of bad servers chosen by the above metric is bounded by the sum of (cx - 1) over each of the confused servers x.
Proof outline. Consider the cut that includes all edges from good or confused nodes to the supersink, and all edges from confused nodes to bad nodes. This is clearly a cut because there are no direct edges from good to bad nodes.
The total flow across this cut is equal to the total number of nodes chosen. Thus, the number of bad nodes chosen is equal to the total flow minus the number of good and confused nodes chosen. It follows that the flow from confused to bad nodes is equal to the number of bad nodes chosen.
The flow from a confused node x to bad nodes cannot exceed cx - 1, as the total flow into x is bounded by cx and the above-mentioned constraint requires that there be unit flow from x- to the supersink if there is any flow at all. The constraint also ensures that the flow from non-chosen confused nodes is zero. Thus, the total number of bad nodes chosen is bounded by the sum of (cx - 1) over each of the confused servers x. QED.
Note that the number of bad servers accepted depends only on the number of confused servers, not on the number of bad servers. Thus, spam attacks are frustrated.
Conclusion
The trust metric used in Advogato has a property not known in any previous trust metric: resistance to catastrophic failure in the face of a sufficiently massive attack. Instead, the number of bad nodes accepted scales linearly, and with a fairly small constant, with the number of certificates from valid accounts to bogus ones. It is also easy to compute efficiently and fairly simple to understand. As such, it should find applications in security infrastructures, as well as defining online communities, reliably excluding spammers, trolls, and other common annoyances.