
Load balancing topologies

An interesting property of Onion Services technology, from a service operator perspective, is that it allows for many possible network topologies.

First, because it's a portable technology, meaning that a service can be moved between servers just by copying its keys and configuration to anywhere the Tor network is reachable.

And second, because its execution can be split across multiple machines through the following approaches:

  1. By running many tor daemon instances in parallel to act as an Onion Service load balancing/failover layer.
  2. By splitting the Onion Services descriptor publisher from the actual backends.
  3. Combining both methods above, by running a mix of tor daemon instances and publishers.

We'll discuss each approach below. Note that load balancing with Onion Services is tied to the way Onion Services work: it depends on which introduction points a client picks to connect to, and these are made available through a descriptor document published to the Tor network. A descriptor usually lists many introduction points from a single tor daemon instance, so the load balancing strategy is based on either:

  1. Alternating between the currently published descriptors from different tor daemon instances, by simply running these instances in parallel.
  2. Including introduction points from different tor daemon instances in the same descriptor, by splitting the publisher process from the backend instances.
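For context, a single tor daemon sets up an Onion Service -- and hence its introduction points and descriptor -- from a couple of torrc lines. A minimal sketch, with illustrative paths and ports:

```
# torrc -- minimal v3 Onion Service (path and ports are illustrative);
# tor picks the introduction points and publishes the descriptor itself
HiddenServiceDir /var/lib/tor/my_service/
HiddenServicePort 80 127.0.0.1:8080
```

Both strategies below are ultimately ways of controlling which descriptor, listing which instances' introduction points, clients end up fetching.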

Running multiple instances in parallel

This is the simpler approach, consisting of running multiple tor daemon instances in parallel on different servers (or on multiple CPUs of the same server, with limited effectiveness, as discussed in the topologies document).



Caveats:

  • Every tor server needs to have a copy of the .onion private key, so if one server is compromised then your service is compromised.
  • This is not "full" load balancing, acting mostly as a simple failover; it may depend on the timing with which you start each of the tor daemons, plus a random internal timer on each tor instance, to ensure they (re)publish their descriptors at different times1.
  • The descriptor re-publishing interval on each instance is rather unpredictable, since it depends on the random interval timer -- which is specified as between 60 and 120 minutes -- or on any event that requires a descriptor to be republished, such as when Proof of Work is in use.
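The replication at the heart of this approach is just a byte-for-byte copy of the service directory to every server. A local sketch of the idea, using temporary directories and a stand-in key file instead of real servers:

```shell
# Sketch: replicate an Onion Service directory so that a second tor
# instance can serve the same address (all paths are illustrative;
# in a real deployment the copy would be an rsync/scp to another server)
src="$(mktemp -d)/my_service"
dst_parent="$(mktemp -d)"
mkdir -p "$src"
printf 'stand-in for the real key material' > "$src/hs_ed25519_secret_key"
cp -a "$src" "$dst_parent/"        # byte-for-byte copy, keys included
ls "$dst_parent/my_service"        # -> hs_ed25519_secret_key
```

Any tor instance pointed at an identical copy of this directory will serve the same .onion address -- which is also why a compromise of any one copy compromises the service.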

With Onionspray, you can:

  • Use hardmap to configure services.
  • And then copy/sync the whole project folder, with all configurations, keys, certificates etc., to another machine, and run all instances in parallel.
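As a sketch of that workflow, a hardmap project file in the Onionspray/EOTK configuration style might look like this (the project name and target domain are illustrative; check the Onionspray documentation for the exact syntax):

```
# demo.tconf -- illustrative hardmap project
set project demo
hardmap %NEW_V3_ONION% example.com
```

Once the project is configured and started, the generated project folder -- keys and certificates included -- is what gets synced to the other machines.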

Splitting the publisher from the backends

Right now this is achieved with Onionbalance, a tool that combines backend information into a single "superdescriptor" and publishes it to the Tor network, hence providing load balancing and redundancy by distributing requests to multiple backend tor instances.
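For illustration, an Onionbalance (v3) configuration is a small YAML file naming the frontend key and the backend instances; a minimal sketch with placeholder addresses:

```yaml
# config.yaml -- Onionbalance v3 sketch; addresses are placeholders
services:
- key: frontend.key            # main .onion key, kept only on the publisher
  instances:
  - address: backend1exampleaddress.onion
  - address: backend2exampleaddress.onion
```

The publisher fetches each backend's descriptor, merges their introduction points into the superdescriptor, and signs it with the frontend key -- so only the publisher machine ever holds the main .onion key.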


Pros:

  • Fully implements load balancing/failover.
  • Provides better isolation of the main .onion keys, reducing the attack surface.


Cons:

  • The setup is harder, although Onionspray can do some of the heavy lifting.
  • As of January 2024, it does not support the Proof of Work anti-DoS defense (PoW), but this is being planned; it works with the other DoS protections, though.
  • As of June 2019, some instabilities in Onionbalance have made it hard to run on recent GNU/Linux distributions because of a stale Python crypto library, so Onionbalance is currently deprecated in Onionspray until these issues are fixed.

With Onionspray, you can:

  • Use softmap to configure a service to be used with Onionbalance.
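As a sketch in the Onionspray/EOTK configuration style (names are illustrative; check the Onionspray documentation for the exact syntax):

```
# demo.tconf -- illustrative softmap project, for use with Onionbalance
set project demo
softmap %NEW_V3_ONION% example.com
```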

Combining both methods

You can combine both approaches in hybrid setups, such as:

  • Using softmap so your services rely on Onionbalance.
  • Replicating the whole project folder to other servers, and running Onionspray (and hence Onionbalance) in parallel.

The number of possible arrangements can easily get very complicated, and the topologies page shows some examples.

Which one to choose?

It's hard to tell what's best for every scenario. Onionbalance is the best candidate, unless you plan to deploy PoW.

If you prefer PoW over the other advantages offered by Onionbalance, you may want to start with the simpler method instead.

You can also switch anytime from one approach to the other, without disrupting the service.

As for the number of nodes, that will depend mostly on load/requests.

Specifically for Onionspray, the current recommendation is to avoid Onionbalance and use hardmap for the moment, until its issues are fixed, and to run multiple parallel instances when load balancing is needed. But consider Onionbalance if your service is under sustained high bandwidth and clearly demonstrates extended throughput choking that the simpler load balancing method cannot distribute well enough.

  1. See the discussion in the "Periodically republish the descriptor" issue.