Load balancing

rustunnel supports group-based load balancing for HTTP and TCP tunnels. Multiple clients can register against the same subdomain (HTTP) or share the same TCP port (TCP), and inbound connections are dispatched at random across healthy members of the group. Optional client-side health probes automatically remove sick backends from the rotation. The design is modeled on FRP’s loadBalancer.group / healthCheck config — same shape, slightly different wire format.

Concepts

  • Group — a logical pool of tunnel members sharing the same subdomain (HTTP) or TCP port. Identified by a user-supplied group name plus a shared group_key. The server stores only the SHA-256 hash of the key and uses it to authorise joins; the raw key never leaves the client.
  • Member — one tunnel inside a group. Running two clients with the same (group, group_key) produces a 2-member pool.
  • Health bit — every member has a healthy flag. Dispatch routes around members whose flag is false. Without a health check configured, members are permanently healthy (the server trusts the client’s presence).
  • Dispatch — for each new public connection, the server picks one healthy member uniformly at random. There’s no weighting and no sticky sessions today.
                          +--> client A --> backend on :3000
public --> group "web" ---+
                          +--> client B --> backend on :3001

Configuration

Server (server.toml)

A single flag is the kill switch:
[load_balancing]
enabled = true
When false (the default), the server accepts the new fields on the wire but ignores them — every registration is a solo tunnel. When true, members sharing (subdomain, group_key_hash) (HTTP) or (group_name, group_key_hash) (TCP) form a real pool.
The switch is per-region in self-hosted multi-region deployments. Flip it on one region at a time during a rollout — false is the safe default that preserves single-tunnel-per-key behaviour.

Client (~/.rustunnel/config.yml)

Add group, group_key, and optionally health_check to a tunnel definition:
server: tunnel.example.com:4040
auth_token: "your-token"

tunnels:
  a:
    proto: http
    local_port: 3000
    subdomain: pool
    group: web
    group_key: shared-secret-for-this-pool
    health_check:
      type: tcp
      interval_secs: 10
      timeout_secs: 3
      max_failed: 3
| Field | Required | Default | Meaning |
|---|---|---|---|
| `group` | yes (for LB) | | Display name of the pool. The first joiner sets TunnelGroup.name; later joiners are accepted regardless of what they pass. |
| `group_key` | yes (for LB) | | Shared secret. SHA-256-hashed before transmission. Members of one pool MUST agree on this value; the server rejects a join with a mismatched key. |
| `health_check.type` | no | | `tcp` (open a connection) or `http` (issue a GET). Omit to disable probing. |
| `health_check.path` | yes when `type: http` | | Path to GET against the local service. |
| `health_check.interval_secs` | no | 10 | Probe period. |
| `health_check.timeout_secs` | no | 3 | Per-probe deadline. |
| `health_check.max_failed` | no | 3 | Consecutive failures before reporting TunnelUnhealthy. |
| `health_check.expect_2xx` | no | true | When false, any HTTP response counts as healthy. |
| `health_check.alert_webhook` | no | | Per-tenant URL the server POSTs to when this group transitions to 0 healthy members. See Webhook alerts below. |

Behaviour rules

  • Protocol: members must declare the same protocol (http vs https); a mismatch is rejected with a clear error. The subdomain is the routing key — every member of a group shares one subdomain.
  • TCP ports: the first member of a (group, group_key) allocates a port from the configured tcp_port_range. Subsequent members reuse that port; the server returns the same assigned_port to all joiners. No member after the first sees a Registered listener event — the listener is already bound.
  • Ownership: registering a solo (no-group) tunnel against an existing group’s subdomain is rejected with subdomain '...' is already in use. Registering a grouped tunnel against an existing solo tunnel is rejected with group key does not match. A subdomain is owned by exactly one identity at a time.
  • Cleanup: the group entry is removed when its last member disconnects, and the TCP port (if any) is returned to the pool. New registrations after that point start a fresh group with a fresh port.
  • Atomicity: the create / join / remove paths are serialised atomically via the routing-table entry API, so two concurrent first registrations produce one group, not two (see the sketch below).
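A minimal sketch of that entry-style create-or-join, assuming a mutex-guarded map; the server's real routing table and types will differ:
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::Mutex;

type MemberId = u64;

struct TunnelGroup {
    name: String,       // set by the first joiner
    key_hash: [u8; 32], // SHA-256 of group_key
    members: Vec<MemberId>,
}

struct RoutingTable {
    // Keyed by subdomain (HTTP) or group name (TCP).
    groups: Mutex<HashMap<String, TunnelGroup>>,
}

impl RoutingTable {
    fn register(&self, key: &str, name: &str, key_hash: [u8; 32], member: MemberId) -> Result<(), String> {
        // One lock plus one entry lookup serialises create vs join:
        // two concurrent first registrations yield exactly one group.
        let mut groups = self.groups.lock().unwrap();
        match groups.entry(key.to_owned()) {
            Entry::Occupied(mut e) => {
                if e.get().key_hash != key_hash {
                    return Err("group key does not match".to_owned());
                }
                e.get_mut().members.push(member); // join an existing pool
                Ok(())
            }
            Entry::Vacant(v) => {
                // First joiner creates the group and sets its name.
                v.insert(TunnelGroup { name: name.to_owned(), key_hash, members: vec![member] });
                Ok(())
            }
        }
    }
}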

Health checks

Probes run on the client against local_addr. The server never opens a connection to the upstream itself — it just trusts the client’s TunnelHealthy / TunnelUnhealthy reports.
  • TCP probe: opens a TCP connection. Success = connect within timeout_secs.
  • HTTP probe: sends GET <path> HTTP/1.0 and reads the status line. Success = response within timeout_secs and (when expect_2xx) status in [200, 300).
Probe state is reported only on edges:
1. First probe success — emits TunnelHealthy, lifting the initial healthy=false state for members that opted into probing.
2. `max_failed` consecutive failures — emits TunnelUnhealthy; the server flips the healthy bit to false and excludes the member from dispatch.
3. First success after a failure streak — emits TunnelHealthy; the server resets the consecutive-failure counter and re-includes the member.
A member with no health_check is permanently healthy. A member with a spec starts unhealthy and only joins dispatch after the first successful probe.
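A minimal sketch of a client-side TCP probe loop implementing these edge rules, assuming tokio; report_healthy / report_unhealthy are hypothetical stand-ins for the client's frame-sending machinery:
use std::time::Duration;
use tokio::{net::TcpStream, time};

async fn report_healthy() { /* send a TunnelHealthy frame (stub) */ }
async fn report_unhealthy() { /* send a TunnelUnhealthy frame (stub) */ }

async fn probe_loop(local_addr: String, interval_secs: u64, timeout_secs: u64, max_failed: u32) {
    let mut failures = 0u32;
    let mut healthy = false; // members with a probe spec start unhealthy
    let mut ticker = time::interval(Duration::from_secs(interval_secs));
    loop {
        ticker.tick().await;
        // TCP probe: success = connect within timeout_secs.
        let ok = time::timeout(
            Duration::from_secs(timeout_secs),
            TcpStream::connect(local_addr.as_str()),
        )
        .await
        .map(|res| res.is_ok())
        .unwrap_or(false);
        if ok {
            failures = 0;
            if !healthy {
                healthy = true;
                report_healthy().await; // edge: first success, or first after a streak
            }
        } else {
            failures += 1;
            if healthy && failures >= max_failed {
                healthy = false;
                report_unhealthy().await; // edge: max_failed consecutive failures
            }
        }
    }
}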

Webhook alerts

When a load-balancing group transitions to 0 healthy members, public traffic to that subdomain or port starts returning 502. The server can POST a JSON alert to one or more URLs at the moment of that transition so an operator or tenant can react. There are two distinct destinations, each addressing a different audience:

Operator URL — [load_balancing] alert_webhook_url in server.toml

Set on the edge. Fires for every group on that edge that goes 0/N, regardless of which tenant owns the group. Useful for self-hosted deployments and for ops awareness on a managed multi-tenant edge.
[load_balancing]
enabled = true
alert_webhook_url = "https://hooks.slack.com/services/operator-channel/..."

Per-tenant URL — health_check.alert_webhook in the client config

Set on the client. Fires only when the group containing this tunnel goes 0/N. Each tenant points it at their Slack / PagerDuty / email gateway. The URL is sent on the wire as part of HealthCheckSpec and stored on the affected GroupMember; only the server holds it (the URL is never returned by /api/groups — dashboards see a presence-only flag).
tunnels:
  a:
    proto: http
    local_port: 3000
    subdomain: pool
    group: web
    group_key: shared-secret-for-this-pool
    health_check:
      type: tcp
      alert_webhook: "https://hooks.slack.com/services/my-team/..."
Both destinations can be configured independently. Both fire on the same 0/N transition. The server collects unique URLs from the affected group’s members (so two members of one tenant pointing at the same URL receive a single POST per transition, not two), then fans out to each unique URL plus the operator URL.

Payload

Same JSON body sent to every destination:
{
  "event": "group_zero_healthy",
  "region_id": "eu",
  "protocol": "http",
  "label": "pool",
  "group_name": "web",
  "key_hash_short": "deadbeef",
  "member_count": 2,
  "at": "2026-05-06T13:24:55+00:00"
}
key_hash_short is the first 8 hex chars of the group’s SHA-256 key hash — stable across reconnects, useful for correlating alerts when a single team runs multiple pools with the same group_name.
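For consumers in Rust, a struct matching this body could look like the following (a sketch, assuming serde and serde_json; field names mirror the payload above):
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct GroupZeroHealthyAlert {
    event: String,          // always "group_zero_healthy"
    region_id: String,
    protocol: String,       // "http" or "tcp"
    label: String,          // subdomain (HTTP) or port label (TCP)
    group_name: String,
    key_hash_short: String, // first 8 hex chars of the key hash
    member_count: u32,
    at: String,             // RFC 3339 timestamp
}

fn parse_alert(body: &str) -> serde_json::Result<GroupZeroHealthyAlert> {
    serde_json::from_str(body)
}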

Debounce

The server tracks a per-group zero_healthy_alerted flag. Once an alert fires, subsequent TunnelUnhealthy frames against the same already-down group do not re-fire. The flag resets the moment any member becomes healthy again — the next 0/N transition then fires fresh. In practice: if your pool flaps badly (down → up → down → up), each downward edge generates one alert per destination. Steady-state “everyone is still down” generates none.
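A sketch of that debounce state machine; field and method names are illustrative, not the server's actual types:
struct GroupHealth {
    healthy_members: usize,
    zero_healthy_alerted: bool,
}

impl GroupHealth {
    /// Returns true when the 0/N alert should fire (once per downward edge).
    fn on_member_unhealthy(&mut self) -> bool {
        self.healthy_members = self.healthy_members.saturating_sub(1);
        if self.healthy_members == 0 && !self.zero_healthy_alerted {
            self.zero_healthy_alerted = true;
            return true; // first downward edge: fire one alert per destination
        }
        false // group already down: no re-fire
    }

    fn on_member_healthy(&mut self) {
        self.healthy_members += 1;
        self.zero_healthy_alerted = false; // next 0/N transition fires fresh
    }
}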

Delivery

Best-effort. The server uses a 5-second per-request timeout, no retry, no queue. If your webhook receiver is down at the moment of the transition, the alert is lost. For high-stakes paging, point the URL at something durable — a queueing alertmanager, or a service like Pushover with retry — rather than relying on the rustunnel server for delivery guarantees.
The fire happens in a detached tokio::spawn, so a slow webhook receiver never blocks the server’s frame-handling hot path.
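A sketch of that fire-and-forget path, assuming reqwest with the json feature enabled (the real sender may differ):
use std::time::Duration;

fn fire_webhook(client: reqwest::Client, url: String, body: serde_json::Value) {
    // Detached task: a slow receiver never blocks the frame-handling hot path.
    tokio::spawn(async move {
        // Best-effort: 5-second deadline, no retry, errors dropped on the floor.
        let _ = client
            .post(&url)
            .timeout(Duration::from_secs(5))
            .json(&body)
            .send()
            .await;
    });
}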

Testing the feature locally

Quick end-to-end smoke test against a self-hosted edge with [load_balancing] enabled = true. Spin up two clients with the same (group, group_key), point them at separate local backends, and hammer the public URL — both backends should serve.
1. Build the client from source:
git clone https://github.com/joaoh82/rustunnel
cd rustunnel
cargo build --release -p rustunnel-client
2. Drop a config that opts into a group:
cat > /tmp/lb-test.yml <<'EOF'
server: tunnel.example.com:4040
auth_token: "your-token"

tunnels:
  a:
    proto: http
    local_port: 3000
    subdomain: lbtest
    group: web
    group_key: shared-secret-for-lb-test
    health_check:
      type: tcp
EOF
3. Start backend A on :3000:
python3 -m http.server 3000
4. Start client A pointing at backend A:
./target/release/rustunnel start --config /tmp/lb-test.yml
5. Start backend B on :3001, in a separate terminal:
python3 -m http.server 3001
6. Start client B with `local_port: 3001`: either edit /tmp/lb-test.yml and run a second rustunnel start, or use a second config file with the same group / group_key and local_port: 3001, as shown below.
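A second config could look like this (the file name is arbitrary; only local_port differs from /tmp/lb-test.yml):
cat > /tmp/lb-test-b.yml <<'EOF'
server: tunnel.example.com:4040
auth_token: "your-token"

tunnels:
  a:
    proto: http
    local_port: 3001
    subdomain: lbtest
    group: web
    group_key: shared-secret-for-lb-test
    health_check:
      type: tcp
EOF
./target/release/rustunnel start --config /tmp/lb-test-b.yml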
7. Hammer the public URL:
for i in $(seq 1 50); do
  curl -fsS https://lbtest.tunnel.example.com/ -o /dev/null -w "%{http_code}\n"
done
Both backends should see roughly half of the requests in their access logs.
8. Verify via the metrics endpoint:
ssh root@tunnel.example.com 'curl -sf http://127.0.0.1:9090/metrics' \
  | grep '^rustunnel_group_'
Expect output like:
rustunnel_group_members{group="web",region="eu",healthy="true"} 2
rustunnel_group_members{group="web",region="eu",healthy="false"} 0
rustunnel_group_dispatches_total{group="web",region="eu"} 50
rustunnel_group_health_failures_total{group="web",region="eu",kind="tcp"} 0
9. Validate failover: kill one of the local backends. The probe loop on that client marks it unhealthy after roughly max_failed × interval_secs seconds (about 30 s with the defaults of 3 × 10 s); subsequent requests all land on the survivor. Restart the backend — the probe re-registers it as healthy and dispatch distributes across both again.

Observability

When [load_balancing] enabled = true, the Prometheus exporter on :9090 emits three additional series:
| Metric | Type | Labels | What it measures |
|---|---|---|---|
| `rustunnel_group_members` | gauge | group, region, healthy | Count of registered members partitioned by their health bit. |
| `rustunnel_group_dispatches_total` | counter | group, region | Total dispatched connections, summed across the group’s members. Per-group rather than per-member to keep label cardinality bounded. |
| `rustunnel_group_health_failures_total` | counter | group, region, kind | Total TunnelUnhealthy frames received across the group’s members. kind is tcp / http / none based on the probe type. |
The pre-existing rustunnel_active_tunnels_* and rustunnel_requests_total gauges/counters keep counting members (not groups) so historical dashboards stay accurate.
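For example, a Prometheus alerting expression for the 0/N condition could be built on the members gauge (a sketch, not a shipped rule):
# Fires when a group still has registered members but none of them healthy.
sum by (group, region) (rustunnel_group_members{healthy="true"}) == 0
  and
sum by (group, region) (rustunnel_group_members) > 0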

Per-tunnel timeline + live event stream

Two REST surfaces let dashboards reconstruct recent health behaviour without polling all of /api/tunnels:
  • GET /api/tunnels/:id/health-events — last 50 health-state transitions for that tunnel ({ at, healthy, reason }[], oldest first). Records edges only — steady-state probe reports are not stored. Use this to render a per-tunnel timeline panel.
  • GET /api/groups/:protocol/:label/events — Server-Sent Events stream emitting one group_event per member health-bit transition affecting the named group. 30s keep-alive ping. Use this for live dashboards that want push instead of polling. A lagged SSE event means the consumer fell behind — resync via /api/groups.
Both endpoints are gated by the same auth as /api/tunnels (admin token or DB token), and they apply the same per-tenant scope: a user-scoped DB token sees only groups containing at least one of its own members; aggregate counters reflect just the visible members; groups the caller can’t see return 404 rather than 403. Admin tokens see everything.
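For example (the Bearer header is an assumption; substitute whatever auth scheme your deployment already uses for /api/tunnels):
# Last 50 health-state transitions for one tunnel
curl -sf -H "Authorization: Bearer $TOKEN" \
  "https://tunnel.example.com/api/tunnels/<id>/health-events"

# Live SSE stream for the HTTP group labelled "pool" (-N disables buffering)
curl -sfN -H "Authorization: Bearer $TOKEN" \
  "https://tunnel.example.com/api/groups/http/pool/events"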

Limitations & non-goals

rustunnel’s load balancing is intentionally minimal. If you need any of the features below, layer them at the application or DNS level.
  • No weighted dispatch — random uniform only.
  • No sticky sessions — every new connection is dispatched independently. Long-lived WebSocket connections that need affinity must handle reconnects at the application layer.
  • No active session draining on member removal — in-flight connections finish naturally; new connections route elsewhere.
  • No UDP groups — UDP is connectionless; there’s no obvious unit to dispatch.
  • No P2P groups — P2P publishers are 1-to-many by design.
  • No cross-region pools — members must be on the same edge server. Layer DNS-based routing on top for global LB.
  • No group_key rotation — once a group exists, rotating its key requires dropping all members.

See also

Client guide

CLI flags, config file, and the full set of tunnel modes.

Architecture

How HTTP / TCP / UDP / P2P tunnels flow through the system.