Reproducing the Graph Transformer paper
This page documents how to reproduce the figures and tables from:
Doherty, M., Beghelli, A., Toni, L. Graph Transformers and Stabilized Reinforcement Learning for Large-Scale Dynamic Routing, Modulation and Spectrum Allocation in Elastic Optical Networks. (In preparation; targeted at the Journal of Optical Communications and Networking.)
The paper makes two main experimental claims:
- Benchmark comparison — On four standard topologies (NSFNET, COST239, USNET, JPN48) and five RL benchmark settings (DeepRMSA, RewardRMSA, GCN-RMSA, MaskRSA, PtrNet-RSA), the Graph Transformer is the first RL method to consistently match or exceed the strongest heuristic baseline (Section 3).
- Scalability — On TopologyBench's USA100 (100 nodes) and TataInd (143 nodes) topologies — the largest dynamic RMSA instances ever attempted with RL — the Transformer supports 3–4% higher load than FF-KSP (K=70 / K=90) at 0.1% blocking (Section 4).
Section 3 — Benchmark comparison
The Section 3 figure compares the Graph Transformer against five published RL methods, the strongest heuristic in each setting, and two capacity bound estimates (cut-set and reconfigurable-routing).
The comparison plot itself (Fig. 7, bounds_comparison_new_with_rl.png) is built using results from scripts in experimental/JOCN2024/ — the same directory used for the Reinforcement Learning: Hype or Hope? paper, since the heuristic and bounds curves are reused. See Reproducing the JOCN 2024 (Hype or Hope) paper for the heuristic and bounds runs in detail. The Transformer evaluation curves overlaid on top of those baselines come from experimental/JOCN2024/generate_data/eval_transformers.sh.
1. Train the Transformer policies
The trained Equinox checkpoints expected by the evaluation script are listed below. Train them with --SAVE_MODEL --MODEL_PATH=<filename>.eqx (one A100, ~30 min – 1 h 40 min each):
| Benchmark setting | Topology | Model file |
|---|---|---|
| DeepRMSA / RewardRMSA / GCN-RMSA | NSFNET | nsfnet_maskrsa_43_1.eqx (RMSA) and nsfnet_rsa80.eqx (RSA) |
| DeepRMSA / RewardRMSA / GCN-RMSA | COST239 | cost239_deeprmsa_13.eqx |
| DeepRMSA / RewardRMSA / GCN-RMSA | USNET | usnet_2.eqx |
| MaskRSA | JPN48 | jpn48_maskrsa.eqx |
| MaskRSA | NSFNET | nsfnet_maskrsa_43_1.eqx |
| PtrNet-RSA-40 | NSFNET, USNET, COST239 | *_rsa40.eqx |
| PtrNet-RSA-80 | NSFNET, USNET, COST239 | *_rsa80.eqx |
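Before running the evaluation script it is worth confirming that every expected checkpoint is actually on disk. A hedged Python sketch — the filenames come from the table above, but the expansion of the `*_rsa40.eqx` / `*_rsa80.eqx` wildcards to per-topology names and the default checkpoint directory are assumptions:

```python
from pathlib import Path

# Filenames from the checkpoint table above. The wildcard rows are expanded
# per topology here as an assumption; adjust if your names differ.
EXPECTED = [
    "nsfnet_maskrsa_43_1.eqx", "nsfnet_rsa80.eqx",
    "cost239_deeprmsa_13.eqx", "usnet_2.eqx", "jpn48_maskrsa.eqx",
    "nsfnet_rsa40.eqx", "usnet_rsa40.eqx", "cost239_rsa40.eqx",
    "usnet_rsa80.eqx", "cost239_rsa80.eqx",
]

def missing_checkpoints(directory="."):
    """Return the expected .eqx files not yet present in `directory`."""
    return [f for f in EXPECTED if not (Path(directory) / f).exists()]

if __name__ == "__main__":
    for f in missing_checkpoints():
        print(f"missing: {f}")
```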
A representative training command (NSFNET / RMSA / DeepRMSA setting):
```bash
uv run xlron/train/train.py \
--topology_name=nsfnet_deeprmsa_directed \
--env_type=rmsa --link_resources=100 --k=50 \
--load=145 --mean_service_holding_time=20 --truncate_holding_time \
--modulations_csv_filepath=./xlron/data/modulations/modulations_deeprmsa.csv \
--max_requests=13000 --ENV_WARMUP_STEPS=0 --relative_arrival_times \
--USE_TRANSFORMER --transformer_num_layers=2 --transformer_num_heads=4 \
--aggregate_slots=20 \
--OFF_POLICY_IAM --VALID_MASS_LOSS_COEF=0.0002 --VML_SCHEDULE=constant \
--LR=5e-3 --LR_SCHEDULE=cosine \
--ENT_COEF=0.0175 --ENT_SCHEDULE=linear \
--VF_COEF=0.05 --SEPARATE_VF_OPTIMIZER --VF_LR=1e-4 \
--GAMMA=0.996 --GAE_LAMBDA=0.99 --CLIP_EPS=0.04 \
--ROLLOUT_LENGTH=64 --NUM_ENVS=200 \
--TOTAL_TIMESTEPS=100000000 --STEPS_PER_INCREMENT=5150000 \
--WANDB --PROJECT=TRANSFORMER_TRAIN --DOWNSAMPLE_FACTOR=100 \
--SAVE_MODEL --MODEL_PATH=./episodic_20_8_10.eqx
```
The full set of training commands for every benchmark setting is in experimental/JOCN2024/generate_data/eval_transformers.sh — that file contains both the --EVAL_MODEL evaluation commands shown below and the corresponding training settings (drop --EVAL_MODEL --MODEL_PATH=... and add --SAVE_MODEL to train from scratch).
2. Evaluate the trained Transformers and the heuristics
```bash
# Heuristic and bounds baselines (also used by the JOCN 2024 paper)
bash experimental/JOCN2024/generate_data/heuristic_evaluation.sh
bash experimental/JOCN2024/generate_data/run_cutsets_bounds.sh
bash experimental/JOCN2024/generate_data/run_reconfigurable_routing_bounds.sh

# Transformer evaluation across all four topologies and five RL settings
bash experimental/JOCN2024/generate_data/eval_transformers.sh
bash experimental/JOCN2024/generate_data/eval_transformers_bounds.sh
```
These produce the JSONL/CSV results consumed by the comparison plotting script. The Transformer evaluation reuses each saved .eqx checkpoint via --EVAL_MODEL --MODEL_PATH=....
3. Plot Fig. 7
```bash
uv run python experimental/JOCN2024/generate_plots/plot_heuristic_comparison.py
```
Output: experimental/JOCN2024/generate_plots/plots/bounds_comparison_new_with_rl.png (paper Fig. 7 — blocking probability vs load across NSFNET / COST239 / USNET / JPN48 with all five benchmark settings).
The summary tables comparing supported load at 0.1% blocking come from summarise_bounds_table.py and summarise_review_table.py in the same directory.
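The core computation behind those tables — interpolating the supported load at the 0.1% blocking threshold from a sweep of (load, blocking) evaluation points — can be sketched as follows. This is an illustrative reimplementation, not the actual code from summarise_bounds_table.py, and the numbers below are made up:

```python
import numpy as np

def supported_load(loads, blocking, threshold=1e-3):
    """Interpolate the load at which blocking probability crosses `threshold`.

    Interpolates load as a function of log10(blocking), since blocking
    probability spans orders of magnitude across a load sweep. Assumes
    `blocking` increases (roughly monotonically) with load.
    """
    loads = np.asarray(loads, dtype=float)
    bp = np.asarray(blocking, dtype=float)
    return float(np.interp(np.log10(threshold), np.log10(bp), loads))

# Illustrative numbers only (not paper results):
loads = [550, 600, 650, 700]
bp = [2e-4, 6e-4, 2e-3, 8e-3]
print(round(supported_load(loads, bp), 1))
```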
Section 4 — Large-Scale Experiments (USA100 and TataInd)
All Section 4 figures live under experimental/large_topologies/. The trained Transformer checkpoints, FF-KSP trajectories, ablation runs, and slot-occupancy heatmaps are all included in experimental/large_topologies/results/{usa100,tataind}/.
1. Train the large-topology Transformers
Both runs use a single H100 (~74 GB GPU memory used), 12 parallel envs, 40M steps, and ~4–5 hours wall-clock.
```bash
# USA100 — 100 nodes, 342 directed links, k=70
uv run xlron/train/train.py \
--topology_name=usa100_directed \
--env_type=rmsa --link_resources=320 \
--k=70 --aggregate_slots=80 \
--load=620 --mean_service_holding_time=25 \
--max_requests=25000 --ENV_WARMUP_STEPS=0 --relative_arrival_times \
--USE_TRANSFORMER --transformer_num_layers=2 --transformer_num_heads=8 \
--transformer_embedding_size=128 \
--OFF_POLICY_IAM --VALID_MASS_LOSS_COEF=0.002 --VML_SCHEDULE=linear --VML_END_FRACTION=0.5 \
--LR=1.5e-3 --LR_SCHEDULE=cosine \
--ENT_COEF=0.01 --ENT_SCHEDULE=cosine \
--VF_COEF=0.1 --SEPARATE_VF_OPTIMIZER --VF_LR=5e-5 \
--GAMMA=0.996 --GAE_LAMBDA=0.99 --CLIP_EPS=0.04 \
--ROLLOUT_LENGTH=64 --NUM_ENVS=12 \
--TOTAL_TIMESTEPS=40000000 \
--SAVE_MODEL --MODEL_PATH=./usa100_transformer.eqx \
--WANDB --PROJECT=LARGE_TRANSFORMER
```
```bash
# TataInd — 143 nodes, 362 directed links, k=90
uv run xlron/train/train.py \
--topology_name=tataind_directed \
--env_type=rmsa --link_resources=320 \
--k=90 --aggregate_slots=80 \
--load=450 --mean_service_holding_time=25 \
--max_requests=25000 --ENV_WARMUP_STEPS=0 --relative_arrival_times \
--USE_TRANSFORMER --transformer_num_layers=2 --transformer_num_heads=8 \
--transformer_embedding_size=128 \
--OFF_POLICY_IAM --VALID_MASS_LOSS_COEF=0.001 --VML_SCHEDULE=linear --VML_END_FRACTION=0.5 \
--LR=1.5e-3 --LR_SCHEDULE=cosine \
--ENT_COEF=0.015 --ENT_SCHEDULE=cosine \
--VF_COEF=0.1 --SEPARATE_VF_OPTIMIZER --VF_LR=5e-5 \
--GAMMA=0.996 --GAE_LAMBDA=0.99 --CLIP_EPS=0.04 \
--ROLLOUT_LENGTH=64 --NUM_ENVS=12 \
--TOTAL_TIMESTEPS=40000000 \
--SAVE_MODEL --MODEL_PATH=./tataind_transformer.eqx \
--WANDB --PROJECT=LARGE_TRANSFORMER
```
Hyperparameters that differ between USA100 and TataInd are summarized in Table 2 of the paper.
2. Heuristic benchmarks (FF-KSP K-sweep)
The choice of K=70 (USA100) and K=90 (TataInd) for FF-KSP is justified in Section 4.1 by sweeping K from 10 to 100 and recording blocking probability at 620 / 450 Erlang. Reproduce with --EVAL_HEURISTIC --path_heuristic=ff_ksp over a K sweep:
```bash
for K in 10 20 30 40 50 60 70 80 90 100; do
uv run python -m xlron.train.train \
--env_type=rmsa --topology_name=usa100_directed --link_resources=320 \
--k=$K --load=620 --mean_service_holding_time=25 \
--max_requests=100000 --continuous_operation --ENV_WARMUP_STEPS=3000 \
--EVAL_HEURISTIC --path_heuristic=ff_ksp \
--NUM_ENVS=10 --TOTAL_TIMESTEPS=1000000 \
--DATA_OUTPUT_FILE=experimental/large_topologies/results/usa100/usa100_ff_ksp_K${K}.jsonl
done

# Repeat with topology_name=tataind_directed and load=450 for TataInd
```
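Once the sweep has run, picking K amounts to finding the smallest value beyond which blocking stops improving meaningfully. A minimal, library-free sketch of that selection (the tolerance, function name, and the sweep numbers below are all illustrative, not taken from the paper):

```python
def best_k(bp_by_k, rel_tol=0.05):
    """Return the smallest K whose blocking is within `rel_tol` of the best.

    `bp_by_k` maps K -> blocking probability at the target load.
    """
    best = min(bp_by_k.values())
    for k in sorted(bp_by_k):
        if bp_by_k[k] <= best * (1 + rel_tol):
            return k

# Illustrative only: blocking typically flattens out as K grows.
sweep = {10: 5e-2, 30: 9e-3, 50: 2.2e-3, 70: 1.0e-3, 90: 0.99e-3, 100: 0.98e-3}
print(best_k(sweep))
```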
3. Evaluate Transformer vs FF-KSP across loads
For each topology, evaluate at multiple loads and dump full per-request trajectories:
```bash
# USA100 — load sweep with Transformer
for LOAD in 550 600 620 650 680 700 750; do
uv run python -m xlron.train.train \
--env_type=rmsa --topology_name=usa100_directed --link_resources=320 \
--k=70 --aggregate_slots=80 --load=$LOAD --mean_service_holding_time=25 \
--USE_TRANSFORMER --transformer_num_layers=2 --transformer_num_heads=8 --transformer_embedding_size=128 \
--OFF_POLICY_IAM --max_requests=100000 --continuous_operation --ENV_WARMUP_STEPS=3000 \
--EVAL_MODEL --MODEL_PATH=./usa100_transformer.eqx \
--NUM_ENVS=10 --TOTAL_TIMESTEPS=1000000 \
--DATA_OUTPUT_FILE=experimental/large_topologies/results/usa100/usa100_transformer_eval_results.jsonl
done
```
```bash
# Single full episode at training load — produces the trajectory CSVs used for path/spectral analysis
uv run python -m xlron.train.train \
--env_type=rmsa --topology_name=usa100_directed --link_resources=320 \
--k=70 --aggregate_slots=80 --load=620 \
--USE_TRANSFORMER --transformer_num_layers=2 --transformer_num_heads=8 \
--max_requests=100000 --ENV_WARMUP_STEPS=0 --relative_arrival_times \
--EVAL_MODEL --MODEL_PATH=./usa100_transformer.eqx \
--EPISODE_DATA_OUTPUT_FILE=experimental/large_topologies/results/usa100/usa100_transformer_traj.csv
```
Repeat with --EVAL_HEURISTIC --path_heuristic=ff_ksp to produce usa100_ff_ksp_* files. The same patterns apply to TataInd (load=450, k=90).
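The per-request path-length delta used later for Fig. 14 is, at its core, an elementwise difference between the two matched trajectories. A toy sketch (the function name and the assumption that requests are matched by index are illustrative, not the analysis scripts' actual schema):

```python
import numpy as np

def path_length_delta(rl_lengths_km, ff_lengths_km):
    """Per-request path-length difference (RL - FF-KSP), requests matched by index."""
    return np.asarray(rl_lengths_km, dtype=float) - np.asarray(ff_lengths_km, dtype=float)

# Toy data in km, illustrative only.
delta = path_length_delta([1200, 900, 1500], [1100, 950, 1500])
print(delta.mean())
```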
4. Slot-occupancy and link-usage processing
```bash
uv run python experimental/large_topologies/process_slot_occupancy.py
uv run python experimental/large_topologies/analyze_path_lengths.py --topology_name=usa100_directed --k=70
uv run python experimental/large_topologies/analyze_path_lengths.py --topology_name=tataind_directed --k=90
uv run python experimental/large_topologies/analyze_action_path_ratios.py
```
These read the trajectory CSVs and produce the *_slot_occupancy.npz and *_link_usage.npy artifacts consumed by the plotting script.
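The per-link occupancy difference underlying Fig. 16 boils down to subtracting two (links × slots) occupancy arrays and averaging over the slot axis. A toy sketch with made-up arrays — the shapes and contents of the real .npz artifacts may differ:

```python
import numpy as np

def mean_occupancy_delta(transformer_occ, ffksp_occ):
    """Per-link mean FSU occupancy difference (Transformer - FF-KSP).

    Both inputs are (num_links, num_slots) arrays of occupancy fractions.
    """
    return transformer_occ.mean(axis=1) - ffksp_occ.mean(axis=1)

# Toy data: 3 links, 4 slots (illustrative, not the real artifacts).
rl = np.array([[0.5, 0.6, 0.4, 0.5],
               [0.2, 0.3, 0.2, 0.3],
               [0.9, 0.8, 0.9, 0.8]])
ff = np.array([[0.6, 0.6, 0.6, 0.6],
               [0.2, 0.2, 0.2, 0.2],
               [0.9, 0.9, 0.9, 0.9]])
print(mean_occupancy_delta(rl, ff))
```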
5. Ablation runs (Fig. 9)
The five ablation variants live in experimental/large_topologies/results/{usa100,tataind}/ablations/:
| Variant | Change from full model |
|---|---|
| original (full model) | — (paper "All Features") |
| onpolicy | swap --OFF_POLICY_IAM for on-policy IAM (--ON_POLICY_IAM) |
| nodamping | drop --LOSS_DAMPING_VALID_MASS_TARGET |
| nogating | drop hard gating (--LOSS_DAMPING_K_MIN=0) |
| nogating_nodamping | drop both |
| novml | set --VALID_MASS_LOSS_COEF=0 |
| gating1 | use --LOSS_DAMPING_K_MIN=1 (keep transitions with ≥1 valid action) |
| ffksp | --EVAL_HEURISTIC --path_heuristic=ff_ksp baseline |
Each variant is a separate training run with the corresponding flag change.
6. Plot all Section 4 figures
```bash
uv run python experimental/large_topologies/plot_large_topologies.py
```
| Paper figure | Plot file |
|---|---|
| Fig. 8 — TataInd and USA100 topologies | figures/topologies.png (handled separately by topology_visualization scripts) |
| Fig. 9 — Ablation: blocking over training | figures/ablation_blocking.png |
| Fig. 10 — Loss components over training | figures/loss_components.png |
| Fig. 11 — Blocking probability vs load | figures/blocking_vs_load.png |
| Fig. 12 — Bitrate blocking over a single episode | figures/bitrate_blocking_over_steps.png |
| Fig. 13 — Mean path length (km, hops) over an episode | figures/path_comparison.png |
| Fig. 14 — Per-request path length delta | figures/path_delta.png |
| Fig. 15 — Path-length distributions | figures/path_boxplots.png |
| Fig. 16 — Per-link FSU occupancy difference | figures/slot_occupancy_diff.png |
| Fig. 17 — Per-link usage difference | figures/link_usage_delta.png |
The TopologyBench source for USA100 and TataInd is at https://github.com/TopologyBench/Real-Topologies; both topologies are bundled with XLRON via xlron/data/topologies/topology_bench_to_xlron_conversion.py.
Hardware
- Section 3 training: NVIDIA A100 80 GB, 200 parallel envs, 100M steps, 30 min – 1 h 40 min per run.
- Section 4 training: NVIDIA H100 80 GB, 12 parallel envs, 40M steps, ~4–5 h per topology, 74 GB GPU memory.
- Section 4 evaluation: A100 or H100 sufficient; 100k requests per evaluation point.
Citation
This paper is in preparation — citation details will be updated once the manuscript is published.
```bibtex
@unpublished{doherty_graph_transformer,
  author = {Doherty, Michael and Beghelli, Alejandra and Toni, Laura},
  title  = {Graph Transformers and Stabilized Reinforcement Learning for Large-Scale Dynamic Routing, Modulation and Spectrum Allocation in Elastic Optical Networks},
  note   = {In preparation},
  year   = {2026}
}
```