Stanford RNA 3D Folding 2
Predict the 3D structure of RNA from sequence. A classical sequence-NN baseline beats anything that ignores the closest train neighbors; RibonanzaNet3D refines the candidates; a learned reranker picks the final five.
The challenge
Predict five 3D coordinate sets per residue, evaluated as best-of-five structural RMSD against the held-out experimental structures. Five guesses per target hedge against ambiguity, but the leaderboard still rewards a strong top-1 prediction. The competition is research-track with a 75,000 USD prize and a 2026-03-25 deadline.
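As a rough illustration of best-of-five scoring, the sketch below computes RMSD between coordinate traces and keeps the lowest value across candidates. It assumes the structures are already superposed and have one (x, y, z) point per residue; the competition's official scorer is not shown in this write-up, and `rmsd` and `best_of_five` are hypothetical helper names.

```python
import math


def rmsd(pred, ref):
    """Root-mean-square deviation between two equal-length, pre-superposed traces."""
    if len(pred) != len(ref):
        raise ValueError("structures must have the same number of residues")
    # Sum squared differences over every residue and every axis.
    sq = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(sq / len(ref))


def best_of_five(candidates, ref):
    """Score a target by the lowest RMSD among its submitted candidates."""
    return min(rmsd(c, ref) for c in candidates)


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    candidates = [
        [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],  # shifted 1 Å along z
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # identical to the reference
    ]
    print(best_of_five(candidates, ref))
```

Because only the minimum counts, a diverse set of candidates can score well even when most of them miss.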
The dataset shape
5,716 train targets. Sequences span four orders of magnitude in length. MSAs are gigabytes deep. And the detail that shapes any serious solution: more than half of the train coordinates are incomplete.
MSA depth is the long pole on RAM/disk; long sequences are the long pole on inference compute.
More than half of train targets are missing chunks of the experimental structure. Any loss function that doesn't mask is implicitly fitting noise.
Sequence length spans four orders of magnitude: min 10 nt, max 125,580 nt, mean 1,364 nt. Most targets are short ribozymes; the tail is full-length viral RNA.
MG dominates as the structural cofactor; ZN and combinations follow. Single targets often have multiple ligands (MG;ZN, K;MG, GTP;MG;ZN).
A:1 monomeric chains dominate; A:2 and B:1;A:1 indicate dimer/co-fold structures.
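The masking point above can be made concrete with a small loss function that skips unresolved residues. This is a minimal sketch, assuming missing coordinates are stored as `None` or as NaN triples; the repo's actual encoding and loss are not shown here, and `masked_mse` is a hypothetical name.

```python
import math


def masked_mse(pred, target):
    """Mean squared error over residues whose target coordinates are all present."""
    total, count = 0.0, 0
    for p, t in zip(pred, target):
        if t is None or any(math.isnan(v) for v in t):
            continue  # unresolved residue: contributes nothing to the loss
        total += sum((a - b) ** 2 for a, b in zip(p, t))
        count += 1
    # With no resolved residues there is nothing to fit, so report zero loss.
    return total / count if count else 0.0


if __name__ == "__main__":
    nan = float("nan")
    pred = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (5.0, 5.0, 5.0)]
    target = [(0.0, 0.0, 1.0), (nan, nan, nan), None]
    print(masked_mse(pred, target))
```

Dividing by the count of resolved residues, rather than the sequence length, keeps targets with large gaps from being silently down-weighted.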
A three-tier hybrid
A pure deep-learning model would ignore the strongest signal in the data — that training targets with similar sequence often have similar backbones. A pure classical retriever leaves the leaderboard tail on the table. So the repo runs all three: retrieve, refine, rerank.
- 01 Tier 1: Classical sequence-NN (k-mer retrieval + coord warping). Cheap, parallelisable on CPU. Establishes the strongest baseline by reusing training-set 3D structure where sequences match.
- 02 Tier 2: RibonanzaNet3D refinement (fine-tuned NVIDIA RibonanzaNet2 + Linear(256→3) head). Dropout sampling at inference produces 4 stochastic + 1 deterministic candidate per target.
- 03 Tier 3: Neural reranker (128 prefiltered → 24 reranked → 5 submitted). MLP over kmer + sequence stats + structural stats. Picks the final five from the candidate pool.
Tier 1 — Classical sequence-NN
The strongest baseline in the repo is not a neural network. It’s a classical retrieve-and-warp pipeline that uses k-mer features and Needleman–Wunsch alignment to find similar training RNAs and copy their coordinates. It runs on CPU and finishes a full validation pass in minutes.
- k-mer order: k = 3
- alphabet: {A, C, G, U}
- histogram dim: 4³ = 64
- normalization: L2
- primary: SequenceMatcher (Python stdlib)
- alternative: Biopython PairwiseAligner (Needleman–Wunsch)
- top-k: configurable
- metadata weights: ligand Jaccard · stoichiometry · diversity penalty
- length match: block-warping (overlay aligned positions)
- length mismatch: linear interpolation along 3 axes
- backbone constraint: 5.5–6.5 Å between consecutive residues
- base-pair geometry: Watson–Crick on i+3..i+24 for short molecules
Source: baseline_sequence_nn.py, 758 lines. The constraint enforcement at output time is the difference between predictions that hold up under structural review and ones that don't.
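The retrieve-and-warp idea can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not an excerpt from baseline_sequence_nn.py: the function names are hypothetical, the ranking rule (SequenceMatcher ratio first, k-mer cosine as tie-breaker) is one plausible way to combine the two signals, and metadata weights, block-warping, and the geometric constraints are left out.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from itertools import product

ALPHABET = "ACGU"
K = 3
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]  # 4**3 = 64 entries


def kmer_histogram(seq):
    """Return the L2-normalised 64-dim 3-mer count vector of an RNA sequence."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    vec = [float(counts.get(kmer, 0)) for kmer in KMERS]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def retrieve(query, train, top_k=2):
    """Rank training sequences by SequenceMatcher ratio, then k-mer cosine."""
    q = kmer_histogram(query)
    scored = []
    for name, seq in train.items():
        # Both vectors are unit length, so the dot product is the cosine.
        cosine = sum(a * b for a, b in zip(q, kmer_histogram(seq)))
        ratio = SequenceMatcher(None, query, seq, autojunk=False).ratio()
        scored.append((ratio, cosine, name))
    scored.sort(reverse=True)
    return [name for _, _, name in scored[:top_k]]


def resample(coords, new_len):
    """Linearly interpolate an (x, y, z) trace to new_len points, axis by axis."""
    if new_len == 1 or len(coords) == 1:
        return [coords[0]] * new_len
    out = []
    for i in range(new_len):
        t = i * (len(coords) - 1) / (new_len - 1)  # fractional source index
        lo = int(math.floor(t))
        hi = min(lo + 1, len(coords) - 1)
        frac = t - lo
        out.append(tuple(a + frac * (b - a) for a, b in zip(coords[lo], coords[hi])))
    return out


if __name__ == "__main__":
    train = {"a": "GGGAAACCC", "b": "AUAUAUAUAU", "c": "GGGAAACCU"}
    print(retrieve("GGGAAACCC", train))
    print(resample([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], 3))
```

The k-mer histogram is cheap enough to compute for every training sequence, which is what makes the search easy to shard across CPUs; `resample` covers only the length-mismatch case from the list above.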
Tier 2 — RibonanzaNet3D refinement
Fine-tune NVIDIA’s RibonanzaNet2, a pretrained sequence model from the Ribonanza Kaggle competition, with a 3D coordinate head. The base provides context-aware per-residue embeddings; a single linear layer maps each to its (x, y, z).
- base: NVIDIA RibonanzaNet2 (pretrained)
- head: Linear(256, 3)
- max sequence length: 512 nt
- above max: fall through to classical baseline
- optimizer: AdamW, lr=2e-4, wd=1e-4
- dropout: 0.1
- batch size: 2
- grad accumulation: 4 (effective batch size 8)
- epochs: 24
- stochastic passes: 4 (dropout-on)
- deterministic pass: 1 (eval-mode)
- output: 5 candidates per target
Sources: train_ribonanzanet3d_v2.py + ribonanzanet3d_hybrid_infer.py + start_ribonanzanet3d_v2.sh.
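The sampling protocol above (4 dropout-on passes plus 1 eval-mode pass) can be sketched without PyTorch. This is a plain-Python stand-in, not the repo's inference code: `linear_head`, `dropout`, and `sample_candidates` are hypothetical names, the embeddings are toy values rather than RibonanzaNet2 outputs, and dropout is applied only to the head input here, whereas a real network would typically apply it inside its layers.

```python
import random


def linear_head(embedding, weights, bias):
    """Map one per-residue embedding to (x, y, z) with a single linear layer."""
    return tuple(
        sum(w * e for w, e in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    )


def dropout(embedding, p, rng):
    """Inverted dropout: zero each feature with probability p, rescale the rest."""
    keep = 1.0 - p
    return [e / keep if rng.random() >= p else 0.0 for e in embedding]


def sample_candidates(embeddings, weights, bias, p=0.1, n_stochastic=4, seed=0):
    """Return n_stochastic dropout-on candidates followed by one deterministic one."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_stochastic):
        # Stochastic pass: a fresh dropout mask for every residue.
        candidates.append(
            [linear_head(dropout(e, p, rng), weights, bias) for e in embeddings]
        )
    # Deterministic pass: no dropout, as in eval mode.
    candidates.append([linear_head(e, weights, bias) for e in embeddings])
    return candidates


if __name__ == "__main__":
    embeddings = [[1.0] * 8, [2.0] * 8]  # toy 8-dim embeddings, 2 residues
    weights = [[1.0] * 8, [0.0] * 8, [-1.0] * 8]  # one weight row per axis
    bias = [0.0, 0.0, 0.0]
    for candidate in sample_candidates(embeddings, weights, bias):
        print(candidate)
```

The stochastic passes trade a little accuracy per candidate for diversity across the five, which is what a best-of-five metric rewards.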
Tier 3 — Neural reranker
The reranker is small: its job is to pick well, not to predict. It takes 128 prefiltered candidates from the classical pipeline, scores each, and surfaces the final 5. It is trained on a 3-way temporal split, so no candidate dated on or after the query’s cutoff is ever considered.
- core training set: 88% (oldest cutoffs)
- training queries: 2% (middle cutoffs)
- validation queries: 10% (newest cutoffs)
- leakage guard: exclude templates >= query cutoff
- sequence: query + template kmer (k=3, 64-dim each)
- seq stats: length · GC% · AU%
- structure stats: radius of gyration · mean/std step distance · end-to-end
- length-ratio gate: 20% minimum
- optimizer: AdamW, lr=3e-4, wd=1e-4
- batch size: 64
- epochs: 24
- embedding batch: 16 · max_len 1536 · overlap 256
- candidate funnel: 128 → 24 → 5
Source: train_hybrid_reranker.py + start_hybrid_reranker_overnight.sh.
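The leakage guard, the length-ratio gate, the structure statistics, and the 128 → 24 → 5 funnel can be sketched as follows. Everything here is illustrative: the function names and the template dictionary layout are hypothetical, the length ratio is assumed to mean shorter length divided by longer length, and the last step simply keeps the top five by rerank score, which may differ from how train_hybrid_reranker.py picks them.

```python
import math
from datetime import date


def structure_stats(coords):
    """Radius of gyration, mean/std step distance, and end-to-end distance."""
    n = len(coords)
    centroid = [sum(c[i] for c in coords) / n for i in range(3)]
    rg = math.sqrt(sum(math.dist(c, centroid) ** 2 for c in coords) / n)
    steps = [math.dist(a, b) for a, b in zip(coords, coords[1:])]
    mean_step = sum(steps) / len(steps)
    std_step = math.sqrt(sum((s - mean_step) ** 2 for s in steps) / len(steps))
    return rg, mean_step, std_step, math.dist(coords[0], coords[-1])


def eligible(template, query_len, query_cutoff, min_ratio=0.2):
    """Leakage guard plus length-ratio gate for one template."""
    if template["cutoff"] >= query_cutoff:
        return False  # template is not strictly older than the query
    shorter = min(template["length"], query_len)
    longer = max(template["length"], query_len)
    return shorter / longer >= min_ratio


def funnel(templates, query_len, query_cutoff, prefilter_score, rerank_score,
           n_prefilter=128, n_rerank=24, n_submit=5):
    """Filter, prefilter, rerank, and return the templates to submit."""
    pool = [t for t in templates if eligible(t, query_len, query_cutoff)]
    pool = sorted(pool, key=prefilter_score, reverse=True)[:n_prefilter]
    pool = sorted(pool, key=rerank_score, reverse=True)[:n_rerank]
    return pool[:n_submit]  # assumption: top five by rerank score


if __name__ == "__main__":
    templates = [
        {"id": "old", "cutoff": date(2020, 1, 1), "length": 100, "score": 0.9},
        {"id": "same-day", "cutoff": date(2024, 1, 1), "length": 100, "score": 1.0},
        {"id": "tiny", "cutoff": date(2019, 1, 1), "length": 10, "score": 0.8},
    ]
    picked = funnel(templates, 100, date(2024, 1, 1),
                    prefilter_score=lambda t: t["score"],
                    rerank_score=lambda t: t["score"])
    print([t["id"] for t in picked])
    print(structure_stats([(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (12.0, 0.0, 0.0)]))
```

Applying the guard before any scoring means a leaked template can never reach the reranker, whatever score it would have received.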
Distributed search infrastructure
The classical search across 5,716 train targets is embarrassingly parallel and pure CPU. The repo runs it on three surfaces, followed by a merge step:
- 01 sbl1 (dev box): orchestration + merge
- 02 sbl2 / sbl3 / sbl4: tailnet tmux shards
- 03 GCP Batch: spot CPU bursts
- 04 merge step: merge_sequence_nn_search.py
No cluster manager. rsync + tmux + a fleet config array. The simplest thing that works.
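One simple way to split an embarrassingly parallel search by shard index and shard count is round-robin slicing, with a merge that refuses duplicate targets. The sketch below assumes that scheme; the repo's actual partitioning and the behaviour of merge_sequence_nn_search.py are not shown here, and `shard` and `merge` are hypothetical names.

```python
def shard(items, shard_index, num_shards):
    """Return the slice of items owned by one shard (round-robin assignment)."""
    if not 0 <= shard_index < num_shards:
        raise ValueError("shard_index must be in [0, num_shards)")
    return items[shard_index::num_shards]


def merge(shard_results):
    """Combine per-shard {target_id: result} dicts, rejecting duplicates."""
    merged = {}
    for result in shard_results:
        overlap = merged.keys() & result.keys()
        if overlap:
            raise ValueError(f"targets claimed by two shards: {sorted(overlap)}")
        merged.update(result)
    return merged


if __name__ == "__main__":
    targets = [f"target_{i}" for i in range(10)]
    per_shard = [shard(targets, i, 3) for i in range(3)]
    print(per_shard)
    results = [{t: len(t) for t in chunk} for chunk in per_shard]
    print(len(merge(results)))
```

Round-robin slicing needs no coordination between machines: each worker only has to know its own index and the total shard count, which matches the two required arguments of the launcher script below.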
#!/usr/bin/env bash
set -euo pipefail
ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
OUT_DIR="$ROOT/outputs/analysis/shards"
if [[ $# -lt 2 ]]; then
  echo "usage: $0 <shard-index> <num-shards> [workers] [extra search args...]" >&2
  exit 1
fi
SHARD_INDEX="$1"
NUM_SHARDS="$2"
WORKERS="${3:-$(nproc)}"
# Drop the positional arguments so "$@" holds only the extra search args.
if [[ $# -ge 3 ]]; then
  shift 3
else
  shift 2
fi
mkdir -p "$OUT_DIR"
PYTHON_BIN="${PYTHON_BIN:-python}"
if [[ -f "$ROOT/.venv/bin/activate" ]]; then
  # Prefer the challenge-local environment when present.
  # shellcheck disable=SC1091
  source "$ROOT/.venv/bin/activate"
  PYTHON_BIN="python"
fi
search_cmd=(
What’s in flight
RibonanzaNet3D training is the active workstream; the hybrid reranker runs overnight. Their current contribution is, honestly, small: 33.98 → 33.89 is roughly a 0.3% relative gain on top-1 RMSD. But the structural pieces (the 3-tier hybrid, the temporal-split reranker training, the candidate diversity protocol) are the levers that compound across future runs.