§ Computer Vision Strong

CSIRO Pasture Biomass Estimation

Multi-task ResNet50 + NDVI/Height fusion + constraint loss for 5 dry-matter targets on pasture imagery. Dry_Total RMSE is about 13% lower (12.7%) than the single-target baseline.

baseline single-target ~52 (Dry_Total)
best (lower) 45.41 (Dry_Total) · multi-task w/ constraint
Δ -6.59 (Dry_Total RMSE) · -12.7%
dataset 1,785 samples · 357 unique pasture images · field measurements (NDVI + height) · RGB imagery + tabular measurements
metric RMSE per target · lower is better
infra vast.ai · remote GPU
last touched 2025-10-30 · kaggle competition
technique stack
multi-task learning · ResNet50 backbone (ImageNet) · NDVI + Height feature fusion · constraint loss (Dry_Total = sum) · image-based train/val split · cosine annealing

The challenge

CSIRO + Meat & Livestock Australia + Google Australia ($75K prize, deadline 2026-01-28). Given a pasture image, predict 5 dry-matter targets: Dry_Clover_g, Dry_Dead_g, Dry_Green_g, Dry_Total_g, GDM_g. Field measurements (NDVI, height) come paired with each image.

The point isn’t just to fit one regressor — it’s that the 5 targets are physically related: total dry matter equals the sum of the three component classes. Any model that doesn’t enforce this is leaving signal on the table.

What I noticed in the data

Three structural facts that drove everything else:

  • Dry_Total = Dry_Clover + Dry_Dead + Dry_Green is exact in the labels. Not a noisy approximation — a constraint. The model should know.
  • 357 unique images × 5 targets each = 1,785 rows. Splitting by row leaks images across train and validation; the split must be by image_path.
  • State and species dominate. Pasture in NSW averages 39.7 g, WA 18.8 g; Phalaris species runs 59.2 g, Clover 18.9 g. Image-only models miss this.
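The first two facts can be checked and acted on with a few lines of plain Python. This is a minimal sketch, not the project's actual code: it assumes the labels arrive as long-format rows, and the `target_name` and `target` keys, the tolerance, and the seed are hypothetical names chosen for illustration (only `image_path` and the target names come from the text above).

```python
import random
from collections import defaultdict

def pivot_by_image(rows):
    """Turn long-format rows into {image_path: {target_name: value}}."""
    per_image = defaultdict(dict)
    for row in rows:
        per_image[row["image_path"]][row["target_name"]] = row["target"]
    return dict(per_image)

def constraint_violations(per_image, tol=1e-6):
    """Return image paths where Dry_Total differs from the sum of its three components."""
    bad = []
    for path, t in per_image.items():
        parts = t["Dry_Clover_g"] + t["Dry_Dead_g"] + t["Dry_Green_g"]
        if abs(t["Dry_Total_g"] - parts) > tol:
            bad.append(path)
    return bad

def split_by_image(image_paths, val_fraction=0.2, seed=0):
    """Split unique image paths so no image lands in both train and validation."""
    paths = sorted(set(image_paths))
    random.Random(seed).shuffle(paths)
    n_val = max(1, round(len(paths) * val_fraction))
    return set(paths[n_val:]), set(paths[:n_val])
```

Rows are then assigned to train or validation by looking up their `image_path` in the two sets, so all 5 rows of an image always travel together.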

The architecture

  • ResNet50 (ImageNet pretrained) backbone — shared feature extractor
  • Linear projection 2048 → 1024 → 512
  • Concatenate NDVI + Height (2-d) onto image features → joint vector
  • 5 task-specific heads (one per target)
  • Constraint loss term: λ · |Dry_Total − (Dry_Clover + Dry_Dead + Dry_Green)|²
  • Weighted MSE: Dry_Total carries 1.5× weight (it’s the headline metric)
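The fusion head and the combined loss listed above can be sketched in NumPy to make the shapes and the penalty concrete. Several details are assumptions, since the write-up does not state them: the ReLU activations, one linear layer per head (stacked here into a single 514 × 5 matrix), the value of λ (`lam=0.1`), and the way the weighted MSE is averaged. The ResNet50 backbone is not modeled; `image_features` stands in for its 2048-d pooled output.

```python
import numpy as np

TARGETS = ["Dry_Clover_g", "Dry_Dead_g", "Dry_Green_g", "Dry_Total_g", "GDM_g"]
CLOVER, DEAD, GREEN, TOTAL, GDM = range(5)

def init_head(rng):
    """Random weights for the layers after the backbone (illustration only)."""
    def dense(n_in, n_out):
        return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)
    return {
        "proj1": dense(2048, 1024),
        "proj2": dense(1024, 512),
        "heads": dense(512 + 2, len(TARGETS)),  # 5 single-output heads, stacked
    }

def forward(params, image_features, tabular):
    """image_features: (batch, 2048); tabular: (batch, 2) holding NDVI and height."""
    w1, b1 = params["proj1"]
    w2, b2 = params["proj2"]
    wh, bh = params["heads"]
    hidden = np.maximum(image_features @ w1 + b1, 0.0)  # ReLU is an assumption
    hidden = np.maximum(hidden @ w2 + b2, 0.0)
    joint = np.concatenate([hidden, tabular], axis=1)   # (batch, 514) joint vector
    return joint @ wh + bh                              # (batch, 5) predictions

def multitask_loss(pred, target, lam=0.1, total_weight=1.5):
    """Weighted MSE over the 5 targets plus lam * squared sum-constraint residual."""
    weights = np.ones(len(TARGETS))
    weights[TOTAL] = total_weight
    weighted_mse = np.mean(weights * (pred - target) ** 2)
    residual = pred[:, TOTAL] - (pred[:, CLOVER] + pred[:, DEAD] + pred[:, GREEN])
    return float(weighted_mse + lam * np.mean(residual ** 2))
```

Note that the penalty is soft: it pushes predictions toward Dry_Total = Dry_Clover + Dry_Dead + Dry_Green but does not guarantee it. A hard alternative would be to predict the three components and derive the total by summation.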

Training: H100 PCIe on vast.ai, batch 16, AdamW @ 1e-4, cosine annealing, 67 epochs (early-stopped). Image resolution 224×224 (source is 2000×1000; revisiting at higher res is the obvious next experiment).
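For reference, the standard cosine annealing schedule decays the learning rate from its base value to a floor along half a cosine period. The sketch below uses the base rate of 1e-4 from the training setup; the schedule length (`total_epochs=100`) and the floor (`min_lr=0.0`) are hypothetical, since only the early-stopped epoch count is given.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=100, base_lr=1e-4, min_lr=0.0):
    """Learning rate at `epoch` under cosine annealing over `total_epochs`."""
    progress = min(epoch, total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Learning rate at the start, at the early-stopping epoch, and at the scheduled end.
for epoch in (0, 67, 100):
    print(epoch, cosine_annealing_lr(epoch))
```

Under these assumed settings, a run that stops at epoch 67 ends well before the rate reaches its floor, so the final epochs still take moderately sized steps.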

Results

Target          Val RMSE
Dry_Total_g     45.41 (main metric)
Dry_Green_g     26.47
GDM_g           32.63
Dry_Dead_g      15.92
Dry_Clover_g    13.64
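Per-target RMSE, the metric reported above, is the root of the mean squared error computed separately for each target column. A minimal NumPy sketch, assuming predictions and labels are arranged as (n_images, n_targets) arrays; the function name is hypothetical:

```python
import numpy as np

def rmse_per_target(y_true, y_pred, names):
    """Return {target name: RMSE}, computed column by column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2, axis=0))
    return dict(zip(names, rmse.tolist()))
```

Because RMSE is in grams and not normalized, targets with larger typical values (Dry_Total) naturally show larger errors than small ones (Dry_Clover), so the rows are not directly comparable.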

Dry_Total RMSE is about 13% lower than the single-target baseline (separate regressors per target, no constraint loss, no feature fusion). The constraint loss contributed roughly half of the lift, which makes it the cheapest, highest-impact modeling choice on this dataset.
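The headline figure follows directly from the two Dry_Total RMSE values. The baseline is only reported as approximately 52, so the result is approximate too:

```python
baseline_rmse = 52.0  # reported as "~52", so treat as approximate
best_rmse = 45.41

delta = best_rmse - baseline_rmse   # negative means lower (better) RMSE
relative = delta / baseline_rmse

print(f"delta = {delta:.2f}, relative = {relative:.1%}")
# prints: delta = -6.59, relative = -12.7%
```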

What I’d do differently

  • Higher input resolution. 2000×1000 source → 224×224 input throws away most of the spatial signal. A 512×512 or 768×768 crop schedule (with random crops in training) should help the dense-fine-grass cases.
  • Per-state / per-species heads. State and species are categorical features with strong per-class biomass priors; conditioning the model on these (either as one-hots concatenated into the joint vector, or as separate task groups) is probably worth another few %.
  • Cross-validation, not a single split. With only 357 unique images, a single 80/20 split gives a noisy validation estimate. 5-fold CV with folds grouped by image (so no image appears in both train and validation) would tighten the estimate.
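The cross-validation idea can be sketched with the standard library alone. This is an illustration under assumptions: the function name and seed are hypothetical, and folds are built from unique image paths so that all 5 rows of an image stay on the same side of every split.

```python
import random

def image_kfold(image_paths, n_splits=5, seed=0):
    """Yield (train_paths, val_paths) pairs with each image in exactly one validation fold."""
    paths = sorted(set(image_paths))
    random.Random(seed).shuffle(paths)
    folds = [paths[i::n_splits] for i in range(n_splits)]
    for i, val in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, val
```

With 357 images and 5 folds, each validation fold holds 71 or 72 images. Reporting the mean and standard deviation of Dry_Total RMSE across folds would show how much of a 6 to 7 point gain is signal. Stratifying folds by state or species would be a further refinement that this sketch does not attempt.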

Source archive

Full pipeline (notebooks + training scripts + submission notebooks + model checkpoints pointer) lives at github.com/Shivam-Bhardwaj/csiro-kaggle-pasture-biomass (archived 2026-05-22). Unarchive to resume.