Troubleshooting
This page lists common problems encountered when using the heterodyne package and their recommended solutions.
NLSQ Convergence Failure
Symptom: result.success is False; the solver reports "Maximum number of function evaluations reached" or "Cost function not decreasing."
Possible causes and remedies:
Poor initial guess – Use multi-start optimisation (n_starts=20 or more) to explore the parameter space from diverse starting points.
Multi-modal landscape – Switch to CMA-ES for global search, then refine the result with NLSQ. See CMA-ES Global Optimisation.
Tight bounds – Widen parameter bounds in the configuration. Check whether any parameter is hitting its bound at the solution (result.validate() will flag this).
Ill-conditioned Jacobian – Check the condition number of the Jacobian at the solution. If it is extremely large, consider fixing one or more weakly constrained parameters (e.g., D_offset_ref, f2).
Insufficient function evaluations – Increase max_nfev in NLSQConfig.
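The multi-start and condition-number checks above can be sketched generically with SciPy's least_squares. This is not heterodyne's NLSQ code path. The helper multistart_least_squares and the three-parameter exponential decay are hypothetical stand-ins, and drawing starting points uniformly inside the bounds is an assumed choice.

```python
import numpy as np
from scipy.optimize import least_squares


def multistart_least_squares(residual_fn, lower, upper, n_starts=20, seed=0, max_nfev=2000):
    """Hypothetical helper: run least_squares from n_starts random points inside
    the bounds and keep the solution with the lowest cost."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lower, upper)  # starting point drawn uniformly inside the box
        res = least_squares(residual_fn, x0, bounds=(lower, upper), max_nfev=max_nfev)
        if best is None or res.cost < best.cost:
            best = res
    # Condition number of the Jacobian at the best solution; very large values
    # point to weakly constrained parameter combinations.
    cond = np.linalg.cond(best.jac)
    return best, cond


# Toy problem (not the heterodyne model): y = a * exp(-b * t) + c
t = np.linspace(0.0, 5.0, 50)
y = 2.0 * np.exp(-1.5 * t) + 0.3
best, cond = multistart_least_squares(
    lambda p: p[0] * np.exp(-p[1] * t) + p[2] - y,
    lower=[0.0, 0.0, 0.0],
    upper=[5.0, 5.0, 1.0],
    n_starts=10,
)
print(best.x, best.success, f"cond(J) = {cond:.3g}")
```

Keeping only the lowest-cost run means a single bad start cannot spoil the result. Inspecting the condition number afterwards shows whether the winning solution is actually well determined.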
CMC Divergent Transitions
Symptom: NumPyro reports divergent transitions during NUTS sampling; result.convergence_passed is False.
Remedies:
Increase target acceptance probability – Set target_accept_prob=0.95 in CMCConfig. A higher target makes warmup adapt a smaller step size, which improves sampling in regions of high curvature.
Check priors – Overly wide priors can send the sampler into unphysical regions. Reduce nlsq_prior_width_factor from 5.0 to 3.0.
Tighten bounds – Ensure parameter bounds exclude regions where the model is undefined or numerically unstable.
Increase warmup – More warmup iterations allow the sampler to adapt its step size and mass matrix more thoroughly.
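A toy one-dimensional example shows why a smaller step size helps. NUTS advances each trajectory with the leapfrog integrator, and NumPyro marks a transition as divergent when the energy error along the trajectory becomes very large. For a Gaussian target with standard deviation sigma, leapfrog is stable only when step_size < 2 * sigma, so regions of high curvature (small sigma) require smaller steps. The Gaussian target here is an illustrative stand-in, not the heterodyne posterior, and leapfrog_energy_error is a hypothetical helper written for this sketch.

```python
def leapfrog_energy_error(step_size, sigma=1.0, n_steps=20, q0=1.0, p0=0.0):
    """Largest |H - H0| along a leapfrog trajectory for the potential
    U(q) = q**2 / (2 * sigma**2) with unit mass (toy 1-D Gaussian target)."""

    def grad_u(q):
        return q / sigma**2

    def hamiltonian(q, p):
        return 0.5 * q**2 / sigma**2 + 0.5 * p**2

    q, p = q0, p0
    h0 = hamiltonian(q, p)
    max_err = 0.0
    for _ in range(n_steps):
        p = p - 0.5 * step_size * grad_u(q)  # half step in momentum
        q = q + step_size * p                # full step in position
        p = p - 0.5 * step_size * grad_u(q)  # half step in momentum
        max_err = max(max_err, abs(hamiltonian(q, p) - h0))
    return max_err


print(leapfrog_energy_error(0.5))             # small, bounded energy error
print(leapfrog_energy_error(2.5))             # step too large: error explodes
print(leapfrog_energy_error(0.5, sigma=0.2))   # same step, higher curvature
```

With sigma=1.0, a step of 0.5 keeps the energy error at about 0.03 or below, while a step of 2.5 makes it grow by many orders of magnitude within 20 steps. The step of 0.5 also becomes unstable once sigma shrinks to 0.2, which is the high-curvature situation that a higher target_accept_prob guards against.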
Memory Errors
Symptom: MemoryError or the process is killed by the OOM killer.
Remedies:
Switch NLSQ strategy – Use the chunked or sequential strategy:
config = NLSQConfig(strategy="chunked", chunk_size=128)
Trim frame range – Load only the frames you need:
data = loader.load(frame_start=0, frame_end=500)
Increase CMC shards – With more shards, each shard holds less data and therefore needs less memory:
cmc_config = CMCConfig(num_shards=16)
Check for memory leaks – If memory grows across multiple fits, ensure you are not accumulating JAX arrays in a loop without releasing references.
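A rough estimate shows why trimming frames and chunking help. It assumes each angle contributes a dense n_frames × n_frames C2 matrix stored in float64, and that the Jacobian is held densely with one column per free parameter. Under those assumptions memory grows with the square of the frame count, and the Jacobian dominates. This is a back-of-the-envelope sketch, not heterodyne's actual allocation pattern. The function estimate_nlsq_memory_gb and the value n_params=14 are hypothetical.

```python
def estimate_nlsq_memory_gb(n_frames, n_angles, n_params, bytes_per_value=8):
    """Hypothetical dense-memory estimate (GB, 1e9 bytes) for one NLSQ evaluation:
    the C2 data, the residual vector and a dense Jacobian."""
    n_residuals = n_angles * n_frames**2  # one residual per C2 element
    data_bytes = n_residuals * bytes_per_value
    residual_bytes = n_residuals * bytes_per_value
    jacobian_bytes = n_residuals * n_params * bytes_per_value
    return (data_bytes + residual_bytes + jacobian_bytes) / 1e9


full = estimate_nlsq_memory_gb(n_frames=1000, n_angles=3, n_params=14)
trimmed = estimate_nlsq_memory_gb(n_frames=500, n_angles=3, n_params=14)
print(f"{full:.3f} GB -> {trimmed:.3f} GB after halving the frame range")
```

With these numbers the estimate is 0.384 GB for 1000 frames and 0.096 GB for 500 frames. Halving the frame range cuts memory by a factor of four, and most of the total comes from the Jacobian, which is the part a chunked strategy avoids holding all at once.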
JAX Compilation Slow
Symptom: The first NLSQ call takes minutes before any fitting begins.
Causes:
Large array shapes – JIT compilation time scales with the complexity of the computation graph. For very large \(C_2\) matrices, use the chunked or sequential strategy to avoid compiling a single monolithic kernel.
Inconsistent shapes – JAX recompiles whenever input shapes change. Ensure all angles use the same number of frames, or pad to a common size.
Thread contention – Verify that OMP_NUM_THREADS is set appropriately; over-subscription (more threads than available cores) can slow compilation.
XLA flags not set – Run heterodyne-config-xla to configure the recommended XLA compiler flags.
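Padding to a common size can be sketched in NumPy. Under jax.jit, a fixed (n_angles, n_max, n_max) input shape means the residual function is traced and compiled once, rather than once per distinct frame count. The helper pad_c2_stack is hypothetical and not part of heterodyne. It assumes each angle's C2 is a square n_frames × n_frames array and that residuals are multiplied by the mask so padded entries do not affect the fit.

```python
import numpy as np


def pad_c2_stack(c2_list):
    """Hypothetical helper: pad square C2 matrices with different frame counts
    to a common (n_max, n_max) shape.

    Returns the padded stack and a boolean mask that is True on real data, so
    padded entries can be excluded from the residuals."""
    n_max = max(c2.shape[0] for c2 in c2_list)
    padded = np.zeros((len(c2_list), n_max, n_max))
    mask = np.zeros((len(c2_list), n_max, n_max), dtype=bool)
    for i, c2 in enumerate(c2_list):
        n = c2.shape[0]
        padded[i, :n, :n] = c2
        mask[i, :n, :n] = True
    return padded, mask


# Two angles with different numbers of frames
c2_a = np.ones((3, 3))
c2_b = 2.0 * np.ones((5, 5))
padded, mask = pad_c2_stack([c2_a, c2_b])

# Masked residuals: padded entries are excluded even though the model is nonzero there
model = np.full_like(padded, 0.5)
cost = np.sum(((padded - model) * mask) ** 2)
print(padded.shape, int(mask.sum()), cost)
```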
Parameter at Bounds
Symptom: A fitted parameter is exactly at its lower or upper bound; result.validate() may not flag this directly, but uncertainties for that parameter will be unreliable.
Remedies:
Widen bounds – If the physical range permits, relax the bound the parameter has reached (lower the lower bound or raise the upper bound).
Check initial values – A poor starting point near a bound can trap the optimiser.
Fix the parameter – If the data cannot constrain a parameter, fix it to a physically motivated value and re-fit.
Inspect the residuals – Parameter-at-bound may indicate a model mismatch rather than a bound problem.
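An at-bound check that does not depend on what result.validate() reports can be sketched as follows. It assumes you can obtain the fitted values, the bounds and the parameter names as arrays. The helper params_at_bounds is hypothetical, the relative tolerance rtol is an arbitrary choice, and the values and bounds in the example are made up rather than heterodyne defaults.

```python
import numpy as np


def params_at_bounds(values, lower, upper, names, rtol=1e-6):
    """Hypothetical helper: names of parameters lying within rtol * (upper - lower)
    of either bound."""
    values = np.asarray(values, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    tol = rtol * (upper - lower)  # tolerance scaled to each parameter's range
    hit = (values - lower <= tol) | (upper - values <= tol)
    return [name for name, flag in zip(names, hit) if flag]


# Illustrative values and bounds (not heterodyne defaults)
result = params_at_bounds(
    [1e4, 0.5, 1.0],
    lower=[0.0, 0.0, 0.0],
    upper=[1e4, 10.0, 1.0],
    names=["D0", "v0", "f0"],
)
print(result)  # ['D0', 'f0']
```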
Poor R-hat After CMC
Symptom: \(\hat{R} > 1.1\) for one or more parameters.
Remedies:
Run longer – Increase num_warmup and num_samples.
Increase chains – More chains give more reliable \(\hat{R}\) estimates: num_chains=6 or num_chains=8.
Check for bimodality – Use plot_shard_comparison(shard_results) to see whether shards converge to different modes.
Improve warm-start – Warm-starting CMC from a better NLSQ solution helps the chains reach the correct region faster.
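To make the \(\hat{R}\) threshold concrete, the classic split-\(\hat{R}\) (Gelman-Rubin) can be computed directly from an array of draws. The helper split_rhat is hypothetical, and the synthetic chains are made up for illustration. Some libraries (for example ArviZ by default) report a rank-normalised variant, so their values can differ slightly, but the interpretation is the same: values well above 1 mean the chains disagree about the posterior.

```python
import numpy as np


def split_rhat(samples):
    """Classic split-R-hat for one parameter; samples has shape (n_chains, n_draws)."""
    samples = np.asarray(samples, dtype=float)
    half = samples.shape[1] // 2
    # Split every chain into two halves: 2 * n_chains sub-chains of length `half`.
    chains = np.concatenate([samples[:, :half], samples[:, half:2 * half]], axis=0)
    n = chains.shape[1]
    b = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    w = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    var_hat = (n - 1) / n * w + b / n        # pooled variance estimate
    return float(np.sqrt(var_hat / w))


rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))  # four chains sampling the same distribution
stuck = mixed.copy()
stuck[3] += 5.0                     # one chain stuck in a different mode
print(f"{split_rhat(mixed):.3f} {split_rhat(stuck):.3f}")
```

The shifted chain pushes \(\hat{R}\) well above 1.1. This is the same pattern a bimodal posterior produces when different chains or shards settle in different modes.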
NaN or Inf in Results
Symptom: Fitted parameters contain NaN or Inf.
Causes:
NaN in input data – Check np.any(np.isnan(c2_data)); use np.all(np.isfinite(c2_data)) to catch Inf as well. The loader's validation should catch this, but preprocessed data may slip through.
Numerical overflow – Very large D0 or v0 values combined with long time spans can cause overflow in the exponential. Tighten the upper bounds on these parameters.
Division by zero – If the fraction function reaches exactly 0 or 1, some terms may become degenerate. Check the f0 and f3 bounds.
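The input-data check and the overflow limit can be wrapped in a small diagnostic. The helpers nonfinite_report and max_safe_exponent are hypothetical. The second one shows where overflow starts: exp(x) becomes Inf once x exceeds about 88.7 in float32 (JAX's default precision unless 64-bit mode is enabled) or about 709.8 in float64. It is therefore worth checking how large an exponent argument D0, v0 and the time span can produce under the current bounds.

```python
import numpy as np


def nonfinite_report(c2_data):
    """Count NaN and +/-Inf entries in an array of C2 data (any shape)."""
    arr = np.asarray(c2_data)
    return {"nan": int(np.isnan(arr).sum()), "inf": int(np.isinf(arr).sum())}


def max_safe_exponent(dtype):
    """Largest x for which exp(x) is still finite in the given floating-point dtype."""
    return float(np.log(np.finfo(dtype).max))


c2 = np.ones((4, 4))
c2[0, 1] = np.nan
c2[2, 3] = np.inf
print(nonfinite_report(c2))  # {'nan': 1, 'inf': 1}
print(max_safe_exponent(np.float32), max_safe_exponent(np.float64))
```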