Optimising neural network hyperparameters by predicting unseen architectures with learning curve approach

Smith, C. and Wong, T.C. (2026) Optimising neural network hyperparameters by predicting unseen architectures with learning curve approach. Applied Soft Computing, 187. 114340. ISSN 1568-4946 (https://doi.org/10.1016/j.asoc.2025.114340)

Text. Filename: Smith-Wong-ASC-2025-Optimising-neural-network-hyperparameters-by-predicting-unseen-architectures.pdf
Accepted Author Manuscript
License: Creative Commons Attribution 4.0


Abstract

Hyperparameter optimisation (HPO) is a critical yet computationally expensive task in training neural networks. While recent research has used learning curve prediction to terminate poor configurations early, no existing method can predict the full set of unseen learning curves within a search space. Current techniques often rely on meta-learning across datasets or partial curve information, limiting both generalisability and precision. This study introduces SEquential LEarning Curve Training (SELECT), a novel HPO approach that accurately predicts the full learning curves of all unseen hyperparameter configurations using a sequence prediction model trained on a subset of the same dataset. SELECT employs a new representation of learning curve data—converted into structured sequences with aligned starting points—and leverages a Convolutional Gated Recurrent Neural Network (CGRNN) for high-fidelity forecasting. Benchmarked across multiple real-world regression datasets, SELECT consistently outperformed established methods including Random Search (RS), Hyperband (HB), Gaussian Process Bayesian Optimisation (GPBO), and Tree-structured Parzen Estimator (TPE), achieving up to 20% improvements in predictive accuracy while maintaining consistent and predictable computation time. Its design enables full parallel training of configurations, making it highly suitable for large-scale or time-sensitive applications. In addition to its performance and scalability, SELECT offers a unique advantage in search space visualisation. By predicting entire learning curves, it allows practitioners to inspect the shape and structure of the optimisation landscape, revealing patterns such as diminishing returns and performance plateaus. This transparency enhances trust, interpretability, and decision-making in both research and industrial HPO workflows.
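
As a rough illustration of how predicted learning curves could be used once a model such as SELECT has produced them, the following minimal Python sketch picks the configuration with the lowest predicted loss and estimates where a curve starts to plateau. It is not the authors' implementation: the function names (best_config, plateau_epoch), the relative tolerance tol, the configuration labels, and the curve values are all hypothetical and chosen only for illustration. The curves are assumed to hold validation loss per epoch, where lower is better.

```python
from typing import Dict, List, Tuple

Curve = List[float]  # assumed: validation loss per epoch, lower is better


def best_config(curves: Dict[str, Curve]) -> Tuple[str, float]:
    """Return the configuration whose curve reaches the lowest loss, and that loss."""
    name = min(curves, key=lambda k: min(curves[k]))
    return name, min(curves[name])


def plateau_epoch(curve: Curve, tol: float = 0.01) -> int:
    """Return the earliest epoch index from which every later per-epoch
    relative improvement stays below tol (a simple diminishing-returns marker).
    If the final step still improves by at least tol, the last index is returned."""
    last = len(curve) - 1
    epoch = last
    # Walk backwards from the end while improvements remain negligible.
    for i in range(last - 1, -1, -1):
        gain = (curve[i] - curve[i + 1]) / abs(curve[i]) if curve[i] != 0 else 0.0
        if gain >= tol:
            break
        epoch = i
    return epoch


# Toy curves with made-up values (hypothetical, not results from the paper).
predicted = {
    "lr=0.01,units=32": [1.0, 0.8, 0.7, 0.69, 0.689],
    "lr=0.001,units=64": [1.0, 0.5, 0.4, 0.399, 0.3985],
}

name, loss = best_config(predicted)
print(name, loss, plateau_epoch(predicted[name]))
```

In this sketch the plateau index gives a rough sense of the epoch budget beyond which further training of the selected configuration is predicted to yield little gain; the tolerance is an arbitrary choice and would need tuning for real data.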

ORCID iDs

Smith, C. and Wong, T.C. ORCID: https://orcid.org/0000-0001-8942-1984