Inter-annual variation in climate is reflected in changes in the timing of phenological events. Over the last decades, a considerable number of models have been developed to explain the inter-annual variation of spring phenology in trees. In contrast to empirical models, “process-based” models aim to simulate physiological processes in order to yield more realistic predictions of growing-season onset dates. Despite increasing knowledge of the environmental controls of seasonal dormancy in trees, the detailed action and interaction of the environmental drivers involved (chilling, photoperiod and warm temperature) remain to be elucidated. This study aims at a uniform comparison of a wide range of existing models (and new recombinations of them) on a multitude of long-term observation series of six tree species across central Europe, using extensive cross-validation. Even though the assessed models differ in the dormancy phases and environmental drivers they account for, they yielded surprisingly similar prediction quality for leaf unfolding dates. Depending on the species, the lowest average prediction errors (RMSE) for leaf unfolding ranged from 7 to 9 days for the dataset pooled across sites and years, and from 4 to 6 days for site-specific predictions, with no obvious geographical pattern. Simple models that account for ecodormancy release only performed similarly to, or better than, more complex models that additionally include endodormancy release through chilling temperatures. Model parameterisation tended to converge towards similar behaviour, and models with many parameters tended to overfit the 40-year time series of leaf unfolding. Additionally, all models tended to underestimate the inter-annual variation of leaf unfolding and failed to predict very early or very late leaf unfolding in certain years.
Transferring site-specific parameters to other sites almost doubled the average prediction error, independently of the distance and climatic similarity between calibration and validation sites. These findings call into question whether the models accurately implement the physiological processes controlling spring phenology, and highlight the shortcomings of parameterising models on observational time series alone.
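The simplest model class mentioned in this abstract, which accounts for ecodormancy release only, is commonly formulated as a thermal time (growing degree-day) model. The following is a minimal illustrative sketch under stated assumptions, not one of the models evaluated in this study. Forcing accumulates as max(T - T_base, 0) from a fixed start day t0, leaf unfolding is predicted on the first day on which the sum reaches a forcing requirement F*, and F* is calibrated by grid search to minimise the RMSE against observed dates. The function names, the parameter values (t0 = 1, T_base = 5 °C, F* = 150), the candidate grid and the sinusoidal toy temperature series are all hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def thermal_time_leaf_unfolding(
    daily_temp: Sequence[float],
    t0: int = 1,
    t_base: float = 5.0,
    f_star: float = 150.0,
) -> Optional[int]:
    """Predict the day of year (DOY) of leaf unfolding with a forcing-only model.

    Hypothetical assumptions: daily_temp[0] is DOY 1, forcing accumulates
    from DOY t0 as max(T - t_base, 0), and leaf unfolding occurs on the
    first day the accumulated sum reaches f_star.
    Returns None if f_star is never reached within the series.
    """
    forcing = 0.0
    for doy in range(t0, len(daily_temp) + 1):
        forcing += max(daily_temp[doy - 1] - t_base, 0.0)
        if forcing >= f_star:
            return doy
    return None


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error (in days) between observed and predicted dates."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def calibrate_f_star(
    years: Sequence[Sequence[float]],
    observed: Sequence[int],
    candidates: Sequence[float],
) -> Optional[Tuple[float, float]]:
    """Grid search for the forcing requirement that minimises RMSE.

    Returns (best_f_star, best_rmse), or None if no candidate yields a
    prediction for every year. Ties keep the first candidate tried.
    """
    best: Optional[Tuple[float, float]] = None
    for f in candidates:
        preds = [thermal_time_leaf_unfolding(t, f_star=f) for t in years]
        if any(p is None for p in preds):
            continue  # requirement not met in some year; skip this candidate
        err = rmse(observed, preds)
        if best is None or err < best[1]:
            best = (f, err)
    return best


def synthetic_year(offset: float) -> List[float]:
    """Toy 365-day temperature curve in deg C; NOT observational data."""
    return [
        10.0 + offset - 12.0 * math.cos(2 * math.pi * (d - 15) / 365)
        for d in range(365)
    ]


if __name__ == "__main__":
    years = [synthetic_year(o) for o in (-1.5, 0.0, 1.0, 2.0)]
    # Toy "observed" dates generated by the model itself with F* = 150.
    observed = [thermal_time_leaf_unfolding(t) for t in years]
    print("toy 'observed' DOYs:", observed)
    grid = [float(f) for f in range(50, 301, 10)]
    print("calibrated (f_star, rmse):", calibrate_f_star(years, observed, grid))
```

Calibrating F* on one site's series and computing the RMSE on another site's series would mimic, in miniature, the parameter-transfer test described above.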