Description
Graph Neural Networks (GNNs) have emerged as promising candidates for particle reconstruction and identification in high-energy physics, but their computational complexity makes them challenging to deploy in real-time data processing pipelines. In the next-generation LHCb calorimeter, detector hits, characterized by energy, position, and timing, can be naturally encoded as node features, with spatial and energy-based relationships captured through edge features. This study investigates strategies to reduce both the structural complexity and the numerical precision of GNNs to meet stringent real-time processing and resource constraints. We demonstrate that omitting explicit edge features and replacing conventional full message passing with learnable, permutation-invariant aggregation functions yields up to an 8× reduction in CPU inference time, while maintaining or even surpassing the energy resolution and classification performance of baseline methods. Furthermore, we explore post-training quantization, reducing model weights from 32-bit floating point (FP32) to 16-bit or 8-bit integers. While quantization can in principle offer additional efficiency gains, lightweight GNNs with approximately 100k parameters exhibit minor inference-time performance degradation under aggressive precision reduction. We also present a knowledge distillation experiment in which we train a compact student model to mimic the performance of a larger, more complex teacher network. Our findings provide practical design guidelines for developing fast, efficient, and high-performing GNNs for real-time particle reconstruction in LHCb's upgraded calorimeter, while also highlighting the limitations of quantization in small neural network architectures.
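
To make the edge-free design concrete, here is a minimal PyTorch sketch of a learnable, permutation-invariant aggregation layer in the spirit of GarNet: hits are soft-assigned to a few learned aggregators, pooled with order-independent reductions, and broadcast back, with no pairwise edge messages. The framework choice, layer sizes, and exponential distance weighting are assumptions for illustration, not the actual LHCb model.

```python
import torch
import torch.nn as nn

class EdgelessAggregation(nn.Module):
    """GarNet-style aggregation without explicit edges (illustrative sketch).

    Each calorimeter hit is soft-assigned to a small set of learned
    aggregators; hit features are pooled per aggregator with
    permutation-invariant reductions (mean and max) and broadcast back
    to every hit, replacing pairwise message passing. All sizes are
    hypothetical, not the actual LHCb configuration.
    """

    def __init__(self, in_dim: int = 4, hidden: int = 32, n_agg: int = 4):
        super().__init__()
        self.encode = nn.Linear(in_dim, hidden)    # per-hit feature transform
        self.distance = nn.Linear(in_dim, n_agg)   # learned hit-to-aggregator distances
        self.decode = nn.Linear(2 * hidden * n_agg + in_dim, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_hits, in_dim), e.g. (energy, x, y, time) per hit
        h = self.encode(x)                          # (n_hits, hidden)
        w = torch.exp(-self.distance(x).abs())      # (n_hits, n_agg) soft assignments
        weighted = h.unsqueeze(1) * w.unsqueeze(2)  # (n_hits, n_agg, hidden)
        # Permutation-invariant pooling over hits:
        pooled = torch.cat(
            [weighted.mean(dim=0), weighted.max(dim=0).values], dim=-1
        )                                           # (n_agg, 2 * hidden)
        # Broadcast the pooled summaries back to every hit, weighted by w:
        back = (pooled.unsqueeze(0) * w.unsqueeze(2)).flatten(start_dim=1)
        return torch.relu(self.decode(torch.cat([back, x], dim=-1)))

# Usage on a dummy event with 50 hits and 4 features each:
hits = torch.randn(50, 4)
out = EdgelessAggregation()(hits)   # (50, 32) updated hit embeddings
```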
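For the post-training quantization step, one common route is dynamic quantization of the linear layers to INT8; the sketch below uses PyTorch's built-in utility on a hypothetical stand-in model, since the abstract does not name a framework or toolchain. For networks of only ~100k parameters, the per-call quantize/dequantize overhead can offset the reduced arithmetic cost, consistent with the limited gains reported above.

```python
import torch
import torch.nn as nn

# Stand-in for the trained FP32 network (hypothetical sizes, not the real model).
model_fp32 = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
model_fp32.eval()

# Dynamic post-training quantization: Linear weights are stored as INT8 and
# activations are quantized on the fly at inference time; no retraining needed.
model_int8 = torch.ao.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)

dummy_hits = torch.randn(256, 4)       # a batch of dummy hit features
with torch.no_grad():
    scores = model_int8(dummy_hits)    # inference on the quantized model
```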
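The knowledge distillation experiment can be illustrated with the standard temperature-based loss, blending hard-label cross-entropy with a softened KL term against the teacher's outputs; the temperature and mixing weight below are illustrative defaults, not the values used in the study.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      T: float = 4.0, alpha: float = 0.5):
    """Blend hard-label cross-entropy with a soft-label KL term against
    the teacher, softened by temperature T. T and alpha are illustrative."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)   # rescale so soft-term gradients match the hard term
    return alpha * hard + (1.0 - alpha) * soft
```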
Abstract
We present lightweight, attention-enhanced Graph Neural Networks (GNNs) tailored for real-time particle reconstruction and identification in LHCb's next-generation calorimeter. Our architecture builds on node-centric GarNet layers, which eliminate costly edge message passing and are optimized for FPGA deployment, achieving sub-microsecond inference latency. By integrating attention mechanisms with global aggregation, our models achieve up to 8× faster inference than traditional message-passing GNNs while surpassing conventional algorithms in energy resolution. Through model distillation, quantization, and firmware-level integration, we pave the way toward real-time data filtering with GNNs in the LHCb trigger system.
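
As a sketch of how attention can be combined with global aggregation in a node-centric layer, the snippet below scores each hit embedding and forms an attention-weighted event summary instead of exchanging edge messages. This is an assumed minimal form for illustration, not the exact LHCb architecture.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Attention-weighted global aggregation over hit embeddings: each hit
    gets a learned relevance score, and the graph-level summary is the
    score-weighted sum. Dimensions are illustrative."""

    def __init__(self, dim: int = 32):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # per-hit attention logit

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (n_hits, dim) hit embeddings
        a = torch.softmax(self.score(h), dim=0)   # weights sum to 1 over hits
        return (a * h).sum(dim=0)                 # (dim,) global event summary
```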