Data is the backbone of any ML application. In catalysis, data can come from sources such as experimental measurements, high-throughput screening, and computational simulations. Both the quality and the quantity of that data strongly influence how well ML models perform, so it's crucial to work with well-curated datasets that accurately represent the catalytic systems under study.
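To make "well-curated" a little more concrete, here is a minimal sketch of one basic curation pass over records pooled from different sources. Everything in it is an assumption made for illustration: the `Record` type, the `curate` helper, the `activity` field, and the catalyst names and values are hypothetical placeholders, not real measurements or a prescribed workflow.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """One data point for a catalyst (hypothetical schema)."""
    catalyst: str
    source: str                # e.g. "experiment", "screening", "simulation"
    activity: Optional[float]  # illustrative target property, arbitrary units


def curate(records: List[Record]) -> List[Record]:
    """Drop records with a missing or non-finite target, then exact duplicates."""
    seen = set()
    cleaned = []
    for r in records:
        # Skip entries whose target value is absent or NaN/inf.
        if r.activity is None or not math.isfinite(r.activity):
            continue
        key = (r.catalyst, r.source, r.activity)
        # Skip exact repeats of a record we have already kept.
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned


# Placeholder values, invented purely for illustration.
raw = [
    Record("cat_A", "experiment", 0.82),
    Record("cat_A", "experiment", 0.82),          # exact duplicate
    Record("cat_B", "screening", None),           # missing target
    Record("cat_C", "simulation", float("nan")),  # non-finite target
    Record("cat_D", "simulation", 1.10),
]

curated = curate(raw)
print(f"kept {len(curated)} of {len(raw)} records")
```

A real curation step would likely go further, for example by reconciling units across sources or flagging outliers, but even this simple filter shows how pooled data can shrink once obviously unusable entries are removed.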