Published October 26, 2025 | Version 1.0.0
Poster | Open

EBI Search: Engineering and Sustaining Metadata Infrastructure for Life Sciences

  • 1. 0000-0001-7037-2422
  • 2. 0000-0001-8616-2585
  • 3. 0000-0002-0626-984X
  • 4. 0009-0001-4046-469X
  • 5. 0000-0001-8479-0262

Description

EBI Search indexes over 6.5 billion biological records across more than 170 datasets, providing the core metadata infrastructure behind EMBL-EBI’s discovery tools. It supports over 2.3 billion requests annually and enables unified search across biological data from both EMBL-EBI and external resources.

This poster presents an architectural overview of the EBI Search infrastructure and the engineering strategies used to sustain and evolve the system under increasing data volumes and user demands. These strategies include nightly parallel indexing pipelines, index partitioning to stay within Lucene's per-index document limit, and API optimisations supporting faceted queries and bulk streaming across 2 TB of data.
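As an illustration of the index partitioning idea mentioned above, the following minimal Python sketch estimates how many partitions a record set needs and routes each record to a partition by a stable hash. It is not the EBI Search implementation: the function names, the hash-based routing, and the `headroom` parameter are assumptions made for this example. The only external fact used is Lucene's hard cap on documents in a single index (`IndexWriter.MAX_DOCS`, which is `Integer.MAX_VALUE - 128`).

```python
import hashlib

# Lucene's hard cap on documents in one index:
# IndexWriter.MAX_DOCS = Integer.MAX_VALUE - 128
LUCENE_MAX_DOCS = 2**31 - 1 - 128  # 2,147,483,519


def min_partitions(total_docs: int, headroom: float = 0.8) -> int:
    """Smallest partition count that keeps each partition under
    headroom * LUCENE_MAX_DOCS (hypothetical sizing rule)."""
    if total_docs < 0:
        raise ValueError("total_docs must be non-negative")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    per_partition = int(LUCENE_MAX_DOCS * headroom)
    # Integer ceiling division avoids floating-point rounding.
    return max(1, -(-total_docs // per_partition))


def partition_for(record_id: str, num_partitions: int) -> int:
    """Map a record ID to a partition using a stable hash, so the same
    ID lands in the same partition on every indexing run."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    digest = hashlib.sha1(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    n = min_partitions(6_500_000_000)
    print(n, partition_for("EXAMPLE_ID_1", n))  # "EXAMPLE_ID_1" is a made-up ID
```

Under these assumptions, 6.5 billion records held in one logical index would need at least four partitions, since three partitions at the Lucene cap hold fewer than 6.45 billion documents. In practice the records are spread across more than 170 datasets, so partitioning would matter only for datasets that approach the limit on their own.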

We explore how the team behind EBI Search balances competing requirements while maintaining an infrastructure with nearly two decades of continuous operation: optimising performance while ensuring reliability, expanding functionality without breaking backward compatibility, and delivering consistent performance across diverse data formats.

Through this case study of EBI Search, we reflect on how targeted engineering decisions create lasting research infrastructure, connecting scientists with the data essential for breakthrough biological discoveries.

Files (346.1 kB)

136_poster_-_Dalia_Al-Shahrabi.pdf (346.1 kB)
md5:663a007fead7da7f9f6f7ddae71db254