MBP scientist Dr. Michael Hoffman recently led the effort to formalize and standardize a widely used genomics file format, the Browser Extensible Data (BED) format, which is now an approved Global Alliance for Genomics and Health (GA4GH) standard.
Genomic features such as genes, regulatory elements and repeated sequences, as well as RNA, can have consequences for human health and disease. To better understand disease-causing genes, we must clearly document these features. Investigators use a process called genome annotation to identify which genomic features are present in a DNA sequence, where they are located and what they do.
Over the past two decades, the Browser Extensible Data (BED) file format has become a popular method of capturing the location of genomic features and associated annotations.
Although the format appears straightforward, it has lacked formal conventions, which has led to a plethora of ways to fill in and structure its fields. The new specification aims to define a numerical range for each specified BED field and to provide semantics for whitespace, sorting, default values, and other previously missing details.
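To make the idea of field ranges and defaults concrete, here is a minimal Python sketch of a reader for the first six BED fields. It is an illustration only, not the validation rules of the GA4GH specification: the names `BedRecord` and `parse_bed_line` are hypothetical, the sketch assumes tab-delimited input, and the checks reflect the commonly documented conventions (a zero-based start, an end no smaller than the start, a score from 0 to 1000, and a strand of `+`, `-` or `.`).

```python
from typing import NamedTuple, Optional


class BedRecord(NamedTuple):
    """Hypothetical container for the first six BED fields."""
    chrom: str
    chrom_start: int              # zero-based start position
    chrom_end: int                # end position, not included in the feature
    name: Optional[str] = None    # optional fields default to None here
    score: Optional[int] = None
    strand: Optional[str] = None


def parse_bed_line(line: str) -> BedRecord:
    """Parse one tab-delimited BED line (assumption: tabs only)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, chromStart, chromEnd")

    chrom = fields[0]
    start, end = int(fields[1]), int(fields[2])  # int() raises ValueError on non-numbers
    if start < 0 or end < start:
        raise ValueError(f"invalid coordinates: {start}-{end}")

    name = fields[3] if len(fields) > 3 else None

    score = int(fields[4]) if len(fields) > 4 else None
    if score is not None and not 0 <= score <= 1000:
        raise ValueError(f"score out of range 0-1000: {score}")

    strand = fields[5] if len(fields) > 5 else None
    if strand is not None and strand not in {"+", "-", "."}:
        raise ValueError(f"invalid strand: {strand!r}")

    return BedRecord(chrom, start, end, name, score, strand)


if __name__ == "__main__":
    print(parse_bed_line("chr7\t127471196\t127472363\tPos1\t0\t+"))
```

A reader like this rejects a line whose end precedes its start or whose score exceeds 1000, which is the kind of behaviour that differs between tools when a format has no agreed rules.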
“By standardising the BED format, we can reduce any misinterpretation when using the format, minimise issues when interoperating between software tools, and ultimately avoid errors and inconsistencies in scientific results,” according to Dr. Hoffman, who is also the co-maintainer of the specification.
Dr. Hoffman’s team, which included MBP MSc graduate Danielle Denisko, built a tool to screen for correct behaviour across software tools that read BED files. These efforts are reported in the preprint, “Assessing and assuring interoperability of a genomics file format,” available on bioRxiv.
Please join the Department of Medical Biophysics in congratulating Dr. Hoffman and his team on this remarkable achievement.
Read the official press release from the Global Alliance for Genomics and Health.