GenomicRanges Integration

GenomicRanges is a Python package that provides a convenient way to work with genomic ranges. BioBear can be used to read data from a GFF/GTF file quickly and convert it to a GenomicRanges object for further analysis.

From BioBear to GenomicRanges

BioBear and GenomicRanges both "speak" Polars (and pandas, for that matter), so we'll use a Polars DataFrame as an intermediary. Here's how you can convert a GTF file to a GenomicRanges object using BioBear:

First, we'll import the necessary packages and start a BioBear session:

import biobear as bb
from genomicranges import GenomicRanges

session = bb.new_session()

Then, we'll load the GTF file into a Polars DataFrame. We'll have to rename a few columns to match the column names GenomicRanges expects, which can be accomplished either with the rename method on the Polars DataFrame or with SQL aliases.

Using Polars only:

df = session.read_gtf_file("python/tests/data/test.gtf").to_polars()
df = df.rename({"seqname": "seqnames", "start": "starts", "end": "ends"})
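
Before converting, it can help to confirm that the rename produces every column the conversion needs. The following is a minimal sketch, not part of BioBear or GenomicRanges: the helper name `missing_required_columns` is hypothetical, and it assumes that `seqnames`, `starts`, and `ends` (the three names introduced by the rename above) are the columns GenomicRanges looks for.

```python
# Assumption: these are the names GenomicRanges expects, taken from the
# rename step above. Adjust them if your version expects something else.
RENAMES = {"seqname": "seqnames", "start": "starts", "end": "ends"}
REQUIRED = ("seqnames", "starts", "ends")


def missing_required_columns(columns, renames=RENAMES, required=REQUIRED):
    """Return the required names still absent after applying the renames."""
    renamed = {renames.get(name, name) for name in columns}
    return [name for name in required if name not in renamed]


# Column names as they appear in the GTF scan shown in this guide.
gtf_columns = ["seqname", "source", "type", "start", "end",
               "score", "frame", "strand", "attributes"]

print(missing_required_columns(gtf_columns))           # []
print(missing_required_columns(["seqname", "start"]))  # ['ends']
```

With a real DataFrame, the same check would be `missing_required_columns(df.columns)`, run before the rename.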

For SQL:

df = session.sql("""
SELECT seqname AS seqnames, source, type, start AS starts, "end" AS ends, score, frame, strand, attributes
FROM gtf_scan('python/tests/data/test.gtf')
""").to_polars()
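
If you'd rather not hand-write the aliased SELECT above, the query string can be generated from the same rename mapping. This is only a sketch: `build_gtf_query` is a hypothetical helper, not a BioBear function. It assumes the SQL dialect accepts double-quoted identifiers for every column, as it does for "end" above, and that the path contains no single quotes.

```python
# Column names as they appear in the SELECT above.
GTF_COLUMNS = ["seqname", "source", "type", "start", "end",
               "score", "frame", "strand", "attributes"]
ALIASES = {"seqname": "seqnames", "start": "starts", "end": "ends"}


def build_gtf_query(path, columns=GTF_COLUMNS, aliases=ALIASES):
    """Build a SELECT over gtf_scan that aliases columns per `aliases`."""
    parts = []
    for col in columns:
        quoted = f'"{col}"'  # quoting keeps reserved words such as "end" valid
        alias = aliases.get(col)
        parts.append(f"{quoted} AS {alias}" if alias else quoted)
    return f"SELECT {', '.join(parts)}\nFROM gtf_scan('{path}')"


query = build_gtf_query("python/tests/data/test.gtf")
print(query)
```

The resulting string can then be passed to `session.sql(query).to_polars()` in the same way as the handwritten query.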

Now, we can convert the Polars DataFrame to a GenomicRanges object:

gr = GenomicRanges.from_polars(df)
print(gr.get_width().mean())
# 188.77

And with that, you have a GenomicRanges object that you can use for further analysis.