GenomicRanges Integration
GenomicRanges is a Python package for working with genomic ranges. BioBear can quickly read data from a GFF/GTF file and convert it to a GenomicRanges object for further analysis.
From BioBear to GenomicRanges
BioBear and GenomicRanges both "speak" Polars (and pandas, for that matter), so we'll use a Polars DataFrame as an intermediary. Here's how you can convert a GTF file to a GenomicRanges object using BioBear:
First, we'll import the necessary packages and start a BioBear session:
import biobear as bb
from genomicranges import GenomicRanges
session = bb.new_session()
Then, we'll load the GTF file into a Polars DataFrame. A few columns have to be renamed to match the column names GenomicRanges expects, which can be accomplished either with the rename method on the Polars DataFrame or with SQL aliases.
Using Polars only:
df = session.read_gtf_file("python/tests/data/test.gtf").to_polars()
df = df.rename({"seqname": "seqnames", "start": "starts", "end": "ends"})
For SQL:
df = session.sql("""
SELECT seqname AS seqnames, source, type, start AS starts, "end" AS ends, score, frame, strand, attributes
FROM gtf_scan('python/tests/data/test.gtf')
""").to_polars()
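Before converting, it can be useful to confirm that the rename produced the column names the conversion relies on. The following is a minimal sketch in plain Python, not part of BioBear or GenomicRanges: the helper names `renamed_columns` and `missing_required` are hypothetical, and the required set is assumed to be just the three names this page renames to (seqnames, starts, ends).

```python
# Column names this page renames to before calling GenomicRanges.from_polars.
# Assumption: these three are the names the conversion depends on.
REQUIRED_COLUMNS = ("seqnames", "starts", "ends")

# Same mapping as the rename step above.
RENAMES = {"seqname": "seqnames", "start": "starts", "end": "ends"}


def renamed_columns(columns):
    """Return the column names as they would look after applying RENAMES."""
    return [RENAMES.get(name, name) for name in columns]


def missing_required(columns):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Column names taken from the SQL query above.
gtf_columns = ["seqname", "source", "type", "start", "end",
               "score", "frame", "strand", "attributes"]

print(missing_required(gtf_columns))                   # ['seqnames', 'starts', 'ends']
print(missing_required(renamed_columns(gtf_columns)))  # []
```

With a real DataFrame, the same check would be `missing_required(df.columns)`; an empty list means the rename covered every expected name.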
Now, we can convert the Polars DataFrame to a GenomicRanges object:
gr = GenomicRanges.from_polars(df)
print(gr.get_width().mean())
# 188.77
And with that, you have a GenomicRanges object that you can use for further analysis.