Reading from Object Stores

This document covers how to read from object stores, such as S3 and GCS, in Exon and related tools like biobear and exonr. The S3 instructions also apply to S3-compatible APIs, such as Cloudflare R2, LocalStack, and MinIO.

For example, to read a file from S3 in exon-duckdb:

-- This script assumes the following environment variables are set:
-- export AWS_PROFILE="my-profile"
-- export AWS_DEFAULT_REGION="us-east-1"

LOAD exon;
SELECT * FROM read_fasta('s3://bucket/test.fa') LIMIT 5;

S3 and S3 Compatible APIs

For S3, if you're on a personal computer, setting the AWS_PROFILE and AWS_DEFAULT_REGION environment variables should be sufficient.

For automated use, a recommendation is harder to give because it depends on the specific use case. You can set some combination of the following environment variables to indicate the credentials to use:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION
AWS_ENDPOINT
AWS_SESSION_TOKEN
AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
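
When debugging an automated job, it can help to check which of these variables the process actually sees. The following is a minimal Python sketch using only the standard library. The helper name `summarize_aws_env` is hypothetical and not part of Exon or biobear, and the variable list is copied from the list above rather than read from Exon itself.

```python
import os

# The variables listed above; copied from this document, not from Exon.
AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "AWS_ENDPOINT",
    "AWS_SESSION_TOKEN",
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
)


def summarize_aws_env(env=None):
    """Report whether each variable is set and non-empty.

    Hypothetical helper for illustration. Only booleans are returned,
    never the values, so the summary is safe to log.
    """
    env = os.environ if env is None else env
    return {name: bool(env.get(name)) for name in AWS_VARS}


if __name__ == "__main__":
    for name, is_set in summarize_aws_env().items():
        print(f"{name}: {'set' if is_set else 'not set'}")
```

Which combination is actually required depends on how the job authenticates, so this only reports presence and does not judge whether the configuration is complete.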

Cloudflare R2

The Cloudflare R2 service has an S3-compatible API, so it is supported by setting the relevant environment variables to your R2 values:

AWS_ACCESS_KEY_ID  # your cloudflare access key id
AWS_SECRET_ACCESS_KEY # your cloudflare secret access key
AWS_DEFAULT_REGION # the region to use; `auto` is likely sufficient
AWS_ENDPOINT # the endpoint to use, e.g. https://$CLOUDFLARE_ACCOUNT_ID.r2.cloudflarestorage.com

After that, pass paths of the form s3://bucket/path/to/file to biobear tools, where bucket is the R2 bucket name and path/to/file is the object's path within that bucket.
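
As an illustration, the R2 variables can be assembled in Python before any read is attempted. This is a sketch under stated assumptions: `r2_environment` is a hypothetical helper, the `R2_ACCESS_KEY_ID` and `R2_SECRET_ACCESS_KEY` names are invented for this example, and the endpoint simply follows the URL pattern shown above.

```python
import os


def r2_environment(account_id, access_key_id, secret_access_key, region="auto"):
    """Build the environment variables described above for Cloudflare R2.

    Hypothetical helper; the endpoint follows the pattern
    https://$CLOUDFLARE_ACCOUNT_ID.r2.cloudflarestorage.com.
    """
    if not account_id:
        raise ValueError("account_id must not be empty")
    return {
        "AWS_ACCESS_KEY_ID": access_key_id,
        "AWS_SECRET_ACCESS_KEY": secret_access_key,
        "AWS_DEFAULT_REGION": region,
        "AWS_ENDPOINT": f"https://{account_id}.r2.cloudflarestorage.com",
    }


if __name__ == "__main__":
    # Placeholder fallbacks only; real values would come from your own
    # secrets handling. The R2_* names are assumptions of this sketch.
    os.environ.update(
        r2_environment(
            account_id=os.environ.get("CLOUDFLARE_ACCOUNT_ID", "example-account"),
            access_key_id=os.environ.get("R2_ACCESS_KEY_ID", "example-key-id"),
            secret_access_key=os.environ.get("R2_SECRET_ACCESS_KEY", "example-secret"),
        )
    )
    print(os.environ["AWS_ENDPOINT"])
```

Since the settings are taken from the process environment, they presumably need to be in place before the first s3:// path is read.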

LocalStack

LocalStack is a useful tool for local development and testing. It provides a local S3-compatible API, among other things. To use it, set the following environment variables:

AWS_ACCESS_KEY_ID  # your localstack access key id
AWS_SECRET_ACCESS_KEY # your localstack secret access key
AWS_DEFAULT_REGION # the region to use
AWS_ENDPOINT_URL # the endpoint, e.g. http://localhost:4566 if running on the default port
AWS_ALLOW_HTTP # allow http connections, useful for local development
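
The LocalStack settings above can be sketched in Python as follows. Several details here are assumptions rather than documented behavior: `localstack_environment` is a hypothetical helper, the `test` credentials follow a common LocalStack convention, the `us-east-1` default is arbitrary, and `true` is assumed to be an accepted value for AWS_ALLOW_HTTP.

```python
from urllib.parse import urlparse


def localstack_environment(endpoint="http://localhost:4566", region="us-east-1"):
    """Build the environment variables listed above for a LocalStack endpoint.

    Hypothetical helper. AWS_ALLOW_HTTP is only added when the endpoint
    uses plain http, which is the usual case for local development.
    """
    scheme = urlparse(endpoint).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"endpoint must start with http:// or https://: {endpoint!r}")
    env = {
        "AWS_ACCESS_KEY_ID": "test",  # assumption: conventional dummy value
        "AWS_SECRET_ACCESS_KEY": "test",  # assumption: conventional dummy value
        "AWS_DEFAULT_REGION": region,
        "AWS_ENDPOINT_URL": endpoint,
    }
    if scheme == "http":
        env["AWS_ALLOW_HTTP"] = "true"  # assumption: "true" is accepted
    return env


if __name__ == "__main__":
    for name, value in localstack_environment().items():
        print(f"export {name}={value}")
```

Run as a script, this prints `export` lines that can be pasted into a shell, which keeps test configuration separate from any real AWS credentials.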

GCS

For GCS, you can authenticate with a service account by setting one of the following environment variables, giving either the path to the service account file or the JSON serialized service account key:

GOOGLE_SERVICE_ACCOUNT: location of service account file
GOOGLE_SERVICE_ACCOUNT_KEY: JSON serialized service account key
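
The choice between the two GCS variables can be sketched in Python like this. `gcs_environment` is a hypothetical helper, and requiring exactly one of the two inputs is a design choice of this sketch, not a documented rule of Exon.

```python
import json
import os


def gcs_environment(service_account_path=None, service_account_key=None):
    """Return the single GCS variable to set, after basic validation.

    Hypothetical helper. Pass either a path to the service account file
    or the JSON serialized key as a string, not both.
    """
    if (service_account_path is None) == (service_account_key is None):
        raise ValueError("provide exactly one of path or key")
    if service_account_path is not None:
        if not os.path.isfile(service_account_path):
            raise FileNotFoundError(service_account_path)
        return {"GOOGLE_SERVICE_ACCOUNT": service_account_path}
    # json.loads raises json.JSONDecodeError (a ValueError) on invalid JSON.
    json.loads(service_account_key)
    return {"GOOGLE_SERVICE_ACCOUNT_KEY": service_account_key}


if __name__ == "__main__":
    # Placeholder key for illustration; a real key has more fields.
    os.environ.update(gcs_environment(service_account_key='{"type": "service_account"}'))
    print(sorted(name for name in os.environ if name.startswith("GOOGLE_SERVICE_ACCOUNT")))
```

Validating the JSON up front turns a malformed key into an immediate local error instead of a failure on the first read.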