Reading from Object Stores
This document covers how to read from object stores, such as S3 and GCS, in Exon and related tools like biobear and exonr. The S3 instructions are also applicable to S3 compatible APIs, such as CloudFlare R2, LocalStack, and minio.
For example, to read a file from S3 in exon-duckdb:
-- This script assumes the following environment variables are set:
-- export AWS_PROFILE="my-profile"
-- export AWS_DEFAULT_REGION="us-east-1"
LOAD exon;
SELECT * FROM read_fasta('s3://bucket/test.fa') LIMIT 5;
S3 and S3 Compatible APIs
For S3, if you're on a personal computer, setting the AWS_PROFILE and AWS_DEFAULT_REGION environment variables should be sufficient.
For automated use, a recommendation is harder to give because it depends on the specific use case. You can set some combination of the following environment variables to indicate the credentials to use:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION
AWS_ENDPOINT
AWS_SESSION_TOKEN
AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
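When debugging an automated setup, it can help to see which of these variables are actually visible to the process. The following is a minimal sketch using only the Python standard library; the helper name `report_s3_env` is hypothetical and not part of Exon or biobear.

```python
import os

# Variable names taken from the list above.
S3_ENV_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "AWS_ENDPOINT",
    "AWS_SESSION_TOKEN",
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
]


def report_s3_env(environ=None):
    """Map each variable name to True if it is set and non-empty."""
    environ = os.environ if environ is None else environ
    return {name: bool(environ.get(name)) for name in S3_ENV_VARS}


if __name__ == "__main__":
    # Print only whether each variable is set, never its value.
    for name, is_set in report_s3_env().items():
        print(f"{name}: {'set' if is_set else 'not set'}")
```

The report deliberately avoids printing values, since several of these variables hold secrets.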
CloudFlare R2
The CloudFlare R2 service has an S3 compatible API, so it is supported by setting the relevant environment variables to your R2 values:
AWS_ACCESS_KEY_ID # your cloudflare access key id
AWS_SECRET_ACCESS_KEY # your cloudflare secret access key
AWS_DEFAULT_REGION # the region to use, likely `auto` should be sufficient
AWS_ENDPOINT # the endpoint to use, e.g. https://$CLOUDFLARE_ACCOUNT_ID.r2.cloudflarestorage.com
After that, pass a path of the form s3://bucket/path/to/file to biobear tools, where bucket is the bucket in CloudFlare and path/to/file is the object's path relative to that bucket.
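As an illustration of the R2 settings above, the sketch below builds the four variables from an account ID and a key pair. The function name `r2_environment` and all argument values are hypothetical placeholders; only the variable names and the endpoint pattern come from this document.

```python
import os


def r2_environment(account_id, access_key_id, secret_access_key, region="auto"):
    """Build the environment variables described above for CloudFlare R2."""
    return {
        "AWS_ACCESS_KEY_ID": access_key_id,
        "AWS_SECRET_ACCESS_KEY": secret_access_key,
        "AWS_DEFAULT_REGION": region,
        # Endpoint pattern from the list above, with the account ID filled in.
        "AWS_ENDPOINT": f"https://{account_id}.r2.cloudflarestorage.com",
    }


if __name__ == "__main__":
    # Placeholder credentials: replace with your own before use.
    os.environ.update(r2_environment("my-account-id", "my-key-id", "my-secret"))
    print(os.environ["AWS_ENDPOINT"])
```

Setting the variables in `os.environ` before any read is issued should make them visible to tools running in the same process, assuming those tools read their configuration from the environment as described here.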
LocalStack
LocalStack is a useful tool for local development and testing. It provides a local S3 compatible API, among other things. To use it, set the following environment variables:
AWS_ACCESS_KEY_ID # your localstack access key id
AWS_SECRET_ACCESS_KEY # your localstack secret access key
AWS_DEFAULT_REGION # the region to use
AWS_ENDPOINT_URL # the endpoint, e.g. if running on the default port http://localhost:4566
AWS_ALLOW_HTTP # allow http connections, useful for local development
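Because LocalStack is mostly used in tests, it can be convenient to set these variables only for the duration of a test and restore the previous environment afterwards. The sketch below shows one way to do that with the standard library. The helper `temporary_env` is hypothetical, and the values in `LOCALSTACK_ENV` (the `test` credentials, the region, and the `true` flag) are assumptions to adjust for your own setup.

```python
import contextlib
import os


@contextlib.contextmanager
def temporary_env(overrides):
    """Set environment variables for the duration of a block, then restore them."""
    saved = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                # The variable did not exist before, so remove it again.
                os.environ.pop(name, None)
            else:
                os.environ[name] = old


# Assumed placeholder values for a LocalStack instance on the default port.
LOCALSTACK_ENV = {
    "AWS_ACCESS_KEY_ID": "test",
    "AWS_SECRET_ACCESS_KEY": "test",
    "AWS_DEFAULT_REGION": "us-east-1",
    "AWS_ENDPOINT_URL": "http://localhost:4566",
    "AWS_ALLOW_HTTP": "true",
}
```

A test would then wrap its reads in `with temporary_env(LOCALSTACK_ENV):` so that real AWS credentials in the surrounding environment are left untouched once the block exits.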
GCS
For GCS, you can authenticate with a service account by setting one of the following environment variables, which take either the path to the service account file or the JSON serialized service account key:
GOOGLE_SERVICE_ACCOUNT: location of service account file
GOOGLE_SERVICE_ACCOUNT_KEY: JSON serialized service account key
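The choice between the two variables can be sketched as follows: a file path goes into GOOGLE_SERVICE_ACCOUNT, while an already-parsed key is serialized to JSON for GOOGLE_SERVICE_ACCOUNT_KEY. The function name `gcs_environment` is hypothetical, and the example inputs are placeholders rather than a real key.

```python
import json


def gcs_environment(service_account):
    """Return the GCS variable for a key file path (str) or a parsed key (dict)."""
    if isinstance(service_account, dict):
        # A parsed key is passed along as its JSON serialization.
        return {"GOOGLE_SERVICE_ACCOUNT_KEY": json.dumps(service_account)}
    # Anything else is treated as the location of the service account file.
    return {"GOOGLE_SERVICE_ACCOUNT": str(service_account)}
```

The returned dictionary can be merged into the process environment, for example with `os.environ.update(...)`, before reading a GCS path.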