[Figure: layered stack, listed here from the application layer down to physical storage. Each entry gives the layer name, its role in parentheses, the examples shown in the figure, and the figure's annotation.]

Application (Transformations): Variant calling & analysis, RNA-seq analysis, etc. Users define analyses via transformations.
Presentation (Enriched Models): Enriched Read/Variant. Enriched models provide convenient methods on common models.
Evidence Access (MapReduce/DBMS): Spark, Spark-SQL, Hadoop. The evidence access layer efficiently executes transformations.
Schema (Data Models): Avro Schema for reads, variants, and genotypes. Schemas define the logical structure of basic genomic objects.
Materialized Data (Columnar Storage): Load data from Parquet and legacy formats. Common interfaces map logical schema to bytes on disk.
Data Distribution (Parallel FS): HDFS, Tachyon, HPC file systems, S3. Parallel file system layer coordinates distribution of data.
Physical Storage (Attached Storage): Disk, SSD, block store, memory cache. Decoupling storage enables performance/cost tradeoff.
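
The layering named in the figure can be sketched as a small data structure. This is only an illustrative sketch in Python, not code from the system described: the `Layer` class, the `STACK` tuple, and the `swap_examples` helper are hypothetical names introduced here. It assumes the seven layers and examples given in the figure, and shows the decoupling idea: changing the choice at one layer (here, physical storage) leaves every other layer untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Layer:
    """One layer of the stack (hypothetical model, for illustration only)."""
    name: str
    role: str
    examples: tuple


# Layers ordered from the application down to physical storage,
# using the names and examples that appear in the figure.
STACK = (
    Layer("Application", "Transformations",
          ("Variant calling & analysis", "RNA-seq analysis")),
    Layer("Presentation", "Enriched Models", ("Enriched Read/Variant",)),
    Layer("Evidence Access", "MapReduce/DBMS",
          ("Spark", "Spark-SQL", "Hadoop")),
    Layer("Schema", "Data Models",
          ("Avro Schema for reads, variants, and genotypes",)),
    Layer("Materialized Data", "Columnar Storage",
          ("Parquet", "legacy formats")),
    Layer("Data Distribution", "Parallel FS",
          ("HDFS", "Tachyon", "HPC file systems", "S3")),
    Layer("Physical Storage", "Attached Storage",
          ("Disk", "SSD", "block store", "memory cache")),
)


def swap_examples(stack, layer_name, new_examples):
    """Return a new stack in which only the named layer's choices change."""
    if layer_name not in {layer.name for layer in stack}:
        raise KeyError(layer_name)
    return tuple(
        replace(layer, examples=tuple(new_examples))
        if layer.name == layer_name else layer
        for layer in stack
    )


# Pick a different storage medium; the six layers above it are unchanged.
cheaper = swap_examples(STACK, "Physical Storage", ("block store",))
```

Because each layer is an immutable value, `cheaper[:-1] == STACK[:-1]` holds, which mirrors the figure's point that storage can be chosen for performance or cost independently of the layers above it.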