Metrics#

Given multiple representations of your data, you can compute biological conservation and batch removal metrics to assess how well each representation corrects for batch effects.

Install dependencies#

For this workflow, make sure you install the following environments:

Configuring a basic metrics workflow#

Given a set of files that you want to compare, configure your workflow as follows:

output_dir: test/out
images: test/images

use_gpu: false # only set to True if you are working with GPUs

DATASETS:
  test_metrics:
    input:
      metrics:
        model1--model1_param=val1: test/input/pbmc68k.h5ad
        model2--model2_param=val2: test/input/pbmc68k.h5ad
        unintegrated: test/input/pbmc68k.h5ad
    metrics:
      label: bulk_labels
      batch: louvain
      overwrite_file_id: true
      metrics:
        - nmi
        - ari
        - asw_label
        - asw_batch
        - cell_cycle
        - clisi
        - ilisi
        - graph_connectivity
        - isolated_label_asw
        - isolated_label_f1
        - pcr_comparison
        - kbet_pg
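
The input keys in the configuration above look like they follow a `<method>--<param>=<value>` pattern. This page does not confirm that convention, so the following is only a minimal sketch under that assumption. The helper name `parse_file_id` is hypothetical and not part of the pipeline; it shows how such keys could be split into a method name and its parameters, for example when labelling plots yourself.

```python
def parse_file_id(file_id):
    """Split '<method>--<key>=<value>--...' into (method, params).

    The naming convention is an assumption inferred from the example keys.
    """
    method, *param_parts = file_id.split("--")
    params = {}
    for part in param_parts:
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected 'key=value', got {part!r}")
        params[key] = value
    return method, params


# Keys taken from the example configuration
for file_id in ("model1--model1_param=val1", "model2--model2_param=val2", "unintegrated"):
    print(parse_file_id(file_id))
```

A key without any `--` separator, such as `unintegrated`, yields the key itself as the method name and an empty parameter dictionary.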

Call the pipeline with your runner script (e.g. configs/metrics/run.sh), first as a dry run:

bash configs/metrics/run.sh metrics_all -nq

You should get the following dry-run output:

Config file configs/outputs.yaml is extended by additional config specified via the command line.
Config file configs/load_data/config.yaml is extended by additional config specified via the command line.
Config file configs/exploration/config.yaml is extended by additional config specified via the command line.
WARNING: Duplicated columns: {'metric': ['methods', 'metrics']}
Building DAG of jobs...
Job stats:
job                                 count
--------------------------------  -------
metrics_all                             1
metrics_barplot                         3
metrics_barplot_per_dataset             3
metrics_barplot_per_file                9
metrics_cluster                        30
metrics_cluster_collect                 3
metrics_collect                         3
metrics_funkyheatmap                    1
metrics_funkyheatmap_per_dataset        1
metrics_merge                           1
metrics_merge_per_batch                 1
metrics_merge_per_dataset               1
metrics_merge_per_file                  3
metrics_merge_per_label                 1
metrics_prepare                         3
metrics_run                            36
total                                 100
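
The job counts can be sanity-checked with a little arithmetic. The sketch below copies the counts from the dry-run output and assumes that each distinct metric is run once per input file. That interpretation fits the numbers but is not stated by the pipeline itself.

```python
# Job counts copied from the dry-run output
job_counts = {
    "metrics_all": 1,
    "metrics_barplot": 3,
    "metrics_barplot_per_dataset": 3,
    "metrics_barplot_per_file": 9,
    "metrics_cluster": 30,
    "metrics_cluster_collect": 3,
    "metrics_collect": 3,
    "metrics_funkyheatmap": 1,
    "metrics_funkyheatmap_per_dataset": 1,
    "metrics_merge": 1,
    "metrics_merge_per_batch": 1,
    "metrics_merge_per_dataset": 1,
    "metrics_merge_per_file": 3,
    "metrics_merge_per_label": 1,
    "metrics_prepare": 3,
    "metrics_run": 36,
}
reported_total = 100

# Distinct metrics and number of input files from the example configuration
metrics = {
    "nmi", "ari", "asw_label", "asw_batch", "cell_cycle", "clisi", "ilisi",
    "graph_connectivity", "isolated_label_asw", "isolated_label_f1",
    "pcr_comparison", "kbet_pg",
}
n_inputs = 3

computed_total = sum(job_counts.values())
expected_metric_runs = n_inputs * len(metrics)  # assumption: one run per file and metric

print("total matches:", computed_total == reported_total)
print("metrics_run matches:", expected_metric_runs == job_counts["metrics_run"])
```

Both checks hold for the output shown: the counts sum to 100, and 3 input files times 12 distinct metrics gives the 36 `metrics_run` jobs.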

Execute the workflow as follows:

bash configs/metrics/run.sh metrics_all -c5
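
The two invocations differ only in the Snakemake flags passed through the runner script: `-n` requests a dry run, `-q` reduces the output, and `-c5` sets the number of cores to 5. If you want to drive both steps from Python, a minimal sketch could look like this. The helper `build_command` is hypothetical, and the runner path is taken from the TL;DR section of this page.

```python
import shlex

RUNNER = "configs/metrics/run.sh"  # runner script path from the TL;DR section


def build_command(target, dry_run=False, cores=None):
    """Build the argument list for the runner script (illustrative helper)."""
    cmd = ["bash", RUNNER, target]
    if dry_run:
        cmd.append("-nq")  # -n: dry run, -q: quiet
    if cores is not None:
        cmd.append(f"-c{cores}")  # -c: number of cores
    return cmd


print(shlex.join(build_command("metrics_all", dry_run=True)))
print(shlex.join(build_command("metrics_all", cores=5)))
```

To actually launch a command, pass the list to `subprocess.run(cmd, check=True)` from the repository root.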

Output#

Check the outputs under images/metrics. Per-dataset plots are under images/metrics/per_dataset.

Figure: Metrics FunkyHeatmap, from images/metrics/per_dataset/test_metrics/funky_heatmap.pdf

Figure: Metric values as a barplot, from images/metrics/per_dataset/test_metrics/score-barplot.png

Figure: Compute duration in seconds as a barplot, from images/metrics/per_dataset/test_metrics/s-barplot.png
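
After a run, you may want to confirm that the per-dataset plots were written. The sketch below checks only the three file names mentioned in the captions on this page; the pipeline may produce more files. The helper `missing_plots` is hypothetical, and the `images` argument should be whatever you set as `images` in your configuration.

```python
from pathlib import Path

# File names taken from the figure captions on this page
EXPECTED_PLOTS = ("funky_heatmap.pdf", "score-barplot.png", "s-barplot.png")


def missing_plots(images_dir, dataset):
    """Return the expected per-dataset plots that do not exist yet."""
    plot_dir = Path(images_dir) / "metrics" / "per_dataset" / dataset
    return [name for name in EXPECTED_PLOTS if not (plot_dir / name).is_file()]


# An empty list means all three plots were found
print(missing_plots("images", "test_metrics"))
```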

TL;DR Full Configuration#

You can find the complete configuration file and runner script under configs/metrics/. Here’s the final workflow configuration:

configs/metrics/example_workflow.yaml#
output_dir: data/out
images: images

use_gpu: false

DATASETS:
  test_metrics:
    input:
      metrics:
        model1--model1_param=val1: data/pbmc68k.h5ad
        model2--model2_param=val2: data/pbmc68k.h5ad
        unintegrated: data/pbmc68k.h5ad
    metrics:
      label: bulk_labels
      batch: louvain
      overwrite_file_id: true
      metrics:
        - nmi
        - ari
        - asw_label
        - asw_batch
        - cell_cycle
        - clisi
        - ilisi
        - graph_connectivity
        - isolated_label_asw
        - isolated_label_f1
        - pcr_comparison
        - kbet_pg

configs/metrics/run.sh#
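#!/usr/bin/env bash
# Abort on the first error and echo each command as it runs
set -e -x

# Extra arguments (target rule, -n, -q, -c5, ...) are passed through to Snakemake
snakemake \
  --profile .profiles/local \
  --configfile \
    configs/metrics/example_workflow.yaml \
  --snakefile workflow/Snakefile \
  --use-conda \
  --rerun-incomplete \
  --keep-going \
  --printshellcmds \
  "$@"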