Here’s a short screencast recapping my initial findings from testing the Google Batch API with samtools. I used the open-source samtools container and pulled a BAM file from a public dataset.
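As a quick sanity check (my addition, not part of the Batch job itself), you can confirm the public BAM file is readable with `gsutil` before submitting anything; the path is the same one referenced in the job config below:

```sh
# List the public BAM file and its size; requires the Google Cloud SDK
gsutil ls -l gs://genomics-public-data/NA12878.chr20.sample.bam
```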
My `job.json` file for configuring Google Batch to test a `samtools index` example is shown below:
```json
{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "container": {
              "imageUri": "gcr.io/cloud-lifesciences/samtools",
              "entrypoint": "/bin/sh",
              "commands": [
                "-c",
                "samtools index ${BAM} /mnt/disks/share/${BAI}"
              ]
            },
            "environment": {
              "variables": {
                "BAM": "gs://genomics-public-data/NA12878.chr20.sample.bam",
                "BAI": "NA12878.chr20.sample.bam.bai"
              }
            }
          }
        ],
        "volumes": [
          {
            "gcs": {
              "remotePath": "<MYBUCKET>/<MYPATH>/"
            },
            "mountPath": "/mnt/disks/share"
          }
        ],
        "computeResource": {
          "cpuMilli": 2000,
          "memoryMib": 2000
        },
        "maxRetryCount": 3,
        "maxRunDuration": "100000s"
      },
      "taskCount": 1,
      "parallelism": 10
    }
  ],
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
```
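To run this config, you submit it with the `gcloud batch` CLI. A minimal sketch, assuming the Google Cloud SDK is installed and authenticated; the job name and region below are placeholders of my choosing:

```sh
# Submit the job defined in job.json; "samtools-index-test" and
# "us-central1" are placeholder values, use your own name and region
gcloud batch jobs submit samtools-index-test \
  --location=us-central1 \
  --config=job.json

# Check on the job; its status.state moves through QUEUED, SCHEDULED,
# and RUNNING before reaching SUCCEEDED or FAILED
gcloud batch jobs describe samtools-index-test --location=us-central1
```

Because `logsPolicy.destination` is set to `CLOUD_LOGGING`, the task's stdout and stderr land in Cloud Logging, and the resulting `.bai` index file is written to the mounted GCS bucket.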
More on my continued GCP compute service testing and comparisons using `samtools` examples can be found in my `gcp-for-bioinformatics` repo on GitHub.