WDL Learning Notes

Posted on 2024-05-23 Edited on 2024-07-01

Introduction from the official WDL website:

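The Workflow Description Language (WDL) is an open standard for describing data processing workflows with a human-readable and human-writable syntax. WDL makes it straightforward to define analysis tasks, connect them together in workflows, and run them in parallel. The language aims to be accessible and understandable to all kinds of users, including programmers, analysts, and operators of production systems. It makes common patterns such as scatter-gather and conditional execution simple to express. WDL is designed for portability, and several implementations are available that run in a variety of environments, including HPC systems and cloud platforms.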

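WDL has five core components. WDL is not executed top to bottom: the order in which calls run is determined by the data dependencies between them, so the order in which workflow, task, and call blocks appear in the script does not matter.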

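Component	Description
workflow	Defines the overall pipeline, similar to main
task	Defines a single sub-task outside the workflow block, similar to a function
call	Used inside the workflow block to execute a specific task (function)
command	Inside a task: the shell command the task runs
output	Declares the outputs of a task or workflow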
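To illustrate that declaration order does not matter, here is a minimal sketch (assuming WDL 1.0; the names `hello`, `say_hello`, `name`, `greeting`, and `message` are hypothetical) in which the workflow is written before the task it calls:

```wdl
version 1.0

# The workflow comes first and calls a task that is only defined further down;
# this is valid because names are resolved across the whole document.
workflow hello {
  input {
    String name = "WDL"
  }

  call say_hello { input: name = name }

  output {
    String greeting = say_hello.message
  }
}

# Hypothetical task: prints a greeting and captures it from stdout.
task say_hello {
  input {
    String name
  }

  command <<<
    echo "Hello, ~{name}!"
  >>>

  output {
    String message = read_string(stdout())
  }
}
```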

For example:

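Task:

version 1.0

task my_task {
  input {
    File input_file
    String output_file
  }

  # Heredoc-style command; ~{} interpolates the task's inputs
  command <<<
    cat ~{input_file} > ~{output_file}
  >>>

  output {
    # `output` is a reserved word, so the declaration needs another name;
    # output expressions reference variables directly, without ${}
    File out_file = output_file
  }
}

Workflow:

workflow my_workflow {
  input {
    File input_file
  }

  # output_file has no default in my_task, so the call must supply it
  call my_task {
    input: input_file = input_file, output_file = "output.txt"
  }

  output {
    File output_file = my_task.out_file
  }
}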
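The `call` component can also invoke the same task more than once, and feeding one call's output into another is how tasks are connected. A minimal sketch, assuming WDL 1.0; `copy_file`, `copy_twice`, `first_copy`, `second_copy`, and the file names are hypothetical:

```wdl
version 1.0

# Hypothetical task used only for this illustration.
task copy_file {
  input {
    File input_file
    String output_file
  }

  command <<<
    cat ~{input_file} > ~{output_file}
  >>>

  output {
    File out_file = output_file
  }
}

workflow copy_twice {
  input {
    File input_file
  }

  # `as` gives each call of the same task its own name.
  call copy_file as first_copy {
    input: input_file = input_file, output_file = "copy1.txt"
  }

  # Consuming first_copy's output makes second_copy wait for first_copy.
  call copy_file as second_copy {
    input: input_file = first_copy.out_file, output_file = "copy2.txt"
  }

  output {
    File final_copy = second_copy.out_file
  }
}
```

Without the data dependency, the two calls would be free to run in parallel.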

Complete examples:

Merging files:

version 1.0

task merge_files {
  input {
    File file1
    File file2
    String output_file
  }

  command <<<
    cat ~{file1} ~{file2} > ~{output_file}
  >>>

  output {
    # Reference the variable directly; ${} is not valid in an output expression
    File merged_file = output_file
  }
}

workflow merge_workflow {
  input {
    File file1
    File file2
  }

  call merge_files {
    input: file1 = file1, file2 = file2, output_file = "merged.txt"
  }

  output {
    File merged_file = merge_files.merged_file
  }
}
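The merge task above takes exactly two files. A hedged generalization to any number of files could take an `Array[File]` and join it in the command with the WDL 1.0 `sep` placeholder option; `merge_many` and `merge_many_workflow` are hypothetical names, and file paths are assumed to contain no spaces:

```wdl
version 1.0

# Hypothetical generalization of merge_files to any number of inputs.
task merge_many {
  input {
    Array[File] files
    String output_file = "merged.txt"
  }

  # sep=" " joins the array elements with spaces (WDL 1.0 placeholder option).
  command <<<
    cat ~{sep=" " files} > ~{output_file}
  >>>

  output {
    File merged_file = output_file
  }
}

workflow merge_many_workflow {
  input {
    Array[File] files
  }

  call merge_many { input: files = files }

  output {
    File merged_file = merge_many.merged_file
  }
}
```

WDL 1.1 deprecates placeholder options such as `sep=` in favor of the `sep()` function, so the exact form depends on the WDL version your engine supports.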

Sorting a file:

version 1.0

task sort_file {
  input {
    File input_file
    String output_file
  }

  command <<<
    sort ~{input_file} > ~{output_file}
  >>>

  output {
    # Reference the variable directly; ${} is not valid in an output expression
    File sorted_file = output_file
  }
}

workflow sort_workflow {
  input {
    File input_file
  }

  call sort_file {
    input: input_file = input_file, output_file = "sorted.txt"
  }

  output {
    File sorted_file = sort_file.sorted_file
  }
}
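Task inputs can also carry defaults, and a `runtime` section can pin a container image, which helps the portability mentioned in the introduction. A sketch assuming WDL 1.0 and a Docker-capable engine; `sort_file_opts`, `sort_numeric_workflow`, `numeric`, and the `ubuntu:22.04` image are illustrative choices, not part of the original:

```wdl
version 1.0

# Hypothetical variant of sort_file: an optional flag with a default value,
# plus a runtime section pinning the container image.
task sort_file_opts {
  input {
    File input_file
    String output_file = "sorted.txt"
    Boolean numeric = false
  }

  # The if-then-else placeholder adds -n only when numeric is true.
  command <<<
    sort ~{if numeric then "-n" else ""} ~{input_file} > ~{output_file}
  >>>

  output {
    File sorted_file = output_file
  }

  runtime {
    docker: "ubuntu:22.04"
  }
}

workflow sort_numeric_workflow {
  input {
    File input_file
  }

  # output_file falls back to its default; numeric is overridden.
  call sort_file_opts {
    input: input_file = input_file, numeric = true
  }

  output {
    File sorted_file = sort_file_opts.sorted_file
  }
}
```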

Scatter (running a task on each file in parallel):

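version 1.0

task process_file {
  input {
    File input_file
    String output_file
  }

  command <<<
    # Process the file (placeholder: replace with the real processing step;
    # the input is simply copied here so that the declared output exists)
    cat ~{input_file} > ~{output_file}
  >>>

  output {
    File processed_file = output_file
  }
}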

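workflow process_files {
  input {
    Array[File] input_files
  }

  # scatter runs one process_file call per element of input_files, in parallel
  scatter (file in input_files) {
    call process_file {
      # basename() keeps the output inside the task's working directory
      input: input_file = file, output_file = basename(file) + ".processed"
    }
  }

  output {
    # Referenced outside the scatter, the per-file outputs are gathered into an array
    Array[File] processed_files = process_file.processed_file
  }
}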
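The introduction also mentions conditional execution, which WDL expresses with an `if` block rather than `scatter`. A minimal sketch, assuming WDL 1.0; `compress_file`, `maybe_compress`, `compress`, and `gz` are hypothetical names:

```wdl
version 1.0

# Hypothetical task: gzip-compress a file into the working directory.
task compress_file {
  input {
    File input_file
  }

  command <<<
    gzip -c ~{input_file} > ~{basename(input_file)}.gz
  >>>

  output {
    File gz = basename(input_file) + ".gz"
  }
}

workflow maybe_compress {
  input {
    File input_file
    Boolean compress = false
  }

  # The call runs only when compress is true.
  if (compress) {
    call compress_file { input: input_file = input_file }
  }

  output {
    # Outside the if block the call's output is optional (File?).
    File? compressed = compress_file.gz
  }
}
```

When `compress` is false, `compressed` is simply undefined; `select_first` can supply a fallback if a non-optional value is needed downstream.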