How do I…¶
This section contains a number of smaller topics with links and examples meant to provide relatively concrete answers for specific tool development scenarios.
... deal with index/reference data?¶
Galaxy’s concept of data tables are meant to provide tools with access reference datasets or index data not tied to particular histories or users. A common example would be FASTA files for various genomes or mapper-specific indices of those files (e.g. a BWA index for the hg19 genome).
Galaxy data managers are specialized tools designed to populate tool data tables.
... cite tools without an obvious DOI?¶
In the absence of an obvious DOI, tools may contain embedded BibTeX directly.
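For example, a tool's <citations> block can embed a BibTeX entry directly (the entry below is a made-up placeholder):

```xml
<citations>
    <!-- hypothetical BibTeX entry for illustration -->
    <citation type="bibtex">
@misc{example2015,
    author = {Doe, Jane},
    title = {An Example Tool},
    year = {2015},
    url = {https://example.org/tool}
}
    </citation>
</citations>
```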
Further reading:
bibtex.xml (test tool with a bunch of random examples)
bwa-mem.xml (BWA-MEM tool by Anton Nekrutenko demonstrating citation of an arXiv article)
macros.xml (Macros for vcflib tool demonstrating citing a github repository)
... declare a Docker container for my tool?¶
Galaxy tools can be decorated with container
tags indicating Docker
container ids that the tools can run inside of.
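A minimal sketch of such a declaration inside a tool's requirements block (the image id below is illustrative):

```xml
<requirements>
    <requirement type="package" version="0.7.17">bwa</requirement>
    <!-- illustrative Docker image id -->
    <container type="docker">biocontainers/bwa:v0.7.17_cv1</container>
</requirements>
```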
The longer term plan for the Tool Shed ecosystem is to be able to automatically build Docker containers for tool dependency descriptions and thereby obtain this Docker functionality for free and in a way that is completely backward compatible with non-Docker deployments.
Further reading:
Complete tutorial on Github by Aaron Petkau. Covers installing Docker, building a Dockerfile, publishing to Docker Hub, annotating tools and configuring Galaxy.
Another tutorial from the Galaxy User Group Grand Ouest.
Landing page on the Galaxy Wiki
Implementation details on Pull Request #401
... do extra validation of parameters?¶
Tool parameters support a validator
element (syntax)
to perform validation of a single parameter. More complex validation across
parameters can be performed with arbitrary Python functions via the
code
file syntax, but this feature should be used sparingly.
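As a sketch, a text parameter can be constrained with a regular expression validator (parameter name and pattern are illustrative):

```xml
<param name="sample_id" type="text" label="Sample identifier">
    <!-- illustrative: restrict values to alphanumeric identifiers -->
    <validator type="regex" message="Only letters, numbers, and underscores are allowed">^\w+$</validator>
</param>
```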
Further reading:
validator XML tag syntax on the Galaxy wiki.
fastq_filter.xml (a FASTQ filtering tool demonstrating validator constructs)
gffread.xml (a tool by Jim Johnson demonstrating using regular expressions with validator tags)
code_file.xml, code_file.py (test files demonstrating defining a simple constraint in Python across two parameters)
... check input type in command blocks?¶
Input data parameters may specify multiple formats. For example
<param name="input" type="data" format="fastq,fasta" label="Input" />
If the command line under construction doesn't require changes based
on the input type, it may simply be referenced as $input.
However, if the command line uses different argument names depending on
the type, it becomes important to dispatch on the underlying datatype.
In this example $input.ext
would return the short code for the actual
datatype of the supplied input; for instance, the strings fasta
or fastqsanger
would be valid values for inputs to this parameter under the
above definition.
While .ext
may sometimes be useful, there are many cases where it is
inappropriate because of subtypes: checking whether .ext
equals fastq
in the above example would not catch fastqsanger
inputs, for instance. To
check whether an input matches a type or any subtype thereof, the is_of_type
method can be used. For instance,
$input.is_of_type('fastq')
would check if the input is of type fastq
or any derivative type such as
fastqsanger.
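For instance, a command block might dispatch on the type like this (the tool and flag names are illustrative):

```xml
<command><![CDATA[
#if $input.is_of_type('fastq')
    example_tool --fastq '$input' > '$output'
#else
    example_tool --fasta '$input' > '$output'
#end if
]]></command>
```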
... handle arbitrary output data formats?¶
If the output format of a tool’s output cannot be known ahead of time,
Galaxy can be instructed to “sniff” the output and determine the data type
using the same method used for uploads. Adding the auto_format="true"
attribute to a tool’s output enables this.
<output name="out1" auto_format="true" label="Auto Output" />
... determine the user submitting a job?¶
The variable $__user_email__
(as well as $__user_name__
and
$__user_id__
) is available when building up your command in
the tool’s <command>
block. The following tool demonstrates the use of
this and a few other special parameters available to all tools.
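A minimal sketch of using these special parameters in a command block:

```xml
<command><![CDATA[
echo "Run by $__user_email__ (user id $__user_id__)" > '$output'
]]></command>
```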
... test with multiple value inputs?¶
To write tests that supply multiple values to a multiple="true"
select
or data
parameter, simply specify the multiple values as a comma-separated list.
Here are examples of each:
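For instance, assuming a multiple="true" select parameter named inputs, a test might look like:

```xml
<test>
    <!-- illustrative: supply two values to a multiple="true" parameter -->
    <param name="inputs" value="value1,value2" />
    <output name="output" file="expected_output.txt" />
</test>
```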
... test dataset collections?¶
Here are some examples of testing tools that consume collections with type="data_collection"
parameters.
Here are some examples of testing tools that produce collections with output_collection
elements.
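As a sketch, a test supplying a list collection as input and checking an output collection might look like this (all names and files are illustrative):

```xml
<test>
    <param name="input_collection">
        <collection type="list">
            <element name="sample1" value="sample1.fastq" />
            <element name="sample2" value="sample2.fastq" />
        </collection>
    </param>
    <output_collection name="output_collection" type="list">
        <element name="sample1" file="sample1_filtered.fastq" />
    </output_collection>
</test>
```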
... test discovered datasets?¶
Tools which dynamically discover datasets
after the job is complete, either using the <discover_datasets>
element,
the older default pattern approach (e.g. finding files with names like
primary_DATASET_ID_sample1_true_bam_hg18
), or the undocumented
galaxy.json
approach can be tested by placing discovered_dataset
elements beneath the corresponding output
element with the designation
corresponding to the file to test.
<test>
<param name="input" value="7" />
<output name="report" file="example_output.html">
<discovered_dataset designation="world1" file="world1.txt" />
<discovered_dataset designation="world2">
<assert_contents>
<has_line line="World Contents" />
</assert_contents>
</discovered_dataset>
</output>
</test>
The test examples distributed with Galaxy demonstrating dynamic discovery and the testing thereof include:
... test composite dataset contents?¶
Tools which consume Galaxy composite datatypes can
generate test inputs using the composite_data
element demonstrated by the
following tool.
Tools which produce Galaxy composite datatypes can
specify tests for the individual output files using the extra_files
element
demonstrated by the following tool.
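A sketch of both constructs in a single test (file names illustrative):

```xml
<test>
    <param name="input">
        <!-- supply the individual files of a composite input -->
        <composite_data value="index.html" />
        <composite_data value="data.dat" />
    </param>
    <output name="output" file="expected_primary.html">
        <!-- check an individual extra file of a composite output -->
        <extra_files type="file" name="data.dat" value="expected_data.dat" />
    </output>
</test>
```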
... test index (.loc) data?¶
There is an idiom for supplying test index data during tests using Planemo.
To create this kind of test, one needs to provide a
tool_data_table_conf.xml.test
beside your tool’s
tool_data_table_conf.xml.sample
file that specifies paths to test .loc
files which in turn define paths to the test index data. Both the .loc
files and the tool_data_table_conf.xml.test
can use the value
${__HERE__}
which will be replaced with the path to the directory the file
lives in. This allows using relative-like paths in these files which is needed
for portable tests.
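For instance, a tool_data_table_conf.xml.test might look like this (the table name, columns, and file paths are illustrative):

```xml
<tables>
    <!-- illustrative data table pointing at a test .loc file -->
    <table name="all_fasta" comment_char="#">
        <columns>value, dbkey, name, path</columns>
        <file path="${__HERE__}/test-data/all_fasta.loc" />
    </table>
</tables>
```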
An example commit demonstrating the application of this approach to a Picard tool can be found here.
These tests can then be run with the Planemo test command.
... test exit codes?¶
A test
element can check the exit code of the underlying job using the
check_exit_code="n"
attribute.
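For instance, assuming the tool exits with code 1 for this input:

```xml
<test check_exit_code="1">
    <param name="input" value="bad_input.txt" />
</test>
```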
... test failure states?¶
Normally, all tool test cases described by a test
element are expected to
pass, but one can assert a job should fail by adding expect_failure="true"
to the test
element.
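A minimal sketch (input name and file are illustrative):

```xml
<test expect_failure="true">
    <param name="input" value="malformed_input.txt" />
</test>
```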
... test output filters work?¶
If your tool contains filter
elements, you can't verify properties of outputs
that are filtered out and do not exist. The test
element may contain an
expect_num_outputs
attribute to specify the expected number of outputs; this
can be used to verify that outputs not listed are expected to be filtered out during
tool execution.
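As a sketch, assuming a tool whose report output is filtered out when produce_report is false:

```xml
<test expect_num_outputs="1">
    <param name="input" value="input.txt" />
    <param name="produce_report" value="false" />
    <output name="output" file="expected_output.txt" />
</test>
```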
... test metadata?¶
Output metadata can be checked using metadata
elements in the XML
description of the output
.
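A minimal sketch, assuming a BAM output whose dbkey metadata should be set to hg19:

```xml
<test>
    <param name="input" value="input.bam" />
    <output name="output" file="expected.bam">
        <!-- illustrative metadata check -->
        <metadata name="dbkey" value="hg19" />
    </output>
</test>
```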
... test tools installed in an existing Galaxy instance?¶
Do not use Planemo for this; Galaxy should be used to test its tools directly. The following two commands can be used to test Galaxy tools in an existing instance.
$ sh run_tests.sh --report_file tool_tests_shed.html --installed
The above command specifies the --installed
flag when calling
run_tests.sh;
this tells the test framework to test Tool Shed installed
tools and only those tools.
$ GALAXY_TEST_TOOL_CONF=config/tool_conf.xml sh run_tests.sh --report_file tool_tests_tool_conf.html functional.test_toolbox
The second command sets the GALAXY_TEST_TOOL_CONF
environment variable, which
will restrict the testing framework to considering a single tool conf file
(such as the default tools that ship with Galaxy in
config/tool_conf.xml.sample,
which must have their dependencies set up
manually). The last argument to run_tests.sh,
functional.test_toolbox,
tells the test framework to run all the tool tests in the configured tool conf
file.
Note
To speed up tests you can use a pre-migrated database file the way Planemo
does by setting the following environment variable before running
run_tests.sh.
$ export GALAXY_TEST_DB_TEMPLATE="https://github.com/jmchilton/galaxy-downloads/raw/master/db_gx_rev_0127.sqlite"
... test tools against a package or container in a bioconda pull request?¶
First, obtain the artifacts of the PR by adding this comment:
@BiocondaBot please fetch artifacts
. In the reply one finds a link to a zip file containing
the built package and Docker image. Download this zip and extract it. For the following, let
PACKAGES_DIR
be the absolute path to the packages
directory in the resulting unzipped archive
and IMAGE_ZIP
be the absolute path to the tar.gz
file in the images
directory of the unzipped archive.
In order to test the tool with the package add the following to the planemo call:
$ planemo test ... --conda_channels file://PACKAGES_DIR,conda-forge,bioconda,defaults ...
For containerized testing we need to differentiate two cases:
the tool has a single requirement (that is fulfilled by the container)
the tool has multiple requirements (in this case a docker image will be built on the fly using the package)
For the former case the docker image that has been created by the bioconda CI needs to be loaded:
$ gzip -dc IMAGE_ZIP | docker load
and a planemo test can then simply use this image:
$ planemo test ... --biocontainers --no_dependency_resolution --no_conda_auto_init ...
For the latter case it suffices to call planemo as follows:
$ planemo test ... --biocontainers --no_dependency_resolution --no_conda_auto_init --conda_channels file://PACKAGES_DIR,conda-forge,bioconda,defaults ...
... interactively debug tool tests?¶
It can be desirable to interactively debug a tool test. In order to do so, start planemo test
with the option --no_cleanup
. Inspect the output: After Galaxy starts up, the tests commence. At the
start of each test one finds a message: ( <TOOL_ID> ) > Test-N
. After some upload jobs, the
actual tool job is started (it is the last before the next test is executed). There you will find
a message like Built script [/tmp/tmp1zixgse3/job_working_directory/000/3/tool_script.sh]
In this case /tmp/tmp1zixgse3/job_working_directory/000/3/
is the job dir. It contains some
files and directories of interest:
tool_script.sh: the bash script generated from the tool's command and version_command tags plus some boilerplate code
galaxy_3.sh (note that the number may be different): a shell script setting up the environment (e.g. paths and environment variables), starting the tool_script.sh, and postprocessing (e.g. error handling and setting metadata)
working: the job working directory
outputs: a directory containing the job stderr and stdout
For a tool test that uses a conda environment to resolve the requirements one can simply change
into working
and execute ../tool_script.sh
(this works as long as no special environment variables
are used; if they are, ../galaxy_3.sh
needs to be executed after cleaning the job dir).
By editing the tool script one may understand and fix problems in the command
block faster than by
rerunning planemo test
over and over again.
Alternatively one can change into the working
dir and load the conda environment
(the code to do so can be found in tool_script.sh:
. PATH_TO_CONDA_ENV activate).
Afterwards one can execute individual commands, e.g. those found in tool_script.sh,
or variants.
For a tool test that uses Docker to resolve the requirements one needs to execute
../galaxy_3.sh,
because it executes docker run ... tool_script.sh
in order to rerun the job
(with a possibly edited version of the tool script). In order to run the docker container
interactively, execute the docker run ... /bin/bash
command that you find in ../galaxy_3.sh
(i.e. omitting the call of the tool_script.sh)
with the added parameter -it. Note that the
docker run
command contains some shell variables (-v "$_GALAXY_JOB_TMP_DIR:$_GALAXY_JOB_TMP_DIR:rw" -v "$_GALAXY_JOB_HOME_DIR:$_GALAXY_JOB_HOME_DIR:rw")
which ensure that the job's temporary and home directory are available within Docker. Ideally
these shell variables are set to the same values as in ../galaxy_3.sh,
but often it's sufficient
to remove this part from the docker run
call.