In the age of computational science, researchers in the life sciences, just as in other domains, regularly need to compose individual software tools into pipelines or workflows that perform the specific data analysis processes required by their research. For over 20 years now, dedicated scientific workflow management systems have been supporting scientists in this task, and they continue to gain popularity. In fact, recent years have seen significant progress in the functional annotation of bioinformatics software tools, as well as in their virtualization, containerization and assembly into workflows that execute the processes automatically.
The idea of semantics-based automated workflow composition has been around at least since the rise of the Semantic Web in the early 2000s. It aims to simplify working with scientific workflows further and to free life science researchers from having to deal with the technicalities of software composition. This would not only save valuable research time, but also reduce errors, allow benchmarking of data analysis pipelines and enable new scientific findings by discovering workflows that researchers would not have thought of themselves. However, despite its obvious potential and appeal, despite the need for optimizing data analysis workflows, and despite several research groups working on the topic, automated workflow composition has not yet arrived in the daily practice of life science researchers.
The reasons for this are manifold. Some are more practical (for example, the lack of automatic composition tools in the commonly used software frameworks), while others are of a more fundamental nature (such as questions on specification languages, composition algorithms, formal semantics and workflow representations). On one important aspect, namely the semantic annotation of tools on a large scale, the life science community has made significant progress in recent years: the EDAM ontology provides a controlled vocabulary of bioinformatics operations, data types and formats, and the bio.tools registry has become a large collection of bioinformatics tools that are semantically annotated with terms from the EDAM ontology. As demonstrated in a recent Bioinformatics publication (https://academic.oup.com/bioinformatics/article/35/4/656/5060940), this forms a solid basis for performing automated workflow composition in the life sciences domain. Nevertheless, there is still a long way to go before it is used in daily scientific practice.
This workshop will bring together researchers and practitioners who have been working on different aspects of automated workflow composition in the life sciences. These include life science researchers, tool providers, infrastructure developers, ontologists, algorithms researchers and many more. They do not normally come together as a group at regular scientific events, so a Lorentz workshop devoted to this topic provides a unique opportunity to join forces and significantly advance the field together.
Towards this goal, the workshop aims at:
bringing the participants from different backgrounds to a common workable level of knowledge on automated workflow composition through a series of presentations on the relevant topics by experts in the field,
letting the participants apply the presented concepts and techniques to a selection of real workflow scenarios from the life sciences to challenge their usability in practice, and
discussing and evaluating the outcomes of these activities to develop a common perspective on future directions in the field of automated workflow composition.
Monday (Workflows in the Life Sciences)
until 11:00 Arrival, Coffee
11:00-11:15 Welcome by the Lorentz Center
11:15-12:00 Workshop goals and structure (workshop organizers), problem definition and possible solutions, brief introductions.
12:00-13:00 Lunch break
13:00-14:00 Opening Keynote “Workflow Wanders and Wonders” (Prof. Carole Goble, University of Manchester)
14:00-16:30 Concrete workflow examples from different domains
14:00-14:20 Genomics (Leon Mei, LUMC)
14:20-14:40 Proteomics (Veit Schwämmle, SDU, Denmark)
14:40-15:00 Proteogenomics (Tim Griffin, University of Minnesota)
15:00-15:30 Coffee break
15:30-15:50 Metabolomics (Aswin Verhoeven, LUMC)
15:50-16:10 Metaomics (Pratik Jagtap, University of Minnesota)
16:10-16:30 Scientometrics and text mining (Magnus Palmblad, LUMC)
16:30-17:00 Outlook on Tuesday-Thursday
17:00- Wine & cheese reception combined with poster session (attendees, in particular early-stage researchers, will be invited to bring relevant posters to stimulate discussion and interaction. Posters will be up all week.)
Tuesday (Semantics, Ontologies and Functional Tool Annotations)
09:00-09:45 Keynote Presentation on the principles of semantics and ontologies, including examples of biomedical ontologies (Prof. Robert Stevens, University of Manchester)
09:45-10:00 Discussion on the presentation
10:00-10:30 Break
10:30-11:00 Presentation: “EDAM, bio.tools and other important projects for software description in the European life sciences community” (Matúš Kalaš and Hervé Ménager)
11:00-11:30 Presentation: “Tool function description in practice” (Hans Ienasescu and Jon Ison)
11:30-12:00 Discussion on the presentations
12:00-14:00 Lunch break
14:00-16:00 Breakout sessions (in thematic groups) to work on semantic annotations of the tools (EDAM + bio.tools) needed in the workflow scenarios defined on Monday. What is there and what is needed in EDAM or bio.tools?
16:00-16:30 Break (coffee available)
16:30-17:30 Reports from breakout sessions: presentation of the developed annotations, summary of problems and particular challenges
18:00-late Pizza and curatathon (optional)
Wednesday (Automated Workflow Composition: Specification and Algorithms)
09:00-09:20 Presentation: “Tool prediction in Galaxy” (Alireza Khanteymoori, University of Freiburg)
09:20-09:40 Presentation on intelligent workflow instance generation and selection with the WINGS framework (Prof. Paul Groth, University of Amsterdam)
09:40-10:00 Presentation: “The Automated Pipeline Explorer (APE)” (Anna-Lena Lamprecht, Utrecht University)
10:00-10:20 Presentation: “Semantic Data Federation with SADI, HYDRA and a Valet” (Prof. Chris Baker, University of New Brunswick)
10:20-11:00 Break
11:00-12:00 Panel discussion of commonalities and differences between the approaches
12:00-14:00 Lunch break
14:00-16:00 Breakout sessions (in thematic groups): How would these approaches work out in the different domains, on the different workflow examples?
16:00-16:30 Break
16:30-17:30 Reports from breakout sessions: summary of insights, problems and particular challenges
17:30-late Workshop dinner at Het Prentenkabinet (see directions in slides)
Thursday (Comparison/Ranking/Selection/Benchmarking of Workflows)
09:00-09:15 Short introduction to comparison/ranking/selection problems (Anna-Lena Lamprecht, Utrecht University)
09:15-11:30 Breakout sessions: What can you say without executing the workflow? (design time decisions, implementation)
11:30-12:00 Reports from the breakout sessions
12:00-14:00 Lunch break
14:00-15:00 Presentation: Tool and workflow benchmarking (Salvador Capella-Gutierrez, Barcelona Supercomputing Center)
15:00-16:00 Breakout sessions: How to assess the results of the workflows? (What can you say about workflows by executing them?)
16:00-16:30 Break
16:30-17:30 Reports from breakout sessions
18:00-late Pizza and hackathon (optional)
Friday
09:00-10:00 Plenary discussion
10:00-10:30 Break
10:30-12:00 Wrap-up and reviewing of session reports
12:00-13:00 Lunch
13:00-14:00 Inspirational Closing Keynote: “Making Workflows FAIR with Nanopublications” (Tobias Kuhn)
14:00- Farewell (workshop organizers)