Automated Workflow Composition in the Life Sciences

9 - 13 March 2020

Venue: Oort


In the age of computational science, researchers in the life sciences, just as in other domains, regularly need to compose individual software tools into pipelines or workflows that perform the specific data analysis processes their research requires. For over 20 years now, dedicated scientific workflow management systems have been supporting scientists in this task, and they continue to gain popularity. In fact, recent years have seen significant progress in the functional annotation of bioinformatics software tools, as well as in their virtualization, containerization and assembly into workflows for automatically executing the processes.

At least since the rise of the Semantic Web in the early 2000s, the idea of semantics-based automated composition of workflows has also been around, promising to simplify work with scientific workflows further and to free life science researchers from having to deal with the technicalities of software composition. This would not only save valuable research time, but also reduce errors, allow benchmarking of data analysis pipelines and enable new scientific findings by discovering workflows that researchers would not have thought of themselves. However, despite its obvious potential and appeal, despite the need for optimizing data analysis workflows, and despite different research groups working on the topic, automated workflow composition has not yet arrived in the daily practice of life science researchers.

The reasons for this are manifold. Some are more practical (for example the lack of automatic composition tools in the commonly used software frameworks), others are of a more fundamental nature (such as questions on specification languages, composition algorithms, formal semantics and workflow representations). On one important aspect, namely the semantic annotation of tools on a large scale, the life science community has made significant progress in recent years: the EDAM ontology provides a controlled vocabulary of bioinformatics operations, data types and formats, and the bio.tools registry has become a large collection of bioinformatics tools that are semantically annotated with terms from the EDAM ontology. As demonstrated in a recent Bioinformatics publication (https://academic.oup.com/bioinformatics/article/35/4/656/5060940), this forms a solid basis for performing automated workflow composition in the life sciences domain. Nevertheless, there is still a long way to go before it is used in daily scientific practice.
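To make the idea of annotation-based composition concrete, the following is a minimal sketch, not the method of the cited publication or of any particular composition tool. It assumes each tool is annotated with exactly one input and one output data type and searches breadth-first for a shortest chain of tools connecting a given input type to a desired output type. The tool names and type labels are hypothetical placeholders, not real bio.tools entries or EDAM terms.

```python
from collections import deque

# Hypothetical tool annotations: tool name -> (input data type, output data type).
# These labels are illustrative only; they are not real EDAM terms.
TOOLS = {
    "peak_picker": ("raw_spectra", "peak_list"),
    "db_search": ("peak_list", "peptide_ids"),
    "protein_inference": ("peptide_ids", "protein_ids"),
    "spectra_plotter": ("raw_spectra", "plot"),
}


def compose(source, target, tools=TOOLS):
    """Return a shortest tool sequence turning `source` into `target`, or None."""
    queue = deque([(source, [])])  # (current data type, tools applied so far)
    seen = {source}
    while queue:
        data_type, path = queue.popleft()
        if data_type == target:
            return path
        for name, (tool_in, tool_out) in tools.items():
            # A tool is applicable when its annotated input matches the current type.
            if tool_in == data_type and tool_out not in seen:
                seen.add(tool_out)
                queue.append((tool_out, path + [name]))
    return None  # no chain of annotated tools connects source to target


workflow = compose("raw_spectra", "protein_ids")
print(workflow)
```

Real annotations are richer than this sketch assumes: tools typically have several inputs and outputs, data formats matter alongside data types, and ontology terms are related by subsumption rather than matched by exact equality.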

This workshop will bring together researchers and practitioners who have been working on different aspects related to automated workflow composition in the life sciences. These include life science researchers, tool providers, infrastructure developers, ontologists, algorithmics researchers and many more. They do not normally come together as a group at regular scientific events, so a Lorentz workshop devoted to this topic provides a unique opportunity to join forces and significantly advance the field together.

Towards this goal, the workshop aims at:

bringing the participants from different backgrounds to a common workable level of knowledge on automated workflow composition through a series of presentations on the relevant topics by experts in the field,

letting the participants apply the presented concepts and techniques to a selection of real workflow scenarios from the life sciences to challenge their usability in practice, and

discussing and evaluating the outcomes of these activities to develop a common perspective on future directions in the field of automated workflow composition.

Program

March 9

Monday (Workflows in the Life Sciences)

until 11:00 Arrival, Coffee

11:00-11:15 Welcome by the Lorentz Center

11:15-12:00 Workshop goals and structure (workshop organizers), problem definition and possible solutions, brief introductions.

12:00-13:00 Lunch break

13:00-14:00 Opening Keynote “Workflow Wanders and Wonders” (Prof. Carole Goble, University of Manchester)

14:00-16:30 Concrete workflow examples from different domains

14:00-14:20 Genomics (Leon Mei, LUMC)

14:20-14:40 Proteomics (Veit Schwämmle, SDU, Denmark)

14:40-15:00 Proteogenomics (Tim Griffin, University of Minnesota)

15:00-15:30 Coffee break

15:30-15:50 Metabolomics (Aswin Verhoeven, LUMC)

15:50-16:10 Metaomics (Pratik Jagtap, University of Minnesota)

16:10-16:30 Scientometrics and text mining (Magnus Palmblad, LUMC)

16:30-17:00 Outlook on Tuesday-Thursday

17:00- Wine & cheese reception combined with poster session (attendees, in particular early-stage researchers, will be invited to bring relevant posters to stimulate discussion and interaction. Posters will be up all week.)

March 10

Tuesday (Semantics, Ontologies and Functional Tool Annotations)

09:00-09:45 Keynote Presentation on the principles of semantics and ontologies, including examples of biomedical ontologies (Prof. Robert Stevens, University of Manchester)

09:45-10:00 Discussion on the presentation

10:00-10:30 Break

10:30-11:00 Presentation: “EDAM, bio.tools and other important projects for software description in the European life sciences community” (Matúš Kalaš and Hervé Ménager)

11:00-11:30 Presentation: “Tool function description in practice” (Hans Ienasescu and Jon Ison)

11:30-12:00 Discussion on the presentations

12:00-14:00 Lunch break

14:00-16:00 Breakout sessions (in thematic groups) to work on semantic annotations of the tools (EDAM + bio.tools) needed in the workflow scenarios defined on Monday. What is there and what is needed in EDAM or bio.tools?

16:00-16:30 Break (coffee available)

16:30-17:30 Reports from breakout sessions: presentation of the developed annotations, summary of problems and particular challenges

18:00-late Pizza and curatathon (optional)

March 11

Wednesday (Automated Workflow Composition: Specification and Algorithms)

09:00-09:20 Presentation: “Tool prediction in Galaxy” (Alireza Khanteymoori, University of Freiburg)

09:20-09:40 Presentation on intelligent workflow instance generation and selection with the WINGS framework (Prof. Paul Groth, University of Amsterdam)

09:40-10:00 Presentation: “The Automated Pipeline Explorer (APE)” (Anna-Lena Lamprecht, Utrecht University)

10:00-10:20 Presentation: “Semantic Data Federation with SADI, HYDRA and a Valet” (Prof. Chris Baker, University of New Brunswick)

10:20-11:00 Break

11:00-12:00 Panel discussion of commonalities and differences between the approaches

12:00-14:00 Lunch break

14:00-16:00 Breakout sessions (in thematic groups): How would these approaches work out in the different domains, on the different workflow examples?

16:00-16:30 Break

16:30-17:30 Reports from breakout sessions: summary of insights, problems and particular challenges

17:30-late Workshop dinner at Het Prentenkabinet (see directions in slides)

March 12

Thursday (Comparison/Ranking/Selection/Benchmarking of Workflows)

09:00-09:15 Short introduction to comparison/ranking/selection problems (Anna-Lena Lamprecht, Utrecht University)

09:15-11:30 Breakout sessions: What can you say without executing the workflow? (design time decisions, implementation)

11:30-12:00 Reports from the breakout sessions

12:00-14:00 Lunch break

14:00-15:00 Presentation: Tool and workflow benchmarking (Salvador Capella-Gutierrez, Barcelona Supercomputing Center)

15:00-16:00 Breakout sessions: How to assess the results of the workflows? (What can you say about workflows when executing them?)

16:00-16:30 Break

16:30-17:30 Reports from breakout sessions

18:00-late Pizza and hackathon (optional)

March 13

09:00-10:00 Plenary discussion

10:00-10:30 Break

10:30-12:00 Wrap-up and reviewing of session reports

12:00-13:00 Lunch

13:00-14:00 Inspirational Closing Keynote: “Making Workflows FAIR with Nanopublications” (Tobias Kuhn)

14:00- Farewell (workshop organizers)

Participants


Workshop files

Scientific Report

Workshop links

Article - Perspectives on automated composition of workflows in the life sciences

Scientific organizers:

Jon Ison, Technical University of Denmark

Magnus Palmblad, Leiden UMC

Anna-Lena Lamprecht, Utrecht University

Veit Schwämmle, University of Southern Denmark

Sponsors:

Workshop coordinator

Tanja Uitbeijerse

+31 71 527 5542

uitbeijerse@lorentzcenter.nl
