What are we doing?

We are working with life sciences training communities to adapt and deliver existing training materials for High Throughput Sequencing (HTS) data analysis skills needed in genomics research.

The problem

Rapid development of DNA sequencing technologies has made it possible for biomedical disciplines to rival the physical sciences in data production: today’s sequencing instruments match those fields in data throughput. Yet biology differs from these disciplines in one fundamental respect: the lack of computational and data analysis training in standard biomedical curricula.

The main challenge with the explosion of biomedical datasets is not the data and the storage space they require, nor the computational resources, but rather the general lack of researchers trained to manipulate and analyze these data. The need for such training cannot be overstated: while the majority (>95%) of researchers work or plan to work with large datasets, most (>65%) possess only minimal bioinformatics skills and are not comfortable with statistical analyses (Larcombe et al., 2017; Williams and Teal, 2017). This need drives demand for training, which at present greatly exceeds supply (Attwood et al., 2017). In a recent survey from EMBL-ABR, over 60% of biologists expressed the need for more training, while only 5% called for additional computing power.

Different communities, such as GOBLET, ELIXIR, EMBL, The Carpentries and Galaxy, are tackling this problem by creating and providing high-quality, decentralized, accessible, and practical training in computational data analysis to biomedical researchers worldwide. The impact of these efforts is already evident, with many researchers now able to analyze their own data. However, each of these communities is developing material covering only one or two aspects of the full data analysis process in life science, i.e. data management and metadata, bioinformatics and data analysis, or basic computing and scripting skills. Moreover, the workshops provided are not always accessible or inclusive.

Given their cumulative experience and expertise in training for Life Sciences, the next step for these communities is to work together towards a full curriculum on the computational analysis of HTS data, starting from raw data and leading up to the production of publication-ready visualizations of the analysis results. More importantly, these training activities should be scalable and accessible to anyone around the globe, particularly people unable to attend face-to-face workshops due to social, technical or cultural limitations, and should leverage new technologies to overcome this obstacle.

The solution

Gallantries aims to bridge the different training communities (EMBL, The Carpentries, Galaxy, ELIXIR, GOBLET) and fill the remaining gap in bioinformatics training.

Development of curriculum and training material

We create a curriculum and training material on the computational analysis of HTS data:

  1. Data analysis leading from the raw sequencing data to the downstream count tables, using Galaxy tools, chosen for both their wide use and the standardization they offer.
  2. Manipulation and visualization of the results using R as the data science programming language of choice, including an introduction to R using RStudio inside a Galaxy Interactive Environment (GIE).

The complete curriculum will be provided here, with bidirectional links to both websites, and will act as a virtual bridge between the two communities.

Training delivery

The training materials produced are delivered during three-day workshops. Each workshop includes both a physical and a virtual aspect: it comprises multiple sites delivering the same content at the same time, across multiple time zones and locations, through online streaming. This is a very ambitious goal of the project, with significant importance to other communities and initiatives; as such, we document all the organizational and practical steps required to make these “hybrid” workshops happen, and provide this information as a resource for adoption, reuse and further improvement.

Bridging communities

Finally, going beyond the strict limits of preparing and delivering the training material, the ultimate goal of this project is to build a lasting community of training in Life Sciences. This is achieved by bridging the Galaxy Training Network and The Carpentries, thus bringing together the 15+ years of training expertise of The Carpentries community with the sustainable computational infrastructure of Galaxy.

To bridge these communities, collaboration fests are organized. Each one entails focused development of the training materials in the form of a sprint but, more importantly, fosters collaboration between the currently disjoint communities.

Who are we?

We are a group of people enthusiastic about training and community. You can find more details about us on the team page.

We are instructors, mentors, and contributors within the Carpentries and Galaxy Training communities. We have extensive experience and a strong commitment to producing concise and evidence-based training, while also being mindful of providing inclusive training for all involved.

We are key contributors to the Galaxy community, with clear insights into the current and future technical capabilities of the Galaxy infrastructure, which is widely used, particularly in training events around the globe.

Within the ELIXIR network, our role as Training Coordinators allows us to directly support both institutions and individual researchers in Life Sciences across Europe by providing training, as well as by continuously identifying potential gaps in the training needs.

Finally, we all have a long-standing commitment to the Open Science movement, through the Mozilla Open Leaders initiative, the Open Science Training Handbook, and other projects.

Our values

We have high ethical standards, including:

  • Education: Educate researchers about HTS data analysis, reproducibility, and open science
  • Transparency: Emphasize transparency and the sharing of resources, material, knowledge and experiences
  • Open science: Promote citizen science and decentralized access to science
  • Modesty: Know you don’t know everything
  • Community: Carefully listen to any concerns and questions and respond honestly
  • Respect: Respect humans and all living systems
  • Responsibility: Recognize the complexity and dynamics of life science and research and our responsibility towards them

What do we need?

You! In whatever way you can help.

We need expertise in training, community building, education, communication, and HTS data analysis. We’d love your feedback along the way, of course.

Get involved

If you think you can help in any of the areas listed above (and we bet you can) or in any of the many areas that we haven’t yet thought of (and here we’re sure you can) then please check out our contributors’ guidelines and our roadmap.

Please note that it’s very important to us that we maintain a positive and supportive environment for everyone who wants to participate. When you join us, we ask that you follow our code of conduct in all interactions, both online and offline.

You are very welcome to join the community: come and chat with us on Gitter.