Tuesday, March 21, 2017

OpenAPI to the Ensembl example

Many months ago I joined a (doi:10.1093/nar/gkv1116) workshop in Amsterdam, organized by Gert Vriend et al. (see this coverage). There I learned how to register and search for services, and that underneath, the API uses JSON to exchange information about the services. One neat feature is that it allows you to specify a lot of detail about the service calls.

Now, at the time we had already used OpenAPI (then still called Swagger) for Open PHACTS for some time, and we later picked it up for other projects, like eNanoMapper (API), WikiPathways (API), and BridgeDb (API). OpenAPI configuration files also describe how web services work. So, the idea arose that it should be possible to convert the first to the second. Simple. I started a GitHub repository but, of course, did not really have time to implement it.

Then, half a year ago, at the ELIXIR track meeting at the ECCB in The Hague (where I presented this BridgeDb poster), I spoke with people from ELIXIR-DK who were just starting a studentship scheme. This led to a project idea, then a proposal, and then a small, approved project, allowing me to fund Jonathan Mélius to work on this part-time, for about a man month of work, spread over several months.

Jonathan has been doing great work, and because we wanted to demo the OpenAPI 2 bridge with a major European resource, Ensembl was suggested (which just published a paper on their core software). An OpenAPI for Ensembl was set up, which is going to be the primary input for the new tool:

The next step was to take the JSON defining the content of this page (you can find the URL to the JSON file at the top of that page, hosted on GitHub too), and convert that to fragments. That the approach works is shown by this test entry:

The observant eye will see that various details of the descriptions of the API calls are annotated with EDAM ontology (doi:10.1093/bioinformatics/btt113) terms, a key feature of the registry. This information is currently not available in the OpenAPI JSON (we will be exploring how that specification could or should be extended to include it). Moreover, the web service API methods need ontological annotation in the first place, and we will not be able to remove human involvement there entirely.

The EDAM IRIs are still hard-coded in the conversion tool at this moment, but are being factored out into a secondary JSON file for now. So, the conversion tool will take two input JSON files, OpenAPI + EDAM annotation, and create JSON output. The latter can then be inserted into the JSON. We will work on something based on the API to automate that step too.
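The two-input idea can be sketched in a few lines of Python. This is only an illustration and not the actual conversion tool: the layout of the annotation file (a map from "METHOD path" to a list of EDAM IRIs), the output fields (`call`, `description`, `edam`), the example paths, and the placeholder IRI are all assumptions made for this sketch.

```python
import json

# Hypothetical, minimal OpenAPI (Swagger 2.0) document; the paths are made up.
OPENAPI_JSON = """
{
  "swagger": "2.0",
  "info": {"title": "Example API", "version": "1"},
  "paths": {
    "/items/{id}": {"get": {"summary": "Look up one item"}},
    "/items": {"post": {"summary": "Look up several items"}}
  }
}
"""

# Hypothetical annotation file: "METHOD path" -> list of EDAM IRIs.
# The IRI is a placeholder, not a real EDAM term.
EDAM_JSON = """
{
  "GET /items/{id}": ["http://edamontology.org/operation_XXXX"]
}
"""

HTTP_METHODS = {"get", "post", "put", "delete", "patch", "head", "options"}


def convert(openapi, annotations):
    """Create one output entry per OpenAPI operation, adding EDAM terms if known."""
    entries = []
    for path, item in sorted(openapi.get("paths", {}).items()):
        for method, operation in sorted(item.items()):
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            key = f"{method.upper()} {path}"
            entries.append({
                "call": key,
                "description": operation.get("summary", ""),
                # calls without annotation get an empty list: a human still
                # has to supply the ontology terms for those
                "edam": annotations.get(key, []),
            })
    return entries


entries = convert(json.loads(OPENAPI_JSON), json.loads(EDAM_JSON))
print(json.dumps(entries, indent=2))
```

Keeping the annotations in their own file means the OpenAPI document can be regenerated without losing the manually curated ontology terms.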

So, we still have some work to do, but I'm happy with the current progress. We're well on track to complete this project before summer and actually get a long way with the ontology annotation, which was a secondary goal in the original plan.

Feedback welcome!

Saturday, March 11, 2017

What an Open Science project does: eNanoMapper deliverables archived on ZENODO

eNanoMapper has ended. It was my first EC-funded project as PI. It was great to run a three-year Open Science project at this scale. I loved the collaboration with the other partners, and would like to thank Lucian and Markus for their weekly coordination of the project! Lucian also reflected on the project in this blog post. He describes the successful completion of the project, which we owe partly to the uptake of ideas, solutions, and approaches by the NanoSafety Cluster (NSC) community. Many thanks to all NSC projects, including for example NANoREG, who were very early adopters!

Our legacy is substantial, I think. I have blogged about some aspects in the past. The project's output includes RRegrs for scanning the regression model space, extensions of AMBIT for substances, tools on top of the APIs, visualizations with JavaScript, etc. Things have been done Open Source and you can find many repositories on GitHub. We used Jenkins to autobuild various components: not just source code, but also the eNanoMapper ontology. Several software releases are archived on ZENODO, and the ontology is available from BioPortal, the Ontology Lookup Service, and AberOWL (thanks to the operators for their support in getting it properly online!).

Several papers have been published, along with many tutorials. On the website you could already access many of the deliverables of the project. And as of last week, all public deliverables are archived on ZENODO (HT to Lucian):

Next time, I want to see if we can get the deliverables published in, for example, the Research Ideas and Outcomes journal.

Finally, I would like to thank everyone else in the Maastricht University team that worked on eNanoMapper: Cristian Munteanu, who was my first post-doc, Bart Smeets, Linda Rieswijk, Freddie Ehrhart, and part-time Nuno Nunes and Lars Eijssen. Without them I could not have completed our deliverables.

Sunday, March 05, 2017

Upcoming meeting: "Open science and the chemistry lab of the future"

Following the example by Henry Rzepa, here is an announcement of a meeting with a great program organized by the Beilstein Institut in Germany. The meeting also means I cannot attend another really important meeting, WikiCite, which partially overlaps :(

At the Open science and the chemistry lab of the future meeting I will represent ELIXIR, which is quite a challenge as they are doing so much, and I only have so much time to cover that. Worse, I am only working part-time on specific ELIXIR tasks, but fortunately I am getting great help from Rob Hooft of the Dutch Techcenter for Life Sciences (DTL, practically the Dutch ELIXIR node).

I am very much looking forward to meeting friends and seeing people I have so far only met online, like Stuart Chalk (who recently published the CCZero Open Spectral Database) and Open Source Malaria's Matthew Todd. Oh, and if you cannot attend the meeting in person, the hashtag to follow is #BeilsteinOS. If you can join, you can register for the meeting here.

Sunday, February 19, 2017

Talk: "Making open science a reality, from a researcher perspective"

Slide from the presentation with a screenshot of the Woordenboek Organische Chemie.

Last week I was in Paris (wonderful, but like London, a city that makes you understand Ankh-Morpork) for the AgreenSkills+ annual meeting. AgreenSkills+ is a program for postdoc funding in France, and the postdocs presented their work. Wednesday (#agreenskills) was a day to learn about Open Science, with other talks from Nancy Pontika and Ivo Grigorov from Foster Open Science, Martin Donnelly from the Edinburgh Digital Curation Centre about data management and the DMPonline tool, and Michael Witt of Purdue University about digital repositories and DataCite (which I should really make time to blog about too).

I was asked to talk about my experiences from a researcher perspective (which started with the Woordenboek Organische Chemie). Here are my slides:

Saturday, February 18, 2017

Open Science is already a thing in The Netherlands

It has been hard to miss: the Dutch National Plan Open Science (doi:10.4233/uuid:9e9fa82e-06c1-4d0d-9e20-5620259a6c65). It sets out an important step forward: it goes beyond Open Access publishing, which has become a tainted topic. After all, green Open Access does not provide enough rights. For example, teachers still cannot easily share green Open Access publications with their students.

I am happy I have been able to give feedback on a draft version, and hope it helped. During the weeks before the release I also looked at how the Open Science working group of the Open Knowledge International foundation(?) is doing, and I am happy that at least the Dutch mailing list is still in action. Things are a bit in flux, as the OKI is undergoing a migration to a new platform. Maybe more about that later.

But one of my main comments was that there already is a lot of Open Science ongoing in The Netherlands. And then I am not talking about all those scientists that already publish part of their work as (gold) Open Access, but about the many researchers that already share Open Data, Open Source, or other Open research outputs. In fact, I started a public (CCZero) spreadsheet with GitHub repositories of Dutch research groups, which now also covers many educational groups, at our universities and "hogescholen". This now includes some forty(!) git repositories, mostly on GitHub but also on GitLab. Wageningen even have their own public git website!

Mind you, I had to educate myself a bit in the exact history of the term Open Science. It actually seems to go back to the USA Open Source community (see these references and particularly this article). And that's actually where I also knew it from, in particular from Dan Gezelter, founding author of the well-known Jmol viewer for small molecules and protein structures, and host of the domain.