Sunday, October 29, 2017

Happy Birthday, Wikidata!

Wikidata celebrates its 5th birthday with a great WikidataCon in Berlin. Sadly, I could not join in person, so I am assuming it is a great meeting, following the #WikidataCon hashtag and occasionally the live stream.

My first encounter was soon after they started, and I was particularly impressed by the presentation by Lydia Pintscher at the Dutch Wikimedia Conferentie 2012. I had played with DBPedia occasionally but was always disappointed by the number of issues with extracting chemistry from the ChemBox infobox. That is, of course, the general problem with data that has been mangled into something that looks nice. We know that problem from text mining PDFs too. If you start with something machine readable in the first place, your odds of success are much higher.

Yesterday, Lydia presented the State of Wikidata and I think they delivered on their promise.

I did not create my Wikidata account until a year later, and I did not use the account much in the first two years. But the Wikidata team did a lot of great work in their first three years, and somewhere in 2015 I wrote my first blog post about Wikidata. That year Daniel Mietchen also asked me to join the writing of a project proposal (later published in RIO Journal). The reasons for adopting Wikidata more actively and joining Daniel's writing team were the CCZero license and the fact that chemical identifiers had really picked up. Indeed, free CAS numbers were an important boon. Since then, I have been using Wikidata as a data source for our BridgeDb project and for WikiPathways (together with Denise Slenter). I also have to mention that the work by Andra Waagmeester and the rest of the Andrew Su team gave me extra support to push Wikidata in our local research agenda around FAIR data.

The Wikidata RDF export and SPARQL end point were an important tipping point. They make reuse of Wikidata so much easier. Fetching slices of data with curl is trivial, and the results are easy to integrate into other projects, as I do for BridgeDb. Someone in the education breakout session mentioned that you can use the interactive SPARQL end point even with people with zero programming experience. I wholeheartedly agree. That is exactly what I did last Thursday at the Surf Verder bouwen aan Open Science seminar. With all the example queries, the learning curve is so gentle that this approach is generally applicable.
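To give an idea of how little is needed to pull such a slice from a script instead of curl, here is a minimal Python sketch. It assumes the public end point at https://query.wikidata.org/sparql and uses the CAS Registry Number property (P231) as an example. The function names, the User-Agent string, and the query itself are illustrative choices of mine, not what BridgeDb actually runs.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata SPARQL end point.
ENDPOINT = "https://query.wikidata.org/sparql"

# Example slice: a handful of items with a CAS Registry Number (P231).
QUERY = """
SELECT ?compound ?cas WHERE {
  ?compound wdt:P231 ?cas .
} LIMIT 5
"""


def build_url(query, endpoint=ENDPOINT):
    """Return the GET URL that asks the end point for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return endpoint + "?" + params


def parse_bindings(payload):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    data = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def fetch(query):
    """Run the query over HTTP (needs network access)."""
    request = urllib.request.Request(
        build_url(query),
        headers={
            "Accept": "application/sparql-results+json",
            # Placeholder; replace with something that identifies your tool.
            "User-Agent": "wikidata-slice-sketch/0.1",
        },
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(response.read().decode("utf-8"))


if __name__ == "__main__":
    for row in fetch(QUERY):
        print(row["compound"], row["cas"])
```

The parsing is kept separate from the HTTP call, so the same code works on a result file saved earlier with curl.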

And then there is Scholia. What do I need to say? Impressive project by Finn Nielsen to which I am happy to contribute. Check out his WikidataCon talk. Here I am contributing to the biology corner and working on RSS feeds. It makes a marvelous tool to systematically analyze literature, e.g. for Rett syndrome as a disease or as a topic.

Wikidata has evolved into a tremendously useful resource in my biology research and I cannot imagine where we will be next year, at the sixth Wikidata birthday. But it will be huge!

Sunday, October 15, 2017

Two conference proceedings: nanopublications and Scholia

It takes effort to move scholarly publishing forward. And not all traditional publishers have shown themselves to be good at that: we're still basically stuck with machine-broken channels like PDFs and ReadCubes. They all seem to love text mining, but only if they can do it themselves.

Fortunately, there are plenty of people who do like to make a difference and like to innovate. I find this important, because if we do not do it, who will? Two researchers who make that effort recently published their work as conference proceedings: Tobias Kuhn and Finn Nielsen. And I am happy to have been able to contribute to both efforts.

Tobias works on nanopublications, which innovate how we make knowledge machine readable. And I have stressed how important this is in my blog for years. Nanopublications describe how knowledge is captured, make it FAIR, and, importantly, link the knowledge to the research that led to it. His recent conference paper details how nanopublications can be used to establish incremental knowledge. That is, given two sets of nanopublications, the approach determines which have been removed, added, and changed. The paper continues by outlining how that can be used to reduce, for example, download sizes and how it can help establish an efficient change history.
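The basic idea of comparing two sets can be illustrated with a few lines of Python. This is only a toy sketch and not the method from the paper: it assumes each nanopublication has a stable identifier and that its content can be compared as a string. The identifiers and statements below are made up for illustration.

```python
def diff_sets(old, new):
    """Compare two snapshots, each mapping identifier -> content.

    Returns the identifiers that were removed, added, or whose content
    changed between the old and the new snapshot.
    """
    old_ids, new_ids = set(old), set(new)
    return {
        "removed": sorted(old_ids - new_ids),
        "added": sorted(new_ids - old_ids),
        "changed": sorted(i for i in old_ids & new_ids if old[i] != new[i]),
    }


if __name__ == "__main__":
    # Hypothetical snapshots; identifiers and statements are invented.
    release_1 = {"np1": "A binds B", "np2": "C inhibits D", "np3": "E is a F"}
    release_2 = {"np1": "A binds B", "np2": "C activates D", "np4": "G part of H"}
    print(diff_sets(release_1, release_2))
```

Someone who already has the first release then only needs to fetch what is listed under added and changed, which is where the smaller downloads come from.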

And Finn developed Scholia, an interface not unlike Web-of-Science, but based on Wikidata and therefore fully on CCZero data. And it comes with a community actively adding the full history of scholarly literature and the citations between papers, courtesy of the Initiative for Open Citations. This is opening up a lot of possibilities: from keeping track of articles citing your work, to getting alerts about articles publishing new data on your favorite gene or metabolite.

Kuhn T, Willighagen E, Evelo C, Queralt-Rosinach N, Centeno E, Furlong L. Reliable Granular References to Changing Linked Data. In: d'Amato C, Fernandez M, Tamma V, Lecue F, Cudré-Mauroux P, Sequeda J, et al., editors. The Semantic Web – ISWC 2017. vol. 10587 of Lecture Notes in Computer Science. Springer International Publishing; 2017. p. 436-451. doi:10.1007/978-3-319-68288-4_26

Nielsen FÅ, Mietchen D, Willighagen E. Scholia and scientometrics with Wikidata. 2017. Available from:

Sunday, October 08, 2017

CDK used in SIRIUS 3: metabolomics tools from Germany

Screenshot from the SIRIUS 3 Documentation.
License: unknown.
It has been ages since I blogged about work I heard about and think should receive more attention. So, I'll try to pick up that habit again.

After my PhD research (about machine learning (chemometrics, mostly), crystallography, QSAR) I first went into the field of metabolomics, because it combines core chemistry with the complexity of biology. My first position was with Chris Steinbeck, in Cologne, within the bioinformatics institute led by Prof. Schomburg (of the BRENDA database). During that year, I was in a group that worked on NMR data (NMRShiftDb, Dr. Stefan Kuhn), Bioclipse (collaboration with Ola Spjuth), and, of course, the Chemistry Development Kit (see our new paper).

This new paper, actually, introduces functionality that was developed in that year, for example, work started by Miquel Rojas-Cheró. This includes the work on atom types, which we needed to handle radicals, lone pairs, etc., for delocalisation. It also includes work around handling molecular formulas and calculating molecular formulas from (accurate) molecular masses. For the latter, more recent work improved even further on the earlier work.
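To show why accurate masses matter here, this is a deliberately naive Python sketch of the mass-to-formula idea. It is not the CDK implementation: it only enumerates C, H, N, and O counts by brute force, uses monoisotopic masses of the most abundant isotopes, and applies no chemical sanity rules at all. The function names and the default tolerance are my own illustrative choices.

```python
# Monoisotopic masses (in Da) of the most abundant isotopes.
MASSES = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def format_formula(c, h, n, o):
    """Write element counts as a formula string, e.g. C6H12O6."""
    parts = []
    for symbol, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count == 1:
            parts.append(symbol)
        elif count > 1:
            parts.append(symbol + str(count))
    return "".join(parts)


def formulas_for_mass(target, tolerance=0.001):
    """List C/H/N/O formulas with a monoisotopic mass within tolerance of target.

    No valence or ratio rules are applied, so chemically impossible
    formulas can show up in the result.
    """
    hits = []
    for c in range(int(target // MASSES["C"]) + 1):
        rest_c = target - c * MASSES["C"]
        for n in range(int((rest_c + tolerance) // MASSES["N"]) + 1):
            rest_n = rest_c - n * MASSES["N"]
            for o in range(int((rest_n + tolerance) // MASSES["O"]) + 1):
                rest_o = rest_n - o * MASSES["O"]
                # Fill whatever mass is left with hydrogens.
                h = round(rest_o / MASSES["H"])
                if h < 0:
                    continue
                mass = (c * MASSES["C"] + h * MASSES["H"]
                        + n * MASSES["N"] + o * MASSES["O"])
                if abs(mass - target) <= tolerance:
                    hits.append(format_formula(c, h, n, o))
    return hits


if __name__ == "__main__":
    # 180.06339 Da is the monoisotopic mass of C6H12O6.
    for formula in formulas_for_mass(180.06339):
        print(formula)
```

Loosening the tolerance quickly increases the number of candidate formulas, which is why the accuracy of the measured mass is so important in metabolomics.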

So, whenever metabolomics work is published and they use the CDK, I realize that what the CDK does has impact. This week Google Scholar alerted me about a user guidance document for SIRIUS 3 (see the screenshot). Seems really nice (great) work from Sebastian Böcker et al.!

It also makes me happy, as our Faculty of Health, Medicine, and Life Sciences (FHML) is now part of the Netherlands Metabolomics Center, and we recently published an article with our vision of a stronger, more FAIR European metabolomics community.

Wednesday, October 04, 2017

new paper: "The future of metabolomics in ELIXIR"

CC-BY from F1000 article.
This spring I attended a meeting organized by researchers from the European metabolomics community, including from PhenoMeNal, to talk about proposing a use case to ELIXIR. Doing research in metabolomics and being part of ELIXIR, I was happy that the meeting happened. During the meeting I presented the work from our BiGCaT group (e.g. WikiPathways, see doi:10.1093/nar/gkv1024).

During the meeting various metabolomics topics were discussed, and I pushed for interoperability of chemical (metabolic) structures, which requires structure normalization, equivalence testing, etc. You know, the kind of work that partners in Open PHACTS did, and that we're now trying to bootstrap with ChemStructMaps. That topic did not make it, but its ideas are included in the selected topic.

All this you can read in this meeting write-up, peer-reviewed in F1000Research (doi:10.12688/f1000research.12342.1). I am happy to have been given the opportunity to contribute to this work. The work in our group (e.g. from our PhD student Denise) can surely contribute to this community effort.

Van Rijswijk M, Beirnaert C, Caron C, Cascante M, Dominguez V, Dunn WB, et al. The future of metabolomics in ELIXIR. F1000Research. 2017 Sep;6:1649+. doi:10.12688/f1000research.12342.1