Saturday, January 20, 2018

Winter solstice challenge #3: the winner is Bianca Kramer!

Part of the winning submission in the category 'best tool'.
A bit later than intended, but I am pleased to announce the winner of the Winter solstice challenge: Bianca Kramer! Of course, she was the only contender, but her solution is awesome! In fact, I am surprised no one took her took, ran it on their own data and just submit that (which was perfectly well within the scope of the challenge).

Best Tool: Bianca Kramer
The best tool (see the code snippet on the right) uses R and a few R packages (rorcid, rjson, httpcache) and services like ORCID and CrossRef (and the I4OC project), and the (also awesome) project. The code is available on GitHub.

Highest Open Knowledge Score: Bianca Kramer
I did not check the self-reported score of 54%, but since no one challenged here, Bianca wins this category too.

So, what next? First, start calculating your own Open Knowledge Scores. Just to be prepared for the next challenge in 11 months. Of course, there is still a lot to explore. For example, how far should we recurse with calculating this score? The following tweet by Daniel Gonzales visualizes the importance so clearly (go RT it!):

We have all been there, and I really think we should not teach our students it is normal that you have to trust your current read and no be able to look up details. I do not know how much time Gonzales spent on traversing this trail, but it must not take more than a minute, IMHO. Clearly, any paper in this trail that is not Open, will require a look up, and if your library does not have access, an ILL will make the traverse much, much longer. Unacceptable. And many seem to agree, because Sci-Hub seems to be getting more popular every day. About the latter, almost two years ago I wrote Sci-Hub: a sign on the wall, but not a new sign.

Of course, in the end, it is the scholars that should just make their knowledge open, so that every citizen can benefit from it (keep in mind, a European goal is to educate half the population with higher education, so half of the population is basically able to read primary literature!).

That completes the circle back to the winner. After all, Bianca Kramer has done really important work on how scientists can exactly do that: make their research open. I was shocked to see this morning that Bianca did not have a Scholia page yet, but that is fixed now (though far from complete):

Other papers that you should be read more include:
Congratulations, Bianca!

Sunday, January 14, 2018

Faceted browsing of chemicals in Wikidata

A few days ago Aidan and José introduced GraFa on the Wikidata mailing list. It is a faceted browser for content in Wikidata, and the screenshot on the right shows that for chemical compounds. They are welcoming feedback.

GraFa run on things of type chemical compound.
Besides this screenshot, I have not played with it a lot. It looks quite promising, and my initial feedback would be a feature to sort the results, and ability to export the full list to some other tool, e.g. download all those items as RDF.

Saturday, January 06, 2018

"All things must come to an end"

Cover of the book.
No worries, this is just about my Groovy Cheminformatics book. Seven years ago I started a project that was very educational to me: self-publishing a book. With the help from I managed to get a book out that sold over 100 copies and that was regularly updated. But their lies the problem: supply creates demand. So, I had a system that supplied me with an automated set up that reran scripts, recreated text output and even figures for the book (2D chemical diagrams). I wanted to make an edition for every CDK release. All in all, I got quite far with that: eleven editions.

But the current research setting, or at least in academia, does not provide me with the means to keep this going. Sad thing is, the hardest part is actually updating the graphics for the cover, which needs to resize each time the book gets ticker. But John Mayfield introduced so many API changes, I just did not have the time to update the book. I tried, and I have a twelfth edition on my desk. But where my automated setup scales quite nicely, I don't.

It may we worth reiterating why I started the book. We have had several places where information was given, and questions were answered: the mailing list, wiki pages, JavaDoc, the Chemistry Toolkit Rosetta Wiki, and more. Nothing in the book was not already answered somewhere else. The book was just a boon for me to answer those questions and provide an easy way for people to get many answers.

Now, because I could not keep up with the recent API changes, I am no longer feeling comfortable with releasing the book. As such, I have "retired" the book.

I am now working out on how to move from here. An earlier edition is already online under a Creative Commons license, and it's tempting to release the latest version like this too. That said, I have also been talking with the other CDK project leaders about alternatives. More on this soon, I guess.

Here's an overview of posts about the book: