Sunday, August 27, 2017

DataCite: the PubMed for data and software

We have services like PubMed, Europe PMC, and Google Scholar to compile lists of literature, and Scholia/Wikidata and ORCID are upcoming alternatives, but for data and software there are fewer options. One notable exception is DataCite (see two past blog posts where I mentioned it). Some caution is needed in interpreting the results, because of versioning and because preprints, posters, etc. are also hosted by the supported repositories (e.g. Figshare, Zenodo), but the faceted browsing based on the metadata is really improving.

This is what my recent "DataCite" history looks like:


And it gets even more exciting when you realize that DataCite integrates with ORCID, so that you can have it all listed on your ORCID profile.

Saturday, August 26, 2017

Updated HMDB identifier scheme

I have not found further details about it yet, but I noticed half an hour ago that the Human Metabolome Database (doi:10.1093/nar/gks1065) seems to have changed all their identifiers: they added extra zeros. The screenshot for D-fructose on the right shows how the old identifiers are now secondary identifiers. We will face a period of a few years where some resources still use the old identifiers (archives, supplementary information, other databases, etc.) while others use the new ones.

This change has huge implications, including that mere string matching of identifiers becomes really difficult: we need to know whether an identifier uses the old scheme or the new one. Of course, we can see this simply from the identifier length, but we likely need a bit of extra logic ("artificial intelligence") in our software.
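That bit of logic does not need to be very intelligent, of course. Here is a minimal sketch in Groovy, assuming the old scheme is "HMDB" plus five digits and the new scheme "HMDB" plus seven digits (the function name and regular expression are just illustrative):

```groovy
// Sketch: normalize an HMDB identifier to the new (7-digit) scheme.
// Assumes old identifiers are "HMDB" + 5 digits and new ones "HMDB" + 7 digits.
def normalizeHMDB(String id) {
    def m = (id =~ /^HMDB(\d+)$/)
    if (!m.matches()) return id            // not an HMDB identifier, leave it alone
    def digits = m.group(1)
    if (digits.length() >= 7) return id    // already the new scheme
    return "HMDB" + digits.padLeft(7, '0') // old scheme: add the missing zeros
}

assert normalizeHMDB("HMDB00001") == "HMDB0000001"   // old identifier gets padded
assert normalizeHMDB("HMDB0000001") == "HMDB0000001" // new identifier passes through
```

Something like this would at least let our tools accept both schemes during the transition period.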

I ran into the change just now, because I was working on the next BridgeDb metabolite identifier mapping database. This weekend's release will certainly not have the new identifiers yet: I first need more information and more detail.

For now, if you use HMDB identifiers in your database, get prepared! Using old identifiers to link to the HMDB website seems to work fine, as they have a redirect working at the server level. Starting to think about internally updating your identifiers (by adding two zeros) is likely something to put on the agenda.

What about postprint servers?

Various article version types, including preprint and postprint.
Source: SHERPA/ROMEO.
Now that preprint servers are picking up speed, let's talk about postprint servers. Sure, we have plenty of places to post and find discussions about the content of articles (e.g. PubPeer, PubMed Commons, ...), and sure, we have retractions and corrections.

But what if we could just make revisions of articles?

And I'm not only talking about typo fixes, but also about clarifications that show up during post-publication peer review. Not about full revisions: if a paper is wrong, then this is not the method of choice. Revisions should not happen frequently either, but sometimes they are just convenient. Maybe to fix broken website URLs?

One point is that ResearchGate, Academia, Mendeley, and the like allow you to host versions, but we need to track the fixes and versioned DOIs. That metadata is essential: it is the FAIRness of the post-publication lifetime of a publication.

Thursday, August 17, 2017

Text mining literature that mentions JRC representative nanomaterials

The week before a short holiday in France (nature, cycling, hiking, a touristic CERN visit; thanks to Philippe for the ViaRhone tip!), I did some further work on content mining literature that mentions the JRC representative nanomaterials. One important reason was that I could play with the tools developed by Lars in his fellowship with The ContentMine.

I had about one day, as there is always work left over to finish in your first week of holiday, and I had several OS upgrades to do too (happily running the latest 64-bit Debian!). But, as good practice, I followed an Open Notebook Science approach, and the initial run of the workflow turned out quite satisfactory:


What we see here is content mined from literature found with a "titanium dioxide" search using the getpapers tool. AMI then extracted the nanomaterial and species information. Tools developed by Lars aggregated all the information into a single JSON file, which I converted into input for cytoscape.js with a simple Groovy script (sketched below). Yeah, click on the image, and you get the live network.
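I have not written up that Groovy script here in detail, but the idea is roughly like the sketch below. The field names (papers, pmcid, nanomaterials, species) are hypothetical placeholders, because the actual aggregated JSON from Lars's tools has its own structure; the sketch just shows how little Groovy is needed to turn such a file into a cytoscape.js elements list:

```groovy
// Sketch only: the input field names (papers, pmcid, nanomaterials, species)
// are hypothetical stand-ins for whatever the aggregated JSON actually contains.
import groovy.json.JsonSlurper
import groovy.json.JsonOutput

def aggregated = new JsonSlurper().parse(new File("aggregated.json"))

def elements = []
def seenTerms = [] as Set
aggregated.papers.each { paper ->
    // one node per paper
    elements << [data: [id: paper.pmcid, label: paper.pmcid, type: "paper"]]
    (paper.nanomaterials + paper.species).each { term ->
        // one node per unique nanomaterial or species term
        if (seenTerms.add(term)) {
            elements << [data: [id: term, label: term, type: "term"]]
        }
        // one edge per paper-term co-occurrence
        elements << [data: [source: paper.pmcid, target: term]]
    }
}

new File("cytoscape-elements.json").text =
    JsonOutput.prettyPrint(JsonOutput.toJson(elements))
```

The resulting list can then be passed as the elements option when initializing cytoscape.js.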

So, if I find a bit of time before I get back to work, I'll also convert this output to eNanoMapper RDF for loading into data.enanomapper.net. Of course, I will then run this on other Europe PMC searches too, for other nanomaterials.