## Thursday, September 25, 2014

### Slides at the Open PHACTS community workshop (June 26)

It seems I had not yet posted the slides of my presentation at the 6th Open PHACTS community workshop. At this meeting I gave an overview of the Programming in the Life Sciences course we give to 2nd and 3rd year students of the Maastricht Science Programme (MSP; some participants graduated this summer, see the photo on the right).

This course will be given again this year, starting about a month from now, and I am looking forward to all the cool apps the students come up with! Given that the Open PHACTS API has been extended with pathway and disease information, they will likely be even cooler than last year.

### OpenTox Europe 2014 presentation: "Open PHACTS: solutions and the foundation"

Image: CC-BY 2.0 by Dmitry Valberg.

Where the OpenTox Europe 2013 presentation focused on the technical layers of Open PHACTS, this presentation addressed a key knowledge management solution to scientific questions and the Open PHACTS Foundation. I stress here too, as in the slides, that the presentation is on behalf of the full consortium!

For the knowledge management, I think Open PHACTS did really interesting work in the field of "identity" and I am happy to have been involved in this [Brenninkmeijer2012]. The platform implementation is, furthermore, based on the BridgeDb platform, which originated in our group [VanIersel2010]. The slides outline the scientific issues addressed by this solution.

PS, many more Open PHACTS presentations are found here.

Brenninkmeijer, C. et al. Scientific lenses over linked data: An approach to support task specific views of the data. A vision. In Linked Science 2012 - Tackling Big Data (2012). URL http://ceur-ws.org/Vol-951/paper5.pdf.

van Iersel, M. et al. The BridgeDb framework: standardized access to gene, protein and metabolite identifier mapping services. BMC Bioinformatics 11, 5 (2010). URL http://dx.doi.org/10.1186/1471-2105-11-5.

## Tuesday, September 16, 2014

### Do a postdoc with eNanoMapper

Image: CC-BY-SA from Zherebetskyy @ WP.

Details will still have to follow as they are being worked out, but with Cristian Munteanu having accepted an associate professorship, I need a new postdoc to fill his place, and I am reopening the position I had open almost a year ago. If you would like to work in a systems biology group (BiGCaT), are pro Open Science, like to work on tools for safe-by-design nanomaterials, and have skills in one or more of bioinformatics, chemoinformatics, statistics, coding, and ontologies, then this position may be something for you.

The primary project for this position is eNanoMapper and you will be working within the large European NanoSafety Cluster network, though interactions are not limited to the EU.

If you are interested and cannot wait until the details of the position come out, please send me an email at first.lastname @ maastrichtuniversity.nl. General questions about eNanoMapper and our BiGCaT solutions for nanosafety are also welcome in the comments.

## Sunday, September 14, 2014

### CDK: Element and Isotope information

When reading files, the format carries, in one way or another, implicit information that you may need for some algorithms. Element and isotope information is a key example. Typically, the element symbol is provided in the file, but not the mass number or the implied isotope. You would need to read the format specification to find out which properties are implicitly meant. The idea here is that information about elements and isotopes is pretty well standardized by other organizations, such as the IUPAC. Such default element and isotope properties are exposed in the CDK by the classes Elements and Isotopes. I am extending my book, Groovy Cheminformatics with the CDK, with these bits.

#### Elements

The Elements class provides information about the element's atomic number, symbol, periodic table group and period, covalent radius and Van der Waals radius, and Pauling electronegativity (Groovy code):

import org.openscience.cdk.config.Elements

// Look up the default properties the CDK stores for lithium
Elements lithium = Elements.Lithium
println "atomic number: " + lithium.number()
println "symbol: " + lithium.symbol()
println "periodic group: " + lithium.group()
println "periodic period: " + lithium.period()
println "electronegativity: " + lithium.electronegativity()

For example, for lithium this gives:

atomic number: 3
symbol: Li
periodic group: 1
periodic period: 2
electronegativity: 0.98

#### Isotopes

Similarly, there is the Isotopes class to help you look up isotope information. For example, you can get all isotopes for an element or just the major isotope:

import org.openscience.cdk.config.Isotopes

isofac = Isotopes.getInstance()
isotopes = isofac.getIsotopes("H")
majorIsotope = isofac.getMajorIsotope("H")
for (isotope in isotopes) {
    // mass number and symbol, then exact mass and natural abundance
    print "${isotope.massNumber}${isotope.symbol}: " +
        "${isotope.exactMass} ${isotope.naturalAbundance}%"
    if (majorIsotope.massNumber == isotope.massNumber)
        print " (major isotope)"
    println ""
}

For hydrogen this gives:

1H: 1.007825032 99.9885% (major isotope)
2H: 2.014101778 0.0115%
3H: 3.016049278 0.0%
4H: 4.02781 0.0%
5H: 5.03531 0.0%
6H: 6.04494 0.0%
7H: 7.05275 0.0%
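
To show how the exact masses and natural abundances relate, the abundances can be used to weight the exact masses into an average atomic mass. The following is only a minimal sketch in plain Groovy that does not call the CDK: the numbers are copied from the listing above, and the names `hydrogenIsotopes` and `averageMass` are made up for this illustration.

```groovy
// Exact masses (u) and natural abundances (%) copied from the listing above.
// Isotopes with 0.0% abundance are left out because they contribute nothing.
def hydrogenIsotopes = [
    [massNumber: 1, exactMass: 1.007825032, naturalAbundance: 99.9885],
    [massNumber: 2, exactMass: 2.014101778, naturalAbundance: 0.0115]
]

// Abundance-weighted average: sum of exactMass * (abundance / 100).
def averageMass = hydrogenIsotopes.collect {
    it.exactMass * it.naturalAbundance / 100
}.sum()

println "Average mass of H: " + averageMass
```

The result should come out close to 1.00794, which is noticeably different from the exact mass of the major isotope alone.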

The Isotopes class is also used by the getMajorIsotopeMass() method in the MolecularFormulaManipulator class to calculate the monoisotopic mass of a molecule:

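import org.openscience.cdk.silent.SilentChemObjectBuilder
import org.openscience.cdk.tools.manipulator.MolecularFormulaManipulator

// Parse the formula string into a molecular formula object
molFormula = MolecularFormulaManipulator.getMolecularFormula(
    "C2H6O",
    SilentChemObjectBuilder.getInstance()
)
// Sum the masses of the major isotope of every atom in the formula
println "Monoisotopic mass: " +
    MolecularFormulaManipulator.getMajorIsotopeMass(molFormula)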

The output for ethanol looks like:

Monoisotopic mass: 46.041864812
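
The arithmetic behind this number can be checked by hand: the monoisotopic mass is the sum, over all atoms, of the mass of each element's major isotope. The sketch below is plain Groovy and does not use the CDK. The `monoisotopicMass` closure and the two maps are hypothetical names for this illustration. The 1H mass is taken from the isotope listing above, 12C is 12 u by definition, and the 16O value is an assumption (rounded to eight decimals), not something read from the CDK here.

```groovy
// Major-isotope masses in unified atomic mass units (u).
// 1H: value printed in the isotope listing above; 12C: 12 u by definition;
// 16O: assumed value, rounded to eight decimals.
def majorIsotopeMass = [C: 12.0, H: 1.007825032, O: 15.99491462]

// Element counts for ethanol, C2H6O.
def ethanol = [C: 2, H: 6, O: 1]

// Hypothetical helper (not part of the CDK):
// sum count * major-isotope mass over all elements in the formula.
def monoisotopicMass = { Map<String, Integer> formula ->
    formula.collect { symbol, count -> majorIsotopeMass[symbol] * count }.sum()
}

println "Monoisotopic mass: " + monoisotopicMass(ethanol)
```

With these input values the sum is 2 × 12.0 + 6 × 1.007825032 + 15.99491462, which should reproduce the value reported by the CDK above.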

## Saturday, September 13, 2014

### CDK 1.5.8, Zenodo, GitHub, and DOIs

Image: screenshot from John's blog post.

John released CDK 1.5.8, which has a few nice goodies, like a new renderer. The full changelog is available. An interesting aspect of this release is that it uses ZENODO to make the release citable with a DOI. That is relevant because it makes tracking the impact of the release simpler (and a lot cheaper!), e.g. with #altmetrics. And that matters too, because no one really has a clue how to decide which scientist is better than another, and which scientist should and should not get funding. Although we know peer review of literature is severely limited, we happily accept it to determine career futures.

Anyway, we now have a DOI for a CDK release. So, everyone using the CDK in research can cite this specific CDK release in their papers with this DOI. Of course, most publishers still don't support providing reference lists as a list of DOIs and often do not show the DOI there, but all this is a step forward. John listed the DOI with a nice ZENODO-provided icon in the release post.

If you follow the DOI, you end up on the ZENODO website (they effectively act as a publishing platform). It is this page that I want to continue talking about, and in particular the list of authors. The webpage provides two alternatives. The first is the most prominent one when you first visit the page:

This looks pretty good, I think. It seems to have picked up a list of authors and, looking at the list, not from the standard AUTHORS file but from the commit messages. However, that is unsuited for the CDK: its repository history runs from CVS, via SVN, to Git, and only the contributors from the Git era show up. The list seems sorted by the amount of contributions, but note that Christoph is missing: his work predates the Git era.

The second list of "authors" is given in the bottom right of the page, as "Cite As":

This suggestion is different, though it seems reasonable to assume the et al. (missing dot) refers to the rest of the authors of the first list. In the BibTeX export the full author list shows up again, supporting that idea.

#### Correct citation?

This makes me wonder: who are the authors of this release? Clearly, this version includes code from all authors in one way or another. Code from some original authors may have long been replaced with newer code. And we noted the problem of missing authors, caused by the incomplete version control history.

An alternative is to consider this release as a product of those people who have contributed patches since the previous release. In fact, this is something we noted as important in the past and now always report when making a release. For the 1.5.8 release that list looks like:

That is, this approach basically takes an accepted approach in publishing: papers describing updates of running projects involve only the people that contributed to that released work.

Therefore, I think the proper citation for this CDK 1.5.8 release should be:

John May, Egon Willighagen, Mark Vine, Oliver Stücker, Andy Howlett, Mark Williamson, Sambit Gaan, Alison Choy (2014). cdk: CDK Release 1.5.8. ZENODO. 10.5281/zenodo.11681

Also note the correct spelling of the author names, though one can argue that they should have correctly spelled their names in the Git commit messages. Here are some challenges for GitHub in adopting the ORCID, I guess.

The question is, however, how do we get ZENODO to do this the way we want it to? I think the above citation makes much more sense, but others may have good reasons why the current practice is better. What should ZENODO use as the source of author provenance?