At 350 Years Old, It’s Time to Change Scientific Publishing

When something turns 350 years old, you’d think it’s worth a celebration! Sadly, in the case of scientific publishing, it is absolutely not. First, some interesting history: the world’s oldest scientific publication is Philosophical Transactions, which published its first issue on 6th March 1665. Since then, the basic principle of scientific publishing has remained the same: researchers send their work to a central body for ‘publication,’ i.e. to record their accomplishment and disseminate it from a central source. Of course, there have been changes in the structure of this publication, first with peer review, and more recently with publishing articles online. What is incredible, however, is that the publishers themselves still exist. Not only do they exist, they are thriving, with net profit margins of 36% for Elsevier, a larger margin than Apple’s (at 35%)! Clearly we need to ask tough questions about the current mechanism of publishing and whether there is a better way.

To this end, a conference was held at the University of St Andrews recently asking how to make scientific communication more efficient. This was in the context of researchers spending approximately 15 million person hours a year on submissions that are never published. Of course, the number of hours spent on work that never even gets submitted probably ends up being over 90% of a researcher’s time. The Guardian wrote a short article about the panelists’ remarks, and the majority of the points people brought up were rather incremental in nature, such as:

  • Who is paying? What value is generated and how is any surplus reinvested?
  • Peer-review is radically different from domain to domain, from discipline to discipline
  • Authors still write journal articles in prose style; do we really need to produce all that text?

However, one comment, by Cameron Neylon, the advocacy director of PLoS, really stood out to me:

“Scientific communication is a means of dissemination, it is not a product”

This one statement is by far the most powerful way to communicate how and why the current system is broken. We too often view the publication itself as the product, and scientists treat it as ‘finished.’ This ‘productization’ of the publication is effectively what enables publishing companies to make so much money. However, if we take a publication for what it is at its core, a means of dissemination, we open up the world of alternatives considerably. News media has already been revolutionized by blogs and Twitter, and television by YouTube. What’s more impressive is that the change in media has enabled a greater population to partake in the dialogue and has certainly improved the overall quality of news media. Here are just a few ideas for scientific publishing to spur the conversation:

  • Publishing videos of the experiment and results to improve reproducibility
  • Enabling longer form communications that discuss the methods in great detail, as opposed to ‘letter’ formats which are impossibly short
  • Hosting an AMA (ask me anything) with the authors at set intervals after publication (1 month, 3 months, 12 months, for example)
  • Writing a lay-person version of the main findings of the paper
  • Publishing a SlideShare version along with the paper (this idea is inspired by an ex-colleague of mine)
  • Publishing plots to Plotly so others can re-visualize the data as they please
  • Enabling researchers to publicly mark up documents with comments

These are just a few of the many things we can do to dramatically improve communication. So we must ask ourselves, as the scientific community: what are the real reasons for not changing our means of knowledge dissemination?

As always, please do leave comments below to spur discussion!


Moving Toward Publication of Raw Data in Science

The concept of sharing the raw data of an experiment is still very new to the scientific community in general. The Nature Publishing Group created Scientific Data, a repository for ‘scientifically valuable datasets,’ less than a year ago. Similar efforts are being made by other journals, and a variety of tools out there encourage you to put up your raw data. For most fields in science this is an alien concept, as we largely display only ‘final findings’ in journals. However, the field of paleogenetics, i.e. sequencing the genes of ancient organisms, has a very high incidence of data sharing, close to 100%! A group of researchers conducted a study of this high incidence of data sharing and uncovered three interesting insights:

1) A belief that sharing is important has greater impact than policy and regulation


This finding is quite surprising at first, as one would expect that a change in regulation is typically what causes a change in behavior. However, an overwhelming majority of the paleogenetics community chooses to share data not because of any regulation, but because of a belief in the importance of sharing. This brings up a very important question: if regulation can’t change behavior, how do we move other fields toward sharing their data? Presumably, this has to be done via some selfish incentive for the lead authors. One potential solution is to highlight when an article does not include raw data, i.e. public shaming. This tends to be an incredible motivator, and I suspect it will work very well in academia, where reputation is our main currency.

2) A lack of consistent archiving methods


Although the paleogenetics community is very aggressive about sharing its raw data, there is a clear lack of systematization to the process. Worse still, different fields choose to share their data differently. To me, this is a situation where the industry is screaming for a solution that standardizes data sharing. While I acknowledge that this is a non-trivial task, some companies have realized there is an unmet need and are offering point solutions, such as figshare, which assigns a DOI to datasets. However, there is still arguably no good way to effectively search and systematize all of the data associated with a publication.

3) Paleogenetics no longer has a reproducibility crisis


It turns out that in the 1980s, the field of paleogenetics had a major reproducibility crisis, which acted as a major instigator for the field to start sharing raw data more aggressively. The consequence is that in the recent history of the field there are few if any reproducibility questions, and its standards are more stringent than those of many other fields. Contrast this with the major reproducibility crises seen in the biological sciences today, and with small companies taking on projects such as the Reproducibility Initiative to try to verify results. Actually redoing experiments to verify results is an extreme measure, when sharing raw data can alleviate a large amount of the reproducibility concern.

While this study is certainly not exhaustive, it does put into focus a few very interesting and non-obvious observations about open science in general. I strongly believe the scientific community will move toward an open system where raw data is shared within the next 15 years, and that once this happens, we will see incredible progress in the rate of scientific discovery. For now, though, I would love to hear opinions on why data is not shared upon publication in other fields, so please comment below!

(data charts are reproduced from ‘When Data Sharing Gets Close to 100%: What Human Paleogenetics Can Teach the Open Science Movement.’ )

Why Do Researchers Work In Isolation?

I won’t dwell on the amazing things the internet has enabled, since there are too many examples and I am not an expert in most of them. However, I will point out that arguably one of the biggest impacts of the internet is, very simply, in connecting people (whether via Facebook, Skype, LinkedIn, or any other service). So I can’t help but wonder:

Given how well connected we are these days, why does research still happen in isolation?

There are several studies that have looked at both collaborations and overall productivity over time in research (this study and this study, for example), and there is no significant difference between the time before the internet and after it! The only place where there seems to be some evidence of an impact is in lower tier universities, where, presumably, the internet enables them to find resources that were previously difficult to find. However, in top tier institutions, collaboration and productivity are increasing at the same rate as before the internet. So there seems to be something strange about research. Let’s explore some potential reasons for the lack of impact:
1) Lack of willingness to share ‘confidential’ information


This is by far the most popular answer you will hear. Competition apparently encourages researchers to keep their work secret until it is published and they can forever receive credit for it. There are certainly many perverse incentives at play here. By far the biggest is that we choose to cite journal articles, so if a result is not in a journal article, it will not get cited. Also, if someone else publishes something first, we are ‘scooped’ and it becomes impossible to publish our own work. Yet some fields are quite open: physicists post preprints on arXiv, and economists circulate ‘working papers.’ So what is really stopping other fields?

2) A belief that no one else knows the answer


A common reason, and certainly the one I take issue with the most, is the belief that because you are at the cutting edge of a field, no one can help you. While I certainly agree that you are discovering something new, it is very unlikely that the steps you take to get there (methods, materials, processes, etc.) have not been done by someone else. So it is important to ask for help. In fact, I would bet that if you are a researcher reading this, you have had at least one conversation with a colleague that led to a major breakthrough in your work.

3) Waiting for the meeting or conference


This is just an argument from inertia. Many people will say that using web-based tools doesn’t help much because they meet every few weeks or months anyway. When it comes to using the internet for broadcasting more broadly, there is still a belief that the venue for that is conferences. Researchers often stick to these arguments, though there is now a slow but growing movement to socialize discoveries via Twitter and blogs. These efforts are still nascent, and I hope they will continue to grow.

On the flip side, the internet has had a major impact on the ability to publish open-access work (PLOS, arXiv) and find published work (Google Scholar), and it is now enabling very easy contract research work (ScienceExchange) and even robotic labs accessible via the internet (Emerald Cloud Lab). Hopefully, the number of companies and tools out there enabling a superior scientific workforce continues to grow!

Collaborating Across Universities Yields Better Research


‘Innovation happens at the intersections,’ and this philosophy is becoming more important over time as researchers have to become more specialized to contribute meaningfully to a field. This is true in both the natural and social sciences. So a natural question to ask is what these intersections should look like and where they should happen. Fundamentally, of course, the intersections for us are collaborations. The important question, however, is who to collaborate with. A previous post discussed a few insights about collaboration group sizes and the rank of the PI. Here we look specifically at collaborations between different universities. A research article in Science in 2008 showed that cross-university collaborations create more high-impact work than collaborations within a university.

This finding is somewhat intriguing, as one would expect it to be easier to collaborate within the same university. However, it is very similar to a finding by Michaël Bikard that I discussed in a previous post, where inter-department collaborations produce more impactful work than intra-department collaborations. A similar logic can be applied here: someone would only make the effort to collaborate across universities if the work is important. Another interpretation is that collaborations across different universities are constructed with a more effective division of labor because there is a geographic separation.

(Graph: incremental benefit of cross-university collaboration by university tier, reproduced from the original Science article)
What is also surprising is that the benefit of cross-university collaboration is seen by all universities ranked in the top 20%, and it is most pronounced the higher the rank of the university. This result is shown in the graph above, taken from the original article published in Science. Essentially, what this graph tells you is that when two top schools collaborate, there is a greater incremental benefit (as opposed to collaborating within the school) than when two ‘tier II’ schools collaborate. Amazingly, this effect is more significant in the social sciences than in the natural sciences and engineering. At first glance, one could assume that the reason for the increased benefit in science and engineering is a resource bias; that is, top schools have more equipment than tier II schools with which to perform science and engineering research. However, the increased benefit seen in the social sciences can’t be explained as easily via resource arguments, which seems to suggest that a reputation effect is at play.

Essentially, simply having a top institution’s name on the research gives that research more perceived importance. This result is not very surprising, but it does have important implications. Unfortunately, it implies that there is an incentive for top tier universities to collaborate only with other top tier universities. The data also supports this hypothesis: over time, universities are stratifying more, i.e. tier I universities are collaborating with each other more than with non-tier I universities.

So this leaves us with two important learnings:

1) It is important to collaborate across university borders

2) There is a perception benefit to collaboration with tier I schools

The first point is something we can actually act on: collaborate more with others. The second, in many ways, is what we should be fighting: unfair gains via reputation. I can only hope that with greater information access and better collaboration tools we can break down this reputation barrier. I am curious to know how we can design more cross-university collaborations, so if you have ideas and thoughts on this topic, please leave a comment below.

Tweet to Increase your Scientific Impact!


The more tweets your article gets, the more likely it is to have more citations! There has been a plethora of studies on this topic in the past few years, and all of them point to the same conclusion. In fact, altmetrics are receiving growing attention from the scientific community as an alternative to citations as a way to measure impact. Altmetrics are essentially just a measurement of how many social media and mainstream media mentions an article receives once it is published. Given that citations take years to accumulate, altmetrics aim to provide some measure of the ‘importance’ of an article, especially in the short term. So it looks like these measurements mean something after all.

The two articles that most heavily influenced this post are ‘Can Tweets Predict Citations? Metrics of Social Impact Based on Twitter and Correlation with Traditional Metrics of Scientific Impact’ by G. Eysenbach and ‘Do Altmetrics Work? Twitter and Ten Other Social Web Services’ by M. Thelwall et al. Both articles find that within the biological sciences (the sample set for both) there is a positive correlation between the number of tweets and the number of citations. According to the altmetrics study, writing a blog article about the work is even better (i.e. it has a stronger positive correlation with citations), though I understand that this can be time consuming. While there is certainly no evidence for causation, there are multiple effects here worth discussing.
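Studies like these typically report a rank correlation (Spearman’s rho) between tweet counts and later citation counts. As a toy illustration of what that computation looks like (the numbers below are invented for illustration, not data from either study), here is a minimal pure-Python sketch:

```python
# Toy Spearman rank correlation between early tweet counts and eventual
# citation counts. All numbers are made up for illustration.

def ranks(values):
    """Return 1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

tweets    = [0, 2, 5, 1, 14, 3, 0, 8]   # tweets in the first month (hypothetical)
citations = [1, 4, 9, 2, 30, 6, 0, 12]  # citations after two years (hypothetical)
rho = spearman(tweets, citations)
print(f"Spearman rho = {rho:.2f}")
```

With real data you would pull the tweet counts from an altmetrics provider and the citation counts from Web of Science or Scopus; `scipy.stats.spearmanr` performs the same computation.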

The first and foremost is the difference between ‘popularity’ and ‘usefulness,’ so to speak. In its simplest form, popularity just means that the work sounds cool, whereas usefulness means that the article actually helps with something. So many tweets are a sign of popularity, and citations presumably of usefulness (though not necessarily). This significantly convolutes the interpretation of all the studies in this area. Another issue is one of timing: most tweets seem to occur within the first month after publication, but most citations come significantly later. So can we make any real causal arguments with such a large time gap?

Despite these difficulties in establishing causation, there are two things which are universally true:

  1. You will only cite an article if you know it exists
  2. Citations that actually make use of a published result will always have a time delay (i.e. science takes time)

So if we assume these two facts to be true, it is certainly reasonable that, all else equal, if you tweet about an article it is more likely to get cited (i.e. the exact same article tweeted vs. not tweeted). The return on that tweet may take time, but it can only be positive. Given these facts about Twitter, and how inexpensive it is (i.e. free, plus a little setup time), one wonders why so few scientists use it. I recently read that only 12% of scientists are even on Twitter (much less actively using it).

So, instead of sitting on our hands after we publish, we should be our own marketers! We should tweet, blog, and post to Facebook about our work!

Should I Publish Open Access or in an Elite Journal?


The number of open access journals is quickly increasing. Just last week I got an e-mail from ACS notifying me of the creation of ACS Central Science. Such e-mails are becoming more the norm than the outlier, given the massive growth of open access journals over the last decade. In fact, a quick Wikipedia search for open access shows that there are almost 5,000 open access journals, which have published just shy of 200,000 articles! So the question for each individual researcher becomes: ‘should I publish open access?’

As a scientist, my currency is basically two things: citations and the number of articles I publish. Essentially, I am more successful, and more likely to get tenure, if I publish more and have more people citing me. So this should help me answer my question of where to publish: I should always try to publish in the highest impact factor journal. This is pretty much the line of logic every scientist follows, and it leads them to Nature and Science (of which I am partial to Science).

So I decided to do a simple study: I opened Web of Science and found the number of articles published by Nature Nanotechnology and PLoS ONE that had received more than 100 citations. I chose 100 because, to me, it shows that the research had some real and lasting impact beyond headlines (though I admit it is somewhat arbitrary). Regardless, the results are AMAZING:

PLoS ONE: 388 papers with over 100 citations

Nature Nanotechnology: 324 papers with over 100 citations

i.e. PLoS ONE published more impactful work than Nature Nanotechnology

By the way, both journals started in 2006, so there is no bias from timing. What is incredible is that a one-year subscription to Nature Nanotechnology costs $175, and to PLoS ONE costs… $0. Both charge the author for publishing as well, and the amounts are approximately the same (a little over $1,000). This advocates pretty strongly for PLoS ONE. Further, over 45% of Nature Nanotechnology articles are cited fewer than 10 times. In fact, Nature Nanotechnology has a median citation count of only 13, with a mean of 68! What seems to happen is that work published in an elite journal gets a ‘citation multiplier’: the most impactful contributions to an elite journal get cited a LOT (Nature Nanotechnology has 16 articles with over 1,000 citations).
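To see how a journal can have a median of 13 citations but a mean of 68, it helps to simulate a heavy-tailed citation distribution. The sketch below uses an invented log-normal toy distribution (my assumption, not Web of Science data) purely to illustrate the skew behind the ‘citation multiplier’:

```python
# Toy heavy-tailed citation distribution: most papers get a handful of
# citations, a few blockbusters get hundreds, so the mean sits far above
# the median. Parameters are invented for illustration only.
import random
import statistics

random.seed(0)
citations = [int(random.lognormvariate(2.5, 1.4)) for _ in range(1000)]

over_100 = sum(1 for c in citations if c > 100)
print("papers with >100 citations:", over_100)
print("median citations:", statistics.median(citations))
print("mean citations:", round(statistics.mean(citations), 1))
```

The handful of heavily cited papers drags the mean far above the median, which is exactly the pattern the Nature Nanotechnology numbers show.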

So what does all of this mean for a researcher submitting a paper? PLoS ONE has distributed more impactful work than Nature Nanotechnology over the two journals’ lifetimes. To me, this one data point certainly supports PLoS ONE’s hypothesis: the impact of an article will be decided by the community after publication, not just by the reviewers. Seemingly the community is indeed doing this work, with almost 400 very high impact articles published in just under 10 years!

Returning to our question of whether to publish open access or in an elite journal, I think the answer is self-evident. If you have faith that your work adds immense value to the scientific community, the community will reward you with citations, regardless of where you publish. Publishing open access just means that your work will be visible and accessible to more scientists than in a subscription journal.

3 Ways to Increase Impact in Science: Designing Effective Collaborations

I recently read an economics study entitled “Exploring tradeoffs in the organization of scientific work: Collaboration and scientific reward,” written by researchers at the MIT Sloan School of Management. At first I was skeptical because, as a PhD in Materials Science, I thought: how could management experts possibly understand the nuances of science? So I read the paper front to back and even reached out to one of the authors, Prof. Michaël Bikard. Even though I have been practicing science for the past 10 years, I learnt three incredibly insightful things from the study and from speaking with the author:

1) Collaborate in groups of 3-6 – But no more!

The idea that if you collaborate more, you will produce more is becoming more and more popular. Both anecdotally and statistically, it is the case that when you collaborate you gain in both creativity and total output. The improvement in creativity is quite clear: you gain insights from others with differing expertise. In fact, the authors of the study show that the greater the size of the collaboration, the more creative and original the work! On the output side, however, Prof. Bikard and his colleagues surprisingly observe a stagnation in productivity for groups of more than 3, and a decrease in productivity for groups with more than 6 collaborators. One possible reason for this decrease is the cost of communication and coordination. This makes sense even anecdotally: whenever I need to collaborate, it is non-trivial (lots of e-mails, meetings, shared Dropbox folders; the list goes on). So, if you want to maximize your output (i.e. your number of publications), collaborate in groups of 3-6!

2) Collaborate across departments more than within departments

Another interesting and counterintuitive finding of the study is that collaborations across departments tend to increase overall productivity compared to those within a department. This is a confusing result at first: if you are in the same department, shouldn’t that make coordination and communication easier? It is easier for me to walk across the hall than across campus, after all. Upon deeper consideration, however, it may be that when collaborators are in different departments, their work streams are so different that they can work alone; the division of labor is well constructed enough that they don’t need to communicate! Another possible explanation is that only important or urgent experiments give rise to cross-department collaborations, and this is what causes the greater productivity. Either way, there is an interesting lesson here: when you design experiments, think about how another department’s expertise can help you!

3) Work with faculty at the same level as you

A final point of interest in their work is the analysis of the effect of collaborating with faculty of lower, equal, and higher rank. For example, if you are an assistant professor, do you gain more from collaborating with another assistant professor or with a full professor? At a superficial level, one would expect that collaborating with established faculty is always advantageous; however, this is not the case. What seems to happen is that a difference in incentives means collaborating with faculty of the same level creates the greatest productivity benefit (again measured by number of papers). This does not mean there aren’t other benefits to collaborating regardless of faculty status. It simply means that if you are trying to increase your publication count, you should collaborate with faculty of equal rank.

What I did not discuss in much detail in this post is whether there is a ‘loss’ of credit when you collaborate with others. If there are 5 authors, do you only get 20% of the credit? Probably not! In fact, in many fields it does not matter at all if you co-author with others, and in fields where it does matter, the ‘discount’ for collaborating is never that high. Maybe, just maybe, if there are 5 authors on a work and you are the lead author, you receive 70% of the credit. A pretty good deal!

These lessons leave us with an incredibly interesting take-away, especially for young faculty trying to get tenure. Typically tenure-track faculty are trying to increase their productivity over all else. So, to maximize your publication count and impact: collaborate with other tenure track faculty in a different department in a small team of 3-6!