What do universities, journals, and government need to do to stimulate breakthrough scientific discovery?

Answer by Rishabh Jain:

The biggest change that needs to occur is that researchers must be able to access relevant, actionable insights from work that has already been done much more easily.
This means that discovering and accessing results and experiments, whether from publications or from the dusty lab notebooks of grad students, needs to be much easier and faster. This will give the greatest leverage on researchers' time: they will be able to quickly do the experiments that matter, and not spend a ton of time reproducing results that are already known.
There are two critical components to making this possible:
  1. Making results accessible and machine readable
  2. Developing methods to parse existing data to derive actionable insight
1) Making results accessible and machine readable: Currently, only PLoS provides API access to its articles in a way that makes them both openly accessible and machine readable. We need to go several steps further. At a minimum, we need every journal to make its data available via an API, even if we can't get them all to adopt open access.
Importantly, we need to find a way to make unpublished results available to others in a structured manner. It is well known that 99% of research results never make it into the journal article. To get there, research needs to be documented in a structured manner and then made available when the researcher sees fit. In fact, this is part of the core mission of my startup, OpenLab, where we enable researchers to store and share data within the lab in a highly structured, easily searchable fashion. So at the very least, you can learn from your own past work and your labmates' work.
However, reaching the end goal of truly open data sharing will require government agencies and universities to start mandating open access not only to the publications, but also to the full body of work behind the research. The NIH and the Gates Foundation are leading this charge with their policies. The role of the journal must be to enable greater access and machine readability. Functionally, journals can continue to act as the article repository and, in addition to the supplementary information (SI), provide a link to the full structured data set (wherever that may be hosted).
2) Developing methods to parse existing data to derive actionable insight: Once we have all of the data available to us, the real question is how we make use of it in a meaningful way. To do this, universities and governments should sponsor programs to develop parsing algorithms, so that it is easy for researchers to find and discover relevant work that has already been done. This will essentially be a recommendation engine for research, not unlike what Google now is for consumer goods and services.
We need a future in research where we multiply researchers' productivity by at least 10 to 100x, and thereby increase the odds of a greater number of breakthrough discoveries by the same factor. The beautiful thing is that the information to do so already exists. We just need to make it actionable!


How do you narrow down an advisor for a PhD?

Answer by Rishabh Jain:

You should optimize for only one trait: being able to learn a ton from them.
If you can spend your PhD constantly learning from your advisor, you will leave your PhD a MUCH better researcher than when you started, and that is success. Here is what optimizing for learning implies, and why you can use this simple criterion:
1) They work in a field you are interested in: If you weren't interested at all, you would learn little, since you wouldn't care enough to learn from them. Similarly, if the field is outside your core areas, you may not know enough of the fundamentals to truly learn a lot.
2) You have excellent communication with them: Communication is a must for effective learning. This includes having a good working relationship and being able to converse on a similar enough wavelength that they can effectively teach you.
3) They have something to teach you in every respect: If you optimize for learning over the duration of your PhD, they should be able to teach you not only how to do research (scientifically), but also how to write grants, present, run a group, etc. Most people will take the full 5 years to learn these things really, really well.
4) You are hungry: Finally, the last obvious point is that if you believe you can learn a lot from them, it means you are hungry. That hunger should be strong, especially in the first few years when output is low, and your measure of success should be how much you are growing as a researcher (i.e., learning from your advisor).
The way I looked for someone I could learn a ton from was to apply to a bunch of schools (8 in my case), and when I visited, I made sure to set up as many meetings as possible with professors in the department. I gauged essentially one thing: when I'm in the room with them, am I learning a ton from them? And could I keep learning a ton from them over the 5 years? If there weren't at least 2 faculty members I felt that way about in a given department, I decided I wouldn't go to that school. I could not be so arrogant as to assume any particular faculty member would take me (which was a really good thing, because I was in fact turned down by my first-choice advisor when I started grad school).
So the short of it is: a PhD is training, so optimize to get the most you can out of its purpose, which is learning.
(Graph: learning over the course of a PhD. It should stay high, but for different aspects of research over time!)

What is the future of a PhD student?

Answer by Rishabh Jain:

There are two interesting ways to think about this question:
  1. What happens to PhDs later in life?
  2. What is the future of PhD students?
The first is easier (since it requires less gazing into my crystal ball). Unfortunately, it is also somewhat less exciting. The short story is that very few PhDs get academic jobs. This article from The Atlantic looks at US NSF data, and the author finds that fewer than 20% of PhDs get academic jobs (across fields).
Now onto the crystal ball question… what does the future hold for PhD students?
Here is a short list of what I think the future will look like:
  1. More holistic training: Given the increasing rate at which we produce PhDs, the labor market will demand broader skills from PhDs and expect more and more preparation from newly minted researchers. So universities will have to train researchers more holistically, including simple things like grant writing classes, presentation classes, courses on journal editorial processes, and similar. In other words, we will move toward actually training people for the job, not just the research.
  2. More career guidance: Right now, PhDs are basically left to fend for themselves at the end of the degree. This is creating a lot of unhappiness and anxiety in newly minted PhDs. So eventually, this major hole will have to be filled by someone, whether the university or an external body, who helps place PhDs in jobs where they can apply their skills. I am already seeing this happen in the private sector with cool startups like Oystir – The Best Jobs for PhDs.
  3. Even more collaboration during grad school: The level of collaboration during PhDs is growing constantly. It is already known that collaborations improve the quality and impact of researchers' work (see more here). So we will see PhD students doing increasingly collaborative work!
  4. More software tools for PhDs: It is increasingly obvious that research, as an industry, lags in adopting and creating new software and web-based tools to increase productivity. We still communicate by word of mouth and group meetings, and write notes in physical notebooks. My own startup, OpenLab, is part of this ecosystem, trying to improve communication and collaboration within research (do check us out!). Others are doing cool things as well, like TetraScience, which is creating an IoT platform for labs. These tools will easily make PhDs 5-10x more productive than they are today, just as software tools have made businesses far more productive than they were 10-15 years ago.
  5. Fewer PhDs overall: Sadly, the feedback of employment difficulty after a PhD will eventually mean that we see a slowdown in PhD enrollment, and eventually a reduction. Essentially, a correction in the labor market for PhDs has to happen soon. This is especially true given the rise in postdocs that everyone knows too well.
Overall, I am highly optimistic about the future of the PhD process and of PhD students, and am super excited to be a part of that future!

Introducing OpenLab – Facilitate Collaboration, Promote Innovation

I am excited to announce that OpenLab is now open for anyone to sign up!!!

OpenLab is a simple laboratory collaboration platform, where scientists can upload their data and observations and give and receive feedback on that data. We are super focused on making the experience joyful and appropriate for laboratory environments, and hence prioritize three things:

  1. Security and privacy of data
  2. Making sharing and communication easy and delightful
  3. Helping scientists to get even more value out of their data

So why are we doing this? When I was a scientist working in the lab, I realized that the best resource I had was my labmates, but getting input from my peers on my experiments and results was not easy. Some people did this informally, by walking up to individuals and asking ‘hey, what do you think about this data?’ Most people got input in group meetings, when they stood in front of the lab every few months, showed progress, and asked for feedback. Neither of these methods really made sense to me. One was very time consuming (asking a single person at a time); the other was intimidating and infrequent. So OpenLab was born.

From its inception over a year ago to today, we have learned a lot about lab communication and operations, and hence the three main pillars for this first release of OpenLab:

1. Security and privacy of data: The number 1 concern for a lab is the security and privacy of its data, and hence it was first and foremost in our application design. Having worked in the lab ourselves, we know that there is a need for privacy and security within the lab. So, for every OpenLab lab, we allow users to create private groups within the lab instance. This means that new lab members must be explicitly invited to individual groups: being a member of one group does not imply permissions to other groups. This allows for interesting setups, like a group to which you can comfortably invite external collaborators, or giving undergrads permission for the relevant group alone.

Of course, we also use the highest internet security standards in our application and data transmission protocols.
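To make the per-group permission idea concrete, here is a minimal sketch of how "membership in one group does not imply access to another" could be modeled. This is a hypothetical illustration, not OpenLab's actual implementation: the `Lab` class, its methods, and the group and user names are all invented for the example, and it assumes that only existing members of a group may invite others to it.

```python
from dataclasses import dataclass, field


@dataclass
class Lab:
    """Hypothetical model of a lab instance holding private groups.

    Each group keeps its own member set; nothing is inherited from the
    lab as a whole or from membership in another group.
    """

    name: str
    groups: dict[str, set[str]] = field(default_factory=dict)

    def create_group(self, group: str, creator: str) -> None:
        # A new group starts private, with only its creator as a member.
        self.groups[group] = {creator}

    def invite(self, group: str, inviter: str, invitee: str) -> None:
        # Assumption: only existing members of *this* group may invite.
        if inviter not in self.groups[group]:
            raise PermissionError(f"{inviter} is not a member of {group}")
        self.groups[group].add(invitee)

    def can_view(self, user: str, group: str) -> bool:
        # Access is checked per group; membership elsewhere is irrelevant.
        return user in self.groups.get(group, set())


lab = Lab("example-lab")
lab.create_group("synthesis", "pi")
lab.create_group("collab-external", "pi")
lab.invite("collab-external", "pi", "outside_collaborator")
lab.invite("synthesis", "pi", "undergrad")

print(lab.can_view("outside_collaborator", "collab-external"))  # True
print(lab.can_view("outside_collaborator", "synthesis"))        # False
print(lab.can_view("undergrad", "synthesis"))                   # True
```

Keeping a separate member set per group, rather than a single lab-wide role, is what lets an external collaborator or an undergrad see exactly one group and nothing else.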

2. Making sharing and communication easy and delightful: Communication within the lab is currently confined largely to in-person conversations, with occasional emails. However, social media has taught us that online communication can actually be easier and more fun, as it is low pressure and lets you reach multiple people quickly. So we borrow the ‘post’ process from social media as the basic structure of communication within OpenLab, including #tags and @mentions. We have also tailored this to the lab: you can directly upload files from almost anywhere (Dropbox, OneDrive, your PC, and so on), and we have no file type or size restrictions!

3. Helping scientists to get even more value out of their data: The way research works today, most datasets are lost in someone’s notebook, never to be seen again. With OpenLab, however, your data is instantly seen by the relevant people in your group and is easily found later with our management tools. Posts stay associated with the group and the person, can be found and filtered by #tags, and our search makes finding anything super easy!

What’s more? We have a bunch of features coming your way to make managing and using tags even easier and more interesting. As a sneak peek, we will soon allow users to define ‘preferred tags’ to get the lab using the same vocabulary, along with other useful management and measurement features.

We are very excited to have you try this first release of OpenLab. Just like any other good scientist, we know that this is just the initial development, and we will work to discover more about how to improve the lives of scientists! So please give us feedback – we reply very quickly! And in the meantime:

Welcome to OpenLab!

(P.S. To learn more, check out our Help Center at Zendesk.)

How do Scientists Keep Informed in their Fields?

I recently learned that people have many different ways of staying updated on progress in their respective fields. The main way scientists stay updated is by reading the publications in their field. However, with the number of publications being so high, it is completely unreasonable for someone to read everything that gets published, and certainly not every paper in full. So how do people stay updated?


Let’s first discuss journals in a bit more detail, as the primary method of information transfer. Early in your research career, you have to learn a lot, so you consume a lot of material; for me, this meant reading about 20% of the papers I chose to open in full. As you consume material and perform research yourself, you start to figure out where relevant papers are published, whose papers you should read, and how to find those papers, so the number of papers you read in full is cut in half. Eventually, as you become very comfortable with your field and understand how people do things, you basically stop having to read full papers. Just by reading the abstract and conclusion and viewing the figures and tables, you can get a strong idea of what the paper is showing. At this stage of your career, only in the rare case of something being exceptionally interesting, relevant, or confusing do you read a paper from start to finish. So basically, your ‘reading’ and ‘understanding’ graphs look like this (x-axis is years):

(Graph: fraction of papers read in full vs. understanding of the field, over years of a research career)
While the numbers may change from field to field, the trend does not, because regardless of your field, over time you learn more and pick up on things more quickly. So the ratio of information gained to time spent increases dramatically.


The ‘top’ researchers in any field essentially use conferences as a way to keep updated on progress that is unpublished, or that they simply missed despite their best efforts to read new material. In all cases, conferences let you listen to researchers present condensed versions of their work. In many cases, there is also time to network and interact with people in your field. There is no good replacement for this process of personal interaction when it comes to staying updated on progress.

So despite people’s belief that journals are the mechanism of information transfer in the sciences, this is not true in all cases, and it is certainly not how top researchers get their information. It is through direct communication with their peers that the most valuable information is transferred. This gives us something to consider when we think about journals:

  • Are they still a relevant method for communicating science?
  • What are other ways we could more efficiently communicate scientific advances to our peers?

As always I welcome comments on ideas around this topic!

Women are 2X more likely to get STEM Academic Jobs than Men

This past week, several news sources covered a study out of Cornell that shows a 2:1 preference for hiring women faculty in STEM fields. The graph that they produced says it all:

(Graph from PNAS: percentage of male and female voters favoring the male or female applicant, in four fields)

The graph shows the percentage of male and female voters (M vs. F grouping) who voted in favor of hiring a male or female applicant (left vs. right bar within each grouping) in each of four fields. As you can see, only in economics do male voters treat male and female applicants equally.

When I shared this finding with my friends and colleagues, the knee-jerk reaction was to attribute this difference to the current push for women in engineering, which creates an inherent bias toward hiring female candidates. However, my reaction was somewhat different. I thought that if this bias existed as a way to increase female participation, then it should also exist for every other minority group in the sciences. In other words, there should be a bias toward any underrepresented group, such as Hispanic and African-American researchers. To test this theory, I tried to find data about the hiring of minorities and wasn’t able to find any good sources (if you have any, please post them in the comments!). However, as a proxy, I was able to find NSF data on the success of grant applications broken down by these groups:

(Table: NSF grant success rates for women, men, and minorities over the past decade)

First of all: amazing data published by the NSF!

Secondly, let’s look at the data over the past decade.

Every year in the published data, women are the most successful in receiving grants, followed by men, followed by minorities.

This is despite the fact that the NSF aims to improve participation from underrepresented minorities in general.

So to me, the popular media explanation pointing to the current push for women in engineering is not strong enough. In particular, an article in Time magazine brushes the discussion aside in a crass fashion, saying ‘For years it has been apparent that hiring bias runs in favor of women, not against them. It’s time to shut down the costly diversity bureaucracy and allow faculty to hire on merit alone.’ This is a pretty naive viewpoint! The NSF data shows pretty clearly that women have some underlying advantages that give them a boost over other underrepresented groups.

In many ways, the real question is, what is it that women are doing better than other groups to obtain this statistically significant level of success? And perhaps more importantly, what can we learn from this?

Why do Scientists want to Publish in Nature or Science?

Academics have one primary currency: papers. Just like real currencies, every paper has a different value, with a well-established high value placed on Nature, Science, and similar high-impact journals. In this article, I want to explore why publishing in these journals is so desirable to academics and, more importantly, ask whether the same goals can be achieved in less costly ways.

First of all, let’s establish the costs associated with publishing in Nature or Science:

  • Time to do research: The burden is on the researcher to present a full and compelling story, which often means that an enormous body of work is presented as a ‘single finding’ in one paper. If you read one of these articles, however, it is clear that the authors are showing several independently important results, just presented as one to improve ‘impact.’ This is evident in the absurdly long supplementary information sections of Nature and Science papers.
  • Time in review: The average time a paper spends in review at a high impact journal is much longer than at other journals, largely because reviewers are exceptionally critical and feel obliged to send a long list of comments simply because the author submitted to a high impact journal. Higher impact journals also solicit more reviewers than medium impact journals, e.g., Nature sends papers to 2-4 reviewers as opposed to an average of 2. Further, there are usually multiple rounds of review at these journals (again, much of it inflated simply by ‘journal ego’).
  • Reviewer conflict of interest: Reviewers of high impact papers are always in a conflict of interest. The acceptance of a colleague’s paper in a high impact journal means that this colleague is producing ‘high impact’ and ‘novel’ work in your field. This is a terrible conflict of interest, in which a researcher has to accept that someone else is producing ‘important’ work in their field, even if they are not. This is often a major indirect cost of these journals, as it makes reviews highly biased and strains ethics and academic integrity.
  • Monetary cost: High impact journals, by virtue of being desirable, also charge the most for publishing. This cost is borne directly by the authors, even though the work is generally not made open access.

These are some pretty high costs! The time aspect is certainly the biggest cost. I have known several researchers for whom the time from first submission to final publication was over 3 years (though Nature will never let these figures be known). So, let’s now explore why researchers bear these costs and maintain such a desire to publish in these journals:

  1. Prestige / Perceived value: The sheer ability to publish work in one of these journals carries a perceived value in the scientific community: that the research is ‘important’ and ‘good’ enough to get through the obscenely long and often unnecessary peer review process. However, this aspect of the benefit has no ‘real’ value. That is, the article never has to help anyone or provide substantive value (e.g., via citations) for the researcher to gain this perception benefit.
  2. Citation value: A major marketing benefit of a ‘high impact’ journal is its high Impact Factor. This simply means that the average number of citations per year for an article in that journal is high. However, it does not mean that your article will be cited. In fact, citations within journals are highly skewed, with a few articles cited a lot and most articles cited very little. In Nature, for example, the average number of citations for an article is 121 (over its life, to date). However, the median is 24 citations, and over 40% of articles are cited fewer than 10 times! So if you submit to Nature and produce an ‘average’ Nature article, it will be cited 24 times in total!*
  3. Publicity value: The second major marketing advantage is that traditional and other mass media look to Science and Nature for flashy articles about cool new science. Because Nature and Science are meant to be broad interest journals, it is very easy to get mass media to write about discoveries published in them. Further, the ‘News and Views’ section in Nature journals makes them even more media friendly, by getting top scientists to write their opinions of an article in that same issue, in lay terms.
  4. Readership value: Building slightly on the previous point, these journals are broad. This means that scientists who have nothing to do with your field will still at least see your article (even if they don’t read it). This exposure increases the chance that your work will influence someone in a way you didn’t expect, i.e., it increases your real impact.
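The gap between the Nature mean (121) and median (24) is what you get when a few heavily cited papers pull the average up. The sketch below illustrates this with a made-up list of 11 lifetime citation counts; these numbers are hypothetical and are NOT the Web of Science figures quoted above, and the choice of Python's `statistics` module is just for illustration.

```python
from statistics import mean, median

# Hypothetical lifetime citation counts for 11 articles in one journal.
# Made-up numbers chosen only to show a skewed distribution; they are
# NOT the Web of Science figures quoted in the text.
citations = [900, 250, 60, 30, 24, 24, 12, 8, 5, 2, 1]

avg = mean(citations)     # pulled upward by the two heavily cited papers
mid = median(citations)   # the "typical" article
share_under_10 = sum(c < 10 for c in citations) / len(citations)

print(f"mean:   {avg:.1f}")
print(f"median: {mid}")
print(f"cited fewer than 10 times: {share_under_10:.0%}")
```

With these numbers, the mean comes out at about 119.6 while the median is 24, and 4 of the 11 articles (about 36%) are cited fewer than 10 times. An average-based metric like the Impact Factor mostly reflects the rare hits, while the median is a better guide to what a typical paper in the journal can expect.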

So how can scientists reap the benefits without the costs of a high impact journal?

OK, so now that we have a pretty strong understanding of the various benefits a researcher obtains through publication in a high impact journal, we can draw an important insight: all of the benefits a high impact journal provides are on the marketing side of the paper.

Given that these journals provide marketing value, the one advantage that is hard to reproduce or replace is brand value. Our marketing friends at any major company will tell you that a brand is hard to replace. However, it is certainly not impossible, especially given how easy it is to establish a personal brand on the internet these days. In fact, several faculty members have started doing this (check out this list!), using the internet to make themselves more discoverable and letting their personal brand push them forward rather than relying on Nature to do it for them.

The other benefits outlined above all boil down to a single insight: high impact journals put articles in front of more eyes. Given the citation analysis above, it is pretty clear that your article must provide some substantive value for people to cite it, i.e., there are ‘duds’ even in Nature. So, to increase your citation count, you simply need to get your article read by others. This implies two very simple, actionable insights:

  1. You should write a lay version of your article, i.e. your own ‘news and views’ so that it is accessible to mass media and other fields
  2. You should promote your own content, either via third-party media outlets or directly

Thankfully, these are very easy to do. The internet makes it extremely easy to push or pull content to anyone in the world. In fact, several faculty members already do this in some form. I was recently forwarded an e-mail that the author of a new paper had written directly to my Professor to advertise the paper (the paper was attached so we could read it too)! The list of actions one can take is quite long, but here are some to get you started:

  1. Direct e-mail to people you know are in your field: this is easy to find, just look at your citations to start!
  2. Writing a guest post for a science blog: Check out scienceblogs.com
  3. Pushing your new article on social media: If professors don’t use social media, it doesn’t matter, grad students certainly do!
  4. Getting an article written by your university newspaper: I know MIT is really good about putting its research news on the front page of its website. Other universities also get their faculty’s work pushed into the media!

The list of ways to get your article discovered is quite long; I haven’t even addressed SEO, for example. Regardless, there are plenty of other resources you can find on getting content discovered.

So let’s return to the original question: ‘Why do Scientists want to Publish in Nature or Science?’ It looks to me like the value these journals provide can be replaced with lower cost alternatives. In fact, not only can scientists avoid the pain of trying to publish in these journals, they can do better by directly influencing the visibility of and access to their work. The cherry on top is that all of the content and marketing work they put in is owned by them and remains an asset of the scientist, rather than of Nature or Science. I am excited to see more and more scientists take ownership of their work and build a stronger scientific ecosystem!

As always, let me know your thoughts in the comments section below. Specifically, if you have successfully marketed your work on your own, leave those ideas here for others to benefit from your insight!

*These statistics were calculated using the citation report generated by Thomson Reuters Web of Science.