The main purpose of a scientific paper is to report new results, usually experimental, and to relate these results to previous knowledge in the field. Papers are one of the most important ways that we communicate with one another.
In understanding how to read a paper, we need to start at the beginning with a few preliminaries. We then address the main questions that will enable you to understand and evaluate the paper.
2. How do I prepare to read a paper, particularly in an area not so familiar to me?
4. How do I understand and evaluate the contents of the paper?
In most scientific journals, scientific papers follow a standard format. They are divided into several sections, and each section serves a specific purpose in the paper. We first describe the standard format, then some variations on that format.
A paper begins with a short Summary or Abstract. Generally, it gives a brief background to the topic; describes concisely the major findings of the paper; and relates these findings to the field of study. As will be seen, this logical order is also that of the paper as a whole.
The next section of the paper is the Introduction. In many journals this section is not given a title. As its name implies, this section presents the background knowledge necessary for the reader to understand why the findings of the paper are an advance on the knowledge in the field. Typically, the Introduction describes first the accepted state of knowledge in a specialized field; then it focuses more specifically on a particular aspect, usually describing a finding or set of findings that led directly to the work described in the paper. If the authors are testing a hypothesis, the source of that hypothesis is spelled out, findings are given with which it is consistent, and one or more predictions are given. In many papers, one or several major conclusions of the paper are presented at the end of this section, so that the reader knows the major answers to the questions just posed. Papers more descriptive or comparative in nature may begin with an introduction to an area which interests the authors, or the need for a broader database.
The next section of most papers is the Materials and Methods. In some journals this section is the last one. Its purpose is to describe the materials used in the experiments and the methods by which the experiments were carried out. In principle, this description should be detailed enough to allow other researchers to replicate the work. In practice, these descriptions are often highly compressed, and they often refer back to previous papers by the authors.
The third section is usually Results. This section describes the experiments and the reasons they were done. Generally, the logic of the Results section follows directly from that of the Introduction. That is, the Introduction poses the questions addressed in the early part of Results. Beyond this point, the organization of Results differs from one paper to another. In some papers, the results are presented without extensive discussion, which is reserved for the following section. This is appropriate when the data in the early parts do not need to be interpreted extensively to understand why the later experiments were done. In other papers, results are given, and then they are interpreted, perhaps taken together with other findings not in the paper, so as to give the logical basis for later experiments.
The fourth section is the Discussion. This section serves several purposes. First, the data in the paper are interpreted; that is, they are analyzed to show what the authors believe the data show. Any limitations to the interpretations should be acknowledged, and fact should clearly be separated from speculation. Second, the findings of the paper are related to other findings in the field. This serves to show how the findings contribute to knowledge, or correct the errors of previous work. As stated, some of these logical arguments are often found in the Results when it is necessary to clarify why later experiments were carried out. Although you might argue that in this case the discussion material should be presented in the Introduction, more often you cannot grasp its significance until the first part of Results is given.
Finally, papers usually have a short Acknowledgements section, in which various contributions of other workers are recognized, followed by a Reference list giving references to papers and other works cited in the text.
Papers also contain several Figures and Tables. These contain data described in the paper. The figures and tables also have legends, whose purpose is to give details of the particular experiment or experiments shown there. Typically, if a procedure is used only once in a paper, these details are described in Materials and Methods, and the Figure or Table legend refers back to that description. If a procedure is used repeatedly, however, a general description is given in Materials and Methods, and the details for a particular experiment are given in the Table or Figure legend.
Variations on the organization of a paper
In most scientific journals, the above format is followed. Occasionally, the Results and Discussion are combined, in cases in which the data need extensive discussion to allow the reader to follow the train of logic developed in the course of the research. As stated, in some journals, Materials and Methods follows the Discussion. In certain older papers, the Summary was given at the end of the paper.
The formats for two widely read journals, Science and Nature, differ markedly from the above outline. These journals reach a wide audience, and many authors wish to publish in them; accordingly, the space limitations on the papers are severe, and the prose is usually highly compressed. In both journals, there are no discrete sections, except for a short abstract and a reference list. In Science, the abstract is self-contained; in Nature, the abstract also serves as a brief introduction to the paper. Experimental details are usually given either in endnotes (for Science) or Figure and Table legends and a short Methods section (in Nature). Authors often try to circumvent length limitations by putting as much material as possible in these places. In addition, an increasingly common practice is to put a substantial fraction of the less-important material, and much of the methodology, into Supplemental Data that can be accessed online.
Many other journals also have length limitations, which similarly lead to a need for conciseness. For example, the Proceedings of the National Academy of Sciences (PNAS) has a six-page limit; Cell severely edits many papers to shorten them, and has a short word limit in the abstract; and so on.
In response to the pressure to edit and make the paper concise, many authors choose to condense or, more typically, omit the logical connections that would make the flow of the paper easy. In addition, much of the background that would make the paper accessible to a wider audience is condensed or omitted, so that the less-informed reader has to consult a review article or previous papers to make sense of what the issues are and why they are important. Finally, again, authors often circumvent page limitations by putting crucial details into the Figure and Table legends, especially when (as in PNAS) these are set in smaller type. Fortunately, the recent widespread practice of putting less-critical material into online supplemental material has lessened the pressure to compress content so drastically, but it is still a problem for older papers.
Although it is tempting to read the paper straight through as you would do with most text, it is more efficient to organize the way you read. Generally, you first read the Abstract in order to understand the major points of the work. The extent of background assumed by different authors, and allowed by the journal, also varies as just discussed.
One extremely useful habit in reading a paper is to read the Title and the Abstract and, before going on, review in your mind what you know about the topic. This serves several purposes. First, it clarifies whether you in fact know enough background to appreciate the paper. If not, you might choose to read the background in a review or textbook, as appropriate.
Second, it refreshes your memory about the topic. Third, and perhaps most importantly, it helps you as the reader integrate the new information into your previous knowledge about the topic. That is, it is used as a part of the self-education process that any professional must continue throughout his/her career.
If you are very familiar with the field, the Introduction can be skimmed or even skipped. As stated above, the logical flow of most papers goes straight from the Introduction to Results; accordingly, the paper should be read in that way as well, skipping Materials and Methods and referring back to this section as needed to clarify what was actually done. A reader familiar with the field who is interested in a particular point given in the Abstract often skips directly to the relevant section of the Results, and from there to the Discussion for interpretation of the findings. This is only easy to do if the paper is organized properly.
Many papers contain shorthand phrases that we might term 'codewords', since they have connotations that are generally not explicit. In many papers, not all the experimental data are shown; some are instead referred to by "(data not shown)". This is often for reasons of space; the practice is accepted when the authors have documented their competence to do the experiments properly (usually in previous papers). Two other codewords are "unpublished data" and "preliminary data". The former can mean either that the data are not of publishable quality or that the work is part of a larger story that will one day be published. The latter means different things to different people, but one connotation is that the experiment was done only once.
Several difficulties confront the reader, particularly one who is not familiar with the field. As discussed above, it may be necessary to bring yourself up to speed before beginning a paper, no matter how well written it is. Be aware, however, that although some problems may lie in the reader, many are the fault of the writer.
One major problem is that many papers are poorly written. Some scientists are poor writers. Many others do not enjoy writing, and do not take the time or effort to ensure that the prose is clear and logical. Also, the author is typically so familiar with the material that it is difficult to step back and see it from the point of view of a reader not familiar with the topic and for whom the paper is just another of a large stack of papers that need to be read.
Bad writing has several consequences for the reader. First, the logical connections are often left out. Instead of saying why an experiment was done, or what ideas were being tested, the experiment is simply described. Second, papers are often cluttered with a great deal of jargon. Third, the authors often do not provide a clear road-map through the paper; side issues and fine points are given equal air time with the main logical thread, and the reader loses this thread. In better writing, these side issues are relegated to Figure legends, Materials and Methods, or online Supplemental Material, or else clearly identified as side issues, so as not to distract the reader.
Another major difficulty arises when the reader seeks to understand just what the experiment was. All too often, authors refer back to previous papers; these refer in turn to previous papers in a long chain. Often that chain ends in a paper that describes several methods, and it is unclear which was used. Or the chain ends in a journal with severe space limitations, and the description is so compressed as to be unclear. More often, the descriptions are simply not well-written, so that it is ambiguous what was done.
Other difficulties arise when the authors are uncritical about their experiments; if they firmly believe a particular model, they may not be open-minded about other possibilities. These may not be tested experimentally, and may even go unmentioned in the Discussion. Still another, related problem is that many authors do not clearly distinguish between fact and speculation, especially in the Discussion. This makes it difficult for the reader to know how well established the "facts" under discussion are.
One final problem arises from the sociology of science. Many authors are ambitious and wish to publish in trendy journals. As a consequence, they overstate the importance of their findings, or put a speculation into the title in a way that makes it sound like a well-established finding. Another example of this approach is the "Assertive Sentence Title", which presents a major conclusion of the paper as a declarative sentence. This trend is becoming prevalent; look at recent issues of Cell for examples. It's not so bad when the assertive sentence is well-documented (as it was in the example given), but all too often the assertive sentence is nothing more than a speculation, and the hasty reader may well conclude that the issue is settled when it isn't.
These last factors represent the public relations side of a competitive field. This behavior is understandable, if not praiseworthy. But when the authors mislead the reader as to what is firmly established and what is speculation, it is hard, especially for the novice, to know what is settled and what is not. A careful evaluation is necessary, as we now discuss.
A thorough understanding and evaluation of a paper involves answering several questions:
a. What questions does the paper address?
b. What are the main conclusions of the paper?
c. What evidence supports those conclusions?
d. Do the data actually support the conclusions?
e. What is the quality of the evidence?
f. Why are the conclusions important?
Before addressing this question, we need to be aware that research in biochemistry and molecular biology can be of several different types:
Type of research and the question asked:
Descriptive: What is there? What do we see?
Comparative: How does it compare to other organisms? Are our findings general?
Analytical: How does it work? What is the mechanism?
Descriptive research often takes place in the early stages of our understanding of a system. We can't formulate hypotheses about how a system works, or what its interconnections are, until we know what is there. Typical descriptive approaches in molecular biology are DNA sequencing and DNA microarray approaches. In biochemistry, one could regard x-ray crystallography as a descriptive endeavor.
Comparative research often takes place when we are asking how general a finding is. Is it specific to my particular organism, or is it broadly applicable? A typical comparative approach would be comparing the sequence of a gene from one organism with that from the other organisms in which that gene is found. One example of this is the observation that the actin genes from humans and budding yeast are 89% identical and 96% similar.
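To make the distinction between "identical" and "similar" concrete, here is a minimal sketch of how the two percentages could be computed from a pair of already-aligned sequences. The similarity groups, the function name, and the toy sequences are all assumptions made for illustration; real comparison tools score similarity with a substitution matrix, and the fragments below are not actin.

```python
# Hypothetical groups of chemically similar amino acids (one-letter codes).
# Real tools use a substitution matrix; this grouping is only illustrative.
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def percent_identity_and_similarity(seq_a, seq_b):
    """Return (percent identical, percent similar) for two pre-aligned sequences.

    Both sequences must have the same length; '-' marks a gap.
    Positions where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = similar = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(a in g and b in g for g in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Toy aligned fragments (made up for illustration, not real actin sequence).
identity, similarity = percent_identity_and_similarity("MKDLVSEAT-", "MKELISQATG")
print(f"{identity:.1f}% identical, {similarity:.1f}% similar")
# prints "66.7% identical, 88.9% similar"
```

Similarity is always at least as high as identity, because every identical position also counts as similar. The exact similarity figure depends on which substitutions are treated as conservative, so a reported value only has meaning together with the scoring scheme used.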
Analytical research generally takes place when we know enough to begin formulating hypotheses about how a system works, about how the parts are interconnected, and what the causal connections are. A typical analytical approach would be to devise two (or more) alternative hypotheses about how a system operates. These hypotheses would all be consistent with current knowledge about the system. Ideally, the researchers would then devise a set of experiments to distinguish among these hypotheses. A classic example is the Meselson-Stahl experiment.
Of course, many papers are a combination of these approaches. For instance, researchers might sequence a gene from their model organism; compare its sequence to homologous genes from other organisms; use this comparison to devise a hypothesis for the function of the gene product; and test this hypothesis by making a site-directed change in the gene and asking how that affects the phenotype of the organism and/or the biochemical function of the gene product.
Being aware that not all papers have the same approach can orient you towards recognizing the major questions that a paper addresses.
What are these questions? In a well-written paper, as described above, the Introduction generally goes from the general to the specific, eventually framing a question or set of questions. This is a good starting place. In addition, the results of experiments usually raise additional questions, which the authors may attempt to answer. These questions usually become evident only in the Results section.
This question can often be answered in a preliminary way by studying the abstract of the paper. Here the authors highlight what they think are the key points. This is not enough, because abstracts often have severe space constraints, but it can serve as a starting point. Still, you need to read the paper with this question in mind.
Generally, you can get a pretty good idea about this from the Results section. The description of the findings points to the relevant tables and figures. This is easiest when there is one primary experiment to support a point. However, it is often the case that several different experiments or approaches combine to support a particular conclusion. For example, the first experiment might have several possible interpretations, and the later ones are designed to distinguish among these.
In the ideal case, the Discussion begins with a section of the form "Three lines of evidence provide support for the conclusion that... First, ...Second,... etc." However, difficulties can arise when the paper is poorly written (see above). The authors often do not present a concise summary of this type, leaving you to make it yourself. A skeptic might argue that in such cases the logical structure of the argument is weak and is omitted on purpose! In any case, you need to be sure that you understand the relationship between the data and the conclusions.
One major advantage of doing this is that it helps you to evaluate whether the conclusion is sound. If we assume for the moment that the data are believable (see next section), it still might be the case that the data do not actually support the conclusion the authors wish to reach. There are at least two different ways this can happen:
i. The logical connection between the data and the interpretation is not sound.
ii. There may be other interpretations that are consistent with the data.
One important aspect to look for is whether the authors take multiple approaches to answering a question. Do they have multiple lines of evidence, from different directions, supporting their conclusions? If there is only one line of evidence, it is more likely that it could be interpreted in a different way; multiple approaches make the argument more persuasive.
Another thing to look for is implicit or hidden assumptions used by the authors in interpreting their data. This can be hard to do, unless you understand the field thoroughly.
This is the hardest question to answer, for novices and experts alike. At the same time, it is one of the most important skills to learn as a young scientist. It involves a major reorientation from being a relatively passive consumer of information and ideas to an active producer and critical evaluator of them. This is not easy and takes years to master. Beginning scientists often wonder, "Who am I to question these authorities? After all, the paper was published in a top journal, so the authors must have a high standing, and the work must have received a critical review by experts." Unfortunately, that's not always the case. In any case, developing your ability to evaluate evidence is one of the hardest and most important aspects of learning to be a critical scientist and reader.
How can you evaluate the evidence?
First, you need to understand thoroughly the methods used in the experiments. Often these are described poorly or not at all. The details are often missing, but more importantly the authors usually assume that the reader has a general knowledge of common methods in the field (such as immunoblotting, cloning, genetic methods, or DNase I footprinting). If you lack this knowledge, then, as discussed above, you have to make the extra effort to inform yourself about the basic methodology before you can evaluate the data.
Second, you need to know the limitations of the methodology. Every method has limitations, and if the experiments are not done correctly they can't be interpreted.
For instance, an immunoblot is not a very quantitative method. Moreover, over a certain range of protein amounts the signal increases with the amount of protein (that is, the signal is at least roughly "linear"), but above a certain amount of protein the signal no longer increases. Therefore, to use this method correctly one needs a standard curve that shows that the experimental lanes are in the linear range. Often, the authors will not show this standard curve, but they should state that such curves were done. If you don't see such an assertion, it could of course result from bad writing, but it might also mean that the curves were not done. If they weren't done, a dark band might mean "there is this much protein or an indefinite amount more".
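The role of the standard curve can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names, the 10% tolerance, and the numbers are all hypothetical, and the slope is estimated from the two lowest standards on the assumption that they lie in the linear range. It is not a description of how any particular laboratory analyzes blots.

```python
def linear_range_limit(standards, tolerance=0.10):
    """Return the (amount, signal) of the highest standard still in the linear range.

    standards: list of (amount, signal) pairs from a dilution series.
    The slope is estimated from the two lowest standards, which are
    assumed (hypothetically) to lie within the linear range.
    """
    points = sorted(standards)
    if len(points) < 3:
        raise ValueError("need at least three standards")
    (x0, y0), (x1, y1) = points[0], points[1]
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    limit = points[1]
    for amount, signal in points[2:]:
        predicted = slope * amount + intercept
        if abs(signal - predicted) > tolerance * predicted:
            break  # signal has started to saturate
        limit = (amount, signal)
    return limit


def band_is_quantifiable(band_signal, standards, tolerance=0.10):
    """True if the band's signal does not exceed the top of the linear range."""
    _, max_linear_signal = linear_range_limit(standards, tolerance)
    return band_signal <= max_linear_signal


# Made-up standard curve: (ng of protein, band intensity in arbitrary units).
standards = [(1, 100), (2, 200), (4, 390), (8, 760), (16, 1100), (32, 1150)]
print(linear_range_limit(standards))          # (8, 760)
print(band_is_quantifiable(500, standards))   # True
print(band_is_quantifiable(1120, standards))  # False
```

In this made-up series the signal stops tracking the amount of protein above 8 ng, so a band with an intensity of 1120 could correspond to 16 ng or to considerably more. Without the standards there would be no way to tell which bands fall into that uninterpretable region.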
Third, importantly, you need to distinguish between what the data show and what the authors say they show. The latter is really an interpretation on the authors' part, though it is generally not stated to be an interpretation. Papers usually state something like "the data in Fig. x show that ...". This is the authors' interpretation of the data. Do you interpret it the same way? You need to look carefully at the data to ensure that they really do show what the authors say they do. You can only do this effectively if you understand the methods and their limitations.
Fourth, it is often helpful to look at the original journal, or its electronic counterpart, instead of a photocopy. Particularly for half-tone figures such as photos of gels or autoradiograms, the contrast is distorted, usually increased, by photocopying, so that the data are misrepresented.
Fifth, you should ask if the proper controls are present. Controls tell us that nature is behaving the way we expect it to under the conditions of the experiment. If the controls are missing, it is harder to be confident that the results really show what is happening in the experiment. You should try to develop the habit of asking "where are the controls?" and looking for them.