**Blinded study** = a study where key information, such as the identity of participants or which treatment each one received, is kept hidden from researchers (and often from the participants themselves) to prevent bias until after the results are known.

**Box and whisker plot** = a graphical representation of a range of data points. The box represents the middle half of the values (if you were to line all the values up in order, from ¼ to ¾ of the way through), and a line inside the box usually marks the median. The whiskers show the range of the remaining values, excluding outliers. Dots outside the whiskers show outliers that are unusually low or high in the data set.
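
To make the parts of the plot concrete, here is a minimal Python sketch that works out the numbers such a plot would draw. It assumes the common convention that a point counts as an outlier when it lies more than 1.5 box-lengths beyond the box; the glossary does not specify a rule, and plotting tools may use a different one. The function name `box_plot_summary` and the sample numbers are made up for illustration.

```python
import statistics

def box_plot_summary(values):
    """Return the numbers a box and whisker plot would draw."""
    ordered = sorted(values)
    # Cut points at 1/4, 1/2 and 3/4 of the way through the ordered values
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    box_length = q3 - q1
    # Assumption: the common "1.5 x box length" rule for flagging outliers
    low_fence = q1 - 1.5 * box_length
    high_fence = q3 + 1.5 * box_length
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    outliers = [v for v in ordered if v < low_fence or v > high_fence]
    return {
        "box": (q1, q3),
        "median": median,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

# Illustrative numbers only: nine ordinary values and one unusually high one
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

With these numbers the value 30 is drawn as a dot, and the whiskers stop at 1 and 9.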

**Causation** = a relationship between two sets of data in which the trend in one set causes the change in the other. For example, as the temperature of water heating in a pot rises, the amount of water in the pot decreases (because it evaporates).

**Computerized simulation** = an imitation of a real-life system, simplified and designed on a computer. See also “Scientific model”.

**Control group** = one of the tested groups in a scientific experiment, which is NOT exposed to any experimental treatment but is tested under the same conditions as all other groups. It provides a baseline to compare the results of the experiment against, which helps the scientists confirm that the experimental treatment actually makes a difference and that their findings are not just a coincidence. An experiment that tests a treatment generally needs a control group to reach valid results.

**Correlation** = a relationship between two trends in the data, when the two change in sync with each other: as one increases, the other increases too (if it’s directly correlated) or the second one decreases (if it’s inversely correlated). Correlations are easy to demonstrate – you just need to plot the data. They do not, by themselves, prove causation.
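
Besides plotting, a correlation can be summarized as a single number. The sketch below uses the Pearson correlation coefficient, which is one common measure and an assumption here, since the glossary does not name a specific one. It gives a value near +1 for a direct correlation, near -1 for an inverse correlation, and near 0 when there is no straight-line relationship. The data are made up for illustration.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two lists move together around their averages
    together = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each list moves on its own
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return together / (spread_x * spread_y)

# Illustrative data only
hours = [1, 2, 3, 4, 5]
rises_with_hours = [2, 4, 6, 8, 10]   # directly correlated
falls_with_hours = [10, 8, 6, 4, 2]   # inversely correlated

direct = pearson_correlation(hours, rises_with_hours)
inverse = pearson_correlation(hours, falls_with_hours)
print(round(direct, 3), round(inverse, 3))
```

A value close to +1 or -1 still says nothing about which variable, if either, causes the other.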

**Data** = (scientific data) any type of real-world fact or information related to a question scientists are investigating. (Note that the word “data” is the plural form!)

**Database** = an organized set of data that is usually stored on a computer.

**Data reconstruction** = inferring data from proxy data sources for a time period for which direct measurements are unavailable.

**Distribution** = a mathematical description of how often each value (or range of values) occurs in a large dataset.
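
As a small illustration, the following Python sketch counts how often each value appears in a list and prints a rough text histogram. The data set and the name `frequency_table` are hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Count how many times each value appears, listed from smallest to largest."""
    counts = Counter(values)
    return dict(sorted(counts.items()))

# Hypothetical data: number of eggs found in each of eight bird nests
eggs_per_nest = [3, 4, 3, 2, 4, 3, 5, 3]
table = frequency_table(eggs_per_nest)

# One "#" per nest gives a quick picture of the distribution
for value, count in table.items():
    print(value, "#" * count)
```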

**Hypothesis** = a proposed explanation for an observed phenomenon or a prediction scientists make which can be tested and disproved. Hypotheses are usually based on observation or previous scientific knowledge.

**Index** = a number calculated based on several different variables observed in a data set.

**Median** = in a set of numbers or data points, the median is the number that falls exactly in the middle: half the numbers in a data set are higher and half are lower than the median.
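
A minimal Python sketch of finding the median is shown below. It assumes the usual convention that, when the set has an even number of values and so no single middle one, the median is the average of the two middle values.

```python
def median(values):
    """Return the middle value of a list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: exactly one value sits in the middle
        return ordered[middle]
    # Even count (assumed convention): average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_example = median([4, 7, 3, 9, 3])   # ordered: 3, 3, 4, 7, 9
even_example = median([6, 2, 3, 1])     # ordered: 1, 2, 3, 6
print(odd_example, even_example)
```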

**Model** = a representation of an object, process, or physical system. A paper airplane is a very simple model of a Boeing 747. The simpler the model, the fewer the details it incorporates. A mathematical model represents real-world situations using a variety of mathematical structures (e.g., graphs, equations, diagrams). See also “Scientific model.”

**Model organism** = a species that has been widely studied, usually because it is easy to maintain and breed in a laboratory. Scientific discoveries in model organisms can often tell us a lot about the workings of many other organisms, too. A few examples of common model organisms are fruit flies, mice, zebrafish, and the bacterium E. coli.

**Non-linear relationship between two variables** = when a given increase in one variable (x, or the independent variable) does not always correspond to a constant increase in the other variable (y, or the dependent variable), for all possible values of x. The graph of this relationship will be a curve instead of a straight line (as is the case in linear relationships).
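
One quick way to see the difference is to increase x in equal steps and look at how much y changes each time. The sketch below does this for two made-up relationships, y = 2x + 1 (linear) and y = x² (non-linear); both formulas are chosen only for illustration.

```python
def step_changes(ys):
    """Change in y between each pair of neighbouring points."""
    return [later - earlier for earlier, later in zip(ys, ys[1:])]

xs = [0, 1, 2, 3, 4]                   # x increases by 1 each time
linear_ys = [2 * x + 1 for x in xs]    # 1, 3, 5, 7, 9
curved_ys = [x ** 2 for x in xs]       # 0, 1, 4, 9, 16

linear_steps = step_changes(linear_ys)
curved_steps = step_changes(curved_ys)
print(linear_steps)   # same change every time
print(curved_steps)   # change keeps growing
```

The linear relationship changes by the same amount at every step, while the non-linear one does not.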

**Outlier** = a data point in a set of data that is much bigger or much smaller than the rest of the set.

**Parameter** = a factor we measure. A climate parameter can be the amount of rainfall, maximum or average temperature, length of drought periods, and so on.

**Percentile** = given a set of numbers, the value at or below which a given percentage of observations fall. For example, if we had the following set of numbers {4, 7, 3, 9, 3}, the 60th percentile would be 4, because 3/5 × 100% = 60% of the set are equal to or less than 4.
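
The arithmetic in this example can be checked with a short Python sketch. It follows the simple "equal to or less than" counting used in the example; statistical software often estimates values between data points instead and may give slightly different answers. The function names are made up for illustration.

```python
def percent_at_or_below(values, threshold):
    """Percentage of the values that are equal to or less than the threshold."""
    count = sum(1 for v in values if v <= threshold)
    return 100 * count / len(values)

def percentile(values, percent):
    """Smallest value in the set with at least `percent` % of values at or below it."""
    ordered = sorted(values)
    for v in ordered:
        if percent_at_or_below(ordered, v) >= percent:
            return v
    return ordered[-1]

numbers = [4, 7, 3, 9, 3]
share = percent_at_or_below(numbers, 4)   # 3 of the 5 values are 4 or less
sixtieth = percentile(numbers, 60)
print(share, sixtieth)
```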

**Proxy** = something that represents something else. A proxy data source is an indirect source of data that scientists use when direct measurements are not available. For example, annual tree rings are a proxy for precipitation data.

**Randomized trial** = a study where participants are randomly assigned to either the intervention group or the control group. This is done to prevent bias among researchers, who may be subconsciously wishing for the study to show a particular result.
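
Random assignment can be sketched in a few lines of Python: shuffle the list of participants, then split it in half. This is only an illustration under simple assumptions (two groups of roughly equal size, hypothetical participant names), not a description of how any particular trial is run.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into an intervention group and a control group."""
    rng = random.Random(seed)      # a fixed seed makes the shuffle repeatable
    shuffled = list(participants)  # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participants
people = ["Ana", "Ben", "Chen", "Dee", "Eli", "Fay"]
groups = assign_groups(people, seed=1)
print(groups)
```

Because the shuffle decides who goes where, nobody's preferences can influence the make-up of the groups.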

**Representative sampling** = taking samples that are “typical” of the whole group being studied, so that the estimate does not accidentally come out too high or too low. For example, if we were measuring the average height of all residents of California, but we accidentally sampled no children, then our estimate would be too high. We would want to sample a mix of ages that is typical for the state.

**Review article** = a scientific article that compiles past research on one topic to analyze trends and assess the current state of knowledge on that topic.

**Scientific model** = a model that uses our knowledge of natural processes to predict outcomes, make hypotheses and explain phenomena. Models can be material (like the paper airplane), conceptual, or mathematical. A scientific model is often a computer program that attempts to simulate a particular system and to predict how the system would behave in the real world. See also “Model” and “Computerized simulation.”

**Statistically significant** = describing a result that is likely due to a real process rather than to chance. By a common convention, scientists call a result “significant” if a result at least that strong would happen by chance less than 5% of the time (shown as p < 0.05). The lower the p-value, the more significant the result. Statistical significance is an important way for scientists to deal with uncertainty.
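
A coin-flip example can make the 5% threshold concrete. The Python sketch below asks: if a coin is fair, how often would chance alone produce at least a given number of heads in 10 flips? The coin scenario and the function name are illustrative assumptions, and this is only one simple way of calculating a p-value.

```python
from math import comb

def chance_of_at_least(successes, trials, p=0.5):
    """Probability of getting `successes` or more in `trials` tries by chance alone."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

nine_heads = chance_of_at_least(9, 10)    # 11 out of 1024 possible outcomes
seven_heads = chance_of_at_least(7, 10)   # 176 out of 1024 possible outcomes

print(nine_heads, nine_heads < 0.05)
print(seven_heads, seven_heads < 0.05)
```

Getting 9 or more heads would happen by chance about 1% of the time, so it falls below the 0.05 threshold; 7 or more heads would happen about 17% of the time, so it does not.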

**Validation** = checking the accuracy of something. For example, to validate a climate model, scientists run the model to “predict” the weather patterns over the past 20 years. Then they compare the output with the actual measurements and observations we have collected for that time.

**Variable** = in scientific or mathematical models, a factor whose value may vary (e.g., average air temperature or amount of solar radiation).

**Variance** = a measure of how different the numbers in a set are from one another. For example, the two sets of numbers {6, 2, 3, 1} and {2, 4, 3, 3} have the same average (or mean) of 3, but the first set has a greater variance.
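
The comparison in this example can be worked through in Python. The sketch below assumes the "population" form of variance (the average squared distance from the mean); the glossary does not say which form it has in mind, and the "sample" form, which divides by one less than the number of values, ranks these two sets the same way.

```python
def mean(values):
    """Average of a list of numbers."""
    return sum(values) / len(values)

def variance(values):
    """Population variance: average squared distance from the mean (assumed form)."""
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values) / len(values)

first_set = [6, 2, 3, 1]
second_set = [2, 4, 3, 3]

first_variance = variance(first_set)     # (9 + 1 + 0 + 4) / 4
second_variance = variance(second_set)   # (1 + 1 + 0 + 0) / 4
print(mean(first_set), mean(second_set))
print(first_variance, second_variance)
```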