Thoughts on Inference and Method

January 19, 2014, 10:57 pm · 1 comment

One fairly certain fact about science is that much of the public is more interested in the findings of science than in the methods by which they are generated. Science produces some “wow, cool!” findings, so this is (somewhat) understandable. Science writers generally oblige this interest by focusing on findings over methods, thereby implicitly trusting that whatever (perhaps mysterious) methods were used to obtain them were legitimate. Scientists themselves look at the matter rather differently.

Scientists pulling ice core, by brookpeterson via Flickr

I, for example, am almost always more interested in the validity of a study’s methods than in the findings themselves, which is why, as soon as I understand the essentials being reported, I go straight to the methods section. That’s because a second fact about science is that if your methods are not valid, you will come to wrong conclusions. And a third fact, and this one is key, is that the analytical methods used in studies can be complex, wildly varying, poorly described, and, most importantly, not well evaluated before their specific application. So I think it’s important to outline some concepts that determine scientific practice, that is, some epistemology, however boring or obvious that might seem.

Inference, roughly, is the process of reaching conclusions about the world based on observations of it. Different branches of science have different inferential constraints, and these occupy central places in how they operate, but some logical considerations are common to all. We start from the general observation that the world is complex: an enormous number of objects interact in a huge number of ways. The most powerful way to arrive at an understanding of cause and effect in any system is to conduct manipulative experiments, varying system components or processes systematically and observing the results: think high school physics or chemistry lab. In a now-classic paper 50 years ago, Platt (1964) termed this general approach “strong inference”. Its power derives from its efficiency in separating and estimating the effects of individual causes (“drivers”) and their various interactions, a task that is otherwise often difficult or impossible. From these estimates one progressively builds a model of the system that tries to explain the dominant sources of variation therein. In philosophy, this process also goes by the name of reductionism.
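To make the idea concrete, here is a toy sketch (in Python, with an invented system and made-up numbers, not anything from Platt) of a two-factor manipulative experiment: both drivers are varied systematically, and their individual effects and interaction drop straight out of the results.

```python
# Toy illustration of "strong inference" via a manipulative experiment:
# two drivers (A, B) are varied systematically in a 2x2 factorial design,
# and their main effects and interaction are estimated from the results.
# The system and all numbers are invented for this sketch.
import numpy as np

rng = np.random.default_rng(0)

def run_system(a, b, n=50):
    """Hypothetical system: response depends on both drivers and their interaction."""
    return 2.0 * a + 1.0 * b + 0.5 * a * b + rng.normal(0, 0.2, n)

# Run the experiment at every combination of driver levels (coded -1 / +1)
levels = [-1, 1]
means = {(a, b): run_system(a, b).mean() for a in levels for b in levels}

# Classical factorial effect estimates
# (each "effect" is the change from the low to the high level,
#  i.e. twice the underlying model coefficient in this toy system)
effect_A = (means[(1, -1)] + means[(1, 1)] - means[(-1, -1)] - means[(-1, 1)]) / 2
effect_B = (means[(-1, 1)] + means[(1, 1)] - means[(-1, -1)] - means[(1, -1)]) / 2
interaction = (means[(1, 1)] + means[(-1, -1)] - means[(1, -1)] - means[(-1, 1)]) / 2

print(f"A effect ~ {effect_A:.2f}, B effect ~ {effect_B:.2f}, AxB ~ {interaction:.2f}")
```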

Photo of smoke plume in southwestern Colorado taken by astronauts aboard the ISS during Expedition 36. Credit: NASA Marshall Flight Center via Flickr

The problem with this (one that Platt avoids) is that you very often can’t execute anything like it, for either physical or practical (e.g. cost) reasons. Such limits are obvious for any phenomenon operating at large scales of space or time, which includes much of the earth and environmental sciences; if earth scientists could make 50 earths and run 500-year experiments on them, they likely would. So you have to come up with some other way of identifying and quantifying the magnitudes of the various system interactions. This is a truly critical point, because if you can’t do so, you can and likely will come to mistaken conclusions about reality, since a common characteristic of nature is that many variables co-vary with each other to varying degrees. The importance of this issue scales with a system’s complexity, and I would argue that many major scientific mistakes result from this problem.
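A minimal sketch of why co-varying drivers matter, using simulated data with invented coefficients: when two drivers move together, a naive analysis that considers only one of them silently absorbs the other’s effect.

```python
# Sketch (made-up numbers) of the confounding problem described above:
# two drivers co-vary in observational data, so a naive one-variable fit
# misattributes part of the omitted driver's effect to the included one.
import numpy as np

rng = np.random.default_rng(1)
n = 1000

driver_a = rng.normal(size=n)
driver_b = 0.8 * driver_a + 0.6 * rng.normal(size=n)      # co-varies with driver_a
response = 1.0 * driver_a + 2.0 * driver_b + rng.normal(0, 0.5, n)

# Naive fit: response ~ driver_a only (driver_b ignored)
slope_naive = np.polyfit(driver_a, response, 1)[0]

# Joint fit: response ~ driver_a + driver_b (both drivers included)
X = np.column_stack([driver_a, driver_b, np.ones(n)])
coef_joint, *_ = np.linalg.lstsq(X, response, rcond=None)

print(f"naive estimate for driver_a: {slope_naive:.2f} (true value 1.0)")
print(f"joint estimates: a = {coef_joint[0]:.2f}, b = {coef_joint[1]:.2f}")
```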

One can conceive of (at least) two major conceptual approaches to resolving the problem. The first recognizes that it’s not always necessary to fully abandon the idea of a controlled experiment. In an approach known as a “natural experiment”, one can sometimes take advantage of the fact that nature at times provides situations in which certain important system drivers vary naturally while others do not. These variations may not be ideal, as they might be in a manipulative experiment, but they can still be good enough to give important insights into cause and effect. Using a fire ecology example, you might have a landscape across which wildfires burn at varying intensities, or in different seasons, but in which a number of other factors known to be of potential importance to fire behavior are roughly equal, e.g. topography, starting biomass, or relative humidity. Intentionally burning large landscape areas at different intensities is not politically feasible, but since unplanned wildfires are common, valuable information can often be obtained.
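Here is one hedged sketch of how such a comparison might be organized; the plot data, column names, and the seedling-density response are all invented placeholders, not from any actual study. The logic is: first check that the supposedly “controlled” covariates really are comparable across the naturally varying groups, then compare the response.

```python
# Hedged sketch of the "natural experiment" idea using the fire example above.
# All plot data, column names, and numbers are invented for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical survey of plots that burned at different (unplanned) intensities
# but share roughly similar topography and starting biomass.
plots = pd.DataFrame({
    "fire_intensity": rng.choice(["low", "moderate", "high"], size=90),
    "slope_deg": rng.normal(12, 2, 90),           # roughly comparable across groups
    "pre_fire_biomass": rng.normal(200, 15, 90),  # roughly comparable across groups
})
# Invented response: post-fire seedling density declines with burn intensity
effect = plots["fire_intensity"].map({"low": 40, "moderate": 25, "high": 10})
plots["seedling_density"] = effect + rng.normal(0, 5, 90)

# 1) Check that the "controlled" covariates really are similar across groups
print(plots.groupby("fire_intensity")[["slope_deg", "pre_fire_biomass"]].mean())

# 2) Then compare the response across the naturally varying driver
print(plots.groupby("fire_intensity")["seedling_density"].agg(["mean", "std"]))
```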

I would argue that variations on this basic idea are the most powerful way of quantifying cause and effect in complex systems not amenable to controlled experimentation, and that humans in fact employ this idea routinely in assessing all sorts of things in everyday life. But natural experiments of high enough quality do not always present themselves. What then? Well, that’s right where things start to get analytically hairy, giving rise to a large army of statistical techniques and approaches, often discipline-specific and jargon-heavy, and thus where one has to start being very careful as a reader. And statisticians have, frankly, not been good at making complex techniques easily understandable to the scientists who need to use them.

Principal component analysis graph from a PhD thesis by Matthias Scholz. Republished by Michael Quirke under fair use.

One approach is to switch the focus from the classical, step-wise analysis of single elements of the system to an analysis of the entire system simultaneously. Mathematically, this is accomplished with multivariate statistical methods, which, very roughly, replace the original system variables with a new set of “synthetic” variables that capture, in descending order, the major patterns of variation in the originals but which, unlike the originals, are uncorrelated with each other. This can be done with either or both of the response and driving variables, resulting in a description of the system dynamics at a more synthetic (inclusive) level of organization. The most common of these techniques in climate science is principal component analysis (PCA), while in ecology that and several other techniques, such as correspondence analysis, are common. It’s an interesting and powerful approach, but it necessarily sacrifices an understanding of the interactions between individual system components. And that is unfortunately likely to be a big problem whenever several processes affect a given response variable, as is common in biology, or indeed in most any complex system.
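A minimal PCA sketch (numpy only, with random illustrative data) shows the core of the idea: correlated original variables are replaced by uncorrelated synthetic ones, ordered by the fraction of variance they capture.

```python
# Minimal PCA sketch: replace correlated variables with uncorrelated
# "synthetic" variables ordered by the variance they capture.
# The data here are random and purely illustrative.
import numpy as np

rng = np.random.default_rng(3)

# Three correlated "system variables" driven by one shared latent signal
latent = rng.normal(size=(500, 1))
data = np.hstack([latent + 0.3 * rng.normal(size=(500, 1)) for _ in range(3)])

# Center the data, then take the SVD; the right singular vectors are the principal axes
centered = data - data.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

scores = centered @ Vt.T            # the new, mutually uncorrelated variables
explained = s**2 / np.sum(s**2)     # fraction of total variance per component

print("variance explained:", np.round(explained, 3))
print("correlation of scores:\n", np.round(np.corrcoef(scores.T), 3))
```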

Example of structural equation modeling from University of South Florida website.

Another, more sophisticated class of multivariate techniques is designed to address that problem. These use detailed analyses of the correlations of all system variables with one another, individually, to hypothesize the magnitudes of the cause-and-effect relationships among them, including both direct (A->B) and indirect (e.g. A->B->C; A affects C via B) relationships. (Note that I said “hypothesize” there.) They include techniques known as structural equation modeling and path analysis. They extract the maximum amount of information from the system and present a “most likely” cause-and-effect depiction of the entire system, as derived from a particular set of data alone.
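For the simplest possible hypothesized structure, the path-analysis bookkeeping can be sketched directly with least squares. The structure, coefficients, and data below are all invented for illustration; a real analysis would use dedicated SEM software and would test the hypothesized structure, not just fit it.

```python
# Sketch of the path-analysis idea (not a full SEM package): for a simple
# hypothesized structure A -> B -> C with an additional direct A -> C path,
# the path coefficients come from two regressions, and the indirect effect
# of A on C is the product of coefficients along the A -> B -> C route.
import numpy as np

rng = np.random.default_rng(4)
n = 2000

A = rng.normal(size=n)
B = 0.7 * A + rng.normal(0, 0.5, n)            # A -> B
C = 0.4 * A + 0.9 * B + rng.normal(0, 0.5, n)  # A -> C (direct) and B -> C

# Path coefficients via least squares on the hypothesized structure
b_AB = np.polyfit(A, B, 1)[0]                  # A -> B
X = np.column_stack([A, B, np.ones(n)])
(b_AC_direct, b_BC, _), *_ = np.linalg.lstsq(X, C, rcond=None)

indirect = b_AB * b_BC                         # effect of A on C routed through B
total = b_AC_direct + indirect

print(f"direct A->C ~ {b_AC_direct:.2f}, indirect via B ~ {indirect:.2f}, total ~ {total:.2f}")
```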

Both of these approaches are purely statistical: the system observations themselves, rather than outside information (i.e. pre-existing or theoretical knowledge of some type), determine the estimates of the system processes during model building. Scientific insight and training are crucial in choosing between these two sources of information, and this decision process is not easily reduced to simple rules, because it involves context-specific judgments. It constitutes the important art of knowing which information is most relevant to the particular problem at hand, which in turn involves other questions, such as how broadly generalizable one’s conclusions need to be, issues of data quality and availability, and so on. Opinions and debate about such decisions, i.e. about the best strategy for addressing a given problem, can and should form a core part of legitimate scientific debate, because they are by no means always clear-cut, and disagreements can readily arise.

Once a model is constructed, by whatever method, controlled experimentation can begin: one systematically varies the model’s parameters, or even the basic structure of its equations. In so doing, we have exchanged the experimentation we would like to do on the actual system for experimentation on the quantitative model(s) of that system. This practice throws a lot of people, especially non-scientists, who wonder what the point is if you don’t really know how well your model(s) reflect reality to begin with. The concern is entirely valid, and we should not trust any model that does not get the critical things right, either as a basis for further scientific studies or for real-world predictions.
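As a stand-in for whatever system model one actually has, here is a sketch of that kind of model experimentation on a toy logistic-growth model: hold everything else fixed and vary a single parameter systematically, exactly as one would vary a driver in a physical experiment.

```python
# Sketch of "experimenting on the model": once a model exists, its parameters
# can be varied systematically. The logistic-growth model and parameter values
# here are placeholders, not anyone's actual system model.
import numpy as np

def logistic_growth(r, K=100.0, n0=5.0, years=50):
    """Integrate dN/dt = r*N*(1 - N/K) with a simple yearly Euler step."""
    n = n0
    trajectory = [n]
    for _ in range(years):
        n = n + r * n * (1 - n / K)
        trajectory.append(n)
    return np.array(trajectory)

# Controlled "experiment" on the model: vary only the growth-rate parameter
for r in [0.1, 0.3, 0.5]:
    final = logistic_growth(r)[-1]
    print(f"r = {r:.1f} -> population after 50 years ~ {final:.1f}")
```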

But there’s more to it than just that; for one, what it means to get something “right” with a model is a persistent source of confusion. More generally, in the process of experimenting with a model you often learn highly important things. Your understanding improves as you inspect model output and gauge it against observations, in a variety of formal and informal ways. It teaches you how the equations interact with one another to produce complex output. It can show you where certain kinds of observations are especially needed, thereby making monitoring of the system’s evolution more efficient (and thus, e.g., saving money).

There are many more, often detailed, considerations than just these in conducting research; this is just a framework of some important concepts and approaches that might be useful.
Note: edited on 1-21 for clarity

Reference
Platt, J. R. (1964). Strong inference. Science 146(3642): 347-353.

Physics Lab by Cushing Memorial Library and Archives, Texas A&M via Flickr



PUBLIC COMMENT THREAD

  • John Nielsen-Gammon (http://atmo.tamu.edu/profile/JNielsen-Gammon)

    Jim’s second approach involves developing a statistical model of the system under study. A somewhat different approach uses the same term, “model”, but in the context of a dynamical model of the climate system.

    While the core of a statistical model is a set of statistical inferences, the core of a dynamical model is a set of physical laws. Those physical laws (an example would be that force equals mass times acceleration) have been well established for a century or more.

    While a statistical model represents the relationships among things, a dynamical model represents the relationships among various adjoining bits of fluid that together make up the atmosphere and ocean.

    Such dynamical models have two inherent shortcomings. First, they cannot possibly simulate all the physical interactions throughout the climate system: there are simply too many interactions and not enough computer power. Second, and as a consequence of the first, important interactions at a scale too small to be simulated are represented by a mix of physical and statistical approximations, known as “parameterizations”.

    Dynamical models, many of which are called “global climate models”, are very useful as experimental tools as long as one does not forget that they do not perfectly mimic the atmosphere. I like to say that they simulate a world quite similar to our own.

    One application is hypothesis-testing. Suppose you have a hypothesis about how the climate system works. If none of the approximations inherent in a global climate model would interfere with the physics of your hypothesis, you can further hypothesize that a global climate model ought to work the same way. You can then perform controlled experiments with the global climate model. If the global climate model doesn’t work as hypothesized, that’s evidence that your hypothesis may not work in the real world either. If it works in the model, then it’s time to check the observational evidence.
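As a footnote to the comment above: about the simplest possible illustration of a “dynamical model” in this sense is a zero-dimensional energy-balance model, a single physical law integrated forward in time. The sketch below is an editorial illustration, not the commenter’s, and its round-number parameters are placeholders; it is nothing like what a real global climate model does.

```python
# A zero-dimensional energy-balance model: the simplest possible illustration
# of a "dynamical model" (a physical law integrated forward in time).
# Parameter values are textbook-style round numbers, not tuned results.
import numpy as np

SIGMA = 5.67e-8          # Stefan-Boltzmann constant, W m^-2 K^-4
SOLAR = 1361.0           # solar constant, W m^-2
ALBEDO = 0.3             # planetary albedo
EMISSIVITY = 0.61        # crude stand-in for the greenhouse effect
HEAT_CAPACITY = 4.0e8    # effective heat capacity, J m^-2 K^-1

def step(T, dt=86400.0):
    """One explicit time step of dT/dt = (absorbed - emitted) / C."""
    absorbed = SOLAR * (1 - ALBEDO) / 4.0
    emitted = EMISSIVITY * SIGMA * T**4
    return T + dt * (absorbed - emitted) / HEAT_CAPACITY

T = 255.0                            # start from an arbitrary cold state, K
for day in range(365 * 50):          # integrate forward 50 model years
    T = step(T)
print(f"equilibrium temperature ~ {T:.1f} K")
```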