Validation and the Scientific Organization: Organizing U.S. Climate Modeling (4)
This entry is the last I will be writing about organizing U.S. climate modeling, software, and open source communities – for a while. At the end of this entry are links to the blogs/articles in a couple of series. I am going to start by quoting a comment from atmoaggie on the previous entry.
“The difference between all of those (rbr: types of models in a previous comment) and climate models is the ability to study their validity.
I would like to see a climate modeling 10 year forecast of some parameters, such as, maybe, average SST for the month of June 2021. Too specific? How about average global SST for JJA (summer) 2021. Still too specific? Maybe the average global SST for the next 10 years.
I, too, work in modeling. In storm surge modeling, one can very easily tune a model to better match the results for one storm (by adjusting air-sea drag, e.g.) only to find that the model is not useful for forecasting as another parameter or physical calculation is incorrect (the sea floor friction formulation, e.g.)
I bring this up to illustrate what can go wrong when modeling a hindcast, tuning to match observations, and applying that model to forecasts. And climate is far more complex, I think, than tides and TC wind and pressure-forced storm surges.”
I want to bring together two streams of thought that I have pursued over the past few months – validation and the scientific organization. First, I will discuss whether or not climate models can be validated and then argue that the development of a validation plan is at the center of developing a scientific organization.
Validation: As suggested in some of my earlier entries, the question of whether or not climate models can be validated is controversial. The controversy lies, first, in philosophy. The formal discussion of whether climate models can be validated often starts with a widely cited paper by Naomi Oreskes et al. entitled Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences. Quoting the first two sentences of the abstract:
“Verification and validation of numerical models of natural systems is impossible. This is because natural systems are never closed and because model results are always nonunique.”
Oreskes et al. argue that the performance of the models can be “confirmed” by comparison with observations. However, if the metric of “validation” is a measure of absolute truth, then such absolute validation is not possible. By such a definition little of the science of complex systems, which would include most biological science, medical science, and nuclear weapons management, can stand up to formal validation.
I will return to the stream I started with the quote from atmoaggie, which makes reference to a storm surge model (for an excellent discussion of storm surges, see Resio and Westerink). The point of the comment is that the storm surge model can be tuned and thereby calibrated based on observations of past storm surges and theory, but the model may still fail in future predictions of storm surges. This points out a weakness in the development of models of natural systems: adjusting a model to represent a historical situation does not assure that the model correctly represents the physics of cause and effect. In fact, this is a general problem with modeling of complex natural systems. If you get the answer “right,” that does not mean you got it right for the right reason. Hence, in the spirit of Oreskes et al., validation is not possible – there is no absolute to be had.
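The tuning problem atmoaggie describes can be made concrete with a toy calculation. What follows is a minimal sketch, not a real storm surge model: the surge formula, the coefficient values, and the wind speeds are all hypothetical, chosen only to show how a tuned drag coefficient can hide a wrong friction term for one storm and then fail for another.

```python
def surge(wind, drag, friction):
    """Toy surge height: a wind-stress term minus a bottom-friction term.

    This formula is an illustrative assumption, not real surge physics.
    """
    return drag * wind ** 2 - friction * wind


# Hypothetical "true" coefficients that generate the observations.
TRUE_DRAG, TRUE_FRICTION = 0.002, 0.01
# The model's friction formulation is wrong and is never revisited.
MODEL_FRICTION = 0.03


def tune_drag(wind, observed, friction):
    """Solve for the drag that reproduces one observed surge exactly."""
    return (observed + friction * wind) / wind ** 2


hindcast_wind, forecast_wind = 30.0, 50.0  # hypothetical storms

# Tune against the past storm.
observed_hindcast = surge(hindcast_wind, TRUE_DRAG, TRUE_FRICTION)
tuned_drag = tune_drag(hindcast_wind, observed_hindcast, MODEL_FRICTION)

# The tuned model matches the hindcast but not the stronger future storm.
hindcast_error = surge(hindcast_wind, tuned_drag, MODEL_FRICTION) - observed_hindcast
forecast_error = (surge(forecast_wind, tuned_drag, MODEL_FRICTION)
                  - surge(forecast_wind, TRUE_DRAG, TRUE_FRICTION))

print(f"hindcast error: {hindcast_error:.3f}")
print(f"forecast error: {forecast_error:.3f}")
```

In this toy setup the tuned drag compensates for the wrong friction at one wind speed only. The hindcast error is essentially zero, while the forecast error is not, which is the "right answer for the wrong reason" situation in miniature.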
Yet, aren’t storm surge models useful and usable? The same situation is true for weather models and river forecast models: their correctness cannot be assured in any absolute sense, but aren’t they useful and usable? Atmoaggie poses a set of predictions, all of which are reasonable propositions, that may or may not be convincing to him or her. These do not represent a complete set of metrics to evaluate models, and the success or failure of these predictions does not state in any absolute sense whether or not the models have usable information. There are many more elements of model evaluation that determine our level of confidence in the use of models.
It is easy, therefore, to establish that models that cannot be formally validated can be both useful and usable. The results of these models might not be certain, but the degree of confidence that can be attributed to their calculations is very high. This confidence is, in general, established by many forms of model evaluation and additional sources of relevant information, most importantly, observations and basic physical principles.
Validation, verification, evaluation, certification, confirmation, calibration: All of the words in this list have been used in discussions of how to assess the quality of models. For some, there are nuanced differences between the words, but in the general discussion they are all likely to take on the same meaning – some quantitative measure of model quality. The word “validation,” however, is special. Within political or philosophical arguments, the statement “models cannot be validated” carries a powerful message, especially if one establishes as a principle that the elimination or the reduction of uncertainty is required prior to taking action (see Shearer and Rood). Many scientists take on the mantra that climate models cannot be validated. When I worked at NASA, the culture was that measurements of temperature (for example) could be validated, but that models could not. But if one is talking about temperatures from satellites over a deep layer of the atmosphere, in the spirit of Oreskes et al., can satellite temperature measurements be validated? We can state with stunning confidence that the satellite temperatures are within a certain closeness of a more intuitive or accepted measure of temperature – like a thermometer on a balloon. This is, to me, more calibration than validation, but in my world at NASA, calibration was done in a lab with standards (and that is why we have NIST). At NASA we talked about models being “evaluated.”
Other arguments I have heard about climate models defying validation are based on what we choose to validate against – what is our standard? Suppose that you are interested in how well the model represents the Pacific Ocean, and I am interested in how well it represents the Arctic Ocean. And the scientist down the hall wants to know how well it represents the ice-age cycles, and another wants to know how well it represents the 20th century temperature variability. There is no absolute way to make these choices. More fundamentally, if it is a climate model then how do we measure “climate?”
The list goes on – I have frequently heard arguments of one community making critical remarks about the “science” of other communities. The weather forecast community relies strongly on forecast skill scores, but these measures are by no means unique and for a variety of reasons often only indirectly relevant to the quality of climate models. There is no fundamental reason that an excellent climate model would automatically be an excellent weather forecasting model. The opposite is true as well. Over the years of my career there have been criticisms of climate science by other fields of physics. The gist of their arguments is that they don’t validate models the same way we do, and since we do a good job, they don’t. These arguments make great fuel for political argument and the maintenance of doubt. (Here is an interesting article by Oreskes and Renouf.)
Validation is, therefore, both controversial and important. I pose that validation is at the center of the development of the scientific organization.
Validation and the Scientific Organization: The definition I have posed for the scientific organization is an organization that as a whole functions according to the scientific method. Therefore, in a climate modeling organization, the model development path and the modeling problems being addressed are determined in a unified way. In that determination, it is required that ways to measure success be identified. This leads to a strategy of evaluation that is determined prior to the development and implementation of model software. With the existence of an evaluation strategy, a group of scientists who are independent of the developers can be formed to serve as the evaluation team.
The development of an evaluation plan requires that a fundamental question be asked: What is the purpose of the model development? What is the application? If the model is being developed to do “science,” then there is no real constraint that balances the interests of one scientific problem versus another. There is little or no way to set up a ladder of priorities.
Again, I will emphasize that to achieve this, and it can be achieved, is a matter of governance and management. It is a process of developing organizational rather than individual goals. It is a myth to imagine that if a group of individuals are each making the “best” scientific decisions, the accumulation of their activities will be the best integrated science. Science and scientists are not immune to the Tragedy of the Commons. If one wants to achieve scientifically robust results from a unified body of knowledge, then one needs to manage the components of that body of knowledge so that as a whole the scientific method is honored. Enough on that pulpit.
Back to evaluation and validation – Minimally, the arguments about the nuanced meaning of validation and evaluation are a subject about which the climate modeling community needs to develop a standard. By my interpretation, the evaluation of climate models can be structured and quantified as “validation.”
When I was at NASA I had a programmatic requirement to develop a validation plan. And, yes, my friends and colleagues would tell me that validation was “impossible.” But I am stubborn, and not so smart, so I persisted and still persist with the notion. That old plan can still be found in the Algorithm Theoretical Basis Document for Goddard Earth Observing System Data Assimilation System (GEOS DAS) with a Focus on Version 2.
The software we produced was an amalgam of weather forecasting and climate modeling. For the validation plan the strategy was to define a quantitative baseline of model performance for a set of geophysical phenomena. These phenomena were broadly studied and simulated well enough that they described a credibility threshold for system performance. They were chosen to represent the climate system. Important aspects of this validation approach were that it was defined by a specific suite of phenomena, formally separated validation from development, and relied on both quantitative and qualitative analysis.
The validation plan separated "scientific" validation from "systems" validation. It included steps of routine point-by-point monitoring of simulation and observations, formal measures of quality assessment by measure of fit of simulations and observations, and calculation of skill scores to a set of "established forecasts." There was a melding of methodologies of practices of the study of weather and the study of climate. We distinguished the attributes of the scientific validation from the systems validation. The systems validation, focused on the credibility threshold described above, used simulations that were of longer time scales than the established forecasts and brought attention to a wider range of variables important to climate. The scientific validation was a more open-ended process, often requiring novel scientific investigation of new problems. The modeling software system was released for scientific validation and use after a successful systems validation.
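To give a feel for the kinds of quantitative measures mentioned above, here is a minimal sketch of a measure of fit and a skill score relative to a reference forecast. It is not the code or the specific metrics of the GEOS DAS plan; the function names, the mean-squared-error form of the skill score, and the sample numbers are assumptions for illustration.

```python
import math


def bias(sim, obs):
    """Mean difference between simulation and observations."""
    return sum(s - o for s, o in zip(sim, obs)) / len(obs)


def rmse(sim, obs):
    """Root-mean-square error: one common measure of fit."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def skill_score(sim, ref, obs):
    """Skill relative to a reference forecast: 1 - MSE_sim / MSE_ref.

    1 is a perfect simulation, 0 is no better than the reference,
    and negative values are worse than the reference.
    """
    mse_sim = rmse(sim, obs) ** 2
    mse_ref = rmse(ref, obs) ** 2
    return 1.0 - mse_sim / mse_ref


# Hypothetical point-by-point values, for illustration only.
obs = [10.0, 12.0, 11.0, 13.0]
sim = [10.5, 11.5, 11.5, 12.5]
ref = [11.0, 11.0, 11.0, 11.0]  # stand-in for an "established forecast"

print(f"bias  = {bias(sim, obs):.3f}")
print(f"rmse  = {rmse(sim, obs):.3f}")
print(f"skill = {skill_score(sim, ref, obs):.3f}")
```

Tracking a fixed set of numbers like these from one release to the next is one way a baseline of model performance could be recorded, although the choice of variables and reference forecasts is exactly the kind of decision a validation plan has to make up front.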
The end result of this process was the quantitative description of the modeling system against a standard set of measures over the course of one modeling release to the next. Did it meet the criterion of the absolute validation? No. Did it provide a defensible quantitative foundation for scientific software and its application? Yes.
All told, it does little to base a body of scientific knowledge on the premise that validation is “impossible.” Rather than following such a premise, which immediately devalues the knowledge base, it is more useful to develop a systematic approach to robust, appropriate validation. Such an approach stands to represent the complexity of the Earth’s climate and its investigation in a way that serves not only the scientific method, but also the communication of that science to other scientists and to those with a stake in the scientific results. It sets a standard.
Open Climate Modeling:
Greening of the Desert
Stickiness and Climate Models
Open Source Communities, What are the Problems?
A Culture of Checking
Organizing U.S. Climate Modeling:
Something New in the Past Decade?
The Scientific Organization
A Science-Organized Community
Validation and the Scientific Organization