The Big Why of Evaluation

Impact Evaluations - How Good is Good Enough?

Written by Jake Millette | December 22, 2021 at 10:07 PM

A major theme of the Big Why of Evaluation is that evaluations always involve balancing accuracy and cost, thanks to time and budget constraints as well as the pesky issue of dealing with counterfactuals. The eternal question of "how good is good enough?" drives one of Michaels Energy's core values: "intuitive analysis."

The key to impact evaluations is managing uncertainty. Often, uncertainty is measured in terms of confidence and precision related to sampling error (i.e., the industry-standard 90% confidence/10% precision threshold), but we also need to account for many other types of error that cannot be easily quantified (and are rarely acknowledged). These include measurement error, non-response and self-selection bias, and data processing and analysis errors.[1]
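
For a rough sense of what the 90/10 threshold implies for sample sizes, here is a minimal sketch using the standard coefficient-of-variation formula; the coefficient of variation of 0.5 is a common planning assumption for illustration, not a value from any particular program.

```python
from math import ceil

def required_sample_size(cv: float, precision: float, population: int,
                         z: float = 1.645) -> int:
    """Approximate sample size for a +/- `precision` relative bound at the
    confidence level implied by `z` (1.645 ~ 90% two-sided confidence),
    given an assumed coefficient of variation `cv`."""
    n0 = (z * cv / precision) ** 2      # infinite-population sample size
    n = n0 / (1 + n0 / population)      # finite population correction
    return ceil(n)

# Example: 90% confidence / 10% precision with the common planning
# assumption cv = 0.5, drawn from a population of 400 projects.
print(required_sample_size(cv=0.5, precision=0.10, population=400))  # -> 58
```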

When conducting impact evaluations, we are constantly striking a balance between obtaining the most accurate estimates possible and the cost of obtaining them. We need to weigh the value of the information, the costs of evaluation, and the uncertainty of the approach. We need to consider the risk to the program of over- or underestimating savings, as well as the risk of excluding viable efficiency projects simply because we are not certain about their savings. All of this calculus is completely subjective, of course, but in general, low-risk projects require fewer evaluation resources and high-risk projects require more.

Fortunately, in addition to being able to maximize precision through sampling approaches, evaluators have a host of tools available to estimate savings. The State and Local Energy Efficiency Action Network (SEEAction) breaks out impact evaluations into three main approaches:

  • Measurement & Verification (M&V) – Discussed in more detail below
  • Deemed Savings – This approach uses stipulated values for projects or parameters that are well-known, documented, and generally accepted. We covered the perils of deemed savings in an earlier post.
  • Large-Scale Consumption Data Analysis – This approach measures the difference between the energy use of facilities that participated in a program (the treatment group) and a comparison group of facilities that did not (the control group). It is primarily used for residential programs where there are large numbers of homogeneous facilities, such as behavior programs and weatherization programs. The two main types of this approach are randomized control trials (RCTs) and quasi-experimental methods; a minimal sketch of the underlying comparison follows this list.
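
For the curious, here is a minimal sketch of the treatment-versus-comparison arithmetic behind the consumption data analysis approach (a simple difference-in-differences). The groups, periods, and consumption values are entirely hypothetical.

```python
import pandas as pd

# Hypothetical average monthly consumption (kWh) for treatment and
# comparison homes, before and after the program year.
df = pd.DataFrame({
    "group":  ["treatment", "treatment", "comparison", "comparison"],
    "period": ["pre", "post", "pre", "post"],
    "kwh":    [1000.0, 900.0, 1010.0, 980.0],
})

means = df.groupby(["group", "period"])["kwh"].mean()

# Difference-in-differences: the change in the treatment group minus the
# change the comparison group experienced over the same period.
treat_change = means["treatment", "post"] - means["treatment", "pre"]    # -100 kWh
comp_change  = means["comparison", "post"] - means["comparison", "pre"]  #  -30 kWh
savings_per_home = comp_change - treat_change                            #   70 kWh
print(f"Estimated savings per home: {savings_per_home:.0f} kWh")
```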

Focusing In on Measurement and Verification (M&V)

M&V is used to determine gross energy or demand savings at the individual site or project level. This approach is most often used in custom programs where the savings depend on specific measures and customer characteristics. However, many evaluations blend M&V and deemed savings, applying site-specific values where needed and stipulated values where not.

Based on the project’s specifics and the evaluation objectives, evaluators develop an M&V plan to lay out the savings determination approach and what information will need to be collected. The plan will describe the data collection activities, potentially including collecting equipment quantities and specifications, observing field conditions, interviewing occupants/users, measuring the appropriate parameters, and metering and monitoring the equipment.

The Efficiency Valuation Organization's (EVO) International Performance Measurement and Verification Protocol (IPMVP) framework includes four M&V options that serve as the industry standard: measuring key parameters, measuring all parameters, analyzing the whole facility's energy savings using billing data regressions, and determining savings through calibrated simulation. Evaluators choose which option to use based on the type and complexity of the project, the uncertainty of its savings, and its value to the program. The choice also depends on the level of measurement (e.g., equipment/system or whole building) and the type of measurement (e.g., spot or long-term metering, use of stipulated values, or billing analysis).
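
As a simplified illustration of the whole-facility billing-regression idea (Option C in the IPMVP), here is a sketch that fits a baseline model to pre-retrofit bills and cooling degree days, then compares post-retrofit bills to what the model predicts. The monthly values and the single-variable model form are assumptions for illustration, not data from a real project.

```python
import numpy as np

# Hypothetical pre-retrofit monthly data: cooling degree days and billed kWh.
cdd_pre = np.array([ 50, 120, 300, 450, 520, 480, 350, 180,  60,  20,  10,  30])
kwh_pre = np.array([810, 980, 1400, 1750, 1900, 1820, 1510, 1120, 830, 740, 720, 760])

# Fit a simple baseline model, kWh = b0 + b1 * CDD, by ordinary least squares.
X = np.column_stack([np.ones_like(cdd_pre, dtype=float), cdd_pre])
b0, b1 = np.linalg.lstsq(X, kwh_pre, rcond=None)[0]

# Post-retrofit months: predict what the facility would have used under the
# baseline model, then subtract actual billed usage to estimate avoided energy.
cdd_post = np.array([ 60, 110, 280, 430])
kwh_post = np.array([650, 760, 1050, 1320])
baseline = b0 + b1 * cdd_post
savings = (baseline - kwh_post).sum()
print(f"Estimated avoided energy use: {savings:.0f} kWh")
```

In practice, Option C models typically include more terms (heating degree days, occupancy, production) and require enough post-period data to separate savings from noise, but the compare-to-baseline logic is the same.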

Data Collection Methods

In addition to the IPMVP options available, evaluators can determine savings using different data collection methods. Which approach we use depends on what parameters are required to calculate savings, the uncertainty of the claimed estimates of each parameter, and the cost and uncertainty in measuring each parameter. For IPMVP Option A (Retrofit Isolation – Key Parameter Measurement) and Option B (Retrofit Isolation – All Parameter Measurement), evaluators typically use the following:

Desk Review – This approach relies on reviewing existing information without collecting data on-site. As part of this approach, an engineer would likely review program tracking data and calculations and check that the correct values and appropriate parameters (e.g., hours of use) were used. They would review invoices and manufacturers' specification sheets to verify that the installed equipment matches the program data, and they would often contact the site to verify that the equipment was installed and operating.

Desk reviews can be a great blend of cost-effectiveness and sufficient rigor for energy efficiency measures that are well researched and have relatively simple calculations. For example, rather than performing costly measurements of the cooling capacity and efficiency of every central AC unit, a desk review can use the manufacturer's ratings, which are determined through an industry-standard approval process at fixed conditions.
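
As a rough illustration of that kind of desk-review calculation, the sketch below combines rated capacity and efficiency from a spec sheet with stipulated operating hours. Every number is hypothetical.

```python
# Illustrative desk-review calculation for a central AC replacement.
# Rated values would come from manufacturer spec sheets; the full-load hours
# are a stipulated/deemed input, not a field measurement.
capacity_btuh   = 36_000   # rated cooling capacity (Btu/h)
seer_baseline   = 10.0     # efficiency of the replaced unit (Btu/Wh)
seer_efficient  = 16.0     # efficiency of the installed unit (Btu/Wh)
full_load_hours = 800      # stipulated equivalent full-load cooling hours

kwh_baseline  = capacity_btuh * full_load_hours / (seer_baseline * 1000)
kwh_efficient = capacity_btuh * full_load_hours / (seer_efficient * 1000)
print(f"Estimated savings: {kwh_baseline - kwh_efficient:.0f} kWh/yr")  # ~1,080 kWh
```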

In some cases, desk reviews are very limited and focus only on certain variables. For example, in one recent project, our team only looked at installation rates, which were then applied to a deemed program-level engineering adjustment factor. This seems like a wasted opportunity because the incremental effort required to create a project-specific engineering adjustment would have been minimal; we had to review all of the data and documentation anyway to determine whether the right equipment was installed.
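
The arithmetic behind that kind of limited review is simple: verified savings are essentially the claimed savings scaled by the installation rate and the deemed adjustment factor. A tiny sketch, with hypothetical numbers:

```python
# Hypothetical example of applying a verified installation rate and a
# deemed program-level engineering adjustment factor to claimed savings.
claimed_kwh = 50_000
installation_rate = 0.95        # share of claimed equipment verified as installed
engineering_adjustment = 0.88   # deemed program-level adjustment factor

verified_kwh = claimed_kwh * installation_rate * engineering_adjustment
realization_rate = verified_kwh / claimed_kwh
print(f"{verified_kwh:.0f} kWh verified ({realization_rate:.0%} realization rate)")
```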

Site Visit – For more complex projects where on-site measurements or visual inspections are required, evaluators visit the facility to gather data in person. In addition to the tasks described for desk reviews, projects with site visits often involve metering key parameters of the equipment in order to calculate its energy use and demand. Projects with the highest savings and most complexity often receive site visits because of their high level of uncertainty and risk to the program’s overall savings.

Virtual Site Visit – Because in-person site visits are costly, both evaluators and implementers have been exploring ways to conduct remote assessments and site visits. This approach has been fast-tracked since 2020, when the COVID-19 pandemic limited in-person contact. In our experience, a virtual site visit can often be as effective as an in-person visit at a lower cost. Through a combination of pictures, video, and screen sharing of facilities' building automation systems, our engineers are often able to gather all of the information they need while also reducing the burden on the site contact. Although in-person data collection is still often needed for large commercial and industrial projects, these remote assessments balance accuracy and cost well in many cases.

How Good is Good Enough?

Good evaluators understand that every program (and every year of its implementation) is unique. Beyond common requirements for confidence and precision, the methodology used for impact evaluation is up to the evaluator. To balance maximum accuracy, minimal uncertainty, and cost-effectiveness, we need to use all of the tools available and understand that trade-offs are always necessary. Typical strategies for achieving this balance include focusing resources on different key research questions over multiple years or concentrating evaluation efforts on the largest programs or those with the most uncertainty. Ultimately, a big part of evaluation is the intuitive analysis of knowing when the results are good enough and when diminishing returns indicate that additional resources should be focused on other research.

[1] Check out this massive list of cognitive biases if you never want to trust your intuition again.