Estimating North American background ozone in U.S. surface air with two independent global models: Variability, uncertainties, and recommendations

Accurate estimates for North American background (NAB) ozone (O 3 ) in surface air over the United States are needed for setting and implementing an attainable national O 3 standard. These estimates rely on simulations with atmospheric chemistry-transport models that set North American anthropogenic emissions to zero, and to date they have relied heavily on a single global model. We examine, for the first time, NAB estimates for spring and summer 2006 with two independent global models (GEOS-Chem and GFDL AM3). Evaluation of the standard simulations, which include North American anthropogenic emissions, against mid-tropospheric O 3 retrieved from space and ground-level O 3 measurements shows that the models often bracket the observed values, suggesting value in developing a multi-model approach to estimating NAB O 3 . Consistent with earlier studies, the models robustly simulate the highest NAB levels in the nation at high-altitude western U.S. sites (average values of ~40-50 ppb in spring and ~25-40 ppb in summer), where NAB correlates with observed O 3 . At these sites, a 27-year GFDL AM3 simulation captures observed O 3 events above 60 ppb and indicates that year-to-year variations in NAB O 3 influence their frequency (NAB contributes 50-60 ppb or more during some events). During summer over the eastern United States (EUS), when photochemical production from regional anthropogenic emissions peaks, NAB is largely uncorrelated with observed O 3 and is lower than at high-altitude sites (average values of ~20-30 ppb). We identify four processes that contribute substantially to model differences in specific regions and seasons: lightning NO x , biogenic isoprene emissions and chemistry, wildfires, and stratosphere-to-troposphere transport. Differences in how the models represent these processes contribute more to uncertainty in NAB estimates than the choice of horizontal resolution within a single model.
We propose that future efforts seek to constrain these processes through targeted analysis of multi-model simulations evaluated with observations of O 3 and related species from multiple platforms, thereby reducing the uncertainty in the NAB estimates needed for air quality planning.


Introduction
The United States Environmental Protection Agency (U.S. EPA) sets National Ambient Air Quality Standards (NAAQS) to protect public health and environmental welfare. Under the Clean Air Act, ground-level ozone (O 3 ) is regulated as a criteria air pollutant, with a review every five years to assess and incorporate the best available scientific evidence. Following these reviews, the O 3 NAAQS threshold has been lowered from 0.08 ppm in 1997 to the current 0.075 ppm (75 ppb) set in 2008, with proposals calling for even lower thresholds within a range of 60-70 ppb on the basis of the latest health evidence (Federal Register, 2010). To understand how the O 3 NAAQS can most effectively be attained, a fundamental, quantitative understanding of background O 3 over the United States, in both its magnitude and its variability, is needed. Here we update the review of prior NAB estimates by McDonald-Buller et al. (2011) (Table 1) and, for the first time, compare simulations from two independent models (GEOS-Chem and GFDL AM3) in the context of observational constraints, with a focus on spatial, seasonal, and daily variability. Differences between the models provide a first estimate of the uncertainty in our quantitative understanding. The type of process-oriented multi-model approach demonstrated here, tied closely to in situ and space-based observations, can harness the strengths of individual models to provide information requested by air quality managers during both the standard-setting and implementation processes.
The term "background" is ambiguous, and several definitions are used in practice to estimate it from observations and models (see, e.g., the discussion in Fiore et al., 2003). The U.S. EPA defines North American Background (NAB) as the O 3 levels that would exist in the absence of continental North American (i.e., Canadian, U.S., and Mexican) anthropogenic emissions (EPA, 2006). Background O 3 defined this way includes: natural O 3 produced photochemically from non-methane volatile organic compounds (NMVOC), nitrogen oxides (NO x ), and carbon monoxide (CO) emitted by natural sources such as vegetation and wildfires; O 3 produced from precursor emissions outside of North America and from global methane; and O 3 transported from the stratosphere. This definition makes NAB a model construct, estimated in simulations in which North American anthropogenic emissions are set to zero. The desire to quantify the impact of Canadian and Mexican emissions on NAB O 3 has led to the term "U.S. background", a parallel model construct estimated by setting only U.S. anthropogenic emissions to zero.
The development of effective State Implementation Plans (SIPs), by which states demonstrate how non-attainment regions will reach compliance with the NAAQS, requires an accurate assessment of the contributions of local, regional, and background sources to individual high-O 3 events. The Clean Air Act includes a provision for 'exceptional events', whereby high-O 3 events due to natural causes (such as wildfires or stratospheric intrusions) or foreign influence (e.g., Asian pollution) can be exempted from counting towards non-attainment status (Federal Register, 2007). Modeling the specific components of NAB can aid in interpreting such events, including attributing them to specific sources.
In the previous review of the O 3 NAAQS, the U.S. EPA considered NAB estimates from the GEOS-Chem model for a single year (Fiore et al., 2003), the only estimates documented in the published literature at that time. Recent work has updated those estimates and considered additional years (Wang et al., 2009; Zhang et al., 2011), and has compared them with NAB in regional models that use GEOS-Chem boundary conditions (Emery et al., 2012; Mueller and Mallard, 2011). The first NAB estimates with an independent global model (GFDL AM3; hereafter AM3; Table 2) were found to reach 60-75 ppb episodically over the western United States in spring (Lin et al., 2012a).
By contrast, GEOS-Chem estimated a maximum NAB of 65 ppb, and the AM3 NAB was typically ~10 ppb higher than the GEOS-Chem NAB on days when observations exceeded 70 ppb (Lin et al., 2012a). These studies, however, focused on different simulation years. Here we examine the AM3 and GEOS-Chem NAB estimates in a fully consistent and process-oriented manner for the year 2006, drawing on a multidecadal AM3 simulation to provide context for the single-year intercomparison. We include an evaluation of their base simulations with ground-based and space-based observations to identify conclusions that are robust across modeling systems, as well as situations where observation-based constraints can be most effective in reducing uncertainty.

Review of prior model estimates for NAB and its components
We focus here on model estimates for NAB using the U.S. EPA definition, which relies on simulations with North American anthropogenic emissions set to zero. Earlier reviews synthesize observations relevant for evaluating base model simulations at remote sites (McDonald-Buller et al., 2011; Reid et al., 2008; Vingarzan, 2004). Even with the same approach, model estimates will differ because of different representations of natural emissions and the choice of simulation year, since meteorological variability alters the balance between transported and regionally produced O 3 . Table 1 summarizes prior model estimates. Despite quantitative differences, a basic consensus emerges that the highest NAB levels generally occur during spring in high-altitude regions of the western U.S. (WUS), with the lowest NAB levels in low-altitude EUS regions during summer. The summertime minimum reflects the peak in regional photochemistry, which leads to accumulation of O 3 generated from regional precursors while also shortening the lifetime of O 3 mixing downward into the photochemically active boundary layer (see, e.g., Fiore et al., 2002). At high-altitude WUS sites, models consistently indicate a correlation between NAB levels and total O 3 during spring (Emery et al., 2012; Fiore et al., 2003; Lin et al., 2012a; Lin et al., 2012b; Zhang et al., 2011), implying that enhanced NAB levels play a role in raising total O 3 , including above the NAAQS threshold. While these results are qualitatively consistent across several modeling platforms, the models differ quantitatively in their estimates of NAB and its specific sources.
A few studies report the annual fourth-highest maximum daily 8-hour average (MDA8) NAB value, which represents the lowest level at which an O 3 standard would be attainable even if all North American anthropogenic emissions were eliminated.
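As a minimal sketch of this metric, assuming a contiguous, gap-free hourly record that starts at local midnight and ignoring the EPA data-completeness and substitution rules, the MDA8 value for each day is the largest of the 8-hour running means that begin during that day, and the fourth-highest value is the fourth entry of the descending-sorted daily series. The function names and the synthetic data are hypothetical.

```python
from typing import List, Optional, Sequence


def mda8(hourly: Sequence[float]) -> List[float]:
    """Maximum daily 8-hour average (MDA8) for each complete day.

    Assumes hourly[0] is hour 0 of the first day and the record has no gaps.
    Each day considers the 8-hour windows starting at hours 0-23 of that day;
    a window may extend into the following day, and windows that run past the
    end of the record are skipped. No data-completeness rules are applied.
    """
    n = len(hourly)
    daily = []
    for day in range(n // 24):
        means = [
            sum(hourly[start:start + 8]) / 8.0
            for start in range(day * 24, day * 24 + 24)
            if start + 8 <= n
        ]
        daily.append(max(means))
    return daily


def nth_highest(values: Sequence[float], n: int = 4) -> Optional[float]:
    """Return the n-th highest value (n = 4 for the O3 design metric)."""
    ranked = sorted(values, reverse=True)
    return ranked[n - 1] if len(ranked) >= n else None


# Synthetic example: a 30 ppb baseline with an 8-hour afternoon plateau
# (hours 12-19) whose height changes from day to day.
peaks = [45.0, 80.0, 62.0, 71.0, 58.0, 66.0]
hourly = []
for p in peaks:
    hourly.extend([30.0] * 12 + [p] * 8 + [30.0] * 4)

daily = mda8(hourly)
print(daily)               # equals the plateau heights, since each plateau lasts exactly 8 h
print(nth_highest(daily))  # 62.0
```

Applied to daily NAB from a zero-North-American-anthropogenic-emissions simulation rather than to total O 3 , the same two steps yield the fourth-highest MDA8 NAB statistic discussed above.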
Differences in metrics and simulation years complicate direct comparison of the ranges across modeling systems in Table 1 (Wild et al., 2012). More recent increases in Asian emissions may have additionally raised WUS NAB by up to 3 ppb in spring between 2001 and 2006 (Zhang et al., 2008). This Asian component of NAB, as well as European contributions and global anthropogenic methane, has received particular attention under the UNECE Task Force on Hemispheric Transport of Air Pollution (Reidmiller et al., 2009; TFHTAP, 2010; Wild et al., 2012). Recent studies have further documented the mechanisms by which Asian pollution can reach surface air over the WUS (e.g., Brown-Steiner and Hess, 2011; Lin et al., 2012b). Wang et al. (2009) additionally estimated summertime U.S. background (USB) for 2001 conditions, which includes the influence of Canadian and Mexican anthropogenic emissions (excluding methane). They found that average USB is 4 ppb higher than NAB over the contiguous United States, and up to 33 ppb higher during transport events at U.S. border sites directly downwind of these sources. In the model, Canadian and Mexican sources often contributed more than 10 ppb to total surface O 3 when it exceeded the 75 ppb NAAQS threshold in eastern Michigan, western New York, New Jersey, and southern California (Wang et al., 2009).
The natural portion of NAB has been quantified in a few modeling studies and generally follows the same patterns as total NAB, with maximum levels during spring in high-altitude regions of the WUS (Table 1). Natural sources of NAB can also contribute to high-O 3 events. Observational evidence points to events of mainly stratospheric origin at high-altitude sites in the WUS (e.g., Langford et al., 2009), but these efforts are hampered by a sparse observational network. Models are useful for quantifying the frequency of these events and their contribution to seasonal mean O 3 levels. For decades, quantifying the stratospheric contribution to the troposphere, and particularly to surface air, has been contentious, with the controversy rooted in imprecise methods for quantifying this component accurately, as summarized in Lin et al. (2012a) (see their Section 2.3). Lin et al. (2012a) demonstrate that stratospheric intrusions play an important role in driving variability, including high-O 3 events, at high-altitude WUS sites during spring. High altitude greatly increases susceptibility to stratospheric influence: for days when observed O 3 exceeds 70 ppb at monitoring sites in the western states of EPA Region 8 during April-June of 2010, Lin et al. (2012a) find that median stratospheric O 3 in the AM3 model is 10 ppb lower at the lower-elevation AQS sites than at high-elevation sites. Episodic wildfires also contribute to high-O 3 events (e.g., Jaffe and Wigder, 2012; McKeen et al., 2002; Mueller and Mallard, 2011), though Singh et al. (2010) found little O 3 production in wildfire plumes in California unless they mixed with an urban plume. The role of stratospheric intrusions and wildfires in contributing to differences between AM3 and GEOS-Chem high-NAB events is considered in Section 3.4.

North American background estimates from two independent global models
We compare NAB estimates for March through August of 2006, the period analyzed previously by Zhang et al. (2011), from two independent global models: the GEOS-Chem global chemistry-transport model (CTM) and the AM3 chemistry-climate model nudged to reanalysis winds. The models represent the processes controlling the abundance and distribution of tropospheric O 3 differently (Table 2). We evaluate the base O 3 simulations with hourly measurements from a ground-based network of monitoring sites and with monthly averaged retrievals from satellite instruments that are sensitive to O 3 in the mid-troposphere. We note that AM3 may underestimate interannual variability in some regions because it uses climatological inventories for soil NO x and wildfire emissions. GEOS-Chem has previously been applied to estimate NAB (Fiore et al., 2002), including for the 2001 O 3 season (Fiore et al., 2003; Wang et al., 2009) and the 2006-2008 seasons (Zhang et al., 2013), with extensive evaluation against in situ and satellite observations. The AM3 model has previously been applied at ~50 km horizontal resolution globally to estimate the impacts of Asian pollution and stratospheric intrusions on surface O 3 over the WUS during March through June of 2010. Extensive evaluation with in situ and space-based observations for that period shows that it represents the subsidence of Asian and stratospheric O 3 plumes over the WUS (Lin et al., 2012a; Lin et al., 2012b). The AM3 simulation used here has ~200 km horizontal resolution and is multi-decadal (1980-2007, with the first year discarded as initialization, leaving 27 years), enabling us to place 2006 in the context of interannual variability (Section 4). Both models estimate NAB in U.S. surface air by setting North American anthropogenic emissions of aerosol and O 3 precursors to zero.

Model NAB Simulations, Observations and Analysis Methods
Anthropogenic sources include fossil and biofuel combustion (including aircraft and ship emissions within the domain), agricultural waste burning, and fertilizer application.

For anthropogenic emissions inventories, GEOS-Chem uses the 2005 National Emissions Inventory for the U.S., while AM3 uses the historical ACCMIP emissions developed in support of IPCC AR5 (Lamarque et al., 2011; Lamarque et al., 2010).
Differences between the North American anthropogenic nitrogen emission inventories (5.58 and 6.67 Tg N a⁻¹ in AM3 and GEOS-Chem, respectively; 4.85 and 5.32 Tg N a⁻¹ for the United States), while crucial for the standard simulations compared with observations, should be irrelevant for the NAB simulations, in which these emissions are set to zero. Shortcomings in the models' representation of anthropogenic emissions and isoprene chemistry do not necessarily preclude their use for examining NAB, particularly its daily to interannual variability driven by transported components of NAB, such as O 3 associated with stratospheric intrusions and O 3 produced from lightning NO x , wildfires, or methane.
The ground-based U.S. EPA Clean Air Status and Trends Network (CASTNet) sites were located to minimize the influence of polluted urban air (Baumgardner et al., 2002) and thus are useful for evaluating O 3 simulated by coarse-grid models. Columns retrieved from satellite instruments are sensitive to free-tropospheric O 3 and enable a spatially continuous evaluation of the simulated background available to subside into surface air. We use direct tropospheric O 3 retrievals from both the Ozone Monitoring Instrument (OMI) and the Tropospheric Emission Spectrometer (TES) (Beer, 2006). All data are processed using a single fixed a priori as described in Zhang et al. (2010). Previous validation of these retrievals against in situ and aircraft measurements indicates an accuracy to within 5 ppb at 500 hPa (Zhang et al., 2010, and references therein). We remove the average bias of the satellite retrievals relative to sondes at northern mid-latitudes before comparing with the model mid-tropospheric O 3 distributions, and we apply the appropriate satellite averaging kernels to the model daily O 3 fields for direct comparison with the retrieved satellite O 3 columns (Zhang et al., 2010).
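The two processing steps above can be sketched as follows, assuming the standard linear form for applying a retrieval averaging kernel, x̂ = x_a + A(x_model − x_a), where x_a is the fixed a priori profile and A the averaging-kernel matrix, and treating the sonde-based offset as a single scalar mean bias. This is an illustration in linear mixing-ratio space on a shared vertical grid; TES retrievals are performed in log space, and the exact procedure of Zhang et al. (2010) may differ. The function names and numbers are hypothetical.

```python
from typing import List, Sequence


def apply_averaging_kernel(
    x_model: Sequence[float],
    x_apriori: Sequence[float],
    kernel: Sequence[Sequence[float]],
) -> List[float]:
    """Smooth a model O3 profile (ppb) with a retrieval averaging kernel.

    Implements x_hat = x_a + A (x_model - x_a) on a common vertical grid,
    i.e. what the instrument would report if the atmosphere matched the model.
    """
    n = len(x_model)
    diff = [x_model[j] - x_apriori[j] for j in range(n)]
    return [
        x_apriori[i] + sum(kernel[i][j] * diff[j] for j in range(n))
        for i in range(n)
    ]


def remove_mean_bias(retrieved: Sequence[float], mean_bias: float) -> List[float]:
    """Subtract a mean satellite-minus-sonde bias (ppb) from retrieved values."""
    return [v - mean_bias for v in retrieved]


# Hypothetical three-level profiles (ppb), lowest level first.
x_a = [40.0, 60.0, 80.0]
x_model = [50.0, 70.0, 100.0]
A = [[0.5, 0.2, 0.0],
     [0.1, 0.6, 0.1],
     [0.0, 0.2, 0.4]]
print(apply_averaging_kernel(x_model, x_a, A))        # approximately [47.0, 69.0, 90.0]
print(remove_mean_bias([72.0, 65.0], mean_bias=4.0))  # [68.0, 61.0]
```

Comparing the smoothed model value at the 500 hPa level with the bias-corrected retrieval, rather than comparing raw model output, keeps the comparison from penalizing the model for vertical structure the instrument cannot resolve.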

Regional and seasonal NAB estimates
Seasonal mean MDA8 NAB O 3 is consistently higher over the WUS than the EUS in both models (Figure 1). During spring, AM3 simulates higher NAB over the high-altitude western U.S., which we attribute at least partially to a larger stratospheric influence in AM3 (Lin et al., 2012a) than in GEOS-Chem. It is not clear whether AM3 actually simulates more stratosphere-to-troposphere exchange of O 3 , or whether it mixes free-tropospheric air (including the stratospheric component) into the planetary boundary layer more efficiently. Evaluation with daily O 3 sondes will be important to ascertain whether the models represent the vertical structure of O 3 throughout the troposphere and lower stratosphere, as shown for AM3 during the 2010 CalNex field campaign (Lin et al., 2012a; Lin et al., 2012b). During summer, differences in the simulated spatial patterns of NAB over the western U.S. are influenced by differences in the lightning NO x sources, as discussed further in Section 3.4.3. Figure 2 shows the spatial patterns of the fourth-highest NAB value between March 1 and August 31. Because O 3 typically peaks during summer in polluted regions, we expect the fourth-highest value during this six-month period to be a reasonable proxy for the annual statistic. AM3 simulates the highest values over Colorado, whereas GEOS-Chem places the highest values over New Mexico (Figure 2), the latter reflecting excessive NAB produced from lightning NO x in GEOS-Chem (Zhang et al., 2013). Because these processes peak in different seasons, AM3 simulates the fourth-highest values during spring over much of Colorado, whereas GEOS-Chem simulates peak values over much of New Mexico during August (Figure 2). Over Minnesota and Wisconsin, GEOS-Chem generally produces the fourth-highest values in spring, but AM3 suggests they occur in summer.
The fourth-highest values often occur during months when model biases are largest (Section 3.4), indicating that bias-correction techniques may be necessary for quantitatively accurate NAB estimates at specific locations and times. Over the northeastern states and the west coast, the fourth-highest values generally occur during spring, though later dates occur in the southeastern states, with occurrences generally later in GEOS-Chem than in AM3. In the following sections, we analyze the model NAB estimates in the context of evaluating the total surface O 3 simulations with both space- and ground-based observations, a first step towards developing the process-level knowledge needed for accurate bias correction.
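Mapping when the fourth-highest value occurs, as in the timing comparison above, amounts to keeping each date attached to its daily value while ranking. A minimal sketch, assuming a daily MDA8 NAB series for one grid cell and a meteorological-season split (March-May as spring, June-August as summer); the function names and values are hypothetical.

```python
from datetime import date
from typing import Sequence, Tuple


def nth_highest_with_date(
    dates: Sequence[date], values: Sequence[float], n: int = 4
) -> Tuple[float, date]:
    """Return the n-th highest daily value together with its date.

    Python's sort is stable (also with reverse=True), so ties keep their
    original chronological order.
    """
    ranked = sorted(zip(values, dates), key=lambda pair: pair[0], reverse=True)
    return ranked[n - 1]


def season_of(d: date) -> str:
    """Classify a date as spring (MAM), summer (JJA), or other."""
    if d.month in (3, 4, 5):
        return "spring"
    if d.month in (6, 7, 8):
        return "summer"
    return "other"


# Hypothetical daily MDA8 NAB values (ppb) at one grid cell.
dates = [date(2006, 3, 10), date(2006, 4, 2), date(2006, 5, 20),
         date(2006, 6, 15), date(2006, 7, 4), date(2006, 8, 12)]
nab = [58.0, 66.0, 61.0, 49.0, 70.0, 63.0]
value, when = nth_highest_with_date(dates, nab)
print(value, when, season_of(when))  # 61.0 2006-05-20 spring
```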

Constraints from space-based observations
With the exception of O 3 produced within the U.S. boundary layer from CH 4 or from natural NMVOC and NO x , NAB in surface air mixes downward from the free troposphere. We use 500 hPa products retrieved from both the OMI and TES instruments aboard the NASA Aura satellite to evaluate the potential for space-based constraints on NAB. During spring, AM3 simulates a stronger north-to-south O 3 decrease in the mid-troposphere than GEOS-Chem (Figure 3). The satellite retrievals from both instruments suggest a stronger gradient than simulated with GEOS-Chem, which generally underestimates O 3 in the northern half of the United States compared both to TES (by 5-15 ppb) and to OMI (by up to 10 ppb). In contrast, AM3 mid-tropospheric O 3 is higher than the satellite products in the northern half of the domain, with a closer match to the OMI retrievals (generally within 5 ppb over the United States) than to TES (positive biases up to 10-20 ppb). Prior direct evaluation of AM3 with O 3 sondes indicates biases of up to 10 ppb at the high northern latitude sites of Alert and Resolute at 500 and 800 hPa, with little bias in spring at the mid-latitude North American sites of Edmonton, Trinidad Head, Boulder and Wallops Island (Naik et al., 2013), roughly consistent with the biases relative to OMI.
Both satellite instruments indicate a general decrease in mid-tropospheric O 3 from spring into summer over the western and northern United States, but an increase over several southeastern states, northern Mexico, and the Gulf of Mexico (compare Figures 3 and 4). The summertime spatial pattern of U.S. O 3 observed from space is broadly consistent with that estimated by interpolating upper tropospheric ozonesonde measurements during August of 2006 (Cooper et al., 2007). While the spring-to-summer increases in the mid-troposphere over the EUS may include a contribution from lofting of O 3 produced from regional anthropogenic emissions, there is likely also a contribution from the larger free-tropospheric lightning NO x source during summer. GEOS-Chem simulates a summertime mid-tropospheric O 3 enhancement at mid-latitudes centered over the United States, whereas AM3 simulates a gradient with O 3 generally increasing from southwest to northeast (Figure 4). Over Canada, AM3 tends to be high in summer by up to 15-20 ppb relative to both retrievals, similar to the springtime comparison with TES, but with larger biases relative to OMI than in spring.
We expect discrepancies between AM3 and observations during summer in forested boreal regions due to the use of a climatological wildfire inventory and the vertical distribution used to prescribe those emissions (Dentener et al., 2006), which lofts fire effluents into the mid-troposphere where they can efficiently produce O 3 and PAN (see also Section 3.4.2). GEOS-Chem includes fire emissions representative of the year 2006 and restricts emission to the planetary boundary layer, and the mid-tropospheric O 3 biases versus the satellite products are smaller than AM3 in this region. The model differences in mid-tropospheric O 3 distributions shown in Figures 3 and 4 likely contribute to the different spatial distributions of simulated NAB at the surface, specifically the higher NAB estimated with AM3 over the northern United States and Canada relative to the NAB estimated with GEOS-Chem (Figures 1 and 2).
In both Figures 3 and 4, the models are generally more consistent with the OMI retrievals, which likely reflects differences in the vertical sensitivity of the TES and OMI instruments. While the satellite retrievals provide useful qualitative constraints on the simulated mid-tropospheric distributions, the disagreement between OMI and TES over many locations (grey boxes in Figures 3 and 4) limits their quantitative utility. The higher sampling frequency possible from instruments on geostationary satellites such as TEMPO (Hilsenrath and Chance, 2013) should improve the potential for space-based constraints on free-tropospheric and near-surface distributions.
We can nevertheless glean additional insights into the model vertical distributions of NAB by examining differences in the models sampled with the two different averaging kernels. For example, over Canada, GEOS-Chem indicates that OMI would measure higher O 3 than TES, whereas AM3 indicates that TES should retrieve higher O 3 than OMI during both seasons. In spring, the retrieved OMI product is generally higher than TES over this region, as simulated by GEOS-Chem. GEOS-Chem is generally within 10 ppb of the OMI product, with a tendency to underestimate springtime mid-tropospheric O 3 over Canada, whereas AM3 is generally within 5 ppb of OMI over much of the United States, with a tendency towards a positive bias. During summer, TES is higher than OMI over Canada. The high O 3 bias over the EUS in AM3 is confined close to the surface (Figure 5), since AM3 tends to underestimate free-tropospheric O 3 , particularly over the convectively active Gulf of Mexico region, where lightning NO x is expected to be an important source of NAB O 3 . We conclude that the estimates from the models could bracket the true NAB in many cases, but the ability of the models to bracket the satellite measurements does not preclude biases in the NAB estimates. This conclusion is examined further below by comparing the two models with ground-based measurements.

Constraints from ground-based measurements
We use the CASTNet MDA8 O 3 observations to further constrain the model NAB estimates through an evaluation of the base simulations of total surface O 3 , which include all anthropogenic emissions. Since NAB depends strongly on altitude (Figure 1; references in Table 1), the remainder of our analysis separates the data by altitude to gain insight into the different processes shaping NAB distributions. Specifically, we divide the CASTNet sites into two groups: (1) sites below 1.5 km elevation (low-altitude sites), primarily in the EUS, and (2) Intermountain West CASTNet sites above 1.5 km elevation (high-altitude sites). The second group includes all high-altitude CASTNet sites except those in California. At the low-altitude sites, AM3 exhibits a large positive bias in total surface O 3 in all months, largest during summer. The summertime worsening of the bias implies a problem with O 3 produced from regional emissions, with isoprene-NO x -O 3 chemistry a likely culprit given its different treatment in the models (Table 2; see Section 3.5.3). Both models show declining NAB levels from spring into summer, though the amplitude of the seasonal cycle is smaller in GEOS-Chem than in AM3. Except in March and April, the AM3 discrepancy with observations is much larger than the difference between the GEOS-Chem and AM3 NAB estimates. If we focus on March and April and assume that the model biases at both the high- and low-altitude sites are entirely due to problems representing NAB, then the models would be more consistent in their NAB estimates. We conclude that the AM3 NAB at low-altitude sites is too high in March, since we expect NAB to be lower than the observed total O 3 ; it is nevertheless possible that NAB could actually be higher in an atmosphere with less NO x than under current conditions, owing to more efficient O 3 production and slower chemical loss.
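The altitude split, and the thought experiment of attributing the entire model bias in total O 3 to NAB, can be sketched as follows. The site names, states, elevations, and O 3 values are hypothetical; approximating the Intermountain West criterion by excluding California, and leaving a site at exactly 1.5 km unassigned, are assumptions (the text specifies only "below" and "greater than" 1.5 km).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Site:
    name: str
    state: str
    elevation_km: float


def altitude_group(site: Site, threshold_km: float = 1.5) -> Optional[str]:
    """Assign a site to the 'low' or 'high' altitude group.

    High-altitude sites in California are excluded from the high group;
    a site exactly at the threshold is left unassigned (an assumption).
    """
    if site.elevation_km < threshold_km:
        return "low"
    if site.elevation_km > threshold_km and site.state != "CA":
        return "high"
    return None


def bias_adjusted_nab(nab_model: float, total_model: float, total_obs: float) -> float:
    """NAB after removing the full total-O3 bias, i.e. assuming all of the
    model error in total surface O3 comes from NAB."""
    return nab_model - (total_model - total_obs)


sites = [Site("site_A", "PA", 0.4), Site("site_B", "CO", 2.9),
         Site("site_C", "CA", 1.6), Site("site_D", "AZ", 2.1)]
groups: Dict[str, List[str]] = {"low": [], "high": []}
for s in sites:
    g = altitude_group(s)
    if g is not None:
        groups[g].append(s.name)

print(groups)                               # {'low': ['site_A'], 'high': ['site_B', 'site_D']}
print(bias_adjusted_nab(45.0, 62.0, 50.0))  # 33.0
```

The adjustment is an upper bound on how much of a total-O 3 bias NAB could absorb; if part of the bias comes from regional production, the true NAB correction is smaller.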

Seasonal Variability
At the high-altitude sites in summer, the GEOS-Chem overestimate of observed O 3 has been attributed previously to an overestimate of O 3 produced from lightning NO x , which arises from prescribing a higher NO x production per flash at mid-latitudes and spatially scaling the source to match LIS-OTD climatological flash counts (Murray et al., 2012); this scaling may lead to regional errors for a specific year (Zhang et al., 2013). The difference between the two models' NAB estimates in August is larger than the difference between simulated and observed total O 3 , implying that agreement with observations, while a necessary condition, does not sufficiently constrain the NAB estimates. Frequency distributions and summary statistics of total and NAB MDA8 O 3 are shown in Figure 6 and Table 3. We additionally include in Figure 6 estimates from a coarse-resolution version of the GEOS-Chem model (green) to examine the extent to which differences in horizontal resolution contribute to the different NAB and total O 3 estimates in AM3 versus GEOS-Chem. In all cases, NAB (dotted lines) differs more between GEOS-Chem and AM3 than between the high- and low-resolution versions of GEOS-Chem. This conclusion also holds for the total O 3 distributions in spring. In summer, however, the total O 3 distributions in GEOS-Chem are more sensitive to the choice of horizontal resolution, presumably reflecting the larger contribution from local-to-regional photochemical production during this season and the importance of spatially resolving domestic anthropogenic and natural emissions. Emery et al. (2012) found that the higher-resolution CAMx model generally simulated higher WUS NAB than a coarse-resolution version of GEOS-Chem, and better agreement has been noted between CAMx and the higher-resolution version of GEOS-Chem (EPA, 2013).
Simulation of higher WUS NAB by higher resolution models (Emery et al., 2012;Lin et al., 2012a) likely reflects improved resolution of mesoscale meteorology at higher resolution and the damping of vertical eddy transport at coarser resolution (Wang et al., 2004;Zhang et al., 2011).

Daily Variability
AM3 simulates a wider NAB range than GEOS-Chem ( Figure 6 and Table 3).
This wider range of NAB may contribute to the wider total surface O 3 distribution in the AM3 standard simulation relative to GEOS-Chem, which aligns more closely with the observed variability, except for O 3 simulated with the high-resolution GEOS-Chem model in summer at high-altitude sites. The relative skill of AM3 in capturing the variability of NAB despite its generally high bias implies that AM3 is useful for process-level analysis and for quantifying day-to-day variability. We underscore the need for future efforts to focus on specific processes, and we describe some first steps towards this goal below (Section 3.4).
In Table 3, we further partition statistics for total and NAB O 3 in surface air into average versus high-O 3 days. We use observed values, rather than the simulated values used in Zhang et al. (2011), to select high-O 3 days so that the same temporal subset is sampled from both models; using the simulated total O 3 values would yield subsets of different sizes given the individual model biases. During spring, the models robustly estimate NAB to be ~10 ppb higher on average at high-altitude than at low-altitude CASTNet sites, though AM3 estimates higher NAB levels than GEOS-Chem. During summer, the models also estimate higher NAB at high-altitude than at low-altitude sites, and average NAB levels decrease from spring to summer at low-elevation sites. GEOS-Chem suggests little change from spring to summer in average high-altitude NAB, whereas AM3 simulates a decrease of over 10 ppb. At the high-altitude sites, both models suggest that NAB increases as total O 3 increases, although the sample size is small for events above 75 ppb and the average values for the different data subsets all fall within one standard deviation of each other. At the low-altitude sites, there is little change in average NAB when selecting for observed values exceeding 60, 70, or 75 ppb. The variability in NAB, as measured by the standard deviation in Table 3, is similar in the two models at the low-elevation sites, but AM3 simulates more variability in NAB at the high-altitude sites than GEOS-Chem, particularly on high-O 3 days.
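The day-selection logic above can be sketched as follows: days are chosen using the observed O 3 only, so every model is averaged over exactly the same subset. This is a minimal sketch assuming "exceeding" means strictly greater than the threshold and that Table 3 reports a sample standard deviation; the model labels and values are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Optional, Sequence, Tuple


def nab_stats_by_observed_threshold(
    obs: Sequence[float],
    nab_by_model: Dict[str, Sequence[float]],
    thresholds: Sequence[Optional[float]] = (None, 60.0, 70.0, 75.0),
) -> Dict[Tuple[str, Optional[float]], Tuple[int, float, Optional[float]]]:
    """Mean and sample standard deviation of NAB on days selected by OBSERVED O3.

    A threshold of None means 'all days'. Returns
    {(model, threshold): (n_days, mean, std)}; std is None for a single day.
    """
    out = {}
    for model, nab in nab_by_model.items():
        for thr in thresholds:
            subset = [v for o, v in zip(obs, nab) if thr is None or o > thr]
            if not subset:
                continue
            sd = stdev(subset) if len(subset) > 1 else None
            out[(model, thr)] = (len(subset), mean(subset), sd)
    return out


# Hypothetical observed MDA8 O3 and simulated NAB (ppb) at one site.
obs = [50.0, 62.0, 71.0, 76.0, 58.0, 80.0]
nab = {"model_1": [30.0, 35.0, 40.0, 44.0, 31.0, 46.0],
       "model_2": [28.0, 30.0, 37.0, 39.0, 27.0, 41.0]}
stats = nab_stats_by_observed_threshold(obs, nab)
print(stats[("model_1", 70.0)])  # (n_days, mean, std) on days with observed O3 > 70 ppb
```

Because the selection depends only on `obs`, the day counts match across models for every threshold, which is the property the text relies on.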
The time series in Figure 7 provide local-scale evidence for our assessment of regional and seasonal biases. At the two western U.S. sites in Figure 7 (Gothic, CO and Grand Canyon NP, AZ), the 6-month average NAB is nearly the same in both models, but this reflects little seasonal variation in the GEOS-Chem NAB (thin blue line) versus a sharp seasonal decline from spring into summer in AM3 (thin red line). The standard deviation is twice as large in AM3 as in GEOS-Chem, consistent with the frequency distributions of NAB in Figure 6 (left side) and with the observed variability.
We further probe the time series in Figure 7 by calculating correlation statistics separately for the spring and summer seasons (Table 4). During spring, the correlations at the WUS sites are higher in GEOS-Chem (Table 4), but AM3 maintains the same level of correlation into summer at the Colorado site while the correlation improves into summer at the Arizona site. Table 4 also shows the correlation of the NAB estimates versus the simulated total O 3 . Over the WUS sites, the models robustly indicate that variability in NAB drives a substantial portion of the total surface O 3 variability in both seasons, but with a stronger influence (higher correlations) during spring.
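Computing the seasonal correlations described above amounts to splitting each daily series by season before calculating a correlation coefficient. A minimal sketch, assuming Pearson's r on daily MDA8 values and March-May / June-August seasons; the function names and data are hypothetical.

```python
from datetime import date
from math import sqrt
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def seasonal_r(
    dates: Sequence[date], x: Sequence[float], y: Sequence[float]
) -> Dict[str, float]:
    """Correlate two daily series separately for spring (MAM) and summer (JJA).

    Seasons with fewer than three days are skipped.
    """
    seasons = {"spring": (3, 4, 5), "summer": (6, 7, 8)}
    out = {}
    for name, months in seasons.items():
        idx = [i for i, d in enumerate(dates) if d.month in months]
        if len(idx) >= 3:
            out[name] = pearson_r([x[i] for i in idx], [y[i] for i in idx])
    return out


# Hypothetical observed total O3 and simulated NAB (ppb) at one site.
days = [date(2006, 4, 1), date(2006, 4, 2), date(2006, 4, 3),
        date(2006, 7, 1), date(2006, 7, 2), date(2006, 7, 3)]
obs = [55.0, 60.0, 70.0, 60.0, 70.0, 80.0]
nab = [40.0, 45.0, 55.0, 30.0, 28.0, 26.0]
print(seasonal_r(days, obs, nab))  # spring r close to +1, summer r close to -1
```

The summer case mimics the EUS behavior in the text, where NAB can be anti-correlated with total O 3 even when a full-period correlation would mask the seasonal contrast.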
Despite its summertime high bias at the two EUS sites (M.K. Goddard, PA and Georgia Station, GA), AM3 correlates at least as well with the observations as GEOS-Chem (Figure 7, Table 4). At these EUS sites, the NAB in both models is poorly correlated, and in some cases anti-correlated, with the total simulated surface O3.
An important implication is that the highest total surface O3 events are generally decoupled from the highest NAB events, consistent with the current understanding that regional pollution is the dominant influence on total O3 distributions in this region.

Processes contributing to inter-model differences in total and NAB surface O3
We examine here the role of specific processes in contributing to differences between the GEOS-Chem and AM3 Base and NAB simulations. Superimposed in Figure 7 are results from a separate simulation (Lin et al., 2013) in which a stratospheric O3 tracer (O3Se90) was available, tagged relative to the e90 tropopause (Prather et al., 2011) as described in Lin et al. (2012a). The correlation of the O3Se90 tracer with the NAB in AM3 is also provided in Table 4 Figure 7) suggests that the AM3 model is simulating surface O3 enhancements associated with a stratospheric intrusion and consistent with the observed spatial pattern of enhanced ground-level O3 at the CASTNet sites. Figure 7 and Table 4 further suggest that these events drive much of the variability in NAB at high-altitude western sites in spring, consistent with earlier findings for April through June of 2010 (Lin et al., 2012a).
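The tagging step of a stratospheric O3 tracer can be illustrated on a single model column. This sketch assumes that the e90 tropopause is the 90 ppb isopleth of the e90 tracer and that the tracer is reset to total O3 wherever e90 falls below that value, while elsewhere the transported tracer is left untouched. Tropospheric chemical loss of the tracer, which a full implementation needs, is omitted, and the function and constant names are hypothetical.

```python
from typing import List, Sequence

# Assumption: the e90 tropopause is the 90 ppb isopleth of the e90 tracer.
E90_TROPOPAUSE_PPB = 90.0


def tag_stratospheric_o3(
    e90_ppb: Sequence[float],
    o3_ppb: Sequence[float],
    o3s_ppb: Sequence[float],
) -> List[float]:
    """One tagging step for a stratospheric O3 tracer on a single column.

    Where e90 is below the threshold (stratospheric air), the tracer is reset
    to the total O3 mixing ratio; elsewhere the previously transported tracer
    value is kept. Tropospheric chemical loss of the tracer is NOT applied here.
    """
    return [
        o3 if e90 < E90_TROPOPAUSE_PPB else o3s
        for e90, o3, o3s in zip(e90_ppb, o3_ppb, o3s_ppb)
    ]


# Column from the surface upward (illustrative numbers only).
e90 = [120.0, 110.0, 95.0, 70.0, 30.0]
o3 = [40.0, 45.0, 60.0, 150.0, 800.0]
o3s = [5.0, 3.0, 0.0, 0.0, 0.0]
print(tag_stratospheric_o3(e90, o3, o3s))  # [5.0, 3.0, 0.0, 150.0, 800.0]
```

Because e90 is emitted at the surface, it decreases with altitude, so the reset only affects levels above the e90 tropopause; the tropospheric values of the tracer come from transport of previously tagged air.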

Wildfires over the EUS in spring and summer
There are several EUS events during spring and summer where AM3 simulates a localized spike in NAB that is not simulated by GEOS-Chem, which we attribute at least partially to the differing treatment of wildfire emissions in the models. In AM3, the recommendations from Dentener et al. (2006) are applied to vertically distribute biomass burning emissions, placing 40% of the total emissions between 3 and 6 km (see their   (Table 2). We find that the use of a year-specific fire inventory versus a climatology in AM3 leads to differences of 10 ppb for the June 28, 2006 event (not shown).
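The vertical placement of fire emissions described above can be illustrated by mapping a layered emission profile onto model levels. This is a sketch under stated assumptions: only the 40% placed between 3 and 6 km comes from the text, while the 60% spread evenly over 0-3 km is a hypothetical placeholder, not the Dentener et al. (2006) profile. The function name and the model level edges are also hypothetical.

```python
from typing import List, Sequence, Tuple

# (bottom_km, top_km, fraction of the column total). Only the 40% placed in
# 3-6 km comes from the text; the 0-3 km share is a HYPOTHETICAL placeholder.
HYPOTHETICAL_FIRE_PROFILE: List[Tuple[float, float, float]] = [
    (0.0, 3.0, 0.60),
    (3.0, 6.0, 0.40),
]


def distribute_emission(
    total: float,
    profile: Sequence[Tuple[float, float, float]],
    level_edges_km: Sequence[float],
) -> List[float]:
    """Map a layered emission profile onto model layers (edges ascending, km),
    assuming emissions are uniform in altitude within each profile layer."""
    out = []
    for lo, hi in zip(level_edges_km[:-1], level_edges_km[1:]):
        amount = 0.0
        for bottom, top, frac in profile:
            # Depth of the overlap between this model layer and the profile layer.
            overlap = max(0.0, min(hi, top) - max(lo, bottom))
            amount += total * frac * overlap / (top - bottom)
        out.append(amount)
    return out


edges = [0.0, 1.5, 3.0, 4.5, 6.0, 8.0]
print(distribute_emission(100.0, HYPOTHETICAL_FIRE_PROFILE, edges))
# [30.0, 30.0, 20.0, 20.0, 0.0]
```

Emitting at the surface corresponds to a profile that places the whole total in the lowest model layer, for example `[(0.0, 0.1, 1.0)]` when the first layer is at least 0.1 km deep, which is how the two treatments can put the same fire emissions at very different altitudes.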

Lightning NOx over the Southwestern United States in summer
GEOS-Chem produces nearly 9 times (roughly an order of magnitude) more lightning NOx than AM3 over the southwestern U.S. during summer (0.159 Tg N in GC versus 0.018 Tg N in AM3 within the region 26°N-42°N, 124°W-97°W), and the models further differ in the spatial distribution of the lightning NOx source (Table 2). The models differ markedly in their NAB estimates over this region in summer (e.g., Figures 1 and 2). This source has been reduced in a newer version of GEOS-Chem, decreasing simulated NAB O3 in these regions (Zhang et al., 2013).
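Regional totals such as the summertime lightning NOx sums quoted above are area-weighted sums over a latitude-longitude box. The sketch below assumes a time-integrated flux in kg N m-2 on a regular grid, exact spherical cell areas, and selection of cells by their centres; it is not either model's diagnostic code, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_M = 6.371e6  # mean Earth radius


def cell_area_m2(lat_s: float, lat_n: float, dlon_deg: float) -> float:
    """Exact area of a latitude-longitude cell on a sphere."""
    return (
        EARTH_RADIUS_M**2
        * math.radians(dlon_deg)
        * (math.sin(math.radians(lat_n)) - math.sin(math.radians(lat_s)))
    )


def regional_total_tg(
    flux_kg_m2: Sequence[Sequence[float]],
    lat_edges: Sequence[float],
    lon_edges: Sequence[float],
    box: Tuple[float, float, float, float],
) -> float:
    """Sum a gridded, time-integrated flux (kg N m-2, indexed [lat][lon]) over
    cells whose centres fall inside box = (lat_min, lat_max, lon_min, lon_max),
    with longitudes in degrees east (west negative). Returns Tg N."""
    lat_min, lat_max, lon_min, lon_max = box
    total_kg = 0.0
    for j in range(len(lat_edges) - 1):
        lat_c = 0.5 * (lat_edges[j] + lat_edges[j + 1])
        if not lat_min <= lat_c <= lat_max:
            continue
        for i in range(len(lon_edges) - 1):
            lon_c = 0.5 * (lon_edges[i] + lon_edges[i + 1])
            if not lon_min <= lon_c <= lon_max:
                continue
            area = cell_area_m2(lat_edges[j], lat_edges[j + 1], lon_edges[i + 1] - lon_edges[i])
            total_kg += flux_kg_m2[j][i] * area
    return total_kg / 1e9  # 1 Tg = 1e9 kg


SOUTHWEST_US = (26.0, 42.0, -124.0, -97.0)  # 26N-42N, 124W-97W

# Ratio of the regional totals quoted in the text (GC / AM3).
print(round(0.159 / 0.018, 1))  # 8.8
```

Selecting cells by their centres is a simplification; partial cells at the box edges could instead be weighted by their fractional overlap, which matters more for coarse grids such as ~2°x2°.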
During August at the two WUS sites in Figure 7, the models reverse their springtime ranking of simulated NAB, with GEOS-Chem NAB as much as 10-20 ppb higher than AM3 NAB. In notable contrast to spring, GEOS-Chem overestimates the observed O3 values. We attribute the summertime overestimate and the poor correlations of GEOS-Chem with the observed values at these two WUS sites (Figure 7, Table 4) to the lightning NOx source and subsequent transport.

Isoprene oxidation chemistry over the EUS in summer
Earlier work (e.g., Fiore et al., 2002, 2003) demonstrated that NAB is fundamentally different between the EUS and the WUS: the EUS is more strongly controlled by regional photochemistry, the O3 lifetime in the planetary boundary layer is as short as 1-2 days, and isoprene-NOx-O3 chemistry dominates much of the region from May through September (Jacob et al., 1995). At the two EUS sites in Figure 7 (M.K. Goddard, PA and Georgia Station, GA), we attribute some of the differences in the summertime simulations to the isoprene oxidation mechanism (Table 2), which would tend to lower NAB O3 in GEOS-Chem relative to AM3 because isoprene ozonolysis serves as a more important loss pathway for NAB in GEOS-Chem (Fiore et al., 2002; Mickley et al., 2001). These differences in isoprene oxidation chemistry could at least partially explain the higher NAB in AM3 during the isoprene emission season (i.e., a longer O3 lifetime in the AM3 boundary layer). The largest inter-model differences in NAB, however, occur in spring, when transported sources are more important than regional production involving natural sources.
The isoprene oxidation chemistry likely also contributes to the large summertime high bias in AM3 total surface O3. GEOS-Chem assumes a much higher yield of isoprene nitrates from the reaction of isoprene hydroxyperoxy radicals with NO and treats these nitrates as a permanent sink of NOx (Table 2). In contrast, AM3 assumes an 8% isoprene nitrate yield and allows 40% of the products to recycle back to NOx on the basis of observational constraints from field campaigns (Horowitz et al., 2007; Perring et al., 2009). Earlier work with predecessors of the models used here suggests that these differences may explain over 10 ppbv of the high bias in AM3 relative to GEOS-Chem over the EUS in summer (Fiore et al., 2005). Because GEOS-Chem better captures the observations, adopting the field-based constraints on isoprene nitrates would require the resulting additional O3 production to be offset by larger O3 losses, such as from additional HOx uptake by aerosol and halogen-induced O3 destruction (Parrella et al., 2012).
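The contrast between the two nitrate treatments reduces to simple arithmetic on the fraction of isoprene hydroxyperoxy + NO reactions that remove NOx for good. This sketch treats nitrate formation as the only NOx-removing branch, which is a simplification; the AM3 inputs (8% yield, 40% recycling) come from the text, while the GEOS-Chem yield below is a HYPOTHETICAL placeholder because the section gives no number.

```python
def permanent_nox_sink_fraction(nitrate_yield: float, recycled_fraction: float) -> float:
    """Fraction of isoprene hydroxyperoxy + NO reactions that permanently remove
    NOx, treating nitrate formation as the only NOx-removing branch."""
    for name, value in (("nitrate_yield", nitrate_yield), ("recycled_fraction", recycled_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return nitrate_yield * (1.0 - recycled_fraction)


# AM3 values quoted in the text: 8% yield, 40% of products recycled to NOx.
print(permanent_nox_sink_fraction(0.08, 0.40))  # approximately 0.048

# GEOS-Chem-style treatment: nitrates are a permanent sink (no recycling).
hypothetical_gc_yield = 0.12  # HYPOTHETICAL placeholder, not the model's value
print(permanent_nox_sink_fraction(hypothetical_gc_yield, 0.0))
```

With no recycling, any nitrate yield above 4.8% already removes more NOx per reaction than the AM3 treatment, so a "much higher" yield implies less NOx available for O3 production in GEOS-Chem.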

Inter-annual variability in NAB MDA8 O3 estimates in surface air
The 27-year AM3 NAB simulation (1981-2007) enables us to define spring and summer climatologies of seasonal mean NAB O3 in surface air, and to quantify the year-to-year variability as the standard deviation of the annual seasonal mean values (Figure 11). The seasonal mean spatial patterns are similar to those in 2006 (Figure 1), with little year-to-year variation over much of the country. Figure 11 also includes the climatological fourth-highest MDA8 value between March 1 and August 31 over the multi-decadal simulation. We emphasize that these estimates are subject to the biases diagnosed above in comparison to observations. In particular, NAB estimates over the For observed O3 events above 60 ppb, AM3 tends to overestimate observations during spring but does not exhibit any systematic bias during summer. Furthermore, the model captures events up to 80 ppb during spring of 1999, though in other years there is a general tendency to underestimate events above 75 ppb. This finding contrasts with those from higher-resolution models, including the GEOS-Chem version used here, which underestimates events above 60 ppb. During all years and both seasons shown in Figure 12 Consistent with earlier work (Fiore et al., 2003), Figure 12 shows that summertime NAB levels are typically much lower than in spring, with maximum values nearly always below 60 ppb and 75th percentile values generally below 50 ppb. Jaffe
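The quantities in this section (daily MDA8, the fourth-highest MDA8 value, seasonal-mean climatologies and year-to-year variability) can be sketched as below. This is a simplified illustration under assumptions: the MDA8 routine omits regulatory data-completeness rules, the variability uses the sample standard deviation across years, and all function names are hypothetical. Whether the climatological fourth-highest value averages per-year values or pools all years is not stated here; averaging per-year values is one possible reading.

```python
import statistics
from typing import Dict, List, Sequence


def mda8(hourly_ppb: Sequence[float], hours_per_day: int = 24) -> List[float]:
    """Simplified maximum daily 8-hour average (MDA8) O3.

    For each complete day, take the maximum of the 8-hour running means that
    START at each hour of that day (windows may extend into the next day).
    Windows running past the end of the record are dropped; regulatory
    data-completeness rules are not applied.
    """
    n_days = len(hourly_ppb) // hours_per_day
    daily = []
    for d in range(n_days):
        means = []
        for start in range(d * hours_per_day, (d + 1) * hours_per_day):
            window = hourly_ppb[start:start + 8]
            if len(window) == 8:
                means.append(sum(window) / 8.0)
        daily.append(max(means))
    return daily


def fourth_highest(values: Sequence[float]) -> float:
    """Fourth-highest value, e.g. of daily MDA8 from March 1 to August 31."""
    if len(values) < 4:
        raise ValueError("need at least four values")
    return sorted(values, reverse=True)[3]


def climatology_and_iav(seasonal_daily: Dict[int, Sequence[float]]) -> Dict[str, float]:
    """Climatological seasonal mean and year-to-year variability (sample standard
    deviation of the per-year seasonal means) from {year: daily values}."""
    seasonal_means = [statistics.fmean(v) for v in seasonal_daily.values()]
    return {
        "climatology": statistics.fmean(seasonal_means),
        "iav_std": statistics.stdev(seasonal_means),
    }


# Two illustrative days of hourly values: a flat day, then an 8-hour 80 ppb episode.
hourly = [40.0] * 34 + [80.0] * 8 + [40.0] * 6
print(mda8(hourly))  # [40.0, 80.0]
```

For a multi-year run, applying `fourth_highest` to each year's March-August MDA8 series and averaging the results with `statistics.fmean` gives one version of a climatological fourth-highest value.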

Conclusions and Recommendations
On the basis of health evidence, the threshold for the National Ambient Air Quality Standard (NAAQS) for ground-level O3 has been lowered in recent years, moving closer to "background" levels. In the past, the U.S. Environmental Protection Agency considered model-based estimates of background O3 as part of the process for setting the NAAQS.
These model-based estimates, previously called "Policy-Relevant Background", are now termed "North American Background" (NAB), defined as the O3 levels that would exist in the absence of North American anthropogenic emissions. Identifying high-background events is crucial for determining whether an observation merits consideration for "exceptional event" status, which exempts a particular observation from counting towards non-attainment if it can be shown that the event occurred due to processes beyond the control of U.S. air quality management options. The model simulations presented here can provide information on the frequency of such events and on the individual components contributing to NAB, including O3 originating from international pollution, wildfires, or the stratosphere.
As a first step towards assessing our understanding of NAB and its components, we briefly reviewed recent model estimates (Table 1) year-to-year differences in the frequency of springtime high-O3 events (Figure 12).
At high-altitude WUS sites, the GEOS-Chem and AM3 models consistently show higher NAB than at low-altitude sites, but the magnitude and day-to-day variability often differ between the models (Figures 1, 5, 6, and 7; Tables 3 and 4). In some months (e.g., August), the NAB estimates from the two models differ from each other by more than the simulated total O3 differs from the observations, implying that agreement with observations, while a necessary condition, does not sufficiently constrain NAB estimates. While AM3 indicates a seasonal decline of NAB into summer over this region, GEOS-Chem suggests a relatively weak seasonal cycle associated with an increasing influence from lightning NOx in that model during late summer (Figures 5 and 7). Higher stratosphere-troposphere exchange in AM3 may explain its springtime NAB enhancement in the free troposphere relative to GEOS-Chem (Figure 3); combined with more vigorous mixing between the free troposphere and the boundary layer, this may explain the higher springtime NAB in surface air in AM3 (Figure 1).
At low-altitude sites, such as over the EUS, the models consistently show lower NAB levels than at high-altitude sites, as in earlier work (Table 1). We find that the highest total surface O3 events over the EUS are often decoupled from the highest NAB events, consistent with the understanding that regional pollution is the dominant influence on total O3 distributions there. Over the EUS, uncertainties in isoprene-NOx-O3 chemistry (Table 2) likely contribute to differences in simulated total O3 and, to a lesser extent, in NAB estimates.
We find little evidence that horizontal resolution is a major contributor to differences in mean NAB estimates in the models (Figure 6), consistent with EPA (2013).
Higher resolution refines spatially local NAB estimates, including at the tails of the distribution, and is also important for resolving the impact of local and regional emissions, as evidenced by the larger resolution-related differences in summertime distributions, when photochemical production peaks in many U.S. regions (Figure 6). We conclude that simulated NAB distributions reflect large-scale synoptic transport that is sufficiently resolved at the relatively coarse scale of global models, with NAB differences stemming mainly from different treatments of NAB sources such as stratospheric O3, boreal fires, and lightning NOx. The regional and seasonal variability in these driving processes further manifests as differences between the models in the timing of the fourth-highest NAB over many regions (Figure 2). Future efforts to determine the processes contributing to model differences, and to the biases in individual models versus observations, would benefit from evaluation with daily O3 vertical profiles as measured by sondes and with consistently defined tracers of stratospheric influence (e.g., the O3Se90 tracer in AM3), as well as from daily three-dimensional archival of other chemical species (e.g., CO, PAN, H2O) that can aid in disentangling tropospheric versus stratospheric origins, and of meteorological variables (e.g., mixing depth, mass fluxes) to diagnose the role of mixing processes. The routine use of synthetic tracers could further aid in distinguishing model differences in transport, dilution, and mixing from differences in chemical evolution during transport. Improved estimates of NAB in a given region and season will require better constraints on, for example: lightning NOx over the central and southwestern U.S. in summer; transported stratospheric O3 over the high-altitude western U.S. in spring; isoprene chemistry and its impact on chemical processing and NAB lifetime over the EUS in summer; and wildfires, which may influence NAB throughout the nation from late spring into summer.
We propose that future multi-model studies target limited time periods to enable process-oriented analysis during field campaigns, when ground-based and satellite observations are supplemented with a broader suite of observations from intensive aircraft flights and balloon launches. If combined with a thorough evaluation of O3 precursors, such analysis should hasten progress towards understanding the impact of specific sources on NAB O3. We further recommend developing bias-correction techniques, such as those routinely applied in numerical weather prediction, to improve the accuracy of local NAB estimates. As a first step, simple approaches that assume the bias is driven entirely by one process (e.g., as applied to the stratospheric O3 estimates from the AM3 model by Lin et al. (2012a)) can be applied to individual models and then used to generate a multi-model estimate with uncertainties. The two models analyzed here often bracket the observations (Figures 3-7 and 9), indicating different sources of error; we therefore conclude that a multi-model approach can harness the unique capabilities of different modeling systems and thus provide more accurate NAB estimates than a single model.
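The simplest version of the recommended bias correction can be sketched as follows. This is a hypothetical illustration, not the procedure of Lin et al. (2012a): it attributes each model's entire mean bias in total O3 at a site to NAB, subtracts that bias from the NAB series (flooring at zero), and then reports a per-day multi-model mean with the min-max range across models as a crude uncertainty. Function names and numbers are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def bias_corrected_nab(
    nab: Sequence[float], sim_total: Sequence[float], obs: Sequence[float]
) -> List[float]:
    """Subtract a model's mean total-O3 bias (ppb) from its NAB series.

    Single-process assumption: the whole bias in total O3 is attributed to
    NAB. Corrected values are floored at zero.
    """
    if not len(nab) == len(sim_total) == len(obs):
        raise ValueError("series must be paired day by day")
    bias = statistics.fmean(s - o for s, o in zip(sim_total, obs))
    return [max(0.0, n - bias) for n in nab]


def multi_model(estimates: Dict[str, Sequence[float]]) -> List[Tuple[float, float, float]]:
    """Per-day (mean, min, max) across models; min-max is one crude uncertainty."""
    return [(statistics.fmean(day), min(day), max(day)) for day in zip(*estimates.values())]


# Illustrative numbers only: one model biased high, one biased low.
am3 = bias_corrected_nab([30.0, 40.0], sim_total=[55.0, 65.0], obs=[50.0, 60.0])
gc = bias_corrected_nab([20.0, 30.0], sim_total=[48.0, 58.0], obs=[50.0, 60.0])
print(multi_model({"AM3": am3, "GEOS-Chem": gc}))
# [(23.5, 22.0, 25.0), (33.5, 32.0, 35.0)]
```

When the models bracket the observations, as they often do here, their corrections have opposite signs, which narrows the spread of the corrected NAB estimates relative to the uncorrected ones.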

Meteorology. AM3: Online, nudged to NCEP u and v (Kalnay et al., 1996); the nudging timescale is inversely proportional to pressure (Lin et al., 2012b). GEOS-Chem: Assimilated from NASA GEOS-5.
Lightning NOx distribution. AM3: Parameterized based on convective cloud top height (Price and Rind, 1992), as described in Horowitz et al. (2003); the source in 2006 is 4.9 Tg N a-1, and the range over 1981-2007 is 4.4-4.9 Tg N a-1. GEOS-Chem: Scaled to match a top-down constraint of 6 Tg N a-1 (Martin et al., 2007), spatially redistributed based on the LIS/OTD flash climatology (Murray et al., 2012), and includes a higher yield (500 mol N flash-1 at northern mid-latitudes and 125 mol N flash-1 elsewhere; Hudman et al., 2007).
Anthropogenic emissions. AM3: ACCMIP with annual interpolation after 2000 to the RCP4.5 2010 value (Lamarque et al., 2011). GEOS-Chem: EDGAR (Olivier and Berdowski, 2001)
Biogenic emissions. AM3: (Guenther et al., 2006), implemented as described by Emmons et al. (2010) and Rasmussen et al. (2012). GEOS-Chem: MEGAN 2.0 (Guenther et al., 2006).
Biomass burning emissions. AM3: As for anthropogenic emissions, but distributed vertically as recommended for AeroCom (Dentener et al., 2006). GEOS-Chem: GFEDv2 year-specific monthly fires, emitted at the surface.
Table 1 for model configurations.
) for the GEOS-Chem (GC ½°x⅔° horizontal resolution; blue) and GFDL AM3 (~2°x2° horizontal resolution; red) simulations sampled at the CASTNet sites (using bilinear interpolation of the nearest four grid cells and sampling only on days with valid measurements) at altitudes a) above 1.5 km, excluding California to focus on the InterMountain West region, and b) below 1.5 km. Also shown are NAB estimates (thin lines) with GC (blue) and AM3 (red). The grey band delineates the one-standard-deviation range about the observed regional mean monthly values.
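Sampling gridded model output at monitoring sites, as described in the caption above, can be sketched with bilinear weighting of the four surrounding grid-cell centres plus a filter that keeps only days with a valid measurement. This is one standard reading of "bilinear interpolation of the nearest four grid cells", not the code used for the figure; the site coordinates and grids are not given here, and the function names are hypothetical.

```python
import bisect
from typing import List, Optional, Sequence, Tuple


def bilinear(
    field: Sequence[Sequence[float]],
    lats: Sequence[float],
    lons: Sequence[float],
    lat: float,
    lon: float,
) -> float:
    """Bilinear interpolation from the four grid-cell centres surrounding a site.

    field is indexed [lat][lon]; lats and lons are ascending cell-centre
    coordinates. Points outside the interior of the grid raise ValueError.
    """
    j = bisect.bisect_right(lats, lat) - 1
    i = bisect.bisect_right(lons, lon) - 1
    if not (0 <= j < len(lats) - 1 and 0 <= i < len(lons) - 1):
        raise ValueError("site lies outside the interior of the grid")
    ty = (lat - lats[j]) / (lats[j + 1] - lats[j])
    tx = (lon - lons[i]) / (lons[i + 1] - lons[i])
    south = (1 - tx) * field[j][i] + tx * field[j][i + 1]
    north = (1 - tx) * field[j + 1][i] + tx * field[j + 1][i + 1]
    return (1 - ty) * south + ty * north


def sample_valid_days(
    model_daily: Sequence[float], obs_daily: Sequence[Optional[float]]
) -> List[Tuple[float, float]]:
    """Keep (model, observation) pairs only on days with a valid measurement."""
    return [(m, o) for m, o in zip(model_daily, obs_daily) if o is not None]


# A field equal to lat + lon is reproduced exactly by bilinear weighting.
grid = [[0.0, 2.0], [2.0, 4.0]]
print(bilinear(grid, [0.0, 2.0], [0.0, 2.0], 1.0, 1.0))  # 2.0
```

Interpolating before filtering keeps the model and observed series on the same set of days, so monthly means and standard deviations are computed over matching samples.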