What exactly is a monthly mean?

By Eco Guy 1:31am 22nd April 2010
A lot of climate analysis is done using the monthly mean temperature. Given that hourly measurements are quite often available, I thought I'd have a look at whether calculating a monthly mean directly from all the hourly measurements would produce a different value; initial findings indicate it does.

Background

Based upon Watts' observations in the post GISS & METAR – dial “M” for missing minus signs: it’s worse than we thought, which shows how simply using '-' instead of 'M' can flip measurements and introduce error, I thought I'd have a look at two things:
  • Does this sign flip error show up in other places?
  • Does the act of calculating the monthly mean introduce error or bias?
To do this I needed access to a 'higher resolution' data set than the simple daily mean measurements; without that it would be impossible to determine with any degree of significance whether the act of creating the mean value introduces error into the final value.

Data sources

Two data sources were used for this: hourly data from the SCRAM data set and monthly means from the GHCN data set.

Method

Quite simple really:
  • Determine 'matching' stations in both the SCRAM and GHCN data sets, i.e. both based on the same airport (for instance Santa Barbara/FAA Airport, SCRAM:CA23190 -> GHCN:42574606001). Matches were also validated by a lat/long check.
  • Extract the monthly and hourly data.
  • Run a program to calculate for each month:
    • The 'Derived' monthly mean: take the mean of each day's min and max, sum these over the month and find the mean over the month, with 1 decimal place of final rounding applied. This should closely track the GHCN measurement, indicating that the hourly data set closely matches the data GHCN used to get its figure; i.e. they are probably the same.
    • The 'exact' monthly mean, based on taking all the hourly measurements in a given month and producing the mean value, to 4 decimal places.
    • A rounded form of that to one decimal place.
    • The difference of the GHCN mean to the 'exact' mean to one decimal place.
Now, the reason for the 1 decimal place rounding is to compare the Derived and exact means fairly against the GHCN figures, at the same degree of accuracy as GHCN supplies.

Also, error values etc. are filtered out and the means are calculated across the exact set size in each case (i.e. if a day has only 23 measurements, that's what the mean is calculated across, not 24).
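
As a rough illustration of the two calculations described above, here is a minimal Python sketch. It is not the program used for the results below: the input layout (a dict of day number to a list of hourly readings in °C), the function names and the use of None to mark an error value are all assumptions made for the example, and the readings are made up.

```python
from statistics import mean

def derived_monthly_mean(days):
    """Min/max method: mean of each day's (min + max) / 2, rounded to 1 dp."""
    daily = []
    for readings in days.values():
        valid = [t for t in readings if t is not None]  # filter out error values
        if valid:
            daily.append((min(valid) + max(valid)) / 2)
    return round(mean(daily), 1)

def exact_monthly_mean(days):
    """Mean over every valid hourly reading in the month, to 4 dp."""
    valid = [t for readings in days.values() for t in readings if t is not None]
    return round(mean(valid), 4)

# Two made-up days (hypothetical data); the second has one missing reading,
# so its readings are averaged over 5 values rather than 6.
month = {
    1: [10.0, 9.0, 9.0, 12.0, 18.0, 14.0],
    2: [11.0, None, 10.0, 13.0, 21.0, 12.0],
}

derived = derived_monthly_mean(month)
exact = exact_monthly_mean(month)
print(derived, exact, round(exact, 1))  # 14.5 12.6364 12.6
```

Even on this tiny made-up sample the two methods disagree, because the min/max method only ever looks at two readings per day.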

The Results

I have only had time to produce one pair of results so far, but they are quite interesting in their own right. These are for Santa Barbara/FAA Airport.

Year=1984
Month   Org.Mean   [Derived]    Mean from hrs  rounded   Diff
  1       12.4       [12.2]        12.0639      12.1      0.3*
  2       12.4       [12.4]        12.4082      12.4      0.0
  3       14.6       [14.6]        14.7050      14.7     -0.1
  4       15.2       [15.3]        15.3426      15.3     -0.1
  5       17.8       [17.6]        17.3775      17.4      0.4*
  6       17.4       [17.4]        17.3889      17.4      0.0
  7       19.7       [19.6]        19.1129      19.1      0.6*
  8       21.3       [21.2]        20.9677      21.0      0.3*
  9       22.7       [22.6]        22.0139      22.0      0.7*
 10       16.5       [16.5]        16.7100      16.7     -0.2
 11       12.8       [12.7]        12.8372      12.8      0.0
 12       10.4       [10.6]*       10.6325      10.6     -0.2
Net yearly temp diff between Org Monthly measure and hourly calc mean (+ is org is higher) = 1.7

Year=1985
Month   Org.Mean   [Derived]    Mean from hrs  rounded   Diff
  1       10.2       [10.2]        10.0986      10.1      0.1
  2       11.0       [10.9]        10.9755      11.0      0.0
  3       11.6       [11.6]        11.7279      11.7     -0.1
  4       14.3       [14.3]        14.1443      14.1      0.2*
  5       14.2       [14.1]        14.3182      14.3     -0.1
  6       17.2       [16.9]*       16.5093      16.5      0.7*
  7       20.2       [19.9]*       19.3735      19.4      0.8*
  8       19.1       [18.9]*       18.5297      18.5      0.6*
  9       18.2       [18.1]        18.0733      18.1      0.1
 10       16.3       [16.3]        16.1850      16.2      0.1
 11       12.1       [12.2]        12.3125      12.3     -0.2*
 12       11.9       [11.9]        11.2463      11.2      0.7*
Net yearly temp diff between Org Monthly measure and hourly calc mean (+ is org is higher) = 2.9

Year=1986
Month   Org.Mean   [Derived]    Mean from hrs  rounded   Diff
  1       13.1       [12.9]*       12.3581      12.4      0.7*
  2       13.1       [13.0]        12.9787      13.0      0.1
  3       13.7       [13.6]        13.5633      13.6      0.1
  4       14.4       [14.3]        14.4097      14.4      0.0
  5       14.9       [14.8]        14.6752      14.7      0.2*
  6       16.9       [16.9]        16.6937      16.7      0.2
  7       18.2       [18.2]        17.9495      17.9      0.3*
  8       18.2       [17.9]*       17.0393      17.0      1.2*
  9       16.7       [16.6]        16.6335      16.6      0.1
 10       17.1       [17.0]        16.7174      16.7      0.4*
 11       14.8       [15.0]        14.6952      14.7      0.1
 12       12.1       [12.1]        11.2970      11.3      0.8*
Net yearly temp diff between Org Monthly measure and hourly calc mean (+ is org is higher) = 4.2


The 'Org.Mean' field is from the GHCN data set. The * indicates a difference greater than 0.2. This allowance is there because different types of rounding at the limit could take the same original value and produce a difference of up to 0.2 once 1 decimal place of rounding is enforced.
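
The Diff column and the net yearly figure can be rechecked from the table itself. The sketch below does this for 1984, using only the Org.Mean and hourly mean columns; the variable names and the 0.2 tolerance constant are just labels chosen for the example. It compares after rounding, so a difference of exactly 0.2 is never flagged here, whereas the tables above do star a few such rows, so the original program evidently made that comparison slightly differently.

```python
# 1984 rows from the table above: (month, GHCN mean, mean from hourly readings)
rows_1984 = [
    (1, 12.4, 12.0639), (2, 12.4, 12.4082), (3, 14.6, 14.7050),
    (4, 15.2, 15.3426), (5, 17.8, 17.3775), (6, 17.4, 17.3889),
    (7, 19.7, 19.1129), (8, 21.3, 20.9677), (9, 22.7, 22.0139),
    (10, 16.5, 16.7100), (11, 12.8, 12.8372), (12, 10.4, 10.6325),
]
TOLERANCE = 0.2  # allowance for rounding effects at 1 decimal place

flagged = []
net = 0.0
for month, ghcn, hourly in rows_1984:
    # Difference of the GHCN mean to the rounded hourly mean, to 1 dp
    diff = round(ghcn - round(hourly, 1), 1)
    net += diff
    if abs(diff) > TOLERANCE:
        flagged.append(month)
net = round(net, 1)

print(flagged, net)  # [1, 5, 7, 8, 9] 1.7
```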

Given this, what is interesting is:
  • My derived value tracks the GHCN data set closely - although given it's not perfect I suspect some corrections are being applied that I'm not aware of.
  • Only in one flagged case (December 1984) is the derived value actually higher than the GHCN value.
  • The exact mean value often 'strays' by more than 0.2 from the GHCN value while the derived value remains in scope (i.e. a * in the far right column, but none in the third). To me this indicates the daily min/max mean approach often misrepresents the real distribution of temperatures in the month. That is not much of a surprise: the daily min/max method uses a maximum of 62 contributing data points per month, compared to up to 744 for the exact mean (12x the effective resolution).
  • The min/max monthly mean method appears to add a net warming when compared to the exact monthly mean value.
Note: All the above needs further analysis across a much larger set of data pairs to determine if there is a trend here - but the degree of difference encountered so far is quite worrying.
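
To see why a min/max mean can sit above the hourly mean, consider a day that is cool for most of its 24 hours with a short afternoon peak. The sketch below uses a single made-up, deliberately exaggerated day (not station data) and also works out the data point counts for a 31-day month.

```python
# One made-up day: cool for most of the 24 hours, with a short afternoon peak
day = [10.0] * 24
peak = {11: 13.0, 12: 17.0, 13: 20.0, 14: 19.0, 15: 16.0, 16: 13.0}
for hour, temp in peak.items():
    day[hour] = temp

minmax_mean = (min(day) + max(day)) / 2  # uses only 2 of the 24 readings
hourly_mean = sum(day) / len(day)        # uses all 24 readings

# Data points contributing to a 31-day month under each method
points_minmax = 31 * 2
points_hourly = 31 * 24

print(minmax_mean, round(hourly_mean, 4))  # 15.0 11.5833
print(points_minmax, points_hourly, points_hourly // points_minmax)  # 62 744 12
```

Real days are nowhere near this lopsided, but any asymmetry in the daily cycle pushes the two means apart in the same way.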

Definitely food for thought. I was quite taken aback by the differences coming through and had to double check my code (even putting in some crazy sanity checks). Also, I have put in code to detect possible 'sign switching' problems, although it hasn't triggered on this one data set pair.

What's next?

I want to spend a bit of time turning this into an online tool so people can have a play themselves and see what's going on.

Also, I want to investigate the yearly cumulative difference over a larger data set, to see if a trend across multiple data set pairs is evident.




Comments left

  • Steven Mosher said:

    There are several reasons why your derived value can be different from that given in GHCN.

    1. Application of a TOBS correction in GHCN. Which GHCN file are you using?

    2. Averaging of duplicates in the GHCN file. For example, if there are two instruments at the same location then GHCN will average these.

    3. To calculate a monthly mean GHCN does this:

    a. They take the daily min and daily max as defined by (I believe) a midnight time of observation. This is important and can cause differences.

    b. The min figure is rounded. The max figure is rounded (nearest F).

    c. day ave = (min+max)/2, rounded. I believe; have to check the manual.

    d. Days are summed to the month.

    In general, in studying hundreds of stations over decades you will find a difference between the "hourly" average and the min/max average. But you won't find a trend bias, which is what matters.

    On Fri, 23 Apr 10, 4:43am, probably from United States

    • Eco Guy said:

      In answer,

      ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z as downloaded on 21st Jan 10 for GIS model, not the _adj version. I have a ref to TOBS. I'll also try with the adjusted data.

      Any way to find out if they had multiple instruments at a location?

      Wow on rounding the intermediate min/max. Any reason given? Also, do they define the rounding being done, i.e. nearest, up, down, 5up etc.?

      Yep, there probably won't be a trend to see, but I'm actually more interested in the effect this has on confidence, and whether that varies over the months and temp ranges encountered. Easy enough for me to put in a DB and check, although wondering if this has been done before.

      On Fri, 23 Apr 10, 8:45am, probably from Australia
