Flood Frequency at Deming, Ferndale, and Everson Main Street

10-13 December 2003
Delbert D. Franz

Preliminary Results
Test the following method for estimating flood frequency for Deming,
Ferndale, and the overflow at Everson Main Street:

1. Develop a relationship between flood peak at Ferndale and flood
peak at Deming for events selected from 1990, 1995, and 2002.
These are the events for which we have calibrated and adjusted the
flows at Deming to get a reasonable mimicry of the hydrograph at
Ferndale, stage values at Huntingdon at the US-Canada border,
and high-water marks.

Events with a peak flow of 15,000 cfs or more at Deming will be selected.
This gives ten events. We supplement these ten with two others
using factors on the flows at Deming to better define the shape of
the relationship. In future work we will add the recent floods in
late 2003 and the flood in February of 2002 as well to provide
greater definition in some regions of the relationship.

2. Use the relationship from step 1, giving the peak flow at Deming
as a function of the peak flow at Ferndale, to estimate the peak flow
at Deming for each flood peak in the annual flood-peak series at
Ferndale. We do this for the following reasons:

2.1 The record at Deming is unreliable and the flows
can be greatly in error. The errors are often large,
more than 50 per cent, and unpredictable. The only
reliable flows at Deming are those that have been tested
and adjusted using the unsteady-flow model of the lower
Nooksack River to mimic observed results at Ferndale,
Huntingdon (for overflows), and high-water marks.

2.2 The record at Ferndale is rated good in most years.
Thus the flows there appear to be valid. Testing of
tidal influences has found no influence to date but
further tests are planned in order to put that question
to rest. However, the annual flood-peak series at Ferndale
is a mixed population with about one flood in five having
been affected by an overflow near Everson. Consequently
fitting these peaks with a probability distribution is probably
invalid and extrapolation to the levels of interest will be
in error. If the two populations are treated individually,
then we do not have many events with overflows in that
sample.

2.3 Thus we are in the interesting dilemma that neither Deming
nor Ferndale is suitable for a traditional frequency
analysis: Deming because it has large errors in its
peak series, and Ferndale because of the uncommon
location of the watershed divide, at the bank of the
Nooksack near Everson, between the Fraser River
drainage and the Nooksack River drainage.

2.4 The key to this dilemma is to make use of the model
results to create a series of peaks at Deming from
peaks at Ferndale. The relationship takes into account
the effect of overflow at Everson so that the
peak series so derived for Deming is the current best
source of data for flood frequency analysis.
We do lose some years of record that would be
available at Deming, but additional years of data that
are in gross error do not lead to any improvement.

3. Do a conventional frequency analysis on the annual
peak-flow series at Deming developed in step 2. This series
is homogeneous because the effect of overflow has been removed.
It is also consistent with the conditions in the Nooksack in
1990-2002 and thus reflects recent conditions, not
conditions 40 years ago that are for the most part unknown.

4. Extend the relationship found in step 1 and reverse its
sense so that it will predict the peak flow at Ferndale given
the peak flow at Deming. We add some synthetic large
events by inflating the boundary condition at Deming to
produce some higher overflow at Everson. Analysis of these
flows is used to make a further extension of the relationship
between the peak flows when overflow is present. This
is taken up below. The final relationship will span the flow
range at Deming from about 15,000 cfs at the low end
to a bit more than 100,000 cfs at the high end.

5. Also develop a relationship between the peak flow at
Deming and the peak flow at Everson Main Street using
both real and inflated boundary conditions at Deming.

6. Use the return-period flows defined in step 3 to estimate
the flows at Ferndale and at Everson Main Street for the
same return period.

7. We also do a check to estimate the influence of including
a normally-distributed disturbance term in the relationship
in step 4. From this and the flood frequency distribution at
Deming, we then define the joint probability distribution of
the annual peak floods at Deming and Ferndale. We then
compute the marginal distribution for Ferndale and integrate
it to define the return-period flows at Ferndale.
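
One way to write out the construction in step 7 is sketched below. The symbols are assumptions introduced here, not the worksheet's notation: $f_D$ is the flood-frequency density at Deming from step 3, $g$ is the step 4 relationship giving the Ferndale peak from the Deming peak, $\sigma(q_D)$ is the standard deviation of the normally distributed disturbance term (allowed to vary with the Deming flow), $\varphi$ is the standard normal density, and $q_T$ is the Ferndale flow with return period $T$.

```latex
% Joint density of the annual peaks at Deming (q_D) and Ferndale (q_F)
f(q_D, q_F) = f_D(q_D)\,\frac{1}{\sigma(q_D)}\,
              \varphi\!\left(\frac{q_F - g(q_D)}{\sigma(q_D)}\right)

% Marginal density at Ferndale
f_F(q_F) = \int_{0}^{\infty} f(q_D, q_F)\,dq_D

% Return-period flow at Ferndale: exceedance probability 1/T
\Pr\{Q_F > q_T\} = \int_{q_T}^{\infty} f_F(u)\,du = \frac{1}{T}
```

Setting $\sigma$ to zero collapses the marginal onto $g(Q_D)$, which is the direct mapping used in step 6, so comparing the two results shows the influence of the disturbance term.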

Put Ferndale and Deming peaks into a matrix. Mathcad can
only sort on a matrix.

Bring out the vectors again under a new name. These are now
sorted on flows at Ferndale.
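
For readers working outside Mathcad, the sort step can be sketched in Python as follows. The flow values are made-up placeholders, not the calibrated events, and the variable names are hypothetical.

```python
# Hypothetical paired peaks in cfs; placeholder values, not the calibrated events.
ferndale = [31000.0, 18500.0, 44000.0, 26000.0]
deming = [36000.0, 19000.0, 58000.0, 28000.0]

# Keep each Ferndale peak paired with its Deming peak while sorting on Ferndale.
pairs = sorted(zip(ferndale, deming), key=lambda p: p[0])

# Bring the vectors out again under new names.
ferndale_sorted = [f for f, _ in pairs]
deming_sorted = [d for _, d in pairs]
```

Pairing the values before sorting plays the same role as the matrix in the worksheet: each Deming peak stays attached to its Ferndale peak.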
Define a relationship between flow at Everson Main Street
and at Deming. This will be extended below. Here it shows
the general nature of the overflows and that we have few
small overflows. The 2003 events may remedy that!
We will be using the linfit function to fit a linear spline. Here we define
the break points and the so-called hat functions that form a basis
for the linear spline with the given break points. This approach,
although a bit indirect, allows us to conveniently compute a fit with a
least squares criterion. We can then manually change the middle
breakpoint to see if that helps get a better fit. However, a better
fit may not make good sense, given the limited data we have.
An example will come out of this exercise.
We will have two segments on the linear spline. This gives three
parameters, the function value of the spline at each of the three
breakpoints that define the two segments. Therefore we need
three basis functions, one per parameter. We use scalar
variables to hold the break points for defining the hat functions
and we also put these break points into a vector for use in
other contexts.
These give the flows at Ferndale
for the first linear spline.
We introduce an intermediate function that lists the breakpoints
involved to allow more convenient reuse of functions.
In Figure 2 it is important to note that the basis function is defined
across all segments but it may be zero in some segments. It is
always nonzero in at least one segment. The remaining two
basis functions are given by:
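
The worksheet's own definitions are not reproduced in this text. As a minimal sketch of what three hat functions on three breakpoints look like, assuming hypothetical breakpoints of 15,000, 30,000, and 60,000 cfs at Ferndale (the names `hat`, `basis`, `B0`, `B1`, and `B2` are also assumptions):

```python
def hat(x, left, center, right):
    """Hat basis function: 1 at center, falling linearly to 0 at left and right.

    Pass left=None or right=None for an end function that has only one side.
    The function is zero everywhere outside [left, right].
    """
    if left is not None and left <= x <= center:
        return (x - left) / (center - left)
    if right is not None and center <= x <= right:
        return (right - x) / (right - center)
    return 0.0


# Hypothetical breakpoints in cfs; the worksheet's values are not shown here.
B0, B1, B2 = 15000.0, 30000.0, 60000.0


def basis(x):
    """Vector of the three hat functions evaluated at flow x."""
    return [hat(x, None, B0, B1), hat(x, B0, B1, B2), hat(x, B1, B2, None)]
```

Each function equals one at its own breakpoint and zero at the others, so the fitted coefficient of each basis function is the value of the spline at that breakpoint.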
Setup for the linfit function. This requires a vector function that
contains the basis functions.

Least squares fit to the data.

Define a convenient function to evaluate the linear spline. A linear
spline is just linear interpolation between the parameter values
found by linfit!
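
The fit and evaluation steps can be sketched in plain Python as below. This is not Mathcad's linfit; it solves the least-squares normal equations directly for the three hat-function coefficients. The breakpoints and data are synthetic placeholders, and all function names are hypothetical.

```python
def hat_basis(x, bps):
    """Three hat-function values at x for breakpoints bps = (b0, b1, b2).

    Flows outside [b0, b2] are extrapolated along the nearest end segment.
    """
    b0, b1, b2 = bps
    if x <= b1:
        t = (x - b0) / (b1 - b0)
        return [1.0 - t, t, 0.0]
    t = (x - b1) / (b2 - b1)
    return [0.0, 1.0 - t, t]


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_spline(xs, ys, bps):
    """Least-squares spline values at the breakpoints (normal equations)."""
    rows = [hat_basis(x, bps) for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(ata, aty)


def spline(x, coef, bps):
    """Evaluate the linear spline: interpolation between the fitted values."""
    return sum(c * h for c, h in zip(coef, hat_basis(x, bps)))


# Synthetic check: data generated from a known spline should be recovered.
bps = (15000.0, 30000.0, 60000.0)          # hypothetical breakpoints, cfs
true_values = [16000.0, 33000.0, 80000.0]  # hypothetical spline values, cfs
xs = [16000.0, 20000.0, 28000.0, 35000.0, 45000.0, 58000.0]
ys = [spline(x, true_values, bps) for x in xs]
coef = fit_spline(xs, ys, bps)
```

The fit needs data in both segments; with points on only one side of the middle breakpoint the normal equations become singular.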
Compute the standard deviation of the variation about the fitted
function. np is already one less than the number of points: it is
the index of the last point, and indices in Mathcad start at
zero by default, which I have left unchanged. Thus we subtract 2
from np to get a reduction of three from the number of points,
which accounts for the three parameters estimated.
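
The degrees-of-freedom bookkeeping can be sketched in Python, where the count of points n is used directly instead of the last index np (so np - 2 becomes n - 3). The function name and arguments are hypothetical.

```python
import math


def residual_std(observed, fitted, n_params=3):
    """Standard deviation of the scatter about a fitted function.

    The divisor is the number of points minus the number of fitted
    parameters (three for a two-segment linear spline).
    """
    n = len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return math.sqrt(sse / (n - n_params))
```

With only twelve events, dividing by n - 3 rather than n raises the variance estimate by a third, so the adjustment is not negligible here.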
Now at this point we will redo the computations but will vary the
middle breakpoint location to reduce the scatter about the linear
spline. To do that we have to redefine some values, and
Mathcad marks each redefined value with a wavy green
underline.
This revised fit of the flow at Deming versus Ferndale, obtained
by shifting the middle breakpoint, has a standard deviation of
the scatter that is significantly less than that of the first fit.
However, the reason for this seems to be that there are no points
between 20,000 and 30,000 cfs. The break in slope makes
sense if it is close to the flow level at Deming at which
overflow at Everson begins. A break in slope does not make
sense at 22,000 cfs. Therefore, we will retain a middle
breakpoint near 30,000 cfs at Ferndale.
The above fitting process makes the common assumption that
the scatter about the fitted function, often represented by what
is called a disturbance term, has a variance that is
the same at all levels of the independent variable,
that is, the flow at Ferndale. This is the assumption of
homoscedasticity, one that is present in nearly all applications
of least squares fitting. However, there is no reason to believe
that the scatter is in fact homoscedastic. The potential scatter
in the flow predicted at Deming is surely somewhat larger when the
flow at Ferndale is 55,000 cfs than when the flow at Ferndale is
18,000 cfs. In order to establish the nature of this variation we would
need to have fifty or more joint observations of peak flow at
Ferndale and peak flow at Deming. However, that will never be
the case because the measured peaks at Deming are too unreliable
to use and neither the data required nor the budget needed
are available to do a detailed calibration for each of about fifty
maximum annual peak flows!

Consequently we will make a simple yet useful assumption
on the variation of the variance of the scatter about the fitted
function. We will assume that the disturbance term is
heteroscedastic and that the variance is proportional to the
value of the independent variable. In this case one can
estimate the parameters using least squares by dividing the
variables involved by the square root of the independent variable.
In the rescaled variables the disturbance term is now
homoscedastic and we can use ordinary least squares to
estimate the parameters. Usually taking into account the effect
of heteroscedasticity is most important when statistical tests
of the parameters are being made. We do not do that here.

However, we will be constructing a joint probability density function
for the peak flow at Deming and the peak flow at Ferndale, so
we need to invoke some assumptions about the nature of the
disturbance term. We want the variation about the larger flows
to be larger in absolute terms but smaller in relative terms.
The assumed nature of heteroscedasticity made here attains
that end.
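
The rescaling idea can be sketched in Python. To keep the example short it fits a straight line y = a + b*x rather than the three-parameter spline, which is a simplification made here; the principle of dividing every term by the square root of the independent variable is the same. The data are synthetic and the function name is hypothetical.

```python
import math


def fit_line_rescaled(xs, ys):
    """Fit y = a + b*x assuming the disturbance variance is proportional to x.

    Dividing the model by sqrt(x) gives
        y/sqrt(x) = a*(1/sqrt(x)) + b*sqrt(x) + e,
    where e has constant variance, so ordinary least squares applies.
    """
    u = [1.0 / math.sqrt(x) for x in xs]            # rescaled constant term
    v = [math.sqrt(x) for x in xs]                  # rescaled x term
    z = [y / math.sqrt(x) for x, y in zip(xs, ys)]  # rescaled response
    suu = sum(ui * ui for ui in u)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    svv = sum(vi * vi for vi in v)
    suz = sum(ui * zi for ui, zi in zip(u, z))
    svz = sum(vi * zi for vi, zi in zip(v, z))
    det = suu * svv - suv * suv
    a = (suz * svv - svz * suv) / det
    b = (svz * suu - suz * suv) / det
    return a, b


# Synthetic check: exact data on a known line should be recovered.
xs = [18000.0, 25000.0, 40000.0, 55000.0]
ys = [1000.0 + 1.2 * x for x in xs]
a, b = fit_line_rescaled(xs, ys)
```

The constant term must be rescaled along with everything else; leaving it as a plain intercept in the rescaled regression fits a different model.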
We need to redefine the basis function here. Mathcad always uses
the last definition that is above or to the left of the current location.
Thus we copy some of the above information again.
Rescale with the square root of the independent variable. Note that
the rescaling of the flows at Ferndale must take place in the vector
function that contains all the basis functions. We also have to
create a function that gives us the rescaled result, to estimate
the variance/standard deviation of the disturbance term. We also
need another function to give us the true-scale result.
Note that the differences in parameters are quite small. An
assumption of stronger variation of the variance would have made a
somewhat greater difference.
Gives the rescaled result
Gives the true-scale result
Define a function for the true-scale standard deviation:
Here we show the variation of standard deviation at the breakpoints.
Notice that the change is a bit more than a factor of two from one end
of the range to the other.
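
Under the assumption that the variance is proportional to the flow at Ferndale, the true-scale standard deviation is the rescaled standard deviation times the square root of the flow. A minimal Python sketch, with hypothetical breakpoints and a hypothetical rescaled standard deviation:

```python
import math


def true_scale_std(x, s_rescaled):
    """True-scale standard deviation at flow x, given the rescaled value."""
    return s_rescaled * math.sqrt(x)


breakpoints = [15000.0, 30000.0, 60000.0]  # hypothetical breakpoints, cfs
s_rescaled = 12.0                          # hypothetical rescaled std. deviation
stds = [true_scale_std(b, s_rescaled) for b in breakpoints]
ratio = stds[-1] / stds[0]                 # equals sqrt(60000 / 15000)
```

The ratio between the two ends depends only on the breakpoints: it is the square root of the ratio of the largest to the smallest breakpoint, whatever the rescaled standard deviation is.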
Input the annual flood-peak series at Ferndale. Then use the linear spline fit above assuming heteroscedasticity to estimate corrected peaks at Deming. The series at Ferndale is unofficial and includes some provisional peaks.
Get the peak-flow series into a vector.
Compute flood peaks at Deming using the linear spline giving the
true scale results for a heteroscedastic disturbance term:
Compute the statistics for Log-Pearson Type III. These are based on
Bulletin 17B and other sources.
Compute the weighted skew following Bulletin 17B. The plate 1 skew
for both Ferndale and Deming is 0.0. The MSE for that value from the
plate is to be taken as 0.302. We compute the MSE for the station
skew using equations from Bulletin 17B:
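
As a sketch of that computation in Python, the functions below use the Bulletin 17B approximation for the mean square error of the station skew and its inverse-MSE weighting. The function names and argument names are hypothetical; the defaults for the generalized skew (0.0) and its MSE (0.302) are the values quoted above.

```python
import math


def mse_station_skew(g, n):
    """Approximate MSE of the station skew g for a record of n years."""
    ag = abs(g)
    a = -0.33 + 0.08 * ag if ag <= 0.90 else -0.52 + 0.30 * ag
    b = 0.94 - 0.26 * ag if ag <= 1.50 else 0.55
    return 10.0 ** (a - b * math.log10(n / 10.0))


def weighted_skew(g_station, n, g_general=0.0, mse_general=0.302):
    """Weight station and generalized skew inversely to their MSEs."""
    mse_station = mse_station_skew(g_station, n)
    return (mse_general * g_station + mse_station * g_general) / (
        mse_general + mse_station
    )
```

With a generalized skew of 0.0 the weighting always pulls the station skew toward zero, more strongly for short records, where the station MSE is large.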
We will start with the Pearson Type III pdf and will use log-space values.
We want to convert to natural log space. The source info will be log
base 10.
Here are the basic moment values from my analysis at Deming.
Here are the moment-estimation equations from p. 18.20 of the Handbook of
Hydrology, edited by D. Maidment.
Give the pdf in terms of these values.
The small skew value makes this nearly a normal distribution in
log space. Thus the distribution of annual peak flows created here
has a small skew.
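
The conversion to natural logs and the method-of-moments parameters can be sketched in Python as follows. The parameterization (shape alpha, scale parameter beta, location xi, with mean = xi + alpha/beta, variance = alpha/beta^2, skew = 2/sqrt(alpha)) is the usual one for the Pearson Type III distribution and is assumed here to match the worksheet. The moment values are placeholders, not the Deming results, and the function names are hypothetical.

```python
import math

LN10 = math.log(10.0)


def log10_to_ln_moments(mean10, sd10, skew):
    """Convert log10-space moments to natural-log space (skew is unchanged)."""
    return mean10 * LN10, sd10 * LN10, skew


def p3_params(mean, sd, skew):
    """Method-of-moments Pearson Type III parameters; skew must be nonzero."""
    alpha = 4.0 / skew ** 2
    beta = 2.0 / (sd * skew)
    xi = mean - alpha / beta
    return alpha, beta, xi


def p3_pdf(y, alpha, beta, xi):
    """Pearson Type III density at y, computed in logs for numerical stability."""
    t = beta * (y - xi)
    if t <= 0.0:
        return 0.0
    return abs(beta) * math.exp((alpha - 1.0) * math.log(t) - t - math.lgamma(alpha))


# Placeholder log10 moments, not the values from the Deming analysis.
mean_ln, sd_ln, skew = log10_to_ln_moments(4.5, 0.2, 0.05)
alpha, beta, xi = p3_params(mean_ln, sd_ln, skew)
density_at_mean = p3_pdf(mean_ln, alpha, beta, xi)
```

A small skew gives a very large alpha, which is why the density is evaluated through `math.lgamma` rather than the gamma function itself.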
Compute the flows for a sequence of return periods
This is a first estimate for the
root function.
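
For a quick cross-check outside Mathcad, the sketch below replaces the worksheet's root solve with the Wilson-Hilferty approximation to the Pearson Type III frequency factor, which is reasonably accurate for small skews. This is a swapped-in technique, not the worksheet's method. The log10 moments are placeholders and the function names are hypothetical.

```python
from statistics import NormalDist


def frequency_factor(skew, return_period):
    """Wilson-Hilferty approximation to the Pearson Type III frequency factor."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    if abs(skew) < 1e-6:
        return z  # zero skew reduces to the normal distribution
    g = skew
    return (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)


def flow_for_return_period(mean_log10, sd_log10, skew, return_period):
    """Flow whose annual exceedance probability is 1/return_period."""
    k = frequency_factor(skew, return_period)
    return 10.0 ** (mean_log10 + k * sd_log10)


# Placeholder log10 moments, not the values from the Deming analysis.
mean_log10, sd_log10, skew = 4.5, 0.2, 0.1
periods = (2, 10, 50, 100, 500)
flows = {t: flow_for_return_period(mean_log10, sd_log10, skew, t) for t in periods}
```

Values from this approximation could also serve as the first estimate passed to a root solver.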