Wednesday, July 7, 2010

Gnuplot Fitting

If you aren't using gnuplot, you should check it out.  For those that do, I just figured out a fun tidbit on how to print fit errors that I couldn't find on the Intarwebs for the life of me.

The "fit" command in gnuplot works much like the "plot" command; you give it a data file and a "using" specification, and also the function you want to fit with and its parameters like so:

f(x) = a*x + b
a = -0.01
b = 0
fit f(x) 'somefile.txt' using 1:2:3 via a,b

You do need to give a first guess for parameters, as you see above.  The "using" specification here tells it to use the first column for x, second for y and the third for error in y (which you want to include if you can).  I've had plenty of trouble in the past getting a fit to converge, but generally you need to keep playing with your initial parameters and things will work out.  Once the fit is done you get something like this:

After 4 iterations the fit converged.
final sum of squares of residuals : 8.19305
rel. change during last iteration : -6.6513e-10


degrees of freedom    (FIT_NDF)                        : 9
rms of residuals      (FIT_STDFIT) = sqrt(WSSR/ndf)    : 0.954117
variance of residuals (reduced chisquare) = WSSR/ndf   : 0.910339


Final set of parameters            Asymptotic Standard Error
=======================            ==========================


a               = -0.00123217      +/- 0.0002571    (20.87%)
b               = 0.0347759        +/- 0.001622     (4.665%)


correlation matrix of the fit parameters:
               a      b      
a               1.000 
b              -0.872  1.000 

Huzzah!  You get your final fit parameters with error; you get degrees of freedom and stdfit and chisquare.  This is fun and all, but how to I put that in my plot?  First, you can get at some environment variables to use in your gnuplot script, like "FIT_WSSR" and "FIT_STDFIT."  As far as the final parameters and errors, here's the best way I've found to put them on your plot:

set fit errorvariables
fit f(x) 'group_asym_out.txt' index 1 u 1:2:3 via a,b 
set label sprintf("a = %2.6f +- %2.6f",a,a_err) at graph 0.7,0.15
set label sprintf("b = %2.6f +- %2.6f",b,b_err) at graph 0.7,0.1
plot 'somefile.txt' u 1:2:3 w yerr, f(x) title 'f(x)=a*x+b'

Here I've put the tricky part in bold.  We have to set labels on the graph somewhere; mercifully we can use the "graph" coordinates instead of the coordinates of the plot itself, so that top right is graph (1,1) and bottom left is graph (0,0).  Now we access our parameters "a" and "b" using "sprintf" in the label.

To get the errors on these parameters, we need to include "set fit errorvariables" which enables the error variables "_err," for example "a_err" and "b_err."  The graph still needs some stylistic attention, but at this point it looks this:


Sources: Janert's Gnuplot in Action

4 comments:

  1. Note that gnuplot gives "Asmptotic Standard Errors", otherwise known as the "Wrong Error", at least in my opinion.

    My understanding (which may not be exact) is that it does the fit, determines the parameters, uncertainties, and the chisquared. It then rescales the uncertainties so that the fit gives a chisquared of one, and the final errors it gives are the errors you would get if you did the fit with the reduced error bars. [this is equivalent to reducing the uncertainties by a factor of approximately sqrt(chisquared).

    As a quick example, if somefile.txt has:
    0 1.000 1
    1 1.000 1
    2 0.9999 1
    3 1.0001 1

    you're and you do
    > f(x)=a
    > fit f(x) 'somefile.txt' using 1:2:3 via a

    you're basically asking for the average of four measurements, each of which is 1 +/- 1 (very nearly). So it can be shown that the average should be 1 and the uncertainty should be 1/sqrt(4)=0.5. [Note that if you set all of the data points to be exactly one, then chisquared=0 and it blows up when trying to get the uncertainty].

    However, what you get is the following:
    Final set of parameters Asymptotic Standard Error
    ======================= ==========================

    a = 1.00002 +/- 6.292e-05 (0.006291%)

    because the chisquared is very small, and so it assumes that you messed up your error bars and it 'corrects' them for you to make them consistent with the fluctuations. You can actually see that if you leave the uncertainties alone but tweak the values (1.0001 --> 1.0002) the uncertainty changes, because the chisquared changed.

    Note that rescaling to give chisquared=1 is not an uncommpon procedure to use. But it's very important to know if your fitting code has decided to do it for you without actually telling you!

    ReplyDelete
  2. Sorry - in the above example i used the input file where I changed the 1.0001
    to 1.0002. If you use the somefile.txt I gave, you get:

    Final set of parameters Asymptotic Standard Error
    ======================= ==========================

    a = 1 +/- 4.082e-05 (0.004082%)

    ReplyDelete
  3. There's another problem with fitting. You need to make sure that the FIT_LIMIT is set low enough, otherwise, you'll get different answers depending on starting parameters.

    To make sure your errors in the fit parameters are "real", I do something like this:

    Example script:
    ====================================================
    this(x)=d0*exp(-d1*x)
    FIT_LIMIT = 1e-15
    set fit errorvariables
    fit [] this(x) "file.dat" using 1:2:3 via d0, d1
    d1_err_act=d1_err/FIT_STDFIT
    save var 'fit_params'

    =====================================================

    This also saves all your fit variables in a file that you can grep from later (like if you're running your fitting scripts from perl and you want the parameters back right now!)

    ReplyDelete

Keep it to Physics, please.