The ECA_Opt library page

ECA_Opt is a small collection of macros and underlying code that provides a flexible means of parsing command lines, checking for errors, and printing documentation in various formats. The documentation for each command line option is written into the source code in a streamlined manner, from which it can be output in several different formats. Examples of each format appear below.

Relevant Links: (Note: if you are not viewing this at fish-dna-math.homeunix.net, i.e., if you received this file in a distribution of eca_opt, then these links may be inactive. Everything you can get from these links is, however, included in the distribution.)

  • Download the files.
  • Doxygen documentation of the source code. Navigate your way to the documentation for the header file eca_opt.h and find there a fair bit of documentation about the macros you can use to get ECA_Opt to work for you.
  • ECA_Opt_TUTORIAL: a tutorial that shows how to use ECA_Opt.

Examples of the Different Output Formats

Below are several examples of the different types of output available with ECA_Opt. These are all taken from a statistical analysis program called CoNe that I wrote for a problem arising in genetics. The ECA_Opt macros that store all of this information go into a single function in the program; a listing of that function may be viewed at the bottom of the page, here: source_code_creating_the_example_from_the_program_cone. The listing shows that most of the typing went into writing the documentation itself. The amount of code needed to declare the options, parse the command line, and do error checking is minimal. ECA_Opt can therefore reduce the time you spend writing code to read command line options, provide decent error checking on the command line, and let you maintain the documentation for each option as you create it.

Short Help Format: --help

CoNe’s short help output is invoked with CoNe --help and looks like:

cone  --  a program for estimating Ne

     --help                                  short listing of options
     --help-full                             long listing of options
     --help-nroff                            long listing of options in nroff man-page format
     --help-xml                              output all options in XML format
     --version                               prints program version information
     --version-history                       prints history of different versions
     --command-file         F                inserts contents of file F into the command line

   ****  Data Analysis Options  ****

-f , --file-name            F                pathname of the data file
-p , --path-to-probs-files  D                directory path to XXXpr.txt files
-T , --gens-between         R                generations between samples
-m , --mc-reps              J                number of Monte Carlo reps
-n , --ne-lo-hi-step        R1 R2 R3         values of Ne to compute L(Ne) at
-q , --prior                R                allele frequency prior parameter
-s , --seed-phrase          S                random number seed-phrase

Long Help Format: --help-full

The long help format is much more informative and is obtained with CoNe --help-full. It looks like:

cone  --  a program for estimating Ne

Author(s):
        Eric C. Anderson (eric.anderson@noaa.gov)

About the Program:
    CoNe computes the likelihood of Ne given data on two temporally spaced genetic
    samples. The statistical model used is based on the coalescent of the gene
    copies drawn in the second sample, as described in Berthier et al. (2003)
    Genetics 2003. 160:741-51. The Monte Carlo computations to compute the
    likelihood, however, were developed by Eric Anderson, and are orders of
    magnitude faster than previous implementations. 
    
    Details of the algorithm are given in Anderson (2005) Genetics 170:955-967.

In the following:
        "J" refers to an integer argument to an option

        "R" refers to a real number argument to an option

        "S" refers to a string argument to an option

        "F" refers to a file path argument to an option. For example,
                "datfile.txt" if the file is in the current working directory, or
                something like "~/eriq/Documents/data/datfile.txt" if you want to
                provide a complete file path.  (Beware of spaces in file paths!)

        "D" refers to a directory path argument to an option. For example,
                "data_direcory/" if the directory is in the current working directory, or
                something like "~/eriq/Documents/data_directory/" if you want to
                provide a complete directory path.  Note that the trailing slash should be
                optional, but currently is not.  (ERIC ADD MACROS FOR GETTING FILES AND DIRECTORIES

        "G" refers to a string that gives a (possibly) discontinous range of
                nonnegative integers.  For example:  "1-5,7,9,10-15" specifies
                the integers 1 through 5, 7, 9, and 10 through 15.  There can be no
                whitespace in the string specifying the range, and the numbers must all
                be increasing.  Also, the string cannot start nor end with a comma or a dash.
                Finally, you should not use "-" to denote two ranges without placing any
                commas in between.

        "C" refers to a "constrained" string argument to an option,
                i.e., the argument is a string that may only be drawn from a small
                set of alternatives, as specified in the help-full description.

   ****  Program-description and documentation  parameters  ****

     --help                 
        this returns a short list of all program options and associated
        arguments

     --help-full            
        this returns a full list of all program options and associated
        arguments

     --help-nroff           
        this returns a full list of all program options and associated
        arguments using the formatting styles in nroff that give you the look
        of a man page. View the formatted ouput by doing: 'prog --help-nroff
        | nroff -man | more' where prog is the name of the program.

     --help-xml             
        This returns a list of all options in a simple XML format which is
        suitable for input to the guiLiner front end.

     --version              
        prints program version information

     --version-history      
        prints history of different versions

     --command-file         F
        By using this option you can store command line arguments in the file
        named in F. You may have any kind of white space (including line
        endings) in the file. The line endings are treated as just another
        space. Comments may be included in the file by enclosing them within
        a pair of ampersands (the & character). Note that you must have a &
        at the beginning and at the end of the comment. You cannot use just a
        single & to comment to the end of a line. Your comments may spread
        over several lines---they will still be stripped from the resulting
        command line so long as the are enclosed in ampersands. This feature
        is helpful if you have long and complex command lines that you wish
        to store if it makes it easier to read the command line by breaking
        it across multiple lines or if you have certain clusters of options
        that you like to store together as a module. This option may be used
        up to 10000 times. Optional.


   ****  Data Analysis Options  ****

-f , --file-name            F
        F is the name of the file in which you have your data. It is in the
        same format as data files for TM3 (by Pierre Berthier) and TMVP (by
        Mark Beaumont) The data file should start with a 0 (this is a strange
        vestige of some sort from TM3 or TMVP) followed by the number of time
        periods ***which in this case must always be 2*** followed by the
        number of loci. Then data for each locus consists of the number of
        alleles observed at the locus followed by a row of counts of the
        different alleles observed in the first sample and a row of counts of
        the different alleles observed at the second sample. (By first sample
        I mean the sample taken first in time going forward. Hence, the
        second sample is the sample that was collected most recently.) There
        must be only integers and whitespace in the file. An example file is
        shown in the FILES section of the manual pages. NOTE! It turns out
        that it is not essential that the counts of alleles observed in the
        first sample (the one further back in the past) be integers. It turns
        out to be convenient to express them as real numbers in some cases,
        so I have recoded it so that they can be real numbers and not just
        integers. Now when the data file is echoed to standard input, these
        counts are expressed as real numbers. Do not let this alarm you.
        Everything is OK, still.

-p , --path-to-probs-files  D
        This is the pathway to files containing the precomputed probabilities
        of having j lineages remaining at scaled time t given that you
        started with i lineages. Note that the trailing slash is required.
        For example on my system ~/Documents/eca_code/CoNe/probs/ is the
        pathway. The probs files are a collection of files named XXXpr.txt
        where XXX is a number giving the number of gene copies in the second
        sample. These files have been precomputed using the program
        simCoNeprob and are described below in FILES. The CoNe distribution
        includes precomputed XXXpr.txt files with XXX ranging from 10 to 400.
        This represents samples of between 5 and 200 diploid organisms. For
        different sample sizes it is necessary to create new XXXpr.txt files
        using simCoNeprob which is also included with the CoNe distribution.

-T , --gens-between         R
        R is the number of generations between samples. It may be specified as
        a non-integer in order to allow for a non-integer number of
        generations.

-m , --mc-reps              J
        J is the number of importance sampling reps to perform for each value
        of the number of genes ancestral to the second sample on each locus.
        I have found the importance sampling algorithm to be good enough that
        100 J=100 gives reliable results and usually runs very quickly.
        However; on a final run of your data set J should be much larger. You
        can get an idea of whether J should be larger by the width of the
        Monte Carlo confidence intervals around the estimated likelihood
        curve.

-n , --ne-lo-hi-step        R1 R2 R3
        sets the values of Ne for which the likelihood will be computed. R1 is
        the lowest value of Ne. R2 is the highest value of Ne. R3 is the step
        size between values of Ne. For values of Ne such that T/(2Ne) is
        smaller than .06 (or so) the precomputed values of scaled time
        (stored in the appropriate XXXpr.txt file (see the description of the
        -p option)) will be used. So the step size may not be that given by
        R3. Arguments need not be given as real numbers. An integer (like
        250) will work just fine.

-q , --prior                R
        R specifies the prior distribution for the allele frequencies. R=0
        makes the prior a uniform Dirichlet distribution. R>0 makes the
        parameters of the Dirichlet prior are R/K. Hence R=1 is the so-called
        unit information prior. R=1 is the default.

-s , --seed-phrase          S
        S is a single string (no spaces) that will be used to seed the random
        number generator. If this option is not invoked then a seed is chosen
        based on the current time or---if the file cone_seeds is
        present---the seeds are taken from that file. Upon completion of the
        program the next random number seeds in series are printed to the
        file cone_seeds.
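
The --command-file description above pins down the file format well enough to sketch it. The following Python snippet only illustrates those rules and is not ECA_Opt's own C code: the function name expand_command_file is hypothetical, and replacing each comment with a space and rejecting an unmatched & are assumptions about details the help text leaves open.

```python
import re


def expand_command_file(text):
    """Return the command line tokens stored in the text of a command file.

    Hypothetical helper: illustrates the documented rules, not ECA_Opt itself.
    """
    # Every comment needs an opening and a closing '&', so an odd number
    # of ampersands means some comment was never closed (assumed to be an error).
    if text.count("&") % 2 != 0:
        raise ValueError("unmatched '&': comments need an opening and a closing '&'")

    # Remove each '&...&' comment. DOTALL lets a comment span several lines.
    # A space is substituted (an assumption) so neighbouring tokens stay apart.
    without_comments = re.sub(r"&.*?&", " ", text, flags=re.DOTALL)

    # split() with no arguments treats spaces, tabs, and line endings alike,
    # matching "line endings are treated as just another space".
    return without_comments.split()


if __name__ == "__main__":
    demo = "-f data.txt & the data file &\n-T 2.5 & a comment that\nspans two lines & -m 100\n"
    print(expand_command_file(demo))
    # prints ['-f', 'data.txt', '-T', '2.5', '-m', '100']
```

The returned tokens are what would be spliced into the command line in place of the --command-file option and its argument F.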

Unix Man Page Format: --help-nroff

This format has all the same information as the long help format, but it is formatted so that when processed with nroff -man and read through less (or more if more is less on your *nix system) it has the nice look of a Unix man page. The output is obtained with CoNe --help-nroff. The raw output is:

.\" Process this file with 
.\" groff -man -Tascii FileName.1 
.\"
.TH CoNe2 1 "Date Not Available" AUTO_GENERATED_BY_ECA_OPTION_PACKAGE "User Manuals"
.SH NAME
cone  --  a program for estimating Ne
.SH AUTHOR(S)
Eric C. Anderson (eric.anderson@noaa.gov)
.SH ABOUT THE PROGRAM
CoNe computes the likelihood of Ne given data on two temporally spaced genetic samples. The statistical model used is based on the coalescent of the gene copies drawn in the second sample, as described in Berthier et al. (2003) Genetics 2003. 160:741-51. The Monte Carlo computations to compute the likelihood, however, were developed by Eric Anderson, and are orders of magnitude faster than previous implementations. 

Details of the algorithm are given in Anderson (2005) Genetics 170:955-967..\" starting program-generated options part
.SH OPTIONS
.PP
In the following:

.IR J "  refers to an integer argument to an option"

.IR R "  refers to a real number argument to an option"

.IR S "  refers to a string argument to an option"

.IR F "  refers to a file path argument to an option. For example,"
"datfile.txt" if the file is in the current working directory, or
something like "~/eriq/Documents/data/datfile.txt" if you want to
provide a complete file path.  (Beware of spaces in file paths!)

.IR D "  refers to a directory path argument to an option. For example,"
"data_direcory/" if the directory is in the current working directory, or
something like "~/eriq/Documents/data_directory/" if you want to
provide a complete directory path.  Note that the trailing slash should be
optional, but currently is not.  (ERIC ADD MACROS FOR GETTING FILES AND DIRECTORIES)

.IR G "  refers to a string that gives a (possibly) discontinous range of"
nonnegative integers.  For example:  "1-5,7,9,10-15" specifies
the integers 1 through 5, 7, 9, and 10 through 15.  There can be no
whitespace in the string specifying the range, and the numbers must all
be increasing.  Also, the string cannot start nor end with a comma or a dash.
Finally, you should not use "-" to denote two ranges without placing any
commas in between.

.IR C "  refers to a constrained string argument to an option,"
i.e., the argument is a string that may only be drawn from a small
set of alternatives, as specified in the help-full description.


   ****  Program-description and documentation  parameters  ****
.PP
.B  "--help" 
.RS
this returns a short list of all program options and associated arguments
.RE
.PP
.B  "--help-full" 
.RS
this returns a full list of all program options and associated arguments
.RE
.PP
.B  "--help-nroff" 
.RS
this returns a full list of all program options and associated arguments using the formatting styles in nroff that give you the look of a man page. View the formatted ouput by doing: 'prog --help-nroff | nroff -man | more' where prog is the name of the program.
.RE
.PP
.B  "--help-xml" 
.RS
This returns a list of all options in a simple XML format which is suitable for input to the guiLiner front end.
.RE
.PP
.B  "--version" 
.RS
prints program version information
.RE
.PP
.B  "--version-history" 
.RS
prints history of different versions
.RE
.PP
.BI  "--command-file" "   F"
.RS
By using this option you can store command line arguments in the file named in F. You may have any kind of white space (including line endings) in the file. The line endings are treated as just another space. Comments may be included in the file by enclosing them within a pair of ampersands (the & character). Note that you must have a & at the beginning and at the end of the comment. You cannot use just a single & to comment to the end of a line. Your comments may spread over several lines---they will still be stripped from the resulting command line so long as the are enclosed in ampersands. This feature is helpful if you have long and complex command lines that you wish to store if it makes it easier to read the command line by breaking it across multiple lines or if you have certain clusters of options that you like to store together as a module. This option may be used up to 10000 times. Optional.
.RE
.PP

   ****  Data Analysis Options  ****

.BI  "-f , --file-name " "   F"
.RS
F is the name of the file in which you have your data. It is in the same format as data files for TM3 (by Pierre Berthier) and TMVP (by Mark Beaumont) The data file should start with a 0 (this is a strange vestige of some sort from TM3 or TMVP) followed by the number of time periods ***which in this case must always be 2*** followed by the number of loci. Then data for each locus consists of the number of alleles observed at the locus followed by a row of counts of the different alleles observed in the first sample and a row of counts of the different alleles observed at the second sample. (By first sample I mean the sample taken first in time going forward. Hence, the second sample is the sample that was collected most recently.) There must be only integers and whitespace in the file. An example file is shown in the FILES section of the manual pages. NOTE! It turns out that it is not essential that the counts of alleles observed in the first sample (the one further back in the past) be integers. It turns out to be convenient to express them as real numbers in some cases, so I have recoded it so that they can be real numbers and not just integers. Now when the data file is echoed to standard input, these counts are expressed as real numbers. Do not let this alarm you. Everything is OK, still.
.RE
.PP
.BI  "-p , --path-to-probs-files " "   D"
.RS
This is the pathway to files containing the precomputed probabilities of having j lineages remaining at scaled time t given that you started with i lineages. Note that the trailing slash is required. For example on my system ~/Documents/eca_code/CoNe/probs/ is the pathway. The probs files are a collection of files named XXXpr.txt where XXX is a number giving the number of gene copies in the second sample. These files have been precomputed using the program simCoNeprob and are described below in FILES. The CoNe distribution includes precomputed XXXpr.txt files with XXX ranging from 10 to 400. This represents samples of between 5 and 200 diploid organisms. For different sample sizes it is necessary to create new XXXpr.txt files using simCoNeprob which is also included with the CoNe distribution.
.RE
.PP
.BI  "-T , --gens-between " "   R"
.RS
R is the number of generations between samples. It may be specified as a non-integer in order to allow for a non-integer number of generations.
.RE
.PP
.BI  "-m , --mc-reps " "   J"
.RS
J is the number of importance sampling reps to perform for each value of the number of genes ancestral to the second sample on each locus. I have found the importance sampling algorithm to be good enough that 100 J=100 gives reliable results and usually runs very quickly. However; on a final run of your data set J should be much larger. You can get an idea of whether J should be larger by the width of the Monte Carlo confidence intervals around the estimated likelihood curve.
.RE
.PP
.BI  "-n , --ne-lo-hi-step " "   R1 R2 R3"
.RS
sets the values of Ne for which the likelihood will be computed. R1 is the lowest value of Ne. R2 is the highest value of Ne. R3 is the step size between values of Ne. For values of Ne such that T/(2Ne) is smaller than .06 (or so) the precomputed values of scaled time (stored in the appropriate XXXpr.txt file (see the description of the -p option)) will be used. So the step size may not be that given by R3. Arguments need not be given as real numbers. An integer (like 250) will work just fine.
.RE
.PP
.BI  "-q , --prior " "   R"
.RS
R specifies the prior distribution for the allele frequencies. R=0 makes the prior a uniform Dirichlet distribution. R>0 makes the parameters of the Dirichlet prior are R/K. Hence R=1 is the so-called unit information prior. R=1 is the default.
.RE
.PP
.BI  "-s , --seed-phrase " "   S"
.RS
S is a single string (no spaces) that will be used to seed the random number generator. If this option is not invoked then a seed is chosen based on the current time or---if the file cone_seeds is present---the seeds are taken from that file. Upon completion of the program the next random number seeds in series are printed to the file cone_seeds.
.RE
.PP
.\" done with program-generated options part

A snapshot of what the man page looks like once it is processed with nroff using, say,

CoNe --help-nroff | nroff -man | more

appears below:

Additionally, should you want HTML output for the man page, you could do:

CoNe  --help-nroff | groff -man -Thtml > cone_man_page.html 

and get HTML output that looks like cone_man_page.html (Link broken if not on fish-dna-math.homeunix.net). Or, if you wanted to get PostScript output, you could do:

CoNe  --help-nroff | groff -man -Tps > cone_man_page.ps 

Then, once that is converted to a PDF file, it looks like cone_man_page.pdf (Link broken if not on fish-dna-math.homeunix.net).
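
The "G" range-string format described in the OPTIONS text above (for example "1-5,7,9,10-15") can be made concrete with a short sketch. This is an illustration in Python, not ECA_Opt's C implementation; the name parse_range_string, the choice to return an expanded list of integers, and the acceptance of a one-element range such as "3-3" are all assumptions.

```python
def parse_range_string(spec):
    """Expand a range string such as "1-5,7,9,10-15" into a list of integers.

    Hypothetical helper that enforces the rules stated in the help text.
    """
    # No whitespace is allowed anywhere in the string, and it cannot be empty.
    if not spec or any(ch.isspace() for ch in spec):
        raise ValueError("range string must be non-empty and contain no whitespace")

    values = []
    for piece in spec.split(","):
        bounds = piece.split("-")
        # More than one dash in a piece means two ranges were joined without
        # a comma. An empty bound means a leading/trailing comma or dash.
        if len(bounds) > 2 or not all(b.isascii() and b.isdigit() for b in bounds):
            raise ValueError("malformed piece in range string: %r" % piece)

        lo, hi = int(bounds[0]), int(bounds[-1])
        # Numbers must increase, both within a range and from piece to piece.
        if hi < lo or (values and lo <= values[-1]):
            raise ValueError("numbers must be increasing: %r" % piece)

        values.extend(range(lo, hi + 1))
    return values


if __name__ == "__main__":
    print(parse_range_string("1-5,7,9,10-15"))
    # prints [1, 2, 3, 4, 5, 7, 9, 10, 11, 12, 13, 14, 15]
```

Strings that break a stated rule, such as ",1", "5-", "1-5-9", or "5,3", raise ValueError in this sketch.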

GuiLiner XML Format: --help-xml

Finally, we have GuiLiner output. Doing

CoNe --help-xml

will produce output which, if captured in a file with the .xml extension, provides the input that GuiLiner needs to assemble a GUI to “host” your command line program. See http://sourceforge.net/projects/guiliner/ for more details. The output produced in this case looks like:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!--A GuiLiner specification for cone-->
<guiLiner>
<program>
    <binary_name>cone</binary_name>
    <binary_location></binary_location>
    <binary_html_manual_index></binary_html_manual_index>
    <binary_short_info>a program for estimating Ne</binary_short_info>
    <binary_long_info>
        <![CDATA[
            CoNe computes the likelihood of Ne given data on two temporally spaced genetic
            samples. The statistical model used is based on the coalescent of the gene
            copies drawn in the second sample, as described in Berthier et al. (2003)
            Genetics 2003. 160:741-51. The Monte Carlo computations to compute the
            likelihood, however, were developed by Eric Anderson, and are orders of
            magnitude faster than previous implementations. <br><br>Details of the
            algorithm are given in Anderson (2005) Genetics 170:955-967.
        ]]>
    </binary_long_info>
    <gL_window_size>800</gL_window_size>
    <look_feel>java</look_feel> <!-- MAY BE 'SYSTEM' OR 'JAVA' -->
    <wizard>false</wizard>
</program>
<options>
    <option>
        <option_type>info</option_type>
        <option_name>Version</option_name>
        <bin></bin>
        <option_flag>--version</option_flag>
        <option_linker> </option_linker>
        <option_required>false</option_required>
        <option_can_appear_in_command>false</option_can_appear_in_command>
        <option_short_description>prints program version information</option_short_description>
        <option_long_description></option_long_description>
    </option>
    <option>
        <option_type>info</option_type>
        <option_name>Version History</option_name>
        <bin></bin>
        <option_flag>--version-history</option_flag>
        <option_linker> </option_linker>
        <option_required>false</option_required>
        <option_can_appear_in_command>false</option_can_appear_in_command>
        <option_short_description>prints history of different versions</option_short_description>
        <option_long_description></option_long_description>
    </option>
    <option>
        <option_type>mflag</option_type>
        <option_name>Command File</option_name>
        <option_subtext>input format is: F</option_subtext>
        <values></values>
        <option_flag>--command-file</option_flag>
        <option_linker> </option_linker>
        <option_required>false</option_required>
        <option_can_appear_in_command>true</option_can_appear_in_command>
        <option_short_description>inserts contents of file F into the command line</option_short_description>
        <option_long_description>
            <![CDATA[
            By using this option you can store command line arguments in the file
            named in F. You may have any kind of white space (including line
            endings) in the file. The line endings are treated as just another
            space. Comments may be included in the file by enclosing them within
            a pair of ampersands (the & character). Note that you must have a &
            at the beginning and at the end of the comment. You cannot use just a
            single & to comment to the end of a line. Your comments may spread
            over several lines---they will still be stripped from the resulting
            command line so long as the are enclosed in ampersands. This feature
            is helpful if you have long and complex command lines that you wish
            to store if it makes it easier to read the command line by breaking
            it across multiple lines or if you have certain clusters of options
            that you like to store together as a module. This option may be used
            up to 10000 times. Optional.
<br><br>Expected Format Of Arguments: F            ]]>
        </option_long_description>
    </option>
    <subset>
        <subset_name>Data Analysis Options</subset_name>
        <subset_description>These options control the inputs and methods for the CoNe analysis</subset_description>
        <option>
            <option_type>filechooser</option_type>
            <option_name>Data File</option_name>
            <file></file>
            <option_flag>-f</option_flag>
            <option_linker> </option_linker>
            <option_required>true</option_required>
            <option_can_appear_in_command>true</option_can_appear_in_command>
            <option_short_description>pathname of the data file</option_short_description>
            <option_long_description>
                <![CDATA[
                F is the name of the file in which you have your data. It is in the
                same format as data files for TM3 (by Pierre Berthier) and TMVP (by
                Mark Beaumont) The data file should start with a 0 (this is a strange
                vestige of some sort from TM3 or TMVP) followed by the number of time
                periods ***which in this case must always be 2*** followed by the
                number of loci. Then data for each locus consists of the number of
                alleles observed at the locus followed by a row of counts of the
                different alleles observed in the first sample and a row of counts of
                the different alleles observed at the second sample. (By first sample
                I mean the sample taken first in time going forward. Hence, the
                second sample is the sample that was collected most recently.) There
                must be only integers and whitespace in the file. An example file is
                shown in the FILES section of the manual pages. NOTE! It turns out
                that it is not essential that the counts of alleles observed in the
                first sample (the one further back in the past) be integers. It turns
                out to be convenient to express them as real numbers in some cases,
                so I have recoded it so that they can be real numbers and not just
                integers. Now when the data file is echoed to standard input, these
                counts are expressed as real numbers. Do not let this alarm you.
                Everything is OK, still.
                ]]>
            </option_long_description>
        </option>
        <option>
            <option_type>complex</option_type>
            <option_name>Probs File Path</option_name>
            <option_subtext>D</option_subtext>
            <complex></complex>
            <option_flag>-p</option_flag>
            <option_linker> </option_linker>
            <option_required>true</option_required>
            <option_can_appear_in_command>true</option_can_appear_in_command>
            <option_short_description>directory path to XXXpr.txt files</option_short_description>
            <option_long_description>
                <![CDATA[
                This is the pathway to files containing the precomputed probabilities
                of having j lineages remaining at scaled time t given that you
                started with i lineages. Note that the trailing slash is required.
                For example on my system ~/Documents/eca_code/CoNe/probs/ is the
                pathway. The probs files are a collection of files named XXXpr.txt
                where XXX is a number giving the number of gene copies in the second
                sample. These files have been precomputed using the program
                simCoNeprob and are described below in FILES. The CoNe distribution
                includes precomputed XXXpr.txt files with XXX ranging from 10 to 400.
                This represents samples of between 5 and 200 diploid organisms. For
                different sample sizes it is necessary to create new XXXpr.txt files
                using simCoNeprob which is also included with the CoNe distribution.
<br><br>Expected Format Of Arguments: D
<br><br><hr><br>The elements of the expected format string are described by their first letter as follows:<br>
 
<DL>
	<DT>D</DT>
	<DD>refers to a directory path argument to an option, typically with the trailing slash.  For example:  /Users/eriq/data_directory/</DD>
                ]]>
            </option_long_description>
        </option>
        <option>
            <option_type>complex</option_type>
            <option_name>Number of Generations</option_name>
            <option_subtext>R</option_subtext>
            <complex></complex>
            <option_flag>-T</option_flag>
            <option_linker> </option_linker>
            <option_required>true</option_required>
            <option_can_appear_in_command>true</option_can_appear_in_command>
            <option_short_description>generations between samples</option_short_description>
            <option_long_description>
                <![CDATA[
                R is the number of generations between samples. It may be specified as
                a non-integer in order to allow for a non-integer number of
                generations.
<br><br>Expected Format Of Arguments: R
<br><br><hr><br>The elements of the expected format string are described by their first letter as follows:<br>
 
<DL>
	<DT>R</DT>
	<DD>refers to a real number argument to an option. For example:  2.3</DD>
                ]]>
            </option_long_description>
        </option>
        <option>
            <option_type>complex</option_type>
            <option_name>Monte Carlo Reps</option_name>
            <option_subtext>J</option_subtext>
            <complex></complex>
            <option_flag>-m</option_flag>
            <option_linker> </option_linker>
            <option_required>true</option_required>
            <option_can_appear_in_command>true</option_can_appear_in_command>
            <option_short_description>number of Monte Carlo reps</option_short_description>
            <option_long_description>
                <![CDATA[
                J is the number of importance sampling reps to perform for each value
                of the number of genes ancestral to the second sample on each locus.
                I have found the importance sampling algorithm to be good enough that
                J=100 gives reliable results and usually runs very quickly.
                However, on a final run of your data set J should be much larger. You
                can get an idea of whether J should be larger by the width of the
                Monte Carlo confidence intervals around the estimated likelihood
                curve.
<br><br>Expected Format Of Arguments: J
<br><br><hr><br>The elements of the expected format string are described by their first letter as follows:<br>
 
<DL>
	<DT>J</DT>
	<DD>refers to an integer argument to an option.  For example:  15</DD>
                ]]>
            </option_long_description>
        </option>
        <option>
            <option_type>complex</option_type>
            <option_name>Ne Values</option_name>
            <option_subtext>R1 R2 R3</option_subtext>
            <complex></complex>
            <option_flag>-n</option_flag>
            <option_linker> </option_linker>
            <option_required>true</option_required>
            <option_can_appear_in_command>true</option_can_appear_in_command>
            <option_short_description>values of Ne to compute L(Ne) at</option_short_description>
            <option_long_description>
                <![CDATA[
                sets the values of Ne for which the likelihood will be computed. R1 is
                the lowest value of Ne. R2 is the highest value of Ne. R3 is the step
                size between values of Ne. For values of Ne such that T/(2Ne) is
                smaller than .06 (or so) the precomputed values of scaled time
                (stored in the appropriate XXXpr.txt file (see the description of the
                -p option)) will be used. So the step size may not be that given by
                R3. Arguments need not be given as real numbers. An integer (like
                250) will work just fine.
<br><br>Expected Format Of Arguments: R1 R2 R3
<br><br><hr><br>The elements of the expected format string are described by their first letter as follows:<br>
 
<DL>
	<DT>R</DT>
	<DD>refers to a real number argument to an option. For example:  2.3</DD>
                ]]>
            </option_long_description>
        </option>
        <option>
            <option_type>complex</option_type>
            <option_name>Allele Freq Prior</option_name>
            <option_subtext>R</option_subtext>
            <complex></complex>
            <option_flag>-q</option_flag>
            <option_linker> </option_linker>
            <option_required>false</option_required>
            <option_can_appear_in_command>true</option_can_appear_in_command>
            <option_short_description>allele frequency prior parameter</option_short_description>
            <option_long_description>
                <![CDATA[
                R specifies the prior distribution for the allele frequencies. R=0
                makes the prior a uniform Dirichlet distribution. With R>0 the
                parameters of the Dirichlet prior are R/K. Hence R=1 is the so-called
                unit information prior. R=1 is the default.
<br><br>Expected Format Of Arguments: R
<br><br><hr><br>The elements of the expected format string are described by their first letter as follows:<br>
 
<DL>
	<DT>R</DT>
	<DD>refers to a real number argument to an option. For example:  2.3</DD>
                ]]>
            </option_long_description>
        </option>
        <option>
            <option_type>complex</option_type>
            <option_name>Random Seed Phrase</option_name>
            <option_subtext>S</option_subtext>
            <complex></complex>
            <option_flag>-s</option_flag>
            <option_linker> </option_linker>
            <option_required>false</option_required>
            <option_can_appear_in_command>true</option_can_appear_in_command>
            <option_short_description>random number seed-phrase</option_short_description>
            <option_long_description>
                <![CDATA[
                S is a single string (no spaces) that will be used to seed the random
                number generator. If this option is not invoked then a seed is chosen
                based on the current time or---if the file cone_seeds is
                present---the seeds are taken from that file. Upon completion of the
                program the next random number seeds in series are printed to the
                file cone_seeds.
<br><br>Expected Format Of Arguments: S
<br><br><hr><br>The elements of the expected format string are described by their first letter as follows:<br>
 
<DL>
	<DT>S</DT>
	<DD>refers to a string argument to an option. For example: boing</DD>
                ]]>
            </option_long_description>
        </option>
    </subset>
    </options>
</guiLiner>
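
To make the three arguments of the -n option (R1 R2 R3) a little more concrete, here is a minimal sketch in C of how a lowest value, a highest value, and a step size can be turned into a list of Ne values. It is only an illustration: the function name ne_grid is hypothetical and is not part of CoNe or ECA_Opt, and the sketch ignores the substitution of precomputed scaled-time values that CoNe makes when T/(2Ne) is small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not part of CoNe or ECA_Opt).
   Builds the values lo, lo+step, lo+2*step, ... that do not exceed hi.
   On success returns a malloc'd array and stores its length in *n.
   Returns NULL (and sets *n to 0) if lo > hi, step <= 0, or memory runs out. */
double *ne_grid(double lo, double hi, double step, size_t *n)
{
    size_t i, count;
    double *grid;

    assert(n != NULL);
    *n = 0;
    if (step <= 0.0 || lo > hi) {
        return NULL;
    }

    /* the small tolerance keeps hi itself in the grid despite rounding error */
    count = (size_t)((hi - lo) / step + 1e-9) + 1;

    grid = malloc(count * sizeof *grid);
    if (grid == NULL) {
        return NULL;
    }
    for (i = 0; i < count; i++) {
        grid[i] = lo + (double)i * step;
    }
    *n = count;
    return grid;
}
```

Under these assumptions, arguments like 50 250 50 would give the five values 50, 100, 150, 200 and 250. The caller is responsible for freeing the returned array.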

Source Code Creating the Example from the Program CoNe

The following is the fragment of a function called GetCoNeData in which all the options and their documentation are defined. This is what allows ECA_Opt to produce the documentation seen in the examples above. The syntax highlighting in the listing shows all ECA_Opt macros in UPPERCASE turquoise. (There are a few lowercase turquoise “options” in there that shouldn’t be highlighted, as they are not ECA_Opt macros.)

void GetCoNeData(char *FILENAME, ldat **loc, char **argv, int argc, int *NumLoc, double *T,  int *NumReps, char *phrase, Nstruct **Ns)
{
	int i,j,l,K,n0,nT,temp;
	FILE *in;
	double Nlo,Nhi,Nstep;
	int NumSam;
 
	int File_f = 0,
		T_f = 0,
		Path_f = 0,
		N_f = 0,
		Seed_f = 0,
		Nlo_hi_step_f = 0,
		Prior_f = 0;
		
	
	DECLARE_ECA_OPT_VARS;
	
	/* some defaults */
	gPrior = 1.0;
	phrase[0] = '\0';  /* default: empty seed phrase */
	
	
	SET_OPT_WIDTH(28);
	SET_ARG_WIDTH(17);
	SET_PROGRAM_NAME("cone");
	SET_PROGRAM_SHORT_DESCRIPTION("a program for estimating Ne");
	SET_PROGRAM_LONG_DESCRIPTION(
		CoNe
		computes the likelihood of Ne given data on two temporally spaced
		genetic samples.  The statistical model used is based on the
		coalescent of the gene copies drawn in the second sample\054 as
		described in Berthier et al. (2003) Genetics 2003. 160:741-51.
		The Monte Carlo computations to compute the likelihood\054
		however\054 were developed by Eric Anderson\054 and are orders of
		magnitude faster than previous implementations. 
 
		\n\nDetails of the algorithm are given in Anderson (2005) Genetics 170:955-967.
	);
	SET_VERSION("VERSION: 1.02\nAUTHOR: Eric C. Anderson (eric.anderson@noaa.gov)\nDATE: 1 August 2007")
	SET_PROGRAM_AUTHOR_STRING("Eric C. Anderson (eric.anderson@noaa.gov)");
	SET_VERSION_HISTORY("\
				\n\nVERSION: 1.02\nAUTHOR: Eric C. Anderson (eric.anderson@noaa.gov)\nDATE: 1 August 2007\
				\nCHANGES:\
				\n1. Modified options input to use ECA_Opt3, so it can output to guiLiner.\
				\n\nVERSION: 1.01\nAUTHOR: Eric C. Anderson (eric.anderson@noaa.gov)\nDATE: 28 September 2005\
				\nCHANGES:\
				\n1. Added the LOC_SPECIFIC_LOGLS lines on the output.  These\
				\n   provide locus specific log-likelihood curves.\
				\n2. Removed the ALL_LOCI lines of output because they are meaningless\
				\n   when sample sizes differ between loci.\
				\n3. Fixed a bug in the companion utility \"simCoNeprob\" that \
				\n   caused spurious results if the number\
				\n   of lineages at time T was very large, and few replicates were\
				\n   run in simCoNeprob.  (Thanks to Stuart Barker for catching this.) \
				\nCOPYRIGHT: Federal Gov't Work.  No copyright.\n\n\
				\n\nVERSION: 1.0\nAUTHOR: Eric C. Anderson (eric.anderson@noaa.gov)\nDATE: 7 March 2005\nCOPYRIGHT: Federal Gov't Work.  No copyright.\n\n\
				\n\nVERSION: 1.0 beta\nAUTHOR: Eric C. Anderson (eric.anderson@noaa.gov)\nDATE: 21 April 2004\nCOPYRIGHT: Federal Gov't Work.  No copyright. \n\n")
	
 
	BEGIN_OPT_LOOP
	
		OPEN_SUBSET(Data Analysis Options, Data Analysis Options, These options control the inputs and methods for the CoNe analysis);
	
		if ( REQUIRED_OPTION(Data File,
				File_f,
				f,
				file-name,
				F,
				pathname of the data file,
				F is the name of the file in which you have your data.  It is in the same format as
				data files for TM3 (by Pierre Berthier) 
				and TMVP (by Mark Beaumont).
				The data file should start with a 0 (this is a strange vestige of some sort 
				from TM3 or TMVP) followed by the number of time periods
				***which in this case must
				always be 2*** 
				followed by the number of loci.  Then data for each locus consists of
				the number of alleles observed at the locus followed by a row of counts
				of the different alleles observed in the first sample and a row of counts
				of the different alleles observed at the second sample. (By first sample I mean the
				sample taken first in time going forward.  Hence\054 the second sample is the sample that
				was collected most recently.)  There must be only
				integers and whitespace in the file.  An example file is shown in the FILES section
				of the manual pages.  NOTE! It turns out that it is not essential that the counts of alleles
				observed in the first sample (the one further back in the past) be integers.  It turns out
				to be convenient to express them as real numbers in some cases\054 so I have recoded it so that they
				can be real numbers and not just integers.  Now when the data file is echoed to standard output\054 these counts are expressed
				as real numbers.  Do not let this alarm you.  Everything is OK\054 still.
				 ) ) {
			if( ARGS_EQ(1) ) {
				GET_STR(FILENAME);
			}
		}
		if ( REQUIRED_OPTION(Probs File Path,
					Path_f,
					p,
					path-to-probs-files,
					D, 
					directory path to XXXpr.txt files ,
					This is the pathway to files containing the precomputed probabilities of
					having j lineages remaining at scaled time t given that you started with
					i lineages.
					Note that the trailing slash is required.
					For example on my system 
					~/Documents/eca_code/CoNe/probs/ is the pathway.
					The probs files are a collection of files named
					XXXpr.txt
					where XXX is a number giving the number of gene copies in the 
					second sample.  These files have been precomputed using the program
					simCoNeprob and are described below in FILES.  The CoNe distribution
					includes precomputed XXXpr.txt files with XXX ranging from 10 to 400.  This
					represents samples of between 5 and 200 diploid organisms.  For different sample
					sizes it is necessary to create new XXXpr.txt files using simCoNeprob which is
					also included with the CoNe distribution.
		) ) {
			if( ARGS_EQ(1) ) {
				GET_STR(gProbsPath);
			}
		}
		if ( REQUIRED_OPTION(
				Number of Generations,
				T_f,
				T,
				gens-between,
				R,
				generations between samples, 
				R is the number
					of generations between samples.  It may be specified as a non-integer in order to allow
					for a non-integer number of generations.  
		) ) {
			if( ARGS_EQ(1) ) {
				*T = GET_DUB;
			}
		}
		if ( REQUIRED_OPTION(
				Monte Carlo Reps,
				N_f,
				m,
				mc-reps,
				J,
				number of Monte Carlo reps,
				J is the number
					of importance sampling reps to perform for each value of the number of 
					genes ancestral to the second sample on each locus.  I have found the importance
					sampling algorithm to be good enough that J=100 gives reliable results and
					usually runs very quickly.  However\054 on a final run of your data set J should be
					much larger.  You can get an idea of whether J should be larger by the width of the
					Monte Carlo confidence intervals around the estimated likelihood curve.
		) ) {
			if( ARGS_EQ(1) ) {
				*NumReps = GET_INT;
			}
		}
		if ( REQUIRED_OPTION(
				Ne Values,
				Nlo_hi_step_f,
				n,
				ne-lo-hi-step,
				R1 R2 R3, 
				values of Ne to compute L(Ne) at ,
				sets the values of Ne for which the likelihood will be computed.
					R1 is the lowest value of Ne. R2 is the highest value of Ne. R3 is
					the step size between values of Ne.  For values of Ne such that T/(2Ne)
					is smaller than .06 (or so) the precomputed values of scaled time
					(stored in the appropriate XXXpr.txt file (see the description of the -p option)) will be
					used.  So the step size may not be that given by R3.  Arguments need 
					not be given as real numbers.  An integer (like 250) will work just fine.
					) ) {
			if( ARGS_EQ(3) ) {
				Nlo = GET_DUB;
				Nhi = GET_DUB;
				Nstep = GET_DUB;
				if(Nlo>Nhi) {
					fprintf(stderr,"Error Processing Option -n/--ne-lo-hi-step! The first argument is greater than the second argument.\n");
					OPT_ERROR;
				}
			}
		}
		if (OPTION(
				Allele Freq Prior,
				Prior_f,
				q,
				prior,
				R,
				allele frequency prior parameter,
				R specifies the prior distribution  for  the
              allele frequencies.  R=0 makes the prior a uniform Dirichlet distribution.
              With R>0 the parameters
              of the Dirichlet prior are R/K. Hence R=1 is  the so-called unit 
              information prior.  R=1 is the default.) ) {
			if( ARGS_EQ(1) ) {
				gPrior = GET_DUB;
				if(gPrior < 0.0) {
					fprintf(stderr,"Error Processing Option -q/--prior!  The argument of this option must be greater than or equal to zero\n");
					OPT_ERROR;
				}
			}
		}
		if ( OPTION(
				Random Seed Phrase,
				Seed_f,
				s,
				seed-phrase,
				S,
				random number seed-phrase ,
				S is a single string (no spaces) that will be used to seed the random number generator.  If 
			this option is not invoked then a seed is chosen based on the current time or---if the file
			cone_seeds is present---the seeds are taken from that file.  Upon completion of the 
			program the next random number seeds in series are printed to the file cone_seeds.
			) ) {
			if( ARGS_EQ(1) ) {
				GET_STR(phrase);
			}
		}
		CLOSE_SUBSET
		
	END_OPT_LOOP
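
The long description of the -f option spells out the layout of the data file: a leading 0, the number of time periods (always 2), the number of loci, and then for each locus the number of alleles followed by one row of counts per sample. As a rough illustration of that layout only, here is a minimal sketch in C of a reader for such a file. The type locus_counts and the functions read_two_sample_file and free_loci are hypothetical names made up for this sketch; they are not CoNe's own ldat structure or its actual reading code. The sketch assumes that first-sample counts may be real numbers and that second-sample counts are integers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical container for one locus (not CoNe's ldat struct). */
typedef struct {
    int num_alleles;
    double *first;   /* counts in the first (earlier) sample; may be non-integers */
    int *second;     /* counts in the second (more recent) sample */
} locus_counts;

/* Release everything allocated by read_two_sample_file. */
void free_loci(locus_counts *loci, int num_loci)
{
    int l;
    if (loci == NULL) {
        return;
    }
    for (l = 0; l < num_loci; l++) {
        free(loci[l].first);
        free(loci[l].second);
    }
    free(loci);
}

/* Read the whole file from `in`. Returns a malloc'd array of loci and
   stores its length in *num_loci, or returns NULL on any format error. */
locus_counts *read_two_sample_file(FILE *in, int *num_loci)
{
    int leading, periods, nloc, l, a;
    locus_counts *loci;

    assert(in != NULL && num_loci != NULL);
    *num_loci = 0;

    /* header: a 0, the number of time periods (must be 2), the number of loci */
    if (fscanf(in, "%d %d %d", &leading, &periods, &nloc) != 3) return NULL;
    if (leading != 0 || periods != 2 || nloc <= 0) return NULL;

    loci = calloc((size_t)nloc, sizeof *loci);
    if (loci == NULL) return NULL;

    for (l = 0; l < nloc; l++) {
        int k;
        if (fscanf(in, "%d", &k) != 1 || k <= 0) goto fail;
        loci[l].num_alleles = k;
        loci[l].first = malloc((size_t)k * sizeof *loci[l].first);
        loci[l].second = malloc((size_t)k * sizeof *loci[l].second);
        if (loci[l].first == NULL || loci[l].second == NULL) goto fail;

        /* one row of k counts per sample, first sample first */
        for (a = 0; a < k; a++) {
            if (fscanf(in, "%lf", &loci[l].first[a]) != 1) goto fail;
        }
        for (a = 0; a < k; a++) {
            if (fscanf(in, "%d", &loci[l].second[a]) != 1) goto fail;
        }
    }
    *num_loci = nloc;
    return loci;

fail:
    free_loci(loci, nloc);
    return NULL;
}
```

Because fscanf skips whitespace, the sketch does not care how the counts are broken across lines; it only checks that the expected number of values is present for each locus.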
 
 
software/eca_opt/eca_opt_main.txt · Last modified: 2007/11/13 13:22 by eriq
 