ECA_Opt is a small collection of macros and underlying code that provides a flexible means of parsing command lines, checking for errors, and printing documentation in various formats. The documentation for each command line option is written into the source code in a streamlined manner, from which it can be output in several different formats, examples of which appear below.
Relevant Links: (Note: if you are not viewing this at fish-dna-math.homeunix.net (i.e. if you received this file in a distribution of eca_opt) then these links may be inactive. Everything you can get from these links is, however, included in the distribution.)
eca_opt.h: the header file, where you will find a fair bit of documentation about the macros you can use to get ECA_Opt to work for you.
Below are several examples of the different types of output available
with ECA_Opt. These are all taken from a statistical analysis program
called CoNe that I wrote for a problem arising in genetics. The ECA_Opt
macros that store all of this information are used in a single function
in the program, a listing of which may be viewed at the bottom of the
page, here: source_code_creating_the_example_from_the_program_cone.
This code listing shows that most of the typing to create it has gone
into providing the actual documentation. The amount of code that has to
be written to use ECA_Opt to declare the options, parse the command
line, and do error checking is pretty minimal, and that is a good
thing. ECA_Opt can help minimize the amount of time you spend writing
code to read command line options, provide pretty decent error checking
on the command line, and let you maintain documentation about each
option as you are actually creating it.
CoNe’s short help output is invoked with CoNe --help
and looks like:
cone  --  a program for estimating Ne

     --help                           short listing of options
     --help-full                      long listing of options
     --help-nroff                     long listing of options in nroff man-page format
     --help-xml                       output all options in XML format
     --version                        prints program version information
     --version-history                prints history of different versions
     --command-file         F         inserts contents of file F into the command line

   ****  Data Analysis Options  ****

-f , --file-name            F         pathname of the data file
-p , --path-to-probs-files  D         directory path to XXXpr.txt files
-T , --gens-between         R         generations between samples
-m , --mc-reps              J         number of Monte Carlo reps
-n , --ne-lo-hi-step        R1 R2 R3  values of Ne to compute L(Ne) at
-q , --prior                R         allele frequency prior parameter
-s , --seed-phrase          S         random number seed-phrase
The long help format is much more informative and is obtained with CoNe --help-full. It looks like:
cone  --  a program for estimating Ne

Author(s):
     Eric C. Anderson (eric.anderson@noaa.gov)

About the Program:
     CoNe computes the likelihood of Ne given data on two temporally spaced genetic samples. The statistical model used is based on the coalescent of the gene copies drawn in the second sample, as described in Berthier et al. (2003) Genetics 2003. 160:741-51. The Monte Carlo computations to compute the likelihood, however, were developed by Eric Anderson, and are orders of magnitude faster than previous implementations.

     Details of the algorithm are given in Anderson (2005) Genetics 170:955-967.

In the following:
     "J" refers to an integer argument to an option

     "R" refers to a real number argument to an option

     "S" refers to a string argument to an option

     "F" refers to a file path argument to an option. For example, "datfile.txt" if the file is in the current working directory, or something like "~/eriq/Documents/data/datfile.txt" if you want to provide a complete file path. (Beware of spaces in file paths!)

     "D" refers to a directory path argument to an option. For example, "data_direcory/" if the directory is in the current working directory, or something like "~/eriq/Documents/data_directory/" if you want to provide a complete directory path. Note that the trailing slash should be optional, but currently is not. (ERIC ADD MACROS FOR GETTING FILES AND DIRECTORIES)

     "G" refers to a string that gives a (possibly) discontinous range of nonnegative integers. For example: "1-5,7,9,10-15" specifies the integers 1 through 5, 7, 9, and 10 through 15. There can be no whitespace in the string specifying the range, and the numbers must all be increasing. Also, the string cannot start nor end with a comma or a dash. Finally, you should not use "-" to denote two ranges without placing any commas in between.

     "C" refers to a "constrained" string argument to an option, i.e., the argument is a string that may only be drawn from a small set of alternatives, as specified in the help-full description.

   ****  Program-description and documentation parameters  ****

--help
     this returns a short list of all program options and associated arguments

--help-full
     this returns a full list of all program options and associated arguments

--help-nroff
     this returns a full list of all program options and associated arguments using the formatting styles in nroff that give you the look of a man page. View the formatted ouput by doing: 'prog --help-nroff | nroff -man | more' where prog is the name of the program.

--help-xml
     This returns a list of all options in a simple XML format which is suitable for input to the guiLiner front end.

--version
     prints program version information

--version-history
     prints history of different versions

--command-file F
     By using this option you can store command line arguments in the file named in F. You may have any kind of white space (including line endings) in the file. The line endings are treated as just another space. Comments may be included in the file by enclosing them within a pair of ampersands (the & character). Note that you must have a & at the beginning and at the end of the comment. You cannot use just a single & to comment to the end of a line. Your comments may spread over several lines---they will still be stripped from the resulting command line so long as the are enclosed in ampersands. This feature is helpful if you have long and complex command lines that you wish to store if it makes it easier to read the command line by breaking it across multiple lines or if you have certain clusters of options that you like to store together as a module. This option may be used up to 10000 times. Optional.

   ****  Data Analysis Options  ****

-f , --file-name F
     F is the name of the file in which you have your data. It is in the same format as data files for TM3 (by Pierre Berthier) and TMVP (by Mark Beaumont) The data file should start with a 0 (this is a strange vestige of some sort from TM3 or TMVP) followed by the number of time periods ***which in this case must always be 2*** followed by the number of loci. Then data for each locus consists of the number of alleles observed at the locus followed by a row of counts of the different alleles observed in the first sample and a row of counts of the different alleles observed at the second sample. (By first sample I mean the sample taken first in time going forward. Hence, the second sample is the sample that was collected most recently.) There must be only integers and whitespace in the file. An example file is shown in the FILES section of the manual pages. NOTE! It turns out that it is not essential that the counts of alleles observed in the first sample (the one further back in the past) be integers. It turns out to be convenient to express them as real numbers in some cases, so I have recoded it so that they can be real numbers and not just integers. Now when the data file is echoed to standard input, these counts are expressed as real numbers. Do not let this alarm you. Everything is OK, still.

-p , --path-to-probs-files D
     This is the pathway to files containing the precomputed probabilities of having j lineages remaining at scaled time t given that you started with i lineages. Note that the trailing slash is required. For example on my system ~/Documents/eca_code/CoNe/probs/ is the pathway. The probs files are a collection of files named XXXpr.txt where XXX is a number giving the number of gene copies in the second sample. These files have been precomputed using the program simCoNeprob and are described below in FILES. The CoNe distribution includes precomputed XXXpr.txt files with XXX ranging from 10 to 400. This represents samples of between 5 and 200 diploid organisms. For different sample sizes it is necessary to create new XXXpr.txt files using simCoNeprob which is also included with the CoNe distribution.

-T , --gens-between R
     R is the number of generations between samples. It may be specified as a non-integer in order to allow for a non-integer number of generations.

-m , --mc-reps J
     J is the number of importance sampling reps to perform for each value of the number of genes ancestral to the second sample on each locus. I have found the importance sampling algorithm to be good enough that 100 J=100 gives reliable results and usually runs very quickly. However; on a final run of your data set J should be much larger. You can get an idea of whether J should be larger by the width of the Monte Carlo confidence intervals around the estimated likelihood curve.

-n , --ne-lo-hi-step R1 R2 R3
     sets the values of Ne for which the likelihood will be computed. R1 is the lowest value of Ne. R2 is the highest value of Ne. R3 is the step size between values of Ne. For values of Ne such that T/(2Ne) is smaller than .06 (or so) the precomputed values of scaled time (stored in the appropriate XXXpr.txt file (see the description of the -p option)) will be used. So the step size may not be that given by R3. Arguments need not be given as real numbers. An integer (like 250) will work just fine.

-q , --prior R
     R specifies the prior distribution for the allele frequencies. R=0 makes the prior a uniform Dirichlet distribution. R>0 makes the parameters of the Dirichlet prior are R/K. Hence R=1 is the so-called unit information prior. R=1 is the default.

-s , --seed-phrase S
     S is a single string (no spaces) that will be used to seed the random number generator. If this option is not invoked then a seed is chosen based on the current time or---if the file cone_seeds is present---the seeds are taken from that file. Upon completion of the program the next random number seeds in series are printed to the file cone_seeds.
The nroff format has all the same information as the long help format,
but it is formatted so that when processed with nroff -man and read
through less (or more if more is less on your *nix system) it has the
nice look of a Unix man page. The output is obtained with
CoNe --help-nroff. The raw output is:
.\" Process this file with
.\" groff -man -Tascii FileName.1
.\"
.TH CoNe2 1 "Date Not Available" AUTO_GENERATED_BY_ECA_OPTION_PACKAGE "User Manuals"
.SH NAME
cone -- a program for estimating Ne
.SH AUTHOR(S)
Eric C. Anderson (eric.anderson@noaa.gov)
.SH ABOUT THE PROGRAM
CoNe computes the likelihood of Ne given data on two temporally spaced genetic samples. The statistical model used is based on the coalescent of the gene copies drawn in the second sample, as described in Berthier et al. (2003) Genetics 2003. 160:741-51. The Monte Carlo computations to compute the likelihood, however, were developed by Eric Anderson, and are orders of magnitude faster than previous implementations.

Details of the algorithm are given in Anderson (2005) Genetics 170:955-967.
.\" starting program-generated options part
.SH OPTIONS
.PP
In the following:
.IR J " refers to an integer argument to an option"
.IR R " refers to a real number argument to an option"
.IR S " refers to a string argument to an option"
.IR F " refers to a file path argument to an option. For example,"
"datfile.txt" if the file is in the current working directory, or something like "~/eriq/Documents/data/datfile.txt" if you want to provide a complete file path. (Beware of spaces in file paths!)
.IR D " refers to a directory path argument to an option. For example,"
"data_direcory/" if the directory is in the current working directory, or something like "~/eriq/Documents/data_directory/" if you want to provide a complete directory path. Note that the trailing slash should be optional, but currently is not. (ERIC ADD MACROS FOR GETTING FILES AND DIRECTORIES)
.IR G " refers to a string that gives a (possibly) discontinous range of"
nonnegative integers. For example: "1-5,7,9,10-15" specifies the integers 1 through 5, 7, 9, and 10 through 15. There can be no whitespace in the string specifying the range, and the numbers must all be increasing. Also, the string cannot start nor end with a comma or a dash. Finally, you should not use "-" to denote two ranges without placing any commas in between.
.IR C " refers to a constrained string argument to an option,"
i.e., the argument is a string that may only be drawn from a small set of alternatives, as specified in the help-full description.
**** Program-description and documentation parameters ****
.PP
.B "--help"
.RS
this returns a short list of all program options and associated arguments
.RE
.PP
.B "--help-full"
.RS
this returns a full list of all program options and associated arguments
.RE
.PP
.B "--help-nroff"
.RS
this returns a full list of all program options and associated arguments using the formatting styles in nroff that give you the look of a man page. View the formatted ouput by doing: 'prog --help-nroff | nroff -man | more' where prog is the name of the program.
.RE
.PP
.B "--help-xml"
.RS
This returns a list of all options in a simple XML format which is suitable for input to the guiLiner front end.
.RE
.PP
.B "--version"
.RS
prints program version information
.RE
.PP
.B "--version-history"
.RS
prints history of different versions
.RE
.PP
.BI "--command-file" " F"
.RS
By using this option you can store command line arguments in the file named in F. You may have any kind of white space (including line endings) in the file. The line endings are treated as just another space. Comments may be included in the file by enclosing them within a pair of ampersands (the & character). Note that you must have a & at the beginning and at the end of the comment. You cannot use just a single & to comment to the end of a line. Your comments may spread over several lines---they will still be stripped from the resulting command line so long as the are enclosed in ampersands. This feature is helpful if you have long and complex command lines that you wish to store if it makes it easier to read the command line by breaking it across multiple lines or if you have certain clusters of options that you like to store together as a module. This option may be used up to 10000 times. Optional.
.RE
.PP
**** Data Analysis Options ****
.BI "-f , --file-name " " F"
.RS
F is the name of the file in which you have your data. It is in the same format as data files for TM3 (by Pierre Berthier) and TMVP (by Mark Beaumont) The data file should start with a 0 (this is a strange vestige of some sort from TM3 or TMVP) followed by the number of time periods ***which in this case must always be 2*** followed by the number of loci. Then data for each locus consists of the number of alleles observed at the locus followed by a row of counts of the different alleles observed in the first sample and a row of counts of the different alleles observed at the second sample. (By first sample I mean the sample taken first in time going forward. Hence, the second sample is the sample that was collected most recently.) There must be only integers and whitespace in the file. An example file is shown in the FILES section of the manual pages. NOTE! It turns out that it is not essential that the counts of alleles observed in the first sample (the one further back in the past) be integers. It turns out to be convenient to express them as real numbers in some cases, so I have recoded it so that they can be real numbers and not just integers. Now when the data file is echoed to standard input, these counts are expressed as real numbers. Do not let this alarm you. Everything is OK, still.
.RE
.PP
.BI "-p , --path-to-probs-files " " D"
.RS
This is the pathway to files containing the precomputed probabilities of having j lineages remaining at scaled time t given that you started with i lineages. Note that the trailing slash is required. For example on my system ~/Documents/eca_code/CoNe/probs/ is the pathway. The probs files are a collection of files named XXXpr.txt where XXX is a number giving the number of gene copies in the second sample. These files have been precomputed using the program simCoNeprob and are described below in FILES. The CoNe distribution includes precomputed XXXpr.txt files with XXX ranging from 10 to 400. This represents samples of between 5 and 200 diploid organisms. For different sample sizes it is necessary to create new XXXpr.txt files using simCoNeprob which is also included with the CoNe distribution.
.RE
.PP
.BI "-T , --gens-between " " R"
.RS
R is the number of generations between samples. It may be specified as a non-integer in order to allow for a non-integer number of generations.
.RE
.PP
.BI "-m , --mc-reps " " J"
.RS
J is the number of importance sampling reps to perform for each value of the number of genes ancestral to the second sample on each locus. I have found the importance sampling algorithm to be good enough that 100 J=100 gives reliable results and usually runs very quickly. However; on a final run of your data set J should be much larger. You can get an idea of whether J should be larger by the width of the Monte Carlo confidence intervals around the estimated likelihood curve.
.RE
.PP
.BI "-n , --ne-lo-hi-step " " R1 R2 R3"
.RS
sets the values of Ne for which the likelihood will be computed. R1 is the lowest value of Ne. R2 is the highest value of Ne. R3 is the step size between values of Ne. For values of Ne such that T/(2Ne) is smaller than .06 (or so) the precomputed values of scaled time (stored in the appropriate XXXpr.txt file (see the description of the -p option)) will be used. So the step size may not be that given by R3. Arguments need not be given as real numbers. An integer (like 250) will work just fine.
.RE
.PP
.BI "-q , --prior " " R"
.RS
R specifies the prior distribution for the allele frequencies. R=0 makes the prior a uniform Dirichlet distribution. R>0 makes the parameters of the Dirichlet prior are R/K. Hence R=1 is the so-called unit information prior. R=1 is the default.
.RE
.PP
.BI "-s , --seed-phrase " " S"
.RS
S is a single string (no spaces) that will be used to seed the random number generator. If this option is not invoked then a seed is chosen based on the current time or---if the file cone_seeds is present---the seeds are taken from that file. Upon completion of the program the next random number seeds in series are printed to the file cone_seeds.
.RE
.PP
.\" done with program-generated options part
A snapshot of what it looks like once it is processed with nroff
using, say,
CoNe --help-nroff | nroff -man | more
appears below:
Additionally, should you want to get HTML output for the man page you could do:
CoNe --help-nroff | groff -man -Thtml > cone_man_page.html
and get HTML output that looks like cone_man_page.html (Link broken if not on fish-dna-math.homeunix.net). Or, if you wanted to get PostScript output you could do:
CoNe --help-nroff | groff -man -Tps > cone_man_page.ps
Then, once that is converted to a PDF file, it looks like cone_man_page.pdf (Link broken if not on fish-dna-math.homeunix.net).
Finally, we have GuiLiner output. Doing
CoNe --help-xml
will produce output which, if captured in a file with the .xml extension, can be given to GuiLiner as input to assemble a GUI to “host” your command line program. See http://sourceforge.net/projects/guiliner/ for more details. The output produced in this case looks like:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!--A GuiLiner specification for cone-->
<guiLiner>
<program>
<binary_name>cone</binary_name>
<binary_location></binary_location>
<binary_html_manual_index></binary_html_manual_index>
<binary_short_info>a program for estimating Ne</binary_short_info>
<binary_long_info>
<![CDATA[
CoNe computes the likelihood of Ne given data on two temporally spaced genetic
samples. The statistical model used is based on the coalescent of the gene
copies drawn in the second sample, as described in Berthier et al. (2003)
Genetics 2003. 160:741-51. The Monte Carlo computations to compute the
likelihood, however, were developed by Eric Anderson, and are orders of
magnitude faster than previous implementations. <br><br>Details of the
algorithm are given in Anderson (2005) Genetics 170:955-967.
]]>
</binary_long_info>
<gL_window_size>800</gL_window_size>
<look_feel>java</look_feel> <!-- MAY BE 'SYSTEM' OR 'JAVA' -->
<wizard>false</wizard>
</program>
<options>
<option>
<option_type>info</option_type>
<option_name>Version</option_name>
<bin></bin>
<option_flag>--version</option_flag>
<option_linker> </option_linker>
<option_required>false</option_required>
<option_can_appear_in_command>false</option_can_appear_in_command>
<option_short_description>prints program version information</option_short_description>
<option_long_description></option_long_description>
</option>
<option>
<option_type>info</option_type>
<option_name>Version History</option_name>
<bin></bin>
<option_flag>--version-history</option_flag>
<option_linker> </option_linker>
<option_required>false</option_required>
<option_can_appear_in_command>false</option_can_appear_in_command>
<option_short_description>prints history of different versions</option_short_description>
<option_long_description></option_long_description>
</option>
<option>
<option_type>mflag</option_type>
<option_name>Command File</option_name>
<option_subtext>input format is: F</option_subtext>
<values></values>
<option_flag>--command-file</option_flag>
<option_linker> </option_linker>
<option_required>false</option_required>
<option_can_appear_in_command>true</option_can_appear_in_command>
<option_short_description>inserts contents of file F into the command line</option_short_description>
<option_long_description>
<![CDATA[
By using this option you can store command line arguments in the file
named in F. You may have any kind of white space (including line
endings) in the file. The line endings are treated as just another
space. Comments may be included in the file by enclosing them within
a pair of ampersands (the & character). Note that you must have a &
at the beginning and at the end of the comment. You cannot use just a
single & to comment to the end of a line. Your comments may spread
over several lines---they will still be stripped from the resulting
command line so long as the are enclosed in ampersands. This feature
is helpful if you have long and complex command lines that you wish
to store if it makes it easier to read the command line by breaking
it across multiple lines or if you have certain clusters of options
that you like to store together as a module. This option may be used
up to 10000 times. Optional.
<br><br>Expected Format Of Arguments: F ]]>
</option_long_description>
</option>
<subset>
<subset_name>Data Analysis Options</subset_name>
<subset_description>These options control the inputs and methods for the CoNe analysis</subset_description>
<option>
<option_type>filechooser</option_type>
<option_name>Data File</option_name>
<file></file>
<option_flag>-f</option_flag>
<option_linker> </option_linker>
<option_required>true</option_required>
<option_can_appear_in_command>true</option_can_appear_in_command>
<option_short_description>pathname of the data file</option_short_description>
<option_long_description>
<![CDATA[
F is the name of the file in which you have your data. It is in the
same format as data files for TM3 (by Pierre Berthier) and TMVP (by
Mark Beaumont) The data file should start with a 0 (this is a strange
vestige of some sort from TM3 or TMVP) followed by the number of time
periods ***which in this case must always be 2*** followed by the
number of loci. Then data for each locus consists of the number of
alleles observed at the locus followed by a row of counts of the
different alleles observed in the first sample and a row of counts of
the different alleles observed at the second sample. (By first sample
I mean the sample taken first in time going forward. Hence, the
second sample is the sample that was collected most recently.) There
must be only integers and whitespace in the file. An example file is
shown in the FILES section of the manual pages. NOTE! It turns out
that it is not essential that the counts of alleles observed in the
first sample (the one further back in the past) be integers. It turns
out to be convenient to express them as real numbers in some cases,
so I have recoded it so that they can be real numbers and not just
integers. Now when the data file is echoed to standard input, these
counts are expressed as real numbers. Do not let this alarm you.
Everything is OK, still.
]]>
</option_long_description>
</option>
<option>
<option_type>complex</option_type>
<option_name>Probs File Path</option_name>
<option_subtext>D</option_subtext>
<complex></complex>
<option_flag>-p</option_flag>
<option_linker> </option_linker>
<option_required>true</option_required>
<option_can_appear_in_command>true</option_can_appear_in_command>
<option_short_description>directory path to XXXpr.txt files</option_short_description>
<option_long_description>
<![CDATA[
This is the pathway to files containing the precomputed probabilities
of having j lineages remaining at scaled time t given that you
started with i lineages. Note that the trailing slash is required.
For example on my system ~/Documents/eca_code/CoNe/probs/ is the
pathway. The probs files are a collection of files named XXXpr.txt
where XXX is a number giving the number of gene copies in the second
sample. These files have been precomputed using the program
simCoNeprob and are described below in FILES. The CoNe distribution
includes precomputed XXXpr.txt files with XXX ranging from 10 to 400.
This represents samples of between 5 and 200 diploid organisms. For
different sample sizes it is necessary to create new XXXpr.txt files
using simCoNeprob which is also included with the CoNe distribution.
<br><br>Expected Format Of Arguments: D
<br><br><hr><br>The elements of the expected format string are described by their first letter as follows:<br>
<DL>
<DT>D</DT>
<DD>refers to a directory path argument to an option, typically with the trailing slash. For example: /Users/eriq/data_directory/</DD>
]]>
</option_long_description>
</option>
<option>
<option_type>complex</option_type>
<option_name>Number of Generations</option_name>
<option_subtext>R</option_subtext>
<complex></complex>
<option_flag>-T</option_flag>
<option_linker> </option_linker>
<option_required>true</option_required>
<option_can_appear_in_command>true</option_can_appear_in_command>
<option_short_description>generations between samples</option_short_description>
<option_long_description>
<![CDATA[
R is the number of generations between samples. It may be specified as
a non-integer in order to allow for a non-integer number of
generations.
<br><br>Expected Format Of Arguments: R
<br><br><hr><br>The elements of the expected format string are described by their first letter as follows:<br>
<DL>
<DT>R</DT>
<DD>refers to a real number argument to an option. For example: 2.3</DD>
]]>
</option_long_description>
</option>
<option>
<option_type>complex</option_type>
<option_name>Monte Carlo Reps</option_name>
<option_subtext>J</option_subtext>
<complex></complex>
<option_flag>-m</option_flag>
<option_linker> </option_linker>
<option_required>true</option_required>
<option_can_appear_in_command>true</option_can_appear_in_command>
<option_short_description>number of Monte Carlo reps</option_short_description>
<option_long_description>
<![CDATA[
J is the number of importance sampling reps to perform for each value
of the number of genes ancestral to the second sample on each locus.
I have found the importance sampling algorithm to be good enough that
100 J=100 gives reliable results and usually runs very quickly.
However; on a final run of your data set J should be much larger. You
can get an idea of whether J should be larger by the width of the
Monte Carlo confidence intervals around the estimated likelihood
curve.
<br><br>Expected Format Of Arguments: J
<br><br><hr><br>The elements of the expected format string are described by their first letter as follows:<br>
<DL>
<DT>J</DT>
<DD>refers to an integer argument to an option. For example: 15</DD>
]]>
</option_long_description>
</option>
<option>
<option_type>complex</option_type>
<option_name>Ne Values</option_name>
<option_subtext>R1 R2 R3</option_subtext>
<complex></complex>
<option_flag>-n</option_flag>
<option_linker> </option_linker>
<option_required>true</option_required>
<option_can_appear_in_command>true</option_can_appear_in_command>
<option_short_description>values of Ne to compute L(Ne) at</option_short_description>
<option_long_description>
<![CDATA[
sets the values of Ne for which the likelihood will be computed. R1 is
the lowest value of Ne. R2 is the highest value of Ne. R3 is the step
size between values of Ne. For values of Ne such that T/(2Ne) is
smaller than .06 (or so) the precomputed values of scaled time
(stored in the appropriate XXXpr.txt file (see the description of the
-p option)) will be used. So the step size may not be that given by
R3. Arguments need not be given as real numbers. An integer (like
250) will work just fine.
<br><br>Expected Format Of Arguments: R1 R2 R3
<br><br><hr><br>The elements of the expected format string are described by their first letter as follows:<br>
<DL>
<DT>R</DT>
<DD>refers to a real number argument to an option. For example: 2.3</DD>
]]>
</option_long_description>
</option>
<option>
<option_type>complex</option_type>
<option_name>Allele Freq Prior</option_name>
<option_subtext>R</option_subtext>
<complex></complex>
<option_flag>-q</option_flag>
<option_linker> </option_linker>
<option_required>false</option_required>
<option_can_appear_in_command>true</option_can_appear_in_command>
<option_short_description>allele frequency prior parameter</option_short_description>
<option_long_description>
<![CDATA[
R specifies the prior distribution for the allele frequencies. R=0
makes the prior a uniform Dirichlet distribution. R>0 makes the
parameters of the Dirichlet prior are R/K. Hence R=1 is the so-called
unit information prior. R=1 is the default.
<br><br>Expected Format Of Arguments: R
<br><br><hr><br>The elements of the expected format string are described by their first letter as follows:<br>
<DL>
<DT>R</DT>
<DD>refers to a real number argument to an option. For example: 2.3</DD>
]]>
</option_long_description>
</option>
<option>
<option_type>complex</option_type>
<option_name>Random Seed Phrase</option_name>
<option_subtext>S</option_subtext>
<complex></complex>
<option_flag>-s</option_flag>
<option_linker> </option_linker>
<option_required>false</option_required>
<option_can_appear_in_command>true</option_can_appear_in_command>
<option_short_description>random number seed-phrase</option_short_description>
<option_long_description>
<![CDATA[
S is a single string (no spaces) that will be used to seed the random
number generator. If this option is not invoked then a seed is chosen
based on the current time or---if the file cone_seeds is
present---the seeds are taken from that file. Upon completion of the
program the next random number seeds in series are printed to the
file cone_seeds.
<br><br>Expected Format Of Arguments: S
<br><br><hr><br>The elements of the expected format string are described by their first letter as follows:<br>
<DL>
<DT>S</DT>
<DD>refers to a string argument to an option. For example: boing</DD>
]]>
</option_long_description>
</option>
</subset>
</options>
</guiLiner>
The following is the fragment of the function GetCoNeData in which
all the options and their documentation are defined. It is what allows
ECA_Opt to produce the documentation seen in the examples above. The
syntax highlighting shows all ECA_Opt macros in UPPERCASE turquoise.
(A few lowercase occurrences of the word “options” are also shown in
turquoise; they should not be highlighted, as they are not ECA_Opt macros.)
void GetCoNeData(char *FILENAME, ldat **loc, char **argv, int argc, int *NumLoc,
                 double *T, int *NumReps, char *phrase, Nstruct **Ns)
{
    int i,j,l,K,n0,nT,temp;
    FILE *in;
    double Nlo,Nhi,Nstep;
    int NumSam;
    /* one flag per option; ECA_Opt uses these to track which options were given */
    int File_f = 0,
        T_f = 0,
        Path_f = 0,
        N_f = 0,
        Seed_f = 0,
        Nlo_hi_step_f = 0,
        Prior_f = 0;
    DECLARE_ECA_OPT_VARS;

    /* some defaults */
    gPrior = 1.0;
    sprintf(phrase,"");

    SET_OPT_WIDTH(28);
    SET_ARG_WIDTH(17);

    SET_PROGRAM_NAME("cone");
    SET_PROGRAM_SHORT_DESCRIPTION("a program for estimating Ne");
    SET_PROGRAM_LONG_DESCRIPTION(
        CoNe computes the likelihood of Ne given data on two temporally
        spaced genetic samples. The statistical model used is based on the
        coalescent of the gene copies drawn in the second sample\054 as
        described in Berthier et al. (2003) Genetics 2003. 160:741-51.
        The Monte Carlo computations to compute the likelihood\054
        however\054 were developed by Eric Anderson\054 and are orders of
        magnitude faster than previous implementations.
        \n\nDetails of the algorithm are given in Anderson (2005)
        Genetics 170:955-967.
    );
    SET_VERSION("VERSION: 1.02\nAUTHOR: Eric C. Anderson (eric.anderson@noaa.gov)\nDATE: 1 August 2007")
    SET_PROGRAM_AUTHOR_STRING("Eric C. Anderson (eric.anderson@noaa.gov)");
    /* continuation lines start in column 0 so no stray indentation enters the string */
    SET_VERSION_HISTORY("\
\n\nVERSION: 1.02\nAUTHOR: Eric C. Anderson (eric.anderson@noaa.gov)\nDATE: 1 August 2007\
\nCHANGES:\
\n1. Modified options input to use ECA_Opt3, so it can output to guiLiner.\
\n\nVERSION: 1.01\nAUTHOR: Eric C. Anderson (eric.anderson@noaa.gov)\nDATE: 28 September 2005\
\nCHANGES:\
\n1. Added the LOC_SPECIFIC_LOGLS lines on the output. These\
\n provide locus specific log-likelihood curves.\
\n2. Removed the ALL_LOCI lines of output because they are meaningless\
\n when sample sizes differ between loci.\
\n3. Fixed a bug in the companion utility \"simCoNeprob\" that \
\n caused spurious results if the number\
\n of lineages at time T was very large, and few replicates were\
\n run in simCoNeprob. (Thanks to Stuart Barker for catching this.) \
\nCOPYRIGHT: Federal Gov't Work. No copyright.\n\n\
\n\nVERSION: 1.0\nAUTHOR: Eric C. Anderson (eric.anderson@noaa.gov)\nDATE: 7 March 2005\nCOPYRIGHT: Federal Gov't Work. No copyright.\n\n\
\n\nVERSION: 1.0 beta\nAUTHOR: Eric C. Anderson (eric.anderson@noaa.gov)\nDATE: 21 April 2004\nCOPYRIGHT: Federal Gov't Work. No copyright. \n\n")

    BEGIN_OPT_LOOP

    OPEN_SUBSET(Data Analysis Options,
        Data Analysis Options,
        These options control the inputs and methods for the CoNe analysis);

    if ( REQUIRED_OPTION(Data File,
            File_f,
            f,
            file-name,
            F,
            pathname of the data file,
            F is the name of the file in which you have your data. It is in
            the same format as data files for TM3 (by Pierre Berthier) and
            TMVP (by Mark Beaumont) The data file should start with a 0
            (this is a strange vestige of some sort from TM3 or TMVP)
            followed by the number of time periods ***which in this case
            must always be 2*** followed by the number of loci. Then data
            for each locus consists of the number of alleles observed at
            the locus followed by a row of counts of the different alleles
            observed in the first sample and a row of counts of the
            different alleles observed at the second sample. (By first
            sample I mean the sample taken first in time going forward.
            Hence\054 the second sample is the sample that was collected
            most recently.) There must be only integers and whitespace in
            the file. An example file is shown in the FILES section of the
            manual pages. NOTE! It turns out that it is not essential that
            the counts of alleles observed in the first sample (the one
            further back in the past) be integers. It turns out to be
            convenient to express them as real numbers in some cases\054
            so I have recoded it so that they can be real numbers and not
            just integers. Now when the data file is echoed to standard
            input\054 these counts are expressed as real numbers. Do not
            let this alarm you. Everything is OK\054 still.
            ) ) {
        if( ARGS_EQ(1) ) {
            GET_STR(FILENAME);
        }
    }

    if ( REQUIRED_OPTION(Probs File Path,
            Path_f,
            p,
            path-to-probs-files,
            D,
            directory path to XXXpr.txt files ,
            This is the pathway to files containing the precomputed
            probabilities of having j lineages remaining at scaled time t
            given that you started with i lineages. Note that the trailing
            slash is required. For example on my system
            ~/Documents/eca_code/CoNe/probs/ is the pathway. The probs
            files are a collection of files named XXXpr.txt where XXX is a
            number giving the number of gene copies in the second sample.
            These files have been precomputed using the program
            simCoNeprob and are described below in FILES. The CoNe
            distribution includes precomputed XXXpr.txt files with XXX
            ranging from 10 to 400. This represents samples of between 5
            and 200 diploid organisms. For different sample sizes it is
            necessary to create new XXXpr.txt files using simCoNeprob
            which is also included with the CoNe distribution. ) ) {
        if( ARGS_EQ(1) ) {
            GET_STR(gProbsPath);
        }
    }

    if ( REQUIRED_OPTION(
            Number of Generations,
            T_f,
            T,
            gens-between,
            R,
            generations between samples,
            R is the number of generations between samples. It may be
            specified as a non-integer in order to allow for a non-integer
            number of generations. ) ) {
        if( ARGS_EQ(1) ) {
            *T = GET_DUB;
        }
    }

    if ( REQUIRED_OPTION(
            Monte Carlo Reps,
            N_f,
            m,
            mc-reps,
            J,
            number of Monte Carlo reps,
            J is the number of importance sampling reps to perform for
            each value of the number of genes ancestral to the second
            sample on each locus. I have found the importance sampling
            algorithm to be good enough that 100 J=100 gives reliable
            results and usually runs very quickly. However; on a final run
            of your data set J should be much larger. You can get an idea
            of whether J should be larger by the width of the Monte Carlo
            confidence intervals around the estimated likelihood curve.
            ) ) {
        if( ARGS_EQ(1) ) {
            *NumReps = GET_INT;
        }
    }

    if ( REQUIRED_OPTION(
            Ne Values,
            Nlo_hi_step_f,
            n,
            ne-lo-hi-step,
            R1 R2 R3,
            values of Ne to compute L(Ne) at ,
            sets the values of Ne for which the likelihood will be
            computed. R1 is the lowest value of Ne. R2 is the highest
            value of Ne. R3 is the step size between values of Ne. For
            values of Ne such that T/(2Ne) is smaller than .06 (or so) the
            precomputed values of scaled time (stored in the appropriate
            XXXpr.txt file (see the description of the -p option)) will be
            used. So the step size may not be that given by R3. Arguments
            need not be given as real numbers. An integer (like 250) will
            work just fine. ) ) {
        if( ARGS_EQ(3) ) {
            Nlo = GET_DUB;
            Nhi = GET_DUB;
            Nstep = GET_DUB;
            /* the lowest value of Ne must not exceed the highest */
            if(Nlo>Nhi) {
                fprintf(stderr,"Error Processing Option -n/--ne-lo-hi-step! The first argument is greater than the second argument.\n");
                OPT_ERROR;
            }
        }
    }

    if (OPTION(
            Allele Freq Prior,
            Prior_f,
            q,
            prior,
            R,
            allele frequency prior parameter,
            R specifies the prior distribution for the allele frequencies.
            R=0 makes the prior a uniform Dirichlet distribution. R>0
            makes the parameters of the Dirichlet prior are R/K. Hence R=1
            is the so-called unit information prior. R=1 is the default.) ) {
        if( ARGS_EQ(1) ) {
            gPrior = GET_DUB;
            /* R=0 is a documented, legal value; only negative values are rejected */
            if(gPrior < 0.0) {
                fprintf(stderr,"Error Processing Option -q/--prior! The argument of this option must be greater than or equal to zero\n");
                OPT_ERROR;
            }
        }
    }

    if ( OPTION(
            Random Seed Phrase,
            Seed_f,
            s,
            seed-phrase,
            S,
            random number seed-phrase ,
            S is a single string (no spaces) that will be used to seed the
            random number generator. If this option is not invoked then a
            seed is chosen based on the current time or---if the file
            cone_seeds is present---the seeds are taken from that file.
            Upon completion of the program the next random number seeds in
            series are printed to the file cone_seeds. ) ) {
        if( ARGS_EQ(1) ) {
            GET_STR(phrase);
        }
    }

    CLOSE_SUBSET

    END_OPT_LOOP
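To make the meaning of the three arguments to -n/--ne-lo-hi-step a little more concrete, here is a minimal sketch of how a lowest value, a highest value, and a step size could be expanded into the list of Ne values to evaluate. It is only an illustration: the function ne_grid and its parameters are hypothetical names of mine and appear nowhere in CoNe or ECA_Opt, and CoNe's real code may well do this differently. In particular, as the option's documentation says, for values of Ne where T/(2Ne) is small CoNe uses precomputed scaled times, so the spacing it actually uses need not be R3.

```c
/*
 * Hypothetical helper (NOT part of CoNe or ECA_Opt): expand the three
 * arguments of -n/--ne-lo-hi-step into an evenly spaced list of Ne values.
 *
 * Writes lo, lo+step, lo+2*step, ... (never exceeding hi) into out[].
 * Returns the number of values written, or -1 if the arguments are
 * inconsistent (lo > hi, step <= 0) or if out[] is too small.
 */
int ne_grid(double lo, double hi, double step, double *out, int max_out)
{
    int count = 0;

    /* same sanity check as in the option loop: lo must not exceed hi */
    if (lo > hi || step <= 0.0) {
        return -1;
    }

    for (;;) {
        /* compute each value from the index rather than by repeated
           addition, so that rounding error does not accumulate */
        double ne = lo + count * step;

        /* small tolerance so that hi itself is not lost to rounding */
        if (ne > hi + 1e-9 * step) {
            break;
        }
        if (count >= max_out) {
            return -1;   /* caller's buffer is too small */
        }
        out[count++] = ne;
    }
    return count;
}
```

Under these assumptions, calling ne_grid(100.0, 500.0, 100.0, buf, 8) fills buf with 100, 200, 300, 400 and 500 and returns 5. A call with the first two arguments swapped returns -1, which corresponds to the case that the option loop above reports with OPT_ERROR.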