
Module: Argument Parser

smehringer edited this page Dec 10, 2018 · 19 revisions

Helpful Links:

Design decisions:

Internal Parsing of the command line

There are two ways to handle command line parsing and bridge the developer-user communication. Both have advantages and disadvantages:

1. [Developer->User]

Store the options/parameters/flags/... specified by the developer in a list. When parsing starts, we already know the names, types and properties of every option and can dissect the command line piece by piece by looking for expected options.

Advantages:

  • Knowing the options/... beforehand makes it possible to resolve ambiguities and to throw helpful errors. E.g. -i hello could be a flag i followed by the argument/positional hello, or it could be an option-value pair. Thus the developer is not restricted to setting up the parser in a certain order.

Disadvantages:

  • Storing options is difficult with respect to type handling. SeqAn2 defines extra types like SEQAN::ARGUMENT_TYPE::INTEGER, which is a lot of overhead, limits the set of types, and requires maintenance. A modern (C++17) solution would be to store the options/..., which are templatized by the option type, in a vector of std::any or std::variant. This gets tedious for a lot of functionality, like transforming the command line string into the value type, because I would always need to ask for the type.
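To illustrate the type-handling problem of approach 1, here is a minimal sketch (assuming C++17) that stores options in a std::variant. The names stored_option, any_option, assign_value and set_by_short_id are hypothetical and not part of SeqAn; the closed list of alternatives (int, double, std::string) is an assumption made for brevity.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical option record: the developer's variable plus its short id.
template <typename value_type>
struct stored_option
{
    char short_id;
    value_type * save_place; // points to the developer's variable
};

// All option types must be listed up front: the set of supported types is closed.
using any_option = std::variant<stored_option<int>,
                                stored_option<double>,
                                stored_option<std::string>>;

// Converting the raw string needs the concrete type, so we have to visit.
inline void assign_value(any_option const & option, std::string const & raw)
{
    std::visit([&raw] (auto const & opt)
    {
        std::istringstream stream{raw};
        if (!(stream >> *opt.save_place))
            throw std::invalid_argument{"cannot convert '" + raw + "'"};
    }, option);
}

// Even a simple lookup by id needs another visit.
inline bool set_by_short_id(std::vector<any_option> const & options, char id, std::string const & raw)
{
    for (any_option const & option : options)
    {
        bool const matches = std::visit([id] (auto const & opt) { return opt.short_id == id; }, option);
        if (matches)
        {
            assign_value(option, raw);
            return true;
        }
    }
    return false;
}
```

Every new piece of functionality (help output, validation, default printing) would need yet another std::visit, which is the overhead described above.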

2. [User->Developer]

Parse the command line first and store the user-defined options/flags/parameters/... in a list. Then check every option specified by the developer directly for existence.

Advantages:

  • Since the developer-specified options are retrieved directly, I can have functions working on any type, because I don't need to store them.

Disadvantages:

  • I need to take care of ambiguity, e.g. -i hello as flag + argument vs. option-value pair, by storing both possibilities and restricting the developer to setting up the parser in a strict order, e.g. first specify all flags, then all options.
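A minimal sketch of approach 2 under simple assumptions: the command line is kept as a list of tokens, and the developer-facing calls are plain templates, so no type needs to be stored. raw_command_line, candidate_value, get_option and get_flag are hypothetical names. Note that for -i hello both get_flag and get_option succeed, which is exactly the ambiguity described above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// argv split up into tokens, without any interpretation yet.
struct raw_command_line
{
    std::vector<std::string> tokens;
};

// The token after an id is only a *candidate* value: without knowing the
// developer's setup, "-i hello" may also be flag -i plus positional "hello".
inline std::optional<std::string> candidate_value(raw_command_line const & cl, std::string const & id)
{
    for (std::size_t k = 0; k + 1 < cl.tokens.size(); ++k)
        if (cl.tokens[k] == id)
            return cl.tokens[k + 1];
    return std::nullopt;
}

// A plain template works for any streamable type; nothing has to be stored.
template <typename value_type>
bool get_option(raw_command_line const & cl, std::string const & id, value_type & save_place)
{
    std::optional<std::string> const raw = candidate_value(cl, id);
    if (!raw)
        return false;
    std::istringstream stream{*raw};
    return static_cast<bool>(stream >> save_place);
}

inline bool get_flag(raw_command_line const & cl, std::string const & id)
{
    for (std::string const & token : cl.tokens)
        if (token == id)
            return true;
    return false;
}
```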

Our approach 🥇

To avoid the above-mentioned disadvantages, our current approach combines the two. We do not want to store the options in a list but parse them directly, while still allowing the developer to specify options and arguments in an independent order.
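One way such a combination could look, as a rough sketch rather than the actual implementation: argv is tokenized once, and each typed add_* call consumes its own tokens immediately. Because add_flag knows that i is a flag and add_option knows that n takes a value, the two calls can be made in any order. The class sketch_parser is hypothetical; clustered flags, long ids and validators are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parser: nothing is stored per option, each call parses directly.
class sketch_parser
{
public:
    // Assumes argc >= 1 (argv[0] is the program name and is skipped).
    sketch_parser(int argc, char const * const * argv) : tokens(argv + 1, argv + argc) {}

    void add_flag(bool & save_place, char short_id)
    {
        auto it = std::find(tokens.begin(), tokens.end(), std::string{'-', short_id});
        save_place = (it != tokens.end());
        if (save_place)
            tokens.erase(it); // consume the flag
    }

    template <typename value_type>
    void add_option(value_type & save_place, char short_id)
    {
        auto it = std::find(tokens.begin(), tokens.end(), std::string{'-', short_id});
        if (it == tokens.end())
            return; // option not given: keep the default value of save_place
        if (it + 1 == tokens.end())
            throw std::invalid_argument{"missing value for option"};
        std::istringstream stream{*(it + 1)};
        if (!(stream >> save_place))
            throw std::invalid_argument{"cannot convert value"};
        tokens.erase(it, it + 2); // consume id and value
    }

    // Whatever is left after flags and options were consumed is positional.
    std::vector<std::string> const & remaining() const { return tokens; }

private:
    std::vector<std::string> tokens;
};
```

In this sketch positional options are whatever remains after all flags and options have been consumed, so they can only be read after the last add_flag/add_option call.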

Current interface:

// positional option
// in order, required (except if last one is a list)
template <typename value_type, typename functor>
void add_positional_option(value_type & save_place,         // e.g. myOptions.outputFile
                           std::string const & description, // temporary string for help page output
                           option_spec const & spec,        // (optional) default, advanced, hidden
                           functor const & validator);      // (optional) functor satisfying the validator concept

// non-valued option (existence is true/false)
void add_flag(bool & save_place,                // e.g. myOptions.fastMode
              char const short_id,              // -f
              std::string const & long_id,      // --fast
              std::string const & description); // "fast mode is switched on"

// valued option
template <typename value_type, typename functor>
void add_option(value_type & save_place,         // e.g. myOptions.outputFile
                char const short_id,             // -o
                std::string const & long_id,     // --output
                std::string const & description, // "give the name of an output file."
                option_spec const & spec,        // (optional) default, advanced, hidden
                functor const & validator);      // (optional) functor satisfying the validator concept

save_place is an existing variable that also defines the type of the argument. It can be set to a default value upon definition/creation, i.e. outside of the add_* function. validator is a functor that verifies the argument. It can be user-specified, but there shall be pre-defined validators for integral ranges (taking a pair of integral values) or file extensions (taking a container of std::string).
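The save_place idea can be sketched with a small helper, assuming conversion via std::istringstream (apply_option_value is a hypothetical name): the type of the developer's variable selects the conversion, an option that is not given leaves the default untouched, and the validator is called before the value is stored.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: raw_value is nullptr if the option was not given.
template <typename value_type, typename validator_type>
void apply_option_value(value_type & save_place,
                        std::string const * raw_value,
                        validator_type const & validator)
{
    if (raw_value == nullptr)
        return; // keep the default value set by the developer

    value_type parsed{};
    std::istringstream stream{*raw_value};
    // Reject values that do not convert completely, e.g. "5x" for an int.
    if (!(stream >> parsed) || !stream.eof())
        throw std::invalid_argument{"value '" + *raw_value + "' has the wrong type"};

    validator(parsed);    // validators throw if the value is not acceptable
    save_place = parsed;  // only reached for valid values
}
```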

Legal ways to specify flags (options without value):

Allowed:
-i     (simple, one char short id)
-fi    (clustered, one char short id)
--iter (simple, multi char long id, no cluster allowed)
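A sketch of how the allowed flag spellings could be recognized, assuming the command line has already been split into tokens. flag_given is a hypothetical helper; a real parser would additionally have to verify that every character of a cluster like -fi is a known flag, so that e.g. a short option with attached value is not misread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical check for the allowed flag spellings:
//   -i      single short id
//   -fi     cluster of one-char short ids
//   --iter  long id (never clustered)
inline bool flag_given(std::vector<std::string> const & tokens, char short_id, std::string const & long_id)
{
    for (std::string const & token : tokens)
    {
        if (token == "--" + long_id)
            return true;
        // short id or cluster: exactly one leading dash, followed by id characters
        bool const short_form = token.size() >= 2 && token[0] == '-' && token[1] != '-';
        if (short_form && token.find(short_id, 1) != std::string::npos)
            return true;
    }
    return false;
}
```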

Legal ways to specify options:

Allowed:
-i 5     (short id, space separation)
-i5      (short id, no space separation)
--iter 5 (long id, space separation)
--iter=5 (long id, equality sign separation)

Not allowed:
-i=5     (short id, equality sign separation)
--iter5  (long id, no space separation)
-fi 5 (where f is flag)
-fi5  (where f is flag)
-fi=5 (where f is flag)
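The option rules can be sketched in the same way. option_value is a hypothetical helper that returns the raw value string of an option, or std::nullopt if the option is absent. It rejects -i=5 explicitly and does not treat --iter5 or -fi 5 as the option i.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

inline std::optional<std::string> option_value(std::vector<std::string> const & tokens,
                                               char short_id,
                                               std::string const & long_id)
{
    std::string const short_name{'-', short_id};
    std::string const long_name = "--" + long_id;

    for (std::size_t k = 0; k < tokens.size(); ++k)
    {
        std::string const & token = tokens[k];

        if (token == short_name || token == long_name)          // -i 5, --iter 5
        {
            if (k + 1 == tokens.size())
                throw std::invalid_argument{"missing value for " + token};
            return tokens[k + 1];
        }
        if (token.rfind(long_name + "=", 0) == 0)               // --iter=5
            return token.substr(long_name.size() + 1);
        if (token.size() > 2 && token.rfind(short_name, 0) == 0) // -i5
        {
            if (token[2] == '=')                                // -i=5 is not allowed
                throw std::invalid_argument{"'=' is not allowed after a short id: " + token};
            return token.substr(2);
        }
    }
    return std::nullopt; // --iter5 and -fi 5 fall through to here
}
```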

Validators

Validators are designed as functors that throw if a value does not pass validation, which (unless caught) terminates the program. They are not semiregular, since default construction is not possible right now. The reason for this decision is that there is no obvious use case for default constructing a validator and changing its state later on, as most validators are passed directly as rvalues, e.g. add_option(myint, ..., range_validator{0,10}). If default construction is implemented later on, a default-constructed validator should preferably accept every value (type dependent).
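A minimal sketch of two validators following this design, assuming int ranges and extension matching on the part after the last dot. range_validator mirrors the call in the text above but is not claimed to match the library's class; file_ext_validator is a hypothetical name. Both delete the default constructor and throw std::invalid_argument on failure.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

class range_validator
{
public:
    range_validator() = delete; // no meaningful default state
    range_validator(int min, int max) : min_{min}, max_{max} {}

    void operator()(int value) const
    {
        if (value < min_ || value > max_)
            throw std::invalid_argument{"value " + std::to_string(value) + " is not in range [" +
                                        std::to_string(min_) + ", " + std::to_string(max_) + "]"};
    }

private:
    int min_;
    int max_;
};

class file_ext_validator
{
public:
    file_ext_validator() = delete;
    explicit file_ext_validator(std::vector<std::string> extensions) : extensions_{std::move(extensions)} {}

    void operator()(std::string const & file_name) const
    {
        auto const dot = file_name.rfind('.');
        std::string const ext = (dot == std::string::npos) ? std::string{} : file_name.substr(dot + 1);
        if (std::find(extensions_.begin(), extensions_.end(), ext) == extensions_.end())
            throw std::invalid_argument{"file extension '" + ext + "' is not supported"};
    }

private:
    std::vector<std::string> extensions_;
};
```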

Outstanding decisions:

  • Naming convention: parameter option argument positional anonymous...
  • Should the parser be put into its own repository?

TODO

Wishlist from h-2:

  • Also support enums as TValue and offer a validator that accepts the enum labels as valid strings. This will be a little tricky to implement, but one could check how e.g. Cereal gets string values from enum labels.
  • The input_file_validator (and output_file_validator and directory...) shall check via http://en.cppreference.com/w/cpp/filesystem that the file/directory exists, is readable, is writable, et cetera.
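A sketch of the wished-for input_file_validator using std::filesystem (C++17). It checks existence and file type and then tries to open the file, since std::filesystem offers no portable query for whether the current user may read a file. The class layout is an assumption, not the planned implementation.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

class input_file_validator
{
public:
    void operator()(std::filesystem::path const & file) const
    {
        if (!std::filesystem::exists(file))
            throw std::invalid_argument{"file " + file.string() + " does not exist"};
        if (!std::filesystem::is_regular_file(file))
            throw std::invalid_argument{file.string() + " is not a regular file"};
        // Readability is tested by actually opening the file.
        std::ifstream stream{file};
        if (!stream.is_open())
            throw std::invalid_argument{"file " + file.string() + " is not readable"};
    }
};
```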

Questions:

  • rename option to parameter?

Check if the following is supported (taken from seqan2 feature request issue #533):

Martin Frith notes that there are some deficiencies with the current state of the argument parser:

  1. It requires space between an option and its argument.
  2. It doesn't allow "-" as a positional argument. "-" is often used to indicate stdin and stdout.
  3. It doesn't seem to allow a list of zero or more positional arguments. (It does allow a list of one or more positional arguments.) It's standard to allow a list of zero or more file-names, with zero meaning "read stdin".
  4. It doesn't recognize "--" meaning "end of options" (http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html). This is useful to allow negative numbers as positional arguments.
  5. It requires argv to be const, whereas, strictly speaking, argv should not be const (http://en.wikipedia.org/wiki/Main_function). And it's not possible to convert non-const argv to const (http://www.parashift.com/c++-faq/constptrptr-conversion.html).
  6. (Minor): It's not possible to display a fake default value in the help message. This might be useful if the default value is logically -INF, but as an implementation detail is actually INT_MIN.
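Points 2, 4 and 5 can be addressed in a tokenization step. The sketch below (hypothetical names split_command_line, split_at_end_of_options, looks_like_id) keeps everything after the first "--" as positional, does not treat a lone "-" as an id, and takes argv as char const * const *, to which a non-const char ** converts implicitly; only the conversion to char const ** is forbidden.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct split_command_line
{
    std::vector<std::string> before_marker; // may contain ids, values and positionals
    std::vector<std::string> after_marker;  // always positional, even "-3" or "--help"
};

inline split_command_line split_at_end_of_options(int argc, char const * const * argv)
{
    split_command_line result;
    bool marker_seen = false;
    for (int k = 1; k < argc; ++k)
    {
        std::string token{argv[k]};
        if (!marker_seen && token == "--")
            marker_seen = true; // the first "--" itself is not an argument
        else if (marker_seen)
            result.after_marker.push_back(std::move(token));
        else
            result.before_marker.push_back(std::move(token));
    }
    return result;
}

// A lone "-" (commonly stdin/stdout) is not an option id.
inline bool looks_like_id(std::string const & token)
{
    return token.size() >= 2 && token[0] == '-';
}
```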