Improved pregex and plex command-line tools
phorward committed Apr 5, 2018
1 parent 744059a commit efd87e0
Showing 5 changed files with 94 additions and 51 deletions.
10 changes: 6 additions & 4 deletions CHANGELOG.md
@@ -7,20 +7,20 @@ This file is used to document any relevant changes done to libphorward.
Current development version.

- Parsing tools
- Created better definition language called PBNF (Phorward BNF, pbnf.c)
- Revised all modules, separating the grammar definition entirely from the
parsing algorithm, lexer, parser and abstract syntax tree. This allows for
a much higher modularity. The ``pppar`` object now represents the internal
LALR parser that can be executed on arbitrary input, with a combined lexical
analyzer implemented using ``plex``.
- Revised and simplified LR parser driver, now working on state machine, and
not the data-structures from lr.c anymore.
- Created better definition language called PBNF (Phorward BNF, pbnf.c)
- Support for a BNF, EBNF and a Phorward-style BNF (PBNF) as input grammars
using the functions pp_gram_from_bnf(), pp_gram_from_ebnf() and
pp_gram_from_pbnf().
- Regular expressions
- Internal revisions and renamings, cleaning data structures from ephemeral
values
- Internal revisions and renamings.
- Cleaning data structures from temporal and ephemeral values.
- Removing ``pregex_accept`` structure
- Renamed ``begin`` to ``start`` in the ``prange`` structure.
- Trace facilities
@@ -29,7 +29,9 @@ Current development version.
- New LOG macro to allow for printf-style formatted output.
- Bugfixes
- Improved the plex command-line utility, it now recognizes `-b` and `-e`
correctly and can read from stdin.
correctly, allows for escape sequences and can read from stdin.
- Improved the pregex command-line utility to use the input parameter as is,
if the parameter is not the name of a file.
- Removed warnings and unused static functions from the entire library.
- Fixing & refactoring in p_ccl_parseshorthand() that caused invalid dfa state
machines generated from regular expressions on some 32-bit machine
53 changes: 23 additions & 30 deletions README.md
@@ -169,65 +169,58 @@ int main( int argc, char** argv )
- Consequent object-oriented build-up of all function interfaces (e.g. plist, parray, pregex, pparse, ...)
- Growing code-base of more and more powerful functions
## Getting started
## Documentation
*phorward* is under heavy development. It is kept simple, clear and straightforward. Recent documentation can be found [here](https://www.phorward-software.com/products/phorward/doc/phorward.html), but also locally after installation.
Recently updated, full documentation can be found [here](https://www.phorward-software.com/products/phorward/doc/phorward.html), and also locally after installation. The documentation currently focuses on the stable parts of the library only. Parts which are experimental or under development are not covered or are only briefly mentioned.
The documentation is currently in an under-development state and incomplete. It contains a generated functions reference and handles all library parts shortly.
## Building
### Building
Building *phorward* is simple as every GNU-style open source program. Extract the downloaded release tarball or clone the source repository into a directory of your choice.
Then, run
Building *phorward* is as simple as building any GNU-style open source program. Extract the downloaded release tarball or clone the source repository into a directory of your choice. Then run the following steps:
```bash
$ ./configure
```

to configure the build-system and generate the Makefiles for your current platform. After successful configuration, run

```bash
$ make
$ make install
```

and
And you're ready to go!

```bash
$ make install
```
### Windows platforms

(properly as root), to install the toolkit into your system.
On Windows, [Cygwin](http://cygwin.org/) or another Unix shell environment is required. *phorward* also cross-compiles on Linux using the MinGW and MinGW_x86-64 compilers.

### Local building
To compile into 32-Bit Windows executables, configure with

Alternatively there is also a simpler method for setting up a local build system for development and testing purposes locally in the file-system.
```bash
$ ./configure --host=i486-mingw32 --prefix=/usr/i486-mingw32
```

Once, type
To compile into 64-Bit Windows executables, configure with

```bash
$ make -f Makefile.gnu make_install
$ ./configure --host=x86_64-w64-mingw32 --prefix=/usr/x86_64-w64-mingw32
```

then, a simple run of
### Local development build

Alternatively, there is a simpler method for setting up a local build system in the file system for development and testing purposes:

```bash
$ make -f Makefile.gnu make_install
$ make
```

can be used to simply build the entire library or parts of it.

Note, that changes to the build system then must be done in the local Makefile, the local Makefile.gnu as well as the Makefile.am for the autotools-based build system.
This compiles the toolkit, or parts of it, locally.

## Credits

*phorward* is developed and maintained by Jan Max Meyer at Phorward Software Technologies.

Some other projects by the author are:
Some other, related projects by the author are:

- [UniCC](https://github.com/phorward/unicc), the universal parser generator, mostly based on *phorward*,
- [RapidBATCH](https://github.com/phorward/rapidbatch), a scripting language, also based on *phorward*,
- [pynetree](https://github.com/phorward/pynetree), a light-weight parsing toolkit written in pure Python.
- [UniCC](https://github.com/phorward/unicc), the universal parser generator, created on top of *phorward*,
- [RapidBATCH](https://github.com/phorward/rapidbatch), a scripting language, created on top of *phorward*,
- [pynetree](https://github.com/phorward/pynetree), a light-weight parsing toolkit written in pure Python,
- [JS/CC](https://jscc.brobston.com), the JavaScript parser generator.

## License
39 changes: 38 additions & 1 deletion doc/misc.t2t
@@ -177,7 +177,7 @@ The Phorward library provides some useful command-line tools which can also be u
**pregex** is a command-line tool for regular expression operations on files and strings. It can be used for match, find, split and replace actions.

```
Usage: pregex OPTIONS {expression-only-if-no-other} [input-file]
Usage: pregex OPTIONS {expression} input

-a --action ACTION Perform regular expression action:
match (default), find, split, replace
@@ -192,6 +192,17 @@ Usage: pregex OPTIONS {expression-only-if-no-other} [input-file]
-V --version Show version info and exit.
```

Example call:
```
$ pregex -a find "\d+|[a-z]+" "123 abc456 78xy9"
123
abc
456
78
xy
9
```
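
In the call above, the second positional parameter is not the name of a file, so it is used as the input string itself. The following is a minimal sketch of that fallback in plain standard C. It is not the code from tools/pregex.c, which relies on the library's ``pfiletostr()``; the helper name ``resolve_input()`` and its ``owned`` parameter are hypothetical and only serve this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only (not part of libphorward).

   Returns the content of the file named by arg if it can be opened and
   read, otherwise arg itself. On a successful file read, *owned receives
   the allocated buffer the caller must free(); otherwise it is NULL. */
const char* resolve_input( const char* arg, char** owned )
{
    FILE*   f;
    long    size;
    char*   buf;

    assert( arg != NULL && owned != NULL );
    *owned = NULL;

    /* Not a readable file: use the parameter as is */
    if( !( f = fopen( arg, "rb" ) ) )
        return arg;

    /* Determine the file size; fall back to the parameter on failure */
    if( fseek( f, 0, SEEK_END ) != 0 || ( size = ftell( f ) ) < 0 )
    {
        fclose( f );
        return arg;
    }

    rewind( f );

    if( !( buf = malloc( (size_t)size + 1 ) ) )
    {
        fclose( f );
        return arg;
    }

    /* Read the content and zero-terminate it */
    size = (long)fread( buf, 1, (size_t)size, f );
    buf[ size ] = '\0';
    fclose( f );

    *owned = buf;
    return buf;
}
```

A caller would pass the positional parameter, work on the returned string, and call ``free()`` on the ``owned`` pointer afterwards, which is harmless when it is NULL.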

=== plex ===

**plex** is a command-line tool to construct and run lexical analyzers. It returns a list of tokens on success to stdout.
@@ -209,6 +220,17 @@ Usage: plex OPTIONS patterns...
-V --version Show version info and exit.
```

Example call:
```
$ plex -b ":" -e "\n" -i "123 abc456 78xy9" "\d+" "[a-z]+"
1:123
2:abc
1:456
1:78
2:xy
1:9
```
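
The ``-e "\n"` parameter in this call reaches the program as the two characters backslash and ``n``; it only becomes a line break because plex now unescapes the ``-b`` and ``-e`` parameters. The sketch below shows the general idea for a handful of escape sequences in plain standard C. It is a simplified stand-in under the assumption that only ``\n``, ``\t``, ``\r`` and ``\\`` need handling; ``unescape_simple()`` is a hypothetical name, and the library's own ``pstrunescape()`` may support more sequences and behave differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for illustration only;
   this is NOT libphorward's pstrunescape().

   Returns a newly allocated copy of src in which the sequences
   \n, \t, \r and \\ are replaced by the characters they denote.
   Unknown sequences and a trailing backslash are copied unchanged.
   The caller must free() the result. */
char* unescape_simple( const char* src )
{
    char*   dst;
    char*   out;

    assert( src != NULL );

    /* The result is never longer than the source */
    if( !( dst = out = malloc( strlen( src ) + 1 ) ) )
        return NULL;

    while( *src )
    {
        if( *src == '\\' && src[1] )
        {
            src++; /* skip the backslash */

            switch( *src )
            {
                case 'n':   *out++ = '\n'; break;
                case 't':   *out++ = '\t'; break;
                case 'r':   *out++ = '\r'; break;
                case '\\':  *out++ = '\\'; break;

                default:
                    /* Unknown sequence: keep it as written */
                    *out++ = '\\';
                    *out++ = *src;
                    break;
            }

            src++;
        }
        else
            *out++ = *src++;
    }

    *out = '\0';
    return dst;
}
```

Allocating a new string instead of rewriting the parameter in place leaves ``argv`` untouched, at the price of one ``free()`` per unescaped parameter.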

=== pparse ===

**pparse** is a command-line tool to compile and run parsers via command-line. It outputs the parse trees (if any) of the parsed inputs, or just checks for correct syntax.
@@ -229,6 +251,21 @@ Usage: pparse OPTIONS grammar [input [input ...]]
-V --version Show version info and exit.
```

Example call:

```
$ pparse "Int := /[0-9]+/; f : Int | '(' e ')'; t : t '*' f = mul | f ; e : e '+' t = add | t ;" "1+2*3+4*5"
add
add
Int (1)
mul
Int (2)
Int (3)
mul
Int (4)
Int (5)
```

== Other tools ==

There are also some more, useful command-line tools for C programmers, which are installed and made available. These tools are heavily used by libphorward's own build process, but may also be interesting to others. These tools are all written using standard GNU utilities like sh, awk, grep and sed.
4 changes: 2 additions & 2 deletions tools/plex.c
@@ -51,9 +51,9 @@ int main( int argc, char** argv )
"help input: version", i ) ) == 0; i++ )
{
if( !strcmp( opt, "begin" ) || !strcmp( opt, "b" ) )
begin_sep = param;
begin_sep = pstrunescape( param );
else if( !strcmp( opt, "end" ) || !strcmp( opt, "e" ) )
end_sep = param;
end_sep = pstrunescape( param );
else if( !strcmp( opt, "file" ) || !strcmp( opt, "f" ) )
{
finput = pfree( finput );
39 changes: 25 additions & 14 deletions tools/pregex.c
@@ -12,7 +12,7 @@ Usage: A pregex object demonstration suite.

void help( char** argv )
{
printf( "Usage: %s OPTIONS {expression-only-if-no-other} [input-file]\n\n"
printf( "Usage: %s OPTIONS {expression} input\n\n"

" -a --action ACTION Perform regular expression action:\n"
" match (default), find, split, replace\n"
@@ -52,13 +52,18 @@ int main( int argc, char** argv )
char opt [ 10 + 1 ];
char* param;

PROC( "pregex" );

/* Analyze command-line parameters */
for( i = 0; ( rc = pgetopt( opt, &param, &next, argc, argv,
"a:d:De:hf:i:r:V",
"action: delimiter: exec: file: help "
"input: replace: version", i ) )
== 0; i++ )
{
VARS( "opt", "%s", opt );
VARS( "param", "%s", param );

if( !strcmp( opt, "action" ) || !strcmp( opt, "a" ) )
{
action = (char*)NULL;
@@ -74,7 +79,7 @@ int main( int argc, char** argv )
{
fprintf( stderr, "Invalid action '%s'\n", param );
help( argv );
return 1;
RETURN( 1 );
}

}
@@ -90,15 +95,15 @@ int main( int argc, char** argv )
{
fprintf( stderr, "Unable to read expression file '%s'\n",
param );
return 1;
RETURN( 1 );
}

expr = fexpr;
}
else if( !strcmp( opt, "help" ) || !strcmp( opt, "h" ) )
{
help( argv );
return 0;
RETURN( 0 );
}
else if( !strcmp( opt, "input" ) || !strcmp( opt, "i" ) )
input = param;
@@ -107,10 +112,12 @@ int main( int argc, char** argv )
else if( !strcmp( opt, "version" ) || !strcmp( opt, "V" ) )
{
version( argv, "Regular expression command-line utility" );
return 0;
RETURN( 0 );
}
}

VARS( "rc", "%d", rc );

if( rc == 1 )
{
if( !expr && argc > next )
@@ -119,17 +126,17 @@ int main( int argc, char** argv )
if( !input && argc > next )
{
if( !pfiletostr( &finput, argv[next] ) )
{
fprintf( stderr, "Unable to read input file '%s'\n",
argv[next] );
return 1;
}
input = argv[next];
else
input = finput;

input = finput;
next++;
next++; /* <- obsolete, but complete ;-) */
}
}

VARS( "expr", "%s", expr );
VARS( "input", "%s", input );

if( !( expr && input ) || ( rc < 0 && param ) )
{
if( rc < 0 && param )
@@ -138,14 +145,16 @@ int main( int argc, char** argv )
fprintf( stderr, "Too less parameters given.\n" );

help( argv );
return 1;
RETURN( 1 );
}

/* Process */

start = input;
re = pregex_create( expr, PREGEX_FLAG_NONE );

VARS( "action", "%s", action );

if( strcmp( action, "match" ) == 0 )
{
if( pregex_match( re, start, &end ) )
@@ -171,6 +180,8 @@ int main( int argc, char** argv )
}
else if( strcmp( action, "replace" ) == 0 )
{
VARS( "replace", "%s", replace );

start = pregex_replace( re, start, replace );
printf( "%s", start );

@@ -182,6 +193,6 @@ int main( int argc, char** argv )
pfree( finput );
pfree( fexpr );

return 0;
RETURN( 0 );
}
