Merge pull request #12 from ekaats/further-features
Further features
ekaats authored Oct 23, 2021
2 parents f2f2339 + 672bd42 commit f03cad3
Showing 13 changed files with 406 additions and 264 deletions.
26 changes: 17 additions & 9 deletions README.md
@@ -1,10 +1,10 @@
# Parse XPATH 3.1 using Pyparsing
XPath (XML Path Language) is a query language for selecting nodes from an XML document.
In addition, XPath may be used to compute values (e.g., strings, numbers, or Boolean values) from the content of an XML document.
XPath is supported by the World Wide Web Consortium (W3C).
In addition, XPath is used to compute values (e.g., strings, numbers, or Boolean values) from the content of an XML document.
XPath is maintained by the World Wide Web Consortium (W3C).

[Pyparsing](https://github.com/pyparsing/pyparsing) is a parsing module used to construct grammar in Python.
XPyth uses Pyparsing to parse XPath strings, and offers an additional abstraction layer.
XPyth-parser uses Pyparsing to parse XPath strings, and offers an additional abstraction layer.

## Status
This library is an attempt to create a parser which can be used both to query XML documents,
@@ -13,12 +13,16 @@ The original plan was to support both options. However, XPath 3.1 is not widely
Parsing XPath 3.1 on a grammar level should still be supported, but not all information may be available when using
the abstraction layer. Most importantly, there will be [XPath functions](https://www.w3.org/2005/xpath-functions/) missing.

Dealing with dynamic contexts (i.e., parsing XML as Parser.xml will be done using LXML for now).
Dealing with dynamic contexts (i.e., parsing XML as Parser.xml will be done using LXML for now).
In a way, XPyth-parser is at the present moment a fancy wrapper around LXML, in order to support some XPath 2.0+ functionality.

### Alternatives
For most use cases, there will be (better) alternatives to this project. [LXML](https://lxml.de/) is Pythonic binding
for the C libraries libxml2 and libxslt. If only XPath 1.0 is needed, LXML will be a better solution.

### Requirements
xpyth-parser depends on LXML, PyParsing. For parsing dates we use Isodate.

## Goals
This project started out with a specific goal:
to parse [XBRL formula](https://specifications.xbrl.org/work-product-index-formula-formula-1.0.html) tests.
@@ -27,15 +31,19 @@ Because of this, the author of this library is focussing on correctly interpreti

# Examples

from xpyth_parser.parse import Parser
count = Parser("count(1,2,3)")

from xpyth_parser.parse import Parser
count = Parser("count(1,2,3)").run()
print(count) -> 3


This will give a wrapper class which contains the resolved syntax tree in count.XPath and the answer in count.resolved_answer

# Parsing only
It is also possible to only parse the string, but not try to resolve the static and dynamic context
count = Parser("count(1,2,3), no_resolve=True")

count.xpath will be the full syntax tree, instead of having functions processed and contexts applied.
count.run() will resolve the expression as if no_resolve=False. contexts might need to be passed to the object beforehand.
count = Parser("count(1,2,3), no_resolve=True")

`count.xpath` will be the full syntax tree, instead of having functions processed and contexts applied.
`count.run()` will resolve the expression as if no_resolve=False. contexts might need to be passed to the object beforehand.
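The README describes two modes. An expression can be resolved straight away, or its unevaluated syntax tree can be kept (`no_resolve=True`) and `run()` called later. In both README snippets `no_resolve=True` sits inside the XPath string; it is presumably meant as a keyword argument to `Parser`. The toy sketch below illustrates only that deferred-resolution pattern. `MiniParser`, its regex grammar and the `FUNCTIONS` table are hypothetical and are not the xpyth-parser API.

```python
import re
from functools import partial

# Hypothetical function table; only fn:count is modelled here.
FUNCTIONS = {"count": lambda seq: len(seq)}

# Toy grammar: name(int, int, ...) with no nesting.
CALL = re.compile(r"^\s*(\w+)\((.*)\)\s*$")


class MiniParser:
    """Toy stand-in for a parser that can defer resolution."""

    def __init__(self, expression: str, no_resolve: bool = False):
        match = CALL.match(expression)
        if match is None:
            raise ValueError(f"unsupported expression: {expression!r}")
        name, raw_args = match.groups()
        args = [int(a) for a in raw_args.split(",") if a.strip()]
        # The "syntax tree" is an unevaluated partial call.
        self.xpath = partial(FUNCTIONS[name], args)
        self.resolved_answer = None
        if not no_resolve:
            self.run()

    def run(self):
        # Evaluate the deferred call and cache the answer.
        self.resolved_answer = self.xpath()
        return self.resolved_answer


print(MiniParser("count(1,2,3)").run())  # 3
deferred = MiniParser("count(1,2,3)", no_resolve=True)
print(deferred.resolved_answer)  # None until run() is called
print(deferred.run())  # 3
```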

5 changes: 3 additions & 2 deletions setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = xpyth_31_parser
version = 0.0.7
name = xpyth_parser
version = 0.0.9
author = Erwin Kaats
author_email = [email protected]
description = An XPath 3.1 Parser
@@ -19,6 +19,7 @@ package_dir =
packages = find:
python_requires = >=3.6
install_requires =
lxml
pyparsing
isodate

137 changes: 102 additions & 35 deletions src/xpyth_parser/conversion/function.py
@@ -1,10 +1,13 @@
import functools

import lxml.etree
from isodate import parse_date, parse_duration
from functools import partial

from .functions.generic import FunctionRegistry
from .functions.generic import FunctionRegistry, QuerySingleton
from .qname import QName, Parameter


reg = FunctionRegistry()

def cast_lxml_elements(args):
@@ -14,15 +17,39 @@ def cast_lxml_elements(args):
:return:
"""

# If it is one element we found, we can cast it and return it
if isinstance(args, lxml.etree._Element):
if isinstance(args, functools.partial):
#todo: this kind of recursion we now have to build into every function?
args = args()

if hasattr(args, "expr"):
args = args.expr

# If it is already a primary, return it
if isinstance(args, str) or isinstance(args, int) or isinstance(args, float):
return args

elif isinstance(args, bytes):
# Could be an unparsed (L)XML element
etree = lxml.etree.fromstring(args)
try:
arg = int(etree.text)
except:
arg = etree.text

return arg

elif isinstance(args, lxml.etree._Element):
try:
arg = int(args.text)
except:
arg = args.text

# But we still want to return a list, because that is expected in functions
return [arg]
return arg

elif args == None:
# If none is passed though (LXML has not found any elements, return the empty list)
return []


# Else, we need to go through the list
casted_args = []
@@ -35,72 +62,94 @@ def cast_lxml_elements(args):
casted_args.append(arg)

return casted_args
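The updated `cast_lxml_elements` does several things in turn. It evaluates deferred `functools.partial` calls, passes primitives through and parses raw bytes. It also casts element text to `int` where possible and maps `None` to the empty list. The sketch below follows the same dispatch but makes a few assumptions. It uses the stdlib `xml.etree.ElementTree` in place of lxml, so it runs without extra dependencies, and it leaves out the `.expr` unwrapping. The list branch is only partly visible in this diff, so its recursion is an assumption. The bare `except:` is narrowed to the errors `int()` actually raises.

```python
import xml.etree.ElementTree as ET
from functools import partial


def cast_elements(value):
    """Sketch of cast_lxml_elements using the stdlib ElementTree."""
    # Deferred function calls are evaluated first.
    if isinstance(value, partial):
        value = value()
    # Primitive atomic values pass through unchanged.
    if isinstance(value, (str, int, float)):
        return value
    # Serialized XML is parsed into an element first.
    if isinstance(value, bytes):
        value = ET.fromstring(value)
    if isinstance(value, ET.Element):
        try:
            return int(value.text)
        except (TypeError, ValueError):
            # Non-numeric (or missing) text is returned as-is.
            return value.text
    # Nothing found: the empty sequence.
    if value is None:
        return []
    # Otherwise cast each item of the sequence.
    return [cast_elements(item) for item in value]
```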
def fn_count(args):


def fn_count(*args, **kwargs):
args = args[0]
if isinstance(args, list):

return len(args)
else:
return 1


def fn_avg(args):
casted_args = cast_lxml_elements(args=args)
def fn_avg(*args, **kwargs):
casted_args = cast_lxml_elements(args=args[0])

if isinstance(casted_args, int):
# If there is only one value, the sum would be the same as the value
return casted_args

return sum(casted_args) / len(casted_args)


def fn_max(args):
casted_args = cast_lxml_elements(args=args)
def fn_max(*args, **kwargs):
casted_args = cast_lxml_elements(args=args[0])
if isinstance(casted_args, int):
# If there is only one value, the sum would be the same as the value
return casted_args

return max(casted_args)

def fn_min(args):
casted_args = cast_lxml_elements(args=args)
def fn_min(*args, **kwargs):
casted_args = cast_lxml_elements(args=args[0])
if isinstance(casted_args, int):
# If there is only one value, the sum would be the same as the value
return casted_args

return min(casted_args)

def fn_sum(args):
casted_args = cast_lxml_elements(args=args)
def fn_sum(*args, **kwargs):
casted_args = cast_lxml_elements(args=args[0])

if isinstance(casted_args, int):
# If there is only one value, the sum would be the same as the value
return casted_args

return sum(casted_args)
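With this commit the functions take `*args, **kwargs`. The argument sequence arrives as `args[0]`, and `get_function` adds the lxml tree as the `query` keyword. Below is a sketch of the aggregates under that convention, assuming the values are already cast to numbers. It departs from the committed code in three deliberate ways:

- `None` is read as the empty sequence.
- Any single atomic value, not just an `int`, is treated as a one-item sequence.
- `fn:avg`, `fn:max` and `fn:min` return the empty sequence (`[]` here) for empty input instead of raising, as XPath 3.1 specifies.

`fn:sum` of an empty sequence is `0`.

```python
def _as_sequence(value):
    # None is read as the empty sequence; a single value as a one-item sequence.
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def fn_count(*args, **kwargs):
    return len(_as_sequence(args[0]))


def fn_sum(*args, **kwargs):
    # fn:sum of the empty sequence is 0.
    return sum(_as_sequence(args[0]))


def fn_avg(*args, **kwargs):
    seq = _as_sequence(args[0])
    # fn:avg of the empty sequence is the empty sequence.
    return sum(seq) / len(seq) if seq else []


def fn_max(*args, **kwargs):
    seq = _as_sequence(args[0])
    return max(seq) if seq else []


def fn_min(*args, **kwargs):
    seq = _as_sequence(args[0])
    return min(seq) if seq else []


print(fn_avg([1, 2, 3], query=None))  # 2.0
print(fn_max(2.5))  # 2.5
```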


def fn_not(args):
def fn_not(*args, **kwargs):
for arg in args:
if arg is True:
return False # found an argument that is true
# Did not find a True value
return True

def fn_empty(args):
def fn_empty(*args, **kwargs):
for arg in args:
if arg is None or arg == "":
return True

return False
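XPath 3.1 defines `fn:not($arg)` as the negation of the effective boolean value (EBV) of `$arg`. It defines `fn:empty($arg)` as true exactly when `$arg` is the empty sequence. The committed versions instead look for a literal `True`, or for `None`/`""`. Below is a simplified EBV-based sketch, assuming `None` or an empty list stands for the empty sequence and `ElementTree` elements stand for nodes. Where the specification raises `err:FORG0006`, this sketch raises `TypeError`.

```python
import math
import xml.etree.ElementTree as ET


def effective_boolean_value(value):
    # Normalise to a sequence; None stands for the empty sequence.
    if value is None:
        seq = []
    elif isinstance(value, (list, tuple)):
        seq = list(value)
    else:
        seq = [value]
    if not seq:
        return False
    if isinstance(seq[0], ET.Element):
        return True  # a sequence starting with a node is true
    if len(seq) == 1:
        item = seq[0]
        if isinstance(item, bool):
            return item
        if isinstance(item, str):
            return len(item) > 0
        if isinstance(item, (int, float)):
            return not (item == 0 or math.isnan(item))
    # XPath raises err:FORG0006 here.
    raise TypeError(f"no effective boolean value for {value!r}")


def fn_not(*args, **kwargs):
    return not effective_boolean_value(args[0])


def fn_empty(*args, **kwargs):
    value = args[0]
    if value is None:
        return True
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    # A single item (even "") is a non-empty sequence.
    return False
```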

def xs_date(args):
if len(args) == 0:
def xs_date(*args, **kwargs):
casted_args = cast_lxml_elements(args=args[0])
if len(casted_args) == 0:
return False
else:
date = parse_date(args)
date = parse_date(casted_args)
return date

def xs_yearMonthDuration(args):

if len(args) == 0:
def xs_yearMonthDuration(*args, **kwargs):
casted_args = cast_lxml_elements(args=args[0])
if len(casted_args) == 0:
return False
else:
duration = parse_duration(args)
duration = parse_duration(casted_args)
return duration

def xs_dayTimeDuration(args):
if len(args) == 0:
def xs_dayTimeDuration(*args, **kwargs):
casted_args = cast_lxml_elements(args=args[0])
if len(casted_args) == 0:
return False
else:
duration = parse_duration(args)
duration = parse_duration(casted_args)
return duration
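`xs:date`, `xs:yearMonthDuration` and `xs:dayTimeDuration` now cast their input and delegate to isodate's `parse_date` and `parse_duration`. The restricted stdlib sketch below shows what the day-time form involves without the isodate dependency. It maps the `xs:dayTimeDuration` lexical form onto `datetime.timedelta`; `parse_day_time_duration` is a hypothetical helper. Year and month components cannot be represented by a `timedelta`, which is why isodate returns its own `Duration` object for those.

```python
import re
from datetime import timedelta

# Lexical form of xs:dayTimeDuration, e.g. "P1DT2H30M" or "-PT0.5S".
DAY_TIME = re.compile(
    r"^(?P<sign>-)?P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_day_time_duration(text: str) -> timedelta:
    text = text.strip()
    match = DAY_TIME.match(text)
    parts = match.groupdict() if match else {}
    components = [parts.get(k) for k in ("days", "hours", "minutes", "seconds")]
    # "P", "PT" and a dangling "T" are not valid lexical forms.
    if match is None or text.endswith("T") or all(c is None for c in components):
        raise ValueError(f"invalid xs:dayTimeDuration: {text!r}")
    days, hours, minutes = (int(c or 0) for c in components[:3])
    seconds = float(components[3] or 0)
    result = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    return -result if parts["sign"] else result
```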

def xs_qname(args):
def xs_qname(*args, **kwargs):
# Returns an xs:QName value formed using a supplied namespace URI and lexical QName.

args = args[0]
if isinstance(args, str):
prefix, localname = str(args).split(":", 1)
return QName(prefix=prefix, localname=localname)
@@ -113,11 +162,17 @@ def xs_qname(args):
prefix, localname = str(args[1]).split(":", 1)
return QName(prefix=prefix, localname=localname, namespace=args[0])
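`xs_qname` splits a lexical QName on the first `:`. When two arguments are passed, the first is a namespace URI. Because `prefix, localname = ....split(":", 1)` unpacks exactly two parts, an unprefixed name such as `"context"` raises `ValueError`. Below is a sketch that tolerates a missing prefix; the `QName` dataclass and the `make_qname` helper are hypothetical stand-ins for the package's own `QName`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QName:
    """Hypothetical stand-in for xpyth_parser's QName class."""
    localname: str
    prefix: Optional[str] = None
    namespace: Optional[str] = None


def make_qname(lexical: str, namespace: Optional[str] = None) -> QName:
    # partition() never fails: without a colon, sep is empty.
    prefix, sep, localname = lexical.partition(":")
    if not sep:
        prefix, localname = None, lexical
    return QName(localname=localname, prefix=prefix, namespace=namespace)


def xs_qname(*args, **kwargs):
    args = args[0]
    # One argument: lexical QName; two: (namespace URI, lexical QName).
    if isinstance(args, str):
        return make_qname(args)
    return make_qname(args[1], namespace=args[0])
```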

def fn_number(args):
# Returns an xs:QName value formed using a supplied namespace URI and lexical QName.
def fn_number(*args, **kwargs):
"""
Returns an xs:QName value formed using a supplied namespace URI and lexical QName.
:param args:
:return:
"""
casted_args = cast_lxml_elements(args=args[0])

# Otherwise try to cast the argument to float.
return float(args)
return float(casted_args)
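The new `fn_number` docstring was copied from `xs_qname`. In XPath 3.1, `fn:number` returns its argument as an `xs:double`. It returns `NaN`, rather than raising, when the value cannot be converted or the sequence is empty. Below is a simplified sketch. Python's `float()` is looser than XPath's casting rules (it accepts, e.g., `"1_000"` and `"inf"`), so treat it as an approximation.

```python
import math


def fn_number(*args, **kwargs):
    """Return the argument as a float, or NaN when it cannot be converted."""
    value = args[0]
    # The empty sequence converts to NaN.
    if value is None or value == []:
        return math.nan
    # A one-item sequence is unwrapped first.
    if isinstance(value, list) and len(value) == 1:
        value = value[0]
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan
```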

functions = {
"fn:count":fn_count,
@@ -134,13 +189,25 @@ def fn_number(args):
"xs:QName": xs_qname,

}

# Add XBRL functions
from .functions.xbrl import function_list
functions.update(function_list)

# Add the initial set of functions to the registry
reg.add_functions(functions=functions, overwrite_functions=True)

def get_function(v):
qname = v[0]
args = list(v[1:])
query = QuerySingleton()

def get_function(toks):
qname = toks[0]

if not isinstance(qname, QName):
# The first token should really be a qname. This is the name of the function.
return toks


args = list(toks[1:])

# If no prefix is defined, FN will be assumed for function calls
if qname.prefix is None:
@@ -157,7 +224,7 @@ def get_function(v):
if len(args) == 1:
args = args[0]

return partial(function, args)
return partial(function, args, query=query.lxml_tree)
else:
print("Cannot find function in registry")
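`get_function` receives the parse tokens for a call: the function's QName followed by its arguments. It defaults an unprefixed name to `fn` and looks the function up in the registry. It then returns a `functools.partial` carrying the arguments and the lxml tree as `query`, so evaluation is deferred. The minimal sketch below follows that flow, assuming plain `"prefix:localname"` strings instead of the package's `QName` objects. `Registry` is hypothetical. Raising `KeyError` for an unknown function is a design choice; the committed code prints a message and implicitly returns `None`.

```python
from functools import partial
from typing import Callable, Dict


class Registry:
    """Hypothetical minimal registry keyed by 'prefix:localname'."""

    def __init__(self):
        self.functions: Dict[str, Callable] = {}

    def add_functions(self, functions: Dict[str, Callable], overwrite_functions: bool = False):
        for name, function in functions.items():
            # Existing entries are only replaced when explicitly requested.
            if overwrite_functions or name not in self.functions:
                self.functions[name] = function


def get_function(registry: Registry, tokens: list, query=None):
    name, args = tokens[0], list(tokens[1:])
    # Function calls without a prefix default to the fn: namespace.
    if ":" not in name:
        name = f"fn:{name}"
    function = registry.functions.get(name)
    if function is None:
        raise KeyError(f"function {name!r} is not registered")
    # A single argument is passed on its own, as in the committed code.
    if len(args) == 1:
        args = args[0]
    # The call is deferred; the query context travels as a keyword.
    return partial(function, args, query=query)
```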

21 changes: 18 additions & 3 deletions src/xpyth_parser/conversion/functions/generic.py
@@ -1,7 +1,6 @@
import functools
import logging

from typing import Union, Optional
from ..qname import Parameter, QName
from ..qname import QName


class FunctionRegistry:
@@ -54,6 +53,22 @@ def add_functions(self, functions: dict = None, overwrite_functions: Optional[bo
# Only overwrite functions if this is explicitly set
self.functions[function_name] = function

class QuerySingleton:
_instance = None
lxml_tree = None

def __new__(cls, *args, **kwargs):
if cls._instance is None:
cls._instance = super().__new__(cls)
return cls._instance

def __init__(
self,
lxml_tree = None
):
if lxml_tree is not None:
self.lxml_tree = lxml_tree
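`QuerySingleton` hands out one shared instance. Because `__init__` runs on every call, it replaces `lxml_tree` only when a tree is passed. One consequence is worth knowing. `get_function` builds `partial(function, args, query=query.lxml_tree)`, which reads the attribute when the partial is created, so partials built before a tree is registered never see it. The sketch reproduces the class and demonstrates that timing; `show_query` is a hypothetical stand-in for a registered function.

```python
from functools import partial


class QuerySingleton:
    """Same pattern as the committed class: one shared instance."""
    _instance = None
    lxml_tree = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, lxml_tree=None):
        # __init__ runs on every call, so only overwrite when a tree is given.
        if lxml_tree is not None:
            self.lxml_tree = lxml_tree


def show_query(args, query=None):
    return query


# Built before any tree is registered: query=None is frozen into the partial.
early = partial(show_query, [], query=QuerySingleton().lxml_tree)

QuerySingleton(lxml_tree="tree-object")  # register a tree later
late = partial(show_query, [], query=QuerySingleton().lxml_tree)

print(QuerySingleton() is QuerySingleton())  # True
print(early())  # None
print(late())  # tree-object
```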


class OrExpr:
def __init__(self, a, b):
14 changes: 14 additions & 0 deletions src/xpyth_parser/conversion/functions/xbrl.py
@@ -0,0 +1,14 @@
def identifier(*args, **kwargs):
"""
Gets the identifier of one or more XBRL facts
:param self:
:return:
"""
# https://specifications.xbrl.org/registries/functions-registry-1.0/80132%20xfi.identifier/80132%20xfi.identifier%20function.html
for arg in args[0]:
context_ref = arg.get("contextRef")
q = kwargs['query']
context = q.xpath(f"/xbrli:xbrl/xbrli:context[@id='{context_ref}']/xbrli:entity/xbrli:identifier", namespaces=q.nsmap)
return context[0].text

function_list = {"xfi:identifier": identifier}
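`xfi:identifier` follows each fact's `contextRef` attribute to the matching `xbrli:context` and reads `xbrli:entity/xbrli:identifier`. As committed, it returns a single identifier text rather than one per fact. The sketch below returns one identifier per fact. It makes three assumptions:

- It uses the stdlib `ElementTree`, whose `find()` supports `[@attr='value']` predicates, instead of lxml's `xpath()`.
- The instance namespace is the XBRL 2.1 URI `http://www.xbrl.org/2003/instance`.
- The instance document is made up.

```python
import xml.etree.ElementTree as ET

NS = {"xbrli": "http://www.xbrl.org/2003/instance"}

# A tiny, made-up instance document with one context and one fact.
INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
                          xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="c1">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/ids">12345</xbrli:identifier>
    </xbrli:entity>
  </xbrli:context>
  <ex:Revenue contextRef="c1">100</ex:Revenue>
</xbrli:xbrl>"""


def identifier(*args, **kwargs):
    """Return the entity identifier of each fact's context."""
    root = kwargs["query"]
    results = []
    for fact in args[0]:
        context_ref = fact.get("contextRef")
        # ElementTree supports [@attr='value'] predicates in find().
        ident = root.find(
            f"xbrli:context[@id='{context_ref}']/xbrli:entity/xbrli:identifier", NS
        )
        results.append(ident.text if ident is not None else None)
    return results


root = ET.fromstring(INSTANCE)
facts = root.findall("{http://example.com/taxonomy}Revenue")
print(identifier(facts, query=root))  # ['12345']
```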