feat: add support for GDAL 3.9 and Python 3.12 and 3.13 #60

Merged: 1 commit, Oct 25, 2024
12 changes: 8 additions & 4 deletions README.md
@@ -1,6 +1,7 @@
# OversightML Imagery Toolkit
![Build Badge](https://github.com/aws-solutions-library-samples/osml-imagery-toolkit/actions/workflows/build.yml/badge.svg)
![Python Badge](https://img.shields.io/badge/python-3.9%2C%203.10%2C%203.11-blue)
![Python Badge](https://img.shields.io/badge/python-3.9%2C%203.10%2C%203.11%2C%203.12%2C%203.13-blue)
![GDAL Badge](https://img.shields.io/badge/gdal-3.7%2C%203.8%2C%203.9-blue)
![GitHub License](https://img.shields.io/github/license/aws-solutions-library-samples/osml-imagery-toolkit?color=blue)
![PyPI - Version](https://img.shields.io/pypi/v/osml-imagery-toolkit)

@@ -35,9 +36,12 @@ distribution.
```shell
pip install .[gdal]
```
Note that GDAL is listed as an extra dependency for this package. This is done to facilitate environments that either
don't want to use GDAL or those that have their own custom installation steps for that library. Future versions of
this package will include image IO backbones that have fewer dependencies.
Note that GDAL is currently required, but it is listed as an extra dependency for this package. This is done to facilitate
environments that either don't want to use GDAL or have their own custom installation steps for that library.
Future versions of this package will include image IO backbones that have fewer dependencies. Beware that GDAL has been
known to introduce breaking changes in minor versions, so testing for specific version compatibility is
recommended. The tox build has been set up to test multiple GDAL/PROJ/Python combinations, and the versions
checked by automated testing can be seen in the environment-*.yml files.
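
Because GDAL can break behavior between minor versions, a runtime guard can make an untested version visible early. The following is a minimal sketch, not part of this package: the bounds mirror the `gdal>=3.7.0,<3.10` pin in `setup.cfg`, and the helper names `parse_version` and `is_tested_gdal` are hypothetical.

```python
import re

# Assumed bounds, copied from the "gdal>=3.7.0,<3.10" pin in setup.cfg.
MIN_GDAL = (3, 7, 0)
MAX_GDAL_EXCLUSIVE = (3, 10, 0)


def parse_version(version: str) -> tuple:
    """Turn a release string such as '3.9.2' into a comparable tuple of ints."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    # Pad short versions like '3.9' so tuple comparison behaves as expected.
    return tuple(parts + [0] * (3 - len(parts)))


def is_tested_gdal(version: str) -> bool:
    """Return True if the version falls inside the range covered by the pin."""
    return MIN_GDAL <= parse_version(version) < MAX_GDAL_EXCLUSIVE
```

With GDAL installed, the version string could come from `gdal.VersionInfo("RELEASE_NAME")`; the check itself has no GDAL dependency, so it can run in environments where the library is absent.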

## Contributing

8 changes: 8 additions & 0 deletions environment-py310.yml
@@ -0,0 +1,8 @@
# Copyright 2023-2024 Amazon.com, Inc. or its affiliates.

name: osml_imagery_toolkit
channels:
- conda-forge
dependencies:
- conda-forge::gdal=3.8.3
- conda-forge::proj=9.3.1
1 change: 0 additions & 1 deletion environment-py311.yml
@@ -4,6 +4,5 @@ name: osml_imagery_toolkit
channels:
- conda-forge
dependencies:
- conda-forge::python=3.11.6
- conda-forge::gdal=3.8.3
- conda-forge::proj=9.3.1
8 changes: 8 additions & 0 deletions environment-py312.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Copyright 2023-2024 Amazon.com, Inc. or its affiliates.

name: osml_imagery_toolkit
channels:
- conda-forge
dependencies:
- conda-forge::gdal=3.8.5
- conda-forge::proj=9.4.1
8 changes: 8 additions & 0 deletions environment-py313.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Copyright 2023-2024 Amazon.com, Inc. or its affiliates.

name: osml_imagery_toolkit
channels:
- conda-forge
dependencies:
- conda-forge::gdal=3.9.2
- conda-forge::proj=9.5.0
8 changes: 8 additions & 0 deletions environment-py39.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Copyright 2023-2024 Amazon.com, Inc. or its affiliates.

name: osml_imagery_toolkit
channels:
- conda-forge
dependencies:
- conda-forge::gdal=3.7.0
- conda-forge::proj=9.2.1
4 changes: 2 additions & 2 deletions environment.yml
@@ -4,5 +4,5 @@ name: osml_imagery_toolkit
channels:
- conda-forge
dependencies:
- conda-forge::gdal=3.7.0
- conda-forge::proj=9.2.1
- conda-forge::gdal>=3.7.0,<3.10
- conda-forge::proj>=9.2.1,<9.6
2 changes: 1 addition & 1 deletion setup.cfg
@@ -61,6 +61,6 @@ package_data =

[options.extras_require]
gdal =
gdal>=3.7.0
gdal>=3.7.0,<3.10
test =
tox
3 changes: 1 addition & 2 deletions src/aws/osml/gdal/__init__.py
@@ -47,8 +47,7 @@
xml_data_content_segments = des_accessor.get_segments_by_name("XML_DATA_CONTENT")
if xml_data_content_segments is not None:
for xml_data_segment in xml_data_content_segments:
xml_bytes = des_accessor.parse_field_value(xml_data_segment, "DESDATA", base64.b64decode)
xml_str = xml_bytes.decode("utf-8")
xml_str = des_accessor.extract_desdata_xml(xml_data_segment)
if xml_str and "SICD" in xml_str:
temp = xml.dom.minidom.parseString(xml_str)
new_xml = temp.toprettyxml()
42 changes: 40 additions & 2 deletions src/aws/osml/gdal/nitf_des_accessor.py
@@ -1,7 +1,9 @@
# Copyright 2023-2024 Amazon.com, Inc. or its affiliates.

import base64
import re
from io import StringIO
from typing import Callable, List, TypeVar
from typing import Callable, List, Optional, TypeVar
from xml.etree import ElementTree as ET

from defusedxml import ElementTree
@@ -30,7 +32,12 @@ def __init__(self, gdal_xml_des_metadata: List[str]):
self.parsed_des_lists = []
if gdal_xml_des_metadata is not None and len(gdal_xml_des_metadata) > 0:
for xml_des_list in gdal_xml_des_metadata:
des_list = ElementTree.fromstring(xml_des_list)
# The new handling GDAL has for XML data content causes an XML document to be expanded in the middle
# of the xml:DES data structure. An embedded xml prolog (e.g. <?xml version= ... ?>) is invalid syntax
# that will throw off some XML parsers. The xml prolog is optional, so we can strip all of them from
# the XML as a workaround while we look for a better way to address this recent GDAL change.
clean_xml_string = re.sub(r"<\?xml.*?\?>", "", xml_des_list).strip()
des_list = ElementTree.fromstring(clean_xml_string)
self.parsed_des_lists.append(des_list)
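
The prolog workaround in this hunk can be illustrated in isolation. This is a minimal sketch under stated assumptions: the `raw` payload is a made-up stand-in for what GDAL 3.9 might return, `strip_xml_prologs` is a hypothetical helper name, and the standard-library parser is used in place of `defusedxml` to keep the example self-contained.

```python
import re
from xml.etree import ElementTree as ET

# Hypothetical xml:DES payload with an XML prolog embedded mid-document.
raw = (
    '<des_list><des name="XML_DATA_CONTENT"><field name="DESDATA">'
    '<xml_content><?xml version="1.0" encoding="UTF-8"?><SICD/></xml_content>'
    "</field></des></des_list>"
)


def strip_xml_prologs(xml_text: str) -> str:
    """Remove every <?xml ... ?> declaration, using the same regex as the diff."""
    return re.sub(r"<\?xml.*?\?>", "", xml_text).strip()


# A declaration that is not at the start of the document is a parse error.
try:
    ET.fromstring(raw)
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# After stripping, the document parses and the embedded content is reachable.
root = ET.fromstring(strip_xml_prologs(raw))
```

Note that the pattern also matches other processing instructions whose target begins with `xml` (for example `<?xml-stylesheet ...?>`), and `.` does not cross newlines, so a declaration split over several lines would be left in place.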

def get_segments_by_name(self, des_name: str) -> List[ET.Element]:
@@ -90,6 +97,37 @@ def extract_des_header(des_element: ET.Element) -> str:

return result_builder.getvalue()

    @staticmethod
    def extract_desdata_xml(des_element: ET.Element) -> Optional[str]:
        """
        This function attempts to extract a block of XML from the field element named DESDATA. Versions of GDAL
        before 3.9 returned the XML data base64 encoded as a value attribute. Versions >=3.9 automatically
        expand the XML into the content of an <xml_content> element.

        :param des_element: the root xml:DES metadata element
        :return: the xml content if it is found and can be extracted
        """
        desdata_element = des_element.find("./field[@name='DESDATA']")
        if desdata_element is None:
            return None

        value_attribute = desdata_element.get("value")
        if value_attribute:
            # This appears to be the encoding used by GDAL versions <3.9. Extract the
            # XML from the base64 encoded value attribute
            xml_bytes = base64.b64decode(value_attribute)
            return xml_bytes.decode("utf-8")

        xml_content_element = desdata_element.find("xml_content")
        # Test against None and check for children explicitly; truth-testing an Element is deprecated
        if xml_content_element is not None and len(xml_content_element) > 0:
            # This appears to be the encoding used by GDAL >=3.9. The XML is already parsed
            # and available as the content of this element. See: https://github.com/OSGeo/gdal/pull/8953
            return ET.tostring(xml_content_element[0], "unicode")

        # Unable to parse the XML from the data segment. This sometimes happens if GDAL
        # changes the representation of this information in their APIs
        return None
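
To see how the two DESDATA representations differ, the extraction logic can be exercised against hand-built elements. This is a sketch under assumptions: `extract_desdata_xml_sketch` is a standalone stand-in for the method added here, and both sample elements are invented to resemble the pre-3.9 and 3.9+ layouts described in the docstring.

```python
import base64
from typing import Optional
from xml.etree import ElementTree as ET


def extract_desdata_xml_sketch(des_element: ET.Element) -> Optional[str]:
    """Standalone stand-in for the extraction logic, for illustration only."""
    desdata = des_element.find("./field[@name='DESDATA']")
    if desdata is None:
        return None
    value = desdata.get("value")
    if value:
        # Older layout: base64 text stored in the value attribute.
        return base64.b64decode(value).decode("utf-8")
    xml_content = desdata.find("xml_content")
    if xml_content is not None and len(xml_content) > 0:
        # Newer layout: the XML is already expanded as a child element.
        return ET.tostring(xml_content[0], encoding="unicode")
    return None


# Invented sample data resembling each layout.
payload = "<SICD><CollectionInfo/></SICD>"
encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
old_style = ET.fromstring(
    f'<des name="XML_DATA_CONTENT"><field name="DESDATA" value="{encoded}"/></des>'
)
new_style = ET.fromstring(
    '<des name="XML_DATA_CONTENT"><field name="DESDATA">'
    f"<xml_content>{payload}</xml_content></field></des>"
)

old_xml = extract_desdata_xml_sketch(old_style)
new_xml = extract_desdata_xml_sketch(new_style)
```

The older layout round-trips the original bytes exactly, while the newer one is re-serialized by `ET.tostring`, so whitespace and self-closing tag formatting may differ from the source document.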

@staticmethod
def parse_field_value(des_element: ET.Element, field_name: str, type_conversion: Callable[[str], T]) -> T:
"""
7 changes: 4 additions & 3 deletions src/aws/osml/gdal/sensor_model_factory.py
@@ -1,6 +1,5 @@
# Copyright 2023-2024 Amazon.com, Inc. or its affiliates.

import base64
import logging
from enum import Enum
from typing import List, Optional
@@ -188,8 +187,10 @@ def build(self) -> Optional[SensorModel]:
xml_data_content_segments = des_accessor.get_segments_by_name("XML_DATA_CONTENT")
if xml_data_content_segments is not None:
for xml_data_segment in xml_data_content_segments:
xml_bytes = des_accessor.parse_field_value(xml_data_segment, "DESDATA", base64.b64decode)
xml_str = xml_bytes.decode("utf-8")
xml_str = des_accessor.extract_desdata_xml(xml_data_segment)
if not xml_str:
continue

if "SIDD" in xml_str:
# SIDD images will often contain SICD XML metadata as well but the SIDD should come first
# so we can stop processing other XML data segments
27 changes: 16 additions & 11 deletions src/aws/osml/image_processing/gdal_tile_factory.py
@@ -1,6 +1,5 @@
# Copyright 2023-2024 Amazon.com, Inc. or its affiliates.

import base64
import copy
import logging
from secrets import token_hex
@@ -63,16 +62,22 @@ def __init__(

xml_data_content_segments = self.des_accessor.get_segments_by_name("XML_DATA_CONTENT")
if xml_data_content_segments is not None and len(xml_data_content_segments) > 0:
# This appears to be SICD or SIDD data
xml_data_segment = xml_data_content_segments[0]
xml_bytes = self.des_accessor.parse_field_value(xml_data_segment, "DESDATA", base64.b64decode)
xml_str = xml_bytes.decode("utf-8")
if "SIDD" in xml_str:
self.sar_des_header = self.des_accessor.extract_des_header(xml_data_segment)
self.sar_updater = SIDDUpdater(xml_str)
elif "SICD" in xml_str:
self.sar_des_header = self.des_accessor.extract_des_header(xml_data_segment)
self.sar_updater = SICDUpdater(xml_str)
for xml_data_segment in xml_data_content_segments:
xml_str = self.des_accessor.extract_desdata_xml(xml_data_segment)
if not xml_str:
continue

# Check to see if this is SICD or SIDD data
if "SIDD" in xml_str:
# SIDD images will often contain SICD XML metadata as well but the SIDD should come first
# so we can stop processing other XML data segments if we find a SIDD segment.
self.sar_des_header = self.des_accessor.extract_des_header(xml_data_segment)
self.sar_updater = SIDDUpdater(xml_str)
break
elif "SICD" in xml_str:
self.sar_des_header = self.des_accessor.extract_des_header(xml_data_segment)
self.sar_updater = SICDUpdater(xml_str)
break

self.default_gdal_translate_kwargs = self._create_gdal_translate_kwargs()
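
The segment-selection loop in this hunk replaces the old "first segment only" behavior with a scan that skips unreadable segments and stops at the first SIDD or SICD match. A minimal sketch of that control flow, assuming the XML strings have already been extracted; `select_sar_metadata` is a hypothetical name, and the real code constructs `SIDDUpdater` or `SICDUpdater` instead of returning a tuple:

```python
from typing import Iterable, Optional, Tuple


def select_sar_metadata(xml_strings: Iterable[Optional[str]]) -> Optional[Tuple[str, str]]:
    """Return (kind, xml) for the first segment that looks like SIDD or SICD."""
    for xml_str in xml_strings:
        if not xml_str:
            # Extraction failed or the segment was empty; try the next one.
            continue
        if "SIDD" in xml_str:
            # SIDD is checked first: SIDD products often carry SICD metadata too.
            return "SIDD", xml_str
        if "SICD" in xml_str:
            return "SICD", xml_str
    return None
```

Checking SIDD before SICD inside each segment, combined with the early exit, relies on the SIDD segment appearing before any SICD segment in the file, as the comment in the diff notes.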

8 changes: 6 additions & 2 deletions tox.ini
@@ -5,7 +5,7 @@
[tox]
envlist =
# Basic configurations: Run the tests for each python version.
py{39, 310, 311}
py{39, 310, 311, 312, 313}

# Build and test the docs with sphinx.
docs
@@ -22,14 +22,17 @@ requires =
skip_missing_interpreters = False

[testenv]
conda_env = {toxinidir}/environment.yml
allowlist_externals =
conda
conda_env = {toxinidir}/environment-{envname}.yml
deps =
pytest==7.2.1
pytest-cov==4.0.0
pytest-xdist==3.2.0
pytest-asyncio==0.20.3
mock==5.0.1
commands =
conda list "^(gdal|proj|python)$"
pytest --cov-config .coveragerc --cov aws.osml --cov-report term-missing {posargs}
{env:IGNORE_COVERAGE:} coverage html --rcfile .coveragerc

@@ -48,6 +51,7 @@ commands =
twine check dist/*.tar.gz

[testenv:docs]
conda_env =
changedir = doc
deps =
sphinx>=6.2.1