#151: Add option resolve_hostnames (#152)
Co-authored-by: Nicola Coretti <[email protected]>
kaklakariada and Nicoretti authored Sep 9, 2024
1 parent ff9b7f2 commit 09919c1
Showing 12 changed files with 303 additions and 30 deletions.
9 changes: 6 additions & 3 deletions CHANGELOG.md
@@ -2,7 +2,10 @@

## [Unreleased]

## [0.27.0] - 2024-09-09

- Relocked dependencies (Internal)
- [#151](https://github.com/exasol/pyexasol/issues/151): Added option to deactivate hostname resolution

## [0.26.0] - 2024-07-04

@@ -12,9 +15,9 @@

This driver facade should only be used if one is certain that using the dbapi2 is the right solution for their scenario, taking all implications into account. For more details on why and who should avoid using dbapi2, please refer to the [DBAPI2 compatibility section](/docs/DBAPI_COMPAT.md) in our documentation.

- Droped support for python 3.7
- Droped support for Exasol 6.x
- Droped support for Exasol 7.0.x
- Dropped support for python 3.7
- Dropped support for Exasol 6.x
- Dropped support for Exasol 7.0.x
- Relocked dependencies (Internal)
- Switched packaging and project workflow to poetry (internal)

2 changes: 1 addition & 1 deletion README.md
@@ -41,6 +41,7 @@ PyEXASOL provides API to read & write multiple data streams in parallel using se
- [DB-API 2.0 compatibility](/docs/DBAPI_COMPAT.md)
- [Optional dependencies](/docs/DEPENDENCIES.md)
- [Changelog](/CHANGELOG.md)
- [Developer Guide](/docs/DEVELOPER_GUIDE.md)


## PyEXASOL main concepts
@@ -116,4 +117,3 @@ Enjoy!

## Maintained by
[Exasol](https://www.exasol.com) 2023 — Today

27 changes: 27 additions & 0 deletions docs/DEVELOPER_GUIDE.md
@@ -0,0 +1,27 @@
# Developer Guide

This guide explains how to develop `pyexasol` and run tests.

## Initial Setup

Create a virtual environment and install dependencies:

```sh
poetry install --all-extras
```

Run the following to enter the virtual environment:

```sh
poetry shell
```

## Running Integration Tests

To run integration tests, first start a local database:

```sh
nox -s db-start
```

Then you can run tests as usual with `pytest`.
7 changes: 7 additions & 0 deletions docs/REFERENCE.md
@@ -114,6 +114,7 @@ Open new connection and return `ExaConnection` object.
| `udf_output_connect_address` | `('udf_host', 8580)` | Specific SCRIPT_OUTPUT_ADDRESS value to connect from Exasol to UDF script output server (Default: inherited from TCP server) |
| `udf_output_dir` | `/tmp` | Path or path-like object pointing to directory for script output log files (Default: `tempfile.gettempdir()`) |
| `http_proxy` | `http://myproxy.com:3128` | HTTP proxy string in Linux [`http_proxy`](https://www.shellhacks.com/linux-proxy-server-settings-set-proxy-command-line/) format (Default: `None`) |
| `resolve_hostnames` | `False` | Explicitly resolve host names to IP addresses before connecting. Deactivating this will let the operating system resolve the host name (Default: `True`) |
| `client_name` | `MyClient` | Custom name of client application displayed in Exasol sessions tables (Default: `PyEXASOL`) |
| `client_version` | `1.0.0` | Custom version of client application (Default: `pyexasol.__version__`) |
| `client_os_username` | `john` | Custom OS username displayed in Exasol sessions table (Default: `getpass.getuser()`) |
@@ -122,6 +123,12 @@ Open new connection and return `ExaConnection` object.
| `access_token` | `...` | OpenID access token to use for the login process |
| `refresh_token` | `...` | OpenID refresh token to use for the login process |

### Host Name Resolution

By default, pyexasol resolves host names to IP addresses, randomly shuffles the IP addresses, and tries each one until a connection succeeds. See the [design documentation](/docs/DESIGN.md#automatic-resolution-and-randomization-of-connection-addresses) for details.

If host name resolution causes problems, you can deactivate it by specifying the argument `resolve_hostnames=False`. This may be required when connecting through a proxy that allows connections only to defined host names. In all other cases, we recommend omitting the argument.
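
As an illustration, the following standalone sketch mirrors how the option is meant to affect the WebSocket URL. The helper name `websocket_url` and the example host, address, and port values are hypothetical and not part of the pyexasol API; only the `resolve_hostnames` option itself comes from this change.

```python
from typing import Optional


def websocket_url(hostname: str, ipaddr: Optional[str], port: int,
                  resolve_hostnames: bool = True,
                  encryption: bool = True) -> str:
    """Hypothetical helper: pick the host part of the WebSocket URL."""
    host = hostname
    if resolve_hostnames:
        # With resolution enabled, an IP address must already be known.
        if ipaddr is None:
            raise ValueError("IP address was not resolved")
        host = ipaddr
    scheme = "wss" if encryption else "ws"
    return f"{scheme}://{host}:{port}"


# Default behaviour: connect to the resolved IP address.
resolved = websocket_url("exasol.example.com", "10.0.0.5", 8563)

# resolve_hostnames=False: keep the host name and let the OS resolve it.
unresolved = websocket_url("exasol.example.com", None, 8563,
                           resolve_hostnames=False)

# In pyexasol itself the option is passed to connect(), for example
# (placeholder credentials, requires a reachable database):
#   pyexasol.connect(dsn="exasol.example.com:8563", user="...",
#                    password="...", resolve_hostnames=False)
```

With resolution deactivated, the host name is passed through unchanged, which is what a proxy restricted to specific host names needs to see.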

## connect_local_config()
Open new connection and return `ExaConnection` object using local .ini file (usually `~/.pyexasol.ini`) to read credentials and connection parameters. Please read [local config](/docs/LOCAL_CONFIG.md) page for more details.

79 changes: 55 additions & 24 deletions pyexasol/connection.py
@@ -16,6 +16,10 @@

from . import callback as cb

from typing import (
NamedTuple,
Optional
)
from .exceptions import *
from .statement import ExaStatement
from .logger import ExaLogger
@@ -27,6 +31,13 @@
from .version import __version__


class Host(NamedTuple):
"""This represents a resolved host name with its IP address and port number."""
hostname: str
ip_address: Optional[str]
port: int
fingerprint: Optional[str]

class ExaConnection(object):
cls_statement = ExaStatement
cls_formatter = ExaFormatter
@@ -69,6 +80,7 @@ def __init__(self
, udf_output_connect_address=None
, udf_output_dir=None
, http_proxy=None
, resolve_hostnames=True
, client_name=None
, client_version=None
, client_os_username=None
@@ -104,6 +116,7 @@ def __init__(self
:param udf_output_connect_address: Specific SCRIPT_OUTPUT_ADDRESS value to connect from Exasol to UDF script output server (default: inherited from TCP server)
:param udf_output_dir: Directory to store captured UDF script output logs, split by <session_id>_<statement_id>/<vm_num>
:param http_proxy: HTTP proxy string in Linux http_proxy format (default: None)
:param resolve_hostnames: Explicitly resolve host names to IP addresses before connecting. Deactivating this will let the operating system resolve the host name (default: True)
:param client_name: Custom name of client application displayed in Exasol sessions tables (Default: PyEXASOL)
:param client_version: Custom version of client application (Default: pyexasol.__version__)
:param client_os_username: Custom OS username displayed in Exasol sessions table (Default: getpass.getuser())
@@ -144,6 +157,7 @@ def __init__(self
'udf_output_dir': udf_output_dir,

'http_proxy': http_proxy,
'resolve_hostnames': resolve_hostnames,

'client_name': client_name,
'client_version': client_version,
@@ -652,30 +666,17 @@ def _init_ws(self):
"""
dsn_items = self._process_dsn(self.options['dsn'])
failed_attempts = 0

ws_prefix = 'wss://' if self.options['encryption'] else 'ws://'
ws_options = self._get_ws_options()

for hostname, ipaddr, port, fingerprint in dsn_items:
self.logger.debug(f"Connection attempt [{ipaddr}:{port}]")

# Use correct hostname matching IP address for each connection attempt
if self.options['encryption']:
ws_options['sslopt']['server_hostname'] = hostname

try:
self._ws = websocket.create_connection(f'{ws_prefix}{ipaddr}:{port}', **ws_options)
self._ws = self._create_websocket_connection(hostname, ipaddr, port)
except Exception as e:
self.logger.debug(f'Failed to connect [{ipaddr}:{port}]: {e}')

failed_attempts += 1

if failed_attempts == len(dsn_items):
raise ExaConnectionFailedError(self, 'Could not connect to Exasol: ' + str(e))
raise ExaConnectionFailedError(self, 'Could not connect to Exasol: ' + str(e)) from e
else:
self._ws.settimeout(self.options['socket_timeout'])

self.ws_ipaddr = ipaddr
self.ws_ipaddr = ipaddr or hostname
self.ws_port = port

self._ws_send = self._ws.send
@@ -686,6 +687,32 @@ def _init_ws(self):

return

def _create_websocket_connection(self, hostname:str, ipaddr:str, port:int) -> websocket.WebSocket:
ws_options = self._get_ws_options()
# Use correct hostname matching IP address for each connection attempt
if self.options['encryption'] and self.options["resolve_hostnames"]:
ws_options['sslopt']['server_hostname'] = hostname

connection_string = self._get_websocket_connection_string(hostname, ipaddr, port)
self.logger.debug(f"Connection attempt {connection_string}")
try:
return websocket.create_connection(connection_string, **ws_options)
except Exception as e:
self.logger.debug(f'Failed to connect [{connection_string}]: {e}')
raise e

def _get_websocket_connection_string(self, hostname:str, ipaddr:Optional[str], port:int) -> str:
host = hostname
if self.options["resolve_hostnames"]:
if ipaddr is None:
raise ValueError("IP address was not resolved")
host = ipaddr
if self.options["encryption"]:
return f"wss://{host}:{port}"
else:
return f"ws://{host}:{port}"


def _get_ws_options(self):
options = {
'timeout': self.options['connection_timeout'],
@@ -729,13 +756,13 @@ def _get_login_attributes(self):

return attributes

def _process_dsn(self, dsn):
def _process_dsn(self, dsn: str) -> list[Host]:
"""
Parse DSN, expand ranges and resolve IP addresses for all hostnames
Return list of (hostname, ip_address, port) tuples in random order
Randomness is required to guarantee proper distribution of workload across all nodes
"""
if len(dsn.strip()) == 0:
if dsn is None or len(dsn.strip()) == 0:
raise ExaConnectionDsnError(self, 'Connection string is empty')

current_port = constant.DEFAULT_PORT
@@ -787,24 +814,28 @@ def _process_dsn(self, dsn):
result.extend(self._resolve_hostname(hostname, current_port, current_fingerprint))
# Just a single hostname or single IP address
else:
result.extend(self._resolve_hostname(m.group('hostname_prefix'), current_port, current_fingerprint))
hostname = m.group('hostname_prefix')
if self.options["resolve_hostnames"]:
result.extend(self._resolve_hostname(hostname, current_port, current_fingerprint))
else:
result.append(Host(hostname, None, current_port, current_fingerprint))

random.shuffle(result)

return result

def _resolve_hostname(self, hostname, port, fingerprint):
def _resolve_hostname(self, hostname: str, port: int, fingerprint: Optional[str]) -> list[Host]:
"""
Resolve all IP addresses for hostname and add port
It also implicitly checks that all hostnames mentioned in DSN can be resolved
"""
try:
hostname, alias_list, ipaddr_list = socket.gethostbyname_ex(hostname)
except OSError:
hostname, _, ipaddr_list = socket.gethostbyname_ex(hostname)
except OSError as e:
raise ExaConnectionDsnError(self, f'Could not resolve IP address of hostname [{hostname}] '
f'derived from connection string')
f'derived from connection string') from e

return [(hostname, ipaddr, port, fingerprint) for ipaddr in ipaddr_list]
return [Host(hostname, ipaddr, port, fingerprint) for ipaddr in ipaddr_list]

def _validate_fingerprint(self, provided_fingerprint):
server_fingerprint = hashlib.sha256(self._ws.sock.getpeercert(True)).hexdigest().upper()
2 changes: 1 addition & 1 deletion pyexasol/version.py
@@ -1 +1 @@
__version__ = '0.26.0'
__version__ = '0.27.0'
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "pyexasol"
version = "0.26.0"
version = "0.27.0"
license = "MIT"
readme = "README.md"
description = "Exasol python driver with extra features"