Skip to content

Latest commit

 

History

History
238 lines (164 loc) · 11.3 KB

README.md

File metadata and controls

238 lines (164 loc) · 11.3 KB

AxonOps™ CSV Extractor

The AxonOps CSV Extractor is a Python utility designed to extract monthly data from AxonOps and convert it to CSV files. This tool is particularly useful for integrating into data pipelines, data warehouses or ETL workflows

You simply add your AxonOps API keys as environment variables, create a report config (see data/queryconfig/query_config_example.json) and then run axonops_csv_extractor.py.

To get it working, load the Python dependencies in requirements.txt into your Python environment and then run the Python script axonops_csv_extractor.py

usage: axonops_csv_extractor.py [-h] -o OUTPUTDIR -q QUERYCONFIG -m MONTHOFYEAR [-d]

AxonOps CSV Extractor

options:
  -h, --help            show this help message and exit
  -o OUTPUTDIR, --outputdir OUTPUTDIR
                        The file path to a directory for outputting the CSV data.
  -q QUERYCONFIG, --queryconfig QUERYCONFIG
                        File path to the JSON configuration file listing the queries to run and extract to CSV. See the README.md for more information on this configuration file.
  -m MONTHOFYEAR, --monthofyear MONTHOFYEAR
                        The month of year in format YYYYMM for which data will be extracted to CSV. This can not be in the future nor the current month
  -d, --deletejson      If set, the downloaded JSON will be kept in the output directory. By default it is automatically deleted after being converted to CSV.

Note: please ensure that there is sufficient disk space at the location you choose to output the CSV - they can be large.

For example

python axonops_csv_extractor.py --outputdir data/results/mydata --queryconfig data/queryconfig/myqueries.json --monthofyear 202409

This will extract all the data for September 2024 returned by the queries in myqueries.json and store it as CSV files in data/results/mydata


Setup Instructions

Query Configuration Setup

The --queryconfig argument should be pointing at a JSON file that defines the queries to run, the results of which will be converted to CSV. There is an example you can see in data/queryconfig/query_config_example.json.

The file looks roughly like this:

{
    "clusters": ["mycassandracluster1","mycassandracluster2","mycassandracluster3"],
    "queries": [
        {
            "description": "Live Disk Space Used by DC and Keyspace",
            "unit": "bytes (SI)",
            "axon_query": "sum(cas_Table_LiveDiskSpaceUsed{function='Count',keyspace!~'system|system_auth|system_distributed|system_schema|system_traces',scope!=''}) by (dc, keyspace)",
            "file_prefix": "live_disk_per_keyspace"
        },
        {
            "description": "Total Coordinator Reads by DC and Keyspace",
            "unit": "rps",
            "axon_query": "sum(cas_Table_CoordinatorReadLatency{axonfunction='rate',function=~'Count',keyspace!~'system|system_auth|system_distributed|system_schema|system_traces'}) by (dc,keyspace,scope)",
            "file_prefix": "total_coordinator_table_reads_per_dc",
            "field_renames": [
                {"rename": "scope", "value": "table"}
            ]
        }
    ]
}

JSON Field Information

  • clusters: Add the names of your clusters connected to AxonOps in the clusters list.
  • queries: For each of the clusters in entered, this is the list of queries that will be run and the results converted to CSV.
    • description: A human readable description of the query
    • unit: the unit returned
    • axon_query: the AxonOps query to run the results of which will be converted to CSV.
    • file_prefix: The file prefix that will be used in the name of the CSV. This must be unique.
    • field_renames: (optional) Sometimes, the JSON API response has fields returned that can be named better.

How to find values for axon_query

In AxonOps we using a query language that is very similar to PromQL to query for retrieving metrics.

If there is a metric you want to extract to CSV, navigate to the dashboard it is on:

Screenshot 2024-10-25 at 10 26 40

Then edit the graph containing the metric you want to extract:

Screenshot 2024-10-25 at 10 31 47

One in the edit window, click on VISUALIZATION tab:

Screenshot 2024-10-25 at 10 27 20

From there you can see the Query value for this graph. In this example it is defined as:

host_CPU_Percent_Merge{time='real',dc=~'$dc',rack=~'$rack',host_id=~'$host_id'}

This query has some templated variables $dc, $rack, $host_id - these are used to pass in parameters selected in the dashboard GUI and we dont want these for extraction purposes, unless you want to substitute values in for the extract.

You have several options with this query. The first option host_CPU_Percent_Merge will return a large volume of data, it will return the raw values for every single server in the cluster. Things to consider as an alternative is to group the data by dc, keyspace or whatever way you want to group the data. Additionally you can use aggregate functions like sum etc.. Visit the AxonOps documentation to find out more.

Setup Steps

You need to install Python 3 for this application, a good resource is the official Python website. You can find installation instructions and download links for various operating systems there.

  • Python Downloads - This page provides the latest releases of Python for Windows, macOS, and other platforms, along with detailed installation instructions.

1. Clone the Repository

First, clone the GitHub repository to your local machine:

git clone https://github.com/axonops/axonops-csv-extractor.git
cd axonops-csv-extractor

2. Generate an AxonOps API token for your organisation

Login to your AxonOps console. For the SaaS console go to http://console.axonops.com.

Once you have logged in, choose the organisation you want to create an API token for. Then enter the API Tokens section:

API Tokens

From there, click create a new API Token. Give the token a name, expiry period, select the clusters you want to be able to access with the token and select the Readonly Role

Create New Token

Click the Generate button and then you can copy the API token to use in Step 3.

Generated Token

(obviously that token is not valid 😊)

3. Set-up your AxonOps organisation and API tokens as environment variables

To interact with the AxonOps APIs to query its data you need to store your organisation and API key as environment variables in a new .env you need to create in the root directory of the repo.

See .env-example - copy this to a file called .env and update it with your organisation id and the API token you generated. This file is in .gitignore and will not be committed.

AXONOPS_ORG_ID="youraxonopsorg"
AXONOPS_API_SECRET_TOKEN="yourapitoken"

For self-hosted AxonOps users you can also add a AXONOPS_DASH_URL variable to point at your own installation of AxonOps.

4. Set Up a Python Virtual Environment

Create a virtual environment in the project directory using .venv as the directory name. This ensures that all Python dependencies are installed in an isolated environment specific to this project:

python3 -m venv .venv

This command creates a directory named .venv inside your project directory, which contains the virtual environment.

5. Activate the Virtual Environment

Activate the virtual environment to start using it. The activation command depends on your operating system:

  • On macOS and Linux:

    source .venv/bin/activate
  • On Windows (Command Prompt):

    .venv\Scripts\activate.bat
  • On Windows (PowerShell):

    .venv\Scripts\Activate.ps1

Once activated, your terminal prompt should change to indicate that you are working within the virtual environment.

6. Install Required Python Packages

With the virtual environment activated, install all necessary packages listed in requirements.txt:

pip install -r requirements.txt

This command reads the requirements.txt file and installs all specified packages into the virtual environment.

7. Run the Python Script

Now that all dependencies are installed, you can run the main Python script:

python axonops_csv_extractor.py [options]

Usage:

usage: axonops_csv_extractor.py [-h] -o OUTPUTDIR -q QUERYCONFIG -m MONTHOFYEAR [-d]

AxonOps CSV Extractor

options:
  -h, --help            show this help message and exit
  -o OUTPUTDIR, --outputdir OUTPUTDIR
                        The file path to a directory for outputting the CSV data.
  -q QUERYCONFIG, --queryconfig QUERYCONFIG
                        File path to the JSON configuration file listing the queries to run and extract to CSV. See the README.md for more information on this configuration file.
  -m MONTHOFYEAR, --monthofyear MONTHOFYEAR
                        The month of year in format YYYYMM for which data will be extracted to CSV. This can not be in the future nor the current month
  -d, --deletejson      If set, the downloaded JSON will be kept in the output directory. By default it is automatically deleted after being converted to CSV.

8. Deactivate the Virtual Environment

Once you are done working, deactivate the virtual environment to return to your system's default Python environment:

deactivate

AxonOps is a registered trademark of AxonOps Limited. Apache, Apache Cassandra, Cassandra, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.