Opensearch OTEL Demo Architecture

This document will review the OpenSearch architecture for the OTEL demo and will review how to use the new Observability capabilities implemented into OpenSearch.

This diagram provides an overview of the system components, showcasing the configuration derived from the OpenTelemetry Collector (otelcol) configuration file utilized by the OpenTelemetry demo application.

Additionally, it highlights the observability data (traces and metrics) flow within the system.

OTEL DEMO Describes the list of services that are composing the Astronomy Shop.

They are combined of:

Accounting
Ad
Cart
Checkout
Currency
Email
Feature Flag
Fraud Detection
Frontend
Kafka
Payment
Product Catalog
Quote
Recommendation
Shipping
Fluent-Bit (nginx's otel log exported)
Integrations (pre-canned OpenSearch assets)
DataPrepper *(OpenSearch's ingestion pipeline)

Backend supportive services

Load Generator
- See description
Frontend Nginx Proxy (replacement for Frontend-Proxy)
- See description
OpenSearch
- See description
Dashboards
- See description
Prometheus
- See description
Feature-Flag
- See description
Grafana
- See description

Services Topology

The next diagram shows the docker compose services dependencies

Purpose

The purpose of this demo is to demonstrate the different capabilities of OpenSearch Observability to investigate and reflect your system.

Ingestion

The ingestion capabilities for OpenSearch is to be able to support multiple pipelines:

Data-Prepper is an OpenSearch ingestion project that allows ingestion of OTEL standard signals using Otel-Collector
Jaeger is an ingestion framework which has a build in capability for pushing OTEL signals into OpenSearch
Fluent-Bit is an ingestion framework which has a build in capability for pushing OTEL signals into OpenSearch

Integrations -

The integration service is a list of pre-canned assets that are loaded in a combined manner to allow users the ability for simple and automatic way to discover and review their services topology.

These (demo-sample) integrations contain the following assets:

components & index template mapping
datasources
data-stream & indices
queries
dashboards

Once they are loaded, the user can imminently review his OTEL demo services and dashboards that reflect the system state.

Nginx Dashboard - reflects the Nginx Proxy server that routes all the network communication to/from the frontend
Prometheus datasource - reflects the connectivity to the prometheus metric storage that allows us to federate metrics analytics queries
Logs Datastream - reflects the data-stream used by nginx logs ingestion and dashboards representing a well-structured log schema

Once these assets are loaded - the user can start reviewing its Observability dashboards and traces

Scenarios

How can you solve problems with OpenTelemetry? These scenarios walk you through some pre-configured problems and show you how to interpret OpenTelemetry data to solve them.

Generate a Product Catalog error for GetProduct requests with product id: OLJCESPC7Z using the Feature Flag service
Discover a memory leak and diagnose it using metrics and traces. Read more

Prometheus Metrics

Getting all metrics names call the following API http://localhost:9090/api/v1/label/__name__/values

This will return the following response:

{
   "status": "success",
   "data": [
      "app_ads_ad_requests_total",
      "app_currency_counter_total",
      "app_frontend_requests_total",
      "app_payment_transactions_total",
      "app_recommendations_counter_total",
      "http_server_duration_milliseconds_bucket",
      "http_server_duration_milliseconds_count",
      "http_server_duration_milliseconds_sum",
      "kafka_consumer_assigned_partitions",
      "kafka_consumer_bytes_consumed_rate",
      "kafka_consumer_bytes_consumed_total",
      "kafka_consumer_commit_latency_avg",
      "kafka_consumer_commit_latency_max",
      "kafka_consumer_commit_rate",
      "kafka_consumer_commit_sync_time_ns_total",
      "kafka_consumer_commit_total",
      "kafka_consumer_committed_time_ns_total",
      "kafka_consumer_connection_close_rate",
      "kafka_consumer_connection_close_total",
      "kafka_consumer_connection_count",
      "kafka_consumer_connection_creation_rate",
      "kafka_consumer_connection_creation_total",
      "kafka_consumer_failed_authentication_rate",
      "kafka_consumer_failed_authentication_total",
      "kafka_consumer_failed_reauthentication_rate",
      "kafka_consumer_failed_reauthentication_total",
      "kafka_consumer_failed_rebalance_rate_per_hour",
      "kafka_consumer_failed_rebalance_total",
      "kafka_consumer_fetch_latency_avg",
      "kafka_consumer_fetch_latency_max",
      "kafka_consumer_fetch_rate",
      "kafka_consumer_fetch_size_avg",
      "kafka_consumer_fetch_size_max",
      "kafka_consumer_fetch_throttle_time_avg",
      "kafka_consumer_fetch_throttle_time_max",
      "kafka_consumer_fetch_total",
      "kafka_consumer_heartbeat_rate",
      "kafka_consumer_heartbeat_response_time_max",
      "kafka_consumer_heartbeat_total",
      "kafka_consumer_incoming_byte_rate",
      "kafka_consumer_incoming_byte_total",
      "kafka_consumer_io_ratio",
      "kafka_consumer_io_time_ns_avg",
      "kafka_consumer_io_time_ns_total",
      "kafka_consumer_io_wait_ratio",
      "kafka_consumer_io_wait_time_ns_avg",
      "kafka_consumer_io_wait_time_ns_total",
      "kafka_consumer_io_waittime_total",
      "kafka_consumer_iotime_total",
      "kafka_consumer_join_rate",
      "kafka_consumer_join_time_avg",
      "kafka_consumer_join_time_max",
      "kafka_consumer_join_total",
      "kafka_consumer_last_heartbeat_seconds_ago",
      "kafka_consumer_last_poll_seconds_ago",
      "kafka_consumer_last_rebalance_seconds_ago",
      "kafka_consumer_network_io_rate",
      "kafka_consumer_network_io_total",
      "kafka_consumer_outgoing_byte_rate",
      "kafka_consumer_outgoing_byte_total",
      "kafka_consumer_partition_assigned_latency_avg",
      "kafka_consumer_partition_assigned_latency_max",
      "kafka_consumer_partition_lost_latency_avg",
      "kafka_consumer_partition_lost_latency_max",
      "kafka_consumer_partition_revoked_latency_avg",
      "kafka_consumer_partition_revoked_latency_max",
      "kafka_consumer_poll_idle_ratio_avg",
      "kafka_consumer_reauthentication_latency_avg",
      "kafka_consumer_reauthentication_latency_max",
      "kafka_consumer_rebalance_latency_avg",
      "kafka_consumer_rebalance_latency_max",
      "kafka_consumer_rebalance_latency_total",
      "kafka_consumer_rebalance_rate_per_hour",
      "kafka_consumer_rebalance_total",
      "kafka_consumer_records_consumed_rate",
      "kafka_consumer_records_consumed_total",
      "kafka_consumer_records_lag",
      "kafka_consumer_records_lag_avg",
      "kafka_consumer_records_lag_max",
      "kafka_consumer_records_lead",
      "kafka_consumer_records_lead_avg",
      "kafka_consumer_records_lead_min",
      "kafka_consumer_records_per_request_avg",
      "kafka_consumer_request_latency_avg",
      "kafka_consumer_request_latency_max",
      "kafka_consumer_request_rate",
      "kafka_consumer_request_size_avg",
      "kafka_consumer_request_size_max",
      "kafka_consumer_request_total",
      "kafka_consumer_response_rate",
      "kafka_consumer_response_total",
      "kafka_consumer_select_rate",
      "kafka_consumer_select_total",
      "kafka_consumer_successful_authentication_no_reauth_total",
      "kafka_consumer_successful_authentication_rate",
      "kafka_consumer_successful_authentication_total",
      "kafka_consumer_successful_reauthentication_rate",
      "kafka_consumer_successful_reauthentication_total",
      "kafka_consumer_sync_rate",
      "kafka_consumer_sync_time_avg",
      "kafka_consumer_sync_time_max",
      "kafka_consumer_sync_total",
      "kafka_consumer_time_between_poll_avg",
      "kafka_consumer_time_between_poll_max",
      "kafka_controller_active_count",
      "kafka_isr_operation_count",
      "kafka_lag_max",
      "kafka_logs_flush_Count_milliseconds_total",
      "kafka_logs_flush_time_50p_milliseconds",
      "kafka_logs_flush_time_99p_milliseconds",
      "kafka_message_count_total",
      "kafka_network_io_bytes_total",
      "kafka_partition_count",
      "kafka_partition_offline",
      "kafka_partition_underReplicated",
      "kafka_purgatory_size",
      "kafka_request_count_total",
      "kafka_request_failed_total",
      "kafka_request_queue",
      "kafka_request_time_50p_milliseconds",
      "kafka_request_time_99p_milliseconds",
      "kafka_request_time_milliseconds_total",
      "otel_logs_log_processor_logs",
      "otel_logs_log_processor_queue_limit",
      "otel_logs_log_processor_queue_usage",
      "otel_trace_span_processor_queue_limit",
      "otel_trace_span_processor_queue_usage",
      "otel_trace_span_processor_spans",
      "otelcol_exporter_enqueue_failed_log_records",
      "otelcol_exporter_enqueue_failed_metric_points",
      "otelcol_exporter_enqueue_failed_spans",
      "otelcol_exporter_queue_capacity",
      "otelcol_exporter_queue_size",
      "otelcol_exporter_sent_log_records",
      "otelcol_exporter_sent_metric_points",
      "otelcol_exporter_sent_spans",
      "otelcol_process_cpu_seconds",
      "otelcol_process_memory_rss",
      "otelcol_process_runtime_heap_alloc_bytes",
      "otelcol_process_runtime_total_alloc_bytes",
      "otelcol_process_runtime_total_sys_memory_bytes",
      "otelcol_process_uptime",
      "otelcol_processor_accepted_log_records",
      "otelcol_processor_accepted_metric_points",
      "otelcol_processor_accepted_spans",
      "otelcol_processor_batch_batch_send_size_bucket",
      "otelcol_processor_batch_batch_send_size_count",
      "otelcol_processor_batch_batch_send_size_sum",
      "otelcol_processor_batch_timeout_trigger_send",
      "otelcol_processor_dropped_log_records",
      "otelcol_processor_dropped_metric_points",
      "otelcol_processor_dropped_spans",
      "otelcol_processor_refused_log_records",
      "otelcol_processor_refused_metric_points",
      "otelcol_processor_refused_spans",
      "otelcol_processor_servicegraph_expired_edges",
      "otelcol_processor_servicegraph_total_edges",
      "otelcol_receiver_accepted_log_records",
      "otelcol_receiver_accepted_metric_points",
      "otelcol_receiver_accepted_spans",
      "otelcol_receiver_refused_log_records",
      "otelcol_receiver_refused_metric_points",
      "otelcol_receiver_refused_spans",
      "otlp_exporter_exported_total",
      "otlp_exporter_seen_total",
      "process_runtime_dotnet_assemblies_count",
      "process_runtime_dotnet_exceptions_count_total",
      "process_runtime_dotnet_gc_allocations_size_bytes_total",
      "process_runtime_dotnet_gc_collections_count_total",
      "process_runtime_dotnet_gc_committed_memory_size_bytes",
      "process_runtime_dotnet_gc_heap_size_bytes",
      "process_runtime_dotnet_gc_objects_size_bytes",
      "process_runtime_dotnet_jit_compilation_time_nanoseconds_total",
      "process_runtime_dotnet_jit_il_compiled_size_bytes_total",
      "process_runtime_dotnet_jit_methods_compiled_count_total",
      "process_runtime_dotnet_monitor_lock_contention_count_total",
      "process_runtime_dotnet_thread_pool_completed_items_count_total",
      "process_runtime_dotnet_thread_pool_queue_length",
      "process_runtime_dotnet_thread_pool_threads_count",
      "process_runtime_dotnet_timer_count",
      "process_runtime_go_cgo_calls",
      "process_runtime_go_gc_count_total",
      "process_runtime_go_gc_pause_ns_bucket",
      "process_runtime_go_gc_pause_ns_count",
      "process_runtime_go_gc_pause_ns_sum",
      "process_runtime_go_gc_pause_ns_total",
      "process_runtime_go_goroutines",
      "process_runtime_go_mem_heap_alloc_bytes",
      "process_runtime_go_mem_heap_idle_bytes",
      "process_runtime_go_mem_heap_inuse_bytes",
      "process_runtime_go_mem_heap_objects",
      "process_runtime_go_mem_heap_released_bytes",
      "process_runtime_go_mem_heap_sys_bytes",
      "process_runtime_go_mem_live_objects",
      "process_runtime_go_mem_lookups_total",
      "process_runtime_jvm_buffer_count",
      "process_runtime_jvm_buffer_limit_bytes",
      "process_runtime_jvm_buffer_usage_bytes",
      "process_runtime_jvm_classes_current_loaded",
      "process_runtime_jvm_classes_loaded_total",
      "process_runtime_jvm_classes_unloaded_total",
      "process_runtime_jvm_cpu_utilization_ratio",
      "process_runtime_jvm_gc_duration_milliseconds_bucket",
      "process_runtime_jvm_gc_duration_milliseconds_count",
      "process_runtime_jvm_gc_duration_milliseconds_sum",
      "process_runtime_jvm_memory_committed_bytes",
      "process_runtime_jvm_memory_init_bytes",
      "process_runtime_jvm_memory_limit_bytes",
      "process_runtime_jvm_memory_usage_after_last_gc_bytes",
      "process_runtime_jvm_memory_usage_bytes",
      "process_runtime_jvm_system_cpu_load_1m_ratio",
      "process_runtime_jvm_system_cpu_utilization_ratio",
      "process_runtime_jvm_threads_count",
      "processedLogs_total",
      "processedSpans_total",
      "rpc_client_duration_milliseconds_bucket",
      "rpc_client_duration_milliseconds_count",
      "rpc_client_duration_milliseconds_sum",
      "rpc_server_duration_milliseconds_bucket",
      "rpc_server_duration_milliseconds_count",
      "rpc_server_duration_milliseconds_sum",
      "runtime_cpython_cpu_time_seconds_total",
      "runtime_cpython_gc_count_bytes_total",
      "runtime_cpython_memory_bytes_total",
      "runtime_uptime_milliseconds",
      "scrape_duration_seconds",
      "scrape_samples_post_metric_relabeling",
      "scrape_samples_scraped",
      "scrape_series_added",
      "span_metrics_calls_total",
      "span_metrics_duration_milliseconds_bucket",
      "span_metrics_duration_milliseconds_count",
      "span_metrics_duration_milliseconds_sum",
      "system_cpu_time_seconds_total",
      "system_cpu_utilization_ratio",
      "system_disk_io_bytes_total",
      "system_disk_operations_total",
      "system_disk_time_seconds_total",
      "system_memory_usage_bytes",
      "system_memory_utilization_ratio",
      "system_network_connections",
      "system_network_dropped_packets_total",
      "system_network_errors_total",
      "system_network_io_bytes_total",
      "system_network_packets_total",
      "system_swap_usage_pages",
      "system_swap_utilization_ratio",
      "system_thread_count",
      "target_info",
      "up"
   ]
}

Reference

Project reference documentation, like requirements and feature matrices here

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

architecture.md

architecture.md

Opensearch OTEL Demo Architecture

This document will review the OpenSearch architecture for the OTEL demo and will review how to use the new Observability capabilities implemented into OpenSearch.

Services Topology

Purpose

Ingestion

Integrations -

Scenarios

Prometheus Metrics

Reference

Files

architecture.md

Latest commit

History

architecture.md

File metadata and controls

Opensearch OTEL Demo Architecture

This document will review the OpenSearch architecture for the OTEL demo and will review how to use the new Observability capabilities implemented into OpenSearch.

Services Topology

Purpose

Ingestion

Integrations -

Scenarios

Prometheus Metrics

Reference