Search and Query System

Purpose and Scope

The Search and Query System provides full-text search, structured queries, and analytics capabilities for indexed Nginx logs using the Bleve search engine. This system sits between the indexed log data (see Parallel Indexing Pipeline) and the analytics services (see Analytics and Dashboard), enabling efficient log retrieval, filtering, and aggregation across multiple index shards.

This page covers:

  • Bleve search engine integration and index schema design
  • Distributed search architecture across sharded indexes
  • Query construction from high-level search requests
  • Faceting and aggregation for analytics
  • Result caching and performance optimization
  • Search API endpoints and request handling

For information about log indexing and parsing, see Parallel Indexing Pipeline. For analytics calculations and dashboard generation, see Analytics and Dashboard.

System Architecture

The search system operates as a layer between the indexed Bleve shards and the analytics/API layers, providing query translation, distributed search coordination, and result aggregation.

Sources: internal/nginx_log/modern_services.go:1-677, internal/nginx_log/searcher/types.go:1-513, internal/nginx_log/analytics/service.go:1-154

Bleve Index Schema and Mapping

The search system uses Bleve's mapping system to define how log documents are indexed and queried. The mapping specifies field types, analyzers, and indexing options for optimal search performance.

Index Mapping Definition

The CreateLogIndexMapping() function defines the schema for log documents:

Key Fields and Their Purposes:

| Field | Type | Analyzer | DocValues | Purpose |
|---|---|---|---|---|
| timestamp | Numeric | N/A | No | Time range queries, sorting |
| ip | Text | Keyword | Yes | Exact IP matching, faceting |
| method | Text | Keyword | No | HTTP method filtering |
| path | Text | Standard | No | Full-text path search |
| path_exact | Text | Keyword | Yes | Exact path matching, faceting |
| status | Numeric | N/A | No | Status code range queries |
| bytes_sent | Numeric | N/A | No | Traffic volume queries |
| browser, os, device_type | Text | Keyword | No | Device/browser filtering |
| main_log_path | Text | Keyword | Yes | Log group filtering, efficient aggregation |
| file_path | Text | Keyword | No | Physical file filtering |
| raw | Text | N/A | No (stored only) | Raw log line retrieval |

Sources: internal/nginx_log/indexer/types.go:340-456, internal/nginx_log/indexer/parser.go:172-231

Searcher Implementation

The Searcher struct provides the main search interface, wrapping multiple Bleve index shards through a bleve.IndexAlias for distributed search.

Searcher Initialization

Key Functions:

  • GetSearcher() (internal/nginx_log/modern_services.go:165-198): Returns the global searcher instance, with health checks and auto-healing
  • NewSearcher(): Creates a searcher whose IndexAlias wraps multiple shards
  • SwapShards(): Atomically replaces shards using Bleve's IndexAlias.Swap() for zero-downtime updates
  • IsHealthy(): Reports whether the searcher has at least one active shard

Sources: internal/nginx_log/modern_services.go:90-129, internal/nginx_log/modern_services.go:165-198, internal/nginx_log/modern_services.go:484-571

Query Construction

The search system translates high-level SearchRequest objects into Bleve query objects, supporting multiple query types and filters.

Query Builder Architecture

Query Types Used:

| Bleve Query Type | Use Case | Fields |
|---|---|---|
| QueryStringQuery | Free-text search | query parameter |
| TermQuery | Exact field matching | ip, method, status, main_log_path |
| NumericRangeQuery | Numeric range filtering | timestamp, status, bytes_sent, request_time |
| ConjunctionQuery | AND combination of filters | All of the above |
| DisjunctionQuery | OR combination | Multiple values in arrays |

Main Log Path vs File Path:

The use_main_log_path flag determines which field to use for log filtering:

  • main_log_path: Groups rotated logs together (e.g., access.log, access.log.1, access.log.2.gz all map to access.log)
  • file_path: Targets specific physical files
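
The grouping of rotated files under one main_log_path can be illustrated with a small helper. This is only a sketch: the function name mainLogPath and the suffix rules (a numeric rotation index, optionally followed by .gz or .bz2) are assumptions for illustration, and the project's own normalization may handle more naming schemes, such as date-stamped rotations.

```go
package main

import (
	"fmt"
	"regexp"
)

// rotationSuffix matches an optional numeric rotation index followed by an
// optional compression extension at the end of a path, e.g. ".1", ".2.gz", ".gz".
// Date-stamped rotations (e.g. "access.log-20240101") are NOT handled here.
var rotationSuffix = regexp.MustCompile(`(\.\d+)?(\.(gz|bz2))?$`)

// mainLogPath is a hypothetical helper that maps a physical log file to the
// path of the log group it belongs to by stripping the rotation suffix.
func mainLogPath(filePath string) string {
	return rotationSuffix.ReplaceAllString(filePath, "")
}

func main() {
	files := []string{
		"/var/log/nginx/access.log",
		"/var/log/nginx/access.log.1",
		"/var/log/nginx/access.log.2.gz",
	}
	for _, f := range files {
		// Each file_path keeps its own value; all three share one main_log_path.
		fmt.Printf("file_path=%s main_log_path=%s\n", f, mainLogPath(f))
	}
}
```

Filtering on the shared value selects the whole rotation set in a single term filter, while filtering on file_path selects one physical file.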

Sources: internal/nginx_log/searcher/types.go:48-95, internal/nginx_log/indexer/types.go:179-203

Distributed Search Execution

The searcher executes queries across multiple Bleve shards and merges results, leveraging Bleve's IndexAlias for parallel execution.

Search Flow

Merge Operations:

  1. Hit Merging: Results from all shards are merged and sorted by score or specified field
  2. Total Count: Sum of TotalHits from all shards
  3. Facet Aggregation: Facet terms are merged and counts aggregated across shards
  4. Max Score: Highest score across all shards
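
The four merge operations can be sketched in plain Go. In the real system this work is delegated to Bleve's IndexAlias; the types Hit and ShardResult and the function mergeShardResults below are hypothetical stand-ins that only model the merge logic, and sorting is shown by score only.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit and ShardResult are simplified, hypothetical stand-ins for a
// per-shard search response.
type Hit struct {
	ID    string
	Score float64
}

type ShardResult struct {
	Hits      []Hit
	TotalHits uint64
	MaxScore  float64
	Facets    map[string]map[string]int // facet name -> term -> count
}

// mergeShardResults combines per-shard results into one response and keeps
// at most size hits, ordered by descending score.
func mergeShardResults(shards []ShardResult, size int) ShardResult {
	merged := ShardResult{Facets: map[string]map[string]int{}}
	for _, s := range shards {
		merged.Hits = append(merged.Hits, s.Hits...) // 1. hit merging
		merged.TotalHits += s.TotalHits              // 2. total count
		if s.MaxScore > merged.MaxScore {            // 4. max score
			merged.MaxScore = s.MaxScore
		}
		for name, terms := range s.Facets { // 3. facet aggregation
			if merged.Facets[name] == nil {
				merged.Facets[name] = map[string]int{}
			}
			for term, n := range terms {
				merged.Facets[name][term] += n
			}
		}
	}
	sort.SliceStable(merged.Hits, func(i, j int) bool {
		return merged.Hits[i].Score > merged.Hits[j].Score
	})
	if size >= 0 && len(merged.Hits) > size {
		merged.Hits = merged.Hits[:size]
	}
	return merged
}

func main() {
	shards := []ShardResult{
		{Hits: []Hit{{"a", 1.2}, {"b", 0.4}}, TotalHits: 2, MaxScore: 1.2,
			Facets: map[string]map[string]int{"status": {"200": 2}}},
		{Hits: []Hit{{"c", 2.5}}, TotalHits: 1, MaxScore: 2.5,
			Facets: map[string]map[string]int{"status": {"200": 1}}},
	}
	m := mergeShardResults(shards, 10)
	fmt.Println(m.TotalHits, m.MaxScore, m.Hits[0].ID, m.Facets["status"]["200"])
}
```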

Sources: internal/nginx_log/modern_services.go:484-571

Faceting and Aggregation

Faceting enables analytics by counting how often each distinct value of a specified field occurs in the result set. The system supports high-cardinality faceting through multiple strategies.

Faceting Architecture

Facet Configuration:

| Parameter | Default | Purpose |
|---|---|---|
| FacetFields | [] | List of fields to facet on |
| FacetSize | 10 | Number of top terms to return per facet |
| IncludeFacets | false | Enable faceting in search |
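
What a facet of a given size returns can be shown with a minimal stand-alone sketch. The names TermCount and topTerms are hypothetical, and the tie-break by term is an assumption made so the output is deterministic; Bleve computes facets inside the index rather than over an in-memory slice.

```go
package main

import (
	"fmt"
	"sort"
)

// TermCount is one facet bucket: a field value and how many hits carry it.
type TermCount struct {
	Term  string
	Count int
}

// topTerms counts each distinct value and returns the size most frequent
// ones, which is the role FacetSize plays for a single facet field.
func topTerms(values []string, size int) []TermCount {
	if size < 0 {
		size = 0
	}
	counts := map[string]int{}
	for _, v := range values {
		counts[v]++
	}
	out := make([]TermCount, 0, len(counts))
	for term, n := range counts {
		out = append(out, TermCount{Term: term, Count: n})
	}
	// Highest count first; ties are broken by term for a stable order.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Term < out[j].Term
	})
	if len(out) > size {
		out = out[:size]
	}
	return out
}

func main() {
	paths := []string{"/", "/api", "/", "/login", "/api", "/"}
	fmt.Println(topTerms(paths, 2))
}
```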

High-Cardinality Optimization:

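For fields with many unique values (like IP addresses), the system uses a Counter based on the HyperLogLog algorithm, which estimates the number of distinct values in a fixed amount of memory instead of storing every value: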
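
To make the idea concrete, here is a minimal HyperLogLog sketch written from the published algorithm. It is not the project's Counter: the type hll, the choice of 4096 registers, the SHA-256 hash, and the syntheticIP helper are all assumptions for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math"
	"math/bits"
)

const (
	precision    = 12             // number of hash bits used to pick a register
	numRegisters = 1 << precision // 4096 one-byte registers
)

// hll is a hypothetical, minimal HyperLogLog counter.
type hll struct {
	registers [numRegisters]uint8
}

// Add hashes the value, uses the top bits to select a register, and records
// the position of the first 1-bit in the remaining bits if it is a new maximum.
func (h *hll) Add(value string) {
	sum := sha256.Sum256([]byte(value))
	x := binary.BigEndian.Uint64(sum[:8])
	idx := x >> (64 - precision)
	rest := x << precision
	rank := uint8(64 - precision + 1)
	if rest != 0 {
		rank = uint8(bits.LeadingZeros64(rest) + 1)
	}
	if rank > h.registers[idx] {
		h.registers[idx] = rank
	}
}

// Estimate returns the approximate number of distinct values added so far.
func (h *hll) Estimate() float64 {
	m := float64(numRegisters)
	sum, zeros := 0.0, 0
	for _, r := range h.registers {
		sum += math.Pow(2, -float64(r))
		if r == 0 {
			zeros++
		}
	}
	alpha := 0.7213 / (1 + 1.079/m)
	est := alpha * m * m / sum
	// Small-range correction: fall back to linear counting.
	if est <= 2.5*m && zeros > 0 {
		est = m * math.Log(m/float64(zeros))
	}
	return est
}

// syntheticIP generates distinct fake addresses for the demo.
func syntheticIP(i int) string {
	return fmt.Sprintf("10.%d.%d.%d", i>>16&255, i>>8&255, i&255)
}

func main() {
	h := &hll{}
	for i := 0; i < 50000; i++ {
		h.Add(syntheticIP(i))
		h.Add(syntheticIP(i)) // duplicates do not change the estimate
	}
	fmt.Printf("estimated unique IPs: %.0f (true value 50000)\n", h.Estimate())
}
```

With 4096 registers the counter occupies 4 KiB regardless of how many values are added, and the expected relative error is about 1.04/sqrt(4096), or roughly 1.6%.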

Counter Usage:

Sources: internal/nginx_log/analytics/service.go:32-85, internal/nginx_log/searcher/types.go:87-89, internal/nginx_log/analytics/dashboard.go:13-100

Search Result Caching

The search system implements an LRU cache to avoid redundant queries for identical search requests.

Cache Architecture

Cache Configuration:

| Setting | Default | Purpose |
|---|---|---|
| CacheSize | 1000 | Maximum cached search results |
| EnableCache | true | Global cache toggle |
| CacheKey | Hash of request | Unique identifier for each query |
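
The eviction behaviour implied by a bounded LRU cache can be sketched with the standard library. The type lruCache and its methods are hypothetical and not the project's cache; this version is not safe for concurrent use, which a shared search cache would need (for example by guarding it with a mutex).

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element stores.
type entry struct {
	key   string
	value any
}

// lruCache is a hypothetical, minimal LRU cache keyed by string.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached value and marks it as most recently used.
func (c *lruCache) Get(key string) (any, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put stores a value and evicts the least recently used entry when the
// cache grows beyond its capacity.
func (c *lruCache) Put(key string, value any) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	cache := newLRUCache(2)
	cache.Put("q1", "result 1")
	cache.Put("q2", "result 2")
	cache.Get("q1")             // q1 becomes most recently used
	cache.Put("q3", "result 3") // evicts q2
	_, ok := cache.Get("q2")
	fmt.Println("q2 still cached:", ok)
}
```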

Cache Key Generation:

The cache key includes:

  • Query string
  • All filter parameters (time range, IP, method, status, etc.)
  • Sort order and pagination
  • Facet configuration

This ensures that different queries don't collide while allowing exact duplicate queries to hit the cache.
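
One common way to derive such a key is to serialize the request and hash the bytes. The sketch below assumes that approach; the struct searchRequest, its field names, and the use of JSON with SHA-256 are illustrative assumptions, not the project's actual key function.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// searchRequest is a hypothetical, reduced request shape covering the
// categories listed above: query, filters, sort, pagination, and facets.
type searchRequest struct {
	Query       string   `json:"query"`
	StartTime   int64    `json:"start_time"`
	EndTime     int64    `json:"end_time"`
	IPs         []string `json:"ips"`
	Method      string   `json:"method"`
	SortBy      string   `json:"sort_by"`
	Offset      int      `json:"offset"`
	Limit       int      `json:"limit"`
	FacetFields []string `json:"facet_fields"`
	FacetSize   int      `json:"facet_size"`
}

// cacheKey serializes every field of the request and hashes the result, so
// two requests share a key only if all of their parameters are equal.
// Slice order matters: ["a","b"] and ["b","a"] produce different keys.
func cacheKey(req searchRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	page1 := searchRequest{Query: "timeout", Method: "GET", Limit: 20}
	page2 := page1
	page2.Offset = 20 // same query, next page

	k1, _ := cacheKey(page1)
	k2, _ := cacheKey(page2)
	fmt.Println("page 1 and page 2 share a key:", k1 == k2)
}
```

Because pagination is part of the key, each page of the same query is cached separately.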

Sources: internal/nginx_log/searcher/types.go:12-36, internal/nginx_log/searcher/types.go:40-46

Analytics Integration

The analytics service builds on the search system to provide comprehensive log statistics and dashboards.

Analytics Query Pipeline

Analytics Service Methods:

| Method | Purpose | Key Features |
|---|---|---|
| GetDashboardAnalytics() | Full dashboard data | Hourly/daily stats, top URLs, device distribution |
| GetLogEntriesStats() | Entry-level statistics | Status distribution, method distribution, traffic stats |
| GetGeoDistribution() | Geographic analytics | Country/city distribution using region_code, city fields |
| GetTopPaths() | Most accessed URLs | Uses path_exact faceting |
| GetTopIPs() | Most active IPs | Uses ip faceting |
| GetTopUserAgents() | User agent distribution | Uses user_agent faceting |

Sources: internal/nginx_log/analytics/service.go:11-154, internal/nginx_log/analytics/dashboard.go:1-690

API Endpoints

The search system is exposed through several API endpoints that handle search requests and analytics queries.

Search API Endpoints

Endpoint Details:

1. Advanced Search (POST /api/nginx_log/search)

Request:

Response:

2. Preflight Check (POST /api/nginx_log/preflight)

Checks if a log file is indexed and available for searching:

Request:

Response:

3. Analytics (GET /api/nginx_log/analytics)

Returns comprehensive analytics for dashboard visualization:

Query Parameters:

  • path: Log file path
  • start_time: Unix timestamp
  • end_time: Unix timestamp
  • limit: Result limit

Response: Full DashboardAnalytics object with hourly/daily stats, top URLs, device distribution, etc.
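
A client could assemble the analytics request URL from the four documented query parameters roughly as follows. The helper name analyticsURL, the base address, and the sample values are assumptions for illustration; authentication and the actual HTTP call are omitted.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// analyticsURL is a hypothetical helper that builds the GET URL for the
// analytics endpoint from the documented parameters. url.Values.Encode
// escapes the log path and emits the parameters sorted by name.
func analyticsURL(baseURL, logPath string, startTime, endTime int64, limit int) string {
	q := url.Values{}
	q.Set("path", logPath)
	q.Set("start_time", strconv.FormatInt(startTime, 10)) // Unix timestamp
	q.Set("end_time", strconv.FormatInt(endTime, 10))     // Unix timestamp
	q.Set("limit", strconv.Itoa(limit))
	return baseURL + "/api/nginx_log/analytics?" + q.Encode()
}

func main() {
	// The base address is an assumption; use the address of your deployment.
	u := analyticsURL("http://localhost:9000", "/var/log/nginx/access.log",
		1700000000, 1700003600, 10)
	fmt.Println(u)
}
```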

Sources: api/nginx_log/analytics.go:1-694, api/nginx_log/index_management.go:1-513

Performance Optimizations

The search system employs several optimization strategies for efficient query execution:

Optimization Techniques

| Technique | Implementation | Benefit |
|---|---|---|
| Shard Distribution | Hash-based key distribution | Parallel query execution across N shards |
| DocValues | Enabled on ip, path_exact, main_log_path | Fast faceting without document loading |
| LRU Caching | 1000-entry cache with TTL | Eliminates redundant queries |
| Cardinality Approximation | HyperLogLog for UV counting | Sub-second unique counts for millions of IPs |
| Index Alias | Bleve's IndexAlias | Zero-downtime shard updates |
| Batch Queries | Combined hourly/daily stats | Reduces round-trips |
| Selective Fields | Fields: [] parameter | Returns only needed fields |
| Main Log Path | use_main_log_path flag | Efficient log group queries |
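
Hash-based key distribution, listed first in the table, can be sketched as follows. The source does not say which hash function or key the indexer uses, so FNV-1a, the function name shardFor, and the sample document keys are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor is a hypothetical helper that maps a document key to one of
// numShards shards. The same key always lands on the same shard, and keys
// spread roughly evenly, so a query can fan out to all shards in parallel.
// numShards must be greater than zero.
func shardFor(key string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on an FNV hash never returns an error
	return int(h.Sum32() % uint32(numShards))
}

func main() {
	// Illustrative document keys; the real key format is not specified here.
	for _, key := range []string{"access.log:1", "access.log:2", "error.log:1"} {
		fmt.Printf("%s -> shard %d\n", key, shardFor(key, 4))
	}
}
```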

Query Performance Characteristics:

  • Simple term queries: < 10ms per shard
  • Range queries with facets: 50-200ms depending on result size
  • High-cardinality UV counting: < 1s for 100K+ unique IPs
  • Dashboard analytics: 200-500ms with caching

Sources: internal/nginx_log/indexer/types.go:340-456, internal/nginx_log/searcher/types.go:12-46, internal/nginx_log/modern_services.go:484-571


Key Takeaways:

  1. The search system wraps multiple Bleve index shards through an IndexAlias for distributed queries
  2. Query construction translates high-level filters into Bleve query objects with proper field mapping
  3. Hot-swapping of shards enables zero-downtime index rebuilds
  4. Faceting and cardinality counting provide fast analytics for high-cardinality fields
  5. LRU caching and DocValues help keep typical query latency under one second
  6. The main_log_path field enables efficient queries across rotated log groups
  7. API endpoints expose search functionality for both structured queries and analytics dashboards