Search and Query System

Purpose and Scope

The Search and Query System provides full-text search, structured queries, and analytics capabilities for indexed Nginx logs using the Bleve search engine. This system sits between the indexed log data (see Parallel Indexing Pipeline) and the analytics services (see Analytics and Dashboard), enabling efficient log retrieval, filtering, and aggregation across multiple index shards.

This page covers:

Bleve search engine integration and index schema design
Distributed search architecture across sharded indexes
Query construction from high-level search requests
Faceting and aggregation for analytics
Result caching and performance optimization
Search API endpoints and request handling

For information about log indexing and parsing, see Parallel Indexing Pipeline. For analytics calculations and dashboard generation, see Analytics and Dashboard.

System Architecture

The search system operates as a layer between the indexed Bleve shards and the analytics/API layers, providing query translation, distributed search coordination, and result aggregation.

Sources: internal/nginx_log/modern_services.go:1-677, internal/nginx_log/searcher/types.go:1-513, internal/nginx_log/analytics/service.go:1-154

Bleve Index Schema and Mapping

The search system uses Bleve's mapping system to define how log documents are indexed and queried. The mapping specifies field types, analyzers, and indexing options for optimal search performance.

Index Mapping Definition

The CreateLogIndexMapping() function defines the schema for log documents:

Key Fields and Their Purposes:

Field	Type	Analyzer	DocValues	Purpose
`timestamp`	Numeric	N/A	No	Time range queries, sorting
`ip`	Text	Keyword	Yes	Exact IP matching, faceting
`method`	Text	Keyword	No	HTTP method filtering
`path`	Text	Standard	No	Full-text path search
`path_exact`	Text	Keyword	Yes	Exact path matching, faceting
`status`	Numeric	N/A	No	Status code range queries
`bytes_sent`	Numeric	N/A	No	Traffic volume queries
`browser`, `os`, `device_type`	Text	Keyword	No	Device/browser filtering
`main_log_path`	Text	Keyword	Yes	Log group filtering, efficient aggregation
`file_path`	Text	Keyword	No	Physical file filtering
`raw`	Text	N/A	No (stored only)	Raw log line retrieval

Sources: internal/nginx_log/indexer/types.go:340-456, internal/nginx_log/indexer/parser.go:172-231

Searcher Implementation

The Searcher struct provides the main search interface, wrapping multiple Bleve index shards through a bleve.IndexAlias for distributed search.

Searcher Initialization

Key Functions:

GetSearcher() (internal/nginx_log/modern_services.go:165-198): Returns the global searcher instance, with health checks and auto-healing
NewSearcher(): Creates searcher with IndexAlias wrapping multiple shards
SwapShards(): Atomically replaces shards using Bleve's IndexAlias.Swap() for zero-downtime updates
IsHealthy(): Checks if searcher has at least one active shard
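The swap and health-check behaviour described above can be illustrated without Bleve. The following is a minimal sketch under stated assumptions: `shardSet` and `Shard` are hypothetical stand-ins for the searcher's alias and its opened indexes, and the locking scheme is illustrative rather than a description of how `IndexAlias.Swap()` works internally.

```go
package main

import (
	"fmt"
	"sync"
)

// Shard is a hypothetical stand-in for one opened index shard.
type Shard struct{ Name string }

// shardSet is an illustrative stand-in for the alias that wraps all shards.
type shardSet struct {
	mu     sync.RWMutex
	shards []*Shard
}

// Swap replaces the whole shard list under a write lock, so concurrent
// readers see either the old set or the new set, never a partial mix.
// The old shards are returned so the caller can close them afterwards.
func (s *shardSet) Swap(next []*Shard) (old []*Shard) {
	s.mu.Lock()
	defer s.mu.Unlock()
	old, s.shards = s.shards, append([]*Shard(nil), next...)
	return old
}

// IsHealthy reports whether at least one shard is active.
func (s *shardSet) IsHealthy() bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.shards) > 0
}

// Snapshot returns a copy of the current shard list for a query to use.
func (s *shardSet) Snapshot() []*Shard {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return append([]*Shard(nil), s.shards...)
}

func main() {
	set := &shardSet{}
	fmt.Println("healthy before swap:", set.IsHealthy())
	set.Swap([]*Shard{{Name: "shard-0"}, {Name: "shard-1"}})
	fmt.Println("healthy after swap:", set.IsHealthy(), "shards:", len(set.Snapshot()))
}
```

Returning the replaced shards from `Swap` lets the caller close them only after in-flight queries holding a snapshot have finished.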

Sources: internal/nginx_log/modern_services.go:90-129, internal/nginx_log/modern_services.go:165-198, internal/nginx_log/modern_services.go:484-571

Query Construction

The search system translates high-level SearchRequest objects into Bleve query objects, supporting multiple query types and filters.

Query Builder Architecture

Query Types Used:

Bleve Query Type	Use Case	Fields
`QueryStringQuery`	Free-text search	`query` parameter
`TermQuery`	Exact field matching	`ip`, `method`, `status`, `main_log_path`
`NumericRangeQuery`	Numeric range filtering	`timestamp`, `status`, `bytes_sent`, `request_time`
`ConjunctionQuery`	AND combination of filters	All of the above
`DisjunctionQuery`	OR combination	Multiple values in arrays
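To make the mapping in the table concrete, here is a minimal sketch of how filters might be combined into a query tree. It deliberately avoids the Bleve API: `Node` is a hypothetical stand-in for a Bleve query object, and the `Request` fields are assumptions chosen for illustration, not the actual `SearchRequest` definition.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical stand-in for a Bleve query object.
type Node struct {
	Kind     string // "query_string", "term", "range", "and", "or"
	Field    string
	Value    string
	Children []Node
}

// String renders the tree in a compact, readable form for debugging.
func (n Node) String() string {
	switch n.Kind {
	case "and", "or":
		parts := make([]string, len(n.Children))
		for i, c := range n.Children {
			parts[i] = c.String()
		}
		return strings.ToUpper(n.Kind) + "(" + strings.Join(parts, ", ") + ")"
	case "query_string":
		return fmt.Sprintf("query_string(%q)", n.Value)
	default:
		return fmt.Sprintf("%s(%s=%s)", n.Kind, n.Field, n.Value)
	}
}

// Request holds a few illustrative filters; the field names are assumptions.
type Request struct {
	Query     string
	IPs       []string
	Method    string
	StatusMin int // 0 means no status filter together with StatusMax == 0
	StatusMax int
}

// anyOf matches a field against one value (term) or several (OR of terms).
func anyOf(field string, values []string) Node {
	if len(values) == 1 {
		return Node{Kind: "term", Field: field, Value: values[0]}
	}
	or := Node{Kind: "or"}
	for _, v := range values {
		or.Children = append(or.Children, Node{Kind: "term", Field: field, Value: v})
	}
	return or
}

// Build combines every filter that is set into a single AND node.
func Build(r Request) Node {
	and := Node{Kind: "and"}
	if r.Query != "" {
		and.Children = append(and.Children, Node{Kind: "query_string", Value: r.Query})
	}
	if len(r.IPs) > 0 {
		and.Children = append(and.Children, anyOf("ip", r.IPs))
	}
	if r.Method != "" {
		and.Children = append(and.Children, Node{Kind: "term", Field: "method", Value: r.Method})
	}
	if r.StatusMin != 0 || r.StatusMax != 0 {
		v := fmt.Sprintf("%d..%d", r.StatusMin, r.StatusMax)
		and.Children = append(and.Children, Node{Kind: "range", Field: "status", Value: v})
	}
	return and
}

func main() {
	q := Build(Request{Query: "bot", IPs: []string{"10.0.0.1", "10.0.0.2"}, Method: "GET", StatusMin: 400, StatusMax: 499})
	fmt.Println(q)
}
```

In this sketch each multi-value filter becomes its own OR group nested inside the outer AND, which mirrors the Conjunction/Disjunction split in the table.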

Main Log Path vs File Path:

The use_main_log_path flag determines which field to use for log filtering:

main_log_path: Groups rotated logs together (e.g., access.log, access.log.1, access.log.2.gz all map to access.log)
file_path: Targets specific physical files
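The grouping of rotated files can be sketched as a small normalization function. This is an illustrative sketch only: it assumes rotation appends a numeric suffix and optionally `.gz`, as in the examples above, and the helper names `mainLogPath` and `fieldFor` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// isDigits reports whether s is a non-empty run of ASCII digits.
func isDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// mainLogPath maps a physical file path to its log group by stripping a
// trailing ".gz" and then a trailing numeric rotation suffix such as ".1".
func mainLogPath(filePath string) string {
	p := strings.TrimSuffix(filePath, ".gz")
	if i := strings.LastIndexByte(p, '.'); i >= 0 && isDigits(p[i+1:]) {
		p = p[:i]
	}
	return p
}

// fieldFor picks the index field to filter on, based on the request flag.
func fieldFor(useMainLogPath bool) string {
	if useMainLogPath {
		return "main_log_path"
	}
	return "file_path"
}

func main() {
	for _, f := range []string{"access.log", "access.log.1", "access.log.2.gz"} {
		fmt.Printf("%s -> %s\n", f, mainLogPath(f))
	}
	fmt.Println(fieldFor(true), fieldFor(false))
}
```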

Sources: internal/nginx_log/searcher/types.go:48-95, internal/nginx_log/indexer/types.go:179-203

Distributed Search Execution

The searcher executes queries across multiple Bleve shards and merges results, leveraging Bleve's IndexAlias for parallel execution.

Search Flow

Merge Operations:

Hit Merging: Results from all shards are merged and sorted by score or specified field
Total Count: Sum of TotalHits from all shards
Facet Aggregation: Facet terms are merged and counts aggregated across shards
Max Score: Highest score across all shards
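The four merge rules can be written out directly. The sketch below is an illustration of the semantics only, using hypothetical `Hit` and `ShardResult` types; it sorts by score and does not reflect how the project or Bleve's `IndexAlias` performs the merge internally.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit and ShardResult are hypothetical stand-ins for per-shard search output.
type Hit struct {
	ID    string
	Score float64
}

type ShardResult struct {
	Hits     []Hit
	Total    uint64
	MaxScore float64
	Facets   map[string]map[string]int // facet name -> term -> count
}

// Merge combines per-shard results and keeps the top `size` hits by score.
func Merge(results []ShardResult, size int) ShardResult {
	out := ShardResult{Facets: map[string]map[string]int{}}
	for _, r := range results {
		out.Hits = append(out.Hits, r.Hits...) // hit merging
		out.Total += r.Total                   // total count is a plain sum
		if r.MaxScore > out.MaxScore {         // max score is the maximum
			out.MaxScore = r.MaxScore
		}
		for name, terms := range r.Facets { // facet counts add up per term
			if out.Facets[name] == nil {
				out.Facets[name] = map[string]int{}
			}
			for term, n := range terms {
				out.Facets[name][term] += n
			}
		}
	}
	sort.SliceStable(out.Hits, func(i, j int) bool {
		return out.Hits[i].Score > out.Hits[j].Score
	})
	if len(out.Hits) > size {
		out.Hits = out.Hits[:size]
	}
	return out
}

func main() {
	a := ShardResult{Hits: []Hit{{"a1", 2.0}, {"a2", 0.5}}, Total: 10, MaxScore: 2.0,
		Facets: map[string]map[string]int{"status": {"200": 7, "404": 3}}}
	b := ShardResult{Hits: []Hit{{"b1", 1.5}}, Total: 5, MaxScore: 1.5,
		Facets: map[string]map[string]int{"status": {"200": 4, "500": 1}}}
	m := Merge([]ShardResult{a, b}, 2)
	fmt.Println(m.Hits, m.Total, m.MaxScore, m.Facets["status"])
}
```

Note that summing facet counts is exact only for terms every shard reported; a term that fell outside one shard's top-N list would be undercounted, which is a general property of merging truncated facets.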

Sources: internal/nginx_log/modern_services.go:484-571

Faceting and Aggregation

Faceting enables analytics by counting unique values for specified fields. The system supports high-cardinality faceting through multiple strategies.

Faceting Architecture

Facet Configuration:

Parameter	Default	Purpose
`FacetFields`	`[]`	List of fields to facet on
`FacetSize`	10	Number of top terms to return per facet
`IncludeFacets`	`false`	Enable faceting in search

High-Cardinality Optimization:

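For fields with many unique values (such as IP addresses), the system uses a Counter based on the HyperLogLog algorithm, which estimates the number of distinct values in a fixed amount of memory instead of tracking every value.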
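For readers unfamiliar with HyperLogLog, the following is a compact, generic sketch of the algorithm. It is not the project's Counter: the `HLL` type, the choice of 2^14 registers, the SHA-256 based hash, and the helper names are all assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math"
	"math/bits"
)

const (
	p = 14     // number of index bits (assumption)
	m = 1 << p // number of registers: 16384
)

// HLL is a generic HyperLogLog sketch using one byte per register.
type HLL struct{ reg [m]uint8 }

// Add hashes the value, uses the top p bits as the register index and
// records the position of the first 1 bit in the remaining bits.
func (h *HLL) Add(s string) {
	sum := sha256.Sum256([]byte(s))
	x := binary.BigEndian.Uint64(sum[:8])
	idx := x >> (64 - p)
	rank := uint8(bits.LeadingZeros64(x<<p)) + 1
	if rank > 64-p+1 {
		rank = 64 - p + 1
	}
	if rank > h.reg[idx] {
		h.reg[idx] = rank
	}
}

// Estimate returns the approximate number of distinct values added,
// falling back to linear counting while many registers are still empty.
func (h *HLL) Estimate() float64 {
	sum, zeros := 0.0, 0
	for _, r := range h.reg {
		sum += math.Ldexp(1, -int(r)) // 2^-r
		if r == 0 {
			zeros++
		}
	}
	alpha := 0.7213 / (1 + 1.079/float64(m))
	e := alpha * m * m / sum
	if e <= 2.5*m && zeros > 0 {
		return m * math.Log(float64(m)/float64(zeros))
	}
	return e
}

// countDistinct estimates the number of unique strings in values.
func countDistinct(values []string) float64 {
	h := &HLL{}
	for _, v := range values {
		h.Add(v)
	}
	return h.Estimate()
}

// sampleIPs generates n distinct IPv4-looking strings for the demo.
func sampleIPs(n int) []string {
	out := make([]string, n)
	for i := range out {
		out[i] = fmt.Sprintf("10.%d.%d.%d", i>>16&255, i>>8&255, i&255)
	}
	return out
}

func main() {
	fmt.Printf("estimated unique IPs: %.0f (true value 10000)\n", countDistinct(sampleIPs(10000)))
}
```

With 16384 registers the sketch occupies 16 KiB regardless of how many values are added, and the expected relative error is roughly 1%.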

Counter Usage:

Sources: internal/nginx_log/analytics/service.go:32-85, internal/nginx_log/searcher/types.go:87-89, internal/nginx_log/analytics/dashboard.go:13-100

Search Result Caching

The search system implements an LRU cache to avoid redundant queries for identical search requests.

Cache Architecture

Cache Configuration:

Setting	Default	Purpose
`CacheSize`	1000	Maximum cached search results
`EnableCache`	`true`	Global cache toggle
`CacheKey`	Hash of request	Unique identifier for each query
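As a reference for the eviction policy, here is a minimal LRU cache sketch built on the standard library. It is a generic illustration, not the project's cache: the `LRU` type and its methods are hypothetical, and the capacity of 1000 from the table would simply be passed to `NewLRU`.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element stores.
type entry struct {
	key   string
	value any
}

// LRU is a fixed-capacity cache that evicts the least recently used key.
type LRU struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Get returns the cached value and marks the key as recently used.
func (c *LRU) Get(key string) (any, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put stores a value, evicting the oldest entry when over capacity.
func (c *LRU) Put(key string, value any) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := NewLRU(2)
	c.Put("q1", "result-1")
	c.Put("q2", "result-2")
	c.Get("q1")             // q1 becomes most recently used
	c.Put("q3", "result-3") // evicts q2
	_, hasQ2 := c.Get("q2")
	fmt.Println("q2 still cached:", hasQ2)
}
```

This sketch is not safe for concurrent use; a shared cache would need a mutex around `Get` and `Put`.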

Cache Key Generation:

The cache key includes:

Query string
All filter parameters (time range, IP, method, status, etc.)
Sort order and pagination
Facet configuration

This ensures that different queries don't collide while allowing exact duplicate queries to hit the cache.
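One way to derive such a key is to hash a canonical serialization of the request. The sketch below assumes a simplified `Request` struct whose fields are hypothetical, and it additionally assumes that the order of multi-value filters is not significant, so it sorts them before hashing.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// Request is a simplified, hypothetical search request.
type Request struct {
	Query       string   `json:"query"`
	StartTime   int64    `json:"start_time"`
	EndTime     int64    `json:"end_time"`
	IPs         []string `json:"ips"`
	Method      string   `json:"method"`
	SortBy      string   `json:"sort_by"`
	Limit       int      `json:"limit"`
	Offset      int      `json:"offset"`
	FacetFields []string `json:"facet_fields"`
	FacetSize   int      `json:"facet_size"`
}

// sortedCopy returns a sorted copy so the caller's slice is not modified.
func sortedCopy(in []string) []string {
	out := append([]string(nil), in...)
	sort.Strings(out)
	return out
}

// CacheKey hashes the JSON form of the request. Struct fields are encoded
// in declaration order, so equal requests always produce equal bytes.
func CacheKey(r Request) (string, error) {
	r.IPs = sortedCopy(r.IPs)
	r.FacetFields = sortedCopy(r.FacetFields)
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a, _ := CacheKey(Request{Query: "bot", IPs: []string{"10.0.0.2", "10.0.0.1"}, Limit: 50})
	b, _ := CacheKey(Request{Query: "bot", IPs: []string{"10.0.0.1", "10.0.0.2"}, Limit: 50})
	c, _ := CacheKey(Request{Query: "bot", IPs: []string{"10.0.0.1", "10.0.0.2"}, Limit: 50, Offset: 50})
	fmt.Println("same filters, different order:", a == b)
	fmt.Println("different page:", a == c)
}
```

Because pagination is part of the hashed struct, each page of the same query is cached under its own key.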

Sources: internal/nginx_log/searcher/types.go:12-36, internal/nginx_log/searcher/types.go:40-46

Analytics Integration

The analytics service builds on the search system to provide comprehensive log statistics and dashboards.

Analytics Query Pipeline

Analytics Service Methods:

Method	Purpose	Key Features
`GetDashboardAnalytics()`	Full dashboard data	Hourly/daily stats, top URLs, device distribution
`GetLogEntriesStats()`	Entry-level statistics	Status distribution, method distribution, traffic stats
`GetGeoDistribution()`	Geographic analytics	Country/city distribution using `region_code`, `city` fields
`GetTopPaths()`	Most accessed URLs	Uses `path_exact` faceting
`GetTopIPs()`	Most active IPs	Uses `ip` faceting
`GetTopUserAgents()`	User agent distribution	Uses `user_agent` faceting
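The hourly statistics mentioned for the dashboard come down to grouping entries by the hour they fall in. The sketch below shows that bucketing step in isolation; it is an assumption about the general approach, and `hourlyCounts` is a hypothetical helper rather than part of the analytics service.

```go
package main

import "fmt"

const secondsPerHour = 3600

// hourlyCounts groups Unix timestamps (seconds) by the start of their hour
// in UTC and counts how many fall in each bucket.
func hourlyCounts(timestamps []int64) map[int64]int {
	out := map[int64]int{}
	for _, ts := range timestamps {
		bucket := ts - ts%secondsPerHour // truncate to the hour boundary
		out[bucket]++
	}
	return out
}

func main() {
	counts := hourlyCounts([]int64{1700000000, 1700001000, 1700003000})
	for bucket, n := range counts {
		fmt.Println(bucket, n)
	}
}
```

Daily statistics follow the same pattern with 86400 seconds per bucket, although local-time day boundaries would need time zone handling that this sketch omits.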

Sources: internal/nginx_log/analytics/service.go:11-154, internal/nginx_log/analytics/dashboard.go:1-690

API Endpoints

The search system is exposed through several API endpoints that handle search requests and analytics queries.

Search API Endpoints

Endpoint Details:

1. Advanced Search (`POST /api/nginx_log/search`)

Request:

Response:

2. Preflight Check (`POST /api/nginx_log/preflight`)

Checks if a log file is indexed and available for searching:

Request:

Response:

3. Analytics (`GET /api/nginx_log/analytics`)

Returns comprehensive analytics for dashboard visualization:

Query Parameters:

path: Log file path
start_time: Unix timestamp
end_time: Unix timestamp
limit: Result limit

Response: Full DashboardAnalytics object with hourly/daily stats, top URLs, device distribution, etc.
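A client call to the analytics endpoint could be assembled as follows. This is a sketch for illustration: the host and port are placeholders, the parameter values are invented, and authentication (if the deployment requires it) is not shown.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// analyticsURL builds the GET URL for the analytics endpoint from the
// four query parameters listed above. base is a placeholder host.
func analyticsURL(base, path string, start, end int64, limit int) string {
	q := url.Values{}
	q.Set("path", path)
	q.Set("start_time", strconv.FormatInt(start, 10))
	q.Set("end_time", strconv.FormatInt(end, 10))
	q.Set("limit", strconv.Itoa(limit))
	// Encode percent-escapes values and sorts parameters by name.
	return base + "/api/nginx_log/analytics?" + q.Encode()
}

func main() {
	u := analyticsURL("http://localhost:9000", "/var/log/nginx/access.log", 1700000000, 1700003600, 10)
	fmt.Println(u)
}
```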

Sources: api/nginx_log/analytics.go:1-694, api/nginx_log/index_management.go:1-513

Performance Optimizations

The search system employs several optimization strategies for efficient query execution:

Optimization Techniques

Technique	Implementation	Benefit
Shard Distribution	Hash-based key distribution	Parallel query execution across N shards
DocValues	Enabled on `ip`, `path_exact`, `main_log_path`	Fast faceting without document loading
LRU Caching	1000-entry cache with TTL	Eliminates redundant queries
Cardinality Approximation	HyperLogLog for UV counting	Sub-second unique counts for millions of IPs
Index Alias	Bleve's `IndexAlias`	Zero-downtime shard updates
Batch Queries	Combined hourly/daily stats	Reduces round-trips
Selective Fields	`Fields: []` parameter	Returns only needed fields
Main Log Path	`use_main_log_path` flag	Efficient log group queries
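Hash-based shard distribution, the first technique in the table, can be sketched in a few lines. The choice of FNV-1a, the use of a document ID as the key, and the function name `shardFor` are assumptions for illustration; the source does not specify which hash or key the indexer uses.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a key to one of numShards shards. The same key always
// lands on the same shard, and keys spread roughly evenly across shards.
func shardFor(key string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(numShards))
}

func main() {
	for _, id := range []string{"doc-1", "doc-2", "doc-3", "doc-4"} {
		fmt.Printf("%s -> shard %d\n", id, shardFor(id, 4))
	}
}
```

A plain modulo scheme like this reassigns most keys when the shard count changes, so it suits a fixed number of shards.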

Query Performance Characteristics:

Simple term queries: < 10ms per shard
Range queries with facets: 50-200ms depending on result size
High-cardinality UV counting: < 1s for 100K+ unique IPs
Dashboard analytics: 200-500ms with caching

Sources: internal/nginx_log/indexer/types.go:340-456, internal/nginx_log/searcher/types.go:12-46, internal/nginx_log/modern_services.go:484-571

Key Takeaways:

The search system wraps multiple Bleve index shards through an IndexAlias for distributed queries
Query construction translates high-level filters into Bleve query objects with proper field mapping
Hot-swapping of shards enables zero-downtime index rebuilds
Faceting and cardinality counting provide fast analytics for high-cardinality fields
LRU caching and DocValues optimization ensure sub-second query performance
The main_log_path field enables efficient queries across rotated log groups
API endpoints expose search functionality for both structured queries and analytics dashboards