The Search and Query System provides full-text search, structured queries, and analytics capabilities for indexed Nginx logs using the Bleve search engine. This system sits between the indexed log data (see Parallel Indexing Pipeline) and the analytics services (see Analytics and Dashboard), enabling efficient log retrieval, filtering, and aggregation across multiple index shards.
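This page covers index mapping, the searcher architecture, query translation, distributed search execution, faceting, result caching, analytics integration, the API endpoints, and performance optimizations.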
For information about log indexing and parsing, see Parallel Indexing Pipeline. For analytics calculations and dashboard generation, see Analytics and Dashboard.
The search system operates as a layer between the indexed Bleve shards and the analytics/API layers, providing query translation, distributed search coordination, and result aggregation.
Sources: internal/nginx_log/modern_services.go1-677 internal/nginx_log/searcher/types.go1-513 internal/nginx_log/analytics/service.go1-154
The search system uses Bleve's mapping system to define how log documents are indexed and queried. The mapping specifies field types, analyzers, and indexing options for optimal search performance.
The CreateLogIndexMapping() function defines the schema for log documents:
Key Fields and Their Purposes:
| Field | Type | Analyzer | DocValues | Purpose |
|---|---|---|---|---|
| timestamp | Numeric | N/A | No | Time range queries, sorting |
| ip | Text | Keyword | Yes | Exact IP matching, faceting |
| method | Text | Keyword | No | HTTP method filtering |
| path | Text | Standard | No | Full-text path search |
| path_exact | Text | Keyword | Yes | Exact path matching, faceting |
| status | Numeric | N/A | No | Status code range queries |
| bytes_sent | Numeric | N/A | No | Traffic volume queries |
| browser, os, device_type | Text | Keyword | No | Device/browser filtering |
| main_log_path | Text | Keyword | Yes | Log group filtering, efficient aggregation |
| file_path | Text | Keyword | No | Physical file filtering |
| raw | Text | N/A | No (stored only) | Raw log line retrieval |
Sources: internal/nginx_log/indexer/types.go340-456 internal/nginx_log/indexer/parser.go172-231
The Searcher struct provides the main search interface, wrapping multiple Bleve index shards through a bleve.IndexAlias for distributed search.
Key Functions:
- GetSearcher() internal/nginx_log/modern_services.go165-198: Returns the global searcher instance with health checks and auto-healing
- NewSearcher(): Creates a searcher with an IndexAlias wrapping multiple shards
- SwapShards(): Atomically replaces shards using Bleve's IndexAlias.Swap() for zero-downtime updates
- IsHealthy(): Checks whether the searcher has at least one active shard

Sources: internal/nginx_log/modern_services.go90-129 internal/nginx_log/modern_services.go165-198 internal/nginx_log/modern_services.go484-571
The search system translates high-level SearchRequest objects into Bleve query objects, supporting multiple query types and filters.
Query Types Used:
| Bleve Query Type | Use Case | Fields |
|---|---|---|
| QueryStringQuery | Free-text search | query parameter |
| TermQuery | Exact field matching | ip, method, status, main_log_path |
| NumericRangeQuery | Numeric range filtering | timestamp, status, bytes_sent, request_time |
| ConjunctionQuery | AND combination of filters | All of the above |
| DisjunctionQuery | OR combination | Multiple values in arrays |
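The composition rules in the table can be sketched without the Bleve dependency. The example below is a minimal model, not the project's translator: `Node`, `Term`, `Range`, `And`, `Or`, the trimmed-down `SearchRequest`, and `Translate` are all hypothetical stand-ins that mirror the roles of TermQuery, NumericRangeQuery, ConjunctionQuery, and DisjunctionQuery. It shows multi-valued filters becoming an OR group and all filters being ANDed together.

```go
package main

import (
	"fmt"
	"strings"
)

// Node stands in for a Bleve query object; these types are illustrative only.
type Node interface{ String() string }

type Term struct{ Field, Value string } // models TermQuery
type Range struct {                     // models NumericRangeQuery
	Field    string
	Min, Max float64
}
type And struct{ Children []Node } // models ConjunctionQuery
type Or struct{ Children []Node }  // models DisjunctionQuery

func (t Term) String() string  { return t.Field + ":" + t.Value }
func (r Range) String() string { return fmt.Sprintf("%s:[%.0f TO %.0f]", r.Field, r.Min, r.Max) }
func (a And) String() string   { return join(a.Children, " AND ") }
func (o Or) String() string    { return join(o.Children, " OR ") }

func join(nodes []Node, sep string) string {
	parts := make([]string, len(nodes))
	for i, n := range nodes {
		parts[i] = n.String()
	}
	return "(" + strings.Join(parts, sep) + ")"
}

// SearchRequest is a hypothetical, trimmed-down request shape.
type SearchRequest struct {
	IPs, Methods       []string
	StartTime, EndTime int64 // Unix seconds; a range is added only if EndTime > StartTime
}

// anyOf returns a single Term for one value, or an Or of Terms for several.
func anyOf(field string, values []string) Node {
	if len(values) == 1 {
		return Term{Field: field, Value: values[0]}
	}
	terms := make([]Node, len(values))
	for i, v := range values {
		terms[i] = Term{Field: field, Value: v}
	}
	return Or{Children: terms}
}

// Translate ANDs together every filter present in the request.
func Translate(req SearchRequest) Node {
	var filters []Node
	if len(req.IPs) > 0 {
		filters = append(filters, anyOf("ip", req.IPs))
	}
	if len(req.Methods) > 0 {
		filters = append(filters, anyOf("method", req.Methods))
	}
	if req.EndTime > req.StartTime {
		filters = append(filters, Range{Field: "timestamp", Min: float64(req.StartTime), Max: float64(req.EndTime)})
	}
	return And{Children: filters}
}

func main() {
	req := SearchRequest{IPs: []string{"10.0.0.1", "10.0.0.2"}, Methods: []string{"GET"}, StartTime: 100, EndTime: 200}
	fmt.Println(Translate(req))
}
```

In a real translator each stand-in node would be replaced by the corresponding Bleve query constructor; the tree shape stays the same.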
Main Log Path vs File Path:
The use_main_log_path flag determines which field to use for log filtering:
- main_log_path: Groups rotated logs together (e.g., access.log, access.log.1, access.log.2.gz all map to access.log)
- file_path: Targets specific physical files

Sources: internal/nginx_log/searcher/types.go48-95 internal/nginx_log/indexer/types.go179-203
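To illustrate the grouping described above, here is a minimal sketch of how a rotated file name could be reduced to its main log path. The `MainLogPath` and `FilterField` helpers and the regular expression are assumptions made for this example; the project's actual normalization may handle more rotation schemes (for instance date-based suffixes, which this sketch leaves untouched).

```go
package main

import (
	"fmt"
	"regexp"
)

// rotationSuffix matches a numeric rotation suffix with an optional
// compression extension, e.g. ".1" or ".2.gz". Assumed pattern, for illustration.
var rotationSuffix = regexp.MustCompile(`\.\d+(\.(gz|bz2))?$`)

// MainLogPath strips the rotation suffix so that all rotated files of one
// log share a single group key.
func MainLogPath(filePath string) string {
	return rotationSuffix.ReplaceAllString(filePath, "")
}

// FilterField picks the indexed field to filter on, mirroring the
// use_main_log_path flag.
func FilterField(useMainLogPath bool) string {
	if useMainLogPath {
		return "main_log_path"
	}
	return "file_path"
}

func main() {
	for _, p := range []string{"access.log", "access.log.1", "access.log.2.gz"} {
		fmt.Printf("%s -> %s\n", p, MainLogPath(p))
	}
	fmt.Println(FilterField(true))
}
```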
The searcher executes queries across multiple Bleve shards and merges results, leveraging Bleve's IndexAlias for parallel execution.
Merge Operations:
- TotalHits from all shards

Sources: internal/nginx_log/modern_services.go484-571
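Bleve's IndexAlias performs the cross-shard merge internally, so the following is only a conceptual sketch of what such a merge involves. The `Hit` and `ShardResult` types and the `Merge` function are hypothetical, and ordering by descending score is an assumption; only the summing of TotalHits is stated in the text above.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit and ShardResult are simplified stand-ins for per-shard search output.
type Hit struct {
	ID    string
	Score float64
}

type ShardResult struct {
	TotalHits uint64
	Hits      []Hit
}

// Merge sums TotalHits across shards, combines the hits, orders them by
// descending score (assumed ordering), and keeps at most limit hits.
func Merge(results []ShardResult, limit int) ShardResult {
	var merged ShardResult
	for _, r := range results {
		merged.TotalHits += r.TotalHits
		merged.Hits = append(merged.Hits, r.Hits...)
	}
	sort.SliceStable(merged.Hits, func(i, j int) bool {
		return merged.Hits[i].Score > merged.Hits[j].Score
	})
	if limit >= 0 && len(merged.Hits) > limit {
		merged.Hits = merged.Hits[:limit]
	}
	return merged
}

func main() {
	shardA := ShardResult{TotalHits: 120, Hits: []Hit{{ID: "a1", Score: 0.9}, {ID: "a2", Score: 0.4}}}
	shardB := ShardResult{TotalHits: 80, Hits: []Hit{{ID: "b1", Score: 0.7}}}
	merged := Merge([]ShardResult{shardA, shardB}, 2)
	fmt.Println(merged.TotalHits, merged.Hits)
}
```

Note that TotalHits reflects every matching document, while the hit list is trimmed to the requested page size.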
Faceting enables analytics by counting unique values for specified fields. The system supports high-cardinality faceting through multiple strategies.
Facet Configuration:
| Parameter | Default | Purpose |
|---|---|---|
| FacetFields | [] | List of fields to facet on |
| FacetSize | 10 | Number of top terms to return per facet |
| IncludeFacets | false | Enable faceting in search |
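Conceptually, a term facet counts how often each value of a field occurs and returns the most frequent ones, capped at FacetSize. The sketch below models that with plain Go; `TermCount` and `TopTerms` are hypothetical names, and the real facets are computed by Bleve against the index rather than over an in-memory slice.

```go
package main

import (
	"fmt"
	"sort"
)

// TermCount is one facet bucket: a field value and how often it occurred.
type TermCount struct {
	Term  string
	Count int
}

// TopTerms counts the values and returns the size most frequent ones.
// Ties are broken alphabetically so the output is deterministic.
func TopTerms(values []string, size int) []TermCount {
	counts := make(map[string]int)
	for _, v := range values {
		counts[v]++
	}
	out := make([]TermCount, 0, len(counts))
	for term, count := range counts {
		out = append(out, TermCount{Term: term, Count: count})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Term < out[j].Term
	})
	if size > 0 && len(out) > size {
		out = out[:size]
	}
	return out
}

func main() {
	paths := []string{"/", "/api", "/", "/login", "/api", "/"}
	fmt.Println(TopTerms(paths, 2))
}
```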
High-Cardinality Optimization:
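For fields with many unique values (like IP addresses), the system uses a Counter based on the HyperLogLog algorithm, which estimates the number of distinct values in fixed memory instead of storing every value.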
Counter Usage:
Sources: internal/nginx_log/analytics/service.go32-85 internal/nginx_log/searcher/types.go87-89 internal/nginx_log/analytics/dashboard.go13-100
The search system implements an LRU cache to avoid redundant queries for identical search requests.
Cache Configuration:
| Setting | Default | Purpose |
|---|---|---|
| CacheSize | 1000 | Maximum cached search results |
| EnableCache | true | Global cache toggle |
| CacheKey | Hash of request | Unique identifier for each query |
Cache Key Generation:
The cache key includes:
This ensures that different queries don't collide while allowing exact duplicate queries to hit the cache.
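A minimal sketch of this pattern follows, assuming the key is a SHA-256 hash of the JSON-serialized request. The `SearchRequest` fields, `CacheKey`, and the `LRU` type are hypothetical and only illustrate the technique; the sketch is also not safe for concurrent use, which a real searcher cache would need to be.

```go
package main

import (
	"container/list"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// SearchRequest is a hypothetical request shape used only for key generation.
type SearchRequest struct {
	Query  string   `json:"query"`
	Fields []string `json:"fields"`
	Limit  int      `json:"limit"`
	Offset int      `json:"offset"`
}

// CacheKey hashes the serialized request, so identical requests share a key.
func CacheKey(req SearchRequest) string {
	data, _ := json.Marshal(req) // cannot fail for this plain struct
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

type entry struct {
	key   string
	value interface{}
}

// LRU evicts the least recently used entry once capacity is exceeded.
type LRU struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the cached value and marks the entry as recently used.
func (c *LRU) Get(key string) (interface{}, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put inserts or updates a value, evicting the oldest entry if needed.
func (c *LRU) Put(key string, value interface{}) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	cache := NewLRU(1000)
	req := SearchRequest{Query: "status:500", Limit: 50}
	cache.Put(CacheKey(req), "cached result")
	_, hit := cache.Get(CacheKey(req))
	fmt.Println("cache hit:", hit)
}
```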
Sources: internal/nginx_log/searcher/types.go12-36 internal/nginx_log/searcher/types.go40-46
The analytics service builds on the search system to provide comprehensive log statistics and dashboards.
Analytics Service Methods:
| Method | Purpose | Key Features |
|---|---|---|
| GetDashboardAnalytics() | Full dashboard data | Hourly/daily stats, top URLs, device distribution |
| GetLogEntriesStats() | Entry-level statistics | Status distribution, method distribution, traffic stats |
| GetGeoDistribution() | Geographic analytics | Country/city distribution using region_code, city fields |
| GetTopPaths() | Most accessed URLs | Uses path_exact faceting |
| GetTopIPs() | Most active IPs | Uses ip faceting |
| GetTopUserAgents() | User agent distribution | Uses user_agent faceting |
Sources: internal/nginx_log/analytics/service.go11-154 internal/nginx_log/analytics/dashboard.go1-690
The search system is exposed through several API endpoints that handle search requests and analytics queries.
Endpoint Details:
POST /api/nginx_log/search
Request:
Response:
POST /api/nginx_log/preflight
Checks whether a log file is indexed and available for searching.
Request:
Response:
GET /api/nginx_log/analytics
Returns comprehensive analytics for dashboard visualization.
Query Parameters:
- path: Log file path
- start_time: Unix timestamp
- end_time: Unix timestamp
- limit: Result limit

Response: Full DashboardAnalytics object with hourly/daily stats, top URLs, device distribution, etc.
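As an illustration of the analytics query parameters, the sketch below assembles a request URL with Go's standard library. The `AnalyticsURL` helper, the base address `http://localhost:9000`, and the sample values are assumptions for this example; authentication and the response shape are out of scope.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// AnalyticsURL builds the GET URL for the analytics endpoint from the
// documented query parameters. start and end are Unix timestamps.
func AnalyticsURL(base, logPath string, start, end int64, limit int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/nginx_log/analytics"
	q := url.Values{}
	q.Set("path", logPath)
	q.Set("start_time", strconv.FormatInt(start, 10))
	q.Set("end_time", strconv.FormatInt(end, 10))
	q.Set("limit", strconv.Itoa(limit))
	u.RawQuery = q.Encode() // parameters are emitted sorted by key
	return u.String(), nil
}

func main() {
	// Hypothetical host and log path.
	u, err := AnalyticsURL("http://localhost:9000", "/var/log/nginx/access.log", 1700000000, 1700003600, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```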
Sources: api/nginx_log/analytics.go1-694 api/nginx_log/index_management.go1-513
The search system employs several optimization strategies for efficient query execution:
| Technique | Implementation | Benefit |
|---|---|---|
| Shard Distribution | Hash-based key distribution | Parallel query execution across N shards |
| DocValues | Enabled on ip, path_exact, main_log_path | Fast faceting without document loading |
| LRU Caching | 1000-entry cache with TTL | Eliminates redundant queries |
| Cardinality Approximation | HyperLogLog for UV counting | Sub-second unique counts for millions of IPs |
| Index Alias | Bleve's IndexAlias | Zero-downtime shard updates |
| Batch Queries | Combined hourly/daily stats | Reduces round-trips |
| Selective Fields | Fields: [] parameter | Returns only needed fields |
| Main Log Path | use_main_log_path flag | Efficient log group queries |
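The hash-based shard distribution listed in the table can be sketched as follows. The choice of FNV-1a, the `ShardFor` name, and the idea of hashing a document key string are assumptions for illustration; the indexer may use a different hash or key.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ShardFor maps a document key to one of shardCount shards.
// The same key always lands on the same shard. shardCount must be > 0.
func ShardFor(key string, shardCount int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash.Write never returns an error
	return int(h.Sum32() % uint32(shardCount))
}

func main() {
	for _, key := range []string{"access.log:1", "access.log:2", "error.log:1"} {
		fmt.Printf("%s -> shard %d\n", key, ShardFor(key, 4))
	}
}
```

Because documents are spread across shards by hash rather than by time or file, a query has to fan out to every shard, which is the parallel execution the table refers to.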
Query Performance Characteristics:
Sources: internal/nginx_log/indexer/types.go340-456 internal/nginx_log/searcher/types.go12-46 internal/nginx_log/modern_services.go484-571
Key Takeaways:
- IndexAlias for distributed queries
- main_log_path field enables efficient queries across rotated log groups