Log Management and Analytics

This document describes the log management and analytics subsystem of Nginx UI, which provides automated log file discovery, high-performance indexing, full-text search, and analytics capabilities for Nginx access and error logs.

Scope: This page covers the overall architecture, service initialization, log file management, and data flow through the log processing pipeline. For detailed information about the indexing internals, see Log Indexing System. For search and analytics features, see Search and Analytics. For real-time communication, see WebSocket Services.

System Architecture

The log management system consists of modular services that handle log discovery, indexing, searching, and analytics. The architecture uses global singleton instances initialized through InitializeServices() in internal/nginx_log/modern_services.go.

Core Architecture with Code Entities

Key Design Patterns:

Singleton Services: Global instances (globalIndexer, globalSearcher, etc.) initialized once via InitializeServices()
Worker Pool: ParallelIndexer maintains workers []*indexWorker that process jobs from jobQueue
Grouped Sharding: GroupedShardManager creates one Bleve index per log group using main_log_path field
Zero-Downtime Updates: SwapShards() uses bleve.IndexAlias for atomic shard replacement
Event-Driven UI: WebSocket events (TypeNginxLogIndexProgress, TypeNginxLogIndexReady) provide real-time updates

Service Initialization

The log management system initializes conditionally based on settings.NginxLogSettings.IndexingEnabled. When enabled, it creates and starts all modular services with proper lifecycle management.

Service Initialization Flow

Initialization Sequence

The InitializeServices() function (internal/nginx_log/modern_services.go44-87) performs these steps:

1. Enable Check: Exits early if settings.NginxLogSettings.IndexingEnabled is false
2. Duplicate Check: Returns if servicesInitialized is already true
3. Context Creation: Creates a cancellable context via context.WithCancel(ctx)
4. Component Initialization: Calls initializeWithDefaults(serviceCtx) (internal/nginx_log/modern_services.go90-129):

   initializeWithDefaults()
   ├─> indexer.InitLogParser()                          // Global parser singleton
   ├─> searcher.NewSearcher(config, []bleve.Index{})    // Empty searcher
   ├─> analytics.NewService(globalSearcher)             // Analytics service
   ├─> indexer.NewGroupedShardManager(config)           // Shard manager
   ├─> indexer.NewParallelIndexer(config, manager)      // Indexer with workers
   ├─> globalIndexer.Start(ctx)                         // Start worker pool
   ├─> indexer.NewLogFileManager()                      // Log file tracker
   ├─> manager.SetIndexer(globalIndexer)                // Inject indexer reference
   └─> updateSearcherShardsLocked()                     // Load existing shards

5. Task Scheduling: Starts InitTaskScheduler(serviceCtx) in a goroutine for background job processing
6. Shutdown Monitoring: A goroutine monitors serviceCtx.Done() and calls StopServices() for cleanup

Index Path Configuration

The index storage location is determined by getConfigDirIndexPath() (internal/nginx_log/modern_services.go132-162):

Custom Path: Uses settings.NginxLogSettings.IndexPath if configured
Config Directory: Uses <config_dir>/log-index where <config_dir> is from cSettings.ConfPath
Fallback: Uses ./log-index in current working directory

Fallback Storage When Indexing Disabled

When IndexingEnabled is false, log discovery still works using in-memory storage.

The legacy API functions (internal/nginx_log/modern_services.go256-317) check GetLogFileManager() == nil and fall back to fallbackCache for operations like AddLogPath(), GetAllLogPaths(), and RemoveLogPathsFromConfig().
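
The nil-check fallback can be pictured with the sketch below. The types here (logEntry, pathStore, memStore) and the active() helper are hypothetical; the real functions operate on the project's own manager and cache types, and the assumption that removal matches on the source config file comes only from the name RemoveLogPathsFromConfig().

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// logEntry is a hypothetical, trimmed-down stand-in for NginxLogCache.
type logEntry struct {
	Path       string
	Type       string
	ConfigFile string
}

// pathStore is the behavior shared by the manager and the fallback cache.
type pathStore interface {
	Add(e logEntry)
	All() []logEntry
	RemoveByConfig(configFile string)
}

// memStore is an in-memory pathStore keyed by log path.
type memStore struct {
	mu      sync.RWMutex
	entries map[string]logEntry
}

func newMemStore() *memStore { return &memStore{entries: make(map[string]logEntry)} }

func (m *memStore) Add(e logEntry) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.entries[e.Path] = e
}

// All returns the entries sorted by path for stable output.
func (m *memStore) All() []logEntry {
	m.mu.RLock()
	defer m.mu.RUnlock()
	out := make([]logEntry, 0, len(m.entries))
	for _, e := range m.entries {
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

// RemoveByConfig drops every entry discovered in the given config file.
func (m *memStore) RemoveByConfig(configFile string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for p, e := range m.entries {
		if e.ConfigFile == configFile {
			delete(m.entries, p)
		}
	}
}

var (
	manager  pathStore // stays nil while indexing is disabled
	fallback = newMemStore()
)

// active picks the manager when it exists and the fallback cache otherwise.
func active() pathStore {
	if manager == nil {
		return fallback
	}
	return manager
}

func main() {
	active().Add(logEntry{Path: "/var/log/nginx/access.log", Type: "access", ConfigFile: "nginx.conf"})
	fmt.Println(len(active().All()))
}
```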

Log File Discovery and Management

The LogFileManager discovers and tracks Nginx log files through integration with the configuration scanner.

NginxLogCache Structure

The NginxLogCache struct (internal/nginx_log/indexer/log_file_manager.go20-26) represents a log file discovered from the Nginx configuration.

This is distinct from NginxLogWithIndex (internal/nginx_log/indexer/log_file_manager.go28-50), which includes indexing metadata:

Field	Type	Description
`Path`	`string`	Absolute path to log file
`Type`	`string`	"access" or "error"
`Name`	`string`	Display name
`ConfigFile`	`string`	Source nginx config file
`IndexStatus`	`string`	"indexed", "indexing", "queued", "not_indexed", "error"
`LastModified`	`int64`	Unix timestamp of file modification
`LastSize`	`int64`	File size in bytes
`LastIndexed`	`int64`	Unix timestamp of last successful index
`IndexStartTime`	`int64`	Unix timestamp when indexing started
`IndexDuration`	`int64`	Duration in milliseconds
`DocumentCount`	`uint64`	Number of indexed documents
`TimeRangeStart`	`int64`	Earliest log entry timestamp
`TimeRangeEnd`	`int64`	Latest log entry timestamp
`ErrorMessage`	`string`	Error details if status is "error"
`QueuePosition`	`int`	Position in indexing queue

Log File Status Lifecycle

Each log file progresses through distinct indexing states tracked in the model.NginxLogIndex database table.

Log File Status State Machine

IndexStatus Enum Values

The IndexStatus type is defined in internal/nginx_log/indexer/types.go13-22:

Status	Value	Description
`IndexStatusNotIndexed`	`"not_indexed"`	File discovered but not yet processed
`IndexStatusQueued`	`"queued"`	Waiting in the indexer's job queue
`IndexStatusIndexing`	`"indexing"`	Currently being processed by a worker
`IndexStatusIndexed`	`"indexed"`	Successfully indexed with complete metadata
`IndexStatusError`	`"error"`	Failed with error message stored

Status Transitions

The PersistenceManager updates status through these key methods:

SaveLogIndex(): Sets status to "indexed" when IndexDuration > 0 (internal/nginx_log/indexer/persistence.go93-124)
SetIndexingStatus(): Temporarily marks a file as "indexing" during processing (internal/nginx_log/indexer/log_file_manager.go129-138)
Status updates are propagated to the frontend via WebSocket events
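
As a compact illustration of the status values and of the SaveLogIndex() rule, consider the sketch below. The constant names and string values come from the table above; the string underlying type and the statusAfterSave helper are assumptions, as is the choice to leave the status unchanged when no duration was recorded.

```go
package main

import "fmt"

// IndexStatus is assumed here to be a string-backed type.
type IndexStatus string

const (
	IndexStatusNotIndexed IndexStatus = "not_indexed"
	IndexStatusQueued     IndexStatus = "queued"
	IndexStatusIndexing   IndexStatus = "indexing"
	IndexStatusIndexed    IndexStatus = "indexed"
	IndexStatusError      IndexStatus = "error"
)

// statusAfterSave is a hypothetical helper for the documented rule: a record
// saved with a positive indexing duration is marked as indexed.
func statusAfterSave(current IndexStatus, indexDurationMs int64) IndexStatus {
	if indexDurationMs > 0 {
		return IndexStatusIndexed
	}
	return current // assumption: without a duration the status is kept
}

func main() {
	fmt.Println(statusAfterSave(IndexStatusIndexing, 1500)) // indexed
	fmt.Println(statusAfterSave(IndexStatusQueued, 0))      // queued
}
```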

Log Grouping and Rotation Handling

The LogFileManager.GetAllLogsWithIndexGrouped() method aggregates rotated files by their base log path (internal/nginx_log/indexer/log_file_manager.go187-426):

Detects rotation patterns: access.log.1, access.log.2.gz, access.log.20240101
Groups files by getBaseLogName() which strips rotation suffixes
Aggregates DocumentCount and time ranges across all files in the group
Returns a single entry per log group with combined statistics
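
The grouping step can be sketched as follows. The regular expression is an assumption that covers only the three rotation patterns listed above, and baseLogName, fileStats, and groupStats are hypothetical names rather than the project's getBaseLogName() implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// rotationSuffix matches ".1", ".2.gz", ".20240101" at the end of a path.
// Assumption: only numeric suffixes with optional gzip are handled here.
var rotationSuffix = regexp.MustCompile(`\.\d+(\.gz)?$`)

// baseLogName strips a rotation suffix, if any.
func baseLogName(path string) string {
	return rotationSuffix.ReplaceAllString(path, "")
}

// fileStats is a hypothetical per-file record.
type fileStats struct {
	Path           string
	DocumentCount  uint64
	TimeRangeStart int64 // 0 means "not indexed yet"
	TimeRangeEnd   int64
}

// groupStats is the combined entry returned per log group.
type groupStats struct {
	MainLogPath    string
	DocumentCount  uint64
	TimeRangeStart int64
	TimeRangeEnd   int64
	Files          int
}

// groupByBase sums document counts and widens the time range per group.
func groupByBase(files []fileStats) map[string]*groupStats {
	groups := make(map[string]*groupStats)
	for _, f := range files {
		base := baseLogName(f.Path)
		g, ok := groups[base]
		if !ok {
			g = &groupStats{MainLogPath: base}
			groups[base] = g
		}
		g.Files++
		g.DocumentCount += f.DocumentCount
		if f.TimeRangeStart != 0 && (g.TimeRangeStart == 0 || f.TimeRangeStart < g.TimeRangeStart) {
			g.TimeRangeStart = f.TimeRangeStart
		}
		if f.TimeRangeEnd > g.TimeRangeEnd {
			g.TimeRangeEnd = f.TimeRangeEnd
		}
	}
	return groups
}

func main() {
	groups := groupByBase([]fileStats{
		{"/var/log/nginx/access.log", 100, 300, 400},
		{"/var/log/nginx/access.log.1", 50, 200, 299},
		{"/var/log/nginx/access.log.2.gz", 25, 100, 199},
	})
	g := groups["/var/log/nginx/access.log"]
	fmt.Println(g.Files, g.DocumentCount, g.TimeRangeStart, g.TimeRangeEnd) // 3 175 100 400
}
```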

Indexing Pipeline Architecture

The indexing pipeline uses a parallel worker pool with grouped sharding and SIMD-optimized parsing. The pipeline processes log groups (base log files and their rotated variants) through multiple stages: file discovery, parsing, document routing, and Bleve indexing. Detailed implementation is covered in page 8.1 (Parallel Indexing Pipeline) and 8.2 (Search and Query System).

Indexing Pipeline Data Flow

Key Components

ParallelIndexer Structure (internal/nginx_log/indexer/parallel_indexer.go18-52)

The ParallelIndexer struct maintains the core indexing state.

Worker Pool Processing (internal/nginx_log/indexer/parallel_indexer.go940-977)

Each indexWorker runs in its own goroutine and repeats the following loop:

1. Blocks on the jobQueue channel
2. Updates its status to WorkerStatusBusy
3. Calls processJob() to:
   - Group documents by mainLogPath, then by shardID
   - Call GetShardForDocument() to route each document to the correct shard
   - Index documents via indexShardDocuments()
4. Sends an IndexResult to resultQueue
5. Executes the job callback with the error status
6. Updates its status back to WorkerStatusIdle

SIMD-Optimized Parser (internal/nginx_log/indexer/parser.go59-76)

The global logParser singleton uses optimized parsing:

Initialized once via InitLogParser() (internal/nginx_log/indexer/parser.go24-52)
Uses parser.StreamParse() for 7-8x faster batch processing
Achieves 70% memory reduction through zero-allocation techniques
Processes logs in chunks with configurable BatchSize (default 15000)

Grouped Shard Manager (internal/nginx_log/indexer/parallel_indexer.go104-108)

Routes documents by mainLogPath instead of individual file paths:

Creates one shard per log group (e.g., /var/log/nginx/access.log and all its rotated files share a shard)
Implements GetShardForDocument(mainLogPath, docID) method
Maintains shardsByMainLogPath map[string]bleve.Index
Enables efficient queries across all rotated files in a log group
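
A minimal sketch of routing by log group is shown below. It deliberately avoids the Bleve dependency: the shard type is a hypothetical placeholder for bleve.Index, and the lowercase method name is used to signal that this is not the project's GetShardForDocument() implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// shard is a hypothetical placeholder for a per-group bleve.Index.
type shard struct {
	mu          sync.Mutex
	mainLogPath string
	docIDs      []string
}

func (s *shard) index(docID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.docIDs = append(s.docIDs, docID)
}

func (s *shard) count() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.docIDs)
}

type groupedShardManager struct {
	mu                  sync.RWMutex
	shardsByMainLogPath map[string]*shard
}

func newGroupedShardManager() *groupedShardManager {
	return &groupedShardManager{shardsByMainLogPath: make(map[string]*shard)}
}

// getShardForDocument returns the shard of a log group, creating it on first
// use. docID is ignored in this sketch because each group has one shard.
func (m *groupedShardManager) getShardForDocument(mainLogPath, docID string) *shard {
	m.mu.RLock()
	s, ok := m.shardsByMainLogPath[mainLogPath]
	m.mu.RUnlock()
	if ok {
		return s
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	if existing, ok := m.shardsByMainLogPath[mainLogPath]; ok {
		return existing // another goroutine created it in the meantime
	}
	s = &shard{mainLogPath: mainLogPath}
	m.shardsByMainLogPath[mainLogPath] = s
	return s
}

func main() {
	m := newGroupedShardManager()
	// Documents from a rotated file carry the group's main log path.
	m.getShardForDocument("/var/log/nginx/access.log", "a1").index("a1")
	m.getShardForDocument("/var/log/nginx/access.log", "a2").index("a2")
	m.getShardForDocument("/var/log/nginx/error.log", "e1").index("e1")
	fmt.Println(len(m.shardsByMainLogPath)) // 2
}
```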

Search and Analytics Architecture

The search and analytics layer provides full-text search, faceted filtering, and aggregated statistics with HyperLogLog cardinality counting for accurate UV metrics. Detailed coverage is in Search and Analytics.

Search and Analytics Component Architecture

Search Request Structure

The SearchRequest struct (internal/nginx_log/searcher/types.go48-95) specifies search parameters:

Field	Type	Description
`Query`	`string`	Full-text search query string
`Fields`	`[]string`	Fields to retrieve in results
`LogPaths`	`[]string`	Filter by specific log file paths
`UseMainLogPath`	`bool`	Use `main_log_path` field for log group queries
`StartTime`	`*int64`	Unix timestamp start (inclusive)
`EndTime`	`*int64`	Unix timestamp end (inclusive)
`IPAddresses`	`[]string`	Filter by IP addresses
`Methods`	`[]string`	Filter by HTTP methods
`StatusCodes`	`[]int`	Filter by HTTP status codes
`Paths`	`[]string`	Filter by request paths
`UserAgents`	`[]string`	Filter by user agent strings
`Referers`	`[]string`	Filter by referer headers
`Countries`	`[]string`	Filter by country codes
`Browsers`	`[]string`	Filter by browser names
`OSs`	`[]string`	Filter by operating systems
`Devices`	`[]string`	Filter by device types
`MinBytes`	`*int64`	Minimum bytes sent
`MaxBytes`	`*int64`	Maximum bytes sent
`MinReqTime`	`*float64`	Minimum request time
`MaxReqTime`	`*float64`	Maximum request time
`Limit`	`int`	Page size
`Offset`	`int`	Page offset
`SortBy`	`string`	Field to sort by
`SortOrder`	`string`	`"asc"` or `"desc"`
`IncludeHighlighting`	`bool`	Enable search term highlighting
`IncludeFacets`	`bool`	Request faceted aggregations
`FacetFields`	`[]string`	Fields to facet on
`FacetSize`	`int`	Number of terms per facet (default 10)
`IncludeStats`	`bool`	Include statistical aggregations
`Timeout`	`time.Duration`	Request timeout
`UseCache`	`bool`	Enable result caching
`CacheKey`	`string`	Custom cache key

Analytics Methods and Metrics

The analytics.Service interface (internal/nginx_log/analytics/service.go12-31) provides these analysis methods:

Method	Purpose	Return Type
`GetDashboardAnalytics(ctx, req)`	Comprehensive dashboard stats	`*DashboardAnalytics`
`GetLogEntriesStats(ctx, req)`	General log statistics	`*EntriesStats`
`GetGeoDistribution(ctx, req)`	Geographic access patterns	`*GeoDistribution`
`GetTopPaths(ctx, req)`	Most accessed URLs	`[]KeyValue`
`GetTopIPs(ctx, req)`	Most active IP addresses	`[]KeyValue`
`GetTopUserAgents(ctx, req)`	Most common user agents	`[]KeyValue`
`GetTopCountries(ctx, req)`	Most active countries	`[]CountryStats`
`GetTopCities(ctx, req)`	Most active cities	`[]CityStats`

Cardinality Counting with HyperLogLog

The analytics service uses HyperLogLog for efficient unique value counting (internal/nginx_log/analytics/service.go66-85).

Accuracy Comparison:

Metric	Facet-Based	HyperLogLog-Based	Use Case
UV (Unique IPs)	Limited by `FacetSize`	~2% error rate	Used when facets would undercount
PV (Total Hits)	Exact via `TotalHits`	N/A	Direct from search results
Unique Pages	Limited by `FacetSize`	~2% error rate	URL diversity metrics
Browser/OS Stats	Exact via facets	N/A	Facets sufficient for categories
Geo Distribution	Exact via facets	N/A	Facets sufficient for regions

The Counter.CountCardinality() method enables accurate UV metrics even when the number of unique IPs exceeds facet size limits (typically 50-1000).

Dashboard Analytics Structure (internal/nginx_log/analytics/dashboard.go14-100)

GetDashboardAnalytics() returns:

Summary: Total PV, UV, traffic, unique pages
HourlyStats[]: 48 hours of access statistics
DailyStats[]: Daily aggregated metrics
TopURLs[]: Most accessed paths with hit counts
Browsers[]: Browser usage distribution
OperatingSystems[]: OS usage distribution
Devices[]: Device type breakdown

Real-Time Monitoring and Events

The log management system publishes WebSocket events via the event package to provide real-time progress updates during indexing operations.

WebSocket Event Flow

Event Types and Structures

The system uses typed event constants from the event package:

TypeNginxLogIndexProgress (api/nginx_log/index_management.go168-179)

Published during indexing to show real-time progress.

TypeNginxLogIndexReady (api/nginx_log/index_management.go234-242)

Published when indexing completes and the searcher has been updated.

processing_status (api/nginx_log/index_management.go125)

Carries the global indexing status via ProcessingStatusManager.

Frontend Event Subscription

The frontend subscribes using the useWebSocketEventBus() composable (app/src/views/nginx_log/NginxLogList.vue90-130). It registers one subscription for the processing status and one for the index ready event.

Progress Tracking with useIndexProgress

The composable maintains a reactive map of file progress (app/src/views/nginx_log/NginxLogList.vue35-36).

Data Storage and Organization

The log management system uses multiple storage layers with grouped sharding for efficient log group queries.

Storage Architecture

Bleve Index Organization

Index Path Configuration (internal/nginx_log/modern_services.go132-162)

The index path is determined by:

Custom path from settings.NginxLogSettings.IndexPath if configured
Otherwise: <config_dir>/log-index/ where <config_dir> is from cSettings.ConfPath
Fallback: ./log-index in current directory

Grouped Sharding Strategy (internal/nginx_log/indexer/parallel_indexer.go74-80)

The GroupedShardManager creates one Bleve index per log group:

Key: main_log_path (e.g., /var/log/nginx/access.log)
Value: Single bleve.Index containing all documents from that log group
Benefits:
- Efficient queries across all rotated files (access.log, access.log.1, access.log.2.gz, etc.)
- Reduced shard count compared to per-file sharding
- Simplified index management

Document Mapping

The Bleve index mapping is defined in CreateLogIndexMapping() (internal/nginx_log/indexer/types.go340-456). Key fields:

Field	Type	Analyzer	DocValues	Purpose
`timestamp`	Numeric	-	-	Time-range queries
`ip`	Text	keyword	Yes	Exact IP matching, UV counting
`main_log_path`	Text	keyword	Yes	Log group filtering
`file_path`	Text	keyword	-	Actual physical file path
`method`	Text	keyword	-	HTTP method filtering
`path_exact`	Text	keyword	Yes	Exact path matching, faceting
`status`	Numeric	-	-	Status code filtering
`browser`	Text	keyword	-	Browser analytics
`os`	Text	keyword	-	OS analytics
`device_type`	Text	keyword	-	Device type analytics
`region_code`	Text	keyword	-	Geographic filtering

Database Schema: model.NginxLogIndex

The nginx_log_index table tracks indexing metadata (internal/nginx_log/indexer/persistence.go69-124):

Column	Type	Description
`path`	`varchar(512)`	Primary key: Log file path
`main_log_path`	`varchar(512)`	Base log path for grouping rotated files
`index_status`	`varchar(20)`	`"indexed"`, `"indexing"`, `"queued"`, `"error"`, `"not_indexed"`
`document_count`	`bigint`	Total indexed documents
`last_indexed`	`datetime`	Timestamp of last successful index
`index_start_time`	`datetime`	When last indexing operation started
`index_duration`	`bigint`	Milliseconds to complete last indexing
`timerange_start`	`datetime`	Earliest timestamp in indexed logs
`timerange_end`	`datetime`	Latest timestamp in indexed logs
`last_modified`	`datetime`	File modification time at last index
`last_size`	`bigint`	File size in bytes at last index
`error_message`	`text`	Error description if status is `"error"`
`error_time`	`datetime`	When error occurred
`retry_count`	`int`	Number of retry attempts
`queue_position`	`int`	Position in indexing queue
`enabled`	`boolean`	Whether indexing is enabled for this file

Key Operations:

GetLogIndex(path): Retrieves or creates a record via FirstOrCreate() (internal/nginx_log/indexer/persistence.go70-89)
SaveLogIndex(logIndex): Updates/creates record, sets enabled = true, auto-sets index_status = "indexed" when IndexDuration > 0 (internal/nginx_log/indexer/persistence.go93-124)
GetLogGroupIndexes(mainLogPath): Returns all records for a log group (internal/nginx_log/indexer/persistence.go236-247)

System Requirements and Configuration

The advanced indexing system has minimum and recommended hardware requirements:

Resource	Minimum	Recommended	Purpose
CPU Cores	1	2+	Worker parallelism
RAM	1GB	4GB+	Memory pooling and batch processing
Disk Space	20GB	-	Index storage (varies with log volume)

Configuration options are defined in internal/nginx_log/indexer/types.go116-200.

The system automatically adjusts worker count and batch sizes based on available system resources and indexing performance.
