diff --git a/README.md b/README.md
index 5d052f5..c211122 100644
--- a/README.md
+++ b/README.md
@@ -112,6 +112,13 @@ yarn start test-risk-score
 
 ## Entity Store Performance Testing
 
+> **Important**: Entity store performance tests work reliably only against **cloud environments** and **newly deployed environments**.
+>
+> - **Running tests on the same instance is problematic** due to Elasticsearch's [node query cache](https://www.elastic.co/guide/en/elasticsearch/reference/5.1/query-cache.html), which can skew results by caching query results between test runs.
+> - **Running on local instances is not stable** and should be avoided. Local environments often have resource constraints and inconsistent performance that make baseline comparisons unreliable.
+>
+> For accurate and comparable results, always run performance tests against a fresh cloud deployment or a newly provisioned environment.
+
 ### Sending one of the pre-built files
 
 #### One time send
@@ -132,27 +139,124 @@ To do this use the `upload-perf-data-interval` command. This will upload a file
 
 ```
 # upload the small data file 10 times with 30 seconds between sends
-yarn start upload-perf-data-interval small --deleteEntities
+yarn start upload-perf-data-interval small --deleteData
 ```
 
 The count and interval can be customized:
 
 ```
 # upload the small data file 100 times with 60 seconds between sends
-yarn start upload-perf-data-interval small --deleteEntities --interval 60 --count 100
+yarn start upload-perf-data-interval small --deleteData --interval 60 --count 100
+
+# Customize the sampling interval for metrics collection (default: 5 seconds)
+yarn start upload-perf-data-interval small --deleteData --interval 60 --count 100 --samplingInterval 10
+
+# Skip transform-related operations (for ESQL workflows)
+yarn start upload-perf-data-interval small --deleteData --noTransforms
 ```
 
+Options:
+- `--interval <seconds>` - Interval between uploads in seconds (default: 30)
+- `--count <number>` - Number of times to upload (default: 10)
+- `--deleteData` - Delete all entities and data streams before uploading
+- `--deleteEngines` - Delete all entity engines before uploading
+- `--transformTimeout <minutes>` - Timeout in minutes for waiting for generic transform to complete (default: 30)
+- `--samplingInterval <seconds>` - Sampling interval in seconds for metrics collection (default: 5)
+- `--noTransforms` - Skip transform-related operations (for ESQL workflows)
+
 The entity IDs are modified before sending so that each upload creates new entities, this means there will be count * entityCount entities by the end of the test.
 
-While the files are uploaded, we poll elasticsearch for the cluster health and the transform health, these files can be found in `./logs`. Where one file contains the cluster health every 5 seconds, and the other contains the transform health every 5 seconds:
+While the files are uploaded, we poll Elasticsearch and Kibana for various metrics.
These log files can be found in `./logs`:
 
 ```
 > ll logs
 total 464
--rw-r--r--@ 1 mark staff 33K Oct 28 11:20 small-2024-10-28T11:14:06.828Z-cluster-health.log
--rw-r--r--@ 1 mark staff 145K Oct 28 11:20 small-2024-10-28T11:14:06.828Z-transform-stats.log
+-rw-r--r--@ 1 dg staff 103K Nov 27 09:54 standard-2025-11-27T07:51:02.295Z-cluster-health.log
+-rw-r--r--@ 1 dg staff 486K Nov 27 09:54 standard-2025-11-27T07:51:02.295Z-kibana-stats.log
+-rw-r--r--@ 1 dg staff 429K Nov 27 09:54 standard-2025-11-27T07:51:02.295Z-node-stats.log
+-rw-r--r--@ 1 dg staff 886K Nov 27 09:54 standard-2025-11-27T07:51:02.295Z-transform-stats.log
 ```
 
+The log files contain:
+- **cluster-health.log**: Cluster health status, active shards, and unassigned shards (sampled every N seconds, default: 5)
+- **transform-stats.log**: Transform statistics including search/index/processing latencies, document counts, and per-entity-type metrics (sampled every N seconds, default: 5). Only generated if transforms are enabled (not using `--noTransforms`)
+- **node-stats.log**: Elasticsearch node statistics including CPU usage, memory heap usage, and per-node metrics (sampled every N seconds, default: 5)
+- **kibana-stats.log**: Kibana statistics including event loop metrics, Elasticsearch client stats, response times, memory usage, and OS load (sampled every N seconds, default: 5)
+
+### Baseline Metrics and Comparison
+
+After running performance tests, you can extract metrics from the generated log files and create baselines for comparison.
+
+#### Creating a baseline
+
+Extract metrics from log files and save them as a baseline:
+
+```
+# Create a baseline from logs with a specific prefix
+yarn start create-baseline small-2024-10-28T11:14:06.828Z -e 100000 -l 5
+
+# Create a baseline with a custom name
+yarn start create-baseline small-2024-10-28T11:14:06.828Z -e 100000 -l 5 -n "baseline-v1_0-standard"
+
+# For interval tests, include upload count and interval
+yarn start create-baseline small-2025-11-13T15:03:32 -e 100000 -l 5 -u 10 -i 30000
+```
+
+Options:
+- `-e <number>` - Number of entities in the test
+- `-l <number>` - Number of logs per entity
+- `-u <number>` - Number of uploads (for interval tests)
+- `-i <ms>` - Interval in milliseconds (for interval tests)
+- `-n <name>` - Custom name for the baseline (defaults to log-prefix)
+
+The baseline will be saved to the `./baselines` directory.
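+
+For reference, a baseline file is a JSON document that captures the test configuration and the aggregated metrics. A sketch of its shape (abridged, values rounded from a real run; see `./baselines` for a full example):
+
+```
+{
+  "testName": "baseline-v1_0-standard",
+  "timestamp": "2025-11-25T17:10:19.485Z",
+  "testConfig": { "entityCount": 100000, "logsPerEntity": 10 },
+  "metrics": {
+    "searchLatency": { "avg": 58.3, "p50": 55.1, "p95": 98.4, "p99": 104.9, "max": 109.2 },
+    "intakeLatency": { "avg": 136.8, "p50": 125.7, "p95": 184, "p99": 203, "max": 220.5 },
+    "cpu": { "avg": 5.9, "peak": 40 },
+    "clusterHealth": { "status": "green", "avgActiveShards": 189.1, "unassignedShards": 0 }
+  }
+}
+```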
+
+#### Listing baselines
+
+View all available baselines:
+
+```
+yarn start list-baselines
+```
+
+#### Comparing metrics
+
+Compare current run metrics against a baseline:
+
+```
+# Compare against the latest baseline
+yarn start compare-metrics standard-2025-11-27T07:51 -e 100000 -l 5
+
+# Compare against a specific baseline by name pattern
+yarn start compare-metrics standard-2025-11-27T07:51 -b "baseline-v1_0" -e 100000 -l 5
+
+# Customize comparison thresholds
+yarn start compare-metrics standard-2025-11-27T07:51 \
+  -b "baseline-v1_0" \
+  -e 100000 -l 5 \
+  --degradation-threshold 20 \
+  --warning-threshold 10 \
+  --improvement-threshold 10
+```
+
+Options:
+- `-b <pattern>` - Path to baseline file or pattern to match (uses latest if not specified)
+- `-e <number>` - Number of entities for current run
+- `-l <number>` - Number of logs per entity for current run
+- `-u <number>` - Number of uploads for current run
+- `-i <ms>` - Interval in milliseconds for current run
+- `--degradation-threshold <percent>` - Percentage worse to be considered degradation (default: 20)
+- `--warning-threshold <percent>` - Percentage worse to be considered warning (default: 10)
+- `--improvement-threshold <percent>` - Percentage better to be considered improvement (default: 10)
+
+The comparison report shows metrics including:
+- **Latency metrics**: Search, Intake, and Processing latencies (avg, p50, p95, p99, max)
+- **System metrics**: CPU, Memory, Throughput, Index Efficiency
+- **Entity metrics**: Per-entity-type metrics (host, user, service, generic)
+- **Error metrics**: Search failures, Index failures
+- **Kibana metrics**: Event loop, Elasticsearch client, Response times, Memory, Requests, OS Load
+
+
 ### Generating a data file
 
 To generate a data file for performance testing, use the `create-perf-data` command.
diff --git a/baselines/baseline-v1_0-standard-1-sec-logging-2025-11-25T17-10-19-485Z.json b/baselines/baseline-v1_0-standard-1-sec-logging-2025-11-25T17-10-19-485Z.json
new file mode 100644
index 0000000..b5734c9
--- /dev/null
+++ b/baselines/baseline-v1_0-standard-1-sec-logging-2025-11-25T17-10-19-485Z.json
@@ -0,0 +1,254 @@
+{
+  "testName": "baseline-v1_2-standard-1-sec-logging",
+  "timestamp": "2025-11-25T17:10:19.485Z",
+  "testConfig": {
+    "entityCount": 100000,
+    "logsPerEntity": 10
+  },
+  "metrics": {
+    "searchLatency": {
+      "avg": 58.29926933267098,
+      "p50": 55.111111111111114,
+      "p95": 98.42857142857143,
+      "p99": 104.85714285714286,
+      "max": 109.2
+    },
+    "intakeLatency": {
+      "avg": 136.78236914600555,
+      "p50": 125.66666666666667,
+      "p95": 184,
+      "p99": 203,
+      "max": 220.5
+    },
+    "processingLatency": {
+      "avg": 11.306138670073098,
+      "p50": 11,
+      "p95": 17.857142857142858,
+      "p99": 20.714285714285715,
+      "max": 21
+    },
+    "cpu": {
+      "avg": 5.888405797101449,
+      "peak": 40,
+      "avgPerNode": {
+        "tiebreaker-0000000002": 0.05652173913043478,
+        "instance-0000000001": 9.452173913043477,
+        "instance-0000000000": 8.156521739130435
+      }
+    },
+    "memory": {
+      "avgHeapPercent": 58.45217391304348,
+      "peakHeapPercent": 88,
+      "avgHeapBytes": 392642187.8956522,
+      "peakHeapBytes": 782218776
+    },
+    "throughput": {
+      "avgDocumentsPerSecond": 7311.038341315635,
+      "peakDocumentsPerSecond": 3992614.161508086
+    },
+    "indexEfficiency": {
+      "avgRatio": 0.10000107784302055,
+      "totalDocumentsIndexed": 167002,
+      "totalDocumentsProcessed": 1670002
+    },
+    "pagesProcessed": {
+      "total": 1018,
+      "avgPerSample": 1.1162280701754386
+    },
+    "triggerCount": {
+      "total": 16,
+      "avgPerTransform": 0.017543859649122806
+    },
+    "exponentialAverages": {
+      "checkpointDuration":
12030.396694214876, + "documentsIndexed": 16343.256198347106, + "documentsProcessed": 163431.2231404959 + }, + "perEntityType": { + "host": { + "searchLatency": { + "avg": 51.34667415917416, + "p50": 46.5, + "p95": 78.28571428571429, + "p99": 82.57142857142857, + "max": 82.57142857142857 + }, + "intakeLatency": { + "avg": 171.18, + "p50": 171.33333333333334, + "p95": 187.5, + "p99": 199.5, + "max": 199.5 + }, + "processingLatency": { + "avg": 8.002926240426241, + "p50": 7.555555555555555, + "p95": 11.714285714285714, + "p99": 12.428571428571429, + "max": 12.428571428571429 + }, + "documentsProcessed": 330001, + "documentsIndexed": 33001, + "pagesProcessed": 204, + "triggerCount": 4, + "sampleCounts": { + "search": 26, + "index": 25, + "processing": 26 + } + }, + "user": { + "searchLatency": { + "avg": 34.705555555555556, + "p50": 34.90909090909091, + "p95": 40.7, + "p99": 42.25, + "max": 42.25 + }, + "intakeLatency": { + "avg": 154.7625, + "p50": 151, + "p95": 179.33333333333334, + "p99": 203, + "max": 203 + }, + "processingLatency": { + "avg": 7.409040404040406, + "p50": 7.5, + "p95": 9.5, + "p99": 9.909090909090908, + "max": 9.909090909090908 + }, + "documentsProcessed": 330000, + "documentsIndexed": 33000, + "pagesProcessed": 200, + "triggerCount": 4, + "sampleCounts": { + "search": 20, + "index": 20, + "processing": 20 + } + }, + "service": { + "searchLatency": { + "avg": 36.95, + "p50": 4.5, + "p95": 69.4, + "p99": 69.4, + "max": 69.4 + }, + "intakeLatency": { + "avg": 139.5, + "p50": 108, + "p95": 171, + "p99": 171, + "max": 171 + }, + "processingLatency": { + "avg": 4, + "p50": 0, + "p95": 8, + "p99": 8, + "max": 8 + }, + "documentsProcessed": 10000, + "documentsIndexed": 1000, + "pagesProcessed": 8, + "triggerCount": 4, + "sampleCounts": { + "search": 2, + "index": 2, + "processing": 2 + } + }, + "generic": { + "searchLatency": { + "avg": 67.69575972075974, + "p50": 60.285714285714285, + "p95": 101.28571428571429, + "p99": 109.2, + "max": 109.2 + }, + "intakeLatency": { + "avg": 120.2286036036036, + "p50": 119.66666666666667, + "p95": 137, + "p99": 220.5, + "max": 220.5 + }, + "processingLatency": { + "avg": 13.71745982995983, + "p50": 13, + "p95": 19.714285714285715, + "p99": 21, + "max": 21 + }, + "documentsProcessed": 1000001, + "documentsIndexed": 100001, + "pagesProcessed": 606, + "triggerCount": 4, + "sampleCounts": { + "search": 74, + "index": 74, + "processing": 74 + } + } + }, + "transformStates": { + "indexing": 124, + "started": 792 + }, + "errors": { + "searchFailures": 0, + "indexFailures": 0, + "totalFailures": 0 + }, + "clusterHealth": { + "status": "green", + "avgActiveShards": 189.05652173913043, + "unassignedShards": 0 + }, + "kibana": { + "eventLoop": { + "delay": { + "avg": 13.394809434782614, + "p50": 10.076159, + "p95": 10.272767, + "p99": 12.156927, + "max": 32.276479 + }, + "utilization": { + "avg": 0.019528116725809345, + "peak": 0.11119570504768685 + } + }, + "elasticsearchClient": { + "avgActiveSockets": 0.2391304347826087, + "avgIdleSockets": 9.347826086956522, + "peakQueuedRequests": 0 + }, + "responseTimes": { + "avg": 8.806734512645638, + "max": 584 + }, + "memory": { + "avgHeapBytes": 513021785.3913044, + "peakHeapBytes": 552635592, + "avgRssBytes": 921329236.5913043, + "peakRssBytes": 922148864 + }, + "requests": { + "total": 16, + "avgPerSecond": 0.06979858745108646, + "errorRate": 0, + "disconnects": 0 + }, + "osLoad": { + "avg1m": 0.5982608695652174, + "avg5m": 0.7195652173913035, + "avg15m": 0.7216956521739128, + "peak1m": 0.95 + } + } + } +} \ 
No newline at end of file diff --git a/package.json b/package.json index 8b3aa45..40bb61a 100644 --- a/package.json +++ b/package.json @@ -34,8 +34,9 @@ "node-fetch": "^3.3.1", "p-map": "^7.0.2", "readline": "^1.3.0", - "tsx": "^4.7.1", - "url-join": "^5.0.0" + "tsx": "^4.20.6", + "url-join": "^5.0.0", + "uuid": "^13.0.0" }, "devDependencies": { "@types/cli-progress": "3.11.6", diff --git a/src/commands/entity_store_perf.ts b/src/commands/entity_store_perf.ts index 559ea5b..ff28ac9 100644 --- a/src/commands/entity_store_perf.ts +++ b/src/commands/entity_store_perf.ts @@ -2,8 +2,9 @@ import { faker } from '@faker-js/faker'; import fs from 'fs'; import cliProgress from 'cli-progress'; import { getEsClient, getFileLineCount } from './utils/indices'; +import { ensureSecurityDefaultDataView } from '../utils/security_default_data_view'; import readline from 'readline'; -import { deleteEngines, initEntityEngineForEntityTypes } from '../utils/kibana_api'; +import { deleteEngines, initEntityEngineForEntityTypes, kibanaFetch } from '../utils/kibana_api'; import { get } from 'lodash-es'; import { dirname } from 'path'; import { fileURLToPath } from 'url'; @@ -12,6 +13,12 @@ import * as path from 'path'; const config = getConfig(); +// Checkpoint stability configuration for transform completion detection +// Consider checkpoint stable if it hasn't changed in this duration (10 seconds) +const CHECKPOINT_STABLE_TIME_MS = 10000; +// Consider stable after this many consecutive checks with the same checkpoint +const STABLE_CHECKPOINT_THRESHOLD = 3; + interface EntityFields { id: string; name: string; @@ -47,6 +54,41 @@ interface UserFields { }; } +interface ServiceFields { + entity: EntityFields; + service: { + name: string; + id?: string; + type?: string; + node?: { + roles?: string; + name?: string; + }; + environment?: string; + address?: string; + state?: string; + ephemeral_id?: string; + version?: string; + }; +} + +interface GenericEntityFields { + entity: EntityFields; + event?: { + ingested?: string; + dataset?: string; + module?: string; + }; + cloud?: { + provider?: string; + region?: string; + account?: { + name?: string; + id?: string; + }; + }; +} + let stop = false; process.on('SIGINT', () => { @@ -101,12 +143,22 @@ const getLogsPerEntity = (filePath: string) => { if (doc.host) { idField = 'host.name'; idValue = doc.host.name; - } else { + } else if (doc.user) { idField = 'user.name'; idValue = doc.user.name; + } else if (doc.service) { + idField = 'service.name'; + idValue = doc.service.name; + } else if (doc.entity) { + idField = 'entity.id'; + idValue = doc.entity.id; } } + if (!idField) { + return; + } + const docId = get(doc, idField); if (docId !== idValue && !resolved) { resolved = true; @@ -120,6 +172,21 @@ const getLogsPerEntity = (filePath: string) => { rl.on('error', (err) => { reject(err); }); + + rl.on('close', () => { + if (!resolved) { + if (!idField) { + reject( + new Error( + 'Could not determine entity type from file. Expected host, user, service, or entity fields.' 
+ ) + ); + } else { + // All documents have the same entity ID, resolve with total count + resolve(count); + } + } + }); }); }; @@ -168,6 +235,31 @@ const changeUserName = (doc: Record, addition: string) => { return doc; }; +// eslint-disable-next-line @typescript-eslint/no-explicit-any +const changeServiceName = (doc: Record, addition: string) => { + const newName = `${doc.service.name}-${addition}`; + doc.service.name = newName; + doc.service.id = newName; + doc.entity.name = newName; + doc.entity.id = newName; + return doc; +}; + +// eslint-disable-next-line @typescript-eslint/no-explicit-any +const changeGenericEntityName = (doc: Record, addition: string) => { + const originalName = doc.entity.name; // Store original name before modification + const newName = `${originalName}-${addition}`; + doc.entity.name = newName; + + // Check if ARN contains the original name (before modification) + if (doc.entity.id && doc.entity.id.includes(originalName)) { + // Update ARN: replace the last part (after last colon or slash) with new name + // ARN format: arn:aws:service:region:account:resource or arn:aws:service:region:account:type/resource + doc.entity.id = doc.entity.id.replace(/([^:/]+)$/, newName); + } + return doc; +}; + const generateUserFields = ({ idPrefix, entityIndex }: GeneratorOptions): UserFields => { const id = `${idPrefix}-user-${entityIndex}`; return { @@ -189,11 +281,162 @@ const generateUserFields = ({ idPrefix, entityIndex }: GeneratorOptions): UserFi }; }; +const generateServiceFields = ({ idPrefix, entityIndex }: GeneratorOptions): ServiceFields => { + const id = `${idPrefix}-service-${entityIndex}`; + return { + entity: { + id: id, + name: id, + type: 'service', + sub_type: 'system', + address: `example.${idPrefix}.com`, + }, + service: { + id: id, + name: id, + type: 'system', + node: { + roles: 'data', + name: `${id}-node`, + }, + environment: 'production', + address: generateIpAddresses(entityIndex * FIELD_LENGTH, 1)[0], + state: 'running', + ephemeral_id: `${id}-ephemeral`, + version: '8.0.0', + }, + }; +}; + +const generateGenericEntityFields = ({ + idPrefix, + entityIndex, +}: GeneratorOptions): GenericEntityFields => { + const id = `${idPrefix}-generic-${entityIndex}`; + const genericTypes = [ + { type: 'Messaging Service', subType: 'AWS SNS Topic' }, + { type: 'Storage Service', subType: 'AWS S3 Bucket' }, + { type: 'Compute Service', subType: 'AWS EC2 Instance' }, + { type: 'Database Service', subType: 'AWS RDS Instance' }, + { type: 'Compute Service', subType: 'AWS Lambda Function' }, + { type: 'Network Service', subType: 'AWS VPC' }, + { type: 'Storage Service', subType: 'AWS EBS Volume' }, + { type: 'Database Service', subType: 'AWS DynamoDB Table' }, + { type: 'Compute Service', subType: 'AWS ECS Service' }, + { type: 'Network Service', subType: 'AWS Load Balancer' }, + ]; + const taxonomy = genericTypes[entityIndex % genericTypes.length]; + const regions = ['us-east-1', 'us-west-2', 'eu-west-1', 'eu-central-1', 'ap-southeast-1']; + const region = regions[entityIndex % regions.length]; + const accountId = '123456789012'; + + let resourceId: string; + if (taxonomy.subType.includes('SNS')) { + resourceId = `arn:aws:sns:${region}:${accountId}:${id}`; + } else if (taxonomy.subType.includes('S3')) { + resourceId = `arn:aws:s3:::${id}`; + } else if (taxonomy.subType.includes('EC2')) { + resourceId = `arn:aws:ec2:${region}:${accountId}:instance/${id}`; + } else if (taxonomy.subType.includes('RDS')) { + resourceId = `arn:aws:rds:${region}:${accountId}:db:${id}`; + } 
else if (taxonomy.subType.includes('Lambda')) { + resourceId = `arn:aws:lambda:${region}:${accountId}:function:${id}`; + } else if (taxonomy.subType.includes('VPC')) { + resourceId = `arn:aws:ec2:${region}:${accountId}:vpc/${id}`; + } else if (taxonomy.subType.includes('EBS')) { + resourceId = `arn:aws:ec2:${region}:${accountId}:volume/${id}`; + } else if (taxonomy.subType.includes('DynamoDB')) { + resourceId = `arn:aws:dynamodb:${region}:${accountId}:table/${id}`; + } else if (taxonomy.subType.includes('ECS')) { + resourceId = `arn:aws:ecs:${region}:${accountId}:service/${id}`; + } else if (taxonomy.subType.includes('Load Balancer')) { + resourceId = `arn:aws:elasticloadbalancing:${region}:${accountId}:loadbalancer/${id}`; + } else { + resourceId = `arn:aws:${taxonomy.subType.toLowerCase().replace(/\s+/g, '-')}:${region}:${accountId}:${id}`; + } + + return { + entity: { + id: resourceId, + name: id, + type: taxonomy.type, + sub_type: taxonomy.subType, + address: `example.${idPrefix}.com`, + }, + event: { + ingested: new Date().toISOString(), + dataset: 'cloud_asset_inventory.asset_inventory', + module: 'cloud_asset_inventory', + }, + cloud: { + provider: 'aws', + region: region, + account: { + name: `${idPrefix}-account`, + id: accountId, + }, + }, + }; +}; + const FIELD_LENGTH = 2; const directoryName = dirname(fileURLToPath(import.meta.url)); const DATA_DIRECTORY = directoryName + '/../../data/entity_store_perf_data'; const LOGS_DIRECTORY = directoryName + '/../../logs'; +/** + * Predefined entity distribution presets + */ +export const ENTITY_DISTRIBUTIONS = { + // Equal distribution: 25% each + equal: { + user: 0.25, + host: 0.25, + generic: 0.25, + service: 0.25, + }, + // Standard distribution: 33% users, 33% hosts, 33% generic, 1% service + standard: { + user: 0.33, + host: 0.33, + generic: 0.33, + service: 0.01, + }, +} as const; + +export type DistributionType = keyof typeof ENTITY_DISTRIBUTIONS; +export type EntityType = 'user' | 'host' | 'service' | 'generic'; + +export const DEFAULT_DISTRIBUTION: DistributionType = 'standard'; + +/** + * Get entity distribution by type + */ +export const getEntityDistribution = (type: DistributionType = DEFAULT_DISTRIBUTION) => { + return ENTITY_DISTRIBUTIONS[type]; +}; + +/** + * Calculate entity counts for each type based on total entity count and distribution + */ +export const calculateEntityCounts = ( + totalEntityCount: number, + distribution = getEntityDistribution() +) => { + const userCount = Math.floor(totalEntityCount * distribution.user); + const hostCount = Math.floor(totalEntityCount * distribution.host); + const genericCount = Math.floor(totalEntityCount * distribution.generic); + const serviceCount = totalEntityCount - userCount - hostCount - genericCount; // Remaining + + return { + user: userCount, + host: hostCount, + generic: genericCount, + service: serviceCount, + total: totalEntityCount, + }; +}; + const getFilePath = (name: string) => { return `${DATA_DIRECTORY}/${name}${name.endsWith('.jsonl') ? 
'' : '.jsonl'}`;
 };
@@ -219,6 +462,15 @@ const deleteLogsIndex = async (index: string) => {
   );
 };
 
+const deleteDataStream = async (index: string) => {
+  return await getEsClient().indices.deleteDataStream(
+    {
+      name: index,
+    },
+    { ignore: [404] }
+  );
+};
+
 const countEntities = async (baseDomainName: string) => {
   const esClient = getEsClient();
   const res = await esClient.count({
@@ -236,6 +488,16 @@ const countEntities = async (baseDomainName: string) => {
             'user.domain': `example.${baseDomainName}.com`,
           },
         },
+        {
+          prefix: {
+            'service.name': `${baseDomainName}-service-`,
+          },
+        },
+        {
+          prefix: {
+            'entity.name': `${baseDomainName}-generic`,
+          },
+        },
       ],
       minimum_should_match: 1,
     },
@@ -272,6 +534,95 @@ const countEntitiesUntil = async (name: string, count: number) => {
   return total;
 };
 
+const waitForTransformToComplete = async (
+  transformId: string,
+  expectedDocumentsProcessed: number,
+  timeoutMs: number = 1800000 // 30 minutes default timeout
+): Promise<void> => {
+  const esClient = getEsClient();
+  const startTime = Date.now();
+  const pollInterval = 5000; // Check every 5 seconds
+
+  console.log(
+    `Waiting for transform ${transformId} to process ${expectedDocumentsProcessed} documents (timeout: ${timeoutMs / 1000 / 60} minutes)...`
+  );
+
+  // Create progress bar similar to countEntitiesUntil
+  const progress = new cliProgress.SingleBar(
+    {
+      format: 'Progress | {value}/{total} Documents | Checkpoint: {checkpoint}',
+    },
+    cliProgress.Presets.shades_classic
+  );
+  progress.start(expectedDocumentsProcessed, 0, { checkpoint: 0 });
+
+  let lastCheckpoint = 0;
+  let stableCheckpointCount = 0;
+
+  try {
+    while (Date.now() - startTime < timeoutMs) {
+      try {
+        const res = await esClient.transform.getTransformStats({
+          transform_id: transformId,
+        });
+
+        if (res.transforms && res.transforms.length > 0) {
+          const stats = res.transforms[0].stats;
+          const documentsProcessed = stats.documents_processed || 0;
+          const checkpointing = res.transforms[0].checkpointing;
+          const currentCheckpoint = checkpointing?.last?.checkpoint || 0;
+          const checkpointTimestamp = checkpointing?.last?.timestamp_millis || 0;
+
+          // Update progress bar
+          progress.update(documentsProcessed, { checkpoint: currentCheckpoint });
+
+          // Check if checkpoint is stable (not changing)
+          if (currentCheckpoint === lastCheckpoint) {
+            stableCheckpointCount++;
+          } else {
+            stableCheckpointCount = 0;
+            lastCheckpoint = currentCheckpoint;
+          }
+
+          // Check if checkpoint has been stable for a while
+          const timeSinceLastCheckpoint = Date.now() - checkpointTimestamp;
+          const checkpointStable = timeSinceLastCheckpoint >= CHECKPOINT_STABLE_TIME_MS;
+
+          // Transform has finished processing when:
+          // 1. Documents processed >= expected
+          // 2. Checkpoint has been stable for several checks (not incrementing)
+          // 3.
Checkpoint timestamp indicates it's been stable for a while + if ( + documentsProcessed >= expectedDocumentsProcessed && + stableCheckpointCount >= STABLE_CHECKPOINT_THRESHOLD && + checkpointStable + ) { + progress.stop(); + console.log( + `\nTransform ${transformId} completed processing ${documentsProcessed} documents (checkpoint: ${currentCheckpoint})` + ); + return; + } + } + } catch (error) { + console.warn(`\nError checking transform stats: ${error}`); + } + + await new Promise((resolve) => setTimeout(resolve, pollInterval)); + } + + progress.stop(); + throw new Error( + `Timeout waiting for transform ${transformId} to process ${expectedDocumentsProcessed} documents after ${timeoutMs / 1000 / 60} minutes` + ); + } finally { + // Ensure progress bar is stopped even if there's an error + if (progress) { + progress.stop(); + } + } +}; + const logClusterHealthEvery = (name: string, interval: number): (() => void) => { if (config.serverless) { console.log('Skipping cluster health on serverless cluster'); @@ -314,6 +665,8 @@ const logTransformStatsEvery = (name: string, interval: number): (() => void) => const TRANSFORM_NAMES = [ 'entities-v1-latest-security_host_default', 'entities-v1-latest-security_user_default', + 'entities-v1-latest-security_service_default', + 'entities-v1-latest-security_generic_default', ]; let stopCalled = false; @@ -353,20 +706,184 @@ const logTransformStatsEvery = (name: string, interval: number): (() => void) => return stopCallback; }; -export const createPerfDataFile = ({ +const logNodeStatsEvery = (name: string, interval: number): (() => void) => { + if (config.serverless) { + console.log('Skipping node stats on serverless cluster'); + return () => {}; + } + + let stopCalled = false; + + const stopCallback = () => { + stopCalled = true; + }; + + const logFile = `${LOGS_DIRECTORY}/${name}-${new Date().toISOString()}-node-stats.log`; + + const stream = fs.createWriteStream(logFile, { flags: 'a' }); + + const log = (message: string) => { + stream.write(`${new Date().toISOString()} - ${message}\n`); + }; + + const logNodeStats = async () => { + const esClient = getEsClient(); + // Get node stats with CPU, JVM, and OS metrics + const res = await esClient.nodes.stats({ + metric: ['process', 'jvm', 'os'], + human: false, // Get raw numbers, not human-readable format + }); + + // Extract CPU and performance metrics for each node + const nodeStats = Object.entries(res.nodes).map(([nodeId, node]) => ({ + node_id: nodeId, + node_name: node.name, + timestamp: new Date().toISOString(), + cpu: { + percent: node.process?.cpu?.percent, // CPU usage percentage + total_in_millis: node.process?.cpu?.total_in_millis, // Total CPU time + }, + jvm: { + mem: { + heap_used_percent: node.jvm?.mem?.heap_used_percent, // Heap usage % + heap_used_in_bytes: node.jvm?.mem?.heap_used_in_bytes, + heap_max_in_bytes: node.jvm?.mem?.heap_max_in_bytes, + }, + gc: { + collectors: node.jvm?.gc?.collectors, // GC stats + }, + }, + os: { + cpu: { + percent: node.os?.cpu?.percent, // OS-level CPU % + load_average: node.os?.cpu?.load_average, // Load average (1m, 5m, 15m) + }, + mem: { + used_percent: node.os?.mem?.used_percent, // OS memory usage % + total_in_bytes: node.os?.mem?.total_in_bytes, + free_in_bytes: node.os?.mem?.free_in_bytes, + }, + }, + })); + + log(JSON.stringify({ nodes: nodeStats })); + }; + + const int = setInterval(async () => { + await logNodeStats(); + + if (stopCalled || stop) { + clearInterval(int); + stream.end(); + } + }, interval); + + return stopCallback; +}; + +const 
logKibanaStatsEvery = (name: string, interval: number): (() => void) => { + let stopCalled = false; + + const stopCallback = () => { + stopCalled = true; + }; + + const logFile = `${LOGS_DIRECTORY}/${name}-${new Date().toISOString()}-kibana-stats.log`; + + const stream = fs.createWriteStream(logFile, { flags: 'a' }); + + const log = (message: string) => { + stream.write(`${new Date().toISOString()} - ${message}\n`); + }; + + const logKibanaStats = async () => { + try { + const stats = await kibanaFetch<{ + process: { + uptime_in_millis: number; + memory: { + heap: { + total_in_bytes: number; + used_in_bytes: number; + size_limit: number; + }; + }; + }; + requests: { + total: number; + disconnects: number; + statusCodes: Record; + }; + response_times: { + avg_in_millis: number; + max_in_millis: number; + }; + concurrent_connections: number; + os: { + load: { + '1m': number; + '5m': number; + '15m': number; + }; + memory: { + total_in_bytes: number; + free_in_bytes: number; + used_in_bytes: number; + }; + }; + }>( + '/api/stats', + { + method: 'GET', + }, + { + apiVersion: '1', + } + ); + + log(JSON.stringify(stats)); + } catch (error) { + log(JSON.stringify({ error: error instanceof Error ? error.message : String(error) })); + } + }; + + const int = setInterval(async () => { + await logKibanaStats(); + + if (stopCalled || stop) { + clearInterval(int); + stream.end(); + } + }, interval); + + return stopCallback; +}; + +export const createPerfDataFile = async ({ entityCount, logsPerEntity, startIndex, name, + distribution = DEFAULT_DISTRIBUTION, }: { name: string; entityCount: number; logsPerEntity: number; startIndex: number; -}) => { + distribution?: DistributionType; +}): Promise => { const filePath = getFilePath(name); + const dist = getEntityDistribution(distribution); + const entityCounts = calculateEntityCounts(entityCount, dist); + + console.log( + `Creating performance data file ${name} with ${entityCount} entities and ${logsPerEntity} logs per entity. Starting at index ${startIndex}` + ); console.log( - `Creating performance data file ${name} at with ${entityCount} entities and ${logsPerEntity} logs per entity. Starting at index ${startIndex}` + `Distribution (${distribution}): ${entityCounts.user} users (${(dist.user * 100).toFixed(1)}%), ` + + `${entityCounts.host} hosts (${(dist.host * 100).toFixed(1)}%), ` + + `${entityCounts.service} services (${(dist.service * 100).toFixed(1)}%), ` + + `${entityCounts.generic} generic entities (${(dist.generic * 100).toFixed(1)}%)` ); if (fs.existsSync(filePath)) { @@ -382,47 +899,120 @@ export const createPerfDataFile = ({ // we will write to the file as we generate the data to avoid running out of memory const writeStream = fs.createWriteStream(filePath, { flags: 'a' }); + // Map entity types to their generator functions for cleaner code + const entityGenerators: Record< + EntityType, + (opts: GeneratorOptions) => HostFields | UserFields | ServiceFields | GenericEntityFields + > = { + host: generateHostFields, + user: generateUserFields, + service: generateServiceFields, + generic: generateGenericEntityFields, + }; + const generateLogs = async () => { - for (let i = 0; i < entityCount; i++) { - // we generate 50/50 host/user entities - const entityType = i % 2 === 0 ? 
'host' : 'user'; - - // user-0 host-0 user-1 host-1 user-2 host-2 - const entityIndex = Math.floor(i / 2) + 1; - - for (let j = 0; j < logsPerEntity; j++) { - // start index for IP/MAC addresses - // host-0: 0-1, host-1: 2-3, host-2: 4-5 - const valueStartIndex = startIndex + j * FIELD_LENGTH; - const generatorOpts = { - entityIndex, - valueStartIndex: valueStartIndex, - fieldLength: FIELD_LENGTH, - idPrefix: name, - }; - const doc = { - // @timestamp is generated on ingest - ...(entityType === 'host' - ? generateHostFields(generatorOpts) - : generateUserFields(generatorOpts)), - message: faker.lorem.sentence(), - tags: ['entity-store-perf'], - }; + let globalEntityIndex = 0; + let streamError: Error | null = null; + + // Set up error handler once for the entire stream + writeStream.on('error', (error) => { + streamError = error; + }); - writeStream.write(JSON.stringify(doc) + '\n'); - progress.increment(); + // Generate entities in order: users, hosts, services, generic + const entityOrder: Array<{ type: EntityType; count: number }> = [ + { type: 'user', count: entityCounts.user }, + { type: 'host', count: entityCounts.host }, + { type: 'service', count: entityCounts.service }, + { type: 'generic', count: entityCounts.generic }, + ]; + + // Helper function to write to stream and wait for drain if needed + const writeToStream = (data: string): Promise => { + return new Promise((resolve, reject) => { + // Check for previous errors + if (streamError) { + reject(streamError); + return; + } + + const canContinue = writeStream.write(data); + if (canContinue) { + // Check again after write in case error occurred synchronously + if (streamError) { + reject(streamError); + } else { + resolve(); + } + } else { + writeStream.once('drain', () => { + if (streamError) { + reject(streamError); + } else { + resolve(); + } + }); + } + }); + }; + + try { + for (const { type, count } of entityOrder) { + for (let i = 0; i < count; i++) { + const entityIndex = i + 1; // 1-based index for entity within its type + + for (let j = 0; j < logsPerEntity; j++) { + // Fix: Calculate valueStartIndex to ensure unique IP addresses per entity + // Each entity gets a unique range: entity 0 uses 0-1, entity 1 uses 2-3, etc. 
+ // Each log within an entity also gets unique values + const valueStartIndex = + startIndex + globalEntityIndex * logsPerEntity * FIELD_LENGTH + j * FIELD_LENGTH; + + const generatorOpts = { + entityIndex, + valueStartIndex, + fieldLength: FIELD_LENGTH, + idPrefix: name, + }; + + // Use map lookup instead of if/else chain + const doc = entityGenerators[type](generatorOpts); + + const finalDoc = { + ...doc, + message: faker.lorem.sentence(), + tags: ['entity-store-perf'], + }; + + await writeToStream(JSON.stringify(finalDoc) + '\n'); + progress.increment(); + } + + globalEntityIndex++; + // Yield to the event loop to prevent blocking + await new Promise((resolve) => setImmediate(resolve)); + } } - // Yield to the event loop to prevent blocking - await new Promise((resolve) => setImmediate(resolve)); + // Wait for all writes to complete before closing + await new Promise((resolve, reject) => { + writeStream.once('finish', resolve); + writeStream.once('error', reject); + writeStream.end(); + }); + + progress.stop(); + console.log(`Data file ${filePath} created`); + } catch (error) { + // Ensure stream is closed even on error + writeStream.destroy(); + progress.stop(); + throw error; } - progress.stop(); - console.log(`Data file ${filePath} created`); }; - generateLogs().catch((err) => { - console.error('Error generating logs:', err); - }); + // Properly await the operation to ensure stream is closed + await generateLogs(); }; export const uploadFile = async ({ @@ -512,6 +1102,10 @@ export const uploadPerfDataFile = async ( await deleteAllEntities(); console.log('All entities deleted'); + console.log('Deleting data stream...'); + await deleteDataStream(index); + console.log('Data stream deleted'); + console.log('Deleting logs index...'); await deleteLogsIndex(index); console.log('Logs index deleted'); @@ -526,7 +1120,7 @@ export const uploadPerfDataFile = async ( } console.log('initialising entity engines'); - await initEntityEngineForEntityTypes(['host', 'user']); + await initEntityEngineForEntityTypes(['host', 'user', 'service', 'generic']); console.log('entity engines initialised'); const { lineCount, logsPerEntity, entityCount } = await getFileStats(filePath); @@ -551,17 +1145,23 @@ export const uploadPerfDataFileInterval = async ( intervalMs: number, uploadCount: number, deleteEntities?: boolean, - doDeleteEngines?: boolean + doDeleteEngines?: boolean, + transformTimeoutMs?: number, + samplingIntervalMs?: number, + noTransforms?: boolean ) => { // eslint-disable-next-line @typescript-eslint/no-explicit-any const addIdPrefix = (prefix: string) => (doc: Record) => { - const isHost = !!doc.host; - - if (isHost) { + if (doc.host) { return changeHostName(doc, prefix); + } else if (doc.user) { + return changeUserName(doc, prefix); + } else if (doc.service) { + return changeServiceName(doc, prefix); + } else if (doc.entity && doc.cloud) { + return changeGenericEntityName(doc, prefix); } - - return changeUserName(doc, prefix); + return doc; }; const index = `logs-perftest.${name}-default`; @@ -581,6 +1181,10 @@ export const uploadPerfDataFileInterval = async ( await deleteAllEntities(); console.log('All entities deleted'); + console.log('Deleting data stream...'); + await deleteDataStream(index); + console.log('Data stream deleted'); + console.log('Deleting logs index...'); await deleteLogsIndex(index); console.log('Logs index deleted'); @@ -591,9 +1195,13 @@ export const uploadPerfDataFileInterval = async ( process.exit(1); } + // Initialize entity engines (required for both transform and 
no-transform modes + // because we need to check entities in .entities.v1.latest* indices) console.log('initialising entity engines'); - await initEntityEngineForEntityTypes(['host', 'user']); + await ensureSecurityDefaultDataView('default'); + + await initEntityEngineForEntityTypes(['host', 'user', 'service', 'generic']); console.log('entity engines initialised'); @@ -607,8 +1215,18 @@ export const uploadPerfDataFileInterval = async ( let previousUpload = Promise.resolve(); - const stopHealthLogging = logClusterHealthEvery(name, 5000); - const stopTransformsLogging = logTransformStatsEvery(name, 5000); + // Use configurable sampling interval, default to 5 seconds (5000ms) + const samplingInterval = samplingIntervalMs ?? 5000; + + const stopHealthLogging = logClusterHealthEvery(name, samplingInterval); + // Only log transform stats if transforms are enabled + const stopTransformsLogging = noTransforms + ? () => { + // No-op function when transforms are disabled + } + : logTransformStatsEvery(name, samplingInterval); + const stopNodeStatsLogging = logNodeStatsEvery(name, samplingInterval); + const stopKibanaStatsLogging = logKibanaStatsEvery(name, samplingInterval); for (let i = 0; i < uploadCount; i++) { if (stop) { @@ -661,10 +1279,36 @@ export const uploadPerfDataFileInterval = async ( await countEntitiesUntil(name, entityCount * uploadCount); + // Only wait for transform completion if transforms are enabled + if (!noTransforms) { + // Wait for generic transform to finish processing all documents + // Generic transform processes ALL documents (host + user + service + generic) + const totalDocumentsIngested = lineCount * uploadCount; + const timeout = transformTimeoutMs ?? 1800000; // Default 30 minutes + console.log( + `Waiting for generic transform to process ${totalDocumentsIngested} documents (timeout: ${timeout / 1000 / 60} minutes)...` + ); + try { + await waitForTransformToComplete( + 'entities-v1-latest-security_generic_default', + totalDocumentsIngested, + timeout + ); + } catch (error) { + console.warn( + `Warning: ${error instanceof Error ? error.message : 'Failed to wait for transform completion'}. 
Continuing...` + ); + } + } else { + console.log('Skipping transform completion wait (--noTransforms mode)'); + } + const tookTotal = Date.now() - startTime; stopHealthLogging(); stopTransformsLogging(); + stopNodeStatsLogging(); + stopKibanaStatsLogging(); console.log(`Total time: ${tookTotal}ms`); }; diff --git a/src/commands/utils/baseline_metrics/calculators/entity_metrics_calculator.ts b/src/commands/utils/baseline_metrics/calculators/entity_metrics_calculator.ts new file mode 100644 index 0000000..27f7fa7 --- /dev/null +++ b/src/commands/utils/baseline_metrics/calculators/entity_metrics_calculator.ts @@ -0,0 +1,64 @@ +import { EntityTypeMetrics, EntityTypeData, TransformStatsData } from '../types'; +import { percentile, avg, max } from '../utils'; + +/** + * Calculate entity type metrics from entity data + */ +const calculateEntityTypeMetrics = (entityData: EntityTypeData): EntityTypeMetrics => { + const sortedSearch = [...entityData.searchLatencies].sort((a, b) => a - b); + const sortedIndex = [...entityData.indexLatencies].sort((a, b) => a - b); + const sortedProcessing = [...entityData.processingLatencies].sort((a, b) => a - b); + + return { + searchLatency: { + avg: avg(entityData.searchLatencies), + p50: percentile(sortedSearch, 50), + p95: percentile(sortedSearch, 95), + p99: percentile(sortedSearch, 99), + max: max(entityData.searchLatencies), + }, + intakeLatency: { + avg: avg(entityData.indexLatencies), + p50: percentile(sortedIndex, 50), + p95: percentile(sortedIndex, 95), + p99: percentile(sortedIndex, 99), + max: max(entityData.indexLatencies), + }, + processingLatency: { + avg: avg(entityData.processingLatencies), + p50: percentile(sortedProcessing, 50), + p95: percentile(sortedProcessing, 95), + p99: percentile(sortedProcessing, 99), + max: max(entityData.processingLatencies), + }, + // Use MAX values (final cumulative values) instead of summing + documentsProcessed: max(entityData.documentsProcessed), + documentsIndexed: max(entityData.documentsIndexed), + pagesProcessed: max(entityData.pagesProcessed), + triggerCount: max(entityData.triggerCounts), + sampleCounts: { + search: entityData.searchLatencies.length, + index: entityData.indexLatencies.length, + processing: entityData.processingLatencies.length, + }, + }; +}; + +/** + * Calculate per-entity-type metrics from transform stats data + */ +export const calculateEntityMetrics = ( + transformData: TransformStatsData +): { + host: EntityTypeMetrics; + user: EntityTypeMetrics; + service: EntityTypeMetrics; + generic: EntityTypeMetrics; +} => { + return { + host: calculateEntityTypeMetrics(transformData.perEntityType.host), + user: calculateEntityTypeMetrics(transformData.perEntityType.user), + service: calculateEntityTypeMetrics(transformData.perEntityType.service), + generic: calculateEntityTypeMetrics(transformData.perEntityType.generic), + }; +}; diff --git a/src/commands/utils/baseline_metrics/calculators/kibana_metrics_calculator.ts b/src/commands/utils/baseline_metrics/calculators/kibana_metrics_calculator.ts new file mode 100644 index 0000000..3dc3021 --- /dev/null +++ b/src/commands/utils/baseline_metrics/calculators/kibana_metrics_calculator.ts @@ -0,0 +1,158 @@ +import { percentile, avg, max, safeDivide } from '../utils'; + +interface KibanaStatsData { + eventLoopDelays: number[]; + eventLoopDelayPercentiles: { + p50: number[]; + p95: number[]; + p99: number[]; + }; + eventLoopUtilizations: number[]; + esClientActiveSockets: number[]; + esClientIdleSockets: number[]; + esClientQueuedRequests: number[]; + 
responseTimes: number[]; + maxResponseTimes: number[]; + heapBytes: number[]; + rssBytes: number[]; + requestTotals: number[]; + requestErrorCounts: number[]; + requestDisconnects: number[]; + osLoad1m: number[]; + osLoad5m: number[]; + osLoad15m: number[]; + timeSpan: number; +} + +export interface KibanaMetrics { + eventLoop: { + delay: { + avg: number; + p50: number; + p95: number; + p99: number; + max: number; + }; + utilization: { + avg: number; + peak: number; + }; + }; + elasticsearchClient: { + avgActiveSockets: number; + avgIdleSockets: number; + peakQueuedRequests: number; + }; + responseTimes: { + avg: number; + max: number; + }; + memory: { + avgHeapBytes: number; + peakHeapBytes: number; + avgRssBytes: number; + peakRssBytes: number; + }; + requests: { + total: number; + avgPerSecond: number; + errorRate: number; + disconnects: number; + }; + osLoad: { + avg1m: number; + avg5m: number; + avg15m: number; + peak1m: number; + }; +} + +/** + * Calculate Kibana metrics from Kibana stats data + */ +export const calculateKibanaMetrics = (kibanaData: KibanaStatsData | null): KibanaMetrics => { + if (!kibanaData) { + return { + eventLoop: { + delay: { avg: 0, p50: 0, p95: 0, p99: 0, max: 0 }, + utilization: { avg: 0, peak: 0 }, + }, + elasticsearchClient: { + avgActiveSockets: 0, + avgIdleSockets: 0, + peakQueuedRequests: 0, + }, + responseTimes: { avg: 0, max: 0 }, + memory: { + avgHeapBytes: 0, + peakHeapBytes: 0, + avgRssBytes: 0, + peakRssBytes: 0, + }, + requests: { + total: 0, + avgPerSecond: 0, + errorRate: 0, + disconnects: 0, + }, + osLoad: { + avg1m: 0, + avg5m: 0, + avg15m: 0, + peak1m: 0, + }, + }; + } + + return { + eventLoop: { + delay: { + avg: avg(kibanaData.eventLoopDelays), + p50: percentile( + [...kibanaData.eventLoopDelayPercentiles.p50].sort((a, b) => a - b), + 50 + ), + p95: percentile( + [...kibanaData.eventLoopDelayPercentiles.p95].sort((a, b) => a - b), + 95 + ), + p99: percentile( + [...kibanaData.eventLoopDelayPercentiles.p99].sort((a, b) => a - b), + 99 + ), + max: max(kibanaData.eventLoopDelays), + }, + utilization: { + avg: avg(kibanaData.eventLoopUtilizations), + peak: max(kibanaData.eventLoopUtilizations), + }, + }, + elasticsearchClient: { + avgActiveSockets: avg(kibanaData.esClientActiveSockets), + avgIdleSockets: avg(kibanaData.esClientIdleSockets), + peakQueuedRequests: max(kibanaData.esClientQueuedRequests), + }, + responseTimes: { + avg: avg(kibanaData.responseTimes), + max: max(kibanaData.maxResponseTimes), + }, + memory: { + avgHeapBytes: avg(kibanaData.heapBytes), + peakHeapBytes: max(kibanaData.heapBytes), + avgRssBytes: avg(kibanaData.rssBytes), + peakRssBytes: max(kibanaData.rssBytes), + }, + requests: { + total: max(kibanaData.requestTotals), + avgPerSecond: safeDivide(max(kibanaData.requestTotals), kibanaData.timeSpan), + errorRate: avg(kibanaData.requestErrorCounts), + disconnects: max(kibanaData.requestDisconnects), + }, + osLoad: { + avg1m: avg(kibanaData.osLoad1m), + avg5m: avg(kibanaData.osLoad5m), + avg15m: avg(kibanaData.osLoad15m), + peak1m: max(kibanaData.osLoad1m), + }, + }; +}; diff --git a/src/commands/utils/baseline_metrics/calculators/latency_calculator.ts b/src/commands/utils/baseline_metrics/calculators/latency_calculator.ts new file mode 100644 index 0000000..4ab6b82 --- /dev/null +++ b/src/commands/utils/baseline_metrics/calculators/latency_calculator.ts @@ -0,0 +1,67 @@ +import { TransformStatsData } from '../types'; +import { percentile, avg, max } from '../utils'; + +export interface LatencyMetrics { + searchLatency: { 
+    avg: number;
+    p50: number;
+    p95: number;
+    p99: number;
+    max: number;
+  };
+  intakeLatency: {
+    avg: number;
+    p50: number;
+    p95: number;
+    p99: number;
+    max: number;
+  };
+  processingLatency: {
+    avg: number;
+    p50: number;
+    p95: number;
+    p99: number;
+    max: number;
+  };
+}
+
+/**
+ * Calculate latency metrics from transform stats data
+ */
+export const calculateLatencyMetrics = (transformData: TransformStatsData): LatencyMetrics => {
+  // Calculate search latency metrics
+  const sortedSearchLatencies = [...transformData.searchLatencies].sort((a, b) => a - b);
+  const searchLatency = {
+    avg: avg(transformData.searchLatencies),
+    p50: percentile(sortedSearchLatencies, 50),
+    p95: percentile(sortedSearchLatencies, 95),
+    p99: percentile(sortedSearchLatencies, 99),
+    max: max(transformData.searchLatencies),
+  };
+
+  // Calculate index latency metrics
+  const sortedIndexLatencies = [...transformData.indexLatencies].sort((a, b) => a - b);
+  const intakeLatency = {
+    avg: avg(transformData.indexLatencies),
+    p50: percentile(sortedIndexLatencies, 50),
+    p95: percentile(sortedIndexLatencies, 95),
+    p99: percentile(sortedIndexLatencies, 99),
+    max: max(transformData.indexLatencies),
+  };
+
+  // Calculate processing latency metrics
+  const sortedProcessingLatencies = [...transformData.processingLatencies].sort((a, b) => a - b);
+  const processingLatency = {
+    avg: avg(transformData.processingLatencies),
+    p50: percentile(sortedProcessingLatencies, 50),
+    p95: percentile(sortedProcessingLatencies, 95),
+    p99: percentile(sortedProcessingLatencies, 99),
+    max: max(transformData.processingLatencies),
+  };
+
+  return {
+    searchLatency,
+    intakeLatency,
+    processingLatency,
+  };
+};
diff --git a/src/commands/utils/baseline_metrics/calculators/system_metrics_calculator.ts b/src/commands/utils/baseline_metrics/calculators/system_metrics_calculator.ts
new file mode 100644
index 0000000..699fe04
--- /dev/null
+++ b/src/commands/utils/baseline_metrics/calculators/system_metrics_calculator.ts
@@ -0,0 +1,158 @@
+import { TransformStatsData, EntityTypeMetrics } from '../types';
+import { avg, max, safeDivide, last } from '../utils';
+
+export interface SystemMetrics {
+  cpu: {
+    avg: number;
+    peak: number;
+    avgPerNode: Record<string, number>;
+  };
+  memory: {
+    avgHeapPercent: number;
+    peakHeapPercent: number;
+    avgHeapBytes: number;
+    peakHeapBytes: number;
+  };
+  throughput: {
+    avgDocumentsPerSecond: number;
+    peakDocumentsPerSecond: number;
+  };
+  indexEfficiency: {
+    avgRatio: number;
+    totalDocumentsIndexed: number;
+    totalDocumentsProcessed: number;
+  };
+  pagesProcessed: {
+    total: number;
+    avgPerSample: number;
+  };
+  triggerCount: {
+    total: number;
+    avgPerTransform: number;
+  };
+  exponentialAverages: {
+    checkpointDuration: number;
+    documentsIndexed: number;
+    documentsProcessed: number;
+  };
+}
+
+interface NodeStatsData {
+  cpuPercentages: number[];
+  heapPercentages: number[];
+  heapBytes: number[];
+  cpuPerNode: Record<string, number[]>;
+}
+
+/**
+ * Calculate system metrics from transform and node stats data
+ */
+export const calculateSystemMetrics = (
+  transformData: TransformStatsData,
+  nodeData: NodeStatsData,
+  perEntityType: {
+    host: EntityTypeMetrics;
+    user: EntityTypeMetrics;
+    service: EntityTypeMetrics;
+    generic: EntityTypeMetrics;
+  }
+): SystemMetrics => {
+  // Calculate CPU metrics
+  const avgCpuPerNode: Record<string, number> = {};
+  for (const [nodeName, cpuValues] of Object.entries(nodeData.cpuPerNode)) {
+    avgCpuPerNode[nodeName] = avg(cpuValues);
+  }
+
+  const cpu = {
+    avg: avg(nodeData.cpuPercentages),
+ peak: max(nodeData.cpuPercentages), + avgPerNode: avgCpuPerNode, + }; + + // Calculate memory metrics + const memory = { + avgHeapPercent: avg(nodeData.heapPercentages), + peakHeapPercent: max(nodeData.heapPercentages), + avgHeapBytes: avg(nodeData.heapBytes), + peakHeapBytes: max(nodeData.heapBytes), + }; + + // Calculate throughput + // Sum the MAX values from each entity type (cumulative totals) + const timeSpan = + transformData.timestamps.length > 1 + ? (max(transformData.timestamps) - Math.min(...transformData.timestamps)) / 1000 + : 1; + const totalDocuments = + perEntityType.host.documentsProcessed + + perEntityType.user.documentsProcessed + + perEntityType.service.documentsProcessed + + perEntityType.generic.documentsProcessed; + const avgDocumentsPerSecond = safeDivide(totalDocuments, timeSpan); + const peakDocumentsPerSecond = + transformData.documentsProcessed.length > 0 && timeSpan > 0 + ? max(transformData.documentsProcessed) / (timeSpan / transformData.documentsProcessed.length) + : 0; + + // Calculate index efficiency + // Sum the MAX values from each entity type (cumulative totals) + const totalDocumentsIndexed = + perEntityType.host.documentsIndexed + + perEntityType.user.documentsIndexed + + perEntityType.service.documentsIndexed + + perEntityType.generic.documentsIndexed; + const totalDocumentsProcessed = + perEntityType.host.documentsProcessed + + perEntityType.user.documentsProcessed + + perEntityType.service.documentsProcessed + + perEntityType.generic.documentsProcessed; + const indexEfficiency = { + avgRatio: safeDivide(totalDocumentsIndexed, totalDocumentsProcessed), + totalDocumentsIndexed, + totalDocumentsProcessed, + }; + + // Calculate pages processed metrics + // Sum the MAX values from each entity type (cumulative totals) + const totalPagesProcessed = + perEntityType.host.pagesProcessed + + perEntityType.user.pagesProcessed + + perEntityType.service.pagesProcessed + + perEntityType.generic.pagesProcessed; + const pagesProcessed = { + total: totalPagesProcessed, + avgPerSample: safeDivide(totalPagesProcessed, transformData.pagesProcessed.length), + }; + + // Calculate trigger count metrics + // Sum the MAX values from each entity type (cumulative totals) + const totalTriggerCount = + perEntityType.host.triggerCount + + perEntityType.user.triggerCount + + perEntityType.service.triggerCount + + perEntityType.generic.triggerCount; + const triggerCount = { + total: totalTriggerCount, + avgPerTransform: safeDivide(totalTriggerCount, transformData.triggerCounts.length), + }; + + // Calculate exponential averages (use last non-zero value) + const exponentialAverages = { + checkpointDuration: last(transformData.exponentialAverages.checkpointDuration), + documentsIndexed: last(transformData.exponentialAverages.documentsIndexed), + documentsProcessed: last(transformData.exponentialAverages.documentsProcessed), + }; + + return { + cpu, + memory, + throughput: { + avgDocumentsPerSecond, + peakDocumentsPerSecond, + }, + indexEfficiency, + pagesProcessed, + triggerCount, + exponentialAverages, + }; +}; diff --git a/src/commands/utils/baseline_metrics/index.ts b/src/commands/utils/baseline_metrics/index.ts new file mode 100644 index 0000000..680aeed --- /dev/null +++ b/src/commands/utils/baseline_metrics/index.ts @@ -0,0 +1,147 @@ +import fs from 'fs'; +import path from 'path'; +import { BaselineMetrics } from './types'; +import { parseTransformStats, createEmptyTransformData } from './parsers/transform_stats_parser'; +import { parseNodeStats } from 
'./parsers/node_stats_parser';
+import { parseClusterHealth } from './parsers/cluster_health_parser';
+import { parseKibanaStats } from './parsers/kibana_stats_parser';
+import { calculateLatencyMetrics } from './calculators/latency_calculator';
+import { calculateSystemMetrics } from './calculators/system_metrics_calculator';
+import { calculateEntityMetrics } from './calculators/entity_metrics_calculator';
+import { calculateKibanaMetrics } from './calculators/kibana_metrics_calculator';
+import {
+  saveBaseline,
+  loadBaseline,
+  listBaselines,
+  findBaselineByPattern,
+  loadBaselineWithPattern,
+} from './storage';
+
+// Re-export types
+export type { BaselineMetrics } from './types';
+
+// Re-export storage functions
+export {
+  saveBaseline,
+  loadBaseline,
+  listBaselines,
+  findBaselineByPattern,
+  loadBaselineWithPattern,
+};
+
+/**
+ * Extract baseline metrics from log files
+ */
+export const extractBaselineMetrics = async (
+  logPrefix: string,
+  testConfig: {
+    entityCount: number;
+    logsPerEntity: number;
+    uploadCount?: number;
+    intervalMs?: number;
+  }
+): Promise<BaselineMetrics> => {
+  const logsDir = path.join(process.cwd(), 'logs');
+
+  // Check if logs directory exists
+  if (!fs.existsSync(logsDir)) {
+    throw new Error(`Logs directory does not exist: ${logsDir}`);
+  }
+
+  // Find log files with the given prefix
+  let files: string[];
+  try {
+    files = fs.readdirSync(logsDir);
+  } catch (error) {
+    throw new Error(
+      `Failed to read logs directory ${logsDir}: ${error instanceof Error ? error.message : String(error)}`
+    );
+  }
+  const clusterHealthLog = files.find(
+    (f) => f.startsWith(logPrefix) && f.includes('cluster-health')
+  );
+  const nodeStatsLog = files.find((f) => f.startsWith(logPrefix) && f.includes('node-stats'));
+  const transformStatsLog = files.find(
+    (f) => f.startsWith(logPrefix) && f.includes('transform-stats')
+  );
+  const kibanaStatsLog = files.find((f) => f.startsWith(logPrefix) && f.includes('kibana-stats'));
+
+  // Only require cluster-health and node-stats logs
+  // transform-stats and kibana-stats are optional
+  if (!clusterHealthLog || !nodeStatsLog) {
+    throw new Error(
+      `Could not find required log files with prefix "${logPrefix}". Found: ${JSON.stringify({ clusterHealthLog, nodeStatsLog, transformStatsLog, kibanaStatsLog })}`
+    );
+  }
+
+  const logFiles = [clusterHealthLog, nodeStatsLog];
+  if (transformStatsLog) logFiles.push(transformStatsLog);
+  if (kibanaStatsLog) logFiles.push(kibanaStatsLog);
+  console.log(`Parsing logs: ${logFiles.join(', ')}`);
+
+  // Parse logs - provide empty transform data if transform stats log is missing
+  const transformData = transformStatsLog
+    ? parseTransformStats(path.join(logsDir, transformStatsLog))
+    : createEmptyTransformData();
+  const nodeData = parseNodeStats(path.join(logsDir, nodeStatsLog));
+  const clusterData = parseClusterHealth(path.join(logsDir, clusterHealthLog));
+  const kibanaData = kibanaStatsLog ?
parseKibanaStats(path.join(logsDir, kibanaStatsLog)) : null; + + // Calculate per-entity-type metrics first (needed for totals) + const perEntityType = calculateEntityMetrics(transformData); + + // Calculate latency metrics + const latencyMetrics = calculateLatencyMetrics(transformData); + + // Calculate system metrics + const systemMetrics = calculateSystemMetrics(transformData, nodeData, perEntityType); + + // Calculate errors + const errors = { + searchFailures: transformData.searchFailures, + indexFailures: transformData.indexFailures, + totalFailures: transformData.searchFailures + transformData.indexFailures, + }; + + // Cluster health + const clusterHealth = { + status: + clusterData.statuses.length > 0 + ? clusterData.statuses[clusterData.statuses.length - 1] + : 'unknown', + avgActiveShards: + clusterData.activeShards.length > 0 + ? clusterData.activeShards.reduce((a, b) => a + b, 0) / clusterData.activeShards.length + : 0, + unassignedShards: + clusterData.unassignedShards.length > 0 ? Math.max(...clusterData.unassignedShards) : 0, + }; + + // Calculate Kibana metrics + const kibana = calculateKibanaMetrics(kibanaData); + + const baseline: BaselineMetrics = { + testName: logPrefix, + timestamp: new Date().toISOString(), + testConfig, + metrics: { + searchLatency: latencyMetrics.searchLatency, + intakeLatency: latencyMetrics.intakeLatency, + processingLatency: latencyMetrics.processingLatency, + cpu: systemMetrics.cpu, + memory: systemMetrics.memory, + throughput: systemMetrics.throughput, + indexEfficiency: systemMetrics.indexEfficiency, + pagesProcessed: systemMetrics.pagesProcessed, + triggerCount: systemMetrics.triggerCount, + exponentialAverages: systemMetrics.exponentialAverages, + perEntityType, + transformStates: transformData.transformStates, + errors, + clusterHealth, + kibana, + }, + }; + + return baseline; +}; diff --git a/src/commands/utils/baseline_metrics/parsers/cluster_health_parser.ts b/src/commands/utils/baseline_metrics/parsers/cluster_health_parser.ts new file mode 100644 index 0000000..335dd3c --- /dev/null +++ b/src/commands/utils/baseline_metrics/parsers/cluster_health_parser.ts @@ -0,0 +1,43 @@ +import { readFileSafely } from '../utils'; + +/** + * Parse cluster health log + */ +export const parseClusterHealth = ( + logPath: string +): { + statuses: string[]; + activeShards: number[]; + unassignedShards: number[]; +} => { + const content = readFileSafely(logPath, 'Cluster health log file'); + const lines = content.split('\n').filter((line) => line.trim()); + + const statuses: string[] = []; + const activeShards: number[] = []; + const unassignedShards: number[] = []; + + for (const line of lines) { + try { + const match = line.match(/^(\d{4}-\d{2}-\d{2}T[\d:.-]+Z)\s+-\s+(.+)$/); + if (!match) continue; + + const data = JSON.parse(match[2]); + + if (data.status) { + statuses.push(data.status); + } + if (data.active_shards !== undefined) { + activeShards.push(data.active_shards); + } + if (data.unassigned_shards !== undefined) { + unassignedShards.push(data.unassigned_shards); + } + } catch { + // Skip malformed lines + continue; + } + } + + return { statuses, activeShards, unassignedShards }; +}; diff --git a/src/commands/utils/baseline_metrics/parsers/kibana_stats_parser.ts b/src/commands/utils/baseline_metrics/parsers/kibana_stats_parser.ts new file mode 100644 index 0000000..1750559 --- /dev/null +++ b/src/commands/utils/baseline_metrics/parsers/kibana_stats_parser.ts @@ -0,0 +1,181 @@ +import { readFileSafely } from '../utils'; + +/** + * Parse Kibana 
stats log and extract metrics + */ +export const parseKibanaStats = ( + logPath: string +): { + eventLoopDelays: number[]; + eventLoopDelayPercentiles: { + p50: number[]; + p95: number[]; + p99: number[]; + }; + eventLoopUtilizations: number[]; + esClientActiveSockets: number[]; + esClientIdleSockets: number[]; + esClientQueuedRequests: number[]; + responseTimes: number[]; + maxResponseTimes: number[]; + heapBytes: number[]; + rssBytes: number[]; + requestTotals: number[]; + requestErrorCounts: number[]; + requestDisconnects: number[]; + osLoad1m: number[]; + osLoad5m: number[]; + osLoad15m: number[]; + timestamps: number[]; + timeSpan: number; +} => { + const content = readFileSafely(logPath, 'Kibana stats log file'); + const lines = content.split('\n').filter((line) => line.trim()); + + const eventLoopDelays: number[] = []; + const eventLoopDelayPercentiles = { + p50: [] as number[], + p95: [] as number[], + p99: [] as number[], + }; + const eventLoopUtilizations: number[] = []; + const esClientActiveSockets: number[] = []; + const esClientIdleSockets: number[] = []; + const esClientQueuedRequests: number[] = []; + const responseTimes: number[] = []; + const maxResponseTimes: number[] = []; + const heapBytes: number[] = []; + const rssBytes: number[] = []; + const requestTotals: number[] = []; + const requestErrorCounts: number[] = []; + const requestDisconnects: number[] = []; + const osLoad1m: number[] = []; + const osLoad5m: number[] = []; + const osLoad15m: number[] = []; + const timestamps: number[] = []; + + for (const line of lines) { + try { + const match = line.match(/^(\d{4}-\d{2}-\d{2}T[\d:.-]+Z)\s+-\s+(.+)$/); + if (!match) continue; + + const timestamp = new Date(match[1]).getTime(); + timestamps.push(timestamp); + + const data = JSON.parse(match[2]); + + // Event loop metrics + if (data.process?.event_loop_delay !== undefined) { + eventLoopDelays.push(data.process.event_loop_delay); + } + if (data.process?.event_loop_delay_histogram?.percentiles) { + const percentiles = data.process.event_loop_delay_histogram.percentiles; + if (percentiles['50'] !== undefined) { + eventLoopDelayPercentiles.p50.push(percentiles['50']); + } + if (percentiles['95'] !== undefined) { + eventLoopDelayPercentiles.p95.push(percentiles['95']); + } + if (percentiles['99'] !== undefined) { + eventLoopDelayPercentiles.p99.push(percentiles['99']); + } + } + if (data.process?.event_loop_utilization?.utilization !== undefined) { + eventLoopUtilizations.push(data.process.event_loop_utilization.utilization); + } + + // Elasticsearch client metrics + if (data.elasticsearch_client) { + if (data.elasticsearch_client.total_active_sockets !== undefined) { + esClientActiveSockets.push(data.elasticsearch_client.total_active_sockets); + } + if (data.elasticsearch_client.total_idle_sockets !== undefined) { + esClientIdleSockets.push(data.elasticsearch_client.total_idle_sockets); + } + if (data.elasticsearch_client.total_queued_requests !== undefined) { + esClientQueuedRequests.push(data.elasticsearch_client.total_queued_requests); + } + } + + // Response times + if (data.response_times) { + if (data.response_times.avg_ms !== undefined) { + responseTimes.push(data.response_times.avg_ms); + } + if (data.response_times.max_ms !== undefined) { + maxResponseTimes.push(data.response_times.max_ms); + } + } + + // Memory metrics + if (data.process?.memory?.heap?.used_bytes !== undefined) { + heapBytes.push(data.process.memory.heap.used_bytes); + } + if (data.process?.memory?.resident_set_size_bytes !== undefined) { + 
rssBytes.push(data.process.memory.resident_set_size_bytes);
+      }
+
+      // Request metrics
+      if (data.requests) {
+        if (data.requests.total !== undefined) {
+          requestTotals.push(data.requests.total);
+        }
+        if (data.requests.disconnects !== undefined) {
+          requestDisconnects.push(data.requests.disconnects);
+        }
+        // Calculate error rate from status codes
+        if (data.requests.status_codes) {
+          let errorCount = 0;
+          const totalRequests = data.requests.total || 0;
+          for (const [code, count] of Object.entries(data.requests.status_codes)) {
+            const statusCode = parseInt(code, 10);
+            if (statusCode >= 400) {
+              errorCount += count as number;
+            }
+          }
+          requestErrorCounts.push(totalRequests > 0 ? (errorCount / totalRequests) * 100 : 0);
+        }
+      }
+
+      // OS load metrics
+      if (data.os?.load) {
+        if (data.os.load['1m'] !== undefined) {
+          osLoad1m.push(data.os.load['1m']);
+        }
+        if (data.os.load['5m'] !== undefined) {
+          osLoad5m.push(data.os.load['5m']);
+        }
+        if (data.os.load['15m'] !== undefined) {
+          osLoad15m.push(data.os.load['15m']);
+        }
+      }
+    } catch {
+      // Skip malformed lines
+      continue;
+    }
+  }
+
+  const timeSpan =
+    timestamps.length > 1 ? (Math.max(...timestamps) - Math.min(...timestamps)) / 1000 : 1;
+
+  return {
+    eventLoopDelays,
+    eventLoopDelayPercentiles,
+    eventLoopUtilizations,
+    esClientActiveSockets,
+    esClientIdleSockets,
+    esClientQueuedRequests,
+    responseTimes,
+    maxResponseTimes,
+    heapBytes,
+    rssBytes,
+    requestTotals,
+    requestErrorCounts,
+    requestDisconnects,
+    osLoad1m,
+    osLoad5m,
+    osLoad15m,
+    timestamps,
+    timeSpan,
+  };
+};
diff --git a/src/commands/utils/baseline_metrics/parsers/node_stats_parser.ts b/src/commands/utils/baseline_metrics/parsers/node_stats_parser.ts
new file mode 100644
index 0000000..716ee04
--- /dev/null
+++ b/src/commands/utils/baseline_metrics/parsers/node_stats_parser.ts
@@ -0,0 +1,67 @@
+import { readFileSafely } from '../utils';
+
+/**
+ * Parse node stats log and extract CPU and memory metrics
+ */
+export const parseNodeStats = (
+  logPath: string
+): {
+  cpuPercentages: number[];
+  heapPercentages: number[];
+  heapBytes: number[];
+  cpuPerNode: Record<string, number[]>;
+  timestamps: number[];
+} => {
+  const content = readFileSafely(logPath, 'Node stats log file');
+  const lines = content.split('\n').filter((line) => line.trim());
+
+  const cpuPercentages: number[] = [];
+  const heapPercentages: number[] = [];
+  const heapBytes: number[] = [];
+  const cpuPerNode: Record<string, number[]> = {};
+  const timestamps: number[] = [];
+
+  for (const line of lines) {
+    try {
+      const match = line.match(/^(\d{4}-\d{2}-\d{2}T[\d:.-]+Z)\s+-\s+(.+)$/);
+      if (!match) continue;
+
+      const timestamp = new Date(match[1]).getTime();
+      timestamps.push(timestamp);
+
+      const data = JSON.parse(match[2]);
+      if (!data.nodes || !Array.isArray(data.nodes)) continue;
+
+      for (const node of data.nodes) {
+        const nodeName = node.node_name || node.node_id || 'unknown';
+
+        if (node.cpu?.percent !== undefined) {
+          cpuPercentages.push(node.cpu.percent);
+          if (!cpuPerNode[nodeName]) {
+            cpuPerNode[nodeName] = [];
+          }
+          cpuPerNode[nodeName].push(node.cpu.percent);
+        }
+
+        if (node.jvm?.mem?.heap_used_percent !== undefined) {
+          heapPercentages.push(node.jvm.mem.heap_used_percent);
+        }
+
+        if (node.jvm?.mem?.heap_used_in_bytes !== undefined) {
+          heapBytes.push(node.jvm.mem.heap_used_in_bytes);
+        }
+      }
+    } catch {
+      // Skip malformed lines
+      continue;
+    }
+  }
+
+  return {
+    cpuPercentages,
+    heapPercentages,
+    heapBytes,
+    cpuPerNode,
+    timestamps,
+  };
+};
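All of these parsers key off the same log line shape, `<ISO timestamp> - <JSON payload>`. A minimal sketch of what they match (the payload values here are invented for illustration; the regex is the one the parsers use):

```
// Hypothetical node-stats log line; field values are made up.
const line =
  '2025-11-27T07:51:07.295Z - {"nodes":[{"node_name":"es-node-1","cpu":{"percent":42},"jvm":{"mem":{"heap_used_percent":61,"heap_used_in_bytes":1273741824}}}]}';

const match = line.match(/^(\d{4}-\d{2}-\d{2}T[\d:.-]+Z)\s+-\s+(.+)$/);
if (match) {
  const timestamp = new Date(match[1]).getTime(); // epoch millis from group 1
  const data = JSON.parse(match[2]); // JSON payload from group 2
  console.log(timestamp, data.nodes[0].cpu.percent); // -> <millis> 42
}
```

The transform-stats parser below uses a stricter variant of this regex that also captures the transform ID (`Transform <id> stats: <JSON>`).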
diff --git a/src/commands/utils/baseline_metrics/parsers/transform_stats_parser.ts b/src/commands/utils/baseline_metrics/parsers/transform_stats_parser.ts
new file mode 100644
index 0000000..64eb35d
--- /dev/null
+++ b/src/commands/utils/baseline_metrics/parsers/transform_stats_parser.ts
@@ -0,0 +1,385 @@
+import { TransformStatsData } from '../types';
+import { readFileSafely, MAX_REASONABLE_SAMPLING_INTERVAL_MS } from '../utils';
+
+/**
+ * Parse transform stats log and extract metrics
+ */
+export const parseTransformStats = (logPath: string): TransformStatsData => {
+  const content = readFileSafely(logPath, 'Transform stats log file');
+  const lines = content.split('\n').filter((line) => line.trim());
+
+  const searchLatencies: number[] = [];
+  const indexLatencies: number[] = [];
+  const processingLatencies: number[] = [];
+  const documentsProcessed: number[] = [];
+  const documentsIndexed: number[] = [];
+  const pagesProcessed: number[] = [];
+  const triggerCounts: number[] = [];
+  let searchFailures = 0;
+  let indexFailures = 0;
+  const timestamps: number[] = [];
+  const exponentialAverages = {
+    checkpointDuration: [] as number[],
+    documentsIndexed: [] as number[],
+    documentsProcessed: [] as number[],
+  };
+  const transformStates = {
+    indexing: 0,
+    started: 0,
+  };
+
+  const perEntityType = {
+    host: {
+      searchLatencies: [] as number[],
+      indexLatencies: [] as number[],
+      processingLatencies: [] as number[],
+      documentsProcessed: [] as number[],
+      documentsIndexed: [] as number[],
+      pagesProcessed: [] as number[],
+      triggerCounts: [] as number[],
+    },
+    user: {
+      searchLatencies: [] as number[],
+      indexLatencies: [] as number[],
+      processingLatencies: [] as number[],
+      documentsProcessed: [] as number[],
+      documentsIndexed: [] as number[],
+      pagesProcessed: [] as number[],
+      triggerCounts: [] as number[],
+    },
+    service: {
+      searchLatencies: [] as number[],
+      indexLatencies: [] as number[],
+      processingLatencies: [] as number[],
+      documentsProcessed: [] as number[],
+      documentsIndexed: [] as number[],
+      pagesProcessed: [] as number[],
+      triggerCounts: [] as number[],
+    },
+    generic: {
+      searchLatencies: [] as number[],
+      indexLatencies: [] as number[],
+      processingLatencies: [] as number[],
+      documentsProcessed: [] as number[],
+      documentsIndexed: [] as number[],
+      pagesProcessed: [] as number[],
+      triggerCounts: [] as number[],
+    },
+  };
+
+  // Track previous values per transform for incremental latency calculation
+  const prevValues: Record<
+    string,
+    {
+      searchTime: number;
+      searchTotal: number;
+      indexTime: number;
+      indexTotal: number;
+      processingTime: number;
+      processingTotal: number;
+    }
+  > = {};
+
+  // First pass: Collect timestamps to detect sampling interval
+  // Group timestamps by sampling batch (transforms are logged together sequentially)
+  const sampleTimestamps: number[] = [];
+  let lastBatchTime: number | null = null;
+  const BATCH_TOLERANCE_MS = 100; // Consider entries within 100ms as same batch
+
+  for (const line of lines) {
+    try {
+      const match = line.match(
+        /^(\d{4}-\d{2}-\d{2}T[\d:.-]+Z)\s+-\s+Transform\s+(.+?)\s+stats:\s+(.+)$/
+      );
+      if (match) {
+        const timestamp = new Date(match[1]).getTime();
+
+        // Only add timestamp if it's from a new batch (not within tolerance of last batch)
+        // This handles the case where 4 transforms are logged sequentially in quick succession
+        if (lastBatchTime === null || Math.abs(timestamp - lastBatchTime) > BATCH_TOLERANCE_MS) {
+          sampleTimestamps.push(timestamp);
+          lastBatchTime = timestamp;
+        }
+      }
+    } catch {
+      // Skip
malformed lines + } + } + + // Detect average sampling interval from timestamps + let avgSamplingInterval = 5000; // Default to 5 seconds + if (sampleTimestamps.length > 1) { + const intervals: number[] = []; + for (let i = 1; i < sampleTimestamps.length; i++) { + const interval = sampleTimestamps[i] - sampleTimestamps[i - 1]; + if (interval > 0 && interval < MAX_REASONABLE_SAMPLING_INTERVAL_MS) { + // Only consider intervals between 0 and MAX_REASONABLE_SAMPLING_INTERVAL_MS (5 minutes, reasonable range) + intervals.push(interval); + } + } + if (intervals.length > 0) { + // Use median interval to avoid outliers + intervals.sort((a, b) => a - b); + avgSamplingInterval = intervals[Math.floor(intervals.length / 2)]; + } + } + + // Adjust thresholds based on sampling interval (normalize to 5-second baseline) + // For 1-second sampling (0.2x), thresholds should be 0.2x of baseline + // For 5-second sampling (1.0x), thresholds remain at baseline + const intervalMultiplier = avgSamplingInterval / 5000; // 1.0 for 5s, 0.2 for 1s + const searchThreshold = Math.max(1, Math.floor(5 * intervalMultiplier)); + const indexThreshold = Math.max(1, Math.floor(10 * intervalMultiplier)); + const processingThreshold = Math.max(1, Math.floor(5 * intervalMultiplier)); + + // Second pass: Process data with adaptive thresholds + for (const line of lines) { + try { + const match = line.match( + /^(\d{4}-\d{2}-\d{2}T[\d:.-]+Z)\s+-\s+Transform\s+(.+?)\s+stats:\s+(.+)$/ + ); + if (!match) continue; + + const timestamp = new Date(match[1]).getTime(); + timestamps.push(timestamp); + + const transformId = match[2]; + const jsonStr = match[3]; + if (!jsonStr) continue; + + const data = JSON.parse(jsonStr); + const transform = data.transforms?.[0]; + if (!transform || !transform.stats) continue; + + const stats = transform.stats; + + // Determine entity type from transform ID + let entityType: 'host' | 'user' | 'service' | 'generic' | null = null; + if (transformId.includes('host')) { + entityType = 'host'; + } else if (transformId.includes('user')) { + entityType = 'user'; + } else if (transformId.includes('service')) { + entityType = 'service'; + } else if (transformId.includes('generic')) { + entityType = 'generic'; + } + + // Track transform state + if (transform.state === 'indexing') { + transformStates.indexing++; + } else if (transform.state === 'started') { + transformStates.started++; + } + + // Initialize previous values for this transform if not exists + if (!prevValues[transformId]) { + prevValues[transformId] = { + searchTime: stats.search_time_in_ms || 0, + searchTotal: stats.search_total || 0, + indexTime: stats.index_time_in_ms || 0, + indexTotal: stats.index_total || 0, + processingTime: stats.processing_time_in_ms || 0, + processingTotal: stats.processing_total || 0, + }; + // Skip first sample as we need previous values for incremental calculation + continue; + } + + // Calculate incremental search latency + // Use adaptive threshold based on sampling interval + const incrementalSearchTime = + (stats.search_time_in_ms || 0) - prevValues[transformId].searchTime; + const incrementalSearchTotal = + (stats.search_total || 0) - prevValues[transformId].searchTotal; + if (incrementalSearchTotal >= searchThreshold && incrementalSearchTime >= 0) { + const incrementalSearchLatency = incrementalSearchTime / incrementalSearchTotal; + searchLatencies.push(incrementalSearchLatency); + if (entityType) { + perEntityType[entityType].searchLatencies.push(incrementalSearchLatency); + } + } + + // Calculate incremental 
index latency + // Use adaptive threshold based on sampling interval + const incrementalIndexTime = + (stats.index_time_in_ms || 0) - prevValues[transformId].indexTime; + const incrementalIndexTotal = (stats.index_total || 0) - prevValues[transformId].indexTotal; + if (incrementalIndexTotal >= indexThreshold && incrementalIndexTime >= 0) { + const incrementalIndexLatency = incrementalIndexTime / incrementalIndexTotal; + indexLatencies.push(incrementalIndexLatency); + if (entityType) { + perEntityType[entityType].indexLatencies.push(incrementalIndexLatency); + } + } + + // Calculate incremental processing latency + // Use adaptive threshold based on sampling interval + const incrementalProcessingTime = + (stats.processing_time_in_ms || 0) - prevValues[transformId].processingTime; + const incrementalProcessingTotal = + (stats.processing_total || 0) - prevValues[transformId].processingTotal; + if (incrementalProcessingTotal >= processingThreshold && incrementalProcessingTime >= 0) { + const incrementalProcessingLatency = incrementalProcessingTime / incrementalProcessingTotal; + processingLatencies.push(incrementalProcessingLatency); + if (entityType) { + perEntityType[entityType].processingLatencies.push(incrementalProcessingLatency); + } + } + + // Update previous values for next iteration + prevValues[transformId] = { + searchTime: stats.search_time_in_ms || 0, + searchTotal: stats.search_total || 0, + indexTime: stats.index_time_in_ms || 0, + indexTotal: stats.index_total || 0, + processingTime: stats.processing_time_in_ms || 0, + processingTotal: stats.processing_total || 0, + }; + + // Track documents processed + if (stats.documents_processed !== undefined) { + documentsProcessed.push(stats.documents_processed); + if (entityType) { + perEntityType[entityType].documentsProcessed.push(stats.documents_processed); + } + } + + // Track documents indexed + if (stats.documents_indexed !== undefined) { + documentsIndexed.push(stats.documents_indexed); + if (entityType) { + perEntityType[entityType].documentsIndexed.push(stats.documents_indexed); + } + } + + // Track pages processed + if (stats.pages_processed !== undefined) { + pagesProcessed.push(stats.pages_processed); + if (entityType) { + perEntityType[entityType].pagesProcessed.push(stats.pages_processed); + } + } + + // Track trigger count + if (stats.trigger_count !== undefined) { + triggerCounts.push(stats.trigger_count); + if (entityType) { + perEntityType[entityType].triggerCounts.push(stats.trigger_count); + } + } + + // Track exponential averages (use final non-zero values) + if ( + stats.exponential_avg_checkpoint_duration_ms !== undefined && + stats.exponential_avg_checkpoint_duration_ms > 0 + ) { + exponentialAverages.checkpointDuration.push(stats.exponential_avg_checkpoint_duration_ms); + } + if ( + stats.exponential_avg_documents_indexed !== undefined && + stats.exponential_avg_documents_indexed > 0 + ) { + exponentialAverages.documentsIndexed.push(stats.exponential_avg_documents_indexed); + } + if ( + stats.exponential_avg_documents_processed !== undefined && + stats.exponential_avg_documents_processed > 0 + ) { + exponentialAverages.documentsProcessed.push(stats.exponential_avg_documents_processed); + } + + // Track failures + if (stats.search_failures) { + searchFailures += stats.search_failures; + } + if (stats.index_failures) { + indexFailures += stats.index_failures; + } + } catch { + // Skip malformed lines + continue; + } + } + + return { + searchLatencies, + indexLatencies, + processingLatencies, + documentsProcessed, + 
documentsIndexed, + pagesProcessed, + triggerCounts, + searchFailures, + indexFailures, + timestamps, + exponentialAverages, + transformStates, + perEntityType, + }; +}; + +/** + * Create empty transform stats data structure + */ +export const createEmptyTransformData = (): TransformStatsData => { + return { + searchLatencies: [], + indexLatencies: [], + processingLatencies: [], + documentsProcessed: [], + documentsIndexed: [], + pagesProcessed: [], + triggerCounts: [], + searchFailures: 0, + indexFailures: 0, + timestamps: [], + exponentialAverages: { + checkpointDuration: [], + documentsIndexed: [], + documentsProcessed: [], + }, + transformStates: { + indexing: 0, + started: 0, + }, + perEntityType: { + host: { + searchLatencies: [], + indexLatencies: [], + processingLatencies: [], + documentsProcessed: [], + documentsIndexed: [], + pagesProcessed: [], + triggerCounts: [], + }, + user: { + searchLatencies: [], + indexLatencies: [], + processingLatencies: [], + documentsProcessed: [], + documentsIndexed: [], + pagesProcessed: [], + triggerCounts: [], + }, + service: { + searchLatencies: [], + indexLatencies: [], + processingLatencies: [], + documentsProcessed: [], + documentsIndexed: [], + pagesProcessed: [], + triggerCounts: [], + }, + generic: { + searchLatencies: [], + indexLatencies: [], + processingLatencies: [], + documentsProcessed: [], + documentsIndexed: [], + pagesProcessed: [], + triggerCounts: [], + }, + }, + }; +}; diff --git a/src/commands/utils/baseline_metrics/storage.ts b/src/commands/utils/baseline_metrics/storage.ts new file mode 100644 index 0000000..7bdec06 --- /dev/null +++ b/src/commands/utils/baseline_metrics/storage.ts @@ -0,0 +1,156 @@ +import fs from 'fs'; +import path from 'path'; +import { BaselineMetrics } from './types'; +import { BASELINES_DIR, readFileSafely } from './utils'; + +/** + * Save baseline to file + */ +export const saveBaseline = (baseline: BaselineMetrics): string => { + const filename = `${baseline.testName}-${baseline.timestamp.replace(/[:.]/g, '-')}.json`; + const filepath = path.join(BASELINES_DIR, filename); + + fs.writeFileSync(filepath, JSON.stringify(baseline, null, 2)); + console.log(`Baseline saved to: ${filepath}`); + + return filepath; +}; + +/** + * Load baseline from file + */ +export const loadBaseline = (baselinePath: string): BaselineMetrics => { + const content = readFileSafely(baselinePath, 'Baseline file'); + try { + return JSON.parse(content) as BaselineMetrics; + } catch (error) { + throw new Error( + `Failed to parse baseline file ${baselinePath}: ${error instanceof Error ? 
error.message : String(error)}`
+    );
+  }
+};
+
+/**
+ * List all available baselines
+ */
+export const listBaselines = (): string[] => {
+  if (!fs.existsSync(BASELINES_DIR)) {
+    return [];
+  }
+
+  return fs
+    .readdirSync(BASELINES_DIR)
+    .filter((f) => f.endsWith('.json'))
+    .map((f) => path.join(BASELINES_DIR, f))
+    .sort()
+    .reverse(); // Most recent first
+};
+
+/**
+ * Find baseline file by pattern (supports prefix matching)
+ * If multiple files match, returns the latest modified one
+ */
+export const findBaselineByPattern = (pattern: string): string | null => {
+  if (!fs.existsSync(BASELINES_DIR)) {
+    return null;
+  }
+
+  // Normalize pattern - remove .json extension if present, handle paths
+  let searchPattern = pattern;
+  if (searchPattern.endsWith('.json')) {
+    searchPattern = searchPattern.slice(0, -5);
+  }
+
+  // Remove baselines/ prefix if present (for convenience)
+  if (searchPattern.startsWith('baselines/')) {
+    searchPattern = searchPattern.slice(10);
+  }
+
+  // Handle absolute paths
+  if (path.isAbsolute(searchPattern)) {
+    const baselinesDirName = path.basename(BASELINES_DIR);
+    if (searchPattern.includes(baselinesDirName)) {
+      searchPattern = path.basename(searchPattern, '.json');
+    }
+  }
+
+  // Get all baseline files
+  const allFiles = fs
+    .readdirSync(BASELINES_DIR)
+    .filter((f) => f.endsWith('.json'))
+    .map((f) => path.join(BASELINES_DIR, f));
+
+  // Find files matching the pattern (starts with pattern)
+  const matchingFiles = allFiles.filter((filepath) => {
+    const filename = path.basename(filepath, '.json');
+    return filename.startsWith(searchPattern);
+  });
+
+  if (matchingFiles.length === 0) {
+    return null;
+  }
+
+  // If exact match exists, use it
+  const exactMatch = matchingFiles.find((filepath) => {
+    const filename = path.basename(filepath, '.json');
+    return filename === searchPattern;
+  });
+  if (exactMatch) {
+    return exactMatch;
+  }
+
+  // If multiple matches, return the latest modified file
+  if (matchingFiles.length > 1) {
+    const filesWithStats = matchingFiles.map((filepath) => ({
+      filepath,
+      mtime: fs.statSync(filepath).mtime.getTime(),
+    }));
+    filesWithStats.sort((a, b) => b.mtime - a.mtime); // Sort by modification time, newest first
+    return filesWithStats[0].filepath;
+  }
+
+  return matchingFiles[0];
+};
+
+/**
+ * Load baseline by pattern or path, with fallback to latest
+ */
+export const loadBaselineWithPattern = (
+  baselinePattern?: string
+): { baseline: BaselineMetrics; path: string } => {
+  let baselinePath: string;
+  let baseline: BaselineMetrics;
+
+  if (baselinePattern) {
+    // Try to find by pattern first
+    const matchedPath = findBaselineByPattern(baselinePattern);
+    if (!matchedPath) {
+      // If pattern matching fails, try direct path
+      if (fs.existsSync(baselinePattern)) {
+        baselinePath = baselinePattern;
+        baseline = loadBaseline(baselinePath);
+        console.log(`Using baseline: ${baselinePath}`);
+      } else {
+        console.error(`❌ Baseline not found: ${baselinePattern}`);
+        console.error(`   Tried pattern matching and direct path, but no matches found.`);
+        process.exit(1);
+      }
+    } else {
+      baselinePath = matchedPath;
+      baseline = loadBaseline(baselinePath);
+      console.log(`Using baseline: ${baselinePath} (matched pattern: ${baselinePattern})`);
+    }
+  } else {
+    // Use latest baseline
+    const baselines = listBaselines();
+    if (baselines.length === 0) {
+      console.error('❌ No baselines found. Create one first with create-baseline command.');
+      process.exit(1);
+    }
+    baselinePath = baselines[0];
+    baseline = loadBaseline(baselinePath);
+    console.log(`Using latest baseline: ${baselinePath}`);
+  }
+
+  return { baseline, path: baselinePath };
+};
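The pattern helpers make baseline selection forgiving on the command line. A small sketch of the behaviours implemented above (the file names are assumptions):

```
// Suppose ./baselines contains:
//   standard-2025-11-27T07-51-02-295Z.json
//   standard-2025-11-20T09-12-44-101Z.json
import { loadBaselineWithPattern } from './utils/baseline_metrics';

// Prefix match: both files start with "standard"; with no exact match,
// the most recently modified file wins.
const { baseline, path: baselinePath } = loadBaselineWithPattern('standard');

// "baselines/standard.json" and absolute paths are normalized the same way.
loadBaselineWithPattern('baselines/standard.json');

// No argument: falls back to the newest baseline in ./baselines,
// or exits with an error if none exist.
loadBaselineWithPattern();
```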
diff --git a/src/commands/utils/baseline_metrics/types.ts b/src/commands/utils/baseline_metrics/types.ts
new file mode 100644
index 0000000..d358e26
--- /dev/null
+++ b/src/commands/utils/baseline_metrics/types.ts
@@ -0,0 +1,199 @@
+export interface EntityTypeMetrics {
+  searchLatency: {
+    avg: number;
+    p50: number;
+    p95: number;
+    p99: number;
+    max: number;
+  };
+  intakeLatency: {
+    avg: number;
+    p50: number;
+    p95: number;
+    p99: number;
+    max: number;
+  };
+  processingLatency: {
+    avg: number;
+    p50: number;
+    p95: number;
+    p99: number;
+    max: number;
+  };
+  documentsProcessed: number;
+  documentsIndexed: number;
+  pagesProcessed: number;
+  triggerCount: number;
+  sampleCounts?: {
+    search: number;
+    index: number;
+    processing: number;
+  };
+}
+
+export interface BaselineMetrics {
+  testName: string;
+  timestamp: string;
+  testConfig: {
+    entityCount: number;
+    logsPerEntity: number;
+    uploadCount?: number;
+    intervalMs?: number;
+  };
+  metrics: {
+    searchLatency: {
+      avg: number;
+      p50: number;
+      p95: number;
+      p99: number;
+      max: number;
+    };
+    intakeLatency: {
+      avg: number;
+      p50: number;
+      p95: number;
+      p99: number;
+      max: number;
+    };
+    processingLatency: {
+      avg: number;
+      p50: number;
+      p95: number;
+      p99: number;
+      max: number;
+    };
+    cpu: {
+      avg: number;
+      peak: number;
+      avgPerNode: Record<string, number>;
+    };
+    memory: {
+      avgHeapPercent: number;
+      peakHeapPercent: number;
+      avgHeapBytes: number;
+      peakHeapBytes: number;
+    };
+    throughput: {
+      avgDocumentsPerSecond: number;
+      peakDocumentsPerSecond: number;
+    };
+    indexEfficiency: {
+      avgRatio: number;
+      totalDocumentsIndexed: number;
+      totalDocumentsProcessed: number;
+    };
+    pagesProcessed: {
+      total: number;
+      avgPerSample: number;
+    };
+    triggerCount: {
+      total: number;
+      avgPerTransform: number;
+    };
+    exponentialAverages: {
+      checkpointDuration: number;
+      documentsIndexed: number;
+      documentsProcessed: number;
+    };
+    perEntityType: {
+      host: EntityTypeMetrics;
+      user: EntityTypeMetrics;
+      service: EntityTypeMetrics;
+      generic: EntityTypeMetrics;
+    };
+    transformStates: {
+      indexing: number;
+      started: number;
+    };
+    errors: {
+      searchFailures: number;
+      indexFailures: number;
+      totalFailures: number;
+    };
+    clusterHealth: {
+      status: string;
+      avgActiveShards: number;
+      unassignedShards: number;
+    };
+    kibana: {
+      eventLoop: {
+        delay: {
+          avg: number;
+          p50: number;
+          p95: number;
+          p99: number;
+          max: number;
+        };
+        utilization: {
+          avg: number;
+          peak: number;
+        };
+      };
+      elasticsearchClient: {
+        avgActiveSockets: number;
+        avgIdleSockets: number;
+        peakQueuedRequests: number;
+      };
+      responseTimes: {
+        avg: number;
+        max: number;
+      };
+      memory: {
+        avgHeapBytes: number;
+        peakHeapBytes: number;
+        avgRssBytes: number;
+        peakRssBytes: number;
+      };
+      requests: {
+        total: number;
+        avgPerSecond: number;
+        errorRate: number;
+        disconnects: number;
+      };
+      osLoad: {
+        avg1m: number;
+        avg5m: number;
+        avg15m: number;
+        peak1m: number;
+      };
+    };
+  };
+}
+
+export interface EntityTypeData {
+  searchLatencies: number[];
+  indexLatencies: number[];
+  processingLatencies: number[];
+  documentsProcessed: number[];
+  documentsIndexed: number[];
+  pagesProcessed: number[];
+  triggerCounts: number[];
+}
+
+export interface TransformStatsData {
+  searchLatencies: number[];
+  indexLatencies:
number[]; + processingLatencies: number[]; + documentsProcessed: number[]; + documentsIndexed: number[]; + pagesProcessed: number[]; + triggerCounts: number[]; + searchFailures: number; + indexFailures: number; + timestamps: number[]; + exponentialAverages: { + checkpointDuration: number[]; + documentsIndexed: number[]; + documentsProcessed: number[]; + }; + transformStates: { + indexing: number; + started: number; + }; + perEntityType: { + host: EntityTypeData; + user: EntityTypeData; + service: EntityTypeData; + generic: EntityTypeData; + }; +} diff --git a/src/commands/utils/baseline_metrics/utils.ts b/src/commands/utils/baseline_metrics/utils.ts new file mode 100644 index 0000000..746232f --- /dev/null +++ b/src/commands/utils/baseline_metrics/utils.ts @@ -0,0 +1,72 @@ +import fs from 'fs'; +import path from 'path'; + +export const BASELINES_DIR = path.join(process.cwd(), 'baselines'); + +// Maximum reasonable sampling interval (5 minutes in milliseconds) +// Used to filter out invalid intervals when detecting sampling frequency +export const MAX_REASONABLE_SAMPLING_INTERVAL_MS = 300000; + +/** + * Helper function to safely read a file with error handling + * @param filePath - Path to the file to read + * @param fileDescription - Description of the file for error messages (e.g., "transform stats log file") + * @returns The file content as a string + * @throws Error if file doesn't exist or can't be read + */ +export const readFileSafely = (filePath: string, fileDescription: string): string => { + if (!fs.existsSync(filePath)) { + throw new Error(`${fileDescription} does not exist: ${filePath}`); + } + try { + return fs.readFileSync(filePath, 'utf-8'); + } catch (error) { + throw new Error( + `Failed to read ${fileDescription} ${filePath}: ${error instanceof Error ? error.message : String(error)}` + ); + } +}; + +// Ensure baselines directory exists +if (!fs.existsSync(BASELINES_DIR)) { + fs.mkdirSync(BASELINES_DIR, { recursive: true }); +} + +/** + * Calculate percentile from sorted array + */ +export const percentile = (sortedArray: number[], percentile: number): number => { + if (sortedArray.length === 0) return 0; + const index = Math.ceil((percentile / 100) * sortedArray.length) - 1; + return sortedArray[Math.max(0, index)]; +}; + +/** + * Calculate average from array of numbers + */ +export const avg = (array: number[]): number => { + if (array.length === 0) return 0; + return array.reduce((a, b) => a + b, 0) / array.length; +}; + +/** + * Calculate maximum from array of numbers + */ +export const max = (array: number[]): number => { + if (array.length === 0) return 0; + return Math.max(...array); +}; + +/** + * Safe division - returns 0 if denominator is 0 + */ +export const safeDivide = (numerator: number, denominator: number): number => { + return denominator > 0 ? numerator / denominator : 0; +}; + +/** + * Get last element from array, or 0 if empty + */ +export const last = (array: number[]): number => { + return array.length > 0 ? array[array.length - 1] : 0; +}; diff --git a/src/commands/utils/metric_comparisons/comparison_helpers.ts b/src/commands/utils/metric_comparisons/comparison_helpers.ts new file mode 100644 index 0000000..663e913 --- /dev/null +++ b/src/commands/utils/metric_comparisons/comparison_helpers.ts @@ -0,0 +1,101 @@ +import { ComparisonResult, ComparisonThresholds } from '../metrics_comparison'; + +/** + * Determine status based on effective difference percentage. 
+ * After normalization via effectiveDiffPercent:
+ * - Negative values = worse (degradation/warning)
+ * - Positive values = better (improvement)
+ * - Near zero = stable
+ */
+export const determineStatus = (
+  effectiveDiffPercent: number,
+  thresholds: ComparisonThresholds
+): ComparisonResult['status'] => {
+  if (effectiveDiffPercent < -thresholds.degradationThreshold) {
+    return 'degradation';
+  } else if (effectiveDiffPercent < -thresholds.warningThreshold) {
+    return 'warning';
+  } else if (effectiveDiffPercent > thresholds.improvementThreshold) {
+    return 'improvement';
+  } else {
+    return 'stable';
+  }
+};
+
+/**
+ * Helper to create comparison result.
+ * Calculates percentage difference and normalizes based on metric type.
+ */
+export const createResult = (
+  metric: string,
+  baselineValue: number,
+  currentValue: number,
+  lowerIsBetter: boolean,
+  thresholds: ComparisonThresholds
+): ComparisonResult => {
+  const diff = currentValue - baselineValue;
+
+  // Calculate percentage difference
+  // Handle edge case: when baseline is 0, use a sentinel value to indicate change
+  let diffPercent: number;
+  if (baselineValue === 0) {
+    if (currentValue === 0) {
+      diffPercent = 0; // No change
+    } else {
+      // Significant change from zero baseline - use 100% as sentinel
+      // The effectiveDiffPercent will handle the direction based on lowerIsBetter
+      diffPercent = 100;
+    }
+  } else {
+    diffPercent = (diff / baselineValue) * 100;
+  }
+
+  // Normalize the difference percentage based on metric type
+  // For "lower is better" metrics (latency, errors):
+  //   - Flip sign so negative = worse, positive = better
+  // For "higher is better" metrics (throughput, efficiency):
+  //   - Keep sign as-is so positive = better, negative = worse
+  const effectiveDiffPercent = lowerIsBetter ? -diffPercent : diffPercent;
+
+  // Determine status using normalized percentage
+  // Since effectiveDiffPercent is normalized, we can use the same logic for both cases
+  const status = determineStatus(effectiveDiffPercent, thresholds);
+
+  return {
+    metric,
+    baseline: baselineValue,
+    current: currentValue,
+    diff,
+    diffPercent: effectiveDiffPercent,
+    status,
+  };
+};
+
+/**
+ * Helper to create informational result (values only, no status comparison)
+ * Used for volatile metrics like p99 and max where status labels are less meaningful
+ */
+export const createInfoResult = (
+  metric: string,
+  baselineValue: number,
+  currentValue: number
+): ComparisonResult => {
+  const diff = currentValue - baselineValue;
+  const diffPercent = baselineValue !== 0 ? (diff / baselineValue) * 100 : 0;
+
+  return {
+    metric,
+    baseline: baselineValue,
+    current: currentValue,
+    diff,
+    diffPercent,
+    status: 'info',
+  };
+};
+
+/**
+ * Convert bytes to megabytes
+ */
+export const bytesToMB = (bytes: number): number => {
+  return bytes / (1024 * 1024);
+};
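A worked example of the normalization above, using the default thresholds (degradation: 20, warning: 10, improvement: 10) defined later in metrics_comparison.ts:

```
const thresholds = { degradationThreshold: 20, warningThreshold: 10, improvementThreshold: 10 };

// Latency is "lower is better": 100ms -> 125ms is +25% raw, flipped to -25%.
// -25 < -20, so this is reported as a degradation.
createResult('Search Latency (avg)', 100, 125, true, thresholds);
// => { diff: 25, diffPercent: -25, status: 'degradation', ... }

// Throughput is "higher is better": 1000 -> 1150 docs/sec stays at +15%.
// +15 > +10, so this is reported as an improvement.
createResult('Throughput (avg docs/sec)', 1000, 1150, false, thresholds);
// => { diff: 150, diffPercent: 15, status: 'improvement', ... }
```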
diff --git a/src/commands/utils/metric_comparisons/entity_metrics.ts b/src/commands/utils/metric_comparisons/entity_metrics.ts
new file mode 100644
index 0000000..da24925
--- /dev/null
+++ b/src/commands/utils/metric_comparisons/entity_metrics.ts
@@ -0,0 +1,241 @@
+import { BaselineMetrics } from '../baseline_metrics';
+import { ComparisonResult } from '../metrics_comparison';
+import { createResult } from './comparison_helpers';
+import { ComparisonThresholds } from '../metrics_comparison';
+
+// Minimum number of samples required for reliable metric comparison
+// Increased to 10 for more reliable percentile calculations (especially p95)
+const MIN_SAMPLES_FOR_RELIABLE_METRICS = 10;
+
+// Service entities have lower sample counts due to 1% distribution in standard tests
+// Use a lower threshold for service to allow comparison with fewer samples
+const MIN_SAMPLES_FOR_SERVICE_ENTITY = 3;
+
+/**
+ * Get the minimum sample threshold for an entity type
+ */
+const getMinSamplesForEntity = (entityType: string): number => {
+  return entityType === 'service'
+    ? MIN_SAMPLES_FOR_SERVICE_ENTITY
+    : MIN_SAMPLES_FOR_RELIABLE_METRICS;
+};
+
+/**
+ * Compare per-entity-type metrics between baseline and current
+ */
+export const compareEntityMetrics = (
+  baseline: BaselineMetrics,
+  current: BaselineMetrics,
+  thresholds: ComparisonThresholds
+): ComparisonResult[] => {
+  const results: ComparisonResult[] = [];
+
+  const entityTypes = ['host', 'user', 'service', 'generic'] as const;
+  for (const entityType of entityTypes) {
+    const baselineEntity = baseline.metrics.perEntityType[entityType];
+    const currentEntity = current.metrics.perEntityType[entityType];
+    const minSamplesRequired = getMinSamplesForEntity(entityType);
+
+    // Search Latency per entity - only compare if both have sufficient samples
+    const baselineSearchSamples = baselineEntity.sampleCounts?.search || 0;
+    const currentSearchSamples = currentEntity.sampleCounts?.search || 0;
+    if (baselineSearchSamples >= minSamplesRequired && currentSearchSamples >= minSamplesRequired) {
+      results.push(
+        createResult(
+          `${entityType} - Search Latency (avg)`,
+          baselineEntity.searchLatency.avg,
+          currentEntity.searchLatency.avg,
+          true,
+          thresholds
+        )
+      );
+      results.push(
+        createResult(
+          `${entityType} - Search Latency (p95)`,
+          baselineEntity.searchLatency.p95,
+          currentEntity.searchLatency.p95,
+          true,
+          thresholds
+        )
+      );
+    } else {
+      // Mark as 'insufficient' when there are too few samples (to avoid false positives)
+      // But still calculate the actual difference for visibility
+      const diffAvg = currentEntity.searchLatency.avg - baselineEntity.searchLatency.avg;
+      const diffPercentAvg =
+        baselineEntity.searchLatency.avg !== 0
+          ? (diffAvg / baselineEntity.searchLatency.avg) * 100
+          : 0;
+      results.push({
+        metric: `${entityType} - Search Latency (avg)`,
+        baseline: baselineEntity.searchLatency.avg,
+        current: currentEntity.searchLatency.avg,
+        diff: diffAvg,
+        diffPercent: diffPercentAvg,
+        status: 'insufficient',
+      });
+      const diffP95 = currentEntity.searchLatency.p95 - baselineEntity.searchLatency.p95;
+      const diffPercentP95 =
+        baselineEntity.searchLatency.p95 !== 0
+          ?
(diffP95 / baselineEntity.searchLatency.p95) * 100 + : 0; + results.push({ + metric: `${entityType} - Search Latency (p95)`, + baseline: baselineEntity.searchLatency.p95, + current: currentEntity.searchLatency.p95, + diff: diffP95, + diffPercent: diffPercentP95, + status: 'insufficient', + }); + } + + // Intake Latency per entity - only compare if both have sufficient samples + const baselineIndexSamples = baselineEntity.sampleCounts?.index || 0; + const currentIndexSamples = currentEntity.sampleCounts?.index || 0; + if (baselineIndexSamples >= minSamplesRequired && currentIndexSamples >= minSamplesRequired) { + results.push( + createResult( + `${entityType} - Intake Latency (avg)`, + baselineEntity.intakeLatency.avg, + currentEntity.intakeLatency.avg, + true, + thresholds + ) + ); + results.push( + createResult( + `${entityType} - Intake Latency (p95)`, + baselineEntity.intakeLatency.p95, + currentEntity.intakeLatency.p95, + true, + thresholds + ) + ); + } else { + // Mark as insufficient_data if insufficient samples + // But still calculate the actual difference for visibility + const diffAvg = currentEntity.intakeLatency.avg - baselineEntity.intakeLatency.avg; + const diffPercentAvg = + baselineEntity.intakeLatency.avg !== 0 + ? (diffAvg / baselineEntity.intakeLatency.avg) * 100 + : 0; + results.push({ + metric: `${entityType} - Intake Latency (avg)`, + baseline: baselineEntity.intakeLatency.avg, + current: currentEntity.intakeLatency.avg, + diff: diffAvg, + diffPercent: diffPercentAvg, + status: 'insufficient', + }); + const diffP95 = currentEntity.intakeLatency.p95 - baselineEntity.intakeLatency.p95; + const diffPercentP95 = + baselineEntity.intakeLatency.p95 !== 0 + ? (diffP95 / baselineEntity.intakeLatency.p95) * 100 + : 0; + results.push({ + metric: `${entityType} - Intake Latency (p95)`, + baseline: baselineEntity.intakeLatency.p95, + current: currentEntity.intakeLatency.p95, + diff: diffP95, + diffPercent: diffPercentP95, + status: 'insufficient', + }); + } + + // Processing Latency per entity - only compare if both have sufficient samples + const baselineProcessingSamples = baselineEntity.sampleCounts?.processing || 0; + const currentProcessingSamples = currentEntity.sampleCounts?.processing || 0; + if ( + baselineProcessingSamples >= minSamplesRequired && + currentProcessingSamples >= minSamplesRequired + ) { + results.push( + createResult( + `${entityType} - Processing Latency (avg)`, + baselineEntity.processingLatency.avg, + currentEntity.processingLatency.avg, + true, + thresholds + ) + ); + results.push( + createResult( + `${entityType} - Processing Latency (p95)`, + baselineEntity.processingLatency.p95, + currentEntity.processingLatency.p95, + true, + thresholds + ) + ); + } else { + // Mark as insufficient_data if insufficient samples + // But still calculate the actual difference for visibility + const diffAvg = currentEntity.processingLatency.avg - baselineEntity.processingLatency.avg; + const diffPercentAvg = + baselineEntity.processingLatency.avg !== 0 + ? (diffAvg / baselineEntity.processingLatency.avg) * 100 + : 0; + results.push({ + metric: `${entityType} - Processing Latency (avg)`, + baseline: baselineEntity.processingLatency.avg, + current: currentEntity.processingLatency.avg, + diff: diffAvg, + diffPercent: diffPercentAvg, + status: 'insufficient', + }); + const diffP95 = currentEntity.processingLatency.p95 - baselineEntity.processingLatency.p95; + const diffPercentP95 = + baselineEntity.processingLatency.p95 !== 0 + ? 
(diffP95 / baselineEntity.processingLatency.p95) * 100 + : 0; + results.push({ + metric: `${entityType} - Processing Latency (p95)`, + baseline: baselineEntity.processingLatency.p95, + current: currentEntity.processingLatency.p95, + diff: diffP95, + diffPercent: diffPercentP95, + status: 'insufficient', + }); + } + + // Documents metrics per entity + results.push( + createResult( + `${entityType} - Documents Processed`, + baselineEntity.documentsProcessed, + currentEntity.documentsProcessed, + false, + thresholds + ) + ); + results.push( + createResult( + `${entityType} - Documents Indexed`, + baselineEntity.documentsIndexed, + currentEntity.documentsIndexed, + false, + thresholds + ) + ); + results.push( + createResult( + `${entityType} - Pages Processed`, + baselineEntity.pagesProcessed, + currentEntity.pagesProcessed, + false, + thresholds + ) + ); + results.push( + createResult( + `${entityType} - Trigger Count`, + baselineEntity.triggerCount, + currentEntity.triggerCount, + false, + thresholds + ) + ); + } + + return results; +}; diff --git a/src/commands/utils/metric_comparisons/error_metrics.ts b/src/commands/utils/metric_comparisons/error_metrics.ts new file mode 100644 index 0000000..18c2930 --- /dev/null +++ b/src/commands/utils/metric_comparisons/error_metrics.ts @@ -0,0 +1,45 @@ +import { BaselineMetrics } from '../baseline_metrics'; +import { ComparisonResult } from '../metrics_comparison'; +import { createResult } from './comparison_helpers'; +import { ComparisonThresholds } from '../metrics_comparison'; + +/** + * Compare error metrics between baseline and current + */ +export const compareErrorMetrics = ( + baseline: BaselineMetrics, + current: BaselineMetrics, + thresholds: ComparisonThresholds +): ComparisonResult[] => { + const results: ComparisonResult[] = []; + + results.push( + createResult( + 'Search Failures', + baseline.metrics.errors.searchFailures, + current.metrics.errors.searchFailures, + true, + thresholds + ) + ); + results.push( + createResult( + 'Index Failures', + baseline.metrics.errors.indexFailures, + current.metrics.errors.indexFailures, + true, + thresholds + ) + ); + results.push( + createResult( + 'Total Failures', + baseline.metrics.errors.totalFailures, + current.metrics.errors.totalFailures, + true, + thresholds + ) + ); + + return results; +}; diff --git a/src/commands/utils/metric_comparisons/kibana_metrics.ts b/src/commands/utils/metric_comparisons/kibana_metrics.ts new file mode 100644 index 0000000..707c9d1 --- /dev/null +++ b/src/commands/utils/metric_comparisons/kibana_metrics.ts @@ -0,0 +1,235 @@ +import { BaselineMetrics } from '../baseline_metrics'; +import { ComparisonResult } from '../metrics_comparison'; +import { createResult, createInfoResult, bytesToMB } from './comparison_helpers'; +import { ComparisonThresholds } from '../metrics_comparison'; + +/** + * Compare Kibana metrics between baseline and current + */ +export const compareKibanaMetrics = ( + baseline: BaselineMetrics, + current: BaselineMetrics, + thresholds: ComparisonThresholds +): ComparisonResult[] => { + const results: ComparisonResult[] = []; + + // Event Loop metrics + results.push( + createResult( + 'Kibana Event Loop Delay (avg)', + baseline.metrics.kibana.eventLoop.delay.avg, + current.metrics.kibana.eventLoop.delay.avg, + true, + thresholds + ) + ); + results.push( + createResult( + 'Kibana Event Loop Delay (p50)', + baseline.metrics.kibana.eventLoop.delay.p50, + current.metrics.kibana.eventLoop.delay.p50, + true, + thresholds + ) + ); + results.push( + 
createResult( + 'Kibana Event Loop Delay (p95)', + baseline.metrics.kibana.eventLoop.delay.p95, + current.metrics.kibana.eventLoop.delay.p95, + true, + thresholds + ) + ); + results.push( + createInfoResult( + 'Kibana Event Loop Delay (p99)', + baseline.metrics.kibana.eventLoop.delay.p99, + current.metrics.kibana.eventLoop.delay.p99 + ) + ); + results.push( + createInfoResult( + 'Kibana Event Loop Delay (max)', + baseline.metrics.kibana.eventLoop.delay.max, + current.metrics.kibana.eventLoop.delay.max + ) + ); + results.push( + createResult( + 'Kibana Event Loop Utilization (avg)', + baseline.metrics.kibana.eventLoop.utilization.avg, + current.metrics.kibana.eventLoop.utilization.avg, + true, + thresholds + ) + ); + results.push( + createResult( + 'Kibana Event Loop Utilization (peak)', + baseline.metrics.kibana.eventLoop.utilization.peak, + current.metrics.kibana.eventLoop.utilization.peak, + true, + thresholds + ) + ); + + // Elasticsearch Client metrics + results.push( + createInfoResult( + 'Kibana ES Client Active Sockets (avg)', + baseline.metrics.kibana.elasticsearchClient.avgActiveSockets, + current.metrics.kibana.elasticsearchClient.avgActiveSockets + ) + ); + results.push( + createResult( + 'Kibana ES Client Idle Sockets (avg)', + baseline.metrics.kibana.elasticsearchClient.avgIdleSockets, + current.metrics.kibana.elasticsearchClient.avgIdleSockets, + false, + thresholds + ) + ); + results.push( + createResult( + 'Kibana ES Client Queued Requests (peak)', + baseline.metrics.kibana.elasticsearchClient.peakQueuedRequests, + current.metrics.kibana.elasticsearchClient.peakQueuedRequests, + true, + thresholds + ) + ); + + // Response Times metrics + results.push( + createResult( + 'Kibana Response Time (avg)', + baseline.metrics.kibana.responseTimes.avg, + current.metrics.kibana.responseTimes.avg, + true, + thresholds + ) + ); + results.push( + createInfoResult( + 'Kibana Response Time (max)', + baseline.metrics.kibana.responseTimes.max, + current.metrics.kibana.responseTimes.max + ) + ); + + // Memory metrics + results.push( + createResult( + 'Kibana Heap Memory (avg MB)', + bytesToMB(baseline.metrics.kibana.memory.avgHeapBytes), + bytesToMB(current.metrics.kibana.memory.avgHeapBytes), + false, + thresholds + ) + ); + results.push( + createResult( + 'Kibana Heap Memory (peak MB)', + bytesToMB(baseline.metrics.kibana.memory.peakHeapBytes), + bytesToMB(current.metrics.kibana.memory.peakHeapBytes), + false, + thresholds + ) + ); + results.push( + createResult( + 'Kibana RSS Memory (avg MB)', + bytesToMB(baseline.metrics.kibana.memory.avgRssBytes), + bytesToMB(current.metrics.kibana.memory.avgRssBytes), + false, + thresholds + ) + ); + results.push( + createResult( + 'Kibana RSS Memory (peak MB)', + bytesToMB(baseline.metrics.kibana.memory.peakRssBytes), + bytesToMB(current.metrics.kibana.memory.peakRssBytes), + false, + thresholds + ) + ); + + // Request metrics + results.push( + createResult( + 'Kibana Requests (total)', + baseline.metrics.kibana.requests.total, + current.metrics.kibana.requests.total, + false, + thresholds + ) + ); + results.push( + createResult( + 'Kibana Requests (avg per second)', + baseline.metrics.kibana.requests.avgPerSecond, + current.metrics.kibana.requests.avgPerSecond, + false, + thresholds + ) + ); + results.push( + createResult( + 'Kibana Request Error Rate (%)', + baseline.metrics.kibana.requests.errorRate, + current.metrics.kibana.requests.errorRate, + true, + thresholds + ) + ); + results.push( + createResult( + 'Kibana Request Disconnects', + 
baseline.metrics.kibana.requests.disconnects, + current.metrics.kibana.requests.disconnects, + true, + thresholds + ) + ); + + // OS Load metrics + results.push( + createResult( + 'Kibana OS Load 1m (avg)', + baseline.metrics.kibana.osLoad.avg1m, + current.metrics.kibana.osLoad.avg1m, + true, + thresholds + ) + ); + results.push( + createResult( + 'Kibana OS Load 5m (avg)', + baseline.metrics.kibana.osLoad.avg5m, + current.metrics.kibana.osLoad.avg5m, + true, + thresholds + ) + ); + results.push( + createResult( + 'Kibana OS Load 15m (avg)', + baseline.metrics.kibana.osLoad.avg15m, + current.metrics.kibana.osLoad.avg15m, + true, + thresholds + ) + ); + results.push( + createInfoResult( + 'Kibana OS Load 1m (peak)', + baseline.metrics.kibana.osLoad.peak1m, + current.metrics.kibana.osLoad.peak1m + ) + ); + + return results; +}; diff --git a/src/commands/utils/metric_comparisons/latency_metrics.ts b/src/commands/utils/metric_comparisons/latency_metrics.ts new file mode 100644 index 0000000..afea4c3 --- /dev/null +++ b/src/commands/utils/metric_comparisons/latency_metrics.ts @@ -0,0 +1,146 @@ +import { BaselineMetrics } from '../baseline_metrics'; +import { ComparisonResult } from '../metrics_comparison'; +import { createResult, createInfoResult } from './comparison_helpers'; +import { ComparisonThresholds } from '../metrics_comparison'; + +/** + * Compare latency metrics (Search, Intake, Processing) between baseline and current + */ +export const compareLatencyMetrics = ( + baseline: BaselineMetrics, + current: BaselineMetrics, + thresholds: ComparisonThresholds +): ComparisonResult[] => { + const results: ComparisonResult[] = []; + + // Search Latency metrics + results.push( + createResult( + 'Search Latency (avg)', + baseline.metrics.searchLatency.avg, + current.metrics.searchLatency.avg, + true, + thresholds + ) + ); + results.push( + createResult( + 'Search Latency (p50)', + baseline.metrics.searchLatency.p50, + current.metrics.searchLatency.p50, + true, + thresholds + ) + ); + results.push( + createResult( + 'Search Latency (p95)', + baseline.metrics.searchLatency.p95, + current.metrics.searchLatency.p95, + true, + thresholds + ) + ); + results.push( + createInfoResult( + 'Search Latency (p99)', + baseline.metrics.searchLatency.p99, + current.metrics.searchLatency.p99 + ) + ); + results.push( + createInfoResult( + 'Search Latency (max)', + baseline.metrics.searchLatency.max, + current.metrics.searchLatency.max + ) + ); + + // Intake Latency metrics + results.push( + createResult( + 'Intake Latency (avg)', + baseline.metrics.intakeLatency.avg, + current.metrics.intakeLatency.avg, + true, + thresholds + ) + ); + results.push( + createResult( + 'Intake Latency (p50)', + baseline.metrics.intakeLatency.p50, + current.metrics.intakeLatency.p50, + true, + thresholds + ) + ); + results.push( + createResult( + 'Intake Latency (p95)', + baseline.metrics.intakeLatency.p95, + current.metrics.intakeLatency.p95, + true, + thresholds + ) + ); + results.push( + createInfoResult( + 'Intake Latency (p99)', + baseline.metrics.intakeLatency.p99, + current.metrics.intakeLatency.p99 + ) + ); + results.push( + createInfoResult( + 'Intake Latency (max)', + baseline.metrics.intakeLatency.max, + current.metrics.intakeLatency.max + ) + ); + + // Processing Latency metrics + results.push( + createResult( + 'Processing Latency (avg)', + baseline.metrics.processingLatency.avg, + current.metrics.processingLatency.avg, + true, + thresholds + ) + ); + results.push( + createResult( + 'Processing Latency (p50)', + 
baseline.metrics.processingLatency.p50, + current.metrics.processingLatency.p50, + true, + thresholds + ) + ); + results.push( + createResult( + 'Processing Latency (p95)', + baseline.metrics.processingLatency.p95, + current.metrics.processingLatency.p95, + true, + thresholds + ) + ); + results.push( + createInfoResult( + 'Processing Latency (p99)', + baseline.metrics.processingLatency.p99, + current.metrics.processingLatency.p99 + ) + ); + results.push( + createInfoResult( + 'Processing Latency (max)', + baseline.metrics.processingLatency.max, + current.metrics.processingLatency.max + ) + ); + + return results; +}; diff --git a/src/commands/utils/metric_comparisons/report_formatter.ts b/src/commands/utils/metric_comparisons/report_formatter.ts new file mode 100644 index 0000000..b441846 --- /dev/null +++ b/src/commands/utils/metric_comparisons/report_formatter.ts @@ -0,0 +1,167 @@ +import { ComparisonReport, ComparisonResult } from '../metrics_comparison'; + +/** + * Helper function to determine appropriate decimal places + */ +const getDecimalPlaces = (baseline: number, current: number): number => { + // For very small values (< 1), use more precision + if (Math.abs(baseline) < 1 || Math.abs(current) < 1) { + // Check if values are very close - if so, need more precision to show difference + const diff = Math.abs(current - baseline); + if (diff < 0.01 && baseline !== 0) { + return 4; // Show 4 decimal places for small differences + } + return 3; // Show 3 decimal places for small values + } + return 2; // Default to 2 decimal places +}; + +/** + * Helper function to format values - integers without decimals, others with adaptive precision + */ +const formatValue = (value: number, metric: string, baseline: number, current: number): string => { + // Check if this is an integer metric + const isIntegerMetric = + metric.includes('Documents Processed') || + metric.includes('Documents Indexed') || + metric.includes('Pages Processed') || + metric.includes('Trigger Count') || + metric.includes('Transform States') || + metric.includes('Search Failures') || + metric.includes('Index Failures') || + metric.includes('Total Failures'); + + if (isIntegerMetric) { + // Format as integer (round to nearest integer) + return Math.round(value).toString(); + } + + // Check if this is a MB metric (memory metrics) + const isMBMetric = metric.includes('MB'); + + if (isMBMetric) { + // Format MB values with 2 decimal places for readability + return value.toFixed(2); + } + + // For non-integer metrics, use adaptive precision + const decimalPlaces = getDecimalPlaces(baseline, current); + return value.toFixed(decimalPlaces); +}; + +/** + * Format comparison report as a table + */ +export const formatComparisonReport = (report: ComparisonReport): string => { + const lines: string[] = []; + + lines.push('\n' + '='.repeat(100)); + lines.push('PERFORMANCE COMPARISON REPORT'); + lines.push('='.repeat(100)); + lines.push(`Baseline: ${report.baselineName}`); + lines.push(`Current: ${report.currentName}`); + lines.push(`Generated: ${report.timestamp}`); + lines.push(''); + lines.push('SUMMARY:'); + lines.push(` ✅ Improvements: ${report.summary.improvements}`); + lines.push(` ⚠️ Warnings: ${report.summary.warnings}`); + lines.push(` ❌ Degradations: ${report.summary.degradations}`); + lines.push(` ➖ Stable: ${report.summary.stable}`); + lines.push(` 📊 Insufficient Data: ${report.summary.insufficientData}`); + lines.push(''); + lines.push('='.repeat(100)); + lines.push('DETAILED METRICS'); + lines.push('='.repeat(100)); + 
lines.push('');
+
+  // Header
+  lines.push(
+    'Metric'.padEnd(45) +
+      'Baseline'.padStart(15) +
+      'Current'.padStart(15) +
+      'Diff %'.padStart(12) +
+      'Status'.padStart(20)
+  );
+  lines.push('-'.repeat(100));
+
+  // Group results by category
+  const categories: Record<string, ComparisonResult[]> = {
+    'Search Latency': report.results.filter(
+      (r) => r.metric.startsWith('Search Latency') && !r.metric.includes(' - ')
+    ),
+    'Intake Latency': report.results.filter(
+      (r) => r.metric.startsWith('Intake Latency') && !r.metric.includes(' - ')
+    ),
+    'Processing Latency': report.results.filter(
+      (r) => r.metric.startsWith('Processing Latency') && !r.metric.includes(' - ')
+    ),
+    CPU: report.results.filter((r) => r.metric.startsWith('CPU')),
+    Memory: report.results.filter((r) => r.metric.startsWith('Memory')),
+    Throughput: report.results.filter((r) => r.metric.startsWith('Throughput')),
+    'Index Efficiency': report.results.filter((r) => r.metric.startsWith('Index Efficiency')),
+    'Pages Processed': report.results.filter((r) => r.metric.startsWith('Pages Processed')),
+    'Trigger Count': report.results.filter((r) => r.metric.startsWith('Trigger Count')),
+    'Exponential Averages': report.results.filter((r) => r.metric.startsWith('Exp Avg')),
+    'Transform States': report.results.filter((r) => r.metric.startsWith('Transform States')),
+    'Per-Entity-Type (Host)': report.results.filter((r) => r.metric.startsWith('host - ')),
+    'Per-Entity-Type (User)': report.results.filter((r) => r.metric.startsWith('user - ')),
+    'Per-Entity-Type (Service)': report.results.filter((r) => r.metric.startsWith('service - ')),
+    'Per-Entity-Type (Generic)': report.results.filter((r) => r.metric.startsWith('generic - ')),
+    Errors: report.results.filter(
+      (r) =>
+        r.metric.startsWith('Search Failures') ||
+        r.metric.startsWith('Index Failures') ||
+        r.metric.startsWith('Total Failures')
+    ),
+    'Kibana Event Loop': report.results.filter((r) => r.metric.startsWith('Kibana Event Loop')),
+    'Kibana ES Client': report.results.filter((r) => r.metric.startsWith('Kibana ES Client')),
+    'Kibana Response Times': report.results.filter((r) =>
+      r.metric.startsWith('Kibana Response Time')
+    ),
+    'Kibana Memory': report.results.filter(
+      (r) => r.metric.startsWith('Kibana') && r.metric.includes('Memory')
+    ),
+    'Kibana Requests': report.results.filter((r) => r.metric.startsWith('Kibana Request')),
+    'Kibana OS Load': report.results.filter((r) => r.metric.startsWith('Kibana OS Load')),
+  };
+
+  for (const [category, categoryResults] of Object.entries(categories)) {
+    if (categoryResults.length === 0) continue;
+
+    lines.push(`\n${category}:`);
+    for (const result of categoryResults) {
+      const statusIcon =
+        result.status === 'improvement'
+          ? '✅'
+          : result.status === 'degradation'
+          ? '❌'
+          : result.status === 'warning'
+          ? '⚠️ '
+          : result.status === 'insufficient'
+          ? '📊'
+          : result.status === 'info'
+          ? 'ℹ️ '
+          : '➖';
+
+      const diffStr =
+        result.diffPercent >= 0
+          ?
`+${result.diffPercent.toFixed(1)}%` + : `${result.diffPercent.toFixed(1)}%`; + + lines.push( + ` ${result.metric}`.padEnd(45) + + formatValue(result.baseline, result.metric, result.baseline, result.current).padStart( + 15 + ) + + formatValue(result.current, result.metric, result.baseline, result.current).padStart(15) + + diffStr.padStart(12) + + ` ${statusIcon} ${result.status}`.padStart(20) + ); + } + } + + lines.push(''); + lines.push('='.repeat(100)); + + return lines.join('\n'); +}; diff --git a/src/commands/utils/metric_comparisons/system_metrics.ts b/src/commands/utils/metric_comparisons/system_metrics.ts new file mode 100644 index 0000000..ab299ab --- /dev/null +++ b/src/commands/utils/metric_comparisons/system_metrics.ts @@ -0,0 +1,189 @@ +import { BaselineMetrics } from '../baseline_metrics'; +import { ComparisonResult } from '../metrics_comparison'; +import { createResult } from './comparison_helpers'; +import { ComparisonThresholds } from '../metrics_comparison'; + +/** + * Compare system metrics (CPU, Memory, Throughput, etc.) between baseline and current + */ +export const compareSystemMetrics = ( + baseline: BaselineMetrics, + current: BaselineMetrics, + thresholds: ComparisonThresholds +): ComparisonResult[] => { + const results: ComparisonResult[] = []; + + // CPU metrics + results.push( + createResult('CPU (avg)', baseline.metrics.cpu.avg, current.metrics.cpu.avg, false, thresholds) + ); + results.push( + createResult( + 'CPU (peak)', + baseline.metrics.cpu.peak, + current.metrics.cpu.peak, + false, + thresholds + ) + ); + + // Memory metrics + results.push( + createResult( + 'Memory Heap % (avg)', + baseline.metrics.memory.avgHeapPercent, + current.metrics.memory.avgHeapPercent, + false, + thresholds + ) + ); + results.push( + createResult( + 'Memory Heap % (peak)', + baseline.metrics.memory.peakHeapPercent, + current.metrics.memory.peakHeapPercent, + false, + thresholds + ) + ); + + // Throughput metrics (higher is better, so lowerIsBetter = false) + results.push( + createResult( + 'Throughput (avg docs/sec)', + baseline.metrics.throughput.avgDocumentsPerSecond, + current.metrics.throughput.avgDocumentsPerSecond, + false, + thresholds + ) + ); + results.push( + createResult( + 'Throughput (peak docs/sec)', + baseline.metrics.throughput.peakDocumentsPerSecond, + current.metrics.throughput.peakDocumentsPerSecond, + false, + thresholds + ) + ); + + // Index Efficiency metrics + results.push( + createResult( + 'Index Efficiency (ratio)', + baseline.metrics.indexEfficiency.avgRatio, + current.metrics.indexEfficiency.avgRatio, + false, // Higher ratio means better efficiency + thresholds + ) + ); + results.push( + createResult( + 'Index Efficiency (total indexed)', + baseline.metrics.indexEfficiency.totalDocumentsIndexed, + current.metrics.indexEfficiency.totalDocumentsIndexed, + false, + thresholds + ) + ); + results.push( + createResult( + 'Index Efficiency (total processed)', + baseline.metrics.indexEfficiency.totalDocumentsProcessed, + current.metrics.indexEfficiency.totalDocumentsProcessed, + false, + thresholds + ) + ); + + // Pages Processed metrics + results.push( + createResult( + 'Pages Processed (total)', + baseline.metrics.pagesProcessed.total, + current.metrics.pagesProcessed.total, + false, + thresholds + ) + ); + results.push( + createResult( + 'Pages Processed (avg per sample)', + baseline.metrics.pagesProcessed.avgPerSample, + current.metrics.pagesProcessed.avgPerSample, + false, + thresholds + ) + ); + + // Trigger Count metrics + results.push( + 
createResult( + 'Trigger Count (total)', + baseline.metrics.triggerCount.total, + current.metrics.triggerCount.total, + false, + thresholds + ) + ); + results.push( + createResult( + 'Trigger Count (avg per transform)', + baseline.metrics.triggerCount.avgPerTransform, + current.metrics.triggerCount.avgPerTransform, + false, + thresholds + ) + ); + + // Exponential Averages metrics + results.push( + createResult( + 'Exp Avg Checkpoint Duration', + baseline.metrics.exponentialAverages.checkpointDuration, + current.metrics.exponentialAverages.checkpointDuration, + true, + thresholds + ) + ); + results.push( + createResult( + 'Exp Avg Documents Indexed', + baseline.metrics.exponentialAverages.documentsIndexed, + current.metrics.exponentialAverages.documentsIndexed, + false, + thresholds + ) + ); + results.push( + createResult( + 'Exp Avg Documents Processed', + baseline.metrics.exponentialAverages.documentsProcessed, + current.metrics.exponentialAverages.documentsProcessed, + false, + thresholds + ) + ); + + // Transform States metrics + results.push( + createResult( + 'Transform States (indexing)', + baseline.metrics.transformStates.indexing, + current.metrics.transformStates.indexing, + false, + thresholds + ) + ); + results.push( + createResult( + 'Transform States (started)', + baseline.metrics.transformStates.started, + current.metrics.transformStates.started, + false, + thresholds + ) + ); + + return results; +}; diff --git a/src/commands/utils/metrics_comparison.ts b/src/commands/utils/metrics_comparison.ts new file mode 100644 index 0000000..776de0c --- /dev/null +++ b/src/commands/utils/metrics_comparison.ts @@ -0,0 +1,94 @@ +import { BaselineMetrics } from './baseline_metrics'; +import { compareKibanaMetrics } from './metric_comparisons/kibana_metrics'; +import { compareLatencyMetrics } from './metric_comparisons/latency_metrics'; +import { compareSystemMetrics } from './metric_comparisons/system_metrics'; +import { compareEntityMetrics } from './metric_comparisons/entity_metrics'; +import { compareErrorMetrics } from './metric_comparisons/error_metrics'; + +export interface ComparisonResult { + metric: string; + baseline: number; + current: number; + diff: number; + diffPercent: number; + status: 'improvement' | 'degradation' | 'warning' | 'stable' | 'insufficient' | 'info'; +} + +export interface ComparisonReport { + baselineName: string; + currentName: string; + timestamp: string; + results: ComparisonResult[]; + summary: { + improvements: number; + degradations: number; + warnings: number; + stable: number; + insufficientData: number; + }; +} + +export interface ComparisonThresholds { + degradationThreshold: number; // Percentage worse to be considered degradation (default: 20) + warningThreshold: number; // Percentage worse to be considered warning (default: 10) + improvementThreshold: number; // Percentage better to be considered improvement (default: 10) +} + +const DEFAULT_THRESHOLDS: ComparisonThresholds = { + degradationThreshold: 20, + warningThreshold: 10, + improvementThreshold: 10, +}; + +/** + * Compare current metrics against baseline + */ +export const compareMetrics = ( + baseline: BaselineMetrics, + current: BaselineMetrics, + thresholds: ComparisonThresholds = DEFAULT_THRESHOLDS +): ComparisonReport => { + const results: ComparisonResult[] = [ + ...compareLatencyMetrics(baseline, current, thresholds), + ...compareSystemMetrics(baseline, current, thresholds), + ...compareEntityMetrics(baseline, current, thresholds), + ...compareErrorMetrics(baseline, current, 
thresholds), + ...compareKibanaMetrics(baseline, current, thresholds), + ]; + + // Calculate summary + // Note: 'info' status metrics (p99, max) are excluded from summary counts + const summary = { + improvements: results.filter((r) => r.status === 'improvement').length, + degradations: results.filter((r) => r.status === 'degradation').length, + warnings: results.filter((r) => r.status === 'warning').length, + stable: results.filter((r) => r.status === 'stable').length, + insufficientData: results.filter((r) => r.status === 'insufficient').length, + }; + + return { + baselineName: baseline.testName, + currentName: current.testName, + timestamp: new Date().toISOString(), + results, + summary, + }; +}; + +// Re-export formatComparisonReport from report_formatter +export { formatComparisonReport } from './metric_comparisons/report_formatter'; + +/** + * Build comparison thresholds from options + */ +export const buildComparisonThresholds = (options: { + degradationThreshold?: number; + warningThreshold?: number; + improvementThreshold?: number; +}): ComparisonThresholds => { + return { + degradationThreshold: options.degradationThreshold || DEFAULT_THRESHOLDS.degradationThreshold, + warningThreshold: options.warningThreshold || DEFAULT_THRESHOLDS.warningThreshold, + improvementThreshold: options.improvementThreshold || DEFAULT_THRESHOLDS.improvementThreshold, + }; +}; diff --git a/src/constants.ts b/src/constants.ts index cfde06a..dc2ba28 100644 --- a/src/constants.ts +++ b/src/constants.ts @@ -73,3 +73,7 @@ export const ENTITY_ENGINES_URL = '/api/entity_store/engines'; export const ENTITY_ENGINE_URL = (engineType: string) => `${ENTITY_ENGINES_URL}/${engineType}`; export const INIT_ENTITY_ENGINE_URL = (engineType: string) => `${ENTITY_ENGINE_URL(engineType)}/init`; + +// Kibana Settings API endpoints +export const KIBANA_SETTINGS_URL = '/api/kibana/settings'; +export const KIBANA_SETTINGS_INTERNAL_URL = '/internal/kibana/settings'; diff --git a/src/index.ts b/src/index.ts index d89ac56..00db630 100644 --- a/src/index.ts +++ b/src/index.ts @@ -16,6 +16,8 @@ import { listPerfDataFiles, uploadPerfDataFile, uploadPerfDataFileInterval, + ENTITY_DISTRIBUTIONS, + DistributionType, } from './commands/entity_store_perf'; import { checkbox, input } from '@inquirer/prompts'; import { @@ -40,6 +42,18 @@ import fs from 'fs'; import * as RiskEngine from './risk_engine/generate_perf_data'; import * as RiskEngineIngest from './risk_engine/ingest'; import * as Pain from './risk_engine/scripted_metrics_stress_test'; +import { + extractBaselineMetrics, + saveBaseline, + loadBaseline, + listBaselines, + loadBaselineWithPattern, +} from './commands/utils/baseline_metrics'; +import { + compareMetrics, + formatComparisonReport, + buildComparisonThresholds, +} from './commands/utils/metrics_comparison'; await createConfigFileOnFirstRun(); @@ -198,9 +212,34 @@ program .argument('', 'number of entities', parseIntBase10) .argument('', 'number of logs per entity', parseIntBase10) .argument('[start-index]', 'for sequential data, which index to start at', parseIntBase10, 0) + .option( + '--distribution ', + `Entity distribution type: equal (user/host/generic/service: 25% each), standard (user/host/generic/service: 33/33/33/1) (default: standard)`, + 'standard' + ) .description('Create performance data') - .action((name, entityCount, logsPerEntity, startIndex) => { - createPerfDataFile({ name, entityCount, logsPerEntity, startIndex }); + .action(async (name, entityCount, logsPerEntity, startIndex, options) => { + const 
distributionType = options.distribution as DistributionType; + + // Validate distribution type + if (!ENTITY_DISTRIBUTIONS[distributionType]) { + console.error(`❌ Invalid distribution type: ${distributionType}`); + console.error(` Available types: ${Object.keys(ENTITY_DISTRIBUTIONS).join(', ')}`); + process.exit(1); + } + + try { + await createPerfDataFile({ + name, + entityCount, + logsPerEntity, + startIndex, + distribution: distributionType, + }); + } catch (error) { + console.error('Failed to create performance data file:', error); + process.exit(1); + } }); program @@ -224,6 +263,19 @@ program .option('--count ', 'number of times to upload', parseIntBase10, 10) .option('--deleteData', 'Delete all entities before uploading') .option('--deleteEngines', 'Delete all entities before uploading') + .option( + '--transformTimeout ', + 'Timeout in minutes for waiting for generic transform to complete (default: 30)', + parseIntBase10, + 30 + ) + .option( + '--samplingInterval ', + 'Sampling interval in seconds for metrics collection (default: 5)', + parseIntBase10, + 5 + ) + .option('--noTransforms', 'Skip transform-related operations (for ESQL workflows)') .description('Upload performance data file') .action(async (file, options) => { await uploadPerfDataFileInterval( @@ -231,7 +283,10 @@ program options.interval * 1000, options.count, options.deleteData, - options.deleteEngines + options.deleteEngines, + options.transformTimeout * 60 * 1000, // Convert minutes to milliseconds + options.samplingInterval * 1000, // Convert seconds to milliseconds + options.noTransforms // Skip transform-related operations ); }); @@ -512,4 +567,129 @@ program }); }); +// Baseline metrics commands +program + .command('create-baseline') + .argument('', 'Prefix of log files (e.g., tmp-all-2025-11-13T15:03:32)') + .option('-e ', 'Number of entities', parseIntBase10) + .option('-l ', 'Number of logs per entity', parseIntBase10) + .option('-u ', 'Number of uploads (for interval tests)', parseIntBase10) + .option('-i ', 'Interval in milliseconds (for interval tests)', parseIntBase10) + .option('-n ', 'Custom name for baseline (defaults to log-prefix)') + .description('Extract metrics from logs and create a baseline') + .action(async (logPrefix, options) => { + try { + const testConfig = { + entityCount: options.e || 0, + logsPerEntity: options.l || 0, + uploadCount: options.u, + intervalMs: options.i, + }; + + console.log(`Extracting baseline metrics from logs with prefix: ${logPrefix}`); + const baseline = await extractBaselineMetrics(logPrefix, testConfig); + + if (options.n) { + baseline.testName = options.n; + } + + const filepath = saveBaseline(baseline); + console.log(`\n✅ Baseline created successfully!`); + console.log(`File: ${filepath}`); + console.log(`\nSummary:`); + console.log(` Search Latency (avg): ${baseline.metrics.searchLatency.avg.toFixed(2)}ms`); + console.log(` Intake Latency (avg): ${baseline.metrics.intakeLatency.avg.toFixed(2)}ms`); + console.log(` CPU (avg): ${baseline.metrics.cpu.avg.toFixed(2)}%`); + console.log(` Memory Heap (avg): ${baseline.metrics.memory.avgHeapPercent.toFixed(2)}%`); + console.log( + ` Throughput (avg): ${baseline.metrics.throughput.avgDocumentsPerSecond.toFixed(2)} docs/sec` + ); + console.log(` Errors: ${baseline.metrics.errors.totalFailures}`); + } catch (error) { + console.error('❌ Failed to create baseline:', error); + if (error instanceof Error) { + console.error('Error:', error.message); + } + process.exit(1); + } + }); + +program + .command('list-baselines') + 
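+  // Lists baseline files from the ./baselines directory (see saveBaseline/loadBaseline)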
+  .description('List all available baselines')
+  .action(() => {
+    const baselines = listBaselines();
+    if (baselines.length === 0) {
+      console.log('No baselines found.');
+      return;
+    }
+
+    console.log(`\nFound ${baselines.length} baseline(s):\n`);
+    baselines.forEach((filepath: string, index: number) => {
+      try {
+        const baseline = loadBaseline(filepath);
+        console.log(`${index + 1}. ${baseline.testName}`);
+        console.log(`   Timestamp: ${baseline.timestamp}`);
+        console.log(`   File: ${filepath}`);
+        console.log('');
+      } catch {
+        console.log(`${index + 1}. ${filepath} (error loading)`);
+      }
+    });
+  });
+
+program
+  .command('compare-metrics')
+  .argument('<current-log-prefix>', 'Prefix of current run log files')
+  .option('-b <baseline-file>', 'Path to baseline file (or use latest if not specified)')
+  .option('-e <entity-count>', 'Number of entities for current run', parseIntBase10)
+  .option('-l <logs-per-entity>', 'Number of logs per entity for current run', parseIntBase10)
+  .option('-u <upload-count>', 'Number of uploads for current run', parseIntBase10)
+  .option('-i <interval-ms>', 'Interval in milliseconds for current run', parseIntBase10)
+  .option('--degradation-threshold <percent>', 'Degradation threshold percentage', parseFloat)
+  .option('--warning-threshold <percent>', 'Warning threshold percentage', parseFloat)
+  .option('--improvement-threshold <percent>', 'Improvement threshold percentage', parseFloat)
+  .description('Compare current run metrics against a baseline')
+  .action(async (currentLogPrefix, options) => {
+    try {
+      // Load baseline
+      const { baseline } = loadBaselineWithPattern(options.b);
+
+      // Extract current metrics
+      const currentTestConfig = {
+        entityCount: options.e || 0,
+        logsPerEntity: options.l || 0,
+        uploadCount: options.u,
+        intervalMs: options.i,
+      };
+
+      console.log(`Extracting metrics from current run: ${currentLogPrefix}`);
+      const current = await extractBaselineMetrics(currentLogPrefix, currentTestConfig);
+
+      // Build thresholds and compare
+      const thresholds = buildComparisonThresholds({
+        degradationThreshold: options.degradationThreshold,
+        warningThreshold: options.warningThreshold,
+        improvementThreshold: options.improvementThreshold,
+      });
+
+      const report = compareMetrics(baseline, current, thresholds);
+
+      // Print report
+      console.log(formatComparisonReport(report));
+
+      // Exit with error code if degradations found
+      if (report.summary.degradations > 0) {
+        console.log(`\n⚠️  Warning: ${report.summary.degradations} metric(s) show degradation.`);
+        process.exit(1);
+      }
+    } catch (error) {
+      console.error('❌ Failed to compare metrics:', error);
+      if (error instanceof Error) {
+        console.error('Error:', error.message);
+      }
+      process.exit(1);
+    }
+  });
+
 program.parse();
diff --git a/src/utils/kibana_api.ts b/src/utils/kibana_api.ts
index f33d69a..c8c78b6 100644
--- a/src/utils/kibana_api.ts
+++ b/src/utils/kibana_api.ts
@@ -21,6 +21,8 @@ import {
   DETECTION_ENGINE_RULES_BULK_ACTION_URL,
   API_VERSIONS,
   RISK_SCORE_ENGINE_SCHEDULE_NOW_URL,
+  KIBANA_SETTINGS_URL,
+  KIBANA_SETTINGS_INTERNAL_URL,
 } from '../constants';
 
 export const buildKibanaUrl = (opts: { path: string; space?: string }) => {
@@ -351,7 +353,10 @@ const _deleteEngine = (engineType: string, space?: string) => {
   );
 };
 
-export const deleteEngines = async (entityTypes: string[] = ['host', 'user'], space?: string) => {
+export const deleteEngines = async (
+  entityTypes: string[] = ['host', 'user', 'service', 'generic'],
+  space?: string
+) => {
   const responses = await Promise.all(
     entityTypes.map((entityType) => _deleteEngine(entityType, space))
   );
@@ -367,39 +372,216 @@ const _listEngines = (space?: string) => {
     { apiVersion: API_VERSIONS.public.v1, space }
   );
 
-  return res as Promise<{ engines: Array<{ status: string }> }>;
+  return res as Promise<{
+    engines: Array<{
+      type?: string;
+      name?: string;
+      id?: string;
+      status: string;
+      error?: string;
+      message?: string;
+    }>;
+  }>;
 };
 
-const allEnginesAreStarted = async (space?: string) => {
+const allRequestedEnginesAreStarted = async (entityTypes: string[], space?: string) => {
   const { engines } = await _listEngines(space);
   if (engines.length === 0) {
     return false;
   }
-  return engines.every((engine) => engine.status === 'started');
+
+  // Check that all requested entity types are present and started
+  for (const entityType of entityTypes) {
+    // Try to find engine by type, name, or id field
+    const engine = engines.find(
+      (e) => e.type === entityType || e.name === entityType || e.id === entityType
+    );
+    if (!engine || engine.status !== 'started') {
+      return false;
+    }
+  }
+
+  return true;
+};
+
+const getEngineStatusDetails = async (entityTypes: string[], space?: string) => {
+  const { engines } = await _listEngines(space);
+  const missingEngines = entityTypes.filter(
+    (entityType) =>
+      !engines.find((e) => e.type === entityType || e.name === entityType || e.id === entityType)
+  );
+  const errorEngines = entityTypes.filter((entityType) => {
+    const engine = engines.find(
+      (e) => e.type === entityType || e.name === entityType || e.id === entityType
+    );
+    return engine && engine.status === 'error';
+  });
+  const notStartedEngines = entityTypes.filter((entityType) => {
+    const engine = engines.find(
+      (e) => e.type === entityType || e.name === entityType || e.id === entityType
+    );
+    return engine && engine.status !== 'started' && engine.status !== 'error';
+  });
+
+  return {
+    missingEngines,
+    errorEngines,
+    notStartedEngines,
+    availableEngines: engines.map((e) => ({
+      type: e.type,
+      name: e.name,
+      id: e.id,
+      status: e.status,
+      error: e.error,
+      message: e.message,
+    })),
+  };
+};
+
+/**
+ * Updates Kibana advanced settings.
+ *
+ * @param settings - Dictionary of settings to update in Kibana
+ * @returns The response from the Kibana settings API
+ */
+export const updateKibanaSettings = async (settings: Record<string, unknown>) => {
+  const config = getConfig();
+
+  // Use standard API endpoint by default
+  let path = KIBANA_SETTINGS_URL;
+
+  // Update to serverless endpoint if needed
+  if (config.serverless) {
+    path = KIBANA_SETTINGS_INTERNAL_URL;
+    console.log('Detected serverless deployment, switching to internal API endpoint.');
+  } else {
+    console.log('Using standard Kibana settings API endpoint.');
+  }
+
+  const payload = {
+    changes: settings,
+  };
+
+  try {
+    const response = await kibanaFetch<{ settings: Record<string, unknown> }>(
+      path,
+      {
+        method: 'POST',
+        body: JSON.stringify(payload),
+      },
+      {
+        // Advanced Settings API version
+        apiVersion: '1',
+      }
+    );
+
+    console.log('Kibana settings updated successfully.');
+    return response;
+  } catch (error) {
+    console.error('Failed to update Kibana settings:', error);
+    throw error;
+  }
+};
+
+/**
+ * Enables the Asset Inventory feature in Kibana.
+ * This is required for generic entity types to work.
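+ *
+ * Usage sketch (no arguments; the Kibana connection comes from the CLI config):
+ *   await enableAssetInventory();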
+ */
+export const enableAssetInventory = async () => {
+  console.log('Enabling Asset Inventory feature...');
+  await updateKibanaSettings({
+    'securitySolution:enableAssetInventory': true,
+  });
+  console.log('Asset Inventory feature enabled.');
+  // Wait a moment for the setting to take effect
+  await new Promise((resolve) => setTimeout(resolve, 5000));
+};
 
 export const initEntityEngineForEntityTypes = async (
-  entityTypes: string[] = ['host', 'user', 'service'],
+  entityTypes: string[] = ['host', 'user', 'service', 'generic'],
   space?: string
 ) => {
-  if (await allEnginesAreStarted(space)) {
-    console.log('All engines are already started');
+  // Enable Asset Inventory if generic entities are requested
+  if (entityTypes.includes('generic')) {
+    try {
+      await enableAssetInventory();
+    } catch (error) {
+      console.warn('Failed to enable Asset Inventory feature, continuing anyway:', error);
+    }
+  }
+
+  if (await allRequestedEnginesAreStarted(entityTypes, space)) {
+    console.log('All requested engines are already started');
     return;
   }
-  await Promise.all(entityTypes.map((entityType) => _initEngine(entityType, space)));
+
+  console.log(`Initializing engines for types: ${entityTypes.join(', ')}`);
+  const initResults = await Promise.allSettled(
+    entityTypes.map((entityType) => _initEngine(entityType, space))
+  );
+
+  // Log any initialization failures
+  initResults.forEach((result, index) => {
+    if (result.status === 'rejected') {
+      console.error(`Failed to initialize engine for '${entityTypes[index]}':`, result.reason);
+    } else {
+      console.log(`Successfully initialized engine for '${entityTypes[index]}'`);
+    }
+  });
+
   const attempts = 20;
   const delay = 2000;
   for (let i = 0; i < attempts; i++) {
     console.log('Checking if all engines are started attempt:', i + 1);
-    if (await allEnginesAreStarted(space)) {
+
+    // Check for engines in error state during polling
+    const statusDetails = await getEngineStatusDetails(entityTypes, space);
+    if (statusDetails.errorEngines.length > 0) {
+      console.warn(
+        `Engines in error state detected: ${statusDetails.errorEngines.join(', ')}. This may indicate a configuration issue.`
+      );
+      if (statusDetails.errorEngines.includes('generic')) {
+        console.warn(
+          'Generic engine is in error state. Ensure Asset Inventory feature is enabled in Kibana advanced settings.'
+        );
+      }
+    }
+
+    if (await allRequestedEnginesAreStarted(entityTypes, space)) {
       console.log('All engines are started');
       return;
     }
     await new Promise((resolve) => setTimeout(resolve, delay));
   }
-  throw new Error('Failed to start engines');
+  // Final check with detailed error message
+  const statusDetails = await getEngineStatusDetails(entityTypes, space);
+  let errorMessage = 'Failed to start engines.';
+  if (statusDetails.missingEngines.length > 0) {
+    errorMessage += ` Missing engines: ${statusDetails.missingEngines.join(', ')}.`;
+  }
+  if (statusDetails.errorEngines.length > 0) {
+    errorMessage += ` Engines in error state: ${statusDetails.errorEngines.join(', ')}.`;
+    // Log error details for engines in error state
+    statusDetails.errorEngines.forEach((entityType) => {
+      const engine = statusDetails.availableEngines.find(
+        (e) => e.type === entityType || e.name === entityType || e.id === entityType
+      );
+      if (engine) {
+        console.error(
+          `Engine '${entityType}' error details:`,
+          engine.error || engine.message || 'No error details available'
+        );
+      }
+    });
+  }
+  if (statusDetails.notStartedEngines.length > 0) {
+    errorMessage += ` Engines not started: ${statusDetails.notStartedEngines.join(', ')}.`;
+  }
+  errorMessage += ` Available engines: ${JSON.stringify(statusDetails.availableEngines)}`;
+
+  throw new Error(errorMessage);
 };
 
 export const getAllRules = async (space?: string) => {
diff --git a/yarn.lock b/yarn.lock
index 7513425..6780ef1 100644
--- a/yarn.lock
+++ b/yarn.lock
@@ -1632,7 +1632,7 @@ tslib@^2.4.0, tslib@^2.6.2, tslib@^2.8.0, tslib@^2.8.1:
   resolved "https://registry.yarnpkg.com/tslib/-/tslib-2.8.1.tgz#612efe4ed235d567e8aba5f2a5fab70280ade83f"
   integrity sha512-oJFu94HQb+KVduSUQL7wnpmqnfmLsOA/nAh6b6EH0wCEoK0/mPeXU6c3wKDV83MkOuHPRHtSXKKU99IBazS/2w==
 
-tsx@^4.7.1:
+tsx@^4.20.6:
   version "4.20.6"
   resolved "https://registry.yarnpkg.com/tsx/-/tsx-4.20.6.tgz#8fb803fd9c1f70e8ccc93b5d7c5e03c3979ccb2e"
   integrity sha512-ytQKuwgmrrkDTFP4LjR0ToE2nqgy886GpvRSpU0JAnrdBYppuY5rLkRUYPU1yCryb24SsKBTL/hlDQAEFVwtZg==