Monitoring & Metrics
Joynare Nexus includes a built-in, persistent metrics system designed for high-performance monitoring and historical analysis.
1. Overview
Unlike standard logs, which are optimized for human readability and debugging, metrics are optimized for machine processing and long-term storage. They allow you to:
- Track API usage and response codes.
- Identify slow services (Flows and Adapters).
- Analyze request volume and trends over time.
- Power administrative monitoring dashboards.
2. Persistent Storage
Metrics are stored in the system database configured in config/system.yaml. Two specialized tables are automatically created upon startup:
A. API Request Metrics (metrics_requests)
Tracks every HTTP request handled by the ESB.
- trace_id: Correlates the request with system logs.
- path: The URL path of the request.
- method: The HTTP verb (GET, POST, etc.).
- status_code: The HTTP response code (e.g., 200, 404, 500).
- duration_ms: Total end-to-end processing time in milliseconds.
B. Service Execution Metrics (metrics_services)
Tracks every Flow or Adapter execution, including those nested within other flows.
- service_key: The qualified name of the service (e.g., orders:ProcessOrder).
- duration_ms: The internal execution time of the service.
- status: Whether the service finished with SUCCESS or FAILURE.
3. High-Performance Architecture
To ensure that monitoring does not slow down your business logic, Joynare Nexus uses an Asynchronous Batching strategy:
- Non-Blocking Capture: When a request or service completes, its data is pushed into an in-memory buffered channel.
- Background Worker: A dedicated background goroutine periodically drains this channel.
- Bulk Insertion: Data is written to the database in batches (e.g., every 5 seconds or every 100 records). This significantly reduces the overhead of database round-trips.
4. Configuration
The metrics system reuses the system database connection. Ensure your config/system.yaml is correctly configured:
database:
  driver: "mysql"
  dsn: "user:pass@tcp(localhost:3306)/nexus_db?parseTime=true"

The worker settings are currently tuned for production defaults (5-second flush interval, 100-record batch size).
5. Querying Metrics
You can query the metrics directly from your database to generate reports.
Slowest Services (Top 10)
SELECT service_key, AVG(duration_ms) as avg_lat
FROM metrics_services
GROUP BY service_key
ORDER BY avg_lat DESC
LIMIT 10;

Request Volume by Status Code
SELECT status_code, COUNT(*) as count
FROM metrics_requests
GROUP BY status_code;

Hourly Usage Trend
SELECT DATE_FORMAT(timestamp, '%Y-%m-%d %H:00:00') as hour, COUNT(*) as volume
FROM metrics_requests
GROUP BY hour
ORDER BY hour DESC;