Health Monitoring rev 04
Health Monitoring integrates the Checkmk monitoring stack into JovianDSS as a self-contained LXC container. It collects metrics from the local Storage Server out-of-the-box and can be extended to monitor additional Storage Servers over the network. Notifications for state changes are delivered by email using the Storage Server's existing SMTP settings.
Note: Health Monitoring is delivered as an optional Small Update (the xc-checkmk module). This article describes revision 04. If your system was updated to a newer revision, refer to the matching Extension:Health_Monitoring_rev_NN article.
What's new in revision 04
Revision 04 is a major refresh of the container and changes behavior in several user-visible areas. Highlights:
- Platform — container rebased on Debian 12 (bookworm); Checkmk updated to 2.1.0p49.
- Login uses the Storage Server admin password — the admin user is authenticated via PAM against the same credentials you use on the Storage Server GUI. Changing the admin password on the Storage Server immediately applies here; no separate password management. The legacy cmkadmin user is preserved for backward compatibility.
- Email uses the Storage Server's SMTP settings — notifications are sent through the same email gateway configured under System Settings → Administration → Email notifications. The To address is applied automatically via a placeholder (admin@localhost) that is rewritten at send time, so changes to the Storage Server's email settings take effect immediately without touching Checkmk.
- Automatic service discovery — newly imported pools and their datasets appear in monitoring automatically (typically within 2 hours). Exported pools and removed plugins are cleaned up the same way. No manual Activate Changes click is required for discovered services.
- Zero spurious notifications around reboots — Checkmk is placed in scheduled maintenance before the container stops, and the maintenance state is cleared at boot once services have settled. Pool-specific downtimes are cleared event-driven when each pool mounts, so a pool that fails to import still produces a legitimate alert once the downtime window expires.
- No "flapping" emails — brief outages during reboots, pool imports, or network blips no longer generate flap notifications. Flap state is still visible in the GUI.
- OEM branding — ScaleLogic-branded sidebar icon and default-password handling for ScaleLogic systems.
Accessing the Checkmk GUI
Once the Storage Server boots with the Health Monitoring Small Update installed, the Checkmk GUI is available at:
https://<storage-server-ip>:4080/dssmonitor
- Login: admin (same credentials as the Storage Server GUI)
- Password: your Storage Server administrator password
The Checkmk web interface uses the same authentication as the Storage Server GUI — when you change your admin password on the Storage Server, it automatically applies to Checkmk as well. No separate password management is needed.
Note: the legacy cmkadmin user is also available for backward compatibility.
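As a small illustration of the address format above, the following Python sketch assembles the GUI URL from a Storage Server IP address. The port (4080) and path (dssmonitor) come from this article; the function name and the IPv6 bracket handling are assumptions added for the example.

```python
import ipaddress

GUI_PORT = 4080            # port stated in this article
SITE_PATH = "dssmonitor"   # path stated in this article


def checkmk_gui_url(server_ip: str) -> str:
    """Return the Checkmk GUI URL for a Storage Server IP address.

    Hypothetical helper: raises ValueError if server_ip is not a valid
    IPv4 or IPv6 address.
    """
    addr = ipaddress.ip_address(server_ip)
    # IPv6 literals must be wrapped in brackets inside a URL.
    host = f"[{addr}]" if addr.version == 6 else str(addr)
    return f"https://{host}:{GUI_PORT}/{SITE_PATH}"
```

For example, checkmk_gui_url("192.168.0.220") yields https://192.168.0.220:4080/dssmonitor.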
Local Storage Server configuration
The local Storage Server is pre-configured in the Checkmk container as local-storage-server (in the storage-servers folder). It receives monitoring data from the local Checkmk agent.
Pre-configured monitoring rules
The default monitoring rules are located at:
Setup → Agents → Other integrations → Individual program call instead of agent access
Three rules are pre-configured:
Rule 1: Local Storage Server (Enabled)
- Purpose: monitor the local Storage Server via SSH to localhost
- Status: enabled by default
- Explicit Hosts entry:
~local-storage-server
The tilde (~) makes the rule apply to every host name beginning with local-storage-server.
- Customization: if you prefer a more descriptive host name (e.g. local-storage-server-220), create the new hostname and remove the default local-storage-server entry.
Rule 2: Remote monitoring via REST API (Disabled)
- Purpose: monitor additional Storage Servers via the REST API
- Status: disabled by default
- Explicit Hosts entry:
~storage-server
- Default credentials:
rest_api_user=admin; rest_api_pswd=admin; rest_api_port=82;
Prerequisites on each remote Storage Server:
- Enable REST API under System Settings → Administration → REST API access.
- Update the command-line credentials above if the remote password differs from admin.
Rule 3: Remote monitoring via SSH CLI (Enabled)
- Purpose: monitor additional Storage Servers using the SSH CLI command check_mk_agent
- Status: enabled by default
- Explicit Hosts entry:
~storage-server
- Default credentials:
rest_api_user=admin; rest_api_pswd=admin; rest_api_port=82; cli_port=22223;
Prerequisites on each remote Storage Server:
- Enable CLI access under System Settings → Administration → CLI access and click Generate and download to retrieve the SSH key.
- Enable the REST API (required for the initial SSH-key download) under System Settings → Administration → REST API access.
- Update the command-line credentials if the remote Storage Server uses different settings.
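The credential strings in the rules are semicolon-separated key=value pairs. The Python sketch below parses such a string so it can be reviewed before editing the rule, for instance to spot a password left at its default. It is only an illustration: the function names are hypothetical, and how the agent call itself interprets the string is not documented here.

```python
def parse_rule_options(text: str) -> dict[str, str]:
    """Split 'key=value; key=value;' into a dictionary (illustrative parser)."""
    options: dict[str, str] = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue  # ignore the empty piece after the trailing semicolon
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {part!r}")
        options[key.strip()] = value.strip()
    return options


def uses_default_password(options: dict[str, str]) -> bool:
    """True if the REST API password is still the default 'admin'."""
    return options.get("rest_api_pswd") == "admin"
```

Applied to the Rule 3 default string, parse_rule_options returns the four keys rest_api_user, rest_api_pswd, rest_api_port and cli_port, and uses_default_password reports True.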
Step-by-step: add a remote Storage Server
- On the remote Storage Server, enable CLI access and download the SSH key; enable REST API access.
- In Checkmk, create a new host with a name starting with storage-server (e.g. storage-server-220). The SSH-CLI rule will apply automatically via the ~storage-server pattern.
- Update the credentials in the rule if the remote Storage Server uses different values.
- Run service discovery on the new host and verify services are discovered.
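The host-creation step can also be scripted against the Checkmk REST API instead of the GUI. The Python sketch below only builds the request and does not send it. The endpoint path and the Bearer header format follow the upstream Checkmk REST API; the base URL, the automation user and its secret, the folder, and the function name are assumptions that would need checking on a real system. Changes made this way would still need to be activated in Checkmk.

```python
import json
import urllib.request


def build_create_host_request(base_url, user, secret, host_name, ip_address, folder="/"):
    """Build (but do not send) a Checkmk REST API 'create host' request.

    base_url is assumed to look like
    https://<storage-server-ip>:4080/dssmonitor/check_mk/api/1.0
    """
    if not host_name.startswith("storage-server"):
        # Otherwise the ~storage-server rules from this article would not apply.
        raise ValueError("host name must start with 'storage-server'")
    body = {
        "folder": folder,
        "host_name": host_name,
        "attributes": {"ipaddress": ip_address},
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/domain-types/host_config/collections/all",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {user} {secret}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would submit it; certificate handling is left out of the sketch.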
Explicit hosts configuration
Rules with tilde-prefixed entries (e.g. ~storage-server, ~local-storage-server) apply to every monitored host whose name begins with the given string:
- storage-server-220, storage-server-221, …
- local-storage-server-220, local-storage-server-221, …
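The prefix matching described above can be sketched in Python. In upstream Checkmk, a tilde-prefixed entry is treated as a regular expression matched from the start of the host name, so ~storage-server does not match local-storage-server-220. The function below is an illustration of that behavior, not Checkmk's own code, and its name is hypothetical.

```python
import re


def rule_applies(explicit_host_entry: str, host_name: str) -> bool:
    """Illustrative check of an Explicit Hosts entry against a host name."""
    if explicit_host_entry.startswith("~"):
        # re.match anchors at the beginning of the string only,
        # which gives the "begins with" behavior described above.
        return re.match(explicit_host_entry[1:], host_name) is not None
    # Without the tilde, the entry must equal the host name exactly.
    return explicit_host_entry == host_name
```

This is also why both remote rules (Rule 2 and Rule 3) target the same hosts, while the local rule stays separate.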
Storage space and automatic cleanup
- Allocated storage: approximately 4 GB is reserved for data collection.
- Automatic cleanup: Checkmk removes older data when free space falls below 300 MB.
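The cleanup condition amounts to a simple threshold test on free space. The Python sketch below shows that test. It assumes 1 MB means 1024 × 1024 bytes, which this article does not specify, and the function names are hypothetical; it does not reproduce how Checkmk selects which data to remove.

```python
import shutil

CLEANUP_THRESHOLD_MB = 300  # threshold stated in this article


def cleanup_needed(free_bytes: int, threshold_mb: int = CLEANUP_THRESHOLD_MB) -> bool:
    """True when free space is below the threshold (assumes 1 MB = 1024 * 1024 bytes)."""
    return free_bytes < threshold_mb * 1024 * 1024


def free_bytes_on(path: str = ".") -> int:
    """Free bytes on the filesystem that holds path."""
    return shutil.disk_usage(path).free
```

A call such as cleanup_needed(free_bytes_on("/some/path")) would report whether the threshold has been crossed for that filesystem; the path here is a placeholder.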
Console access
The console of the container is reachable at:
https://<storage-server-ip>:4200/checkmk/
Log in as admin with the Storage Server administrator password.
Email notifications
Email notifications are automatically configured from the Storage Server's own email settings — no separate SMTP configuration is needed inside Checkmk.
How it works
- Configure email on the Storage Server: System Settings → Administration → Email notifications. Set the SMTP server, From address and To address.
- Once saved:
- the From address is used as the sender for Checkmk notification emails;
- the To address is used as the destination — Checkmk contacts are configured with a placeholder (admin@localhost) that the email gateway rewrites to your configured To address;
- changes to either address take effect immediately; no Checkmk restart needed.
- Additional customization is available in the Checkmk GUI:
- notification rules under Setup → Events → Notifications;
- per-user email settings under Setup → Users.
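The placeholder mechanism can be pictured as a substitution applied to each recipient at send time. The Python sketch below illustrates the idea under that assumption; it is not the gateway's actual implementation, and the function name and example addresses are hypothetical.

```python
PLACEHOLDER = "admin@localhost"  # placeholder address named in this article


def rewrite_recipients(recipients: list[str], configured_to: str) -> list[str]:
    """Replace the placeholder with the Storage Server's configured To address.

    Addresses other than the placeholder are passed through unchanged.
    """
    return [
        configured_to if recipient.strip().lower() == PLACEHOLDER else recipient
        for recipient in recipients
    ]
```

Because the substitution happens per message, a change to the To address on the Storage Server affects the next notification without any change to Checkmk contacts.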
Important notes
- Email delivery uses the Storage Server's central SMTP proxy — the same gateway used by ownCloud, cron jobs, etc.
- The default notification rule emails only for actual state changes (OK↔WARN↔CRIT↔UNKNOWN). Scheduled-downtime start/end and flapping events do not generate emails.
- To send a test email, use Setup → Events → Notifications → Test notifications.
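The effect of the default notification rule can be summarized as a small decision function. The Python sketch below models it with illustrative event labels ("state_change", "downtime_start" and so on); these labels are assumptions made for the example and are not Checkmk's internal notification type names.

```python
STATES = {"OK", "WARN", "CRIT", "UNKNOWN"}


def should_email(event_kind: str, old_state: str, new_state: str) -> bool:
    """Model of the default rule: email only for real state changes.

    event_kind uses illustrative labels such as "state_change",
    "downtime_start", "downtime_end", "flapping_start", "flapping_stop".
    """
    if event_kind != "state_change":
        return False  # downtime and flapping events never produce an email
    if old_state not in STATES or new_state not in STATES:
        raise ValueError("unknown service state")
    return old_state != new_state
```

For example, a transition from OK to CRIT produces an email, while the start of a scheduled downtime on a CRIT service does not.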
Automatic service discovery
Checkmk automatically manages the list of monitored services:
- new services (e.g. from a newly imported pool) are added to monitoring within 2 hours;
- vanished services (e.g. from an exported pool or a removed plugin) are cleaned up within 2 hours;
- changes are activated automatically — no manual Activate Changes click needed for discovered services.
So, when you import a new ZFS pool, its capacity, health, snapshot age and compression services appear in Checkmk on their own. When you export a pool, the corresponding services are removed instead of lingering in UNKNOWN state. When a monitoring plugin is added or removed, services are updated accordingly.
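Conceptually, each discovery cycle compares the services currently monitored with the services the agent reports, then adds the difference in one direction and removes it in the other. The Python sketch below shows that comparison. The service names are made up for the example, and Checkmk's real discovery logic is more involved.

```python
def plan_discovery(monitored: set[str], found: set[str]) -> tuple[list[str], list[str]]:
    """Return (services to add, services to remove) for one discovery cycle."""
    to_add = sorted(found - monitored)      # new, e.g. after a pool import
    to_remove = sorted(monitored - found)   # vanished, e.g. after a pool export
    return to_add, to_remove


# Hypothetical service names: Pool-1 was exported, Pool-2 was imported.
monitored_now = {"CPU load", "Pool-1 capacity", "Pool-1 health"}
reported_now = {"CPU load", "Pool-2 capacity", "Pool-2 health"}
added, removed = plan_discovery(monitored_now, reported_now)
```

Here added contains the two Pool-2 services and removed contains the two Pool-1 services, while CPU load is left untouched.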
Manual discovery
To avoid waiting for the automatic cycle, trigger discovery manually:
- via the GUI: go to the host → Setup → Services → Full service scan
- via the console:
su - dssmonitor -c "cmk -II local-storage-server && cmk -O"
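To run the same discovery for a different host, for example a remote Storage Server, the console command can be generated rather than typed. The Python sketch below builds the argument list for the command shown above. The site user dssmonitor and the cmk options are taken from this article; the function name is hypothetical.

```python
import shlex

SITE_USER = "dssmonitor"  # site user from the console command in this article


def discovery_command(host_name: str = "local-storage-server") -> list[str]:
    """Build the argument list for a manual discovery followed by a config reload."""
    # shlex.quote guards against spaces or shell metacharacters in the host name.
    inner = f"cmk -II {shlex.quote(host_name)} && cmk -O"
    return ["su", "-", SITE_USER, "-c", inner]
```

On the container console the result could be passed to subprocess.run(discovery_command("storage-server-220"), check=True); this needs sufficient privileges to switch to the site user.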
Reboot behavior
The monitoring container is designed to produce zero spurious notification emails during system reboots:
- At shutdown: Checkmk automatically enters scheduled maintenance mode before the container stops, so no "service unavailable" alerts fire while the system is going down.
- At boot: pool-specific downtimes are cleared automatically the moment each pool mounts; downtimes on other services (SMART, CPU, latency) are cleared once monitoring has confirmed three consecutive healthy checks. Monitoring then resumes normally.
- Boot ordering: the container starts only after ZFS pools are imported and the monitoring agent has collected fresh data, so the first check cycle sees everything already in its normal state.
If a real problem persists after the boot window (e.g. a pool fails to import), it is reported normally once the maintenance period for that pool expires.
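The "three consecutive healthy checks" condition for non-pool services can be expressed as a test on the most recent check results. The Python sketch below is one way to state it, assuming each check is reduced to a healthy or unhealthy flag; the function name is hypothetical and the container's actual mechanism is not shown in this article.

```python
def downtime_can_clear(results: list[bool], required: int = 3) -> bool:
    """True once the last `required` check results are all healthy.

    results is ordered oldest to newest; True means the check was healthy.
    """
    if len(results) < required:
        return False  # not enough checks yet to decide
    return all(results[-required:])
```

A single unhealthy result resets the wait: after [True, True, False] three further healthy checks are needed before the downtime is cleared.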
For further customization or troubleshooting, refer to the upstream Checkmk documentation or contact Open-E support.