Requirements

[R001] lmaster

Main daemon program that controls all other components.

[R002] lagent

Program in charge of collecting data and reporting it somewhere else. An instance of this program will run on each system to be monitored.

[R003] limon status

The limon utility has a status sub-command that reports the running status of LiMonaDA. For instance:

$ limon status
[master]
status: ok
IP: aaa.bbb.ccc.ddd
port: xzyjk

[agent: node01]
status: ok
IP: aaa.bbb.ccc.ddd
sensors: a, b, c
port: xzyjk

[agent: node02]
status: unknown
IP: aaa.bbb.ccc.ddd
port: xzyjk
last message: [Jan 12 07:23:14][ERROR] bla-bla

# ...
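The sectioned output above could be produced from a small in-memory model. The following is only a sketch under assumptions: ComponentStatus, render_status, the integer port type and the sample values used in its test are hypothetical and not part of the requirement.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ComponentStatus:
    """Hypothetical record describing one component (master or agent)."""

    name: str                      # "master" or "agent: <node>"
    status: str                    # e.g. "ok" or "unknown"
    ip: str
    port: int                      # assumed to be numeric
    sensors: list[str] = field(default_factory=list)
    last_message: Optional[str] = None


def render_status(components: list[ComponentStatus]) -> str:
    """Format components as blank-line-separated sections."""
    sections = []
    for c in components:
        lines = [f"[{c.name}]", f"status: {c.status}", f"IP: {c.ip}"]
        if c.sensors:
            # Only components that expose sensors get a "sensors" line.
            lines.append("sensors: " + ", ".join(c.sensors))
        lines.append(f"port: {c.port}")
        if c.last_message:
            lines.append(f"last message: {c.last_message}")
        sections.append("\n".join(lines))
    return "\n\n".join(sections)
```

Optional fields are emitted only when set, which matches the example: the master has no sensors line, and only the agent with an unknown status shows a last message.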

[R004] limon sensors

With limon sensors one can list and get information about available sensors. For instance:

$ limon sensors
cpu load
memory
disk
ipmi
energy
gpu load
...
$ limon sensors --long
cpu load
  Get information about CPU load...

  Usage:
  ...

  Parameters:
  ...

memory
  ...
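One possible way to back both the short and the --long listing is a registry of sensor descriptions. This is a minimal sketch, assuming a hypothetical SensorInfo record and list_sensors helper; the memory description is a placeholder, and the Usage and Parameters sections are left out because the requirement elides them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorInfo:
    """Hypothetical description of one sensor."""

    name: str
    description: str


# Illustrative registry; a real one would be filled by the sensor plugins.
SENSORS = [
    SensorInfo("cpu load", "Get information about CPU load..."),
    SensorInfo("memory", "(placeholder description)"),
]


def list_sensors(sensors: list[SensorInfo], long: bool = False) -> str:
    """Return sensor names, or names with indented descriptions if long."""
    if not long:
        return "\n".join(s.name for s in sensors)
    # Long form: one block per sensor, separated by blank lines.
    return "\n\n".join(f"{s.name}\n  {s.description}" for s in sensors)
```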

[R005] Open Source Security Foundation Best Practices badge

See Open Source Security Foundation (OpenSSF) Best Practices badge.

[R006] Active users [sensor]

Sensor that finds information about active users on the system:

  • logged in users

  • active users (users that used the cluster recently and have a valid account)
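The logged-in part of this sensor could be sketched on top of the standard who utility, whose output starts each line with the user name. The function names below are hypothetical. The "active users" part is deliberately left out, since it depends on how recent cluster usage and account validity are determined, which this requirement does not specify.

```python
import subprocess


def parse_who(output: str) -> list[str]:
    """Extract the sorted, de-duplicated user names from `who` output."""
    users = {line.split()[0] for line in output.splitlines() if line.strip()}
    return sorted(users)


def logged_in_users() -> list[str]:
    """Run `who` on the local system and return the logged-in users."""
    result = subprocess.run(["who"], capture_output=True, text=True, check=True)
    return parse_who(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the sensor testable without depending on who is logged in on the test machine.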

[R007] limon health-check

Command to trigger a battery of checks for given nodes:

$ limon health-check gn01
cores: ...
cpu load: 23.3 18.0 10.0
gpu load: ...
tail dmesg: ...
memory: ...
vmstat: ...
hardware events (ipmi): ...
network: ...
current users:
active jobs:
disk:
...
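A battery of checks like the one above could be organised as a table of named callables that are run in order. The sketch below is an assumption about the design, not the actual implementation: CHECKS, run_health_check and the two sample checks are hypothetical, and they run on the local machine only, whereas the real command would have to query the given nodes.

```python
import os
from typing import Callable


def check_cores() -> str:
    """Number of logical CPUs as seen by Python."""
    return str(os.cpu_count())


def check_cpu_load() -> str:
    """1, 5 and 15 minute load averages (os.getloadavg is Unix-only)."""
    return " ".join(f"{x:.1f}" for x in os.getloadavg())


# Hypothetical check table: label printed in the report -> check function.
CHECKS: dict[str, Callable[[], str]] = {
    "cores": check_cores,
    "cpu load": check_cpu_load,
}


def run_health_check(checks: dict[str, Callable[[], str]]) -> str:
    """Run every check and report a failure instead of aborting the battery."""
    lines = []
    for name, check in checks.items():
        try:
            value = check()
        except Exception as exc:  # one broken check must not hide the others
            value = f"failed ({exc})"
        lines.append(f"{name}: {value}")
    return "\n".join(lines)
```

Calling run_health_check(CHECKS) on a Unix system prints one "label: value" line per check, in the order the checks were registered.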

[R008] limon report

Command to produce a report of different observables in different formats. The default format is text, but plots and HTML (template-based) should also be possible.

It should be able to aggregate results where possible and requested, for instance the CPU load of nodes node01 and node02.

One use case would be reporting the current value of some sensors, e.g.:

$ limon report cpu-load --nodes gn01,cn02
gn01:
  cpu-load: 0.09 0.42 0.35
cn02:
  cpu-load: 0.33 1.37 12.09

$ limon report live cpu-load,cpu-count --nodes gn01
gn01:
  cpu-load: 0.09 0.42 0.35
  cpu-count: 14 (logical); 12 (physical)
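The text format and the aggregation could look roughly as follows. This is a sketch under assumptions: the Readings layout and both function names are hypothetical, and the element-wise mean is only one possible aggregation, since the requirement does not say how results are to be combined.

```python
from statistics import fmean

# Hypothetical layout: node name -> sensor name -> tuple of values.
Readings = dict[str, dict[str, tuple[float, ...]]]


def format_report(readings: Readings) -> str:
    """Render readings as the indented text report shown above."""
    lines = []
    for node, sensors in readings.items():
        lines.append(f"{node}:")
        for sensor, values in sensors.items():
            lines.append(f"  {sensor}: " + " ".join(f"{v:.2f}" for v in values))
    return "\n".join(lines)


def aggregate_mean(readings: Readings, sensor: str) -> tuple[float, ...]:
    """Element-wise mean of one sensor over all nodes that report it."""
    rows = [sensors[sensor] for sensors in readings.values() if sensor in sensors]
    # zip(*rows) groups the first values, the second values, and so on.
    return tuple(fmean(column) for column in zip(*rows))
```

With the two cpu-load readings from the first example, aggregate_mean averages the 1, 5 and 15 minute values separately across gn01 and cn02.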

[R009] limon live

Open a Textual-based dashboard with live data of the cluster.

  • use textual-plottext

  • get inspiration from dolphie

  • Example idea: a collapsible with one sparkline per node showing the history of CPU load
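To illustrate the one-sparkline-per-node idea without committing to a widget API, here is a dependency-free text sketch. It does not use Textual or textual-plottext; a real dashboard would presumably rely on their widgets instead. The function names and the row layout are hypothetical.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values: list[float]) -> str:
    """Map each value to one of eight bar heights, scaled to the data range."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat history is drawn as a flat line of the lowest bar.
        return BARS[0] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[round((v - lo) / span * top)] for v in values)


def render_rows(history: dict[str, list[float]]) -> str:
    """One row per node: the node name followed by its CPU load history."""
    width = max((len(node) for node in history), default=0)
    return "\n".join(
        f"{node:<{width}} {sparkline(loads)}" for node, loads in history.items()
    )
```

Each row scales to its own minimum and maximum, so rows show the shape of a node's history rather than comparable absolute loads; a shared scale would be needed to compare nodes.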