.. _requirements:

============
Requirements
============

[R001] ``lmaster``
==================

Main daemon program that controls all other components.

[R002] ``lagent``
=================

Program in charge of collecting data and reporting it somewhere else. An
instance of this program will run on each system to be monitored.

[R003] ``limon status``
=======================

The utility program ``limon`` has a ``status`` sub-command that reports the
running status of |project|. For instance:

.. code-block:: console

    $ limon status
    [master]
    status: ok
    IP: aaa.bbb.ccc.ddd
    port: xzyjk
    [agent: node01]
    status: ok
    IP: aaa.bbb.ccc.ddd
    sensors: a, b, c
    port: xzyjk
    [agent: node02]
    status: unknown
    IP: aaa.bbb.ccc.ddd
    port: xzyjk
    last message: [Jan 12 07:23:14][ERROR] bla-bla
    # ...

[R004] ``limon sensors``
========================

With ``limon sensors`` one can list and get information about available
sensors. For instance:

.. code-block:: console

    $ limon sensors
    cpu load
    memory
    disk
    ipmi
    energy
    gpu load
    ...

    $ limon sensors --long
    cpu load
        Get information about CPU load...
        Usage: ...
        Parameters: ...
    memory
        ...

[R005] Open Source Security Foundation Best Practices badge
===========================================================

See `Open Source Security Foundation (OpenSSF) Best Practices badge`_.

.. _`Open Source Security Foundation (OpenSSF) Best Practices badge`: https://www.bestpractices.dev/en

[R006] Active users [sensor]
============================

Sensor that finds information about active users on the system:

- logged-in users
- active users (users that used the cluster recently and have a valid account)

[R007] ``limon health-check``
=============================

Command to trigger a battery of checks for given nodes:

.. code-block:: console

    $ limon health-check gn01
    cores: ...
    cpu load: 23.3 18.0 10.0
    gpu load: ...
    tail dmesg: ...
    memory: ...
    vmstat: ...
    hardware events (ipmi): ...
    network: ...
    current users:
    active jobs:
    disk: ...

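
One way to picture such a battery is a table of named check functions that are
run one after another. The sketch below is only an illustration under stated
assumptions, not the |project| implementation: the names ``CHECKS`` and
``run_health_check`` are hypothetical, the checks run on the local machine
only, and just three of the checks listed above are covered, using nothing but
the Python standard library.

.. code-block:: python

    import os
    import shutil
    from typing import Callable, Dict


    def check_cores() -> str:
        """Number of logical CPUs visible to the operating system."""
        return str(os.cpu_count())


    def check_cpu_load() -> str:
        """1, 5 and 15 minute load averages (not available on every platform)."""
        return " ".join(f"{value:.1f}" for value in os.getloadavg())


    def check_disk() -> str:
        """Fraction of the root file system that is in use."""
        usage = shutil.disk_usage("/")
        return f"{usage.used / usage.total:.0%} used"


    # Hypothetical registry: check name -> function returning a one-line result.
    CHECKS: Dict[str, Callable[[], str]] = {
        "cores": check_cores,
        "cpu load": check_cpu_load,
        "disk": check_disk,
    }


    def run_health_check(checks: Dict[str, Callable[[], str]] = CHECKS) -> Dict[str, str]:
        """Run every check; a failing check is reported, it does not stop the rest."""
        results = {}
        for name, check in checks.items():
            try:
                results[name] = check()
            except Exception as error:
                results[name] = f"failed: {error}"
        return results


    if __name__ == "__main__":
        for name, value in run_health_check().items():
            print(f"{name}: {value}")

Keeping the checks in a plain mapping means a new check only needs a function
and one registry entry. Running them on remote nodes such as ``gn01`` is left
out of this sketch.
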
[R008] ``limon report``
=======================

Command to produce a report of different observables in different formats.
The default format is text, but plots and HTML (template-based) should also be
possible.

It should be able to aggregate results where possible and requested, for
instance the CPU load of nodes ``node01`` and ``node02``.

One use case would be reporting the current value of some sensors, e.g.:

.. code-block:: console

    $ limon report cpu-load --nodes gn01,cn02
    gn01:
      cpu-load: 0.09 0.42 0.35
    cn02:
      cpu-load: 0.33 1.37 12.09

    $ limon report live cpu-load,cpu-count --nodes gn01
    gn01:
      cpu-load: 0.09 0.42 0.35
      cpu-count: 14 (logical); 12 (physical)

[R009] ``limon live``
=====================

Open a Textual-based dashboard with live data of the cluster.

- use textual-plottext
- get inspiration from dolphie
- Example of an idea: a collapsible with one sparkline per node showing the
  history of CPU load

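
To make the sparkline idea concrete, here is a small plain-text sketch that
does not depend on any widget toolkit. It is an assumption-laden illustration
only: the function name ``sparkline`` and the ``history`` mapping are
hypothetical, the sample values are reused from the report example rather than
measured, and a real dashboard would presumably use a widget from its toolkit
instead.

.. code-block:: python

    from typing import Dict, List, Sequence

    # Eight block characters, from lowest to highest.
    BARS = "▁▂▃▄▅▆▇█"


    def sparkline(values: Sequence[float]) -> str:
        """Map each value to a block character scaled between min and max."""
        if not values:
            return ""
        low, high = min(values), max(values)
        span = high - low
        if span == 0:
            # A flat history is drawn as a row of the lowest bar.
            return BARS[0] * len(values)
        top = len(BARS) - 1
        return "".join(BARS[round((value - low) / span * top)] for value in values)


    if __name__ == "__main__":
        # Hypothetical per-node CPU load history (illustrative values only).
        history: Dict[str, List[float]] = {
            "gn01": [0.09, 0.42, 0.35],
            "cn02": [0.33, 1.37, 12.09],
        }
        for node, loads in history.items():
            print(f"{node}: {sparkline(loads)}")

Each node is scaled against its own minimum and maximum here, so the lines
show the shape of each history but are not comparable between nodes. A shared
scale would be a reasonable alternative.
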