All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Bristot de Oliveira <bristot@kernel.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>,
	Wim Van Sebroeck <wim@linux-watchdog.org>,
	Guenter Roeck <linux@roeck-us.net>,
	Jonathan Corbet <corbet@lwn.net>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Marco Elver <elver@google.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Gabriele Paoloni <gpaoloni@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Clark Williams <williams@redhat.com>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-trace-devel@vger.kernel.org
Subject: [PATCH V4 20/20] Documentation/rv: Add watchdog-monitor documentation
Date: Thu, 16 Jun 2022 10:45:02 +0200	[thread overview]
Message-ID: <129a431c1a12610fa7b44f76ce73aa8058f55bc6.1655368610.git.bristot@kernel.org> (raw)
In-Reply-To: <cover.1655368610.git.bristot@kernel.org>

Adds documentation about the safe_wtd and safe_wtd_nwo RV monitors,
and their usage via a safety application.

Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
---
 Documentation/trace/rv/watchdog-monitor.rst | 250 ++++++++++++++++++++
 1 file changed, 250 insertions(+)
 create mode 100644 Documentation/trace/rv/watchdog-monitor.rst

diff --git a/Documentation/trace/rv/watchdog-monitor.rst b/Documentation/trace/rv/watchdog-monitor.rst
new file mode 100644
index 000000000000..2b142fb31572
--- /dev/null
+++ b/Documentation/trace/rv/watchdog-monitor.rst
@@ -0,0 +1,250 @@
+Watchdog monitor
+----------------
+
+The watchdog is an essential building block for the usage of Linux in
+safety-critical systems because it allows the system to be monitored from
+an external element - the watchdog hardware, acting as a safety-monitor.
+
+A user-space application controls the watchdog device via the watchdog
+interface. This application, hereafter safety_app, enables the watchdog
+and periodically pets the watchdog upon correct completion of the safety
+related processing.
+
+If the safety_app, for any reason, stops pinging the watchdog,
+the watchdog hardware can set the system in a fail-safe state. For
+example, shutting the system down.
+
+Given the importance of the safety_app / watchdog hardware couple,
+the interaction between these software pieces also needs some
+sort of monitoring. In other words, "who monitors the monitor?"
+
+The safe watchdog (safe_wtd) RV monitor monitors the interaction between
+the safety_app and the watchdog device, enforcing the correct sequence of
+events that leads the system to a safe state.
+
+Furthermore, the safety_app can monitor the RV monitor by collecting the
+events generated by the RV monitor itself via tracing interface. In this way,
+closing the monitoring loop with the safety_app.
+
+A diagram of the components and their interactions is::
+
+  user-space:
+                  +--------------------------------+
+                  | safety_app                     |-----------+
+                  +--------------------------------+           |
+                     |                    ^                    |
+                     | Configure          | Enable and         |
+                     |                    | check data         |
+  ===================+====================+===============     |
+  kernel-space:      |                    |                    |
+                     v                    v                    |
+                +----------+  instr.     +-------------+       |
+                | watchdog | ----------->| RV Monitor  |----+  |
+                | device   |             +-------------+    |  |
+                +----------+                                |  |
+                  |                                         |  |
+                  |                                         |  |
+  ================+======================================   |  |
+  hardware:       |                                         |  |
+                  v                                         |  +-> Bring the system
+                +--------------------+                      +----> to a safe state,
+                | watchdog hardware  |---------------------------> e.g., halt.
+                +--------------------+
+
+Sample safety_app
+-----------------
+
+The user-space safety_app sample code in ``tools/verification/safety_app/``
+serves to illustrate the usage of the RV monitors for this use-case, as
+well as the starting point to the development of a user-specific safety_app.
+
+Watchdog events
+---------------
+
+The RV monitor observes the watchdog by using instrumentation to
+process the events generated by the interaction between the
+safety_app and the watchdog device layer in kernel.
+
+The monitored events are:
+
+  - watchdog:watchdog_open: open the watchdog device;
+  - watchdog:watchdog_close: close the watchdog device;
+  - watchdog:watchdog_start: start the watchdog;
+  - watchdog:watchdog_stop: stop the watchdog;
+  - watchdog:watchdog_set_timeout: set the watchdog timeout;
+  - watchdog:watchdog_ping: reprogram the watchdog with the previously set
+    timeout;
+  - watchdog:watchdog_nowayout: prevents the watchdog from stopping;
+  - watchdog:watchdog_set_keep_alive: set an intermediary ping to overcome
+    the limitation of a hardware watchdog maximum timeout being shorter than
+    the timeout set by the user-space tool;
+  - watchdog:watchdog_keep_alive: the execution of the function that runs the
+    intermediary keep alive ping;
+
+RV monitor events
+-----------------
+
+The RV monitor monitors the relevant events as an outside observer,
+interpreting all the components (the hardware; the watchdog device
+interface; and the safety monitor) as an integrated component.
+
+The events selected for the monitor are:
+
+  - other_threads: an event generated by any thread other than the
+    one that set nowayout or open the watchdog the last time.
+  - open: a thread opens the watchdog to manipulate it;
+  - close: a thread closes the watchdog;
+  - start: starts the watchdog countdown;
+  - stop: stops the watchdog;
+  - set_safe_timeout: configures the watchdog with a given timeout;
+  - ping: resets the watchdog countdown with the previously configured timeout;
+  - nowayout: prevents the watchdog to be stopped until the system's shutdown;
+  - sched_keep_alive: schedules a kernel worker to ping the watchdog if the
+    timeout is longer than the watchdog hardware can handle.
+  - keep_alive: executes the previously scheduled watchdog ping;
+
+Noting that the events that does not appear in the automata models are
+considered blocked events, and their execution will always cause the
+RV monitor to react to an unexpected event.
+
+RV monitor specification
+------------------------
+
+The monitor's goal is to assess a set of specifications that conducts the
+system to a safe state.
+
+These specifications are:
+
+  - 1: Once open, only one process manipulates the watchdog;
+  - 2: Following 1, the keep-alive mechanisms will not be used;
+  - 3: If required, nowayout will be set before opening the watchdog;
+  - 4: A safe timeout must be set;
+  - 5: At least one ping must be made before entering the safe/safe_nwo states
+  - 6: The RV monitor does not react if the watchdog is closed without stopping.
+       But the hardware watchdog is expected to react.
+
+Deterministic automata monitors
+-------------------------------
+
+Following the specifications, a deterministic automata monitor
+was developed. The monitor is modeled as Deterministic Automata model.
+
+The deterministic automata model for safe_wtd is::
+
+              #==================================#   other_threads
+              H                                  H ----------------+
+ -----------> H               init               H                 |
+              H                                  H <---------------+
+              #==================================#
+                      |     |     ^
+                      |     |     |               close
+                      |     |     +----------------------------------------------------+
+                      |     |                                                          |
+                      |     |                     open                                 |
+                      |     +------------------------------------------------------+   |
+                      |                                                            |   |
+                      |  nowayout                                                  |   |
+                      v                                                            |   |
+    nowayout        +-------------------+                                          |   |
+    other_threads   |                   |          nowayout                        |   |
+  +---------------- |        nwo        |<-------------------------------------+   |   |
+  |                 |                   |                                      |   |   |
+  +---------------> |                   | <+                                   |   |   |
+                    +-------------------+  |                                   |   |   |
+                      |                    |                                   |   |   |
+                      | open               | close                             |   |   |
+                      v                    |                                   |   |   |
+                    +-------------------+  |                                   |   |   |
+                    |    opened_nwo     | -+                                   |   |   |
+                    +-------------------+                                      |   |   |
+                      |                                                        |   |   |
+                      | start                                                  |   |   |
+                      v                                                        |   |   |
+                    +-------------------+                                      |   |   |
+  +---------------> |    started_nwo    | -+                                   |   |   |
+  |                 +-------------------+  |                                   |   |   |
+  |                   |                    |                                   |   |   |
+  | open              | set_safe_timeout   |                                   |   |   |
+  |                   v                    |                                   |   |   |
+  |                 +-------------------+  |                                   |   |   |
+  |                 |      set_nwo      |  |                                   |   |   |
+  |                 +-------------------+  |                                   |   |   |
+  |                           |            |                                   |   |   |
+  |     +-------------+       | ping       |                                   |   |   |
+  |     |             |       |            |                                   |   |   |
+  |     | ping        v       v            |                                   |   |   |
+  |     |           +-------------------+  |                                   |   |   |
+  |     +-----------|     safe_nwo      |  |                                   |   |   |
+  |                 +-------------------+  |                                   |   |   |
+  |                   |                    |                                   |   |   |
+  |                   | close              | close                             |   |   |
+  |                   v                    v                                   |   |   |
+  |                 +----------------------------------+   nowayout            |   |   |
+  |                 |                                  |   other_threads       |   |   |
+  |                 |        closed_running_nwo        | ----------------+     |   |   |
+  |                 |                                  |                 |     |   |   |
+  +---------------- |                                  | <---------------+     |   |   |
+                    +----------------------------------+                       |   |   |
+                        |        nowayout             ^                        |   |   |
+                        +-----------------------------+                        |   |   |
+                                                                               |   |   |
+                                                                               |   |   |
+                               +-------------------+           +--------+      |   |   |
+                               |                   |           |        |------+---+   |
+                               |      started      |  start    | opened |      |       |
+             +---------------- |                   | <-------- |        |>-----+-------+
+             |                 +-------------------+           +--------+      |       ^
+             |                   |                                             |       |
+             |                   | set_safe_timeout              +-------------+-------+
+             |                   v                               |             |
+             |                 +-------------------+             |             |
+             |                 |                   |             |             |
+             |                 |        set        |             |             |
+  +----------+---------------> |                   |             |             |
+  |          |                 +-------------------+             |             |
+  |          |                   |                               |             |
+  |          |                   | ping                          |             |
+  |          |                   v                               |             |
+  |          |                 +-------------------+   ping      |             |
+  |          |                 |                   | -------+    |             |
+  |          |           +---- |       safe        |        |    |             |
+  |          |           |     |                   | <------+    |             |
+  |          |           |     +-------------------+             |             |
+  |          |           |       |                               |             |
+  |          | stop      |       | stop                          |             |
+  |          |           |       v                               |             |
+  |          |           |     +-------------------+   close     |             |
+  |          +-----------+---> |      stopped      |-------------+             |
+  |                      |     +-------------------+                           |
+  |                      +---+                                                 |
+  |                          | close                                           |
+  |                          v                                                 |
+  |     other_threads  +----------------------------------------+              |
+  |   +--------------> |                                        |              |
+  |   |                |             closed_running             |              |
+  |   +--------------- |                                        |--------------+
+  |                    +----------------------------------------+
+  |                               |          ^
+  |                         open  |          | close
+  |                               v          |
+  |    set_safe_timeout       +-------------------+
+  +-------------------------> |     reopened      |
+                                +-------------------+
+
+It is important to note that the events sched_keep_alive and keep_alive
+are not allowed in the monitor (they are said to be blocked events).
+The execution of any blocked events leads the RV monitor to react.
+
+Additional options
+------------------
+
+The RV monitor also has a set of options enabled via kernel command
+line/module options. They are:
+
+ - watchdog_id: the device id to monitor (default 0);
+ - dont_stop: once enabled, do not allow the RV monitor to be stopped (default off);
+ - safe_timeout: define a maximum safe value that a user-space application can
+   set as the watchdog timeout (default unlimited);
+ - check_timeout: After every ping, check if the time left in the watchdog is less
+   than or equal to the last timeout set for the watchdog. It only works for watchdog
+   devices that provide the get_timeleft() function (default off);
-- 
2.35.1


  parent reply	other threads:[~2022-06-16  8:48 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-16  8:44 [PATCH V4 00/20] The Runtime Verification (RV) interface Daniel Bristot de Oliveira
2022-06-16  8:44 ` [PATCH V4 01/20] rv: Add " Daniel Bristot de Oliveira
2022-06-23 17:21   ` Punit Agrawal
2022-07-01 13:24     ` Daniel Bristot de Oliveira
2022-06-23 20:26   ` Steven Rostedt
2022-07-04 19:49     ` Daniel Bristot de Oliveira
2022-07-06 17:49   ` Tao Zhou
2022-07-06 17:53     ` Matthew Wilcox
2022-07-08 15:36       ` Tao Zhou
2022-07-08 15:55         ` Matthew Wilcox
2022-07-08 14:39     ` Daniel Bristot de Oliveira
2022-07-10 15:11       ` Tao Zhou
2022-07-10 15:42         ` Steven Rostedt
2022-07-10 22:28           ` Tao Zhou
2022-06-16  8:44 ` [PATCH V4 02/20] rv: Add runtime reactors interface Daniel Bristot de Oliveira
2022-06-23 20:40   ` Steven Rostedt
2022-06-16  8:44 ` [PATCH V4 03/20] rv/include: Add helper functions for deterministic automata Daniel Bristot de Oliveira
2022-06-28 17:48   ` Steven Rostedt
2022-07-06 18:35   ` Tao Zhou
2022-07-13 18:38     ` Daniel Bristot de Oliveira
2022-06-16  8:44 ` [PATCH V4 04/20] rv/include: Add deterministic automata monitor definition via C macros Daniel Bristot de Oliveira
2022-07-06 18:56   ` Tao Zhou
2022-07-13 18:39     ` Daniel Bristot de Oliveira
2022-06-16  8:44 ` [PATCH V4 05/20] rv/include: Add instrumentation helper functions Daniel Bristot de Oliveira
2022-06-16  8:44 ` [PATCH V4 06/20] tools/rv: Add dot2c Daniel Bristot de Oliveira
2022-06-28 18:10   ` Steven Rostedt
2022-06-28 18:16   ` Steven Rostedt
2022-07-13 18:41     ` Daniel Bristot de Oliveira
2022-06-16  8:44 ` [PATCH V4 07/20] tools/rv: Add dot2k Daniel Bristot de Oliveira
2022-06-16  8:44 ` [PATCH V4 08/20] rv/monitor: Add the wip monitor skeleton created by dot2k Daniel Bristot de Oliveira
2022-06-28 19:02   ` Steven Rostedt
2022-06-16  8:44 ` [PATCH V4 09/20] rv/monitor: wip instrumentation and Makefile/Kconfig entries Daniel Bristot de Oliveira
2022-06-16 11:21   ` kernel test robot
2022-06-16 21:00   ` Randy Dunlap
2022-06-17 16:07     ` Daniel Bristot de Oliveira
2022-06-28 19:02     ` Steven Rostedt
2022-06-16  8:44 ` [PATCH V4 10/20] rv/monitor: Add the wwnr monitor skeleton created by dot2k Daniel Bristot de Oliveira
2022-07-06 20:08   ` Tao Zhou
2022-06-16  8:44 ` [PATCH V4 11/20] rv/monitor: wwnr instrumentation and Makefile/Kconfig entries Daniel Bristot de Oliveira
2022-06-16 13:47   ` kernel test robot
2022-06-28 19:05   ` Steven Rostedt
2022-06-16  8:44 ` [PATCH V4 12/20] rv/reactor: Add the printk reactor Daniel Bristot de Oliveira
2022-06-16  8:44 ` [PATCH V4 13/20] rv/reactor: Add the panic reactor Daniel Bristot de Oliveira
2022-06-16 15:20   ` kernel test robot
2022-06-16 21:03   ` Randy Dunlap
2022-06-17 16:09     ` Daniel Bristot de Oliveira
2022-07-13 18:47     ` Daniel Bristot de Oliveira
2022-06-28 19:06   ` Steven Rostedt
2022-06-16  8:44 ` [PATCH V4 14/20] Documentation/rv: Add a basic documentation Daniel Bristot de Oliveira
2022-06-29  3:35   ` Bagas Sanjaya
2022-07-13 19:30     ` Daniel Bristot de Oliveira
2022-06-16  8:44 ` [PATCH V4 15/20] Documentation/rv: Add deterministic automata monitor synthesis documentation Daniel Bristot de Oliveira
2022-06-28 19:09   ` Steven Rostedt
2022-06-16  8:44 ` [PATCH V4 16/20] Documentation/rv: Add deterministic automata instrumentation documentation Daniel Bristot de Oliveira
2022-06-16  8:44 ` [PATCH V4 17/20] watchdog/dev: Add tracepoints Daniel Bristot de Oliveira
2022-06-16 13:44   ` Guenter Roeck
2022-06-16 15:47     ` Daniel Bristot de Oliveira
2022-06-16 23:55       ` Guenter Roeck
2022-06-17 16:16         ` Daniel Bristot de Oliveira
2022-07-13 18:49         ` Daniel Bristot de Oliveira
2022-06-16  8:45 ` [PATCH V4 18/20] rv/monitor: Add safe watchdog monitor Daniel Bristot de Oliveira
2022-06-16 13:36   ` Guenter Roeck
2022-06-16 15:29     ` Daniel Bristot de Oliveira
     [not found]       ` <CA+wEVJbvcMZbCroO2_rdVxLvYkUo-ePxCwsp5vbDpoqys4HGWQ@mail.gmail.com>
2022-06-16 23:53         ` Guenter Roeck
2022-06-17 17:06           ` Daniel Bristot de Oliveira
2022-06-28 19:32           ` Steven Rostedt
2022-07-01 14:45             ` Guenter Roeck
2022-07-01 15:38               ` Steven Rostedt
2022-07-04 12:41                 ` Daniel Bristot de Oliveira
2022-06-16 20:57   ` Randy Dunlap
2022-06-17 16:17     ` Daniel Bristot de Oliveira
2022-07-13 19:13   ` Daniel Bristot de Oliveira
2022-06-16  8:45 ` [PATCH V4 19/20] rv/safety_app: Add a safety_app sample Daniel Bristot de Oliveira
2022-06-16  8:45 ` Daniel Bristot de Oliveira [this message]
2022-07-07 12:41   ` [PATCH V4 20/20] Documentation/rv: Add watchdog-monitor documentation Tao Zhou
2022-07-13 18:51     ` Daniel Bristot de Oliveira
2022-06-22  7:24 ` [PATCH V4 00/20] The Runtime Verification (RV) interface Song Liu
2022-06-23 16:41   ` Daniel Bristot de Oliveira
2022-06-23 17:52     ` Song Liu
2022-06-23 20:29       ` Daniel Bristot de Oliveira
2022-06-23 21:10         ` Song Liu
2022-07-06 16:18 ` Tao Zhou

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=129a431c1a12610fa7b44f76ce73aa8058f55bc6.1655368610.git.bristot@kernel.org \
    --to=bristot@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=dvyukov@google.com \
    --cc=elver@google.com \
    --cc=gpaoloni@redhat.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-devel@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=mingo@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=skhan@linuxfoundation.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=williams@redhat.com \
    --cc=wim@linux-watchdog.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.