Linux-Serial Archive on lore.kernel.org
 help / color / Atom feed
* [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module
@ 2020-09-23 15:15 Maximilian Luz
  2020-09-23 15:15 ` [RFC PATCH 2/9] surface_aggregator: Add control packet allocation chaching Maximilian Luz
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Maximilian Luz @ 2020-09-23 15:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maximilian Luz, linux-serial, linux-acpi, Arnd Bergmann,
	Greg Kroah-Hartman, Rob Herring, Jiri Slaby, Rafael J. Wysocki,
	Len Brown, Blaž Hrastnik, Dorian Stoll

Hello,

The Surface System Aggregator Module (we'll refer to it as Surface
Aggregator or SAM below) is an embedded controller (EC) found on various
Microsoft Surface devices. Specifically, all 4th and later generation
Surface devices, i.e. Surface Pro 4, Surface Book 1 and later, with the
exception of the Surface Go series and the Surface Duo. Notably, it
seems like this EC can also be found on the ARM-based Surface Pro X [1].

Functionality provided by this EC depends on the Surface model and can
(roughly) be broken down by their generations: Starting with 5th
generation devices (Surface Pro 2017/5, Surface Book 2, Surface Laptop
1, and later), the EC provides battery and thermal readings, as well as
access to the real-time clock. On 5th and 6th generations, these
features, specifically, are provided via the ACPI interface of the EC,
referred to as Surface ACPI Notify (SAN), i.e. they act as standard ACPI
devices of that type, but require a driver translating requests written
to an ACPI operation region to requests to the EC. On 7th generation
devices, the ACPI interface is (largely) gone, and has been replaced
with custom battery and thermal drivers, directly querying the EC.

Additionally, HID keyboard and touchpad input for Surface models with
these devices built in can be handled via the EC: On the Surface Laptops
1 and 2, this includes only the keyboard, while on the Surface Laptop 3
and Book 3, this includes both touchpad and keyboard. In this case,
actual input is provided as HID data and the EC connection acts as HID
transport, thus requiring a special transport driver for those devices
to work.

Further, all known devices (5th and later generations) also support
changing of performance/cooling modes, which can influence cooling
capabilities of the device (e.g. prefer silent operation over
performance), and may influence power limits (e.g. of the discrete GPU
found on Surface Books).

While this constitutes all major functionality, some more device
specific functionality is also handled by the EC. For example, on the
Surface Books, the EC handles detaching of the clipboard (i.e. the upper
part with screen and CPU) from the base (the lower part with keyboard
and optional discrete GPU) and can influence its behavior (i.e. it
provides an interface via which detachment can be requested, aborted, or
confirmed). It can also be used to detect if there has been a base
attached to the clipboard, and if so what type.

This patch-series adds the basis for supporting this EC and the features
provided by it, by, first, implementing a communication driver providing
a fundamental API for client drivers, handling specific aspects of the
EC. Additionally, it builds on top of that to provide a dedicated bus
and device type to better manage EC clients (and break it down pseudo-
device-wise), especially in the case when these client devices are not
described in ACPI, i.e. cannot be discovered by conventional means.
Furthermore, it provides support for debugging and prototyping via an
optional DebugFS interface, and, lastly, also support for the
aforementioned ACPI interface, allowing ACPI to communicate with the EC
directly.

This series only addresses 5th and later generation Surface models as
the communication interface has changed substantially from 4th to 5th
generation, and the 4th generation interface has not been reverse-
engineered yet. Specifically, the underlying transport has been changed
from HID feature and input-/output-reports to serial communication via
an UART and a custom protocol. Support for 4th generation devices may be
added in the future, but as currently not much is known about 4th
generation SAM, it yet remains to be seen if this can happen as addition
to this subsystem, or if it will be easier to implement this as separate
platform driver. Especially as the 4th generation EC does not seem to
provide much of the functionality found on 5th and later generations
(e.g. no battery status reporting, no thermal reporting, ..., we assume
it's just clipboard detachment on the Surface Book 1 and performance
mode setting).

In more detail, this series adds a driver for the Surface Serial Hub
(SSH), the 5th- and later-generation communication channel to the EC, a
pseudo-device and driver exposing a DebugFS interface that can be used
to communicate with the EC from user-space, intended for debugging,
testing, and prototyping, as well as a driver for the Surface ACPI
Notify (SAN) device, i.e. the interface between ACPI and EC. Some more
details on those can be found on the individual commit messages.

This series, apart from the SAN and DebugFS drivers, does not add any
client drivers. This will be handled via future patches once the core
has been accepted (and the other client drivers have been cleaned up a
bit).

On the top level, EC communication via the SSH driver can be broken down
into requests (sent from host to EC), corresponding responses (sent from
EC to host, associated with and triggered by a request), and events
(sent from EC to host without association ot a request). The SSH driver
manages all communication (i.e. matches responses to requests, enables
and disables events, and manages event handlers/notifiers installed by
client drivers). On the lower levels, SSH communication is packet-based,
and described in more detail in the documentation added in this series
(specifically ssh.rst).

This set of modules and drivers has been developed out of tree at [2]
and used/tested in the kernel we provide at [3] pretty much since its
beginnings. It has been developed by reverse-engineering the SSH
protocol, mostly through the ACPI interface, communication dumps
obtained from listening in on Windows, and deduction. So things may be
wrong. There have been some attempts at reverse-engineering existing
drivers, which also gave a bit of insight for development, however, I
haven't gotten very far on that front beyond some more higher-level
concepts and detecting a couple of new EC commands/confirming the
functionality of already known commands.

Driver and module names have been chosen to align with Windows driver
names, some field, vairable, and concept names have been chosen to align
with ACPI code (or at least with what I think some of the more cryptic
names could mean and make sense in the respective context, e.g. IID ->
Instance ID, TC -> Target Category).

While I consider this submission complete, I've decided to submit this
as RFC first, mainly due to its size and it being my first submission on
this scale. Any feedback, review, comment, question, etc. is much
appreciated.

This patch-set can also be found at the following respository and
reference, if you prefer to look at a kernel tree instead of these
emails:

  https://github.com/linux-surface/kernel tags/s/surface-aggregator/rfc-v1

Thanks,
Max

[1]: The Surface Pro X is, however, currently considered unsupported due
     to a lack of test candidates and, as it seems, general lack of
     Linux support on other parts. AFAIK there is an issue preventing
     serial devices from being registered, on which the core driver in
     this series is build on, thus there is no way to even test that at
     this point. I'd be happy to work out any issues regarding SAM on
     the Pro X at some point in the future, provided someone can/wants
     to actually test it.

[2]: https://github.com/linux-surface/surface-aggregator-module
[3]: https://github.com/linux-surface/linux-surface

Maximilian Luz (9):
  misc: Add Surface Aggregator subsystem
  surface_aggregator: Add control packet allocation chaching
  surface_aggregator: Add event item allocation chaching
  surface_aggregator: Add trace points
  surface_aggregator: Add error injection capabilities
  surface_aggregator: Add dedicated bus and device type
  docs: driver-api: Add Surface Aggregator subsystem documentation
  surface_aggregator: Add DebugFS interface
  surface_aggregator: Add Surface ACPI Notify client driver

 Documentation/driver-api/index.rst            |    1 +
 .../surface_aggregator/client-api.rst         |   38 +
 .../driver-api/surface_aggregator/client.rst  |  394 +++
 .../surface_aggregator/clients/dbgdev.rst     |  130 +
 .../surface_aggregator/clients/index.rst      |   21 +
 .../surface_aggregator/clients/san.rst        |   44 +
 .../driver-api/surface_aggregator/index.rst   |   21 +
 .../surface_aggregator/internal-api.rst       |   67 +
 .../surface_aggregator/internal.rst           |   50 +
 .../surface_aggregator/overview.rst           |   76 +
 .../driver-api/surface_aggregator/ssh.rst     |  343 +++
 MAINTAINERS                                   |   10 +
 drivers/misc/Kconfig                          |    1 +
 drivers/misc/Makefile                         |    1 +
 drivers/misc/surface_aggregator/Kconfig       |   65 +
 drivers/misc/surface_aggregator/Makefile      |   17 +
 drivers/misc/surface_aggregator/bus.c         |  419 +++
 drivers/misc/surface_aggregator/bus.h         |   22 +
 .../misc/surface_aggregator/clients/Kconfig   |   38 +
 .../misc/surface_aggregator/clients/Makefile  |    4 +
 .../clients/surface_acpi_notify.c             |  882 ++++++
 .../clients/surface_aggregator_debugfs.c      |  281 ++
 drivers/misc/surface_aggregator/controller.c  | 2505 +++++++++++++++++
 drivers/misc/surface_aggregator/controller.h  |  283 ++
 drivers/misc/surface_aggregator/core.c        |  821 ++++++
 drivers/misc/surface_aggregator/ssh_msgb.h    |  196 ++
 .../surface_aggregator/ssh_packet_layer.c     | 2002 +++++++++++++
 .../surface_aggregator/ssh_packet_layer.h     |  170 ++
 drivers/misc/surface_aggregator/ssh_parser.c  |  224 ++
 drivers/misc/surface_aggregator/ssh_parser.h  |  152 +
 .../surface_aggregator/ssh_request_layer.c    | 1249 ++++++++
 .../surface_aggregator/ssh_request_layer.h    |  137 +
 drivers/misc/surface_aggregator/trace.h       |  621 ++++
 include/linux/mod_devicetable.h               |   18 +
 include/linux/surface_acpi_notify.h           |   37 +
 include/linux/surface_aggregator/controller.h |  812 ++++++
 include/linux/surface_aggregator/device.h     |  408 +++
 include/linux/surface_aggregator/serial_hub.h |  657 +++++
 scripts/mod/devicetable-offsets.c             |    8 +
 scripts/mod/file2alias.c                      |   23 +
 40 files changed, 13248 insertions(+)
 create mode 100644 Documentation/driver-api/surface_aggregator/client-api.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/client.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/clients/dbgdev.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/clients/index.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/clients/san.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/index.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/internal-api.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/internal.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/overview.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/ssh.rst
 create mode 100644 drivers/misc/surface_aggregator/Kconfig
 create mode 100644 drivers/misc/surface_aggregator/Makefile
 create mode 100644 drivers/misc/surface_aggregator/bus.c
 create mode 100644 drivers/misc/surface_aggregator/bus.h
 create mode 100644 drivers/misc/surface_aggregator/clients/Kconfig
 create mode 100644 drivers/misc/surface_aggregator/clients/Makefile
 create mode 100644 drivers/misc/surface_aggregator/clients/surface_acpi_notify.c
 create mode 100644 drivers/misc/surface_aggregator/clients/surface_aggregator_debugfs.c
 create mode 100644 drivers/misc/surface_aggregator/controller.c
 create mode 100644 drivers/misc/surface_aggregator/controller.h
 create mode 100644 drivers/misc/surface_aggregator/core.c
 create mode 100644 drivers/misc/surface_aggregator/ssh_msgb.h
 create mode 100644 drivers/misc/surface_aggregator/ssh_packet_layer.c
 create mode 100644 drivers/misc/surface_aggregator/ssh_packet_layer.h
 create mode 100644 drivers/misc/surface_aggregator/ssh_parser.c
 create mode 100644 drivers/misc/surface_aggregator/ssh_parser.h
 create mode 100644 drivers/misc/surface_aggregator/ssh_request_layer.c
 create mode 100644 drivers/misc/surface_aggregator/ssh_request_layer.h
 create mode 100644 drivers/misc/surface_aggregator/trace.h
 create mode 100644 include/linux/surface_acpi_notify.h
 create mode 100644 include/linux/surface_aggregator/controller.h
 create mode 100644 include/linux/surface_aggregator/device.h
 create mode 100644 include/linux/surface_aggregator/serial_hub.h

-- 
2.28.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC PATCH 2/9] surface_aggregator: Add control packet allocation chaching
  2020-09-23 15:15 [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module Maximilian Luz
@ 2020-09-23 15:15 ` Maximilian Luz
  2020-09-23 15:15 ` [RFC PATCH 3/9] surface_aggregator: Add event item " Maximilian Luz
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Maximilian Luz @ 2020-09-23 15:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maximilian Luz, linux-serial, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Jiri Slaby, Blaž Hrastnik, Dorian Stoll

Surface Serial Hub communication is, in its core, packet based. Each
sequenced packet requires to be acknowledged, via an ACK-type control
packet. In case invalid data has been received by the driver, a NAK-type
(not-acknowledge/negative acknowledge) control packet is sent,
triggering retransmission.

Control packets are therefore a core communication primitive and used
frequently enough (with every sequenced packet transmission sent by the
embedded controller, including events and request responses) that it may
warrant caching their allocations to reduce possible memory
fragmentation.

Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
---
 drivers/misc/surface_aggregator/core.c        | 27 ++++++++++-
 .../surface_aggregator/ssh_packet_layer.c     | 47 +++++++++++++++----
 .../surface_aggregator/ssh_packet_layer.h     |  3 ++
 3 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/drivers/misc/surface_aggregator/core.c b/drivers/misc/surface_aggregator/core.c
index e428aa59fcc3..938c46a9e20b 100644
--- a/drivers/misc/surface_aggregator/core.c
+++ b/drivers/misc/surface_aggregator/core.c
@@ -762,7 +762,32 @@ static struct serdev_device_driver ssam_serial_hub = {
 		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
 	},
 };
-module_serdev_device_driver(ssam_serial_hub);
+
+
+/* -- Module setup. --------------------------------------------------------- */
+
+static int __init ssam_core_init(void)
+{
+	int status;
+
+	status = ssh_ctrl_packet_cache_init();
+	if (status)
+		return status;
+
+	status = serdev_device_driver_register(&ssam_serial_hub);
+	if (status)
+		ssh_ctrl_packet_cache_destroy();
+
+	return status;
+}
+module_init(ssam_core_init);
+
+static void __exit ssam_core_exit(void)
+{
+	serdev_device_driver_unregister(&ssam_serial_hub);
+	ssh_ctrl_packet_cache_destroy();
+}
+module_exit(ssam_core_exit);
 
 MODULE_AUTHOR("Maximilian Luz <luzmaximilian@gmail.com>");
 MODULE_DESCRIPTION("Subsystem and Surface Serial Hub driver for Surface System Aggregator Module");
diff --git a/drivers/misc/surface_aggregator/ssh_packet_layer.c b/drivers/misc/surface_aggregator/ssh_packet_layer.c
index fa1a3d1d4a49..993aabfdfdae 100644
--- a/drivers/misc/surface_aggregator/ssh_packet_layer.c
+++ b/drivers/misc/surface_aggregator/ssh_packet_layer.c
@@ -284,24 +284,53 @@ void ssh_packet_init(struct ssh_packet *packet, unsigned long type,
 }
 
 
+static struct kmem_cache *ssh_ctrl_packet_cache;
+
+/**
+ * ssh_ctrl_packet_cache_init() - Initialize the control packet cache.
+ */
+int ssh_ctrl_packet_cache_init(void)
+{
+	const unsigned int size = sizeof(struct ssh_packet) + SSH_MSG_LEN_CTRL;
+	const unsigned int align = __alignof__(struct ssh_packet);
+	struct kmem_cache *cache;
+
+	cache = kmem_cache_create("ssam_ctrl_packet", size, align, 0, NULL);
+	if (!cache)
+		return -ENOMEM;
+
+	ssh_ctrl_packet_cache = cache;
+	return 0;
+}
+
+/**
+ * ssh_ctrl_packet_cache_destroy() - Deinitialize the control packet cache.
+ */
+void ssh_ctrl_packet_cache_destroy(void)
+{
+	kmem_cache_destroy(ssh_ctrl_packet_cache);
+	ssh_ctrl_packet_cache = NULL;
+}
+
 /**
- * ssh_ctrl_packet_alloc() - Allocate control packet.
+ * ssh_ctrl_packet_alloc() - Allocate packet from control packet cache.
  * @packet: Where the pointer to the newly allocated packet should be stored.
  * @buffer: The buffer corresponding to this packet.
  * @flags:  Flags used for allocation.
  *
- * Allocates a packet and corresponding transport buffer. Sets the packet's
- * buffer reference to the allocated buffer. The packet must be freed via
- * ssh_ctrl_packet_free(), which will also free the corresponding buffer. The
- * corresponding buffer must not be freed separately. Intended to be used with
- * %ssh_ptl_ctrl_packet_ops as packet operations.
+ * Allocates a packet and corresponding transport buffer from the control
+ * packet cache. Sets the packet's buffer reference to the allocated buffer.
+ * The packet must be freed via ssh_ctrl_packet_free(), which will also free
+ * the corresponding buffer. The corresponding buffer must not be freed
+ * separately. Intended to be used with %ssh_ptl_ctrl_packet_ops as packet
+ * operations.
  *
  * Return: Returns zero on success, %-ENOMEM if the allocation failed.
  */
 static int ssh_ctrl_packet_alloc(struct ssh_packet **packet,
 				 struct ssam_span *buffer, gfp_t flags)
 {
-	*packet = kzalloc(sizeof(**packet) + SSH_MSG_LEN_CTRL, flags);
+	*packet = kmem_cache_alloc(ssh_ctrl_packet_cache, flags);
 	if (!*packet)
 		return -ENOMEM;
 
@@ -312,12 +341,12 @@ static int ssh_ctrl_packet_alloc(struct ssh_packet **packet,
 }
 
 /**
- * ssh_ctrl_packet_free() - Free control packet.
+ * ssh_ctrl_packet_free() - Free packet allocated from control packet cache.
  * @p: The packet to free.
  */
 static void ssh_ctrl_packet_free(struct ssh_packet *p)
 {
-	kfree(p);
+	kmem_cache_free(ssh_ctrl_packet_cache, p);
 }
 
 static const struct ssh_packet_ops ssh_ptl_ctrl_packet_ops = {
diff --git a/drivers/misc/surface_aggregator/ssh_packet_layer.h b/drivers/misc/surface_aggregator/ssh_packet_layer.h
index 51295cf48519..957dd3f1006c 100644
--- a/drivers/misc/surface_aggregator/ssh_packet_layer.h
+++ b/drivers/misc/surface_aggregator/ssh_packet_layer.h
@@ -164,4 +164,7 @@ void ssh_ptl_tx_wakeup(struct ssh_ptl *ptl);
 void ssh_packet_init(struct ssh_packet *packet, unsigned long type,
 		     u8 priority, const struct ssh_packet_ops *ops);
 
+int ssh_ctrl_packet_cache_init(void);
+void ssh_ctrl_packet_cache_destroy(void);
+
 #endif /* _SURFACE_AGGREGATOR_SSH_PACKET_LAYER_H */
-- 
2.28.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC PATCH 3/9] surface_aggregator: Add event item allocation chaching
  2020-09-23 15:15 [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module Maximilian Luz
  2020-09-23 15:15 ` [RFC PATCH 2/9] surface_aggregator: Add control packet allocation chaching Maximilian Luz
@ 2020-09-23 15:15 ` Maximilian Luz
  2020-09-23 15:15 ` [RFC PATCH 4/9] surface_aggregator: Add trace points Maximilian Luz
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Maximilian Luz @ 2020-09-23 15:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maximilian Luz, linux-serial, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Jiri Slaby, Blaž Hrastnik, Dorian Stoll

Event items are used for completing Surface Aggregator EC events, i.e.
placing event command data and payload on a workqueue for later
processing to avoid doing said processing directly on the receiver
thread. This means that event items are allocated for each incoming
event, regardless of that event being transmitted via sequenced or
unsequenced packets.

On the Surface Book 3 and Surface Laptop 3, touchpad HID input events
(unsequenced), can constitute a larger amount of traffic, and therefore
allocation of event items. This warrants caching event items to reduce
memory fragmentation. The size of the cached objects is specifically
tuned to accommodate keyboard and touchpad input events and their
payloads on those devices. As a result, this effectively also covers
most other event types. In case of a larger event payload, event item
allocation will fall back to kzalloc().

Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
---
 drivers/misc/surface_aggregator/controller.c | 84 ++++++++++++++++++--
 drivers/misc/surface_aggregator/controller.h |  9 +++
 drivers/misc/surface_aggregator/core.c       | 16 +++-
 3 files changed, 100 insertions(+), 9 deletions(-)

diff --git a/drivers/misc/surface_aggregator/controller.c b/drivers/misc/surface_aggregator/controller.c
index 6963cf1e1840..9780a410873e 100644
--- a/drivers/misc/surface_aggregator/controller.c
+++ b/drivers/misc/surface_aggregator/controller.c
@@ -530,14 +530,74 @@ static void ssam_nf_destroy(struct ssam_nf *nf)
 
 #define SSAM_CPLT_WQ_NAME	"ssam_cpltq"
 
+/*
+ * SSAM_EVENT_ITEM_CACHE_PAYLOAD_LEN - Maximum payload length for a cached
+ * &struct ssam_event_item.
+ *
+ * This length has been chosen to be accommodate standard touchpad and
+ * keyboard input events. Events with larger payloads will be allocated
+ * separately.
+ */
+#define SSAM_EVENT_ITEM_CACHE_PAYLOAD_LEN	32
+
+static struct kmem_cache *ssam_event_item_cache;
+
+/**
+ * ssam_event_item_cache_init() - Initialize the event item cache.
+ */
+int ssam_event_item_cache_init(void)
+{
+	const unsigned int size = sizeof(struct ssam_event_item)
+				  + SSAM_EVENT_ITEM_CACHE_PAYLOAD_LEN;
+	const unsigned int align = __alignof__(struct ssam_event_item);
+	struct kmem_cache *cache;
+
+	cache = kmem_cache_create("ssam_event_item", size, align, 0, NULL);
+	if (!cache)
+		return -ENOMEM;
+
+	ssam_event_item_cache = cache;
+	return 0;
+}
+
+/**
+ * ssam_event_item_cache_destroy() - Deinitialize the event item cache.
+ */
+void ssam_event_item_cache_destroy(void)
+{
+	kmem_cache_destroy(ssam_event_item_cache);
+	ssam_event_item_cache = NULL;
+}
+
+static void __ssam_event_item_free_cached(struct ssam_event_item *item)
+{
+	kmem_cache_free(ssam_event_item_cache, item);
+}
+
+static void __ssam_event_item_free_generic(struct ssam_event_item *item)
+{
+	kfree(item);
+}
+
+/**
+ * ssam_event_item_free() - Free the provided event item.
+ * @item: The event item to free.
+ */
+static void ssam_event_item_free(struct ssam_event_item *item)
+{
+	item->ops.free(item);
+}
+
 /**
  * ssam_event_item_alloc() - Allocate an event item with the given payload size.
  * @len:   The event payload length.
  * @flags: The flags used for allocation.
  *
- * Allocate an event item with the given payload size. Sets the item
- * operations and payload length values. The item free callback (``ops.free``)
- * should not be overwritten after this call.
+ * Allocate an event item with the given payload size, preferring allocation
+ * from the event item cache if the payload is small enough (i.e. smaller than
+ * %SSAM_EVENT_ITEM_CACHE_PAYLOAD_LEN). Sets the item operations and payload
+ * length values. The item free callback (``ops.free``) should not be
+ * overwritten after this call.
  *
  * Return: Returns the newly allocated event item.
  */
@@ -545,9 +605,19 @@ static struct ssam_event_item *ssam_event_item_alloc(size_t len, gfp_t flags)
 {
 	struct ssam_event_item *item;
 
-	item = kzalloc(sizeof(*item) + len, GFP_KERNEL);
-	if (!item)
-		return NULL;
+	if (len <= SSAM_EVENT_ITEM_CACHE_PAYLOAD_LEN) {
+		item = kmem_cache_alloc(ssam_event_item_cache, GFP_KERNEL);
+		if (!item)
+			return NULL;
+
+		item->ops.free = __ssam_event_item_free_cached;
+	} else {
+		item = kzalloc(sizeof(*item) + len, GFP_KERNEL);
+		if (!item)
+			return NULL;
+
+		item->ops.free = __ssam_event_item_free_generic;
+	}
 
 	item->event.length = len;
 	return item;
@@ -709,7 +779,7 @@ static void ssam_event_queue_work_fn(struct work_struct *work)
 			return;
 
 		ssam_nf_call(nf, dev, item->rqid, &item->event);
-		kfree(item);
+		ssam_event_item_free(item);
 	}
 
 	if (!ssam_event_queue_is_empty(queue))
diff --git a/drivers/misc/surface_aggregator/controller.h b/drivers/misc/surface_aggregator/controller.h
index 88652120cf4b..5dde5705c79b 100644
--- a/drivers/misc/surface_aggregator/controller.h
+++ b/drivers/misc/surface_aggregator/controller.h
@@ -76,12 +76,18 @@ struct ssam_cplt;
  * struct ssam_event_item - Struct for event queuing and completion.
  * @node:     The node in the queue.
  * @rqid:     The request ID of the event.
+ * @ops:      Instance specific functions.
+ * @ops.free: Callback for freeing this event item.
  * @event:    Actual event data.
  */
 struct ssam_event_item {
 	struct list_head node;
 	u16 rqid;
 
+	struct {
+		void (*free)(struct ssam_event_item *event);
+	} ops;
+
 	struct ssam_event event;	// must be last
 };
 
@@ -271,4 +277,7 @@ int ssam_ctrl_notif_d0_entry(struct ssam_controller *ctrl);
 int ssam_controller_suspend(struct ssam_controller *ctrl);
 int ssam_controller_resume(struct ssam_controller *ctrl);
 
+int ssam_event_item_cache_init(void);
+void ssam_event_item_cache_destroy(void);
+
 #endif /* _SURFACE_AGGREGATOR_CONTROLLER_H */
diff --git a/drivers/misc/surface_aggregator/core.c b/drivers/misc/surface_aggregator/core.c
index 938c46a9e20b..5b79e57773fd 100644
--- a/drivers/misc/surface_aggregator/core.c
+++ b/drivers/misc/surface_aggregator/core.c
@@ -772,12 +772,23 @@ static int __init ssam_core_init(void)
 
 	status = ssh_ctrl_packet_cache_init();
 	if (status)
-		return status;
+		goto err_cpkg;
+
+	status = ssam_event_item_cache_init();
+	if (status)
+		goto err_evitem;
 
 	status = serdev_device_driver_register(&ssam_serial_hub);
 	if (status)
-		ssh_ctrl_packet_cache_destroy();
+		goto err_register;
 
+	return 0;
+
+err_register:
+	ssam_event_item_cache_destroy();
+err_evitem:
+	ssh_ctrl_packet_cache_destroy();
+err_cpkg:
 	return status;
 }
 module_init(ssam_core_init);
@@ -785,6 +796,7 @@ module_init(ssam_core_init);
 static void __exit ssam_core_exit(void)
 {
 	serdev_device_driver_unregister(&ssam_serial_hub);
+	ssam_event_item_cache_destroy();
 	ssh_ctrl_packet_cache_destroy();
 }
 module_exit(ssam_core_exit);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC PATCH 4/9] surface_aggregator: Add trace points
  2020-09-23 15:15 [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module Maximilian Luz
  2020-09-23 15:15 ` [RFC PATCH 2/9] surface_aggregator: Add control packet allocation chaching Maximilian Luz
  2020-09-23 15:15 ` [RFC PATCH 3/9] surface_aggregator: Add event item " Maximilian Luz
@ 2020-09-23 15:15 ` Maximilian Luz
  2020-09-23 20:07   ` Steven Rostedt
  2020-09-23 15:15 ` [RFC PATCH 5/9] surface_aggregator: Add error injection capabilities Maximilian Luz
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Maximilian Luz @ 2020-09-23 15:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maximilian Luz, linux-serial, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Jiri Slaby, Blaž Hrastnik, Dorian Stoll

Add trace points to the Surface Aggregator subsystem core. These trace
points can be used to track packets, requests, and allocations. They are
further intended for debugging and testing/validation, specifically in
combination with the error injection capabilities introduced in the
subsequent commit.

Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
---
 drivers/misc/surface_aggregator/Makefile      |   3 +
 drivers/misc/surface_aggregator/controller.c  |   5 +
 drivers/misc/surface_aggregator/core.c        |   3 +
 .../surface_aggregator/ssh_packet_layer.c     |  21 +
 .../surface_aggregator/ssh_request_layer.c    |  18 +
 drivers/misc/surface_aggregator/trace.h       | 612 ++++++++++++++++++
 6 files changed, 662 insertions(+)
 create mode 100644 drivers/misc/surface_aggregator/trace.h

diff --git a/drivers/misc/surface_aggregator/Makefile b/drivers/misc/surface_aggregator/Makefile
index c92230746c7c..92718c7956a0 100644
--- a/drivers/misc/surface_aggregator/Makefile
+++ b/drivers/misc/surface_aggregator/Makefile
@@ -1,5 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0-or-later
 
+# For include/trace/define_trace.h to include trace.h
+CFLAGS_core.o = -I$(src)
+
 obj-$(CONFIG_SURFACE_AGGREGATOR) += surface_aggregator.o
 
 surface_aggregator-objs := core.o
diff --git a/drivers/misc/surface_aggregator/controller.c b/drivers/misc/surface_aggregator/controller.c
index 9780a410873e..d7483d3a2721 100644
--- a/drivers/misc/surface_aggregator/controller.c
+++ b/drivers/misc/surface_aggregator/controller.c
@@ -24,6 +24,8 @@
 #include "ssh_msgb.h"
 #include "ssh_request_layer.h"
 
+#include "trace.h"
+
 
 /* -- Safe counters. -------------------------------------------------------- */
 
@@ -585,6 +587,7 @@ static void __ssam_event_item_free_generic(struct ssam_event_item *item)
  */
 static void ssam_event_item_free(struct ssam_event_item *item)
 {
+	trace_ssam_event_item_free(item);
 	item->ops.free(item);
 }
 
@@ -620,6 +623,8 @@ static struct ssam_event_item *ssam_event_item_alloc(size_t len, gfp_t flags)
 	}
 
 	item->event.length = len;
+
+	trace_ssam_event_item_alloc(item, len);
 	return item;
 }
 
diff --git a/drivers/misc/surface_aggregator/core.c b/drivers/misc/surface_aggregator/core.c
index 5b79e57773fd..44bf83dd8fa9 100644
--- a/drivers/misc/surface_aggregator/core.c
+++ b/drivers/misc/surface_aggregator/core.c
@@ -22,6 +22,9 @@
 #include <linux/surface_aggregator/controller.h>
 #include "controller.h"
 
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
 
 /* -- Static controller reference. ------------------------------------------ */
 
diff --git a/drivers/misc/surface_aggregator/ssh_packet_layer.c b/drivers/misc/surface_aggregator/ssh_packet_layer.c
index 993aabfdfdae..29dc64dd27a8 100644
--- a/drivers/misc/surface_aggregator/ssh_packet_layer.c
+++ b/drivers/misc/surface_aggregator/ssh_packet_layer.c
@@ -19,6 +19,8 @@
 #include "ssh_packet_layer.h"
 #include "ssh_parser.h"
 
+#include "trace.h"
+
 
 /*
  * To simplify reasoning about the code below, we define a few concepts. The
@@ -209,6 +211,8 @@ static void __ssh_ptl_packet_release(struct kref *kref)
 {
 	struct ssh_packet *p = container_of(kref, struct ssh_packet, refcnt);
 
+	trace_ssam_packet_release(p);
+
 	ptl_dbg_cond(p->ptl, "ptl: releasing packet %p\n", p);
 	p->ops->release(p);
 }
@@ -337,6 +341,7 @@ static int ssh_ctrl_packet_alloc(struct ssh_packet **packet,
 	buffer->ptr = (u8 *)(*packet + 1);
 	buffer->len = SSH_MSG_LEN_CTRL;
 
+	trace_ssam_ctrl_packet_alloc(*packet, buffer->len);
 	return 0;
 }
 
@@ -346,6 +351,7 @@ static int ssh_ctrl_packet_alloc(struct ssh_packet **packet,
  */
 static void ssh_ctrl_packet_free(struct ssh_packet *p)
 {
+	trace_ssam_ctrl_packet_free(p);
 	kmem_cache_free(ssh_ctrl_packet_cache, p);
 }
 
@@ -550,6 +556,7 @@ static void __ssh_ptl_complete(struct ssh_packet *p, int status)
 {
 	struct ssh_ptl *ptl = READ_ONCE(p->ptl);
 
+	trace_ssam_packet_complete(p, status);
 	ptl_dbg_cond(ptl, "ptl: completing packet %p (status: %d)\n", p, status);
 
 	if (p->ops->complete)
@@ -968,6 +975,8 @@ int ssh_ptl_submit(struct ssh_ptl *ptl, struct ssh_packet *p)
 	struct ssh_ptl *ptl_old;
 	int status;
 
+	trace_ssam_packet_submit(p);
+
 	// validate packet fields
 	if (test_bit(SSH_PACKET_TY_FLUSH_BIT, &p->state)) {
 		if (p->data.ptr || test_bit(SSH_PACKET_TY_SEQUENCED_BIT, &p->state))
@@ -1002,6 +1011,8 @@ static int __ssh_ptl_resubmit(struct ssh_packet *packet)
 {
 	int status;
 
+	trace_ssam_packet_resubmit(packet);
+
 	spin_lock(&packet->ptl->queue.lock);
 
 	status = __ssh_ptl_queue_push(packet);
@@ -1094,6 +1105,8 @@ void ssh_ptl_cancel(struct ssh_packet *p)
 	if (test_and_set_bit(SSH_PACKET_SF_CANCELED_BIT, &p->state))
 		return;
 
+	trace_ssam_packet_cancel(p);
+
 	/*
 	 * Lock packet and commit with memory barrier. If this packet has
 	 * already been locked, it's going to be removed and completed by
@@ -1147,6 +1160,8 @@ static void ssh_ptl_timeout_reap(struct work_struct *work)
 	ktime_t next = KTIME_MAX;
 	bool resub = false;
 
+	trace_ssam_ptl_timeout_reap("pending", atomic_read(&ptl->pending.count));
+
 	/*
 	 * Mark reaper as "not pending". This is done before checking any
 	 * packets to avoid lost-update type problems.
@@ -1178,6 +1193,8 @@ static void ssh_ptl_timeout_reap(struct work_struct *work)
 		// check if we still have some tries left
 		try = ssh_packet_priority_get_try(READ_ONCE(p->priority));
 		if (likely(try < SSH_PTL_MAX_PACKET_TRIES)) {
+			trace_ssam_packet_timeout(p);
+
 			/*
 			 * Submission fails if the packet has been locked, is
 			 * already queued, or the layer is being shut down.
@@ -1195,6 +1212,8 @@ static void ssh_ptl_timeout_reap(struct work_struct *work)
 		if (test_and_set_bit(SSH_PACKET_SF_LOCKED_BIT, &p->state))
 			continue;
 
+		trace_ssam_packet_timeout(p);
+
 		/*
 		 * We have now marked the packet as locked. Thus it cannot be
 		 * added to the pending list again after we've removed it here.
@@ -1359,6 +1378,8 @@ static size_t ssh_ptl_rx_eval(struct ssh_ptl *ptl, struct ssam_span *source)
 	if (!frame)	// not enough data
 		return aligned.ptr - source->ptr;
 
+	trace_ssam_rx_frame_received(frame);
+
 	switch (frame->type) {
 	case SSH_FRAME_TYPE_ACK:
 		ssh_ptl_acknowledge(ptl, frame->seq);
diff --git a/drivers/misc/surface_aggregator/ssh_request_layer.c b/drivers/misc/surface_aggregator/ssh_request_layer.c
index f0f471f61b0a..e7358b9a2e4d 100644
--- a/drivers/misc/surface_aggregator/ssh_request_layer.c
+++ b/drivers/misc/surface_aggregator/ssh_request_layer.c
@@ -16,6 +16,8 @@
 #include "ssh_packet_layer.h"
 #include "ssh_request_layer.h"
 
+#include "trace.h"
+
 
 /*
  * SSH_RTL_REQUEST_TIMEOUT - Request timeout.
@@ -136,6 +138,8 @@ static void ssh_rtl_complete_with_status(struct ssh_request *rqst, int status)
 {
 	struct ssh_rtl *rtl = ssh_request_rtl(rqst);
 
+	trace_ssam_request_complete(rqst, status);
+
 	// rtl/ptl may not be set if we're cancelling before submitting
 	rtl_dbg_cond(rtl, "rtl: completing request (rqid: 0x%04x, status: %d)\n",
 		     ssh_request_get_rqid_safe(rqst), status);
@@ -149,6 +153,8 @@ static void ssh_rtl_complete_with_rsp(struct ssh_request *rqst,
 {
 	struct ssh_rtl *rtl = ssh_request_rtl(rqst);
 
+	trace_ssam_request_complete(rqst, 0);
+
 	rtl_dbg(rtl, "rtl: completing request with response (rqid: 0x%04x)\n",
 		ssh_request_get_rqid(rqst));
 
@@ -321,6 +327,8 @@ static void ssh_rtl_tx_work_fn(struct work_struct *work)
  */
 int ssh_rtl_submit(struct ssh_rtl *rtl, struct ssh_request *rqst)
 {
+	trace_ssam_request_submit(rqst);
+
 	/*
 	 * Ensure that requests expecting a response are sequenced. If this
 	 * invariant ever changes, see the comment in ssh_rtl_complete() on what
@@ -433,6 +441,8 @@ static void ssh_rtl_complete(struct ssh_rtl *rtl,
 	struct ssh_request *p, *n;
 	u16 rqid = get_unaligned_le16(&command->rqid);
 
+	trace_ssam_rx_response_received(command, command_data->len);
+
 	/*
 	 * Get request from pending based on request ID and mark it as response
 	 * received and locked.
@@ -683,6 +693,8 @@ bool ssh_rtl_cancel(struct ssh_request *rqst, bool pending)
 	if (test_and_set_bit(SSH_REQUEST_SF_CANCELED_BIT, &rqst->state))
 		return true;
 
+	trace_ssam_request_cancel(rqst);
+
 	if (pending)
 		canceled = ssh_rtl_cancel_pending(rqst);
 	else
@@ -772,6 +784,8 @@ static void ssh_rtl_timeout_reap(struct work_struct *work)
 	ktime_t timeout = rtl->rtx_timeout.timeout;
 	ktime_t next = KTIME_MAX;
 
+	trace_ssam_rtl_timeout_reap("pending", atomic_read(&rtl->pending.count));
+
 	/*
 	 * Mark reaper as "not pending". This is done before checking any
 	 * requests to avoid lost-update type problems.
@@ -820,6 +834,8 @@ static void ssh_rtl_timeout_reap(struct work_struct *work)
 
 	// cancel and complete the request
 	list_for_each_entry_safe(r, n, &claimed, node) {
+		trace_ssam_request_timeout(r);
+
 		/*
 		 * At this point we've removed the packet from pending. This
 		 * means that we've obtained the last (only) reference of the
@@ -845,6 +861,8 @@ static void ssh_rtl_timeout_reap(struct work_struct *work)
 static void ssh_rtl_rx_event(struct ssh_rtl *rtl, const struct ssh_command *cmd,
 			     const struct ssam_span *data)
 {
+	trace_ssam_rx_event_received(cmd, data->len);
+
 	rtl_dbg(rtl, "rtl: handling event (rqid: 0x%04x)\n",
 		get_unaligned_le16(&cmd->rqid));
 
diff --git a/drivers/misc/surface_aggregator/trace.h b/drivers/misc/surface_aggregator/trace.h
new file mode 100644
index 000000000000..eb2e3e1457de
--- /dev/null
+++ b/drivers/misc/surface_aggregator/trace.h
@@ -0,0 +1,612 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM surface_aggregator
+
+#if !defined(_SURFACE_AGGREGATOR_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _SURFACE_AGGREGATOR_TRACE_H
+
+#include <linux/surface_aggregator/serial_hub.h>
+
+#include <asm/unaligned.h>
+#include <linux/tracepoint.h>
+
+
+TRACE_DEFINE_ENUM(SSH_FRAME_TYPE_DATA_SEQ);
+TRACE_DEFINE_ENUM(SSH_FRAME_TYPE_DATA_NSQ);
+TRACE_DEFINE_ENUM(SSH_FRAME_TYPE_ACK);
+TRACE_DEFINE_ENUM(SSH_FRAME_TYPE_NAK);
+
+TRACE_DEFINE_ENUM(SSH_PACKET_SF_LOCKED_BIT);
+TRACE_DEFINE_ENUM(SSH_PACKET_SF_QUEUED_BIT);
+TRACE_DEFINE_ENUM(SSH_PACKET_SF_PENDING_BIT);
+TRACE_DEFINE_ENUM(SSH_PACKET_SF_TRANSMITTING_BIT);
+TRACE_DEFINE_ENUM(SSH_PACKET_SF_TRANSMITTED_BIT);
+TRACE_DEFINE_ENUM(SSH_PACKET_SF_ACKED_BIT);
+TRACE_DEFINE_ENUM(SSH_PACKET_SF_CANCELED_BIT);
+TRACE_DEFINE_ENUM(SSH_PACKET_SF_COMPLETED_BIT);
+
+TRACE_DEFINE_ENUM(SSH_PACKET_TY_FLUSH_BIT);
+TRACE_DEFINE_ENUM(SSH_PACKET_TY_SEQUENCED_BIT);
+TRACE_DEFINE_ENUM(SSH_PACKET_TY_BLOCKING_BIT);
+
+TRACE_DEFINE_ENUM(SSH_PACKET_FLAGS_SF_MASK);
+TRACE_DEFINE_ENUM(SSH_PACKET_FLAGS_TY_MASK);
+
+TRACE_DEFINE_ENUM(SSH_REQUEST_SF_LOCKED_BIT);
+TRACE_DEFINE_ENUM(SSH_REQUEST_SF_QUEUED_BIT);
+TRACE_DEFINE_ENUM(SSH_REQUEST_SF_PENDING_BIT);
+TRACE_DEFINE_ENUM(SSH_REQUEST_SF_TRANSMITTING_BIT);
+TRACE_DEFINE_ENUM(SSH_REQUEST_SF_TRANSMITTED_BIT);
+TRACE_DEFINE_ENUM(SSH_REQUEST_SF_RSPRCVD_BIT);
+TRACE_DEFINE_ENUM(SSH_REQUEST_SF_CANCELED_BIT);
+TRACE_DEFINE_ENUM(SSH_REQUEST_SF_COMPLETED_BIT);
+
+TRACE_DEFINE_ENUM(SSH_REQUEST_TY_FLUSH_BIT);
+TRACE_DEFINE_ENUM(SSH_REQUEST_TY_HAS_RESPONSE_BIT);
+
+TRACE_DEFINE_ENUM(SSH_REQUEST_FLAGS_SF_MASK);
+TRACE_DEFINE_ENUM(SSH_REQUEST_FLAGS_TY_MASK);
+
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_SAM);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_BAT);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_TMP);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_PMC);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_FAN);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_PoM);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_DBG);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_KBD);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_FWU);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_UNI);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_LPC);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_TCL);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_SFL);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_KIP);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_EXT);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_BLD);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_BAS);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_SEN);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_SRQ);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_MCU);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_HID);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_TCH);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_BKL);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_TAM);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_ACC);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_UFI);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_USC);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_PEN);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_VID);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_AUD);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_SMC);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_KPD);
+TRACE_DEFINE_ENUM(SSAM_SSH_TC_REG);
+
+
+#define SSAM_PTR_UID_LEN		9
+#define SSAM_U8_FIELD_NOT_APPLICABLE	((u16)-1)
+#define SSAM_SEQ_NOT_APPLICABLE		((u16)-1)
+#define SSAM_RQID_NOT_APPLICABLE	((u32)-1)
+#define SSAM_SSH_TC_NOT_APPLICABLE	0
+
+
+#ifndef _SURFACE_AGGREGATOR_TRACE_HELPERS
+#define _SURFACE_AGGREGATOR_TRACE_HELPERS
+
+/**
+ * ssam_trace_ptr_uid() - Convert the pointer to a non-pointer UID string.
+ * @ptr: The pointer to convert.
+ * @uid_str: A buffer of length SSAM_PTR_UID_LEN where the UID will be stored.
+ *
+ * Converts the given pointer into a UID string that is safe to be shared
+ * with userspace and logs, i.e. doesn't give away the real memory location.
+ */
+static inline void ssam_trace_ptr_uid(const void *ptr, char *uid_str)
+{
+	char buf[2 * sizeof(void *) + 1];
+
+	snprintf(buf, ARRAY_SIZE(buf), "%p", ptr);
+	memcpy(uid_str, &buf[ARRAY_SIZE(buf) - SSAM_PTR_UID_LEN],
+	       SSAM_PTR_UID_LEN);
+}
+
+/**
+ * ssam_trace_get_packet_seq() - Read the packet's sequence ID.
+ * @p: The packet.
+ *
+ * Return: Returns the packet's sequence ID (SEQ) field if present, or
+ * %SSAM_SEQ_NOT_APPLICABLE if not (e.g. flush packet).
+ */
+static inline u16 ssam_trace_get_packet_seq(const struct ssh_packet *p)
+{
+	if (!p->data.ptr || p->data.len < SSH_MESSAGE_LENGTH(0))
+		return SSAM_SEQ_NOT_APPLICABLE;
+
+	return p->data.ptr[SSH_MSGOFFSET_FRAME(seq)];
+}
+
+/**
+ * ssam_trace_get_request_id() - Read the packet's request ID.
+ * @p: The packet.
+ *
+ * Return: Returns the packet's request ID (RQID) field if the packet
+ * represents a request with command data, or %SSAM_RQID_NOT_APPLICABLE if not
+ * (e.g. flush request, control packet).
+ */
+static inline u32 ssam_trace_get_request_id(const struct ssh_packet *p)
+{
+	if (!p->data.ptr || p->data.len < SSH_COMMAND_MESSAGE_LENGTH(0))
+		return SSAM_RQID_NOT_APPLICABLE;
+
+	return get_unaligned_le16(&p->data.ptr[SSH_MSGOFFSET_COMMAND(rqid)]);
+}
+
+/**
+ * ssam_trace_get_request_tc() - Read the packet's request target category.
+ * @p: The packet.
+ *
+ * Return: Returns the packet's request target category (TC) field if the
+ * packet represents a request with command data, or %SSAM_TC_NOT_APPLICABLE
+ * if not (e.g. flush request, control packet).
+ */
+static inline u32 ssam_trace_get_request_tc(const struct ssh_packet *p)
+{
+	if (!p->data.ptr || p->data.len < SSH_COMMAND_MESSAGE_LENGTH(0))
+		return SSAM_SSH_TC_NOT_APPLICABLE;
+
+	return get_unaligned_le16(&p->data.ptr[SSH_MSGOFFSET_COMMAND(tc)]);
+}
+
+#endif /* _SURFACE_AGGREGATOR_TRACE_HELPERS */
+
+#define ssam_trace_get_command_field_u8(packet, field) \
+	((!packet || packet->data.len < SSH_COMMAND_MESSAGE_LENGTH(0)) \
+	 ? 0 : p->data.ptr[SSH_MSGOFFSET_COMMAND(field)])
+
+#define ssam_show_generic_u8_field(value)				\
+	__print_symbolic(value,						\
+		{ SSAM_U8_FIELD_NOT_APPLICABLE,		"N/A" }		\
+	)
+
+
+#define ssam_show_frame_type(ty)					\
+	__print_symbolic(ty,						\
+		{ SSH_FRAME_TYPE_DATA_SEQ,		"DSEQ" },	\
+		{ SSH_FRAME_TYPE_DATA_NSQ,		"DNSQ" },	\
+		{ SSH_FRAME_TYPE_ACK,			"ACK"  },	\
+		{ SSH_FRAME_TYPE_NAK,			"NAK"  }	\
+	)
+
+#define ssam_show_packet_type(type)					\
+	__print_flags(flags & SSH_PACKET_FLAGS_TY_MASK, "",		\
+		{ BIT(SSH_PACKET_TY_FLUSH_BIT),		"F" },		\
+		{ BIT(SSH_PACKET_TY_SEQUENCED_BIT),	"S" },		\
+		{ BIT(SSH_PACKET_TY_BLOCKING_BIT),	"B" }		\
+	)
+
+#define ssam_show_packet_state(state)					\
+	__print_flags(flags & SSH_PACKET_FLAGS_SF_MASK, "",		\
+		{ BIT(SSH_PACKET_SF_LOCKED_BIT),	"L" },		\
+		{ BIT(SSH_PACKET_SF_QUEUED_BIT),	"Q" },		\
+		{ BIT(SSH_PACKET_SF_PENDING_BIT),	"P" },		\
+		{ BIT(SSH_PACKET_SF_TRANSMITTING_BIT),	"S" },		\
+		{ BIT(SSH_PACKET_SF_TRANSMITTED_BIT),	"T" },		\
+		{ BIT(SSH_PACKET_SF_ACKED_BIT),		"A" },		\
+		{ BIT(SSH_PACKET_SF_CANCELED_BIT),	"C" },		\
+		{ BIT(SSH_PACKET_SF_COMPLETED_BIT),	"F" }		\
+	)
+
+#define ssam_show_packet_seq(seq)					\
+	__print_symbolic(seq,						\
+		{ SSAM_SEQ_NOT_APPLICABLE,		"N/A" }		\
+	)
+
+
+#define ssam_show_request_type(flags)					\
+	__print_flags(flags & SSH_REQUEST_FLAGS_TY_MASK, "",		\
+		{ BIT(SSH_REQUEST_TY_FLUSH_BIT),	"F" },		\
+		{ BIT(SSH_REQUEST_TY_HAS_RESPONSE_BIT),	"R" }		\
+	)
+
+#define ssam_show_request_state(flags)					\
+	__print_flags(flags & SSH_REQUEST_FLAGS_SF_MASK, "",		\
+		{ BIT(SSH_REQUEST_SF_LOCKED_BIT),	"L" },		\
+		{ BIT(SSH_REQUEST_SF_QUEUED_BIT),	"Q" },		\
+		{ BIT(SSH_REQUEST_SF_PENDING_BIT),	"P" },		\
+		{ BIT(SSH_REQUEST_SF_TRANSMITTING_BIT),	"S" },		\
+		{ BIT(SSH_REQUEST_SF_TRANSMITTED_BIT),	"T" },		\
+		{ BIT(SSH_REQUEST_SF_RSPRCVD_BIT),	"A" },		\
+		{ BIT(SSH_REQUEST_SF_CANCELED_BIT),	"C" },		\
+		{ BIT(SSH_REQUEST_SF_COMPLETED_BIT),	"F" }		\
+	)
+
+#define ssam_show_request_id(rqid)					\
+	__print_symbolic(rqid,						\
+		{ SSAM_RQID_NOT_APPLICABLE,		"N/A" }		\
+	)
+
+#define ssam_show_ssh_tc(rqid)						\
+	__print_symbolic(rqid,						\
+		{ SSAM_SSH_TC_NOT_APPLICABLE,		"N/A" },	\
+		{ SSAM_SSH_TC_SAM,			"SAM" },	\
+		{ SSAM_SSH_TC_BAT,			"BAT" },	\
+		{ SSAM_SSH_TC_TMP,			"TMP" },	\
+		{ SSAM_SSH_TC_PMC,			"PMC" },	\
+		{ SSAM_SSH_TC_FAN,			"FAN" },	\
+		{ SSAM_SSH_TC_PoM,			"PoM" },	\
+		{ SSAM_SSH_TC_DBG,			"DBG" },	\
+		{ SSAM_SSH_TC_KBD,			"KBD" },	\
+		{ SSAM_SSH_TC_FWU,			"FWU" },	\
+		{ SSAM_SSH_TC_UNI,			"UNI" },	\
+		{ SSAM_SSH_TC_LPC,			"LPC" },	\
+		{ SSAM_SSH_TC_TCL,			"TCL" },	\
+		{ SSAM_SSH_TC_SFL,			"SFL" },	\
+		{ SSAM_SSH_TC_KIP,			"KIP" },	\
+		{ SSAM_SSH_TC_EXT,			"EXT" },	\
+		{ SSAM_SSH_TC_BLD,			"BLD" },	\
+		{ SSAM_SSH_TC_BAS,			"BAS" },	\
+		{ SSAM_SSH_TC_SEN,			"SEN" },	\
+		{ SSAM_SSH_TC_SRQ,			"SRQ" },	\
+		{ SSAM_SSH_TC_MCU,			"MCU" },	\
+		{ SSAM_SSH_TC_HID,			"HID" },	\
+		{ SSAM_SSH_TC_TCH,			"TCH" },	\
+		{ SSAM_SSH_TC_BKL,			"BKL" },	\
+		{ SSAM_SSH_TC_TAM,			"TAM" },	\
+		{ SSAM_SSH_TC_ACC,			"ACC" },	\
+		{ SSAM_SSH_TC_UFI,			"UFI" },	\
+		{ SSAM_SSH_TC_USC,			"USC" },	\
+		{ SSAM_SSH_TC_PEN,			"PEN" },	\
+		{ SSAM_SSH_TC_VID,			"VID" },	\
+		{ SSAM_SSH_TC_AUD,			"AUD" },	\
+		{ SSAM_SSH_TC_SMC,			"SMC" },	\
+		{ SSAM_SSH_TC_KPD,			"KPD" },	\
+		{ SSAM_SSH_TC_REG,			"REG" }		\
+	)
+
+
+DECLARE_EVENT_CLASS(ssam_frame_class,
+	TP_PROTO(const struct ssh_frame *frame),
+
+	TP_ARGS(frame),
+
+	TP_STRUCT__entry(
+		__field(u8, type)
+		__field(u8, seq)
+		__field(u16, len)
+	),
+
+	TP_fast_assign(
+		__entry->type = frame->type;
+		__entry->seq = frame->seq;
+		__entry->len = get_unaligned_le16(&frame->len);
+	),
+
+	TP_printk("ty=%s, seq=0x%02x, len=%u",
+		ssam_show_frame_type(__entry->type),
+		__entry->seq,
+		__entry->len
+	)
+);
+
+#define DEFINE_SSAM_FRAME_EVENT(name)				\
+	DEFINE_EVENT(ssam_frame_class, ssam_##name,		\
+		TP_PROTO(const struct ssh_frame *frame),	\
+		TP_ARGS(frame)					\
+	)
+
+
+DECLARE_EVENT_CLASS(ssam_command_class,
+	TP_PROTO(const struct ssh_command *cmd, u16 len),
+
+	TP_ARGS(cmd, len),
+
+	TP_STRUCT__entry(
+		__field(u16, rqid)
+		__field(u16, len)
+		__field(u8, tc)
+		__field(u8, cid)
+		__field(u8, iid)
+	),
+
+	TP_fast_assign(
+		__entry->rqid = get_unaligned_le16(&cmd->rqid);
+		__entry->tc = cmd->tc;
+		__entry->cid = cmd->cid;
+		__entry->iid = cmd->iid;
+		__entry->len = len;
+	),
+
+	TP_printk("rqid=0x%04x, tc=%s, cid=0x%02x, iid=0x%02x, len=%u",
+		__entry->rqid,
+		ssam_show_ssh_tc(__entry->tc),
+		__entry->cid,
+		__entry->iid,
+		__entry->len
+	)
+);
+
+#define DEFINE_SSAM_COMMAND_EVENT(name)					\
+	DEFINE_EVENT(ssam_command_class, ssam_##name,			\
+		TP_PROTO(const struct ssh_command *cmd, u16 len),	\
+		TP_ARGS(cmd, len)					\
+	)
+
+
+DECLARE_EVENT_CLASS(ssam_packet_class,
+	TP_PROTO(const struct ssh_packet *packet),
+
+	TP_ARGS(packet),
+
+	TP_STRUCT__entry(
+		__array(char, uid, SSAM_PTR_UID_LEN)
+		__field(u8, priority)
+		__field(u16, length)
+		__field(unsigned long, state)
+		__field(u16, seq)
+	),
+
+	TP_fast_assign(
+		ssam_trace_ptr_uid(packet, __entry->uid);
+		__entry->priority = READ_ONCE(packet->priority);
+		__entry->length = packet->data.len;
+		__entry->state = READ_ONCE(packet->state);
+		__entry->seq = ssam_trace_get_packet_seq(packet);
+	),
+
+	TP_printk("uid=%s, seq=%s, ty=%s, pri=0x%02x, len=%u, sta=%s",
+		__entry->uid,
+		ssam_show_packet_seq(__entry->seq),
+		ssam_show_packet_type(__entry->state),
+		__entry->priority,
+		__entry->length,
+		ssam_show_packet_state(__entry->state)
+	)
+);
+
+#define DEFINE_SSAM_PACKET_EVENT(name)				\
+	DEFINE_EVENT(ssam_packet_class, ssam_##name,		\
+		TP_PROTO(const struct ssh_packet *packet),	\
+		TP_ARGS(packet)					\
+	)
+
+
+DECLARE_EVENT_CLASS(ssam_packet_status_class,
+	TP_PROTO(const struct ssh_packet *packet, int status),
+
+	TP_ARGS(packet, status),
+
+	TP_STRUCT__entry(
+		__array(char, uid, SSAM_PTR_UID_LEN)
+		__field(u8, priority)
+		__field(u16, length)
+		__field(unsigned long, state)
+		__field(u16, seq)
+		__field(int, status)
+	),
+
+	TP_fast_assign(
+		ssam_trace_ptr_uid(packet, __entry->uid);
+		__entry->priority = READ_ONCE(packet->priority);
+		__entry->length = packet->data.len;
+		__entry->state = READ_ONCE(packet->state);
+		__entry->seq = ssam_trace_get_packet_seq(packet);
+		__entry->status = status;
+	),
+
+	TP_printk("uid=%s, seq=%s, ty=%s, pri=0x%02x, len=%u, sta=%s, status=%d",
+		__entry->uid,
+		ssam_show_packet_seq(__entry->seq),
+		ssam_show_packet_type(__entry->state),
+		__entry->priority,
+		__entry->length,
+		ssam_show_packet_state(__entry->state),
+		__entry->status
+	)
+);
+
+#define DEFINE_SSAM_PACKET_STATUS_EVENT(name)				\
+	DEFINE_EVENT(ssam_packet_status_class, ssam_##name,		\
+		TP_PROTO(const struct ssh_packet *packet, int status),	\
+		TP_ARGS(packet, status)					\
+	)
+
+
+DECLARE_EVENT_CLASS(ssam_request_class,
+	TP_PROTO(const struct ssh_request *request),
+
+	TP_ARGS(request),
+
+	TP_STRUCT__entry(
+		__array(char, uid, SSAM_PTR_UID_LEN)
+		__field(unsigned long, state)
+		__field(u32, rqid)
+		__field(u8, tc)
+		__field(u16, cid)
+		__field(u16, iid)
+	),
+
+	TP_fast_assign(
+		const struct ssh_packet *p = &request->packet;
+
+		// use packet for UID so we can match requests to packets
+		ssam_trace_ptr_uid(p, __entry->uid);
+		__entry->state = READ_ONCE(request->state);
+		__entry->rqid = ssam_trace_get_request_id(p);
+		__entry->tc = ssam_trace_get_request_tc(p);
+		__entry->cid = ssam_trace_get_command_field_u8(p, cid);
+		__entry->iid = ssam_trace_get_command_field_u8(p, iid);
+	),
+
+	TP_printk("uid=%s, rqid=%s, ty=%s, sta=%s, tc=%s, cid=%s, iid=%s",
+		__entry->uid,
+		ssam_show_request_id(__entry->rqid),
+		ssam_show_request_type(__entry->state),
+		ssam_show_request_state(__entry->state),
+		ssam_show_ssh_tc(__entry->tc),
+		ssam_show_generic_u8_field(__entry->cid),
+		ssam_show_generic_u8_field(__entry->iid)
+	)
+);
+
+#define DEFINE_SSAM_REQUEST_EVENT(name)				\
+	DEFINE_EVENT(ssam_request_class, ssam_##name,		\
+		TP_PROTO(const struct ssh_request *request),	\
+		TP_ARGS(request)				\
+	)
+
+
+DECLARE_EVENT_CLASS(ssam_request_status_class,
+	TP_PROTO(const struct ssh_request *request, int status),
+
+	TP_ARGS(request, status),
+
+	TP_STRUCT__entry(
+		__array(char, uid, SSAM_PTR_UID_LEN)
+		__field(unsigned long, state)
+		__field(u32, rqid)
+		__field(u8, tc)
+		__field(u16, cid)
+		__field(u16, iid)
+		__field(int, status)
+	),
+
+	TP_fast_assign(
+		const struct ssh_packet *p = &request->packet;
+
+		// use packet for UID so we can match requests to packets
+		ssam_trace_ptr_uid(p, __entry->uid);
+		__entry->state = READ_ONCE(request->state);
+		__entry->rqid = ssam_trace_get_request_id(p);
+		__entry->tc = ssam_trace_get_request_tc(p);
+		__entry->cid = ssam_trace_get_command_field_u8(p, cid);
+		__entry->iid = ssam_trace_get_command_field_u8(p, iid);
+		__entry->status = status;
+	),
+
+	TP_printk("uid=%s, rqid=%s, ty=%s, sta=%s, tc=%s, cid=%s, iid=%s, status=%d",
+		__entry->uid,
+		ssam_show_request_id(__entry->rqid),
+		ssam_show_request_type(__entry->state),
+		ssam_show_request_state(__entry->state),
+		ssam_show_ssh_tc(__entry->tc),
+		ssam_show_generic_u8_field(__entry->cid),
+		ssam_show_generic_u8_field(__entry->iid),
+		__entry->status
+	)
+);
+
+#define DEFINE_SSAM_REQUEST_STATUS_EVENT(name)				\
+	DEFINE_EVENT(ssam_request_status_class, ssam_##name,		\
+		TP_PROTO(const struct ssh_request *request, int status),\
+		TP_ARGS(request, status)				\
+	)
+
+
+DECLARE_EVENT_CLASS(ssam_alloc_class,
+	TP_PROTO(void *ptr, size_t len),
+
+	TP_ARGS(ptr, len),
+
+	TP_STRUCT__entry(
+		__array(char, uid, SSAM_PTR_UID_LEN)
+		__field(size_t, len)
+	),
+
+	TP_fast_assign(
+		ssam_trace_ptr_uid(ptr, __entry->uid);
+		__entry->len = len;
+	),
+
+	TP_printk("uid=%s, len=%zu", __entry->uid, __entry->len)
+);
+
+#define DEFINE_SSAM_ALLOC_EVENT(name)					\
+	DEFINE_EVENT(ssam_alloc_class, ssam_##name,			\
+		TP_PROTO(void *ptr, size_t len),			\
+		TP_ARGS(ptr, len)					\
+	)
+
+
+DECLARE_EVENT_CLASS(ssam_free_class,
+	TP_PROTO(void *ptr),
+
+	TP_ARGS(ptr),
+
+	TP_STRUCT__entry(
+		__array(char, uid, SSAM_PTR_UID_LEN)
+		__field(size_t, len)
+	),
+
+	TP_fast_assign(
+		ssam_trace_ptr_uid(ptr, __entry->uid);
+	),
+
+	TP_printk("uid=%s", __entry->uid)
+);
+
+#define DEFINE_SSAM_FREE_EVENT(name)					\
+	DEFINE_EVENT(ssam_free_class, ssam_##name,			\
+		TP_PROTO(void *ptr),					\
+		TP_ARGS(ptr)						\
+	)
+
+
+DECLARE_EVENT_CLASS(ssam_generic_uint_class,
+	TP_PROTO(const char *property, unsigned int value),
+
+	TP_ARGS(property, value),
+
+	TP_STRUCT__entry(
+		__string(property, property)
+		__field(unsigned int, value)
+	),
+
+	TP_fast_assign(
+		__assign_str(property, property);
+		__entry->value = value;
+	),
+
+	TP_printk("%s=%u", __get_str(property), __entry->value)
+);
+
+#define DEFINE_SSAM_GENERIC_UINT_EVENT(name)				\
+	DEFINE_EVENT(ssam_generic_uint_class, ssam_##name,		\
+		TP_PROTO(const char *property, unsigned int value),	\
+		TP_ARGS(property, value)				\
+	)
+
+
+DEFINE_SSAM_FRAME_EVENT(rx_frame_received);
+DEFINE_SSAM_COMMAND_EVENT(rx_response_received);
+DEFINE_SSAM_COMMAND_EVENT(rx_event_received);
+
+DEFINE_SSAM_PACKET_EVENT(packet_release);
+DEFINE_SSAM_PACKET_EVENT(packet_submit);
+DEFINE_SSAM_PACKET_EVENT(packet_resubmit);
+DEFINE_SSAM_PACKET_EVENT(packet_timeout);
+DEFINE_SSAM_PACKET_EVENT(packet_cancel);
+DEFINE_SSAM_PACKET_STATUS_EVENT(packet_complete);
+DEFINE_SSAM_GENERIC_UINT_EVENT(ptl_timeout_reap);
+
+DEFINE_SSAM_REQUEST_EVENT(request_submit);
+DEFINE_SSAM_REQUEST_EVENT(request_timeout);
+DEFINE_SSAM_REQUEST_EVENT(request_cancel);
+DEFINE_SSAM_REQUEST_STATUS_EVENT(request_complete);
+DEFINE_SSAM_GENERIC_UINT_EVENT(rtl_timeout_reap);
+
+DEFINE_SSAM_ALLOC_EVENT(ctrl_packet_alloc);
+DEFINE_SSAM_FREE_EVENT(ctrl_packet_free);
+
+DEFINE_SSAM_ALLOC_EVENT(event_item_alloc);
+DEFINE_SSAM_FREE_EVENT(event_item_free);
+
+#endif /* _SURFACE_AGGREGATOR_TRACE_H */
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
+
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE trace
+
+#include <trace/define_trace.h>
-- 
2.28.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC PATCH 5/9] surface_aggregator: Add error injection capabilities
  2020-09-23 15:15 [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module Maximilian Luz
                   ` (2 preceding siblings ...)
  2020-09-23 15:15 ` [RFC PATCH 4/9] surface_aggregator: Add trace points Maximilian Luz
@ 2020-09-23 15:15 ` Maximilian Luz
  2020-09-23 17:45   ` Greg Kroah-Hartman
  2020-09-23 15:15 ` [RFC PATCH 7/9] docs: driver-api: Add Surface Aggregator subsystem documentation Maximilian Luz
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Maximilian Luz @ 2020-09-23 15:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maximilian Luz, linux-serial, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Jiri Slaby, Blaž Hrastnik, Dorian Stoll

This commit adds error injection hooks to the Surface Serial Hub
communication protocol implementation, to:

 - simulate simple serial transmission errors,

 - drop packets, requests, and responses, simulating communication
   failures and potentially trigger retransmission timeouts, as well as

 - inject invalid data into submitted and received packets.

Together with the trace points introduced in the previous commit, these
facilities are intended to aid in testing, validation, and debugging of
the Surface Aggregator communication layer.

Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
---
 drivers/misc/surface_aggregator/Kconfig       |  15 +
 .../surface_aggregator/ssh_packet_layer.c     | 302 +++++++++++++++++-
 .../surface_aggregator/ssh_request_layer.c    |  36 +++
 drivers/misc/surface_aggregator/trace.h       |   9 +
 4 files changed, 360 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/surface_aggregator/Kconfig b/drivers/misc/surface_aggregator/Kconfig
index a5a98c9e17a6..36a47c9e129d 100644
--- a/drivers/misc/surface_aggregator/Kconfig
+++ b/drivers/misc/surface_aggregator/Kconfig
@@ -34,3 +34,18 @@ menuconfig SURFACE_AGGREGATOR
 	  devices using SAM-over-SSH are supported, whereas devices using
 	  SAM-over-HID, which is used on the 4th generation, are currently not
 	  supported.
+
+config SURFACE_AGGREGATOR_ERROR_INJECTION
+	bool "Surface System Aggregator Module Error Injection Capabilities"
+	depends on SURFACE_AGGREGATOR
+	depends on FUNCTION_ERROR_INJECTION
+	default n
+	help
+	  Provides error-injection capabilities for the Surface System
+	  Aggregator Module subsystem and Surface Serial Hub driver.
+
+	  Specifically, exports error injection hooks to be used with the
+	  kernel's function error injection capabilities to simulate underlying
+	  transport and communication problems, such as invalid data sent to or
+	  received from the EC, dropped data, and communication timeouts.
+	  Intended for development and debugging.
diff --git a/drivers/misc/surface_aggregator/ssh_packet_layer.c b/drivers/misc/surface_aggregator/ssh_packet_layer.c
index 29dc64dd27a8..4ce8e493cfa3 100644
--- a/drivers/misc/surface_aggregator/ssh_packet_layer.c
+++ b/drivers/misc/surface_aggregator/ssh_packet_layer.c
@@ -2,6 +2,7 @@
 
 #include <asm/unaligned.h>
 #include <linux/atomic.h>
+#include <linux/error-injection.h>
 #include <linux/jiffies.h>
 #include <linux/kfifo.h>
 #include <linux/kref.h>
@@ -207,6 +208,288 @@
 #define SSH_PTL_RX_FIFO_LEN			4096
 
 
+#ifdef CONFIG_SURFACE_AGGREGATOR_ERROR_INJECTION
+
+/**
+ * ssh_ptl_should_drop_ack_packet() - Error injection hook to drop ACK packets.
+ *
+ * Useful to test detection and handling of automated re-transmits by the EC.
+ * Specifically of packets that the EC consideres not-ACKed but the driver
+ * already consideres ACKed (due to dropped ACK). In this case, the EC
+ * re-transmits the packet-to-be-ACKed and the driver should detect it as
+ * duplicate/already handled. Note that the driver should still send an ACK
+ * for the re-transmitted packet.
+ */
+static noinline bool ssh_ptl_should_drop_ack_packet(void)
+{
+	return false;
+}
+ALLOW_ERROR_INJECTION(ssh_ptl_should_drop_ack_packet, TRUE);
+
+/**
+ * ssh_ptl_should_drop_nak_packet() - Error injection hook to drop NAK packets.
+ *
+ * Useful to test/force automated (timeout-based) re-transmit by the EC.
+ * Specifically, packets that have not reached the driver completely/with valid
+ * checksums. Only useful in combination with receival of (injected) bad data.
+ */
+static noinline bool ssh_ptl_should_drop_nak_packet(void)
+{
+	return false;
+}
+ALLOW_ERROR_INJECTION(ssh_ptl_should_drop_nak_packet, TRUE);
+
+/**
+ * ssh_ptl_should_drop_dsq_packet() - Error injection hook to drop sequenced
+ * data packet.
+ *
+ * Useful to test re-transmit timeout of the driver. If the data packet has not
+ * been ACKed after a certain time, the driver should re-transmit the packet up
+ * to limited number of times defined in SSH_PTL_MAX_PACKET_TRIES.
+ */
+static noinline bool ssh_ptl_should_drop_dsq_packet(void)
+{
+	return false;
+}
+ALLOW_ERROR_INJECTION(ssh_ptl_should_drop_dsq_packet, TRUE);
+
+/**
+ * ssh_ptl_should_fail_write() - Error injection hook to make
+ * serdev_device_write() fail.
+ *
+ * Hook to simulate errors in serdev_device_write when transmitting packets.
+ */
+static noinline int ssh_ptl_should_fail_write(void)
+{
+	return 0;
+}
+ALLOW_ERROR_INJECTION(ssh_ptl_should_fail_write, ERRNO);
+
+/**
+ * ssh_ptl_should_corrupt_tx_data() - Error injection hook to simualte invalid
+ * data being sent to the EC.
+ *
+ * Hook to simulate corrupt/invalid data being sent from host (driver) to EC.
+ * Causes the packet data to be actively corrupted by overwriting it with
+ * pre-defined values, such that it becomes invalid, causing the EC to respond
+ * with a NAK packet. Useful to test handling of NAK packets received by the
+ * driver.
+ */
+static noinline bool ssh_ptl_should_corrupt_tx_data(void)
+{
+	return false;
+}
+ALLOW_ERROR_INJECTION(ssh_ptl_should_corrupt_tx_data, TRUE);
+
+/**
+ * ssh_ptl_should_corrupt_rx_syn() - Error injection hook to simulate invalid
+ * data being sent by the EC.
+ *
+ * Hook to simulate invalid SYN bytes, i.e. an invalid start of messages and
+ * test handling thereof in the driver.
+ */
+static noinline bool ssh_ptl_should_corrupt_rx_syn(void)
+{
+	return false;
+}
+ALLOW_ERROR_INJECTION(ssh_ptl_should_corrupt_rx_syn, TRUE);
+
+/**
+ * ssh_ptl_should_corrupt_rx_data() - Error injection hook to simulate invalid
+ * data being sent by the EC.
+ *
+ * Hook to simulate invalid data/checksum of the message frame and test handling
+ * thereof in the driver.
+ */
+static noinline bool ssh_ptl_should_corrupt_rx_data(void)
+{
+	return false;
+}
+ALLOW_ERROR_INJECTION(ssh_ptl_should_corrupt_rx_data, TRUE);
+
+
+static bool __ssh_ptl_should_drop_ack_packet(struct ssh_packet *packet)
+{
+	if (likely(!ssh_ptl_should_drop_ack_packet()))
+		return false;
+
+	trace_ssam_ei_tx_drop_ack_packet(packet);
+	ptl_info(packet->ptl, "packet error injection: dropping ACK packet %p\n",
+		 packet);
+
+	return true;
+}
+
+static bool __ssh_ptl_should_drop_nak_packet(struct ssh_packet *packet)
+{
+	if (likely(!ssh_ptl_should_drop_nak_packet()))
+		return false;
+
+	trace_ssam_ei_tx_drop_nak_packet(packet);
+	ptl_info(packet->ptl, "packet error injection: dropping NAK packet %p\n",
+		 packet);
+
+	return true;
+}
+
+static bool __ssh_ptl_should_drop_dsq_packet(struct ssh_packet *packet)
+{
+	if (likely(!ssh_ptl_should_drop_dsq_packet()))
+		return false;
+
+	trace_ssam_ei_tx_drop_dsq_packet(packet);
+	ptl_info(packet->ptl,
+		"packet error injection: dropping sequenced data packet %p\n",
+		 packet);
+
+	return true;
+}
+
+static bool ssh_ptl_should_drop_packet(struct ssh_packet *packet)
+{
+	// ignore packets that don't carry any data (i.e. flush)
+	if (!packet->data.ptr || !packet->data.len)
+		return false;
+
+	switch (packet->data.ptr[SSH_MSGOFFSET_FRAME(type)]) {
+	case SSH_FRAME_TYPE_ACK:
+		return __ssh_ptl_should_drop_ack_packet(packet);
+
+	case SSH_FRAME_TYPE_NAK:
+		return __ssh_ptl_should_drop_nak_packet(packet);
+
+	case SSH_FRAME_TYPE_DATA_SEQ:
+		return __ssh_ptl_should_drop_dsq_packet(packet);
+
+	default:
+		return false;
+	}
+}
+
+static int ssh_ptl_write_buf(struct ssh_ptl *ptl, struct ssh_packet *packet,
+			     const unsigned char *buf, size_t count)
+{
+	int status;
+
+	status = ssh_ptl_should_fail_write();
+	if (unlikely(status)) {
+		trace_ssam_ei_tx_fail_write(packet, status);
+		ptl_info(packet->ptl,
+			 "packet error injection: simulating transmit error %d,"
+			 " packet %p\n", status, packet);
+
+		return status;
+	}
+
+	return serdev_device_write_buf(ptl->serdev, buf, count);
+}
+
+static void ssh_ptl_tx_inject_invalid_data(struct ssh_packet *packet)
+{
+	// ignore packets that don't carry any data (i.e. flush)
+	if (!packet->data.ptr || !packet->data.len)
+		return;
+
+	// only allow sequenced data packets to be modified
+	if (packet->data.ptr[SSH_MSGOFFSET_FRAME(type)] != SSH_FRAME_TYPE_DATA_SEQ)
+		return;
+
+	if (likely(!ssh_ptl_should_corrupt_tx_data()))
+		return;
+
+	trace_ssam_ei_tx_corrupt_data(packet);
+	ptl_info(packet->ptl,
+		 "packet error injection: simulating invalid transmit data on packet %p\n",
+		 packet);
+
+	/*
+	 * NB: The value 0xb3 has been chosen more or less randomly so that it
+	 * doesn't have any (major) overlap with the SYN bytes (aa 55) and is
+	 * non-trivial (i.e. non-zero, non-0xff).
+	 */
+	memset(packet->data.ptr, 0xb3, packet->data.len);
+}
+
+static void ssh_ptl_rx_inject_invalid_syn(struct ssh_ptl *ptl,
+					  struct ssam_span *data)
+{
+	struct ssam_span frame;
+
+	// check if there actually is something to corrupt
+	if (!sshp_find_syn(data, &frame))
+		return;
+
+	if (likely(!ssh_ptl_should_corrupt_rx_syn()))
+		return;
+
+	trace_ssam_ei_rx_corrupt_syn("data_length", data->len);
+
+	data->ptr[1] = 0xb3;	// set second byte of SYN to "random" value
+}
+
+static void ssh_ptl_rx_inject_invalid_data(struct ssh_ptl *ptl,
+					   struct ssam_span *frame)
+{
+	size_t payload_len, message_len;
+	struct ssh_frame *sshf;
+
+	// ignore incomplete messages, will get handled once it's complete
+	if (frame->len < SSH_MESSAGE_LENGTH(0))
+		return;
+
+	// ignore incomplete messages, part 2
+	payload_len = get_unaligned_le16(&frame->ptr[SSH_MSGOFFSET_FRAME(len)]);
+	message_len = SSH_MESSAGE_LENGTH(payload_len);
+	if (frame->len < message_len)
+		return;
+
+	if (likely(!ssh_ptl_should_corrupt_rx_data()))
+		return;
+
+	sshf = (struct ssh_frame *)&frame->ptr[SSH_MSGOFFSET_FRAME(type)];
+	trace_ssam_ei_rx_corrupt_data(sshf);
+
+	/*
+	 * Flip bits in first byte of payload checksum. This is basically
+	 * equivalent to a payload/frame data error without us having to worry
+	 * about (the, arguably pretty small, probability of) accidental
+	 * checksum collisions.
+	 */
+	frame->ptr[frame->len - 2] = ~frame->ptr[frame->len - 2];
+}
+
+#else /* CONFIG_SURFACE_AGGREGATOR_ERROR_INJECTION */
+
+static inline bool ssh_ptl_should_drop_packet(struct ssh_packet *packet)
+{
+	return false;
+}
+
+static inline int ssh_ptl_write_buf(struct ssh_ptl *ptl,
+				    struct ssh_packet *packet,
+				    const unsigned char *buf,
+				    size_t count)
+{
+	return serdev_device_write_buf(ptl->serdev, buf, count);
+}
+
+static inline void ssh_ptl_tx_inject_invalid_data(struct ssh_packet *packet)
+{
+}
+
+static inline void ssh_ptl_rx_inject_invalid_syn(struct ssh_ptl *ptl,
+						 struct ssam_span *data)
+{
+}
+
+static inline void ssh_ptl_rx_inject_invalid_data(struct ssh_ptl *ptl,
+						  struct ssam_span *frame)
+{
+}
+
+#endif /* CONFIG_SURFACE_AGGREGATOR_ERROR_INJECTION */
+
+
 static void __ssh_ptl_packet_release(struct kref *kref)
 {
 	struct ssh_packet *p = container_of(kref, struct ssh_packet, refcnt);
@@ -736,6 +1019,7 @@ static int ssh_ptl_tx_threadfn(void *data)
 
 	while (!kthread_should_stop()) {
 		unsigned char *buf;
+		bool drop = false;
 		size_t len = 0;
 		int status = 0;
 
@@ -751,8 +1035,16 @@ static int ssh_ptl_tx_threadfn(void *data)
 			}
 		}
 
+		// error injection: drop packet to simulate transmission problem
+		if (ptl->tx.offset == 0)
+			drop = ssh_ptl_should_drop_packet(ptl->tx.packet);
+
+		// error injection: simulate invalid packet data
+		if (ptl->tx.offset == 0 && !drop)
+			ssh_ptl_tx_inject_invalid_data(ptl->tx.packet);
+
 		// note: flush-packets don't have any data
-		if (likely(ptl->tx.packet->data.ptr)) {
+		if (likely(ptl->tx.packet->data.ptr && !drop)) {
 			buf = ptl->tx.packet->data.ptr + ptl->tx.offset;
 			len = ptl->tx.packet->data.len - ptl->tx.offset;
 
@@ -760,7 +1052,7 @@ static int ssh_ptl_tx_threadfn(void *data)
 			print_hex_dump_debug("tx: ", DUMP_PREFIX_OFFSET, 16, 1,
 					     buf, len, false);
 
-			status = serdev_device_write_buf(ptl->serdev, buf, len);
+			status = ssh_ptl_write_buf(ptl, ptl->tx.packet, buf, len);
 		}
 
 		if (status < 0) {
@@ -1340,6 +1632,9 @@ static size_t ssh_ptl_rx_eval(struct ssh_ptl *ptl, struct ssam_span *source)
 	bool syn_found;
 	int status;
 
+	// error injection: modify data to simulate corrupt SYN bytes
+	ssh_ptl_rx_inject_invalid_syn(ptl, source);
+
 	// find SYN
 	syn_found = sshp_find_syn(source, &aligned);
 
@@ -1370,6 +1665,9 @@ static size_t ssh_ptl_rx_eval(struct ssh_ptl *ptl, struct ssam_span *source)
 	if (unlikely(!syn_found))
 		return aligned.ptr - source->ptr;
 
+	// error injection: modify data to simulate corruption
+	ssh_ptl_rx_inject_invalid_data(ptl, &aligned);
+
 	// parse and validate frame
 	status = sshp_parse_frame(&ptl->serdev->dev, &aligned, &frame, &payload,
 				  SSH_PTL_RX_BUF_LEN);
diff --git a/drivers/misc/surface_aggregator/ssh_request_layer.c b/drivers/misc/surface_aggregator/ssh_request_layer.c
index e7358b9a2e4d..62513594d4e8 100644
--- a/drivers/misc/surface_aggregator/ssh_request_layer.c
+++ b/drivers/misc/surface_aggregator/ssh_request_layer.c
@@ -3,6 +3,7 @@
 #include <asm/unaligned.h>
 #include <linux/atomic.h>
 #include <linux/completion.h>
+#include <linux/error-injection.h>
 #include <linux/ktime.h>
 #include <linux/list.h>
 #include <linux/slab.h>
@@ -47,6 +48,31 @@
 #define SSH_RTL_MAX_PENDING		3
 
 
+#ifdef CONFIG_SURFACE_AGGREGATOR_ERROR_INJECTION
+
+/**
+ * ssh_rtl_should_drop_response() - Error injection hook to drop request
+ * responses.
+ *
+ * Useful to cause request transmission timeouts in the driver by dropping the
+ * response to a request.
+ */
+static noinline bool ssh_rtl_should_drop_response(void)
+{
+	return false;
+}
+ALLOW_ERROR_INJECTION(ssh_rtl_should_drop_response, TRUE);
+
+#else
+
+static inline bool ssh_rtl_should_drop_response(void)
+{
+	return false;
+}
+
+#endif
+
+
 static u16 ssh_request_get_rqid(struct ssh_request *rqst)
 {
 	return get_unaligned_le16(rqst->packet.data.ptr
@@ -453,6 +479,16 @@ static void ssh_rtl_complete(struct ssh_rtl *rtl,
 		if (unlikely(ssh_request_get_rqid(p) != rqid))
 			continue;
 
+		// simulate response timeout
+		if (ssh_rtl_should_drop_response()) {
+			spin_unlock(&rtl->pending.lock);
+
+			trace_ssam_ei_rx_drop_response(p);
+			rtl_info(rtl, "request error injection: dropping response for request %p\n",
+				 &p->packet);
+			return;
+		}
+
 		/*
 		 * Mark as "response received" and "locked" as we're going to
 		 * complete it.
diff --git a/drivers/misc/surface_aggregator/trace.h b/drivers/misc/surface_aggregator/trace.h
index eb2e3e1457de..e99e35a2d939 100644
--- a/drivers/misc/surface_aggregator/trace.h
+++ b/drivers/misc/surface_aggregator/trace.h
@@ -594,6 +594,15 @@ DEFINE_SSAM_REQUEST_EVENT(request_cancel);
 DEFINE_SSAM_REQUEST_STATUS_EVENT(request_complete);
 DEFINE_SSAM_GENERIC_UINT_EVENT(rtl_timeout_reap);
 
+DEFINE_SSAM_PACKET_EVENT(ei_tx_drop_ack_packet);
+DEFINE_SSAM_PACKET_EVENT(ei_tx_drop_nak_packet);
+DEFINE_SSAM_PACKET_EVENT(ei_tx_drop_dsq_packet);
+DEFINE_SSAM_PACKET_STATUS_EVENT(ei_tx_fail_write);
+DEFINE_SSAM_PACKET_EVENT(ei_tx_corrupt_data);
+DEFINE_SSAM_GENERIC_UINT_EVENT(ei_rx_corrupt_syn);
+DEFINE_SSAM_FRAME_EVENT(ei_rx_corrupt_data);
+DEFINE_SSAM_REQUEST_EVENT(ei_rx_drop_response);
+
 DEFINE_SSAM_ALLOC_EVENT(ctrl_packet_alloc);
 DEFINE_SSAM_FREE_EVENT(ctrl_packet_free);
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC PATCH 7/9] docs: driver-api: Add Surface Aggregator subsystem documentation
  2020-09-23 15:15 [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module Maximilian Luz
                   ` (3 preceding siblings ...)
  2020-09-23 15:15 ` [RFC PATCH 5/9] surface_aggregator: Add error injection capabilities Maximilian Luz
@ 2020-09-23 15:15 ` Maximilian Luz
  2020-09-23 15:30 ` [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module Arnd Bergmann
       [not found] ` <20200923151511.3842150-2-luzmaximilian@gmail.com>
  6 siblings, 0 replies; 27+ messages in thread
From: Maximilian Luz @ 2020-09-23 15:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Maximilian Luz, linux-serial, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Jiri Slaby, Blaž Hrastnik, Dorian Stoll

Add documentation for the Surface Aggregator subsystem and its client
drivers, giving an overview of the subsystem, its use-cases, its
internal structure and internal API, as well as its external API for
writing client drivers.

Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
---
 Documentation/driver-api/index.rst            |   1 +
 .../surface_aggregator/client-api.rst         |  38 ++
 .../driver-api/surface_aggregator/client.rst  | 394 ++++++++++++++++++
 .../surface_aggregator/clients/index.rst      |  10 +
 .../driver-api/surface_aggregator/index.rst   |  21 +
 .../surface_aggregator/internal-api.rst       |  67 +++
 .../surface_aggregator/internal.rst           |  50 +++
 .../surface_aggregator/overview.rst           |  76 ++++
 .../driver-api/surface_aggregator/ssh.rst     | 343 +++++++++++++++
 MAINTAINERS                                   |   1 +
 10 files changed, 1001 insertions(+)
 create mode 100644 Documentation/driver-api/surface_aggregator/client-api.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/client.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/clients/index.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/index.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/internal-api.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/internal.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/overview.rst
 create mode 100644 Documentation/driver-api/surface_aggregator/ssh.rst

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 5ef2cfe3a16b..dbb5f7353022 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -100,6 +100,7 @@ available subsections can be seen below.
    rfkill
    serial/index
    sm501
+   surface_aggregator/index
    switchtec
    sync_file
    vfio-mediated-device
diff --git a/Documentation/driver-api/surface_aggregator/client-api.rst b/Documentation/driver-api/surface_aggregator/client-api.rst
new file mode 100644
index 000000000000..b93608a1be38
--- /dev/null
+++ b/Documentation/driver-api/surface_aggregator/client-api.rst
@@ -0,0 +1,38 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Client Driver API Documentation
+===============================
+
+.. contents::
+    :depth: 2
+
+
+Serial Hub Communication
+========================
+
+.. kernel-doc:: include/linux/surface_aggregator/serial_hub.h
+
+.. kernel-doc:: drivers/misc/surface_aggregator/ssh_packet_layer.c
+    :export:
+
+
+Controller and Core Interface
+=============================
+
+.. kernel-doc:: include/linux/surface_aggregator/controller.h
+
+.. kernel-doc:: drivers/misc/surface_aggregator/controller.c
+    :export:
+
+.. kernel-doc:: drivers/misc/surface_aggregator/core.c
+    :export:
+
+
+Client Bus and Client Device API
+================================
+
+.. kernel-doc:: include/linux/surface_aggregator/device.h
+
+.. kernel-doc:: drivers/misc/surface_aggregator/bus.c
+    :export:
diff --git a/Documentation/driver-api/surface_aggregator/client.rst b/Documentation/driver-api/surface_aggregator/client.rst
new file mode 100644
index 000000000000..3ae21a49dee0
--- /dev/null
+++ b/Documentation/driver-api/surface_aggregator/client.rst
@@ -0,0 +1,394 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. |ssam_controller| replace:: :c:type:`struct ssam_controller <ssam_controller>`
+.. |ssam_device| replace:: :c:type:`struct ssam_device <ssam_device>`
+.. |ssam_device_driver| replace:: :c:type:`struct ssam_device_driver <ssam_device_driver>`
+.. |ssam_client_bind| replace:: :c:func:`ssam_client_bind`
+.. |ssam_client_link| replace:: :c:func:`ssam_client_link`
+.. |ssam_get_controller| replace:: :c:func:`ssam_get_controller`
+.. |ssam_controller_get| replace:: :c:func:`ssam_controller_get`
+.. |ssam_controller_put| replace:: :c:func:`ssam_controller_put`
+.. |ssam_device_alloc| replace:: :c:func:`ssam_device_alloc`
+.. |ssam_device_add| replace:: :c:func:`ssam_device_add`
+.. |ssam_device_remove| replace:: :c:func:`ssam_device_remove`
+.. |ssam_device_driver_register| replace:: :c:func:`ssam_device_driver_register`
+.. |ssam_device_driver_unregister| replace:: :c:func:`ssam_device_driver_unregister`
+.. |module_ssam_device_driver| replace:: :c:func:`module_ssam_device_driver`
+.. |SSAM_DEVICE| replace:: :c:func:`SSAM_DEVICE`
+.. |ssam_notifier_register| replace:: :c:func:`ssam_notifier_register`
+.. |ssam_notifier_unregister| replace:: :c:func:`ssam_notifier_unregister`
+.. |ssam_request_sync| replace:: :c:func:`ssam_request_sync`
+.. |ssam_event_mask| replace:: :c:type:`enum ssam_event_mask <ssam_event_mask>`
+
+
+======================
+Writing Client Drivers
+======================
+
+For the API documentation, refer to:
+
+.. toctree::
+   :maxdepth: 2
+
+   client-api
+
+
+Overview
+========
+
+Client drivers can be set up in two main ways, depending on how the
+corresponding device is made available to the system. We specifically
+differentiate between devices that are presented to the system via one of
+the conventional ways, e.g. as platform devices via ACPI, and devices that
+are non-discoverable and instead need to be explicitly provided by some
+other mechanism, as discussed further below.
+
+
+Non-SSAM Client Drivers
+=======================
+
+All communication with the SAM EC is handled via the |ssam_controller|
+representing that EC to the kernel. Drivers targeting a non-SSAM device (and
+thus not being a |ssam_device_driver|) need to explicitly establish a
+connection/relation to that controller. This can be done via the
+|ssam_client_bind| function. Said function returns a reference to the SSAM
+controller, but, more importantly, also establishes a device link between
+client device and controller (this can also be done separate via
+|ssam_client_link|). It is important to do this, as it, first, guarantees
+that the returned controller is valid for use in the client driver for as
+long as this driver is bound to its device, i.e. that the driver gets
+un-bound before the controller ever becomes invalid, and, second, as it
+ensures correct suspend/resume ordering. This setup should be done in the
+driver's probe function, and may be used to defer probing in case the SSAM
+subsystem is not ready yet, for example:
+
+.. code-block:: c
+
+   static int client_driver_probe(struct platform_device *pdev)
+   {
+           struct ssam_controller *ctrl;
+           int status;
+
+           status = ssam_client_bind(&pdev->dev, &ctrl);
+           if (status)
+                   return status == -ENXIO ? -EPROBE_DEFER : status;
+
+           // ...
+
+           return 0;
+   }
+
+The controller may be separately obtained via |ssam_get_controller| and its
+lifetime be guaranteed via |ssam_controller_get| and |ssam_controller_put|.
+Note that none of these functions, however, guarantee that the controller
+will not be shut down or suspended. These functions essentially only operate
+on the reference, i.e. only guarantee a bare minimum of accessability
+without any guarantees at all on practical operability.
+
+
+Adding SSAM Devices
+===================
+
+If a device does not already exist/is not already provided via conventional
+means, it should be provided as |ssam_device| via the SSAM client device
+hub. New devices can be added to this hub by entering their UID into the
+corresponding registry. SSAM devices can also be manually allocated via
+|ssam_device_alloc|, subsequently to which they have to be added via
+|ssam_device_add| and eventually removed via |ssam_device_remove|. By
+default, the parent of the device is set to the controller device provided
+for allocation, however this may be changed before the device is added. Note
+that, when changing the parent device, care must be taken to ensure that the
+controller lifetime and suspend/resume ordering guarantees, in the default
+setup provided through the parent-child relation, are preserved. If
+necessary, by use of |ssam_client_link| as is done for non-SSAM client
+drivers and described in more detail above.
+
+A client device must always be removed by the party which added the
+respective device before the controller shuts down. Such removal can be
+guaranteed by linking the driver providing the SSAM device to the controller
+via |ssam_client_link|, causing it to unbind before the controller driver
+unbinds. Client devices registered with the controller as parent are
+automatically removed when the controller shuts down, but this should not be
+relied upon, especially as this does not extend to client devices with a
+different parent.
+
+
+SSAM Client Drivers
+===================
+
+SSAM client device drivers are, in essence, no different than other device
+driver types. They are represented via |ssam_device_driver| and bind to a
+|ssam_device| via its UID (:c:type:`struct ssam_device.uid <ssam_device>`)
+member and the match table
+(:c:type:`struct ssam_device_driver.match_table <ssam_device_driver>`),
+which should be set when declaring the driver struct instance. Refer to the
+|SSAM_DEVICE| macro documentation for more details on how to define members
+of the driver's match table.
+
+The UID for SSAM client devices consists of a ``domain``, a ``category``,
+a ``target``, an ``instance``, and a ``function``. The ``domain`` is used
+differentiate between physical SAM devices
+(:c:type:`SSAM_DOMAIN_SERIALHUB <ssam_device_domain>`), i.e. devices that can
+be accessed via the Surface Serial Hub, and virtual ones
+(:c:type:`SSAM_DOMAIN_VIRTUAL <ssam_device_domain>`), such as client-device
+hubs, that have no real representation on the SAM EC and are solely used on
+the kernel/driver-side. For physical devices, ``category`` represents the
+target category, ``target`` the target ID, and ``instance`` the instance ID
+used to access the physical SAM device. In addition, ``function`` references
+a specific device functionality, but has no meaning to the SAM EC. The
+(default) name of a client device is generated based on its UID.
+
+A driver instance can be registered via |ssam_device_driver_register| and
+unregistered via |ssam_device_driver_unregister|. For convenience, the
+|module_ssam_device_driver| macro may be used to define module init- and
+exit-functions registering the driver.
+
+The controller associated with a SSAM client device can be found in its
+:c:type:`struct ssam_device.ctrl <ssam_device>` member. This reference is
+guaranteed to be valid for at least as long as the client driver is bound,
+but should also be valid for as long as the client device exists. Note,
+however, that access outside of the bound client driver must ensure that the
+controller device is not suspended while making any requests or
+(un)registering event notifiers (and thus should generally be avoided). This
+is guaranteed when the controller is accessed from inside the bound client
+driver.
+
+
+Making Synchronous Requests
+===========================
+
+Synchronous requests are (currently) the main form of host-initiated
+communication with the EC. There are a couple of ways to define and execute
+such requests, however, most of them boil down to something similar as shown
+in the example below. This example defines a write-read request, meaning
+that the caller provides an argument to the SAM EC and receives a response.
+The caller needs to know the (maximum) length of the response payload and
+provide a buffer for it.
+
+Care must be taken to ensure that any command payload data passed to the SAM
+EC is provided in little-endian format and, similarly, any response payload
+data received from it is converted from little-endian to host endianness.
+
+.. code-block:: c
+
+   int perform_request(struct ssam_controller *ctrl, u32 arg, u32 *ret)
+   {
+           struct ssam_request rqst;
+           struct ssam_response resp;
+           int status;
+
+           /* Convert request argument to little-endian. */
+           __le32 arg_le = cpu_to_le32(arg);
+           __le32 ret_le = cpu_to_le32(0);
+
+           /*
+            * Initialize request specification. Replace this with your values.
+            * The rqst.payload field may be NULL if rqst.length is zero,
+            * indicating that the request does not have any argument.
+            *
+            * Note: The request parameters used here are not valid, i.e.
+            *       they do not correspond to an actual SAM/EC request.
+            */
+           rqst.target_category = SSAM_SSH_TC_SAM;
+           rqst.target_id = 0x01;
+           rqst.command_id = 0x02;
+           rqst.instance_id = 0x03;
+           rqst.flags = SSAM_REQUEST_HAS_RESPONSE;
+           rqst.length = sizeof(arg_le);
+           rqst.payload = (u8 *)&arg_le;
+
+           /* Initialize request response. */
+           resp.capacity = sizeof(ret_le);
+           resp.length = 0;
+           resp.pointer = (u8 *)&ret_le;
+
+           /*
+            * Perform actual request. The response pointer may be null in case
+            * the request does not have any response. This must be consistent
+            * with the SSAM_REQUEST_HAS_RESPONSE flag set in the specification
+            * above.
+            */
+           status = ssam_request_sync(ctrl, &rqst, &resp);
+           if (status)
+               return status;
+
+           /*
+            * Alternatively use
+            *
+            *   ssam_request_sync_onstack(ctrl, &rqst, &resp, sizeof(arg_le));
+            *
+            * to perform the request, allocating the message buffer directly
+            * on the stack as opposed to via kzalloc(.
+            */
+
+           /*
+            * Convert request response back to native format. Note that in the
+            * error case, this value is not touched.
+            */
+           *ret = le32_to_cpu(ret_le);
+
+           return status;
+   }
+
+Note that |ssam_request_sync| in its essence is a wrapper over lower-level
+request primitives, which may also be used to perform requests. Refer to its
+implementation and documentation for more details.
+
+An arguably more user-friendly way of defining such functions is by using
+one of the generator macros, for example via:
+
+.. code-block:: c
+
+   SSAM_DEFINE_SYNC_REQUEST_W(__ssam_tmp_perf_mode_set, __le32, {
+           .target_category = SSAM_SSH_TC_TMP,
+           .target_id       = 0x01,
+           .command_id      = 0x03,
+           .instance_id     = 0x00,
+   });
+
+This example defines a function
+
+.. code-block:: c
+
+   int __ssam_tmp_perf_mode_set(struct ssam_controller *ctrl, const __le32 *arg);
+
+executing the specified request, with the controller passed in when calling
+said function. In this example, the argument is provided via the ``arg``
+pointer. Note that the generated function allocates the message buffer on
+the stack. Thus, if the argument provided via the request is large, these
+kinds of macros should be avoided. Also note that, in contrast to the
+previous non-macro example, this function does not do any endianness
+conversion, which has to be handled by the caller. Apart from those
+differences the function generated by the macro is similar to the one
+provided in the non-macro example above.
+
+The full list of such function-generating macros is
+
+- :c:func:`SSAM_DEFINE_SYNC_REQUEST_N` for requests without return value and
+  without argument.
+- :c:func:`SSAM_DEFINE_SYNC_REQUEST_R` for equests with return value but no
+  argument.
+- :c:func:`SSAM_DEFINE_SYNC_REQUEST_W` for requests without return value but
+  with argument.
+
+Refer to their respecitve documentation for more details. For each one of
+these macros, a special variant is provided, which targets request types
+applicable to multiple instances of the same device type:
+
+- :c:func:`SSAM_DEFINE_SYNC_REQUEST_MD_N`
+- :c:func:`SSAM_DEFINE_SYNC_REQUEST_MD_R`
+- :c:func:`SSAM_DEFINE_SYNC_REQUEST_MD_W`
+
+The difference of those macros to the previously mentioned versions is, that
+the device target and instance IDs are not fixed for the generated function,
+but instead have to be provided by the caller of said function.
+
+Additionally, variants for direct use with client devices, i.e.
+|ssam_device|, are also provided. These can, for example, be used as
+follows:
+
+.. code-block:: c
+
+   SSAM_DEFINE_SYNC_REQUEST_CL_R(ssam_bat_get_sta, __le32, {
+           .target_category = SSAM_SSH_TC_BAT,
+           .command_id      = 0x01,
+   });
+
+This invocation of the macro defines a function
+
+.. code-block:: c
+
+   int ssam_bat_get_sta(struct ssam_device *sdev, __le32 *ret);
+
+executing the specified request, using the device IDs and controller given
+in the client device. The full list of such macros for client devices is:
+
+- :c:func:`SSAM_DEFINE_SYNC_REQUEST_CL_N`
+- :c:func:`SSAM_DEFINE_SYNC_REQUEST_CL_R`
+- :c:func:`SSAM_DEFINE_SYNC_REQUEST_CL_W`
+
+
+Handling Events
+===============
+
+To receive events from the SAM EC, an event notifier must be registered for
+the desired event via |ssam_notifier_register|. The notifier must be
+unregistered via |ssam_notifier_unregister| once it is not required any
+more.
+
+Event notifiers are registered by providing (at minimum) a callback to call
+in case an event has been received, the registry specifying how the event
+should be enabled, an event ID specifying for which target category and,
+optionally and depending on the registry used, for which instance ID events
+should be enabled, and finally, flags describing how the EC will send these
+events. Additionally, a priority for the respective notifier may be
+specified, which determines its order in relation to any other notifier
+registered for the same target category.
+
+By default, event notifiers will receive all events for the specific target
+category, regardless of the instance ID specified when registering the
+notifier. The core may be instructed to only call a notifier if the target
+ID or instance ID (or both) of the event match the ones implied by the
+notifier IDs (in case of target ID, the target ID of the registry), by
+providing an event mask (see |ssam_event_mask|).
+
+In general, the target ID of the registry is also the target ID of the
+enabled event (with the notable exception being keyboard input events on the
+Surface Laptop 1 and 2, which are enabled via a registry with target ID 1,
+but provide events with target ID 2).
+
+A full example for registering an event notifier and handling received
+events is provided below:
+
+.. code-block:: c
+
+   u32 notifier_callback(struct ssam_event_notifier *nf,
+                         const struct ssam_event *event)
+   {
+           int status = ...
+
+           /* Handle the event here ... */
+
+           /* Convert return value and indicate that we handled the event. */
+           return ssam_notifier_from_errno(status) | SSAM_NOTIF_HANDLED;
+   }
+
+   int setup_notifier(struct ssam_device *sdev,
+                      struct ssam_event_notifier *nf)
+   {
+           /* Set priority wrt. other handlers of same target category. */
+           nf->base.priority = 1;
+
+           /* Set event/notifier callback. */
+           nf->base.fn = notifier_callback;
+
+           /* Specify event registry, i.e. how events get enabled/disabled. */
+           nf->event.reg = SSAM_EVENT_REGISTRY_KIP;
+
+           /* Specify which event to enable/disable */
+           nf->event.id.target_category = sdev->uid.category;
+           nf->event.id.instance = sdev->uid.instance;
+
+           /*
+            * Specify for which events the notifier callback gets executed.
+            * This essentially tells the core if it can skip notifiers that
+            * don't have target or instance IDs matching those of the event.
+            */
+           nf->event.mask = SSAM_EVENT_MASK_STRICT;
+
+           /* Specify event flags. */
+           nf->event.flags = SSAM_EVENT_SEQUENCED;
+
+           return ssam_notifier_register(sdev->ctrl, nf);
+   }
+
+Multiple event notifiers can be registered for the same event. The event
+handler core takes care of enabling and disabling events when notifiers are
+registered and unregistered, by keeping track of how many notifiers for a
+specific event (combination of registry, event target category, and event
+instance ID) are currently registered. This means that a specific event will
+be enabled when the first notifier for it is being registered and disabled
+when the last notifier for it is being unregistered. Note that the event
+flags are therefore only used on the first registered notifier, however, one
+should take care that notifiers for a specific event are always registered
+with the same flag and it is considered a bug to do otherwise.
diff --git a/Documentation/driver-api/surface_aggregator/clients/index.rst b/Documentation/driver-api/surface_aggregator/clients/index.rst
new file mode 100644
index 000000000000..680fa621dc9f
--- /dev/null
+++ b/Documentation/driver-api/surface_aggregator/clients/index.rst
@@ -0,0 +1,10 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
+Client Driver Documentation
+===========================
+
+This is the documentation for client drivers themselves. Refer to
+:doc:`../client` for documentation on how to write client drivers.
+
+.. Place documentation for individual client drivers here.
diff --git a/Documentation/driver-api/surface_aggregator/index.rst b/Documentation/driver-api/surface_aggregator/index.rst
new file mode 100644
index 000000000000..5eff57c1836d
--- /dev/null
+++ b/Documentation/driver-api/surface_aggregator/index.rst
@@ -0,0 +1,21 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+Surface System Aggregator Module (SSAM)
+=======================================
+
+.. toctree::
+   :maxdepth: 2
+
+   overview
+   ssh
+   client
+   internal
+   clients/index
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/driver-api/surface_aggregator/internal-api.rst b/Documentation/driver-api/surface_aggregator/internal-api.rst
new file mode 100644
index 000000000000..910fa9ec736c
--- /dev/null
+++ b/Documentation/driver-api/surface_aggregator/internal-api.rst
@@ -0,0 +1,67 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+Internal API Documentation
+==========================
+
+.. contents::
+    :depth: 2
+
+
+Packet Transport Layer
+======================
+
+.. kernel-doc:: drivers/misc/surface_aggregator/ssh_parser.h
+    :internal:
+
+.. kernel-doc:: drivers/misc/surface_aggregator/ssh_parser.c
+    :internal:
+
+.. kernel-doc:: drivers/misc/surface_aggregator/ssh_msgb.h
+    :internal:
+
+.. kernel-doc:: drivers/misc/surface_aggregator/ssh_packet_layer.h
+    :internal:
+
+.. kernel-doc:: drivers/misc/surface_aggregator/ssh_packet_layer.c
+    :internal:
+
+
+Request Transport Layer
+=======================
+
+.. kernel-doc:: drivers/misc/surface_aggregator/ssh_request_layer.h
+    :internal:
+
+.. kernel-doc:: drivers/misc/surface_aggregator/ssh_request_layer.c
+    :internal:
+
+
+Controller
+==========
+
+.. kernel-doc:: drivers/misc/surface_aggregator/controller.h
+    :internal:
+
+.. kernel-doc:: drivers/misc/surface_aggregator/controller.c
+    :internal:
+
+
+Client Device Bus
+=================
+
+.. kernel-doc:: drivers/misc/surface_aggregator/bus.c
+    :internal:
+
+
+Core
+====
+
+.. kernel-doc:: drivers/misc/surface_aggregator/core.c
+    :internal:
+
+
+Trace Helpers
+=============
+
+.. kernel-doc:: drivers/misc/surface_aggregator/trace.h
diff --git a/Documentation/driver-api/surface_aggregator/internal.rst b/Documentation/driver-api/surface_aggregator/internal.rst
new file mode 100644
index 000000000000..03cbc28659c3
--- /dev/null
+++ b/Documentation/driver-api/surface_aggregator/internal.rst
@@ -0,0 +1,50 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Core Driver Internals
+=====================
+
+For the API documentation, refer to:
+
+.. toctree::
+   :maxdepth: 2
+
+   internal-api
+
+
+Overview
+========
+
+The SSAM core implementation is structured in layers, somewhat following the
+SSH protocol structure:
+
+Lower-level packet transport is implemented in the *packet transport layer
+(PTL)*, directly building on top of the serial device (serdev)
+infrastructure of the kernel. As the name indicates, this layer deals with
+the packet transport logic and handles things like packet validation, packet
+acknowledgment (ACKing), packet (retransmission) timeouts, and relaying
+packet payloads to higher-level layers.
+
+Above this sits the *request transport layer (RTL)*. This layer is centered
+around command-type packet payloads, i.e. requests (sent from host to EC),
+responses of the EC to those requests, and events (sent from EC to host).
+It, specifically, distinguishes events from request responses, matches
+responses to their corresponding requests, and implements request timeouts.
+
+The *controller* layer is building on top of this and essentially decides
+how request responses and, especially, events are dealt with. It provides an
+event notifier system, handles event activation/deactivation, provides a
+workqueue for event and asynchronous request completion, and also manages
+the message counters required for building command messages (``SEQ``,
+``RQID``). This layer basically provides a fundamental interface to the SAM
+EC for use in other kernel drivers.
+
+While the controller layer already provides an interface for other kernel
+drivers, the client *bus* extends this interface to provide support for
+native SSAM devices, i.e. devices that are not defined in ACPI and not
+implemented as platform devices, via :c:type:`struct ssam_device <ssam_device>`
+and :c:type:`struct ssam_device_driver <ssam_device_driver>`. This aims to
+simplify management of client devices and client drivers.
+
+Refer to :doc:`client` for documentation regarding the client device/driver
+API and interface options for other kernel drivers.
diff --git a/Documentation/driver-api/surface_aggregator/overview.rst b/Documentation/driver-api/surface_aggregator/overview.rst
new file mode 100644
index 000000000000..7b7a6d9e8e22
--- /dev/null
+++ b/Documentation/driver-api/surface_aggregator/overview.rst
@@ -0,0 +1,76 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========
+Overview
+========
+
+The Surface/System Aggregator Module (SAM, SSAM) is an (arguably *the*)
+embedded controller (EC) on Microsoft Surface devices. It has been originally
+introduced on 4th generation devices (Surface Pro 4, Surface Book 1), but
+its responsibilities and feature-set have since been expanded significantly
+with the following generations.
+
+
+Features and Integration
+========================
+
+Not much is currently known about SAM on 4th generation devices (Surface Pro
+4, Surface Book 1), due to the use of a different communication interface
+between host and EC (as detailed below). On 5th (Surface Pro 2017, Surface
+Book 2, Surface Laptop 1) and later generation devices, SAM is responsible
+for providing battery information (both current status and static values,
+such as maximum capacity etc.), as well as an assortment of temperature
+sensors (e.g. skin temperature) and cooling/performance-mode setting to the
+host. On the Surface Book 2, specifically, it additionally provides an
+interface for properly handling clipboard detachment (i.e. separating the
+display part from the keyboard part of the device), on the Surface Laptop 1
+and 2 it is required for keyboard HID input. This HID subsystem has been
+restructured for 7th generation devices and on those, specifically Surface
+Laptop 3 and Surface Book 3, is responsible for all major HID input (i.e.
+keyboard and touchpad).
+
+While the features have not changed much on a coarse level since the 5th
+generation, internal interfaces have undergone some rather large changes. On
+5th and 6th generation devices, both battery and temperature information is
+exposed to ACPI via a shim driver (referred to as Surface ACPI Notify, or
+SAN), translating ACPI generic serial bus write-/read-accesses to SAM
+requests. On 7th generation devices, this additional layer is gone and these
+devices require a driver hooking directly into the SAM interface. Equally,
+on newer generations, less devices are declared in ACPI, making them a bit
+harder to discover and requiring us to hard-code a sort of device registry.
+Due to this, a SSAM bus and subsystem with client devices
+(:c:type:`struct ssam_device <ssam_device>`) has been implemented.
+
+
+Communication
+=============
+
+The type of communication interface between host and EC depends on the
+generation of the Surface device. On 4th generation devices, host and EC
+communicate via HID, specifically using a HID-over-I2C device, whereas on
+5th and later generations, communication takes place via a USART serial
+device. In accordance to the drivers found on other operating systems, we
+refer to the serial device and its driver as Surface Serial Hub (SSH) and
+when needed to differentiate between both types of SAM as SAM-over-SSH, in
+contrast to SAM-over-HID for the former variant.
+
+Currently, this subsystem only supports SAM-over-SSH. The SSH communication
+interface is described in more detail below. The HID interface has not been
+reverse engineered yet and it is, at the moment, unclear how many (and
+which) concepts of the SSH interface detailed below can be transferred to
+it.
+
+Surface Serial Hub
+------------------
+
+As already elaborated above, the Surface Serial Hub (SSH) is the
+communication interface for SAM on 5th- and all later-generation Surface
+devices. On the highest level, communication can be sparated into two main
+types: Requests, messages sent from host to EC that may trigger a direct
+response from the EC (explicitly associated with the request), and events
+(sometimes also referred to as notifications), sent from EC to host without
+being a direct response to a previous request. We may also refer to requests
+without response as commands. In general, events need to be enabled via one
+of multiple dedicated commands before they are sent by the EC.
+
+See :doc:`ssh` for a more technical protocol documentation.
diff --git a/Documentation/driver-api/surface_aggregator/ssh.rst b/Documentation/driver-api/surface_aggregator/ssh.rst
new file mode 100644
index 000000000000..8ffa93c92b94
--- /dev/null
+++ b/Documentation/driver-api/surface_aggregator/ssh.rst
@@ -0,0 +1,343 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. |u8| replace:: :c:type:`u8 <u8>`
+.. |u16| replace:: :c:type:`u16 <u16>`
+.. |TYPE| replace:: ``TYPE``
+.. |LEN| replace:: ``LEN``
+.. |SEQ| replace:: ``SEQ``
+.. |SYN| replace:: ``SYN``
+.. |NAK| replace:: ``NAK``
+.. |ACK| replace:: ``ACK``
+.. |DATA| replace:: ``DATA``
+.. |DATA_SEQ| replace:: ``DATA_SEQ``
+.. |DATA_NSQ| replace:: ``DATA_NSQ``
+.. |TC| replace:: ``TC``
+.. |TID| replace:: ``TID``
+.. |IID| replace:: ``IID``
+.. |RQID| replace:: ``RQID``
+.. |CID| replace:: ``CID``
+
+===========================
+Surface Serial Hub Protocol
+===========================
+
+The Surface Serial Hub (SSH) is the central communication interface for the
+embedded Surface Aggregator Module controller (SAM or EC) on newer Surface
+generations. We will refer to this protocol and interface as SAM-over-SSH,
+as opposed to SAM-over-HID for the older generations.
+
+On Surface devices with SAM-over-SSH, SAM is connected to the host via UART
+and defined in ACPI as device with ID ``MSHW0084``. On these devices,
+significant functionality is provided via SAM, including access to battery
+and power information and events, thermal read-outs and events, and many
+more. For Surface Laptops, keyboard input is handled via HID directed
+through SAM, on the Surface Laptop 3 and Surface Book 3 this also includes
+touchpad input.
+
+Note that the standard disclaimer for this subsystem also applies to this
+document: All of this has been reverse-engineered and may thus be erroneous
+and/or incomplete.
+
+All CRCs used in the following are two-byte ``crc_ccitt_false(0xffff, ...)``.
+All multi-byte values are little-endian, there is no implicit padding between
+values.
+
+
+SSH Packet Protocol: Definitions
+================================
+
+The fundamental communication unit of the SSH protocol is a frame
+(:c:type:`struct ssh_frame <ssh_frame>`). A frame consists of the following
+fields, packed together and in order:
+
+.. flat-table:: SSH Frame
+   :widths: 1 1 4
+   :header-rows: 1
+
+   * - Field
+     - Type
+     - Description
+
+   * - |TYPE|
+     - |u8|
+     - Type identifier of the frame.
+
+   * - |LEN|
+     - |u16|
+     - Length of the payload associated with the frame.
+
+   * - |SEQ|
+     - |u8|
+     - Sequence ID (see explanation below).
+
+Each frame structure is followed by a CRC over this structure. The CRC over
+the frame structure (|TYPE|, |LEN|, and |SEQ| fields) is placed directly
+after the frame structure and before the payload. The payload is followed by
+its own CRC (over all payload bytes). If the payload is not present (i.e.
+the frame has ``LEN=0``), the CRC of the payload is still present and will
+evaluate to ``0xffff``. The |LEN| field does not include any of the CRCs, it
+equals the number of bytes inbetween the CRC of the frame and the CRC of the
+payload.
+
+Additionally, the following fixed two-byte sequences are used:
+
+.. flat-table:: SSH Byte Sequences
+   :widths: 1 1 4
+   :header-rows: 1
+
+   * - Name
+     - Value
+     - Description
+
+   * - |SYN|
+     - ``[0xAA, 0x55]``
+     - Synchronization bytes.
+
+A message consists of |SYN|, followed by the frame (|TYPE|, |LEN|, |SEQ| and
+CRC) and, if specified in the frame (i.e. ``LEN > 0``), payload bytes,
+followed finally, regardless if the payload is present, the payload CRC. The
+messages corresponding to an exchange are, in part, identified by having the
+same sequence ID (|SEQ|), stored inside the frame (more on this in the next
+section). The sequence ID is a wrapping counter.
+
+A frame can have the following types
+(:c:type:`enum ssh_frame_type <ssh_frame_type>`):
+
+.. flat-table:: SSH Frame Types
+   :widths: 1 1 4
+   :header-rows: 1
+
+   * - Name
+     - Value
+     - Short Description
+
+   * - |NAK|
+     - ``0x04``
+     - Sent on error in previously received message.
+
+   * - |ACK|
+     - ``0x40``
+     - Sent to acknowledge receival of |DATA| frame.
+
+   * - |DATA_SEQ|
+     - ``0x80``
+     - Sent to transfer data. Sequenced.
+
+   * - |DATA_NSQ|
+     - ``0x00``
+     - Same as |DATA_SEQ|, but does not need to be ACKed.
+
+Both |NAK|- and |ACK|-type frames are used to control flow of messages and
+thus do not carry a payload. |DATA_SEQ|- and |DATA_NSQ|-type frames on the
+other hand must carry a payload. The flow sequence and interaction of
+different frame types will be described in more depth in the next section.
+
+
+SSH Packet Protocol: Flow Sequence
+==================================
+
+Each exchange begins with |SYN|, followed by a |DATA_SEQ|- or
+|DATA_NSQ|-type frame, followed by its CRC, payload, and payload CRC. In
+case of a |DATA_NSQ|-type frame, the exchange is then finished. In case of a
+|DATA_SEQ|-type frame, the receiving party has to acknowledge receival of
+the frame by responding with a message containing an |ACK|-type frame with
+the same sequence ID of the |DATA| frame. In other words, the sequence ID of
+the |ACK| frame specifies the |DATA| frame to be acknowledged. In case of an
+error, e.g. an invalid CRC, the receiving party responds with a message
+containing an |NAK|-type frame. As the sequence ID of the previous data
+frame, for which an error is indicated via the |NAK| frame, cannot be relied
+upon, the sequence ID of the |NAK| frame should not be used and is set to
+zero. After receival of an |NAK| frame, the sending party should re-send all
+outstanding (non-ACKed) messages.
+
+Sequence IDs are not synchronized between the two parties, meaning that they
+are managed independently for each party. Identifying the messages
+corresponding to a single exchange thus relies on the sequence ID as well as
+the type of the message, and the context. Specifically, the sequence ID is
+used to associate an ``ACK`` with its ``DATA_SEQ``-type frame, but not
+``DATA_SEQ``- or ``DATA_NSQ``-type frames with other ``DATA``- type frames.
+
+An example exchange might look like this:
+
+::
+
+    tx: -- SYN FRAME(D) CRC(F) PAYLOAD CRC(P) -----------------------------
+    rx: ------------------------------------- SYN FRAME(A) CRC(F) CRC(P) --
+
+where both frames have the same sequence ID (``SEQ``). Here, ``FRAME(D)``
+indicates a |DATA_SEQ|-type frame, ``FRAME(A)`` an ``ACK``-type frame,
+``CRC(F)`` the CRC over the previous frame, ``CRC(P)`` the CRC over the
+previous payload. In case of an error, the exchange would look like this:
+
+::
+
+    tx: -- SYN FRAME(D) CRC(F) PAYLOAD CRC(P) -----------------------------
+    rx: ------------------------------------- SYN FRAME(N) CRC(F) CRC(P) --
+
+upon which the sender should re-send the message. ``FRAME(N)`` indicates an
+|NAK|-type frame. Note that the sequence ID of the |NAK|-type frame is fixed
+to zero. For |DATA_NSQ|-type frames, both exchanges are the same:
+
+::
+
+    tx: -- SYN FRAME(DATA_NSQ) CRC(F) PAYLOAD CRC(P) ----------------------
+    rx: -------------------------------------------------------------------
+
+Here, an error can be detected, but not corrected or indicated to the
+sending party. These exchanges are symmetric, i.e. switching rx and tx
+results again in a valid exchange. Currently, no longer exchanges are known.
+
+
+Commands: Requests, Responses, and Events
+=========================================
+
+Commands are sent as payload inside a data frame. Currently, this is the
+only known payload type of |DATA| frames, with a payload-type value of
+``0x80`` (:c:type:`SSH_PLD_TYPE_CMD <ssh_payload_type>`).
+
+The command-type payload (:c:type:`struct ssh_command <ssh_command>`)
+consists of an eight-byte command structure, followed by optional and
+variable length command data. The length of this optional data is derived
+from the frame payload length given in the corresponding frame, i.e. it is
+``frame.len - sizeof(struct ssh_command)``. The command struct contains the
+following fields, packed together and in order:
+
+.. flat-table:: SSH Command
+   :widths: 1 1 4
+   :header-rows: 1
+
+   * - Field
+     - Type
+     - Description
+
+   * - |TYPE|
+     - |u8|
+     - Type of the payload. For commands always ``0x80``.
+
+   * - |TC|
+     - |u8|
+     - Target category.
+
+   * - |TID| (out)
+     - |u8|
+     - Target ID for outgoing (host to EC) commands.
+
+   * - |TID| (in)
+     - |u8|
+     - Target ID for incoming (EC to host) commands.
+
+   * - |IID|
+     - |u8|
+     - Instance ID.
+
+   * - |RQID|
+     - |u16|
+     - Request ID.
+
+   * - |CID|
+     - |u8|
+     - Command ID.
+
+The command struct and data, in general, does not contain any failure
+detection mechanism (e.g. CRCs), this is solely done on the frame level.
+
+Command-type payloads are used by the host to send commands and requests to
+the EC as well as by the EC to send responses and events back to the host.
+We differentiate between requests (sent by the host), responses (sent by the
+EC in response to a request), and events (sent by the EC without a preceding
+request).
+
+Commands and events are uniquely identified by their target category
+(``TC``) and command ID (``CID``). The target category specifies a general
+category for the command (e.g. system in general, vs. battery and ac, vs.
+temperature, and so on), while the command ID specifies the command inside
+that category. Only the combination of |TC| + |CID| is unique. Additionally,
+commands have an instance ID (``IID``), which is used to differentiate
+between different sub-devices. For example ``TC=3`` ``CID=1`` is a
+request to get the temperature on a thermal sensor, where |IID| specifies
+the respective sensor. If the instance ID is not used, it should be set to
+zero. If instance IDs are used, they, in general, start with a value of one,
+whereas zero may be used for instance independent queries, if applicable. A
+response to a request should have the same target category, command ID, and
+instance ID as the corresponding request.
+
+Responses are matched to their corresponding request via the request ID
+(``RQID``) field. This is a 16 bit wrapping counter similar to the sequence
+ID on the frames. Note that the sequence ID of the frames for a
+request-response pair does not match. Only the request ID has to match.
+Frame-protocol wise these are two separate exchanges, and may even be
+separated, e.g. by an event being sent after the request but before the
+response. Not all commands produce a response, and this is not detectable by
+|TC| + |CID|. It is the responsibility of the issuing party to wait for a
+response (or signal this to the communication framework, as is done in
+SAN/ACPI via the ``SNC`` flag).
+
+Events are identified by unique and reserved request IDs. These IDs should
+not be used by the host when sending a new request. They are used on the
+host to, first, detect events and, second, match them with a registered
+event handler. Request IDs for events are chosen by the host and directed to
+the EC when setting up and enabling an event source (via the
+enable-event-source request). The EC then uses the specified request ID for
+events sent from the respective source. Note that an event should still be
+identified by its target category, command ID, and, if applicable, instance
+ID, as a single event source can send multiple different event types. In
+general, however, a single target category should map to a single reserved
+event request ID.
+
+Furthermore, requests, responses, and events have an associated target ID
+(``TID``). This target ID is split into output (host to EC) and input (EC to
+host) fields, with the respecting other field (e.g. output field on incoming
+messages) set to zero. Two ``TID`` values are known: Primary (``0x01``) and
+secondary (``0x02``). In general, the response to a request should have the
+same ``TID`` value, however, the field (output vs. input) should be used in
+accordance to the direction in which the response is sent (i.e. on the input
+field, as responses are generally sent from the EC to the host).
+
+Note that, even though requests and events should be uniquely identifiable
+by target category and command ID alone, the EC may require specific
+priority and instance ID values to accept a command. A command that is
+accepted for ``TID=1``, for example, may not be accepted for ``TID=2``
+and vice versa.
+
+
+Limitations and Observations
+============================
+
+The protocol can, in theory, handle up to ``U8_MAX`` frames in parallel,
+with up to ``U16_MAX`` pending requests (neglecting request IDs reserved for
+events). In practice, however, this is more limited. From our testing
+(although via a python and thus a user-space program), it seems that the EC
+can handle up to four requests (mostly) reliably in parallel at a certain
+time. With five or more requests in parallel, consistent discarding of
+commands (ACKed frame but no command response) has been observed. For five
+simultaneous commands, this reproducibly resulted in one command being
+dropped and four commands being handled.
+
+However, it has also been noted that, even with three requests in parallel,
+occasional frame drops happen. Apart from this, with a limit of three
+pending requests, no dropped commands (i.e. command being dropped but frame
+carrying command being ACKed) have been observed. In any case, frames (and
+possibly also commands) should be re-sent by the host if a certain timeout
+is exceeded. This is done by the EC for frames with a timeout of one second,
+up to two re-tries (i.e. three transmissions in total). The limit of
+re-tries also applies to received NAKs, and, in a worst case scenario, can
+lead to entire messages being dropped.
+
+While this also seems to work fine for pending data frames as long as no
+transmission failures occur, implementation and handling of these seems to
+depend on the assumption that there is only one non-acknowledged data frame.
+In particular, the detection of repeated frames relies on the last sequence
+number. This means that, if a frame that has been successfully received by
+the EC is sent again, e.g. due to the host not receiving an |ACK|, the EC
+will only detect this if it has the sequence ID of the last frame received
+by the EC. As an example: Sending two frames with ``SEQ=0`` and ``SEQ=1``
+followed by a repetition of ``SEQ=0`` will not detect the second ``SEQ=0``
+frame as such, and thus execute the command in this frame each time it has
+been received, i.e. twice in this example. Sending ``SEQ=0``, ``SEQ=1`` and
+then repeating ``SEQ=1`` will detect the second ``SEQ=1`` as repetition of
+the first one and ignore it, thus executing the contained command only once.
+
+In conclusion, this suggests a limit of at most one pending un-ACKed frame
+(per party, effectively leading to synchronous communication regarding
+frames) and at most three pending commands. The limit to synchronous frame
+transfers seems to be consistent with behavior observed on Windows.
diff --git a/MAINTAINERS b/MAINTAINERS
index fd22bec9a67d..74122a2a792d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11568,6 +11568,7 @@ M:	Maximilian Luz <luzmaximilian@gmail.com>
 S:	Maintained
 W:	https://github.com/linux-surface/surface-aggregator-module
 C:	irc://chat.freenode.net/##linux-surface
+F:	Documentation/driver-api/surface_aggregator/
 F:	drivers/misc/surface_aggregator/
 F:	include/linux/surface_aggregator/
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module
  2020-09-23 15:15 [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module Maximilian Luz
                   ` (4 preceding siblings ...)
  2020-09-23 15:15 ` [RFC PATCH 7/9] docs: driver-api: Add Surface Aggregator subsystem documentation Maximilian Luz
@ 2020-09-23 15:30 ` Arnd Bergmann
  2020-09-23 15:43   ` Maximilian Luz
  2020-09-24  8:30   ` Andy Shevchenko
       [not found] ` <20200923151511.3842150-2-luzmaximilian@gmail.com>
  6 siblings, 2 replies; 27+ messages in thread
From: Arnd Bergmann @ 2020-09-23 15:30 UTC (permalink / raw)
  To: Maximilian Luz
  Cc: linux-kernel, linux-serial, ACPI Devel Maling List,
	Greg Kroah-Hartman, Rob Herring, Jiri Slaby, Rafael J. Wysocki,
	Len Brown, Blaž Hrastnik, Dorian Stoll

On Wed, Sep 23, 2020 at 5:15 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>
> Hello,
>
> The Surface System Aggregator Module (we'll refer to it as Surface
> Aggregator or SAM below) is an embedded controller (EC) found on various
> Microsoft Surface devices. Specifically, all 4th and later generation
> Surface devices, i.e. Surface Pro 4, Surface Book 1 and later, with the
> exception of the Surface Go series and the Surface Duo. Notably, it
> seems like this EC can also be found on the ARM-based Surface Pro X [1].

I think this should go to drivers/platform/x86 or drivers/platform/surface/
along with other laptop vendor specific code rather than drivers/misc/.

I'll have a look at the code myself, but I'd prefer to have the maintainers
for the other laptop drivers review this properly.

       Arnd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module
  2020-09-23 15:30 ` [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module Arnd Bergmann
@ 2020-09-23 15:43   ` Maximilian Luz
  2020-09-23 19:43     ` Arnd Bergmann
  2020-09-24  8:30   ` Andy Shevchenko
  1 sibling, 1 reply; 27+ messages in thread
From: Maximilian Luz @ 2020-09-23 15:43 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel, linux-serial, ACPI Devel Maling List,
	Greg Kroah-Hartman, Rob Herring, Jiri Slaby, Rafael J. Wysocki,
	Len Brown, Blaž Hrastnik, Dorian Stoll

On 9/23/20 5:30 PM, Arnd Bergmann wrote:
> On Wed, Sep 23, 2020 at 5:15 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>
>> Hello,
>>
>> The Surface System Aggregator Module (we'll refer to it as Surface
>> Aggregator or SAM below) is an embedded controller (EC) found on various
>> Microsoft Surface devices. Specifically, all 4th and later generation
>> Surface devices, i.e. Surface Pro 4, Surface Book 1 and later, with the
>> exception of the Surface Go series and the Surface Duo. Notably, it
>> seems like this EC can also be found on the ARM-based Surface Pro X [1].
> 
> I think this should go to drivers/platform/x86 or drivers/platform/surface/
> along with other laptop vendor specific code rather than drivers/misc/.

I initially had this under drivers/platform/x86. There are two main
reasons I changed that: First, I think it's a bit too big for
platform/x86 given that it basically introduces a new subsystem. At this
point it's really less of "a couple of odd devices here and there" and
more of a bus-type thing. Second, with the possibility of future support
for ARM devices (Pro X, Pro X 2 which is rumored to come out soon), I
thought that platform/x86 would not be a good fit.

I'd be happy to move this to platform/surface though, if that's
considered a better fit and you're okay with me adding that. Would make
sense given that there's already a platform/chrome, which, as far as I
can tell, also seems to be mainly focused on EC support.

> I'll have a look at the code myself, but I'd prefer to have the maintainers
> for the other laptop drivers review this properly.

Thanks! I'll CC them for the next version.

Regards,
Max

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/9] misc: Add Surface Aggregator subsystem
       [not found] ` <20200923151511.3842150-2-luzmaximilian@gmail.com>
@ 2020-09-23 16:57   ` Greg Kroah-Hartman
  2020-09-23 20:34     ` Maximilian Luz
  0 siblings, 1 reply; 27+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-23 16:57 UTC (permalink / raw)
  To: Maximilian Luz
  Cc: linux-kernel, linux-serial, Arnd Bergmann, Rob Herring,
	Jiri Slaby, Blaž Hrastnik, Dorian Stoll

On Wed, Sep 23, 2020 at 05:15:03PM +0200, Maximilian Luz wrote:
> +/* -- Safe counters. -------------------------------------------------------- */
> +
> +/**
> + * ssh_seq_reset() - Reset/initialize sequence ID counter.
> + * @c: The counter to reset.
> + */
> +static void ssh_seq_reset(struct ssh_seq_counter *c)
> +{
> +	WRITE_ONCE(c->value, 0);
> +}

These "counters" are odd, what exactly are they?

They seem like a simple atomic counter, but not quite, so you have
rolled your own pseudo-atomic variable.  Are you sure that it works
properly?  If so, how?

What about just using an ida/idr structure instead?  Or just a simple
atomic counter that avoids the values you can't touch, or better yet, a
simple number with a correct lock protecting it :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 5/9] surface_aggregator: Add error injection capabilities
  2020-09-23 15:15 ` [RFC PATCH 5/9] surface_aggregator: Add error injection capabilities Maximilian Luz
@ 2020-09-23 17:45   ` Greg Kroah-Hartman
  2020-09-23 21:28     ` Maximilian Luz
  0 siblings, 1 reply; 27+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-23 17:45 UTC (permalink / raw)
  To: Maximilian Luz
  Cc: linux-kernel, linux-serial, Arnd Bergmann, Rob Herring,
	Jiri Slaby, Blaž Hrastnik, Dorian Stoll

On Wed, Sep 23, 2020 at 05:15:07PM +0200, Maximilian Luz wrote:
> This commit adds error injection hooks to the Surface Serial Hub
> communication protocol implementation, to:
> 
>  - simulate simple serial transmission errors,
> 
>  - drop packets, requests, and responses, simulating communication
>    failures and potentially trigger retransmission timeouts, as well as
> 
>  - inject invalid data into submitted and received packets.
> 
> Together with the trace points introduced in the previous commit, these
> facilities are intended to aid in testing, validation, and debugging of
> the Surface Aggregator communication layer.
> 
> Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>

Ok, this is ridiculous.

You are dropping a whole new subsystem on us, with full documentation,
correct driver model integration, crazy debugfs interactions (I made fun
of the patch, but the code did work, you just did more work than was
needed), proper auto-loading of modules, tracing, documentation for more
things than is ever expected, and now you are adding error injection
support?

You just made all other code submissions of new subsystems I have gotten
in the past 2 months look like total crud.  Which, to be fair, they
probably were, but wow, you just stepped up the level of professionalism
to a whole new height.

I can only dream that "real Linux companies" take note and try to follow
this example.  I think I will point them all at this in the future and
say, "go do it like this one."

very very very nice work, we owe you the beverage of your choice.

greg k-h

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module
  2020-09-23 15:43   ` Maximilian Luz
@ 2020-09-23 19:43     ` Arnd Bergmann
  2020-09-23 23:28       ` Maximilian Luz
  0 siblings, 1 reply; 27+ messages in thread
From: Arnd Bergmann @ 2020-09-23 19:43 UTC (permalink / raw)
  To: Maximilian Luz
  Cc: linux-kernel, linux-serial, ACPI Devel Maling List,
	Greg Kroah-Hartman, Rob Herring, Jiri Slaby, Rafael J. Wysocki,
	Len Brown, Blaž Hrastnik, Dorian Stoll

On Wed, Sep 23, 2020 at 5:43 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>
> On 9/23/20 5:30 PM, Arnd Bergmann wrote:
> > On Wed, Sep 23, 2020 at 5:15 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> The Surface System Aggregator Module (we'll refer to it as Surface
> >> Aggregator or SAM below) is an embedded controller (EC) found on various
> >> Microsoft Surface devices. Specifically, all 4th and later generation
> >> Surface devices, i.e. Surface Pro 4, Surface Book 1 and later, with the
> >> exception of the Surface Go series and the Surface Duo. Notably, it
> >> seems like this EC can also be found on the ARM-based Surface Pro X [1].
> >
> > I think this should go to drivers/platform/x86 or drivers/platform/surface/
> > along with other laptop vendor specific code rather than drivers/misc/.
>
> I initially had this under drivers/platform/x86. There are two main
> reasons I changed that: First, I think it's a bit too big for
> platform/x86 given that it basically introduces a new subsystem. At this
> point it's really less of "a couple of odd devices here and there" and
> more of a bus-type thing. Second, with the possibility of future support
> for ARM devices (Pro X, Pro X 2 which is rumored to come out soon), I
> thought that platform/x86 would not be a good fit.

I don't see that as a strong reason against it. As you write yourself, the
driver won't work on the arm machines without major changes anyway,
and even if it does, it fits much better with the rest of it.

If you are worried about the size of the directory,
drivers/platform/x86/surface/
would also work.

> I'd be happy to move this to platform/surface though, if that's
> considered a better fit and you're okay with me adding that. Would make
> sense given that there's already a platform/chrome, which, as far as I
> can tell, also seems to be mainly focused on EC support.

Yes, I think the main question is how much overlap you see functionally
between this driver and the others in drivers/platform/x86.

      Arnd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 4/9] surface_aggregator: Add trace points
  2020-09-23 15:15 ` [RFC PATCH 4/9] surface_aggregator: Add trace points Maximilian Luz
@ 2020-09-23 20:07   ` Steven Rostedt
  2020-09-23 23:43     ` Maximilian Luz
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2020-09-23 20:07 UTC (permalink / raw)
  To: Maximilian Luz
  Cc: linux-kernel, linux-serial, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Jiri Slaby, Blaž Hrastnik, Dorian Stoll

On Wed, 23 Sep 2020 17:15:06 +0200
Maximilian Luz <luzmaximilian@gmail.com> wrote:

> Add trace points to the Surface Aggregator subsystem core. These trace
> points can be used to track packets, requests, and allocations. They are
> further intended for debugging and testing/validation, specifically in
> combination with the error injection capabilities introduced in the
> subsequent commit.

I'm impressed! This uses some of the advanced features of the tracing
infrastructure. But I still have some comments to make about the layout
of the TP_STRUCT__entry() fields.

> 

> diff --git a/drivers/misc/surface_aggregator/trace.h b/drivers/misc/surface_aggregator/trace.h
> new file mode 100644
> index 000000000000..eb2e3e1457de
> --- /dev/null
> +++ b/drivers/misc/surface_aggregator/trace.h
> @@ -0,0 +1,612 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM surface_aggregator
> +
> +#if !defined(_SURFACE_AGGREGATOR_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _SURFACE_AGGREGATOR_TRACE_H
> +
> +#include <linux/surface_aggregator/serial_hub.h>
> +
> +#include <asm/unaligned.h>
> +#include <linux/tracepoint.h>
> +
> +
> +TRACE_DEFINE_ENUM(SSH_FRAME_TYPE_DATA_SEQ);
> +TRACE_DEFINE_ENUM(SSH_FRAME_TYPE_DATA_NSQ);
> +TRACE_DEFINE_ENUM(SSH_FRAME_TYPE_ACK);
> +TRACE_DEFINE_ENUM(SSH_FRAME_TYPE_NAK);

Kudos on using TRACE_DEFINE_ENUM :-)


> +
> +#define ssam_show_generic_u8_field(value)				\
> +	__print_symbolic(value,						\
> +		{ SSAM_U8_FIELD_NOT_APPLICABLE,		"N/A" }		\
> +	)
> +
> +
> +#define ssam_show_frame_type(ty)					\
> +	__print_symbolic(ty,						\
> +		{ SSH_FRAME_TYPE_DATA_SEQ,		"DSEQ" },	\
> +		{ SSH_FRAME_TYPE_DATA_NSQ,		"DNSQ" },	\
> +		{ SSH_FRAME_TYPE_ACK,			"ACK"  },	\
> +		{ SSH_FRAME_TYPE_NAK,			"NAK"  }	\
> +	)
> +
> +#define ssam_show_packet_type(type)					\
> +	__print_flags(flags & SSH_PACKET_FLAGS_TY_MASK, "",		\
> +		{ BIT(SSH_PACKET_TY_FLUSH_BIT),		"F" },		\
> +		{ BIT(SSH_PACKET_TY_SEQUENCED_BIT),	"S" },		\
> +		{ BIT(SSH_PACKET_TY_BLOCKING_BIT),	"B" }		\

More kudos on proper usage of __print_symbolic() and __print_flags() :-)



> +DECLARE_EVENT_CLASS(ssam_packet_class,
> +	TP_PROTO(const struct ssh_packet *packet),
> +
> +	TP_ARGS(packet),
> +
> +	TP_STRUCT__entry(
> +		__array(char, uid, SSAM_PTR_UID_LEN)
> +		__field(u8, priority)
> +		__field(u16, length)
> +		__field(unsigned long, state)
> +		__field(u16, seq)


Order matters above to keep the events as compact as possible. The more
compact they are, the more events you can store without loss.

Now with SSAM_PTR_UID_LEN = 9, the above is (on a 64 bit system);

	9 bytes;
	1 byte;
	2 bytes;
	8 bytes;
	2 bytes;

The ftrace ring buffer is 4 byte aligned. As words and long words are
also 4 byte aligned, there's not much different to change. But it is
possible that the compiler might add 4 byte padding between the long
word "length" and "priority". Note, these are not packed structures.

Testing this out with the following code:

 $ cat << EOF > test.c
struct test {
	unsigned char array[9];
	unsigned char priority;
	unsigned short length;
	unsigned long state;
	unsigned short seq;
};

static struct test x;

void receive_x(struct test *p)
{
	p = &x;
}
EOF

 $ gcc -g -c -o test.o test.c
 $ pahole test.o
struct test {
	unsigned char              array[9];             /*     0     9 */
	unsigned char              priority;             /*     9     1 */
	short unsigned int         length;               /*    10     2 */

	/* XXX 4 bytes hole, try to pack */

	long unsigned int          state;                /*    16     8 */
	short unsigned int         seq;                  /*    24     2 */

	/* size: 32, cachelines: 1, members: 5 */
	/* sum members: 22, holes: 1, sum holes: 4 */
	/* padding: 6 */
	/* last cacheline: 32 bytes */
};

You do see a hole between length and state. Now if we were to move this
around a little.

 $ cat <<EOF > test2.c
struct test {
	unsigned long state;
	unsigned char array[9];
	unsigned char priority;
	unsigned short length;
	unsigned short seq;
};

static struct test x;

void receive_x(struct test *p)
{
	p = &x;
}
EOF

 $ gcc -g -c -o test2 test2.c
 $ pahole test2.o
struct test {
	long unsigned int          state;                /*     0     8 */
	unsigned char              array[9];             /*     8     9 */
	unsigned char              priority;             /*    17     1 */
	short unsigned int         length;               /*    18     2 */
	short unsigned int         seq;                  /*    20     2 */

	/* size: 24, cachelines: 1, members: 5 */
	/* padding: 2 */
	/* last cacheline: 24 bytes */
};


We get a more compact structure with:

	TP_STRUCT__entry(
		__field(unsigned long, state)
		__array(char, uid, SSAM_PTR_UID_LEN)
		__field(u8, priority)
		__field(u16, length)
		__field(u16, seq)
	),


Note, you can find pahole here:

   https://git.kernel.org/pub/scm/devel/pahole/pahole.git


> +	),


> +
> +
> +DECLARE_EVENT_CLASS(ssam_packet_status_class,
> +	TP_PROTO(const struct ssh_packet *packet, int status),
> +
> +	TP_ARGS(packet, status),
> +
> +	TP_STRUCT__entry(
> +		__array(char, uid, SSAM_PTR_UID_LEN)
> +		__field(u8, priority)
> +		__field(u16, length)
> +		__field(unsigned long, state)
> +		__field(u16, seq)
> +		__field(int, status)

The above can be better compacted with:

	TP_STRUCT__entry(
		__field(unsigned long, state)
		__field(int, status)
		__array(char, uid, SSAM_PTR_UID_LEN)
		__field(u8, priority)
		__field(u16, length)
		__field(u16, seq)


> +	),
> +
> +	TP_fast_assign(
> +		ssam_trace_ptr_uid(packet, __entry->uid);
> +		__entry->priority = READ_ONCE(packet->priority);
> +		__entry->length = packet->data.len;
> +		__entry->state = READ_ONCE(packet->state);
> +		__entry->seq = ssam_trace_get_packet_seq(packet);
> +		__entry->status = status;
> +	),
> +
> +	TP_printk("uid=%s, seq=%s, ty=%s, pri=0x%02x, len=%u, sta=%s, status=%d",
> +		__entry->uid,
> +		ssam_show_packet_seq(__entry->seq),
> +		ssam_show_packet_type(__entry->state),
> +		__entry->priority,
> +		__entry->length,
> +		ssam_show_packet_state(__entry->state),
> +		__entry->status
> +	)


> +DECLARE_EVENT_CLASS(ssam_request_class,
> +	TP_PROTO(const struct ssh_request *request),
> +
> +	TP_ARGS(request),
> +
> +	TP_STRUCT__entry(
> +		__array(char, uid, SSAM_PTR_UID_LEN)
> +		__field(unsigned long, state)
> +		__field(u32, rqid)
> +		__field(u8, tc)
> +		__field(u16, cid)
> +		__field(u16, iid)

The above should be:

	TP_STRUCT__entry(
		__field(unsigned long, state)
		__field(u32, rqid)
		__array(char, uid, SSAM_PTR_UID_LEN)
		__field(u8, tc)
		__field(u16, cid)
		__field(u16, iid)


> +	),
> +
> +	TP_fast_assign(
> +		const struct ssh_packet *p = &request->packet;
> +
> +		// use packet for UID so we can match requests to packets
> +		ssam_trace_ptr_uid(p, __entry->uid);
> +		__entry->state = READ_ONCE(request->state);
> +		__entry->rqid = ssam_trace_get_request_id(p);
> +		__entry->tc = ssam_trace_get_request_tc(p);
> +		__entry->cid = ssam_trace_get_command_field_u8(p, cid);
> +		__entry->iid = ssam_trace_get_command_field_u8(p, iid);
> +	),
> +


> +DECLARE_EVENT_CLASS(ssam_request_status_class,
> +	TP_PROTO(const struct ssh_request *request, int status),
> +
> +	TP_ARGS(request, status),
> +
> +	TP_STRUCT__entry(
> +		__array(char, uid, SSAM_PTR_UID_LEN)
> +		__field(unsigned long, state)
> +		__field(u32, rqid)
> +		__field(u8, tc)
> +		__field(u16, cid)
> +		__field(u16, iid)
> +		__field(int, status)
> +	),

The above should be:

	TP_STRUCT__entry(
		__field(unsigned long, state)
		__field(u32, rqid)
		__field(int, status)
		__array(char, uid, SSAM_PTR_UID_LEN)
		__field(u8, tc)
		__field(u16, cid)
		__field(u16, iid)
	),

> +
> +	TP_fast_assign(
> +		const struct ssh_packet *p = &request->packet;
> +
> +		// use packet for UID so we can match requests to packets
> +		ssam_trace_ptr_uid(p, __entry->uid);
> +		__entry->state = READ_ONCE(request->state);
> +		__entry->rqid = ssam_trace_get_request_id(p);
> +		__entry->tc = ssam_trace_get_request_tc(p);
> +		__entry->cid = ssam_trace_get_command_field_u8(p, cid);
> +		__entry->iid = ssam_trace_get_command_field_u8(p, iid);
> +		__entry->status = status;
> +	),


> +DECLARE_EVENT_CLASS(ssam_alloc_class,
> +	TP_PROTO(void *ptr, size_t len),
> +
> +	TP_ARGS(ptr, len),
> +
> +	TP_STRUCT__entry(
> +		__array(char, uid, SSAM_PTR_UID_LEN)
> +		__field(size_t, len)
> +	),
> +
> +	TP_fast_assign(
> +		ssam_trace_ptr_uid(ptr, __entry->uid);
> +		__entry->len = len;
> +	),
> +
> +	TP_printk("uid=%s, len=%zu", __entry->uid, __entry->len)
> +);
> +
> +#define DEFINE_SSAM_ALLOC_EVENT(name)					\
> +	DEFINE_EVENT(ssam_alloc_class, ssam_##name,			\
> +		TP_PROTO(void *ptr, size_t len),			\
> +		TP_ARGS(ptr, len)					\
> +	)
> +
> +
> +DECLARE_EVENT_CLASS(ssam_free_class,
> +	TP_PROTO(void *ptr),
> +
> +	TP_ARGS(ptr),
> +
> +	TP_STRUCT__entry(
> +		__array(char, uid, SSAM_PTR_UID_LEN)
> +		__field(size_t, len)

Even this would be better swapping the fields.

> +	),
> +
> +	TP_fast_assign(
> +		ssam_trace_ptr_uid(ptr, __entry->uid);
> +	),
> +
> +	TP_printk("uid=%s", __entry->uid)
> +);
> +
> +#define DEFINE_SSAM_FREE_EVENT(name)					\
> +	DEFINE_EVENT(ssam_free_class, ssam_##name,			\
> +		TP_PROTO(void *ptr),					\
> +		TP_ARGS(ptr)						\
> +	)
> +
> +
> +DECLARE_EVENT_CLASS(ssam_generic_uint_class,
> +	TP_PROTO(const char *property, unsigned int value),
> +
> +	TP_ARGS(property, value),
> +
> +	TP_STRUCT__entry(
> +		__string(property, property)
> +		__field(unsigned int, value)

Strings are dynamic, and hold a 2 byte buffer. May be better to switch
the above as well.

Besides the layout, this patch was done nicely. Good job!

-- Steve


> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(property, property);
> +		__entry->value = value;
> +	),
> +
> +	TP_printk("%s=%u", __get_str(property), __entry->value)
> +);
> +
> +#define
> DEFINE_SSAM_GENERIC_UINT_EVENT(name)				\
> +	DEFINE_EVENT(ssam_generic_uint_class,
> ssam_##name,		\
> +		TP_PROTO(const char *property, unsigned int
> value),	\
> +		TP_ARGS(property,
> value)				\
> +	)
> +
> +
> +DEFINE_SSAM_FRAME_EVENT(rx_frame_received);
> +DEFINE_SSAM_COMMAND_EVENT(rx_response_received);
> +DEFINE_SSAM_COMMAND_EVENT(rx_event_received);
> +
> +DEFINE_SSAM_PACKET_EVENT(packet_release);
> +DEFINE_SSAM_PACKET_EVENT(packet_submit);
> +DEFINE_SSAM_PACKET_EVENT(packet_resubmit);
> +DEFINE_SSAM_PACKET_EVENT(packet_timeout);
> +DEFINE_SSAM_PACKET_EVENT(packet_cancel);
> +DEFINE_SSAM_PACKET_STATUS_EVENT(packet_complete);
> +DEFINE_SSAM_GENERIC_UINT_EVENT(ptl_timeout_reap);
> +
> +DEFINE_SSAM_REQUEST_EVENT(request_submit);
> +DEFINE_SSAM_REQUEST_EVENT(request_timeout);
> +DEFINE_SSAM_REQUEST_EVENT(request_cancel);
> +DEFINE_SSAM_REQUEST_STATUS_EVENT(request_complete);
> +DEFINE_SSAM_GENERIC_UINT_EVENT(rtl_timeout_reap);
> +
> +DEFINE_SSAM_ALLOC_EVENT(ctrl_packet_alloc);
> +DEFINE_SSAM_FREE_EVENT(ctrl_packet_free);
> +
> +DEFINE_SSAM_ALLOC_EVENT(event_item_alloc);
> +DEFINE_SSAM_FREE_EVENT(event_item_free);
> +
> +#endif /* _SURFACE_AGGREGATOR_TRACE_H */
> +
> +/* This part must be outside protection */
> +#undef TRACE_INCLUDE_PATH
> +#undef TRACE_INCLUDE_FILE
> +
> +#define TRACE_INCLUDE_PATH .
> +#define TRACE_INCLUDE_FILE trace
> +
> +#include <trace/define_trace.h>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/9] misc: Add Surface Aggregator subsystem
  2020-09-23 16:57   ` [RFC PATCH 1/9] misc: Add Surface Aggregator subsystem Greg Kroah-Hartman
@ 2020-09-23 20:34     ` Maximilian Luz
  2020-09-24  6:48       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 27+ messages in thread
From: Maximilian Luz @ 2020-09-23 20:34 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, linux-serial, Arnd Bergmann, Rob Herring,
	Jiri Slaby, Blaž Hrastnik, Dorian Stoll

On 9/23/20 6:57 PM, Greg Kroah-Hartman wrote:
> On Wed, Sep 23, 2020 at 05:15:03PM +0200, Maximilian Luz wrote:
>> +/* -- Safe counters. -------------------------------------------------------- */
>> +
>> +/**
>> + * ssh_seq_reset() - Reset/initialize sequence ID counter.
>> + * @c: The counter to reset.
>> + */
>> +static void ssh_seq_reset(struct ssh_seq_counter *c)
>> +{
>> +	WRITE_ONCE(c->value, 0);
>> +}
> 
> These "counters" are odd, what exactly are they?

The SEQ counter is a sequence counter for transport packets with
roll-over at U8_MAX. As far as I can tell from testing, it doesn't
specifically have to be in sequence, but that's what I've called it
after initially reverse-engineering the protocol (the Windows driver
does keep it in sequence).

This counter is basically used to ensure that data packets can be
matched up with the corresponding ACK (acknowledge) control-packet. The
main reason for it being sequential, or rather where the sequentiality
can help, is packet retransmission detection:

Imagine the EC sends us a data packet, we ACK it but the ACK is somehow
dropped in communication. After the retransmission timeout the EC sends
us the data packet again. We can now assume (since the counter is
sequential) that there is a significant amount of time needed until we
can see the same SEQ number again. So we can write it to a fixed-size
rotating list after we've first received it, ACK the packet and
ignore-and-ACK any packet with the same ID after that until we've
received, let's say 16 packets with different IDs.

I assume that a similar mechanism is implemented on the EC side,
although I'm not sure how it's implemented specifically.

The RQID counter is pretty much the same, except for requests, roll-over
at U16_MAX, and some reserved IDs at the beginning (specifically 1 to
SSH_NUM_EVENTS, both inclusive) to differentiate events from responses
to requests.

> They seem like a simple atomic counter, but not quite, so you have
> rolled your own pseudo-atomic variable.  Are you sure that it works
> properly?  If so, how?

I'm fairly sure they work as designed, at least in terms of being safe
for concurrent execution (please do point it out if they're not). The
reset function is only called on init when no other thread should have
access to it, the counter get-and-increment is handled via cmpxchg to
guarantee that the same (old) value is only returned once per roll-over
and the correct new one gets assigned to the current counter value.

There can be a problem when a new packet/request is being put on hold by
the submitter (before actually being submitted) for an overly long time
and the counter rolls over at least once, causing the exact same packet
(or request) ID to be sent in sequence, and that in turn the EC to drop
the packet and result in a request timeout.

Note that everything else should (with the current implementation) work
fine. I.e. matching a request with its response will work (on the
driver-side) even with two requests with the same request ID in
sequence, as the pending "set" is actually a list that's traversed in
order, so matching the first submitted to the first received and the
second submitted to the second received.

I think that in practice, holding a packet/request this long should not
happen. My current implementation just ignores that issue. Maybe not the
best strategy, really...

> What about just using an ida/idr structure instead?  Or just a simple
> atomic counter that avoids the values you can't touch, or better yet, a
> simple number with a correct lock protecting it :)

As mentioned above, I belive that concurrent execution doesn't cause any
issues due to the cmpxchg, so no need to use a lock here. AFAIK atomic
counters don't roll over, so there shouldn't be any difference in
implementing this with atomic_int apart from then using atomic_cmpxchg
(semantically, they are the same, right?).

As far as I can tell, a fairly easy way to fix the duplicate-value
problem itself would be using IDAs for both, however, they would need to
be kept allocated for some time after the packet/request has been
completed to avoid re-using the same ID sequentially and accidentally
triggering retransmission-detection on the EC. I don't have a clue how
that's implemented... just that it works with roll-over counters in the
Windows driver, and that the EC itself uses a roll-over counter for its
SEQ values (SEQs sent by the EC and SEQs sent by the driver are
completely independent). I also don't know how that works on the Windows
driver for that matter.

_Theoretically_, the protocol should be able to support multiple packets
waiting for an ACK, but in testing that caused problems with
retransmission after error-detection on my Surface Book 2, so that's the
reason why it's currently limited to one due to this. I'm not sure if MS
considers changing that/has changed that on newer devices, which could
influence how retransmission-detection behaves, so that's another reason
why I'd like to keep it similar to what we observe on Windows.

We could add an "in-use" range to block allocation of new IDs. This
would be fairly cheap, still sequential, and guarantee that no ID is
used twice at the same time. On the other hand, that would also
completely block communication if just one packet is held back before
submission (something that could be avoided with IDAs).

And the last alternative: Keep it as it is. As said, this can result in
a dropped packet/request, upon which the caller is encouraged to
resubmit. In that case, I should probably also document this
somewhere... This will likely bite us though if the request throughput
gets large (and thus time-to-roll-over small) so I think this should be
addressed properly.

In short: Concurrent execution of the counter functions works, as far as
I can tell at least, and, as you see by the long answer, I have to spend
some time and think about the duplicate-value problem (again). If you've
managed to read through this wall of text (sorry about that) and you
have any ideas/preferences, please let me know.

Thanks,
Max


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 5/9] surface_aggregator: Add error injection capabilities
  2020-09-23 17:45   ` Greg Kroah-Hartman
@ 2020-09-23 21:28     ` Maximilian Luz
  0 siblings, 0 replies; 27+ messages in thread
From: Maximilian Luz @ 2020-09-23 21:28 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, linux-serial, Arnd Bergmann, Rob Herring,
	Jiri Slaby, Blaž Hrastnik, Dorian Stoll

On 9/23/20 7:45 PM, Greg Kroah-Hartman wrote:
> On Wed, Sep 23, 2020 at 05:15:07PM +0200, Maximilian Luz wrote:
>> This commit adds error injection hooks to the Surface Serial Hub
>> communication protocol implementation, to:
>>
>>   - simulate simple serial transmission errors,
>>
>>   - drop packets, requests, and responses, simulating communication
>>     failures and potentially trigger retransmission timeouts, as well as
>>
>>   - inject invalid data into submitted and received packets.
>>
>> Together with the trace points introduced in the previous commit, these
>> facilities are intended to aid in testing, validation, and debugging of
>> the Surface Aggregator communication layer.
>>
>> Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
> 
> Ok, this is ridiculous.
> 
> You are dropping a whole new subsystem on us, with full documentation,
> correct driver model integration, crazy debugfs interactions (I made fun
> of the patch, but the code did work, you just did more work than was
> needed), proper auto-loading of modules, tracing, documentation for more
> things than is ever expected, and now you are adding error injection
> support?
> 
> You just made all other code submissions of new subsystems I have gotten
> in the past 2 months look like total crud.  Which, to be fair, they
> probably were, but wow, you just stepped up the level of professionalism
> to a whole new height.
> 
> I can only dream that "real Linux companies" take note and try to follow
> this example.  I think I will point them all at this in the future and
> say, "go do it like this one."
> 
> very very very nice work, we owe you the beverage of your choice.
> 
> greg k-h


Wow, thank you very much for those kind words! That means quite a lot to
me.

To be fair, I've been working on this whole project for about two years
now and a large part of the code has been rewritten in the last half a
year, specifically to get it ready for the kernel. So I guess that might
relativize things a bit :)

Thanks again,
Max

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module
  2020-09-23 19:43     ` Arnd Bergmann
@ 2020-09-23 23:28       ` Maximilian Luz
  2020-09-24  8:26         ` Arnd Bergmann
  0 siblings, 1 reply; 27+ messages in thread
From: Maximilian Luz @ 2020-09-23 23:28 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel, linux-serial, ACPI Devel Maling List,
	Greg Kroah-Hartman, Rob Herring, Jiri Slaby, Rafael J. Wysocki,
	Len Brown, Blaž Hrastnik, Dorian Stoll

On 9/23/20 9:43 PM, Arnd Bergmann wrote:
> On Wed, Sep 23, 2020 at 5:43 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>
>> On 9/23/20 5:30 PM, Arnd Bergmann wrote:
>>> On Wed, Sep 23, 2020 at 5:15 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> The Surface System Aggregator Module (we'll refer to it as Surface
>>>> Aggregator or SAM below) is an embedded controller (EC) found on various
>>>> Microsoft Surface devices. Specifically, all 4th and later generation
>>>> Surface devices, i.e. Surface Pro 4, Surface Book 1 and later, with the
>>>> exception of the Surface Go series and the Surface Duo. Notably, it
>>>> seems like this EC can also be found on the ARM-based Surface Pro X [1].
>>>
>>> I think this should go to drivers/platform/x86 or drivers/platform/surface/
>>> along with other laptop vendor specific code rather than drivers/misc/.
>>
>> I initially had this under drivers/platform/x86. There are two main
>> reasons I changed that: First, I think it's a bit too big for
>> platform/x86 given that it basically introduces a new subsystem. At this
>> point it's really less of "a couple of odd devices here and there" and
>> more of a bus-type thing. Second, with the possibility of future support
>> for ARM devices (Pro X, Pro X 2 which is rumored to come out soon), I
>> thought that platform/x86 would not be a good fit.
> 
> I don't see that as a strong reason against it. As you write yourself, the
> driver won't work on the arm machines without major changes anyway,
> and even if it does, it fits much better with the rest of it.

Sorry, I should have written that a bit more clearly. I don't see any
reason why these drivers would not work on an ARM device such as the Pro
X right now, assuming that it boots via ACPI and the serial device it
loads against is fully functional.

The reason (at least as far as I know) it currently hasn't been tested
is that a) there aren't a lot of people around attempting to run Linux
on the currently only ARM device with that and b) it's currently blocked
by a reason unrelated to this driver itself, specifically that the
serial controller isn't being set up and thus the core driver doesn't
have a device it can attach to. My information may be outdated though
and is pretty much exclusively based on
https://github.com/Sonicadvance1/linux/issues/7.

> If you are worried about the size of the directory,
> drivers/platform/x86/surface/
> would also work.

This was the alternative I'd have considered without ARM devices.

>> I'd be happy to move this to platform/surface though, if that's
>> considered a better fit and you're okay with me adding that. Would make
>> sense given that there's already a platform/chrome, which, as far as I
>> can tell, also seems to be mainly focused on EC support.
> 
> Yes, I think the main question is how much overlap you see functionally
> between this driver and the others in drivers/platform/x86.

I think that the Pro X likely won't be the last ARM Surface device with
a SAM EC. Further, the subsystem is going to grow, and platform/x86
seems more like a collection of, if at all, loosely connected drivers,
which might give off the wrong impression. In my mind, this is just a
bit more comparable to platform/chrome than the rest of platform/x86. I
don't think I'm really qualified to make the decision on that though,
that's just my opinion.

Here's an overview of other drivers that I hopefully at some point get
in good enough shape, which are part of this subsystem/dependent on the
EC API introduced here:

- A device registry / device hub for devices that are connected to the
   EC but can't be detected via ACPI.

- A dedicated battery driver for 7th generation devices (where the
   battery isn't hanled via the ACPI shim).

- A driver properly handling clipboard detachment on the Surface Books.

- A driver for HID input/transport on the Surface Laptops and Surface
   Book 3.

- A driver for allowing users to set the performance/cooling mode via
   sysfs.

- Possibly a driver improving hot-plug handling of the discrete GPU in
   the Surface Book base.

And also some stuff that hasn't been written yet:

- A dedicated driver for temperature sensors handled via the EC on 7th
   generation devices (also handled via the ACPI shim on previous
   generations).

- Possibly a driver for real-time-clock access on 7th generation
   devices (it has yet to be tested if that interface is still
   around/required on those devices; that's also a thing handled via
   the ACPI shim on previous generations).

I doubt that those client drivers will be exclusive to x86, and I could
see (current and future) ARM devices using SAM based battery,
keyboard/HID, and performance mode drivers (which will likely also
require the device registry, because for some reason MS doesn't want to
describe those devices in ACPI on the newer generations any more...).
All of those should work as-is on ARM (or at least after the
corresponding device entries have been added to the device registry),
modulo bugs of course.

I hope this all gives a better overview of the form this may eventually
take on and helps you in your decision. I'd be completely happy to move
it to either, platform/surface or platform/x86/surface, whatever the
consensus is. I'd very much like to keep the client drivers all
contained to one sub-directory, though, and not scattered all over
platform/x86/surface_*.c. Again that's more of a personal preference
though :)

Thanks,
Max

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 4/9] surface_aggregator: Add trace points
  2020-09-23 20:07   ` Steven Rostedt
@ 2020-09-23 23:43     ` Maximilian Luz
  0 siblings, 0 replies; 27+ messages in thread
From: Maximilian Luz @ 2020-09-23 23:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-serial, Arnd Bergmann, Greg Kroah-Hartman,
	Rob Herring, Jiri Slaby, Blaž Hrastnik, Dorian Stoll

On 9/23/20 10:07 PM, Steven Rostedt wrote:
> On Wed, 23 Sep 2020 17:15:06 +0200
> Maximilian Luz <luzmaximilian@gmail.com> wrote:
> 
>> Add trace points to the Surface Aggregator subsystem core. These trace
>> points can be used to track packets, requests, and allocations. They are
>> further intended for debugging and testing/validation, specifically in
>> combination with the error injection capabilities introduced in the
>> subsequent commit.
> 
> I'm impressed! This uses some of the advanced features of the tracing
> infrastructure. But I still have some comments to make about the layout
> of the TP_STRUCT__entry() fields.

Thanks!

[...]

>> +DECLARE_EVENT_CLASS(ssam_packet_class,
>> +	TP_PROTO(const struct ssh_packet *packet),
>> +
>> +	TP_ARGS(packet),
>> +
>> +	TP_STRUCT__entry(
>> +		__array(char, uid, SSAM_PTR_UID_LEN)
>> +		__field(u8, priority)
>> +		__field(u16, length)
>> +		__field(unsigned long, state)
>> +		__field(u16, seq)
> 
> 
> Order matters above to keep the events as compact as possible. The more
> compact they are, the more events you can store without loss.
> 
> Now with SSAM_PTR_UID_LEN = 9, the above is (on a 64 bit system);
> 
> 	9 bytes;
> 	1 byte;
> 	2 bytes;
> 	8 bytes;
> 	2 bytes;
> 
> The ftrace ring buffer is 4 byte aligned. As words and long words are
> also 4 byte aligned, there's not much different to change. But it is
> possible that the compiler might add 4 byte padding between the long
> word "length" and "priority". Note, these are not packed structures.
> 
> Testing this out with the following code:
> 
>   $ cat << EOF > test.c
> struct test {
> 	unsigned char array[9];
> 	unsigned char priority;
> 	unsigned short length;
> 	unsigned long state;
> 	unsigned short seq;
> };
> 
> static struct test x;
> 
> void receive_x(struct test *p)
> {
> 	p = &x;
> }
> EOF
> 
>   $ gcc -g -c -o test.o test.c
>   $ pahole test.o
> struct test {
> 	unsigned char              array[9];             /*     0     9 */
> 	unsigned char              priority;             /*     9     1 */
> 	short unsigned int         length;               /*    10     2 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	long unsigned int          state;                /*    16     8 */
> 	short unsigned int         seq;                  /*    24     2 */
> 
> 	/* size: 32, cachelines: 1, members: 5 */
> 	/* sum members: 22, holes: 1, sum holes: 4 */
> 	/* padding: 6 */
> 	/* last cacheline: 32 bytes */
> };
> 
> You do see a hole between length and state. Now if we were to move this
> around a little.
> 
>   $ cat <<EOF > test2.c
> struct test {
> 	unsigned long state;
> 	unsigned char array[9];
> 	unsigned char priority;
> 	unsigned short length;
> 	unsigned short seq;
> };
> 
> static struct test x;
> 
> void receive_x(struct test *p)
> {
> 	p = &x;
> }
> EOF
> 
>   $ gcc -g -c -o test2 test2.c
>   $ pahole test2.o
> struct test {
> 	long unsigned int          state;                /*     0     8 */
> 	unsigned char              array[9];             /*     8     9 */
> 	unsigned char              priority;             /*    17     1 */
> 	short unsigned int         length;               /*    18     2 */
> 	short unsigned int         seq;                  /*    20     2 */
> 
> 	/* size: 24, cachelines: 1, members: 5 */
> 	/* padding: 2 */
> 	/* last cacheline: 24 bytes */
> };
> 
> 
> We get a more compact structure with:
> 
> 	TP_STRUCT__entry(
> 		__field(unsigned long, state)
> 		__array(char, uid, SSAM_PTR_UID_LEN)
> 		__field(u8, priority)
> 		__field(u16, length)
> 		__field(u16, seq)
> 	),
> 
> 
> Note, you can find pahole here:
> 
>     https://git.kernel.org/pub/scm/devel/pahole/pahole.git
> 
> 
>> +	),

Thank you for that detailed write-up! As you have clearly noticed, I
have not really looked at the struct layouts. I will fix this for v2,
include your changes, and have a look at pahole.

[...]

Thanks,
Max

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/9] misc: Add Surface Aggregator subsystem
  2020-09-23 20:34     ` Maximilian Luz
@ 2020-09-24  6:48       ` Greg Kroah-Hartman
  2020-09-24 18:16         ` Maximilian Luz
  0 siblings, 1 reply; 27+ messages in thread
From: Greg Kroah-Hartman @ 2020-09-24  6:48 UTC (permalink / raw)
  To: Maximilian Luz
  Cc: linux-kernel, linux-serial, Arnd Bergmann, Rob Herring,
	Jiri Slaby, Blaž Hrastnik, Dorian Stoll

On Wed, Sep 23, 2020 at 10:34:23PM +0200, Maximilian Luz wrote:
> In short: Concurrent execution of the counter functions works, as far as
> I can tell at least, and, as you see by the long answer, I have to spend
> some time and think about the duplicate-value problem (again). If you've
> managed to read through this wall of text (sorry about that) and you
> have any ideas/preferences, please let me know.

No, this all answers my question really well, thanks, what you have now
is fine, no need to change it.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module
  2020-09-23 23:28       ` Maximilian Luz
@ 2020-09-24  8:26         ` Arnd Bergmann
  2020-09-24 18:59           ` Maximilian Luz
  0 siblings, 1 reply; 27+ messages in thread
From: Arnd Bergmann @ 2020-09-24  8:26 UTC (permalink / raw)
  To: Maximilian Luz
  Cc: linux-kernel, linux-serial, ACPI Devel Maling List,
	Greg Kroah-Hartman, Rob Herring, Jiri Slaby, Rafael J. Wysocki,
	Len Brown, Blaž Hrastnik, Dorian Stoll, Darren Hart,
	Andy Shevchenko, Platform Driver

On Thu, Sep 24, 2020 at 1:28 AM Maximilian Luz <luzmaximilian@gmail.com> wrote:
> On 9/23/20 9:43 PM, Arnd Bergmann wrote:
> > On Wed, Sep 23, 2020 at 5:43 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
> >>
> >> On 9/23/20 5:30 PM, Arnd Bergmann wrote:
> >>> On Wed, Sep 23, 2020 at 5:15 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> The Surface System Aggregator Module (we'll refer to it as Surface
> >>>> Aggregator or SAM below) is an embedded controller (EC) found on various
> >>>> Microsoft Surface devices. Specifically, all 4th and later generation
> >>>> Surface devices, i.e. Surface Pro 4, Surface Book 1 and later, with the
> >>>> exception of the Surface Go series and the Surface Duo. Notably, it
> >>>> seems like this EC can also be found on the ARM-based Surface Pro X [1].
> >>>
> >>> I think this should go to drivers/platform/x86 or drivers/platform/surface/
> >>> along with other laptop vendor specific code rather than drivers/misc/.
> >>
> >> I initially had this under drivers/platform/x86. There are two main
> >> reasons I changed that: First, I think it's a bit too big for
> >> platform/x86 given that it basically introduces a new subsystem. At this
> >> point it's really less of "a couple of odd devices here and there" and
> >> more of a bus-type thing. Second, with the possibility of future support
> >> for ARM devices (Pro X, Pro X 2 which is rumored to come out soon), I
> >> thought that platform/x86 would not be a good fit.
> >
> > I don't see that as a strong reason against it. As you write yourself, the
> > driver won't work on the arm machines without major changes anyway,
> > and even if it does, it fits much better with the rest of it.
>
> Sorry, I should have written that a bit more clearly. I don't see any
> reason why these drivers would not work on an ARM device such as the Pro
> X right now, assuming that it boots via ACPI and the serial device it
> loads against is fully functional.

As I understand, the dialect of ACPI used on the snapdragon laptops
is not really compatible with the subset expected by the kernel, so
you'd be more likely to run those laptops with a device tree description
of the hardware instead (if at all).

Making the driver talk to the hardware directly instead of going through
AML likely requires more refactoring.

> >> I'd be happy to move this to platform/surface though, if that's
> >> considered a better fit and you're okay with me adding that. Would make
> >> sense given that there's already a platform/chrome, which, as far as I
> >> can tell, also seems to be mainly focused on EC support.
> >
> > Yes, I think the main question is how much overlap you see functionally
> > between this driver and the others in drivers/platform/x86.
>
> I think that the Pro X likely won't be the last ARM Surface device with
> a SAM EC. Further, the subsystem is going to grow, and platform/x86
> seems more like a collection of, if at all, loosely connected drivers,
> which might give off the wrong impression. In my mind, this is just a
> bit more comparable to platform/chrome than the rest of platform/x86. I
> don't think I'm really qualified to make the decision on that though,
> that's just my opinion.

I would ask the drivers/platform/x86 maintainers for an opinion here,
they are probably best qualified to make that decision.

I don't really mind either way, for me this is more about who is
responsible as a subsystem maintainer than whether these are
technically x86 or not.

> Here's an overview of other drivers that I hopefully at some point get
> in good enough shape, which are part of this subsystem/dependent on the
> EC API introduced here:
>
> - A device registry / device hub for devices that are connected to the
>    EC but can't be detected via ACPI.
>
> - A dedicated battery driver for 7th generation devices (where the
>    battery isn't hanled via the ACPI shim).
>
> - A driver properly handling clipboard detachment on the Surface Books.
>
> - A driver for HID input/transport on the Surface Laptops and Surface
>    Book 3.
>
> - A driver for allowing users to set the performance/cooling mode via
>    sysfs.
>
> - Possibly a driver improving hot-plug handling of the discrete GPU in
>    the Surface Book base.

Note that drivers that connect to the bus typically don't live in the
same subdirectory as the driver that operates the bus. E.g. the
battery driver would go into drivers/power/supply and the input
would go into drivers/input/ or drivers/hid.

    Arnd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module
  2020-09-23 15:30 ` [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module Arnd Bergmann
  2020-09-23 15:43   ` Maximilian Luz
@ 2020-09-24  8:30   ` Andy Shevchenko
  2020-09-24 19:17     ` Maximilian Luz
  1 sibling, 1 reply; 27+ messages in thread
From: Andy Shevchenko @ 2020-09-24  8:30 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Maximilian Luz, linux-kernel, open list:SERIAL DRIVERS,
	ACPI Devel Maling List, Greg Kroah-Hartman, Rob Herring,
	Jiri Slaby, Rafael J. Wysocki, Len Brown, Blaž Hrastnik,
	Dorian Stoll

On Wed, Sep 23, 2020 at 6:32 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Sep 23, 2020 at 5:15 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
> >
> > Hello,
> >
> > The Surface System Aggregator Module (we'll refer to it as Surface
> > Aggregator or SAM below) is an embedded controller (EC) found on various
> > Microsoft Surface devices. Specifically, all 4th and later generation
> > Surface devices, i.e. Surface Pro 4, Surface Book 1 and later, with the
> > exception of the Surface Go series and the Surface Duo. Notably, it
> > seems like this EC can also be found on the ARM-based Surface Pro X [1].
>
> I think this should go to drivers/platform/x86 or drivers/platform/surface/
> along with other laptop vendor specific code rather than drivers/misc/.

+1 here. drivers/platform/surface is a good place to start.
And you may begin with moving a few Surface drivers out of PDx86 to
the new folder.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/9] misc: Add Surface Aggregator subsystem
  2020-09-24  6:48       ` Greg Kroah-Hartman
@ 2020-09-24 18:16         ` Maximilian Luz
  0 siblings, 0 replies; 27+ messages in thread
From: Maximilian Luz @ 2020-09-24 18:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, linux-serial, Arnd Bergmann, Rob Herring,
	Jiri Slaby, Blaž Hrastnik, Dorian Stoll

On 9/24/20 8:48 AM, Greg Kroah-Hartman wrote:
> On Wed, Sep 23, 2020 at 10:34:23PM +0200, Maximilian Luz wrote:
>> In short: Concurrent execution of the counter functions works, as far as
>> I can tell at least, and, as you see by the long answer, I have to spend
>> some time and think about the duplicate-value problem (again). If you've
>> managed to read through this wall of text (sorry about that) and you
>> have any ideas/preferences, please let me know.
> 
> No, this all answers my question really well, thanks, what you have now
> is fine, no need to change it.

Okay, thank you again!

Regards,
Max

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module
  2020-09-24  8:26         ` Arnd Bergmann
@ 2020-09-24 18:59           ` Maximilian Luz
  2020-09-24 19:38             ` Arnd Bergmann
  0 siblings, 1 reply; 27+ messages in thread
From: Maximilian Luz @ 2020-09-24 18:59 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel, linux-serial, ACPI Devel Maling List,
	Greg Kroah-Hartman, Rob Herring, Jiri Slaby, Rafael J. Wysocki,
	Len Brown, Blaž Hrastnik, Dorian Stoll, Darren Hart,
	Andy Shevchenko, Platform Driver

On 9/24/20 10:26 AM, Arnd Bergmann wrote:
> On Thu, Sep 24, 2020 at 1:28 AM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>> On 9/23/20 9:43 PM, Arnd Bergmann wrote:
>>> On Wed, Sep 23, 2020 at 5:43 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>>>
>>>> On 9/23/20 5:30 PM, Arnd Bergmann wrote:
>>>>> On Wed, Sep 23, 2020 at 5:15 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> The Surface System Aggregator Module (we'll refer to it as Surface
>>>>>> Aggregator or SAM below) is an embedded controller (EC) found on various
>>>>>> Microsoft Surface devices. Specifically, all 4th and later generation
>>>>>> Surface devices, i.e. Surface Pro 4, Surface Book 1 and later, with the
>>>>>> exception of the Surface Go series and the Surface Duo. Notably, it
>>>>>> seems like this EC can also be found on the ARM-based Surface Pro X [1].
>>>>>
>>>>> I think this should go to drivers/platform/x86 or drivers/platform/surface/
>>>>> along with other laptop vendor specific code rather than drivers/misc/.
>>>>
>>>> I initially had this under drivers/platform/x86. There are two main
>>>> reasons I changed that: First, I think it's a bit too big for
>>>> platform/x86 given that it basically introduces a new subsystem. At this
>>>> point it's really less of "a couple of odd devices here and there" and
>>>> more of a bus-type thing. Second, with the possibility of future support
>>>> for ARM devices (Pro X, Pro X 2 which is rumored to come out soon), I
>>>> thought that platform/x86 would not be a good fit.
>>>
>>> I don't see that as a strong reason against it. As you write yourself, the
>>> driver won't work on the arm machines without major changes anyway,
>>> and even if it does, it fits much better with the rest of it.
>>
>> Sorry, I should have written that a bit more clearly. I don't see any
>> reason why these drivers would not work on an ARM device such as the Pro
>> X right now, assuming that it boots via ACPI and the serial device it
>> loads against is fully functional.
> 
> As I understand, the dialect of ACPI used on the snapdragon laptops
> is not really compatible with the subset expected by the kernel, so
> you'd be more likely to run those laptops with a device tree description
> of the hardware instead (if at all).
> 
> Making the driver talk to the hardware directly instead of going through
> AML likely requires more refactoring.

Oh, I did not know that! Thanks!

>>>> I'd be happy to move this to platform/surface though, if that's
>>>> considered a better fit and you're okay with me adding that. Would make
>>>> sense given that there's already a platform/chrome, which, as far as I
>>>> can tell, also seems to be mainly focused on EC support.
>>>
>>> Yes, I think the main question is how much overlap you see functionally
>>> between this driver and the others in drivers/platform/x86.
>>
>> I think that the Pro X likely won't be the last ARM Surface device with
>> a SAM EC. Further, the subsystem is going to grow, and platform/x86
>> seems more like a collection of, if at all, loosely connected drivers,
>> which might give off the wrong impression. In my mind, this is just a
>> bit more comparable to platform/chrome than the rest of platform/x86. I
>> don't think I'm really qualified to make the decision on that though,
>> that's just my opinion.
> 
> I would ask the drivers/platform/x86 maintainers for an opinion here,
> they are probably best qualified to make that decision.
> 
> I don't really mind either way, for me this is more about who is
> responsible as a subsystem maintainer than whether these are
> technically x86 or not.

I see, okay. I'll ask them and CC them on the next submission.

>> Here's an overview of other drivers that I hopefully at some point get
>> in good enough shape, which are part of this subsystem/dependent on the
>> EC API introduced here:
>>
>> - A device registry / device hub for devices that are connected to the
>>     EC but can't be detected via ACPI.
>>
>> - A dedicated battery driver for 7th generation devices (where the
>>     battery isn't hanled via the ACPI shim).
>>
>> - A driver properly handling clipboard detachment on the Surface Books.
>>
>> - A driver for HID input/transport on the Surface Laptops and Surface
>>     Book 3.
>>
>> - A driver for allowing users to set the performance/cooling mode via
>>     sysfs.
>>
>> - Possibly a driver improving hot-plug handling of the discrete GPU in
>>     the Surface Book base.
> 
> Note that drivers that connect to the bus typically don't live in the
> same subdirectory as the driver that operates the bus. E.g. the
> battery driver would go into drivers/power/supply and the input
> would go into drivers/input/ or drivers/hid.

Right. I wonder if this also holds for devices that are directly
dependent on a special platform though? It could make sense to have them
under plaform/surface rather than in the individual subsystems as they
are only ever going to be used on this platform. On the other hand, one
could argue that having them in the subsystem directories is better for
maintainability.

Thanks,
Max

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module
  2020-09-24  8:30   ` Andy Shevchenko
@ 2020-09-24 19:17     ` Maximilian Luz
  2020-09-25 14:58       ` Andy Shevchenko
  0 siblings, 1 reply; 27+ messages in thread
From: Maximilian Luz @ 2020-09-24 19:17 UTC (permalink / raw)
  To: Andy Shevchenko, Arnd Bergmann
  Cc: linux-kernel, open list:SERIAL DRIVERS, ACPI Devel Maling List,
	Greg Kroah-Hartman, Rob Herring, Jiri Slaby, Rafael J. Wysocki,
	Len Brown, Blaž Hrastnik, Dorian Stoll,
	Enric Balletbo i Serra, Hans de Goede, Mika Westerberg,
	Gayatri Kammela, Darren Hart, platform-driver-x86

On 9/24/20 10:30 AM, Andy Shevchenko wrote:
> On Wed, Sep 23, 2020 at 6:32 PM Arnd Bergmann <arnd@arndb.de> wrote:
>>
>> On Wed, Sep 23, 2020 at 5:15 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> The Surface System Aggregator Module (we'll refer to it as Surface
>>> Aggregator or SAM below) is an embedded controller (EC) found on various
>>> Microsoft Surface devices. Specifically, all 4th and later generation
>>> Surface devices, i.e. Surface Pro 4, Surface Book 1 and later, with the
>>> exception of the Surface Go series and the Surface Duo. Notably, it
>>> seems like this EC can also be found on the ARM-based Surface Pro X [1].
>>
>> I think this should go to drivers/platform/x86 or drivers/platform/surface/
>> along with other laptop vendor specific code rather than drivers/misc/.
> 
> +1 here. drivers/platform/surface is a good place to start.
> And you may begin with moving a few Surface drivers out of PDx86 to
> the new folder.

Perfect, thanks! I'll draft up a patch series over the weekend.

A couple questions regarding structure and maintenance:

  - Should I CC the platform-driver-x86 list on future submissions to
    drivers/platform/surface? I.e. is this something you would want to
    review if it doesn't touch the drivers/platform/x86 directory?

  - How would you want the layout to be, specifically regarding to the
    surface-aggregator stuff? My suggestion would be simply:

    drivers/platform/surface/
        surface_aggregator/
            Kconfig
            Makefile
            core.c
            controller.c
            ... (all core stuff built into the surface_aggregator module)
        Kconfig
        Makefile
        surface_aggregator_debugfs.c
        surface_acpi_notify.c
        surface_*.c        (any other surface platform driver as well
                            as drivers dependent on surface_aggregator)

  - Regarding future things like HID transport driver, battery/AC driver:
    Submit them to drivers/platform/surface or to their respective
    subsystem directories?

Thanks,
Max

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module
  2020-09-24 18:59           ` Maximilian Luz
@ 2020-09-24 19:38             ` Arnd Bergmann
  2020-09-24 21:07               ` Maximilian Luz
  2020-09-25 14:53               ` Andy Shevchenko
  0 siblings, 2 replies; 27+ messages in thread
From: Arnd Bergmann @ 2020-09-24 19:38 UTC (permalink / raw)
  To: Maximilian Luz
  Cc: linux-kernel, linux-serial, ACPI Devel Maling List,
	Greg Kroah-Hartman, Rob Herring, Jiri Slaby, Rafael J. Wysocki,
	Len Brown, Blaž Hrastnik, Dorian Stoll, Darren Hart,
	Andy Shevchenko, Platform Driver

On Thu, Sep 24, 2020 at 8:59 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
> On 9/24/20 10:26 AM, Arnd Bergmann wrote:
> > On Thu, Sep 24, 2020 at 1:28 AM Maximilian Luz <luzmaximilian@gmail.com> wrote:

> > Note that drivers that connect to the bus typically don't live in the
> > same subdirectory as the driver that operates the bus. E.g. the
> > battery driver would go into drivers/power/supply and the input
> > would go into drivers/input/ or drivers/hid.
>
> Right. I wonder if this also holds for devices that are directly
> dependent on a special platform though? It could make sense to have them
> under plaform/surface rather than in the individual subsystems as they
> are only ever going to be used on this platform. On the other hand, one
> could argue that having them in the subsystem directories is better for
> maintainability.

Yes, absolutely. The subsystem maintainers are the ones that are
most qualified of reviewing code that uses their subsystem, regardless
of which bus is used underneath the device, and having all drivers
for a subsystem in one place makes it much easier to refactor them
all at once in case the internal interfaces are changed or common bugs
are found in multiple drivers.

       Arnd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module
  2020-09-24 19:38             ` Arnd Bergmann
@ 2020-09-24 21:07               ` Maximilian Luz
  2020-09-25 14:53               ` Andy Shevchenko
  1 sibling, 0 replies; 27+ messages in thread
From: Maximilian Luz @ 2020-09-24 21:07 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel, linux-serial, ACPI Devel Maling List,
	Greg Kroah-Hartman, Rob Herring, Jiri Slaby, Rafael J. Wysocki,
	Len Brown, Blaž Hrastnik, Dorian Stoll, Darren Hart,
	Andy Shevchenko, Platform Driver

On 9/24/20 9:38 PM, Arnd Bergmann wrote:
> On Thu, Sep 24, 2020 at 8:59 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>> On 9/24/20 10:26 AM, Arnd Bergmann wrote:
>>> On Thu, Sep 24, 2020 at 1:28 AM Maximilian Luz <luzmaximilian@gmail.com> wrote:
> 
>>> Note that drivers that connect to the bus typically don't live in the
>>> same subdirectory as the driver that operates the bus. E.g. the
>>> battery driver would go into drivers/power/supply and the input
>>> would go into drivers/input/ or drivers/hid.
>>
>> Right. I wonder if this also holds for devices that are directly
>> dependent on a special platform though? It could make sense to have them
>> under plaform/surface rather than in the individual subsystems as they
>> are only ever going to be used on this platform. On the other hand, one
>> could argue that having them in the subsystem directories is better for
>> maintainability.
> 
> Yes, absolutely. The subsystem maintainers are the ones that are
> most qualified of reviewing code that uses their subsystem, regardless
> of which bus is used underneath the device, and having all drivers
> for a subsystem in one place makes it much easier to refactor them
> all at once in case the internal interfaces are changed or common bugs
> are found in multiple drivers.

Got it.

Thank you for bearing with me and answering all my (probably a bit
silly) questions! I really appreciate it!

Regards,
Max

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module
  2020-09-24 19:38             ` Arnd Bergmann
  2020-09-24 21:07               ` Maximilian Luz
@ 2020-09-25 14:53               ` Andy Shevchenko
  1 sibling, 0 replies; 27+ messages in thread
From: Andy Shevchenko @ 2020-09-25 14:53 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Maximilian Luz, linux-kernel, open list:SERIAL DRIVERS,
	ACPI Devel Maling List, Greg Kroah-Hartman, Rob Herring,
	Jiri Slaby, Rafael J. Wysocki, Len Brown, Blaž Hrastnik,
	Dorian Stoll, Darren Hart, Andy Shevchenko, Platform Driver

On Thu, Sep 24, 2020 at 10:38 PM Arnd Bergmann <arnd@arndb.de> wrote:
> On Thu, Sep 24, 2020 at 8:59 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
> > On 9/24/20 10:26 AM, Arnd Bergmann wrote:
> > > On Thu, Sep 24, 2020 at 1:28 AM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>
> > > Note that drivers that connect to the bus typically don't live in the
> > > same subdirectory as the driver that operates the bus. E.g. the
> > > battery driver would go into drivers/power/supply and the input
> > > would go into drivers/input/ or drivers/hid.
> >
> > Right. I wonder if this also holds for devices that are directly
> > dependent on a special platform though? It could make sense to have them
> > under plaform/surface rather than in the individual subsystems as they
> > are only ever going to be used on this platform. On the other hand, one
> > could argue that having them in the subsystem directories is better for
> > maintainability.
>
> Yes, absolutely. The subsystem maintainers are the ones that are
> most qualified of reviewing code that uses their subsystem, regardless
> of which bus is used underneath the device, and having all drivers
> for a subsystem in one place makes it much easier to refactor them
> all at once in case the internal interfaces are changed or common bugs
> are found in multiple drivers.

The problem is that some of the drivers are mostly reincarnation of
board files due to the platform being Windows-oriented with badly
written ACPI tables / firmware as a whole (which means a lot of quirks
are required).

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module
  2020-09-24 19:17     ` Maximilian Luz
@ 2020-09-25 14:58       ` Andy Shevchenko
  2020-09-25 15:41         ` Maximilian Luz
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Shevchenko @ 2020-09-25 14:58 UTC (permalink / raw)
  To: Maximilian Luz
  Cc: Arnd Bergmann, linux-kernel, open list:SERIAL DRIVERS,
	ACPI Devel Maling List, Greg Kroah-Hartman, Rob Herring,
	Jiri Slaby, Rafael J. Wysocki, Len Brown, Blaž Hrastnik,
	Dorian Stoll, Enric Balletbo i Serra, Hans de Goede,
	Mika Westerberg, Gayatri Kammela, Darren Hart, Platform Driver

On Thu, Sep 24, 2020 at 10:17 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
> On 9/24/20 10:30 AM, Andy Shevchenko wrote:
> > On Wed, Sep 23, 2020 at 6:32 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >> On Wed, Sep 23, 2020 at 5:15 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:

...

> >> I think this should go to drivers/platform/x86 or drivers/platform/surface/
> >> along with other laptop vendor specific code rather than drivers/misc/.
> >
> > +1 here. drivers/platform/surface is a good place to start.
> > And you may begin with moving a few Surface drivers out of PDx86 to
> > the new folder.
>
> Perfect, thanks! I'll draft up a patch series over the weekend.
>
> A couple questions regarding structure and maintenance:
>
>   - Should I CC the platform-driver-x86 list on future submissions to
>     drivers/platform/surface? I.e. is this something you would want to
>     review if it doesn't touch the drivers/platform/x86 directory?

Include PDx86 mailing list to the list of that. Current SURFACE*
drivers have per driver record in MAINTAINERS IIRC. So, update them as
well if needed.

>   - How would you want the layout to be, specifically regarding to the
>     surface-aggregator stuff? My suggestion would be simply:
>
>     drivers/platform/surface/

>         surface_aggregator/

Don't repeat parts of the path, the aggregator is enough as a folder
name, but the driver of course should be in its own namespace
('surface').

>             Kconfig
>             Makefile
>             core.c
>             controller.c
>             ... (all core stuff built into the surface_aggregator module)
>         Kconfig
>         Makefile

>         surface_aggregator_debugfs.c

(Not sure why it's not a part of aggregator folder)

>         surface_acpi_notify.c
>         surface_*.c        (any other surface platform driver as well
>                             as drivers dependent on surface_aggregator)
>
>   - Regarding future things like HID transport driver, battery/AC driver:
>     Submit them to drivers/platform/surface or to their respective
>     subsystem directories?

Respective subsystem _if_ it is a subsystem related driver and not
kinda board file. Use common sense and existing examples.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module
  2020-09-25 14:58       ` Andy Shevchenko
@ 2020-09-25 15:41         ` Maximilian Luz
  0 siblings, 0 replies; 27+ messages in thread
From: Maximilian Luz @ 2020-09-25 15:41 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Arnd Bergmann, linux-kernel, open list:SERIAL DRIVERS,
	ACPI Devel Maling List, Greg Kroah-Hartman, Rob Herring,
	Jiri Slaby, Rafael J. Wysocki, Len Brown, Blaž Hrastnik,
	Dorian Stoll, Enric Balletbo i Serra, Hans de Goede,
	Mika Westerberg, Gayatri Kammela, Darren Hart, Platform Driver

On 9/25/20 4:58 PM, Andy Shevchenko wrote:
> On Thu, Sep 24, 2020 at 10:17 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>> On 9/24/20 10:30 AM, Andy Shevchenko wrote:
>>> On Wed, Sep 23, 2020 at 6:32 PM Arnd Bergmann <arnd@arndb.de> wrote:
>>>> On Wed, Sep 23, 2020 at 5:15 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
> 
> ...
> 
>>>> I think this should go to drivers/platform/x86 or drivers/platform/surface/
>>>> along with other laptop vendor specific code rather than drivers/misc/.
>>>
>>> +1 here. drivers/platform/surface is a good place to start.
>>> And you may begin with moving a few Surface drivers out of PDx86 to
>>> the new folder.
>>
>> Perfect, thanks! I'll draft up a patch series over the weekend.
>>
>> A couple questions regarding structure and maintenance:
>>
>>    - Should I CC the platform-driver-x86 list on future submissions to
>>      drivers/platform/surface? I.e. is this something you would want to
>>      review if it doesn't touch the drivers/platform/x86 directory?
> 
> Include PDx86 mailing list to the list of that. Current SURFACE*
> drivers have per driver record in MAINTAINERS IIRC. So, update them as
> well if needed.

Will do.

>>    - How would you want the layout to be, specifically regarding to the
>>      surface-aggregator stuff? My suggestion would be simply:
>>
>>      drivers/platform/surface/
> 
>>          surface_aggregator/
> 
> Don't repeat parts of the path, the aggregator is enough as a folder
> name, but the driver of course should be in its own namespace
> ('surface').

Okay.

>>              Kconfig
>>              Makefile
>>              core.c
>>              controller.c
>>              ... (all core stuff built into the surface_aggregator module)
>>          Kconfig
>>          Makefile
> 
>>          surface_aggregator_debugfs.c
> 
> (Not sure why it's not a part of aggregator folder)

I kind of thought of the aggregator folder to contain only files that
build the core module. surface_aggregator_debugfs is intended as
separate module, to be loaded when needed. So I'd consider it a client
driver to the aggregator in the same way that surface_acpi_notify is.

Let me know if you still want me to move this into the aggregator folder
though. Personally, I just feel that that might lead to a bit of
confusion, specifically the idea that it's built into the core when it's
not.

>>          surface_acpi_notify.c
>>          surface_*.c        (any other surface platform driver as well
>>                              as drivers dependent on surface_aggregator)
>>
>>    - Regarding future things like HID transport driver, battery/AC driver:
>>      Submit them to drivers/platform/surface or to their respective
>>      subsystem directories?
> 
> Respective subsystem _if_ it is a subsystem related driver and not
> kinda board file. Use common sense and existing examples.

Right, thank you!

Regards,
Max

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, back to index

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-23 15:15 [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module Maximilian Luz
2020-09-23 15:15 ` [RFC PATCH 2/9] surface_aggregator: Add control packet allocation chaching Maximilian Luz
2020-09-23 15:15 ` [RFC PATCH 3/9] surface_aggregator: Add event item " Maximilian Luz
2020-09-23 15:15 ` [RFC PATCH 4/9] surface_aggregator: Add trace points Maximilian Luz
2020-09-23 20:07   ` Steven Rostedt
2020-09-23 23:43     ` Maximilian Luz
2020-09-23 15:15 ` [RFC PATCH 5/9] surface_aggregator: Add error injection capabilities Maximilian Luz
2020-09-23 17:45   ` Greg Kroah-Hartman
2020-09-23 21:28     ` Maximilian Luz
2020-09-23 15:15 ` [RFC PATCH 7/9] docs: driver-api: Add Surface Aggregator subsystem documentation Maximilian Luz
2020-09-23 15:30 ` [RFC PATCH 0/9] Add support for Microsoft Surface System Aggregator Module Arnd Bergmann
2020-09-23 15:43   ` Maximilian Luz
2020-09-23 19:43     ` Arnd Bergmann
2020-09-23 23:28       ` Maximilian Luz
2020-09-24  8:26         ` Arnd Bergmann
2020-09-24 18:59           ` Maximilian Luz
2020-09-24 19:38             ` Arnd Bergmann
2020-09-24 21:07               ` Maximilian Luz
2020-09-25 14:53               ` Andy Shevchenko
2020-09-24  8:30   ` Andy Shevchenko
2020-09-24 19:17     ` Maximilian Luz
2020-09-25 14:58       ` Andy Shevchenko
2020-09-25 15:41         ` Maximilian Luz
     [not found] ` <20200923151511.3842150-2-luzmaximilian@gmail.com>
2020-09-23 16:57   ` [RFC PATCH 1/9] misc: Add Surface Aggregator subsystem Greg Kroah-Hartman
2020-09-23 20:34     ` Maximilian Luz
2020-09-24  6:48       ` Greg Kroah-Hartman
2020-09-24 18:16         ` Maximilian Luz

Linux-Serial Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-serial/0 linux-serial/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-serial linux-serial/ https://lore.kernel.org/linux-serial \
		linux-serial@vger.kernel.org
	public-inbox-index linux-serial

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-serial


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git