* [char-misc-next 0/4] mei: enhance mei pxp recoverability
@ 2023-10-11 11:01 Tomas Winkler
  2023-10-11 11:01 ` [char-misc-next 1/4] mei: bus: add send and recv api with timeout Tomas Winkler
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Tomas Winkler @ 2023-10-11 11:01 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Alexander Usyskin, Vitaly Lubart, linux-kernel, Tomas Winkler

- Add timeouts to the send/receive API on the mei bus to avoid
  infinite waits and the resulting deadlocks.
- Re-enable the mei pxp client upon loss of the connection.
- Retry the receive operation on memory-pressure-related failures.

Alan Previn (1):
  mei: update mei-pxp's component interface with timeouts

Alexander Usyskin (3):
  mei: bus: add send and recv api with timeout
  mei: pxp: recover from recv fail under memory pressure
  mei: pxp: re-enable client on errors

 drivers/gpu/drm/i915/pxp/intel_pxp_tee.c |  8 ++-
 drivers/misc/mei/bus.c                   | 88 +++++++++++++++++++++++-
 drivers/misc/mei/pxp/mei_pxp.c           | 88 +++++++++++++++++++++---
 include/drm/i915_pxp_tee_interface.h     |  6 +-
 include/linux/mei_cl_bus.h               |  8 +++
 5 files changed, 183 insertions(+), 15 deletions(-)

-- 
2.41.0



* [char-misc-next 1/4] mei: bus: add send and recv api with timeout
  2023-10-11 11:01 [char-misc-next 0/4] mei: enhance mei pxp recoverability Tomas Winkler
@ 2023-10-11 11:01 ` Tomas Winkler
  2023-10-11 11:01 ` [char-misc-next 2/4] mei: pxp: recover from recv fail under memory pressure Tomas Winkler
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: Tomas Winkler @ 2023-10-11 11:01 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Alexander Usyskin, Vitaly Lubart, linux-kernel, Alan Previn,
	Tomas Winkler

From: Alexander Usyskin <alexander.usyskin@intel.com>

Add variants of the send and recv functions on the bus
that take a timeout.
Callers can use these functions in flows that may get stuck,
to bail out instead of bringing the whole system down.
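
For example, a client driver on the bus could bound a potentially stuck
request/response exchange roughly like this (an illustrative sketch only;
the helper name and the 2000 ms timeout are made up and not part of this
patch; the API is declared in <linux/mei_cl_bus.h>):

  static ssize_t my_cl_xfer(struct mei_cl_device *cldev,
                            const u8 *req, size_t req_len,
                            u8 *resp, size_t resp_len)
  {
          ssize_t ret;

          /* bail out with -ETIME instead of waiting forever */
          ret = mei_cldev_send_timeout(cldev, req, req_len, 2000);
          if (ret < 0)
                  return ret;

          return mei_cldev_recv_timeout(cldev, resp, resp_len, 2000);
  }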

Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
---
 drivers/misc/mei/bus.c     | 88 +++++++++++++++++++++++++++++++++++++-
 include/linux/mei_cl_bus.h |  8 ++++
 2 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/mei/bus.c b/drivers/misc/mei/bus.c
index 67557c67bd214415b8dc6747..f9bcff197615128d72f17590 100644
--- a/drivers/misc/mei/bus.c
+++ b/drivers/misc/mei/bus.c
@@ -257,7 +257,7 @@ ssize_t __mei_cl_recv(struct mei_cl *cl, u8 *buf, size_t length, u8 *vtag,
 }
 
 /**
- * mei_cldev_send_vtag - me device send with vtag  (write)
+ * mei_cldev_send_vtag - me device send with vtag (write)
  *
  * @cldev: me client device
  * @buf: buffer to send
@@ -278,6 +278,29 @@ ssize_t mei_cldev_send_vtag(struct mei_cl_device *cldev, const u8 *buf,
 }
 EXPORT_SYMBOL_GPL(mei_cldev_send_vtag);
 
+/**
+ * mei_cldev_send_vtag_timeout - me device send with vtag and timeout (write)
+ *
+ * @cldev: me client device
+ * @buf: buffer to send
+ * @length: buffer length
+ * @vtag: virtual tag
+ * @timeout: send timeout in milliseconds, 0 for infinite timeout
+ *
+ * Return:
+ *  * written size in bytes
+ *  * < 0 on error
+ */
+
+ssize_t mei_cldev_send_vtag_timeout(struct mei_cl_device *cldev, const u8 *buf,
+				    size_t length, u8 vtag, unsigned long timeout)
+{
+	struct mei_cl *cl = cldev->cl;
+
+	return __mei_cl_send_timeout(cl, buf, length, vtag, MEI_CL_IO_TX_BLOCKING, timeout);
+}
+EXPORT_SYMBOL_GPL(mei_cldev_send_vtag_timeout);
+
 /**
  * mei_cldev_recv_vtag - client receive with vtag (read)
  *
@@ -323,7 +346,49 @@ ssize_t mei_cldev_recv_nonblock_vtag(struct mei_cl_device *cldev, u8 *buf,
 EXPORT_SYMBOL_GPL(mei_cldev_recv_nonblock_vtag);
 
 /**
- * mei_cldev_send - me device send  (write)
+ * mei_cldev_recv_timeout - client receive with timeout (read)
+ *
+ * @cldev: me client device
+ * @buf: buffer to receive
+ * @length: buffer length
+ * @timeout: recv timeout in milliseconds, 0 for infinite timeout
+ *
+ * Return:
+ * * read size in bytes
+ * *  < 0 on error
+ */
+ssize_t mei_cldev_recv_timeout(struct mei_cl_device *cldev, u8 *buf, size_t length,
+			       unsigned long timeout)
+{
+	return mei_cldev_recv_vtag_timeout(cldev, buf, length, NULL, timeout);
+}
+EXPORT_SYMBOL_GPL(mei_cldev_recv_timeout);
+
+/**
+ * mei_cldev_recv_vtag_timeout - client receive with vtag and timeout (read)
+ *
+ * @cldev: me client device
+ * @buf: buffer to receive
+ * @length: buffer length
+ * @vtag: virtual tag
+ * @timeout: recv timeout in milliseconds, 0 for infinite timeout
+ *
+ * Return:
+ * * read size in bytes
+ * *  < 0 on error
+ */
+
+ssize_t mei_cldev_recv_vtag_timeout(struct mei_cl_device *cldev, u8 *buf, size_t length,
+				    u8 *vtag, unsigned long timeout)
+{
+	struct mei_cl *cl = cldev->cl;
+
+	return __mei_cl_recv(cl, buf, length, vtag, 0, timeout);
+}
+EXPORT_SYMBOL_GPL(mei_cldev_recv_vtag_timeout);
+
+/**
+ * mei_cldev_send - me device send (write)
  *
  * @cldev: me client device
  * @buf: buffer to send
@@ -339,6 +404,25 @@ ssize_t mei_cldev_send(struct mei_cl_device *cldev, const u8 *buf, size_t length
 }
 EXPORT_SYMBOL_GPL(mei_cldev_send);
 
+/**
+ * mei_cldev_send_timeout - me device send with timeout (write)
+ *
+ * @cldev: me client device
+ * @buf: buffer to send
+ * @length: buffer length
+ * @timeout: send timeout in milliseconds, 0 for infinite timeout
+ *
+ * Return:
+ *  * written size in bytes
+ *  * < 0 on error
+ */
+ssize_t mei_cldev_send_timeout(struct mei_cl_device *cldev, const u8 *buf, size_t length,
+			       unsigned long timeout)
+{
+	return mei_cldev_send_vtag_timeout(cldev, buf, length, 0, timeout);
+}
+EXPORT_SYMBOL_GPL(mei_cldev_send_timeout);
+
 /**
  * mei_cldev_recv - client receive (read)
  *
diff --git a/include/linux/mei_cl_bus.h b/include/linux/mei_cl_bus.h
index c9af62e54577033bf9bae0e4..b38a56a13f39277f948c53b8 100644
--- a/include/linux/mei_cl_bus.h
+++ b/include/linux/mei_cl_bus.h
@@ -94,15 +94,23 @@ void mei_cldev_driver_unregister(struct mei_cl_driver *cldrv);
 
 ssize_t mei_cldev_send(struct mei_cl_device *cldev, const u8 *buf,
 		       size_t length);
+ssize_t mei_cldev_send_timeout(struct mei_cl_device *cldev, const u8 *buf,
+			       size_t length, unsigned long timeout);
 ssize_t mei_cldev_recv(struct mei_cl_device *cldev, u8 *buf, size_t length);
 ssize_t mei_cldev_recv_nonblock(struct mei_cl_device *cldev, u8 *buf,
 				size_t length);
+ssize_t mei_cldev_recv_timeout(struct mei_cl_device *cldev, u8 *buf, size_t length,
+			       unsigned long timeout);
 ssize_t mei_cldev_send_vtag(struct mei_cl_device *cldev, const u8 *buf,
 			    size_t length, u8 vtag);
+ssize_t mei_cldev_send_vtag_timeout(struct mei_cl_device *cldev, const u8 *buf,
+				    size_t length, u8 vtag, unsigned long timeout);
 ssize_t mei_cldev_recv_vtag(struct mei_cl_device *cldev, u8 *buf, size_t length,
 			    u8 *vtag);
 ssize_t mei_cldev_recv_nonblock_vtag(struct mei_cl_device *cldev, u8 *buf,
 				     size_t length, u8 *vtag);
+ssize_t mei_cldev_recv_vtag_timeout(struct mei_cl_device *cldev, u8 *buf, size_t length,
+			    u8 *vtag, unsigned long timeout);
 
 int mei_cldev_register_rx_cb(struct mei_cl_device *cldev, mei_cldev_cb_t rx_cb);
 int mei_cldev_register_notif_cb(struct mei_cl_device *cldev,
-- 
2.41.0



* [char-misc-next 2/4] mei: pxp: recover from recv fail under memory pressure
  2023-10-11 11:01 [char-misc-next 0/4] mei: enhance mei pxp recoverability Tomas Winkler
  2023-10-11 11:01 ` [char-misc-next 1/4] mei: bus: add send and recv api with timeout Tomas Winkler
@ 2023-10-11 11:01 ` Tomas Winkler
  2023-10-11 11:01 ` [char-misc-next 3/4] mei: pxp: re-enable client on errors Tomas Winkler
  2023-10-11 11:01 ` [char-misc-next 4/4] mei: update mei-pxp's component interface with timeouts Tomas Winkler
  3 siblings, 0 replies; 19+ messages in thread
From: Tomas Winkler @ 2023-10-11 11:01 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Alexander Usyskin, Vitaly Lubart, linux-kernel, Alan Previn,
	Tomas Winkler

From: Alexander Usyskin <alexander.usyskin@intel.com>

Under memory pressure, recv fails due to a kmalloc failure, and if
the driver (pxp) retries the send/receive pair, the send blocks
indefinitely.
A send without a completed recv leaves the channel in a bad state.

Retry the receive attempt after a small delay and reset the channel
if the retry also fails with a kmalloc failure.

Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
---
 drivers/misc/mei/pxp/mei_pxp.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/mei/pxp/mei_pxp.c b/drivers/misc/mei/pxp/mei_pxp.c
index 2dcb9169e404c3bcbbccc7b4..c6cdd6a47308ebcc72f34c38 100644
--- a/drivers/misc/mei/pxp/mei_pxp.c
+++ b/drivers/misc/mei/pxp/mei_pxp.c
@@ -11,6 +11,7 @@
  * negotiation messages to ME FW command payloads and vice versa.
  */
 
+#include <linux/delay.h>
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/mei.h>
@@ -61,16 +62,38 @@ mei_pxp_receive_message(struct device *dev, void *buffer, size_t size)
 {
 	struct mei_cl_device *cldev;
 	ssize_t byte;
+	bool retry = false;
 
 	if (!dev || !buffer)
 		return -EINVAL;
 
 	cldev = to_mei_cl_device(dev);
 
+retry:
 	byte = mei_cldev_recv(cldev, buffer, size);
 	if (byte < 0) {
 		dev_dbg(dev, "mei_cldev_recv failed. %zd\n", byte);
-		return byte;
+		if (byte != -ENOMEM)
+			return byte;
+
+		/* Retry the read when pages are reclaimed */
+		msleep(20);
+		if (!retry) {
+			retry = true;
+			goto retry;
+		} else {
+			dev_warn(dev, "No memory on data receive after retry, trying to reset the channel...\n");
+			byte = mei_cldev_disable(cldev);
+			if (byte < 0)
+				dev_warn(dev, "mei_cldev_disable failed. %zd\n", byte);
+			/*
+			 * Explicitly ignoring disable failure,
+			 * enable may fix the states and succeed
+			 */
+			byte = mei_cldev_enable(cldev);
+			if (byte < 0)
+				dev_err(dev, "mei_cldev_enable failed. %zd\n", byte);
+		}
 	}
 
 	return byte;
-- 
2.41.0



* [char-misc-next 3/4] mei: pxp: re-enable client on errors
  2023-10-11 11:01 [char-misc-next 0/4] mei: enhance mei pxp recoverability Tomas Winkler
  2023-10-11 11:01 ` [char-misc-next 1/4] mei: bus: add send and recv api with timeout Tomas Winkler
  2023-10-11 11:01 ` [char-misc-next 2/4] mei: pxp: recover from recv fail under memory pressure Tomas Winkler
@ 2023-10-11 11:01 ` Tomas Winkler
  2023-11-14 14:00     ` [Intel-gfx] " Ville Syrjälä
  2023-10-11 11:01 ` [char-misc-next 4/4] mei: update mei-pxp's component interface with timeouts Tomas Winkler
  3 siblings, 1 reply; 19+ messages in thread
From: Tomas Winkler @ 2023-10-11 11:01 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Alexander Usyskin, Vitaly Lubart, linux-kernel, Tomas Winkler

From: Alexander Usyskin <alexander.usyskin@intel.com>

Disable and enable mei-pxp client on errors to clean the internal state.

Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
---
 drivers/misc/mei/pxp/mei_pxp.c | 70 +++++++++++++++++++++++-----------
 1 file changed, 48 insertions(+), 22 deletions(-)

diff --git a/drivers/misc/mei/pxp/mei_pxp.c b/drivers/misc/mei/pxp/mei_pxp.c
index c6cdd6a47308ebcc72f34c38..9875d16445bb03efcfb31cd9 100644
--- a/drivers/misc/mei/pxp/mei_pxp.c
+++ b/drivers/misc/mei/pxp/mei_pxp.c
@@ -23,6 +23,24 @@
 
 #include "mei_pxp.h"
 
+static inline int mei_pxp_reenable(const struct device *dev, struct mei_cl_device *cldev)
+{
+	int ret;
+
+	dev_warn(dev, "Trying to reset the channel...\n");
+	ret = mei_cldev_disable(cldev);
+	if (ret < 0)
+		dev_warn(dev, "mei_cldev_disable failed. %d\n", ret);
+	/*
+	 * Explicitly ignoring disable failure,
+	 * enable may fix the states and succeed
+	 */
+	ret = mei_cldev_enable(cldev);
+	if (ret < 0)
+		dev_err(dev, "mei_cldev_enable failed. %d\n", ret);
+	return ret;
+}
+
 /**
  * mei_pxp_send_message() - Sends a PXP message to ME FW.
  * @dev: device corresponding to the mei_cl_device
@@ -35,6 +53,7 @@ mei_pxp_send_message(struct device *dev, const void *message, size_t size)
 {
 	struct mei_cl_device *cldev;
 	ssize_t byte;
+	int ret;
 
 	if (!dev || !message)
 		return -EINVAL;
@@ -44,10 +63,20 @@ mei_pxp_send_message(struct device *dev, const void *message, size_t size)
 	byte = mei_cldev_send(cldev, message, size);
 	if (byte < 0) {
 		dev_dbg(dev, "mei_cldev_send failed. %zd\n", byte);
-		return byte;
+		switch (byte) {
+		case -ENOMEM:
+			fallthrough;
+		case -ENODEV:
+			fallthrough;
+		case -ETIME:
+			ret = mei_pxp_reenable(dev, cldev);
+			if (ret)
+				byte = ret;
+			break;
+		}
 	}
 
-	return 0;
+	return byte;
 }
 
 /**
@@ -63,6 +92,7 @@ mei_pxp_receive_message(struct device *dev, void *buffer, size_t size)
 	struct mei_cl_device *cldev;
 	ssize_t byte;
 	bool retry = false;
+	int ret;
 
 	if (!dev || !buffer)
 		return -EINVAL;
@@ -73,26 +103,22 @@ mei_pxp_receive_message(struct device *dev, void *buffer, size_t size)
 	byte = mei_cldev_recv(cldev, buffer, size);
 	if (byte < 0) {
 		dev_dbg(dev, "mei_cldev_recv failed. %zd\n", byte);
-		if (byte != -ENOMEM)
-			return byte;
-
-		/* Retry the read when pages are reclaimed */
-		msleep(20);
-		if (!retry) {
-			retry = true;
-			goto retry;
-		} else {
-			dev_warn(dev, "No memory on data receive after retry, trying to reset the channel...\n");
-			byte = mei_cldev_disable(cldev);
-			if (byte < 0)
-				dev_warn(dev, "mei_cldev_disable failed. %zd\n", byte);
-			/*
-			 * Explicitly ignoring disable failure,
-			 * enable may fix the states and succeed
-			 */
-			byte = mei_cldev_enable(cldev);
-			if (byte < 0)
-				dev_err(dev, "mei_cldev_enable failed. %zd\n", byte);
+		switch (byte) {
+		case -ENOMEM:
+			/* Retry the read when pages are reclaimed */
+			msleep(20);
+			if (!retry) {
+				retry = true;
+				goto retry;
+			}
+			fallthrough;
+		case -ENODEV:
+			fallthrough;
+		case -ETIME:
+			ret = mei_pxp_reenable(dev, cldev);
+			if (ret)
+				byte = ret;
+			break;
 		}
 	}
 
-- 
2.41.0



* [char-misc-next 4/4] mei: update mei-pxp's component interface with timeouts
  2023-10-11 11:01 [char-misc-next 0/4] mei: enhance mei pxp recoverability Tomas Winkler
                   ` (2 preceding siblings ...)
  2023-10-11 11:01 ` [char-misc-next 3/4] mei: pxp: re-enable client on errors Tomas Winkler
@ 2023-10-11 11:01 ` Tomas Winkler
  3 siblings, 0 replies; 19+ messages in thread
From: Tomas Winkler @ 2023-10-11 11:01 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Alexander Usyskin, Vitaly Lubart, linux-kernel, Alan Previn,
	Tomas Winkler

From: Alan Previn <alan.previn.teres.alexis@intel.com>

When debugging platform- or firmware-related MEI-PXP connection
issues, having a timeout when clients (such as i915) call into
mei-pxp's send/receive functions has proven more useful than
blocking forever until the kernel triggers a watchdog panic.

Update the mei-pxp component interface send and receive functions
to take a timeout.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
---
 drivers/gpu/drm/i915/pxp/intel_pxp_tee.c |  8 ++++--
 drivers/misc/mei/pxp/mei_pxp.c           | 33 +++++++++++++++++++-----
 include/drm/i915_pxp_tee_interface.h     |  6 +++--
 3 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c b/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
index 80bb0018986525f16d410e56..dfc2878426fc2226ccb55270 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
@@ -20,6 +20,8 @@
 #include "intel_pxp_tee.h"
 #include "intel_pxp_types.h"
 
+#define PXP_TRANSPORT_TIMEOUT_MS 5000 /* 5 sec */
+
 static bool
 is_fw_err_platform_config(u32 type)
 {
@@ -71,13 +73,15 @@ static int intel_pxp_tee_io_message(struct intel_pxp *pxp,
 		goto unlock;
 	}
 
-	ret = pxp_component->ops->send(pxp_component->tee_dev, msg_in, msg_in_size);
+	ret = pxp_component->ops->send(pxp_component->tee_dev, msg_in, msg_in_size,
+				       PXP_TRANSPORT_TIMEOUT_MS);
 	if (ret) {
 		drm_err(&i915->drm, "Failed to send PXP TEE message\n");
 		goto unlock;
 	}
 
-	ret = pxp_component->ops->recv(pxp_component->tee_dev, msg_out, msg_out_max_size);
+	ret = pxp_component->ops->recv(pxp_component->tee_dev, msg_out, msg_out_max_size,
+				       PXP_TRANSPORT_TIMEOUT_MS);
 	if (ret < 0) {
 		drm_err(&i915->drm, "Failed to receive PXP TEE message\n");
 		goto unlock;
diff --git a/drivers/misc/mei/pxp/mei_pxp.c b/drivers/misc/mei/pxp/mei_pxp.c
index 9875d16445bb03efcfb31cd9..f77d78fa50549e69f0a0873b 100644
--- a/drivers/misc/mei/pxp/mei_pxp.c
+++ b/drivers/misc/mei/pxp/mei_pxp.c
@@ -46,10 +46,20 @@ static inline int mei_pxp_reenable(const struct device *dev, struct mei_cl_devic
  * @dev: device corresponding to the mei_cl_device
  * @message: a message buffer to send
  * @size: size of the message
- * Return: 0 on Success, <0 on Failure
+ * @timeout_ms: timeout in milliseconds, zero means wait indefinitely.
+ *
+ * Returns: 0 on Success, <0 on Failure with the following defined failures.
+ *         -ENODEV: Client was not connected.
+ *                  Caller may attempt to try again immediately.
+ *         -ENOMEM: Internal memory allocation failure experienced.
+ *                  Caller may sleep to allow kernel reclaim before retrying.
+ *         -EINTR : Calling thread received a signal. Caller may choose
+ *                  to abandon with the same thread id.
+ *         -ETIME : Request is timed out.
+ *                  Caller may attempt to try again immediately.
  */
 static int
-mei_pxp_send_message(struct device *dev, const void *message, size_t size)
+mei_pxp_send_message(struct device *dev, const void *message, size_t size, unsigned long timeout_ms)
 {
 	struct mei_cl_device *cldev;
 	ssize_t byte;
@@ -60,7 +70,7 @@ mei_pxp_send_message(struct device *dev, const void *message, size_t size)
 
 	cldev = to_mei_cl_device(dev);
 
-	byte = mei_cldev_send(cldev, message, size);
+	byte = mei_cldev_send_timeout(cldev, message, size, timeout_ms);
 	if (byte < 0) {
 		dev_dbg(dev, "mei_cldev_send failed. %zd\n", byte);
 		switch (byte) {
@@ -84,10 +94,21 @@ mei_pxp_send_message(struct device *dev, const void *message, size_t size)
  * @dev: device corresponding to the mei_cl_device
  * @buffer: a message buffer to contain the received message
  * @size: size of the buffer
- * Return: bytes sent on Success, <0 on Failure
+ * @timeout_ms: timeout in milliseconds, zero means wait indefinitely.
+ *
+ * Returns: number of bytes sent on Success, <0 on Failure with the following defined failures.
+ *         -ENODEV: Client was not connected.
+ *                  Caller may attempt to try again from send immediately.
+ *         -ENOMEM: Internal memory allocation failure experienced.
+ *                  Caller may sleep to allow kernel reclaim before retrying.
+ *         -EINTR : Calling thread received a signal. Caller will need to repeat calling
+ *                  (with a different owning thread) to retrieve existing unclaimed response
+ *                  (and may discard it).
+ *         -ETIME : Request is timed out.
+ *                  Caller may attempt to try again from send immediately.
  */
 static int
-mei_pxp_receive_message(struct device *dev, void *buffer, size_t size)
+mei_pxp_receive_message(struct device *dev, void *buffer, size_t size, unsigned long timeout_ms)
 {
 	struct mei_cl_device *cldev;
 	ssize_t byte;
@@ -100,7 +121,7 @@ mei_pxp_receive_message(struct device *dev, void *buffer, size_t size)
 	cldev = to_mei_cl_device(dev);
 
 retry:
-	byte = mei_cldev_recv(cldev, buffer, size);
+	byte = mei_cldev_recv_timeout(cldev, buffer, size, timeout_ms);
 	if (byte < 0) {
 		dev_dbg(dev, "mei_cldev_recv failed. %zd\n", byte);
 		switch (byte) {
diff --git a/include/drm/i915_pxp_tee_interface.h b/include/drm/i915_pxp_tee_interface.h
index a702b6ec17f7ca95eda7d225..7d96985f2d05327151b0dfc1 100644
--- a/include/drm/i915_pxp_tee_interface.h
+++ b/include/drm/i915_pxp_tee_interface.h
@@ -22,8 +22,10 @@ struct i915_pxp_component_ops {
 	 */
 	struct module *owner;
 
-	int (*send)(struct device *dev, const void *message, size_t size);
-	int (*recv)(struct device *dev, void *buffer, size_t size);
+	int (*send)(struct device *dev, const void *message, size_t size,
+		    unsigned long timeout_ms);
+	int (*recv)(struct device *dev, void *buffer, size_t size,
+		    unsigned long timeout_ms);
 	ssize_t (*gsc_command)(struct device *dev, u8 client_id, u32 fence_id,
 			       struct scatterlist *sg_in, size_t total_in_len,
 			       struct scatterlist *sg_out);
-- 
2.41.0



* Re: [char-misc-next 3/4] mei: pxp: re-enable client on errors
  2023-10-11 11:01 ` [char-misc-next 3/4] mei: pxp: re-enable client on errors Tomas Winkler
@ 2023-11-14 14:00     ` Ville Syrjälä
  0 siblings, 0 replies; 19+ messages in thread
From: Ville Syrjälä @ 2023-11-14 14:00 UTC (permalink / raw)
  To: Tomas Winkler
  Cc: Greg Kroah-Hartman, Alexander Usyskin, Vitaly Lubart,
	linux-kernel, intel-gfx, Alan Previn

On Wed, Oct 11, 2023 at 02:01:56PM +0300, Tomas Winkler wrote:
> From: Alexander Usyskin <alexander.usyskin@intel.com>
> 
> Disable and enable mei-pxp client on errors to clean the internal state.

This broke i915 on my Alderlake-P laptop.

Trying to start Xorg just hangs and I eventually have to power off the
laptop to get things back into shape.

The behaviour gets a bit better after commit fb99e79ee62a ("mei: update mei-pxp's
component interface with timeouts") as Xorg "only" gets blocked for
~10 seconds, after which it manages to start, and I get a bunch of spew
in dmesg:
[   25.431535] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   30.435241] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   30.435965] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   30.437341] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   30.437356] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[28]
[   35.555210] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   35.555919] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   35.555937] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg init arb session, ret=[-62]
[   35.555941] i915 0000:00:02.0: [drm] *ERROR* tee cmd for arb session creation failed
[   35.556765] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   36.021808] fuse: init (API version 7.39)
[   40.675183] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   40.676045] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   40.676591] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   40.676602] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[28]
[   40.960209] mate-session-ch[5936]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[   45.795172] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   45.795872] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   45.796520] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   50.915183] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   50.916005] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   50.916012] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[-62]
[   50.916846] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   56.035149] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[   56.035956] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   56.036585] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[   56.036592] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[28]
[   61.155137] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...

The same spew repeats every time I run any application that uses the GPU,
and the application also gets blocked for a long time (eg. firefox takes
over 15 seconds to start now).

> 
> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
> ---
>  drivers/misc/mei/pxp/mei_pxp.c | 70 +++++++++++++++++++++++-----------
>  1 file changed, 48 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/misc/mei/pxp/mei_pxp.c b/drivers/misc/mei/pxp/mei_pxp.c
> index c6cdd6a47308ebcc72f34c38..9875d16445bb03efcfb31cd9 100644
> --- a/drivers/misc/mei/pxp/mei_pxp.c
> +++ b/drivers/misc/mei/pxp/mei_pxp.c
> @@ -23,6 +23,24 @@
>  
>  #include "mei_pxp.h"
>  
> +static inline int mei_pxp_reenable(const struct device *dev, struct mei_cl_device *cldev)
> +{
> +	int ret;
> +
> +	dev_warn(dev, "Trying to reset the channel...\n");
> +	ret = mei_cldev_disable(cldev);
> +	if (ret < 0)
> +		dev_warn(dev, "mei_cldev_disable failed. %d\n", ret);
> +	/*
> +	 * Explicitly ignoring disable failure,
> +	 * enable may fix the states and succeed
> +	 */
> +	ret = mei_cldev_enable(cldev);
> +	if (ret < 0)
> +		dev_err(dev, "mei_cldev_enable failed. %d\n", ret);
> +	return ret;
> +}
> +
>  /**
>   * mei_pxp_send_message() - Sends a PXP message to ME FW.
>   * @dev: device corresponding to the mei_cl_device
> @@ -35,6 +53,7 @@ mei_pxp_send_message(struct device *dev, const void *message, size_t size)
>  {
>  	struct mei_cl_device *cldev;
>  	ssize_t byte;
> +	int ret;
>  
>  	if (!dev || !message)
>  		return -EINVAL;
> @@ -44,10 +63,20 @@ mei_pxp_send_message(struct device *dev, const void *message, size_t size)
>  	byte = mei_cldev_send(cldev, message, size);
>  	if (byte < 0) {
>  		dev_dbg(dev, "mei_cldev_send failed. %zd\n", byte);
> -		return byte;
> +		switch (byte) {
> +		case -ENOMEM:
> +			fallthrough;
> +		case -ENODEV:
> +			fallthrough;
> +		case -ETIME:
> +			ret = mei_pxp_reenable(dev, cldev);
> +			if (ret)
> +				byte = ret;
> +			break;
> +		}
>  	}
>  
> -	return 0;
> +	return byte;
>  }
>  
>  /**
> @@ -63,6 +92,7 @@ mei_pxp_receive_message(struct device *dev, void *buffer, size_t size)
>  	struct mei_cl_device *cldev;
>  	ssize_t byte;
>  	bool retry = false;
> +	int ret;
>  
>  	if (!dev || !buffer)
>  		return -EINVAL;
> @@ -73,26 +103,22 @@ mei_pxp_receive_message(struct device *dev, void *buffer, size_t size)
>  	byte = mei_cldev_recv(cldev, buffer, size);
>  	if (byte < 0) {
>  		dev_dbg(dev, "mei_cldev_recv failed. %zd\n", byte);
> -		if (byte != -ENOMEM)
> -			return byte;
> -
> -		/* Retry the read when pages are reclaimed */
> -		msleep(20);
> -		if (!retry) {
> -			retry = true;
> -			goto retry;
> -		} else {
> -			dev_warn(dev, "No memory on data receive after retry, trying to reset the channel...\n");
> -			byte = mei_cldev_disable(cldev);
> -			if (byte < 0)
> -				dev_warn(dev, "mei_cldev_disable failed. %zd\n", byte);
> -			/*
> -			 * Explicitly ignoring disable failure,
> -			 * enable may fix the states and succeed
> -			 */
> -			byte = mei_cldev_enable(cldev);
> -			if (byte < 0)
> -				dev_err(dev, "mei_cldev_enable failed. %zd\n", byte);
> +		switch (byte) {
> +		case -ENOMEM:
> +			/* Retry the read when pages are reclaimed */
> +			msleep(20);
> +			if (!retry) {
> +				retry = true;
> +				goto retry;
> +			}
> +			fallthrough;
> +		case -ENODEV:
> +			fallthrough;
> +		case -ETIME:
> +			ret = mei_pxp_reenable(dev, cldev);
> +			if (ret)
> +				byte = ret;
> +			break;
>  		}
>  	}
>  
> -- 
> 2.41.0
> 

-- 
Ville Syrjälä
Intel


* Re: [char-misc-next 3/4] mei: pxp: re-enable client on errors
  2023-11-14 14:00     ` [Intel-gfx] " Ville Syrjälä
@ 2023-11-14 15:31       ` Teres Alexis, Alan Previn
  -1 siblings, 0 replies; 19+ messages in thread
From: Teres Alexis, Alan Previn @ 2023-11-14 15:31 UTC (permalink / raw)
  To: ville.syrjala, Winkler, Tomas
  Cc: gregkh, Usyskin, Alexander, linux-kernel, intel-gfx, Lubart, Vitaly

On Tue, 2023-11-14 at 16:00 +0200, Ville Syrjälä wrote:
> On Wed, Oct 11, 2023 at 02:01:56PM +0300, Tomas Winkler wrote:
> > From: Alexander Usyskin <alexander.usyskin@intel.com>
> > 
> > Disable and enable mei-pxp client on errors to clean the internal state.
> 
> This broke i915 on my Alderlake-P laptop.
> 


Hi Alex, I just relooked at the series that got merged, and I noticed
that in patch #3 of the series you changed mei_pxp_send_message to
return the number of bytes sent instead of zero on success. IIRC, we had
agreed not to affect the behavior of this component interface (other
than adding the timeout) - that was the intention of patch #4, which I
was pushing for in order to spec the interface (and it continues to say
zero on success). We should fix this to stay with the original
behavior: mei-pxp should NOT send partial packets and will only return
zero in the success case, where success means the complete packet was
sent - so we don't need to get the "bytes sent" back from
mei_pxp_send_message. I think this is what is causing the problem.
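
Roughly something along these lines in mei_pxp_send_message would restore
the documented contract (just an illustrative sketch, not a tested patch;
returning -EFAULT for a partial send is only one possible choice):

         byte = mei_cldev_send_timeout(cldev, message, size, timeout_ms);
         if (byte < 0)
                 return byte;    /* keep the existing error handling here */
         if (byte != size)
                 return -EFAULT; /* mei-pxp must not send partial packets */

         return 0;               /* zero on success, per the interface spec */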


Side note to Ville: are you enabling the PXP kernel config by default in
all MESA contexts? I recall that the MESA folks were running some CI testing
with PXP contexts enabled, but I didn't realize this is being enabled by
default in all contexts. Please be aware that enabling PXP contexts
temporarily disables runtime-pm for the lifetime of those contexts.
Also, PXP contexts will be forced to be irrecoverable if they ever hang.
The former is a hardware architecture requirement, but it doesn't do anything
extra if you're enabling display (which I believe also blocks runtime-pm on
ADL). The latter was a requirement to comply with Vulkan.

...alan




* Re: [Intel-gfx] [char-misc-next 3/4] mei: pxp: re-enable client on errors
  2023-11-14 15:31       ` [Intel-gfx] " Teres Alexis, Alan Previn
@ 2023-11-14 18:40         ` Winkler, Tomas
  -1 siblings, 0 replies; 19+ messages in thread
From: Winkler, Tomas @ 2023-11-14 18:40 UTC (permalink / raw)
  To: Teres Alexis, Alan Previn, ville.syrjala, gregkh
  Cc: intel-gfx, Usyskin, Alexander, linux-kernel, Lubart,  Vitaly



> -----Original Message-----
> From: Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com>
> Sent: Tuesday, November 14, 2023 5:32 PM
> To: ville.syrjala@linux.intel.com; Winkler, Tomas <tomas.winkler@intel.com>
> Cc: gregkh@linuxfoundation.org; Usyskin, Alexander
> <alexander.usyskin@intel.com>; linux-kernel@vger.kernel.org; intel-
> gfx@lists.freedesktop.org; Lubart, Vitaly <vitaly.lubart@intel.com>
> Subject: Re: [char-misc-next 3/4] mei: pxp: re-enable client on errors
> 
> On Tue, 2023-11-14 at 16:00 +0200, Ville Syrjälä wrote:
> > On Wed, Oct 11, 2023 at 02:01:56PM +0300, Tomas Winkler wrote:
> > > From: Alexander Usyskin <alexander.usyskin@intel.com>
> > >
> > > Disable and enable mei-pxp client on errors to clean the internal state.
> >
> > This broke i915 on my Alderlake-P laptop.

This fix was already posted; it just missed the merge window:
https://lkml.org/lkml/2023/10/31/636

Greg, can you please take this fix into the v6.7-rc2 run, or I can repost it with the correct subject.
Thanks
Tomas


> >
> 
> 
> Hi Alex, i just relooked at the series that got merged, and i noticed that in patch
> #3 of the series, you had changed mei_pxp_send_message to return bytes sent
> instead of zero on success. IIRC, we had agreed to not effect the behavior of
> this component interface (other than adding the timeout) - this was the
> intention of Patch #4 that i was pushing for in order to spec the interface
> (which continues to say zero on success). We should fix this to stay with the
> original behavior - where mei-pxp should NOT send partial packets and will
> only return zero in success case where success is sending of the complete
> packets - so we don't need to get back the "bytes sent"
> from mei_pxp_send_message. So i think this might be causing the problem.
> 
> 
> Side note  to Ville:, are you enabling PXP kernel config by default in all MESA
> contexts? I recall that MESA folks were running some CI testing with enable
> pxp contexts, but didn't realize this is being enabled by default in all contexts.
> Please be aware that enabling pxp-contexts would temporarily disabled
> runtime-pm during that contexts lifetime.
> Also pxp contexts will be forced to be irrecoverable if it ever hangs.
> The former is a hardware architecture requirement but doesn't do anything if
> you're enabling display (which I beleive also blocks in ADL). The latter was a
> requirement to comply with Vulkan.
> 
> ...alan
> 



* Re: [Intel-gfx] [char-misc-next 3/4] mei: pxp: re-enable client on errors
  2023-11-14 15:31       ` [Intel-gfx] " Teres Alexis, Alan Previn
  (?)
  (?)
@ 2023-11-15 13:31       ` Tvrtko Ursulin
  2023-11-15 15:58           ` Teres Alexis, Alan Previn
  -1 siblings, 1 reply; 19+ messages in thread
From: Tvrtko Ursulin @ 2023-11-15 13:31 UTC (permalink / raw)
  To: Teres Alexis, Alan Previn, ville.syrjala, Winkler, Tomas
  Cc: gregkh, intel-gfx, Usyskin, Alexander, linux-kernel, Lubart, Vitaly


On 14/11/2023 15:31, Teres Alexis, Alan Previn wrote:
> On Tue, 2023-11-14 at 16:00 +0200, Ville Syrjälä wrote:
>> On Wed, Oct 11, 2023 at 02:01:56PM +0300, Tomas Winkler wrote:
>>> From: Alexander Usyskin <alexander.usyskin@intel.com>
>>>
>>> Disable and enable mei-pxp client on errors to clean the internal state.
>>
>> This broke i915 on my Alderlake-P laptop.
>>
> 
> 
> Hi Alex, i just relooked at the series that got merged, and i noticed
> that in patch #3 of the series, you had changed mei_pxp_send_message
> to return bytes sent instead of zero on success. IIRC, we had
> agreed to not effect the behavior of this component interface (other
> than adding the timeout) - this was the intention of Patch #4 that i
> was pushing for in order to spec the interface (which continues
> to say zero on success). We should fix this to stay with the original
> behavior - where mei-pxp should NOT send partial packets and
> will only return zero in success case where success is sending of
> the complete packets - so we don't need to get back the "bytes sent"
> from mei_pxp_send_message. So i think this might be causing the problem.
> 
> 
> Side note  to Ville:, are you enabling PXP kernel config by default in
> all MESA contexts? I recall that MESA folks were running some CI testing
> with enable pxp contexts, but didn't realize this is being enabled by
> default in all contexts. Please be aware that enabling pxp-contexts
> would temporarily disabled runtime-pm during that contexts lifetime.
> Also pxp contexts will be forced to be irrecoverable if it ever hangs.
> The former is a hardware architecture requirement but doesn't do anything
> if you're enabling display (which I beleive also blocks in ADL). The
> latter was a requirement to comply with Vulkan.

Regardless of mei_pxp_send_message being temporarily broken, don't
Ville's logs suggest the PXP detection is altogether messed up? AFAIR
the plan was exactly to avoid stalls during Mesa init, and new uapi was
added to achieve that. But it doesn't seem to be working?!

commit 3b918f4f0c8b5344af4058f1a12e2023363d0097
Author: Alan Previn <alan.previn.teres.alexis@intel.com>
Date:   Wed Aug 2 11:25:50 2023 -0700

     drm/i915/pxp: Optimize GET_PARAM:PXP_STATUS

     After recent discussions with Mesa folks, it was requested
     that we optimize i915's GET_PARAM for the PXP_STATUS without
     changing the UAPI spec.

     Add these additional optimizations:
        - If any PXP initializatoin flow failed, then ensure that
          we catch it so that we can change the returned PXP_STATUS
          from "2" (i.e. 'PXP is supported but not yet ready')
          to "-ENODEV". This typically should not happen and if it
          does, we have a platform configuration issue.
        - If a PXP arbitration session creation event failed
          due to incorrect firmware version or blocking SOC fusing
          or blocking BIOS configuration (platform reasons that won't
          change if we retry), then reflect that blockage by also
          returning -ENODEV in the GET_PARAM:PXP_STATUS.
        - GET_PARAM:PXP_STATUS should not wait at all if PXP is
          supported but non-i915 dependencies (component-driver /
          firmware) we are still pending to complete the init flows.
          In this case, just return "2" immediately (i.e. 'PXP is
          supported but not yet ready').

AFAIU if things fail there shouldn't be long waits, and repeated/constant
ones even less so.

Regards,

Tvrtko


* Re: [Intel-gfx] [char-misc-next 3/4] mei: pxp: re-enable client on errors
  2023-11-15 13:31       ` Tvrtko Ursulin
@ 2023-11-15 15:58           ` Teres Alexis, Alan Previn
  0 siblings, 0 replies; 19+ messages in thread
From: Teres Alexis, Alan Previn @ 2023-11-15 15:58 UTC (permalink / raw)
  To: ville.syrjala, Winkler, Tomas, tvrtko.ursulin
  Cc: gregkh, Usyskin, Alexander, intel-gfx, linux-kernel, Lubart, Vitaly

On Wed, 2023-11-15 at 13:31 +0000, Tvrtko Ursulin wrote:
> On 14/11/2023 15:31, Teres Alexis, Alan Previn wrote:
> > On Tue, 2023-11-14 at 16:00 +0200, Ville Syrjälä wrote:
> > > On Wed, Oct 11, 2023 at 02:01:56PM +0300, Tomas Winkler wrote:
> > > 
> 
> Regardless of the mei_pxp_send_message being temporarily broken, doesn't 
> Ville's logs suggest the PXP detection is altogether messed up? AFAIR 
> the plan was exactly to avoid stalls during Mesa init and new uapi was 
> added to achieve that. But it doesn't seem to be working?!
> 
> commit 3b918f4f0c8b5344af4058f1a12e2023363d0097
> Author: Alan Previn <alan.previn.teres.alexis@intel.com>
> Date:   Wed Aug 2 11:25:50 2023 -0700
> 
>      drm/i915/pxp: Optimize GET_PARAM:PXP_STATUS
> 
>      After recent discussions with Mesa folks, it was requested
>      that we optimize i915's GET_PARAM for the PXP_STATUS without
>      changing the UAPI spec.
> 
>      Add these additional optimizations:
>         - If any PXP initializatoin flow failed, then ensure that
>           we catch it so that we can change the returned PXP_STATUS
>           from "2" (i.e. 'PXP is supported but not yet ready')
>           to "-ENODEV". This typically should not happen and if it
>           does, we have a platform configuration issue.
>         - If a PXP arbitration session creation event failed
>           due to incorrect firmware version or blocking SOC fusing
>           or blocking BIOS configuration (platform reasons that won't
>           change if we retry), then reflect that blockage by also
>           returning -ENODEV in the GET_PARAM:PXP_STATUS.
>         - GET_PARAM:PXP_STATUS should not wait at all if PXP is
>           supported but non-i915 dependencies (component-driver /
>           firmware) we are still pending to complete the init flows.
>           In this case, just return "2" immediately (i.e. 'PXP is
>           supported but not yet ready').
> 
> AFAIU is things failed there shouldn't be long waits, repeated/constant 
> ones even less so.
> 
alan: I agree - but I don't think MESA is detecting PXP via the above path,
which is designed to be quick when using the GET_PARAM IOCTL - I believe MESA
may actually be opting in to enforce PXP. When this happens we are required
to be stringent when managing the protected hw sessions, and for certain PXP
operations we retry when a failure occurs - one in particular is the PXP hw
session teardown or reset. We are expected to retry up to 3 times to ensure
that the session is properly torn down (a requirement from architecture),
unless the error returned from the FW indicates that we don't have the proper
fusing or security assets in the platform (i.e. lower level aspects of the
platform won't permit PXP support). All of this is already coded.

So we have two problems here:

1. The code for enforcing PXP when PXP is explicitly requested by the UMD
expects the mei component driver to follow the mei component interface spec
(see: https://elixir.bootlin.com/linux/latest/source/drivers/misc/mei/pxp/mei_pxp.c#L25):
"Return: 0 on Success, <0 on Failure" for send and "Return: bytes sent on
Success, <0 on Failure" for receive. A change introduced by Alex caused a bug
at this lowest layer: send now returns a positive number on success, while
i915 was checking according to the spec:

         ret = pxp_component->ops->send(pxp_component->tee_dev, msg_in, msg_in_size,
                                        PXP_TRANSPORT_TIMEOUT_MS);
----->   if (ret) {
                 drm_err(&i915->drm, "Failed to send PXP TEE message\n");
                 goto unlock;
         }

         ret = pxp_component->ops->recv(pxp_component->tee_dev, msg_out, msg_out_max_size,
                                        PXP_TRANSPORT_TIMEOUT_MS);
         if (ret < 0) {
                 drm_err(&i915->drm, "Failed to receive PXP TEE message\n");
                 goto unlock;
         }

So I really cannot guarantee anything if the mei code itself is broken and not
complying with the spec we agreed on - I can change the above check for send
from "if (ret)" to "if (ret < 0)", but that would only be working around
the mei bug. As a side note, when we bail on a successful "send" thinking it's
a failure, and try to "send" again, it triggers a link reset on the mei-hw
link because the receive was never picked up. If I understand this correctly,
this is a hw-fw design requirement, i.e. the send-recv pair must be an atomic
FSM sequence. That said, if the send failure had been a real failure, the
timeout would not have been hit and the retry would have been much faster. As
another side note, Alex from the mei team did reply to me offline indicating
that the fix was supposed to have been sent via the "urgent queue" and he will
follow up on that.

2. I don't believe MESA is using the GET_PARAM ioctl for detecting PXP
availability; I think MESA is explicitly opting into creating PXP contexts.
That will of course trigger the backend PXP code that actually executes any
PXP operation. Previously, in the older code, MESA was creating PXP contexts
on every context creation just to test if PXP was available, which is much
worse - especially on MTL, where it can take much longer for the underlying
firmware to come up. I have an ongoing email thread with the MESA folks,
which I will include you on, to clarify this.
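
For reference, the quick detection path is just a GET_PARAM query along
these lines (a userspace sketch only, error handling elided; per the i915
uapi, 1 means ready, 2 means supported but not yet ready, and the ioctl
fails with -ENODEV when PXP is not supported):

         #include <errno.h>
         #include <xf86drm.h>
         #include <drm/i915_drm.h>

         static int pxp_status(int fd)
         {
                 int value = 0;
                 struct drm_i915_getparam gp = {
                         .param = I915_PARAM_PXP_STATUS,
                         .value = &value,
                 };

                 if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
                         return -errno;  /* e.g. -ENODEV: unsupported */
                 return value;           /* 1: ready, 2: not yet ready */
         }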


Problem #1 is resolved and the patch fix will need to be merged. But Problem #2
is still open, because Ville said he did not modify MESA to opt in to enforce
protected sessions for all the PXP contexts.

...alan



* Re: [Intel-gfx] [char-misc-next 3/4] mei: pxp: re-enable client on errors
  2023-11-14 18:40         ` Winkler, Tomas
@ 2023-11-15 20:35           ` Ville Syrjälä
  -1 siblings, 0 replies; 19+ messages in thread
From: Ville Syrjälä @ 2023-11-15 20:35 UTC (permalink / raw)
  To: Winkler, Tomas
  Cc: Teres Alexis, Alan Previn, gregkh, intel-gfx, Usyskin, Alexander,
	linux-kernel, Lubart, Vitaly

On Tue, Nov 14, 2023 at 06:40:26PM +0000, Winkler, Tomas wrote:
> 
> 
> > -----Original Message-----
> > From: Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com>
> > Sent: Tuesday, November 14, 2023 5:32 PM
> > To: ville.syrjala@linux.intel.com; Winkler, Tomas <tomas.winkler@intel.com>
> > Cc: gregkh@linuxfoundation.org; Usyskin, Alexander
> > <alexander.usyskin@intel.com>; linux-kernel@vger.kernel.org; intel-
> > gfx@lists.freedesktop.org; Lubart, Vitaly <vitaly.lubart@intel.com>
> > Subject: Re: [char-misc-next 3/4] mei: pxp: re-enable client on errors
> > 
> > On Tue, 2023-11-14 at 16:00 +0200, Ville Syrjälä wrote:
> > > On Wed, Oct 11, 2023 at 02:01:56PM +0300, Tomas Winkler wrote:
> > > > From: Alexander Usyskin <alexander.usyskin@intel.com>
> > > >
> > > > Disable and enable mei-pxp client on errors to clean the internal state.
> > >
> > > This broke i915 on my Alderlake-P laptop.
> 
> This fix was already posted, just missed the merging window
> https://lkml.org/lkml/2023/10/31/636

Gave this a spin and it fixes the issue for me. Thanks.

> 
> Greg can you please take this fix into v6.7-rc2 run, or I can repost it with the correct subject.
> Thanks
> Tomas
> 
> 
> > >
> > 
> > 
> > Hi Alex, I just relooked at the series that got merged, and I noticed that in patch
> > #3 of the series, you had changed mei_pxp_send_message to return bytes sent
> > instead of zero on success. IIRC, we had agreed to not affect the behavior of
> > this component interface (other than adding the timeout) - this was the
> > intention of Patch #4 that I was pushing for in order to spec the interface
> > (which continues to say zero on success). We should fix this to stay with the
> > original behavior - where mei-pxp should NOT send partial packets and will
> > only return zero in the success case, where success means sending the complete
> > packet - so we don't need to get back the "bytes sent"
> > from mei_pxp_send_message. So I think this might be causing the problem.
> > 
> > 
> > Side note to Ville: are you enabling the PXP kernel config by default in all MESA
> > contexts? I recall that MESA folks were running some CI testing with enabled
> > pxp contexts, but didn't realize this is being enabled by default in all contexts.
> > Please be aware that enabling pxp-contexts will temporarily disable
> > runtime-pm during that context's lifetime.
> > Also, pxp contexts will be forced to be irrecoverable if they ever hang.
> > The former is a hardware architecture requirement but doesn't do anything if
> > you're enabling display (which I believe also blocks it in ADL). The latter was a
> > requirement to comply with Vulkan.
> > 
> > ...alan
> > 
> 

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [char-misc-next 3/4] mei: pxp: re-enable client on errors
  2023-11-15 20:35           ` Ville Syrjälä
  (?)
@ 2023-11-27 13:22           ` Ville Syrjälä
  2023-11-27 13:31               ` gregkh
  -1 siblings, 1 reply; 19+ messages in thread
From: Ville Syrjälä @ 2023-11-27 13:22 UTC (permalink / raw)
  To: Winkler, Tomas
  Cc: Teres Alexis, Alan Previn, gregkh, intel-gfx, Usyskin, Alexander,
	linux-kernel, Lubart, Vitaly

On Wed, Nov 15, 2023 at 10:35:16PM +0200, Ville Syrjälä wrote:
> On Tue, Nov 14, 2023 at 06:40:26PM +0000, Winkler, Tomas wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com>
> > > Sent: Tuesday, November 14, 2023 5:32 PM
> > > To: ville.syrjala@linux.intel.com; Winkler, Tomas <tomas.winkler@intel.com>
> > > Cc: gregkh@linuxfoundation.org; Usyskin, Alexander
> > > <alexander.usyskin@intel.com>; linux-kernel@vger.kernel.org; intel-
> > > gfx@lists.freedesktop.org; Lubart, Vitaly <vitaly.lubart@intel.com>
> > > Subject: Re: [char-misc-next 3/4] mei: pxp: re-enable client on errors
> > > 
> > > On Tue, 2023-11-14 at 16:00 +0200, Ville Syrjälä wrote:
> > > > On Wed, Oct 11, 2023 at 02:01:56PM +0300, Tomas Winkler wrote:
> > > > > From: Alexander Usyskin <alexander.usyskin@intel.com>
> > > > >
> > > > > Disable and enable mei-pxp client on errors to clean the internal state.
> > > >
> > > > This broke i915 on my Alderlake-P laptop.
> > 
> > This fix was already posted, just missed the merging window
> > https://lkml.org/lkml/2023/10/31/636
> 
> Gave this a spin and it fixes the issue for me. Thanks.
> 
> > 
> > Greg can you please take this fix into v6.7-rc2 run, or I can repost it with the correct subject.

We're at -rc3 already and this fix is still not in!

> > Thanks
> > Tomas
> > 
> > 
> > > >
> > > 
> > > 
> > > Hi Alex, I just relooked at the series that got merged, and I noticed that in patch
> > > #3 of the series, you had changed mei_pxp_send_message to return bytes sent
> > > instead of zero on success. IIRC, we had agreed to not affect the behavior of
> > > this component interface (other than adding the timeout) - this was the
> > > intention of Patch #4 that I was pushing for in order to spec the interface
> > > (which continues to say zero on success). We should fix this to stay with the
> > > original behavior - where mei-pxp should NOT send partial packets and will
> > > only return zero in the success case, where success means sending the complete
> > > packet - so we don't need to get back the "bytes sent"
> > > from mei_pxp_send_message. So I think this might be causing the problem.
> > > 
> > > 
> > > Side note to Ville: are you enabling the PXP kernel config by default in all MESA
> > > contexts? I recall that MESA folks were running some CI testing with enabled
> > > pxp contexts, but didn't realize this is being enabled by default in all contexts.
> > > Please be aware that enabling pxp-contexts will temporarily disable
> > > runtime-pm during that context's lifetime.
> > > Also, pxp contexts will be forced to be irrecoverable if they ever hang.
> > > The former is a hardware architecture requirement but doesn't do anything if
> > > you're enabling display (which I believe also blocks it in ADL). The latter was a
> > > requirement to comply with Vulkan.
> > > 
> > > ...alan
> > > 
> > 
> 
> -- 
> Ville Syrjälä
> Intel

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [char-misc-next 3/4] mei: pxp: re-enable client on errors
  2023-11-27 13:22           ` [Intel-gfx] " Ville Syrjälä
@ 2023-11-27 13:31               ` gregkh
  0 siblings, 0 replies; 19+ messages in thread
From: gregkh @ 2023-11-27 13:31 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Teres Alexis, Alan Previn, intel-gfx, Usyskin, Alexander,
	linux-kernel, Winkler, Tomas, Lubart, Vitaly

On Mon, Nov 27, 2023 at 03:22:29PM +0200, Ville Syrjälä wrote:
> On Wed, Nov 15, 2023 at 10:35:16PM +0200, Ville Syrjälä wrote:
> > On Tue, Nov 14, 2023 at 06:40:26PM +0000, Winkler, Tomas wrote:
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com>
> > > > Sent: Tuesday, November 14, 2023 5:32 PM
> > > > To: ville.syrjala@linux.intel.com; Winkler, Tomas <tomas.winkler@intel.com>
> > > > Cc: gregkh@linuxfoundation.org; Usyskin, Alexander
> > > > <alexander.usyskin@intel.com>; linux-kernel@vger.kernel.org; intel-
> > > > gfx@lists.freedesktop.org; Lubart, Vitaly <vitaly.lubart@intel.com>
> > > > Subject: Re: [char-misc-next 3/4] mei: pxp: re-enable client on errors
> > > > 
> > > > On Tue, 2023-11-14 at 16:00 +0200, Ville Syrjälä wrote:
> > > > > On Wed, Oct 11, 2023 at 02:01:56PM +0300, Tomas Winkler wrote:
> > > > > > From: Alexander Usyskin <alexander.usyskin@intel.com>
> > > > > >
> > > > > > Disable and enable mei-pxp client on errors to clean the internal state.
> > > > >
> > > > > This broke i915 on my Alderlake-P laptop.
> > > 
> > > This fix was already posted, just missed the merging window
> > > https://lkml.org/lkml/2023/10/31/636
> > 
> > Gave this a spin and it fixes the issue for me. Thanks.
> > 
> > > 
> > > Greg can you please take this fix into v6.7-rc2 run, or I can repost it with the correct subject.
> 
> We're at -rc3 already and this fix is still not in!

Ah, good point, I'll take it now, sorry, catching up on things...

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-11-27 13:54 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-11 11:01 [char-misc-next 0/4] mei: enhance mei pxp recoverability Tomas Winkler
2023-10-11 11:01 ` [char-misc-next 1/4] mei: bus: add send and recv api with timeout Tomas Winkler
2023-10-11 11:01 ` [char-misc-next 2/4] mei: pxp: recover from recv fail under memory pressure Tomas Winkler
2023-10-11 11:01 ` [char-misc-next 3/4] mei: pxp: re-enable client on errors Tomas Winkler
2023-11-14 14:00   ` Ville Syrjälä
2023-11-14 14:00     ` [Intel-gfx] " Ville Syrjälä
2023-11-14 15:31     ` Teres Alexis, Alan Previn
2023-11-14 15:31       ` [Intel-gfx] " Teres Alexis, Alan Previn
2023-11-14 18:40       ` Winkler, Tomas
2023-11-14 18:40         ` Winkler, Tomas
2023-11-15 20:35         ` [Intel-gfx] " Ville Syrjälä
2023-11-15 20:35           ` Ville Syrjälä
2023-11-27 13:22           ` [Intel-gfx] " Ville Syrjälä
2023-11-27 13:31             ` gregkh
2023-11-27 13:31               ` gregkh
2023-11-15 13:31       ` Tvrtko Ursulin
2023-11-15 15:58         ` Teres Alexis, Alan Previn
2023-11-15 15:58           ` Teres Alexis, Alan Previn
2023-10-11 11:01 ` [char-misc-next 4/4] mei: update mei-pxp's component interface with timeouts Tomas Winkler
