[PATCH v2 0/5] soundwire: Fixes for spurious and missing UNATTACH


* [PATCH v2 0/5] soundwire: Fixes for spurious and missing UNATTACH
@ 2022-09-07  8:52 ` Richard Fitzgerald
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Fitzgerald @ 2022-09-07  8:52 UTC (permalink / raw)
  To: vkoul, yung-chuan.liao, pierre-louis.bossart, sanyog.r.kale
  Cc: alsa-devel, linux-kernel, patches, Richard Fitzgerald

The bus and cadence code has several bugs that cause UNATTACH notifications
to either be sent spuriously or to be missed.

These can be seen occasionally with a single peripheral on the bus, but are
much more frequent with multiple peripherals, where several peripherals
could change state and report in consecutive PINGs.

The root of all of these bugs seems to be a code design flaw that assumed
every PING status change would be handled separately. However, PINGs are
handled by a workqueue function and there is no guarantee when that function
will be scheduled to run or how much CPU time it will receive. PINGs will
continue while the work function is handling a snapshot of a previous PING,
so the code must take into account that (a) status could change during the
work function and (b) there can be a backlog of changes before the IRQ work
function runs again.

Tested with 4 peripherals on 1 bus, and 8 peripherals on 2 buses.

CHANGES SINCE V1:
Patch #3 replaced with a better solution to the same bug.
Patches #4 and #5 added to fix some more bugs that were found.

Richard Fitzgerald (4):
  soundwire: bus: Don't lose unattach notifications
  soundwire: bus: Don't re-enumerate before status is UNATTACHED
  soundwire: cadence: Fix lost ATTACHED interrupts when enumerating
  soundwire: bus: Don't exit early if no device IDs were programmed

Simon Trimmer (1):
  soundwire: cadence: fix updating slave status when a bus has multiple
    peripherals

 drivers/soundwire/bus.c            | 40 +++++++++++-----
 drivers/soundwire/cadence_master.c | 75 ++++++++++++++++--------------
 2 files changed, 68 insertions(+), 47 deletions(-)

-- 
2.30.2


* [PATCH v2 1/5] soundwire: cadence: fix updating slave status when a bus has multiple peripherals
  2022-09-07  8:52 ` Richard Fitzgerald
@ 2022-09-07  8:52   ` Richard Fitzgerald
  -1 siblings, 0 replies; 25+ messages in thread
From: Richard Fitzgerald @ 2022-09-07  8:52 UTC (permalink / raw)
  To: vkoul, yung-chuan.liao, pierre-louis.bossart, sanyog.r.kale
  Cc: alsa-devel, linux-kernel, patches, Simon Trimmer, Richard Fitzgerald

From: Simon Trimmer <simont@opensource.cirrus.com>

The cadence IP explicitly reports slave status changes with bits for
each possible change. The function cdns_update_slave_status() attempts
to translate this into the current status of each of the slaves.

However, when there are multiple peripherals on a bus, any slave that did
not have a status change when the work function ran would not have its
status updated. The array is initialised to a value that equates to
UNATTACHED, and this can cause spurious reports that slaves had dropped
off the bus.

With this change, in the case where a slave has no status change or has
multiple status changes, the value from the last PING command is used.
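
The fallback rule described above can be sketched in isolation. This is a minimal model, not the driver code: the `CHG_*` bit values, `enum dev_status`, and `resolve_status()` are hypothetical names chosen for illustration, and the real register layout is not reproduced. It assumes the caller already holds the state from the most recent PING.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status values for this sketch. */
enum dev_status {
	DEV_UNATTACHED = 0,
	DEV_ATTACHED,
	DEV_ALERT,
	DEV_RESERVED,
};

/* Hypothetical per-device "status changed" bits, not the real layout. */
#define CHG_RESERVED	0x1u
#define CHG_ATTACHED	0x2u
#define CHG_ALERT	0x4u
#define CHG_NPRESENT	0x8u
#define CHG_ALL		0xFu

/*
 * Trust the change bits only when exactly one of them is set.
 * With no bits (no change seen) or several bits (a backlog of changes),
 * fall back to the state from the most recent PING.
 */
enum dev_status resolve_status(uint32_t change_bits, enum dev_status last_ping)
{
	enum dev_status st = last_ping;
	unsigned int set = 0;

	assert((change_bits & ~CHG_ALL) == 0);

	if (change_bits & CHG_RESERVED) {
		st = DEV_RESERVED;
		set++;
	}
	if (change_bits & CHG_ATTACHED) {
		st = DEV_ATTACHED;
		set++;
	}
	if (change_bits & CHG_ALERT) {
		st = DEV_ALERT;
		set++;
	}
	if (change_bits & CHG_NPRESENT) {
		st = DEV_UNATTACHED;
		set++;
	}

	return (set == 1) ? st : last_ping;
}
```

In this model a device with no change bits keeps whatever the last PING said, instead of inheriting the zero-initialised (UNATTACHED) array entry.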

Signed-off-by: Simon Trimmer <simont@opensource.cirrus.com>
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 drivers/soundwire/cadence_master.c | 57 +++++++++++++-----------------
 1 file changed, 25 insertions(+), 32 deletions(-)

diff --git a/drivers/soundwire/cadence_master.c b/drivers/soundwire/cadence_master.c
index 4fbb19557f5e..245191d22ccd 100644
--- a/drivers/soundwire/cadence_master.c
+++ b/drivers/soundwire/cadence_master.c
@@ -782,6 +782,7 @@ static int cdns_update_slave_status(struct sdw_cdns *cdns,
 	enum sdw_slave_status status[SDW_MAX_DEVICES + 1];
 	bool is_slave = false;
 	u32 mask;
+	u32 val;
 	int i, set_status;
 
 	memset(status, 0, sizeof(status));
@@ -789,41 +790,38 @@ static int cdns_update_slave_status(struct sdw_cdns *cdns,
 	for (i = 0; i <= SDW_MAX_DEVICES; i++) {
 		mask = (slave_intstat >> (i * CDNS_MCP_SLAVE_STATUS_NUM)) &
 			CDNS_MCP_SLAVE_STATUS_BITS;
-		if (!mask)
-			continue;
 
-		is_slave = true;
 		set_status = 0;
 
-		if (mask & CDNS_MCP_SLAVE_INTSTAT_RESERVED) {
-			status[i] = SDW_SLAVE_RESERVED;
-			set_status++;
-		}
-
-		if (mask & CDNS_MCP_SLAVE_INTSTAT_ATTACHED) {
-			status[i] = SDW_SLAVE_ATTACHED;
-			set_status++;
-		}
+		if (mask) {
+			is_slave = true;
 
-		if (mask & CDNS_MCP_SLAVE_INTSTAT_ALERT) {
-			status[i] = SDW_SLAVE_ALERT;
-			set_status++;
-		}
+			if (mask & CDNS_MCP_SLAVE_INTSTAT_RESERVED) {
+				status[i] = SDW_SLAVE_RESERVED;
+				set_status++;
+			}
 
-		if (mask & CDNS_MCP_SLAVE_INTSTAT_NPRESENT) {
-			status[i] = SDW_SLAVE_UNATTACHED;
-			set_status++;
-		}
+			if (mask & CDNS_MCP_SLAVE_INTSTAT_ATTACHED) {
+				status[i] = SDW_SLAVE_ATTACHED;
+				set_status++;
+			}
 
-		/* first check if Slave reported multiple status */
-		if (set_status > 1) {
-			u32 val;
+			if (mask & CDNS_MCP_SLAVE_INTSTAT_ALERT) {
+				status[i] = SDW_SLAVE_ALERT;
+				set_status++;
+			}
 
-			dev_warn_ratelimited(cdns->dev,
-					     "Slave %d reported multiple Status: %d\n",
-					     i, mask);
+			if (mask & CDNS_MCP_SLAVE_INTSTAT_NPRESENT) {
+				status[i] = SDW_SLAVE_UNATTACHED;
+				set_status++;
+			}
+		}
 
-			/* check latest status extracted from PING commands */
+		/*
+		 * check that there was a single reported Slave status and when
+		 * there is not use the latest status extracted from PING commands
+		 */
+		if (set_status != 1) {
 			val = cdns_readl(cdns, CDNS_MCP_SLAVE_STAT);
 			val >>= (i * 2);
 
@@ -842,11 +840,6 @@ static int cdns_update_slave_status(struct sdw_cdns *cdns,
 				status[i] = SDW_SLAVE_RESERVED;
 				break;
 			}
-
-			dev_warn_ratelimited(cdns->dev,
-					     "Slave %d status updated to %d\n",
-					     i, status[i]);
-
 		}
 	}
 
-- 
2.30.2



* [PATCH v2 2/5] soundwire: bus: Don't lose unattach notifications
  2022-09-07  8:52 ` Richard Fitzgerald
@ 2022-09-07  8:52   ` Richard Fitzgerald
  -1 siblings, 0 replies; 25+ messages in thread
From: Richard Fitzgerald @ 2022-09-07  8:52 UTC (permalink / raw)
  To: vkoul, yung-chuan.liao, pierre-louis.bossart, sanyog.r.kale
  Cc: alsa-devel, linux-kernel, patches, Richard Fitzgerald

Ensure that if sdw_handle_slave_status() sees a peripheral
has dropped off the bus it reports it to the client driver.

If any devices are reporting on address 0, sdw_handle_slave_status()
bails out after programming the device IDs, so it never reaches the
second loop that calls sdw_update_slave_status().

If the missing device is one that is now showing as unenumerated,
it will have been given a device ID and so will report as attached
the next time sdw_handle_slave_status() runs.

With the previous code the client driver would only see another
ATTACHED notification because the UNATTACHED state was lost when
sdw_handle_slave_status() bailed out after programming the
device ID.

This shows up most when the peripheral has to be reset after
downloading updated firmware and there are several of these
peripherals on the bus. They will all return to the unenumerated state
after the reset, and then there is a mix of unattached, attached
and unenumerated PING states from the peripherals as each one is
reset and reboots.
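
The lost notification can be reproduced with a toy model of the two-pass handler. Everything here is an assumption made for illustration (`struct periph`, `handle_status()`, the `notify_in_first_pass` switch); it follows a single enumerated peripheral and reduces "programming device IDs" to an early return.

```c
#include <assert.h>
#include <stdbool.h>

enum dev_status { DEV_UNATTACHED = 0, DEV_ATTACHED = 1 };

/* Hypothetical view of one peripheral and what its driver was told. */
struct periph {
	enum dev_status bus_status;
	int unattach_notified;
	int attach_notified;
};

static void notify_driver(struct periph *p, enum dev_status st)
{
	if (st == DEV_UNATTACHED)
		p->unattach_notified++;
	else
		p->attach_notified++;
}

/*
 * dev0_present: the PING snapshot shows something on device #0.
 * ping: the PING snapshot state of this peripheral.
 */
static void handle_status(struct periph *p, bool dev0_present,
			  enum dev_status ping, bool notify_in_first_pass)
{
	/* First pass: record a peripheral that dropped off the bus. */
	if (ping == DEV_UNATTACHED && p->bus_status != DEV_UNATTACHED) {
		p->bus_status = DEV_UNATTACHED;
		if (notify_in_first_pass)
			notify_driver(p, DEV_UNATTACHED);
	}

	/* Device #0 is present: program IDs and leave the rest for later. */
	if (dev0_present)
		return;

	/* Second pass: record and report the snapshot state. */
	p->bus_status = ping;
	notify_driver(p, ping);
}

/* Returns how many UNATTACHED notifications the driver saw over a reset. */
int unattach_seen_after_reset(bool notify_in_first_pass)
{
	struct periph p = { .bus_status = DEV_ATTACHED };

	/* Snapshot 1: the peripheral has reset and device #0 is populated. */
	handle_status(&p, true, DEV_UNATTACHED, notify_in_first_pass);
	/* Snapshot 2: it has a device ID again and reports attached. */
	handle_status(&p, false, DEV_ATTACHED, notify_in_first_pass);

	assert(p.attach_notified == 1);
	return p.unattach_notified;
}
```

Without the first-pass notification the driver in this model sees only the ATTACHED event; with it, the driver sees UNATTACHED followed by ATTACHED.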

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 drivers/soundwire/bus.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
index d773eee71bc1..1cc858b4107d 100644
--- a/drivers/soundwire/bus.c
+++ b/drivers/soundwire/bus.c
@@ -1767,6 +1767,11 @@ int sdw_handle_slave_status(struct sdw_bus *bus,
 			dev_warn(&slave->dev, "Slave %d state check1: UNATTACHED, status was %d\n",
 				 i, slave->status);
 			sdw_modify_slave_status(slave, SDW_SLAVE_UNATTACHED);
+
+			/* Ensure driver knows that peripheral unattached */
+			ret = sdw_update_slave_status(slave, status[i]);
+			if (ret < 0)
+				dev_warn(&slave->dev, "Update Slave status failed:%d\n", ret);
 		}
 	}

-- 
2.30.2


* [PATCH v2 3/5] soundwire: bus: Don't re-enumerate before status is UNATTACHED
  2022-09-07  8:52 ` Richard Fitzgerald
@ 2022-09-07  8:52   ` Richard Fitzgerald
  -1 siblings, 0 replies; 25+ messages in thread
From: Richard Fitzgerald @ 2022-09-07  8:52 UTC (permalink / raw)
  To: vkoul, yung-chuan.liao, pierre-louis.bossart, sanyog.r.kale
  Cc: alsa-devel, linux-kernel, patches, Richard Fitzgerald

Don't re-enumerate a peripheral on #0 until we have seen and
handled an UNATTACHED notification for that peripheral.

Without this, it is possible for the UNATTACHED status to be missed
and so the slave->status remains at ATTACHED. If slave->status never
changes to UNATTACHED the child driver will never be notified of the
UNATTACH, and the code in sdw_handle_slave_status() will skip the
second part of enumeration because the slave->status has not changed.

This scenario can happen because PINGs are handled in a workqueue
function which is working from a snapshot of an old PING, and there
is no guarantee when this function will run.

A peripheral could report attached in the PING being handled by
sdw_handle_slave_status(), but have since reverted to device #0 and so
be found in the loop in sdw_program_device_num(). Previously the
code would not have updated slave->status to UNATTACHED because it had
not yet handled a PING where that peripheral was UNATTACHED.

This situation happens fairly frequently with multiple peripherals on
a bus that are intentionally reset (for example after downloading
firmware).
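
A minimal sketch of the guard, assuming a simplified peripheral record: `struct periph`, `try_program_dev_num()` and the other names are hypothetical, and assigning a device number is reduced to a field write.

```c
#include <assert.h>
#include <stdbool.h>

enum dev_status { DEV_UNATTACHED = 0, DEV_ATTACHED = 1 };

struct periph {
	enum dev_status bus_status;	/* state the bus code has handled so far */
	int dev_num;			/* 0 means not enumerated */
};

/* Returns true if a device number was programmed. */
bool try_program_dev_num(struct periph *p, int new_num)
{
	assert(new_num >= 1 && new_num <= 11);

	/* Refuse until the UNATTACHED state has been seen and handled. */
	if (p->bus_status != DEV_UNATTACHED)
		return false;

	p->dev_num = new_num;
	return true;
}

/* Handling a PING snapshot that shows the peripheral gone. */
void handle_unattached_ping(struct periph *p)
{
	p->bus_status = DEV_UNATTACHED;
}

/* Device number after one enumeration attempt on a reset peripheral. */
int dev_num_after_attempt(bool unattach_handled)
{
	struct periph p = { .bus_status = DEV_ATTACHED, .dev_num = 0 };

	if (unattach_handled)
		handle_unattached_ping(&p);
	try_program_dev_num(&p, 1);

	return p.dev_num;
}
```

The peripheral stays on device #0 until the stale ATTACHED state has been replaced, so no state-machine stage is skipped.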

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
---
 drivers/soundwire/bus.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
index 1cc858b4107d..6e569a875a9b 100644
--- a/drivers/soundwire/bus.c
+++ b/drivers/soundwire/bus.c
@@ -773,6 +773,16 @@ static int sdw_program_device_num(struct sdw_bus *bus)
 			if (sdw_compare_devid(slave, id) == 0) {
 				found = true;

+				/*
+				 * To prevent skipping state-machine stages don't
+				 * program a device until we've seen it UNATTACH.
+				 * Must return here because no other device on #0
+				 * can be detected until this one has been
+				 * assigned a device ID.
+				 */
+				if (slave->status != SDW_SLAVE_UNATTACHED)
+					return 0;
+
 				/*
 				 * Assign a new dev_num to this Slave and
 				 * not mark it present. It will be marked
-- 
2.30.2


* [PATCH v2 4/5] soundwire: cadence: Fix lost ATTACHED interrupts when enumerating
  2022-09-07  8:52 ` Richard Fitzgerald
@ 2022-09-07  8:52   ` Richard Fitzgerald
  -1 siblings, 0 replies; 25+ messages in thread
From: Richard Fitzgerald @ 2022-09-07  8:52 UTC (permalink / raw)
  To: vkoul, yung-chuan.liao, pierre-louis.bossart, sanyog.r.kale
  Cc: alsa-devel, linux-kernel, patches, Richard Fitzgerald

The correct way to handle interrupts is to clear the bits we
are about to handle _before_ handling them. Thus if the condition
then re-asserts during the handling we won't lose it.

This patch changes cdns_update_slave_status_work() to do this.

The previous code cleared the interrupts after handling them.
The problem with this is that when handling enumeration of devices
the ATTACH statuses can be accidentally cleared and so some or all
of the devices never complete their enumeration.

Thus we can have a situation like this:
- one or more devices are reverting to ID #0

- accumulated status bits indicate some devices attached and some
  on ID #0. (Remember: status bits are sticky until they are handled)

- Because of the device on #0, sdw_handle_slave_status() programs the
  device ID and exits without handling the other statuses, expecting
  to get an ATTACHED from this reprogrammed device.

- The device immediately starts reporting ATTACHED in PINGs, which
  will assert its CDNS_MCP_SLAVE_INTSTAT_ATTACHED bit.

- cdns_update_slave_status_work() clears INTSTAT0/1. If the initial
  status had CDNS_MCP_SLAVE_INTSTAT_ATTACHED bit set it will be
  cleared.

- The ATTACHED change for the device has now been lost.

- cdns_update_slave_status_work() clears CDNS_MCP_INT_SLAVE_MASK so
  if the new ATTACHED state had set it, it will be cleared without
  ever having been handled.

Unless there is some other state change from another device to cause
a new interrupt, the ATTACHED state of the reprogrammed device will
never cause an interrupt so its enumeration will not be completed.
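
The ordering problem can be shown with a small model of a sticky status register. It assumes write-1-to-clear behaviour (suggested by the driver writing the value it read back to the register); `pending_after_handler()` and the `INT_*` values are hypothetical names for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INT_ATTACHED	0x1u	/* hypothetical bit position */
#define INT_ALL		0xFu

/*
 * Returns the bits still pending after one run of the handler.
 * reasserted: bits the hardware sets while the snapshot is being handled.
 */
uint32_t pending_after_handler(uint32_t initial, uint32_t reasserted,
			       bool clear_first)
{
	uint32_t reg = initial;		/* sticky, write-1-to-clear register */
	uint32_t snapshot;

	assert(((initial | reasserted) & ~INT_ALL) == 0);

	snapshot = reg;			/* handler reads the pending bits */

	if (clear_first)
		reg &= ~snapshot;	/* acknowledge before handling */

	/* Handling takes time; the hardware asserts bits meanwhile. */
	reg |= reasserted;

	if (!clear_first)
		reg &= ~snapshot;	/* late acknowledge also wipes new assertions */

	return reg;
}
```

With the late acknowledge, an ATTACHED bit that re-asserts during handling is erased along with the old one. Acknowledging first leaves it pending for the next run.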

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
---
 drivers/soundwire/cadence_master.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/soundwire/cadence_master.c b/drivers/soundwire/cadence_master.c
index 245191d22ccd..3acd7b89c940 100644
--- a/drivers/soundwire/cadence_master.c
+++ b/drivers/soundwire/cadence_master.c
@@ -954,9 +954,22 @@ static void cdns_update_slave_status_work(struct work_struct *work)
 	u32 device0_status;
 	int retry_count = 0;
 
+	/*
+	 * Clear main interrupt first so we don't lose any assertions
+	 * that happen during this function.
+	 */
+	cdns_writel(cdns, CDNS_MCP_INTSTAT, CDNS_MCP_INT_SLAVE_MASK);
+
 	slave0 = cdns_readl(cdns, CDNS_MCP_SLAVE_INTSTAT0);
 	slave1 = cdns_readl(cdns, CDNS_MCP_SLAVE_INTSTAT1);
 
+	/*
+	 * Clear the bits before handling so we don't lose any
+	 * bits that re-assert.
+	 */
+	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT0, slave0);
+	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT1, slave1);
+
 	/* combine the two status */
 	slave_intstat = ((u64)slave1 << 32) | slave0;
 
@@ -964,8 +977,6 @@ static void cdns_update_slave_status_work(struct work_struct *work)
 
 update_status:
 	cdns_update_slave_status(cdns, slave_intstat);
-	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT0, slave0);
-	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT1, slave1);
 
 	/*
 	 * When there is more than one peripheral per link, it's
@@ -1001,8 +1012,7 @@ static void cdns_update_slave_status_work(struct work_struct *work)
 		}
 	}
 
-	/* clear and unmask Slave interrupt now */
-	cdns_writel(cdns, CDNS_MCP_INTSTAT, CDNS_MCP_INT_SLAVE_MASK);
+	/* unmask Slave interrupt now */
 	cdns_updatel(cdns, CDNS_MCP_INTMASK,
 		     CDNS_MCP_INT_SLAVE_MASK, CDNS_MCP_INT_SLAVE_MASK);
 
-- 
2.30.2



* [PATCH v2 5/5] soundwire: bus: Don't exit early if no device IDs were programmed
  2022-09-07  8:52 ` Richard Fitzgerald
@ 2022-09-07  8:52   ` Richard Fitzgerald
  -1 siblings, 0 replies; 25+ messages in thread
From: Richard Fitzgerald @ 2022-09-07  8:52 UTC (permalink / raw)
  To: vkoul, yung-chuan.liao, pierre-louis.bossart, sanyog.r.kale
  Cc: alsa-devel, linux-kernel, patches, Richard Fitzgerald

Only exit sdw_handle_slave_status() right after calling
sdw_program_device_num() if it actually programmed an ID into at
least one device.

sdw_handle_slave_status() should protect itself against phantom
device #0 ATTACHED indications. In that case there is no actual
device still on #0. The early exit relies on there being a status
change to ATTACHED on the reprogrammed device to trigger another
call to sdw_handle_slave_status() which will then handle the status
of all peripherals. If no device was actually programmed with an
ID there won't be a new ATTACHED indication. This can lead to the
status of other peripherals not being handled.

The status passed to sdw_handle_slave_status() is always from a point
in time in the past, and may indicate accumulated unhandled events
(depending on how the bus manager operates). It's possible that a
device ID is reprogrammed but the last PING status captured the state
just before that, when the device was still reporting on ID #0. Then
sdw_handle_slave_status() is called with this PING info, just before
a new PING status is available showing the device now on its new ID.
So sdw_handle_slave_status() will receive a phantom report of a
device on #0, but it will not find one.
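
A toy model of the early-exit decision, under the assumption that enumeration can only program devices that are really still on #0. `struct outcome`, `handle_status()` and its parameters are hypothetical and stand in for the bus code only loosely.

```c
#include <assert.h>
#include <stdbool.h>

struct outcome {
	int programmed;		/* devices that were given an ID */
	bool others_handled;	/* remaining peripherals were processed */
};

/*
 * ping_shows_dev0: the (possibly stale) PING snapshot reports device #0.
 * really_on_dev0: devices that are actually still on #0 right now.
 */
struct outcome handle_status(bool ping_shows_dev0, int really_on_dev0,
			     bool exit_only_if_programmed)
{
	struct outcome out = { 0, false };

	assert(really_on_dev0 >= 0);

	if (ping_shows_dev0) {
		/* Enumeration finds only what is really there. */
		out.programmed = really_on_dev0;

		/*
		 * Exiting early relies on a later ATTACHED change from a
		 * reprogrammed device to trigger another call.
		 */
		if (!exit_only_if_programmed || out.programmed > 0)
			return out;
	}

	out.others_handled = true;
	return out;
}
```

With a phantom device #0 report, the unconditional exit leaves the other peripherals unhandled and nothing arrives to trigger another pass. Exiting only when something was programmed avoids that.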

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
---
 drivers/soundwire/bus.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
index 6e569a875a9b..0bcc2d161eb9 100644
--- a/drivers/soundwire/bus.c
+++ b/drivers/soundwire/bus.c
@@ -736,20 +736,19 @@ static int sdw_program_device_num(struct sdw_bus *bus)
 	struct sdw_slave_id id;
 	struct sdw_msg msg;
 	bool found;
-	int count = 0, ret;
+	int count = 0, num_programmed = 0, ret;
 	u64 addr;
 
 	/* No Slave, so use raw xfer api */
 	ret = sdw_fill_msg(&msg, NULL, SDW_SCP_DEVID_0,
 			   SDW_NUM_DEV_ID_REGISTERS, 0, SDW_MSG_FLAG_READ, buf);
 	if (ret < 0)
-		return ret;
+		return 0;
 
 	do {
 		ret = sdw_transfer(bus, &msg);
 		if (ret == -ENODATA) { /* end of device id reads */
 			dev_dbg(bus->dev, "No more devices to enumerate\n");
-			ret = 0;
 			break;
 		}
 		if (ret < 0) {
@@ -781,7 +780,7 @@ static int sdw_program_device_num(struct sdw_bus *bus)
 				 * assigned a device ID.
 				 */
 				if (slave->status != SDW_SLAVE_UNATTACHED)
-					return 0;
+					return num_programmed;
 
 				/*
 				 * Assign a new dev_num to this Slave and
@@ -794,9 +793,11 @@ static int sdw_program_device_num(struct sdw_bus *bus)
 					dev_err(bus->dev,
 						"Assign dev_num failed:%d\n",
 						ret);
-					return ret;
+					return num_programmed;
 				}
 
+				++num_programmed;
+
 				break;
 			}
 		}
@@ -825,7 +826,7 @@ static int sdw_program_device_num(struct sdw_bus *bus)
 
 	} while (ret == 0 && count < (SDW_MAX_DEVICES * 2));
 
-	return ret;
+	return num_programmed;
 }
 
 static void sdw_modify_slave_status(struct sdw_slave *slave,
@@ -1787,14 +1788,16 @@ int sdw_handle_slave_status(struct sdw_bus *bus,
 
 	if (status[0] == SDW_SLAVE_ATTACHED) {
 		dev_dbg(bus->dev, "Slave attached, programming device number\n");
-		ret = sdw_program_device_num(bus);
-		if (ret < 0)
-			dev_err(bus->dev, "Slave attach failed: %d\n", ret);
+
 		/*
-		 * programming a device number will have side effects,
-		 * so we deal with other devices at a later time
+		 * Programming a device number will have side effects,
+		 * so we deal with other devices at a later time.
+		 * But only if any devices were reprogrammed, because
+		 * this relies on its PING state changing to ATTACHED,
+		 * triggering a status change.
 		 */
-		return ret;
+		if (sdw_program_device_num(bus))
+			return 0;
 	}
 
 	/* Continue to check other slave statuses */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] soundwire: bus: Don't re-enumerate before status is UNATTACHED
  2022-09-07  8:52   ` Richard Fitzgerald
@ 2022-09-12 11:00     ` Pierre-Louis Bossart
  -1 siblings, 0 replies; 25+ messages in thread
From: Pierre-Louis Bossart @ 2022-09-12 11:00 UTC (permalink / raw)
  To: Richard Fitzgerald, vkoul, yung-chuan.liao, sanyog.r.kale
  Cc: alsa-devel, linux-kernel, patches



On 9/7/22 10:52, Richard Fitzgerald wrote:
> Don't re-enumerate a peripheral on #0 until we have seen and
> handled an UNATTACHED notification for that peripheral.
> 
> Without this, it is possible for the UNATTACHED status to be missed
> and so the slave->status remains at ATTACHED. If slave->status never
> changes to UNATTACHED the child driver will never be notified of the
> UNATTACH, and the code in sdw_handle_slave_status() will skip the
> second part of enumeration because the slave->status has not changed.
> 
> This scenario can happen because PINGs are handled in a workqueue
> function which is working from a snapshot of an old PING, and there
> is no guarantee when this function will run.
> 
> A peripheral could report attached in the PING being handled by
> sdw_handle_slave_status(), but has since reverted to device #0 and is
> then found in the loop in sdw_program_device_num(). Previously the
> code would not have updated slave->status to UNATTACHED because it had
> not yet handled a PING where that peripheral had UNATTACHED.
> 
> This situation happens fairly frequently with multiple peripherals on
> a bus that are intentionally reset (for example after downloading
> firmware).
> 
> Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>

Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

> ---
>  drivers/soundwire/bus.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
> index 1cc858b4107d..6e569a875a9b 100644
> --- a/drivers/soundwire/bus.c
> +++ b/drivers/soundwire/bus.c
> @@ -773,6 +773,16 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>  			if (sdw_compare_devid(slave, id) == 0) {
>  				found = true;
>  
> +				/*
> +				 * To prevent skipping state-machine stages don't
> +				 * program a device until we've seen it UNATTACH.
> +				 * Must return here because no other device on #0
> +				 * can be detected until this one has been
> +				 * assigned a device ID.
> +				 */
> +				if (slave->status != SDW_SLAVE_UNATTACHED)
> +					return 0;
> +
>  				/*
>  				 * Assign a new dev_num to this Slave and
>  				 * not mark it present. It will be marked

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 4/5] soundwire: cadence: Fix lost ATTACHED interrupts when enumerating
  2022-09-07  8:52   ` Richard Fitzgerald
@ 2022-09-12 11:05     ` Pierre-Louis Bossart
  -1 siblings, 0 replies; 25+ messages in thread
From: Pierre-Louis Bossart @ 2022-09-12 11:05 UTC (permalink / raw)
  To: Richard Fitzgerald, vkoul, yung-chuan.liao, sanyog.r.kale
  Cc: alsa-devel, linux-kernel, patches



On 9/7/22 10:52, Richard Fitzgerald wrote:
> The correct way to handle interrupts is to clear the bits we
> are about to handle _before_ handling them. Thus if the condition
> then re-asserts during the handling we won't lose it.
> 
> This patch changes cdns_update_slave_status_work() to do this.
> 
> The previous code cleared the interrupts after handling them.
> The problem with this is that when handling enumeration of devices
> the ATTACH statuses can be accidentally cleared and so some or all
> of the devices never complete their enumeration.
> 
> Thus we can have a situation like this:
> - one or more devices are reverting to ID #0
> 
> - accumulated status bits indicate some devices attached and some
>   on ID #0. (Remember: status bits are sticky until they are handled)
> 
> - Because of device on #0 sdw_handle_slave_status() programs the
>   device ID and exits without handling the other status, expecting
>   to get an ATTACHED from this reprogrammed device.
> 
> - The device immediately starts reporting ATTACHED in PINGs, which
>   will assert its CDNS_MCP_SLAVE_INTSTAT_ATTACHED bit.
> 
> - cdns_update_slave_status_work() clears INTSTAT0/1. If the initial
>   status had CDNS_MCP_SLAVE_INTSTAT_ATTACHED bit set it will be
>   cleared.
> 
> - The ATTACHED change for the device has now been lost.
> 
> - cdns_update_slave_status_work() clears CDNS_MCP_INT_SLAVE_MASK so
>   if the new ATTACHED state had set it, it will be cleared without
>   ever having been handled.
> 
> Unless there is some other state change from another device to cause
> a new interrupt, the ATTACHED state of the reprogrammed device will
> never cause an interrupt so its enumeration will not be completed.
> 
> Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
> ---
>  drivers/soundwire/cadence_master.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/soundwire/cadence_master.c b/drivers/soundwire/cadence_master.c
> index 245191d22ccd..3acd7b89c940 100644
> --- a/drivers/soundwire/cadence_master.c
> +++ b/drivers/soundwire/cadence_master.c
> @@ -954,9 +954,22 @@ static void cdns_update_slave_status_work(struct work_struct *work)
>  	u32 device0_status;
>  	int retry_count = 0;
>  
> +	/*
> +	 * Clear main interrupt first so we don't lose any assertions
> +	 * that happen during this function.
> +	 */
> +	cdns_writel(cdns, CDNS_MCP_INTSTAT, CDNS_MCP_INT_SLAVE_MASK);
> +
>  	slave0 = cdns_readl(cdns, CDNS_MCP_SLAVE_INTSTAT0);
>  	slave1 = cdns_readl(cdns, CDNS_MCP_SLAVE_INTSTAT1);
>  
> +	/*
> +	 * Clear the bits before handling so we don't lose any
> +	 * bits that re-assert.
> +	 */
> +	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT0, slave0);
> +	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT1, slave1);
> +
>  	/* combine the two status */
>  	slave_intstat = ((u64)slave1 << 32) | slave0;
>  
> @@ -964,8 +977,6 @@ static void cdns_update_slave_status_work(struct work_struct *work)
>  
>  update_status:
>  	cdns_update_slave_status(cdns, slave_intstat);
> -	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT0, slave0);
> -	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT1, slave1);

This one is hard to review. If you don't clear the status here, how
does the retry work if there is a new event?

Put differently, do we still need the retry and the 'goto update_status'?

>  
>  	/*
>  	 * When there is more than one peripheral per link, it's
> @@ -1001,8 +1012,7 @@ static void cdns_update_slave_status_work(struct work_struct *work)
>  		}
>  	}
>  
> -	/* clear and unmask Slave interrupt now */
> -	cdns_writel(cdns, CDNS_MCP_INTSTAT, CDNS_MCP_INT_SLAVE_MASK);
> +	/* unmask Slave interrupt now */
>  	cdns_updatel(cdns, CDNS_MCP_INTMASK,
>  		     CDNS_MCP_INT_SLAVE_MASK, CDNS_MCP_INT_SLAVE_MASK);
>  

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] soundwire: bus: Don't exit early if no device IDs were programmed
  2022-09-07  8:52   ` Richard Fitzgerald
  (?)
@ 2022-09-12 11:43   ` Pierre-Louis Bossart
  2022-09-12 12:25       ` Richard Fitzgerald
  -1 siblings, 1 reply; 25+ messages in thread
From: Pierre-Louis Bossart @ 2022-09-12 11:43 UTC (permalink / raw)
  To: Richard Fitzgerald, vkoul, yung-chuan.liao, sanyog.r.kale
  Cc: patches, alsa-devel, linux-kernel



On 9/7/22 10:52, Richard Fitzgerald wrote:
> Only exit sdw_handle_slave_status() right after calling
> sdw_program_device_num() if it actually programmed an ID into at
> least one device.
> 
> sdw_handle_slave_status() should protect itself against phantom
> device #0 ATTACHED indications. In that case there is no actual
> device still on #0. The early exit relies on there being a status
> change to ATTACHED on the reprogrammed device to trigger another
> call to sdw_handle_slave_status() which will then handle the status
> of all peripherals. If no device was actually programmed with an
> ID there won't be a new ATTACHED indication. This can lead to the
> status of other peripherals not being handled.
> 
> The status passed to sdw_handle_slave_status() is always
> from a point in time in the past, and may indicate accumulated
> unhandled events (depending on how the bus manager operates). It's
> possible that a device ID is reprogrammed but the last PING status
> captured state just before that, when it was still reporting on
> ID #0. Then sdw_handle_slave_status() is called with this PING info,
> just before a new PING status is available showing it now on its new
> ID. So sdw_handle_slave_status() will receive a phantom report of a
> device on #0, but it will not find one.
> 
> Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
> ---
>  drivers/soundwire/bus.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
> index 6e569a875a9b..0bcc2d161eb9 100644
> --- a/drivers/soundwire/bus.c
> +++ b/drivers/soundwire/bus.c
> @@ -736,20 +736,19 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>  	struct sdw_slave_id id;
>  	struct sdw_msg msg;
>  	bool found;
> -	int count = 0, ret;
> +	int count = 0, num_programmed = 0, ret;
>  	u64 addr;
>  
>  	/* No Slave, so use raw xfer api */
>  	ret = sdw_fill_msg(&msg, NULL, SDW_SCP_DEVID_0,
>  			   SDW_NUM_DEV_ID_REGISTERS, 0, SDW_MSG_FLAG_READ, buf);
>  	if (ret < 0)
> -		return ret;
> +		return 0;

This doesn't seem quite right to me: there are multiple -EINVAL cases
handled in sdw_fill_msg().

I didn't check whether all of these error cases are irrelevant to this
specific enumeration case. If they are, maybe we need to split that
function into two helpers so that all the checks can be skipped.

>  
>  	do {
>  		ret = sdw_transfer(bus, &msg);
>  		if (ret == -ENODATA) { /* end of device id reads */
>  			dev_dbg(bus->dev, "No more devices to enumerate\n");
> -			ret = 0;
>  			break;
>  		}
>  		if (ret < 0) {
> @@ -781,7 +780,7 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>  				 * assigned a device ID.
>  				 */
>  				if (slave->status != SDW_SLAVE_UNATTACHED)
> -					return 0;
> +					return num_programmed;
>  
>  				/*
>  				 * Assign a new dev_num to this Slave and
> @@ -794,9 +793,11 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>  					dev_err(bus->dev,
>  						"Assign dev_num failed:%d\n",
>  						ret);
> -					return ret;
> +					return num_programmed;
>  				}
>  
> +				++num_programmed;
> +
>  				break;
>  			}
>  		}
> @@ -825,7 +826,7 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>  
>  	} while (ret == 0 && count < (SDW_MAX_DEVICES * 2));
>  
> -	return ret;
> +	return num_programmed;
>  }
>  
>  static void sdw_modify_slave_status(struct sdw_slave *slave,
> @@ -1787,14 +1788,16 @@ int sdw_handle_slave_status(struct sdw_bus *bus,
>  
>  	if (status[0] == SDW_SLAVE_ATTACHED) {
>  		dev_dbg(bus->dev, "Slave attached, programming device number\n");
> -		ret = sdw_program_device_num(bus);
> -		if (ret < 0)
> -			dev_err(bus->dev, "Slave attach failed: %d\n", ret);
> +
>  		/*
> -		 * programming a device number will have side effects,
> -		 * so we deal with other devices at a later time
> +		 * Programming a device number will have side effects,
> +		 * so we deal with other devices at a later time.
> +		 * But only if any devices were reprogrammed, because
> +		 * this relies on its PING state changing to ATTACHED,
> +		 * triggering a status change.
>  		 */
> -		return ret;
> +		if (sdw_program_device_num(bus))
> +			return 0;
>  	}
>  
>  	/* Continue to check other slave statuses */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] soundwire: bus: Don't exit early if no device IDs were programmed
  2022-09-12 11:43   ` Pierre-Louis Bossart
@ 2022-09-12 12:25       ` Richard Fitzgerald
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Fitzgerald @ 2022-09-12 12:25 UTC (permalink / raw)
  To: Pierre-Louis Bossart, vkoul, yung-chuan.liao, sanyog.r.kale
  Cc: patches, alsa-devel, linux-kernel

On 12/09/2022 12:43, Pierre-Louis Bossart wrote:
> 
> 
> On 9/7/22 10:52, Richard Fitzgerald wrote:
>> Only exit sdw_handle_slave_status() right after calling
>> sdw_program_device_num() if it actually programmed an ID into at
>> least one device.
>>
>> sdw_handle_slave_status() should protect itself against phantom
>> device #0 ATTACHED indications. In that case there is no actual
>> device still on #0. The early exit relies on there being a status
>> change to ATTACHED on the reprogrammed device to trigger another
>> call to sdw_handle_slave_status() which will then handle the status
>> of all peripherals. If no device was actually programmed with an
>> ID there won't be a new ATTACHED indication. This can lead to the
>> status of other peripherals not being handled.
>>
>> The status passed to sdw_handle_slave_status() is always
>> from a point in time in the past, and may indicate accumulated
>> unhandled events (depending on how the bus manager operates). It's
>> possible that a device ID is reprogrammed but the last PING status
>> captured state just before that, when it was still reporting on
>> ID #0. Then sdw_handle_slave_status() is called with this PING info,
>> just before a new PING status is available showing it now on its new
>> ID. So sdw_handle_slave_status() will receive a phantom report of a
>> device on #0, but it will not find one.
>>
>> Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
>> ---
>>   drivers/soundwire/bus.c | 27 +++++++++++++++------------
>>   1 file changed, 15 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
>> index 6e569a875a9b..0bcc2d161eb9 100644
>> --- a/drivers/soundwire/bus.c
>> +++ b/drivers/soundwire/bus.c
>> @@ -736,20 +736,19 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>>   	struct sdw_slave_id id;
>>   	struct sdw_msg msg;
>>   	bool found;
>> -	int count = 0, ret;
>> +	int count = 0, num_programmed = 0, ret;
>>   	u64 addr;
>>   
>>   	/* No Slave, so use raw xfer api */
>>   	ret = sdw_fill_msg(&msg, NULL, SDW_SCP_DEVID_0,
>>   			   SDW_NUM_DEV_ID_REGISTERS, 0, SDW_MSG_FLAG_READ, buf);
>>   	if (ret < 0)
>> -		return ret;
>> +		return 0;
> 
> This doesn't seem quite right to me: there are multiple -EINVAL cases
> handled in sdw_fill_msg().
> 
> I didn't check whether all of these error cases are irrelevant to this
> specific enumeration case. If they are, maybe we need to split that
> function into two helpers so that all the checks can be skipped.
> 

I don't think that there's anything useful that
sdw_handle_slave_status() could do to recover from an error.

If any device IDs were programmed then, according to the comment in
sdw_handle_slave_status():

	* programming a device number will have side effects,
	* so we deal with other devices at a later time

If this is true, then we need to exit to deal with what _was_
programmed, even if one of them failed.

If nothing was programmed, and there was an error, we can't bail out of
sdw_handle_slave_status(). We have status for other devices which
we can't simply ignore.

Ultimately I can't see how pushing the error code up is useful.
sdw_handle_slave_status() can't really do any effective recovery action,
and the original behavior of giving up and returning means that
an error in programming dev ID potentially causes collateral damage to
the status of other peripherals.

>>   
>>   	do {
>>   		ret = sdw_transfer(bus, &msg);
>>   		if (ret == -ENODATA) { /* end of device id reads */
>>   			dev_dbg(bus->dev, "No more devices to enumerate\n");
>> -			ret = 0;
>>   			break;
>>   		}
>>   		if (ret < 0) {
>> @@ -781,7 +780,7 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>>   				 * assigned a device ID.
>>   				 */
>>   				if (slave->status != SDW_SLAVE_UNATTACHED)
>> -					return 0;
>> +					return num_programmed;
>>   
>>   				/*
>>   				 * Assign a new dev_num to this Slave and
>> @@ -794,9 +793,11 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>>   					dev_err(bus->dev,
>>   						"Assign dev_num failed:%d\n",
>>   						ret);
>> -					return ret;
>> +					return num_programmed;
>>   				}
>>   
>> +				++num_programmed;
>> +
>>   				break;
>>   			}
>>   		}
>> @@ -825,7 +826,7 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>>   
>>   	} while (ret == 0 && count < (SDW_MAX_DEVICES * 2));
>>   
>> -	return ret;
>> +	return num_programmed;
>>   }
>>   
>>   static void sdw_modify_slave_status(struct sdw_slave *slave,
>> @@ -1787,14 +1788,16 @@ int sdw_handle_slave_status(struct sdw_bus *bus,
>>   
>>   	if (status[0] == SDW_SLAVE_ATTACHED) {
>>   		dev_dbg(bus->dev, "Slave attached, programming device number\n");
>> -		ret = sdw_program_device_num(bus);
>> -		if (ret < 0)
>> -			dev_err(bus->dev, "Slave attach failed: %d\n", ret);
>> +
>>   		/*
>> -		 * programming a device number will have side effects,
>> -		 * so we deal with other devices at a later time
>> +		 * Programming a device number will have side effects,
>> +		 * so we deal with other devices at a later time.
>> +		 * But only if any devices were reprogrammed, because
>> +		 * this relies on its PING state changing to ATTACHED,
>> +		 * triggering a status change.
>>   		 */
>> -		return ret;
>> +		if (sdw_program_device_num(bus))
>> +			return 0;
>>   	}
>>   
>>   	/* Continue to check other slave statuses */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] soundwire: bus: Don't exit early if no device IDs were programmed
@ 2022-09-12 12:25       ` Richard Fitzgerald
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Fitzgerald @ 2022-09-12 12:25 UTC (permalink / raw)
  To: Pierre-Louis Bossart, vkoul, yung-chuan.liao, sanyog.r.kale
  Cc: patches, alsa-devel, linux-kernel

On 12/09/2022 12:43, Pierre-Louis Bossart wrote:
> 
> 
> On 9/7/22 10:52, Richard Fitzgerald wrote:
>> Only exit sdw_handle_slave_status() right after calling
>> sdw_program_device_num() if it actually programmed an ID into at
>> least one device.
>>
>> sdw_handle_slave_status() should protect itself against phantom
>> device #0 ATTACHED indications. In that case there is no actual
>> device still on #0. The early exit relies on there being a status
>> change to ATTACHED on the reprogrammed device to trigger another
>> call to sdw_handle_slave_status() which will then handle the status
>> of all peripherals. If no device was actually programmed with an
>> ID there won't be a new ATTACHED indication. This can lead to the
>> status of other peripherals not being handled.
>>
>> The status passed to sdw_handle_slave_status() is always
>> from a point in time in the past, and may indicate accumulated
>> unhandled events (depending on how the bus manager operates). It's
>> possible that a device ID is reprogrammed but the last PING status
>> captured state just before that, when it was still reporting on
>> ID #0. Then sdw_handle_slave_status() is called with this PING info,
>> just before a new PING status is available showing it now on its new
>> ID. So sdw_handle_slave_status() will receive a phantom report of a
>> device on #0, but it will not find one.
>>
>> Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
>> ---
>>   drivers/soundwire/bus.c | 27 +++++++++++++++------------
>>   1 file changed, 15 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
>> index 6e569a875a9b..0bcc2d161eb9 100644
>> --- a/drivers/soundwire/bus.c
>> +++ b/drivers/soundwire/bus.c
>> @@ -736,20 +736,19 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>>   	struct sdw_slave_id id;
>>   	struct sdw_msg msg;
>>   	bool found;
>> -	int count = 0, ret;
>> +	int count = 0, num_programmed = 0, ret;
>>   	u64 addr;
>>   
>>   	/* No Slave, so use raw xfer api */
>>   	ret = sdw_fill_msg(&msg, NULL, SDW_SCP_DEVID_0,
>>   			   SDW_NUM_DEV_ID_REGISTERS, 0, SDW_MSG_FLAG_READ, buf);
>>   	if (ret < 0)
>> -		return ret;
>> +		return 0;
> 
> this doesn't seem quite right to me, there are multiple -EINVAL cases
> handled in sdw_fill_msg().
> 
> I didn't check if all these error cases are irrelevant in that specific
> enumeration case, if that was the case maybe we need to break that
> function in two helpers so that all the checks can be skipped.
> 

I don't think that there's anything useful that
sdw_handle_slave_status() could do to recover from an error.

If any device IDs were programmed then, according to the comment in
sdw_handle_slave_status():

	* programming a device number will have side effects,
	* so we deal with other devices at a later time

If this is true, then we need to exit to deal with what _was_
programmed, even if one of them failed.

If nothing was programmed, and there was an error, we can't bail out of
sdw_handle_slave_status(). We have status for other devices which
we can't simply ignore.

Ultimately I can't see how pushing the error code up is useful.
sdw_handle_slave_status() can't really do any effective recovery action,
and the original behavior of giving up and returning means that
an error in programming a dev ID potentially causes collateral damage to
the status of other peripherals.
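
The control flow argued for above can be sketched in userspace C. This is
only an illustration, not the kernel code: struct mock_bus,
mock_program_device_num() and mock_handle_status() are hypothetical
stand-ins. The only behaviour taken from the patch is that the enumeration
helper returns the number of IDs it programmed (never an error code) and
that the caller exits early only when that count is non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_DEVICES 4

/* Hypothetical stand-in for the bus state; not the real struct sdw_bus. */
struct mock_bus {
	int pending_on_dev0;		/* devices really waiting on ID #0 */
	bool xfer_fails;		/* simulate an error while programming */
	int handled[MOCK_MAX_DEVICES];	/* how often each status was handled */
};

/*
 * Returns the number of device IDs programmed, never an error code.
 * An error ends enumeration with whatever was programmed so far.
 */
static int mock_program_device_num(struct mock_bus *bus)
{
	int num_programmed = 0;

	while (bus->pending_on_dev0 > 0) {
		if (bus->xfer_fails)
			return num_programmed;
		bus->pending_on_dev0--;
		num_programmed++;
	}

	return num_programmed;
}

/* Returns true if it exited early to wait for a new ATTACHED report. */
static bool mock_handle_status(struct mock_bus *bus, bool dev0_attached)
{
	int i;

	assert(bus != NULL);

	/* Exit early only if at least one ID was really programmed. */
	if (dev0_attached && mock_program_device_num(bus))
		return true;

	/* Phantom #0 report or error: still handle every other device. */
	for (i = 1; i < MOCK_MAX_DEVICES; i++)
		bus->handled[i]++;

	return false;
}
```

With a phantom report (nothing pending on #0) or a failed transfer, the
sketch falls through and handles the other devices. With a real device on
#0 it returns early and leaves the rest for the next status report.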

>>   
>>   	do {
>>   		ret = sdw_transfer(bus, &msg);
>>   		if (ret == -ENODATA) { /* end of device id reads */
>>   			dev_dbg(bus->dev, "No more devices to enumerate\n");
>> -			ret = 0;
>>   			break;
>>   		}
>>   		if (ret < 0) {
>> @@ -781,7 +780,7 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>>   				 * assigned a device ID.
>>   				 */
>>   				if (slave->status != SDW_SLAVE_UNATTACHED)
>> -					return 0;
>> +					return num_programmed;
>>   
>>   				/*
>>   				 * Assign a new dev_num to this Slave and
>> @@ -794,9 +793,11 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>>   					dev_err(bus->dev,
>>   						"Assign dev_num failed:%d\n",
>>   						ret);
>> -					return ret;
>> +					return num_programmed;
>>   				}
>>   
>> +				++num_programmed;
>> +
>>   				break;
>>   			}
>>   		}
>> @@ -825,7 +826,7 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>>   
>>   	} while (ret == 0 && count < (SDW_MAX_DEVICES * 2));
>>   
>> -	return ret;
>> +	return num_programmed;
>>   }
>>   
>>   static void sdw_modify_slave_status(struct sdw_slave *slave,
>> @@ -1787,14 +1788,16 @@ int sdw_handle_slave_status(struct sdw_bus *bus,
>>   
>>   	if (status[0] == SDW_SLAVE_ATTACHED) {
>>   		dev_dbg(bus->dev, "Slave attached, programming device number\n");
>> -		ret = sdw_program_device_num(bus);
>> -		if (ret < 0)
>> -			dev_err(bus->dev, "Slave attach failed: %d\n", ret);
>> +
>>   		/*
>> -		 * programming a device number will have side effects,
>> -		 * so we deal with other devices at a later time
>> +		 * Programming a device number will have side effects,
>> +		 * so we deal with other devices at a later time.
>> +		 * But only if any devices were reprogrammed, because
>> +		 * this relies on its PING state changing to ATTACHED,
>> +		 * triggering a status change.
>>   		 */
>> -		return ret;
>> +		if (sdw_program_device_num(bus))
>> +			return 0;
>>   	}
>>   
>>   	/* Continue to check other slave statuses */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 4/5] soundwire: cadence: Fix lost ATTACHED interrupts when enumerating
  2022-09-12 11:05     ` Pierre-Louis Bossart
@ 2022-09-12 12:36       ` Richard Fitzgerald
  -1 siblings, 0 replies; 25+ messages in thread
From: Richard Fitzgerald @ 2022-09-12 12:36 UTC (permalink / raw)
  To: Pierre-Louis Bossart, vkoul, yung-chuan.liao, sanyog.r.kale
  Cc: alsa-devel, linux-kernel, patches



On 12/09/2022 12:05, Pierre-Louis Bossart wrote:
> 
> 
> On 9/7/22 10:52, Richard Fitzgerald wrote:
>> The correct way to handle interrupts is to clear the bits we
>> are about to handle _before_ handling them. Thus if the condition
>> then re-asserts during the handling we won't lose it.
>>
>> This patch changes cdns_update_slave_status_work() to do this.
>>
>> The previous code cleared the interrupts after handling them.
>> The problem with this is that when handling enumeration of devices
>> the ATTACH statuses can be accidentally cleared and so some or all
>> of the devices never complete their enumeration.
>>
>> Thus we can have a situation like this:
>> - one or more devices are reverting to ID #0
>>
>> - accumulated status bits indicate some devices attached and some
>>    on ID #0. (Remember: status bits are sticky until they are handled)
>>
>> - Because of device on #0 sdw_handle_slave_status() programs the
>>    device ID and exits without handling the other status, expecting
>>    to get an ATTACHED from this reprogrammed device.
>>
>> - The device immediately starts reporting ATTACHED in PINGs, which
>>    will assert its CDNS_MCP_SLAVE_INTSTAT_ATTACHED bit.
>>
>> - cdns_update_slave_status_work() clears INTSTAT0/1. If the initial
>>    status had CDNS_MCP_SLAVE_INTSTAT_ATTACHED bit set it will be
>>    cleared.
>>
>> - The ATTACHED change for the device has now been lost.
>>
>> - cdns_update_slave_status_work() clears CDNS_MCP_INT_SLAVE_MASK so
>>    if the new ATTACHED state had set it, it will be cleared without
>>    ever having been handled.
>>
>> Unless there is some other state change from another device to cause
>> a new interrupt, the ATTACHED state of the reprogrammed device will
>> never cause an interrupt so its enumeration will not be completed.
>>
>> Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
>> ---
>>   drivers/soundwire/cadence_master.c | 18 ++++++++++++++----
>>   1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/soundwire/cadence_master.c b/drivers/soundwire/cadence_master.c
>> index 245191d22ccd..3acd7b89c940 100644
>> --- a/drivers/soundwire/cadence_master.c
>> +++ b/drivers/soundwire/cadence_master.c
>> @@ -954,9 +954,22 @@ static void cdns_update_slave_status_work(struct work_struct *work)
>>   	u32 device0_status;
>>   	int retry_count = 0;
>>   
>> +	/*
>> +	 * Clear main interrupt first so we don't lose any assertions
>> +	 * that happen during this function.
>> +	 */
>> +	cdns_writel(cdns, CDNS_MCP_INTSTAT, CDNS_MCP_INT_SLAVE_MASK);
>> +
>>   	slave0 = cdns_readl(cdns, CDNS_MCP_SLAVE_INTSTAT0);
>>   	slave1 = cdns_readl(cdns, CDNS_MCP_SLAVE_INTSTAT1);
>>   
>> +	/*
>> +	 * Clear the bits before handling so we don't lose any
>> +	 * bits that re-assert.
>> +	 */
>> +	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT0, slave0);
>> +	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT1, slave1);
>> +
>>   	/* combine the two status */
>>   	slave_intstat = ((u64)slave1 << 32) | slave0;
>>   
>> @@ -964,8 +977,6 @@ static void cdns_update_slave_status_work(struct work_struct *work)
>>   
>>   update_status:
>>   	cdns_update_slave_status(cdns, slave_intstat);
>> -	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT0, slave0);
>> -	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT1, slave1);
> 
> this one is hard to review, if you don't clear the status here, then how
> does the retry work if there is a new event?

The retry loop doesn't work off the interrupt status bits. Precisely
because the #0 ATTACH bit probably doesn't re-assert if the PING status
for #0 doesn't change, the retry checks the most recent PING response
instead.

> 
> Put differently, do we need to retry and the 'goto update_status' any more?
> 

Yes, I believe you do still need it. The Cadence interrupts appear to
assert when there is a change of status. If there are multiple devices
reporting on dev ID #0 then the PING status of #0 will not change until
they have all been reprogrammed, so it will not automatically re-assert.

Anyway, I don't want to mix bugfixes with code improvements. If the loop
_could_ be removed that should be done separately from fixing the
interrupt handling bug.
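
The ordering problem this patch fixes can be shown with a small userspace
model. It is a sketch under stated assumptions: struct mock_intstat and the
mock_*() helpers are hypothetical, and the register is assumed to be a
sticky write-1-to-clear status register, which is how the patch treats
CDNS_MCP_SLAVE_INTSTAT0/1. The "during" argument models a bit that the
hardware asserts while the handler is still running.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sticky, write-1-to-clear interrupt status register. */
struct mock_intstat {
	uint32_t bits;
};

static uint32_t mock_read(const struct mock_intstat *reg)
{
	assert(reg != NULL);
	return reg->bits;
}

static void mock_write1_to_clear(struct mock_intstat *reg, uint32_t mask)
{
	reg->bits &= ~mask;
}

/* Models the hardware asserting bits while the handler is running. */
static void mock_hw_assert(struct mock_intstat *reg, uint32_t mask)
{
	reg->bits |= mask;
}

/* Old order: handle first, clear afterwards. Returns bits left pending. */
static uint32_t mock_handle_then_clear(struct mock_intstat *reg,
				       uint32_t during)
{
	uint32_t snapshot = mock_read(reg);

	mock_hw_assert(reg, during);		/* event arrives mid-handling */
	mock_write1_to_clear(reg, snapshot);	/* can wipe the new event */

	return mock_read(reg);
}

/* New order: clear what was read, then handle. Returns bits left pending. */
static uint32_t mock_clear_then_handle(struct mock_intstat *reg,
				       uint32_t during)
{
	uint32_t snapshot = mock_read(reg);

	mock_write1_to_clear(reg, snapshot);	/* ack exactly what we saw */
	mock_hw_assert(reg, during);		/* event arrives mid-handling */

	return mock_read(reg);
}
```

If the snapshot already had the ATTACHED bit set and the same bit asserts
again during handling, the old order leaves nothing pending, so the new
event is lost. The new order leaves the bit set for the next pass.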

>>   
>>   	/*
>>   	 * When there is more than one peripheral per link, it's
>> @@ -1001,8 +1012,7 @@ static void cdns_update_slave_status_work(struct work_struct *work)
>>   		}
>>   	}
>>   
>> -	/* clear and unmask Slave interrupt now */
>> -	cdns_writel(cdns, CDNS_MCP_INTSTAT, CDNS_MCP_INT_SLAVE_MASK);
>> +	/* unmask Slave interrupt now */
>>   	cdns_updatel(cdns, CDNS_MCP_INTMASK,
>>   		     CDNS_MCP_INT_SLAVE_MASK, CDNS_MCP_INT_SLAVE_MASK);
>>   

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] soundwire: bus: Don't exit early if no device IDs were programmed
  2022-09-12 12:25       ` Richard Fitzgerald
  (?)
@ 2022-09-12 17:09       ` Pierre-Louis Bossart
  2022-09-13 15:30           ` Richard Fitzgerald
  -1 siblings, 1 reply; 25+ messages in thread
From: Pierre-Louis Bossart @ 2022-09-12 17:09 UTC (permalink / raw)
  To: Richard Fitzgerald, vkoul, yung-chuan.liao, sanyog.r.kale
  Cc: patches, alsa-devel, linux-kernel



On 9/12/22 14:25, Richard Fitzgerald wrote:
> On 12/09/2022 12:43, Pierre-Louis Bossart wrote:
>>
>>
>> On 9/7/22 10:52, Richard Fitzgerald wrote:
>>> Only exit sdw_handle_slave_status() right after calling
>>> sdw_program_device_num() if it actually programmed an ID into at
>>> least one device.
>>>
>>> sdw_handle_slave_status() should protect itself against phantom
>>> device #0 ATTACHED indications. In that case there is no actual
>>> device still on #0. The early exit relies on there being a status
>>> change to ATTACHED on the reprogrammed device to trigger another
>>> call to sdw_handle_slave_status() which will then handle the status
>>> of all peripherals. If no device was actually programmed with an
>>> ID there won't be a new ATTACHED indication. This can lead to the
>>> status of other peripherals not being handled.
>>>
>>> The status passed to sdw_handle_slave_status() is obviously always
>>> from a point of time in the past, and may indicate accumulated
>>> unhandled events (depending how the bus manager operates). It's
>>> possible that a device ID is reprogrammed but the last PING status
>>> captured state just before that, when it was still reporting on
>>> ID #0. Then sdw_handle_slave_status() is called with this PING info,
>>> just before a new PING status is available showing it now on its new
>>> ID. So sdw_handle_slave_status() will receive a phantom report of a
>>> device on #0, but it will not find one.
>>>
>>> Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
>>> ---
>>>   drivers/soundwire/bus.c | 27 +++++++++++++++------------
>>>   1 file changed, 15 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
>>> index 6e569a875a9b..0bcc2d161eb9 100644
>>> --- a/drivers/soundwire/bus.c
>>> +++ b/drivers/soundwire/bus.c
>>> @@ -736,20 +736,19 @@ static int sdw_program_device_num(struct
>>> sdw_bus *bus)
>>>       struct sdw_slave_id id;
>>>       struct sdw_msg msg;
>>>       bool found;
>>> -    int count = 0, ret;
>>> +    int count = 0, num_programmed = 0, ret;
>>>       u64 addr;
>>>         /* No Slave, so use raw xfer api */
>>>       ret = sdw_fill_msg(&msg, NULL, SDW_SCP_DEVID_0,
>>>                  SDW_NUM_DEV_ID_REGISTERS, 0, SDW_MSG_FLAG_READ, buf);
>>>       if (ret < 0)
>>> -        return ret;
>>> +        return 0;
>>
>> this doesn't seem quite right to me, there are multiple -EINVAL cases
>> handled in sdw_fill_msg().
>>
>> I didn't check if all these error cases are irrelevant in that specific
>> enumeration case, if that was the case maybe we need to break that
>> function in two helpers so that all the checks can be skipped.
>>
> 
> I don't think that there's anything useful that
> sdw_handle_slave_status() could do to recover from an error.
> 
> If any device IDs were programmed then, according to the comment in
> sdw_handle_slave_status():
> 
>     * programming a device number will have side effects,
>     * so we deal with other devices at a later time
> 
> If this is true, then we need to exit to deal with what _was_
> programmed, even if one of them failed.
> 
> If nothing was programmed, and there was an error, we can't bail out of
> sdw_handle_slave_status(). We have status for other devices which
> we can't simply ignore.
> 
> Ultimately I can't see how pushing the error code up is useful.
> sdw_handle_slave_status() can't really do any effective recovery action,
> and the original behavior of giving up and returning means that
> an error in programming a dev ID potentially causes collateral damage to
> the status of other peripherals.

I was suggesting something like


void sdw_fill_msg_data(...)
{
  copy data in the msg structure
}

int sdw_fill_msg(...)
{
    sdw_fill_msg_data();
    handle_error_cases
}

and in sdw_program_device_num() we call sdw_fill_msg_data() directly.

So no change in functionality beyond explicitly skipping error checks that
are not relevant and could not be handled even if they were.
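
A compilable userspace sketch of that split, for illustration only.
struct mock_msg, mock_fill_msg_data(), mock_fill_msg() and
MOCK_REG_NO_PAGE are hypothetical names with assumed values, not the
kernel definitions, and the paging checks are reduced to a placeholder
that rejects every paged address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified message; not the real struct sdw_msg. */
struct mock_msg {
	uint16_t addr;
	size_t len;
	uint8_t dev_num;
	uint8_t flags;
	uint8_t *buf;
};

/* Assumed threshold: addresses below this need no paging. */
#define MOCK_REG_NO_PAGE 0x8000u

/* Copies the fields into the message; cannot fail. */
static void mock_fill_msg_data(struct mock_msg *msg, uint32_t addr,
			       size_t count, uint8_t dev_num,
			       uint8_t flags, uint8_t *buf)
{
	assert(msg != NULL);

	memset(msg, 0, sizeof(*msg));
	msg->addr = (uint16_t)addr;	/* only the low 16 bits are kept */
	msg->len = count;
	msg->dev_num = dev_num;
	msg->flags = flags;
	msg->buf = buf;
}

/* Full helper: fill the data, then apply the address checks. */
static int mock_fill_msg(struct mock_msg *msg, uint32_t addr, size_t count,
			 uint8_t dev_num, uint8_t flags, uint8_t *buf)
{
	mock_fill_msg_data(msg, addr, count, dev_num, flags, buf);

	if (addr < MOCK_REG_NO_PAGE)
		return 0;

	/* Placeholder for the paging checks; rejected wholesale here. */
	return -EINVAL;
}
```

A caller that only touches low, unpaged addresses could then use
mock_fill_msg_data() and have no error path at all, while other callers
keep the checked mock_fill_msg().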




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] soundwire: bus: Don't exit early if no device IDs were programmed
  2022-09-12 17:09       ` Pierre-Louis Bossart
@ 2022-09-13 15:30           ` Richard Fitzgerald
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Fitzgerald @ 2022-09-13 15:30 UTC (permalink / raw)
  To: Pierre-Louis Bossart, vkoul, yung-chuan.liao, sanyog.r.kale
  Cc: patches, alsa-devel, linux-kernel

On 12/09/2022 18:09, Pierre-Louis Bossart wrote:
> 
> 
> On 9/12/22 14:25, Richard Fitzgerald wrote:
>> On 12/09/2022 12:43, Pierre-Louis Bossart wrote:
>>>
>>>
>>> On 9/7/22 10:52, Richard Fitzgerald wrote:
>>>> Only exit sdw_handle_slave_status() right after calling
>>>> sdw_program_device_num() if it actually programmed an ID into at
>>>> least one device.
>>>>
>>>> sdw_handle_slave_status() should protect itself against phantom
>>>> device #0 ATTACHED indications. In that case there is no actual
>>>> device still on #0. The early exit relies on there being a status
>>>> change to ATTACHED on the reprogrammed device to trigger another
>>>> call to sdw_handle_slave_status() which will then handle the status
>>>> of all peripherals. If no device was actually programmed with an
>>>> ID there won't be a new ATTACHED indication. This can lead to the
>>>> status of other peripherals not being handled.
>>>>
>>>> The status passed to sdw_handle_slave_status() is obviously always
>>>> from a point of time in the past, and may indicate accumulated
>>>> unhandled events (depending how the bus manager operates). It's
>>>> possible that a device ID is reprogrammed but the last PING status
>>>> captured state just before that, when it was still reporting on
>>>> ID #0. Then sdw_handle_slave_status() is called with this PING info,
>>>> just before a new PING status is available showing it now on its new
>>>> ID. So sdw_handle_slave_status() will receive a phantom report of a
>>>> device on #0, but it will not find one.
>>>>
>>>> Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
>>>> ---
>>>>    drivers/soundwire/bus.c | 27 +++++++++++++++------------
>>>>    1 file changed, 15 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
>>>> index 6e569a875a9b..0bcc2d161eb9 100644
>>>> --- a/drivers/soundwire/bus.c
>>>> +++ b/drivers/soundwire/bus.c
>>>> @@ -736,20 +736,19 @@ static int sdw_program_device_num(struct
>>>> sdw_bus *bus)
>>>>        struct sdw_slave_id id;
>>>>        struct sdw_msg msg;
>>>>        bool found;
>>>> -    int count = 0, ret;
>>>> +    int count = 0, num_programmed = 0, ret;
>>>>        u64 addr;
>>>>          /* No Slave, so use raw xfer api */
>>>>        ret = sdw_fill_msg(&msg, NULL, SDW_SCP_DEVID_0,
>>>>                   SDW_NUM_DEV_ID_REGISTERS, 0, SDW_MSG_FLAG_READ, buf);
>>>>        if (ret < 0)
>>>> -        return ret;
>>>> +        return 0;
>>>
>>> this doesn't seem quite right to me, there are multiple -EINVAL cases
>>> handled in sdw_fill_msg().
>>>
>>> I didn't check if all these error cases are irrelevant in that specific
>>> enumeration case, if that was the case maybe we need to break that
>>> function in two helpers so that all the checks can be skipped.
>>>
>>
>> I don't think that there's anything useful that
>> sdw_handle_slave_status() could do to recover from an error.
>>
>> If any device IDs were programmed then, according to the comment in
>> sdw_handle_slave_status():
>>
>>      * programming a device number will have side effects,
>>      * so we deal with other devices at a later time
>>
>> If this is true, then we need to exit to deal with what _was_
>> programmed, even if one of them failed.
>>
>> If nothing was programmed, and there was an error, we can't bail out of
>> sdw_handle_slave_status(). We have status for other devices which
>> we can't simply ignore.
>>
>> Ultimately I can't see how pushing the error code up is useful.
>> sdw_handle_slave_status() can't really do any effective recovery action,
>> and the original behavior of giving up and returning means that
>> an error in programming a dev ID potentially causes collateral damage to
>> the status of other peripherals.
> 
> I was suggesting something like
> 
> 
> void sdw_fill_msg_data(...)
> {
>    copy data in the msg structure
> }
> 
> int sdw_fill_msg(...)
> {
>      sdw_fill_msg_data();
>      handle_error_cases
> }
> 
> and in sdw_program_device_num() we call sdw_fill_msg_data() directly.
> 
> So no change in functionality beyond explicitly skipping error checks that
> are not relevant and could not be handled even if they were.
> 

sdw_fill_msg() will never report an error during
sdw_program_device_num() because the first check is to return if
the address doesn't need paging, and sdw_program_device_num() only
accesses SCP registers.

I don't want to mix coding improvements with bugfixes. Splitting
sdw_fill_msg() isn't needed to fix this bug.

> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/5] soundwire: bus: Don't exit early if no device IDs were programmed
  2022-09-13 15:30           ` Richard Fitzgerald
  (?)
@ 2022-09-13 17:59           ` Pierre-Louis Bossart
  -1 siblings, 0 replies; 25+ messages in thread
From: Pierre-Louis Bossart @ 2022-09-13 17:59 UTC (permalink / raw)
  To: Richard Fitzgerald, vkoul, yung-chuan.liao, sanyog.r.kale
  Cc: patches, alsa-devel, linux-kernel




>>>>> diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
>>>>> index 6e569a875a9b..0bcc2d161eb9 100644
>>>>> --- a/drivers/soundwire/bus.c
>>>>> +++ b/drivers/soundwire/bus.c
>>>>> @@ -736,20 +736,19 @@ static int sdw_program_device_num(struct
>>>>> sdw_bus *bus)
>>>>>        struct sdw_slave_id id;
>>>>>        struct sdw_msg msg;
>>>>>        bool found;
>>>>> -    int count = 0, ret;
>>>>> +    int count = 0, num_programmed = 0, ret;
>>>>>        u64 addr;
>>>>>          /* No Slave, so use raw xfer api */
>>>>>        ret = sdw_fill_msg(&msg, NULL, SDW_SCP_DEVID_0,
>>>>>                   SDW_NUM_DEV_ID_REGISTERS, 0, SDW_MSG_FLAG_READ,
>>>>> buf);
>>>>>        if (ret < 0)
>>>>> -        return ret;
>>>>> +        return 0;
>>>>
>>>> this doesn't seem quite right to me, there are multiple -EINVAL cases
>>>> handled in sdw_fill_msg().
>>>>
>>>> I didn't check if all these error cases are irrelevant in that specific
>>>> enumeration case, if that was the case maybe we need to break that
>>>> function in two helpers so that all the checks can be skipped.
>>>>
>>>
>>> I don't think that there's anything useful that
>>> sdw_handle_slave_status() could do to recover from an error.
>>>
>>> If any device IDs were programmed then, according to the comment in
>>> sdw_handle_slave_status():
>>>
>>>      * programming a device number will have side effects,
>>>      * so we deal with other devices at a later time
>>>
>>> If this is true, then we need to exit to deal with what _was_
>>> programmed, even if one of them failed.
>>>
>>> If nothing was programmed, and there was an error, we can't bail out of
>>> sdw_handle_slave_status(). We have status for other devices which
>>> we can't simply ignore.
>>>
>>> Ultimately I can't see how pushing the error code up is useful.
>>> sdw_handle_slave_status() can't really do any effective recovery action,
>>> and the original behavior of giving up and returning means that
>>> an error in programming a dev ID potentially causes collateral damage to
>>> the status of other peripherals.
>>
>> I was suggesting something like
>>
>>
>> void sdw_fill_msg_data(...)
>> {
>>    copy data in the msg structure
>> }
>>
>> int sdw_fill_msg(...)
>> {
>>      sdw_fill_msg_data();
>>      handle_error_cases
>> }
>>
>> and in sdw_program_device_num() we call sdw_fill_msg_data() directly.
>>
>> So no change in functionality beyond explicitly skipping error checks that
>> are not relevant and could not be handled even if they were.
>>
> 
> sdw_fill_msg() will never report an error during
> sdw_program_device_num() because the first check is to return if
> the address doesn't need paging, and sdw_program_device_num() only
> accesses SCP registers.
> 
> I don't want to mix coding improvements with bugfixes. Splitting
> sdw_fill_msg() isn't needed to fix this bug.

It's not required, but it helps remove a useless always-false condition.
We have way too many error cases in the bus code, most of which have
never been tested. Agreed, it can be done later; it's just that reviewing
these code changes exposes things that were not noticed before.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-09-13 18:36 UTC | newest]

Thread overview: 25+ messages
2022-09-07  8:52 [PATCH v2 0/5] soundwire: Fixes for spurious and missing UNATTACH Richard Fitzgerald
2022-09-07  8:52 ` Richard Fitzgerald
2022-09-07  8:52 ` [PATCH v2 1/5] soundwire: cadence: fix updating slave status when a bus has multiple peripherals Richard Fitzgerald
2022-09-07  8:52   ` Richard Fitzgerald
2022-09-07  8:52 ` [PATCH v2 2/5] soundwire: bus: Don't lose unattach notifications Richard Fitzgerald
2022-09-07  8:52   ` Richard Fitzgerald
2022-09-07  8:52 ` [PATCH v2 3/5] soundwire: bus: Don't re-enumerate before status is UNATTACHED Richard Fitzgerald
2022-09-07  8:52   ` Richard Fitzgerald
2022-09-12 11:00   ` Pierre-Louis Bossart
2022-09-12 11:00     ` Pierre-Louis Bossart
2022-09-07  8:52 ` [PATCH v2 4/5] soundwire: cadence: Fix lost ATTACHED interrupts when enumerating Richard Fitzgerald
2022-09-07  8:52   ` Richard Fitzgerald
2022-09-12 11:05   ` Pierre-Louis Bossart
2022-09-12 11:05     ` Pierre-Louis Bossart
2022-09-12 12:36     ` Richard Fitzgerald
2022-09-12 12:36       ` Richard Fitzgerald
2022-09-07  8:52 ` [PATCH v2 5/5] soundwire: bus: Don't exit early if no device IDs were programmed Richard Fitzgerald
2022-09-07  8:52   ` Richard Fitzgerald
2022-09-12 11:43   ` Pierre-Louis Bossart
2022-09-12 12:25     ` Richard Fitzgerald
2022-09-12 12:25       ` Richard Fitzgerald
2022-09-12 17:09       ` Pierre-Louis Bossart
2022-09-13 15:30         ` Richard Fitzgerald
2022-09-13 15:30           ` Richard Fitzgerald
2022-09-13 17:59           ` Pierre-Louis Bossart
