LinuxPPC-Dev Archive on lore.kernel.org
* [PATCH] selftests/powerpc: Squash spurious errors due to device removal
@ 2020-07-27  1:01 Oliver O'Halloran
  2020-07-30 12:50 ` Michael Ellerman
  0 siblings, 1 reply; 2+ messages in thread
From: Oliver O'Halloran @ 2020-07-27  1:01 UTC
  To: linuxppc-dev; +Cc: Oliver O'Halloran

For drivers that don't implement the error handling callbacks, we recover
the device by removing it and re-probing it. Removal deletes the sysfs
directory for the PCI device, so checking the PE state prints the
following spurious error:

Breaking 0005:03:00.0...
./eeh-basic.sh: line 13: can't open /sys/bus/pci/devices/0005:03:00.0/eeh_pe_state: no such file
0005:03:00.0, waited 0/60
0005:03:00.0, waited 1/60
0005:03:00.0, waited 2/60
0005:03:00.0, waited 3/60
0005:03:00.0, waited 4/60
0005:03:00.0, waited 5/60
0005:03:00.0, waited 6/60
0005:03:00.0, waited 7/60
0005:03:00.0, Recovered after 8 seconds

We currently try to avoid this by checking whether the PE state file
exists before reading from it. That check is inherently racy, since the
file can disappear between the check and the read. Rework the state
checking so that the file is read only once and any errors from that read
are squashed.

Fixes: 85d86c8aa52e ("selftests/powerpc: Add basic EEH selftest")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 tools/testing/selftests/powerpc/eeh/eeh-functions.sh | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/powerpc/eeh/eeh-functions.sh b/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
index f52ed92b53e7..00dc32c0ed75 100755
--- a/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
+++ b/tools/testing/selftests/powerpc/eeh/eeh-functions.sh
@@ -5,12 +5,17 @@ pe_ok() {
 	local dev="$1"
 	local path="/sys/bus/pci/devices/$dev/eeh_pe_state"
 
-	if ! [ -e "$path" ] ; then
+	# if a driver doesn't support the error handling callbacks then the
+	# device is recovered by removing and re-probing it. This causes the
+	# sysfs directory to disappear so read the PE state once and squash
+	# any potential error messages
+	local eeh_state="$(cat $path 2>/dev/null)"
+	if [ -z "$eeh_state" ]; then
 		return 1;
 	fi
 
-	local fw_state="$(cut -d' ' -f1 < $path)"
-	local sw_state="$(cut -d' ' -f2 < $path)"
+	local fw_state="$(echo $eeh_state | cut -d' ' -f1)"
+	local sw_state="$(echo $eeh_state | cut -d' ' -f2)"
 
 	# If EEH_PE_ISOLATED or EEH_PE_RECOVERING are set then the PE is in an
 	# error state or being recovered. Either way, not ok.
-- 
2.26.2
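
The read-once pattern described in the commit message can be tried outside
the selftest with a rough sketch like the one below. It is only an
illustration, not the code in eeh-functions.sh: the helper name
read_state_once, the temporary file, and the sample values are all made up
for the example, and the file is merely assumed to hold two space-separated
fields like eeh_pe_state.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): read a two-field state file
# exactly once and fail quietly if it is missing or empty.
read_state_once() {
	path="$1"

	# Single read. If the file has been removed, cat fails, its error
	# message is discarded, and $state ends up empty.
	state="$(cat "$path" 2>/dev/null)"
	if [ -z "$state" ]; then
		return 1
	fi

	# Split the captured string instead of re-opening the file, so a
	# removal after this point cannot cause a second failure.
	fw_state="$(printf '%s\n' "$state" | cut -d' ' -f1)"
	sw_state="$(printf '%s\n' "$state" | cut -d' ' -f2)"
	echo "fw=$fw_state sw=$sw_state"
}

# Usage with a stand-in file; the values are arbitrary sample data.
tmp="$(mktemp)"
echo "0x00000018 0x00000000" > "$tmp"
read_state_once "$tmp"          # prints: fw=0x00000018 sw=0x00000000

# Simulate the device (and its sysfs file) going away.
rm -f "$tmp"
read_state_once "$tmp" || echo "state unavailable"
```

With an existence check followed by separate reads, the file could vanish
between any two of those steps. Here there is only one point of failure,
and it is reported through the return value rather than on stderr.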



* Re: [PATCH] selftests/powerpc: Squash spurious errors due to device removal
  2020-07-27  1:01 [PATCH] selftests/powerpc: Squash spurious errors due to device removal Oliver O'Halloran
@ 2020-07-30 12:50 ` Michael Ellerman
  0 siblings, 0 replies; 2+ messages in thread
From: Michael Ellerman @ 2020-07-30 12:50 UTC
  To: Oliver O'Halloran, linuxppc-dev

On Mon, 27 Jul 2020 11:01:27 +1000, Oliver O'Halloran wrote:
> For drivers that don't have the error handling callbacks we implement
> recovery by removing the device and re-probing it. This causes the sysfs
> directory for the PCI device to be removed which causes the following
> spurious error to be printed when checking the PE state:
> 
> Breaking 0005:03:00.0...
> ./eeh-basic.sh: line 13: can't open /sys/bus/pci/devices/0005:03:00.0/eeh_pe_state: no such file
> 0005:03:00.0, waited 0/60
> 0005:03:00.0, waited 1/60
> 0005:03:00.0, waited 2/60
> 0005:03:00.0, waited 3/60
> 0005:03:00.0, waited 4/60
> 0005:03:00.0, waited 5/60
> 0005:03:00.0, waited 6/60
> 0005:03:00.0, waited 7/60
> 0005:03:00.0, Recovered after 8 seconds
> 
> [...]

Applied to powerpc/next.

[1/1] selftests/powerpc: Squash spurious errors due to device removal
      https://git.kernel.org/powerpc/c/5f8cf6475828b600ff6d000e580c961ac839cc61

cheers
