* [PATCH] selftests/powerpc: Add script to test HMI functionality
@ 2015-11-18  4:43 Daniel Axtens
  2015-11-18 11:33 ` Denis Kirjanov
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Axtens @ 2015-11-18  4:43 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, benh, Daniel Axtens, Mahesh J Salgaonkar

HMIs (Hypervisor Management|Maintenance Interrupts) are a class of interrupt
on POWER systems.

HMI support has traditionally been exceptionally difficult to test. However
Skiboot ships a tool that, with the correct magic numbers, will inject them.

This, therefore, is a first pass at a script to inject HMIs and monitor
Linux's response. It injects an HMI on each core on every chip in turn.
It then watches dmesg to see if it's acknowledged by Linux.

On a Tuletta, I observed that we see 8 (or sometimes 9 or more) events per
injection, regardless of SMT setting, so we wait for 8 before progressing.

It sits in a new scripts/ directory in selftests/powerpc, because it's not
designed to be run as part of the regular make selftests process. In
particular, it is quite possibly going to end up garding lots of your CPUs,
so it should only be run if you know how to undo that.

CC: Mahesh J Salgaonkar <mahesh.salgaonkar@in.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 tools/testing/selftests/powerpc/scripts/hmi.sh | 77 ++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)
 create mode 100755 tools/testing/selftests/powerpc/scripts/hmi.sh

diff --git a/tools/testing/selftests/powerpc/scripts/hmi.sh b/tools/testing/selftests/powerpc/scripts/hmi.sh
new file mode 100755
index 000000000000..ebce03933784
--- /dev/null
+++ b/tools/testing/selftests/powerpc/scripts/hmi.sh
@@ -0,0 +1,77 @@
+#!/bin/sh
+
+# do we have ./getscom, ./putscom?
+if [ -x ./getscom ] && [ -x ./putscom ]; then
+	GETSCOM=./getscom
+	PUTSCOM=./putscom
+elif which getscom > /dev/null; then
+	GETSCOM=$(which getscom)
+	PUTSCOM=$(which putscom)
+else
+	cat <<EOF
+Can't find getscom/putscom in . or \$PATH.
+See https://github.com/open-power/skiboot.
+The tool is in external/xscom-utils
+EOF
+	exit 1
+fi
+
+# We will get 8 HMI events per injection
+# todo: deal with things being offline
+expected_hmis=8
+COUNT_HMIS() {
+    dmesg | grep -c 'Harmless Hypervisor Maintenance interrupt'
+}
+
+# massively expand snooze delay, allowing injection on all cores
+ppc64_cpu --smt-snooze-delay=1000000000
+
+# when we exit, restore it
+trap "ppc64_cpu --smt-snooze-delay=100" 0 1
+
+# for each chip+core combination
+# todo - less fragile parsing
+egrep -o 'OCC: Chip [0-9a-f]+ Core [0-9a-f]' < /sys/firmware/opal/msglog |
+while read chipcore; do
+	chip=$(echo "$chipcore"|awk '{print $3}')
+	core=$(echo "$chipcore"|awk '{print $5}')
+	fir="0x1${core}013100"
+
+	# verify that Core FIR is zero as expected
+	if [ "$($GETSCOM -c 0x${chip} $fir)" != 0 ]; then
+		echo "FIR was not zero before injection for chip $chip, core $core. Aborting!"
+		echo "Result of $GETSCOM -c 0x${chip} $fir:"
+		$GETSCOM -c 0x${chip} $fir
+		echo "If you get a -5 error, the core may be in idle state. Try stress-ng."
+		echo "Otherwise, try $PUTSCOM -c 0x${chip} $fir 0"
+		exit 1
+	fi
+
+	# keep track of the number of HMIs handled
+	old_hmis=$(COUNT_HMIS)
+
+	# do injection, adding a marker to dmesg for clarity
+	echo "Injecting HMI on core $core, chip $chip" | tee /dev/kmsg
+	# inject a RegFile recoverable error
+	if ! $PUTSCOM -c 0x${chip} $fir 2000000000000000 > /dev/null; then
+		echo "Error injecting. Aborting!"
+		exit 1
+	fi
+
+	# now we want to wait for all the HMIs to be processed
+	# we expect one per thread on the core
+	i=0;
+	new_hmis=$(COUNT_HMIS)
+	while [ $new_hmis -lt $((old_hmis + expected_hmis)) ] && [ $i -lt 12 ]; do
+	    echo "Seen $((new_hmis - old_hmis)) HMI(s) out of $expected_hmis expected, sleeping"
+	    sleep 5;
+	    i=$((i + 1))
+	    new_hmis=$(COUNT_HMIS)
+	done
+	if [ $i = 12 ]; then
+	    echo "Haven't seen expected $expected_hmis recoveries after 1 min. Aborting."
+	    exit 1
+	fi
+	echo "Processed $expected_hmis events; presumed success. Check dmesg."
+	echo ""
+done
-- 
2.6.2


* Re: [PATCH] selftests/powerpc: Add script to test HMI functionality
  2015-11-18  4:43 [PATCH] selftests/powerpc: Add script to test HMI functionality Daniel Axtens
@ 2015-11-18 11:33 ` Denis Kirjanov
  2015-12-09 23:37   ` Daniel Axtens
  0 siblings, 1 reply; 4+ messages in thread
From: Denis Kirjanov @ 2015-11-18 11:33 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: linuxppc-dev, Mahesh J Salgaonkar, Linux Kernel Mailing List

On 11/18/15, Daniel Axtens <dja@axtens.net> wrote:
> HMIs (Hypervisor Management|Maintenance Interrupts) are a class of interrupt
> on POWER systems.
>
> HMI support has traditionally been exceptionally difficult to test. However
> Skiboot ships a tool that, with the correct magic numbers, will inject them.
>
> This, therefore, is a first pass at a script to inject HMIs and monitor
> Linux's response. It injects an HMI on each core on every chip in turn.
> It then watches dmesg to see if it's acknowledged by Linux.
>
> On a Tuletta, I observed that we see 8 (or sometimes 9 or more) events per
> injection, regardless of SMT setting, so we wait for 8 before progressing.
>
> It sits in a new scripts/ directory in selftests/powerpc, because it's not
> designed to be run as part of the regular make selftests process. In
> particular, it is quite possibly going to end up garding lots of your CPUs,
> so it should only be run if you know how to undo that.

Hi Daniel,

Could you explain why it's useful, and what it's useful for? Moreover,
it's a POWER8 feature, right?


* Re: [PATCH] selftests/powerpc: Add script to test HMI functionality
  2015-11-18 11:33 ` Denis Kirjanov
@ 2015-12-09 23:37   ` Daniel Axtens
  2015-12-10  0:38     ` Stewart Smith
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Axtens @ 2015-12-09 23:37 UTC (permalink / raw)
  To: Denis Kirjanov
  Cc: linuxppc-dev, Mahesh J Salgaonkar, Linux Kernel Mailing List


I just realised I sent my reply to Denis not the list - apologies. This
info goes for v2 as well.

  > Could you explain why it's useful, and what it's useful for? Moreover,
  > it's a POWER8 feature, right?

  I'm not sure whether you're asking about the script or HMIs. Explaining
  HMIs helps make sense of the script, so I'll start there.

  HMIs are a class of interrupt or exception that, broadly speaking,
  require the hypervisor to intervene to 'do something'. They are (very
  lightly) documented in the POWER ISA, which is available on the
  OpenPOWER website. That file doesn't do a particularly good job of
  explaining what can trigger an HMI, because that's a Book IV question.

  So, while I can't point you to documentation about what might cause an
  HMI, I can point you to some source code. Here goes:

  An HMI will (per the ISA) cause execution to jump to
  0x0000 0000 0000 0E60. Through some asm and C you end up calling
  ppc_md.hmi_exception_early() and then possibly
  ppc_md.handle_hmi_exception(). This is only defined on PowerNV, where
  they point to opal_hmi_exception_early() and
  opal_handle_hmi_exception() respectively.

  The early exception calls into opal through opal_handle_hmi, which is an
  OPAL call (OPAL_HANDLE_HMI). skiboot/core/hmi.c lists the contents of
  the HMER (Hypervisor Maintenance Exception Register), which identifies
  the actual cause of the HMI. You can find the list in the skiboot repo
  on github, including the action that will be taken:
  https://github.com/open-power/skiboot/blob/master/core/hmi.c
  The rest of the file fleshes out the mechanics of HMIs: for example,
  where they are caused by the failure of a POWER8 co-processor such as
  CAPI or NX.

  Some HMIs are relayed by Skiboot to Linux by sending an OPAL_MSG_HMI_EVT
  to Linux. This triggers off some further processing which causes a
  message to be printed in dmesg. The relevant file here is
  platforms/powernv/opal-hmi.c
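
  As a rough sketch of how that dmesg message can be checked from a
  script: the helper below counts HMI recovery lines in a log read from
  stdin. The match string is the one hmi.sh greps for; the function name
  count_hmi_lines is made up for illustration.

```shell
# Count HMI recovery messages in a log supplied on stdin.
# count_hmi_lines is a hypothetical name; the pattern is the one from hmi.sh.
count_hmi_lines() {
	# grep -c prints the number of matching lines (0 if none)
	grep -c 'Harmless Hypervisor Maintenance interrupt'
}
```

  Typical use would be "dmesg | count_hmi_lines". grep exits non-zero
  when the count is 0, so a caller running under "set -e" may want to
  append "|| true".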

  The script, therefore, is useful because:
   - HMIs are an exceptional/error condition that is not hit in normal
     operation. Indeed, without the xscom commands in this script
     (or a CAPI card), it's almost impossible to hit them.
   - HMIs involve communications between Skiboot and Linux, involve
     touching the PACA, and generally work in an area that is prone to
     bugs, so testing them is especially valuable.
   - The script is carefully calibrated to send HMIs that trigger a
     message in dmesg but which don't checkstop the machine.
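
  The "inject, then wait for the messages" step can be sketched as a
  small polling helper. This is only an illustration, not the code in
  the patch: the name wait_for_events and its argument order are
  assumptions, and unlike the loop in hmi.sh it decides success from the
  event count rather than from the iteration counter.

```shell
# wait_for_events BASELINE EXPECTED TRIES DELAY COUNTER
# Poll COUNTER (a command that prints the current event count) until it
# reports at least BASELINE + EXPECTED events. Sleeps DELAY seconds
# between polls and gives up after TRIES sleeps.
# Hypothetical helper; POSIX sh has no "local", so these are globals.
wait_for_events() {
	target=$(( $1 + $2 ))
	tries=$3
	delay=$4
	counter=$5
	i=0
	while [ "$($counter)" -lt "$target" ]; do
		# out of retries: report failure to the caller
		[ "$i" -ge "$tries" ] && return 1
		sleep "$delay"
		i=$((i + 1))
	done
	return 0
}
```

  With the values used in hmi.sh this would be called as
  wait_for_events "$old_hmis" 8 12 5 COUNT_HMIS, giving the same
  one-minute budget per injection.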

  To answer your final question, I'm not entirely sure if HMIs are POWER8
  specific. I suspect they've been around for a lot longer, but maybe
  someone who's been around IBM chips for longer than me could clarify this.

  Regards,
  Daniel




* Re: [PATCH] selftests/powerpc: Add script to test HMI functionality
  2015-12-09 23:37   ` Daniel Axtens
@ 2015-12-10  0:38     ` Stewart Smith
  0 siblings, 0 replies; 4+ messages in thread
From: Stewart Smith @ 2015-12-10  0:38 UTC (permalink / raw)
  To: Daniel Axtens, Denis Kirjanov
  Cc: linuxppc-dev, Linux Kernel Mailing List, Mahesh J Salgaonkar

Daniel Axtens <dja@axtens.net> writes:
> I just realised I sent my reply to Denis not the list - apologies. This
> info goes for v2 as well.
>
>   > Could you explain why it's useful, and what it's useful for? Moreover,
>   > it's a POWER8 feature, right?
>
>   I'm not sure whether you're asking about the script or HMIs. Explaining
>   HMIs helps make sense of the script, so I'll start there.
>
>   HMIs are a class of interrupt or exception that, broadly speaking,
>   require the hypervisor to intervene to 'do something'. They are (very
>   lightly) documented in the POWER ISA, which is available on the
>   OpenPOWER website. That file doesn't do a particularly good job of
>   explaining what can trigger an HMI, because that's a Book IV question.
>
>   So, while I can't point you to documentation about what might cause an
>   HMI, I can point you to some source code. Here goes:
>
>   An HMI will (per the ISA) cause execution to jump to
>   0x0000 0000 0000 0E60. Through some asm and C you end up calling
>   ppc_md.hmi_exception_early() and then possibly
>   ppc_md.handle_hmi_exception(). This is only defined on PowerNV, where
>   they point to opal_hmi_exception_early() and
>   opal_handle_hmi_exception() respectively.
>
>   The early exception calls into opal through opal_handle_hmi, which is an
>   OPAL call (OPAL_HANDLE_HMI). skiboot/core/hmi.c lists the contents of
>   the HMER (Hypervisor Maintenance Exception Register), which identifies
>   the actual cause of the HMI. You can find the list in the skiboot repo
>   on github, including the action that will be taken:
>   https://github.com/open-power/skiboot/blob/master/core/hmi.c
>   The rest of the file fleshes out the mechanics of HMIs: for example,
>   where they are caused by the failure of a POWER8 co-processor such as
>   CAPI or NX.
>
>   Some HMIs are relayed by Skiboot to Linux by sending an OPAL_MSG_HMI_EVT
>   to Linux. This triggers off some further processing which causes a
>   message to be printed in dmesg. The relevant file here is
>   platforms/powernv/opal-hmi.c
>
>   The script, therefore, is useful because:
>    - HMIs are an exceptional/error condition that is not hit in normal
>      operation. Indeed, without the xscom commands in this script
>      (or a CAPI card), it's almost impossible to hit them.
>    - HMIs involve communications between Skiboot and Linux, involve
>      touching the PACA, and generally work in an area that is prone to
>      bugs, so testing them is especially valuable.
>    - The script is carefully calibrated to send HMIs that trigger a
>      message in dmesg but which don't checkstop the machine.
>
>   To answer your final question, I'm not entirely sure if HMIs are POWER8
>   specific. I suspect they've been around for a lot longer, but maybe
>   someone who's been around IBM chips for longer than me could clarify this.

Adding this to doc/ somewhere in kernel and/or skiboot would be
great. There's a skiboot doc/hmi.txt that's begging for a patch, you
know, creating it :)


