* [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
From: Alex Bennée @ 2016-11-24 16:10 UTC
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée

Hi,

Looking at my records it seems as though it has been a while since I
last posted these tests. As I'm hoping to get the final bits of MTTCG
merged upstream in the next QEMU development cycle I've been rebasing
these and getting them cleaned up for merging.

Some of the patches might be worth taking now if the maintainers are
happy to do so (run_test tweaks, libcflat updates?). The others could
do with more serious review. I've CC'd some of the ARM guys to look
over the tlbflush/barrier tests so they can cast their expert eyes
over them ;-)

There are two additions to the series.

The tcg-test is a general torture test aimed at QEMU's TCG execution
model. It stresses the CPU execution loop through the use of
cross-page and computed jumps. It can also add IRQs and self-modifying
code to the mix.
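
To be concrete, by a computed jump I mean an indirect branch whose
target is only known at run time - roughly the C equivalent of the
sketch below (the real tests do this in assembly, and the names here
are purely illustrative):

    static void block0(void) { }
    static void block1(void) { }

    void computed_jump(unsigned pick)
    {
        void (*blocks[2])(void) = { block0, block1 };
        blocks[pick & 1]();  /* target computed at run time, which
                                defeats static TB chaining and sends
                                each iteration back through the
                                execution loop */
    }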

The tlbflush-data test is a new one; the old tlbflush test is renamed
tlbflush-code to better indicate the code path it exercises. The code
test exercises the translation invalidation pathways in QEMU, while
the data test exercises the SoftMMU's TLBs and explicitly checks that
tlbflush completion semantics are correct.
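
Very roughly, the data test has the following shape - this is an
illustrative sketch rather than the actual test code, and the helper
names are invented for the example (dsb() as in lib/arm/asm/barrier.h):

    extern volatile int test_complete;
    extern volatile uint64_t *stamp_page;  /* page whose mapping is retired */
    extern uint64_t get_timestamp(void);
    extern void remove_mapping(volatile uint64_t *page);

    void writer_cpu(void)                  /* secondary CPUs */
    {
        while (!test_complete)
            *stamp_page = get_timestamp();
    }

    uint64_t flusher_cpu(void)             /* primary CPU */
    {
        remove_mapping(stamp_page);        /* page-table update + TLBI */
        dsb();                             /* flush must have completed */
        /* any *stamp_page newer than the time returned here means a
           write went through a TLB entry the flush should have
           removed */
        return get_timestamp();
    }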

The tlbflush-data test passes most of the time on real hardware but
definitely showed the problem with deferred TLB flushes when running
under MTTCG QEMU. I've looked at some of the failure cases on real
hardware and it did look like a timestamp appeared on a page that
shouldn't have been accessible at the time - I don't know if this is
a real silicon bug or my misreading of the semantics, so I'd
appreciate a comment from the experts.

The code needs to be applied on top of Drew's latest ARM GIC patches,
or you can grab my tree from:

  https://github.com/stsquad/kvm-unit-tests/tree/mttcg/current-tests-v7

Cheers,

Alex.

Alex Bennée (11):
  run_tests: allow forcing of acceleration mode
  run_tests: allow disabling of timeouts
  run_tests: allow passing of options to QEMU
  libcflat: add PRI(dux)32 format types
  lib: add isaac prng library from CCAN
  arm/Makefile.common: force -fno-pic
  arm/tlbflush-code: Add TLB flush during code execution test
  arm/tlbflush-data: Add TLB flush during data writes test
  arm/locking-tests: add comprehensive locking test
  arm/barrier-litmus-tests: add simple mp and sal litmus tests
  arm/tcg-test: some basic TCG exercising tests

 Makefile                  |   2 +
 arm/Makefile.arm          |   2 +
 arm/Makefile.arm64        |   2 +
 arm/Makefile.common       |  11 ++
 arm/barrier-litmus-test.c | 437 ++++++++++++++++++++++++++++++++++++++++++++++
 arm/locking-test.c        | 302 ++++++++++++++++++++++++++++++++
 arm/tcg-test-asm.S        | 170 ++++++++++++++++++
 arm/tcg-test-asm64.S      | 169 ++++++++++++++++++
 arm/tcg-test.c            | 337 +++++++++++++++++++++++++++++++++++
 arm/tlbflush-code.c       | 212 ++++++++++++++++++++++
 arm/tlbflush-data.c       | 401 ++++++++++++++++++++++++++++++++++++++++++
 arm/unittests.cfg         | 190 ++++++++++++++++++++
 lib/arm/asm/barrier.h     |  63 ++++++-
 lib/arm64/asm/barrier.h   |  50 ++++++
 lib/libcflat.h            |   5 +
 lib/prng.c                | 162 +++++++++++++++++
 lib/prng.h                |  82 +++++++++
 run_tests.sh              |  18 +-
 scripts/functions.bash    |  13 +-
 scripts/runtime.bash      |   8 +
 20 files changed, 2626 insertions(+), 10 deletions(-)
 create mode 100644 arm/barrier-litmus-test.c
 create mode 100644 arm/locking-test.c
 create mode 100644 arm/tcg-test-asm.S
 create mode 100644 arm/tcg-test-asm64.S
 create mode 100644 arm/tcg-test.c
 create mode 100644 arm/tlbflush-code.c
 create mode 100644 arm/tlbflush-data.c
 create mode 100644 lib/prng.c
 create mode 100644 lib/prng.h

-- 
2.10.1


* [kvm-unit-tests PATCH v7 01/11] run_tests: allow forcing of acceleration mode
From: Alex Bennée @ 2016-11-24 16:10 UTC
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: mttcg, nikunj, jan.kiszka, mark.burton, qemu-devel, cota,
	serge.fdrv, pbonzini, bobby.prani, rth

While tests can be pegged to tcg, it is useful to override this from
time to time, especially when testing correctness on real systems.
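
For example, to force a whole run onto KVM when checking behaviour on
real hardware (assuming KVM is available on the host):

  ./run_tests.sh -a kvm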
---
 run_tests.sh         | 8 ++++++--
 scripts/runtime.bash | 4 ++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/run_tests.sh b/run_tests.sh
index 254129d..b88c36f 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -13,9 +13,10 @@ function usage()
 {
 cat <<EOF
 
-Usage: $0 [-g group] [-h] [-v]
+Usage: $0 [-g group] [-a accel] [-h] [-v]
 
     -g: Only execute tests in the given group
+    -a: Force acceleration mode (tcg/kvm)
     -h: Output this help text
     -v: Enables verbose mode
 
@@ -28,11 +29,14 @@ EOF
 RUNTIME_arch_run="./$TEST_DIR/run"
 source scripts/runtime.bash
 
-while getopts "g:hv" opt; do
+while getopts "g:a:hv" opt; do
     case $opt in
         g)
             only_group=$OPTARG
             ;;
+        a)
+            force_accel=$OPTARG
+            ;;
         h)
             usage
             exit
diff --git a/scripts/runtime.bash b/scripts/runtime.bash
index 11a40a9..578cf32 100644
--- a/scripts/runtime.bash
+++ b/scripts/runtime.bash
@@ -75,6 +75,10 @@ function run()
         return;
     fi
 
+    if [ -n "$force_accel" ]; then
+        accel=$force_accel
+    fi
+
     if [ -n "$arch" ] && [ "$arch" != "$ARCH" ]; then
         echo "`SKIP` $1 ($arch only)"
         return 2
-- 
2.10.1

* [kvm-unit-tests PATCH v7 02/11] run_tests: allow disabling of timeouts
From: Alex Bennée @ 2016-11-24 16:10 UTC
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée

Certainly during development of the tests and MTTCG there are times when
the timeout just gets in the way.
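
For example, an untimed run (here combined with the -a option from the
previous patch):

  ./run_tests.sh -t -a tcg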

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 run_tests.sh         | 8 ++++++--
 scripts/runtime.bash | 4 ++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/run_tests.sh b/run_tests.sh
index b88c36f..4f2e5cb 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -13,10 +13,11 @@ function usage()
 {
 cat <<EOF
 
-Usage: $0 [-g group] [-a accel] [-h] [-v]
+Usage: $0 [-g group] [-a accel] [-t] [-h] [-v]
 
     -g: Only execute tests in the given group
     -a: Force acceleration mode (tcg/kvm)
+    -t: disable timeouts
     -h: Output this help text
     -v: Enables verbose mode
 
@@ -29,7 +30,7 @@ EOF
 RUNTIME_arch_run="./$TEST_DIR/run"
 source scripts/runtime.bash
 
-while getopts "g:a:hv" opt; do
+while getopts "g:a:thv" opt; do
     case $opt in
         g)
             only_group=$OPTARG
@@ -37,6 +38,9 @@ while getopts "g:a:hv" opt; do
         a)
             force_accel=$OPTARG
             ;;
+        t)
+            no_timeout="yes"
+            ;;
         h)
             usage
             exit
diff --git a/scripts/runtime.bash b/scripts/runtime.bash
index 578cf32..968ff6d 100644
--- a/scripts/runtime.bash
+++ b/scripts/runtime.bash
@@ -79,6 +79,10 @@ function run()
         accel=$force_accel
     fi
 
+    if [ "$no_timeout" = "yes" ]; then
+        timeout=""
+    fi
+
     if [ -n "$arch" ] && [ "$arch" != "$ARCH" ]; then
         echo "`SKIP` $1 ($arch only)"
         return 2
-- 
2.10.1


* [kvm-unit-tests PATCH v7 03/11] run_tests: allow passing of options to QEMU
From: Alex Bennée @ 2016-11-24 16:10 UTC
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: mttcg, peter.maydell, claudio.fontana, nikunj, jan.kiszka,
	mark.burton, a.rigo, qemu-devel, cota, serge.fdrv, pbonzini,
	bobby.prani, rth, Alex Bennée, fred.konrad

This introduces the option -o for passing options directly to QEMU,
which is useful. In my case I'm using it to toggle MTTCG on and off:

  ./run_tests.sh -t -o "-tcg mttcg=on"

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 run_tests.sh           | 10 +++++++---
 scripts/functions.bash | 13 +++++++------
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/run_tests.sh b/run_tests.sh
index 4f2e5cb..05cc7fb 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -13,10 +13,11 @@ function usage()
 {
 cat <<EOF
 
-Usage: $0 [-g group] [-a accel] [-t] [-h] [-v]
+Usage: $0 [-g group] [-a accel] [-o qemu_opts] [-t] [-h] [-v]
 
     -g: Only execute tests in the given group
     -a: Force acceleration mode (tcg/kvm)
+    -o: additional options for QEMU command line
     -t: disable timeouts
     -h: Output this help text
     -v: Enables verbose mode
@@ -30,7 +31,7 @@ EOF
 RUNTIME_arch_run="./$TEST_DIR/run"
 source scripts/runtime.bash
 
-while getopts "g:a:thv" opt; do
+while getopts "g:a:o:thv" opt; do
     case $opt in
         g)
             only_group=$OPTARG
@@ -38,6 +39,9 @@ while getopts "g:a:thv" opt; do
         a)
             force_accel=$OPTARG
             ;;
+        o)
+            extra_opts=$OPTARG
+            ;;
         t)
             no_timeout="yes"
             ;;
@@ -67,4 +71,4 @@ RUNTIME_log_stdout () {
 config=$TEST_DIR/unittests.cfg
 rm -f test.log
 printf "BUILD_HEAD=$(cat build-head)\n\n" > test.log
-for_each_unittest $config run
+for_each_unittest $config run "$extra_opts"
diff --git a/scripts/functions.bash b/scripts/functions.bash
index ee9143c..d38a69e 100644
--- a/scripts/functions.bash
+++ b/scripts/functions.bash
@@ -2,11 +2,12 @@
 function for_each_unittest()
 {
 	local unittests="$1"
-	local cmd="$2"
-	local testname
+        local cmd="$2"
+        local extra_opts=$3
+        local testname
 	local smp
 	local kernel
-	local opts
+        local opts=$extra_opts
 	local groups
 	local arch
 	local check
@@ -21,7 +22,7 @@ function for_each_unittest()
 			testname=${BASH_REMATCH[1]}
 			smp=1
 			kernel=""
-			opts=""
+                        opts=$extra_opts
 			groups=""
 			arch=""
 			check=""
@@ -32,7 +33,7 @@ function for_each_unittest()
 		elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then
 			smp=${BASH_REMATCH[1]}
 		elif [[ $line =~ ^extra_params\ *=\ *(.*)$ ]]; then
-			opts=${BASH_REMATCH[1]}
+                        opts="$opts ${BASH_REMATCH[1]}"
 		elif [[ $line =~ ^groups\ *=\ *(.*)$ ]]; then
 			groups=${BASH_REMATCH[1]}
 		elif [[ $line =~ ^arch\ *=\ *(.*)$ ]]; then
@@ -45,6 +46,6 @@ function for_each_unittest()
 			timeout=${BASH_REMATCH[1]}
 		fi
 	done
-	"$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
+        "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
 	exec {fd}<&-
 }
-- 
2.10.1

* [kvm-unit-tests PATCH v7 04/11] libcflat: add PRI(dux)32 format types
From: Alex Bennée @ 2016-11-24 16:10 UTC
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: mttcg, peter.maydell, claudio.fontana, nikunj, jan.kiszka,
	mark.burton, a.rigo, qemu-devel, cota, serge.fdrv, pbonzini,
	bobby.prani, rth, Alex Bennée, fred.konrad

So we can have portable formatting of uint32_t types.
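
For example, a test can then print a uint32_t the same way on both the
32-bit and 64-bit builds (illustrative snippet):

  uint32_t val = 42;
  printf("val=%" PRIu32 " (0x%" PRIx32 ")\n", val, val);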

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 lib/libcflat.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/lib/libcflat.h b/lib/libcflat.h
index bdcc561..6dab5be 100644
--- a/lib/libcflat.h
+++ b/lib/libcflat.h
@@ -55,12 +55,17 @@ typedef _Bool		bool;
 #define true  1
 
 #if __SIZEOF_LONG__ == 8
+#  define __PRI32_PREFIX
 #  define __PRI64_PREFIX	"l"
 #  define __PRIPTR_PREFIX	"l"
 #else
+#  define __PRI32_PREFIX        "l"
 #  define __PRI64_PREFIX	"ll"
 #  define __PRIPTR_PREFIX
 #endif
+#define PRId32  __PRI32_PREFIX	"d"
+#define PRIu32  __PRI32_PREFIX	"u"
+#define PRIx32  __PRI32_PREFIX	"x"
 #define PRId64  __PRI64_PREFIX	"d"
 #define PRIu64  __PRI64_PREFIX	"u"
 #define PRIx64  __PRI64_PREFIX	"x"
-- 
2.10.1


* [kvm-unit-tests PATCH v7 05/11] lib: add isaac prng library from CCAN
From: Alex Bennée @ 2016-11-24 16:10 UTC
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Timothy B . Terriberry

It's often useful to introduce some sort of random variation when
testing conditions where several CPUs race. Instead of each test
implementing some half-arsed PRNG, bring in a decent one which has
good statistical randomness. Obviously it is deterministic for a given
seed value, which is likely the behaviour you want.

I've pulled in the ISAAC library from CCAN:

    http://ccodearchive.net/info/isaac.html

I shaved off the float related stuff which is less useful for unit
testing and re-indented to fit the style. The original license was
CC0 (Public Domain) which is compatible with the LGPL v2 of
kvm-unit-tests.
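
A minimal usage sketch (the seed bytes here are arbitrary):

  isaac_ctx prng;
  isaac_init(&prng, (const unsigned char *)"mttcg", 5);
  uint32_t delay = isaac_next_uint(&prng, 1000); /* uniform in 0..999 */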

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
CC: Timothy B. Terriberry <tterribe@xiph.org>
Acked-by: Andrew Jones <drjones@redhat.com>
---
 arm/Makefile.common |   1 +
 lib/prng.c          | 162 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/prng.h          |  82 ++++++++++++++++++++++++++
 3 files changed, 245 insertions(+)
 create mode 100644 lib/prng.c
 create mode 100644 lib/prng.h

diff --git a/arm/Makefile.common b/arm/Makefile.common
index 6c0898f..52f7440 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -40,6 +40,7 @@ cflatobjs += lib/pci-testdev.o
 cflatobjs += lib/virtio.o
 cflatobjs += lib/virtio-mmio.o
 cflatobjs += lib/chr-testdev.o
+cflatobjs += lib/prng.o
 cflatobjs += lib/arm/io.o
 cflatobjs += lib/arm/setup.o
 cflatobjs += lib/arm/mmu.o
diff --git a/lib/prng.c b/lib/prng.c
new file mode 100644
index 0000000..ebd6df7
--- /dev/null
+++ b/lib/prng.c
@@ -0,0 +1,162 @@
+/*
+ * Pseudo Random Number Generator
+ *
+ * Lifted from ccan modules ilog/isaac under CC0
+ *   - http://ccodearchive.net/info/isaac.html
+ *   - http://ccodearchive.net/info/ilog.html
+ *
+ * And lightly hacked to compile under the KVM unit test environment.
+ * This provides a handy RNG for torture tests that want to vary
+ * delays and the like.
+ *
+ */
+
+/*Written by Timothy B. Terriberry (tterribe@xiph.org) 1999-2009.
+  CC0 (Public domain) - see LICENSE file for details
+  Based on the public domain implementation by Robert J. Jenkins Jr.*/
+
+#include "libcflat.h"
+
+#include <string.h>
+#include "prng.h"
+
+#define ISAAC_MASK        (0xFFFFFFFFU)
+
+/* Extract ISAAC_SZ_LOG bits (starting at bit 2). */
+static inline uint32_t lower_bits(uint32_t x)
+{
+	return (x & ((ISAAC_SZ-1) << 2)) >> 2;
+}
+
+/* Extract next ISAAC_SZ_LOG bits (starting at bit ISAAC_SZ_LOG+2). */
+static inline uint32_t upper_bits(uint32_t y)
+{
+	return (y >> (ISAAC_SZ_LOG+2)) & (ISAAC_SZ-1);
+}
+
+static void isaac_update(isaac_ctx *_ctx){
+	uint32_t *m;
+	uint32_t *r;
+	uint32_t  a;
+	uint32_t  b;
+	uint32_t  x;
+	uint32_t  y;
+	int       i;
+	m=_ctx->m;
+	r=_ctx->r;
+	a=_ctx->a;
+	b=_ctx->b+(++_ctx->c);
+	for(i=0;i<ISAAC_SZ/2;i++){
+		x=m[i];
+		a=(a^a<<13)+m[i+ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a>>6)+m[i+ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a<<2)+m[i+ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a>>16)+m[i+ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+	}
+	for(i=ISAAC_SZ/2;i<ISAAC_SZ;i++){
+		x=m[i];
+		a=(a^a<<13)+m[i-ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a>>6)+m[i-ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a<<2)+m[i-ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a>>16)+m[i-ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+	}
+	_ctx->b=b;
+	_ctx->a=a;
+	_ctx->n=ISAAC_SZ;
+}
+
+static void isaac_mix(uint32_t _x[8]){
+	static const unsigned char SHIFT[8]={11,2,8,16,10,4,8,9};
+	int i;
+	for(i=0;i<8;i++){
+		_x[i]^=_x[(i+1)&7]<<SHIFT[i];
+		_x[(i+3)&7]+=_x[i];
+		_x[(i+1)&7]+=_x[(i+2)&7];
+		i++;
+		_x[i]^=_x[(i+1)&7]>>SHIFT[i];
+		_x[(i+3)&7]+=_x[i];
+		_x[(i+1)&7]+=_x[(i+2)&7];
+	}
+}
+
+
+void isaac_init(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed){
+	_ctx->a=_ctx->b=_ctx->c=0;
+	memset(_ctx->r,0,sizeof(_ctx->r));
+	isaac_reseed(_ctx,_seed,_nseed);
+}
+
+void isaac_reseed(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed){
+	uint32_t *m;
+	uint32_t *r;
+	uint32_t  x[8];
+	int       i;
+	int       j;
+	m=_ctx->m;
+	r=_ctx->r;
+	if(_nseed>ISAAC_SEED_SZ_MAX)_nseed=ISAAC_SEED_SZ_MAX;
+	for(i=0;i<_nseed>>2;i++){
+		r[i]^=(uint32_t)_seed[i<<2|3]<<24|(uint32_t)_seed[i<<2|2]<<16|
+			(uint32_t)_seed[i<<2|1]<<8|_seed[i<<2];
+	}
+	_nseed-=i<<2;
+	if(_nseed>0){
+		uint32_t ri;
+		ri=_seed[i<<2];
+		for(j=1;j<_nseed;j++)ri|=(uint32_t)_seed[i<<2|j]<<(j<<3);
+		r[i++]^=ri;
+	}
+	x[0]=x[1]=x[2]=x[3]=x[4]=x[5]=x[6]=x[7]=0x9E3779B9U;
+	for(i=0;i<4;i++)isaac_mix(x);
+	for(i=0;i<ISAAC_SZ;i+=8){
+		for(j=0;j<8;j++)x[j]+=r[i+j];
+		isaac_mix(x);
+		memcpy(m+i,x,sizeof(x));
+	}
+	for(i=0;i<ISAAC_SZ;i+=8){
+		for(j=0;j<8;j++)x[j]+=m[i+j];
+		isaac_mix(x);
+		memcpy(m+i,x,sizeof(x));
+	}
+	isaac_update(_ctx);
+}
+
+uint32_t isaac_next_uint32(isaac_ctx *_ctx){
+	if(!_ctx->n)isaac_update(_ctx);
+	return _ctx->r[--_ctx->n];
+}
+
+uint32_t isaac_next_uint(isaac_ctx *_ctx,uint32_t _n){
+	uint32_t r;
+	uint32_t v;
+	uint32_t d;
+	do{
+		r=isaac_next_uint32(_ctx);
+		v=r%_n;
+		d=r-v;
+	}
+	while(((d+_n-1)&ISAAC_MASK)<d);
+	return v;
+}
diff --git a/lib/prng.h b/lib/prng.h
new file mode 100644
index 0000000..bf5776d
--- /dev/null
+++ b/lib/prng.h
@@ -0,0 +1,82 @@
+/*
+ * PRNG Header
+ */
+#ifndef __PRNG_H__
+#define __PRNG_H__
+
+# include <stdint.h>
+
+
+
+typedef struct isaac_ctx isaac_ctx;
+
+
+
+/*This value may be lowered to reduce memory usage on embedded platforms, at
+  the cost of reducing security and increasing bias.
+  Quoting Bob Jenkins: "The current best guess is that bias is detectable after
+  2**37 values for [ISAAC_SZ_LOG]=3, 2**45 for 4, 2**53 for 5, 2**61 for 6,
+  2**69 for 7, and 2**77 values for [ISAAC_SZ_LOG]=8."*/
+#define ISAAC_SZ_LOG      (8)
+#define ISAAC_SZ          (1<<ISAAC_SZ_LOG)
+#define ISAAC_SEED_SZ_MAX (ISAAC_SZ<<2)
+
+
+
+/*ISAAC is the most advanced of a series of pseudo-random number generators
+  designed by Robert J. Jenkins Jr. in 1996.
+  http://www.burtleburtle.net/bob/rand/isaac.html
+  To quote:
+  No efficient method is known for deducing their internal states.
+  ISAAC requires an amortized 18.75 instructions to produce a 32-bit value.
+  There are no cycles in ISAAC shorter than 2**40 values.
+  The expected cycle length is 2**8295 values.*/
+struct isaac_ctx{
+	unsigned n;
+	uint32_t r[ISAAC_SZ];
+	uint32_t m[ISAAC_SZ];
+	uint32_t a;
+	uint32_t b;
+	uint32_t c;
+};
+
+
+/**
+ * isaac_init - Initialize an instance of the ISAAC random number generator.
+ * @_ctx:   The instance to initialize.
+ * @_seed:  The specified seed bytes.
+ *          This may be NULL if _nseed is less than or equal to zero.
+ * @_nseed: The number of bytes to use for the seed.
+ *          If this is greater than ISAAC_SEED_SZ_MAX, the extra bytes are
+ *           ignored.
+ */
+void isaac_init(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed);
+
+/**
+ * isaac_reseed - Mix a new batch of entropy into the current state.
+ * To reset ISAAC to a known state, call isaac_init() again instead.
+ * @_ctx:   The instance to reseed.
+ * @_seed:  The specified seed bytes.
+ *          This may be NULL if _nseed is zero.
+ * @_nseed: The number of bytes to use for the seed.
+ *          If this is greater than ISAAC_SEED_SZ_MAX, the extra bytes are
+ *           ignored.
+ */
+void isaac_reseed(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed);
+/**
+ * isaac_next_uint32 - Return the next random 32-bit value.
+ * @_ctx: The ISAAC instance to generate the value with.
+ */
+uint32_t isaac_next_uint32(isaac_ctx *_ctx);
+/**
+ * isaac_next_uint - Uniform random integer less than the given value.
+ * @_ctx: The ISAAC instance to generate the value with.
+ * @_n:   The upper bound on the range of numbers returned (not inclusive).
+ *        This must be greater than zero and less than 2**32.
+ *        To return integers in the full range 0...2**32-1, use
+ *         isaac_next_uint32() instead.
+ * Return: An integer uniformly distributed between 0 and _n-1 (inclusive).
+ */
+uint32_t isaac_next_uint(isaac_ctx *_ctx,uint32_t _n);
+
+#endif
-- 
2.10.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v7 05/11] lib: add isaac prng library from CCAN
@ 2016-11-24 16:10   ` Alex Bennée
  0 siblings, 0 replies; 93+ messages in thread
From: Alex Bennée @ 2016-11-24 16:10 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée,
	Timothy B . Terriberry

It's often useful to introduce some sort of random variation when
testing several racing CPU conditions. Instead of each test implementing
some half-arsed PRNG bring in a a decent one which has good statistical
randomness. Obviously it is deterministic for a given seed value which
is likely the behaviour you want.

I've pulled in the ISAAC library from CCAN:

    http://ccodearchive.net/info/isaac.html

I shaved off the float related stuff which is less useful for unit
testing and re-indented to fit the style. The original license was
CC0 (Public Domain) which is compatible with the LGPL v2 of
kvm-unit-tests.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
CC: Timothy B. Terriberry <tterribe@xiph.org>
Acked-by: Andrew Jones <drjones@redhat.com>
---
 arm/Makefile.common |   1 +
 lib/prng.c          | 162 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/prng.h          |  82 ++++++++++++++++++++++++++
 3 files changed, 245 insertions(+)
 create mode 100644 lib/prng.c
 create mode 100644 lib/prng.h

diff --git a/arm/Makefile.common b/arm/Makefile.common
index 6c0898f..52f7440 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -40,6 +40,7 @@ cflatobjs += lib/pci-testdev.o
 cflatobjs += lib/virtio.o
 cflatobjs += lib/virtio-mmio.o
 cflatobjs += lib/chr-testdev.o
+cflatobjs += lib/prng.o
 cflatobjs += lib/arm/io.o
 cflatobjs += lib/arm/setup.o
 cflatobjs += lib/arm/mmu.o
diff --git a/lib/prng.c b/lib/prng.c
new file mode 100644
index 0000000..ebd6df7
--- /dev/null
+++ b/lib/prng.c
@@ -0,0 +1,162 @@
+/*
+ * Pseudo Random Number Generator
+ *
+ * Lifted from ccan modules ilog/isaac under CC0
+ *   - http://ccodearchive.net/info/isaac.html
+ *   - http://ccodearchive.net/info/ilog.html
+ *
+ * And lightly hacked to compile under the KVM unit test environment.
+ * This provides a handy RNG for torture tests that want to vary
+ * delays and the like.
+ *
+ */
+
+/*Written by Timothy B. Terriberry (tterribe@xiph.org) 1999-2009.
+  CC0 (Public domain) - see LICENSE file for details
+  Based on the public domain implementation by Robert J. Jenkins Jr.*/
+
+#include "libcflat.h"
+
+#include <string.h>
+#include "prng.h"
+
+#define ISAAC_MASK        (0xFFFFFFFFU)
+
+/* Extract ISAAC_SZ_LOG bits (starting at bit 2). */
+static inline uint32_t lower_bits(uint32_t x)
+{
+	return (x & ((ISAAC_SZ-1) << 2)) >> 2;
+}
+
+/* Extract next ISAAC_SZ_LOG bits (starting at bit ISAAC_SZ_LOG+2). */
+static inline uint32_t upper_bits(uint32_t y)
+{
+	return (y >> (ISAAC_SZ_LOG+2)) & (ISAAC_SZ-1);
+}
+
+static void isaac_update(isaac_ctx *_ctx){
+	uint32_t *m;
+	uint32_t *r;
+	uint32_t  a;
+	uint32_t  b;
+	uint32_t  x;
+	uint32_t  y;
+	int       i;
+	m=_ctx->m;
+	r=_ctx->r;
+	a=_ctx->a;
+	b=_ctx->b+(++_ctx->c);
+	for(i=0;i<ISAAC_SZ/2;i++){
+		x=m[i];
+		a=(a^a<<13)+m[i+ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a>>6)+m[i+ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a<<2)+m[i+ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a>>16)+m[i+ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+	}
+	for(i=ISAAC_SZ/2;i<ISAAC_SZ;i++){
+		x=m[i];
+		a=(a^a<<13)+m[i-ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a>>6)+m[i-ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a<<2)+m[i-ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a>>16)+m[i-ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+	}
+	_ctx->b=b;
+	_ctx->a=a;
+	_ctx->n=ISAAC_SZ;
+}
+
+static void isaac_mix(uint32_t _x[8]){
+	static const unsigned char SHIFT[8]={11,2,8,16,10,4,8,9};
+	int i;
+	for(i=0;i<8;i++){
+		_x[i]^=_x[(i+1)&7]<<SHIFT[i];
+		_x[(i+3)&7]+=_x[i];
+		_x[(i+1)&7]+=_x[(i+2)&7];
+		i++;
+		_x[i]^=_x[(i+1)&7]>>SHIFT[i];
+		_x[(i+3)&7]+=_x[i];
+		_x[(i+1)&7]+=_x[(i+2)&7];
+	}
+}
+
+
+void isaac_init(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed){
+	_ctx->a=_ctx->b=_ctx->c=0;
+	memset(_ctx->r,0,sizeof(_ctx->r));
+	isaac_reseed(_ctx,_seed,_nseed);
+}
+
+void isaac_reseed(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed){
+	uint32_t *m;
+	uint32_t *r;
+	uint32_t  x[8];
+	int       i;
+	int       j;
+	m=_ctx->m;
+	r=_ctx->r;
+	if(_nseed>ISAAC_SEED_SZ_MAX)_nseed=ISAAC_SEED_SZ_MAX;
+	for(i=0;i<_nseed>>2;i++){
+		r[i]^=(uint32_t)_seed[i<<2|3]<<24|(uint32_t)_seed[i<<2|2]<<16|
+			(uint32_t)_seed[i<<2|1]<<8|_seed[i<<2];
+	}
+	_nseed-=i<<2;
+	if(_nseed>0){
+		uint32_t ri;
+		ri=_seed[i<<2];
+		for(j=1;j<_nseed;j++)ri|=(uint32_t)_seed[i<<2|j]<<(j<<3);
+		r[i++]^=ri;
+	}
+	x[0]=x[1]=x[2]=x[3]=x[4]=x[5]=x[6]=x[7]=0x9E3779B9U;
+	for(i=0;i<4;i++)isaac_mix(x);
+	for(i=0;i<ISAAC_SZ;i+=8){
+		for(j=0;j<8;j++)x[j]+=r[i+j];
+		isaac_mix(x);
+		memcpy(m+i,x,sizeof(x));
+	}
+	for(i=0;i<ISAAC_SZ;i+=8){
+		for(j=0;j<8;j++)x[j]+=m[i+j];
+		isaac_mix(x);
+		memcpy(m+i,x,sizeof(x));
+	}
+	isaac_update(_ctx);
+}
+
+uint32_t isaac_next_uint32(isaac_ctx *_ctx){
+	if(!_ctx->n)isaac_update(_ctx);
+	return _ctx->r[--_ctx->n];
+}
+
+uint32_t isaac_next_uint(isaac_ctx *_ctx,uint32_t _n){
+	uint32_t r;
+	uint32_t v;
+	uint32_t d;
+	do{
+		r=isaac_next_uint32(_ctx);
+		v=r%_n;
+		d=r-v;
+	}
+	while(((d+_n-1)&ISAAC_MASK)<d);
+	return v;
+}
diff --git a/lib/prng.h b/lib/prng.h
new file mode 100644
index 0000000..bf5776d
--- /dev/null
+++ b/lib/prng.h
@@ -0,0 +1,82 @@
+/*
+ * PRNG Header
+ */
+#ifndef __PRNG_H__
+#define __PRNG_H__
+
+# include <stdint.h>
+
+
+
+typedef struct isaac_ctx isaac_ctx;
+
+
+
+/*This value may be lowered to reduce memory usage on embedded platforms, at
+  the cost of reducing security and increasing bias.
+  Quoting Bob Jenkins: "The current best guess is that bias is detectable after
+  2**37 values for [ISAAC_SZ_LOG]=3, 2**45 for 4, 2**53 for 5, 2**61 for 6,
+  2**69 for 7, and 2**77 values for [ISAAC_SZ_LOG]=8."*/
+#define ISAAC_SZ_LOG      (8)
+#define ISAAC_SZ          (1<<ISAAC_SZ_LOG)
+#define ISAAC_SEED_SZ_MAX (ISAAC_SZ<<2)
+
+
+
+/*ISAAC is the most advanced of a series of pseudo-random number generators
+  designed by Robert J. Jenkins Jr. in 1996.
+  http://www.burtleburtle.net/bob/rand/isaac.html
+  To quote:
+  No efficient method is known for deducing their internal states.
+  ISAAC requires an amortized 18.75 instructions to produce a 32-bit value.
+  There are no cycles in ISAAC shorter than 2**40 values.
+  The expected cycle length is 2**8295 values.*/
+struct isaac_ctx{
+	unsigned n;
+	uint32_t r[ISAAC_SZ];
+	uint32_t m[ISAAC_SZ];
+	uint32_t a;
+	uint32_t b;
+	uint32_t c;
+};
+
+
+/**
+ * isaac_init - Initialize an instance of the ISAAC random number generator.
+ * @_ctx:   The instance to initialize.
+ * @_seed:  The specified seed bytes.
+ *          This may be NULL if _nseed is less than or equal to zero.
+ * @_nseed: The number of bytes to use for the seed.
+ *          If this is greater than ISAAC_SEED_SZ_MAX, the extra bytes are
+ *           ignored.
+ */
+void isaac_init(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed);
+
+/**
+ * isaac_reseed - Mix a new batch of entropy into the current state.
+ * To reset ISAAC to a known state, call isaac_init() again instead.
+ * @_ctx:   The instance to reseed.
+ * @_seed:  The specified seed bytes.
+ *          This may be NULL if _nseed is zero.
+ * @_nseed: The number of bytes to use for the seed.
+ *          If this is greater than ISAAC_SEED_SZ_MAX, the extra bytes are
+ *           ignored.
+ */
+void isaac_reseed(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed);
+/**
+ * isaac_next_uint32 - Return the next random 32-bit value.
+ * @_ctx: The ISAAC instance to generate the value with.
+ */
+uint32_t isaac_next_uint32(isaac_ctx *_ctx);
+/**
+ * isaac_next_uint - Uniform random integer less than the given value.
+ * @_ctx: The ISAAC instance to generate the value with.
+ * @_n:   The upper bound on the range of numbers returned (not inclusive).
+ *        This must be greater than zero and less than 2**32.
+ *        To return integers in the full range 0...2**32-1, use
+ *         isaac_next_uint32() instead.
+ * Return: An integer uniformly distributed between 0 and _n-1 (inclusive).
+ */
+uint32_t isaac_next_uint(isaac_ctx *_ctx,uint32_t _n);
+
+#endif
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [kvm-unit-tests PATCH v7 06/11] arm/Makefile.common: force -fno-pic
  2016-11-24 16:10 ` [Qemu-devel] " Alex Bennée
  (?)
@ 2016-11-24 16:10   ` Alex Bennée
  -1 siblings, 0 replies; 93+ messages in thread
From: Alex Bennée @ 2016-11-24 16:10 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée

As distro compilers move towards build hardening defaults such as
ASLR we need to force -fno-pic. Failure to do so can lead to weird
relocation problems when we build our .flat binaries.
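
As an aside, one illustrative way to check whether a toolchain enables
PIE (and hence PIC) by default - hypothetical shell session:

  $ gcc -v 2>&1 | grep -o 'enable-default-pie'
  enable-default-pie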

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 arm/Makefile.common | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arm/Makefile.common b/arm/Makefile.common
index 52f7440..cca0d9c 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -21,6 +21,7 @@ phys_base = $(LOADADDR)
 
 CFLAGS += -std=gnu99
 CFLAGS += -ffreestanding
+CFLAGS += -fno-pic
 CFLAGS += -Wextra
 CFLAGS += -O2
 CFLAGS += -I lib -I lib/libfdt
-- 
2.10.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [kvm-unit-tests PATCH v7 07/11] arm/tlbflush-code: Add TLB flush during code execution test
  2016-11-24 16:10 ` [Qemu-devel] " Alex Bennée
  (?)
@ 2016-11-24 16:10   ` Alex Bennée
  -1 siblings, 0 replies; 93+ messages in thread
From: Alex Bennée @ 2016-11-24 16:10 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: mttcg, nikunj, jan.kiszka, mark.burton, qemu-devel, cota,
	serge.fdrv, pbonzini, bobby.prani, rth

This adds a fairly brain dead torture test for TLB flushes intended for
stressing the MTTCG QEMU build. It takes the usual -smp option for
multiple CPUs.

By default CPU0 will do a TLBIALL flush after each cycle. You can
pass options via -append to control additional aspects of the test
(an example invocation follows the list):

  - "page" flush each page in turn (one per function)
  - "self" do the flush after each computation cycle
  - "verbose" report progress on each computation cycle

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
CC: Mark Rutland <mark.rutland@arm.com>

---
v2
  - rename to tlbflush-test
  - made makefile changes cleaner
  - added self/other flush mode
  - create specific prefix
  - whitespace fixes
v3
  - using new SMP framework for test running
v4
  - merge in the unittests.cfg
v5
  - max out at -smp 4
  - printf fmtfix
v7
  - rename to tlbflush-code
  - int -> bool flags
---
 arm/Makefile.common |   2 +
 arm/tlbflush-code.c | 212 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 arm/unittests.cfg   |  24 ++++++
 3 files changed, 238 insertions(+)
 create mode 100644 arm/tlbflush-code.c

diff --git a/arm/Makefile.common b/arm/Makefile.common
index cca0d9c..de99a6e 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -13,6 +13,7 @@ tests-common  = $(TEST_DIR)/selftest.flat
 tests-common += $(TEST_DIR)/spinlock-test.flat
 tests-common += $(TEST_DIR)/pci-test.flat
 tests-common += $(TEST_DIR)/gic.flat
+tests-common += $(TEST_DIR)/tlbflush-code.flat
 
 all: test_cases
 
@@ -81,3 +82,4 @@ generated_files = $(asm-offsets)
 test_cases: $(generated_files) $(tests-common) $(tests)
 
 $(TEST_DIR)/selftest.o $(cstart.o): $(asm-offsets)
+$(TEST_DIR)/tlbflush-code.elf: $(cstart.o) $(TEST_DIR)/tlbflush-code.o
diff --git a/arm/tlbflush-code.c b/arm/tlbflush-code.c
new file mode 100644
index 0000000..cb5cdc2
--- /dev/null
+++ b/arm/tlbflush-code.c
@@ -0,0 +1,212 @@
+/*
+ * TLB Flush Race Tests
+ *
+ * These tests are designed to test for incorrect TLB flush semantics
+ * under emulation. The initial CPU will set all the others working on
+ * a computation task and will then trigger TLB flushes across the
+ * system. It doesn't actually need to re-map anything but the flushes
+ * themselves will trigger QEMU's TCG self-modifying code detection
+ * which will invalidate any generated code, causing re-translation.
+ * Eventually the code buffer will fill and a general tb_flush() will
+ * be triggered.
+ *
+ * Copyright (C) 2016, Linaro, Alex Bennée <alex.bennee@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+
+#include <libcflat.h>
+#include <asm/smp.h>
+#include <asm/cpumask.h>
+#include <asm/barrier.h>
+#include <asm/mmu.h>
+
+#define SEQ_LENGTH 10
+#define SEQ_HASH 0x7cd707fe
+
+static cpumask_t smp_test_complete;
+static int flush_count = 1000000;
+static bool flush_self;
+static bool flush_page;
+static bool flush_verbose;
+
+/*
+ * Work functions
+ *
+ * These work functions need to be:
+ *
+ *  - page aligned, so we can flush one function at a time
+ *  - have branches, so QEMU TCG generates multiple basic blocks
+ *  - call across pages, so we exercise the TCG basic block slow path
+ */
+
+/* Adler32 */
+__attribute__((aligned(PAGE_SIZE))) uint32_t hash_array(const void *buf,
+							size_t buflen)
+{
+	const uint8_t *data = (uint8_t *) buf;
+	uint32_t s1 = 1;
+	uint32_t s2 = 0;
+
+	for (size_t n = 0; n < buflen; n++) {
+		s1 = (s1 + data[n]) % 65521;
+		s2 = (s2 + s1) % 65521;
+	}
+	return (s2 << 16) | s1;
+}
+
+__attribute__((aligned(PAGE_SIZE))) void create_fib_sequence(int length,
+							unsigned int *array)
+{
+	int i;
+
+	/* first two values */
+	array[0] = 0;
+	array[1] = 1;
+	for (i=2; i<length; i++) {
+		array[i] = array[i-2] + array[i-1];
+	}
+}
+
+__attribute__((aligned(PAGE_SIZE))) unsigned long long factorial(unsigned int n)
+{
+	unsigned int i;
+	unsigned long long fac = 1;
+	for (i=1; i<=n; i++)
+	{
+		fac = fac * i;
+	}
+	return fac;
+}
+
+__attribute__((aligned(PAGE_SIZE))) void factorial_array
+(unsigned int n, unsigned int *input, unsigned long long *output)
+{
+	unsigned int i;
+	for (i=0; i<n; i++) {
+		output[i] = factorial(input[i]);
+	}
+}
+
+__attribute__((aligned(PAGE_SIZE))) unsigned int do_computation(void)
+{
+	unsigned int fib_array[SEQ_LENGTH];
+	unsigned long long facfib_array[SEQ_LENGTH];
+	uint32_t fib_hash, facfib_hash;
+
+	create_fib_sequence(SEQ_LENGTH, &fib_array[0]);
+	fib_hash = hash_array(&fib_array[0], sizeof(fib_array));
+	factorial_array(SEQ_LENGTH, &fib_array[0], &facfib_array[0]);
+	facfib_hash = hash_array(&facfib_array[0], sizeof(facfib_array));
+
+	return (fib_hash ^ facfib_hash);
+}
+
+/* This provides a table of the work functions so we can flush each
+ * page individually
+ */
+static void * pages[] = {&hash_array, &create_fib_sequence, &factorial,
+			 &factorial_array, &do_computation};
+
+static void do_flush(int i)
+{
+	if (flush_page) {
+		flush_tlb_page((unsigned long)pages[i % ARRAY_SIZE(pages)]);
+	} else {
+		flush_tlb_all();
+	}
+}
+
+
+static void just_compute(void)
+{
+	int i, errors = 0;
+	int cpu = smp_processor_id();
+
+	uint32_t result;
+
+	printf("CPU%d online\n", cpu);
+
+	for (i=0; i < flush_count; i++) {
+		result = do_computation();
+
+		if (result != SEQ_HASH) {
+			errors++;
+			printf("CPU%d: seq%d 0x%"PRIx32"!=0x%x\n",
+				cpu, i, result, SEQ_HASH);
+		}
+
+		if (flush_verbose && (i % 1000) == 0) {
+			printf("CPU%d: seq%d\n", cpu, i);
+		}
+
+		if (flush_self) {
+			do_flush(i);
+		}
+	}
+
+	report("CPU%d: Done - Errors: %d\n", errors == 0, cpu, errors);
+
+	cpumask_set_cpu(cpu, &smp_test_complete);
+	if (cpu != 0)
+		halt();
+}
+
+static void just_flush(void)
+{
+	int cpu = smp_processor_id();
+	int i = 0;
+
+	/* set our CPU as done, keep flushing until everyone else
+	   has finished */
+	cpumask_set_cpu(cpu, &smp_test_complete);
+
+	while (!cpumask_full(&smp_test_complete)) {
+		do_flush(i++);
+	}
+
+	report("CPU%d: Done - Triggered %d flushes\n", true, cpu, i);
+}
+
+int main(int argc, char **argv)
+{
+	int cpu, i;
+	char prefix[100];
+
+	for (i=0; i<argc; i++) {
+		char *arg = argv[i];
+
+		if (strcmp(arg, "page") == 0) {
+			flush_page = true;
+                }
+
+                if (strcmp(arg, "self") == 0) {
+			flush_self = true;
+                }
+
+		if (strcmp(arg, "verbose") == 0) {
+			flush_verbose = true;
+                }
+	}
+
+	snprintf(prefix, sizeof(prefix), "tlbflush_%s_%s",
+		flush_page?"page":"all",
+		flush_self?"self":"other");
+	report_prefix_push(prefix);
+
+	for_each_present_cpu(cpu) {
+		if (cpu == 0)
+			continue;
+		smp_boot_secondary(cpu, just_compute);
+	}
+
+	if (flush_self)
+		just_compute();
+	else
+		just_flush();
+
+	while (!cpumask_full(&smp_test_complete))
+		cpu_relax();
+
+	return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index c7392c7..beaae84 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -72,3 +72,27 @@ file = gic.flat
 smp = $MAX_SMP
 extra_params = -machine gic-version=3 -append 'ipi'
 groups = gic
+
+# TLB Torture Tests
+[tlbflush-code::all_other]
+file = tlbflush-code.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+groups = tlbflush
+
+[tlbflush-code::page_other]
+file = tlbflush-code.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'page'
+groups = tlbflush
+
+[tlbflush-code::all_self]
+file = tlbflush-code.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'self'
+groups = tlbflush
+
+[tlbflush-code::page_self]
+file = tlbflush-code.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'page self'
+groups = tlbflush
-- 
2.10.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [kvm-unit-tests PATCH v7 08/11] arm/tlbflush-data: Add TLB flush during data writes test
  2016-11-24 16:10 ` [Qemu-devel] " Alex Bennée
  (?)
@ 2016-11-24 16:10   ` Alex Bennée
  -1 siblings, 0 replies; 93+ messages in thread
From: Alex Bennée @ 2016-11-24 16:10 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: mttcg, nikunj, jan.kiszka, mark.burton, qemu-devel, cota,
	serge.fdrv, pbonzini, bobby.prani, rth

This test is the cousin of the tlbflush-code test. Instead of flushing
running code it re-maps virtual addresses while a buffer is being filled
up. It then audits the results checking for writes that have ended up in
the wrong place.

While tlbflush-code exercises QEMU's translation invalidation logic this
test stresses the SoftMMU cputlb code and ensures it is semantically
correct.

The test optionally takes two parameters for debugging (an example
invocation follows the list):

   cycles           - change the default number of test iterations
   page             - flush pages individually instead of all
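
For illustration (hypothetical invocation, again assuming the standard
kvm-unit-tests arm/run wrapper):

   ./arm/run arm/tlbflush-data.flat -smp 4 -append 'page cycles=10'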

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
CC: Mark Rutland <mark.rutland@arm.com>
---
 arm/Makefile.common |   2 +
 arm/tlbflush-data.c | 401 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 arm/unittests.cfg   |  12 ++
 3 files changed, 415 insertions(+)
 create mode 100644 arm/tlbflush-data.c

diff --git a/arm/Makefile.common b/arm/Makefile.common
index de99a6e..528166d 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -14,6 +14,7 @@ tests-common += $(TEST_DIR)/spinlock-test.flat
 tests-common += $(TEST_DIR)/pci-test.flat
 tests-common += $(TEST_DIR)/gic.flat
 tests-common += $(TEST_DIR)/tlbflush-code.flat
+tests-common += $(TEST_DIR)/tlbflush-data.flat
 
 all: test_cases
 
@@ -83,3 +84,4 @@ test_cases: $(generated_files) $(tests-common) $(tests)
 
 $(TEST_DIR)/selftest.o $(cstart.o): $(asm-offsets)
 $(TEST_DIR)/tlbflush-code.elf: $(cstart.o) $(TEST_DIR)/tlbflush-code.o
+$(TEST_DIR)/tlbflush-data.elf: $(cstart.o) $(TEST_DIR)/tlbflush-data.o
diff --git a/arm/tlbflush-data.c b/arm/tlbflush-data.c
new file mode 100644
index 0000000..7920179
--- /dev/null
+++ b/arm/tlbflush-data.c
@@ -0,0 +1,401 @@
+/*
+ * TLB Flush Race Tests
+ *
+ * These tests are designed to test for incorrect TLB flush semantics
+ * under emulation. The initial CPU will set all the others working on
+ * writing to a set of pages. It will then re-map one of the pages
+ * back and forth while recording the timestamps of when each page was
+ * active. The test fails if a write was detected on a page after the
+ * tlbflush switching to a new page should have completed.
+ *
+ * Copyright (C) 2016, Linaro, Alex Bennée <alex.bennee@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+
+#include <libcflat.h>
+#include <asm/smp.h>
+#include <asm/cpumask.h>
+#include <asm/barrier.h>
+#include <asm/mmu.h>
+
+#define NR_TIMESTAMPS 		((PAGE_SIZE/sizeof(u64)) << 2)
+#define NR_AUDIT_RECORDS	16384
+#define NR_DYNAMIC_PAGES 	3
+#define MAX_CPUS 		8
+
+#define MIN(a, b)		((a) < (b) ? (a) : (b))
+
+typedef struct {
+	u64    		timestamps[NR_TIMESTAMPS];
+} write_buffer;
+
+typedef struct {
+	write_buffer 	*newbuf;
+	u64		time_before_flush;
+	u64		time_after_flush;
+} audit_rec_t;
+
+typedef struct {
+	audit_rec_t 	records[NR_AUDIT_RECORDS];
+} audit_buffer;
+
+typedef struct {
+	write_buffer 	*stable_pages;
+	write_buffer    *dynamic_pages[NR_DYNAMIC_PAGES];
+	audit_buffer 	*audit;
+	unsigned int 	flush_count;
+} test_data_t;
+
+static test_data_t test_data[MAX_CPUS];
+
+static cpumask_t ready;
+static cpumask_t complete;
+
+static bool test_complete;
+static bool flush_verbose;
+static bool flush_by_page;
+static int test_cycles=3;
+static int secondary_cpus;
+
+static write_buffer * alloc_test_pages(void)
+{
+	write_buffer *pg;
+	pg = calloc(NR_TIMESTAMPS, sizeof(u64));
+	return pg;
+}
+
+static void setup_pages_for_cpu(int cpu)
+{
+	unsigned int i;
+
+	test_data[cpu].stable_pages = alloc_test_pages();
+
+	for (i=0; i<NR_DYNAMIC_PAGES; i++) {
+		test_data[cpu].dynamic_pages[i] = alloc_test_pages();
+	}
+
+	test_data[cpu].audit = calloc(NR_AUDIT_RECORDS, sizeof(audit_rec_t));
+}
+
+static audit_rec_t * get_audit_record(audit_buffer *buf, unsigned int record)
+{
+	return &buf->records[record];
+}
+
+/* Sync on a given cpumask */
+static void wait_on(int cpu, cpumask_t *mask)
+{
+	cpumask_set_cpu(cpu, mask);
+	while (!cpumask_full(mask))
+		cpu_relax();
+}
+
+static uint64_t sync_start(void)
+{
+	const uint64_t gate_mask = ~0x7ff;
+	uint64_t gate, now;
+	gate = get_cntvct() & gate_mask;
+	do {
+		now = get_cntvct();
+	} while ((now & gate_mask) == gate);
+
+	return now;
+}
+
+static void do_page_writes(void)
+{
+	unsigned int i, runs = 0;
+	int cpu = smp_processor_id();
+	write_buffer *stable_pages = test_data[cpu].stable_pages;
+	write_buffer *moving_page = test_data[cpu].dynamic_pages[0];
+
+	printf("CPU%d: ready %p/%p @ 0x%08" PRIx64"\n",
+		cpu, stable_pages, moving_page, get_cntvct());
+
+	while (!test_complete) {
+		u64 run_start, run_end;
+
+		smp_mb();
+		wait_on(cpu, &ready);
+		run_start = sync_start();
+
+		for (i = 0; i < NR_TIMESTAMPS; i++) {
+			u64 ts = get_cntvct();
+			moving_page->timestamps[i] = ts;
+			stable_pages->timestamps[i] = ts;
+		}
+
+		run_end = get_cntvct();
+		printf("CPU%d: run %d 0x%" PRIx64 "->0x%" PRIx64 " (%" PRId64 " cycles)\n",
+			cpu, runs++, run_start, run_end, run_end - run_start);
+
+		/* wait on completion - gets cleared by the main thread */
+		wait_on(cpu, &complete);
+	}
+}
+
+
+/*
+ * This is the core of the test. Timestamps are taken either side of
+ * the updating of the page table and the flush instruction. By
+ * keeping track of when the page mapping is changed we can detect any
+ * writes that shouldn't have made it to the other pages.
+ *
+ * This isn't the recommended way to update the page table. ARM
+ * recommends break-before-make so accesses that are in flight can
+ * trigger faults that can be handled cleanly.
+ */
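+
+/* Editor's illustration with made-up numbers: if an audit record
+ * shows a remap to page B with time_before_flush=100 and
+ * time_after_flush=120, then writes stamped before 100 should only
+ * appear on the previous page, writes stamped within [100,120] may
+ * legitimately land on either the previous page or B, and after 120
+ * only B should see new writes; check_pages() below counts an
+ * in-window write to any other page as "weird".
+ */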
+
+/* This mimics __flush_tlb_range from the kernel, doing a series of
+ * flush operations and then the dsb() to complete. */
+static void flush_pages(unsigned long start, unsigned long end)
+{
+	unsigned long addr;
+	start = start >> 12;
+	end = end >> 12;
+
+	dsb(ishst);
+	for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT -12)) {
+#if defined(__aarch64__)
+		asm("tlbi	vaae1is, %0" :: "r" (addr));
+#else
+		asm volatile("mcr p15, 0, %0, c8, c7, 3" :: "r" (addr));
+#endif
+	}
+	dsb(ish);
+}
+
+static void remap_one_page(test_data_t *data)
+{
+	u64 ts_before, ts_after;
+	int pg = (data->flush_count % (NR_DYNAMIC_PAGES + 1));
+	write_buffer *dynamic_pages_vaddr = data->dynamic_pages[0];
+	write_buffer *newbuf_paddr = data->dynamic_pages[pg];
+	write_buffer *end_page_paddr = newbuf_paddr+1;
+
+	ts_before = get_cntvct();
+	/* update the page table */
+	mmu_set_range_ptes(mmu_idmap,
+			(unsigned long) dynamic_pages_vaddr,
+			(unsigned long) newbuf_paddr,
+			(unsigned long) end_page_paddr,
+			__pgprot(PTE_WBWA));
+	/* until the flush completes, writes may still go to the old address */
+	if (flush_by_page) {
+		flush_pages((unsigned long)dynamic_pages_vaddr, (unsigned long)(dynamic_pages_vaddr+1));
+	} else {
+		flush_tlb_all();
+	}
+	ts_after = get_cntvct();
+
+	if (data->flush_count < NR_AUDIT_RECORDS) {
+		audit_rec_t *rec = get_audit_record(data->audit, data->flush_count);
+		rec->newbuf = newbuf_paddr;
+		rec->time_before_flush = ts_before;
+		rec->time_after_flush = ts_after;
+	}
+	data->flush_count++;
+}
+
+static int check_pages(int cpu, char *msg,
+		write_buffer *base_page, write_buffer *test_page,
+		audit_buffer *audit, unsigned int flushes)
+{
+	write_buffer *prev_page = base_page;
+	unsigned int empty = 0, write = 0, late = 0, weird = 0;
+	unsigned int ts_index = 0, audit_index;
+	u64 ts;
+
+	/* For each audit record */
+	for (audit_index = 0; audit_index < MIN(flushes, NR_AUDIT_RECORDS); audit_index++) {
+		audit_rec_t *rec = get_audit_record(audit, audit_index);
+
+		do {
+			/* Work through timestamps until we overtake
+			 * this audit record */
+			ts = test_page->timestamps[ts_index];
+
+			if (ts == 0) {
+				empty++;
+			} else if (ts < rec->time_before_flush) {
+				if (test_page == prev_page) {
+					write++;
+				} else {
+					late++;
+				}
+			} else if (ts >= rec->time_before_flush
+				&& ts <= rec->time_after_flush) {
+				if (test_page == prev_page
+					|| test_page == rec->newbuf) {
+					write++;
+				} else {
+					weird++;
+				}
+			} else if (ts > rec->time_after_flush) {
+				if (test_page == rec->newbuf) {
+					write++;
+				}
+				/* It's possible the ts is way ahead
+				 * of the current record so we can't
+				 * call a non-match weird...
+				 *
+				 * Time to skip to next audit record
+				 */
+				break;
+			}
+
+			ts = test_page->timestamps[ts_index++];
+		} while (ts <= rec->time_after_flush && ts_index < NR_TIMESTAMPS);
+
+
+		/* Next record */
+		prev_page = rec->newbuf;
+	} /* for each audit record */
+
+	if (flush_verbose) {
+		printf("CPU%d: %s %p => %p %u/%u/%u/%u (0/OK/L/?) = %u total\n",
+			cpu, msg, test_page, base_page,
+			empty, write, late, weird, empty+write+late+weird);
+	}
+
+	return weird;
+}
+
+static int audit_cpu_pages(int cpu, test_data_t *data)
+{
+	unsigned int pg, writes=0, ts_index = 0;
+	write_buffer *test_page;
+	int errors = 0;
+
+	/* first the stable page */
+	test_page = data->stable_pages;
+	do {
+		if (test_page->timestamps[ts_index++]) {
+			writes++;
+		}
+	} while (ts_index < NR_TIMESTAMPS);
+
+	if (writes != ts_index) {
+		errors += 1;
+	}
+
+	if (flush_verbose) {
+		printf("CPU%d: stable page %p %u writes\n",
+			cpu, test_page, writes);
+	}
+
+
+	/* Restore the mapping for dynamic page */
+	test_page = data->dynamic_pages[0];
+
+	mmu_set_range_ptes(mmu_idmap,
+			(unsigned long) test_page,
+			(unsigned long) test_page,
+			(unsigned long) &test_page[1],
+			__pgprot(PTE_WBWA));
+	flush_tlb_all();
+
+	for (pg=0; pg<NR_DYNAMIC_PAGES; pg++) {
+		errors += check_pages(cpu, "dynamic page", test_page,
+				data->dynamic_pages[pg],
+				data->audit, data->flush_count);
+	}
+
+	/* reset for next run */
+	memset(data->stable_pages, 0, sizeof(write_buffer));
+	for (pg=0; pg<NR_DYNAMIC_PAGES; pg++) {
+		memset(data->dynamic_pages[pg], 0, sizeof(write_buffer));
+	}
+	memset(data->audit, 0, sizeof(audit_buffer));
+	data->flush_count = 0;
+	smp_mb();
+
+	report("CPU%d: checked, errors: %d", errors == 0, cpu, errors);
+	return errors;
+}
+
+static void do_page_flushes(void)
+{
+	int i, cpu;
+
+	printf("CPU0: ready @ 0x%08" PRIx64"\n", get_cntvct());
+
+	for (i=0; i<test_cycles; i++) {
+		unsigned int flushes=0;
+		u64 run_start, run_end;
+		int cpus_finished;
+
+		cpumask_clear(&complete);
+		wait_on(0, &ready);
+		run_start = sync_start();
+
+		do {
+			for_each_present_cpu(cpu) {
+				if (cpu == 0)
+					continue;
+
+				/* do remap & flush */
+				remap_one_page(&test_data[cpu]);
+				flushes++;
+			}
+
+			cpus_finished = cpumask_weight(&complete);
+		} while (cpus_finished < secondary_cpus);
+
+		run_end = get_cntvct();
+
+		printf("CPU0: run %d 0x%" PRIx64 "->0x%" PRIx64 " (%" PRId64 " cycles, %u flushes)\n",
+			i, run_start, run_end, run_end - run_start, flushes);
+
+		/* Reset our ready mask for next cycle */
+		cpumask_clear_cpu(0, &ready);
+		smp_mb();
+		wait_on(0, &complete);
+
+		/* Check for discrepancies */
+		for_each_present_cpu(cpu) {
+			if (cpu == 0)
+				continue;
+			audit_cpu_pages(cpu, &test_data[cpu]);
+		}
+	}
+
+	test_complete = true;
+	smp_mb();
+	cpumask_set_cpu(0, &ready);
+	cpumask_set_cpu(0, &complete);
+}
+
+int main(int argc, char **argv)
+{
+	int cpu, i;
+
+	for (i=0; i<argc; i++) {
+		char *arg = argv[i];
+		if (strcmp(arg, "verbose") == 0) {
+			flush_verbose = true;
+		}
+		if (strcmp(arg, "page") == 0) {
+			flush_by_page = true;
+		}
+		if (strstr(arg, "cycles=") != NULL) {
+			char *p = strstr(arg, "=");
+			test_cycles = atol(p+1);
+		}
+	}
+
+	for_each_present_cpu(cpu) {
+		if (cpu == 0)
+			continue;
+
+		setup_pages_for_cpu(cpu);
+		smp_boot_secondary(cpu, do_page_writes);
+		secondary_cpus++;
+	}
+
+	/* CPU 0 does the flushes and checks the results */
+	do_page_flushes();
+
+	return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index beaae84..7dc7799 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -96,3 +96,15 @@ file = tlbflush-code.flat
 smp = $(($MAX_SMP>4?4:$MAX_SMP))
 extra_params = -append 'page self'
 groups = tlbflush
+
+[tlbflush-data::all]
+file = tlbflush-data.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+groups = tlbflush
+
+[tlbflush-data::page]
+file = tlbflush-data.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append "page"
+groups = tlbflush
+
-- 
2.10.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 93+ messages in thread

+			test_cycles = atol(p+1);
+		}
+	}
+
+	for_each_present_cpu(cpu) {
+		if (cpu == 0)
+			continue;
+
+		setup_pages_for_cpu(cpu);
+		smp_boot_secondary(cpu, do_page_writes);
+		secondary_cpus++;
+	}
+
+	/* CPU 0 does the flushes and checks the results */
+	do_page_flushes();
+
+	return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index beaae84..7dc7799 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -96,3 +96,15 @@ file = tlbflush-code.flat
 smp = $(($MAX_SMP>4?4:$MAX_SMP))
 extra_params = -append 'page self'
 groups = tlbflush
+
+[tlbflush-data::all]
+file = tlbflush-data.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+groups = tlbflush
+
+[tlbflush-data::page]
+file = tlbflush-data.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append "page"
+groups = tlbflush
+
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 93+ messages in thread


* [kvm-unit-tests PATCH v7 09/11] arm/locking-tests: add comprehensive locking test
  2016-11-24 16:10 ` [Qemu-devel] " Alex Bennée
  (?)
@ 2016-11-24 16:10   ` Alex Bennée
  -1 siblings, 0 replies; 93+ messages in thread
From: Alex Bennée @ 2016-11-24 16:10 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée

This test has been written mainly to stress multi-threaded TCG
behaviour, but its default mode will also demonstrate failure on real
hardware. The test takes the following parameters:

  - "lock" use GCC's locking semantics
  - "atomic" use GCC's __atomic primitives
  - "wfelock" use WaitForEvent sleep
  - "excl" use load/store exclusive semantics

Two more options allow the test to be tweaked:

  - "noshuffle" disables the memory shuffling
  - "count=%ld" set your own per-CPU increment count

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - Don't use thumb style strexeq stuff
  - Add atomic and wfelock tests
  - Add count/noshuffle test controls
  - Move barrier tests to separate test file
v4
  - fix up unittests.cfg to use correct test name
  - move into "locking" group, remove barrier tests
  - use a table to add tests, mark which are expected to work
  - correctly report XFAIL
v5
  - max out at -smp 4 in unittests.cfg
v7
  - make test control flags bools
  - default the count to 100000 (so it doesn't time out)
---
 arm/Makefile.common |   2 +
 arm/locking-test.c  | 302 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 arm/unittests.cfg   |  34 ++++++
 3 files changed, 338 insertions(+)
 create mode 100644 arm/locking-test.c

diff --git a/arm/Makefile.common b/arm/Makefile.common
index 528166d..eb4cfdf 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -15,6 +15,7 @@ tests-common += $(TEST_DIR)/pci-test.flat
 tests-common += $(TEST_DIR)/gic.flat
 tests-common += $(TEST_DIR)/tlbflush-code.flat
 tests-common += $(TEST_DIR)/tlbflush-data.flat
+tests-common += $(TEST_DIR)/locking-test.flat
 
 all: test_cases
 
@@ -85,3 +86,4 @@ test_cases: $(generated_files) $(tests-common) $(tests)
 $(TEST_DIR)/selftest.o $(cstart.o): $(asm-offsets)
 $(TEST_DIR)/tlbflush-code.elf: $(cstart.o) $(TEST_DIR)/tlbflush-code.o
 $(TEST_DIR)/tlbflush-data.elf: $(cstart.o) $(TEST_DIR)/tlbflush-data.o
+$(TEST_DIR)/locking-test.elf: $(cstart.o) $(TEST_DIR)/locking-test.o
diff --git a/arm/locking-test.c b/arm/locking-test.c
new file mode 100644
index 0000000..f10c61b
--- /dev/null
+++ b/arm/locking-test.c
@@ -0,0 +1,302 @@
+#include <libcflat.h>
+#include <asm/smp.h>
+#include <asm/cpumask.h>
+#include <asm/barrier.h>
+#include <asm/mmu.h>
+
+#include <prng.h>
+
+#define MAX_CPUS 8
+
+/* Test definition structure
+ *
+ * A simple structure that describes the test name, expected pass and
+ * increment function.
+ */
+
+/* Function pointers for test */
+typedef void (*inc_fn)(int cpu);
+
+typedef struct {
+	const char *test_name;
+	bool  should_pass;
+	inc_fn main_fn;
+} test_descr_t;
+
+/* How many increments to do */
+static int increment_count = 1000000;
+static bool do_shuffle = true;
+
+/* Shared value all the tests attempt to safely increment using
+ * various forms of atomic locking and exclusive behaviour.
+ */
+static unsigned int shared_value;
+
+/* PAGE_SIZE uint32_t entries mean the array spans several pages */
+__attribute__((aligned(PAGE_SIZE))) static uint32_t memory_array[PAGE_SIZE];
+
+/* We use the alignment of the following to ensure accesses to locking
+ * and synchronisation primitives don't interfere with the page of the
+ * shared value
+ */
+__attribute__((aligned(PAGE_SIZE))) static unsigned int per_cpu_value[MAX_CPUS];
+__attribute__((aligned(PAGE_SIZE))) static cpumask_t smp_test_complete;
+__attribute__((aligned(PAGE_SIZE))) struct isaac_ctx prng_context[MAX_CPUS];
+
+/* Some of the approaches use a global lock to prevent contention. */
+static int global_lock;
+
+/* In any SMP setting this *should* fail due to cores stepping on
+ * each other updating the shared variable
+ */
+static void increment_shared(int cpu)
+{
+	(void)cpu;
+
+	shared_value++;
+}
+
+/* GCC __sync primitives are deprecated in favour of __atomic */
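+/* GCC documents __sync_lock_test_and_set as an acquire barrier and
+ * __sync_lock_release as a release barrier, which is what makes this
+ * simple spinlock safe. */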
+static void increment_shared_with_lock(int cpu)
+{
+	(void)cpu;
+
+	while (__sync_lock_test_and_set(&global_lock, 1));
+	shared_value++;
+	__sync_lock_release(&global_lock);
+}
+
+/* In practice even __ATOMIC_RELAXED uses ARM's ldxr/stxr exclusive
+ * semantics */
+static void increment_shared_with_atomic(int cpu)
+{
+	(void)cpu;
+
+	__atomic_add_fetch(&shared_value, 1, __ATOMIC_SEQ_CST);
+}
+
+
+/*
+ * Load/store exclusive with WFE (wait-for-event)
+ *
+ * See ARMv8 ARM examples:
+ *   Use of Wait For Event (WFE) and Send Event (SEV) with locks
+ */
+
+static void increment_shared_with_wfelock(int cpu)
+{
+	(void)cpu;
+
+#if defined(__aarch64__)
+	asm volatile(
+	"	mov     w1, #1\n"
+	"       sevl\n"
+	"       prfm PSTL1KEEP, [%[lock]]\n"
+	"1:     wfe\n"
+	"	ldaxr	w0, [%[lock]]\n"
+	"	cbnz    w0, 1b\n"
+	"	stxr    w0, w1, [%[lock]]\n"
+	"	cbnz	w0, 1b\n"
+	/* lock held */
+	"	ldr	w0, [%[sptr]]\n"
+	"	add	w0, w0, #0x1\n"
+	"	str	w0, [%[sptr]]\n"
+	/* now release */
+	"	stlr	wzr, [%[lock]]\n"
+	: /* out */
+	: [lock] "r" (&global_lock), [sptr] "r" (&shared_value) /* in */
+	: "w0", "w1", "cc");
+#else
+	asm volatile(
+	"	mov     r1, #1\n"
+	"1:	ldrex	r0, [%[lock]]\n"
+	"	cmp     r0, #0\n"
+	"	wfene\n"
+	"	strexeq r0, r1, [%[lock]]\n"
+	"	cmpeq	r0, #0\n"
+	"	bne	1b\n"
+	"	dmb\n"
+	/* lock held */
+	"	ldr	r0, [%[sptr]]\n"
+	"	add	r0, r0, #0x1\n"
+	"	str	r0, [%[sptr]]\n"
+	/* now release */
+	"	mov	r0, #0\n"
+	"	dmb\n"
+	"	str	r0, [%[lock]]\n"
+	"	dsb\n"
+	"	sev\n"
+	: /* out */
+	: [lock] "r" (&global_lock), [sptr] "r" (&shared_value) /* in */
+	: "r0", "r1", "cc");
+#endif
+}
+
+
+/*
+ * Hand-written version of the load/store exclusive
+ */
+static void increment_shared_with_excl(int cpu)
+{
+	(void)cpu;
+
+#if defined(__aarch64__)
+	asm volatile(
+	"1:	ldxr	w0, [%[sptr]]\n"
+	"	add     w0, w0, #0x1\n"
+	"	stxr	w1, w0, [%[sptr]]\n"
+	"	cbnz	w1, 1b\n"
+	: /* out */
+	: [sptr] "r" (&shared_value) /* in */
+	: "w0", "w1", "cc");
+#else
+	asm volatile(
+	"1:	ldrex	r0, [%[sptr]]\n"
+	"	add     r0, r0, #0x1\n"
+	"	strex	r1, r0, [%[sptr]]\n"
+	"	cmp	r1, #0\n"
+	"	bne	1b\n"
+	: /* out */
+	: [sptr] "r" (&shared_value) /* in */
+	: "r0", "r1", "cc");
+#endif
+}
+
+/* Test array */
+static test_descr_t tests[] = {
+	{ "none", false, increment_shared },
+	{ "lock", true, increment_shared_with_lock },
+	{ "atomic", true, increment_shared_with_atomic },
+	{ "wfelock", true, increment_shared_with_wfelock },
+	{ "excl", true, increment_shared_with_excl }
+};
+
+/* The idea of this is just to generate some random load/store
+ * activity which may or may not race with an un-barriered increment
+ * of the shared counter.
+ */
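+/* The low five bits of seq give the access count; each step then uses
+ * the page-offset bits of seq to index the array and the low bit of
+ * lspat to choose between a load and a store. */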
+static void shuffle_memory(int cpu)
+{
+	int i;
+	uint32_t lspat = isaac_next_uint32(&prng_context[cpu]);
+	uint32_t seq = isaac_next_uint32(&prng_context[cpu]);
+	int count = seq & 0x1f;
+	uint32_t val=0;
+
+	seq >>= 5;
+
+	for (i=0; i<count; i++) {
+		int index = seq & ~PAGE_MASK;
+		if (lspat & 1) {
+			val ^= memory_array[index];
+		} else {
+			memory_array[index] = val;
+		}
+		seq >>= PAGE_SHIFT;
+		seq ^= lspat;
+		lspat >>= 1;
+	}
+
+}
+
+static inc_fn increment_function;
+
+static void do_increment(void)
+{
+	int i;
+	int cpu = smp_processor_id();
+
+	printf("CPU%d: online and ++ing\n", cpu);
+
+	for (i=0; i < increment_count; i++) {
+		per_cpu_value[cpu]++;
+		increment_function(cpu);
+
+		if (do_shuffle)
+			shuffle_memory(cpu);
+	}
+
+	printf("CPU%d: Done, %d incs\n", cpu, per_cpu_value[cpu]);
+
+	cpumask_set_cpu(cpu, &smp_test_complete);
+	if (cpu != 0)
+		halt();
+}
+
+static void setup_and_run_test(test_descr_t *test)
+{
+	unsigned int i, sum = 0;
+	int cpu, cpu_cnt = 0;
+
+	increment_function = test->main_fn;
+
+	/* fill our random page */
+	for (i=0; i<PAGE_SIZE; i++) {
+		memory_array[i] = isaac_next_uint32(&prng_context[0]);
+	}
+
+	for_each_present_cpu(cpu) {
+		uint32_t seed2 = isaac_next_uint32(&prng_context[0]);
+		cpu_cnt++;
+		if (cpu == 0)
+			continue;
+
+		isaac_init(&prng_context[cpu], (unsigned char *) &seed2, sizeof(seed2));
+		smp_boot_secondary(cpu, do_increment);
+	}
+
+	do_increment();
+
+	while (!cpumask_full(&smp_test_complete))
+		cpu_relax();
+
+	/* All CPUs done, now add up the per-CPU counts */
+	for_each_present_cpu(cpu) {
+		sum += per_cpu_value[cpu];
+	}
+
+	if (test->should_pass) {
+		report("total incs %d", sum == shared_value, shared_value);
+	} else {
+		report_xfail("total incs %d", true, sum == shared_value, shared_value);
+	}
+}
+
+int main(int argc, char **argv)
+{
+	static const unsigned char seed[] = "myseed";
+	test_descr_t *test = &tests[0];
+	int i;
+	unsigned int j;
+
+	isaac_init(&prng_context[0], &seed[0], sizeof(seed));
+
+	for (i=0; i<argc; i++) {
+		char *arg = argv[i];
+
+		/* Check for test name */
+		for (j = 0; j < ARRAY_SIZE(tests); j++) {
+			if (strcmp(arg, tests[j].test_name) == 0)
+				test = &tests[j];
+		}
+
+		/* Test modifiers */
+		if (strcmp(arg, "noshuffle") == 0) {
+			do_shuffle = false;
+			report_prefix_push("noshuffle");
+		} else if (strstr(arg, "count=") != NULL) {
+			char *p = strstr(arg, "=");
+			increment_count = atol(p+1);
+		} else {
+			isaac_reseed(&prng_context[0], (unsigned char *) arg, strlen(arg));
+		}
+	}
+
+	if (test) {
+		setup_and_run_test(test);
+	} else {
+		report("Unknown test", false);
+	}
+
+	return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 7dc7799..abbfe79 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -108,3 +108,37 @@ smp = $(($MAX_SMP>4?4:$MAX_SMP))
 extra_params = -append "page"
 groups = tlbflush
 
+# Locking tests
+[locking::none]
+file = locking-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+groups = locking
+accel = tcg
+
+[locking::lock]
+file = locking-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'lock'
+groups = locking
+accel = tcg
+
+[locking::atomic]
+file = locking-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'atomic'
+groups = locking
+accel = tcg
+
+[locking::wfelock]
+file = locking-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'wfelock'
+groups = locking
+accel = tcg
+
+[locking::excl]
+file = locking-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'excl'
+groups = locking
+accel = tcg
-- 
2.10.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread


* [kvm-unit-tests PATCH v7 10/11] arm/barrier-litmus-tests: add simple mp and sal litmus tests
  2016-11-24 16:10 ` [Qemu-devel] " Alex Bennée
  (?)
@ 2016-11-24 16:10   ` Alex Bennée
  -1 siblings, 0 replies; 93+ messages in thread
From: Alex Bennée @ 2016-11-24 16:10 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée, Will Deacon

This adds a framework for adding simple barrier litmus tests against
ARM. The litmus tests aren't as comprehensive as the academic
exercises, which attempt all sorts of things to keep racing CPUs
synced up. These tests do honour the "sync" parameter to provide a
poor man's equivalent.

The two litmus tests are:
  - message passing
  - store-after-load

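For reference, this is the shape of the message-passing test (a
minimal sketch of the idea, not the test code itself):

    /* writer (secondary CPU) */      /* reader (primary CPU) */
    entry->x = 1;                      y = entry->y;
    smp_wmb();                         smp_rmb();
    entry->y = 1;                      x = entry->x;

Observing y == 1 with x == 0 means the barriers have failed.
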
They both have a case that should fail (although it won't on
single-threaded TCG setups). If barriers aren't working properly the
store-after-load test will fail even on an x86 backend, as x86 allows
re-ordering of non-aliased stores.

I've imported a few more of the barrier primitives from the Linux
source tree so we consistently use macros.

The arm64 barrier primitives trip up on -Wstrict-aliasing so this is
disabled in the Makefile.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
CC: Will Deacon <will.deacon@arm.com>

---
v7
  - merge in store-after-load
  - clean-up sync-up code
  - use new counter api
  - fix xfail for sal test
v6
  - add a unittest.cfg
  - -fno-strict-aliasing
---
 Makefile                  |   2 +
 arm/Makefile.common       |   2 +
 arm/barrier-litmus-test.c | 437 ++++++++++++++++++++++++++++++++++++++++++++++
 arm/unittests.cfg         |  36 ++++
 lib/arm/asm/barrier.h     |  63 ++++++-
 lib/arm64/asm/barrier.h   |  50 ++++++
 6 files changed, 589 insertions(+), 1 deletion(-)
 create mode 100644 arm/barrier-litmus-test.c

diff --git a/Makefile b/Makefile
index 5201472..53594a1 100644
--- a/Makefile
+++ b/Makefile
@@ -51,10 +51,12 @@ fomit_frame_pointer := $(call cc-option, $(frame-pointer-flag), "")
 fnostack_protector := $(call cc-option, -fno-stack-protector, "")
 fnostack_protector_all := $(call cc-option, -fno-stack-protector-all, "")
 wno_frame_address := $(call cc-option, -Wno-frame-address, "")
+fno_strict_aliasing := $(call cc-option, -fno-strict-aliasing, "")
 CFLAGS += $(fomit_frame_pointer)
 CFLAGS += $(fno_stack_protector)
 CFLAGS += $(fno_stack_protector_all)
 CFLAGS += $(wno_frame_address)
+CFLAGS += $(fno_strict_aliasing)
 
 CXXFLAGS += $(CFLAGS)
 
diff --git a/arm/Makefile.common b/arm/Makefile.common
index eb4cfdf..a508128 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -16,6 +16,7 @@ tests-common += $(TEST_DIR)/gic.flat
 tests-common += $(TEST_DIR)/tlbflush-code.flat
 tests-common += $(TEST_DIR)/tlbflush-data.flat
 tests-common += $(TEST_DIR)/locking-test.flat
+tests-common += $(TEST_DIR)/barrier-litmus-test.flat
 
 all: test_cases
 
@@ -87,3 +88,4 @@ $(TEST_DIR)/selftest.o $(cstart.o): $(asm-offsets)
 $(TEST_DIR)/tlbflush-code.elf: $(cstart.o) $(TEST_DIR)/tlbflush-code.o
 $(TEST_DIR)/tlbflush-data.elf: $(cstart.o) $(TEST_DIR)/tlbflush-data.o
 $(TEST_DIR)/locking-test.elf: $(cstart.o) $(TEST_DIR)/locking-test.o
+$(TEST_DIR)/barrier-litmus-test.elf: $(cstart.o) $(TEST_DIR)/barrier-litmus-test.o
diff --git a/arm/barrier-litmus-test.c b/arm/barrier-litmus-test.c
new file mode 100644
index 0000000..2557a88
--- /dev/null
+++ b/arm/barrier-litmus-test.c
@@ -0,0 +1,437 @@
+/*
+ * ARM Barrier Litmus Tests
+ *
+ * This test provides a framework for testing barrier conditions on
+ * the processor. It's simpler than the more involved barrier testing
+ * frameworks as we are looking for simple failures of QEMU's TCG not
+ * weird edge cases the silicon gets wrong.
+ */
+
+#include <libcflat.h>
+#include <asm/smp.h>
+#include <asm/cpumask.h>
+#include <asm/barrier.h>
+#include <asm/mmu.h>
+
+#define MAX_CPUS 8
+
+/* Array size and access controls */
+static int array_size = 100000;
+static int wait_if_ahead = 0;
+
+static cpumask_t cpu_mask;
+
+/*
+ * These test_array_* structures are a contiguous array modified by two or more
+ * competing CPUs. The padding is to ensure the variables do not share
+ * cache lines.
+ *
+ * All structures start zeroed.
+ */
+
+typedef struct test_array
+{
+	volatile unsigned int x;
+	uint8_t dummy[64];
+	volatile unsigned int y;
+	uint8_t dummy2[64];
+	volatile unsigned int r[MAX_CPUS];
+} test_array;
+
+volatile test_array *array;
+
+/* Test definition structure
+ *
+ * The first function always runs on the primary CPU; it is
+ * usually the one that will detect any weirdness and trigger the
+ * failure of the test.
+ */
+
+typedef void (*test_fn)(void);
+
+typedef struct {
+	const char *test_name;
+	bool  should_pass;
+	test_fn main_fn;
+	test_fn secondary_fns[MAX_CPUS-1];
+} test_descr_t;
+
+/* Litmus tests */
+
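+/*
+ * Busy-wait until the virtual counter passes the next 0x40000-tick
+ * boundary, so CPUs that started at slightly different times enter
+ * their loops at roughly the same moment; returns the start time.
+ */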
+static unsigned long sync_start(void)
+{
+	const unsigned long gate_mask = ~0x3ffff;
+	unsigned long gate, now;
+	gate = get_cntvct() & gate_mask;
+	do {
+		now = get_cntvct();
+	} while ((now & gate_mask) == gate);
+
+	return now;
+}
+
+/* Simple Message Passing
+ *
+ * x is the message data
+ * y is the flag to indicate the data is ready
+ *
+ * Reading x == 0 when y == 1 is a failure.
+ */
+
+void message_passing_write(void)
+{
+	int i;
+
+	sync_start();
+	for (i=0; i< array_size; i++) {
+		volatile test_array *entry = &array[i];
+		entry->x = 1;
+		entry->y = 1;
+	}
+
+	halt();
+}
+
+void message_passing_read(void)
+{
+	int i;
+	int errors = 0, ready = 0;
+
+	sync_start();
+	for (i=0; i< array_size; i++) {
+		volatile test_array *entry = &array[i];
+		unsigned int x,y;
+		y = entry->y;
+		x = entry->x;
+
+		if (y && !x)
+			errors++;
+		ready += y;
+	}
+
+	report_xfail("mp: %d errors, %d ready", true, errors == 0, errors, ready);
+}
+
+/* Simple Message Passing with barriers */
+void message_passing_write_barrier(void)
+{
+	int i;
+	sync_start();
+	for (i=0; i< array_size; i++) {
+		volatile test_array *entry = &array[i];
+		entry->x = 1;
+		smp_wmb();
+		entry->y = 1;
+	}
+
+	halt();
+}
+
+void message_passing_read_barrier(void)
+{
+	int i;
+	int errors = 0, ready = 0, not_ready = 0;
+
+	sync_start();
+	for (i=0; i< array_size; i++) {
+		volatile test_array *entry = &array[i];
+		unsigned int x, y;
+		y = entry->y;
+		smp_rmb();
+		x = entry->x;
+
+		if (y && !x)
+			errors++;
+
+		if (y) {
+			ready++;
+		} else {
+			not_ready++;
+
+			if (not_ready > 2) {
+				entry = &array[i+1];
+				do {
+					not_ready = 0;
+				} while (wait_if_ahead && !entry->y);
+			}
+		}
+	}
+
+	report("mp barrier: %d errors, %d ready", errors == 0, errors, ready);
+}
+
+/* Simple Message Passing with Acquire/Release */
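+/* On arm64 these helpers are expected to compile down to ldar/stlr;
+ * the test only relies on their acquire/release semantics. */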
+void message_passing_write_release(void)
+{
+	int i;
+	for (i=0; i< array_size; i++) {
+		volatile test_array *entry = &array[i];
+		entry->x = 1;
+		smp_store_release(&entry->y, 1);
+	}
+
+	halt();
+}
+
+void message_passing_read_acquire(void)
+{
+	int i;
+	int errors = 0, ready = 0, not_ready = 0;
+
+	for (i=0; i< array_size; i++) {
+		volatile test_array *entry = &array[i];
+		unsigned int x, y;
+		y = smp_load_acquire(&entry->y);
+		x = entry->x;
+
+		if (y && !x)
+			errors++;
+
+		if (y) {
+			ready++;
+		} else {
+			not_ready++;
+
+			if (not_ready > 2) {
+				entry = &array[i+1];
+				do {
+					not_ready = 0;
+				} while (wait_if_ahead && !entry->y);
+			}
+		}
+	}
+
+	report("mp acqrel: %d errors, %d ready", errors == 0, errors, ready);
+}
+
+/*
+ * Store after load
+ *
+ * T1: write 1 to x, load r from y
+ * T2: write 1 to y, load r from x
+ *
+ * Without a memory fence both r[0] == 0 and r[1] == 0 can be
+ * observed; with the fence that outcome should be impossible.
+ */
+
+static void check_store_and_load_results(char *name, int thread, bool xfail,
+					unsigned long start, unsigned long end)
+{
+	int i;
+	int neither = 0;
+	int only_first = 0;
+	int only_second = 0;
+	int both = 0;
+
+	for (i=0; i< array_size; i++) {
+		volatile test_array *entry = &array[i];
+		if (entry->r[0] == 0 &&
+		    entry->r[1] == 0) {
+			neither++;
+		} else if (entry->r[0] &&
+			entry->r[1]) {
+			both++;
+		} else if (entry->r[0]) {
+			only_first++;
+		} else {
+			only_second++;
+		}
+	}
+
+	printf("T%d: %08lx->%08lx neither=%d only_t1=%d only_t2=%d both=%d\n", thread,
+		start, end, neither, only_first, only_second, both);
+
+	if (thread == 1) {
+		if (xfail) {
+			report_xfail("%s: errors=%d", true, neither == 0,
+				name, neither);
+		} else {
+			report("%s: errors=%d", neither == 0, name, neither);
+		}
+	}
+}
+
+/*
+ * sync_start() is used to line up the start of both threads at
+ * roughly the same time. On real hardware there is a little latency
+ * as the secondary vCPUs are powered up, but this effect is much
+ * more exaggerated on a TCG host.
+ */
+
+void store_and_load_1(void)
+{
+	int i;
+	unsigned long start, end;
+
+	start = sync_start();
+	for (i = 0; i < array_size; i++) {
+		volatile test_array *entry = &array[i];
+		unsigned int r;
+		entry->x = 1;
+		r = entry->y;
+		entry->r[0] = r;
+	}
+	end = get_cntvct();
+
+	smp_mb();
+
+	while (!cpumask_test_cpu(1, &cpu_mask))
+		cpu_relax();
+
+	check_store_and_load_results("sal", 1, true, start, end);
+}
+
+void store_and_load_2(void)
+{
+	int i;
+	unsigned long start, end;
+
+	start = sync_start();
+	for (i = 0; i < array_size; i++) {
+		volatile test_array *entry = &array[i];
+		unsigned int r;
+		entry->y = 1;
+		r = entry->x;
+		entry->r[1] = r;
+	}
+	end = get_cntvct();
+
+	check_store_and_load_results("sal", 2, true, start, end);
+
+	cpumask_set_cpu(1, &cpu_mask);
+
+	halt();
+}
+
+void store_and_load_barrier_1(void)
+{
+	int i;
+	unsigned long start, end;
+
+	start = sync_start();
+	for (i = 0; i < array_size; i++) {
+		volatile test_array *entry = &array[i];
+		unsigned int r;
+		entry->x = 1;
+		smp_mb();
+		r = entry->y;
+		entry->r[0] = r;
+	}
+	end = get_cntvct();
+
+	smp_mb();
+
+	while (!cpumask_test_cpu(1, &cpu_mask))
+		cpu_relax();
+
+	check_store_and_load_results("sal_barrier", 1, false, start, end);
+}
+
+void store_and_load_barrier_2(void)
+{
+	int i;
+	unsigned long start, end;
+
+	start = sync_start();
+	for (i = 0; i < array_size; i++) {
+		volatile test_array *entry = &array[i];
+		unsigned int r;
+		entry->y = 1;
+		smp_mb();
+		r = entry->x;
+		entry->r[1] = r;
+	}
+	end = get_cntvct();
+
+	check_store_and_load_results("sal_barrier", 2, false, start, end);
+
+	cpumask_set_cpu(1, &cpu_mask);
+
+	halt();
+}
+
+
+/* Test array */
+static test_descr_t tests[] = {
+
+	{ "mp",         false,
+	  message_passing_read,
+	  { message_passing_write }
+	},
+
+	{ "mp_barrier", true,
+	  message_passing_read_barrier,
+	  { message_passing_write_barrier }
+	},
+
+	{ "mp_acqrel", true,
+	  message_passing_read_acquire,
+	  { message_passing_write_release }
+	},
+
+	{ "sal",       false,
+	  store_and_load_1,
+	  { store_and_load_2 }
+	},
+
+	{ "sal_barrier", true,
+	  store_and_load_barrier_1,
+	  { store_and_load_barrier_2 }
+	},
+};
+
+
+void setup_and_run_litmus(test_descr_t *test)
+{
+	array = calloc(array_size, sizeof(test_array));
+
+	if (array) {
+		int i = 0;
+		printf("Allocated test array @ %p\n", array);
+
+		while (test->secondary_fns[i]) {
+			smp_boot_secondary(i+1, test->secondary_fns[i]);
+			i++;
+		}
+
+		test->main_fn();
+	} else {
+		report("%s: failed to allocate memory", false, test->test_name);
+	}
+}
+
+int main(int argc, char **argv)
+{
+	int i;
+	unsigned int j;
+	test_descr_t *test = NULL;
+
+	for (i = 0; i < argc; i++) {
+		char *arg = argv[i];
+
+		for (j = 0; j < ARRAY_SIZE(tests); j++) {
+			if (strcmp(arg, tests[j].test_name) == 0)
+				test = &tests[j];
+		}
+
+		/* Test modifiers */
+		if (strstr(arg, "count=") != NULL) {
+			char *p = strstr(arg, "=");
+			array_size = atol(p+1);
+		} else if (strcmp(arg, "wait") == 0) {
+			wait_if_ahead = 1;
+		}
+	}
+
+	if (test) {
+		setup_and_run_litmus(test);
+	} else {
+		report("Unknown test", false);
+	}
+
+	return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index abbfe79..355dcfb 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -142,3 +142,39 @@ smp = $(($MAX_SMP>4?4:$MAX_SMP))
 extra_params = -append 'excl'
 groups = locking
 accel = tcg
+
+[barrier-litmus::mp]
+file = barrier-litmus-test.flat
+smp = 2
+extra_params = -append 'mp'
+groups = barrier
+accel = tcg
+
+[barrier-litmus::mp-barrier]
+file = barrier-litmus-test.flat
+smp = 2
+extra_params = -append 'mp_barrier'
+groups = barrier
+accel = tcg
+
+[barrier-litmus::mp-acqrel]
+file = barrier-litmus-test.flat
+smp = 2
+extra_params = -append 'mp_acqrel'
+groups = barrier
+accel = tcg
+
+[barrier-litmus::sal]
+file = barrier-litmus-test.flat
+smp = 2
+extra_params = -append 'sal'
+groups = barrier
+accel = tcg
+
+[barrier-litmus::sal-barrier]
+file = barrier-litmus-test.flat
+smp = 2
+extra_params = -append 'sal_barrier'
+groups = barrier
+accel = tcg
+
diff --git a/lib/arm/asm/barrier.h b/lib/arm/asm/barrier.h
index 394a4a2..e3b7a2e 100644
--- a/lib/arm/asm/barrier.h
+++ b/lib/arm/asm/barrier.h
@@ -1,9 +1,11 @@
 #ifndef _ASMARM_BARRIER_H_
 #define _ASMARM_BARRIER_H_
 /*
- * Adapted form arch/arm/include/asm/barrier.h
+ * Adapted from arch/arm/include/asm/barrier.h
  */
 
+#include <stdint.h>
+
 #define sev()		asm volatile("sev" : : : "memory")
 #define wfe()		asm volatile("wfe" : : : "memory")
 #define wfi()		asm volatile("wfi" : : : "memory")
@@ -20,4 +22,63 @@
 #define smp_rmb()	smp_mb()
 #define smp_wmb()	dmb(ishst)
 
+extern void abort(void);
+
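+/*
+ * READ_ONCE/WRITE_ONCE, as adapted from the Linux kernel sources:
+ * the union lets a single macro handle any scalar type while the
+ * helper performs a volatile access of the matching width.
+ */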
+static inline void __write_once_size(volatile void *p, void *res, int size)
+{
+	switch (size) {
+	case 1: *(volatile uint8_t *)p = *(uint8_t *)res; break;
+	case 2: *(volatile uint16_t *)p = *(uint16_t *)res; break;
+	case 4: *(volatile uint32_t *)p = *(uint32_t *)res; break;
+	case 8: *(volatile uint64_t *)p = *(uint64_t *)res; break;
+	default:
+		/* unhandled case */
+		abort();
+	}
+}
+
+#define WRITE_ONCE(x, val) \
+({							\
+	union { typeof(x) __val; char __c[1]; } __u =	\
+		{ .__val = (typeof(x)) (val) }; \
+	__write_once_size(&(x), __u.__c, sizeof(x));	\
+	__u.__val;					\
+})
+
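+/*
+ * ARMv7 has no store-release or load-acquire instructions, so
+ * release and acquire semantics are approximated here with a full
+ * smp_mb() next to a plain once-access.
+ */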
+#define smp_store_release(p, v)						\
+do {									\
+	smp_mb();							\
+	WRITE_ONCE(*p, v);						\
+} while (0)
+
+
+static inline
+void __read_once_size(const volatile void *p, void *res, int size)
+{
+	switch (size) {
+	case 1: *(uint8_t *)res = *(volatile uint8_t *)p; break;
+	case 2: *(uint16_t *)res = *(volatile uint16_t *)p; break;
+	case 4: *(uint32_t *)res = *(volatile uint32_t *)p; break;
+	case 8: *(uint64_t *)res = *(volatile uint64_t *)p; break;
+	default:
+		/* unhandled case */
+		abort();
+	}
+}
+
+#define READ_ONCE(x)							\
+({									\
+	union { typeof(x) __val; char __c[1]; } __u;			\
+	__read_once_size(&(x), __u.__c, sizeof(x));			\
+	__u.__val;							\
+})
+
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = READ_ONCE(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #endif /* _ASMARM_BARRIER_H_ */
diff --git a/lib/arm64/asm/barrier.h b/lib/arm64/asm/barrier.h
index dbdac9d..aafabdc 100644
--- a/lib/arm64/asm/barrier.h
+++ b/lib/arm64/asm/barrier.h
@@ -19,4 +19,54 @@
 #define smp_rmb()	dmb(ishld)
 #define smp_wmb()	dmb(ishst)
 
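+/*
+ * ARMv8 provides load-acquire (LDAR) and store-release (STLR)
+ * instructions, so no explicit barrier is needed around the access.
+ */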
+#define smp_store_release(p, v)						\
+do {									\
+	switch (sizeof(*p)) {						\
+	case 1:								\
+		asm volatile ("stlrb %w1, %0"				\
+				: "=Q" (*p) : "r" (v) : "memory");	\
+		break;							\
+	case 2:								\
+		asm volatile ("stlrh %w1, %0"				\
+				: "=Q" (*p) : "r" (v) : "memory");	\
+		break;							\
+	case 4:								\
+		asm volatile ("stlr %w1, %0"				\
+				: "=Q" (*p) : "r" (v) : "memory");	\
+		break;							\
+	case 8:								\
+		asm volatile ("stlr %1, %0"				\
+				: "=Q" (*p) : "r" (v) : "memory");	\
+		break;							\
+	}								\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	union { typeof(*p) __val; char __c[1]; } __u;			\
+	switch (sizeof(*p)) {						\
+	case 1:								\
+		asm volatile ("ldarb %w0, %1"				\
+			: "=r" (*(u8 *)__u.__c)				\
+			: "Q" (*p) : "memory");				\
+		break;							\
+	case 2:								\
+		asm volatile ("ldarh %w0, %1"				\
+			: "=r" (*(u16 *)__u.__c)			\
+			: "Q" (*p) : "memory");				\
+		break;							\
+	case 4:								\
+		asm volatile ("ldar %w0, %1"				\
+			: "=r" (*(u32 *)__u.__c)			\
+			: "Q" (*p) : "memory");				\
+		break;							\
+	case 8:								\
+		asm volatile ("ldar %0, %1"				\
+			: "=r" (*(u64 *)__u.__c)			\
+			: "Q" (*p) : "memory");				\
+		break;							\
+	}								\
+	__u.__val;							\
+})
+
 #endif /* _ASMARM64_BARRIER_H_ */
-- 
2.10.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [kvm-unit-tests PATCH v7 11/11] arm/tcg-test: some basic TCG exercising tests
@ 2016-11-24 16:10   ` Alex Bennée
  -1 siblings, 0 replies; 93+ messages in thread
From: Alex Bennée @ 2016-11-24 16:10 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana, Alex Bennée

These tests are not really aimed at KVM at all but exist to stretch
QEMU's TCG code generator. In particular these exercise the ability of
the TCG to:

  * Chain TranslationBlocks together (tight)
  * Handle heavy usage of the tb_jump_cache (paged)
  * Pathological case of computed local jumps (computed)

In addition the tests can be varied by adding IPI IRQs or SMC sequences
into the mix to stress the tcg_exit and invalidation mechanisms.

To explicitly stress the tb_flush() mechanism you can use the mod/rounds
parameters to force more frequent tb invalidation, combined with setting
-tb-size 1 in QEMU to limit the code generation buffer size.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v5
  - added armv8 version of the tcg tests
  - max out at -smp 4 in unittests.cfg
  - add up IRQs sent and delivered for PASS/FAIL
  - take into account error count
  - add "rounds=" parameter
  - tweak smc to tb-size=1
  - printf fmt fix
v7
  - merged in IRQ numerology
  - updated to latest IRQ API
---
 arm/Makefile.arm     |   2 +
 arm/Makefile.arm64   |   2 +
 arm/Makefile.common  |   1 +
 arm/tcg-test-asm.S   | 170 ++++++++++++++++++++++++++
 arm/tcg-test-asm64.S | 169 ++++++++++++++++++++++++++
 arm/tcg-test.c       | 337 +++++++++++++++++++++++++++++++++++++++++++++++++++
 arm/unittests.cfg    |  84 +++++++++++++
 7 files changed, 765 insertions(+)
 create mode 100644 arm/tcg-test-asm.S
 create mode 100644 arm/tcg-test-asm64.S
 create mode 100644 arm/tcg-test.c

diff --git a/arm/Makefile.arm b/arm/Makefile.arm
index 92f3757..7058bd2 100644
--- a/arm/Makefile.arm
+++ b/arm/Makefile.arm
@@ -24,4 +24,6 @@ tests =
 
 include $(TEST_DIR)/Makefile.common
 
+$(TEST_DIR)/tcg-test.elf: $(cstart.o) $(TEST_DIR)/tcg-test.o $(TEST_DIR)/tcg-test-asm.o
+
 arch_clean: arm_clean
diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
index 0b0761c..678fca4 100644
--- a/arm/Makefile.arm64
+++ b/arm/Makefile.arm64
@@ -16,5 +16,7 @@ tests =
 
 include $(TEST_DIR)/Makefile.common
 
+$(TEST_DIR)/tcg-test.elf: $(cstart.o) $(TEST_DIR)/tcg-test.o $(TEST_DIR)/tcg-test-asm64.o
+
 arch_clean: arm_clean
 	$(RM) lib/arm64/.*.d
diff --git a/arm/Makefile.common b/arm/Makefile.common
index a508128..9af758f 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -17,6 +17,7 @@ tests-common += $(TEST_DIR)/tlbflush-code.flat
 tests-common += $(TEST_DIR)/tlbflush-data.flat
 tests-common += $(TEST_DIR)/locking-test.flat
 tests-common += $(TEST_DIR)/barrier-litmus-test.flat
+tests-common += $(TEST_DIR)/tcg-test.flat
 
 all: test_cases
 
diff --git a/arm/tcg-test-asm.S b/arm/tcg-test-asm.S
new file mode 100644
index 0000000..6e823b7
--- /dev/null
+++ b/arm/tcg-test-asm.S
@@ -0,0 +1,170 @@
+/*
+ * TCG Test assembler functions for armv7 tests.
+ *
+ * Copyright (C) 2016, Linaro Ltd, Alex Bennée <alex.bennee@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ *
+ * These helper functions are written in pure asm to control the size
+ * of the basic blocks and ensure they fit neatly into page
+ * aligned chunks. The pattern of branches they follow is determined by
+ * the 32 bit seed they are passed; a given seed follows the same
+ * path through each set of blocks.
+ *
+ * Calling convention
+ *  - r0, iterations
+ *  - r1, jump pattern
+ *  - r2-r3, scratch
+ *
+ * Returns r0
+ */
+
+.arm
+
+.section .text
+
+/* Tight - all blocks should quickly be patched together and should
+ * run very fast unless IRQs or SMC invalidation get in the way.
+ */
+
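+/* Each block decrements the iteration count, rotates the branch
+ * pattern and jumps to one of two fixed successors based on the low
+ * bit, so the TCG can chain all three blocks directly together.
+ */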
+.global tight_start
+tight_start:
+        subs    r0, r0, #1
+        beq     tight_end
+
+        ror     r1, r1, #1
+        tst     r1, #1
+        beq     tightA
+        b       tight_start
+
+tightA:
+        subs    r0, r0, #1
+        beq     tight_end
+
+        ror     r1, r1, #1
+        tst     r1, #1
+        beq     tightB
+        b       tight_start
+
+tightB:
+        subs    r0, r0, #1
+        beq     tight_end
+
+        ror     r1, r1, #1
+        tst     r1, #1
+        beq     tight_start
+        b       tightA
+
+.global tight_end
+tight_end:
+        mov     pc, lr
+
+/*
+ * Computed jumps cannot be hardwired into the basic blocks so each one
+ * will cause an exit so the main execution loop can look up the next block.
+ *
+ * There is some caching which should ameliorate the cost a little.
+ */
+
+        /* .align 13 == 2^13 = 8192 byte alignment */
+        .align 13
+        .global computed_start
+computed_start:
+        subs    r0, r0, #1
+        beq     computed_end
+
+        /* Jump table */
+        ror     r1, r1, #1
+        and     r2, r1, #1
+        adr     r3, computed_jump_table
+        ldr     r2, [r3, r2, lsl #2]
+        mov     pc, r2
+
+        b       computed_err
+
+computed_jump_table:
+        .word   computed_start
+        .word   computedA
+
+computedA:
+        subs    r0, r0, #1
+        beq     computed_end
+
+        /* Jump into code */
+        ror     r1, r1, #1
+        and     r2, r1, #1
+        adr     r3, 1f
+        add     r3, r3, r2, lsl #2
+        mov     pc, r3
+1:      b       computed_start
+        b       computedB
+
+        b       computed_err
+
+
+computedB:
+        subs    r0, r0, #1
+        beq     computed_end
+        ror     r1, r1, #1
+
+        /* Conditional register load */
+        adr     r3, computedA
+        tst     r1, #1
+        adreq   r3, computed_start
+        mov     pc, r3
+
+        b       computed_err
+
+computed_err:
+        mov     r0, #1
+        .global computed_end
+computed_end:
+        mov     pc, lr
+
+
+/*
+ * Page hopping
+ *
+ * Each block is in a different page, hence the blocks never get joined
+ */
+        /* .align 13 == 2^13 = 8192 byte alignment */
+        .align 13
+        .global paged_start
+paged_start:
+        subs    r0, r0, #1
+        beq     paged_end
+
+        ror     r1, r1, #1
+        tst     r1, #1
+        beq     pagedA
+        b       paged_start
+
+        /* .align 13 == 2^13 = 8192 byte alignment */
+        .align 13
+pagedA:
+        subs    r0, r0, #1
+        beq     paged_end
+
+        ror     r1, r1, #1
+        tst     r1, #1
+        beq     pagedB
+        b       paged_start
+
+        /* .align 13 == 2^13 = 8192 byte alignment */
+        .align 13
+pagedB:
+        subs    r0, r0, #1
+        beq     paged_end
+
+        ror     r1, r1, #1
+        tst     r1, #1
+        beq     paged_start
+        b       pagedA
+
+        /* .align 13 == 2^13 = 8192 byte alignment */
+        .align 13
+.global paged_end
+paged_end:
+        mov     pc, lr
+
+.global test_code_end
+test_code_end:
diff --git a/arm/tcg-test-asm64.S b/arm/tcg-test-asm64.S
new file mode 100644
index 0000000..22bcfb4
--- /dev/null
+++ b/arm/tcg-test-asm64.S
@@ -0,0 +1,169 @@
+/*
+ * TCG Test assembler functions for armv8 tests.
+ *
+ * Copyright (C) 2016, Linaro Ltd, Alex Bennée <alex.bennee@linaro.org>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ *
+ * These helper functions are written in pure asm to control the size
+ * of the basic blocks and ensure they fit neatly into page
+ * aligned chunks. The pattern of branches they follow is determined by
+ * the 32 bit seed they are passed. It should be the same for each set.
+ *
+ * Calling convention
+ *  - x0, iterations
+ *  - x1, jump pattern
+ *  - x2-x3, scratch
+ *
+ * Returns x0
+ */
+
+.section .text
+
+/* Tight - all blocks should quickly be patched together and should
+ * run very fast unless IRQs or SMC invalidation get in the way.
+ */
+
+.global tight_start
+tight_start:
+        subs    x0, x0, #1
+        beq     tight_end
+
+        ror     x1, x1, #1
+        tst     x1, #1
+        beq     tightA
+        b       tight_start
+
+tightA:
+        subs    x0, x0, #1
+        beq     tight_end
+
+        ror     x1, x1, #1
+        tst     x1, #1
+        beq     tightB
+        b       tight_start
+
+tightB:
+        subs    x0, x0, #1
+        beq     tight_end
+
+        ror     x1, x1, #1
+        tst     x1, #1
+        beq     tight_start
+        b       tightA
+
+.global tight_end
+tight_end:
+        ret
+
+/*
+ * Computed jumps cannot be hardwired into the basic blocks so each one
+ * will cause an exit so the main execution loop can look up the next block.
+ *
+ * There is some caching which should ameliorate the cost a little.
+ */
+
+        /* .align 13 == 2^13 = 8192 byte alignment */
+        .align 13
+        .global computed_start
+computed_start:
+        subs    x0, x0, #1
+        beq     computed_end
+
+        /* Jump table */
+        ror     x1, x1, #1
+        and     x2, x1, #1
+        adr     x3, computed_jump_table
+        ldr     x2, [x3, x2, lsl #3]
+        br      x2
+
+        b       computed_err
+
+computed_jump_table:
+        .quad   computed_start
+        .quad   computedA
+
+computedA:
+        subs    x0, x0, #1
+        beq     computed_end
+
+        /* Jump into code */
+        ror     x1, x1, #1
+        and     x2, x1, #1
+        adr     x3, 1f
+        add	x3, x3, x2, lsl #2
+        br      x3
+1:      b       computed_start
+        b       computedB
+
+        b       computed_err
+
+
+computedB:
+        subs    x0, x0, #1
+        beq     computed_end
+        ror     x1, x1, #1
+
+        /* Conditional register load */
+        adr     x2, computedA
+        adr     x3, computed_start
+        tst     x1, #1
+        csel    x2, x3, x2, eq
+        br      x2
+
+        b       computed_err
+
+computed_err:
+        mov     x0, #1
+        .global computed_end
+computed_end:
+        ret
+
+
+/*
+ * Page hopping
+ *
+ * Each block is in a different page, hence the blocks never get joined
+ */
+        /* .align 13 == 2^13 = 8192 byte alignment */
+        .align 13
+        .global paged_start
+paged_start:
+        subs    x0, x0, #1
+        beq     paged_end
+
+        ror     x1, x1, #1
+        tst     x1, #1
+        beq     pagedA
+        b       paged_start
+
+        /* .align 13 == 2^13 = 8192 byte alignment */
+        .align 13
+pagedA:
+        subs    x0, x0, #1
+        beq     paged_end
+
+        ror     x1, x1, #1
+        tst     x1, #1
+        beq     pagedB
+        b       paged_start
+
+        /* .align 13 == 2^13 = 8192 byte alignment */
+        .align 13
+pagedB:
+        subs    x0, x0, #1
+        beq     paged_end
+
+        ror     x1, x1, #1
+        tst     x1, #1
+        beq     paged_start
+        b       pagedA
+
+        /* .align 13 == 2^13 = 8192 byte alignment */
+        .align 13
+.global paged_end
+paged_end:
+        ret
+
+.global test_code_end
+test_code_end:
diff --git a/arm/tcg-test.c b/arm/tcg-test.c
new file mode 100644
index 0000000..341dca3
--- /dev/null
+++ b/arm/tcg-test.c
@@ -0,0 +1,337 @@
+/*
+ * ARM TCG Tests
+ *
+ * These tests are explicitly aimed at stretching the QEMU TCG engine.
+ */
+
+#include <libcflat.h>
+#include <asm/processor.h>
+#include <asm/smp.h>
+#include <asm/cpumask.h>
+#include <asm/barrier.h>
+#include <asm/mmu.h>
+#include <asm/gic.h>
+
+#include <prng.h>
+
+#define MAX_CPUS 8
+
+/* These entry points are in the assembly code */
+extern int tight_start(uint32_t count, uint32_t pattern);
+extern int computed_start(uint32_t count, uint32_t pattern);
+extern int paged_start(uint32_t count, uint32_t pattern);
+extern uint32_t tight_end;
+extern uint32_t computed_end;
+extern uint32_t paged_end;
+extern unsigned long test_code_end;
+
+typedef int (*test_fn)(uint32_t count, uint32_t pattern);
+
+typedef struct {
+	const char *test_name;
+	bool       should_pass;
+	test_fn    start_fn;
+	uint32_t   *code_end;
+} test_descr_t;
+
+/* Test array */
+static test_descr_t tests[] = {
+	/*
+	 * Tight chain.
+	 *
+	 * These are a bunch of basic blocks that have fixed branches in
+	 * a page aligned space. The branches taken are decided by a
+	 * pseudo-random bitmap for each CPU.
+	 *
+	 * Once the basic blocks have been chained together by the TCG they
+	 * should run until they reach their block count. This will be the
+	 * most efficient mode in which generated code is run. The only other
+	 * exits will be caused by interrupts or TB invalidation.
+	 */
+	{ "tight", true, tight_start, &tight_end },
+	/*
+	 * Computed jumps.
+	 *
+	 * A bunch of basic blocks which just do computed jumps so the basic
+	 * block is never chained but they are all within a page (maybe not
+	 * required). This will exercise the cache lookup but not the new
+	 * generation.
+	 */
+	{ "computed", true, computed_start, &computed_end },
+	/*
+	 * Page ping pong.
+	 *
+	 * The blocks are separated by PAGE_SIZE so they can never
+	 * be chained together.
+	 */
+	{ "paged", true, paged_start, &paged_end}
+};
+
+static test_descr_t *test = NULL;
+
+static int iterations = 1000000;
+static int rounds = 1000;
+static int mod_freq = 5;
+static uint32_t pattern[MAX_CPUS];
+
+/* control flags */
+static int smc = 0;
+static int irq = 0;
+static int check_irq = 0;
+
+/* IRQ accounting */
+#define MAX_IRQ_IDS 16
+static int irqv;
+static unsigned long irq_sent_ts[MAX_CPUS][MAX_CPUS][MAX_IRQ_IDS];
+
+static int irq_recv[MAX_CPUS];
+static int irq_sent[MAX_CPUS];
+static int irq_overlap[MAX_CPUS];  /* if ts > now, i.e. a race */
+static int irq_slow[MAX_CPUS];  /* if delay > threshold */
+static unsigned long irq_latency[MAX_CPUS]; /* cumulative time */
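+/*
+ * Latency is measured from the cntvct timestamp recorded by the
+ * sender. Receptions that appear to race with the send (then > now)
+ * or that take more than 30000 ticks are counted separately and
+ * excluded from the cumulative latency used for the average.
+ */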
+
+static int errors[MAX_CPUS];
+
+static cpumask_t smp_test_complete;
+
+static cpumask_t ready;
+
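+/* Rendezvous: each CPU marks itself ready and spins until all
+ * present CPUs have done the same.
+ */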
+static void wait_on_ready(void)
+{
+	cpumask_set_cpu(smp_processor_id(), &ready);
+	while (!cpumask_full(&ready))
+		cpu_relax();
+}
+
+/* This triggers TCG's SMC detection by writing values to the executing
+ * code pages. We are not actually modifying the instructions, as the
+ * same values are written back, so the underlying code remains
+ * unchanged. However the writes should still trigger invalidation of
+ * the Translation Blocks.
+ */
+
+void trigger_smc_detection(uint32_t *start, uint32_t *end)
+{
+	volatile uint32_t *ptr = start;
+	while (ptr < end) {
+		uint32_t inst = *ptr;
+		*ptr++ = inst;
+	}
+}
+
+/* Handler for receiving IRQs */
+
+static void irq_handler(struct pt_regs *regs __unused)
+{
+	unsigned long then, now = get_cntvct();
+	int cpu = smp_processor_id();
+	u32 irqstat = gic_read_iar();
+	u32 irqnr = gic_iar_irqnr(irqstat);
+
+	if (irqnr != GICC_INT_SPURIOUS) {
+		unsigned int src_cpu = (irqstat >> 10) & 0x7;
+		gic_write_eoir(irqstat);
+		irq_recv[cpu]++;
+
+		then = irq_sent_ts[src_cpu][cpu][irqnr];
+
+		if (then > now) {
+			irq_overlap[cpu]++;
+		} else {
+			unsigned long latency = (now - then);
+			if (latency > 30000) {
+				irq_slow[cpu]++;
+			} else {
+				irq_latency[cpu] += latency;
+			}
+		}
+	}
+}
+
+/* This triggers cross-CPU IRQs. Each IRQ should cause the basic block
+ * execution to finish and the main run-loop to be entered again.
+ */
+int send_cross_cpu_irqs(int this_cpu, int irq)
+{
+	int cpu, sent = 0;
+	cpumask_t mask;
+
+	cpumask_copy(&mask, &cpu_present_mask);
+	/* target everyone but ourselves; clearing the *other* cpus here
+	 * would degenerate into a self-IPI */
+	cpumask_clear_cpu(this_cpu, &mask);
+
+	for_each_present_cpu(cpu) {
+		if (cpu != this_cpu) {
+			irq_sent_ts[this_cpu][cpu][irq] = get_cntvct();
+			sent++;
+		}
+	}
+
+	gic_ipi_send_mask(irq, &mask);
+
+	return sent;
+}
+
+void do_test(void)
+{
+	int cpu = smp_processor_id();
+	int i, irq_id = 0;
+
+	printf("CPU%d: online and setting up with pattern 0x%"PRIx32"\n", cpu, pattern[cpu]);
+
+	if (irq) {
+		gic_enable_defaults();
+#ifdef __arm__
+		install_exception_handler(EXCPTN_IRQ, irq_handler);
+#else
+		install_irq_handler(EL1H_IRQ, irq_handler);
+#endif
+		local_irq_enable();
+
+		wait_on_ready();
+	}
+
+	for (i = 0; i < rounds; i++) {
+		/* Enter the blocks */
+		errors[cpu] += test->start_fn(iterations, pattern[cpu]);
+
+		if ((i + cpu) % mod_freq == 0) {
+			if (smc) {
+				trigger_smc_detection((uint32_t *) test->start_fn,
+						test->code_end);
+			}
+			if (irq) {
+				irq_sent[cpu] += send_cross_cpu_irqs(cpu, irq_id);
+				irq_id++;
+				irq_id = irq_id % 15;
+			}
+		}
+	}
+
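+	/* make the per-cpu error and irq counters visible before
+	 * signalling completion */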
+	smp_wmb();
+
+	cpumask_set_cpu(cpu, &smp_test_complete);
+	if (cpu != 0)
+		halt();
+}
+
+void report_irq_stats(int cpu)
+{
+	int recv = irq_recv[cpu];
+	int race = irq_overlap[cpu];
+	int slow = irq_slow[cpu];
+	int counted = recv - (race + slow);
+	/* guard against a zero divisor when every irq raced or was slow */
+	unsigned long avg_latency = counted ? irq_latency[cpu] / counted : 0;
+
+	printf("CPU%d: %d irqs (%d races, %d slow, %ld ticks avg latency)\n",
+		cpu, recv, race, slow, avg_latency);
+}
+
+
+void setup_and_run_tcg_test(void)
+{
+	static const unsigned char seed[] = "tcg-test";
+	struct isaac_ctx prng_context;
+	int cpu;
+	int total_err = 0, total_sent = 0, total_recv = 0;
+
+	isaac_init(&prng_context, &seed[0], sizeof(seed));
+
+	/* boot other CPUs */
+	for_each_present_cpu(cpu) {
+		pattern[cpu] = isaac_next_uint32(&prng_context);
+
+		if (cpu == 0)
+			continue;
+
+		smp_boot_secondary(cpu, do_test);
+	}
+
+	do_test();
+
+	while (!cpumask_full(&smp_test_complete))
+		cpu_relax();
+
+	smp_mb();
+
+	/* Now total up errors and irqs */
+	for_each_present_cpu(cpu) {
+		total_err += errors[cpu];
+		total_sent += irq_sent[cpu];
+		total_recv += irq_recv[cpu];
+
+		if (check_irq) {
+			report_irq_stats(cpu);
+		}
+	}
+
+	if (check_irq) {
+		if (total_sent != total_recv) {
+			report("%d IRQs sent, %d received", false, total_sent, total_recv);
+		} else {
+			report("%d errors, IRQs OK", total_err == 0, total_err);
+		}
+	} else {
+		report("%d errors, IRQs not checked", total_err == 0, total_err);
+	}
+}
+
+int main(int argc, char **argv)
+{
+	int i;
+	unsigned int j;
+
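+	/* argv[] is built from QEMU's -append string, split on whitespace */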
+	for (i = 0; i < argc; i++) {
+		char *arg = argv[i];
+
+		for (j = 0; j < ARRAY_SIZE(tests); j++) {
+			if (strcmp(arg, tests[j].test_name) == 0) {
+				test = &tests[j];
+			}
+		}
+
+		/* Test modifiers */
+		if (strstr(arg, "mod=") != NULL) {
+			char *p = strstr(arg, "=");
+			mod_freq = atol(p+1);
+		}
+
+		if (strstr(arg, "rounds=") != NULL) {
+			char *p = strstr(arg, "=");
+			rounds = atol(p+1);
+		}
+
+		if (strcmp(arg, "smc") == 0) {
+			unsigned long test_start = (unsigned long) &tight_start;
+			unsigned long test_end = (unsigned long) &test_code_end;
+
+			smc = 1;
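+			/* remap the test code writable so that
+			 * trigger_smc_detection() can store to it */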
+			mmu_set_range_ptes(mmu_idmap, test_start, test_start, test_end,
+					__pgprot(PTE_WBWA));
+
+			report_prefix_push("smc");
+		}
+
+		if (strcmp(arg, "irq") == 0) {
+			irq = 1;
+			if (!gic_init())
+				report_abort("No supported gic present!");
+			irqv = gic_version();
+			report_prefix_push("irq");
+		}
+
+		if (strcmp(arg, "check_irq") == 0) {
+			check_irq = 1;
+		}
+	}
+
+	if (test) {
+		smp_mb();
+		setup_and_run_tcg_test();
+	} else {
+		report("Unknown test", false);
+	}
+
+	return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 355dcfb..38934f2 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -178,3 +178,87 @@ extra_params = -append 'sal_barrier'
 groups = barrier
 accel = tcg
 
+# TCG Tests
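+# Note: the smp expression below caps each guest at min($MAX_SMP, 4) vcpus.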
+[tcg::tight]
+file = tcg-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'tight'
+groups = tcg
+accel = tcg
+
+[tcg::tight-smc]
+file = tcg-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'tight smc' -tb-size 1
+groups = tcg
+accel = tcg
+
+[tcg::tight-irq]
+file = tcg-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'tight irq'
+groups = tcg
+accel = tcg
+
+[tcg::tight-smc-irq]
+file = tcg-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'tight smc irq'
+groups = tcg
+accel = tcg
+
+[tcg::computed]
+file = tcg-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'computed'
+groups = tcg
+accel = tcg
+
+[tcg::computed-smc]
+file = tcg-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'computed smc'
+groups = tcg
+accel = tcg
+
+[tcg::computed-irq]
+file = tcg-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'computed irq'
+groups = tcg
+accel = tcg
+
+[tcg::computed-smc-irq]
+file = tcg-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'computed smc irq'
+groups = tcg
+accel = tcg
+
+[tcg::paged]
+file = tcg-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'paged'
+groups = tcg
+accel = tcg
+
+[tcg::paged-smc]
+file = tcg-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'paged smc'
+groups = tcg
+accel = tcg
+
+[tcg::paged-irq]
+file = tcg-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'paged irq'
+groups = tcg
+accel = tcg
+
+[tcg::paged-smc-irq]
+file = tcg-test.flat
+smp = $(($MAX_SMP>4?4:$MAX_SMP))
+extra_params = -append 'paged smc irq'
+groups = tcg
+accel = tcg
-- 
2.10.1


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* Re: [kvm-unit-tests PATCH v7 01/11] run_tests: allow forcing of acceleration mode
@ 2016-11-28  8:51     ` Andrew Jones
  -1 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28  8:51 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, nikunj, kvm, marc.zyngier, jan.kiszka, mark.burton,
	qemu-devel, cota, linux-arm-kernel, pbonzini, serge.fdrv,
	bobby.prani, kvmarm, rth

On Thu, Nov 24, 2016 at 04:10:23PM +0000, Alex Bennée wrote:
> While tests can be pegged to tcg it is useful to override this from time
> to time, especially when testing correctness on real systems.
> ---
>  run_tests.sh         | 8 ++++++--
>  scripts/runtime.bash | 4 ++++
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/run_tests.sh b/run_tests.sh
> index 254129d..b88c36f 100755
> --- a/run_tests.sh
> +++ b/run_tests.sh
> @@ -13,9 +13,10 @@ function usage()
>  {
>  cat <<EOF
>  
> -Usage: $0 [-g group] [-h] [-v]
> +Usage: $0 [-g group] [-a accel] [-h] [-v]
>  
>      -g: Only execute tests in the given group
> +    -a: Force acceleration mode (tcg/kvm)
>      -h: Output this help text
>      -v: Enables verbose mode
>  
> @@ -28,11 +29,14 @@ EOF
>  RUNTIME_arch_run="./$TEST_DIR/run"
>  source scripts/runtime.bash
>  
> -while getopts "g:hv" opt; do
> +while getopts "g:a:hv" opt; do
>      case $opt in
>          g)
>              only_group=$OPTARG
>              ;;
> +        a)
> +            force_accel=$OPTARG
> +            ;;
>          h)
>              usage
>              exit
> diff --git a/scripts/runtime.bash b/scripts/runtime.bash
> index 11a40a9..578cf32 100644
> --- a/scripts/runtime.bash
> +++ b/scripts/runtime.bash
> @@ -75,6 +75,10 @@ function run()
>          return;
>      fi
>  
> +    if [ -n "$force_accel" ]; then
> +        accel=$force_accel
> +    fi
> +
>      if [ -n "$arch" ] && [ "$arch" != "$ARCH" ]; then
>          echo "`SKIP` $1 ($arch only)"
>          return 2
> -- 
> 2.10.1

We can already do 'ACCEL=tcg ./run_tests.sh' to force, for example, tcg.
Additionally, you can add any env var you want to config.mak after running
configure, e.g.

 echo ACCEL=tcg >> config.mak

If you still prefer a cmdline parameter, then I'd suggest a boolean
instead, with the default being KVM. So the param would be '-tcg', or
something.

Thanks,
drew

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [kvm-unit-tests PATCH v7 02/11] run_tests: allow disabling of timeouts
@ 2016-11-28  9:00     ` Andrew Jones
  -1 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28  9:00 UTC (permalink / raw)
  To: Alex Bennée
  Cc: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier,
	qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana

On Thu, Nov 24, 2016 at 04:10:24PM +0000, Alex Bennée wrote:
> Certainly during development of the tests and MTTCG there are times when
> the timeout just gets in the way.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  run_tests.sh         | 8 ++++++--
>  scripts/runtime.bash | 4 ++++
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/run_tests.sh b/run_tests.sh
> index b88c36f..4f2e5cb 100755
> --- a/run_tests.sh
> +++ b/run_tests.sh
> @@ -13,10 +13,11 @@ function usage()
>  {
>  cat <<EOF
>  
> -Usage: $0 [-g group] [-a accel] [-h] [-v]
> +Usage: $0 [-g group] [-a accel] [-t] [-h] [-v]
>  
>      -g: Only execute tests in the given group
>      -a: Force acceleration mode (tcg/kvm)
> +    -t: disable timeouts
>      -h: Output this help text
>      -v: Enables verbose mode
>  
> @@ -29,7 +30,7 @@ EOF
>  RUNTIME_arch_run="./$TEST_DIR/run"
>  source scripts/runtime.bash
>  
> -while getopts "g:a:hv" opt; do
> +while getopts "g:a:thv" opt; do
>      case $opt in
>          g)
>              only_group=$OPTARG
> @@ -37,6 +38,9 @@ while getopts "g:a:hv" opt; do
>          a)
>              force_accel=$OPTARG
>              ;;
> +        t)
> +            no_timeout="yes"
> +            ;;
>          h)
>              usage
>              exit
> diff --git a/scripts/runtime.bash b/scripts/runtime.bash
> index 578cf32..968ff6d 100644
> --- a/scripts/runtime.bash
> +++ b/scripts/runtime.bash
> @@ -79,6 +79,10 @@ function run()
>          accel=$force_accel
>      fi
>  
> +    if [ "$no_timeout" = "yes" ]; then
> +        timeout=""
> +    fi
> +
>      if [ -n "$arch" ] && [ "$arch" != "$ARCH" ]; then
>          echo "`SKIP` $1 ($arch only)"
>          return 2
> -- 
> 2.10.1
>

A timeout value of zero disables the timeout. So you just need to run
 TIMEOUT=0 ./run_tests.sh, or add it to config.mak.
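
For reference, assuming the runner shells out to GNU coreutils'
timeout(1), a duration of 0 is documented to disable the limit, so no
special casing is needed:

  timeout 0 sleep 3 && echo "not killed"   # 0 disables the limit
  timeout 1 sleep 3 || echo "killed"       # non-zero still enforced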

drew

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 03/11] run_tests: allow passing of options to QEMU
  2016-11-24 16:10   ` [Qemu-devel] " Alex Bennée
@ 2016-11-28  9:10     ` Andrew Jones
  -1 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28  9:10 UTC (permalink / raw)
  To: Alex Bennée
  Cc: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier,
	mttcg, peter.maydell, claudio.fontana, nikunj, jan.kiszka,
	mark.burton, a.rigo, qemu-devel, cota, serge.fdrv, pbonzini,
	bobby.prani, rth, fred.konrad

On Thu, Nov 24, 2016 at 04:10:25PM +0000, Alex Bennée wrote:
> This introduces the option -o for passing options directly to QEMU,
> which is useful. In my case I'm using it to toggle MTTCG on and off:
> 
>   ./run_tests.sh -t -o "-tcg mttcg=on"
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  run_tests.sh           | 10 +++++++---
>  scripts/functions.bash | 13 +++++++------
>  2 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/run_tests.sh b/run_tests.sh
> index 4f2e5cb..05cc7fb 100755
> --- a/run_tests.sh
> +++ b/run_tests.sh
> @@ -13,10 +13,11 @@ function usage()
>  {
>  cat <<EOF
>  
> -Usage: $0 [-g group] [-a accel] [-t] [-h] [-v]
> +Usage: $0 [-g group] [-a accel] [-o qemu_opts] [-t] [-h] [-v]
>  
>      -g: Only execute tests in the given group
>      -a: Force acceleration mode (tcg/kvm)
> +    -o: additional options for QEMU command line
>      -t: disable timeouts
>      -h: Output this help text
>      -v: Enables verbose mode
> @@ -30,7 +31,7 @@ EOF
>  RUNTIME_arch_run="./$TEST_DIR/run"
>  source scripts/runtime.bash
>  
> -while getopts "g:a:thv" opt; do
> +while getopts "g:a:o:thv" opt; do
>      case $opt in
>          g)
>              only_group=$OPTARG
> @@ -38,6 +39,9 @@ while getopts "g:a:thv" opt; do
>          a)
>              force_accel=$OPTARG
>              ;;
> +        o)
> +            extra_opts=$OPTARG
> +            ;;
>          t)
>              no_timeout="yes"
>              ;;
> @@ -67,4 +71,4 @@ RUNTIME_log_stdout () {
>  config=$TEST_DIR/unittests.cfg
>  rm -f test.log
>  printf "BUILD_HEAD=$(cat build-head)\n\n" > test.log
> -for_each_unittest $config run
> +for_each_unittest $config run "$extra_opts"
> diff --git a/scripts/functions.bash b/scripts/functions.bash
> index ee9143c..d38a69e 100644
> --- a/scripts/functions.bash
> +++ b/scripts/functions.bash
> @@ -2,11 +2,12 @@
>  function for_each_unittest()
>  {
>  	local unittests="$1"
> -	local cmd="$2"
> -	local testname
> +        local cmd="$2"
> +        local extra_opts=$3
> +        local testname

We use tabs in this file. Not sure why cmd and testname got
changed too...

>  	local smp
>  	local kernel
> -	local opts
> +        local opts=$extra_opts
>  	local groups
>  	local arch
>  	local check
> @@ -21,7 +22,7 @@ function for_each_unittest()
>  			testname=${BASH_REMATCH[1]}
>  			smp=1
>  			kernel=""
> -			opts=""
> +                        opts=$extra_opts
>  			groups=""
>  			arch=""
>  			check=""
> @@ -32,7 +33,7 @@ function for_each_unittest()
>  		elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then
>  			smp=${BASH_REMATCH[1]}
>  		elif [[ $line =~ ^extra_params\ *=\ *(.*)$ ]]; then
> -			opts=${BASH_REMATCH[1]}
> +                        opts="$opts ${BASH_REMATCH[1]}"
>  		elif [[ $line =~ ^groups\ *=\ *(.*)$ ]]; then
>  			groups=${BASH_REMATCH[1]}
>  		elif [[ $line =~ ^arch\ *=\ *(.*)$ ]]; then
> @@ -45,6 +46,6 @@ function for_each_unittest()
>  			timeout=${BASH_REMATCH[1]}
>  		fi
>  	done
> -	"$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
> +        "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
>  	exec {fd}<&-
>  }
> -- 
> 2.10.1
> 
>

This is a pretty good idea, but I think I might like the extra options
to be given like this instead

  ./run_tests.sh [run_tests.sh options] -- [qemu options]
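
A minimal sketch of that convention, assuming bash getopts (which stops
at the first '--' and advances OPTIND past it):

  while getopts "g:a:thv" opt; do
      : # handle options as before
  done
  shift $((OPTIND - 1))
  extra_opts="$*"    # everything after '--' goes to QEMU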

Thanks,
drew 

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [kvm-unit-tests PATCH v7 04/11] libcflat: add PRI(dux)32 format types
  2016-11-24 16:10   ` [Qemu-devel] " Alex Bennée
@ 2016-11-28  9:18     ` Andrew Jones
  -1 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28  9:18 UTC (permalink / raw)
  To: Alex Bennée
  Cc: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier,
	mttcg, nikunj, jan.kiszka, mark.burton, qemu-devel, cota,
	serge.fdrv, pbonzini, bobby.prani, rth

On Thu, Nov 24, 2016 at 04:10:26PM +0000, Alex Bennée wrote:
> So we can have portable formatting of uint32_t types.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  lib/libcflat.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/lib/libcflat.h b/lib/libcflat.h
> index bdcc561..6dab5be 100644
> --- a/lib/libcflat.h
> +++ b/lib/libcflat.h
> @@ -55,12 +55,17 @@ typedef _Bool		bool;
>  #define true  1
>  
>  #if __SIZEOF_LONG__ == 8
> +#  define __PRI32_PREFIX
>  #  define __PRI64_PREFIX	"l"
>  #  define __PRIPTR_PREFIX	"l"
>  #else
> +#  define __PRI32_PREFIX        "l"

But a 32-bit value is an 'int' and an 'int' shouldn't ever
require an 'l'. Why was this necessary?
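
A quick illustration of the point (a sketch; the MY_ prefix just avoids
clashing with inttypes.h):

  #include <stdint.h>
  #include <stdio.h>

  /* uint32_t is plain unsigned int on the ILP32 and LP64 ABIs in play,
   * so an empty length modifier works on both 32- and 64-bit builds */
  #define MY_PRIx32 "x"

  int main(void)
  {
  	uint32_t v = 0xdeadbeef;
  	printf("0x%" MY_PRIx32 "\n", v);	/* prints 0xdeadbeef */
  	return 0;
  }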

>  #  define __PRI64_PREFIX	"ll"
>  #  define __PRIPTR_PREFIX
>  #endif
> +#define PRId32  __PRI32_PREFIX	"d"
> +#define PRIu32  __PRI32_PREFIX	"u"
> +#define PRIx32  __PRI32_PREFIX	"x"
>  #define PRId64  __PRI64_PREFIX	"d"
>  #define PRIu64  __PRI64_PREFIX	"u"
>  #define PRIx64  __PRI64_PREFIX	"x"
> -- 
> 2.10.1
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 06/11] arm/Makefile.common: force -fno-pic
  2016-11-24 16:10   ` [Qemu-devel] " Alex Bennée
@ 2016-11-28  9:33     ` Andrew Jones
  -1 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28  9:33 UTC (permalink / raw)
  To: Alex Bennée
  Cc: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier,
	mttcg, peter.maydell, claudio.fontana, nikunj, jan.kiszka,
	mark.burton, a.rigo, qemu-devel, cota, serge.fdrv, pbonzini,
	bobby.prani, rth, fred.konrad

On Thu, Nov 24, 2016 at 04:10:28PM +0000, Alex Bennée wrote:
> As distro compilers move towards defaults for build hardening for things
> like ASLR we need to force -fno-pic. Failure to do so can lead to weird
> relocation problems when we build our "flat" binaries.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  arm/Makefile.common | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arm/Makefile.common b/arm/Makefile.common
> index 52f7440..cca0d9c 100644
> --- a/arm/Makefile.common
> +++ b/arm/Makefile.common
> @@ -21,6 +21,7 @@ phys_base = $(LOADADDR)
>  
>  CFLAGS += -std=gnu99
>  CFLAGS += -ffreestanding
> +CFLAGS += -fno-pic
>  CFLAGS += -Wextra
>  CFLAGS += -O2
>  CFLAGS += -I lib -I lib/libfdt
> -- 
> 2.10.1
> 
>

Applied to arm/next

https://github.com/rhdrjones/kvm-unit-tests/commits/arm/next

Thanks,
drew

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [kvm-unit-tests PATCH v7 07/11] arm/tlbflush-code: Add TLB flush during code execution test
  2016-11-24 16:10   ` [Qemu-devel] " Alex Bennée
@ 2016-11-28  9:42     ` Andrew Jones
  -1 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28  9:42 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, nikunj, kvm, marc.zyngier, jan.kiszka, mark.burton,
	qemu-devel, cota, linux-arm-kernel, pbonzini, serge.fdrv,
	bobby.prani, kvmarm, rth

On Thu, Nov 24, 2016 at 04:10:29PM +0000, Alex Bennée wrote:
> This adds a fairly brain dead torture test for TLB flushes intended for
> stressing the MTTCG QEMU build. It takes the usual -smp option for
> multiple CPUs.
> 
> By default CPU0 will do a TLBIALL flush after each cycle. You can
> pass options via -append to control additional aspects of the test:
> 
>   - "page" flush each page in turn (one per function)
>   - "self" do the flush after each computation cycle
>   - "verbose" report progress on each computation cycle
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> CC: Mark Rutland <mark.rutland@arm.com>
> 
> ---
> v2
>   - rename to tlbflush-test
>   - made makefile changes cleaner
>   - added self/other flush mode
>   - create specific prefix
>   - whitespace fixes
> v3
>   - using new SMP framework for test running
> v4
>   - merge in the unittests.cfg
> v5
>   - max out at -smp 4
>   - printf fmtfix
> v7
>   - rename to tlbflush-code
>   - int -> bool flags
> ---
>  arm/Makefile.common |   2 +
>  arm/tlbflush-code.c | 212 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  arm/unittests.cfg   |  24 ++++++
>  3 files changed, 238 insertions(+)
>  create mode 100644 arm/tlbflush-code.c
> 
> diff --git a/arm/Makefile.common b/arm/Makefile.common
> index cca0d9c..de99a6e 100644
> --- a/arm/Makefile.common
> +++ b/arm/Makefile.common
> @@ -13,6 +13,7 @@ tests-common  = $(TEST_DIR)/selftest.flat
>  tests-common += $(TEST_DIR)/spinlock-test.flat
>  tests-common += $(TEST_DIR)/pci-test.flat
>  tests-common += $(TEST_DIR)/gic.flat
> +tests-common += $(TEST_DIR)/tlbflush-code.flat
>  
>  all: test_cases
>  
> @@ -81,3 +82,4 @@ generated_files = $(asm-offsets)
>  test_cases: $(generated_files) $(tests-common) $(tests)
>  
>  $(TEST_DIR)/selftest.o $(cstart.o): $(asm-offsets)
> +$(TEST_DIR)/tlbflush-code.elf: $(cstart.o) $(TEST_DIR)/tlbflush-code.o

This should no longer be necessary.

> diff --git a/arm/tlbflush-code.c b/arm/tlbflush-code.c
> new file mode 100644
> index 0000000..cb5cdc2
> --- /dev/null
> +++ b/arm/tlbflush-code.c
> @@ -0,0 +1,212 @@
> +/*
> + * TLB Flush Race Tests
> + *
> + * These tests are designed to test for incorrect TLB flush semantics
> + * under emulation. The initial CPU will set all the others working on a
> + * computation task and will then trigger TLB flushes across the
> + * system. It doesn't actually need to re-map anything but the flushes
> + * themselves will trigger QEMU's TCG self-modifying code detection,
> + * which will invalidate any generated code, causing re-translation.
> + * Eventually the code buffer will fill and a general tb_flush() will
> + * be triggered.
> + *
> + * Copyright (C) 2016, Linaro, Alex Bennée <alex.bennee@linaro.org>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2.
> + */
> +
> +#include <libcflat.h>
> +#include <asm/smp.h>
> +#include <asm/cpumask.h>
> +#include <asm/barrier.h>
> +#include <asm/mmu.h>
> +
> +#define SEQ_LENGTH 10
> +#define SEQ_HASH 0x7cd707fe
> +
> +static cpumask_t smp_test_complete;
> +static int flush_count = 1000000;
> +static bool flush_self;
> +static bool flush_page;
> +static bool flush_verbose;
> +
> +/*
> + * Work functions
> + *
> + * These work functions need to be:
> + *
> + *  - page aligned, so we can flush one function at a time
> + *  - have branches, so QEMU TCG generates multiple basic blocks
> + *  - call across pages, so we exercise the TCG basic block slow path
> + */
> +
> +/* Adler32 */
> +__attribute__((aligned(PAGE_SIZE))) uint32_t hash_array(const void *buf,
> +							size_t buflen)

I think I'd prefer

__attribute__((aligned(PAGE_SIZE)))
uint32_t hash_array(const void *buf, size_t buflen)

to handle the long line

> +{
> +	const uint8_t *data = (uint8_t *) buf;
> +	uint32_t s1 = 1;
> +	uint32_t s2 = 0;
> +
> +	for (size_t n = 0; n < buflen; n++) {
> +		s1 = (s1 + data[n]) % 65521;
> +		s2 = (s2 + s1) % 65521;
> +	}
> +	return (s2 << 16) | s1;
> +}
> +
> +__attribute__((aligned(PAGE_SIZE))) void create_fib_sequence(int length,
> +							unsigned int *array)
> +{
> +	int i;
> +
> +	/* first two values */
> +	array[0] = 0;
> +	array[1] = 1;
> +	for (i=2; i<length; i++) {
> +		array[i] = array[i-2] + array[i-1];
> +	}

please don't use {} for one-liners. Try running the kernel's checkpatch.pl
on your patches. This applies in many places below.
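
For example (checkpatch ships in the kernel tree, so the path here is an
assumption about where your sources live):

  $LINUX_SRC/scripts/checkpatch.pl --no-tree -f arm/tlbflush-code.c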

> +}
> +
> +__attribute__((aligned(PAGE_SIZE))) unsigned long long factorial(unsigned int n)

long line

> +{
> +	unsigned int i;
> +	unsigned long long fac = 1;
> +	for (i=1; i<=n; i++)
> +	{
> +		fac = fac * i;
> +	}
> +	return fac;
> +}
> +
> +__attribute__((aligned(PAGE_SIZE))) void factorial_array
> +(unsigned int n, unsigned int *input, unsigned long long *output)
> +{
> +	unsigned int i;
> +	for (i=0; i<n; i++) {
> +		output[i] = factorial(input[i]);
> +	}
> +}
> +
> +__attribute__((aligned(PAGE_SIZE))) unsigned int do_computation(void)
> +{
> +	unsigned int fib_array[SEQ_LENGTH];
> +	unsigned long long facfib_array[SEQ_LENGTH];
> +	uint32_t fib_hash, facfib_hash;
> +
> +	create_fib_sequence(SEQ_LENGTH, &fib_array[0]);
> +	fib_hash = hash_array(&fib_array[0], sizeof(fib_array));
> +	factorial_array(SEQ_LENGTH, &fib_array[0], &facfib_array[0]);
> +	facfib_hash = hash_array(&facfib_array[0], sizeof(facfib_array));
> +
> +	return (fib_hash ^ facfib_hash);
> +}
> +
> +/* This provides a table of the work functions so we can flush each
> + * page individually
> + */
> +static void * pages[] = {&hash_array, &create_fib_sequence, &factorial,
> +			 &factorial_array, &do_computation};

please put the '*' by pages

> +
> +static void do_flush(int i)
> +{
> +	if (flush_page) {
> +		flush_tlb_page((unsigned long)pages[i % ARRAY_SIZE(pages)]);
> +	} else {
> +		flush_tlb_all();
> +	}
> +}
> +
> +
> +static void just_compute(void)
> +{
> +	int i, errors = 0;
> +	int cpu = smp_processor_id();
> +
> +	uint32_t result;
> +
> +	printf("CPU%d online\n", cpu);
> +
> +	for (i=0; i < flush_count; i++) {
> +		result = do_computation();
> +
> +		if (result != SEQ_HASH) {
> +			errors++;
> +			printf("CPU%d: seq%d 0x%"PRIx32"!=0x%x\n",
> +				cpu, i, result, SEQ_HASH);
> +		}
> +
> +		if (flush_verbose && (i % 1000) == 0) {
> +			printf("CPU%d: seq%d\n", cpu, i);
> +		}
> +
> +		if (flush_self) {
> +			do_flush(i);
> +		}
> +	}
> +
> +	report("CPU%d: Done - Errors: %d\n", errors == 0, cpu, errors);
> +
> +	cpumask_set_cpu(cpu, &smp_test_complete);
> +	if (cpu != 0)
> +		halt();
> +}
> +
> +static void just_flush(void)
> +{
> +	int cpu = smp_processor_id();
> +	int i = 0;
> +
> +	/* set our CPU as done, keep flushing until everyone else
> +	   finished */

Not our comment style

> +	cpumask_set_cpu(cpu, &smp_test_complete);
> +
> +	while (!cpumask_full(&smp_test_complete)) {
> +		do_flush(i++);
> +	}
> +
> +	report("CPU%d: Done - Triggered %d flushes\n", true, cpu, i);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	int cpu, i;
> +	char prefix[100];
> +
> +	for (i=0; i<argc; i++) {
> +		char *arg = argv[i];
> +
> +		if (strcmp(arg, "page") == 0) {
> +			flush_page = true;
> +                }
> +
> +                if (strcmp(arg, "self") == 0) {
> +			flush_self = true;
> +                }
> +
> +		if (strcmp(arg, "verbose") == 0) {
> +			flush_verbose = true;
> +                }
> +	}
> +
> +	snprintf(prefix, sizeof(prefix), "tlbflush_%s_%s",
> +		flush_page?"page":"all",
> +		flush_self?"self":"other");
> +	report_prefix_push(prefix);
> +
> +	for_each_present_cpu(cpu) {
> +		if (cpu == 0)
> +			continue;
> +		smp_boot_secondary(cpu, just_compute);
> +	}
> +
> +	if (flush_self)
> +		just_compute();
> +	else
> +		just_flush();
> +
> +	while (!cpumask_full(&smp_test_complete))
> +		cpu_relax();
> +
> +	return report_summary();
> +}
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index c7392c7..beaae84 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -72,3 +72,27 @@ file = gic.flat
>  smp = $MAX_SMP
>  extra_params = -machine gic-version=3 -append 'ipi'
>  groups = gic
> +
> +# TLB Torture Tests
> +[tlbflush-code::all_other]

We don't use the '::' style anymore, as it doesn't work
well with mkstandalone.
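
A sketch of the same stanza under the current convention (the exact name
is illustrative):

  [tlbflush-code-all-other]
  file = tlbflush-code.flat
  smp = $(($MAX_SMP>4?4:$MAX_SMP))
  groups = tlbflush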

> +file = tlbflush-code.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +groups = tlbflush
> +
> +[tlbflush-code::page_other]
> +file = tlbflush-code.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +extra_params = -append 'page'
> +groups = tlbflush
> +
> +[tlbflush-code::all_self]
> +file = tlbflush-code.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +extra_params = -append 'self'
> +groups = tlbflush
> +
> +[tlbflush-code::page_self]
> +file = tlbflush-code.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +extra_params = -append 'page self'
> +groups = tlbflush
> -- 
> 2.10.1
>

I only did a superficial review, but it looks familiar. I guess I've
reviewed some of it before.

drew

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [kvm-unit-tests PATCH v7 08/11] arm/tlbflush-data: Add TLB flush during data writes test
  2016-11-24 16:10   ` [Qemu-devel] " Alex Bennée
@ 2016-11-28 10:11     ` Andrew Jones
  -1 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28 10:11 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, nikunj, kvm, marc.zyngier, jan.kiszka, mark.burton,
	qemu-devel, cota, linux-arm-kernel, pbonzini, serge.fdrv,
	bobby.prani, kvmarm, christoffer.dall, rth

On Thu, Nov 24, 2016 at 04:10:30PM +0000, Alex Bennée wrote:
> This test is the cousin of the tlbflush-code test. Instead of flushing
> running code it re-maps virtual addresses while a buffer is being filled
> up. It then audits the results checking for writes that have ended up in
> the wrong place.
> 
> While tlbflush-code exercises QEMU's translation invalidation logic, this
> test stresses the SoftMMU cputlb code and ensures it is semantically
> correct.
> 
> The test optionally takes two parameters for debugging:
> 
>    cycles           - change the default number of test iterations
>    page             - flush pages individually instead of all
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> CC: Mark Rutland <mark.rutland@arm.com>
> ---
>  arm/Makefile.common |   2 +
>  arm/tlbflush-data.c | 401 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  arm/unittests.cfg   |  12 ++
>  3 files changed, 415 insertions(+)
>  create mode 100644 arm/tlbflush-data.c
> 
> diff --git a/arm/Makefile.common b/arm/Makefile.common
> index de99a6e..528166d 100644
> --- a/arm/Makefile.common
> +++ b/arm/Makefile.common
> @@ -14,6 +14,7 @@ tests-common += $(TEST_DIR)/spinlock-test.flat
>  tests-common += $(TEST_DIR)/pci-test.flat
>  tests-common += $(TEST_DIR)/gic.flat
>  tests-common += $(TEST_DIR)/tlbflush-code.flat
> +tests-common += $(TEST_DIR)/tlbflush-data.flat
>  
>  all: test_cases
>  
> @@ -83,3 +84,4 @@ test_cases: $(generated_files) $(tests-common) $(tests)
>  
>  $(TEST_DIR)/selftest.o $(cstart.o): $(asm-offsets)
>  $(TEST_DIR)/tlbflush-code.elf: $(cstart.o) $(TEST_DIR)/tlbflush-code.o
> +$(TEST_DIR)/tlbflush-data.elf: $(cstart.o) $(TEST_DIR)/tlbflush-data.o

This isn't necessary

> diff --git a/arm/tlbflush-data.c b/arm/tlbflush-data.c
> new file mode 100644
> index 0000000..7920179
> --- /dev/null
> +++ b/arm/tlbflush-data.c
> @@ -0,0 +1,401 @@
> +/*
> + * TLB Flush Race Tests
> + *
> + * These tests are designed to test for incorrect TLB flush semantics
> + * under emulation. The initial CPU will set all the others working on
> + * writing to a set of pages. It will then re-map one of the pages
> + * back and forth while recording the timestamps of when each page was
> + * active. The test fails if a write was detected on a page after the
> + * tlbflush switching to a new page should have completed.
> + *
> + * Copyright (C) 2016, Linaro, Alex Bennée <alex.bennee@linaro.org>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2.
> + */
> +
> +#include <libcflat.h>
> +#include <asm/smp.h>
> +#include <asm/cpumask.h>
> +#include <asm/barrier.h>
> +#include <asm/mmu.h>
> +
> +#define NR_TIMESTAMPS 		((PAGE_SIZE/sizeof(u64)) << 2)
> +#define NR_AUDIT_RECORDS	16384
> +#define NR_DYNAMIC_PAGES 	3
> +#define MAX_CPUS 		8
> +
> +#define MIN(a, b)		((a) < (b) ? (a) : (b))

Peter Xu is bringing MIN to libcflat with his edu series.

> +
> +typedef struct {
> +	u64    		timestamps[NR_TIMESTAMPS];
> +} write_buffer;
> +
> +typedef struct {
> +	write_buffer 	*newbuf;
> +	u64		time_before_flush;
> +	u64		time_after_flush;
> +} audit_rec_t;
> +
> +typedef struct {
> +	audit_rec_t 	records[NR_AUDIT_RECORDS];
> +} audit_buffer;
> +
> +typedef struct {
> +	write_buffer 	*stable_pages;
> +	write_buffer    *dynamic_pages[NR_DYNAMIC_PAGES];
> +	audit_buffer 	*audit;
> +	unsigned int 	flush_count;
> +} test_data_t;
> +
> +static test_data_t test_data[MAX_CPUS];
> +
> +static cpumask_t ready;
> +static cpumask_t complete;
> +
> +static bool test_complete;
> +static bool flush_verbose;
> +static bool flush_by_page;
> +static int test_cycles=3;
> +static int secondary_cpus;
> +
> +static write_buffer * alloc_test_pages(void)
> +{
> +	write_buffer *pg;
> +	pg = calloc(NR_TIMESTAMPS, sizeof(u64));
> +	return pg;
> +}
> +
> +static void setup_pages_for_cpu(int cpu)
> +{
> +	unsigned int i;
> +
> +	test_data[cpu].stable_pages = alloc_test_pages();
> +
> +	for (i=0; i<NR_DYNAMIC_PAGES; i++) {
> +		test_data[cpu].dynamic_pages[i] = alloc_test_pages();
> +	}
> +
> +	test_data[cpu].audit = calloc(NR_AUDIT_RECORDS, sizeof(audit_rec_t));
> +}
> +
> +static audit_rec_t * get_audit_record(audit_buffer *buf, unsigned int record)
> +{
> +	return &buf->records[record];
> +}
> +
> +/* Sync on a given cpumask */
> +static void wait_on(int cpu, cpumask_t *mask)
> +{

Why take 'cpu' as a parameter. Just use smp_processor_id()

> +	cpumask_set_cpu(cpu, mask);
> +	while (!cpumask_full(mask))
> +		cpu_relax();
> +}
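
A minimal sketch of that change, reusing the helpers the patch already
uses:

  static void wait_on(cpumask_t *mask)
  {
  	cpumask_set_cpu(smp_processor_id(), mask);
  	while (!cpumask_full(mask))
  		cpu_relax();
  }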
> +
> +static uint64_t sync_start(void)
> +{
> +	const uint64_t gate_mask = ~0x7ff;
> +	uint64_t gate, now;
> +	gate = get_cntvct() & gate_mask;
> +	do {
> +		now = get_cntvct();
> +	} while ((now & gate_mask) == gate);

I'm not really sure what this function is doing. Trying to
get synchronized timestamps between cpus?

> +
> +	return now;
> +}
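
(Reading the loop above, that does appear to be the intent: gate_mask
clears the low 11 bits, so each caller spins until the virtual counter
crosses the next 2048-tick boundary, giving CPUs that arrive at roughly
the same time a nearly common start timestamp.)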
> +
> +static void do_page_writes(void)
> +{
> +	unsigned int i, runs = 0;
> +	int cpu = smp_processor_id();
> +	write_buffer *stable_pages = test_data[cpu].stable_pages;
> +	write_buffer *moving_page = test_data[cpu].dynamic_pages[0];
> +
> +	printf("CPU%d: ready %p/%p @ 0x%08" PRIx64"\n",
> +		cpu, stable_pages, moving_page, get_cntvct());
> +
> +	while (!test_complete) {
> +		u64 run_start, run_end;
> +
> +		smp_mb();
> +		wait_on(cpu, &ready);
> +		run_start = sync_start();
> +
> +		for (i = 0; i < NR_TIMESTAMPS; i++) {
> +			u64 ts = get_cntvct();
> +			moving_page->timestamps[i] = ts;
> +			stable_pages->timestamps[i] = ts;
> +		}
> +
> +		run_end = get_cntvct();
> +		printf("CPU%d: run %d 0x%" PRIx64 "->0x%" PRIx64 " (%" PRId64 " cycles)\n",
> +			cpu, runs++, run_start, run_end, run_end - run_start);
> +
> +		/* wait on completion - gets cleared by the main thread */
> +		wait_on(cpu, &complete);
> +	}
> +}
> +
> +
> +/*
> + * This is the core of the test. Timestamps are taken either side of
> + * the updating of the page table and the flush instruction. By
> + * keeping track of when the page mapping is changed we can detect any
> + * writes that shouldn't have made it to the other pages.
> + *
> + * This isn't the recommended way to update the page table. ARM
> + * recommends break-before-make so accesses that are in flight can
> + * trigger faults that can be handled cleanly.
> + */
> +
> +/* This mimics __flush_tlb_range from the kernel, doing a series of
> + * flush operations and then the dsb() to complete. */
> +static void flush_pages(unsigned long start, unsigned long end)
> +{
> +	unsigned long addr;
> +	start = start >> 12;
> +	end = end >> 12;

Looks like you're assuming 4K pages, but AArch64 unit tests have 64K
pages. You're free to change that, but you'll need to disable and
re-enable the mmu with new parameters.

> +
> +	dsb(ishst);
> +	for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT -12)) {

Hmm, start and end are 4K aligned, but now you do shift addr appropriately
for 64K pages. Why not just do

start &= PAGE_MASK;
end &= PAGE_MASK;
addr += PAGE_SIZE

> +#if defined(__aarch64__)
> +		asm("tlbi	vaae1is, %0" :: "r" (addr));
> +#else
> +		asm volatile("mcr p15, 0, %0, c8, c7, 3" :: "r" (addr));
> +#endif
> +	}
> +	dsb(ish);

flush_pages() may be something we want in common code.

> +}
> +
> +static void remap_one_page(test_data_t *data)
> +{
> +	u64 ts_before, ts_after;
> +	int pg = (data->flush_count % (NR_DYNAMIC_PAGES + 1));
> +	write_buffer *dynamic_pages_vaddr = data->dynamic_pages[0];
> +	write_buffer *newbuf_paddr = data->dynamic_pages[pg];
> +	write_buffer *end_page_paddr = newbuf_paddr+1;
> +
> +	ts_before = get_cntvct();
> +	/* update the page table */
> +	mmu_set_range_ptes(mmu_idmap,
> +			(unsigned long) dynamic_pages_vaddr,
> +			(unsigned long) newbuf_paddr,
> +			(unsigned long) end_page_paddr,
> +			__pgprot(PTE_WBWA));
> +	/* until the flush + isb() writes may still go to old address */
> +	if (flush_by_page) {
> +		flush_pages((unsigned long)dynamic_pages_vaddr, (unsigned long)(dynamic_pages_vaddr+1));
> +	} else {
> +		flush_tlb_all();
> +	}
> +	ts_after = get_cntvct();
> +
> +	if (data->flush_count < NR_AUDIT_RECORDS) {
> +		audit_rec_t *rec = get_audit_record(data->audit, data->flush_count);
> +		rec->newbuf = newbuf_paddr;
> +		rec->time_before_flush = ts_before;
> +		rec->time_after_flush = ts_after;
> +	}
> +	data->flush_count++;
> +}
> +
> +static int check_pages(int cpu, char *msg,
> +		write_buffer *base_page, write_buffer *test_page,
> +		audit_buffer *audit, unsigned int flushes)
> +{
> +	write_buffer *prev_page = base_page;
> +	unsigned int empty = 0, write = 0, late = 0, weird = 0;

The variable 'weird' is a bit weird. How about 'bad'?

> +	unsigned int ts_index = 0, audit_index;
> +	u64 ts;
> +
> +	/* For each audit record */
> +	for (audit_index = 0; audit_index < MIN(flushes, NR_AUDIT_RECORDS); audit_index++) {
> +		audit_rec_t *rec = get_audit_record(audit, audit_index);
> +
> +		do {
> +			/* Work through timestamps until we overtake
> +			 * this audit record */
> +			ts = test_page->timestamps[ts_index];
> +
> +			if (ts == 0) {
> +				empty++;
> +			} else if (ts < rec->time_before_flush) {
> +				if (test_page == prev_page) {
> +					write++;
> +				} else {
> +					late++;
> +				}
> +			} else if (ts >= rec->time_before_flush
> +				&& ts <= rec->time_after_flush) {
> +				if (test_page == prev_page
> +					|| test_page == rec->newbuf) {
> +					write++;
> +				} else {
> +					weird++;
> +				}
> +			} else if (ts > rec->time_after_flush) {
> +				if (test_page == rec->newbuf) {
> +					write++;
> +				}
> +				/* It's possible the ts is way ahead
> +				 * of the current record so we can't
> +				 * call a non-match weird...
> +				 *
> +				 * Time to skip to next audit record
> +				 */
> +				break;
> +			}
> +
> +			ts = test_page->timestamps[ts_index++];
> +		} while (ts <= rec->time_after_flush && ts_index < NR_TIMESTAMPS);
> +
> +
> +		/* Next record */
> +		prev_page = rec->newbuf;
> +	} /* for each audit record */
> +
> +	if (flush_verbose) {
> +		printf("CPU%d: %s %p => %p %u/%u/%u/%u (0/OK/L/?) = %u total\n",
> +			cpu, msg, test_page, base_page,
> +			empty, write, late, weird, empty+write+late+weird);
> +	}
> +
> +	return weird;
> +}
> +
> +static int audit_cpu_pages(int cpu, test_data_t *data)
> +{
> +	unsigned int pg, writes=0, ts_index = 0;
> +	write_buffer *test_page;
> +	int errors = 0;
> +
> +	/* first the stable page */
> +	test_page = data->stable_pages;
> +	do {
> +		if (test_page->timestamps[ts_index++]) {
> +			writes++;
> +		}
> +	} while (ts_index < NR_TIMESTAMPS);
> +
> +	if (writes != ts_index) {
> +		errors += 1;
> +	}
> +
> +	if (flush_verbose) {
> +		printf("CPU%d: stable page %p %u writes\n",
> +			cpu, test_page, writes);
> +	}
> +
> +
> +	/* Restore the mapping for dynamic page */
> +	test_page = data->dynamic_pages[0];
> +
> +	mmu_set_range_ptes(mmu_idmap,
> +			(unsigned long) test_page,
> +			(unsigned long) test_page,
> +			(unsigned long) &test_page[1],
> +			__pgprot(PTE_WBWA));
> +	flush_tlb_all();
> +
> +	for (pg=0; pg<NR_DYNAMIC_PAGES; pg++) {
> +		errors += check_pages(cpu, "dynamic page", test_page,
> +				data->dynamic_pages[pg],
> +				data->audit, data->flush_count);
> +	}
> +
> +	/* reset for next run */
> +	memset(data->stable_pages, 0, sizeof(write_buffer));
> +	for (pg=0; pg<NR_DYNAMIC_PAGES; pg++) {
> +		memset(data->dynamic_pages[pg], 0, sizeof(write_buffer));
> +	}
> +	memset(data->audit, 0, sizeof(audit_buffer));
> +	data->flush_count = 0;
> +	smp_mb();
> +
> +	report("CPU%d: checked, errors: %d", errors == 0, cpu, errors);
> +	return errors;
> +}
> +
> +static void do_page_flushes(void)
> +{
> +	int i, cpu;
> +
> +	printf("CPU0: ready @ 0x%08" PRIx64"\n", get_cntvct());
> +
> +	for (i=0; i<test_cycles; i++) {
> +		unsigned int flushes=0;
> +		u64 run_start, run_end;
> +		int cpus_finished;
> +
> +		cpumask_clear(&complete);
> +		wait_on(0, &ready);
> +		run_start = sync_start();
> +
> +		do {
> +			for_each_present_cpu(cpu) {
> +				if (cpu == 0)
> +					continue;
> +
> +				/* do remap & flush */
> +				remap_one_page(&test_data[cpu]);
> +				flushes++;
> +			}
> +
> +			cpus_finished = cpumask_weight(&complete);
> +		} while (cpus_finished < secondary_cpus);
> +
> +		run_end = get_cntvct();
> +
> +		printf("CPU0: run %d 0x%" PRIx64 "->0x%" PRIx64 " (%" PRId64 " cycles, %u flushes)\n",
> +			i, run_start, run_end, run_end - run_start, flushes);
> +
> +		/* Reset our ready mask for next cycle */
> +		cpumask_clear_cpu(0, &ready);
> +		smp_mb();
> +		wait_on(0, &complete);
> +
> +		/* Check for discrepancies */
> +		for_each_present_cpu(cpu) {
> +			if (cpu == 0)
> +				continue;
> +			audit_cpu_pages(cpu, &test_data[cpu]);
> +		}
> +	}
> +
> +	test_complete = true;
> +	smp_mb();
> +	cpumask_set_cpu(0, &ready);
> +	cpumask_set_cpu(0, &complete);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	int cpu, i;
> +
> +	for (i=0; i<argc; i++) {
> +		char *arg = argv[i];
> +		if (strcmp(arg, "verbose") == 0) {
> +			flush_verbose = true;
> +		}
> +		if (strcmp(arg, "page") == 0) {
> +			flush_by_page = true;
> +		}
> +		if (strstr(arg, "cycles=") != NULL) {
> +			char *p = strstr(arg, "=");
> +			test_cycles = atol(p+1);

We have parse_keyval for this. Radim has plans to improve
parse_keyval though, as nobody (including the author, me)
really likes it as is...
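
E.g. something along these lines (a sketch from memory, so double-check
the signature in lib/util.h; iirc parse_keyval() returns the key length,
or -1 when there's no '='):

	long val;

	/* 6 == strlen("cycles") */
	if (parse_keyval(arg, &val) == 6 && strncmp(arg, "cycles", 6) == 0)
		test_cycles = val;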

> +		}
> +	}
> +
> +	for_each_present_cpu(cpu) {
> +		if (cpu == 0)
> +			continue;
> +
> +		setup_pages_for_cpu(cpu);
> +		smp_boot_secondary(cpu, do_page_writes);
> +		secondary_cpus++;
> +	}
> +
> +	/* CPU 0 does the flushes and checks the results */
> +	do_page_flushes();
> +
> +	return report_summary();
> +}
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index beaae84..7dc7799 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -96,3 +96,15 @@ file = tlbflush-code.flat
>  smp = $(($MAX_SMP>4?4:$MAX_SMP))
>  extra_params = -append 'page self'
>  groups = tlbflush
> +
> +[tlbflush-data::all]
> +file = tlbflush-data.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +groups = tlbflush
> +
> +[tlbflush-data::page]
> +file = tlbflush-data.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +extra_params = -append "page"
> +groups = tlbflush
> +
> -- 
> 2.10.1
>

Same style comments as last patch apply to this one too.

I skimmed this pretty quickly mostly looking at it wrt framework API and
style. And that looks pretty good to me.

Thanks,
drew

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [kvm-unit-tests PATCH v7 09/11] arm/locking-tests: add comprehensive locking test
  2016-11-24 16:10   ` [Qemu-devel] " Alex Bennée
@ 2016-11-28 10:29     ` Andrew Jones
  -1 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28 10:29 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, nikunj, kvm, marc.zyngier, jan.kiszka, mark.burton,
	qemu-devel, cota, linux-arm-kernel, serge.fdrv, pbonzini,
	bobby.prani, rth, kvmarm

On Thu, Nov 24, 2016 at 04:10:31PM +0000, Alex Bennée wrote:
> This test has been written mainly to stress multi-threaded TCG behaviour
> but will demonstrate failure by default on real hardware. The test takes
> the following parameters:
> 
>   - "lock" use GCC's locking semantics
>   - "atomic" use GCC's __atomic primitives
>   - "wfelock" use WaitForEvent sleep
>   - "excl" use load/store exclusive semantics
> 
> Also two more options allow the test to be tweaked
> 
>   - "noshuffle" disables the memory shuffling
>   - "count=%ld" set your own per-CPU increment count
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> v2
>   - Don't use thumb style strexeq stuff
>   - Add atomic and wfelock tests
>   - Add count/noshuffle test controls
>   - Move barrier tests to separate test file
> v4
>   - fix up unitests.cfg to use correct test name
>   - move into "locking" group, remove barrier tests
>   - use a table to add tests, mark which are expected to work
>   - correctly report XFAIL
> v5
>   - max out at -smp 4 in unittest.cfg
> v7
>   - make test control flags bools
>   - default the count to 100000 (so it doesn't timeout)
> ---
>  arm/Makefile.common |   2 +
>  arm/locking-test.c  | 302 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  arm/unittests.cfg   |  34 ++++++
>  3 files changed, 338 insertions(+)
>  create mode 100644 arm/locking-test.c
> 
> diff --git a/arm/Makefile.common b/arm/Makefile.common
> index 528166d..eb4cfdf 100644
> --- a/arm/Makefile.common
> +++ b/arm/Makefile.common
> @@ -15,6 +15,7 @@ tests-common += $(TEST_DIR)/pci-test.flat
>  tests-common += $(TEST_DIR)/gic.flat
>  tests-common += $(TEST_DIR)/tlbflush-code.flat
>  tests-common += $(TEST_DIR)/tlbflush-data.flat
> +tests-common += $(TEST_DIR)/locking-test.flat
>  
>  all: test_cases
>  
> @@ -85,3 +86,4 @@ test_cases: $(generated_files) $(tests-common) $(tests)
>  $(TEST_DIR)/selftest.o $(cstart.o): $(asm-offsets)
>  $(TEST_DIR)/tlbflush-code.elf: $(cstart.o) $(TEST_DIR)/tlbflush-code.o
>  $(TEST_DIR)/tlbflush-data.elf: $(cstart.o) $(TEST_DIR)/tlbflush-data.o
> +$(TEST_DIR)/locking-test.elf: $(cstart.o) $(TEST_DIR)/locking-test.o

Instead of adding a new test file, please extend the one we already have,
which iirc was the first MTTCG test, arm/spinlock-test.c. If you don't
like the naming or code in spinlock-test.c, feel free to change or
delete it. It's currently not getting run by arm/unittests.cfg, and it's
not getting maintained.
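
Getting it running again should only need a unittests.cfg entry in the
same style as the locking entries added below, something like
(hypothetical, pick whatever group fits):

[spinlock]
file = spinlock-test.flat
smp = $(($MAX_SMP>4?4:$MAX_SMP))
groups = locking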

> diff --git a/arm/locking-test.c b/arm/locking-test.c
> new file mode 100644
> index 0000000..f10c61b
> --- /dev/null
> +++ b/arm/locking-test.c
> @@ -0,0 +1,302 @@
> +#include <libcflat.h>
> +#include <asm/smp.h>
> +#include <asm/cpumask.h>
> +#include <asm/barrier.h>
> +#include <asm/mmu.h>
> +
> +#include <prng.h>
> +
> +#define MAX_CPUS 8
> +
> +/* Test definition structure
> + *
> + * A simple structure that describes the test name, expected pass and
> + * increment function.
> + */
> +
> +/* Function pointers for test */
> +typedef void (*inc_fn)(int cpu);
> +
> +typedef struct {
> +	const char *test_name;
> +	bool  should_pass;
> +	inc_fn main_fn;
> +} test_descr_t;
> +
> +/* How many increments to do */
> +static int increment_count = 1000000;
> +static bool do_shuffle = true;
> +
> +/* Shared value all the tests attempt to safely increment using
> + * various forms of atomic locking and exclusive behaviour.
> + */
> +static unsigned int shared_value;
> +
> +/* PAGE_SIZE * uint32_t means we span several pages */
> +__attribute__((aligned(PAGE_SIZE))) static uint32_t memory_array[PAGE_SIZE];
> +
> +/* We use the alignment of the following to ensure accesses to locking
> + * and synchronisation primatives don't interfere with the page of the
> + * shared value
> + */
> +__attribute__((aligned(PAGE_SIZE))) static unsigned int per_cpu_value[MAX_CPUS];
> +__attribute__((aligned(PAGE_SIZE))) static cpumask_t smp_test_complete;
> +__attribute__((aligned(PAGE_SIZE))) struct isaac_ctx prng_context[MAX_CPUS];
> +
> +/* Some of the approaches use a global lock to prevent contention. */
> +static int global_lock;
> +
> +/* In any SMP setting this *should* fail due to cores stepping on
> + * each other updating the shared variable
> + */
> +static void increment_shared(int cpu)
> +{
> +	(void)cpu;
> +
> +	shared_value++;
> +}
> +
> +/* GCC __sync primitives are deprecated in favour of __atomic */
> +static void increment_shared_with_lock(int cpu)
> +{
> +	(void)cpu;
> +
> +	while (__sync_lock_test_and_set(&global_lock, 1));
> +	shared_value++;
> +	__sync_lock_release(&global_lock);
> +}
> +
> +/* In practice even __ATOMIC_RELAXED uses ARM's ldxr/stxr exclusive
> + * semantics */
> +static void increment_shared_with_atomic(int cpu)
> +{
> +	(void)cpu;
> +
> +	__atomic_add_fetch(&shared_value, 1, __ATOMIC_SEQ_CST);
> +}
> +
> +
> +/*
> + * Load/store exclusive with WFE (wait-for-event)
> + *
> + * See ARMv8 ARM examples:
> + *   Use of Wait For Event (WFE) and Send Event (SEV) with locks
> + */
> +
> +static void increment_shared_with_wfelock(int cpu)
> +{
> +	(void)cpu;
> +
> +#if defined(__aarch64__)
> +	asm volatile(
> +	"	mov     w1, #1\n"
> +	"       sevl\n"
> +	"       prfm PSTL1KEEP, [%[lock]]\n"
> +	"1:     wfe\n"
> +	"	ldaxr	w0, [%[lock]]\n"
> +	"	cbnz    w0, 1b\n"
> +	"	stxr    w0, w1, [%[lock]]\n"
> +	"	cbnz	w0, 1b\n"
> +	/* lock held */
> +	"	ldr	w0, [%[sptr]]\n"
> +	"	add	w0, w0, #0x1\n"
> +	"	str	w0, [%[sptr]]\n"
> +	/* now release */
> +	"	stlr	wzr, [%[lock]]\n"
> +	: /* out */
> +	: [lock] "r" (&global_lock), [sptr] "r" (&shared_value) /* in */
> +	: "w0", "w1", "cc");
> +#else
> +	asm volatile(
> +	"	mov     r1, #1\n"
> +	"1:	ldrex	r0, [%[lock]]\n"
> +	"	cmp     r0, #0\n"
> +	"	wfene\n"
> +	"	strexeq r0, r1, [%[lock]]\n"
> +	"	cmpeq	r0, #0\n"
> +	"	bne	1b\n"
> +	"	dmb\n"
> +	/* lock held */
> +	"	ldr	r0, [%[sptr]]\n"
> +	"	add	r0, r0, #0x1\n"
> +	"	str	r0, [%[sptr]]\n"
> +	/* now release */
> +	"	mov	r0, #0\n"
> +	"	dmb\n"
> +	"	str	r0, [%[lock]]\n"
> +	"	dsb\n"
> +	"	sev\n"
> +	: /* out */
> +	: [lock] "r" (&global_lock), [sptr] "r" (&shared_value) /* in */
> +	: "r0", "r1", "cc");
> +#endif
> +}
> +
> +
> +/*
> + * Hand-written version of the load/store exclusive
> + */
> +static void increment_shared_with_excl(int cpu)
> +{
> +	(void)cpu;
> +
> +#if defined(__aarch64__)
> +        asm volatile(
> +	"1:	ldxr	w0, [%[sptr]]\n"
> +	"	add     w0, w0, #0x1\n"
> +	"	stxr	w1, w0, [%[sptr]]\n"
> +	"	cbnz	w1, 1b\n"
> +	: /* out */
> +	: [sptr] "r" (&shared_value) /* in */
> +	: "w0", "w1", "cc");
> +#else
> +	asm volatile(
> +	"1:	ldrex	r0, [%[sptr]]\n"
> +	"	add     r0, r0, #0x1\n"
> +	"	strex	r1, r0, [%[sptr]]\n"
> +	"	cmp	r1, #0\n"
> +	"	bne	1b\n"
> +	: /* out */
> +	: [sptr] "r" (&shared_value) /* in */
> +	: "r0", "r1", "cc");
> +#endif
> +}
> +
> +/* Test array */
> +static test_descr_t tests[] = {
> +	{ "none", false, increment_shared },
> +	{ "lock", true, increment_shared_with_lock },
> +	{ "atomic", true, increment_shared_with_atomic },
> +	{ "wfelock", true, increment_shared_with_wfelock },
> +	{ "excl", true, increment_shared_with_excl }
> +};
> +
> +/* The idea of this is just to generate some random load/store
> + * activity which may or may not race with an un-barriered increment
> + * of the shared counter
> + */
> +static void shuffle_memory(int cpu)
> +{
> +	int i;
> +	uint32_t lspat = isaac_next_uint32(&prng_context[cpu]);
> +	uint32_t seq = isaac_next_uint32(&prng_context[cpu]);
> +	int count = seq & 0x1f;
> +	uint32_t val=0;
> +
> +	seq >>= 5;
> +
> +	for (i=0; i<count; i++) {
> +		int index = seq & ~PAGE_MASK;
> +		if (lspat & 1) {
> +			val ^= memory_array[index];
> +		} else {
> +			memory_array[index] = val;
> +		}
> +		seq >>= PAGE_SHIFT;
> +		seq ^= lspat;
> +		lspat >>= 1;
> +	}
> +

extra line here

> +}
> +
> +static inc_fn increment_function;
> +
> +static void do_increment(void)
> +{
> +	int i;
> +	int cpu = smp_processor_id();
> +
> +	printf("CPU%d: online and ++ing\n", cpu);
> +
> +	for (i=0; i < increment_count; i++) {
> +		per_cpu_value[cpu]++;
> +		increment_function(cpu);
> +
> +		if (do_shuffle)
> +			shuffle_memory(cpu);
> +	}
> +
> +	printf("CPU%d: Done, %d incs\n", cpu, per_cpu_value[cpu]);
> +
> +	cpumask_set_cpu(cpu, &smp_test_complete);
> +	if (cpu != 0)
> +		halt();
> +}
> +
> +static void setup_and_run_test(test_descr_t *test)
> +{
> +	unsigned int i, sum = 0;
> +	int cpu, cpu_cnt = 0;
> +
> +	increment_function = test->main_fn;
> +
> +	/* fill our random page */
> +        for (i=0; i<PAGE_SIZE; i++) {
> +		memory_array[i] = isaac_next_uint32(&prng_context[0]);
> +	}
> +
> +	for_each_present_cpu(cpu) {
> +		uint32_t seed2 = isaac_next_uint32(&prng_context[0]);
> +		cpu_cnt++;
> +		if (cpu == 0)
> +			continue;
> +
> +		isaac_init(&prng_context[cpu], (unsigned char *) &seed2, sizeof(seed2));
> +		smp_boot_secondary(cpu, do_increment);
> +	}
> +
> +	do_increment();
> +
> +	while (!cpumask_full(&smp_test_complete))
> +		cpu_relax();
> +
> +	/* All CPUs done, now we add up */
> +	for_each_present_cpu(cpu) {
> +		sum += per_cpu_value[cpu];
> +	}
> +
> +	if (test->should_pass) {
> +		report("total incs %d", sum == shared_value, shared_value);
> +	} else {
> +		report_xfail("total incs %d", true, sum == shared_value, shared_value);
> +	}
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	static const unsigned char seed[] = "myseed";
> +	test_descr_t *test = &tests[0];
> +	int i;
> +	unsigned int j;
> +
> +	isaac_init(&prng_context[0], &seed[0], sizeof(seed));
> +
> +	for (i=0; i<argc; i++) {
> +		char *arg = argv[i];
> +
> +		/* Check for test name */
> +		for (j = 0; j < ARRAY_SIZE(tests); j++) {
> +			if (strcmp(arg, tests[j].test_name) == 0)
> +				test = &tests[j];
> +		}
> +
> +		/* Test modifiers */
> +		if (strcmp(arg, "noshuffle") == 0) {
> +			do_shuffle = false;
> +			report_prefix_push("noshuffle");
> +		} else if (strstr(arg, "count=") != NULL) {
> +			char *p = strstr(arg, "=");
> +			increment_count = atol(p+1);
> +		} else {
> +			isaac_reseed(&prng_context[0], (unsigned char *) arg, strlen(arg));
> +		}
> +	}
> +
> +	if (test) {
> +		setup_and_run_test(test);
> +	} else {
> +		report("Unknown test", false);
> +	}
> +
> +	return report_summary();
> +}
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index 7dc7799..abbfe79 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -108,3 +108,37 @@ smp = $(($MAX_SMP>4?4:$MAX_SMP))
>  extra_params = -append "page"
>  groups = tlbflush
>  
> +# Locking tests
> +[locking::none]
> +file = locking-test.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +groups = locking
> +accel = tcg
> +
> +[locking::lock]
> +file = locking-test.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +extra_params = -append 'lock'
> +groups = locking
> +accel = tcg
> +
> +[locking::atomic]
> +file = locking-test.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +extra_params = -append 'atomic'
> +groups = locking
> +accel = tcg
> +
> +[locking::wfelock]
> +file = locking-test.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +extra_params = -append 'wfelock'
> +groups = locking
> +accel = tcg
> +
> +[locking::excl]
> +file = locking-test.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +extra_params = -append 'excl'
> +groups = locking
> +accel = tcg
> -- 
> 2.10.1
>

I didn't look too closely at this one...

Thanks,
drew

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [kvm-unit-tests PATCH v7 09/11] arm/locking-tests: add comprehensive locking test
@ 2016-11-28 10:29     ` Andrew Jones
  0 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28 10:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Nov 24, 2016 at 04:10:31PM +0000, Alex Bennée wrote:
> This test has been written mainly to stress multi-threaded TCG behaviour
> but will demonstrate failure by default on real hardware. The test takes
> the following parameters:
> 
>   - "lock" use GCC's locking semantics
>   - "atomic" use GCC's __atomic primitives
>   - "wfelock" use WaitForEvent sleep
>   - "excl" use load/store exclusive semantics
> 
> Also two more options allow the test to be tweaked
> 
>   - "noshuffle" disables the memory shuffling
>   - "count=%ld" set your own per-CPU increment count
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> v2
>   - Don't use thumb style strexeq stuff
>   - Add atomic and wfelock tests
>   - Add count/noshuffle test controls
>   - Move barrier tests to separate test file
> v4
> +  - fix up unittests.cfg to use correct test name
>   - move into "locking" group, remove barrier tests
>   - use a table to add tests, mark which are expected to work
>   - correctly report XFAIL
> v5
> +  - max out at -smp 4 in unittests.cfg
> v7
>   - make test control flags bools
>   - default the count to 100000 (so it doesn't timeout)
> ---
>  arm/Makefile.common |   2 +
>  arm/locking-test.c  | 302 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  arm/unittests.cfg   |  34 ++++++
>  3 files changed, 338 insertions(+)
>  create mode 100644 arm/locking-test.c
> 
> diff --git a/arm/Makefile.common b/arm/Makefile.common
> index 528166d..eb4cfdf 100644
> --- a/arm/Makefile.common
> +++ b/arm/Makefile.common
> @@ -15,6 +15,7 @@ tests-common += $(TEST_DIR)/pci-test.flat
>  tests-common += $(TEST_DIR)/gic.flat
>  tests-common += $(TEST_DIR)/tlbflush-code.flat
>  tests-common += $(TEST_DIR)/tlbflush-data.flat
> +tests-common += $(TEST_DIR)/locking-test.flat
>  
>  all: test_cases
>  
> @@ -85,3 +86,4 @@ test_cases: $(generated_files) $(tests-common) $(tests)
>  $(TEST_DIR)/selftest.o $(cstart.o): $(asm-offsets)
>  $(TEST_DIR)/tlbflush-code.elf: $(cstart.o) $(TEST_DIR)/tlbflush-code.o
>  $(TEST_DIR)/tlbflush-data.elf: $(cstart.o) $(TEST_DIR)/tlbflush-data.o
> +$(TEST_DIR)/locking-test.elf: $(cstart.o) $(TEST_DIR)/locking-test.o

Instead of adding a new test file, please extend the one we already have,
which iirc was the first MTTCG test, arm/spinlock-test.c. If you don't
like the naming or code in spinlock-test.c, then feel free to change it or
delete it. It's currently not getting run by arm/unittests.cfg, and it's
not getting maintained.

> diff --git a/arm/locking-test.c b/arm/locking-test.c
> new file mode 100644
> index 0000000..f10c61b
> --- /dev/null
> +++ b/arm/locking-test.c
> @@ -0,0 +1,302 @@
> +#include <libcflat.h>
> +#include <asm/smp.h>
> +#include <asm/cpumask.h>
> +#include <asm/barrier.h>
> +#include <asm/mmu.h>
> +
> +#include <prng.h>
> +
> +#define MAX_CPUS 8
> +
> +/* Test definition structure
> + *
> + * A simple structure that describes the test name, expected pass and
> + * increment function.
> + */
> +
> +/* Function pointers for test */
> +typedef void (*inc_fn)(int cpu);
> +
> +typedef struct {
> +	const char *test_name;
> +	bool  should_pass;
> +	inc_fn main_fn;
> +} test_descr_t;
> +
> +/* How many increments to do */
> +static int increment_count = 1000000;
> +static bool do_shuffle = true;
> +
> +/* Shared value all the tests attempt to safely increment using
> + * various forms of atomic locking and exclusive behaviour.
> + */
> +static unsigned int shared_value;
> +
> +/* PAGE_SIZE * uint32_t means we span several pages */
> +__attribute__((aligned(PAGE_SIZE))) static uint32_t memory_array[PAGE_SIZE];
> +
> +/* We use the alignment of the following to ensure accesses to locking
> + * and synchronisation primitives don't interfere with the page of the
> + * shared value
> + */
> +__attribute__((aligned(PAGE_SIZE))) static unsigned int per_cpu_value[MAX_CPUS];
> +__attribute__((aligned(PAGE_SIZE))) static cpumask_t smp_test_complete;
> +__attribute__((aligned(PAGE_SIZE))) struct isaac_ctx prng_context[MAX_CPUS];
> +
> +/* Some of the approaches use a global lock to prevent contention. */
> +static int global_lock;
> +
> +/* In any SMP setting this *should* fail due to cores stepping on
> + * each other updating the shared variable
> + */
> +static void increment_shared(int cpu)
> +{
> +	(void)cpu;
> +
> +	shared_value++;
> +}
> +
> +/* GCC __sync primitives are deprecated in favour of __atomic */
> +static void increment_shared_with_lock(int cpu)
> +{
> +	(void)cpu;
> +
> +	while (__sync_lock_test_and_set(&global_lock, 1));
> +	shared_value++;
> +	__sync_lock_release(&global_lock);
> +}
> +
> +/* In practice even __ATOMIC_RELAXED uses ARM's ldxr/stxr exclusive
> + * semantics */
> +static void increment_shared_with_atomic(int cpu)
> +{
> +	(void)cpu;
> +
> +	__atomic_add_fetch(&shared_value, 1, __ATOMIC_SEQ_CST);
> +}
> +
> +
> +/*
> + * Load/store exclusive with WFE (wait-for-event)
> + *
> + * See ARMv8 ARM examples:
> + *   Use of Wait For Event (WFE) and Send Event (SEV) with locks
> + */
> +
> +static void increment_shared_with_wfelock(int cpu)
> +{
> +	(void)cpu;
> +
> +#if defined(__aarch64__)
> +	asm volatile(
> +	"	mov     w1, #1\n"
> +	"       sevl\n"
> +	"       prfm PSTL1KEEP, [%[lock]]\n"
> +	"1:     wfe\n"
> +	"	ldaxr	w0, [%[lock]]\n"
> +	"	cbnz    w0, 1b\n"
> +	"	stxr    w0, w1, [%[lock]]\n"
> +	"	cbnz	w0, 1b\n"
> +	/* lock held */
> +	"	ldr	w0, [%[sptr]]\n"
> +	"	add	w0, w0, #0x1\n"
> +	"	str	w0, [%[sptr]]\n"
> +	/* now release */
> +	"	stlr	wzr, [%[lock]]\n"
> +	: /* out */
> +	: [lock] "r" (&global_lock), [sptr] "r" (&shared_value) /* in */
> +	: "w0", "w1", "cc");
> +#else
> +	asm volatile(
> +	"	mov     r1, #1\n"
> +	"1:	ldrex	r0, [%[lock]]\n"
> +	"	cmp     r0, #0\n"
> +	"	wfene\n"
> +	"	strexeq r0, r1, [%[lock]]\n"
> +	"	cmpeq	r0, #0\n"
> +	"	bne	1b\n"
> +	"	dmb\n"
> +	/* lock held */
> +	"	ldr	r0, [%[sptr]]\n"
> +	"	add	r0, r0, #0x1\n"
> +	"	str	r0, [%[sptr]]\n"
> +	/* now release */
> +	"	mov	r0, #0\n"
> +	"	dmb\n"
> +	"	str	r0, [%[lock]]\n"
> +	"	dsb\n"
> +	"	sev\n"
> +	: /* out */
> +	: [lock] "r" (&global_lock), [sptr] "r" (&shared_value) /* in */
> +	: "r0", "r1", "cc");
> +#endif
> +}
> +
> +
> +/*
> + * Hand-written version of the load/store exclusive
> + */
> +static void increment_shared_with_excl(int cpu)
> +{
> +	(void)cpu;
> +
> +#if defined(__aarch64__)
> +	asm volatile(
> +	"1:	ldxr	w0, [%[sptr]]\n"
> +	"	add     w0, w0, #0x1\n"
> +	"	stxr	w1, w0, [%[sptr]]\n"
> +	"	cbnz	w1, 1b\n"
> +	: /* out */
> +	: [sptr] "r" (&shared_value) /* in */
> +	: "w0", "w1", "cc");
> +#else
> +	asm volatile(
> +	"1:	ldrex	r0, [%[sptr]]\n"
> +	"	add     r0, r0, #0x1\n"
> +	"	strex	r1, r0, [%[sptr]]\n"
> +	"	cmp	r1, #0\n"
> +	"	bne	1b\n"
> +	: /* out */
> +	: [sptr] "r" (&shared_value) /* in */
> +	: "r0", "r1", "cc");
> +#endif
> +}
> +
> +/* Test array */
> +static test_descr_t tests[] = {
> +	{ "none", false, increment_shared },
> +	{ "lock", true, increment_shared_with_lock },
> +	{ "atomic", true, increment_shared_with_atomic },
> +	{ "wfelock", true, increment_shared_with_wfelock },
> +	{ "excl", true, increment_shared_with_excl }
> +};
> +
> +/* The idea of this is just to generate some random load/store
> + * activity which may or may not race with an un-barriered increment
> + * of the shared counter.
> + */
> +static void shuffle_memory(int cpu)
> +{
> +	int i;
> +	uint32_t lspat = isaac_next_uint32(&prng_context[cpu]);
> +	uint32_t seq = isaac_next_uint32(&prng_context[cpu]);
> +	int count = seq & 0x1f;
> +	uint32_t val=0;
> +
> +	seq >>= 5;
> +
> +	for (i=0; i<count; i++) {
> +		int index = seq & ~PAGE_MASK;
> +		if (lspat & 1) {
> +			val ^= memory_array[index];
> +		} else {
> +			memory_array[index] = val;
> +		}
> +		seq >>= PAGE_SHIFT;
> +		seq ^= lspat;
> +		lspat >>= 1;
> +	}
> +

extra line here

> +}
> +
> +static inc_fn increment_function;
> +
> +static void do_increment(void)
> +{
> +	int i;
> +	int cpu = smp_processor_id();
> +
> +	printf("CPU%d: online and ++ing\n", cpu);
> +
> +	for (i=0; i < increment_count; i++) {
> +		per_cpu_value[cpu]++;
> +		increment_function(cpu);
> +
> +		if (do_shuffle)
> +			shuffle_memory(cpu);
> +	}
> +
> +	printf("CPU%d: Done, %d incs\n", cpu, per_cpu_value[cpu]);
> +
> +	cpumask_set_cpu(cpu, &smp_test_complete);
> +	if (cpu != 0)
> +		halt();
> +}
> +
> +static void setup_and_run_test(test_descr_t *test)
> +{
> +	unsigned int i, sum = 0;
> +	int cpu, cpu_cnt = 0;
> +
> +	increment_function = test->main_fn;
> +
> +	/* fill our random page */
> +	for (i=0; i<PAGE_SIZE; i++) {
> +		memory_array[i] = isaac_next_uint32(&prng_context[0]);
> +	}
> +
> +	for_each_present_cpu(cpu) {
> +		uint32_t seed2 = isaac_next_uint32(&prng_context[0]);
> +		cpu_cnt++;
> +		if (cpu == 0)
> +			continue;
> +
> +		isaac_init(&prng_context[cpu], (unsigned char *) &seed2, sizeof(seed2));
> +		smp_boot_secondary(cpu, do_increment);
> +	}
> +
> +	do_increment();
> +
> +	while (!cpumask_full(&smp_test_complete))
> +		cpu_relax();
> +
> +	/* All CPUs done, now add up the counts */
> +	for_each_present_cpu(cpu) {
> +		sum += per_cpu_value[cpu];
> +	}
> +
> +	if (test->should_pass) {
> +		report("total incs %d", sum == shared_value, shared_value);
> +	} else {
> +		report_xfail("total incs %d", true, sum == shared_value, shared_value);
> +	}
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	static const unsigned char seed[] = "myseed";
> +	test_descr_t *test = &tests[0];
> +	int i;
> +	unsigned int j;
> +
> +	isaac_init(&prng_context[0], &seed[0], sizeof(seed));
> +
> +	for (i=0; i<argc; i++) {
> +		char *arg = argv[i];
> +
> +		/* Check for test name */
> +		for (j = 0; j < ARRAY_SIZE(tests); j++) {
> +			if (strcmp(arg, tests[j].test_name) == 0)
> +				test = &tests[j];
> +		}
> +
> +		/* Test modifiers */
> +		if (strcmp(arg, "noshuffle") == 0) {
> +			do_shuffle = false;
> +			report_prefix_push("noshuffle");
> +		} else if (strstr(arg, "count=") != NULL) {
> +			char *p = strstr(arg, "=");
> +			increment_count = atol(p+1);
> +		} else {
> +			isaac_reseed(&prng_context[0], (unsigned char *) arg, strlen(arg));
> +		}
> +	}
> +
> +	if (test) {
> +		setup_and_run_test(test);
> +	} else {
> +		report("Unknown test", false);
> +	}
> +
> +	return report_summary();
> +}
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index 7dc7799..abbfe79 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -108,3 +108,37 @@ smp = $(($MAX_SMP>4?4:$MAX_SMP))
>  extra_params = -append "page"
>  groups = tlbflush
>  
> +# Locking tests
> +[locking::none]
> +file = locking-test.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +groups = locking
> +accel = tcg
> +
> +[locking::lock]
> +file = locking-test.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +extra_params = -append 'lock'
> +groups = locking
> +accel = tcg
> +
> +[locking::atomic]
> +file = locking-test.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +extra_params = -append 'atomic'
> +groups = locking
> +accel = tcg
> +
> +[locking::wfelock]
> +file = locking-test.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +extra_params = -append 'wfelock'
> +groups = locking
> +accel = tcg
> +
> +[locking::excl]
> +file = locking-test.flat
> +smp = $(($MAX_SMP>4?4:$MAX_SMP))
> +extra_params = -append 'excl'
> +groups = locking
> +accel = tcg
> -- 
> 2.10.1
>

I didn't look too closely at this one...

Thanks,
drew

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
  2016-11-24 16:10 ` [Qemu-devel] " Alex Bennée
  (?)
@ 2016-11-28 10:37   ` Andrew Jones
  -1 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28 10:37 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, nikunj, kvm, mark.burton, marc.zyngier, jan.kiszka,
	qemu-devel, cota, linux-arm-kernel, pbonzini, serge.fdrv,
	bobby.prani, kvmarm, rth

On Thu, Nov 24, 2016 at 04:10:22PM +0000, Alex Bennée wrote:
> Hi,
> 
> Looking at my records it seems as though it has been a while since I
> last posted these tests. As I'm hoping to get the final bits of MTTCG
> merged upstream on the next QEMU development cycle I've been re-basing
> these and getting them cleaned up for merging.
> 
> Some of the patches might be worth taking now if the maintainers are
> happy to do so (run_test tweaks, libcflat updates?). The others could
> do with more serious review. I've CC'd some of the ARM guys to look
> over the tlbflush/barrier tests so they can cast their expert eyes
> over them ;-)
> 
> There are two additions to the series.
> 
> The tcg-test is a general torture test aimed at QEMU's TCG execution
> model. It stresses the cpu execution loop through the use of
> cross-page and computed jumps. It can also add IRQs and self-modifying
> code to the mix.
> 
> The tlbflush-data test is a new one; the old tlbflush test is renamed
> tlbflush-code to better indicate the code path it exercises. The code
> test exercises the translation invalidation pathways in QEMU; the data
> test exercises the SoftMMU's TLBs and explicitly checks that tlbflush
> completion semantics are correct.
> 
> The tlbflush-data test passes most of the time on real hardware but
> definitely showed the problem with deferred TLB flushes running under
> MTTCG QEMU. I've looked at some of the failure cases on real hardware
> and it did look like a timestamp appeared on a page that shouldn't
> have been accessible at the time - I don't know if this is a real
> silicon bug or my misreading of the semantics so I'd appreciate
> a comment from the experts.
> 
> The code needs to be applied on top of Drew's latest ARM GIC patches
> or you can grab my tree from:
> 
>   https://github.com/stsquad/kvm-unit-tests/tree/mttcg/current-tests-v7

Thanks Alex,

I've skimmed over everything looking at it from a framework/style
perspective. I didn't dig in trying to understand the tests though.
One general comment, I see many tests introduce MAX_CPUS 8. Why do
that? Why not allow all cpus by using NR_CPUS for the array sizes?
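
For illustration, a minimal sketch of that change (assuming NR_CPUS is
available from the framework headers, as the cpumask code implies; the
attribute lines mirror the ones in the patch):

  /* per-CPU state sized by the framework limit, not a private MAX_CPUS */
  __attribute__((aligned(PAGE_SIZE))) static unsigned int per_cpu_value[NR_CPUS];
  __attribute__((aligned(PAGE_SIZE))) static struct isaac_ctx prng_context[NR_CPUS];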

Thanks,
drew

> 
> Cheers,
> 
> Alex.
> 
> Alex Bennée (11):
>   run_tests: allow forcing of acceleration mode
>   run_tests: allow disabling of timeouts
>   run_tests: allow passing of options to QEMU
>   libcflat: add PRI(dux)32 format types
>   lib: add isaac prng library from CCAN
>   arm/Makefile.common: force -fno-pic
>   arm/tlbflush-code: Add TLB flush during code execution test
>   arm/tlbflush-data: Add TLB flush during data writes test
>   arm/locking-tests: add comprehensive locking test
>   arm/barrier-litmus-tests: add simple mp and sal litmus tests
>   arm/tcg-test: some basic TCG exercising tests
> 
>  Makefile                  |   2 +
>  arm/Makefile.arm          |   2 +
>  arm/Makefile.arm64        |   2 +
>  arm/Makefile.common       |  11 ++
>  arm/barrier-litmus-test.c | 437 ++++++++++++++++++++++++++++++++++++++++++++++
>  arm/locking-test.c        | 302 ++++++++++++++++++++++++++++++++
>  arm/tcg-test-asm.S        | 170 ++++++++++++++++++
>  arm/tcg-test-asm64.S      | 169 ++++++++++++++++++
>  arm/tcg-test.c            | 337 +++++++++++++++++++++++++++++++++++
>  arm/tlbflush-code.c       | 212 ++++++++++++++++++++++
>  arm/tlbflush-data.c       | 401 ++++++++++++++++++++++++++++++++++++++++++
>  arm/unittests.cfg         | 190 ++++++++++++++++++++
>  lib/arm/asm/barrier.h     |  63 ++++++-
>  lib/arm64/asm/barrier.h   |  50 ++++++
>  lib/libcflat.h            |   5 +
>  lib/prng.c                | 162 +++++++++++++++++
>  lib/prng.h                |  82 +++++++++
>  run_tests.sh              |  18 +-
>  scripts/functions.bash    |  13 +-
>  scripts/runtime.bash      |   8 +
>  20 files changed, 2626 insertions(+), 10 deletions(-)
>  create mode 100644 arm/barrier-litmus-test.c
>  create mode 100644 arm/locking-test.c
>  create mode 100644 arm/tcg-test-asm.S
>  create mode 100644 arm/tcg-test-asm64.S
>  create mode 100644 arm/tcg-test.c
>  create mode 100644 arm/tlbflush-code.c
>  create mode 100644 arm/tlbflush-data.c
>  create mode 100644 lib/prng.c
>  create mode 100644 lib/prng.h
> 
> -- 
> 2.10.1
> 
> 

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
@ 2016-11-28 10:37   ` Andrew Jones
  0 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28 10:37 UTC (permalink / raw)
  To: Alex Bennée
  Cc: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier,
	mttcg, peter.maydell, claudio.fontana, nikunj, jan.kiszka,
	mark.burton, a.rigo, qemu-devel, cota, serge.fdrv, pbonzini,
	bobby.prani, rth, fred.konrad

On Thu, Nov 24, 2016 at 04:10:22PM +0000, Alex Bennée wrote:
> Hi,
> 
> Looking at my records it seems as though it has been a while since I
> last posted these tests. As I'm hoping to get the final bits of MTTCG
> merged upstream on the next QEMU development cycle I've been re-basing
> these and getting them cleaned up for merging.
> 
> Some of the patches might be worth taking now if the maintainers are
> happy to do so (run_test tweaks, libcflat updates?). The others could
> do with more serious review. I've CC'd some of the ARM guys to look
> over the tlbflush/barrier tests so they can cast their expert eyes
> over them ;-)
> 
> There are two additions to the series.
> 
> The tcg-test is a general torture test aimed at QEMU's TCG execution
> model. It stresses the cpu execution loop through the use of
> cross-page and computed jumps. It can also add IRQs and self-modifying
> code to the mix.
> 
> The tlbflush-data test is a new one; the old tlbflush test is renamed
> tlbflush-code to better indicate the code path it exercises. The code
> test exercises the translation invalidation pathways in QEMU; the data
> test exercises the SoftMMU's TLBs and explicitly checks that tlbflush
> completion semantics are correct.
> 
> The tlbflush-data test passes most of the time on real hardware but
> definitely showed the problem with deferred TLB flushes running under
> MTTCG QEMU. I've looked at some of the failure cases on real hardware
> and it did look like a timestamp appeared on a page that shouldn't
> have been accessible at the time - I don't know if this is a real
> silicon bug or my misreading of the semantics so I'd appreciate
> a comment from the experts.
> 
> The code needs to be applied on top of Drew's latest ARM GIC patches
> or you can grab my tree from:
> 
>   https://github.com/stsquad/kvm-unit-tests/tree/mttcg/current-tests-v7

Thanks Alex,

I've skimmed over everything looking at it from a framework/style
perspective. I didn't dig in trying to understand the tests though.
One general comment, I see many tests introduce MAX_CPUS 8. Why do
that? Why not allow all cpus by using NR_CPUS for the array sizes?

Thanks,
drew

> 
> Cheers,
> 
> Alex.
> 
> Alex Bennée (11):
>   run_tests: allow forcing of acceleration mode
>   run_tests: allow disabling of timeouts
>   run_tests: allow passing of options to QEMU
>   libcflat: add PRI(dux)32 format types
>   lib: add isaac prng library from CCAN
>   arm/Makefile.common: force -fno-pic
>   arm/tlbflush-code: Add TLB flush during code execution test
>   arm/tlbflush-data: Add TLB flush during data writes test
>   arm/locking-tests: add comprehensive locking test
>   arm/barrier-litmus-tests: add simple mp and sal litmus tests
>   arm/tcg-test: some basic TCG exercising tests
> 
>  Makefile                  |   2 +
>  arm/Makefile.arm          |   2 +
>  arm/Makefile.arm64        |   2 +
>  arm/Makefile.common       |  11 ++
>  arm/barrier-litmus-test.c | 437 ++++++++++++++++++++++++++++++++++++++++++++++
>  arm/locking-test.c        | 302 ++++++++++++++++++++++++++++++++
>  arm/tcg-test-asm.S        | 170 ++++++++++++++++++
>  arm/tcg-test-asm64.S      | 169 ++++++++++++++++++
>  arm/tcg-test.c            | 337 +++++++++++++++++++++++++++++++++++
>  arm/tlbflush-code.c       | 212 ++++++++++++++++++++++
>  arm/tlbflush-data.c       | 401 ++++++++++++++++++++++++++++++++++++++++++
>  arm/unittests.cfg         | 190 ++++++++++++++++++++
>  lib/arm/asm/barrier.h     |  63 ++++++-
>  lib/arm64/asm/barrier.h   |  50 ++++++
>  lib/libcflat.h            |   5 +
>  lib/prng.c                | 162 +++++++++++++++++
>  lib/prng.h                |  82 +++++++++
>  run_tests.sh              |  18 +-
>  scripts/functions.bash    |  13 +-
>  scripts/runtime.bash      |   8 +
>  20 files changed, 2626 insertions(+), 10 deletions(-)
>  create mode 100644 arm/barrier-litmus-test.c
>  create mode 100644 arm/locking-test.c
>  create mode 100644 arm/tcg-test-asm.S
>  create mode 100644 arm/tcg-test-asm64.S
>  create mode 100644 arm/tcg-test.c
>  create mode 100644 arm/tlbflush-code.c
>  create mode 100644 arm/tlbflush-data.c
>  create mode 100644 lib/prng.c
>  create mode 100644 lib/prng.h
> 
> -- 
> 2.10.1
> 
> 

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
@ 2016-11-28 10:37   ` Andrew Jones
  0 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28 10:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Nov 24, 2016 at 04:10:22PM +0000, Alex Bennée wrote:
> Hi,
> 
> Looking at my records it seems as though it has been a while since I
> last posted these tests. As I'm hoping to get the final bits of MTTCG
> merged upstream on the next QEMU development cycle I've been re-basing
> these and getting them cleaned up for merging.
> 
> Some of the patches might be worth taking now if the maintainers are
> happy to do so (run_test tweaks, libcflat updates?). The others could
> do with more serious review. I've CC'd some of the ARM guys to look
> over the tlbflush/barrier tests so they can cast their expert eyes
> over them ;-)
> 
> There are two additions to the series.
> 
> The tcg-test is a general torture test aimed at QEMU's TCG execution
> model. It stresses the cpu execution loop through the use of
> cross-page and computed jumps. It can also add IRQs and self-modifying
> code to the mix.
> 
> The tlbflush-data test is a new one; the old tlbflush test is renamed
> tlbflush-code to better indicate the code path it exercises. The code
> test exercises the translation invalidation pathways in QEMU; the data
> test exercises the SoftMMU's TLBs and explicitly checks that tlbflush
> completion semantics are correct.
> 
> The tlbflush-data test passes most of the time on real hardware but
> definitely showed the problem with deferred TLB flushes running under
> MTTCG QEMU. I've looked at some of the failure cases on real hardware
> and it did look like a timestamp appeared on a page that shouldn't
> have been accessible at the time - I don't know if this is a real
> silicon bug or my misreading of the semantics so I'd appreciate
> a comment from the experts.
> 
> The code needs to be applied on top of Drew's latest ARM GIC patches
> or you can grab my tree from:
> 
>   https://github.com/stsquad/kvm-unit-tests/tree/mttcg/current-tests-v7

Thanks Alex,

I've skimmed over everything looking at it from a framework/style
perspective. I didn't dig in trying to understand the tests though.
One general comment, I see many tests introduce MAX_CPUS 8. Why do
that? Why not allow all cpus by using NR_CPUS for the array sizes?

Thanks,
drew

> 
> Cheers,
> 
> Alex.
> 
> Alex Bennée (11):
>   run_tests: allow forcing of acceleration mode
>   run_tests: allow disabling of timeouts
>   run_tests: allow passing of options to QEMU
>   libcflat: add PRI(dux)32 format types
>   lib: add isaac prng library from CCAN
>   arm/Makefile.common: force -fno-pic
>   arm/tlbflush-code: Add TLB flush during code execution test
>   arm/tlbflush-data: Add TLB flush during data writes test
>   arm/locking-tests: add comprehensive locking test
>   arm/barrier-litmus-tests: add simple mp and sal litmus tests
>   arm/tcg-test: some basic TCG exercising tests
> 
>  Makefile                  |   2 +
>  arm/Makefile.arm          |   2 +
>  arm/Makefile.arm64        |   2 +
>  arm/Makefile.common       |  11 ++
>  arm/barrier-litmus-test.c | 437 ++++++++++++++++++++++++++++++++++++++++++++++
>  arm/locking-test.c        | 302 ++++++++++++++++++++++++++++++++
>  arm/tcg-test-asm.S        | 170 ++++++++++++++++++
>  arm/tcg-test-asm64.S      | 169 ++++++++++++++++++
>  arm/tcg-test.c            | 337 +++++++++++++++++++++++++++++++++++
>  arm/tlbflush-code.c       | 212 ++++++++++++++++++++++
>  arm/tlbflush-data.c       | 401 ++++++++++++++++++++++++++++++++++++++++++
>  arm/unittests.cfg         | 190 ++++++++++++++++++++
>  lib/arm/asm/barrier.h     |  63 ++++++-
>  lib/arm64/asm/barrier.h   |  50 ++++++
>  lib/libcflat.h            |   5 +
>  lib/prng.c                | 162 +++++++++++++++++
>  lib/prng.h                |  82 +++++++++
>  run_tests.sh              |  18 +-
>  scripts/functions.bash    |  13 +-
>  scripts/runtime.bash      |   8 +
>  20 files changed, 2626 insertions(+), 10 deletions(-)
>  create mode 100644 arm/barrier-litmus-test.c
>  create mode 100644 arm/locking-test.c
>  create mode 100644 arm/tcg-test-asm.S
>  create mode 100644 arm/tcg-test-asm64.S
>  create mode 100644 arm/tcg-test.c
>  create mode 100644 arm/tlbflush-code.c
>  create mode 100644 arm/tlbflush-data.c
>  create mode 100644 lib/prng.c
>  create mode 100644 lib/prng.h
> 
> -- 
> 2.10.1
> 
> 

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
  2016-11-24 16:10 ` [Qemu-devel] " Alex Bennée
  (?)
@ 2016-11-28 10:51   ` Andrew Jones
  -1 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28 10:51 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, nikunj, kvm, mark.burton, marc.zyngier, jan.kiszka,
	qemu-devel, cota, linux-arm-kernel, pbonzini, serge.fdrv,
	bobby.prani, kvmarm, rth

On Thu, Nov 24, 2016 at 04:10:22PM +0000, Alex Bennée wrote:
> Hi,
> 
> Looking at my records it seems as though it has been a while since I
> last posted these tests. As I'm hoping to get the final bits of MTTCG
> merged upstream on the next QEMU development cycle I've been re-basing
> these and getting them cleaned up for merging.
> 
> Some of the patches might be worth taking now if the maintainers are
> happy to do so (run_test tweaks, libcflat updates?). The others could
> do with more serious review. I've CC'd some of the ARM guys to look
> over the tlbflush/barrier tests so they can cast their expert eyes
> over them ;-)
> 
> There are two additions to the series.
> 
> The tcg-test is a general torture test aimed at QEMU's TCG execution
> model. It stresses the cpu execution loop through the use of
> cross-page and computed jumps. It can also add IRQs and self-modifying
> code to the mix.
> 
> The tlbflush-data test is a new one; the old tlbflush test is renamed
> tlbflush-code to better indicate the code path it exercises. The code
> test exercises the translation invalidation pathways in QEMU; the data
> test exercises the SoftMMU's TLBs and explicitly checks that tlbflush
> completion semantics are correct.
> 
> The tlbflush-data test passes most of the time on real hardware but
> definitely showed the problem with deferred TLB flushes running under
> MTTCG QEMU. I've looked at some of the failure cases on real hardware
> and it did look like a timestamp appeared on a page that shouldn't
> have been accessible at the time - I don't know if this is a real
> silicon bug or my misreading of the semantics so I'd appreciate
> a comment from the experts.

One other thought. I'm not sure how best to approach a bunch of TCG-only
tests getting integrated. I'm thinking it might be nice to give them
their own subdir under the arch dir, e.g. arm/tcg. That subdir would
have its own unittests.cfg file too. Otherwise when we run on KVM we'll
have a load of "SKIP: requires TCG" type messages...

We'll want to add a run_tests.sh option to pass the name of the subdir,
'-d tcg'. When the subdir name is 'tcg' ACCEL could automatically be
switched to 'tcg' as well.
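
A rough sketch of how that could hang together (hypothetical flag and
variable names, not a final interface):

  # run_tests.sh: new getopts case alongside the existing ones
  while getopts "g:a:d:thv" opt; do
      case $opt in
          d)
              subdir=$OPTARG              # e.g. -d tcg
              ;;
          # ... existing g/a/t/h/v cases unchanged ...
      esac
  done

  # use the subdir's own unittests.cfg; a 'tcg' subdir implies ACCEL=tcg
  config=$TEST_DIR/${subdir:+$subdir/}unittests.cfg
  if [ "$subdir" = "tcg" ]; then
      force_accel=tcg
  fi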

Thanks,
drew

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
@ 2016-11-28 10:51   ` Andrew Jones
  0 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28 10:51 UTC (permalink / raw)
  To: Alex Bennée
  Cc: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier,
	mttcg, peter.maydell, claudio.fontana, nikunj, jan.kiszka,
	mark.burton, a.rigo, qemu-devel, cota, serge.fdrv, pbonzini,
	bobby.prani, rth, fred.konrad

On Thu, Nov 24, 2016 at 04:10:22PM +0000, Alex Bennée wrote:
> Hi,
> 
> Looking at my records it seems as though it has been a while since I
> last posted these tests. As I'm hoping to get the final bits of MTTCG
> merged upstream on the next QEMU development cycle I've been re-basing
> these and getting them cleaned up for merging.
> 
> Some of the patches might be worth taking now if the maintainers are
> happy to do so (run_test tweaks, libcflat updates?). The others could
> do with more serious review. I've CC'd some of the ARM guys to look
> over the tlbflush/barrier tests so they can cast their expert eyes
> over them ;-)
> 
> There are two additions to the series.
> 
> The tcg-test is a general torture test aimed at QEMU's TCG execution
> model. It stresses the cpu execution loop through the use of
> cross-page and computed jumps. It can also add IRQs and self-modifying
> code to the mix.
> 
> The tlbflush-data test is a new one; the old tlbflush test is renamed
> tlbflush-code to better indicate the code path it exercises. The code
> test exercises the translation invalidation pathways in QEMU; the data
> test exercises the SoftMMU's TLBs and explicitly checks that tlbflush
> completion semantics are correct.
> 
> The tlbflush-data test passes most of the time on real hardware but
> definitely showed the problem with deferred TLB flushes running under
> MTTCG QEMU. I've looked at some of the failure cases on real hardware
> and it did look like a timestamp appeared on a page that shouldn't
> have been accessible at the time - I don't know if this is a real
> silicon bug or my misreading of the semantics so I'd appreciate
> a comment from the experts.

One other thought. I'm not sure how best to approach a bunch of TCG-only
tests getting integrated. I'm thinking it might be nice to give them
their own subdir under the arch dir, e.g. arm/tcg. That subdir would
have its own unittests.cfg file too. Otherwise when we run on KVM we'll
have a load of "SKIP: requires TCG" type messages...

We'll want to add a run_tests.sh option to pass the name of the subdir,
'-d tcg'. When the subdir name is 'tcg' ACCEL could automatically be
switched to 'tcg' as well.

Thanks,
drew

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
@ 2016-11-28 10:51   ` Andrew Jones
  0 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28 10:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Nov 24, 2016 at 04:10:22PM +0000, Alex Bennée wrote:
> Hi,
> 
> Looking at my records it seems as though it has been a while since I
> last posted these tests. As I'm hoping to get the final bits of MTTCG
> merged upstream on the next QEMU development cycle I've been re-basing
> these and getting them cleaned up for merging.
> 
> Some of the patches might be worth taking now if the maintainers are
> happy to do so (run_test tweaks, libcflat updates?). The others could
> do with more serious review. I've CC'd some of the ARM guys to look
> over the tlbflush/barrier tests so they can cast their expert eyes
> over them ;-)
> 
> There are two additions to the series.
> 
> The tcg-test is a general torture test aimed at QEMU's TCG execution
> model. It stresses the cpu execution loop through the use of
> cross-page and computed jumps. It can also add IRQs and self-modifying
> code to the mix.
> 
> The tlbflush-data test is a new one; the old tlbflush test is renamed
> tlbflush-code to better indicate the code path it exercises. The code
> test exercises the translation invalidation pathways in QEMU; the data
> test exercises the SoftMMU's TLBs and explicitly checks that tlbflush
> completion semantics are correct.
> 
> The tlbflush-data test passes most of the time on real hardware but
> definitely showed the problem with deferred TLB flushes running under
> MTTCG QEMU. I've looked at some of the failure cases on real hardware
> and it did look like a timestamp appeared on a page that shouldn't
> have been accessible at the time - I don't know if this is a real
> silicon bug or my misreading of the semantics so I'd appreciate
> a comment from the experts.

One other thought. I'm not sure how best to approach a bunch of TCG-only
tests getting integrated. I'm thinking it might be nice to give them
their own subdir under the arch dir, e.g. arm/tcg. That subdir would
have its own unittests.cfg file too. Otherwise when we run on KVM we'll
have a load of "SKIP: requires TCG" type messages...

We'll want to add a run_tests.sh option to pass the name of the subdir,
'-d tcg'. When the subdir name is 'tcg' ACCEL could automatically be
switched to 'tcg' as well.

Thanks,
drew

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
  2016-11-28 10:37   ` Andrew Jones
@ 2016-11-28 11:12     ` Alex Bennée
  -1 siblings, 0 replies; 93+ messages in thread
From: Alex Bennée @ 2016-11-28 11:12 UTC (permalink / raw)
  To: Andrew Jones
  Cc: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier,
	mttcg, peter.maydell, claudio.fontana, nikunj, jan.kiszka,
	mark.burton, a.rigo, qemu-devel, cota, serge.fdrv, pbonzini,
	bobby.prani, rth, fred.konrad


Andrew Jones <drjones@redhat.com> writes:

> On Thu, Nov 24, 2016 at 04:10:22PM +0000, Alex Bennée wrote:
>> Hi,
>>
>> Looking at my records it seems as though it has been a while since I
>> last posted these tests. As I'm hoping to get the final bits of MTTCG
>> merged upstream on the next QEMU development cycle I've been re-basing
>> these and getting them cleaned up for merging.
>>
>> Some of the patches might be worth taking now if the maintainers are
>> happy to do so (run_test tweaks, libcflat updates?). The others could
>> do with more serious review. I've CC'd some of the ARM guys to look
>> over the tlbflush/barrier tests so they can cast their expert eyes
>> over them ;-)
>>
>> There are two additions to the series.
>>
>> The tcg-test is a general torture test aimed at QEMU's TCG execution
>> model. It stresses the cpu execution loop through the use of
>> cross-page and computed jumps. It can also add IRQs and self-modifying
>> code to the mix.
>>
>> The tlbflush-data test is a new one; the old tlbflush test is renamed
>> tlbflush-code to better indicate the code path it exercises. The code
>> test exercises the translation invalidation pathways in QEMU; the data
>> test exercises the SoftMMU's TLBs and explicitly checks that tlbflush
>> completion semantics are correct.
>>
>> The tlbflush-data test passes most of the time on real hardware but
>> definitely showed the problem with deferred TLB flushes running under
>> MTTCG QEMU. I've looked at some of the failure cases on real hardware
>> and it did look like a timestamp appeared on a page that shouldn't
>> have been accessible at the time - I don't know if this is a real
>> silicon bug or my misreading of the semantics so I'd appreciate
>> a comment from the experts.
>>
>> The code needs to be applied on top of Drew's latest ARM GIC patches
>> or you can grab my tree from:
>>
>>   https://github.com/stsquad/kvm-unit-tests/tree/mttcg/current-tests-v7
>
> Thanks Alex,
>
> I've skimmed over everything looking at it from a framework/style
> perspective. I didn't dig in trying to understand the tests though.
> One general comment, I see many tests introduce MAX_CPUS 8. Why do
> that? Why not allow all cpus by using NR_CPUS for the array sizes?

Yeah - I can fix those. I wonder what the maximum is with GIC V3?
>
> Thanks,
> drew
>
>>
>> Cheers,
>>
>> Alex.
>>
>> Alex Bennée (11):
>>   run_tests: allow forcing of acceleration mode
>>   run_tests: allow disabling of timeouts
>>   run_tests: allow passing of options to QEMU
>>   libcflat: add PRI(dux)32 format types
>>   lib: add isaac prng library from CCAN
>>   arm/Makefile.common: force -fno-pic
>>   arm/tlbflush-code: Add TLB flush during code execution test
>>   arm/tlbflush-data: Add TLB flush during data writes test
>>   arm/locking-tests: add comprehensive locking test
>>   arm/barrier-litmus-tests: add simple mp and sal litmus tests
>>   arm/tcg-test: some basic TCG exercising tests
>>
>>  Makefile                  |   2 +
>>  arm/Makefile.arm          |   2 +
>>  arm/Makefile.arm64        |   2 +
>>  arm/Makefile.common       |  11 ++
>>  arm/barrier-litmus-test.c | 437 ++++++++++++++++++++++++++++++++++++++++++++++
>>  arm/locking-test.c        | 302 ++++++++++++++++++++++++++++++++
>>  arm/tcg-test-asm.S        | 170 ++++++++++++++++++
>>  arm/tcg-test-asm64.S      | 169 ++++++++++++++++++
>>  arm/tcg-test.c            | 337 +++++++++++++++++++++++++++++++++++
>>  arm/tlbflush-code.c       | 212 ++++++++++++++++++++++
>>  arm/tlbflush-data.c       | 401 ++++++++++++++++++++++++++++++++++++++++++
>>  arm/unittests.cfg         | 190 ++++++++++++++++++++
>>  lib/arm/asm/barrier.h     |  63 ++++++-
>>  lib/arm64/asm/barrier.h   |  50 ++++++
>>  lib/libcflat.h            |   5 +
>>  lib/prng.c                | 162 +++++++++++++++++
>>  lib/prng.h                |  82 +++++++++
>>  run_tests.sh              |  18 +-
>>  scripts/functions.bash    |  13 +-
>>  scripts/runtime.bash      |   8 +
>>  20 files changed, 2626 insertions(+), 10 deletions(-)
>>  create mode 100644 arm/barrier-litmus-test.c
>>  create mode 100644 arm/locking-test.c
>>  create mode 100644 arm/tcg-test-asm.S
>>  create mode 100644 arm/tcg-test-asm64.S
>>  create mode 100644 arm/tcg-test.c
>>  create mode 100644 arm/tlbflush-code.c
>>  create mode 100644 arm/tlbflush-data.c
>>  create mode 100644 lib/prng.c
>>  create mode 100644 lib/prng.h
>>
>> --
>> 2.10.1
>>
>>


--
Alex Bennée

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
@ 2016-11-28 11:12     ` Alex Bennée
  0 siblings, 0 replies; 93+ messages in thread
From: Alex Bennée @ 2016-11-28 11:12 UTC (permalink / raw)
  To: linux-arm-kernel


Andrew Jones <drjones@redhat.com> writes:

> On Thu, Nov 24, 2016 at 04:10:22PM +0000, Alex Bennée wrote:
>> Hi,
>>
>> Looking at my records it seems as though it has been a while since I
>> last posted these tests. As I'm hoping to get the final bits of MTTCG
>> merged upstream on the next QEMU development cycle I've been re-basing
>> these and getting them cleaned up for merging.
>>
>> Some of the patches might be worth taking now if the maintainers are
>> happy to do so (run_test tweaks, libcflat updates?). The others could
>> do with more serious review. I've CC'd some of the ARM guys to look
>> over the tlbflush/barrier tests so they can cast their expert eyes
>> over them ;-)
>>
>> There are two additions to the series.
>>
>> The tcg-test is a general torture test aimed at QEMU's TCG execution
>> model. It stresses the cpu execution loop through the use of
>> cross-page and computed jumps. It can also add IRQs and self-modifying
>> code to the mix.
>>
>> The tlbflush-data test is a new one; the old tlbflush test is renamed
>> tlbflush-code to better indicate the code path it exercises. The code
>> test exercises the translation invalidation pathways in QEMU; the data
>> test exercises the SoftMMU's TLBs and explicitly checks that tlbflush
>> completion semantics are correct.
>>
>> The tlbflush-data test passes most of the time on real hardware but
>> definitely showed the problem with deferred TLB flushes running under
>> MTTCG QEMU. I've looked at some of the failure cases on real hardware
>> and it did look like a timestamp appeared on a page that shouldn't
>> have been accessible at the time - I don't know if this is a real
>> silicon bug or my misreading of the semantics so I'd appreciate
>> a comment from the experts.
>>
>> The code needs to be applied on top of Drew's latest ARM GIC patches
>> or you can grab my tree from:
>>
>>   https://github.com/stsquad/kvm-unit-tests/tree/mttcg/current-tests-v7
>
> Thanks Alex,
>
> I've skimmed over everything looking at it from a framework/style
> perspective. I didn't dig in trying to understand the tests though.
> One general comment, I see many tests introduce MAX_CPUS 8. Why do
> that? Why not allow all cpus by using NR_CPUS for the array sizes?

Yeah - I can fix those. I wonder what the maximum is with GIC V3?
>
> Thanks,
> drew
>
>>
>> Cheers,
>>
>> Alex.
>>
>> Alex Bennée (11):
>>   run_tests: allow forcing of acceleration mode
>>   run_tests: allow disabling of timeouts
>>   run_tests: allow passing of options to QEMU
>>   libcflat: add PRI(dux)32 format types
>>   lib: add isaac prng library from CCAN
>>   arm/Makefile.common: force -fno-pic
>>   arm/tlbflush-code: Add TLB flush during code execution test
>>   arm/tlbflush-data: Add TLB flush during data writes test
>>   arm/locking-tests: add comprehensive locking test
>>   arm/barrier-litmus-tests: add simple mp and sal litmus tests
>>   arm/tcg-test: some basic TCG exercising tests
>>
>>  Makefile                  |   2 +
>>  arm/Makefile.arm          |   2 +
>>  arm/Makefile.arm64        |   2 +
>>  arm/Makefile.common       |  11 ++
>>  arm/barrier-litmus-test.c | 437 ++++++++++++++++++++++++++++++++++++++++++++++
>>  arm/locking-test.c        | 302 ++++++++++++++++++++++++++++++++
>>  arm/tcg-test-asm.S        | 170 ++++++++++++++++++
>>  arm/tcg-test-asm64.S      | 169 ++++++++++++++++++
>>  arm/tcg-test.c            | 337 +++++++++++++++++++++++++++++++++++
>>  arm/tlbflush-code.c       | 212 ++++++++++++++++++++++
>>  arm/tlbflush-data.c       | 401 ++++++++++++++++++++++++++++++++++++++++++
>>  arm/unittests.cfg         | 190 ++++++++++++++++++++
>>  lib/arm/asm/barrier.h     |  63 ++++++-
>>  lib/arm64/asm/barrier.h   |  50 ++++++
>>  lib/libcflat.h            |   5 +
>>  lib/prng.c                | 162 +++++++++++++++++
>>  lib/prng.h                |  82 +++++++++
>>  run_tests.sh              |  18 +-
>>  scripts/functions.bash    |  13 +-
>>  scripts/runtime.bash      |   8 +
>>  20 files changed, 2626 insertions(+), 10 deletions(-)
>>  create mode 100644 arm/barrier-litmus-test.c
>>  create mode 100644 arm/locking-test.c
>>  create mode 100644 arm/tcg-test-asm.S
>>  create mode 100644 arm/tcg-test-asm64.S
>>  create mode 100644 arm/tcg-test.c
>>  create mode 100644 arm/tlbflush-code.c
>>  create mode 100644 arm/tlbflush-data.c
>>  create mode 100644 lib/prng.c
>>  create mode 100644 lib/prng.h
>>
>> --
>> 2.10.1
>>
>>


--
Alex Bennée

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
  2016-11-28 11:12     ` Alex Bennée
  (?)
@ 2016-11-28 11:14       ` Peter Maydell
  -1 siblings, 0 replies; 93+ messages in thread
From: Peter Maydell @ 2016-11-28 11:14 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Andrew Jones, kvm-devel, arm-mail-list, kvmarm, Christoffer Dall,
	Marc Zyngier, MTTCG Devel, Claudio Fontana, Nikunj A Dadhania,
	Jan Kiszka, Mark Burton, Alvise Rigo, QEMU Developers,
	Emilio G. Cota, Fedorov Sergey, Paolo Bonzini, Pranith Kumar,
	Richard Henderson

On 28 November 2016 at 11:12, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Andrew Jones <drjones@redhat.com> writes:
>> I've skimmed over everything looking at it from a framework/style
>> perspective. I didn't dig in trying to understand the tests though.
>> One general comment, I see many tests introduce MAX_CPUS 8. Why do
>> that? Why not allow all cpus by using NR_CPUS for the array sizes?
>
> Yeah - I can fix those. I wonder what the maximum is with GIC V3?

So large that you don't want to hardcode it as an array size...
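
A sketch of the dynamic alternative, assuming the framework reports the
number of present CPUs (called nr_cpus here) and offers a calloc-style
allocator:

  static unsigned int *per_cpu_value;

  static void alloc_per_cpu_state(int nr_cpus)
  {
          /* sized at runtime instead of a hardcoded array bound */
          per_cpu_value = calloc(nr_cpus, sizeof(*per_cpu_value));
          assert(per_cpu_value);
  }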

thanks
-- PMM

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
@ 2016-11-28 11:14       ` Peter Maydell
  0 siblings, 0 replies; 93+ messages in thread
From: Peter Maydell @ 2016-11-28 11:14 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Andrew Jones, kvm-devel, arm-mail-list, kvmarm, Christoffer Dall,
	Marc Zyngier, MTTCG Devel, Claudio Fontana, Nikunj A Dadhania,
	Jan Kiszka, Mark Burton, Alvise Rigo, QEMU Developers,
	Emilio G. Cota, Fedorov Sergey, Paolo Bonzini, Pranith Kumar,
	Richard Henderson, KONRAD Frédéric

On 28 November 2016 at 11:12, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Andrew Jones <drjones@redhat.com> writes:
>> I've skimmed over everything looking at it from a framework/style
>> perspective. I didn't dig in trying to understand the tests though.
>> One general comment, I see many tests introduce MAX_CPUS 8. Why do
>> that? Why not allow all cpus by using NR_CPUS for the array sizes?
>
> Yeah - I can fix those. I wonder what the maximum is with GIC V3?

So large that you don't want to hardcode it as an array size...

thanks
-- PMM

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
@ 2016-11-28 11:14       ` Peter Maydell
  0 siblings, 0 replies; 93+ messages in thread
From: Peter Maydell @ 2016-11-28 11:14 UTC (permalink / raw)
  To: linux-arm-kernel

On 28 November 2016 at 11:12, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Andrew Jones <drjones@redhat.com> writes:
>> I've skimmed over everything looking at it from a framework/style
>> perspective. I didn't dig in trying to understand the tests though.
>> One general comment, I see many tests introduce MAX_CPUS 8. Why do
>> that? Why not allow all cpus by using NR_CPUS for the array sizes?
>
> Yeah - I can fix those. I wonder what the maximum is with GIC V3?

So large that you don't want to hardcode it as an array size...

thanks
-- PMM

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 03/11] run_tests: allow passing of options to QEMU
  2016-11-28  9:10     ` Andrew Jones
  (?)
@ 2016-11-28 11:22       ` Alex Bennée
  -1 siblings, 0 replies; 93+ messages in thread
From: Alex Bennée @ 2016-11-28 11:22 UTC (permalink / raw)
  To: Andrew Jones
  Cc: mttcg, nikunj, kvm, mark.burton, marc.zyngier, jan.kiszka,
	qemu-devel, cota, linux-arm-kernel, pbonzini, serge.fdrv,
	bobby.prani, kvmarm, rth


Andrew Jones <drjones@redhat.com> writes:

> On Thu, Nov 24, 2016 at 04:10:25PM +0000, Alex Bennée wrote:
>> This introduces the option -o for passing options directly to QEMU
>> which is useful. In my case I'm using it to toggle MTTCG on and off:
>>
>>   ./run_tests.sh -t -o "-tcg mttcg=on"
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>  run_tests.sh           | 10 +++++++---
>>  scripts/functions.bash | 13 +++++++------
>>  2 files changed, 14 insertions(+), 9 deletions(-)
>>
>> diff --git a/run_tests.sh b/run_tests.sh
>> index 4f2e5cb..05cc7fb 100755
>> --- a/run_tests.sh
>> +++ b/run_tests.sh
>> @@ -13,10 +13,11 @@ function usage()
>>  {
>>  cat <<EOF
>>
>> -Usage: $0 [-g group] [-a accel] [-t] [-h] [-v]
>> +Usage: $0 [-g group] [-a accel] [-o qemu_opts] [-t] [-h] [-v]
>>
>>      -g: Only execute tests in the given group
>>      -a: Force acceleration mode (tcg/kvm)
>> +    -o: additional options for QEMU command line
>>      -t: disable timeouts
>>      -h: Output this help text
>>      -v: Enables verbose mode
>> @@ -30,7 +31,7 @@ EOF
>>  RUNTIME_arch_run="./$TEST_DIR/run"
>>  source scripts/runtime.bash
>>
>> -while getopts "g:a:thv" opt; do
>> +while getopts "g:a:o:thv" opt; do
>>      case $opt in
>>          g)
>>              only_group=$OPTARG
>> @@ -38,6 +39,9 @@ while getopts "g:a:thv" opt; do
>>          a)
>>              force_accel=$OPTARG
>>              ;;
>> +        o)
>> +            extra_opts=$OPTARG
>> +            ;;
>>          t)
>>              no_timeout="yes"
>>              ;;
>> @@ -67,4 +71,4 @@ RUNTIME_log_stdout () {
>>  config=$TEST_DIR/unittests.cfg
>>  rm -f test.log
>>  printf "BUILD_HEAD=$(cat build-head)\n\n" > test.log
>> -for_each_unittest $config run
>> +for_each_unittest $config run "$extra_opts"
>> diff --git a/scripts/functions.bash b/scripts/functions.bash
>> index ee9143c..d38a69e 100644
>> --- a/scripts/functions.bash
>> +++ b/scripts/functions.bash
>> @@ -2,11 +2,12 @@
>>  function for_each_unittest()
>>  {
>>  	local unittests="$1"
>> -	local cmd="$2"
>> -	local testname
>> +        local cmd="$2"
>> +        local extra_opts=$3
>> +        local testname
>
> We use tabs in this file. Not sure why cmd and testname got
> changed too...
>
>>  	local smp
>>  	local kernel
>> -	local opts
>> +        local opts=$extra_opts
>>  	local groups
>>  	local arch
>>  	local check
>> @@ -21,7 +22,7 @@ function for_each_unittest()
>>  			testname=${BASH_REMATCH[1]}
>>  			smp=1
>>  			kernel=""
>> -			opts=""
>> +                        opts=$extra_opts
>>  			groups=""
>>  			arch=""
>>  			check=""
>> @@ -32,7 +33,7 @@ function for_each_unittest()
>>  		elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then
>>  			smp=${BASH_REMATCH[1]}
>>  		elif [[ $line =~ ^extra_params\ *=\ *(.*)$ ]]; then
>> -			opts=${BASH_REMATCH[1]}
>> +                        opts="$opts ${BASH_REMATCH[1]}"
>>  		elif [[ $line =~ ^groups\ *=\ *(.*)$ ]]; then
>>  			groups=${BASH_REMATCH[1]}
>>  		elif [[ $line =~ ^arch\ *=\ *(.*)$ ]]; then
>> @@ -45,6 +46,6 @@ function for_each_unittest()
>>  			timeout=${BASH_REMATCH[1]}
>>  		fi
>>  	done
>> -	"$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
>> +        "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
>>  	exec {fd}<&-
>>  }
>> --
>> 2.10.1
>>
>>
>
> This is a pretty good idea, but I think I might like the extra options
> to be given like this instead
>
>   ./run_tests.sh [run_tests.sh options] -- [qemu options]
>
> Thanks,
> drew

That sounds like a better way, I'll fix that.
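
That style also falls out of getopts naturally, since option parsing
stops at '--'; a sketch (assuming the existing option letters):

  # after the current getopts loop in run_tests.sh:
  shift $((OPTIND - 1))
  extra_opts="$*"        # everything after '--' is handed to QEMU verbatim

  # usage: ./run_tests.sh -t -- -tcg mttcg=on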

--
Alex Bennée
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 03/11] run_tests: allow passing of options to QEMU
@ 2016-11-28 11:22       ` Alex Bennée
  0 siblings, 0 replies; 93+ messages in thread
From: Alex Bennée @ 2016-11-28 11:22 UTC (permalink / raw)
  To: Andrew Jones
  Cc: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier,
	mttcg, peter.maydell, claudio.fontana, nikunj, jan.kiszka,
	mark.burton, a.rigo, qemu-devel, cota, serge.fdrv, pbonzini,
	bobby.prani, rth, fred.konrad


Andrew Jones <drjones@redhat.com> writes:

> On Thu, Nov 24, 2016 at 04:10:25PM +0000, Alex Bennée wrote:
>> This introduces the option -o for passing options directly to QEMU
>> which is useful. In my case I'm using it to toggle MTTCG on and off:
>>
>>   ./run_tests.sh -t -o "-tcg mttcg=on"
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>  run_tests.sh           | 10 +++++++---
>>  scripts/functions.bash | 13 +++++++------
>>  2 files changed, 14 insertions(+), 9 deletions(-)
>>
>> diff --git a/run_tests.sh b/run_tests.sh
>> index 4f2e5cb..05cc7fb 100755
>> --- a/run_tests.sh
>> +++ b/run_tests.sh
>> @@ -13,10 +13,11 @@ function usage()
>>  {
>>  cat <<EOF
>>
>> -Usage: $0 [-g group] [-a accel] [-t] [-h] [-v]
>> +Usage: $0 [-g group] [-a accel] [-o qemu_opts] [-t] [-h] [-v]
>>
>>      -g: Only execute tests in the given group
>>      -a: Force acceleration mode (tcg/kvm)
>> +    -o: additional options for QEMU command line
>>      -t: disable timeouts
>>      -h: Output this help text
>>      -v: Enables verbose mode
>> @@ -30,7 +31,7 @@ EOF
>>  RUNTIME_arch_run="./$TEST_DIR/run"
>>  source scripts/runtime.bash
>>
>> -while getopts "g:a:thv" opt; do
>> +while getopts "g:a:o:thv" opt; do
>>      case $opt in
>>          g)
>>              only_group=$OPTARG
>> @@ -38,6 +39,9 @@ while getopts "g:a:thv" opt; do
>>          a)
>>              force_accel=$OPTARG
>>              ;;
>> +        o)
>> +            extra_opts=$OPTARG
>> +            ;;
>>          t)
>>              no_timeout="yes"
>>              ;;
>> @@ -67,4 +71,4 @@ RUNTIME_log_stdout () {
>>  config=$TEST_DIR/unittests.cfg
>>  rm -f test.log
>>  printf "BUILD_HEAD=$(cat build-head)\n\n" > test.log
>> -for_each_unittest $config run
>> +for_each_unittest $config run "$extra_opts"
>> diff --git a/scripts/functions.bash b/scripts/functions.bash
>> index ee9143c..d38a69e 100644
>> --- a/scripts/functions.bash
>> +++ b/scripts/functions.bash
>> @@ -2,11 +2,12 @@
>>  function for_each_unittest()
>>  {
>>  	local unittests="$1"
>> -	local cmd="$2"
>> -	local testname
>> +        local cmd="$2"
>> +        local extra_opts=$3
>> +        local testname
>
> We use tabs in this file. Not sure why cmd and testname got
> changed too...
>
>>  	local smp
>>  	local kernel
>> -	local opts
>> +        local opts=$extra_opts
>>  	local groups
>>  	local arch
>>  	local check
>> @@ -21,7 +22,7 @@ function for_each_unittest()
>>  			testname=${BASH_REMATCH[1]}
>>  			smp=1
>>  			kernel=""
>> -			opts=""
>> +                        opts=$extra_opts
>>  			groups=""
>>  			arch=""
>>  			check=""
>> @@ -32,7 +33,7 @@ function for_each_unittest()
>>  		elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then
>>  			smp=${BASH_REMATCH[1]}
>>  		elif [[ $line =~ ^extra_params\ *=\ *(.*)$ ]]; then
>> -			opts=${BASH_REMATCH[1]}
>> +                        opts="$opts ${BASH_REMATCH[1]}"
>>  		elif [[ $line =~ ^groups\ *=\ *(.*)$ ]]; then
>>  			groups=${BASH_REMATCH[1]}
>>  		elif [[ $line =~ ^arch\ *=\ *(.*)$ ]]; then
>> @@ -45,6 +46,6 @@ function for_each_unittest()
>>  			timeout=${BASH_REMATCH[1]}
>>  		fi
>>  	done
>> -	"$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
>> +        "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
>>  	exec {fd}<&-
>>  }
>> --
>> 2.10.1
>>
>>
>
> This is a pretty good idea, but I think I might like the extra options
> to be given like this instead
>
>   ./run_tests.sh [run_tests.sh options] -- [qemu options]
>
> Thanks,
> drew

That sounds like a better way, I'll fix that.

--
Alex Bennée

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
  2016-11-28 11:14       ` Peter Maydell
@ 2016-11-28 11:58         ` Andrew Jones
  0 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28 11:58 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Alex Bennée, MTTCG Devel, Mark Burton, Nikunj A Dadhania,
	kvm-devel, Marc Zyngier, Jan Kiszka, Claudio Fontana,
	Alvise Rigo, QEMU Developers, Emilio G. Cota, arm-mail-list,
	Paolo Bonzini, Fedorov Sergey, Pranith Kumar,
	KONRAD Frédéric, kvmarm

On Mon, Nov 28, 2016 at 11:14:48AM +0000, Peter Maydell wrote:
> On 28 November 2016 at 11:12, Alex Bennée <alex.bennee@linaro.org> wrote:
> >
> > Andrew Jones <drjones@redhat.com> writes:
> >> I've skimmed over everything looking at it from a framework/style
> >> perspective. I didn't dig in trying to understand the tests though.
> >> One general comment, I see many tests introduce MAX_CPUS 8. Why do
> >> that? Why not allow all cpus by using NR_CPUS for the array sizes?
> >
> > Yeah - I can fix those. I wonder what the maximum is with GIC V3?
> 
> So large that you don't want to hardcode it as an array size...

255 with the gic series, not yet merged. Even if you have a dozen arrays
with that as the size, then unless your array element size is huge (on
the order of multiple pages) it probably doesn't matter. Using the
default memory allocation of a QEMU guest, 128 MB, unit tests have plenty
of memory at their disposal. The framework itself only allocates a handful
of pages.

Of course the framework also supports dynamic memory allocation, so you
could do malloc(nr_cpus * element_size) to avoid the excess.
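
A minimal sketch of the two sizing approaches, assuming the lib's
NR_CPUS define, the arm nr_cpus global, and calloc() from alloc.h
(header names and the per-cpu struct are illustrative assumptions,
not code from the series):

  #include <libcflat.h>	/* assumed: uint32_t/uint64_t, assert */
  #include <alloc.h>	/* assumed: calloc for unit tests */

  struct cpu_result {	/* hypothetical per-cpu record */
  	uint64_t timestamp;
  	uint32_t errors;
  };

  /* static sizing: NR_CPUS slots whether they are used or not */
  static struct cpu_result results_static[NR_CPUS];

  /* dynamic sizing: exactly the CPUs found at boot */
  static struct cpu_result *results;

  static void alloc_results(void)
  {
  	results = calloc(nr_cpus, sizeof(*results));
  	assert(results);
  }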

Thanks,
drew

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
  2016-11-28 11:58         ` Andrew Jones
@ 2016-11-28 13:30           ` Peter Maydell
  0 siblings, 0 replies; 93+ messages in thread
From: Peter Maydell @ 2016-11-28 13:30 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Alex Bennée, MTTCG Devel, Mark Burton, Nikunj A Dadhania,
	kvm-devel, Marc Zyngier, Jan Kiszka, Claudio Fontana,
	Alvise Rigo, QEMU Developers, Emilio G. Cota, arm-mail-list,
	Paolo Bonzini, Fedorov Sergey, Pranith Kumar,
	KONRAD Frédéric, kvmarm

On 28 November 2016 at 11:58, Andrew Jones <drjones@redhat.com> wrote:
> On Mon, Nov 28, 2016 at 11:14:48AM +0000, Peter Maydell wrote:
>> On 28 November 2016 at 11:12, Alex Bennée <alex.bennee@linaro.org> wrote:
>> >
>> > Andrew Jones <drjones@redhat.com> writes:
>> >> I've skimmed over everything looking at it from a framework/style
>> >> perspective. I didn't dig in trying to understand the tests though.
>> >> One general comment, I see many tests introduce MAX_CPUS 8. Why do
>> >> that? Why not allow all cpus by using NR_CPUS for the array sizes?
>> >
>> > Yeah - I can fix those. I wonder what the maximum is with GIC V3?
>>
>> So large that you don't want to hardcode it as an array size...
>
> 255 with the gic series, not yet merged.

I was talking about the architectural GICv3 limit, which is larger
than that by many orders of magnitude. For QEMU it looks like
MAX_CPUMASK_BITS is now 288 rather than 255.

thanks
-- PMM

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
  2016-11-28 13:30           ` Peter Maydell
@ 2016-11-28 14:04             ` Andrew Jones
  0 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28 14:04 UTC (permalink / raw)
  To: Peter Maydell
  Cc: MTTCG Devel, Nikunj A Dadhania, kvm-devel, Marc Zyngier,
	Jan Kiszka, Mark Burton, QEMU Developers, Emilio G. Cota,
	Fedorov Sergey, Paolo Bonzini, Pranith Kumar, Richard Henderson,
	kvmarm, arm-mail-list

On Mon, Nov 28, 2016 at 01:30:54PM +0000, Peter Maydell wrote:
> On 28 November 2016 at 11:58, Andrew Jones <drjones@redhat.com> wrote:
> > On Mon, Nov 28, 2016 at 11:14:48AM +0000, Peter Maydell wrote:
> >> On 28 November 2016 at 11:12, Alex Bennée <alex.bennee@linaro.org> wrote:
> >> >
> >> > Andrew Jones <drjones@redhat.com> writes:
> >> >> I've skimmed over everything looking at it from a framework/style
> >> >> perspective. I didn't dig in trying to understand the tests though.
> >> >> One general comment, I see many tests introduce MAX_CPUS 8. Why do
> >> >> that? Why not allow all cpus by using NR_CPUS for the array sizes?
> >> >
> >> > Yeah - I can fix those. I wonder what the maximum is with GIC V3?
> >>
> >> So large that you don't want to hardcode it as an array size...
> >
> > 255 with the gic series, not yet merged.
> 
> I was talking about the architectural GICv3 limit, which is larger
> than that by many orders of magnitude. For QEMU it looks like
> MAX_CPUMASK_BITS is now 288 rather than 255.

Ah, yeah. So far we haven't considered testing limits beyond what
KVM supports, VGIC_V3_MAX_CPUS=255. However, with TCG and some
patience, we could attempt to test bigger limits. In that case,
though, we'll want to recompile kvm-unit-tests with a larger NR_CPUS
and run a specific unit test.

mach-virt still has 255 as well, mc->max_cpus = 255, so we'd have
to bump that too if we want to experiment.
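
Something like the following two spots would need touching; a sketch,
not a tested change, and both file locations are assumptions from my
reading of the trees:

  /* kvm-unit-tests: lib/arm/asm/setup.h (assumed location) */
  #define NR_CPUS	512	/* raised from the default for the experiment */

  /* QEMU: hw/arm/virt.c machine class init (assumed location) */
  mc->max_cpus = 512;	/* was 255 */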

Thanks,
drew

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
  2016-11-28 14:04             ` Andrew Jones
@ 2016-11-28 14:07               ` Andrew Jones
  0 siblings, 0 replies; 93+ messages in thread
From: Andrew Jones @ 2016-11-28 14:07 UTC (permalink / raw)
  To: Peter Maydell
  Cc: MTTCG Devel, Nikunj A Dadhania, kvm-devel, Marc Zyngier,
	Jan Kiszka, Mark Burton, QEMU Developers, Emilio G. Cota,
	Fedorov Sergey, Paolo Bonzini, Pranith Kumar, Richard Henderson,
	kvmarm, arm-mail-list

On Mon, Nov 28, 2016 at 03:04:45PM +0100, Andrew Jones wrote:
> On Mon, Nov 28, 2016 at 01:30:54PM +0000, Peter Maydell wrote:
> > On 28 November 2016 at 11:58, Andrew Jones <drjones@redhat.com> wrote:
> > > On Mon, Nov 28, 2016 at 11:14:48AM +0000, Peter Maydell wrote:
> > >> On 28 November 2016 at 11:12, Alex Bennée <alex.bennee@linaro.org> wrote:
> > >> >
> > >> > Andrew Jones <drjones@redhat.com> writes:
> > >> >> I've skimmed over everything looking at it from a framework/style
> > >> >> perspective. I didn't dig in trying to understand the tests though.
> > >> >> One general comment, I see many tests introduce MAX_CPUS 8. Why do
> > >> >> that? Why not allow all cpus by using NR_CPUS for the array sizes?
> > >> >
> > >> > Yeah - I can fix those. I wonder what the maximum is with GIC V3?
> > >>
> > >> So large that you don't want to hardcode it as an array size...
> > >
> > > 255 with the gic series, not yet merged.
> > 
> > I was talking about the architectural GICv3 limit, which is larger
> > than that by many orders of magnitude. For QEMU it looks like
> > MAX_CPUMASK_BITS is now 288 rather than 255.
> 
> Ah, yeah. So far we haven't considered testing limits beyond what
> KVM supports, VGIC_V3_MAX_CPUS=255. However with TCG, and some
> patience, we could attempt to test bigger limits. In that case,
> though, we'll want to recompile kvm-unit-tests with a larger NR_CPUS
> and run a specific unit test.
> 
> mach-virt still has 255 as well, mc->max_cpus = 255, so we'd have
> to bump that too if we want to experiment.

Er... actually mach-virt is 123, as we only allocate 123 redistributors.
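(For the arithmetic: each GICv3 redistributor occupies a 128 KiB
stride, i.e. two 64 KiB frames, and, assuming I'm reading the
mach-virt memory map right, the redistributor region is 0x00f60000
bytes, so 0xf60000 / 0x20000 = 123 CPUs.)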

drew

* Re: [Qemu-devel] [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases
  2016-11-28 14:07               ` Andrew Jones
@ 2016-11-28 14:09                 ` Peter Maydell
  0 siblings, 0 replies; 93+ messages in thread
From: Peter Maydell @ 2016-11-28 14:09 UTC (permalink / raw)
  To: Andrew Jones
  Cc: MTTCG Devel, Nikunj A Dadhania, kvm-devel, Marc Zyngier,
	Jan Kiszka, Mark Burton, QEMU Developers, Emilio G. Cota,
	Fedorov Sergey, Paolo Bonzini, Pranith Kumar, Richard Henderson,
	kvmarm, arm-mail-list

On 28 November 2016 at 14:07, Andrew Jones <drjones@redhat.com> wrote:
> Er... actually mach-virt is 123, as we only allocate 123 redistributors.

Oh yes, I'd forgotten about that limit. We'd need to add
a KVM API for allocating redistributors in non-contiguous
bits of memory if we wanted to raise that.

thanks
-- PMM

* Re: [kvm-unit-tests PATCH v7 04/11] libcflat: add PRI(dux)32 format types
  2016-11-24 16:10   ` [Qemu-devel] " Alex Bennée
@ 2017-01-10 15:23     ` Alex Bennée
  0 siblings, 0 replies; 93+ messages in thread
From: Alex Bennée @ 2017-01-10 15:23 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana


Alex Bennée <alex.bennee@linaro.org> writes:

> So we can have portable formatting of uint32_t types.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  lib/libcflat.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/lib/libcflat.h b/lib/libcflat.h
> index bdcc561..6dab5be 100644
> --- a/lib/libcflat.h
> +++ b/lib/libcflat.h
> @@ -55,12 +55,17 @@ typedef _Bool		bool;
>  #define true  1
>
>  #if __SIZEOF_LONG__ == 8
> +#  define __PRI32_PREFIX
>  #  define __PRI64_PREFIX	"l"
>  #  define __PRIPTR_PREFIX	"l"
>  #else
> +#  define __PRI32_PREFIX        "l"

OK, this is bogus, but the failure is because of where we get uint32_t
from (hint: using an arm32 compiler on an arm64 system), so I got:

  lib/pci.c:71:9: error: format '%x' expects argument of type 'unsigned int', but argument 3 has type 'uint32_t {aka long unsigned int}' [-Werror=format=]

Which makes me think we should be more careful about including system
headers in kvm-unit-tests (done in 75e777a0).
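
For illustration, this is the kind of call site the PRI(dux)32 macros
are meant to keep portable; a minimal sketch, assuming the patch's
PRIx32 ends up as __PRI32_PREFIX "x" (report_bar is a hypothetical
helper, not code from the tree):

  #include <libcflat.h>

  /* On arm-none-eabi uint32_t is 'long unsigned int', so a plain "%x"
   * trips -Werror=format; with the patch's scheme PRIx32 expands to
   * "lx" there and to "x" where uint32_t is plain 'unsigned int'. */
  static void report_bar(uint32_t bar)
  {
  	printf("BAR = 0x%" PRIx32 "\n", bar);
  }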

--
Alex Bennée

* Re: [kvm-unit-tests PATCH v7 04/11] libcflat: add PRI(dux)32 format types
  2017-01-10 15:23     ` [Qemu-devel] " Alex Bennée
@ 2017-01-10 15:29       ` Alex Bennée
  0 siblings, 0 replies; 93+ messages in thread
From: Alex Bennée @ 2017-01-10 15:29 UTC (permalink / raw)
  To: kvm, linux-arm-kernel, kvmarm, christoffer.dall, marc.zyngier
  Cc: qemu-devel, mttcg, fred.konrad, a.rigo, cota, bobby.prani,
	nikunj, mark.burton, pbonzini, jan.kiszka, serge.fdrv, rth,
	peter.maydell, claudio.fontana


Alex Bennée <alex.bennee@linaro.org> writes:

> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> So we can have portable formatting of uint32_t types.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>  lib/libcflat.h | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/lib/libcflat.h b/lib/libcflat.h
>> index bdcc561..6dab5be 100644
>> --- a/lib/libcflat.h
>> +++ b/lib/libcflat.h
>> @@ -55,12 +55,17 @@ typedef _Bool		bool;
>>  #define true  1
>>
>>  #if __SIZEOF_LONG__ == 8
>> +#  define __PRI32_PREFIX
>>  #  define __PRI64_PREFIX	"l"
>>  #  define __PRIPTR_PREFIX	"l"
>>  #else
>> +#  define __PRI32_PREFIX        "l"
>
> OK, this is bogus, but the failure is because of where we get uint32_t
> from (hint: using an arm32 compiler on an arm64 system), so I got:
>
>   lib/pci.c:71:9: error: format '%x' expects argument of type 'unsigned int', but argument 3 has type 'uint32_t {aka long unsigned int}' [-Werror=format=]
>
> Which makes me think we should be more careful about including system
> headers in kvm-unit-tests (done in 75e777a0).

Hmm, it turns out my compiler is d.r.t (doing the right thing) as far as it is concerned:

  # 34 "/usr/lib/gcc/arm-none-eabi/5.4.1/include/stdint-gcc.h" 3 4
  typedef signed char int8_t;


  typedef short int int16_t;


  typedef long int int32_t;


  typedef long long int int64_t;


  typedef unsigned char uint8_t;


  typedef short unsigned int uint16_t;


  typedef long unsigned int uint32_t;


--
Alex Bennée

end of thread (newest: 2017-01-10 15:29 UTC)

Thread overview: 93+ messages
2016-11-24 16:10 [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases Alex Bennée
2016-11-24 16:10 ` [kvm-unit-tests PATCH v7 01/11] run_tests: allow forcing of acceleration mode Alex Bennée
2016-11-28  8:51   ` Andrew Jones
2016-11-24 16:10 ` [kvm-unit-tests PATCH v7 02/11] run_tests: allow disabling of timeouts Alex Bennée
2016-11-28  9:00   ` Andrew Jones
2016-11-24 16:10 ` [kvm-unit-tests PATCH v7 03/11] run_tests: allow passing of options to QEMU Alex Bennée
2016-11-28  9:10   ` Andrew Jones
2016-11-28 11:22     ` Alex Bennée
2016-11-24 16:10 ` [kvm-unit-tests PATCH v7 04/11] libcflat: add PRI(dux)32 format types Alex Bennée
2016-11-28  9:18   ` Andrew Jones
2017-01-10 15:23   ` Alex Bennée
2017-01-10 15:29     ` Alex Bennée
2016-11-24 16:10 ` [kvm-unit-tests PATCH v7 05/11] lib: add isaac prng library from CCAN Alex Bennée
2016-11-24 16:10 ` [kvm-unit-tests PATCH v7 06/11] arm/Makefile.common: force -fno-pic Alex Bennée
2016-11-28  9:33   ` Andrew Jones
2016-11-24 16:10 ` [kvm-unit-tests PATCH v7 07/11] arm/tlbflush-code: Add TLB flush during code execution test Alex Bennée
2016-11-28  9:42   ` Andrew Jones
2016-11-24 16:10 ` [kvm-unit-tests PATCH v7 08/11] arm/tlbflush-data: Add TLB flush during data writes test Alex Bennée
2016-11-28 10:11   ` Andrew Jones
2016-11-24 16:10 ` [kvm-unit-tests PATCH v7 09/11] arm/locking-tests: add comprehensive locking test Alex Bennée
2016-11-28 10:29   ` Andrew Jones
2016-11-24 16:10 ` [kvm-unit-tests PATCH v7 10/11] arm/barrier-litmus-tests: add simple mp and sal litmus tests Alex Bennée
2016-11-24 16:10 ` [kvm-unit-tests PATCH v7 11/11] arm/tcg-test: some basic TCG exercising tests Alex Bennée
2016-11-28 10:37 ` [kvm-unit-tests PATCH v7 00/11] QEMU MTTCG Test cases Andrew Jones
2016-11-28 11:12   ` Alex Bennée
2016-11-28 11:14     ` Peter Maydell
2016-11-28 11:58       ` Andrew Jones
2016-11-28 13:30         ` Peter Maydell
2016-11-28 14:04           ` Andrew Jones
2016-11-28 14:07             ` Andrew Jones
2016-11-28 14:09               ` Peter Maydell
2016-11-28 10:51 ` Andrew Jones
