* [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs()
@ 2024-03-13 17:27 Andrew Cooper
  2024-03-13 17:27 ` [PATCH 1/7] xen/bitops: Cleanup ahead of rearrangements Andrew Cooper
                   ` (7 more replies)
  0 siblings, 8 replies; 33+ messages in thread
From: Andrew Cooper @ 2024-03-13 17:27 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini

bitops.h is a mess.  It has grown organically over many years, and forces
unreasonable responsibilities out into the per-arch stubs.

Start cleaning it up with ffs() and friends.  Across the board, this adds:

 * Functioning bitops without arch-specific asm
 * An option for arches to provide more optimal code generation
 * Compile-time constant folding
 * Testing at both compile time and during init that the basic operations
   behave according to spec.

and the only reason this series isn't a net reduction in code is the testing
infrastructure in patch 1.

This form is superior in many ways, including getting RISC-V support for free.

Testing:
  https://gitlab.com/xen-project/people/andyhhp/xen/-/pipelines/1212269924
  https://cirrus-ci.com/build/4939856296542208

Andrew Cooper (7):
  xen/bitops: Cleanup ahead of rearrangements
  xen/bitops: Implement ffs() in common logic
  xen/bitops: Implement ffsl() in common logic
  xen/bitops: Delete generic_ffs{,l}()
  xen/bitops: Implement ffs64() in common logic
  xen: Swap find_first_set_bit() for ffsl() - 1
  xen/bitops: Delete find_first_set_bit()

 xen/arch/arm/include/asm/bitops.h            | 16 +---
 xen/arch/ppc/include/asm/bitops.h            | 11 ---
 xen/arch/x86/guest/xen/xen.c                 |  4 +-
 xen/arch/x86/hvm/dom0_build.c                |  2 +-
 xen/arch/x86/hvm/hpet.c                      |  8 +-
 xen/arch/x86/include/asm/bitops.h            | 53 +++++------
 xen/arch/x86/include/asm/pt-contig-markers.h |  2 +-
 xen/arch/x86/mm.c                            |  2 +-
 xen/arch/x86/mm/p2m-pod.c                    |  4 +-
 xen/common/Makefile                          |  1 +
 xen/common/bitops.c                          | 70 ++++++++++++++
 xen/common/page_alloc.c                      |  2 +-
 xen/common/softirq.c                         |  2 +-
 xen/drivers/passthrough/amd/iommu_map.c      |  2 +-
 xen/drivers/passthrough/iommu.c              |  4 +-
 xen/drivers/passthrough/x86/iommu.c          |  4 +-
 xen/include/xen/bitops.h                     | 97 +++++++++-----------
 xen/include/xen/compiler.h                   |  3 +-
 18 files changed, 160 insertions(+), 127 deletions(-)
 create mode 100644 xen/common/bitops.c


base-commit: 03cf7ca23e0e876075954c558485b267b7d02406
-- 
2.30.2



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/7] xen/bitops: Cleanup ahead of rearrangements
  2024-03-13 17:27 [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() Andrew Cooper
@ 2024-03-13 17:27 ` Andrew Cooper
  2024-03-13 18:39   ` Shawn Anastasio
                     ` (2 more replies)
  2024-03-13 17:27 ` [PATCH 2/7] xen/bitops: Implement ffs() in common logic Andrew Cooper
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 33+ messages in thread
From: Andrew Cooper @ 2024-03-13 17:27 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini

 * Rename __attribute_pure__ to just __pure before it gains users.
 * Identify the areas of xen/bitops.h which are a mess.
 * Create common/bitops.c for compile and runtime testing.  This provides a
   statement of the ABI, and a confirmation that arch-specific implementations
   behave as expected.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
CC: Shawn Anastasio <sanastasio@raptorengineering.com>
CC: consulting@bugseng.com <consulting@bugseng.com>
CC: Simone Ballarin <simone.ballarin@bugseng.com>
CC: Federico Serafini <federico.serafini@bugseng.com>
CC: Nicola Vetrini <nicola.vetrini@bugseng.com>

I expect MISRA will have something to say about the macros here, but they are
in aid of better testing.
---
 xen/common/Makefile        |  1 +
 xen/common/bitops.c        | 41 ++++++++++++++++++++++++++++++++++++++
 xen/include/xen/bitops.h   | 13 +++++++++---
 xen/include/xen/compiler.h |  3 ++-
 4 files changed, 54 insertions(+), 4 deletions(-)
 create mode 100644 xen/common/bitops.c

diff --git a/xen/common/Makefile b/xen/common/Makefile
index e5eee19a8537..1f8ca9a2f4f8 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_ARGO) += argo.o
 obj-y += bitmap.o
+obj-y += bitops.o
 obj-$(CONFIG_GENERIC_BUG_FRAME) += bug.o
 obj-$(CONFIG_HYPFS_CONFIG) += config_data.o
 obj-$(CONFIG_CORE_PARKING) += core_parking.o
diff --git a/xen/common/bitops.c b/xen/common/bitops.c
new file mode 100644
index 000000000000..4c07191b4030
--- /dev/null
+++ b/xen/common/bitops.c
@@ -0,0 +1,41 @@
+#include <xen/bitops.h>
+#include <xen/bug.h>
+#include <xen/init.h>
+
+/* Hide a value from the optimiser. */
+#define HIDE(x) ({ typeof(x) _x = x; asm volatile ( "" : "+r" (_x) ); _x; })
+
+/*
+ * Check that fn(val) can be calculated by the compiler, and that it gives the
+ * expected answer.
+ */
+#define COMPILE_CHECK(fn, val, res)                                     \
+    do {                                                                \
+        if ( fn(val) != res )                                           \
+            asm (".error \"Compile time check '" STR(fn(val) == res) "' failed\""); \
+    } while ( 0 )
+
+/*
+ * Check that Xen's runtime logic for fn(val) gives the expected answer.  This
+ * requires using HIDE() to prevent the optimiser from emitting the full
+ * calculation.
+ */
+#define RUNTIME_CHECK(fn, val, res)             \
+    do {                                        \
+        BUG_ON(fn(HIDE(val)) != res);           \
+    } while ( 0 )
+
+/*
+ * Perform compile-time and runtime checks for fn(val) == res.
+ */
+#define CHECK(fn, val, res)                     \
+    do {                                        \
+        COMPILE_CHECK(fn, val, res);            \
+        RUNTIME_CHECK(fn, val, res);            \
+    } while ( 0 )
+
+static int __init cf_check test_bitops(void)
+{
+    return 0;
+}
+__initcall(test_bitops);
diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index e3c5a4ccf321..9b40f20381a2 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -1,5 +1,7 @@
-#ifndef _LINUX_BITOPS_H
-#define _LINUX_BITOPS_H
+#ifndef XEN_BITOPS_H
+#define XEN_BITOPS_H
+
+#include <xen/compiler.h>
 #include <xen/types.h>
 
 /*
@@ -103,8 +105,13 @@ static inline int generic_flsl(unsigned long x)
  * Include this here because some architectures need generic_ffs/fls in
  * scope
  */
+
+/* --------------------- Please tidy above here --------------------- */
+
 #include <asm/bitops.h>
 
+/* --------------------- Please tidy below here --------------------- */
+
 #ifndef find_next_bit
 /**
  * find_next_bit - find the next set bit in a memory region
@@ -294,4 +301,4 @@ static inline __u32 ror32(__u32 word, unsigned int shift)
 
 #define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
 
-#endif
+#endif /* XEN_BITOPS_H */
diff --git a/xen/include/xen/compiler.h b/xen/include/xen/compiler.h
index 16d554f2a593..972719df55b3 100644
--- a/xen/include/xen/compiler.h
+++ b/xen/include/xen/compiler.h
@@ -85,7 +85,8 @@
 #define inline inline __init
 #endif
 
-#define __attribute_pure__  __attribute__((__pure__))
+#define __pure  __attribute__((__pure__))
+
 #define __attribute_const__ __attribute__((__const__))
 #define __transparent__     __attribute__((__transparent_union__))
 
-- 
2.30.2




* [PATCH 2/7] xen/bitops: Implement ffs() in common logic
  2024-03-13 17:27 [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() Andrew Cooper
  2024-03-13 17:27 ` [PATCH 1/7] xen/bitops: Cleanup ahead of rearrangements Andrew Cooper
@ 2024-03-13 17:27 ` Andrew Cooper
  2024-03-14 14:16   ` Jan Beulich
  2024-03-13 17:27 ` [PATCH 3/7] xen/bitops: Implement ffsl() " Andrew Cooper
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Andrew Cooper @ 2024-03-13 17:27 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini

Allow the optimiser to eliminate the call completely, and use the compiler
builtin by default.  Architectures should only provide arch_ffs() if they think
they can do better than the compiler.

Confirm the expected behaviour with compile time and boot time tests.

For x86, correct the prototype, and simplify the asm() with the statement
given by the Intel architects to Linux about the behaviour on processors newer
than the 486.

For PPC, __builtin_ffs() is 1/3 of the size of the transform to
generic_fls().  Drop the definition entirely.

For ARM, simply rename ffs() to arch_ffs().  It appears that the
transformation to __builtin_clz() still makes better code than
__builtin_ffs().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
CC: Shawn Anastasio <sanastasio@raptorengineering.com>
CC: consulting@bugseng.com <consulting@bugseng.com>
CC: Simone Ballarin <simone.ballarin@bugseng.com>
CC: Federico Serafini <federico.serafini@bugseng.com>
CC: Nicola Vetrini <nicola.vetrini@bugseng.com>
---
 xen/arch/arm/include/asm/bitops.h |  2 +-
 xen/arch/ppc/include/asm/bitops.h |  1 -
 xen/arch/x86/include/asm/bitops.h | 19 +++++++++++++------
 xen/common/bitops.c               | 10 ++++++++++
 xen/include/xen/bitops.h          | 15 +++++++++++++++
 5 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/include/asm/bitops.h b/xen/arch/arm/include/asm/bitops.h
index ab030b6cb032..09c6064274a7 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -157,7 +157,7 @@ static inline int fls(unsigned int x)
 }
 
 
-#define ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
+#define arch_ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
 #define ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })
 
 /**
diff --git a/xen/arch/ppc/include/asm/bitops.h b/xen/arch/ppc/include/asm/bitops.h
index 5820b9ce7bb5..635a3b4e3e33 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -173,7 +173,6 @@ static inline int __test_and_clear_bit(int nr, volatile void *addr)
 
 #define flsl(x) generic_flsl(x)
 #define fls(x) generic_fls(x)
-#define ffs(x) ({ unsigned int t_ = (x); fls(t_ & -t_); })
 #define ffsl(x) ({ unsigned long t_ = (x); flsl(t_ & -t_); })
 
 /* Based on linux/include/asm-generic/bitops/ffz.h */
diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index 5a71afbc89d5..2c5b103cbbd9 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -430,16 +430,23 @@ static inline int ffsl(unsigned long x)
     return (int)r+1;
 }
 
-static inline int ffs(unsigned int x)
+static inline unsigned int arch_ffs(unsigned int x)
 {
-    int r;
+    int r = -1;
+
+    /*
+     * The AMD manual states that BSF won't modify the destination register if
+     * x=0.  The Intel manual states that the result is undefined, but the
+     * architects have said that the register is written back with it's old
+     * value, possibly zero extended above 32 bits.
+     */
+    asm ( "bsf %[val], %[res]"
+          : [res] "+r" (r)
+          : [val] "rm" (x) );
 
-    asm ( "bsf %1,%0\n\t"
-          "jnz 1f\n\t"
-          "mov $-1,%0\n"
-          "1:" : "=r" (r) : "rm" (x));
     return r + 1;
 }
+#define arch_ffs arch_ffs
 
 /**
  * fls - find last bit set
diff --git a/xen/common/bitops.c b/xen/common/bitops.c
index 4c07191b4030..484df68768ad 100644
--- a/xen/common/bitops.c
+++ b/xen/common/bitops.c
@@ -34,8 +34,18 @@
         RUNTIME_CHECK(fn, val, res);            \
     } while ( 0 )
 
+static void test_ffs(void)
+{
+    /* unsigned int ffs(unsigned int) */
+    CHECK(ffs, 0, 0);
+    CHECK(ffs, 1, 1);
+    CHECK(ffs, 0x80000000U, 32);
+}
+
 static int __init cf_check test_bitops(void)
 {
+    test_ffs();
+
     return 0;
 }
 __initcall(test_bitops);
diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index 9b40f20381a2..fb3645d9cf87 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -110,6 +110,21 @@ static inline int generic_flsl(unsigned long x)
 
 #include <asm/bitops.h>
 
+/*
+ * Find First Set bit.  Bits are labelled from 1.
+ */
+static always_inline __pure unsigned int ffs(unsigned int x)
+{
+    if ( __builtin_constant_p(x) )
+        return __builtin_ffs(x);
+
+#ifndef arch_ffs
+#define arch_ffs __builtin_ffs
+#endif
+
+    return arch_ffs(x);
+}
+
 /* --------------------- Please tidy below here --------------------- */
 
 #ifndef find_next_bit
-- 
2.30.2




* [PATCH 3/7] xen/bitops: Implement ffsl() in common logic
  2024-03-13 17:27 [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() Andrew Cooper
  2024-03-13 17:27 ` [PATCH 1/7] xen/bitops: Cleanup ahead of rearrangements Andrew Cooper
  2024-03-13 17:27 ` [PATCH 2/7] xen/bitops: Implement ffs() in common logic Andrew Cooper
@ 2024-03-13 17:27 ` Andrew Cooper
  2024-03-13 17:48   ` Andrew Cooper
  2024-03-13 18:16   ` Andrew Cooper
  2024-03-13 17:27 ` [PATCH 4/7] xen/bitops: Delete generic_ffs{,l}() Andrew Cooper
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 33+ messages in thread
From: Andrew Cooper @ 2024-03-13 17:27 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini

Exactly as per ffs() in the previous patch.  Express the upper bound of the
testing in terms of BITS_PER_LONG as it varies between architectures.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
CC: Shawn Anastasio <sanastasio@raptorengineering.com>
CC: consulting@bugseng.com <consulting@bugseng.com>
CC: Simone Ballarin <simone.ballarin@bugseng.com>
CC: Federico Serafini <federico.serafini@bugseng.com>
CC: Nicola Vetrini <nicola.vetrini@bugseng.com>
---
 xen/arch/arm/include/asm/bitops.h |  2 +-
 xen/arch/ppc/include/asm/bitops.h |  1 -
 xen/arch/x86/include/asm/bitops.h | 30 +++++++++++++-----------------
 xen/common/bitops.c               |  7 +++++++
 xen/include/xen/bitops.h          | 12 ++++++++++++
 5 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/xen/arch/arm/include/asm/bitops.h b/xen/arch/arm/include/asm/bitops.h
index 09c6064274a7..59ae8ed150b6 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -158,7 +158,7 @@ static inline int fls(unsigned int x)
 
 
 #define arch_ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
-#define ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })
+#define arch_ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })
 
 /**
  * find_first_set_bit - find the first set bit in @word
diff --git a/xen/arch/ppc/include/asm/bitops.h b/xen/arch/ppc/include/asm/bitops.h
index 635a3b4e3e33..ecec2a826660 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -173,7 +173,6 @@ static inline int __test_and_clear_bit(int nr, volatile void *addr)
 
 #define flsl(x) generic_flsl(x)
 #define fls(x) generic_fls(x)
-#define ffsl(x) ({ unsigned long t_ = (x); flsl(t_ & -t_); })
 
 /* Based on linux/include/asm-generic/bitops/ffz.h */
 /*
diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index 2c5b103cbbd9..99342877e32f 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -413,23 +413,6 @@ static inline unsigned int find_first_set_bit(unsigned long word)
     return (unsigned int)word;
 }
 
-/**
- * ffs - find first bit set
- * @x: the word to search
- *
- * This is defined the same way as the libc and compiler builtin ffs routines.
- */
-static inline int ffsl(unsigned long x)
-{
-    long r;
-
-    asm ( "bsf %1,%0\n\t"
-          "jnz 1f\n\t"
-          "mov $-1,%0\n"
-          "1:" : "=r" (r) : "rm" (x));
-    return (int)r+1;
-}
-
 static inline unsigned int arch_ffs(unsigned int x)
 {
     int r = -1;
@@ -448,6 +431,19 @@ static inline unsigned int arch_ffs(unsigned int x)
 }
 #define arch_ffs arch_ffs
 
+static inline unsigned int arch_ffsl(unsigned long x)
+{
+    long r = -1;
+
+    /* See arch_ffs() for safety discussion. */
+    asm ( "bsf %[val], %[res]"
+          : [res] "+r" (r)
+          : [val] "rm" (x) );
+
+    return r + 1;
+}
+#define arch_ffsl arch_ffsl
+
 /**
  * fls - find last bit set
  * @x: the word to search
diff --git a/xen/common/bitops.c b/xen/common/bitops.c
index 484df68768ad..eceffe5029d6 100644
--- a/xen/common/bitops.c
+++ b/xen/common/bitops.c
@@ -40,6 +40,13 @@ static void test_ffs(void)
     CHECK(ffs, 0, 0);
     CHECK(ffs, 1, 1);
     CHECK(ffs, 0x80000000U, 32);
+
+    /* unsigned int ffsl(unsigned long) */
+    CHECK(ffsl, 0, 0);
+    CHECK(ffsl, 1, 1);
+    CHECK(ffsl, 1UL << (BITS_PER_LONG - 1), BITS_PER_LONG);
+    if ( BITS_PER_LONG > 32 )
+        CHECK(ffsl, 1UL << 32, 33);
 }
 
 static int __init cf_check test_bitops(void)
diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index fb3645d9cf87..a37b42342bc5 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -125,6 +125,18 @@ static always_inline __pure unsigned int ffs(unsigned int x)
     return arch_ffs(x);
 }
 
+static always_inline __pure unsigned int ffsl(unsigned long x)
+{
+    if ( __builtin_constant_p(x) )
+        return __builtin_ffsl(x);
+
+#ifndef arch_ffsl
+#define arch_ffsl __builtin_ffsl
+#endif
+
+    return arch_ffsl(x);
+}
+
 /* --------------------- Please tidy below here --------------------- */
 
 #ifndef find_next_bit
-- 
2.30.2




* [PATCH 4/7] xen/bitops: Delete generic_ffs{,l}()
  2024-03-13 17:27 [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() Andrew Cooper
                   ` (2 preceding siblings ...)
  2024-03-13 17:27 ` [PATCH 3/7] xen/bitops: Implement ffsl() " Andrew Cooper
@ 2024-03-13 17:27 ` Andrew Cooper
  2024-03-13 17:27 ` [PATCH 5/7] xen/bitops: Implement ffs64() in common logic Andrew Cooper
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 33+ messages in thread
From: Andrew Cooper @ 2024-03-13 17:27 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini

No more users.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
CC: Shawn Anastasio <sanastasio@raptorengineering.com>
CC: consulting@bugseng.com <consulting@bugseng.com>
CC: Simone Ballarin <simone.ballarin@bugseng.com>
CC: Federico Serafini <federico.serafini@bugseng.com>
CC: Nicola Vetrini <nicola.vetrini@bugseng.com>
---
 xen/include/xen/bitops.h | 41 ----------------------------------------
 1 file changed, 41 deletions(-)

diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index a37b42342bc5..b85b35c40781 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -14,41 +14,6 @@
 #define GENMASK_ULL(h, l) \
     (((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LLONG - 1 - (h))))
 
-/*
- * ffs: find first bit set. This is defined the same way as
- * the libc and compiler builtin ffs routines, therefore
- * differs in spirit from the above ffz (man ffs).
- */
-
-static inline int generic_ffs(unsigned int x)
-{
-    int r = 1;
-
-    if (!x)
-        return 0;
-    if (!(x & 0xffff)) {
-        x >>= 16;
-        r += 16;
-    }
-    if (!(x & 0xff)) {
-        x >>= 8;
-        r += 8;
-    }
-    if (!(x & 0xf)) {
-        x >>= 4;
-        r += 4;
-    }
-    if (!(x & 3)) {
-        x >>= 2;
-        r += 2;
-    }
-    if (!(x & 1)) {
-        x >>= 1;
-        r += 1;
-    }
-    return r;
-}
-
 /*
  * fls: find last bit set.
  */
@@ -84,11 +49,6 @@ static inline int generic_fls(unsigned int x)
 
 #if BITS_PER_LONG == 64
 
-static inline int generic_ffsl(unsigned long x)
-{
-    return !x || (u32)x ? generic_ffs(x) : generic_ffs(x >> 32) + 32;
-}
-
 static inline int generic_flsl(unsigned long x)
 {
     u32 h = x >> 32;
@@ -97,7 +57,6 @@ static inline int generic_flsl(unsigned long x)
 }
 
 #else
-# define generic_ffsl generic_ffs
 # define generic_flsl generic_fls
 #endif
 
-- 
2.30.2




* [PATCH 5/7] xen/bitops: Implement ffs64() in common logic
  2024-03-13 17:27 [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() Andrew Cooper
                   ` (3 preceding siblings ...)
  2024-03-13 17:27 ` [PATCH 4/7] xen/bitops: Delete generic_ffs{,l}() Andrew Cooper
@ 2024-03-13 17:27 ` Andrew Cooper
  2024-03-14 15:56   ` Jan Beulich
  2024-03-13 17:27 ` [PATCH 6/7] xen: Swap find_first_set_bit() for ffsl() - 1 Andrew Cooper
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Andrew Cooper @ 2024-03-13 17:27 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini

As per ffs()/ffsl() in previous patches.  Add tests for all interesting bit
positions at 32-bit boundaries.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
CC: Shawn Anastasio <sanastasio@raptorengineering.com>
CC: consulting@bugseng.com <consulting@bugseng.com>
CC: Simone Ballarin <simone.ballarin@bugseng.com>
CC: Federico Serafini <federico.serafini@bugseng.com>
CC: Nicola Vetrini <nicola.vetrini@bugseng.com>
---
 xen/common/bitops.c      | 12 ++++++++++++
 xen/include/xen/bitops.h | 16 ++++++++--------
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/xen/common/bitops.c b/xen/common/bitops.c
index eceffe5029d6..cd194fe672b7 100644
--- a/xen/common/bitops.c
+++ b/xen/common/bitops.c
@@ -47,6 +47,18 @@ static void test_ffs(void)
     CHECK(ffsl, 1UL << (BITS_PER_LONG - 1), BITS_PER_LONG);
     if ( BITS_PER_LONG > 32 )
         CHECK(ffsl, 1UL << 32, 33);
+
+    /*
+     * unsigned int ffs64(uint64_t)
+     *
+     * 32-bit builds of Xen have to split this into two adjacent operations,
+     * so test all interesting bit positions.
+     */
+    CHECK(ffs64, 0, 0);
+    CHECK(ffs64, 1, 1);
+    CHECK(ffs64, (uint64_t)0x0000000080000000, 32);
+    CHECK(ffs64, (uint64_t)0x0000000100000000, 33);
+    CHECK(ffs64, (uint64_t)0x8000000000000000, 64);
 }
 
 static int __init cf_check test_bitops(void)
diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index b85b35c40781..f14ad0d33aa3 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -96,6 +96,14 @@ static always_inline __pure unsigned int ffsl(unsigned long x)
     return arch_ffsl(x);
 }
 
+static always_inline __pure unsigned int ffs64(uint64_t x)
+{
+    if ( BITS_PER_LONG == 64 )
+        return ffsl(x);
+    else
+        return !x || (uint32_t)x ? ffs(x) : ffs(x >> 32) + 32;
+}
+
 /* --------------------- Please tidy below here --------------------- */
 
 #ifndef find_next_bit
@@ -148,15 +156,7 @@ extern unsigned long find_first_zero_bit(const unsigned long *addr,
 
 #if BITS_PER_LONG == 64
 # define fls64 flsl
-# define ffs64 ffsl
 #else
-# ifndef ffs64
-static inline int generic_ffs64(__u64 x)
-{
-    return !x || (__u32)x ? ffs(x) : ffs(x >> 32) + 32;
-}
-#  define ffs64 generic_ffs64
-# endif
 # ifndef fls64
 static inline int generic_fls64(__u64 x)
 {
-- 
2.30.2




* [PATCH 6/7] xen: Swap find_first_set_bit() for ffsl() - 1
  2024-03-13 17:27 [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() Andrew Cooper
                   ` (4 preceding siblings ...)
  2024-03-13 17:27 ` [PATCH 5/7] xen/bitops: Implement ffs64() in common logic Andrew Cooper
@ 2024-03-13 17:27 ` Andrew Cooper
  2024-03-14 14:30   ` Jan Beulich
  2024-03-13 17:27 ` [PATCH 7/7] xen/bitops: Delete find_first_set_bit() Andrew Cooper
  2024-03-14 14:45 ` [RISCV] [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() Andrew Cooper
  7 siblings, 1 reply; 33+ messages in thread
From: Andrew Cooper @ 2024-03-13 17:27 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini

find_first_set_bit() is a Xen-ism which has undefined behaviour with a 0
input.  ffsl(), by contrast, is well defined with an input of 0, and is found
outside of Xen too.

_init_heap_pages() is the one special case here, comparing the LSB of two
different addresses.  The -1 cancels off both sides of the expression.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
CC: Shawn Anastasio <sanastasio@raptorengineering.com>
CC: consulting@bugseng.com <consulting@bugseng.com>
CC: Simone Ballarin <simone.ballarin@bugseng.com>
CC: Federico Serafini <federico.serafini@bugseng.com>
CC: Nicola Vetrini <nicola.vetrini@bugseng.com>
---
 xen/arch/x86/guest/xen/xen.c                 | 4 ++--
 xen/arch/x86/hvm/dom0_build.c                | 2 +-
 xen/arch/x86/hvm/hpet.c                      | 8 ++++----
 xen/arch/x86/include/asm/pt-contig-markers.h | 2 +-
 xen/arch/x86/mm.c                            | 2 +-
 xen/arch/x86/mm/p2m-pod.c                    | 4 ++--
 xen/common/page_alloc.c                      | 2 +-
 xen/common/softirq.c                         | 2 +-
 xen/drivers/passthrough/amd/iommu_map.c      | 2 +-
 xen/drivers/passthrough/iommu.c              | 4 ++--
 xen/drivers/passthrough/x86/iommu.c          | 4 ++--
 11 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/guest/xen/xen.c b/xen/arch/x86/guest/xen/xen.c
index d9768cc9527d..7484b3f73ad3 100644
--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -168,14 +168,14 @@ static void cf_check xen_evtchn_upcall(void)
 
     while ( pending )
     {
-        unsigned int l1 = find_first_set_bit(pending);
+        unsigned int l1 = ffsl(pending) - 1;
         unsigned long evtchn = xchg(&XEN_shared_info->evtchn_pending[l1], 0);
 
         __clear_bit(l1, &pending);
         evtchn &= ~XEN_shared_info->evtchn_mask[l1];
         while ( evtchn )
         {
-            unsigned int port = find_first_set_bit(evtchn);
+            unsigned int port = ffsl(evtchn) - 1;
 
             __clear_bit(port, &evtchn);
             port += l1 * BITS_PER_LONG;
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index bbae8a564522..7bc092675628 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -139,7 +139,7 @@ static int __init pvh_populate_memory_range(struct domain *d,
         order = get_order_from_pages(end - start + 1);
         order = min(order ? order - 1 : 0, max_order);
         /* The order allocated and populated must be aligned to the address. */
-        order = min(order, start ? find_first_set_bit(start) : MAX_ORDER);
+        order = min(order, start ? ffsl(start) - 1 : MAX_ORDER);
         page = alloc_domheap_pages(d, order, dom0_memflags | MEMF_no_scrub);
         if ( page == NULL )
         {
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index 1db9c0b60ee0..30ec14b24110 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -336,7 +336,7 @@ static void timer_sanitize_int_route(HPETState *h, unsigned int tn)
      * enabled pick the first irq.
      */
     timer_config(h, tn) |=
-        MASK_INSR(find_first_set_bit(timer_int_route_cap(h, tn)),
+        MASK_INSR(ffsl(timer_int_route_cap(h, tn)) - 1,
                   HPET_TN_ROUTE);
 }
 
@@ -410,7 +410,7 @@ static int cf_check hpet_write(
         {
             bool active;
 
-            i = find_first_set_bit(new_val);
+            i = ffsl(new_val) - 1;
             if ( i >= HPET_TIMER_NUM )
                 break;
             __clear_bit(i, &new_val);
@@ -536,14 +536,14 @@ static int cf_check hpet_write(
     /* stop/start timers whos state was changed by this write. */
     while (stop_timers)
     {
-        i = find_first_set_bit(stop_timers);
+        i = ffsl(stop_timers) - 1;
         __clear_bit(i, &stop_timers);
         hpet_stop_timer(h, i, guest_time);
     }
 
     while (start_timers)
     {
-        i = find_first_set_bit(start_timers);
+        i = ffsl(start_timers) - 1;
         __clear_bit(i, &start_timers);
         hpet_set_timer(h, i, guest_time);
     }
diff --git a/xen/arch/x86/include/asm/pt-contig-markers.h b/xen/arch/x86/include/asm/pt-contig-markers.h
index b3c1fe803534..e8c8157d605f 100644
--- a/xen/arch/x86/include/asm/pt-contig-markers.h
+++ b/xen/arch/x86/include/asm/pt-contig-markers.h
@@ -60,7 +60,7 @@ static bool pt_update_contig_markers(uint64_t *pt, unsigned int idx,
     /* Step 1: Reduce markers in lower numbered entries. */
     while ( i )
     {
-        b = find_first_set_bit(i);
+        b = ffsl(i) - 1;
         i &= ~(1U << b);
         if ( GET_MARKER(pt[i]) <= b )
             break;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 62f5b811bbe8..28e9a159b577 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3418,7 +3418,7 @@ static int vcpumask_to_pcpumask(
         {
             unsigned int cpu;
 
-            vcpu_id = find_first_set_bit(vmask);
+            vcpu_id = ffsl(vmask) - 1;
             vmask &= ~(1UL << vcpu_id);
             vcpu_id += vcpu_bias;
             if ( (vcpu_id >= d->max_vcpus) )
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index 65d31e552305..e0ad934d2e30 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -684,7 +684,7 @@ unsigned long
 p2m_pod_decrease_reservation(struct domain *d, gfn_t gfn, unsigned int order)
 {
     unsigned long left = 1UL << order, ret = 0;
-    unsigned int chunk_order = find_first_set_bit(gfn_x(gfn) | left);
+    unsigned int chunk_order = ffsl(gfn_x(gfn) | left) - 1;
 
     do {
         ret += decrease_reservation(d, gfn, chunk_order);
@@ -1384,7 +1384,7 @@ guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
                                       unsigned int order)
 {
     unsigned long left = 1UL << order;
-    unsigned int chunk_order = find_first_set_bit(gfn | left);
+    unsigned int chunk_order = ffsl(gfn | left) - 1;
     int rc;
 
     if ( !paging_mode_translate(d) )
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 2ec17df9b420..812eac51ea0d 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1817,7 +1817,7 @@ static void _init_heap_pages(const struct page_info *pg,
     if ( unlikely(!avail[nid]) )
     {
         bool use_tail = IS_ALIGNED(s, 1UL << MAX_ORDER) &&
-                        (find_first_set_bit(e) <= find_first_set_bit(s));
+                        (ffsl(e) <= ffsl(s));
         unsigned long n;
 
         n = init_node_heap(nid, s, nr_pages, &use_tail);
diff --git a/xen/common/softirq.c b/xen/common/softirq.c
index 321d26902d37..bee4a82009c3 100644
--- a/xen/common/softirq.c
+++ b/xen/common/softirq.c
@@ -48,7 +48,7 @@ static void __do_softirq(unsigned long ignore_mask)
              || cpu_is_offline(cpu) )
             break;
 
-        i = find_first_set_bit(pending);
+        i = ffsl(pending) - 1;
         clear_bit(i, &softirq_pending(cpu));
         (*softirq_handlers[i])();
     }
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index e0f4fe736a8d..f1061bfc798c 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -137,7 +137,7 @@ static void set_iommu_ptes_present(unsigned long pt_mfn,
         ASSERT(!pde->u);
 
         if ( pde > table )
-            ASSERT(pde->ign0 == find_first_set_bit(pde - table));
+            ASSERT(pde->ign0 == ffsl(pde - table) - 1);
         else
             ASSERT(pde->ign0 == CONTIG_LEVEL_SHIFT);
 
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 996c31be1284..67dd8e5cd9e1 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -301,7 +301,7 @@ static unsigned int mapping_order(const struct domain_iommu *hd,
 {
     unsigned long res = dfn_x(dfn) | mfn_x(mfn);
     unsigned long sizes = hd->platform_ops->page_sizes;
-    unsigned int bit = find_first_set_bit(sizes), order = 0;
+    unsigned int bit = ffsl(sizes) - 1, order = 0;
 
     ASSERT(bit == PAGE_SHIFT);
 
@@ -309,7 +309,7 @@ static unsigned int mapping_order(const struct domain_iommu *hd,
     {
         unsigned long mask;
 
-        bit = find_first_set_bit(sizes);
+        bit = ffsl(sizes) - 1;
         mask = (1UL << bit) - 1;
         if ( nr <= mask || (res & mask) )
             break;
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index a3fa0aef7c37..d721ea27a033 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
     if ( contig_mask )
     {
         /* See pt-contig-markers.h for a description of the marker scheme. */
-        unsigned int i, shift = find_first_set_bit(contig_mask);
+        unsigned int i, shift = ffsl(contig_mask) - 1;
 
         ASSERT((CONTIG_LEVEL_SHIFT & (contig_mask >> shift)) == CONTIG_LEVEL_SHIFT);
 
@@ -652,7 +652,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
 
         for ( i = 4; i < PAGE_SIZE / sizeof(*p); i += 4 )
         {
-            p[i + 0] = (find_first_set_bit(i) + 0ULL) << shift;
+            p[i + 0] = (ffsl(i) - 1ULL) << shift;
             p[i + 1] = 0;
             p[i + 2] = 1ULL << shift;
             p[i + 3] = 0;
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 7/7] xen/bitops: Delete find_first_set_bit()
  2024-03-13 17:27 [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() Andrew Cooper
                   ` (5 preceding siblings ...)
  2024-03-13 17:27 ` [PATCH 6/7] xen: Swap find_first_set_bit() for ffsl() - 1 Andrew Cooper
@ 2024-03-13 17:27 ` Andrew Cooper
  2024-03-14 15:59   ` Jan Beulich
  2024-03-14 14:45 ` [RISCV] [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() Andrew Cooper
  7 siblings, 1 reply; 33+ messages in thread
From: Andrew Cooper @ 2024-03-13 17:27 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini

No more users.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
CC: Shawn Anastasio <sanastasio@raptorengineering.com>
CC: consulting@bugseng.com <consulting@bugseng.com>
CC: Simone Ballarin <simone.ballarin@bugseng.com>
CC: Federico Serafini <federico.serafini@bugseng.com>
CC: Nicola Vetrini <nicola.vetrini@bugseng.com>
---
 xen/arch/arm/include/asm/bitops.h | 12 ------------
 xen/arch/ppc/include/asm/bitops.h |  9 ---------
 xen/arch/x86/include/asm/bitops.h | 12 ------------
 3 files changed, 33 deletions(-)

diff --git a/xen/arch/arm/include/asm/bitops.h b/xen/arch/arm/include/asm/bitops.h
index 59ae8ed150b6..5104334e4874 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -160,18 +160,6 @@ static inline int fls(unsigned int x)
 #define arch_ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
 #define arch_ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })
 
-/**
- * find_first_set_bit - find the first set bit in @word
- * @word: the word to search
- *
- * Returns the bit-number of the first set bit (first bit being 0).
- * The input must *not* be zero.
- */
-static inline unsigned int find_first_set_bit(unsigned long word)
-{
-        return ffsl(word) - 1;
-}
-
 /**
  * hweightN - returns the hamming weight of a N-bit word
  * @x: the word to weigh
diff --git a/xen/arch/ppc/include/asm/bitops.h b/xen/arch/ppc/include/asm/bitops.h
index ecec2a826660..989d341a44c7 100644
--- a/xen/arch/ppc/include/asm/bitops.h
+++ b/xen/arch/ppc/include/asm/bitops.h
@@ -206,13 +206,4 @@ static always_inline unsigned long __ffs(unsigned long word)
     return __builtin_ctzl(word);
 }
 
-/**
- * find_first_set_bit - find the first set bit in @word
- * @word: the word to search
- *
- * Returns the bit-number of the first set bit (first bit being 0).
- * The input must *not* be zero.
- */
-#define find_first_set_bit(x) (ffsl(x) - 1)
-
 #endif /* _ASM_PPC_BITOPS_H */
diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index 99342877e32f..2835bb6814d5 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
     r__;                                                                    \
 })
 
-/**
- * find_first_set_bit - find the first set bit in @word
- * @word: the word to search
- * 
- * Returns the bit-number of the first set bit. The input must *not* be zero.
- */
-static inline unsigned int find_first_set_bit(unsigned long word)
-{
-    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
-    return (unsigned int)word;
-}
-
 static inline unsigned int arch_ffs(unsigned int x)
 {
     int r = -1;
-- 
2.30.2




* Re: [PATCH 3/7] xen/bitops: Implement ffsl() in common logic
  2024-03-13 17:27 ` [PATCH 3/7] xen/bitops: Implement ffsl() " Andrew Cooper
@ 2024-03-13 17:48   ` Andrew Cooper
  2024-03-14 13:45     ` Andrew Cooper
  2024-03-13 18:16   ` Andrew Cooper
  1 sibling, 1 reply; 33+ messages in thread
From: Andrew Cooper @ 2024-03-13 17:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini

On 13/03/2024 5:27 pm, Andrew Cooper wrote:
>  xen/arch/arm/include/asm/bitops.h |  2 +-
>  xen/arch/ppc/include/asm/bitops.h |  1 -
>  xen/arch/x86/include/asm/bitops.h | 30 +++++++++++++-----------------
>  xen/common/bitops.c               |  7 +++++++
>  xen/include/xen/bitops.h          | 12 ++++++++++++
>  5 files changed, 33 insertions(+), 19 deletions(-)
>
> diff --git a/xen/arch/arm/include/asm/bitops.h b/xen/arch/arm/include/asm/bitops.h
> index 09c6064274a7..59ae8ed150b6 100644
> --- a/xen/arch/arm/include/asm/bitops.h
> +++ b/xen/arch/arm/include/asm/bitops.h
> @@ -158,7 +158,7 @@ static inline int fls(unsigned int x)
>  
>  
>  #define arch_ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
> -#define ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })
> +#define arch_ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })
>  
>  /**
>   * find_first_set_bit - find the first set bit in @word

It turns out this change isn't bisectable on ARM, but it is by the end
of the series.

Reordering patches 6+7 to be ahead of this one resolves the bisection
problem.

~Andrew



* Re: [PATCH 3/7] xen/bitops: Implement ffsl() in common logic
  2024-03-13 17:27 ` [PATCH 3/7] xen/bitops: Implement ffsl() " Andrew Cooper
  2024-03-13 17:48   ` Andrew Cooper
@ 2024-03-13 18:16   ` Andrew Cooper
  1 sibling, 0 replies; 33+ messages in thread
From: Andrew Cooper @ 2024-03-13 18:16 UTC (permalink / raw)
  To: Xen-devel
  Cc: Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini

On 13/03/2024 5:27 pm, Andrew Cooper wrote:
> diff --git a/xen/common/bitops.c b/xen/common/bitops.c
> index 484df68768ad..eceffe5029d6 100644
> --- a/xen/common/bitops.c
> +++ b/xen/common/bitops.c
> @@ -40,6 +40,13 @@ static void test_ffs(void)
>      CHECK(ffs, 0, 0);
>      CHECK(ffs, 1, 1);
>      CHECK(ffs, 0x80000000U, 32);
> +
> +    /* unsigned int ffsl(unsigned long) */
> +    CHECK(ffsl, 0, 0);
> +    CHECK(ffsl, 1, 1);
> +    CHECK(ffsl, 1UL << (BITS_PER_LONG - 1), BITS_PER_LONG);
> +    if ( BITS_PER_LONG > 32 )
> +        CHECK(ffsl, 1UL << 32, 33);
>  }

This if() needs to be an #if to compile on arm32.

Otherwise, I've managed to make the series fully bisectable on all
architectures.

~Andrew



* Re: [PATCH 1/7] xen/bitops: Cleanup ahead of rearrangements
  2024-03-13 17:27 ` [PATCH 1/7] xen/bitops: Cleanup ahead of rearrangements Andrew Cooper
@ 2024-03-13 18:39   ` Shawn Anastasio
  2024-03-13 23:06   ` Andrew Cooper
  2024-03-14 13:59   ` Jan Beulich
  2 siblings, 0 replies; 33+ messages in thread
From: Shawn Anastasio @ 2024-03-13 18:39 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	consulting @ bugseng . com, Simone Ballarin, Federico Serafini,
	Nicola Vetrini

Hi Andrew,

On 3/13/24 12:27 PM, Andrew Cooper wrote:
> diff --git a/xen/common/bitops.c b/xen/common/bitops.c
> new file mode 100644
> index 000000000000..4c07191b4030
> --- /dev/null
> +++ b/xen/common/bitops.c
> @@ -0,0 +1,41 @@
> +#include <xen/bitops.h>
> +#include <xen/bug.h>
> +#include <xen/init.h>
> +
> +/* Hide a value from the optimiser. */
> +#define HIDE(x) ({ typeof(x) _x = x; asm volatile ( "" : "+r" (_x) ); _x; })
> +
> +/*
> + * Check that fn(val) can be calcuated by the compiler, and that it gives the
> + * expected answer.
> + */
> +#define COMPILE_CHECK(fn, val, res)                                     \
> +    do {                                                                \
> +        if ( fn(val) != res )                                           \
> +            asm (".error \"Compile time check '" STR(fn(val) == res) "' failed\""); \
> +    } while ( 0 )
> +

For improved diagnostics, I think it might make sense to explicitly
check if the expression can be evaluated at compile time and emit a
different error if not. Perhaps something like the following:

#define COMPILE_CHECK(fn, val, res)                                     \
    do {                                                                \
        __typeof__(fn(val)) actual = fn(val);                           \
        if ( !__builtin_constant_p(actual) )                            \
            asm (".error \"Unable to evaluate '" STR(fn(val)) "' at compile time\"\n"); \
        else if ( actual != res )                                       \
            asm (".error \"Compile time check '" STR(fn(val) == res) "' failed\""); \
    } while ( 0 )




* Re: [PATCH 1/7] xen/bitops: Cleanup ahead of rearrangements
  2024-03-13 17:27 ` [PATCH 1/7] xen/bitops: Cleanup ahead of rearrangements Andrew Cooper
  2024-03-13 18:39   ` Shawn Anastasio
@ 2024-03-13 23:06   ` Andrew Cooper
  2024-03-14 13:59   ` Jan Beulich
  2 siblings, 0 replies; 33+ messages in thread
From: Andrew Cooper @ 2024-03-13 23:06 UTC (permalink / raw)
  To: Xen-devel
  Cc: Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini

On 13/03/2024 5:27 pm, Andrew Cooper wrote:
> diff --git a/xen/common/bitops.c b/xen/common/bitops.c
> new file mode 100644
> index 000000000000..4c07191b4030
> --- /dev/null
> +++ b/xen/common/bitops.c
> @@ -0,0 +1,41 @@
> +#include <xen/bitops.h>
> +#include <xen/bug.h>
> +#include <xen/init.h>
> +
> +/* Hide a value from the optimiser. */
> +#define HIDE(x) ({ typeof(x) _x = x; asm volatile ( "" : "+r" (_x) ); _x; })
> +
> +/*
> + * Check that fn(val) can be calcuated by the compiler, and that it gives the
> + * expected answer.
> + */
> +#define COMPILE_CHECK(fn, val, res)                                     \
> +    do {                                                                \
> +        if ( fn(val) != res )                                           \
> +            asm (".error \"Compile time check '" STR(fn(val) == res) "' failed\""); \
> +    } while ( 0 )

It turns out that Clang doesn't like this

https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/6387413632

despite it being capable of reducing the expression to a constant.

This also calls into question whether it's a viable replacement for
__bad_bitop_size() et al.

~Andrew



* Re: [PATCH 3/7] xen/bitops: Implement ffsl() in common logic
  2024-03-13 17:48   ` Andrew Cooper
@ 2024-03-14 13:45     ` Andrew Cooper
  0 siblings, 0 replies; 33+ messages in thread
From: Andrew Cooper @ 2024-03-14 13:45 UTC (permalink / raw)
  To: Xen-devel
  Cc: Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini

On 13/03/2024 5:48 pm, Andrew Cooper wrote:
> On 13/03/2024 5:27 pm, Andrew Cooper wrote:
>>  xen/arch/arm/include/asm/bitops.h |  2 +-
>>  xen/arch/ppc/include/asm/bitops.h |  1 -
>>  xen/arch/x86/include/asm/bitops.h | 30 +++++++++++++-----------------
>>  xen/common/bitops.c               |  7 +++++++
>>  xen/include/xen/bitops.h          | 12 ++++++++++++
>>  5 files changed, 33 insertions(+), 19 deletions(-)
>>
>> diff --git a/xen/arch/arm/include/asm/bitops.h b/xen/arch/arm/include/asm/bitops.h
>> index 09c6064274a7..59ae8ed150b6 100644
>> --- a/xen/arch/arm/include/asm/bitops.h
>> +++ b/xen/arch/arm/include/asm/bitops.h
>> @@ -158,7 +158,7 @@ static inline int fls(unsigned int x)
>>  
>>  
>>  #define arch_ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
>> -#define ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })
>> +#define arch_ffsl(x) ({ unsigned long __t = (x); flsl(ISOLATE_LSB(__t)); })
>>  
>>  /**
>>   * find_first_set_bit - find the first set bit in @word
> It turns out this change isn't bisectable on ARM, but it is by the end
> of the series.
>
> Reordering patches 6+7 to be ahead of this one resolves the bisection
> problem.

... but introduces a bisection issue on x86.

I'm going to have to split the series up a bit more to do this nicely.

~Andrew

>
> ~Andrew




* Re: [PATCH 1/7] xen/bitops: Cleanup ahead of rearrangements
  2024-03-13 17:27 ` [PATCH 1/7] xen/bitops: Cleanup ahead of rearrangements Andrew Cooper
  2024-03-13 18:39   ` Shawn Anastasio
  2024-03-13 23:06   ` Andrew Cooper
@ 2024-03-14 13:59   ` Jan Beulich
  2 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2024-03-14 13:59 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 13.03.2024 18:27, Andrew Cooper wrote:
>  * Rename __attribute_pure__ to just __pure before it gains users.
>  * Identify the areas of xen/bitops.h which are a mess.
>  * Create common/bitops.c for compile and runtime testing.  This provides a
>    statement of the ABI, and a confirmation that arch-specific implementations
>    behave as expected.

If this is the sole purpose of the new file, then ...

> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -1,5 +1,6 @@
>  obj-$(CONFIG_ARGO) += argo.o
>  obj-y += bitmap.o
> +obj-y += bitops.o

obj-bin-y += bitops.init.o

please.

> --- /dev/null
> +++ b/xen/common/bitops.c
> @@ -0,0 +1,41 @@
> +#include <xen/bitops.h>
> +#include <xen/bug.h>
> +#include <xen/init.h>
> +
> +/* Hide a value from the optimiser. */
> +#define HIDE(x) ({ typeof(x) _x = x; asm volatile ( "" : "+r" (_x) ); _x; })

Irrespective of the question of leading underscores, x wants parenthesizing here.

> +/*
> + * Check that fn(val) can be calcuated by the compiler, and that it gives the
> + * expected answer.
> + */
> +#define COMPILE_CHECK(fn, val, res)                                     \
> +    do {                                                                \
> +        if ( fn(val) != res )                                           \
> +            asm (".error \"Compile time check '" STR(fn(val) == res) "' failed\""); \

Nit: Blanks missing immediately inside the outermost pair of parentheses. (As
per your own reply it's unclear whether this would actually survive.)

> --- a/xen/include/xen/compiler.h
> +++ b/xen/include/xen/compiler.h
> @@ -85,7 +85,8 @@
>  #define inline inline __init
>  #endif
>  
> -#define __attribute_pure__  __attribute__((__pure__))
> +#define __pure  __attribute__((__pure__))

I'd say either there be just a single padding blank or enough to align the
rhs with ...

>  #define __attribute_const__ __attribute__((__const__))
>  #define __transparent__     __attribute__((__transparent_union__))

... these.

Jan



* Re: [PATCH 2/7] xen/bitops: Implement ffs() in common logic
  2024-03-13 17:27 ` [PATCH 2/7] xen/bitops: Implement ffs() in common logic Andrew Cooper
@ 2024-03-14 14:16   ` Jan Beulich
  2024-03-14 16:23     ` Andrew Cooper
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2024-03-14 14:16 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 13.03.2024 18:27, Andrew Cooper wrote:
> --- a/xen/arch/arm/include/asm/bitops.h
> +++ b/xen/arch/arm/include/asm/bitops.h
> @@ -157,7 +157,7 @@ static inline int fls(unsigned int x)
>  }
>  
>  
> -#define ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
> +#define arch_ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })

The way the macro is invoked, I don't think the helper local variable
is then needed anymore?

> --- a/xen/arch/x86/include/asm/bitops.h
> +++ b/xen/arch/x86/include/asm/bitops.h
> @@ -430,16 +430,23 @@ static inline int ffsl(unsigned long x)
>      return (int)r+1;
>  }
>  
> -static inline int ffs(unsigned int x)
> +static inline unsigned int arch_ffs(unsigned int x)
>  {
> -    int r;
> +    int r = -1;
> +
> +    /*
> +     * The AMD manual states that BSF won't modify the destination register if
> +     * x=0.  The Intel manual states that the result is undefined, but the
> +     * architects have said that the register is written back with it's old
> +     * value, possibly zero extended above 32 bits.
> +     */
> +    asm ( "bsf %[val], %[res]"
> +          : [res] "+r" (r)
> +          : [val] "rm" (x) );

And this isn't what the compiler would be doing anyway?

Also, just to mention it: I take it that you/we are sure that disallowing
both operands to be the same register is still better than ...

> -    asm ( "bsf %1,%0\n\t"
> -          "jnz 1f\n\t"
> -          "mov $-1,%0\n"
> -          "1:" : "=r" (r) : "rm" (x));

... the original form?

> --- a/xen/common/bitops.c
> +++ b/xen/common/bitops.c
> @@ -34,8 +34,18 @@
>          RUNTIME_CHECK(fn, val, res);            \
>      } while ( 0 )
>  
> +static void test_ffs(void)

Nit: __init please, even if there ought to be no reason for the compiler
to not inline this function.

> --- a/xen/include/xen/bitops.h
> +++ b/xen/include/xen/bitops.h
> @@ -110,6 +110,21 @@ static inline int generic_flsl(unsigned long x)
>  
>  #include <asm/bitops.h>
>  
> +/*
> + * Find First Set bit.  Bits are labelled from 1.
> + */
> +static always_inline __pure unsigned int ffs(unsigned int x)

Why always_inline?

> +{
> +    if ( __builtin_constant_p(x) )
> +        return __builtin_ffs(x);
> +
> +#ifndef arch_ffs
> +#define arch_ffs __builtin_ffs
> +#endif
> +
> +    return arch_ffs(x);
> +}

Just to mention it: __builtin_ffs() takes and returns plain int. I'm
happy about our own helper being unsigned-correct, but anything like
this has a Misra angle too.

Jan



* Re: [PATCH 6/7] xen: Swap find_first_set_bit() for ffsl() - 1
  2024-03-13 17:27 ` [PATCH 6/7] xen: Swap find_first_set_bit() for ffsl() - 1 Andrew Cooper
@ 2024-03-14 14:30   ` Jan Beulich
  2024-03-14 16:48     ` Oleksii
  2024-03-14 18:47     ` Andrew Cooper
  0 siblings, 2 replies; 33+ messages in thread
From: Jan Beulich @ 2024-03-14 14:30 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 13.03.2024 18:27, Andrew Cooper wrote:
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
>      if ( contig_mask )
>      {
>          /* See pt-contig-markers.h for a description of the marker scheme. */
> -        unsigned int i, shift = find_first_set_bit(contig_mask);
> +        unsigned int i, shift = ffsl(contig_mask) - 1;

The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and
why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
introduced).

But what I first of all would like to have clarification on is what your
(perhaps just abstract at this point) plans are wrt ffz() / ffzl().
Potential side-by-side uses would be odd now, and would continue to be odd
if the difference in bit labeling was retained. Since we're switching to
a consolidated set of basic helpers, such an anomaly would better not
survive imo.

Jan



* [RISCV] [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs()
  2024-03-13 17:27 [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() Andrew Cooper
                   ` (6 preceding siblings ...)
  2024-03-13 17:27 ` [PATCH 7/7] xen/bitops: Delete find_first_set_bit() Andrew Cooper
@ 2024-03-14 14:45 ` Andrew Cooper
  2024-03-14 15:33   ` Jan Beulich
  7 siblings, 1 reply; 33+ messages in thread
From: Andrew Cooper @ 2024-03-14 14:45 UTC (permalink / raw)
  To: Xen-devel
  Cc: Jan Beulich, Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini

On 13/03/2024 5:27 pm, Andrew Cooper wrote:
> Start cleaning it up with ffs() and friends.  Across the board, this adds:
>
>  * Functioning bitops without arch-specific asm

It turns out that RISC-V doesn't have a CLZ instruction in the base
ISA.  As a consequence, __builtin_ffs() emits a library call to ffs() on
GCC, or a de Bruijn sequence on Clang.

The optional Zbb extension adds a CLZ instruction, after which
__builtin_ffs() emits a very simple sequence.

This leaves us with several options.

1) Put generic_ffs() back in, although if we do this then it's going to
be out-of-line in lib/ where it can be mostly ignored.

2) Require Zbb for Xen.

3) Alternative it up with Zbb or generic_ffs().


I've got half a mind to do 1) irrespective.  It's mostly just shuffling
logic out of bitops.h into lib/.

I also think we should do option 2 for RISCV.  Given the instruction
groups that H does mandate, it's unrealistic to expect that such a chip
wouldn't support Zbb/etc.

Also, getting full alternatives working is yet-more work that's not
trivial at this point in RISCV's development.  I think it is entirely
reasonable to avoid this work for now, and make it a problem for anyone
who has an H-capable Zbb-incapable system.  (with a strong implication
that this is work that probably never needs to be done.)

~Andrew



* Re: [RISCV] [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs()
  2024-03-14 14:45 ` [RISCV] [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() Andrew Cooper
@ 2024-03-14 15:33   ` Jan Beulich
  2024-03-14 15:55     ` Andrew Cooper
  2024-03-14 16:32     ` Oleksii
  0 siblings, 2 replies; 33+ messages in thread
From: Jan Beulich @ 2024-03-14 15:33 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 14.03.2024 15:45, Andrew Cooper wrote:
> On 13/03/2024 5:27 pm, Andrew Cooper wrote:
>> Start cleaning it up with ffs() and friends.  Across the board, this adds:
>>
>>  * Functioning bitops without arch-specific asm
> 
> It turns out that RISC-V doesn't have a CLZ instruction in the base
> ISA.  As a consequence, __builtin_ffs() emits a library call to ffs() on
> GCC, or a de Bruijn sequence on Clang.
> 
> The optional Zbb extension adds a CLZ instruction, after which
> __builtin_ffs() emits a very simple sequence.
> 
> This leaves us with several options.
> 
> 1) Put generic_ffs() back in, although if we do this then it's going to
> be out-of-line in lib/ where it can be mostly ignored.
> 
> 2) Require Zbb for Xen.
> 
> 3) Alternative it up with Zbb or generic_ffs().
> 
> 
> I've got half a mind to do 1) irrespective.  It's mostly just shuffling
> logic out of bitops.h into lib/.

Yes. Might also help with the bi-sectability issue you faced.

> I also think we should do option 2 for RISCV.  Given the instruction
> groups that H does mandate, it's unrealistic to expect that such a chip
> wouldn't support Zbb/etc.

I'm not so sure here.

> Also, getting full alternatives working is yet-more work that's not
> trivial at this point in RISCV's development.  I think it is entirely
> reasonable to avoid this work for now, and make it a problem for anyone
> who has an H-capable Zbb-incapable system.  (with a strong implication
> that this is work that probably never needs to be done.)

That's definitely for later.

Jan



* Re: [RISCV] [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs()
  2024-03-14 15:33   ` Jan Beulich
@ 2024-03-14 15:55     ` Andrew Cooper
  2024-03-14 16:32     ` Oleksii
  1 sibling, 0 replies; 33+ messages in thread
From: Andrew Cooper @ 2024-03-14 15:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 14/03/2024 3:33 pm, Jan Beulich wrote:
> On 14.03.2024 15:45, Andrew Cooper wrote:
>> On 13/03/2024 5:27 pm, Andrew Cooper wrote:
>>> Start cleaning it up with ffs() and friends.  Across the board, this adds:
>>>
>>>  * Functioning bitops without arch-specific asm
>> It turns out that RISC-V doesn't have a CLZ instruction in the base
>> ISA.  As a consequence, __builtin_ffs() emits a library call to ffs() on
>> GCC, or a de Bruijn sequence on Clang.
>>
>> The optional Zbb extension adds a CLZ instruction, after which
>> __builtin_ffs() emits a very simple sequence.
>>
>> This leaves us with several options.
>>
>> 1) Put generic_ffs() back in, although if we do this then it's going to
>> be out-of-line in lib/ where it can be mostly ignored.
>>
>> 2) Require Zbb for Xen.
>>
>> 3) Alternative it up with Zbb or generic_ffs().
>>
>>
>> I've got half a mind to do 1) irrespective.  It's mostly just shuffling
>> logic out of bitops.h into lib/.
> Yes. Might also help with the bi-sectability issue you faced.

I'm not sure it will help for bisectability in this case.  But it might
simplify some of the other rearrangements.
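For reference, the de Bruijn fallback mentioned above works roughly like this (a standalone sketch using the widely known 32-bit constant/table pair from Bit Twiddling Hacks, not Clang's exact output):

```c
#include <stdint.h>

/*
 * ffs() without a CTZ/CLZ instruction: isolate the lowest set bit, then
 * multiply by a de Bruijn constant so the top 5 bits of the product form
 * a unique index into a lookup table.
 */
static unsigned int debruijn_ffs(uint32_t x)
{
    static const unsigned char tab[32] = {
         0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
        31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9,
    };

    if ( !x )
        return 0;

    /* (x & -x) leaves only the lowest set bit; bits are labelled from 1. */
    return tab[((x & -x) * 0x077cb531u) >> 27] + 1;
}
```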


>> I also think we should do option 2 for RISCV.  Given the instruction
>> groups that H does mandate, it's unrealistic to expect that such a chip
>> wouldn't support Zbb/etc.
> I'm not so sure here.
>
>> Also, getting full alternatives working is yet-more work that's not
>> trivial at this point in RISCV's development.  I think it is entirely
>> reasonable to avoid this work for now, and make it a problem for anyone
>> who has an H-capable Zbb-incapable system.  (with a strong implication
>> that this is work that probably never needs to be done.)
> That's definitely for later.

The argument being made is that it seems highly unlikely for there to be
non-Zbb systems running Xen, and furthermore, if this turns out not to
be true, it is reasonable to offload the effort of making it work to
whoever has hardware looking like that.

i.e. it's fine to require Zbb at this point.

This doesn't prevent someone else doing the work to alter this in the
future, for what appears to be an absurd configuration that is unlikely
to exist in reality.

~Andrew



* Re: [PATCH 5/7] xen/bitops: Implement ffs64() in common logic
  2024-03-13 17:27 ` [PATCH 5/7] xen/bitops: Implement ffs64() in common logic Andrew Cooper
@ 2024-03-14 15:56   ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2024-03-14 15:56 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 13.03.2024 18:27, Andrew Cooper wrote:
> --- a/xen/common/bitops.c
> +++ b/xen/common/bitops.c
> @@ -47,6 +47,18 @@ static void test_ffs(void)
>      CHECK(ffsl, 1UL << (BITS_PER_LONG - 1), BITS_PER_LONG);
>      if ( BITS_PER_LONG > 32 )
>          CHECK(ffsl, 1UL << 32, 33);
> +
> +    /*
> +     * unsigned int ffs64(uint64_t)
> +     *
> +     * 32-bit builds of Xen have to split this into two adjacent operations,
> +     * so test all interesting bit positions.
> +     */
> +    CHECK(ffs64, 0, 0);
> +    CHECK(ffs64, 1, 1);
> +    CHECK(ffs64, (uint64_t)0x0000000080000000, 32);
> +    CHECK(ffs64, (uint64_t)0x0000000100000000, 33);
> +    CHECK(ffs64, (uint64_t)0x8000000000000000, 64);

I'm pretty sure Misra will want ULL suffixes on the last two and at least an UL
one on the middle of the lines. The casts aren't going to help (and could then
be dropped).
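For concreteness, a standalone sketch of the suggested form (CHECK and ffs64 here are minimal stand-ins for Xen's real self-test macro and helper, not the actual implementations):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for Xen's ffs64(): bits labelled from 1, 0 for no bits set. */
static unsigned int ffs64(uint64_t x)
{
    return x ? (unsigned int)__builtin_ctzll(x) + 1 : 0;
}

/* Stand-in for the self-test's CHECK() macro. */
#define CHECK(fn, val, res) assert(fn(val) == (res))

static void test_ffs64_suffixed(void)
{
    /* Suffixed constants as suggested; the casts can then be dropped. */
    CHECK(ffs64, 0, 0);
    CHECK(ffs64, 1, 1);
    CHECK(ffs64, 0x0000000080000000UL, 32);
    CHECK(ffs64, 0x0000000100000000ULL, 33);
    CHECK(ffs64, 0x8000000000000000ULL, 64);
}
```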

Jan



* Re: [PATCH 7/7] xen/bitops: Delete find_first_set_bit()
  2024-03-13 17:27 ` [PATCH 7/7] xen/bitops: Delete find_first_set_bit() Andrew Cooper
@ 2024-03-14 15:59   ` Jan Beulich
  2024-03-14 17:14     ` Andrew Cooper
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2024-03-14 15:59 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 13.03.2024 18:27, Andrew Cooper wrote:
> --- a/xen/arch/x86/include/asm/bitops.h
> +++ b/xen/arch/x86/include/asm/bitops.h
> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
>      r__;                                                                    \
>  })
>  
> -/**
> - * find_first_set_bit - find the first set bit in @word
> - * @word: the word to search
> - * 
> - * Returns the bit-number of the first set bit. The input must *not* be zero.
> - */
> -static inline unsigned int find_first_set_bit(unsigned long word)
> -{
> -    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
> -    return (unsigned int)word;
> -}

And you think it's okay to no longer use TZCNT like this when available,
where the output doesn't have to have its value set up front?

Jan



* Re: [PATCH 2/7] xen/bitops: Implement ffs() in common logic
  2024-03-14 14:16   ` Jan Beulich
@ 2024-03-14 16:23     ` Andrew Cooper
  2024-03-14 16:35       ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Andrew Cooper @ 2024-03-14 16:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 14/03/2024 2:16 pm, Jan Beulich wrote:
> On 13.03.2024 18:27, Andrew Cooper wrote:
>> --- a/xen/arch/arm/include/asm/bitops.h
>> +++ b/xen/arch/arm/include/asm/bitops.h
>> @@ -157,7 +157,7 @@ static inline int fls(unsigned int x)
>>  }
>>  
>>  
>> -#define ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
>> +#define arch_ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
> The way the macro is invoked, I don't think the helper local variable
> is then needed anymore?

I strongly suspect it is still needed.  ISOLATE_LSB() double-expands its
parameter.

Either way, I'm not reopening that can of worms that lead to this form.
>> --- a/xen/arch/x86/include/asm/bitops.h
>> +++ b/xen/arch/x86/include/asm/bitops.h
>> @@ -430,16 +430,23 @@ static inline int ffsl(unsigned long x)
>>      return (int)r+1;
>>  }
>>  
>> -static inline int ffs(unsigned int x)
>> +static inline unsigned int arch_ffs(unsigned int x)
>>  {
>> -    int r;
>> +    int r = -1;
>> +
>> +    /*
>> +     * The AMD manual states that BSF won't modify the destination register if
>> +     * x=0.  The Intel manual states that the result is undefined, but the
>> +     * architects have said that the register is written back with it's old
>> +     * value, possibly zero extended above 32 bits.
>> +     */
>> +    asm ( "bsf %[val], %[res]"
>> +          : [res] "+r" (r)
>> +          : [val] "rm" (x) );
> And this isn't what the compiler would be doing anyway?

No.  The builtin avoids all undefined behaviour, and is quite a lot of
asm as a result.

With some help from the gcc mailing list
https://gcc.gnu.org/pipermail/gcc/2024-March/243465.html I've found a
solution which improves things in the common case.

> Also, just to mention it: I take it that you/we are sure that disallowing
> both operands to be the same register is still better than ...
>
>> -    asm ( "bsf %1,%0\n\t"
>> -          "jnz 1f\n\t"
>> -          "mov $-1,%0\n"
>> -          "1:" : "=r" (r) : "rm" (x));
> ... the original form?

Yes.  Without any doubt, on a 64bit CPU.

This transformation isn't safe on a 486, but I expect even the later
32bit CPUs lacking register renaming would still be better with the
non-branch form.


>> --- a/xen/include/xen/bitops.h
>> +++ b/xen/include/xen/bitops.h
>> @@ -110,6 +110,21 @@ static inline int generic_flsl(unsigned long x)
>>  
>>  #include <asm/bitops.h>
>>  
>> +/*
>> + * Find First Set bit.  Bits are labelled from 1.
>> + */
>> +static always_inline __pure unsigned int ffs(unsigned int x)
> Why always_inline?

For all the normal reasons to counter Clang and GCC doing stupid things
with inlines that contain assembly.

>
>> +{
>> +    if ( __builtin_constant_p(x) )
>> +        return __builtin_ffs(x);
>> +
>> +#ifndef arch_ffs
>> +#define arch_ffs __builtin_ffs
>> +#endif
>> +
>> +    return arch_ffs(x);
>> +}
> Just to mention it: __builtin_ffs() takes and returns plain int. I'm
> happy about our own helper being unsigned-correct, but anything like
> this has a Misra angle too.

I did note that, and decided it could wait until some other point.

~Andrew



* Re: [RISCV] [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs()
  2024-03-14 15:33   ` Jan Beulich
  2024-03-14 15:55     ` Andrew Cooper
@ 2024-03-14 16:32     ` Oleksii
  1 sibling, 0 replies; 33+ messages in thread
From: Oleksii @ 2024-03-14 16:32 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Shawn Anastasio,
	consulting @ bugseng . com, Simone Ballarin, Federico Serafini,
	Nicola Vetrini, Xen-devel

On Thu, 2024-03-14 at 16:33 +0100, Jan Beulich wrote:
> On 14.03.2024 15:45, Andrew Cooper wrote:
> > On 13/03/2024 5:27 pm, Andrew Cooper wrote:
> > > Start cleaning it up with ffs() and friends.  Across the board,
> > > this adds:
> > > 
> > >  * Functioning bitops without arch-specific asm
> > 
> > It turns out that RISC-V doesn't have a CLZ instruction in the base
> > ISA.  As a consequence, __builtin_ffs() emits a library call to
> > ffs() on
> > GCC, or a de Bruijn sequence on Clang.
> > 
> > The optional Zbb extension adds a CLZ instruction, after which
> > __builtin_ffs() emits a very simple sequence.
> > 
> > This leaves us with several options.
> > 
> > 1) Put generic_ffs() back in, although if we do this then it's
> > going to
> > be out-of-line in lib/ where it can be mostly ignored.
> > 
> > 2) Require Zbb for Xen.
> > 
> > 3) Alternative it up with Zbb or generic_ffs().
> > 
> > 
> > I've got half a mind to do 1) irrespective.  It's mostly just
> > shuffling
> > logic out of bitops.h into lib/.
> 
> Yes. Might also help with the bi-sectability issue you faced.
> 
> > I also think we should do option 2 for RISCV.  Given the
> > instruction
> > groups that H does mandate, it's unrealistic to expect that such a
> > chip
> > wouldn't support Zbb/etc.
> 
> I'm not so sure here.
If we look at the available specs of CPUs with H then, for example, the
SiFive P600 series family doesn't support the Zbb extension:
https://sifive.cdn.prismic.io/sifive/7be0420e-dac1-4558-85bc-50c7a10787e7_p600_datasheet.pdf

But I asked a team who are producing a CPU with H support, and they do
have the Zbb extension.

> 
> > Also, getting full alternatives working is yet-more work that's not
> > trivial at this point in RISCV's development.  I think it is
> > entirely
> > reasonable to avoid this work for now, and make it a problem for
> > anyone
> > who has an H-capable Zbb-incapable system.  (with a strong
> > implication
> > that this is work that probably never needs to be done.)
> 
> That's definitely for later.
Considering that we are mainly using QEMU, which provides the Zbb
extension (so we can just update -march), and that the real hardware on
which I can ask for code to be tested also supports this extension, I
will update riscv/booting.txt and -march accordingly.

~ Oleksii



* Re: [PATCH 2/7] xen/bitops: Implement ffs() in common logic
  2024-03-14 16:23     ` Andrew Cooper
@ 2024-03-14 16:35       ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2024-03-14 16:35 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 14.03.2024 17:23, Andrew Cooper wrote:
> On 14/03/2024 2:16 pm, Jan Beulich wrote:
>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>> --- a/xen/arch/arm/include/asm/bitops.h
>>> +++ b/xen/arch/arm/include/asm/bitops.h
>>> @@ -157,7 +157,7 @@ static inline int fls(unsigned int x)
>>>  }
>>>  
>>>  
>>> -#define ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
>>> +#define arch_ffs(x) ({ unsigned int __t = (x); fls(ISOLATE_LSB(__t)); })
>> The way the macro is invoked, I don't think the helper local variable
>> is then needed anymore?
> 
> I strongly suspect It is still needed.  ISOLATE_LSB() double-expands its
> parameter.

Even that double evaluation doesn't matter when the invoking entity is an
inline function, and it doesn't use any non-trivial expression as argument.

>>> --- a/xen/include/xen/bitops.h
>>> +++ b/xen/include/xen/bitops.h
>>> @@ -110,6 +110,21 @@ static inline int generic_flsl(unsigned long x)
>>>  
>>>  #include <asm/bitops.h>
>>>  
>>> +/*
>>> + * Find First Set bit.  Bits are labelled from 1.
>>> + */
>>> +static always_inline __pure unsigned int ffs(unsigned int x)
>> Why always_inline?
> 
> For all the normal reasons to counter Clang and GCC doing stupid things
> with inlines that contain assembly.

Hmm, there are issues when the asm() would look "complex" to the compiler,
but that's not the case here. I was asking because, as you imply by how
you responded, we may need to gain many more always_inline when at some
time even you were arguing against overriding compiler decisions like this
(unless I'm mis-remembering).

Jan



* Re: [PATCH 6/7] xen: Swap find_first_set_bit() for ffsl() - 1
  2024-03-14 14:30   ` Jan Beulich
@ 2024-03-14 16:48     ` Oleksii
  2024-03-14 16:55       ` Jan Beulich
  2024-03-14 18:47     ` Andrew Cooper
  1 sibling, 1 reply; 33+ messages in thread
From: Oleksii @ 2024-03-14 16:48 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Shawn Anastasio,
	consulting @ bugseng . com, Simone Ballarin, Federico Serafini,
	Nicola Vetrini, Xen-devel

On Thu, 2024-03-14 at 15:30 +0100, Jan Beulich wrote:
> On 13.03.2024 18:27, Andrew Cooper wrote:
> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct
> > domain_iommu *hd,
> >      if ( contig_mask )
> >      {
> >          /* See pt-contig-markers.h for a description of the marker
> > scheme. */
> > -        unsigned int i, shift = find_first_set_bit(contig_mask);
> > +        unsigned int i, shift = ffsl(contig_mask) - 1;
> 
> The need for subtracting 1 is why personally I dislike ffs() / ffsl()
> (and
> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
> introduced).
> 
> But what I first of all would like to have clarification on is what
> your
> (perhaps just abstract at this point) plans are wrt ffz() / ffzl().
> Potential side-by-side uses would be odd now, and would continue to
> be odd
> if the difference in bit labeling was retained. Since we're switching
> to
> a consolidated set of basic helpers, such an anomaly would better not
> survive imo.
Right now, ffz() is defined as __ffs(~(x)), and it seems to me that
__ffs()/ffz() exist only for Linux compatibility, so as part of the
RISC-V patch series I wanted to put them into xen/linux-compat.h and
just include this header where necessary:

#define __ffs(x) (ffs(x) - 1)
#define ffz(x) __ffs(~(x))

Why should we care about ffzl()? It is not used in Xen, is it?
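A standalone sketch of such a compat layer (xen_ffs() stands in for Xen's ffs(); note that a zero-based __ffs() is ffs(x) - 1, and both __ffs() and ffz() require the searched-for bit to exist):

```c
#include <assert.h>

/* Stand-in for Xen's ffs(): bits labelled from 1, 0 for no bits set. */
#define xen_ffs(x) ((unsigned int)__builtin_ffs((int)(x)))

/* Linux-compat: __ffs() labels bits from 0 and needs a nonzero input. */
#define __ffs(x) (xen_ffs(x) - 1)

/* Linux-compat: position of the first *clear* bit, also zero-based. */
#define ffz(x) __ffs(~(x))
```

For example, __ffs(0x8) is 3, and ffz(0x7) is also 3 (bit 3 is the first clear bit).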

~ Oleksii





* Re: [PATCH 6/7] xen: Swap find_first_set_bit() for ffsl() - 1
  2024-03-14 16:48     ` Oleksii
@ 2024-03-14 16:55       ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2024-03-14 16:55 UTC (permalink / raw)
  To: Oleksii
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Shawn Anastasio,
	consulting @ bugseng . com, Simone Ballarin, Federico Serafini,
	Nicola Vetrini, Xen-devel, Andrew Cooper

On 14.03.2024 17:48, Oleksii wrote:
> On Thu, 2024-03-14 at 15:30 +0100, Jan Beulich wrote:
>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct
>>> domain_iommu *hd,
>>>      if ( contig_mask )
>>>      {
>>>          /* See pt-contig-markers.h for a description of the marker
>>> scheme. */
>>> -        unsigned int i, shift = find_first_set_bit(contig_mask);
>>> +        unsigned int i, shift = ffsl(contig_mask) - 1;
>>
>> The need for subtracting 1 is why personally I dislike ffs() / ffsl()
>> (and
>> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
>> introduced).
>>
>> But what I first of all would like to have clarification on is what
>> your
>> (perhaps just abstract at this point) plans are wrt ffz() / ffzl().
>> Potential side-by-side uses would be odd now, and would continue to
>> be odd
>> if the difference in bit labeling was retained. Since we're switching
>> to
>> a consolidated set of basic helpers, such an anomaly would better not
>> survive imo.
> Right now, ffz() is defined as __ffs(~(x)), so and it seems to me
> __ffs()/ffz() exist only as a Linux compatible, so I wanted as a part
> of RISC-V patch series put into xen/linux-compat.h and just include
> this header where it will be necessary:
> 
> #define __ffs(x) (ffs(x) - 1)
> #define ffz(x) __ffs(~(x))

Well, right now ffz() is used in just a single file. One option therefore
would be to not have it available generally, and - as you say - if need
be supply it in linux-compat.h. Another option would be to have something
along its lines generally available, if we deem it useful.

> Why should we care about ffzl()? It is not used in Xen, is it?

I find it odd to have ffs() and ffsl(), but then ffz() without ffzl().
That's not my understanding of a generally useful (and largely free of
surprises) set of library routines.

Jan



* Re: [PATCH 7/7] xen/bitops: Delete find_first_set_bit()
  2024-03-14 15:59   ` Jan Beulich
@ 2024-03-14 17:14     ` Andrew Cooper
  2024-03-15 13:48       ` Andrew Cooper
  0 siblings, 1 reply; 33+ messages in thread
From: Andrew Cooper @ 2024-03-14 17:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 14/03/2024 3:59 pm, Jan Beulich wrote:
> On 13.03.2024 18:27, Andrew Cooper wrote:
>> --- a/xen/arch/x86/include/asm/bitops.h
>> +++ b/xen/arch/x86/include/asm/bitops.h
>> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
>>      r__;                                                                    \
>>  })
>>  
>> -/**
>> - * find_first_set_bit - find the first set bit in @word
>> - * @word: the word to search
>> - * 
>> - * Returns the bit-number of the first set bit. The input must *not* be zero.
>> - */
>> -static inline unsigned int find_first_set_bit(unsigned long word)
>> -{
>> -    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
>> -    return (unsigned int)word;
>> -}
> And you think it's okay to no longer use TZCNT like this when available,
> where the output doesn't have to have its value set up front?

This is a particularly evil piece of inline asm.

It is interpreted as BSF or TZCNT depending on the BMI instruction set
(Haswell/Piledriver era).  Furthermore there are errata on some Intel
systems where REP BSF behaves as per TZCNT *even* when BMI isn't enumerated.

Which means this piece of asm suffers from all of an undefined output
register, undefined CF behaviour, and differing ZF behaviour (I believe)
depending on which hardware you're running on.

The only thing the REP prefix is getting you is a deterministic 0 in the
destination register, on some hardware only, for code which has already
violated the input safety condition.  As a piece of defence in depth,
then perhaps it's useful.

But following up from the other thread,
https://gcc.gnu.org/pipermail/gcc/2024-March/243475.html is a form where
the compiler can (and does!) simplify back to the plain BSF form when it
can prove that this is safe.


The only case where using TZCNT is helpful is when we're compiling for
x86_64-v3 and there is no need to work around BSF's undefined behaviour.

Even with x86's arch_ffs() now split nicely based on whether the
compiler knows BSF is safe or not, an alternative to swap between BSF
and TZCNT probably isn't a win; you still have to cover up 6 or 7 bytes
of the -1 setup, which you can't do with leading prefixes on the TZCNT
itself.

All CPUs with BMI can swallow the double-instruction data dependency
without breaking a sweat, at which point you're trading off (at best) a
1-cycle improvement vs the setup costs of the alternative.  If there is
any real improvement to be had here, it's marginal enough that I'm not
sure it's worth doing.

~Andrew



* Re: [PATCH 6/7] xen: Swap find_first_set_bit() for ffsl() - 1
  2024-03-14 14:30   ` Jan Beulich
  2024-03-14 16:48     ` Oleksii
@ 2024-03-14 18:47     ` Andrew Cooper
  2024-03-14 18:51       ` Andrew Cooper
  1 sibling, 1 reply; 33+ messages in thread
From: Andrew Cooper @ 2024-03-14 18:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 14/03/2024 2:30 pm, Jan Beulich wrote:
> On 13.03.2024 18:27, Andrew Cooper wrote:
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
>>      if ( contig_mask )
>>      {
>>          /* See pt-contig-markers.h for a description of the marker scheme. */
>> -        unsigned int i, shift = find_first_set_bit(contig_mask);
>> +        unsigned int i, shift = ffsl(contig_mask) - 1;
> The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and
> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
> introduced).

It's sad that there are competing APIs with different bit-labelling, but
the optimiser does cancel the -1 with arch_ffs() (for at least x86 and
ARM that I studied in detail).

I firmly believe that fewer APIs which are fully well defined (and can
optimise based on the compiler's idea of safety) is still better than a
maze of APIs with different behaviours.

> But what I first of all would like to have clarification on is what your
> (perhaps just abstract at this point) plans are wrt ffz() / ffzl().
> Potential side-by-side uses would be odd now, and would continue to be odd
> if the difference in bit labeling was retained. Since we're switching to
> a consolidated set of basic helpers, such an anomaly would better not
> survive imo.

I honestly hadn't got that far yet.  I was mainly trying to dis-entangle
the existing mess so RISC-V wasn't making it yet-worse.

But yes - it warrants thinking about.


I was intending to do fls() next, then popcnt().   The latter has
quite a lot of cleanup wanting to come with it, and is more
architecturally invasive, and I know I've got a years-old outstanding
piece of work to try and do popcnt more nicely on x86.

I have wanted ffz() in the past.  I think I just went with explicit ~
because I didn't want to continue this debate at the time.

However, I (very much more) do not want a situation where ffs() and
ffz() have different bit-labellings.


There are no builtins, and having now studied the architectures we care
about... https://godbolt.org/z/KasP41n1e ...not even x86 has a "count
leading/trailing zeros" instruction.

So using ffs(~val) really will get you the best code generation
available, and seeing as it halves the number of bitops to maintain, I
think this is the best tradeoff overall.

I intend to put ffz() and __ffs() into linux-compat.h and leave them
there to discourage their use generally.

~Andrew



* Re: [PATCH 6/7] xen: Swap find_first_set_bit() for ffsl() - 1
  2024-03-14 18:47     ` Andrew Cooper
@ 2024-03-14 18:51       ` Andrew Cooper
  2024-03-18  9:13         ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Andrew Cooper @ 2024-03-14 18:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 14/03/2024 6:47 pm, Andrew Cooper wrote:
> On 14/03/2024 2:30 pm, Jan Beulich wrote:
>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
>>>      if ( contig_mask )
>>>      {
>>>          /* See pt-contig-markers.h for a description of the marker scheme. */
>>> -        unsigned int i, shift = find_first_set_bit(contig_mask);
>>> +        unsigned int i, shift = ffsl(contig_mask) - 1;
>> The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and
>> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
>> introduced).
> It's sad that there are competing APIs with different bit-labelling, but
> the optimiser does cancel the -1 with arch_ffs() (for at least x86 and
> ARM that I studied in detail).
>
> I firmly believe that fewer APIs which are fully well defined (and can
> optimise based on the compiler's idea of safety) is still better than a
> maze of APIs with different behaviours.
>
>> But what I first of all would like to have clarification on is what your
>> (perhaps just abstract at this point) plans are wrt ffz() / ffzl().
>> Potential side-by-side uses would be odd now, and would continue to be odd
>> if the difference in bit labeling was retained. Since we're switching to
>> a consolidated set of basic helpers, such an anomaly would better not
>> survive imo.
> I honestly hadn't got that far yet.  I was mainly trying to dis-entangle
> the existing mess so RISC-V wasn't making it yet-worse.
>
> But yes - it warrants thinking about.
>
>
> I was intending to do the fls() next then popcnt().   The latter has
> quite a lot of cleanup wanting to come with it, and is more
> architecturally invasive, and I know I've got a years-old outstanding
> piece of work to try and do popcnt more nicely on x86.
>
> I have wanted ffz() in the past.  I think I just went with explicit ~
> because I didn't want to continue this debate at the time.
>
> However, I (very much more) do not want a situation where ffs() and
> ffz() have different bit-labellings.
>
>
> There are no builtins, and having now studied the architectures we care
> about... https://godbolt.org/z/KasP41n1e ...not even x86 has a "count
> leading/trailing zeros" instruction.

Hopefully obviously, I meant ones here.   My point is that the
compiler-emitted code always has a NOT in it somewhere.

>
> So using ffs(~val) really will get you the best code generation
> available, and seeing as it halves the number of bitops to maintain, I
> think this is the best tradeoff overall.
>
> I intend to put ffz() and __ffs() into linux-compat.h and leave them
> there to discourage their use generally.
>
> ~Andrew




* Re: [PATCH 7/7] xen/bitops: Delete find_first_set_bit()
  2024-03-14 17:14     ` Andrew Cooper
@ 2024-03-15 13:48       ` Andrew Cooper
  2024-03-15 14:16         ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Andrew Cooper @ 2024-03-15 13:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 14/03/2024 5:14 pm, Andrew Cooper wrote:
> On 14/03/2024 3:59 pm, Jan Beulich wrote:
>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/include/asm/bitops.h
>>> +++ b/xen/arch/x86/include/asm/bitops.h
>>> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
>>>      r__;                                                                    \
>>>  })
>>>  
>>> -/**
>>> - * find_first_set_bit - find the first set bit in @word
>>> - * @word: the word to search
>>> - * 
>>> - * Returns the bit-number of the first set bit. The input must *not* be zero.
>>> - */
>>> -static inline unsigned int find_first_set_bit(unsigned long word)
>>> -{
>>> -    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
>>> -    return (unsigned int)word;
>>> -}
>> And you think it's okay to no longer use TZCNT like this when available,
>> where the output doesn't have to have its value set up front?
> This is a particularly evil piece of inline asm.
>
> It is interpreted as BSF or TZCNT depending on the BMI instruction set
> (Haswell/Piledriver era).  Furthermore there are errata on some Intel
> systems where REP BSF behaves as per TZCNT *even* when BMI isn't enumerated.
>
> Which means this piece of asm suffers from all of an undefined output
> register, undefined CF behaviour, and differing ZF behaviour (I believe)
> depending on which hardware you're running on.
>
> The only thing the REP prefix is getting you is a deterministic 0 in the
> destination register,

No, it doesn't.

For a zero input, TZCNT yields the operand size, so you get 16/32/64; 64
in this case.

It also means there's no chance of coming up with a useful alternative
for ffs() to use TZCNT when available.

~Andrew



* Re: [PATCH 7/7] xen/bitops: Delete find_first_set_bit()
  2024-03-15 13:48       ` Andrew Cooper
@ 2024-03-15 14:16         ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2024-03-15 14:16 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 15.03.2024 14:48, Andrew Cooper wrote:
> On 14/03/2024 5:14 pm, Andrew Cooper wrote:
>> On 14/03/2024 3:59 pm, Jan Beulich wrote:
>>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>>> --- a/xen/arch/x86/include/asm/bitops.h
>>>> +++ b/xen/arch/x86/include/asm/bitops.h
>>>> @@ -401,18 +401,6 @@ static always_inline unsigned int __scanbit(unsigned long val, unsigned int max)
>>>>      r__;                                                                    \
>>>>  })
>>>>  
>>>> -/**
>>>> - * find_first_set_bit - find the first set bit in @word
>>>> - * @word: the word to search
>>>> - * 
>>>> - * Returns the bit-number of the first set bit. The input must *not* be zero.
>>>> - */
>>>> -static inline unsigned int find_first_set_bit(unsigned long word)
>>>> -{
>>>> -    asm ( "rep; bsf %1,%0" : "=r" (word) : "rm" (word) );
>>>> -    return (unsigned int)word;
>>>> -}
>>> And you think it's okay to no longer use TZCNT like this when available,
>>> where the output doesn't have to have its value set up front?
>> This is a particularly evil piece of inline asm.
>>
>> It is interpreted as BSF or TZCNT depending on the BMI instruction set
>> (Haswell/Piledriver era).  Furthermore there are errata on some Intel
>> systems where REP BSF behaves as per TZCNT *even* when BMI isn't enumerated.
>>
>> Which means this piece of asm suffers from all of an undefined output
>> register, undefined CF behaviour, and differing ZF behaviour (I believe)
>> depending on which hardware you're running on.
>>
>> The only thing the REP prefix is getting you is a deterministic 0 in the
>> destination register,
> 
> No, it doesn't.
> 
> For a zero input, TZCNT yields the operand size, so you get 16/32/64; 64
> in this case.
> 
> It also means there's no chance of coming up with a useful alternative
> for ffs() to use TZCNT when available.

Right, for ffs() TZCNT isn't suitable. But for find_first_set_bit() it was,
yielding a reliably out-of-range output for zero input (which BSF wouldn't
guarantee).

Jan


* Re: [PATCH 6/7] xen: Swap find_first_set_bit() for ffsl() - 1
  2024-03-14 18:51       ` Andrew Cooper
@ 2024-03-18  9:13         ` Jan Beulich
  2024-03-18 12:27           ` Andrew Cooper
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2024-03-18  9:13 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 14.03.2024 19:51, Andrew Cooper wrote:
> On 14/03/2024 6:47 pm, Andrew Cooper wrote:
>> On 14/03/2024 2:30 pm, Jan Beulich wrote:
>>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>>> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
>>>>      if ( contig_mask )
>>>>      {
>>>>          /* See pt-contig-markers.h for a description of the marker scheme. */
>>>> -        unsigned int i, shift = find_first_set_bit(contig_mask);
>>>> +        unsigned int i, shift = ffsl(contig_mask) - 1;
>>> The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and
>>> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
>>> introduced).
>> It's sad that there are competing APIs with different bit-labelling, but
>> the optimiser does cancel the -1 with arch_ffs() (for at least x86 and
>> ARM that I studied in detail).
>>
>> I firmly believe that fewer APIs which are fully well defined (and can
>> optimise based on the compiler's idea of safety) is still better than a
>> maze of APIs with different behaviours.

I agree here. The anomaly (as I would call it) with ffs(), though, is what
makes me wonder whether we might not be better off introducing ctz() and
clz() instead. Unlike ffs() their name says exactly what is meant. This is
then also a clear hint, for Arm and RISC-V at least, as to which underlying
instruction is used. Plus there are matching builtins (unlike for e.g.
fls()).

>>> But what I first of all would like to have clarification on is what your
>>> (perhaps just abstract at this point) plans are wrt ffz() / ffzl().
>>> Potential side-by-side uses would be odd now, and would continue to be odd
>>> if the difference in bit labeling was retained. Since we're switching to
>>> a consolidated set of basic helpers, such an anomaly would better not
>>> survive imo.
>> I honestly hadn't got that far yet.  I was mainly trying to dis-entangle
>> the existing mess so RISC-V wasn't making it yet-worse.
>>
>> But yes - it warrants thinking about.
>>
>>
>> I was intending to do the fls() next then popcnt().   The latter has
>> quite a lot of cleanup wanting to come with it, and is more
>> architecturally invasive, and I know I've got a years-old outstanding
>> piece of work to try and do popcnt more nicely on x86.
>>
>> I have wanted ffz() in the past.  I think I just went with explicit ~
>> because I didn't want to continue this debate at the time.
>>
>> However, I (very much more) do not want a situation where ffs() and
>> ffz() have different bit-labellings.
>>
>>
>> There are no builtins, and having now studied the architectures we care
>> about... https://godbolt.org/z/KasP41n1e ...not even x86 has a "count
>> leading/trailing zeros" instruction.
> 
> Hopefully obviously, I meant ones here.   My point is that the compiler
> emitted code always has a NOT in it somewhere.

Right; I was about to ask but then remembered there was another mail from
you on this thread.

>> So using ffs(~val) really will get you the best code generation
>> available, and seeing as it halves the number of bitops to maintain, I
>> think this is the best tradeoff overall.
>>
>> I intend to put ffz() and __ffs() into linux-compat.h and leave them
>> there to discourage their use generally.

I'm okay with this plan. As per above I'd prefer if ffs() moved there, too.

Jan


* Re: [PATCH 6/7] xen: Swap find_first_set_bit() for ffsl() - 1
  2024-03-18  9:13         ` Jan Beulich
@ 2024-03-18 12:27           ` Andrew Cooper
  0 siblings, 0 replies; 33+ messages in thread
From: Andrew Cooper @ 2024-03-18 12:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Bertrand Marquis, Michal Orzel, Oleksii Kurochko,
	Shawn Anastasio, consulting @ bugseng . com, Simone Ballarin,
	Federico Serafini, Nicola Vetrini, Xen-devel

On 18/03/2024 9:13 am, Jan Beulich wrote:
> On 14.03.2024 19:51, Andrew Cooper wrote:
>> On 14/03/2024 6:47 pm, Andrew Cooper wrote:
>>> On 14/03/2024 2:30 pm, Jan Beulich wrote:
>>>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>>>> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
>>>>>      if ( contig_mask )
>>>>>      {
>>>>>          /* See pt-contig-markers.h for a description of the marker scheme. */
>>>>> -        unsigned int i, shift = find_first_set_bit(contig_mask);
>>>>> +        unsigned int i, shift = ffsl(contig_mask) - 1;
>>>> The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and
>>>> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
>>>> introduced).
>>> It's sad that there are competing APIs with different bit-labelling, but
>>> the optimiser does cancel the -1 with arch_ffs() (for at least x86 and
>>> ARM that I studied in detail).
>>>
>>> I firmly believe that fewer APIs which are fully well defined (and can
>>> optimise based on the compiler's idea of safety) is still better than a
>>> maze of APIs with different behaviours.
> I agree here. The anomaly (as I would call it) with ffs(), though, is what
> makes me wonder whether we might not be better off introducing ctz() and
> clz() instead. Unlike ffs() their name says exactly what is meant. This is
> then also a clear hint, for Arm and RISC-V at least, what underlying
> instruction is used. Plus there are matching builtins (unlike for e.g.
> fls()).

I considered this, but I think it would be a bad idea.

Right now, almost all of our logic is expressed in terms of
ffs()/fls().  Rearranging this to clz/ctz is risky enough on its own,
let alone the potential for mistakes during backport.

Both ffs() and fls() are well defined for all inputs, and I've found a
way to let the optimiser deal with simplifying things when safe to do so.

Therefore, keeping ffs()/fls() is the right thing to do.  It's harder to
shoot yourself in the foot with, and the optimiser can still do a good job
in the general case.

~Andrew


end of thread, other threads:[~2024-03-18 12:28 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-13 17:27 [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() Andrew Cooper
2024-03-13 17:27 ` [PATCH 1/7] xen/bitops: Cleanup ahead of rearrangements Andrew Cooper
2024-03-13 18:39   ` Shawn Anastasio
2024-03-13 23:06   ` Andrew Cooper
2024-03-14 13:59   ` Jan Beulich
2024-03-13 17:27 ` [PATCH 2/7] xen/bitops: Implement ffs() in common logic Andrew Cooper
2024-03-14 14:16   ` Jan Beulich
2024-03-14 16:23     ` Andrew Cooper
2024-03-14 16:35       ` Jan Beulich
2024-03-13 17:27 ` [PATCH 3/7] xen/bitops: Implement ffsl() " Andrew Cooper
2024-03-13 17:48   ` Andrew Cooper
2024-03-14 13:45     ` Andrew Cooper
2024-03-13 18:16   ` Andrew Cooper
2024-03-13 17:27 ` [PATCH 4/7] xen/bitops: Delete generic_ffs{,l}() Andrew Cooper
2024-03-13 17:27 ` [PATCH 5/7] xen/bitops: Implement ffs64() in common logic Andrew Cooper
2024-03-14 15:56   ` Jan Beulich
2024-03-13 17:27 ` [PATCH 6/7] xen: Swap find_first_set_bit() for ffsl() - 1 Andrew Cooper
2024-03-14 14:30   ` Jan Beulich
2024-03-14 16:48     ` Oleksii
2024-03-14 16:55       ` Jan Beulich
2024-03-14 18:47     ` Andrew Cooper
2024-03-14 18:51       ` Andrew Cooper
2024-03-18  9:13         ` Jan Beulich
2024-03-18 12:27           ` Andrew Cooper
2024-03-13 17:27 ` [PATCH 7/7] xen/bitops: Delete find_first_set_bit() Andrew Cooper
2024-03-14 15:59   ` Jan Beulich
2024-03-14 17:14     ` Andrew Cooper
2024-03-15 13:48       ` Andrew Cooper
2024-03-15 14:16         ` Jan Beulich
2024-03-14 14:45 ` [RISCV] [PATCH 0/7] xen/bitops: Reduce the mess, starting with ffs() Andrew Cooper
2024-03-14 15:33   ` Jan Beulich
2024-03-14 15:55     ` Andrew Cooper
2024-03-14 16:32     ` Oleksii
