* [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen.
@ 2020-11-11 21:51 Ash Wilding
  2020-11-11 21:51 ` [RFC PATCH v2 01/15] xen/arm: Support detection of CPU features in other ID registers Ash Wilding
                   ` (15 more replies)
  0 siblings, 16 replies; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>


Hey,

I've found some time to improve this series a bit:


Changes since RFC v1
====================

 - Broken up patches into smaller chunks to aid in readability.

 - As per Julien's feedback I've also introduced intermediary patches
   that first remove Xen's existing headers, then pull in the current
   Linux versions as-is. This means we only need to review the changes
   made while porting to Xen, rather than reviewing the existing Linux
   code.

 - Pull in Linux's <asm-generic/rwonce.h> as <xen/rwonce.h> for
   __READ_ONCE()/__WRITE_ONCE() instead of putting these in Xen's
   <xen/compiler.h>. While doing this, provide justification for
   dropping Linux's <linux/compiler_types.h> and relaxing the
   __READ_ONCE() usage of __unqual_scalar_typeof() to just typeof()
   (see patches #5 and #6).
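
   A rough sketch of what the relaxed helpers end up looking like after
   the port (illustrative only; the actual definitions land with the
   <xen/rwonce.h> patch):

       #define __READ_ONCE(x)   (*(const volatile typeof(x) *)&(x))
       #define __WRITE_ONCE(x, v) \
           do { *(volatile typeof(x) *)&(x) = (v); } while (0)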



Keeping the rest of the cover-letter unchanged as it would still be
good to discuss these things:


Arguments in favour of doing this
=================================

    - Lets SMMUv3 driver switch to using <asm/atomic.h> rather than
      maintaining its own implementation of the helpers.

    - Provides mitigation against XSA-295 [2], which affects both arm32
      and arm64, across all versions of Xen, and may allow a domU to
      maliciously or erroneously DoS the hypervisor.

    - All Armv8-A core implementations since ~2017 implement LSE, so
      there is an argument to be made that support in Xen is long
      overdue. This is compounded by LSE atomics being more performant
      than LL/SC atomics in most real-world applications, especially at
      high core counts.

    - We may also be able to get improved performance when using LL/SC,
      as Linux provides helpers with relaxed ordering requirements that
      are currently not available in Xen; to benefit from these we would
      need to go back through existing code and identify where the more
      relaxed versions can safely be used (see the sketch after this
      list).

    - Anything else?
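
      To illustrate the relaxed-ordering point above, a hypothetical
      example (the counter and function are made up; only
      atomic_add_return_relaxed() comes from the ported headers):

          /* A pure event counter needs atomicity but no ordering
           * guarantees, so the relaxed helper is sufficient and
           * avoids the implied barriers of atomic_add_return(). */
          static atomic_t nr_events;

          void account_event(void)
          {
              atomic_add_return_relaxed(1, &nr_events);
          }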


Arguments against doing this
============================

    - Limited testing infrastructure in place to ensure use of new
      atomics helpers does not introduce new bugs and regressions. This
      is a particularly strong argument given how difficult it can be to
      identify and debug malfunctioning atomics. The old adage applies,
      "If it ain't broke, don't fix it."

    - Anything else?


Disclaimers
===========

 - This is a very rough first-pass effort intended to spur the
   discussions along.

 - Only build-tested on arm64 and arm32, *not* run-tested. I did also
   build for x86_64 just to make sure I didn't inadvertently break that.

 - This version only tackles atomics and cmpxchg; I've not yet had a
   chance to look at locks so those are still using LL/SC.

 - The timeout variants of cmpxchg() (used by guest atomics) are still
   using LL/SC regardless of LSE being available, as these helpers are
   not provided by Linux so I just copied over the existing Xen ones.


Any further comments, guidance, and discussion on how to improve this 
approach and get LSE atomics support merged into Xen would be greatly
appreciated.

Thanks!
Ash.

[1] https://lore.kernel.org/xen-devel/13baac40-8b10-0def-4e44-0d8f655fcaf1@xen.org/
[2] https://xenbits.xen.org/xsa/advisory-295.txt

Ash Wilding (15):
  xen/arm: Support detection of CPU features in other ID registers
  xen/arm: Add detection of Armv8.1-LSE atomic instructions
  xen/arm: Add ARM64_HAS_LSE_ATOMICS hwcap
  xen/arm: Delete Xen atomics helpers
  xen/arm: pull in Linux atomics helpers and dependencies
  xen: port Linux <asm-generic/rwonce.h> to Xen
  xen/arm: prepare existing Xen headers for Linux atomics
  xen/arm64: port Linux LL/SC atomics helpers to Xen
  xen/arm64: port Linux LSE atomics helpers to Xen
  xen/arm64: port Linux's arm64 cmpxchg.h to Xen
  xen/arm64: port Linux's arm64 atomic.h to Xen
  xen/arm64: port Linux's arm64 lse.h to Xen
  xen/arm32: port Linux's arm32 atomic.h to Xen
  xen/arm32: port Linux's arm32 cmpxchg.h to Xen
  xen/arm: remove dependency on gcc built-in __sync_fetch_and_add()

 xen/arch/arm/Kconfig                     |  11 +
 xen/arch/arm/Makefile                    |   1 +
 xen/arch/arm/lse.c                       |  13 +
 xen/arch/arm/setup.c                     |  13 +-
 xen/include/asm-arm/arm32/atomic.h       | 253 +++++++-----
 xen/include/asm-arm/arm32/cmpxchg.h      | 388 +++++++++++-------
 xen/include/asm-arm/arm32/system.h       |   2 +-
 xen/include/asm-arm/arm64/atomic.h       | 236 +++++------
 xen/include/asm-arm/arm64/atomic_ll_sc.h | 231 +++++++++++
 xen/include/asm-arm/arm64/atomic_lse.h   | 246 +++++++++++
 xen/include/asm-arm/arm64/cmpxchg.h      | 497 ++++++++++++++++-------
 xen/include/asm-arm/arm64/lse.h          |  48 +++
 xen/include/asm-arm/arm64/system.h       |   2 +-
 xen/include/asm-arm/atomic.h             |  15 +-
 xen/include/asm-arm/cpufeature.h         |  57 +--
 xen/include/asm-arm/system.h             |   9 +-
 xen/include/xen/rwonce.h                 |  21 +
 17 files changed, 1469 insertions(+), 574 deletions(-)
 create mode 100644 xen/arch/arm/lse.c
 create mode 100644 xen/include/asm-arm/arm64/atomic_ll_sc.h
 create mode 100644 xen/include/asm-arm/arm64/atomic_lse.h
 create mode 100644 xen/include/asm-arm/arm64/lse.h
 create mode 100644 xen/include/xen/rwonce.h

-- 
2.24.3 (Apple Git-128)




* [RFC PATCH v2 01/15] xen/arm: Support detection of CPU features in other ID registers
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
@ 2020-11-11 21:51 ` Ash Wilding
  2020-11-11 21:51 ` [RFC PATCH v2 02/15] xen/arm: Add detection of Armv8.1-LSE atomic instructions Ash Wilding
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

The current Arm boot_cpu_feature64() and boot_cpu_feature32() macros
are hardcoded to only detect features in ID_AA64PFR{0,1}_EL1 and
ID_PFR{0,1} respectively.

This patch replaces these macros with a new macro, boot_cpu_feature(),
which takes an explicit ID register name as an argument.

While we're here, cull cpu_feature64() and cpu_feature32() as they
have no callers (we only ever use the boot CPU features), and update
the printk() messages in setup.c to use the new macro.

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
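For reviewers, the net effect at a use site (taken from the hunks below)
is simply that the ID register becomes an explicit argument:

    /* before */  #define cpu_has_fp  (boot_cpu_feature64(fp) < 8)
    /* after  */  #define cpu_has_fp  (boot_cpu_feature(pfr64, fp) < 8)
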
 xen/arch/arm/setup.c             |  8 +++---
 xen/include/asm-arm/cpufeature.h | 44 +++++++++++++++-----------------
 2 files changed, 24 insertions(+), 28 deletions(-)

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 7fcff9af2a..5121f06fc5 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -134,16 +134,16 @@ static void __init processor_id(void)
            cpu_has_gicv3 ? " GICv3-SysReg" : "");
 
     /* Warn user if we find unknown floating-point features */
-    if ( cpu_has_fp && (boot_cpu_feature64(fp) >= 2) )
+    if ( cpu_has_fp && (boot_cpu_feature(pfr64, fp) >= 2) )
         printk(XENLOG_WARNING "WARNING: Unknown Floating-point ID:%d, "
                "this may result in corruption on the platform\n",
-               boot_cpu_feature64(fp));
+               boot_cpu_feature(pfr64, fp));
 
     /* Warn user if we find unknown AdvancedSIMD features */
-    if ( cpu_has_simd && (boot_cpu_feature64(simd) >= 2) )
+    if ( cpu_has_simd && (boot_cpu_feature(pfr64, simd) >= 2) )
         printk(XENLOG_WARNING "WARNING: Unknown AdvancedSIMD ID:%d, "
                "this may result in corruption on the platform\n",
-               boot_cpu_feature64(simd));
+               boot_cpu_feature(pfr64, simd));
 
     printk("  Debug Features: %016"PRIx64" %016"PRIx64"\n",
            boot_cpu_data.dbg64.bits[0], boot_cpu_data.dbg64.bits[1]);
diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
index 10878ead8a..f9281ea343 100644
--- a/xen/include/asm-arm/cpufeature.h
+++ b/xen/include/asm-arm/cpufeature.h
@@ -1,39 +1,35 @@
 #ifndef __ASM_ARM_CPUFEATURE_H
 #define __ASM_ARM_CPUFEATURE_H
 
+#define boot_cpu_feature(idreg, feat) (boot_cpu_data.idreg.feat)
+
 #ifdef CONFIG_ARM_64
-#define cpu_feature64(c, feat)         ((c)->pfr64.feat)
-#define boot_cpu_feature64(feat)       (boot_cpu_data.pfr64.feat)
-
-#define cpu_has_el0_32    (boot_cpu_feature64(el0) == 2)
-#define cpu_has_el0_64    (boot_cpu_feature64(el0) >= 1)
-#define cpu_has_el1_32    (boot_cpu_feature64(el1) == 2)
-#define cpu_has_el1_64    (boot_cpu_feature64(el1) >= 1)
-#define cpu_has_el2_32    (boot_cpu_feature64(el2) == 2)
-#define cpu_has_el2_64    (boot_cpu_feature64(el2) >= 1)
-#define cpu_has_el3_32    (boot_cpu_feature64(el3) == 2)
-#define cpu_has_el3_64    (boot_cpu_feature64(el3) >= 1)
-#define cpu_has_fp        (boot_cpu_feature64(fp) < 8)
-#define cpu_has_simd      (boot_cpu_feature64(simd) < 8)
-#define cpu_has_gicv3     (boot_cpu_feature64(gic) == 1)
+#define cpu_has_el0_32          (boot_cpu_feature(pfr64, el0) == 2)
+#define cpu_has_el0_64          (boot_cpu_feature(pfr64, el0) >= 1)
+#define cpu_has_el1_32          (boot_cpu_feature(pfr64, el1) == 2)
+#define cpu_has_el1_64          (boot_cpu_feature(pfr64, el1) >= 1)
+#define cpu_has_el2_32          (boot_cpu_feature(pfr64, el2) == 2)
+#define cpu_has_el2_64          (boot_cpu_feature(pfr64, el2) >= 1)
+#define cpu_has_el3_32          (boot_cpu_feature(pfr64, el3) == 2)
+#define cpu_has_el3_64          (boot_cpu_feature(pfr64, el3) >= 1)
+#define cpu_has_fp              (boot_cpu_feature(pfr64, fp) < 8)
+#define cpu_has_simd            (boot_cpu_feature(pfr64, simd) < 8)
+#define cpu_has_gicv3           (boot_cpu_feature(pfr64, gic) == 1)
 #endif
 
-#define cpu_feature32(c, feat)         ((c)->pfr32.feat)
-#define boot_cpu_feature32(feat)       (boot_cpu_data.pfr32.feat)
-
-#define cpu_has_arm       (boot_cpu_feature32(arm) == 1)
-#define cpu_has_thumb     (boot_cpu_feature32(thumb) >= 1)
-#define cpu_has_thumb2    (boot_cpu_feature32(thumb) >= 3)
-#define cpu_has_jazelle   (boot_cpu_feature32(jazelle) > 0)
-#define cpu_has_thumbee   (boot_cpu_feature32(thumbee) == 1)
+#define cpu_has_arm       (boot_cpu_feature(pfr32, arm) == 1)
+#define cpu_has_thumb     (boot_cpu_feature(pfr32, thumb) >= 1)
+#define cpu_has_thumb2    (boot_cpu_feature(pfr32, thumb) >= 3)
+#define cpu_has_jazelle   (boot_cpu_feature(pfr32, jazelle) > 0)
+#define cpu_has_thumbee   (boot_cpu_feature(pfr32, thumbee) == 1)
 #define cpu_has_aarch32   (cpu_has_arm || cpu_has_thumb)
 
 #ifdef CONFIG_ARM_32
-#define cpu_has_gentimer  (boot_cpu_feature32(gentimer) == 1)
+#define cpu_has_gentimer  (boot_cpu_feature(pfr32, gentimer) == 1)
 #else
 #define cpu_has_gentimer  (1)
 #endif
-#define cpu_has_security  (boot_cpu_feature32(security) > 0)
+#define cpu_has_security  (boot_cpu_feature(pfr32, security) > 0)
 
 #define ARM64_WORKAROUND_CLEAN_CACHE    0
 #define ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE    1
-- 
2.24.3 (Apple Git-128)




* [RFC PATCH v2 02/15] xen/arm: Add detection of Armv8.1-LSE atomic instructions
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
  2020-11-11 21:51 ` [RFC PATCH v2 01/15] xen/arm: Support detection of CPU features in other ID registers Ash Wilding
@ 2020-11-11 21:51 ` Ash Wilding
  2020-11-11 21:51 ` [RFC PATCH v2 03/15] xen/arm: Add ARM64_HAS_LSE_ATOMICS hwcap Ash Wilding
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

Use the new infrastructure for detecting CPU features in other ID
registers to detect the presence of Armv8.1-LSE atomic instructions,
as reported by ID_AA64ISAR0_EL1.Atomic.

While we're here, print detection of these instructions in setup.c's
processor_id().

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
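For reference, the new isa64 bitfield below lines up with
ID_AA64ISAR0_EL1, whose Atomic field occupies bits [23:20]; a value of 2
indicates the Armv8.1-LSE instructions (LDADD, CAS, SWP, ...). An
equivalent open-coded check would look roughly like this (illustrative
only, not part of the patch):

    uint64_t isar0;

    asm volatile("mrs %0, id_aa64isar0_el1" : "=r" (isar0));
    if ( ((isar0 >> 20) & 0xf) == 2 )
        /* LSE atomic instructions are available. */;
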
 xen/arch/arm/setup.c             |  5 +++--
 xen/include/asm-arm/cpufeature.h | 10 +++++++++-
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 5121f06fc5..138e1957c5 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -128,10 +128,11 @@ static void __init processor_id(void)
            cpu_has_el2_32 ? "64+32" : cpu_has_el2_64 ? "64" : "No",
            cpu_has_el1_32 ? "64+32" : cpu_has_el1_64 ? "64" : "No",
            cpu_has_el0_32 ? "64+32" : cpu_has_el0_64 ? "64" : "No");
-    printk("    Extensions:%s%s%s\n",
+    printk("    Extensions:%s%s%s%s\n",
            cpu_has_fp ? " FloatingPoint" : "",
            cpu_has_simd ? " AdvancedSIMD" : "",
-           cpu_has_gicv3 ? " GICv3-SysReg" : "");
+           cpu_has_gicv3 ? " GICv3-SysReg" : "",
+           cpu_has_lse_atomics ? " LSE-Atomics" : "");
 
     /* Warn user if we find unknown floating-point features */
     if ( cpu_has_fp && (boot_cpu_feature(pfr64, fp) >= 2) )
diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
index f9281ea343..2366926e82 100644
--- a/xen/include/asm-arm/cpufeature.h
+++ b/xen/include/asm-arm/cpufeature.h
@@ -15,6 +15,7 @@
 #define cpu_has_fp              (boot_cpu_feature(pfr64, fp) < 8)
 #define cpu_has_simd            (boot_cpu_feature(pfr64, simd) < 8)
 #define cpu_has_gicv3           (boot_cpu_feature(pfr64, gic) == 1)
+#define cpu_has_lse_atomics     (boot_cpu_feature(isa64, atomic) == 2)
 #endif
 
 #define cpu_has_arm       (boot_cpu_feature(pfr32, arm) == 1)
@@ -187,8 +188,15 @@ struct cpuinfo_arm {
         };
     } mm64;
 
-    struct {
+    union {
         uint64_t bits[2];
+        struct {
+            unsigned long __res0 : 20;
+            unsigned long atomic : 4;
+            unsigned long __res1 : 40;
+
+            unsigned long __res2 : 64;
+        };
     } isa64;
 
 #endif
-- 
2.24.3 (Apple Git-128)




* [RFC PATCH v2 03/15] xen/arm: Add ARM64_HAS_LSE_ATOMICS hwcap
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
  2020-11-11 21:51 ` [RFC PATCH v2 01/15] xen/arm: Support detection of CPU features in other ID registers Ash Wilding
  2020-11-11 21:51 ` [RFC PATCH v2 02/15] xen/arm: Add detection of Armv8.1-LSE atomic instructions Ash Wilding
@ 2020-11-11 21:51 ` Ash Wilding
  2020-11-11 21:51 ` [RFC PATCH v2 04/15] xen/arm: Delete Xen atomics helpers Ash Wilding
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

This patch introduces the ARM64_HAS_LSE_ATOMICS hwcap.

While doing this, CONFIG_ARM64_LSE_ATOMICS is added to control whether
the hwcap is actually detected and set at runtime. Without this Kconfig
option set we will always fall back on LL/SC atomics using Armv8.0
exclusive accesses.

Note this patch does not actually add the ALTERNATIVE() switching based
on the hwcap being detected and set; that comes later in the series.

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
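As a rough illustration of how later patches can key off the new cap
once it is set (hypothetical usage; the actual ALTERNATIVE()-based
switching arrives with the ported lse.h later in the series):

    if ( cpus_have_cap(ARM64_HAS_LSE_ATOMICS) )
        /* use the LSE instruction sequences */;
    else
        /* fall back to the Armv8.0 LL/SC sequences */;
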
 xen/arch/arm/Kconfig             | 11 +++++++++++
 xen/arch/arm/Makefile            |  1 +
 xen/arch/arm/lse.c               | 13 +++++++++++++
 xen/include/asm-arm/cpufeature.h |  3 ++-
 4 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/arm/lse.c

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 2777388265..febc41e492 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -78,6 +78,17 @@ config SBSA_VUART_CONSOLE
 	  Allows a guest to use SBSA Generic UART as a console. The
 	  SBSA Generic UART implements a subset of ARM PL011 UART.
 
+config ARM64_LSE_ATOMICS
+	bool "Armv8.1-LSE Atomics"
+	depends on ARM_64 && HAS_ALTERNATIVE
+	default y
+	---help---
+	When set, dynamically patch Xen at runtime to use Armv8.1-LSE
+	atomics when supported by the system.
+
+	When unset, or when Armv8.1-LSE atomics are not supported by the
+	system, fallback on LL/SC atomics using Armv8.0 exclusive accesses.
+
 config ARM_SSBD
 	bool "Speculative Store Bypass Disable" if EXPERT
 	depends on HAS_ALTERNATIVE
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 296c5e68bb..cadd0ad253 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -63,6 +63,7 @@ obj-y += vsmc.o
 obj-y += vpsci.o
 obj-y += vuart.o
 extra-y += $(TARGET_SUBARCH)/head.o
+obj-$(CONFIG_ARM64_LSE_ATOMICS) += lse.o
 
 #obj-bin-y += ....o
 
diff --git a/xen/arch/arm/lse.c b/xen/arch/arm/lse.c
new file mode 100644
index 0000000000..8274dac671
--- /dev/null
+++ b/xen/arch/arm/lse.c
@@ -0,0 +1,13 @@
+
+#include <asm/cpufeature.h>
+#include <xen/init.h>
+
+static int __init update_lse_caps(void)
+{
+    if ( cpu_has_lse_atomics )
+        cpus_set_cap(ARM64_HAS_LSE_ATOMICS);
+
+    return 0;
+}
+
+__initcall(update_lse_caps);
diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
index 2366926e82..48c172ee29 100644
--- a/xen/include/asm-arm/cpufeature.h
+++ b/xen/include/asm-arm/cpufeature.h
@@ -42,8 +42,9 @@
 #define ARM_SSBD 7
 #define ARM_SMCCC_1_1 8
 #define ARM64_WORKAROUND_AT_SPECULATE 9
+#define ARM64_HAS_LSE_ATOMICS 10
 
-#define ARM_NCAPS           10
+#define ARM_NCAPS           11
 
 #ifndef __ASSEMBLY__
 
-- 
2.24.3 (Apple Git-128)




* [RFC PATCH v2 04/15] xen/arm: Delete Xen atomics helpers
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
                   ` (2 preceding siblings ...)
  2020-11-11 21:51 ` [RFC PATCH v2 03/15] xen/arm: Add ARM64_HAS_LSE_ATOMICS hwcap Ash Wilding
@ 2020-11-11 21:51 ` Ash Wilding
  2020-11-11 21:51 ` [RFC PATCH v2 05/15] xen/arm: pull in Linux atomics helpers and dependencies Ash Wilding
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

To keep the diffs clean and easy to dissect, let's delete the existing
Xen atomics helpers before pulling in the Linux versions.

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
 xen/include/asm-arm/arm32/atomic.h  | 175 ---------------------
 xen/include/asm-arm/arm32/cmpxchg.h | 229 ----------------------------
 xen/include/asm-arm/arm64/atomic.h  | 148 ------------------
 xen/include/asm-arm/arm64/cmpxchg.h | 183 ----------------------
 4 files changed, 735 deletions(-)
 delete mode 100644 xen/include/asm-arm/arm32/atomic.h
 delete mode 100644 xen/include/asm-arm/arm32/cmpxchg.h
 delete mode 100644 xen/include/asm-arm/arm64/atomic.h
 delete mode 100644 xen/include/asm-arm/arm64/cmpxchg.h

diff --git a/xen/include/asm-arm/arm32/atomic.h b/xen/include/asm-arm/arm32/atomic.h
deleted file mode 100644
index 2832a72792..0000000000
--- a/xen/include/asm-arm/arm32/atomic.h
+++ /dev/null
@@ -1,175 +0,0 @@
-/*
- *  arch/arm/include/asm/atomic.h
- *
- *  Copyright (C) 1996 Russell King.
- *  Copyright (C) 2002 Deep Blue Solutions Ltd.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-#ifndef __ARCH_ARM_ARM32_ATOMIC__
-#define __ARCH_ARM_ARM32_ATOMIC__
-
-/*
- * ARMv6 UP and SMP safe atomic ops.  We use load exclusive and
- * store exclusive to ensure that these are atomic.  We may loop
- * to ensure that the update happens.
- */
-static inline void atomic_add(int i, atomic_t *v)
-{
-	unsigned long tmp;
-	int result;
-
-	prefetchw(&v->counter);
-	__asm__ __volatile__("@ atomic_add\n"
-"1:	ldrex	%0, [%3]\n"
-"	add	%0, %0, %4\n"
-"	strex	%1, %0, [%3]\n"
-"	teq	%1, #0\n"
-"	bne	1b"
-	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
-	: "r" (&v->counter), "Ir" (i)
-	: "cc");
-}
-
-static inline int atomic_add_return(int i, atomic_t *v)
-{
-	unsigned long tmp;
-	int result;
-
-	smp_mb();
-	prefetchw(&v->counter);
-
-	__asm__ __volatile__("@ atomic_add_return\n"
-"1:	ldrex	%0, [%3]\n"
-"	add	%0, %0, %4\n"
-"	strex	%1, %0, [%3]\n"
-"	teq	%1, #0\n"
-"	bne	1b"
-	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
-	: "r" (&v->counter), "Ir" (i)
-	: "cc");
-
-	smp_mb();
-
-	return result;
-}
-
-static inline void atomic_sub(int i, atomic_t *v)
-{
-	unsigned long tmp;
-	int result;
-
-	prefetchw(&v->counter);
-	__asm__ __volatile__("@ atomic_sub\n"
-"1:	ldrex	%0, [%3]\n"
-"	sub	%0, %0, %4\n"
-"	strex	%1, %0, [%3]\n"
-"	teq	%1, #0\n"
-"	bne	1b"
-	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
-	: "r" (&v->counter), "Ir" (i)
-	: "cc");
-}
-
-static inline int atomic_sub_return(int i, atomic_t *v)
-{
-	unsigned long tmp;
-	int result;
-
-	smp_mb();
-	prefetchw(&v->counter);
-
-	__asm__ __volatile__("@ atomic_sub_return\n"
-"1:	ldrex	%0, [%3]\n"
-"	sub	%0, %0, %4\n"
-"	strex	%1, %0, [%3]\n"
-"	teq	%1, #0\n"
-"	bne	1b"
-	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
-	: "r" (&v->counter), "Ir" (i)
-	: "cc");
-
-	smp_mb();
-
-	return result;
-}
-
-static inline void atomic_and(int m, atomic_t *v)
-{
-	unsigned long tmp;
-	int result;
-
-	prefetchw(&v->counter);
-	__asm__ __volatile__("@ atomic_and\n"
-"1:	ldrex	%0, [%3]\n"
-"	and	%0, %0, %4\n"
-"	strex	%1, %0, [%3]\n"
-"	teq	%1, #0\n"
-"	bne	1b"
-	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
-	: "r" (&v->counter), "Ir" (m)
-	: "cc");
-}
-
-static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
-{
-	int oldval;
-	unsigned long res;
-
-	smp_mb();
-	prefetchw(&ptr->counter);
-
-	do {
-		__asm__ __volatile__("@ atomic_cmpxchg\n"
-		"ldrex	%1, [%3]\n"
-		"mov	%0, #0\n"
-		"teq	%1, %4\n"
-		"strexeq %0, %5, [%3]\n"
-		    : "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
-		    : "r" (&ptr->counter), "Ir" (old), "r" (new)
-		    : "cc");
-	} while (res);
-
-	smp_mb();
-
-	return oldval;
-}
-
-static inline int __atomic_add_unless(atomic_t *v, int a, int u)
-{
-	int oldval, newval;
-	unsigned long tmp;
-
-	smp_mb();
-	prefetchw(&v->counter);
-
-	__asm__ __volatile__ ("@ atomic_add_unless\n"
-"1:	ldrex	%0, [%4]\n"
-"	teq	%0, %5\n"
-"	beq	2f\n"
-"	add	%1, %0, %6\n"
-"	strex	%2, %1, [%4]\n"
-"	teq	%2, #0\n"
-"	bne	1b\n"
-"2:"
-	: "=&r" (oldval), "=&r" (newval), "=&r" (tmp), "+Qo" (v->counter)
-	: "r" (&v->counter), "r" (u), "r" (a)
-	: "cc");
-
-	if (oldval != u)
-		smp_mb();
-
-	return oldval;
-}
-
-#endif /* __ARCH_ARM_ARM32_ATOMIC__ */
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 8
- * indent-tabs-mode: t
- * End:
- */
diff --git a/xen/include/asm-arm/arm32/cmpxchg.h b/xen/include/asm-arm/arm32/cmpxchg.h
deleted file mode 100644
index b0bd1d8b68..0000000000
--- a/xen/include/asm-arm/arm32/cmpxchg.h
+++ /dev/null
@@ -1,229 +0,0 @@
-#ifndef __ASM_ARM32_CMPXCHG_H
-#define __ASM_ARM32_CMPXCHG_H
-
-#include <xen/prefetch.h>
-
-extern void __bad_xchg(volatile void *, int);
-
-static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
-{
-	unsigned long ret;
-	unsigned int tmp;
-
-	smp_mb();
-	prefetchw((const void *)ptr);
-
-	switch (size) {
-	case 1:
-		asm volatile("@	__xchg1\n"
-		"1:	ldrexb	%0, [%3]\n"
-		"	strexb	%1, %2, [%3]\n"
-		"	teq	%1, #0\n"
-		"	bne	1b"
-			: "=&r" (ret), "=&r" (tmp)
-			: "r" (x), "r" (ptr)
-			: "memory", "cc");
-		break;
-	case 4:
-		asm volatile("@	__xchg4\n"
-		"1:	ldrex	%0, [%3]\n"
-		"	strex	%1, %2, [%3]\n"
-		"	teq	%1, #0\n"
-		"	bne	1b"
-			: "=&r" (ret), "=&r" (tmp)
-			: "r" (x), "r" (ptr)
-			: "memory", "cc");
-		break;
-	default:
-		__bad_xchg(ptr, size), ret = 0;
-		break;
-	}
-	smp_mb();
-
-	return ret;
-}
-
-#define xchg(ptr,x) \
-	((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
-
-/*
- * Atomic compare and exchange.  Compare OLD with MEM, if identical,
- * store NEW in MEM.  Return the initial value in MEM.  Success is
- * indicated by comparing RETURN with OLD.
- */
-
-extern unsigned long __bad_cmpxchg(volatile void *ptr, int size);
-
-#define __CMPXCHG_CASE(sz, name)					\
-static inline bool __cmpxchg_case_##name(volatile void *ptr,		\
-					 unsigned long *old,		\
-					 unsigned long new,		\
-					 bool timeout,			\
-					 unsigned int max_try)		\
-{									\
-	unsigned long oldval;						\
-	unsigned long res;						\
-									\
-	do {								\
-		asm volatile("@ __cmpxchg_case_" #name "\n"		\
-		"	ldrex" #sz "	%1, [%2]\n"			\
-		"	mov	%0, #0\n"				\
-		"	teq	%1, %3\n"				\
-		"	strex" #sz "eq %0, %4, [%2]\n"			\
-		: "=&r" (res), "=&r" (oldval)				\
-		: "r" (ptr), "Ir" (*old), "r" (new)			\
-		: "memory", "cc");					\
-									\
-		if (!res)						\
-			break;						\
-	} while (!timeout || ((--max_try) > 0));			\
-									\
-	*old = oldval;							\
-									\
-	return !res;							\
-}
-
-__CMPXCHG_CASE(b, 1)
-__CMPXCHG_CASE(h, 2)
-__CMPXCHG_CASE( , 4)
-
-static inline bool __cmpxchg_case_8(volatile uint64_t *ptr,
-			 	    uint64_t *old,
-			 	    uint64_t new,
-			 	    bool timeout,
-				    unsigned int max_try)
-{
-	uint64_t oldval;
-	uint64_t res;
-
-	do {
-		asm volatile(
-		"	ldrexd		%1, %H1, [%3]\n"
-		"	teq		%1, %4\n"
-		"	teqeq		%H1, %H4\n"
-		"	movne		%0, #0\n"
-		"	movne		%H0, #0\n"
-		"	bne		2f\n"
-		"	strexd		%0, %5, %H5, [%3]\n"
-		"2:"
-		: "=&r" (res), "=&r" (oldval), "+Qo" (*ptr)
-		: "r" (ptr), "r" (*old), "r" (new)
-		: "memory", "cc");
-		if (!res)
-			break;
-	} while (!timeout || ((--max_try) > 0));
-
-	*old = oldval;
-
-	return !res;
-}
-
-static always_inline bool __int_cmpxchg(volatile void *ptr, unsigned long *old,
-					unsigned long new, int size,
-					bool timeout, unsigned int max_try)
-{
-	prefetchw((const void *)ptr);
-
-	switch (size) {
-	case 1:
-		return __cmpxchg_case_1(ptr, old, new, timeout, max_try);
-	case 2:
-		return __cmpxchg_case_2(ptr, old, new, timeout, max_try);
-	case 4:
-		return __cmpxchg_case_4(ptr, old, new, timeout, max_try);
-	default:
-		return __bad_cmpxchg(ptr, size);
-	}
-
-	ASSERT_UNREACHABLE();
-}
-
-static always_inline unsigned long __cmpxchg(volatile void *ptr,
-					     unsigned long old,
-					     unsigned long new,
-					     int size)
-{
-	smp_mb();
-	if (!__int_cmpxchg(ptr, &old, new, size, false, 0))
-		ASSERT_UNREACHABLE();
-	smp_mb();
-
-	return old;
-}
-
-/*
- * The helper may fail to update the memory if the action takes too long.
- *
- * @old: On call the value pointed contains the expected old value. It will be
- * updated to the actual old value.
- * @max_try: Maximum number of iterations
- *
- * The helper will return true when the update has succeeded (i.e no
- * timeout) and false if the update has failed.
- */
-static always_inline bool __cmpxchg_timeout(volatile void *ptr,
-					    unsigned long *old,
-					    unsigned long new,
-					    int size,
-					    unsigned int max_try)
-{
-	bool ret;
-
-	smp_mb();
-	ret = __int_cmpxchg(ptr, old, new, size, true, max_try);
-	smp_mb();
-
-	return ret;
-}
-
-/*
- * The helper may fail to update the memory if the action takes too long.
- *
- * @old: On call the value pointed contains the expected old value. It will be
- * updated to the actual old value.
- * @max_try: Maximum number of iterations
- *
- * The helper will return true when the update has succeeded (i.e no
- * timeout) and false if the update has failed.
- */
-static always_inline bool __cmpxchg64_timeout(volatile uint64_t *ptr,
-					      uint64_t *old,
-					      uint64_t new,
-					      unsigned int max_try)
-{
-	bool ret;
-
-	smp_mb();
-	ret = __cmpxchg_case_8(ptr, old, new, true, max_try);
-	smp_mb();
-
-	return ret;
-}
-
-#define cmpxchg(ptr,o,n)						\
-	((__typeof__(*(ptr)))__cmpxchg((ptr),				\
-				       (unsigned long)(o),		\
-				       (unsigned long)(n),		\
-				       sizeof(*(ptr))))
-
-static inline uint64_t cmpxchg64(volatile uint64_t *ptr,
-				 uint64_t old,
-				 uint64_t new)
-{
-	smp_mb();
-	if (!__cmpxchg_case_8(ptr, &old, new, false, 0))
-		ASSERT_UNREACHABLE();
-	smp_mb();
-
-	return old;
-}
-
-#endif
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 8
- * indent-tabs-mode: t
- * End:
- */
diff --git a/xen/include/asm-arm/arm64/atomic.h b/xen/include/asm-arm/arm64/atomic.h
deleted file mode 100644
index 2d42567866..0000000000
--- a/xen/include/asm-arm/arm64/atomic.h
+++ /dev/null
@@ -1,148 +0,0 @@
-/*
- * Based on arch/arm64/include/asm/atomic.h
- * which in turn is
- * Based on arch/arm/include/asm/atomic.h
- *
- * Copyright (C) 1996 Russell King.
- * Copyright (C) 2002 Deep Blue Solutions Ltd.
- * Copyright (C) 2012 ARM Ltd.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see <http://www.gnu.org/licenses/>.
- */
-#ifndef __ARCH_ARM_ARM64_ATOMIC
-#define __ARCH_ARM_ARM64_ATOMIC
-
-/*
- * AArch64 UP and SMP safe atomic ops.  We use load exclusive and
- * store exclusive to ensure that these are atomic.  We may loop
- * to ensure that the update happens.
- */
-static inline void atomic_add(int i, atomic_t *v)
-{
-	unsigned long tmp;
-	int result;
-
-	asm volatile("// atomic_add\n"
-"1:	ldxr	%w0, %2\n"
-"	add	%w0, %w0, %w3\n"
-"	stxr	%w1, %w0, %2\n"
-"	cbnz	%w1, 1b"
-	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
-	: "Ir" (i));
-}
-
-static inline int atomic_add_return(int i, atomic_t *v)
-{
-	unsigned long tmp;
-	int result;
-
-	asm volatile("// atomic_add_return\n"
-"1:	ldxr	%w0, %2\n"
-"	add	%w0, %w0, %w3\n"
-"	stlxr	%w1, %w0, %2\n"
-"	cbnz	%w1, 1b"
-	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
-	: "Ir" (i)
-	: "memory");
-
-	smp_mb();
-	return result;
-}
-
-static inline void atomic_sub(int i, atomic_t *v)
-{
-	unsigned long tmp;
-	int result;
-
-	asm volatile("// atomic_sub\n"
-"1:	ldxr	%w0, %2\n"
-"	sub	%w0, %w0, %w3\n"
-"	stxr	%w1, %w0, %2\n"
-"	cbnz	%w1, 1b"
-	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
-	: "Ir" (i));
-}
-
-static inline int atomic_sub_return(int i, atomic_t *v)
-{
-	unsigned long tmp;
-	int result;
-
-	asm volatile("// atomic_sub_return\n"
-"1:	ldxr	%w0, %2\n"
-"	sub	%w0, %w0, %w3\n"
-"	stlxr	%w1, %w0, %2\n"
-"	cbnz	%w1, 1b"
-	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
-	: "Ir" (i)
-	: "memory");
-
-	smp_mb();
-	return result;
-}
-
-static inline void atomic_and(int m, atomic_t *v)
-{
-	unsigned long tmp;
-	int result;
-
-	asm volatile("// atomic_and\n"
-"1:	ldxr	%w0, %2\n"
-"	and	%w0, %w0, %w3\n"
-"	stxr	%w1, %w0, %2\n"
-"	cbnz	%w1, 1b"
-	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
-	: "Ir" (m));
-}
-
-static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
-{
-	unsigned long tmp;
-	int oldval;
-
-	smp_mb();
-
-	asm volatile("// atomic_cmpxchg\n"
-"1:	ldxr	%w1, %2\n"
-"	cmp	%w1, %w3\n"
-"	b.ne	2f\n"
-"	stxr	%w0, %w4, %2\n"
-"	cbnz	%w0, 1b\n"
-"2:"
-	: "=&r" (tmp), "=&r" (oldval), "+Q" (ptr->counter)
-	: "Ir" (old), "r" (new)
-	: "cc");
-
-	smp_mb();
-	return oldval;
-}
-
-static inline int __atomic_add_unless(atomic_t *v, int a, int u)
-{
-	int c, old;
-
-	c = atomic_read(v);
-	while (c != u && (old = atomic_cmpxchg((v), c, c + a)) != c)
-		c = old;
-	return c;
-}
-
-#endif
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 8
- * indent-tabs-mode: t
- * End:
- */
diff --git a/xen/include/asm-arm/arm64/cmpxchg.h b/xen/include/asm-arm/arm64/cmpxchg.h
deleted file mode 100644
index 10e4edc022..0000000000
--- a/xen/include/asm-arm/arm64/cmpxchg.h
+++ /dev/null
@@ -1,183 +0,0 @@
-#ifndef __ASM_ARM64_CMPXCHG_H
-#define __ASM_ARM64_CMPXCHG_H
-
-extern void __bad_xchg(volatile void *, int);
-
-static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
-{
-	unsigned long ret, tmp;
-
-	switch (size) {
-	case 1:
-		asm volatile("//	__xchg1\n"
-		"1:	ldxrb	%w0, %2\n"
-		"	stlxrb	%w1, %w3, %2\n"
-		"	cbnz	%w1, 1b\n"
-			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr)
-			: "r" (x)
-			: "memory");
-		break;
-	case 2:
-		asm volatile("//	__xchg2\n"
-		"1:	ldxrh	%w0, %2\n"
-		"	stlxrh	%w1, %w3, %2\n"
-		"	cbnz	%w1, 1b\n"
-			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u16 *)ptr)
-			: "r" (x)
-			: "memory");
-		break;
-	case 4:
-		asm volatile("//	__xchg4\n"
-		"1:	ldxr	%w0, %2\n"
-		"	stlxr	%w1, %w3, %2\n"
-		"	cbnz	%w1, 1b\n"
-			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u32 *)ptr)
-			: "r" (x)
-			: "memory");
-		break;
-	case 8:
-		asm volatile("//	__xchg8\n"
-		"1:	ldxr	%0, %2\n"
-		"	stlxr	%w1, %3, %2\n"
-		"	cbnz	%w1, 1b\n"
-			: "=&r" (ret), "=&r" (tmp), "+Q" (*(u64 *)ptr)
-			: "r" (x)
-			: "memory");
-		break;
-	default:
-		__bad_xchg(ptr, size), ret = 0;
-		break;
-	}
-
-	smp_mb();
-	return ret;
-}
-
-#define xchg(ptr,x) \
-({ \
-	__typeof__(*(ptr)) __ret; \
-	__ret = (__typeof__(*(ptr))) \
-		__xchg((unsigned long)(x), (ptr), sizeof(*(ptr))); \
-	__ret; \
-})
-
-extern unsigned long __bad_cmpxchg(volatile void *ptr, int size);
-
-#define __CMPXCHG_CASE(w, sz, name)					\
-static inline bool __cmpxchg_case_##name(volatile void *ptr,		\
-					 unsigned long *old,		\
-					 unsigned long new,		\
-					 bool timeout,			\
-					 unsigned int max_try)		\
-{									\
-	unsigned long oldval;						\
-	unsigned long res;						\
-									\
-	do {								\
-		asm volatile("// __cmpxchg_case_" #name "\n"		\
-		"	ldxr" #sz "	%" #w "1, %2\n"			\
-		"	mov	%w0, #0\n"				\
-		"	cmp	%" #w "1, %" #w "3\n"			\
-		"	b.ne	1f\n"					\
-		"	stxr" #sz "	%w0, %" #w "4, %2\n"		\
-		"1:\n"							\
-		: "=&r" (res), "=&r" (oldval),				\
-		  "+Q" (*(unsigned long *)ptr)				\
-		: "Ir" (*old), "r" (new)				\
-		: "cc");						\
-									\
-		if (!res)						\
-			break;						\
-	} while (!timeout || ((--max_try) > 0));			\
-									\
-	*old = oldval;							\
-									\
-	return !res;							\
-}
-
-__CMPXCHG_CASE(w, b, 1)
-__CMPXCHG_CASE(w, h, 2)
-__CMPXCHG_CASE(w,  , 4)
-__CMPXCHG_CASE( ,  , 8)
-
-static always_inline bool __int_cmpxchg(volatile void *ptr, unsigned long *old,
-					unsigned long new, int size,
-					bool timeout, unsigned int max_try)
-{
-	switch (size) {
-	case 1:
-		return __cmpxchg_case_1(ptr, old, new, timeout, max_try);
-	case 2:
-		return __cmpxchg_case_2(ptr, old, new, timeout, max_try);
-	case 4:
-		return __cmpxchg_case_4(ptr, old, new, timeout, max_try);
-	case 8:
-		return __cmpxchg_case_8(ptr, old, new, timeout, max_try);
-	default:
-		return __bad_cmpxchg(ptr, size);
-	}
-
-	ASSERT_UNREACHABLE();
-}
-
-static always_inline unsigned long __cmpxchg(volatile void *ptr,
-					     unsigned long old,
-					     unsigned long new,
-					     int size)
-{
-	smp_mb();
-	if (!__int_cmpxchg(ptr, &old, new, size, false, 0))
-		ASSERT_UNREACHABLE();
-	smp_mb();
-
-	return old;
-}
-
-/*
- * The helper may fail to update the memory if the action takes too long.
- *
- * @old: On call the value pointed contains the expected old value. It will be
- * updated to the actual old value.
- * @max_try: Maximum number of iterations
- *
- * The helper will return true when the update has succeeded (i.e no
- * timeout) and false if the update has failed.
- */
-static always_inline bool __cmpxchg_timeout(volatile void *ptr,
-					    unsigned long *old,
-					    unsigned long new,
-					    int size,
-					    unsigned int max_try)
-{
-	bool ret;
-
-	smp_mb();
-	ret = __int_cmpxchg(ptr, old, new, size, true, max_try);
-	smp_mb();
-
-	return ret;
-}
-
-#define cmpxchg(ptr, o, n) \
-({ \
-	__typeof__(*(ptr)) __ret; \
-	__ret = (__typeof__(*(ptr))) \
-		__cmpxchg((ptr), (unsigned long)(o), (unsigned long)(n), \
-			  sizeof(*(ptr))); \
-	__ret; \
-})
-
-#define cmpxchg64(ptr, o, n) cmpxchg(ptr, o, n)
-
-#define __cmpxchg64_timeout(ptr, old, new, max_try)	\
-	__cmpxchg_timeout(ptr, old, new, 8, max_try)
-
-#endif
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 8
- * indent-tabs-mode: t
- * End:
- */
-- 
2.24.3 (Apple Git-128)




* [RFC PATCH v2 05/15] xen/arm: pull in Linux atomics helpers and dependencies
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
                   ` (3 preceding siblings ...)
  2020-11-11 21:51 ` [RFC PATCH v2 04/15] xen/arm: Delete Xen atomics helpers Ash Wilding
@ 2020-11-11 21:51 ` Ash Wilding
  2020-11-12  8:31   ` Jan Beulich
  2020-11-11 21:51 ` [RFC PATCH v2 06/15] xen: port Linux <asm-generic/rwonce.h> to Xen Ash Wilding
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

This patch pulls in Linux's atomics helpers for arm32 and arm64, and
their dependencies, as-is.

Note that Linux's arm32 atomics helpers use the READ_ONCE() and
WRITE_ONCE() macros defined in <asm-generic/rwonce.h>, while Linux's
arm64 atomics helpers use __READ_ONCE() and __WRITE_ONCE().

The only difference is that the __* versions skip checking whether the
object being accessed is the same size as a native C type (e.g. char,
int, long, etc). Given the arm32 helpers are using the macros to access
an atomic_t's counter member, which is an int, we can skip this check by
using the __* versions like the arm64 code does.

I mention this here because it forms the first part of my justification
for *not* copying Linux's <linux/compiler_types.h> to Xen; the size
check described above is performed by __native_word() defined in that
header.
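
For reference, that check is roughly the following (paraphrased from
Linux's <linux/compiler_types.h> and <asm-generic/rwonce.h>):

    #define __native_word(t) \
        (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
         sizeof(t) == sizeof(int)  || sizeof(t) == sizeof(long))

    #define compiletime_assert_rwonce_type(t) \
        compiletime_assert(__native_word(t) || \
                           sizeof(t) == sizeof(long long), \
            "Unsupported access size for {READ,WRITE}_ONCE().")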

The second part of my justification may be more contentious; as you'll
see in the next patch, I intend to drop the __unqual_scalar_typeof()
calls in __READ_ONCE() and __WRITE_ONCE(). This is because the pointer
to the atomic_t's counter member is always a basic (int *) so we don't
need this, and dropping it means we can avoid having to port Linux's
<linux/compiler_types.h>.

If people are against this approach, please bear in mind that the
current version of __unqual_scalar_typeof() in <linux/compiler_types.h>
was actually the reason Linux bumped its minimum required GCC version to
4.9 earlier this year, so that C11 _Generic support could be
guaranteed [1].
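
To see why _Generic matters here, a simplified illustration of the idea
behind __unqual_scalar_typeof() (this is deliberately not Linux's exact
macro, which covers every scalar type and its signed/unsigned variants):

    /* Select an unqualified rvalue of the matching type, so that
     * typeof() of the result loses qualifiers such as 'volatile'. */
    #define __unqual_int_typeof(x) typeof( \
        _Generic((x),                      \
                 int:     (int)0,          \
                 long:    (long)0,         \
                 default: (x)))

    /* Given 'volatile int *p', __unqual_int_typeof(*p) is plain 'int',
     * whereas typeof(*p) is 'volatile int'. */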

So if we do want to take Linux's <linux/compiler_types.h> we'll either
need to:

   A) bump up the minimum required version of GCC to 4.9 to match
      that required by Linux; in the Xen README we currently state
      GCC 4.8 for Arm and GCC 4.1.2_20070115 for x86.

or:

   B) mix-and-match an older version of Linux's <linux/compiler_types.h>
      with the other headers taken from a newer version of Linux.

Thoughts?

[1] https://lkml.org/lkml/2020/7/8/1308

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
 xen/include/asm-arm/arm32/atomic.h       | 507 +++++++++++++++++++++++
 xen/include/asm-arm/arm32/cmpxchg.h      | 279 +++++++++++++
 xen/include/asm-arm/arm64/atomic.h       | 228 ++++++++++
 xen/include/asm-arm/arm64/atomic_ll_sc.h | 353 ++++++++++++++++
 xen/include/asm-arm/arm64/atomic_lse.h   | 419 +++++++++++++++++++
 xen/include/asm-arm/arm64/cmpxchg.h      | 285 +++++++++++++
 xen/include/asm-arm/arm64/lse.h          |  48 +++
 xen/include/xen/rwonce.h                 |  90 ++++
 8 files changed, 2209 insertions(+)
 create mode 100644 xen/include/asm-arm/arm32/atomic.h
 create mode 100644 xen/include/asm-arm/arm32/cmpxchg.h
 create mode 100644 xen/include/asm-arm/arm64/atomic.h
 create mode 100644 xen/include/asm-arm/arm64/atomic_ll_sc.h
 create mode 100644 xen/include/asm-arm/arm64/atomic_lse.h
 create mode 100644 xen/include/asm-arm/arm64/cmpxchg.h
 create mode 100644 xen/include/asm-arm/arm64/lse.h
 create mode 100644 xen/include/xen/rwonce.h

diff --git a/xen/include/asm-arm/arm32/atomic.h b/xen/include/asm-arm/arm32/atomic.h
new file mode 100644
index 0000000000..ac6338dd9b
--- /dev/null
+++ b/xen/include/asm-arm/arm32/atomic.h
@@ -0,0 +1,507 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ *  arch/arm/include/asm/atomic.h
+ *
+ *  Copyright (C) 1996 Russell King.
+ *  Copyright (C) 2002 Deep Blue Solutions Ltd.
+ */
+#ifndef __ASM_ARM_ATOMIC_H
+#define __ASM_ARM_ATOMIC_H
+
+#include <linux/compiler.h>
+#include <linux/prefetch.h>
+#include <linux/types.h>
+#include <linux/irqflags.h>
+#include <asm/barrier.h>
+#include <asm/cmpxchg.h>
+
+#ifdef __KERNEL__
+
+/*
+ * On ARM, ordinary assignment (str instruction) doesn't clear the local
+ * strex/ldrex monitor on some implementations. The reason we can use it for
+ * atomic_set() is the clrex or dummy strex done on every exception return.
+ */
+#define atomic_read(v)	READ_ONCE((v)->counter)
+#define atomic_set(v,i)	WRITE_ONCE(((v)->counter), (i))
+
+#if __LINUX_ARM_ARCH__ >= 6
+
+/*
+ * ARMv6 UP and SMP safe atomic ops.  We use load exclusive and
+ * store exclusive to ensure that these are atomic.  We may loop
+ * to ensure that the update happens.
+ */
+
+#define ATOMIC_OP(op, c_op, asm_op)					\
+static inline void atomic_##op(int i, atomic_t *v)			\
+{									\
+	unsigned long tmp;						\
+	int result;							\
+									\
+	prefetchw(&v->counter);						\
+	__asm__ __volatile__("@ atomic_" #op "\n"			\
+"1:	ldrex	%0, [%3]\n"						\
+"	" #asm_op "	%0, %0, %4\n"					\
+"	strex	%1, %0, [%3]\n"						\
+"	teq	%1, #0\n"						\
+"	bne	1b"							\
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
+	: "r" (&v->counter), "Ir" (i)					\
+	: "cc");							\
+}									\
+
+#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
+static inline int atomic_##op##_return_relaxed(int i, atomic_t *v)	\
+{									\
+	unsigned long tmp;						\
+	int result;							\
+									\
+	prefetchw(&v->counter);						\
+									\
+	__asm__ __volatile__("@ atomic_" #op "_return\n"		\
+"1:	ldrex	%0, [%3]\n"						\
+"	" #asm_op "	%0, %0, %4\n"					\
+"	strex	%1, %0, [%3]\n"						\
+"	teq	%1, #0\n"						\
+"	bne	1b"							\
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
+	: "r" (&v->counter), "Ir" (i)					\
+	: "cc");							\
+									\
+	return result;							\
+}
+
+#define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
+static inline int atomic_fetch_##op##_relaxed(int i, atomic_t *v)	\
+{									\
+	unsigned long tmp;						\
+	int result, val;						\
+									\
+	prefetchw(&v->counter);						\
+									\
+	__asm__ __volatile__("@ atomic_fetch_" #op "\n"			\
+"1:	ldrex	%0, [%4]\n"						\
+"	" #asm_op "	%1, %0, %5\n"					\
+"	strex	%2, %1, [%4]\n"						\
+"	teq	%2, #0\n"						\
+"	bne	1b"							\
+	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Qo" (v->counter)	\
+	: "r" (&v->counter), "Ir" (i)					\
+	: "cc");							\
+									\
+	return result;							\
+}
+
+#define atomic_add_return_relaxed	atomic_add_return_relaxed
+#define atomic_sub_return_relaxed	atomic_sub_return_relaxed
+#define atomic_fetch_add_relaxed	atomic_fetch_add_relaxed
+#define atomic_fetch_sub_relaxed	atomic_fetch_sub_relaxed
+
+#define atomic_fetch_and_relaxed	atomic_fetch_and_relaxed
+#define atomic_fetch_andnot_relaxed	atomic_fetch_andnot_relaxed
+#define atomic_fetch_or_relaxed		atomic_fetch_or_relaxed
+#define atomic_fetch_xor_relaxed	atomic_fetch_xor_relaxed
+
+static inline int atomic_cmpxchg_relaxed(atomic_t *ptr, int old, int new)
+{
+	int oldval;
+	unsigned long res;
+
+	prefetchw(&ptr->counter);
+
+	do {
+		__asm__ __volatile__("@ atomic_cmpxchg\n"
+		"ldrex	%1, [%3]\n"
+		"mov	%0, #0\n"
+		"teq	%1, %4\n"
+		"strexeq %0, %5, [%3]\n"
+		    : "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
+		    : "r" (&ptr->counter), "Ir" (old), "r" (new)
+		    : "cc");
+	} while (res);
+
+	return oldval;
+}
+#define atomic_cmpxchg_relaxed		atomic_cmpxchg_relaxed
+
+static inline int atomic_fetch_add_unless(atomic_t *v, int a, int u)
+{
+	int oldval, newval;
+	unsigned long tmp;
+
+	smp_mb();
+	prefetchw(&v->counter);
+
+	__asm__ __volatile__ ("@ atomic_add_unless\n"
+"1:	ldrex	%0, [%4]\n"
+"	teq	%0, %5\n"
+"	beq	2f\n"
+"	add	%1, %0, %6\n"
+"	strex	%2, %1, [%4]\n"
+"	teq	%2, #0\n"
+"	bne	1b\n"
+"2:"
+	: "=&r" (oldval), "=&r" (newval), "=&r" (tmp), "+Qo" (v->counter)
+	: "r" (&v->counter), "r" (u), "r" (a)
+	: "cc");
+
+	if (oldval != u)
+		smp_mb();
+
+	return oldval;
+}
+#define atomic_fetch_add_unless		atomic_fetch_add_unless
+
+#else /* ARM_ARCH_6 */
+
+#ifdef CONFIG_SMP
+#error SMP not supported on pre-ARMv6 CPUs
+#endif
+
+#define ATOMIC_OP(op, c_op, asm_op)					\
+static inline void atomic_##op(int i, atomic_t *v)			\
+{									\
+	unsigned long flags;						\
+									\
+	raw_local_irq_save(flags);					\
+	v->counter c_op i;						\
+	raw_local_irq_restore(flags);					\
+}									\
+
+#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
+static inline int atomic_##op##_return(int i, atomic_t *v)		\
+{									\
+	unsigned long flags;						\
+	int val;							\
+									\
+	raw_local_irq_save(flags);					\
+	v->counter c_op i;						\
+	val = v->counter;						\
+	raw_local_irq_restore(flags);					\
+									\
+	return val;							\
+}
+
+#define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
+static inline int atomic_fetch_##op(int i, atomic_t *v)			\
+{									\
+	unsigned long flags;						\
+	int val;							\
+									\
+	raw_local_irq_save(flags);					\
+	val = v->counter;						\
+	v->counter c_op i;						\
+	raw_local_irq_restore(flags);					\
+									\
+	return val;							\
+}
+
+static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
+{
+	int ret;
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	ret = v->counter;
+	if (likely(ret == old))
+		v->counter = new;
+	raw_local_irq_restore(flags);
+
+	return ret;
+}
+
+#define atomic_fetch_andnot		atomic_fetch_andnot
+
+#endif /* __LINUX_ARM_ARCH__ */
+
+#define ATOMIC_OPS(op, c_op, asm_op)					\
+	ATOMIC_OP(op, c_op, asm_op)					\
+	ATOMIC_OP_RETURN(op, c_op, asm_op)				\
+	ATOMIC_FETCH_OP(op, c_op, asm_op)
+
+ATOMIC_OPS(add, +=, add)
+ATOMIC_OPS(sub, -=, sub)
+
+#define atomic_andnot atomic_andnot
+
+#undef ATOMIC_OPS
+#define ATOMIC_OPS(op, c_op, asm_op)					\
+	ATOMIC_OP(op, c_op, asm_op)					\
+	ATOMIC_FETCH_OP(op, c_op, asm_op)
+
+ATOMIC_OPS(and, &=, and)
+ATOMIC_OPS(andnot, &= ~, bic)
+ATOMIC_OPS(or,  |=, orr)
+ATOMIC_OPS(xor, ^=, eor)
+
+#undef ATOMIC_OPS
+#undef ATOMIC_FETCH_OP
+#undef ATOMIC_OP_RETURN
+#undef ATOMIC_OP
+
+#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
+
+#ifndef CONFIG_GENERIC_ATOMIC64
+typedef struct {
+	s64 counter;
+} atomic64_t;
+
+#define ATOMIC64_INIT(i) { (i) }
+
+#ifdef CONFIG_ARM_LPAE
+static inline s64 atomic64_read(const atomic64_t *v)
+{
+	s64 result;
+
+	__asm__ __volatile__("@ atomic64_read\n"
+"	ldrd	%0, %H0, [%1]"
+	: "=&r" (result)
+	: "r" (&v->counter), "Qo" (v->counter)
+	);
+
+	return result;
+}
+
+static inline void atomic64_set(atomic64_t *v, s64 i)
+{
+	__asm__ __volatile__("@ atomic64_set\n"
+"	strd	%2, %H2, [%1]"
+	: "=Qo" (v->counter)
+	: "r" (&v->counter), "r" (i)
+	);
+}
+#else
+static inline s64 atomic64_read(const atomic64_t *v)
+{
+	s64 result;
+
+	__asm__ __volatile__("@ atomic64_read\n"
+"	ldrexd	%0, %H0, [%1]"
+	: "=&r" (result)
+	: "r" (&v->counter), "Qo" (v->counter)
+	);
+
+	return result;
+}
+
+static inline void atomic64_set(atomic64_t *v, s64 i)
+{
+	s64 tmp;
+
+	prefetchw(&v->counter);
+	__asm__ __volatile__("@ atomic64_set\n"
+"1:	ldrexd	%0, %H0, [%2]\n"
+"	strexd	%0, %3, %H3, [%2]\n"
+"	teq	%0, #0\n"
+"	bne	1b"
+	: "=&r" (tmp), "=Qo" (v->counter)
+	: "r" (&v->counter), "r" (i)
+	: "cc");
+}
+#endif
+
+#define ATOMIC64_OP(op, op1, op2)					\
+static inline void atomic64_##op(s64 i, atomic64_t *v)			\
+{									\
+	s64 result;							\
+	unsigned long tmp;						\
+									\
+	prefetchw(&v->counter);						\
+	__asm__ __volatile__("@ atomic64_" #op "\n"			\
+"1:	ldrexd	%0, %H0, [%3]\n"					\
+"	" #op1 " %Q0, %Q0, %Q4\n"					\
+"	" #op2 " %R0, %R0, %R4\n"					\
+"	strexd	%1, %0, %H0, [%3]\n"					\
+"	teq	%1, #0\n"						\
+"	bne	1b"							\
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
+	: "r" (&v->counter), "r" (i)					\
+	: "cc");							\
+}									\
+
+#define ATOMIC64_OP_RETURN(op, op1, op2)				\
+static inline s64							\
+atomic64_##op##_return_relaxed(s64 i, atomic64_t *v)			\
+{									\
+	s64 result;							\
+	unsigned long tmp;						\
+									\
+	prefetchw(&v->counter);						\
+									\
+	__asm__ __volatile__("@ atomic64_" #op "_return\n"		\
+"1:	ldrexd	%0, %H0, [%3]\n"					\
+"	" #op1 " %Q0, %Q0, %Q4\n"					\
+"	" #op2 " %R0, %R0, %R4\n"					\
+"	strexd	%1, %0, %H0, [%3]\n"					\
+"	teq	%1, #0\n"						\
+"	bne	1b"							\
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
+	: "r" (&v->counter), "r" (i)					\
+	: "cc");							\
+									\
+	return result;							\
+}
+
+#define ATOMIC64_FETCH_OP(op, op1, op2)					\
+static inline s64							\
+atomic64_fetch_##op##_relaxed(s64 i, atomic64_t *v)			\
+{									\
+	s64 result, val;						\
+	unsigned long tmp;						\
+									\
+	prefetchw(&v->counter);						\
+									\
+	__asm__ __volatile__("@ atomic64_fetch_" #op "\n"		\
+"1:	ldrexd	%0, %H0, [%4]\n"					\
+"	" #op1 " %Q1, %Q0, %Q5\n"					\
+"	" #op2 " %R1, %R0, %R5\n"					\
+"	strexd	%2, %1, %H1, [%4]\n"					\
+"	teq	%2, #0\n"						\
+"	bne	1b"							\
+	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Qo" (v->counter)	\
+	: "r" (&v->counter), "r" (i)					\
+	: "cc");							\
+									\
+	return result;							\
+}
+
+#define ATOMIC64_OPS(op, op1, op2)					\
+	ATOMIC64_OP(op, op1, op2)					\
+	ATOMIC64_OP_RETURN(op, op1, op2)				\
+	ATOMIC64_FETCH_OP(op, op1, op2)
+
+ATOMIC64_OPS(add, adds, adc)
+ATOMIC64_OPS(sub, subs, sbc)
+
+#define atomic64_add_return_relaxed	atomic64_add_return_relaxed
+#define atomic64_sub_return_relaxed	atomic64_sub_return_relaxed
+#define atomic64_fetch_add_relaxed	atomic64_fetch_add_relaxed
+#define atomic64_fetch_sub_relaxed	atomic64_fetch_sub_relaxed
+
+#undef ATOMIC64_OPS
+#define ATOMIC64_OPS(op, op1, op2)					\
+	ATOMIC64_OP(op, op1, op2)					\
+	ATOMIC64_FETCH_OP(op, op1, op2)
+
+#define atomic64_andnot atomic64_andnot
+
+ATOMIC64_OPS(and, and, and)
+ATOMIC64_OPS(andnot, bic, bic)
+ATOMIC64_OPS(or,  orr, orr)
+ATOMIC64_OPS(xor, eor, eor)
+
+#define atomic64_fetch_and_relaxed	atomic64_fetch_and_relaxed
+#define atomic64_fetch_andnot_relaxed	atomic64_fetch_andnot_relaxed
+#define atomic64_fetch_or_relaxed	atomic64_fetch_or_relaxed
+#define atomic64_fetch_xor_relaxed	atomic64_fetch_xor_relaxed
+
+#undef ATOMIC64_OPS
+#undef ATOMIC64_FETCH_OP
+#undef ATOMIC64_OP_RETURN
+#undef ATOMIC64_OP
+
+static inline s64 atomic64_cmpxchg_relaxed(atomic64_t *ptr, s64 old, s64 new)
+{
+	s64 oldval;
+	unsigned long res;
+
+	prefetchw(&ptr->counter);
+
+	do {
+		__asm__ __volatile__("@ atomic64_cmpxchg\n"
+		"ldrexd		%1, %H1, [%3]\n"
+		"mov		%0, #0\n"
+		"teq		%1, %4\n"
+		"teqeq		%H1, %H4\n"
+		"strexdeq	%0, %5, %H5, [%3]"
+		: "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
+		: "r" (&ptr->counter), "r" (old), "r" (new)
+		: "cc");
+	} while (res);
+
+	return oldval;
+}
+#define atomic64_cmpxchg_relaxed	atomic64_cmpxchg_relaxed
+
+static inline s64 atomic64_xchg_relaxed(atomic64_t *ptr, s64 new)
+{
+	s64 result;
+	unsigned long tmp;
+
+	prefetchw(&ptr->counter);
+
+	__asm__ __volatile__("@ atomic64_xchg\n"
+"1:	ldrexd	%0, %H0, [%3]\n"
+"	strexd	%1, %4, %H4, [%3]\n"
+"	teq	%1, #0\n"
+"	bne	1b"
+	: "=&r" (result), "=&r" (tmp), "+Qo" (ptr->counter)
+	: "r" (&ptr->counter), "r" (new)
+	: "cc");
+
+	return result;
+}
+#define atomic64_xchg_relaxed		atomic64_xchg_relaxed
+
+static inline s64 atomic64_dec_if_positive(atomic64_t *v)
+{
+	s64 result;
+	unsigned long tmp;
+
+	smp_mb();
+	prefetchw(&v->counter);
+
+	__asm__ __volatile__("@ atomic64_dec_if_positive\n"
+"1:	ldrexd	%0, %H0, [%3]\n"
+"	subs	%Q0, %Q0, #1\n"
+"	sbc	%R0, %R0, #0\n"
+"	teq	%R0, #0\n"
+"	bmi	2f\n"
+"	strexd	%1, %0, %H0, [%3]\n"
+"	teq	%1, #0\n"
+"	bne	1b\n"
+"2:"
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
+	: "r" (&v->counter)
+	: "cc");
+
+	smp_mb();
+
+	return result;
+}
+#define atomic64_dec_if_positive atomic64_dec_if_positive
+
+static inline s64 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
+{
+	s64 oldval, newval;
+	unsigned long tmp;
+
+	smp_mb();
+	prefetchw(&v->counter);
+
+	__asm__ __volatile__("@ atomic64_add_unless\n"
+"1:	ldrexd	%0, %H0, [%4]\n"
+"	teq	%0, %5\n"
+"	teqeq	%H0, %H5\n"
+"	beq	2f\n"
+"	adds	%Q1, %Q0, %Q6\n"
+"	adc	%R1, %R0, %R6\n"
+"	strexd	%2, %1, %H1, [%4]\n"
+"	teq	%2, #0\n"
+"	bne	1b\n"
+"2:"
+	: "=&r" (oldval), "=&r" (newval), "=&r" (tmp), "+Qo" (v->counter)
+	: "r" (&v->counter), "r" (u), "r" (a)
+	: "cc");
+
+	if (oldval != u)
+		smp_mb();
+
+	return oldval;
+}
+#define atomic64_fetch_add_unless atomic64_fetch_add_unless
+
+#endif /* !CONFIG_GENERIC_ATOMIC64 */
+#endif
+#endif
\ No newline at end of file
diff --git a/xen/include/asm-arm/arm32/cmpxchg.h b/xen/include/asm-arm/arm32/cmpxchg.h
new file mode 100644
index 0000000000..638ae84afb
--- /dev/null
+++ b/xen/include/asm-arm/arm32/cmpxchg.h
@@ -0,0 +1,279 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_ARM_CMPXCHG_H
+#define __ASM_ARM_CMPXCHG_H
+
+#include <linux/irqflags.h>
+#include <linux/prefetch.h>
+#include <asm/barrier.h>
+
+#if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
+/*
+ * On the StrongARM, "swp" is terminally broken since it bypasses the
+ * cache totally.  This means that the cache becomes inconsistent, and,
+ * since we use normal loads/stores as well, this is really bad.
+ * Typically, this causes oopsen in filp_close, but could have other,
+ * more disastrous effects.  There are two work-arounds:
+ *  1. Disable interrupts and emulate the atomic swap
+ *  2. Clean the cache, perform atomic swap, flush the cache
+ *
+ * We choose (1) since its the "easiest" to achieve here and is not
+ * dependent on the processor type.
+ *
+ * NOTE that this solution won't work on an SMP system, so explcitly
+ * forbid it here.
+ */
+#define swp_is_buggy
+#endif
+
+static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
+{
+	extern void __bad_xchg(volatile void *, int);
+	unsigned long ret;
+#ifdef swp_is_buggy
+	unsigned long flags;
+#endif
+#if __LINUX_ARM_ARCH__ >= 6
+	unsigned int tmp;
+#endif
+
+	prefetchw((const void *)ptr);
+
+	switch (size) {
+#if __LINUX_ARM_ARCH__ >= 6
+#ifndef CONFIG_CPU_V6 /* MIN ARCH >= V6K */
+	case 1:
+		asm volatile("@	__xchg1\n"
+		"1:	ldrexb	%0, [%3]\n"
+		"	strexb	%1, %2, [%3]\n"
+		"	teq	%1, #0\n"
+		"	bne	1b"
+			: "=&r" (ret), "=&r" (tmp)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+	case 2:
+		asm volatile("@	__xchg2\n"
+		"1:	ldrexh	%0, [%3]\n"
+		"	strexh	%1, %2, [%3]\n"
+		"	teq	%1, #0\n"
+		"	bne	1b"
+			: "=&r" (ret), "=&r" (tmp)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+#endif
+	case 4:
+		asm volatile("@	__xchg4\n"
+		"1:	ldrex	%0, [%3]\n"
+		"	strex	%1, %2, [%3]\n"
+		"	teq	%1, #0\n"
+		"	bne	1b"
+			: "=&r" (ret), "=&r" (tmp)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+#elif defined(swp_is_buggy)
+#ifdef CONFIG_SMP
+#error SMP is not supported on this platform
+#endif
+	case 1:
+		raw_local_irq_save(flags);
+		ret = *(volatile unsigned char *)ptr;
+		*(volatile unsigned char *)ptr = x;
+		raw_local_irq_restore(flags);
+		break;
+
+	case 4:
+		raw_local_irq_save(flags);
+		ret = *(volatile unsigned long *)ptr;
+		*(volatile unsigned long *)ptr = x;
+		raw_local_irq_restore(flags);
+		break;
+#else
+	case 1:
+		asm volatile("@	__xchg1\n"
+		"	swpb	%0, %1, [%2]"
+			: "=&r" (ret)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+	case 4:
+		asm volatile("@	__xchg4\n"
+		"	swp	%0, %1, [%2]"
+			: "=&r" (ret)
+			: "r" (x), "r" (ptr)
+			: "memory", "cc");
+		break;
+#endif
+	default:
+		/* Cause a link-time error, the xchg() size is not supported */
+		__bad_xchg(ptr, size), ret = 0;
+		break;
+	}
+
+	return ret;
+}
+
+#define xchg_relaxed(ptr, x) ({						\
+	(__typeof__(*(ptr)))__xchg((unsigned long)(x), (ptr),		\
+				   sizeof(*(ptr)));			\
+})
+
+#include <asm-generic/cmpxchg-local.h>
+
+#if __LINUX_ARM_ARCH__ < 6
+/* min ARCH < ARMv6 */
+
+#ifdef CONFIG_SMP
+#error "SMP is not supported on this platform"
+#endif
+
+#define xchg xchg_relaxed
+
+/*
+ * cmpxchg_local and cmpxchg64_local are atomic wrt current CPU. Always make
+ * them available.
+ */
+#define cmpxchg_local(ptr, o, n) ({					\
+	(__typeof(*ptr))__cmpxchg_local_generic((ptr),			\
+					        (unsigned long)(o),	\
+					        (unsigned long)(n),	\
+					        sizeof(*(ptr)));	\
+})
+
+#define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
+
+#include <asm-generic/cmpxchg.h>
+
+#else	/* min ARCH >= ARMv6 */
+
+extern void __bad_cmpxchg(volatile void *ptr, int size);
+
+/*
+ * cmpxchg only supports 32-bit operands on ARMv6.
+ */
+
+static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
+				      unsigned long new, int size)
+{
+	unsigned long oldval, res;
+
+	prefetchw((const void *)ptr);
+
+	switch (size) {
+#ifndef CONFIG_CPU_V6	/* min ARCH >= ARMv6K */
+	case 1:
+		do {
+			asm volatile("@ __cmpxchg1\n"
+			"	ldrexb	%1, [%2]\n"
+			"	mov	%0, #0\n"
+			"	teq	%1, %3\n"
+			"	strexbeq %0, %4, [%2]\n"
+				: "=&r" (res), "=&r" (oldval)
+				: "r" (ptr), "Ir" (old), "r" (new)
+				: "memory", "cc");
+		} while (res);
+		break;
+	case 2:
+		do {
+			asm volatile("@ __cmpxchg1\n"
+			"	ldrexh	%1, [%2]\n"
+			"	mov	%0, #0\n"
+			"	teq	%1, %3\n"
+			"	strexheq %0, %4, [%2]\n"
+				: "=&r" (res), "=&r" (oldval)
+				: "r" (ptr), "Ir" (old), "r" (new)
+				: "memory", "cc");
+		} while (res);
+		break;
+#endif
+	case 4:
+		do {
+			asm volatile("@ __cmpxchg4\n"
+			"	ldrex	%1, [%2]\n"
+			"	mov	%0, #0\n"
+			"	teq	%1, %3\n"
+			"	strexeq %0, %4, [%2]\n"
+				: "=&r" (res), "=&r" (oldval)
+				: "r" (ptr), "Ir" (old), "r" (new)
+				: "memory", "cc");
+		} while (res);
+		break;
+	default:
+		__bad_cmpxchg(ptr, size);
+		oldval = 0;
+	}
+
+	return oldval;
+}
+
+#define cmpxchg_relaxed(ptr,o,n) ({					\
+	(__typeof__(*(ptr)))__cmpxchg((ptr),				\
+				      (unsigned long)(o),		\
+				      (unsigned long)(n),		\
+				      sizeof(*(ptr)));			\
+})
+
+static inline unsigned long __cmpxchg_local(volatile void *ptr,
+					    unsigned long old,
+					    unsigned long new, int size)
+{
+	unsigned long ret;
+
+	switch (size) {
+#ifdef CONFIG_CPU_V6	/* min ARCH == ARMv6 */
+	case 1:
+	case 2:
+		ret = __cmpxchg_local_generic(ptr, old, new, size);
+		break;
+#endif
+	default:
+		ret = __cmpxchg(ptr, old, new, size);
+	}
+
+	return ret;
+}
+
+#define cmpxchg_local(ptr, o, n) ({					\
+	(__typeof(*ptr))__cmpxchg_local((ptr),				\
+				        (unsigned long)(o),		\
+				        (unsigned long)(n),		\
+				        sizeof(*(ptr)));		\
+})
+
+static inline unsigned long long __cmpxchg64(unsigned long long *ptr,
+					     unsigned long long old,
+					     unsigned long long new)
+{
+	unsigned long long oldval;
+	unsigned long res;
+
+	prefetchw(ptr);
+
+	__asm__ __volatile__(
+"1:	ldrexd		%1, %H1, [%3]\n"
+"	teq		%1, %4\n"
+"	teqeq		%H1, %H4\n"
+"	bne		2f\n"
+"	strexd		%0, %5, %H5, [%3]\n"
+"	teq		%0, #0\n"
+"	bne		1b\n"
+"2:"
+	: "=&r" (res), "=&r" (oldval), "+Qo" (*ptr)
+	: "r" (ptr), "r" (old), "r" (new)
+	: "cc");
+
+	return oldval;
+}
+
+#define cmpxchg64_relaxed(ptr, o, n) ({					\
+	(__typeof__(*(ptr)))__cmpxchg64((ptr),				\
+					(unsigned long long)(o),	\
+					(unsigned long long)(n));	\
+})
+
+#define cmpxchg64_local(ptr, o, n) cmpxchg64_relaxed((ptr), (o), (n))
+
+#endif	/* __LINUX_ARM_ARCH__ >= 6 */
+
+#endif /* __ASM_ARM_CMPXCHG_H */
\ No newline at end of file
diff --git a/xen/include/asm-arm/arm64/atomic.h b/xen/include/asm-arm/arm64/atomic.h
new file mode 100644
index 0000000000..a2eab9f091
--- /dev/null
+++ b/xen/include/asm-arm/arm64/atomic.h
@@ -0,0 +1,228 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Based on arch/arm/include/asm/atomic.h
+ *
+ * Copyright (C) 1996 Russell King.
+ * Copyright (C) 2002 Deep Blue Solutions Ltd.
+ * Copyright (C) 2012 ARM Ltd.
+ */
+#ifndef __ASM_ATOMIC_H
+#define __ASM_ATOMIC_H
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+#include <asm/barrier.h>
+#include <asm/cmpxchg.h>
+#include <asm/lse.h>
+
+#define ATOMIC_OP(op)							\
+static inline void arch_##op(int i, atomic_t *v)			\
+{									\
+	__lse_ll_sc_body(op, i, v);					\
+}
+
+ATOMIC_OP(atomic_andnot)
+ATOMIC_OP(atomic_or)
+ATOMIC_OP(atomic_xor)
+ATOMIC_OP(atomic_add)
+ATOMIC_OP(atomic_and)
+ATOMIC_OP(atomic_sub)
+
+#undef ATOMIC_OP
+
+#define ATOMIC_FETCH_OP(name, op)					\
+static inline int arch_##op##name(int i, atomic_t *v)			\
+{									\
+	return __lse_ll_sc_body(op##name, i, v);			\
+}
+
+#define ATOMIC_FETCH_OPS(op)						\
+	ATOMIC_FETCH_OP(_relaxed, op)					\
+	ATOMIC_FETCH_OP(_acquire, op)					\
+	ATOMIC_FETCH_OP(_release, op)					\
+	ATOMIC_FETCH_OP(        , op)
+
+ATOMIC_FETCH_OPS(atomic_fetch_andnot)
+ATOMIC_FETCH_OPS(atomic_fetch_or)
+ATOMIC_FETCH_OPS(atomic_fetch_xor)
+ATOMIC_FETCH_OPS(atomic_fetch_add)
+ATOMIC_FETCH_OPS(atomic_fetch_and)
+ATOMIC_FETCH_OPS(atomic_fetch_sub)
+ATOMIC_FETCH_OPS(atomic_add_return)
+ATOMIC_FETCH_OPS(atomic_sub_return)
+
+#undef ATOMIC_FETCH_OP
+#undef ATOMIC_FETCH_OPS
+
+#define ATOMIC64_OP(op)							\
+static inline void arch_##op(long i, atomic64_t *v)			\
+{									\
+	__lse_ll_sc_body(op, i, v);					\
+}
+
+ATOMIC64_OP(atomic64_andnot)
+ATOMIC64_OP(atomic64_or)
+ATOMIC64_OP(atomic64_xor)
+ATOMIC64_OP(atomic64_add)
+ATOMIC64_OP(atomic64_and)
+ATOMIC64_OP(atomic64_sub)
+
+#undef ATOMIC64_OP
+
+#define ATOMIC64_FETCH_OP(name, op)					\
+static inline long arch_##op##name(long i, atomic64_t *v)		\
+{									\
+	return __lse_ll_sc_body(op##name, i, v);			\
+}
+
+#define ATOMIC64_FETCH_OPS(op)						\
+	ATOMIC64_FETCH_OP(_relaxed, op)					\
+	ATOMIC64_FETCH_OP(_acquire, op)					\
+	ATOMIC64_FETCH_OP(_release, op)					\
+	ATOMIC64_FETCH_OP(        , op)
+
+ATOMIC64_FETCH_OPS(atomic64_fetch_andnot)
+ATOMIC64_FETCH_OPS(atomic64_fetch_or)
+ATOMIC64_FETCH_OPS(atomic64_fetch_xor)
+ATOMIC64_FETCH_OPS(atomic64_fetch_add)
+ATOMIC64_FETCH_OPS(atomic64_fetch_and)
+ATOMIC64_FETCH_OPS(atomic64_fetch_sub)
+ATOMIC64_FETCH_OPS(atomic64_add_return)
+ATOMIC64_FETCH_OPS(atomic64_sub_return)
+
+#undef ATOMIC64_FETCH_OP
+#undef ATOMIC64_FETCH_OPS
+
+static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
+{
+	return __lse_ll_sc_body(atomic64_dec_if_positive, v);
+}
+
+#define arch_atomic_read(v)			__READ_ONCE((v)->counter)
+#define arch_atomic_set(v, i)			__WRITE_ONCE(((v)->counter), (i))
+
+#define arch_atomic_add_return_relaxed		arch_atomic_add_return_relaxed
+#define arch_atomic_add_return_acquire		arch_atomic_add_return_acquire
+#define arch_atomic_add_return_release		arch_atomic_add_return_release
+#define arch_atomic_add_return			arch_atomic_add_return
+
+#define arch_atomic_sub_return_relaxed		arch_atomic_sub_return_relaxed
+#define arch_atomic_sub_return_acquire		arch_atomic_sub_return_acquire
+#define arch_atomic_sub_return_release		arch_atomic_sub_return_release
+#define arch_atomic_sub_return			arch_atomic_sub_return
+
+#define arch_atomic_fetch_add_relaxed		arch_atomic_fetch_add_relaxed
+#define arch_atomic_fetch_add_acquire		arch_atomic_fetch_add_acquire
+#define arch_atomic_fetch_add_release		arch_atomic_fetch_add_release
+#define arch_atomic_fetch_add			arch_atomic_fetch_add
+
+#define arch_atomic_fetch_sub_relaxed		arch_atomic_fetch_sub_relaxed
+#define arch_atomic_fetch_sub_acquire		arch_atomic_fetch_sub_acquire
+#define arch_atomic_fetch_sub_release		arch_atomic_fetch_sub_release
+#define arch_atomic_fetch_sub			arch_atomic_fetch_sub
+
+#define arch_atomic_fetch_and_relaxed		arch_atomic_fetch_and_relaxed
+#define arch_atomic_fetch_and_acquire		arch_atomic_fetch_and_acquire
+#define arch_atomic_fetch_and_release		arch_atomic_fetch_and_release
+#define arch_atomic_fetch_and			arch_atomic_fetch_and
+
+#define arch_atomic_fetch_andnot_relaxed	arch_atomic_fetch_andnot_relaxed
+#define arch_atomic_fetch_andnot_acquire	arch_atomic_fetch_andnot_acquire
+#define arch_atomic_fetch_andnot_release	arch_atomic_fetch_andnot_release
+#define arch_atomic_fetch_andnot		arch_atomic_fetch_andnot
+
+#define arch_atomic_fetch_or_relaxed		arch_atomic_fetch_or_relaxed
+#define arch_atomic_fetch_or_acquire		arch_atomic_fetch_or_acquire
+#define arch_atomic_fetch_or_release		arch_atomic_fetch_or_release
+#define arch_atomic_fetch_or			arch_atomic_fetch_or
+
+#define arch_atomic_fetch_xor_relaxed		arch_atomic_fetch_xor_relaxed
+#define arch_atomic_fetch_xor_acquire		arch_atomic_fetch_xor_acquire
+#define arch_atomic_fetch_xor_release		arch_atomic_fetch_xor_release
+#define arch_atomic_fetch_xor			arch_atomic_fetch_xor
+
+#define arch_atomic_xchg_relaxed(v, new) \
+	arch_xchg_relaxed(&((v)->counter), (new))
+#define arch_atomic_xchg_acquire(v, new) \
+	arch_xchg_acquire(&((v)->counter), (new))
+#define arch_atomic_xchg_release(v, new) \
+	arch_xchg_release(&((v)->counter), (new))
+#define arch_atomic_xchg(v, new) \
+	arch_xchg(&((v)->counter), (new))
+
+#define arch_atomic_cmpxchg_relaxed(v, old, new) \
+	arch_cmpxchg_relaxed(&((v)->counter), (old), (new))
+#define arch_atomic_cmpxchg_acquire(v, old, new) \
+	arch_cmpxchg_acquire(&((v)->counter), (old), (new))
+#define arch_atomic_cmpxchg_release(v, old, new) \
+	arch_cmpxchg_release(&((v)->counter), (old), (new))
+#define arch_atomic_cmpxchg(v, old, new) \
+	arch_cmpxchg(&((v)->counter), (old), (new))
+
+#define arch_atomic_andnot			arch_atomic_andnot
+
+/*
+ * 64-bit arch_atomic operations.
+ */
+#define ATOMIC64_INIT				ATOMIC_INIT
+#define arch_atomic64_read			arch_atomic_read
+#define arch_atomic64_set			arch_atomic_set
+
+#define arch_atomic64_add_return_relaxed	arch_atomic64_add_return_relaxed
+#define arch_atomic64_add_return_acquire	arch_atomic64_add_return_acquire
+#define arch_atomic64_add_return_release	arch_atomic64_add_return_release
+#define arch_atomic64_add_return		arch_atomic64_add_return
+
+#define arch_atomic64_sub_return_relaxed	arch_atomic64_sub_return_relaxed
+#define arch_atomic64_sub_return_acquire	arch_atomic64_sub_return_acquire
+#define arch_atomic64_sub_return_release	arch_atomic64_sub_return_release
+#define arch_atomic64_sub_return		arch_atomic64_sub_return
+
+#define arch_atomic64_fetch_add_relaxed		arch_atomic64_fetch_add_relaxed
+#define arch_atomic64_fetch_add_acquire		arch_atomic64_fetch_add_acquire
+#define arch_atomic64_fetch_add_release		arch_atomic64_fetch_add_release
+#define arch_atomic64_fetch_add			arch_atomic64_fetch_add
+
+#define arch_atomic64_fetch_sub_relaxed		arch_atomic64_fetch_sub_relaxed
+#define arch_atomic64_fetch_sub_acquire		arch_atomic64_fetch_sub_acquire
+#define arch_atomic64_fetch_sub_release		arch_atomic64_fetch_sub_release
+#define arch_atomic64_fetch_sub			arch_atomic64_fetch_sub
+
+#define arch_atomic64_fetch_and_relaxed		arch_atomic64_fetch_and_relaxed
+#define arch_atomic64_fetch_and_acquire		arch_atomic64_fetch_and_acquire
+#define arch_atomic64_fetch_and_release		arch_atomic64_fetch_and_release
+#define arch_atomic64_fetch_and			arch_atomic64_fetch_and
+
+#define arch_atomic64_fetch_andnot_relaxed	arch_atomic64_fetch_andnot_relaxed
+#define arch_atomic64_fetch_andnot_acquire	arch_atomic64_fetch_andnot_acquire
+#define arch_atomic64_fetch_andnot_release	arch_atomic64_fetch_andnot_release
+#define arch_atomic64_fetch_andnot		arch_atomic64_fetch_andnot
+
+#define arch_atomic64_fetch_or_relaxed		arch_atomic64_fetch_or_relaxed
+#define arch_atomic64_fetch_or_acquire		arch_atomic64_fetch_or_acquire
+#define arch_atomic64_fetch_or_release		arch_atomic64_fetch_or_release
+#define arch_atomic64_fetch_or			arch_atomic64_fetch_or
+
+#define arch_atomic64_fetch_xor_relaxed		arch_atomic64_fetch_xor_relaxed
+#define arch_atomic64_fetch_xor_acquire		arch_atomic64_fetch_xor_acquire
+#define arch_atomic64_fetch_xor_release		arch_atomic64_fetch_xor_release
+#define arch_atomic64_fetch_xor			arch_atomic64_fetch_xor
+
+#define arch_atomic64_xchg_relaxed		arch_atomic_xchg_relaxed
+#define arch_atomic64_xchg_acquire		arch_atomic_xchg_acquire
+#define arch_atomic64_xchg_release		arch_atomic_xchg_release
+#define arch_atomic64_xchg			arch_atomic_xchg
+
+#define arch_atomic64_cmpxchg_relaxed		arch_atomic_cmpxchg_relaxed
+#define arch_atomic64_cmpxchg_acquire		arch_atomic_cmpxchg_acquire
+#define arch_atomic64_cmpxchg_release		arch_atomic_cmpxchg_release
+#define arch_atomic64_cmpxchg			arch_atomic_cmpxchg
+
+#define arch_atomic64_andnot			arch_atomic64_andnot
+
+#define arch_atomic64_dec_if_positive		arch_atomic64_dec_if_positive
+
+#define ARCH_ATOMIC
+
+#endif /* __ASM_ATOMIC_H */
\ No newline at end of file
diff --git a/xen/include/asm-arm/arm64/atomic_ll_sc.h b/xen/include/asm-arm/arm64/atomic_ll_sc.h
new file mode 100644
index 0000000000..e1009c0f94
--- /dev/null
+++ b/xen/include/asm-arm/arm64/atomic_ll_sc.h
@@ -0,0 +1,353 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Based on arch/arm/include/asm/atomic.h
+ *
+ * Copyright (C) 1996 Russell King.
+ * Copyright (C) 2002 Deep Blue Solutions Ltd.
+ * Copyright (C) 2012 ARM Ltd.
+ */
+
+#ifndef __ASM_ATOMIC_LL_SC_H
+#define __ASM_ATOMIC_LL_SC_H
+
+#include <linux/stringify.h>
+
+#ifdef CONFIG_ARM64_LSE_ATOMICS
+#define __LL_SC_FALLBACK(asm_ops)					\
+"	b	3f\n"							\
+"	.subsection	1\n"						\
+"3:\n"									\
+asm_ops "\n"								\
+"	b	4f\n"							\
+"	.previous\n"							\
+"4:\n"
+#else
+#define __LL_SC_FALLBACK(asm_ops) asm_ops
+#endif
+
+#ifndef CONFIG_CC_HAS_K_CONSTRAINT
+#define K
+#endif
+
+/*
+ * AArch64 UP and SMP safe atomic ops.  We use load exclusive and
+ * store exclusive to ensure that these are atomic.  We may loop
+ * to ensure that the update happens.
+ */
+
+#define ATOMIC_OP(op, asm_op, constraint)				\
+static inline void							\
+__ll_sc_atomic_##op(int i, atomic_t *v)					\
+{									\
+	unsigned long tmp;						\
+	int result;							\
+									\
+	asm volatile("// atomic_" #op "\n"				\
+	__LL_SC_FALLBACK(						\
+"	prfm	pstl1strm, %2\n"					\
+"1:	ldxr	%w0, %2\n"						\
+"	" #asm_op "	%w0, %w0, %w3\n"				\
+"	stxr	%w1, %w0, %2\n"						\
+"	cbnz	%w1, 1b\n")						\
+	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
+	: __stringify(constraint) "r" (i));				\
+}
+
+#define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
+static inline int							\
+__ll_sc_atomic_##op##_return##name(int i, atomic_t *v)			\
+{									\
+	unsigned long tmp;						\
+	int result;							\
+									\
+	asm volatile("// atomic_" #op "_return" #name "\n"		\
+	__LL_SC_FALLBACK(						\
+"	prfm	pstl1strm, %2\n"					\
+"1:	ld" #acq "xr	%w0, %2\n"					\
+"	" #asm_op "	%w0, %w0, %w3\n"				\
+"	st" #rel "xr	%w1, %w0, %2\n"					\
+"	cbnz	%w1, 1b\n"						\
+"	" #mb )								\
+	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
+	: __stringify(constraint) "r" (i)				\
+	: cl);								\
+									\
+	return result;							\
+}
+
+#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint) \
+static inline int							\
+__ll_sc_atomic_fetch_##op##name(int i, atomic_t *v)			\
+{									\
+	unsigned long tmp;						\
+	int val, result;						\
+									\
+	asm volatile("// atomic_fetch_" #op #name "\n"			\
+	__LL_SC_FALLBACK(						\
+"	prfm	pstl1strm, %3\n"					\
+"1:	ld" #acq "xr	%w0, %3\n"					\
+"	" #asm_op "	%w1, %w0, %w4\n"				\
+"	st" #rel "xr	%w2, %w1, %3\n"					\
+"	cbnz	%w2, 1b\n"						\
+"	" #mb )								\
+	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
+	: __stringify(constraint) "r" (i)				\
+	: cl);								\
+									\
+	return result;							\
+}
+
+#define ATOMIC_OPS(...)							\
+	ATOMIC_OP(__VA_ARGS__)						\
+	ATOMIC_OP_RETURN(        , dmb ish,  , l, "memory", __VA_ARGS__)\
+	ATOMIC_OP_RETURN(_relaxed,        ,  ,  ,         , __VA_ARGS__)\
+	ATOMIC_OP_RETURN(_acquire,        , a,  , "memory", __VA_ARGS__)\
+	ATOMIC_OP_RETURN(_release,        ,  , l, "memory", __VA_ARGS__)\
+	ATOMIC_FETCH_OP (        , dmb ish,  , l, "memory", __VA_ARGS__)\
+	ATOMIC_FETCH_OP (_relaxed,        ,  ,  ,         , __VA_ARGS__)\
+	ATOMIC_FETCH_OP (_acquire,        , a,  , "memory", __VA_ARGS__)\
+	ATOMIC_FETCH_OP (_release,        ,  , l, "memory", __VA_ARGS__)
+
+ATOMIC_OPS(add, add, I)
+ATOMIC_OPS(sub, sub, J)
+
+#undef ATOMIC_OPS
+#define ATOMIC_OPS(...)							\
+	ATOMIC_OP(__VA_ARGS__)						\
+	ATOMIC_FETCH_OP (        , dmb ish,  , l, "memory", __VA_ARGS__)\
+	ATOMIC_FETCH_OP (_relaxed,        ,  ,  ,         , __VA_ARGS__)\
+	ATOMIC_FETCH_OP (_acquire,        , a,  , "memory", __VA_ARGS__)\
+	ATOMIC_FETCH_OP (_release,        ,  , l, "memory", __VA_ARGS__)
+
+ATOMIC_OPS(and, and, K)
+ATOMIC_OPS(or, orr, K)
+ATOMIC_OPS(xor, eor, K)
+/*
+ * GAS converts the mysterious and undocumented BIC (immediate) alias to
+ * an AND (immediate) instruction with the immediate inverted. We don't
+ * have a constraint for this, so fall back to register.
+ */
+ATOMIC_OPS(andnot, bic, )
+
+#undef ATOMIC_OPS
+#undef ATOMIC_FETCH_OP
+#undef ATOMIC_OP_RETURN
+#undef ATOMIC_OP
+
+#define ATOMIC64_OP(op, asm_op, constraint)				\
+static inline void							\
+__ll_sc_atomic64_##op(s64 i, atomic64_t *v)				\
+{									\
+	s64 result;							\
+	unsigned long tmp;						\
+									\
+	asm volatile("// atomic64_" #op "\n"				\
+	__LL_SC_FALLBACK(						\
+"	prfm	pstl1strm, %2\n"					\
+"1:	ldxr	%0, %2\n"						\
+"	" #asm_op "	%0, %0, %3\n"					\
+"	stxr	%w1, %0, %2\n"						\
+"	cbnz	%w1, 1b")						\
+	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
+	: __stringify(constraint) "r" (i));				\
+}
+
+#define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
+static inline long							\
+__ll_sc_atomic64_##op##_return##name(s64 i, atomic64_t *v)		\
+{									\
+	s64 result;							\
+	unsigned long tmp;						\
+									\
+	asm volatile("// atomic64_" #op "_return" #name "\n"		\
+	__LL_SC_FALLBACK(						\
+"	prfm	pstl1strm, %2\n"					\
+"1:	ld" #acq "xr	%0, %2\n"					\
+"	" #asm_op "	%0, %0, %3\n"					\
+"	st" #rel "xr	%w1, %0, %2\n"					\
+"	cbnz	%w1, 1b\n"						\
+"	" #mb )								\
+	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
+	: __stringify(constraint) "r" (i)				\
+	: cl);								\
+									\
+	return result;							\
+}
+
+#define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)\
+static inline long							\
+__ll_sc_atomic64_fetch_##op##name(s64 i, atomic64_t *v)		\
+{									\
+	s64 result, val;						\
+	unsigned long tmp;						\
+									\
+	asm volatile("// atomic64_fetch_" #op #name "\n"		\
+	__LL_SC_FALLBACK(						\
+"	prfm	pstl1strm, %3\n"					\
+"1:	ld" #acq "xr	%0, %3\n"					\
+"	" #asm_op "	%1, %0, %4\n"					\
+"	st" #rel "xr	%w2, %1, %3\n"					\
+"	cbnz	%w2, 1b\n"						\
+"	" #mb )								\
+	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
+	: __stringify(constraint) "r" (i)				\
+	: cl);								\
+									\
+	return result;							\
+}
+
+#define ATOMIC64_OPS(...)						\
+	ATOMIC64_OP(__VA_ARGS__)					\
+	ATOMIC64_OP_RETURN(, dmb ish,  , l, "memory", __VA_ARGS__)	\
+	ATOMIC64_OP_RETURN(_relaxed,,  ,  ,         , __VA_ARGS__)	\
+	ATOMIC64_OP_RETURN(_acquire,, a,  , "memory", __VA_ARGS__)	\
+	ATOMIC64_OP_RETURN(_release,,  , l, "memory", __VA_ARGS__)	\
+	ATOMIC64_FETCH_OP (, dmb ish,  , l, "memory", __VA_ARGS__)	\
+	ATOMIC64_FETCH_OP (_relaxed,,  ,  ,         , __VA_ARGS__)	\
+	ATOMIC64_FETCH_OP (_acquire,, a,  , "memory", __VA_ARGS__)	\
+	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
+
+ATOMIC64_OPS(add, add, I)
+ATOMIC64_OPS(sub, sub, J)
+
+#undef ATOMIC64_OPS
+#define ATOMIC64_OPS(...)						\
+	ATOMIC64_OP(__VA_ARGS__)					\
+	ATOMIC64_FETCH_OP (, dmb ish,  , l, "memory", __VA_ARGS__)	\
+	ATOMIC64_FETCH_OP (_relaxed,,  ,  ,         , __VA_ARGS__)	\
+	ATOMIC64_FETCH_OP (_acquire,, a,  , "memory", __VA_ARGS__)	\
+	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
+
+ATOMIC64_OPS(and, and, L)
+ATOMIC64_OPS(or, orr, L)
+ATOMIC64_OPS(xor, eor, L)
+/*
+ * GAS converts the mysterious and undocumented BIC (immediate) alias to
+ * an AND (immediate) instruction with the immediate inverted. We don't
+ * have a constraint for this, so fall back to register.
+ */
+ATOMIC64_OPS(andnot, bic, )
+
+#undef ATOMIC64_OPS
+#undef ATOMIC64_FETCH_OP
+#undef ATOMIC64_OP_RETURN
+#undef ATOMIC64_OP
+
+static inline s64
+__ll_sc_atomic64_dec_if_positive(atomic64_t *v)
+{
+	s64 result;
+	unsigned long tmp;
+
+	asm volatile("// atomic64_dec_if_positive\n"
+	__LL_SC_FALLBACK(
+"	prfm	pstl1strm, %2\n"
+"1:	ldxr	%0, %2\n"
+"	subs	%0, %0, #1\n"
+"	b.lt	2f\n"
+"	stlxr	%w1, %0, %2\n"
+"	cbnz	%w1, 1b\n"
+"	dmb	ish\n"
+"2:")
+	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
+	:
+	: "cc", "memory");
+
+	return result;
+}
+
+#define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl, constraint)	\
+static inline u##sz							\
+__ll_sc__cmpxchg_case_##name##sz(volatile void *ptr,			\
+					 unsigned long old,		\
+					 u##sz new)			\
+{									\
+	unsigned long tmp;						\
+	u##sz oldval;							\
+									\
+	/*								\
+	 * Sub-word sizes require explicit casting so that the compare  \
+	 * part of the cmpxchg doesn't end up interpreting non-zero	\
+	 * upper bits of the register containing "old".			\
+	 */								\
+	if (sz < 32)							\
+		old = (u##sz)old;					\
+									\
+	asm volatile(							\
+	__LL_SC_FALLBACK(						\
+	"	prfm	pstl1strm, %[v]\n"				\
+	"1:	ld" #acq "xr" #sfx "\t%" #w "[oldval], %[v]\n"		\
+	"	eor	%" #w "[tmp], %" #w "[oldval], %" #w "[old]\n"	\
+	"	cbnz	%" #w "[tmp], 2f\n"				\
+	"	st" #rel "xr" #sfx "\t%w[tmp], %" #w "[new], %[v]\n"	\
+	"	cbnz	%w[tmp], 1b\n"					\
+	"	" #mb "\n"						\
+	"2:")								\
+	: [tmp] "=&r" (tmp), [oldval] "=&r" (oldval),			\
+	  [v] "+Q" (*(u##sz *)ptr)					\
+	: [old] __stringify(constraint) "r" (old), [new] "r" (new)	\
+	: cl);								\
+									\
+	return oldval;							\
+}
+
+/*
+ * Earlier versions of GCC (no later than 8.1.0) appear to incorrectly
+ * handle the 'K' constraint for the value 4294967295 - thus we use no
+ * constraint for 32 bit operations.
+ */
+__CMPXCHG_CASE(w, b,     ,  8,        ,  ,  ,         , K)
+__CMPXCHG_CASE(w, h,     , 16,        ,  ,  ,         , K)
+__CMPXCHG_CASE(w,  ,     , 32,        ,  ,  ,         , K)
+__CMPXCHG_CASE( ,  ,     , 64,        ,  ,  ,         , L)
+__CMPXCHG_CASE(w, b, acq_,  8,        , a,  , "memory", K)
+__CMPXCHG_CASE(w, h, acq_, 16,        , a,  , "memory", K)
+__CMPXCHG_CASE(w,  , acq_, 32,        , a,  , "memory", K)
+__CMPXCHG_CASE( ,  , acq_, 64,        , a,  , "memory", L)
+__CMPXCHG_CASE(w, b, rel_,  8,        ,  , l, "memory", K)
+__CMPXCHG_CASE(w, h, rel_, 16,        ,  , l, "memory", K)
+__CMPXCHG_CASE(w,  , rel_, 32,        ,  , l, "memory", K)
+__CMPXCHG_CASE( ,  , rel_, 64,        ,  , l, "memory", L)
+__CMPXCHG_CASE(w, b,  mb_,  8, dmb ish,  , l, "memory", K)
+__CMPXCHG_CASE(w, h,  mb_, 16, dmb ish,  , l, "memory", K)
+__CMPXCHG_CASE(w,  ,  mb_, 32, dmb ish,  , l, "memory", K)
+__CMPXCHG_CASE( ,  ,  mb_, 64, dmb ish,  , l, "memory", L)
+
+#undef __CMPXCHG_CASE
+
+#define __CMPXCHG_DBL(name, mb, rel, cl)				\
+static inline long							\
+__ll_sc__cmpxchg_double##name(unsigned long old1,			\
+				      unsigned long old2,		\
+				      unsigned long new1,		\
+				      unsigned long new2,		\
+				      volatile void *ptr)		\
+{									\
+	unsigned long tmp, ret;						\
+									\
+	asm volatile("// __cmpxchg_double" #name "\n"			\
+	__LL_SC_FALLBACK(						\
+	"	prfm	pstl1strm, %2\n"				\
+	"1:	ldxp	%0, %1, %2\n"					\
+	"	eor	%0, %0, %3\n"					\
+	"	eor	%1, %1, %4\n"					\
+	"	orr	%1, %0, %1\n"					\
+	"	cbnz	%1, 2f\n"					\
+	"	st" #rel "xp	%w0, %5, %6, %2\n"			\
+	"	cbnz	%w0, 1b\n"					\
+	"	" #mb "\n"						\
+	"2:")								\
+	: "=&r" (tmp), "=&r" (ret), "+Q" (*(unsigned long *)ptr)	\
+	: "r" (old1), "r" (old2), "r" (new1), "r" (new2)		\
+	: cl);								\
+									\
+	return ret;							\
+}
+
+__CMPXCHG_DBL(   ,        ,  ,         )
+__CMPXCHG_DBL(_mb, dmb ish, l, "memory")
+
+#undef __CMPXCHG_DBL
+#undef K
+
+#endif	/* __ASM_ATOMIC_LL_SC_H */
\ No newline at end of file
diff --git a/xen/include/asm-arm/arm64/atomic_lse.h b/xen/include/asm-arm/arm64/atomic_lse.h
new file mode 100644
index 0000000000..b3b0d43a7d
--- /dev/null
+++ b/xen/include/asm-arm/arm64/atomic_lse.h
@@ -0,0 +1,419 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Based on arch/arm/include/asm/atomic.h
+ *
+ * Copyright (C) 1996 Russell King.
+ * Copyright (C) 2002 Deep Blue Solutions Ltd.
+ * Copyright (C) 2012 ARM Ltd.
+ */
+
+#ifndef __ASM_ATOMIC_LSE_H
+#define __ASM_ATOMIC_LSE_H
+
+#define ATOMIC_OP(op, asm_op)						\
+static inline void __lse_atomic_##op(int i, atomic_t *v)			\
+{									\
+	asm volatile(							\
+	__LSE_PREAMBLE							\
+"	" #asm_op "	%w[i], %[v]\n"					\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v));							\
+}
+
+ATOMIC_OP(andnot, stclr)
+ATOMIC_OP(or, stset)
+ATOMIC_OP(xor, steor)
+ATOMIC_OP(add, stadd)
+
+#undef ATOMIC_OP
+
+#define ATOMIC_FETCH_OP(name, mb, op, asm_op, cl...)			\
+static inline int __lse_atomic_fetch_##op##name(int i, atomic_t *v)	\
+{									\
+	asm volatile(							\
+	__LSE_PREAMBLE							\
+"	" #asm_op #mb "	%w[i], %w[i], %[v]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
+									\
+	return i;							\
+}
+
+#define ATOMIC_FETCH_OPS(op, asm_op)					\
+	ATOMIC_FETCH_OP(_relaxed,   , op, asm_op)			\
+	ATOMIC_FETCH_OP(_acquire,  a, op, asm_op, "memory")		\
+	ATOMIC_FETCH_OP(_release,  l, op, asm_op, "memory")		\
+	ATOMIC_FETCH_OP(        , al, op, asm_op, "memory")
+
+ATOMIC_FETCH_OPS(andnot, ldclr)
+ATOMIC_FETCH_OPS(or, ldset)
+ATOMIC_FETCH_OPS(xor, ldeor)
+ATOMIC_FETCH_OPS(add, ldadd)
+
+#undef ATOMIC_FETCH_OP
+#undef ATOMIC_FETCH_OPS
+
+#define ATOMIC_OP_ADD_RETURN(name, mb, cl...)				\
+static inline int __lse_atomic_add_return##name(int i, atomic_t *v)	\
+{									\
+	u32 tmp;							\
+									\
+	asm volatile(							\
+	__LSE_PREAMBLE							\
+	"	ldadd" #mb "	%w[i], %w[tmp], %[v]\n"			\
+	"	add	%w[i], %w[i], %w[tmp]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
+	: "r" (v)							\
+	: cl);								\
+									\
+	return i;							\
+}
+
+ATOMIC_OP_ADD_RETURN(_relaxed,   )
+ATOMIC_OP_ADD_RETURN(_acquire,  a, "memory")
+ATOMIC_OP_ADD_RETURN(_release,  l, "memory")
+ATOMIC_OP_ADD_RETURN(        , al, "memory")
+
+#undef ATOMIC_OP_ADD_RETURN
+
+static inline void __lse_atomic_and(int i, atomic_t *v)
+{
+	asm volatile(
+	__LSE_PREAMBLE
+	"	mvn	%w[i], %w[i]\n"
+	"	stclr	%w[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
+}
+
+#define ATOMIC_FETCH_OP_AND(name, mb, cl...)				\
+static inline int __lse_atomic_fetch_and##name(int i, atomic_t *v)	\
+{									\
+	asm volatile(							\
+	__LSE_PREAMBLE							\
+	"	mvn	%w[i], %w[i]\n"					\
+	"	ldclr" #mb "	%w[i], %w[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
+									\
+	return i;							\
+}
+
+ATOMIC_FETCH_OP_AND(_relaxed,   )
+ATOMIC_FETCH_OP_AND(_acquire,  a, "memory")
+ATOMIC_FETCH_OP_AND(_release,  l, "memory")
+ATOMIC_FETCH_OP_AND(        , al, "memory")
+
+#undef ATOMIC_FETCH_OP_AND
+
+static inline void __lse_atomic_sub(int i, atomic_t *v)
+{
+	asm volatile(
+	__LSE_PREAMBLE
+	"	neg	%w[i], %w[i]\n"
+	"	stadd	%w[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
+}
+
+#define ATOMIC_OP_SUB_RETURN(name, mb, cl...)				\
+static inline int __lse_atomic_sub_return##name(int i, atomic_t *v)	\
+{									\
+	u32 tmp;							\
+									\
+	asm volatile(							\
+	__LSE_PREAMBLE							\
+	"	neg	%w[i], %w[i]\n"					\
+	"	ldadd" #mb "	%w[i], %w[tmp], %[v]\n"			\
+	"	add	%w[i], %w[i], %w[tmp]"				\
+	: [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
+	: "r" (v)							\
+	: cl);							\
+									\
+	return i;							\
+}
+
+ATOMIC_OP_SUB_RETURN(_relaxed,   )
+ATOMIC_OP_SUB_RETURN(_acquire,  a, "memory")
+ATOMIC_OP_SUB_RETURN(_release,  l, "memory")
+ATOMIC_OP_SUB_RETURN(        , al, "memory")
+
+#undef ATOMIC_OP_SUB_RETURN
+
+#define ATOMIC_FETCH_OP_SUB(name, mb, cl...)				\
+static inline int __lse_atomic_fetch_sub##name(int i, atomic_t *v)	\
+{									\
+	asm volatile(							\
+	__LSE_PREAMBLE							\
+	"	neg	%w[i], %w[i]\n"					\
+	"	ldadd" #mb "	%w[i], %w[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
+									\
+	return i;							\
+}
+
+ATOMIC_FETCH_OP_SUB(_relaxed,   )
+ATOMIC_FETCH_OP_SUB(_acquire,  a, "memory")
+ATOMIC_FETCH_OP_SUB(_release,  l, "memory")
+ATOMIC_FETCH_OP_SUB(        , al, "memory")
+
+#undef ATOMIC_FETCH_OP_SUB
+
+#define ATOMIC64_OP(op, asm_op)						\
+static inline void __lse_atomic64_##op(s64 i, atomic64_t *v)		\
+{									\
+	asm volatile(							\
+	__LSE_PREAMBLE							\
+"	" #asm_op "	%[i], %[v]\n"					\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v));							\
+}
+
+ATOMIC64_OP(andnot, stclr)
+ATOMIC64_OP(or, stset)
+ATOMIC64_OP(xor, steor)
+ATOMIC64_OP(add, stadd)
+
+#undef ATOMIC64_OP
+
+#define ATOMIC64_FETCH_OP(name, mb, op, asm_op, cl...)			\
+static inline long __lse_atomic64_fetch_##op##name(s64 i, atomic64_t *v)\
+{									\
+	asm volatile(							\
+	__LSE_PREAMBLE							\
+"	" #asm_op #mb "	%[i], %[i], %[v]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
+									\
+	return i;							\
+}
+
+#define ATOMIC64_FETCH_OPS(op, asm_op)					\
+	ATOMIC64_FETCH_OP(_relaxed,   , op, asm_op)			\
+	ATOMIC64_FETCH_OP(_acquire,  a, op, asm_op, "memory")		\
+	ATOMIC64_FETCH_OP(_release,  l, op, asm_op, "memory")		\
+	ATOMIC64_FETCH_OP(        , al, op, asm_op, "memory")
+
+ATOMIC64_FETCH_OPS(andnot, ldclr)
+ATOMIC64_FETCH_OPS(or, ldset)
+ATOMIC64_FETCH_OPS(xor, ldeor)
+ATOMIC64_FETCH_OPS(add, ldadd)
+
+#undef ATOMIC64_FETCH_OP
+#undef ATOMIC64_FETCH_OPS
+
+#define ATOMIC64_OP_ADD_RETURN(name, mb, cl...)				\
+static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
+{									\
+	unsigned long tmp;						\
+									\
+	asm volatile(							\
+	__LSE_PREAMBLE							\
+	"	ldadd" #mb "	%[i], %x[tmp], %[v]\n"			\
+	"	add	%[i], %[i], %x[tmp]"				\
+	: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
+	: "r" (v)							\
+	: cl);								\
+									\
+	return i;							\
+}
+
+ATOMIC64_OP_ADD_RETURN(_relaxed,   )
+ATOMIC64_OP_ADD_RETURN(_acquire,  a, "memory")
+ATOMIC64_OP_ADD_RETURN(_release,  l, "memory")
+ATOMIC64_OP_ADD_RETURN(        , al, "memory")
+
+#undef ATOMIC64_OP_ADD_RETURN
+
+static inline void __lse_atomic64_and(s64 i, atomic64_t *v)
+{
+	asm volatile(
+	__LSE_PREAMBLE
+	"	mvn	%[i], %[i]\n"
+	"	stclr	%[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
+}
+
+#define ATOMIC64_FETCH_OP_AND(name, mb, cl...)				\
+static inline long __lse_atomic64_fetch_and##name(s64 i, atomic64_t *v)	\
+{									\
+	asm volatile(							\
+	__LSE_PREAMBLE							\
+	"	mvn	%[i], %[i]\n"					\
+	"	ldclr" #mb "	%[i], %[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
+									\
+	return i;							\
+}
+
+ATOMIC64_FETCH_OP_AND(_relaxed,   )
+ATOMIC64_FETCH_OP_AND(_acquire,  a, "memory")
+ATOMIC64_FETCH_OP_AND(_release,  l, "memory")
+ATOMIC64_FETCH_OP_AND(        , al, "memory")
+
+#undef ATOMIC64_FETCH_OP_AND
+
+static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
+{
+	asm volatile(
+	__LSE_PREAMBLE
+	"	neg	%[i], %[i]\n"
+	"	stadd	%[i], %[v]"
+	: [i] "+&r" (i), [v] "+Q" (v->counter)
+	: "r" (v));
+}
+
+#define ATOMIC64_OP_SUB_RETURN(name, mb, cl...)				\
+static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v)	\
+{									\
+	unsigned long tmp;						\
+									\
+	asm volatile(							\
+	__LSE_PREAMBLE							\
+	"	neg	%[i], %[i]\n"					\
+	"	ldadd" #mb "	%[i], %x[tmp], %[v]\n"			\
+	"	add	%[i], %[i], %x[tmp]"				\
+	: [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
+	: "r" (v)							\
+	: cl);								\
+									\
+	return i;							\
+}
+
+ATOMIC64_OP_SUB_RETURN(_relaxed,   )
+ATOMIC64_OP_SUB_RETURN(_acquire,  a, "memory")
+ATOMIC64_OP_SUB_RETURN(_release,  l, "memory")
+ATOMIC64_OP_SUB_RETURN(        , al, "memory")
+
+#undef ATOMIC64_OP_SUB_RETURN
+
+#define ATOMIC64_FETCH_OP_SUB(name, mb, cl...)				\
+static inline long __lse_atomic64_fetch_sub##name(s64 i, atomic64_t *v)	\
+{									\
+	asm volatile(							\
+	__LSE_PREAMBLE							\
+	"	neg	%[i], %[i]\n"					\
+	"	ldadd" #mb "	%[i], %[i], %[v]"			\
+	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
+	: "r" (v)							\
+	: cl);								\
+									\
+	return i;							\
+}
+
+ATOMIC64_FETCH_OP_SUB(_relaxed,   )
+ATOMIC64_FETCH_OP_SUB(_acquire,  a, "memory")
+ATOMIC64_FETCH_OP_SUB(_release,  l, "memory")
+ATOMIC64_FETCH_OP_SUB(        , al, "memory")
+
+#undef ATOMIC64_FETCH_OP_SUB
+
+static inline s64 __lse_atomic64_dec_if_positive(atomic64_t *v)
+{
+	unsigned long tmp;
+
+	asm volatile(
+	__LSE_PREAMBLE
+	"1:	ldr	%x[tmp], %[v]\n"
+	"	subs	%[ret], %x[tmp], #1\n"
+	"	b.lt	2f\n"
+	"	casal	%x[tmp], %[ret], %[v]\n"
+	"	sub	%x[tmp], %x[tmp], #1\n"
+	"	sub	%x[tmp], %x[tmp], %[ret]\n"
+	"	cbnz	%x[tmp], 1b\n"
+	"2:"
+	: [ret] "+&r" (v), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)
+	:
+	: "cc", "memory");
+
+	return (long)v;
+}
+
+#define __CMPXCHG_CASE(w, sfx, name, sz, mb, cl...)			\
+static __always_inline u##sz						\
+__lse__cmpxchg_case_##name##sz(volatile void *ptr,			\
+					      u##sz old,		\
+					      u##sz new)		\
+{									\
+	register unsigned long x0 asm ("x0") = (unsigned long)ptr;	\
+	register u##sz x1 asm ("x1") = old;				\
+	register u##sz x2 asm ("x2") = new;				\
+	unsigned long tmp;						\
+									\
+	asm volatile(							\
+	__LSE_PREAMBLE							\
+	"	mov	%" #w "[tmp], %" #w "[old]\n"			\
+	"	cas" #mb #sfx "\t%" #w "[tmp], %" #w "[new], %[v]\n"	\
+	"	mov	%" #w "[ret], %" #w "[tmp]"			\
+	: [ret] "+r" (x0), [v] "+Q" (*(unsigned long *)ptr),		\
+	  [tmp] "=&r" (tmp)						\
+	: [old] "r" (x1), [new] "r" (x2)				\
+	: cl);								\
+									\
+	return x0;							\
+}
+
+__CMPXCHG_CASE(w, b,     ,  8,   )
+__CMPXCHG_CASE(w, h,     , 16,   )
+__CMPXCHG_CASE(w,  ,     , 32,   )
+__CMPXCHG_CASE(x,  ,     , 64,   )
+__CMPXCHG_CASE(w, b, acq_,  8,  a, "memory")
+__CMPXCHG_CASE(w, h, acq_, 16,  a, "memory")
+__CMPXCHG_CASE(w,  , acq_, 32,  a, "memory")
+__CMPXCHG_CASE(x,  , acq_, 64,  a, "memory")
+__CMPXCHG_CASE(w, b, rel_,  8,  l, "memory")
+__CMPXCHG_CASE(w, h, rel_, 16,  l, "memory")
+__CMPXCHG_CASE(w,  , rel_, 32,  l, "memory")
+__CMPXCHG_CASE(x,  , rel_, 64,  l, "memory")
+__CMPXCHG_CASE(w, b,  mb_,  8, al, "memory")
+__CMPXCHG_CASE(w, h,  mb_, 16, al, "memory")
+__CMPXCHG_CASE(w,  ,  mb_, 32, al, "memory")
+__CMPXCHG_CASE(x,  ,  mb_, 64, al, "memory")
+
+#undef __CMPXCHG_CASE
+
+#define __CMPXCHG_DBL(name, mb, cl...)					\
+static __always_inline long						\
+__lse__cmpxchg_double##name(unsigned long old1,				\
+					 unsigned long old2,		\
+					 unsigned long new1,		\
+					 unsigned long new2,		\
+					 volatile void *ptr)		\
+{									\
+	unsigned long oldval1 = old1;					\
+	unsigned long oldval2 = old2;					\
+	register unsigned long x0 asm ("x0") = old1;			\
+	register unsigned long x1 asm ("x1") = old2;			\
+	register unsigned long x2 asm ("x2") = new1;			\
+	register unsigned long x3 asm ("x3") = new2;			\
+	register unsigned long x4 asm ("x4") = (unsigned long)ptr;	\
+									\
+	asm volatile(							\
+	__LSE_PREAMBLE							\
+	"	casp" #mb "\t%[old1], %[old2], %[new1], %[new2], %[v]\n"\
+	"	eor	%[old1], %[old1], %[oldval1]\n"			\
+	"	eor	%[old2], %[old2], %[oldval2]\n"			\
+	"	orr	%[old1], %[old1], %[old2]"			\
+	: [old1] "+&r" (x0), [old2] "+&r" (x1),				\
+	  [v] "+Q" (*(unsigned long *)ptr)				\
+	: [new1] "r" (x2), [new2] "r" (x3), [ptr] "r" (x4),		\
+	  [oldval1] "r" (oldval1), [oldval2] "r" (oldval2)		\
+	: cl);								\
+									\
+	return x0;							\
+}
+
+__CMPXCHG_DBL(   ,   )
+__CMPXCHG_DBL(_mb, al, "memory")
+
+#undef __CMPXCHG_DBL
+
+#endif	/* __ASM_ATOMIC_LSE_H */
\ No newline at end of file
diff --git a/xen/include/asm-arm/arm64/cmpxchg.h b/xen/include/asm-arm/arm64/cmpxchg.h
new file mode 100644
index 0000000000..c51388216e
--- /dev/null
+++ b/xen/include/asm-arm/arm64/cmpxchg.h
@@ -0,0 +1,285 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Based on arch/arm/include/asm/cmpxchg.h
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ */
+#ifndef __ASM_CMPXCHG_H
+#define __ASM_CMPXCHG_H
+
+#include <linux/build_bug.h>
+#include <linux/compiler.h>
+
+#include <asm/barrier.h>
+#include <asm/lse.h>
+
+/*
+ * We need separate acquire parameters for ll/sc and lse, since the full
+ * barrier case is generated as release+dmb for the former and
+ * acquire+release for the latter.
+ */
+#define __XCHG_CASE(w, sfx, name, sz, mb, nop_lse, acq, acq_lse, rel, cl)	\
+static inline u##sz __xchg_case_##name##sz(u##sz x, volatile void *ptr)		\
+{										\
+	u##sz ret;								\
+	unsigned long tmp;							\
+										\
+	asm volatile(ARM64_LSE_ATOMIC_INSN(					\
+	/* LL/SC */								\
+	"	prfm	pstl1strm, %2\n"					\
+	"1:	ld" #acq "xr" #sfx "\t%" #w "0, %2\n"				\
+	"	st" #rel "xr" #sfx "\t%w1, %" #w "3, %2\n"			\
+	"	cbnz	%w1, 1b\n"						\
+	"	" #mb,								\
+	/* LSE atomics */							\
+	"	swp" #acq_lse #rel #sfx "\t%" #w "3, %" #w "0, %2\n"		\
+		__nops(3)							\
+	"	" #nop_lse)							\
+	: "=&r" (ret), "=&r" (tmp), "+Q" (*(u##sz *)ptr)			\
+	: "r" (x)								\
+	: cl);									\
+										\
+	return ret;								\
+}
+
+__XCHG_CASE(w, b,     ,  8,        ,    ,  ,  ,  ,         )
+__XCHG_CASE(w, h,     , 16,        ,    ,  ,  ,  ,         )
+__XCHG_CASE(w,  ,     , 32,        ,    ,  ,  ,  ,         )
+__XCHG_CASE( ,  ,     , 64,        ,    ,  ,  ,  ,         )
+__XCHG_CASE(w, b, acq_,  8,        ,    , a, a,  , "memory")
+__XCHG_CASE(w, h, acq_, 16,        ,    , a, a,  , "memory")
+__XCHG_CASE(w,  , acq_, 32,        ,    , a, a,  , "memory")
+__XCHG_CASE( ,  , acq_, 64,        ,    , a, a,  , "memory")
+__XCHG_CASE(w, b, rel_,  8,        ,    ,  ,  , l, "memory")
+__XCHG_CASE(w, h, rel_, 16,        ,    ,  ,  , l, "memory")
+__XCHG_CASE(w,  , rel_, 32,        ,    ,  ,  , l, "memory")
+__XCHG_CASE( ,  , rel_, 64,        ,    ,  ,  , l, "memory")
+__XCHG_CASE(w, b,  mb_,  8, dmb ish, nop,  , a, l, "memory")
+__XCHG_CASE(w, h,  mb_, 16, dmb ish, nop,  , a, l, "memory")
+__XCHG_CASE(w,  ,  mb_, 32, dmb ish, nop,  , a, l, "memory")
+__XCHG_CASE( ,  ,  mb_, 64, dmb ish, nop,  , a, l, "memory")
+
+#undef __XCHG_CASE
+
+#define __XCHG_GEN(sfx)							\
+static __always_inline  unsigned long __xchg##sfx(unsigned long x,	\
+					volatile void *ptr,		\
+					int size)			\
+{									\
+	switch (size) {							\
+	case 1:								\
+		return __xchg_case##sfx##_8(x, ptr);			\
+	case 2:								\
+		return __xchg_case##sfx##_16(x, ptr);			\
+	case 4:								\
+		return __xchg_case##sfx##_32(x, ptr);			\
+	case 8:								\
+		return __xchg_case##sfx##_64(x, ptr);			\
+	default:							\
+		BUILD_BUG();						\
+	}								\
+									\
+	unreachable();							\
+}
+
+__XCHG_GEN()
+__XCHG_GEN(_acq)
+__XCHG_GEN(_rel)
+__XCHG_GEN(_mb)
+
+#undef __XCHG_GEN
+
+#define __xchg_wrapper(sfx, ptr, x)					\
+({									\
+	__typeof__(*(ptr)) __ret;					\
+	__ret = (__typeof__(*(ptr)))					\
+		__xchg##sfx((unsigned long)(x), (ptr), sizeof(*(ptr))); \
+	__ret;								\
+})
+
+/* xchg */
+#define arch_xchg_relaxed(...)	__xchg_wrapper(    , __VA_ARGS__)
+#define arch_xchg_acquire(...)	__xchg_wrapper(_acq, __VA_ARGS__)
+#define arch_xchg_release(...)	__xchg_wrapper(_rel, __VA_ARGS__)
+#define arch_xchg(...)		__xchg_wrapper( _mb, __VA_ARGS__)
+
+#define __CMPXCHG_CASE(name, sz)			\
+static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
+					      u##sz old,		\
+					      u##sz new)		\
+{									\
+	return __lse_ll_sc_body(_cmpxchg_case_##name##sz,		\
+				ptr, old, new);				\
+}
+
+__CMPXCHG_CASE(    ,  8)
+__CMPXCHG_CASE(    , 16)
+__CMPXCHG_CASE(    , 32)
+__CMPXCHG_CASE(    , 64)
+__CMPXCHG_CASE(acq_,  8)
+__CMPXCHG_CASE(acq_, 16)
+__CMPXCHG_CASE(acq_, 32)
+__CMPXCHG_CASE(acq_, 64)
+__CMPXCHG_CASE(rel_,  8)
+__CMPXCHG_CASE(rel_, 16)
+__CMPXCHG_CASE(rel_, 32)
+__CMPXCHG_CASE(rel_, 64)
+__CMPXCHG_CASE(mb_,  8)
+__CMPXCHG_CASE(mb_, 16)
+__CMPXCHG_CASE(mb_, 32)
+__CMPXCHG_CASE(mb_, 64)
+
+#undef __CMPXCHG_CASE
+
+#define __CMPXCHG_DBL(name)						\
+static inline long __cmpxchg_double##name(unsigned long old1,		\
+					 unsigned long old2,		\
+					 unsigned long new1,		\
+					 unsigned long new2,		\
+					 volatile void *ptr)		\
+{									\
+	return __lse_ll_sc_body(_cmpxchg_double##name, 			\
+				old1, old2, new1, new2, ptr);		\
+}
+
+__CMPXCHG_DBL(   )
+__CMPXCHG_DBL(_mb)
+
+#undef __CMPXCHG_DBL
+
+#define __CMPXCHG_GEN(sfx)						\
+static __always_inline unsigned long __cmpxchg##sfx(volatile void *ptr,	\
+					   unsigned long old,		\
+					   unsigned long new,		\
+					   int size)			\
+{									\
+	switch (size) {							\
+	case 1:								\
+		return __cmpxchg_case##sfx##_8(ptr, old, new);		\
+	case 2:								\
+		return __cmpxchg_case##sfx##_16(ptr, old, new);		\
+	case 4:								\
+		return __cmpxchg_case##sfx##_32(ptr, old, new);		\
+	case 8:								\
+		return __cmpxchg_case##sfx##_64(ptr, old, new);		\
+	default:							\
+		BUILD_BUG();						\
+	}								\
+									\
+	unreachable();							\
+}
+
+__CMPXCHG_GEN()
+__CMPXCHG_GEN(_acq)
+__CMPXCHG_GEN(_rel)
+__CMPXCHG_GEN(_mb)
+
+#undef __CMPXCHG_GEN
+
+#define __cmpxchg_wrapper(sfx, ptr, o, n)				\
+({									\
+	__typeof__(*(ptr)) __ret;					\
+	__ret = (__typeof__(*(ptr)))					\
+		__cmpxchg##sfx((ptr), (unsigned long)(o),		\
+				(unsigned long)(n), sizeof(*(ptr)));	\
+	__ret;								\
+})
+
+/* cmpxchg */
+#define arch_cmpxchg_relaxed(...)	__cmpxchg_wrapper(    , __VA_ARGS__)
+#define arch_cmpxchg_acquire(...)	__cmpxchg_wrapper(_acq, __VA_ARGS__)
+#define arch_cmpxchg_release(...)	__cmpxchg_wrapper(_rel, __VA_ARGS__)
+#define arch_cmpxchg(...)		__cmpxchg_wrapper( _mb, __VA_ARGS__)
+#define arch_cmpxchg_local		arch_cmpxchg_relaxed
+
+/* cmpxchg64 */
+#define arch_cmpxchg64_relaxed		arch_cmpxchg_relaxed
+#define arch_cmpxchg64_acquire		arch_cmpxchg_acquire
+#define arch_cmpxchg64_release		arch_cmpxchg_release
+#define arch_cmpxchg64			arch_cmpxchg
+#define arch_cmpxchg64_local		arch_cmpxchg_local
+
+/* cmpxchg_double */
+#define system_has_cmpxchg_double()     1
+
+#define __cmpxchg_double_check(ptr1, ptr2)					\
+({										\
+	if (sizeof(*(ptr1)) != 8)						\
+		BUILD_BUG();							\
+	VM_BUG_ON((unsigned long *)(ptr2) - (unsigned long *)(ptr1) != 1);	\
+})
+
+#define arch_cmpxchg_double(ptr1, ptr2, o1, o2, n1, n2)				\
+({										\
+	int __ret;								\
+	__cmpxchg_double_check(ptr1, ptr2);					\
+	__ret = !__cmpxchg_double_mb((unsigned long)(o1), (unsigned long)(o2),	\
+				     (unsigned long)(n1), (unsigned long)(n2),	\
+				     ptr1);					\
+	__ret;									\
+})
+
+#define arch_cmpxchg_double_local(ptr1, ptr2, o1, o2, n1, n2)			\
+({										\
+	int __ret;								\
+	__cmpxchg_double_check(ptr1, ptr2);					\
+	__ret = !__cmpxchg_double((unsigned long)(o1), (unsigned long)(o2),	\
+				  (unsigned long)(n1), (unsigned long)(n2),	\
+				  ptr1);					\
+	__ret;									\
+})
+
+#define __CMPWAIT_CASE(w, sfx, sz)					\
+static inline void __cmpwait_case_##sz(volatile void *ptr,		\
+				       unsigned long val)		\
+{									\
+	unsigned long tmp;						\
+									\
+	asm volatile(							\
+	"	sevl\n"							\
+	"	wfe\n"							\
+	"	ldxr" #sfx "\t%" #w "[tmp], %[v]\n"			\
+	"	eor	%" #w "[tmp], %" #w "[tmp], %" #w "[val]\n"	\
+	"	cbnz	%" #w "[tmp], 1f\n"				\
+	"	wfe\n"							\
+	"1:"								\
+	: [tmp] "=&r" (tmp), [v] "+Q" (*(unsigned long *)ptr)		\
+	: [val] "r" (val));						\
+}
+
+__CMPWAIT_CASE(w, b, 8);
+__CMPWAIT_CASE(w, h, 16);
+__CMPWAIT_CASE(w,  , 32);
+__CMPWAIT_CASE( ,  , 64);
+
+#undef __CMPWAIT_CASE
+
+#define __CMPWAIT_GEN(sfx)						\
+static __always_inline void __cmpwait##sfx(volatile void *ptr,		\
+				  unsigned long val,			\
+				  int size)				\
+{									\
+	switch (size) {							\
+	case 1:								\
+		return __cmpwait_case##sfx##_8(ptr, (u8)val);		\
+	case 2:								\
+		return __cmpwait_case##sfx##_16(ptr, (u16)val);		\
+	case 4:								\
+		return __cmpwait_case##sfx##_32(ptr, val);		\
+	case 8:								\
+		return __cmpwait_case##sfx##_64(ptr, val);		\
+	default:							\
+		BUILD_BUG();						\
+	}								\
+									\
+	unreachable();							\
+}
+
+__CMPWAIT_GEN()
+
+#undef __CMPWAIT_GEN
+
+#define __cmpwait_relaxed(ptr, val) \
+	__cmpwait((ptr), (unsigned long)(val), sizeof(*(ptr)))
+
+#endif	/* __ASM_CMPXCHG_H */
\ No newline at end of file
diff --git a/xen/include/asm-arm/arm64/lse.h b/xen/include/asm-arm/arm64/lse.h
new file mode 100644
index 0000000000..704be3e4e4
--- /dev/null
+++ b/xen/include/asm-arm/arm64/lse.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_LSE_H
+#define __ASM_LSE_H
+
+#include <asm/atomic_ll_sc.h>
+
+#ifdef CONFIG_ARM64_LSE_ATOMICS
+
+#define __LSE_PREAMBLE	".arch_extension lse\n"
+
+#include <linux/compiler_types.h>
+#include <linux/export.h>
+#include <linux/jump_label.h>
+#include <linux/stringify.h>
+#include <asm/alternative.h>
+#include <asm/atomic_lse.h>
+#include <asm/cpucaps.h>
+
+extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
+extern struct static_key_false arm64_const_caps_ready;
+
+static inline bool system_uses_lse_atomics(void)
+{
+	return (static_branch_likely(&arm64_const_caps_ready)) &&
+		static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
+}
+
+#define __lse_ll_sc_body(op, ...)					\
+({									\
+	system_uses_lse_atomics() ?					\
+		__lse_##op(__VA_ARGS__) :				\
+		__ll_sc_##op(__VA_ARGS__);				\
+})
+
+/* In-line patching at runtime */
+#define ARM64_LSE_ATOMIC_INSN(llsc, lse)				\
+	ALTERNATIVE(llsc, __LSE_PREAMBLE lse, ARM64_HAS_LSE_ATOMICS)
+
+#else	/* CONFIG_ARM64_LSE_ATOMICS */
+
+static inline bool system_uses_lse_atomics(void) { return false; }
+
+#define __lse_ll_sc_body(op, ...)		__ll_sc_##op(__VA_ARGS__)
+
+#define ARM64_LSE_ATOMIC_INSN(llsc, lse)	llsc
+
+#endif	/* CONFIG_ARM64_LSE_ATOMICS */
+#endif	/* __ASM_LSE_H */
\ No newline at end of file
diff --git a/xen/include/xen/rwonce.h b/xen/include/xen/rwonce.h
new file mode 100644
index 0000000000..6b47392d1c
--- /dev/null
+++ b/xen/include/xen/rwonce.h
@@ -0,0 +1,90 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Prevent the compiler from merging or refetching reads or writes. The
+ * compiler is also forbidden from reordering successive instances of
+ * READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some
+ * particular ordering. One way to make the compiler aware of ordering is to
+ * put the two invocations of READ_ONCE or WRITE_ONCE in different C
+ * statements.
+ *
+ * These two macros will also work on aggregate data types like structs or
+ * unions.
+ *
+ * Their two major use cases are: (1) Mediating communication between
+ * process-level code and irq/NMI handlers, all running on the same CPU,
+ * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
+ * mutilate accesses that either do not require ordering or that interact
+ * with an explicit memory barrier or atomic instruction that provides the
+ * required ordering.
+ */
+#ifndef __ASM_GENERIC_RWONCE_H
+#define __ASM_GENERIC_RWONCE_H
+
+#ifndef __ASSEMBLY__
+
+#include <linux/compiler_types.h>
+#include <linux/kasan-checks.h>
+#include <linux/kcsan-checks.h>
+
+/*
+ * Yes, this permits 64-bit accesses on 32-bit architectures. These will
+ * actually be atomic in some cases (namely Armv7 + LPAE), but for others we
+ * rely on the access being split into 2x32-bit accesses for a 32-bit quantity
+ * (e.g. a virtual address) and a strong prevailing wind.
+ */
+#define compiletime_assert_rwonce_type(t)					\
+	compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),	\
+		"Unsupported access size for {READ,WRITE}_ONCE().")
+
+/*
+ * Use __READ_ONCE() instead of READ_ONCE() if you do not require any
+ * atomicity. Note that this may result in tears!
+ */
+#ifndef __READ_ONCE
+#define __READ_ONCE(x)	(*(const volatile __unqual_scalar_typeof(x) *)&(x))
+#endif
+
+#define READ_ONCE(x)							\
+({									\
+	compiletime_assert_rwonce_type(x);				\
+	__READ_ONCE(x);							\
+})
+
+#define __WRITE_ONCE(x, val)						\
+do {									\
+	*(volatile typeof(x) *)&(x) = (val);				\
+} while (0)
+
+#define WRITE_ONCE(x, val)						\
+do {									\
+	compiletime_assert_rwonce_type(x);				\
+	__WRITE_ONCE(x, val);						\
+} while (0)
+
+static __no_sanitize_or_inline
+unsigned long __read_once_word_nocheck(const void *addr)
+{
+	return __READ_ONCE(*(unsigned long *)addr);
+}
+
+/*
+ * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need to load a
+ * word from memory atomically but without telling KASAN/KCSAN. This is
+ * usually used by unwinding code when walking the stack of a running process.
+ */
+#define READ_ONCE_NOCHECK(x)						\
+({									\
+	compiletime_assert(sizeof(x) == sizeof(unsigned long),		\
+		"Unsupported access size for READ_ONCE_NOCHECK().");	\
+	(typeof(x))__read_once_word_nocheck(&(x));			\
+})
+
+static __no_kasan_or_inline
+unsigned long read_word_at_a_time(const void *addr)
+{
+	kasan_check_read(addr, 1);
+	return *(unsigned long *)addr;
+}
+
+#endif /* __ASSEMBLY__ */
+#endif	/* __ASM_GENERIC_RWONCE_H */
\ No newline at end of file
-- 
2.24.3 (Apple Git-128)



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 06/15] xen: port Linux <asm-generic/rwonce.h> to Xen
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
                   ` (4 preceding siblings ...)
  2020-11-11 21:51 ` [RFC PATCH v2 05/15] xen/arm: pull in Linux atomics helpers and dependencies Ash Wilding
@ 2020-11-11 21:51 ` Ash Wilding
  2020-11-11 21:51 ` [RFC PATCH v2 07/15] xen/arm: prepare existing Xen headers for Linux atomics Ash Wilding
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

 - Drop KASAN-related helpers.

 - Drop READ_ONCE() and WRITE_ONCE(); the __* versions are fine for now,
   as the only callers in Xen are the arm32 atomics helpers, which only
   ever access an atomic_t's counter member, which is an int. This means
   we can swap the arm32 atomics helpers over to the __* versions, like
   the arm64 code does, removing a dependency on
   <linux/compiler_types.h> for __native_word().

 - Relax __unqual_scalar_typeof() in __READ_ONCE() to just typeof().
   As above, the only callers in Xen are the arm32/arm64 atomics
   helpers, which always access an atomic_t's counter member through a
   plain (int *) that doesn't need unqual'ing (see the sketch at the
   end of this message). This means we can remove the other dependency
   on <linux/compiler_types.h>.

Please see previous patch in the series for expanded rationale on why
not having to port <linux/compiler_types.h> to Xen makes life easier.
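
To illustrate, here is a minimal sketch (not part of the patch; the
helper names are placeholders rather than the real arm32 helpers) of
how the atomics helpers end up using the relaxed macros. Since counter
is a plain int, typeof() is all __READ_ONCE() needs here:

    /* Minimal sketch, assuming the usual atomic_t layout with a single
     * "int counter" member, as used by the helpers in this series.
     */
    static inline int sketch_atomic_read(const atomic_t *v)
    {
        /* counter is a plain int, so typeof() suffices for the cast */
        return __READ_ONCE(v->counter);
    }

    static inline void sketch_atomic_set(atomic_t *v, int i)
    {
        __WRITE_ONCE(v->counter, i);
    }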

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
 xen/include/xen/rwonce.h | 79 +++-------------------------------------
 1 file changed, 5 insertions(+), 74 deletions(-)

diff --git a/xen/include/xen/rwonce.h b/xen/include/xen/rwonce.h
index 6b47392d1c..d001e7e41e 100644
--- a/xen/include/xen/rwonce.h
+++ b/xen/include/xen/rwonce.h
@@ -1,90 +1,21 @@
-/* SPDX-License-Identifier: GPL-2.0 */
 /*
- * Prevent the compiler from merging or refetching reads or writes. The
- * compiler is also forbidden from reordering successive instances of
- * READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some
- * particular ordering. One way to make the compiler aware of ordering is to
- * put the two invocations of READ_ONCE or WRITE_ONCE in different C
- * statements.
+ * Taken from Linux 5.10-rc2 (last commit 3cea11cd5)
  *
- * These two macros will also work on aggregate data types like structs or
- * unions.
- *
- * Their two major use cases are: (1) Mediating communication between
- * process-level code and irq/NMI handlers, all running on the same CPU,
- * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
- * mutilate accesses that either do not require ordering or that interact
- * with an explicit memory barrier or atomic instruction that provides the
- * required ordering.
+ * SPDX-License-Identifier: GPL-2.0
  */
+
 #ifndef __ASM_GENERIC_RWONCE_H
 #define __ASM_GENERIC_RWONCE_H
 
 #ifndef __ASSEMBLY__
 
-#include <linux/compiler_types.h>
-#include <linux/kasan-checks.h>
-#include <linux/kcsan-checks.h>
-
-/*
- * Yes, this permits 64-bit accesses on 32-bit architectures. These will
- * actually be atomic in some cases (namely Armv7 + LPAE), but for others we
- * rely on the access being split into 2x32-bit accesses for a 32-bit quantity
- * (e.g. a virtual address) and a strong prevailing wind.
- */
-#define compiletime_assert_rwonce_type(t)					\
-	compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),	\
-		"Unsupported access size for {READ,WRITE}_ONCE().")
-
-/*
- * Use __READ_ONCE() instead of READ_ONCE() if you do not require any
- * atomicity. Note that this may result in tears!
- */
-#ifndef __READ_ONCE
-#define __READ_ONCE(x)	(*(const volatile __unqual_scalar_typeof(x) *)&(x))
-#endif
-
-#define READ_ONCE(x)							\
-({									\
-	compiletime_assert_rwonce_type(x);				\
-	__READ_ONCE(x);							\
-})
+#define __READ_ONCE(x)	(*(const volatile typeof(x) *)&(x))
 
 #define __WRITE_ONCE(x, val)						\
 do {									\
 	*(volatile typeof(x) *)&(x) = (val);				\
 } while (0)
 
-#define WRITE_ONCE(x, val)						\
-do {									\
-	compiletime_assert_rwonce_type(x);				\
-	__WRITE_ONCE(x, val);						\
-} while (0)
-
-static __no_sanitize_or_inline
-unsigned long __read_once_word_nocheck(const void *addr)
-{
-	return __READ_ONCE(*(unsigned long *)addr);
-}
-
-/*
- * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need to load a
- * word from memory atomically but without telling KASAN/KCSAN. This is
- * usually used by unwinding code when walking the stack of a running process.
- */
-#define READ_ONCE_NOCHECK(x)						\
-({									\
-	compiletime_assert(sizeof(x) == sizeof(unsigned long),		\
-		"Unsupported access size for READ_ONCE_NOCHECK().");	\
-	(typeof(x))__read_once_word_nocheck(&(x));			\
-})
-
-static __no_kasan_or_inline
-unsigned long read_word_at_a_time(const void *addr)
-{
-	kasan_check_read(addr, 1);
-	return *(unsigned long *)addr;
-}
 
 #endif /* __ASSEMBLY__ */
-#endif	/* __ASM_GENERIC_RWONCE_H */
\ No newline at end of file
+#endif	/* __ASM_GENERIC_RWONCE_H */
-- 
2.24.3 (Apple Git-128)



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 07/15] xen/arm: prepare existing Xen headers for Linux atomics
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
                   ` (5 preceding siblings ...)
  2020-11-11 21:51 ` [RFC PATCH v2 06/15] xen: port Linux <asm-generic/rwonce.h> to Xen Ash Wilding
@ 2020-11-11 21:51 ` Ash Wilding
  2020-11-11 21:51 ` [RFC PATCH v2 08/15] xen/arm64: port Linux's arm64 atomic_ll_sc.h to Xen Ash Wilding
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

This small patch helps prepare <asm-arm/atomic.h> and the arm32/arm64
specific system.h headers to play nicely with the Linux atomics helpers:

 - We don't need the indirection around atomic_add_unless() anymore, so
   pull the old Xen arm64 definition up into <asm-arm/atomic.h> and use
   it for both arm32 and arm64 (a usage sketch follows the list below).

 - We don't need an atomic_xchg() in <asm-arm/atomic.h> as the arm32/arm64
   specific cmpxchg.h from Linux defines it for us.

 - We drop the includes of <asm/system.h> and <xen/prefetch.h> from
   <asm-arm/atomic.h> as they're not needed.

 - We swap out the include of the arm32/arm64 specific cmpxchg.h in the
   arm32/arm64 specific system.h and instead make them include atomic.h;
   this works around build issues from cmpxchg.h trying to use the
   __lse_ll_sc_body() macro before it's ready.
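
For reference, a short usage sketch of the atomic_add_unless()
semantics being kept here; the caller below is hypothetical and only
illustrates the contract: the addition is performed only while the
counter does not equal 'u', and the value observed before any addition
is returned.

    /* Hypothetical caller, for illustration only: take a reference on
     * an object unless its refcount has already dropped to zero.
     */
    static inline bool sketch_get_ref_unless_zero(atomic_t *refcount)
    {
        /* Add 1 unless the current value is 0; a non-zero return means
         * the old value was non-zero and the reference was taken. */
        return atomic_add_unless(refcount, 1, 0) != 0;
    }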

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
 xen/include/asm-arm/arm32/system.h |  2 +-
 xen/include/asm-arm/arm64/system.h |  2 +-
 xen/include/asm-arm/atomic.h       | 15 +++++++++++----
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-arm/arm32/system.h
index ab57abfbc5..88798d11db 100644
--- a/xen/include/asm-arm/arm32/system.h
+++ b/xen/include/asm-arm/arm32/system.h
@@ -2,7 +2,7 @@
 #ifndef __ASM_ARM32_SYSTEM_H
 #define __ASM_ARM32_SYSTEM_H
 
-#include <asm/arm32/cmpxchg.h>
+#include <asm/atomic.h>
 
 #define local_irq_disable() asm volatile ( "cpsid i @ local_irq_disable\n" : : : "cc" )
 #define local_irq_enable()  asm volatile ( "cpsie i @ local_irq_enable\n" : : : "cc" )
diff --git a/xen/include/asm-arm/arm64/system.h b/xen/include/asm-arm/arm64/system.h
index 2e36573ac6..dfbbe4b87d 100644
--- a/xen/include/asm-arm/arm64/system.h
+++ b/xen/include/asm-arm/arm64/system.h
@@ -2,7 +2,7 @@
 #ifndef __ASM_ARM64_SYSTEM_H
 #define __ASM_ARM64_SYSTEM_H
 
-#include <asm/arm64/cmpxchg.h>
+#include <asm/atomic.h>
 
 /* Uses uimm4 as a bitmask to select the clearing of one or more of
  * the DAIF exception mask bits:
diff --git a/xen/include/asm-arm/atomic.h b/xen/include/asm-arm/atomic.h
index ac2798d095..866f54d03c 100644
--- a/xen/include/asm-arm/atomic.h
+++ b/xen/include/asm-arm/atomic.h
@@ -2,8 +2,6 @@
 #define __ARCH_ARM_ATOMIC__
 
 #include <xen/atomic.h>
-#include <xen/prefetch.h>
-#include <asm/system.h>
 
 #define build_atomic_read(name, size, width, type) \
 static inline type name(const volatile type *addr) \
@@ -220,10 +218,19 @@ static inline int atomic_add_negative(int i, atomic_t *v)
 
 static inline int atomic_add_unless(atomic_t *v, int a, int u)
 {
-    return __atomic_add_unless(v, a, u);
+	int c, old;
+
+	c = atomic_read(v);
+	while (c != u && (old = atomic_cmpxchg((v), c, c + a)) != c)
+		c = old;
+
+	return c;
 }
 
-#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
+static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
+{
+	return cmpxchg(&((v)->counter), (old), (new));
+}
 
 #endif /* __ARCH_ARM_ATOMIC__ */
 /*
-- 
2.24.3 (Apple Git-128)




* [RFC PATCH v2 08/15] xen/arm64: port Linux's arm64 atomic_ll_sc.h to Xen
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
                   ` (6 preceding siblings ...)
  2020-11-11 21:51 ` [RFC PATCH v2 07/15] xen/arm: prepare existing Xen headers for Linux atomics Ash Wilding
@ 2020-11-11 21:51 ` Ash Wilding
  2020-12-15 18:38   ` Julien Grall
  2020-11-11 21:51 ` [RFC PATCH v2 09/15] xen/arm64: port Linux's arm64 atomic_lse.h " Ash Wilding
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

Most of the "work" here is simply deleting the atomic64_t helper
definitions as we don't have an atomic64_t type in Xen.
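
For reference, the retained 32-bit helpers are generated by the same
macros as the deleted 64-bit ones; roughly, ATOMIC_OP(add, add, I)
expands to something like the sketch below (ignoring the
__LL_SC_FALLBACK() wrapping used when CONFIG_ARM64_LSE_ATOMICS is set):

    static inline void __ll_sc_atomic_add(int i, atomic_t *v)
    {
        unsigned long tmp;
        int result;

        asm volatile("// atomic_add\n"
        "   prfm    pstl1strm, %2\n"   /* prefetch the line for store        */
        "1: ldxr    %w0, %2\n"         /* load-exclusive v->counter          */
        "   add     %w0, %w0, %w3\n"   /* apply the operation                */
        "   stxr    %w1, %w0, %2\n"    /* store-exclusive; %w1 != 0 on fail  */
        "   cbnz    %w1, 1b"           /* lost exclusivity, so retry         */
        : "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
        : "Ir" (i));
    }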

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
 xen/include/asm-arm/arm64/atomic_ll_sc.h | 134 +----------------------
 1 file changed, 6 insertions(+), 128 deletions(-)

diff --git a/xen/include/asm-arm/arm64/atomic_ll_sc.h b/xen/include/asm-arm/arm64/atomic_ll_sc.h
index e1009c0f94..20b0cb174e 100644
--- a/xen/include/asm-arm/arm64/atomic_ll_sc.h
+++ b/xen/include/asm-arm/arm64/atomic_ll_sc.h
@@ -1,16 +1,16 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
 /*
- * Based on arch/arm/include/asm/atomic.h
+ * Taken from Linux 5.10-rc2 (last commit 3cea11cd5)
  *
  * Copyright (C) 1996 Russell King.
  * Copyright (C) 2002 Deep Blue Solutions Ltd.
  * Copyright (C) 2012 ARM Ltd.
+ * SPDX-License-Identifier: GPL-2.0-only
  */
 
-#ifndef __ASM_ATOMIC_LL_SC_H
-#define __ASM_ATOMIC_LL_SC_H
+#ifndef __ASM_ARM_ARM64_ATOMIC_LL_SC_H
+#define __ASM_ARM_ARM64_ATOMIC_LL_SC_H
 
-#include <linux/stringify.h>
+#include <xen/stringify.h>
 
 #ifdef CONFIG_ARM64_LSE_ATOMICS
 #define __LL_SC_FALLBACK(asm_ops)					\
@@ -134,128 +134,6 @@ ATOMIC_OPS(andnot, bic, )
 #undef ATOMIC_OP_RETURN
 #undef ATOMIC_OP
 
-#define ATOMIC64_OP(op, asm_op, constraint)				\
-static inline void							\
-__ll_sc_atomic64_##op(s64 i, atomic64_t *v)				\
-{									\
-	s64 result;							\
-	unsigned long tmp;						\
-									\
-	asm volatile("// atomic64_" #op "\n"				\
-	__LL_SC_FALLBACK(						\
-"	prfm	pstl1strm, %2\n"					\
-"1:	ldxr	%0, %2\n"						\
-"	" #asm_op "	%0, %0, %3\n"					\
-"	stxr	%w1, %0, %2\n"						\
-"	cbnz	%w1, 1b")						\
-	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: __stringify(constraint) "r" (i));				\
-}
-
-#define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
-static inline long							\
-__ll_sc_atomic64_##op##_return##name(s64 i, atomic64_t *v)		\
-{									\
-	s64 result;							\
-	unsigned long tmp;						\
-									\
-	asm volatile("// atomic64_" #op "_return" #name "\n"		\
-	__LL_SC_FALLBACK(						\
-"	prfm	pstl1strm, %2\n"					\
-"1:	ld" #acq "xr	%0, %2\n"					\
-"	" #asm_op "	%0, %0, %3\n"					\
-"	st" #rel "xr	%w1, %0, %2\n"					\
-"	cbnz	%w1, 1b\n"						\
-"	" #mb )								\
-	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
-	: __stringify(constraint) "r" (i)				\
-	: cl);								\
-									\
-	return result;							\
-}
-
-#define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)\
-static inline long							\
-__ll_sc_atomic64_fetch_##op##name(s64 i, atomic64_t *v)		\
-{									\
-	s64 result, val;						\
-	unsigned long tmp;						\
-									\
-	asm volatile("// atomic64_fetch_" #op #name "\n"		\
-	__LL_SC_FALLBACK(						\
-"	prfm	pstl1strm, %3\n"					\
-"1:	ld" #acq "xr	%0, %3\n"					\
-"	" #asm_op "	%1, %0, %4\n"					\
-"	st" #rel "xr	%w2, %1, %3\n"					\
-"	cbnz	%w2, 1b\n"						\
-"	" #mb )								\
-	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
-	: __stringify(constraint) "r" (i)				\
-	: cl);								\
-									\
-	return result;							\
-}
-
-#define ATOMIC64_OPS(...)						\
-	ATOMIC64_OP(__VA_ARGS__)					\
-	ATOMIC64_OP_RETURN(, dmb ish,  , l, "memory", __VA_ARGS__)	\
-	ATOMIC64_OP_RETURN(_relaxed,,  ,  ,         , __VA_ARGS__)	\
-	ATOMIC64_OP_RETURN(_acquire,, a,  , "memory", __VA_ARGS__)	\
-	ATOMIC64_OP_RETURN(_release,,  , l, "memory", __VA_ARGS__)	\
-	ATOMIC64_FETCH_OP (, dmb ish,  , l, "memory", __VA_ARGS__)	\
-	ATOMIC64_FETCH_OP (_relaxed,,  ,  ,         , __VA_ARGS__)	\
-	ATOMIC64_FETCH_OP (_acquire,, a,  , "memory", __VA_ARGS__)	\
-	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
-
-ATOMIC64_OPS(add, add, I)
-ATOMIC64_OPS(sub, sub, J)
-
-#undef ATOMIC64_OPS
-#define ATOMIC64_OPS(...)						\
-	ATOMIC64_OP(__VA_ARGS__)					\
-	ATOMIC64_FETCH_OP (, dmb ish,  , l, "memory", __VA_ARGS__)	\
-	ATOMIC64_FETCH_OP (_relaxed,,  ,  ,         , __VA_ARGS__)	\
-	ATOMIC64_FETCH_OP (_acquire,, a,  , "memory", __VA_ARGS__)	\
-	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
-
-ATOMIC64_OPS(and, and, L)
-ATOMIC64_OPS(or, orr, L)
-ATOMIC64_OPS(xor, eor, L)
-/*
- * GAS converts the mysterious and undocumented BIC (immediate) alias to
- * an AND (immediate) instruction with the immediate inverted. We don't
- * have a constraint for this, so fall back to register.
- */
-ATOMIC64_OPS(andnot, bic, )
-
-#undef ATOMIC64_OPS
-#undef ATOMIC64_FETCH_OP
-#undef ATOMIC64_OP_RETURN
-#undef ATOMIC64_OP
-
-static inline s64
-__ll_sc_atomic64_dec_if_positive(atomic64_t *v)
-{
-	s64 result;
-	unsigned long tmp;
-
-	asm volatile("// atomic64_dec_if_positive\n"
-	__LL_SC_FALLBACK(
-"	prfm	pstl1strm, %2\n"
-"1:	ldxr	%0, %2\n"
-"	subs	%0, %0, #1\n"
-"	b.lt	2f\n"
-"	stlxr	%w1, %0, %2\n"
-"	cbnz	%w1, 1b\n"
-"	dmb	ish\n"
-"2:")
-	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
-	:
-	: "cc", "memory");
-
-	return result;
-}
-
 #define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl, constraint)	\
 static inline u##sz							\
 __ll_sc__cmpxchg_case_##name##sz(volatile void *ptr,			\
@@ -350,4 +228,4 @@ __CMPXCHG_DBL(_mb, dmb ish, l, "memory")
 #undef __CMPXCHG_DBL
 #undef K
 
-#endif	/* __ASM_ATOMIC_LL_SC_H */
\ No newline at end of file
+#endif	/* __ASM_ARM_ARM64_ATOMIC_LL_SC_H */
\ No newline at end of file
-- 
2.24.3 (Apple Git-128)




* [RFC PATCH v2 09/15] xen/arm64: port Linux's arm64 atomic_lse.h to Xen
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
                   ` (7 preceding siblings ...)
  2020-11-11 21:51 ` [RFC PATCH v2 08/15] xen/arm64: port Linux's arm64 atomic_ll_sc.h to Xen Ash Wilding
@ 2020-11-11 21:51 ` Ash Wilding
  2020-11-11 21:51 ` [RFC PATCH v2 10/15] xen/arm64: port Linux's arm64 cmpxchg.h " Ash Wilding
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

As with the LL/SC atomics helpers, most of the "work" here is simply
deleting the atomic64_t helper definitions as we don't have an
atomic64_t type in Xen.

We also need to s/__always_inline/always_inline/ to match the
qualifier name used by Xen.
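
For comparison with the LL/SC helpers, the retained 32-bit LSE helpers
expand to single atomic instructions with no retry loop; roughly,
ATOMIC_OP(add, stadd) in this header generates something like the
following sketch (based on the analogous 64-bit definitions removed
below):

    static inline void __lse_atomic_add(int i, atomic_t *v)
    {
        asm volatile(
        __LSE_PREAMBLE                    /* ".arch_extension lse\n"       */
        "   stadd   %w[i], %[v]\n"        /* one atomic add, no retry loop */
        : [i] "+r" (i), [v] "+Q" (v->counter)
        : "r" (v));
    }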

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
 xen/include/asm-arm/arm64/atomic_lse.h | 189 ++-----------------------
 1 file changed, 8 insertions(+), 181 deletions(-)

diff --git a/xen/include/asm-arm/arm64/atomic_lse.h b/xen/include/asm-arm/arm64/atomic_lse.h
index b3b0d43a7d..81613f7250 100644
--- a/xen/include/asm-arm/arm64/atomic_lse.h
+++ b/xen/include/asm-arm/arm64/atomic_lse.h
@@ -1,14 +1,15 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
+
 /*
- * Based on arch/arm/include/asm/atomic.h
+ * Taken from Linux 5.10-rc2 (last commit 3cea11cd5)
  *
  * Copyright (C) 1996 Russell King.
  * Copyright (C) 2002 Deep Blue Solutions Ltd.
  * Copyright (C) 2012 ARM Ltd.
+ * SPDX-License-Identifier: GPL-2.0-only
  */
 
-#ifndef __ASM_ATOMIC_LSE_H
-#define __ASM_ATOMIC_LSE_H
+#ifndef __ASM_ARM_ARM64_ATOMIC_LSE_H
+#define __ASM_ARM_ARM64_ATOMIC_LSE_H
 
 #define ATOMIC_OP(op, asm_op)						\
 static inline void __lse_atomic_##op(int i, atomic_t *v)			\
@@ -163,182 +164,8 @@ ATOMIC_FETCH_OP_SUB(        , al, "memory")
 
 #undef ATOMIC_FETCH_OP_SUB
 
-#define ATOMIC64_OP(op, asm_op)						\
-static inline void __lse_atomic64_##op(s64 i, atomic64_t *v)		\
-{									\
-	asm volatile(							\
-	__LSE_PREAMBLE							\
-"	" #asm_op "	%[i], %[v]\n"					\
-	: [i] "+r" (i), [v] "+Q" (v->counter)				\
-	: "r" (v));							\
-}
-
-ATOMIC64_OP(andnot, stclr)
-ATOMIC64_OP(or, stset)
-ATOMIC64_OP(xor, steor)
-ATOMIC64_OP(add, stadd)
-
-#undef ATOMIC64_OP
-
-#define ATOMIC64_FETCH_OP(name, mb, op, asm_op, cl...)			\
-static inline long __lse_atomic64_fetch_##op##name(s64 i, atomic64_t *v)\
-{									\
-	asm volatile(							\
-	__LSE_PREAMBLE							\
-"	" #asm_op #mb "	%[i], %[i], %[v]"				\
-	: [i] "+r" (i), [v] "+Q" (v->counter)				\
-	: "r" (v)							\
-	: cl);								\
-									\
-	return i;							\
-}
-
-#define ATOMIC64_FETCH_OPS(op, asm_op)					\
-	ATOMIC64_FETCH_OP(_relaxed,   , op, asm_op)			\
-	ATOMIC64_FETCH_OP(_acquire,  a, op, asm_op, "memory")		\
-	ATOMIC64_FETCH_OP(_release,  l, op, asm_op, "memory")		\
-	ATOMIC64_FETCH_OP(        , al, op, asm_op, "memory")
-
-ATOMIC64_FETCH_OPS(andnot, ldclr)
-ATOMIC64_FETCH_OPS(or, ldset)
-ATOMIC64_FETCH_OPS(xor, ldeor)
-ATOMIC64_FETCH_OPS(add, ldadd)
-
-#undef ATOMIC64_FETCH_OP
-#undef ATOMIC64_FETCH_OPS
-
-#define ATOMIC64_OP_ADD_RETURN(name, mb, cl...)				\
-static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
-{									\
-	unsigned long tmp;						\
-									\
-	asm volatile(							\
-	__LSE_PREAMBLE							\
-	"	ldadd" #mb "	%[i], %x[tmp], %[v]\n"			\
-	"	add	%[i], %[i], %x[tmp]"				\
-	: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
-	: "r" (v)							\
-	: cl);								\
-									\
-	return i;							\
-}
-
-ATOMIC64_OP_ADD_RETURN(_relaxed,   )
-ATOMIC64_OP_ADD_RETURN(_acquire,  a, "memory")
-ATOMIC64_OP_ADD_RETURN(_release,  l, "memory")
-ATOMIC64_OP_ADD_RETURN(        , al, "memory")
-
-#undef ATOMIC64_OP_ADD_RETURN
-
-static inline void __lse_atomic64_and(s64 i, atomic64_t *v)
-{
-	asm volatile(
-	__LSE_PREAMBLE
-	"	mvn	%[i], %[i]\n"
-	"	stclr	%[i], %[v]"
-	: [i] "+&r" (i), [v] "+Q" (v->counter)
-	: "r" (v));
-}
-
-#define ATOMIC64_FETCH_OP_AND(name, mb, cl...)				\
-static inline long __lse_atomic64_fetch_and##name(s64 i, atomic64_t *v)	\
-{									\
-	asm volatile(							\
-	__LSE_PREAMBLE							\
-	"	mvn	%[i], %[i]\n"					\
-	"	ldclr" #mb "	%[i], %[i], %[v]"			\
-	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
-	: "r" (v)							\
-	: cl);								\
-									\
-	return i;							\
-}
-
-ATOMIC64_FETCH_OP_AND(_relaxed,   )
-ATOMIC64_FETCH_OP_AND(_acquire,  a, "memory")
-ATOMIC64_FETCH_OP_AND(_release,  l, "memory")
-ATOMIC64_FETCH_OP_AND(        , al, "memory")
-
-#undef ATOMIC64_FETCH_OP_AND
-
-static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
-{
-	asm volatile(
-	__LSE_PREAMBLE
-	"	neg	%[i], %[i]\n"
-	"	stadd	%[i], %[v]"
-	: [i] "+&r" (i), [v] "+Q" (v->counter)
-	: "r" (v));
-}
-
-#define ATOMIC64_OP_SUB_RETURN(name, mb, cl...)				\
-static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v)	\
-{									\
-	unsigned long tmp;						\
-									\
-	asm volatile(							\
-	__LSE_PREAMBLE							\
-	"	neg	%[i], %[i]\n"					\
-	"	ldadd" #mb "	%[i], %x[tmp], %[v]\n"			\
-	"	add	%[i], %[i], %x[tmp]"				\
-	: [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
-	: "r" (v)							\
-	: cl);								\
-									\
-	return i;							\
-}
-
-ATOMIC64_OP_SUB_RETURN(_relaxed,   )
-ATOMIC64_OP_SUB_RETURN(_acquire,  a, "memory")
-ATOMIC64_OP_SUB_RETURN(_release,  l, "memory")
-ATOMIC64_OP_SUB_RETURN(        , al, "memory")
-
-#undef ATOMIC64_OP_SUB_RETURN
-
-#define ATOMIC64_FETCH_OP_SUB(name, mb, cl...)				\
-static inline long __lse_atomic64_fetch_sub##name(s64 i, atomic64_t *v)	\
-{									\
-	asm volatile(							\
-	__LSE_PREAMBLE							\
-	"	neg	%[i], %[i]\n"					\
-	"	ldadd" #mb "	%[i], %[i], %[v]"			\
-	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
-	: "r" (v)							\
-	: cl);								\
-									\
-	return i;							\
-}
-
-ATOMIC64_FETCH_OP_SUB(_relaxed,   )
-ATOMIC64_FETCH_OP_SUB(_acquire,  a, "memory")
-ATOMIC64_FETCH_OP_SUB(_release,  l, "memory")
-ATOMIC64_FETCH_OP_SUB(        , al, "memory")
-
-#undef ATOMIC64_FETCH_OP_SUB
-
-static inline s64 __lse_atomic64_dec_if_positive(atomic64_t *v)
-{
-	unsigned long tmp;
-
-	asm volatile(
-	__LSE_PREAMBLE
-	"1:	ldr	%x[tmp], %[v]\n"
-	"	subs	%[ret], %x[tmp], #1\n"
-	"	b.lt	2f\n"
-	"	casal	%x[tmp], %[ret], %[v]\n"
-	"	sub	%x[tmp], %x[tmp], #1\n"
-	"	sub	%x[tmp], %x[tmp], %[ret]\n"
-	"	cbnz	%x[tmp], 1b\n"
-	"2:"
-	: [ret] "+&r" (v), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)
-	:
-	: "cc", "memory");
-
-	return (long)v;
-}
-
 #define __CMPXCHG_CASE(w, sfx, name, sz, mb, cl...)			\
-static __always_inline u##sz						\
+static always_inline u##sz						\
 __lse__cmpxchg_case_##name##sz(volatile void *ptr,			\
 					      u##sz old,		\
 					      u##sz new)		\
@@ -381,7 +208,7 @@ __CMPXCHG_CASE(x,  ,  mb_, 64, al, "memory")
 #undef __CMPXCHG_CASE
 
 #define __CMPXCHG_DBL(name, mb, cl...)					\
-static __always_inline long						\
+static always_inline long						\
 __lse__cmpxchg_double##name(unsigned long old1,				\
 					 unsigned long old2,		\
 					 unsigned long new1,		\
@@ -416,4 +243,4 @@ __CMPXCHG_DBL(_mb, al, "memory")
 
 #undef __CMPXCHG_DBL
 
-#endif	/* __ASM_ATOMIC_LSE_H */
\ No newline at end of file
+#endif	/* __ASM_ARM_ARM64_ATOMIC_LSE_H */
\ No newline at end of file
-- 
2.24.3 (Apple Git-128)




* [RFC PATCH v2 10/15] xen/arm64: port Linux's arm64 cmpxchg.h to Xen
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
                   ` (8 preceding siblings ...)
  2020-11-11 21:51 ` [RFC PATCH v2 09/15] xen/arm64: port Linux's arm64 atomic_lse.h " Ash Wilding
@ 2020-11-11 21:51 ` Ash Wilding
  2020-12-15 18:53   ` Julien Grall
  2020-11-11 21:51 ` [RFC PATCH v2 11/15] xen/arm64: port Linux's arm64 atomic.h " Ash Wilding
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

 - s/arch_xchg/xchg/

 - s/arch_cmpxchg/cmpxchg/

 - Replace calls to BUILD_BUG() with calls to __bad_cmpxchg() as we
   don't currently have a BUILD_BUG() macro in Xen; __bad_cmpxchg() is
   declared but never defined, so any reachable call equivalently
   fails at link time (see the sketch after this list).

 - Replace calls to VM_BUG_ON() with BUG_ON() as we don't currently
   have a VM_BUG_ON() macro in Xen.

 - Pull in the timeout variants of cmpxchg from the original Xen
   arm64 cmpxchg.h as these are required for guest atomics and are
   not provided by Linux. Note these always use LL/SC, so we should
   ideally write LSE versions at some point.
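
A sketch of the __bad_cmpxchg() link-time error mechanism referred to
above. This is illustrative only; xchg_sketch() and its use of the GCC
__atomic_exchange_n() builtin are stand-ins, not code from the patch:

    /* Declared here but deliberately never defined anywhere. */
    extern unsigned long __bad_cmpxchg(volatile void *ptr, int size);

    static inline unsigned long xchg_sketch(volatile unsigned int *ptr,
                                            unsigned int x, int size)
    {
        switch (size) {
        case 4:
            /* Stand-in for the real LL/SC or LSE implementation. */
            return __atomic_exchange_n(ptr, x, __ATOMIC_SEQ_CST);
        default:
            /*
             * size is a compile-time constant at every call site, so the
             * compiler drops this branch for supported sizes; any call
             * that survives leaves an undefined reference and the build
             * fails at link time, much like BUILD_BUG() fails at compile
             * time.
             */
            return __bad_cmpxchg((volatile void *)ptr, size);
        }
    }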

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
 xen/include/asm-arm/arm64/cmpxchg.h | 165 ++++++++++++++++++++++------
 1 file changed, 131 insertions(+), 34 deletions(-)

diff --git a/xen/include/asm-arm/arm64/cmpxchg.h b/xen/include/asm-arm/arm64/cmpxchg.h
index c51388216e..a5282cf66e 100644
--- a/xen/include/asm-arm/arm64/cmpxchg.h
+++ b/xen/include/asm-arm/arm64/cmpxchg.h
@@ -1,17 +1,16 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
 /*
- * Based on arch/arm/include/asm/cmpxchg.h
+ * Taken from Linux 5.10-rc2 (last commit 3cea11cd5)
  *
  * Copyright (C) 2012 ARM Ltd.
+ * SPDX-License-Identifier: GPL-2.0-only
  */
-#ifndef __ASM_CMPXCHG_H
-#define __ASM_CMPXCHG_H
+#ifndef __ASM_ARM_ARM64_CMPXCHG_H
+#define __ASM_ARM_ARM64_CMPXCHG_H
 
-#include <linux/build_bug.h>
-#include <linux/compiler.h>
+#include <asm/bug.h>
+#include "lse.h"
 
-#include <asm/barrier.h>
-#include <asm/lse.h>
+extern unsigned long __bad_cmpxchg(volatile void *ptr, int size);
 
 /*
  * We need separate acquire parameters for ll/sc and lse, since the full
@@ -33,7 +32,9 @@ static inline u##sz __xchg_case_##name##sz(u##sz x, volatile void *ptr)		\
 	"	" #mb,								\
 	/* LSE atomics */							\
 	"	swp" #acq_lse #rel #sfx "\t%" #w "3, %" #w "0, %2\n"		\
-		__nops(3)							\
+		"nop\n"							\
+		"nop\n"							\
+		"nop\n"							\
 	"	" #nop_lse)							\
 	: "=&r" (ret), "=&r" (tmp), "+Q" (*(u##sz *)ptr)			\
 	: "r" (x)								\
@@ -62,7 +63,7 @@ __XCHG_CASE( ,  ,  mb_, 64, dmb ish, nop,  , a, l, "memory")
 #undef __XCHG_CASE
 
 #define __XCHG_GEN(sfx)							\
-static __always_inline  unsigned long __xchg##sfx(unsigned long x,	\
+static always_inline  unsigned long __xchg##sfx(unsigned long x,	\
 					volatile void *ptr,		\
 					int size)			\
 {									\
@@ -76,7 +77,7 @@ static __always_inline  unsigned long __xchg##sfx(unsigned long x,	\
 	case 8:								\
 		return __xchg_case##sfx##_64(x, ptr);			\
 	default:							\
-		BUILD_BUG();						\
+		return __bad_cmpxchg(ptr, size);						\
 	}								\
 									\
 	unreachable();							\
@@ -98,10 +99,10 @@ __XCHG_GEN(_mb)
 })
 
 /* xchg */
-#define arch_xchg_relaxed(...)	__xchg_wrapper(    , __VA_ARGS__)
-#define arch_xchg_acquire(...)	__xchg_wrapper(_acq, __VA_ARGS__)
-#define arch_xchg_release(...)	__xchg_wrapper(_rel, __VA_ARGS__)
-#define arch_xchg(...)		__xchg_wrapper( _mb, __VA_ARGS__)
+#define xchg_relaxed(...)	__xchg_wrapper(    , __VA_ARGS__)
+#define xchg_acquire(...)	__xchg_wrapper(_acq, __VA_ARGS__)
+#define xchg_release(...)	__xchg_wrapper(_rel, __VA_ARGS__)
+#define xchg(...)		__xchg_wrapper( _mb, __VA_ARGS__)
 
 #define __CMPXCHG_CASE(name, sz)			\
 static inline u##sz __cmpxchg_case_##name##sz(volatile void *ptr,	\
@@ -148,7 +149,7 @@ __CMPXCHG_DBL(_mb)
 #undef __CMPXCHG_DBL
 
 #define __CMPXCHG_GEN(sfx)						\
-static __always_inline unsigned long __cmpxchg##sfx(volatile void *ptr,	\
+static always_inline unsigned long __cmpxchg##sfx(volatile void *ptr,	\
 					   unsigned long old,		\
 					   unsigned long new,		\
 					   int size)			\
@@ -163,7 +164,7 @@ static __always_inline unsigned long __cmpxchg##sfx(volatile void *ptr,	\
 	case 8:								\
 		return __cmpxchg_case##sfx##_64(ptr, old, new);		\
 	default:							\
-		BUILD_BUG();						\
+		return __bad_cmpxchg(ptr, size);						\
 	}								\
 									\
 	unreachable();							\
@@ -186,18 +187,18 @@ __CMPXCHG_GEN(_mb)
 })
 
 /* cmpxchg */
-#define arch_cmpxchg_relaxed(...)	__cmpxchg_wrapper(    , __VA_ARGS__)
-#define arch_cmpxchg_acquire(...)	__cmpxchg_wrapper(_acq, __VA_ARGS__)
-#define arch_cmpxchg_release(...)	__cmpxchg_wrapper(_rel, __VA_ARGS__)
-#define arch_cmpxchg(...)		__cmpxchg_wrapper( _mb, __VA_ARGS__)
-#define arch_cmpxchg_local		arch_cmpxchg_relaxed
+#define cmpxchg_relaxed(...)	__cmpxchg_wrapper(    , __VA_ARGS__)
+#define cmpxchg_acquire(...)	__cmpxchg_wrapper(_acq, __VA_ARGS__)
+#define cmpxchg_release(...)	__cmpxchg_wrapper(_rel, __VA_ARGS__)
+#define cmpxchg(...)		__cmpxchg_wrapper( _mb, __VA_ARGS__)
+#define cmpxchg_local		cmpxchg_relaxed
 
 /* cmpxchg64 */
-#define arch_cmpxchg64_relaxed		arch_cmpxchg_relaxed
-#define arch_cmpxchg64_acquire		arch_cmpxchg_acquire
-#define arch_cmpxchg64_release		arch_cmpxchg_release
-#define arch_cmpxchg64			arch_cmpxchg
-#define arch_cmpxchg64_local		arch_cmpxchg_local
+#define cmpxchg64_relaxed		cmpxchg_relaxed
+#define cmpxchg64_acquire		cmpxchg_acquire
+#define cmpxchg64_release		cmpxchg_release
+#define cmpxchg64			cmpxchg
+#define cmpxchg64_local		cmpxchg_local
 
 /* cmpxchg_double */
 #define system_has_cmpxchg_double()     1
@@ -205,11 +206,11 @@ __CMPXCHG_GEN(_mb)
 #define __cmpxchg_double_check(ptr1, ptr2)					\
 ({										\
 	if (sizeof(*(ptr1)) != 8)						\
-		BUILD_BUG();							\
-	VM_BUG_ON((unsigned long *)(ptr2) - (unsigned long *)(ptr1) != 1);	\
+		__bad_cmpxchg(ptr1, sizeof(*(ptr1)));				\
+	BUG_ON((unsigned long *)(ptr2) - (unsigned long *)(ptr1) != 1);	\
 })
 
-#define arch_cmpxchg_double(ptr1, ptr2, o1, o2, n1, n2)				\
+#define cmpxchg_double(ptr1, ptr2, o1, o2, n1, n2)				\
 ({										\
 	int __ret;								\
 	__cmpxchg_double_check(ptr1, ptr2);					\
@@ -219,7 +220,7 @@ __CMPXCHG_GEN(_mb)
 	__ret;									\
 })
 
-#define arch_cmpxchg_double_local(ptr1, ptr2, o1, o2, n1, n2)			\
+#define cmpxchg_double_local(ptr1, ptr2, o1, o2, n1, n2)			\
 ({										\
 	int __ret;								\
 	__cmpxchg_double_check(ptr1, ptr2);					\
@@ -255,7 +256,7 @@ __CMPWAIT_CASE( ,  , 64);
 #undef __CMPWAIT_CASE
 
 #define __CMPWAIT_GEN(sfx)						\
-static __always_inline void __cmpwait##sfx(volatile void *ptr,		\
+static always_inline void __cmpwait##sfx(volatile void *ptr,		\
 				  unsigned long val,			\
 				  int size)				\
 {									\
@@ -269,7 +270,7 @@ static __always_inline void __cmpwait##sfx(volatile void *ptr,		\
 	case 8:								\
 		return __cmpwait_case##sfx##_64(ptr, val);		\
 	default:							\
-		BUILD_BUG();						\
+		__bad_cmpxchg(ptr, size);						\
 	}								\
 									\
 	unreachable();							\
@@ -282,4 +283,100 @@ __CMPWAIT_GEN()
 #define __cmpwait_relaxed(ptr, val) \
 	__cmpwait((ptr), (unsigned long)(val), sizeof(*(ptr)))
 
-#endif	/* __ASM_CMPXCHG_H */
\ No newline at end of file
+/*
+ * This code is from the original Xen arm64 cmpxchg.h, from before the
+ * Linux 5.10-rc2 atomics helpers were ported over. The only changes
+ * here are renaming the macros and functions to explicitly use
+ * "timeout" in their names so that they don't clash with the above.
+ *
+ * We need this here for guest atomics (the only user of the timeout
+ * variants).
+ */
+
+#define __CMPXCHG_TIMEOUT_CASE(w, sz, name)                             \
+static inline bool __cmpxchg_timeout_case_##name(volatile void *ptr,    \
+                                         unsigned long *old,            \
+                                         unsigned long new,             \
+                                         bool timeout,                  \
+                                         unsigned int max_try)          \
+{                                                                       \
+        unsigned long oldval;                                           \
+        unsigned long res;                                              \
+                                                                        \
+        do {                                                            \
+                asm volatile("// __cmpxchg_timeout_case_" #name "\n"    \
+                "       ldxr" #sz "     %" #w "1, %2\n"                 \
+                "       mov     %w0, #0\n"                              \
+                "       cmp     %" #w "1, %" #w "3\n"                   \
+                "       b.ne    1f\n"                                   \
+                "       stxr" #sz "     %w0, %" #w "4, %2\n"            \
+                "1:\n"                                                  \
+                : "=&r" (res), "=&r" (oldval),                          \
+                  "+Q" (*(unsigned long *)ptr)                          \
+                : "Ir" (*old), "r" (new)                                \
+                : "cc");                                                \
+                                                                        \
+                if (!res)                                               \
+                        break;                                          \
+        } while (!timeout || ((--max_try) > 0));                        \
+                                                                        \
+        *old = oldval;                                                  \
+                                                                        \
+        return !res;                                                    \
+}
+
+__CMPXCHG_TIMEOUT_CASE(w, b, 1)
+__CMPXCHG_TIMEOUT_CASE(w, h, 2)
+__CMPXCHG_TIMEOUT_CASE(w,  , 4)
+__CMPXCHG_TIMEOUT_CASE( ,  , 8)
+
+static always_inline bool __int_cmpxchg(volatile void *ptr, unsigned long *old,
+                                        unsigned long new, int size,
+                                        bool timeout, unsigned int max_try)
+{
+        switch (size) {
+        case 1:
+                return __cmpxchg_timeout_case_1(ptr, old, new, timeout, max_try);
+        case 2:
+                return __cmpxchg_timeout_case_2(ptr, old, new, timeout, max_try);
+        case 4:
+                return __cmpxchg_timeout_case_4(ptr, old, new, timeout, max_try);
+        case 8:
+                return __cmpxchg_timeout_case_8(ptr, old, new, timeout, max_try);
+        default:
+                return __bad_cmpxchg(ptr, size);
+        }
+
+        ASSERT_UNREACHABLE();
+}
+
+/*
+ * The helper may fail to update the memory if the action takes too long.
+ *
+ * @old: On entry, the value pointed to contains the expected old value.
+ * It will be updated to the actual old value.
+ * @max_try: Maximum number of iterations
+ *
+ * The helper will return true when the update has succeeded (i.e. no
+ * timeout) and false if the update has failed.
+ */
+static always_inline bool __cmpxchg_timeout(volatile void *ptr,
+                                            unsigned long *old,
+                                            unsigned long new,
+                                            int size,
+                                            unsigned int max_try)
+{
+        bool ret;
+
+        smp_mb();
+        ret = __int_cmpxchg(ptr, old, new, size, true, max_try);
+        smp_mb();
+
+        return ret;
+}
+
+#define __cmpxchg64_timeout(ptr, old, new, max_try)     \
+        __cmpxchg_timeout(ptr, old, new, 8, max_try)
+
+
+#endif	/* __ASM_ARM_ARM64_CMPXCHG_H */
-- 
2.24.3 (Apple Git-128)




* [RFC PATCH v2 11/15] xen/arm64: port Linux's arm64 atomic.h to Xen
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
                   ` (9 preceding siblings ...)
  2020-11-11 21:51 ` [RFC PATCH v2 10/15] xen/arm64: port Linux's arm64 cmpxchg.h " Ash Wilding
@ 2020-11-11 21:51 ` Ash Wilding
  2020-11-11 21:52 ` [RFC PATCH v2 12/15] xen/arm64: port Linux's arm64 lse.h " Ash Wilding
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

 - Drop atomic64_t helper declarations as we don't currently have an
   atomic64_t in Xen.

 - Drop arch_* prefixes (an expansion sketch follows this list).

 - Swap include of <linux/compiler.h> to just <xen/rwonce.h>.
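
For illustration, after dropping the arch_ prefix the ATOMIC_OP()
generator in this header produces helpers named as Xen expects; a
sketch of the expansion for op = atomic_add:

    static inline void atomic_add(int i, atomic_t *v)
    {
        /* Dispatches to __lse_atomic_add() or __ll_sc_atomic_add(). */
        __lse_ll_sc_body(atomic_add, i, v);
    }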

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
 xen/include/asm-arm/arm64/atomic.h | 256 ++++++++---------------------
 1 file changed, 73 insertions(+), 183 deletions(-)

diff --git a/xen/include/asm-arm/arm64/atomic.h b/xen/include/asm-arm/arm64/atomic.h
index a2eab9f091..b695cc6e09 100644
--- a/xen/include/asm-arm/arm64/atomic.h
+++ b/xen/include/asm-arm/arm64/atomic.h
@@ -1,23 +1,23 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
+
 /*
- * Based on arch/arm/include/asm/atomic.h
+ * Taken from Linux 5.10-rc2 (last commit 3cea11cd5)
  *
  * Copyright (C) 1996 Russell King.
  * Copyright (C) 2002 Deep Blue Solutions Ltd.
  * Copyright (C) 2012 ARM Ltd.
+ * SPDX-License-Identifier: GPL-2.0-only
  */
-#ifndef __ASM_ATOMIC_H
-#define __ASM_ATOMIC_H
+#ifndef __ASM_ARM_ARM64_ATOMIC_H
+#define __ASM_ARM_ARM64_ATOMIC_H
 
-#include <linux/compiler.h>
-#include <linux/types.h>
+#include <xen/rwonce.h>
+#include <xen/types.h>
 
-#include <asm/barrier.h>
-#include <asm/cmpxchg.h>
-#include <asm/lse.h>
+#include "lse.h"
+#include "cmpxchg.h"
 
 #define ATOMIC_OP(op)							\
-static inline void arch_##op(int i, atomic_t *v)			\
+static inline void op(int i, atomic_t *v)			\
 {									\
 	__lse_ll_sc_body(op, i, v);					\
 }
@@ -32,7 +32,7 @@ ATOMIC_OP(atomic_sub)
 #undef ATOMIC_OP
 
 #define ATOMIC_FETCH_OP(name, op)					\
-static inline int arch_##op##name(int i, atomic_t *v)			\
+static inline int op##name(int i, atomic_t *v)			\
 {									\
 	return __lse_ll_sc_body(op##name, i, v);			\
 }
@@ -54,175 +54,65 @@ ATOMIC_FETCH_OPS(atomic_sub_return)
 
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_FETCH_OPS
-
-#define ATOMIC64_OP(op)							\
-static inline void arch_##op(long i, atomic64_t *v)			\
-{									\
-	__lse_ll_sc_body(op, i, v);					\
-}
-
-ATOMIC64_OP(atomic64_andnot)
-ATOMIC64_OP(atomic64_or)
-ATOMIC64_OP(atomic64_xor)
-ATOMIC64_OP(atomic64_add)
-ATOMIC64_OP(atomic64_and)
-ATOMIC64_OP(atomic64_sub)
-
-#undef ATOMIC64_OP
-
-#define ATOMIC64_FETCH_OP(name, op)					\
-static inline long arch_##op##name(long i, atomic64_t *v)		\
-{									\
-	return __lse_ll_sc_body(op##name, i, v);			\
-}
-
-#define ATOMIC64_FETCH_OPS(op)						\
-	ATOMIC64_FETCH_OP(_relaxed, op)					\
-	ATOMIC64_FETCH_OP(_acquire, op)					\
-	ATOMIC64_FETCH_OP(_release, op)					\
-	ATOMIC64_FETCH_OP(        , op)
-
-ATOMIC64_FETCH_OPS(atomic64_fetch_andnot)
-ATOMIC64_FETCH_OPS(atomic64_fetch_or)
-ATOMIC64_FETCH_OPS(atomic64_fetch_xor)
-ATOMIC64_FETCH_OPS(atomic64_fetch_add)
-ATOMIC64_FETCH_OPS(atomic64_fetch_and)
-ATOMIC64_FETCH_OPS(atomic64_fetch_sub)
-ATOMIC64_FETCH_OPS(atomic64_add_return)
-ATOMIC64_FETCH_OPS(atomic64_sub_return)
-
-#undef ATOMIC64_FETCH_OP
-#undef ATOMIC64_FETCH_OPS
-
-static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
-{
-	return __lse_ll_sc_body(atomic64_dec_if_positive, v);
-}
-
-#define arch_atomic_read(v)			__READ_ONCE((v)->counter)
-#define arch_atomic_set(v, i)			__WRITE_ONCE(((v)->counter), (i))
-
-#define arch_atomic_add_return_relaxed		arch_atomic_add_return_relaxed
-#define arch_atomic_add_return_acquire		arch_atomic_add_return_acquire
-#define arch_atomic_add_return_release		arch_atomic_add_return_release
-#define arch_atomic_add_return			arch_atomic_add_return
-
-#define arch_atomic_sub_return_relaxed		arch_atomic_sub_return_relaxed
-#define arch_atomic_sub_return_acquire		arch_atomic_sub_return_acquire
-#define arch_atomic_sub_return_release		arch_atomic_sub_return_release
-#define arch_atomic_sub_return			arch_atomic_sub_return
-
-#define arch_atomic_fetch_add_relaxed		arch_atomic_fetch_add_relaxed
-#define arch_atomic_fetch_add_acquire		arch_atomic_fetch_add_acquire
-#define arch_atomic_fetch_add_release		arch_atomic_fetch_add_release
-#define arch_atomic_fetch_add			arch_atomic_fetch_add
-
-#define arch_atomic_fetch_sub_relaxed		arch_atomic_fetch_sub_relaxed
-#define arch_atomic_fetch_sub_acquire		arch_atomic_fetch_sub_acquire
-#define arch_atomic_fetch_sub_release		arch_atomic_fetch_sub_release
-#define arch_atomic_fetch_sub			arch_atomic_fetch_sub
-
-#define arch_atomic_fetch_and_relaxed		arch_atomic_fetch_and_relaxed
-#define arch_atomic_fetch_and_acquire		arch_atomic_fetch_and_acquire
-#define arch_atomic_fetch_and_release		arch_atomic_fetch_and_release
-#define arch_atomic_fetch_and			arch_atomic_fetch_and
-
-#define arch_atomic_fetch_andnot_relaxed	arch_atomic_fetch_andnot_relaxed
-#define arch_atomic_fetch_andnot_acquire	arch_atomic_fetch_andnot_acquire
-#define arch_atomic_fetch_andnot_release	arch_atomic_fetch_andnot_release
-#define arch_atomic_fetch_andnot		arch_atomic_fetch_andnot
-
-#define arch_atomic_fetch_or_relaxed		arch_atomic_fetch_or_relaxed
-#define arch_atomic_fetch_or_acquire		arch_atomic_fetch_or_acquire
-#define arch_atomic_fetch_or_release		arch_atomic_fetch_or_release
-#define arch_atomic_fetch_or			arch_atomic_fetch_or
-
-#define arch_atomic_fetch_xor_relaxed		arch_atomic_fetch_xor_relaxed
-#define arch_atomic_fetch_xor_acquire		arch_atomic_fetch_xor_acquire
-#define arch_atomic_fetch_xor_release		arch_atomic_fetch_xor_release
-#define arch_atomic_fetch_xor			arch_atomic_fetch_xor
-
-#define arch_atomic_xchg_relaxed(v, new) \
-	arch_xchg_relaxed(&((v)->counter), (new))
-#define arch_atomic_xchg_acquire(v, new) \
-	arch_xchg_acquire(&((v)->counter), (new))
-#define arch_atomic_xchg_release(v, new) \
-	arch_xchg_release(&((v)->counter), (new))
-#define arch_atomic_xchg(v, new) \
-	arch_xchg(&((v)->counter), (new))
-
-#define arch_atomic_cmpxchg_relaxed(v, old, new) \
-	arch_cmpxchg_relaxed(&((v)->counter), (old), (new))
-#define arch_atomic_cmpxchg_acquire(v, old, new) \
-	arch_cmpxchg_acquire(&((v)->counter), (old), (new))
-#define arch_atomic_cmpxchg_release(v, old, new) \
-	arch_cmpxchg_release(&((v)->counter), (old), (new))
-#define arch_atomic_cmpxchg(v, old, new) \
-	arch_cmpxchg(&((v)->counter), (old), (new))
-
-#define arch_atomic_andnot			arch_atomic_andnot
-
-/*
- * 64-bit arch_atomic operations.
- */
-#define ATOMIC64_INIT				ATOMIC_INIT
-#define arch_atomic64_read			arch_atomic_read
-#define arch_atomic64_set			arch_atomic_set
-
-#define arch_atomic64_add_return_relaxed	arch_atomic64_add_return_relaxed
-#define arch_atomic64_add_return_acquire	arch_atomic64_add_return_acquire
-#define arch_atomic64_add_return_release	arch_atomic64_add_return_release
-#define arch_atomic64_add_return		arch_atomic64_add_return
-
-#define arch_atomic64_sub_return_relaxed	arch_atomic64_sub_return_relaxed
-#define arch_atomic64_sub_return_acquire	arch_atomic64_sub_return_acquire
-#define arch_atomic64_sub_return_release	arch_atomic64_sub_return_release
-#define arch_atomic64_sub_return		arch_atomic64_sub_return
-
-#define arch_atomic64_fetch_add_relaxed		arch_atomic64_fetch_add_relaxed
-#define arch_atomic64_fetch_add_acquire		arch_atomic64_fetch_add_acquire
-#define arch_atomic64_fetch_add_release		arch_atomic64_fetch_add_release
-#define arch_atomic64_fetch_add			arch_atomic64_fetch_add
-
-#define arch_atomic64_fetch_sub_relaxed		arch_atomic64_fetch_sub_relaxed
-#define arch_atomic64_fetch_sub_acquire		arch_atomic64_fetch_sub_acquire
-#define arch_atomic64_fetch_sub_release		arch_atomic64_fetch_sub_release
-#define arch_atomic64_fetch_sub			arch_atomic64_fetch_sub
-
-#define arch_atomic64_fetch_and_relaxed		arch_atomic64_fetch_and_relaxed
-#define arch_atomic64_fetch_and_acquire		arch_atomic64_fetch_and_acquire
-#define arch_atomic64_fetch_and_release		arch_atomic64_fetch_and_release
-#define arch_atomic64_fetch_and			arch_atomic64_fetch_and
-
-#define arch_atomic64_fetch_andnot_relaxed	arch_atomic64_fetch_andnot_relaxed
-#define arch_atomic64_fetch_andnot_acquire	arch_atomic64_fetch_andnot_acquire
-#define arch_atomic64_fetch_andnot_release	arch_atomic64_fetch_andnot_release
-#define arch_atomic64_fetch_andnot		arch_atomic64_fetch_andnot
-
-#define arch_atomic64_fetch_or_relaxed		arch_atomic64_fetch_or_relaxed
-#define arch_atomic64_fetch_or_acquire		arch_atomic64_fetch_or_acquire
-#define arch_atomic64_fetch_or_release		arch_atomic64_fetch_or_release
-#define arch_atomic64_fetch_or			arch_atomic64_fetch_or
-
-#define arch_atomic64_fetch_xor_relaxed		arch_atomic64_fetch_xor_relaxed
-#define arch_atomic64_fetch_xor_acquire		arch_atomic64_fetch_xor_acquire
-#define arch_atomic64_fetch_xor_release		arch_atomic64_fetch_xor_release
-#define arch_atomic64_fetch_xor			arch_atomic64_fetch_xor
-
-#define arch_atomic64_xchg_relaxed		arch_atomic_xchg_relaxed
-#define arch_atomic64_xchg_acquire		arch_atomic_xchg_acquire
-#define arch_atomic64_xchg_release		arch_atomic_xchg_release
-#define arch_atomic64_xchg			arch_atomic_xchg
-
-#define arch_atomic64_cmpxchg_relaxed		arch_atomic_cmpxchg_relaxed
-#define arch_atomic64_cmpxchg_acquire		arch_atomic_cmpxchg_acquire
-#define arch_atomic64_cmpxchg_release		arch_atomic_cmpxchg_release
-#define arch_atomic64_cmpxchg			arch_atomic_cmpxchg
-
-#define arch_atomic64_andnot			arch_atomic64_andnot
-
-#define arch_atomic64_dec_if_positive		arch_atomic64_dec_if_positive
-
-#define ARCH_ATOMIC
-
-#endif /* __ASM_ATOMIC_H */
\ No newline at end of file
+#define atomic_read(v)			__READ_ONCE((v)->counter)
+#define atomic_set(v, i)			__WRITE_ONCE(((v)->counter), (i))
+
+#define atomic_add_return_relaxed		atomic_add_return_relaxed
+#define atomic_add_return_acquire		atomic_add_return_acquire
+#define atomic_add_return_release		atomic_add_return_release
+#define atomic_add_return			atomic_add_return
+
+#define atomic_sub_return_relaxed		atomic_sub_return_relaxed
+#define atomic_sub_return_acquire		atomic_sub_return_acquire
+#define atomic_sub_return_release		atomic_sub_return_release
+#define atomic_sub_return			atomic_sub_return
+
+#define atomic_fetch_add_relaxed		atomic_fetch_add_relaxed
+#define atomic_fetch_add_acquire		atomic_fetch_add_acquire
+#define atomic_fetch_add_release		atomic_fetch_add_release
+#define atomic_fetch_add			atomic_fetch_add
+
+#define atomic_fetch_sub_relaxed		atomic_fetch_sub_relaxed
+#define atomic_fetch_sub_acquire		atomic_fetch_sub_acquire
+#define atomic_fetch_sub_release		atomic_fetch_sub_release
+#define atomic_fetch_sub			atomic_fetch_sub
+
+#define atomic_fetch_and_relaxed		atomic_fetch_and_relaxed
+#define atomic_fetch_and_acquire		atomic_fetch_and_acquire
+#define atomic_fetch_and_release		atomic_fetch_and_release
+#define atomic_fetch_and			atomic_fetch_and
+
+#define atomic_fetch_andnot_relaxed	atomic_fetch_andnot_relaxed
+#define atomic_fetch_andnot_acquire	atomic_fetch_andnot_acquire
+#define atomic_fetch_andnot_release	atomic_fetch_andnot_release
+#define atomic_fetch_andnot		atomic_fetch_andnot
+
+#define atomic_fetch_or_relaxed		atomic_fetch_or_relaxed
+#define atomic_fetch_or_acquire		atomic_fetch_or_acquire
+#define atomic_fetch_or_release		atomic_fetch_or_release
+#define atomic_fetch_or			atomic_fetch_or
+
+#define atomic_fetch_xor_relaxed		atomic_fetch_xor_relaxed
+#define atomic_fetch_xor_acquire		atomic_fetch_xor_acquire
+#define atomic_fetch_xor_release		atomic_fetch_xor_release
+#define atomic_fetch_xor			atomic_fetch_xor
+
+#define atomic_xchg_relaxed(v, new) \
+	xchg_relaxed(&((v)->counter), (new))
+#define atomic_xchg_acquire(v, new) \
+	xchg_acquire(&((v)->counter), (new))
+#define atomic_xchg_release(v, new) \
+	xchg_release(&((v)->counter), (new))
+#define atomic_xchg(v, new) \
+	xchg(&((v)->counter), (new))
+
+#define atomic_cmpxchg_relaxed(v, old, new) \
+	cmpxchg_relaxed(&((v)->counter), (old), (new))
+#define atomic_cmpxchg_acquire(v, old, new) \
+	cmpxchg_acquire(&((v)->counter), (old), (new))
+#define atomic_cmpxchg_release(v, old, new) \
+	cmpxchg_release(&((v)->counter), (old), (new))
+
+#define atomic_andnot			atomic_andnot
+
+#endif /* __ASM_ARM_ARM64_ATOMIC_H */
\ No newline at end of file
-- 
2.24.3 (Apple Git-128)




* [RFC PATCH v2 12/15] xen/arm64: port Linux's arm64 lse.h to Xen
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
                   ` (10 preceding siblings ...)
  2020-11-11 21:51 ` [RFC PATCH v2 11/15] xen/arm64: port Linux's arm64 atomic.h " Ash Wilding
@ 2020-11-11 21:52 ` Ash Wilding
  2020-11-11 21:52 ` [RFC PATCH v2 13/15] xen/arm32: port Linux's arm32 atomic.h " Ash Wilding
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

This just involves making system_uses_lse_atomics() call cpus_have_cap()
instead of directly looking up in cpu_hwcap_keys.

Not 100% sure whether this is a valid transformation; I still need to
run-test it to confirm.
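
For context, system_uses_lse_atomics() feeds the runtime dispatch in
__lse_ll_sc_body(), which this patch leaves unchanged; in Linux 5.10
that body looks roughly like the sketch below:

    #define __lse_ll_sc_body(op, ...)                           \
    ({                                                          \
            system_uses_lse_atomics() ?                         \
                    __lse_##op(__VA_ARGS__) :                   \
                    __ll_sc_##op(__VA_ARGS__);                  \
    })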

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
 xen/include/asm-arm/arm64/lse.h | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/xen/include/asm-arm/arm64/lse.h b/xen/include/asm-arm/arm64/lse.h
index 704be3e4e4..847727f219 100644
--- a/xen/include/asm-arm/arm64/lse.h
+++ b/xen/include/asm-arm/arm64/lse.h
@@ -1,28 +1,28 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __ASM_LSE_H
-#define __ASM_LSE_H
+/*
+ * Taken from Linux 5.10-rc2 (last commit 3cea11cd5)
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+#ifndef __ASM_ARM_ARM64_LSE_H
+#define __ASM_ARM_ARM64_LSE_H
 
-#include <asm/atomic_ll_sc.h>
+#include "atomic_ll_sc.h"
 
 #ifdef CONFIG_ARM64_LSE_ATOMICS
 
 #define __LSE_PREAMBLE	".arch_extension lse\n"
 
-#include <linux/compiler_types.h>
-#include <linux/export.h>
-#include <linux/jump_label.h>
-#include <linux/stringify.h>
+#include <xen/compiler.h>
+#include <xen/stringify.h>
+#include <xen/types.h>
+
 #include <asm/alternative.h>
-#include <asm/atomic_lse.h>
-#include <asm/cpucaps.h>
 
-extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
-extern struct static_key_false arm64_const_caps_ready;
+#include "atomic_lse.h"
 
 static inline bool system_uses_lse_atomics(void)
 {
-	return (static_branch_likely(&arm64_const_caps_ready)) &&
-		static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
+	return cpus_have_cap(ARM64_HAS_LSE_ATOMICS);
 }
 
 #define __lse_ll_sc_body(op, ...)					\
@@ -45,4 +45,4 @@ static inline bool system_uses_lse_atomics(void) { return false; }
 #define ARM64_LSE_ATOMIC_INSN(llsc, lse)	llsc
 
 #endif	/* CONFIG_ARM64_LSE_ATOMICS */
-#endif	/* __ASM_LSE_H */
\ No newline at end of file
+#endif	/* __ASM_ARM_ARM64_LSE_H */
\ No newline at end of file
-- 
2.24.3 (Apple Git-128)




* [RFC PATCH v2 13/15] xen/arm32: port Linux's arm32 atomic.h to Xen
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
                   ` (11 preceding siblings ...)
  2020-11-11 21:52 ` [RFC PATCH v2 12/15] xen/arm64: port Linux's arm64 lse.h " Ash Wilding
@ 2020-11-11 21:52 ` Ash Wilding
  2020-11-11 21:52 ` [RFC PATCH v2 14/15] xen/arm32: port Linux's arm32 cmpxchg.h " Ash Wilding
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

 - Drop arch_* prefixes.

 - Redirect the include of <linux/compiler.h> to <xen/rwonce.h>, and swap
   usage of READ_ONCE()/WRITE_ONCE() to the __* versions accordingly.
   As discussed earlier in the series, we can do this because we're
   accessing an atomic_t's counter member, which is an int, so the
   extra checks performed by READ_ONCE()/WRITE_ONCE() are redundant;
   this also matches the Linux arm64 code, which already uses the
   __* variants.

 - Drop support for pre-Armv7 systems.

 - Drop atomic64_t helper definitions as we don't currently have an
   atomic64_t in Xen.

 - Add explicit strict variants of atomic_{add,sub}_return() as Linux
   does not define these for arm32 and they're needed for Xen. These
   strict variants are just wrappers that sandwich a call to the
   relaxed variant between two smp_mb()'s.

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
 xen/include/asm-arm/arm32/atomic.h | 357 +++--------------------------
 1 file changed, 28 insertions(+), 329 deletions(-)

diff --git a/xen/include/asm-arm/arm32/atomic.h b/xen/include/asm-arm/arm32/atomic.h
index ac6338dd9b..2d8cd3c586 100644
--- a/xen/include/asm-arm/arm32/atomic.h
+++ b/xen/include/asm-arm/arm32/atomic.h
@@ -1,31 +1,26 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
 /*
- *  arch/arm/include/asm/atomic.h
+ * Taken from Linux 5.10-rc2 (last commit 3cea11cd5)
  *
- *  Copyright (C) 1996 Russell King.
- *  Copyright (C) 2002 Deep Blue Solutions Ltd.
+ * Copyright (C) 1996 Russell King.
+ * Copyright (C) 2002 Deep Blue Solutions Ltd.
+ * SPDX-License-Identifier: GPL-2.0-only
  */
-#ifndef __ASM_ARM_ATOMIC_H
-#define __ASM_ARM_ATOMIC_H
+#ifndef __ASM_ARM_ARM32_ATOMIC_H
+#define __ASM_ARM_ARM32_ATOMIC_H
 
-#include <linux/compiler.h>
-#include <linux/prefetch.h>
-#include <linux/types.h>
-#include <linux/irqflags.h>
-#include <asm/barrier.h>
-#include <asm/cmpxchg.h>
-
-#ifdef __KERNEL__
+#include <xen/rwonce.h>
+#include <xen/prefetch.h>
+#include <xen/types.h>
+#include "system.h"
+#include "cmpxchg.h"
 
 /*
  * On ARM, ordinary assignment (str instruction) doesn't clear the local
  * strex/ldrex monitor on some implementations. The reason we can use it for
  * atomic_set() is the clrex or dummy strex done on every exception return.
  */
-#define atomic_read(v)	READ_ONCE((v)->counter)
-#define atomic_set(v,i)	WRITE_ONCE(((v)->counter), (i))
-
-#if __LINUX_ARM_ARCH__ >= 6
+#define atomic_read(v)	__READ_ONCE((v)->counter)
+#define atomic_set(v,i)	__WRITE_ONCE(((v)->counter), (i))
 
 /*
  * ARMv6 UP and SMP safe atomic ops.  We use load exclusive and
@@ -153,68 +148,6 @@ static inline int atomic_fetch_add_unless(atomic_t *v, int a, int u)
 }
 #define atomic_fetch_add_unless		atomic_fetch_add_unless
 
-#else /* ARM_ARCH_6 */
-
-#ifdef CONFIG_SMP
-#error SMP not supported on pre-ARMv6 CPUs
-#endif
-
-#define ATOMIC_OP(op, c_op, asm_op)					\
-static inline void atomic_##op(int i, atomic_t *v)			\
-{									\
-	unsigned long flags;						\
-									\
-	raw_local_irq_save(flags);					\
-	v->counter c_op i;						\
-	raw_local_irq_restore(flags);					\
-}									\
-
-#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
-static inline int atomic_##op##_return(int i, atomic_t *v)		\
-{									\
-	unsigned long flags;						\
-	int val;							\
-									\
-	raw_local_irq_save(flags);					\
-	v->counter c_op i;						\
-	val = v->counter;						\
-	raw_local_irq_restore(flags);					\
-									\
-	return val;							\
-}
-
-#define ATOMIC_FETCH_OP(op, c_op, asm_op)				\
-static inline int atomic_fetch_##op(int i, atomic_t *v)			\
-{									\
-	unsigned long flags;						\
-	int val;							\
-									\
-	raw_local_irq_save(flags);					\
-	val = v->counter;						\
-	v->counter c_op i;						\
-	raw_local_irq_restore(flags);					\
-									\
-	return val;							\
-}
-
-static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
-{
-	int ret;
-	unsigned long flags;
-
-	raw_local_irq_save(flags);
-	ret = v->counter;
-	if (likely(ret == old))
-		v->counter = new;
-	raw_local_irq_restore(flags);
-
-	return ret;
-}
-
-#define atomic_fetch_andnot		atomic_fetch_andnot
-
-#endif /* __LINUX_ARM_ARCH__ */
-
 #define ATOMIC_OPS(op, c_op, asm_op)					\
 	ATOMIC_OP(op, c_op, asm_op)					\
 	ATOMIC_OP_RETURN(op, c_op, asm_op)				\
@@ -242,266 +175,32 @@ ATOMIC_OPS(xor, ^=, eor)
 
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
 
-#ifndef CONFIG_GENERIC_ATOMIC64
-typedef struct {
-	s64 counter;
-} atomic64_t;
+/*
+ * Linux doesn't define strict atomic_add_return() or atomic_sub_return()
+ * for /arch/arm -- Let's manually define these for Xen.
+ */
 
-#define ATOMIC64_INIT(i) { (i) }
-
-#ifdef CONFIG_ARM_LPAE
-static inline s64 atomic64_read(const atomic64_t *v)
-{
-	s64 result;
-
-	__asm__ __volatile__("@ atomic64_read\n"
-"	ldrd	%0, %H0, [%1]"
-	: "=&r" (result)
-	: "r" (&v->counter), "Qo" (v->counter)
-	);
-
-	return result;
-}
-
-static inline void atomic64_set(atomic64_t *v, s64 i)
-{
-	__asm__ __volatile__("@ atomic64_set\n"
-"	strd	%2, %H2, [%1]"
-	: "=Qo" (v->counter)
-	: "r" (&v->counter), "r" (i)
-	);
-}
-#else
-static inline s64 atomic64_read(const atomic64_t *v)
-{
-	s64 result;
-
-	__asm__ __volatile__("@ atomic64_read\n"
-"	ldrexd	%0, %H0, [%1]"
-	: "=&r" (result)
-	: "r" (&v->counter), "Qo" (v->counter)
-	);
-
-	return result;
-}
-
-static inline void atomic64_set(atomic64_t *v, s64 i)
+static inline int atomic_add_return(int i, atomic_t *v)
 {
-	s64 tmp;
-
-	prefetchw(&v->counter);
-	__asm__ __volatile__("@ atomic64_set\n"
-"1:	ldrexd	%0, %H0, [%2]\n"
-"	strexd	%0, %3, %H3, [%2]\n"
-"	teq	%0, #0\n"
-"	bne	1b"
-	: "=&r" (tmp), "=Qo" (v->counter)
-	: "r" (&v->counter), "r" (i)
-	: "cc");
-}
-#endif
-
-#define ATOMIC64_OP(op, op1, op2)					\
-static inline void atomic64_##op(s64 i, atomic64_t *v)			\
-{									\
-	s64 result;							\
-	unsigned long tmp;						\
-									\
-	prefetchw(&v->counter);						\
-	__asm__ __volatile__("@ atomic64_" #op "\n"			\
-"1:	ldrexd	%0, %H0, [%3]\n"					\
-"	" #op1 " %Q0, %Q0, %Q4\n"					\
-"	" #op2 " %R0, %R0, %R4\n"					\
-"	strexd	%1, %0, %H0, [%3]\n"					\
-"	teq	%1, #0\n"						\
-"	bne	1b"							\
-	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
-	: "r" (&v->counter), "r" (i)					\
-	: "cc");							\
-}									\
-
-#define ATOMIC64_OP_RETURN(op, op1, op2)				\
-static inline s64							\
-atomic64_##op##_return_relaxed(s64 i, atomic64_t *v)			\
-{									\
-	s64 result;							\
-	unsigned long tmp;						\
-									\
-	prefetchw(&v->counter);						\
-									\
-	__asm__ __volatile__("@ atomic64_" #op "_return\n"		\
-"1:	ldrexd	%0, %H0, [%3]\n"					\
-"	" #op1 " %Q0, %Q0, %Q4\n"					\
-"	" #op2 " %R0, %R0, %R4\n"					\
-"	strexd	%1, %0, %H0, [%3]\n"					\
-"	teq	%1, #0\n"						\
-"	bne	1b"							\
-	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)		\
-	: "r" (&v->counter), "r" (i)					\
-	: "cc");							\
-									\
-	return result;							\
-}
-
-#define ATOMIC64_FETCH_OP(op, op1, op2)					\
-static inline s64							\
-atomic64_fetch_##op##_relaxed(s64 i, atomic64_t *v)			\
-{									\
-	s64 result, val;						\
-	unsigned long tmp;						\
-									\
-	prefetchw(&v->counter);						\
-									\
-	__asm__ __volatile__("@ atomic64_fetch_" #op "\n"		\
-"1:	ldrexd	%0, %H0, [%4]\n"					\
-"	" #op1 " %Q1, %Q0, %Q5\n"					\
-"	" #op2 " %R1, %R0, %R5\n"					\
-"	strexd	%2, %1, %H1, [%4]\n"					\
-"	teq	%2, #0\n"						\
-"	bne	1b"							\
-	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Qo" (v->counter)	\
-	: "r" (&v->counter), "r" (i)					\
-	: "cc");							\
-									\
-	return result;							\
-}
-
-#define ATOMIC64_OPS(op, op1, op2)					\
-	ATOMIC64_OP(op, op1, op2)					\
-	ATOMIC64_OP_RETURN(op, op1, op2)				\
-	ATOMIC64_FETCH_OP(op, op1, op2)
-
-ATOMIC64_OPS(add, adds, adc)
-ATOMIC64_OPS(sub, subs, sbc)
-
-#define atomic64_add_return_relaxed	atomic64_add_return_relaxed
-#define atomic64_sub_return_relaxed	atomic64_sub_return_relaxed
-#define atomic64_fetch_add_relaxed	atomic64_fetch_add_relaxed
-#define atomic64_fetch_sub_relaxed	atomic64_fetch_sub_relaxed
-
-#undef ATOMIC64_OPS
-#define ATOMIC64_OPS(op, op1, op2)					\
-	ATOMIC64_OP(op, op1, op2)					\
-	ATOMIC64_FETCH_OP(op, op1, op2)
-
-#define atomic64_andnot atomic64_andnot
-
-ATOMIC64_OPS(and, and, and)
-ATOMIC64_OPS(andnot, bic, bic)
-ATOMIC64_OPS(or,  orr, orr)
-ATOMIC64_OPS(xor, eor, eor)
-
-#define atomic64_fetch_and_relaxed	atomic64_fetch_and_relaxed
-#define atomic64_fetch_andnot_relaxed	atomic64_fetch_andnot_relaxed
-#define atomic64_fetch_or_relaxed	atomic64_fetch_or_relaxed
-#define atomic64_fetch_xor_relaxed	atomic64_fetch_xor_relaxed
-
-#undef ATOMIC64_OPS
-#undef ATOMIC64_FETCH_OP
-#undef ATOMIC64_OP_RETURN
-#undef ATOMIC64_OP
-
-static inline s64 atomic64_cmpxchg_relaxed(atomic64_t *ptr, s64 old, s64 new)
-{
-	s64 oldval;
-	unsigned long res;
-
-	prefetchw(&ptr->counter);
-
-	do {
-		__asm__ __volatile__("@ atomic64_cmpxchg\n"
-		"ldrexd		%1, %H1, [%3]\n"
-		"mov		%0, #0\n"
-		"teq		%1, %4\n"
-		"teqeq		%H1, %H4\n"
-		"strexdeq	%0, %5, %H5, [%3]"
-		: "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
-		: "r" (&ptr->counter), "r" (old), "r" (new)
-		: "cc");
-	} while (res);
-
-	return oldval;
-}
-#define atomic64_cmpxchg_relaxed	atomic64_cmpxchg_relaxed
-
-static inline s64 atomic64_xchg_relaxed(atomic64_t *ptr, s64 new)
-{
-	s64 result;
-	unsigned long tmp;
-
-	prefetchw(&ptr->counter);
-
-	__asm__ __volatile__("@ atomic64_xchg\n"
-"1:	ldrexd	%0, %H0, [%3]\n"
-"	strexd	%1, %4, %H4, [%3]\n"
-"	teq	%1, #0\n"
-"	bne	1b"
-	: "=&r" (result), "=&r" (tmp), "+Qo" (ptr->counter)
-	: "r" (&ptr->counter), "r" (new)
-	: "cc");
-
-	return result;
-}
-#define atomic64_xchg_relaxed		atomic64_xchg_relaxed
-
-static inline s64 atomic64_dec_if_positive(atomic64_t *v)
-{
-	s64 result;
-	unsigned long tmp;
+	int ret;
 
 	smp_mb();
-	prefetchw(&v->counter);
-
-	__asm__ __volatile__("@ atomic64_dec_if_positive\n"
-"1:	ldrexd	%0, %H0, [%3]\n"
-"	subs	%Q0, %Q0, #1\n"
-"	sbc	%R0, %R0, #0\n"
-"	teq	%R0, #0\n"
-"	bmi	2f\n"
-"	strexd	%1, %0, %H0, [%3]\n"
-"	teq	%1, #0\n"
-"	bne	1b\n"
-"2:"
-	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
-	: "r" (&v->counter)
-	: "cc");
-
+	ret = atomic_add_return_relaxed(i, v);
 	smp_mb();
 
-	return result;
+	return ret;
 }
-#define atomic64_dec_if_positive atomic64_dec_if_positive
+#define atomic_fetch_add(i, v) atomic_add_return(i, v)
 
-static inline s64 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
+static inline int atomic_sub_return(int i, atomic_t *v)
 {
-	s64 oldval, newval;
-	unsigned long tmp;
+	int ret;
 
 	smp_mb();
-	prefetchw(&v->counter);
-
-	__asm__ __volatile__("@ atomic64_add_unless\n"
-"1:	ldrexd	%0, %H0, [%4]\n"
-"	teq	%0, %5\n"
-"	teqeq	%H0, %H5\n"
-"	beq	2f\n"
-"	adds	%Q1, %Q0, %Q6\n"
-"	adc	%R1, %R0, %R6\n"
-"	strexd	%2, %1, %H1, [%4]\n"
-"	teq	%2, #0\n"
-"	bne	1b\n"
-"2:"
-	: "=&r" (oldval), "=&r" (newval), "=&r" (tmp), "+Qo" (v->counter)
-	: "r" (&v->counter), "r" (u), "r" (a)
-	: "cc");
-
-	if (oldval != u)
-		smp_mb();
+	ret = atomic_sub_return_relaxed(i, v);
+	smp_mb();
 
-	return oldval;
+	return ret;
 }
-#define atomic64_fetch_add_unless atomic64_fetch_add_unless
 
-#endif /* !CONFIG_GENERIC_ATOMIC64 */
-#endif
-#endif
\ No newline at end of file
+#endif /* __ASM_ARM_ARM32_ATOMIC_H */
-- 
2.24.3 (Apple Git-128)




* [RFC PATCH v2 14/15] xen/arm32: port Linux's arm32 cmpxchg.h to Xen
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
                   ` (12 preceding siblings ...)
  2020-11-11 21:52 ` [RFC PATCH v2 13/15] xen/arm32: port Linux's arm32 atomic.h " Ash Wilding
@ 2020-11-11 21:52 ` Ash Wilding
  2020-11-11 21:52 ` [RFC PATCH v2 15/15] xen/arm: remove dependency on gcc built-in __sync_fetch_and_add() Ash Wilding
  2020-12-15 19:31 ` [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Julien Grall
  15 siblings, 0 replies; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

 - Drop support for pre-Armv7 systems, including the workarounds for
   the swp instruction being broken on StrongARM.

 - Drop the local variants as they have no callers in Xen.

 - Keep the compiler happy by fixing __cmpxchg64()'s ptr arg to be
   volatile, and casting ptr to be (const void *) in the call to
   prefetchw().

 - Add explicit strict variants of xchg(), cmpxchg(), and cmpxchg64(),
   as the Linux arm32 cmpxchg.h doesn't define these and they're needed
   for Xen. These strict variants are just wrappers that sandwich calls
   to the relaxed variants between two smp_mb()'s (sketched after this
   list).

 - Pull in the timeout variants of cmpxchg from the original Xen
   arm32 cmpxchg.h as these are required for guest atomics and are
   not provided by Linux.

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
 xen/include/asm-arm/arm32/cmpxchg.h | 322 ++++++++++++++++------------
 1 file changed, 188 insertions(+), 134 deletions(-)

diff --git a/xen/include/asm-arm/arm32/cmpxchg.h b/xen/include/asm-arm/arm32/cmpxchg.h
index 638ae84afb..d7189984d0 100644
--- a/xen/include/asm-arm/arm32/cmpxchg.h
+++ b/xen/include/asm-arm/arm32/cmpxchg.h
@@ -1,46 +1,24 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __ASM_ARM_CMPXCHG_H
-#define __ASM_ARM_CMPXCHG_H
-
-#include <linux/irqflags.h>
-#include <linux/prefetch.h>
-#include <asm/barrier.h>
-
-#if defined(CONFIG_CPU_SA1100) || defined(CONFIG_CPU_SA110)
 /*
- * On the StrongARM, "swp" is terminally broken since it bypasses the
- * cache totally.  This means that the cache becomes inconsistent, and,
- * since we use normal loads/stores as well, this is really bad.
- * Typically, this causes oopsen in filp_close, but could have other,
- * more disastrous effects.  There are two work-arounds:
- *  1. Disable interrupts and emulate the atomic swap
- *  2. Clean the cache, perform atomic swap, flush the cache
+ * Taken from Linux 5.10-rc2 (last commit 3cea11cd5)
  *
- * We choose (1) since its the "easiest" to achieve here and is not
- * dependent on the processor type.
- *
- * NOTE that this solution won't work on an SMP system, so explcitly
- * forbid it here.
+ * SPDX-License-Identifier: GPL-2.0
  */
-#define swp_is_buggy
-#endif
+#ifndef __ASM_ARM_ARM32_CMPXCHG_H
+#define __ASM_ARM_ARM32_CMPXCHG_H
+
+#include <xen/prefetch.h>
+#include <xen/types.h>
+
+extern void __bad_cmpxchg(volatile void *ptr, int size);
 
 static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
 {
-	extern void __bad_xchg(volatile void *, int);
 	unsigned long ret;
-#ifdef swp_is_buggy
-	unsigned long flags;
-#endif
-#if __LINUX_ARM_ARCH__ >= 6
 	unsigned int tmp;
-#endif
 
 	prefetchw((const void *)ptr);
 
 	switch (size) {
-#if __LINUX_ARM_ARCH__ >= 6
-#ifndef CONFIG_CPU_V6 /* MIN ARCH >= V6K */
 	case 1:
 		asm volatile("@	__xchg1\n"
 		"1:	ldrexb	%0, [%3]\n"
@@ -61,7 +39,6 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 			: "r" (x), "r" (ptr)
 			: "memory", "cc");
 		break;
-#endif
 	case 4:
 		asm volatile("@	__xchg4\n"
 		"1:	ldrex	%0, [%3]\n"
@@ -72,42 +49,10 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 			: "r" (x), "r" (ptr)
 			: "memory", "cc");
 		break;
-#elif defined(swp_is_buggy)
-#ifdef CONFIG_SMP
-#error SMP is not supported on this platform
-#endif
-	case 1:
-		raw_local_irq_save(flags);
-		ret = *(volatile unsigned char *)ptr;
-		*(volatile unsigned char *)ptr = x;
-		raw_local_irq_restore(flags);
-		break;
 
-	case 4:
-		raw_local_irq_save(flags);
-		ret = *(volatile unsigned long *)ptr;
-		*(volatile unsigned long *)ptr = x;
-		raw_local_irq_restore(flags);
-		break;
-#else
-	case 1:
-		asm volatile("@	__xchg1\n"
-		"	swpb	%0, %1, [%2]"
-			: "=&r" (ret)
-			: "r" (x), "r" (ptr)
-			: "memory", "cc");
-		break;
-	case 4:
-		asm volatile("@	__xchg4\n"
-		"	swp	%0, %1, [%2]"
-			: "=&r" (ret)
-			: "r" (x), "r" (ptr)
-			: "memory", "cc");
-		break;
-#endif
 	default:
-		/* Cause a link-time error, the xchg() size is not supported */
-		__bad_xchg(ptr, size), ret = 0;
+		/* Cause a link-time error, the size is not supported */
+		__bad_cmpxchg(ptr, size), ret = 0;
 		break;
 	}
 
@@ -119,40 +64,6 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 				   sizeof(*(ptr)));			\
 })
 
-#include <asm-generic/cmpxchg-local.h>
-
-#if __LINUX_ARM_ARCH__ < 6
-/* min ARCH < ARMv6 */
-
-#ifdef CONFIG_SMP
-#error "SMP is not supported on this platform"
-#endif
-
-#define xchg xchg_relaxed
-
-/*
- * cmpxchg_local and cmpxchg64_local are atomic wrt current CPU. Always make
- * them available.
- */
-#define cmpxchg_local(ptr, o, n) ({					\
-	(__typeof(*ptr))__cmpxchg_local_generic((ptr),			\
-					        (unsigned long)(o),	\
-					        (unsigned long)(n),	\
-					        sizeof(*(ptr)));	\
-})
-
-#define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
-
-#include <asm-generic/cmpxchg.h>
-
-#else	/* min ARCH >= ARMv6 */
-
-extern void __bad_cmpxchg(volatile void *ptr, int size);
-
-/*
- * cmpxchg only support 32-bits operands on ARMv6.
- */
-
 static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 				      unsigned long new, int size)
 {
@@ -161,7 +72,6 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 	prefetchw((const void *)ptr);
 
 	switch (size) {
-#ifndef CONFIG_CPU_V6	/* min ARCH >= ARMv6K */
 	case 1:
 		do {
 			asm volatile("@ __cmpxchg1\n"
@@ -186,7 +96,6 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 				: "memory", "cc");
 		} while (res);
 		break;
-#endif
 	case 4:
 		do {
 			asm volatile("@ __cmpxchg4\n"
@@ -199,6 +108,7 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 				: "memory", "cc");
 		} while (res);
 		break;
+
 	default:
 		__bad_cmpxchg(ptr, size);
 		oldval = 0;
@@ -214,41 +124,14 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 				      sizeof(*(ptr)));			\
 })
 
-static inline unsigned long __cmpxchg_local(volatile void *ptr,
-					    unsigned long old,
-					    unsigned long new, int size)
-{
-	unsigned long ret;
-
-	switch (size) {
-#ifdef CONFIG_CPU_V6	/* min ARCH == ARMv6 */
-	case 1:
-	case 2:
-		ret = __cmpxchg_local_generic(ptr, old, new, size);
-		break;
-#endif
-	default:
-		ret = __cmpxchg(ptr, old, new, size);
-	}
-
-	return ret;
-}
-
-#define cmpxchg_local(ptr, o, n) ({					\
-	(__typeof(*ptr))__cmpxchg_local((ptr),				\
-				        (unsigned long)(o),		\
-				        (unsigned long)(n),		\
-				        sizeof(*(ptr)));		\
-})
-
-static inline unsigned long long __cmpxchg64(unsigned long long *ptr,
+static inline unsigned long long __cmpxchg64(volatile unsigned long long *ptr,
 					     unsigned long long old,
 					     unsigned long long new)
 {
 	unsigned long long oldval;
 	unsigned long res;
 
-	prefetchw(ptr);
+	prefetchw((const void *)ptr);
 
 	__asm__ __volatile__(
 "1:	ldrexd		%1, %H1, [%3]\n"
@@ -272,8 +155,179 @@ static inline unsigned long long __cmpxchg64(unsigned long long *ptr,
 					(unsigned long long)(n));	\
 })
 
-#define cmpxchg64_local(ptr, o, n) cmpxchg64_relaxed((ptr), (o), (n))
 
-#endif	/* __LINUX_ARM_ARCH__ >= 6 */
+/*
+ * Linux doesn't provide strict versions of xchg(), cmpxchg(), and cmpxchg64(),
+ * so manually define these for Xen as smp_mb() wrappers around the relaxed
+ * variants.
+ */
 
-#endif /* __ASM_ARM_CMPXCHG_H */
\ No newline at end of file
+#define xchg(ptr, x) ({ \
+	long ret; \
+	smp_mb(); \
+	ret = xchg_relaxed(ptr, x); \
+	smp_mb(); \
+	ret; \
+})
+
+#define cmpxchg(ptr, o, n) ({ \
+	long ret; \
+	smp_mb(); \
+	ret = cmpxchg_relaxed(ptr, o, n); \
+	smp_mb(); \
+	ret; \
+})
+
+#define cmpxchg64(ptr, o, n) ({ \
+	long long ret; \
+	smp_mb(); \
+	ret = cmpxchg64_relaxed(ptr, o, n); \
+	smp_mb(); \
+	ret; \
+})
+
+/*
+ * This code is from the original Xen arm32 cmpxchg.h, from before the
+ * Linux 5.10-rc2 atomics helpers were ported over. The only changes
+ * here are renaming the macros and functions to explicitly use
+ * "timeout" in their names so that they don't clash with the above.
+ *
+ * We need this here for guest atomics (the only user of the timeout
+ * variants).
+ */
+
+#define __CMPXCHG_TIMEOUT_CASE(sz, name)                                        \
+static inline bool __cmpxchg_timeout_case_##name(volatile void *ptr,            \
+                                         unsigned long *old,            \
+                                         unsigned long new,             \
+                                         bool timeout,                  \
+                                         unsigned int max_try)          \
+{                                                                       \
+        unsigned long oldval;                                           \
+        unsigned long res;                                              \
+                                                                        \
+        do {                                                            \
+                asm volatile("@ __cmpxchg_timeout_case_" #name "\n"             \
+                "       ldrex" #sz "    %1, [%2]\n"                     \
+                "       mov     %0, #0\n"                               \
+                "       teq     %1, %3\n"                               \
+                "       strex" #sz "eq %0, %4, [%2]\n"                  \
+                : "=&r" (res), "=&r" (oldval)                           \
+                : "r" (ptr), "Ir" (*old), "r" (new)                     \
+                : "memory", "cc");                                      \
+                                                                        \
+                if (!res)                                               \
+                        break;                                          \
+        } while (!timeout || ((--max_try) > 0));                        \
+                                                                        \
+        *old = oldval;                                                  \
+                                                                        \
+        return !res;                                                    \
+}
+
+__CMPXCHG_TIMEOUT_CASE(b, 1)
+__CMPXCHG_TIMEOUT_CASE(h, 2)
+__CMPXCHG_TIMEOUT_CASE( , 4)
+
+static inline bool __cmpxchg_timeout_case_8(volatile uint64_t *ptr,
+                                    uint64_t *old,
+                                    uint64_t new,
+                                    bool timeout,
+                                    unsigned int max_try)
+{
+        uint64_t oldval;
+        uint64_t res;
+
+        do {
+                asm volatile(
+                "       ldrexd          %1, %H1, [%3]\n"
+                "       teq             %1, %4\n"
+                "       teqeq           %H1, %H4\n"
+                "       movne           %0, #0\n"
+                "       movne           %H0, #0\n"
+                "       bne             2f\n"
+                "       strexd          %0, %5, %H5, [%3]\n"
+                "2:"
+                : "=&r" (res), "=&r" (oldval), "+Qo" (*ptr)
+                : "r" (ptr), "r" (*old), "r" (new)
+                : "memory", "cc");
+                if (!res)
+                        break;
+        } while (!timeout || ((--max_try) > 0));
+
+        *old = oldval;
+
+        return !res;
+}
+
+static always_inline bool __int_cmpxchg(volatile void *ptr, unsigned long *old,
+                                        unsigned long new, int size,
+                                        bool timeout, unsigned int max_try)
+{
+        prefetchw((const void *)ptr);
+
+        switch (size) {
+        case 1:
+                return __cmpxchg_timeout_case_1(ptr, old, new, timeout, max_try);
+        case 2:
+                return __cmpxchg_timeout_case_2(ptr, old, new, timeout, max_try);
+        case 4:
+                return __cmpxchg_timeout_case_4(ptr, old, new, timeout, max_try);
+        default:
+                __bad_cmpxchg(ptr, size);
+                return false;
+        }
+
+        ASSERT_UNREACHABLE();
+}
+
+/*
+ * The helper may fail to update the memory if the action takes too long.
+ *
+ * @old: On call the value pointed contains the expected old value. It will be
+ * updated to the actual old value.
+ * @max_try: Maximum number of iterations
+ *
+ * The helper will return true when the update has succeeded (i.e no
+ * timeout) and false if the update has failed.
+ */
+static always_inline bool __cmpxchg_timeout(volatile void *ptr,
+                                            unsigned long *old,
+                                            unsigned long new,
+                                            int size,
+                                            unsigned int max_try)
+{
+        bool ret;
+
+        smp_mb();
+        ret = __int_cmpxchg(ptr, old, new, size, true, max_try);
+        smp_mb();
+
+        return ret;
+}
+
+/*
+ * The helper may fail to update the memory if the action takes too long.
+ *
+ * @old: On call the value pointed contains the expected old value. It will be
+ * updated to the actual old value.
+ * @max_try: Maximum number of iterations
+ *
+ * The helper will return true when the update has succeeded (i.e no
+ * timeout) and false if the update has failed.
+ */
+static always_inline bool __cmpxchg64_timeout(volatile uint64_t *ptr,
+                                              uint64_t *old,
+                                              uint64_t new,
+                                              unsigned int max_try)
+{
+        bool ret;
+
+        smp_mb();
+        ret = __cmpxchg_timeout_case_8(ptr, old, new, true, max_try);
+        smp_mb();
+
+        return ret;
+}
+
+#endif /* __ASM_ARM_ARM32_CMPXCHG_H */
-- 
2.24.3 (Apple Git-128)




* [RFC PATCH v2 15/15] xen/arm: remove dependency on gcc built-in __sync_fetch_and_add()
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
                   ` (13 preceding siblings ...)
  2020-11-11 21:52 ` [RFC PATCH v2 14/15] xen/arm32: port Linux's arm32 cmpxchg.h " Ash Wilding
@ 2020-11-11 21:52 ` Ash Wilding
  2020-12-15 19:31 ` [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Julien Grall
  15 siblings, 0 replies; 24+ messages in thread
From: Ash Wilding @ 2020-11-11 21:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Ash Wilding, julien, bertrand.marquis, rahul.singh

From: Ash Wilding <ash.j.wilding@gmail.com>

Now that we have explicit implementations of LL/SC and LSE atomics
helpers after porting Linux's versions to Xen, we can drop the reference
to gcc's built-in __sync_fetch_and_add().

This requires some fudging using container_of() because the users of
__sync_fetch_and_add(), namely xen/spinlock.c, expect ptr to point
directly at the u32 being modified, while the atomics helpers expect
ptr to point at an atomic_t and then access that atomic_t's counter
member.

By using container_of() we can create a "fake" (atomic_t *) pointer and
pass that to the atomic_fetch_add() that we ported from Linux.

NOTE: spinlock.c uses u32 for the value being added while the atomics
helpers use int for their counter member. This shouldn't actually matter
because the addition is performed in inline assembly, where the compiler
cannot make any assumptions about signed integer overflow, but I thought
it worth calling out in the commit message.
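
(Purely for illustration, not part of the patch: the caller-side shape
this is aiming at looks roughly like the sketch below, with made-up
names. Note the callers expect the value from before the addition back,
as __sync_fetch_and_add() returned.)

    /* Hypothetical caller (names are illustrative): ptr points straight
     * at the u32, and arch_fetch_and_add() forges the (atomic_t *) via
     * container_of() internally before calling the ported helper. */
    static inline uint32_t take_ticket_sketch(volatile uint32_t *head_tail,
                                              uint32_t inc)
    {
        return arch_fetch_and_add(head_tail, inc);
    }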

Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
---
 xen/include/asm-arm/system.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
index 65d5c8e423..0326e3ade4 100644
--- a/xen/include/asm-arm/system.h
+++ b/xen/include/asm-arm/system.h
@@ -58,7 +58,14 @@ static inline int local_abort_is_enabled(void)
     return !(flags & PSR_ABT_MASK);
 }
 
-#define arch_fetch_and_add(x, v) __sync_fetch_and_add(x, v)
+#define arch_fetch_and_add(ptr, x) ({                                   \
+    int ret;                                                            \
+                                                                        \
+    atomic_t *tmp = container_of((int *)(ptr), atomic_t, counter);     \
+    ret = atomic_fetch_add(x, tmp);                                     \
+                                                                        \
+    ret;                                                                \
+})
 
 extern struct vcpu *__context_switch(struct vcpu *prev, struct vcpu *next);
 
-- 
2.24.3 (Apple Git-128)




* Re: [RFC PATCH v2 05/15] xen/arm: pull in Linux atomics helpers and dependencies
  2020-11-11 21:51 ` [RFC PATCH v2 05/15] xen/arm: pull in Linux atomics helpers and dependencies Ash Wilding
@ 2020-11-12  8:31   ` Jan Beulich
  2020-11-12 10:52     ` Ash Wilding
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2020-11-12  8:31 UTC (permalink / raw)
  To: Ash Wilding; +Cc: julien, bertrand.marquis, rahul.singh, xen-devel

On 11.11.2020 22:51, Ash Wilding wrote:
> From: Ash Wilding <ash.j.wilding@gmail.com>
> 
> This patch pulls in Linux's atomics helpers for arm32 and arm64, and
> their dependencies, as-is.
> 
> Note that Linux's arm32 atomics helpers use the READ_ONCE() and
> WRITE_ONCE() macros defined in <asm-generic/rwonce.h>, while Linux's
> arm64 atomics helpers use __READ_ONCE() and __WRITE_ONCE().

And our ACCESS_ONCE() can't be used, or be made usable? I don't think
we want a 3rd variant when we're already in the process of discussing
how to fold the two ones we have right now.

> The only difference is that the __* versions skip checking whether the
> object being accessed is the same size as a native C type (e.g. char,
> int, long, etc). Given the arm32 helpers are using the macros to access
> an atomic_t's counter member, which is an int, we can skip this check by
> using the __* versions like the arm64 code does.
> 
> I mention this here because it forms the first part of my justification
> for *not* copying Linux's <linux/compiler_types.h> to Xen; the size
> check described above is performed by __native_word() defined in that
> header.
> 
> The second part of my justification may be more contentious; as you'll
> see in the next patch, I intend to drop the __unqual_scalar_typeof()
> calls in __READ_ONCE() and __WRITE_ONCE(). This is because the pointer
> to the atomic_t's counter member is always a basic (int *) so we don't
> need this, and dropping it means we can avoid having to port Linux's
> <linux/compiler_types.h>.

I don't think weakening the checking is a good idea when the macros
are being made available for general use. If they'd be renamed to be
private flavors for use just in Arm's atomics, this would be a
different thing.

> If people are against this approach, please bear in mind that the
> current version of __unqual_scalar_typeof() in <linux/compiler_types.h>
> was actually the reason for Linux upgrading the minimum GCC version
> required to 4.9 earlier this year so that they can guarantee C11
> _Generic support [1].
> 
> So if we do want to take Linux's <linux/compiler_types.h> we'll either
> need to:
> 
>    A) bump up the minimum required version of GCC to 4.9 to match
>       that required by Linux; in the Xen README we currently state
>       GCC 4.8 for Arm and GCC 4.1.2_20070115 for x86.

Discussion about bumping the baseline has started, but is progressing
rather slowly. I'm not an Arm maintainer, but it would seem to me
that going from 4.8 to 4.9 just for Arm at this point may be easier
than deciding what the new baseline ought to be for x86.

> --- /dev/null
> +++ b/xen/include/xen/rwonce.h
> @@ -0,0 +1,90 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Prevent the compiler from merging or refetching reads or writes. The
> + * compiler is also forbidden from reordering successive instances of
> + * READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some
> + * particular ordering. One way to make the compiler aware of ordering is to
> + * put the two invocations of READ_ONCE or WRITE_ONCE in different C
> + * statements.
> + *
> + * These two macros will also work on aggregate data types like structs or
> + * unions.
> + *
> + * Their two major use cases are: (1) Mediating communication between
> + * process-level code and irq/NMI handlers, all running on the same CPU,
> + * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
> + * mutilate accesses that either do not require ordering or that interact
> + * with an explicit memory barrier or atomic instruction that provides the
> + * required ordering.
> + */
> +#ifndef __ASM_GENERIC_RWONCE_H
> +#define __ASM_GENERIC_RWONCE_H
> +
> +#ifndef __ASSEMBLY__
> +
> +#include <linux/compiler_types.h>
> +#include <linux/kasan-checks.h>
> +#include <linux/kcsan-checks.h>
> +
> +/*
> + * Yes, this permits 64-bit accesses on 32-bit architectures. These will
> + * actually be atomic in some cases (namely Armv7 + LPAE), but for others we
> + * rely on the access being split into 2x32-bit accesses for a 32-bit quantity
> + * (e.g. a virtual address) and a strong prevailing wind.
> + */
> +#define compiletime_assert_rwonce_type(t)					\
> +	compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),	\
> +		"Unsupported access size for {READ,WRITE}_ONCE().")
> +
> +/*
> + * Use __READ_ONCE() instead of READ_ONCE() if you do not require any
> + * atomicity. Note that this may result in tears!
> + */
> +#ifndef __READ_ONCE
> +#define __READ_ONCE(x)	(*(const volatile __unqual_scalar_typeof(x) *)&(x))
> +#endif
> +
> +#define READ_ONCE(x)							\
> +({									\
> +	compiletime_assert_rwonce_type(x);				\
> +	__READ_ONCE(x);							\
> +})
> +
> +#define __WRITE_ONCE(x, val)						\
> +do {									\
> +	*(volatile typeof(x) *)&(x) = (val);				\
> +} while (0)
> +
> +#define WRITE_ONCE(x, val)						\
> +do {									\
> +	compiletime_assert_rwonce_type(x);				\
> +	__WRITE_ONCE(x, val);						\
> +} while (0)
> +
> +static __no_sanitize_or_inline
> +unsigned long __read_once_word_nocheck(const void *addr)
> +{
> +	return __READ_ONCE(*(unsigned long *)addr);
> +}
> +
> +/*
> + * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need to load a
> + * word from memory atomically but without telling KASAN/KCSAN. This is
> + * usually used by unwinding code when walking the stack of a running process.
> + */
> +#define READ_ONCE_NOCHECK(x)						\
> +({									\
> +	compiletime_assert(sizeof(x) == sizeof(unsigned long),		\
> +		"Unsupported access size for READ_ONCE_NOCHECK().");	\
> +	(typeof(x))__read_once_word_nocheck(&(x));			\
> +})
> +
> +static __no_kasan_or_inline
> +unsigned long read_word_at_a_time(const void *addr)
> +{
> +	kasan_check_read(addr, 1);
> +	return *(unsigned long *)addr;
> +}
> +
> +#endif /* __ASSEMBLY__ */
> +#endif	/* __ASM_GENERIC_RWONCE_H */
> \ No newline at end of file

This wants taking care of in any event - there are multiple instances
in this patch (in fact it looks as if all of the new files had this
issue), and I didn't check others.

Jan



* Re: [RFC PATCH v2 05/15] xen/arm: pull in Linux atomics helpers and dependencies
  2020-11-12  8:31   ` Jan Beulich
@ 2020-11-12 10:52     ` Ash Wilding
  0 siblings, 0 replies; 24+ messages in thread
From: Ash Wilding @ 2020-11-12 10:52 UTC (permalink / raw)
  To: jbeulich; +Cc: ash.j.wilding, bertrand.marquis, julien, rahul.singh, xen-devel

Hey Jan,


>>
>> Note that Linux's arm32 atomics helpers use the READ_ONCE() and
>> WRITE_ONCE() macros defined in <asm-generic/rwonce.h>, while Linux's
>> arm64 atomics helpers use __READ_ONCE() and __WRITE_ONCE().
>
> And our ACCESS_ONCE() can't be used, or be made usable? I don't think
> we want a 3rd variant when we're already in the process of discussing
> how to fold the two ones we have right now.

Many thanks for the pointer. I'm still familiarising myself with Xen's
codebase and wasn't aware of ACCESS_ONCE(); yes, that's exactly what we
need, which means we can drop Linux's <asm-generic/rwonce.h> completely.
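
(For anyone following along: all the arm32 helpers need here is a
single volatile access of an int-sized object, i.e. something with the
shape sketched below; see Xen's real ACCESS_ONCE() definition rather
than this illustration.)

    /* Sketch only: force a single volatile access so the compiler can
     * neither merge nor refetch it. */
    #define access_once_sketch(x) (*(volatile typeof(x) *)&(x))

    /* The arm32 atomics only ever use it on the atomic_t's int counter: */
    static inline int atomic_read_sketch(atomic_t *v)
    {
        return access_once_sketch(v->counter);
    }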


That also means:

>
> I don't think weakening the checking is a good idea when the macros
> are being made available for general use. If they'd be renamed to be
> private flavors for use just in Arm's atomics, this would be a
> different thing.

This problem goes away, as does the need/desire to bump the minimum GCC
version up to 4.9 for xen/arm just to support the usage of C11 _Generic
in Linux's <linux/compiler_types.h>.

That said, agreed, I did think the way I'd done it was a tad suspect,
hence the "possibly contentious" disclaimer :-) I'll keep this in mind
for similar porting work in the future.


>>
>> \ No newline at end of file
>
> This wants taking care of in any event - there are multiple instances
> in this patch (in fact it looks as if all of the new files had this
> issue), and I didn't check others.

Ack, will fix.


Thanks for taking the time to provide feedback!

Cheers,
Ash.



* Re: [RFC PATCH v2 08/15] xen/arm64: port Linux's arm64 atomic_ll_sc.h to Xen
  2020-11-11 21:51 ` [RFC PATCH v2 08/15] xen/arm64: port Linux's arm64 atomic_ll_sc.h to Xen Ash Wilding
@ 2020-12-15 18:38   ` Julien Grall
  0 siblings, 0 replies; 24+ messages in thread
From: Julien Grall @ 2020-12-15 18:38 UTC (permalink / raw)
  To: Ash Wilding, xen-devel; +Cc: bertrand.marquis, rahul.singh

Hi Ash,

On 11/11/2020 21:51, Ash Wilding wrote:
> From: Ash Wilding <ash.j.wilding@gmail.com>
> 
> Most of the "work" here is simply deleting the atomic64_t helper
> definitions as we don't have an atomic64_t type in Xen.

There have been some requests to support atomic64_t in Xen. I would
suggest keeping these helpers, as that would make it simpler to support
in the future.

That said, we can probably just revert that part of the patch when it
becomes necessary.

> 
> Signed-off-by: Ash Wilding <ash.j.wilding@gmail.com>
> ---
>   xen/include/asm-arm/arm64/atomic_ll_sc.h | 134 +----------------------

I think we also want to update README.LinuxPrimitives with the new
baseline for the helpers.

>   1 file changed, 6 insertions(+), 128 deletions(-)
> 
> diff --git a/xen/include/asm-arm/arm64/atomic_ll_sc.h b/xen/include/asm-arm/arm64/atomic_ll_sc.h
> index e1009c0f94..20b0cb174e 100644
> --- a/xen/include/asm-arm/arm64/atomic_ll_sc.h
> +++ b/xen/include/asm-arm/arm64/atomic_ll_sc.h
> @@ -1,16 +1,16 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
>   /*
> - * Based on arch/arm/include/asm/atomic.h
> + * Taken from Linux 5.10-rc2 (last commit 3cea11cd5)
>    *
>    * Copyright (C) 1996 Russell King.
>    * Copyright (C) 2002 Deep Blue Solutions Ltd.
>    * Copyright (C) 2012 ARM Ltd.
> + * SPDX-License-Identifier: GPL-2.0-only

May I ask why the SPDX license was moved around?

>    */
>   
> -#ifndef __ASM_ATOMIC_LL_SC_H
> -#define __ASM_ATOMIC_LL_SC_H
> +#ifndef __ASM_ARM_ARM64_ATOMIC_LL_SC_H
> +#define __ASM_ARM_ARM64_ATOMIC_LL_SC_H

I would suggest keeping the Linux guards.

>   
> -#include <linux/stringify.h>
> +#include <xen/stringify.h>
>   
>   #ifdef CONFIG_ARM64_LSE_ATOMICS
>   #define __LL_SC_FALLBACK(asm_ops)					\
> @@ -134,128 +134,6 @@ ATOMIC_OPS(andnot, bic, )
>   #undef ATOMIC_OP_RETURN
>   #undef ATOMIC_OP
>   
> -#define ATOMIC64_OP(op, asm_op, constraint)				\
> -static inline void							\
> -__ll_sc_atomic64_##op(s64 i, atomic64_t *v)				\
> -{									\
> -	s64 result;							\
> -	unsigned long tmp;						\
> -									\
> -	asm volatile("// atomic64_" #op "\n"				\
> -	__LL_SC_FALLBACK(						\
> -"	prfm	pstl1strm, %2\n"					\
> -"1:	ldxr	%0, %2\n"						\
> -"	" #asm_op "	%0, %0, %3\n"					\
> -"	stxr	%w1, %0, %2\n"						\
> -"	cbnz	%w1, 1b")						\
> -	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
> -	: __stringify(constraint) "r" (i));				\
> -}
> -
> -#define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\
> -static inline long							\
> -__ll_sc_atomic64_##op##_return##name(s64 i, atomic64_t *v)		\
> -{									\
> -	s64 result;							\
> -	unsigned long tmp;						\
> -									\
> -	asm volatile("// atomic64_" #op "_return" #name "\n"		\
> -	__LL_SC_FALLBACK(						\
> -"	prfm	pstl1strm, %2\n"					\
> -"1:	ld" #acq "xr	%0, %2\n"					\
> -"	" #asm_op "	%0, %0, %3\n"					\
> -"	st" #rel "xr	%w1, %0, %2\n"					\
> -"	cbnz	%w1, 1b\n"						\
> -"	" #mb )								\
> -	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)		\
> -	: __stringify(constraint) "r" (i)				\
> -	: cl);								\
> -									\
> -	return result;							\
> -}
> -
> -#define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)\
> -static inline long							\
> -__ll_sc_atomic64_fetch_##op##name(s64 i, atomic64_t *v)		\
> -{									\
> -	s64 result, val;						\
> -	unsigned long tmp;						\
> -									\
> -	asm volatile("// atomic64_fetch_" #op #name "\n"		\
> -	__LL_SC_FALLBACK(						\
> -"	prfm	pstl1strm, %3\n"					\
> -"1:	ld" #acq "xr	%0, %3\n"					\
> -"	" #asm_op "	%1, %0, %4\n"					\
> -"	st" #rel "xr	%w2, %1, %3\n"					\
> -"	cbnz	%w2, 1b\n"						\
> -"	" #mb )								\
> -	: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter)	\
> -	: __stringify(constraint) "r" (i)				\
> -	: cl);								\
> -									\
> -	return result;							\
> -}
> -
> -#define ATOMIC64_OPS(...)						\
> -	ATOMIC64_OP(__VA_ARGS__)					\
> -	ATOMIC64_OP_RETURN(, dmb ish,  , l, "memory", __VA_ARGS__)	\
> -	ATOMIC64_OP_RETURN(_relaxed,,  ,  ,         , __VA_ARGS__)	\
> -	ATOMIC64_OP_RETURN(_acquire,, a,  , "memory", __VA_ARGS__)	\
> -	ATOMIC64_OP_RETURN(_release,,  , l, "memory", __VA_ARGS__)	\
> -	ATOMIC64_FETCH_OP (, dmb ish,  , l, "memory", __VA_ARGS__)	\
> -	ATOMIC64_FETCH_OP (_relaxed,,  ,  ,         , __VA_ARGS__)	\
> -	ATOMIC64_FETCH_OP (_acquire,, a,  , "memory", __VA_ARGS__)	\
> -	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
> -
> -ATOMIC64_OPS(add, add, I)
> -ATOMIC64_OPS(sub, sub, J)
> -
> -#undef ATOMIC64_OPS
> -#define ATOMIC64_OPS(...)						\
> -	ATOMIC64_OP(__VA_ARGS__)					\
> -	ATOMIC64_FETCH_OP (, dmb ish,  , l, "memory", __VA_ARGS__)	\
> -	ATOMIC64_FETCH_OP (_relaxed,,  ,  ,         , __VA_ARGS__)	\
> -	ATOMIC64_FETCH_OP (_acquire,, a,  , "memory", __VA_ARGS__)	\
> -	ATOMIC64_FETCH_OP (_release,,  , l, "memory", __VA_ARGS__)
> -
> -ATOMIC64_OPS(and, and, L)
> -ATOMIC64_OPS(or, orr, L)
> -ATOMIC64_OPS(xor, eor, L)
> -/*
> - * GAS converts the mysterious and undocumented BIC (immediate) alias to
> - * an AND (immediate) instruction with the immediate inverted. We don't
> - * have a constraint for this, so fall back to register.
> - */
> -ATOMIC64_OPS(andnot, bic, )
> -
> -#undef ATOMIC64_OPS
> -#undef ATOMIC64_FETCH_OP
> -#undef ATOMIC64_OP_RETURN
> -#undef ATOMIC64_OP
> -
> -static inline s64
> -__ll_sc_atomic64_dec_if_positive(atomic64_t *v)
> -{
> -	s64 result;
> -	unsigned long tmp;
> -
> -	asm volatile("// atomic64_dec_if_positive\n"
> -	__LL_SC_FALLBACK(
> -"	prfm	pstl1strm, %2\n"
> -"1:	ldxr	%0, %2\n"
> -"	subs	%0, %0, #1\n"
> -"	b.lt	2f\n"
> -"	stlxr	%w1, %0, %2\n"
> -"	cbnz	%w1, 1b\n"
> -"	dmb	ish\n"
> -"2:")
> -	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
> -	:
> -	: "cc", "memory");
> -
> -	return result;
> -}
> -
>   #define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl, constraint)	\
>   static inline u##sz							\
>   __ll_sc__cmpxchg_case_##name##sz(volatile void *ptr,			\
> @@ -350,4 +228,4 @@ __CMPXCHG_DBL(_mb, dmb ish, l, "memory")
>   #undef __CMPXCHG_DBL
>   #undef K
>   
> -#endif	/* __ASM_ATOMIC_LL_SC_H */
> \ No newline at end of file
> +#endif	/* __ASM_ARM_ARM64_ATOMIC_LL_SC_H */
> \ No newline at end of file
> 

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH v2 10/15] xen/arm64: port Linux's arm64 cmpxchg.h to Xen
  2020-11-11 21:51 ` [RFC PATCH v2 10/15] xen/arm64: port Linux's arm64 cmpxchg.h " Ash Wilding
@ 2020-12-15 18:53   ` Julien Grall
  0 siblings, 0 replies; 24+ messages in thread
From: Julien Grall @ 2020-12-15 18:53 UTC (permalink / raw)
  To: Ash Wilding, xen-devel; +Cc: bertrand.marquis, rahul.singh

Hi Ash,

On 11/11/2020 21:51, Ash Wilding wrote:
> From: Ash Wilding <ash.j.wilding@gmail.com>
> 
>   - s/arch_xchg/xchg/
> 
>   - s/arch_cmpxchg/cmpxchg/
> 
>   - Replace calls to BUILD_BUG() with calls to __bad_cmpxchg() as we
>     don't currently have a BUILD_BUG() macro in Xen and this will
>     equivalently cause a link-time error.

How complex would it be to import BUILD_BUG() in Xen? Would the 
following work:

#define BUILD_BUG() BUILD_BUG_ON(1)
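
(For reference, the link-time idiom being replaced works as sketched
below; whether a one-line BUILD_BUG() wrapper is a drop-in replacement
depends on BUILD_BUG_ON() only firing for code that survives dead-code
elimination, rather than being evaluated unconditionally.)

    /* 'size' is always a compile-time constant (sizeof(*ptr)), so for
     * supported sizes the default: branch is dead code and the call is
     * eliminated; for anything else it survives and the build fails at
     * link time with an undefined reference to __bad_cmpxchg(). */
    extern void __bad_cmpxchg(volatile void *ptr, int size);

    static inline unsigned long size_check_sketch(volatile void *ptr, int size)
    {
        switch (size) {
        case 1: case 2: case 4: case 8:
            return 0; /* the real exclusive-access sequence goes here */
        default:
            __bad_cmpxchg(ptr, size); /* deliberately never defined */
            return 0;
        }
    }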

> 
>   - Replace calls to VM_BUG_ON() with BUG_ON() as we don't currently
>     have a VM_BUG_ON() macro in Xen.
> 
>   - Pull in the timeout variants of cmpxchg from the original Xen
>     arm64 cmpxchg.h as these are required for guest atomics and are
>     not provided by Linux. Note these are always using LL/SC so we
>     should ideally write LSE versions at some point.

The main advantage of LSE support in Xen is being able to drop the
timeout helpers. I guess this would require a bit more thought to still
allow inlining.

In any case, it may make sense to move them into a separate header so
you don't have to remove/re-add them. Anyway, I would be happy with the
current approach as well.

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen.
  2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
                   ` (14 preceding siblings ...)
  2020-11-11 21:52 ` [RFC PATCH v2 15/15] xen/arm: remove dependency on gcc built-in __sync_fetch_and_add() Ash Wilding
@ 2020-12-15 19:31 ` Julien Grall
  2020-12-17 15:37   ` Ash Wilding
  15 siblings, 1 reply; 24+ messages in thread
From: Julien Grall @ 2020-12-15 19:31 UTC (permalink / raw)
  To: Ash Wilding, xen-devel; +Cc: bertrand.marquis, rahul.singh, Stefano Stabellini



On 11/11/2020 21:51, Ash Wilding wrote:
> From: Ash Wilding <ash.j.wilding@gmail.com>
> 
> 
> Hey,

Hi Ash,

> 
> I've found some time to improve this series a bit:
> 
> 
> Changes since RFC v1
> ====================
> 
>   - Broken up patches into smaller chunks to aid in readability.
> 
>   - As per Julien's feedback I've also introduced intermediary patches
>     that first remove Xen's existing headers, then pull in the current
>     Linux versions as-is. This means we only need to review the changes
>     made while porting to Xen, rather than reviewing the existing Linux
>     code.
> 
>   - Pull in Linux's <asm-generic/rwonce.h> as <xen/rwonce.h> for
>     __READ_ONCE()/__WRITE_ONCE() instead of putting these in Xen's
>     <xen/compiler.h>. While doing this, provide justification for
>     dropping Linux's <linux/compiler_types.h> and relaxing the
>     __READ_ONCE() usage of __unqual_scalar_typeof() to just typeof()
>     (see patches #5 and #6).
> 
> 
> 
> Keeping the rest of the cover-letter unchanged as it would still be
> good to discuss these things:
> 
> 
> Arguments in favour of doing this
> =================================
> 
>      - Lets SMMUv3 driver switch to using <asm/atomic.h> rather than
>        maintaining its own implementation of the helpers.
> 
>      - Provides mitigation against XSA-295 [2], which affects both arm32
>        and arm64, across all versions of Xen, and may allow a domU to
>        maliciously or erroneously DoS the hypervisor.
> 
>      - All Armv8-A core implementations since ~2017 implement LSE so
>        there is an argument to be made we are long overdue support in
>        Xen. This is compounded by LSE atomics being more performant than
>        LL/SC atomics in most real-world applications, especially at high
>        core counts.
> 
>      - We may be able to get improved performance when using LL/SC too
>        as Linux provides helpers with relaxed ordering requirements which
>        are currently not available in Xen, though for this we would need
>        to go back through existing code to see where the more relaxed
>        versions can be safely used.
> 
>      - Anything else?
> 
> 
> Arguments against doing this
> ============================
> 
>      - Limited testing infrastructure in place to ensure use of new
>        atomics helpers does not introduce new bugs and regressions. This
>        is a particularly strong argument given how difficult it can be to
>        identify and debug malfunctioning atomics. The old adage applies,
>        "If it ain't broke, don't fix it."

I am not too concerned about the Linux atomics implementation. The
helpers are well tested and your changes don't seem to touch the
implementation itself.

However, I vaguely remember that some of the atomics helpers in Xen are
somewhat stronger than their Linux counterparts.

Would you be able to look at all the existing helpers and see whether
the memory ordering diverges?

Once we have a list, we can decide whether to make them stronger again
or check the callers.

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen
  2020-12-15 19:31 ` [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Julien Grall
@ 2020-12-17 15:37   ` Ash Wilding
  2020-12-18 12:17     ` Ash Wilding
  2021-02-14 18:36     ` Julien Grall
  0 siblings, 2 replies; 24+ messages in thread
From: Ash Wilding @ 2020-12-17 15:37 UTC (permalink / raw)
  To: julien
  Cc: ash.j.wilding, bertrand.marquis, rahul.singh, sstabellini, xen-devel

Hi Julien,

Thanks for taking a look at the patches and providing feedback. I've seen your
other comments and will reply to those separately when I get a chance (maybe at
the weekend or over the Christmas break).

RE the differences in ordering semantics between Xen's and Linux's atomics
helpers, please find my notes below.

Thoughts?

Cheers,
Ash.


The tables below use format AAA/BBB/CCC/DDD/EEE, where:

 - AAA is the memory barrier before the operation
 - BBB is the acquire semantics of the atomic operation
 - CCC is the release semantics of the atomic operation
 - DDD is whether the asm() block clobbers memory
 - EEE is the memory barrier after the operation

For example, ---/---/rel/mem/dmb would mean:

 - No memory barrier before the operation
 - The atomic does *not* have acquire semantics
 - The atomic *does* have release semantics
 - The asm() block clobbers memory
 - There is a DMB memory barrier after the atomic operation
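
To make the notation concrete, here is a hand-written sketch (the name is
made up and it is not lifted verbatim from either tree) of what a
---/---/rel/mem/dmb add-return looks like with arm64 LL/SC:

    static inline int sketch_add_return(int i, atomic_t *v)
    {
        int result;
        unsigned long tmp;

        asm volatile(
        "1:     ldxr    %w0, %2\n"        /* plain exclusive load: no acq   */
        "       add     %w0, %w0, %w3\n"
        "       stlxr   %w1, %w0, %2\n"   /* store-release: the 'rel' part  */
        "       cbnz    %w1, 1b\n"
        "       dmb     ish"              /* trailing barrier: the 'dmb'    */
        : "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
        : "r" (i)
        : "memory");                      /* the 'mem' clobber              */

        return result;
    }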


    arm64 LL/SC
    ===========

        Xen Function            Xen                     Linux                   Inconsistent
        ============            ===                     =====                   ============

        atomic_add              ---/---/---/---/---     ---/---/---/---/---     ---
        atomic_add_return       ---/---/rel/mem/dmb     ---/---/rel/mem/dmb     --- (1)
        atomic_sub              ---/---/---/---/---     ---/---/---/---/---     ---
        atomic_sub_return       ---/---/rel/mem/dmb     ---/---/rel/mem/dmb     --- (1)        
        atomic_and              ---/---/---/---/---     ---/---/---/---/---     ---
        atomic_cmpxchg          dmb/---/---/---/dmb     ---/---/rel/mem/---     YES (2)
        atomic_xchg             ---/---/rel/mem/dmb     ---/acq/rel/mem/dmb     YES (3)

(1) It's actually interesting to me that Linux does it this way. As with the
    LSE atomics below, I'd have expected acq/rel semantics and ditch the DMB.
    Unless I'm missing something where there is a concern around taking an IRQ
    between the LDAXR and the STLXR, which can't happen in the LSE atomic case
    since it's a single instruction. But the exclusive monitor is cleared on
    exception return in AArch64 so I'm struggling to see what that potential
    issue may be. Regardless, Linux and Xen are consistent so we're OK ;-)

(2) The Linux version uses either STLXR with rel semantics if the comparison
    passes, or DMB if the comparison fails. This is weaker than Xen's version,
    which is quite blunt in always wrapping the operation between two DMBs. This
    may be a holdover from Xen's arm32 versions being ported to arm64, as we
    didn't support acq/rel semantics on LDREX and STREX in Armv7-A? Regardless,
    this is quite a big discrepancy and I've not yet given it enough thought to
    determine whether it would actually cause an issue. My feeling is that the
    Linux LL/SC atomic_cmpxchg() should have have acq semantics on the LL, but
    like you said these helpers are well tested so I'd be surprised if there
    is a bug. See (5) below though, where the Linux LSE atomic_cmpxchg() *does*
    have acq semantics.

(3) The Linux version just adds acq semantics to the LL, so we're OK here.


    arm64 LSE (comparison to Xen's LL/SC)
    =====================================

        Xen Function            Xen                     Linux                   Inconsistent
        ============            ===                     =====                   ============

        atomic_add              ---/---/---/---/---     ---/---/---/---/---     ---
        atomic_add_return       ---/---/rel/mem/dmb     ---/acq/rel/mem/---     YES (4)
        atomic_sub              ---/---/---/---/---     ---/---/---/---/---     ---
        atomic_sub_return       ---/---/rel/mem/dmb     ---/acq/rel/mem/---     YES (4)
        atomic_and              ---/---/---/---/---     ---/---/---/---/---     ---
        atomic_cmpxchg          dmb/---/---/---/dmb     ---/acq/rel/mem/---     YES (5)
        atomic_xchg             ---/---/rel/mem/dmb     ---/acq/rel/mem/---     YES (4)

(4) As noted in (1), this is how I would have expected Linux's LL/SC atomics to
    work too. I don't think this discrepancy will cause any issues.

(5) As with (2) above, this is quite a big discrepancy to Xen. However at least
    this version has acq semantics unlike the LL/SC version in (2), so I'm more
    confident that there won't be regressions going from Xen LL/SC to Linux LSE
    version of atomic_cmpxchg().


    arm32 LL/SC
    ===========

        Xen Function            Xen                     Linux                   Inconsistent
        ============            ===                     =====                   ============

        atomic_add              ---/---/---/---/---     ---/---/---/---/---     ---
        atomic_add_return       dmb/---/---/---/dmb     XXX/XXX/XXX/XXX/XXX     YES (6)
        atomic_sub              ---/---/---/---/---     ---/---/---/---/---     ---
        atomic_sub_return       dmb/---/---/---/dmb     XXX/XXX/XXX/XXX/XXX     YES (6)
        atomic_and              ---/---/---/---/---     ---/---/---/---/---     ---  
        atomic_cmpxchg          dmb/---/---/---/dmb     XXX/XXX/XXX/XXX/XXX     YES (6)
        atomic_xchg             dmb/---/---/---/dmb     XXX/XXX/XXX/XXX/XXX     YES (6)

(6) Linux only provides relaxed variants of these functions, such as
    atomic_add_return_relaxed() and atomic_xchg_relaxed(). Patches #13 and #14
    in the series add the stricter versions expected by Xen, wrapping calls to
    Linux's relaxed variants in between two calls to smp_mb(). This makes them
    consistent with Xen's existing helpers, though it is quite blunt. It is worth
    noting that Armv8-A AArch32 does support acq/rel semantics on exclusive
    accesses, with LDAEX and STLEX, so I could imagine us introducing a new
    arm32 hwcap to detect whether we're on actual Armv7-A hardware or Armv8-A
    AArch32, then swap to lighterweight STLEX versions of these helpers rather
    than the heavyweight double DMB versions. Whether that would actually give
    measurable performance improvements is another story!
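
    As a rough sketch of that idea (hypothetical, not part of this series;
    the name is made up and full-barrier semantics would need the same care
    as on arm64), an Armv8-A AArch32 add-return could trade the leading DMB
    for a store-release exclusive:

        static inline int sketch_v8_aarch32_add_return(int i, atomic_t *v)
        {
            int result;
            unsigned long tmp;

            prefetchw(&v->counter);

            /* LDREX/STLEX need an Armv8-A AArch32 target to assemble. */
            asm volatile("@ sketch_v8_aarch32_add_return\n"
            "1:     ldrex   %0, [%3]\n"
            "       add     %0, %0, %4\n"
            "       stlex   %1, %0, [%3]\n"  /* store-release exclusive */
            "       teq     %1, #0\n"
            "       bne     1b\n"
            "       dmb     ish"             /* keep full-barrier semantics */
            : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
            : "r" (&v->counter), "r" (i)
            : "memory", "cc");

            return result;
        }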



* Re: [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen
  2020-12-17 15:37   ` Ash Wilding
@ 2020-12-18 12:17     ` Ash Wilding
  2021-02-14 18:36     ` Julien Grall
  1 sibling, 0 replies; 24+ messages in thread
From: Ash Wilding @ 2020-12-18 12:17 UTC (permalink / raw)
  To: ash.j.wilding
  Cc: bertrand.marquis, julien, rahul.singh, sstabellini, xen-devel

Having pondered note (1) in my previous email a bit more, I imagine the reason
for using a DMB instead of acq/rel semantics is to prevent accesses following
the STLXR from being reordered between it and the LDAXR.

I won't be winning any awards for this ASCII art but hopefully it helps convey
the point.

Using just an LDAXR/STLXR pair and ditching the DMB, accesses to [D] and [E]
can be reordered between the LDAXR and STLXR:

                    ...
        +---------- LDR   [A]
        |           ...
        |           ...
        |    +----- STR   [B]
        |    |      ...
    ====|====|======LDAXR [C]================
        |    |      ...                  X
        |    +----> ...                  |
        |           ...                  |
        |           ...   <----------+   |
        X           ...              |   |
    ================STLXR [C]========|===|===
                    ...              |   |
                    ...              |   |
                    LDR   [D]--------+   |
                    ...                  |
                    STR   [E]------------+
                    ...


Whereas dropping the acq semantics from the load (a plain LDXR) and using a
DMB after the store instead prevents accesses to [D] and [E] from being
reordered in between the LDXR/STLXR pair, while keeping the rel semantics on
the STLXR prevents accesses to [A] and [B] from being reordered after the
STLXR:

                    ...
        +---------- LDR   [A]
        |           ...
        |           ...
        |    +----- STR   [B]
        |    |      ...
        |    |      LDXR  [C]
        |    |      ...
        |    +----> ...
        |           ...
        X           ...
    ================STLXR [C]================
    ================DMB======================
                    ...          X    X
                    ...          |    |
                    LDR   [D]----+    |  
                    ...               |
                    STR   [E]---------+
                    ...


As mentioned in my original email, the LSE atomic is a single instruction so
we can give it acq/rel semantics and not worry about any accesses to [A], [B],
[D], or [E] being reordered relative to that atomic:

                    ...
        +---------- LDR   [A]
        |           ...
        |           ...
        |    +----- STR   [B]
        |    |      ...
        X    X      ...
    ================LDADDAL [C]================
                    ...          X    X
                    ...          |    |
                    LDR   [D]----+    |  
                    ...               |
                    STR   [E]---------+
                    ...


So, it makes sense that Linux uses acq/rel and no DMB for LSE, but Linux (and Xen)
are forced to use rel semantics and a DMB for the LL/SC case.
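
(For completeness, the LSE shape being compared against is a single
read-modify-write instruction; a rough sketch with a made-up name, and it
needs an LSE-capable assembler/target to build:)

    static inline int sketch_lse_add_return(int i, atomic_t *v)
    {
        int old;

        asm volatile(
        /* One instruction carrying both acquire and release semantics. */
        "       ldaddal %w[i], %w[old], %[v]"
        : [old] "=&r" (old), [v] "+Q" (v->counter)
        : [i] "r" (i)
        : "memory");

        return old + i;   /* add_return semantics: the updated value */
    }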

Anyway, point (2) from my earlier email is the one that's potentially more
concerning as we only have rel semantics and no DMB on the Linux LL/SC version
of atomic_cmpxchg(), in contrast to the existing Xen LL/SC implementation being
sandwiched between two DMBs and the Linux LSE version having acq/rel semantics.

Cheers,
Ash.



* Re: [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen
  2020-12-17 15:37   ` Ash Wilding
  2020-12-18 12:17     ` Ash Wilding
@ 2021-02-14 18:36     ` Julien Grall
  1 sibling, 0 replies; 24+ messages in thread
From: Julien Grall @ 2021-02-14 18:36 UTC (permalink / raw)
  To: Ash Wilding; +Cc: bertrand.marquis, rahul.singh, sstabellini, xen-devel



On 17/12/2020 15:37, Ash Wilding wrote:
> Hi Julien,

Hi,

First of all, apologies for the very late reply.

> Thanks for taking a look at the patches and providing feedback. I've seen your
> other comments and will reply to those separately when I get a chance (maybe at
> the weekend or over the Christmas break).
> 
> RE the differences in ordering semantics between Xen's and Linux's atomics
> helpers, please find my notes below.
> 
> Thoughts?

Thank you for the very detailed answer, it made it a lot easier to
understand the differences!

I think it would be good to capture some (if not all) of this content in
the documentation for future reference.

[...]

> The tables below use format AAA/BBB/CCC/DDD/EEE, where:
> 
>   - AAA is the memory barrier before the operation
>   - BBB is the acquire semantics of the atomic operation
>   - CCC is the release semantics of the atomic operation
>   - DDD is whether the asm() block clobbers memory
>   - EEE is the memory barrier after the operation
> 
> For example, ---/---/rel/mem/dmb would mean:
> 
>   - No memory barrier before the operation
>   - The atomic does *not* have acquire semantics
>   - The atomic *does* have release semantics
>   - The asm() block clobbers memory
>   - There is a DMB memory barrier after the atomic operation
> 
> 
>      arm64 LL/SC
>      ===========
> 
>          Xen Function            Xen                     Linux                   Inconsistent
>          ============            ===                     =====                   ============
> 
>          atomic_add              ---/---/---/---/---     ---/---/---/---/---     ---
>          atomic_add_return       ---/---/rel/mem/dmb     ---/---/rel/mem/dmb     --- (1)
>          atomic_sub              ---/---/---/---/---     ---/---/---/---/---     ---
>          atomic_sub_return       ---/---/rel/mem/dmb     ---/---/rel/mem/dmb     --- (1)
>          atomic_and              ---/---/---/---/---     ---/---/---/---/---     ---
>          atomic_cmpxchg          dmb/---/---/---/dmb     ---/---/rel/mem/---     YES (2)

If I am not mistaken, Linux is implementing atomic_cmpxchg() with the
*_mb() version. So the semantics would be:

---/---/rel/mem/dmb

>          atomic_xchg             ---/---/rel/mem/dmb     ---/acq/rel/mem/dmb     YES (3)

From Linux:

#define __XCHG_CASE(w, sfx, name, sz, mb, nop_lse, acq, acq_lse, rel, cl) \

[...]

        /* LL/SC */                                             \
        "       prfm    pstl1strm, %2\n"                        \
        "1:     ld" #acq "xr" #sfx "\t%" #w "0, %2\n"           \
        "       st" #rel "xr" #sfx "\t%w1, %" #w "3, %2\n"      \
        "       cbnz    %w1, 1b\n"                              \
        "       " #mb,                                          \

[...]

__XCHG_CASE(w,  ,  mb_, 32, dmb ish, nop,  , a, l, "memory")

So I think the Linux semantics would be:

---/---/rel/mem/dmb

Therefore there would be no inconsistency between Xen and Linux.

> 
> (1) It's actually interesting to me that Linux does it this way. As with the
>      LSE atomics below, I'd have expected acq/rel semantics and ditch the DMB.

I have done some digging. The original implementation of atomic_{sub, 
add}_return was actually using acq/rel semantics up until Linux 3.14. 
But this was reworked by 8e86f0b409a4 "arm64: atomics: fix use of 
acquire + release for full barrier semantics".

>      Unless I'm missing something where there is a concern around taking an IRQ
>      between the LDAXR and the STLXR, which can't happen in the LSE atomic case
>      since it's a single instruction. But the exclusive monitor is cleared on
>      exception return in AArch64 so I'm struggling to see what that potential
>      issue may be. Regardless, Linux and Xen are consistent so we're OK ;-)

The commit I pointed out above contains a lot of detail on the issue. For
convenience, I have copied the most relevant bits below:

"
On arm64, these operations have been incorrectly implemented as follows:

             // A, B, C are independent memory locations

             <Access [A]>

             // atomic_op (B)
     1:      ldaxr   x0, [B]         // Exclusive load with acquire
             <op(B)>
             stlxr   w1, x0, [B]     // Exclusive store with release
             cbnz    w1, 1b

             <Access [C]>

The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).

Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.
"

> (2) The Linux version uses either STLXR with rel semantics if the comparison
>      passes, or DMB if the comparison fails.

I think the DMB is only on the success path and there is no barrier on 
the failure path. The commit 4e39715f4b5c "arm64: cmpxchg: avoid memory 
barrier on comparison failure" seems to corroborate that.
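
(Roughly the shape being described, written out by hand rather than
copied verbatim; on a mismatch the code branches straight to the final
label, skipping both the store-release and the trailing DMB:)

    1:      ldxr    w0, [x2]        // load-exclusive, no acquire
            cmp     w0, w3          // compare against the expected 'old'
            b.ne    2f              // mismatch: skip the store and the DMB
            stlxr   w1, w4, [x2]    // store-release of 'new' on success
            cbnz    w1, 1b
            dmb     ish             // full barrier on the success path only
    2: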

> This is weaker than Xen's version,
>      which is quite blunt in always wrapping the operation between two DMBs. This
>      may be a holdover from Xen's arm32 versions being ported to arm64, as we
>      didn't support acq/rel semantics on LDREX and STREX in Armv7-A? Regardless,

The atomic helpers used in Xen were originally taken from Linux 3.14. 
Back then, atomic_cmpxchg() was using the two full barriers. This was
introduced by the commit I pointed out in (1).

This was then optimized with commit 4e39715f4b5c "arm64: cmpxchg: avoid 
memory barrier on comparison failure".

>      this is quite a big discrepancy and I've not yet given it enough thought to
>      determine whether it would actually cause an issue. My feeling is that the
>      Linux LL/SC atomic_cmpxchg() should have acq semantics on the LL, but
>      like you said these helpers are well tested so I'd be surprised if there
>      is a bug. See (5) below though, where the Linux LSE atomic_cmpxchg() *does*
>      have acq semantics.

If my understanding is correct, the semantics would be (Xen vs Linux):

  - failure: dmb/---/---/---/dmb     ---/---/rel/mem/---
  - success: dmb/---/---/---/dmb     ---/---/rel/mem/dmb

I think the success path is not going to be a problem. But we would need 
to check if all the callers expect a full barrier for the failure path 
(I would hope not).

> 
> (3) The Linux version just adds acq semantics to the LL, so we're OK here.
> 
> 
>      arm64 LSE (comparison to Xen's LL/SC)
>      =====================================
> 
>          Xen Function            Xen                     Linux                   Inconsistent
>          ============            ===                     =====                   ============
> 
>          atomic_add              ---/---/---/---/---     ---/---/---/---/---     ---
>          atomic_add_return       ---/---/rel/mem/dmb     ---/acq/rel/mem/---     YES (4)
>          atomic_sub              ---/---/---/---/---     ---/---/---/---/---     ---
>          atomic_sub_return       ---/---/rel/mem/dmb     ---/acq/rel/mem/---     YES (4)
>          atomic_and              ---/---/---/---/---     ---/---/---/---/---     ---
>          atomic_cmpxchg          dmb/---/---/---/dmb     ---/acq/rel/mem/---     YES (5)
>          atomic_xchg             ---/---/rel/mem/dmb     ---/acq/rel/mem/---     YES (4)
> 
> (4) As noted in (1), this is how I would have expected Linux's LL/SC atomics to
>      work too. I don't think this discrepancy will cause any issues.

IIUC, the LSE implementation would be using a single instruction that
has both the acquire and release semantics. Therefore, it would act as a 
full barrier.

On the other hand, in the LL/SC implementation, the acquire and release 
semantics is happening with two different instruction. Therefore, they 
don't act as a full barrier.

So I think the memory ordering is going to be equivalent between Xen and 
the Linux LSE implementation.

> 
> (5) As with (2) above, this is quite a big discrepancy to Xen. However at least
>      this version has acq semantics unlike the LL/SC version in (2), so I'm more
>      confident that there won't be regressions going from Xen LL/SC to Linux LSE
>      version of atomic_cmpxchg().

Are they really different? In both cases, the helper will act as a full
barrier. The main difference is how this ordering is achieved.

> 
> 
>      arm32 LL/SC
>      ===========
> 
>          Xen Function            Xen                     Linux                   Inconsistent
>          ============            ===                     =====                   ============
> 
>          atomic_add              ---/---/---/---/---     ---/---/---/---/---     ---
>          atomic_add_return       dmb/---/---/---/dmb     XXX/XXX/XXX/XXX/XXX     YES (6)
>          atomic_sub              ---/---/---/---/---     ---/---/---/---/---     ---
>          atomic_sub_return       dmb/---/---/---/dmb     XXX/XXX/XXX/XXX/XXX     YES (6)
>          atomic_and              ---/---/---/---/---     ---/---/---/---/---     ---
>          atomic_cmpxchg          dmb/---/---/---/dmb     XXX/XXX/XXX/XXX/XXX     YES (6)
>          atomic_xchg             dmb/---/---/---/dmb     XXX/XXX/XXX/XXX/XXX     YES (6)
> 
> (6) Linux only provides relaxed variants of these functions, such as
>      atomic_add_return_relaxed() and atomic_xchg_relaxed(). Patches #13 and #14
>      in the series add the stricter versions expected by Xen, wrapping calls to
>      Linux's relaxed variants between two calls to smp_mb(). This makes them
>      consistent with Xen's existing helpers, though it is quite blunt.

Linux will do the same when the fully ordered version is not implemented 
(see include/linux/atomic-fallback.h).
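
The pattern is roughly as follows (a sketch of the idea only, not the
exact code in the patches or in atomic-fallback.h; atomic_t, smp_mb()
and atomic_add_return_relaxed() are the existing arm32 primitives):

static inline int atomic_add_return(int i, atomic_t *v)
{
    int ret;

    smp_mb();                               /* full barrier before        */
    ret = atomic_add_return_relaxed(i, v);  /* relaxed LDREX/STREX helper */
    smp_mb();                               /* full barrier after         */

    return ret;
}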

>  It is worth
>      noting that Armv8-A AArch32 does support acq/rel semantics on exclusive
>      accesses, with LDAEX and STLEX, so I could imagine us introducing a new
>      arm32 hwcap to detect whether we're on actual Armv7-A hardware or Armv8-A
>      AArch32, then switching to lighter-weight STLEX versions of these helpers
>      rather than the heavyweight double-DMB versions. Whether that would
>      actually give measurable performance improvements is another story!

That's good to know! So far, I haven't heard of anyone using 32-bit
Xen on Armv8. I would wait until there is a request before introducing
a 3rd (4th if counting the LL/SC ones as 2) implementation of the
atomics helpers.

That said, the 32-bit port is only meant to be supported on Armv7. It
should boot on Armv8, but there is no promise it will be fully
functional or stable.
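
For reference, the kind of LDAEX/STLEX sequence described above would
look something like the sketch below (purely hypothetical, ARMv8-A
AArch32 only, not code from this series; the function name is made
up):

static inline unsigned int cmpxchg_a32_acq_rel(volatile unsigned int *ptr,
                                               unsigned int old,
                                               unsigned int new)
{
    unsigned int oldval, tmp;

    asm volatile(
    "1:  ldaex   %0, [%2]\n"        /* load-acquire exclusive   */
    "    cmp     %0, %3\n"
    "    bne     2f\n"              /* compare failed: no store */
    "    stlex   %1, %4, [%2]\n"    /* store-release exclusive  */
    "    teq     %1, #0\n"
    "    bne     1b\n"              /* lost exclusivity: retry  */
    "2:\n"
    : "=&r" (oldval), "=&r" (tmp)
    : "r" (ptr), "r" (old), "r" (new)
    : "cc", "memory");

    return oldval;
}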

Overall, it looks to me like re-syncing the atomics implementation
with Linux should not be a major problem.

We are in the middle of the 4.15 freeze, but I will try to go through
the series ASAP.

Cheers,

-- 
Julien Grall




Thread overview: 24+ messages
2020-11-11 21:51 [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Ash Wilding
2020-11-11 21:51 ` [RFC PATCH v2 01/15] xen/arm: Support detection of CPU features in other ID registers Ash Wilding
2020-11-11 21:51 ` [RFC PATCH v2 02/15] xen/arm: Add detection of Armv8.1-LSE atomic instructions Ash Wilding
2020-11-11 21:51 ` [RFC PATCH v2 03/15] xen/arm: Add ARM64_HAS_LSE_ATOMICS hwcap Ash Wilding
2020-11-11 21:51 ` [RFC PATCH v2 04/15] xen/arm: Delete Xen atomics helpers Ash Wilding
2020-11-11 21:51 ` [RFC PATCH v2 05/15] xen/arm: pull in Linux atomics helpers and dependencies Ash Wilding
2020-11-12  8:31   ` Jan Beulich
2020-11-12 10:52     ` Ash Wilding
2020-11-11 21:51 ` [RFC PATCH v2 06/15] xen: port Linux <asm-generic/rwonce.h> to Xen Ash Wilding
2020-11-11 21:51 ` [RFC PATCH v2 07/15] xen/arm: prepare existing Xen headers for Linux atomics Ash Wilding
2020-11-11 21:51 ` [RFC PATCH v2 08/15] xen/arm64: port Linux's arm64 atomic_ll_sc.h to Xen Ash Wilding
2020-12-15 18:38   ` Julien Grall
2020-11-11 21:51 ` [RFC PATCH v2 09/15] xen/arm64: port Linux's arm64 atomic_lse.h " Ash Wilding
2020-11-11 21:51 ` [RFC PATCH v2 10/15] xen/arm64: port Linux's arm64 cmpxchg.h " Ash Wilding
2020-12-15 18:53   ` Julien Grall
2020-11-11 21:51 ` [RFC PATCH v2 11/15] xen/arm64: port Linux's arm64 atomic.h " Ash Wilding
2020-11-11 21:52 ` [RFC PATCH v2 12/15] xen/arm64: port Linux's arm64 lse.h " Ash Wilding
2020-11-11 21:52 ` [RFC PATCH v2 13/15] xen/arm32: port Linux's arm32 atomic.h " Ash Wilding
2020-11-11 21:52 ` [RFC PATCH v2 14/15] xen/arm32: port Linux's arm32 cmpxchg.h " Ash Wilding
2020-11-11 21:52 ` [RFC PATCH v2 15/15] xen/arm: remove dependency on gcc built-in __sync_fetch_and_add() Ash Wilding
2020-12-15 19:31 ` [RFC PATCH v2 00/15] xen/arm: port Linux LL/SC and LSE atomics helpers to Xen Julien Grall
2020-12-17 15:37   ` Ash Wilding
2020-12-18 12:17     ` Ash Wilding
2021-02-14 18:36     ` Julien Grall
