* [PATCH 0/8] Make PDX compression optional
@ 2023-07-17 16:03 Alejandro Vallejo
  2023-07-17 16:03 ` [PATCH 1/8] mm/pdx: Add comments throughout the codebase for pdx Alejandro Vallejo
                   ` (9 more replies)
  0 siblings, 10 replies; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-17 16:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu, Bertrand Marquis,
	Volodymyr Babchuk, Roger Pau Monné

Currently there's a CONFIG_HAS_PDX Kconfig option, but it's impossible to
disable it because the whole codebase performs unconditional
compression/decompression operations on addresses. This has the
unfortunate side effect that systems without a need for compression still
have to pay the performance impact of juggling bits on every pfn<->pdx
conversion (this requires reading several global variables). This series
attempts to:

  * Leave the state of pdx and pdx compression documented
  * Factor out compression so it _can_ be removed through Kconfig
  * Make it so compression is disabled by default on x86, and enabled by
    default on both arm32 and arm64.

Series summary:

Patch 1 documents the current general understanding of the pdx concept and
        pdx compression in particular.
Patch 2 adds a clarification to ARM code to explain some discrepancies
        between the concept of "directmap" in arm32 and arm64 relevant to
        this series (i.e: why this series doesn't touch arm32 in directmap
        accesses).
Patch 3 marks the pdx compression globals as __ro_after_init.
Patch 4 removes the current CONFIG_HAS_PDX option, because (a) the name is
        misleading and (b) it cannot be removed in its current form.
Patch 5 moves hard-coded compression-related logic to helper functions.
Patch 6 refactors all instances of regions being validated for pdx
        compression conformance so it's done through a single helper.
Patch 7 is a non-functional reorder meant to simplify the patch 8 diff.
Patch 8 adds a new Kconfig option to compile out PDX compression.
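
For illustration, here is a minimal standalone sketch (not part of the
series) of what the two conversion paths boil down to. The mask/shift
names match the ones used throughout the series; everything else is
hypothetical:

/* Compression parameters, populated at boot from the memory map. */
static unsigned long pfn_pdx_bottom_mask = ~0UL;
static unsigned long pfn_top_mask;
static unsigned int pfn_pdx_hole_shift;

/* With compression: several global reads plus bit juggling per call. */
static unsigned long pfn_to_pdx_compressed(unsigned long pfn)
{
    return (pfn & pfn_pdx_bottom_mask) |
           ((pfn & pfn_top_mask) >> pfn_pdx_hole_shift);
}

/* Without compression: the conversion collapses to the identity. */
static unsigned long pfn_to_pdx_identity(unsigned long pfn)
{
    return pfn;
}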

Alejandro Vallejo (8):
  mm/pdx: Add comments throughout the codebase for pdx
  arm/mm: Document the differences between arm32 and arm64 directmaps
  pdx: Mark pdx hole description globals readonly after boot
  build: Remove CONFIG_HAS_PDX
  mm: Factor out the pdx compression logic in ma/va converters
  mm/pdx: Standardize region validation wrt pdx compression
  pdx: Reorder pdx.[ch]
  pdx: Add CONFIG_HAS_PDX_COMPRESSION as a Kconfig option

 xen/arch/arm/Kconfig                   |   1 -
 xen/arch/arm/include/asm/mm.h          |  30 ++-
 xen/arch/x86/Kconfig                   |   1 -
 xen/arch/x86/domain.c                  |  19 +-
 xen/arch/x86/include/asm/x86_64/page.h |  28 ++-
 xen/arch/x86/x86_64/mm.c               |   2 +-
 xen/common/Kconfig                     |  13 +-
 xen/common/Makefile                    |   2 +-
 xen/common/efi/boot.c                  |   6 +-
 xen/common/pdx.c                       | 120 +++++++++---
 xen/include/xen/mm.h                   |  11 ++
 xen/include/xen/pdx.h                  | 241 +++++++++++++++++++++++--
 12 files changed, 405 insertions(+), 69 deletions(-)

-- 
2.34.1




* [PATCH 1/8] mm/pdx: Add comments throughout the codebase for pdx
  2023-07-17 16:03 [PATCH 0/8] Make PDX compression optional Alejandro Vallejo
@ 2023-07-17 16:03 ` Alejandro Vallejo
  2023-07-18  9:14   ` Jan Beulich
  2023-07-17 16:03 ` [PATCH 2/8] arm/mm: Document the differences between arm32 and arm64 directmaps Alejandro Vallejo
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-17 16:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

Document the behaviour of the pdx machinery in Xen. Some logic is fairly
opaque and hard to follow without it being documented anywhere. This
explains the rationale behind compression and its relationship to
frametable indexing and directmap management.

While modifying the file:
  * Convert u64 -> uint64_t
  * Remove extern keyword from function prototypes

No functional change.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
This would've been v4 of the previously standalone patch:

v4:
  * Adds 2 "may"s to mm.h as requested
---
 xen/common/pdx.c      |  59 +++++++++++++++++-
 xen/include/xen/mm.h  |  11 ++++
 xen/include/xen/pdx.h | 139 ++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 202 insertions(+), 7 deletions(-)

diff --git a/xen/common/pdx.c b/xen/common/pdx.c
index c91875fabe..ec64d3d2ef 100644
--- a/xen/common/pdx.c
+++ b/xen/common/pdx.c
@@ -20,13 +20,56 @@
 #include <xen/bitops.h>
 #include <xen/nospec.h>
 
-/* Parameters for PFN/MADDR compression. */
+/*
+ * Diagram to make sense of the following variables. The masks and shifts
+ * are done on mfn values in order to convert to/from pdx:
+ *
+ *                      pfn_hole_mask
+ *                      pfn_pdx_hole_shift (mask bitsize)
+ *                      |
+ *                 |---------|
+ *                 |         |
+ *                 V         V
+ *         --------------------------
+ *         |HHHHHHH|000000000|LLLLLL| <--- mfn
+ *         --------------------------
+ *         ^       ^         ^      ^
+ *         |       |         |------|
+ *         |       |             |
+ *         |       |             pfn_pdx_bottom_mask
+ *         |       |
+ *         |-------|
+ *             |
+ *             pfn_top_mask
+ *
+ * ma_{top,va_bottom}_mask is simply a shifted pfn_{top,pdx_bottom}_mask,
+ * where ma_top_mask has zeroes shifted in while ma_va_bottom_mask has
+ * ones.
+ */
+
+/** Maximum (non-inclusive) usable pdx */
 unsigned long __read_mostly max_pdx;
+
+/** Mask for the lower non-compressible bits of an mfn */
 unsigned long __read_mostly pfn_pdx_bottom_mask = ~0UL;
+
+/** Mask for the lower non-compressible bits of an maddr or vaddr */
 unsigned long __read_mostly ma_va_bottom_mask = ~0UL;
+
+/** Mask for the higher non-compressible bits of an mfn */
 unsigned long __read_mostly pfn_top_mask = 0;
+
+/** Mask for the higher non-compressible bits of an maddr or vaddr */
 unsigned long __read_mostly ma_top_mask = 0;
+
+/**
+ * Mask for a pdx compression bit slice.
+ *
+ *  Invariant: valid(mfn) implies (mfn & pfn_hole_mask) == 0
+ */
 unsigned long __read_mostly pfn_hole_mask = 0;
+
+/** Number of bits of the "compressible" bit slice of an mfn */
 unsigned int __read_mostly pfn_pdx_hole_shift = 0;
 
 unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS(
@@ -42,7 +85,7 @@ bool __mfn_valid(unsigned long mfn)
 }
 
 /* Sets all bits from the most-significant 1-bit down to the LSB */
-static u64 __init fill_mask(u64 mask)
+static uint64_t __init fill_mask(uint64_t mask)
 {
     while (mask & (mask + 1))
         mask |= mask + 1;
@@ -57,8 +100,18 @@ uint64_t __init pdx_init_mask(uint64_t base_addr)
                          (uint64_t)1 << (MAX_ORDER + PAGE_SHIFT)) - 1);
 }
 
-u64 __init pdx_region_mask(u64 base, u64 len)
+uint64_t __init pdx_region_mask(uint64_t base, uint64_t len)
 {
+    /*
+     * We say a bit "moves" in a range if there exist 2 addresses in that
+     * range that have that bit both set and cleared respectively. We want
+     * to create a mask of _all_ moving bits in this range. We do this by
+     * comparing the first and last addresses in the range, discarding the
+     * bits that remain the same (this is logically an XOR operation). The
+     * MSB of the resulting expression is the most significant moving bit
+     * in the range. Then it's a matter of setting every bit in lower
+     * positions in order to get the mask of moving bits.
+     */
     return fill_mask(base ^ (base + len - 1));
 }
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index b0dc3ba9c9..962ef216fd 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -31,6 +31,17 @@
  *   (i.e. all devices assigned to) a guest share a single DMA address space
  *   and, by default, Xen will ensure dfn == pfn.
  *
+ * pdx: Page InDeX
+ *   Indices into the frame table holding the per-page book-keeping
+ *   metadata. A compression scheme may be used, so there's a possibly
+ *   non-identity mapping between valid(mfn) <-> valid(pdx). See the comments
+ *   in pdx.c for an in-depth explanation of that mapping. This may also
+ *   have a knock-on effect on the directmap, as "compressed" pfns may not have
+ *   corresponding mapped frames.
+ *
+ * maddr: Machine Address
+ *   The physical address that corresponds to an mfn
+ *
  * WARNING: Some of these terms have changed over time while others have been
  * used inconsistently, meaning that a lot of existing code does not match the
  * definitions above.  New code should use these terms as described here, and
diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h
index 9fcfb0ce52..de5439a5e5 100644
--- a/xen/include/xen/pdx.h
+++ b/xen/include/xen/pdx.h
@@ -1,6 +1,72 @@
 #ifndef __XEN_PDX_H__
 #define __XEN_PDX_H__
 
+/*
+ * PDX (Page inDeX)
+ *
+ * This file deals with optimisations pertaining to frame table and
+ * directmap indexing. A pdx is an index into the frame table, which
+ * typically also means an index into the directmap[1]. However, having an
+ * identity relationship between mfn and pdx could waste copious amounts of
+ * memory in empty frame table entries and page tables. There are some
+ * techniques to bring memory wastage down.
+ *
+ * [1] Some ports apply further modifications to a pdx before indexing the
+ *     directmap. This doesn't change the fact that the same compression
+ *     present in the frame table is also present in the directmap
+ *     whenever said map is present.
+ *
+ * ## PDX grouping
+ *
+ * The frame table may have some sparsity even on systems where the memory
+ * banks are tightly packed. This is due to system quirks (like the PCI
+ * hole) which might introduce several GiB of unused page frame numbers
+ * that uselessly waste memory in the frame table. PDX grouping addresses
+ * this by keeping a bitmap of the ranges in the frame table containing
+ * invalid entries and not allocating backing memory for them.
+ *
+ * ## PDX compression
+ *
+ * This is a technique to avoid wasting memory on machines known to have
+ * split their machine address space into several big discontinuous and highly
+ * disjoint chunks.
+ *
+ * In its uncompressed form the frame table must have book-keeping metadata
+ * structures for every page between [0, max_mfn) (whether they are backed
+ * by RAM or not), and a similar condition exists for the direct map. We
+ * know some systems, however, that have some sparsity in their address
+ * space, leading to a lot of wastage in the form of unused frame table
+ * entries.
+ *
+ * This is where compression becomes useful. The idea is to note that if
+ * you have several big chunks of memory sufficiently far apart you can
+ * ignore the middle part of the address because it will always contain
+ * zeroes.
+ *
+ * i.e:
+ *   Consider 2 regions of memory. One starts at 0 while the other starts
+ *   at offset 2^off_h. Furthermore, let's assume both regions are smaller
+ *   than 2^off_l. This means that bits in positions [off_l, off_h) are
+ *   zero in every valid address, so they carry no information.
+ *
+ *                 off_h     off_l
+ *                 |         |
+ *                 V         V
+ *         --------------------------
+ *         |HHHHHHH|000000000|LLLLLL| <--- mfn
+ *         --------------------------
+ *           ^ |
+ *           | | (de)compression by adding/removing "useless" zeroes
+ *           | V
+ *         ---------------
+ *         |HHHHHHHLLLLLL| <--- pdx
+ *         ---------------
+ *
+ * This scheme also holds for multiple regions, where HHHHHHH acts as
+ * the region identifier and LLLLLL fully contains the span of every
+ * region involved.
+ */
+
 #ifdef CONFIG_HAS_PDX
 
 extern unsigned long max_pdx;
@@ -13,22 +79,78 @@ extern unsigned long pfn_top_mask, ma_top_mask;
                          (sizeof(*frame_table) & -sizeof(*frame_table)))
 extern unsigned long pdx_group_valid[];
 
-extern uint64_t pdx_init_mask(u64 base_addr);
-extern u64 pdx_region_mask(u64 base, u64 len);
+/**
+ * Calculates a mask covering "moving" bits of all addresses of a region
+ *
+ * The i-th bit of the mask must be set if there's 2 different addresses
+ * in the region that have different j-th bits. where j >= i.
+ *
+ * e.g:
+ *       base=0x1B00000000
+ *   len+base=0x1B00042000
+ *
+ *   ought to return 0x000007FFFF, which implies that every bit position
+ *   with a zero in the mask remains unchanged in every address of the
+ *   region.
+ *
+ * @param base Base address of the region
+ * @param len  Size in octets of the region
+ * @return Mask of moving bits at the bottom of all the region addresses
+ */
+uint64_t pdx_region_mask(uint64_t base, uint64_t len);
 
-extern void set_pdx_range(unsigned long smfn, unsigned long emfn);
+/**
+ * Creates the mask to start from when calculating non-compressible bits
+ *
+ * This function is intimately related to pdx_region_mask(), and together
+ * they are meant to calculate the mask of non-compressible bits given the
+ * current memory map.
+ *
+ * @param base_addr Address of the first maddr in the system
+ * @return An integer of the form 2^n - 1
+ */
+uint64_t pdx_init_mask(uint64_t base_addr);
+
+/**
+ * Mark [smfn, emfn) as accessible in the frame table
+ *
+ * @param smfn Start mfn
+ * @param emfn End mfn
+ */
+void set_pdx_range(unsigned long smfn, unsigned long emfn);
 
 #define page_to_pdx(pg)  ((pg) - frame_table)
 #define pdx_to_page(pdx) gcc11_wrap(frame_table + (pdx))
 
+/**
+ * Invoked to determine if an mfn has an associated valid frame table entry
+ *
+ * In order for it to be legal it must pass bounds, grouping and
+ * compression sanity checks.
+ *
+ * @param mfn To-be-checked mfn
+ * @return True iff all checks pass
+ */
 bool __mfn_valid(unsigned long mfn);
 
+/**
+ * Map pfn to its corresponding pdx
+ *
+ * @param pfn Frame number
+ * @return Obtained pdx after compressing the pfn
+ */
 static inline unsigned long pfn_to_pdx(unsigned long pfn)
 {
     return (pfn & pfn_pdx_bottom_mask) |
            ((pfn & pfn_top_mask) >> pfn_pdx_hole_shift);
 }
 
+/**
+ * Map a pdx to its corresponding pfn
+ *
+ * @param pdx Page index
+ * @return Obtained pfn after decompressing the pdx
+ */
 static inline unsigned long pdx_to_pfn(unsigned long pdx)
 {
     return (pdx & pfn_pdx_bottom_mask) |
@@ -38,7 +160,16 @@ static inline unsigned long pdx_to_pfn(unsigned long pdx)
 #define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn))
 #define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx))
 
-extern void pfn_pdx_hole_setup(unsigned long);
+/**
+ * Initializes global variables with information about the compressible
+ * range of the current memory regions.
+ *
+ * @param mask This mask is the biggest pdx_mask of every region in the
+ *             system ORed with all base addresses of every region in the
+ *             system. This results in a mask where every zero in a bit
+ *             position marks a potentially compressible bit.
+ */
+void pfn_pdx_hole_setup(unsigned long mask);
 
 #endif /* HAS_PDX */
 #endif /* __XEN_PDX_H__ */
-- 
2.34.1
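
As a quick sanity check of the new pdx_region_mask() documentation, the
following self-contained sketch (a hypothetical test harness, not part of
the patch) reproduces the worked example from the doc comment above:

#include <assert.h>
#include <stdint.h>

/* Sets all bits from the most-significant 1-bit down to the LSB. */
static uint64_t fill_mask(uint64_t mask)
{
    while (mask & (mask + 1))
        mask |= mask + 1;
    return mask;
}

/* Mask of all "moving" bits across [base, base + len). */
static uint64_t pdx_region_mask(uint64_t base, uint64_t len)
{
    return fill_mask(base ^ (base + len - 1));
}

int main(void)
{
    /* base=0x1B00000000, base+len=0x1B00042000 => mask 0x000007FFFF */
    assert(pdx_region_mask(0x1B00000000ULL, 0x42000ULL) == 0x7FFFFULL);
    return 0;
}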




* [PATCH 2/8] arm/mm: Document the differences between arm32 and arm64 directmaps
  2023-07-17 16:03 [PATCH 0/8] Make PDX compression optional Alejandro Vallejo
  2023-07-17 16:03 ` [PATCH 1/8] mm/pdx: Add comments throughout the codebase for pdx Alejandro Vallejo
@ 2023-07-17 16:03 ` Alejandro Vallejo
  2023-07-20 20:05   ` Julien Grall
  2023-07-17 16:03 ` [PATCH 3/8] pdx: Mark pdx hole description globals readonly after boot Alejandro Vallejo
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-17 16:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk

The arm32 directmap merely covers the XENHEAP, whereas the arm64 one
currently covers anything in the frame table. These comments highlight why
arm32 doesn't need to account for PDX compression in its __va()
implementation while arm64 does.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 xen/arch/arm/include/asm/mm.h | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 4262165ce2..1a83f41879 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -280,6 +280,19 @@ static inline paddr_t __virt_to_maddr(vaddr_t va)
 #define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))
 
 #ifdef CONFIG_ARM_32
+/**
+ * Find the virtual address corresponding to a machine address
+ *
+ * Only memory backing the XENHEAP has a corresponding virtual address to
+ * be found. This is so we can save precious virtual space, as it's in
+ * short supply on arm32. This mapping is not subject to PDX compression
+ * because XENHEAP is known to be physically contiguous and can't hence
+ * jump over the PDX hole. This means we can avoid the roundtrips
+ * converting to/from pdx.
+ *
+ * @param ma Machine address
+ * @return Virtual address mapped to `ma`
+ */
 static inline void *maddr_to_virt(paddr_t ma)
 {
     ASSERT(is_xen_heap_mfn(maddr_to_mfn(ma)));
@@ -287,6 +300,20 @@ static inline void *maddr_to_virt(paddr_t ma)
     return (void *)(unsigned long) ma + XENHEAP_VIRT_START;
 }
 #else
+/**
+ * Find the virtual address corresponding to a machine address
+ *
+ * The directmap covers all conventional memory accessible by the
+ * hypervisor. This means it's subject to PDX compression.
+ *
+ * More specifically to arm64, the directmap mappings start at the first
+ * GiB boundary containing valid RAM. This means there's an extra offset
+ * applied (directmap_base_pdx) on top of the regular PDX compression
+ * logic.
+ *
+ * @param ma Machine address
+ * @return Virtual address mapped to `ma`
+ */
 static inline void *maddr_to_virt(paddr_t ma)
 {
     ASSERT((mfn_to_pdx(maddr_to_mfn(ma)) - directmap_base_pdx) <
-- 
2.34.1
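
A rough standalone sketch of the two translation schemes these comments
describe; the constants and globals below are simplified stand-ins for the
real Xen definitions, so treat it as illustrative only:

typedef unsigned long long paddr_t;

/* Hypothetical stand-ins for the real Xen constants/globals. */
#define XENHEAP_VIRT_START 0x40000000UL
#define PAGE_SHIFT         12
static unsigned long directmap_base_pdx;
static unsigned long long ma_va_bottom_mask = ~0ULL, ma_top_mask;
static unsigned int pfn_pdx_hole_shift;

/* arm32: XENHEAP is physically contiguous, so no pdx round trip. */
static void *arm32_maddr_to_virt(paddr_t ma)
{
    return (void *)((unsigned long)ma + XENHEAP_VIRT_START);
}

/* arm64: the directmap is pdx-compressed and starts at a base pdx. */
static void *arm64_maddr_to_virt(paddr_t ma)
{
    unsigned long off = ((ma & ma_top_mask) >> pfn_pdx_hole_shift) |
                        (ma & ma_va_bottom_mask);

    return (void *)(XENHEAP_VIRT_START -
                    (directmap_base_pdx << PAGE_SHIFT) + off);
}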




* [PATCH 3/8] pdx: Mark pdx hole description globals readonly after boot
  2023-07-17 16:03 [PATCH 0/8] Make PDX compression optional Alejandro Vallejo
  2023-07-17 16:03 ` [PATCH 1/8] mm/pdx: Add comments throughout the codebase for pdx Alejandro Vallejo
  2023-07-17 16:03 ` [PATCH 2/8] arm/mm: Document the differences between arm32 and arm64 directmaps Alejandro Vallejo
@ 2023-07-17 16:03 ` Alejandro Vallejo
  2023-07-18  9:14   ` Jan Beulich
  2023-07-17 16:03 ` [PATCH 4/8] build: Remove CONFIG_HAS_PDX Alejandro Vallejo
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-17 16:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

They define where the compressible area of valid mfns is, and all of them
are populated at boot (with the exception of max_pdx, which is updated on
memory hotplug).

No functional change.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 xen/common/pdx.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/xen/common/pdx.c b/xen/common/pdx.c
index ec64d3d2ef..99d4a90a50 100644
--- a/xen/common/pdx.c
+++ b/xen/common/pdx.c
@@ -47,30 +47,33 @@
  * ones.
  */
 
-/** Maximum (non-inclusive) usable pdx */
+/**
+ * Maximum (non-inclusive) usable pdx. Must be
+ * modifiable after init due to memory hotplug
+ */
 unsigned long __read_mostly max_pdx;
 
 /** Mask for the lower non-compressible bits of an mfn */
-unsigned long __read_mostly pfn_pdx_bottom_mask = ~0UL;
+unsigned long __ro_after_init pfn_pdx_bottom_mask = ~0UL;
 
 /** Mask for the lower non-compressible bits of an maddr or vaddr */
-unsigned long __read_mostly ma_va_bottom_mask = ~0UL;
+unsigned long __ro_after_init ma_va_bottom_mask = ~0UL;
 
 /** Mask for the higher non-compressible bits of an mfn */
-unsigned long __read_mostly pfn_top_mask = 0;
+unsigned long __ro_after_init pfn_top_mask = 0;
 
 /** Mask for the higher non-compressible bits of an maddr or vaddr */
-unsigned long __read_mostly ma_top_mask = 0;
+unsigned long __ro_after_init ma_top_mask = 0;
 
 /**
  * Mask for a pdx compression bit slice.
  *
  *  Invariant: valid(mfn) implies (mfn & pfn_hole_mask) == 0
  */
-unsigned long __read_mostly pfn_hole_mask = 0;
+unsigned long __ro_after_init pfn_hole_mask = 0;
 
 /** Number of bits of the "compressible" bit slice of an mfn */
-unsigned int __read_mostly pfn_pdx_hole_shift = 0;
+unsigned int __ro_after_init pfn_pdx_hole_shift = 0;
 
 unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS(
     (FRAMETABLE_NR + PDX_GROUP_COUNT - 1) / PDX_GROUP_COUNT)] = { [0] = 1 };
-- 
2.34.1
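
For readers unfamiliar with the annotation: __ro_after_init places a
variable in a dedicated section that gets write-protected once boot
completes. A simplified sketch of the idea (the exact Xen definition and
section name may differ):

/* Tag the variable onto a dedicated linker section... */
#define __ro_after_init __attribute__((__section__(".data.ro_after_init")))

unsigned long __ro_after_init pfn_hole_mask;

/*
 * ...which init code then remaps read-only once boot is done, turning any
 * late stray write into a fault instead of silent corruption.
 */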




* [PATCH 4/8] build: Remove CONFIG_HAS_PDX
  2023-07-17 16:03 [PATCH 0/8] Make PDX compression optional Alejandro Vallejo
                   ` (2 preceding siblings ...)
  2023-07-17 16:03 ` [PATCH 3/8] pdx: Mark pdx hole description globals readonly after boot Alejandro Vallejo
@ 2023-07-17 16:03 ` Alejandro Vallejo
  2023-07-18  9:19   ` Jan Beulich
  2023-07-17 16:03 ` [PATCH 5/8] mm: Factor out the pdx compression logic in ma/va converters Alejandro Vallejo
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-17 16:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Roger Pau Monné

It's set everywhere and can't be turned off because its presence is
assumed in several parts of the codebase. This is an initial patch towards
adding a more fine-grained CONFIG_HAS_PDX_COMPRESSION that can actually be
disabled on systems that don't typically benefit from it.

No functional change.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 xen/arch/arm/Kconfig  | 1 -
 xen/arch/x86/Kconfig  | 1 -
 xen/common/Kconfig    | 3 ---
 xen/common/Makefile   | 2 +-
 xen/include/xen/pdx.h | 3 ---
 5 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 439cc94f33..ea1949fbaa 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -14,7 +14,6 @@ config ARM
 	select HAS_ALTERNATIVE
 	select HAS_DEVICE_TREE
 	select HAS_PASSTHROUGH
-	select HAS_PDX
 	select HAS_PMAP
 	select HAS_UBSAN
 	select IOMMU_FORCE_PT_SHARE
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 92f3a627da..30df085d96 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -24,7 +24,6 @@ config X86
 	select HAS_PASSTHROUGH
 	select HAS_PCI
 	select HAS_PCI_MSI
-	select HAS_PDX
 	select HAS_SCHED_GRANULARITY
 	select HAS_UBSAN
 	select HAS_VPCI if HVM
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index dd8d7c3f1c..40ec63c4b2 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -53,9 +53,6 @@ config HAS_IOPORTS
 config HAS_KEXEC
 	bool
 
-config HAS_PDX
-	bool
-
 config HAS_PMAP
 	bool
 
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 46049eac35..0020cafb8a 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -29,7 +29,7 @@ obj-y += multicall.o
 obj-y += notifier.o
 obj-$(CONFIG_NUMA) += numa.o
 obj-y += page_alloc.o
-obj-$(CONFIG_HAS_PDX) += pdx.o
+obj-y += pdx.o
 obj-$(CONFIG_PERF_COUNTERS) += perfc.o
 obj-bin-$(CONFIG_HAS_PMAP) += pmap.init.o
 obj-y += preempt.o
diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h
index de5439a5e5..67ae20e89c 100644
--- a/xen/include/xen/pdx.h
+++ b/xen/include/xen/pdx.h
@@ -67,8 +67,6 @@
  * region involved.
  */
 
-#ifdef CONFIG_HAS_PDX
-
 extern unsigned long max_pdx;
 extern unsigned long pfn_pdx_bottom_mask, ma_va_bottom_mask;
 extern unsigned int pfn_pdx_hole_shift;
@@ -171,7 +169,6 @@ static inline unsigned long pdx_to_pfn(unsigned long pdx)
  */
 void pfn_pdx_hole_setup(unsigned long mask);
 
-#endif /* HAS_PDX */
 #endif /* __XEN_PDX_H__ */
 
 /*
-- 
2.34.1




* [PATCH 5/8] mm: Factor out the pdx compression logic in ma/va converters
  2023-07-17 16:03 [PATCH 0/8] Make PDX compression optional Alejandro Vallejo
                   ` (3 preceding siblings ...)
  2023-07-17 16:03 ` [PATCH 4/8] build: Remove CONFIG_HAS_PDX Alejandro Vallejo
@ 2023-07-17 16:03 ` Alejandro Vallejo
  2023-07-21 16:11   ` Julien Grall
  2023-07-17 16:03 ` [PATCH 6/8] mm/pdx: Standardize region validation wrt pdx compression Alejandro Vallejo
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-17 16:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Roger Pau Monné

This patch factors out the pdx compression logic hardcoded in both ports
for the maddr<->vaddr conversion functions.

Touches both x86 and arm ports.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 xen/arch/arm/include/asm/mm.h          |  3 +--
 xen/arch/x86/include/asm/x86_64/page.h | 28 +++++++++++---------------
 xen/include/xen/pdx.h                  | 25 +++++++++++++++++++++++
 3 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 1a83f41879..78cb23858a 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -320,8 +320,7 @@ static inline void *maddr_to_virt(paddr_t ma)
            (DIRECTMAP_SIZE >> PAGE_SHIFT));
     return (void *)(XENHEAP_VIRT_START -
                     (directmap_base_pdx << PAGE_SHIFT) +
-                    ((ma & ma_va_bottom_mask) |
-                     ((ma & ma_top_mask) >> pfn_pdx_hole_shift)));
+                    maddr_to_directmapoff(ma));
 }
 #endif
 
diff --git a/xen/arch/x86/include/asm/x86_64/page.h b/xen/arch/x86/include/asm/x86_64/page.h
index 53faa7875b..b589c93e77 100644
--- a/xen/arch/x86/include/asm/x86_64/page.h
+++ b/xen/arch/x86/include/asm/x86_64/page.h
@@ -36,26 +36,22 @@ static inline unsigned long __virt_to_maddr(unsigned long va)
 {
     ASSERT(va < DIRECTMAP_VIRT_END);
     if ( va >= DIRECTMAP_VIRT_START )
-        va -= DIRECTMAP_VIRT_START;
-    else
-    {
-        BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
-        /* Signed, so ((long)XEN_VIRT_START >> 30) fits in an imm32. */
-        ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
-               ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
-
-        va += xen_phys_start - XEN_VIRT_START;
-    }
-    return (va & ma_va_bottom_mask) |
-           ((va << pfn_pdx_hole_shift) & ma_top_mask);
+        return directmapoff_to_maddr(va - DIRECTMAP_VIRT_START);
+
+    BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
+    /* Signed, so ((long)XEN_VIRT_START >> 30) fits in an imm32. */
+    ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
+           ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
+
+    return xen_phys_start + va - XEN_VIRT_START;
 }
 
 static inline void *__maddr_to_virt(unsigned long ma)
 {
-    ASSERT(pfn_to_pdx(ma >> PAGE_SHIFT) < (DIRECTMAP_SIZE >> PAGE_SHIFT));
-    return (void *)(DIRECTMAP_VIRT_START +
-                    ((ma & ma_va_bottom_mask) |
-                     ((ma & ma_top_mask) >> pfn_pdx_hole_shift)));
+    /* Offset in the direct map, accounting for pdx compression */
+    size_t va_offset = maddr_to_directmapoff(ma);
+    ASSERT(va_offset < DIRECTMAP_SIZE);
+    return (void *)(DIRECTMAP_VIRT_START + va_offset);
 }
 
 /* read access (should only be used for debug printk's) */
diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h
index 67ae20e89c..f8ca0f5821 100644
--- a/xen/include/xen/pdx.h
+++ b/xen/include/xen/pdx.h
@@ -158,6 +158,31 @@ static inline unsigned long pdx_to_pfn(unsigned long pdx)
 #define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn))
 #define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx))
 
+/**
+ * Computes the offset into the direct map of an maddr
+ *
+ * @param ma Machine address
+ * @return Offset into the direct map where that
+ *         machine address can be accessed
+ */
+static inline unsigned long maddr_to_directmapoff(uint64_t ma)
+{
+    return ((ma & ma_top_mask) >> pfn_pdx_hole_shift) |
+           (ma & ma_va_bottom_mask);
+}
+
+/**
+ * Computes a machine address given a direct map offset
+ *
+ * @param offset Offset into the direct map
+ * @return Corresponding machine address of that virtual location
+ */
+static inline uint64_t directmapoff_to_maddr(unsigned long offset)
+{
+    return ((offset << pfn_pdx_hole_shift) & ma_top_mask) |
+           (offset & ma_va_bottom_mask);
+}
+
 /**
  * Initializes global variables with information about the compressible
  * range of the current memory regions.
-- 
2.34.1
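
To see the factored-out helpers in action, here is a self-contained sketch
(hypothetical hole configuration, not part of the patch) showing that any
maddr with zeroes in the hole survives a round trip through them:

#include <assert.h>
#include <stdint.h>

/* Assumed hole, for illustration: maddr bits 30-33 compressed out. */
static const uint64_t ma_va_bottom_mask = (1ULL << 30) - 1;
static const uint64_t ma_top_mask = ~((1ULL << 34) - 1);
static const unsigned int pfn_pdx_hole_shift = 4;

static uint64_t maddr_to_directmapoff(uint64_t ma)
{
    return ((ma & ma_top_mask) >> pfn_pdx_hole_shift) |
           (ma & ma_va_bottom_mask);
}

static uint64_t directmapoff_to_maddr(uint64_t offset)
{
    return ((offset << pfn_pdx_hole_shift) & ma_top_mask) |
           (offset & ma_va_bottom_mask);
}

int main(void)
{
    uint64_t ma = (5ULL << 34) | 0x12345; /* hole bits (30-33) are zero */

    assert(directmapoff_to_maddr(maddr_to_directmapoff(ma)) == ma);
    return 0;
}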




* [PATCH 6/8] mm/pdx: Standardize region validation wrt pdx compression
  2023-07-17 16:03 [PATCH 0/8] Make PDX compression optional Alejandro Vallejo
                   ` (4 preceding siblings ...)
  2023-07-17 16:03 ` [PATCH 5/8] mm: Factor out the pdx compression logic in ma/va converters Alejandro Vallejo
@ 2023-07-17 16:03 ` Alejandro Vallejo
  2023-07-21 17:05   ` Julien Grall
  2023-07-17 16:03 ` [PATCH 7/8] pdx: Reorder pdx.[ch] Alejandro Vallejo
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-17 16:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini

Regions must occasionally be validated for conformance with pdx
compression. That is, whether any of the machine addresses spanning the
region have a bit set in the pdx "hole" (which is expected to always
contain zeroes). There are a few such tests through the code, and they all
check for different things.

This patch replaces all such occurrences with a call to a centralized
function that checks a region for validity.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 xen/arch/x86/x86_64/mm.c |  2 +-
 xen/common/efi/boot.c    |  6 +++---
 xen/common/pdx.c         | 13 +++++++++++--
 xen/include/xen/pdx.h    |  9 +++++++++
 4 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 60db439af3..914e65c26c 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1168,7 +1168,7 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
     if ( (spfn | epfn) & ((1UL << PAGETABLE_ORDER) - 1) )
         return 0;
 
-    if ( (spfn | epfn) & pfn_hole_mask )
+    if ( !pdx_is_region_compressible(spfn, epfn) )
         return 0;
 
     /* Make sure the new range is not present now */
diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index 24169b7b50..b098a8c030 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -14,6 +14,7 @@
 #include <xen/multiboot.h>
 #include <xen/param.h>
 #include <xen/pci_regs.h>
+#include <xen/pdx.h>
 #include <xen/pfn.h>
 #if EFI_PAGE_SIZE != PAGE_SIZE
 # error Cannot use xen/pfn.h here!
@@ -1647,7 +1648,7 @@ static bool __init cf_check ram_range_valid(unsigned long smfn, unsigned long em
 {
     unsigned long sz = pfn_to_pdx(emfn - 1) / PDX_GROUP_COUNT + 1;
 
-    return !(smfn & pfn_hole_mask) &&
+    return pdx_is_region_compressible(smfn, emfn) &&
            find_next_bit(pdx_group_valid, sz,
                          pfn_to_pdx(smfn) / PDX_GROUP_COUNT) < sz;
 }
@@ -1759,8 +1760,7 @@ void __init efi_init_memory(void)
             prot |= _PAGE_NX;
 
         if ( pfn_to_pdx(emfn - 1) < (DIRECTMAP_SIZE >> PAGE_SHIFT) &&
-             !(smfn & pfn_hole_mask) &&
-             !((smfn ^ (emfn - 1)) & ~pfn_pdx_bottom_mask) )
+             pdx_is_region_compressible(smfn, emfn))
         {
             if ( (unsigned long)mfn_to_virt(emfn - 1) >= HYPERVISOR_VIRT_END )
                 prot &= ~_PAGE_GLOBAL;
diff --git a/xen/common/pdx.c b/xen/common/pdx.c
index 99d4a90a50..72845e4bab 100644
--- a/xen/common/pdx.c
+++ b/xen/common/pdx.c
@@ -88,7 +88,7 @@ bool __mfn_valid(unsigned long mfn)
 }
 
 /* Sets all bits from the most-significant 1-bit down to the LSB */
-static uint64_t __init fill_mask(uint64_t mask)
+static uint64_t fill_mask(uint64_t mask)
 {
     while (mask & (mask + 1))
         mask |= mask + 1;
@@ -96,6 +96,15 @@ static uint64_t __init fill_mask(uint64_t mask)
     return mask;
 }
 
+bool pdx_is_region_compressible(unsigned long smfn, unsigned long emfn)
+{
+    uint64_t base = smfn << PAGE_SHIFT;
+    uint64_t len = (emfn - smfn) << PAGE_SHIFT;
+
+    return !(smfn & pfn_hole_mask) &&
+           !(pdx_region_mask(base, len) & ~ma_va_bottom_mask);
+}
+
 /* We don't want to compress the low MAX_ORDER bits of the addresses. */
 uint64_t __init pdx_init_mask(uint64_t base_addr)
 {
@@ -103,7 +112,7 @@ uint64_t __init pdx_init_mask(uint64_t base_addr)
                          (uint64_t)1 << (MAX_ORDER + PAGE_SHIFT)) - 1);
 }
 
-uint64_t __init pdx_region_mask(uint64_t base, uint64_t len)
+uint64_t pdx_region_mask(uint64_t base, uint64_t len)
 {
     /*
      * We say a bit "moves" in a range if there exist 2 addresses in that
diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h
index f8ca0f5821..5378e664c2 100644
--- a/xen/include/xen/pdx.h
+++ b/xen/include/xen/pdx.h
@@ -77,6 +77,15 @@ extern unsigned long pfn_top_mask, ma_top_mask;
                          (sizeof(*frame_table) & -sizeof(*frame_table)))
 extern unsigned long pdx_group_valid[];
 
+/**
+ * Validate a region's compatibility with the current compression runtime
+ *
+ * @param smfn Start mfn
+ * @param emfn End mfn (non-inclusive)
+ * @return True iff the region can be used with the current compression
+ */
+bool pdx_is_region_compressible(unsigned long smfn, unsigned long emfn);
+
 /**
  * Calculates a mask covering "moving" bits of all addresses of a region
  *
-- 
2.34.1
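
The new helper's semantics can be exercised in isolation. A standalone
re-statement with an assumed hole configuration (hypothetical, not part of
the patch):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Assumed hole: mfn bits 18-21 compressed out (maddr bits 30-33). */
static const unsigned long pfn_hole_mask = 0xFUL << 18;
static const uint64_t ma_va_bottom_mask = (1ULL << 30) - 1;

/* Sets all bits from the most-significant 1-bit down to the LSB. */
static uint64_t fill_mask(uint64_t mask)
{
    while (mask & (mask + 1))
        mask |= mask + 1;
    return mask;
}

static uint64_t pdx_region_mask(uint64_t base, uint64_t len)
{
    return fill_mask(base ^ (base + len - 1));
}

static bool pdx_is_region_compressible(unsigned long smfn, unsigned long emfn)
{
    uint64_t base = (uint64_t)smfn << PAGE_SHIFT;
    uint64_t len = (uint64_t)(emfn - smfn) << PAGE_SHIFT;

    return !(smfn & pfn_hole_mask) &&
           !(pdx_region_mask(base, len) & ~ma_va_bottom_mask);
}

int main(void)
{
    /* Entirely below the hole: compressible. */
    assert(pdx_is_region_compressible(0x10000, 0x20000));
    /* Starts inside the hole: rejected. */
    assert(!pdx_is_region_compressible(1UL << 18, 1UL << 19));
    return 0;
}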




* [PATCH 7/8] pdx: Reorder pdx.[ch]
  2023-07-17 16:03 [PATCH 0/8] Make PDX compression optional Alejandro Vallejo
                   ` (5 preceding siblings ...)
  2023-07-17 16:03 ` [PATCH 6/8] mm/pdx: Standardize region validation wrt pdx compression Alejandro Vallejo
@ 2023-07-17 16:03 ` Alejandro Vallejo
  2023-07-17 16:24   ` Alejandro Vallejo
  2023-07-17 16:03 ` [PATCH 8/8] pdx: Add CONFIG_HAS_PDX_COMPRESSION as a Kconfig option Alejandro Vallejo
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-17 16:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

The next patch compiles out compression-related chunks, and it's helpful to
have them grouped together beforehand.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 xen/common/pdx.c      | 58 +++++++++++++++++++++----------------------
 xen/include/xen/pdx.h | 37 +++++++++++++++++++++------
 2 files changed, 59 insertions(+), 36 deletions(-)

diff --git a/xen/common/pdx.c b/xen/common/pdx.c
index 72845e4bab..cc963a3cb3 100644
--- a/xen/common/pdx.c
+++ b/xen/common/pdx.c
@@ -20,6 +20,35 @@
 #include <xen/bitops.h>
 #include <xen/nospec.h>
 
+/**
+ * Maximum (non-inclusive) usable pdx. Must be
+ * modifiable after init due to memory hotplug
+ */
+unsigned long __read_mostly max_pdx;
+
+unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS(
+    (FRAMETABLE_NR + PDX_GROUP_COUNT - 1) / PDX_GROUP_COUNT)] = { [0] = 1 };
+
+bool __mfn_valid(unsigned long mfn)
+{
+    if ( unlikely(evaluate_nospec(mfn >= max_page)) )
+        return false;
+    return likely(!(mfn & pfn_hole_mask)) &&
+           likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT,
+                           pdx_group_valid));
+}
+
+void set_pdx_range(unsigned long smfn, unsigned long emfn)
+{
+    unsigned long idx, eidx;
+
+    idx = pfn_to_pdx(smfn) / PDX_GROUP_COUNT;
+    eidx = (pfn_to_pdx(emfn - 1) + PDX_GROUP_COUNT) / PDX_GROUP_COUNT;
+
+    for ( ; idx < eidx; ++idx )
+        __set_bit(idx, pdx_group_valid);
+}
+
 /*
  * Diagram to make sense of the following variables. The masks and shifts
  * are done on mfn values in order to convert to/from pdx:
@@ -47,12 +76,6 @@
  * ones.
  */
 
-/**
- * Maximum (non-inclusive) usable pdx. Must be
- * modifiable after init due to memory hotplug
- */
-unsigned long __read_mostly max_pdx;
-
 /** Mask for the lower non-compressible bits of an mfn */
 unsigned long __ro_after_init pfn_pdx_bottom_mask = ~0UL;
 
@@ -75,18 +98,6 @@ unsigned long __ro_after_init pfn_hole_mask = 0;
 /** Number of bits of the "compressible" bit slice of an mfn */
 unsigned int __ro_after_init pfn_pdx_hole_shift = 0;
 
-unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS(
-    (FRAMETABLE_NR + PDX_GROUP_COUNT - 1) / PDX_GROUP_COUNT)] = { [0] = 1 };
-
-bool __mfn_valid(unsigned long mfn)
-{
-    if ( unlikely(evaluate_nospec(mfn >= max_page)) )
-        return false;
-    return likely(!(mfn & pfn_hole_mask)) &&
-           likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT,
-                           pdx_group_valid));
-}
-
 /* Sets all bits from the most-significant 1-bit down to the LSB */
 static uint64_t fill_mask(uint64_t mask)
 {
@@ -127,17 +138,6 @@ uint64_t pdx_region_mask(uint64_t base, uint64_t len)
     return fill_mask(base ^ (base + len - 1));
 }
 
-void set_pdx_range(unsigned long smfn, unsigned long emfn)
-{
-    unsigned long idx, eidx;
-
-    idx = pfn_to_pdx(smfn) / PDX_GROUP_COUNT;
-    eidx = (pfn_to_pdx(emfn - 1) + PDX_GROUP_COUNT) / PDX_GROUP_COUNT;
-
-    for ( ; idx < eidx; ++idx )
-        __set_bit(idx, pdx_group_valid);
-}
-
 void __init pfn_pdx_hole_setup(unsigned long mask)
 {
     unsigned int i, j, bottom_shift = 0, hole_shift = 0;
diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h
index 5378e664c2..ce27177b56 100644
--- a/xen/include/xen/pdx.h
+++ b/xen/include/xen/pdx.h
@@ -68,15 +68,41 @@
  */
 
 extern unsigned long max_pdx;
-extern unsigned long pfn_pdx_bottom_mask, ma_va_bottom_mask;
-extern unsigned int pfn_pdx_hole_shift;
-extern unsigned long pfn_hole_mask;
-extern unsigned long pfn_top_mask, ma_top_mask;
 
 #define PDX_GROUP_COUNT ((1 << PDX_GROUP_SHIFT) / \
                          (sizeof(*frame_table) & -sizeof(*frame_table)))
 extern unsigned long pdx_group_valid[];
 
+/**
+ * Mark [smfn, emfn) as allocatable in the frame table
+ *
+ * @param smfn Start mfn
+ * @param emfn End mfn
+ */
+void set_pdx_range(unsigned long smfn, unsigned long emfn);
+
+/**
+ * Invoked to determine if an mfn has an associated valid frame table entry
+ *
+ * In order for it to be legal it must pass bounds, grouping and
+ * compression sanity checks.
+ *
+ * @param mfn To-be-checked mfn
+ * @return True iff all checks pass
+ */
+bool __mfn_valid(unsigned long mfn);
+
+#define page_to_pdx(pg)  ((pg) - frame_table)
+#define pdx_to_page(pdx) gcc11_wrap(frame_table + (pdx))
+
+#define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn))
+#define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx))
+
+extern unsigned long pfn_pdx_bottom_mask, ma_va_bottom_mask;
+extern unsigned int pfn_pdx_hole_shift;
+extern unsigned long pfn_hole_mask;
+extern unsigned long pfn_top_mask, ma_top_mask;
+
 /**
  * Validate a region's compatibility with the current compression runtime
  *
@@ -164,9 +190,6 @@ static inline unsigned long pdx_to_pfn(unsigned long pdx)
            ((pdx << pfn_pdx_hole_shift) & pfn_top_mask);
 }
 
-#define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn))
-#define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx))
-
 /**
  * Computes the offset into the direct map of an maddr
  *
-- 
2.34.1




* [PATCH 8/8] pdx: Add CONFIG_HAS_PDX_COMPRESSION as a Kconfig option
  2023-07-17 16:03 [PATCH 0/8] Make PDX compression optional Alejandro Vallejo
                   ` (6 preceding siblings ...)
  2023-07-17 16:03 ` [PATCH 7/8] pdx: Reorder pdx.[ch] Alejandro Vallejo
@ 2023-07-17 16:03 ` Alejandro Vallejo
  2023-07-18  9:33 ` [PATCH 0/8] Make PDX compression optional Jan Beulich
  2023-07-20 22:00 ` Julien Grall
  9 siblings, 0 replies; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-17 16:03 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini

Adds a new compile-time flag to allow disabling pdx compression and
compiles out compression-related code/data. It also short-circuits the
pdx<->pfn conversion macros and creates stubs for the masking functions.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
 xen/arch/x86/domain.c | 19 +++++++++++++------
 xen/common/Kconfig    | 10 ++++++++++
 xen/common/pdx.c      | 15 +++++++++++----
 xen/include/xen/pdx.h | 34 ++++++++++++++++++++++++++++++++++
 4 files changed, 68 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 39c2153165..c818ccc4d5 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -458,7 +458,7 @@ void domain_cpu_policy_changed(struct domain *d)
     }
 }
 
-#ifndef CONFIG_BIGMEM
+#if !defined(CONFIG_BIGMEM) && defined(CONFIG_HAS_PDX_COMPRESSION)
 /*
  * The hole may be at or above the 44-bit boundary, so we need to determine
  * the total bit count until reaching 32 significant (not squashed out) bits
@@ -485,13 +485,20 @@ static unsigned int __init noinline _domain_struct_bits(void)
 struct domain *alloc_domain_struct(void)
 {
     struct domain *d;
-#ifdef CONFIG_BIGMEM
-    const unsigned int bits = 0;
-#else
+
     /*
-     * We pack the PDX of the domain structure into a 32-bit field within
-     * the page_info structure. Hence the MEMF_bits() restriction.
+     * Without CONFIG_BIGMEM, we pack the PDX of the domain structure into
+     * a 32-bit field within the page_info structure. Hence the MEMF_bits()
+     * restriction. With PDX compression in place the number of bits must
+     * be calculated at runtime, but it's fixed otherwise.
+     *
+     * On systems with CONFIG_BIGMEM there's no packing, and so there's no
+     * such restriction.
      */
+#if defined(CONFIG_BIGMEM) || !defined(CONFIG_HAS_PDX_COMPRESSION)
+    const unsigned int bits = IS_ENABLED(CONFIG_BIGMEM) ? 0 :
+                                                          32 + PAGE_SHIFT;
+#else
     static unsigned int __read_mostly bits;
 
     if ( unlikely(!bits) )
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 40ec63c4b2..6605a60ff7 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -23,6 +23,16 @@ config GRANT_TABLE
 
 	  If unsure, say Y.
 
+config HAS_PDX_COMPRESSION
+	bool "PDX (Page inDeX) compression support"
+	default ARM
+	help
+	  PDX compression is a technique that allows the hypervisor to
+	  represent physical addresses in a very space-efficient manner.
+	  This is very helpful in reducing memory wastage on systems whose
+	  memory banks have base addresses far from each other, but it
+	  carries a performance cost.
+
 config ALTERNATIVE_CALL
 	bool
 
diff --git a/xen/common/pdx.c b/xen/common/pdx.c
index cc963a3cb3..d0fac9d7c7 100644
--- a/xen/common/pdx.c
+++ b/xen/common/pdx.c
@@ -31,11 +31,15 @@ unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS(
 
 bool __mfn_valid(unsigned long mfn)
 {
-    if ( unlikely(evaluate_nospec(mfn >= max_page)) )
+    bool invalid = mfn >= max_page;
+#ifdef CONFIG_HAS_PDX_COMPRESSION
+    invalid |= mfn & pfn_hole_mask;
+#endif
+
+    if ( unlikely(evaluate_nospec(invalid)) )
         return false;
-    return likely(!(mfn & pfn_hole_mask)) &&
-           likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT,
-                           pdx_group_valid));
+
+    return test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, pdx_group_valid);
 }
 
 void set_pdx_range(unsigned long smfn, unsigned long emfn)
@@ -49,6 +53,8 @@ void set_pdx_range(unsigned long smfn, unsigned long emfn)
         __set_bit(idx, pdx_group_valid);
 }
 
+#ifdef CONFIG_HAS_PDX_COMPRESSION
+
 /*
  * Diagram to make sense of the following variables. The masks and shifts
  * are done on mfn values in order to convert to/from pdx:
@@ -178,6 +184,7 @@ void __init pfn_pdx_hole_setup(unsigned long mask)
     pfn_top_mask        = ~(pfn_pdx_bottom_mask | pfn_hole_mask);
     ma_top_mask         = pfn_top_mask << PAGE_SHIFT;
 }
+#endif /* CONFIG_HAS_PDX_COMPRESSION */
 
 
 /*
diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h
index ce27177b56..5531890d1c 100644
--- a/xen/include/xen/pdx.h
+++ b/xen/include/xen/pdx.h
@@ -98,6 +98,8 @@ bool __mfn_valid(unsigned long mfn);
 #define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn))
 #define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx))
 
+#ifdef CONFIG_HAS_PDX_COMPRESSION
+
 extern unsigned long pfn_pdx_bottom_mask, ma_va_bottom_mask;
 extern unsigned int pfn_pdx_hole_shift;
 extern unsigned long pfn_hole_mask;
@@ -225,7 +227,39 @@ static inline uint64_t directmapoff_to_maddr(unsigned long offset)
  *             position marks a potentially compressible bit.
  */
 void pfn_pdx_hole_setup(unsigned long mask);
+#else /* CONFIG_HAS_PDX_COMPRESSION */
+
+/* Without PDX compression we can skip some computations */
+
+/* pdx<->pfn == identity */
+#define pdx_to_pfn(x) (x)
+#define pfn_to_pdx(x) (x)
+
+/* directmap is indexed by maddr */
+#define maddr_to_directmapoff(x) (x)
+#define directmapoff_to_maddr(x) (x)
+
+static inline bool pdx_is_region_compressible(unsigned long smfn,
+                                              unsigned long emfn)
+{
+    return true;
+}
+
+static inline uint64_t pdx_init_mask(uint64_t base_addr)
+{
+    return 0;
+}
+
+static inline uint64_t pdx_region_mask(uint64_t base, uint64_t len)
+{
+    return 0;
+}
+
+static inline void pfn_pdx_hole_setup(unsigned long mask)
+{
+}
 
+#endif /* CONFIG_HAS_PDX_COMPRESSION */
 #endif /* __XEN_PDX_H__ */
 
 /*
-- 
2.34.1
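
To see the effect, a toy sketch (hypothetical, mirroring the #else branch
above) of how the conversion layer degenerates once compression is
compiled out:

#include <assert.h>

#ifndef CONFIG_HAS_PDX_COMPRESSION
/* pdx<->pfn becomes the identity; the compiler folds conversions away. */
#define pfn_to_pdx(x) (x)
#define pdx_to_pfn(x) (x)
#define maddr_to_directmapoff(x) (x)
#define directmapoff_to_maddr(x) (x)
#endif

int main(void)
{
    /* No masks, no shifts, no global reads: a round trip is a no-op. */
    assert(pdx_to_pfn(pfn_to_pdx(0x12345UL)) == 0x12345UL);
    assert(directmapoff_to_maddr(maddr_to_directmapoff(0xABCDEUL)) ==
           0xABCDEUL);
    return 0;
}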




* Re: [PATCH 7/8] pdx: Reorder pdx.[ch]
  2023-07-17 16:03 ` [PATCH 7/8] pdx: Reorder pdx.[ch] Alejandro Vallejo
@ 2023-07-17 16:24   ` Alejandro Vallejo
  0 siblings, 0 replies; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-17 16:24 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini, Wei Liu

On Mon, Jul 17, 2023 at 05:03:17PM +0100, Alejandro Vallejo wrote:
> The next patch compiles out compression-related chunks, and it's helpful to
> have them grouped together beforehand.
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
>  xen/common/pdx.c      | 58 +++++++++++++++++++++----------------------
>  xen/include/xen/pdx.h | 37 +++++++++++++++++++++------
>  2 files changed, 59 insertions(+), 36 deletions(-)
> 
> [snip]
>
> diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h
> index 5378e664c2..ce27177b56 100644
> --- a/xen/include/xen/pdx.h
> +++ b/xen/include/xen/pdx.h
> @@ -68,15 +68,41 @@
>   */
>  
>  extern unsigned long max_pdx;
> -extern unsigned long pfn_pdx_bottom_mask, ma_va_bottom_mask;
> -extern unsigned int pfn_pdx_hole_shift;
> -extern unsigned long pfn_hole_mask;
> -extern unsigned long pfn_top_mask, ma_top_mask;
>  
>  #define PDX_GROUP_COUNT ((1 << PDX_GROUP_SHIFT) / \
>                           (sizeof(*frame_table) & -sizeof(*frame_table)))
>  extern unsigned long pdx_group_valid[];
>  
> +/**
> + * Mark [smfn, emfn) as allocatable in the frame table
> + *
> + * @param smfn Start mfn
> + * @param emfn End mfn
> + */
> +void set_pdx_range(unsigned long smfn, unsigned long emfn);
> +
> +/**
> + * Invoked to determine if an mfn has an associated valid frame table entry
> + *
> + * In order for it to be legal it must pass bounds, grouping and
> + * compression sanity checks.
> + *
> + * @param mfn To-be-checked mfn
> + * @return True iff all checks pass
> + */
> +bool __mfn_valid(unsigned long mfn);
> +
> +#define page_to_pdx(pg)  ((pg) - frame_table)
> +#define pdx_to_page(pdx) gcc11_wrap(frame_table + (pdx))
> +
> +#define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn))
> +#define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx))
> +
> +extern unsigned long pfn_pdx_bottom_mask, ma_va_bottom_mask;
> +extern unsigned int pfn_pdx_hole_shift;
> +extern unsigned long pfn_hole_mask;
> +extern unsigned long pfn_top_mask, ma_top_mask;
> +
>  /**
>   * Validate a region's compatibility with the current compression runtime
>   *
> @@ -164,9 +190,6 @@ static inline unsigned long pdx_to_pfn(unsigned long pdx)
>             ((pdx << pfn_pdx_hole_shift) & pfn_top_mask);
>  }
>  
> -#define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn))
> -#define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx))
> -
>  /**
>   * Computes the offset into the direct map of an maddr
>   *
> -- 
> 2.34.1
The latest rebase messed up this patch. set_pdx_range(), __mfn_valid() and
the page_to_pdx()/pdx_to_page() macros should be removed in order for the
additions to make sense. Re-running GitLab with that now...

Alejandro





* Re: [PATCH 1/8] mm/pdx: Add comments throughout the codebase for pdx
  2023-07-17 16:03 ` [PATCH 1/8] mm/pdx: Add comments throughout the codebase for pdx Alejandro Vallejo
@ 2023-07-18  9:14   ` Jan Beulich
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2023-07-18  9:14 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, Xen-devel

On 17.07.2023 18:03, Alejandro Vallejo wrote:
> Document the behaviour of the pdx machinery in Xen. Some logic is fairly
> opaque and hard to follow without it being documented anywhere. This
> explains the rationale behind compression and its relationship to
> frametable indexing and directmap management.
> 
> While modifying the file:
>   * Convert u64 -> uint64_t
>   * Remove extern keyword from function prototypes
> 
> No functional change.
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>





* Re: [PATCH 3/8] pdx: Mark pdx hole description globals readonly after boot
  2023-07-17 16:03 ` [PATCH 3/8] pdx: Mark pdx hole description globals readonly after boot Alejandro Vallejo
@ 2023-07-18  9:14   ` Jan Beulich
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2023-07-18  9:14 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, Xen-devel

On 17.07.2023 18:03, Alejandro Vallejo wrote:
> They define where the compressible area of valid mfns is, and all of them
> are populated on boot (with the exception of max_pdx, that's updated on
> memory hotplug).
> 
> No functional change.
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>





* Re: [PATCH 4/8] build: Remove CONFIG_HAS_PDX
  2023-07-17 16:03 ` [PATCH 4/8] build: Remove CONFIG_HAS_PDX Alejandro Vallejo
@ 2023-07-18  9:19   ` Jan Beulich
  2023-07-18  9:35     ` Andrew Cooper
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2023-07-18  9:19 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	Roger Pau Monné,
	Xen-devel

On 17.07.2023 18:03, Alejandro Vallejo wrote:
> It's set everywhere and can't be turned off because its presence is
> assumed in several parts of the codebase. This is an initial patch towards
> adding a more fine-grained CONFIG_HAS_PDX_COMPRESSION that can actually be
> disabled on systems that don't typically benefit from it.
> 
> No functional change.
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>

On its own I don't think this is okay, as it's unclear whether it would
affect RISC-V or PPC in a negative way. If at all, it should only go in
together with the later re-introduction of a CONFIG_*. Still I question
whether that new CONFIG_HAS_PDX_COMPRESSION (which imo ought to be
CONFIG_PDX_COMPRESSION) then wouldn't better depend on the arch-selected
HAS_PDX, such that it wouldn't wrongly be offered to choose a value when
compression isn't supported in the first place.

Jan



* Re: [PATCH 0/8] Make PDX compression optional
  2023-07-17 16:03 [PATCH 0/8] Make PDX compression optional Alejandro Vallejo
                   ` (7 preceding siblings ...)
  2023-07-17 16:03 ` [PATCH 8/8] pdx: Add CONFIG_HAS_PDX_COMPRESSION as a Kconfig option Alejandro Vallejo
@ 2023-07-18  9:33 ` Jan Beulich
  2023-07-18 12:58   ` Alejandro Vallejo
  2023-07-20 22:00 ` Julien Grall
  9 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2023-07-18  9:33 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Xen-devel

On 17.07.2023 18:03, Alejandro Vallejo wrote:
> Currently there's a CONFIG_HAS_PDX Kconfig option, but it's impossible to
> disable it because the whole codebase performs unconditional
> compression/decompression operations on addresses. This has the
> unfortunate side effect that systems without a need for compression still
> have to pay the performance impact of juggling bits on every pfn<->pdx
> conversion (this requires reading several global variables). This series
> attempts to:
> 
>   * Leave the state of pdx and pdx compression documented
>   * Factor out compression so it _can_ be removed through Kconfig
> >   * Make it so compression is disabled by default on x86, and enabled by
> >     default on both arm32 and arm64.

I disagree with this choice of default for x86. To avoid surprising
downstreams, this should at best be a two-step process: Keep the
default as Y right now, and switch to N a couple of releases later.

But that's only the simple / mechanical side. Considering my earlier
effort to reduce / remove the involved overhead dynamically at
runtime (which you may or may not be aware of; see [2]), I view a
compile time choice as less desirable. At the very least I would
expect some justification towards this build time choice being
acceptable / reasonable despite the earlier effort towards greater
flexibility. Only such would be likely to have me merely defer to
other x86 maintainers, rather than outright objecting.

Jan

[2] https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01616.html



* Re: [PATCH 4/8] build: Remove CONFIG_HAS_PDX
  2023-07-18  9:19   ` Jan Beulich
@ 2023-07-18  9:35     ` Andrew Cooper
  2023-07-18  9:38       ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2023-07-18  9:35 UTC (permalink / raw)
  To: Jan Beulich, Alejandro Vallejo
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, George Dunlap, Wei Liu, Roger Pau Monné,
	Xen-devel

On 18/07/2023 10:19 am, Jan Beulich wrote:
> On 17.07.2023 18:03, Alejandro Vallejo wrote:
>> It's set everywhere and can't be turned off because its presence is
>> assumed in several parts of the codebase. This is an initial patch towards
>> adding a more fine-grained CONFIG_HAS_PDX_COMPRESSION that can actually be
>> disabled on systems that don't typically benefit from it.
>>
>> No functional change.
>>
>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> On its own I don't think this is okay, as it's unclear whether it would
> affect RISC-V or PPC in a negative way.

Neither could compile without layering violations of PDX logic in common
code, and neither has got that far yet.

Now is an excellent time to be doing this, because it removes work for
RISC-V/PPC...

>  If at all, it should only go in
> together with the later re-introduction of a CONFIG_*. Still I question
> whether that new CONFIG_HAS_PDX_COMPRESSION (which imo ought to be
> CONFIG_PDX_COMPRESSION) then wouldn't better depend on the arch-selected
> HAS_PDX, such that it wouldn't wrongly be offered to choose a value when
> compression isn't supported in the first place.

... although I do agree that the resulting option shouldn't be user
selectable.  It's a property of the architecture, not something a user
should be worrying about.

I also don't see why this patch needs to come here in the series.  The
main refactoring in the following two patches looks to be independent.

~Andrew


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/8] build: Remove CONFIG_HAS_PDX
  2023-07-18  9:35     ` Andrew Cooper
@ 2023-07-18  9:38       ` Jan Beulich
  2023-07-18 13:35         ` Alejandro Vallejo
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2023-07-18  9:38 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, George Dunlap, Wei Liu, Roger Pau Monné,
	Xen-devel, Alejandro Vallejo

On 18.07.2023 11:35, Andrew Cooper wrote:
> On 18/07/2023 10:19 am, Jan Beulich wrote:
>> On 17.07.2023 18:03, Alejandro Vallejo wrote:
>>> It's set everywhere and can't be turned off because its presence is
>>> assumed in several parts of the codebase. This is an initial patch towards
>>> adding a more fine-grained CONFIG_HAS_PDX_COMPRESSION that can actually be
>>> disabled on systems that don't typically benefit from it.
>>>
>>> No functional change.
>>>
>>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>> On its own I don't think this is okay, as it's unclear whether it would
>> affect RISC-V or PPC in a negative way.
> 
> Neither could compile without layering violations of PDX logic in common
> code, and neither have got that far yet.
> 
> Now is an excellent time to be doing this, because it removes work for
> RISC-V/PPC...
> 
>>  If at all, it should only go in
>> together with the later re-introduction of a CONFIG_*. Still I question
>> whether that new CONFIG_HAS_PDX_COMPRESSION (which imo ought to be
>> CONFIG_PDX_COMPRESSION) then wouldn't better depend on the arch-selected
>> HAS_PDX, such that it wouldn't wrongly be offered to choose a value when
>> compression isn't supported in the first place.
> 
> ... although I do agree that the resulting option shouldn't be user
> selectable.  It's a property of the architecture, not something a user
> should be worrying about.

I disagree. Exotic x86 hardware may or may not want supporting, which
can only be determined by the build meister, as long as this is indeed
meant to become a build-time decision (rather than a runtime one; see
my response to the cover letter of this series).

Jan


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Make PDX compression optional
  2023-07-18  9:33 ` [PATCH 0/8] Make PDX compression optional Jan Beulich
@ 2023-07-18 12:58   ` Alejandro Vallejo
  2023-07-18 13:06     ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-18 12:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Xen-devel

On Tue, Jul 18, 2023 at 11:33:03AM +0200, Jan Beulich wrote:
> On 17.07.2023 18:03, Alejandro Vallejo wrote:
> > Currently there's a CONFIG_HAS_PDX Kconfig option, but it's impossible to
> > disable it because the whole codebase performs unconditional
> > compression/decompression operations on addresses. This has the
> > unfortunate side effect that systems without a need for compression still
> > have to pay the performance impact of juggling bits on every pfn<->pdx
> > conversion (this requires reading several global variables). This series
> > attempts to:
> > 
> >   * Leave the state of pdx and pdx compression documented
> >   * Factor out compression so it _can_ be removed through Kconfig
> >   * Make it so compression is disabled on x86 and enabled on both Aarch32
> >     and Aarch64 by default.
> 
> I disagree with this choice of default for x86. To avoid surprising
> downstreams, this should at best be a two-step process: Keep the
> default as Y right now, and switch to N a couple of releases later.
I'm not particularly worried about the default, as that's easy to change
locally for our customers. That said, my .02 on the matter is that it would
be easier to accept (leaving it as Y) if there were some documented case of
the feature being triggered on present-day x86. I'd rather turn it off
unless we have strong evidence that it's actually used. Forcing Xen users
to pay the cost of a feature they don't benefit from just feels wrong.

> 
> But that's only the simple / mechanical side. Considering my earlier
> effort to reduce / remove the involved overhead dynamically at
> runtime (which you may or may not be aware of; see [2]),
I wasn't. I'll have a look at the end of the day[3]. Regardless, most of this
series is about containment of compression into known helpers and hopefully
that should be undeniably positive irrespective of how the overhead
reduction is achieved.

> I view a
> compile time choice as less desirable. At the very least I would
> expect some justification towards this build time choice being
> acceptable / reasonable despite the earlier effort towards greater
> flexibility. Only such would be likely to have me merely defer to
> other x86 maintainers, rather than outright objecting.
> 
> Jan
> 
> [2] https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01616.html

I believe the burden of proof is reversed. Features bring complexity, and
complexity increases the chances of introducing bugs. It's the presence of
a feature that ought to be backed by a proof-of-requirement, not its
absence.

As far as data goes, we're aware of several ARM systems with compressible
memory banks, but no publicly available x86 ones. I'm purposefully leaving
RISC-V and PPC out of this equation, as they may or may not require this,
it's something the maintainers of those ports will have to assess in due
time.

Alejandro

[3] Note that I sent this email before carefully reading your 2018 series.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Make PDX compression optional
  2023-07-18 12:58   ` Alejandro Vallejo
@ 2023-07-18 13:06     ` Jan Beulich
  2023-07-18 13:40       ` Alejandro Vallejo
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2023-07-18 13:06 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Xen-devel

On 18.07.2023 14:58, Alejandro Vallejo wrote:
> I believe the burden of proof is reversed. Features bring complexity, and
> complexity increases the chances of introducing bugs. It's the presence of
> a feature that ought to be backed by a proof-of-requirement, not its
> absence.

The feature was introduced to support hardware a partner of ours was
working on at the time. Xen wouldn't have worked very well there
without these additions. It is beyond my control or knowledge whether
any such system has ever made it into the public. So at the time of
its introduction, the need for this code was well justified imo.

Jan


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/8] build: Remove CONFIG_HAS_PDX
  2023-07-18  9:38       ` Jan Beulich
@ 2023-07-18 13:35         ` Alejandro Vallejo
  0 siblings, 0 replies; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-18 13:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, George Dunlap, Wei Liu,
	Roger Pau Monné,
	Xen-devel

On Tue, Jul 18, 2023 at 11:38:14AM +0200, Jan Beulich wrote:
> On 18.07.2023 11:35, Andrew Cooper wrote:
> > On 18/07/2023 10:19 am, Jan Beulich wrote:
> >> On 17.07.2023 18:03, Alejandro Vallejo wrote:
> >>> It's set everywhere and can't be turned off because its presence is
> >>> assumed in several parts of the codebase. This is an initial patch towards
> >>> adding a more fine-grained CONFIG_HAS_PDX_COMPRESSION that can actually be
> >>> disabled on systems that don't typically benefit from it.
> >>>
> >>> No functional change.
> >>>
> >>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> >> On its own I don't think this is okay, as it's unclear whether it would
> >> affect RISC-V or PPC in a negative way.
> > 
> > Neither could compile without layering violations of PDX logic in common
> > code, and neither have got that far yet.
> > 
> > Now is an excellent time to be doing this, because it removes work for
> > RISC-V/PPC...
> > 
> >>  If at all, it should only go in
> >> together with the later re-introduction of a CONFIG_*.
I could merge this patch with patch 8. I do feel they tackle different
matters, but exactly when HAS_PDX goes away doesn't matter much.

> >> Still I question
> >> whether that new CONFIG_HAS_PDX_COMPRESSION (which imo ought to be
> >> CONFIG_PDX_COMPRESSION)
Sure, I'll change that in v2.

> >> then wouldn't better depend on the arch-selected
> >> HAS_PDX, such that it wouldn't wrongly be offered to choose a value when
> >> compression isn't supported in the first place.
There are several concepts to consider:

  * Frame numbers (mfn)
  * Frame table indices (pdx)
  * Mapping between the 2 (pdx_to_pfn/pfn_to_pdx)

  (I purposefully ignore the directmap. There's a similar argument for it)

An arch opting out of pdx and an arch opting out of pdx compression are in
the exact same situation, except the latter has the ability to easily toggle
it back on. Using pdx (_not_ compression) as a first-class type has several
advantages.

  1. It allows common code to deal with pdx too. There are some conversions
     to/from pdx that have to do with arch-specific code calling common
     code or vice-versa. This is particularly bad in the presence of
     compression because it implies fairly frequent reads of global state.
  2. It allows a port to get pdx-compression for free if it opts into it;
     and more importantly, the ability to toggle it.
  3. Simplifies the compression removal logic. Otherwise #ifdefs need to be
     sprinkled around any architecture that may want to toggle it and
     that's several orders of magnitude more difficult to read.

TL;DR: There is no benefit in a new port purposefully avoiding PDX.
It's just a name for a particular index. It's _compression_ that makes it
have a whacky relationship with mfns and might want toggling.

Incidentally, note that common/numa.c does use PDX.
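
To illustrate what toggling buys us (a sketch only: the masks and shift
below are the existing pdx globals, but the exact #ifdef layout is just how
I picture it, not a verbatim extract from patch 8):

  #ifdef CONFIG_PDX_COMPRESSION
  /* Compression on: every conversion reads the ro_after_init globals. */
  static inline unsigned long pfn_to_pdx(unsigned long pfn)
  {
      return (pfn & pfn_pdx_bottom_mask) |
             ((pfn & pfn_top_mask) >> pfn_pdx_hole_shift);
  }
  #else
  /* Compression off: pdx == pfn, and the globals disappear entirely. */
  static inline unsigned long pfn_to_pdx(unsigned long pfn)
  {
      return pfn;
  }
  #endif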

> > 
> > ... although I do agree that the resulting option shouldn't be user
> > selectable.  It's a property of the architecture, not something a user
> > should be worrying about.
> 
> I disagree. Exotic x86 hardware may or may not want supporting, which
> can only be determined by the build meister, as long as this is indeed
> meant to become a build-time decision (rather than a runtime one; see
> my response to the cover letter of this series).
> 
> Jan
I won't get into x86, because I have never seen such exotic hardware, but
ARM definitely has a lot more heterogeneous system designs where it might
be needed on a system-by-system basis.

Alejandro


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Make PDX compression optional
  2023-07-18 13:06     ` Jan Beulich
@ 2023-07-18 13:40       ` Alejandro Vallejo
  0 siblings, 0 replies; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-18 13:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Xen-devel

On Tue, Jul 18, 2023 at 03:06:26PM +0200, Jan Beulich wrote:
> On 18.07.2023 14:58, Alejandro Vallejo wrote:
> > I believe the burden of proof is reversed. Features bring complexity, and
> > complexity increases the chances of introducing bugs. It's the presence of
> > a feature that ought to be backed by a proof-of-requirement, not its
> > absence.
> 
> The feature was introduced to support hardware a partner of ours was
> working on at the time. Xen wouldn't have worked very well there
> without these additions. It is beyond my control or knowledge whether
> any such system has ever made it into the public. So at the time of
> its introduction, the need for this code was well justified imo.
> 
> Jan
Oh, of course. I don't question the legitimacy of its introduction at all,
nor do I question the matter of its optional presence. I do question the
default considering the public data we have available.

Alejandro


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/8] arm/mm: Document the differences between arm32 and arm64 directmaps
  2023-07-17 16:03 ` [PATCH 2/8] arm/mm: Document the differences between arm32 and arm64 directmaps Alejandro Vallejo
@ 2023-07-20 20:05   ` Julien Grall
  2023-07-21 15:09     ` Alejandro Vallejo
  0 siblings, 1 reply; 34+ messages in thread
From: Julien Grall @ 2023-07-20 20:05 UTC (permalink / raw)
  To: Alejandro Vallejo, Xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Alejandro,

On 17/07/2023 17:03, Alejandro Vallejo wrote:
> arm32 merely covers the XENHEAP, whereas arm64 currently covers anything in
> the frame table. These comments highlight why arm32 doesn't need to account for PDX
> compression in its __va() implementation while arm64 does.
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
>   xen/arch/arm/include/asm/mm.h | 27 +++++++++++++++++++++++++++
>   1 file changed, 27 insertions(+)
> 
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index 4262165ce2..1a83f41879 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -280,6 +280,19 @@ static inline paddr_t __virt_to_maddr(vaddr_t va)
>   #define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))
>   
>   #ifdef CONFIG_ARM_32
> +/**
> + * Find the virtual address corresponding to a machine address
> + *
> + * Only memory backing the XENHEAP has a corresponding virtual address to
> + * be found. This is so we can save precious virtual space, as it's in
> + * short supply on arm32. This mapping is not subject to PDX compression
> + * because XENHEAP is known to be physically contiguous and can't hence
> + * jump over the PDX hole. This means we can avoid the roundtrips
> + * converting to/from pdx.
> + *
> + * @param ma Machine address
> + * @return Virtual address mapped to `ma`
> + */
>   static inline void *maddr_to_virt(paddr_t ma)
>   {
>       ASSERT(is_xen_heap_mfn(maddr_to_mfn(ma)));
> @@ -287,6 +300,20 @@ static inline void *maddr_to_virt(paddr_t ma)
>       return (void *)(unsigned long) ma + XENHEAP_VIRT_START;
>   }
>   #else
> +/**
> + * Find the virtual address corresponding to a machine address
> + *
> > + * The directmap covers all conventional memory accessible by the
> + * hypervisor. This means it's subject to PDX compression.
> + *
> + * More specifically to arm64, the directmap mappings start at the first
> + * GiB boundary containing valid RAM. This means there's an extra offset
> + * applied (directmap_base_pdx) on top of the regular PDX compression
> + * logic.

I find this paragraph a bit confusing to read because it leads one to
think that pdx_to_maddr(directmap_base_pdx) will return a GiB-aligned
address.

The base PDX corresponds to the start of the first region, and the only
requirement is that it be page-aligned. However, when mapping it in the
virtual address space we also offset the start to ensure that superpages
can be used (this is where the GiB alignment comes in).

That's why XENHEAP_VIRT_START points to directmap_virt_start rather than
DIRECTMAP_VIRT_START. I think it would make sense to have the logic
follow what you suggest, as it would remove a memory read. But I would
understand if you don't want to take on that extra work. :)

So for now, I would suggest removing "GiB boundary containing".

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Make PDX compression optional
  2023-07-17 16:03 [PATCH 0/8] Make PDX compression optional Alejandro Vallejo
                   ` (8 preceding siblings ...)
  2023-07-18  9:33 ` [PATCH 0/8] Make PDX compression optional Jan Beulich
@ 2023-07-20 22:00 ` Julien Grall
  2023-07-20 22:13   ` Andrew Cooper
  9 siblings, 1 reply; 34+ messages in thread
From: Julien Grall @ 2023-07-20 22:00 UTC (permalink / raw)
  To: Alejandro Vallejo, Xen-devel
  Cc: Andrew Cooper, George Dunlap, Jan Beulich, Stefano Stabellini,
	Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné

Hi Alejandro,

Great work!

On 17/07/2023 17:03, Alejandro Vallejo wrote:
> Currently there's a CONFIG_HAS_PDX Kconfig option, but it's impossible to
> disable it because the whole codebase performs unconditional
> compression/decompression operations on addresses. This has the
> unfortunate side effect that systems without a need for compression still
> have to pay the performance impact of juggling bits on every pfn<->pdx
> conversion (this requires reading several global variables). This series
> attempts to:
Just as a datapoint: I applied this to a tree with Live-Update support.
From the basic test I did, this is reducing the downtime by 10% :).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/8] Make PDX compression optional
  2023-07-20 22:00 ` Julien Grall
@ 2023-07-20 22:13   ` Andrew Cooper
  0 siblings, 0 replies; 34+ messages in thread
From: Andrew Cooper @ 2023-07-20 22:13 UTC (permalink / raw)
  To: Julien Grall, Alejandro Vallejo, Xen-devel
  Cc: George Dunlap, Jan Beulich, Stefano Stabellini, Wei Liu,
	Bertrand Marquis, Volodymyr Babchuk, Roger Pau Monné

On 20/07/2023 11:00 pm, Julien Grall wrote:
> Hi Alejandro,
>
> Great work!
>
> On 17/07/2023 17:03, Alejandro Vallejo wrote:
>> Currently there's a CONFIG_HAS_PDX Kconfig option, but it's
>> impossible to
>> disable it because the whole codebase performs unconditional
>> compression/decompression operations on addresses. This has the
>> unfortunate side effect that systems without a need for compression
>> still
>> have to pay the performance impact of juggling bits on every pfn<->pdx
>> conversion (this requires reading several global variables). This series
>> attempts to:
> Just as a datapoint. I applied this to a tree with Live-Update
> support. From the basic test I did, this is reducing the downtime by
> 10% :).

I'm not surprised in the slightest.

We've had many cases that prove that compression (of 0 bits, on all x86
systems) is a disaster perf-wise, and it's used in pretty much every
fastpath in Xen.

Look no further than c/s 564d261687c and the 10% improvements in general
PV runtime too, and that was optimising away one single instance in one
single fastpath.

It's also why I'm not entertaining the concept of leaving it active or
selectable on x86.

~Andrew


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/8] arm/mm: Document the differences between arm32 and arm64 directmaps
  2023-07-20 20:05   ` Julien Grall
@ 2023-07-21 15:09     ` Alejandro Vallejo
  2023-07-21 17:51       ` Julien Grall
  0 siblings, 1 reply; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-21 15:09 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien,

On Thu, Jul 20, 2023 at 09:05:55PM +0100, Julien Grall wrote:
> Hi Alejandro,
> 
> On 17/07/2023 17:03, Alejandro Vallejo wrote:
> > arm32 merely covers the XENHEAP, whereas arm64 currently covers anything in
> > the frame table. These comments highlight why arm32 doesn't need to account for PDX
> > compression in its __va() implementation while arm64 does.
> > 
> > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> > ---
> >   xen/arch/arm/include/asm/mm.h | 27 +++++++++++++++++++++++++++
> >   1 file changed, 27 insertions(+)
> > 
> > diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> > index 4262165ce2..1a83f41879 100644
> > --- a/xen/arch/arm/include/asm/mm.h
> > +++ b/xen/arch/arm/include/asm/mm.h
> > @@ -280,6 +280,19 @@ static inline paddr_t __virt_to_maddr(vaddr_t va)
> >   #define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))
> >   #ifdef CONFIG_ARM_32
> > +/**
> > + * Find the virtual address corresponding to a machine address
> > + *
> > + * Only memory backing the XENHEAP has a corresponding virtual address to
> > + * be found. This is so we can save precious virtual space, as it's in
> > + * short supply on arm32. This mapping is not subject to PDX compression
> > + * because XENHEAP is known to be physically contiguous and can't hence
> > + * jump over the PDX hole. This means we can avoid the roundtrips
> > + * converting to/from pdx.
> > + *
> > + * @param ma Machine address
> > + * @return Virtual address mapped to `ma`
> > + */
> >   static inline void *maddr_to_virt(paddr_t ma)
> >   {
> >       ASSERT(is_xen_heap_mfn(maddr_to_mfn(ma)));
> > @@ -287,6 +300,20 @@ static inline void *maddr_to_virt(paddr_t ma)
> >       return (void *)(unsigned long) ma + XENHEAP_VIRT_START;
> >   }
> >   #else
> > +/**
> > + * Find the virtual address corresponding to a machine address
> > + *
> > > + * The directmap covers all conventional memory accessible by the
> > + * hypervisor. This means it's subject to PDX compression.
> > + *
> > + * More specifically to arm64, the directmap mappings start at the first
> > + * GiB boundary containing valid RAM. This means there's an extra offset
> > + * applied (directmap_base_pdx) on top of the regular PDX compression
> > + * logic.
> 
> I find this paragraph a bit confusing to read because it leads to think that
> pdx_to_maddr(directmap_base_pdx) will return a GiB aligned address.
> 
> The base PDX corresponds to the start of the first region and the only
> requirement is it should be page-aligned. However, when mapping in the
> virtual address space we also offset the start to ensure that superpage can
> be used (this is where the GiB alignment is used).
> 
> That's why XENHEAP_VIRT_START points to directmap_virt_start rather than
> DIRECTMAP_VIRT_START. I think it would make sense to have the logic
> following what you suggest as it would remove a memory read. But I would
> understand if you don't want to take that extra work. :)
> 
> So for now, I would suggest to remove "GiB boundary containing".
> 
> Cheers,
> 
> -- 
> Julien Grall
Just to make sure it's the wording and not my understanding at fault
(definitely having DIRECTMAP_VIRT_START != directmap_virt_start doesn't
do cognitive load any favours).

/GiB boundary
|
|   /offset=address of 1st region of RAM % 1GiB
|   |
|---------|
V         V
--------------------------------------------------------------------------
| padding |                           directmap                | padding |
--------------------------------------------------------------------------
^         ^
|         |
|         \directmap_virt_start=pdx[directmap_base_pdx]
|
\DIRECTMAP_VIRT_START

In other words, I considered DIRECTMAP_VIRT_START the beginning of the
directmap, not directmap_virt_start.

If this is it, you probably want to document somewhere what's what. In
particular, you want a big scary message in DIRECTMAP_VIRT_START stating
that it merely delimits the virtual range where the directmap can be, not
where the directmap is, with a "See directmap_virt_start for the address
where the directmap actually starts" message attached.
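
Strawman wording for such a comment (hypothetical, just to make the point
concrete):

  /*
   * NOTE: DIRECTMAP_VIRT_START merely delimits the virtual range in which
   * the directmap may live. The directmap itself starts at the runtime
   * variable directmap_virt_start, possibly offset into this range.
   * See directmap_virt_start for where the directmap actually starts.
   */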

With that considered I'm happy to amend as you suggested on v2.

IMO, the ARM port should not keep that base pdx variable around, but
integrate it into the pdx logic, so the first valid address always
corresponds to pdx 0. Then, given a pdx, it's immediate to find frame
table entries and directmap frames. It would also greatly simplify the
definition of a pdx.
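
A rough sketch of what I mean (hypothetical; it assumes directmap_base_pdx
gets folded into the translation masks at init time, which the current
code does not do):

  /* arm64 maddr_to_virt() if pdx 0 were the first valid RAM page: no
   * separate directmap_base_pdx read or subtraction needed anymore. */
  static inline void *maddr_to_virt(paddr_t ma)
  {
      return (void *)(XENHEAP_VIRT_START + maddr_to_directmapoff(ma));
  }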

Alejandro


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/8] mm: Factor out the pdx compression logic in ma/va converters
  2023-07-17 16:03 ` [PATCH 5/8] mm: Factor out the pdx compression logic in ma/va converters Alejandro Vallejo
@ 2023-07-21 16:11   ` Julien Grall
  0 siblings, 0 replies; 34+ messages in thread
From: Julien Grall @ 2023-07-21 16:11 UTC (permalink / raw)
  To: Alejandro Vallejo, Xen-devel
  Cc: Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Jan Beulich, Wei Liu,
	Roger Pau Monné

Hi Alejandro,

On 17/07/2023 17:03, Alejandro Vallejo wrote:
> This patch factors out the pdx compression logic hardcoded in both ports
> for the maddr<->vaddr conversion functions.
> 
> Touches both x86 and arm ports.
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
>   xen/arch/arm/include/asm/mm.h          |  3 +--
>   xen/arch/x86/include/asm/x86_64/page.h | 28 +++++++++++---------------
>   xen/include/xen/pdx.h                  | 25 +++++++++++++++++++++++
>   3 files changed, 38 insertions(+), 18 deletions(-)
> 
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index 1a83f41879..78cb23858a 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -320,8 +320,7 @@ static inline void *maddr_to_virt(paddr_t ma)
>              (DIRECTMAP_SIZE >> PAGE_SHIFT));
>       return (void *)(XENHEAP_VIRT_START -
>                       (directmap_base_pdx << PAGE_SHIFT) +
> -                    ((ma & ma_va_bottom_mask) |
> -                     ((ma & ma_top_mask) >> pfn_pdx_hole_shift)));
> +                    maddr_to_directmapoff(ma));
>   }
>   #endif
>   
> diff --git a/xen/arch/x86/include/asm/x86_64/page.h b/xen/arch/x86/include/asm/x86_64/page.h
> index 53faa7875b..b589c93e77 100644
> --- a/xen/arch/x86/include/asm/x86_64/page.h
> +++ b/xen/arch/x86/include/asm/x86_64/page.h
> @@ -36,26 +36,22 @@ static inline unsigned long __virt_to_maddr(unsigned long va)
>   {
>       ASSERT(va < DIRECTMAP_VIRT_END);
>       if ( va >= DIRECTMAP_VIRT_START )
> -        va -= DIRECTMAP_VIRT_START;
> -    else
> -    {
> -        BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
> -        /* Signed, so ((long)XEN_VIRT_START >> 30) fits in an imm32. */
> -        ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
> -               ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
> -
> -        va += xen_phys_start - XEN_VIRT_START;
> -    }
> -    return (va & ma_va_bottom_mask) |
> -           ((va << pfn_pdx_hole_shift) & ma_top_mask);
> +        return directmapoff_to_maddr(va - DIRECTMAP_VIRT_START);
> +
> +    BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
> +    /* Signed, so ((long)XEN_VIRT_START >> 30) fits in an imm32. */
> +    ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
> +           ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
> +
> +    return xen_phys_start + va - XEN_VIRT_START;
>   }
>   
>   static inline void *__maddr_to_virt(unsigned long ma)
>   {
> -    ASSERT(pfn_to_pdx(ma >> PAGE_SHIFT) < (DIRECTMAP_SIZE >> PAGE_SHIFT));
> -    return (void *)(DIRECTMAP_VIRT_START +
> -                    ((ma & ma_va_bottom_mask) |
> -                     ((ma & ma_top_mask) >> pfn_pdx_hole_shift)));
> +    /* Offset in the direct map, accounting for pdx compression */
> +    size_t va_offset = maddr_to_directmapoff(ma);
> +    ASSERT(va_offset < DIRECTMAP_SIZE);
> +    return (void *)(DIRECTMAP_VIRT_START + va_offset);
>   }
>   
>   /* read access (should only be used for debug printk's) */
> diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h
> index 67ae20e89c..f8ca0f5821 100644
> --- a/xen/include/xen/pdx.h
> +++ b/xen/include/xen/pdx.h
> @@ -158,6 +158,31 @@ static inline unsigned long pdx_to_pfn(unsigned long pdx)
>   #define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn))
>   #define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx))
>   
> +/**
> + * Computes the offset into the direct map of an maddr
> + *
> + * @param ma Machine address
> + * @return Offset on the direct map where that
> + *         machine address can be accessed
> + */
> +static inline unsigned long maddr_to_directmapoff(uint64_t ma)
> +{
> +    return ((ma & ma_top_mask) >> pfn_pdx_hole_shift) |
> +           (ma & ma_va_bottom_mask);

NIT: I got a bit confused because you re-ordered the two operations. I
guess this was done because it reads more nicely.

Anyway, I have confirmed the logic is still the same (just different 
ordering).
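
For anyone following along, a worked example with made-up mask values:

  /* Assume a 4-bit hole at bits 16-19, i.e.:
   *   pfn_pdx_hole_shift == 4
   *   ma_va_bottom_mask  == 0xffff
   *   ma_top_mask        == ~0xfffffULL
   * Then for ma = 0x12305678 (hole bits zero, as they must be):
   *   bottom = 0x12305678 & 0xffff             = 0x00005678
   *   top    = (0x12305678 & ma_top_mask) >> 4 = 0x01230000
   * and maddr_to_directmapoff() returns 0x01235678: the hole nibble is
   * squeezed out, and ORing the halves in either order is equivalent. */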

> +}
> +
> +/**
> + * Computes a machine address given a direct map offset
> + *
> + * @param offset Offset into the direct map
> + * @return Corresponding machine address of that virtual location
> + */
> +static inline uint64_t directmapoff_to_maddr(unsigned long offset)
> +{
> +    return ((offset << pfn_pdx_hole_shift) & ma_top_mask) |

'unsigned long' may be 32-bit. So I think you want to cast offset to 
uint64_t.
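
I.e. something along these lines (a sketch of the suggested fix, untested):

  static inline uint64_t directmapoff_to_maddr(unsigned long offset)
  {
      /* Promote before shifting so the top bits survive 32-bit builds. */
      return (((uint64_t)offset << pfn_pdx_hole_shift) & ma_top_mask) |
             (offset & ma_va_bottom_mask);
  }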

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/8] mm/pdx: Standardize region validation wrt pdx compression
  2023-07-17 16:03 ` [PATCH 6/8] mm/pdx: Standardize region validation wrt pdx compression Alejandro Vallejo
@ 2023-07-21 17:05   ` Julien Grall
  2023-07-24 12:18     ` Alejandro Vallejo
  0 siblings, 1 reply; 34+ messages in thread
From: Julien Grall @ 2023-07-21 17:05 UTC (permalink / raw)
  To: Alejandro Vallejo, Xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Stefano Stabellini

Hi Alejandro,

On 17/07/2023 17:03, Alejandro Vallejo wrote:
> Regions must be occasionally validated for pdx compression validity. That
> is, whether any of the machine addresses spanning the region have a bit set
> in the pdx "hole" (which is expected to always contain zeroes). There are
> a few such tests through the code, and they all check for different things.
> 
> This patch replaces all such occurences with a call to a centralized

Typo: s/occurences/occurrences/

> function that checks a region for validity.
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> ---
>   xen/arch/x86/x86_64/mm.c |  2 +-
>   xen/common/efi/boot.c    |  6 +++---
>   xen/common/pdx.c         | 13 +++++++++++--
>   xen/include/xen/pdx.h    |  9 +++++++++
>   4 files changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index 60db439af3..914e65c26c 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -1168,7 +1168,7 @@ static int mem_hotadd_check(unsigned long spfn, unsigned long epfn)
>       if ( (spfn | epfn) & ((1UL << PAGETABLE_ORDER) - 1) )
>           return 0;
>   
> -    if ( (spfn | epfn) & pfn_hole_mask )
> +    if ( !pdx_is_region_compressible(spfn, epfn) )
>           return 0;
>   
>       /* Make sure the new range is not present now */
> diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
> index 24169b7b50..b098a8c030 100644
> --- a/xen/common/efi/boot.c
> +++ b/xen/common/efi/boot.c
> @@ -14,6 +14,7 @@
>   #include <xen/multiboot.h>
>   #include <xen/param.h>
>   #include <xen/pci_regs.h>
> +#include <xen/pdx.h>
>   #include <xen/pfn.h>
>   #if EFI_PAGE_SIZE != PAGE_SIZE
>   # error Cannot use xen/pfn.h here!
> @@ -1647,7 +1648,7 @@ static bool __init cf_check ram_range_valid(unsigned long smfn, unsigned long em
>   {
>       unsigned long sz = pfn_to_pdx(emfn - 1) / PDX_GROUP_COUNT + 1;
>   
> -    return !(smfn & pfn_hole_mask) &&
> +    return pdx_is_region_compressible(smfn, emfn) &&
>              find_next_bit(pdx_group_valid, sz,
>                            pfn_to_pdx(smfn) / PDX_GROUP_COUNT) < sz;
>   }
> @@ -1759,8 +1760,7 @@ void __init efi_init_memory(void)
>               prot |= _PAGE_NX;
>   
>           if ( pfn_to_pdx(emfn - 1) < (DIRECTMAP_SIZE >> PAGE_SHIFT) &&
> -             !(smfn & pfn_hole_mask) &&
> -             !((smfn ^ (emfn - 1)) & ~pfn_pdx_bottom_mask) )
> +             pdx_is_region_compressible(smfn, emfn))
>           {
>               if ( (unsigned long)mfn_to_virt(emfn - 1) >= HYPERVISOR_VIRT_END )
>                   prot &= ~_PAGE_GLOBAL;
> diff --git a/xen/common/pdx.c b/xen/common/pdx.c
> index 99d4a90a50..72845e4bab 100644
> --- a/xen/common/pdx.c
> +++ b/xen/common/pdx.c
> @@ -88,7 +88,7 @@ bool __mfn_valid(unsigned long mfn)
>   }
>   
>   /* Sets all bits from the most-significant 1-bit down to the LSB */
> -static uint64_t __init fill_mask(uint64_t mask)
> +static uint64_t fill_mask(uint64_t mask)
>   {
>       while (mask & (mask + 1))
>           mask |= mask + 1;
> @@ -96,6 +96,15 @@ static uint64_t __init fill_mask(uint64_t mask)
>       return mask;
>   }
>   
> +bool pdx_is_region_compressible(unsigned long smfn, unsigned long emfn)

For new interfaces, I would rather we used start + size. It is easier
to reason about (you don't have to wonder whether 'emfn' is inclusive or
not) and it avoids trouble when handling a region right at the end of the
address space, where emfn would be 0 in the non-inclusive case (not much
of a concern for MFNs, as the last one should be invalid, but it makes
things harder to reason about).

> +{
> +    uint64_t base = smfn << PAGE_SHIFT;

On Arm32, physical addresses are up to 40 bits wide. So you want to cast
smfn to uint64_t before shifting. That said, it would be best to use
pfn_to_paddr() and possibly switch to paddr_t for the type.

Note that I understand that the rest of the PDX code is using uint64_t. 
So I would be ok if you don't want to switch to paddr_t.

> +    uint64_t len = (emfn - smfn) << PAGE_SHIFT;

Same here.
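
I.e. both adjustments together would look something like (sketch only,
untested):

  bool pdx_is_region_compressible(unsigned long smfn, unsigned long emfn)
  {
      /* pfn_to_paddr() widens before shifting, avoiding truncation on
       * arm32, where paddr_t is wider than unsigned long. */
      paddr_t base = pfn_to_paddr(smfn);
      paddr_t len = pfn_to_paddr(emfn - smfn);

      return !(smfn & pfn_hole_mask) &&
             !(pdx_region_mask(base, len) & ~ma_va_bottom_mask);
  }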

> +
> +    return !(smfn & pfn_hole_mask) &&
> +           !(pdx_region_mask(base, len) & ~ma_va_bottom_mask);
> +}
> +
>   /* We don't want to compress the low MAX_ORDER bits of the addresses. */
>   uint64_t __init pdx_init_mask(uint64_t base_addr)
>   {
> @@ -103,7 +112,7 @@ uint64_t __init pdx_init_mask(uint64_t base_addr)
>                            (uint64_t)1 << (MAX_ORDER + PAGE_SHIFT)) - 1);
>   }
>   
> -uint64_t __init pdx_region_mask(uint64_t base, uint64_t len)
> +uint64_t pdx_region_mask(uint64_t base, uint64_t len)
>   {
>       /*
>        * We say a bit "moves" in a range if there exist 2 addresses in that
> diff --git a/xen/include/xen/pdx.h b/xen/include/xen/pdx.h
> index f8ca0f5821..5378e664c2 100644
> --- a/xen/include/xen/pdx.h
> +++ b/xen/include/xen/pdx.h
> @@ -77,6 +77,15 @@ extern unsigned long pfn_top_mask, ma_top_mask;
>                            (sizeof(*frame_table) & -sizeof(*frame_table)))
>   extern unsigned long pdx_group_valid[];
>   
> +/**
> + * Validate a region's compatibility with the current compression runtime
> + *
> + * @param smfn Start mfn
> + * @param emfn End mfn (non-inclusive)
> + * @return True iff the region can be used with the current compression
> + */
> +bool pdx_is_region_compressible(unsigned long smfn, unsigned long emfn);
> +
>   /**
>    * Calculates a mask covering "moving" bits of all addresses of a region
>    *

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/8] arm/mm: Document the differences between arm32 and arm64 directmaps
  2023-07-21 15:09     ` Alejandro Vallejo
@ 2023-07-21 17:51       ` Julien Grall
  0 siblings, 0 replies; 34+ messages in thread
From: Julien Grall @ 2023-07-21 17:51 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Xen-devel, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Alejandro,

On 21/07/2023 16:09, Alejandro Vallejo wrote:
> On Thu, Jul 20, 2023 at 09:05:55PM +0100, Julien Grall wrote:
>> Hi Alejandro,
>>
>> On 17/07/2023 17:03, Alejandro Vallejo wrote:
>>> arm32 merely covers the XENHEAP, whereas arm64 currently covers anything in
>>> the frame table. These comments highlight why arm32 doesn't need to account for PDX
>>> compression in its __va() implementation while arm64 does.
>>>
>>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>>> ---
>>>    xen/arch/arm/include/asm/mm.h | 27 +++++++++++++++++++++++++++
>>>    1 file changed, 27 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
>>> index 4262165ce2..1a83f41879 100644
>>> --- a/xen/arch/arm/include/asm/mm.h
>>> +++ b/xen/arch/arm/include/asm/mm.h
>>> @@ -280,6 +280,19 @@ static inline paddr_t __virt_to_maddr(vaddr_t va)
>>>    #define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))
>>>    #ifdef CONFIG_ARM_32
>>> +/**
>>> + * Find the virtual address corresponding to a machine address
>>> + *
>>> + * Only memory backing the XENHEAP has a corresponding virtual address to
>>> + * be found. This is so we can save precious virtual space, as it's in
>>> + * short supply on arm32. This mapping is not subject to PDX compression
>>> + * because XENHEAP is known to be physically contiguous and can't hence
>>> + * jump over the PDX hole. This means we can avoid the roundtrips
>>> + * converting to/from pdx.
>>> + *
>>> + * @param ma Machine address
>>> + * @return Virtual address mapped to `ma`
>>> + */
>>>    static inline void *maddr_to_virt(paddr_t ma)
>>>    {
>>>        ASSERT(is_xen_heap_mfn(maddr_to_mfn(ma)));
>>> @@ -287,6 +300,20 @@ static inline void *maddr_to_virt(paddr_t ma)
>>>        return (void *)(unsigned long) ma + XENHEAP_VIRT_START;
>>>    }
>>>    #else
>>> +/**
>>> + * Find the virtual address corresponding to a machine address
>>> + *
>>> + * The directmap covers all conventional memory accessible by the
>>> + * hypervisor. This means it's subject to PDX compression.
>>> + *
>>> + * More specifically to arm64, the directmap mappings start at the first
>>> + * GiB boundary containing valid RAM. This means there's an extra offset
>>> + * applied (directmap_base_pdx) on top of the regular PDX compression
>>> + * logic.
>>
>> I find this paragraph a bit confusing to read because it leads to think that
>> pdx_to_maddr(directmap_base_pdx) will return a GiB aligned address.
>>
>> The base PDX corresponds to the start of the first region and the only
>> requirement is it should be page-aligned. However, when mapping in the
>> virtual address space we also offset the start to ensure that superpage can
>> be used (this is where the GiB alignment is used).
>>
>> That's why XENHEAP_VIRT_START points to directmap_virt_start rather than
>> DIRECTMAP_VIRT_START. I think it would make sense to have the logic
>> following what you suggest as it would remove a memory read. But I would
>> understand if you don't want to take that extra work. :)
>>
>> So for now, I would suggest to remove "GiB boundary containing".
>>
>> Cheers,
>>
>> -- 
>> Julien Grall
> Just to make sure it's the wording and not my understanding at fault
> (definitely having DIRECTMAP_VIRT_START != directmap_virt_start doesn't do
> any favours cognitive load).

I take your point. This was recently renamed from xenheap_virt_start to 
directmap_virt_start after the static heap work.

Looking through the code, I think we can probably remove
directmap_virt_start as, for arm64, it is only used indirectly (via
XENHEAP_VIRT_START) in virt_to_page() and maddr_to_virt().

This would remove one memory load in the two functions.

>  > /GiB boundary
> |
> |   /offset=address of 1st region of RAM % 1GiB
> |   |
> |---------|
> V         V
> --------------------------------------------------------------------------
> | padding |                           directmap                | padding |
> --------------------------------------------------------------------------
> ^         ^
> |         |
> |         \directmap_virt_start=pdx[directmap_base_pdx]
> |
> \DIRECTMAP_VIRT_START

The drawing is correct.

> 
> In actual words, I considered DIRECTMAP_VIRT_START the beginning of the
> directmap, not directmap_virt_start.

This is where we disagreed on the definition :). Definitely something 
that needs to be documented or removed (see below).

> 
> If this is it, you probably want to document somewhere what's what. In
> particular, you want a big scary message in DIRECTMAP_VIRT_START stating
> that it merely delimits the virtual range where the directmap can be, not
> where the directmap is, with a "See directmap_virt_start for the address
> where the directmap actually starts" message attached.

The uppercase-only difference would probably still be confusing. I am
thinking of removing directmap_virt_start completely, because the directmap
already has holes and we should consider the initial padding as just
another hole.

Let me have a look.

> 
> With that considered I'm happy to amend as you suggested on v2.
> 
> IMO, the ARM port should not keep that base pdx variable around, but
> integrate it in the pdx logic, so the first valid address always
> corresponds to pdx[0]. Then given a pdx it's immediate to find frame table
> entries and directmap frames. It would also greatly simplify the definition
> of a pdx.

That's a good idea. However, I don't think I will have the bandwidth to
look at hacking the PDX code in the near future. That said, I would be
happy to review patches if someone wants to tackle the problem.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/8] mm/pdx: Standardize region validation wrt pdx compression
  2023-07-21 17:05   ` Julien Grall
@ 2023-07-24 12:18     ` Alejandro Vallejo
  2023-07-24 18:20       ` Julien Grall
  0 siblings, 1 reply; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-24 12:18 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Stefano Stabellini

On Fri, Jul 21, 2023 at 06:05:51PM +0100, Julien Grall wrote:
> Hi Alejandro,
> 
> On 17/07/2023 17:03, Alejandro Vallejo wrote:
> > +bool pdx_is_region_compressible(unsigned long smfn, unsigned long emfn)
> 
> For newer interface, I would rather prefer if we use start + size. It is
> easier to reason (you don't have to wonder whether 'emfn' is inclusive or
> not) and avoid issue in the case you are trying to handle a region right at
> the end of the address space as emfn would be 0 in the non-inclusive case
> (not much a concern for MFNs as the last one should be invalid, but it makes
> harder to reason).
I could agree on this, but every single caller is based on (smfn, emfn),
so it needlessly forces every caller to perform conversions where the
callee could do it just once. That said, I think your point makes sense
and it ought to be done, probably as part of a bigger refactor where
(smfn, emfn)-based functions are turned into (base, len) variants.

> 
> > +{
> > +    uint64_t base = smfn << PAGE_SHIFT;
> 
> On Arm32, physical address are up to 40-bit. So you want to cast smfn to
> uint64_t before shifting. That said, it would be best to use pfn_to_paddr()
> and possibly switch to paddr_t for the type.
Ack. I keep forgetting 32 bits is not gone on non-x86 ports :)

> 
> Note that I understand that the rest of the PDX code is using uint64_t. So I
> would be ok if you don't want to switch to paddr_t.
Not for this patch, but I will keep "Turn all u64 maddr variables in
pdx.[ch] into paddr_t" in the back of my head for a rainy day.

> 
> > +    uint64_t len = (emfn - smfn) << PAGE_SHIFT;
> 
> Same here.
Ack.

Thanks,
Alejandro


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/8] mm/pdx: Standardize region validation wrt pdx compression
  2023-07-24 12:18     ` Alejandro Vallejo
@ 2023-07-24 18:20       ` Julien Grall
  2023-07-25  6:51         ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Julien Grall @ 2023-07-24 18:20 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Stefano Stabellini

Hi Alejandro,

On 24/07/2023 13:18, Alejandro Vallejo wrote:
> On Fri, Jul 21, 2023 at 06:05:51PM +0100, Julien Grall wrote:
>> Hi Alejandro,
>>
>> On 17/07/2023 17:03, Alejandro Vallejo wrote:
>>> +bool pdx_is_region_compressible(unsigned long smfn, unsigned long emfn)
>>
>> For newer interface, I would rather prefer if we use start + size. It is
>> easier to reason (you don't have to wonder whether 'emfn' is inclusive or
>> not) and avoid issue in the case you are trying to handle a region right at
>> the end of the address space as emfn would be 0 in the non-inclusive case
>> (not much a concern for MFNs as the last one should be invalid, but it makes
>> harder to reason).
> I could agree on this, but every single caller is based on (smfn, emfn),
> so it needlessly forces every caller to perform conversions where the
> callee can do it just once.

That's indeed one way to see it. The problem is that...

> That said, I think your point makes sense and
> it ought to be done. Probably as as part of a bigger refactor where
> (smfn, emfn)-based functions are turned into (base, len) variants.

... clean-up tends to be put on the back burner and we just continue to
add new uses. This makes the task of removing every use a lot more
difficult. So there is a point where one has to say: no more.

Therefore, I would strongly prefer that each caller do the computation.
The rest can be removed leisurely.

Let's see what the other maintainers think.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/8] mm/pdx: Standardize region validation wrt pdx compression
  2023-07-24 18:20       ` Julien Grall
@ 2023-07-25  6:51         ` Jan Beulich
  2023-07-25 14:27           ` Julien Grall
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2023-07-25  6:51 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Stefano Stabellini, Alejandro Vallejo

On 24.07.2023 20:20, Julien Grall wrote:
> On 24/07/2023 13:18, Alejandro Vallejo wrote:
>> On Fri, Jul 21, 2023 at 06:05:51PM +0100, Julien Grall wrote:
>>> Hi Alejandro,
>>>
>>> On 17/07/2023 17:03, Alejandro Vallejo wrote:
>>>> +bool pdx_is_region_compressible(unsigned long smfn, unsigned long emfn)
>>>
>>> For newer interface, I would rather prefer if we use start + size. It is
>>> easier to reason (you don't have to wonder whether 'emfn' is inclusive or
>>> not) and avoid issue in the case you are trying to handle a region right at
>>> the end of the address space as emfn would be 0 in the non-inclusive case
>>> (not much a concern for MFNs as the last one should be invalid, but it makes
>>> harder to reason).
>> I could agree on this, but every single caller is based on (smfn, emfn),
>> so it needlessly forces every caller to perform conversions where the
>> callee can do it just once.
> 
> That's indeed one way to see it. The problem is that...
> 
>> That said, I think your point makes sense and
>> it ought to be done. Probably as as part of a bigger refactor where
>> (smfn, emfn)-based functions are turned into (base, len) variants.
> 
> ... clean-up tends to be put in the back-burner and we just continue to 
> add new use. This makes the task to remove every use a lot more 
> difficult. So there is a point when one has to say no more.
> 
> Therefore, I would strongly prefer if each callers are doing the 
> computation. The rest can be removed leisurely.
> 
> Let see what the opinion of the other maintainers.

I think [a,b] ranges are fine to pass, and may even be preferable over
passing a size. I'm specifically using that term that you also used:
"size" (or "length") is ambiguous when talking about page granular
items - is it in bytes or number of pages? Especially in the former
case calculations at the call sites would be quite a bit more cumbersome.
I could agree with (mfn,nr) tuples, but as said I think inclusive
ranges are also fine to use (and would be less of a problem at the call
sites here, afaics).

Jan


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/8] mm/pdx: Standardize region validation wrt pdx compression
  2023-07-25  6:51         ` Jan Beulich
@ 2023-07-25 14:27           ` Julien Grall
  2023-07-25 14:34             ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Julien Grall @ 2023-07-25 14:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Xen-devel, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Stefano Stabellini, Alejandro Vallejo

Hi,

On 25/07/2023 07:51, Jan Beulich wrote:
> On 24.07.2023 20:20, Julien Grall wrote:
>> On 24/07/2023 13:18, Alejandro Vallejo wrote:
>>> On Fri, Jul 21, 2023 at 06:05:51PM +0100, Julien Grall wrote:
>>>> Hi Alejandro,
>>>>
>>>> On 17/07/2023 17:03, Alejandro Vallejo wrote:
>>>>> +bool pdx_is_region_compressible(unsigned long smfn, unsigned long emfn)
>>>>
>>>> For newer interface, I would rather prefer if we use start + size. It is
>>>> easier to reason (you don't have to wonder whether 'emfn' is inclusive or
>>>> not) and avoid issue in the case you are trying to handle a region right at
>>>> the end of the address space as emfn would be 0 in the non-inclusive case
>>>> (not much a concern for MFNs as the last one should be invalid, but it makes
>>>> harder to reason).
>>> I could agree on this, but every single caller is based on (smfn, emfn),
>>> so it needlessly forces every caller to perform conversions where the
>>> callee can do it just once.
>>
>> That's indeed one way to see it. The problem is that...
>>
>>> That said, I think your point makes sense and
>>> it ought to be done. Probably as as part of a bigger refactor where
>>> (smfn, emfn)-based functions are turned into (base, len) variants.
>>
>> ... clean-up tends to be put in the back-burner and we just continue to
>> add new use. This makes the task to remove every use a lot more
>> difficult. So there is a point when one has to say no more.
>>
>> Therefore, I would strongly prefer if each callers are doing the
>> computation. The rest can be removed leisurely.
>>
>> Let see what the opinion of the other maintainers.
> 
> I think [a,b] ranges are fine to pass, and may even be preferable over
> passing a size. I'm specifically using that term that you also used:
> "size" (or "length") is ambiguous when talking about page granular
> items - is it in bytes or number of pages?

I was referring to the number of pages. I don't think it makes sense to
have it in bytes given the start is a frame.

> Especially in the former
> case calculations at the call sites would be quite a bit more cumbersome.
> I could agree with (mfn,nr) tuples

Ok. So your objection to my proposal is just about the name, right? If
so, I didn't put too much thought into the naming when I sent my
original e-mail. I am open to anything.

> , but as said I think inclusive
> ranges are also fine to use (and would be less of a problem at the call
> sites here, afaics).

The problem with ranges is that they can lead to confusion over whether
the end is inclusive or exclusive. We had one bug recently in the Arm PCI
code because of that.

So I would like to get rid of any use of ranges in new functions.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/8] mm/pdx: Standardize region validation wrt pdx compression
  2023-07-25 14:27           ` Julien Grall
@ 2023-07-25 14:34             ` Jan Beulich
  2023-07-27 10:14               ` Alejandro Vallejo
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2023-07-25 14:34 UTC (permalink / raw)
  To: Julien Grall
  Cc: Xen-devel, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Stefano Stabellini, Alejandro Vallejo

On 25.07.2023 16:27, Julien Grall wrote:
> Hi,
> 
> On 25/07/2023 07:51, Jan Beulich wrote:
>> On 24.07.2023 20:20, Julien Grall wrote:
>>> On 24/07/2023 13:18, Alejandro Vallejo wrote:
>>>> On Fri, Jul 21, 2023 at 06:05:51PM +0100, Julien Grall wrote:
>>>>> Hi Alejandro,
>>>>>
>>>>> On 17/07/2023 17:03, Alejandro Vallejo wrote:
>>>>>> +bool pdx_is_region_compressible(unsigned long smfn, unsigned long emfn)
>>>>>
>>>>> For newer interface, I would rather prefer if we use start + size. It is
>>>>> easier to reason (you don't have to wonder whether 'emfn' is inclusive or
>>>>> not) and avoid issue in the case you are trying to handle a region right at
>>>>> the end of the address space as emfn would be 0 in the non-inclusive case
>>>>> (not much a concern for MFNs as the last one should be invalid, but it makes
>>>>> harder to reason).
>>>> I could agree on this, but every single caller is based on (smfn, emfn),
>>>> so it needlessly forces every caller to perform conversions where the
>>>> callee can do it just once.
>>>
>>> That's indeed one way to see it. The problem is that...
>>>
>>>> That said, I think your point makes sense and
>>>> it ought to be done. Probably as as part of a bigger refactor where
>>>> (smfn, emfn)-based functions are turned into (base, len) variants.
>>>
>>> ... clean-up tends to be put in the back-burner and we just continue to
>>> add new use. This makes the task to remove every use a lot more
>>> difficult. So there is a point when one has to say no more.
>>>
>>> Therefore, I would strongly prefer if each callers are doing the
>>> computation. The rest can be removed leisurely.
>>>
>>> Let see what the opinion of the other maintainers.
>>
>> I think [a,b] ranges are fine to pass, and may even be preferable over
>> passing a size. I'm specifically using that term that you also used:
>> "size" (or "length") is ambiguous when talking about page granular
>> items - is it in bytes or number of pages?
> 
> I was referring to the number of pages. I don't think it make sense to 
> have it in bytes given the start is a frame.
> 
>> Especially in the former
>> case calculations at the call sites would be quite a bit more cumbersome.
>> I could agree with (mfn,nr) tuples
> 
> Ok. So your objection of my proposal is just about the name, right? If 
> so, I didn't put too much thought behind the naming when I sent my 
> original e-mail. I am open to any.

Something like "nr" would be fine with me, for example.

>> , but as said I think inclusive
>> ranges are also fine to use (and would be less of a problem at the call
>> sites here, afaics).
> 
> The problem with range is that it can lead to confusion on whether the 
> end is inclusive or exclusive. We had one bug recently in the Arm PCI 
> code because of that.

It's a matter of properly writing it down, I would say.

> So I would like to get rid of any use of range in new functions.

Hmm, seems to need further input from other maintainers / committers
then.

Jan


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/8] mm/pdx: Standardize region validation wrt pdx compression
  2023-07-25 14:34             ` Jan Beulich
@ 2023-07-27 10:14               ` Alejandro Vallejo
  2023-07-27 10:19                 ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Alejandro Vallejo @ 2023-07-27 10:14 UTC (permalink / raw)
  Cc: Julien Grall, Xen-devel, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Stefano Stabellini

On Tue, Jul 25, 2023 at 04:34:42PM +0200, Jan Beulich wrote:
> On 25.07.2023 16:27, Julien Grall wrote:
> > Hi,
> > 
> > On 25/07/2023 07:51, Jan Beulich wrote:
> >> On 24.07.2023 20:20, Julien Grall wrote:
> >>> On 24/07/2023 13:18, Alejandro Vallejo wrote:
> >>>> On Fri, Jul 21, 2023 at 06:05:51PM +0100, Julien Grall wrote:
> >>>>> Hi Alejandro,
> >>>>>
> >>>>> On 17/07/2023 17:03, Alejandro Vallejo wrote:
> >>>>>> +bool pdx_is_region_compressible(unsigned long smfn, unsigned long emfn)
> >>>>>
> >>>>> For newer interface, I would rather prefer if we use start + size. It is
> >>>>> easier to reason (you don't have to wonder whether 'emfn' is inclusive or
> >>>>> not) and avoid issue in the case you are trying to handle a region right at
> >>>>> the end of the address space as emfn would be 0 in the non-inclusive case
> >>>>> (not much a concern for MFNs as the last one should be invalid, but it makes
> >>>>> harder to reason).
> >>>> I could agree on this, but every single caller is based on (smfn, emfn),
> >>>> so it needlessly forces every caller to perform conversions where the
> >>>> callee can do it just once.
> >>>
> >>> That's indeed one way to see it. The problem is that...
> >>>
> >>>> That said, I think your point makes sense and
> >>>> it ought to be done. Probably as as part of a bigger refactor where
> >>>> (smfn, emfn)-based functions are turned into (base, len) variants.
> >>>
> >>> ... clean-up tends to be put in the back-burner and we just continue to
> >>> add new use. This makes the task to remove every use a lot more
> >>> difficult. So there is a point when one has to say no more.
> >>>
> >>> Therefore, I would strongly prefer if each caller did the
> >>> computation. The rest can be removed leisurely.
> >>>
> >>> Let's see what the opinion of the other maintainers is.
> >>
> >> I think [a,b] ranges are fine to pass, and may even be preferable over
> >> passing a size. I'm specifically using the term you also used:
> >> "size" (or "length") is ambiguous when talking about page-granular
> >> items - is it in bytes or a number of pages?
> > 
> > I was referring to the number of pages. I don't think it makes sense to
> > have it in bytes given the start is a frame.
> > 
> >> Especially in the former case,
> >> calculations at the call sites would be quite a bit more cumbersome.
> >> I could agree with (mfn,nr) tuples
> > 
> > Ok. So your objection to my proposal is just about the name, right? If
> > so, I didn't put too much thought behind the naming when I sent my
> > original e-mail. I am open to any name.
> 
> Something like "nr" would be fine with me, for example.
> 
> >> , but as said I think inclusive
> >> ranges are also fine to use (and would be less of a problem at the call
> >> sites here, afaics).
> > 
> > The problem with ranges is that they can lead to confusion on whether the
> > end is inclusive or exclusive. We had one bug recently in the Arm PCI
> > code because of that.
> 
> It's a matter of properly writing it down, I would say.
> 
> > So I would like to get rid of any use of ranges in new functions.
> 
> Hmm, seems to need further input from other maintainers / committers
> then.
> 
> Jan

From the look of things, I think I'm leaning towards Julien's side. Ranges
are always ambiguous in the absence of a strong code convention, and such a
convention appears to be absent in the codebase. i.e: are they meant to be
inclusive or exclusive? Traditionally I've seen and used the latter more
often for ease of arithmetic, while from your last message you seem to
favour the former.

If the point is to disambiguate without resorting to conventions, then
there's only one realistic option I see. Would refactoring it to
"paddr_t base, size_t npages" satisfy you both? If this is something to be
taken into account for future code, we probably want a gitlab ticket to
refactor everything else to (base,[npages|noctets]) too. e.g:
XENPF_mem_hotadd uses the [spfn, epfn) convention, and so do a bunch of
other pieces in x86 and common code.
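
As a minimal sketch of the difference (hypothetical prototypes, purely to
illustrate the conversion burden at call sites):

  /* Range-based: nothing in the signature says whether emfn is inclusive. */
  bool pdx_is_region_compressible(unsigned long smfn, unsigned long emfn);

  /* Count-based: unambiguously covers npages frames starting at base. */
  bool pdx_is_region_compressible(paddr_t base, size_t npages);

  /* A caller currently holding an exclusive [spfn, epfn) pair would do: */
  if ( !pdx_is_region_compressible(pfn_to_paddr(spfn), epfn - spfn) )
      return false;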

Thanks,
Alejandro


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/8] mm/pdx: Standardize region validation wrt pdx compression
  2023-07-27 10:14               ` Alejandro Vallejo
@ 2023-07-27 10:19                 ` Jan Beulich
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2023-07-27 10:19 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Julien Grall, Xen-devel, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Stefano Stabellini

On 27.07.2023 12:14, Alejandro Vallejo wrote:
> On Tue, Jul 25, 2023 at 04:34:42PM +0200, Jan Beulich wrote:
>> On 25.07.2023 16:27, Julien Grall wrote:
>>> Hi,
>>>
>>> On 25/07/2023 07:51, Jan Beulich wrote:
>>>> On 24.07.2023 20:20, Julien Grall wrote:
>>>>> On 24/07/2023 13:18, Alejandro Vallejo wrote:
>>>>>> On Fri, Jul 21, 2023 at 06:05:51PM +0100, Julien Grall wrote:
>>>>>>> Hi Alejandro,
>>>>>>>
>>>>>>> On 17/07/2023 17:03, Alejandro Vallejo wrote:
>>>>>>>> +bool pdx_is_region_compressible(unsigned long smfn, unsigned long emfn)
>>>>>>>
> >>>>>>> For newer interfaces, I would prefer that we use start + size. It is
> >>>>>>> easier to reason about (you don't have to wonder whether 'emfn' is
> >>>>>>> inclusive or not) and it avoids issues when handling a region right at
> >>>>>>> the end of the address space, as emfn would be 0 in the non-inclusive
> >>>>>>> case (not much of a concern for MFNs, as the last one should be invalid,
> >>>>>>> but it makes it harder to reason).
>>>>>> I could agree on this, but every single caller is based on (smfn, emfn),
>>>>>> so it needlessly forces every caller to perform conversions where the
>>>>>> callee can do it just once.
>>>>>
>>>>> That's indeed one way to see it. The problem is that...
>>>>>
>>>>>> That said, I think your point makes sense and
> >>>>>> it ought to be done. Probably as part of a bigger refactor where
>>>>>> (smfn, emfn)-based functions are turned into (base, len) variants.
>>>>>
> >>>>> ... clean-up tends to be put on the back burner and we just continue to
> >>>>> add new uses. This makes the task of removing every use a lot more
> >>>>> difficult. So there is a point when one has to say no more.
>>>>>
> >>>>> Therefore, I would strongly prefer if each caller did the
> >>>>> computation. The rest can be removed leisurely.
> >>>>>
> >>>>> Let's see what the opinion of the other maintainers is.
>>>>
> >>>> I think [a,b] ranges are fine to pass, and may even be preferable over
> >>>> passing a size. I'm specifically using the term you also used:
> >>>> "size" (or "length") is ambiguous when talking about page-granular
> >>>> items - is it in bytes or a number of pages?
>>>
> >>> I was referring to the number of pages. I don't think it makes sense to
> >>> have it in bytes given the start is a frame.
>>>
> >>>> Especially in the former case,
> >>>> calculations at the call sites would be quite a bit more cumbersome.
>>>> I could agree with (mfn,nr) tuples
>>>
> >>> Ok. So your objection to my proposal is just about the name, right? If
> >>> so, I didn't put too much thought behind the naming when I sent my
> >>> original e-mail. I am open to any name.
>>
>> Something like "nr" would be fine with me, for example.
>>
> >>>> , but as said I think inclusive
>>>> ranges are also fine to use (and would be less of a problem at the call
>>>> sites here, afaics).
>>>
> >>> The problem with ranges is that they can lead to confusion on whether the
> >>> end is inclusive or exclusive. We had one bug recently in the Arm PCI
> >>> code because of that.
>>
>> It's a matter of properly writing it down, I would say.
>>
> >>> So I would like to get rid of any use of ranges in new functions.
>>
>> Hmm, seems to need further input from other maintainers / committers
>> then.
>>
>> Jan
> 
> From the look of things, I think I'm leaning towards Julien's side. Ranges
> are always ambiguous in the absence of a strong code convention, and such a
> convention appears to be absent in the codebase. i.e: are they meant to be
> inclusive or exclusive? Traditionally I've seen and used the latter more
> often for ease of arithmetic, while from your last message you seem to
> favour the former.
> 
> If the point is to disambiguate without resorting to conventions, then
> there's only one realistic option I see. Would refactoring it to
> "paddr_t base, size_t npages" satisfy you both?

If we are to refactor, then yes, except that it shouldn't be size_t. It can
be unsigned int (when we know more than 4 billion pages aren't possible)
or unsigned long.
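
I.e., as a sketch (exact parameter naming still open):

  bool pdx_is_region_compressible(paddr_t base, unsigned long npages);

  /* e.g. converting from an inclusive [smfn, emfn] pair at a call site: */
  if ( !pdx_is_region_compressible(pfn_to_paddr(smfn), emfn - smfn + 1) )
      return false;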

Jan

> If this is something to be
> taken into account for future code, we probably want a gitlab ticket to
> refactor everything else to (base,[npages|noctets]) too. e.g:
> XENPF_mem_hotadd uses the [spfn, epfn) convention, and so do a bunch of
> other pieces in x86 and common code.
> 
> Thanks,
> Alejandro
> 



^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2023-07-27 10:19 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-17 16:03 [PATCH 0/8] Make PDX compression optional Alejandro Vallejo
2023-07-17 16:03 ` [PATCH 1/8] mm/pdx: Add comments throughout the codebase for pdx Alejandro Vallejo
2023-07-18  9:14   ` Jan Beulich
2023-07-17 16:03 ` [PATCH 2/8] arm/mm: Document the differences between arm32 and arm64 directmaps Alejandro Vallejo
2023-07-20 20:05   ` Julien Grall
2023-07-21 15:09     ` Alejandro Vallejo
2023-07-21 17:51       ` Julien Grall
2023-07-17 16:03 ` [PATCH 3/8] pdx: Mark pdx hole description globals readonly after boot Alejandro Vallejo
2023-07-18  9:14   ` Jan Beulich
2023-07-17 16:03 ` [PATCH 4/8] build: Remove CONFIG_HAS_PDX Alejandro Vallejo
2023-07-18  9:19   ` Jan Beulich
2023-07-18  9:35     ` Andrew Cooper
2023-07-18  9:38       ` Jan Beulich
2023-07-18 13:35         ` Alejandro Vallejo
2023-07-17 16:03 ` [PATCH 5/8] mm: Factor out the pdx compression logic in ma/va converters Alejandro Vallejo
2023-07-21 16:11   ` Julien Grall
2023-07-17 16:03 ` [PATCH 6/8] mm/pdx: Standardize region validation wrt pdx compression Alejandro Vallejo
2023-07-21 17:05   ` Julien Grall
2023-07-24 12:18     ` Alejandro Vallejo
2023-07-24 18:20       ` Julien Grall
2023-07-25  6:51         ` Jan Beulich
2023-07-25 14:27           ` Julien Grall
2023-07-25 14:34             ` Jan Beulich
2023-07-27 10:14               ` Alejandro Vallejo
2023-07-27 10:19                 ` Jan Beulich
2023-07-17 16:03 ` [PATCH 7/8] pdx: Reorder pdx.[ch] Alejandro Vallejo
2023-07-17 16:24   ` Alejandro Vallejo
2023-07-17 16:03 ` [PATCH 8/8] pdx: Add CONFIG_HAS_PDX_COMPRESSION as a Kconfig option Alejandro Vallejo
2023-07-18  9:33 ` [PATCH 0/8] Make PDX compression optional Jan Beulich
2023-07-18 12:58   ` Alejandro Vallejo
2023-07-18 13:06     ` Jan Beulich
2023-07-18 13:40       ` Alejandro Vallejo
2023-07-20 22:00 ` Julien Grall
2023-07-20 22:13   ` Andrew Cooper
