xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] x86: memcpy() / memset() (non-)ERMS flavors plus fallout
@ 2021-04-27 12:51 Jan Beulich
  2021-04-27 12:53 ` [PATCH 1/7] x86: correct comment about alternatives ordering Jan Beulich
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Jan Beulich @ 2021-04-27 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

While the performance varies quite a bit on older (pre-ERMS) and
newer (ERMS) hardware, so far we've been going with just a single
flavor of these two functions, and oddly enough with ones not
consistent with one another. Using plain memcpy() / memset() on
MMIO (video frame buffer) is generally okay, but the ERMS variant
of memcpy() turned out to regress (boot) performance in a way
easily visible to the human eye. Hence as a prerequisite step
this series switches the frame buffer (and VGA) mapping to be
write-combining independent of firmware arrangements (of MTRRs
in particular).

1: x86: correct comment about alternatives ordering
2: x86: introduce ioremap_wc()
3: x86: re-work memset()
4: x86: re-work memcpy()
5: video/vesa: unmap frame buffer when relinquishing console
6: video/vesa: drop "vesa-mtrr" command line option
7: video/vesa: adjust (not just) command line option handling

Side note: While strictly speaking the xen/drivers/video/ changes
fall under REST maintainership, with that code getting built for
x86 only I'm restricting Cc-s to x86 maintainers.

Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/7] x86: correct comment about alternatives ordering
  2021-04-27 12:51 [PATCH 0/7] x86: memcpy() / memset() (non-)ERMS flavors plus fallout Jan Beulich
@ 2021-04-27 12:53 ` Jan Beulich
  2021-04-27 13:19   ` Andrew Cooper
  2021-04-27 12:54 ` [PATCH 2/7] x86: introduce ioremap_wc() Jan Beulich
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2021-04-27 12:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Unlike Linux, Xen has never (so far) used alternatives patching for
memcpy() or memset(), even less such utilizing multiple alternatives.
Correct the Linux-inherited comment to match reality.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -194,8 +194,7 @@ static void init_or_livepatch _apply_alt
     /*
      * The scan order should be from start to end. A later scanned
      * alternative code can overwrite a previous scanned alternative code.
-     * Some kernel functions (e.g. memcpy, memset, etc) use this order to
-     * patch code.
+     * Some code (e.g. ALTERNATIVE_2()) relies on this order of patching.
      *
      * So be careful if you want to change the scan order to any other
      * order.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/7] x86: introduce ioremap_wc()
  2021-04-27 12:51 [PATCH 0/7] x86: memcpy() / memset() (non-)ERMS flavors plus fallout Jan Beulich
  2021-04-27 12:53 ` [PATCH 1/7] x86: correct comment about alternatives ordering Jan Beulich
@ 2021-04-27 12:54 ` Jan Beulich
  2021-04-27 17:13   ` Andrew Cooper
  2021-04-27 12:54 ` [PATCH 3/7] x86: re-work memset() Jan Beulich
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2021-04-27 12:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

In order for a to-be-introduced ERMS form of memcpy() to not regress
boot performance on certain systems when video output is active, we
first need to arrange for avoiding further dependency on firmware
setting up MTRRs in a way we can actually further modify. On many
systems, due to the continuously growing amounts of installed memory,
MTRRs get configured with at least one huge WB range, and with MMIO
ranges below 4Gb then forced to UC via overlapping MTRRs. mtrr_add(), as
it is today, can't deal with such a setup. Hence on such systems we
presently leave the frame buffer mapped UC, leading to significantly
reduced performance when using REP STOSB / REP MOVSB.

On post-PentiumII hardware (i.e. any that's capable of running 64-bit
code), an effective memory type of WC can be achieved without MTRRs, by
simply referencing the respective PAT entry from the PTEs. While this
will leave the switch to ERMS forms of memset() and memcpy() with
largely unchanged performance, the change here on its own improves
performance on affected systems quite significantly: Measuring just the
individual affected memcpy() invocations yielded a speedup by a factor
of over 250 on my initial (Skylake) test system. memset() isn't getting
improved by as much there, but still by a factor of about 20.

While adding {__,}PAGE_HYPERVISOR_WC, also add {__,}PAGE_HYPERVISOR_WT
to, at the very least, make clear what PTE flags this memory type uses.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: Both callers are __init, so in principle ioremap_wc() could be so,
     too, at least for the time being.
TBD: If the VGA range is WC in the fixed range MTRRs, reusing the low
     1st Mb mapping (like ioremap() does) would be an option.

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5883,6 +5883,20 @@ void __iomem *ioremap(paddr_t pa, size_t
     return (void __force __iomem *)va;
 }
 
+void __iomem *ioremap_wc(paddr_t pa, size_t len)
+{
+    mfn_t mfn = _mfn(PFN_DOWN(pa));
+    unsigned int offs = pa & (PAGE_SIZE - 1);
+    unsigned int nr = PFN_UP(offs + len);
+    void *va;
+
+    WARN_ON(page_is_ram_type(mfn_x(mfn), RAM_TYPE_CONVENTIONAL));
+
+    va = __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR_WC, VMAP_DEFAULT);
+
+    return (void __force __iomem *)(va + offs);
+}
+
 int create_perdomain_mapping(struct domain *d, unsigned long va,
                              unsigned int nr, l1_pgentry_t **pl1tab,
                              struct page_info **ppg)
--- a/xen/drivers/video/vesa.c
+++ b/xen/drivers/video/vesa.c
@@ -9,9 +9,9 @@
 #include <xen/param.h>
 #include <xen/xmalloc.h>
 #include <xen/kernel.h>
+#include <xen/mm.h>
 #include <xen/vga.h>
 #include <asm/io.h>
-#include <asm/page.h>
 #include "font.h"
 #include "lfb.h"
 
@@ -103,7 +103,7 @@ void __init vesa_init(void)
     lfbp.text_columns = vlfb_info.width / font->width;
     lfbp.text_rows = vlfb_info.height / font->height;
 
-    lfbp.lfb = lfb = ioremap(lfb_base(), vram_remap);
+    lfbp.lfb = lfb = ioremap_wc(lfb_base(), vram_remap);
     if ( !lfb )
         return;
 
@@ -179,8 +179,7 @@ void __init vesa_mtrr_init(void)
 
 static void lfb_flush(void)
 {
-    if ( vesa_mtrr == 3 )
-        __asm__ __volatile__ ("sfence" : : : "memory");
+    __asm__ __volatile__ ("sfence" : : : "memory");
 }
 
 void __init vesa_endboot(bool_t keep)
--- a/xen/drivers/video/vga.c
+++ b/xen/drivers/video/vga.c
@@ -79,7 +79,7 @@ void __init video_init(void)
     {
     case XEN_VGATYPE_TEXT_MODE_3:
         if ( page_is_ram_type(paddr_to_pfn(0xB8000), RAM_TYPE_CONVENTIONAL) ||
-             ((video = ioremap(0xB8000, 0x8000)) == NULL) )
+             ((video = ioremap_wc(0xB8000, 0x8000)) == NULL) )
             return;
         outw(0x200a, 0x3d4); /* disable cursor */
         columns = vga_console_info.u.text_mode_3.columns;
@@ -164,7 +164,11 @@ void __init video_endboot(void)
     {
     case XEN_VGATYPE_TEXT_MODE_3:
         if ( !vgacon_keep )
+        {
             memset(video, 0, columns * lines * 2);
+            iounmap(video);
+            video = ZERO_BLOCK_PTR;
+        }
         break;
     case XEN_VGATYPE_VESA_LFB:
     case XEN_VGATYPE_EFI_LFB:
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -615,6 +615,8 @@ void destroy_perdomain_mapping(struct do
                                unsigned int nr);
 void free_perdomain_mappings(struct domain *);
 
+void __iomem *ioremap_wc(paddr_t, size_t);
+
 extern int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm);
 
 void domain_set_alloc_bitsize(struct domain *d);
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -349,8 +349,10 @@ void efi_update_l4_pgtable(unsigned int
 #define __PAGE_HYPERVISOR_RX      (_PAGE_PRESENT | _PAGE_ACCESSED)
 #define __PAGE_HYPERVISOR         (__PAGE_HYPERVISOR_RX | \
                                    _PAGE_DIRTY | _PAGE_RW)
+#define __PAGE_HYPERVISOR_WT      (__PAGE_HYPERVISOR | _PAGE_PWT)
 #define __PAGE_HYPERVISOR_UCMINUS (__PAGE_HYPERVISOR | _PAGE_PCD)
 #define __PAGE_HYPERVISOR_UC      (__PAGE_HYPERVISOR | _PAGE_PCD | _PAGE_PWT)
+#define __PAGE_HYPERVISOR_WC      (__PAGE_HYPERVISOR | _PAGE_PAT)
 #define __PAGE_HYPERVISOR_SHSTK   (__PAGE_HYPERVISOR_RO | _PAGE_DIRTY)
 
 #define MAP_SMALL_PAGES _PAGE_AVAIL0 /* don't use superpages mappings */
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -154,6 +154,10 @@ static inline intpte_t put_pte_flags(uns
                                  _PAGE_GLOBAL | _PAGE_NX)
 #define PAGE_HYPERVISOR_UC      (__PAGE_HYPERVISOR_UC | \
                                  _PAGE_GLOBAL | _PAGE_NX)
+#define PAGE_HYPERVISOR_WC      (__PAGE_HYPERVISOR_WC | \
+                                 _PAGE_GLOBAL | _PAGE_NX)
+#define PAGE_HYPERVISOR_WT      (__PAGE_HYPERVISOR_WT | \
+                                 _PAGE_GLOBAL | _PAGE_NX)
 
 #endif /* __X86_64_PAGE_H__ */
 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/7] x86: re-work memset()
  2021-04-27 12:51 [PATCH 0/7] x86: memcpy() / memset() (non-)ERMS flavors plus fallout Jan Beulich
  2021-04-27 12:53 ` [PATCH 1/7] x86: correct comment about alternatives ordering Jan Beulich
  2021-04-27 12:54 ` [PATCH 2/7] x86: introduce ioremap_wc() Jan Beulich
@ 2021-04-27 12:54 ` Jan Beulich
  2021-04-27 12:54 ` [PATCH 4/7] x86: re-work memcpy() Jan Beulich
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2021-04-27 12:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Move the function to its own assembly file. Having it in C just for the
entire body to be an asm() isn't really helpful. Then have two flavors:
A "basic" version using qword steps for the bulk of the operation, and an
ERMS version for modern hardware, to be substituted in via alternatives
patching.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
We may want to consider branching over the REP STOSQ as well, if the
number of qwords turns out to be zero.
We may also want to consider using non-REP STOS{L,W,B} for the tail.

--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_INDIRECT_THUNK) += indirect
 obj-y += ioport_emulate.o
 obj-y += irq.o
 obj-$(CONFIG_KEXEC) += machine_kexec.o
+obj-y += memset.o
 obj-y += mm.o x86_64/mm.o
 obj-$(CONFIG_HVM) += monitor.o
 obj-y += mpparse.o
--- /dev/null
+++ b/xen/arch/x86/memset.S
@@ -0,0 +1,31 @@
+#include <asm/asm_defns.h>
+
+.macro memset
+        and     $7, %edx
+        shr     $3, %rcx
+        movzbl  %sil, %esi
+        mov     $0x0101010101010101, %rax
+        imul    %rsi, %rax
+        mov     %rdi, %rsi
+        rep stosq
+        or      %edx, %ecx
+        jz      0f
+        rep stosb
+0:
+        mov     %rsi, %rax
+        ret
+.endm
+
+.macro memset_erms
+        mov     %esi, %eax
+        mov     %rdi, %rsi
+        rep stosb
+        mov     %rsi, %rax
+        ret
+.endm
+
+ENTRY(memset)
+        mov     %rdx, %rcx
+        ALTERNATIVE memset, memset_erms, X86_FEATURE_ERMS
+        .type memset, @function
+        .size memset, . - memset
--- a/xen/arch/x86/string.c
+++ b/xen/arch/x86/string.c
@@ -22,19 +22,6 @@ void *(memcpy)(void *dest, const void *s
     return dest;
 }
 
-void *(memset)(void *s, int c, size_t n)
-{
-    long d0, d1;
-
-    asm volatile (
-        "rep stosb"
-        : "=&c" (d0), "=&D" (d1)
-        : "a" (c), "1" (s), "0" (n)
-        : "memory");
-
-    return s;
-}
-
 void *(memmove)(void *dest, const void *src, size_t n)
 {
     long d0, d1, d2;



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 4/7] x86: re-work memcpy()
  2021-04-27 12:51 [PATCH 0/7] x86: memcpy() / memset() (non-)ERMS flavors plus fallout Jan Beulich
                   ` (2 preceding siblings ...)
  2021-04-27 12:54 ` [PATCH 3/7] x86: re-work memset() Jan Beulich
@ 2021-04-27 12:54 ` Jan Beulich
  2021-04-27 12:55 ` [PATCH 5/7] video/vesa: unmap frame buffer when relinquishing console Jan Beulich
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2021-04-27 12:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Move the function to its own assembly file. Having it in C just for the
entire body to be an asm() isn't really helpful. Then have two flavors:
A "basic" version using qword steps for the bulk of the operation, and an
ERMS version for modern hardware, to be substituted in via alternatives
patching.

Alternatives patching, however, requires an extra precaution: It uses
memcpy() itself, and hence the function may patch itself. Luckily the
patched-in code only replaces the prolog of the original function. Make
sure this remains this way.

Additionally alternatives patching, while supposedly safe via enforcing
a control flow change when modifying already prefetched code, may not
really be. Afaict a request is pending to drop the first of the two
options in the SDM's "Handling Self- and Cross-Modifying Code" section.
Insert a serializing instruction there. To avoid having to introduce a
local variable, also switch text_poke() to return void: Neither of its
callers cares about the returned value.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
We may want to consider branching over the REP MOVSQ as well, if the
number of qwords turns out to be zero.
We may also want to consider using non-REP MOVS{L,W,B} for the tail.

--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_INDIRECT_THUNK) += indirect
 obj-y += ioport_emulate.o
 obj-y += irq.o
 obj-$(CONFIG_KEXEC) += machine_kexec.o
+obj-y += memcpy.o
 obj-y += memset.o
 obj-y += mm.o x86_64/mm.o
 obj-$(CONFIG_HVM) += monitor.o
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -164,12 +164,14 @@ void init_or_livepatch add_nops(void *in
  * executing.
  *
  * "noinline" to cause control flow change and thus invalidate I$ and
- * cause refetch after modification.
+ * cause refetch after modification.  While the SDM continues to suggest this
+ * is sufficient, it may not be - issue a serializing insn afterwards as well.
  */
-static void *init_or_livepatch noinline
+static void init_or_livepatch noinline
 text_poke(void *addr, const void *opcode, size_t len)
 {
-    return memcpy(addr, opcode, len);
+    memcpy(addr, opcode, len);
+    cpuid_eax(0);
 }
 
 /*
--- /dev/null
+++ b/xen/arch/x86/memcpy.S
@@ -0,0 +1,21 @@
+#include <asm/asm_defns.h>
+
+ENTRY(memcpy)
+        mov     %rdx, %rcx
+        mov     %rdi, %rax
+        /*
+         * We need to be careful here: memcpy() is involved in alternatives
+         * patching, so the code doing the actual copying (i.e. past setting
+         * up registers) may not be subject to patching (unless further
+         * precautions were taken).
+         */
+        ALTERNATIVE "and $7, %edx; shr $3, %rcx", \
+                    "rep movsb; ret", X86_FEATURE_ERMS
+        rep movsq
+        or      %edx, %ecx
+        jz      1f
+        rep movsb
+1:
+        ret
+        .type memcpy, @function
+        .size memcpy, . - memcpy
--- a/xen/arch/x86/string.c
+++ b/xen/arch/x86/string.c
@@ -7,21 +7,6 @@
 
 #include <xen/lib.h>
 
-void *(memcpy)(void *dest, const void *src, size_t n)
-{
-    long d0, d1, d2;
-
-    asm volatile (
-        "   rep ; movs"__OS" ; "
-        "   mov %k4,%k3      ; "
-        "   rep ; movsb        "
-        : "=&c" (d0), "=&D" (d1), "=&S" (d2)
-        : "0" (n/BYTES_PER_LONG), "r" (n%BYTES_PER_LONG), "1" (dest), "2" (src)
-        : "memory" );
-
-    return dest;
-}
-
 void *(memmove)(void *dest, const void *src, size_t n)
 {
     long d0, d1, d2;



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 5/7] video/vesa: unmap frame buffer when relinquishing console
  2021-04-27 12:51 [PATCH 0/7] x86: memcpy() / memset() (non-)ERMS flavors plus fallout Jan Beulich
                   ` (3 preceding siblings ...)
  2021-04-27 12:54 ` [PATCH 4/7] x86: re-work memcpy() Jan Beulich
@ 2021-04-27 12:55 ` Jan Beulich
  2021-04-27 12:56 ` [PATCH 6/7] video/vesa: drop "vesa-mtrr" command line option Jan Beulich
  2021-04-27 12:56 ` [PATCH 7/7] video/vesa: adjust (not just) command line option handling Jan Beulich
  6 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2021-04-27 12:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

There's no point in keeping the VA space occupied when no further output
will occur.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/drivers/video/lfb.c
+++ b/xen/drivers/video/lfb.c
@@ -168,4 +168,5 @@ void lfb_free(void)
     xfree(lfb.lbuf);
     xfree(lfb.text_buf);
     xfree(lfb.line_len);
+    lfb.lfbp.lfb = ZERO_BLOCK_PTR;
 }
--- a/xen/drivers/video/vesa.c
+++ b/xen/drivers/video/vesa.c
@@ -197,5 +197,7 @@ void __init vesa_endboot(bool_t keep)
                    vlfb_info.width * bpp);
         lfb_flush();
         lfb_free();
+        iounmap(lfb);
+        lfb = ZERO_BLOCK_PTR;
     }
 }



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 6/7] video/vesa: drop "vesa-mtrr" command line option
  2021-04-27 12:51 [PATCH 0/7] x86: memcpy() / memset() (non-)ERMS flavors plus fallout Jan Beulich
                   ` (4 preceding siblings ...)
  2021-04-27 12:55 ` [PATCH 5/7] video/vesa: unmap frame buffer when relinquishing console Jan Beulich
@ 2021-04-27 12:56 ` Jan Beulich
  2021-04-27 13:20   ` Andrew Cooper
  2021-04-27 12:56 ` [PATCH 7/7] video/vesa: adjust (not just) command line option handling Jan Beulich
  6 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2021-04-27 12:56 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Now that we use ioremap_wc() for mapping the frame buffer, there's no
need for this option anymore. As noted in the change introducing the
use of ioremap_wc(), mtrr_add() didn't work in certain cases anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,9 @@ The format is based on [Keep a Changelog
 
 ## [unstable UNRELEASED](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=staging) - TBD
 
+### Removed / support downgraded
+ - dropped support for the (x86-only) "vesa-mtrr" command line option
+
 ## [4.15.0 UNRELEASED](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.15.0) - TBD
 
 ### Added / support upgraded
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2369,9 +2369,6 @@ cache-warming. 1ms (1000) has been measu
 ### vesa-map
 > `= <integer>`
 
-### vesa-mtrr
-> `= <integer>`
-
 ### vesa-ram
 > `= <integer>`
 
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1813,8 +1813,6 @@ void __init noreturn __start_xen(unsigne
 
     local_irq_enable();
 
-    vesa_mtrr_init();
-
     early_msi_init();
 
     iommu_setup();    /* setup iommu if available */
--- a/xen/drivers/video/vesa.c
+++ b/xen/drivers/video/vesa.c
@@ -145,38 +145,6 @@ void __init vesa_init(void)
     video_puts = lfb_redraw_puts;
 }
 
-#include <asm/mtrr.h>
-
-static unsigned int vesa_mtrr;
-integer_param("vesa-mtrr", vesa_mtrr);
-
-void __init vesa_mtrr_init(void)
-{
-    static const int mtrr_types[] = {
-        0, MTRR_TYPE_UNCACHABLE, MTRR_TYPE_WRBACK,
-        MTRR_TYPE_WRCOMB, MTRR_TYPE_WRTHROUGH };
-    unsigned int size_total;
-    int rc, type;
-
-    if ( !lfb || (vesa_mtrr == 0) || (vesa_mtrr >= ARRAY_SIZE(mtrr_types)) )
-        return;
-
-    type = mtrr_types[vesa_mtrr];
-    if ( !type )
-        return;
-
-    /* Find the largest power-of-two */
-    size_total = vram_total;
-    while ( size_total & (size_total - 1) )
-        size_total &= size_total - 1;
-
-    /* Try and find a power of two to add */
-    do {
-        rc = mtrr_add(lfb_base(), size_total, type, 1);
-        size_total >>= 1;
-    } while ( (size_total >= PAGE_SIZE) && (rc == -EINVAL) );
-}
-
 static void lfb_flush(void)
 {
     __asm__ __volatile__ ("sfence" : : : "memory");
--- a/xen/include/asm-x86/setup.h
+++ b/xen/include/asm-x86/setup.h
@@ -25,10 +25,8 @@ void init_IRQ(void);
 
 #ifdef CONFIG_VIDEO
 void vesa_init(void);
-void vesa_mtrr_init(void);
 #else
 static inline void vesa_init(void) {};
-static inline void vesa_mtrr_init(void) {};
 #endif
 
 int construct_dom0(



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 7/7] video/vesa: adjust (not just) command line option handling
  2021-04-27 12:51 [PATCH 0/7] x86: memcpy() / memset() (non-)ERMS flavors plus fallout Jan Beulich
                   ` (5 preceding siblings ...)
  2021-04-27 12:56 ` [PATCH 6/7] video/vesa: drop "vesa-mtrr" command line option Jan Beulich
@ 2021-04-27 12:56 ` Jan Beulich
  2021-04-27 13:49   ` Andrew Cooper
  6 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2021-04-27 12:56 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Document both options. Add section annotations to both variables holding
the parsed values as well as a few adjacent ones. Adjust the types of
font_height and vga_compat.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2369,9 +2369,21 @@ cache-warming. 1ms (1000) has been measu
 ### vesa-map
 > `= <integer>`
 
+> Default: `0`
+
+Specify, in MiB, the portion of video RAM to actually remap.  This will be
+honored only when large enough to cover the space needed for the chosen video
+mode, and only when less than a non-zero value possibly specified through
+'vesa-ram'.
+
 ### vesa-ram
 > `= <integer>`
 
+> Default: `0`
+
+This allows to override the amount of video RAM, in MiB, determined to be
+present.
+
 ### vga
 > `= ( ask | current | text-80x<rows> | gfx-<width>x<height>x<depth> | mode-<mode> )[,keep]`
 
--- a/xen/drivers/video/vesa.c
+++ b/xen/drivers/video/vesa.c
@@ -19,17 +19,17 @@
 
 static void lfb_flush(void);
 
-static unsigned char *lfb;
-static const struct font_desc *font;
-static bool_t vga_compat;
+static unsigned char *__read_mostly lfb;
+static const struct font_desc *__initdata font;
+static bool __initdata vga_compat;
 
-static unsigned int vram_total;
+static unsigned int __initdata vram_total;
 integer_param("vesa-ram", vram_total);
 
-static unsigned int vram_remap;
+static unsigned int __initdata vram_remap;
 integer_param("vesa-map", vram_remap);
 
-static int font_height;
+static unsigned int __initdata font_height;
 static int __init parse_font_height(const char *s)
 {
     if ( simple_strtoul(s, &s, 10) == 8 && (*s++ == 'x') )



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/7] x86: correct comment about alternatives ordering
  2021-04-27 12:53 ` [PATCH 1/7] x86: correct comment about alternatives ordering Jan Beulich
@ 2021-04-27 13:19   ` Andrew Cooper
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Cooper @ 2021-04-27 13:19 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 27/04/2021 13:53, Jan Beulich wrote:
> Unlike Linux, Xen has never (so far) used alternatives patching for
> memcpy() or memset(), even less such utilizing multiple alternatives.
> Correct the Linux-inherited comment to match reality.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 6/7] video/vesa: drop "vesa-mtrr" command line option
  2021-04-27 12:56 ` [PATCH 6/7] video/vesa: drop "vesa-mtrr" command line option Jan Beulich
@ 2021-04-27 13:20   ` Andrew Cooper
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Cooper @ 2021-04-27 13:20 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 27/04/2021 13:56, Jan Beulich wrote:
> Now that we use ioremap_wc() for mapping the frame buffer, there's no
> need for this option anymore. As noted in the change introducing the
> use of ioremap_wc(), mtrr_add() didn't work in certain cases anyway.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 7/7] video/vesa: adjust (not just) command line option handling
  2021-04-27 12:56 ` [PATCH 7/7] video/vesa: adjust (not just) command line option handling Jan Beulich
@ 2021-04-27 13:49   ` Andrew Cooper
  2021-04-27 14:04     ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2021-04-27 13:49 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 27/04/2021 13:56, Jan Beulich wrote:

The grammar in the subject is very awkward.  The (not just) like that is
weird.

If it were me, I'd phrase this as "minor adjustments to command line
handling".

> Document both options. Add section annotations to both variables holding
> the parsed values as well as a few adjacent ones. Adjust the types of
> font_height and vga_compat.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

In principle, Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>, with
one note below.

However, is there really any value in these options?  I can't see a case
where their use will result in a less broken system.

>
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -2369,9 +2369,21 @@ cache-warming. 1ms (1000) has been measu
>  ### vesa-map
>  > `= <integer>`
>  
> +> Default: `0`
> +
> +Specify, in MiB, the portion of video RAM to actually remap.  This will be
> +honored only when large enough to cover the space needed for the chosen video
> +mode, and only when less than a non-zero value possibly specified through
> +'vesa-ram'.

"and only when less than a non-zero value possibly specified" is
confusing to follow.

What I think you mean is that vesa-map will be honoured when it is >=
chosen video mode, and <= vesa-ram?

~Andrew

> +
>  ### vesa-ram
>  > `= <integer>`
>  
> +> Default: `0`
> +
> +This allows to override the amount of video RAM, in MiB, determined to be
> +present.
> +
>  ### vga
>  > `= ( ask | current | text-80x<rows> | gfx-<width>x<height>x<depth> | mode-<mode> )[,keep]`
>  
> --- a/xen/drivers/video/vesa.c
> +++ b/xen/drivers/video/vesa.c
> @@ -19,17 +19,17 @@
>  
>  static void lfb_flush(void);
>  
> -static unsigned char *lfb;
> -static const struct font_desc *font;
> -static bool_t vga_compat;
> +static unsigned char *__read_mostly lfb;
> +static const struct font_desc *__initdata font;
> +static bool __initdata vga_compat;
>  
> -static unsigned int vram_total;
> +static unsigned int __initdata vram_total;
>  integer_param("vesa-ram", vram_total);
>  
> -static unsigned int vram_remap;
> +static unsigned int __initdata vram_remap;
>  integer_param("vesa-map", vram_remap);
>  
> -static int font_height;
> +static unsigned int __initdata font_height;
>  static int __init parse_font_height(const char *s)
>  {
>      if ( simple_strtoul(s, &s, 10) == 8 && (*s++ == 'x') )
>




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 7/7] video/vesa: adjust (not just) command line option handling
  2021-04-27 13:49   ` Andrew Cooper
@ 2021-04-27 14:04     ` Jan Beulich
  2021-05-27 11:47       ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2021-04-27 14:04 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 27.04.2021 15:49, Andrew Cooper wrote:
> On 27/04/2021 13:56, Jan Beulich wrote:
> 
> The grammar in the subject is very awkward.  The (not just) like that is
> weird.
> 
> If it were me, I'd phrase this as "minor adjustments to command line
> handling".

Well, the (not just) is intentionally there because there are changes
to stuff unrelated to command line option handling as well.

>> Document both options. Add section annotations to both variables holding
>> the parsed values as well as a few adjacent ones. Adjust the types of
>> font_height and vga_compat.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> In principle, Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>, with
> one note below.

Thanks.

> However, is there really any value in these options?  I can't see a case
> where their use will result in a less broken system.

Well, if we mis-detect VRAM size, the respective option might indeed
help. I'm less certain of the utility of the mapping option, the more
that now there's no possible (and implicit) effect on MTRRs anymore.

>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -2369,9 +2369,21 @@ cache-warming. 1ms (1000) has been measu
>>  ### vesa-map
>>  > `= <integer>`
>>  
>> +> Default: `0`
>> +
>> +Specify, in MiB, the portion of video RAM to actually remap.  This will be
>> +honored only when large enough to cover the space needed for the chosen video
>> +mode, and only when less than a non-zero value possibly specified through
>> +'vesa-ram'.
> 
> "and only when less than a non-zero value possibly specified" is
> confusing to follow.
> 
> What I think you mean is that vesa-map will be honoured when it is >=
> chosen video mode, and <= vesa-ram?

Yes. Any suggestion how to improve the wording without using >= and
<= ?

Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/7] x86: introduce ioremap_wc()
  2021-04-27 12:54 ` [PATCH 2/7] x86: introduce ioremap_wc() Jan Beulich
@ 2021-04-27 17:13   ` Andrew Cooper
  2021-04-28  9:41     ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2021-04-27 17:13 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 27/04/2021 13:54, Jan Beulich wrote:
> In order for a to-be-introduced ERMS form of memcpy() to not regress
> boot performance on certain systems when video output is active, we
> first need to arrange for avoiding further dependency on firmware
> setting up MTRRs in a way we can actually further modify. On many
> systems, due to the continuously growing amounts of installed memory,
> MTRRs get configured with at least one huge WB range, and with MMIO
> ranges below 4Gb then forced to UC via overlapping MTRRs. mtrr_add(), as
> it is today, can't deal with such a setup. Hence on such systems we
> presently leave the frame buffer mapped UC, leading to significantly
> reduced performance when using REP STOSB / REP MOVSB.
>
> On post-PentiumII hardware (i.e. any that's capable of running 64-bit
> code), an effective memory type of WC can be achieved without MTRRs, by
> simply referencing the respective PAT entry from the PTEs. While this
> will leave the switch to ERMS forms of memset() and memcpy() with
> largely unchanged performance, the change here on its own improves
> performance on affected systems quite significantly: Measuring just the
> individual affected memcpy() invocations yielded a speedup by a factor
> of over 250 on my initial (Skylake) test system. memset() isn't getting
> improved by as much there, but still by a factor of about 20.
>
> While adding {__,}PAGE_HYPERVISOR_WC, also add {__,}PAGE_HYPERVISOR_WT
> to, at the very least, make clear what PTE flags this memory type uses.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---

Seeing as MTRRs are full of firmware issues, shouldn't we also
cross-check that the vram is marked WC, or we'll still fall into bad
perf from combining down to UC.  (Obviously follow-up work if so.)

> TBD: Both callers are __init, so in principle ioremap_wc() could be so,
>      too, at least for the time being.

I don't see us making use this at runtime.  Uses of WC are few and far
between.

> TBD: If the VGA range is WC in the fixed range MTRRs, reusing the low
>      1st Mb mapping (like ioremap() does) would be an option.

It might be fine to do that unconditionally.  The low VRAM has had known
settings for 2 decades now.

That said, the low 1MB does use UC- mappings, which means we're entirely
dependent on MTRRs specifying WC for sensible performance.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5883,6 +5883,20 @@ void __iomem *ioremap(paddr_t pa, size_t
>      return (void __force __iomem *)va;
>  }
>  
> +void __iomem *ioremap_wc(paddr_t pa, size_t len)
> +{
> +    mfn_t mfn = _mfn(PFN_DOWN(pa));
> +    unsigned int offs = pa & (PAGE_SIZE - 1);
> +    unsigned int nr = PFN_UP(offs + len);
> +    void *va;
> +
> +    WARN_ON(page_is_ram_type(mfn_x(mfn), RAM_TYPE_CONVENTIONAL));
> +
> +    va = __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR_WC, VMAP_DEFAULT);

This doesn't look correct.  granularity and nr are passed the wrong way
around, but maybe that's related to the fact that only a single mfn is
passed.  I'm confused.

Also, several truncations will occur for a framebuffer > 4G, both with
calculations here, and the types of __vmap()'s parameters.

> +
> +    return (void __force __iomem *)(va + offs);
> +}
> +
>  int create_perdomain_mapping(struct domain *d, unsigned long va,
>                               unsigned int nr, l1_pgentry_t **pl1tab,
>                               struct page_info **ppg)
> --- a/xen/drivers/video/vesa.c
> +++ b/xen/drivers/video/vesa.c
> @@ -9,9 +9,9 @@
>  #include <xen/param.h>
>  #include <xen/xmalloc.h>
>  #include <xen/kernel.h>
> +#include <xen/mm.h>
>  #include <xen/vga.h>
>  #include <asm/io.h>
> -#include <asm/page.h>
>  #include "font.h"
>  #include "lfb.h"
>  
> @@ -103,7 +103,7 @@ void __init vesa_init(void)
>      lfbp.text_columns = vlfb_info.width / font->width;
>      lfbp.text_rows = vlfb_info.height / font->height;
>  
> -    lfbp.lfb = lfb = ioremap(lfb_base(), vram_remap);
> +    lfbp.lfb = lfb = ioremap_wc(lfb_base(), vram_remap);
>      if ( !lfb )
>          return;
>  
> @@ -179,8 +179,7 @@ void __init vesa_mtrr_init(void)
>  
>  static void lfb_flush(void)
>  {
> -    if ( vesa_mtrr == 3 )
> -        __asm__ __volatile__ ("sfence" : : : "memory");
> +    __asm__ __volatile__ ("sfence" : : : "memory");

wmb(), seeing as that is the operation we mean here?

>  }
>  
>  void __init vesa_endboot(bool_t keep)
> --- a/xen/drivers/video/vga.c
> +++ b/xen/drivers/video/vga.c
> @@ -79,7 +79,7 @@ void __init video_init(void)
>      {
>      case XEN_VGATYPE_TEXT_MODE_3:
>          if ( page_is_ram_type(paddr_to_pfn(0xB8000), RAM_TYPE_CONVENTIONAL) ||
> -             ((video = ioremap(0xB8000, 0x8000)) == NULL) )
> +             ((video = ioremap_wc(0xB8000, 0x8000)) == NULL) )
>              return;
>          outw(0x200a, 0x3d4); /* disable cursor */
>          columns = vga_console_info.u.text_mode_3.columns;
> @@ -164,7 +164,11 @@ void __init video_endboot(void)
>      {
>      case XEN_VGATYPE_TEXT_MODE_3:
>          if ( !vgacon_keep )
> +        {
>              memset(video, 0, columns * lines * 2);
> +            iounmap(video);
> +            video = ZERO_BLOCK_PTR;
> +        }
>          break;

Shouldn't this hunk be in patch 5?

>      case XEN_VGATYPE_VESA_LFB:
>      case XEN_VGATYPE_EFI_LFB:
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -615,6 +615,8 @@ void destroy_perdomain_mapping(struct do
>                                 unsigned int nr);
>  void free_perdomain_mappings(struct domain *);
>  
> +void __iomem *ioremap_wc(paddr_t, size_t);

I'm not sure we want to add a second prototype.  ARM has ioremap_wc()
too, and we absolutely don't want them to get out of sync, and we have
two new architectures on the horizon.

Perhaps a new xen/ioremap.h which includes asm/ioremap.h  (although
thinking forward to encrypted RAM, we might want something which can
also encompass the memremap*() variants.)

ARM can #define ioremap_wc ioremap_wc and provide their inline wrapper. 
x86 can fall back to the common prototype.  Other architectures can do
whatever is best for them.

~Andrew



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/7] x86: introduce ioremap_wc()
  2021-04-27 17:13   ` Andrew Cooper
@ 2021-04-28  9:41     ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2021-04-28  9:41 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 27.04.2021 19:13, Andrew Cooper wrote:
> On 27/04/2021 13:54, Jan Beulich wrote:
>> In order for a to-be-introduced ERMS form of memcpy() to not regress
>> boot performance on certain systems when video output is active, we
>> first need to arrange for avoiding further dependency on firmware
>> setting up MTRRs in a way we can actually further modify. On many
>> systems, due to the continuously growing amounts of installed memory,
>> MTRRs get configured with at least one huge WB range, and with MMIO
>> ranges below 4Gb then forced to UC via overlapping MTRRs. mtrr_add(), as
>> it is today, can't deal with such a setup. Hence on such systems we
>> presently leave the frame buffer mapped UC, leading to significantly
>> reduced performance when using REP STOSB / REP MOVSB.
>>
>> On post-PentiumII hardware (i.e. any that's capable of running 64-bit
>> code), an effective memory type of WC can be achieved without MTRRs, by
>> simply referencing the respective PAT entry from the PTEs. While this
>> will leave the switch to ERMS forms of memset() and memcpy() with
>> largely unchanged performance, the change here on its own improves
>> performance on affected systems quite significantly: Measuring just the
>> individual affected memcpy() invocations yielded a speedup by a factor
>> of over 250 on my initial (Skylake) test system. memset() isn't getting
>> improved by as much there, but still by a factor of about 20.
>>
>> While adding {__,}PAGE_HYPERVISOR_WC, also add {__,}PAGE_HYPERVISOR_WT
>> to, at the very least, make clear what PTE flags this memory type uses.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
> 
> Seeing as MTRRs are full of firmware issues, shouldn't we also
> cross-check that the vram is marked WC, or we'll still fall into bad
> perf from combining down to UC.  (Obviously follow-up work if so.)

VRAM generally isn't marked WC nowadays, at least from my observations.
And it doesn't need to be - there's no "combining down to UC" when the
WC PAT entry is chosen by the PTE. Plus firmware doing so may actually
be counter-productive, as the VRAM address range may easily move when
BARs get re-assigned by an OS.

>> TBD: Both callers are __init, so in principle ioremap_wc() could be so,
>>      too, at least for the time being.
> 
> I don't see us making use this at runtime.  Uses of WC are few and far
> between.

Okay, will mark the function __init then.

>> TBD: If the VGA range is WC in the fixed range MTRRs, reusing the low
>>      1st Mb mapping (like ioremap() does) would be an option.
> 
> It might be fine to do that unconditionally.  The low VRAM has had known
> settings for 2 decades now.
> 
> That said, the low 1MB does use UC- mappings, which means we're entirely
> dependent on MTRRs specifying WC for sensible performance.

The two parts of your reply look to contradict one another: I've been
suggesting to use the low-Mb mapping precisely only when the fixed
range MTRRs covering the range are all saying WC.

>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -5883,6 +5883,20 @@ void __iomem *ioremap(paddr_t pa, size_t
>>      return (void __force __iomem *)va;
>>  }
>>  
>> +void __iomem *ioremap_wc(paddr_t pa, size_t len)
>> +{
>> +    mfn_t mfn = _mfn(PFN_DOWN(pa));
>> +    unsigned int offs = pa & (PAGE_SIZE - 1);
>> +    unsigned int nr = PFN_UP(offs + len);
>> +    void *va;
>> +
>> +    WARN_ON(page_is_ram_type(mfn_x(mfn), RAM_TYPE_CONVENTIONAL));
>> +
>> +    va = __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR_WC, VMAP_DEFAULT);
> 
> This doesn't look correct.  granularity and nr are passed the wrong way
> around, but maybe that's related to the fact that only a single mfn is
> passed.  I'm confused.

It's identical to ioremap(). And yes, it's this way because of there
just being a single MFN getting passed in here. If I flipped the 2nd
and 3rd arguments, nr calls to map_pages_to_xen() would result
instead of just one.

> Also, several truncations will occur for a framebuffer > 4G, both with
> calculations here, and the types of __vmap()'s parameters.

Truncations may occur, indeed, but for frame buffers >= 16Tb. I
didn't think I needed to worry about this more here than ioremap()
does.

>> @@ -179,8 +179,7 @@ void __init vesa_mtrr_init(void)
>>  
>>  static void lfb_flush(void)
>>  {
>> -    if ( vesa_mtrr == 3 )
>> -        __asm__ __volatile__ ("sfence" : : : "memory");
>> +    __asm__ __volatile__ ("sfence" : : : "memory");
> 
> wmb(), seeing as that is the operation we mean here?

Not sure, to be honest. It's not been all this long ago that
smp_wmb() and wmb() were the same. And it's also not a write
barrier we mean here, but specifically SFENCE (which merely
happens to be the insn of choice for implementing wmb()). We
don't care about having a full-fledged write barrier here -
other writes getting reordered would in principle be fine.

>> --- a/xen/drivers/video/vga.c
>> +++ b/xen/drivers/video/vga.c
>> @@ -79,7 +79,7 @@ void __init video_init(void)
>>      {
>>      case XEN_VGATYPE_TEXT_MODE_3:
>>          if ( page_is_ram_type(paddr_to_pfn(0xB8000), RAM_TYPE_CONVENTIONAL) ||
>> -             ((video = ioremap(0xB8000, 0x8000)) == NULL) )
>> +             ((video = ioremap_wc(0xB8000, 0x8000)) == NULL) )
>>              return;
>>          outw(0x200a, 0x3d4); /* disable cursor */
>>          columns = vga_console_info.u.text_mode_3.columns;
>> @@ -164,7 +164,11 @@ void __init video_endboot(void)
>>      {
>>      case XEN_VGATYPE_TEXT_MODE_3:
>>          if ( !vgacon_keep )
>> +        {
>>              memset(video, 0, columns * lines * 2);
>> +            iounmap(video);
>> +            video = ZERO_BLOCK_PTR;
>> +        }
>>          break;
> 
> Shouldn't this hunk be in patch 5?

Only if ioremap_wc() unconditionally re-used the low-Mb mapping;
I don't want to introduce another leak, paralleling the one which
patch 5 addresses. And as per above I see ioremap_wc() use __va()
at best conditionally.

>> --- a/xen/include/asm-x86/mm.h
>> +++ b/xen/include/asm-x86/mm.h
>> @@ -615,6 +615,8 @@ void destroy_perdomain_mapping(struct do
>>                                 unsigned int nr);
>>  void free_perdomain_mappings(struct domain *);
>>  
>> +void __iomem *ioremap_wc(paddr_t, size_t);
> 
> I'm not sure we want to add a second prototype.  ARM has ioremap_wc()
> too, and we absolutely don't want them to get out of sync, and we have
> two new architectures on the horizon.
> 
> Perhaps a new xen/ioremap.h which includes asm/ioremap.h  (although
> thinking forward to encrypted RAM, we might want something which can
> also encompass the memremap*() variants.)
> 
> ARM can #define ioremap_wc ioremap_wc and provide their inline wrapper. 
> x86 can fall back to the common prototype.  Other architectures can do
> whatever is best for them.

I'm afraid I don't see how what you suggest would not lead to
duplicate prototypes. Arm wanting the function be an inline
one precludes exposing the extern declaration there. Hence
there will still be a risk of things going out of sync. In
fact I first had the declaration next to ioremap()'s, until I
ran into the build issue on Arm.

Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 7/7] video/vesa: adjust (not just) command line option handling
  2021-04-27 14:04     ` Jan Beulich
@ 2021-05-27 11:47       ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2021-05-27 11:47 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, xen-devel

On 27.04.2021 16:04, Jan Beulich wrote:
> On 27.04.2021 15:49, Andrew Cooper wrote:
>> However, is there really any value in these options?  I can't see a case
>> where their use will result in a less broken system.
> 
> Well, if we mis-detect VRAM size, the respective option might indeed
> help. I'm less certain of the utility of the mapping option, the more
> that now there's no possible (and implicit) effect on MTRRs anymore.

Actually I was wrong with referring to an implied effect on MTRRs - that
would come from "vesa-ram=". "vesa-map=" may help if we mis-detected the
space we actually need for the mode. However, we'd then be screwed up
elsewhere as well, so I guess I'll add a patch removing "vesa-map=" as
well.

Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-05-27 11:48 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-27 12:51 [PATCH 0/7] x86: memcpy() / memset() (non-)ERMS flavors plus fallout Jan Beulich
2021-04-27 12:53 ` [PATCH 1/7] x86: correct comment about alternatives ordering Jan Beulich
2021-04-27 13:19   ` Andrew Cooper
2021-04-27 12:54 ` [PATCH 2/7] x86: introduce ioremap_wc() Jan Beulich
2021-04-27 17:13   ` Andrew Cooper
2021-04-28  9:41     ` Jan Beulich
2021-04-27 12:54 ` [PATCH 3/7] x86: re-work memset() Jan Beulich
2021-04-27 12:54 ` [PATCH 4/7] x86: re-work memcpy() Jan Beulich
2021-04-27 12:55 ` [PATCH 5/7] video/vesa: unmap frame buffer when relinquishing console Jan Beulich
2021-04-27 12:56 ` [PATCH 6/7] video/vesa: drop "vesa-mtrr" command line option Jan Beulich
2021-04-27 13:20   ` Andrew Cooper
2021-04-27 12:56 ` [PATCH 7/7] video/vesa: adjust (not just) command line option handling Jan Beulich
2021-04-27 13:49   ` Andrew Cooper
2021-04-27 14:04     ` Jan Beulich
2021-05-27 11:47       ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).