xen-devel.lists.xenproject.org archive mirror
* [PATCH 0/12] direct-map DomUs
@ 2020-04-15  1:02 Stefano Stabellini
  2020-04-15  1:02 ` [PATCH 01/12] xen: introduce xen_dom_flags Stefano Stabellini
                   ` (12 more replies)
  0 siblings, 13 replies; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-15  1:02 UTC (permalink / raw)
  To: xen-devel
  Cc: Artem_Mygaiev, peng.fan, sstabellini, julien, andrew.cooper3,
	George.Dunlap, Bertrand.Marquis, jbeulich, Volodymyr_Babchuk

Hi all,

This series adds support for mapping the memory of dom0less domUs 1:1
(guest physical == physical). The memory ranges assigned to a domU can
be explicitly chosen by the user at boot time.

This is desirable in cases where an IOMMU is not present in the system,
or it cannot be used. For instance, it might not be usable because it
doesn't cover a specific device, or because it doesn't have enough
bandwidth, or because it adds too much latency. In these cases, the user
should use an MPU (e.g. the Xilinx XMPU) to protect the memory in the
system, configuring it with the chosen address ranges.
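
As an example (using the fdt set syntax from the docs introduced in
this series), with imagebuilder one can add to boot.source:

    fdt set /chosen/domU0 direct-map <0x0 0x10000000 0x0 0x10000000 0x0 0x60000000 0x0 0x10000000>

which assigns the ranges 0x10000000-0x20000000 and
0x60000000-0x70000000 to the first dom0less domU.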

Cheers,

Stefano



The following changes since commit 7372466b21c3b6c96bb7a52754e432bac883a1e3:

  x86/mem_sharing: Fix build with !CONFIG_XSM (2020-04-10 15:20:10 +0100)

are available in the Git repository at:

  http://xenbits.xenproject.org/git-http/people/sstabellini/xen-unstable.git direct-map-1

for you to fetch changes up to 43503720ab6851a28a66fdd067f592d5354ae83a:

  xen/arm: call iomem_permit_access for passthrough devices (2020-04-14 17:42:21 -0700)

----------------------------------------------------------------
Stefano Stabellini (12):
      xen: introduce xen_dom_flags
      xen/arm: introduce arch_xen_dom_flags and direct_map
      xen/arm: introduce 1:1 mapping for domUs
      xen: split alloc_heap_pages in two halves for reusability
      xen: introduce reserve_heap_pages
      xen/arm: reserve 1:1 memory for direct_map domUs
      xen/arm: new vgic: rename vgic_cpu/dist_base to c/dbase
      xen/arm: if is_domain_direct_mapped use native addresses for GICv2
      xen/arm: if is_domain_direct_mapped use native addresses for GICv3
      xen/arm: if is_domain_direct_mapped use native UART address for vPL011
      xen/arm: if xen_force don't try to setup the IOMMU
      xen/arm: call iomem_permit_access for passthrough devices

 docs/misc/arm/device-tree/booting.txt |  13 +++
 docs/misc/arm/passthrough-noiommu.txt |  35 ++++++++
 xen/arch/arm/domain.c                 |   4 +-
 xen/arch/arm/domain_build.c           | 141 ++++++++++++++++++++++++++----
 xen/arch/arm/setup.c                  |   3 +-
 xen/arch/arm/vgic-v2.c                |  12 +--
 xen/arch/arm/vgic-v3.c                |  18 +++-
 xen/arch/arm/vgic/vgic-init.c         |   4 +-
 xen/arch/arm/vgic/vgic-v2.c           |  18 ++--
 xen/arch/arm/vpl011.c                 |  12 ++-
 xen/arch/x86/domain.c                 |   3 +-
 xen/arch/x86/setup.c                  |   3 +-
 xen/common/domain.c                   |  13 +--
 xen/common/domctl.c                   |   3 +-
 xen/common/page_alloc.c               | 158 +++++++++++++++++++++++++---------
 xen/common/sched/core.c               |   3 +-
 xen/include/asm-arm/domain.h          |  10 ++-
 xen/include/asm-arm/new_vgic.h        |   4 +-
 xen/include/asm-arm/vgic.h            |   1 +
 xen/include/asm-x86/domain.h          |   2 +
 xen/include/xen/domain.h              |   8 +-
 xen/include/xen/mm.h                  |   2 +
 xen/include/xen/sched.h               |   2 +-
 23 files changed, 373 insertions(+), 99 deletions(-)
 create mode 100644 docs/misc/arm/passthrough-noiommu.txt



* [PATCH 01/12] xen: introduce xen_dom_flags
  2020-04-15  1:02 [PATCH 0/12] direct-map DomUs Stefano Stabellini
@ 2020-04-15  1:02 ` Stefano Stabellini
  2020-04-15  9:12   ` Jan Beulich
  2020-04-15  1:02 ` [PATCH 02/12] xen/arm: introduce arch_xen_dom_flags and direct_map Stefano Stabellini
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-15  1:02 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, julien, Wei Liu, George Dunlap, andrew.cooper3,
	Ian Jackson, Dario Faggioli, jbeulich, Stefano Stabellini,
	Volodymyr_Babchuk, Roger Pau Monné

We are passing an extra boolean flag at domain creation to specify
whether we want the domain to be privileged (i.e. dom0) or not. Another
flag will be introduced later in this series.

Introduce a new struct xen_dom_flags and move the privileged flag into
it. Further flags will be added to struct xen_dom_flags later in the
series.
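
For example, with this patch dom0 creation on Arm becomes (taken from
the diff below):

    struct xen_dom_flags flags = { true };
    ...
    dom0 = domain_create(0, &dom0_cfg, &flags);

domain_create itself tolerates a NULL flags pointer, defaulting to
unprivileged.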

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
CC: andrew.cooper3@citrix.com
CC: jbeulich@suse.com
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Wei Liu <wl@xen.org>
CC: "Roger Pau Monné" <roger.pau@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Dario Faggioli <dfaggioli@suse.com>
---
 xen/arch/arm/domain.c       |  3 ++-
 xen/arch/arm/domain_build.c |  3 ++-
 xen/arch/arm/setup.c        |  3 ++-
 xen/arch/x86/domain.c       |  3 ++-
 xen/arch/x86/setup.c        |  3 ++-
 xen/common/domain.c         | 13 +++++++------
 xen/common/domctl.c         |  3 ++-
 xen/common/sched/core.c     |  3 ++-
 xen/include/xen/domain.h    |  7 ++++++-
 xen/include/xen/sched.h     |  2 +-
 10 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 6627be2922..b906a38b6b 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -669,7 +669,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
 }
 
 int arch_domain_create(struct domain *d,
-                       struct xen_domctl_createdomain *config)
+                       struct xen_domctl_createdomain *config,
+                       struct xen_dom_flags *flags)
 {
     int rc, count = 0;
 
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 4307087536..20e62a9fc4 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2467,6 +2467,7 @@ void __init create_domUs(void)
             .max_grant_frames = 64,
             .max_maptrack_frames = 1024,
         };
+        struct xen_dom_flags flags = { false };
 
         if ( !dt_device_is_compatible(node, "xen,domain") )
             continue;
@@ -2491,7 +2492,7 @@ void __init create_domUs(void)
                                          GUEST_VPL011_SPI - 32 + 1);
         }
 
-        d = domain_create(++max_init_domid, &d_cfg, false);
+        d = domain_create(++max_init_domid, &d_cfg, &flags);
         if ( IS_ERR(d) )
             panic("Error creating domain %s\n", dt_node_name(node));
 
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 7968cee47d..9ccb3f7385 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -787,6 +787,7 @@ void __init start_xen(unsigned long boot_phys_offset,
         .max_maptrack_frames = -1,
     };
     int rc;
+    struct xen_dom_flags flags = { true };
 
     dcache_line_bytes = read_dcache_line_bytes();
 
@@ -955,7 +956,7 @@ void __init start_xen(unsigned long boot_phys_offset,
     if ( iommu_enabled )
         dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
 
-    dom0 = domain_create(0, &dom0_cfg, true);
+    dom0 = domain_create(0, &dom0_cfg, &flags);
     if ( IS_ERR(dom0) || (alloc_dom0_vcpu0(dom0) == NULL) )
         panic("Error creating domain 0\n");
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a008d7df1c..b92776f824 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -529,7 +529,8 @@ static bool emulation_flags_ok(const struct domain *d, uint32_t emflags)
 }
 
 int arch_domain_create(struct domain *d,
-                       struct xen_domctl_createdomain *config)
+                       struct xen_domctl_createdomain *config,
+                       struct xen_dom_flags *flags)
 {
     bool paging_initialised = false;
     uint32_t emflags;
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 885919d5c3..e61114858b 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -706,6 +706,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         .max_maptrack_frames = -1,
     };
     const char *hypervisor_name;
+    struct xen_dom_flags flags = { !pv_shim };
 
     /* Critical region without IDT or TSS.  Any fault is deadly! */
 
@@ -1761,7 +1762,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
 
     /* Create initial domain 0. */
-    dom0 = domain_create(get_initial_domain_id(), &dom0_cfg, !pv_shim);
+    dom0 = domain_create(get_initial_domain_id(), &dom0_cfg, &flags);
     if ( IS_ERR(dom0) || (alloc_dom0_vcpu0(dom0) == NULL) )
         panic("Error creating domain 0\n");
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 3dcd73f67c..dd53475cc3 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -341,7 +341,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
 
 struct domain *domain_create(domid_t domid,
                              struct xen_domctl_createdomain *config,
-                             bool is_priv)
+                             struct xen_dom_flags *flags)
 {
     struct domain *d, **pd, *old_hwdom = NULL;
     enum { INIT_watchdog = 1u<<1,
@@ -363,7 +363,7 @@ struct domain *domain_create(domid_t domid,
     ASSERT(is_system_domain(d) ? config == NULL : config != NULL);
 
     /* Sort out our idea of is_control_domain(). */
-    d->is_privileged = is_priv;
+    d->is_privileged = flags ? flags->is_priv : false;
 
     /* Sort out our idea of is_hardware_domain(). */
     if ( domid == 0 || domid == hardware_domid )
@@ -443,7 +443,7 @@ struct domain *domain_create(domid_t domid,
         radix_tree_init(&d->pirq_tree);
     }
 
-    if ( (err = arch_domain_create(d, config)) != 0 )
+    if ( (err = arch_domain_create(d, config, flags)) != 0 )
         goto fail;
     init_status |= INIT_arch;
 
@@ -547,6 +547,7 @@ struct domain *domain_create(domid_t domid,
 
 void __init setup_system_domains(void)
 {
+    struct xen_dom_flags flags = { false };
     /*
      * Initialise our DOMID_XEN domain.
      * Any Xen-heap pages that we will allow to be mapped will have
@@ -554,7 +555,7 @@ void __init setup_system_domains(void)
      * Hidden PCI devices will also be associated with this domain
      * (but be [partly] controlled by Dom0 nevertheless).
      */
-    dom_xen = domain_create(DOMID_XEN, NULL, false);
+    dom_xen = domain_create(DOMID_XEN, NULL, &flags);
     if ( IS_ERR(dom_xen) )
         panic("Failed to create d[XEN]: %ld\n", PTR_ERR(dom_xen));
 
@@ -564,7 +565,7 @@ void __init setup_system_domains(void)
      * array. Mappings occur at the priv of the caller.
      * Quarantined PCI devices will be associated with this domain.
      */
-    dom_io = domain_create(DOMID_IO, NULL, false);
+    dom_io = domain_create(DOMID_IO, NULL, &flags);
     if ( IS_ERR(dom_io) )
         panic("Failed to create d[IO]: %ld\n", PTR_ERR(dom_io));
 
@@ -573,7 +574,7 @@ void __init setup_system_domains(void)
      * Initialise our COW domain.
      * This domain owns sharable pages.
      */
-    dom_cow = domain_create(DOMID_COW, NULL, false);
+    dom_cow = domain_create(DOMID_COW, NULL, &flags);
     if ( IS_ERR(dom_cow) )
         panic("Failed to create d[COW]: %ld\n", PTR_ERR(dom_cow));
 #endif
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index a69b3b59a8..cb8d25500a 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -364,6 +364,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
     bool_t copyback = 0;
     struct xen_domctl curop, *op = &curop;
     struct domain *d;
+    struct xen_dom_flags flags = { false };
 
     if ( copy_from_guest(op, u_domctl, 1) )
         return -EFAULT;
@@ -515,7 +516,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             rover = dom;
         }
 
-        d = domain_create(dom, &op->u.createdomain, false);
+        d = domain_create(dom, &op->u.createdomain, &flags);
         if ( IS_ERR(d) )
         {
             ret = PTR_ERR(d);
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 626861a3fe..6457c71ce1 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -2888,6 +2888,7 @@ void __init scheduler_init(void)
 {
     struct domain *idle_domain;
     int i;
+    struct xen_dom_flags flags = { false };
 
     scheduler_enable();
 
@@ -2957,7 +2958,7 @@ void __init scheduler_init(void)
         sched_ratelimit_us = SCHED_DEFAULT_RATELIMIT_US;
     }
 
-    idle_domain = domain_create(DOMID_IDLE, NULL, false);
+    idle_domain = domain_create(DOMID_IDLE, NULL, &flags);
     BUG_ON(IS_ERR(idle_domain));
     BUG_ON(nr_cpu_ids > ARRAY_SIZE(idle_vcpu));
     idle_domain->vcpu = idle_vcpu;
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 7e51d361de..4423e34500 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -63,8 +63,13 @@ void arch_vcpu_destroy(struct vcpu *v);
 int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset);
 void unmap_vcpu_info(struct vcpu *v);
 
+struct xen_dom_flags {
+    bool is_priv;
+};
+
 int arch_domain_create(struct domain *d,
-                       struct xen_domctl_createdomain *config);
+                       struct xen_domctl_createdomain *config,
+                       struct xen_dom_flags *flags);
 
 void arch_domain_destroy(struct domain *d);
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 195e7ee583..4112f1ffc9 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -594,7 +594,7 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config);
  */
 struct domain *domain_create(domid_t domid,
                              struct xen_domctl_createdomain *config,
-                             bool is_priv);
+                             struct xen_dom_flags *flags);
 
 /*
  * rcu_lock_domain_by_id() is more efficient than get_domain_by_id().
-- 
2.17.1




* [PATCH 02/12] xen/arm: introduce arch_xen_dom_flags and direct_map
  2020-04-15  1:02 [PATCH 0/12] direct-map DomUs Stefano Stabellini
  2020-04-15  1:02 ` [PATCH 01/12] xen: introduce xen_dom_flags Stefano Stabellini
@ 2020-04-15  1:02 ` Stefano Stabellini
  2020-04-15 10:27   ` Jan Beulich
  2020-04-15  1:02 ` [PATCH 03/12] xen/arm: introduce 1:1 mapping for domUs Stefano Stabellini
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-15  1:02 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, julien, Wei Liu, George Dunlap, andrew.cooper3,
	Ian Jackson, jbeulich, Stefano Stabellini, Volodymyr_Babchuk,
	Roger Pau Monné

Introduce a new field in struct xen_dom_flags to store arch-specific
flags.

Add an ARM-specific flag to specify that the domain should be directly
mapped (guest physical addresses == physical addresses).

Also, add a direct_map flag under struct arch_domain and use it to
implement is_domain_direct_mapped.
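
For example, dom0 on Arm now requests both flags in one initializer
(taken from the diff below):

    struct xen_dom_flags flags = { true, .arch.is_direct_map = true };

arch_domain_create then latches the arch flag into d->arch.direct_map,
which is what the reworked is_domain_direct_mapped(d) macro tests.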

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
CC: andrew.cooper3@citrix.com
CC: jbeulich@suse.com
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Wei Liu <wl@xen.org>
CC: "Roger Pau Monné" <roger.pau@citrix.com>
---
 xen/arch/arm/domain.c        | 1 +
 xen/arch/arm/domain_build.c  | 1 +
 xen/arch/arm/setup.c         | 2 +-
 xen/include/asm-arm/domain.h | 9 ++++++++-
 xen/include/asm-x86/domain.h | 2 ++
 xen/include/xen/domain.h     | 1 +
 6 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index b906a38b6b..59eae36de7 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -682,6 +682,7 @@ int arch_domain_create(struct domain *d,
         return 0;
 
     ASSERT(config != NULL);
+    d->arch.direct_map = flags != NULL ? flags->arch.is_direct_map : false;
 
     /* p2m_init relies on some value initialized by the IOMMU subsystem */
     if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 20e62a9fc4..2ec7453aa3 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2527,6 +2527,7 @@ int __init construct_dom0(struct domain *d)
 
     iommu_hwdom_init(d);
 
+    d->arch.direct_map = true;
     d->max_pages = ~0U;
 
     kinfo.unassigned_mem = dom0_mem;
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 9ccb3f7385..5434548e7b 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -787,7 +787,7 @@ void __init start_xen(unsigned long boot_phys_offset,
         .max_maptrack_frames = -1,
     };
     int rc;
-    struct xen_dom_flags flags = { true };
+    struct xen_dom_flags flags = { true, .arch.is_direct_map = true };
 
     dcache_line_bytes = read_dcache_line_bytes();
 
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index d39477a939..7a498921bf 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -32,7 +32,7 @@ enum domain_type {
 #endif
 
 /* The hardware domain has always its memory direct mapped. */
-#define is_domain_direct_mapped(d) ((d) == hardware_domain)
+#define is_domain_direct_mapped(d) ((d)->arch.direct_map != false)
 
 struct vtimer {
     struct vcpu *v;
@@ -98,8 +98,15 @@ struct arch_domain
 #ifdef CONFIG_TEE
     void *tee;
 #endif
+
+    bool direct_map;
 }  __cacheline_aligned;
 
+struct arch_xen_dom_flags
+{
+    bool is_direct_map;
+};
+
 struct arch_vcpu
 {
     struct {
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 105adf96eb..52199ed5b9 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -418,6 +418,8 @@ struct arch_domain
     uint32_t emulation_flags;
 } __cacheline_aligned;
 
+struct arch_xen_dom_flags {};
+
 #ifdef CONFIG_HVM
 #define X86_EMU_LAPIC    XEN_X86_EMU_LAPIC
 #define X86_EMU_HPET     XEN_X86_EMU_HPET
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 4423e34500..7227e6ca98 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -65,6 +65,7 @@ void unmap_vcpu_info(struct vcpu *v);
 
 struct xen_dom_flags {
     bool is_priv;
+    struct arch_xen_dom_flags arch;
 };
 
 int arch_domain_create(struct domain *d,
-- 
2.17.1




* [PATCH 03/12] xen/arm: introduce 1:1 mapping for domUs
  2020-04-15  1:02 [PATCH 0/12] direct-map DomUs Stefano Stabellini
  2020-04-15  1:02 ` [PATCH 01/12] xen: introduce xen_dom_flags Stefano Stabellini
  2020-04-15  1:02 ` [PATCH 02/12] xen/arm: introduce arch_xen_dom_flags and direct_map Stefano Stabellini
@ 2020-04-15  1:02 ` Stefano Stabellini
  2020-04-15 13:36   ` Julien Grall
  2020-04-15  1:02 ` [PATCH 04/12] xen: split alloc_heap_pages in two halves for reusability Stefano Stabellini
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-15  1:02 UTC (permalink / raw)
  To: xen-devel; +Cc: Volodymyr_Babchuk, sstabellini, julien, Stefano Stabellini

In some cases it is desirable to map domU memory 1:1 (guest physical ==
physical), for instance because we want to assign a device to the domU
but the IOMMU is not present or cannot be used. In these cases, other
mechanisms should be used for DMA protection, e.g. an MPU.

This patch introduces a new device tree option for dom0less guests to
request that a domain be directly mapped, together with the memory
ranges to use. This patch documents the new attribute and parses it at
boot time. (However, the implementation of the 1:1 mapping itself is
still missing and simply BUG()s out at the moment.) Finally, the patch
sets the new direct_map flag for domU domains.
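
For illustration only, a dom0less domU node under /chosen requesting
two 1:1 ranges could look like this (the values match the example in
the docs below; other required properties are omitted):

    domU0 {
        compatible = "xen,domain";
        direct-map = <0x0 0x10000000 0x0 0x10000000
                      0x0 0x60000000 0x0 0x10000000>;
        ...
    };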

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
---
 docs/misc/arm/device-tree/booting.txt | 13 +++++++
 docs/misc/arm/passthrough-noiommu.txt | 35 ++++++++++++++++++
 xen/arch/arm/domain_build.c           | 52 +++++++++++++++++++++++++--
 3 files changed, 98 insertions(+), 2 deletions(-)
 create mode 100644 docs/misc/arm/passthrough-noiommu.txt

diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
index 5243bc7fd3..fce5f7ed5a 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -159,6 +159,19 @@ with the following properties:
     used, or GUEST_VPL011_SPI+1 if vpl011 is enabled, whichever is
     greater.
 
+- direct-map
+
+    Optional. An array of integer pairs specifying addresses and sizes.
+    direct-map requests that the memory of the domain be 1:1 mapped to
+    the memory ranges specified as arguments. Only sizes that are a
+    power-of-two number of pages are allowed.
+
+- #direct-map-addr-cells and #direct-map-size-cells
+
+    The number of cells to use for the addresses and for the sizes in
+    direct-map. Default and maximum are 2 cells for both addresses and
+    sizes.
+
 - #address-cells and #size-cells
 
     Both #address-cells and #size-cells need to be specified because
diff --git a/docs/misc/arm/passthrough-noiommu.txt b/docs/misc/arm/passthrough-noiommu.txt
new file mode 100644
index 0000000000..d2bfaf26c3
--- /dev/null
+++ b/docs/misc/arm/passthrough-noiommu.txt
@@ -0,0 +1,35 @@
+Request Device Assignment without IOMMU support
+===============================================
+
+Add xen,force-assign-without-iommu; to the device tree snippet:
+
+    ethernet: ethernet@ff0e0000 {
+        compatible = "cdns,zynqmp-gem";
+        xen,path = "/amba/ethernet@ff0e0000";
+        xen,reg = <0x0 0xff0e0000 0x1000 0x0 0xff0e0000>;
+        xen,force-assign-without-iommu;
+
+Optionally, if none of the domains require an IOMMU, then it could be
+disabled (not recommended), for instance by adding status = "disabled";
+under the smmu node:
+
+    smmu@fd800000 {
+        compatible = "arm,mmu-500";
+        status = "disabled";
+
+
+Request 1:1 memory mapping for the dom0-less domain
+===================================================
+
+Add a direct-map property under the appropriate /chosen/domU node with
+the memory ranges you want to assign to your domain. If you are using
+imagebuilder, you can add to boot.source something like the following:
+
+    fdt set /chosen/domU0 direct-map <0x0 0x10000000 0x0 0x10000000 0x0 0x60000000 0x0 0x10000000>
+
+This will assign the ranges:
+
+    0x10000000 - 0x20000000
+    0x60000000 - 0x70000000
+
+to the first dom0less domU.
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 2ec7453aa3..a2bb411568 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2435,7 +2435,51 @@ static int __init construct_domU(struct domain *d,
     /* type must be set before allocate memory */
     d->arch.type = kinfo.type;
 #endif
-    allocate_memory(d, &kinfo);
+
+    if ( !is_domain_direct_mapped(d) )
+        allocate_memory(d, &kinfo);
+    else
+    {
+        struct membank banks[NR_MEM_BANKS];
+        const struct dt_property *prop;
+        u32 direct_map_len, direct_map_addr_len = 2, direct_map_size_len = 2;
+        unsigned int i;
+        __be32 *p;
+
+        prop = dt_find_property(node, "direct-map", &direct_map_len);
+        BUG_ON(!prop);
+
+        dt_property_read_u32(node,
+                             "#direct-map-addr-cells",
+                             &direct_map_addr_len);
+        dt_property_read_u32(node,
+                             "#direct-map-size-cells",
+                             &direct_map_size_len);
+        BUG_ON(direct_map_size_len > 2 || direct_map_addr_len > 2);
+
+        for ( i = 0, p = prop->value;
+              i < direct_map_len /
+                  (4 * (direct_map_addr_len + direct_map_size_len));
+              i++ )
+        {
+            unsigned int j;
+
+            banks[i].start = 0;
+            for ( j = 0; j < direct_map_addr_len; j++, p++ )
+                banks[i].start |= (paddr_t)__be32_to_cpu(*p) << (32 * j);
+
+            banks[i].size = 0;
+            for ( j = 0; j < direct_map_size_len; j++, p++ )
+                banks[i].size |= (paddr_t)__be32_to_cpu(*p) << (32 * j);
+
+            printk(XENLOG_DEBUG
+                   "direct_map start=%#"PRIpaddr" size=%#"PRIpaddr"\n",
+                   banks[i].start, banks[i].size);
+        }
+
+        /* reserve_memory_11(d, &kinfo, &banks[0], i); */
+        BUG();
+    }
 
     rc = prepare_dtb_domU(d, &kinfo);
     if ( rc < 0 )
@@ -2455,6 +2499,7 @@ void __init create_domUs(void)
 {
     struct dt_device_node *node;
     const struct dt_device_node *chosen = dt_find_node_by_path("/chosen");
+    u32 direct_map = 0;
 
     BUG_ON(chosen == NULL);
     dt_for_each_child_node(chosen, node)
@@ -2476,8 +2521,11 @@ void __init create_domUs(void)
             panic("Missing property 'cpus' for domain %s\n",
                   dt_node_name(node));
 
-        if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") )
+        direct_map = dt_find_property(node, "direct-map", NULL) != NULL;
+        if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") &&
+             !direct_map )
             d_cfg.flags |= XEN_DOMCTL_CDF_iommu;
+        flags.arch.is_direct_map = direct_map != 0;
 
         if ( !dt_property_read_u32(node, "nr_spis", &d_cfg.arch.nr_spis) )
         {
-- 
2.17.1




* [PATCH 04/12] xen: split alloc_heap_pages in two halves for reusability
  2020-04-15  1:02 [PATCH 0/12] direct-map DomUs Stefano Stabellini
                   ` (2 preceding siblings ...)
  2020-04-15  1:02 ` [PATCH 03/12] xen/arm: introduce 1:1 mapping for domUs Stefano Stabellini
@ 2020-04-15  1:02 ` Stefano Stabellini
  2020-04-15 11:22   ` Wei Liu
  2020-04-17 10:02   ` Jan Beulich
  2020-04-15  1:02 ` [PATCH 05/12] xen: introduce reserve_heap_pages Stefano Stabellini
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-15  1:02 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, julien, Wei Liu, andrew.cooper3, Ian Jackson,
	George Dunlap, jbeulich, Stefano Stabellini, Volodymyr_Babchuk

This patch splits the implementation of alloc_heap_pages into two halves
so that the second half can be reused by the next patch.
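
The resulting shape is roughly (a sketch using the names from the diff
below):

    alloc_heap_pages():
        take heap_lock, check outstanding claims,
        find a suitable buddy via get_free_buddy()
    __alloc_heap_pages():
        split the buddy down to the requested order,
        update accounting, handle scrubbing and TLB flushes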

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
CC: andrew.cooper3@citrix.com
CC: jbeulich@suse.com
CC: George Dunlap <george.dunlap@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Wei Liu <wl@xen.org>
---
Comments are welcome. I am not convinced that this is the right way to
split it. Please let me know if you have any suggestions.
---
 xen/common/page_alloc.c | 94 +++++++++++++++++++++++------------------
 1 file changed, 53 insertions(+), 41 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 10b7aeca48..79ae64d4b8 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -911,54 +911,18 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
     }
 }
 
-/* Allocate 2^@order contiguous pages. */
-static struct page_info *alloc_heap_pages(
-    unsigned int zone_lo, unsigned int zone_hi,
-    unsigned int order, unsigned int memflags,
-    struct domain *d)
+static void __alloc_heap_pages(struct page_info **pgo,
+                               unsigned int order,
+                               unsigned int memflags,
+                               struct domain *d)
 {
     nodeid_t node;
     unsigned int i, buddy_order, zone, first_dirty;
     unsigned long request = 1UL << order;
-    struct page_info *pg;
     bool need_tlbflush = false;
     uint32_t tlbflush_timestamp = 0;
     unsigned int dirty_cnt = 0;
-
-    /* Make sure there are enough bits in memflags for nodeID. */
-    BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
-
-    ASSERT(zone_lo <= zone_hi);
-    ASSERT(zone_hi < NR_ZONES);
-
-    if ( unlikely(order > MAX_ORDER) )
-        return NULL;
-
-    spin_lock(&heap_lock);
-
-    /*
-     * Claimed memory is considered unavailable unless the request
-     * is made by a domain with sufficient unclaimed pages.
-     */
-    if ( (outstanding_claims + request > total_avail_pages) &&
-          ((memflags & MEMF_no_refcount) ||
-           !d || d->outstanding_pages < request) )
-    {
-        spin_unlock(&heap_lock);
-        return NULL;
-    }
-
-    pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
-    /* Try getting a dirty buddy if we couldn't get a clean one. */
-    if ( !pg && !(memflags & MEMF_no_scrub) )
-        pg = get_free_buddy(zone_lo, zone_hi, order,
-                            memflags | MEMF_no_scrub, d);
-    if ( !pg )
-    {
-        /* No suitable memory blocks. Fail the request. */
-        spin_unlock(&heap_lock);
-        return NULL;
-    }
+    struct page_info *pg = *pgo;
 
     node = phys_to_nid(page_to_maddr(pg));
     zone = page_to_zone(pg);
@@ -984,6 +948,7 @@ static struct page_info *alloc_heap_pages(
                 first_dirty = 0; /* We've moved past original first_dirty */
         }
     }
+    *pgo = pg;
 
     ASSERT(avail[node][zone] >= request);
     avail[node][zone] -= request;
@@ -1062,6 +1027,53 @@ static struct page_info *alloc_heap_pages(
     if ( need_tlbflush )
         filtered_flush_tlb_mask(tlbflush_timestamp);
 
+}
+
+/* Allocate 2^@order contiguous pages. */
+static struct page_info *alloc_heap_pages(
+    unsigned int zone_lo, unsigned int zone_hi,
+    unsigned int order, unsigned int memflags,
+    struct domain *d)
+{
+    unsigned long request = 1UL << order;
+    struct page_info *pg;
+
+    /* Make sure there are enough bits in memflags for nodeID. */
+    BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
+
+    ASSERT(zone_lo <= zone_hi);
+    ASSERT(zone_hi < NR_ZONES);
+
+    if ( unlikely(order > MAX_ORDER) )
+        return NULL;
+
+    spin_lock(&heap_lock);
+
+    /*
+     * Claimed memory is considered unavailable unless the request
+     * is made by a domain with sufficient unclaimed pages.
+     */
+    if ( (outstanding_claims + request > total_avail_pages) &&
+          ((memflags & MEMF_no_refcount) ||
+           !d || d->outstanding_pages < request) )
+    {
+        spin_unlock(&heap_lock);
+        return NULL;
+    }
+
+    pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
+    /* Try getting a dirty buddy if we couldn't get a clean one. */
+    if ( !pg && !(memflags & MEMF_no_scrub) )
+        pg = get_free_buddy(zone_lo, zone_hi, order,
+                            memflags | MEMF_no_scrub, d);
+    if ( !pg )
+    {
+        /* No suitable memory blocks. Fail the request. */
+        spin_unlock(&heap_lock);
+        return NULL;
+    }
+
+    __alloc_heap_pages(&pg, order, memflags, d);
     return pg;
 }
 
-- 
2.17.1




* [PATCH 05/12] xen: introduce reserve_heap_pages
  2020-04-15  1:02 [PATCH 0/12] direct-map DomUs Stefano Stabellini
                   ` (3 preceding siblings ...)
  2020-04-15  1:02 ` [PATCH 04/12] xen: split alloc_heap_pages in two halves for reusability Stefano Stabellini
@ 2020-04-15  1:02 ` Stefano Stabellini
  2020-04-15 13:24   ` Julien Grall
  2020-04-17 10:11   ` Jan Beulich
  2020-04-15  1:02 ` [PATCH 06/12] xen/arm: reserve 1:1 memory for direct_map domUs Stefano Stabellini
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-15  1:02 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, julien, Wei Liu, andrew.cooper3, Ian Jackson,
	George Dunlap, jbeulich, Stefano Stabellini, Volodymyr_Babchuk

Introduce a function named reserve_heap_pages (similar to
alloc_heap_pages) that allocates a requested memory range. It calls
__alloc_heap_pages for the actual implementation.

Change __alloc_heap_pages so that the original page pointer doesn't get
modified, giving back unneeded memory top to bottom rather than bottom
to top.

Also introduce a function named reserve_domheap_pages, similar to
alloc_domheap_pages, that checks memflags before calling
reserve_heap_pages. On success it also calls assign_pages to assign the
pages to the domain.
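
For illustration, a later patch in this series reserves each requested
memory bank roughly as follows (error handling omitted):

    order = get_order_from_bytes(banks[i].size);
    pg = reserve_domheap_pages(d, banks[i].start, order, 0);
    /* pg == NULL if the requested range could not be reserved */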

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
CC: andrew.cooper3@citrix.com
CC: jbeulich@suse.com
CC: George Dunlap <george.dunlap@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/common/page_alloc.c | 72 ++++++++++++++++++++++++++++++++++++++---
 xen/include/xen/mm.h    |  2 ++
 2 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 79ae64d4b8..3a9c1a291b 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -911,7 +911,7 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
     }
 }
 
-static void __alloc_heap_pages(struct page_info **pgo,
+static void __alloc_heap_pages(struct page_info *pg,
                                unsigned int order,
                                unsigned int memflags,
                                struct domain *d)
@@ -922,7 +922,7 @@ static void __alloc_heap_pages(struct page_info **pgo,
     bool need_tlbflush = false;
     uint32_t tlbflush_timestamp = 0;
     unsigned int dirty_cnt = 0;
-    struct page_info *pg = *pgo;
+    struct page_info *pg_start = pg;
 
     node = phys_to_nid(page_to_maddr(pg));
     zone = page_to_zone(pg);
@@ -934,10 +934,10 @@ static void __alloc_heap_pages(struct page_info **pgo,
     while ( buddy_order != order )
     {
         buddy_order--;
+        pg = pg_start + (1U << buddy_order);
         page_list_add_scrub(pg, node, zone, buddy_order,
                             (1U << buddy_order) > first_dirty ?
                             first_dirty : INVALID_DIRTY_IDX);
-        pg += 1U << buddy_order;
 
         if ( first_dirty != INVALID_DIRTY_IDX )
         {
@@ -948,7 +948,7 @@ static void __alloc_heap_pages(struct page_info **pgo,
                 first_dirty = 0; /* We've moved past original first_dirty */
         }
     }
-    *pgo = pg;
+    pg = pg_start;
 
     ASSERT(avail[node][zone] >= request);
     avail[node][zone] -= request;
@@ -1073,7 +1073,42 @@ static struct page_info *alloc_heap_pages(
         return NULL;
     }
 
-    __alloc_heap_pages(&pg, order, memflags, d);
+    __alloc_heap_pages(pg, order, memflags, d);
+    return pg;
+}
+
+static struct page_info *reserve_heap_pages(struct domain *d,
+                                            paddr_t start,
+                                            unsigned int order,
+                                            unsigned int memflags)
+{
+    nodeid_t node;
+    unsigned int zone;
+    struct page_info *pg;
+
+    if ( unlikely(order > MAX_ORDER) )
+        return NULL;
+
+    spin_lock(&heap_lock);
+
+    /*
+     * Claimed memory is considered unavailable unless the request
+     * is made by a domain with sufficient unclaimed pages.
+     */
+    if ( (outstanding_claims + (1UL << order) > total_avail_pages) &&
+          ((memflags & MEMF_no_refcount) ||
+           !d || d->outstanding_pages < (1UL << order)) )
+    {
+        spin_unlock(&heap_lock);
+        return NULL;
+    }
+
+    pg = maddr_to_page(start);
+    node = phys_to_nid(start);
+    zone = page_to_zone(pg);
+    page_list_del(pg, &heap(node, zone, order));
+
+    __alloc_heap_pages(pg, order, memflags, d);
     return pg;
 }
 
@@ -2385,6 +2420,33 @@ struct page_info *alloc_domheap_pages(
     return pg;
 }
 
+struct page_info *reserve_domheap_pages(
+    struct domain *d, paddr_t start, unsigned int order, unsigned int memflags)
+{
+    struct page_info *pg = NULL;
+
+    ASSERT(!in_irq());
+
+    if ( memflags & MEMF_no_owner )
+        memflags |= MEMF_no_refcount;
+    else if ( (memflags & MEMF_no_refcount) && d )
+    {
+        ASSERT(!(memflags & MEMF_no_refcount));
+        return NULL;
+    }
+
+    pg = reserve_heap_pages(d, start, order, memflags);
+
+    if ( d && !(memflags & MEMF_no_owner) &&
+         assign_pages(d, pg, order, memflags) )
+    {
+        free_heap_pages(pg, order, memflags & MEMF_no_scrub);
+        return NULL;
+    }
+
+    return pg;
+}
+
 void free_domheap_pages(struct page_info *pg, unsigned int order)
 {
     struct domain *d = page_get_owner(pg);
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 9b62087be1..35407e1b68 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -199,6 +199,8 @@ void get_outstanding_claims(uint64_t *free_pages, uint64_t *outstanding_pages);
 void init_domheap_pages(paddr_t ps, paddr_t pe);
 struct page_info *alloc_domheap_pages(
     struct domain *d, unsigned int order, unsigned int memflags);
+struct page_info *reserve_domheap_pages(
+    struct domain *d, paddr_t start, unsigned int order, unsigned int memflags);
 void free_domheap_pages(struct page_info *pg, unsigned int order);
 unsigned long avail_domheap_pages_region(
     unsigned int node, unsigned int min_width, unsigned int max_width);
-- 
2.17.1




* [PATCH 06/12] xen/arm: reserve 1:1 memory for direct_map domUs
  2020-04-15  1:02 [PATCH 0/12] direct-map DomUs Stefano Stabellini
                   ` (4 preceding siblings ...)
  2020-04-15  1:02 ` [PATCH 05/12] xen: introduce reserve_heap_pages Stefano Stabellini
@ 2020-04-15  1:02 ` Stefano Stabellini
  2020-04-15 13:38   ` Julien Grall
  2020-04-15  1:02 ` [PATCH 07/12] xen/arm: new vgic: rename vgic_cpu/dist_base to c/dbase Stefano Stabellini
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-15  1:02 UTC (permalink / raw)
  To: xen-devel; +Cc: Volodymyr_Babchuk, sstabellini, julien, Stefano Stabellini

Use reserve_domheap_pages to implement the direct-map ranges allocation
for DomUs.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
---
 xen/arch/arm/domain_build.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index a2bb411568..627e0c5e8e 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -198,6 +198,40 @@ fail:
     return false;
 }
 
+static void __init reserve_memory_11(struct domain *d,
+                                     struct kernel_info *kinfo,
+                                     struct membank *banks,
+                                     unsigned int nr_banks)
+{
+    unsigned int i, order;
+    struct page_info *pg;
+
+    kinfo->mem.nr_banks = 0;
+
+    for ( i = 0; i < nr_banks; i++ )
+    {
+        order = get_order_from_bytes(banks[i].size);
+        pg = reserve_domheap_pages(d, banks[i].start, order, 0);
+        if ( pg == NULL || !insert_11_bank(d, kinfo, pg, order) )
+        {
+            printk(XENLOG_ERR
+                   "%pd: cannot reserve memory start=%#"PRIpaddr" size=%#"PRIpaddr"\n",
+                    d, banks[i].start, banks[i].size);
+            BUG();
+        }
+    }
+
+    for ( i = 0; i < kinfo->mem.nr_banks; i++ )
+    {
+        printk("BANK[%d] %#"PRIpaddr"-%#"PRIpaddr" (%ldMB)\n",
+                i,
+                kinfo->mem.bank[i].start,
+                kinfo->mem.bank[i].start + kinfo->mem.bank[i].size,
+                /* Don't want to format this as PRIpaddr (16-digit hex) */
+                (unsigned long)(kinfo->mem.bank[i].size >> 20));
+    }
+}
+
 /*
  * This is all pretty horrible.
  *
@@ -2477,8 +2511,7 @@ static int __init construct_domU(struct domain *d,
                    banks[i].start, banks[i].size);
         }
 
-        /* reserve_memory_11(d, &kinfo, &banks[0], i); */
-        BUG();
+        reserve_memory_11(d, &kinfo, &banks[0], i);
     }
 
     rc = prepare_dtb_domU(d, &kinfo);
-- 
2.17.1




* [PATCH 07/12] xen/arm: new vgic: rename vgic_cpu/dist_base to c/dbase
  2020-04-15  1:02 [PATCH 0/12] direct-map DomUs Stefano Stabellini
                   ` (5 preceding siblings ...)
  2020-04-15  1:02 ` [PATCH 06/12] xen/arm: reserve 1:1 memory for direct_map domUs Stefano Stabellini
@ 2020-04-15  1:02 ` Stefano Stabellini
  2020-04-15 13:41   ` Julien Grall
  2020-04-15  1:02 ` [PATCH 08/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv2 Stefano Stabellini
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-15  1:02 UTC (permalink / raw)
  To: xen-devel; +Cc: Volodymyr_Babchuk, sstabellini, julien, Stefano Stabellini

Rename the fields to be uniform with the old vgic. Name uniformity will
become immediately useful in the following patch.

In vgic_v2_map_resources, use the fields in struct vgic_dist rather than
local variables.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
---
 xen/arch/arm/vgic/vgic-init.c  |  4 ++--
 xen/arch/arm/vgic/vgic-v2.c    | 16 ++++++++--------
 xen/include/asm-arm/new_vgic.h |  4 ++--
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/vgic/vgic-init.c b/xen/arch/arm/vgic/vgic-init.c
index 62ae553699..ea739d081e 100644
--- a/xen/arch/arm/vgic/vgic-init.c
+++ b/xen/arch/arm/vgic/vgic-init.c
@@ -112,8 +112,8 @@ int domain_vgic_register(struct domain *d, int *mmio_count)
         BUG();
     }
 
-    d->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
-    d->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
+    d->arch.vgic.dbase = VGIC_ADDR_UNDEF;
+    d->arch.vgic.cbase = VGIC_ADDR_UNDEF;
     d->arch.vgic.vgic_redist_base = VGIC_ADDR_UNDEF;
 
     return 0;
diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
index b5ba4ace87..09cb92039a 100644
--- a/xen/arch/arm/vgic/vgic-v2.c
+++ b/xen/arch/arm/vgic/vgic-v2.c
@@ -258,7 +258,7 @@ void vgic_v2_enable(struct vcpu *vcpu)
 int vgic_v2_map_resources(struct domain *d)
 {
     struct vgic_dist *dist = &d->arch.vgic;
-    paddr_t cbase, csize;
+    paddr_t csize;
     paddr_t vbase;
     int ret;
 
@@ -268,7 +268,7 @@ int vgic_v2_map_resources(struct domain *d)
      */
     if ( is_hardware_domain(d) )
     {
-        d->arch.vgic.vgic_dist_base = gic_v2_hw_data.dbase;
+        d->arch.vgic.dbase = gic_v2_hw_data.dbase;
         /*
          * For the hardware domain, we always map the whole HW CPU
          * interface region in order to match the device tree (the "reg"
@@ -276,13 +276,13 @@ int vgic_v2_map_resources(struct domain *d)
          * Note that we assume the size of the CPU interface is always
          * aligned to PAGE_SIZE.
          */
-        cbase = gic_v2_hw_data.cbase;
+        d->arch.vgic.cbase = gic_v2_hw_data.cbase;
         csize = gic_v2_hw_data.csize;
         vbase = gic_v2_hw_data.vbase;
     }
     else
     {
-        d->arch.vgic.vgic_dist_base = GUEST_GICD_BASE;
+        d->arch.vgic.dbase = GUEST_GICD_BASE;
         /*
          * The CPU interface exposed to the guest is always 8kB. We may
          * need to add an offset to the virtual CPU interface base
@@ -290,13 +290,13 @@ int vgic_v2_map_resources(struct domain *d)
          * region.
          */
         BUILD_BUG_ON(GUEST_GICC_SIZE != SZ_8K);
-        cbase = GUEST_GICC_BASE;
+        d->arch.vgic.cbase = GUEST_GICC_BASE;
         csize = GUEST_GICC_SIZE;
         vbase = gic_v2_hw_data.vbase + gic_v2_hw_data.aliased_offset;
     }
 
 
-    ret = vgic_register_dist_iodev(d, gaddr_to_gfn(dist->vgic_dist_base),
+    ret = vgic_register_dist_iodev(d, gaddr_to_gfn(dist->dbase),
                                    VGIC_V2);
     if ( ret )
     {
@@ -308,8 +308,8 @@ int vgic_v2_map_resources(struct domain *d)
      * Map the gic virtual cpu interface in the gic cpu interface
      * region of the guest.
      */
-    ret = map_mmio_regions(d, gaddr_to_gfn(cbase), csize / PAGE_SIZE,
-                           maddr_to_mfn(vbase));
+    ret = map_mmio_regions(d, gaddr_to_gfn(d->arch.vgic.cbase),
+                           csize / PAGE_SIZE, maddr_to_mfn(vbase));
     if ( ret )
     {
         gdprintk(XENLOG_ERR, "Unable to remap VGIC CPU to VCPU\n");
diff --git a/xen/include/asm-arm/new_vgic.h b/xen/include/asm-arm/new_vgic.h
index 97d622bff6..e3319900ab 100644
--- a/xen/include/asm-arm/new_vgic.h
+++ b/xen/include/asm-arm/new_vgic.h
@@ -115,11 +115,11 @@ struct vgic_dist {
     unsigned int        nr_spis;
 
     /* base addresses in guest physical address space: */
-    paddr_t             vgic_dist_base;     /* distributor */
+    paddr_t             dbase;     /* distributor */
     union
     {
         /* either a GICv2 CPU interface */
-        paddr_t         vgic_cpu_base;
+        paddr_t         cbase;
         /* or a number of GICv3 redistributor regions */
         struct
         {
-- 
2.17.1




* [PATCH 08/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv2
  2020-04-15  1:02 [PATCH 0/12] direct-map DomUs Stefano Stabellini
                   ` (6 preceding siblings ...)
  2020-04-15  1:02 ` [PATCH 07/12] xen/arm: new vgic: rename vgic_cpu/dist_base to c/dbase Stefano Stabellini
@ 2020-04-15  1:02 ` Stefano Stabellini
  2020-04-15 14:00   ` Julien Grall
  2020-04-15  1:02 ` [PATCH 09/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv3 Stefano Stabellini
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-15  1:02 UTC (permalink / raw)
  To: xen-devel; +Cc: Volodymyr_Babchuk, sstabellini, julien, Stefano Stabellini

Today we use native addresses to map the GICv2 for Dom0 and fixed
addresses for DomUs.

This patch changes the behavior so that native addresses are used for
any domain that is_domain_direct_mapped.
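
Concretely, the check in vgic_v2_domain_init (and in the new vgic's
vgic_v2_map_resources) changes from

    if ( is_hardware_domain(d) )

to

    if ( is_domain_direct_mapped(d) )

so direct-mapped domUs take the native-address path that was previously
reserved for the hardware domain.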

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
---
 xen/arch/arm/domain_build.c | 10 +++++++---
 xen/arch/arm/vgic-v2.c      | 12 ++++++------
 xen/arch/arm/vgic/vgic-v2.c |  2 +-
 xen/include/asm-arm/vgic.h  |  1 +
 4 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 627e0c5e8e..303bee60f6 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1643,8 +1643,12 @@ static int __init make_gicv2_domU_node(struct kernel_info *kinfo)
     int res = 0;
     __be32 reg[(GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) * 2];
     __be32 *cells;
+    struct domain *d = kinfo->d;
+    char buf[38];
 
-    res = fdt_begin_node(fdt, "interrupt-controller@"__stringify(GUEST_GICD_BASE));
+    snprintf(buf, sizeof(buf), "interrupt-controller@%"PRIx64,
+             d->arch.vgic.dbase);
+    res = fdt_begin_node(fdt, buf);
     if ( res )
         return res;
 
@@ -1666,9 +1670,9 @@ static int __init make_gicv2_domU_node(struct kernel_info *kinfo)
 
     cells = &reg[0];
     dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
-                       GUEST_GICD_BASE, GUEST_GICD_SIZE);
+                       d->arch.vgic.dbase, GUEST_GICD_SIZE);
     dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
-                       GUEST_GICC_BASE, GUEST_GICC_SIZE);
+                       d->arch.vgic.cbase, GUEST_GICC_SIZE);
 
     res = fdt_property(fdt, "reg", reg, sizeof(reg));
     if (res)
diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index 64b141fea5..9a74e2ed38 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -650,14 +650,14 @@ static int vgic_v2_vcpu_init(struct vcpu *v)
 static int vgic_v2_domain_init(struct domain *d)
 {
     int ret;
-    paddr_t cbase, csize;
+    paddr_t csize;
     paddr_t vbase;
 
     /*
      * The hardware domain gets the hardware address.
      * Guests get the virtual platform layout.
      */
-    if ( is_hardware_domain(d) )
+    if ( is_domain_direct_mapped(d) )
     {
         d->arch.vgic.dbase = vgic_v2_hw.dbase;
         /*
@@ -667,7 +667,7 @@ static int vgic_v2_domain_init(struct domain *d)
          * Note that we assume the size of the CPU interface is always
          * aligned to PAGE_SIZE.
          */
-        cbase = vgic_v2_hw.cbase;
+        d->arch.vgic.cbase = vgic_v2_hw.cbase;
         csize = vgic_v2_hw.csize;
         vbase = vgic_v2_hw.vbase;
     }
@@ -681,7 +681,7 @@ static int vgic_v2_domain_init(struct domain *d)
          * region.
          */
         BUILD_BUG_ON(GUEST_GICC_SIZE != SZ_8K);
-        cbase = GUEST_GICC_BASE;
+        d->arch.vgic.cbase = GUEST_GICC_BASE;
         csize = GUEST_GICC_SIZE;
         vbase = vgic_v2_hw.vbase + vgic_v2_hw.aliased_offset;
     }
@@ -690,8 +690,8 @@ static int vgic_v2_domain_init(struct domain *d)
      * Map the gic virtual cpu interface in the gic cpu interface
      * region of the guest.
      */
-    ret = map_mmio_regions(d, gaddr_to_gfn(cbase), csize / PAGE_SIZE,
-                           maddr_to_mfn(vbase));
+    ret = map_mmio_regions(d, gaddr_to_gfn(d->arch.vgic.cbase),
+                           csize / PAGE_SIZE, maddr_to_mfn(vbase));
     if ( ret )
         return ret;
 
diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
index 09cb92039a..275dd8bea9 100644
--- a/xen/arch/arm/vgic/vgic-v2.c
+++ b/xen/arch/arm/vgic/vgic-v2.c
@@ -266,7 +266,7 @@ int vgic_v2_map_resources(struct domain *d)
      * The hardware domain gets the hardware address.
      * Guests get the virtual platform layout.
      */
-    if ( is_hardware_domain(d) )
+    if ( is_domain_direct_mapped(d) )
     {
         d->arch.vgic.dbase = gic_v2_hw_data.dbase;
         /*
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index ce1e3c4bbd..f151d98773 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -152,6 +152,7 @@ struct vgic_dist {
     struct pending_irq *pending_irqs;
     /* Base address for guest GIC */
     paddr_t dbase; /* Distributor base address */
+    paddr_t cbase; /* CPU interface base address */
 #ifdef CONFIG_GICV3
     /* GIC V3 addressing */
     /* List of contiguous occupied by the redistributors */
-- 
2.17.1




* [PATCH 09/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv3
  2020-04-15  1:02 [PATCH 0/12] direct-map DomUs Stefano Stabellini
                   ` (7 preceding siblings ...)
  2020-04-15  1:02 ` [PATCH 08/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv2 Stefano Stabellini
@ 2020-04-15  1:02 ` Stefano Stabellini
  2020-04-15 14:09   ` Julien Grall
  2020-04-15  1:02 ` [PATCH 10/12] xen/arm: if is_domain_direct_mapped use native UART address for vPL011 Stefano Stabellini
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-15  1:02 UTC (permalink / raw)
  To: xen-devel; +Cc: Volodymyr_Babchuk, sstabellini, julien, Stefano Stabellini

Today we use native addresses to map the GICv3 for Dom0 and fixed
addresses for DomUs.

This patch changes the behavior so that native addresses are used for
any domain that is_domain_direct_mapped. The patch has to guard one hunk
with !defined(CONFIG_NEW_VGIC) because the new vgic doesn't support
GICv3.
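
A direct-mapped domU gets the native distributor base, but, unlike the
hardware domain, only the first redistributor region and a fixed number
of interrupt ID bits (from the diff below):

    nr_rdist_regions = 1;
    d->arch.vgic.intid_bits = 10;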

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
---
 xen/arch/arm/domain_build.c | 12 +++++++++---
 xen/arch/arm/vgic-v3.c      | 18 ++++++++++++++----
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 303bee60f6..beec0a144c 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1697,8 +1697,12 @@ static int __init make_gicv3_domU_node(struct kernel_info *kinfo)
     int res = 0;
     __be32 reg[(GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) * 2];
     __be32 *cells;
+    struct domain *d = kinfo->d;
+    char buf[38];
 
-    res = fdt_begin_node(fdt, "interrupt-controller@"__stringify(GUEST_GICV3_GICD_BASE));
+    snprintf(buf, sizeof(buf), "interrupt-controller@%"PRIx64,
+             d->arch.vgic.dbase);
+    res = fdt_begin_node(fdt, buf);
     if ( res )
         return res;
 
@@ -1720,9 +1724,11 @@ static int __init make_gicv3_domU_node(struct kernel_info *kinfo)
 
     cells = &reg[0];
     dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
-                       GUEST_GICV3_GICD_BASE, GUEST_GICV3_GICD_SIZE);
+                       d->arch.vgic.dbase, GUEST_GICV3_GICD_SIZE);
+#if defined(CONFIG_GICV3) && !defined(CONFIG_NEW_VGIC)
     dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
-                       GUEST_GICV3_GICR0_BASE, GUEST_GICV3_GICR0_SIZE);
+                       d->arch.vgic.rdist_regions[0].base, GUEST_GICV3_GICR0_SIZE);
+#endif
 
     res = fdt_property(fdt, "reg", reg, sizeof(reg));
     if (res)
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 4e60ba15cc..4cf430f865 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -1677,13 +1677,25 @@ static int vgic_v3_domain_init(struct domain *d)
      * Domain 0 gets the hardware address.
      * Guests get the virtual platform layout.
      */
-    if ( is_hardware_domain(d) )
+    if ( is_domain_direct_mapped(d) )
     {
         unsigned int first_cpu = 0;
+        unsigned int nr_rdist_regions;
 
         d->arch.vgic.dbase = vgic_v3_hw.dbase;
 
-        for ( i = 0; i < vgic_v3_hw.nr_rdist_regions; i++ )
+        if ( is_hardware_domain(d) )
+        {
+            nr_rdist_regions = vgic_v3_hw.nr_rdist_regions;
+            d->arch.vgic.intid_bits = vgic_v3_hw.intid_bits;
+        }
+        else
+        {
+            nr_rdist_regions = 1;
+            d->arch.vgic.intid_bits = 10;
+        }
+
+        for ( i = 0; i < nr_rdist_regions; i++ )
         {
             paddr_t size = vgic_v3_hw.regions[i].size;
 
@@ -1706,8 +1718,6 @@ static int vgic_v3_domain_init(struct domain *d)
          * exposing unused region as they will not get emulated.
          */
         d->arch.vgic.nr_regions = i + 1;
-
-        d->arch.vgic.intid_bits = vgic_v3_hw.intid_bits;
     }
     else
     {
-- 
2.17.1




* [PATCH 10/12] xen/arm: if is_domain_direct_mapped use native UART address for vPL011
  2020-04-15  1:02 [PATCH 0/12] direct-map DomUs Stefano Stabellini
                   ` (8 preceding siblings ...)
  2020-04-15  1:02 ` [PATCH 09/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv3 Stefano Stabellini
@ 2020-04-15  1:02 ` Stefano Stabellini
  2020-04-15 14:11   ` Julien Grall
  2020-04-15  1:02 ` [PATCH 11/12] xen/arm: if xen_force don't try to setup the IOMMU Stefano Stabellini
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-15  1:02 UTC (permalink / raw)
  To: xen-devel; +Cc: Volodymyr_Babchuk, sstabellini, julien, Stefano Stabellini

We always use a fixed address to map the vPL011 into domains. That
address could clash with the memory map of domains that are directly
mapped.

Instead, for domains that are directly mapped, reuse the address of the
physical UART on the platform to avoid potential clashes.
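
The selection logic added to domain_vpl011_init (see the diff below)
is:

    if ( is_domain_direct_mapped(d) && uart != NULL )
        d->arch.vpl011_addr = uart->base_addr;
    else
        d->arch.vpl011_addr = GUEST_PL011_BASE;

so non-direct-mapped domains keep using the fixed GUEST_PL011_BASE.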

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
---
 xen/arch/arm/domain_build.c  | 13 ++++++++-----
 xen/arch/arm/vpl011.c        | 12 +++++++++---
 xen/include/asm-arm/domain.h |  1 +
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index beec0a144c..9bc0b810a7 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1768,8 +1768,11 @@ static int __init make_vpl011_uart_node(struct kernel_info *kinfo)
     gic_interrupt_t intr;
     __be32 reg[GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS];
     __be32 *cells;
+    struct domain *d = kinfo->d;
+    char buf[27];
 
-    res = fdt_begin_node(fdt, "sbsa-uart@"__stringify(GUEST_PL011_BASE));
+    snprintf(buf, sizeof(buf), "sbsa-uart@%"PRIx64, d->arch.vpl011_addr);
+    res = fdt_begin_node(fdt, buf);
     if ( res )
         return res;
 
@@ -1779,7 +1782,7 @@ static int __init make_vpl011_uart_node(struct kernel_info *kinfo)
 
     cells = &reg[0];
     dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS,
-                       GUEST_ROOT_SIZE_CELLS, GUEST_PL011_BASE,
+                       GUEST_ROOT_SIZE_CELLS, d->arch.vpl011_addr,
                        GUEST_PL011_SIZE);
 
     res = fdt_property(fdt, "reg", reg, sizeof(reg));
@@ -2524,6 +2527,9 @@ static int __init construct_domU(struct domain *d,
         reserve_memory_11(d, &kinfo, &banks[0], i);
     }
 
+    if ( kinfo.vpl011 )
+        rc = domain_vpl011_init(d, NULL);
+
     rc = prepare_dtb_domU(d, &kinfo);
     if ( rc < 0 )
         return rc;
@@ -2532,9 +2538,6 @@ static int __init construct_domU(struct domain *d,
     if ( rc < 0 )
         return rc;
 
-    if ( kinfo.vpl011 )
-        rc = domain_vpl011_init(d, NULL);
-
     return rc;
 }
 
diff --git a/xen/arch/arm/vpl011.c b/xen/arch/arm/vpl011.c
index 895f436cc4..44173a76fd 100644
--- a/xen/arch/arm/vpl011.c
+++ b/xen/arch/arm/vpl011.c
@@ -347,7 +347,7 @@ static int vpl011_mmio_read(struct vcpu *v,
                             void *priv)
 {
     struct hsr_dabt dabt = info->dabt;
-    uint32_t vpl011_reg = (uint32_t)(info->gpa - GUEST_PL011_BASE);
+    uint32_t vpl011_reg = (uint32_t)(info->gpa - v->domain->arch.vpl011_addr);
     struct vpl011 *vpl011 = &v->domain->arch.vpl011;
     struct domain *d = v->domain;
     unsigned long flags;
@@ -430,7 +430,7 @@ static int vpl011_mmio_write(struct vcpu *v,
                              void *priv)
 {
     struct hsr_dabt dabt = info->dabt;
-    uint32_t vpl011_reg = (uint32_t)(info->gpa - GUEST_PL011_BASE);
+    uint32_t vpl011_reg = (uint32_t)(info->gpa - v->domain->arch.vpl011_addr);
     struct vpl011 *vpl011 = &v->domain->arch.vpl011;
     struct domain *d = v->domain;
     unsigned long flags;
@@ -622,10 +622,16 @@ int domain_vpl011_init(struct domain *d, struct vpl011_init_info *info)
 {
     int rc;
     struct vpl011 *vpl011 = &d->arch.vpl011;
+    const struct vuart_info *uart = serial_vuart_info(SERHND_DTUART);
 
     if ( vpl011->backend.dom.ring_buf )
         return -EINVAL;
 
+    if ( is_domain_direct_mapped(d) && uart != NULL )
+        d->arch.vpl011_addr = uart->base_addr;
+    else
+        d->arch.vpl011_addr = GUEST_PL011_BASE;
+
     /*
      * info is NULL when the backend is in Xen.
      * info is != NULL when the backend is in a domain.
@@ -673,7 +679,7 @@ int domain_vpl011_init(struct domain *d, struct vpl011_init_info *info)
     spin_lock_init(&vpl011->lock);
 
     register_mmio_handler(d, &vpl011_mmio_handler,
-                          GUEST_PL011_BASE, GUEST_PL011_SIZE, NULL);
+                          d->arch.vpl011_addr, GUEST_PL011_SIZE, NULL);
 
     return 0;
 
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 7a498921bf..52741895c8 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -100,6 +100,7 @@ struct arch_domain
 #endif
 
     bool direct_map;
+    paddr_t vpl011_addr;
 }  __cacheline_aligned;
 
 struct arch_xen_dom_flags
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 11/12] xen/arm: if xen_force don't try to setup the IOMMU
  2020-04-15  1:02 [PATCH 0/12] direct-map DomUs Stefano Stabellini
                   ` (9 preceding siblings ...)
  2020-04-15  1:02 ` [PATCH 10/12] xen/arm: if is_domain_direct_mapped use native UART address for vPL011 Stefano Stabellini
@ 2020-04-15  1:02 ` Stefano Stabellini
  2020-04-15 14:12   ` Julien Grall
  2020-04-15  1:02 ` [PATCH 12/12] xen/arm: call iomem_permit_access for passthrough devices Stefano Stabellini
  2020-04-16  8:59 ` [PATCH 0/12] direct-map DomUs Julien Grall
  12 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-15  1:02 UTC (permalink / raw)
  To: xen-devel; +Cc: Volodymyr_Babchuk, sstabellini, julien, Stefano Stabellini

If xen_force (which means xen,force-assign-without-iommu was requested),
don't try to add the device to the IOMMU. Return early instead.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
---
 xen/arch/arm/domain_build.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 9bc0b810a7..d0fc497d07 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1884,12 +1884,14 @@ static int __init handle_passthrough_prop(struct kernel_info *kinfo,
     if ( res < 0 )
         return res;
 
+    if ( xen_force )
+        return 0;
+
     res = iommu_add_dt_device(node);
     if ( res < 0 )
         return res;
 
-    /* If xen_force, we allow assignment of devices without IOMMU protection. */
-    if ( xen_force && !dt_device_is_protected(node) )
+    if ( !dt_device_is_protected(node) )
         return 0;
 
     return iommu_assign_dt_device(kinfo->d, node);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH 12/12] xen/arm: call iomem_permit_access for passthrough devices
  2020-04-15  1:02 [PATCH 0/12] direct-map DomUs Stefano Stabellini
                   ` (10 preceding siblings ...)
  2020-04-15  1:02 ` [PATCH 11/12] xen/arm: if xen_force don't try to setup the IOMMU Stefano Stabellini
@ 2020-04-15  1:02 ` Stefano Stabellini
  2020-04-15 14:18   ` Julien Grall
  2020-04-16  8:59 ` [PATCH 0/12] direct-map DomUs Julien Grall
  12 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-15  1:02 UTC (permalink / raw)
  To: xen-devel; +Cc: Volodymyr_Babchuk, sstabellini, julien, Stefano Stabellini

iomem_permit_access should be called for MMIO regions of devices
assigned to a domain. Currently it is not called for MMIO regions of
passthrough devices of Dom0less guests. This patch fixes it.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
---
 xen/arch/arm/domain_build.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index d0fc497d07..d3d42eef5d 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1846,6 +1846,17 @@ static int __init handle_passthrough_prop(struct kernel_info *kinfo,
             return -EINVAL;
         }
 
+        res = iomem_permit_access(kinfo->d, paddr_to_pfn(mstart),
+                                  paddr_to_pfn(PAGE_ALIGN(mstart + size - 1)));
+        if ( res )
+        {
+            printk(XENLOG_ERR "Unable to permit to dom%d access to"
+                   " 0x%"PRIx64" - 0x%"PRIx64"\n",
+                   kinfo->d->domain_id,
+                   mstart & PAGE_MASK, PAGE_ALIGN(mstart + size) - 1);
+            return res;
+        }
+
         res = map_regions_p2mt(kinfo->d,
                                gaddr_to_gfn(gstart),
                                PFN_DOWN(size),
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH 01/12] xen: introduce xen_dom_flags
  2020-04-15  1:02 ` [PATCH 01/12] xen: introduce xen_dom_flags Stefano Stabellini
@ 2020-04-15  9:12   ` Jan Beulich
  2020-04-15 13:26     ` Julien Grall
  2020-04-29 23:57     ` Stefano Stabellini
  0 siblings, 2 replies; 67+ messages in thread
From: Jan Beulich @ 2020-04-15  9:12 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: julien, Wei Liu, George Dunlap, andrew.cooper3, Ian Jackson,
	Dario Faggioli, xen-devel, Stefano Stabellini, Volodymyr_Babchuk,
	Roger Pau Monné

On 15.04.2020 03:02, Stefano Stabellini wrote:
> We are passing an extra special boolean flag at domain creation to
> specify whether we want the domain to be privileged (i.e. dom0) or
> not. Another flag will be introduced later in this series.
> 
> Introduce a new struct xen_dom_flags and move the privileged flag to it.
> Other flags will be added to struct xen_dom_flags.

I'm unsure whether introducing a 2nd structure is worth it here.
We could as well define some internal-use-only flags for
struct xen_domctl_createdomain's respective field.
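
Something along these lines, as a rough sketch (flag name and bit
position made up here; do_domctl() would of course need to reject the
bit when it comes in from a hypercall):

    /* Internal-use-only flag, never accepted from or exposed to tools. */
    #define XEN_DOMCTL_CDF_INTERNAL_priv  (1U << 31)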

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -529,7 +529,8 @@ static bool emulation_flags_ok(const struct domain *d, uint32_t emflags)
>  }
>  
>  int arch_domain_create(struct domain *d,
> -                       struct xen_domctl_createdomain *config)
> +                       struct xen_domctl_createdomain *config,
> +                       struct xen_dom_flags *flags)

const (also elsewhere)?

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -706,6 +706,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>          .max_maptrack_frames = -1,
>      };
>      const char *hypervisor_name;
> +    struct xen_dom_flags flags = { !pv_shim };

Here and elsewhere please use field designators right away, even if
there's only a single field now.
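
I.e. something like (illustrative only):

    struct xen_dom_flags flags = { .is_priv = !pv_shim };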

> @@ -363,7 +363,7 @@ struct domain *domain_create(domid_t domid,
>      ASSERT(is_system_domain(d) ? config == NULL : config != NULL);
>  
>      /* Sort out our idea of is_control_domain(). */
> -    d->is_privileged = is_priv;
> +    d->is_privileged =  flags ? flags->is_priv : false;

Stray double blanks.

> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -364,6 +364,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>      bool_t copyback = 0;
>      struct xen_domctl curop, *op = &curop;
>      struct domain *d;
> +    struct xen_dom_flags flags ={ false };

Missing blank.

> --- a/xen/include/xen/domain.h
> +++ b/xen/include/xen/domain.h
> @@ -63,8 +63,13 @@ void arch_vcpu_destroy(struct vcpu *v);
>  int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset);
>  void unmap_vcpu_info(struct vcpu *v);
>  
> +struct xen_dom_flags {
> +    bool is_priv;

Use a single bit bitfield instead? May even want to consider passing
this struct by value then.
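
E.g., as a sketch reusing the names from the patch:

    struct xen_dom_flags {
        bool is_priv:1;
    };

    int arch_domain_create(struct domain *d,
                           struct xen_domctl_createdomain *config,
                           struct xen_dom_flags flags);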

Jan


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 02/12] xen/arm: introduce arch_xen_dom_flags and direct_map
  2020-04-15  1:02 ` [PATCH 02/12] xen/arm: introduce arch_xen_dom_flags and direct_map Stefano Stabellini
@ 2020-04-15 10:27   ` Jan Beulich
  2020-04-15 11:27     ` Andrew Cooper
  2020-04-30  0:34     ` Stefano Stabellini
  0 siblings, 2 replies; 67+ messages in thread
From: Jan Beulich @ 2020-04-15 10:27 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: julien, Wei Liu, George Dunlap, andrew.cooper3, Ian Jackson,
	xen-devel, Stefano Stabellini, Volodymyr_Babchuk,
	Roger Pau Monné

On 15.04.2020 03:02, Stefano Stabellini wrote:
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -682,6 +682,7 @@ int arch_domain_create(struct domain *d,
>          return 0;
>  
>      ASSERT(config != NULL);
> +    d->arch.direct_map = flags != NULL ? flags->arch.is_direct_map : false;

Shouldn't "true" here imply ->disable_migrate also getting
set to true? Or is this already the case anyway for domains
created right at boot?

> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -2527,6 +2527,7 @@ int __init construct_dom0(struct domain *d)
>  
>      iommu_hwdom_init(d);
>  
> +    d->arch.direct_map = true;

Shouldn't this get set via arch_domain_create() instead?

> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -418,6 +418,8 @@ struct arch_domain
>      uint32_t emulation_flags;
>  } __cacheline_aligned;
>  
> +struct arch_xen_dom_flags {};

This trivial x86 change
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 04/12] xen: split alloc_heap_pages in two halves for reusability
  2020-04-15  1:02 ` [PATCH 04/12] xen: split alloc_heap_pages in two halves for reusability Stefano Stabellini
@ 2020-04-15 11:22   ` Wei Liu
  2020-04-17 10:02   ` Jan Beulich
  1 sibling, 0 replies; 67+ messages in thread
From: Wei Liu @ 2020-04-15 11:22 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: julien, Wei Liu, andrew.cooper3, Ian Jackson, George Dunlap,
	jbeulich, xen-devel, Stefano Stabellini, Volodymyr_Babchuk

On Tue, Apr 14, 2020 at 06:02:47PM -0700, Stefano Stabellini wrote:
> This patch splits the implementation of alloc_heap_pages into two halves
> so that the second half can be reused by the next patch.

It would be good if you could put a summary of what each half does here so
that we can check your intent against the implementation.

Wei.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 02/12] xen/arm: introduce arch_xen_dom_flags and direct_map
  2020-04-15 10:27   ` Jan Beulich
@ 2020-04-15 11:27     ` Andrew Cooper
  2020-04-30  0:34     ` Stefano Stabellini
  1 sibling, 0 replies; 67+ messages in thread
From: Andrew Cooper @ 2020-04-15 11:27 UTC (permalink / raw)
  To: Jan Beulich, Stefano Stabellini
  Cc: julien, Wei Liu, George Dunlap, Ian Jackson, xen-devel,
	Stefano Stabellini, Volodymyr_Babchuk, Roger Pau Monné

On 15/04/2020 11:27, Jan Beulich wrote:
> On 15.04.2020 03:02, Stefano Stabellini wrote:
>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -682,6 +682,7 @@ int arch_domain_create(struct domain *d,
>>          return 0;
>>  
>>      ASSERT(config != NULL);
>> +    d->arch.direct_map = flags != NULL ? flags->arch.is_direct_map : false;
> Shouldn't "true" here imply ->disable_migrate also getting
> set to true? Or is this already the case anyway for domains
> created right at boot?

Please don't introduce any more uses of disable_migrate.  It is
fundamentally broken and won't survive to 4.14 (assuming that the 30-odd
patches of prereq fixes on xen-devel start unblocking themselves in time).

~Andrew


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 05/12] xen: introduce reserve_heap_pages
  2020-04-15  1:02 ` [PATCH 05/12] xen: introduce reserve_heap_pages Stefano Stabellini
@ 2020-04-15 13:24   ` Julien Grall
  2020-04-17 10:11   ` Jan Beulich
  1 sibling, 0 replies; 67+ messages in thread
From: Julien Grall @ 2020-04-15 13:24 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel
  Cc: Wei Liu, andrew.cooper3, Ian Jackson, George Dunlap, jbeulich,
	Stefano Stabellini, Volodymyr_Babchuk

On 15/04/2020 02:02, Stefano Stabellini wrote:
> Introduce a function named reserve_heap_pages (similar to
> alloc_heap_pages) that allocates a requested memory range. Call
> __alloc_heap_pages for the implementation.
> 
> Change __alloc_heap_pages so that the original page doesn't get
> modified, giving back unneeded memory top to bottom rather than bottom
> to top.
> 
> Also introduce a function named reserve_domheap_pages, similar to
> alloc_domheap_pages, that checks memflags before calling
> reserve_heap_pages. On success, it also assigns the pages to the domain
> via assign_pages.

Xen may have already allocated part of the region for its own purposes or 
for another domain, so this will not work reliably.

We have the same issue with LiveUpdate, as memory has to be preserved. 
We need to mark the pages reserved before any allocation (including early 
boot allocation) so nobody can use them. See [1].

Cheers,

[1]  Live update: boot memory management, data stream handling, record 
format <a92287c03fed310e08ba40063e370038569b94ac.camel@infradead.org>

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 01/12] xen: introduce xen_dom_flags
  2020-04-15  9:12   ` Jan Beulich
@ 2020-04-15 13:26     ` Julien Grall
  2020-04-29 23:57     ` Stefano Stabellini
  1 sibling, 0 replies; 67+ messages in thread
From: Julien Grall @ 2020-04-15 13:26 UTC (permalink / raw)
  To: Jan Beulich, Stefano Stabellini
  Cc: Wei Liu, George Dunlap, andrew.cooper3, Ian Jackson,
	Dario Faggioli, xen-devel, Stefano Stabellini, Volodymyr_Babchuk,
	Roger Pau Monné



On 15/04/2020 10:12, Jan Beulich wrote:
> On 15.04.2020 03:02, Stefano Stabellini wrote:
>> We are passing an extra special boolean flag at domain creation to
>> specify whether we want the domain to be privileged (i.e. dom0) or
>> not. Another flag will be introduced later in this series.
>>
>> Introduce a new struct xen_dom_flags and move the privileged flag to it.
>> Other flags will be added to struct xen_dom_flags.
> 
> I'm unsure whether introducing a 2nd structure is worth it here.
> We could as well define some internal-use-only flags for
> struct xen_domctl_createdomain's respective field.

+1.

> 
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -529,7 +529,8 @@ static bool emulation_flags_ok(const struct domain *d, uint32_t emflags)
>>   }
>>   
>>   int arch_domain_create(struct domain *d,
>> -                       struct xen_domctl_createdomain *config)
>> +                       struct xen_domctl_createdomain *config,
>> +                       struct xen_dom_flags *flags)
> 
> const (also elsewhere)?
> 
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -706,6 +706,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>           .max_maptrack_frames = -1,
>>       };
>>       const char *hypervisor_name;
>> +    struct xen_dom_flags flags = { !pv_shim };
> 
> Here and elsewhere please use field designators right away, even if
> there's only a single field now.
> 
>> @@ -363,7 +363,7 @@ struct domain *domain_create(domid_t domid,
>>       ASSERT(is_system_domain(d) ? config == NULL : config != NULL);
>>   
>>       /* Sort out our idea of is_control_domain(). */
>> -    d->is_privileged = is_priv;
>> +    d->is_privileged =  flags ? flags->is_priv : false;
> 
> Stray double blanks.
> 
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -364,6 +364,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>>       bool_t copyback = 0;
>>       struct xen_domctl curop, *op = &curop;
>>       struct domain *d;
>> +    struct xen_dom_flags flags ={ false };
> 
> Missing blank.
> 
>> --- a/xen/include/xen/domain.h
>> +++ b/xen/include/xen/domain.h
>> @@ -63,8 +63,13 @@ void arch_vcpu_destroy(struct vcpu *v);
>>   int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset);
>>   void unmap_vcpu_info(struct vcpu *v);
>>   
>> +struct xen_dom_flags {
>> +    bool is_priv;
> 
> Use a single bit bitfield instead? May even want to consider passing
> this struct by value then.

This is an alternative if extending xen_domctl_createdomain is not a 
solution. The bitfield is easier to extend because we don't need to 
create a new field for each flag in struct domain.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 03/12] xen/arm: introduce 1:1 mapping for domUs
  2020-04-15  1:02 ` [PATCH 03/12] xen/arm: introduce 1:1 mapping for domUs Stefano Stabellini
@ 2020-04-15 13:36   ` Julien Grall
  2020-05-01  1:26     ` Stefano Stabellini
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-04-15 13:36 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: Stefano Stabellini, Volodymyr_Babchuk



On 15/04/2020 02:02, Stefano Stabellini wrote:
> In some cases it is desirable to map domU memory 1:1 (guest physical ==
> physical.) For instance, because we want to assign a device to the domU
> but the IOMMU is not present or cannot be used. In these cases, other
> mechanisms should be used for DMA protection, e.g. a MPU.

I am not against this; however, the documentation should clearly explain 
that you are making your platform insecure unless you have other means 
of DMA protection.

> 
> This patch introduces a new device tree option for dom0less guests to
> request a domain to be directly mapped. It also specifies the memory
> ranges. This patch documents the new attribute and parses it at boot
> time. (However, the implementation of 1:1 mapping is missing and just
> BUG()s out at the moment.)  Finally, the patch sets the new direct_map
> flag for DomU domains.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> ---
>   docs/misc/arm/device-tree/booting.txt | 13 +++++++
>   docs/misc/arm/passthrough-noiommu.txt | 35 ++++++++++++++++++
>   xen/arch/arm/domain_build.c           | 52 +++++++++++++++++++++++++--
>   3 files changed, 98 insertions(+), 2 deletions(-)
>   create mode 100644 docs/misc/arm/passthrough-noiommu.txt
> 
> diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
> index 5243bc7fd3..fce5f7ed5a 100644
> --- a/docs/misc/arm/device-tree/booting.txt
> +++ b/docs/misc/arm/device-tree/booting.txt
> @@ -159,6 +159,19 @@ with the following properties:
>       used, or GUEST_VPL011_SPI+1 if vpl011 is enabled, whichever is
>       greater.
>   
> +- direct-map
> +
> +    Optional. An array of integer pairs specifying addresses and sizes.
> +    direct_map requests the memory of the domain to be 1:1 mapped with
> +    the memory ranges specified as argument. Only sizes that are a
> +    power of two number of pages are allowed.
> +
> +- #direct-map-addr-cells and #direct-map-size-cells
> +
> +    The number of cells to use for the addresses and for the sizes in
> +    direct-map. Default and maximum are 2 cells for both addresses and
> +    sizes.
> +

As this is going to be mostly used for passthrough, can't we take 
advantage of the partial device-tree and describe the memory region 
using a memory node?

>   - #address-cells and #size-cells
>   
>       Both #address-cells and #size-cells need to be specified because
> diff --git a/docs/misc/arm/passthrough-noiommu.txt b/docs/misc/arm/passthrough-noiommu.txt
> new file mode 100644
> index 0000000000..d2bfaf26c3
> --- /dev/null
> +++ b/docs/misc/arm/passthrough-noiommu.txt
> @@ -0,0 +1,35 @@
> +Request Device Assignment without IOMMU support
> +===============================================
> +
> +Add xen,force-assign-without-iommu; to the device tree snippet
> +
> +    ethernet: ethernet@ff0e0000 {
> +        compatible = "cdns,zynqmp-gem";
> +        xen,path = "/amba/ethernet@ff0e0000";
> +        xen,reg = <0x0 0xff0e0000 0x1000 0x0 0xff0e0000>;
> +        xen,force-assign-without-iommu;
> +
> +Optionally, if none of the domains require an IOMMU, then it could be
> +disabled (not recommended). For instance by adding status = "disabled";
> +under the smmu node:
> +
> +    smmu@fd800000 {
> +        compatible = "arm,mmu-500";
> +        status = "disabled";

I am not sure why this section is added in this patch. Furthermore, the 
file is named "noiommu" but here you mention the IOMMU.

> +
> +
> +Request 1:1 memory mapping for the dom0-less domain
> +===================================================
> +
> +Add a direct-map property under the appropriate /chosen/domU node with
> +the memory ranges you want to assign to your domain. If you are using
> +imagebuilder, you can add to boot.source something like the following:
> +
> +    fdt set /chosen/domU0 direct-map <0x0 0x10000000 0x0 0x10000000 0x0 0x60000000 0x0 0x10000000>
> +
> +Which will assign the ranges:
> +
> +    0x10000000 - 0x20000000
> +    0x60000000 - 0x70000000
> +
> +to the first dom0less domU.
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 2ec7453aa3..a2bb411568 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -2435,7 +2435,51 @@ static int __init construct_domU(struct domain *d,
>       /* type must be set before allocate memory */
>       d->arch.type = kinfo.type;
>   #endif
> -    allocate_memory(d, &kinfo);
> +
> +    if ( !is_domain_direct_mapped(d) )
> +        allocate_memory(d, &kinfo);
> +    else
> +    {
> +        struct membank banks[NR_MEM_BANKS];
> +        const struct dt_property *prop;
> +        u32 direct_map_len, direct_map_addr_len = 2, direct_map_size_len = 2;
> +        unsigned int i;
> +        __be32 *p;
> +
> +        prop = dt_find_property(node, "direct-map", &direct_map_len);
> +        BUG_ON(!prop);
> +
> +        dt_property_read_u32(node,
> +                             "#direct-map-addr-cells",
> +                             &direct_map_addr_len);
> +        dt_property_read_u32(node,
> +                             "#direct-map-size-cells",
> +                             &direct_map_size_len);
> +        BUG_ON(direct_map_size_len > 2 || direct_map_addr_len > 2);
> +
> +        for ( i = 0, p = prop->value;
> +              i < direct_map_len /
> +                  (4 * (direct_map_addr_len + direct_map_size_len));
> +              i++)
> +        {
> +            int j;
> +
> +            banks[i].start = 0;
> +            for ( j = 0; j < direct_map_addr_len; j++, p++ )
> +                banks[i].start |= __be32_to_cpu(*p) << (32*j);

Please use dt_read_number();
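
I.e. something like (untested):

    banks[i].start = dt_read_number(p, direct_map_addr_len);
    p += direct_map_addr_len;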

> +
> +            banks[i].size = 0;
> +            for ( j = 0; j < direct_map_size_len; j++, p++ )
> +                banks[i].size |= __be32_to_cpu(*p) << (32*j);
> +

Same here.

> +            printk(XENLOG_DEBUG
> +                   "direct_map start=%#"PRIpaddr" size=%#"PRIpaddr"\n",
> +                   banks[i].start, banks[i].size);
> +        }
> +
> +        /* reserve_memory_11(d, &kinfo, &banks[0], i); */
> +        BUG();

This can be avoided by re-ordering the patches and folding #6 into this patch.

> +    }
>   
>       rc = prepare_dtb_domU(d, &kinfo);
>       if ( rc < 0 )
> @@ -2455,6 +2499,7 @@ void __init create_domUs(void)
>   {
>       struct dt_device_node *node;
>       const struct dt_device_node *chosen = dt_find_node_by_path("/chosen");
> +    u32 direct_map = 0;
>   
>       BUG_ON(chosen == NULL);
>       dt_for_each_child_node(chosen, node)
> @@ -2476,8 +2521,11 @@ void __init create_domUs(void)
>               panic("Missing property 'cpus' for domain %s\n",
>                     dt_node_name(node));
>   
> -        if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") )
> +        dt_find_property(node, "direct-map", &direct_map);

Please use dt_property_read_bool().
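
I.e. (untested):

    bool direct_map = dt_property_read_bool(node, "direct-map");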

> +        if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") &&
> +             !direct_map )
>               d_cfg.flags |= XEN_DOMCTL_CDF_iommu;
> +        flags.arch.is_direct_map = direct_map != 0;
>   
>           if ( !dt_property_read_u32(node, "nr_spis", &d_cfg.arch.nr_spis) )
>           {
> 

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 06/12] xen/arm: reserve 1:1 memory for direct_map domUs
  2020-04-15  1:02 ` [PATCH 06/12] xen/arm: reserve 1:1 memory for direct_map domUs Stefano Stabellini
@ 2020-04-15 13:38   ` Julien Grall
  0 siblings, 0 replies; 67+ messages in thread
From: Julien Grall @ 2020-04-15 13:38 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: Stefano Stabellini, Volodymyr_Babchuk



On 15/04/2020 02:02, Stefano Stabellini wrote:
> Use reserve_domheap_pages to implement the direct-map ranges allocation
> for DomUs.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> ---
>   xen/arch/arm/domain_build.c | 37 +++++++++++++++++++++++++++++++++++--
>   1 file changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index a2bb411568..627e0c5e8e 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -198,6 +198,40 @@ fail:
>       return false;
>   }
>   
> +static void __init reserve_memory_11(struct domain *d,
> +                                     struct kernel_info *kinfo,
> +                                     struct membank *banks,
> +                                     unsigned int nr_banks)

Can we stop introducing more void functions and properly return an error?

> +{
> +    unsigned int i, order;
> +    struct page_info *pg;
> +
> +    kinfo->mem.nr_banks = 0;
> +
> +    for ( i = 0; i < nr_banks; i++ )
> +    {
> +        order = get_order_from_bytes(banks[i].size);
> +        pg = reserve_domheap_pages(d, banks[i].start, order, 0);
> +        if ( pg == NULL || !insert_11_bank(d, kinfo, pg, order) )
> +        {
> +            printk(XENLOG_ERR
> +                   "%pd: cannot reserve memory start=%#"PRIpaddr" size=%#"PRIpaddr"\n",
> +                    d, banks[i].start, banks[i].size);
> +            BUG();
> +        }
> +    }
> +
> +    for( i = 0; i < kinfo->mem.nr_banks; i++ )
> +    {
> +        printk("BANK[%d] %#"PRIpaddr"-%#"PRIpaddr" (%ldMB)\n",
> +                i,
> +                kinfo->mem.bank[i].start,
> +                kinfo->mem.bank[i].start + kinfo->mem.bank[i].size,
> +                /* Don't want format this as PRIpaddr (16 digit hex) */
> +                (unsigned long)(kinfo->mem.bank[i].size >> 20));
> +    }
> +}
> +
>   /*
>    * This is all pretty horrible.
>    *
> @@ -2477,8 +2511,7 @@ static int __init construct_domU(struct domain *d,
>                      banks[i].start, banks[i].size);
>           }
>   
> -        /* reserve_memory_11(d, &kinfo, &banks[0], i); */
> -        BUG();
> +        reserve_memory_11(d, &kinfo, &banks[0], i);

If you fold this into #3 and re-order the patches then you don't need 
the commented code + BUG().

>       }
>   
>       rc = prepare_dtb_domU(d, &kinfo);
> 

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 07/12] xen/arm: new vgic: rename vgic_cpu/dist_base to c/dbase
  2020-04-15  1:02 ` [PATCH 07/12] xen/arm: new vgic: rename vgic_cpu/dist_base to c/dbase Stefano Stabellini
@ 2020-04-15 13:41   ` Julien Grall
  0 siblings, 0 replies; 67+ messages in thread
From: Julien Grall @ 2020-04-15 13:41 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: Stefano Stabellini, Volodymyr_Babchuk

Hi,

On 15/04/2020 02:02, Stefano Stabellini wrote:
> To be uniform with the old vgic. Name uniformity will become immediately
> useful in the following patch.

Please no. Those fields are not meant to be exposed outside of the vGIC.

'cbase' is also much less obvious to read than vgic_cpu_base.

Instead, please introduce a helper to retrieve the information you need.
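
Something along these lines (untested, helper names made up, keeping
the existing field names):

    static inline paddr_t vgic_cpu_base(const struct vgic_dist *vgic)
    {
        return vgic->vgic_cpu_base;
    }

    static inline paddr_t vgic_dist_base(const struct vgic_dist *vgic)
    {
        return vgic->vgic_dist_base;
    }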

> In vgic_v2_map_resources, use the fields in struct vgic_dist rather than
> local variables.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> ---
>   xen/arch/arm/vgic/vgic-init.c  |  4 ++--
>   xen/arch/arm/vgic/vgic-v2.c    | 16 ++++++++--------
>   xen/include/asm-arm/new_vgic.h |  4 ++--
>   3 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/xen/arch/arm/vgic/vgic-init.c b/xen/arch/arm/vgic/vgic-init.c
> index 62ae553699..ea739d081e 100644
> --- a/xen/arch/arm/vgic/vgic-init.c
> +++ b/xen/arch/arm/vgic/vgic-init.c
> @@ -112,8 +112,8 @@ int domain_vgic_register(struct domain *d, int *mmio_count)
>           BUG();
>       }
>   
> -    d->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
> -    d->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
> +    d->arch.vgic.dbase = VGIC_ADDR_UNDEF;
> +    d->arch.vgic.cbase = VGIC_ADDR_UNDEF;
>       d->arch.vgic.vgic_redist_base = VGIC_ADDR_UNDEF;
>   
>       return 0;
> diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
> index b5ba4ace87..09cb92039a 100644
> --- a/xen/arch/arm/vgic/vgic-v2.c
> +++ b/xen/arch/arm/vgic/vgic-v2.c
> @@ -258,7 +258,7 @@ void vgic_v2_enable(struct vcpu *vcpu)
>   int vgic_v2_map_resources(struct domain *d)
>   {
>       struct vgic_dist *dist = &d->arch.vgic;
> -    paddr_t cbase, csize;
> +    paddr_t csize;
>       paddr_t vbase;
>       int ret;
>   
> @@ -268,7 +268,7 @@ int vgic_v2_map_resources(struct domain *d)
>        */
>       if ( is_hardware_domain(d) )
>       {
> -        d->arch.vgic.vgic_dist_base = gic_v2_hw_data.dbase;
> +        d->arch.vgic.dbase = gic_v2_hw_data.dbase;
>           /*
>            * For the hardware domain, we always map the whole HW CPU
>            * interface region in order to match the device tree (the "reg"
> @@ -276,13 +276,13 @@ int vgic_v2_map_resources(struct domain *d)
>            * Note that we assume the size of the CPU interface is always
>            * aligned to PAGE_SIZE.
>            */
> -        cbase = gic_v2_hw_data.cbase;
> +        d->arch.vgic.cbase = gic_v2_hw_data.cbase;
>           csize = gic_v2_hw_data.csize;
>           vbase = gic_v2_hw_data.vbase;
>       }
>       else
>       {
> -        d->arch.vgic.vgic_dist_base = GUEST_GICD_BASE;
> +        d->arch.vgic.dbase = GUEST_GICD_BASE;
>           /*
>            * The CPU interface exposed to the guest is always 8kB. We may
>            * need to add an offset to the virtual CPU interface base
> @@ -290,13 +290,13 @@ int vgic_v2_map_resources(struct domain *d)
>            * region.
>            */
>           BUILD_BUG_ON(GUEST_GICC_SIZE != SZ_8K);
> -        cbase = GUEST_GICC_BASE;
> +        d->arch.vgic.cbase = GUEST_GICC_BASE;
>           csize = GUEST_GICC_SIZE;
>           vbase = gic_v2_hw_data.vbase + gic_v2_hw_data.aliased_offset;
>       }
>   
>   
> -    ret = vgic_register_dist_iodev(d, gaddr_to_gfn(dist->vgic_dist_base),
> +    ret = vgic_register_dist_iodev(d, gaddr_to_gfn(dist->dbase),
>                                      VGIC_V2);
>       if ( ret )
>       {
> @@ -308,8 +308,8 @@ int vgic_v2_map_resources(struct domain *d)
>        * Map the gic virtual cpu interface in the gic cpu interface
>        * region of the guest.
>        */
> -    ret = map_mmio_regions(d, gaddr_to_gfn(cbase), csize / PAGE_SIZE,
> -                           maddr_to_mfn(vbase));
> +    ret = map_mmio_regions(d, gaddr_to_gfn(d->arch.vgic.cbase),
> +                           csize / PAGE_SIZE, maddr_to_mfn(vbase));
>       if ( ret )
>       {
>           gdprintk(XENLOG_ERR, "Unable to remap VGIC CPU to VCPU\n");
> diff --git a/xen/include/asm-arm/new_vgic.h b/xen/include/asm-arm/new_vgic.h
> index 97d622bff6..e3319900ab 100644
> --- a/xen/include/asm-arm/new_vgic.h
> +++ b/xen/include/asm-arm/new_vgic.h
> @@ -115,11 +115,11 @@ struct vgic_dist {
>       unsigned int        nr_spis;
>   
>       /* base addresses in guest physical address space: */
> -    paddr_t             vgic_dist_base;     /* distributor */
> +    paddr_t             dbase;     /* distributor */
>       union
>       {
>           /* either a GICv2 CPU interface */
> -        paddr_t         vgic_cpu_base;
> +        paddr_t         cbase;
>           /* or a number of GICv3 redistributor regions */
>           struct
>           {
> 

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 08/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv2
  2020-04-15  1:02 ` [PATCH 08/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv2 Stefano Stabellini
@ 2020-04-15 14:00   ` Julien Grall
  2020-05-01  1:26     ` Stefano Stabellini
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-04-15 14:00 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: Stefano Stabellini, Volodymyr_Babchuk



On 15/04/2020 02:02, Stefano Stabellini wrote:
> Today we use native addresses to map the GICv2 for Dom0 and fixed
> addresses for DomUs.
> 
> This patch changes the behavior so that native addresses are used for
> any domain that is_domain_direct_mapped.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> ---
>   xen/arch/arm/domain_build.c | 10 +++++++---
>   xen/arch/arm/vgic-v2.c      | 12 ++++++------
>   xen/arch/arm/vgic/vgic-v2.c |  2 +-
>   xen/include/asm-arm/vgic.h  |  1 +
>   4 files changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 627e0c5e8e..303bee60f6 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -1643,8 +1643,12 @@ static int __init make_gicv2_domU_node(struct kernel_info *kinfo)
>       int res = 0;
>       __be32 reg[(GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) * 2];
>       __be32 *cells;
> +    struct domain *d = kinfo->d;
> +    char buf[38];
>   
> -    res = fdt_begin_node(fdt, "interrupt-controller@"__stringify(GUEST_GICD_BASE));
> +    snprintf(buf, sizeof(buf), "interrupt-controller@%"PRIx64,
> +             d->arch.vgic.dbase);
> +    res = fdt_begin_node(fdt, buf);
>       if ( res )
>           return res;
>   
> @@ -1666,9 +1670,9 @@ static int __init make_gicv2_domU_node(struct kernel_info *kinfo)
>   
>       cells = &reg[0];
>       dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
> -                       GUEST_GICD_BASE, GUEST_GICD_SIZE);
> +                       d->arch.vgic.dbase, GUEST_GICD_SIZE);
>       dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
> -                       GUEST_GICC_BASE, GUEST_GICC_SIZE);
> +                       d->arch.vgic.cbase, GUEST_GICC_SIZE);

You can't use the native base address but not the native size. 
In particular, this is going to break any GIC using the 8KB alias.

It would be preferable to only expose part of the CPU interface, as we do 
for the guest. So d->arch.vgic.cbase would be equal to vgic_v2_hw.cbase 
+ vgic_v2_hw.aliased_offset.


Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 09/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv3
  2020-04-15  1:02 ` [PATCH 09/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv3 Stefano Stabellini
@ 2020-04-15 14:09   ` Julien Grall
  2020-05-01  1:31     ` Stefano Stabellini
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-04-15 14:09 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: Stefano Stabellini, Volodymyr_Babchuk

Hi,

On 15/04/2020 02:02, Stefano Stabellini wrote:
> Today we use native addresses to map the GICv3 for Dom0 and fixed
> addresses for DomUs.
> 
> This patch changes the behavior so that native addresses are used for
> any domain that is_domain_direct_mapped. The patch has to introduce one
> #ifndef CONFIG_NEW_VGIC because the new vgic doesn't support GICv3.

Please no #ifdef. Instead, move the creation of the DT node into vgic.c 
and introduce a stub for non-GICv3 platforms.
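
Roughly (untested, function name and guard made up):

    #ifdef CONFIG_HAS_GICV3
    int vgic_v3_make_domU_node(struct kernel_info *kinfo);
    #else
    static inline int vgic_v3_make_domU_node(struct kernel_info *kinfo)
    {
        /* The node is only needed on GICv3 platforms. */
        return -ENODEV;
    }
    #endif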

> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> ---
>   xen/arch/arm/domain_build.c | 12 +++++++++---
>   xen/arch/arm/vgic-v3.c      | 18 ++++++++++++++----
>   2 files changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 303bee60f6..beec0a144c 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -1697,8 +1697,12 @@ static int __init make_gicv3_domU_node(struct kernel_info *kinfo)
>       int res = 0;
>       __be32 reg[(GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) * 2];
>       __be32 *cells;
> +    struct domain *d = kinfo->d;
> +    char buf[38];
>   
> -    res = fdt_begin_node(fdt, "interrupt-controller@"__stringify(GUEST_GICV3_GICD_BASE));
> +    snprintf(buf, sizeof(buf), "interrupt-controller@%"PRIx64,
> +             d->arch.vgic.dbase);
> +    res = fdt_begin_node(fdt, buf);
>       if ( res )
>           return res;
>   
> @@ -1720,9 +1724,11 @@ static int __init make_gicv3_domU_node(struct kernel_info *kinfo)
>   
>       cells = &reg[0];
>       dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
> -                       GUEST_GICV3_GICD_BASE, GUEST_GICV3_GICD_SIZE);
> +                       d->arch.vgic.dbase, GUEST_GICV3_GICD_SIZE);
> +#if defined(CONFIG_GICV3) && !defined(CONFIG_NEW_VGIC)

CONFIG_GICV3 does not exist. Did you mean CONFIG_HAS_GICV3?

>       dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
> -                       GUEST_GICV3_GICR0_BASE, GUEST_GICV3_GICR0_SIZE);
> +                       d->arch.vgic.rdist_regions[0].base, GUEST_GICV3_GICR0_SIZE);
> +#endif
>   
>       res = fdt_property(fdt, "reg", reg, sizeof(reg));
>       if (res)
> diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> index 4e60ba15cc..4cf430f865 100644
> --- a/xen/arch/arm/vgic-v3.c
> +++ b/xen/arch/arm/vgic-v3.c
> @@ -1677,13 +1677,25 @@ static int vgic_v3_domain_init(struct domain *d)


I think you also want to modify vgic_v3_max_rdist_count().
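
I.e. something like (untested):

    static inline unsigned int vgic_v3_max_rdist_count(struct domain *d)
    {
        return is_domain_direct_mapped(d) ? vgic_v3_hw.nr_rdist_regions
                                          : GUEST_GICV3_RDIST_REGIONS;
    }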

>        * Domain 0 gets the hardware address.
>        * Guests get the virtual platform layout.

This comment needs to be updated.

>        */
> -    if ( is_hardware_domain(d) )
> +    if ( is_domain_direct_mapped(d) )
>       {
>           unsigned int first_cpu = 0;
> +        unsigned int nr_rdist_regions;
>   
>           d->arch.vgic.dbase = vgic_v3_hw.dbase;
>   
> -        for ( i = 0; i < vgic_v3_hw.nr_rdist_regions; i++ )
> +        if ( is_hardware_domain(d) )
> +        {
> +            nr_rdist_regions = vgic_v3_hw.nr_rdist_regions;
> +            d->arch.vgic.intid_bits = vgic_v3_hw.intid_bits;
> +        }
> +        else
> +        {
> +            nr_rdist_regions = 1;

What guarantees that the rdist region will be big enough to cater for 
all the re-distributors of your domain?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 10/12] xen/arm: if is_domain_direct_mapped use native UART address for vPL011
  2020-04-15  1:02 ` [PATCH 10/12] xen/arm: if is_domain_direct_mapped use native UART address for vPL011 Stefano Stabellini
@ 2020-04-15 14:11   ` Julien Grall
  2020-05-01  1:26     ` Stefano Stabellini
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-04-15 14:11 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: Stefano Stabellini, Volodymyr_Babchuk

Hi Stefano,

On 15/04/2020 02:02, Stefano Stabellini wrote:
> We always use a fixed address to map the vPL011 into domains. That
> address could be a problem for domains that are directly mapped.
> 
> Instead, for domains that are directly mapped, reuse the address of the
> physical UART on the platform to avoid potential clashes.

How do you know the physical UART MMIO region is big enough to fit the 
PL011?

What if the user want to assign the physical UART to the domain + the 
vpl011?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 11/12] xen/arm: if xen_force don't try to setup the IOMMU
  2020-04-15  1:02 ` [PATCH 11/12] xen/arm: if xen_force don't try to setup the IOMMU Stefano Stabellini
@ 2020-04-15 14:12   ` Julien Grall
  2020-04-29 21:55     ` Stefano Stabellini
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-04-15 14:12 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: Stefano Stabellini, Volodymyr_Babchuk

Hi Stefano,

On 15/04/2020 02:02, Stefano Stabellini wrote:
> If xen_force (which means xen,force-assign-without-iommu was requested),
> don't try to add the device to the IOMMU. Return early instead.


Could you explain why it is an issue to check xen_force after 
iommu_add_dt_device()?

Cheers,
-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 12/12] xen/arm: call iomem_permit_access for passthrough devices
  2020-04-15  1:02 ` [PATCH 12/12] xen/arm: call iomem_permit_access for passthrough devices Stefano Stabellini
@ 2020-04-15 14:18   ` Julien Grall
  2020-04-29 20:47     ` Stefano Stabellini
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-04-15 14:18 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: Stefano Stabellini, Volodymyr_Babchuk



On 15/04/2020 02:02, Stefano Stabellini wrote:
> iomem_permit_access should be called for MMIO regions of devices
> assigned to a domain. Currently it is not called for MMIO regions of
> passthrough devices of Dom0less guests. This patch fixes it.

You can already have cases today where the MMIO regions are mapped to 
the guest but the guest doesn't have permission on them.

Can you explain why this is a problem here?

Cheers,

> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> ---
>   xen/arch/arm/domain_build.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index d0fc497d07..d3d42eef5d 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -1846,6 +1846,17 @@ static int __init handle_passthrough_prop(struct kernel_info *kinfo,
>               return -EINVAL;
>           }
>   
> +        res = iomem_permit_access(kinfo->d, paddr_to_pfn(mstart),
> +                                  paddr_to_pfn(PAGE_ALIGN(mstart + size - 1)));
> +        if ( res )
> +        {
> +            printk(XENLOG_ERR "Unable to permit to dom%d access to"
> +                   " 0x%"PRIx64" - 0x%"PRIx64"\n",
> +                   kinfo->d->domain_id,
> +                   mstart & PAGE_MASK, PAGE_ALIGN(mstart + size) - 1);
> +            return res;
> +        }
> +
>           res = map_regions_p2mt(kinfo->d,
>                                  gaddr_to_gfn(gstart),
>                                  PFN_DOWN(size),
> 

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/12] direct-map DomUs
  2020-04-15  1:02 [PATCH 0/12] direct-map DomUs Stefano Stabellini
                   ` (11 preceding siblings ...)
  2020-04-15  1:02 ` [PATCH 12/12] xen/arm: call iomem_permit_access for passthrough devices Stefano Stabellini
@ 2020-04-16  8:59 ` Julien Grall
  2020-04-29 20:16   ` Stefano Stabellini
  12 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-04-16  8:59 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel
  Cc: Artem_Mygaiev, peng.fan, andrew.cooper3, George.Dunlap,
	Bertrand.Marquis, jbeulich, Volodymyr_Babchuk



On 15/04/2020 02:02, Stefano Stabellini wrote:
> Hi all,
> 
> This series adds support for 1:1 mapping (guest physical == physical)
> the memory of dom0less domUs. The memory ranges assigned to a domU can be
> explicitly chosen by the user at boot time.
> 
> This is desirable in cases where an IOMMU is not present in the system,
> or it cannot be used. For instance, it might not be usable because it
> doesn't cover a specific device, or because it doesn't have enough
> bandwidth, or because it adds too much latency. In these cases, the user
> should use a MPU to protect the memory in the system (e.g. the Xilinx
> XMPU), configuring it with the chosen address ranges.
> 
> Cheers,
> 
> Stefano
> 
> 
> 
> The following changes since commit 7372466b21c3b6c96bb7a52754e432bac883a1e3:
> 
>    x86/mem_sharing: Fix build with !CONFIG_XSM (2020-04-10 15:20:10 +0100)
> 
> are available in the Git repository at:
> 
>    http://xenbits.xenproject.org/git-http/people/sstabellini/xen-unstable.git direct-map-1
> 
> for you to fetch changes up to 43503720ab6851a28a66fdd067f592d5354ae83a:
> 
>    xen/arm: call iomem_permit_access for passthrough devices (2020-04-14 17:42:21 -0700)
> 
> ----------------------------------------------------------------
> Stefano Stabellini (12):
>        xen: introduce xen_dom_flags
>        xen/arm: introduce arch_xen_dom_flags and direct_map
>        xen/arm: introduce 1:1 mapping for domUs
>        xen: split alloc_heap_pages in two halves for reusability
>        xen: introduce reserve_heap_pages
>        xen/arm: reserve 1:1 memory for direct_map domUs
>        xen/arm: new vgic: rename vgic_cpu/dist_base to c/dbase
>        xen/arm: if is_domain_direct_mapped use native addresses for GICv2
>        xen/arm: if is_domain_direct_mapped use native addresses for GICv3
>        xen/arm: if is_domain_direct_mapped use native UART address for vPL011

The 3 patches above cover addresses but not interrupts. Why?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 04/12] xen: split alloc_heap_pages in two halves for reusability
  2020-04-15  1:02 ` [PATCH 04/12] xen: split alloc_heap_pages in two halves for reusability Stefano Stabellini
  2020-04-15 11:22   ` Wei Liu
@ 2020-04-17 10:02   ` Jan Beulich
  2020-04-29 23:09     ` Stefano Stabellini
  1 sibling, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2020-04-17 10:02 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: julien, Wei Liu, andrew.cooper3, Ian Jackson, George Dunlap,
	xen-devel, Stefano Stabellini, Volodymyr_Babchuk

On 15.04.2020 03:02, Stefano Stabellini wrote:
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -911,54 +911,18 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
>      }
>  }
>  
> -/* Allocate 2^@order contiguous pages. */
> -static struct page_info *alloc_heap_pages(
> -    unsigned int zone_lo, unsigned int zone_hi,
> -    unsigned int order, unsigned int memflags,
> -    struct domain *d)
> +static void __alloc_heap_pages(struct page_info **pgo,
> +                               unsigned int order,
> +                               unsigned int memflags,
> +                               struct domain *d)

Along the lines of Wei's reply, I'd suggest a name that better reflects
the difference from alloc_heap_pages() itself. Maybe
alloc_pages_from_buddy() or alloc_buddy_pages()?

In addition
- no double leading underscores please
- instead of the function returning void and taking
  struct page_info **pgo, why not have it take and return
  struct page_info *?
- add a comment about the non-standard locking behavior
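
I.e., taken together, a prototype along these lines (untested; I'm
assuming the function is entered with heap_lock held and drops it,
which is the non-standard locking behavior mentioned above):

    /* Caller must hold heap_lock; the lock is dropped before returning. */
    static struct page_info *alloc_pages_from_buddy(struct page_info *pg,
                                                    unsigned int order,
                                                    unsigned int memflags,
                                                    struct domain *d);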

Jan


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 05/12] xen: introduce reserve_heap_pages
  2020-04-15  1:02 ` [PATCH 05/12] xen: introduce reserve_heap_pages Stefano Stabellini
  2020-04-15 13:24   ` Julien Grall
@ 2020-04-17 10:11   ` Jan Beulich
  2020-04-29 22:46     ` Stefano Stabellini
  1 sibling, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2020-04-17 10:11 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: julien, Wei Liu, andrew.cooper3, Ian Jackson, George Dunlap,
	xen-devel, Stefano Stabellini, Volodymyr_Babchuk

On 15.04.2020 03:02, Stefano Stabellini wrote:
> Introduce a function named reserve_heap_pages (similar to
> alloc_heap_pages) that allocates a requested memory range. Call
> __alloc_heap_pages for the implementation.
> 
> Change __alloc_heap_pages so that the original page doesn't get
> modified, giving back unneeded memory top to bottom rather than bottom
> to top.

While it may be less of a problem within a zone, doing so is
against our general "return high pages first" policy.

> @@ -1073,7 +1073,42 @@ static struct page_info *alloc_heap_pages(
>          return NULL;
>      }
>  
> -    __alloc_heap_pages(&pg, order, memflags, d);
> +    __alloc_heap_pages(pg, order, memflags, d);
> +    return pg;
> +}
> +
> +static struct page_info *reserve_heap_pages(struct domain *d,
> +                                            paddr_t start,
> +                                            unsigned int order,
> +                                            unsigned int memflags)
> +{
> +    nodeid_t node;
> +    unsigned int zone;
> +    struct page_info *pg;
> +
> +    if ( unlikely(order > MAX_ORDER) )
> +        return NULL;
> +
> +    spin_lock(&heap_lock);
> +
> +    /*
> +     * Claimed memory is considered unavailable unless the request
> +     * is made by a domain with sufficient unclaimed pages.
> +     */
> +    if ( (outstanding_claims + (1UL << order) > total_avail_pages) &&
> +          ((memflags & MEMF_no_refcount) ||
> +           !d || d->outstanding_pages < (1UL << order)) )
> +    {
> +        spin_unlock(&heap_lock);
> +        return NULL;
> +    }

Where would such a claim come from? Given the purpose I'd assume
the function (as well as reserve_domheap_pages()) can actually be
__init. With that I'd then also be okay with them getting built
unconditionally, i.e. even on x86.

> +    pg = maddr_to_page(start);
> +    node = phys_to_nid(start);
> +    zone = page_to_zone(pg);
> +    page_list_del(pg, &heap(node, zone, order));
> +
> +    __alloc_heap_pages(pg, order, memflags, d);

I agree with Julien in not seeing how this can be safe / correct.

Jan


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/12] direct-map DomUs
  2020-04-16  8:59 ` [PATCH 0/12] direct-map DomUs Julien Grall
@ 2020-04-29 20:16   ` Stefano Stabellini
  2020-04-30 12:54     ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-29 20:16 UTC (permalink / raw)
  To: Julien Grall
  Cc: Artem_Mygaiev, peng.fan, Stefano Stabellini, andrew.cooper3,
	George.Dunlap, Bertrand.Marquis, jbeulich, xen-devel,
	Volodymyr_Babchuk

On Thu, 16 Apr 2020, Julien Grall wrote:
> > Stefano Stabellini (12):
> >        xen: introduce xen_dom_flags
> >        xen/arm: introduce arch_xen_dom_flags and direct_map
> >        xen/arm: introduce 1:1 mapping for domUs
> >        xen: split alloc_heap_pages in two halves for reusability
> >        xen: introduce reserve_heap_pages
> >        xen/arm: reserve 1:1 memory for direct_map domUs
> >        xen/arm: new vgic: rename vgic_cpu/dist_base to c/dbase
> >        xen/arm: if is_domain_direct_mapped use native addresses for GICv2
> >        xen/arm: if is_domain_direct_mapped use native addresses for GICv3
> >        xen/arm: if is_domain_direct_mapped use native UART address for vPL011
> 
> The 3 patches above cover addresses but not interrupts. Why?

Hi Julien,

I take it that you are referring to GUEST_VPL011_SPI, right?


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 12/12] xen/arm: call iomem_permit_access for passthrough devices
  2020-04-15 14:18   ` Julien Grall
@ 2020-04-29 20:47     ` Stefano Stabellini
  2020-04-30 13:01       ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-29 20:47 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Stefano Stabellini

On Wed, 15 Apr 2020, Julien Grall wrote:
> On 15/04/2020 02:02, Stefano Stabellini wrote:
> > iomem_permit_access should be called for MMIO regions of devices
> > assigned to a domain. Currently it is not called for MMIO regions of
> > passthrough devices of Dom0less guests. This patch fixes it.
> 
> You can already have cases today where the MMIO regions are mapped to the
> guest but the guest doesn't have permission on them.
> 
> Can you explain why this is a problem here?

I don't remember the original problem that prompted me to do this
investigation. It came up while I was developing and testing this
series: I noticed the discrepancy compared to the regular (not dom0less)
device assignment code path
(tools/libxl/libxl_create.c:domcreate_launch_dm).

Now I removed this patch from the series, went back to test it, and it
still works fine. Oops, it is not needed after all :-)


I think it makes sense to remove this patch from this series; I'll do
that.

But doesn't it make sense to give the domU permission if it is going to
get the memory mapped? Admittedly, I can't think of anything that would
break because of the lack of the iomem_permit_access call in this code
path.


> > Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> > ---
> >   xen/arch/arm/domain_build.c | 11 +++++++++++
> >   1 file changed, 11 insertions(+)
> > 
> > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > index d0fc497d07..d3d42eef5d 100644
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -1846,6 +1846,17 @@ static int __init handle_passthrough_prop(struct
> > kernel_info *kinfo,
> >               return -EINVAL;
> >           }
> >   +        res = iomem_permit_access(kinfo->d, paddr_to_pfn(mstart),
> > +                                  paddr_to_pfn(PAGE_ALIGN(mstart + size -
> > 1)));
> > +        if ( res )
> > +        {
> > +            printk(XENLOG_ERR "Unable to permit to dom%d access to"
> > +                   " 0x%"PRIx64" - 0x%"PRIx64"\n",
> > +                   kinfo->d->domain_id,
> > +                   mstart & PAGE_MASK, PAGE_ALIGN(mstart + size) - 1);
> > +            return res;
> > +        }
> > +
> >           res = map_regions_p2mt(kinfo->d,
> >                                  gaddr_to_gfn(gstart),
> >                                  PFN_DOWN(size),
> > 
> 
> -- 
> Julien Grall
> 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 11/12] xen/arm: if xen_force don't try to setup the IOMMU
  2020-04-15 14:12   ` Julien Grall
@ 2020-04-29 21:55     ` Stefano Stabellini
  2020-04-30 13:51       ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-29 21:55 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Stefano Stabellini

On Wed, 15 Apr 2020, Julien Grall wrote:
> Hi Stefano,
> 
> On 15/04/2020 02:02, Stefano Stabellini wrote:
> > If xen_force (which means xen,force-assign-without-iommu was requested),
> > don't try to add the device to the IOMMU. Return early instead.
> 
> 
> Could you explain why this is an issue to call xen_force after
> iommu_add_dt_device()?

There are two issues. I should add info about both of them to the commit
message.


The first issue is that an error returned by iommu_add_dt_device (for
any reason) would cause handle_passthrough_prop to stop and return an
error right away. But the IOMMU is not actually needed for that device
if xen_force is set.

(In fact, one of the reasons why a user might want to set
force-assign-without-iommu is because there are iommu issues with a
device.)


The second issue is about the usage of "xen,force-assign-without-iommu":
it would be useful to let the user set "xen,force-assign-without-iommu"
for devices that are described as behind an SMMU in the device tree, but
the SMMU can't actually be used for some reason. Of course, the user
could always manually edit the device tree to make it look as if
the device is not behind an IOMMU. That would work OK. However, I think
it would be better to avoid making that a requirement.

If we want to allow "xen,force-assign-without-iommu" for a device behind
a SMMU then we need this patch, otherwise this would happen:

    res = iommu_add_dt_device(node); // succeeds
    if ( xen_force && !dt_device_is_protected(node) ) // fails because the device is protected
        return 0;
    return iommu_assign_dt_device(kinfo->d, node); // fails because !is_iommu_enabled(d) which is fine but then handle_prop_pfdt returns error too



All in all, I thought it would make sense to avoid any iommu settings
and potential iommu errors altogether if xen_force is set and return
early.
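
Concretely, the shape of the change is just an early return before any
IOMMU call. A minimal sketch (the exact placement inside
handle_passthrough_prop is illustrative):

    if ( xen_force )
        /* Skip all IOMMU setup for xen,force-assign-without-iommu. */
        return 0;

    res = iommu_add_dt_device(node);
    if ( res < 0 )
        return res;

    return iommu_assign_dt_device(kinfo->d, node);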


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 05/12] xen: introduce reserve_heap_pages
  2020-04-17 10:11   ` Jan Beulich
@ 2020-04-29 22:46     ` Stefano Stabellini
  2020-04-30  6:29       ` Jan Beulich
  2020-04-30 14:51       ` Julien Grall
  0 siblings, 2 replies; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-29 22:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, julien, Wei Liu, andrew.cooper3, Ian Jackson,
	George Dunlap, xen-devel, Stefano Stabellini, Volodymyr_Babchuk

On Fri, 17 Apr 2020, Jan Beulich wrote:
> On 15.04.2020 03:02, Stefano Stabellini wrote:
> > Introduce a function named reserve_heap_pages (similar to
> > alloc_heap_pages) that allocates a requested memory range. Call
> > __alloc_heap_pages for the implementation.
> > 
> > Change __alloc_heap_pages so that the original page doesn't get
> > modified, giving back unneeded memory top to bottom rather than bottom
> > to top.
> 
> While it may be less of a problem within a zone, doing so is
> against our general "return high pages first" policy.

Is this something you'd be OK with anyway?

If not, do you have a suggestion on how to do it better? I couldn't find
a nice way to do it without code duplication, or a big nasty 'if' in the
middle of the function.


> > @@ -1073,7 +1073,42 @@ static struct page_info *alloc_heap_pages(
> >          return NULL;
> >      }
> >  
> > -    __alloc_heap_pages(&pg, order, memflags, d);
> > +    __alloc_heap_pages(pg, order, memflags, d);
> > +    return pg;
> > +}
> > +
> > +static struct page_info *reserve_heap_pages(struct domain *d,
> > +                                            paddr_t start,
> > +                                            unsigned int order,
> > +                                            unsigned int memflags)
> > +{
> > +    nodeid_t node;
> > +    unsigned int zone;
> > +    struct page_info *pg;
> > +
> > +    if ( unlikely(order > MAX_ORDER) )
> > +        return NULL;
> > +
> > +    spin_lock(&heap_lock);
> > +
> > +    /*
> > +     * Claimed memory is considered unavailable unless the request
> > +     * is made by a domain with sufficient unclaimed pages.
> > +     */
> > +    if ( (outstanding_claims + (1UL << order) > total_avail_pages) &&
> > +          ((memflags & MEMF_no_refcount) ||
> > +           !d || d->outstanding_pages < (1UL << order)) )
> > +    {
> > +        spin_unlock(&heap_lock);
> > +        return NULL;
> > +    }
> 
> Where would such a claim come from? Given the purpose I'd assume
> the function (as well as reserve_domheap_pages()) can actually be
> __init. With that I'd then also be okay with them getting built
> unconditionally, i.e. even on x86.

Yes, you are right, I'll make the function __init and also remove this
check on claimed memory.


> > +    pg = maddr_to_page(start);
> > +    node = phys_to_nid(start);
> > +    zone = page_to_zone(pg);
> > +    page_list_del(pg, &heap(node, zone, order));
> > +
> > +    __alloc_heap_pages(pg, order, memflags, d);
> 
> I agree with Julien in not seeing how this can be safe / correct.

I haven't seen any issues so far in my testing -- I imagine it is
because there aren't many memory allocations after setup_mm() and before
create_domUs()  (which on ARM is called just before
domain_unpause_by_systemcontroller at the end of start_xen.)


I gave a quick look at David's series. Is the idea that I should add a
patch to do the following:

- avoiding adding these ranges to xenheap in setup_mm, wait for later
  (a bit like reserved_mem regions)

- in construct_domU, add the range to xenheap and reserve it with reserve_heap_pages

Is that right?


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 04/12] xen: split alloc_heap_pages in two halves for reusability
  2020-04-17 10:02   ` Jan Beulich
@ 2020-04-29 23:09     ` Stefano Stabellini
  0 siblings, 0 replies; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-29 23:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, julien, Wei Liu, andrew.cooper3, Ian Jackson,
	George Dunlap, xen-devel, Stefano Stabellini, Volodymyr_Babchuk

On Fri, 17 Apr 2020, Jan Beulich wrote:
> On 15.04.2020 03:02, Stefano Stabellini wrote:
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -911,54 +911,18 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
> >      }
> >  }
> >  
> > -/* Allocate 2^@order contiguous pages. */
> > -static struct page_info *alloc_heap_pages(
> > -    unsigned int zone_lo, unsigned int zone_hi,
> > -    unsigned int order, unsigned int memflags,
> > -    struct domain *d)
> > +static void __alloc_heap_pages(struct page_info **pgo,
> > +                               unsigned int order,
> > +                               unsigned int memflags,
> > +                               struct domain *d)
> 
> Along the line of Wei's reply, I'd suggest the name to better reflect
> the difference to alloc_heap_pages() itself. Maybe
> alloc_pages_from_buddy() or alloc_buddy_pages()?
> 
> In addition
> - no double leading underscores please
> - instead of the function returning void and taking
>   struct page_info **pgo, why not have it take and return
>   struct page_info *?
> - add a comment about the non-standard locking behavior

I made all these changes
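
For reference, the resulting prototype looks roughly like this (a
sketch; the name follows Jan's suggestion):

    /* Non-standard locking: called with the heap_lock held. */
    static struct page_info *alloc_pages_from_buddy(struct page_info *pg,
                                                    unsigned int order,
                                                    unsigned int memflags,
                                                    struct domain *d);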


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 01/12] xen: introduce xen_dom_flags
  2020-04-15  9:12   ` Jan Beulich
  2020-04-15 13:26     ` Julien Grall
@ 2020-04-29 23:57     ` Stefano Stabellini
  1 sibling, 0 replies; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-29 23:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, julien, Wei Liu, George Dunlap,
	andrew.cooper3, Ian Jackson, Dario Faggioli, xen-devel,
	Stefano Stabellini, Volodymyr_Babchuk, Roger Pau Monné

On Wed, 15 Apr 2020, Jan Beulich wrote:
> On 15.04.2020 03:02, Stefano Stabellini wrote:
> > We are passing an extra special boolean flag at domain creation to
> > specify whether we want the domain to be privileged (i.e. dom0) or
> > not. Another flag will be introduced later in this series.
> > 
> > Introduce a new struct xen_dom_flags and move the privileged flag to it.
> > Other flags will be added to struct xen_dom_flags.
> 
> I'm unsure whether introducing a 2nd structure is worth it here.
> We could as well define some internal-use-only flags for
> struct xen_domctl_createdomain's respective field.

Yep, great idea, I'll do that instead.


> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -529,7 +529,8 @@ static bool emulation_flags_ok(const struct domain *d, uint32_t emflags)
> >  }
> >  
> >  int arch_domain_create(struct domain *d,
> > -                       struct xen_domctl_createdomain *config)
> > +                       struct xen_domctl_createdomain *config,
> > +                       struct xen_dom_flags *flags)
> 
> const (also elsewhere)?

All of this goes away now, using the existing flag field in
xen_domctl_createdomain, reserving the top 16 bits for internal usage.
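
Something along these lines (hypothetical names, just to illustrate the
split of the flags field):

    /* Bits 0-15 of xen_domctl_createdomain.flags: existing ABI. */
    /* Bits 16-31: internal-use-only flags, never exposed to toolstacks. */
    #define XEN_DOMCTL_CDF_INTERNAL_shift  16
    #define XEN_DOMCTL_CDF_priv            (1U << 31)  /* is_privileged */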


> > --- a/xen/arch/x86/setup.c
> > +++ b/xen/arch/x86/setup.c
> > @@ -706,6 +706,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >          .max_maptrack_frames = -1,
> >      };
> >      const char *hypervisor_name;
> > +    struct xen_dom_flags flags = { !pv_shim };
> 
> Here and elsewhere please use field designators right away, even if
> there's only a single field now.
> 
> > @@ -363,7 +363,7 @@ struct domain *domain_create(domid_t domid,
> >      ASSERT(is_system_domain(d) ? config == NULL : config != NULL);
> >  
> >      /* Sort out our idea of is_control_domain(). */
> > -    d->is_privileged = is_priv;
> > +    d->is_privileged =  flags ? flags->is_priv : false;
> 
> Stray double blanks.
> 
> > --- a/xen/common/domctl.c
> > +++ b/xen/common/domctl.c
> > @@ -364,6 +364,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> >      bool_t copyback = 0;
> >      struct xen_domctl curop, *op = &curop;
> >      struct domain *d;
> > +    struct xen_dom_flags flags ={ false };
> 
> Missing blank.
> 
> > --- a/xen/include/xen/domain.h
> > +++ b/xen/include/xen/domain.h
> > @@ -63,8 +63,13 @@ void arch_vcpu_destroy(struct vcpu *v);
> >  int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset);
> >  void unmap_vcpu_info(struct vcpu *v);
> >  
> > +struct xen_dom_flags {
> > +    bool is_priv;
> 
> Use a single bit bitfield instead? May even want to consider passing
> this struct by value then.
 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 02/12] xen/arm: introduce arch_xen_dom_flags and direct_map
  2020-04-15 10:27   ` Jan Beulich
  2020-04-15 11:27     ` Andrew Cooper
@ 2020-04-30  0:34     ` Stefano Stabellini
  1 sibling, 0 replies; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-30  0:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, julien, Wei Liu, George Dunlap,
	andrew.cooper3, Ian Jackson, xen-devel, Stefano Stabellini,
	Volodymyr_Babchuk, Roger Pau Monné

On Wed, 15 Apr 2020, Jan Beulich wrote:
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -2527,6 +2527,7 @@ int __init construct_dom0(struct domain *d)
> >  
> >      iommu_hwdom_init(d);
> >  
> > +    d->arch.direct_map = true;
> 
> Shouldn't this get set via arch_domain_create() instead?

Yes you are right, this is unnecessary and I can remove it.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 05/12] xen: introduce reserve_heap_pages
  2020-04-29 22:46     ` Stefano Stabellini
@ 2020-04-30  6:29       ` Jan Beulich
  2020-04-30 16:21         ` Stefano Stabellini
  2020-04-30 14:51       ` Julien Grall
  1 sibling, 1 reply; 67+ messages in thread
From: Jan Beulich @ 2020-04-30  6:29 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: julien, Wei Liu, andrew.cooper3, Ian Jackson, George Dunlap,
	xen-devel, Stefano Stabellini, Volodymyr_Babchuk

On 30.04.2020 00:46, Stefano Stabellini wrote:
> On Fri, 17 Apr 2020, Jan Beulich wrote:
>> On 15.04.2020 03:02, Stefano Stabellini wrote:
>>> Introduce a function named reserve_heap_pages (similar to
>>> alloc_heap_pages) that allocates a requested memory range. Call
>>> __alloc_heap_pages for the implementation.
>>>
>>> Change __alloc_heap_pages so that the original page doesn't get
>>> modified, giving back unneeded memory top to bottom rather than bottom
>>> to top.
>>
>> While it may be less of a problem within a zone, doing so is
>> against our general "return high pages first" policy.
> 
> Is this something you'd be OK with anyway?

As a last resort, maybe. But it really depends on why it needs to be
this way.

> If not, do you have a suggestion on how to do it better? I couldn't find
> a nice way to do it without code duplication, or a big nasty 'if' in the
> middle of the function.

I'd first need to understand the problem to solve.

>>> +    pg = maddr_to_page(start);
>>> +    node = phys_to_nid(start);
>>> +    zone = page_to_zone(pg);
>>> +    page_list_del(pg, &heap(node, zone, order));
>>> +
>>> +    __alloc_heap_pages(pg, order, memflags, d);
>>
>> I agree with Julien in not seeing how this can be safe / correct.
> 
> I haven't seen any issues so far in my testing -- I imagine it is
> because there aren't many memory allocations after setup_mm() and before
> create_domUs()  (which on ARM is called just before
> domain_unpause_by_systemcontroller at the end of start_xen.)
> 
> 
> I gave a quick look at David's series. Is the idea that I should add a
> patch to do the following:
> 
> - avoiding adding these ranges to xenheap in setup_mm, wait for later
>   (a bit like reserved_mem regions)
> 
> - in construct_domU, add the range to xenheap and reserve it with reserve_heap_pages
> 
> Is that right?

This may be one way, but it may also be not the only possible one.
The main thing to arrange for is that there is either a guarantee
for these ranges to be free (which I think you want to check in
any event, rather than risking to give out something that's already
in use elsewhere), or that you skip ranges which are already in use
(potentially altering [decreasing?] what the specific domain gets
allocated).
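
For the first option, a sketch of such a check (illustrative only, using
the existing page state tracking), to be done with the heap_lock held:

    unsigned long i;

    for ( i = 0; i < (1UL << order); i++ )
    {
        const struct page_info *pg = maddr_to_page(start) + i;

        /* Refuse the range if any page is not on a free list. */
        if ( !page_state_is(pg, free) )
        {
            spin_unlock(&heap_lock);
            return NULL;
        }
    }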

Jan


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/12] direct-map DomUs
  2020-04-29 20:16   ` Stefano Stabellini
@ 2020-04-30 12:54     ` Julien Grall
  0 siblings, 0 replies; 67+ messages in thread
From: Julien Grall @ 2020-04-30 12:54 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Artem_Mygaiev, peng.fan, andrew.cooper3, George.Dunlap,
	Bertrand.Marquis, jbeulich, xen-devel, Volodymyr_Babchuk

Hi Stefano,

On 29/04/2020 21:16, Stefano Stabellini wrote:
> On Thu, 16 Apr 2020, Julien Grall wrote:
>>> Stefano Stabellini (12):
>>>         xen: introduce xen_dom_flags
>>>         xen/arm: introduce arch_xen_dom_flags and direct_map
>>>         xen/arm: introduce 1:1 mapping for domUs
>>>         xen: split alloc_heap_pages in two halves for reusability
>>>         xen: introduce reserve_heap_pages
>>>         xen/arm: reserve 1:1 memory for direct_map domUs
>>>         xen/arm: new vgic: rename vgic_cpu/dist_base to c/dbase
>>>         xen/arm: if is_domain_direct_mapped use native addresses for GICv2
>>>         xen/arm: if is_domain_direct_mapped use native addresses for GICv3
>>>         xen/arm: if is_domain_direct_mapped use native UART address for vPL011
>>
>> The 3 patches above cover addresses but not interrupts. Why?
> 
> Hi Julien,
> 
> I take that you are referring to GUEST_VPL011_SPI, right?

GUEST_VPL011_SPI is at least one of them. In the long term, we may want
to consider PPIs as well (e.g. the timer).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 12/12] xen/arm: call iomem_permit_access for passthrough devices
  2020-04-29 20:47     ` Stefano Stabellini
@ 2020-04-30 13:01       ` Julien Grall
  2020-05-24 14:12         ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-04-30 13:01 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk

Hi Stefano,

On 29/04/2020 21:47, Stefano Stabellini wrote:
> On Wed, 15 Apr 2020, Julien Grall wrote: 
> But doesn't it make sense to give domU permission if it is going to get
> the memory mapped? But admittedly I can't think of something that would
> break because of the lack of the iomem_permit_access call in this code
> path.

On Arm, the permissions are only useful if you plan for your DomU to
delegate the regions to another domain. As your domain is not even aware
it is running on Xen (we don't expose the 'xen' node in the DT), it
makes little sense to add the permission.

Even today, you can map IOMEM to a DomU and then revoke the permission
right after. The IOMEM will still be mapped in the guest and it will
act normally.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 11/12] xen/arm: if xen_force don't try to setup the IOMMU
  2020-04-29 21:55     ` Stefano Stabellini
@ 2020-04-30 13:51       ` Julien Grall
  2020-05-01  1:25         ` Stefano Stabellini
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-04-30 13:51 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk

Hi Stefano,

On 29/04/2020 22:55, Stefano Stabellini wrote:
> On Wed, 15 Apr 2020, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 15/04/2020 02:02, Stefano Stabellini wrote:
>>> If xen_force (which means xen,force-assign-without-iommu was requested)
>>> don't try to add the device to the IOMMU. Return early instead.
>>
>>
>> Could you explain why this is an issue to call xen_force after
>> iommu_add_dt_device()?
> 
> There are two issues. I should add info about both of them to the commit
> message.
> 
> 
> The first issue is that an error returned by iommu_add_dt_device (for
> any reason) would cause handle_passthrough_prop to stop and return error
> right away. But actually the iommu is not needed for that device if
> xen_force is set.

During boot, Xen will configure the IOMMUs to fault on any DMA 
transactions that are not valid. So if you don't call 
iommu_assign_dt_device(), then your device will be unusable.

Without your patch, the user will know directly something went wrong. 
With your patch, the fault may occur much later and be more difficult to 
diagnose.

> (In fact, one of the reasons why a user might want to set
> force-assign-without-iommu is because there are iommu issues with a
> device.)
This would not work because of the reasons I explained above. The only 
way would be to configure the IOMMU in bypass mode for that device.

So you would still need to call the IOMMU subsystem.

> 
> 
> The second issue is about the usage of "xen,force-assign-without-iommu":
> it would be useful to let the user set "xen,force-assign-without-iommu"
> for devices that are described as behind an SMMU in the device tree, but
> the SMMU can't actually be used for some reason. Of course, the user
> could always manually edit the device tree to make it look as if
> the device is not behind an IOMMU. That would work OK. However, I think
> it would be better to avoid making that a requirement.

 From the documentation:

"If xen,force-assign-without-iommu is present, Xen allows to assign a
device even if it is not behind an IOMMU. This renders your platform
*unsafe* if the device is DMA-capable."

xen,force-assign-without-iommu was never meant to be used if the device 
is protected behind an IOMMU. If you want to do that, then your patch is 
not going to be sufficient (see why above).

> 
> If we want to allow "xen,force-assign-without-iommu" for a device behind
> a SMMU then we need this patch, otherwise this would happen:
> 
>      res = iommu_add_dt_device(node); // succeeds
>      if ( xen_force && !dt_device_is_protected(node) ) // fails because the device is protected
>          return 0;
>      return iommu_assign_dt_device(kinfo->d, node); // fails because !is_iommu_enabled(d) which is fine but then handle_prop_pfdt returns error too

You are mixing two things here... xen,force-assign-without-iommu doesn't 
have a say on whether the IOMMU will be used for a domain. This decision 
is only based on whether a partial DT exists and (with your patch #3) 
whether the DomU memory is direct mapped.

The problem here is that you are not enabling the IOMMU when a direct
mapping is used. I don't think we want the direct-map option to
disable the IOMMU. This should be a separate option (maybe a boolean
'iommu' property).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 05/12] xen: introduce reserve_heap_pages
  2020-04-29 22:46     ` Stefano Stabellini
  2020-04-30  6:29       ` Jan Beulich
@ 2020-04-30 14:51       ` Julien Grall
  2020-04-30 17:00         ` Stefano Stabellini
  1 sibling, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-04-30 14:51 UTC (permalink / raw)
  To: Stefano Stabellini, Jan Beulich
  Cc: Wei Liu, andrew.cooper3, Ian Jackson, George Dunlap, xen-devel,
	Stefano Stabellini, Volodymyr_Babchuk, Woodhouse, David

Hi,

On 29/04/2020 23:46, Stefano Stabellini wrote:
> On Fri, 17 Apr 2020, Jan Beulich wrote:
>> On 15.04.2020 03:02, Stefano Stabellini wrote:
>>> Introduce a function named reserve_heap_pages (similar to
>>> alloc_heap_pages) that allocates a requested memory range. Call
>>> __alloc_heap_pages for the implementation.
>>>
>>> Change __alloc_heap_pages so that the original page doesn't get
>>> modified, giving back unneeded memory top to bottom rather than bottom
>>> to top.
>>
>> While it may be less of a problem within a zone, doing so is
>> against our general "return high pages first" policy.
> 
> Is this something you'd be OK with anyway?
> 
> If not, do you have a suggestion on how to do it better? I couldn't find
> a nice way to do it without code duplication, or a big nasty 'if' in the
> middle of the function.
> 
> 
>>> @@ -1073,7 +1073,42 @@ static struct page_info *alloc_heap_pages(
>>>           return NULL;
>>>       }
>>>   
>>> -    __alloc_heap_pages(&pg, order, memflags, d);
>>> +    __alloc_heap_pages(pg, order, memflags, d);
>>> +    return pg;
>>> +}
>>> +
>>> +static struct page_info *reserve_heap_pages(struct domain *d,
>>> +                                            paddr_t start,
>>> +                                            unsigned int order,
>>> +                                            unsigned int memflags)
>>> +{
>>> +    nodeid_t node;
>>> +    unsigned int zone;
>>> +    struct page_info *pg;
>>> +
>>> +    if ( unlikely(order > MAX_ORDER) )
>>> +        return NULL;
>>> +
>>> +    spin_lock(&heap_lock);
>>> +
>>> +    /*
>>> +     * Claimed memory is considered unavailable unless the request
>>> +     * is made by a domain with sufficient unclaimed pages.
>>> +     */
>>> +    if ( (outstanding_claims + (1UL << order) > total_avail_pages) &&
>>> +          ((memflags & MEMF_no_refcount) ||
>>> +           !d || d->outstanding_pages < (1UL << order)) )
>>> +    {
>>> +        spin_unlock(&heap_lock);
>>> +        return NULL;
>>> +    }
>>
>> Where would such a claim come from? Given the purpose I'd assume
>> the function (as well as reserve_domheap_pages()) can actually be
>> __init. With that I'd then also be okay with them getting built
>> unconditionally, i.e. even on x86.
> 
> Yes, you are right, I'll make the function __init and also remove this
> check on claimed memory.
> 
> 
>>> +    pg = maddr_to_page(start);
>>> +    node = phys_to_nid(start);
>>> +    zone = page_to_zone(pg);
>>> +    page_list_del(pg, &heap(node, zone, order));
>>> +
>>> +    __alloc_heap_pages(pg, order, memflags, d);
>>
>> I agree with Julien in not seeing how this can be safe / correct.
> 
> I haven't seen any issues so far in my testing -- I imagine it is
> because there aren't many memory allocations after setup_mm() and before
> create_domUs()  (which on ARM is called just before
> domain_unpause_by_systemcontroller at the end of start_xen.)

I am not sure why you exclude setup_mm(). Any memory allocated (boot 
allocator, xenheap) can clash with your regions. The main memory 
allocations are for the frametable and dom0. I would say you were lucky 
to not hit them.

> 
> 
> I gave a quick look at David's series. Is the idea that I should add a
> patch to do the following:
> 
> - avoiding adding these ranges to xenheap in setup_mm, wait for later
>    (a bit like reserved_mem regions)

I guess by xenheap, you mean domheap? But the problem is not only for 
domheap, it is also for any memory allocated via the boot allocator. So 
you need to exclude those regions from any possible allocations.

> 
> - in construct_domU, add the range to xenheap and reserve it with reserve_heap_pages

I am afraid you can't give the regions to the allocator and then 
allocate them. The allocator is free to use any page for its own purpose 
or exclude them.

AFAICT, the allocator doesn't have a list of pages in use. It only keeps 
track of free pages. So we can make the content of struct page_info 
look like it was allocated by the allocator.

We would need to be careful when giving a page back to the allocator as the 
page would need to be initialized (see [1]). This may not be a concern 
for Dom0less as the domain may never be destroyed, but it will be from a 
correctness PoV.

For LiveUpdate, the original Xen will carve out space to be used by the boot 
allocator in the new Xen. But I think this is not necessary in your context.

It should be sufficient to exclude the page from the boot allocators (as 
we do for other modules).

One potential issue that can arise is that there is no easy way today to 
differentiate between pages allocated and pages not yet initialized. To 
make the code robust, we need to prevent a page from being used in two 
places. So for LiveUpdate we are marking them with a special value, which 
is used afterwards to check that we are effectively using a reserved page.

I hope this helps.

Cheers,

[1] <20200319212150.2651419-2-dwmw2@infradead.org>

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 05/12] xen: introduce reserve_heap_pages
  2020-04-30  6:29       ` Jan Beulich
@ 2020-04-30 16:21         ` Stefano Stabellini
  2020-05-04  9:16           ` Jan Beulich
  0 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-30 16:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, julien, Wei Liu, andrew.cooper3, Ian Jackson,
	George Dunlap, xen-devel, Stefano Stabellini, Volodymyr_Babchuk

On Thu, 30 Apr 2020, Jan Beulich wrote:
> On 30.04.2020 00:46, Stefano Stabellini wrote:
> > On Fri, 17 Apr 2020, Jan Beulich wrote:
> >> On 15.04.2020 03:02, Stefano Stabellini wrote:
> >>> Introduce a function named reserve_heap_pages (similar to
> >>> alloc_heap_pages) that allocates a requested memory range. Call
> >>> __alloc_heap_pages for the implementation.
> >>>
> >>> Change __alloc_heap_pages so that the original page doesn't get
> >>> modified, giving back unneeded memory top to bottom rather than bottom
> >>> to top.
> >>
> >> While it may be less of a problem within a zone, doing so is
> >> against our general "return high pages first" policy.
> > 
> > Is this something you'd be OK with anyway?
> 
> As a last resort, maybe. But it really depends on why it needs to be
> this way.
> 
> > If not, do you have a suggestion on how to do it better? I couldn't find
> > a nice way to do it without code duplication, or a big nasty 'if' in the
> > middle of the function.
> 
> I'd first need to understand the problem to solve.

OK, I'll make an example.

reserve_heap_pages wants to reserve the range 0x10000000 - 0x20000000.

reserve_heap_pages gets the struct page_info for 0x10000000 and calls
alloc_pages_from_buddy to allocate an order of 28.

alloc_pages_from_buddy realizes that the free memory area starting from
0x10000000 is actually of order 30, even larger than the requested order
of 28. The free area is 0x10000000 - 0x50000000.

Instead of just allocating an order of 28 starting from 0x10000000,
alloc_pages_from_buddy would allocate the "top" order-28 chunk of the
free area. So it would allocate 0x40000000 - 0x50000000, returning
0x40000000.

Of course, this doesn't work for reserve_heap_pages.


This patch changes the behavior of alloc_pages_from_buddy so that in a
situation like the above, it would return 0x10000000 - 0x20000000
(leaving the rest of the free area unallocated.)
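
Implementation-wise, the change boils down to inverting the chunk
halving loop inside the allocator (a sketch of the idea, not the exact
patch): keep pg at the start of the range and give the upper halves
back to the heap, instead of freeing the lower half and advancing pg:

    while ( j != order )
    {
        --j;
        /* Free the upper half; keep the lower half for the caller. */
        PFN_ORDER(pg + (1UL << j)) = j;
        page_list_add_tail(pg + (1UL << j), &heap(node, zone, j));
    }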


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 05/12] xen: introduce reserve_heap_pages
  2020-04-30 14:51       ` Julien Grall
@ 2020-04-30 17:00         ` Stefano Stabellini
  2020-04-30 18:27           ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-04-30 17:00 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, andrew.cooper3, Ian Jackson,
	George Dunlap, Jan Beulich, xen-devel, Stefano Stabellini,
	Volodymyr_Babchuk, Woodhouse, David

On Thu, 30 Apr 2020, Julien Grall wrote:
> > > > +    pg = maddr_to_page(start);
> > > > +    node = phys_to_nid(start);
> > > > +    zone = page_to_zone(pg);
> > > > +    page_list_del(pg, &heap(node, zone, order));
> > > > +
> > > > +    __alloc_heap_pages(pg, order, memflags, d);
> > > 
> > > I agree with Julien in not seeing how this can be safe / correct.
> > 
> > I haven't seen any issues so far in my testing -- I imagine it is
> > because there aren't many memory allocations after setup_mm() and before
> > create_domUs()  (which on ARM is called just before
> > domain_unpause_by_systemcontroller at the end of start_xen.)
> 
> I am not sure why you exclude setup_mm(). Any memory allocated (boot
> allocator, xenheap) can clash with your regions. The main memory allocations
> are for the frametable and dom0. I would say you were lucky to not hit them.

Maybe it is because Xen typically allocates memory top-down? So if I
chose a high range then I would see a failure? But I have been mostly
testing with ranges close to the beginning of RAM (as opposed to
ranges close to the end of RAM).

 
> > I gave a quick look at David's series. Is the idea that I should add a
> > patch to do the following:
> > 
> > - avoiding adding these ranges to xenheap in setup_mm, wait for later
> >    (a bit like reserved_mem regions)
> 
> I guess by xenheap, you mean domheap? But the problem is not only for domheap,
> it is also for any memory allocated via the boot allocator. So you need to
> exclude those regions from any possible allocations.

OK, I think we are saying the same thing but let me check.

By boot allocator you mean alloc_boot_pages, right? That boot allocator
operates on ranges given to it by init_boot_pages calls.
init_boot_pages is called from setup_mm. I didn't write it clearly but
I also meant not calling init_boot_pages on them from setup_mm.

Are we saying the same thing?



> > - in construct_domU, add the range to xenheap and reserve it with
> > reserve_heap_pages
> 
> I am afraid you can't give the regions to the allocator and then allocate
> them. The allocator is free to use any page for its own purpose or exclude
> them.
>
> AFAICT, the allocator doesn't have a list of page in use. It only keeps track
> of free pages. So we can make the content of struct page_info to look like it
> was allocated by the allocator.
> 
> We would need to be careful when giving a page back to allocator as the page
> would need to be initialized (see [1]). This may not be a concern for Dom0less
> as the domain may never be destroyed but will be for correctness PoV.
> 
> For LiveUpdate, the original Xen will carve out space to use by the boot
> allocator in the new Xen. But I think this is not necessary in your context.
> 
> It should be sufficient to exclude the page from the boot allocators (as we do
> for other modules).
> 
> One potential issue that can arise is there is no easy way today to
> differentiate between pages allocated and pages not yet initialized. To make
> the code robust, we need to prevent a page to be used in two places. So for
> LiveUpdate we are marking them with a special value, this is used afterwards
> to check we are effictively using a reserved page.
> 
> I hope this helps.

Thanks for writing all of this down but I haven't understood some of it.

For the sake of this discussion let's say that we managed to "reserve"
the range early enough like we do for other modules, as you wrote.

At the point where we want to call reserve_heap_pages() we would call
init_heap_pages() just before it. We are still relatively early at boot
so there aren't any concurrent memory operations. Why doesn't this work?

If it doesn't work, I am not following your alternative suggestion
about making "the content of struct page_info look like it was
allocated by the allocator."


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 05/12] xen: introduce reserve_heap_pages
  2020-04-30 17:00         ` Stefano Stabellini
@ 2020-04-30 18:27           ` Julien Grall
  2020-05-12  1:10             ` Stefano Stabellini
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-04-30 18:27 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Wei Liu, andrew.cooper3, Ian Jackson, George Dunlap, Jan Beulich,
	xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Woodhouse,
	David

Hi,

On 30/04/2020 18:00, Stefano Stabellini wrote:
> On Thu, 30 Apr 2020, Julien Grall wrote:
>>>>> +    pg = maddr_to_page(start);
>>>>> +    node = phys_to_nid(start);
>>>>> +    zone = page_to_zone(pg);
>>>>> +    page_list_del(pg, &heap(node, zone, order));
>>>>> +
>>>>> +    __alloc_heap_pages(pg, order, memflags, d);
>>>>
>>>> I agree with Julien in not seeing how this can be safe / correct.
>>>
>>> I haven't seen any issues so far in my testing -- I imagine it is
>>> because there aren't many memory allocations after setup_mm() and before
>>> create_domUs()  (which on ARM is called just before
>>> domain_unpause_by_systemcontroller at the end of start_xen.)
>>
>> I am not sure why you exclude setup_mm(). Any memory allocated (boot
>> allocator, xenheap) can clash with your regions. The main memory allocations
>> are for the frametable and dom0. I would say you were lucky to not hit them.
> 
> Maybe it is because Xen typically allocates memory top-down? So if I
> chose a high range then I would see a failure? But I have been mostly
> testing with ranges close to the beginning of RAM (as opposed to
> ranges close to the end of RAM).

I haven't looked at the details of the implementation, but you can try 
to specify dom0 addresses for your domU. You should see a failure.

> 
>   
>>> I gave a quick look at David's series. Is the idea that I should add a
>>> patch to do the following:
>>>
>>> - avoiding adding these ranges to xenheap in setup_mm, wait for later
>>>     (a bit like reserved_mem regions)
>>
>> I guess by xenheap, you mean domheap? But the problem is not only for domheap,
>> it is also for any memory allocated via the boot allocator. So you need to
>> exclude those regions from any possible allocations.
> 
> OK, I think we are saying the same thing but let me check.
> 
> By boot allocator you mean alloc_boot_pages, right? That boot allocator
> operates on ranges given to it by init_boot_pages calls.

That's correct.

> init_boot_pages is called from setup_mm. I didn't write it clearly but
> I also meant not calling init_boot_pages on them from setup_mm.
> 
> Are we saying the same thing?

Yes.

> 
> 
>>> - in construct_domU, add the range to xenheap and reserve it with
>>> reserve_heap_pages
>>
>> I am afraid you can't give the regions to the allocator and then allocate
>> them. The allocator is free to use any page for its own purpose or exclude
>> them.
>>
>> AFAICT, the allocator doesn't have a list of pages in use. It only keeps track
>> of free pages. So we can make the content of struct page_info look like it
>> was allocated by the allocator.
>>
>> We would need to be careful when giving a page back to the allocator as the page
>> would need to be initialized (see [1]). This may not be a concern for Dom0less
>> as the domain may never be destroyed, but it will be from a correctness PoV.
>>
>> For LiveUpdate, the original Xen will carve out space to be used by the boot
>> allocator in the new Xen. But I think this is not necessary in your context.
>>
>> It should be sufficient to exclude the page from the boot allocators (as we do
>> for other modules).
>>
>> One potential issue that can arise is that there is no easy way today to
>> differentiate between pages allocated and pages not yet initialized. To make
>> the code robust, we need to prevent a page from being used in two places. So
>> for LiveUpdate we are marking them with a special value, which is used
>> afterwards to check that we are effectively using a reserved page.
>>
>> I hope this helps.
> 
> Thanks for writing all of this down but I haven't understood some of it.
> 
> For the sake of this discussion let's say that we managed to "reserve"
> the range early enough like we do for other modules, as you wrote.
> 
> At the point where we want to call reserve_heap_pages() we would call
> init_heap_pages() just before it. We are still relatively early at boot
> so there aren't any concurrent memory operations. Why doesn't this work?

Because init_heap_pages() may exclude some pages (for instance MFN 0 is 
carved out) or use pages for its internal structures (see 
init_node_heap()). So you can't expect to be able to allocate the exact 
same region via reserve_heap_pages().

> 
> If it doesn't work, I am not following your alternative suggestion
> about making "the content of struct page_info look like it was
> allocated by the allocator."

If you look at alloc_heap_pages(), when it allocates pages the
allocator initializes some fields in struct page_info before returning
the page. We basically need to do the same thing, so that the struct
page_info looks exactly the same whether we call alloc_heap_pages() or
use memory that was carved out from the allocator.

David has spent more time than me on this problem, so I may be missing 
some bits. Based on what we did in the LU PoC, my suggestion would be to:
    1) Carve out the memory from any allocator (and before any memory is 
allocated).
    2) Make sure a struct page_info is allocated for those regions in 
the boot allocator
    3) Mark the regions as reserved in the frametable so we can 
differentiate them from the other pages.
    4) Allocate the region when necessary

When it is necessary to allocate the region, for each page:
    1) Check if it is a valid page
    2) Check if the page is reserved
    3) Do the necessary preparation on struct page_info

At the moment, in the LU PoC, we are using count_info = PGC_allocated to 
mark the reserved pages. I don't particularly like it and am not sure of 
the consequences. So I am open to a different way to mark them.
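
For illustration, the marking step in the LU PoC boils down to something
like this (a sketch; start_mfn and nr_pages stand in for whatever the
carve-out logic provides):

    unsigned long i;

    for ( i = 0; i < nr_pages; i++ )
    {
        struct page_info *pg = mfn_to_page(mfn_add(start_mfn, i));

        /* Marker used in the LU PoC; open to a better scheme. */
        pg->count_info = PGC_allocated;
    }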

The last part we need to take care of is how to hand the pages back to 
the allocator. This may happen if your domain dies or balloons memory 
(although not in the direct-map case). Even without this series, this is 
actually already a problem today because boot allocator pages may be 
freed afterwards (I think this can only happen on x86 so far). But we 
are getting away with it because in most cases you never carve out a 
full NUMA node. This is where David's patch should help.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 11/12] xen/arm: if xen_force don't try to setup the IOMMU
  2020-04-30 13:51       ` Julien Grall
@ 2020-05-01  1:25         ` Stefano Stabellini
  0 siblings, 0 replies; 67+ messages in thread
From: Stefano Stabellini @ 2020-05-01  1:25 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Stefano Stabellini

On Thu, 30 Apr 2020, Julien Grall wrote:
> Hi Stefano,
> 
> On 29/04/2020 22:55, Stefano Stabellini wrote:
> > On Wed, 15 Apr 2020, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 15/04/2020 02:02, Stefano Stabellini wrote:
> > > > If xen_force (which means xen,force-assign-without-iommu was requested)
> > > > don't try to add the device to the IOMMU. Return early instead.
> > > 
> > > 
> > > Could you explain why this is an issue to call xen_force after
> > > iommu_add_dt_device()?
> > 
> > There are two issues. I should add info about both of them to the commit
> > message.
> > 
> > 
> > The first issue is that an error returned by iommu_add_dt_device (for
> > any reason) would cause handle_passthrough_prop to stop and return error
> > right away. But actually the iommu is not needed for that device if
> > xen_force is set.
> 
> During boot, Xen will configure the IOMMUs to fault on any DMA transactions
> that are not valid. So if you don't call iommu_assign_dt_device(), then your
> device will be unusable.
> 
> Without your patch, the user will know directly something went wrong. With
> your patch, the fault may occur much later and be more difficult to
> diagnose.
> 
> > (In fact, one of the reasons why a user might want to set
> > force-assign-without-iommu is because there are iommu issues with a
> > device.)
> This would not work because of the reasons I explained above. The only way
> would be to configure the IOMMU in bypass mode for that device.
> 
> So you would still need to call the IOMMU subsystem.

What you wrote makes sense and also matches my understanding. But some
of my testing results confused me. I am going to go back and look at
this closely again, but I am thinking of dropping this patch and
avoiding interfering with IOMMU settings.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 10/12] xen/arm: if is_domain_direct_mapped use native UART address for vPL011
  2020-04-15 14:11   ` Julien Grall
@ 2020-05-01  1:26     ` Stefano Stabellini
  2020-05-01  8:09       ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-05-01  1:26 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Stefano Stabellini

On Wed, 15 Apr 2020, Julien Grall wrote:
> Hi Stefano,
> 
> On 15/04/2020 02:02, Stefano Stabellini wrote:
> > We always use a fixed address to map the vPL011 to domains. The address
> > could be a problem for domains that are directly mapped.
> > 
> > Instead, for domains that are directly mapped, reuse the address of the
> > physical UART on the platform to avoid potential clashes.
> 
> How do you know the physical UART MMIO region is big enough to fit the PL011?

That cannot happen because the vPL011 MMIO size is 1 page, which is the
minimum, right?


> What if the user want to assign the physical UART to the domain + the vpl011?

A user can assign a UART to the domain, but it cannot assign the UART
used by Xen (DTUART), which is the one we are using here to get the
physical information.


(If there is no UART used by Xen then we'll fall back to the virtual
addresses. If they conflict, we return an error and let the user fix the
configuration.)
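
In other words, the selection logic is roughly this (a sketch only;
dtuart_base and range_overlaps_guest_ram are hypothetical names for
"the base of Xen's own UART" and "the clash check"):

    if ( is_domain_direct_mapped(d) && dtuart_base != INVALID_PADDR )
        vpl011_base = dtuart_base;        /* native UART address */
    else
        vpl011_base = GUEST_PL011_BASE;   /* fixed virtual address */

    /* Error out if the chosen address clashes with guest RAM. */
    if ( range_overlaps_guest_ram(d, vpl011_base, GUEST_PL011_SIZE) )
        return -EINVAL;                   /* user must fix the config */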


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 08/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv2
  2020-04-15 14:00   ` Julien Grall
@ 2020-05-01  1:26     ` Stefano Stabellini
  2020-05-01  8:23       ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-05-01  1:26 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Stefano Stabellini

On Wed, 15 Apr 2020, Julien Grall wrote:
> On 15/04/2020 02:02, Stefano Stabellini wrote:
> > Today we use native addresses to map the GICv2 for Dom0 and fixed
> > addresses for DomUs.
> > 
> > This patch changes the behavior so that native addresses are used for
> > any domain that is_domain_direct_mapped.
> > 
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> > ---
> >   xen/arch/arm/domain_build.c | 10 +++++++---
> >   xen/arch/arm/vgic-v2.c      | 12 ++++++------
> >   xen/arch/arm/vgic/vgic-v2.c |  2 +-
> >   xen/include/asm-arm/vgic.h  |  1 +
> >   4 files changed, 15 insertions(+), 10 deletions(-)
> > 
> > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > index 627e0c5e8e..303bee60f6 100644
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -1643,8 +1643,12 @@ static int __init make_gicv2_domU_node(struct kernel_info *kinfo)
> >      int res = 0;
> >      __be32 reg[(GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) * 2];
> >      __be32 *cells;
> > +    struct domain *d = kinfo->d;
> > +    char buf[38];
> >  
> > -    res = fdt_begin_node(fdt, "interrupt-controller@"__stringify(GUEST_GICD_BASE));
> > +    snprintf(buf, sizeof(buf), "interrupt-controller@%"PRIx64,
> > +             d->arch.vgic.dbase);
> > +    res = fdt_begin_node(fdt, buf);
> >      if ( res )
> >          return res;
> >  
> > @@ -1666,9 +1670,9 @@ static int __init make_gicv2_domU_node(struct kernel_info *kinfo)
> >  
> >      cells = &reg[0];
> >      dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
> > -                       GUEST_GICD_BASE, GUEST_GICD_SIZE);
> > +                       d->arch.vgic.dbase, GUEST_GICD_SIZE);
> >      dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
> > -                       GUEST_GICC_BASE, GUEST_GICC_SIZE);
> > +                       d->arch.vgic.cbase, GUEST_GICC_SIZE);
> 
> You can't use the native base address and not the native size. Particularly,
this is going to screw up any GIC using the 8KB alias.

I don't follow why it could cause problems with a GIC using the 8KB
alias. Maybe an example (even a fake example) would help.



> It would be preferable to only expose part of the CPU interface, as we do for
> the guest. So d->arch.vgic.cbase would be equal to vgic_v2_hw.dbase +
> vgic_v2.hw.aliased_offset.



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 03/12] xen/arm: introduce 1:1 mapping for domUs
  2020-04-15 13:36   ` Julien Grall
@ 2020-05-01  1:26     ` Stefano Stabellini
  2020-05-01  8:30       ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-05-01  1:26 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Stefano Stabellini

On Wed, 15 Apr 2020, Julien Grall wrote:
> On 15/04/2020 02:02, Stefano Stabellini wrote:
> > In some cases it is desirable to map domU memory 1:1 (guest physical ==
> > physical.) For instance, because we want to assign a device to the domU
> > but the IOMMU is not present or cannot be used. In these cases, other
> > mechanisms should be used for DMA protection, e.g. a MPU.
> 
> I am not against this, however the documentation should clearly explain that
> you are making your platform insecure unless you have other mean for DMA
> protection.

I'll expand the docs


> > 
> > This patch introduces a new device tree option for dom0less guests to
> > request a domain to be directly mapped. It also specifies the memory
> > ranges. This patch documents the new attribute and parses it at boot
> > time. (However, the implementation of 1:1 mapping is missing and just
> > BUG()s out at the moment.)  Finally the patch sets the new direct_map
> > flag for DomU domains.
> > 
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> > ---
> >   docs/misc/arm/device-tree/booting.txt | 13 +++++++
> >   docs/misc/arm/passthrough-noiommu.txt | 35 ++++++++++++++++++
> >   xen/arch/arm/domain_build.c           | 52 +++++++++++++++++++++++++--
> >   3 files changed, 98 insertions(+), 2 deletions(-)
> >   create mode 100644 docs/misc/arm/passthrough-noiommu.txt
> > 
> > diff --git a/docs/misc/arm/device-tree/booting.txt
> > b/docs/misc/arm/device-tree/booting.txt
> > index 5243bc7fd3..fce5f7ed5a 100644
> > --- a/docs/misc/arm/device-tree/booting.txt
> > +++ b/docs/misc/arm/device-tree/booting.txt
> > @@ -159,6 +159,19 @@ with the following properties:
> >       used, or GUEST_VPL011_SPI+1 if vpl011 is enabled, whichever is
> >       greater.
> >   +- direct-map
> > +
> > +    Optional. An array of integer pairs specifying addresses and sizes.
> > +    direct_map requests the memory of the domain to be 1:1 mapped with
> > +    the memory ranges specified as argument. Only sizes that are a
> > +    power of two number of pages are allowed.
> > +
> > +- #direct-map-addr-cells and #direct-map-size-cells
> > +
> > +    The number of cells to use for the addresses and for the sizes in
> > +    direct-map. Default and maximum are 2 cells for both addresses and
> > +    sizes.
> > +
> 
> As this is going to be mostly used for passthrough, can't we take advantage of
> the partial device-tree and describe the memory region using a memory node?

With the system device tree bindings that are under discussion, the role
of the partial device tree might be reduced going forward, and might even
go away in the long term. For this reason, I would prefer not to add
more things to the partial device tree.


> >   - #address-cells and #size-cells
> >         Both #address-cells and #size-cells need to be specified because
> > diff --git a/docs/misc/arm/passthrough-noiommu.txt
> > b/docs/misc/arm/passthrough-noiommu.txt
> > new file mode 100644
> > index 0000000000..d2bfaf26c3
> > --- /dev/null
> > +++ b/docs/misc/arm/passthrough-noiommu.txt
> > @@ -0,0 +1,35 @@
> > +Request Device Assignment without IOMMU support
> > +===============================================
> > +
> > +Add xen,force-assign-without-iommu; to the device tree snippet
> > +
> > +    ethernet: ethernet@ff0e0000 {
> > +        compatible = "cdns,zynqmp-gem";
> > +        xen,path = "/amba/ethernet@ff0e0000";
> > +        xen,reg = <0x0 0xff0e0000 0x1000 0x0 0xff0e0000>;
> > +        xen,force-assign-without-iommu;
> > +
> > +Optionally, if none of the domains require an IOMMU, then it could be
> > +disabled (not recommended). For instance by adding status = "disabled";
> > +under the smmu node:
> > +
> > +    smmu@fd800000 {
> > +        compatible = "arm,mmu-500";
> > +        status = "disabled";
> 
> I am not sure why this section is added in this patch. Furthermore, the file
> is named "noiommu" but here you mention the IOMMU.

I have gotten into the habit of writing user and testing docs at the
time of writing the patch series. I'll move this doc to the end of the
series. Also, the wording here is inaccurate; I'll improve it in the
next version.


I have addressed all other comments.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 09/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv3
  2020-04-15 14:09   ` Julien Grall
@ 2020-05-01  1:31     ` Stefano Stabellini
  2020-05-01  8:40       ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-05-01  1:31 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Stefano Stabellini

On Wed, 15 Apr 2020, Julien Grall wrote:
> > diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> > index 4e60ba15cc..4cf430f865 100644
> > --- a/xen/arch/arm/vgic-v3.c
> > +++ b/xen/arch/arm/vgic-v3.c
> > @@ -1677,13 +1677,25 @@ static int vgic_v3_domain_init(struct domain *d)
> 
> 
> I think you also want to modify vgic_v3_max_rdist_count().

I don't think so: domUs, even direct-mapped ones, still only get 1 rdist
region. This patch is not changing the layout of the domU GIC; it is
only finding a "hole" in the physical address space to make sure there
are no conflicts (or at least to minimize the chance of conflicts).


> >        * Domain 0 gets the hardware address.
> >        * Guests get the virtual platform layout.
> 
> This comment needs to be updated.

Yep, I'll do


> >        */
> > -    if ( is_hardware_domain(d) )
> > +    if ( is_domain_direct_mapped(d) )
> >      {
> >          unsigned int first_cpu = 0;
> > +        unsigned int nr_rdist_regions;
> >  
> >          d->arch.vgic.dbase = vgic_v3_hw.dbase;
> >  
> > -        for ( i = 0; i < vgic_v3_hw.nr_rdist_regions; i++ )
> > +        if ( is_hardware_domain(d) )
> > +        {
> > +            nr_rdist_regions = vgic_v3_hw.nr_rdist_regions;
> > +            d->arch.vgic.intid_bits = vgic_v3_hw.intid_bits;
> > +        }
> > +        else
> > +        {
> > +            nr_rdist_regions = 1;
> 
> What does promise your the rdist region will be big enough to cater all the
> re-distributors for your domain?

Good point. I'll add an explicit check for that, with at least a warning.
I don't think we want to return an error because the configuration is
still likely to work.
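
For reference, the kind of check I have in mind (field names and sizes
are illustrative, not the final code):

    /* The single exposed rdist region must cover all the vCPUs. */
    if ( vgic_v3_hw.regions[0].size < d->max_vcpus * GICV3_GICR_SIZE )
        printk(XENLOG_WARNING
               "d%d: hardware rdist region too small for %u vCPUs\n",
               d->domain_id, d->max_vcpus);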

It might be better to continue this conversation on the next version of
the patch -- it is going to be much clearer.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 10/12] xen/arm: if is_domain_direct_mapped use native UART address for vPL011
  2020-05-01  1:26     ` Stefano Stabellini
@ 2020-05-01  8:09       ` Julien Grall
  2020-05-09  0:07         ` Stefano Stabellini
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-05-01  8:09 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk



On 01/05/2020 02:26, Stefano Stabellini wrote:
> On Wed, 15 Apr 2020, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 15/04/2020 02:02, Stefano Stabellini wrote:
>>> We always use a fixed address to map the vPL011 to domains. The address
>>> could be a problem for domains that are directly mapped.
>>>
>>> Instead, for domains that are directly mapped, reuse the address of the
>>> physical UART on the platform to avoid potential clashes.
>>
>> How do you know the physical UART MMIO region is big enough to fit the PL011?
> 
> That cannot happen because the vPL011 MMIO size is 1 page, which is the
> minimum, right?

No, there are platforms out there with multiple UARTs in the same page
(see sunxi for instance).

> 
> 
>> What if the user want to assign the physical UART to the domain + the vpl011?
> 
> A user can assign a UART to the domain, but it cannot assign the UART
> used by Xen (DTUART), which is the one we are using here to get the
> physical information.
> 
> 
> (If there is no UART used by Xen then we'll fall back to the virtual
> addresses. If they conflict, we return an error and let the user fix the
> configuration.)

If there is no UART in Xen, how will the user know the addresses 
conflicted? Earlyprintk?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 08/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv2
  2020-05-01  1:26     ` Stefano Stabellini
@ 2020-05-01  8:23       ` Julien Grall
  2020-05-09  0:06         ` Stefano Stabellini
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-05-01  8:23 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk



On 01/05/2020 02:26, Stefano Stabellini wrote:
> On Wed, 15 Apr 2020, Julien Grall wrote:
>> On 15/04/2020 02:02, Stefano Stabellini wrote:
>>> Today we use native addresses to map the GICv2 for Dom0 and fixed
>>> addresses for DomUs.
>>>
>>> This patch changes the behavior so that native addresses are used for
>>> any domain that is_domain_direct_mapped.
>>>
>>> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
>>> ---
>>>    xen/arch/arm/domain_build.c | 10 +++++++---
>>>    xen/arch/arm/vgic-v2.c      | 12 ++++++------
>>>    xen/arch/arm/vgic/vgic-v2.c |  2 +-
>>>    xen/include/asm-arm/vgic.h  |  1 +
>>>    4 files changed, 15 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>>> index 627e0c5e8e..303bee60f6 100644
>>> --- a/xen/arch/arm/domain_build.c
>>> +++ b/xen/arch/arm/domain_build.c
>>> @@ -1643,8 +1643,12 @@ static int __init make_gicv2_domU_node(struct kernel_info *kinfo)
>>>      int res = 0;
>>>      __be32 reg[(GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) * 2];
>>>      __be32 *cells;
>>> +    struct domain *d = kinfo->d;
>>> +    char buf[38];
>>>  
>>> -    res = fdt_begin_node(fdt, "interrupt-controller@"__stringify(GUEST_GICD_BASE));
>>> +    snprintf(buf, sizeof(buf), "interrupt-controller@%"PRIx64,
>>> +             d->arch.vgic.dbase);
>>> +    res = fdt_begin_node(fdt, buf);
>>>      if ( res )
>>>          return res;
>>>  
>>> @@ -1666,9 +1670,9 @@ static int __init make_gicv2_domU_node(struct kernel_info *kinfo)
>>>  
>>>      cells = &reg[0];
>>>      dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
>>> -                       GUEST_GICD_BASE, GUEST_GICD_SIZE);
>>> +                       d->arch.vgic.dbase, GUEST_GICD_SIZE);
>>>      dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
>>> -                       GUEST_GICC_BASE, GUEST_GICC_SIZE);
>>> +                       d->arch.vgic.cbase, GUEST_GICC_SIZE);
>>
>> You can't use the native base address and not the native size. Particularly,
>> this is going to screw up any GIC using the 8KB alias.
> 
> I don't follow why it could cause problems with a GIC using the 8KB
> alias. Maybe an example (even a fake example) would help.

The GICC interface is composed of two 4KB pages. In some
implementations, each page starts at a 64KB-aligned address, and each
page is also aliased every 4KB within its 64KB region.

For guests, we don't expose the full 128KB region but only part of it
(8KB). So the guest interface is the same regardless of the underlying
implementation of the GIC.

vgic.cbase points to the beginning of the first region. So what you are 
mapping is the first 8KB of the first region. The second region is not 
mapped at all.

As the pages are aliased, the trick we use is to map from vgic.cbase +
60KB (vgic_v2_hw.aliased_offset). This means the two pages will then be
contiguous in the guest physical memory.
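
To illustrate with a sketch (names approximate; see
vgic_v2_domain_init() for the real thing):

    /*
     * Map only the 8KB guest view, starting at the last 4KB alias of
     * the first 64KB region; the first page of the second region
     * follows right after, so the guest sees two contiguous pages.
     */
    paddr_t base = vgic_v2_hw.cbase + vgic_v2_hw.aliased_offset; /* +60KB */

    ret = map_mmio_regions(d, gaddr_to_gfn(d->arch.vgic.cbase),
                           GUEST_GICC_SIZE / PAGE_SIZE,
                           maddr_to_mfn(base));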

>> It would be preferable to only expose part of the CPU interface as we do for
>> the guest. So d->arch.vgic.cbase would be equal to vgic_v2_hw.dbase +

I meant cbase rather than dbase here.

>> vgic_v2_hw.aliased_offset.
> 

Cheers,

-- 
Julien Grall



* Re: [PATCH 03/12] xen/arm: introduce 1:1 mapping for domUs
  2020-05-01  1:26     ` Stefano Stabellini
@ 2020-05-01  8:30       ` Julien Grall
  2020-05-09  0:07         ` Stefano Stabellini
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-05-01  8:30 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk



On 01/05/2020 02:26, Stefano Stabellini wrote:
> On Wed, 15 Apr 2020, Julien Grall wrote:
>> On 15/04/2020 02:02, Stefano Stabellini wrote:
>>> In some cases it is desirable to map domU memory 1:1 (guest physical ==
>>> physical.) For instance, because we want to assign a device to the domU
>>> but the IOMMU is not present or cannot be used. In these cases, other
>>> mechanisms should be used for DMA protection, e.g. a MPU.
>>
>> I am not against this, however the documentation should clearly explain that
>> you are making your platform insecure unless you have other means for DMA
>> protection.
> 
> I'll expand the docs
> 
> 
>>>
>>> This patch introduces a new device tree option for dom0less guests to
>>> request a domain to be directly mapped. It also specifies the memory
>>> ranges. This patch documents the new attribute and parses it at boot
>>> time. (However, the implementation of 1:1 mapping is missing and just
>>> BUG() out at the moment.)  Finally the patch sets the new direct_map
>>> flag for DomU domains.
>>>
>>> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
>>> ---
>>>    docs/misc/arm/device-tree/booting.txt | 13 +++++++
>>>    docs/misc/arm/passthrough-noiommu.txt | 35 ++++++++++++++++++
>>>    xen/arch/arm/domain_build.c           | 52 +++++++++++++++++++++++++--
>>>    3 files changed, 98 insertions(+), 2 deletions(-)
>>>    create mode 100644 docs/misc/arm/passthrough-noiommu.txt
>>>
>>> diff --git a/docs/misc/arm/device-tree/booting.txt
>>> b/docs/misc/arm/device-tree/booting.txt
>>> index 5243bc7fd3..fce5f7ed5a 100644
>>> --- a/docs/misc/arm/device-tree/booting.txt
>>> +++ b/docs/misc/arm/device-tree/booting.txt
>>> @@ -159,6 +159,19 @@ with the following properties:
>>>        used, or GUEST_VPL011_SPI+1 if vpl011 is enabled, whichever is
>>>        greater.
>>>    +- direct-map
>>> +
>>> +    Optional. An array of integer pairs specifying addresses and sizes.
>>> +    direct_map requests the memory of the domain to be 1:1 mapped with
>>> +    the memory ranges specified as argument. Only sizes that are a
>>> +    power of two number of pages are allowed.
>>> +
>>> +- #direct-map-addr-cells and #direct-map-size-cells
>>> +
>>> +    The number of cells to use for the addresses and for the sizes in
>>> +    direct-map. Default and maximum are 2 cells for both addresses and
>>> +    sizes.
>>> +
>>
>> As this is going to be mostly used for passthrough, can't we take advantage of
>> the partial device-tree and describe the memory region using memory node?
> 
> With the system device tree bindings that are under discussion the role
> of the partial device tree might be reduced going forward, and might even
> go away in the long term. For this reason, I would prefer not to add
> more things to the partial device tree.

Was the interface you suggested approved by the committee behind system
device tree? If not, we will still have to support your proposal +
whatever the committee comes up with. So I am not entirely sure why
using the partial device-tree would be an issue.

It is actually better to keep everything in the partial device-tree, as
that avoids clashing with whatever you come up with for the system
device tree.

Also, I don't think the partial device-tree could ever go away, at
least in Xen. This is an external interface we provide to the user;
removing it would mean users would not be able to upgrade from Xen 4.x
to 4.y without a major rewrite of their DT.

Cheers,

-- 
Julien Grall



* Re: [PATCH 09/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv3
  2020-05-01  1:31     ` Stefano Stabellini
@ 2020-05-01  8:40       ` Julien Grall
  2020-05-09  0:06         ` Stefano Stabellini
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-05-01  8:40 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk

Hi Stefano,

On 01/05/2020 02:31, Stefano Stabellini wrote:
> On Wed, 15 Apr 2020, Julien Grall wrote:
>>> diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
>>> index 4e60ba15cc..4cf430f865 100644
>>> --- a/xen/arch/arm/vgic-v3.c
>>> +++ b/xen/arch/arm/vgic-v3.c
>>> @@ -1677,13 +1677,25 @@ static int vgic_v3_domain_init(struct domain *d)
>>
>>
>> I think you also want to modify vgic_v3_max_rdist_count().
> 
> I don't think so: domUs, even direct-mapped, still only get 1 rdist
> region. This patch is not changing the layout of the domU GIC; it is
> only finding a "hole" in the physical address space to make sure there
> are no conflicts (or at least to minimize the chance of conflicts.)

How do you know the "hole" is big enough?

> 
>>>         * Domain 0 gets the hardware address.
>>>         * Guests get the virtual platform layout.
>>
>> This comment needs to be updated.
> 
> Yep, I'll do
> 
> 
>>>         */
>>> -    if ( is_hardware_domain(d) )
>>> +    if ( is_domain_direct_mapped(d) )
>>>        {
>>>            unsigned int first_cpu = 0;
>>> +        unsigned int nr_rdist_regions;
>>>              d->arch.vgic.dbase = vgic_v3_hw.dbase;
>>>    -        for ( i = 0; i < vgic_v3_hw.nr_rdist_regions; i++ )
>>> +        if ( is_hardware_domain(d) )
>>> +        {
>>> +            nr_rdist_regions = vgic_v3_hw.nr_rdist_regions;
>>> +            d->arch.vgic.intid_bits = vgic_v3_hw.intid_bits;
>>> +        }
>>> +        else
>>> +        {
>>> +            nr_rdist_regions = 1;
>>
>> What promises you that the rdist region will be big enough to cater
>> for all the re-distributors of your domain?
> 
> Good point. I'll add an explicit check for that, with at least a warning.
> I don't think we want to return an error because the configuration is
> still likely to work.

No, it is not going to work. Imagine you have a guest with 3 vCPUs but
the first re-distributor region can only cater for 2 re-distributors.
How is it fine to continue in that case?

For dom0, we are re-using the same holes, but possibly not all of them.
Why can't we do that for domUs?
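
Something along these lines would do, mirroring the dom0 loop (a
sketch; names approximate, GICV3_GICR_SIZE standing for the per-vCPU
re-distributor stride):

    unsigned int i, first_cpu = 0;

    for ( i = 0; i < vgic_v3_hw.nr_rdist_regions &&
                 first_cpu < d->max_vcpus; i++ )
    {
        paddr_t size = vgic_v3_hw.regions[i].size;

        d->arch.vgic.rdist_regions[i].base = vgic_v3_hw.regions[i].base;
        d->arch.vgic.rdist_regions[i].size = size;
        d->arch.vgic.rdist_regions[i].first_cpu = first_cpu;
        first_cpu += size / GICV3_GICR_SIZE;
    }

    if ( first_cpu < d->max_vcpus )
        return -ENOSPC; /* the holes cannot cater for all the vCPUs */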

Cheers,

-- 
Julien Grall



* Re: [PATCH 05/12] xen: introduce reserve_heap_pages
  2020-04-30 16:21         ` Stefano Stabellini
@ 2020-05-04  9:16           ` Jan Beulich
  0 siblings, 0 replies; 67+ messages in thread
From: Jan Beulich @ 2020-05-04  9:16 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: julien, Wei Liu, andrew.cooper3, Ian Jackson, George Dunlap,
	xen-devel, Stefano Stabellini, Volodymyr_Babchuk

On 30.04.2020 18:21, Stefano Stabellini wrote:
> On Thu, 30 Apr 2020, Jan Beulich wrote:
>> On 30.04.2020 00:46, Stefano Stabellini wrote:
>>> On Fri, 17 Apr 2020, Jan Beulich wrote:
>>>> On 15.04.2020 03:02, Stefano Stabellini wrote:
>>>>> Introduce a function named reserve_heap_pages (similar to
>>>>> alloc_heap_pages) that allocates a requested memory range. Call
>>>>> __alloc_heap_pages for the implementation.
>>>>>
>>>>> Change __alloc_heap_pages so that the original page doesn't get
>>>>> modified, giving back unneeded memory top to bottom rather than bottom
>>>>> to top.
>>>>
>>>> While it may be less of a problem within a zone, doing so is
>>>> against our general "return high pages first" policy.
>>>
>>> Is this something you'd be OK with anyway?
>>
>> As a last resort, maybe. But it really depends on why it needs to be
>> this way.
>>
>>> If not, do you have a suggestion on how to do it better? I couldn't find
>>> a nice way to do it without code duplication, or a big nasty 'if' in the
>>> middle of the function.
>>
>> I'd first need to understand the problem to solve.
> 
> OK, I'll give an example.
> 
> reserve_heap_pages wants to reserve the range 0x10000000 - 0x20000000.
> 
> reserve_heap_pages gets the struct page_info for 0x10000000 and calls
> alloc_pages_from_buddy to allocate an order of 28.
> 
> alloc_pages_from_buddy realizes that the free memory area starting from
> 0x10000000 is actually of order 30, even larger than the requested order
> of 28. The free area is 0x10000000 - 0x50000000.
> 
> alloc_pages_from_buddy, instead of just allocating an order of 28
> starting from 0x10000000, would allocate the "top" order-28 chunk in
> the free area. So it would allocate 0x40000000 - 0x50000000, returning
> 0x40000000.
> 
> Of course, this doesn't work for reserve_heap_pages.
> 
> 
> This patch changes the behavior of alloc_pages_from_buddy so that in a
> situation like the above, it would return 0x10000000 - 0x20000000
> (leaving the rest of the free area unallocated.)

So what if then, for the same order-30 (really order-18 if I assume
you name addresses, not frame numbers), a reservation request came
in for the highest order-28 sub-region? You'd again be screwed if
you relied on which part of a larger buddy gets returned by the
lower level function you call. I can't help thinking that basing
reservation on allocation functions can't really be made work for
all possible cases. Instead reservation requests need to check that
the requested range is free _and_ split the potentially larger
range according to the request.
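
To illustrate the shape of what I mean (all helpers here are
hypothetical):

    /*
     * Hypothetical sketch: verify the requested range is entirely free,
     * then split the enclosing buddies, returning the parts outside the
     * range to the free lists, until the range itself is covered by
     * whole buddies which can simply be unlinked and handed out.
     */
    static struct page_info *reserve_range(mfn_t smfn, unsigned long nr)
    {
        struct page_info *pg = mfn_to_page(smfn);
        unsigned long i;

        for ( i = 0; i < nr; i++ )
            if ( !page_is_free(pg + i) )              /* hypothetical */
                return NULL;

        while ( !range_is_whole_buddies(smfn, nr) )   /* hypothetical */
            split_enclosing_buddy(smfn, nr);          /* hypothetical */

        unlink_range_from_free_lists(smfn, nr);       /* hypothetical */

        return pg;
    }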

Jan



* Re: [PATCH 09/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv3
  2020-05-01  8:40       ` Julien Grall
@ 2020-05-09  0:06         ` Stefano Stabellini
  0 siblings, 0 replies; 67+ messages in thread
From: Stefano Stabellini @ 2020-05-09  0:06 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Stefano Stabellini

On Fri, 1 May 2020, Julien Grall wrote:
> Hi Stefano,
> 
> On 01/05/2020 02:31, Stefano Stabellini wrote:
> > On Wed, 15 Apr 2020, Julien Grall wrote:
> > > > diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> > > > index 4e60ba15cc..4cf430f865 100644
> > > > --- a/xen/arch/arm/vgic-v3.c
> > > > +++ b/xen/arch/arm/vgic-v3.c
> > > > @@ -1677,13 +1677,25 @@ static int vgic_v3_domain_init(struct domain *d)
> > > 
> > > 
> > > I think you also want to modify vgic_v3_max_rdist_count().
> > 
> > I don't think so: domUs, even direct-mapped, still only get 1 rdist
> > region. This patch is not changing the layout of the domU GIC; it is
> > only finding a "hole" in the physical address space to make sure there
> > are no conflicts (or at least to minimize the chance of conflicts.)
> 
> How do you know the "hole" is big enough?
> 
> > 
> > > >         * Domain 0 gets the hardware address.
> > > >         * Guests get the virtual platform layout.
> > > 
> > > This comment needs to be updated.
> > 
> > Yep, I'll do
> > 
> > 
> > > >         */
> > > > -    if ( is_hardware_domain(d) )
> > > > +    if ( is_domain_direct_mapped(d) )
> > > >        {
> > > >            unsigned int first_cpu = 0;
> > > > +        unsigned int nr_rdist_regions;
> > > >              d->arch.vgic.dbase = vgic_v3_hw.dbase;
> > > >    -        for ( i = 0; i < vgic_v3_hw.nr_rdist_regions; i++ )
> > > > +        if ( is_hardware_domain(d) )
> > > > +        {
> > > > +            nr_rdist_regions = vgic_v3_hw.nr_rdist_regions;
> > > > +            d->arch.vgic.intid_bits = vgic_v3_hw.intid_bits;
> > > > +        }
> > > > +        else
> > > > +        {
> > > > +            nr_rdist_regions = 1;
> > > 
> > > What promises you that the rdist region will be big enough to cater
> > > for all the re-distributors of your domain?
> > 
> > Good point. I'll add an explicit check for that, with at least a warning.
> > I don't think we want to return an error because the configuration is
> > still likely to work.
> 
> No, it is not going to work. Imagine you have a guest with 3 vCPUs but the
> first re-distributor region can only cater for 2 re-distributors. How is it
> fine to continue in that case?
> 
> For dom0, we are re-using the same holes, but possibly not all of them. Why
> can't we do that for domUs?

I implemented what you suggested.



* Re: [PATCH 08/12] xen/arm: if is_domain_direct_mapped use native addresses for GICv2
  2020-05-01  8:23       ` Julien Grall
@ 2020-05-09  0:06         ` Stefano Stabellini
  0 siblings, 0 replies; 67+ messages in thread
From: Stefano Stabellini @ 2020-05-09  0:06 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Stefano Stabellini

On Fri, 1 May 2020, Julien Grall wrote:
> On 01/05/2020 02:26, Stefano Stabellini wrote:
> > On Wed, 15 Apr 2020, Julien Grall wrote:
> > > On 15/04/2020 02:02, Stefano Stabellini wrote:
> > > > Today we use native addresses to map the GICv2 for Dom0 and fixed
> > > > addresses for DomUs.
> > > > 
> > > > This patch changes the behavior so that native addresses are used for
> > > > any domain that is_domain_direct_mapped.
> > > > 
> > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> > > > ---
> > > >    xen/arch/arm/domain_build.c | 10 +++++++---
> > > >    xen/arch/arm/vgic-v2.c      | 12 ++++++------
> > > >    xen/arch/arm/vgic/vgic-v2.c |  2 +-
> > > >    xen/include/asm-arm/vgic.h  |  1 +
> > > >    4 files changed, 15 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > > > index 627e0c5e8e..303bee60f6 100644
> > > > --- a/xen/arch/arm/domain_build.c
> > > > +++ b/xen/arch/arm/domain_build.c
> > > > @@ -1643,8 +1643,12 @@ static int __init make_gicv2_domU_node(struct
> > > > kernel_info *kinfo)
> > > >        int res = 0;
> > > >        __be32 reg[(GUEST_ROOT_ADDRESS_CELLS + GUEST_ROOT_SIZE_CELLS) *
> > > > 2];
> > > >        __be32 *cells;
> > > > +    struct domain *d = kinfo->d;
> > > > +    char buf[38];
> > > >    -    res = fdt_begin_node(fdt,
> > > > "interrupt-controller@"__stringify(GUEST_GICD_BASE));
> > > > +    snprintf(buf, sizeof(buf), "interrupt-controller@%"PRIx64,
> > > > +             d->arch.vgic.dbase);
> > > > +    res = fdt_begin_node(fdt, buf);
> > > >        if ( res )
> > > >            return res;
> > > >    @@ -1666,9 +1670,9 @@ static int __init make_gicv2_domU_node(struct
> > > > kernel_info *kinfo)
> > > >          cells = &reg[0];
> > > >        dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS,
> > > > GUEST_ROOT_SIZE_CELLS,
> > > > -                       GUEST_GICD_BASE, GUEST_GICD_SIZE);
> > > > +                       d->arch.vgic.dbase, GUEST_GICD_SIZE);
> > > >        dt_child_set_range(&cells, GUEST_ROOT_ADDRESS_CELLS,
> > > > GUEST_ROOT_SIZE_CELLS,
> > > > -                       GUEST_GICC_BASE, GUEST_GICC_SIZE);
> > > > +                       d->arch.vgic.cbase, GUEST_GICC_SIZE);
> > > 
> > > You can't use the native base address and not the native size.
> > > Particularly,
> > > this is going to screw any GIC using the 8KB alias.
> > 
> > I don't follow why it could cause problems with a GIC using the 8KB
> > alias. Maybe an example (even a fake example) would help.
> 
> The GICC interface is composed of two 4KB pages. In some implementations,
> each page starts at a 64KB-aligned address, and each page is also aliased
> every 4KB within its 64KB region.
> 
> For guests, we don't expose the full 128KB region but only part of it (8KB).
> So the guest interface is the same regardless of the underlying
> implementation of the GIC.
> 
> vgic.cbase points to the beginning of the first region. So what you are
> mapping is the first 8KB of the first region. The second region is not mapped
> at all.
> 
> As the pages are aliased, the trick we use is to map from vgic.cbase + 60KB
> (vgic_v2_hw.aliased_offset). This means the two pages will then be contiguous
> in the guest physical memory.

I understood the issue and I fixed it by applying
vgic_v2_hw.aliased_offset to vbase and cbase. (Although only vbase is
actually necessary as far as I can tell.)
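
Roughly, my working tree now does (sketch, exact code may still
change):

    if ( is_domain_direct_mapped(d) )
    {
        d->arch.vgic.dbase = vgic_v2_hw.dbase;
        /* Expose only the 8KB guest view of the aliased GICC region */
        d->arch.vgic.cbase = vgic_v2_hw.cbase + vgic_v2_hw.aliased_offset;
        csize = GUEST_GICC_SIZE;
        vbase = vgic_v2_hw.vbase + vgic_v2_hw.aliased_offset;
    }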



* Re: [PATCH 03/12] xen/arm: introduce 1:1 mapping for domUs
  2020-05-01  8:30       ` Julien Grall
@ 2020-05-09  0:07         ` Stefano Stabellini
  2020-05-09  9:56           ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-05-09  0:07 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Stefano Stabellini

On Fri, 1 May 2020, Julien Grall wrote:
> On 01/05/2020 02:26, Stefano Stabellini wrote:
> > On Wed, 15 Apr 2020, Julien Grall wrote:
> > > On 15/04/2020 02:02, Stefano Stabellini wrote:
> > > > In some cases it is desirable to map domU memory 1:1 (guest physical ==
> > > > physical.) For instance, because we want to assign a device to the domU
> > > > but the IOMMU is not present or cannot be used. In these cases, other
> > > > mechanisms should be used for DMA protection, e.g. a MPU.
> > > 
> > > I am not against this, however the documentation should clearly explain
> > > that
> > > you are making your platform insecure unless you have other means for DMA
> > > protection.
> > 
> > I'll expand the docs
> > 
> > 
> > > > 
> > > > This patch introduces a new device tree option for dom0less guests to
> > > > request a domain to be directly mapped. It also specifies the memory
> > > > ranges. This patch documents the new attribute and parses it at boot
> > > > time. (However, the implementation of 1:1 mapping is missing and just
> > > > BUG() out at the moment.)  Finally the patch sets the new direct_map
> > > > flag for DomU domains.
> > > > 
> > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> > > > ---
> > > >    docs/misc/arm/device-tree/booting.txt | 13 +++++++
> > > >    docs/misc/arm/passthrough-noiommu.txt | 35 ++++++++++++++++++
> > > >    xen/arch/arm/domain_build.c           | 52
> > > > +++++++++++++++++++++++++--
> > > >    3 files changed, 98 insertions(+), 2 deletions(-)
> > > >    create mode 100644 docs/misc/arm/passthrough-noiommu.txt
> > > > 
> > > > diff --git a/docs/misc/arm/device-tree/booting.txt
> > > > b/docs/misc/arm/device-tree/booting.txt
> > > > index 5243bc7fd3..fce5f7ed5a 100644
> > > > --- a/docs/misc/arm/device-tree/booting.txt
> > > > +++ b/docs/misc/arm/device-tree/booting.txt
> > > > @@ -159,6 +159,19 @@ with the following properties:
> > > >        used, or GUEST_VPL011_SPI+1 if vpl011 is enabled, whichever is
> > > >        greater.
> > > >    +- direct-map
> > > > +
> > > > +    Optional. An array of integer pairs specifying addresses and sizes.
> > > > +    direct_map requests the memory of the domain to be 1:1 mapped with
> > > > +    the memory ranges specified as argument. Only sizes that are a
> > > > +    power of two number of pages are allowed.
> > > > +
> > > > +- #direct-map-addr-cells and #direct-map-size-cells
> > > > +
> > > > +    The number of cells to use for the addresses and for the sizes in
> > > > +    direct-map. Default and maximum are 2 cells for both addresses and
> > > > +    sizes.
> > > > +
> > > 
> > > As this is going to be mostly used for passthrough, can't we take
> > > advantage of
> > > the partial device-tree and describe the memory region using memory node?
> > 
> > With the system device tree bindings that are under discussion the role
> > of the partial device tree might be reduced going forward, and might even
> > go away in the long term. For this reason, I would prefer not to add
> > more things to the partial device tree.
> 
> Was the interface you suggested approved by the committee behind system device
> tree? If not, we will still have to support your proposal + whatever the
> committee comes up with. So I am not entirely sure why using the partial
> device-tree would be an issue.

Not yet


> It is actually better to keep everything in the partial device-tree, as that
> avoids clashing with whatever you come up with for the system device tree.

I don't think we want to support both in the long term. The closer we
are to it, the better for transitioning.


> Also, I don't think the partial device-tree could ever go away, at least in
> Xen. This is an external interface we provide to the user; removing it would
> mean users would not be able to upgrade from Xen 4.x to 4.y without a major
> rewrite of their DT.

I don't want to put the memory ranges inside the multiboot,device-tree
module because that is clearly for device assignment:

"Device Assignment (Passthrough) is supported by adding another module,
alongside the kernel and ramdisk, with the device tree fragment
corresponding to the device node to assign to the guest."

One could do 1:1 memory mapping without device assignment.


Genuine question: did we write down any compatibility promise on that
interface? If so, do you know where? I'd like to take a look.

In any case, backward-compatible interfaces can be deprecated, although
it takes time. Alternatively, it could be made optional (even for device
assignment). So expanding its scope beyond device assignment
configuration is not a good idea.



* Re: [PATCH 10/12] xen/arm: if is_domain_direct_mapped use native UART address for vPL011
  2020-05-01  8:09       ` Julien Grall
@ 2020-05-09  0:07         ` Stefano Stabellini
  2020-05-09 10:11           ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-05-09  0:07 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Stefano Stabellini

On Fri, 1 May 2020, Julien Grall wrote:
> On 01/05/2020 02:26, Stefano Stabellini wrote:
> > On Wed, 15 Apr 2020, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 15/04/2020 02:02, Stefano Stabellini wrote:
> > > > > We always use a fixed address to map the vPL011 to domains. The address
> > > > could be a problem for domains that are directly mapped.
> > > > 
> > > > Instead, for domains that are directly mapped, reuse the address of the
> > > > physical UART on the platform to avoid potential clashes.
> > > 
> > > How do you know the physical UART MMIO region is big enough to fit the
> > > PL011?
> > 
> > That cannot be, because the vPL011 MMIO size is 1 page, which is the
> > minimum, right?
> 
> No, there are platforms out there with multiple UARTs in the same page (see
> sunxi for instance).

But if there are multiple UARTs sharing the same page, and the first one
is used by Xen, there is no way to assign one of the secondary UARTs to
a domU. So there would be no problem choosing the physical UART address
for the virtual PL011.
 
 
> > > What if the user wants to assign the physical UART to the domain + the
> > > vpl011?
> > 
> > A user can assign a UART to the domain, but it cannot assign the UART
> > used by Xen (DTUART), which is the one we are using here to get the
> > physical information.
> > 
> > 
> > (If there is no UART used by Xen then we'll fall back to the virtual
> > addresses. If they conflict we return error and let the user fix the
> > configuration.)
> 
> If there is no UART in Xen, how will the user know that the addresses
> conflicted? Earlyprintk?

That's a good question. Yes, I think earlyprintk would be the only way.



* Re: [PATCH 03/12] xen/arm: introduce 1:1 mapping for domUs
  2020-05-09  0:07         ` Stefano Stabellini
@ 2020-05-09  9:56           ` Julien Grall
  0 siblings, 0 replies; 67+ messages in thread
From: Julien Grall @ 2020-05-09  9:56 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk

Hi,

On 09/05/2020 01:07, Stefano Stabellini wrote:
> On Fri, 1 May 2020, Julien Grall wrote:
>> On 01/05/2020 02:26, Stefano Stabellini wrote:
>>> On Wed, 15 Apr 2020, Julien Grall wrote:
>>>> On 15/04/2020 02:02, Stefano Stabellini wrote:
>>>>> In some cases it is desirable to map domU memory 1:1 (guest physical ==
>>>>> physical.) For instance, because we want to assign a device to the domU
>>>>> but the IOMMU is not present or cannot be used. In these cases, other
>>>>> mechanisms should be used for DMA protection, e.g. a MPU.
>>>>
>>>> I am not against this, however the documentation should clearly explain
>>>> that
>>>> you are making your platform insecure unless you have other means for DMA
>>>> protection.
>>>
>>> I'll expand the docs
>>>
>>>
>>>>>
>>>>> This patch introduces a new device tree option for dom0less guests to
>>>>> request a domain to be directly mapped. It also specifies the memory
>>>>> ranges. This patch documents the new attribute and parses it at boot
>>>>> time. (However, the implementation of 1:1 mapping is missing and just
>>>>> BUG() out at the moment.)  Finally the patch sets the new direct_map
>>>>> flag for DomU domains.
>>>>>
>>>>> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
>>>>> ---
>>>>>     docs/misc/arm/device-tree/booting.txt | 13 +++++++
>>>>>     docs/misc/arm/passthrough-noiommu.txt | 35 ++++++++++++++++++
>>>>>     xen/arch/arm/domain_build.c           | 52
>>>>> +++++++++++++++++++++++++--
>>>>>     3 files changed, 98 insertions(+), 2 deletions(-)
>>>>>     create mode 100644 docs/misc/arm/passthrough-noiommu.txt
>>>>>
>>>>> diff --git a/docs/misc/arm/device-tree/booting.txt
>>>>> b/docs/misc/arm/device-tree/booting.txt
>>>>> index 5243bc7fd3..fce5f7ed5a 100644
>>>>> --- a/docs/misc/arm/device-tree/booting.txt
>>>>> +++ b/docs/misc/arm/device-tree/booting.txt
>>>>> @@ -159,6 +159,19 @@ with the following properties:
>>>>>         used, or GUEST_VPL011_SPI+1 if vpl011 is enabled, whichever is
>>>>>         greater.
>>>>>     +- direct-map
>>>>> +
>>>>> +    Optional. An array of integer pairs specifying addresses and sizes.
>>>>> +    direct_map requests the memory of the domain to be 1:1 mapped with
>>>>> +    the memory ranges specified as argument. Only sizes that are a
>>>>> +    power of two number of pages are allowed.
>>>>> +
>>>>> +- #direct-map-addr-cells and #direct-map-size-cells
>>>>> +
>>>>> +    The number of cells to use for the addresses and for the sizes in
>>>>> +    direct-map. Default and maximum are 2 cells for both addresses and
>>>>> +    sizes.
>>>>> +
>>>>
>>>> As this is going to be mostly used for passthrough, can't we take
>>>> advantage of
>>>> the partial device-tree and describe the memory region using memory node?
>>>
>>> With the system device tree bindings that are under discussion the role
>>> of the partial device tree might be reduced going forward, and might even
>>> go away in the long term. For this reason, I would prefer not to add
>>> more things to the partial device tree.
>>
>> Was the interface you suggested approved by the committee behind system device
>> tree? If not, we will still have to support your proposal + whatever the
>> committee comes up with. So I am not entirely sure why using the partial
>> device-tree would be an issue.
> 
> Not yet

This answer...

> 
> 
>> It is actually better to keep everything in the partial device-tree, as that
>> avoids clashing with whatever you come up with for the system device tree.
> 
> I don't think we want to support both in the long term. The closer we
> are to it, the better for transitioning.

... raises the question of how your solution is going to be closer. Do
you mind providing more details on the system device-tree?

> 
> 
>> Also, I don't think the partial device-tree could ever go away, at least in
>> Xen. This is an external interface we provide to the user; removing it would
>> mean users would not be able to upgrade from Xen 4.x to 4.y without a major
>> rewrite of their DT.
> 
> I don't want to put the memory ranges inside the multiboot,device-tree
> module because that is clearly for device assignment:
> 
> "Device Assignment (Passthrough) is supported by adding another module,
> alongside the kernel and ramdisk, with the device tree fragment
> corresponding to the device node to assign to the guest."

Thank you for copying the documentation here... As you know, when the
partial device-tree was designed, it was only focused on device
assignment. However, I don't see how this prevents us from extending it
to more use cases.

Describing the RAM regions in the partial device-tree means you have a 
single place where you can understand the memory layout of your guest.

You also have much more flexibility for describing your guests than
with the /chosen node, and you avoid clashing with the rest of the host
device-tree.

> 
> One could do 1:1 memory mapping without device assignment.
 >
> Genuine question: did we write down any compatibility promise on that
> interface? If so, do you know where? I'd like to take a look.

Nothing is written down in Xen; however, Device-Tree bindings are meant
to be stable.

It would be a pretty bad user experience if you had to rewrite your
device-tree when upgrading from Xen 4.14 to Xen 5.x. This also means
rolling back would be more difficult, as there are more component
dependencies.

> In any case, backward-compatible interfaces can be deprecated, although it
> takes time. Alternatively, it could be made optional (even for device
> assignment). So expanding its scope beyond device assignment
> configuration is not a good idea.

What would be the replacement? I still haven't seen anything concrete
of the so-called system device-tree.

At the moment, I can better picture how this can work with the partial
device-tree. One of the advantages is that you could describe your guest
layout in one place and then re-use it for booting a guest from the
toolstack (not implemented yet) or the hypervisor.

I could change my mind if it turns out to be genuinely more complicated 
to implement in Xen and/or you provide more details on how this is going 
to work out with the system device-tree.

Cheers,

-- 
Julien Grall



* Re: [PATCH 10/12] xen/arm: if is_domain_direct_mapped use native UART address for vPL011
  2020-05-09  0:07         ` Stefano Stabellini
@ 2020-05-09 10:11           ` Julien Grall
  2020-05-11 22:58             ` Stefano Stabellini
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-05-09 10:11 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk

Hi Stefano,

On 09/05/2020 01:07, Stefano Stabellini wrote:
> On Fri, 1 May 2020, Julien Grall wrote:
>> On 01/05/2020 02:26, Stefano Stabellini wrote:
>>> On Wed, 15 Apr 2020, Julien Grall wrote:
>>>> Hi Stefano,
>>>>
>>>> On 15/04/2020 02:02, Stefano Stabellini wrote:
> > > > > We always use a fixed address to map the vPL011 to domains. The address
>>>>> could be a problem for domains that are directly mapped.
>>>>>
>>>>> Instead, for domains that are directly mapped, reuse the address of the
>>>>> physical UART on the platform to avoid potential clashes.
>>>>
>>>> How do you know the physical UART MMIO region is big enough to fit the
>>>> PL011?
>>>
> > > That cannot be, because the vPL011 MMIO size is 1 page, which is the
> > > minimum, right?
>>
> > No, there are platforms out there with multiple UARTs in the same page (see
> > sunxi for instance).
> 
> But if there are multiple UARTs sharing the same page, and the first one
> is used by Xen, there is no way to assign one of the secondary UARTs to
> a domU. So there would be no problem choosing the physical UART address
> for the virtual PL011.

AFAICT, nothing prevents a user from assigning such a UART to a
dom0less guest today. It would not be safe, but it should work.

If you want to make it safe, then you would need to trap the MMIO
accesses so they can be sanitized. For a UART device, I don't think the
overhead would be too bad.

Anyway, the only thing I request is to add a sanity check in the code
to help the user diagnose any potential clash.
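
For instance, something as simple as this would help (a sketch;
uart_base/uart_size are assumed locals taken from the DT):

    /* Warn if the physical UART region cannot back a full vPL011 frame */
    if ( uart_size < GUEST_PL011_SIZE )
        printk(XENLOG_WARNING
               "%pd: UART at %#"PRIpaddr" is smaller than the vPL011 frame\n",
               d, uart_base);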

Cheers,

-- 
Julien Grall



* Re: [PATCH 10/12] xen/arm: if is_domain_direct_mapped use native UART address for vPL011
  2020-05-09 10:11           ` Julien Grall
@ 2020-05-11 22:58             ` Stefano Stabellini
  0 siblings, 0 replies; 67+ messages in thread
From: Stefano Stabellini @ 2020-05-11 22:58 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Stefano Stabellini

On Sat, 9 May 2020, Julien Grall wrote:
> Hi Stefano,
> 
> On 09/05/2020 01:07, Stefano Stabellini wrote:
> > On Fri, 1 May 2020, Julien Grall wrote:
> > > On 01/05/2020 02:26, Stefano Stabellini wrote:
> > > > On Wed, 15 Apr 2020, Julien Grall wrote:
> > > > > Hi Stefano,
> > > > > 
> > > > > On 15/04/2020 02:02, Stefano Stabellini wrote:
> > > > > > We always use a fixed address to map the vPL011 to domains. The
> > > > > > address could be a problem for domains that are directly mapped.
> > > > > > 
> > > > > > Instead, for domains that are directly mapped, reuse the address of
> > > > > > the
> > > > > > physical UART on the platform to avoid potential clashes.
> > > > > 
> > > > > How do you know the physical UART MMIO region is big enough to fit the
> > > > > PL011?
> > > > 
> > > > That cannot be, because the vPL011 MMIO size is 1 page, which is the
> > > > minimum, right?
> > > 
> > > No, there are platforms out there with multiple UARTs in the same page
> > > (see sunxi for instance).
> > 
> > But if there are multiple UARTs sharing the same page, and the first one
> > is used by Xen, there is no way to assign one of the secondary UARTs to
> > a domU. So there would be no problem choosing the physical UART address
> > for the virtual PL011.
> 
> AFAICT, nothing prevents a user from assigning such a UART to a dom0less
> guest today. It would not be safe, but it should work.
>
> If you want to make it safe, then you would need to trap the MMIO accesses so
> they can be sanitized. For a UART device, I don't think the overhead would be
> too bad.
> 
> Anyway, the only thing I request is to add a sanity check in the code to help
> the user diagnose any potential clash.

OK thanks for clarifying, I'll do that.



* Re: [PATCH 05/12] xen: introduce reserve_heap_pages
  2020-04-30 18:27           ` Julien Grall
@ 2020-05-12  1:10             ` Stefano Stabellini
  2020-05-12  8:57               ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-05-12  1:10 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, andrew.cooper3, Ian Jackson,
	George Dunlap, Jan Beulich, xen-devel, Stefano Stabellini,
	Volodymyr_Babchuk, Woodhouse, David

On Thu, 30 Apr 2020, Julien Grall wrote:
> On 30/04/2020 18:00, Stefano Stabellini wrote:
> > On Thu, 30 Apr 2020, Julien Grall wrote:
> > > > > > +    pg = maddr_to_page(start);
> > > > > > +    node = phys_to_nid(start);
> > > > > > +    zone = page_to_zone(pg);
> > > > > > +    page_list_del(pg, &heap(node, zone, order));
> > > > > > +
> > > > > > +    __alloc_heap_pages(pg, order, memflags, d);
> > > > > 
> > > > > I agree with Julien in not seeing how this can be safe / correct.
> > > > 
> > > > I haven't seen any issues so far in my testing -- I imagine it is
> > > > because there aren't many memory allocations after setup_mm() and before
> > > > create_domUs()  (which on ARM is called just before
> > > > domain_unpause_by_systemcontroller at the end of start_xen.)
> > > 
> > > I am not sure why you exclude setup_mm(). Any memory allocated (boot
> > > allocator, xenheap) can clash with your regions. The main memory
> > > allocations
> > > are for the frametable and dom0. I would say you were lucky to not hit
> > > them.
> > 
> > Maybe it is because Xen typically allocates memory top-down? So if I
> > chose a high range then I would see a failure? But I have been mostly
> > > testing with ranges close to the beginning of RAM (as opposed to
> > ranges close to the end of RAM.)
> 
> I haven't looked at the details of the implementation, but you can try to
> specify dom0 addresses for your domU. You should see a failure.

I managed to reproduce a failure by choosing the top address range. On
Xilinx ZynqMP the memory is:

  reg = <0x0 0x0 0x0 0x7ff00000 0x8 0x0 0x0 0x80000000>;

And I chose:

  fdt set /chosen/domU0 direct-map <0x0 0x10000000 0x10000000 0x8 0x70000000 0x10000000>

Resulting in:

(XEN) *** LOADING DOMU cpus=1 memory=80000KB ***
(XEN) Loading d1 kernel from boot module @ 0000000007200000
(XEN) Loading ramdisk from boot module @ 0000000008200000
(XEN) direct_map start=0x00000010000000 size=0x00000010000000
(XEN) direct_map start=0x00000870000000 size=0x00000010000000
(XEN) Data Abort Trap. Syndrome=0x5
(XEN) Walking Hypervisor VA 0x2403480018 on CPU0 via TTBR 0x0000000000f05000
(XEN) 0TH[0x0] = 0x0000000000f08f7f
(XEN) 1ST[0x90] = 0x0000000000000000
(XEN) CPU0: Unexpected Trap: Data Abort

[...]

(XEN) Xen call trace:
(XEN)    [<000000000021a65c>] page_alloc.c#alloc_pages_from_buddy+0x15c/0x5d0 (PC)
(XEN)    [<000000000021b43c>] reserve_domheap_pages+0xc4/0x148 (LR)

Anything other than the very top of memory works.


> > > > - in construct_domU, add the range to xenheap and reserve it with
> > > > reserve_heap_pages
> > > 
> > > I am afraid you can't give the regions to the allocator and then allocate
> > > them. The allocator is free to use any page for its own purpose or exclude
> > > them.
> > > 
>>> AFAICT, the allocator doesn't have a list of pages in use. It only keeps
>>> track of free pages. So we can make the content of struct page_info look
>>> like it was allocated by the allocator.
> > > 
>>> We would need to be careful when giving a page back to the allocator, as
>>> the page would need to be initialized (see [1]). This may not be a concern
>>> for Dom0less, as the domain may never be destroyed, but it will be from a
>>> correctness PoV.
> > > 
>>> For LiveUpdate, the original Xen will carve out space to be used by the
>>> boot allocator in the new Xen. But I think this is not necessary in your
>>> context.
> > > 
> > > It should be sufficient to exclude the page from the boot allocators (as
> > > we do
> > > for other modules).
> > > 
>>> One potential issue that can arise is that there is no easy way today to
>>> differentiate between pages allocated and pages not yet initialized. To
>>> make the code robust, we need to prevent a page from being used in two
>>> places. So for LiveUpdate we are marking them with a special value; this
>>> is used afterwards to check we are effectively using a reserved page.
> > > 
> > > I hope this helps.
> > 
> > Thanks for writing all of this down but I haven't understood some of it.
> > 
> > For the sake of this discussion let's say that we managed to "reserve"
> > the range early enough like we do for other modules, as you wrote.
> > 
> > At the point where we want to call reserve_heap_pages() we would call
> > init_heap_pages() just before it. We are still relatively early at boot
>>> so there aren't any concurrent memory operations. Why doesn't this work?
> 
> Because init_heap_pages() may exclude some pages (for instance MFN 0 is carved
> out) or use pages for its internal structure (see init_node_heap()). So you
> can't expect to be able to allocate the exact same region by
> reserve_heap_pages().

But it can't possibly use any of the pages it is trying to add to the
heap, right?

We have reserved a certain range, we tell init_heap_pages to add the
range to the heap, init_node_heap gets called and it ends up calling
xmalloc. There is no way xmalloc can use any memory from that
particular range because it is not in the heap yet. That should be safe.

The init_node_heap code is a bit hard to follow, but I went through it
and couldn't spot anything that could cause any issues (MFN 0 aside,
which is a bit special). Am I missing something?



* Re: [PATCH 05/12] xen: introduce reserve_heap_pages
  2020-05-12  1:10             ` Stefano Stabellini
@ 2020-05-12  8:57               ` Julien Grall
  0 siblings, 0 replies; 67+ messages in thread
From: Julien Grall @ 2020-05-12  8:57 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Wei Liu, andrew.cooper3, Ian Jackson, George Dunlap, Jan Beulich,
	xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Woodhouse,
	David

Hi,

On 12/05/2020 02:10, Stefano Stabellini wrote:
> On Thu, 30 Apr 2020, Julien Grall wrote:
>> On 30/04/2020 18:00, Stefano Stabellini wrote:
>>> On Thu, 30 Apr 2020, Julien Grall wrote:
>>>>>>> +    pg = maddr_to_page(start);
>>>>>>> +    node = phys_to_nid(start);
>>>>>>> +    zone = page_to_zone(pg);
>>>>>>> +    page_list_del(pg, &heap(node, zone, order));
>>>>>>> +
>>>>>>> +    __alloc_heap_pages(pg, order, memflags, d);
>>>>>>
>>>>>> I agree with Julien in not seeing how this can be safe / correct.
>>>>>
>>>>> I haven't seen any issues so far in my testing -- I imagine it is
>>>>> because there aren't many memory allocations after setup_mm() and before
>>>>> create_domUs()  (which on ARM is called just before
>>>>> domain_unpause_by_systemcontroller at the end of start_xen.)
>>>>
>>>> I am not sure why you exclude setup_mm(). Any memory allocated (boot
>>>> allocator, xenheap) can clash with your regions. The main memory
>>>> allocations
>>>> are for the frametable and dom0. I would say you were lucky to not hit
>>>> them.
>>>
>>> Maybe it is because Xen typically allocates memory top-down? So if I
>>> chose a high range then I would see a failure? But I have been mostly
>>> testing with ranges close to the beginning of RAM (as opposed to
>>> ranges close to the end of RAM.)
>>
>> I haven't looked at the details of the implementation, but you can try to
>> specify dom0 addresses for your domU. You should see a failure.
> 
> I managed to reproduce a failure by choosing the top address range. On
> Xilinx ZynqMP the memory is:
> 
>    reg = <0x0 0x0 0x0 0x7ff00000 0x8 0x0 0x0 0x80000000>;
> 
> And I chose:
> 
>    fdt set /chosen/domU0 direct-map <0x0 0x10000000 0x10000000 0x8 0x70000000 0x10000000>
> 
> Resulting in:
> 
> (XEN) *** LOADING DOMU cpus=1 memory=80000KB ***
> (XEN) Loading d1 kernel from boot module @ 0000000007200000
> (XEN) Loading ramdisk from boot module @ 0000000008200000
> (XEN) direct_map start=0x00000010000000 size=0x00000010000000
> (XEN) direct_map start=0x00000870000000 size=0x00000010000000
> (XEN) Data Abort Trap. Syndrome=0x5
> (XEN) Walking Hypervisor VA 0x2403480018 on CPU0 via TTBR 0x0000000000f05000
> (XEN) 0TH[0x0] = 0x0000000000f08f7f
> (XEN) 1ST[0x90] = 0x0000000000000000
> (XEN) CPU0: Unexpected Trap: Data Abort
> 
> [...]
> 
> (XEN) Xen call trace:
> (XEN)    [<000000000021a65c>] page_alloc.c#alloc_pages_from_buddy+0x15c/0x5d0 (PC)
> (XEN)    [<000000000021b43c>] reserve_domheap_pages+0xc4/0x148 (LR)

This isn't what I was expecting. If there is any failure, I would
expect an error message, not a data abort. However...

> 
> Anything other than the very top of memory works.

... I am very confused by this. Are you suggesting that with your series 
you can allocate the same range for Dom0 and a DomU without any trouble?

> 
>>>>> - in construct_domU, add the range to xenheap and reserve it with
>>>>> reserve_heap_pages
>>>>
>>>> I am afraid you can't give the regions to the allocator and then allocate
>>>> them. The allocator is free to use any page for its own purpose or exclude
>>>> them.
>>>>
>>>> AFAICT, the allocator doesn't have a list of pages in use. It only keeps
>>>> track of free pages. So we can make the content of struct page_info look
>>>> like it was allocated by the allocator.
>>>>
>>>> We would need to be careful when giving a page back to the allocator, as
>>>> the page would need to be initialized (see [1]). This may not be a concern
>>>> for Dom0less, as the domain may never be destroyed, but it will be from a
>>>> correctness PoV.
>>>>
>>>> For LiveUpdate, the original Xen will carve out space to be used by the
>>>> boot allocator in the new Xen. But I think this is not necessary in your
>>>> context.
>>>>
>>>> It should be sufficient to exclude the page from the boot allocators (as
>>>> we do
>>>> for other modules).
>>>>
>>>> One potential issue that can arise is that there is no easy way today to
>>>> differentiate between pages allocated and pages not yet initialized. To
>>>> make the code robust, we need to prevent a page from being used in two
>>>> places. So for LiveUpdate we are marking them with a special value; this
>>>> is used afterwards to check we are effectively using a reserved page.
>>>>
>>>> I hope this helps.
>>>
>>> Thanks for writing all of this down but I haven't understood some of it.
>>>
>>> For the sake of this discussion let's say that we managed to "reserve"
>>> the range early enough like we do for other modules, as you wrote.
>>>
>>> At the point where we want to call reserve_heap_pages() we would call
>>> init_heap_pages() just before it. We are still relatively early at boot
>>> so there aren't any concurrent memory operations. Why doesn't this work?
>>
>> Because init_heap_pages() may exclude some pages (for instance MFN 0 is carved
>> out) or use pages for its internal structure (see init_node_heap()). So you
>> can't expect to be able to allocate the exact same region by
>> reserve_heap_pages().
> 
> But it can't possibly use any of the pages it is trying to add to the
> heap, right?

Yes it can; there are already multiple examples in the buddy allocator.

> 
> We have reserved a certain range, we tell init_heap_pages to add the
> range to the heap, init_node_heap gets called and it ends up calling
> xmalloc. There is no way xmalloc can use any memory from that
> particular range because it is not in the heap yet. That should be safe.

If you look carefully at the code, you will notice:

     else if ( *use_tail && nr >= needed &&
               arch_mfn_in_directmap(mfn + nr) &&
               (!xenheap_bits ||
                !((mfn + nr - 1) >> (xenheap_bits - PAGE_SHIFT))) )
     {
         _heap[node] = mfn_to_virt(mfn + nr - needed);
         avail[node] = mfn_to_virt(mfn + nr - 1) +
                       PAGE_SIZE - sizeof(**avail) * NR_ZONES;
     }

This is one of the conditions where the allocator will use a few pages
from the region for itself.

> The init_node_heap code is a bit hard to follow, but I went through it
> and couldn't spot anything that could cause any issues (MFN 0 aside,
> which is a bit special). Am I missing something?

Aside from what I wrote above, as soon as you give a page to an
allocator, you waive the right to decide what the page is used for. The
allocator is free to use the page for bookkeeping, or even to carve the
page out because it can't deal with it.

So I don't really see how giving a region to the allocator, and then
expecting to get the same region back a call later, is ever going to be
safe.

Cheers,

-- 
Julien Grall



* Re: [PATCH 12/12] xen/arm: call iomem_permit_access for passthrough devices
  2020-04-30 13:01       ` Julien Grall
@ 2020-05-24 14:12         ` Julien Grall
  2020-05-26 16:46           ` Stefano Stabellini
  0 siblings, 1 reply; 67+ messages in thread
From: Julien Grall @ 2020-05-24 14:12 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk

Hi,

On 30/04/2020 14:01, Julien Grall wrote:
> On 29/04/2020 21:47, Stefano Stabellini wrote:
>> On Wed, 15 Apr 2020, Julien Grall wrote:
>> But doesn't it make sense to give domU permission if it is going to get
>> the memory mapped? But admittedly I can't think of something that would
>> break because of the lack of the iomem_permit_access call in this code
>> path.
> 
On Arm, the permissions are only useful if you plan for your DomU to
delegate the regions to another domain. As your domain is not even aware
it is running on Xen (we don't expose the 'xen' node in the DT), it
makes little sense to add the permission.

I actually found one use when helping a user last week. You can dump
the list of MMIO regions assigned to a guest from the Xen console.

This will use d->iomem_caps, which is modified via
iomem_permit_access(). Without it, there is no easy way to confirm the
list of MMIO regions assigned to a guest. Although...

> Even today, you can map IOMEM to a DomU and then revert the permission
> right after. The IOMEM will still be mapped in the guest and it will
> act normally.

... this would not help the case where permissions are reverted. But I 
am assuming this shouldn't happen for Dom0less.
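
For reference, the call this patch adds has the same shape as what we
already do for dom0 in handle_device() (sketch):

    res = iomem_permit_access(d, paddr_to_pfn(addr),
                              paddr_to_pfn(PAGE_ALIGN(addr + size - 1)));
    if ( res )
        printk(XENLOG_ERR "Unable to permit %pd access to %#"PRIpaddr"\n",
               d, addr);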

Stefano, I am not sure what your plan is for the series itself for Xen
4.14. I think this patch could go in now. Any thoughts?

Cheers,

-- 
Julien Grall



* Re: [PATCH 12/12] xen/arm: call iomem_permit_access for passthrough devices
  2020-05-24 14:12         ` Julien Grall
@ 2020-05-26 16:46           ` Stefano Stabellini
  2020-05-27 18:09             ` Julien Grall
  0 siblings, 1 reply; 67+ messages in thread
From: Stefano Stabellini @ 2020-05-26 16:46 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk, Stefano Stabellini

On Sun, 24 May 2020, Julien Grall wrote:
> On 30/04/2020 14:01, Julien Grall wrote:
> > On 29/04/2020 21:47, Stefano Stabellini wrote:
> > > On Wed, 15 Apr 2020, Julien Grall wrote:
> > > But doesn't it make sense to give domU permission if it is going to get
> > > the memory mapped? But admittedly I can't think of something that would
> > > break because of the lack of the iomem_permit_access call in this code
> > > path.
> > 
> > On Arm, the permissions are only useful if you plan for your DomU to
> > delegate the regions to another domain. As your domain is not even aware it
> > is running on Xen (we don't expose the 'xen' node in the DT), it makes
> > little sense to add the permission.
> 
> I actually found one use when helping a user last week. You can dump the list
> of MMIO regions assigned to a guest from the Xen console.
> 
> This will use d->iomem_caps, which is modified via iomem_permit_access().
> Without it, there is no easy way to confirm the list of MMIO regions assigned
> to a guest. Although...
> 
> > Even today, you can map IOMEM to a DomU and then revert the permission
> > right after. The IOMEM will still be mapped in the guest and it will act
> > normally.
> 
> ... this would not help the case where permissions are reverted. But I am
> assuming this shouldn't happen for Dom0less.

Thank you for looking into this.


> Stefano, I am not sure what your plan is for the series itself for Xen 4.14.
> I think this patch could go in now. Any thoughts?

For the series: I have addressed all comments in my working tree except
for the ones on memory allocation (the thread "xen: introduce
reserve_heap_pages"). It looks like that part requires a complete
rewrite, and it seems that the new code is not trivial to write. So I am
thinking of not targeting 4.14. What do you think? Do you think the new
code should be "easy" enough that I could target 4.14?

For this patch: it is fine to go in now, doesn't have to wait for the
series.



* Re: [PATCH 12/12] xen/arm: call iomem_permit_access for passthrough devices
  2020-05-26 16:46           ` Stefano Stabellini
@ 2020-05-27 18:09             ` Julien Grall
  0 siblings, 0 replies; 67+ messages in thread
From: Julien Grall @ 2020-05-27 18:09 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Stefano Stabellini, Volodymyr_Babchuk

Hi Stefano,

On 26/05/2020 17:46, Stefano Stabellini wrote:
> On Sun, 24 May 2020, Julien Grall wrote:
>> On 30/04/2020 14:01, Julien Grall wrote:
>>> On 29/04/2020 21:47, Stefano Stabellini wrote:
>>>> On Wed, 15 Apr 2020, Julien Grall wrote:
>>>> But doesn't it make sense to give domU permission if it is going to get
>>>> the memory mapped? But admittedly I can't think of something that would
>>>> break because of the lack of the iomem_permit_access call in this code
>>>> path.
>>>
>>> On Arm, the permissions are only useful if you plan for your DomU to
>>> delegate the regions to another domain. As your domain is not even aware it
>>> is running on Xen (we don't expose the 'xen' node in the DT), it makes
>>> little sense to add the permission.
>>
>> I actually found one use when helping a user last week. You can dump the list
>> of MMIO regions assigned to a guest from the Xen console.
>>
>> This will use d->iomem_caps, which is modified via iomem_permit_access().
>> Without it, there is no easy way to confirm the list of MMIO regions assigned
>> to a guest. Although...
>>
>>> Even today, you can map IOMEM to a DomU and then revert the permission
>>> right after. The IOMEM will still be mapped in the guest and it will act
>>> normally.
>>
>> ... this would not help the case where permissions are reverted. But I am
>> assuming this shouldn't happen for Dom0less.
> 
> Thank you for looking into this
> 
> 
>> Stefano, I am not sure what your plan is for the series itself for Xen 4.14.
>> I think this patch could go in now. Any thoughts?
> 
> For the series: I have addressed all comments in my working tree except
> for the ones on memory allocation (the thread "xen: introduce
> reserve_heap_pages"). It looks like that part requires a complete
> rewrite, and it seems that the new code is not trivial to write. So I am
> thinking of not targeting 4.14. What do you think? Do you think the new
> code should be "easy" enough that I could target 4.14?
It may be a stretch with the code freeze on Friday. I would suggest
sending it when it is ready; we can then either include it in Xen 4.14
or merge it as soon as the tree re-opens.

> 
> For this patch: it is fine to go in now, doesn't have to wait for the
> series.

Feel free to add my ack on the patch.

Cheers,

-- 
Julien Grall


