All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/10] xen/arm64: Suspend preconditions and CPU hotplug fixes
@ 2018-04-27 17:12 Mirela Simonovic
  2018-04-27 17:12 ` [PATCH v3 01/10] xen/arm64: Added handling of the trapped access to OSLSR register Mirela Simonovic
                   ` (9 more replies)
  0 siblings, 10 replies; 53+ messages in thread
From: Mirela Simonovic @ 2018-04-27 17:12 UTC (permalink / raw)
  To: xen-devel
  Cc: edgar.iglesias, sstabellini, dm, George Dunlap, Dario Faggioli,
	julien.grall, Mirela Simonovic

This patch set contains fixes that are required as precondition for suspend to
RAM support, including the CPU hotplug which is required to suspend non-boot
CPUs.
The first two patches in this series:
1) xen/arm64: Added handling of the trapped access to OSLSR register
2) xen/arm: Ignore write to GICD_ISACTIVERn registers (vgic-v2)
are required to avoid Dom0 crashes when Dom0 performs its own suspend. This
patch set does not include the implementation of virtual PSCI system suspend
call that would allow guests to finalize their suspend procedures. This will
be submitted in the following series.

Remaining of the patches are related to enabling CPU hotplug for non-boot
CPUs is resume scenario. CPU hotplug of non-boot CPUs will be used for suspend
to RAM support for ARM. In suspend procedure, the hot-unplug of non-boot CPUs
will be triggered with disable_nonboot_cpus(), while the hotplug is triggered
with enable_nonboot_cpus(). Using these calls, the physical non-boot CPUs could
be powered down/up on suspend/resume, respectively, if the underlying firmware
allows so. Calls to enable/disable_nonboot_cpus() functions currently do not
exist in Xen ARM code. This will be added with the suspend to RAM support for
ARM.

When non-boot pCPUs are hot-unplugged their interrupts are migrated to the boot
pCPU. This series also includes a fix that would restore the interrupts affinity
once non-boot pCPUs are hotplugged. Here only SPIs used by guests are covered.
Migration of Xen internal SPIs is not covered. According to my understanding
Xen internal SPIs are routed to the boot CPU which initializes the respective
devices.

The code is tested on Xilinx Zynq UltraScale+ MPSoC/ZCU102 board (includes
physical power down/up of non-boot CPUs). The testing requires additional
patches for issuing system suspend. These patches and instructions for testing
will be submitted later, when we get closer to the final version of the series.

---
Changes in v2:
-Rename cover-letter title and emphasize that 2 patches from this series are not
specific to CPU hotplug (my initial fault, splitting it now could be confusing)
-Fix cover-letter explanations
-Address all the issues and comments as discussed on mailing list for v1
-Add 3 patches to ensure that suspend/resume does not cause any memory leaks.
All the memory allocated when a CPU was hotplugged is now freed when the CPU is
hot-unplugged.
-Remove from the v1 series the patch which incorrectly dealt with an issue:
[PATCH 4/7] xen/arm: When CPU dies, free percpu area immediatelly
One solution to the issue addressed by the patch above is to add rcu_barrier()
prior to calling enable_nonboot_cpus() during the suspend. This is how it is
done in x86 suspend implementation. Until the discussion here
https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg01199.html
doesn't conclude differently, I need to assume that adding rcu_barrier() prior
to calling enable_nonboot_cpus() as it is done for x86 is the right way to go.
Therefore, the fix to the issue will be part of the suspend to RAM series.

Changes in v3:
-Add acked-by where needed
-Fix CPU_OFF PSCI implementation (physical interface)
-Use notifiers to implement freeing memory and releasing interrupts on CPU
hotplug
-Use notifier to trigger setup of virtual paging for non-boot CPUs on CPU
hotplug
-Add enabling errata workarounds on CPU hotplug, also based on a notifier
-Remove patch:
[PATCH v2 10/10] xen/arm: Call check_local_cpu_errata for secondary CPU only on boot

---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Dario Faggioli <dfaggioli@suse.com>
---

Mirela Simonovic (10):
  xen/arm64: Added handling of the trapped access to OSLSR register
  xen/arm: Ignore write to GICD_ISACTIVERn registers (vgic-v2)
  xen/arm: Implement CPU_OFF PSCI call (physical interface)
  xen/arm: Remove __initdata and __init to enable CPU hotplug
  xen/arm: Setup virtual paging for non-boot CPUs on hotplug/resume
  xen/common: Restore IRQ affinity when hotplugging a pCPU
  xen/arm: Release maintenance interrupt when CPU is hot-unplugged
  xen/arm: Release timer interrupts when CPU is hot-unplugged
  xen/arm: Free memory allocated for sibling/core maps on CPU hot-unplug
  xen/arm: Enable errata for secondary CPU on hotplug after the boot

 xen/arch/arm/arm64/smpboot.c     |  2 +-
 xen/arch/arm/arm64/vsysreg.c     |  3 +-
 xen/arch/arm/cpuerrata.c         | 35 +++++++++++++++++
 xen/arch/arm/cpufeature.c        | 23 +++++++++++
 xen/arch/arm/gic.c               | 29 ++++++++++++++
 xen/arch/arm/irq.c               |  2 +-
 xen/arch/arm/p2m.c               | 82 ++++++++++++++++++++++++++++++----------
 xen/arch/arm/processor.c         |  2 +-
 xen/arch/arm/psci.c              | 13 +++++++
 xen/arch/arm/smpboot.c           | 40 +++++++++++++++++++-
 xen/arch/arm/time.c              | 39 +++++++++++++++++++
 xen/arch/arm/vgic-v2.c           |  2 +
 xen/common/schedule.c            |  4 ++
 xen/include/asm-arm/cpufeature.h |  1 +
 xen/include/asm-arm/procinfo.h   |  4 +-
 xen/include/asm-arm/psci.h       |  1 +
 16 files changed, 254 insertions(+), 28 deletions(-)

-- 
2.13.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v3 01/10] xen/arm64: Added handling of the trapped access to OSLSR register
  2018-04-27 17:12 [PATCH v3 00/10] xen/arm64: Suspend preconditions and CPU hotplug fixes Mirela Simonovic
@ 2018-04-27 17:12 ` Mirela Simonovic
  2018-04-27 17:12 ` [PATCH v3 02/10] xen/arm: Ignore write to GICD_ISACTIVERn registers (vgic-v2) Mirela Simonovic
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Mirela Simonovic @ 2018-04-27 17:12 UTC (permalink / raw)
  To: xen-devel; +Cc: edgar.iglesias, julien.grall, sstabellini, dm, Mirela Simonovic

Linux/dom0 accesses OSLSR register when saving CPU context during the
suspend procedure. Xen traps access to this register, but has no handling
for it. Consequently, Xen injects undef exception to linux, causing it to
crash. This patch adds handling of the trapped access to OSLSR as ro/raz.

Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>

---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
---
Changes in v2:
- Commit message fix (arm64 related change instead of arm)
- Add Stefano's reviewed-by

Changes in v3:
- Added Julien's acked-by
---
 xen/arch/arm/arm64/vsysreg.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c
index c57ac12503..8f80e1735e 100644
--- a/xen/arch/arm/arm64/vsysreg.c
+++ b/xen/arch/arm/arm64/vsysreg.c
@@ -57,13 +57,14 @@ void do_sysreg(struct cpu_user_regs *regs,
      * ARMv8 (DDI 0487A.d): D1-1509 Table D1-58
      *
      * Unhandled:
-     *    OSLSR_EL1
      *    DBGPRCR_EL1
      */
     case HSR_SYSREG_OSLAR_EL1:
         return handle_wo_wi(regs, regidx, hsr.sysreg.read, hsr, 1);
     case HSR_SYSREG_OSDLR_EL1:
         return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1);
+    case HSR_SYSREG_OSLSR_EL1:
+        return handle_ro_raz(regs, regidx, hsr.sysreg.read, hsr, 1);
 
     /*
      * MDCR_EL2.TDA
-- 
2.13.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 02/10] xen/arm: Ignore write to GICD_ISACTIVERn registers (vgic-v2)
  2018-04-27 17:12 [PATCH v3 00/10] xen/arm64: Suspend preconditions and CPU hotplug fixes Mirela Simonovic
  2018-04-27 17:12 ` [PATCH v3 01/10] xen/arm64: Added handling of the trapped access to OSLSR register Mirela Simonovic
@ 2018-04-27 17:12 ` Mirela Simonovic
  2018-04-30 14:32   ` Julien Grall
  2018-04-27 17:12 ` [PATCH v3 03/10] xen/arm: Implement CPU_OFF PSCI call (physical interface) Mirela Simonovic
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-04-27 17:12 UTC (permalink / raw)
  To: xen-devel; +Cc: edgar.iglesias, julien.grall, sstabellini, dm, Mirela Simonovic

Guests attempt to write into these registers on resume (for example Linux).
Without this patch a data abort exception will be raised to the guest.
This patch handles the write access by ignoring it, but only if the value
to be written is zero. This should be fine because reading these registers
is already handled as 'read as zero'.

Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>

---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
---
Changes in v2:
- Write should be ignored only if the value to be written is zero
 (in v1 the write was ignored regardless of the value)

Changes in v3:
- Print warning only if the value to be written is not zero
---
 xen/arch/arm/vgic-v2.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index 646d1f3d12..f6c11f1e41 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -485,6 +485,8 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
 
     case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN):
         if ( dabt.size != DABT_WORD ) goto bad_width;
+        if ( r == 0 )
+            goto write_ignore_32;
         printk(XENLOG_G_ERR
                "%pv: vGICD: unhandled word write %#"PRIregister" to ISACTIVER%d\n",
                v, r, gicd_reg - GICD_ISACTIVER);
-- 
2.13.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 03/10] xen/arm: Implement CPU_OFF PSCI call (physical interface)
  2018-04-27 17:12 [PATCH v3 00/10] xen/arm64: Suspend preconditions and CPU hotplug fixes Mirela Simonovic
  2018-04-27 17:12 ` [PATCH v3 01/10] xen/arm64: Added handling of the trapped access to OSLSR register Mirela Simonovic
  2018-04-27 17:12 ` [PATCH v3 02/10] xen/arm: Ignore write to GICD_ISACTIVERn registers (vgic-v2) Mirela Simonovic
@ 2018-04-27 17:12 ` Mirela Simonovic
  2018-04-30 14:36   ` Julien Grall
  2018-04-27 17:12 ` [PATCH v3 04/10] xen/arm: Remove __initdata and __init to enable CPU hotplug Mirela Simonovic
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-04-27 17:12 UTC (permalink / raw)
  To: xen-devel; +Cc: edgar.iglesias, julien.grall, sstabellini, dm, Mirela Simonovic

During the system suspend to RAM non-boot CPUs will be hotplugged.
This will be triggered via disable_nonboot_cpus() call. When
hotplugged the CPU will end up in an infinite wfi loop in stop_cpu().
This patch adds PSCI CPU_OFF call to the EL3 with the aim to get powered
down the calling CPU during the suspend. The CPU_OFF call will be made
only if the PSCI version is higher than v0.1 (Note that the CPU_OFF
function is mandatory since PSCI v0.2).
If PSCI CPU_OFF call to the EL3 succeeds it will not return. Otherwise,
when the PSCI CPU_OFF call returns we'll raise panic, because the
calling CPU couldn't be enabled afterwards (stays in WFI loop forever).
Note that if the PSCI version is higher than v0.1 the CPU_OFF will be
called regardless of the system state. This is done because scenarios
other than suspend may benefit from powering off the CPU.

Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>

---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
---
Changes in v2:
-Issue PSCI CPU_OFF only if the system is suspending
-If PSCI CPU_OFF call fails (unlikely to ever happen) raise panic
-Fixed commit message

Changes in v3:
-Check for PSCI version prior to calling CPU_OFF
-Don't check for system state - invoke CPU_OFF in all system states
-Don't check if returned error is not zero because it's always not
 zero if CPU_OFF SMC returns
-Fixed commit message
---
 xen/arch/arm/psci.c        | 13 +++++++++++++
 xen/arch/arm/smpboot.c     |  2 ++
 xen/include/asm-arm/psci.h |  1 +
 3 files changed, 16 insertions(+)

diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
index 94b616df9b..7c1124e45f 100644
--- a/xen/arch/arm/psci.c
+++ b/xen/arch/arm/psci.c
@@ -46,6 +46,19 @@ int call_psci_cpu_on(int cpu)
     return call_smc(psci_cpu_on_nr, cpu_logical_map(cpu), __pa(init_secondary), 0);
 }
 
+void call_psci_cpu_off(void)
+{
+    if ( psci_ver > PSCI_VERSION(0, 1) )
+    {
+        int errno;
+
+        /* If successfull the PSCI cpu_off call doesn't return */
+        errno = call_smc(PSCI_0_2_FN32_CPU_OFF, 0, 0, 0);
+        panic("PSCI cpu off failed for CPU%d err=%d\n", get_processor_id(),
+                errno);
+    }
+}
+
 void call_psci_system_off(void)
 {
     if ( psci_ver > PSCI_VERSION(0, 1) )
diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index b2116f0d2d..8b1e274bf3 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -395,6 +395,8 @@ void stop_cpu(void)
     /* Make sure the write happens before we sleep forever */
     dsb(sy);
     isb();
+    call_psci_cpu_off();
+
     while ( 1 )
         wfi();
 }
diff --git a/xen/include/asm-arm/psci.h b/xen/include/asm-arm/psci.h
index 9ac820e94a..832f77afff 100644
--- a/xen/include/asm-arm/psci.h
+++ b/xen/include/asm-arm/psci.h
@@ -20,6 +20,7 @@ extern uint32_t psci_ver;
 
 int psci_init(void);
 int call_psci_cpu_on(int cpu);
+void call_psci_cpu_off(void);
 void call_psci_system_off(void);
 void call_psci_system_reset(void);
 
-- 
2.13.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 04/10] xen/arm: Remove __initdata and __init to enable CPU hotplug
  2018-04-27 17:12 [PATCH v3 00/10] xen/arm64: Suspend preconditions and CPU hotplug fixes Mirela Simonovic
                   ` (2 preceding siblings ...)
  2018-04-27 17:12 ` [PATCH v3 03/10] xen/arm: Implement CPU_OFF PSCI call (physical interface) Mirela Simonovic
@ 2018-04-27 17:12 ` Mirela Simonovic
  2018-04-27 17:12 ` [PATCH v3 05/10] xen/arm: Setup virtual paging for non-boot CPUs on hotplug/resume Mirela Simonovic
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Mirela Simonovic @ 2018-04-27 17:12 UTC (permalink / raw)
  To: xen-devel; +Cc: edgar.iglesias, julien.grall, sstabellini, dm, Mirela Simonovic

CPU up flow is currently used during the initial boot to start secondary
CPUs. However, the same flow should be used for CPU hotplug, e.g. when
hotplugging secondary CPUs within the resume procedure (resume from the
suspend to RAM). Therefore, prefixes __initdata and __init had to be removed
from few data structures and functions that are used within the cpu up flow.

Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
Acked-by: Julien Grall <julien.grall@arm.com>

---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
---
Changes in v3:
- Added acked-by Julien
---
 xen/arch/arm/arm64/smpboot.c   | 2 +-
 xen/arch/arm/irq.c             | 2 +-
 xen/arch/arm/processor.c       | 2 +-
 xen/arch/arm/smpboot.c         | 4 ++--
 xen/include/asm-arm/procinfo.h | 4 ++--
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/arm64/smpboot.c b/xen/arch/arm/arm64/smpboot.c
index 4fd0ac68b7..694fbf67e6 100644
--- a/xen/arch/arm/arm64/smpboot.c
+++ b/xen/arch/arm/arm64/smpboot.c
@@ -104,7 +104,7 @@ int __init arch_cpu_init(int cpu, struct dt_device_node *dn)
         return smp_psci_init(cpu);
 }
 
-int __init arch_cpu_up(int cpu)
+int arch_cpu_up(int cpu)
 {
     if ( !smp_enable_ops[cpu].prepare_cpu )
         return -ENODEV;
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index aa4e832cae..098281f8ab 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -65,7 +65,7 @@ irq_desc_t *__irq_to_desc(int irq)
     return &irq_desc[irq-NR_LOCAL_IRQS];
 }
 
-int __init arch_init_one_irq_desc(struct irq_desc *desc)
+int arch_init_one_irq_desc(struct irq_desc *desc)
 {
     desc->arch.type = IRQ_TYPE_INVALID;
     return 0;
diff --git a/xen/arch/arm/processor.c b/xen/arch/arm/processor.c
index ce4385064a..acad8b31d6 100644
--- a/xen/arch/arm/processor.c
+++ b/xen/arch/arm/processor.c
@@ -20,7 +20,7 @@
 
 static DEFINE_PER_CPU(struct processor *, processor);
 
-void __init processor_setup(void)
+void processor_setup(void)
 {
     const struct proc_info_list *procinfo;
 
diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 8b1e274bf3..ad1f6b751b 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -52,8 +52,8 @@ nodemask_t __read_mostly node_online_map = { { [0] = 1UL } };
 static unsigned char __initdata cpu0_boot_stack[STACK_SIZE]
        __attribute__((__aligned__(STACK_SIZE)));
 
-/* Initial boot cpu data */
-struct init_info __initdata init_data =
+/* Boot cpu data */
+struct init_info init_data =
 {
     .stack = cpu0_boot_stack,
 };
diff --git a/xen/include/asm-arm/procinfo.h b/xen/include/asm-arm/procinfo.h
index 26306b35f8..02be56e348 100644
--- a/xen/include/asm-arm/procinfo.h
+++ b/xen/include/asm-arm/procinfo.h
@@ -35,9 +35,9 @@ struct proc_info_list {
     struct processor    *processor;
 };
 
-const __init struct proc_info_list *lookup_processor_type(void);
+const struct proc_info_list *lookup_processor_type(void);
 
-void __init processor_setup(void);
+void processor_setup(void);
 void processor_vcpu_initialise(struct vcpu *v);
 
 #endif
-- 
2.13.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 05/10] xen/arm: Setup virtual paging for non-boot CPUs on hotplug/resume
  2018-04-27 17:12 [PATCH v3 00/10] xen/arm64: Suspend preconditions and CPU hotplug fixes Mirela Simonovic
                   ` (3 preceding siblings ...)
  2018-04-27 17:12 ` [PATCH v3 04/10] xen/arm: Remove __initdata and __init to enable CPU hotplug Mirela Simonovic
@ 2018-04-27 17:12 ` Mirela Simonovic
  2018-04-30 14:47   ` Julien Grall
  2018-04-27 17:12 ` [PATCH v3 06/10] xen/common: Restore IRQ affinity when hotplugging a pCPU Mirela Simonovic
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-04-27 17:12 UTC (permalink / raw)
  To: xen-devel; +Cc: edgar.iglesias, julien.grall, sstabellini, dm, Mirela Simonovic

In existing code the virtual paging for non-boot CPUs is setup only on boot.
The setup is triggered from start_xen() after all CPUs are brought online.
In other words, the initialization of VTCR_EL2 register is done out of the
cpu_up/start_secondary() control flow. However, the cpu_up flow is also used
to hotplug non-boot CPUs on resume from suspend to RAM state, in which case
the virtual paging will not be configured.

With this patch the setting of paging is triggered from start_secondary()
function using cpu starting notifier (notify_cpu_starting() call). The
notifier is registered in p2m.c using init call. This has to be done with
init call rather than presmp_init because the registered callback depends
on vtcr configuration value which is setup after the presmp init calls
are executed (do_presmp_initcalls() called from start_xen()). Init calls
are executed after initial virtual paging is set up for all CPUs on boot.
This ensures that no callback can fire until the vtcr value is calculated
by Xen and virtual paging is set up initially for all CPUs. Also, this way
the virtual paging setup in boot scenario remains unchanged.

It is assumed here that after the system completed the boot, CPUs that
execute start_secondary() were booted as well when the Xen itself was
booted. According to this assumption non-boot CPUs will always be compliant
with the VTCR_EL2 value that was selected by Xen on boot.
Currently, there is no mechanism to trigger hotplugging of a CPU. This
will be added with the suspend to RAM support for ARM, where the hotplug
of non-boot CPUs will be triggered via enable_nonboot_cpus() call.

Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>

---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
---
Changes in v2:
-Fix commit message
-Save configured VTCR_EL2 value into static variable that will be used
 by non-boot CPUs on hotplug
-Add setup_virt_paging_secondary() and invoke it from start_secondary()
 if that CPU has to setup virtual paging (if the system state is not boot)

Changes in v3:
-Fix commit message
-Remove setup_virt_paging_secondary() and use notifier to setup virtual
 paging for non-boot CPU on hotplug.
-In setup_virt_paging() use vtcr static variable instead of local val
-In setup_virt_paging_one() use vtcr static variable instead of provided
 argument
---
 xen/arch/arm/p2m.c | 82 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 62 insertions(+), 20 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index d43c3aa896..98a1fe6de9 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -8,6 +8,8 @@
 #include <xen/iocap.h>
 #include <xen/mem_access.h>
 #include <xen/xmalloc.h>
+#include <xen/notifier.h>
+#include <xen/cpu.h>
 #include <public/vm_event.h>
 #include <asm/flushtlb.h>
 #include <asm/event.h>
@@ -1451,24 +1453,17 @@ err:
     return page;
 }
 
-static void __init setup_virt_paging_one(void *data)
+/* VTCR value to be configured by all CPUs. Set only once by the boot CPU */
+static uint64_t __read_mostly vtcr;
+
+static void setup_virt_paging_one(void *data)
 {
-    unsigned long val = (unsigned long)data;
-    WRITE_SYSREG32(val, VTCR_EL2);
+    WRITE_SYSREG32(vtcr, VTCR_EL2);
     isb();
 }
 
 void __init setup_virt_paging(void)
 {
-    /* Setup Stage 2 address translation */
-    unsigned long val = VTCR_RES1|VTCR_SH0_IS|VTCR_ORGN0_WBWA|VTCR_IRGN0_WBWA;
-
-#ifdef CONFIG_ARM_32
-    printk("P2M: 40-bit IPA\n");
-    p2m_ipa_bits = 40;
-    val |= VTCR_T0SZ(0x18); /* 40 bit IPA */
-    val |= VTCR_SL0(0x1); /* P2M starts at first level */
-#else /* CONFIG_ARM_64 */
     const struct {
         unsigned int pabits; /* Physical Address Size */
         unsigned int t0sz;   /* Desired T0SZ, minimum in comment */
@@ -1491,6 +1486,16 @@ void __init setup_virt_paging(void)
     unsigned int pa_range = 0x10; /* Larger than any possible value */
     bool vmid_8_bit = false;
 
+    /* Setup Stage 2 address translation */
+    vtcr = VTCR_RES1|VTCR_SH0_IS|VTCR_ORGN0_WBWA|VTCR_IRGN0_WBWA;
+
+#ifdef CONFIG_ARM_32
+    printk("P2M: 40-bit IPA\n");
+    p2m_ipa_bits = 40;
+    vtcr |= VTCR_T0SZ(0x18); /* 40 bit IPA */
+    vtcr |= VTCR_SL0(0x1); /* P2M starts at first level */
+#else /* CONFIG_ARM_64 */
+
     for_each_online_cpu ( cpu )
     {
         const struct cpuinfo_arm *info = &cpu_data[cpu];
@@ -1513,14 +1518,14 @@ void __init setup_virt_paging(void)
     if ( pa_range >= ARRAY_SIZE(pa_range_info) || !pa_range_info[pa_range].pabits )
         panic("Unknown encoding of ID_AA64MMFR0_EL1.PARange %x\n", pa_range);
 
-    val |= VTCR_PS(pa_range);
-    val |= VTCR_TG0_4K;
+    vtcr |= VTCR_PS(pa_range);
+    vtcr |= VTCR_TG0_4K;
 
     /* Set the VS bit only if 16 bit VMID is supported. */
     if ( MAX_VMID == MAX_VMID_16_BIT )
-        val |= VTCR_VS;
-    val |= VTCR_SL0(pa_range_info[pa_range].sl0);
-    val |= VTCR_T0SZ(pa_range_info[pa_range].t0sz);
+        vtcr |= VTCR_VS;
+    vtcr |= VTCR_SL0(pa_range_info[pa_range].sl0);
+    vtcr |= VTCR_T0SZ(pa_range_info[pa_range].t0sz);
 
     p2m_root_order = pa_range_info[pa_range].root_order;
     p2m_root_level = 2 - pa_range_info[pa_range].sl0;
@@ -1532,16 +1537,53 @@ void __init setup_virt_paging(void)
            ( MAX_VMID == MAX_VMID_16_BIT ) ? 16 : 8);
 #endif
     printk("P2M: %d levels with order-%d root, VTCR 0x%lx\n",
-           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, val);
+           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, vtcr);
 
     p2m_vmid_allocator_init();
 
     /* It is not allowed to concatenate a level zero root */
     BUG_ON( P2M_ROOT_LEVEL == 0 && P2M_ROOT_ORDER > 0 );
-    setup_virt_paging_one((void *)val);
-    smp_call_function(setup_virt_paging_one, (void *)val, 1);
+    setup_virt_paging_one(NULL);
+    smp_call_function(setup_virt_paging_one, NULL, 1);
+}
+
+static int cpu_virt_paging_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    switch ( action )
+    {
+    case CPU_STARTING:
+        ASSERT(system_state != SYS_STATE_boot);
+        setup_virt_paging_one(NULL);
+        break;
+    default:
+        break;
+    }
+
+    return NOTIFY_DONE;
 }
 
+static struct notifier_block cpu_virt_paging_nfb = {
+    .notifier_call = cpu_virt_paging_callback,
+    .priority = 100 /* highest priority */
+};
+
+static int __init cpu_virt_paging_init(void)
+{
+    register_cpu_notifier(&cpu_virt_paging_nfb);
+    return 0;
+}
+/*
+ * Initialization of the notifier has to be done at init rather than presmp_init
+ * phase because: the registered notifier is used to setup virtual paging for
+ * non-boot CPUs after the initial virtual paging for all CPUs is already setup,
+ * i.e. when a non-boot CPU is hotplugged after the system has booted. In other
+ * words, the notifier should be registered after the virtual paging is
+ * initially setup (setup_virt_paging() is called from start_xen()). This is
+ * required because vtcr config value has to be set before a notifier can fire.
+ */
+__initcall(cpu_virt_paging_init);
+
 /*
  * Local variables:
  * mode: C
-- 
2.13.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 06/10] xen/common: Restore IRQ affinity when hotplugging a pCPU
  2018-04-27 17:12 [PATCH v3 00/10] xen/arm64: Suspend preconditions and CPU hotplug fixes Mirela Simonovic
                   ` (4 preceding siblings ...)
  2018-04-27 17:12 ` [PATCH v3 05/10] xen/arm: Setup virtual paging for non-boot CPUs on hotplug/resume Mirela Simonovic
@ 2018-04-27 17:12 ` Mirela Simonovic
  2018-05-10  7:33   ` Dario Faggioli
  2018-04-27 17:12 ` [PATCH v3 07/10] xen/arm: Release maintenance interrupt when CPU is hot-unplugged Mirela Simonovic
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-04-27 17:12 UTC (permalink / raw)
  To: xen-devel
  Cc: edgar.iglesias, sstabellini, dm, George Dunlap, Dario Faggioli,
	julien.grall, Mirela Simonovic

Non-boot pCPUs are being hot-unplugged during the system suspend to
RAM and hotplugged during the resume. When non-boot pCPUs are
hot-unplugged the interrupts that were targeted to them are migrated
to the boot pCPU.
On suspend, each guest could have its own wake-up devices/interrupts
(passthrough) that could trigger the system resume. These interrupts
could be targeted to a non-boot pCPU, e.g. if the guest's vCPU is
pinned to a non-boot pCPU. Due to the hot-unplug of non-boot pCPUs
during the suspend such interrupts will be migrated from non-boot pCPUs
to the boot pCPU (this is fine). However, when non-boot pCPUs are
hotplugged on resume, these interrupts are not migrated back to non-boot
pCPUs, i.e. IRQ affinity is not restored on resume (this is wrong).
This patch adds the restoration of IRQ affinity when a pCPU is hotplugged.

Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>

---
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Dario Faggioli <dfaggioli@suse.com>
---
Changes in v2:
-Instead of checking whether the affinity was broken check whether
 vcpu's processor has changed in order to trigger restoring of the
 IRQ affinity
-Fix commit message
---
 xen/common/schedule.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 343ab6306e..c37ef4846d 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -692,6 +692,7 @@ void restore_vcpu_affinity(struct domain *d)
     for_each_vcpu ( d, v )
     {
         spinlock_t *lock;
+        unsigned int old_cpu = v->processor;
 
         ASSERT(!vcpu_runnable(v));
 
@@ -724,6 +725,9 @@ void restore_vcpu_affinity(struct domain *d)
         lock = vcpu_schedule_lock_irq(v);
         v->processor = SCHED_OP(vcpu_scheduler(v), pick_cpu, v);
         spin_unlock_irq(lock);
+
+        if ( old_cpu != v->processor )
+            sched_move_irqs(v);
     }
 
     domain_update_node_affinity(d);
-- 
2.13.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 07/10] xen/arm: Release maintenance interrupt when CPU is hot-unplugged
  2018-04-27 17:12 [PATCH v3 00/10] xen/arm64: Suspend preconditions and CPU hotplug fixes Mirela Simonovic
                   ` (5 preceding siblings ...)
  2018-04-27 17:12 ` [PATCH v3 06/10] xen/common: Restore IRQ affinity when hotplugging a pCPU Mirela Simonovic
@ 2018-04-27 17:12 ` Mirela Simonovic
  2018-04-30 14:51   ` Julien Grall
  2018-04-27 17:12 ` [PATCH v3 08/10] xen/arm: Release timer interrupts " Mirela Simonovic
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-04-27 17:12 UTC (permalink / raw)
  To: xen-devel; +Cc: edgar.iglesias, julien.grall, sstabellini, dm, Mirela Simonovic

When a CPU is hot-unplugged the maintenance interrupt has to be
released in order to free the memory that was allocated when the CPU
was hotplugged and interrupt requested. The interrupt was requested
using request_irq() which is called from start_secondary->
init_maintenance_interrupt. With this patch the interrupt will be
released when the CPU_DYING event is received by the callback which
is added in gic.c.

Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>

---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
---
Changes in v3:
-Add notifier in order to trigger releasing of the  maintenance
 interrupt when the CPU is dying.
---
 xen/arch/arm/gic.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 653a815127..89abc49950 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -27,6 +27,8 @@
 #include <xen/list.h>
 #include <xen/device_tree.h>
 #include <xen/acpi.h>
+#include <xen/notifier.h>
+#include <xen/cpu.h>
 #include <asm/p2m.h>
 #include <asm/domain.h>
 #include <asm/platform.h>
@@ -462,6 +464,33 @@ int gic_iomem_deny_access(const struct domain *d)
     return gic_hw_ops->iomem_deny_access(d);
 }
 
+static int cpu_gic_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    switch ( action )
+    {
+    case CPU_DYING:
+        /* This is reverting the work done in init_maintenance_interrupt */
+        release_irq(gic_hw_ops->info->maintenance_irq, NULL);
+        break;
+    default:
+        break;
+    }
+
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_gic_nfb = {
+    .notifier_call = cpu_gic_callback,
+};
+
+static int __init cpu_gic_notifier_init(void)
+{
+    register_cpu_notifier(&cpu_gic_nfb);
+    return 0;
+}
+__initcall(cpu_gic_notifier_init);
+
 /*
  * Local variables:
  * mode: C
-- 
2.13.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 08/10] xen/arm: Release timer interrupts when CPU is hot-unplugged
  2018-04-27 17:12 [PATCH v3 00/10] xen/arm64: Suspend preconditions and CPU hotplug fixes Mirela Simonovic
                   ` (6 preceding siblings ...)
  2018-04-27 17:12 ` [PATCH v3 07/10] xen/arm: Release maintenance interrupt when CPU is hot-unplugged Mirela Simonovic
@ 2018-04-27 17:12 ` Mirela Simonovic
  2018-04-30 15:58   ` Julien Grall
  2018-04-27 17:12 ` [PATCH v3 09/10] xen/arm: Free memory allocated for sibling/core maps on CPU hot-unplug Mirela Simonovic
  2018-04-27 17:12 ` [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot Mirela Simonovic
  9 siblings, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-04-27 17:12 UTC (permalink / raw)
  To: xen-devel; +Cc: edgar.iglesias, julien.grall, sstabellini, dm, Mirela Simonovic

When a CPU is hot-unplugged timer interrupts have to be released
in order to free the memory that was allocated when the interrupts
were requested (using request_irq()). The request_irq is called
for each timer interrupt when the CPU gets hotplugged
(start_secondary->init_timer_interrupt->request_irq).
With this patch the timer interrupts will be released when the
newly added callback receives CPU_DYING event.

Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>

---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
---
Changes in v3:
-Trigger releasing of timer interrupts using notifiers
---
 xen/arch/arm/time.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index c11fcfeadd..c7317e4639 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -29,6 +29,8 @@
 #include <xen/sched.h>
 #include <xen/event.h>
 #include <xen/acpi.h>
+#include <xen/notifier.h>
+#include <xen/cpu.h>
 #include <asm/system.h>
 #include <asm/time.h>
 #include <asm/vgic.h>
@@ -312,6 +314,17 @@ void init_timer_interrupt(void)
     check_timer_irq_cfg(timer_irq[TIMER_PHYS_NONSECURE_PPI], "NS-physical");
 }
 
+/*
+ * Revert actions done in init_timer_interrupt that are required to properly
+ * disable this CPU.
+ */
+static void deinit_timer_interrupt(void)
+{
+    release_irq(timer_irq[TIMER_HYP_PPI], NULL);
+    release_irq(timer_irq[TIMER_VIRT_PPI], NULL);
+    release_irq(timer_irq[TIMER_PHYS_NONSECURE_PPI], NULL);
+}
+
 /* Wait a set number of microseconds */
 void udelay(unsigned long usecs)
 {
@@ -340,6 +353,32 @@ void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
     /* XXX update guest visible wallclock time */
 }
 
+static int cpu_time_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    switch ( action )
+    {
+    case CPU_DYING:
+        deinit_timer_interrupt();
+        break;
+    default:
+        break;
+    }
+
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_time_nfb = {
+    .notifier_call = cpu_time_callback,
+};
+
+static int __init cpu_time_notifier_init(void)
+{
+    register_cpu_notifier(&cpu_time_nfb);
+    return 0;
+}
+__initcall(cpu_time_notifier_init);
+
 /*
  * Local variables:
  * mode: C
-- 
2.13.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 09/10] xen/arm: Free memory allocated for sibling/core maps on CPU hot-unplug
  2018-04-27 17:12 [PATCH v3 00/10] xen/arm64: Suspend preconditions and CPU hotplug fixes Mirela Simonovic
                   ` (7 preceding siblings ...)
  2018-04-27 17:12 ` [PATCH v3 08/10] xen/arm: Release timer interrupts " Mirela Simonovic
@ 2018-04-27 17:12 ` Mirela Simonovic
  2018-04-30 15:55   ` Julien Grall
  2018-04-27 17:12 ` [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot Mirela Simonovic
  9 siblings, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-04-27 17:12 UTC (permalink / raw)
  To: xen-devel; +Cc: edgar.iglesias, julien.grall, sstabellini, dm, Mirela Simonovic

The memory allocated in setup_cpu_sibling_map() when a CPU is hotplugged
has to be freed when the CPU is hot-unplugged. This is done in
remove_cpu_sibling_map() and called when the CPU dies. The call to
remove_cpu_sibling_map() is made from a notifier callback when
CPU_DEAD event is received.

Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>

---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
---
Changes in v3:
-Use notifier to trigger remove_cpu_sibling_map() when the CPU dies.
---
 xen/arch/arm/smpboot.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index ad1f6b751b..b833e3a754 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -89,6 +89,12 @@ static void setup_cpu_sibling_map(int cpu)
     cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu));
 }
 
+static void remove_cpu_sibling_map(int cpu)
+{
+    free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
+    free_cpumask_var(per_cpu(cpu_core_mask, cpu));
+}
+
 void __init
 smp_clear_cpu_maps (void)
 {
@@ -499,6 +505,34 @@ void __cpu_die(unsigned int cpu)
     smp_mb();
 }
 
+static int cpu_smpboot_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    unsigned int cpu = (unsigned long)hcpu;
+
+    switch ( action )
+    {
+    case CPU_DEAD:
+        remove_cpu_sibling_map(cpu);
+        break;
+    default:
+        break;
+    }
+
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_smpboot_nfb = {
+    .notifier_call = cpu_smpboot_callback,
+};
+
+static int __init cpu_smpboot_notifier_init(void)
+{
+    register_cpu_notifier(&cpu_smpboot_nfb);
+    return 0;
+}
+__initcall(cpu_smpboot_notifier_init);
+
 /*
  * Local variables:
  * mode: C
-- 
2.13.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-04-27 17:12 [PATCH v3 00/10] xen/arm64: Suspend preconditions and CPU hotplug fixes Mirela Simonovic
                   ` (8 preceding siblings ...)
  2018-04-27 17:12 ` [PATCH v3 09/10] xen/arm: Free memory allocated for sibling/core maps on CPU hot-unplug Mirela Simonovic
@ 2018-04-27 17:12 ` Mirela Simonovic
  2018-04-30 16:09   ` Julien Grall
  9 siblings, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-04-27 17:12 UTC (permalink / raw)
  To: xen-devel; +Cc: edgar.iglesias, julien.grall, sstabellini, dm, Mirela Simonovic

On boot, enabling errata workarounds will be triggered by the boot CPU
from start_xen(). On CPU hotplug (non-boot scenario) this would not be
done. This patch adds the code required to enable errata workarounds
for a CPU being hotplugged after the system boots. This is triggered
using a notifier. If the CPU fails to enable the errata Xen will panic.
This is done because it is assumed that the CPU which is hotplugged
after the system/Xen boots, was initially hotplugged during the
system/Xen boot. Therefore, enabling errata workarounds should never
fail.

Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>

---
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/cpuerrata.c         | 35 +++++++++++++++++++++++++++++++++++
 xen/arch/arm/cpufeature.c        | 23 +++++++++++++++++++++++
 xen/include/asm-arm/cpufeature.h |  1 +
 3 files changed, 59 insertions(+)

diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
index 1baa20654b..4040f781ec 100644
--- a/xen/arch/arm/cpuerrata.c
+++ b/xen/arch/arm/cpuerrata.c
@@ -5,6 +5,8 @@
 #include <xen/spinlock.h>
 #include <xen/vmap.h>
 #include <xen/warning.h>
+#include <xen/notifier.h>
+#include <xen/cpu.h>
 #include <asm/cpufeature.h>
 #include <asm/cpuerrata.h>
 #include <asm/psci.h>
@@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void)
     enable_cpu_capabilities(arm_errata);
 }
 
+static int cpu_errata_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    switch ( action )
+    {
+    case CPU_STARTING:
+        enable_nonboot_cpu_caps(arm_errata);
+        break;
+    default:
+        break;
+    }
+
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_errata_nfb = {
+    .notifier_call = cpu_errata_callback,
+};
+
+static int __init cpu_errata_notifier_init(void)
+{
+    register_cpu_notifier(&cpu_errata_nfb);
+    return 0;
+}
+/*
+ * Initialization has to be done at init rather than presmp_init phase because
+ * the callback should execute only after the secondary CPUs are initially
+ * booted (in hotplug scenarios when the system state is not boot). On boot,
+ * the enabling of errata workarounds will be triggered by the boot CPU from
+ * start_xen().
+ */
+__initcall(cpu_errata_notifier_init);
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
index 525b45e22f..dd30f0d29c 100644
--- a/xen/arch/arm/cpufeature.c
+++ b/xen/arch/arm/cpufeature.c
@@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct arm_cpu_capabilities *caps)
     }
 }
 
+/* Run through the enabled capabilities and enable() them on the calling CPU */
+void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
+{
+    ASSERT(system_state != SYS_STATE_boot);
+
+    for ( ; caps->matches; caps++ )
+    {
+        if ( !cpus_have_cap(caps->capability) )
+            continue;
+
+        if ( caps->enable )
+        {
+            /*
+             * Since the CPU has enabled errata workarounds on boot, it should
+             * never fail to enable them here.
+             */
+            if ( caps->enable((void *)caps) )
+                panic("CPU%u failed to enable capability %u\n",
+                      smp_processor_id(), caps->capability);
+        }
+    }
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
index e557a095af..b14e226401 100644
--- a/xen/include/asm-arm/cpufeature.h
+++ b/xen/include/asm-arm/cpufeature.h
@@ -88,6 +88,7 @@ void update_cpu_capabilities(const struct arm_cpu_capabilities *caps,
                              const char *info);
 
 void enable_cpu_capabilities(const struct arm_cpu_capabilities *caps);
+void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps);
 
 #endif /* __ASSEMBLY__ */
 
-- 
2.13.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 02/10] xen/arm: Ignore write to GICD_ISACTIVERn registers (vgic-v2)
  2018-04-27 17:12 ` [PATCH v3 02/10] xen/arm: Ignore write to GICD_ISACTIVERn registers (vgic-v2) Mirela Simonovic
@ 2018-04-30 14:32   ` Julien Grall
  0 siblings, 0 replies; 53+ messages in thread
From: Julien Grall @ 2018-04-30 14:32 UTC (permalink / raw)
  To: Mirela Simonovic, xen-devel; +Cc: edgar.iglesias, sstabellini, dm

Hi,

On 27/04/18 18:12, Mirela Simonovic wrote:
> Guests attempt to write into these registers on resume (for example Linux).
> Without this patch a data abort exception will be raised to the guest.
> This patch handles the write access by ignoring it, but only if the value
> to be written is zero. This should be fine because reading these registers
> is already handled as 'read as zero'.
> 
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>

Reviewed-by: Julien Grall <julien.grall@arm.com>

Cheers,

> 
> ---
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien.grall@arm.com>
> ---
> Changes in v2:
> - Write should be ignored only if the value to be written is zero
>   (in v1 the write was ignored regardless of the value)
> 
> Changes in v3:
> - Print warning only if the value to be written is not zero
> ---
>   xen/arch/arm/vgic-v2.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
> index 646d1f3d12..f6c11f1e41 100644
> --- a/xen/arch/arm/vgic-v2.c
> +++ b/xen/arch/arm/vgic-v2.c
> @@ -485,6 +485,8 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
>   
>       case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN):
>           if ( dabt.size != DABT_WORD ) goto bad_width;
> +        if ( r == 0 )
> +            goto write_ignore_32;
>           printk(XENLOG_G_ERR
>                  "%pv: vGICD: unhandled word write %#"PRIregister" to ISACTIVER%d\n",
>                  v, r, gicd_reg - GICD_ISACTIVER);
> 

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 03/10] xen/arm: Implement CPU_OFF PSCI call (physical interface)
  2018-04-27 17:12 ` [PATCH v3 03/10] xen/arm: Implement CPU_OFF PSCI call (physical interface) Mirela Simonovic
@ 2018-04-30 14:36   ` Julien Grall
  0 siblings, 0 replies; 53+ messages in thread
From: Julien Grall @ 2018-04-30 14:36 UTC (permalink / raw)
  To: Mirela Simonovic, xen-devel; +Cc: edgar.iglesias, sstabellini, dm

Hi Mirela,

On 27/04/18 18:12, Mirela Simonovic wrote:
> During the system suspend to RAM non-boot CPUs will be hotplugged.
> This will be triggered via disable_nonboot_cpus() call. When
> hotplugged the CPU will end up in an infinite wfi loop in stop_cpu().
> This patch adds PSCI CPU_OFF call to the EL3 with the aim to get powered
> down the calling CPU during the suspend. The CPU_OFF call will be made
> only if the PSCI version is higher than v0.1 (Note that the CPU_OFF
> function is mandatory since PSCI v0.2).
> If PSCI CPU_OFF call to the EL3 succeeds it will not return. Otherwise,
> when the PSCI CPU_OFF call returns we'll raise panic, because the
> calling CPU couldn't be enabled afterwards (stays in WFI loop forever).
> Note that if the PSCI version is higher than v0.1 the CPU_OFF will be
> called regardless of the system state. This is done because scenarios
> other than suspend may benefit from powering off the CPU.
> 
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> 
> ---
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien.grall@arm.com>
> ---
> Changes in v2:
> -Issue PSCI CPU_OFF only if the system is suspending
> -If PSCI CPU_OFF call fails (unlikely to ever happen) raise panic
> -Fixed commit message
> 
> Changes in v3:
> -Check for PSCI version prior to calling CPU_OFF
> -Don't check for system state - invoke CPU_OFF in all system states
> -Don't check if returned error is not zero because it's always not
>   zero if CPU_OFF SMC returns
> -Fixed commit message
> ---
>   xen/arch/arm/psci.c        | 13 +++++++++++++
>   xen/arch/arm/smpboot.c     |  2 ++
>   xen/include/asm-arm/psci.h |  1 +
>   3 files changed, 16 insertions(+)
> 
> diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
> index 94b616df9b..7c1124e45f 100644
> --- a/xen/arch/arm/psci.c
> +++ b/xen/arch/arm/psci.c
> @@ -46,6 +46,19 @@ int call_psci_cpu_on(int cpu)
>       return call_smc(psci_cpu_on_nr, cpu_logical_map(cpu), __pa(init_secondary), 0);
>   }
>   
> +void call_psci_cpu_off(void)
> +{
> +    if ( psci_ver > PSCI_VERSION(0, 1) )
> +    {
> +        int errno;
> +
> +        /* If successfull the PSCI cpu_off call doesn't return */
> +        errno = call_smc(PSCI_0_2_FN32_CPU_OFF, 0, 0, 0);
> +        panic("PSCI cpu off failed for CPU%d err=%d\n", get_processor_id(),

Please use smp_processor_id() here.

> +                errno);

The indentation looks wrong.

With that:

Acked-by: Julien Grall <julien.grall@arm.com>

Cheers,


> +    }
> +}
> +
>   void call_psci_system_off(void)
>   {
>       if ( psci_ver > PSCI_VERSION(0, 1) )
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index b2116f0d2d..8b1e274bf3 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -395,6 +395,8 @@ void stop_cpu(void)
>       /* Make sure the write happens before we sleep forever */
>       dsb(sy);
>       isb();
> +    call_psci_cpu_off();
> +
>       while ( 1 )
>           wfi();
>   }
> diff --git a/xen/include/asm-arm/psci.h b/xen/include/asm-arm/psci.h
> index 9ac820e94a..832f77afff 100644
> --- a/xen/include/asm-arm/psci.h
> +++ b/xen/include/asm-arm/psci.h
> @@ -20,6 +20,7 @@ extern uint32_t psci_ver;
>   
>   int psci_init(void);
>   int call_psci_cpu_on(int cpu);
> +void call_psci_cpu_off(void);
>   void call_psci_system_off(void);
>   void call_psci_system_reset(void);
>   
> 

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 05/10] xen/arm: Setup virtual paging for non-boot CPUs on hotplug/resume
  2018-04-27 17:12 ` [PATCH v3 05/10] xen/arm: Setup virtual paging for non-boot CPUs on hotplug/resume Mirela Simonovic
@ 2018-04-30 14:47   ` Julien Grall
  2018-05-07 14:55     ` Mirela Simonovic
  0 siblings, 1 reply; 53+ messages in thread
From: Julien Grall @ 2018-04-30 14:47 UTC (permalink / raw)
  To: Mirela Simonovic, xen-devel; +Cc: edgar.iglesias, sstabellini, dm

Hi Mirela,

On 27/04/18 18:12, Mirela Simonovic wrote:
> In existing code the virtual paging for non-boot CPUs is setup only on boot.
> The setup is triggered from start_xen() after all CPUs are brought online.
> In other words, the initialization of VTCR_EL2 register is done out of the
> cpu_up/start_secondary() control flow. However, the cpu_up flow is also used
> to hotplug non-boot CPUs on resume from suspend to RAM state, in which case
> the virtual paging will not be configured.
> 
> With this patch the setting of paging is triggered from start_secondary()
> function using cpu starting notifier (notify_cpu_starting() call). The
> notifier is registered in p2m.c using init call. This has to be done with
> init call rather than presmp_init because the registered callback depends
> on vtcr configuration value which is setup after the presmp init calls
> are executed (do_presmp_initcalls() called from start_xen()). Init calls
> are executed after initial virtual paging is set up for all CPUs on boot.
> This ensures that no callback can fire until the vtcr value is calculated
> by Xen and virtual paging is set up initially for all CPUs. Also, this way
> the virtual paging setup in boot scenario remains unchanged.
> 
> It is assumed here that after the system completed the boot, CPUs that
> execute start_secondary() were booted as well when the Xen itself was
> booted. According to this assumption non-boot CPUs will always be compliant
> with the VTCR_EL2 value that was selected by Xen on boot.
> Currently, there is no mechanism to trigger hotplugging of a CPU. This
> will be added with the suspend to RAM support for ARM, where the hotplug
> of non-boot CPUs will be triggered via enable_nonboot_cpus() call.
> 
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> 
> ---
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien.grall@arm.com>
> ---
> Changes in v2:
> -Fix commit message
> -Save configured VTCR_EL2 value into static variable that will be used
>   by non-boot CPUs on hotplug
> -Add setup_virt_paging_secondary() and invoke it from start_secondary()
>   if that CPU has to setup virtual paging (if the system state is not boot)
> 
> Changes in v3:
> -Fix commit message
> -Remove setup_virt_paging_secondary() and use notifier to setup virtual
>   paging for non-boot CPU on hotplug.
> -In setup_virt_paging() use vtcr static variable instead of local val
> -In setup_virt_paging_one() use vtcr static variable instead of provided
>   argument
> ---
>   xen/arch/arm/p2m.c | 82 +++++++++++++++++++++++++++++++++++++++++-------------
>   1 file changed, 62 insertions(+), 20 deletions(-)
> 
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index d43c3aa896..98a1fe6de9 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -8,6 +8,8 @@
>   #include <xen/iocap.h>
>   #include <xen/mem_access.h>
>   #include <xen/xmalloc.h>
> +#include <xen/notifier.h>
> +#include <xen/cpu.h>

Please add them alphabetically.

>   #include <public/vm_event.h>
>   #include <asm/flushtlb.h>
>   #include <asm/event.h>
> @@ -1451,24 +1453,17 @@ err:
>       return page;
>   }
>   
> -static void __init setup_virt_paging_one(void *data)
> +/* VTCR value to be configured by all CPUs. Set only once by the boot CPU */
> +static uint64_t __read_mostly vtcr;
> +
> +static void setup_virt_paging_one(void *data)
>   {
> -    unsigned long val = (unsigned long)data;
> -    WRITE_SYSREG32(val, VTCR_EL2);
> +    WRITE_SYSREG32(vtcr, VTCR_EL2);
>       isb();
>   }
>   
>   void __init setup_virt_paging(void)
>   {
> -    /* Setup Stage 2 address translation */
> -    unsigned long val = VTCR_RES1|VTCR_SH0_IS|VTCR_ORGN0_WBWA|VTCR_IRGN0_WBWA;
> -
> -#ifdef CONFIG_ARM_32
> -    printk("P2M: 40-bit IPA\n");
> -    p2m_ipa_bits = 40;
> -    val |= VTCR_T0SZ(0x18); /* 40 bit IPA */
> -    val |= VTCR_SL0(0x1); /* P2M starts at first level */
> -#else /* CONFIG_ARM_64 */
>       const struct {
>           unsigned int pabits; /* Physical Address Size */
>           unsigned int t0sz;   /* Desired T0SZ, minimum in comment */
> @@ -1491,6 +1486,16 @@ void __init setup_virt_paging(void)
>       unsigned int pa_range = 0x10; /* Larger than any possible value */
>       bool vmid_8_bit = false;

That's not going to build on arm32 because vmid_8_bit & co are not used. 
While I will not ask you to test the changes on 32-bit board, I would at 
least want each change to be build test it on impacted architecture.

In that particular case, you can just move the initialization of vtcr at 
after the endif because there is nothing that required that to be setup 
very early.

>   
> +    /* Setup Stage 2 address translation */
> +    vtcr = VTCR_RES1|VTCR_SH0_IS|VTCR_ORGN0_WBWA|VTCR_IRGN0_WBWA;
> +
> +#ifdef CONFIG_ARM_32
> +    printk("P2M: 40-bit IPA\n");
> +    p2m_ipa_bits = 40;
> +    vtcr |= VTCR_T0SZ(0x18); /* 40 bit IPA */
> +    vtcr |= VTCR_SL0(0x1); /* P2M starts at first level */
> +#else /* CONFIG_ARM_64 */
> +
>       for_each_online_cpu ( cpu )
>       {
>           const struct cpuinfo_arm *info = &cpu_data[cpu];
> @@ -1513,14 +1518,14 @@ void __init setup_virt_paging(void)
>       if ( pa_range >= ARRAY_SIZE(pa_range_info) || !pa_range_info[pa_range].pabits )
>           panic("Unknown encoding of ID_AA64MMFR0_EL1.PARange %x\n", pa_range);
>   
> -    val |= VTCR_PS(pa_range);
> -    val |= VTCR_TG0_4K;
> +    vtcr |= VTCR_PS(pa_range);
> +    vtcr |= VTCR_TG0_4K;
>   
>       /* Set the VS bit only if 16 bit VMID is supported. */
>       if ( MAX_VMID == MAX_VMID_16_BIT )
> -        val |= VTCR_VS;
> -    val |= VTCR_SL0(pa_range_info[pa_range].sl0);
> -    val |= VTCR_T0SZ(pa_range_info[pa_range].t0sz);
> +        vtcr |= VTCR_VS;
> +    vtcr |= VTCR_SL0(pa_range_info[pa_range].sl0);
> +    vtcr |= VTCR_T0SZ(pa_range_info[pa_range].t0sz);
>   
>       p2m_root_order = pa_range_info[pa_range].root_order;
>       p2m_root_level = 2 - pa_range_info[pa_range].sl0;
> @@ -1532,16 +1537,53 @@ void __init setup_virt_paging(void)
>              ( MAX_VMID == MAX_VMID_16_BIT ) ? 16 : 8);
>   #endif
>       printk("P2M: %d levels with order-%d root, VTCR 0x%lx\n",
> -           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, val);
> +           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, vtcr);
>   
>       p2m_vmid_allocator_init();
>   
>       /* It is not allowed to concatenate a level zero root */
>       BUG_ON( P2M_ROOT_LEVEL == 0 && P2M_ROOT_ORDER > 0 );
> -    setup_virt_paging_one((void *)val);
> -    smp_call_function(setup_virt_paging_one, (void *)val, 1);
> +    setup_virt_paging_one(NULL);
> +    smp_call_function(setup_virt_paging_one, NULL, 1);
> +}
> +
> +static int cpu_virt_paging_callback(
> +    struct notifier_block *nfb, unsigned long action, void *hcpu)

The indentation looks wrong.

> +{
> +    switch ( action )
> +    {
> +    case CPU_STARTING:
> +        ASSERT(system_state != SYS_STATE_boot);

I was about to complain about this but then saw you add the notifiers 
after. That's quite clever and the comment is really helpful :).

> +        setup_virt_paging_one(NULL);
> +        break;
> +    default:
> +        break;
> +    }
> +
> +    return NOTIFY_DONE;
>   }
>   
> +static struct notifier_block cpu_virt_paging_nfb = {
> +    .notifier_call = cpu_virt_paging_callback,
> +    .priority = 100 /* highest priority */
> +};
> +
> +static int __init cpu_virt_paging_init(void)
> +{
> +    register_cpu_notifier(&cpu_virt_paging_nfb);
> +    return 0;
> +}

NIT: Missing newline.

> +/*
> + * Initialization of the notifier has to be done at init rather than presmp_init
> + * phase because: the registered notifier is used to setup virtual paging for
> + * non-boot CPUs after the initial virtual paging for all CPUs is already setup,
> + * i.e. when a non-boot CPU is hotplugged after the system has booted. In other
> + * words, the notifier should be registered after the virtual paging is
> + * initially setup (setup_virt_paging() is called from start_xen()). This is
> + * required because vtcr config value has to be set before a notifier can fire.
> + */
> +__initcall(cpu_virt_paging_init);
> +
>   /*
>    * Local variables:
>    * mode: C
> 

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 07/10] xen/arm: Release maintenance interrupt when CPU is hot-unplugged
  2018-04-27 17:12 ` [PATCH v3 07/10] xen/arm: Release maintenance interrupt when CPU is hot-unplugged Mirela Simonovic
@ 2018-04-30 14:51   ` Julien Grall
  0 siblings, 0 replies; 53+ messages in thread
From: Julien Grall @ 2018-04-30 14:51 UTC (permalink / raw)
  To: Mirela Simonovic, xen-devel; +Cc: edgar.iglesias, sstabellini, dm

Hi Mirela,

On 27/04/18 18:12, Mirela Simonovic wrote:
> When a CPU is hot-unplugged the maintenance interrupt has to be
> released in order to free the memory that was allocated when the CPU
> was hotplugged and interrupt requested. The interrupt was requested
> using request_irq() which is called from start_secondary->
> init_maintenance_interrupt. With this patch the interrupt will be
> released when the CPU_DYING event is received by the callback which
> is added in gic.c.
> 
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> 
> ---
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien.grall@arm.com>
> ---
> Changes in v3:
> -Add notifier in order to trigger releasing of the  maintenance
>   interrupt when the CPU is dying.
> ---
>   xen/arch/arm/gic.c | 29 +++++++++++++++++++++++++++++
>   1 file changed, 29 insertions(+)
> 
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 653a815127..89abc49950 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -27,6 +27,8 @@
>   #include <xen/list.h>
>   #include <xen/device_tree.h>
>   #include <xen/acpi.h>
> +#include <xen/notifier.h>
> +#include <xen/cpu.h>
>   #include <asm/p2m.h>
>   #include <asm/domain.h>
>   #include <asm/platform.h>
> @@ -462,6 +464,33 @@ int gic_iomem_deny_access(const struct domain *d)
>       return gic_hw_ops->iomem_deny_access(d);
>   }
>   
> +static int cpu_gic_callback(
> +    struct notifier_block *nfb, unsigned long action, void *hcpu)

Please fix the indentation.

With that:

Acked-by: Julien Grall <julien.grall@arm.com>

> +{
> +    switch ( action )
> +    {
> +    case CPU_DYING:
> +        /* This is reverting the work done in init_maintenance_interrupt */

In the future we probably want to move init_maintenance_interrupt in the 
notifier. But that's a clean-up after this has been merged :).

> +        release_irq(gic_hw_ops->info->maintenance_irq, NULL);
> +        break;
> +    default:
> +        break;
> +    }
> +
> +    return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block cpu_gic_nfb = {
> +    .notifier_call = cpu_gic_callback,
> +};
> +
> +static int __init cpu_gic_notifier_init(void)
> +{
> +    register_cpu_notifier(&cpu_gic_nfb);
> +    return 0;
> +}
> +__initcall(cpu_gic_notifier_init);
> +
>   /*
>    * Local variables:
>    * mode: C
> 

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 09/10] xen/arm: Free memory allocated for sibling/core maps on CPU hot-unplug
  2018-04-27 17:12 ` [PATCH v3 09/10] xen/arm: Free memory allocated for sibling/core maps on CPU hot-unplug Mirela Simonovic
@ 2018-04-30 15:55   ` Julien Grall
  0 siblings, 0 replies; 53+ messages in thread
From: Julien Grall @ 2018-04-30 15:55 UTC (permalink / raw)
  To: Mirela Simonovic, xen-devel; +Cc: edgar.iglesias, sstabellini, dm

Hi Mirela,

On 27/04/18 18:12, Mirela Simonovic wrote:
> The memory allocated in setup_cpu_sibling_map() when a CPU is hotplugged
> has to be freed when the CPU is hot-unplugged. This is done in
> remove_cpu_sibling_map() and called when the CPU dies. The call to
> remove_cpu_sibling_map() is made from a notifier callback when
> CPU_DEAD event is received.
> 
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> 
> ---
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien.grall@arm.com>
> ---
> Changes in v3:
> -Use notifier to trigger remove_cpu_sibling_map() when the CPU dies.
> ---
>   xen/arch/arm/smpboot.c | 34 ++++++++++++++++++++++++++++++++++
>   1 file changed, 34 insertions(+)
> 
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index ad1f6b751b..b833e3a754 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -89,6 +89,12 @@ static void setup_cpu_sibling_map(int cpu)
>       cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu));
>   }
>   
> +static void remove_cpu_sibling_map(int cpu)
> +{
> +    free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
> +    free_cpumask_var(per_cpu(cpu_core_mask, cpu));
> +}
> +
>   void __init
>   smp_clear_cpu_maps (void)
>   {
> @@ -499,6 +505,34 @@ void __cpu_die(unsigned int cpu)
>       smp_mb();
>   }
>   
> +static int cpu_smpboot_callback(
> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
> +{
> +    unsigned int cpu = (unsigned long)hcpu;
> +
> +    switch ( action )
> +    {
> +    case CPU_DEAD:
> +        remove_cpu_sibling_map(cpu);
> +        break;
> +    default:
> +        break;
> +    }
> +
> +    return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block cpu_smpboot_nfb = {
> +    .notifier_call = cpu_smpboot_callback,
> +};
> +
> +static int __init cpu_smpboot_notifier_init(void)
> +{
> +    register_cpu_notifier(&cpu_smpboot_nfb);
> +    return 0;
> +}
> +__initcall(cpu_smpboot_notifier_init);

I think this notifiers should go in preinit smp to cover the case where 
a secondary CPU dies beforehand the initcall.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 08/10] xen/arm: Release timer interrupts when CPU is hot-unplugged
  2018-04-27 17:12 ` [PATCH v3 08/10] xen/arm: Release timer interrupts " Mirela Simonovic
@ 2018-04-30 15:58   ` Julien Grall
  2018-05-07 15:33     ` Mirela Simonovic
  0 siblings, 1 reply; 53+ messages in thread
From: Julien Grall @ 2018-04-30 15:58 UTC (permalink / raw)
  To: Mirela Simonovic, xen-devel; +Cc: edgar.iglesias, sstabellini, dm

Hi,

On 27/04/18 18:12, Mirela Simonovic wrote:
> When a CPU is hot-unplugged timer interrupts have to be released
> in order to free the memory that was allocated when the interrupts
> were requested (using request_irq()). The request_irq is called
> for each timer interrupt when the CPU gets hotplugged
> (start_secondary->init_timer_interrupt->request_irq).
> With this patch the timer interrupts will be released when the
> newly added callback receives CPU_DYING event.
> 
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> 
> ---
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien.grall@arm.com>
> ---
> Changes in v3:
> -Trigger releasing of timer interrupts using notifiers
> ---
>   xen/arch/arm/time.c | 39 +++++++++++++++++++++++++++++++++++++++
>   1 file changed, 39 insertions(+)
> 
> diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> index c11fcfeadd..c7317e4639 100644
> --- a/xen/arch/arm/time.c
> +++ b/xen/arch/arm/time.c
> @@ -29,6 +29,8 @@
>   #include <xen/sched.h>
>   #include <xen/event.h>
>   #include <xen/acpi.h>
> +#include <xen/notifier.h>
> +#include <xen/cpu.h>
>   #include <asm/system.h>
>   #include <asm/time.h>
>   #include <asm/vgic.h>
> @@ -312,6 +314,17 @@ void init_timer_interrupt(void)
>       check_timer_irq_cfg(timer_irq[TIMER_PHYS_NONSECURE_PPI], "NS-physical");
>   }
>   
> +/*
> + * Revert actions done in init_timer_interrupt that are required to properly
> + * disable this CPU.
> + */
> +static void deinit_timer_interrupt(void)
> +{

Any reason to not disable the timer here? But I think we need to finish 
the discussion on the previous series regarding the purpose of the 
mdelay before going further with that patch. See patch v2 7/10. I would 
also appreciate answer to my question there.

> +    release_irq(timer_irq[TIMER_HYP_PPI], NULL);
> +    release_irq(timer_irq[TIMER_VIRT_PPI], NULL);
> +    release_irq(timer_irq[TIMER_PHYS_NONSECURE_PPI], NULL);
> +}
> +
>   /* Wait a set number of microseconds */
>   void udelay(unsigned long usecs)
>   {
> @@ -340,6 +353,32 @@ void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
>       /* XXX update guest visible wallclock time */
>   }
>   
> +static int cpu_time_callback(
> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
> +{
> +    switch ( action )
> +    {
> +    case CPU_DYING:
> +        deinit_timer_interrupt();
> +        break;
> +    default:
> +        break;
> +    }
> +
> +    return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block cpu_time_nfb = {
> +    .notifier_call = cpu_time_callback,
> +};
> +
> +static int __init cpu_time_notifier_init(void)
> +{
> +    register_cpu_notifier(&cpu_time_nfb);
> +    return 0;
> +}
> +__initcall(cpu_time_notifier_init);
> +
>   /*
>    * Local variables:
>    * mode: C
> 

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-04-27 17:12 ` [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot Mirela Simonovic
@ 2018-04-30 16:09   ` Julien Grall
  2018-05-09 15:48     ` Mirela Simonovic
  0 siblings, 1 reply; 53+ messages in thread
From: Julien Grall @ 2018-04-30 16:09 UTC (permalink / raw)
  To: Mirela Simonovic, xen-devel; +Cc: edgar.iglesias, sstabellini, dm

Hi Mirela,

On 27/04/18 18:12, Mirela Simonovic wrote:
> On boot, enabling errata workarounds will be triggered by the boot CPU
> from start_xen(). On CPU hotplug (non-boot scenario) this would not be
> done. This patch adds the code required to enable errata workarounds
> for a CPU being hotplugged after the system boots. This is triggered
> using a notifier. If the CPU fails to enable the errata Xen will panic.
> This is done because it is assumed that the CPU which is hotplugged
> after the system/Xen boots, was initially hotplugged during the
> system/Xen boot. Therefore, enabling errata workarounds should never
> fail.
> 
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> 
> ---
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien.grall@arm.com>
> ---
>   xen/arch/arm/cpuerrata.c         | 35 +++++++++++++++++++++++++++++++++++
>   xen/arch/arm/cpufeature.c        | 23 +++++++++++++++++++++++
>   xen/include/asm-arm/cpufeature.h |  1 +
>   3 files changed, 59 insertions(+)
> 
> diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
> index 1baa20654b..4040f781ec 100644
> --- a/xen/arch/arm/cpuerrata.c
> +++ b/xen/arch/arm/cpuerrata.c
> @@ -5,6 +5,8 @@
>   #include <xen/spinlock.h>
>   #include <xen/vmap.h>
>   #include <xen/warning.h>
> +#include <xen/notifier.h>
> +#include <xen/cpu.h>
>   #include <asm/cpufeature.h>
>   #include <asm/cpuerrata.h>
>   #include <asm/psci.h>
> @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void)
>       enable_cpu_capabilities(arm_errata);
>   }
>   
> +static int cpu_errata_callback(
> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
> +{
> +    switch ( action )
> +    {
> +    case CPU_STARTING:
> +        enable_nonboot_cpu_caps(arm_errata);
> +        break;
> +    default:
> +        break;
> +    }
> +
> +    return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block cpu_errata_nfb = {
> +    .notifier_call = cpu_errata_callback,
> +};
> +
> +static int __init cpu_errata_notifier_init(void)
> +{
> +    register_cpu_notifier(&cpu_errata_nfb);
> +    return 0;
> +}
> +/*
> + * Initialization has to be done at init rather than presmp_init phase because
> + * the callback should execute only after the secondary CPUs are initially
> + * booted (in hotplug scenarios when the system state is not boot). On boot,
> + * the enabling of errata workarounds will be triggered by the boot CPU from
> + * start_xen().
> + */
> +__initcall(cpu_errata_notifier_init);
> +
>   /*
>    * Local variables:
>    * mode: C
> diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
> index 525b45e22f..dd30f0d29c 100644
> --- a/xen/arch/arm/cpufeature.c
> +++ b/xen/arch/arm/cpufeature.c
> @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct arm_cpu_capabilities *caps)
>       }
>   }
>   
> +/* Run through the enabled capabilities and enable() them on the calling CPU */
> +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
> +{
> +    ASSERT(system_state != SYS_STATE_boot);
> +
> +    for ( ; caps->matches; caps++ )
> +    {
> +        if ( !cpus_have_cap(caps->capability) )
> +            continue;
> +
> +        if ( caps->enable )
> +        {
> +            /*
> +             * Since the CPU has enabled errata workarounds on boot, it should

This function is not really about errata, it is about capabilities. 
Errata is just a sub-category of them.

> +             * never fail to enable them here.
> +             */
> +            if ( caps->enable((void *)caps) )
> +                panic("CPU%u failed to enable capability %u\n",
> +                      smp_processor_id(), caps->capability);

We should really avoid to use panic(...) if this is something the system 
can survive. In that specific case, it would only affect the current 
CPU. So it would be better to return an error and let the caller decide 
what to do.

Cheers,

> +        }
> +    }
> +}
> +
>   /*
>    * Local variables:
>    * mode: C
> diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
> index e557a095af..b14e226401 100644
> --- a/xen/include/asm-arm/cpufeature.h
> +++ b/xen/include/asm-arm/cpufeature.h
> @@ -88,6 +88,7 @@ void update_cpu_capabilities(const struct arm_cpu_capabilities *caps,
>                                const char *info);
>   
>   void enable_cpu_capabilities(const struct arm_cpu_capabilities *caps);
> +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps);
>   
>   #endif /* __ASSEMBLY__ */
>   
> 

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 05/10] xen/arm: Setup virtual paging for non-boot CPUs on hotplug/resume
  2018-04-30 14:47   ` Julien Grall
@ 2018-05-07 14:55     ` Mirela Simonovic
  2018-05-08 14:14       ` Julien Grall
  0 siblings, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-05-07 14:55 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel

Hi Julien,


On Mon, Apr 30, 2018 at 4:47 PM, Julien Grall <julien.grall@arm.com> wrote:
> Hi Mirela,
>
>
> On 27/04/18 18:12, Mirela Simonovic wrote:
>>
>> In existing code the virtual paging for non-boot CPUs is setup only on
>> boot.
>> The setup is triggered from start_xen() after all CPUs are brought online.
>> In other words, the initialization of VTCR_EL2 register is done out of the
>> cpu_up/start_secondary() control flow. However, the cpu_up flow is also
>> used
>> to hotplug non-boot CPUs on resume from suspend to RAM state, in which
>> case
>> the virtual paging will not be configured.
>>
>> With this patch the setting of paging is triggered from start_secondary()
>> function using cpu starting notifier (notify_cpu_starting() call). The
>> notifier is registered in p2m.c using init call. This has to be done with
>> init call rather than presmp_init because the registered callback depends
>> on vtcr configuration value which is setup after the presmp init calls
>> are executed (do_presmp_initcalls() called from start_xen()). Init calls
>> are executed after initial virtual paging is set up for all CPUs on boot.
>> This ensures that no callback can fire until the vtcr value is calculated
>> by Xen and virtual paging is set up initially for all CPUs. Also, this way
>> the virtual paging setup in boot scenario remains unchanged.
>>
>> It is assumed here that after the system completed the boot, CPUs that
>> execute start_secondary() were booted as well when the Xen itself was
>> booted. According to this assumption non-boot CPUs will always be
>> compliant
>> with the VTCR_EL2 value that was selected by Xen on boot.
>> Currently, there is no mechanism to trigger hotplugging of a CPU. This
>> will be added with the suspend to RAM support for ARM, where the hotplug
>> of non-boot CPUs will be triggered via enable_nonboot_cpus() call.
>>
>> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
>>
>> ---
>> CC: Stefano Stabellini <sstabellini@kernel.org>
>> CC: Julien Grall <julien.grall@arm.com>
>> ---
>> Changes in v2:
>> -Fix commit message
>> -Save configured VTCR_EL2 value into static variable that will be used
>>   by non-boot CPUs on hotplug
>> -Add setup_virt_paging_secondary() and invoke it from start_secondary()
>>   if that CPU has to setup virtual paging (if the system state is not
>> boot)
>>
>> Changes in v3:
>> -Fix commit message
>> -Remove setup_virt_paging_secondary() and use notifier to setup virtual
>>   paging for non-boot CPU on hotplug.
>> -In setup_virt_paging() use vtcr static variable instead of local val
>> -In setup_virt_paging_one() use vtcr static variable instead of provided
>>   argument
>> ---
>>   xen/arch/arm/p2m.c | 82
>> +++++++++++++++++++++++++++++++++++++++++-------------
>>   1 file changed, 62 insertions(+), 20 deletions(-)
>>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index d43c3aa896..98a1fe6de9 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -8,6 +8,8 @@
>>   #include <xen/iocap.h>
>>   #include <xen/mem_access.h>
>>   #include <xen/xmalloc.h>
>> +#include <xen/notifier.h>
>> +#include <xen/cpu.h>
>
>
> Please add them alphabetically.
>
>
>>   #include <public/vm_event.h>
>>   #include <asm/flushtlb.h>
>>   #include <asm/event.h>
>> @@ -1451,24 +1453,17 @@ err:
>>       return page;
>>   }
>>   -static void __init setup_virt_paging_one(void *data)
>> +/* VTCR value to be configured by all CPUs. Set only once by the boot CPU
>> */
>> +static uint64_t __read_mostly vtcr;
>> +
>> +static void setup_virt_paging_one(void *data)
>>   {
>> -    unsigned long val = (unsigned long)data;
>> -    WRITE_SYSREG32(val, VTCR_EL2);
>> +    WRITE_SYSREG32(vtcr, VTCR_EL2);
>>       isb();
>>   }
>>     void __init setup_virt_paging(void)
>>   {
>> -    /* Setup Stage 2 address translation */
>> -    unsigned long val =
>> VTCR_RES1|VTCR_SH0_IS|VTCR_ORGN0_WBWA|VTCR_IRGN0_WBWA;
>> -
>> -#ifdef CONFIG_ARM_32
>> -    printk("P2M: 40-bit IPA\n");
>> -    p2m_ipa_bits = 40;
>> -    val |= VTCR_T0SZ(0x18); /* 40 bit IPA */
>> -    val |= VTCR_SL0(0x1); /* P2M starts at first level */
>> -#else /* CONFIG_ARM_64 */
>>       const struct {
>>           unsigned int pabits; /* Physical Address Size */
>>           unsigned int t0sz;   /* Desired T0SZ, minimum in comment */
>> @@ -1491,6 +1486,16 @@ void __init setup_virt_paging(void)
>>       unsigned int pa_range = 0x10; /* Larger than any possible value */
>>       bool vmid_8_bit = false;
>
>
> That's not going to build on arm32 because vmid_8_bit & co are not used.
> While I will not ask you to test the changes on 32-bit board, I would at
> least want each change to be build test it on impacted architecture.
>
> In that particular case, you can just move the initialization of vtcr at
> after the endif because there is nothing that required that to be setup very
> early.
>
>
>>   +    /* Setup Stage 2 address translation */
>> +    vtcr = VTCR_RES1|VTCR_SH0_IS|VTCR_ORGN0_WBWA|VTCR_IRGN0_WBWA;
>> +
>> +#ifdef CONFIG_ARM_32
>> +    printk("P2M: 40-bit IPA\n");
>> +    p2m_ipa_bits = 40;
>> +    vtcr |= VTCR_T0SZ(0x18); /* 40 bit IPA */
>> +    vtcr |= VTCR_SL0(0x1); /* P2M starts at first level */
>> +#else /* CONFIG_ARM_64 */
>> +
>>       for_each_online_cpu ( cpu )
>>       {
>>           const struct cpuinfo_arm *info = &cpu_data[cpu];
>> @@ -1513,14 +1518,14 @@ void __init setup_virt_paging(void)
>>       if ( pa_range >= ARRAY_SIZE(pa_range_info) ||
>> !pa_range_info[pa_range].pabits )
>>           panic("Unknown encoding of ID_AA64MMFR0_EL1.PARange %x\n",
>> pa_range);
>>   -    val |= VTCR_PS(pa_range);
>> -    val |= VTCR_TG0_4K;
>> +    vtcr |= VTCR_PS(pa_range);
>> +    vtcr |= VTCR_TG0_4K;
>>         /* Set the VS bit only if 16 bit VMID is supported. */
>>       if ( MAX_VMID == MAX_VMID_16_BIT )
>> -        val |= VTCR_VS;
>> -    val |= VTCR_SL0(pa_range_info[pa_range].sl0);
>> -    val |= VTCR_T0SZ(pa_range_info[pa_range].t0sz);
>> +        vtcr |= VTCR_VS;
>> +    vtcr |= VTCR_SL0(pa_range_info[pa_range].sl0);
>> +    vtcr |= VTCR_T0SZ(pa_range_info[pa_range].t0sz);
>>         p2m_root_order = pa_range_info[pa_range].root_order;
>>       p2m_root_level = 2 - pa_range_info[pa_range].sl0;
>> @@ -1532,16 +1537,53 @@ void __init setup_virt_paging(void)
>>              ( MAX_VMID == MAX_VMID_16_BIT ) ? 16 : 8);
>>   #endif
>>       printk("P2M: %d levels with order-%d root, VTCR 0x%lx\n",
>> -           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, val);
>> +           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, vtcr);
>>         p2m_vmid_allocator_init();
>>         /* It is not allowed to concatenate a level zero root */
>>       BUG_ON( P2M_ROOT_LEVEL == 0 && P2M_ROOT_ORDER > 0 );
>> -    setup_virt_paging_one((void *)val);
>> -    smp_call_function(setup_virt_paging_one, (void *)val, 1);
>> +    setup_virt_paging_one(NULL);
>> +    smp_call_function(setup_virt_paging_one, NULL, 1);
>> +}
>> +
>> +static int cpu_virt_paging_callback(
>> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
>
>
> The indentation looks wrong.
>

Editor indented this for me and it looks the same as in other places
where a notifier is defined. I did
grep -r "struct notifier_block \*nfb,"
to check. It looks weird but seems correct

>> +{
>> +    switch ( action )
>> +    {
>> +    case CPU_STARTING:
>> +        ASSERT(system_state != SYS_STATE_boot);
>
>
> I was about to complain about this but then saw you add the notifiers after.
> That's quite clever and the comment is really helpful :).
>
>> +        setup_virt_paging_one(NULL);
>> +        break;
>> +    default:
>> +        break;
>> +    }
>> +
>> +    return NOTIFY_DONE;
>>   }
>>   +static struct notifier_block cpu_virt_paging_nfb = {
>> +    .notifier_call = cpu_virt_paging_callback,
>> +    .priority = 100 /* highest priority */
>> +};
>> +
>> +static int __init cpu_virt_paging_init(void)
>> +{
>> +    register_cpu_notifier(&cpu_virt_paging_nfb);
>> +    return 0;
>> +}
>
>
> NIT: Missing newline.
>
>> +/*
>> + * Initialization of the notifier has to be done at init rather than
>> presmp_init
>> + * phase because: the registered notifier is used to setup virtual paging
>> for
>> + * non-boot CPUs after the initial virtual paging for all CPUs is already
>> setup,
>> + * i.e. when a non-boot CPU is hotplugged after the system has booted. In
>> other
>> + * words, the notifier should be registered after the virtual paging is
>> + * initially setup (setup_virt_paging() is called from start_xen()). This
>> is
>> + * required because vtcr config value has to be set before a notifier can
>> fire.
>> + */
>> +__initcall(cpu_virt_paging_init);
>> +
>>   /*
>>    * Local variables:
>>    * mode: C
>>
>
> Cheers,
>
> --
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 08/10] xen/arm: Release timer interrupts when CPU is hot-unplugged
  2018-04-30 15:58   ` Julien Grall
@ 2018-05-07 15:33     ` Mirela Simonovic
  0 siblings, 0 replies; 53+ messages in thread
From: Mirela Simonovic @ 2018-05-07 15:33 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel

Hi Julien,

On Mon, Apr 30, 2018 at 5:58 PM, Julien Grall <julien.grall@arm.com> wrote:
> Hi,
>
>
> On 27/04/18 18:12, Mirela Simonovic wrote:
>>
>> When a CPU is hot-unplugged timer interrupts have to be released
>> in order to free the memory that was allocated when the interrupts
>> were requested (using request_irq()). The request_irq is called
>> for each timer interrupt when the CPU gets hotplugged
>> (start_secondary->init_timer_interrupt->request_irq).
>> With this patch the timer interrupts will be released when the
>> newly added callback receives CPU_DYING event.
>>
>> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
>>
>> ---
>> CC: Stefano Stabellini <sstabellini@kernel.org>
>> CC: Julien Grall <julien.grall@arm.com>
>> ---
>> Changes in v3:
>> -Trigger releasing of timer interrupts using notifiers
>> ---
>>   xen/arch/arm/time.c | 39 +++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 39 insertions(+)
>>
>> diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
>> index c11fcfeadd..c7317e4639 100644
>> --- a/xen/arch/arm/time.c
>> +++ b/xen/arch/arm/time.c
>> @@ -29,6 +29,8 @@
>>   #include <xen/sched.h>
>>   #include <xen/event.h>
>>   #include <xen/acpi.h>
>> +#include <xen/notifier.h>
>> +#include <xen/cpu.h>
>>   #include <asm/system.h>
>>   #include <asm/time.h>
>>   #include <asm/vgic.h>
>> @@ -312,6 +314,17 @@ void init_timer_interrupt(void)
>>       check_timer_irq_cfg(timer_irq[TIMER_PHYS_NONSECURE_PPI],
>> "NS-physical");
>>   }
>>   +/*
>> + * Revert actions done in init_timer_interrupt that are required to
>> properly
>> + * disable this CPU.
>> + */
>> +static void deinit_timer_interrupt(void)
>> +{
>
>
> Any reason to not disable the timer here? But I think we need to finish the
> discussion on the previous series regarding the purpose of the mdelay before
> going further with that patch. See patch v2 7/10. I would also appreciate
> answer to my question there.
>

I just missed it. Will add disabling timers.

>
>> +    release_irq(timer_irq[TIMER_HYP_PPI], NULL);
>> +    release_irq(timer_irq[TIMER_VIRT_PPI], NULL);
>> +    release_irq(timer_irq[TIMER_PHYS_NONSECURE_PPI], NULL);
>> +}
>> +
>>   /* Wait a set number of microseconds */
>>   void udelay(unsigned long usecs)
>>   {
>> @@ -340,6 +353,32 @@ void domain_set_time_offset(struct domain *d, int64_t
>> time_offset_seconds)
>>       /* XXX update guest visible wallclock time */
>>   }
>>   +static int cpu_time_callback(
>> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
>> +{
>> +    switch ( action )
>> +    {
>> +    case CPU_DYING:
>> +        deinit_timer_interrupt();
>> +        break;
>> +    default:
>> +        break;
>> +    }
>> +
>> +    return NOTIFY_DONE;
>> +}
>> +
>> +static struct notifier_block cpu_time_nfb = {
>> +    .notifier_call = cpu_time_callback,
>> +};
>> +
>> +static int __init cpu_time_notifier_init(void)
>> +{
>> +    register_cpu_notifier(&cpu_time_nfb);
>> +    return 0;
>> +}
>> +__initcall(cpu_time_notifier_init);
>> +
>>   /*
>>    * Local variables:
>>    * mode: C
>>
>
> Cheers,
>
> --
> Julien Grall


Thanks,
Mirela

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 05/10] xen/arm: Setup virtual paging for non-boot CPUs on hotplug/resume
  2018-05-07 14:55     ` Mirela Simonovic
@ 2018-05-08 14:14       ` Julien Grall
  2018-05-08 14:28         ` Mirela Simonovic
  0 siblings, 1 reply; 53+ messages in thread
From: Julien Grall @ 2018-05-08 14:14 UTC (permalink / raw)
  To: Mirela Simonovic
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel



On 07/05/18 15:55, Mirela Simonovic wrote:
> Hi Julien,

Hi Mirela,

> On Mon, Apr 30, 2018 at 4:47 PM, Julien Grall <julien.grall@arm.com> wrote:
>> On 27/04/18 18:12, Mirela Simonovic wrote:
>>>        printk("P2M: %d levels with order-%d root, VTCR 0x%lx\n",
>>> -           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, val);
>>> +           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, vtcr);
>>>          p2m_vmid_allocator_init();
>>>          /* It is not allowed to concatenate a level zero root */
>>>        BUG_ON( P2M_ROOT_LEVEL == 0 && P2M_ROOT_ORDER > 0 );
>>> -    setup_virt_paging_one((void *)val);
>>> -    smp_call_function(setup_virt_paging_one, (void *)val, 1);
>>> +    setup_virt_paging_one(NULL);
>>> +    smp_call_function(setup_virt_paging_one, NULL, 1);
>>> +}
>>> +
>>> +static int cpu_virt_paging_callback(
>>> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
>>
>>
>> The indentation looks wrong.
>>
> 
> Editor indented this for me and it looks the same as in other places
> where a notifier is defined. I did
> grep -r "struct notifier_block \*nfb,"
> to check. It looks weird but seems correct

Indeed, I am not sure why it is done like that for notifiers. I can't 
see any reason to split like that given the first parameter can fit on 
the first line without hitting the 80 columns.

So I would much prefer if we follow Xen coding style:

static int cpu_virt_paging_callback(struct notifier_block *nfb,
				    unsigned long action,
				    void *hcpu);

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 05/10] xen/arm: Setup virtual paging for non-boot CPUs on hotplug/resume
  2018-05-08 14:14       ` Julien Grall
@ 2018-05-08 14:28         ` Mirela Simonovic
  2018-05-09 10:32           ` Julien Grall
  0 siblings, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-05-08 14:28 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel

Hi Julien,

On Tue, May 8, 2018 at 4:14 PM, Julien Grall <julien.grall@arm.com> wrote:
>
>
> On 07/05/18 15:55, Mirela Simonovic wrote:
>>
>> Hi Julien,
>
>
> Hi Mirela,
>
>> On Mon, Apr 30, 2018 at 4:47 PM, Julien Grall <julien.grall@arm.com>
>> wrote:
>>>
>>> On 27/04/18 18:12, Mirela Simonovic wrote:
>>>>
>>>>        printk("P2M: %d levels with order-%d root, VTCR 0x%lx\n",
>>>> -           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, val);
>>>> +           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, vtcr);
>>>>          p2m_vmid_allocator_init();
>>>>          /* It is not allowed to concatenate a level zero root */
>>>>        BUG_ON( P2M_ROOT_LEVEL == 0 && P2M_ROOT_ORDER > 0 );
>>>> -    setup_virt_paging_one((void *)val);
>>>> -    smp_call_function(setup_virt_paging_one, (void *)val, 1);
>>>> +    setup_virt_paging_one(NULL);
>>>> +    smp_call_function(setup_virt_paging_one, NULL, 1);
>>>> +}
>>>> +
>>>> +static int cpu_virt_paging_callback(
>>>> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
>>>
>>>
>>>
>>> The indentation looks wrong.
>>>
>>
>> Editor indented this for me and it looks the same as in other places
>> where a notifier is defined. I did
>> grep -r "struct notifier_block \*nfb,"
>> to check. It looks weird but seems correct
>
>
> Indeed, I am not sure why it is done like that for notifiers. I can't see
> any reason to split like that given the first parameter can fit on the first
> line without hitting the 80 columns.
>
> So I would much prefer if we follow Xen coding style:
>
> static int cpu_virt_paging_callback(struct notifier_block *nfb,
>                                     unsigned long action,
>                                     void *hcpu);
>

Please just one more clarification: why did you split line after 2nd
argument? 3rd argument could fit in 80 chars.
The only coding style I found is in CODING_STYLE file, which doesn't
specify that much details - I'm just trying to understand where to
find more info in order to avoid coding style-related iterations in
future. Is there any other source specifying coding style for Xen?

Thanks,
Mirela

> Cheers,
>
> --
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 05/10] xen/arm: Setup virtual paging for non-boot CPUs on hotplug/resume
  2018-05-08 14:28         ` Mirela Simonovic
@ 2018-05-09 10:32           ` Julien Grall
  0 siblings, 0 replies; 53+ messages in thread
From: Julien Grall @ 2018-05-09 10:32 UTC (permalink / raw)
  To: Mirela Simonovic
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel



On 08/05/18 15:28, Mirela Simonovic wrote:
> Hi Julien,
> 
> On Tue, May 8, 2018 at 4:14 PM, Julien Grall <julien.grall@arm.com> wrote:
>>
>>
>> On 07/05/18 15:55, Mirela Simonovic wrote:
>>>
>>> Hi Julien,
>>
>>
>> Hi Mirela,
>>
>>> On Mon, Apr 30, 2018 at 4:47 PM, Julien Grall <julien.grall@arm.com>
>>> wrote:
>>>>
>>>> On 27/04/18 18:12, Mirela Simonovic wrote:
>>>>>
>>>>>         printk("P2M: %d levels with order-%d root, VTCR 0x%lx\n",
>>>>> -           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, val);
>>>>> +           4 - P2M_ROOT_LEVEL, P2M_ROOT_ORDER, vtcr);
>>>>>           p2m_vmid_allocator_init();
>>>>>           /* It is not allowed to concatenate a level zero root */
>>>>>         BUG_ON( P2M_ROOT_LEVEL == 0 && P2M_ROOT_ORDER > 0 );
>>>>> -    setup_virt_paging_one((void *)val);
>>>>> -    smp_call_function(setup_virt_paging_one, (void *)val, 1);
>>>>> +    setup_virt_paging_one(NULL);
>>>>> +    smp_call_function(setup_virt_paging_one, NULL, 1);
>>>>> +}
>>>>> +
>>>>> +static int cpu_virt_paging_callback(
>>>>> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
>>>>
>>>>
>>>>
>>>> The indentation looks wrong.
>>>>
>>>
>>> Editor indented this for me and it looks the same as in other places
>>> where a notifier is defined. I did
>>> grep -r "struct notifier_block \*nfb,"
>>> to check. It looks weird but seems correct
>>
>>
>> Indeed, I am not sure why it is done like that for notifiers. I can't see
>> any reason to split like that given the first parameter can fit on the first
>> line without hitting the 80 columns.
>>
>> So I would much prefer if we follow Xen coding style:
>>
>> static int cpu_virt_paging_callback(struct notifier_block *nfb,
>>                                      unsigned long action,
>>                                      void *hcpu);
>>
> 
> Please just one more clarification: why did you split line after 2nd
> argument? 3rd argument could fit in 80 chars.

Both are valid. I feel having the argument separated here is easier to read.

> The only coding style I found is in CODING_STYLE file, which doesn't
> specify that much details - I'm just trying to understand where to
> find more info in order to avoid coding style-related iterations in
> future. Is there any other source specifying coding style for Xen?

Sadly the coding style is not very well formalized. At the moment it is 
more a consensus between all the reviewers. There are work going on to 
fix that by introducing a checkpatch.pl.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-04-30 16:09   ` Julien Grall
@ 2018-05-09 15:48     ` Mirela Simonovic
  2018-05-09 16:32       ` Julien Grall
  0 siblings, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-05-09 15:48 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel

Hi Julien,

On Mon, Apr 30, 2018 at 6:09 PM, Julien Grall <julien.grall@arm.com> wrote:
> Hi Mirela,
>
>
> On 27/04/18 18:12, Mirela Simonovic wrote:
>>
>> On boot, enabling errata workarounds will be triggered by the boot CPU
>> from start_xen(). On CPU hotplug (non-boot scenario) this would not be
>> done. This patch adds the code required to enable errata workarounds
>> for a CPU being hotplugged after the system boots. This is triggered
>> using a notifier. If the CPU fails to enable the errata Xen will panic.
>> This is done because it is assumed that the CPU which is hotplugged
>> after the system/Xen boots, was initially hotplugged during the
>> system/Xen boot. Therefore, enabling errata workarounds should never
>> fail.
>>
>> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
>>
>> ---
>> CC: Stefano Stabellini <sstabellini@kernel.org>
>> CC: Julien Grall <julien.grall@arm.com>
>> ---
>>   xen/arch/arm/cpuerrata.c         | 35
>> +++++++++++++++++++++++++++++++++++
>>   xen/arch/arm/cpufeature.c        | 23 +++++++++++++++++++++++
>>   xen/include/asm-arm/cpufeature.h |  1 +
>>   3 files changed, 59 insertions(+)
>>
>> diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
>> index 1baa20654b..4040f781ec 100644
>> --- a/xen/arch/arm/cpuerrata.c
>> +++ b/xen/arch/arm/cpuerrata.c
>> @@ -5,6 +5,8 @@
>>   #include <xen/spinlock.h>
>>   #include <xen/vmap.h>
>>   #include <xen/warning.h>
>> +#include <xen/notifier.h>
>> +#include <xen/cpu.h>
>>   #include <asm/cpufeature.h>
>>   #include <asm/cpuerrata.h>
>>   #include <asm/psci.h>
>> @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void)
>>       enable_cpu_capabilities(arm_errata);
>>   }
>>   +static int cpu_errata_callback(
>> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
>> +{
>> +    switch ( action )
>> +    {
>> +    case CPU_STARTING:
>> +        enable_nonboot_cpu_caps(arm_errata);
>> +        break;
>> +    default:
>> +        break;
>> +    }
>> +
>> +    return NOTIFY_DONE;
>> +}
>> +
>> +static struct notifier_block cpu_errata_nfb = {
>> +    .notifier_call = cpu_errata_callback,
>> +};
>> +
>> +static int __init cpu_errata_notifier_init(void)
>> +{
>> +    register_cpu_notifier(&cpu_errata_nfb);
>> +    return 0;
>> +}
>> +/*
>> + * Initialization has to be done at init rather than presmp_init phase
>> because
>> + * the callback should execute only after the secondary CPUs are
>> initially
>> + * booted (in hotplug scenarios when the system state is not boot). On
>> boot,
>> + * the enabling of errata workarounds will be triggered by the boot CPU
>> from
>> + * start_xen().
>> + */
>> +__initcall(cpu_errata_notifier_init);
>> +
>>   /*
>>    * Local variables:
>>    * mode: C
>> diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
>> index 525b45e22f..dd30f0d29c 100644
>> --- a/xen/arch/arm/cpufeature.c
>> +++ b/xen/arch/arm/cpufeature.c
>> @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct
>> arm_cpu_capabilities *caps)
>>       }
>>   }
>>   +/* Run through the enabled capabilities and enable() them on the
>> calling CPU */
>> +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
>> +{
>> +    ASSERT(system_state != SYS_STATE_boot);
>> +
>> +    for ( ; caps->matches; caps++ )
>> +    {
>> +        if ( !cpus_have_cap(caps->capability) )
>> +            continue;
>> +
>> +        if ( caps->enable )
>> +        {
>> +            /*
>> +             * Since the CPU has enabled errata workarounds on boot, it
>> should
>
>
> This function is not really about errata, it is about capabilities. Errata
> is just a sub-category of them.
>

I've fixed the comment, thanks.

>> +             * never fail to enable them here.
>> +             */
>> +            if ( caps->enable((void *)caps) )
>> +                panic("CPU%u failed to enable capability %u\n",
>> +                      smp_processor_id(), caps->capability);
>
>
> We should really avoid to use panic(...) if this is something the system can
> survive. In that specific case, it would only affect the current CPU. So it
> would be better to return an error and let the caller decide what to do.
>

I need to emphasize two points:
1) I don't see how is this different compared to PSCI CPU OFF where we
do panic. Essentially, in both cases the system will not be able to
use that CPU and we already agreed that is a good reason to panic.
As oppose to CPU_OFF which wasn't called on boot so we indeed have no
idea whether it will pass on suspend, no matter how unlikely it could
fail, in this scenario we are sure that enabling capability should
pass because it already passed on boot. So if it doesn't pass, which I
consider to be impossible, I believe we should panic.
On the other hand, I understand how would this make a difference on
big.LITTLE where you try to hotplug a CPU that was never booted.
However, that scenario is out of this scope.

2) I still wanted to give a chance to your proposal and just convert
panic into stop_cpu+printing error. The system cannot survive if
enabling a capability fails. In order to test this I added a
capability that will always fail after the boot. This is not realistic
in my opinion, but I used it only for testing to check whether the
system will survive. Instead of panic I printed an error and stopped
the CPU. However, Xen crashed. The boot CPU properly concluded that
the erroneous CPU will never become online, but later on credit
scheduler's assertion fails. I believe this is something that a person
who adds big.LITTLE support should deal with.

Do we have an agreement to keep panic?

Thanks,
Mirela

> Cheers,
>
>> +        }
>> +    }
>> +}
>> +
>>   /*
>>    * Local variables:
>>    * mode: C
>> diff --git a/xen/include/asm-arm/cpufeature.h
>> b/xen/include/asm-arm/cpufeature.h
>> index e557a095af..b14e226401 100644
>> --- a/xen/include/asm-arm/cpufeature.h
>> +++ b/xen/include/asm-arm/cpufeature.h
>> @@ -88,6 +88,7 @@ void update_cpu_capabilities(const struct
>> arm_cpu_capabilities *caps,
>>                                const char *info);
>>     void enable_cpu_capabilities(const struct arm_cpu_capabilities *caps);
>> +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps);
>>     #endif /* __ASSEMBLY__ */
>>
>
>
> Cheers,
>
> --
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-09 15:48     ` Mirela Simonovic
@ 2018-05-09 16:32       ` Julien Grall
  2018-05-10 11:57         ` Mirela Simonovic
  0 siblings, 1 reply; 53+ messages in thread
From: Julien Grall @ 2018-05-09 16:32 UTC (permalink / raw)
  To: Mirela Simonovic
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel



On 09/05/18 16:48, Mirela Simonovic wrote:
> Hi Julien,

Hi Mirela,

> On Mon, Apr 30, 2018 at 6:09 PM, Julien Grall <julien.grall@arm.com> wrote:
>> Hi Mirela,
>>
>>
>> On 27/04/18 18:12, Mirela Simonovic wrote:
>>>
>>> On boot, enabling errata workarounds will be triggered by the boot CPU
>>> from start_xen(). On CPU hotplug (non-boot scenario) this would not be
>>> done. This patch adds the code required to enable errata workarounds
>>> for a CPU being hotplugged after the system boots. This is triggered
>>> using a notifier. If the CPU fails to enable the errata Xen will panic.
>>> This is done because it is assumed that the CPU which is hotplugged
>>> after the system/Xen boots, was initially hotplugged during the
>>> system/Xen boot. Therefore, enabling errata workarounds should never
>>> fail.
>>>
>>> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
>>>
>>> ---
>>> CC: Stefano Stabellini <sstabellini@kernel.org>
>>> CC: Julien Grall <julien.grall@arm.com>
>>> ---
>>>    xen/arch/arm/cpuerrata.c         | 35
>>> +++++++++++++++++++++++++++++++++++
>>>    xen/arch/arm/cpufeature.c        | 23 +++++++++++++++++++++++
>>>    xen/include/asm-arm/cpufeature.h |  1 +
>>>    3 files changed, 59 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
>>> index 1baa20654b..4040f781ec 100644
>>> --- a/xen/arch/arm/cpuerrata.c
>>> +++ b/xen/arch/arm/cpuerrata.c
>>> @@ -5,6 +5,8 @@
>>>    #include <xen/spinlock.h>
>>>    #include <xen/vmap.h>
>>>    #include <xen/warning.h>
>>> +#include <xen/notifier.h>
>>> +#include <xen/cpu.h>
>>>    #include <asm/cpufeature.h>
>>>    #include <asm/cpuerrata.h>
>>>    #include <asm/psci.h>
>>> @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void)
>>>        enable_cpu_capabilities(arm_errata);
>>>    }
>>>    +static int cpu_errata_callback(
>>> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
>>> +{
>>> +    switch ( action )
>>> +    {
>>> +    case CPU_STARTING:
>>> +        enable_nonboot_cpu_caps(arm_errata);
>>> +        break;
>>> +    default:
>>> +        break;
>>> +    }
>>> +
>>> +    return NOTIFY_DONE;
>>> +}
>>> +
>>> +static struct notifier_block cpu_errata_nfb = {
>>> +    .notifier_call = cpu_errata_callback,
>>> +};
>>> +
>>> +static int __init cpu_errata_notifier_init(void)
>>> +{
>>> +    register_cpu_notifier(&cpu_errata_nfb);
>>> +    return 0;
>>> +}
>>> +/*
>>> + * Initialization has to be done at init rather than presmp_init phase
>>> because
>>> + * the callback should execute only after the secondary CPUs are
>>> initially
>>> + * booted (in hotplug scenarios when the system state is not boot). On
>>> boot,
>>> + * the enabling of errata workarounds will be triggered by the boot CPU
>>> from
>>> + * start_xen().
>>> + */
>>> +__initcall(cpu_errata_notifier_init);
>>> +
>>>    /*
>>>     * Local variables:
>>>     * mode: C
>>> diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
>>> index 525b45e22f..dd30f0d29c 100644
>>> --- a/xen/arch/arm/cpufeature.c
>>> +++ b/xen/arch/arm/cpufeature.c
>>> @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct
>>> arm_cpu_capabilities *caps)
>>>        }
>>>    }
>>>    +/* Run through the enabled capabilities and enable() them on the
>>> calling CPU */
>>> +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
>>> +{
>>> +    ASSERT(system_state != SYS_STATE_boot);
>>> +
>>> +    for ( ; caps->matches; caps++ )
>>> +    {
>>> +        if ( !cpus_have_cap(caps->capability) )
>>> +            continue;
>>> +
>>> +        if ( caps->enable )
>>> +        {
>>> +            /*
>>> +             * Since the CPU has enabled errata workarounds on boot, it
>>> should
>>
>>
>> This function is not really about errata, it is about capabilities. Errata
>> is just a sub-category of them.
>>
> 
> I've fixed the comment, thanks.
> 
>>> +             * never fail to enable them here.
>>> +             */
>>> +            if ( caps->enable((void *)caps) )
>>> +                panic("CPU%u failed to enable capability %u\n",
>>> +                      smp_processor_id(), caps->capability);
>>
>>
>> We should really avoid to use panic(...) if this is something the system can
>> survive. In that specific case, it would only affect the current CPU. So it
>> would be better to return an error and let the caller decide what to do.
>>
> 
> I need to emphasize two points:
> 1) I don't see how is this different compared to PSCI CPU OFF where we
> do panic. Essentially, in both cases the system will not be able to
> use that CPU and we already agreed that is a good reason to panic.

You can't compare PSCI CPU off and the enable callback failing. The 
*only* reason PSCI CPU off can fail is because the Trusted OS is 
resident on that CPU. If that ever happen it is a programming error on 
Xen, and it makes sense to fail because you don't want that CPU to spin 
in Xen.

Enabling a capability can fail because of a failure of allocating memory 
or mapping (see spectre workaround). It is not a programming error but 
an expected behavior and it is not a valid reason to assume we want to 
kill the system.

> As oppose to CPU_OFF which wasn't called on boot so we indeed have no
> idea whether it will pass on suspend, no matter how unlikely it could
> fail, in this scenario we are sure that enabling capability should
> pass because it already passed on boot. So if it doesn't pass, which I
> consider to be impossible, I believe we should panic.
> On the other hand, I understand how would this make a difference on
> big.LITTLE where you try to hotplug a CPU that was never booted.
> However, that scenario is out of this scope.
While I agree that big.LITTLE is out of scope of your series, what I ask 
has nothing to do with big.LITTLE. There are valid reason for the enable 
callback to fail whether it is the case today or not.

> 
> 2) I still wanted to give a chance to your proposal and just convert
> panic into stop_cpu+printing error. The system cannot survive if
> enabling a capability fails. In order to test this I added a
> capability that will always fail after the boot. This is not realistic
> in my opinion, but I used it only for testing to check whether the
> system will survive. Instead of panic I printed an error and stopped
> the CPU. However, Xen crashed. The boot CPU properly concluded that
> the erroneous CPU will never become online, but later on credit
> scheduler's assertion fails.

Please provide more details.

> I believe this is something that a person
> who adds big.LITTLE support should deal with.

If there is a bug in the scheduler it should be fixed rather trying to 
workaround with a panic in the code. If you provide more details, we 
might be able to help here.

> 
> Do we have an agreement to keep panic?

I am afraid not, panic (and BUG*) should only be used when there are no 
way to come back or it is a programming error to end up here. I don't 
think this is the case with the information I have in hand.

The two solutions I find acceptable would be:
	1) Log a warning and ignore the error. Likely your CPU will break later on.
	2) Return an error and let the caller deal with it. The caller might 
decide to kill the system, but that's not our business. This function 
should only report an error.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 06/10] xen/common: Restore IRQ affinity when hotplugging a pCPU
  2018-04-27 17:12 ` [PATCH v3 06/10] xen/common: Restore IRQ affinity when hotplugging a pCPU Mirela Simonovic
@ 2018-05-10  7:33   ` Dario Faggioli
  0 siblings, 0 replies; 53+ messages in thread
From: Dario Faggioli @ 2018-05-10  7:33 UTC (permalink / raw)
  To: Mirela Simonovic, xen-devel
  Cc: edgar.iglesias, George Dunlap, julien.grall, sstabellini, dm


[-- Attachment #1.1: Type: text/plain, Size: 1436 bytes --]

On Fri, 2018-04-27 at 19:12 +0200, Mirela Simonovic wrote:
> Non-boot pCPUs are being hot-unplugged during the system suspend to
> RAM and hotplugged during the resume. When non-boot pCPUs are
> hot-unplugged the interrupts that were targeted to them are migrated
> to the boot pCPU.
> On suspend, each guest could have its own wake-up devices/interrupts
> (passthrough) that could trigger the system resume. These interrupts
> could be targeted to a non-boot pCPU, e.g. if the guest's vCPU is
> pinned to a non-boot pCPU. Due to the hot-unplug of non-boot pCPUs
> during the suspend such interrupts will be migrated from non-boot
> pCPUs
> to the boot pCPU (this is fine). However, when non-boot pCPUs are
> hotplugged on resume, these interrupts are not migrated back to non-
> boot
> pCPUs, i.e. IRQ affinity is not restored on resume (this is wrong).
> This patch adds the restoration of IRQ affinity when a pCPU is
> hotplugged.
> 
> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
> 
> ---
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> CC: Dario Faggioli <dfaggioli@suse.com>
>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-09 16:32       ` Julien Grall
@ 2018-05-10 11:57         ` Mirela Simonovic
  2018-05-10 13:24           ` Mirela Simonovic
  2018-05-10 13:33           ` Dario Faggioli
  0 siblings, 2 replies; 53+ messages in thread
From: Mirela Simonovic @ 2018-05-10 11:57 UTC (permalink / raw)
  To: Julien Grall, Dario Faggioli
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel

Hi,

+Dario

On Wed, May 9, 2018 at 6:32 PM, Julien Grall <julien.grall@arm.com> wrote:
>
>
> On 09/05/18 16:48, Mirela Simonovic wrote:
>>
>> Hi Julien,
>
>
> Hi Mirela,
>
>
>> On Mon, Apr 30, 2018 at 6:09 PM, Julien Grall <julien.grall@arm.com>
>> wrote:
>>>
>>> Hi Mirela,
>>>
>>>
>>> On 27/04/18 18:12, Mirela Simonovic wrote:
>>>>
>>>>
>>>> On boot, enabling errata workarounds will be triggered by the boot CPU
>>>> from start_xen(). On CPU hotplug (non-boot scenario) this would not be
>>>> done. This patch adds the code required to enable errata workarounds
>>>> for a CPU being hotplugged after the system boots. This is triggered
>>>> using a notifier. If the CPU fails to enable the errata Xen will panic.
>>>> This is done because it is assumed that the CPU which is hotplugged
>>>> after the system/Xen boots, was initially hotplugged during the
>>>> system/Xen boot. Therefore, enabling errata workarounds should never
>>>> fail.
>>>>
>>>> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
>>>>
>>>> ---
>>>> CC: Stefano Stabellini <sstabellini@kernel.org>
>>>> CC: Julien Grall <julien.grall@arm.com>
>>>> ---
>>>>    xen/arch/arm/cpuerrata.c         | 35
>>>> +++++++++++++++++++++++++++++++++++
>>>>    xen/arch/arm/cpufeature.c        | 23 +++++++++++++++++++++++
>>>>    xen/include/asm-arm/cpufeature.h |  1 +
>>>>    3 files changed, 59 insertions(+)
>>>>
>>>> diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
>>>> index 1baa20654b..4040f781ec 100644
>>>> --- a/xen/arch/arm/cpuerrata.c
>>>> +++ b/xen/arch/arm/cpuerrata.c
>>>> @@ -5,6 +5,8 @@
>>>>    #include <xen/spinlock.h>
>>>>    #include <xen/vmap.h>
>>>>    #include <xen/warning.h>
>>>> +#include <xen/notifier.h>
>>>> +#include <xen/cpu.h>
>>>>    #include <asm/cpufeature.h>
>>>>    #include <asm/cpuerrata.h>
>>>>    #include <asm/psci.h>
>>>> @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void)
>>>>        enable_cpu_capabilities(arm_errata);
>>>>    }
>>>>    +static int cpu_errata_callback(
>>>> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
>>>> +{
>>>> +    switch ( action )
>>>> +    {
>>>> +    case CPU_STARTING:
>>>> +        enable_nonboot_cpu_caps(arm_errata);
>>>> +        break;
>>>> +    default:
>>>> +        break;
>>>> +    }
>>>> +
>>>> +    return NOTIFY_DONE;
>>>> +}
>>>> +
>>>> +static struct notifier_block cpu_errata_nfb = {
>>>> +    .notifier_call = cpu_errata_callback,
>>>> +};
>>>> +
>>>> +static int __init cpu_errata_notifier_init(void)
>>>> +{
>>>> +    register_cpu_notifier(&cpu_errata_nfb);
>>>> +    return 0;
>>>> +}
>>>> +/*
>>>> + * Initialization has to be done at init rather than presmp_init phase
>>>> because
>>>> + * the callback should execute only after the secondary CPUs are
>>>> initially
>>>> + * booted (in hotplug scenarios when the system state is not boot). On
>>>> boot,
>>>> + * the enabling of errata workarounds will be triggered by the boot CPU
>>>> from
>>>> + * start_xen().
>>>> + */
>>>> +__initcall(cpu_errata_notifier_init);
>>>> +
>>>>    /*
>>>>     * Local variables:
>>>>     * mode: C
>>>> diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
>>>> index 525b45e22f..dd30f0d29c 100644
>>>> --- a/xen/arch/arm/cpufeature.c
>>>> +++ b/xen/arch/arm/cpufeature.c
>>>> @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct
>>>> arm_cpu_capabilities *caps)
>>>>        }
>>>>    }
>>>>    +/* Run through the enabled capabilities and enable() them on the
>>>> calling CPU */
>>>> +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
>>>> +{
>>>> +    ASSERT(system_state != SYS_STATE_boot);
>>>> +
>>>> +    for ( ; caps->matches; caps++ )
>>>> +    {
>>>> +        if ( !cpus_have_cap(caps->capability) )
>>>> +            continue;
>>>> +
>>>> +        if ( caps->enable )
>>>> +        {
>>>> +            /*
>>>> +             * Since the CPU has enabled errata workarounds on boot, it
>>>> should
>>>
>>>
>>>
>>> This function is not really about errata, it is about capabilities.
>>> Errata
>>> is just a sub-category of them.
>>>
>>
>> I've fixed the comment, thanks.
>>
>>>> +             * never fail to enable them here.
>>>> +             */
>>>> +            if ( caps->enable((void *)caps) )
>>>> +                panic("CPU%u failed to enable capability %u\n",
>>>> +                      smp_processor_id(), caps->capability);
>>>
>>>
>>>
>>> We should really avoid to use panic(...) if this is something the system
>>> can
>>> survive. In that specific case, it would only affect the current CPU. So
>>> it
>>> would be better to return an error and let the caller decide what to do.
>>>
>>
>> I need to emphasize two points:
>> 1) I don't see how is this different compared to PSCI CPU OFF where we
>> do panic. Essentially, in both cases the system will not be able to
>> use that CPU and we already agreed that is a good reason to panic.
>
>
> You can't compare PSCI CPU off and the enable callback failing. The *only*
> reason PSCI CPU off can fail is because the Trusted OS is resident on that
> CPU. If that ever happen it is a programming error on Xen, and it makes
> sense to fail because you don't want that CPU to spin in Xen.
>
> Enabling a capability can fail because of a failure of allocating memory or
> mapping (see spectre workaround). It is not a programming error but an
> expected behavior and it is not a valid reason to assume we want to kill the
> system.
>
>> As oppose to CPU_OFF which wasn't called on boot so we indeed have no
>> idea whether it will pass on suspend, no matter how unlikely it could
>> fail, in this scenario we are sure that enabling capability should
>> pass because it already passed on boot. So if it doesn't pass, which I
>> consider to be impossible, I believe we should panic.
>> On the other hand, I understand how would this make a difference on
>> big.LITTLE where you try to hotplug a CPU that was never booted.
>> However, that scenario is out of this scope.
>
> While I agree that big.LITTLE is out of scope of your series, what I ask has
> nothing to do with big.LITTLE. There are valid reason for the enable
> callback to fail whether it is the case today or not.
>
>>
>> 2) I still wanted to give a chance to your proposal and just convert
>> panic into stop_cpu+printing error. The system cannot survive if
>> enabling a capability fails. In order to test this I added a
>> capability that will always fail after the boot. This is not realistic
>> in my opinion, but I used it only for testing to check whether the
>> system will survive. Instead of panic I printed an error and stopped
>> the CPU. However, Xen crashed. The boot CPU properly concluded that
>> the erroneous CPU will never become online, but later on credit
>> scheduler's assertion fails.
>
>
> Please provide more details.
>
>> I believe this is something that a person
>> who adds big.LITTLE support should deal with.
>
>
> If there is a bug in the scheduler it should be fixed rather trying to
> workaround with a panic in the code. If you provide more details, we might
> be able to help here.
>

This flow seems to have several bugs. Lets start from here:

Please take a look at function cpu_schedule_callback in schedule.c.
Within switch, case CPU_DEAD doesn't have a break, causing the bellow
CPU_UP_CANCELED to execute as well when the CPU goes down. This looks
wrong to me.
Dario, could you please confirm that this is a bug? Otherwise could
you please confirm the reasoning beyond?

Thanks,
Mirela

>>
>> Do we have an agreement to keep panic?
>
>
> I am afraid not, panic (and BUG*) should only be used when there are no way
> to come back or it is a programming error to end up here. I don't think this
> is the case with the information I have in hand.
>
> The two solutions I find acceptable would be:
>         1) Log a warning and ignore the error. Likely your CPU will break
> later on.
>         2) Return an error and let the caller deal with it. The caller might
> decide to kill the system, but that's not our business. This function should
> only report an error.
>
> Cheers,
>
> --
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-10 11:57         ` Mirela Simonovic
@ 2018-05-10 13:24           ` Mirela Simonovic
  2018-05-10 13:29             ` Julien Grall
  2018-05-10 14:25             ` Dario Faggioli
  2018-05-10 13:33           ` Dario Faggioli
  1 sibling, 2 replies; 53+ messages in thread
From: Mirela Simonovic @ 2018-05-10 13:24 UTC (permalink / raw)
  To: Julien Grall, Dario Faggioli
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel

On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic
<mirela.simonovic@aggios.com> wrote:
> Hi,
>
> +Dario
>
> On Wed, May 9, 2018 at 6:32 PM, Julien Grall <julien.grall@arm.com> wrote:
>>
>>
>> On 09/05/18 16:48, Mirela Simonovic wrote:
>>>
>>> Hi Julien,
>>
>>
>> Hi Mirela,
>>
>>
>>> On Mon, Apr 30, 2018 at 6:09 PM, Julien Grall <julien.grall@arm.com>
>>> wrote:
>>>>
>>>> Hi Mirela,
>>>>
>>>>
>>>> On 27/04/18 18:12, Mirela Simonovic wrote:
>>>>>
>>>>>
>>>>> On boot, enabling errata workarounds will be triggered by the boot CPU
>>>>> from start_xen(). On CPU hotplug (non-boot scenario) this would not be
>>>>> done. This patch adds the code required to enable errata workarounds
>>>>> for a CPU being hotplugged after the system boots. This is triggered
>>>>> using a notifier. If the CPU fails to enable the errata Xen will panic.
>>>>> This is done because it is assumed that the CPU which is hotplugged
>>>>> after the system/Xen boots, was initially hotplugged during the
>>>>> system/Xen boot. Therefore, enabling errata workarounds should never
>>>>> fail.
>>>>>
>>>>> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
>>>>>
>>>>> ---
>>>>> CC: Stefano Stabellini <sstabellini@kernel.org>
>>>>> CC: Julien Grall <julien.grall@arm.com>
>>>>> ---
>>>>>    xen/arch/arm/cpuerrata.c         | 35
>>>>> +++++++++++++++++++++++++++++++++++
>>>>>    xen/arch/arm/cpufeature.c        | 23 +++++++++++++++++++++++
>>>>>    xen/include/asm-arm/cpufeature.h |  1 +
>>>>>    3 files changed, 59 insertions(+)
>>>>>
>>>>> diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
>>>>> index 1baa20654b..4040f781ec 100644
>>>>> --- a/xen/arch/arm/cpuerrata.c
>>>>> +++ b/xen/arch/arm/cpuerrata.c
>>>>> @@ -5,6 +5,8 @@
>>>>>    #include <xen/spinlock.h>
>>>>>    #include <xen/vmap.h>
>>>>>    #include <xen/warning.h>
>>>>> +#include <xen/notifier.h>
>>>>> +#include <xen/cpu.h>
>>>>>    #include <asm/cpufeature.h>
>>>>>    #include <asm/cpuerrata.h>
>>>>>    #include <asm/psci.h>
>>>>> @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void)
>>>>>        enable_cpu_capabilities(arm_errata);
>>>>>    }
>>>>>    +static int cpu_errata_callback(
>>>>> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
>>>>> +{
>>>>> +    switch ( action )
>>>>> +    {
>>>>> +    case CPU_STARTING:
>>>>> +        enable_nonboot_cpu_caps(arm_errata);
>>>>> +        break;
>>>>> +    default:
>>>>> +        break;
>>>>> +    }
>>>>> +
>>>>> +    return NOTIFY_DONE;
>>>>> +}
>>>>> +
>>>>> +static struct notifier_block cpu_errata_nfb = {
>>>>> +    .notifier_call = cpu_errata_callback,
>>>>> +};
>>>>> +
>>>>> +static int __init cpu_errata_notifier_init(void)
>>>>> +{
>>>>> +    register_cpu_notifier(&cpu_errata_nfb);
>>>>> +    return 0;
>>>>> +}
>>>>> +/*
>>>>> + * Initialization has to be done at init rather than presmp_init phase
>>>>> because
>>>>> + * the callback should execute only after the secondary CPUs are
>>>>> initially
>>>>> + * booted (in hotplug scenarios when the system state is not boot). On
>>>>> boot,
>>>>> + * the enabling of errata workarounds will be triggered by the boot CPU
>>>>> from
>>>>> + * start_xen().
>>>>> + */
>>>>> +__initcall(cpu_errata_notifier_init);
>>>>> +
>>>>>    /*
>>>>>     * Local variables:
>>>>>     * mode: C
>>>>> diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
>>>>> index 525b45e22f..dd30f0d29c 100644
>>>>> --- a/xen/arch/arm/cpufeature.c
>>>>> +++ b/xen/arch/arm/cpufeature.c
>>>>> @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct
>>>>> arm_cpu_capabilities *caps)
>>>>>        }
>>>>>    }
>>>>>    +/* Run through the enabled capabilities and enable() them on the
>>>>> calling CPU */
>>>>> +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
>>>>> +{
>>>>> +    ASSERT(system_state != SYS_STATE_boot);
>>>>> +
>>>>> +    for ( ; caps->matches; caps++ )
>>>>> +    {
>>>>> +        if ( !cpus_have_cap(caps->capability) )
>>>>> +            continue;
>>>>> +
>>>>> +        if ( caps->enable )
>>>>> +        {
>>>>> +            /*
>>>>> +             * Since the CPU has enabled errata workarounds on boot, it
>>>>> should
>>>>
>>>>
>>>>
>>>> This function is not really about errata, it is about capabilities.
>>>> Errata
>>>> is just a sub-category of them.
>>>>
>>>
>>> I've fixed the comment, thanks.
>>>
>>>>> +             * never fail to enable them here.
>>>>> +             */
>>>>> +            if ( caps->enable((void *)caps) )
>>>>> +                panic("CPU%u failed to enable capability %u\n",
>>>>> +                      smp_processor_id(), caps->capability);
>>>>
>>>>
>>>>
>>>> We should really avoid to use panic(...) if this is something the system
>>>> can
>>>> survive. In that specific case, it would only affect the current CPU. So
>>>> it
>>>> would be better to return an error and let the caller decide what to do.
>>>>
>>>
>>> I need to emphasize two points:
>>> 1) I don't see how is this different compared to PSCI CPU OFF where we
>>> do panic. Essentially, in both cases the system will not be able to
>>> use that CPU and we already agreed that is a good reason to panic.
>>
>>
>> You can't compare PSCI CPU off and the enable callback failing. The *only*
>> reason PSCI CPU off can fail is because the Trusted OS is resident on that
>> CPU. If that ever happen it is a programming error on Xen, and it makes
>> sense to fail because you don't want that CPU to spin in Xen.
>>
>> Enabling a capability can fail because of a failure of allocating memory or
>> mapping (see spectre workaround). It is not a programming error but an
>> expected behavior and it is not a valid reason to assume we want to kill the
>> system.
>>
>>> As oppose to CPU_OFF which wasn't called on boot so we indeed have no
>>> idea whether it will pass on suspend, no matter how unlikely it could
>>> fail, in this scenario we are sure that enabling capability should
>>> pass because it already passed on boot. So if it doesn't pass, which I
>>> consider to be impossible, I believe we should panic.
>>> On the other hand, I understand how would this make a difference on
>>> big.LITTLE where you try to hotplug a CPU that was never booted.
>>> However, that scenario is out of this scope.
>>
>> While I agree that big.LITTLE is out of scope of your series, what I ask has
>> nothing to do with big.LITTLE. There are valid reason for the enable
>> callback to fail whether it is the case today or not.
>>
>>>
>>> 2) I still wanted to give a chance to your proposal and just convert
>>> panic into stop_cpu+printing error. The system cannot survive if
>>> enabling a capability fails. In order to test this I added a
>>> capability that will always fail after the boot. This is not realistic
>>> in my opinion, but I used it only for testing to check whether the
>>> system will survive. Instead of panic I printed an error and stopped
>>> the CPU. However, Xen crashed. The boot CPU properly concluded that
>>> the erroneous CPU will never become online, but later on credit
>>> scheduler's assertion fails.
>>
>>
>> Please provide more details.
>>
>>> I believe this is something that a person
>>> who adds big.LITTLE support should deal with.
>>
>>
>> If there is a bug in the scheduler it should be fixed rather trying to
>> workaround with a panic in the code. If you provide more details, we might
>> be able to help here.
>>
>
> This flow seems to have several bugs. Lets start from here:
>
> Please take a look at function cpu_schedule_callback in schedule.c.
> Within switch, case CPU_DEAD doesn't have a break, causing the bellow
> CPU_UP_CANCELED to execute as well when the CPU goes down. This looks
> wrong to me.
> Dario, could you please confirm that this is a bug? Otherwise could
> you please confirm the reasoning beyond?
>

Dario sorry, this looked wrong in my scenario but actually it is
correct. I understand the purpose of the missing break now.

I'll try to explain what was the issue when non-boot CPU being
hotplugged fails to enable capabilities. For those who are not
interested in details - the issue is fixed and will be submitted as
part of this patch in the v4 of this series.

For the curious ones (if any) here is detailed description: errata
notifier added in this patch had the same priority as scheduler
notifier. I though priority doesn't matter, but I was wrong. In this
particular scenario where a CPU fails to enable capabilities
(triggered by errata notifier added in this patch), scheduler callback
executed before the errata callback for CPU_STARTING event. In other
words, scheduler already called init_pdata before the errata callback
fired (and stopped the CPU).
Later on when errata callback fired, enabling of capabilities has
failed, so the erroneous non-boot CPU stopped itself and never
declared to be online.
Then CPU#0 fired new notification with CPU_UP_CANCELED event in order
to clean up for the job done on CPU_STARTING. However, this broke the
assumption (which is good) made in cpu_schedule_callback. The
assumption is that the sequence of steps should be
alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata. In this
particular case deinit_pdata was not done because this would be done
only upon CPU_DEAD event which makes no sense in this scenario.
In order to avoid running into the invalid scenario described above,
the errata callback should fire before the scheduler callback. If
enabling capabilities fails, the scheduler callback for CPU_STARTING
will never execute afterwards, so the following CPU_UP_CANCELED event
triggered by the CPU#0 will do free_pdata, which is ok because
init_pdata was not executed and alloc_pdata-->free_pdata flow is also
valid. Congratulations to the reader who reached this point :)

I have tested the tuned scenario where enabling capabilities fails -
the erroneous CPU is stopped/powered down and the system continues to
work fine without it. Although I still don't understand why indeed I
needed to debug this I'll add the fix in v4.

Thanks,
Mirela

> Thanks,
> Mirela
>
>>>
>>> Do we have an agreement to keep panic?
>>
>>
>> I am afraid not, panic (and BUG*) should only be used when there are no way
>> to come back or it is a programming error to end up here. I don't think this
>> is the case with the information I have in hand.
>>
>> The two solutions I find acceptable would be:
>>         1) Log a warning and ignore the error. Likely your CPU will break
>> later on.
>>         2) Return an error and let the caller deal with it. The caller might
>> decide to kill the system, but that's not our business. This function should
>> only report an error.
>>
>> Cheers,
>>
>> --
>> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-10 13:24           ` Mirela Simonovic
@ 2018-05-10 13:29             ` Julien Grall
  2018-05-10 14:12               ` Mirela Simonovic
  2018-05-10 14:25             ` Dario Faggioli
  1 sibling, 1 reply; 53+ messages in thread
From: Julien Grall @ 2018-05-10 13:29 UTC (permalink / raw)
  To: Mirela Simonovic, Dario Faggioli
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel

Hi,

On 05/10/2018 02:24 PM, Mirela Simonovic wrote:
> On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic
> <mirela.simonovic@aggios.com> wrote:

> I have tested the tuned scenario where enabling capabilities fails -
> the erroneous CPU is stopped/powered down and the system continues to
> work fine without it. Although I still don't understand why indeed I
> needed to debug this I'll add the fix in v4.

I don't understand your last sentence. What did you mean?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-10 11:57         ` Mirela Simonovic
  2018-05-10 13:24           ` Mirela Simonovic
@ 2018-05-10 13:33           ` Dario Faggioli
  1 sibling, 0 replies; 53+ messages in thread
From: Dario Faggioli @ 2018-05-10 13:33 UTC (permalink / raw)
  To: Mirela Simonovic, Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel


[-- Attachment #1.1: Type: text/plain, Size: 2963 bytes --]

On Thu, 2018-05-10 at 13:57 +0200, Mirela Simonovic wrote:
> Hi,
> 
> +Dario
> 
Thanks.

> On Wed, May 9, 2018 at 6:32 PM, Julien Grall <julien.grall@arm.com>
> wrote:
> > If there is a bug in the scheduler it should be fixed rather trying
> > to
> > workaround with a panic in the code. If you provide more details,
> > we might
> > be able to help here.
> > 
> 
> This flow seems to have several bugs. Lets start from here:
> 
> Please take a look at function cpu_schedule_callback in schedule.c.
> Within switch, case CPU_DEAD doesn't have a break, causing the bellow
> CPU_UP_CANCELED to execute as well when the CPU goes down. This looks
> wrong to me.
> Dario, could you please confirm that this is a bug? 
>
No, that is entirely on purpose (as you can tell from the
"/* Fallthrough */" comment).

> Otherwise could
> you please confirm the reasoning beyond?
> 
Well, the idea is that the CPU_UP_CANCELLED notifier should _only_
invoke cpu_schedule_down(). On the other hand, the CPU_DEAD notifier
should invoke both:
 SCHED_OP(deinit_pdata)
 cpu_schedule_down()

and the fallthrough is the way we achieve that.

On x86, the shutdown/suspend path is as follows:

    XENPF_enter_acpi_sleep (platform_hypercall.c)
      acpi_enter_sleep()
        continue_hypercall_on_cpu(0, enter_state_helper)
          enter_state()
            system_state = SYS_STATE_suspend
            disable_nonboot_cpus()
              for_each_online_cpu ( cpu ) { 
                cpu_down(cpu) }
                  notifier_call_chain(CPU_DOWN_PREPARE)
                    cpu_callback(CPU_DOWN_PREPARE)
                      cpupool_cpu_remove(cpu)
                      cpufreq_del_cpu(cpu)
                      stop_machine_run(take_cpu_down, cpu)
                        __cpu_disable();
                          notifier_call_chain(CPU_DYING)
                            ...
                          cpumask_clear_cpu(cpu, &cpu_online_map);
                            cpu_disable_scheduler(cpu);
                  __cpu_die(cpu)
                    while ( (seen_state = cpu_state) != CPU_STATE_DEAD ) { ... }
                    notifier_call_chain(CPU_DEAD)
                      cpu_schedule_callback(CPU_DEAD)
                        ...
                        ...
                        cpu_schedule_callback(CPU_DEAD)
                          SCHED_OP(deinit_pdata)
                          cpu_schedule_down(cpu)
                            SCHED_OP(free_pdata)
                            SCHED_OP(free_vdata)


If you see a BUG_ON triggering, please provide details about that, and
we'll try to figure out what's causing it.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-10 13:29             ` Julien Grall
@ 2018-05-10 14:12               ` Mirela Simonovic
  2018-05-10 14:35                 ` Julien Grall
  0 siblings, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-05-10 14:12 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar E. Iglesias, Xen Devel, Stefano Stabellini, Davorin Mista,
	Dario Faggioli

Hi Julien,

On Thu, May 10, 2018 at 3:29 PM, Julien Grall <julien.grall@arm.com> wrote:
> Hi,
>
> On 05/10/2018 02:24 PM, Mirela Simonovic wrote:
>>
>> On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic
>> <mirela.simonovic@aggios.com> wrote:
>
>
>> I have tested the tuned scenario where enabling capabilities fails -
>> the erroneous CPU is stopped/powered down and the system continues to
>> work fine without it. Although I still don't understand why indeed I
>> needed to debug this I'll add the fix in v4.
>
>
> I don't understand your last sentence. What did you mean?
>

I still don't understand how can this scenario happen practically at
runtime (a scenario without me tuning the failure for testing). Memory
allocation failure shouldn't be possible at this point and any
workaround related failure would have happened at boot. I don't see
anything else or my understanding may not be correct...
However, if some new capability would be introduced one day and it
could fail, it would be good if we don't have to go back and fix this
patch.
At the end, it doesn't matter since the system will be able to
properly deal with such a scenario with the fix.

Thanks,
Mirela

> Cheers,
>
> --
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-10 13:24           ` Mirela Simonovic
  2018-05-10 13:29             ` Julien Grall
@ 2018-05-10 14:25             ` Dario Faggioli
  2018-05-10 15:00               ` Mirela Simonovic
  1 sibling, 1 reply; 53+ messages in thread
From: Dario Faggioli @ 2018-05-10 14:25 UTC (permalink / raw)
  To: Mirela Simonovic, Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel


[-- Attachment #1.1: Type: text/plain, Size: 3182 bytes --]

On Thu, 2018-05-10 at 15:24 +0200, Mirela Simonovic wrote:
> On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic
> 
> > Please take a look at function cpu_schedule_callback in schedule.c.
> > Within switch, case CPU_DEAD doesn't have a break, causing the
> > bellow
> > CPU_UP_CANCELED to execute as well when the CPU goes down. This
> > looks
> > wrong to me.
> > Dario, could you please confirm that this is a bug? Otherwise could
> > you please confirm the reasoning beyond?
> > 
> 
> Dario sorry, this looked wrong in my scenario but actually it is
> correct. I understand the purpose of the missing break now.
> 
No problem.

> For the curious ones (if any) here is detailed description: errata
> notifier added in this patch had the same priority as scheduler
> notifier. I though priority doesn't matter, but I was wrong. In this
> particular scenario where a CPU fails to enable capabilities
> (triggered by errata notifier added in this patch), scheduler
> callback
> executed before the errata callback for CPU_STARTING event. 
>
So, you're adding your errata callback to the CPU_STARTING notifier,
right? (Sorry for having to ask, I don't have the patch handy right
now.)

> In other
> words, scheduler already called init_pdata before the errata callback
> fired (and stopped the CPU).
> Later on when errata callback fired, enabling of capabilities has
> failed, so the erroneous non-boot CPU stopped itself and never
> declared to be online.
> Then CPU#0 fired new notification with CPU_UP_CANCELED event in order
> to clean up for the job done on CPU_STARTING. However, this broke the
> assumption (which is good) made in cpu_schedule_callback. The
> assumption is that the sequence of steps should be
> alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata. In this
> particular case deinit_pdata was not done because this would be done
> only upon CPU_DEAD event which makes no sense in this scenario.
> In order to avoid running into the invalid scenario described above,
> the errata callback should fire before the scheduler callback. If
> enabling capabilities fails, the scheduler callback for CPU_STARTING
> will never execute afterwards, so the following CPU_UP_CANCELED event
> triggered by the CPU#0 will do free_pdata, which is ok because
> init_pdata was not executed and alloc_pdata-->free_pdata flow is also
> valid. Congratulations to the reader who reached this point :)
> 
Ok, but the flow is, AFAICR, CPU_UP_PREPARE->CPU_STARTING->CPU_ONLINE.

If you add your callback to CPU_UP_PREPARE, instead than to
CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without having
to fiddle with priorities.

Is there any reason why you can't do it that way? It would look more
natural to me, and it's definitely going to be easier debug and
maintain (e.g., look at how many callbacks CPU_UP_PREPARE has, as
compared to CPU_STARTING ;-P).

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-10 14:12               ` Mirela Simonovic
@ 2018-05-10 14:35                 ` Julien Grall
  0 siblings, 0 replies; 53+ messages in thread
From: Julien Grall @ 2018-05-10 14:35 UTC (permalink / raw)
  To: Mirela Simonovic
  Cc: Edgar E. Iglesias, Xen Devel, Stefano Stabellini, Davorin Mista,
	Dario Faggioli



On 10/05/18 15:12, Mirela Simonovic wrote:
> Hi Julien,
> 
> On Thu, May 10, 2018 at 3:29 PM, Julien Grall <julien.grall@arm.com> wrote:
>> Hi,
>>
>> On 05/10/2018 02:24 PM, Mirela Simonovic wrote:
>>>
>>> On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic
>>> <mirela.simonovic@aggios.com> wrote:
>>
>>
>>> I have tested the tuned scenario where enabling capabilities fails -
>>> the erroneous CPU is stopped/powered down and the system continues to
>>> work fine without it. Although I still don't understand why indeed I
>>> needed to debug this I'll add the fix in v4.
>>
>>
>> I don't understand your last sentence. What did you mean?
>>
> 
> I still don't understand how can this scenario happen practically at
> runtime (a scenario without me tuning the failure for testing).

error path are by nature hard to reach without tweaking the code. 
Imagine something that can only happen once every thousand boot. It is 
not often, but thanks to murphy that will only happen at the worst 
moment. And you know very well how it is a pain to debug :).

So I always tend to tweak the code in order to exercise major error path.

> Memory
> allocation failure shouldn't be possible at this point and any
> workaround related failure would have happened at boot.
> I don't see
> anything else or my understanding may not be correct...
> However, if some new capability would be introduced one day and it
> could fail, it would be good if we don't have to go back and fix this
> patch.

The memory failure was an example among the other. The whole point here 
is you can't extrapolate how the errata are implemented today for future 
one. I also want to limit the number of stop_cpu/panic over the code, so 
we have only place to make the decision if something is wrong.

> At the end, it doesn't matter since the system will be able to
> properly deal with such a scenario with the fix.

I would be ok having stop_cpu called here. Although my preference is 
returning an error and let the caller decides what to do.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-10 14:25             ` Dario Faggioli
@ 2018-05-10 15:00               ` Mirela Simonovic
  2018-05-10 15:13                 ` Julien Grall
  0 siblings, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-05-10 15:00 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Edgar E. Iglesias, Julien Grall, Stefano Stabellini,
	Davorin Mista, Xen Devel

Hi Dario,

On Thu, May 10, 2018 at 4:25 PM, Dario Faggioli <dfaggioli@suse.com> wrote:
> On Thu, 2018-05-10 at 15:24 +0200, Mirela Simonovic wrote:
>> On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic
>>
>> > Please take a look at function cpu_schedule_callback in schedule.c.
>> > Within switch, case CPU_DEAD doesn't have a break, causing the
>> > bellow
>> > CPU_UP_CANCELED to execute as well when the CPU goes down. This
>> > looks
>> > wrong to me.
>> > Dario, could you please confirm that this is a bug? Otherwise could
>> > you please confirm the reasoning beyond?
>> >
>>
>> Dario sorry, this looked wrong in my scenario but actually it is
>> correct. I understand the purpose of the missing break now.
>>
> No problem.
>
>> For the curious ones (if any) here is detailed description: errata
>> notifier added in this patch had the same priority as scheduler
>> notifier. I though priority doesn't matter, but I was wrong. In this
>> particular scenario where a CPU fails to enable capabilities
>> (triggered by errata notifier added in this patch), scheduler
>> callback
>> executed before the errata callback for CPU_STARTING event.
>>
> So, you're adding your errata callback to the CPU_STARTING notifier,
> right? (Sorry for having to ask, I don't have the patch handy right
> now.)
>
>> In other
>> words, scheduler already called init_pdata before the errata callback
>> fired (and stopped the CPU).
>> Later on when errata callback fired, enabling of capabilities has
>> failed, so the erroneous non-boot CPU stopped itself and never
>> declared to be online.
>> Then CPU#0 fired new notification with CPU_UP_CANCELED event in order
>> to clean up for the job done on CPU_STARTING. However, this broke the
>> assumption (which is good) made in cpu_schedule_callback. The
>> assumption is that the sequence of steps should be
>> alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata. In this
>> particular case deinit_pdata was not done because this would be done
>> only upon CPU_DEAD event which makes no sense in this scenario.
>> In order to avoid running into the invalid scenario described above,
>> the errata callback should fire before the scheduler callback. If
>> enabling capabilities fails, the scheduler callback for CPU_STARTING
>> will never execute afterwards, so the following CPU_UP_CANCELED event
>> triggered by the CPU#0 will do free_pdata, which is ok because
>> init_pdata was not executed and alloc_pdata-->free_pdata flow is also
>> valid. Congratulations to the reader who reached this point :)
>>
> Ok, but the flow is, AFAICR, CPU_UP_PREPARE->CPU_STARTING->CPU_ONLINE.
>
> If you add your callback to CPU_UP_PREPARE, instead than to
> CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without having
> to fiddle with priorities.
>

Difference between CPU_UP_PREPARE and CPU_STARTING (in addition to the
sequential ordering) is about which CPU executes the callback.
In CPU_UP_PREPARE case the CPU which called cpu_up for another CPU
will execute the callback. If I have 2 CPUs: CPU#0 executes callback
when trying to hotplug CPU#1. I need CPU#1 to execute this callback.
In CPU_STARTING case the CPU#1 will execute the callback, that is the
reason why it has to be CPU_STARTING event.

I tried CPU_UP_PREPARE in my tuned scenario and I needed few moment to
realize why the system died (CPU#0 stopped himself :)

> Is there any reason why you can't do it that way? It would look more
> natural to me, and it's definitely going to be easier debug and
> maintain (e.g., look at how many callbacks CPU_UP_PREPARE has, as
> compared to CPU_STARTING ;-P).
>

Julien is going to maintain this :)))

> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Software Engineer @ SUSE https://www.suse.com/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-10 15:00               ` Mirela Simonovic
@ 2018-05-10 15:13                 ` Julien Grall
  2018-05-10 15:49                   ` Mirela Simonovic
  2018-05-10 16:12                   ` Dario Faggioli
  0 siblings, 2 replies; 53+ messages in thread
From: Julien Grall @ 2018-05-10 15:13 UTC (permalink / raw)
  To: Mirela Simonovic, Dario Faggioli
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel



On 10/05/18 16:00, Mirela Simonovic wrote:
> Hi Dario,
> 
> On Thu, May 10, 2018 at 4:25 PM, Dario Faggioli <dfaggioli@suse.com> wrote:
>> On Thu, 2018-05-10 at 15:24 +0200, Mirela Simonovic wrote:
>>> On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic
>>>
>>>> Please take a look at function cpu_schedule_callback in schedule.c.
>>>> Within switch, case CPU_DEAD doesn't have a break, causing the
>>>> bellow
>>>> CPU_UP_CANCELED to execute as well when the CPU goes down. This
>>>> looks
>>>> wrong to me.
>>>> Dario, could you please confirm that this is a bug? Otherwise could
>>>> you please confirm the reasoning beyond?
>>>>
>>>
>>> Dario sorry, this looked wrong in my scenario but actually it is
>>> correct. I understand the purpose of the missing break now.
>>>
>> No problem.
>>
>>> For the curious ones (if any) here is detailed description: errata
>>> notifier added in this patch had the same priority as scheduler
>>> notifier. I though priority doesn't matter, but I was wrong. In this
>>> particular scenario where a CPU fails to enable capabilities
>>> (triggered by errata notifier added in this patch), scheduler
>>> callback
>>> executed before the errata callback for CPU_STARTING event.
>>>
>> So, you're adding your errata callback to the CPU_STARTING notifier,
>> right? (Sorry for having to ask, I don't have the patch handy right
>> now.)
>>
>>> In other
>>> words, scheduler already called init_pdata before the errata callback
>>> fired (and stopped the CPU).
>>> Later on when errata callback fired, enabling of capabilities has
>>> failed, so the erroneous non-boot CPU stopped itself and never
>>> declared to be online.
>>> Then CPU#0 fired new notification with CPU_UP_CANCELED event in order
>>> to clean up for the job done on CPU_STARTING. However, this broke the
>>> assumption (which is good) made in cpu_schedule_callback. The
>>> assumption is that the sequence of steps should be
>>> alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata. In this
>>> particular case deinit_pdata was not done because this would be done
>>> only upon CPU_DEAD event which makes no sense in this scenario.
>>> In order to avoid running into the invalid scenario described above,
>>> the errata callback should fire before the scheduler callback. If
>>> enabling capabilities fails, the scheduler callback for CPU_STARTING
>>> will never execute afterwards, so the following CPU_UP_CANCELED event
>>> triggered by the CPU#0 will do free_pdata, which is ok because
>>> init_pdata was not executed and alloc_pdata-->free_pdata flow is also
>>> valid. Congratulations to the reader who reached this point :)
>>>
>> Ok, but the flow is, AFAICR, CPU_UP_PREPARE->CPU_STARTING->CPU_ONLINE.
>>
>> If you add your callback to CPU_UP_PREPARE, instead than to
>> CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without having
>> to fiddle with priorities.
This function will enable capabilities on a given CPU, most of them are 
touching system registers. So it is necessary to add the callback to 
CPU_STARTING.

>>
> 
> Difference between CPU_UP_PREPARE and CPU_STARTING (in addition to the
> sequential ordering) is about which CPU executes the callback.
> In CPU_UP_PREPARE case the CPU which called cpu_up for another CPU
> will execute the callback. If I have 2 CPUs: CPU#0 executes callback
> when trying to hotplug CPU#1. I need CPU#1 to execute this callback.
> In CPU_STARTING case the CPU#1 will execute the callback, that is the
> reason why it has to be CPU_STARTING event.
> 
> I tried CPU_UP_PREPARE in my tuned scenario and I needed few moment to
> realize why the system died (CPU#0 stopped himself :)
> 
>> Is there any reason why you can't do it that way? It would look more
>> natural to me, and it's definitely going to be easier debug and
>> maintain (e.g., look at how many callbacks CPU_UP_PREPARE has, as
>> compared to CPU_STARTING ;-P).
>>
> 
> Julien is going to maintain this :)))

I always like to understand what I maintain :). Why do you need to 
change the priority if you just return an error in the notifier?

At the moment, notify_cpu_starting has a BUG_ON() in it. But ideally I 
would like to either replace that BUG_ON by cpu_stop or just returning 
an error to give a chance of the architecture deciding what to do (this 
does not have to be done today).

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-10 15:13                 ` Julien Grall
@ 2018-05-10 15:49                   ` Mirela Simonovic
  2018-05-10 16:02                     ` Julien Grall
  2018-05-10 16:24                     ` Dario Faggioli
  2018-05-10 16:12                   ` Dario Faggioli
  1 sibling, 2 replies; 53+ messages in thread
From: Mirela Simonovic @ 2018-05-10 15:49 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Xen Devel, Davorin Mista,
	Dario Faggioli

Hi Julien,

On Thu, May 10, 2018 at 5:13 PM, Julien Grall <julien.grall@arm.com> wrote:
>
>
> On 10/05/18 16:00, Mirela Simonovic wrote:
>>
>> Hi Dario,
>>
>> On Thu, May 10, 2018 at 4:25 PM, Dario Faggioli <dfaggioli@suse.com>
>> wrote:
>>>
>>> On Thu, 2018-05-10 at 15:24 +0200, Mirela Simonovic wrote:
>>>>
>>>> On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic
>>>>
>>>>> Please take a look at function cpu_schedule_callback in schedule.c.
>>>>> Within switch, case CPU_DEAD doesn't have a break, causing the
>>>>> bellow
>>>>> CPU_UP_CANCELED to execute as well when the CPU goes down. This
>>>>> looks
>>>>> wrong to me.
>>>>> Dario, could you please confirm that this is a bug? Otherwise could
>>>>> you please confirm the reasoning beyond?
>>>>>
>>>>
>>>> Dario sorry, this looked wrong in my scenario but actually it is
>>>> correct. I understand the purpose of the missing break now.
>>>>
>>> No problem.
>>>
>>>> For the curious ones (if any) here is detailed description: errata
>>>> notifier added in this patch had the same priority as scheduler
>>>> notifier. I though priority doesn't matter, but I was wrong. In this
>>>> particular scenario where a CPU fails to enable capabilities
>>>> (triggered by errata notifier added in this patch), scheduler
>>>> callback
>>>> executed before the errata callback for CPU_STARTING event.
>>>>
>>> So, you're adding your errata callback to the CPU_STARTING notifier,
>>> right? (Sorry for having to ask, I don't have the patch handy right
>>> now.)
>>>
>>>> In other
>>>> words, scheduler already called init_pdata before the errata callback
>>>> fired (and stopped the CPU).
>>>> Later on when errata callback fired, enabling of capabilities has
>>>> failed, so the erroneous non-boot CPU stopped itself and never
>>>> declared to be online.
>>>> Then CPU#0 fired new notification with CPU_UP_CANCELED event in order
>>>> to clean up for the job done on CPU_STARTING. However, this broke the
>>>> assumption (which is good) made in cpu_schedule_callback. The
>>>> assumption is that the sequence of steps should be
>>>> alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata. In this
>>>> particular case deinit_pdata was not done because this would be done
>>>> only upon CPU_DEAD event which makes no sense in this scenario.
>>>> In order to avoid running into the invalid scenario described above,
>>>> the errata callback should fire before the scheduler callback. If
>>>> enabling capabilities fails, the scheduler callback for CPU_STARTING
>>>> will never execute afterwards, so the following CPU_UP_CANCELED event
>>>> triggered by the CPU#0 will do free_pdata, which is ok because
>>>> init_pdata was not executed and alloc_pdata-->free_pdata flow is also
>>>> valid. Congratulations to the reader who reached this point :)
>>>>
>>> Ok, but the flow is, AFAICR, CPU_UP_PREPARE->CPU_STARTING->CPU_ONLINE.
>>>
>>> If you add your callback to CPU_UP_PREPARE, instead than to
>>> CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without having
>>> to fiddle with priorities.
>
> This function will enable capabilities on a given CPU, most of them are
> touching system registers. So it is necessary to add the callback to
> CPU_STARTING.
>
>>>
>>
>> Difference between CPU_UP_PREPARE and CPU_STARTING (in addition to the
>> sequential ordering) is about which CPU executes the callback.
>> In CPU_UP_PREPARE case the CPU which called cpu_up for another CPU
>> will execute the callback. If I have 2 CPUs: CPU#0 executes callback
>> when trying to hotplug CPU#1. I need CPU#1 to execute this callback.
>> In CPU_STARTING case the CPU#1 will execute the callback, that is the
>> reason why it has to be CPU_STARTING event.
>>
>> I tried CPU_UP_PREPARE in my tuned scenario and I needed few moment to
>> realize why the system died (CPU#0 stopped himself :)
>>
>>> Is there any reason why you can't do it that way? It would look more
>>> natural to me, and it's definitely going to be easier debug and
>>> maintain (e.g., look at how many callbacks CPU_UP_PREPARE has, as
>>> compared to CPU_STARTING ;-P).
>>>
>>
>> Julien is going to maintain this :)))
>
>
> I always like to understand what I maintain :). Why do you need to change
> the priority if you just return an error in the notifier?
>

Regardless of the fact that the notifier returns an error or not, I
believe it would be good and safe to set priority and document that
priority zero would cause racing issue in the scenario I debugged
today. I'm pretty sure that this discussion would be forgotten soon
and it really should be documented in code/comment.

In emails above I assumed we'll stop the erroneous CPU. I didn't have
a chance to try returning an error until few minutes ago.
I tried returning an error from the notifier now and the whole system
fails. You realized according to the answer below that this is going
to happen.

> At the moment, notify_cpu_starting has a BUG_ON() in it. But ideally I would
> like to either replace that BUG_ON by cpu_stop or just returning an error to
> give a chance of the architecture deciding what to do (this does not have to
> be done today).
>

I would rather stop CPU because changing notify_cpu_starting affects
x86 as well, I cannot dig into that and it would be really to much for
this series. Since you're fine with stopping cpu as well, please lets
do that instead of escalating this to who knows where :)

Thanks,
Mirela

> Cheers,
>
> --
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-10 15:49                   ` Mirela Simonovic
@ 2018-05-10 16:02                     ` Julien Grall
  2018-05-10 16:21                       ` Dario Faggioli
  2018-05-10 16:24                     ` Dario Faggioli
  1 sibling, 1 reply; 53+ messages in thread
From: Julien Grall @ 2018-05-10 16:02 UTC (permalink / raw)
  To: Mirela Simonovic
  Cc: Edgar E. Iglesias, Stefano Stabellini, Xen Devel, Davorin Mista,
	Dario Faggioli



On 10/05/18 16:49, Mirela Simonovic wrote:
> Hi Julien,

Hi,

> On Thu, May 10, 2018 at 5:13 PM, Julien Grall <julien.grall@arm.com> wrote:
>>
>>
>> On 10/05/18 16:00, Mirela Simonovic wrote:
>>>
>>> Hi Dario,
>>>
>>> On Thu, May 10, 2018 at 4:25 PM, Dario Faggioli <dfaggioli@suse.com>
>>> wrote:
>>>>
>>>> On Thu, 2018-05-10 at 15:24 +0200, Mirela Simonovic wrote:
>>>>>
>>>>> On Thu, May 10, 2018 at 1:57 PM, Mirela Simonovic
>>>>>
>>>>>> Please take a look at function cpu_schedule_callback in schedule.c.
>>>>>> Within switch, case CPU_DEAD doesn't have a break, causing the
>>>>>> bellow
>>>>>> CPU_UP_CANCELED to execute as well when the CPU goes down. This
>>>>>> looks
>>>>>> wrong to me.
>>>>>> Dario, could you please confirm that this is a bug? Otherwise could
>>>>>> you please confirm the reasoning beyond?
>>>>>>
>>>>>
>>>>> Dario sorry, this looked wrong in my scenario but actually it is
>>>>> correct. I understand the purpose of the missing break now.
>>>>>
>>>> No problem.
>>>>
>>>>> For the curious ones (if any) here is detailed description: errata
>>>>> notifier added in this patch had the same priority as scheduler
>>>>> notifier. I though priority doesn't matter, but I was wrong. In this
>>>>> particular scenario where a CPU fails to enable capabilities
>>>>> (triggered by errata notifier added in this patch), scheduler
>>>>> callback
>>>>> executed before the errata callback for CPU_STARTING event.
>>>>>
>>>> So, you're adding your errata callback to the CPU_STARTING notifier,
>>>> right? (Sorry for having to ask, I don't have the patch handy right
>>>> now.)
>>>>
>>>>> In other
>>>>> words, scheduler already called init_pdata before the errata callback
>>>>> fired (and stopped the CPU).
>>>>> Later on when errata callback fired, enabling of capabilities has
>>>>> failed, so the erroneous non-boot CPU stopped itself and never
>>>>> declared to be online.
>>>>> Then CPU#0 fired new notification with CPU_UP_CANCELED event in order
>>>>> to clean up for the job done on CPU_STARTING. However, this broke the
>>>>> assumption (which is good) made in cpu_schedule_callback. The
>>>>> assumption is that the sequence of steps should be
>>>>> alloc_pdata-->init_pdata-->deinit_pdata-->free_pdata. In this
>>>>> particular case deinit_pdata was not done because this would be done
>>>>> only upon CPU_DEAD event which makes no sense in this scenario.
>>>>> In order to avoid running into the invalid scenario described above,
>>>>> the errata callback should fire before the scheduler callback. If
>>>>> enabling capabilities fails, the scheduler callback for CPU_STARTING
>>>>> will never execute afterwards, so the following CPU_UP_CANCELED event
>>>>> triggered by the CPU#0 will do free_pdata, which is ok because
>>>>> init_pdata was not executed and alloc_pdata-->free_pdata flow is also
>>>>> valid. Congratulations to the reader who reached this point :)
>>>>>
>>>> Ok, but the flow is, AFAICR, CPU_UP_PREPARE->CPU_STARTING->CPU_ONLINE.
>>>>
>>>> If you add your callback to CPU_UP_PREPARE, instead than to
>>>> CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without having
>>>> to fiddle with priorities.
>>
>> This function will enable capabilities on a given CPU, most of them are
>> touching system registers. So it is necessary to add the callback to
>> CPU_STARTING.
>>
>>>>
>>>
>>> Difference between CPU_UP_PREPARE and CPU_STARTING (in addition to the
>>> sequential ordering) is about which CPU executes the callback.
>>> In CPU_UP_PREPARE case the CPU which called cpu_up for another CPU
>>> will execute the callback. If I have 2 CPUs: CPU#0 executes callback
>>> when trying to hotplug CPU#1. I need CPU#1 to execute this callback.
>>> In CPU_STARTING case the CPU#1 will execute the callback, that is the
>>> reason why it has to be CPU_STARTING event.
>>>
>>> I tried CPU_UP_PREPARE in my tuned scenario and I needed few moment to
>>> realize why the system died (CPU#0 stopped himself :)
>>>
>>>> Is there any reason why you can't do it that way? It would look more
>>>> natural to me, and it's definitely going to be easier debug and
>>>> maintain (e.g., look at how many callbacks CPU_UP_PREPARE has, as
>>>> compared to CPU_STARTING ;-P).
>>>>
>>>
>>> Julien is going to maintain this :)))
>>
>>
>> I always like to understand what I maintain :). Why do you need to change
>> the priority if you just return an error in the notifier?
>>
> 
> Regardless of the fact that the notifier returns an error or not, I
> believe it would be good and safe to set priority and document that
> priority zero would cause racing issue in the scenario I debugged
> today. I'm pretty sure that this discussion would be forgotten soon
> and it really should be documented in code/comment.
> 
> In emails above I assumed we'll stop the erroneous CPU. I didn't have
> a chance to try returning an error until few minutes ago.
> I tried returning an error from the notifier now and the whole system
> fails. You realized according to the answer below that this is going
> to happen.

I was aware about it since the beginning. The whole point of the 
conversation was we should avoid to take the decision at the lower level 
and let the upper layer decide what to do.

If the system is failing today then that's fine and still fit what I 
said in my first e-mail of that thread. For reminder:

"We should really avoid to use panic(...) if this is something the 
system can survive. In that specific case, it would only affect the 
current CPU. So it would be better to return an error and let the caller 
decide what to do."

> 
>> At the moment, notify_cpu_starting has a BUG_ON() in it. But ideally I would
>> like to either replace that BUG_ON by cpu_stop or just returning an error to
>> give a chance of the architecture deciding what to do (this does not have to
>> be done today).
>>
> 
> I would rather stop CPU because changing notify_cpu_starting affects
> x86 as well, I cannot dig into that and it would be really to much for
> this series. Since you're fine with stopping cpu as well, please lets
> do that instead of escalating this to who knows where :)

What's the problem to escalate the error? That's how it works in most of 
the case. Imagine a driver failing, are you going to decide to crash 
there? Very likely no, you will report the error and let the upper layer 
here.

Also, while I suggest that it could be replaced by stop_cpu() in the 
common code, I also suggested that notifier_cpu_starting() could return 
an error then the architecture specific code can decide what to do.

On x86 it would still be a BUG_ON(notifier_cpu_starting()). On Arm we 
can decide what to do. But it is not part of that discussion here.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-10 15:13                 ` Julien Grall
  2018-05-10 15:49                   ` Mirela Simonovic
@ 2018-05-10 16:12                   ` Dario Faggioli
  1 sibling, 0 replies; 53+ messages in thread
From: Dario Faggioli @ 2018-05-10 16:12 UTC (permalink / raw)
  To: Julien Grall, Mirela Simonovic
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel


[-- Attachment #1.1: Type: text/plain, Size: 2282 bytes --]

On Thu, 2018-05-10 at 16:13 +0100, Julien Grall wrote:
> On 10/05/18 16:00, Mirela Simonovic wrote:
> > > If you add your callback to CPU_UP_PREPARE, instead than to
> > > CPU_STARTING, SCHED_OP(init_pdata) wouldn't be called, without
> > > having
> > > to fiddle with priorities.
> 
> This function will enable capabilities on a given CPU, most of them
> are 
> touching system registers. So it is necessary to add the callback to 
> CPU_STARTING.
> 
Right, I guess that answers my question.

> I always like to understand what I maintain :). Why do you need to 
> change the priority if you just return an error in the notifier?
> 
> At the moment, notify_cpu_starting has a BUG_ON() in it. But ideally
> I 
> would like to either replace that BUG_ON by cpu_stop or just
> returning 
> an error to give a chance of the architecture deciding what to do
> (this 
> does not have to be done today).
> 
The problem is that, currently, once we've reached CPU_STARTING, we
assume that the CPU bringup has gone ok, and things can't fail.

Therefore, the only place when we undo what CPU_STARTING does is during
CPU_DEAD, i.e., during hotunplug/suspend/teardown.

I understand the point of having to run stuff on the CPU that is coming
up, as well as your more general point.

However, I don't know whether allowing CPU_STARTING to fail is the best
way to achieve what you want to achieve, nor whether the BUG_ON in
notify_cpu_starting() is the only issue you'll face trying to do that.

I'd say that, if CPU_STARTING can fail, we need an appropriate rollback
step, i.e., the flow must become something like (but I'd appreciate the
opinion of x86 and core hypervisor maintainers about this):

CPU_UP_PREPARE --> CPU_STARTING xx> CPU_DIDNT_START --> CPU_UP_CANCEL

At which point, e.g., from the scheduler point of view, we can try to
put a call to SCHED_OP(deinit_pdata) in CPU_DIDNT_START, and that would
avoid the problem Mirela is facing, without having to play with
priorities.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-10 16:02                     ` Julien Grall
@ 2018-05-10 16:21                       ` Dario Faggioli
  0 siblings, 0 replies; 53+ messages in thread
From: Dario Faggioli @ 2018-05-10 16:21 UTC (permalink / raw)
  To: Julien Grall, Mirela Simonovic
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel


[-- Attachment #1.1: Type: text/plain, Size: 2054 bytes --]

On Thu, 2018-05-10 at 17:02 +0100, Julien Grall wrote:
> On 10/05/18 16:49, Mirela Simonovic wrote:
> > Regardless of the fact that the notifier returns an error or not, I
> > believe it would be good and safe to set priority and document that
> > priority zero would cause racing issue in the scenario I debugged
> > today. I'm pretty sure that this discussion would be forgotten soon
> > and it really should be documented in code/comment.
> > 
> > In emails above I assumed we'll stop the erroneous CPU. I didn't
> > have
> > a chance to try returning an error until few minutes ago.
> > I tried returning an error from the notifier now and the whole
> > system
> > fails. You realized according to the answer below that this is
> > going
> > to happen.
> 
> I was aware about it since the beginning. The whole point of the 
> conversation was we should avoid to take the decision at the lower
> level 
> and let the upper layer decide what to do.
> 
This makes sense to me.

> > I would rather stop CPU because changing notify_cpu_starting
> > affects
> > x86 as well, I cannot dig into that and it would be really to much
> > for
> > this series. Since you're fine with stopping cpu as well, please
> > lets
> > do that instead of escalating this to who knows where :)
> 
> Also, while I suggest that it could be replaced by stop_cpu() in the 
> common code, I also suggested that notifier_cpu_starting() could
> return 
> an error then the architecture specific code can decide what to do.
> 
> On x86 it would still be a BUG_ON(notifier_cpu_starting()). On Arm
> we 
> can decide what to do. But it is not part of that discussion here.
> 
As just saind in the other email, I don't think this is all it's
necessary to enable CPU_STARTING to fail.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-10 15:49                   ` Mirela Simonovic
  2018-05-10 16:02                     ` Julien Grall
@ 2018-05-10 16:24                     ` Dario Faggioli
  2018-05-11 10:41                       ` Mirela Simonovic
  1 sibling, 1 reply; 53+ messages in thread
From: Dario Faggioli @ 2018-05-10 16:24 UTC (permalink / raw)
  To: Mirela Simonovic, Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel


[-- Attachment #1.1: Type: text/plain, Size: 933 bytes --]

On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote:
> Regardless of the fact that the notifier returns an error or not, I
> believe it would be good and safe to set priority and document that
> priority zero would cause racing issue in the scenario I debugged
> today. I'm pretty sure that this discussion would be forgotten soon
> and it really should be documented in code/comment.
> 
I may very well be missing or misunderstanding something, but I
continue to think that the problem here is that CPU_STARTING can't,
right now, fail, while you need it to be able to.

If that is the case, giving different priorities to the notifier, is
not a solution.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-10 16:24                     ` Dario Faggioli
@ 2018-05-11 10:41                       ` Mirela Simonovic
  2018-05-11 10:54                         ` Julien Grall
  2018-05-11 13:01                         ` Dario Faggioli
  0 siblings, 2 replies; 53+ messages in thread
From: Mirela Simonovic @ 2018-05-11 10:41 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Edgar E. Iglesias, Julien Grall, Stefano Stabellini,
	Davorin Mista, Xen Devel

Hi Dario,

On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli <dfaggioli@suse.com> wrote:
> On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote:
>> Regardless of the fact that the notifier returns an error or not, I
>> believe it would be good and safe to set priority and document that
>> priority zero would cause racing issue in the scenario I debugged
>> today. I'm pretty sure that this discussion would be forgotten soon
>> and it really should be documented in code/comment.
>>
> I may very well be missing or misunderstanding something, but I
> continue to think that the problem here is that CPU_STARTING can't,
> right now, fail, while you need it to be able to.
>
> If that is the case, giving different priorities to the notifier, is
> not a solution.
>

Let me try to clarify. The assumption is that the starting CPU can
fail. Additional requirement set by Julien is that panic or BUG_ON is
not acceptable. There are 2 ways to deal with this scenario:

1) Ignore and report the error, and let the erroneous CPU become online.
This cannot be done without changing the logic in either scheduler or
notify_cpu_starting(), or I least I don't see how would that be done.
In previous email when I said "escalating this to who knows where" I
did not refer to error escalation but the escalation of the scope of
this series.

2) Stop the erroneous CPU.
Taking this approach requires setting the priority for the notifier.
The key point is that notify_cpu_starting() and scheduler do not have
to be changed. If errata notifier has higher priority than scheduler's
notifier in the case of an error the CPU will not return into
notify_cpu_starting() and it will never execute BUG_ON because it will
be stopped. The rest of the system will continue to function without
that CPU.

> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Software Engineer @ SUSE https://www.suse.com/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-11 10:41                       ` Mirela Simonovic
@ 2018-05-11 10:54                         ` Julien Grall
  2018-05-11 12:07                           ` Mirela Simonovic
  2018-05-11 13:12                           ` Dario Faggioli
  2018-05-11 13:01                         ` Dario Faggioli
  1 sibling, 2 replies; 53+ messages in thread
From: Julien Grall @ 2018-05-11 10:54 UTC (permalink / raw)
  To: Mirela Simonovic, Dario Faggioli
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel



On 11/05/18 11:41, Mirela Simonovic wrote:
> Hi Dario,
> 
> On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli <dfaggioli@suse.com> wrote:
>> On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote:
>>> Regardless of the fact that the notifier returns an error or not, I
>>> believe it would be good and safe to set priority and document that
>>> priority zero would cause racing issue in the scenario I debugged
>>> today. I'm pretty sure that this discussion would be forgotten soon
>>> and it really should be documented in code/comment.
>>>
>> I may very well be missing or misunderstanding something, but I
>> continue to think that the problem here is that CPU_STARTING can't,
>> right now, fail, while you need it to be able to.
>>
>> If that is the case, giving different priorities to the notifier, is
>> not a solution.
>>
> 
> Let me try to clarify. The assumption is that the starting CPU can
> fail. Additional requirement set by Julien is that panic or BUG_ON is
> not acceptable.

Please don't twist my word. I never said it was not acceptable to have 
the BUG_ON in notify_cpu_starting().

I am going to repeat the content of my answer to your last e-mail:

I was aware about it since the beginning. The whole point of the 
conversation was we should avoid to take the decision at the lower level 
and let the upper layer decide what to do.

If the system is failing today then that's fine and still fit what I 
said in my first e-mail of that thread. For reminder:

"We should really avoid to use panic(...) if this is something the 
system can survive. In that specific case, it would only affect the 
current CPU. So it would be better to return an error and let the caller 
decide what to do."

To summarize:
	1) Notifiers should only report an error and let the upper caller (here 
notify_cpu_starting()) deciding what to do.
	2) I am OK with the BUG_ON in notify_cpu_starting() for now.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-11 10:54                         ` Julien Grall
@ 2018-05-11 12:07                           ` Mirela Simonovic
  2018-05-11 12:20                             ` Mirela Simonovic
  2018-05-11 13:12                           ` Dario Faggioli
  1 sibling, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-05-11 12:07 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Xen Devel, Davorin Mista,
	Dario Faggioli

Hi Julien,

On Fri, May 11, 2018 at 12:54 PM, Julien Grall <julien.grall@arm.com> wrote:
>
>
> On 11/05/18 11:41, Mirela Simonovic wrote:
>>
>> Hi Dario,
>>
>> On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli <dfaggioli@suse.com>
>> wrote:
>>>
>>> On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote:
>>>>
>>>> Regardless of the fact that the notifier returns an error or not, I
>>>> believe it would be good and safe to set priority and document that
>>>> priority zero would cause racing issue in the scenario I debugged
>>>> today. I'm pretty sure that this discussion would be forgotten soon
>>>> and it really should be documented in code/comment.
>>>>
>>> I may very well be missing or misunderstanding something, but I
>>> continue to think that the problem here is that CPU_STARTING can't,
>>> right now, fail, while you need it to be able to.
>>>
>>> If that is the case, giving different priorities to the notifier, is
>>> not a solution.
>>>
>>
>> Let me try to clarify. The assumption is that the starting CPU can
>> fail. Additional requirement set by Julien is that panic or BUG_ON is
>> not acceptable.
>
>
> Please don't twist my word. I never said it was not acceptable to have the
> BUG_ON in notify_cpu_starting().

I didn't say that either. You referenced notify_cpu_starting() here, not me.
BTW, if you get the impression that I'm twisting words then we have a
misunderstanding and your approach is not the best way toward
clarifying it. Please have that in mind next time, because I do not
have such an intent and I never will.

I referred to panic/BUG_ON in this scenario regardless of the place
from which it would be invoked. My understanding from your previous
answers is that you think the system should not panic/BUG_ON in this
scenario, because this kind of error the system should be able to
survive.

>
> I am going to repeat the content of my answer to your last e-mail:
>
> I was aware about it since the beginning. The whole point of the
> conversation was we should avoid to take the decision at the lower level and
> let the upper layer decide what to do.
>
> If the system is failing today then that's fine and still fit what I said in
> my first e-mail of that thread. For reminder:
>
> "We should really avoid to use panic(...) if this is something the system
> can survive. In that specific case, it would only affect the current CPU. So
> it would be better to return an error and let the caller decide what to do."
>
> To summarize:
>         1) Notifiers should only report an error and let the upper caller
> (here notify_cpu_starting()) deciding what to do.
>         2) I am OK with the BUG_ON in notify_cpu_starting() for now.

I agree with BUG_ON in notify_cpu_starting().

>

Let me just clarify consequence of your proposal according to my
understanding. If instead of stopping the CPU when enabling a
capability fails the notifier returns an error, the error will
propagate to notify_cpu_starting() and BUG_ON will crash the system.
The proposal with stop_cpu() in the enable_nonboot_cpu_caps() instead
of panic that is submitted in this patch would stop only the erroneous
CPU. The rest of the system will continue to work and I though that is
what's the goal since we don't want to panic/BUG_ON.
From that perspective I believe stop_cpu() in
enable_nonboot_cpu_caps() is better compared to returning an error
from the notifier.

You said above "I would be ok having stop_cpu called here" and I did
so (stop_cpu() in enable_nonboot_cpu_caps() instead of panic that
submitted in this patch).

If you believe my understanding is not correct, if I missed something
or you have another proposal please let me know.

Thanks,
Mirela

> Cheers,
>
> --
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-11 12:07                           ` Mirela Simonovic
@ 2018-05-11 12:20                             ` Mirela Simonovic
  2018-05-11 13:08                               ` Julien Grall
  0 siblings, 1 reply; 53+ messages in thread
From: Mirela Simonovic @ 2018-05-11 12:20 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Xen Devel, Davorin Mista,
	Dario Faggioli

Hi,

On Fri, May 11, 2018 at 2:07 PM, Mirela Simonovic
<mirela.simonovic@aggios.com> wrote:
> Hi Julien,
>
> On Fri, May 11, 2018 at 12:54 PM, Julien Grall <julien.grall@arm.com> wrote:
>>
>>
>> On 11/05/18 11:41, Mirela Simonovic wrote:
>>>
>>> Hi Dario,
>>>
>>> On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli <dfaggioli@suse.com>
>>> wrote:
>>>>
>>>> On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote:
>>>>>
>>>>> Regardless of the fact that the notifier returns an error or not, I
>>>>> believe it would be good and safe to set priority and document that
>>>>> priority zero would cause racing issue in the scenario I debugged
>>>>> today. I'm pretty sure that this discussion would be forgotten soon
>>>>> and it really should be documented in code/comment.
>>>>>
>>>> I may very well be missing or misunderstanding something, but I
>>>> continue to think that the problem here is that CPU_STARTING can't,
>>>> right now, fail, while you need it to be able to.
>>>>
>>>> If that is the case, giving different priorities to the notifier, is
>>>> not a solution.
>>>>
>>>
>>> Let me try to clarify. The assumption is that the starting CPU can
>>> fail. Additional requirement set by Julien is that panic or BUG_ON is
>>> not acceptable.
>>
>>
>> Please don't twist my word. I never said it was not acceptable to have the
>> BUG_ON in notify_cpu_starting().
>
> I didn't say that either. You referenced notify_cpu_starting() here, not me.
> BTW, if you get the impression that I'm twisting words then we have a
> misunderstanding and your approach is not the best way toward
> clarifying it. Please have that in mind next time, because I do not
> have such an intent and I never will.
>
> I referred to panic/BUG_ON in this scenario regardless of the place
> from which it would be invoked. My understanding from your previous
> answers is that you think the system should not panic/BUG_ON in this
> scenario, because this kind of error the system should be able to
> survive.
>
>>
>> I am going to repeat the content of my answer to your last e-mail:
>>
>> I was aware about it since the beginning. The whole point of the
>> conversation was we should avoid to take the decision at the lower level and
>> let the upper layer decide what to do.
>>
>> If the system is failing today then that's fine and still fit what I said in
>> my first e-mail of that thread. For reminder:
>>
>> "We should really avoid to use panic(...) if this is something the system
>> can survive. In that specific case, it would only affect the current CPU. So
>> it would be better to return an error and let the caller decide what to do."
>>
>> To summarize:
>>         1) Notifiers should only report an error and let the upper caller
>> (here notify_cpu_starting()) deciding what to do.
>>         2) I am OK with the BUG_ON in notify_cpu_starting() for now.
>
> I agree with BUG_ON in notify_cpu_starting().
>
>>
>
> Let me just clarify consequence of your proposal according to my
> understanding. If instead of stopping the CPU when enabling a
> capability fails the notifier returns an error, the error will
> propagate to notify_cpu_starting() and BUG_ON will crash the system.
> The proposal with stop_cpu() in the enable_nonboot_cpu_caps() instead
> of panic that is submitted in this patch would stop only the erroneous
> CPU. The rest of the system will continue to work and I though that is
> what's the goal since we don't want to panic/BUG_ON.
> From that perspective I believe stop_cpu() in
> enable_nonboot_cpu_caps() is better compared to returning an error
> from the notifier.
>
> You said above "I would be ok having stop_cpu called here" and I did
> so (stop_cpu() in enable_nonboot_cpu_caps() instead of panic that
> submitted in this patch).
>
> If you believe my understanding is not correct, if I missed something
> or you have another proposal please let me know.
>

Also, if you just want to convert panic from this patch into print I
don't believe it's a good approach, but I can do that.

> Thanks,
> Mirela
>
>> Cheers,
>>
>> --
>> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-11 10:41                       ` Mirela Simonovic
  2018-05-11 10:54                         ` Julien Grall
@ 2018-05-11 13:01                         ` Dario Faggioli
  1 sibling, 0 replies; 53+ messages in thread
From: Dario Faggioli @ 2018-05-11 13:01 UTC (permalink / raw)
  To: Mirela Simonovic
  Cc: Edgar E. Iglesias, Julien Grall, Stefano Stabellini,
	Davorin Mista, Xen Devel


[-- Attachment #1.1: Type: text/plain, Size: 3759 bytes --]

On Fri, 2018-05-11 at 12:41 +0200, Mirela Simonovic wrote:
> Hi Dario,
> 
Hi,

> On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli <dfaggioli@suse.com>
> wrote:
> > I may very well be missing or misunderstanding something, but I
> > continue to think that the problem here is that CPU_STARTING can't,
> > right now, fail, while you need it to be able to.
> > 
> > If that is the case, giving different priorities to the notifier,
> > is
> > not a solution.
> > 
> Let me try to clarify. The assumption is that the starting CPU can
> fail. Additional requirement set by Julien is that panic or BUG_ON is
> not acceptable. 
>
So, currently, in Xen, CPU bringup can't fail at the CPU_STARTING
phase. This is the point of the BUG_ON().

It is you that need it to change this, and make it possible for CPU
bringup to fail during CPU_STARTING. This is fine, but require changes,
which, IMO, are not limited to removing the BUG_ON() or trading it with
something else.

> There are 2 ways to deal with this scenario:
> 
> 1) Ignore and report the error, and let the erroneous CPU become
> online.
> This cannot be done without changing the logic in either scheduler or
> notify_cpu_starting(), or I least I don't see how would that be done.
> In previous email when I said "escalating this to who knows where" I
> did not refer to error escalation but the escalation of the scope of
> this series.
> 
How can that be an option? If CPU bringup failed, how is it possible to
let it go online?

To me, it's not that "we can't let a CPU for which bringup failed
continue without changing the scheduler or notify_cpu_starting()". It's
rather "we must not a CPU for which bringup faile continue. Period.".

Which is to say, you need to change things (in common code!) in such a
way that CPU bringup can fail during the CPU_STARTING phase.

> 2) Stop the erroneous CPU.
> Taking this approach requires setting the priority for the notifier.
>
No! Stop the CPU if the bringup fails duting the CPU_STARTING phase
does not require setting the priorities of the CPU_STARTING notifier. 

It requires to changing things (in common code) in such a way that CPU
bringup can fail during the CPU_STARTING phase. (Did I say that
already? :-) )

I understand that, if you set the priority, your series work. But that
does not mean that it is a proper solution to the problem. It means
that it is an hack that...well... makes your series work. :-)

> The key point is that notify_cpu_starting() and scheduler do not have
> to be changed. If errata notifier has higher priority than
> scheduler's
> notifier in the case of an error the CPU will not return into
> notify_cpu_starting() and it will never execute BUG_ON because it
> will
> be stopped. The rest of the system will continue to function without
> that CPU.
> 
Right. But now we have and architecture (ARM) for which CPU bringup can
fail at the CPU_STARTING phase. And yet, looking at common code, we see
that the CPU_STARTING notifier has a BUG_ON() if it ever fails. How
would people looking at this in 2 years time from now make sense of
this?

No, I don't think there really is a sensible workaround here. If you
need the CPU_STARTING phase of CPU bringup to be able to fail, you need
to make all the changes required to make that happen properly.

Which does involve changing common code, and which should therefore be
discussed with the other arches and core hypervisor maintainers.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-11 12:20                             ` Mirela Simonovic
@ 2018-05-11 13:08                               ` Julien Grall
  2018-05-11 13:31                                 ` Dario Faggioli
  0 siblings, 1 reply; 53+ messages in thread
From: Julien Grall @ 2018-05-11 13:08 UTC (permalink / raw)
  To: Mirela Simonovic
  Cc: Edgar E. Iglesias, Stefano Stabellini, Xen Devel, Davorin Mista,
	Dario Faggioli



On 11/05/18 13:20, Mirela Simonovic wrote:
> Hi,
> 
> On Fri, May 11, 2018 at 2:07 PM, Mirela Simonovic
> <mirela.simonovic@aggios.com> wrote:
>> On Fri, May 11, 2018 at 12:54 PM, Julien Grall <julien.grall@arm.com> wrote:
>>> On 11/05/18 11:41, Mirela Simonovic wrote:
>>>> On Thu, May 10, 2018 at 6:24 PM, Dario Faggioli <dfaggioli@suse.com>
>>>> wrote:
>>>>> On Thu, 2018-05-10 at 17:49 +0200, Mirela Simonovic wrote:
>>> I am going to repeat the content of my answer to your last e-mail:
>>>
>>> I was aware about it since the beginning. The whole point of the
>>> conversation was we should avoid to take the decision at the lower level and
>>> let the upper layer decide what to do.
>>>
>>> If the system is failing today then that's fine and still fit what I said in
>>> my first e-mail of that thread. For reminder:
>>>
>>> "We should really avoid to use panic(...) if this is something the system
>>> can survive. In that specific case, it would only affect the current CPU. So
>>> it would be better to return an error and let the caller decide what to do."
>>>
>>> To summarize:
>>>          1) Notifiers should only report an error and let the upper caller
>>> (here notify_cpu_starting()) deciding what to do.
>>>          2) I am OK with the BUG_ON in notify_cpu_starting() for now.
>>
>> I agree with BUG_ON in notify_cpu_starting().
>>
>>>
>>
>> Let me just clarify consequence of your proposal according to my
>> understanding. If instead of stopping the CPU when enabling a
>> capability fails the notifier returns an error, the error will
>> propagate to notify_cpu_starting() and BUG_ON will crash the system.
>> The proposal with stop_cpu() in the enable_nonboot_cpu_caps() instead
>> of panic that is submitted in this patch would stop only the erroneous
>> CPU. The rest of the system will continue to work and I though that is
>> what's the goal since we don't want to panic/BUG_ON.
>>  From that perspective I believe stop_cpu() in
>> enable_nonboot_cpu_caps() is better compared to returning an error
>> from the notifier.
>>
>> You said above "I would be ok having stop_cpu called here" and I did
>> so (stop_cpu() in enable_nonboot_cpu_caps() instead of panic that
>> submitted in this patch).

My thoughts have evolved after Dario's discussion. He expressed 
concerned over your fix to make stop_cpu() working.

As you said I will maintain that code and this solution looks very error 
prone. If Stefano is happy with it, then fine.

>>
>> If you believe my understanding is not correct, if I missed something
>> or you have another proposal please let me know.
>>
> 
> Also, if you just want to convert panic from this patch into print I
> don't believe it's a good approach, but I can do that.

I would prefer to see the notifier reporting the error with a warning 
and returning it.

At the notifier level it does not make sense to take the decision to 
stop the CPU or kill the system. This is a decision that should be taken 
at higher level such as in notify_cpu_starting().

The whole idea here is we have only one place taking the decision and we 
don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is having 
only one place to fix over multiple one because very likely the decision 
is the same everywhere.

I agree that today it will end up to crashing the system because of the 
BUG_ON. But that's a separate topic.

Cheers,

> 
>> Thanks,
>> Mirela
>>
>>> Cheers,
>>>
>>> --
>>> Julien Grall

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-11 10:54                         ` Julien Grall
  2018-05-11 12:07                           ` Mirela Simonovic
@ 2018-05-11 13:12                           ` Dario Faggioli
  1 sibling, 0 replies; 53+ messages in thread
From: Dario Faggioli @ 2018-05-11 13:12 UTC (permalink / raw)
  To: Julien Grall, Mirela Simonovic
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel


[-- Attachment #1.1: Type: text/plain, Size: 1456 bytes --]

On Fri, 2018-05-11 at 11:54 +0100, Julien Grall wrote:
> On 11/05/18 11:41, Mirela Simonovic wrote:
> "We should really avoid to use panic(...) if this is something the 
> system can survive. In that specific case, it would only affect the 
> current CPU. So it would be better to return an error and let the
> caller 
> decide what to do."
> 
> To summarize:
> 	1) Notifiers should only report an error and let the upper
> caller (here 
> notify_cpu_starting()) deciding what to do.
> 	2) I am OK with the BUG_ON in notify_cpu_starting() for now.
> 
And, in general, I agree with all this.

However (and I think I'm repeating this concept for, what, the 10th
time now?!?! :-P), in this specific case, the reason why none of the
existing CPU_STARTING callbacks report an error, is that the CPU
bringup process is, AFAICT, built around the assumption that once we
reached CPU_STARTING, things can't fail.

For this very reason, whatever new CPU_STARTING callback is introduced,
in this series or everywhere else, _can't_ fail, and _can't_ return any
error.

If we need to have a CPU_STARTING callback that can fail, we need to
change the CPU bringup process accordingly.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-11 13:08                               ` Julien Grall
@ 2018-05-11 13:31                                 ` Dario Faggioli
       [not found]                                   ` <CAKPH-Nj2znmdcjZEfFf83WmrcBS_u4R33yPOxAPWw37RHVZ38g@mail.gmail.com>
  2018-05-11 21:47                                   ` Stefano Stabellini
  0 siblings, 2 replies; 53+ messages in thread
From: Dario Faggioli @ 2018-05-11 13:31 UTC (permalink / raw)
  To: Julien Grall, Mirela Simonovic
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel


[-- Attachment #1.1: Type: text/plain, Size: 1227 bytes --]

On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote:
> The whole idea here is we have only one place taking the decision and
> we 
> don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is
> having 
> only one place to fix over multiple one because very likely the
> decision 
> is the same everywhere.
> 
> I agree that today it will end up to crashing the system because of
> the 
> BUG_ON. But that's a separate topic.
> 
Yes!!! :-D

I.e., as I've said countless times, I would think that a series which
introduces a CPU_STARTING notifier that fails, should also deal with
adjusting the CPU process accordingly.

*BUT* if you ARM people are ok with arch/arm/ code that does that,
perhaps with a comment saying something like:

"This will cause us to hit the BUG_ON() in notify_cpu_starting(). To
fix that, we need to properly change the CPU bringup code, which will
happen in a leter series."

that would also work, I guess. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
       [not found]                                   ` <CAKPH-Nj2znmdcjZEfFf83WmrcBS_u4R33yPOxAPWw37RHVZ38g@mail.gmail.com>
@ 2018-05-11 14:14                                     ` Dario Faggioli
  0 siblings, 0 replies; 53+ messages in thread
From: Dario Faggioli @ 2018-05-11 14:14 UTC (permalink / raw)
  To: Mirela Simonovic
  Cc: Edgar E. Iglesias, Julien Grall, Stefano Stabellini,
	Davorin Mista, Xen Devel


[-- Attachment #1.1: Type: text/plain, Size: 2791 bytes --]

On Fri, 2018-05-11 at 15:44 +0200, Mirela Simonovic wrote:
> Hi Dario and Julien,
> 
Err... you've dropped the list and everyone else but me. Re-adding...

> Thanks for the feedback to both. 
>
You're welcome. :-) 

> I think we need to roll back here. I
> believe the root cause of this is an attempt to do errata workarounds
> using notifiers.
> But let me try to enumerate all the options I see as possible
> solutions:
> 
> 1) Don't use notifiers to do errata workaround. Do it before
> CPU_STARTING fires, essentially from start_secondary() before calling
> notify_cpu_starting(). But we need to stop the CPU within
> start_secondary() if enabling errata fails. In start_secondary()
> stop_cpu is already done so I don't see why would an additional call
> be a problem.
> 
I'm no expert of ARM's start_secondary(). Indeed it looks like it can
fail already, so what you say seems to make sense, but really, I better
don't comment on this and leave it to ARM people.

> 2) Still try to use notifiers. We have few options here:
> 2.a) Enabling capability must not fail because a notifier at this
> stage should not fail. This would mean that function to which
> 'enable'
> ptr points (defined in struct arm_cpu_capabilities) has to return
> void
> instead of int. This doesn't seem right to me.
>
It's not.

> 2.b) Change scheduler and whatever else is needed (right place to
> refer to escalation of the scope of series :) in order to make
> CPU_STARTING possible to fail. 
>
Nope, this is just as wrong as 2.a.

> I'm not the person to do that since it
> affects way beyond what I suppose to do. Please note here that I'm
> not
> running away from doing the job. I'm just concerned that this will
> compromise the actual work I suppose to do from the funding/time
> perspective.
> 2.c) Return an error and hit BUG_ON. Add comments as Dario proposed.
> I
> need to state here that I probably won't be the one to implement the
> following series that allows CPU_STARTING to fail.
> 
Yes, this is correct, from the point of view of ARM vs common code
interaction.

Whether or not it is ok to leave this pending, is again up to ARM
people, IMO, as it would be ARM that risks panicing, if the notifier
fails.

> Option 1) or 2.c) sounds like a good compromise to me. What do you
> think?
> 
Well, all I can say is that you should stay away from notifiers
priority --especially of the one of cpu_schedule_callback(). As long as
you do that, I won't be on your way. :-D

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-11 13:31                                 ` Dario Faggioli
       [not found]                                   ` <CAKPH-Nj2znmdcjZEfFf83WmrcBS_u4R33yPOxAPWw37RHVZ38g@mail.gmail.com>
@ 2018-05-11 21:47                                   ` Stefano Stabellini
  2018-05-14  9:44                                     ` Julien Grall
  1 sibling, 1 reply; 53+ messages in thread
From: Stefano Stabellini @ 2018-05-11 21:47 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel,
	Julien Grall, Mirela Simonovic

On Fri, 11 May 2018, Dario Faggioli wrote:
> On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote:
> > The whole idea here is we have only one place taking the decision and
> > we 
> > don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is
> > having 
> > only one place to fix over multiple one because very likely the
> > decision 
> > is the same everywhere.
> > 
> > I agree that today it will end up to crashing the system because of
> > the 
> > BUG_ON. But that's a separate topic.
> > 
> Yes!!! :-D
> 
> I.e., as I've said countless times, I would think that a series which
> introduces a CPU_STARTING notifier that fails, should also deal with
> adjusting the CPU process accordingly.
> 
> *BUT* if you ARM people are ok with arch/arm/ code that does that,
> perhaps with a comment saying something like:
> 
> "This will cause us to hit the BUG_ON() in notify_cpu_starting(). To
> fix that, we need to properly change the CPU bringup code, which will
> happen in a leter series."
> 
> that would also work, I guess. :-)

Yes, I think that returning error with an in-code comment on top is the
best solution.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-11 21:47                                   ` Stefano Stabellini
@ 2018-05-14  9:44                                     ` Julien Grall
  2018-05-14 16:59                                       ` Stefano Stabellini
  0 siblings, 1 reply; 53+ messages in thread
From: Julien Grall @ 2018-05-14  9:44 UTC (permalink / raw)
  To: Stefano Stabellini, Dario Faggioli
  Cc: Edgar E. Iglesias, Xen Devel, Mirela Simonovic, Davorin Mista



On 11/05/18 22:47, Stefano Stabellini wrote:
> On Fri, 11 May 2018, Dario Faggioli wrote:
>> On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote:
>>> The whole idea here is we have only one place taking the decision and
>>> we
>>> don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is
>>> having
>>> only one place to fix over multiple one because very likely the
>>> decision
>>> is the same everywhere.
>>>
>>> I agree that today it will end up to crashing the system because of
>>> the
>>> BUG_ON. But that's a separate topic.
>>>
>> Yes!!! :-D
>>
>> I.e., as I've said countless times, I would think that a series which
>> introduces a CPU_STARTING notifier that fails, should also deal with
>> adjusting the CPU process accordingly.
>>
>> *BUT* if you ARM people are ok with arch/arm/ code that does that,
>> perhaps with a comment saying something like:
>>
>> "This will cause us to hit the BUG_ON() in notify_cpu_starting(). To
>> fix that, we need to properly change the CPU bringup code, which will
>> happen in a leter series."
>>
>> that would also work, I guess. :-)
> 
> Yes, I think that returning error with an in-code comment on top is the
> best solution.

It is the second best solution ;). If we consider the notifier can 
return an error, then best solution is to fix notify_cpu_starting().

I would be ok with the second best solution if we have someone to fix it 
for Xen 4.12. Per my understanding, Mirela is not going to do it. So 
what's the plan here?

Another solution is to impose all the enable callbacks to never return 
an error (AFAICT Linux is just ignoring the return of the callback)).

Today, we happen to return an error only in the case vmap is failing 
(used to remapped vector table read-write). It might be possible to 
avoid the potential re-mapping failure by reworking the code.

I could explore that solution if we prefer going towards imposing all 
the enable callbacks to never return an error.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-14  9:44                                     ` Julien Grall
@ 2018-05-14 16:59                                       ` Stefano Stabellini
  2018-05-15 11:45                                         ` Mirela Simonovic
  0 siblings, 1 reply; 53+ messages in thread
From: Stefano Stabellini @ 2018-05-14 16:59 UTC (permalink / raw)
  To: Julien Grall
  Cc: Edgar E. Iglesias, Stefano Stabellini, Davorin Mista, Xen Devel,
	Dario Faggioli, Mirela Simonovic

On Mon, 14 May 2018, Julien Grall wrote:
> On 11/05/18 22:47, Stefano Stabellini wrote:
> > On Fri, 11 May 2018, Dario Faggioli wrote:
> > > On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote:
> > > > The whole idea here is we have only one place taking the decision and
> > > > we
> > > > don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is
> > > > having
> > > > only one place to fix over multiple one because very likely the
> > > > decision
> > > > is the same everywhere.
> > > > 
> > > > I agree that today it will end up to crashing the system because of
> > > > the
> > > > BUG_ON. But that's a separate topic.
> > > > 
> > > Yes!!! :-D
> > > 
> > > I.e., as I've said countless times, I would think that a series which
> > > introduces a CPU_STARTING notifier that fails, should also deal with
> > > adjusting the CPU process accordingly.
> > > 
> > > *BUT* if you ARM people are ok with arch/arm/ code that does that,
> > > perhaps with a comment saying something like:
> > > 
> > > "This will cause us to hit the BUG_ON() in notify_cpu_starting(). To
> > > fix that, we need to properly change the CPU bringup code, which will
> > > happen in a leter series."
> > > 
> > > that would also work, I guess. :-)
> > 
> > Yes, I think that returning error with an in-code comment on top is the
> > best solution.
> 
> It is the second best solution ;). If we consider the notifier can return an
> error, then best solution is to fix notify_cpu_starting().
>
> I would be ok with the second best solution if we have someone to fix it for
> Xen 4.12. Per my understanding, Mirela is not going to do it. So what's the
> plan here?

I can look at fixing notify_cpu_starting(). I am also OK with you
reworking the vmap code as you suggested below. Regardless, I think
Mirela should go ahead with the comment now. Then, either you or me are
going to come in and remove the comment one way or another (either
fixing notify_cpu_starting or imposing all the callbacks to never return
an error).


> Another solution is to impose all the enable callbacks to never return an
> error (AFAICT Linux is just ignoring the return of the callback)).
> 
> Today, we happen to return an error only in the case vmap is failing (used to
> remapped vector table read-write). It might be possible to avoid the potential
> re-mapping failure by reworking the code.
> 
> I could explore that solution if we prefer going towards imposing all the
> enable callbacks to never return an error.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
  2018-05-14 16:59                                       ` Stefano Stabellini
@ 2018-05-15 11:45                                         ` Mirela Simonovic
  0 siblings, 0 replies; 53+ messages in thread
From: Mirela Simonovic @ 2018-05-15 11:45 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Edgar E. Iglesias, Julien Grall, Xen Devel, Davorin Mista,
	Dario Faggioli

Hi Stefano,

On Mon, May 14, 2018 at 6:59 PM, Stefano Stabellini
<sstabellini@kernel.org> wrote:
> On Mon, 14 May 2018, Julien Grall wrote:
>> On 11/05/18 22:47, Stefano Stabellini wrote:
>> > On Fri, 11 May 2018, Dario Faggioli wrote:
>> > > On Fri, 2018-05-11 at 14:08 +0100, Julien Grall wrote:
>> > > > The whole idea here is we have only one place taking the decision and
>> > > > we
>> > > > don't spread BUG_ON()/panic/stop_cpu everywhere. The benefit is
>> > > > having
>> > > > only one place to fix over multiple one because very likely the
>> > > > decision
>> > > > is the same everywhere.
>> > > >
>> > > > I agree that today it will end up to crashing the system because of
>> > > > the
>> > > > BUG_ON. But that's a separate topic.
>> > > >
>> > > Yes!!! :-D
>> > >
>> > > I.e., as I've said countless times, I would think that a series which
>> > > introduces a CPU_STARTING notifier that fails, should also deal with
>> > > adjusting the CPU process accordingly.
>> > >
>> > > *BUT* if you ARM people are ok with arch/arm/ code that does that,
>> > > perhaps with a comment saying something like:
>> > >
>> > > "This will cause us to hit the BUG_ON() in notify_cpu_starting(). To
>> > > fix that, we need to properly change the CPU bringup code, which will
>> > > happen in a leter series."
>> > >
>> > > that would also work, I guess. :-)
>> >
>> > Yes, I think that returning error with an in-code comment on top is the
>> > best solution.
>>
>> It is the second best solution ;). If we consider the notifier can return an
>> error, then best solution is to fix notify_cpu_starting().
>>
>> I would be ok with the second best solution if we have someone to fix it for
>> Xen 4.12. Per my understanding, Mirela is not going to do it. So what's the
>> plan here?
>
> I can look at fixing notify_cpu_starting(). I am also OK with you
> reworking the vmap code as you suggested below. Regardless, I think
> Mirela should go ahead with the comment now. Then, either you or me are
> going to come in and remove the comment one way or another (either
> fixing notify_cpu_starting or imposing all the callbacks to never return
> an error).
>

Thanks, I submitted v4.

Regards,
Mirela

>
>> Another solution is to impose all the enable callbacks to never return an
>> error (AFAICT Linux is just ignoring the return of the callback)).
>>
>> Today, we happen to return an error only in the case vmap is failing (used to
>> remapped vector table read-write). It might be possible to avoid the potential
>> re-mapping failure by reworking the code.
>>
>> I could explore that solution if we prefer going towards imposing all the
>> enable callbacks to never return an error.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2018-05-15 11:45 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-27 17:12 [PATCH v3 00/10] xen/arm64: Suspend preconditions and CPU hotplug fixes Mirela Simonovic
2018-04-27 17:12 ` [PATCH v3 01/10] xen/arm64: Added handling of the trapped access to OSLSR register Mirela Simonovic
2018-04-27 17:12 ` [PATCH v3 02/10] xen/arm: Ignore write to GICD_ISACTIVERn registers (vgic-v2) Mirela Simonovic
2018-04-30 14:32   ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 03/10] xen/arm: Implement CPU_OFF PSCI call (physical interface) Mirela Simonovic
2018-04-30 14:36   ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 04/10] xen/arm: Remove __initdata and __init to enable CPU hotplug Mirela Simonovic
2018-04-27 17:12 ` [PATCH v3 05/10] xen/arm: Setup virtual paging for non-boot CPUs on hotplug/resume Mirela Simonovic
2018-04-30 14:47   ` Julien Grall
2018-05-07 14:55     ` Mirela Simonovic
2018-05-08 14:14       ` Julien Grall
2018-05-08 14:28         ` Mirela Simonovic
2018-05-09 10:32           ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 06/10] xen/common: Restore IRQ affinity when hotplugging a pCPU Mirela Simonovic
2018-05-10  7:33   ` Dario Faggioli
2018-04-27 17:12 ` [PATCH v3 07/10] xen/arm: Release maintenance interrupt when CPU is hot-unplugged Mirela Simonovic
2018-04-30 14:51   ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 08/10] xen/arm: Release timer interrupts " Mirela Simonovic
2018-04-30 15:58   ` Julien Grall
2018-05-07 15:33     ` Mirela Simonovic
2018-04-27 17:12 ` [PATCH v3 09/10] xen/arm: Free memory allocated for sibling/core maps on CPU hot-unplug Mirela Simonovic
2018-04-30 15:55   ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot Mirela Simonovic
2018-04-30 16:09   ` Julien Grall
2018-05-09 15:48     ` Mirela Simonovic
2018-05-09 16:32       ` Julien Grall
2018-05-10 11:57         ` Mirela Simonovic
2018-05-10 13:24           ` Mirela Simonovic
2018-05-10 13:29             ` Julien Grall
2018-05-10 14:12               ` Mirela Simonovic
2018-05-10 14:35                 ` Julien Grall
2018-05-10 14:25             ` Dario Faggioli
2018-05-10 15:00               ` Mirela Simonovic
2018-05-10 15:13                 ` Julien Grall
2018-05-10 15:49                   ` Mirela Simonovic
2018-05-10 16:02                     ` Julien Grall
2018-05-10 16:21                       ` Dario Faggioli
2018-05-10 16:24                     ` Dario Faggioli
2018-05-11 10:41                       ` Mirela Simonovic
2018-05-11 10:54                         ` Julien Grall
2018-05-11 12:07                           ` Mirela Simonovic
2018-05-11 12:20                             ` Mirela Simonovic
2018-05-11 13:08                               ` Julien Grall
2018-05-11 13:31                                 ` Dario Faggioli
     [not found]                                   ` <CAKPH-Nj2znmdcjZEfFf83WmrcBS_u4R33yPOxAPWw37RHVZ38g@mail.gmail.com>
2018-05-11 14:14                                     ` Dario Faggioli
2018-05-11 21:47                                   ` Stefano Stabellini
2018-05-14  9:44                                     ` Julien Grall
2018-05-14 16:59                                       ` Stefano Stabellini
2018-05-15 11:45                                         ` Mirela Simonovic
2018-05-11 13:12                           ` Dario Faggioli
2018-05-11 13:01                         ` Dario Faggioli
2018-05-10 16:12                   ` Dario Faggioli
2018-05-10 13:33           ` Dario Faggioli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.