All of lore.kernel.org
* [PATCH 0/5] Multicore support for ARINC653 scheduler
@ 2020-09-16 18:18 Jeff Kubascik
  2020-09-16 18:18 ` [PATCH 1/5] sched/arinc653: Clean up comments Jeff Kubascik
                   ` (4 more replies)
  0 siblings, 5 replies; 34+ messages in thread
From: Jeff Kubascik @ 2020-09-16 18:18 UTC (permalink / raw)
  To: xen-devel
  Cc: xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap,
	Dario Faggioli, Jeff Kubascik

This patch set adds multicore capability to the ARINC653 scheduler, based on the
guidance presented in the CAST-32A position paper. This approach allows only a
single domain to run at any given time, but that domain is now able to use
multiple vCPUs running across the available pCPUs.
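To illustrate the scheduling model described above, here is a minimal,
self-contained sketch of how a major frame is carved into per-domain minor
frames. The `frame_entry` struct and `domain_at()` helper are illustrative
names invented for this sketch, not the actual Xen structures (the real
`sched_entry_t` in arinc653.c keys on a domain handle and UNIT ID); under the
multicore approach, the domain selected for the current frame would own all
pCPUs in the cpupool.

```c
#include <assert.h>
#include <stddef.h>

typedef long long s_time_t;   /* signed time in ns, as in Xen */

/* Hypothetical, simplified minor-frame entry: one domain gets exclusive
 * use of the pCPUs for `runtime` ns per major frame. */
struct frame_entry {
    int domain_id;            /* domain that owns this minor frame */
    s_time_t runtime;         /* duration of the frame, in ns */
};

/* Return the domain scheduled at time `now`, or -1 if `now` falls into
 * an idle gap at the end of the major frame. The schedule repeats every
 * `major_frame` ns, so we first reduce `now` modulo the major frame. */
static int domain_at(const struct frame_entry *sched, size_t n,
                     s_time_t major_frame, s_time_t now)
{
    s_time_t offset = now % major_frame;
    size_t i;

    for ( i = 0; i < n; i++ )
    {
        if ( offset < sched[i].runtime )
            return sched[i].domain_id;
        offset -= sched[i].runtime;
    }

    return -1;
}
```

With a 40ns major frame holding entries of 10ns and 20ns, the last 10ns are
an idle gap, and the cycle wraps back to the first entry at 40ns.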

There are 5 patches in this series. The first 4 patches are intended to tidy up
the arinc653 scheduler code, so that it more closely resembles the coding style
and structure found in the other schedulers (mainly credit). The last patch
implements multicore support.

I have tested this feature on both the Xilinx MPSoC ZCU102 development board
and with ARMv8 QEMU. I have a bash script that tests various edge cases such as
schedule changes, creation and destruction of domains in the cpupool, domain
migration across cpupools, vCPU pinning, and vCPU overprovisioning (more vCPUs
than pCPUs). In each of these cases, the scheduler either works as expected or
behaves in a sane manner. For the latter, any quirks that were identified are
documented in the source code.

This patch set has been sitting on the back burner for the better part of a
year, so I have rebased it to the latest commit on master. The tests above were
re-run and no problems were identified. I'm looking for feedback on the patch
set so that it may be accepted into the Xen code base.

Jeff Kubascik (5):
  sched/arinc653: Clean up comments
  sched/arinc653: Rename scheduler private structs
  sched/arinc653: Clean up function definitions
  sched/arinc653: Reorganize function definition order
  sched/arinc653: Implement CAST-32A multicore scheduling

 xen/common/sched/arinc653.c | 1190 ++++++++++++++++++++---------------
 1 file changed, 686 insertions(+), 504 deletions(-)

-- 
2.17.1



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/5] sched/arinc653: Clean up comments
  2020-09-16 18:18 [PATCH 0/5] Multicore support for ARINC653 scheduler Jeff Kubascik
@ 2020-09-16 18:18 ` Jeff Kubascik
  2020-09-17 13:24   ` Andrew Cooper
  2020-09-16 18:18 ` [PATCH 2/5] sched/arinc653: Rename scheduler private structs Jeff Kubascik
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: Jeff Kubascik @ 2020-09-16 18:18 UTC (permalink / raw)
  To: xen-devel
  Cc: xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap,
	Dario Faggioli, Jeff Kubascik

The arinc653 module has function header comment blocks and other comment
inconsistencies not in line with the Xen coding style. This change
cleans up the code to better match the Xen coding style, and has no
functional changes.

Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>
---
 xen/common/sched/arinc653.c | 229 +++++-------------------------------
 1 file changed, 29 insertions(+), 200 deletions(-)

diff --git a/xen/common/sched/arinc653.c b/xen/common/sched/arinc653.c
index 5421918221..7bb75ffe2b 100644
--- a/xen/common/sched/arinc653.c
+++ b/xen/common/sched/arinc653.c
@@ -36,79 +36,56 @@
 
 #include "private.h"
 
-/**************************************************************************
- * Private Macros                                                         *
- **************************************************************************/
-
-/**
- * Default timeslice for domain 0.
+/*
+ * Default timeslice for domain 0
  */
 #define DEFAULT_TIMESLICE MILLISECS(10)
 
-/**
- * Retrieve the idle UNIT for a given physical CPU
+/*
+ * Retrieve the idle UNIT for a given pCPU
  */
 #define IDLETASK(cpu)  (sched_idle_unit(cpu))
 
-/**
+/*
  * Return a pointer to the ARINC 653-specific scheduler data information
- * associated with the given UNIT (unit)
+ * associated with the given UNIT
  */
 #define AUNIT(unit) ((arinc653_unit_t *)(unit)->priv)
 
-/**
+/*
  * Return the global scheduler private data given the scheduler ops pointer
  */
 #define SCHED_PRIV(s) ((a653sched_priv_t *)((s)->sched_data))
 
-/**************************************************************************
- * Private Type Definitions                                               *
- **************************************************************************/
-
-/**
- * The arinc653_unit_t structure holds ARINC 653-scheduler-specific
- * information for all non-idle UNITs
+/*
+ * Schedule unit
  */
 typedef struct arinc653_unit_s
 {
-    /* unit points to Xen's struct sched_unit so we can get to it from an
-     * arinc653_unit_t pointer. */
-    struct sched_unit * unit;
-    /* awake holds whether the UNIT has been woken with vcpu_wake() */
-    bool                awake;
-    /* list holds the linked list information for the list this UNIT
-     * is stored in */
-    struct list_head    list;
+    struct sched_unit *unit;            /* Up-pointer to UNIT */
+    bool awake;                         /* UNIT awake flag */
+    struct list_head list;              /* On the scheduler private data */
 } arinc653_unit_t;
 
-/**
- * The sched_entry_t structure holds a single entry of the
- * ARINC 653 schedule.
+/*
+ * Domain frame entry in the ARINC 653 schedule
  */
 typedef struct sched_entry_s
 {
-    /* dom_handle holds the handle ("UUID") for the domain that this
-     * schedule entry refers to. */
-    xen_domain_handle_t dom_handle;
-    /* unit_id holds the UNIT number for the UNIT that this schedule
-     * entry refers to. */
-    int                 unit_id;
-    /* runtime holds the number of nanoseconds that the UNIT for this
-     * schedule entry should be allowed to run per major frame. */
-    s_time_t            runtime;
-    /* unit holds a pointer to the Xen sched_unit structure */
-    struct sched_unit * unit;
+    xen_domain_handle_t dom_handle;     /* UUID of the domain */
+    int unit_id;                        /* UNIT number for reference */
+    s_time_t runtime;                   /* Duration of the frame */
+    struct sched_unit *unit;            /* Pointer to UNIT */
 } sched_entry_t;
 
-/**
- * This structure defines data that is global to an instance of the scheduler
+/*
+ * Scheduler private data
  */
 typedef struct a653sched_priv_s
 {
-    /* lock for the whole pluggable scheduler, nests inside cpupool_lock */
-    spinlock_t lock;
+    spinlock_t lock;                    /* Scheduler private lock */
 
-    /**
+    /*
      * This array holds the active ARINC 653 schedule.
      *
      * When the system tries to start a new UNIT, this schedule is scanned
@@ -118,7 +95,7 @@ typedef struct a653sched_priv_s
      */
     sched_entry_t schedule[ARINC653_MAX_DOMAINS_PER_SCHEDULE];
 
-    /**
+    /*
      * This variable holds the number of entries that are valid in
      * the arinc653_schedule table.
      *
@@ -129,57 +106,19 @@ typedef struct a653sched_priv_s
      */
     unsigned int num_schedule_entries;
 
-    /**
-     * the major frame time for the ARINC 653 schedule.
-     */
-    s_time_t major_frame;
-
-    /**
-     * the time that the next major frame starts
-     */
-    s_time_t next_major_frame;
+    s_time_t major_frame;               /* Duration of a major frame */
+    s_time_t next_major_frame;          /* When to switch to the next frame */
 
-    /**
-     * pointers to all Xen UNIT structures for iterating through
-     */
-    struct list_head unit_list;
+    struct list_head unit_list;         /* UNITs belonging to this scheduler */
 } a653sched_priv_t;
 
-/**************************************************************************
- * Helper functions                                                       *
- **************************************************************************/
-
-/**
- * This function compares two domain handles.
- *
- * @param h1        Pointer to handle 1
- * @param h2        Pointer to handle 2
- *
- * @return          <ul>
- *                  <li> <0:  handle 1 is less than handle 2
- *                  <li>  0:  handle 1 is equal to handle 2
- *                  <li> >0:  handle 1 is greater than handle 2
- *                  </ul>
- */
+/* This function compares two domain handles */
 static int dom_handle_cmp(const xen_domain_handle_t h1,
                           const xen_domain_handle_t h2)
 {
     return memcmp(h1, h2, sizeof(xen_domain_handle_t));
 }
 
-/**
- * This function searches the unit list to find a UNIT that matches
- * the domain handle and UNIT ID specified.
- *
- * @param ops       Pointer to this instance of the scheduler structure
- * @param handle    Pointer to handler
- * @param unit_id   UNIT ID
- *
- * @return          <ul>
- *                  <li> Pointer to the matching UNIT if one is found
- *                  <li> NULL otherwise
- *                  </ul>
- */
 static struct sched_unit *find_unit(
     const struct scheduler *ops,
     xen_domain_handle_t handle,
@@ -187,7 +126,6 @@ static struct sched_unit *find_unit(
 {
     arinc653_unit_t *aunit;
 
-    /* loop through the unit_list looking for the specified UNIT */
     list_for_each_entry ( aunit, &SCHED_PRIV(ops)->unit_list, list )
         if ( (dom_handle_cmp(aunit->unit->domain->handle, handle) == 0)
              && (unit_id == aunit->unit->unit_id) )
@@ -196,13 +134,6 @@ static struct sched_unit *find_unit(
     return NULL;
 }
 
-/**
- * This function updates the pointer to the Xen UNIT structure for each entry
- * in the ARINC 653 schedule.
- *
- * @param ops       Pointer to this instance of the scheduler structure
- * @return          <None>
- */
 static void update_schedule_units(const struct scheduler *ops)
 {
     unsigned int i, n_entries = SCHED_PRIV(ops)->num_schedule_entries;
@@ -214,17 +145,6 @@ static void update_schedule_units(const struct scheduler *ops)
                       SCHED_PRIV(ops)->schedule[i].unit_id);
 }
 
-/**
- * This function is called by the adjust_global scheduler hook to put
- * in place a new ARINC653 schedule.
- *
- * @param ops       Pointer to this instance of the scheduler structure
- *
- * @return          <ul>
- *                  <li> 0 = success
- *                  <li> !0 = error
- *                  </ul>
- */
 static int
 arinc653_sched_set(
     const struct scheduler *ops,
@@ -238,7 +158,7 @@ arinc653_sched_set(
 
     spin_lock_irqsave(&sched_priv->lock, flags);
 
-    /* Check for valid major frame and number of schedule entries. */
+    /* Check for valid major frame and number of schedule entries */
     if ( (schedule->major_frame <= 0)
          || (schedule->num_sched_entries < 1)
          || (schedule->num_sched_entries > ARINC653_MAX_DOMAINS_PER_SCHEDULE) )
@@ -256,7 +176,7 @@ arinc653_sched_set(
 
     /*
      * Error if the major frame is not large enough to run all entries as
-     * indicated by comparing the total run time to the major frame length.
+     * indicated by comparing the total run time to the major frame length
      */
     if ( total_runtime > schedule->major_frame )
         goto fail;
@@ -292,16 +212,6 @@ arinc653_sched_set(
     return rc;
 }
 
-/**
- * This function is called by the adjust_global scheduler hook to read the
- * current ARINC 653 schedule
- *
- * @param ops       Pointer to this instance of the scheduler structure
- * @return          <ul>
- *                  <li> 0 = success
- *                  <li> !0 = error
- *                  </ul>
- */
 static int
 arinc653_sched_get(
     const struct scheduler *ops,
@@ -329,20 +239,6 @@ arinc653_sched_get(
     return 0;
 }
 
-/**************************************************************************
- * Scheduler callback functions                                           *
- **************************************************************************/
-
-/**
- * This function performs initialization for an instance of the scheduler.
- *
- * @param ops       Pointer to this instance of the scheduler structure
- *
- * @return          <ul>
- *                  <li> 0 = success
- *                  <li> !0 = error
- *                  </ul>
- */
 static int
 a653sched_init(struct scheduler *ops)
 {
@@ -361,11 +257,6 @@ a653sched_init(struct scheduler *ops)
     return 0;
 }
 
-/**
- * This function performs deinitialization for an instance of the scheduler
- *
- * @param ops       Pointer to this instance of the scheduler structure
- */
 static void
 a653sched_deinit(struct scheduler *ops)
 {
@@ -373,14 +264,6 @@ a653sched_deinit(struct scheduler *ops)
     ops->sched_data = NULL;
 }
 
-/**
- * This function allocates scheduler-specific data for a UNIT
- *
- * @param ops       Pointer to this instance of the scheduler structure
- * @param unit      Pointer to struct sched_unit
- *
- * @return          Pointer to the allocated data
- */
 static void *
 a653sched_alloc_udata(const struct scheduler *ops, struct sched_unit *unit,
                       void *dd)
@@ -437,11 +320,6 @@ a653sched_alloc_udata(const struct scheduler *ops, struct sched_unit *unit,
     return svc;
 }
 
-/**
- * This function frees scheduler-specific UNIT data
- *
- * @param ops       Pointer to this instance of the scheduler structure
- */
 static void
 a653sched_free_udata(const struct scheduler *ops, void *priv)
 {
@@ -463,12 +341,6 @@ a653sched_free_udata(const struct scheduler *ops, void *priv)
     spin_unlock_irqrestore(&sched_priv->lock, flags);
 }
 
-/**
- * Xen scheduler callback function to sleep a UNIT
- *
- * @param ops       Pointer to this instance of the scheduler structure
- * @param unit      Pointer to struct sched_unit
- */
 static void
 a653sched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
 {
@@ -483,12 +355,6 @@ a653sched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
         cpu_raise_softirq(sched_unit_master(unit), SCHEDULE_SOFTIRQ);
 }
 
-/**
- * Xen scheduler callback function to wake up a UNIT
- *
- * @param ops       Pointer to this instance of the scheduler structure
- * @param unit      Pointer to struct sched_unit
- */
 static void
 a653sched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
 {
@@ -498,13 +364,6 @@ a653sched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
     cpu_raise_softirq(sched_unit_master(unit), SCHEDULE_SOFTIRQ);
 }
 
-/**
- * Xen scheduler callback function to select a UNIT to run.
- * This is the main scheduler routine.
- *
- * @param ops       Pointer to this instance of the scheduler structure
- * @param now       Current time
- */
 static void
 a653sched_do_schedule(
     const struct scheduler *ops,
@@ -596,14 +455,6 @@ a653sched_do_schedule(
     BUG_ON(prev->next_time <= 0);
 }
 
-/**
- * Xen scheduler callback function to select a resource for the UNIT to run on
- *
- * @param ops       Pointer to this instance of the scheduler structure
- * @param unit      Pointer to struct sched_unit
- *
- * @return          Scheduler resource to run on
- */
 static struct sched_resource *
 a653sched_pick_resource(const struct scheduler *ops,
                         const struct sched_unit *unit)
@@ -626,14 +477,6 @@ a653sched_pick_resource(const struct scheduler *ops,
     return get_sched_res(cpu);
 }
 
-/**
- * Xen scheduler callback to change the scheduler of a cpu
- *
- * @param new_ops   Pointer to this instance of the scheduler structure
- * @param cpu       The cpu that is changing scheduler
- * @param pdata     scheduler specific PCPU data (we don't have any)
- * @param vdata     scheduler specific UNIT data of the idle unit
- */
 static spinlock_t *
 a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                   void *pdata, void *vdata)
@@ -648,14 +491,6 @@ a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     return &sr->_lock;
 }
 
-/**
- * Xen scheduler callback function to perform a global (not domain-specific)
- * adjustment. It is used by the ARINC 653 scheduler to put in place a new
- * ARINC 653 schedule or to retrieve the schedule currently in place.
- *
- * @param ops       Pointer to this instance of the scheduler structure
- * @param sc        Pointer to the scheduler operation specified by Domain 0
- */
 static int
 a653sched_adjust_global(const struct scheduler *ops,
                         struct xen_sysctl_scheduler_op *sc)
@@ -688,12 +523,6 @@ a653sched_adjust_global(const struct scheduler *ops,
     return rc;
 }
 
-/**
- * This structure defines our scheduler for Xen.
- * The entries tell Xen where to find our scheduler-specific
- * callback functions.
- * The symbol must be visible to the rest of Xen at link time.
- */
 static const struct scheduler sched_arinc653_def = {
     .name           = "ARINC 653 Scheduler",
     .opt_name       = "arinc653",
-- 
2.17.1




* [PATCH 2/5] sched/arinc653: Rename scheduler private structs
  2020-09-16 18:18 [PATCH 0/5] Multicore support for ARINC653 scheduler Jeff Kubascik
  2020-09-16 18:18 ` [PATCH 1/5] sched/arinc653: Clean up comments Jeff Kubascik
@ 2020-09-16 18:18 ` Jeff Kubascik
  2020-09-17 12:09   ` Andrew Cooper
  2020-09-16 18:18 ` [PATCH 3/5] sched/arinc653: Clean up function definitions Jeff Kubascik
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: Jeff Kubascik @ 2020-09-16 18:18 UTC (permalink / raw)
  To: xen-devel
  Cc: xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap,
	Dario Faggioli, Jeff Kubascik

The arinc653 module uses typedefs of structs with postfix tags for its
internal structure definitions, which is inconsistent with the Xen coding
style. This change cleans up the code to better match the style used
elsewhere in the Xen scheduler code, and has no functional changes.

Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>
---
 xen/common/sched/arinc653.c | 42 ++++++++++++++++++-------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/xen/common/sched/arinc653.c b/xen/common/sched/arinc653.c
index 7bb75ffe2b..d8a23730c3 100644
--- a/xen/common/sched/arinc653.c
+++ b/xen/common/sched/arinc653.c
@@ -50,38 +50,38 @@
  * Return a pointer to the ARINC 653-specific scheduler data information
  * associated with the given UNIT
  */
-#define AUNIT(unit) ((arinc653_unit_t *)(unit)->priv)
+#define AUNIT(unit) ((struct a653sched_unit *)(unit)->priv)
 
 /*
  * Return the global scheduler private data given the scheduler ops pointer
  */
-#define SCHED_PRIV(s) ((a653sched_priv_t *)((s)->sched_data))
+#define SCHED_PRIV(s) ((struct a653sched_private *)((s)->sched_data))
 
 /*
  * Schedule unit
  */
-typedef struct arinc653_unit_s
+struct a653sched_unit
 {
     struct sched_unit *unit;            /* Up-pointer to UNIT */
     bool awake;                         /* UNIT awake flag */
     struct list_head list;              /* On the scheduler private data */
-} arinc653_unit_t;
+};
 
 /*
  * Domain frame entry in the ARINC 653 schedule
  */
-typedef struct sched_entry_s
+struct sched_entry
 {
     xen_domain_handle_t dom_handle;     /* UUID of the domain */
     int unit_id;                        /* UNIT number for reference */
     s_time_t runtime;                   /* Duration of the frame */
     struct sched_unit *unit;            /* Pointer to UNIT */
-} sched_entry_t;
+};
 
 /*
  * Scheduler private data
  */
-typedef struct a653sched_priv_s
+struct a653sched_private
 {
     spinlock_t lock;                    /* Scheduler private lock */
 
@@ -93,7 +93,7 @@ typedef struct a653sched_priv_s
      * and UNIT number match, then the UNIT is allowed to run. Its run time
      * (per major frame) is given in the third entry of the schedule.
      */
-    sched_entry_t schedule[ARINC653_MAX_DOMAINS_PER_SCHEDULE];
+    struct sched_entry schedule[ARINC653_MAX_DOMAINS_PER_SCHEDULE];
 
     /*
      * This variable holds the number of entries that are valid in
@@ -110,7 +110,7 @@ typedef struct a653sched_priv_s
     s_time_t next_major_frame;          /* When to switch to the next frame */
 
     struct list_head unit_list;         /* UNITs belonging to this scheduler */
-} a653sched_priv_t;
+};
 
 /* This function compares two domain handles */
 static int dom_handle_cmp(const xen_domain_handle_t h1,
@@ -124,7 +124,7 @@ static struct sched_unit *find_unit(
     xen_domain_handle_t handle,
     int unit_id)
 {
-    arinc653_unit_t *aunit;
+    struct a653sched_unit *aunit;
 
     list_for_each_entry ( aunit, &SCHED_PRIV(ops)->unit_list, list )
         if ( (dom_handle_cmp(aunit->unit->domain->handle, handle) == 0)
@@ -150,7 +150,7 @@ arinc653_sched_set(
     const struct scheduler *ops,
     struct xen_sysctl_arinc653_schedule *schedule)
 {
-    a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
+    struct a653sched_private *sched_priv = SCHED_PRIV(ops);
     s_time_t total_runtime = 0;
     unsigned int i;
     unsigned long flags;
@@ -217,7 +217,7 @@ arinc653_sched_get(
     const struct scheduler *ops,
     struct xen_sysctl_arinc653_schedule *schedule)
 {
-    a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
+    struct a653sched_private *sched_priv = SCHED_PRIV(ops);
     unsigned int i;
     unsigned long flags;
 
@@ -242,9 +242,9 @@ arinc653_sched_get(
 static int
 a653sched_init(struct scheduler *ops)
 {
-    a653sched_priv_t *prv;
+    struct a653sched_private *prv;
 
-    prv = xzalloc(a653sched_priv_t);
+    prv = xzalloc(struct a653sched_private);
     if ( prv == NULL )
         return -ENOMEM;
 
@@ -268,8 +268,8 @@ static void *
 a653sched_alloc_udata(const struct scheduler *ops, struct sched_unit *unit,
                       void *dd)
 {
-    a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
-    arinc653_unit_t *svc;
+    struct a653sched_private *sched_priv = SCHED_PRIV(ops);
+    struct a653sched_unit *svc;
     unsigned int entry;
     unsigned long flags;
 
@@ -277,7 +277,7 @@ a653sched_alloc_udata(const struct scheduler *ops, struct sched_unit *unit,
      * Allocate memory for the ARINC 653-specific scheduler data information
      * associated with the given UNIT (unit).
      */
-    svc = xmalloc(arinc653_unit_t);
+    svc = xmalloc(struct a653sched_unit);
     if ( svc == NULL )
         return NULL;
 
@@ -323,8 +323,8 @@ a653sched_alloc_udata(const struct scheduler *ops, struct sched_unit *unit,
 static void
 a653sched_free_udata(const struct scheduler *ops, void *priv)
 {
-    a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
-    arinc653_unit_t *av = priv;
+    struct a653sched_private *sched_priv = SCHED_PRIV(ops);
+    struct a653sched_unit *av = priv;
     unsigned long flags;
 
     if (av == NULL)
@@ -374,7 +374,7 @@ a653sched_do_schedule(
     struct sched_unit *new_task = NULL;
     static unsigned int sched_index = 0;
     static s_time_t next_switch_time;
-    a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
+    struct a653sched_private *sched_priv = SCHED_PRIV(ops);
     const unsigned int cpu = sched_get_resource_cpu(smp_processor_id());
     unsigned long flags;
 
@@ -482,7 +482,7 @@ a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
                   void *pdata, void *vdata)
 {
     struct sched_resource *sr = get_sched_res(cpu);
-    const arinc653_unit_t *svc = vdata;
+    const struct a653sched_unit *svc = vdata;
 
     ASSERT(!pdata && svc && is_idle_unit(svc->unit));
 
-- 
2.17.1




* [PATCH 3/5] sched/arinc653: Clean up function definitions
  2020-09-16 18:18 [PATCH 0/5] Multicore support for ARINC653 scheduler Jeff Kubascik
  2020-09-16 18:18 ` [PATCH 1/5] sched/arinc653: Clean up comments Jeff Kubascik
  2020-09-16 18:18 ` [PATCH 2/5] sched/arinc653: Rename scheduler private structs Jeff Kubascik
@ 2020-09-16 18:18 ` Jeff Kubascik
  2020-09-17  8:09   ` Jan Beulich
  2020-09-16 18:18 ` [PATCH 4/5] sched/arinc653: Reorganize function definition order Jeff Kubascik
  2020-09-16 18:18 ` [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling Jeff Kubascik
  4 siblings, 1 reply; 34+ messages in thread
From: Jeff Kubascik @ 2020-09-16 18:18 UTC (permalink / raw)
  To: xen-devel
  Cc: xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap,
	Dario Faggioli, Jeff Kubascik

Function definitions in the arinc653 module did not follow the Xen
coding style. Furthermore, a few function names used a different prefix.
This change cleans up the definitions to be consistent with the Xen
coding style, and has no functional changes.

Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>
---
 xen/common/sched/arinc653.c | 68 +++++++++++++++----------------------
 1 file changed, 28 insertions(+), 40 deletions(-)

diff --git a/xen/common/sched/arinc653.c b/xen/common/sched/arinc653.c
index d8a23730c3..5f3a1be990 100644
--- a/xen/common/sched/arinc653.c
+++ b/xen/common/sched/arinc653.c
@@ -119,10 +119,9 @@ static int dom_handle_cmp(const xen_domain_handle_t h1,
     return memcmp(h1, h2, sizeof(xen_domain_handle_t));
 }
 
-static struct sched_unit *find_unit(
-    const struct scheduler *ops,
-    xen_domain_handle_t handle,
-    int unit_id)
+static struct sched_unit *find_unit(const struct scheduler *ops,
+                                    xen_domain_handle_t handle,
+                                    int unit_id)
 {
     struct a653sched_unit *aunit;
 
@@ -145,10 +144,8 @@ static void update_schedule_units(const struct scheduler *ops)
                       SCHED_PRIV(ops)->schedule[i].unit_id);
 }
 
-static int
-arinc653_sched_set(
-    const struct scheduler *ops,
-    struct xen_sysctl_arinc653_schedule *schedule)
+static int a653sched_set(const struct scheduler *ops,
+                         struct xen_sysctl_arinc653_schedule *schedule)
 {
     struct a653sched_private *sched_priv = SCHED_PRIV(ops);
     s_time_t total_runtime = 0;
@@ -212,10 +209,8 @@ arinc653_sched_set(
     return rc;
 }
 
-static int
-arinc653_sched_get(
-    const struct scheduler *ops,
-    struct xen_sysctl_arinc653_schedule *schedule)
+static int a653sched_get(const struct scheduler *ops,
+                         struct xen_sysctl_arinc653_schedule *schedule)
 {
     struct a653sched_private *sched_priv = SCHED_PRIV(ops);
     unsigned int i;
@@ -239,8 +234,7 @@ arinc653_sched_get(
     return 0;
 }
 
-static int
-a653sched_init(struct scheduler *ops)
+static int a653sched_init(struct scheduler *ops)
 {
     struct a653sched_private *prv;
 
@@ -257,16 +251,15 @@ a653sched_init(struct scheduler *ops)
     return 0;
 }
 
-static void
-a653sched_deinit(struct scheduler *ops)
+static void a653sched_deinit(struct scheduler *ops)
 {
     xfree(SCHED_PRIV(ops));
     ops->sched_data = NULL;
 }
 
-static void *
-a653sched_alloc_udata(const struct scheduler *ops, struct sched_unit *unit,
-                      void *dd)
+static void *a653sched_alloc_udata(const struct scheduler *ops,
+                                   struct sched_unit *unit,
+                                   void *dd)
 {
     struct a653sched_private *sched_priv = SCHED_PRIV(ops);
     struct a653sched_unit *svc;
@@ -320,8 +313,7 @@ a653sched_alloc_udata(const struct scheduler *ops, struct sched_unit *unit,
     return svc;
 }
 
-static void
-a653sched_free_udata(const struct scheduler *ops, void *priv)
+static void a653sched_free_udata(const struct scheduler *ops, void *priv)
 {
     struct a653sched_private *sched_priv = SCHED_PRIV(ops);
     struct a653sched_unit *av = priv;
@@ -341,8 +333,8 @@ a653sched_free_udata(const struct scheduler *ops, void *priv)
     spin_unlock_irqrestore(&sched_priv->lock, flags);
 }
 
-static void
-a653sched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
+static void a653sched_unit_sleep(const struct scheduler *ops,
+                                 struct sched_unit *unit)
 {
     if ( AUNIT(unit) != NULL )
         AUNIT(unit)->awake = false;
@@ -355,8 +347,8 @@ a653sched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit)
         cpu_raise_softirq(sched_unit_master(unit), SCHEDULE_SOFTIRQ);
 }
 
-static void
-a653sched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
+static void a653sched_unit_wake(const struct scheduler *ops,
+                                struct sched_unit *unit)
 {
     if ( AUNIT(unit) != NULL )
         AUNIT(unit)->awake = true;
@@ -364,12 +356,9 @@ a653sched_unit_wake(const struct scheduler *ops, struct sched_unit *unit)
     cpu_raise_softirq(sched_unit_master(unit), SCHEDULE_SOFTIRQ);
 }
 
-static void
-a653sched_do_schedule(
-    const struct scheduler *ops,
-    struct sched_unit *prev,
-    s_time_t now,
-    bool tasklet_work_scheduled)
+static void a653sched_do_schedule(const struct scheduler *ops,
+                                  struct sched_unit *prev, s_time_t now,
+                                  bool tasklet_work_scheduled)
 {
     struct sched_unit *new_task = NULL;
     static unsigned int sched_index = 0;
@@ -477,9 +466,9 @@ a653sched_pick_resource(const struct scheduler *ops,
     return get_sched_res(cpu);
 }
 
-static spinlock_t *
-a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
-                  void *pdata, void *vdata)
+static spinlock_t *a653sched_switch_sched(struct scheduler *new_ops,
+                                          unsigned int cpu, void *pdata,
+                                          void *vdata)
 {
     struct sched_resource *sr = get_sched_res(cpu);
     const struct a653sched_unit *svc = vdata;
@@ -491,9 +480,8 @@ a653_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     return &sr->_lock;
 }
 
-static int
-a653sched_adjust_global(const struct scheduler *ops,
-                        struct xen_sysctl_scheduler_op *sc)
+static int a653sched_adjust_global(const struct scheduler *ops,
+                                   struct xen_sysctl_scheduler_op *sc)
 {
     struct xen_sysctl_arinc653_schedule local_sched;
     int rc = -EINVAL;
@@ -507,11 +495,11 @@ a653sched_adjust_global(const struct scheduler *ops,
             break;
         }
 
-        rc = arinc653_sched_set(ops, &local_sched);
+        rc = a653sched_set(ops, &local_sched);
         break;
     case XEN_SYSCTL_SCHEDOP_getinfo:
         memset(&local_sched, -1, sizeof(local_sched));
-        rc = arinc653_sched_get(ops, &local_sched);
+        rc = a653sched_get(ops, &local_sched);
         if ( rc )
             break;
 
@@ -547,7 +535,7 @@ static const struct scheduler sched_arinc653_def = {
 
     .pick_resource  = a653sched_pick_resource,
 
-    .switch_sched   = a653_switch_sched,
+    .switch_sched   = a653sched_switch_sched,
 
     .adjust         = NULL,
     .adjust_global  = a653sched_adjust_global,
-- 
2.17.1




* [PATCH 4/5] sched/arinc653: Reorganize function definition order
  2020-09-16 18:18 [PATCH 0/5] Multicore support for ARINC653 scheduler Jeff Kubascik
                   ` (2 preceding siblings ...)
  2020-09-16 18:18 ` [PATCH 3/5] sched/arinc653: Clean up function definitions Jeff Kubascik
@ 2020-09-16 18:18 ` Jeff Kubascik
  2020-09-17  8:12   ` Jan Beulich
  2020-09-16 18:18 ` [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling Jeff Kubascik
  4 siblings, 1 reply; 34+ messages in thread
From: Jeff Kubascik @ 2020-09-16 18:18 UTC (permalink / raw)
  To: xen-devel
  Cc: xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap,
	Dario Faggioli, Jeff Kubascik

This change is in preparation for an overhaul of the arinc653 module. It
groups functions in a logical order and fills out the sched_arinc653_def
structure. There are no functional changes.

Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>
---
 xen/common/sched/arinc653.c | 239 +++++++++++++++++++-----------------
 1 file changed, 123 insertions(+), 116 deletions(-)

diff --git a/xen/common/sched/arinc653.c b/xen/common/sched/arinc653.c
index 5f3a1be990..0cd39d475f 100644
--- a/xen/common/sched/arinc653.c
+++ b/xen/common/sched/arinc653.c
@@ -144,96 +144,6 @@ static void update_schedule_units(const struct scheduler *ops)
                       SCHED_PRIV(ops)->schedule[i].unit_id);
 }
 
-static int a653sched_set(const struct scheduler *ops,
-                         struct xen_sysctl_arinc653_schedule *schedule)
-{
-    struct a653sched_private *sched_priv = SCHED_PRIV(ops);
-    s_time_t total_runtime = 0;
-    unsigned int i;
-    unsigned long flags;
-    int rc = -EINVAL;
-
-    spin_lock_irqsave(&sched_priv->lock, flags);
-
-    /* Check for valid major frame and number of schedule entries */
-    if ( (schedule->major_frame <= 0)
-         || (schedule->num_sched_entries < 1)
-         || (schedule->num_sched_entries > ARINC653_MAX_DOMAINS_PER_SCHEDULE) )
-        goto fail;
-
-    for ( i = 0; i < schedule->num_sched_entries; i++ )
-    {
-        /* Check for a valid run time. */
-        if ( schedule->sched_entries[i].runtime <= 0 )
-            goto fail;
-
-        /* Add this entry's run time to total run time. */
-        total_runtime += schedule->sched_entries[i].runtime;
-    }
-
-    /*
-     * Error if the major frame is not large enough to run all entries as
-     * indicated by comparing the total run time to the major frame length
-     */
-    if ( total_runtime > schedule->major_frame )
-        goto fail;
-
-    /* Copy the new schedule into place. */
-    sched_priv->num_schedule_entries = schedule->num_sched_entries;
-    sched_priv->major_frame = schedule->major_frame;
-    for ( i = 0; i < schedule->num_sched_entries; i++ )
-    {
-        memcpy(sched_priv->schedule[i].dom_handle,
-               schedule->sched_entries[i].dom_handle,
-               sizeof(sched_priv->schedule[i].dom_handle));
-        sched_priv->schedule[i].unit_id =
-            schedule->sched_entries[i].vcpu_id;
-        sched_priv->schedule[i].runtime =
-            schedule->sched_entries[i].runtime;
-    }
-    update_schedule_units(ops);
-
-    /*
-     * The newly-installed schedule takes effect immediately. We do not even
-     * wait for the current major frame to expire.
-     *
-     * Signal a new major frame to begin. The next major frame is set up by
-     * the do_schedule callback function when it is next invoked.
-     */
-    sched_priv->next_major_frame = NOW();
-
-    rc = 0;
-
- fail:
-    spin_unlock_irqrestore(&sched_priv->lock, flags);
-    return rc;
-}
-
-static int a653sched_get(const struct scheduler *ops,
-                         struct xen_sysctl_arinc653_schedule *schedule)
-{
-    struct a653sched_private *sched_priv = SCHED_PRIV(ops);
-    unsigned int i;
-    unsigned long flags;
-
-    spin_lock_irqsave(&sched_priv->lock, flags);
-
-    schedule->num_sched_entries = sched_priv->num_schedule_entries;
-    schedule->major_frame = sched_priv->major_frame;
-    for ( i = 0; i < sched_priv->num_schedule_entries; i++ )
-    {
-        memcpy(schedule->sched_entries[i].dom_handle,
-               sched_priv->schedule[i].dom_handle,
-               sizeof(sched_priv->schedule[i].dom_handle));
-        schedule->sched_entries[i].vcpu_id = sched_priv->schedule[i].unit_id;
-        schedule->sched_entries[i].runtime = sched_priv->schedule[i].runtime;
-    }
-
-    spin_unlock_irqrestore(&sched_priv->lock, flags);
-
-    return 0;
-}
-
 static int a653sched_init(struct scheduler *ops)
 {
     struct a653sched_private *prv;
@@ -257,6 +167,20 @@ static void a653sched_deinit(struct scheduler *ops)
     ops->sched_data = NULL;
 }
 
+static spinlock_t *a653sched_switch_sched(struct scheduler *new_ops,
+                                          unsigned int cpu, void *pdata,
+                                          void *vdata)
+{
+    struct sched_resource *sr = get_sched_res(cpu);
+    const struct a653sched_unit *svc = vdata;
+
+    ASSERT(!pdata && svc && is_idle_unit(svc->unit));
+
+    sched_idle_unit(cpu)->priv = vdata;
+
+    return &sr->_lock;
+}
+
 static void *a653sched_alloc_udata(const struct scheduler *ops,
                                    struct sched_unit *unit,
                                    void *dd)
@@ -356,6 +280,27 @@ static void a653sched_unit_wake(const struct scheduler *ops,
     cpu_raise_softirq(sched_unit_master(unit), SCHEDULE_SOFTIRQ);
 }
 
+static struct sched_resource *a653sched_pick_resource(const struct scheduler *ops,
+                                                      const struct sched_unit *unit)
+{
+    const cpumask_t *online;
+    unsigned int cpu;
+
+    /*
+     * If present, prefer unit's current processor, else
+     * just find the first valid unit.
+     */
+    online = cpupool_domain_master_cpumask(unit->domain);
+
+    cpu = cpumask_first(online);
+
+    if ( cpumask_test_cpu(sched_unit_master(unit), online)
+         || (cpu >= nr_cpu_ids) )
+        cpu = sched_unit_master(unit);
+
+    return get_sched_res(cpu);
+}
+
 static void a653sched_do_schedule(const struct scheduler *ops,
                                   struct sched_unit *prev, s_time_t now,
                                   bool tasklet_work_scheduled)
@@ -444,40 +389,94 @@ static void a653sched_do_schedule(const struct scheduler *ops,
     BUG_ON(prev->next_time <= 0);
 }
 
-static struct sched_resource *
-a653sched_pick_resource(const struct scheduler *ops,
-                        const struct sched_unit *unit)
+static int a653sched_set(const struct scheduler *ops,
+                         struct xen_sysctl_arinc653_schedule *schedule)
 {
-    const cpumask_t *online;
-    unsigned int cpu;
+    struct a653sched_private *sched_priv = SCHED_PRIV(ops);
+    s_time_t total_runtime = 0;
+    unsigned int i;
+    unsigned long flags;
+    int rc = -EINVAL;
+
+    spin_lock_irqsave(&sched_priv->lock, flags);
+
+    /* Check for valid major frame and number of schedule entries */
+    if ( (schedule->major_frame <= 0)
+         || (schedule->num_sched_entries < 1)
+         || (schedule->num_sched_entries > ARINC653_MAX_DOMAINS_PER_SCHEDULE) )
+        goto fail;
+
+    for ( i = 0; i < schedule->num_sched_entries; i++ )
+    {
+        /* Check for a valid run time. */
+        if ( schedule->sched_entries[i].runtime <= 0 )
+            goto fail;
+
+        /* Add this entry's run time to total run time. */
+        total_runtime += schedule->sched_entries[i].runtime;
+    }
 
     /*
-     * If present, prefer unit's current processor, else
-     * just find the first valid unit.
+     * Error if the major frame is not large enough to run all entries as
+     * indicated by comparing the total run time to the major frame length
      */
-    online = cpupool_domain_master_cpumask(unit->domain);
+    if ( total_runtime > schedule->major_frame )
+        goto fail;
 
-    cpu = cpumask_first(online);
+    /* Copy the new schedule into place. */
+    sched_priv->num_schedule_entries = schedule->num_sched_entries;
+    sched_priv->major_frame = schedule->major_frame;
+    for ( i = 0; i < schedule->num_sched_entries; i++ )
+    {
+        memcpy(sched_priv->schedule[i].dom_handle,
+               schedule->sched_entries[i].dom_handle,
+               sizeof(sched_priv->schedule[i].dom_handle));
+        sched_priv->schedule[i].unit_id =
+            schedule->sched_entries[i].vcpu_id;
+        sched_priv->schedule[i].runtime =
+            schedule->sched_entries[i].runtime;
+    }
+    update_schedule_units(ops);
 
-    if ( cpumask_test_cpu(sched_unit_master(unit), online)
-         || (cpu >= nr_cpu_ids) )
-        cpu = sched_unit_master(unit);
+    /*
+     * The newly-installed schedule takes effect immediately. We do not even
+     * wait for the current major frame to expire.
+     *
+     * Signal a new major frame to begin. The next major frame is set up by
+     * the do_schedule callback function when it is next invoked.
+     */
+    sched_priv->next_major_frame = NOW();
 
-    return get_sched_res(cpu);
+    rc = 0;
+
+ fail:
+    spin_unlock_irqrestore(&sched_priv->lock, flags);
+    return rc;
 }
 
-static spinlock_t *a653sched_switch_sched(struct scheduler *new_ops,
-                                          unsigned int cpu, void *pdata,
-                                          void *vdata)
+static int a653sched_get(const struct scheduler *ops,
+                         struct xen_sysctl_arinc653_schedule *schedule)
 {
-    struct sched_resource *sr = get_sched_res(cpu);
-    const struct a653sched_unit *svc = vdata;
+    struct a653sched_private *sched_priv = SCHED_PRIV(ops);
+    unsigned int i;
+    unsigned long flags;
 
-    ASSERT(!pdata && svc && is_idle_unit(svc->unit));
+    spin_lock_irqsave(&sched_priv->lock, flags);
 
-    sched_idle_unit(cpu)->priv = vdata;
+    schedule->num_sched_entries = sched_priv->num_schedule_entries;
+    schedule->major_frame = sched_priv->major_frame;
+    for ( i = 0; i < sched_priv->num_schedule_entries; i++ )
+    {
+        memcpy(schedule->sched_entries[i].dom_handle,
+               sched_priv->schedule[i].dom_handle,
+               sizeof(sched_priv->schedule[i].dom_handle));
+        schedule->sched_entries[i].vcpu_id = sched_priv->schedule[i].unit_id;
+        schedule->sched_entries[i].runtime = sched_priv->schedule[i].runtime;
+    }
 
-    return &sr->_lock;
+    spin_unlock_irqrestore(&sched_priv->lock, flags);
+
+    return 0;
 }
 
 static int a653sched_adjust_global(const struct scheduler *ops,
@@ -517,27 +516,35 @@ static const struct scheduler sched_arinc653_def = {
     .sched_id       = XEN_SCHEDULER_ARINC653,
     .sched_data     = NULL,
 
+    .global_init    = NULL,
     .init           = a653sched_init,
     .deinit         = a653sched_deinit,
 
-    .free_udata     = a653sched_free_udata,
-    .alloc_udata    = a653sched_alloc_udata,
+    .alloc_pdata    = NULL,
+    .switch_sched   = a653sched_switch_sched,
+    .deinit_pdata   = NULL,
+    .free_pdata     = NULL,
 
+    .alloc_domdata  = NULL,
+    .free_domdata   = NULL,
+
+    .alloc_udata    = a653sched_alloc_udata,
     .insert_unit    = NULL,
     .remove_unit    = NULL,
+    .free_udata     = a653sched_free_udata,
 
     .sleep          = a653sched_unit_sleep,
     .wake           = a653sched_unit_wake,
     .yield          = NULL,
     .context_saved  = NULL,
 
-    .do_schedule    = a653sched_do_schedule,
-
     .pick_resource  = a653sched_pick_resource,
+    .migrate        = NULL,
 
-    .switch_sched   = a653sched_switch_sched,
+    .do_schedule    = a653sched_do_schedule,
 
     .adjust         = NULL,
+    .adjust_affinity= NULL,
     .adjust_global  = a653sched_adjust_global,
 
     .dump_settings  = NULL,
-- 
2.17.1




* [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-16 18:18 [PATCH 0/5] Multicore support for ARINC653 scheduler Jeff Kubascik
                   ` (3 preceding siblings ...)
  2020-09-16 18:18 ` [PATCH 4/5] sched/arinc653: Reorganize function definition order Jeff Kubascik
@ 2020-09-16 18:18 ` Jeff Kubascik
  2020-09-17  9:04   ` Jürgen Groß
  2020-09-17 14:42   ` Andrew Cooper
  4 siblings, 2 replies; 34+ messages in thread
From: Jeff Kubascik @ 2020-09-16 18:18 UTC (permalink / raw)
  To: xen-devel
  Cc: xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap,
	Dario Faggioli, Jeff Kubascik

This change is an overhaul of the ARINC653 scheduler to enable CAST-32A
multicore scheduling. CAST-32A specifies that only one partition
(domain) can run during a minor frame, but that domain is now allowed to
have more than one vCPU.

Changes include:
- Each pCPU now has its own private structure. This contains a schedule
  of UNITs to run with an independent lock, to allow the do_schedule
  routine to run in parallel on each pCPU.
- Each domain now has its own private structure. This allows the
  scheduler to track domains in the cpupool and to track what pCPUs are
  active while the domain is active.
- Added the concept of an epoch to the scheduler. This allows each pCPU
  to synchronize its context switches between major/minor frames. It
  also allows the scheduler to recover in the case that a frame change
  was missed, which usually happens when the target is paused with a
  debugger.
- Added support for UNIT migration, which allows for pinning.
- In a653sched_adjust_global, removed a large structure from the stack and
  moved the a653sched_private dereference into the function.

Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>
---
 xen/common/sched/arinc653.c | 844 +++++++++++++++++++++++++-----------
 1 file changed, 601 insertions(+), 243 deletions(-)

diff --git a/xen/common/sched/arinc653.c b/xen/common/sched/arinc653.c
index 0cd39d475f..1e96d81c99 100644
--- a/xen/common/sched/arinc653.c
+++ b/xen/common/sched/arinc653.c
@@ -36,15 +36,53 @@
 
 #include "private.h"
 
+/*
+ * Known design quirks:
+ * - If a pCPU is added to the cpupool, inactive UNITs (those that are not
+ *   assigned to a pCPU in the minor frame) will not be immediately activated
+ *   or migrated. The workaround is to use pinning to force a migration.
+ * - When pinning, it is recommended to specify a single pCPU. When using
+ *   multiple pCPU IDs or "all", the scheduler will only look at pCPUs that are
+ *   not assigned to a UNIT during the minor frame. It will not try to find a
+ *   "best fit" and shuffle active UNITs around.
+ * - If a domain has more UNITs than available pCPUs, the excess UNITs will be
+ *   put into an inactive state and will not be scheduled. This is important to
+ *   note for domain 0, as Linux may hang if a vCPU is not scheduled.
+ */
+
+/*
+ * Locking:
+ * - Scheduler lock:
+ *  + is per-pCPU
+ *  + serializes accesses to the UNIT schedule of the pCPU
+ *  + serializes assignment and deassignment of UNITs to a pCPU
+ * - Private lock (a.k.a. private scheduler lock):
+ *  + is scheduler-wide
+ *  + serializes accesses to the private scheduler, domain, and UNIT structures
+ *
+ * Ordering is: scheduler lock first, then private lock; in other words, the
+ * private lock nests inside the scheduler lock. More specifically:
+ *  + if we need both scheduler and private locks, we must acquire the scheduler
+ *    lock first
+ *  + if we already own the private lock, we must never acquire the scheduler
+ *    lock
+ */
+
+/*
+ * A handle of all zeros represents domain 0 if present, otherwise the idle UNIT
+ */
+#define DOM0_HANDLE ((const xen_domain_handle_t){0})
+
 /*
  * Default timeslice for domain 0
  */
 #define DEFAULT_TIMESLICE MILLISECS(10)
 
 /*
- * Retrieve the idle UNIT for a given pCPU
+ * Return a pointer to the ARINC 653-specific scheduler data information
+ * associated with the given pCPU
  */
-#define IDLETASK(cpu)  (sched_idle_unit(cpu))
+#define APCPU(cpu) ((struct a653sched_pcpu *)get_sched_res(cpu)->sched_priv)
 
 /*
  * Return a pointer to the ARINC 653-specific scheduler data information
@@ -57,14 +95,56 @@
  */
 #define SCHED_PRIV(s) ((struct a653sched_private *)((s)->sched_data))
 
+/*
+ * UNIT frame entry for a pCPU
+ */
+struct unit_sched_entry
+{
+    struct sched_unit *unit;            /* UNIT to run */
+    s_time_t runtime;                   /* Duration of the frame */
+};
+
+/*
+ * Physical pCPU
+ */
+struct a653sched_pcpu
+{
+    unsigned int cpu;                   /* pCPU id */
+    struct list_head pcpu_list_elem;    /* On the scheduler private data */
+
+    spinlock_t lock;                    /* Scheduler lock */
+
+    /* Schedule of UNITs to run on this pCPU */
+    struct unit_sched_entry sched[ARINC653_MAX_DOMAINS_PER_SCHEDULE];
+
+    unsigned int sched_len;             /* Active entries in sched */
+    unsigned int sched_index;           /* Current frame */
+
+    s_time_t epoch;                     /* Sync to this point in time */
+    s_time_t major_frame;               /* Duration of a major frame */
+    s_time_t next_switch_time;          /* When to switch to the next frame */
+};
+
 /*
  * Schedule unit
  */
 struct a653sched_unit
 {
     struct sched_unit *unit;            /* Up-pointer to UNIT */
-    bool awake;                         /* UNIT awake flag */
-    struct list_head list;              /* On the scheduler private data */
+    struct a653sched_dom *adom;         /* Up-pointer to domain */
+    struct list_head unit_list_elem;    /* On the domain */
+    bool active;                        /* Is this UNIT active on a pCPU? */
+};
+
+/*
+ * Domain
+ */
+struct a653sched_dom
+{
+    struct domain *dom;                 /* Up-pointer to domain */
+    struct list_head dom_list_elem;     /* On the scheduler private data */
+    struct list_head unit_list;         /* UNITs belonging to this domain */
+    cpumask_t active_pcpus;             /* Active pCPUs for this domain */
 };
 
 /*
@@ -73,9 +153,7 @@ struct a653sched_unit
 struct sched_entry
 {
     xen_domain_handle_t dom_handle;     /* UUID of the domain */
-    int unit_id;                        /* UNIT number for reference */
     s_time_t runtime;                   /* Duration of the frame */
-    struct sched_unit *unit;            /* Pointer to UNIT */
 };
 
 /*
@@ -83,65 +161,124 @@ struct sched_entry
  */
 struct a653sched_private
 {
-    spinlock_t lock;                    /* Scheduler private lock */
+    spinlock_t lock;                    /* Private lock */
 
-    /*
-     * This array holds the active ARINC 653 schedule.
-     *
-     * When the system tries to start a new UNIT, this schedule is scanned
-     * to look for a matching (handle, UNIT #) pair. If both the handle (UUID)
-     * and UNIT number match, then the UNIT is allowed to run. Its run time
-     * (per major frame) is given in the third entry of the schedule.
-     */
-    struct sched_entry schedule[ARINC653_MAX_DOMAINS_PER_SCHEDULE];
+    /* Schedule of domains to run */
+    struct sched_entry sched[ARINC653_MAX_DOMAINS_PER_SCHEDULE];
 
-    /*
-     * This variable holds the number of entries that are valid in
-     * the arinc653_schedule table.
-     *
-     * This is not necessarily the same as the number of domains in the
-     * schedule. A domain could be listed multiple times within the schedule,
-     * or a domain with multiple UNITs could have a different
-     * schedule entry for each UNIT.
-     */
-    unsigned int num_schedule_entries;
+    unsigned int sched_len;             /* Active entries in sched */
 
+    s_time_t epoch;                     /* Sync to this point in time */
     s_time_t major_frame;               /* Duration of a major frame */
-    s_time_t next_major_frame;          /* When to switch to the next frame */
 
-    struct list_head unit_list;         /* UNITs belonging to this scheduler */
+    cpumask_t pcpus;                    /* pCPUs in this cpupool */
+
+    struct list_head pcpu_list;         /* pCPUs belonging to this scheduler */
+    struct list_head dom_list;          /* Doms belonging to this scheduler */
 };
 
 /* This function compares two domain handles */
-static int dom_handle_cmp(const xen_domain_handle_t h1,
-                          const xen_domain_handle_t h2)
+static inline bool dom_handle_cmp(const xen_domain_handle_t h1,
+                                  const xen_domain_handle_t h2)
+{
+    return memcmp(h1, h2, sizeof(xen_domain_handle_t)) == 0;
+}
+
+static struct a653sched_dom *find_dom(const struct a653sched_private *prv,
+                                      const xen_domain_handle_t handle)
+{
+    struct a653sched_dom *adom;
+
+    list_for_each_entry ( adom, &prv->dom_list, dom_list_elem )
+        if ( dom_handle_cmp(adom->dom->handle, handle) )
+            return adom;
+
+    return NULL;
+}
+
+static struct a653sched_unit *find_unit(const struct a653sched_dom *adom,
+                                        unsigned int cpu)
 {
-    return memcmp(h1, h2, sizeof(xen_domain_handle_t));
+    struct a653sched_unit *aunit;
+
+    list_for_each_entry ( aunit, &adom->unit_list, unit_list_elem )
+        if ( aunit->active && sched_unit_master(aunit->unit) == cpu )
+            return aunit;
+
+    return NULL;
 }
 
-static struct sched_unit *find_unit(const struct scheduler *ops,
-                                    xen_domain_handle_t handle,
-                                    int unit_id)
+static struct a653sched_unit *find_inactive_unit(const struct a653sched_dom *adom,
+                                                 unsigned int cpu)
 {
     struct a653sched_unit *aunit;
 
-    list_for_each_entry ( aunit, &SCHED_PRIV(ops)->unit_list, list )
-        if ( (dom_handle_cmp(aunit->unit->domain->handle, handle) == 0)
-             && (unit_id == aunit->unit->unit_id) )
-            return aunit->unit;
+    list_for_each_entry ( aunit, &adom->unit_list, unit_list_elem )
+        if ( !aunit->active &&
+             cpumask_test_cpu(cpu, cpupool_domain_master_cpumask(adom->dom)) &&
+             cpumask_test_cpu(cpu, aunit->unit->cpu_hard_affinity) )
+            return aunit;
 
     return NULL;
 }
 
-static void update_schedule_units(const struct scheduler *ops)
+static void sync_to_epoch(struct a653sched_pcpu *apc, s_time_t now)
+{
+    s_time_t next;
+    unsigned int index;
+
+    ASSERT(spin_is_locked(&apc->lock));
+
+    /* Determine the start of the current major frame */
+    next = now - ((now - apc->epoch) % apc->major_frame);
+
+    /* Determine which minor frame should be running */
+    for ( index = 0; index < apc->sched_len; index++ )
+    {
+        next += apc->sched[index].runtime;
+
+        if ( now < next )
+            break;
+    }
+
+    ASSERT(index < apc->sched_len);
+
+    apc->sched_index = index;
+    apc->next_switch_time = next;
+}
+
+static void build_pcpu_sched(const struct a653sched_private *prv,
+                             struct a653sched_pcpu *apc, s_time_t now)
 {
-    unsigned int i, n_entries = SCHED_PRIV(ops)->num_schedule_entries;
+    struct a653sched_dom *adom;
+    struct a653sched_unit *aunit;
+    unsigned int index;
+
+    ASSERT(spin_is_locked(&apc->lock));
+
+    for ( index = 0; index < prv->sched_len; index++ )
+    {
+        aunit = NULL;
+
+        adom = find_dom(prv, prv->sched[index].dom_handle);
+        if ( adom )
+        {
+            aunit = find_unit(adom, apc->cpu);
+        }
+
+        if ( aunit )
+            apc->sched[index].unit = aunit->unit;
+        else
+            apc->sched[index].unit = sched_idle_unit(apc->cpu);
+
+        apc->sched[index].runtime = prv->sched[index].runtime;
+    }
 
-    for ( i = 0; i < n_entries; i++ )
-        SCHED_PRIV(ops)->schedule[i].unit =
-            find_unit(ops,
-                      SCHED_PRIV(ops)->schedule[i].dom_handle,
-                      SCHED_PRIV(ops)->schedule[i].unit_id);
+    apc->sched_len = prv->sched_len;
+    apc->epoch = prv->epoch;
+    apc->major_frame = prv->major_frame;
+
+    sync_to_epoch(apc, now);
 }
 
 static int a653sched_init(struct scheduler *ops)
@@ -152,265 +289,466 @@ static int a653sched_init(struct scheduler *ops)
     if ( prv == NULL )
         return -ENOMEM;
 
-    ops->sched_data = prv;
-
-    prv->next_major_frame = 0;
     spin_lock_init(&prv->lock);
-    INIT_LIST_HEAD(&prv->unit_list);
+    INIT_LIST_HEAD(&prv->pcpu_list);
+    INIT_LIST_HEAD(&prv->dom_list);
+
+    prv->epoch = NOW();
+
+    /* Initialize the schedule to run dom0 if present, otherwise idle UNIT */
+    prv->sched_len = 1;
+    memcpy(prv->sched[0].dom_handle, DOM0_HANDLE,
+           sizeof(prv->sched[0].dom_handle));
+    prv->sched[0].runtime = DEFAULT_TIMESLICE;
+    prv->major_frame = DEFAULT_TIMESLICE;
+
+    ops->sched_data = prv;
 
     return 0;
 }
 
 static void a653sched_deinit(struct scheduler *ops)
 {
-    xfree(SCHED_PRIV(ops));
+    struct a653sched_private *prv = SCHED_PRIV(ops);
+
+    ASSERT(prv);
+    ASSERT(list_empty(&prv->pcpu_list));
+    ASSERT(list_empty(&prv->dom_list));
+
+    xfree(prv);
     ops->sched_data = NULL;
 }
 
+static void *a653sched_alloc_pdata(const struct scheduler *ops, int cpu)
+{
+    struct a653sched_pcpu *apc;
+
+    apc = xzalloc(struct a653sched_pcpu);
+    if ( apc == NULL )
+        return ERR_PTR(-ENOMEM);
+
+    spin_lock_init(&apc->lock);
+    INIT_LIST_HEAD(&apc->pcpu_list_elem);
+    apc->cpu = cpu;
+
+    return apc;
+}
+
+static void init_pdata(struct a653sched_private *prv,
+                       struct a653sched_pcpu *apc)
+{
+    ASSERT(!cpumask_test_cpu(apc->cpu, &prv->pcpus));
+
+    cpumask_set_cpu(apc->cpu, &prv->pcpus);
+    list_add(&apc->pcpu_list_elem, &prv->pcpu_list);
+
+    build_pcpu_sched(prv, apc, NOW());
+}
+
 static spinlock_t *a653sched_switch_sched(struct scheduler *new_ops,
                                           unsigned int cpu, void *pdata,
                                           void *vdata)
 {
-    struct sched_resource *sr = get_sched_res(cpu);
-    const struct a653sched_unit *svc = vdata;
+    struct a653sched_private *prv = SCHED_PRIV(new_ops);
+    struct a653sched_pcpu *apc = pdata;
+    struct a653sched_unit *aunit = vdata;
+    unsigned long flags;
 
-    ASSERT(!pdata && svc && is_idle_unit(svc->unit));
+    ASSERT(apc && aunit && is_idle_unit(aunit->unit));
 
     sched_idle_unit(cpu)->priv = vdata;
 
-    return &sr->_lock;
+    spin_lock_irqsave(&apc->lock, flags);
+    spin_lock(&prv->lock);
+    init_pdata(prv, apc);
+    spin_unlock(&prv->lock);
+    spin_unlock_irqrestore(&apc->lock, flags);
+
+    /* Return the scheduler lock */
+    return &apc->lock;
+}
+
+static void a653sched_deinit_pdata(const struct scheduler *ops, void *pcpu,
+                                   int cpu)
+{
+    struct a653sched_private *prv = SCHED_PRIV(ops);
+    struct a653sched_pcpu *apc = pcpu;
+    unsigned long flags;
+
+    ASSERT(apc);
+    ASSERT(cpumask_test_cpu(cpu, &prv->pcpus));
+
+    spin_lock_irqsave(&prv->lock, flags);
+    cpumask_clear_cpu(cpu, &prv->pcpus);
+    list_del(&apc->pcpu_list_elem);
+    spin_unlock_irqrestore(&prv->lock, flags);
+}
+
+static void a653sched_free_pdata(const struct scheduler *ops, void *pcpu,
+                                 int cpu)
+{
+    struct a653sched_pcpu *apc = pcpu;
+
+    xfree(apc);
+}
+
+static void *a653sched_alloc_domdata(const struct scheduler *ops,
+                                     struct domain *dom)
+{
+    struct a653sched_private *prv = SCHED_PRIV(ops);
+    struct a653sched_dom *adom;
+    unsigned long flags;
+
+    adom = xzalloc(struct a653sched_dom);
+    if ( adom == NULL )
+        return ERR_PTR(-ENOMEM);
+
+    INIT_LIST_HEAD(&adom->dom_list_elem);
+    INIT_LIST_HEAD(&adom->unit_list);
+    adom->dom = dom;
+
+    spin_lock_irqsave(&prv->lock, flags);
+    list_add(&adom->dom_list_elem, &prv->dom_list);
+    spin_unlock_irqrestore(&prv->lock, flags);
+
+    return adom;
+}
+
+static void a653sched_free_domdata(const struct scheduler *ops, void *data)
+{
+    struct a653sched_private *prv = SCHED_PRIV(ops);
+    struct a653sched_dom *adom = data;
+    unsigned long flags;
+
+    ASSERT(adom);
+    ASSERT(list_empty(&adom->unit_list));
+
+    spin_lock_irqsave(&prv->lock, flags);
+    list_del(&adom->dom_list_elem);
+    spin_unlock_irqrestore(&prv->lock, flags);
+
+    xfree(adom);
+}
+
+static struct sched_resource *pick_res(struct a653sched_unit *aunit)
+{
+    struct a653sched_dom *adom = aunit->adom;
+    unsigned int cpu = sched_unit_master(aunit->unit);
+    cpumask_t *cpus = cpumask_scratch_cpu(cpu);
+    unsigned int valid;
+    unsigned int available;
+
+    cpumask_and(cpus, cpupool_domain_master_cpumask(adom->dom),
+                aunit->unit->cpu_hard_affinity);
+
+    /* Stick with the current pCPU if it is still valid */
+    if ( cpumask_test_cpu(cpu, cpus) )
+        return get_sched_res(cpu);
+
+    valid = cpumask_first(cpus);
+
+    /* Find an available pCPU */
+    cpumask_andnot(cpus, cpus, &adom->active_pcpus);
+    available = cpumask_first(cpus);
+    if ( available < nr_cpu_ids )
+        return get_sched_res(available);
+
+    /* All pCPUs are in use - return the first valid ID */
+    return get_sched_res(valid);
+}
+
+static void unit_assign(struct a653sched_private *prv,
+                        struct a653sched_unit *aunit,
+                        unsigned int cpu)
+{
+    struct a653sched_dom *adom = aunit->adom;
+
+    ASSERT(!aunit->active);
+    ASSERT(!cpumask_test_cpu(cpu, &adom->active_pcpus));
+
+    cpumask_set_cpu(cpu, &adom->active_pcpus);
+    aunit->active = true;
+
+    sched_set_res(aunit->unit, get_sched_res(cpu));
+}
+
+static void unit_deassign(struct a653sched_private *prv,
+                          struct a653sched_unit *aunit)
+{
+    unsigned int cpu = sched_unit_master(aunit->unit);
+    struct a653sched_dom *adom = aunit->adom;
+    struct a653sched_unit *next_aunit;
+
+    ASSERT(aunit->active);
+    ASSERT(cpumask_test_cpu(cpu, &adom->active_pcpus));
+
+    cpumask_clear_cpu(cpu, &adom->active_pcpus);
+    aunit->active = false;
+
+    /* Find an inactive UNIT to run next */
+    next_aunit = find_inactive_unit(adom, cpu);
+    if ( next_aunit )
+        unit_assign(prv, next_aunit, cpu);
 }
 
 static void *a653sched_alloc_udata(const struct scheduler *ops,
                                    struct sched_unit *unit,
                                    void *dd)
 {
-    struct a653sched_private *sched_priv = SCHED_PRIV(ops);
-    struct a653sched_unit *svc;
-    unsigned int entry;
-    unsigned long flags;
+    struct a653sched_unit *aunit;
 
-    /*
-     * Allocate memory for the ARINC 653-specific scheduler data information
-     * associated with the given UNIT (unit).
-     */
-    svc = xmalloc(struct a653sched_unit);
-    if ( svc == NULL )
+    aunit = xzalloc(struct a653sched_unit);
+    if ( aunit == NULL )
         return NULL;
 
-    spin_lock_irqsave(&sched_priv->lock, flags);
+    INIT_LIST_HEAD(&aunit->unit_list_elem);
+    aunit->unit = unit;
+    aunit->adom = dd;
+    aunit->active = false;
 
-    /*
-     * Add every one of dom0's units to the schedule, as long as there are
-     * slots available.
-     */
-    if ( unit->domain->domain_id == 0 )
-    {
-        entry = sched_priv->num_schedule_entries;
+    return aunit;
+}
 
-        if ( entry < ARINC653_MAX_DOMAINS_PER_SCHEDULE )
-        {
-            sched_priv->schedule[entry].dom_handle[0] = '\0';
-            sched_priv->schedule[entry].unit_id = unit->unit_id;
-            sched_priv->schedule[entry].runtime = DEFAULT_TIMESLICE;
-            sched_priv->schedule[entry].unit = unit;
+static void a653sched_insert_unit(const struct scheduler *ops,
+                                  struct sched_unit *unit)
+{
+    struct a653sched_private *prv = SCHED_PRIV(ops);
+    struct a653sched_unit *aunit = AUNIT(unit);
+    struct a653sched_dom *adom = aunit->adom;
+    struct a653sched_pcpu *apc;
+    spinlock_t *lock;
+    unsigned int cpu;
+    unsigned long flags;
+    bool assigned = false;
 
-            sched_priv->major_frame += DEFAULT_TIMESLICE;
-            ++sched_priv->num_schedule_entries;
-        }
-    }
+    /* Add the UNIT to the ARINC653 cpupool */
+    spin_lock_irqsave(&prv->lock, flags);
 
-    /*
-     * Initialize our ARINC 653 scheduler-specific information for the UNIT.
-     * The UNIT starts "asleep." When Xen is ready for the UNIT to run, it
-     * will call the vcpu_wake scheduler callback function and our scheduler
-     * will mark the UNIT awake.
-     */
-    svc->unit = unit;
-    svc->awake = false;
-    if ( !is_idle_unit(unit) )
-        list_add(&svc->list, &SCHED_PRIV(ops)->unit_list);
-    update_schedule_units(ops);
+    list_add(&aunit->unit_list_elem, &adom->unit_list);
+
+    sched_set_res(unit, pick_res(aunit));
+    cpu = sched_unit_master(unit);
+    apc = APCPU(cpu);
+
+    if ( cpumask_test_cpu(cpu, cpupool_domain_master_cpumask(adom->dom)) &&
+         cpumask_test_cpu(cpu, unit->cpu_hard_affinity) &&
+         !cpumask_test_cpu(cpu, &adom->active_pcpus) )
+    {
+        unit_assign(prv, aunit, cpu);
+        assigned = true;
+    }
 
-    spin_unlock_irqrestore(&sched_priv->lock, flags);
+    spin_unlock_irqrestore(&prv->lock, flags);
 
-    return svc;
+    /* Rebuild the UNIT schedule for the pCPU, if needed */
+    if ( assigned )
+    {
+        lock = pcpu_schedule_lock_irqsave(cpu, &flags);
+        spin_lock(&prv->lock);
+        build_pcpu_sched(prv, apc, NOW());
+        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+        spin_unlock(&prv->lock);
+        pcpu_schedule_unlock_irqrestore(lock, flags, cpu);
+    }
 }
 
-static void a653sched_free_udata(const struct scheduler *ops, void *priv)
+static void a653sched_remove_unit(const struct scheduler *ops,
+                                  struct sched_unit *unit)
 {
-    struct a653sched_private *sched_priv = SCHED_PRIV(ops);
-    struct a653sched_unit *av = priv;
+    struct a653sched_private *prv = SCHED_PRIV(ops);
+    struct a653sched_unit *aunit = AUNIT(unit);
+    spinlock_t *lock;
+    unsigned int cpu = sched_unit_master(unit);
     unsigned long flags;
+    bool removed = false;
+
+    ASSERT(!is_idle_unit(unit));
+
+    /* Remove the UNIT from the ARINC653 cpupool */
+    spin_lock_irqsave(&prv->lock, flags);
 
-    if (av == NULL)
-        return;
+    list_del(&aunit->unit_list_elem);
 
-    spin_lock_irqsave(&sched_priv->lock, flags);
+    if ( aunit->active )
+    {
+        unit_deassign(prv, aunit);
+        removed = true;
+    }
 
-    if ( !is_idle_unit(av->unit) )
-        list_del(&av->list);
+    spin_unlock_irqrestore(&prv->lock, flags);
 
-    xfree(av);
-    update_schedule_units(ops);
+    /* Rebuild the UNIT schedule for the pCPU, if needed */
+    if ( removed )
+    {
+        lock = pcpu_schedule_lock_irqsave(cpu, &flags);
+        spin_lock(&prv->lock);
+        build_pcpu_sched(prv, APCPU(cpu), NOW());
+        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+        spin_unlock(&prv->lock);
+        pcpu_schedule_unlock_irqrestore(lock, flags, cpu);
+    }
+}
+
+static void a653sched_free_udata(const struct scheduler *ops, void *priv)
+{
+    struct a653sched_unit *aunit = priv;
 
-    spin_unlock_irqrestore(&sched_priv->lock, flags);
+    xfree(aunit);
 }
 
 static void a653sched_unit_sleep(const struct scheduler *ops,
                                  struct sched_unit *unit)
 {
-    if ( AUNIT(unit) != NULL )
-        AUNIT(unit)->awake = false;
+    const unsigned int cpu = sched_unit_master(unit);
+
+    ASSERT(!is_idle_unit(unit));
 
     /*
      * If the UNIT being put to sleep is the same one that is currently
      * running, raise a softirq to invoke the scheduler to switch domains.
      */
-    if ( get_sched_res(sched_unit_master(unit))->curr == unit )
-        cpu_raise_softirq(sched_unit_master(unit), SCHEDULE_SOFTIRQ);
+    if ( curr_on_cpu(cpu) == unit )
+        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
 }
 
 static void a653sched_unit_wake(const struct scheduler *ops,
                                 struct sched_unit *unit)
 {
-    if ( AUNIT(unit) != NULL )
-        AUNIT(unit)->awake = true;
+    const unsigned int cpu = sched_unit_master(unit);
 
-    cpu_raise_softirq(sched_unit_master(unit), SCHEDULE_SOFTIRQ);
+    ASSERT(!is_idle_unit(unit));
+
+    cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
 }
 
 static struct sched_resource *a653sched_pick_resource(const struct scheduler *ops,
                                                       const struct sched_unit *unit)
 {
-    const cpumask_t *online;
-    unsigned int cpu;
+    struct a653sched_private *prv = SCHED_PRIV(ops);
+    struct a653sched_unit *aunit = AUNIT(unit);
+    struct sched_resource *sr;
+    unsigned long flags;
 
-    /*
-     * If present, prefer unit's current processor, else
-     * just find the first valid unit.
-     */
-    online = cpupool_domain_master_cpumask(unit->domain);
+    ASSERT(!is_idle_unit(unit));
 
-    cpu = cpumask_first(online);
+    spin_lock_irqsave(&prv->lock, flags);
+    sr = pick_res(aunit);
+    spin_unlock_irqrestore(&prv->lock, flags);
 
-    if ( cpumask_test_cpu(sched_unit_master(unit), online)
-         || (cpu >= nr_cpu_ids) )
-        cpu = sched_unit_master(unit);
+    return sr;
+}
 
-    return get_sched_res(cpu);
+static void a653sched_migrate(const struct scheduler *ops,
+                              struct sched_unit *unit, unsigned int new_cpu)
+{
+    const unsigned int old_cpu = sched_unit_master(unit);
+    struct a653sched_private *prv = SCHED_PRIV(ops);
+    struct a653sched_unit *aunit = AUNIT(unit);
+    struct a653sched_dom *adom = aunit->adom;
+    unsigned long flags;
+
+    ASSERT(!is_idle_unit(unit));
+
+    spin_lock_irqsave(&prv->lock, flags);
+
+    /* Migrate off the old pCPU */
+    if ( aunit->active && old_cpu != new_cpu )
+    {
+        /*
+         * If the UNIT is currently running, we need to mark it as migrating
+         * and wait for the scheduler to switch it out.
+         */
+        if ( curr_on_cpu(old_cpu) == unit )
+        {
+            sched_set_pause_flags(unit, _VPF_migrating);
+            cpu_raise_softirq(old_cpu, SCHEDULE_SOFTIRQ);
+        }
+
+        unit_deassign(prv, aunit);
+        build_pcpu_sched(prv, APCPU(old_cpu), NOW());
+        cpu_raise_softirq(old_cpu, SCHEDULE_SOFTIRQ);
+    }
+
+    /* Migrate on to the new pCPU */
+    if ( !aunit->active && !cpumask_test_cpu(new_cpu, &adom->active_pcpus) )
+    {
+        unit_assign(prv, aunit, new_cpu);
+        build_pcpu_sched(prv, APCPU(new_cpu), NOW());
+    }
+    else
+    {
+        sched_set_res(unit, get_sched_res(new_cpu));
+    }
+
+    spin_unlock_irqrestore(&prv->lock, flags);
 }
 
 static void a653sched_do_schedule(const struct scheduler *ops,
                                   struct sched_unit *prev, s_time_t now,
                                   bool tasklet_work_scheduled)
 {
-    struct sched_unit *new_task = NULL;
-    static unsigned int sched_index = 0;
-    static s_time_t next_switch_time;
-    struct a653sched_private *sched_priv = SCHED_PRIV(ops);
     const unsigned int cpu = sched_get_resource_cpu(smp_processor_id());
-    unsigned long flags;
+    struct a653sched_pcpu *apc = APCPU(cpu);
+    struct sched_unit *next_task;
 
-    spin_lock_irqsave(&sched_priv->lock, flags);
+    ASSERT(spin_is_locked(&apc->lock));
 
-    if ( sched_priv->num_schedule_entries < 1 )
-        sched_priv->next_major_frame = now + DEFAULT_TIMESLICE;
-    else if ( now >= sched_priv->next_major_frame )
-    {
-        /* time to enter a new major frame
-         * the first time this function is called, this will be true */
-        /* start with the first domain in the schedule */
-        sched_index = 0;
-        sched_priv->next_major_frame = now + sched_priv->major_frame;
-        next_switch_time = now + sched_priv->schedule[0].runtime;
-    }
-    else
+    /* Advance to the next frame if the current one has expired */
+    if ( now >= apc->next_switch_time )
     {
-        while ( (now >= next_switch_time)
-                && (sched_index < sched_priv->num_schedule_entries) )
-        {
-            /* time to switch to the next domain in this major frame */
-            sched_index++;
-            next_switch_time += sched_priv->schedule[sched_index].runtime;
-        }
+        apc->sched_index++;
+        if ( apc->sched_index >= apc->sched_len )
+            apc->sched_index = 0;
+
+        apc->next_switch_time += apc->sched[apc->sched_index].runtime;
     }
 
-    /*
-     * If we exhausted the domains in the schedule and still have time left
-     * in the major frame then switch next at the next major frame.
-     */
-    if ( sched_index >= sched_priv->num_schedule_entries )
-        next_switch_time = sched_priv->next_major_frame;
+    /* Frames were somehow missed - resynchronize to epoch */
+    if ( unlikely(now >= apc->next_switch_time) )
+        sync_to_epoch(apc, now);
 
-    /*
-     * If there are more domains to run in the current major frame, set
-     * new_task equal to the address of next domain's sched_unit structure.
-     * Otherwise, set new_task equal to the address of the idle task's
-     * sched_unit structure.
-     */
-    new_task = (sched_index < sched_priv->num_schedule_entries)
-        ? sched_priv->schedule[sched_index].unit
-        : IDLETASK(cpu);
-
-    /* Check to see if the new task can be run (awake & runnable). */
-    if ( !((new_task != NULL)
-           && (AUNIT(new_task) != NULL)
-           && AUNIT(new_task)->awake
-           && unit_runnable_state(new_task)) )
-        new_task = IDLETASK(cpu);
-    BUG_ON(new_task == NULL);
+    next_task = apc->sched[apc->sched_index].unit;
 
-    /*
-     * Check to make sure we did not miss a major frame.
-     * This is a good test for robust partitioning.
-     */
-    BUG_ON(now >= sched_priv->next_major_frame);
+    ASSERT(next_task);
 
-    spin_unlock_irqrestore(&sched_priv->lock, flags);
+    /* Check if the new task is runnable */
+    if ( !unit_runnable_state(next_task) )
+        next_task = sched_idle_unit(cpu);
 
     /* Tasklet work (which runs in idle UNIT context) overrides all else. */
     if ( tasklet_work_scheduled )
-        new_task = IDLETASK(cpu);
+        next_task = sched_idle_unit(cpu);
 
-    /* Running this task would result in a migration */
-    if ( !is_idle_unit(new_task)
-         && (sched_unit_master(new_task) != cpu) )
-        new_task = IDLETASK(cpu);
+    prev->next_time = apc->next_switch_time - now;
+    prev->next_task = next_task;
+    next_task->migrated = false;
 
-    /*
-     * Return the amount of time the next domain has to run and the address
-     * of the selected task's UNIT structure.
-     */
-    prev->next_time = next_switch_time - now;
-    prev->next_task = new_task;
-    new_task->migrated = false;
-
-    BUG_ON(prev->next_time <= 0);
+    ASSERT(prev->next_time > 0);
 }
 
-static int a653sched_set(const struct scheduler *ops,
+static int a653sched_set(struct a653sched_private *prv,
                          struct xen_sysctl_arinc653_schedule *schedule)
 {
-    struct a653sched_private *sched_priv = SCHED_PRIV(ops);
+    struct a653sched_pcpu *apc;
     s_time_t total_runtime = 0;
+    s_time_t now;
+    spinlock_t *lock;
     unsigned int i;
     unsigned long flags;
-    int rc = -EINVAL;
 
-    spin_lock_irqsave(&sched_priv->lock, flags);
+    now = NOW();
 
     /* Check for valid major frame and number of schedule entries */
     if ( (schedule->major_frame <= 0)
          || (schedule->num_sched_entries < 1)
          || (schedule->num_sched_entries > ARINC653_MAX_DOMAINS_PER_SCHEDULE) )
-        goto fail;
+        return -EINVAL;
 
     for ( i = 0; i < schedule->num_sched_entries; i++ )
     {
         /* Check for a valid run time. */
         if ( schedule->sched_entries[i].runtime <= 0 )
-            goto fail;
+            return -EINVAL;
 
         /* Add this entry's run time to total run time. */
         total_runtime += schedule->sched_entries[i].runtime;
@@ -421,60 +759,64 @@ static int a653sched_set(const struct scheduler *ops,
      * indicated by comparing the total run time to the major frame length
      */
     if ( total_runtime > schedule->major_frame )
-        goto fail;
+        return -EINVAL;
+
+    spin_lock_irqsave(&prv->lock, flags);
 
     /* Copy the new schedule into place. */
-    sched_priv->num_schedule_entries = schedule->num_sched_entries;
-    sched_priv->major_frame = schedule->major_frame;
+    prv->sched_len = schedule->num_sched_entries;
+    prv->major_frame = schedule->major_frame;
+    prv->epoch = now;
     for ( i = 0; i < schedule->num_sched_entries; i++ )
     {
-        memcpy(sched_priv->schedule[i].dom_handle,
+        memcpy(prv->sched[i].dom_handle,
                schedule->sched_entries[i].dom_handle,
-               sizeof(sched_priv->schedule[i].dom_handle));
-        sched_priv->schedule[i].unit_id =
-            schedule->sched_entries[i].vcpu_id;
-        sched_priv->schedule[i].runtime =
+               sizeof(prv->sched[i].dom_handle));
+        prv->sched[i].runtime =
             schedule->sched_entries[i].runtime;
     }
-    update_schedule_units(ops);
+
+    spin_unlock_irqrestore(&prv->lock, flags);
 
     /*
-     * The newly-installed schedule takes effect immediately. We do not even
-     * wait for the current major frame to expire.
-     *
-     * Signal a new major frame to begin. The next major frame is set up by
-     * the do_schedule callback function when it is next invoked.
+     * The newly-installed schedule takes effect immediately - update the UNIT
+     * schedule for each pCPU
      */
-    sched_priv->next_major_frame = NOW();
+    list_for_each_entry ( apc, &prv->pcpu_list, pcpu_list_elem )
+    {
+        lock = pcpu_schedule_lock_irqsave(apc->cpu, &flags);
+        spin_lock(&prv->lock);
+        build_pcpu_sched(prv, apc, now);
+        spin_unlock(&prv->lock);
+        pcpu_schedule_unlock_irqrestore(lock, flags, apc->cpu);
+    }
 
-    rc = 0;
+    /* Trigger scheduler on all pCPUs */
+    cpumask_raise_softirq(&prv->pcpus, SCHEDULE_SOFTIRQ);
 
- fail:
-    spin_unlock_irqrestore(&sched_priv->lock, flags);
-    return rc;
+    return 0;
 }
 
-static int a653sched_get(const struct scheduler *ops,
+static int a653sched_get(struct a653sched_private *prv,
                          struct xen_sysctl_arinc653_schedule *schedule)
 {
-    struct a653sched_private *sched_priv = SCHED_PRIV(ops);
     unsigned int i;
     unsigned long flags;
 
-    spin_lock_irqsave(&sched_priv->lock, flags);
+    spin_lock_irqsave(&prv->lock, flags);
 
-    schedule->num_sched_entries = sched_priv->num_schedule_entries;
-    schedule->major_frame = sched_priv->major_frame;
-    for ( i = 0; i < sched_priv->num_schedule_entries; i++ )
+    schedule->num_sched_entries = prv->sched_len;
+    schedule->major_frame = prv->major_frame;
+    for ( i = 0; i < prv->sched_len; i++ )
     {
         memcpy(schedule->sched_entries[i].dom_handle,
-               sched_priv->schedule[i].dom_handle,
-               sizeof(sched_priv->schedule[i].dom_handle));
-        schedule->sched_entries[i].vcpu_id = sched_priv->schedule[i].unit_id;
-        schedule->sched_entries[i].runtime = sched_priv->schedule[i].runtime;
+               prv->sched[i].dom_handle,
+               sizeof(schedule->sched_entries[i].dom_handle));
+        schedule->sched_entries[i].vcpu_id = 0;
+        schedule->sched_entries[i].runtime = prv->sched[i].runtime;
     }
 
-    spin_unlock_irqrestore(&sched_priv->lock, flags);
+    spin_unlock_irqrestore(&prv->lock, flags);
 
     return 0;
 }
@@ -482,31 +824,47 @@ static int a653sched_get(const struct scheduler *ops,
 static int a653sched_adjust_global(const struct scheduler *ops,
                                    struct xen_sysctl_scheduler_op *sc)
 {
-    struct xen_sysctl_arinc653_schedule local_sched;
+    struct a653sched_private *prv = SCHED_PRIV(ops);
+    struct xen_sysctl_arinc653_schedule *sched = NULL;
     int rc = -EINVAL;
 
     switch ( sc->cmd )
     {
     case XEN_SYSCTL_SCHEDOP_putinfo:
-        if ( copy_from_guest(&local_sched, sc->u.sched_arinc653.schedule, 1) )
+        sched = xzalloc(struct xen_sysctl_arinc653_schedule);
+        if ( sched == NULL )
+        {
+            rc = -ENOMEM;
+            break;
+        }
+
+        if ( copy_from_guest(sched, sc->u.sched_arinc653.schedule, 1) )
         {
             rc = -EFAULT;
             break;
         }
 
-        rc = a653sched_set(ops, &local_sched);
+        rc = a653sched_set(prv, sched);
         break;
     case XEN_SYSCTL_SCHEDOP_getinfo:
-        memset(&local_sched, -1, sizeof(local_sched));
-        rc = a653sched_get(ops, &local_sched);
+        sched = xzalloc(struct xen_sysctl_arinc653_schedule);
+        if ( sched == NULL )
+        {
+            rc = -ENOMEM;
+            break;
+        }
+
+        rc = a653sched_get(prv, sched);
         if ( rc )
             break;
 
-        if ( copy_to_guest(sc->u.sched_arinc653.schedule, &local_sched, 1) )
+        if ( copy_to_guest(sc->u.sched_arinc653.schedule, sched, 1) )
             rc = -EFAULT;
         break;
     }
 
+    xfree(sched);
+
     return rc;
 }
 
@@ -520,17 +878,17 @@ static const struct scheduler sched_arinc653_def = {
     .init           = a653sched_init,
     .deinit         = a653sched_deinit,
 
-    .alloc_pdata    = NULL,
+    .alloc_pdata    = a653sched_alloc_pdata,
     .switch_sched   = a653sched_switch_sched,
-    .deinit_pdata   = NULL,
-    .free_pdata     = NULL,
+    .deinit_pdata   = a653sched_deinit_pdata,
+    .free_pdata     = a653sched_free_pdata,
 
-    .alloc_domdata  = NULL,
-    .free_domdata   = NULL,
+    .alloc_domdata  = a653sched_alloc_domdata,
+    .free_domdata   = a653sched_free_domdata,
 
     .alloc_udata    = a653sched_alloc_udata,
-    .insert_unit    = NULL,
-    .remove_unit    = NULL,
+    .insert_unit    = a653sched_insert_unit,
+    .remove_unit    = a653sched_remove_unit,
     .free_udata     = a653sched_free_udata,
 
     .sleep          = a653sched_unit_sleep,
@@ -539,7 +897,7 @@ static const struct scheduler sched_arinc653_def = {
     .context_saved  = NULL,
 
     .pick_resource  = a653sched_pick_resource,
-    .migrate        = NULL,
+    .migrate        = a653sched_migrate,
 
     .do_schedule    = a653sched_do_schedule,
 
-- 
2.17.1


* Re: [PATCH 3/5] sched/arinc653: Clean up function definitions
  2020-09-16 18:18 ` [PATCH 3/5] sched/arinc653: Clean up function definitions Jeff Kubascik
@ 2020-09-17  8:09   ` Jan Beulich
  2020-09-17 14:40     ` Dario Faggioli
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2020-09-17  8:09 UTC (permalink / raw)
  To: Jeff Kubascik
  Cc: xen-devel, xen-devel, Josh Whitehead, Stewart Hildebrand,
	George Dunlap, Dario Faggioli

On 16.09.2020 20:18, Jeff Kubascik wrote:
> --- a/xen/common/sched/arinc653.c
> +++ b/xen/common/sched/arinc653.c
> @@ -119,10 +119,9 @@ static int dom_handle_cmp(const xen_domain_handle_t h1,
>      return memcmp(h1, h2, sizeof(xen_domain_handle_t));
>  }
>  
> -static struct sched_unit *find_unit(
> -    const struct scheduler *ops,
> -    xen_domain_handle_t handle,
> -    int unit_id)
> +static struct sched_unit *find_unit(const struct scheduler *ops,
> +                                    xen_domain_handle_t handle,
> +                                    int unit_id)
>  {

Just fyi, afaict we consider both variants legitimate style as far
as Xen as a whole is concerned; I'm unaware of scheduler code
specific restrictions (but I'll be happy to be corrected if I'm
wrong with this).

Instead what I'm wondering by merely seeing this piece of code is
whether unit_id really can go negative. If not (as would be the
common case with IDs), it would want converting to unsigned int,
which may be more important than the purely typographical
adjustment done here.

Jan



* Re: [PATCH 4/5] sched/arinc653: Reorganize function definition order
  2020-09-16 18:18 ` [PATCH 4/5] sched/arinc653: Reorganize function definition order Jeff Kubascik
@ 2020-09-17  8:12   ` Jan Beulich
  2020-09-17 14:16     ` Dario Faggioli
  2020-09-17 14:17     ` Andrew Cooper
  0 siblings, 2 replies; 34+ messages in thread
From: Jan Beulich @ 2020-09-17  8:12 UTC (permalink / raw)
  To: Jeff Kubascik
  Cc: xen-devel, xen-devel, Josh Whitehead, Stewart Hildebrand,
	George Dunlap, Dario Faggioli

On 16.09.2020 20:18, Jeff Kubascik wrote:
> @@ -517,27 +516,35 @@ static const struct scheduler sched_arinc653_def = {
>      .sched_id       = XEN_SCHEDULER_ARINC653,
>      .sched_data     = NULL,
>  
> +    .global_init    = NULL,
>      .init           = a653sched_init,
>      .deinit         = a653sched_deinit,
>  
> -    .free_udata     = a653sched_free_udata,
> -    .alloc_udata    = a653sched_alloc_udata,
> +    .alloc_pdata    = NULL,
> +    .switch_sched   = a653sched_switch_sched,
> +    .deinit_pdata   = NULL,
> +    .free_pdata     = NULL,
>  
> +    .alloc_domdata  = NULL,
> +    .free_domdata   = NULL,
> +
> +    .alloc_udata    = a653sched_alloc_udata,
>      .insert_unit    = NULL,
>      .remove_unit    = NULL,
> +    .free_udata     = a653sched_free_udata,
>  
>      .sleep          = a653sched_unit_sleep,
>      .wake           = a653sched_unit_wake,
>      .yield          = NULL,
>      .context_saved  = NULL,
>  
> -    .do_schedule    = a653sched_do_schedule,
> -
>      .pick_resource  = a653sched_pick_resource,
> +    .migrate        = NULL,
>  
> -    .switch_sched   = a653sched_switch_sched,
> +    .do_schedule    = a653sched_do_schedule,
>  
>      .adjust         = NULL,
> +    .adjust_affinity= NULL,

Adding all these not really needed NULL initializers looks to rather move
this scheduler away from all the others. (Oddly enough all of them
explicitly set .sched_data to NULL - for whatever reason.)

Jan



* Re: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-16 18:18 ` [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling Jeff Kubascik
@ 2020-09-17  9:04   ` Jürgen Groß
  2020-09-17 15:10     ` Stewart Hildebrand
  2020-09-17 14:42   ` Andrew Cooper
  1 sibling, 1 reply; 34+ messages in thread
From: Jürgen Groß @ 2020-09-17  9:04 UTC (permalink / raw)
  To: Jeff Kubascik, xen-devel
  Cc: xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap,
	Dario Faggioli

On 16.09.20 20:18, Jeff Kubascik wrote:
> This change is an overhaul of the ARINC653 scheduler to enable CAST-32A
> multicore scheduling. CAST-32A specifies that only one partition
> (domain) can run during a minor frame, but that domain is now allowed to
> have more than one vCPU.

It might be worth considering using just the core scheduling framework
to achieve this. Using a scheduling granularity equal to the number of
cpus in the cpupool running the ARINC653 scheduler should already do
the trick. No further modification of the ARINC653 scheduler should be
required.


Juergen
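As a rough sketch of Jürgen's suggestion (hedged: `sched-gran` is the boot-parameter name in current Xen documentation, and whether core-scheduling granularity interacts with an ARINC653 cpupool as intended is exactly what would need verifying):

```
# Xen boot command line - select core granularity for scheduling:
#   sched-gran=core

# Hypothetical cpupool configuration for `xl cpupool-create`:
name  = "arinc653-pool"
sched = "arinc653"
cpus  = ["4-7"]
```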



* Re: [PATCH 2/5] sched/arinc653: Rename scheduler private structs
  2020-09-16 18:18 ` [PATCH 2/5] sched/arinc653: Rename scheduler private structs Jeff Kubascik
@ 2020-09-17 12:09   ` Andrew Cooper
  2020-09-17 14:46     ` Dario Faggioli
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-09-17 12:09 UTC (permalink / raw)
  To: Jeff Kubascik, xen-devel
  Cc: xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap,
	Dario Faggioli

On 16/09/2020 19:18, Jeff Kubascik wrote:
> diff --git a/xen/common/sched/arinc653.c b/xen/common/sched/arinc653.c
> index 7bb75ffe2b..d8a23730c3 100644
> --- a/xen/common/sched/arinc653.c
> +++ b/xen/common/sched/arinc653.c
> @@ -50,38 +50,38 @@
>   * Return a pointer to the ARINC 653-specific scheduler data information
>   * associated with the given UNIT
>   */
> -#define AUNIT(unit) ((arinc653_unit_t *)(unit)->priv)
> +#define AUNIT(unit) ((struct a653sched_unit *)(unit)->priv)
>  
>  /*
>   * Return the global scheduler private data given the scheduler ops pointer
>   */
> -#define SCHED_PRIV(s) ((a653sched_priv_t *)((s)->sched_data))
> +#define SCHED_PRIV(s) ((struct a653sched_private *)((s)->sched_data))

While you're cleaning things up, please delete these macros (possibly in
this patch, as you touch every almost every user).  They strictly
introduce type safety issues, and are in the process of being taken out
of the other schedulers.

Some logic already has a local variable.  These areas can be expanded in
place and everything will work.

Any logic which doesn't have a local variable needs to gain one.

~Andrew



* Re: [PATCH 1/5] sched/arinc653: Clean up comments
  2020-09-16 18:18 ` [PATCH 1/5] sched/arinc653: Clean up comments Jeff Kubascik
@ 2020-09-17 13:24   ` Andrew Cooper
  2020-09-18 15:33     ` Jeff Kubascik
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-09-17 13:24 UTC (permalink / raw)
  To: Jeff Kubascik, xen-devel
  Cc: xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap,
	Dario Faggioli

On 16/09/2020 19:18, Jeff Kubascik wrote:
> -/**
> - * Retrieve the idle UNIT for a given physical CPU
> +/*
> + * Retrieve the idle UNIT for a given pCPU
>   */

/** is also acceptable.  We've inherited quite a few doxygen-like
comments, and there is currently a plan to move some things formally to
kernel-doc as part of the automotive safety certification work, which
also uses /**.

~Andrew



* Re: [PATCH 4/5] sched/arinc653: Reorganize function definition order
  2020-09-17  8:12   ` Jan Beulich
@ 2020-09-17 14:16     ` Dario Faggioli
  2020-09-18 18:21       ` Jeff Kubascik
  2020-09-17 14:17     ` Andrew Cooper
  1 sibling, 1 reply; 34+ messages in thread
From: Dario Faggioli @ 2020-09-17 14:16 UTC (permalink / raw)
  To: Jan Beulich, Jeff Kubascik
  Cc: xen-devel, xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap


On Thu, 2020-09-17 at 10:12 +0200, Jan Beulich wrote:
> On 16.09.2020 20:18, Jeff Kubascik wrote:
> > @@ -517,27 +516,35 @@ static const struct scheduler
> > sched_arinc653_def = {
> >      .sched_id       = XEN_SCHEDULER_ARINC653,
> >      .sched_data     = NULL,
> >  
> > +    .global_init    = NULL,
> >      .init           = a653sched_init,
> >      .deinit         = a653sched_deinit,
> >  
> > -    .free_udata     = a653sched_free_udata,
> > -    .alloc_udata    = a653sched_alloc_udata,
> > +    .alloc_pdata    = NULL,
> > +    .switch_sched   = a653sched_switch_sched,
> > +    .deinit_pdata   = NULL,
> > +    .free_pdata     = NULL,
> >  
> > +    .alloc_domdata  = NULL,
> > +    .free_domdata   = NULL,
> > +
> > +    .alloc_udata    = a653sched_alloc_udata,
> >      .insert_unit    = NULL,
> >      .remove_unit    = NULL,
> > +    .free_udata     = a653sched_free_udata,
> >  
> >      .sleep          = a653sched_unit_sleep,
> >      .wake           = a653sched_unit_wake,
> >      .yield          = NULL,
> >      .context_saved  = NULL,
> >  
> > -    .do_schedule    = a653sched_do_schedule,
> > -
> >      .pick_resource  = a653sched_pick_resource,
> > +    .migrate        = NULL,
> >  
> > -    .switch_sched   = a653sched_switch_sched,
> > +    .do_schedule    = a653sched_do_schedule,
> >  
> >      .adjust         = NULL,
> > +    .adjust_affinity= NULL,
> 
> Adding all these not really needed NULL initializers looks to rather
> move
> this scheduler away from all the others.
>
Agreed, no need for more "= NULL". On the contrary, the ones that are
there should go away.

About this:

>  (Oddly enough all of them
> explicitly set .sched_data to NULL - for whatever reason.)
> 
Yes, we decided to keep it like that, back then. I think now it would
be ok for it to go away too.

So, Jeff, feel free to zap it with this patch or series. Or I can send
a patch to zap all of them, as you wish.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [PATCH 4/5] sched/arinc653: Reorganize function definition order
  2020-09-17  8:12   ` Jan Beulich
  2020-09-17 14:16     ` Dario Faggioli
@ 2020-09-17 14:17     ` Andrew Cooper
  2020-09-18 18:04       ` Jeff Kubascik
  2020-09-18 18:05       ` Jeff Kubascik
  1 sibling, 2 replies; 34+ messages in thread
From: Andrew Cooper @ 2020-09-17 14:17 UTC (permalink / raw)
  To: Jan Beulich, Jeff Kubascik
  Cc: xen-devel, xen-devel, Josh Whitehead, Stewart Hildebrand,
	George Dunlap, Dario Faggioli

On 17/09/2020 09:12, Jan Beulich wrote:
> On 16.09.2020 20:18, Jeff Kubascik wrote:
>> @@ -517,27 +516,35 @@ static const struct scheduler sched_arinc653_def = {
>>      .sched_id       = XEN_SCHEDULER_ARINC653,
>>      .sched_data     = NULL,
>>  
>> +    .global_init    = NULL,
>>      .init           = a653sched_init,
>>      .deinit         = a653sched_deinit,
>>  
>> -    .free_udata     = a653sched_free_udata,
>> -    .alloc_udata    = a653sched_alloc_udata,
>> +    .alloc_pdata    = NULL,
>> +    .switch_sched   = a653sched_switch_sched,
>> +    .deinit_pdata   = NULL,
>> +    .free_pdata     = NULL,
>>  
>> +    .alloc_domdata  = NULL,
>> +    .free_domdata   = NULL,
>> +
>> +    .alloc_udata    = a653sched_alloc_udata,
>>      .insert_unit    = NULL,
>>      .remove_unit    = NULL,
>> +    .free_udata     = a653sched_free_udata,
>>  
>>      .sleep          = a653sched_unit_sleep,
>>      .wake           = a653sched_unit_wake,
>>      .yield          = NULL,
>>      .context_saved  = NULL,
>>  
>> -    .do_schedule    = a653sched_do_schedule,
>> -
>>      .pick_resource  = a653sched_pick_resource,
>> +    .migrate        = NULL,
>>  
>> -    .switch_sched   = a653sched_switch_sched,
>> +    .do_schedule    = a653sched_do_schedule,
>>  
>>      .adjust         = NULL,
>> +    .adjust_affinity= NULL,
> Adding all these not really needed NULL initializers looks to rather move
> this scheduler away from all the others. (Oddly enough all of them
> explicitly set .sched_data to NULL - for whatever reason.)

The "= NULL" is totally redundant, because the compiler will do that for
you.

The last user of .sched_data was dropped by 9c95227160.  Conceptually,
it is a layering violation (it prevents different cpupools being
properly independent), so I'd recommend just dropping the field entirely.

~Andrew



* Re: [PATCH 3/5] sched/arinc653: Clean up function definitions
  2020-09-17  8:09   ` Jan Beulich
@ 2020-09-17 14:40     ` Dario Faggioli
  2020-09-18 17:43       ` Jeff Kubascik
  0 siblings, 1 reply; 34+ messages in thread
From: Dario Faggioli @ 2020-09-17 14:40 UTC (permalink / raw)
  To: Jan Beulich, Jeff Kubascik
  Cc: xen-devel, xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap


On Thu, 2020-09-17 at 10:09 +0200, Jan Beulich wrote:
> On 16.09.2020 20:18, Jeff Kubascik wrote:
> > --- a/xen/common/sched/arinc653.c
> > +++ b/xen/common/sched/arinc653.c
> > @@ -119,10 +119,9 @@ static int dom_handle_cmp(const
> > xen_domain_handle_t h1,
> >      return memcmp(h1, h2, sizeof(xen_domain_handle_t));
> >  }
> >  
> > -static struct sched_unit *find_unit(
> > -    const struct scheduler *ops,
> > -    xen_domain_handle_t handle,
> > -    int unit_id)
> > +static struct sched_unit *find_unit(const struct scheduler *ops,
> > +                                    xen_domain_handle_t handle,
> > +                                    int unit_id)
> >  {
> 
> Just fyi, afaict we consider both variants legitimate style as far
> as Xen as a whole is concerned; I'm unaware of scheduler code
> specific restrictions (but I'll be happy to be corrected if I'm
> wrong with this).
> 
No, you're right, there aren't any additional restrictions. And, as in
many other subsystems, scheduling code is not always 100% consistent.
There's quite a mix of style. E.g., there are both examples of the
style that this hunk above is changing and of the one that the patch is
changing it to.

So I also see limited need for doing it. But of course it's Josh's and
Stewart's call, I guess.

> Instead what I'm wondering by merely seeing this piece of code is
> whether unit_id really can go negative. If not (as would be the
> common case with IDs), it would want converting to unsigned int,
> which may be more important than the purely typographical
> adjustment done here.
> 
Yep, it's defined as `unsigned int` in `struct sched_unit`.

So this indeed would be valuable. And while you're there, this probably
applies here as well:

/**
 * The sched_entry_t structure holds a single entry of the
 * ARINC 653 schedule.
 */
typedef struct sched_entry_s
{
    /* dom_handle holds the handle ("UUID") for the domain that this
     * schedule entry refers to. */
    xen_domain_handle_t dom_handle;
    /* unit_id holds the UNIT number for the UNIT that this schedule
     * entry refers to. */
    int                 unit_id;
    ...
}
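Jan's unsigned-int point, applied to the snippet above, can be sketched as a
self-contained toy (the types here are simplified stand-ins for the Xen ones,
and find_entry() is a hypothetical cut-down analogue of find_unit()):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the Xen type (illustration only). */
typedef uint8_t xen_domain_handle_t[16];

typedef struct sched_entry_s
{
    /* Handle ("UUID") of the domain this schedule entry refers to. */
    xen_domain_handle_t dom_handle;
    /* UNIT number -- unsigned, matching struct sched_unit's unit_id. */
    unsigned int        unit_id;
} sched_entry_t;

static int dom_handle_cmp(const xen_domain_handle_t h1,
                          const xen_domain_handle_t h2)
{
    return memcmp(h1, h2, sizeof(xen_domain_handle_t));
}

/* find_unit() would likewise take `unsigned int unit_id`; here we just
 * show the lookup over a toy schedule array. */
static const sched_entry_t *find_entry(const sched_entry_t *sched,
                                       unsigned int n,
                                       const xen_domain_handle_t handle,
                                       unsigned int unit_id)
{
    for ( unsigned int i = 0; i < n; i++ )
        if ( !dom_handle_cmp(sched[i].dom_handle, handle) &&
             sched[i].unit_id == unit_id )
            return &sched[i];
    return NULL;
}
```

Since IDs never go negative, unsigned int both documents the value's domain and
avoids signed/unsigned comparison warnings against struct sched_unit's own
unsigned unit_id.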

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-16 18:18 ` [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling Jeff Kubascik
  2020-09-17  9:04   ` Jürgen Groß
@ 2020-09-17 14:42   ` Andrew Cooper
  2020-09-17 14:57     ` Stewart Hildebrand
  1 sibling, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-09-17 14:42 UTC (permalink / raw)
  To: Jeff Kubascik, xen-devel
  Cc: xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap,
	Dario Faggioli

On 16/09/2020 19:18, Jeff Kubascik wrote:
> +/*
> + * A handle with all zeros represents domain 0 if present, otherwise idle UNIT
> + */
> +#define DOM0_HANDLE ((const xen_domain_handle_t){0})

This isn't accurate.

There are systems where dom0 doesn't have a zero UUID (XenServer for
one), and it's easy to create domU's which have a zero UUID.  They are
not unique, and can be changed at any time during the running of the VM.

If you need a unique identifier, then use domid's.

I can't see any legitimate need for the scheduler to handle the UUID at all.

~Andrew


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/5] sched/arinc653: Rename scheduler private structs
  2020-09-17 12:09   ` Andrew Cooper
@ 2020-09-17 14:46     ` Dario Faggioli
  2020-09-18 15:52       ` Jeff Kubascik
  0 siblings, 1 reply; 34+ messages in thread
From: Dario Faggioli @ 2020-09-17 14:46 UTC (permalink / raw)
  To: Andrew Cooper, Jeff Kubascik, xen-devel
  Cc: xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap

[-- Attachment #1: Type: text/plain, Size: 1504 bytes --]

On Thu, 2020-09-17 at 13:09 +0100, Andrew Cooper wrote:
> On 16/09/2020 19:18, Jeff Kubascik wrote:
> > diff --git a/xen/common/sched/arinc653.c
> > b/xen/common/sched/arinc653.c
> > index 7bb75ffe2b..d8a23730c3 100644
> > --- a/xen/common/sched/arinc653.c
> > +++ b/xen/common/sched/arinc653.c
> > @@ -50,38 +50,38 @@
> >   * Return a pointer to the ARINC 653-specific scheduler data
> > information
> >   * associated with the given UNIT
> >   */
> > -#define AUNIT(unit) ((arinc653_unit_t *)(unit)->priv)
> > +#define AUNIT(unit) ((struct a653sched_unit *)(unit)->priv)
> >  
> >  /*
> >   * Return the global scheduler private data given the scheduler
> > ops pointer
> >   */
> > -#define SCHED_PRIV(s) ((a653sched_priv_t *)((s)->sched_data))
> > +#define SCHED_PRIV(s) ((struct a653sched_private *)((s)-
> > >sched_data))
> 
> While you're cleaning things up, please delete these macros (possibly
> in
> this patch, as you touch almost every user).  They strictly
> introduce type safety issues, and are in the process of being taken
> out
> of the other schedulers.
> 
Agreed. See, e.g.: a1c329c2828b ("xen: credit2: make accessor helpers
inline functions instead of macros")
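The shape of that conversion -- a casting macro replaced by a type-checked
inline helper -- can be sketched with simplified stand-ins (these are not the
real Xen structures, just enough to show the type-safety difference):

```c
/* Cut-down stand-ins for the Xen structures (illustration only). */
struct sched_unit { void *priv; };
struct a653sched_unit { unsigned int unit_id; };

/* Macro version: the cast silently accepts *any* pointer argument. */
#define AUNIT(unit) ((struct a653sched_unit *)(unit)->priv)

/* Inline-function version: the compiler now checks that the argument
 * really is a struct sched_unit *, and the return type is explicit. */
static inline struct a653sched_unit *aunit(const struct sched_unit *unit)
{
    return unit->priv;
}
```

With the macro, passing e.g. a struct domain * compiles without complaint; with
the inline function it is a diagnosable type error.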


Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-17 14:42   ` Andrew Cooper
@ 2020-09-17 14:57     ` Stewart Hildebrand
  2020-09-17 16:18       ` Andrew Cooper
  0 siblings, 1 reply; 34+ messages in thread
From: Stewart Hildebrand @ 2020-09-17 14:57 UTC (permalink / raw)
  To: Andrew Cooper, Jeff Kubascik, xen-devel
  Cc: xen-devel, Josh Whitehead, George Dunlap, Dario Faggioli

On Thursday, September 17, 2020 10:43 AM, Andrew Cooper wrote:
>On 16/09/2020 19:18, Jeff Kubascik wrote:
>> +/*
>> + * A handle with all zeros represents domain 0 if present, otherwise idle UNIT
>> + */
>> +#define DOM0_HANDLE ((const xen_domain_handle_t){0})
>
>This isn't accurate.
>
>There are systems where dom0 doesn't have a zero UUID (XenServer for
>one), and its easy to create domU's which have a zero UUID.  They are
>not unique, and can be changed at any time during the running of the VM.
>
>If you need a unique identifier, then use domid's.
>
>I can't see any legitimate need for the scheduler to handle the UUID at all.

We tried switching it to domid at one point in the past, but the problem was that when a domU reboots (destroy/create) the domid increments, so the schedule would have to be reinstantiated. At least that was the case before a recent patch series allowing to specify domid [1], but Jeff developed this CAST-32A series prior to that. The UUID can be specified in the .cfg file, allowing domUs to reboot and come back up with the same UUID.
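For context, the knob Stewart is describing is the uuid option in the xl domain
configuration file; a hedged sketch of such a .cfg (every value below is
hypothetical):

```text
# domU config: pin the UUID so the domain comes back with the same
# handle after a destroy/create cycle.
name   = "partition1"
uuid   = "00000000-0000-0000-0000-000000000001"
vcpus  = 2
memory = 256
kernel = "/path/to/guest-kernel"
```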

Stew

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01574.html

>~Andrew

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-17  9:04   ` Jürgen Groß
@ 2020-09-17 15:10     ` Stewart Hildebrand
  2020-09-17 15:18       ` Jürgen Groß
  2020-09-17 15:20       ` Dario Faggioli
  0 siblings, 2 replies; 34+ messages in thread
From: Stewart Hildebrand @ 2020-09-17 15:10 UTC (permalink / raw)
  To: Jürgen Groß, Jeff Kubascik, xen-devel
  Cc: xen-devel, Josh Whitehead, George Dunlap, Dario Faggioli

On Thursday, September 17, 2020 5:04 AM, Jürgen Groß wrote:
>On 16.09.20 20:18, Jeff Kubascik wrote:
>> This change is an overhaul of the ARINC653 scheduler to enable CAST-32A
>> multicore scheduling. CAST-32A specifies that only one partition
>> (domain) can run during a minor frame, but that domain is now allowed to
>> have more than one vCPU.
>
>It might be worth considering using just the core scheduling framework
>in order to achieve this. Using a sched_granularity with the number of
>cpus in the cpupool running the ARINC653 scheduler should already do the
>trick. There should be no further modification of the ARINC653 scheduler
>required.
>

This CAST-32A multicore patch series allows you to have a different number of vCPUs (UNITs, I guess) assigned to domUs. For example, dom1 has a single vCPU, and dom2 has 4 vCPUs. I didn't think the core scheduling framework had this flexibility?

Stew

>
>Juergen

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-17 15:10     ` Stewart Hildebrand
@ 2020-09-17 15:18       ` Jürgen Groß
  2020-09-17 15:20       ` Dario Faggioli
  1 sibling, 0 replies; 34+ messages in thread
From: Jürgen Groß @ 2020-09-17 15:18 UTC (permalink / raw)
  To: Stewart Hildebrand, Jeff Kubascik, xen-devel
  Cc: xen-devel, Josh Whitehead, George Dunlap, Dario Faggioli

On 17.09.20 17:10, Stewart Hildebrand wrote:
> On Thursday, September 17, 2020 5:04 AM, Jürgen Groß wrote:
>> On 16.09.20 20:18, Jeff Kubascik wrote:
>>> This change is an overhaul of the ARINC653 scheduler to enable CAST-32A
>>> multicore scheduling. CAST-32A specifies that only one partition
>>> (domain) can run during a minor frame, but that domain is now allowed to
>>> have more than one vCPU.
>>
>> It might be worth considering using just the core scheduling framework
>> in order to achieve this. Using a sched_granularity with the number of
>> cpus in the cpupool running the ARINC653 scheduler should already do the
>> trick. There should be no further modification of the ARINC653 scheduler
>> required.
>>
> 
> This CAST-32A multicore patch series allows you to have a different number of vCPUs (UNITs, I guess) assigned to domUs. For example, dom1 has a single vCPU, and dom2 has 4 vCPUs. I didn't think the core scheduling framework had this flexibility?

It has.

You can have a guest with only one vcpu running with any larger
granularity.


Juergen


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-17 15:10     ` Stewart Hildebrand
  2020-09-17 15:18       ` Jürgen Groß
@ 2020-09-17 15:20       ` Dario Faggioli
  2020-09-17 15:59         ` Stewart Hildebrand
  1 sibling, 1 reply; 34+ messages in thread
From: Dario Faggioli @ 2020-09-17 15:20 UTC (permalink / raw)
  To: Stewart Hildebrand, Jürgen Groß, Jeff Kubascik, xen-devel
  Cc: xen-devel, Josh Whitehead, George Dunlap

[-- Attachment #1: Type: text/plain, Size: 2076 bytes --]

On Thu, 2020-09-17 at 15:10 +0000, Stewart Hildebrand wrote:
> On Thursday, September 17, 2020 5:04 AM, Jürgen Groß wrote:
> > On 16.09.20 20:18, Jeff Kubascik wrote:
> > > This change is an overhaul of the ARINC653 scheduler to enable
> > > CAST-32A
> > > multicore scheduling. CAST-32A specifies that only one partition
> > > (domain) can run during a minor frame, but that domain is now
> > > allowed to
> > > have more than one vCPU.
> > 
> > It might be worth to consider using just the core scheduling
> > framework
> > in order to achive this. Using a sched_granularity with the number
> > of
> > cpus in the cpupool running ARINC653 scheduler should already do
> > the
> > trick. There should be no further midification of ARINC653
> > scheduler
> > required.
> > 
> 
> This CAST-32A multicore patch series allows you to have a different
> number of vCPUs (UNITs, I guess) assigned to domUs. 
>
Yep, you can have domains with different numbers of vCPUs, when using
core-scheduling (or socket-scheduling, or ...), if this is what you're
asking.

> For example, dom1 has a single vCPU, and dom2 has 4 vCPUs. I didn't
> think the core scheduling framework had this flexibility?
>
It does.

And if you have domain A with 1 vCPU and domain B with 2 vCPUs, with
sched-gran=core:
- when the vCPU of domain A is scheduled on a pCPU of a core, no vCPU 
  from domain B can be scheduled on the same core;
- when one of the vCPUs of domain B is scheduled on a pCPU of a core, 
  no other vCPUs, except the other vCPU of domain B can run on the 
  same core.

With sched-gran=socket, the same, with s/core/socket. :-)

So indeed it seems that sched-gran=NR_CPUS (actually, number of CPUs in
the pool, as Juergen is saying) may indeed do what you want.
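Concretely, that suggestion maps onto the sched-gran= Xen boot parameter
combined with an ARINC653 cpupool. A hedged sketch (pool name, domain name, and
CPU numbers are all hypothetical, and at the time of this thread sched-gran was
a host-wide boot option rather than per-pool):

```sh
# In the hypervisor boot entry, pick the scheduling granularity:
#   xen ... sched-gran=core

# Create a cpupool running the ARINC653 scheduler, give it two pCPUs,
# and migrate a domain into it:
xl cpupool-create name=\"arinc\" sched=\"arinc653\"
xl cpupool-cpu-remove Pool-0 2
xl cpupool-cpu-remove Pool-0 3
xl cpupool-cpu-add arinc 2
xl cpupool-cpu-add arinc 3
xl cpupool-migrate dom1 arinc
```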

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-17 15:20       ` Dario Faggioli
@ 2020-09-17 15:59         ` Stewart Hildebrand
  2020-09-17 17:30           ` Dario Faggioli
  0 siblings, 1 reply; 34+ messages in thread
From: Stewart Hildebrand @ 2020-09-17 15:59 UTC (permalink / raw)
  To: Dario Faggioli, Jürgen Groß, Jeff Kubascik, xen-devel
  Cc: xen-devel, Josh Whitehead, George Dunlap

On Thursday, September 17, 2020 11:20 AM, Dario Faggioli wrote:
>On Thu, 2020-09-17 at 15:10 +0000, Stewart Hildebrand wrote:
>> On Thursday, September 17, 2020 5:04 AM, Jürgen Groß wrote:
>> > On 16.09.20 20:18, Jeff Kubascik wrote:
>> > > This change is an overhaul of the ARINC653 scheduler to enable
>> > > CAST-32A
>> > > multicore scheduling. CAST-32A specifies that only one partition
>> > > (domain) can run during a minor frame, but that domain is now
>> > > allowed to
>> > > have more than one vCPU.
>> >
>> > It might be worth considering using just the core scheduling
>> > framework
>> > in order to achieve this. Using a sched_granularity with the number
>> > of
>> > cpus in the cpupool running the ARINC653 scheduler should already do
>> > the
>> > trick. There should be no further modification of the ARINC653
>> > scheduler
>> > required.
>> >
>>
>> This CAST-32A multicore patch series allows you to have a different
>> number of vCPUs (UNITs, I guess) assigned to domUs.
>>
>Yep, you can have domains with different numbers of vCPUs, when using
>core-scheduling (or socket-scheduling, or ...), if this is what you're
>asking.
>
>> For example, dom1 has a single vCPU, and dom2 has 4 vCPUs. I didn't
>> think the core scheduling framework had this flexibility?
>>
>It does.
>
>And if you have domain A with 1 vCPU and domain B with 2 vCPUs, with
>sched-gran=core:
>- when the vCPU of domain A is scheduled on a pCPU of a core, no vCPU
>  from domain B can be scheduled on the same core;
>- when one of the vCPUs of domain B is scheduled on a pCPU of a core,
>  no other vCPUs, except the other vCPU of domain B can run on the
>  same core.
>
>With sched-gran=socket, the same, with s/core/socket. :-)
>
>So indeed it seems that sched-gran=NR_CPUS (actually, number of CPUs in
>the pool, as Juergen is saying) may indeed do what you want.

Fascinating. Very cool, thanks for the insight. My understanding is that core scheduling is not currently enabled on arm. This series allows us to have multicore ARINC 653 on arm today without chasing down potential issues with core scheduling on arm...

Stew

>
>Regards
>--
>Dario Faggioli, Ph.D
>http://about.me/dario.faggioli
>Virtualization Software Engineer
>SUSE Labs, SUSE https://www.suse.com/
>-------------------------------------------------------------------
><<This happens because _I_ choose it to happen!>> (Raistlin Majere)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-17 14:57     ` Stewart Hildebrand
@ 2020-09-17 16:18       ` Andrew Cooper
  2020-09-17 17:57         ` Stewart Hildebrand
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-09-17 16:18 UTC (permalink / raw)
  To: Stewart Hildebrand, Jeff Kubascik, xen-devel
  Cc: xen-devel, Josh Whitehead, George Dunlap, Dario Faggioli

On 17/09/2020 15:57, Stewart Hildebrand wrote:
> On Thursday, September 17, 2020 10:43 AM, Andrew Cooper wrote:
>> On 16/09/2020 19:18, Jeff Kubascik wrote:
>>> +/*
>>> + * A handle with all zeros represents domain 0 if present, otherwise idle UNIT
>>> + */
>>> +#define DOM0_HANDLE ((const xen_domain_handle_t){0})
>> This isn't accurate.
>>
>> There are systems where dom0 doesn't have a zero UUID (XenServer for
>> one), and it's easy to create domU's which have a zero UUID.  They are
>> not unique, and can be changed at any time during the running of the VM.
>>
>> If you need a unique identifier, then use domid's.
>>
>> I can't see any legitimate need for the scheduler to handle the UUID at all.
> We tried switching it to domid at one point in the past, but the problem was that when a domU reboots (destroy/create) the domid increments, so the schedule would have to be reinstantiated.

How are settings specified?  If they're set manually while the domain is
running, then I'd argue that is a user bug.

If the settings are specified in the VM's configuration file, and they
aren't reapplied on reboot, then that is a toolstack bug.

> At least that was the case before a recent patch series allowing to specify domid [1], but Jeff developed this CAST-32A series prior to that. The UUID can be specified in the .cfg file, allowing domUs to reboot and come back up with the same UUID.

The UUID can and does change at runtime in some cases, when you get into
more complicated lifecycle scenarios such as localhost live migration,
and/or VM Fork/CoW.


Having scheduler settings remembered by UUID, in the lower layers of the
hypervisor, feels like a layering violation.  It might work for your
specific usecase, but it feels like it is papering over the underlying
bug, and it is incompatible with other usage scenarios within Xen.

~Andrew


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-17 15:59         ` Stewart Hildebrand
@ 2020-09-17 17:30           ` Dario Faggioli
  2020-09-18 20:03             ` Jeff Kubascik
  0 siblings, 1 reply; 34+ messages in thread
From: Dario Faggioli @ 2020-09-17 17:30 UTC (permalink / raw)
  To: Stewart Hildebrand, Jürgen Groß, Jeff Kubascik, xen-devel
  Cc: xen-devel, Josh Whitehead, George Dunlap

[-- Attachment #1: Type: text/plain, Size: 3279 bytes --]

On Thu, 2020-09-17 at 15:59 +0000, Stewart Hildebrand wrote:
> On Thursday, September 17, 2020 11:20 AM, Dario Faggioli wrote:
> > On Thu, 2020-09-17 at 15:10 +0000, Stewart Hildebrand wrote:
> > > > It might be worth considering using just the core scheduling
> > > > framework
> > > > in order to achieve this. Using a sched_granularity with the
> > > > number
> > > > of
> > > > cpus in the cpupool running the ARINC653 scheduler should already
> > > > do
> > > > the
> > > > trick. There should be no further modification of the ARINC653
> > > > scheduler
> > > > required.
> > > > 
> > > 
> > > This CAST-32A multicore patch series allows you to have a
> > > different
> > > number of vCPUs (UNITs, I guess) assigned to domUs.
> > > 
> > And if you have domain A with 1 vCPU and domain B with 2 vCPUs,
> > with
> > sched-gran=core:
> > - when the vCPU of domain A is scheduled on a pCPU of a core, no
> > vCPU
> >  from domain B can be scheduled on the same core;
> > - when one of the vCPUs of domain B is scheduled on a pCPU of a
> > core,
> >  no other vCPUs, except the other vCPU of domain B can run on the
> >  same core.
> 
> Fascinating. Very cool, thanks for the insight. My understanding is
> that core scheduling is not currently enabled on arm. This series
> allows us to have multicore ARINC 653 on arm today without chasing
> down potential issues with core scheduling on arm...
> 
Yeah, but at the cost of quite a bit of churn, and of a lot more code
in arinc653.c, basically duplicating the functionality.

I appreciate how crude and inaccurate this is, but arinc653.c is
currently 740 LOCs, and this patch is adding 601 and removing 204.

Add to this the fact that the architecture specific part of core-
scheduling should be limited to the handling of the context switches
(and that it may even work already, as what we weren't able to do was
proper testing).

If I can cite an anecdote, back in the days where core-scheduling was
being developed, I sent my own series implementing, for both credit1
and credit2. It had its issues, of course, but I think it had some
merits, even if compared with the current implementation we have
upstream (e.g., more flexibility, as core-scheduling could have been
enabled on a per-domain basis).

At least for me, a very big plus of the other approach that Juergen
suggested and then also implemented, was the fact that we would get the
feature for all the schedulers at once. And this (i.e., the fact that
it probably can be used for this purpose as well, without major changes
necessary inside ARINC653) seems to me to be a further confirmation
that it was the good way forward.

And don't think only to the need of writing the code (as you kind of
have it already), but also to testing. As in, the vast majority of the
core-scheduling logic and code is scheduler independent, and hence has
been stressed and tested already, even by people using schedulers
different than ARINC.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-17 16:18       ` Andrew Cooper
@ 2020-09-17 17:57         ` Stewart Hildebrand
  2020-09-18 19:22           ` Jeff Kubascik
  0 siblings, 1 reply; 34+ messages in thread
From: Stewart Hildebrand @ 2020-09-17 17:57 UTC (permalink / raw)
  To: Andrew Cooper, Jeff Kubascik, xen-devel
  Cc: xen-devel, Josh Whitehead, George Dunlap, Dario Faggioli

On Thursday, September 17, 2020 12:19 PM, Andrew Cooper wrote:
>On 17/09/2020 15:57, Stewart Hildebrand wrote:
>> On Thursday, September 17, 2020 10:43 AM, Andrew Cooper wrote:
>>> On 16/09/2020 19:18, Jeff Kubascik wrote:
>>>> +/*
>>>> + * A handle with all zeros represents domain 0 if present, otherwise idle UNIT
>>>> + */
>>>> +#define DOM0_HANDLE ((const xen_domain_handle_t){0})
>>> This isn't accurate.
>>>
>>> There are systems where dom0 doesn't have a zero UUID (XenServer for
>>> one), and it's easy to create domU's which have a zero UUID.  They are
>>> not unique, and can be changed at any time during the running of the VM.
>>>
>>> If you need a unique identifier, then use domid's.
>>>
>>> I can't see any legitimate need for the scheduler to handle the UUID at all.
>> We tried switching it to domid at one point in the past, but the problem was that when a domU reboots (destroy/create) the domid
>increments, so the schedule would have to be reinstantiated.
>
>How are settings specified?  If they're manually while the domain is
>running, then I'd argue that is a user bug.

It could be either prior to domain creation or after. The user needs to know the UUID (or domid, if we were to switch to domid) of the domain(s) to be scheduled.

We typically use this utility [2] which relies on tools/libxc/xc_arinc653.c

[2] https://github.com/dornerworks/a653_sched

>
>If the settings are specified in the VM's configuration file, and they
>aren't reapplied on reboot, then that is a toolstack bug.
>
>> At least that was the case before a recent patch series allowing to specify domid [1], but Jeff developed this CAST-32A series prior to
>that. The UUID can be specified in the .cfg file, allowing domUs to reboot and come back up with the same UUID.
>
>The UUID can and does change at runtime in some cases, when you get into
>more complicated lifecycle scenarios such as localhost live migration,
>and/or VM Fork/CoW.
>
>
>Having scheduler settings remembered by UUID, in the lower layers of the
>hypervisor, feels like a layering violation.  It might work for your
>specific usecase, but it feels like it is papering over the underlying
>bug, and it is incompatible with other usage scenarios within Xen.

These are all good points. I'd welcome a switch to domid, but I feel it should be a separate patch or series.

Stew

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/5] sched/arinc653: Clean up comments
  2020-09-17 13:24   ` Andrew Cooper
@ 2020-09-18 15:33     ` Jeff Kubascik
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff Kubascik @ 2020-09-18 15:33 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap,
	Dario Faggioli

On 9/17/2020 9:24 AM, Andrew Cooper wrote:
> On 16/09/2020 19:18, Jeff Kubascik wrote:
>> -/**
>> - * Retrieve the idle UNIT for a given physical CPU
>> +/*
>> + * Retrieve the idle UNIT for a given pCPU
>>   */
> 
> /** is also acceptable.  We've inherited quite a few doxygen-like
> comments, and there is currently a plan to move some things formally to
> kernel-doc as part of the automotive safety certification work, which
> also uses /**.
> 
> ~Andrew
> 

I didn't realize that was the case. I can switch these to /** if that would be
more desirable.

-Jeff


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/5] sched/arinc653: Rename scheduler private structs
  2020-09-17 14:46     ` Dario Faggioli
@ 2020-09-18 15:52       ` Jeff Kubascik
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff Kubascik @ 2020-09-18 15:52 UTC (permalink / raw)
  To: Dario Faggioli, Andrew Cooper, xen-devel
  Cc: xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap

>On Thu, 2020-09-17 at 13:09 +0100, Andrew Cooper wrote:
>> On 16/09/2020 19:18, Jeff Kubascik wrote:
>>> diff --git a/xen/common/sched/arinc653.c
>>> b/xen/common/sched/arinc653.c
>>> index 7bb75ffe2b..d8a23730c3 100644
>>> --- a/xen/common/sched/arinc653.c
>>> +++ b/xen/common/sched/arinc653.c
>>> @@ -50,38 +50,38 @@
>>>   * Return a pointer to the ARINC 653-specific scheduler data
>>> information
>>>   * associated with the given UNIT
>>>   */
>>> -#define AUNIT(unit) ((arinc653_unit_t *)(unit)->priv)
>>> +#define AUNIT(unit) ((struct a653sched_unit *)(unit)->priv)
>>>
>>>  /*
>>>   * Return the global scheduler private data given the scheduler
>>> ops pointer
>>>   */
>>> -#define SCHED_PRIV(s) ((a653sched_priv_t *)((s)->sched_data))
>>> +#define SCHED_PRIV(s) ((struct a653sched_private *)((s)-
>>>> sched_data))
>>
>> While you're cleaning things up, please delete these macros (possibly
>> in
>> this patch, as you touch almost every user).  They strictly
>> introduce type safety issues, and are in the process of being taken
>> out
>> of the other schedulers.
>>
>Agreed. See, e.g.: a1c329c2828b ("xen: credit2: make accessor helpers
>inline functions instead of macros")

That should be easy enough - I'll make this change.

-Jeff


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/5] sched/arinc653: Clean up function definitions
  2020-09-17 14:40     ` Dario Faggioli
@ 2020-09-18 17:43       ` Jeff Kubascik
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff Kubascik @ 2020-09-18 17:43 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: xen-devel, xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap

On 9/17/2020 10:40 AM, Dario Faggioli wrote:
>On Thu, 2020-09-17 at 10:09 +0200, Jan Beulich wrote:
>> On 16.09.2020 20:18, Jeff Kubascik wrote:
>>> --- a/xen/common/sched/arinc653.c
>>> +++ b/xen/common/sched/arinc653.c
>>> @@ -119,10 +119,9 @@ static int dom_handle_cmp(const
>>> xen_domain_handle_t h1,
>>>      return memcmp(h1, h2, sizeof(xen_domain_handle_t));
>>>  }
>>>
>>> -static struct sched_unit *find_unit(
>>> -    const struct scheduler *ops,
>>> -    xen_domain_handle_t handle,
>>> -    int unit_id)
>>> +static struct sched_unit *find_unit(const struct scheduler *ops,
>>> +                                    xen_domain_handle_t handle,
>>> +                                    int unit_id)
>>>  {
>>
>> Just fyi, afaict we consider both variants legitimate style as far
>> as Xen as a whole is concerned; I'm unaware of scheduler code
>> specific restrictions (but I'll be happy to be corrected if I'm
>> wrong with this).
>>
>No, you're right, there aren't any additional restrictions. And, as
>many other subsystems, scheduling code is not always 100% consistent.
>There's quite a mix of style. E.g., there are both examples of the
>style that this hunk above is changing and of the one that the patch is
>changing it to.
>
>So I also see limited need for doing it. But of course it's Josh's and
>Stweart's call, I guess.

If that's the case, then I'm thinking keeping the previous style would be
preferred. I'll switch it back.

>> Instead what I'm wondering by merely seeing this piece of code is
>> whether unit_id really can go negative. If not (as would be the
>> common case with IDs), it would want converting to unsigned int,
>> which may be more important than the purely typographical
>> adjustment done here.
>>
>Yep, it's defined as `unsigned int` in `struct sched_unit`.
>
>So this indeed would be valuable. And while you're there, this probably
>applies here as well:
>
>/**
> * The sched_entry_t structure holds a single entry of the
> * ARINC 653 schedule.
> */
>typedef struct sched_entry_s
>{
>    /* dom_handle holds the handle ("UUID") for the domain that this
>     * schedule entry refers to. */
>    xen_domain_handle_t dom_handle;
>    /* unit_id holds the UNIT number for the UNIT that this schedule
>     * entry refers to. */
>    int                 unit_id;
>    ...
>}

Agreed. I'll make this change.

-Jeff


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] sched/arinc653: Reorganize function definition order
  2020-09-17 14:17     ` Andrew Cooper
@ 2020-09-18 18:04       ` Jeff Kubascik
  2020-09-18 18:05       ` Jeff Kubascik
  1 sibling, 0 replies; 34+ messages in thread
From: Jeff Kubascik @ 2020-09-18 18:04 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: xen-devel, xen-devel, Josh Whitehead, Stewart Hildebrand,
	George Dunlap, Dario Faggioli

On 9/17/2020 10:17 AM, Andrew Cooper wrote:
> On 17/09/2020 09:12, Jan Beulich wrote:
>> On 16.09.2020 20:18, Jeff Kubascik wrote:
>>> @@ -517,27 +516,35 @@ static const struct scheduler sched_arinc653_def = {
>>>      .sched_id       = XEN_SCHEDULER_ARINC653,
>>>      .sched_data     = NULL,
>>>
>>> +    .global_init    = NULL,
>>>      .init           = a653sched_init,
>>>      .deinit         = a653sched_deinit,
>>>
>>> -    .free_udata     = a653sched_free_udata,
>>> -    .alloc_udata    = a653sched_alloc_udata,
>>> +    .alloc_pdata    = NULL,
>>> +    .switch_sched   = a653sched_switch_sched,
>>> +    .deinit_pdata   = NULL,
>>> +    .free_pdata     = NULL,
>>>
>>> +    .alloc_domdata  = NULL,
>>> +    .free_domdata   = NULL,
>>> +
>>> +    .alloc_udata    = a653sched_alloc_udata,
>>>      .insert_unit    = NULL,
>>>      .remove_unit    = NULL,
>>> +    .free_udata     = a653sched_free_udata,
>>>
>>>      .sleep          = a653sched_unit_sleep,
>>>      .wake           = a653sched_unit_wake,
>>>      .yield          = NULL,
>>>      .context_saved  = NULL,
>>>
>>> -    .do_schedule    = a653sched_do_schedule,
>>> -
>>>      .pick_resource  = a653sched_pick_resource,
>>> +    .migrate        = NULL,
>>>
>>> -    .switch_sched   = a653sched_switch_sched,
>>> +    .do_schedule    = a653sched_do_schedule,
>>>
>>>      .adjust         = NULL,
>>> +    .adjust_affinity= NULL,
>> Adding all these not really needed NULL initializers looks to rather move
>> this scheduler away from all the others. (Oddly enough all of them
>> explicitly set .sched_data to NULL - for whatever reason.)
> 
> The "= NULL" is totally redundant, because the compiler will do that for
> you.

I agree with this. This originally was intended to lay the groundwork for patch
#5, but looking at it again, it was confusing and unnecessary. I'll remove the =
NULL lines.
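The point about the "= NULL" lines follows from C's initialization rules:
members omitted from a braced (designated) initializer are zero-initialized, so
function pointers left out of the ops table are already NULL. A minimal
demonstration with a mocked-up ops struct (the names are illustrative, not the
real Xen scheduler interface):

```c
#include <stddef.h>

/* A cut-down stand-in for Xen's struct scheduler (illustration only). */
struct toy_scheduler_ops {
    const char *name;
    int  (*init)(void);
    void (*deinit)(void);
    int  (*adjust)(int);
};

static int toy_init(void) { return 0; }

/* Only two members are named; C guarantees the rest are zeroed, so
 * .deinit and .adjust are NULL without being written out. */
static const struct toy_scheduler_ops toy_ops = {
    .name = "toy",
    .init = toy_init,
};
```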

> The last user of .sched_data was dropped by 9c95227160.  Conceptually,
> it is a layering violation (it prevents different cpupools being
> properly independent), so I'd recommend just dropping the field entirely.

I'll remove .sched_data above.

-Jeff


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] sched/arinc653: Reorganize function definition order
  2020-09-17 14:17     ` Andrew Cooper
  2020-09-18 18:04       ` Jeff Kubascik
@ 2020-09-18 18:05       ` Jeff Kubascik
  1 sibling, 0 replies; 34+ messages in thread
From: Jeff Kubascik @ 2020-09-18 18:05 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: xen-devel, xen-devel, Josh Whitehead, Stewart Hildebrand,
	George Dunlap, Dario Faggioli

On 9/17/2020 10:17 AM, Andrew Cooper wrote:
> On 17/09/2020 09:12, Jan Beulich wrote:
>> On 16.09.2020 20:18, Jeff Kubascik wrote:
>>> @@ -517,27 +516,35 @@ static const struct scheduler sched_arinc653_def = {
>>>      .sched_id       = XEN_SCHEDULER_ARINC653,
>>>      .sched_data     = NULL,
>>>
>>> +    .global_init    = NULL,
>>>      .init           = a653sched_init,
>>>      .deinit         = a653sched_deinit,
>>>
>>> -    .free_udata     = a653sched_free_udata,
>>> -    .alloc_udata    = a653sched_alloc_udata,
>>> +    .alloc_pdata    = NULL,
>>> +    .switch_sched   = a653sched_switch_sched,
>>> +    .deinit_pdata   = NULL,
>>> +    .free_pdata     = NULL,
>>>
>>> +    .alloc_domdata  = NULL,
>>> +    .free_domdata   = NULL,
>>> +
>>> +    .alloc_udata    = a653sched_alloc_udata,
>>>      .insert_unit    = NULL,
>>>      .remove_unit    = NULL,
>>> +    .free_udata     = a653sched_free_udata,
>>>
>>>      .sleep          = a653sched_unit_sleep,
>>>      .wake           = a653sched_unit_wake,
>>>      .yield          = NULL,
>>>      .context_saved  = NULL,
>>>
>>> -    .do_schedule    = a653sched_do_schedule,
>>> -
>>>      .pick_resource  = a653sched_pick_resource,
>>> +    .migrate        = NULL,
>>>
>>> -    .switch_sched   = a653sched_switch_sched,
>>> +    .do_schedule    = a653sched_do_schedule,
>>>
>>>      .adjust         = NULL,
>>> +    .adjust_affinity= NULL,
>> Adding all these not really needed NULL initializers looks to rather move
>> this scheduler away from all the others. (Oddly enough all of them
>> explicitly set .sched_data to NULL - for whatever reason.)
> 
> The "= NULL" is totally redundant, because the compiler will do that for
> you.

I agree with this. This was originally intended to lay the groundwork for patch
#5, but looking at it again, it was confusing and unnecessary. I'll remove the
"= NULL" lines.

> The last user of .sched_data was dropped by 9c95227160.  Conceptually,
> it is a layering violation (it prevents different cpupools being
> properly independent), so I'd recommend just dropping the field entirely.

I'll remove .sched_data above.

-Jeff



* Re: [PATCH 4/5] sched/arinc653: Reorganize function definition order
  2020-09-17 14:16     ` Dario Faggioli
@ 2020-09-18 18:21       ` Jeff Kubascik
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff Kubascik @ 2020-09-18 18:21 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: xen-devel, xen-devel, Josh Whitehead, Stewart Hildebrand, George Dunlap

On 9/17/2020 10:16 AM, Dario Faggioli wrote:
>On Thu, 2020-09-17 at 10:12 +0200, Jan Beulich wrote:
>> On 16.09.2020 20:18, Jeff Kubascik wrote:
>>> @@ -517,27 +516,35 @@ static const struct scheduler
>>> sched_arinc653_def = {
>>>      .sched_id       = XEN_SCHEDULER_ARINC653,
>>>      .sched_data     = NULL,
>>>
>>> +    .global_init    = NULL,
>>>      .init           = a653sched_init,
>>>      .deinit         = a653sched_deinit,
>>>
>>> -    .free_udata     = a653sched_free_udata,
>>> -    .alloc_udata    = a653sched_alloc_udata,
>>> +    .alloc_pdata    = NULL,
>>> +    .switch_sched   = a653sched_switch_sched,
>>> +    .deinit_pdata   = NULL,
>>> +    .free_pdata     = NULL,
>>>
>>> +    .alloc_domdata  = NULL,
>>> +    .free_domdata   = NULL,
>>> +
>>> +    .alloc_udata    = a653sched_alloc_udata,
>>>      .insert_unit    = NULL,
>>>      .remove_unit    = NULL,
>>> +    .free_udata     = a653sched_free_udata,
>>>
>>>      .sleep          = a653sched_unit_sleep,
>>>      .wake           = a653sched_unit_wake,
>>>      .yield          = NULL,
>>>      .context_saved  = NULL,
>>>
>>> -    .do_schedule    = a653sched_do_schedule,
>>> -
>>>      .pick_resource  = a653sched_pick_resource,
>>> +    .migrate        = NULL,
>>>
>>> -    .switch_sched   = a653sched_switch_sched,
>>> +    .do_schedule    = a653sched_do_schedule,
>>>
>>>      .adjust         = NULL,
>>> +    .adjust_affinity= NULL,
>>
>> Adding all these not really needed NULL initializers looks to rather
>> move
>> this scheduler away from all the others.
>>
>Agreed, no need for more "= NULL". On the contrary, the ones that are
>there should go away.

Agreed x2, I'll remove the "= NULL" lines.

>About this:
>
>>  (Oddly enough all of them
>> explicitly set .sched_data to NULL - for whatever reason.)
>>
>Yes, we decided to keep it like that, back then. I think now it would
>be ok for it to go away too.
>
>So, Jeff, feel free to zap it with this patch or series. Or I can send
>a patch to zap all of them, as you wish.

I'll remove the ".sched_data = NULL" line above, but my scope is limited to the
ARINC653 scheduler, so I won't be able to remove it from the other schedulers.

-Jeff



* Re: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-17 17:57         ` Stewart Hildebrand
@ 2020-09-18 19:22           ` Jeff Kubascik
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff Kubascik @ 2020-09-18 19:22 UTC (permalink / raw)
  To: Stewart Hildebrand, Andrew Cooper, xen-devel
  Cc: xen-devel, Josh Whitehead, George Dunlap, Dario Faggioli

On 9/17/2020 1:57 PM, Stewart Hildebrand wrote:
> On Thursday, September 17, 2020 12:19 PM, Andrew Cooper wrote:
>> On 17/09/2020 15:57, Stewart Hildebrand wrote:
>>> On Thursday, September 17, 2020 10:43 AM, Andrew Cooper wrote:
>>>> On 16/09/2020 19:18, Jeff Kubascik wrote:
>>>>> +/*
>>>>> + * A handle with all zeros represents domain 0 if present, otherwise idle UNIT
>>>>> + */
>>>>> +#define DOM0_HANDLE ((const xen_domain_handle_t){0})
>>>> This isn't accurate.
>>>>
>>>> There are systems where dom0 doesn't have a zero UUID (XenServer for
>>>> one), and its easy to create domU's which have a zero UUID.  They are
>>>> not unique, and can be changed at any time during the running of the VM.
>>>>
>>>> If you need a unique identifier, then use domid's.
>>>>
>>>> I can't see any legitimate need for the scheduler to handle the UUID at all.
>>> We tried switching it to domid at one point in the past, but the problem was that when a domU reboots (destroy/create) the domid
>>> increments, so the schedule would have to be reinstantiated.
>>
>> How are settings specified?  If they're set manually while the domain is
>> running, then I'd argue that is a user bug.
> 
> It could be either prior to domain creation or after. The user needs to know the UUID (or domid, if we were to switch to domid) of the domain(s) to be scheduled.
> 
> We typically use this utility [2] which relies on tools/libxc/xc_arinc653.c
> 
> [2] https://github.com/dornerworks/a653_sched
> 
>>
>> If the settings are specified in the VM's configuration file, and they
>> aren't reapplied on reboot, then that is a toolstack bug.
>>
>>> At least that was the case before a recent patch series allowing to specify domid [1], but Jeff developed this CAST-32A series prior to
>>> that. The UUID can be specified in the .cfg file, allowing domUs to reboot and come back up with the same UUID.
>>
>> The UUID can and does change at runtime in some cases, when you get into
>> more complicated lifecycle scenarios such as localhost live migration,
>> and/or VM Fork/CoW.
>>
>>
>> Having scheduler settings remembered by UUID, in the lower layers of the
>> hypervisor, feels like a layering violation.  It might work for your
>> specific usecase, but it feels like it is papering over the underlying
>> bug, and it is incompatible with other usage scenarios within Xen.
> 
> These are all good points. I'd welcome a switch to domid, but I feel it should be a separate patch or series.

Is there a configuration file or xl create option to specify the domid? I'm
unable to find the mentioned patch series, or anything in the documentation.

And agreed. The current implementation of the scheduler uses the UUID approach.
Switching to a domid approach would be better suited for a separate series.

-Jeff



* Re: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-17 17:30           ` Dario Faggioli
@ 2020-09-18 20:03             ` Jeff Kubascik
  2020-09-18 20:34               ` Dario Faggioli
  2020-09-22 19:50               ` Andrew Cooper
  0 siblings, 2 replies; 34+ messages in thread
From: Jeff Kubascik @ 2020-09-18 20:03 UTC (permalink / raw)
  To: Dario Faggioli, Stewart Hildebrand, Jürgen Groß, xen-devel
  Cc: xen-devel, Josh Whitehead, George Dunlap

On 9/17/2020 1:30 PM, Dario Faggioli wrote:
>On Thu, 2020-09-17 at 15:59 +0000, Stewart Hildebrand wrote:
>> On Thursday, September 17, 2020 11:20 AM, Dario Faggioli wrote:
>>> On Thu, 2020-09-17 at 15:10 +0000, Stewart Hildebrand wrote:
>>>>> It might be worth to consider using just the core scheduling
>>>>> framework
>>>>> in order to achieve this. Using a sched_granularity with the
>>>>> number
>>>>> of
>>>>> cpus in the cpupool running ARINC653 scheduler should already
>>>>> do
>>>>> the
>>>>> trick. There should be no further modification of ARINC653
>>>>> scheduler
>>>>> required.
>>>>>
>>>>
>>>> This CAST-32A multicore patch series allows you to have a
>>>> different
>>>> number of vCPUs (UNITs, I guess) assigned to domUs.
>>>>
>>> And if you have domain A with 1 vCPU and domain B with 2 vCPUs,
>>> with
>>> sched-gran=core:
>>> - when the vCPU of domain A is scheduled on a pCPU of a core, no
>>> vCPU
>>>  from domain B can be scheduled on the same core;
>>> - when one of the vCPUs of domain B is scheduled on a pCPU of a
>>> core,
>>>  no other vCPUs, except the other vCPU of domain B can run on the
>>>  same core.
>>
>> Fascinating. Very cool, thanks for the insight. My understanding is
>> that core scheduling is not currently enabled on arm. This series
>> allows us to have multicore ARINC 653 on arm today without chasing
>> down potential issues with core scheduling on arm...
>>
>Yeah, but at the cost of quite a bit of churn, and of a lot more code
>in arinc653.c, basically duplicating the functionality.
>
>I appreciate how crude and inaccurate this is, but arinc653.c is
>currently 740 LOCs, and this patch is adding 601 and removing 204.
>
>Add to this the fact that the architecture specific part of core-
>scheduling should be limited to the handling of the context switches
>(and that it may even work already, as what we weren't able to do was
>proper testing).
>
>If I can cite an anecdote, back in the days where core-scheduling was
>being developed, I sent my own series implementing it, for both credit1
>and credit2. It had its issues, of course, but I think it had some
>merits, even if compared with the current implementation we have
>upstream (e.g., more flexibility, as core-scheduling could have been
>enabled on a per-domain basis).
>
>At least for me, a very big plus of the other approach that Juergen
>suggested and then also implemented, was the fact that we would get the
>feature for all the schedulers at once. And this (i.e., the fact that
>it probably can be used for this purpose as well, without major changes
>necessary inside ARINC653) seems to me to be a further confirmation
>that it was the good way forward.
>
>And don't think only to the need of writing the code (as you kind of
>have it already), but also to testing. As in, the vast majority of the
>core-scheduling logic and code is scheduler independent, and hence has
>been stressed and tested already, even by people using schedulers
>different than ARINC.

When is core scheduling expected to be available for ARM platforms? My
understanding is that this only works for Intel.

With core scheduling, is the pinning of vCPUs to pCPUs configurable? Or can the
scheduler change it at will? One advantage of this patch is that you can
explicitly pin a vCPU to a pCPU. This is a desirable feature for systems where
you are looking for determinism.

There are a few changes in this patch that I think should be pulled out if we go
the path of core scheduling. For instance, it removes a large structure from
being placed on the stack. It also adds the concept of an epoch so that the
scheduler doesn't drift (important for ARINC653) and can recover if a frame is
somehow missed (I commonly saw this when using a debugger). I also observed the
occasional kernel panic when using xl commands which was fixed by improving the
locking scheme.

-Jeff



* Re: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-18 20:03             ` Jeff Kubascik
@ 2020-09-18 20:34               ` Dario Faggioli
  2020-09-22 19:50               ` Andrew Cooper
  1 sibling, 0 replies; 34+ messages in thread
From: Dario Faggioli @ 2020-09-18 20:34 UTC (permalink / raw)
  To: Jeff Kubascik, Stewart Hildebrand, Jürgen Groß, xen-devel
  Cc: xen-devel, Josh Whitehead, George Dunlap


On Fri, 2020-09-18 at 16:03 -0400, Jeff Kubascik wrote:
> On 9/17/2020 1:30 PM, Dario Faggioli wrote:
> > On Thu, 2020-09-17 at 15:59 +0000, Stewart Hildebrand wrote:
> > > 
> > And don't think only to the need of writing the code (as you kind
> > of
> > have it already), but also to testing. As in, the vast majority of
> > the
> > core-scheduling logic and code is scheduler independent, and hence
> > has
> > been stressed and tested already, even by people using schedulers
> > different than ARINC.
> 
> When is core scheduling expected to be available for ARM platforms?
> My
> understanding is that this only works for Intel.
> 
As soon as "someone" actually tests it, and tells us how _well_ it works
*already*.  :-) :-P :-)

IIRC, there should be no (or really, really few) code changes necessary
for doing that.

> With core scheduling, is the pinning of vCPUs to pCPUs configurable?
> Or can the
> scheduler change it at will? One advantage of this patch is that you
> can
> explicitly pin a vCPU to a pCPU. This is a desirable feature for
> systems where
> you are looking for determinism.
> 
It certainly is. Now, as I think you know very well, some of the logic
for honoring the requested pinning is implemented inside each scheduler
(i.e., credit.c, credit2.c, etc).

Therefore, you *may* need to do the same in arinc653, probably
similarly to what you were doing in this patch (so there may indeed be
parts of this patch that are still necessary, but I still expect it to
become rather smaller and less intrusive).

But yeah, all the generic stuff is there already, and core-scheduling
does not prevent it from being used.

> There are a few changes in this patch that I think should be pulled
> out if we go
> the path of core scheduling. 
>
Sure, I totally agree.

> For instance, it removes a large structure from
> being placed on the stack. It also adds the concept of an epoch so
> that the
> scheduler doesn't drift (important for ARINC653) and can recover if a
> frame is
> somehow missed (I commonly saw this when using a debugger). I also
> observed the
> occasional kernel panic when using xl commands which was fixed by
> improving the
> locking scheme.
> 
Exactly, all these things looked legit and good to me. And they can be
pulled out in one or more other patches. Plus, you probably also need a
patch with the pinning bits.

But you should be able to get rid of all the parts that try to make
sure that only one domain is scheduled at any given time.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling
  2020-09-18 20:03             ` Jeff Kubascik
  2020-09-18 20:34               ` Dario Faggioli
@ 2020-09-22 19:50               ` Andrew Cooper
  1 sibling, 0 replies; 34+ messages in thread
From: Andrew Cooper @ 2020-09-22 19:50 UTC (permalink / raw)
  To: Jeff Kubascik, Dario Faggioli, Stewart Hildebrand,
	Jürgen Groß,
	xen-devel
  Cc: xen-devel, Josh Whitehead, George Dunlap

On 18/09/2020 21:03, Jeff Kubascik wrote:
> On 9/17/2020 1:30 PM, Dario Faggioli wrote:
>> On Thu, 2020-09-17 at 15:59 +0000, Stewart Hildebrand wrote:
>>> On Thursday, September 17, 2020 11:20 AM, Dario Faggioli wrote:
>>>> On Thu, 2020-09-17 at 15:10 +0000, Stewart Hildebrand wrote:
>>>>>> It might be worth to consider using just the core scheduling
>>>>>> framework
>>>>>> in order to achieve this. Using a sched_granularity with the
>>>>>> number
>>>>>> of
>>>>>> cpus in the cpupool running ARINC653 scheduler should already
>>>>>> do
>>>>>> the
>>>>>> trick. There should be no further modification of ARINC653
>>>>>> scheduler
>>>>>> required.
>>>>>>
>>>>> This CAST-32A multicore patch series allows you to have a
>>>>> different
>>>>> number of vCPUs (UNITs, I guess) assigned to domUs.
>>>>>
>>>> And if you have domain A with 1 vCPU and domain B with 2 vCPUs,
>>>> with
>>>> sched-gran=core:
>>>> - when the vCPU of domain A is scheduled on a pCPU of a core, no
>>>> vCPU
>>>>  from domain B can be scheduled on the same core;
>>>> - when one of the vCPUs of domain B is scheduled on a pCPU of a
>>>> core,
>>>>  no other vCPUs, except the other vCPU of domain B can run on the
>>>>  same core.
>>> Fascinating. Very cool, thanks for the insight. My understanding is
>>> that core scheduling is not currently enabled on arm. This series
>>> allows us to have multicore ARINC 653 on arm today without chasing
>>> down potential issues with core scheduling on arm...
>>>
>> Yeah, but at the cost of quite a bit of churn, and of a lot more code
>> in arinc653.c, basically duplicating the functionality.
>>
>> I appreciate how crude and inaccurate this is, but arinc653.c is
>> currently 740 LOCs, and this patch is adding 601 and removing 204.
>>
>> Add to this the fact that the architecture specific part of core-
>> scheduling should be limited to the handling of the context switches
>> (and that it may even work already, as what we weren't able to do was
>> proper testing).
>>
>> If I can cite an anecdote, back in the days where core-scheduling was
>> being developed, I sent my own series implementing it, for both credit1
>> and credit2. It had its issues, of course, but I think it had some
>> merits, even if compared with the current implementation we have
>> upstream (e.g., more flexibility, as core-scheduling could have been
>> enabled on a per-domain basis).
>>
>> At least for me, a very big plus of the other approach that Juergen
>> suggested and then also implemented, was the fact that we would get the
>> feature for all the schedulers at once. And this (i.e., the fact that
>> it probably can be used for this purpose as well, without major changes
>> necessary inside ARINC653) seems to me to be a further confirmation
>> that it was the good way forward.
>>
>> And don't think only to the need of writing the code (as you kind of
>> have it already), but also to testing. As in, the vast majority of the
>> core-scheduling logic and code is scheduler independent, and hence has
>> been stressed and tested already, even by people using schedulers
>> different than ARINC.
> When is core scheduling expected to be available for ARM platforms? My
> understanding is that this only works for Intel.

x86, but the real answer is "any architecture which has some knowledge
of the thread/core/socket topology of its CPUs".  ARM currently lacks
this information.

The actual implementation is totally architecture agnostic, and
implemented within the scheduler interface itself.

> With core scheduling, is the pinning of vCPUs to pCPUs configurable? Or can the
> scheduler change it at will? One advantage of this patch is that you can
> explicitly pin a vCPU to a pCPU. This is a desirable feature for systems where
> you are looking for determinism.

All schedulers, including ARINC, now operate on sched_item's (previously
vCPUs) and sched_resource's (previously pCPUs), by virtue of the change
to the common scheduler interface.

In the default case (== thread scheduling), there is a granularity of 1,
and a single sched_item maps to a single vCPU, and a single
sched_resource maps to a single pCPU. 

For the "core" case, we interrogate hardware to see how many threads per
core there are.  There may be 1 (no SMT), 2 (common case) or 4 (Knights
Landing/Corner HPC systems).  This becomes the granularity under the hood.

Therefore, a single sched_resource maps to a single core, which is 1, 2,
or 4 threads.  A single sched_item maps to the same number of vCPUs
(currently taken sequentially from the domain's vcpu list; any gaps are
filled by the idle vcpus).

The scheduling decision of when and where a sched_item runs is still up
to the scheduler.  Pinning is also now expressed at the sched_item =>
sched_resource level.

~Andrew



end of thread, other threads:[~2020-09-22 19:51 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-16 18:18 [PATCH 0/5] Multicore support for ARINC653 scheduler Jeff Kubascik
2020-09-16 18:18 ` [PATCH 1/5] sched/arinc653: Clean up comments Jeff Kubascik
2020-09-17 13:24   ` Andrew Cooper
2020-09-18 15:33     ` Jeff Kubascik
2020-09-16 18:18 ` [PATCH 2/5] sched/arinc653: Rename scheduler private structs Jeff Kubascik
2020-09-17 12:09   ` Andrew Cooper
2020-09-17 14:46     ` Dario Faggioli
2020-09-18 15:52       ` Jeff Kubascik
2020-09-16 18:18 ` [PATCH 3/5] sched/arinc653: Clean up function definitions Jeff Kubascik
2020-09-17  8:09   ` Jan Beulich
2020-09-17 14:40     ` Dario Faggioli
2020-09-18 17:43       ` Jeff Kubascik
2020-09-16 18:18 ` [PATCH 4/5] sched/arinc653: Reorganize function definition order Jeff Kubascik
2020-09-17  8:12   ` Jan Beulich
2020-09-17 14:16     ` Dario Faggioli
2020-09-18 18:21       ` Jeff Kubascik
2020-09-17 14:17     ` Andrew Cooper
2020-09-18 18:04       ` Jeff Kubascik
2020-09-18 18:05       ` Jeff Kubascik
2020-09-16 18:18 ` [PATCH 5/5] sched/arinc653: Implement CAST-32A multicore scheduling Jeff Kubascik
2020-09-17  9:04   ` Jürgen Groß
2020-09-17 15:10     ` Stewart Hildebrand
2020-09-17 15:18       ` Jürgen Groß
2020-09-17 15:20       ` Dario Faggioli
2020-09-17 15:59         ` Stewart Hildebrand
2020-09-17 17:30           ` Dario Faggioli
2020-09-18 20:03             ` Jeff Kubascik
2020-09-18 20:34               ` Dario Faggioli
2020-09-22 19:50               ` Andrew Cooper
2020-09-17 14:42   ` Andrew Cooper
2020-09-17 14:57     ` Stewart Hildebrand
2020-09-17 16:18       ` Andrew Cooper
2020-09-17 17:57         ` Stewart Hildebrand
2020-09-18 19:22           ` Jeff Kubascik
