* [PATCH v2 1/7] cobalt/process: skip trap handling for the root thread
@ 2019-11-24 15:46 Philippe Gerum
  2019-11-24 15:46 ` [PATCH v2 2/7] cobalt/thread: update inline documentation Philippe Gerum
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Philippe Gerum @ 2019-11-24 15:46 UTC (permalink / raw)
  To: xenomai

From: Philippe Gerum <rpm@xenomai.org>

Bottom line: the root thread is special in that it must always be in
a runnable state and must never bear any blocking bit.

Running the trap handler when sched->curr points at the root thread
means that we got a synchronous exception while running regular root
context code on the head stage, which is not related to any action we
might want to perform in the handler for Cobalt threads. Return early
in such a case.
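The reordering can be sketched in plain C. This is a hypothetical,
simplified model, not the real Cobalt code: the XNROOT value and the
helper signatures are stand-ins chosen for illustration only.

```c
#include <assert.h>

/* Simplified stand-in for the Cobalt thread state bits; the flag
 * value is arbitrary, not the real XNROOT definition. */
#define XNROOT 0x1

struct xnthread {
	int state;
};

static int xnthread_test_state(struct xnthread *t, int bits)
{
	return (t->state & bits) != 0;
}

/*
 * Toy model of the reordered handle_exception() entry: the fault is
 * traced first, then we bail out early when sched->curr is the root
 * thread, before any Cobalt-specific handling (breakpoint or FPU
 * fault processing) gets a chance to run.
 */
static int handle_exception(struct xnthread *curr, int *traced)
{
	*traced = 1;	/* trace_cobalt_thread_fault(d) */

	if (xnthread_test_state(curr, XNROOT))
		return 0;	/* regular root context code faulted */

	/* ...Cobalt-specific fault handling would follow here... */
	return 1;
}
```

Note that moving the trace call above the early return means root
faults are still traced, only the Cobalt-specific work is skipped.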

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
---
 kernel/cobalt/posix/process.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/cobalt/posix/process.c b/kernel/cobalt/posix/process.c
index f96abfa66..b90e53507 100644
--- a/kernel/cobalt/posix/process.c
+++ b/kernel/cobalt/posix/process.c
@@ -803,6 +803,11 @@ static inline int handle_exception(struct ipipe_trap_data *d)
 	sched = xnsched_current();
 	thread = sched->curr;
 
+	trace_cobalt_thread_fault(d);
+
+	if (xnthread_test_state(thread, XNROOT))
+		return 0;
+
 #ifdef IPIPE_KEVT_USERINTRET
 	if (xnarch_fault_bp_p(d) && user_mode(d->regs)) {
 		spl_t s;
@@ -817,11 +822,6 @@ static inline int handle_exception(struct ipipe_trap_data *d)
 	}
 #endif
 
-	if (xnthread_test_state(thread, XNROOT))
-		return 0;
-
-	trace_cobalt_thread_fault(d);
-
 	if (xnarch_fault_fpu_p(d)) {
 #ifdef CONFIG_XENO_ARCH_FPU
 		spl_t s;
-- 
2.21.0




* [PATCH v2 2/7] cobalt/thread: update inline documentation
  2019-11-24 15:46 [PATCH v2 1/7] cobalt/process: skip trap handling for the root thread Philippe Gerum
@ 2019-11-24 15:46 ` Philippe Gerum
  2019-11-24 15:46 ` [PATCH v2 3/7] cobalt/posix: fix missed rescheduling opportunities Philippe Gerum
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Philippe Gerum @ 2019-11-24 15:46 UTC (permalink / raw)
  To: xenomai

From: Philippe Gerum <rpm@xenomai.org>

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
---
 kernel/cobalt/thread.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/cobalt/thread.c b/kernel/cobalt/thread.c
index 666b4d4ca..a95396a0d 100644
--- a/kernel/cobalt/thread.c
+++ b/kernel/cobalt/thread.c
@@ -1012,16 +1012,16 @@ void xnthread_suspend(struct xnthread *thread, int mask,
 	 * We only care for threads that are not current, and for
 	 * XNSUSP, XNDELAY, XNDORMANT and XNHELD conditions, because:
 	 *
-	 * - There is no point in dealing with relaxed threads, since
-	 * personalities have to ask for primary mode switch when
-	 * processing any syscall which may block the caller
-	 * (i.e. __xn_exec_primary).
+	 * - There is no point in dealing with a relaxed thread which
+	 * is current, since personalities have to ask for primary
+	 * mode switch when processing any syscall which may block the
+	 * caller (i.e. __xn_exec_primary).
 	 *
 	 * - among all blocking bits (XNTHREAD_BLOCK_BITS), only
-	 * XNSUSP, XNDELAY and XNHELD may be applied by the current
-	 * thread to a non-current thread. XNPEND is always added by
-	 * the caller to its own state, XNMIGRATE and XNRELAX have
-	 * special semantics escaping this issue.
+	 * XNSUSP, XNDELAY, XNHELD and XNDBGSTOP may be applied by the
+	 * current thread to a non-current thread. XNPEND is always
+	 * added by the caller to its own state, XNMIGRATE, XNRELAX
+	 * and XNDBGSTOP have special semantics escaping this issue.
 	 *
 	 * We don't signal threads which are already in a dormant
 	 * state, since they are suspended by definition.
-- 
2.21.0




* [PATCH v2 3/7] cobalt/posix: fix missed rescheduling opportunities
  2019-11-24 15:46 [PATCH v2 1/7] cobalt/process: skip trap handling for the root thread Philippe Gerum
  2019-11-24 15:46 ` [PATCH v2 2/7] cobalt/thread: update inline documentation Philippe Gerum
@ 2019-11-24 15:46 ` Philippe Gerum
  2019-11-25  9:09   ` Jan Kiszka
  2019-11-24 15:46 ` [PATCH v2 4/7] cobalt/process: remove superfluous check for non-relaxed state Philippe Gerum
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2019-11-24 15:46 UTC (permalink / raw)
  To: xenomai

From: Philippe Gerum <rpm@xenomai.org>

stop_debugged_process() may suspend threads running on a remote CPU,
so we have to kick the rescheduling procedure for these changes to
take effect ASAP. Make sure xnsched_run() is kicked from the call
sites once they are done updating the scheduler state.
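The intent can be sketched with a toy per-CPU runqueue model. This is
an illustration only, under assumed semantics: the struct, field
names and the loop-based notification are stand-ins, the real
xnsched_run() batches remote notifications into a single resched IPI
mask rather than iterating runqueues.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU runqueue; hypothetical, not the real struct xnsched. */
struct rq {
	int cpu;
	bool resched_pending;	/* XNRESCHED-like hint */
	bool ipi_sent;		/* resched IPI fired at this CPU? */
};

static int current_cpu;	/* this toy "runs" on CPU0 */

/* Suspending a thread hosted by a remote CPU only marks that CPU's
 * runqueue; nothing happens until the rescheduling procedure runs. */
static void suspend_remote_thread(struct rq *target)
{
	target->resched_pending = true;
}

/* Toy xnsched_run(): notify every remote runqueue with pending work. */
static void xnsched_run(struct rq *rqs, int nr)
{
	for (int i = 0; i < nr; i++)
		if (rqs[i].resched_pending && rqs[i].cpu != current_cpu)
			rqs[i].ipi_sent = true;
}
```

The point of the patch is the second step: without an explicit
xnsched_run() kick at the call site, the remote suspension would sit
pending until some unrelated rescheduling happened to fire.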

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
---
 kernel/cobalt/posix/process.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/cobalt/posix/process.c b/kernel/cobalt/posix/process.c
index b90e53507..1e003108e 100644
--- a/kernel/cobalt/posix/process.c
+++ b/kernel/cobalt/posix/process.c
@@ -819,6 +819,7 @@ static inline int handle_exception(struct ipipe_trap_data *d)
 		}
 		stop_debugged_process(thread);
 		xnlock_put_irqrestore(&nklock, s);
+		xnsched_run();
 	}
 #endif
 
@@ -1295,10 +1296,8 @@ static int handle_sigwake_event(struct task_struct *p)
 			cobalt_register_debugged_thread(thread);
 	}
 
-	if (xnthread_test_state(thread, XNRELAX)) {
-		xnlock_put_irqrestore(&nklock, s);
-		return KEVENT_PROPAGATE;
-	}
+	if (xnthread_test_state(thread, XNRELAX))
+		goto out;
 
 	/*
 	 * If kicking a shadow thread in primary mode, make sure Linux
@@ -1320,7 +1319,7 @@ static int handle_sigwake_event(struct task_struct *p)
 		xnthread_resume(thread, XNDBGSTOP);
 
 	__xnthread_kick(thread);
-
+out:
 	xnsched_run();
 
 	xnlock_put_irqrestore(&nklock, s);
-- 
2.21.0




* [PATCH v2 4/7] cobalt/process: remove superfluous check for non-relaxed state
  2019-11-24 15:46 [PATCH v2 1/7] cobalt/process: skip trap handling for the root thread Philippe Gerum
  2019-11-24 15:46 ` [PATCH v2 2/7] cobalt/thread: update inline documentation Philippe Gerum
  2019-11-24 15:46 ` [PATCH v2 3/7] cobalt/posix: fix missed rescheduling opportunities Philippe Gerum
@ 2019-11-24 15:46 ` Philippe Gerum
  2019-11-24 15:46 ` [PATCH v2 5/7] cobalt/thread: fix scheduler breakage on thread suspension Philippe Gerum
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Philippe Gerum @ 2019-11-24 15:46 UTC (permalink / raw)
  To: xenomai

From: Philippe Gerum <rpm@xenomai.org>

If sched->curr is not the root thread, it has to refer to a Cobalt
thread currently running in primary mode, by design. Convert this
check to an assertion.
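The shape of the change can be modeled as below. This is a sketch
under stated assumptions: WARN_ON_TOY merely counts instead of
logging, and the XNRELAX value is arbitrary; the real XENO_WARN_ON
reports the condition (when core debugging is enabled) without
altering control flow, which is exactly what lets the body run
unconditionally.

```c
#include <assert.h>

/* Hypothetical warn-once stand-in for XENO_WARN_ON(CORE, ...). */
static int warn_count;
#define WARN_ON_TOY(cond)  do { if (cond) warn_count++; } while (0)

#define XNRELAX 0x1	/* arbitrary value, not the real bit */

struct xnthread { int state; };

/*
 * Before: the XNCONTHI/user-intret setup was silently skipped for a
 * relaxed thread. After: a relaxed thread here is a reported bug,
 * and the debug-stop path runs unconditionally.
 */
static void handle_user_breakpoint(struct xnthread *curr, int *conthi_set)
{
	WARN_ON_TOY(curr->state & XNRELAX);
	*conthi_set = 1;	/* xnthread_set_info(thread, XNCONTHI) */
}
```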

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
---
 kernel/cobalt/posix/process.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/cobalt/posix/process.c b/kernel/cobalt/posix/process.c
index 1e003108e..90949e519 100644
--- a/kernel/cobalt/posix/process.c
+++ b/kernel/cobalt/posix/process.c
@@ -812,11 +812,10 @@ static inline int handle_exception(struct ipipe_trap_data *d)
 	if (xnarch_fault_bp_p(d) && user_mode(d->regs)) {
 		spl_t s;
 
+		XENO_WARN_ON(CORE, xnthread_test_state(thread, XNRELAX));
 		xnlock_get_irqsave(&nklock, s);
-		if (!xnthread_test_state(thread, XNRELAX)) {
-			xnthread_set_info(thread, XNCONTHI);
-			ipipe_enable_user_intret_notifier();
-		}
+		xnthread_set_info(thread, XNCONTHI);
+		ipipe_enable_user_intret_notifier();
 		stop_debugged_process(thread);
 		xnlock_put_irqrestore(&nklock, s);
 		xnsched_run();
-- 
2.21.0




* [PATCH v2 5/7] cobalt/thread: fix scheduler breakage on thread suspension
  2019-11-24 15:46 [PATCH v2 1/7] cobalt/process: skip trap handling for the root thread Philippe Gerum
                   ` (2 preceding siblings ...)
  2019-11-24 15:46 ` [PATCH v2 4/7] cobalt/process: remove superfluous check for non-relaxed state Philippe Gerum
@ 2019-11-24 15:46 ` Philippe Gerum
  2019-11-24 15:46 ` [PATCH v2 6/7] cobalt/sched: drop useless rmb in lock check Philippe Gerum
  2019-11-24 15:46 ` [PATCH v2 7/7] cobalt/sched: restrict rescheduling hints to migration-safe context Philippe Gerum
  5 siblings, 0 replies; 11+ messages in thread
From: Philippe Gerum @ 2019-11-24 15:46 UTC (permalink / raw)
  To: xenomai

From: Philippe Gerum <rpm@xenomai.org>

In the event that xnthread_suspend() adds blocking bit(s) to a thread
which is running in primary mode on a remote CPU at the time of the
call, a spurious rescheduling call is performed, causing the local CPU
to pick its next thread from the remote CPU's runqueue.

CPU0                              CPU1
----                              ----
                                  t2: ...
t1: suspend(t2)                       ...
       |
       |
       +------> t2->sched == t2->sched->curr (i.e. running primary)
                           |
                           |
                           +----> __xnsched_run(t2->sched);
                                        |
                                        |
                                        v
                          CPU0: ___xnsched_run(CPU1.sched);

IOW, CPU0 would pick the next thread from CPU1's runqueue. Conditions
for observing this bug:

- t1 is running in primary mode on the local CPU (i.e.
  CPU0:sched->curr == t1), so that the rescheduling request needs no
  prior escalation to the head stage, allowing ___xnsched_run() to
  execute immediately using the sched slot pointer received from
  __xnsched_run().

- t2 is running in primary mode on a remote CPU (i.e.
  CPU1:sched->curr == t2) when t1 attempts to suspend it via a call to
  xnthread_suspend().
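The fix boils down to which scheduler slot the immediate rescheduling
pass receives. The sketch below is a simplified illustration, not the
real code: the struct, the array of slots and pick_resched_slot() are
hypothetical stand-ins for the xnthread_suspend() call site.

```c
#include <assert.h>

/* Minimal stand-in for a per-CPU scheduler slot (hypothetical). */
struct xnsched { int cpu; };

static struct xnsched scheds[2] = { { 0 }, { 1 } };
static int current_cpu;	/* this toy "runs" on CPU0 */

static struct xnsched *xnsched_current(void)
{
	return &scheds[current_cpu];
}

/*
 * Model of the fixed call site in xnthread_suspend(): the suspended
 * thread's runqueue (thread_sched) may be remote, so the immediate
 * rescheduling pass must always apply to the local runqueue; the
 * remote one is reached via IPI instead.
 */
static struct xnsched *pick_resched_slot(struct xnsched *thread_sched)
{
	(void)thread_sched;	/* never pass a possibly-remote slot down */
	return xnsched_current();	/* __xnsched_run(xnsched_current()) */
}
```

In other words, the buggy code passed thread_sched straight through,
which is only correct when the suspended thread happens to be local.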

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
---
 kernel/cobalt/thread.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/kernel/cobalt/thread.c b/kernel/cobalt/thread.c
index a95396a0d..a5f8e1b00 100644
--- a/kernel/cobalt/thread.c
+++ b/kernel/cobalt/thread.c
@@ -962,12 +962,15 @@ void xnthread_suspend(struct xnthread *thread, int mask,
 	 * opportunity for interrupt delivery right before switching
 	 * context, which shortens the uninterruptible code path.
 	 *
-	 * We have to shut irqs off before __xnsched_run() though: if
-	 * an interrupt could preempt us in ___xnsched_run() right
-	 * after the call to xnarch_escalate() but before we grab the
-	 * nklock, we would enter the critical section in
-	 * xnsched_run() while running in secondary mode, which would
-	 * defeat the purpose of xnarch_escalate().
+	 * We have to shut irqs off before calling __xnsched_run()
+	 * though: if an interrupt could preempt us right after
+	 * xnarch_escalate() is passed but before the nklock is
+	 * grabbed, we would enter the critical section in
+	 * ___xnsched_run() from the root domain, which would defeat
+	 * the purpose of escalating the request.
+	 *
+	 * NOTE: using __xnsched_run() for rescheduling allows us to
+	 * break the scheduler lock temporarily.
 	 */
 	if (likely(thread == sched->curr)) {
 		xnsched_set_resched(sched);
@@ -978,10 +981,13 @@ void xnthread_suspend(struct xnthread *thread, int mask,
 			return;
 		}
 		/*
-		 * If the thread is runnning on another CPU,
-		 * xnsched_run will trigger the IPI as required.
+		 * If the thread is running on a remote CPU,
+		 * xnsched_run() will trigger the IPI as required.  In
+		 * this case, sched refers to a remote runqueue, so
+		 * make sure to always kick the rescheduling procedure
+		 * for the local one.
 		 */
-		__xnsched_run(sched);
+		__xnsched_run(xnsched_current());
 		goto out;
 	}
 
-- 
2.21.0




* [PATCH v2 6/7] cobalt/sched: drop useless rmb in lock check
  2019-11-24 15:46 [PATCH v2 1/7] cobalt/process: skip trap handling for the root thread Philippe Gerum
                   ` (3 preceding siblings ...)
  2019-11-24 15:46 ` [PATCH v2 5/7] cobalt/thread: fix scheduler breakage on thread suspension Philippe Gerum
@ 2019-11-24 15:46 ` Philippe Gerum
  2019-11-24 15:46 ` [PATCH v2 7/7] cobalt/sched: restrict rescheduling hints to migration-safe context Philippe Gerum
  5 siblings, 0 replies; 11+ messages in thread
From: Philippe Gerum @ 2019-11-24 15:46 UTC (permalink / raw)
  To: xenomai

From: Philippe Gerum <rpm@xenomai.org>

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
---
 include/cobalt/kernel/sched.h | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/cobalt/kernel/sched.h b/include/cobalt/kernel/sched.h
index 3b0825d67..c66e29d4c 100644
--- a/include/cobalt/kernel/sched.h
+++ b/include/cobalt/kernel/sched.h
@@ -298,6 +298,9 @@ void __xnsched_run_handler(void);
 static inline int __xnsched_run(struct xnsched *sched)
 {
 	/*
+	 * Reschedule if XNSCHED is pending, but never over an IRQ
+	 * handler or in the middle of unlocked context switch.
+	 *
 	 * NOTE: Since ___xnsched_run() won't run immediately if an
 	 * escalation to primary domain is needed, we won't use
 	 * critical scheduler information before we actually run in
@@ -328,15 +331,9 @@ static inline int __xnsched_run(struct xnsched *sched)
 static inline int xnsched_run(void)
 {
 	struct xnsched *sched = xnsched_current();
-	/*
-	 * No rescheduling is possible, either if:
-	 *
-	 * - the current thread holds the scheduler lock
-	 * - an ISR context is active
-	 * - we are caught in the middle of an unlocked context switch.
-	 */
-	smp_rmb();
-	if (unlikely(sched->curr->lock_count > 0))
+	struct xnthread *curr = sched->curr;
+
+	if (unlikely(curr->lock_count > 0))
 		return 0;
 
 	return __xnsched_run(sched);
-- 
2.21.0




* [PATCH v2 7/7] cobalt/sched: restrict rescheduling hints to migration-safe context
  2019-11-24 15:46 [PATCH v2 1/7] cobalt/process: skip trap handling for the root thread Philippe Gerum
                   ` (4 preceding siblings ...)
  2019-11-24 15:46 ` [PATCH v2 6/7] cobalt/sched: drop useless rmb in lock check Philippe Gerum
@ 2019-11-24 15:46 ` Philippe Gerum
  2019-11-25 10:40   ` Philippe Gerum
  5 siblings, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2019-11-24 15:46 UTC (permalink / raw)
  To: xenomai

From: Philippe Gerum <rpm@xenomai.org>

XNINIRQ, XNINSW and XNRESCHED are quick hints loaded from per-cpu
data which the main rescheduling call (xnsched_run()) considers to
figure out whether the actual rescheduling backend (___xnsched_run())
should run immediately, or be postponed to a later call.

In rare cases, we might call xnsched_run() from the root thread with
hard irqs enabled.  Because such requests also depend on those hints,
they are sensitive to CPU migration. The changes address this issue:
such requests can be forwarded to ___xnsched_run() directly -
ignoring the hints - because the following premises apply in this
case:

- XNINIRQ must be cleared, since that one only appears when serving an
  IRQ from the head domain, in which case hard irqs must be disabled.

- XNINSW must be cleared, since it frames a Cobalt context switch,
  which can only happen from the head domain with hard irqs disabled
  too.

- the lock count is irrelevant for the root thread if hard irqs are
  enabled, and should be ignored as such (see
  xnsched_lock()). Disabling hard interrupts is the only way to
  prevent the root stage from being preempted by actual Cobalt
  threads.

- if xnsched_run() is called by a Cobalt interrupt handler which has
  preempted the root thread then unwinds
  (e.g. xnintr_core_clock_handler()), hard irqs are disabled, in which
  case the XNRESCHED hint must be considered first.

Otherwise, we can safely assume that calling xnsched_run() from the
root stage with hard irqs enabled should cause an actual rescheduling
in most cases, either on the local CPU or remotely, following some
Cobalt syscall.  So far, only Cobalt thread creation and deletion
events are known to trigger potentially nop rescheduling requests, and
those are definitely slow paths already. In any case, ___xnsched_run()
ignores useless calls, so no harm done with infrequent invocations
from such paths.

This prevents any CPU migration issue by accessing per-cpu scheduler
data only when hard interrupts are disabled.
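The resulting dispatch logic can be sketched as below. This is a toy
model of the patched xnsched_run() under stated assumptions: the bit
values, struct layouts and the hard_irqs_off parameter are
illustrative stand-ins, only the decision order and the hint test
mirror the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the scheduler hints and thread state
 * bits; the values are arbitrary, only the combinations matter. */
#define XNRESCHED 0x1
#define XNINSW    0x2
#define XNINIRQ   0x4
#define XNROOT    0x8

struct xnthread { int state; int lock_count; };
struct xnsched  { int status; int lflags; struct xnthread *curr; };

enum path { PATH_NONE, PATH_DIRECT, PATH_HINTED };

/*
 * Model of the patched xnsched_run() dispatch: the root thread with
 * hard irqs enabled bypasses the per-CPU hints entirely; otherwise
 * the usual test applies, i.e. reschedule only if XNRESCHED is
 * pending and neither XNINIRQ nor XNINSW is set.
 */
static enum path xnsched_run_path(struct xnsched *sched, bool hard_irqs_off)
{
	struct xnthread *curr = sched->curr;

	if ((curr->state & XNROOT) && !hard_irqs_off)
		return PATH_DIRECT;	/* ___xnsched_run(sched) */

	if (curr->lock_count > 0)
		return PATH_NONE;	/* scheduler is locked */

	if (((sched->status | sched->lflags) &
	     (XNINIRQ | XNINSW | XNRESCHED)) != XNRESCHED)
		return PATH_NONE;	/* postpone */

	return PATH_HINTED;		/* __xnsched_run(sched) */
}
```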

Signed-off-by: Philippe Gerum <rpm@xenomai.org>
---
 include/cobalt/kernel/sched.h | 37 ++++++++++++-----------------------
 kernel/cobalt/sched.c         |  6 +++++-
 2 files changed, 18 insertions(+), 25 deletions(-)

diff --git a/include/cobalt/kernel/sched.h b/include/cobalt/kernel/sched.h
index c66e29d4c..55581148e 100644
--- a/include/cobalt/kernel/sched.h
+++ b/include/cobalt/kernel/sched.h
@@ -300,26 +300,6 @@ static inline int __xnsched_run(struct xnsched *sched)
 	/*
 	 * Reschedule if XNSCHED is pending, but never over an IRQ
 	 * handler or in the middle of unlocked context switch.
-	 *
-	 * NOTE: Since ___xnsched_run() won't run immediately if an
-	 * escalation to primary domain is needed, we won't use
-	 * critical scheduler information before we actually run in
-	 * primary mode; therefore we can first test the scheduler
-	 * status then escalate.
-	 *
-	 * Running in the primary domain means that no Linux-triggered
-	 * CPU migration may occur from that point either. Finally,
-	 * since migration is always a self-directed operation for
-	 * Xenomai threads, we can safely read the scheduler state
-	 * bits without holding the nklock.
-	 *
-	 * Said differently, if we race here because of a CPU
-	 * migration, it must have been Linux-triggered because we run
-	 * in secondary mode; in which case we will escalate to the
-	 * primary domain, then unwind the current call frame without
-	 * running the rescheduling procedure in
-	 * ___xnsched_run(). Therefore, the scheduler slot
-	 * (i.e. "sched") will be either valid, or unused.
 	 */
 	if (((sched->status|sched->lflags) &
 	     (XNINIRQ|XNINSW|XNRESCHED)) != XNRESCHED)
@@ -331,12 +311,21 @@ static inline int __xnsched_run(struct xnsched *sched)
 static inline int xnsched_run(void)
 {
 	struct xnsched *sched = xnsched_current();
-	struct xnthread *curr = sched->curr;
+	struct xnthread *curr = READ_ONCE(sched->curr);
+	int ret = 0;
 
-	if (unlikely(curr->lock_count > 0))
-		return 0;
+	/*
+	 * If running over the root thread and hard irqs are enabled,
+	 * kick the rescheduling procedure unconditionally: no hints
+	 * apply.
+	 */
+	if (unlikely(xnthread_test_state(curr, XNROOT) &&
+			!hard_irqs_disabled())) {
+		ret = ___xnsched_run(sched);
+	} else if (likely(curr->lock_count == 0))
+		ret = __xnsched_run(sched);
 
-	return __xnsched_run(sched);
+	return ret;
 }
 
 void xnsched_lock(void);
diff --git a/kernel/cobalt/sched.c b/kernel/cobalt/sched.c
index 74dc7a961..024431af2 100644
--- a/kernel/cobalt/sched.c
+++ b/kernel/cobalt/sched.c
@@ -1000,7 +1000,11 @@ reschedule:
 
 	trace_cobalt_switch_context(prev, next);
 
-	sched->curr = next;
+	/*
+	 * sched->curr is shared locklessly with xnsched_run(), be
+	 * careful about load/store tearing.
+	 */
+	WRITE_ONCE(sched->curr, next);
 	shadow = 1;
 
 	if (xnthread_test_state(prev, XNROOT)) {
-- 
2.21.0




* Re: [PATCH v2 3/7] cobalt/posix: fix missed rescheduling opportunities
  2019-11-24 15:46 ` [PATCH v2 3/7] cobalt/posix: fix missed rescheduling opportunities Philippe Gerum
@ 2019-11-25  9:09   ` Jan Kiszka
  2019-11-25 10:09     ` Philippe Gerum
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2019-11-25  9:09 UTC (permalink / raw)
  To: Philippe Gerum, xenomai

On 24.11.19 16:46, Philippe Gerum via Xenomai wrote:
> From: Philippe Gerum <rpm@xenomai.org>
> 
> stop_debugged_thread() may suspend threads running on a remote CPU, we
> have to kick the rescheduling procedure for these changes to take
> effect asap. Make sure xnsched_run() is kicked from the call sites
> once they are done updating the scheduler state.
> 
> Signed-off-by: Philippe Gerum <rpm@xenomai.org>
> ---
>   kernel/cobalt/posix/process.c | 9 ++++-----
>   1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/cobalt/posix/process.c b/kernel/cobalt/posix/process.c
> index b90e53507..1e003108e 100644
> --- a/kernel/cobalt/posix/process.c
> +++ b/kernel/cobalt/posix/process.c
> @@ -819,6 +819,7 @@ static inline int handle_exception(struct ipipe_trap_data *d)
>   		}
>   		stop_debugged_process(thread);
>   		xnlock_put_irqrestore(&nklock, s);
> +		xnsched_run();

To reduce the potential (but unavoidable) delay between the exception 
and the remote thread stopping, shouldn't this move under the lock? 
For the wakeup, we do this below.

Jan

>   	}
>   #endif
>   
> @@ -1295,10 +1296,8 @@ static int handle_sigwake_event(struct task_struct *p)
>   			cobalt_register_debugged_thread(thread);
>   	}
>   
> -	if (xnthread_test_state(thread, XNRELAX)) {
> -		xnlock_put_irqrestore(&nklock, s);
> -		return KEVENT_PROPAGATE;
> -	}
> +	if (xnthread_test_state(thread, XNRELAX))
> +		goto out;
>   
>   	/*
>   	 * If kicking a shadow thread in primary mode, make sure Linux
> @@ -1320,7 +1319,7 @@ static int handle_sigwake_event(struct task_struct *p)
>   		xnthread_resume(thread, XNDBGSTOP);
>   
>   	__xnthread_kick(thread);
> -
> +out:
>   	xnsched_run();
>   
>   	xnlock_put_irqrestore(&nklock, s);
> 

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



* Re: [PATCH v2 3/7] cobalt/posix: fix missed rescheduling opportunities
  2019-11-25  9:09   ` Jan Kiszka
@ 2019-11-25 10:09     ` Philippe Gerum
  0 siblings, 0 replies; 11+ messages in thread
From: Philippe Gerum @ 2019-11-25 10:09 UTC (permalink / raw)
  To: Jan Kiszka, xenomai

On 11/25/19 10:09 AM, Jan Kiszka wrote:
> On 24.11.19 16:46, Philippe Gerum via Xenomai wrote:
>> From: Philippe Gerum <rpm@xenomai.org>
>>
>> stop_debugged_thread() may suspend threads running on a remote CPU, we
>> have to kick the rescheduling procedure for these changes to take
>> effect asap. Make sure xnsched_run() is kicked from the call sites
>> once they are done updating the scheduler state.
>>
>> Signed-off-by: Philippe Gerum <rpm@xenomai.org>
>> ---
>>   kernel/cobalt/posix/process.c | 9 ++++-----
>>   1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/cobalt/posix/process.c b/kernel/cobalt/posix/process.c
>> index b90e53507..1e003108e 100644
>> --- a/kernel/cobalt/posix/process.c
>> +++ b/kernel/cobalt/posix/process.c
>> @@ -819,6 +819,7 @@ static inline int handle_exception(struct ipipe_trap_data *d)
>>           }
>>           stop_debugged_process(thread);
>>           xnlock_put_irqrestore(&nklock, s);
>> +        xnsched_run();
> 
> To reduce the potential (but unavoidable) delay between the exception and the remote thread stops, shouldn't this move under the lock? For the wakeup, we do this below.
> 

That is not necessary. We can't be preempted by the regular kernel logic in this routine since we are running over the head stage, so the only possible preemption would come from a Cobalt interrupt, which would imply that the rescheduling procedure was kicked even earlier (in order to switch us out) and the pending IPIs would have already been sent as a result of this.

-- 
Philippe.



* Re: [PATCH v2 7/7] cobalt/sched: restrict rescheduling hints to migration-safe context
  2019-11-24 15:46 ` [PATCH v2 7/7] cobalt/sched: restrict rescheduling hints to migration-safe context Philippe Gerum
@ 2019-11-25 10:40   ` Philippe Gerum
  2019-11-25 10:55     ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2019-11-25 10:40 UTC (permalink / raw)
  To: Xenomai

On 11/24/19 4:46 PM, Philippe Gerum wrote:
> From: Philippe Gerum <rpm@xenomai.org>
> 
> XNINIRQ, XNINSW and XNSCHED are quick hints loaded from per-cpu data
> which are considered by the main rescheduling call (xnsched_run()) to
> figure out whether the actual rescheduling backend (___xnsched_run())
> should actually run immediately, or be postponed to a later call.
> 
> In seldom cases, we might call xnsched_run() from the root thread with
> hard irqs enabled.  Because such requests also depend on those hints,
> they are sensitive to CPU migration. The changes address this issue,
> considering that such requests can be forwarded to ___xnsched_run()
> directly - ignoring the hints - because the following premises apply
> in this case:
> 

Please drop this patch, it still does not solve the main issue which is that we might lose a rescheduling opportunity over the root stage in case of a CPU migration, e.g.:

CPU0                            CPU1
----                            ----

xnthread_resume/suspend(T)
<CPU migration>
                                xnsched_run()
                                    |
                                    |
                                    v
                          would only apply to rq(CPU1), missing updates to rq(CPU0)

To fix this, I don't see any other (sane) way than requiring all calls to xnsched_run() over the root stage to happen within an atomic section shared with the wakeup/suspend signal. Those cases should be few anyway, so a reasonable way of dealing with this issue would be to add an assertion to xnsched_run(), which should trigger upon migration-unsafe calling contexts, and fix a couple of call sites in the core.

-- 
Philippe.



* Re: [PATCH v2 7/7] cobalt/sched: restrict rescheduling hints to migration-safe context
  2019-11-25 10:40   ` Philippe Gerum
@ 2019-11-25 10:55     ` Jan Kiszka
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kiszka @ 2019-11-25 10:55 UTC (permalink / raw)
  To: Philippe Gerum, Xenomai

On 25.11.19 11:40, Philippe Gerum via Xenomai wrote:
> On 11/24/19 4:46 PM, Philippe Gerum wrote:
>> From: Philippe Gerum <rpm@xenomai.org>
>>
>> XNINIRQ, XNINSW and XNSCHED are quick hints loaded from per-cpu data
>> which are considered by the main rescheduling call (xnsched_run()) to
>> figure out whether the actual rescheduling backend (___xnsched_run())
>> should actually run immediately, or be postponed to a later call.
>>
>> In seldom cases, we might call xnsched_run() from the root thread with
>> hard irqs enabled.  Because such requests also depend on those hints,
>> they are sensitive to CPU migration. The changes address this issue,
>> considering that such requests can be forwarded to ___xnsched_run()
>> directly - ignoring the hints - because the following premises apply
>> in this case:
>>
> 
> Please drop this patch, it still does not solve the main issue which is that we might lose a rescheduling opportunity over the root stage in case of a CPU migration, e.g.:
> 
> CPU0                            CPU1
> ----                            ----
> 
> xnthread_resume/suspend(T)
> <CPU migration>
>                                  xnsched_run()
>                                      |
>                                      |
>                                      v
>                            would only apply to rq(CPU1), missing updates to rq(CPU0)
> 
> To fix this, I don't see any other (sane) way than requiring all calls to xnsched_run() over the root stage to happen within an atomic section shared with the wakeup/suspend signal. Those cases should be few anyway, so a reasonable way of dealing with this issue would be to add an assertion to xnsched_run(), which should trigger upon migration-unsafe calling contexts, and fix a couple of call sites in the core.
> 

OK, simpler is better anyway.

I would then suggest to clean up via patch 6 in the very same run, to 
provide that commit a context.

Applying up to patch 5 to next now.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


