All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] wmediumd: lib: sched: fix another scheduling corner case
@ 2021-02-25 19:50 Johannes Berg
  2021-03-02  3:03 ` Bob Copeland
  0 siblings, 1 reply; 2+ messages in thread
From: Johannes Berg @ 2021-02-25 19:50 UTC (permalink / raw)
  To: linux-wireless, Bob Copeland; +Cc: Johannes Berg

From: Johannes Berg <johannes.berg@intel.com>

When running with an external scheduler that also uses the
event loop, we can detect e.g. a client disconnecting from
a server while in usfstl_sched_forward(), causing us to not
have a job anymore on the scheduler afterwards, which then
causes the assert at the end to get reached erroneously.

Move the job check and external wait into the loop so these
cases are covered correctly.

This actually happened in wmediumd on client disconnect at
this exact time, while running usfstl_sched_forward().
---
 wmediumd/lib/sched.c | 48 ++++++++++++++++++++++++++++++--------------
 1 file changed, 33 insertions(+), 15 deletions(-)

diff --git a/wmediumd/lib/sched.c b/wmediumd/lib/sched.c
index 7ebd81a99b4e..7d0618b49cf2 100644
--- a/wmediumd/lib/sched.c
+++ b/wmediumd/lib/sched.c
@@ -182,22 +182,38 @@ void usfstl_sched_start(struct usfstl_scheduler *sched)
 
 struct usfstl_job *usfstl_sched_next(struct usfstl_scheduler *sched)
 {
-	struct usfstl_job *job;
-
-	/*
-	 * If external scheduler is active, we might get here with nothing
-	 * to do, so we just need to wait for an external input/job which
-	 * will add an job to our scheduler in usfstl_sched_add_job().
-	 *
-	 * And due to the fact that we don't have any API for canceling external
-	 * time request, we can request external time which adds a job on the
-	 * external scheduler and cancel internally, and get scheduled to run
-	 * with nothing to do.
-	 */
-	while (usfstl_list_empty(&sched->joblist) && sched->external_request)
-		usfstl_sched_external_wait(sched);
+	while (true) {
+		struct usfstl_job *job = usfstl_sched_next_pending(sched, NULL);
+
+		if (!job) {
+			/*
+			 * If external scheduler is active, we might get here
+			 * with nothing to do, so we just need to wait for an
+			 * external input/job which will add a job to our
+			 * scheduler.
+			 *
+			 * Due to the fact that we don't have any API for
+			 * cancelling external time requests, we might have
+			 * requested time from the external scheduler for a
+			 * job that subsequently got removed, ending up here
+			 * without a job, or one further in the future which
+			 * would cause usfstl_sched_forward() to wait again.
+			 *
+			 * Additionally, we might only remove the job we just
+			 * found during the usfstl_sched_forward() below, if
+			 * that causes the main loop to run and we detect an
+			 * event that causes a job removal (such as a client
+			 * disconnecting from a server), so the job pointer we
+			 * have might go stale. Hence, all of this needs to be
+			 * checked in the overall loop.
+			 */
+			if (sched->external_request) {
+				usfstl_sched_external_wait(sched);
+				continue;
+			}
+			break;
+		}
 
-	while ((job = usfstl_sched_next_pending(sched, NULL))) {
 		/*
 		 * Forward, but only if job isn't in the past - this
 		 * can happen if some job was inserted while we
@@ -214,6 +230,8 @@ struct usfstl_job *usfstl_sched_next(struct usfstl_scheduler *sched)
 		 * might have inserted an earlier job into the timeline.
 		 * If it's not this job's turn yet, reinsert it and check
 		 * what's up next in the next loop iteration.
+		 *
+		 * Also, 'job' might now have been removed, see above.
 		 */
 		if (usfstl_sched_next_pending(sched, NULL) != job)
 			continue;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] wmediumd: lib: sched: fix another scheduling corner case
  2021-02-25 19:50 [PATCH] wmediumd: lib: sched: fix another scheduling corner case Johannes Berg
@ 2021-03-02  3:03 ` Bob Copeland
  0 siblings, 0 replies; 2+ messages in thread
From: Bob Copeland @ 2021-03-02  3:03 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless, Johannes Berg

On Thu, Feb 25, 2021 at 08:50:42PM +0100, Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
> 
> When running with an external scheduler that also uses the
> event loop, we can detect e.g. a client disconnecting from
> a server while in usfstl_sched_forward(), causing us to not
> have a job anymore on the scheduler afterwards, which then
> causes the assert at the end to get reached erroneously.

Applied, thanks.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-03-03  0:35 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-25 19:50 [PATCH] wmediumd: lib: sched: fix another scheduling corner case Johannes Berg
2021-03-02  3:03 ` Bob Copeland

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.