All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
To: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>
Cc: Suleiman Souhlal <suleiman@google.com>,
	Youssef Esmat <youssefesmat@google.com>,
	David Vernet <void@manifault.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	joseph.salisbury@canonical.com,
	Luca Abeni <luca.abeni@santannapisa.it>,
	Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>,
	Vineeth Pillai <vineeth@bitbyteword.org>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Phil Auld <pauld@redhat.com>,
	"Joel Fernandes (Google)" <joel@joelfernandes.org>
Subject: [PATCH v2 11/15] sched/deadline: Mark DL server as unthrottled before enqueue
Date: Tue, 12 Mar 2024 21:24:47 -0400	[thread overview]
Message-ID: <20240313012451.1693807-12-joel@joelfernandes.org> (raw)
In-Reply-To: <20240313012451.1693807-1-joel@joelfernandes.org>

The DL server may not have had its timer started if start_dl_timer()
returns 0 (say the zero-laxity time has already passed). In such cases,
mark the DL task which is about to be enqueued as not throttled and
cancel any previous timers, then do the enqueue.

This fixes the following crash:

[    9.263331] kernel BUG at kernel/sched/deadline.c:1765!
[    9.282382] Call Trace:
[    9.282767]  <TASK>
[    9.283086]  ? __die_body+0x62/0xb0
[    9.283602]  ? die+0x9b/0xc0
[    9.284036]  ? do_trap+0xa3/0x170
[    9.284528]  ? enqueue_dl_entity+0x45e/0x460
[    9.285158]  ? enqueue_dl_entity+0x45e/0x460
[    9.285791]  ? handle_invalid_op+0x65/0x80
[    9.286392]  ? enqueue_dl_entity+0x45e/0x460
[    9.287021]  ? exc_invalid_op+0x2f/0x40
[    9.287585]  ? asm_exc_invalid_op+0x16/0x20
[    9.288200]  ? find_later_rq+0x120/0x120
[    9.288775]  ? fair_server_init+0x40/0x40
[    9.289364]  ? enqueue_dl_entity+0x45e/0x460
[    9.289989]  ? find_later_rq+0x120/0x120
[    9.290564]  dl_task_timer+0x1d7/0x2f0
[    9.291120]  ? find_later_rq+0x120/0x120
[    9.291695]  __run_hrtimer+0x73/0x1b0
[    9.292238]  hrtimer_interrupt+0x216/0x2c0
[    9.292841]  __sysvec_apic_timer_interrupt+0x53/0x140
[    9.293581]  sysvec_apic_timer_interrupt+0x2d/0x80
[    9.294285]  asm_sysvec_apic_timer_interrupt+0x16/0x20

The crash can easily be reproduced by adding a 100ms delay as follows:

+int delay_inject_count;
+
 static void
 enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
 {
@@ -1827,6 +1830,12 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
                setup_new_dl_entity(dl_se);
        }

+       // 100ms delay every 20 enqueues.
+       if (delay_inject_count++ > 20) {
+               mdelay(100);
+               delay_inject_count = 0;
+       }
+
        /*
         * If we are still throttled, eg. we got replenished but are a
         * zero-laxity task and still got to wait, don't enqueue.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/sched/deadline.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 5adfc15803c3..1d54231fbaa6 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1949,6 +1949,18 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
 	if (dl_se->dl_throttled && start_dl_timer(dl_se))
 		return;
 
+	/*
+	 * We're about to enqueue, make sure we're not ->dl_throttled!
+	 * In case the timer was not started, say because the 0-lax time
+	 * has passed, mark as not throttled and mark unarmed.
+	 * Also cancel earlier timers, since letting those run is pointless.
+	 */
+	if (dl_se->dl_throttled) {
+		hrtimer_try_to_cancel(&dl_se->dl_timer);
+		dl_se->dl_defer_armed = 0;
+		dl_se->dl_throttled = 0;
+	}
+
 	__enqueue_dl_entity(dl_se);
 }
 
-- 
2.34.1


  parent reply	other threads:[~2024-03-13  1:25 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-13  1:24 [PATCH v2 00/15] Fair scheduling deadline server fixes Joel Fernandes (Google)
2024-03-13  1:24 ` [PATCH v2 01/15] sched/core: Add clearing of ->dl_server in put_prev_task_balance() Joel Fernandes (Google)
2024-04-04 17:46   ` Daniel Bristot de Oliveira
2024-03-13  1:24 ` [PATCH v2 02/15] sched/core: Clear prev->dl_server in CFS pick fast path Joel Fernandes (Google)
2024-04-04 17:52   ` Daniel Bristot de Oliveira
2024-03-13  1:24 ` [PATCH v2 03/15] sched/core: Fix priority checking for DL server picks Joel Fernandes (Google)
2024-03-13  1:24 ` [PATCH v2 04/15] sched/core: Fix picking of tasks for core scheduling with DL server Joel Fernandes (Google)
2024-04-05  9:37   ` Daniel Bristot de Oliveira
2024-03-13  1:24 ` [PATCH v2 05/15] sched/debug: Use unsigned long for cpu variable to prevent cast errors Joel Fernandes (Google)
2024-03-13 20:02   ` Chris Hyser
2024-03-13  1:24 ` [PATCH v2 06/15] sched: server: Don't start hrtick for DL server tasks Joel Fernandes (Google)
2024-04-05  8:49   ` Daniel Bristot de Oliveira
2024-03-13  1:24 ` [PATCH v2 07/15] selftests/sched: Add a test to verify that DL server works with core scheduling Joel Fernandes (Google)
2024-03-13  1:24 ` [PATCH v2 08/15] selftests/sched: Migrate cs_prctl_test to kselfttest Joel Fernandes (Google)
2024-03-13 18:44   ` Chris Hyser
2024-03-13  1:24 ` [PATCH v2 09/15] admin-guide/hw-vuln: Correct prctl() argument description Joel Fernandes (Google)
2024-03-13 19:14   ` Chris Hyser
2024-03-13 19:26     ` Chris Hyser
2024-04-05  9:32   ` Daniel Bristot de Oliveira
2024-03-13  1:24 ` [PATCH v2 10/15] sched: Fix build error in "sched/rt: Remove default bandwidth control" Joel Fernandes (Google)
2024-03-13 20:06   ` Chris Hyser
2024-03-13  1:24 ` Joel Fernandes (Google) [this message]
2024-04-05  8:54   ` [PATCH v2 11/15] sched/deadline: Mark DL server as unthrottled before enqueue Daniel Bristot de Oliveira
2024-03-13  1:24 ` [PATCH v2 12/15] sched/deadline: Reverse args to dl_time_before in replenish Joel Fernandes (Google)
2024-04-05  8:54   ` Daniel Bristot de Oliveira
2024-03-13  1:24 ` [PATCH v2 13/15] sched/deadline: Make start_dl_timer callers more robust Joel Fernandes (Google)
2024-04-05  9:10   ` Daniel Bristot de Oliveira
2024-03-13  1:24 ` [PATCH v2 14/15] sched/deadline: Do not restart the DL server on replenish from timer Joel Fernandes (Google)
2024-04-05  9:11   ` Daniel Bristot de Oliveira
2024-03-13  1:24 ` [PATCH v2 15/15] sched/deadline: Always start a new period if CFS exceeded DL runtime Joel Fernandes (Google)
2024-04-05  9:19   ` Daniel Bristot de Oliveira

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240313012451.1693807-12-joel@joelfernandes.org \
    --to=joel@joelfernandes.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joseph.salisbury@canonical.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luca.abeni@santannapisa.it \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=pauld@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=skhan@linuxfoundation.org \
    --cc=suleiman@google.com \
    --cc=tglx@linutronix.de \
    --cc=tommaso.cucinotta@santannapisa.it \
    --cc=vincent.guittot@linaro.org \
    --cc=vineeth@bitbyteword.org \
    --cc=void@manifault.com \
    --cc=vschneid@redhat.com \
    --cc=youssefesmat@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.