From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: John Keeping <john@metanate.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Sasha Levin <sashal@kernel.org>,
	mingo@redhat.com, juri.lelli@redhat.com,
	vincent.guittot@linaro.org
Subject: [PATCH AUTOSEL 5.18 19/53] sched/core: Always flush pending blk_plug
Date: Sun,  7 Aug 2022 21:33:14 -0400
Message-ID: <20220808013350.314757-19-sashal@kernel.org>
In-Reply-To: <20220808013350.314757-1-sashal@kernel.org>

From: John Keeping <john@metanate.com>

[ Upstream commit 401e4963bf45c800e3e9ea0d3a0289d738005fd4 ]

With CONFIG_PREEMPT_RT, it is possible to hit a deadlock between two
normal priority tasks (SCHED_OTHER, nice level zero):

	INFO: task kworker/u8:0:8 blocked for more than 491 seconds.
	      Not tainted 5.15.49-rt46 #1
	"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	task:kworker/u8:0    state:D stack:    0 pid:    8 ppid:     2 flags:0x00000000
	Workqueue: writeback wb_workfn (flush-7:0)
	[<c08a3a10>] (__schedule) from [<c08a3d84>] (schedule+0xdc/0x134)
	[<c08a3d84>] (schedule) from [<c08a65a0>] (rt_mutex_slowlock_block.constprop.0+0xb8/0x174)
	[<c08a65a0>] (rt_mutex_slowlock_block.constprop.0) from [<c08a6708>] (rt_mutex_slowlock.constprop.0+0xac/0x174)
	[<c08a6708>] (rt_mutex_slowlock.constprop.0) from [<c0374d60>] (fat_write_inode+0x34/0x54)
	[<c0374d60>] (fat_write_inode) from [<c0297304>] (__writeback_single_inode+0x354/0x3ec)
	[<c0297304>] (__writeback_single_inode) from [<c0297998>] (writeback_sb_inodes+0x250/0x45c)
	[<c0297998>] (writeback_sb_inodes) from [<c0297c20>] (__writeback_inodes_wb+0x7c/0xb8)
	[<c0297c20>] (__writeback_inodes_wb) from [<c0297f24>] (wb_writeback+0x2c8/0x2e4)
	[<c0297f24>] (wb_writeback) from [<c0298c40>] (wb_workfn+0x1a4/0x3e4)
	[<c0298c40>] (wb_workfn) from [<c0138ab8>] (process_one_work+0x1fc/0x32c)
	[<c0138ab8>] (process_one_work) from [<c0139120>] (worker_thread+0x22c/0x2d8)
	[<c0139120>] (worker_thread) from [<c013e6e0>] (kthread+0x16c/0x178)
	[<c013e6e0>] (kthread) from [<c01000fc>] (ret_from_fork+0x14/0x38)
	Exception stack(0xc10e3fb0 to 0xc10e3ff8)
	3fa0:                                     00000000 00000000 00000000 00000000
	3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
	3fe0: 00000000 00000000 00000000 00000000 00000013 00000000

	INFO: task tar:2083 blocked for more than 491 seconds.
	      Not tainted 5.15.49-rt46 #1
	"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	task:tar             state:D stack:    0 pid: 2083 ppid:  2082 flags:0x00000000
	[<c08a3a10>] (__schedule) from [<c08a3d84>] (schedule+0xdc/0x134)
	[<c08a3d84>] (schedule) from [<c08a41b0>] (io_schedule+0x14/0x24)
	[<c08a41b0>] (io_schedule) from [<c08a455c>] (bit_wait_io+0xc/0x30)
	[<c08a455c>] (bit_wait_io) from [<c08a441c>] (__wait_on_bit_lock+0x54/0xa8)
	[<c08a441c>] (__wait_on_bit_lock) from [<c08a44f4>] (out_of_line_wait_on_bit_lock+0x84/0xb0)
	[<c08a44f4>] (out_of_line_wait_on_bit_lock) from [<c0371fb0>] (fat_mirror_bhs+0xa0/0x144)
	[<c0371fb0>] (fat_mirror_bhs) from [<c0372a68>] (fat_alloc_clusters+0x138/0x2a4)
	[<c0372a68>] (fat_alloc_clusters) from [<c0370b14>] (fat_alloc_new_dir+0x34/0x250)
	[<c0370b14>] (fat_alloc_new_dir) from [<c03787c0>] (vfat_mkdir+0x58/0x148)
	[<c03787c0>] (vfat_mkdir) from [<c0277b60>] (vfs_mkdir+0x68/0x98)
	[<c0277b60>] (vfs_mkdir) from [<c027b484>] (do_mkdirat+0xb0/0xec)
	[<c027b484>] (do_mkdirat) from [<c0100060>] (ret_fast_syscall+0x0/0x1c)
	Exception stack(0xc2e1bfa8 to 0xc2e1bff0)
	bfa0:                   01ee42f0 01ee4208 01ee42f0 000041ed 00000000 00004000
	bfc0: 01ee42f0 01ee4208 00000000 00000027 01ee4302 00000004 000dcb00 01ee4190
	bfe0: 000dc368 bed11924 0006d4b0 b6ebddfc

Here the kworker is waiting on msdos_sb_info::s_lock, which is held by
tar. tar is in turn waiting for a buffer that is locked until it has been
flushed, but that flush is plugged in the kworker.

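The lock is a normal struct mutex, so tsk_is_pi_blocked() will always
return false on !RT. On RT, struct mutex is built on rtmutex, so
tsk_is_pi_blocked() returns true while the task blocks on it and the
plug flush is skipped. Thus the behaviour changes for RT.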

It seems that the intent here is to skip blk_flush_plug() in the case
where a non-preemptible lock (such as a spinlock) has been converted to
a rtmutex on RT, which is the case covered by the SM_RTLOCK_WAIT
schedule flag.  But sched_submit_work() is only called from schedule()
which is never called in this scenario, so the check can simply be
deleted.
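
As a rough illustration of the behaviour change, here is a minimal
user-space sketch. It is not kernel code: struct model_task and both
helper functions are hypothetical names made up for this example, and
they only model the single decision in sched_submit_work() that this
patch touches (whether a pending plug gets flushed before sleeping).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two bits of task state that matter here. */
struct model_task {
	const void *pi_blocked_on;	/* non-NULL while blocked on an rtmutex-based lock */
	bool has_plugged_io;		/* true if a blk_plug with queued I/O is pending */
};

/* Models the old logic: return early, without flushing, when PI-blocked. */
static inline bool flushes_plug_before(const struct model_task *t)
{
	if (t->pi_blocked_on != NULL)
		return false;
	return t->has_plugged_io;
}

/* Models the new logic: pending plugged I/O is always flushed. */
static inline bool flushes_plug_after(const struct model_task *t)
{
	return t->has_plugged_io;
}
```

In this model, the kworker from the trace above corresponds to a task
with pi_blocked_on set (it sleeps on s_lock, an rtmutex-based mutex on
RT) and has_plugged_io true. flushes_plug_before() reports false for
it, which matches the hang; flushes_plug_after() reports true.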

Looking at the history of the -rt patchset, in fact this change was
present from v5.9.1-rt20 until being dropped in v5.13-rt1 as it was part
of a larger patch [1] most of which was replaced by commit b4bfa3fcfe3b
("sched/core: Rework the __schedule() preempt argument").

As described in [1]:

   The schedule process must distinguish between blocking on a regular
   sleeping lock (rwsem and mutex) and a RT-only sleeping lock (spinlock
   and rwlock):
   - rwsem and mutex must flush block requests (blk_schedule_flush_plug())
     even if blocked on a lock. This can not deadlock because this also
     happens for non-RT.
     There should be a warning if the scheduling point is within a RCU read
     section.

   - spinlock and rwlock must not flush block requests. This will deadlock
     if the callback attempts to acquire a lock which is already acquired.
     Similarly to being preempted, there should be no warning if the
     scheduling point is within a RCU read section.

and with the tsk_is_pi_blocked() in the scheduler path, we hit the first
issue.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0022-locking-rtmutex-Use-custom-scheduling-function-for-s.patch?h=linux-5.10.y-rt-patches

Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20220708162702.1758865-1-john@metanate.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/linux/sched/rt.h | 8 --------
 kernel/sched/core.c      | 8 ++++++--
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index e5af028c08b4..994c25640e15 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -39,20 +39,12 @@ static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *p)
 }
 extern void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task);
 extern void rt_mutex_adjust_pi(struct task_struct *p);
-static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
-{
-	return tsk->pi_blocked_on != NULL;
-}
 #else
 static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
 {
 	return NULL;
 }
 # define rt_mutex_adjust_pi(p)		do { } while (0)
-static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
-{
-	return false;
-}
 #endif
 
 extern void normalize_rt_tasks(void);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dd11daa7a84b..6baf96d2fa39 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6460,8 +6460,12 @@ static inline void sched_submit_work(struct task_struct *tsk)
 			io_wq_worker_sleeping(tsk);
 	}
 
-	if (tsk_is_pi_blocked(tsk))
-		return;
+	/*
+	 * spinlock and rwlock must not flush block requests.  This will
+	 * deadlock if the callback attempts to acquire a lock which is
+	 * already acquired.
+	 */
+	SCHED_WARN_ON(current->__state & TASK_RTLOCK_WAIT);
 
 	/*
 	 * If we are going to sleep and we have plugged IO queued,
-- 
2.35.1

