* [PATCH v4 RESEND] fs/writeback: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
@ 2019-04-29 2:41 Jiufei Xue
[not found] ` <20190430103201.9C2D92080C@mail.kernel.org>
2019-05-06 15:31 ` Tejun Heo
0 siblings, 2 replies; 4+ messages in thread
From: Jiufei Xue @ 2019-04-29 2:41 UTC (permalink / raw)
To: cgroups, linux-mm; +Cc: tj, akpm, joseph.qi, bo.liu
synchronize_rcu() does not wait for call_rcu() callbacks, so an inode wb
switch may not yet have been queued on the workqueue when synchronize_rcu()
returns. Previously scheduled switches may therefore still be unfinished
even after flushing the workqueue, which causes the NULL pointer
dereference shown below.
VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds. Have a nice day...
BUG: unable to handle kernel NULL pointer dereference at 0000000000000278
[<ffffffff8126a303>] evict+0xb3/0x180
[<ffffffff8126a760>] iput+0x1b0/0x230
[<ffffffff8127c690>] inode_switch_wbs_work_fn+0x3c0/0x6a0
[<ffffffff810a5b2e>] worker_thread+0x4e/0x490
[<ffffffff810a5ae0>] ? process_one_work+0x410/0x410
[<ffffffff810ac056>] kthread+0xe6/0x100
[<ffffffff8173c199>] ret_from_fork+0x39/0x50
Replace the synchronize_rcu() call with rcu_barrier() to wait for all
pending callbacks to finish. Also increment isw_nr_in_flight after
call_rcu() in inode_switch_wbs(), which makes more sense.
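To illustrate the ordering problem described above, here is a toy userspace model. It is only a sketch and not kernel code: every toy_* name is hypothetical, and the model assumes the worst case in which no call_rcu() callback has run by the time the grace-period wait returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */
struct toy_state {
	int rcu_pending;	/* callbacks registered, not yet invoked */
	int wq_pending;		/* work items already on the workqueue */
	int finished;		/* switches that ran to completion */
};

/* Models call_rcu(): only registers a callback for later invocation. */
static void toy_call_rcu(struct toy_state *s)
{
	s->rcu_pending++;
}

/*
 * Models synchronize_rcu(): waits for a grace period only. Assumed worst
 * case: none of the registered callbacks has been invoked on return.
 */
static void toy_synchronize_rcu(struct toy_state *s)
{
	(void)s;
}

/*
 * Models rcu_barrier(): returns only after all registered callbacks have
 * been invoked. Each callback moves its switch onto the workqueue.
 */
static void toy_rcu_barrier(struct toy_state *s)
{
	s->wq_pending += s->rcu_pending;
	s->rcu_pending = 0;
}

/* Models flush_workqueue(): completes only what is already queued. */
static void toy_flush_workqueue(struct toy_state *s)
{
	s->finished += s->wq_pending;
	s->wq_pending = 0;
}

/* Returns how many switches are still unfinished after the umount path. */
static int toy_umount(bool use_barrier, int nr_switches)
{
	struct toy_state s = { 0, 0, 0 };
	int i;

	assert(nr_switches >= 0);
	for (i = 0; i < nr_switches; i++)
		toy_call_rcu(&s);

	if (use_barrier)
		toy_rcu_barrier(&s);
	else
		toy_synchronize_rcu(&s);

	toy_flush_workqueue(&s);
	return nr_switches - s.finished;
}
```

In this model toy_umount(false, 3) returns 3, because the flush finds an empty workqueue, while toy_umount(true, 3) returns 0. Real RCU may of course invoke some callbacks earlier; the model only shows that synchronize_rcu() gives no such guarantee.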
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
---
fs/fs-writeback.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 36855c1f8daf..b16645b417d9 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -523,8 +523,6 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 	isw->inode = inode;
-	atomic_inc(&isw_nr_in_flight);
-
 	/*
 	 * In addition to synchronizing among switchers, I_WB_SWITCH tells
 	 * the RCU protected stat update paths to grab the i_page
@@ -532,6 +530,9 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 	 * Let's continue after I_WB_SWITCH is guaranteed to be visible.
 	 */
 	call_rcu(&isw->rcu_head, inode_switch_wbs_rcu_fn);
+
+	atomic_inc(&isw_nr_in_flight);
+
 	goto out_unlock;
 out_free:
@@ -901,7 +902,11 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 void cgroup_writeback_umount(void)
 {
 	if (atomic_read(&isw_nr_in_flight)) {
-		synchronize_rcu();
+		/*
+		 * Use rcu_barrier() to wait for all pending callbacks to
+		 * ensure that all in-flight wb switches are in the workqueue.
+		 */
+		rcu_barrier();
 		flush_workqueue(isw_wq);
 	}
 }
--
2.19.1.856.g8858448bb
* Re: [PATCH v4 RESEND] fs/writeback: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
[not found] ` <20190430103201.9C2D92080C@mail.kernel.org>
@ 2019-05-05 12:09 ` Jiufei Xue
2019-05-20 9:49 ` Greg KH
0 siblings, 1 reply; 4+ messages in thread
From: Jiufei Xue @ 2019-05-05 12:09 UTC (permalink / raw)
To: Sasha Levin, cgroups; +Cc: tj, stable, stable
On 2019/4/30 6:32 PM, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all.
>
> The bot has tested the following trees: v5.0.10, v4.19.37, v4.14.114, v4.9.171, v4.4.179, v3.18.139.
>
> v5.0.10: Build OK!
> v4.19.37: Build OK!
> v4.14.114: Build OK!
> v4.9.171: Failed to apply! Possible dependencies:
> 113c60970cf4 ("x86/intel_rdt: Add Haswell feature discovery")
> 2264d9c74dda ("x86/intel_rdt: Build structures for each resource based on cache topology")
> 3ee7e8697d58 ("bdi: Fix another oops in wb_workfn()")
> 4f341a5e4844 ("x86/intel_rdt: Add scheduler hook")
> 5318ce7d4686 ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
> 5b825c3af1d8 ("sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>")
> 5dd43ce2f69d ("sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>")
> 5ff193fbde20 ("x86/intel_rdt: Add basic resctrl filesystem support")
> 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
> 60ec2440c63d ("x86/intel_rdt: Add schemata file")
> 6b2bb7265f0b ("sched/wait: Introduce wait_var_event()")
> 78e99b4a2b9a ("x86/intel_rdt: Add CONFIG, Makefile, and basic initialization")
> 7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback membership switches")
> 8236b0ae31c8 ("bdi: wake up concurrent wb_shutdown() callers.")
> c1c7c3f9d6bb ("x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID")
>
> v4.4.179: Failed to apply! Possible dependencies:
> 0007bccc3cfd ("x86: Replace RDRAND forced-reseed with simple sanity check")
> 113c60970cf4 ("x86/intel_rdt: Add Haswell feature discovery")
> 1b74dde7c47c ("x86/cpu: Convert printk(KERN_<LEVEL> ...) to pr_<level>(...)")
> 27f6d22b037b ("perf/x86: Move perf_event.h to its new home")
> 39b0332a2158 ("perf/x86: Move perf_event_amd.c ........... => x86/events/amd/core.c")
> 3ee7e8697d58 ("bdi: Fix another oops in wb_workfn()")
> 4f341a5e4844 ("x86/intel_rdt: Add scheduler hook")
> 5318ce7d4686 ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
> 5b825c3af1d8 ("sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>")
> 5dd43ce2f69d ("sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>")
> 6b2bb7265f0b ("sched/wait: Introduce wait_var_event()")
> 724697648eec ("perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp")
> 7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback membership switches")
> 8236b0ae31c8 ("bdi: wake up concurrent wb_shutdown() callers.")
> fa9cbf320e99 ("perf/x86: Move perf_event.c ............... => x86/events/core.c")
>
> v3.18.139: Failed to apply! Possible dependencies:
> 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
> 4452226ea276 ("writeback: move backing_dev_info->state into bdi_writeback")
> 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
> 66114cad64bf ("writeback: separate out include/linux/backing-dev-defs.h")
> 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
> 87e1d789bf55 ("writeback: implement [locked_]inode_to_wb_and_lock_list()")
> a3816ab0e8fe ("fs: Convert show_fdinfo functions to void")
> b16b1deb553a ("writeback: make writeback_control track the inode being written back")
> b4caecd48005 ("fs: introduce f_op->mmap_capabilities for nommu mmap support")
> bafc0dba1e20 ("buffer, writeback: make __block_write_full_page() honor cgroup writeback")
>
>
> How should we proceed with this patch?
>
> --
Sorry, I forgot to mention that the patch should be applied to stable
kernels since v4.4.
On v4.4.179 and v4.9.171 the patch fails to apply because it depends on commit 7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback membership switches").
On these two versions we can simply increment isw_nr_in_flight before the return instead.
That variant of the patch is pasted below.
--- linux-4.4.179.orig/fs/fs-writeback.c.orig 2019-05-05 19:56:29.993961267 +0800
+++ linux-4.4.179/fs/fs-writeback.c 2019-05-05 19:39:55.880336751 +0800
@@ -502,8 +502,6 @@ static void inode_switch_wbs(struct inod
 	ihold(inode);
 	isw->inode = inode;
-	atomic_inc(&isw_nr_in_flight);
-
 	/*
 	 * In addition to synchronizing among switchers, I_WB_SWITCH tells
 	 * the RCU protected stat update paths to grab the mapping's
@@ -511,6 +509,9 @@ static void inode_switch_wbs(struct inod
 	 * Let's continue after I_WB_SWITCH is guaranteed to be visible.
 	 */
 	call_rcu(&isw->rcu_head, inode_switch_wbs_rcu_fn);
+
+	atomic_inc(&isw_nr_in_flight);
+
 	return;
 out_free:
@@ -880,7 +881,11 @@ restart:
 void cgroup_writeback_umount(void)
 {
 	if (atomic_read(&isw_nr_in_flight)) {
-		synchronize_rcu();
+		/*
+		 * Use rcu_barrier() to wait for all pending callbacks to
+		 * ensure that all in-flight wb switches are in the workqueue.
+		 */
+		rcu_barrier();
 		flush_workqueue(isw_wq);
 	}
 }
Thanks,
Jiufei
> Thanks,
> Sasha
>
* Re: [PATCH v4 RESEND] fs/writeback: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
2019-04-29 2:41 [PATCH v4 RESEND] fs/writeback: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount Jiufei Xue
[not found] ` <20190430103201.9C2D92080C@mail.kernel.org>
@ 2019-05-06 15:31 ` Tejun Heo
1 sibling, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2019-05-06 15:31 UTC (permalink / raw)
To: Jiufei Xue; +Cc: cgroups, linux-mm, akpm, joseph.qi, bo.liu
On Mon, Apr 29, 2019 at 10:41:08AM +0800, Jiufei Xue wrote:
> synchronize_rcu() does not wait for call_rcu() callbacks, so an inode wb
> switch may not yet have been queued on the workqueue when synchronize_rcu()
> returns. Previously scheduled switches may therefore still be unfinished
> even after flushing the workqueue, which causes the NULL pointer
> dereference shown below.
>
> VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds. Have a nice day...
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000278
> [<ffffffff8126a303>] evict+0xb3/0x180
> [<ffffffff8126a760>] iput+0x1b0/0x230
> [<ffffffff8127c690>] inode_switch_wbs_work_fn+0x3c0/0x6a0
> [<ffffffff810a5b2e>] worker_thread+0x4e/0x490
> [<ffffffff810a5ae0>] ? process_one_work+0x410/0x410
> [<ffffffff810ac056>] kthread+0xe6/0x100
> [<ffffffff8173c199>] ret_from_fork+0x39/0x50
>
> Replace the synchronize_rcu() call with rcu_barrier() to wait for all
> pending callbacks to finish. Also increment isw_nr_in_flight after
> call_rcu() in inode_switch_wbs(), which makes more sense.
>
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
> Acked-by: Tejun Heo <tj@kernel.org>
> Cc: stable@kernel.org
Andrew, I think it'd probably be best to route this through -mm.
Thanks!
--
tejun
* Re: [PATCH v4 RESEND] fs/writeback: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
2019-05-05 12:09 ` Jiufei Xue
@ 2019-05-20 9:49 ` Greg KH
0 siblings, 0 replies; 4+ messages in thread
From: Greg KH @ 2019-05-20 9:49 UTC (permalink / raw)
To: Jiufei Xue; +Cc: Sasha Levin, cgroups, tj, stable, stable
On Sun, May 05, 2019 at 08:09:01PM +0800, Jiufei Xue wrote:
>
>
> On 2019/4/30 6:32 PM, Sasha Levin wrote:
> > Hi,
> >
> > [This is an automated email]
> >
> > This commit has been processed because it contains a -stable tag.
> > The stable tag indicates that it's relevant for the following trees: all.
> >
> > The bot has tested the following trees: v5.0.10, v4.19.37, v4.14.114, v4.9.171, v4.4.179, v3.18.139.
> >
> > v5.0.10: Build OK!
> > v4.19.37: Build OK!
> > v4.14.114: Build OK!
> > v4.9.171: Failed to apply! Possible dependencies:
> > 113c60970cf4 ("x86/intel_rdt: Add Haswell feature discovery")
> > 2264d9c74dda ("x86/intel_rdt: Build structures for each resource based on cache topology")
> > 3ee7e8697d58 ("bdi: Fix another oops in wb_workfn()")
> > 4f341a5e4844 ("x86/intel_rdt: Add scheduler hook")
> > 5318ce7d4686 ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
> > 5b825c3af1d8 ("sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>")
> > 5dd43ce2f69d ("sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>")
> > 5ff193fbde20 ("x86/intel_rdt: Add basic resctrl filesystem support")
> > 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
> > 60ec2440c63d ("x86/intel_rdt: Add schemata file")
> > 6b2bb7265f0b ("sched/wait: Introduce wait_var_event()")
> > 78e99b4a2b9a ("x86/intel_rdt: Add CONFIG, Makefile, and basic initialization")
> > 7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback membership switches")
> > 8236b0ae31c8 ("bdi: wake up concurrent wb_shutdown() callers.")
> > c1c7c3f9d6bb ("x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID")
> >
> > v4.4.179: Failed to apply! Possible dependencies:
> > 0007bccc3cfd ("x86: Replace RDRAND forced-reseed with simple sanity check")
> > 113c60970cf4 ("x86/intel_rdt: Add Haswell feature discovery")
> > 1b74dde7c47c ("x86/cpu: Convert printk(KERN_<LEVEL> ...) to pr_<level>(...)")
> > 27f6d22b037b ("perf/x86: Move perf_event.h to its new home")
> > 39b0332a2158 ("perf/x86: Move perf_event_amd.c ........... => x86/events/amd/core.c")
> > 3ee7e8697d58 ("bdi: Fix another oops in wb_workfn()")
> > 4f341a5e4844 ("x86/intel_rdt: Add scheduler hook")
> > 5318ce7d4686 ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
> > 5b825c3af1d8 ("sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>")
> > 5dd43ce2f69d ("sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>")
> > 6b2bb7265f0b ("sched/wait: Introduce wait_var_event()")
> > 724697648eec ("perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp")
> > 7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback membership switches")
> > 8236b0ae31c8 ("bdi: wake up concurrent wb_shutdown() callers.")
> > fa9cbf320e99 ("perf/x86: Move perf_event.c ............... => x86/events/core.c")
> >
> > v3.18.139: Failed to apply! Possible dependencies:
> > 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
> > 4452226ea276 ("writeback: move backing_dev_info->state into bdi_writeback")
> > 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
> > 66114cad64bf ("writeback: separate out include/linux/backing-dev-defs.h")
> > 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
> > 87e1d789bf55 ("writeback: implement [locked_]inode_to_wb_and_lock_list()")
> > a3816ab0e8fe ("fs: Convert show_fdinfo functions to void")
> > b16b1deb553a ("writeback: make writeback_control track the inode being written back")
> > b4caecd48005 ("fs: introduce f_op->mmap_capabilities for nommu mmap support")
> > bafc0dba1e20 ("buffer, writeback: make __block_write_full_page() honor cgroup writeback")
> >
> >
> > How should we proceed with this patch?
> >
> > --
>
> Sorry, I forgot to mention that the patch should be applied to stable
> kernels since v4.4.
>
> On v4.4.179 and v4.9.171 the patch fails to apply because it depends on commit 7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback membership switches").
> On these two versions we can simply increment isw_nr_in_flight before the return instead.
Thanks, I've just backported 7fc5854f8c6e to those kernels now and then
this applied.
greg k-h