* [PATCH v4 RESEND] fs/writeback: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
@ 2019-04-29  2:41 Jiufei Xue
       [not found] ` <20190430103201.9C2D92080C@mail.kernel.org>
  2019-05-06 15:31 ` Tejun Heo
  0 siblings, 2 replies; 4+ messages in thread
From: Jiufei Xue @ 2019-04-29  2:41 UTC (permalink / raw)
  To: cgroups, linux-mm; +Cc: tj, akpm, joseph.qi, bo.liu

synchronize_rcu() does not wait for call_rcu() callbacks to run, so an
inode wb switch may not have reached the workqueue yet when
synchronize_rcu() returns. As a result, previously scheduled switches
may still be unfinished even after the workqueue is flushed, which
causes the NULL pointer dereference shown below.

VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds.  Have a nice day...
BUG: unable to handle kernel NULL pointer dereference at 0000000000000278
[<ffffffff8126a303>] evict+0xb3/0x180
[<ffffffff8126a760>] iput+0x1b0/0x230
[<ffffffff8127c690>] inode_switch_wbs_work_fn+0x3c0/0x6a0
[<ffffffff810a5b2e>] worker_thread+0x4e/0x490
[<ffffffff810a5ae0>] ? process_one_work+0x410/0x410
[<ffffffff810ac056>] kthread+0xe6/0x100
[<ffffffff8173c199>] ret_from_fork+0x39/0x50

Replace the synchronize_rcu() call with rcu_barrier() to wait for all
pending callbacks to finish. Also move the isw_nr_in_flight increment
to after call_rcu() in inode_switch_wbs(), where it makes more sense.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
---
 fs/fs-writeback.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 36855c1f8daf..b16645b417d9 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -523,8 +523,6 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 
 	isw->inode = inode;
 
-	atomic_inc(&isw_nr_in_flight);
-
 	/*
 	 * In addition to synchronizing among switchers, I_WB_SWITCH tells
 	 * the RCU protected stat update paths to grab the i_page
@@ -532,6 +530,9 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
 	 * Let's continue after I_WB_SWITCH is guaranteed to be visible.
 	 */
 	call_rcu(&isw->rcu_head, inode_switch_wbs_rcu_fn);
+
+	atomic_inc(&isw_nr_in_flight);
+
 	goto out_unlock;
 
 out_free:
@@ -901,7 +902,11 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 void cgroup_writeback_umount(void)
 {
 	if (atomic_read(&isw_nr_in_flight)) {
-		synchronize_rcu();
+		/*
+		 * Use rcu_barrier() to wait for all pending callbacks to
+		 * ensure that all in-flight wb switches are in the workqueue.
+		 */
+		rcu_barrier();
 		flush_workqueue(isw_wq);
 	}
 }
-- 
2.19.1.856.g8858448bb
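
The ordering problem described in the commit message can be sketched with a
small userspace model. This is only a toy, single-threaded illustration and
not kernel code: every toy_* name and struct toy_state are hypothetical, and
the model assumes the worst case, in which no call_rcu() callback has run by
the time the synchronize_rcu() stand-in returns.

```c
/* Toy single-threaded model; NOT the kernel RCU or workqueue API. */
#include <assert.h>
#include <stdbool.h>

struct toy_state {
	int rcu_pending;	/* callbacks registered but not yet invoked */
	int wq_queued;		/* work items sitting on the workqueue */
	int in_flight;		/* stands in for isw_nr_in_flight */
};

/* Stand-in for inode_switch_wbs(): register the callback, then count it. */
static void toy_switch_wbs(struct toy_state *s)
{
	s->rcu_pending++;
	s->in_flight++;
}

/*
 * Stand-in for synchronize_rcu(): it only waits for a grace period.
 * Worst-case assumption: pending callbacks are left untouched.
 */
static void toy_synchronize_rcu(struct toy_state *s)
{
	(void)s;
}

/*
 * Stand-in for rcu_barrier(): returns only after every pending callback
 * has run. Each callback queues one work item.
 */
static void toy_rcu_barrier(struct toy_state *s)
{
	while (s->rcu_pending > 0) {
		s->rcu_pending--;
		s->wq_queued++;
	}
}

/* Stand-in for flush_workqueue(): runs only what is queued right now. */
static void toy_flush_workqueue(struct toy_state *s)
{
	while (s->wq_queued > 0) {
		s->wq_queued--;
		s->in_flight--;	/* the work function finishes the switch */
	}
}

/*
 * Stand-in for cgroup_writeback_umount(). Returns the number of switches
 * that are still unfinished when umount proceeds.
 */
static int toy_umount(int nr_switches, bool use_barrier)
{
	struct toy_state s = {0};

	for (int i = 0; i < nr_switches; i++)
		toy_switch_wbs(&s);

	if (s.in_flight) {
		if (use_barrier)
			toy_rcu_barrier(&s);
		else
			toy_synchronize_rcu(&s);
		toy_flush_workqueue(&s);
	}

	/* Every unfinished switch is either an RCU callback or queued work. */
	assert(s.in_flight == s.rcu_pending + s.wq_queued);
	return s.in_flight;
}
```

In this model, toy_umount(3, false) leaves all three switches unfinished
because the flush runs before anything has been queued, while
toy_umount(3, true) leaves none. The real kernel primitives are concurrent
and far more involved; the sketch only shows why the flush has to come after
the callbacks have run.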



* Re: [PATCH v4 RESEND] fs/writeback: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
       [not found] ` <20190430103201.9C2D92080C@mail.kernel.org>
@ 2019-05-05 12:09   ` Jiufei Xue
  2019-05-20  9:49     ` Greg KH
  0 siblings, 1 reply; 4+ messages in thread
From: Jiufei Xue @ 2019-05-05 12:09 UTC (permalink / raw)
  To: Sasha Levin, cgroups; +Cc: tj, stable, stable



On 2019/4/30 6:32 PM, Sasha Levin wrote:
> Hi,
> 
> [This is an automated email]
> 
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all.
> 
> The bot has tested the following trees: v5.0.10, v4.19.37, v4.14.114, v4.9.171, v4.4.179, v3.18.139.
> 
> v5.0.10: Build OK!
> v4.19.37: Build OK!
> v4.14.114: Build OK!
> v4.9.171: Failed to apply! Possible dependencies:
>     113c60970cf4 ("x86/intel_rdt: Add Haswell feature discovery")
>     2264d9c74dda ("x86/intel_rdt: Build structures for each resource based on cache topology")
>     3ee7e8697d58 ("bdi: Fix another oops in wb_workfn()")
>     4f341a5e4844 ("x86/intel_rdt: Add scheduler hook")
>     5318ce7d4686 ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
>     5b825c3af1d8 ("sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>")
>     5dd43ce2f69d ("sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>")
>     5ff193fbde20 ("x86/intel_rdt: Add basic resctrl filesystem support")
>     60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
>     60ec2440c63d ("x86/intel_rdt: Add schemata file")
>     6b2bb7265f0b ("sched/wait: Introduce wait_var_event()")
>     78e99b4a2b9a ("x86/intel_rdt: Add CONFIG, Makefile, and basic initialization")
>     7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback membership switches")
>     8236b0ae31c8 ("bdi: wake up concurrent wb_shutdown() callers.")
>     c1c7c3f9d6bb ("x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID")
> 
> v4.4.179: Failed to apply! Possible dependencies:
>     0007bccc3cfd ("x86: Replace RDRAND forced-reseed with simple sanity check")
>     113c60970cf4 ("x86/intel_rdt: Add Haswell feature discovery")
>     1b74dde7c47c ("x86/cpu: Convert printk(KERN_<LEVEL> ...) to pr_<level>(...)")
>     27f6d22b037b ("perf/x86: Move perf_event.h to its new home")
>     39b0332a2158 ("perf/x86: Move perf_event_amd.c ........... => x86/events/amd/core.c")
>     3ee7e8697d58 ("bdi: Fix another oops in wb_workfn()")
>     4f341a5e4844 ("x86/intel_rdt: Add scheduler hook")
>     5318ce7d4686 ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
>     5b825c3af1d8 ("sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>")
>     5dd43ce2f69d ("sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>")
>     6b2bb7265f0b ("sched/wait: Introduce wait_var_event()")
>     724697648eec ("perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp")
>     7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback membership switches")
>     8236b0ae31c8 ("bdi: wake up concurrent wb_shutdown() callers.")
>     fa9cbf320e99 ("perf/x86: Move perf_event.c ............... => x86/events/core.c")
> 
> v3.18.139: Failed to apply! Possible dependencies:
>     0ae45f63d4ef ("vfs: add support for a lazytime mount option")
>     4452226ea276 ("writeback: move backing_dev_info->state into bdi_writeback")
>     52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
>     66114cad64bf ("writeback: separate out include/linux/backing-dev-defs.h")
>     682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
>     87e1d789bf55 ("writeback: implement [locked_]inode_to_wb_and_lock_list()")
>     a3816ab0e8fe ("fs: Convert show_fdinfo functions to void")
>     b16b1deb553a ("writeback: make writeback_control track the inode being written back")
>     b4caecd48005 ("fs: introduce f_op->mmap_capabilities for nommu mmap support")
>     bafc0dba1e20 ("buffer, writeback: make __block_write_full_page() honor cgroup writeback")
> 
> 
> How should we proceed with this patch?
> 
> --

Sorry, I forgot to mention that the patch should be applied to stable
kernels from v4.4 onward.

v4.4.179 and v4.9.171 depend on commit 7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback membership switches").
On these two versions we can simply increment isw_nr_in_flight before the return.

The patch is pasted below.

--- linux-4.4.179.orig/fs/fs-writeback.c.orig	2019-05-05 19:56:29.993961267 +0800
+++ linux-4.4.179/fs/fs-writeback.c	2019-05-05 19:39:55.880336751 +0800
@@ -502,8 +502,6 @@ static void inode_switch_wbs(struct inod
 	ihold(inode);
 	isw->inode = inode;
 
-	atomic_inc(&isw_nr_in_flight);
-
 	/*
 	 * In addition to synchronizing among switchers, I_WB_SWITCH tells
 	 * the RCU protected stat update paths to grab the mapping's
@@ -511,6 +509,9 @@ static void inode_switch_wbs(struct inod
 	 * Let's continue after I_WB_SWITCH is guaranteed to be visible.
 	 */
 	call_rcu(&isw->rcu_head, inode_switch_wbs_rcu_fn);
+
+	atomic_inc(&isw_nr_in_flight);
+
 	return;
 
 out_free:
@@ -880,7 +881,11 @@ restart:
 void cgroup_writeback_umount(void)
 {
 	if (atomic_read(&isw_nr_in_flight)) {
-		synchronize_rcu();
+		/*
+		 * Use rcu_barrier() to wait for all pending callbacks to
+		 * ensure that all in-flight wb switches are in the workqueue.
+		 */
+		rcu_barrier();
 		flush_workqueue(isw_wq);
 	}
 }


Thanks,
Jiufei


> Thanks,
> Sasha
> 


* Re: [PATCH v4 RESEND] fs/writeback: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
  2019-04-29  2:41 [PATCH v4 RESEND] fs/writeback: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount Jiufei Xue
       [not found] ` <20190430103201.9C2D92080C@mail.kernel.org>
@ 2019-05-06 15:31 ` Tejun Heo
  1 sibling, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2019-05-06 15:31 UTC (permalink / raw)
  To: Jiufei Xue; +Cc: cgroups, linux-mm, akpm, joseph.qi, bo.liu

On Mon, Apr 29, 2019 at 10:41:08AM +0800, Jiufei Xue wrote:
> synchronize_rcu() does not wait for call_rcu() callbacks to run, so an
> inode wb switch may not have reached the workqueue yet when
> synchronize_rcu() returns. As a result, previously scheduled switches
> may still be unfinished even after the workqueue is flushed, which
> causes the NULL pointer dereference shown below.
> 
> VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds.  Have a nice day...
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000278
> [<ffffffff8126a303>] evict+0xb3/0x180
> [<ffffffff8126a760>] iput+0x1b0/0x230
> [<ffffffff8127c690>] inode_switch_wbs_work_fn+0x3c0/0x6a0
> [<ffffffff810a5b2e>] worker_thread+0x4e/0x490
> [<ffffffff810a5ae0>] ? process_one_work+0x410/0x410
> [<ffffffff810ac056>] kthread+0xe6/0x100
> [<ffffffff8173c199>] ret_from_fork+0x39/0x50
> 
> Replace the synchronize_rcu() call with rcu_barrier() to wait for all
> pending callbacks to finish. Also move the isw_nr_in_flight increment
> to after call_rcu() in inode_switch_wbs(), where it makes more sense.
> 
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
> Acked-by: Tejun Heo <tj@kernel.org>
> Cc: stable@kernel.org

Andrew, I think it'd probably be best to route this through -mm.

Thanks!

-- 
tejun



* Re: [PATCH v4 RESEND] fs/writeback: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
  2019-05-05 12:09   ` Jiufei Xue
@ 2019-05-20  9:49     ` Greg KH
  0 siblings, 0 replies; 4+ messages in thread
From: Greg KH @ 2019-05-20  9:49 UTC (permalink / raw)
  To: Jiufei Xue; +Cc: Sasha Levin, cgroups, tj, stable, stable

On Sun, May 05, 2019 at 08:09:01PM +0800, Jiufei Xue wrote:
> 
> 
> On 2019/4/30 6:32 PM, Sasha Levin wrote:
> > Hi,
> > 
> > [This is an automated email]
> > 
> > This commit has been processed because it contains a -stable tag.
> > The stable tag indicates that it's relevant for the following trees: all.
> > 
> > The bot has tested the following trees: v5.0.10, v4.19.37, v4.14.114, v4.9.171, v4.4.179, v3.18.139.
> > 
> > v5.0.10: Build OK!
> > v4.19.37: Build OK!
> > v4.14.114: Build OK!
> > v4.9.171: Failed to apply! Possible dependencies:
> >     113c60970cf4 ("x86/intel_rdt: Add Haswell feature discovery")
> >     2264d9c74dda ("x86/intel_rdt: Build structures for each resource based on cache topology")
> >     3ee7e8697d58 ("bdi: Fix another oops in wb_workfn()")
> >     4f341a5e4844 ("x86/intel_rdt: Add scheduler hook")
> >     5318ce7d4686 ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
> >     5b825c3af1d8 ("sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>")
> >     5dd43ce2f69d ("sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>")
> >     5ff193fbde20 ("x86/intel_rdt: Add basic resctrl filesystem support")
> >     60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
> >     60ec2440c63d ("x86/intel_rdt: Add schemata file")
> >     6b2bb7265f0b ("sched/wait: Introduce wait_var_event()")
> >     78e99b4a2b9a ("x86/intel_rdt: Add CONFIG, Makefile, and basic initialization")
> >     7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback membership switches")
> >     8236b0ae31c8 ("bdi: wake up concurrent wb_shutdown() callers.")
> >     c1c7c3f9d6bb ("x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID")
> > 
> > v4.4.179: Failed to apply! Possible dependencies:
> >     0007bccc3cfd ("x86: Replace RDRAND forced-reseed with simple sanity check")
> >     113c60970cf4 ("x86/intel_rdt: Add Haswell feature discovery")
> >     1b74dde7c47c ("x86/cpu: Convert printk(KERN_<LEVEL> ...) to pr_<level>(...)")
> >     27f6d22b037b ("perf/x86: Move perf_event.h to its new home")
> >     39b0332a2158 ("perf/x86: Move perf_event_amd.c ........... => x86/events/amd/core.c")
> >     3ee7e8697d58 ("bdi: Fix another oops in wb_workfn()")
> >     4f341a5e4844 ("x86/intel_rdt: Add scheduler hook")
> >     5318ce7d4686 ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
> >     5b825c3af1d8 ("sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>")
> >     5dd43ce2f69d ("sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>")
> >     6b2bb7265f0b ("sched/wait: Introduce wait_var_event()")
> >     724697648eec ("perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp")
> >     7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback membership switches")
> >     8236b0ae31c8 ("bdi: wake up concurrent wb_shutdown() callers.")
> >     fa9cbf320e99 ("perf/x86: Move perf_event.c ............... => x86/events/core.c")
> > 
> > v3.18.139: Failed to apply! Possible dependencies:
> >     0ae45f63d4ef ("vfs: add support for a lazytime mount option")
> >     4452226ea276 ("writeback: move backing_dev_info->state into bdi_writeback")
> >     52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
> >     66114cad64bf ("writeback: separate out include/linux/backing-dev-defs.h")
> >     682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
> >     87e1d789bf55 ("writeback: implement [locked_]inode_to_wb_and_lock_list()")
> >     a3816ab0e8fe ("fs: Convert show_fdinfo functions to void")
> >     b16b1deb553a ("writeback: make writeback_control track the inode being written back")
> >     b4caecd48005 ("fs: introduce f_op->mmap_capabilities for nommu mmap support")
> >     bafc0dba1e20 ("buffer, writeback: make __block_write_full_page() honor cgroup writeback")
> > 
> > 
> > How should we proceed with this patch?
> > 
> > --
> 
> Sorry, I forgot to mention that the patch should be applied to stable
> kernels from v4.4 onward.
> 
> v4.4.179 and v4.9.171 depend on commit 7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback membership switches").
> On these two versions we can simply increment isw_nr_in_flight before the return.

Thanks, I've just backported 7fc5854f8c6e to those kernels now and then
this applied.

greg k-h

