From: Jan Kara <jack@suse.cz> To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Jan Kara <jack@suse.cz>, Tejun Heo <tj@kernel.org>, Dmitry Vyukov <dvyukov@google.com>, Jens Axboe <axboe@kernel.dk>, syzbot <syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com>, syzkaller-bugs <syzkaller-bugs@googlegroups.com>, linux-fsdevel <linux-fsdevel@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>, Al Viro <viro@zeniv.linux.org.uk>, Dave Chinner <david@fromorbit.com>, linux-block@vger.kernel.org, Linus Torvalds <torvalds@linux-foundation.org> Subject: Re: [PATCH] bdi: Fix another oops in wb_workfn() Date: Wed, 13 Jun 2018 16:46:06 +0200 [thread overview] Message-ID: <20180613144606.nvbcyg2rdjpxhf7s@quack2.suse.cz> (raw) In-Reply-To: <b210e55c-b06a-41ff-8592-0dbe21875779@i-love.sakura.ne.jp> [-- Attachment #1: Type: text/plain, Size: 521 bytes --] On Wed 13-06-18 19:43:47, Tetsuo Handa wrote: > Can't we utilize RCU grace period (like shown below) ? Honestly, the variant 1 looks too ugly to me. However variant 2 looks mostly OK. We can also avoid the schedule_timeout_uninterruptible(HZ / 10) from your patch by careful handling of the bit waitqueues. Also I'd avoid the addition argument to wb_writeback() and split the function instead. The patch resulting from your and mine ideas is attached. Thoughts? Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR [-- Attachment #2: 0001-bdi-Fix-another-oops-in-wb_workfn.patch --] [-- Type: text/x-patch, Size: 5898 bytes --] >From d0fbfad964ae5b0a331f0d814d527150bd0c305c Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Date: Wed, 13 Jun 2018 16:39:00 +0200 Subject: [PATCH] bdi: Fix another oops in wb_workfn() syzbot is reporting NULL pointer dereference at wb_workfn() [1] due to wb->bdi->dev being NULL. And Dmitry confirmed that wb->state was WB_shutting_down after wb->bdi->dev became NULL. This indicates that unregister_bdi() failed to call wb_shutdown() on one of wb objects. The problem is in cgwb_bdi_unregister() which does cgwb_kill() and thus drops bdi's reference to wb structures before going through the list of wbs again and calling wb_shutdown() on each of them. This way the loop iterating through all wbs can easily miss a wb if that wb has already passed through cgwb_remove_from_bdi_list() called from wb_shutdown() from cgwb_release_workfn() and as a result fully shutdown bdi although wb_workfn() for this wb structure is still running. In fact there are also other ways cgwb_bdi_unregister() can race with cgwb_release_workfn() leading e.g. to use-after-free issues: CPU1 CPU2 cgwb_bdi_unregister() cgwb_kill(*slot); cgwb_release() queue_work(cgwb_release_wq, &wb->release_work); cgwb_release_workfn() wb = list_first_entry(&bdi->wb_list, ...) spin_unlock_irq(&cgwb_lock); wb_shutdown(wb); ... kfree_rcu(wb, rcu); wb_shutdown(wb); -> oops use-after-free We solve these issues by making wb_shutdown() remove the writeback structure from the bdi_list only after all the shutdown is done and by making sure cgwb_bdi_unregister() properly synchronizes with concurrent shutdowns using WB_shutting_down bit. Signed-off-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Jan Kara <jack@suse.cz> --- mm/backing-dev.c | 69 ++++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 55 insertions(+), 14 deletions(-) diff --git a/mm/backing-dev.c b/mm/backing-dev.c index 347cc834c04a..433807ae364e 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -350,27 +350,21 @@ static int wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi, static void cgwb_remove_from_bdi_list(struct bdi_writeback *wb); -/* - * Remove bdi from the global list and shutdown any threads we have running - */ -static void wb_shutdown(struct bdi_writeback *wb) +static bool wb_start_shutdown(struct bdi_writeback *wb) { /* Make sure nobody queues further work */ spin_lock_bh(&wb->work_lock); if (!test_and_clear_bit(WB_registered, &wb->state)) { spin_unlock_bh(&wb->work_lock); - /* - * Wait for wb shutdown to finish if someone else is just - * running wb_shutdown(). Otherwise we could proceed to wb / - * bdi destruction before wb_shutdown() is finished. - */ - wait_on_bit(&wb->state, WB_shutting_down, TASK_UNINTERRUPTIBLE); - return; + return false; } set_bit(WB_shutting_down, &wb->state); spin_unlock_bh(&wb->work_lock); + return true; +} - cgwb_remove_from_bdi_list(wb); +static void __wb_shutdown(struct bdi_writeback *wb) +{ /* * Drain work list and shutdown the delayed_work. !WB_registered * tells wb_workfn() that @wb is dying and its work_list needs to @@ -379,6 +373,7 @@ static void wb_shutdown(struct bdi_writeback *wb) mod_delayed_work(bdi_wq, &wb->dwork, 0); flush_delayed_work(&wb->dwork); WARN_ON(!list_empty(&wb->work_list)); + cgwb_remove_from_bdi_list(wb); /* * Make sure bit gets cleared after shutdown is finished. Matches with * the barrier provided by test_and_clear_bit() above. @@ -387,6 +382,23 @@ static void wb_shutdown(struct bdi_writeback *wb) clear_and_wake_up_bit(WB_shutting_down, &wb->state); } +/* + * Remove bdi from the global list and shutdown any threads we have running + */ +static void wb_shutdown(struct bdi_writeback *wb) +{ + if (!wb_start_shutdown(wb)) { + /* + * Wait for wb shutdown to finish if someone else is just + * running wb_shutdown(). Otherwise we could proceed to wb / + * bdi destruction before wb_shutdown() is finished. + */ + wait_on_bit(&wb->state, WB_shutting_down, TASK_UNINTERRUPTIBLE); + return; + } + __wb_shutdown(wb); +} + static void wb_exit(struct bdi_writeback *wb) { int i; @@ -706,6 +718,35 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi) return ret; } +/* + * Mark writeback structure as shutting down. The function returns true if + * we successfully marked the structure, false if shutdown was already in + * progress (in which case we also wait for it to finish). + * + * The function requires cgwb_lock to be held and releases it. Note that the + * caller need not hold reference to the writeback structure and thus once + * cgwb_lock is released, the writeback structure can be freed. + */ +static bool cgwb_start_shutdown(struct bdi_writeback *wb) + __releases(cgwb_lock) +{ + if (!wb_start_shutdown(wb)) { + DEFINE_WAIT(wait); + wait_queue_head_t *wqh = bit_waitqueue(&wb->state, + WB_shutting_down); + bool sleep; + + prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE); + sleep = test_bit(WB_shutting_down, &wb->state); + spin_unlock_irq(&cgwb_lock); + if (sleep) + schedule(); + return false; + } + spin_unlock_irq(&cgwb_lock); + return true; +} + static void cgwb_bdi_unregister(struct backing_dev_info *bdi) { struct radix_tree_iter iter; @@ -721,8 +762,8 @@ static void cgwb_bdi_unregister(struct backing_dev_info *bdi) while (!list_empty(&bdi->wb_list)) { wb = list_first_entry(&bdi->wb_list, struct bdi_writeback, bdi_node); - spin_unlock_irq(&cgwb_lock); - wb_shutdown(wb); + if (cgwb_start_shutdown(wb)) + __wb_shutdown(wb); spin_lock_irq(&cgwb_lock); } spin_unlock_irq(&cgwb_lock); -- 2.13.7
WARNING: multiple messages have this Message-ID (diff)
From: Jan Kara <jack@suse.cz> To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Jan Kara <jack@suse.cz>, Tejun Heo <tj@kernel.org>, Dmitry Vyukov <dvyukov@google.com>, Jens Axboe <axboe@kernel.dk>, syzbot <syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com>, syzkaller-bugs <syzkaller-bugs@googlegroups.com>, linux-fsdevel <linux-fsdevel@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>, Al Viro <viro@zeniv.linux.org.uk>, Dave Chinner <david@fromorbit.com>, linux-block@vger.kernel.org, Linus Torvalds <torvalds@linux-foundation.org> Subject: Re: [PATCH] bdi: Fix another oops in wb_workfn() Date: Wed, 13 Jun 2018 16:46:06 +0200 [thread overview] Message-ID: <20180613144606.nvbcyg2rdjpxhf7s@quack2.suse.cz> (raw) In-Reply-To: <b210e55c-b06a-41ff-8592-0dbe21875779@i-love.sakura.ne.jp> [-- Attachment #1: Type: text/plain, Size: 521 bytes --] On Wed 13-06-18 19:43:47, Tetsuo Handa wrote: > Can't we utilize RCU grace period (like shown below) ? Honestly, the variant 1 looks too ugly to me. However variant 2 looks mostly OK. We can also avoid the schedule_timeout_uninterruptible(HZ / 10) from your patch by careful handling of the bit waitqueues. Also I'd avoid the addition argument to wb_writeback() and split the function instead. The patch resulting from your and mine ideas is attached. Thoughts? Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR [-- Attachment #2: 0001-bdi-Fix-another-oops-in-wb_workfn.patch --] [-- Type: text/x-patch, Size: 5897 bytes --] From d0fbfad964ae5b0a331f0d814d527150bd0c305c Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Date: Wed, 13 Jun 2018 16:39:00 +0200 Subject: [PATCH] bdi: Fix another oops in wb_workfn() syzbot is reporting NULL pointer dereference at wb_workfn() [1] due to wb->bdi->dev being NULL. And Dmitry confirmed that wb->state was WB_shutting_down after wb->bdi->dev became NULL. This indicates that unregister_bdi() failed to call wb_shutdown() on one of wb objects. The problem is in cgwb_bdi_unregister() which does cgwb_kill() and thus drops bdi's reference to wb structures before going through the list of wbs again and calling wb_shutdown() on each of them. This way the loop iterating through all wbs can easily miss a wb if that wb has already passed through cgwb_remove_from_bdi_list() called from wb_shutdown() from cgwb_release_workfn() and as a result fully shutdown bdi although wb_workfn() for this wb structure is still running. In fact there are also other ways cgwb_bdi_unregister() can race with cgwb_release_workfn() leading e.g. to use-after-free issues: CPU1 CPU2 cgwb_bdi_unregister() cgwb_kill(*slot); cgwb_release() queue_work(cgwb_release_wq, &wb->release_work); cgwb_release_workfn() wb = list_first_entry(&bdi->wb_list, ...) spin_unlock_irq(&cgwb_lock); wb_shutdown(wb); ... kfree_rcu(wb, rcu); wb_shutdown(wb); -> oops use-after-free We solve these issues by making wb_shutdown() remove the writeback structure from the bdi_list only after all the shutdown is done and by making sure cgwb_bdi_unregister() properly synchronizes with concurrent shutdowns using WB_shutting_down bit. Signed-off-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Jan Kara <jack@suse.cz> --- mm/backing-dev.c | 69 ++++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 55 insertions(+), 14 deletions(-) diff --git a/mm/backing-dev.c b/mm/backing-dev.c index 347cc834c04a..433807ae364e 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -350,27 +350,21 @@ static int wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi, static void cgwb_remove_from_bdi_list(struct bdi_writeback *wb); -/* - * Remove bdi from the global list and shutdown any threads we have running - */ -static void wb_shutdown(struct bdi_writeback *wb) +static bool wb_start_shutdown(struct bdi_writeback *wb) { /* Make sure nobody queues further work */ spin_lock_bh(&wb->work_lock); if (!test_and_clear_bit(WB_registered, &wb->state)) { spin_unlock_bh(&wb->work_lock); - /* - * Wait for wb shutdown to finish if someone else is just - * running wb_shutdown(). Otherwise we could proceed to wb / - * bdi destruction before wb_shutdown() is finished. - */ - wait_on_bit(&wb->state, WB_shutting_down, TASK_UNINTERRUPTIBLE); - return; + return false; } set_bit(WB_shutting_down, &wb->state); spin_unlock_bh(&wb->work_lock); + return true; +} - cgwb_remove_from_bdi_list(wb); +static void __wb_shutdown(struct bdi_writeback *wb) +{ /* * Drain work list and shutdown the delayed_work. !WB_registered * tells wb_workfn() that @wb is dying and its work_list needs to @@ -379,6 +373,7 @@ static void wb_shutdown(struct bdi_writeback *wb) mod_delayed_work(bdi_wq, &wb->dwork, 0); flush_delayed_work(&wb->dwork); WARN_ON(!list_empty(&wb->work_list)); + cgwb_remove_from_bdi_list(wb); /* * Make sure bit gets cleared after shutdown is finished. Matches with * the barrier provided by test_and_clear_bit() above. @@ -387,6 +382,23 @@ static void wb_shutdown(struct bdi_writeback *wb) clear_and_wake_up_bit(WB_shutting_down, &wb->state); } +/* + * Remove bdi from the global list and shutdown any threads we have running + */ +static void wb_shutdown(struct bdi_writeback *wb) +{ + if (!wb_start_shutdown(wb)) { + /* + * Wait for wb shutdown to finish if someone else is just + * running wb_shutdown(). Otherwise we could proceed to wb / + * bdi destruction before wb_shutdown() is finished. + */ + wait_on_bit(&wb->state, WB_shutting_down, TASK_UNINTERRUPTIBLE); + return; + } + __wb_shutdown(wb); +} + static void wb_exit(struct bdi_writeback *wb) { int i; @@ -706,6 +718,35 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi) return ret; } +/* + * Mark writeback structure as shutting down. The function returns true if + * we successfully marked the structure, false if shutdown was already in + * progress (in which case we also wait for it to finish). + * + * The function requires cgwb_lock to be held and releases it. Note that the + * caller need not hold reference to the writeback structure and thus once + * cgwb_lock is released, the writeback structure can be freed. + */ +static bool cgwb_start_shutdown(struct bdi_writeback *wb) + __releases(cgwb_lock) +{ + if (!wb_start_shutdown(wb)) { + DEFINE_WAIT(wait); + wait_queue_head_t *wqh = bit_waitqueue(&wb->state, + WB_shutting_down); + bool sleep; + + prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE); + sleep = test_bit(WB_shutting_down, &wb->state); + spin_unlock_irq(&cgwb_lock); + if (sleep) + schedule(); + return false; + } + spin_unlock_irq(&cgwb_lock); + return true; +} + static void cgwb_bdi_unregister(struct backing_dev_info *bdi) { struct radix_tree_iter iter; @@ -721,8 +762,8 @@ static void cgwb_bdi_unregister(struct backing_dev_info *bdi) while (!list_empty(&bdi->wb_list)) { wb = list_first_entry(&bdi->wb_list, struct bdi_writeback, bdi_node); - spin_unlock_irq(&cgwb_lock); - wb_shutdown(wb); + if (cgwb_start_shutdown(wb)) + __wb_shutdown(wb); spin_lock_irq(&cgwb_lock); } spin_unlock_irq(&cgwb_lock); -- 2.13.7
next prev parent reply other threads:[~2018-06-13 14:46 UTC|newest] Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-05-26 9:15 general protection fault in wb_workfn (2) syzbot 2018-05-27 0:47 ` Tetsuo Handa 2018-05-27 2:21 ` [PATCH] bdi: Fix another oops in wb_workfn() Tetsuo Handa 2018-05-27 2:36 ` Tejun Heo 2018-05-27 4:43 ` Tetsuo Handa 2018-05-29 13:46 ` Tejun Heo 2018-05-28 13:35 ` general protection fault in wb_workfn (2) Jan Kara 2018-05-30 16:00 ` Tetsuo Handa 2018-05-30 16:00 ` Tetsuo Handa 2018-05-31 11:42 ` Jan Kara 2018-05-31 13:19 ` Tetsuo Handa 2018-05-31 13:42 ` Jan Kara 2018-05-31 16:56 ` Jens Axboe 2018-06-05 13:45 ` Tetsuo Handa 2018-06-07 18:46 ` Dmitry Vyukov 2018-06-08 2:31 ` Tetsuo Handa 2018-06-08 14:45 ` Dmitry Vyukov 2018-06-08 15:16 ` Dmitry Vyukov 2018-06-08 16:53 ` Dmitry Vyukov 2018-06-08 17:14 ` Dmitry Vyukov 2018-06-09 5:30 ` Tetsuo Handa 2018-06-09 14:00 ` [PATCH] bdi: Fix another oops in wb_workfn() Tetsuo Handa 2018-06-11 9:12 ` Jan Kara 2018-06-11 16:01 ` Tejun Heo 2018-06-11 16:29 ` Jan Kara 2018-06-11 17:20 ` Tejun Heo 2018-06-12 15:57 ` Jan Kara 2018-06-13 10:43 ` Tetsuo Handa 2018-06-13 11:51 ` Tetsuo Handa 2018-06-13 14:06 ` Linus Torvalds 2018-06-13 14:46 ` Jan Kara [this message] 2018-06-13 14:46 ` Jan Kara 2018-06-13 14:55 ` Linus Torvalds 2018-06-13 16:20 ` Tetsuo Handa 2018-06-13 16:25 ` Linus Torvalds 2018-06-13 16:45 ` Jan Kara 2018-06-13 21:04 ` Tetsuo Handa 2018-06-14 10:11 ` Jan Kara 2018-06-13 14:33 ` Tejun Heo 2018-06-15 12:06 ` Jan Kara 2018-06-15 12:06 ` Jan Kara 2018-06-18 12:27 ` Jan Kara 2018-06-01 2:30 ` general protection fault in wb_workfn (2) Dave Chinner 2018-06-18 13:46 [PATCH] bdi: Fix another oops in wb_workfn() Jan Kara 2018-06-18 14:38 ` Tetsuo Handa 2018-06-19 8:41 ` Jan Kara 2018-06-18 17:40 ` Tejun Heo 2018-06-22 8:52 ` Jan Kara 2018-06-22 18:08 ` Jens Axboe
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20180613144606.nvbcyg2rdjpxhf7s@quack2.suse.cz \ --to=jack@suse.cz \ --cc=axboe@kernel.dk \ --cc=david@fromorbit.com \ --cc=dvyukov@google.com \ --cc=linux-block@vger.kernel.org \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=penguin-kernel@i-love.sakura.ne.jp \ --cc=syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com \ --cc=syzkaller-bugs@googlegroups.com \ --cc=tj@kernel.org \ --cc=torvalds@linux-foundation.org \ --cc=viro@zeniv.linux.org.uk \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.