* [PATCH 0/2 v2] bdi: Fix hung tasks and oops in writeback
@ 2014-02-27 21:43 Jan Kara
2014-02-27 21:43 ` [PATCH 1/2] bdi: Fix hung task on sync Jan Kara
2014-02-27 21:43 ` [PATCH 2/2] bdi: Avoid oops on device removal Jan Kara
0 siblings, 2 replies; 6+ messages in thread
From: Jan Kara @ 2014-02-27 21:43 UTC (permalink / raw)
To: Jens Axboe; +Cc: LKML, linux-fsdevel, Tejun Heo
Hello,
this is a second version of patches to fix some issues in current
flusher work implementation. Since v1 I have fixed some whitespace
issues and added Reviewed-by tags. Jens, can you please merge the
patches with Linus? Thanks!
Honza
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] bdi: Fix hung task on sync
2014-02-27 21:43 [PATCH 0/2 v2] bdi: Fix hung tasks and oops in writeback Jan Kara
@ 2014-02-27 21:43 ` Jan Kara
2014-02-27 21:43 ` [PATCH 2/2] bdi: Avoid oops on device removal Jan Kara
1 sibling, 0 replies; 6+ messages in thread
From: Jan Kara @ 2014-02-27 21:43 UTC (permalink / raw)
To: Jens Axboe
Cc: LKML, linux-fsdevel, Tejun Heo, Derek Basehore, stable, Jan Kara
From: Derek Basehore <dbasehore@chromium.org>
bdi_wakeup_thread_delayed() used the mod_delayed_work() function to
schedule work to writeback dirty inodes. The problem with this is that
it can delay work that is scheduled for immediate execution, such as the
work from sync_inodes_sb(). This can happen since mod_delayed_work()
can now steal work from a work_queue. This fixes the problem by using
queue_delayed_work() instead. This is a regression caused by
839a8e8660b6 "writeback: replace custom worker pool implementation with
unbound workqueue".
The reason that this causes a problem is that laptop-mode will change
the delay, dirty_writeback_centisecs, to 60000 (10 minutes) by default.
In the case that bdi_wakeup_thread_delayed() races with
sync_inodes_sb(), sync will be stopped for 10 minutes and trigger a hung
task. Even if dirty_writeback_centisecs is not long enough to cause a
hung task, we still don't want to delay sync for that long.
We fix the problem by using queue_delayed_work() when we want to
schedule writeback sometime in future. This function doesn't change the
timer if it is already armed.
For the same reason, we also change bdi_writeback_workfn() to
immediately queue the work again in the case that the work_list is not
empty. The same problem can happen if the sync work is run on the rescue
worker.
Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
CC: stable@vger.kernel.org
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Derek Basehore <dbasehore@chromium.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fs-writeback.c | 8 ++++----
mm/backing-dev.c | 5 ++++-
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e0259a163f98..8277b76be983 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1047,10 +1047,10 @@ void bdi_writeback_workfn(struct work_struct *work)
trace_writeback_pages_written(pages_written);
}
- if (!list_empty(&bdi->work_list) ||
- (wb_has_dirty_io(wb) && dirty_writeback_interval))
- queue_delayed_work(bdi_wq, &wb->dwork,
- msecs_to_jiffies(dirty_writeback_interval * 10));
+ if (!list_empty(&bdi->work_list))
+ mod_delayed_work(bdi_wq, &wb->dwork, 0);
+ else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
+ bdi_wakeup_thread_delayed(bdi);
current->flags &= ~PF_SWAPWRITE;
}
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index ce682f7a4f29..fab8401fc54e 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -288,13 +288,16 @@ int bdi_has_dirty_io(struct backing_dev_info *bdi)
* Note, we wouldn't bother setting up the timer, but this function is on the
* fast-path (used by '__mark_inode_dirty()'), so we save few context switches
* by delaying the wake-up.
+ *
+ * We have to be careful not to postpone flush work if it is scheduled for
+ * earlier. Thus we use queue_delayed_work().
*/
void bdi_wakeup_thread_delayed(struct backing_dev_info *bdi)
{
unsigned long timeout;
timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
- mod_delayed_work(bdi_wq, &bdi->wb.dwork, timeout);
+ queue_delayed_work(bdi_wq, &bdi->wb.dwork, timeout);
}
/*
--
1.8.1.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] bdi: Avoid oops on device removal
2014-02-27 21:43 [PATCH 0/2 v2] bdi: Fix hung tasks and oops in writeback Jan Kara
2014-02-27 21:43 ` [PATCH 1/2] bdi: Fix hung task on sync Jan Kara
@ 2014-02-27 21:43 ` Jan Kara
1 sibling, 0 replies; 6+ messages in thread
From: Jan Kara @ 2014-02-27 21:43 UTC (permalink / raw)
To: Jens Axboe; +Cc: LKML, linux-fsdevel, Tejun Heo, Jan Kara, stable
After 839a8e8660b67 "writeback: replace custom worker pool
implementation with unbound workqueue" when device is removed while we
are writing to it we crash in bdi_writeback_workfn() ->
set_worker_desc() because bdi->dev is NULL. This can happen because
even though bdi_unregister() cancels all pending flushing work, nothing
really prevents new ones from being queued from balance_dirty_pages() or
other places.
Fix the problem by clearing BDI_registered bit in bdi_unregister() and
checking it before scheduling of any flushing work.
Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
CC: stable@vger.kernel.org
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fs-writeback.c | 23 ++++++++++++++++++-----
include/linux/backing-dev.h | 2 +-
mm/backing-dev.c | 13 +++++++++----
3 files changed, 28 insertions(+), 10 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 8277b76be983..c24e7b8b521b 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -94,16 +94,29 @@ static inline struct inode *wb_inode(struct list_head *head)
#define CREATE_TRACE_POINTS
#include <trace/events/writeback.h>
+static void bdi_wakeup_thread(struct backing_dev_info *bdi)
+{
+ spin_lock_bh(&bdi->wb_lock);
+ if (test_bit(BDI_registered, &bdi->state))
+ mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
+ spin_unlock_bh(&bdi->wb_lock);
+}
+
static void bdi_queue_work(struct backing_dev_info *bdi,
struct wb_writeback_work *work)
{
trace_writeback_queue(bdi, work);
spin_lock_bh(&bdi->wb_lock);
+ if (!test_bit(BDI_registered, &bdi->state)) {
+ if (work->done)
+ complete(work->done);
+ goto out_unlock;
+ }
list_add_tail(&work->list, &bdi->work_list);
- spin_unlock_bh(&bdi->wb_lock);
-
mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
+out_unlock:
+ spin_unlock_bh(&bdi->wb_lock);
}
static void
@@ -119,7 +132,7 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
work = kzalloc(sizeof(*work), GFP_ATOMIC);
if (!work) {
trace_writeback_nowork(bdi);
- mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
+ bdi_wakeup_thread(bdi);
return;
}
@@ -166,7 +179,7 @@ void bdi_start_background_writeback(struct backing_dev_info *bdi)
* writeback as soon as there is no other work to do.
*/
trace_writeback_wake_background(bdi);
- mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
+ bdi_wakeup_thread(bdi);
}
/*
@@ -1025,7 +1038,7 @@ void bdi_writeback_workfn(struct work_struct *work)
current->flags |= PF_SWAPWRITE;
if (likely(!current_is_workqueue_rescuer() ||
- list_empty(&bdi->bdi_list))) {
+ !test_bit(BDI_registered, &bdi->state))) {
/*
* The normal path. Keep writing back @bdi until its
* work_list is empty. Note that this path is also taken
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 24819001f5c8..e488e9459a93 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -95,7 +95,7 @@ struct backing_dev_info {
unsigned int max_ratio, max_prop_frac;
struct bdi_writeback wb; /* default writeback info for this bdi */
- spinlock_t wb_lock; /* protects work_list */
+ spinlock_t wb_lock; /* protects work_list & wb.dwork scheduling */
struct list_head work_list;
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index fab8401fc54e..09d9591b7708 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -297,7 +297,10 @@ void bdi_wakeup_thread_delayed(struct backing_dev_info *bdi)
unsigned long timeout;
timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
- queue_delayed_work(bdi_wq, &bdi->wb.dwork, timeout);
+ spin_lock_bh(&bdi->wb_lock);
+ if (test_bit(BDI_registered, &bdi->state))
+ queue_delayed_work(bdi_wq, &bdi->wb.dwork, timeout);
+ spin_unlock_bh(&bdi->wb_lock);
}
/*
@@ -310,9 +313,6 @@ static void bdi_remove_from_list(struct backing_dev_info *bdi)
spin_unlock_bh(&bdi_lock);
synchronize_rcu_expedited();
-
- /* bdi_list is now unused, clear it to mark @bdi dying */
- INIT_LIST_HEAD(&bdi->bdi_list);
}
int bdi_register(struct backing_dev_info *bdi, struct device *parent,
@@ -363,6 +363,11 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
*/
bdi_remove_from_list(bdi);
+ /* Make sure nobody queues further work */
+ spin_lock_bh(&bdi->wb_lock);
+ clear_bit(BDI_registered, &bdi->state);
+ spin_unlock_bh(&bdi->wb_lock);
+
/*
* Drain work list and shutdown the delayed_work. At this point,
* @bdi->bdi_list is empty telling bdi_Writeback_workfn() that @bdi
--
1.8.1.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] bdi: Avoid oops on device removal
2014-02-27 20:07 ` Tejun Heo
@ 2014-02-27 21:32 ` Jan Kara
0 siblings, 0 replies; 6+ messages in thread
From: Jan Kara @ 2014-02-27 21:32 UTC (permalink / raw)
To: Tejun Heo; +Cc: Jan Kara, linux-fsdevel, LKML, Jens Axboe, stable
On Thu 27-02-14 15:07:48, Tejun Heo wrote:
> Hello,
>
> On Tue, Feb 25, 2014 at 11:29:14PM +0100, Jan Kara wrote:
> > +static void bdi_wakeup_thread(struct backing_dev_info *bdi)
> > +{
> > + spin_lock_bh(&bdi->wb_lock);
> > + if (test_bit(BDI_registered, &bdi->state))
> > + mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
> > + spin_unlock_bh(&bdi->wb_lock);
> > +}
>
> I wonder whether this can be smarter without requiring wb_lock each
> timer but this probably is the simplest for -stable backports.
We could be clever, check whether the work is already queued for
execution and bail out without taking wb_lock if yes (that would also
save us some unnecessary juggling in try_to_grab_pending() for the situation
were the work is already queued). But I'm not sure how to cleanly implement
this...
> > static void bdi_queue_work(struct backing_dev_info *bdi,
> > struct wb_writeback_work *work)
> > {
> > trace_writeback_queue(bdi, work);
> >
> > spin_lock_bh(&bdi->wb_lock);
> > + if (!test_bit(BDI_registered, &bdi->state)) {
> > + if (work->done)
> > + complete(work->done);
> > + goto out_unlock;
> > + }
> > list_add_tail(&work->list, &bdi->work_list);
> > - spin_unlock_bh(&bdi->wb_lock);
> > -
> > mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
> > +out_unlock:
> > + spin_unlock_bh(&bdi->wb_lock);
> > }
> >
> > +
> > +
>
> Why three blank lines?
A mistake. Will fix.
> Other than that,
>
> Reviewed-by: Tejun Heo <tj@kernel.org>
Thanks!
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] bdi: Avoid oops on device removal
2014-02-25 22:29 ` [PATCH 2/2] bdi: Avoid oops on device removal Jan Kara
@ 2014-02-27 20:07 ` Tejun Heo
2014-02-27 21:32 ` Jan Kara
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2014-02-27 20:07 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-fsdevel, LKML, Jens Axboe, stable
Hello,
On Tue, Feb 25, 2014 at 11:29:14PM +0100, Jan Kara wrote:
> +static void bdi_wakeup_thread(struct backing_dev_info *bdi)
> +{
> + spin_lock_bh(&bdi->wb_lock);
> + if (test_bit(BDI_registered, &bdi->state))
> + mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
> + spin_unlock_bh(&bdi->wb_lock);
> +}
I wonder whether this can be smarter without requiring wb_lock each
timer but this probably is the simplest for -stable backports.
> static void bdi_queue_work(struct backing_dev_info *bdi,
> struct wb_writeback_work *work)
> {
> trace_writeback_queue(bdi, work);
>
> spin_lock_bh(&bdi->wb_lock);
> + if (!test_bit(BDI_registered, &bdi->state)) {
> + if (work->done)
> + complete(work->done);
> + goto out_unlock;
> + }
> list_add_tail(&work->list, &bdi->work_list);
> - spin_unlock_bh(&bdi->wb_lock);
> -
> mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
> +out_unlock:
> + spin_unlock_bh(&bdi->wb_lock);
> }
>
> +
> +
Why three blank lines?
Other than that,
Reviewed-by: Tejun Heo <tj@kernel.org>
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/2] bdi: Avoid oops on device removal
2014-02-25 22:29 [PATCH 0/2] bdi fixes Jan Kara
@ 2014-02-25 22:29 ` Jan Kara
2014-02-27 20:07 ` Tejun Heo
0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2014-02-25 22:29 UTC (permalink / raw)
To: linux-fsdevel; +Cc: LKML, Jens Axboe, Tejun Heo, Jan Kara, stable
After 839a8e8660b67 "writeback: replace custom worker pool
implementation with unbound workqueue" when device is removed while we
are writing to it we crash in bdi_writeback_workfn() ->
set_worker_desc() because bdi->dev is NULL. This can happen because
even though bdi_unregister() cancels all pending flushing work, nothing
really prevents new ones from being queued from balance_dirty_pages() or
other places.
Fix the problem by clearing BDI_registered bit in bdi_unregister() and
checking it before scheduling of any flushing work.
Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fs-writeback.c | 25 ++++++++++++++++++++-----
include/linux/backing-dev.h | 2 +-
mm/backing-dev.c | 13 +++++++++----
3 files changed, 30 insertions(+), 10 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 8277b76be983..2127441fabf4 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -94,18 +94,33 @@ static inline struct inode *wb_inode(struct list_head *head)
#define CREATE_TRACE_POINTS
#include <trace/events/writeback.h>
+static void bdi_wakeup_thread(struct backing_dev_info *bdi)
+{
+ spin_lock_bh(&bdi->wb_lock);
+ if (test_bit(BDI_registered, &bdi->state))
+ mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
+ spin_unlock_bh(&bdi->wb_lock);
+}
+
static void bdi_queue_work(struct backing_dev_info *bdi,
struct wb_writeback_work *work)
{
trace_writeback_queue(bdi, work);
spin_lock_bh(&bdi->wb_lock);
+ if (!test_bit(BDI_registered, &bdi->state)) {
+ if (work->done)
+ complete(work->done);
+ goto out_unlock;
+ }
list_add_tail(&work->list, &bdi->work_list);
- spin_unlock_bh(&bdi->wb_lock);
-
mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
+out_unlock:
+ spin_unlock_bh(&bdi->wb_lock);
}
+
+
static void
__bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
bool range_cyclic, enum wb_reason reason)
@@ -119,7 +134,7 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
work = kzalloc(sizeof(*work), GFP_ATOMIC);
if (!work) {
trace_writeback_nowork(bdi);
- mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
+ bdi_wakeup_thread(bdi);
return;
}
@@ -166,7 +181,7 @@ void bdi_start_background_writeback(struct backing_dev_info *bdi)
* writeback as soon as there is no other work to do.
*/
trace_writeback_wake_background(bdi);
- mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
+ bdi_wakeup_thread(bdi);
}
/*
@@ -1025,7 +1040,7 @@ void bdi_writeback_workfn(struct work_struct *work)
current->flags |= PF_SWAPWRITE;
if (likely(!current_is_workqueue_rescuer() ||
- list_empty(&bdi->bdi_list))) {
+ !test_bit(BDI_registered, &bdi->state))) {
/*
* The normal path. Keep writing back @bdi until its
* work_list is empty. Note that this path is also taken
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 24819001f5c8..e488e9459a93 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -95,7 +95,7 @@ struct backing_dev_info {
unsigned int max_ratio, max_prop_frac;
struct bdi_writeback wb; /* default writeback info for this bdi */
- spinlock_t wb_lock; /* protects work_list */
+ spinlock_t wb_lock; /* protects work_list & wb.dwork scheduling */
struct list_head work_list;
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index fab8401fc54e..09d9591b7708 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -297,7 +297,10 @@ void bdi_wakeup_thread_delayed(struct backing_dev_info *bdi)
unsigned long timeout;
timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
- queue_delayed_work(bdi_wq, &bdi->wb.dwork, timeout);
+ spin_lock_bh(&bdi->wb_lock);
+ if (test_bit(BDI_registered, &bdi->state))
+ queue_delayed_work(bdi_wq, &bdi->wb.dwork, timeout);
+ spin_unlock_bh(&bdi->wb_lock);
}
/*
@@ -310,9 +313,6 @@ static void bdi_remove_from_list(struct backing_dev_info *bdi)
spin_unlock_bh(&bdi_lock);
synchronize_rcu_expedited();
-
- /* bdi_list is now unused, clear it to mark @bdi dying */
- INIT_LIST_HEAD(&bdi->bdi_list);
}
int bdi_register(struct backing_dev_info *bdi, struct device *parent,
@@ -363,6 +363,11 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
*/
bdi_remove_from_list(bdi);
+ /* Make sure nobody queues further work */
+ spin_lock_bh(&bdi->wb_lock);
+ clear_bit(BDI_registered, &bdi->state);
+ spin_unlock_bh(&bdi->wb_lock);
+
/*
* Drain work list and shutdown the delayed_work. At this point,
* @bdi->bdi_list is empty telling bdi_Writeback_workfn() that @bdi
--
1.8.1.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-02-27 21:43 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-27 21:43 [PATCH 0/2 v2] bdi: Fix hung tasks and oops in writeback Jan Kara
2014-02-27 21:43 ` [PATCH 1/2] bdi: Fix hung task on sync Jan Kara
2014-02-27 21:43 ` [PATCH 2/2] bdi: Avoid oops on device removal Jan Kara
-- strict thread matches above, loose matches on Subject: below --
2014-02-25 22:29 [PATCH 0/2] bdi fixes Jan Kara
2014-02-25 22:29 ` [PATCH 2/2] bdi: Avoid oops on device removal Jan Kara
2014-02-27 20:07 ` Tejun Heo
2014-02-27 21:32 ` Jan Kara
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).