* merging the per-bdi writeback patchset
@ 2009-06-23  8:11 Jens Axboe
  2009-06-23  8:48 ` Andrew Morton
  2009-06-23 11:09 ` Christoph Hellwig
  0 siblings, 2 replies; 15+ messages in thread
From: Jens Axboe @ 2009-06-23  8:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, akpm, hch

Hi,

Things are looking good for this patchset and it's been in -next for
almost a week without any reports of problems. So I'd like to merge it
for 2.6.31 if at all possible. Any objections?

This was the last posting of the patchset:

http://lkml.org/lkml/2009/6/17/143

  git://git.kernel.dk/linux-2.6-block.git writeback

Jens Axboe (10):
      writeback: move dirty inodes from super_block to backing_dev_info
      writeback: switch to per-bdi threads for flushing data
      writeback: get rid of pdflush completely
      writeback: separate the flushing state/task from the bdi
      writeback: support > 1 flusher thread per bdi
      writeback: allow sleepy exit of default writeback task
      writeback: add some debug inode list counters to bdi stats
      writeback: add name to backing_dev_info
      writeback: check for registered bdi in flusher add and inode dirty
      writeback: use spin_trylock() in bdi_writeback_all() for WB_SYNC_NONE

 block/blk-core.c            |    1 +
 drivers/block/aoe/aoeblk.c  |    1 +
 drivers/char/mem.c          |    1 +
 fs/btrfs/disk-io.c          |    1 +
 fs/buffer.c                 |    2 +-
 fs/char_dev.c               |    1 +
 fs/configfs/inode.c         |    1 +
 fs/fs-writeback.c           |  812 ++++++++++++++++++++++++++++++++++---------
 fs/fuse/inode.c             |    1 +
 fs/hugetlbfs/inode.c        |    1 +
 fs/nfs/client.c             |    1 +
 fs/ocfs2/dlm/dlmfs.c        |    1 +
 fs/ramfs/inode.c            |    1 +
 fs/super.c                  |    3 -
 fs/sysfs/inode.c            |    1 +
 fs/ubifs/super.c            |    1 +
 include/linux/backing-dev.h |   71 ++++-
 include/linux/fs.h          |   11 +-
 include/linux/writeback.h   |   15 +-
 kernel/cgroup.c             |    1 +
 mm/Makefile                 |    2 +-
 mm/backing-dev.c            |  532 ++++++++++++++++++++++++++++-
 mm/page-writeback.c         |  157 ++-------
 mm/pdflush.c                |  269 --------------
 mm/swap_state.c             |    1 +
 mm/vmscan.c                 |    2 +-
 26 files changed, 1294 insertions(+), 597 deletions(-)
 delete mode 100644 mm/pdflush.c

-- 
Jens Axboe



* Re: merging the per-bdi writeback patchset
  2009-06-23  8:11 merging the per-bdi writeback patchset Jens Axboe
@ 2009-06-23  8:48 ` Andrew Morton
  2009-06-23  8:55   ` Jens Axboe
  2009-06-23 11:09 ` Christoph Hellwig
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2009-06-23  8:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, Linux Kernel, hch

On Tue, 23 Jun 2009 10:11:56 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:

> Things are looking good for this patchset and it's been in -next for
> almost a week without any reports of problems. So I'd like to merge it
> for 2.6.31 if at all possible. Any objections?

erk.  I was rather expecting I'd have time to have a look at it all.

It's unclear to me actually _why_ the performance changes which were
observed have actually occurred.  In fact it's a bit unclear (to me)
why the patchset was written and what it sets out to achieve :(

A long time ago the XFS guys (Dave Chinner iirc) said that XFS needs
more than one thread per device to keep the device saturated.  Did that
get addressed?

(kthread_run() returns an ERR_PTR() on error, btw - not NULL.)
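
For reference, a minimal sketch of the idiomatic check (the thread
function, its argument and the thread name are illustrative
placeholders, not code from the patchset):

#include <linux/backing-dev.h>
#include <linux/err.h>
#include <linux/kthread.h>

/* placeholder thread function for the sketch */
static int bdi_flusher_fn(void *data)
{
	return 0;
}

static int start_flusher(struct backing_dev_info *bdi)
{
	struct task_struct *task;

	/* kthread_run() returns ERR_PTR(-errno) on failure, never NULL */
	task = kthread_run(bdi_flusher_fn, bdi, "bdi-flush");
	if (IS_ERR(task))
		return PTR_ERR(task);

	return 0;
}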



* Re: merging the per-bdi writeback patchset
  2009-06-23  8:48 ` Andrew Morton
@ 2009-06-23  8:55   ` Jens Axboe
  2009-06-23 10:28     ` KOSAKI Motohiro
  2009-06-23 15:01     ` Andrew Morton
  0 siblings, 2 replies; 15+ messages in thread
From: Jens Axboe @ 2009-06-23  8:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, Linux Kernel, hch

On Tue, Jun 23 2009, Andrew Morton wrote:
> On Tue, 23 Jun 2009 10:11:56 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > Things are looking good for this patchset and it's been in -next for
> > almost a week without any reports of problems. So I'd like to merge it
> > for 2.6.31 if at all possible. Any objections?
> 
> erk.  I was rather expecting I'd have time to have a look at it all.

OK, we can wait if we have to, just trying to avoid having to keep this
fresh for one full cycle. I have posted this patchset 11 times over the
past months, though, so it's not like it's a new piece of work :-)

> It's unclear to me actually _why_ the performance changes which were
> observed have actually occurred.  In fact it's a bit unclear (to me)
> why the patchset was written and what it sets out to achieve :(

It started out trying to get rid of the pdflush uneven writeout. If you
look at various pdflush intensive workloads, even on a single disk you
often have 5 or more pdflush threads working the same device. It's just
not optimal. Another issue was starvation with request allocation. Given
that pdflush does non-blocking writes (it has to, by design), pdflush
can potentially be starved if someone else is working the device.
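
For contrast, with per-bdi writeback each device gets its own dedicated
flusher thread. Conceptually its main loop looks like the below. This is
a simplified sketch only (headers and error handling elided); the actual
bdi_writeback_task()/wb_do_writeback() in the patchset also handle
queued work items, sync modes and multiple threads per bdi:

/* Conceptual sketch of a per-bdi flusher thread, not the patch code. */
static int bdi_flusher_thread(void *data)
{
	struct bdi_writeback *wb = data;

	while (!kthread_should_stop()) {
		/* service queued work and/or flush expired dirty data */
		wb_do_writeback(wb, 0);

		/* sleep until the next periodic writeback is due */
		schedule_timeout_interruptible(
			msecs_to_jiffies(dirty_writeback_interval * 10));
	}

	return 0;
}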

> A long time ago the XFS guys (Dave Chinner iirc) said that XFS needs
> more than one thread per device to keep the device saturated.  Did that
> get addressed?

It supports up to 32 threads per device, but Chinner et al. have been
silent. So the support is there and there's a
super_operations->inode_get_wb() to map a dirty inode to a writeback
device. Nobody is doing that yet though.
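
For illustration, a filesystem could wire that up along these lines. This
is a hypothetical sketch: the wb[] array and wb_cnt field are assumptions
about the bdi layout made for this example, not the actual interface:

/* Hypothetical: hash dirty inodes across a bdi's flusher contexts. */
static struct bdi_writeback *myfs_inode_get_wb(struct inode *inode)
{
	struct backing_dev_info *bdi = inode->i_mapping->backing_dev_info;

	return &bdi->wb[inode->i_ino % bdi->wb_cnt];
}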

> (kthread_run() returns an ERR_PTR() on error, btw - not NULL.)

Oh thanks, will fix that up.

-- 
Jens Axboe



* Re: merging the per-bdi writeback patchset
  2009-06-23  8:55   ` Jens Axboe
@ 2009-06-23 10:28     ` KOSAKI Motohiro
  2009-06-23 10:56       ` Jens Axboe
  2009-06-23 15:01     ` Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: KOSAKI Motohiro @ 2009-06-23 10:28 UTC (permalink / raw)
  To: Jens Axboe
  Cc: kosaki.motohiro, Andrew Morton, Linus Torvalds, Linux Kernel, hch

Hi

> On Tue, Jun 23 2009, Andrew Morton wrote:
> > On Tue, 23 Jun 2009 10:11:56 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > 
> > > Things are looking good for this patchset and it's been in -next for
> > > almost a week without any reports of problems. So I'd like to merge it
> > > for 2.6.31 if at all possible. Any objections?
> > 
> > erk.  I was rather expecting I'd have time to have a look at it all.
> 
> OK, we can wait if we have to, just trying to avoid having to keep this
> fresh for one full cycle. I have posted this patchset 11 times over the
> past months, though, so it's not like it's a new piece of work :-)
> 
> > It's unclear to me actually _why_ the performance changes which were
> > observed have actually occurred.  In fact it's a bit unclear (to me)
> > why the patchset was written and what it sets out to achieve :(
> 
> It started out trying to get rid of the pdflush uneven writeout. If you
> look at various pdflush intensive workloads, even on a single disk you
> often have 5 or more pdflush threads working the same device. It's just
> not optimal. Another issue was starvation with request allocation. Given
> that pdflush does non-blocking writes (it has to, by design), pdflush
> can potentially be starved if someone else is working the device.

Can you please make a reproduction program and post the measurement results?
I hope to measure the same program on my box.

Plus, can you please write more verbose patch descriptions? Your patches are
a bit hard to review.





* Re: merging the per-bdi writeback patchset
  2009-06-23 10:28     ` KOSAKI Motohiro
@ 2009-06-23 10:56       ` Jens Axboe
  2009-06-23 11:19         ` KOSAKI Motohiro
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2009-06-23 10:56 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Andrew Morton, Linus Torvalds, Linux Kernel, hch

On Tue, Jun 23 2009, KOSAKI Motohiro wrote:
> Hi
> 
> > On Tue, Jun 23 2009, Andrew Morton wrote:
> > > On Tue, 23 Jun 2009 10:11:56 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > > 
> > > > Things are looking good for this patchset and it's been in -next for
> > > > almost a week without any reports of problems. So I'd like to merge it
> > > > for 2.6.31 if at all possible. Any objections?
> > > 
> > > erk.  I was rather expecting I'd have time to have a look at it all.
> > 
> > OK, we can wait if we have to, just trying to avoid having to keep this
> > fresh for one full cycle. I have posted this patchset 11 times over the
> > past months, though, so it's not like it's a new piece of work :-)
> > 
> > > It's unclear to me actually _why_ the performance changes which were
> > > observed have actually occurred.  In fact it's a bit unclear (to me)
> > > why the patchset was written and what it sets out to achieve :(
> > 
> > It started out trying to get rid of the pdflush uneven writeout. If you
> > look at various pdflush intensive workloads, even on a single disk you
> > often have 5 or more pdflush threads working the same device. It's just
> > not optimal. Another issue was starvation with request allocation. Given
> > that pdflush does non-blocking writes (it has to, by design), pdflush
> > can potentially be starved if someone else is working the device.
> 
> Can you please make a reproduction program and post the measurement results?
> I hope to measure the same program on my box.

For which issue? Lumpy writeout can often be observed just by doing
buffered writes to a bunch of files.

> Plus, can you please write more verbose patch descriptions? Your patches are
> a bit hard to review.

OK, I can probably improve on that. Do you mean the general description
of the patchset, or some of the individual patches?

-- 
Jens Axboe



* Re: merging the per-bdi writeback patchset
  2009-06-23  8:11 merging the per-bdi writeback patchset Jens Axboe
  2009-06-23  8:48 ` Andrew Morton
@ 2009-06-23 11:09 ` Christoph Hellwig
  2009-06-23 11:12   ` Jens Axboe
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2009-06-23 11:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, Linux Kernel, akpm, hch

On Tue, Jun 23, 2009 at 10:11:56AM +0200, Jens Axboe wrote:
> Hi,
> 
> Things are looking good for this patchset and it's been in -next for
> almost a week without any reports of problems. So I'd like to merge it
> for 2.6.31 if at all possible. Any objections?

Last time we discussed this you said you're happy with 2.6.32.  I really
want to take a more detailed look and put that on the not so urgent list
because you didn't seem to rush for .31.  So my vote goes for waiting a
bit longer.



* Re: merging the per-bdi writeback patchset
  2009-06-23 11:09 ` Christoph Hellwig
@ 2009-06-23 11:12   ` Jens Axboe
  2009-06-23 11:17     ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2009-06-23 11:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Linus Torvalds, Linux Kernel, akpm

On Tue, Jun 23 2009, Christoph Hellwig wrote:
> On Tue, Jun 23, 2009 at 10:11:56AM +0200, Jens Axboe wrote:
> > Hi,
> > 
> > Things are looking good for this patchset and it's been in -next for
> > almost a week without any reports of problems. So I'd like to merge it
> > for 2.6.31 if at all possible. Any objections?
> 
> Last time we discussed this you said you're happy with 2.6.32.  I really
> want to take a more detailed look and put that on the not so urgent list
> because you didn't seem to rush for .31.  So my vote goes for waiting a
> bit longer.

Yeah, 2.6.32 works for me too; .31 would have been nice though, so I
wouldn't have to carry it around anymore. But either is fine. If you and
Andrew want more time to review this stuff, then let's just settle for
.32.

-- 
Jens Axboe



* Re: merging the per-bdi writeback patchset
  2009-06-23 11:12   ` Jens Axboe
@ 2009-06-23 11:17     ` Christoph Hellwig
  2009-06-23 11:52       ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2009-06-23 11:17 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, Linus Torvalds, Linux Kernel, akpm

On Tue, Jun 23, 2009 at 01:12:10PM +0200, Jens Axboe wrote:
> > Last time we discussed this you said you're happy with 2.6.32.  I really
> > want to take a more detailed look and put that on the not so urgent list
> > because you didn't seem to rush for .31.  So my vote goes for waiting a
> > bit longer.
> 
> Yeah, 2.6.32 works for me too; .31 would have been nice though, so I
> wouldn't have to carry it around anymore. But either is fine. If you and
> Andrew want more time to review this stuff, then let's just settle for
> .32.

Yes, I'd really prefer more time.  I also expect to come up with some
more changes in that area.  Your patch makes the differences between
kupdate and pdflush-style writeback look even uglier than they already
are, so I want to see if there's some nicer way to handle it.  I also want
to take a look at whether it makes sense to disentangle data integrity and
background writeback somehow.



* Re: merging the per-bdi writeback patchset
  2009-06-23 10:56       ` Jens Axboe
@ 2009-06-23 11:19         ` KOSAKI Motohiro
  2009-06-23 11:28           ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: KOSAKI Motohiro @ 2009-06-23 11:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: kosaki.motohiro, Andrew Morton, Linus Torvalds, Linux Kernel, hch

> On Tue, Jun 23 2009, KOSAKI Motohiro wrote:
> > Hi
> > 
> > > On Tue, Jun 23 2009, Andrew Morton wrote:
> > > > On Tue, 23 Jun 2009 10:11:56 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > > > 
> > > > > Things are looking good for this patchset and it's been in -next for
> > > > > almost a week without any reports of problems. So I'd like to merge it
> > > > > for 2.6.31 if at all possible. Any objections?
> > > > 
> > > > erk.  I was rather expecting I'd have time to have a look at it all.
> > > 
> > > OK, we can wait if we have to, just trying to avoid having to keep this
> > > fresh for one full cycle. I have posted this patchset 11 times over the
> > > past months, though, so it's not like it's a new piece of work :-)
> > > 
> > > > It's unclear to me actually _why_ the performance changes which were
> > > > observed have actually occurred.  In fact it's a bit unclear (to me)
> > > > why the patchset was written and what it sets out to achieve :(
> > > 
> > > It started out trying to get rid of the pdflush uneven writeout. If you
> > > look at various pdflush intensive workloads, even on a single disk you
> > > often have 5 or more pdflush threads working the same device. It's just
> > > not optimal. Another issue was starvation with request allocation. Given
> > > that pdflush does non-blocking writes (it has to, by design), pdflush
> > > can potentially be starved if someone else is working the device.
> > 
> > Can you please make a reproduction program and post the measurement results?
> > I hope to measure the same program on my box.
> 
> For which issue? Lumpy writeout can often be observed just by doing
> buffered writes to a bunch of files.

Yes, I know the current behavior is not perfectly optimal,
but I haven't seen it cause a serious issue.

So I guess you have a workload with a big regression, yes? If so, I hope to see it.


> > Plus, can you please write more verbose patch descriptions? Your patches are
> > a bit hard to review.
> 
> OK, I can probably improve on that. Do you mean the general description
> of the patchset, or some of the individual patches?

Hopefully both. Honestly, I haven't understood the issue you're mainly worried about.





* Re: merging the per-bdi writeback patchset
  2009-06-23 11:19         ` KOSAKI Motohiro
@ 2009-06-23 11:28           ` Jens Axboe
  2009-06-23 11:44             ` KOSAKI Motohiro
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2009-06-23 11:28 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Andrew Morton, Linus Torvalds, Linux Kernel, hch

On Tue, Jun 23 2009, KOSAKI Motohiro wrote:
> > On Tue, Jun 23 2009, KOSAKI Motohiro wrote:
> > > Hi
> > > 
> > > > On Tue, Jun 23 2009, Andrew Morton wrote:
> > > > > On Tue, 23 Jun 2009 10:11:56 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > > > > 
> > > > > > Things are looking good for this patchset and it's been in -next for
> > > > > > almost a week without any reports of problems. So I'd like to merge it
> > > > > > for 2.6.31 if at all possible. Any objections?
> > > > > 
> > > > > erk.  I was rather expecting I'd have time to have a look at it all.
> > > > 
> > > > OK, we can wait if we have to, just trying to avoid having to keep this
> > > > fresh for one full cycle. I have posted this patchset 11 times over the
> > > > past months, though, so it's not like it's a new piece of work :-)
> > > > 
> > > > > It's unclear to me actually _why_ the performance changes which were
> > > > > observed have actually occurred.  In fact it's a bit unclear (to me)
> > > > > why the patchset was written and what it sets out to achieve :(
> > > > 
> > > > It started out trying to get rid of the pdflush uneven writeout. If you
> > > > look at various pdflush intensive workloads, even on a single disk you
> > > > often have 5 or more pdflush threads working the same device. It's just
> > > > not optimal. Another issue was starvation with request allocation. Given
> > > > that pdflush does non-blocking writes (it has to, by design), pdflush
> > > > can potentially be starved if someone else is working the device.
> > > 
> > > Can you please make a reproduction program and post the measurement results?
> > > I hope to measure the same program on my box.
> > 
> > For which issue? Lumpy writeout can often be observed just by doing
> > buffered writes to a bunch of files.
> 
> Yes, I know the current behavior is not perfectly optimal,
> but I haven't seen it cause a serious issue.
> 
> So I guess you have a workload with a big regression, yes? If so, I hope to
> see it.

Not really, I was just interested in making it more optimal. I work from
various fio job files; one case that is sped up greatly is doing random
writes with mmap to an otherwise buffered file. pdflush is both lumpy
and a lot slower there, even with many pdflush threads active. Looking
at disk utilization, pdflush doesn't manage more than ~80% for that. The
per-bdi writeback is completely smooth and gets about as close to 100%
utilization as possible (around ~98% there). And this is just one
disk; the per-bdi writeback scales nicely upwards. pdflush falls flat.
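
For reference, a fio job along these lines exercises that case (all
parameters are illustrative, not the exact job file used for the
numbers above):

[mmap-randwrite]
ioengine=mmap
rw=randwrite
bs=4k
size=1g
direct=0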

And then there are lots of cases where the performance is the same. For
many workloads, pdflush isn't really very active.

> > > Plus, can you please write more verbose patch descriptions? Your patches are
> > > a bit hard to review.
> > 
> > OK, I can probably improve on that. Do you mean the general description
> > of the patchset, or some of the individual patches?
> 
> Hopefully both. Honestly, I haven't understood the issue you're mainly worried about.

Does the above help? It's all about making the writeback more
consistent. So getting rid of the lumpy writeback and eliminating the
pdflush starvation were the prime motivators.

-- 
Jens Axboe



* Re: merging the per-bdi writeback patchset
  2009-06-23 11:28           ` Jens Axboe
@ 2009-06-23 11:44             ` KOSAKI Motohiro
  0 siblings, 0 replies; 15+ messages in thread
From: KOSAKI Motohiro @ 2009-06-23 11:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: kosaki.motohiro, Andrew Morton, Linus Torvalds, Linux Kernel, hch

> > > > Can you please make a reproduction program and post the measurement results?
> > > > I hope to measure the same program on my box.
> > > 
> > > For which issue? Lumpy writeout can often be observed just by doing
> > > buffered writes to a bunch of files.
> > 
> > Yes, I know the current behavior is not perfectly optimal,
> > but I haven't seen it cause a serious issue.
> > 
> > So I guess you have a workload with a big regression, yes? If so, I hope to
> > see it.
> 
> Not really, I was just interested in making it more optimal. I work from
> various fio job files; one case that is sped up greatly is doing random
> writes with mmap to an otherwise buffered file. pdflush is both lumpy
> and a lot slower there, even with many pdflush threads active. Looking
> at disk utilization, pdflush doesn't manage more than ~80% for that. The
> per-bdi writeback is completely smooth and gets about as close to 100%
> utilization as possible (around ~98% there). And this is just one
> disk; the per-bdi writeback scales nicely upwards. pdflush falls flat.
> 
> And then there are lots of cases where the performance is the same. For
> many workloads, pdflush isn't really very active.
> 
> > > > Plus, can you please write more verbose patch descriptions? Your patches are
> > > > a bit hard to review.
> > > 
> > > OK, I can probably improve on that. Do you mean the general description
> > > of the patchset, or some of the individual patches?
> > 
> > Hopefully both. Honestly, I haven't understood the issue you're mainly worried about.
> 
> Does the above help? It's all about making the writeback more
> consistent. So getting rid of the lumpy writeback and eliminating the
> pdflush starvation were the prime motivators.

OK, thanks.
I'll try to review your patch series without any bias later.





* Re: merging the per-bdi writeback patchset
  2009-06-23 11:17     ` Christoph Hellwig
@ 2009-06-23 11:52       ` Jens Axboe
  2009-06-23 12:20         ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2009-06-23 11:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Linus Torvalds, Linux Kernel, akpm

On Tue, Jun 23 2009, Christoph Hellwig wrote:
> On Tue, Jun 23, 2009 at 01:12:10PM +0200, Jens Axboe wrote:
> > > Last time we discussed this you said you're happy with 2.6.32.  I really
> > > want to take a more detailed look and put that on the not so urgent list
> > > because you didn't seem to rush for .31.  So my vote goes for waiting a
> > > bit longer.
> > 
> > Yeah, 2.6.32 works for me too; .31 would have been nice though, so I
> > wouldn't have to carry it around anymore. But either is fine. If you and
> > Andrew want more time to review this stuff, then let's just settle for
> > .32.
> 
> Yes, I'd really prefer more time.  I also expect to come up with some
> more changes in that area.  Your patch makes the differences between
> kupdate and pdflush-style writeback look even uglier than they already
> are, so I want to see if there's some nicer way to handle it.  I also want

Good point, should be easy enough to fold the two together.

> to take a look at whether it makes sense to disentangle data integrity and
> background writeback somehow.

That one is also on my list; it would make the code flow a lot cleaner, I
suspect.

-- 
Jens Axboe



* Re: merging the per-bdi writeback patchset
  2009-06-23 11:52       ` Jens Axboe
@ 2009-06-23 12:20         ` Jens Axboe
  0 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2009-06-23 12:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Linus Torvalds, Linux Kernel, akpm

On Tue, Jun 23 2009, Jens Axboe wrote:
> On Tue, Jun 23 2009, Christoph Hellwig wrote:
> > On Tue, Jun 23, 2009 at 01:12:10PM +0200, Jens Axboe wrote:
> > > > Last time we discussed this you said you're happy with 2.6.32.  I really
> > > > want to take a more detailed look and put that on the not so urgent list
> > > > because you didn't seem to rush for .31.  So my vote goes for waiting a
> > > > bit longer.
> > > 
> > > Yeah, 2.6.32 works for me too; .31 would have been nice though, so I
> > > wouldn't have to carry it around anymore. But either is fine. If you and
> > > Andrew want more time to review this stuff, then let's just settle for
> > > .32.
> > 
> > Yes, I'd really prefer more time.  I also expect to come up with some
> > more changes in that area.  Your patch makes the differences between
> > kupdate and pdflush-style writeback look even uglier than they already
> > are, so I want to see if there's some nicer way to handle it.  I also want
> 
> Good point, should be easy enough to fold the two together.

Something like the below, untested except for checking that it compiles.
So this doesn't dual-path the two reasons we wake up to flush data
anymore; rather it just flushes kupdated-style if we didn't have any
queued work (or the queued work didn't end up writing anything).

 fs-writeback.c |  111 +++++++++++++++----------------------------------
 1 file changed, 36 insertions(+), 75 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index d589db3..c5996e5 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -269,8 +269,18 @@ void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
  */
 #define MAX_WRITEBACK_PAGES     1024
 
+static inline bool over_bground_thresh(void)
+{
+	unsigned long background_thresh, dirty_thresh;
+
+	get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);
+
+	return (global_page_state(NR_FILE_DIRTY) +
+		global_page_state(NR_UNSTABLE_NFS) >= background_thresh);
+}
+
 /*
- * Periodic writeback of "old" data.
+ * Retrieve work items queued, or perform periodic writeback of "old" data.
  *
  * Define "old": the first time one of an inode's pages is dirtied, we mark the
  * dirtying-time in the inode's address_space.  So this periodic writeback code
@@ -284,61 +294,26 @@ void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
  * older_than_this takes precedence over nr_to_write.  So we'll only write back
  * all dirty pages if they are all attached to "old" mappings.
  */
-static long wb_kupdated(struct bdi_writeback *wb)
-{
-	unsigned long oldest_jif;
-	long nr_to_write, wrote = 0;
-	struct writeback_control wbc = {
-		.bdi			= wb->bdi,
-		.sync_mode		= WB_SYNC_NONE,
-		.older_than_this	= &oldest_jif,
-		.nr_to_write		= 0,
-		.for_kupdate		= 1,
-		.range_cyclic		= 1,
-	};
-
-	oldest_jif = jiffies - msecs_to_jiffies(dirty_expire_interval * 10);
-
-	nr_to_write = global_page_state(NR_FILE_DIRTY) +
-			global_page_state(NR_UNSTABLE_NFS) +
-			(inodes_stat.nr_inodes - inodes_stat.nr_unused);
-
-	while (nr_to_write > 0) {
-		wbc.more_io = 0;
-		wbc.encountered_congestion = 0;
-		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
-		generic_sync_wb_inodes(wb, NULL, &wbc);
-		wrote += MAX_WRITEBACK_PAGES - wbc.nr_to_write;
-		if (wbc.nr_to_write > 0)
-			break;	/* All the old data is written */
-		nr_to_write -= MAX_WRITEBACK_PAGES;
-	}
-
-	return wrote;
-}
-
-static inline bool over_bground_thresh(void)
-{
-	unsigned long background_thresh, dirty_thresh;
-
-	get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);
-
-	return (global_page_state(NR_FILE_DIRTY) +
-		global_page_state(NR_UNSTABLE_NFS) >= background_thresh);
-}
-
-static long __wb_writeback(struct bdi_writeback *wb, long nr_pages,
-			   struct super_block *sb,
-			   enum writeback_sync_modes sync_mode)
+static long wb_writeback(struct bdi_writeback *wb, long nr_pages,
+			 struct super_block *sb,
+			 enum writeback_sync_modes sync_mode, int for_kupdate)
 {
 	struct writeback_control wbc = {
 		.bdi			= wb->bdi,
 		.sync_mode		= sync_mode,
 		.older_than_this	= NULL,
+		.for_kupdate		= for_kupdate,
 		.range_cyclic		= 1,
 	};
+	unsigned long oldest_jif;
 	long wrote = 0;
 
+	if (wbc.for_kupdate) {
+		wbc.older_than_this = &oldest_jif;
+		oldest_jif = jiffies -
+				msecs_to_jiffies(dirty_expire_interval * 10);
+	}
+
 	for (;;) {
 		if (sync_mode == WB_SYNC_NONE && nr_pages <= 0 &&
 		    !over_bground_thresh())
@@ -355,7 +330,7 @@ static long __wb_writeback(struct bdi_writeback *wb, long nr_pages,
 		 * If we ran out of stuff to write, bail unless more_io got set
 		 */
 		if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
-			if (wbc.more_io)
+			if (wbc.more_io && !wbc.for_kupdate)
 				continue;
 			break;
 		}
@@ -390,17 +365,18 @@ static struct bdi_work *get_next_work_item(struct backing_dev_info *bdi,
 /*
  * Retrieve work items and do the writeback they describe
  */
-static long wb_writeback(struct bdi_writeback *wb, int force_wait)
+long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
 {
 	struct backing_dev_info *bdi = wb->bdi;
 	struct bdi_work *work;
-	long wrote = 0;
+	long nr_pages, wrote = 0;
 
 	while ((work = get_next_work_item(bdi, wb)) != NULL) {
 		struct super_block *sb = bdi_work_sb(work);
-		long nr_pages = work->nr_pages;
 		enum writeback_sync_modes sync_mode;
 
+		nr_pages = work->nr_pages;
+
 		/*
 		 * Override sync mode, in case we must wait for completion
 		 */
@@ -416,7 +392,7 @@ static long wb_writeback(struct bdi_writeback *wb, int force_wait)
 		if (sync_mode == WB_SYNC_NONE)
 			wb_clear_pending(wb, work);
 
-		wrote += __wb_writeback(wb, nr_pages, sb, sync_mode);
+		wrote += wb_writeback(wb, nr_pages, sb, sync_mode, 0);
 
 		/*
 		 * This is a data integrity writeback, so only do the
@@ -426,31 +402,16 @@ static long wb_writeback(struct bdi_writeback *wb, int force_wait)
 			wb_clear_pending(wb, work);
 	}
 
-	return wrote;
-}
-
-/*
- * This will be inlined in bdi_writeback_task() once we get rid of any
- * dirty inodes on the default_backing_dev_info
- */
-long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
-{
-	long wrote;
-
 	/*
-	 * We get here in two cases:
-	 *
-	 *  schedule_timeout() returned because the dirty writeback
-	 *  interval has elapsed. If that happens, the work item list
-	 *  will be empty and we will proceed to do kupdated style writeout.
-	 *
-	 *  Someone called bdi_start_writeback(), which put one/more work
-	 *  items on the work_list. Process those.
+	 * Check for periodic writeback, kupdated() style
 	 */
-	if (list_empty(&wb->bdi->work_list))
-		wrote = wb_kupdated(wb);
-	else
-		wrote = wb_writeback(wb, force_wait);
+	if (!wrote) {
+		nr_pages = global_page_state(NR_FILE_DIRTY) +
+				global_page_state(NR_UNSTABLE_NFS) +
+				(inodes_stat.nr_inodes - inodes_stat.nr_unused);
+
+		wrote = wb_writeback(wb, nr_pages, NULL, WB_SYNC_NONE, 1);
+	}
 
 	return wrote;
 }

-- 
Jens Axboe



* Re: merging the per-bdi writeback patchset
  2009-06-23  8:55   ` Jens Axboe
  2009-06-23 10:28     ` KOSAKI Motohiro
@ 2009-06-23 15:01     ` Andrew Morton
  2009-06-24 10:04       ` Jens Axboe
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2009-06-23 15:01 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, Linux Kernel, hch

On Tue, 23 Jun 2009 10:55:05 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:

> On Tue, Jun 23 2009, Andrew Morton wrote:
> > On Tue, 23 Jun 2009 10:11:56 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > 
> > > Things are looking good for this patchset and it's been in -next for
> > > almost a week without any reports of problems. So I'd like to merge it
> > > for 2.6.31 if at all possible. Any objections?
> > 
> > erk.  I was rather expecting I'd have time to have a look at it all.
> 
> OK, we can wait if we have to, just trying to avoid having to keep this
> fresh for one full cycle. I have posted this patchset 11 times over the
> past months, though, so it's not like it's a new piece of work :-)

Yeah, sorry.  

> > It's unclear to me actually _why_ the performance changes which were
> > observed have actually occurred.  In fact it's a bit unclear (to me)
> > why the patchset was written and what it sets out to achieve :(
> 
> It started out trying to get rid of the pdflush uneven writeout. If you
> look at various pdflush intensive workloads, even on a single disk you
> often have 5 or more pdflush threads working the same device. It's just
> not optimal.

That's a bug, isn't it?  This

		/* Is another pdflush already flushing this queue? */
		if (current_is_pdflush() && !writeback_acquire(bdi))
			break;

isn't working.

> Another issue was starvation with request allocation. Given
> that pdflush does non-blocking writes (it has to, by design), pdflush
> can potentially be starved if someone else is working the device.

hm, true.  100% starved, or just "slowed down"?  The latter I trust -
otherwise there are still failure modes?

> > A long time ago the XFS guys (Dave Chinner iirc) said that XFS needs
> > more than one thread per device to keep the device saturated.  Did that
> > get addressed?
> 
> It supports up to 32 threads per device, but Chinner et al. have been
> silent. So the support is there and there's a
> super_operations->inode_get_wb() to map a dirty inode to a writeback
> device. Nobody is doing that yet though.

OK.

How many kernel threads do the 1000-spindle people end up with?


* Re: merging the per-bdi writeback patchset
  2009-06-23 15:01     ` Andrew Morton
@ 2009-06-24 10:04       ` Jens Axboe
  0 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2009-06-24 10:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, Linux Kernel, hch

On Tue, Jun 23 2009, Andrew Morton wrote:
> On Tue, 23 Jun 2009 10:55:05 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Tue, Jun 23 2009, Andrew Morton wrote:
> > > On Tue, 23 Jun 2009 10:11:56 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> > > 
> > > > Things are looking good for this patchset and it's been in -next for
> > > > almost a week without any reports of problems. So I'd like to merge it
> > > > for 2.6.31 if at all possible. Any objections?
> > > 
> > > erk.  I was rather expecting I'd have time to have a look at it all.
> > 
> > OK, we can wait if we have to, just trying to avoid having to keep this
> > fresh for one full cycle. I have posted this patchset 11 times over the
> > past months, though, so it's not like it's a new piece of work :-)
> 
> Yeah, sorry.  
> 
> > > It's unclear to me actually _why_ the performance changes which were
> > > observed have actually occurred.  In fact it's a bit unclear (to me)
> > > why the patchset was written and what it sets out to achieve :(
> > 
> > It started out trying to get rid of the pdflush uneven writeout. If you
> > look at various pdflush intensive workloads, even on a single disk you
> > often have 5 or more pdflush threads working the same device. It's just
> > not optimal.
> 
> That's a bug, isn't it?  This
> 
> 		/* Is another pdflush already flushing this queue? */
> 		if (current_is_pdflush() && !writeback_acquire(bdi))
> 			break;
> 
> isn't working.

But that's on a per-inode basis. I didn't look further into the problem
to be honest, just noticed that you very quickly get a handful of
pdflush threads ticking along.

> > Another issue was starvation with request allocation. Given
> > that pdflush does non-blocking writes (it has to, by design), pdflush
> > can potentially be starved if someone else is working the device.
> 
> hm, true.  100% starved, or just "slowed down"?  The latter I trust -
> otherwise there are still failure modes?

Just slowed down; I suspect this is where the lumpiness comes from as
well. At least that is what I have seen, though in theory you could
starve the pdflush thread indefinitely.

> > > A long time ago the XFS guys (Dave Chinner iirc) said that XFS needs
> > > more than one thread per device to keep the device saturated.  Did that
> > > get addressed?
> > 
> > It supports up to 32 threads per device, but Chinner et al. have been
> > silent. So the support is there and there's a
> > super_operations->inode_get_wb() to map a dirty inode to a writeback
> > device. Nobody is doing that yet though.
> 
> OK.
> 
> How many kernel threads do the 1000-spindle people end up with?

If all 1000 spindles are exposed and flushing dirty data, you get 1000
threads. Realistically, you'll likely use some sort of dm/md frontend
though. And then you only get 1 thread per dm/md device.

-- 
Jens Axboe


