* [PATCH] bump up nr_to_write in xfs_vm_writepage
@ 2009-07-02 21:29 ` Eric Sandeen
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Sandeen @ 2009-07-02 21:29 UTC (permalink / raw)
  To: xfs mailing list; +Cc: Christoph Hellwig, linux-mm, MASON, CHRISTOPHER

Talking w/ someone who had a raid6 of 15 drives on an areca
controller, he wondered why he could only get 300MB/s or so
out of a streaming buffered write to xfs like so:

dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
10737418240 bytes (11 GB) copied, 34.294 s, 313 MB/s

when the same write directly to the device was going closer
to 700MB/s...

With the following change things get moving again for xfs:

dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
10737418240 bytes (11 GB) copied, 16.2938 s, 659 MB/s

Chris had sent out something similar at Christoph's suggestion;
Christoph reminded me of it, I tested a variant of it, and it
seems to help shockingly well.

Feels like a band-aid, though; thoughts?  Other tests to do?

Thanks,
-Eric

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Cc: Chris Mason <chris.mason@oracle.com>
---

Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
@@ -1268,6 +1268,13 @@ xfs_vm_writepage(
 	if (!page_has_buffers(page))
 		create_empty_buffers(page, 1 << inode->i_blkbits, 0);
 
+
+	/*
+	 *  VM calculation for nr_to_write seems off.  Bump it way
+	 *  up, this gets simple streaming writes zippy again.
+	 */
+	wbc->nr_to_write *= 4;
+
 	/*
 	 * Convert delayed allocate, unwritten or unmapped space
 	 * to real space and flush out to disk.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 29+ messages in thread


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-02 21:29 ` Eric Sandeen
@ 2009-07-03 23:51 ` Michael Monnerie
  -1 siblings, 0 replies; 29+ messages in thread
From: Michael Monnerie @ 2009-07-03 23:51 UTC (permalink / raw)
  To: xfs

On Thursday, 02 July 2009, Eric Sandeen wrote:
> With the following change things get moving again for xfs:

Amazing, more than double the speed with a one-liner. Do you have more 
such lines? ;-)

> +	/*
> +	 *  VM calculation for nr_to_write seems off.  Bump it way
> +	 *  up, this gets simple streaming writes zippy again.
> +	 */
> +	wbc->nr_to_write *= 4;

Could this be helpful here also? I've just transferred a copy of a 
directory from our server to a Linux desktop. Nothing else was running, 
just an rsync from server to client, where the client has a Seagate 1TB 
ES.2 SATA disk, which can do about 80MB/s on large writes. But it did 
this, measured on large files (>20MB each, no small files):

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0,00   584,00    0,00  368,00     0,00  7448,00    40,48   148,64  401,40   2,72 100,00

Consistently around 300+ IOPS, which is OK, but only 7-10MB/s? That 
can't be true. Then I killed the rsync process on the server, and the 
writes on the client jumped up:

sdb               0,00  4543,40    0,00  333,40     0,00 44965,60   269,74   144,66  384,98   3,00 100,00

45MB/s is OK. I investigated a bit further: it seems the /proc/sys/vm 
values are strange. The client's kernel is:
# uname -a
Linux saturn 2.6.30-ZMI #1 SMP PREEMPT Wed Jun 10 20:07:31 CEST 2009 
x86_64 x86_64 x86_64 GNU/Linux

This makes rsync slow:
cat /proc/sys/vm/dirty_*       
0                                       
5                                       
0                                       
8000                                    
50                                      
100                    

This is fast:
cat /proc/sys/vm/dirty_*
16123456
0
524123456
8000
0
100

This seems more like kernel-related stuff, but do others see the same 
thing?

So, I'm really off on a one-week vacation now, have fun!

Best regards, zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-02 21:29 ` Eric Sandeen
@ 2009-07-07  9:07   ` Olaf Weber
  -1 siblings, 0 replies; 29+ messages in thread
From: Olaf Weber @ 2009-07-07  9:07 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Christoph Hellwig, linux-mm, MASON, CHRISTOPHER, xfs mailing list

Eric Sandeen writes:

> Talking w/ someone who had a raid6 of 15 drives on an areca
> controller, he wondered why he could only get 300MB/s or so
> out of a streaming buffered write to xfs like so:

> dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
> 10737418240 bytes (11 GB) copied, 34.294 s, 313 MB/s

> when the same write directly to the device was going closer
> to 700MB/s...

> With the following change things get moving again for xfs:

> dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
> 10737418240 bytes (11 GB) copied, 16.2938 s, 659 MB/s

> Chris had sent out something similar at Christoph's suggestion,
> and Christoph reminded me of it, and I tested it a variant of
> it, and it seems to help shockingly well.

> Feels like a bandaid though; thoughts?  Other tests to do?

If the nr_to_write calculation really yields a value that is too
small, shouldn't it be fixed elsewhere?

Otherwise it might make sense to make the fudge factor tunable.

> +
> +	/*
> +	 *  VM calculation for nr_to_write seems off.  Bump it way
> +	 *  up, this gets simple streaming writes zippy again.
> +	 */
> +	wbc->nr_to_write *= 4;
> +

-- 
Olaf Weber                 SGI               Phone:  +31(0)30-6696752
                           Veldzigt 2b       Fax:    +31(0)30-6696799
Technical Lead             3454 PW de Meern  Vnet:   955-7151
Storage Software           The Netherlands   Email:  olaf@sgi.com


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07  9:07   ` Olaf Weber
@ 2009-07-07 10:19     ` Christoph Hellwig
  -1 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2009-07-07 10:19 UTC (permalink / raw)
  To: Olaf Weber
  Cc: Christoph Hellwig, Eric Sandeen, linux-mm, MASON, CHRISTOPHER,
	xfs mailing list

On Tue, Jul 07, 2009 at 11:07:30AM +0200, Olaf Weber wrote:
> If the nr_to_write calculation really yields a value that is too
> small, shouldn't it be fixed elsewhere?

In theory it should.  But given the amazing feedback of the VM people
on this I'd rather make sure we do get the full HW bandwidth on large
arrays instead of sucking badly and not just wait forever.


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 10:19     ` Christoph Hellwig
@ 2009-07-07 10:33       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 29+ messages in thread
From: KOSAKI Motohiro @ 2009-07-07 10:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Sandeen, kosaki.motohiro, xfs mailing list, linux-mm,
	Olaf Weber, MASON, CHRISTOPHER

> On Tue, Jul 07, 2009 at 11:07:30AM +0200, Olaf Weber wrote:
> > If the nr_to_write calculation really yields a value that is too
> > small, shouldn't it be fixed elsewhere?
> 
> In theory it should.  But given the amazing feedback of the VM people
> on this I'd rather make sure we do get the full HW bandwidth on large
> arrays instead of sucking badly and not just wait forever.

At least, I agree with Olaf. If you got someone's NAK in a past thread,
could you please tell me its URL?





* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 10:33       ` KOSAKI Motohiro
@ 2009-07-07 10:44         ` Christoph Hellwig
  -1 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2009-07-07 10:44 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Eric Sandeen, xfs mailing list, Christoph Hellwig, linux-mm,
	Olaf Weber, MASON, CHRISTOPHER

On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> At least, I agree with Olaf. If you got someone's NAK in a past thread,
> could you please tell me its URL?

The previous thread was simply dead-ended and nothing happened.


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 10:19     ` Christoph Hellwig
@ 2009-07-07 11:37       ` Olaf Weber
  -1 siblings, 0 replies; 29+ messages in thread
From: Olaf Weber @ 2009-07-07 11:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Sandeen, xfs mailing list, MASON, CHRISTOPHER, linux-mm

Christoph Hellwig writes:
> On Tue, Jul 07, 2009 at 11:07:30AM +0200, Olaf Weber wrote:

>> If the nr_to_write calculation really yields a value that is too
>> small, shouldn't it be fixed elsewhere?

> In theory it should.  But given the amazing feedback of the VM people
> on this I'd rather make sure we do get the full HW bandwidth on large
> arrays instead of sucking badly and not just wait forever.

So how do you feel about making the fudge factor tunable?  I don't
have a good sense myself of what the value should be, whether the
hard-coded 4 is good enough in general.

-- 
Olaf Weber                 SGI               Phone:  +31(0)30-6696752
                           Veldzigt 2b       Fax:    +31(0)30-6696799
Technical Lead             3454 PW de Meern  Vnet:   955-7151
Storage Software           The Netherlands   Email:  olaf@sgi.com


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 11:37       ` Olaf Weber
@ 2009-07-07 14:46         ` Christoph Hellwig
  -1 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2009-07-07 14:46 UTC (permalink / raw)
  To: Olaf Weber
  Cc: Christoph Hellwig, Eric Sandeen, xfs mailing list, MASON,
	CHRISTOPHER, linux-mm

On Tue, Jul 07, 2009 at 01:37:05PM +0200, Olaf Weber wrote:
> > In theory it should.  But given the amazing feedback of the VM people
> > on this I'd rather make sure we do get the full HW bandwidth on large
> > arrays instead of sucking badly and not just wait forever.
> 
> So how do you feel about making the fudge factor tunable?  I don't
> have a good sense myself of what the value should be, whether the
> hard-coded 4 is good enough in general.

A tunable means exposing an ABI, which I'd rather not do for a hack like
this.  If you don't like the number feel free to experiment around with
it, SGI should have enough large systems that can be used to test this
out.


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-02 21:29 ` Eric Sandeen
@ 2009-07-07 15:17   ` Chris Mason
  -1 siblings, 0 replies; 29+ messages in thread
From: Chris Mason @ 2009-07-07 15:17 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Christoph Hellwig, linux-mm, jens.axboe, xfs mailing list

On Thu, Jul 02, 2009 at 04:29:41PM -0500, Eric Sandeen wrote:
> Talking w/ someone who had a raid6 of 15 drives on an areca
> controller, he wondered why he could only get 300MB/s or so
> out of a streaming buffered write to xfs like so:
> 
> dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
> 10737418240 bytes (11 GB) copied, 34.294 s, 313 MB/s

I did some quick tests and found some unhappy things ;)  On my 5 drive
sata array (configured via LVM in a stripeset), dd with O_DIRECT to the
block device can stream writes at a healthy 550MB/s.

On 2.6.30, XFS does O_DIRECT at the exact same 550MB/s, and buffered
writes at 370MB/s.  Btrfs does a little better on buffered and a little
worse on O_DIRECT.  Ext4 splits the difference and does 400MB/s on both
buffered and O_DIRECT.

2.6.31-rc2 gave similar results.  One thing I noticed was that pdflush
and friends aren't using the right flag in congestion_wait after it was
updated to do congestion based on sync/async instead of read/write.  I'm
always happy when I get to blame bugs on Jens, but fixing the congestion
flag usage actually made the runs slower (he still promises to send a
patch for the congestion).

A little while ago, Jan Kara sent seekwatcher changes that let it graph
per-process info about IO submission, so I cooked up a graph of the IO
done by pdflush, dd, and others during an XFS buffered streaming write.

http://oss.oracle.com/~mason/seekwatcher/xfs-dd-2.6.30.png

The dark blue dots are dd doing writes and the light green dots are
pdflush.  The graph shows that pdflush spends almost the entire run
sitting around doing nothing, and sysrq-w shows all the pdflush threads
waiting around in congestion_wait.

Just to make sure the graphing wasn't hiding work done by pdflush, I
filtered out all the dd IO:

http://oss.oracle.com/~mason/seekwatcher/xfs-dd-2.6.30-filtered.png

With all of this in mind, I think the reason why the nr_to_write change
is helping is because dd is doing all the IO during balance_dirty_pages,
and the higher nr_to_write number is making sure that more IO goes out
at a time.

Once dd starts doing IO in balance_dirty_pages, our queues get
congested.  From that moment on, the bdi_congested checks in the
writeback path make pdflush sit down.  I doubt the queue every really
leaves congestion because we get over the dirty high water mark and dd
is jumping in and sending IO down the pipe without waiting for
congestion to clear.

sysrq-w supports this.  dd is always in get_request_wait and pdflush is
always in congestion_wait.

This bad interaction between pdflush and congestion was one of the
motivations for Jens' new writeback work, so I was really hoping to git
pull and post a fantastic new benchmark result.  With Jens' code the
graph ends up completely inverted, with roughly the same performance.

Instead of dd doing all the work, the flusher thread is doing all the
work (hooray!) and dd is almost always in congestion_wait (boo).  I
think the cause is a little different, it seems that with Jens' code, dd
finds the flusher thread has the inode locked, and so
balance_dirty_pages doesn't find any work to do.  It waits on
congestion_wait().

If I replace the balance_dirty_pages() congestion_wait() with
schedule_timeout(1) in Jens' writeback branch, xfs buffered writes go
from 370MB/s to 520MB/s.  There are still some big peaks and valleys,
but it at least shows where we need to think harder about congestion
flags, IO waiting and other issues.

All of this is a long way of saying that until Jens' new code goes in
(with additional tuning), the nr_to_write change makes sense to me.  I
don't see a 2.6.31-suitable way to tune things without his work.

-chris


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 10:44         ` Christoph Hellwig
@ 2009-07-09  2:04           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 29+ messages in thread
From: KOSAKI Motohiro @ 2009-07-09  2:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Sandeen, kosaki.motohiro, xfs mailing list, linux-mm,
	Olaf Weber, MASON, CHRISTOPHER

> On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> > At least, I agree with Olaf. If you got someone's NAK in a past thread,
> > could you please tell me its URL?
> 
> The previous thread was simply dead-ended and nothing happened.
> 

Can you remember that thread's subject? Sorry, I can't remember it.




* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-09  2:04           ` KOSAKI Motohiro
@ 2009-07-09 13:01             ` Chris Mason
  -1 siblings, 0 replies; 29+ messages in thread
From: Chris Mason @ 2009-07-09 13:01 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Hellwig, Eric Sandeen, linux-mm, Olaf Weber, xfs mailing list

On Thu, Jul 09, 2009 at 11:04:32AM +0900, KOSAKI Motohiro wrote:
> > On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> > > At least, I agree with Olaf. if you got someone's NAK in past thread,
> > > Could you please tell me its url?
> > 
> > The previous thread was simply dead-ended and nothing happened.
> > 
> 
> Can you remember this thread subject? sorry, I haven't remember it.

This is the original thread; it did lead to a few different patches
going in, but the nr_to_write change wasn't one of them.

http://kerneltrap.org/mailarchive/linux-kernel/2008/10/1/3472704/thread

-chris


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-09 13:01             ` Chris Mason
@ 2009-07-10  7:12               ` KOSAKI Motohiro
  -1 siblings, 0 replies; 29+ messages in thread
From: KOSAKI Motohiro @ 2009-07-10  7:12 UTC (permalink / raw)
  To: Chris Mason
  Cc: Eric Sandeen, kosaki.motohiro, xfs mailing list,
	Christoph Hellwig, linux-mm, Olaf Weber

> On Thu, Jul 09, 2009 at 11:04:32AM +0900, KOSAKI Motohiro wrote:
> > > On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> > > > At least, I agree with Olaf. if you got someone's NAK in past thread,
> > > > Could you please tell me its url?
> > > 
> > > The previous thread was simply dead-ended and nothing happened.
> > > 
> > 
> > Can you remember this thread subject? sorry, I haven't remember it.
> 
> This is the original thread, it did lead to a few different patches
> going in, but the nr_to_write change wasn't one of them.
> 
> http://kerneltrap.org/mailarchive/linux-kernel/2008/10/1/3472704/thread

Thanks, good pointer. This thread has multiple interesting discussions.

1. Making ext4_write_cache_pages() or modifying write_cache_pages()

I think this is Christoph's homework. He said:

> I agree.  But I'm still not quite sure if that requirement is unique to
> ext4 anyway.  Give me some time to dive into the writeback code again,
> haven't been there for quite a while.

If he says modifying write_cache_pages() is necessary, I'd like to review it.


2. Is the current mapping->writeback_index updating improper?

I'm not sure which solution is better, but I think your first proposal is
acceptable enough.


3. Is the current wbc->nr_to_write value improper?

The current writeback_set_ratelimit() doesn't permit ratelimit_pages to exceed
4MB, but that restriction is too low nowadays.
(That's my understanding; is that right?)

=======================================================
void writeback_set_ratelimit(void)
{
        ratelimit_pages = vm_total_pages / (num_online_cpus() * 32);
        if (ratelimit_pages < 16)
                ratelimit_pages = 16;
        if (ratelimit_pages * PAGE_CACHE_SIZE > 4096 * 1024)
                ratelimit_pages = (4096 * 1024) / PAGE_CACHE_SIZE;
}
=======================================================

Yes, 4MB is a pretty magical constant. We have three choices:
  A. Simply remove the magical 4MB constant (a bit dangerous)
  B. Derive the upper bound from I/O capability
  C. Introduce a new /proc knob (as Olaf proposed)


My personal preference is B or C.




* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-10  7:12               ` KOSAKI Motohiro
@ 2009-07-24  5:20                 ` Felix Blyakher
  -1 siblings, 0 replies; 29+ messages in thread
From: Felix Blyakher @ 2009-07-24  5:20 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Eric Sandeen, xfs mailing list, Christoph Hellwig, linux-mm,
	Olaf Weber, Chris Mason


On Jul 10, 2009, at 2:12 AM, KOSAKI Motohiro wrote:

>> On Thu, Jul 09, 2009 at 11:04:32AM +0900, KOSAKI Motohiro wrote:
>>>> On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
>>>>> At least, I agree with Olaf. if you got someone's NAK in past  
>>>>> thread,
>>>>> Could you please tell me its url?
>>>>
>>>> The previous thread was simply dead-ended and nothing happened.
>>>>
>>>
>>> Can you remember this thread subject? sorry, I haven't remember it.
>>
>> This is the original thread, it did lead to a few different patches
>> going in, but the nr_to_write change wasn't one of them.
>>
>> http://kerneltrap.org/mailarchive/linux-kernel/2008/10/1/3472704/thread
>
> Thanks good pointer. This thread have multiple interesting discussion.
>
> 1. making ext4_write_cache_pages() or modifying write_cache_pages()
>
> I think this is Christoph's homework. he said
>
>> I agree.  But I'm still not quite sure if that requirement is  
>> unique to
>> ext4 anyway.  Give me some time to dive into the writeback code  
>> again,
>> haven't been there for quite a while.
>
> if he says modifying write_cache_pages() is necessary, I'd like to  
> review it.
>
>
> 2. Current mapping->writeback_index updating is not proper?
>
> I'm not sure which solution is better. but I think your first  
> proposal is
> enough acceptable.
>
>
> 3. Current wbc->nr_to_write value is not proper?
>
> Current writeback_set_ratelimit() doesn't permit that  
> ratelimit_pages exceed
> 4M byte. but it is too low restriction for nowadays.
> (that's my understand. right?)
>
> =======================================================
> void writeback_set_ratelimit(void)
> {
>        ratelimit_pages = vm_total_pages / (num_online_cpus() * 32);
>        if (ratelimit_pages < 16)
>                ratelimit_pages = 16;
>        if (ratelimit_pages * PAGE_CACHE_SIZE > 4096 * 1024)
>                ratelimit_pages = (4096 * 1024) / PAGE_CACHE_SIZE;
> }
> =======================================================
>
> Yes, 4M bytes are pretty magical constant. We have three choice
>  A. Remove magical 4M constant simple (a bit danger)

That will be outside of XFS, and it seems there is not much interest
from the mm people.

>  B. Decide high border from IO capability

It's not clear to me how to calculate that upper bound, but again
it's outside the scope of XFS, and we don't have much control here.

>  C. Introduce new /proc knob (as Olaf proposed)

We at least need to experiment with different numbers, and adding a
knob (an XFS tunable) would be one way to do it. Also, different
configurations may need different nr_to_write values.

Either way it seems hackish, but with the knob there is at least
some control over it.

Felix


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-24  5:20                 ` Felix Blyakher
@ 2009-07-24  5:33                   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 29+ messages in thread
From: KOSAKI Motohiro @ 2009-07-24  5:33 UTC (permalink / raw)
  To: Felix Blyakher
  Cc: Eric Sandeen, Olaf Weber, xfs mailing list, Christoph Hellwig,
	linux-mm, kosaki.motohiro, Chris Mason

> 
> On Jul 10, 2009, at 2:12 AM, KOSAKI Motohiro wrote:
> 
> >> On Thu, Jul 09, 2009 at 11:04:32AM +0900, KOSAKI Motohiro wrote:
> >>>> On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> >>>>> At least, I agree with Olaf. if you got someone's NAK in past  
> >>>>> thread,
> >>>>> Could you please tell me its url?
> >>>>
> >>>> The previous thread was simply dead-ended and nothing happened.
> >>>>
> >>>
> >>> Can you remember this thread subject? sorry, I haven't remember it.
> >>
> >> This is the original thread, it did lead to a few different patches
> >> going in, but the nr_to_write change wasn't one of them.
> >>
> >> http://kerneltrap.org/mailarchive/linux-kernel/2008/10/1/3472704/thread
> >
> > Thanks good pointer. This thread have multiple interesting discussion.
> >
> > 1. making ext4_write_cache_pages() or modifying write_cache_pages()
> >
> > I think this is Christoph's homework. he said
> >
> >> I agree.  But I'm still not quite sure if that requirement is  
> >> unique to
> >> ext4 anyway.  Give me some time to dive into the writeback code  
> >> again,
> >> haven't been there for quite a while.
> >
> > if he says modifying write_cache_pages() is necessary, I'd like to  
> > review it.
> >
> >
> > 2. Current mapping->writeback_index updating is not proper?
> >
> > I'm not sure which solution is better. but I think your first  
> > proposal is
> > enough acceptable.
> >
> >
> > 3. Current wbc->nr_to_write value is not proper?
> >
> > Current writeback_set_ratelimit() doesn't permit that  
> > ratelimit_pages exceed
> > 4M byte. but it is too low restriction for nowadays.
> > (that's my understand. right?)
> >
> > =======================================================
> > void writeback_set_ratelimit(void)
> > {
> >        ratelimit_pages = vm_total_pages / (num_online_cpus() * 32);
> >        if (ratelimit_pages < 16)
> >                ratelimit_pages = 16;
> >        if (ratelimit_pages * PAGE_CACHE_SIZE > 4096 * 1024)
> >                ratelimit_pages = (4096 * 1024) / PAGE_CACHE_SIZE;
> > }
> > =======================================================
> >
> > Yes, 4M bytes are pretty magical constant. We have three choice
> >  A. Remove magical 4M constant simple (a bit danger)
> 
> That's will be outside the xfs, and seems like there is no much interest
> from mm people.

That's OK; you can join the mm people :)



> >  B. Decide high border from IO capability
> 
> It's not clear to me how to calculate that high border, but again
> it's outside of the xfs scope, and we don't have much control here.
> 
> >  C. Introduce new /proc knob (as Olaf proposed)
> 
> We need at least to play with different numbers, and putting the
> knob (xfs tunable) would be one way to do it. Also, different
> configurations may need different nr_to_write value.
> 
> In either way it seems hackish, but with the knob at least there is
> some control of it.



* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-24  5:20                 ` Felix Blyakher
@ 2009-07-24 12:05                   ` Chris Mason
  -1 siblings, 0 replies; 29+ messages in thread
From: Chris Mason @ 2009-07-24 12:05 UTC (permalink / raw)
  To: Felix Blyakher
  Cc: Eric Sandeen, KOSAKI Motohiro, xfs mailing list,
	Christoph Hellwig, linux-mm, Olaf Weber

On Fri, Jul 24, 2009 at 12:20:32AM -0500, Felix Blyakher wrote:
>
> On Jul 10, 2009, at 2:12 AM, KOSAKI Motohiro wrote:
>> 3. Current wbc->nr_to_write value is not proper?
>>
>> Current writeback_set_ratelimit() doesn't permit that ratelimit_pages 
>> exceed
>> 4M byte. but it is too low restriction for nowadays.
>> (that's my understand. right?)
>>
>> =======================================================
>> void writeback_set_ratelimit(void)
>> {
>>        ratelimit_pages = vm_total_pages / (num_online_cpus() * 32);
>>        if (ratelimit_pages < 16)
>>                ratelimit_pages = 16;
>>        if (ratelimit_pages * PAGE_CACHE_SIZE > 4096 * 1024)
>>                ratelimit_pages = (4096 * 1024) / PAGE_CACHE_SIZE;
>> }
>> =======================================================
>>
>> Yes, 4M bytes are pretty magical constant. We have three choice
>>  A. Remove magical 4M constant simple (a bit danger)
>
> That's will be outside the xfs, and seems like there is no much interest
> from mm people.
>
>>  B. Decide high border from IO capability

It is worth pointing out that Jens Axboe is planning more
feedback-controlled knobs as part of the pdflush rework.

-chris


end of thread, other threads:[~2009-07-24 12:05 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-07-02 21:29 [PATCH] bump up nr_to_write in xfs_vm_writepage Eric Sandeen
2009-07-02 21:29 ` Eric Sandeen
2009-07-03 23:51 ` Michael Monnerie
2009-07-07  9:07 ` Olaf Weber
2009-07-07  9:07   ` Olaf Weber
2009-07-07 10:19   ` Christoph Hellwig
2009-07-07 10:19     ` Christoph Hellwig
2009-07-07 10:33     ` KOSAKI Motohiro
2009-07-07 10:33       ` KOSAKI Motohiro
2009-07-07 10:44       ` Christoph Hellwig
2009-07-07 10:44         ` Christoph Hellwig
2009-07-09  2:04         ` KOSAKI Motohiro
2009-07-09  2:04           ` KOSAKI Motohiro
2009-07-09 13:01           ` Chris Mason
2009-07-09 13:01             ` Chris Mason
2009-07-10  7:12             ` KOSAKI Motohiro
2009-07-10  7:12               ` KOSAKI Motohiro
2009-07-24  5:20               ` Felix Blyakher
2009-07-24  5:20                 ` Felix Blyakher
2009-07-24  5:33                 ` KOSAKI Motohiro
2009-07-24  5:33                   ` KOSAKI Motohiro
2009-07-24 12:05                 ` Chris Mason
2009-07-24 12:05                   ` Chris Mason
2009-07-07 11:37     ` Olaf Weber
2009-07-07 11:37       ` Olaf Weber
2009-07-07 14:46       ` Christoph Hellwig
2009-07-07 14:46         ` Christoph Hellwig
2009-07-07 15:17 ` Chris Mason
2009-07-07 15:17   ` Chris Mason

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.