[PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer)

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer)
@ 2006-05-23 20:14 Hans Reiser
  2006-05-23 20:26 ` Alexey Polyakov
  2006-05-24 17:53 ` Tom Vier
  0 siblings, 2 replies; 37+ messages in thread
From: Hans Reiser @ 2006-05-23 20:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Kernel Mailing List, Reiserfs developers mail-list,
	Reiserfs mail-list, Nate Diller

ftp://ftp.namesys.com/pub/reiser4-for-2.6/2.6.17-rc4-mm1/reiser4-for-2.6.17-rc4-mm1-2.patch.gz

The referenced patch replaces all reiser4 patches to mm.  It revises the
existing reiser4 code to do a good job for writes that are larger than
4k at a time by assiduously adhering to the principle that things that
need to be done once per write should be done once per write, not once
per 4k.  That statement is a slight simplification: there are times when
due to the limited size of RAM you want to do some things once per
WRITE_GRANULARITY, where WRITE_GRANULARITY is a #define that defines
some moderate number of pages to write at once.  This code empirically
proves that the generic code design which passes 4k at a time to the
underlying FS can be improved.  Performance results show that the new
code consumes  40% less CPU when doing "dd bs=1MB ....." (your hardware,
and whether the data is in cache, may vary this result).  Note that this
has only a small effect on elapsed time for most hardware.

The planned future(as discussed with akpm previously):  we will ship
very soon (testing it now) an improved reiser4 read code that does reads
in more than little 4k chunks.  Then we will revise the generic code to
allow an FS to receive the writes and reads in whole increments.  How
best to revise the generic code is still being discussed.  Nate is
discussing doing it in some way that improves code symmetry in the io
scheduler layer as well, if there is interest by others in it maybe a
thread can start on that topic, or maybe it can wait for him+zam to make
a patch.

Note for users: this patch also contains numerous important bug fixes.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer)
  2006-05-23 20:14 [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer) Hans Reiser
@ 2006-05-23 20:26 ` Alexey Polyakov
  2006-05-23 20:33   ` Michal Piotrowski
  2006-05-24 17:53 ` Tom Vier
  1 sibling, 1 reply; 37+ messages in thread
From: Alexey Polyakov @ 2006-05-23 20:26 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Andrew Morton, Linux Kernel Mailing List,
	Reiserfs developers mail-list, Reiserfs mail-list, Nate Diller

Hi!

I'm actively using Reiser4 on a production servers (and I know a lot
of people that do that too).
Could you please release the patch against the vanilla tree?
I don't think there's a lot of people that will test -mm version,
especially on production servers - -mm is a little bit too unstable.

Thanks.


On 5/24/06, Hans Reiser <reiser@namesys.com> wrote:
> ftp://ftp.namesys.com/pub/reiser4-for-2.6/2.6.17-rc4-mm1/reiser4-for-2.6.17-rc4-mm1-2.patch.gz
>
> The referenced patch replaces all reiser4 patches to mm.  It revises the
> existing reiser4 code to do a good job for writes that are larger than
> 4k at a time by assiduously adhering to the principle that things that
> need to be done once per write should be done once per write, not once
> per 4k.  That statement is a slight simplification: there are times when
> due to the limited size of RAM you want to do some things once per
> WRITE_GRANULARITY, where WRITE_GRANULARITY is a #define that defines
> some moderate number of pages to write at once.  This code empirically
> proves that the generic code design which passes 4k at a time to the
> underlying FS can be improved.  Performance results show that the new
> code consumes  40% less CPU when doing "dd bs=1MB ....." (your hardware,
> and whether the data is in cache, may vary this result).  Note that this
> has only a small effect on elapsed time for most hardware.
>
> The planned future(as discussed with akpm previously):  we will ship
> very soon (testing it now) an improved reiser4 read code that does reads
> in more than little 4k chunks.  Then we will revise the generic code to
> allow an FS to receive the writes and reads in whole increments.  How
> best to revise the generic code is still being discussed.  Nate is
> discussing doing it in some way that improves code symmetry in the io
> scheduler layer as well, if there is interest by others in it maybe a
> thread can start on that topic, or maybe it can wait for him+zam to make
> a patch.
>
> Note for users: this patch also contains numerous important bug fixes.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


-- 
Alexey Polyakov

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer)
  2006-05-23 20:26 ` Alexey Polyakov
@ 2006-05-23 20:33   ` Michal Piotrowski
  2006-05-24 14:39     ` Vladimir V. Saveliev
  0 siblings, 1 reply; 37+ messages in thread
From: Michal Piotrowski @ 2006-05-23 20:33 UTC (permalink / raw)
  To: Alexey Polyakov
  Cc: Hans Reiser, Andrew Morton, Linux Kernel Mailing List,
	Reiserfs developers mail-list, Reiserfs mail-list, Nate Diller

Hi Hans,

On 23/05/06, Alexey Polyakov <alexey.polyakov@gmail.com> wrote:
> Hi!
>
> I'm actively using Reiser4 on a production servers (and I know a lot
> of people that do that too).
> Could you please release the patch against the vanilla tree?
> I don't think there's a lot of people that will test -mm version,
> especially on production servers - -mm is a little bit too unstable.

Any chance to get this patch against 2.6.17-rc4-mm3?

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer)
  2006-05-23 20:33   ` Michal Piotrowski
@ 2006-05-24 14:39     ` Vladimir V. Saveliev
  2006-06-08 10:45       ` Jan Engelhardt
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir V. Saveliev @ 2006-05-24 14:39 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: Alexey Polyakov, Hans Reiser, Andrew Morton,
	Linux Kernel Mailing List, Reiserfs developers mail-list,
	Reiserfs mail-list, Nate Diller

Hello

On Tue, 2006-05-23 at 22:33 +0200, Michal Piotrowski wrote:
> Hi Hans,
> 
> On 23/05/06, Alexey Polyakov <alexey.polyakov@gmail.com> wrote:
> > Hi!
> >
> > I'm actively using Reiser4 on a production servers (and I know a lot
> > of people that do that too).
> > Could you please release the patch against the vanilla tree?
> > I don't think there's a lot of people that will test -mm version,
> > especially on production servers - -mm is a little bit too unstable.
> 
> Any chance to get this patch against 2.6.17-rc4-mm3?
> 

yes, reiser4 updates for latest stock and mm kernels will be out in one
or two days

> Regards,
> Michal
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer)
  2006-05-23 20:14 [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer) Hans Reiser
  2006-05-23 20:26 ` Alexey Polyakov
@ 2006-05-24 17:53 ` Tom Vier
  2006-05-24 17:55   ` Hans Reiser
  2006-05-24 17:59   ` [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer) Hans Reiser
  1 sibling, 2 replies; 37+ messages in thread
From: Tom Vier @ 2006-05-24 17:53 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Linux-Kernel, Reiserfs developers mail-list, Reiserfs mail-list

On Tue, May 23, 2006 at 01:14:54PM -0700, Hans Reiser wrote:
> underlying FS can be improved.  Performance results show that the new
> code consumes  40% less CPU when doing "dd bs=1MB ....." (your hardware,
> and whether the data is in cache, may vary this result).  Note that this
> has only a small effect on elapsed time for most hardware.

Write requests in linux are restricted to one page?

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer)
  2006-05-24 17:53 ` Tom Vier
@ 2006-05-24 17:55   ` Hans Reiser
  2006-06-08 11:00     ` Jens Axboe
  2006-05-24 17:59   ` [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer) Hans Reiser
  1 sibling, 1 reply; 37+ messages in thread
From: Hans Reiser @ 2006-05-24 17:55 UTC (permalink / raw)
  To: Tom Vier; +Cc: Linux-Kernel, Reiserfs developers mail-list, Reiserfs mail-list

Tom Vier wrote:

>On Tue, May 23, 2006 at 01:14:54PM -0700, Hans Reiser wrote:
>  
>
>>underlying FS can be improved.  Performance results show that the new
>>code consumes  40% less CPU when doing "dd bs=1MB ....." (your hardware,
>>and whether the data is in cache, may vary this result).  Note that this
>>has only a small effect on elapsed time for most hardware.
>>    
>>
>
>Write requests in linux are restricted to one page?
>
>  
>
It may go to the kernel as a 64MB write, but VFS sends it to the FS as
64MB/4k separate 4k writes.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer)
  2006-05-24 17:53 ` Tom Vier
  2006-05-24 17:55   ` Hans Reiser
@ 2006-05-24 17:59   ` Hans Reiser
  1 sibling, 0 replies; 37+ messages in thread
From: Hans Reiser @ 2006-05-24 17:59 UTC (permalink / raw)
  To: Tom Vier
  Cc: Linux-Kernel, Reiserfs developers mail-list, Reiserfs mail-list,
	Nate Diller

I should add, you execute a lot more than 4k worth of instructions for
each of these 4k writes, thus performance is non-optimal.  This is why
bios exist in the kernel, because the io layer has a similar problem
when you send it only 4k at a time (of executing a lot more than 4k of
instructions per io submission).The way the io layer handles bios is not
as clean as it could be though, Nate can say more on that.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer)
  2006-05-24 14:39     ` Vladimir V. Saveliev
@ 2006-06-08 10:45       ` Jan Engelhardt
  2006-06-08 12:40         ` Vladimir V. Saveliev
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Engelhardt @ 2006-06-08 10:45 UTC (permalink / raw)
  To: Vladimir V. Saveliev
  Cc: Michal Piotrowski, Alexey Polyakov, Hans Reiser, Andrew Morton,
	Linux Kernel Mailing List, Reiserfs developers mail-list,
	Reiserfs mail-list, Nate Diller

>> > I'm actively using Reiser4 on a production servers (and I know a lot
>> > of people that do that too).
>> > Could you please release the patch against the vanilla tree?
>> > I don't think there's a lot of people that will test -mm version,
>> > especially on production servers - -mm is a little bit too unstable.
>> 
>> Any chance to get this patch against 2.6.17-rc4-mm3?
>
>yes, reiser4 updates for latest stock and mm kernels will be out in one
>or two days
>
There is a version out for 2.6.17-rc4-mm1, but for stock kernel? Has the latter
been canceled?


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by writing  more than 4k at a time (has implications for generic write code and eventually  for the IO layer)
  2006-05-24 17:55   ` Hans Reiser
@ 2006-06-08 11:00     ` Jens Axboe
  2006-06-08 11:26       ` Vladimir V. Saveliev
  0 siblings, 1 reply; 37+ messages in thread
From: Jens Axboe @ 2006-06-08 11:00 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Tom Vier, Linux-Kernel, Reiserfs developers mail-list,
	Reiserfs mail-list

On Wed, May 24 2006, Hans Reiser wrote:
> Tom Vier wrote:
> 
> >On Tue, May 23, 2006 at 01:14:54PM -0700, Hans Reiser wrote:
> >  
> >
> >>underlying FS can be improved.  Performance results show that the new
> >>code consumes  40% less CPU when doing "dd bs=1MB ....." (your hardware,
> >>and whether the data is in cache, may vary this result).  Note that this
> >>has only a small effect on elapsed time for most hardware.
> >>    
> >>
> >
> >Write requests in linux are restricted to one page?
> >
> >  
> >
> It may go to the kernel as a 64MB write, but VFS sends it to the FS as
> 64MB/4k separate 4k writes.

Nonsense, there are ways to get > PAGE_CACHE_SIZE writes in one chunk.
Other file systems have been doing it for years.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by writing  more than 4k at a time (has implications for generic write code and eventually  for the IO layer)
  2006-06-08 11:00     ` Jens Axboe
@ 2006-06-08 11:26       ` Vladimir V. Saveliev
  2006-06-08 11:35         ` Jens Axboe
  2006-06-08 12:10         ` Christoph Hellwig
  0 siblings, 2 replies; 37+ messages in thread
From: Vladimir V. Saveliev @ 2006-06-08 11:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Hans Reiser, Tom Vier, Linux-Kernel,
	Reiserfs developers mail-list, Reiserfs mail-list

Hello

On Thu, 2006-06-08 at 13:00 +0200, Jens Axboe wrote:
> On Wed, May 24 2006, Hans Reiser wrote:
> > Tom Vier wrote:
> > 
> > >On Tue, May 23, 2006 at 01:14:54PM -0700, Hans Reiser wrote:
> > >  
> > >
> > >>underlying FS can be improved.  Performance results show that the new
> > >>code consumes  40% less CPU when doing "dd bs=1MB ....." (your hardware,
> > >>and whether the data is in cache, may vary this result).  Note that this
> > >>has only a small effect on elapsed time for most hardware.
> > >>    
> > >>
> > >
> > >Write requests in linux are restricted to one page?
> > >
> > >  
> > >
> > It may go to the kernel as a 64MB write, but VFS sends it to the FS as
> > 64MB/4k separate 4k writes.
> 
> Nonsense, 

Hans refers to generic_file_write which does
prepare_write
copy_from_user
commit_write
for each page.

> there are ways to get > PAGE_CACHE_SIZE writes in one chunk.
> Other file systems have been doing it for years.
> 

Would you, please, say more about it.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by  writing  more than 4k at a time (has implications for generic write code  and eventually  for the IO layer)
  2006-06-08 11:26       ` Vladimir V. Saveliev
@ 2006-06-08 11:35         ` Jens Axboe
  2006-06-08 12:08           ` Vladimir V. Saveliev
  2006-06-14 19:37           ` Hans Reiser
  2006-06-08 12:10         ` Christoph Hellwig
  1 sibling, 2 replies; 37+ messages in thread
From: Jens Axboe @ 2006-06-08 11:35 UTC (permalink / raw)
  To: Vladimir V. Saveliev
  Cc: Hans Reiser, Tom Vier, Linux-Kernel,
	Reiserfs developers mail-list, Reiserfs mail-list

On Thu, Jun 08 2006, Vladimir V. Saveliev wrote:
> Hello
> 
> On Thu, 2006-06-08 at 13:00 +0200, Jens Axboe wrote:
> > On Wed, May 24 2006, Hans Reiser wrote:
> > > Tom Vier wrote:
> > > 
> > > >On Tue, May 23, 2006 at 01:14:54PM -0700, Hans Reiser wrote:
> > > >  
> > > >
> > > >>underlying FS can be improved.  Performance results show that the new
> > > >>code consumes  40% less CPU when doing "dd bs=1MB ....." (your hardware,
> > > >>and whether the data is in cache, may vary this result).  Note that this
> > > >>has only a small effect on elapsed time for most hardware.
> > > >>    
> > > >>
> > > >
> > > >Write requests in linux are restricted to one page?
> > > >
> > > >  
> > > >
> > > It may go to the kernel as a 64MB write, but VFS sends it to the FS as
> > > 64MB/4k separate 4k writes.
> > 
> > Nonsense, 
> 
> Hans refers to generic_file_write which does
> prepare_write
> copy_from_user
> commit_write
> for each page.

Provide your own f_op->write() ?

> > there are ways to get > PAGE_CACHE_SIZE writes in one chunk.
> > Other file systems have been doing it for years.
> > 
> 
> Would you, please, say more about it.

Use writepages?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by  writing  more than 4k at a time (has implications for generic write code  and eventually  for the IO layer)
  2006-06-08 11:35         ` Jens Axboe
@ 2006-06-08 12:08           ` Vladimir V. Saveliev
  2006-06-14 19:37           ` Hans Reiser
  1 sibling, 0 replies; 37+ messages in thread
From: Vladimir V. Saveliev @ 2006-06-08 12:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Hans Reiser, Tom Vier, Linux-Kernel,
	Reiserfs developers mail-list, Reiserfs mail-list

Hello

On Thu, 2006-06-08 at 13:35 +0200, Jens Axboe wrote:
> On Thu, Jun 08 2006, Vladimir V. Saveliev wrote:
> > Hello
> > 
> > On Thu, 2006-06-08 at 13:00 +0200, Jens Axboe wrote:
> > > On Wed, May 24 2006, Hans Reiser wrote:
> > > > Tom Vier wrote:
> > > > 
> > > > >On Tue, May 23, 2006 at 01:14:54PM -0700, Hans Reiser wrote:
> > > > >  
> > > > >
> > > > >>underlying FS can be improved.  Performance results show that the new
> > > > >>code consumes  40% less CPU when doing "dd bs=1MB ....." (your hardware,
> > > > >>and whether the data is in cache, may vary this result).  Note that this
> > > > >>has only a small effect on elapsed time for most hardware.
> > > > >>    
> > > > >>
> > > > >
> > > > >Write requests in linux are restricted to one page?
> > > > >
> > > > >  
> > > > >
> > > > It may go to the kernel as a 64MB write, but VFS sends it to the FS as
> > > > 64MB/4k separate 4k writes.
> > > 
> > > Nonsense, 
> > 
> > Hans refers to generic_file_write which does
> > prepare_write
> > copy_from_user
> > commit_write
> > for each page.
> 
> Provide your own f_op->write() ?
> 
Yes, a filesystem can do that. But it is more welcomed to kernel if it
writes/reads using generic functions.

> > > there are ways to get > PAGE_CACHE_SIZE writes in one chunk.
> > > Other file systems have been doing it for years.
> > > 
> > 
> > Would you, please, say more about it.
> 
> Use writepages?
> 
This is about flushing, not sys_write.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by writing  more than 4k at a time (has implications for generic write code and eventually  for the IO layer)
  2006-06-08 11:26       ` Vladimir V. Saveliev
  2006-06-08 11:35         ` Jens Axboe
@ 2006-06-08 12:10         ` Christoph Hellwig
  2006-06-14 22:08           ` batched write Vladimir V. Saveliev
  1 sibling, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2006-06-08 12:10 UTC (permalink / raw)
  To: Vladimir V. Saveliev
  Cc: Jens Axboe, Hans Reiser, Tom Vier, Linux-Kernel,
	Reiserfs developers mail-list, Reiserfs mail-list

On Thu, Jun 08, 2006 at 03:26:40PM +0400, Vladimir V. Saveliev wrote:
> > > It may go to the kernel as a 64MB write, but VFS sends it to the FS as
> > > 64MB/4k separate 4k writes.
> > 
> > Nonsense, 
> 
> Hans refers to generic_file_write which does
> prepare_write
> copy_from_user
> commit_write
> for each page.

That's not really the vfs but the generic pagecache routines.  For some
filesystems (e.g. XFS) only reservations for delayed allocations are
performed in this path so it doesn't really matter.  For not so advanced
filesystems batching these calls would definitly be very helpful.  Patches
to get there are very welcome.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer)
  2006-06-08 10:45       ` Jan Engelhardt
@ 2006-06-08 12:40         ` Vladimir V. Saveliev
  2006-06-08 14:11           ` Jan Engelhardt
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir V. Saveliev @ 2006-06-08 12:40 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Michal Piotrowski, Alexey Polyakov, Hans Reiser, Andrew Morton,
	Linux Kernel Mailing List, Reiserfs developers mail-list,
	Reiserfs mail-list, Nate Diller

Hello

On Thu, 2006-06-08 at 12:45 +0200, Jan Engelhardt wrote:
> >> > I'm actively using Reiser4 on a production servers (and I know a lot
> >> > of people that do that too).
> >> > Could you please release the patch against the vanilla tree?
> >> > I don't think there's a lot of people that will test -mm version,
> >> > especially on production servers - -mm is a little bit too unstable.
> >> 
> >> Any chance to get this patch against 2.6.17-rc4-mm3?
> >
> >yes, reiser4 updates for latest stock and mm kernels will be out in one
> >or two days
> >
> There is a version out for 2.6.17-rc4-mm1, but for stock kernel? Has the latter
> been canceled?
> 

There is quite fresh
ftp://ftp.namesys.com/pub/reiser4-for-2.6/2.6.16/reiser4-for-2.6.16-4.patch.gz

> 
> Jan Engelhardt


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer)
  2006-06-08 12:40         ` Vladimir V. Saveliev
@ 2006-06-08 14:11           ` Jan Engelhardt
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Engelhardt @ 2006-06-08 14:11 UTC (permalink / raw)
  To: Vladimir V. Saveliev
  Cc: Michal Piotrowski, Alexey Polyakov, Hans Reiser, Andrew Morton,
	Linux Kernel Mailing List, Reiserfs developers mail-list,
	Reiserfs mail-list, Nate Diller

>> >> Any chance to get this patch against 2.6.17-rc4-mm3?
>> >yes, reiser4 updates for latest stock and mm kernels will be out in one
>> >or two days
>> There is a version out for 2.6.17-rc4-mm1, but for stock kernel? Has the latter
>> been canceled?
>There is quite fresh
>ftp://ftp.namesys.com/pub/reiser4-for-2.6/2.6.16/reiser4-for-2.6.16-4.patch.gz
>
Ah thank you. I actually was looking for a /^2.6.17-rc\d+$/ dir which
explains why I did not find it :)


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] updated reiser4 - reduced cpu usage for writes by  writing more than 4k at a time (has implications for generic write code  and eventually for the IO layer)
  2006-06-08 11:35         ` Jens Axboe
  2006-06-08 12:08           ` Vladimir V. Saveliev
@ 2006-06-14 19:37           ` Hans Reiser
  1 sibling, 0 replies; 37+ messages in thread
From: Hans Reiser @ 2006-06-14 19:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Vladimir V. Saveliev, Tom Vier, Linux-Kernel,
	Reiserfs developers mail-list, Reiserfs mail-list

Jens Axboe wrote:

>On Thu, Jun 08 2006, Vladimir V. Saveliev wrote:
>  
>
>>Hello
>>
>>On Thu, 2006-06-08 at 13:00 +0200, Jens Axboe wrote:
>>    
>>
>>>On Wed, May 24 2006, Hans Reiser wrote:
>>>      
>>>
>>>>Tom Vier wrote:
>>>>
>>>>        
>>>>
>>>>>On Tue, May 23, 2006 at 01:14:54PM -0700, Hans Reiser wrote:
>>>>> 
>>>>>
>>>>>          
>>>>>
>>>>>>underlying FS can be improved.  Performance results show that the new
>>>>>>code consumes  40% less CPU when doing "dd bs=1MB ....." (your hardware,
>>>>>>and whether the data is in cache, may vary this result).  Note that this
>>>>>>has only a small effect on elapsed time for most hardware.
>>>>>>   
>>>>>>
>>>>>>            
>>>>>>
>>>>>Write requests in linux are restricted to one page?
>>>>>
>>>>> 
>>>>>
>>>>>          
>>>>>
>>>>It may go to the kernel as a 64MB write, but VFS sends it to the FS as
>>>>64MB/4k separate 4k writes.
>>>>        
>>>>
>>>Nonsense, 
>>>      
>>>
>>Hans refers to generic_file_write which does
>>prepare_write
>>copy_from_user
>>commit_write
>>for each page.
>>    
>>
>
>Provide your own f_op->write() ?
>  
>
In Unix VFS is an abstraction layer with a philosophical commitment to
allow filesystems to do their own thing, but Linux is quite different,
and what you suggest got vetoed with emphasis.   In all fairness, the
patch vs is sending is one I can live with that allows me to not worry
about aio code and direct io code, neither of which interest me at this
time.  So I suppose there is some benefit to all this hassle.

>  
>
>>>there are ways to get > PAGE_CACHE_SIZE writes in one chunk.
>>>Other file systems have been doing it for years.
>>>
>>>      
>>>
>>Would you, please, say more about it.
>>    
>>
>
>Use writepages?
>
>  
>
writepages is flush time code, this is sys_write() code.  sys_write
first sticks things into the cache,, then memory pressure or pages
reaching maximum time allowed in memory or fsync pushes them out to
disk, at which time writepages might get used.

This issue is about cached writes losing performance when done 4k at a
time.  It is very similar to why bios are better than submitting io 4k
at a time, but it is at a different stage.

Christoph Hellwig wrote:

>That's not really the vfs but the generic pagecache routines.  For some
>filesystems (e.g. XFS) only reservations for delayed allocations are
>performed in this path so it doesn't really matter.  For not so advanced
>filesystems batching these calls would definitly be very helpful.  Patches
>to get there are very welcome.
>  
>
>
Glad we all agree.  vs is sending a pseudocoded proposal.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* batched write
  2006-06-08 12:10         ` Christoph Hellwig
@ 2006-06-14 22:08           ` Vladimir V. Saveliev
  2006-06-17 17:04             ` Andrew Morton
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir V. Saveliev @ 2006-06-14 22:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Reiserfs-Dev, Linux-Kernel, linux-fsdevel

Hello

On Thu, 2006-06-08 at 13:10 +0100, Christoph Hellwig wrote:
> On Thu, Jun 08, 2006 at 03:26:40PM +0400, Vladimir V. Saveliev wrote:
> > > > It may go to the kernel as a 64MB write, but VFS sends it to the FS as
> > > > 64MB/4k separate 4k writes.
> > > 
> > > Nonsense, 
> > 
> > Hans refers to generic_file_write which does
> > prepare_write
> > copy_from_user
> > commit_write
> > for each page.
> 
> That's not really the vfs but the generic pagecache routines.  For some
> filesystems (e.g. XFS) only reservations for delayed allocations are
> performed in this path so it doesn't really matter.  For not so advanced
> filesystems batching these calls would definitly be very helpful.  Patches
> to get there are very welcome.
> 

The core of generic_file_buffered_write is 
do {
	grab_cache_page();
	a_ops->prepare_write();
	copy_from_user();
	a_ops->commit_write();
	
	filemap_set_next_iovec();
	balance_dirty_pages_ratelimited();
} while (count);


Would it make sence to rework this code with adding new address_space
operation - fill_pages so that looks like:

do {
	a_ops->fill_pages();
	filemap_set_next_iovec();
	balance_dirty_pages_ratelimited();
} while (count);

generic implementation of fill_pages would look like:

generic_fill_pages()
{
	grab_cache_page();
	a_ops->prepare_write();
	copy_from_user();
	a_ops->commit_write();
}

I believe that filesystem developers will want to exploit that
operation. 

Any opinion on this plan is welcomed. I would try to code whatever we
will have developed (I hope) in result of this discussion.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-14 22:08           ` batched write Vladimir V. Saveliev
@ 2006-06-17 17:04             ` Andrew Morton
  2006-06-17 17:51               ` Hans Reiser
  2006-06-19 16:27               ` Andreas Dilger
  0 siblings, 2 replies; 37+ messages in thread
From: Andrew Morton @ 2006-06-17 17:04 UTC (permalink / raw)
  To: Vladimir V. Saveliev; +Cc: hch, Reiserfs-Dev, Linux-Kernel, linux-fsdevel

On Thu, 15 Jun 2006 02:08:32 +0400
"Vladimir V. Saveliev" <vs@namesys.com> wrote:

> The core of generic_file_buffered_write is 
> do {
> 	grab_cache_page();
> 	a_ops->prepare_write();
> 	copy_from_user();
> 	a_ops->commit_write();
> 	
> 	filemap_set_next_iovec();
> 	balance_dirty_pages_ratelimited();
> } while (count);
> 
> 
> Would it make sence to rework this code with adding new address_space
> operation - fill_pages so that looks like:
> 
> do {
> 	a_ops->fill_pages();
> 	filemap_set_next_iovec();
> 	balance_dirty_pages_ratelimited();
> } while (count);
> 
> generic implementation of fill_pages would look like:
> 
> generic_fill_pages()
> {
> 	grab_cache_page();
> 	a_ops->prepare_write();
> 	copy_from_user();
> 	a_ops->commit_write();
> }
> 

There's nothing which leaps out and says "wrong" in this.  But there's
nothing which leaps out and says "right", either.  It seems somewhat
arbitrary, that's all.

We have one filesystem which wants such a refactoring (although I don't
think you've adequately spelled out _why_ reiser4 wants this).

To be able to say "yes, we want this" I think we'd need to understand which
other filesystems would benefit from exploiting it, and with what results?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-17 17:04             ` Andrew Morton
@ 2006-06-17 17:51               ` Hans Reiser
  2006-06-18 11:20                 ` Nix
  2006-06-19 16:27               ` Andreas Dilger
  1 sibling, 1 reply; 37+ messages in thread
From: Hans Reiser @ 2006-06-17 17:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vladimir V. Saveliev, hch, Reiserfs-Dev, Linux-Kernel, linux-fsdevel

Andrew Morton wrote:

>We have one filesystem which wants such a refactoring (although I don't
>think you've adequately spelled out _why_ reiser4 wants this).
>  
>
>
When calling the filesystem for writes, there is processing that must be
done:

1) per word

2) per page

3) per call to the FS

If the FS is called per page, then it turns out that 3) costs more than
1) and 2) for sophisticated filesystems.  As we develop fancier and
fancier plugins this will just get more and more true.  It decreases CPU
usage by 2x to use per sys_write calls into reiser4 rather than per page
calls into reiser4.  (Vladimir, on Monday can you find and send your
benchmarks?)  This is significant for cached writes.

If it violates the intuition to believe this, then let me point out that
there was a similar motivation for the creation of bios: calling the
block layer traverses more lines of code than copying a page of bytes
does.  Unfortunately, all that code turns out to be useful
optimizations, so one cannot just take the attitude (whether for the
block layer or reiser4) that it should just be simplified.

Please note that I have no real problem with leaving the generic code
unchanged and having reiser4 do its own write operation.  I am modifying
the generic code because you suggested it was preferred.  Having
reviewed the code in detail, I see that you were right and it is better
to just fix the generic code to call more than 4k at a time into the FS,
and then be able to reuse the generic aio and direct io code (and etc.)
as a result.  So, to be sociable, and to get more code reuse, we make
this proposal.

>To be able to say "yes, we want this" I think we'd need to understand which
>other filesystems would benefit from exploiting it, and with what results?
>
>
>  
>
Or just let us have our own sys_write implementation without being
excluded for it.  I have shown that it is significantly faster for
reiser4 to process things more than 4k at a time.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-17 17:51               ` Hans Reiser
@ 2006-06-18 11:20                 ` Nix
  2006-06-19  9:05                   ` Hans Reiser
  0 siblings, 1 reply; 37+ messages in thread
From: Nix @ 2006-06-18 11:20 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Andrew Morton, Vladimir V. Saveliev, hch, Reiserfs-Dev,
	Linux-Kernel, linux-fsdevel

On 17 Jun 2006, Hans Reiser prattled cheerily:
> If the FS is called per page, then it turns out that 3) costs more than
> 1) and 2) for sophisticated filesystems.  As we develop fancier and
> fancier plugins this will just get more and more true.  It decreases CPU
> usage by 2x to use per sys_write calls into reiser4 rather than per page
> calls into reiser4.

This seems to me to be something that FUSE filesystems might well like,
too: I know one I'm working on would like to know the real size of the
original write request (so that it can optimize layout appropriately
for things frequently written in large chunks; the assumption being that
if it's written in large chunks it's likely to be read in large chunks
too).

-- 
`Voting for any American political party is fundamentally
 incomprehensible.' --- Vadik

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-18 11:20                 ` Nix
@ 2006-06-19  9:05                   ` Hans Reiser
  2006-06-19 11:32                     ` Miklos Szeredi
  0 siblings, 1 reply; 37+ messages in thread
From: Hans Reiser @ 2006-06-19  9:05 UTC (permalink / raw)
  To: Nix
  Cc: Andrew Morton, Vladimir V. Saveliev, hch, Reiserfs-Dev,
	Linux-Kernel, linux-fsdevel

Nix wrote:

>On 17 Jun 2006, Hans Reiser prattled cheerily:
>  
>
>>If the FS is called per page, then it turns out that 3) costs more than
>>1) and 2) for sophisticated filesystems.  As we develop fancier and
>>fancier plugins this will just get more and more true.  It decreases CPU
>>usage by 2x to use per sys_write calls into reiser4 rather than per page
>>calls into reiser4.
>>    
>>
>
>This seems to me to be something that FUSE filesystems might well like,
>too: I know one I'm working on would like to know the real size of the
>original write request (so that it can optimize layout appropriately
>for things frequently written in large chunks; the assumption being that
>if it's written in large chunks it's likely to be read in large chunks
>too).
>
>  
>
Hi Nix,

Forgive myn utter ignorance of fuse, but does it currently context
switch to user space for every 4k written through VFS?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-19  9:05                   ` Hans Reiser
@ 2006-06-19 11:32                     ` Miklos Szeredi
  2006-06-19 16:39                       ` Hans Reiser
  0 siblings, 1 reply; 37+ messages in thread
From: Miklos Szeredi @ 2006-06-19 11:32 UTC (permalink / raw)
  To: reiser; +Cc: nix, akpm, vs, hch, Reiserfs-Dev, Linux-Kernel, linux-fsdevel

> Forgive myn utter ignorance of fuse, but does it currently context
> switch to user space for every 4k written through VFS?

Yes, unfortunately it does, so fuse would benefit from batched writing
as well, with some constraint on the number of locked pages to avoid
DoS against the page cache.

Miklos

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-17 17:04             ` Andrew Morton
  2006-06-17 17:51               ` Hans Reiser
@ 2006-06-19 16:27               ` Andreas Dilger
  2006-06-19 16:51                 ` Hans Reiser
  2006-06-19 18:28                 ` Vladimir V. Saveliev
  1 sibling, 2 replies; 37+ messages in thread
From: Andreas Dilger @ 2006-06-19 16:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vladimir V. Saveliev, hch, Reiserfs-Dev, Linux-Kernel, linux-fsdevel

On Jun 17, 2006  10:04 -0700, Andrew Morton wrote:
> On Thu, 15 Jun 2006 02:08:32 +0400
> "Vladimir V. Saveliev" <vs@namesys.com> wrote:
> 
> > The core of generic_file_buffered_write is 
> > do {
> > 	grab_cache_page();
> > 	a_ops->prepare_write();
> > 	copy_from_user();
> > 	a_ops->commit_write();
> > 	
> > 	filemap_set_next_iovec();
> > 	balance_dirty_pages_ratelimited();
> > } while (count);
> > 
> > 
> > Would it make sence to rework this code with adding new address_space
> > operation - fill_pages so that looks like:
> > 
> > do {
> > 	a_ops->fill_pages();
> > 	filemap_set_next_iovec();
> > 	balance_dirty_pages_ratelimited();
> > } while (count);
> > 
> > generic implementation of fill_pages would look like:
> > 
> > generic_fill_pages()
> > {
> > 	grab_cache_page();
> > 	a_ops->prepare_write();
> > 	copy_from_user();
> > 	a_ops->commit_write();
> > }
> > 
> 
> There's nothing which leaps out and says "wrong" in this.  But there's
> nothing which leaps out and says "right", either.  It seems somewhat
> arbitrary, that's all.
> 
> We have one filesystem which wants such a refactoring (although I don't
> think you've adequately spelled out _why_ reiser4 wants this).
> 
> To be able to say "yes, we want this" I think we'd need to understand which
> other filesystems would benefit from exploiting it, and with what results?

With the caveat that I didn't see the original patch, if this can be a step
down the road toward supporting delayed allocation at the VFS level then
I'm all for such changes.

Lustre goes to some lengths to batch up reads and writes on the client into
large (1MB+) RPCs in order to maximize performance.  Similarly on the
server we essentially bypass the VFS in order to allocate all of the RPC's
blocks in one call and do a large bio write in a second.  It just isn't
possible to maximize performance if everything is split into PAGE_SIZE
chunks.

I believe XFS would benefit from delayed allocation, and the ext3-delalloc
patches from Alex also provide a large part of the performance wins for
userspace IO, when they allow large sys_write() and VM cache flush to
efficiently call into the filesystem to allocate many blocks at once, and
then push them out to disk in large chunks.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-19 11:32                     ` Miklos Szeredi
@ 2006-06-19 16:39                       ` Hans Reiser
  2006-06-19 17:35                         ` Miklos Szeredi
  0 siblings, 1 reply; 37+ messages in thread
From: Hans Reiser @ 2006-06-19 16:39 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: nix, akpm, vs, hch, Reiserfs-Dev, Linux-Kernel, linux-fsdevel, drepper

Miklos Szeredi wrote:

>>Forgive myn utter ignorance of fuse, but does it currently context
>>switch to user space for every 4k written through VFS?
>>    
>>
>
>Yes, unfortunately it does, so fuse would benefit from batched writing
>as well, with some constraint on the number of locked pages to avoid
>DoS against the page cache.
>
>Miklos
>
>
>  
>
I would think that batched write is pretty essential then to FUSE
performance.  If we could then get the glibc authors to not sabotage the
using of a large block size to indicate that we like large IOs (see
thread on fseek implementation), reiser4 and FUSE would be all set for
improved performance.  Even without glibc developer cooperation, we will
get a lot of benefits.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-19 16:27               ` Andreas Dilger
@ 2006-06-19 16:51                 ` Hans Reiser
  2006-06-19 18:50                   ` Andreas Dilger
  2006-06-19 18:28                 ` Vladimir V. Saveliev
  1 sibling, 1 reply; 37+ messages in thread
From: Hans Reiser @ 2006-06-19 16:51 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Andrew Morton, Vladimir V. Saveliev, hch, Reiserfs-Dev,
	Linux-Kernel, linux-fsdevel

Andreas Dilger wrote:

>On Jun 17, 2006  10:04 -0700, Andrew Morton wrote:
>  
>
>>On Thu, 15 Jun 2006 02:08:32 +0400
>>"Vladimir V. Saveliev" <vs@namesys.com> wrote:
>>
>>    
>>
>>>The core of generic_file_buffered_write is 
>>>do {
>>>	grab_cache_page();
>>>	a_ops->prepare_write();
>>>	copy_from_user();
>>>	a_ops->commit_write();
>>>	
>>>	filemap_set_next_iovec();
>>>	balance_dirty_pages_ratelimited();
>>>} while (count);
>>>
>>>
>>>Would it make sence to rework this code with adding new address_space
>>>operation - fill_pages so that looks like:
>>>
>>>do {
>>>	a_ops->fill_pages();
>>>	filemap_set_next_iovec();
>>>	balance_dirty_pages_ratelimited();
>>>} while (count);
>>>
>>>generic implementation of fill_pages would look like:
>>>
>>>generic_fill_pages()
>>>{
>>>	grab_cache_page();
>>>	a_ops->prepare_write();
>>>	copy_from_user();
>>>	a_ops->commit_write();
>>>}
>>>
>>>      
>>>
>>There's nothing which leaps out and says "wrong" in this.  But there's
>>nothing which leaps out and says "right", either.  It seems somewhat
>>arbitrary, that's all.
>>
>>We have one filesystem which wants such a refactoring (although I don't
>>think you've adequately spelled out _why_ reiser4 wants this).
>>
>>To be able to say "yes, we want this" I think we'd need to understand which
>>other filesystems would benefit from exploiting it, and with what results?
>>    
>>
>
>With the caveat that I didn't see the original patch, if this can be a step
>down the road toward supporting delayed allocation at the VFS level then
>I'm all for such changes.
>  
>
What do you mean by supporting delayed allocation at the VFS level?  Do
you mean calling to the FS or maybe just not stepping on the FS's toes
so much or?  Delayed allocation is very fs specific in so far as I can
imagine it.

>Lustre goes to some lengths to batch up reads and writes on the client into
>large (1MB+) RPCs in order to maximize performance.  Similarly on the
>server we essentially bypass the VFS in order to allocate all of the RPC's
>blocks in one call and do a large bio write in a second.  It just isn't
>possible to maximize performance if everything is split into PAGE_SIZE
>chunks.
>
>I believe XFS would benefit from delayed allocation, and the ext3-delalloc
>patches from Alex also provide a large part of the performance wins for
>userspace IO, when they allow large sys_write() and VM cache flush to
>efficiently call into the filesystem to allocate many blocks at once, and
>then push them out to disk in large chunks.
>
>Cheers, Andreas
>--
>Andreas Dilger
>Principal Software Engineer
>Cluster File Systems, Inc.
>
>
>
>  
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-19 16:39                       ` Hans Reiser
@ 2006-06-19 17:35                         ` Miklos Szeredi
  2006-06-19 17:52                           ` Akshat Aranya
  0 siblings, 1 reply; 37+ messages in thread
From: Miklos Szeredi @ 2006-06-19 17:35 UTC (permalink / raw)
  To: reiser
  Cc: nix, akpm, vs, hch, Reiserfs-Dev, Linux-Kernel, linux-fsdevel, drepper

> I would think that batched write is pretty essential then to FUSE
> performance.

Well, yes essential if the this is the bottleneck in write throughput,
which is most often not the case, but sometimes it is.

Miklos

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-19 17:35                         ` Miklos Szeredi
@ 2006-06-19 17:52                           ` Akshat Aranya
  2006-06-19 20:39                             ` Hans Reiser
  0 siblings, 1 reply; 37+ messages in thread
From: Akshat Aranya @ 2006-06-19 17:52 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: reiser, nix, akpm, vs, hch, Reiserfs-Dev, Linux-Kernel,
	linux-fsdevel, drepper

On 6/19/06, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > I would think that batched write is pretty essential then to FUSE
> > performance.
>
> Well, yes essential if the this is the bottleneck in write throughput,
> which is most often not the case, but sometimes it is.
>

I can vouch for this.  I did some experiments with an example FUSE
filesystem that discards the data in userspace.  Exporting such a
filesystem over NFS gives us 80 MB/s writes when FUSE is modified to
write with 32K block sizes.  With the standard FUSE (4K writes), we
get  closer to 50 MB/s.

> Miklos


-Akshat
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-19 16:27               ` Andreas Dilger
  2006-06-19 16:51                 ` Hans Reiser
@ 2006-06-19 18:28                 ` Vladimir V. Saveliev
  1 sibling, 0 replies; 37+ messages in thread
From: Vladimir V. Saveliev @ 2006-06-19 18:28 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Andrew Morton, hch, Reiserfs-Dev, Linux-Kernel, linux-fsdevel

Hello

On Mon, 2006-06-19 at 09:27 -0700, Andreas Dilger wrote:
> On Jun 17, 2006  10:04 -0700, Andrew Morton wrote:
> > On Thu, 15 Jun 2006 02:08:32 +0400
> > "Vladimir V. Saveliev" <vs@namesys.com> wrote:
> > 
> > > The core of generic_file_buffered_write is 
> > > do {
> > > 	grab_cache_page();
> > > 	a_ops->prepare_write();
> > > 	copy_from_user();
> > > 	a_ops->commit_write();
> > > 	
> > > 	filemap_set_next_iovec();
> > > 	balance_dirty_pages_ratelimited();
> > > } while (count);
> > > 
> > > 
> > > Would it make sence to rework this code with adding new address_space
> > > operation - fill_pages so that looks like:
> > > 
> > > do {
> > > 	a_ops->fill_pages();
> > > 	filemap_set_next_iovec();
> > > 	balance_dirty_pages_ratelimited();
> > > } while (count);
> > > 
> > > generic implementation of fill_pages would look like:
> > > 
> > > generic_fill_pages()
> > > {
> > > 	grab_cache_page();
> > > 	a_ops->prepare_write();
> > > 	copy_from_user();
> > > 	a_ops->commit_write();
> > > }
> > > 
> > 
> > There's nothing which leaps out and says "wrong" in this.  But there's
> > nothing which leaps out and says "right", either.  It seems somewhat
> > arbitrary, that's all.
> > 
> > We have one filesystem which wants such a refactoring (although I don't
> > think you've adequately spelled out _why_ reiser4 wants this).
> > 
> > To be able to say "yes, we want this" I think we'd need to understand which
> > other filesystems would benefit from exploiting it, and with what results?
> 
> With the caveat that I didn't see the original patch, if this can be a step
> down the road toward supporting delayed allocation at the VFS level then
> I'm all for such changes.
> 

Doesn't writepages method operation of address space provide enough
freedom for a filesystem to perform delayed allocation?

The goal of the patch was just to allow a filesystem to perform metadata
update for several newly added to a file pages at once. Currently,
filesystem is asked to do that once per page. Filesystems which have
complex algorithms involved into that may find this possibility useful
to improve performance.

> Lustre goes to some lengths to batch up reads and writes on the client into
> large (1MB+) RPCs in order to maximize performance.  Similarly on the
> server we essentially bypass the VFS in order to allocate all of the RPC's
> blocks in one call and do a large bio write in a second.  It just isn't
> possible to maximize performance if everything is split into PAGE_SIZE
> chunks.
> 
> I believe XFS would benefit from delayed allocation, and the ext3-delalloc
> patches from Alex also provide a large part of the performance wins for
> userspace IO, when they allow large sys_write() and VM cache flush to
> efficiently call into the filesystem to allocate many blocks at once, and
> then push them out to disk in large chunks.
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
> 
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-19 16:51                 ` Hans Reiser
@ 2006-06-19 18:50                   ` Andreas Dilger
  2006-06-19 20:47                     ` Hans Reiser
  2006-06-20  0:01                     ` David Chinner
  0 siblings, 2 replies; 37+ messages in thread
From: Andreas Dilger @ 2006-06-19 18:50 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Andrew Morton, Vladimir V. Saveliev, hch, Reiserfs-Dev,
	Linux-Kernel, linux-fsdevel

On Jun 19, 2006  09:51 -0700, Hans Reiser wrote:
> Andreas Dilger wrote:
> >With the caveat that I didn't see the original patch, if this can be a step
> >down the road toward supporting delayed allocation at the VFS level then
> >I'm all for such changes.
>
> What do you mean by supporting delayed allocation at the VFS level?  Do
> you mean calling to the FS or maybe just not stepping on the FS's toes
> so much or?  Delayed allocation is very fs specific in so far as I can
> imagine it.

Currently the VM/VFS call into the filesystem in ->prepare_write for each
page to do block allocation for the filesystem.  This is the filesystem's
chance to return -ENOSPC, etc, because after that point the dirty pages
are written asynchronously and there is no guarantee that the application
will even be around when they are finally written to disk.

If the VFS supported delayed allocation it would call into the filesystem
on a per-sys_write basis to allow the filesystem to RESERVE space for all
of the pages in the write call, and then later (under memory pressure,
page aging, or even "pull" from the fs) submit a whole batch of contiguous
pages to the fs efficiently (via ->fill_pages() or whatever).

The fs can know at that time the final file size (if the file isn't still
being dirtied), can allocate all these blocks in a contiguous chunk, can
submit all of the IO in a single bio to the block layer or RPC/RDMA to net.

As you well know, while it is possible to do this now by copying all of the
generic_file_write() logic into the filesystem *_file_write() method, in
practise it is hard to do this from a code maintenance point of view.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-19 17:52                           ` Akshat Aranya
@ 2006-06-19 20:39                             ` Hans Reiser
  0 siblings, 0 replies; 37+ messages in thread
From: Hans Reiser @ 2006-06-19 20:39 UTC (permalink / raw)
  To: Akshat Aranya, vs
  Cc: Miklos Szeredi, nix, akpm, vs, hch, Reiserfs-Dev, Linux-Kernel,
	linux-fsdevel, drepper

Akshat Aranya wrote:

> On 6/19/06, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>> > I would think that batched write is pretty essential then to FUSE
>> > performance.
>>
>> Well, yes essential if the this is the bottleneck in write throughput,
>> which is most often not the case, but sometimes it is.
>>
>
> I can vouch for this.  I did some experiments with an example FUSE
> filesystem that discards the data in userspace.  Exporting such a
> filesystem over NFS gives us 80 MB/s writes when FUSE is modified to
> write with 32K block sizes.  With the standard FUSE (4K writes), we
> get  closer to 50 MB/s.

The ratios of 4k performance / large write performance are amusingly
similar for reiser4 and FUSE even though the filesystems and absolute
performance are totally different.  The principle is the same it seems
for both filesystems.

Vladimir, the benchmarks, please send them.....

Hans

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-19 18:50                   ` Andreas Dilger
@ 2006-06-19 20:47                     ` Hans Reiser
  2006-06-20  0:01                     ` David Chinner
  1 sibling, 0 replies; 37+ messages in thread
From: Hans Reiser @ 2006-06-19 20:47 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Andrew Morton, Vladimir V. Saveliev, hch, Reiserfs-Dev,
	Linux-Kernel, linux-fsdevel

Andreas Dilger wrote:

>
>If the VFS supported delayed allocation it would call into the filesystem
>on a per-sys_write basis 
>
Is it necessary for VFS to specify that it is for delayed allocation
that it does it, or can it be a more generic sort of per sys-write call?

>to allow the filesystem to RESERVE space for all
>of the pages in the write call, and then later (under memory pressure,
>page aging, or even "pull" from the fs) submit a whole batch of contiguous
>pages to the fs efficiently (via ->fill_pages() or whatever).
>  
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-19 18:50                   ` Andreas Dilger
  2006-06-19 20:47                     ` Hans Reiser
@ 2006-06-20  0:01                     ` David Chinner
  2006-06-20  7:19                       ` Hans Reiser
  1 sibling, 1 reply; 37+ messages in thread
From: David Chinner @ 2006-06-20  0:01 UTC (permalink / raw)
  To: Hans Reiser, Andrew Morton, Vladimir V. Saveliev, hch,
	Reiserfs-Dev, Linux-Kernel, linux-fsdevel

On Mon, Jun 19, 2006 at 12:50:49PM -0600, Andreas Dilger wrote:
> On Jun 19, 2006  09:51 -0700, Hans Reiser wrote:
> > Andreas Dilger wrote:
> > >With the caveat that I didn't see the original patch, if this can be a step
> > >down the road toward supporting delayed allocation at the VFS level then
> > >I'm all for such changes.
> >
> > What do you mean by supporting delayed allocation at the VFS level?  Do
> > you mean calling to the FS or maybe just not stepping on the FS's toes
> > so much or?  Delayed allocation is very fs specific in so far as I can
> > imagine it.
> 
> Currently the VM/VFS call into the filesystem in ->prepare_write for each
> page to do block allocation for the filesystem.  This is the filesystem's
> chance to return -ENOSPC, etc, because after that point the dirty pages
> are written asynchronously and there is no guarantee that the application
> will even be around when they are finally written to disk.
> 
> If the VFS supported delayed allocation it would call into the filesystem
> on a per-sys_write basis to allow the filesystem to RESERVE space for all
> of the pages in the write call,

The VFS doesn't need to support delalloc as delalloc is fundamentally a
filesystem property. The VFS it already provides a hook for delalloc space
reservation that can return ENOSPC - it's called ->prepare_write().

Sure, a batch interface would be nice, but that's an optimisation that
needs to be done regardless of whether the filesystem supports delalloc or
not.  The current ->prepare_write() interface shows its limits when having to
do hundreds of thousands (millions, even) of ->prepare_write() calls per
second. This makes for entertaining scaling problems that batching would
make less of a problem.

> and then later (under memory pressure,
> page aging, or even "pull" from the fs) submit a whole batch of contiguous
> pages to the fs efficiently (via ->fill_pages() or whatever).

Can be done right now - XFS does this probe-and-pull operation already for
writes. See xfs_probe_cluster(), xfs_cluster_write() and friends.

Yes, it would be nice to have the VM pass us clusters of adjacent pages, but
given that the file layout drives the cluster size it is more appropriate
to do this from the filesystem. Also, the pages do not contain the state
necessary for the VM to cluster pages in an way that results in efficient
I/O patterns.

Basically, the only thing really needed from the VFS/VM is a method of tagging
delalloc (or unwritten) pages so that the writepage path knows how to treat
the page being written. Currently we keep that state in bufferheads (e.g. see
buffer_delay() usage) attached to the page......

> The fs can know at that time the final file size (if the file isn't still
> being dirtied), can allocate all these blocks in a contiguous chunk, can
> submit all of the IO in a single bio to the block layer or RPC/RDMA to net.

You don't need to know the final file size - just what is contiguous in the
page cache and in the same state as the page being flushed.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-20  0:01                     ` David Chinner
@ 2006-06-20  7:19                       ` Hans Reiser
  2006-06-20  7:26                         ` Andrew Morton
  0 siblings, 1 reply; 37+ messages in thread
From: Hans Reiser @ 2006-06-20  7:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Chinner, Vladimir V. Saveliev, hch, Reiserfs-Dev,
	Linux-Kernel, linux-fsdevel

So far we have XFS, FUSE, and reiser4 benefiting from the potential
ability to process more than 4k at a time.  Is it enough?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-20  7:19                       ` Hans Reiser
@ 2006-06-20  7:26                         ` Andrew Morton
  2006-06-20  9:02                           ` Steven Whitehouse
  2006-06-20 16:26                           ` Vladimir V. Saveliev
  0 siblings, 2 replies; 37+ messages in thread
From: Andrew Morton @ 2006-06-20  7:26 UTC (permalink / raw)
  To: Hans Reiser; +Cc: dgc, vs, hch, Reiserfs-Dev, Linux-Kernel, linux-fsdevel

On Tue, 20 Jun 2006 00:19:24 -0700
Hans Reiser <reiser@namesys.com> wrote:

> So far we have XFS, FUSE, and reiser4 benefiting from the potential
> ability to process more than 4k at a time.  Is it enough?

Spose so.  Let's see what the diff looks like?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-20  7:26                         ` Andrew Morton
@ 2006-06-20  9:02                           ` Steven Whitehouse
  2006-06-20 16:26                           ` Vladimir V. Saveliev
  1 sibling, 0 replies; 37+ messages in thread
From: Steven Whitehouse @ 2006-06-20  9:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-fsdevel, Linux-Kernel, Reiserfs-Dev, hch, vs, dgc, Hans Reiser

Hi,

On Tue, 2006-06-20 at 00:26 -0700, Andrew Morton wrote:
> On Tue, 20 Jun 2006 00:19:24 -0700
> Hans Reiser <reiser@namesys.com> wrote:
> 
> > So far we have XFS, FUSE, and reiser4 benefiting from the potential
> > ability to process more than 4k at a time.  Is it enough?
> 
> Spose so.  Let's see what the diff looks like?
> -

I have plans to do something along these lines for GFS2 in the future,
so you can add that to the list as well, if that helps things along,

Steve.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-20  7:26                         ` Andrew Morton
  2006-06-20  9:02                           ` Steven Whitehouse
@ 2006-06-20 16:26                           ` Vladimir V. Saveliev
  2006-06-20 17:29                             ` Hans Reiser
  1 sibling, 1 reply; 37+ messages in thread
From: Vladimir V. Saveliev @ 2006-06-20 16:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hans Reiser, dgc, hch, Reiserfs-Dev, Linux-Kernel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 386 bytes --]

Hello

On Tue, 2006-06-20 at 00:26 -0700, Andrew Morton wrote:
> On Tue, 20 Jun 2006 00:19:24 -0700
> Hans Reiser <reiser@namesys.com> wrote:
> 
> > So far we have XFS, FUSE, and reiser4 benefiting from the potential
> > ability to process more than 4k at a time.  Is it enough?
> 
> Spose so.  Let's see what the diff looks like?
> 
ok, the first draft version for evaluation is here.

[-- Attachment #2: batched-write.patch --]
[-- Type: text/x-patch, Size: 8721 bytes --]


This patch adds a method fill_pages to struct address_space_operations.
A filesystem may want to implement this operation to improve write performance.
Generic implementation for the method is made by cut-n-paste off generic_file_buffered_write:
it writes one page using prepare_write and commit_write address space operations.

NOTE: just for evaluation, compiliable only yet. 



diff -puN mm/filemap.c~batched-write mm/filemap.c
--- linux-2.6.17-rc6-mm2/mm/filemap.c~batched-write	2006-06-12 16:06:43.000000000 +0400
+++ linux-2.6.17-rc6-mm2-vs/mm/filemap.c	2006-06-20 19:52:53.000000000 +0400
@@ -2093,68 +2093,78 @@ generic_file_direct_write(struct kiocb *
 }
 EXPORT_SYMBOL(generic_file_direct_write);
 
-ssize_t
-generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
-		unsigned long nr_segs, loff_t pos, loff_t *ppos,
-		size_t count, ssize_t written)
+typedef size_t (*write_actor_t)(struct page *, unsigned long, size_t,
+				const write_descriptor_t *);
+struct write_descriptor {
+	loff_t pos;
+	size_t count;
+	const struct iovec *cur_iov;	/* current iovec */
+	size_t iov_base;		/* offset in the current iovec */
+	char __user *buf;
+	write_actor_t actor;
+};
+
+static size_t write_actor(struct page *page, unsigned long offset, size_t bytes,
+			  const write_descriptor_t *desc)
 {
-	struct file *file = iocb->ki_filp;
-	struct address_space * mapping = file->f_mapping;
-	const struct address_space_operations *a_ops = mapping->a_ops;
-	struct inode 	*inode = mapping->host;
-	long		status = 0;
-	struct page	*page;
-	struct page	*cached_page = NULL;
-	size_t		bytes;
-	struct pagevec	lru_pvec;
-	const struct iovec *cur_iov = iov; /* current iovec */
-	size_t		iov_base = 0;	   /* offset in the current iovec */
-	char __user	*buf;
+	return filemap_copy_from_user(page, offset, desc->buf, bytes);
+}
 
-	pagevec_init(&lru_pvec, 0);
+static size_t write_iovec_actor(struct page *page, unsigned long offset,
+				size_t bytes, const write_descriptor_t *desc)
+{
+	return filemap_copy_from_user_iovec(page, offset, desc->cur_iov,
+					    desc->iov_base, bytes);
+}
+
+/**
+ * generic_fill_pages
+ * @file:
+ * @desc:
+ * @lru_pvec:
+ * @cached_page:
+ * @copied:
+ *
+ * Number of bytes written is returned via @copied.
+ */
+long generic_fill_pages(struct file *file, const write_descriptor_t *desc,
+			struct pagevec *lru_pvec, struct page **cached_page,
+			size_t *copied)
+{
+	const struct address_space_operations *a_ops = file->f_mapping->a_ops;
+	struct page *page;
+	unsigned long index;
+	size_t bytes;
+	unsigned long offset;
+	unsigned long maxlen;
+	long status;
+
+	offset = (desc->pos & (PAGE_CACHE_SIZE - 1)); /* Within page */
+	index = desc->pos >> PAGE_CACHE_SHIFT;
+	bytes = PAGE_CACHE_SIZE - offset;
+	if (bytes > desc->count)
+		bytes = desc->count;
 
 	/*
-	 * handle partial DIO write.  Adjust cur_iov if needed.
+	 * Bring in the user page that we will copy from _first_.
+	 * Otherwise there's a nasty deadlock on copying from the same
+	 * page as we're writing to, without it being marked
+	 * up-to-date.
 	 */
-	if (likely(nr_segs == 1))
-		buf = iov->iov_base + written;
-	else {
-		filemap_set_next_iovec(&cur_iov, &iov_base, written);
-		buf = cur_iov->iov_base + iov_base;
-	}
-
-	do {
-		unsigned long index;
-		unsigned long offset;
-		unsigned long maxlen;
-		size_t copied;
-
-		offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
-		index = pos >> PAGE_CACHE_SHIFT;
-		bytes = PAGE_CACHE_SIZE - offset;
-		if (bytes > count)
-			bytes = count;
-
-		/*
-		 * Bring in the user page that we will copy from _first_.
-		 * Otherwise there's a nasty deadlock on copying from the
-		 * same page as we're writing to, without it being marked
-		 * up-to-date.
-		 */
-		maxlen = cur_iov->iov_len - iov_base;
-		if (maxlen > bytes)
-			maxlen = bytes;
-		fault_in_pages_readable(buf, maxlen);
-
-		page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
-		if (!page) {
-			status = -ENOMEM;
-			break;
-		}
+	maxlen = desc->cur_iov->iov_len - desc->iov_base;
+	if (maxlen > bytes)
+		maxlen = bytes;
+
+	while (1) {
+		fault_in_pages_readable(desc->buf, maxlen);
+
+		page = __grab_cache_page(file->f_mapping, index, cached_page, lru_pvec);
+		if (!page)
+			return -ENOMEM;
 
 		status = a_ops->prepare_write(file, page, offset, offset+bytes);
 		if (unlikely(status)) {
-			loff_t isize = i_size_read(inode);
+			loff_t isize = i_size_read(file->f_mapping->host);
 
 			if (status != AOP_TRUNCATED_PAGE)
 				unlock_page(page);
@@ -2162,51 +2172,87 @@ generic_file_buffered_write(struct kiocb
 			if (status == AOP_TRUNCATED_PAGE)
 				continue;
 			/*
-			 * prepare_write() may have instantiated a few blocks
-			 * outside i_size.  Trim these off again.
+			 * prepare_write() may have instantiated a few
+			 * blocks outside i_size.  Trim these off
+			 * again.
 			 */
-			if (pos + bytes > isize)
-				vmtruncate(inode, isize);
-			break;
+			if (desc->pos + bytes > isize)
+				vmtruncate(file->f_mapping->host, isize);
+			return status;
 		}
-		if (likely(nr_segs == 1))
-			copied = filemap_copy_from_user(page, offset,
-							buf, bytes);
-		else
-			copied = filemap_copy_from_user_iovec(page, offset,
-						cur_iov, iov_base, bytes);
+
+		*copied = desc->actor(page, offset, bytes, desc);
+
 		flush_dcache_page(page);
 		status = a_ops->commit_write(file, page, offset, offset+bytes);
 		if (status == AOP_TRUNCATED_PAGE) {
 			page_cache_release(page);
 			continue;
 		}
-		if (likely(copied > 0)) {
-			if (!status)
-				status = copied;
 
-			if (status >= 0) {
-				written += status;
-				count -= status;
-				pos += status;
-				buf += status;
-				if (unlikely(nr_segs > 1)) {
-					filemap_set_next_iovec(&cur_iov,
-							&iov_base, status);
-					if (count)
-						buf = cur_iov->iov_base +
-							iov_base;
-				} else {
-					iov_base += status;
-				}
-			}
-		}
-		if (unlikely(copied != bytes))
-			if (status >= 0)
-				status = -EFAULT;
 		unlock_page(page);
 		mark_page_accessed(page);
 		page_cache_release(page);
+		break;
+	}
+	if (status)
+		*copied = 0;
+	else if (*copied != bytes)
+		status = -EFAULT;
+	return status;
+}
+
+ssize_t
+generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
+			    unsigned long nr_segs, loff_t pos, loff_t *ppos,
+			    size_t count, ssize_t written)
+{
+	struct file *file = iocb->ki_filp;
+	struct address_space * mapping = file->f_mapping;
+	const struct address_space_operations *a_ops = mapping->a_ops;
+	struct inode 	*inode = mapping->host;
+	long		status;
+	struct page	*cached_page = NULL;
+	struct pagevec	lru_pvec;
+	write_descriptor_t desc;
+	size_t copied = 0;
+
+	pagevec_init(&lru_pvec, 0);
+
+	desc.pos = pos;
+	desc.count = count;
+	desc.cur_iov = iov;
+	desc.iov_base = 0;
+
+	/*
+	 * handle partial DIO write.  Adjust cur_iov if needed.
+	 */
+	if (likely(nr_segs == 1)) {
+		desc.buf = iov->iov_base + written;
+		desc.actor = write_actor;
+	} else {
+		filemap_set_next_iovec(&desc.cur_iov, &desc.iov_base, written);
+		desc.buf = desc.cur_iov->iov_base + desc.iov_base;
+		desc.actor = write_iovec_actor;
+	}
+
+	do {
+		status = mapping->a_ops->fill_pages(file, &desc,
+						    &lru_pvec, &cached_page, &copied);
+		if (likely(copied > 0)) {
+			written += copied;
+			desc.count -= copied;
+			desc.pos += copied;
+			desc.buf += copied;
+			if (unlikely(nr_segs > 1)) {
+				filemap_set_next_iovec(&desc.cur_iov,
+						       &desc.iov_base, copied);
+				if (count)
+					desc.buf = desc.cur_iov->iov_base + desc.iov_base;
+			} else {
+				desc.iov_base += copied;
+			}
+		}
 		if (status < 0)
 			break;
 		balance_dirty_pages_ratelimited(mapping);
diff -puN include/linux/fs.h~batched-write include/linux/fs.h
--- linux-2.6.17-rc6-mm2/include/linux/fs.h~batched-write	2006-06-16 19:12:47.000000000 +0400
+++ linux-2.6.17-rc6-mm2-vs/include/linux/fs.h	2006-06-20 19:40:26.000000000 +0400
@@ -346,6 +346,9 @@ enum positive_aop_returns {
 struct page;
 struct address_space;
 struct writeback_control;
+typedef struct write_descriptor write_descriptor_t;
+
+#include <linux/pagevec.h>
 
 struct address_space_operations {
 	int (*writepage)(struct page *page, struct writeback_control *wbc);
@@ -367,6 +370,9 @@ struct address_space_operations {
 	 */
 	int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
 	int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
+	long (*fill_pages)(struct file *, const write_descriptor_t *,
+			   struct pagevec *, struct page **, size_t *copied);
+
 	/* Unfortunately this kludge is needed for FIBMAP. Don't use it */
 	sector_t (*bmap)(struct address_space *, sector_t);
 	void (*invalidatepage) (struct page *, unsigned long);

_

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: batched write
  2006-06-20 16:26                           ` Vladimir V. Saveliev
@ 2006-06-20 17:29                             ` Hans Reiser
  0 siblings, 0 replies; 37+ messages in thread
From: Hans Reiser @ 2006-06-20 17:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vladimir V. Saveliev, dgc, hch, Reiserfs-Dev, Linux-Kernel,
	linux-fsdevel

Hold off on reading this, there will be a tested version with comments,
etc., later this week.  

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2006-06-20 17:29 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-05-23 20:14 [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer) Hans Reiser
2006-05-23 20:26 ` Alexey Polyakov
2006-05-23 20:33   ` Michal Piotrowski
2006-05-24 14:39     ` Vladimir V. Saveliev
2006-06-08 10:45       ` Jan Engelhardt
2006-06-08 12:40         ` Vladimir V. Saveliev
2006-06-08 14:11           ` Jan Engelhardt
2006-05-24 17:53 ` Tom Vier
2006-05-24 17:55   ` Hans Reiser
2006-06-08 11:00     ` Jens Axboe
2006-06-08 11:26       ` Vladimir V. Saveliev
2006-06-08 11:35         ` Jens Axboe
2006-06-08 12:08           ` Vladimir V. Saveliev
2006-06-14 19:37           ` Hans Reiser
2006-06-08 12:10         ` Christoph Hellwig
2006-06-14 22:08           ` batched write Vladimir V. Saveliev
2006-06-17 17:04             ` Andrew Morton
2006-06-17 17:51               ` Hans Reiser
2006-06-18 11:20                 ` Nix
2006-06-19  9:05                   ` Hans Reiser
2006-06-19 11:32                     ` Miklos Szeredi
2006-06-19 16:39                       ` Hans Reiser
2006-06-19 17:35                         ` Miklos Szeredi
2006-06-19 17:52                           ` Akshat Aranya
2006-06-19 20:39                             ` Hans Reiser
2006-06-19 16:27               ` Andreas Dilger
2006-06-19 16:51                 ` Hans Reiser
2006-06-19 18:50                   ` Andreas Dilger
2006-06-19 20:47                     ` Hans Reiser
2006-06-20  0:01                     ` David Chinner
2006-06-20  7:19                       ` Hans Reiser
2006-06-20  7:26                         ` Andrew Morton
2006-06-20  9:02                           ` Steven Whitehouse
2006-06-20 16:26                           ` Vladimir V. Saveliev
2006-06-20 17:29                             ` Hans Reiser
2006-06-19 18:28                 ` Vladimir V. Saveliev
2006-05-24 17:59   ` [PATCH] updated reiser4 - reduced cpu usage for writes by writing more than 4k at a time (has implications for generic write code and eventually for the IO layer) Hans Reiser

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).