* v3 experimental data=ordered and logging speedups for 2.6.1
@ 2004-01-19 16:45 Chris Mason
  2004-01-19 22:53 ` Dieter Nützel
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Chris Mason @ 2004-01-19 16:45 UTC (permalink / raw)
  To: reiserfs-list, green

Hello everyone,

I've got most of data=ordered finished, there are a few paths like
writepage and O_DIRECT that need tweaking.  Thanks to Oleg's file_write
work in 2.6.x, the data=journal patch is much cleaner than 2.4, it is
almost done but not included in the bunch of patches I just uploaded to
ftp.suse.com.  Oleg is cc'd in case he wants to look over the changes to
reiserfs_file_write in reiserfs-jh-2.

The code has survived a weekend of moderate load, but you still want it
very far away from production servers.  I'm headed off to linux world in
NYC for the rest of the week, and I wanted to post this for review and
the few brave souls out there who might want to give it a try.

ftp.suse.com/pub/people/mason/patches/data-logging/experimental/2.6.1

The README:

Experimental reiserfs data=ordered and logging speedups against 2.6.1

Apply these in order:

01-reiserfs-journal-writer
removes old stale debugging code, very safe

02-reiserfs-nesting
Adds support for nested transactions in reiserfs, needed for the quota code,
and ported from 2.4.x by Jeff Mahoney

03-reiserfs-iosize
Changes reiserfs to tell userspace the default io size is 4k.  Works around
a bug in bdb hit by rpm users

04-reiserfs-balance_dirty
Changes reiserfs_file_write to throttle writers the way the rest of linux
does.  This patch has already been sent for inclusion, it should get in soon

05-reiserfs-logging
Logging speedups for small transactions and fsync heavy applications.  Most
experimental patch of the bunch, since it changes the way the log does
metadata writeback

06-reiserfs-jh-2
Adds data=ordered support, along with a journal header attached to
the buffer head.  This allows for more efficient data=ordered support
than I had in 2.4.x.

-chris




* Re: v3 experimental data=ordered and logging speedups for 2.6.1
  2004-01-19 16:45 v3 experimental data=ordered and logging speedups for 2.6.1 Chris Mason
@ 2004-01-19 22:53 ` Dieter Nützel
  2004-01-19 22:54   ` Mike Fedyk
  2004-01-21  1:50   ` Chris Mason
  2004-01-21 15:09 ` Oleg Drokin
  2004-02-11 11:49 ` Oleg Drokin
  2 siblings, 2 replies; 13+ messages in thread
From: Dieter Nützel @ 2004-01-19 22:53 UTC (permalink / raw)
  To: Chris Mason; +Cc: reiserfs-list, green, Gerd Knorr, Jeff Mahoney

On Monday, 19 January 2004 17:45, Chris Mason wrote:
> Hello everyone,
>
> I've got most of data=ordered finished, there are a few paths like
> writepage and O_DIRECT that need tweaking.  Thanks to Oleg's file_write
> work in 2.6.x, the data=journal patch is much cleaner than 2.4, it is
> almost done but not included in the bunch of patches I just uploaded to
> ftp.suse.com.  Oleg is cc'd in case he wants to look over the changes to
> reiserfs_file_write in reiserfs-jh-2.
>
> The code has survived a weekend of moderate load, but you still want it
> very far away from production servers.  I'm headed off to linux world in
> NYC for the rest of the week, and I wanted to post this for review and
> the few brave souls out there who might want to give it a try.
>
> ftp.suse.com/pub/people/mason/patches/data-logging/experimental/2.6.1
>
> The README:
>
> Experimental reiserfs data=ordered and logging speedups against 2.6.1

Success.

I applied them against the SuSE 9.0 2.6.1-0 kernel from Gerd.

> Apply these in order:
>
> 01-reiserfs-journal-writer
> removes old stale debugging code, very safe
>
> 02-reiserfs-nesting
> Adds support for nested transactions in reiserfs, needed for the quota
> code, and ported from 2.4.x by Jeff Mahoney
>
> 03-reiserfs-iosize
> Changes reiserfs to tell userspace the default io size is 4k.  Works around
> a bug in bdb hit by rpm users

02 and 03 were already applied.

So they are not needed here.

> 04-reiserfs-balance_dirty
> Changes reiserfs_file_write to throttle writers the way the rest of linux
> does.  This patch has already been sent for inclusion, it should get in
> soon

Goes in clean.

> 05-reiserfs-logging
> Logging speedups for small transactions and fsync heavy applications.  Most
> experimental patch of the bunch, since it changes the way the log does
> metadata writeback
>
> 06-reiserfs-jh-2
> Adds data=ordered support, along with a journal header attached to
> the buffer head.  This allows for more efficient data=ordered support
> than I had in 2.4.x.

05 and 06 needed some handwork 'cause the SuSE kernel includes xattrs and POSIX
ACLs, but nothing special.

An EXPORT was missing in linux/fs/buffer.c to compile ReiserFS 3.x.x as a module
(inode.c, unresolved symbol):

[-]
int try_to_release_page(struct page *page, int gfp_mask)
{
        struct address_space * const mapping = page->mapping;

        if (!PageLocked(page))
                BUG();
        if (PageWriteback(page))
                return 0;

        if (mapping && mapping->a_ops->releasepage)
                return mapping->a_ops->releasepage(page, gfp_mask);
        return try_to_free_buffers(page);
}
EXPORT_SYMBOL(try_to_release_page);
[-]

Up and running.

Greetings,
	Dieter

BTW Gerd "released" 2.6.1-1 already...


-- 
Dieter Nützel
@home: <Dieter.Nuetzel () hamburg ! de>


* Re: v3 experimental data=ordered and logging speedups for 2.6.1
  2004-01-19 22:53 ` Dieter Nützel
@ 2004-01-19 22:54   ` Mike Fedyk
  2004-01-21  1:50   ` Chris Mason
  1 sibling, 0 replies; 13+ messages in thread
From: Mike Fedyk @ 2004-01-19 22:54 UTC (permalink / raw)
  To: Dieter Nützel; +Cc: Chris Mason, reiserfs-list, green, Gerd Knorr, Jeff Mahoney

On Mon, Jan 19, 2004 at 11:53:03PM +0100, Dieter Nützel wrote:
> An EXPORT was missing in linux/fs/buffer.c to compile ReiserFS 3.x.x as a module
> (inode.c, unresolved symbol):
> 
> [-]
> int try_to_release_page(struct page *page, int gfp_mask)
> {
>         struct address_space * const mapping = page->mapping;
> 
>         if (!PageLocked(page))
>                 BUG();
>         if (PageWriteback(page))
>                 return 0;
> 
>         if (mapping && mapping->a_ops->releasepage)
>                 return mapping->a_ops->releasepage(page, gfp_mask);
>         return try_to_free_buffers(page);
> }
> EXPORT_SYMBOL(try_to_release_page);
> [-]
> 
> Up and running.

Uh, oh, I think several people won't like exporting that function...


* Re: v3 experimental data=ordered and logging speedups for 2.6.1
  2004-01-19 22:53 ` Dieter Nützel
  2004-01-19 22:54   ` Mike Fedyk
@ 2004-01-21  1:50   ` Chris Mason
  2004-02-09 13:04     ` Dieter Nützel
  1 sibling, 1 reply; 13+ messages in thread
From: Chris Mason @ 2004-01-21  1:50 UTC (permalink / raw)
  To: Dieter Nützel; +Cc: reiserfs-list, green, Gerd Knorr, Jeff Mahoney

On Mon, 2004-01-19 at 17:53, Dieter Nützel wrote:

> 05 and 06 needed some handwork 'cause the SuSE kernel includes xattrs and POSIX
> ACLs, but nothing special.
> 

Good to hear.  I wasn't expecting the suse merge to be difficult,
luckily it doesn't have many patches in it yet.  Jeff and I will look at
getting them into the suse kernel once data=journal is done as well.

> An EXPORT was missing in linux/fs/buffer.c to compile ReiserFS 3.x.x as a module
> (inode.c, unresolved symbol):
> 

Thanks, I'll add it into the patch when I get back from linux world.

-chris




* Re: v3 experimental data=ordered and logging speedups for 2.6.1
  2004-01-19 16:45 v3 experimental data=ordered and logging speedups for 2.6.1 Chris Mason
  2004-01-19 22:53 ` Dieter Nützel
@ 2004-01-21 15:09 ` Oleg Drokin
  2004-02-11 11:49 ` Oleg Drokin
  2 siblings, 0 replies; 13+ messages in thread
From: Oleg Drokin @ 2004-01-21 15:09 UTC (permalink / raw)
  To: Chris Mason; +Cc: reiserfs-list

Hello!

On Mon, Jan 19, 2004 at 11:45:26AM -0500, Chris Mason wrote:

> I've got most of data=ordered finished, there are a few paths like
> writepage and O_DIRECT that need tweaking.  Thanks to Oleg's file_write
> work in 2.6.x, the data=journal patch is much cleaner than 2.4, it is
> almost done but not included in the bunch of patches I just uploaded to
> ftp.suse.com.  Oleg is cc'd in case he wants to look over the changes to
> reiserfs_file_write in reiserfs-jh-2.

Cool. I'll certainly take a look at it, but maybe in February, as I am in the US
right now and do not have a stable internet connection yet.

Thank you.

Bye,
    Oleg


* Re: v3 experimental data=ordered and logging speedups for 2.6.1
  2004-01-21  1:50   ` Chris Mason
@ 2004-02-09 13:04     ` Dieter Nützel
  2004-02-09 14:14       ` Javier Marcet
  0 siblings, 1 reply; 13+ messages in thread
From: Dieter Nützel @ 2004-02-09 13:04 UTC (permalink / raw)
  To: reiserfs-list; +Cc: Chris Mason, green, Gerd Knorr, Jeff Mahoney

On Wednesday, 21 January 2004 02:50, Chris Mason wrote:
> On Mon, 2004-01-19 at 17:53, Dieter Nützel wrote:
> > 05 and 06 needed some handwork 'cause the SuSE kernel includes xattrs and
> > POSIX ACLs, but nothing special.
>
> Good to hear.  I wasn't expecting the suse merge to be difficult,
> luckily it doesn't have many patches in it yet.  Jeff and I will look at
> getting them into the suse kernel once data=journal is done as well.
>
> > An EXPORT was missing in linux/fs/buffer.c to compile ReiserFS 3.x.x as a
> > module (inode.c, unresolved symbol):
>
> Thanks, I'll add it into the patch when I get back from linux world.

More on this.

I have now SuSE's 9.0 2.6.2-0 up and running.

Only 01-reiserfs-journal-writer is needed now without modifications.

02-reiserfs-nesting
03-reiserfs-iosize
04-reiserfs-balance_dirty

Are all in, and these two need a _little_ hand work:

05-reiserfs-logging
06-reiserfs-jh-2

Don't forget this in fs/buffer.c

EXPORT_SYMBOL(try_to_release_page);

Greetings,
	Dieter


* Re: v3 experimental data=ordered and logging speedups for 2.6.1
  2004-02-09 13:04     ` Dieter Nützel
@ 2004-02-09 14:14       ` Javier Marcet
  0 siblings, 0 replies; 13+ messages in thread
From: Javier Marcet @ 2004-02-09 14:14 UTC (permalink / raw)
  To: reiserfs-list
  Cc: Dieter Nützel <Dieter.Nuetzel@hamburg.de>,
	Chris Mason, green, Gerd Knorr, Jeff Mahoney

* Dieter Nützel <Dieter.Nuetzel@hamburg.de> [040209 14:00]:

>> > An EXPORT was missing in linux/fs/buffer.c to compile ReiserFS 3.x.x as a
>> > module (inode.c, unresolved symbol):

>> Thanks, I'll add it into the patch when I get back from linux world.

>More on this.

>I have now SuSE's 9.0 2.6.2-0 up and running.

>Only 01-reiserfs-journal-writer is needed now without modifications.

>02-reiserfs-nesting
>03-reiserfs-iosize
>04-reiserfs-balance_dirty

>Are all in, and these two need a _little_ hand work:

>05-reiserfs-logging
>06-reiserfs-jh-2

>Don't forget this in fs/buffer.c

>EXPORT_SYMBOL(try_to_release_page);

Of the above, only 04-reiserfs-balance_dirty is included in vanilla
2.6.2. I applied the rest and I haven't had the smallest problem.


-- 
Javier Marcet <javier@marcet.info>


* Re: v3 experimental data=ordered and logging speedups for 2.6.1
  2004-01-19 16:45 v3 experimental data=ordered and logging speedups for 2.6.1 Chris Mason
  2004-01-19 22:53 ` Dieter Nützel
  2004-01-21 15:09 ` Oleg Drokin
@ 2004-02-11 11:49 ` Oleg Drokin
  2004-02-11 14:00   ` Chris Mason
  2 siblings, 1 reply; 13+ messages in thread
From: Oleg Drokin @ 2004-02-11 11:49 UTC (permalink / raw)
  To: Chris Mason; +Cc: reiserfs-list

Hello!

On Mon, Jan 19, 2004 at 11:45:26AM -0500, Chris Mason wrote:

> ftp.suse.com.  Oleg is cc'd in case he wants to look over the changes to
> reiserfs_file_write in reiserfs-jh-2.

Ok, I finally got some time to look at it.

> 03-reiserfs-iosize
> Changes reiserfs to tell userspace the default io size is 4k.  Works around
> a bug in bdb hit by rpm users

BTW, I have been using 2.6 for a long time now, and I use rpm too; I have yet to see any
problems with the 128k default write size. Though I use Fedora Core 1 as the distro.
Anyway, there should be absolutely zero problems with bdb: I looked in their code
and they have sanity checks that do not allow the write size to be bigger than
some values that they think are safe (16K, I think).
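
For reference, the "default io size" under discussion is what userspace sees
as st_blksize from stat(2). Below is a minimal sketch of how an application
might read that value and cap it. The 16K ceiling is only the figure recalled
above, and clamp_io_size(), preferred_io_size() and MAX_SAFE_IO_SIZE are
hypothetical names for illustration, not bdb code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical ceiling; the thread only says "16K, I think". */
#define MAX_SAFE_IO_SIZE ((size_t)16 * 1024)

/* Cap a filesystem-reported preferred I/O size at the ceiling. */
static size_t clamp_io_size(size_t reported)
{
    assert(reported > 0);
    return reported > MAX_SAFE_IO_SIZE ? MAX_SAFE_IO_SIZE : reported;
}

/* Return the capped preferred I/O size for path, or 0 if stat() fails. */
static size_t preferred_io_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0 || st.st_blksize <= 0)
        return 0;
    return clamp_io_size((size_t)st.st_blksize);
}
```

With a cap like this, a filesystem reporting 128k and one reporting 4k would
both be handled within the application's own limit, which is why a corruption
tied to the reported size would be surprising.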

> 05-reiserfs-logging
> Logging speedups for small transactions and fsync heavy applications.  Most
> experimental patch of the bunch, since it changes the way the log does
> metadata writeback

> 06-reiserfs-jh-2
> Adds data=ordered support, along with a journal header attached to
> the buffer head.  This allows for more efficient data=ordered support
> than I had in 2.4.x.

I have some comments on this one, too.
Replicating __block_commit_write just to make sure you add buffer to ordered
list seems to be overkill. You can easily put buffers to some temp list at
buffer allocation time and then just add entire temp list to ordered
buffers list, I think. Also this will make handling of mmap write in the middle
of write a little bit more correct, I think. BTW, I do not see where you
specially handle mmap writes so that they are written in the correct order wrt
inode updates.
Also, I am not sure why you chose to still update sd_blocks, but not
st_size, in case of errors in reiserfs_allocate_blocks_for_region().
Perhaps it makes more sense to update both to avoid potential metadata
inconsistency.
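
The temp-list idea above can be sketched in userspace C as follows. This is a
toy model under stated assumptions: struct toy_buffer and struct toy_list are
hypothetical stand-ins, not the kernel's buffer_head or the reiserfs ordered
list, and the real code would also need locking.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a buffer head; not the kernel's struct buffer_head. */
struct toy_buffer {
    int blocknr;
    struct toy_buffer *next;
};

/* Singly linked list with a tail pointer so that a splice is O(1). */
struct toy_list {
    struct toy_buffer *head;
    struct toy_buffer *tail;
    size_t count;
};

static void toy_list_init(struct toy_list *l)
{
    l->head = NULL;
    l->tail = NULL;
    l->count = 0;
}

/* Append one buffer; in this model it is called at allocation time. */
static void toy_list_add(struct toy_list *l, struct toy_buffer *b)
{
    assert(l != NULL && b != NULL);
    b->next = NULL;
    if (l->tail)
        l->tail->next = b;
    else
        l->head = b;
    l->tail = b;
    l->count++;
}

/* Move every buffer from tmp to the end of ordered, leaving tmp empty. */
static void toy_list_splice(struct toy_list *ordered, struct toy_list *tmp)
{
    if (!tmp->head)
        return;
    if (ordered->tail)
        ordered->tail->next = tmp->head;
    else
        ordered->head = tmp->head;
    ordered->tail = tmp->tail;
    ordered->count += tmp->count;
    toy_list_init(tmp);
}
```

The point of the design is that buffers are collected once, while they are
being allocated, and then handed over in a single step, so no second pass over
the page is needed to find them.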

Also, this part of the patch to file.c ruins the attachment of the comment to
the prepared_pages definition.

     struct page * prepared_pages[REISERFS_WRITE_PAGES_AT_A_TIME];
+    struct reiserfs_transaction_handle th;
+    th.t_trans_id = 0;
                                /* To simplify coding at this time, we store
                                   locked pages in array for now */

That's about all I have noticed on the first look. ;)

Bye,
    Oleg


* Re: v3 experimental data=ordered and logging speedups for 2.6.1
  2004-02-11 11:49 ` Oleg Drokin
@ 2004-02-11 14:00   ` Chris Mason
  2004-02-11 14:26     ` Oleg Drokin
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Mason @ 2004-02-11 14:00 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: reiserfs-list

On Wed, 2004-02-11 at 06:49, Oleg Drokin wrote:
> Hello!
> 
> On Mon, Jan 19, 2004 at 11:45:26AM -0500, Chris Mason wrote:
> 
> > ftp.suse.com.  Oleg is cc'd in case he wants to look over the changes to
> > reiserfs_file_write in reiserfs-jh-2.
> 
> Ok, I finally got some time to look at it.
> 
> > 03-reiserfs-iosize
> > Changes reiserfs to tell userspace the default io size is 4k.  Works around
> > a bug in bdb hit by rpm users
> 
> BTW, I have been using 2.6 for a long time now, and I use rpm too; I have yet to see any
> problems with the 128k default write size. Though I use Fedora Core 1 as the distro.
> Anyway, there should be absolutely zero problems with bdb: I looked in their code
> and they have sanity checks that do not allow the write size to be bigger than
> some values that they think are safe (16K, I think).
> 
Well, it was easiest to trigger with older rpm versions, but you'll get
hit by it eventually.  Keep in mind the suse autobuild system runs rpm
thousands (hundreds of thousands) of times per day.  It wasn't an easy
bug to hit.

> > 05-reiserfs-logging
> > Logging speedups for small transactions and fsync heavy applications.  Most
> > experimental patch of the bunch, since it changes the way the log does
> > metadata writeback
> 
> > 06-reiserfs-jh-2
> > Adds data=ordered support, along with a journal header attached to
> > the buffer head.  This allows for more efficient data=ordered support
> > than I had in 2.4.x.
> 
> I have some comments on this one, too.
> Replicating __block_commit_write just to make sure you add buffer to ordered
> list seems to be overkill. You can easily put buffers to some temp list at
> buffer allocation time and then just add entire temp list to ordered
> buffers list, I think. 

If you use the generic __block_commit_write, you've got to update i_size
first to make sure the generic code doesn't do it.  There are a few
other tricky parts when the data=journal code is added.  We've already
made our own file_write call, it doesn't make sense to warp it just to
avoid our own __block_commit_write ;-)

> Also this will make handling of mmap write in the middle
> of write a little bit more correct, I think. BTW, I do not see where do you
> specially handle mmap writes so that they are written in correct order wrt
> inode updates.

Those changes are still only in my local tree.  I wanted to get the
basic functionality out there for testing.

> Also, I am not sure why you chose to still update sd_blocks, but not
> st_size, in case of errors in reiserfs_allocate_blocks_for_region().
> Perhaps it makes more sense to update both to avoid potential metadata
> inconsistency.
> 
I'll look at that, thanks.

> Also, this part of the patch to file.c ruins the attachment of the comment to
> the prepared_pages definition.
> 
>      struct page * prepared_pages[REISERFS_WRITE_PAGES_AT_A_TIME];
> +    struct reiserfs_transaction_handle th;
> +    th.t_trans_id = 0;
>                                 /* To simplify coding at this time, we store
>                                    locked pages in array for now */
> 
> That's about all I have noticed on the first look. ;)

Thanks, will fix.

-chris




* Re: v3 experimental data=ordered and logging speedups for 2.6.1
  2004-02-11 14:00   ` Chris Mason
@ 2004-02-11 14:26     ` Oleg Drokin
  2004-02-11 14:59       ` Chris Mason
  0 siblings, 1 reply; 13+ messages in thread
From: Oleg Drokin @ 2004-02-11 14:26 UTC (permalink / raw)
  To: Chris Mason; +Cc: reiserfs-list

Hello!

On Wed, Feb 11, 2004 at 09:00:34AM -0500, Chris Mason wrote:

> > Anyway there should be absolutely zero problems with bdb, I looked in their code
> > and they have sanity checks that do not allow write size to be bigger than
> > some values that they think are safe (16K I think).
> Well, it was easiest to trigger with older rpm versions, but you'll get
> hit by it eventually.  Keep in mind the suse autobuild system runs rpm
> thousands (hundreds of thousands) of times per day.  It wasn't an easy
> bug to hit.

What are the symptoms?

> > > 06-reiserfs-jh-2
> > > Adds data=ordered support, along with a journal header attached to
> > > the buffer head.  This allows for more efficient data=ordered support
> > > than I had in 2.4.x.
> > I have some comments on this one, too.
> > Replicating __block_commit_write just to make sure you add buffer to ordered
> > list seems to be overkill. You can easily put buffers to some temp list at
> > buffer allocation time and then just add entire temp list to ordered
> > buffers list, I think. 
> If you use the generic __block_commit_write, you've got to update i_size
> first to make sure the generic code doesn't do it.  There are a few

No.
The auto i_size update happens only if you use generic_commit_write(),
but I used block_commit_write() which does not do this.

> other tricky parts when the data=journal code is added.  We've already
> made our own file_write call, it doesn't make sense to warp it just to
> avoid our own __block_commit_write ;-)

Well, code duplication is not a very good thing.

> > Also this will make handling of mmap write in the middle
> > of write a little bit more correct, I think. BTW, I do not see where do you
> > specially handle mmap writes so that they are written in correct order wrt
> > inode updates.
> Those changes are still only in my local tree.  I wanted to get the
> basic functionality out there for testing.

Well, I need the quota part too, before I can put that on my university's file
server to get more testing ;)))

Bye,
    Oleg


* Re: v3 experimental data=ordered and logging speedups for 2.6.1
  2004-02-11 14:26     ` Oleg Drokin
@ 2004-02-11 14:59       ` Chris Mason
  2004-02-11 15:09         ` Oleg Drokin
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Mason @ 2004-02-11 14:59 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: reiserfs-list

On Wed, 2004-02-11 at 09:26, Oleg Drokin wrote:
> Hello!
> 
> On Wed, Feb 11, 2004 at 09:00:34AM -0500, Chris Mason wrote:
> 
> > > Anyway there should be absolutely zero problems with bdb, I looked in their code
> > > and they have sanity checks that do not allow write size to be bigger than
> > > some values that they think are safe (16K I think).
> > Well, it was easiest to trigger with older rpm versions, but you'll get
> > hit by it eventually.  Keep in mind the suse autobuild system runs rpm
> > thousands (hundreds of thousands) of times per day.  It wasn't an easy
> > bug to hit.
> 
> What are the symptoms?

The rpm database is corrupted, rpm --rebuild-db is required.

> 
> > > > 06-reiserfs-jh-2
> > > > Adds data=ordered support, along with a journal header attached to
> > > > the buffer head.  This allows for more efficient data=ordered support
> > > > than I had in 2.4.x.
> > > I have some comments on this one, too.
> > > Replicating __block_commit_write just to make sure you add buffer to ordered
> > > list seems to be overkill. You can easily put buffers to some temp list at
> > > buffer allocation time and then just add entire temp list to ordered
> > > buffers list, I think. 
> > If you use the generic __block_commit_write, you've got to update i_size
> > first to make sure the generic code doesn't do it.  There are a few
> 
> No.
> The auto i_size update happens only if you use generic_commit_write(),
> but I used block_commit_write() which does not do this.
> 
Hmmm, I need more coffee ;-)

> > other tricky parts when the data=journal code is added.  We've already
> > made our own file_write call, it doesn't make sense to warp it just to
> > avoid our own __block_commit_write ;-)
> 
> Well, code duplication is not a very good thing.

It depends on how much you have to twist things to use the generic
code.  If we used __block_commit_write, buffers would be marked dirty
when it completes.  This won't work for data=journal at all, we don't
want them marked dirty.
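
A toy model of that difference, as a rough sketch only: the names below
(toy_data_mode, toy_buffer_state, toy_commit_buffer) are hypothetical and do
not come from the patches. It just shows a commit step that marks a buffer
dirty for data=ordered but hands it to the log, still clean, for data=journal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mode switch standing in for the mount option. */
enum toy_data_mode { TOY_DATA_ORDERED, TOY_DATA_JOURNAL };

struct toy_buffer_state {
    bool dirty;   /* would be written in place by normal writeback */
    bool logged;  /* contents are carried by the journal instead */
};

/* Finish a write to one buffer according to the data mode. */
static void toy_commit_buffer(struct toy_buffer_state *b,
                              enum toy_data_mode mode)
{
    assert(b != NULL);
    if (mode == TOY_DATA_JOURNAL) {
        /* The log owns the data, so the buffer must stay clean. */
        b->logged = true;
        b->dirty = false;
    } else {
        /* Ordered mode: mark dirty, to be flushed before the commit. */
        b->dirty = true;
        b->logged = false;
    }
}
```

A generic helper that always takes the second branch cannot serve the first
case without being undone afterwards, which is the twisting referred to above.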

> > > Also this will make handling of mmap write in the middle
> > > of write a little bit more correct, I think. BTW, I do not see where do you
> > > specially handle mmap writes so that they are written in correct order wrt
> > > inode updates.
> > Those changes are still only in my local tree.  I wanted to get the
> > basic functionality out there for testing.
> 
> Well, I need the quota part too, before I can put that on my university's file
> server to get more testing ;)))

getting there ;)

-chris




* Re: v3 experimental data=ordered and logging speedups for 2.6.1
  2004-02-11 14:59       ` Chris Mason
@ 2004-02-11 15:09         ` Oleg Drokin
  2004-02-12 14:18           ` Chris Mason
  0 siblings, 1 reply; 13+ messages in thread
From: Oleg Drokin @ 2004-02-11 15:09 UTC (permalink / raw)
  To: Chris Mason; +Cc: reiserfs-list

Hello!

On Wed, Feb 11, 2004 at 09:59:31AM -0500, Chris Mason wrote:
> > > thousands (hundreds of thousands) of times per day.  It wasn't an easy
> > > bug to hit.
> > What are the symptoms?
> The rpm database is corrupted, rpm --rebuild-db is required.

Hm, that's really strange. But on ia64 the default io size is 64k; do they have
the same problems there?

> > > other tricky parts when the data=journal code is added.  We've already
> > > made our own file_write call, it doesn't make sense to warp it just to
> > > avoid our own __block_commit_write ;-)
> > Well, code duplication is not a very good thing.
> It depends on how much you have to twist things to use the generic
> code.  If we used __block_commit_write, buffers would be marked dirty
> when it completes.  This won't work for data=journal at all, we don't
> want them marked dirty.

Well, if you do not want them marked dirty, you just do not need to call
commit_write at all, since the only thing it does is mark buffers dirty ;)
And you can have a list of buffers at the allocation time anyway,
so no need to do extra checks about partial page writes and so on since
all these checks were already done.

Bye,
    Oleg


* Re: v3 experimental data=ordered and logging speedups for 2.6.1
  2004-02-11 15:09         ` Oleg Drokin
@ 2004-02-12 14:18           ` Chris Mason
  0 siblings, 0 replies; 13+ messages in thread
From: Chris Mason @ 2004-02-12 14:18 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: reiserfs-list

On Wed, 2004-02-11 at 10:09, Oleg Drokin wrote:
> Hello!
> 
> On Wed, Feb 11, 2004 at 09:59:31AM -0500, Chris Mason wrote:
> > > > thousands (hundreds of thousands) of times per day.  It wasn't an easy
> > > > bug to hit.
> > > What are the symptoms?
> > The rpm database is corrupted, rpm --rebuild-db is required.
> 
> Hm, that's really strange. But on ia64 default io size is 64k, do they have
> same problems there?

No, ia64 works fine.  It really is strange.

> 
> > > > other tricky parts when the data=journal code is added.  We've already
> > > > made our own file_write call, it doesn't make sense to warp it just to
> > > > avoid our own __block_commit_write ;-)
> > > Well, code duplication is not a very good thing.
> > It depends on how much you have to twist things to use the generic
> > code.  If we used __block_commit_write, buffers would be marked dirty
> > when it completes.  This won't work for data=journal at all, we don't
> > want them marked dirty.
> 
> Well, if you do not want them marked dirty, you just do not need to call
> commit_write at all since the only thing it does is marking buffers dirty ;)
> And you can have a list of buffers at the allocation time anyway,
> so no need to do extra checks about partial page writes and so on since
> all these checks were already done.

Interesting.  I'll look harder at that.

-chris




end of thread, other threads:[~2004-02-12 14:18 UTC | newest]

Thread overview: 13+ messages
2004-01-19 16:45 v3 experimental data=ordered and logging speedups for 2.6.1 Chris Mason
2004-01-19 22:53 ` Dieter Nützel
2004-01-19 22:54   ` Mike Fedyk
2004-01-21  1:50   ` Chris Mason
2004-02-09 13:04     ` Dieter Nützel
2004-02-09 14:14       ` Javier Marcet
2004-01-21 15:09 ` Oleg Drokin
2004-02-11 11:49 ` Oleg Drokin
2004-02-11 14:00   ` Chris Mason
2004-02-11 14:26     ` Oleg Drokin
2004-02-11 14:59       ` Chris Mason
2004-02-11 15:09         ` Oleg Drokin
2004-02-12 14:18           ` Chris Mason
