linux-kernel.vger.kernel.org archive mirror
* Test12 ll_rw_block error.
@ 2000-12-15  2:02 Russell Cattelan
  2000-12-15  2:48 ` Linus Torvalds
  0 siblings, 1 reply; 25+ messages in thread
From: Russell Cattelan @ 2000-12-15  2:02 UTC (permalink / raw)
  To: linux-kernel

This would seem to be an error on the part of ll_rw_block.
Setting b_end_io to a default handler without checking whether
a callback has already been defined defeats the purpose of having
a function op.

void ll_rw_block(int rw, int nr, struct buffer_head * bhs[])
 {
@@ -928,7 +1046,8 @@
                if (test_and_set_bit(BH_Lock, &bh->b_state))
                        continue;

-               set_bit(BH_Req, &bh->b_state);
+               /* We have the buffer lock */
+               bh->b_end_io = end_buffer_io_sync;

                switch(rw) {
                case WRITE:


-Russell Cattelan


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: Test12 ll_rw_block error.
  2000-12-15  2:02 Test12 ll_rw_block error Russell Cattelan
@ 2000-12-15  2:48 ` Linus Torvalds
  0 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2000-12-15  2:48 UTC (permalink / raw)
  To: linux-kernel

In article <3A397BA9.CB0EC8E5@thebarn.com>,
Russell Cattelan  <cattelan@thebarn.com> wrote:
>This would seem to be an error on the part of ll_rw_block.
>Setting b_end_io to a default handler without checking whether
>a callback has already been defined defeats the purpose of having
>a function op.

No.

It just means that if you have your own function op, you had better not
call "ll_rw_block()".

The problem is that as it was done before, people would set the function
op without actually holding the buffer lock.

Which meant that you could have two unrelated users setting the function
op to two different things, and it would be only a matter of the purest
luck which one happened to "win".

If you want to set your own end-operation, you now need to lock the
buffer _first_, then set "b_end_io" to your operation, and then do a
"submit_bh()". You cannot use ll_rw_block().

Yes, this is different than before. Sorry about that.

But yes, this way actually happens to work reliably.

		Linus

* Re: Test12 ll_rw_block error.
  2000-12-19 16:43             ` Daniel Phillips
  2000-12-19 17:18               ` Marcelo Tosatti
@ 2000-12-19 17:40               ` Christoph Hellwig
  1 sibling, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2000-12-19 17:40 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Marcelo Tosatti, linux-kernel

In article <3A3F904F.8FBC46A5@innominate.de> you wrote:
>> I think the semantics of the filesystem specific ->flush and ->writepage
>> are not the same.
>> 
>> Is it ok for filesystem specific writepage() code to sync other "physically
>> contiguous" dirty pages with reference to the one requested by
>> writepage()?
>> 
>> If so, it can do the same job as the ->flush() idea we've been discussing.
>
> Except that for ->writepage you don't have the option of *not* writing
> the specified page.

In current -test13pre you have that option.

	Christoph

-- 
Whip me.  Beat me.  Make me maintain AIX.

* Re: Test12 ll_rw_block error.
  2000-12-19 16:43             ` Daniel Phillips
@ 2000-12-19 17:18               ` Marcelo Tosatti
  2000-12-19 17:40               ` Christoph Hellwig
  1 sibling, 0 replies; 25+ messages in thread
From: Marcelo Tosatti @ 2000-12-19 17:18 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel


On Tue, 19 Dec 2000, Daniel Phillips wrote:

> Marcelo Tosatti wrote:
> > 
> > On Mon, 18 Dec 2000, Stephen C. Tweedie wrote:
> > 
> > > Hi,
> > >
> > > On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote:
> > > > On Fri, 15 Dec 2000, Stephen C. Tweedie wrote:
> > > >
> > > > Stephen,
> > > >
> > > > The ->flush() operation (which we've been discussing a bit) would be very
> > > > useful now (mainly for XFS).
> > > >
> > > > At page_launder(), we can call ->flush() if the given page has it defined.
> > > > Otherwise use try_to_free_buffers() as we do now for filesystems which
> > > > don't care about the special flushing treatment.
> > >
> > > As of 2.4.0test12, page_launder() will already call the
> > > per-address-space writepage() operation for dirty pages.  Do you need
> > > something similar for clean pages too, or does Linus's new laundry
> > > code give you what you need now?
> > 
> > I think the semantics of the filesystem specific ->flush and ->writepage
> > are not the same.
> > 
> > Is it ok for filesystem specific writepage() code to sync other "physically
> > contiguous" dirty pages with reference to the one requested by
> > writepage()?
> > 
> > If so, it can do the same job as the ->flush() idea we've been discussing.
> 
> Except that for ->writepage you don't have the option of *not* writing
> the specified page.

It does have that option.

Both the swapper writepage() operation and shm_writepage() may be unable
to write the page.
 


* Re: Test12 ll_rw_block error.
  2000-12-19 14:05           ` Marcelo Tosatti
@ 2000-12-19 16:43             ` Daniel Phillips
  2000-12-19 17:18               ` Marcelo Tosatti
  2000-12-19 17:40               ` Christoph Hellwig
  0 siblings, 2 replies; 25+ messages in thread
From: Daniel Phillips @ 2000-12-19 16:43 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel

Marcelo Tosatti wrote:
> 
> On Mon, 18 Dec 2000, Stephen C. Tweedie wrote:
> 
> > Hi,
> >
> > On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote:
> > > On Fri, 15 Dec 2000, Stephen C. Tweedie wrote:
> > >
> > > Stephen,
> > >
> > > The ->flush() operation (which we've been discussing a bit) would be very
> > > useful now (mainly for XFS).
> > >
> > > At page_launder(), we can call ->flush() if the given page has it defined.
> > > Otherwise use try_to_free_buffers() as we do now for filesystems which
> > don't care about the special flushing treatment.
> >
> > As of 2.4.0test12, page_launder() will already call the
> > per-address-space writepage() operation for dirty pages.  Do you need
> > something similar for clean pages too, or does Linus's new laundry
> > code give you what you need now?
> 
> I think the semantics of the filesystem specific ->flush and ->writepage
> are not the same.
> 
> Is it ok for filesystem specific writepage() code to sync other "physically
> contiguous" dirty pages with reference to the one requested by
> writepage()?
> 
> If so, it can do the same job as the ->flush() idea we've been discussing.

Except that for ->writepage you don't have the option of *not* writing
the specified page.

--
Daniel

* Re: Test12 ll_rw_block error.
  2000-12-18 11:46         ` Stephen C. Tweedie
@ 2000-12-19 14:05           ` Marcelo Tosatti
  2000-12-19 16:43             ` Daniel Phillips
  0 siblings, 1 reply; 25+ messages in thread
From: Marcelo Tosatti @ 2000-12-19 14:05 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Alexander Viro, Linus Torvalds, Russell Cattelan, linux-kernel


On Mon, 18 Dec 2000, Stephen C. Tweedie wrote:

> Hi,
> 
> On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote:
> > On Fri, 15 Dec 2000, Stephen C. Tweedie wrote:
> > 
> > Stephen,
> > 
> > The ->flush() operation (which we've been discussing a bit) would be very
> > useful now (mainly for XFS).
> > 
> > At page_launder(), we can call ->flush() if the given page has it defined.
> > Otherwise use try_to_free_buffers() as we do now for filesystems which
> > don't care about the special flushing treatment.
> 
> As of 2.4.0test12, page_launder() will already call the
> per-address-space writepage() operation for dirty pages.  Do you need
> something similar for clean pages too, or does Linus's new laundry
> code give you what you need now?

I think the semantics of the filesystem specific ->flush and ->writepage
are not the same.

Is it ok for filesystem specific writepage() code to sync other "physically
contiguous" dirty pages with reference to the one requested by
writepage()?

If so, it can do the same job as the ->flush() idea we've been discussing.




* Re: Test12 ll_rw_block error.
  2000-12-17  2:38       ` Marcelo Tosatti
@ 2000-12-18 11:46         ` Stephen C. Tweedie
  2000-12-19 14:05           ` Marcelo Tosatti
  0 siblings, 1 reply; 25+ messages in thread
From: Stephen C. Tweedie @ 2000-12-18 11:46 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Stephen C. Tweedie, Alexander Viro, Linus Torvalds,
	Russell Cattelan, linux-kernel

Hi,

On Sun, Dec 17, 2000 at 12:38:17AM -0200, Marcelo Tosatti wrote:
> On Fri, 15 Dec 2000, Stephen C. Tweedie wrote:
> 
> Stephen,
> 
> The ->flush() operation (which we've been discussing a bit) would be very
> useful now (mainly for XFS).
> 
> At page_launder(), we can call ->flush() if the given page has it defined.
> Otherwise use try_to_free_buffers() as we do now for filesystems which
> don't care about the special flushing treatment.

As of 2.4.0test12, page_launder() will already call the
per-address-space writepage() operation for dirty pages.  Do you need
something similar for clean pages too, or does Linus's new laundry
code give you what you need now?

Cheers,
 Stephen

* Re: Test12 ll_rw_block error.
  2000-12-17  1:08       ` Russell Cattelan
@ 2000-12-18 11:44         ` Stephen C. Tweedie
  0 siblings, 0 replies; 25+ messages in thread
From: Stephen C. Tweedie @ 2000-12-18 11:44 UTC (permalink / raw)
  To: Russell Cattelan
  Cc: Stephen C. Tweedie, Alexander Viro, Linus Torvalds, linux-kernel

Hi,

On Sat, Dec 16, 2000 at 07:08:02PM -0600, Russell Cattelan wrote:
> > There is a very clean way of doing this with address spaces.  It's
> > something I would like to see done properly for 2.5: eliminate all
> > knowledge of buffer_heads from the VM layer.  It would be pretty
> > simple to remove page->buffers completely and replace it with a
> > page->private pointer, owned by whatever address_space controlled the
> > page.  Instead of trying to unmap and flush buffers on the page
> > directly, these operations would become address_space operations.
> 
> Yes, this is a lot of what page_buf would like to do eventually:
> have the VM system pressure page_buf for pages, which would
> then be able to intelligently call the file system to free up cached pages.
> A big part of getting delayed allocation to not completely consume all the
> system pages is being told when it's time to start really allocating disk
> space and pushing pages out.

Delayed allocation is actually much easier, since it's entirely an
operation on logical page addresses, not physical ones --- by
definition you don't have any buffer_heads yet because you haven't
decided on the disk blocks.  If you're just dealing with pages, not
blocks, then the address_space is the natural way of dealing with it
already.  

Only the full semantics of the flush callback have been missing to
date, and with 2.4.0-test12 even that is mostly solved, since
page_launder will give you the writeback() callbacks you need to flush
things to disk when you start getting memory pressure.  You can even
treat the writepage() as an advisory call.  

--Stephen

* Re: Test12 ll_rw_block error.
  2000-12-17  0:54       ` Russell Cattelan
@ 2000-12-17 12:20         ` Chris Mason
  0 siblings, 0 replies; 25+ messages in thread
From: Chris Mason @ 2000-12-17 12:20 UTC (permalink / raw)
  To: Russell Cattelan
  Cc: Alexander Viro, Linus Torvalds, Stephen C. Tweedie, linux-kernel



On Sat, 16 Dec 2000, Russell Cattelan wrote:
> >
> I'm curious about this.
> Does this mean reiserfs is doing all of its own buffer management?
> 
> This would seem a little redundant with what is already in the kernel.
> 
For metadata only, reiserfs does its own write management.  The buffers
come from getblk; we just don't mark them dirty for flushing by
flush_dirty_buffers().

This has the advantage of avoiding races against bdflush and friends, and
makes it easier to keep track of which buffers have actually made their
way to disk.  It has all of the obvious disadvantages with respect to
memory pressure.

-chris


* Re: Test12 ll_rw_block error.
  2000-12-15 10:51     ` Stephen C. Tweedie
  2000-12-15 19:04       ` Linus Torvalds
  2000-12-17  1:08       ` Russell Cattelan
@ 2000-12-17  2:38       ` Marcelo Tosatti
  2000-12-18 11:46         ` Stephen C. Tweedie
  2 siblings, 1 reply; 25+ messages in thread
From: Marcelo Tosatti @ 2000-12-17  2:38 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Alexander Viro, Linus Torvalds, Russell Cattelan, linux-kernel


On Fri, 15 Dec 2000, Stephen C. Tweedie wrote:

> Hi,
> 
> On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote:
> > On Thu, 14 Dec 2000, Linus Torvalds wrote:
> > 
> > Just one: any fs that really cares about completion callback is very likely
> > to be picky about the requests ordering. So sync_buffers() is very unlikely
> > to be useful anyway.
> > 
> > In that sense we really don't have anonymous buffers here. I seriously
> > suspect that "unrealistic" assumption is not unrealistic at all. I'm
> > not sufficiently familiar with XFS code to say for sure, but...
> 
> Right.  ext3 and reiserfs just want to submit their own IOs when it
> comes to the journal.  (At least in ext3, already-journaled buffers
> can be written back by the VM freely.)  It's a matter of telling the
> fs when that should start.
> 
> > What we really need is a way for VFS/VM to pass the pressure on filesystem.
> > That's it. If fs wants unusual completions for requests - let it have its
> > own queueing mechanism and submit these requests when it finds that convenient.
> 
> There is a very clean way of doing this with address spaces.  It's
> something I would like to see done properly for 2.5: eliminate all
> knowledge of buffer_heads from the VM layer.  It would be pretty
> simple to remove page->buffers completely and replace it with a
> page->private pointer, owned by whatever address_space controlled the
> page.  Instead of trying to unmap and flush buffers on the page
> directly, these operations would become address_space operations.
> 
> We could still provide the standard try_to_free_buffers() and
> unmap_underlying_metadata() functions to operate on the per-page
> buffer_head lists, and existing filesystems would only have to point
> their address_space "private metadata" operations at the generic
> functions.  However, a filesystem which had additional ordering
> constraints could then intercept the flush or writeback calls from the
> VM and decide on its own how best to honour the VM pressure.

Stephen,

The ->flush() operation (which we've been discussing a bit) would be very
useful now (mainly for XFS).

At page_launder(), we can call ->flush() if the given page has it defined.
Otherwise use try_to_free_buffers() as we do now for filesystems which
don't care about the special flushing treatment.




* Re: Test12 ll_rw_block error.
  2000-12-15 10:51     ` Stephen C. Tweedie
  2000-12-15 19:04       ` Linus Torvalds
@ 2000-12-17  1:08       ` Russell Cattelan
  2000-12-18 11:44         ` Stephen C. Tweedie
  2000-12-17  2:38       ` Marcelo Tosatti
  2 siblings, 1 reply; 25+ messages in thread
From: Russell Cattelan @ 2000-12-17  1:08 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Alexander Viro, Linus Torvalds, linux-kernel

"Stephen C. Tweedie" wrote:

> Hi,
>
> On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote:
> > On Thu, 14 Dec 2000, Linus Torvalds wrote:
> >
> > Just one: any fs that really cares about completion callback is very likely
> > to be picky about the requests ordering. So sync_buffers() is very unlikely
> > to be useful anyway.
> >
> > In that sense we really don't have anonymous buffers here. I seriously
> > suspect that "unrealistic" assumption is not unrealistic at all. I'm
> > not sufficiently familiar with XFS code to say for sure, but...
>
> Right.  ext3 and reiserfs just want to submit their own IOs when it
> comes to the journal.  (At least in ext3, already-journaled buffers
> can be written back by the VM freely.)  It's a matter of telling the
> fs when that should start.
>
> > What we really need is a way for VFS/VM to pass the pressure on filesystem.
> > That's it. If fs wants unusual completions for requests - let it have its
> > own queueing mechanism and submit these requests when it finds that convenient.
>
> There is a very clean way of doing this with address spaces.  It's
> something I would like to see done properly for 2.5: eliminate all
> knowledge of buffer_heads from the VM layer.  It would be pretty
> simple to remove page->buffers completely and replace it with a
> page->private pointer, owned by whatever address_space controlled the
> page.  Instead of trying to unmap and flush buffers on the page
> directly, these operations would become address_space operations.

Yes, this is a lot of what page_buf would like to do eventually:
have the VM system pressure page_buf for pages, which would
then be able to intelligently call the file system to free up cached pages.
A big part of getting delayed allocation to not completely consume all the
system pages is being told when it's time to start really allocating disk
space and pushing pages out.


>
>
> We could still provide the standard try_to_free_buffers() and
> unmap_underlying_metadata() functions to operate on the per-page
> buffer_head lists, and existing filesystems would only have to point
> their address_space "private metadata" operations at the generic
> functions.  However, a filesystem which had additional ordering
> constraints could then intercept the flush or writeback calls from the
> VM and decide on its own how best to honour the VM pressure.
>
> This even works both for hashed and unhashed anonymous buffers, *if*
> you allow the filesystem to attach all of its hashed buffers buffers
> to an address_space of its own rather than having the buffer cache
> pool all such buffers together.
>
> --Stephen

--
Russell Cattelan
cattelan@thebarn.com




* Re: Test12 ll_rw_block error.
  2000-12-15  9:14     ` Chris Mason
@ 2000-12-17  0:54       ` Russell Cattelan
  2000-12-17 12:20         ` Chris Mason
  0 siblings, 1 reply; 25+ messages in thread
From: Russell Cattelan @ 2000-12-17  0:54 UTC (permalink / raw)
  To: Chris Mason
  Cc: Alexander Viro, Linus Torvalds, Stephen C. Tweedie, linux-kernel

Chris Mason wrote:

> On Fri, 15 Dec 2000, Alexander Viro wrote:
>
> > Just one: any fs that really cares about completion callback is very likely
> > to be picky about the requests ordering. So sync_buffers() is very unlikely
> > to be useful anyway.
> >
> Somewhat.  I guess there are at least two ways to do it.  First flush the
> buffers where ordering matters (log blocks), then send the others onto the
> dirty list (general metadata).  You might have your own end_io for those, and
> sync_buffers would lose it.
>
> Second way (reiserfs recently changed to this method) is to do all the
> flushing yourself, and remove the need for an end_io call back.
>
I'm curious about this.
Does this mean reiserfs is doing all of its own buffer management?

This would seem a little redundant with what is already in the kernel.

>
>
> > In that sense we really don't have anonymous buffers here. I seriously
> > suspect that "unrealistic" assumption is not unrealistic at all. I'm
> > not sufficiently familiar with XFS code to say for sure, but...
> >
> > What we really need is a way for VFS/VM to pass the pressure on filesystem.
> > That's it. If fs wants unusual completions for requests - let it have its
> > own queueing mechanism and submit these requests when it finds that convenient.
> >
> Yes, this is exactly what we've discussed.
>
> -chris

--
Russell Cattelan
cattelan@thebarn.com




* Re: Test12 ll_rw_block error.
  2000-12-15  7:00   ` Alexander Viro
  2000-12-15  9:14     ` Chris Mason
  2000-12-15 10:51     ` Stephen C. Tweedie
@ 2000-12-17  0:51     ` Russell Cattelan
  2 siblings, 0 replies; 25+ messages in thread
From: Russell Cattelan @ 2000-12-17  0:51 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, Stephen C. Tweedie, linux-kernel

Alexander Viro wrote:

> On Thu, 14 Dec 2000, Linus Torvalds wrote:
>
> > Good point.
> >
> > This actually looks fairly nasty to fix. The obvious fix would be to not
> > put such buffers on the dirty list at all, and instead rely on the VM
> > layer calling "writepage()" when it wants to push out the pages.
> > That would be the nice behaviour from a VM standpoint.
> >
> > However, that assumes that you don't have any "anonymous" buffers, which
> > is probably an unrealistic assumption.
> >
> > The problem is that we don't have any per-buffer "writebuffer()" function,
> > the way we have them per-page. It was never needed for any of the normal
> > filesystems, and XFS just happened to be able to take advantage of the
> > b_end_io behaviour.
> >
> > Suggestions welcome.
>
> Just one: any fs that really cares about completion callback is very likely
> to be picky about the requests ordering. So sync_buffers() is very unlikely
> to be useful anyway.

Actually no, that's not the issue.

The XFS log uses an LSN (Log Sequence Number) to keep track of log write ordering.
Sync IO on each log buffer isn't realistic; the performance hit would be too great.

I wasn't around when most of XFS was developed, but from what I understand it
was discovered early on that firing off writes in a particular order doesn't
guarantee that they will finish in that order.  Thus the implementation of a
sequence number for each log write.


One of the obstacles we ran into early on in the Linux port was the fact that
Linux used fixed-size IO requests to any given device, but most of XFS's
metadata structures vary in size in multiples of 512 bytes.

We were also implementing a page caching / clustering layer called
page_buf, which understands primarily pages and not necessarily
disk blocks. If your FS block size happens to match your page size then things
are good, but often it doesn't.
So we added a bitmap field to the page structure.
Each bit then represents one basic block (512 bytes, for all practical purposes).

The end_io functions XFS defines update the correct bit, or the whole bit array
if the whole page is valid, thus signaling the rest of page_buf that the IO has
completed.

Ok, there is a lot more to it than what I've just described, but you probably
get the idea.


>
>
> In that sense we really don't have anonymous buffers here. I seriously
> suspect that "unrealistic" assumption is not unrealistic at all. I'm
> not sufficiently familiar with XFS code to say for sure, but...
>
> What we really need is a way for VFS/VM to pass the pressure on filesystem.
> That's it. If fs wants unusual completions for requests - let it have its
> own queueing mechanism and submit these requests when it finds that convenient.
>
> Stephen, you probably already thought about that area. Could you comment on
> that?
>                                                                 Cheers,
>                                                                         Al

--
Russell Cattelan
cattelan@thebarn.com




* Re: Test12 ll_rw_block error.
  2000-12-15  6:20 ` Linus Torvalds
  2000-12-15  7:00   ` Alexander Viro
@ 2000-12-17  0:21   ` Russell Cattelan
  1 sibling, 0 replies; 25+ messages in thread
From: Russell Cattelan @ 2000-12-17  0:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds wrote:

> On Thu, 14 Dec 2000, Russell Cattelan wrote:
> >
> > Ok one more wrinkle.
> > sync_buffers calls ll_rw_block, so this is going to have the same problem as
> > calling ll_rw_block directly.
>
> Good point.
>
> This actually looks fairly nasty to fix. The obvious fix would be to not
> put such buffers on the dirty list at all, and instead rely on the VM
> layer calling "writepage()" when it wants to push out the pages.
> That would be the nice behaviour from a VM standpoint.
>
> However, that assumes that you don't have any "anonymous" buffers, which
> is probably an unrealistic assumption.
>
> The problem is that we don't have any per-buffer "writebuffer()" function,
> the way we have them per-page. It was never needed for any of the normal
> filesystems, and XFS just happened to be able to take advantage of the
> b_end_io behaviour.
>
> Suggestions welcome.
>
>                 Linus

Ok, after a bit of trial and error I do have something working.
I wouldn't call it the most elegant solution, but it does work
and it isn't very intrusive.

#define BH_End_io  7    /* End io function defined; don't remap it */

                /* don't change the callback if somebody explicitly set it */

                if (!test_bit(BH_End_io, &bh->b_state)) {
                        bh->b_end_io = end_buffer_io_sync;
                }

What I've done, in the XFS buffer_head setup functions, is set the initial
value of b_state to BH_Locked and BH_End_io, set the callback function and
the rest of the relevant fields, and then unlock the buffer.

The only other quick fix that comes to mind is to change sync_buffers to use
submit_bh rather than ll_rw_block.

--
Russell Cattelan
cattelan@thebarn.com




* Re: Test12 ll_rw_block error.
  2000-12-16 19:07     ` Linus Torvalds
@ 2000-12-16 19:35       ` Chris Mason
  0 siblings, 0 replies; 25+ messages in thread
From: Chris Mason @ 2000-12-16 19:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff Chua, linux-kernel



On Sat, 16 Dec 2000, Linus Torvalds wrote:

> Your patch looks fine, although I'd personally prefer this one even more:
> 
Yes, that looks better and works here.

-chris


* Re: Test12 ll_rw_block error.
  2000-12-16 17:13   ` Chris Mason
@ 2000-12-16 19:07     ` Linus Torvalds
  2000-12-16 19:35       ` Chris Mason
  0 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2000-12-16 19:07 UTC (permalink / raw)
  To: Chris Mason; +Cc: Jeff Chua, linux-kernel



On Sat, 16 Dec 2000, Chris Mason wrote:
> 
> In other words, after calling reiserfs_get_block, the buffer might be
> mapped and uptodate, with no i/o required in block_read_full_page.
> 
> The following patch to block_read_full_page fixes things for me, and seems
> like a good idea in general.  It might be better to apply something
> similar to submit_bh instead...comments?

Making the change to submit_bh() would make it look more like the old
ll_rw_block() in this regard, but at the same time I really don't like the
old ll_rw_block() interface that knew about semantics.

Your patch looks fine, although I'd personally prefer this one even more:

	fs/buffer.c patch cut-and-paste:
	+++ fs/buffer.c Sat Dec 16 11:02:44 2000
	@@ -1700,6 +1693,9 @@
	                                set_bit(BH_Uptodate, &bh->b_state);
	                                continue;
	                        }
	+                       /* get_block() might have updated the buffer synchronously */
	+                       if (buffer_uptodate(bh))
	+                               continue;
	                }
	 
	                arr[nr] = bh;

which makes it explicit about how we could have suddenly gotten an
up-to-date buffer. Does that work for you?

		Linus


* Re: Test12 ll_rw_block error.
  2000-12-16  1:12 ` Linus Torvalds
@ 2000-12-16 17:13   ` Chris Mason
  2000-12-16 19:07     ` Linus Torvalds
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Mason @ 2000-12-16 17:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff Chua, linux-kernel

On Fri, 15 Dec 2000, Linus Torvalds wrote:

[ writepage for anon buffers ]

> 
> It might be 10 lines of change, and obviously correct.
> 
I'll give this a try; it will be interesting regardless of whether it is
simple enough for kernel inclusion.

On a related note, I hit a snag porting reiserfs into test12, where
block_read_full_page assumes the buffer it gets back from get_block won't
be up to date.  When reiserfs reads a packed tail directly into the page,
reading the newly mapped buffer isn't required, and is actually a bad
idea, since the packed tails have a block number of 0 when copied into
the page cache.

In other words, after calling reiserfs_get_block, the buffer might be
mapped and uptodate, with no i/o required in block_read_full_page.

The following patch to block_read_full_page fixes things for me, and seems
like a good idea in general.  It might be better to apply something
similar to submit_bh instead...comments?

-chris

--- linux-test12/fs/buffer.c	Mon Dec 18 11:37:42 2000
+++ linux/fs/buffer.c	Mon Dec 18 11:38:36 2000
@@ -1706,8 +1706,10 @@
 			}
 		}
 
-		arr[nr] = bh;
-		nr++;
+		if (!buffer_uptodate(bh)) {
+			arr[nr] = bh;
+			nr++;
+	        }
 	} while (i++, iblock++, (bh = bh->b_this_page) != head);
 
 	if (!nr) {


* Re: Test12 ll_rw_block error.
  2000-12-16  0:58 Jeff Chua
@ 2000-12-16  1:12 ` Linus Torvalds
  2000-12-16 17:13   ` Chris Mason
  0 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2000-12-16  1:12 UTC (permalink / raw)
  To: Jeff Chua; +Cc: linux-kernel



On Sat, 16 Dec 2000, Jeff Chua wrote:
>
> > Now, I also agree that we should be able to clean this up properly for
> > 2.5.x, and actually do exactly this for the anonymous buffers, so that
> > the VM no longer needs to worry about buffer knowledge, and fs/buffer.c
> > becomes just another user of the writepage functionality.  That is not
> > all that hard to do, it mainly just requires some small changes to how
> 
> Why not incorporate this change into 2.4.x?

It might be 10 lines of change, and obviously correct.

And it might not be. If somebody wants to try out the DirtyPage approach
for buffer handling, please do so. I'll apply it if it _does_ turn out to
be as small as I suspect it might be, and if the code is straightforward
and obvious.

If not, we're better off leaving it for 2.5.x

		Linus


* Re: Test12 ll_rw_block error.
@ 2000-12-16  0:58 Jeff Chua
  2000-12-16  1:12 ` Linus Torvalds
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Chua @ 2000-12-16  0:58 UTC (permalink / raw)
  To: linux-kernel, torvalds

> Now, I also agree that we should be able to clean this up properly for
> 2.5.x, and actually do exactly this for the anonymous buffers, so that
> the VM no longer needs to worry about buffer knowledge, and fs/buffer.c
> becomes just another user of the writepage functionality.  That is not
> all that hard to do, it mainly just requires some small changes to how

Why not incorporate this change into 2.4.x?

Jeff

* Re: Test12 ll_rw_block error.
  2000-12-15 10:51     ` Stephen C. Tweedie
@ 2000-12-15 19:04       ` Linus Torvalds
  2000-12-17  1:08       ` Russell Cattelan
  2000-12-17  2:38       ` Marcelo Tosatti
  2 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2000-12-15 19:04 UTC (permalink / raw)
  To: linux-kernel

In article <20001215105148.E11931@redhat.com>,
Stephen C. Tweedie <sct@redhat.com> wrote:
>
>> What we really need is a way for VFS/VM to pass the pressure on filesystem.
>> That's it. If fs wants unusual completions for requests - let it have its
>> own queueing mechanism and submit these requests when it finds that convenient.
>
>There is a very clean way of doing this with address spaces.  It's
>something I would like to see done properly for 2.5: eliminate all
>knowledge of buffer_heads from the VM layer.

Note that you should be able to already get this effect with the current
2.4.0 tree.

The way to get the VM to ignore your buffer heads is to never mark the
buffers dirty, and instead mark the page they are on dirty along with
giving it a mapping (you can have a special per-superblock
"metadata-mapping" for stuff that isn't actually associated with any
particular file).

That way, the VM will just call writepage() for you, and you can use
that to schedule your writeouts (you don't actually need to write the
page the VM _asks_ you to write - you can just mark it dirty again and
consider the writepage to be mainly a VM pressure indicator).

Now, I also agree that we should be able to clean this up properly for
2.5.x, and actually do exactly this for the anonymous buffers, so that
the VM no longer needs to worry about buffer knowledge, and fs/buffer.c
becomes just another user of the writepage functionality.  That is not
all that hard to do, it mainly just requires some small changes to how
"mark_buffer_dirty()" works (ie it would also mark the page dirty, so
that the VM layer would know to call "writepage()"). 

I really think almost all of the VM infrastructure for this is in place,
the PageDirty code has both simplified the VM enormously and made it a
lot more powerful at the same time.

			Linus

* Re: Test12 ll_rw_block error.
  2000-12-15  7:00   ` Alexander Viro
  2000-12-15  9:14     ` Chris Mason
@ 2000-12-15 10:51     ` Stephen C. Tweedie
  2000-12-15 19:04       ` Linus Torvalds
                         ` (2 more replies)
  2000-12-17  0:51     ` Russell Cattelan
  2 siblings, 3 replies; 25+ messages in thread
From: Stephen C. Tweedie @ 2000-12-15 10:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Linus Torvalds, Stephen C. Tweedie, Russell Cattelan, linux-kernel

Hi,

On Fri, Dec 15, 2000 at 02:00:19AM -0500, Alexander Viro wrote:
> On Thu, 14 Dec 2000, Linus Torvalds wrote:
> 
> Just one: any fs that really cares about completion callback is very likely
> to be picky about the requests ordering. So sync_buffers() is very unlikely
> to be useful anyway.
> 
> In that sense we really don't have anonymous buffers here. I seriously
> suspect that "unrealistic" assumption is not unrealistic at all. I'm
> not sufficiently familiar with XFS code to say for sure, but...

Right.  ext3 and reiserfs just want to submit their own IOs when it
comes to the journal.  (At least in ext3, already-journaled buffers
can be written back by the VM freely.)  It's a matter of telling the
fs when that should start.

> What we really need is a way for VFS/VM to pass the pressure on filesystem.
> That's it. If fs wants unusual completions for requests - let it have its
> own queueing mechanism and submit these requests when it finds that convenient.

There is a very clean way of doing this with address spaces.  It's
something I would like to see done properly for 2.5: eliminate all
knowledge of buffer_heads from the VM layer.  It would be pretty
simple to remove page->buffers completely and replace it with a
page->private pointer, owned by whatever address_space controlled the
page.  Instead of trying to unmap and flush buffers on the page
directly, these operations would become address_space operations.

We could still provide the standard try_to_free_buffers() and
unmap_underlying_metadata() functions to operate on the per-page
buffer_head lists, and existing filesystems would only have to point
their address_space "private metadata" operations at the generic
functions.  However, a filesystem which had additional ordering
constraints could then intercept the flush or writeback calls from the
VM and decide on its own how best to honour the VM pressure.

This even works both for hashed and unhashed anonymous buffers, *if*
you allow the filesystem to attach all of its hashed buffers
to an address_space of its own rather than having the buffer cache
pool all such buffers together.

--Stephen

* Re: Test12 ll_rw_block error.
  2000-12-15  7:00   ` Alexander Viro
@ 2000-12-15  9:14     ` Chris Mason
  2000-12-17  0:54       ` Russell Cattelan
  2000-12-15 10:51     ` Stephen C. Tweedie
  2000-12-17  0:51     ` Russell Cattelan
  2 siblings, 1 reply; 25+ messages in thread
From: Chris Mason @ 2000-12-15  9:14 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Linus Torvalds, Stephen C. Tweedie, Russell Cattelan, linux-kernel



On Fri, 15 Dec 2000, Alexander Viro wrote:

> Just one: any fs that really cares about completion callback is very likely
> to be picky about the requests ordering. So sync_buffers() is very unlikely
> to be useful anyway.
> 
Somewhat.  I guess there are at least two ways to do it.  First flush the
buffers where ordering matters (log blocks), then send the others onto the
dirty list (general metadata).  You might have your own end_io for those, and
sync_buffers would lose it.

Second way (reiserfs recently changed to this method) is to do all the
flushing yourself, and remove the need for an end_io call back.

> In that sense we really don't have anonymous buffers here. I seriously
> suspect that "unrealistic" assumption is not unrealistic at all. I'm
> not sufficiently familiar with XFS code to say for sure, but...
> 
> What we really need is a way for VFS/VM to pass the pressure on filesystem.
> That's it. If fs wants unusual completions for requests - let it have its
> own queueing mechanism and submit these requests when it finds that convenient.
> 
Yes, this is exactly what we've discussed.

-chris


* Re: Test12 ll_rw_block error.
  2000-12-15  6:20 ` Linus Torvalds
@ 2000-12-15  7:00   ` Alexander Viro
  2000-12-15  9:14     ` Chris Mason
                       ` (2 more replies)
  2000-12-17  0:21   ` Russell Cattelan
  1 sibling, 3 replies; 25+ messages in thread
From: Alexander Viro @ 2000-12-15  7:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stephen C. Tweedie, Russell Cattelan, linux-kernel



On Thu, 14 Dec 2000, Linus Torvalds wrote:

> Good point. 
> 
> This actually looks fairly nasty to fix. The obvious fix would be to not
> put such buffers on the dirty list at all, and instead rely on the VM
> layer calling "writepage()" when it wants to push out the pages.
> That would be the nice behaviour from a VM standpoint.
> 
> However, that assumes that you don't have any "anonymous" buffers, which
> is probably an unrealistic assumption.
> 
> The problem is that we don't have any per-buffer "writebuffer()" function,
> the way we have them per-page. It was never needed for any of the normal
> filesystems, and XFS just happened to be able to take advantage of the
> b_end_io behaviour.
> 
> Suggestions welcome. 

Just one: any fs that really cares about completion callback is very likely
to be picky about the requests ordering. So sync_buffers() is very unlikely
to be useful anyway.

In that sense we really don't have anonymous buffers here. I seriously
suspect that "unrealistic" assumption is not unrealistic at all. I'm
not sufficiently familiar with XFS code to say for sure, but...

What we really need is a way for VFS/VM to pass the pressure on filesystem.
That's it. If fs wants unusual completions for requests - let it have its
own queueing mechanism and submit these requests when it finds that convenient.

Stephen, you probably already thought about that area. Could you comment on
that?
								Cheers,
									Al


* Re: Test12 ll_rw_block error.
       [not found] <3A398F84.2CD3039D@thebarn.com>
@ 2000-12-15  6:20 ` Linus Torvalds
  2000-12-15  7:00   ` Alexander Viro
  2000-12-17  0:21   ` Russell Cattelan
  0 siblings, 2 replies; 25+ messages in thread
From: Linus Torvalds @ 2000-12-15  6:20 UTC (permalink / raw)
  To: Russell Cattelan; +Cc: linux-kernel



On Thu, 14 Dec 2000, Russell Cattelan wrote:
> 
> Ok one more wrinkle.
> sync_buffers calls ll_rw_block, this is going to have the same problem as
> calling ll_rw_block directly.

Good point. 

This actually looks fairly nasty to fix. The obvious fix would be to not
put such buffers on the dirty list at all, and instead rely on the VM
layer calling "writepage()" when it wants to push out the pages.
That would be the nice behaviour from a VM standpoint.

However, that assumes that you don't have any "anonymous" buffers, which
is probably an unrealistic assumption.

The problem is that we don't have any per-buffer "writebuffer()" function,
the way we have them per-page. It was never needed for any of the normal
filesystems, and XFS just happened to be able to take advantage of the
b_end_io behaviour.

Suggestions welcome. 

		Linus


* Re: Test12 ll_rw_block error.
       [not found] <3A398D58.92BBC9A4@thebarn.com>
@ 2000-12-15  3:28 ` Linus Torvalds
  0 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2000-12-15  3:28 UTC (permalink / raw)
  To: Russell Cattelan; +Cc: Kernel Mailing List



On Thu, 14 Dec 2000, Russell Cattelan wrote:
> 
> So one more observation in
> filemap_sync_pte
> 
>  lock_page(page);
>  error = filemap_write_page(page, 1);
> ->  UnlockPage(page);
> This unlock page was removed? is that correct?

Yes. The "writepage" thing changed: "struct file" disappeared (as I'm sure
you also noticed), and the page writer is supposed to unlock the page
itself. Which it may do at any time, of course.

There are some reasons to do it only after the IO has actually completed:
this way the VM layer won't try to write it out _again_ before the first
IO has even finished, and the writing logic can possibly be
simplified if you know that nobody else will be touching that page.

But that is up to you: you can do the UnlockPage before even returning
from your "->writepage()" function, if you choose to do so.

		Linus


end of thread, other threads:[~2000-12-19 19:42 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-12-15  2:02 Test12 ll_rw_block error Russell Cattelan
2000-12-15  2:48 ` Linus Torvalds
     [not found] <3A398D58.92BBC9A4@thebarn.com>
2000-12-15  3:28 ` Linus Torvalds
     [not found] <3A398F84.2CD3039D@thebarn.com>
2000-12-15  6:20 ` Linus Torvalds
2000-12-15  7:00   ` Alexander Viro
2000-12-15  9:14     ` Chris Mason
2000-12-17  0:54       ` Russell Cattelan
2000-12-17 12:20         ` Chris Mason
2000-12-15 10:51     ` Stephen C. Tweedie
2000-12-15 19:04       ` Linus Torvalds
2000-12-17  1:08       ` Russell Cattelan
2000-12-18 11:44         ` Stephen C. Tweedie
2000-12-17  2:38       ` Marcelo Tosatti
2000-12-18 11:46         ` Stephen C. Tweedie
2000-12-19 14:05           ` Marcelo Tosatti
2000-12-19 16:43             ` Daniel Phillips
2000-12-19 17:18               ` Marcelo Tosatti
2000-12-19 17:40               ` Christoph Hellwig
2000-12-17  0:51     ` Russell Cattelan
2000-12-17  0:21   ` Russell Cattelan
2000-12-16  0:58 Jeff Chua
2000-12-16  1:12 ` Linus Torvalds
2000-12-16 17:13   ` Chris Mason
2000-12-16 19:07     ` Linus Torvalds
2000-12-16 19:35       ` Chris Mason
