From: Sage Weil <sage@newdream.net>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: linux-fsdevel@vger.kernel.org, Andi Kleen <andi@firstfloor.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 08/19] ceph: address space operations
Date: Thu, 23 Jul 2009 21:44:57 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0907231642590.2930@cobra.newdream.net>
In-Reply-To: <1248374834.6139.13.camel@heimdal.trondhjem.org>

On Thu, 23 Jul 2009, Trond Myklebust wrote:
> On Thu, 2009-07-23 at 11:26 -0700, Sage Weil wrote:
> > A related question I had on writepages failures: what is the 'right' thing 
> > to do if we get a server error on writeback?  If we believe it may be 
> > transient (say, ENOSPC), should we redirty pages and hope for better luck 
> > next time?
> 
> How would ENOSPC be transient? On most systems, ENOSPC requires some
> kind of user action in order to allow recovery, so we just pass the
> error back to the application.

In a distributed environment, other users may be deleting data, or the 
cluster might be expanding/rebalancing as new storage is added to the 
system.  Of course, any retry after ENOSPC should be limited to a small 
number of additional attempts.
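
Concretely, what I have in mind is a bounded redirty in the writepages
error path, something like the sketch below.  (handle_write_error() and
ENOSPC_MAX_RETRIES are names invented for this sketch, not code from the
actual patch.)

#define ENOSPC_MAX_RETRIES 3	/* arbitrary cap on retries */

/*
 * On a transient error, leave the page dirty so a later writeback
 * pass retries it; once retries are exhausted (or the error looks
 * fatal), latch the error on the mapping so a later fsync()/close()
 * can report it.
 */
static int handle_write_error(struct address_space *mapping,
			      struct page *page,
			      struct writeback_control *wbc,
			      int err, int *retries)
{
	if (err == -ENOSPC && (*retries)++ < ENOSPC_MAX_RETRIES) {
		redirty_page_for_writepage(wbc, page);
		return 0;
	}
	mapping_set_error(mapping, err);
	return err;
}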

> On the other hand, an error due to a storage element rebooting might be
> transient, and can probably be dealt with by retrying. It depends on
> what kind of contract you have with applications w.r.t. data integrity.

The general strategy with an unresponsive server is the same as NFS: just 
wait indefinitely.  (Control-c works, though.)
 
> > What if we decide it's a fatal error?
> 
> Well, the NFS client will record the error, and then pass it back to the
> application on the next write() or on close(). However, this strategy
> relies partly on the fact that all NFS clients are required to flush
> pending writes to permanent storage on close().

I see.  Looking through the code, I see SetPageError(page) along with the 
end_page_writeback stuff, and the error code in the nfs_open_context.  

The part I don't understand is what actually happens to pages after the
error flag is set.  They're still uptodate, but no longer dirty?  And can
they be overwritten/redirtied?  There's also an error flag on the
address_space.  Are there any guidelines as to which should be used?
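
For reference, here is the pattern as I currently read it -- a sketch
pieced together from the generic page-flag and mapping helpers, not the
actual NFS code:

/*
 * Two error channels: SetPageError() marks the page itself, while
 * mapping_set_error() latches AS_EIO or AS_ENOSPC on the
 * address_space so that a later filemap_fdatawait() (e.g. from
 * ->fsync()) returns -EIO/-ENOSPC and clears the latched bit.
 */
static void complete_write(struct page *page, int err)
{
	if (err) {
		SetPageError(page);
		mapping_set_error(page->mapping, err);
	}
	end_page_writeback(page);
}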

Thanks-
sage



> 
> Cheers
>   Trond
> 
> > sage
> > 
> > 
> > On Thu, 23 Jul 2009, Andi Kleen wrote:
> > 
> > > Sage Weil <sage@newdream.net> writes:
> > > 
> > > > The ceph address space methods are concerned primarily with managing
> > > > the dirty page accounting in the inode, which (among other things)
> > > > must keep track of which snapshot context each page was dirtied in,
> > > > and ensure that dirty data is written out to the OSDs in snapshot
> > > > order.
> > > >
> > > > A writepage() on a page that is not currently writeable due to
> > > > snapshot writeback ordering constraints is ignored (it was presumably
> > > > called from kswapd).
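
To illustrate that paragraph, the writepage path does roughly the
following -- a sketch only, with snap_context_writeable() as a made-up
helper standing in for the real snap context check in the patch:

static int ceph_writepage(struct page *page, struct writeback_control *wbc)
{
	/*
	 * If snapshot ordering says this page cannot go out yet,
	 * leave it dirty and let a later writeback pass -- rather
	 * than kswapd's direct call -- handle it.
	 */
	if (!snap_context_writeable(page)) {
		redirty_page_for_writepage(wbc, page);
		unlock_page(page);
		return 0;
	}
	/* ... otherwise, write the page out to the OSDs ... */
	return 0;
}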
> > > 
> > > Not a detailed review. You would need to get one from someone who
> > > knows the VFS interfaces very well (unfortunately those people are hard
> > > to find). I just read through it.
> > > 
> > > One thing I noticed is that you seem to do a lot of memory allocation
> > > in the writeout paths (some of it even GFP_KERNEL, not GFP_NOFS).
> > > 
> > > The traditional wisdom is that you should not allocate memory in block
> > > writeout, because that can deadlock.  The worst case is a swapfile
> > > on it, but it can happen with mmap too (e.g. one process using
> > > most memory with a file mmapped from your fs).  GFP_KERNEL can also
> > > recurse, which can cause other problems in your fs.
> > > 
> > > There were some changes to make this problem less severe (e.g. better
> > > dirty page accounting), but I don't think anyone has really declared
> > > it solved yet.  The standard workaround is to use mempools for
> > > anything allocated in the writeout path; then you are at least
> > > guaranteed to make forward progress.
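
Concretely, that suggestion looks roughly like the sketch below; the
pool size and the use of struct ceph_osd_request (from the OSD client
patch) are placeholders:

#include <linux/mempool.h>

#define WB_POOL_MIN 8	/* elements reserved for writeout; arbitrary */

static mempool_t *wb_req_pool;

static int __init wb_pool_init(void)
{
	wb_req_pool = mempool_create_kmalloc_pool(WB_POOL_MIN,
				sizeof(struct ceph_osd_request));
	return wb_req_pool ? 0 : -ENOMEM;
}

/*
 * mempool_alloc() from a reserved pool does not fail outright under
 * memory pressure: it may sleep until another writeout returns an
 * element via mempool_free(), which is what guarantees forward
 * progress.
 */
static struct ceph_osd_request *wb_req_get(void)
{
	return mempool_alloc(wb_req_pool, GFP_NOFS);
}

static void wb_req_put(struct ceph_osd_request *req)
{
	mempool_free(req, wb_req_pool);
}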
> > > 
> > > You also had at least one unchecked kmalloc, I think.
> > > 
> > > -Andi
> > > 
> > > -- 
> > > ak@linux.intel.com -- Speaking for myself only.
