CEPH-Devel Archive on lore.kernel.org
 help / color / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Eric Biggers <ebiggers@kernel.org>
Cc: ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-fscrypt@vger.kernel.org
Subject: Re: [RFC PATCH v2 01/18] vfs: export new_inode_pseudo
Date: Wed, 09 Sep 2020 06:47:28 -0400
Message-ID: <6cfe023df5cf6c50d6197bb7c71b3fa6a51bf555.camel@kernel.org> (raw)
In-Reply-To: <20200908223125.GA3760467@gmail.com>

On Tue, 2020-09-08 at 15:31 -0700, Eric Biggers wrote:
> On Tue, Sep 08, 2020 at 07:27:58AM -0400, Jeff Layton wrote:
> > On Mon, 2020-09-07 at 20:38 -0700, Eric Biggers wrote:
> > > On Fri, Sep 04, 2020 at 12:05:20PM -0400, Jeff Layton wrote:
> > > > Ceph needs to be able to allocate inodes ahead of a create that might
> > > > involve a fscrypt-encrypted inode. new_inode() almost fits the bill,
> > > > but it puts the inode on the sb->s_inodes list, and we don't want to
> > > > do that until we're ready to insert it into the hash.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >  fs/inode.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/fs/inode.c b/fs/inode.c
> > > > index 72c4c347afb7..61554c2477ab 100644
> > > > --- a/fs/inode.c
> > > > +++ b/fs/inode.c
> > > > @@ -935,6 +935,7 @@ struct inode *new_inode_pseudo(struct super_block *sb)
> > > >  	}
> > > >  	return inode;
> > > >  }
> > > > +EXPORT_SYMBOL(new_inode_pseudo);
> > > >  
> > > 
> > > What's the problem with putting the new inode on sb->s_inodes already?
> > > That's what all the other filesystems do.
> > > 
> > 
> > The existing ones are all local filesystems that use
> > insert_inode_locked() and similar paths. Ceph needs to use the '5'
> > variants of those functions (inode_insert5(), iget5_locked(), etc.).
> > 
> > When we go to insert it into the hash in inode_insert5(), we'd need to
> > set I_CREATING if allocated from new_inode(). But, if you do _that_,
> > then you'll get back ESTALE from find_inode() if (eg.) someone calls
> > iget5_locked() before you can clear I_CREATING.
> > 
> > Hitting that race is easy with an asynchronous create. The simplest
> > scheme to avoid that is to just export new_inode_pseudo and keep it off
> > of s_inodes until we're ready to do the insert. The only real issue here
> > is that this inode won't be findable by evict_inodes during umount, but
> > that shouldn't be happening during an active syscall anyway.
> 
> Is your concern the following scenario?
> 
> 1. ceph successfully created a new file on the server
> 2. inode_insert5() is called for the new file's inode
> 3. error occurs in ceph_fill_inode()
> 4. discard_new_inode() is called
> 5. another thread looks up the inode and gets ESTALE
> 6. iput() is finally called
> 
> And the claim is that the ESTALE in (5) is unexpected?  I'm not sure that it's
> unexpected, given that the file allegedly failed to be created...  Also, it
> seems that maybe (3) isn't something that should actually happen, since after
> (1) it's too late to fail the file creation.
> 

No, more like:

Syscall					Workqueue
------------------------------------------------------------------------------
1. allocate an inode
2. determine we can do an async create
   and allocate an inode number for it
3. hash the inode (must set I_CREATING
   if we allocated with new_inode()) 
4. issue the call to the MDS
5. finish filling out the inode()
6.					MDS reply comes in, and workqueue thread
					looks up new inode (-ESTALE)
7. unlock_new_inode()


Because 6 happens before 7 in this case, we get an ESTALE on that
lookup. By using new_inode_pseudo() and not setting I_CREATING, 6 ends
up waiting on the inode to be unlocked rather than giving up.

-- 
Jeff Layton <jlayton@kernel.org>


  reply index

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-04 16:05 [RFC PATCH v2 00/18] ceph+fscrypt: context, filename and symlink support Jeff Layton
2020-09-04 16:05 ` [RFC PATCH v2 01/18] vfs: export new_inode_pseudo Jeff Layton
2020-09-08  3:38   ` Eric Biggers
2020-09-08 11:27     ` Jeff Layton
2020-09-08 22:31       ` Eric Biggers
2020-09-09 10:47         ` Jeff Layton [this message]
2020-09-09 16:12           ` Eric Biggers
2020-09-09 16:51             ` Jeff Layton
2020-09-09 18:49               ` Eric Biggers
2020-09-09 19:24                 ` Jeff Layton
2020-09-04 16:05 ` [RFC PATCH v2 02/18] fscrypt: drop unused inode argument from fscrypt_fname_alloc_buffer Jeff Layton
2020-09-04 16:05 ` [RFC PATCH v2 03/18] fscrypt: export fscrypt_d_revalidate Jeff Layton
2020-09-04 16:05 ` [RFC PATCH v2 04/18] fscrypt: add fscrypt_new_context_from_inode Jeff Layton
2020-09-08  3:48   ` Eric Biggers
2020-09-08 11:29     ` Jeff Layton
2020-09-08 12:29     ` Jeff Layton
2020-09-08 22:34       ` Eric Biggers
2020-09-04 16:05 ` [RFC PATCH v2 05/18] fscrypt: don't balk when inode is already marked encrypted Jeff Layton
2020-09-08  3:52   ` Eric Biggers
2020-09-08 12:54     ` Jeff Layton
2020-09-08 23:08       ` Eric Biggers
2020-09-04 16:05 ` [RFC PATCH v2 06/18] fscrypt: move nokey_name conversion to separate function and export it Jeff Layton
2020-09-08  3:55   ` Eric Biggers
2020-09-08 12:50     ` Jeff Layton
2020-09-08 22:53       ` Eric Biggers
2020-09-09 16:02         ` Jeff Layton
2020-09-04 16:05 ` [RFC PATCH v2 07/18] lib: lift fscrypt base64 conversion into lib/ Jeff Layton
2020-09-08  3:59   ` Eric Biggers
2020-09-08 12:51     ` Jeff Layton
2020-09-04 16:05 ` [RFC PATCH v2 08/18] ceph: add fscrypt ioctls Jeff Layton
2020-09-04 16:05 ` [RFC PATCH v2 09/18] ceph: crypto context handling for ceph Jeff Layton
2020-09-08  4:29   ` Eric Biggers
2020-09-08 16:14     ` Jeff Layton
2020-09-04 16:05 ` [RFC PATCH v2 10/18] ceph: preallocate inode for ops that may create one Jeff Layton
2020-09-04 16:05 ` [RFC PATCH v2 11/18] ceph: add routine to create context prior to RPC Jeff Layton
2020-09-08  4:43   ` Eric Biggers
2020-09-04 16:05 ` [RFC PATCH v2 12/18] ceph: set S_ENCRYPTED bit if new inode has encryption.ctx xattr Jeff Layton
2020-09-08  4:57   ` Eric Biggers
2020-09-09 12:20     ` Jeff Layton
2020-09-09 15:53     ` Jeff Layton
2020-09-09 16:33       ` Eric Biggers
2020-09-09 17:19         ` Jeff Layton
2020-09-04 16:05 ` [RFC PATCH v2 13/18] ceph: make ceph_msdc_build_path use ref-walk Jeff Layton
2020-09-04 16:05 ` [RFC PATCH v2 14/18] ceph: add encrypted fname handling to ceph_mdsc_build_path Jeff Layton
2020-09-08  5:06   ` Eric Biggers
2020-09-09 12:24     ` Jeff Layton
2020-09-04 16:05 ` [RFC PATCH v2 15/18] ceph: make d_revalidate call fscrypt revalidator for encrypted dentries Jeff Layton
2020-09-08  5:12   ` Eric Biggers
2020-09-09 12:26     ` Jeff Layton
2020-09-09 16:18       ` Eric Biggers
2020-09-04 16:05 ` [RFC PATCH v2 16/18] ceph: add support to readdir for encrypted filenames Jeff Layton
2020-09-08  5:34   ` Eric Biggers
2020-09-09 13:02     ` Jeff Layton
2020-09-04 16:05 ` [RFC PATCH v2 17/18] ceph: add fscrypt support to ceph_fill_trace Jeff Layton
2020-09-04 16:05 ` [RFC PATCH v2 18/18] ceph: create symlinks with encrypted and base64-encoded targets Jeff Layton
2020-09-04 16:11   ` Jeff Layton
2020-09-08  5:43   ` Eric Biggers
2020-09-08  5:54 ` [RFC PATCH v2 00/18] ceph+fscrypt: context, filename and symlink support Eric Biggers
2020-09-08 12:09   ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6cfe023df5cf6c50d6197bb7c71b3fa6a51bf555.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=ebiggers@kernel.org \
    --cc=linux-fscrypt@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

CEPH-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/ceph-devel/0 ceph-devel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 ceph-devel ceph-devel/ https://lore.kernel.org/ceph-devel \
		ceph-devel@vger.kernel.org
	public-inbox-index ceph-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.ceph-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git