All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Drokin <green@linuxhacker.ru>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Mailing List <linux-kernel@vger.kernel.org>,
	"<linux-fsdevel@vger.kernel.org>" <linux-fsdevel@vger.kernel.org>
Subject: Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
Date: Wed, 6 Jul 2016 00:35:30 -0400	[thread overview]
Message-ID: <EF5C00CA-5E49-43E8-87A5-0511DCAC148E@linuxhacker.ru> (raw)
In-Reply-To: <6334F6B6-807C-4B36-B785-7DD502E0D70B@linuxhacker.ru>


On Jul 5, 2016, at 11:25 PM, Oleg Drokin wrote:

> 
> On Jul 5, 2016, at 11:20 PM, Al Viro wrote:
> 
>> On Tue, Jul 05, 2016 at 08:29:37PM -0400, Oleg Drokin wrote:
>>>> +		/* Otherwise we just unhash it to be rehashed afresh via
>>>> +		 * lookup if necessary
>>>> +		 */
>>>> +		d_drop(dentry);
>>> 
>>> So we can even drop this part and retain the top condition as it was.
>>> d_add does not care if the dentry we are feeding it was hashed or not,
>>> so do you see any downsides to doing that I wonder?
>> 
>> d_add() on hashed dentry will end up reaching this:
>> static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
>> {
>>       BUG_ON(!d_unhashed(entry));
> 
> Ah, ok. Yes, I remember about it now from the older issue with nfs.
> 
> It's still puzzling why I did not hit it yet, but oh well.
> 
> I wonder if handling of negative dentries broke… Time for more investigations.

Ah, the reason was just looking me in the eyes as the first thing we do in
ll_revalidate_dentry():

        if ((lookup_flags & (LOOKUP_OPEN | LOOKUP_CREATE)) ==
            (LOOKUP_OPEN | LOOKUP_CREATE))
                return 0;


Apparently we are trying to protect from the case where dentry is valid when we
check it, but gets pulled under us as soon as we are done (no lustre locking held
around it).
So if we don't do this, we fall into ll_file_open/ll_intent_file_open and form
a request to open the file based on the inode. If this inode is already gone
server-side, we might get ESTALE, since we don't really send the name to the
server in this case so it does not know what is it we are after anyway.

On the userspace side of things the picture is not pretty then, we are doing
open(O_CREAT) and get ESTALE back.

Looking at do_filp_open(), there's actually a retry:
        if (unlikely(filp == ERR_PTR(-ESTALE)))
                filp = path_openat(&nd, op, flags | LOOKUP_REVAL);

But it's only once so for a contended file we can still have some other thread
do a lookup, provide us with a new cached dentry that's similarly pulled
under us next time around.
But I guess the whole idea of this is to let us know the file is contended and
we can just only fail revalidate then.

Hm.. looking at the server - it's even worse, we'd get ENOENT from the server if
the desired inode no longer exist, so a perfect opportunity for the
client to turn that ENOENT into ESTALE if the create flag was set.

WARNING: multiple messages have this Message-ID (diff)
From: Oleg Drokin <green@linuxhacker.ru>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Mailing List <linux-kernel@vger.kernel.org>,
	"<linux-fsdevel@vger.kernel.org>" <linux-fsdevel@vger.kernel.org>
Subject: Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
Date: Wed, 6 Jul 2016 00:35:30 -0400	[thread overview]
Message-ID: <EF5C00CA-5E49-43E8-87A5-0511DCAC148E@linuxhacker.ru> (raw)
In-Reply-To: <6334F6B6-807C-4B36-B785-7DD502E0D70B@linuxhacker.ru>


On Jul 5, 2016, at 11:25 PM, Oleg Drokin wrote:

> 
> On Jul 5, 2016, at 11:20 PM, Al Viro wrote:
> 
>> On Tue, Jul 05, 2016 at 08:29:37PM -0400, Oleg Drokin wrote:
>>>> +		/* Otherwise we just unhash it to be rehashed afresh via
>>>> +		 * lookup if necessary
>>>> +		 */
>>>> +		d_drop(dentry);
>>> 
>>> So we can even drop this part and retain the top condition as it was.
>>> d_add does not care if the dentry we are feeding it was hashed or not,
>>> so do you see any downsides to doing that I wonder?
>> 
>> d_add() on hashed dentry will end up reaching this:
>> static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
>> {
>>       BUG_ON(!d_unhashed(entry));
> 
> Ah, ok. Yes, I remember about it now from the older issue with nfs.
> 
> It's still puzzling why I did not hit it yet, but oh well.
> 
> I wonder if handling of negative dentries broke� Time for more investigations.

Ah, the reason was just looking me in the eyes as the first thing we do in
ll_revalidate_dentry():

        if ((lookup_flags & (LOOKUP_OPEN | LOOKUP_CREATE)) ==
            (LOOKUP_OPEN | LOOKUP_CREATE))
                return 0;


Apparently we are trying to protect from the case where dentry is valid when we
check it, but gets pulled under us as soon as we are done (no lustre locking held
around it).
So if we don't do this, we fall into ll_file_open/ll_intent_file_open and form
a request to open the file based on the inode. If this inode is already gone
server-side, we might get ESTALE, since we don't really send the name to the
server in this case so it does not know what is it we are after anyway.

On the userspace side of things the picture is not pretty then, we are doing
open(O_CREAT) and get ESTALE back.

Looking at do_filp_open(), there's actually a retry:
        if (unlikely(filp == ERR_PTR(-ESTALE)))
                filp = path_openat(&nd, op, flags | LOOKUP_REVAL);

But it's only once so for a contended file we can still have some other thread
do a lookup, provide us with a new cached dentry that's similarly pulled
under us next time around.
But I guess the whole idea of this is to let us know the file is contended and
we can just only fail revalidate then.

Hm.. looking at the server - it's even worse, we'd get ENOENT from the server if
the desired inode no longer exist, so a perfect opportunity for the
client to turn that ENOENT into ESTALE if the create flag was set.

  reply	other threads:[~2016-07-06  4:35 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-17  4:09 More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes Oleg Drokin
2016-06-17  4:29 ` Al Viro
2016-06-25 16:38   ` Oleg Drokin
2016-07-03  6:29     ` Al Viro
2016-07-04  0:08       ` Al Viro
2016-07-04  0:37         ` Oleg Drokin
2016-07-04  0:37           ` Oleg Drokin
2016-07-04  3:08           ` Al Viro
2016-07-04  3:55             ` Oleg Drokin
2016-07-04  3:55               ` Oleg Drokin
2016-07-05  2:25               ` Al Viro
2016-07-10 17:01                 ` Oleg Drokin
2016-07-10 18:14                   ` James Simmons
2016-07-11  1:01                     ` Al Viro
2016-07-11  1:03                       ` Al Viro
2016-07-11 22:54                         ` lustre sendmsg stuff Oleg Drokin
2016-07-11 17:15                       ` More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes James Simmons
2016-07-05  2:28       ` Oleg Drokin
2016-07-05  2:32         ` Oleg Drokin
2016-07-05  4:43         ` Oleg Drokin
2016-07-05  6:22       ` Oleg Drokin
2016-07-05 12:31         ` Al Viro
2016-07-05 13:51           ` Al Viro
2016-07-05 15:21             ` Oleg Drokin
2016-07-05 17:42               ` Al Viro
2016-07-05 18:12                 ` Oleg Drokin
2016-07-05 16:33             ` Oleg Drokin
2016-07-05 18:08               ` Al Viro
2016-07-05 19:12                 ` Oleg Drokin
2016-07-05 20:08                   ` Al Viro
2016-07-05 20:21                     ` Oleg Drokin
2016-07-06  0:29                       ` Oleg Drokin
2016-07-06  3:20                         ` Al Viro
2016-07-06  3:25                           ` Oleg Drokin
2016-07-06  3:25                             ` Oleg Drokin
2016-07-06  4:35                             ` Oleg Drokin [this message]
2016-07-06  4:35                               ` Oleg Drokin
2016-07-06 16:24             ` Oleg Drokin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=EF5C00CA-5E49-43E8-87A5-0511DCAC148E@linuxhacker.ru \
    --to=green@linuxhacker.ru \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.