From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mail-yw0-f195.google.com ([209.85.161.195]:42589 "EHLO
        mail-yw0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753107AbeFIIAS (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Sat, 9 Jun 2018 04:00:18 -0400
Received: by mail-yw0-f195.google.com with SMTP id q7-v6so4843791ywd.9
        for <linux-fsdevel@vger.kernel.org>; Sat, 09 Jun 2018 01:00:18 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <CAOQ4uxhKkJ=4x4UbcgZRci-qJ5XrY0NmNYfR6zyoRZH4UzQGaQ@mail.gmail.com>
References: <20180607150217.jq757ncqopuimbkd@quack2.suse.cz>
 <CA+55aFwTNYAfN1RZOZ0CQ39fJTH2La_6BVqmo6VuUg_vzMhzOw@mail.gmail.com>
 <20180608132737.etbkpqg77yz3vhp7@quack2.suse.cz> <CA+55aFyhzE_g8zemF4qyb_Y33=9YLV7_Hy6ASVyJPO8mFh3QSA@mail.gmail.com>
 <CAOQ4uxhKkJ=4x4UbcgZRci-qJ5XrY0NmNYfR6zyoRZH4UzQGaQ@mail.gmail.com>
From: Amir Goldstein <amir73il@gmail.com>
Date: Sat, 9 Jun 2018 11:00:17 +0300
Message-ID: <CAOQ4uxiGom8qdMA+8Gy+LHhR8fF8fX5nHbYRHHxjDgou6BxYiA@mail.gmail.com>
Subject: Re: [GIT PULL] Fsnotify cleanups
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>,
        linux-fsdevel <linux-fsdevel@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Sat, Jun 9, 2018 at 9:57 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Fri, Jun 8, 2018 at 11:34 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Fri, Jun 8, 2018 at 6:27 AM Jan Kara <jack@suse.cz> wrote:
>>>
>>> The alignment could be fixed by having
>>>
>>> struct inode {
>>>         ...
>>>         struct fsnotify_obj i_fsnotify __attribute__ ((aligned(sizeof(void *))));
>>>         ...
>>> }
>>>
>>> but that's too ugly even for my twisted taste. And before someone asks,
>>> adding aligned attribute to the definition of fsnotify_obj structure makes
>>> it grow the padding at the end again.
>>
>> Yeah, I don't think there's any way to say "this can't be an array
>> member, don't pad the size out to the alignment".
>>
>> I do wonder if perhaps "struct fsnotify_obj" could actually have a
>> mask and a 32-bit object ID instead, which would (a) avoid the
>> alignment issue and (b) actually shrink the structure onm 64-bit.
>>
>> Obviously you'd then have to look up the pointer from the object ID
>> (presumably using a hash, but maybe it would use the radix tree of
>> idr_alloc() or something).
>>
>> I haven't looked at how often those pointers actually get looked up.
>> If the 'mask' makes it fairly rare, then the extra indirection might
>> be ok. I suspect it's not. But I thought I'd mention it as a possible
>> approach.
>
> As a matter of fact, the pointer is checked to be non zero before
> accessing the mask:
>
> 7c49b8616460 fs/notify: optimize inotify/fsnotify code for unwatched files
>
> So the mask is not really needed to optimize away unwatched inodes.
> I guess the mask is needed to optimize away use cases like frequent
> fsnotify_access() calls on inodes that are watched for modification only,
> but not sure that is worth optimizing.
>
>>
>> The other approach is - as you say - to move 'mask' to be behind the
>> pointer. But if that causes a lot of extra pointer chasing that mask
>> would normally avoid, that could be really expensive too. We see that
>> with the security objects: the cost of looking up whatever is behind
>> i_security is really nasty, because every inode gets allocated a
>> private security structure, so it's absolutely horrendous for cache
>> effects (no sharing).
>
> Besides pointer chasing, there is also srcu_read_lock.
>
>>
>> But I suspect that the behavior of i_fsnotify_marks is very different
>> from i_security, in that most inodes probably don't have marks at all.
>> No? Yes? So you might not have the nasty case of "almost all inode
>> access ends up following that pointer and it's horrible in the cache".
>>
>
> Yes and No :-)
> Yes, most inodes don't have marks at all,
> but, every fsnotify() call may need to follow mnt->mnt_fsnotify.marks
> if mount is watched and there is no mnt->mnt_fsnotify.mask in fsnotify_obj.
>
> Sure, we don't need to worry about 32bits in struct mount, so we could
> have a different attach method for inodes and mounts, but the whole
> purpose of this cleanup is to make the code more generic in handing
> and attaching of marks to inode/mounts.
>
> How about if we move mask into connector, but also leave a shadow mask,
> in an "inherited" struct i.e.:
>
> /* struct to embed in objects, which marks can be attached to */
> struct fsnotify_obj {
>        struct fsnotify_mark_connector __rcu *marks;
> };
>
> /* struct to embed in objects, which marks can be attached to with
> shadow mask */
> struct fsnotify_obj_mask {
>        struct fsnotify_obj obj;
>        /* all events this object cares about */
>        __u32 mask;
> };
>
> We embed fsnotify_obj in struct inode and fsnotify_obj_mask in struct mount.
>
> The only code that needs to test the shadow mask before following into
> connector is the optimization code in start of fsnotify() which was not
> generalized anyway and where the mask optimization is more important
> for mount watches.
>

No. That's not true, there is also fsnotify_inode_watches_children() and
__fsnotify_parent(). But I think the only bits in the mask that really matter
for optimizing inode watches are FS_ACCESS and FS_EVENT_ON_CHILD
and those could be encoded in lower bits of connector pointer.
connector is pointer is always dereferenced via fsnotify_grab_connector(),
so it's easy to mask out those bits.

> The only code that sets the mask is __fsnotify_recalc_mask(), needs to
> set the shadow mask, if conn->type != FSNOTIFY_OBJ_TYPE_INODE.
> We already special case FSNOTIFY_OBJ_TYPE_INODE for pinning inode
> in several places, so there is nothing new here.
>

So instead, __fsnotify_recalc_mask() needs to set the bits on connector
pointer in FSNOTIFY_OBJ_TYPE_INODE case.

I'll try to sketch some patches.

Thanks,
Amir.