All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Shakeel Butt <shakeelb@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Jeff Layton <jlayton@kernel.org>,
	 Chuck Lever <chuck.lever@oracle.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Michal Hocko <mhocko@kernel.org>,
	linux-kernel@vger.kernel.org,  Jens Axboe <axboe@kernel.dk>,
	Tejun Heo <tj@kernel.org>,
	Vasily Averin <vasily.averin@linux.dev>,
	 Michal Koutny <mkoutny@suse.com>,
	Waiman Long <longman@redhat.com>,
	 Muchun Song <muchun.song@linux.dev>,
	Jiri Kosina <jikos@kernel.org>,
	cgroups@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
Date: Sun, 21 Jan 2024 21:10:09 -0800	[thread overview]
Message-ID: <CAHk-=whYOOdM7jWy5jdrAm8LxcgCMFyk2bt8fYYvZzM4U-zAQA@mail.gmail.com> (raw)
In-Reply-To: <CALvZod4V3QTULTW5QxgqCbDpNtVO6fXzta33HR7GN=L2LUU26g@mail.gmail.com>

On Wed, 17 Jan 2024 at 14:56, Shakeel Butt <shakeelb@google.com> wrote:
> >
> > So I don't see how we can make it really cheap (say, less than 5% overhead)
> > without caching pre-accounted objects.
>
> Maybe this is what we want. Now we are down to just SLUB, maybe such
> caching of pre-accounted objects can be in SLUB layer and we can
> decide to keep this caching per-kmem-cache opt-in or always on.

So it turns out that we have another case of SLAB_ACCOUNT being quite
a big expense, and it's actually the normal - but failed - open() or
execve() case.

See the thread at

    https://lore.kernel.org/all/CAHk-=whw936qzDLBQdUz-He5WK_0fRSWwKAjtbVsMGfX70Nf_Q@mail.gmail.com/

and to see the effect in profiles, you can use this EXTREMELY stupid
test program:

    #include <fcntl.h>

    int main(int argc, char **argv)
    {
        for (int i = 0; i < 10000000; i++)
                open("nonexistent", O_RDONLY);
    }

where the point of course is that the "nonexistent" pathname doesn't
actually exist (so don't create a file called that for the test).

What happens is that open() allocates a 'struct file *' early from the
filp kmem_cache, which has SLAB_ACCOUNT set. So we'll do accounting
for it, failt the pathname open, and free it again, which uncharges
the accounting.

Now, in this case, I actually have a suggestion: could we please just
make SLAB_ACCOUNT be something that we do *after* the allocation, kind
of the same way the zeroing works?

IOW, I'd love to get rid of slab_pre_alloc_hook() entirely, and make
slab_post_alloc_hook() do all the "charge the memcg if required".

Obviously that means that now a failure to charge the memcg would have
to then de-allocate things, but that's an uncommon path and would be
marked unlikely and not be in the hot path at all.

Now, the reason I would prefer that is that the *second* step would be to

 (a) expose a "kmem_cache_charge()" function that takes a
*non*-accounted slab allocation, and turns it into an accounted one
(and obviously this is why you want to do everything in the post-alloc
hook: just try to share this code)

 (b) remote the SLAB_ACCOUNT from the filp_cachep, making all file
allocations start out unaccounted.

 (c) when we have *actually* looked up the pathname and open the file
successfully, at *that* point we'd do a

        error = kmem_cache_charge(filp_cachep, file);

    in do_dentry_open() to turn the unaccounted file pointer into an
accounted one (and if that fails, we do the cleanup and free it, of
course, exactly like we do when file_get_write_access() fails)

which means that now the failure case doesn't unnecessarily charge the
allocation that never ends up being finalized.

NOTE! I think this would clean up mm/slub.c too, simply because it
would get rid of that memcg_slab_pre_alloc_hook() entirely, and get
rid of the need to carry the "struct obj_cgroup **objcgp" pointer
along until the post-alloc hook: everything would be done post-alloc.

The actual kmem_cache_free() code already deals with "this slab hasn't
been accounted" because it obviously has to deal with allocations that
were done without __GFP_ACCOUNT anyway. So there's no change needed on
the freeing path, it already has to handle this all gracefully.

I may be missing something, but it would seem to have very little
downside, and fix a case that actually is visible right now.

              Linus

  reply	other threads:[~2024-01-22  5:10 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-17 16:14 [PATCH RFC 0/4] Fix file lock cache accounting, again Josh Poimboeuf
2024-01-17 16:14 ` [PATCH RFC 1/4] fs/locks: " Josh Poimboeuf
2024-01-17 19:00   ` Jeff Layton
2024-01-17 19:39     ` Josh Poimboeuf
2024-01-17 20:20       ` Linus Torvalds
2024-01-17 21:02         ` Shakeel Butt
2024-01-17 22:20           ` Roman Gushchin
2024-01-17 22:56             ` Shakeel Butt
2024-01-22  5:10               ` Linus Torvalds [this message]
2024-01-22 17:38                 ` Shakeel Butt
2024-01-26  9:50                 ` Vlastimil Babka
2024-01-30 11:04                   ` Vlastimil Babka
2024-01-19  7:47             ` Shakeel Butt
2024-01-17 21:19         ` Vlastimil Babka
2024-01-17 21:50         ` Roman Gushchin
2024-01-18  9:49     ` Michal Hocko
2024-01-17 16:14 ` [PATCH RFC 2/4] fs/locks: Add CONFIG_FLOCK_ACCOUNTING Josh Poimboeuf
2024-01-17 16:14 ` [PATCH RFC 3/4] mitigations: Expand 'mitigations=off' to include optional software mitigations Josh Poimboeuf
2024-01-17 16:14 ` [PATCH RFC 4/4] mitigations: Add flock cache accounting to 'mitigations=off' Josh Poimboeuf
2024-01-18  9:04   ` Michal Koutný

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=whYOOdM7jWy5jdrAm8LxcgCMFyk2bt8fYYvZzM4U-zAQA@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=cgroups@vger.kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=hannes@cmpxchg.org \
    --cc=jikos@kernel.org \
    --cc=jlayton@kernel.org \
    --cc=jpoimboe@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=mhocko@kernel.org \
    --cc=mkoutny@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=tj@kernel.org \
    --cc=vasily.averin@linux.dev \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.