All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mateusz Guzik <mjguzik@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Potapenko <glider@google.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Kees Cook <keescook@chromium.org>,
	Eric Biggers <ebiggers@google.com>,
	Christian Brauner <brauner@kernel.org>,
	serge@hallyn.com, paul@paul-moore.com,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org
Subject: Re: [PATCH v3 2/2] vfs: avoid duplicating creds in faccessat if possible
Date: Sat, 4 Mar 2023 20:01:45 +0100	[thread overview]
Message-ID: <CAGudoHH8t9_5iLd8FsTW4PBZ+_vGad3YAd8K=n=SrRtnWHm49Q@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wgz51x2gaiD4=6T3UGZtKOSm3k56iq=h4tqy3wQsN-VTA@mail.gmail.com>

On 3/3/23, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, Mar 3, 2023 at 12:39 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>>
>> I think there is a systemic problem which comes with the kzalloc API
>
> Well, it's not necessarily the API that is bad, but the implementation.
>
> We could easily make kzalloc() with a constant size just expand to
> kmalloc+memset, and get the behavior you want.
>
> We already do magical things for "find the right slab bucket" part of
> kmalloc too for constant sizes. It's changed over the years, but that
> policy goes back a long long time. See
>
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=95203fe78007f9ab3aebb96606473ae18c00a5a8
>
> from the BK history tree.
>
> Exactly because some things are worth optimizing for when the size is
> known at compile time.
>
> Maybe just extending kzalloc() similarly? Trivial and entirely untested
> patch:
>
>    --- a/include/linux/slab.h
>    +++ b/include/linux/slab.h
>    @@ -717,6 +717,12 @@ static inline void *kmem_cache_zalloc(struct
> kmem_cache *k, gfp_t flags)
>      */
>     static inline __alloc_size(1) void *kzalloc(size_t size, gfp_t flags)
>     {
>    +    if (__builtin_constant_p(size)) {
>    +            void *ret = kmalloc(size, flags);
>    +            if (ret)
>    +                    memset(ret, 0, size);
>    +            return ret;
>    +    }
>         return kmalloc(size, flags | __GFP_ZERO);
>     }
>

So I played with this and have a rather nasty summary. Bullet points:
1. patched kzalloc does not reduce memsets calls during kernel build
2. patched kmem_cache_zalloc_ptr + 2 consumers converted *does* drop
it significantly (36150671 -> 14414454)
3. ... inline memset generated by gcc sucks by resorting to rep stosq
around 48 bytes
4. memsets not sorted out have sizes not known at compilation time and
are not necessarily perf bugs on their own [read: would benefit from
faster memset]

Onto the the meat:

I patched the kernel with a slightly tweaked version of the above:
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 45af70315a94..7abb5490690f 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -717,6 +717,12 @@ static inline void *kmem_cache_zalloc(struct
kmem_cache *k, gfp_t flags)
  */
 static inline __alloc_size(1) void *kzalloc(size_t size, gfp_t flags)
 {
+       if (__builtin_constant_p(size)) {
+               void *ret = kmalloc(size, flags);
+               if (likely(ret))
+                       memset(ret, 0, size);
+               return ret;
+       }
        return kmalloc(size, flags | __GFP_ZERO);
 }

and verified it indeed zeroes inline:

void kztest(void)
{
        void *ptr;

        ptr = kzalloc(32, GFP_KERNEL);
        if (unlikely(!ptr))
                return;
        memsettest_rec(ptr);
}

$ objdump --disassemble=kztest vmlinux
[snip]
call   ffffffff8135e130 <kmalloc_trace>
test   %rax,%rax
je     ffffffff81447d5f <kztest+0x4f>
movq   $0x0,(%rax)
mov    %rax,%rdi
movq   $0x0,0x8(%rax)
movq   $0x0,0x10(%rax)
movq   $0x0,0x18(%rax)
call   ffffffff81454060 <memsettest_rec>
[snip]

This did *NOT* lead to reduction of memset calls when building the kernel.

I verified few cases by hand, it is all either kmem_cache_zalloc or
explicitly added memsets with sizes not known at compilation time.

Two most frequent callers:
@[
    memset+5
    __alloc_file+40
    alloc_empty_file+73
    path_openat+77
    do_filp_open+182
    do_sys_openat2+159
    __x64_sys_openat+89
    do_syscall_64+93
    entry_SYSCALL_64_after_hwframe+114
]: 11028994
@[
    memset+5
    security_file_alloc+45
    __alloc_file+89
    alloc_empty_file+73
    path_openat+77
    do_filp_open+182
    do_sys_openat2+159
    __x64_sys_openat+89
    do_syscall_64+93
    entry_SYSCALL_64_after_hwframe+114
]: 11028994

My wip addition is:

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 45af70315a94..12b5b02ef3d3 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -710,6 +710,17 @@ static inline void *kmem_cache_zalloc(struct
kmem_cache *k, gfp_t flags)
        return kmem_cache_alloc(k, flags | __GFP_ZERO);
 }

+#define kmem_cache_zalloc_ptr(k, f, retp) ({                           \
+       __typeof(retp) _retp = kmem_cache_alloc(k, f);                  \
+       bool _rv = false;                                               \
+       retp = _retp;                                                   \
+       if (likely(_retp)) {                                            \
+               memset(_retp, 0, sizeof(*_retp));                       \
+               _rv = true;                                             \
+       }                                                               \
+       _rv;                                                            \
+})
+
diff --git a/security/security.c b/security/security.c
index cf6cc576736f..0f769ede0e54 100644
--- a/security/security.c
+++ b/security/security.c
@@ -600,8 +600,7 @@ static int lsm_file_alloc(struct file *file)
                return 0;
        }

-       file->f_security = kmem_cache_zalloc(lsm_file_cache, GFP_KERNEL);
-       if (file->f_security == NULL)
+       if (!kmem_cache_zalloc_ptr(lsm_file_cache, GFP_KERNEL,
file->f_security))
                return -ENOMEM;
        return 0;
 }
diff --git a/fs/file_table.c b/fs/file_table.c
index 372653b92617..8e0dabf9530e 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -136,8 +136,7 @@ static struct file *__alloc_file(int flags, const
struct cred *cred)
        struct file *f;
        int error;

-       f = kmem_cache_zalloc(filp_cachep, GFP_KERNEL);
-       if (unlikely(!f))
+       if (!kmem_cache_zalloc_ptr(filp_cachep, GFP_KERNEL, f))
                return ERR_PTR(-ENOMEM);

        f->f_cred = get_cred(cred);

As mentioned above it cuts total calls in more than half.

The problem is it is it rolls with rep stosq way too easily, partially
defeating the point of inlining anything. clang does not have this
problem.

Take a look at __alloc_file:
[snip]
mov    0x19cab05(%rip),%rdi        # ffffffff82df4318 <filp_cachep>
call   ffffffff813dd610 <kmem_cache_alloc>
test   %rax,%rax
je     ffffffff814298b7 <__alloc_file+0xc7>
mov    %rax,%r12
mov    $0x1d,%ecx
xor    %eax,%eax
mov    %r12,%rdi
rep stos %rax,%es:(%rdi)
[/snip]

Here is a sample consumer which can't help but have a variable size --
select, used by gmake:
@[
    memset+5
    do_pselect.constprop.0+202
    __x64_sys_pselect6+101
    do_syscall_64+93
    entry_SYSCALL_64_after_hwframe+114
]: 13160

In conclusion:
1. fixing up memsets is worthwhile regardless of what happens to its
current consumers -- not all of them are necessarily doing something
wrong
2. inlining memset can be a pessimization vs non-plain-erms memset as
evidenced above. will have to figure out how to convince gcc to be
less eager to use it.

Sometimes I hate computers.

-- 
Mateusz Guzik <mjguzik gmail.com>

  parent reply	other threads:[~2023-03-04 19:01 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-25 15:55 [PATCH v3 1/2] capability: add cap_isidentical Mateusz Guzik
2023-01-25 15:55 ` [PATCH v3 2/2] vfs: avoid duplicating creds in faccessat if possible Mateusz Guzik
2023-02-28  0:44   ` Linus Torvalds
2023-03-02  8:30     ` Christian Brauner
2023-03-02 17:51       ` Linus Torvalds
2023-03-02 18:14         ` Mateusz Guzik
2023-03-02 18:18           ` Al Viro
2023-03-02 18:22             ` Mateusz Guzik
2023-03-02 18:43               ` Al Viro
2023-03-02 18:51                 ` Mateusz Guzik
2023-03-02 19:02                 ` Al Viro
2023-03-02 19:18                   ` Al Viro
2023-03-02 19:03               ` Linus Torvalds
2023-03-02 19:10                 ` Linus Torvalds
2023-03-02 19:19                   ` Al Viro
2023-03-02 19:54                     ` Kees Cook
2023-03-02 20:11                       ` Al Viro
2023-03-03 15:30                         ` Alexander Potapenko
2023-03-03 17:39                           ` Mateusz Guzik
2023-03-03 17:54                             ` Linus Torvalds
2023-03-03 19:37                               ` Mateusz Guzik
2023-03-03 19:38                                 ` Mateusz Guzik
2023-03-03 20:08                                 ` Linus Torvalds
2023-03-03 20:39                                   ` Mateusz Guzik
2023-03-03 20:58                                     ` Linus Torvalds
2023-03-03 21:09                                       ` Mateusz Guzik
2023-03-04 19:01                                       ` Mateusz Guzik [this message]
2023-03-04 20:31                                         ` Mateusz Guzik
2023-03-04 20:48                                           ` Linus Torvalds
2023-03-05 17:23                                             ` David Laight
2023-03-04  1:29                                     ` Linus Torvalds
2023-03-04  3:25                                       ` Yury Norov
2023-03-04  3:42                                         ` Linus Torvalds
2023-03-04  5:51                                           ` Yury Norov
2023-03-04 16:41                                             ` David Vernet
2023-03-04 19:02                                             ` Linus Torvalds
2023-03-04 19:19                                             ` Linus Torvalds
2023-03-04 20:34                                               ` Linus Torvalds
2023-03-04 20:51                                               ` Yury Norov
2023-03-04 21:01                                                 ` Linus Torvalds
2023-03-04 21:03                                                   ` Linus Torvalds
2023-03-04 21:10                                                   ` Linus Torvalds
2023-03-04 23:08                                                     ` Linus Torvalds
2023-03-04 23:52                                                       ` Linus Torvalds
2023-03-05  9:26                                                       ` Sedat Dilek
2023-03-05 18:17                                                         ` Linus Torvalds
2023-03-05 18:43                                                           ` Linus Torvalds
2023-03-06  5:43                                                       ` Yury Norov
2023-03-04 20:18                                     ` Al Viro
2023-03-04 20:42                                       ` Mateusz Guzik
2023-03-02 19:38                   ` Kees Cook
2023-03-02 19:48                     ` Eric Biggers
2023-03-02 18:41             ` Al Viro
2023-03-03 14:49         ` Christian Brauner
2023-03-02 18:11       ` Al Viro
2023-03-03 14:27         ` Christian Brauner
2023-02-28  1:14 ` [PATCH v3 1/2] capability: add cap_isidentical Linus Torvalds
2023-02-28  2:46   ` Casey Schaufler
2023-02-28 14:47     ` Mateusz Guzik
2023-02-28 19:39       ` Linus Torvalds
2023-02-28 19:51         ` Linus Torvalds
2023-02-28 20:48         ` Linus Torvalds
2023-02-28 21:21           ` Mateusz Guzik
2023-02-28 21:29             ` Linus Torvalds
2023-03-01 18:13               ` Linus Torvalds
2023-02-28 17:32     ` Serge E. Hallyn
2023-02-28 17:52       ` Casey Schaufler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGudoHH8t9_5iLd8FsTW4PBZ+_vGad3YAd8K=n=SrRtnWHm49Q@mail.gmail.com' \
    --to=mjguzik@gmail.com \
    --cc=brauner@kernel.org \
    --cc=ebiggers@google.com \
    --cc=glider@google.com \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=paul@paul-moore.com \
    --cc=serge@hallyn.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.