From: Alexander Potapenko <glider@google.com>
To: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jann Horn <jannh@google.com>,
	Aleksandr Nogikh <nogikh@google.com>,
	Taras Madan <tarasmadan@google.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	kasan-dev <kasan-dev@googlegroups.com>
Subject: Re: [PATCH v3 4/5] kfence: limit currently covered allocations when pool nearly full
Date: Thu, 23 Sep 2021 15:46:17 +0200
Message-ID: <CAG_fn=VpgmcmLg7=bh6Mf6HNr6wZYUADJZfB5AuRkedCqas6-w@mail.gmail.com>
In-Reply-To: <CANpmjNOh0ugPq90cVRPAbR-6qr=Q4CsQ_R1Qxk_Bi4TocgwUQA@mail.gmail.com>

On Thu, Sep 23, 2021 at 3:44 PM Marco Elver <elver@google.com> wrote:
>
> On Thu, 23 Sept 2021 at 15:24, Alexander Potapenko <glider@google.com> wrote:
> >
> > On Thu, Sep 23, 2021 at 1:19 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> > >
> > > On Thu, 23 Sept 2021 at 12:48, Marco Elver <elver@google.com> wrote:
> > > >
> > > > One of KFENCE's main design principles is that with increasing uptime,
> > > > allocation coverage increases sufficiently to detect previously
> > > > undetected bugs.
> > > >
> > > > We have observed that frequent long-lived allocations of the same
> > > > source (e.g. pagecache) tend to permanently fill up the KFENCE pool
> > > > with increasing system uptime, thus breaking the above requirement.
> > > > The workaround thus far had been increasing the sample interval and/or
> > > > increasing the KFENCE pool size, but neither is a reliable solution.
> > > >
> > > > To ensure diverse coverage of allocations, limit currently covered
> > > > allocations of the same source once pool utilization reaches 75%
> > > > (configurable via `kfence.skip_covered_thresh`) or above. The effect is
> > > > retaining reasonable allocation coverage when the pool is close to full.
> > > >
> > > > A side-effect is that this also limits frequent long-lived allocations
> > > > of the same source filling up the pool permanently.
> > > >
> > > > Uniqueness of an allocation for coverage purposes is based on its
> > > > (partial) allocation stack trace (the source). A Counting Bloom filter
> > > > is used to check if an allocation is covered; if the allocation is
> > > > currently covered, the allocation is skipped by KFENCE.
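
To make this concrete, here is a rough sketch of the scheme described in
the paragraph above. The helper names, the fixed table size and the inline
threshold check are illustrative paraphrase, not the actual mm/kfence/core.c
code:

/*
 * Sketch of a counting Bloom filter keyed by the hash of the (partial)
 * allocation stack trace. A source is "currently covered" iff all of its
 * HNUM counters are non-zero; covered sources are skipped once pool
 * utilization reaches the skip_covered threshold (75% by default).
 */
#include <linux/atomic.h>
#include <linux/types.h>

#define COVERED_HNUM	2
#define COVERED_SIZE	512	/* power of 2; example value */

static atomic_t covered[COVERED_SIZE];

/* Derive the next filter index from the running hash state. */
static u32 covered_next_index(u32 *h)
{
	*h = 1664525 * *h + 1013904223;	/* same LCG step as ALLOC_COVERED_HNEXT in the patch */
	return *h % COVERED_SIZE;
}

/* Called with +1 on allocation and -1 on free, using the same stack hash. */
static void covered_add(u32 stack_hash, int val)
{
	u32 h = stack_hash;
	int i;

	for (i = 0; i < COVERED_HNUM; i++)
		atomic_add(val, &covered[covered_next_index(&h)]);
}

/*
 * True if an allocation from this source is likely still in the pool.
 * False positives are possible; false negatives are not, because the
 * counters are decremented again on free.
 */
static bool covered_contains(u32 stack_hash)
{
	u32 h = stack_hash;
	int i;

	for (i = 0; i < COVERED_HNUM; i++) {
		if (!atomic_read(&covered[covered_next_index(&h)]))
			return false;
	}
	return true;
}

/* Skip the sampled allocation if the pool is nearly full and the source is covered. */
static bool should_skip(u32 stack_hash, int used, int capacity)
{
	return used * 100 / capacity >= 75 && covered_contains(stack_hash);
}
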
> > > >
> > > > Testing was done using:
> > > >
> > > >         (a) a synthetic workload that performs frequent long-lived
> > > >             allocations (default config values; sample_interval=1;
> > > >             num_objects=63), and
> > > >
> > > >         (b) normal desktop workloads on an otherwise idle machine where
> > > >             the problem was first reported after a few days of uptime
> > > >             (default config values).
> > > >
> > > > In both test cases the sampled allocation rate no longer drops to zero
> > > > at any point. In the case of (b) we observe (after 2 days uptime) 15%
> > > > unique allocations in the pool, 77% pool utilization, with 20% "skipped
> > > > allocations (covered)".
> > > >
> > > > Signed-off-by: Marco Elver <elver@google.com>
> > >
> > > Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
> > Acked-by: Alexander Potapenko <glider@google.com>
>
> Thank you both!
>
> > > > ---
> > > > v3:
> > > > * Remove unneeded !alloc_stack_hash checks.
> > > > * Remove unneeded meta->alloc_stack_hash=0 in kfence_guarded_free().
> > > >
> > > > v2:
> > > > * Switch to counting bloom filter to guarantee currently covered
> > > >   allocations being skipped.
> > > > * Use a module param for skip_covered threshold.
> > > > * Use kfence pool address as hash entropy.
> > > > * Use filter_irq_stacks().
> > > > ---
> > > >  mm/kfence/core.c   | 103 ++++++++++++++++++++++++++++++++++++++++++++-
> > > >  mm/kfence/kfence.h |   2 +
> > > >  2 files changed, 103 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/mm/kfence/core.c b/mm/kfence/core.c
> > > > index db01814f8ff0..58a0f6f1acc5 100644
> > > > --- a/mm/kfence/core.c
> > > > +++ b/mm/kfence/core.c
> > > > @@ -11,11 +11,13 @@
> > > >  #include <linux/bug.h>
> > > >  #include <linux/debugfs.h>
> > > >  #include <linux/irq_work.h>
> > > > +#include <linux/jhash.h>
> > > >  #include <linux/kcsan-checks.h>
> > > >  #include <linux/kfence.h>
> > > >  #include <linux/kmemleak.h>
> > > >  #include <linux/list.h>
> > > >  #include <linux/lockdep.h>
> > > > +#include <linux/log2.h>
> > > >  #include <linux/memblock.h>
> > > >  #include <linux/moduleparam.h>
> > > >  #include <linux/random.h>
> > > > @@ -82,6 +84,10 @@ static const struct kernel_param_ops sample_interval_param_ops = {
> > > >  };
> > > >  module_param_cb(sample_interval, &sample_interval_param_ops, &kfence_sample_interval, 0600);
> > > >
> > > > +/* Pool usage% threshold when currently covered allocations are skipped. */
> > > > +static unsigned long kfence_skip_covered_thresh __read_mostly = 75;
> > > > +module_param_named(skip_covered_thresh, kfence_skip_covered_thresh, ulong, 0644);
> > > > +
> > > >  /* The pool of pages used for guard pages and objects. */
> > > >  char *__kfence_pool __ro_after_init;
> > > >  EXPORT_SYMBOL(__kfence_pool); /* Export for test modules. */
> > > > @@ -105,6 +111,25 @@ DEFINE_STATIC_KEY_FALSE(kfence_allocation_key);
> > > >  /* Gates the allocation, ensuring only one succeeds in a given period. */
> > > >  atomic_t kfence_allocation_gate = ATOMIC_INIT(1);
> > > >
> > > > +/*
> > > > + * A Counting Bloom filter of allocation coverage: limits currently covered
> > > > + * allocations of the same source filling up the pool.
> > > > + *
> > > > + * Assuming a range of 15%-85% unique allocations in the pool at any point in
> >
> > Where does the 85% figure come from?
>
> An imaginary worst case, just to illustrate the range of the false-positive
> probabilities (in the case of 85% it'd be 0.33). I expect unique
> allocations to be around 10-15% on a freshly booted system (in my
> real-system experiment it stayed below 15%), but other workloads may
> produce a different percentage of unique allocations.
>
> > > > + * time, the below parameters provide a probability of 0.02-0.33 for false
> > > > + * positive hits respectively:
> > > > + *
> > > > + *     P(alloc_traces) = (1 - e^(-HNUM * (alloc_traces / SIZE)))^HNUM
> > > > + */
> > > > +#define ALLOC_COVERED_HNUM     2
> > > > +#define ALLOC_COVERED_SIZE     (1 << (const_ilog2(CONFIG_KFENCE_NUM_OBJECTS) + 2))
> > > > +#define ALLOC_COVERED_HNEXT(h) (1664525 * (h) + 1013904223)
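
For concreteness: assuming the default CONFIG_KFENCE_NUM_OBJECTS=255, this
makes SIZE = 1 << (7 + 2) = 512, and the two ends of the 15%-85% range work
out to roughly

    15% unique (n ~= 38):  P = (1 - e^(-2*38/512))^2  ~= 0.138^2 ~= 0.02
    85% unique (n ~= 217): P = (1 - e^(-2*217/512))^2 ~= 0.572^2 ~= 0.33

which matches the 0.02-0.33 range quoted in the comment above.
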
> >
> > Unless we are planning to change these primes, can you use
> > next_pseudo_random32() instead?
>
> I'm worried about next_pseudo_random32() changing its implementation to
> no longer be deterministic, or changing in other ways that break our
> use case. In this case we want pseudorandomness, but we're not
> implementing a PRNG.
>
> Open-coding the constants (given they are from "Numerical Recipes") is
> more reliable and doesn't introduce unwanted reliance on
> next_pseudo_random32()'s behaviour.

Okay, fair enough.
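
For context: the value being mixed by this recurrence is already a hash of
the allocation's partial stack trace. Going by the v2 changelog notes above
(filter_irq_stacks(), pool address as hash entropy), the derivation looks
roughly like the sketch below; the function name is illustrative, not the
actual patch code:

/* Illustrative sketch: hash the (IRQ-filtered) allocation stack trace. */
#include <linux/jhash.h>
#include <linux/kfence.h>	/* __kfence_pool */
#include <linux/stacktrace.h>

static u32 alloc_stack_trace_hash(unsigned long *entries, unsigned int nr_entries)
{
	/*
	 * Truncate the trace at the IRQ entry, so that the arbitrary
	 * interrupted context does not make otherwise-identical
	 * allocation sites look unique.
	 */
	nr_entries = filter_irq_stacks(entries, nr_entries);

	/* The pool base address provides cheap per-boot entropy for the seed. */
	return jhash(entries, nr_entries * sizeof(entries[0]),
		     (u32)(unsigned long)__kfence_pool);
}
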

>
> Thanks,
> -- Marco



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Thread overview:

2021-09-23 10:47 [PATCH v3 1/5] stacktrace: move filter_irq_stacks() to kernel/stacktrace.c Marco Elver
2021-09-23 10:48 ` [PATCH v3 2/5] kfence: count unexpectedly skipped allocations Marco Elver
2021-09-23 11:15   ` Alexander Potapenko
2021-09-23 10:48 ` [PATCH v3 3/5] kfence: move saving stack trace of allocations into __kfence_alloc() Marco Elver
2021-09-23 11:32   ` Alexander Potapenko
2021-09-23 10:48 ` [PATCH v3 4/5] kfence: limit currently covered allocations when pool nearly full Marco Elver
2021-09-23 11:18   ` Dmitry Vyukov
2021-09-23 13:23     ` Alexander Potapenko
2021-09-23 13:44       ` Marco Elver
2021-09-23 13:46         ` Alexander Potapenko [this message]
2021-09-23 23:28         ` Andrew Morton
2021-09-24 13:01           ` Marco Elver
2021-09-23 10:48 ` [PATCH v3 5/5] kfence: add note to documentation about skipping covered allocations Marco Elver
2021-09-23 15:46   ` Alexander Potapenko
2021-09-23 11:14 ` [PATCH v3 1/5] stacktrace: move filter_irq_stacks() to kernel/stacktrace.c Alexander Potapenko
