All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Alexander Potapenko <glider@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Matthias Brugger" <matthias.bgg@gmail.com>,
	kasan-dev <kasan-dev@googlegroups.com>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Linux ARM" <linux-arm-kernel@lists.infradead.org>,
	<linux-mediatek@lists.infradead.org>,
	wsd_upstream <wsd_upstream@mediatek.com>,
	Miles Chen <miles.chen@mediatek.com>,
	<nicholas.tang@mediatek.com>
Subject: Re: [PATCH 1/1] kasan: fix object remain in offline per-cpu quarantine
Date: Fri, 13 Nov 2020 10:31:54 +0800	[thread overview]
Message-ID: <1605234714.30076.18.camel@mtksdccf07> (raw)
In-Reply-To: <CACT4Y+bpDTqQRRdV0_O07H=Kczj3nXUY9ngQgX5K=BtT=Y60RQ@mail.gmail.com>

On Thu, 2020-11-12 at 09:39 +0100, Dmitry Vyukov wrote:
> On Thu, Nov 12, 2020 at 7:25 AM Kuan-Ying Lee
> <Kuan-Ying.Lee@mediatek.com> wrote:
> >
> > We hit this issue in our internal test.
> > When enabling generic kasan, a kfree()'d object is put into per-cpu
> > quarantine first. If the cpu goes offline, object still remains in
> > the per-cpu quarantine. If we call kmem_cache_destroy() now, slub
> > will report "Objects remaining" error.
> >
> > [   74.982625] =============================================================================
> > [   74.983380] BUG test_module_slab (Not tainted): Objects remaining in test_module_slab on __kmem_cache_shutdown()
> > [   74.984145] -----------------------------------------------------------------------------
> > [   74.984145]
> > [   74.984883] Disabling lock debugging due to kernel taint
> > [   74.985561] INFO: Slab 0x(____ptrval____) objects=34 used=1 fp=0x(____ptrval____) flags=0x2ffff00000010200
> > [   74.986638] CPU: 3 PID: 176 Comm: cat Tainted: G    B             5.10.0-rc1-00007-g4525c8781ec0-dirty #10
> > [   74.987262] Hardware name: linux,dummy-virt (DT)
> > [   74.987606] Call trace:
> > [   74.987924]  dump_backtrace+0x0/0x2b0
> > [   74.988296]  show_stack+0x18/0x68
> > [   74.988698]  dump_stack+0xfc/0x168
> > [   74.989030]  slab_err+0xac/0xd4
> > [   74.989346]  __kmem_cache_shutdown+0x1e4/0x3c8
> > [   74.989779]  kmem_cache_destroy+0x68/0x130
> > [   74.990176]  test_version_show+0x84/0xf0
> > [   74.990679]  module_attr_show+0x40/0x60
> > [   74.991218]  sysfs_kf_seq_show+0x128/0x1c0
> > [   74.991656]  kernfs_seq_show+0xa0/0xb8
> > [   74.992059]  seq_read+0x1f0/0x7e8
> > [   74.992415]  kernfs_fop_read+0x70/0x338
> > [   74.993051]  vfs_read+0xe4/0x250
> > [   74.993498]  ksys_read+0xc8/0x180
> > [   74.993825]  __arm64_sys_read+0x44/0x58
> > [   74.994203]  el0_svc_common.constprop.0+0xac/0x228
> > [   74.994708]  do_el0_svc+0x38/0xa0
> > [   74.995088]  el0_sync_handler+0x170/0x178
> > [   74.995497]  el0_sync+0x174/0x180
> > [   74.996050] INFO: Object 0x(____ptrval____) @offset=15848
> > [   74.996752] INFO: Allocated in test_version_show+0x98/0xf0 age=8188 cpu=6 pid=172
> > [   75.000802]  stack_trace_save+0x9c/0xd0
> > [   75.002420]  set_track+0x64/0xf0
> > [   75.002770]  alloc_debug_processing+0x104/0x1a0
> > [   75.003171]  ___slab_alloc+0x628/0x648
> > [   75.004213]  __slab_alloc.isra.0+0x2c/0x58
> > [   75.004757]  kmem_cache_alloc+0x560/0x588
> > [   75.005376]  test_version_show+0x98/0xf0
> > [   75.005756]  module_attr_show+0x40/0x60
> > [   75.007035]  sysfs_kf_seq_show+0x128/0x1c0
> > [   75.007433]  kernfs_seq_show+0xa0/0xb8
> > [   75.007800]  seq_read+0x1f0/0x7e8
> > [   75.008128]  kernfs_fop_read+0x70/0x338
> > [   75.008507]  vfs_read+0xe4/0x250
> > [   75.008990]  ksys_read+0xc8/0x180
> > [   75.009462]  __arm64_sys_read+0x44/0x58
> > [   75.010085]  el0_svc_common.constprop.0+0xac/0x228
> > [   75.011006] kmem_cache_destroy test_module_slab: Slab cache still has objects
> >
> > Register a cpu hotplug function to remove all objects in the offline
> > per-cpu quarantine when cpu is going offline. Set a per-cpu variable
> > to indicate this cpu is offline.
> >
> > Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
> > ---
> >  mm/kasan/quarantine.c | 59 +++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 57 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
> > index 4c5375810449..67fb91ae2bd0 100644
> > --- a/mm/kasan/quarantine.c
> > +++ b/mm/kasan/quarantine.c
> > @@ -29,6 +29,7 @@
> >  #include <linux/srcu.h>
> >  #include <linux/string.h>
> >  #include <linux/types.h>
> > +#include <linux/cpuhotplug.h>
> >
> >  #include "../slab.h"
> >  #include "kasan.h"
> > @@ -97,6 +98,7 @@ static void qlist_move_all(struct qlist_head *from, struct qlist_head *to)
> >   * guarded by quarantine_lock.
> >   */
> 
> Hi Kuan-Ying,
> 
> Thanks for fixing this.
> 
> >  static DEFINE_PER_CPU(struct qlist_head, cpu_quarantine);
> > +static DEFINE_PER_CPU(int, cpu_quarantine_offline);
> 
> I think cpu_quarantine_offline is better be part of cpu_quarantine
> because it logically is and we already obtain a pointer to
> cpu_quarantine in quarantine_put, so it will also make the code a bit
> shorter.
> 

Ok. Got it.

> 
> >  /* Round-robin FIFO array of batches. */
> >  static struct qlist_head global_quarantine[QUARANTINE_BATCHES];
> > @@ -176,6 +178,8 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
> >         unsigned long flags;
> >         struct qlist_head *q;
> >         struct qlist_head temp = QLIST_INIT;
> > +       int *offline;
> > +       struct qlist_head q_offline = QLIST_INIT;
> >
> >         /*
> >          * Note: irq must be disabled until after we move the batch to the
> > @@ -187,8 +191,16 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
> >          */
> >         local_irq_save(flags);
> >
> > -       q = this_cpu_ptr(&cpu_quarantine);
> > -       qlist_put(q, &info->quarantine_link, cache->size);
> > +       offline = this_cpu_ptr(&cpu_quarantine_offline);
> > +       if (*offline == 0) {
> > +               q = this_cpu_ptr(&cpu_quarantine);
> > +               qlist_put(q, &info->quarantine_link, cache->size);
> > +       } else {
> > +               qlist_put(&q_offline, &info->quarantine_link, cache->size);
> > +               qlist_free_all(&q_offline, cache);
> 
> This looks like a convoluted way to call qlink_free. I think it will
> be better to call qlink_free directly here.
> 
> And why do we need this? Because CPU shutdown code can still free some
> objects afterwards?
> 

Yes, it is because IRQ can happen during CPU shutdown and free some
objects into offline CPU quarantine.

> > +               local_irq_restore(flags);
> > +               return;
> 
> You add both if/else and early return, this looks like unnecessary
> code complication. It would be simpler with:
> 
> if (*offline) {
>     qlink_free(...);
>     return;
> }
> ... all current per-cpu local ...
> 
> 

Thank you for reminder. v2 Will do it.

> > +       }
> >         if (unlikely(q->bytes > QUARANTINE_PERCPU_SIZE)) {
> >                 qlist_move_all(q, &temp);
> >
> > @@ -328,3 +340,46 @@ void quarantine_remove_cache(struct kmem_cache *cache)
> >
> >         synchronize_srcu(&remove_cache_srcu);
> >  }
> > +
> > +static int kasan_cpu_online(unsigned int cpu)
> > +{
> > +       int *offline;
> > +       unsigned long flags;
> > +
> > +       local_irq_save(flags);
> 
> I assume this local_irq_save/restore is to prevent some warnings from
> this_cpu_ptr.
> But CPU online/offline callbacks should run without preemption already
> (preempting/rescheduling on other CPUs does not make sense for them,
> right?), so I would assume that is already at least preemption
> disabled or something. Is there this_cpu_ptr variant that won't
> produce warnings on its own in cpu online/offline callbacks?
> This whole function could be a 1-liner:
> this_cpu_ptr(&cpu_quarantine)->offline = true;
> So I am trying to understand if we could avoid all this unnecessary danse.
> 

Yes, it's unnecessary. v2 will fix it.

> 
> > +       offline = this_cpu_ptr(&cpu_quarantine_offline);
> > +       *offline = 0;
> > +       local_irq_restore(flags);
> > +       return 0;
> > +}
> > +
> > +static int kasan_cpu_offline(unsigned int cpu)
> > +{
> > +       struct kmem_cache *s;
> > +       int *offline;
> > +       unsigned long flags;
> > +
> > +       local_irq_save(flags);
> > +       offline = this_cpu_ptr(&cpu_quarantine_offline);
> > +       *offline = 1;
> > +       local_irq_restore(flags);
> > +
> > +       mutex_lock(&slab_mutex);
> > +       list_for_each_entry(s, &slab_caches, list) {
> > +               per_cpu_remove_cache(s);
> > +       }
> > +       mutex_unlock(&slab_mutex);
> 
> We just want to drop the whole per-cpu cache at once, right? I would
> assume there should be a simpler way to do this all at once, rather
> than doing this per-slab.
> 

Yes.
Is removing objects in per-cpu quarantine directly better?

struct qlist_head *q;
q = this_cpu_ptr(&cpu_quaratine);
q->offline = true;
qlist_free_all(q, NULL);

> > +       return 0;
> > +}
> > +
> > +static int __init kasan_cpu_offline_quarantine_init(void)
> > +{
> > +       int ret = 0;
> > +
> > +       ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mm/kasan:online",
> > +                               kasan_cpu_online, kasan_cpu_offline);
> > +       if (ret)
> > +               pr_err("kasan offline cpu quarantine register failed [%d]\n", ret);
> > +       return ret;
> > +}
> > +late_initcall(kasan_cpu_offline_quarantine_init);
> > --
> > 2.18.0
> >
> > --
> > You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/1605162252-23886-2-git-send-email-Kuan-Ying.Lee%40mediatek.com.


WARNING: multiple messages have this Message-ID (diff)
From: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: wsd_upstream <wsd_upstream@mediatek.com>,
	nicholas.tang@mediatek.com, linux-mediatek@lists.infradead.org,
	LKML <linux-kernel@vger.kernel.org>,
	kasan-dev <kasan-dev@googlegroups.com>,
	Linux-MM <linux-mm@kvack.org>,
	Miles Chen <miles.chen@mediatek.com>,
	Alexander Potapenko <glider@google.com>,
	Matthias Brugger <matthias.bgg@gmail.com>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH 1/1] kasan: fix object remain in offline per-cpu quarantine
Date: Fri, 13 Nov 2020 10:31:54 +0800	[thread overview]
Message-ID: <1605234714.30076.18.camel@mtksdccf07> (raw)
In-Reply-To: <CACT4Y+bpDTqQRRdV0_O07H=Kczj3nXUY9ngQgX5K=BtT=Y60RQ@mail.gmail.com>

On Thu, 2020-11-12 at 09:39 +0100, Dmitry Vyukov wrote:
> On Thu, Nov 12, 2020 at 7:25 AM Kuan-Ying Lee
> <Kuan-Ying.Lee@mediatek.com> wrote:
> >
> > We hit this issue in our internal test.
> > When enabling generic kasan, a kfree()'d object is put into per-cpu
> > quarantine first. If the cpu goes offline, object still remains in
> > the per-cpu quarantine. If we call kmem_cache_destroy() now, slub
> > will report "Objects remaining" error.
> >
> > [   74.982625] =============================================================================
> > [   74.983380] BUG test_module_slab (Not tainted): Objects remaining in test_module_slab on __kmem_cache_shutdown()
> > [   74.984145] -----------------------------------------------------------------------------
> > [   74.984145]
> > [   74.984883] Disabling lock debugging due to kernel taint
> > [   74.985561] INFO: Slab 0x(____ptrval____) objects=34 used=1 fp=0x(____ptrval____) flags=0x2ffff00000010200
> > [   74.986638] CPU: 3 PID: 176 Comm: cat Tainted: G    B             5.10.0-rc1-00007-g4525c8781ec0-dirty #10
> > [   74.987262] Hardware name: linux,dummy-virt (DT)
> > [   74.987606] Call trace:
> > [   74.987924]  dump_backtrace+0x0/0x2b0
> > [   74.988296]  show_stack+0x18/0x68
> > [   74.988698]  dump_stack+0xfc/0x168
> > [   74.989030]  slab_err+0xac/0xd4
> > [   74.989346]  __kmem_cache_shutdown+0x1e4/0x3c8
> > [   74.989779]  kmem_cache_destroy+0x68/0x130
> > [   74.990176]  test_version_show+0x84/0xf0
> > [   74.990679]  module_attr_show+0x40/0x60
> > [   74.991218]  sysfs_kf_seq_show+0x128/0x1c0
> > [   74.991656]  kernfs_seq_show+0xa0/0xb8
> > [   74.992059]  seq_read+0x1f0/0x7e8
> > [   74.992415]  kernfs_fop_read+0x70/0x338
> > [   74.993051]  vfs_read+0xe4/0x250
> > [   74.993498]  ksys_read+0xc8/0x180
> > [   74.993825]  __arm64_sys_read+0x44/0x58
> > [   74.994203]  el0_svc_common.constprop.0+0xac/0x228
> > [   74.994708]  do_el0_svc+0x38/0xa0
> > [   74.995088]  el0_sync_handler+0x170/0x178
> > [   74.995497]  el0_sync+0x174/0x180
> > [   74.996050] INFO: Object 0x(____ptrval____) @offset=15848
> > [   74.996752] INFO: Allocated in test_version_show+0x98/0xf0 age=8188 cpu=6 pid=172
> > [   75.000802]  stack_trace_save+0x9c/0xd0
> > [   75.002420]  set_track+0x64/0xf0
> > [   75.002770]  alloc_debug_processing+0x104/0x1a0
> > [   75.003171]  ___slab_alloc+0x628/0x648
> > [   75.004213]  __slab_alloc.isra.0+0x2c/0x58
> > [   75.004757]  kmem_cache_alloc+0x560/0x588
> > [   75.005376]  test_version_show+0x98/0xf0
> > [   75.005756]  module_attr_show+0x40/0x60
> > [   75.007035]  sysfs_kf_seq_show+0x128/0x1c0
> > [   75.007433]  kernfs_seq_show+0xa0/0xb8
> > [   75.007800]  seq_read+0x1f0/0x7e8
> > [   75.008128]  kernfs_fop_read+0x70/0x338
> > [   75.008507]  vfs_read+0xe4/0x250
> > [   75.008990]  ksys_read+0xc8/0x180
> > [   75.009462]  __arm64_sys_read+0x44/0x58
> > [   75.010085]  el0_svc_common.constprop.0+0xac/0x228
> > [   75.011006] kmem_cache_destroy test_module_slab: Slab cache still has objects
> >
> > Register a cpu hotplug function to remove all objects in the offline
> > per-cpu quarantine when cpu is going offline. Set a per-cpu variable
> > to indicate this cpu is offline.
> >
> > Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
> > ---
> >  mm/kasan/quarantine.c | 59 +++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 57 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
> > index 4c5375810449..67fb91ae2bd0 100644
> > --- a/mm/kasan/quarantine.c
> > +++ b/mm/kasan/quarantine.c
> > @@ -29,6 +29,7 @@
> >  #include <linux/srcu.h>
> >  #include <linux/string.h>
> >  #include <linux/types.h>
> > +#include <linux/cpuhotplug.h>
> >
> >  #include "../slab.h"
> >  #include "kasan.h"
> > @@ -97,6 +98,7 @@ static void qlist_move_all(struct qlist_head *from, struct qlist_head *to)
> >   * guarded by quarantine_lock.
> >   */
> 
> Hi Kuan-Ying,
> 
> Thanks for fixing this.
> 
> >  static DEFINE_PER_CPU(struct qlist_head, cpu_quarantine);
> > +static DEFINE_PER_CPU(int, cpu_quarantine_offline);
> 
> I think cpu_quarantine_offline is better be part of cpu_quarantine
> because it logically is and we already obtain a pointer to
> cpu_quarantine in quarantine_put, so it will also make the code a bit
> shorter.
> 

Ok. Got it.

> 
> >  /* Round-robin FIFO array of batches. */
> >  static struct qlist_head global_quarantine[QUARANTINE_BATCHES];
> > @@ -176,6 +178,8 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
> >         unsigned long flags;
> >         struct qlist_head *q;
> >         struct qlist_head temp = QLIST_INIT;
> > +       int *offline;
> > +       struct qlist_head q_offline = QLIST_INIT;
> >
> >         /*
> >          * Note: irq must be disabled until after we move the batch to the
> > @@ -187,8 +191,16 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
> >          */
> >         local_irq_save(flags);
> >
> > -       q = this_cpu_ptr(&cpu_quarantine);
> > -       qlist_put(q, &info->quarantine_link, cache->size);
> > +       offline = this_cpu_ptr(&cpu_quarantine_offline);
> > +       if (*offline == 0) {
> > +               q = this_cpu_ptr(&cpu_quarantine);
> > +               qlist_put(q, &info->quarantine_link, cache->size);
> > +       } else {
> > +               qlist_put(&q_offline, &info->quarantine_link, cache->size);
> > +               qlist_free_all(&q_offline, cache);
> 
> This looks like a convoluted way to call qlink_free. I think it will
> be better to call qlink_free directly here.
> 
> And why do we need this? Because CPU shutdown code can still free some
> objects afterwards?
> 

Yes, it is because IRQ can happen during CPU shutdown and free some
objects into offline CPU quarantine.

> > +               local_irq_restore(flags);
> > +               return;
> 
> You add both if/else and early return, this looks like unnecessary
> code complication. It would be simpler with:
> 
> if (*offline) {
>     qlink_free(...);
>     return;
> }
> ... all current per-cpu local ...
> 
> 

Thank you for reminder. v2 Will do it.

> > +       }
> >         if (unlikely(q->bytes > QUARANTINE_PERCPU_SIZE)) {
> >                 qlist_move_all(q, &temp);
> >
> > @@ -328,3 +340,46 @@ void quarantine_remove_cache(struct kmem_cache *cache)
> >
> >         synchronize_srcu(&remove_cache_srcu);
> >  }
> > +
> > +static int kasan_cpu_online(unsigned int cpu)
> > +{
> > +       int *offline;
> > +       unsigned long flags;
> > +
> > +       local_irq_save(flags);
> 
> I assume this local_irq_save/restore is to prevent some warnings from
> this_cpu_ptr.
> But CPU online/offline callbacks should run without preemption already
> (preempting/rescheduling on other CPUs does not make sense for them,
> right?), so I would assume that is already at least preemption
> disabled or something. Is there this_cpu_ptr variant that won't
> produce warnings on its own in cpu online/offline callbacks?
> This whole function could be a 1-liner:
> this_cpu_ptr(&cpu_quarantine)->offline = true;
> So I am trying to understand if we could avoid all this unnecessary danse.
> 

Yes, it's unnecessary. v2 will fix it.

> 
> > +       offline = this_cpu_ptr(&cpu_quarantine_offline);
> > +       *offline = 0;
> > +       local_irq_restore(flags);
> > +       return 0;
> > +}
> > +
> > +static int kasan_cpu_offline(unsigned int cpu)
> > +{
> > +       struct kmem_cache *s;
> > +       int *offline;
> > +       unsigned long flags;
> > +
> > +       local_irq_save(flags);
> > +       offline = this_cpu_ptr(&cpu_quarantine_offline);
> > +       *offline = 1;
> > +       local_irq_restore(flags);
> > +
> > +       mutex_lock(&slab_mutex);
> > +       list_for_each_entry(s, &slab_caches, list) {
> > +               per_cpu_remove_cache(s);
> > +       }
> > +       mutex_unlock(&slab_mutex);
> 
> We just want to drop the whole per-cpu cache at once, right? I would
> assume there should be a simpler way to do this all at once, rather
> than doing this per-slab.
> 

Yes.
Is removing objects in per-cpu quarantine directly better?

struct qlist_head *q;
q = this_cpu_ptr(&cpu_quaratine);
q->offline = true;
qlist_free_all(q, NULL);

> > +       return 0;
> > +}
> > +
> > +static int __init kasan_cpu_offline_quarantine_init(void)
> > +{
> > +       int ret = 0;
> > +
> > +       ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mm/kasan:online",
> > +                               kasan_cpu_online, kasan_cpu_offline);
> > +       if (ret)
> > +               pr_err("kasan offline cpu quarantine register failed [%d]\n", ret);
> > +       return ret;
> > +}
> > +late_initcall(kasan_cpu_offline_quarantine_init);
> > --
> > 2.18.0
> >
> > --
> > You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/1605162252-23886-2-git-send-email-Kuan-Ying.Lee%40mediatek.com.

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

WARNING: multiple messages have this Message-ID (diff)
From: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: wsd_upstream <wsd_upstream@mediatek.com>,
	nicholas.tang@mediatek.com, linux-mediatek@lists.infradead.org,
	LKML <linux-kernel@vger.kernel.org>,
	kasan-dev <kasan-dev@googlegroups.com>,
	Linux-MM <linux-mm@kvack.org>,
	Miles Chen <miles.chen@mediatek.com>,
	Alexander Potapenko <glider@google.com>,
	Matthias Brugger <matthias.bgg@gmail.com>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH 1/1] kasan: fix object remain in offline per-cpu quarantine
Date: Fri, 13 Nov 2020 10:31:54 +0800	[thread overview]
Message-ID: <1605234714.30076.18.camel@mtksdccf07> (raw)
In-Reply-To: <CACT4Y+bpDTqQRRdV0_O07H=Kczj3nXUY9ngQgX5K=BtT=Y60RQ@mail.gmail.com>

On Thu, 2020-11-12 at 09:39 +0100, Dmitry Vyukov wrote:
> On Thu, Nov 12, 2020 at 7:25 AM Kuan-Ying Lee
> <Kuan-Ying.Lee@mediatek.com> wrote:
> >
> > We hit this issue in our internal test.
> > When enabling generic kasan, a kfree()'d object is put into per-cpu
> > quarantine first. If the cpu goes offline, object still remains in
> > the per-cpu quarantine. If we call kmem_cache_destroy() now, slub
> > will report "Objects remaining" error.
> >
> > [   74.982625] =============================================================================
> > [   74.983380] BUG test_module_slab (Not tainted): Objects remaining in test_module_slab on __kmem_cache_shutdown()
> > [   74.984145] -----------------------------------------------------------------------------
> > [   74.984145]
> > [   74.984883] Disabling lock debugging due to kernel taint
> > [   74.985561] INFO: Slab 0x(____ptrval____) objects=34 used=1 fp=0x(____ptrval____) flags=0x2ffff00000010200
> > [   74.986638] CPU: 3 PID: 176 Comm: cat Tainted: G    B             5.10.0-rc1-00007-g4525c8781ec0-dirty #10
> > [   74.987262] Hardware name: linux,dummy-virt (DT)
> > [   74.987606] Call trace:
> > [   74.987924]  dump_backtrace+0x0/0x2b0
> > [   74.988296]  show_stack+0x18/0x68
> > [   74.988698]  dump_stack+0xfc/0x168
> > [   74.989030]  slab_err+0xac/0xd4
> > [   74.989346]  __kmem_cache_shutdown+0x1e4/0x3c8
> > [   74.989779]  kmem_cache_destroy+0x68/0x130
> > [   74.990176]  test_version_show+0x84/0xf0
> > [   74.990679]  module_attr_show+0x40/0x60
> > [   74.991218]  sysfs_kf_seq_show+0x128/0x1c0
> > [   74.991656]  kernfs_seq_show+0xa0/0xb8
> > [   74.992059]  seq_read+0x1f0/0x7e8
> > [   74.992415]  kernfs_fop_read+0x70/0x338
> > [   74.993051]  vfs_read+0xe4/0x250
> > [   74.993498]  ksys_read+0xc8/0x180
> > [   74.993825]  __arm64_sys_read+0x44/0x58
> > [   74.994203]  el0_svc_common.constprop.0+0xac/0x228
> > [   74.994708]  do_el0_svc+0x38/0xa0
> > [   74.995088]  el0_sync_handler+0x170/0x178
> > [   74.995497]  el0_sync+0x174/0x180
> > [   74.996050] INFO: Object 0x(____ptrval____) @offset=15848
> > [   74.996752] INFO: Allocated in test_version_show+0x98/0xf0 age=8188 cpu=6 pid=172
> > [   75.000802]  stack_trace_save+0x9c/0xd0
> > [   75.002420]  set_track+0x64/0xf0
> > [   75.002770]  alloc_debug_processing+0x104/0x1a0
> > [   75.003171]  ___slab_alloc+0x628/0x648
> > [   75.004213]  __slab_alloc.isra.0+0x2c/0x58
> > [   75.004757]  kmem_cache_alloc+0x560/0x588
> > [   75.005376]  test_version_show+0x98/0xf0
> > [   75.005756]  module_attr_show+0x40/0x60
> > [   75.007035]  sysfs_kf_seq_show+0x128/0x1c0
> > [   75.007433]  kernfs_seq_show+0xa0/0xb8
> > [   75.007800]  seq_read+0x1f0/0x7e8
> > [   75.008128]  kernfs_fop_read+0x70/0x338
> > [   75.008507]  vfs_read+0xe4/0x250
> > [   75.008990]  ksys_read+0xc8/0x180
> > [   75.009462]  __arm64_sys_read+0x44/0x58
> > [   75.010085]  el0_svc_common.constprop.0+0xac/0x228
> > [   75.011006] kmem_cache_destroy test_module_slab: Slab cache still has objects
> >
> > Register a cpu hotplug function to remove all objects in the offline
> > per-cpu quarantine when cpu is going offline. Set a per-cpu variable
> > to indicate this cpu is offline.
> >
> > Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
> > ---
> >  mm/kasan/quarantine.c | 59 +++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 57 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
> > index 4c5375810449..67fb91ae2bd0 100644
> > --- a/mm/kasan/quarantine.c
> > +++ b/mm/kasan/quarantine.c
> > @@ -29,6 +29,7 @@
> >  #include <linux/srcu.h>
> >  #include <linux/string.h>
> >  #include <linux/types.h>
> > +#include <linux/cpuhotplug.h>
> >
> >  #include "../slab.h"
> >  #include "kasan.h"
> > @@ -97,6 +98,7 @@ static void qlist_move_all(struct qlist_head *from, struct qlist_head *to)
> >   * guarded by quarantine_lock.
> >   */
> 
> Hi Kuan-Ying,
> 
> Thanks for fixing this.
> 
> >  static DEFINE_PER_CPU(struct qlist_head, cpu_quarantine);
> > +static DEFINE_PER_CPU(int, cpu_quarantine_offline);
> 
> I think cpu_quarantine_offline is better be part of cpu_quarantine
> because it logically is and we already obtain a pointer to
> cpu_quarantine in quarantine_put, so it will also make the code a bit
> shorter.
> 

Ok. Got it.

> 
> >  /* Round-robin FIFO array of batches. */
> >  static struct qlist_head global_quarantine[QUARANTINE_BATCHES];
> > @@ -176,6 +178,8 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
> >         unsigned long flags;
> >         struct qlist_head *q;
> >         struct qlist_head temp = QLIST_INIT;
> > +       int *offline;
> > +       struct qlist_head q_offline = QLIST_INIT;
> >
> >         /*
> >          * Note: irq must be disabled until after we move the batch to the
> > @@ -187,8 +191,16 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
> >          */
> >         local_irq_save(flags);
> >
> > -       q = this_cpu_ptr(&cpu_quarantine);
> > -       qlist_put(q, &info->quarantine_link, cache->size);
> > +       offline = this_cpu_ptr(&cpu_quarantine_offline);
> > +       if (*offline == 0) {
> > +               q = this_cpu_ptr(&cpu_quarantine);
> > +               qlist_put(q, &info->quarantine_link, cache->size);
> > +       } else {
> > +               qlist_put(&q_offline, &info->quarantine_link, cache->size);
> > +               qlist_free_all(&q_offline, cache);
> 
> This looks like a convoluted way to call qlink_free. I think it will
> be better to call qlink_free directly here.
> 
> And why do we need this? Because CPU shutdown code can still free some
> objects afterwards?
> 

Yes, it is because IRQ can happen during CPU shutdown and free some
objects into offline CPU quarantine.

> > +               local_irq_restore(flags);
> > +               return;
> 
> You add both if/else and early return, this looks like unnecessary
> code complication. It would be simpler with:
> 
> if (*offline) {
>     qlink_free(...);
>     return;
> }
> ... all current per-cpu local ...
> 
> 

Thank you for reminder. v2 Will do it.

> > +       }
> >         if (unlikely(q->bytes > QUARANTINE_PERCPU_SIZE)) {
> >                 qlist_move_all(q, &temp);
> >
> > @@ -328,3 +340,46 @@ void quarantine_remove_cache(struct kmem_cache *cache)
> >
> >         synchronize_srcu(&remove_cache_srcu);
> >  }
> > +
> > +static int kasan_cpu_online(unsigned int cpu)
> > +{
> > +       int *offline;
> > +       unsigned long flags;
> > +
> > +       local_irq_save(flags);
> 
> I assume this local_irq_save/restore is to prevent some warnings from
> this_cpu_ptr.
> But CPU online/offline callbacks should run without preemption already
> (preempting/rescheduling on other CPUs does not make sense for them,
> right?), so I would assume that is already at least preemption
> disabled or something. Is there this_cpu_ptr variant that won't
> produce warnings on its own in cpu online/offline callbacks?
> This whole function could be a 1-liner:
> this_cpu_ptr(&cpu_quarantine)->offline = true;
> So I am trying to understand if we could avoid all this unnecessary danse.
> 

Yes, it's unnecessary. v2 will fix it.

> 
> > +       offline = this_cpu_ptr(&cpu_quarantine_offline);
> > +       *offline = 0;
> > +       local_irq_restore(flags);
> > +       return 0;
> > +}
> > +
> > +static int kasan_cpu_offline(unsigned int cpu)
> > +{
> > +       struct kmem_cache *s;
> > +       int *offline;
> > +       unsigned long flags;
> > +
> > +       local_irq_save(flags);
> > +       offline = this_cpu_ptr(&cpu_quarantine_offline);
> > +       *offline = 1;
> > +       local_irq_restore(flags);
> > +
> > +       mutex_lock(&slab_mutex);
> > +       list_for_each_entry(s, &slab_caches, list) {
> > +               per_cpu_remove_cache(s);
> > +       }
> > +       mutex_unlock(&slab_mutex);
> 
> We just want to drop the whole per-cpu cache at once, right? I would
> assume there should be a simpler way to do this all at once, rather
> than doing this per-slab.
> 

Yes.
Is removing objects in per-cpu quarantine directly better?

struct qlist_head *q;
q = this_cpu_ptr(&cpu_quaratine);
q->offline = true;
qlist_free_all(q, NULL);

> > +       return 0;
> > +}
> > +
> > +static int __init kasan_cpu_offline_quarantine_init(void)
> > +{
> > +       int ret = 0;
> > +
> > +       ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mm/kasan:online",
> > +                               kasan_cpu_online, kasan_cpu_offline);
> > +       if (ret)
> > +               pr_err("kasan offline cpu quarantine register failed [%d]\n", ret);
> > +       return ret;
> > +}
> > +late_initcall(kasan_cpu_offline_quarantine_init);
> > --
> > 2.18.0
> >
> > --
> > You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/1605162252-23886-2-git-send-email-Kuan-Ying.Lee%40mediatek.com.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-11-13  2:32 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-12  6:24 [PATCH 0/1] Fix objects remain in the offline per-cpu quarantine Kuan-Ying Lee
2020-11-12  6:24 ` Kuan-Ying Lee
2020-11-12  6:24 ` Kuan-Ying Lee
2020-11-12  6:24 ` [PATCH 1/1] kasan: fix object remain in " Kuan-Ying Lee
2020-11-12  6:24   ` Kuan-Ying Lee
2020-11-12  6:24   ` Kuan-Ying Lee
2020-11-12  8:39   ` Dmitry Vyukov
2020-11-12  8:39     ` Dmitry Vyukov
2020-11-12  8:39     ` Dmitry Vyukov
2020-11-12  8:39     ` Dmitry Vyukov
2020-11-13  2:31     ` Kuan-Ying Lee [this message]
2020-11-13  2:31       ` Kuan-Ying Lee
2020-11-13  2:31       ` Kuan-Ying Lee
2020-11-13  7:03       ` Dmitry Vyukov
2020-11-13  7:03         ` Dmitry Vyukov
2020-11-13  7:03         ` Dmitry Vyukov
2020-11-13  7:03         ` Dmitry Vyukov
2020-11-16  1:44         ` Kuan-Ying Lee
2020-11-16  1:44           ` Kuan-Ying Lee
2020-11-16  1:44           ` Kuan-Ying Lee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1605234714.30076.18.camel@mtksdccf07 \
    --to=kuan-ying.lee@mediatek.com \
    --cc=akpm@linux-foundation.org \
    --cc=aryabinin@virtuozzo.com \
    --cc=dvyukov@google.com \
    --cc=glider@google.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mediatek@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=matthias.bgg@gmail.com \
    --cc=miles.chen@mediatek.com \
    --cc=nicholas.tang@mediatek.com \
    --cc=wsd_upstream@mediatek.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.