netdev.vger.kernel.org archive mirror
* [PATCH v2 net] inet: frags: rework rhashtable dismantle
@ 2018-10-02  5:49 Eric Dumazet
  2018-10-02  8:19 ` Dmitry Vyukov
  2018-10-03  1:52 ` Eric Dumazet
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2018-10-02  5:49 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Thomas Graf, Herbert Xu

syzbot found an interesting use-after-free [1] happening
while the IPv4 fragment rhashtable was being destroyed at netns dismantle.

While no insertions can possibly happen at the time a dismantling
netns is destroying this rhashtable, timers can still fire and
attempt to remove elements from this rhashtable.

This is forbidden, since rhashtable_free_and_destroy() has
no synchronization against concurrent inserts and deletes.

Add a new nf->dead flag so that timers do not attempt
a rhashtable_remove_fast() operation.

[1]
BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:188 [inline]
BUG: KASAN: use-after-free in rhashtable_last_table+0x216/0x240 lib/rhashtable.c:217
Read of size 8 at addr ffff88019a4c8840 by task kworker/0:4/8279

CPU: 0 PID: 8279 Comm: kworker/0:4 Not tainted 4.19.0-rc5+ #61
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events rht_deferred_worker
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
 print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 __read_once_size include/linux/compiler.h:188 [inline]
 rhashtable_last_table+0x216/0x240 lib/rhashtable.c:217
 rht_deferred_worker+0x157/0x1de0 lib/rhashtable.c:410
 process_one_work+0xc90/0x1b90 kernel/workqueue.c:2153
 worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
 kthread+0x35a/0x420 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413

Allocated by task 5:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
 __do_kmalloc_node mm/slab.c:3682 [inline]
 __kmalloc_node+0x47/0x70 mm/slab.c:3689
 kmalloc_node include/linux/slab.h:555 [inline]
 kvmalloc_node+0xb9/0xf0 mm/util.c:423
 kvmalloc include/linux/mm.h:577 [inline]
 kvzalloc include/linux/mm.h:585 [inline]
 bucket_table_alloc+0x9a/0x4e0 lib/rhashtable.c:176
 rhashtable_rehash_alloc+0x73/0x100 lib/rhashtable.c:353
 rht_deferred_worker+0x278/0x1de0 lib/rhashtable.c:413
 process_one_work+0xc90/0x1b90 kernel/workqueue.c:2153
 worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
 kthread+0x35a/0x420 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413

Freed by task 8283:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xcf/0x230 mm/slab.c:3813
 kvfree+0x61/0x70 mm/util.c:452
 bucket_table_free+0xda/0x250 lib/rhashtable.c:108
 rhashtable_free_and_destroy+0x152/0x900 lib/rhashtable.c:1163
 inet_frags_exit_net+0x3d/0x50 net/ipv4/inet_fragment.c:96
 ipv4_frags_exit_net+0x73/0x90 net/ipv4/ip_fragment.c:914
 ops_exit_list.isra.7+0xb0/0x160 net/core/net_namespace.c:153
 cleanup_net+0x555/0xb10 net/core/net_namespace.c:551
 process_one_work+0xc90/0x1b90 kernel/workqueue.c:2153
 worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
 kthread+0x35a/0x420 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413

The buggy address belongs to the object at ffff88019a4c8800
 which belongs to the cache kmalloc-16384 of size 16384
The buggy address is located 64 bytes inside of
 16384-byte region [ffff88019a4c8800, ffff88019a4cc800)
The buggy address belongs to the page:
page:ffffea0006693200 count:1 mapcount:0 mapping:ffff8801da802200 index:0x0 compound_mapcount: 0
flags: 0x2fffc0000008100(slab|head)
raw: 02fffc0000008100 ffffea0006685608 ffffea0006617c08 ffff8801da802200
raw: 0000000000000000 ffff88019a4c8800 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88019a4c8700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88019a4c8780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88019a4c8800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
 ffff88019a4c8880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88019a4c8900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
 include/net/inet_frag.h  |  4 +++-
 net/ipv4/inet_fragment.c | 31 +++++++++++++++++++++----------
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 1662cbc0b46b45296a367ecbdaf03c68854fdce7..ffe5e1be40212fa63e360f3e29a56c1b2ce897ee 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -11,7 +11,7 @@ struct netns_frags {
 	int			timeout;
 	int			max_dist;
 	struct inet_frags	*f;
-
+	bool			dead;
 	struct rhashtable       rhashtable ____cacheline_aligned_in_smp;
 
 	/* Keep atomic mem on separate cachelines in structs that include it */
@@ -24,11 +24,13 @@ struct netns_frags {
  * @INET_FRAG_FIRST_IN: first fragment has arrived
  * @INET_FRAG_LAST_IN: final fragment has arrived
  * @INET_FRAG_COMPLETE: frag queue has been processed and is due for destruction
+ * @INET_FRAG_HASH_DEAD: inet_frag_kill() has not removed fq from rhashtable
  */
 enum {
 	INET_FRAG_FIRST_IN	= BIT(0),
 	INET_FRAG_LAST_IN	= BIT(1),
 	INET_FRAG_COMPLETE	= BIT(2),
+	INET_FRAG_HASH_DEAD	= BIT(3),
 };
 
 struct frag_v4_compare_key {
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index bcb11f3a27c0c34115af05034a5a20f57842eb0a..887134425350cc7c8ee081d8c83a037074d96d93 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -71,28 +71,31 @@ EXPORT_SYMBOL(inet_frags_fini);
 static void inet_frags_free_cb(void *ptr, void *arg)
 {
 	struct inet_frag_queue *fq = ptr;
+	int count = 0;
 
-	/* If we can not cancel the timer, it means this frag_queue
-	 * is already disappearing, we have nothing to do.
-	 * Otherwise, we own a refcount until the end of this function.
-	 */
-	if (!del_timer(&fq->timer))
-		return;
+	if (del_timer_sync(&fq->timer))
+		count++;
 
 	spin_lock_bh(&fq->lock);
 	if (!(fq->flags & INET_FRAG_COMPLETE)) {
 		fq->flags |= INET_FRAG_COMPLETE;
-		refcount_dec(&fq->refcnt);
+		count++;
+	} else if (fq->flags & INET_FRAG_HASH_DEAD) {
+		count++;
 	}
 	spin_unlock_bh(&fq->lock);
 
-	inet_frag_put(fq);
+	if (refcount_sub_and_test(count, &fq->refcnt))
+		inet_frag_destroy(fq);
 }
 
 void inet_frags_exit_net(struct netns_frags *nf)
 {
 	nf->high_thresh = 0; /* prevent creation of new frags */
 
+	/* paired with READ_ONCE() in inet_frag_kill() */
+	smp_store_release(&nf->dead, true);
+
 	rhashtable_free_and_destroy(&nf->rhashtable, inet_frags_free_cb, NULL);
 }
 EXPORT_SYMBOL(inet_frags_exit_net);
@@ -106,8 +109,16 @@ void inet_frag_kill(struct inet_frag_queue *fq)
 		struct netns_frags *nf = fq->net;
 
 		fq->flags |= INET_FRAG_COMPLETE;
-		rhashtable_remove_fast(&nf->rhashtable, &fq->node, nf->f->rhash_params);
-		refcount_dec(&fq->refcnt);
+		/* This READ_ONCE() is paired with smp_store_release()
+		 * in inet_frags_exit_net().
+		 */
+		if (!READ_ONCE(nf->dead)) {
+			rhashtable_remove_fast(&nf->rhashtable, &fq->node,
+					       nf->f->rhash_params);
+			refcount_dec(&fq->refcnt);
+		} else {
+			fq->flags |= INET_FRAG_HASH_DEAD;
+		}
 	}
 }
 EXPORT_SYMBOL(inet_frag_kill);
-- 
2.19.0.605.g01d371f741-goog


* Re: [PATCH v2 net] inet: frags: rework rhashtable dismantle
  2018-10-02  5:49 [PATCH v2 net] inet: frags: rework rhashtable dismantle Eric Dumazet
@ 2018-10-02  8:19 ` Dmitry Vyukov
  2018-10-02 13:16   ` Eric Dumazet
  2018-10-03  1:52 ` Eric Dumazet
  1 sibling, 1 reply; 8+ messages in thread
From: Dmitry Vyukov @ 2018-10-02  8:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Eric Dumazet, Thomas Graf, Herbert Xu

On Tue, Oct 2, 2018 at 7:49 AM, Eric Dumazet <edumazet@google.com> wrote:
> syzbot found an interesting use-after-free [1] happening
> while IPv4 fragment rhashtable was destroyed at netns dismantle.
>
> [...]
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index bcb11f3a27c0c34115af05034a5a20f57842eb0a..887134425350cc7c8ee081d8c83a037074d96d93 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -71,28 +71,31 @@ EXPORT_SYMBOL(inet_frags_fini);
>  static void inet_frags_free_cb(void *ptr, void *arg)
>  {
>         struct inet_frag_queue *fq = ptr;
> +       int count = 0;
>
> -       /* If we can not cancel the timer, it means this frag_queue
> -        * is already disappearing, we have nothing to do.
> -        * Otherwise, we own a refcount until the end of this function.
> -        */
> -       if (!del_timer(&fq->timer))
> -               return;
> +       if (del_timer_sync(&fq->timer))
> +               count++;
>
>         spin_lock_bh(&fq->lock);
>         if (!(fq->flags & INET_FRAG_COMPLETE)) {
>                 fq->flags |= INET_FRAG_COMPLETE;
> -               refcount_dec(&fq->refcnt);
> +               count++;
> +       } else if (fq->flags & INET_FRAG_HASH_DEAD) {
> +               count++;
>         }
>         spin_unlock_bh(&fq->lock);
>
> -       inet_frag_put(fq);
> +       if (refcount_sub_and_test(count, &fq->refcnt))
> +               inet_frag_destroy(fq);
>  }
>
>  void inet_frags_exit_net(struct netns_frags *nf)
>  {
>         nf->high_thresh = 0; /* prevent creation of new frags */
>
> +       /* paired with READ_ONCE() in inet_frag_kill() */
> +       smp_store_release(&nf->dead, true);
> +
>         rhashtable_free_and_destroy(&nf->rhashtable, inet_frags_free_cb, NULL);
>  }
>  EXPORT_SYMBOL(inet_frags_exit_net);
> @@ -106,8 +109,16 @@ void inet_frag_kill(struct inet_frag_queue *fq)
>                 struct netns_frags *nf = fq->net;
>
>                 fq->flags |= INET_FRAG_COMPLETE;
> -               rhashtable_remove_fast(&nf->rhashtable, &fq->node, nf->f->rhash_params);
> -               refcount_dec(&fq->refcnt);
> +               /* This READ_ONCE() is paired with smp_store_release()
> +                * in inet_frags_exit_net().
> +                */
> +               if (!READ_ONCE(nf->dead)) {


Does inet_frag_kill() hold fq->lock? I am missing how inet_frag_kill()
and inet_frags_exit_net() are synchronized.
Since you use smp_store_release()/READ_ONCE() they seem to run in
parallel. But then isn't it possible that inet_frag_kill() reads
nf->dead == 0, then inet_frags_exit_net() sets nf->dead, and then we
have the same race on concurrent removal? Or, isn't it possible that
inet_frag_kill() reads nf->dead == 1, but does not set
INET_FRAG_HASH_DEAD yet, and then inet_frags_free_cb() misses the
INET_FRAG_HASH_DEAD flag?


> +                       rhashtable_remove_fast(&nf->rhashtable, &fq->node,
> +                                              nf->f->rhash_params);
> +                       refcount_dec(&fq->refcnt);
> +               } else {
> +                       fq->flags |= INET_FRAG_HASH_DEAD;
> +               }
>         }
>  }
>  EXPORT_SYMBOL(inet_frag_kill);
> --
> 2.19.0.605.g01d371f741-goog
>


* Re: [PATCH v2 net] inet: frags: rework rhashtable dismantle
  2018-10-02  8:19 ` Dmitry Vyukov
@ 2018-10-02 13:16   ` Eric Dumazet
  2018-10-02 13:45     ` Dmitry Vyukov
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2018-10-02 13:16 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: David Miller, netdev, Eric Dumazet, Thomas Graf, Herbert Xu

On Tue, Oct 2, 2018 at 1:19 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Tue, Oct 2, 2018 at 7:49 AM, Eric Dumazet <edumazet@google.com> wrote:
>
>
> Does inet_frag_kill() hold fq->lock? I am missing how inet_frag_kill()
> and inet_frags_exit_net() are synchronized.
> Since you use smp_store_release()/READ_ONCE() they seem to run in
> parallel. But then isn't it possible that inet_frag_kill() reads
> nf->dead == 0, then inet_frags_exit_net() sets nf->dead, and then we
> have the same race on concurrent removal? Or, isn't it possible that
> inet_frag_kill() reads nf->dead == 1, but does not set
> INET_FRAG_HASH_DEAD yet, and then inet_frags_free_cb() misses the
> INET_FRAG_HASH_DEAD flag?
>

Yes, this is kind of implied in my patch.
I put the smp_store_release() and READ_ONCE() there exactly to document
the possible races.
This was the reason for my attempt in v1, doing a walk, but Herbert
said the walk interface was not designed for doing deletes.

Proper synchronization will need a synchronize_rcu(), and thus a future
conversion in net-next, because we cannot really add new
synchronize_rcu() calls in a (struct pernet_operations)->exit() handler
without a considerable performance hit on netns dismantles.

So this will require a conversion of all inet_frags_exit_net() callers
to .exit_batch() to mitigate the cost.

I thought of synchronize_rcu_bh() but this beast is going away soon anyway.


* Re: [PATCH v2 net] inet: frags: rework rhashtable dismantle
  2018-10-02 13:16   ` Eric Dumazet
@ 2018-10-02 13:45     ` Dmitry Vyukov
  2018-10-02 14:04       ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Vyukov @ 2018-10-02 13:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Eric Dumazet, Thomas Graf, Herbert Xu

On Tue, Oct 2, 2018 at 3:16 PM, Eric Dumazet <edumazet@google.com> wrote:
> On Tue, Oct 2, 2018 at 1:19 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>> [...]
>
> Yes this is kind of implied in my patch.
> I put the smp_store_release() and READ_ONCE exactly to document the
> possible races.
> This was the reason for my attempt in V1, doing a walk, but Herbert
> said walk was not designed for doing deletes.
>
> Proper synch will need a synchronize_rcu(), and thus a future
> conversion in net-next because we can not really
> add new synchronize_rcu() calls in an (struct
> pernet_operations.)exit() without considerable performance hit of
> netns dismantles.
>
> So this will require a conversion of all inet_frags_exit_net() callers
> to .exit_batch() to mitigate the cost.
>
> I thought of synchronize_rcu_bh() but this beast is going away soon anyway.

But if this patch allows all the same races and corruptions, then
what's the point?


* Re: [PATCH v2 net] inet: frags: rework rhashtable dismantle
  2018-10-02 13:45     ` Dmitry Vyukov
@ 2018-10-02 14:04       ` Eric Dumazet
  2018-10-02 14:28         ` Dmitry Vyukov
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2018-10-02 14:04 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: David Miller, netdev, Eric Dumazet, Thomas Graf, Herbert Xu

On Tue, Oct 2, 2018 at 6:46 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Tue, Oct 2, 2018 at 3:16 PM, Eric Dumazet <edumazet@google.com> wrote:
> > [...]
>
> But if this patch allows all the same races and corruptions, then
> what's the point?

Not really. The current race window can last dozens of seconds if you
have one million frags.

With the fix, the race window is on the order of one microsecond on
typical hosts.


* Re: [PATCH v2 net] inet: frags: rework rhashtable dismantle
  2018-10-02 14:04       ` Eric Dumazet
@ 2018-10-02 14:28         ` Dmitry Vyukov
  2018-10-02 14:32           ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Vyukov @ 2018-10-02 14:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Eric Dumazet, Thomas Graf, Herbert Xu

On Tue, Oct 2, 2018 at 4:04 PM, Eric Dumazet <edumazet@google.com> wrote:
> On Tue, Oct 2, 2018 at 6:46 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>>
>> [...]
>>
>> But if this patch allows all the same races and corruptions, then
>> what's the point?
>
> Not really. The current race window can last dozens of seconds if you
> have one million frags.
>
> With the fix, the race window is on the order of one microsecond on
> typical hosts.

Ah, I see. A known bug probably worth a comment in the code.


* Re: [PATCH v2 net] inet: frags: rework rhashtable dismantle
  2018-10-02 14:28         ` Dmitry Vyukov
@ 2018-10-02 14:32           ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2018-10-02 14:32 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: David Miller, netdev, Eric Dumazet, Thomas Graf, Herbert Xu

On Tue, Oct 2, 2018 at 7:28 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Tue, Oct 2, 2018 at 4:04 PM, Eric Dumazet <edumazet@google.com> wrote:
> >> [...]
> >>
> >> But if this patch allows all the same races and corruptions, then
> >> what's the point?
> >
> > Not really. The current race window can last dozens of seconds if you
> > have one million frags.
> >
> > With the fix, the race window is on the order of one microsecond on
> > typical hosts.
>
> Ah, I see. A known bug probably worth a comment in the code.

The plan is to use a proper synchronize_rcu() in net-next.

For the net tree, I prefer not having to deal with backport hassles,
since netns dismantle got many changes in the last 6 months.


* Re: [PATCH v2 net] inet: frags: rework rhashtable dismantle
  2018-10-02  5:49 [PATCH v2 net] inet: frags: rework rhashtable dismantle Eric Dumazet
  2018-10-02  8:19 ` Dmitry Vyukov
@ 2018-10-03  1:52 ` Eric Dumazet
  1 sibling, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2018-10-03  1:52 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Eric Dumazet, Thomas Graf, Herbert Xu



On 10/01/2018 10:49 PM, Eric Dumazet wrote:
> syzbot found an interesting use-after-free [1] happening
> while IPv4 fragment rhashtable was destroyed at netns dismantle.
>

David, please do not apply this patch.

I am working on another version, a bit more polished and allowing for more parallelism.

Thanks

