* [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle
@ 2019-05-24 16:03 Eric Dumazet
  2019-05-24 16:03 ` [PATCH net-next 01/11] inet: rename netns_frags to fqdir Eric Dumazet
                   ` (11 more replies)
  0 siblings, 12 replies; 28+ messages in thread
From: Eric Dumazet @ 2019-05-24 16:03 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

This patch series fixes a race that can happen at netns dismantle time
with frag queues: while rhashtable_free_and_destroy() is running,
concurrent timers might run inet_frag_kill() and attempt
rhashtable_remove_fast() calls on the same table, which the
rhashtable logic does not allow.
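
In call-chain form, as the code looks after patch 1 (abridged;
annotations are not from the patches), the two paths that can
collide are roughly:

/* netns dismantle path */
inet_frags_exit_net()			/* becomes fqdir_exit() in patch 2 */
    rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);

/* a frag-queue expiry timer firing on the same hash table */
ip_expire() / ip6_frag_expire() / nf_ct_frag6_expire() / ...
    inet_frag_kill(fq)
        rhashtable_remove_fast(&fqdir->rhashtable, &fq->node, fqdir->f->rhash_params);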

Since I do not want to add expensive synchronize_rcu() calls to the
netns dismantle path, I had to stop inlining the netns_frags
structures and instead allocate them dynamically.
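
A minimal userspace model of that structural change (illustrative
only: the *_model names are made up, and only the shape of the change
is meant to match the series):

#include <stdlib.h>

struct fqdir_model {			/* stands in for struct fqdir */
	long high_thresh;		/* 0 blocks creation of new frag queues */
};

struct netns_model {			/* stands in for struct net */
	struct fqdir_model *fqdir;	/* a pointer instead of an embedded
					 * struct, so it can outlive the
					 * netns dismantle path */
};

static int fqdir_model_init(struct netns_model *ns)
{
	ns->fqdir = calloc(1, sizeof(*ns->fqdir));
	return ns->fqdir ? 0 : -1;
}

static void fqdir_model_exit(struct netns_model *ns)
{
	struct fqdir_model *f = ns->fqdir;

	f->high_thresh = 0;	/* prevent creation of new frag queues */
	ns->fqdir = NULL;	/* the netns itself no longer has to wait */
	/* the dynamic allocation is what lets the real series defer the
	 * final teardown of 'f'; this toy model just frees it right away
	 */
	free(f);
}

int main(void)
{
	struct netns_model ns;

	if (fqdir_model_init(&ns))
		return 1;
	fqdir_model_exit(&ns);
	return 0;
}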

The first ten patches make this preparation, so that
the last patch clearly shows the fix.

As this patch series is not exactly trivial, I chose to
target 5.3. We will backport it once it has soaked a bit.

Eric Dumazet (11):
  inet: rename netns_frags to fqdir
  net: rename inet_frags_exit_net() to fqdir_exit()
  net: rename struct fqdir fields
  ipv4: no longer reference init_net in ip4_frags_ns_ctl_table[]
  ipv6: no longer reference init_net in ip6_frags_ns_ctl_table[]
  netfilter: ipv6: nf_defrag: no longer reference init_net in
    nf_ct_frag6_sysctl_table
  ieee802154: 6lowpan: no longer reference init_net in
    lowpan_frags_ns_ctl_table
  net: rename inet_frags_init_net() to fqdir_init()
  net: add a net pointer to struct fqdir
  net: dynamically allocate fqdir structures
  inet: frags: rework rhashtable dismantle

 include/net/inet_frag.h                 | 48 ++++++++----
 include/net/netns/ieee802154_6lowpan.h  |  2 +-
 include/net/netns/ipv4.h                |  2 +-
 include/net/netns/ipv6.h                |  4 +-
 net/ieee802154/6lowpan/reassembly.c     | 36 ++++-----
 net/ipv4/inet_fragment.c                | 98 ++++++++++++++++---------
 net/ipv4/ip_fragment.c                  | 67 +++++++----------
 net/ipv4/proc.c                         |  4 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c | 43 +++++------
 net/ipv6/proc.c                         |  4 +-
 net/ipv6/reassembly.c                   | 40 ++++------
 11 files changed, 181 insertions(+), 167 deletions(-)

-- 
2.22.0.rc1.257.g3120a18244-goog



* [PATCH net-next 01/11] inet: rename netns_frags to fqdir
  2019-05-24 16:03 [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle Eric Dumazet
@ 2019-05-24 16:03 ` Eric Dumazet
  2019-05-24 16:03 ` [PATCH net-next 02/11] net: rename inet_frags_exit_net() to fqdir_exit() Eric Dumazet
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2019-05-24 16:03 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

1) struct netns_frags is renamed to struct fqdir.
  This structure really holds many frag queues in a hash table.

2) The (struct inet_frag_queue)->net field is renamed to fqdir,
  since 'net' is generally associated with a 'struct net' pointer
  in the networking stack. A small illustrative model follows.
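
A compact userspace model of the relationship these names describe
(illustrative only; the real definitions are in the hunks below):

#include <stdio.h>

struct fqdir_model {			/* per-netns frag queue directory */
	long high_thresh;		/* sysctls; the rhashtable that actually
					 * holds the queues is omitted here */
};

struct frag_queue_model {		/* one reassembly queue */
	struct fqdir_model *fqdir;	/* back-pointer, formerly named ->net */
};

int main(void)
{
	struct fqdir_model dir = { .high_thresh = 4 << 20 };
	struct frag_queue_model q = { .fqdir = &dir };

	printf("queue's directory threshold: %ld\n", q.fqdir->high_thresh);
	return 0;
}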

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/inet_frag.h                 | 29 +++++++-------
 include/net/netns/ieee802154_6lowpan.h  |  2 +-
 include/net/netns/ipv4.h                |  2 +-
 include/net/netns/ipv6.h                |  4 +-
 net/ieee802154/6lowpan/reassembly.c     |  2 +-
 net/ipv4/inet_fragment.c                | 52 ++++++++++++-------------
 net/ipv4/ip_fragment.c                  | 20 +++++-----
 net/ipv6/netfilter/nf_conntrack_reasm.c |  4 +-
 net/ipv6/reassembly.c                   |  6 +--
 9 files changed, 61 insertions(+), 60 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 378904ee91296ff06a78d3ede4151892186f4545..b19b1ba44ac595215f44b9c86029d6ad27e26458 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -4,7 +4,8 @@
 
 #include <linux/rhashtable-types.h>
 
-struct netns_frags {
+/* Per netns frag queues directory */
+struct fqdir {
 	/* sysctls */
 	long			high_thresh;
 	long			low_thresh;
@@ -64,7 +65,7 @@ struct frag_v6_compare_key {
  * @meat: length of received fragments so far
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
- * @net: namespace that this frag belongs to
+ * @fqdir: pointer to struct fqdir
  * @rcu: rcu head for freeing deferall
  */
 struct inet_frag_queue {
@@ -84,7 +85,7 @@ struct inet_frag_queue {
 	int			meat;
 	__u8			flags;
 	u16			max_size;
-	struct netns_frags      *net;
+	struct fqdir		*fqdir;
 	struct rcu_head		rcu;
 };
 
@@ -103,16 +104,16 @@ struct inet_frags {
 int inet_frags_init(struct inet_frags *);
 void inet_frags_fini(struct inet_frags *);
 
-static inline int inet_frags_init_net(struct netns_frags *nf)
+static inline int inet_frags_init_net(struct fqdir *fqdir)
 {
-	atomic_long_set(&nf->mem, 0);
-	return rhashtable_init(&nf->rhashtable, &nf->f->rhash_params);
+	atomic_long_set(&fqdir->mem, 0);
+	return rhashtable_init(&fqdir->rhashtable, &fqdir->f->rhash_params);
 }
-void inet_frags_exit_net(struct netns_frags *nf);
+void inet_frags_exit_net(struct fqdir *fqdir);
 
 void inet_frag_kill(struct inet_frag_queue *q);
 void inet_frag_destroy(struct inet_frag_queue *q);
-struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key);
+struct inet_frag_queue *inet_frag_find(struct fqdir *fqdir, void *key);
 
 /* Free all skbs in the queue; return the sum of their truesizes. */
 unsigned int inet_frag_rbtree_purge(struct rb_root *root);
@@ -125,19 +126,19 @@ static inline void inet_frag_put(struct inet_frag_queue *q)
 
 /* Memory Tracking Functions. */
 
-static inline long frag_mem_limit(const struct netns_frags *nf)
+static inline long frag_mem_limit(const struct fqdir *fqdir)
 {
-	return atomic_long_read(&nf->mem);
+	return atomic_long_read(&fqdir->mem);
 }
 
-static inline void sub_frag_mem_limit(struct netns_frags *nf, long val)
+static inline void sub_frag_mem_limit(struct fqdir *fqdir, long val)
 {
-	atomic_long_sub(val, &nf->mem);
+	atomic_long_sub(val, &fqdir->mem);
 }
 
-static inline void add_frag_mem_limit(struct netns_frags *nf, long val)
+static inline void add_frag_mem_limit(struct fqdir *fqdir, long val)
 {
-	atomic_long_add(val, &nf->mem);
+	atomic_long_add(val, &fqdir->mem);
 }
 
 /* RFC 3168 support :
diff --git a/include/net/netns/ieee802154_6lowpan.h b/include/net/netns/ieee802154_6lowpan.h
index 736aeac52f56c98723901ade68b7adb0f10d2a37..48897cbcb538cbae7658bb03e5a5a702c2036739 100644
--- a/include/net/netns/ieee802154_6lowpan.h
+++ b/include/net/netns/ieee802154_6lowpan.h
@@ -16,7 +16,7 @@ struct netns_sysctl_lowpan {
 
 struct netns_ieee802154_lowpan {
 	struct netns_sysctl_lowpan sysctl;
-	struct netns_frags	frags;
+	struct fqdir	frags;
 };
 
 #endif
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 7698460a3dd1e5070e12d406b3ee58834688cdc9..22f712141962c2c86cd0210ea97a7f111de5ee16 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -72,7 +72,7 @@ struct netns_ipv4 {
 
 	struct inet_peer_base	*peers;
 	struct sock  * __percpu	*tcp_sk;
-	struct netns_frags	frags;
+	struct fqdir	frags;
 #ifdef CONFIG_NETFILTER
 	struct xt_table		*iptable_filter;
 	struct xt_table		*iptable_mangle;
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 5e61b5a8635d7a01181886e3968d258eb7d74698..a22e8702d82866576c235489a810040adb4267c7 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -58,7 +58,7 @@ struct netns_ipv6 {
 	struct ipv6_devconf	*devconf_all;
 	struct ipv6_devconf	*devconf_dflt;
 	struct inet_peer_base	*peers;
-	struct netns_frags	frags;
+	struct fqdir	frags;
 #ifdef CONFIG_NETFILTER
 	struct xt_table		*ip6table_filter;
 	struct xt_table		*ip6table_mangle;
@@ -116,7 +116,7 @@ struct netns_ipv6 {
 
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
 struct netns_nf_frag {
-	struct netns_frags	frags;
+	struct fqdir	frags;
 };
 #endif
 
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 4196bcd4105ae047b88cdaf090055980144fcc2b..8551d307f2149d0c9e74c7d9f89665b082ed63bc 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -139,7 +139,7 @@ static int lowpan_frag_queue(struct lowpan_frag_queue *fq,
 		fq->q.flags |= INET_FRAG_FIRST_IN;
 
 	fq->q.meat += skb->len;
-	add_frag_mem_limit(fq->q.net, skb->truesize);
+	add_frag_mem_limit(fq->q.fqdir, skb->truesize);
 
 	if (fq->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
 	    fq->q.meat == fq->q.len) {
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 737808e27f8b1cd65aac21c9316a5db5d0f40111..f8de2860e3a3e0941d29c9d65ab8336cfee56f65 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -145,11 +145,11 @@ static void inet_frags_free_cb(void *ptr, void *arg)
 	inet_frag_put(fq);
 }
 
-void inet_frags_exit_net(struct netns_frags *nf)
+void inet_frags_exit_net(struct fqdir *fqdir)
 {
-	nf->high_thresh = 0; /* prevent creation of new frags */
+	fqdir->high_thresh = 0; /* prevent creation of new frags */
 
-	rhashtable_free_and_destroy(&nf->rhashtable, inet_frags_free_cb, NULL);
+	rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
 }
 EXPORT_SYMBOL(inet_frags_exit_net);
 
@@ -159,10 +159,10 @@ void inet_frag_kill(struct inet_frag_queue *fq)
 		refcount_dec(&fq->refcnt);
 
 	if (!(fq->flags & INET_FRAG_COMPLETE)) {
-		struct netns_frags *nf = fq->net;
+		struct fqdir *fqdir = fq->fqdir;
 
 		fq->flags |= INET_FRAG_COMPLETE;
-		rhashtable_remove_fast(&nf->rhashtable, &fq->node, nf->f->rhash_params);
+		rhashtable_remove_fast(&fqdir->rhashtable, &fq->node, fqdir->f->rhash_params);
 		refcount_dec(&fq->refcnt);
 	}
 }
@@ -172,7 +172,7 @@ static void inet_frag_destroy_rcu(struct rcu_head *head)
 {
 	struct inet_frag_queue *q = container_of(head, struct inet_frag_queue,
 						 rcu);
-	struct inet_frags *f = q->net->f;
+	struct inet_frags *f = q->fqdir->f;
 
 	if (f->destructor)
 		f->destructor(q);
@@ -203,7 +203,7 @@ EXPORT_SYMBOL(inet_frag_rbtree_purge);
 
 void inet_frag_destroy(struct inet_frag_queue *q)
 {
-	struct netns_frags *nf;
+	struct fqdir *fqdir;
 	unsigned int sum, sum_truesize = 0;
 	struct inet_frags *f;
 
@@ -211,18 +211,18 @@ void inet_frag_destroy(struct inet_frag_queue *q)
 	WARN_ON(del_timer(&q->timer) != 0);
 
 	/* Release all fragment data. */
-	nf = q->net;
-	f = nf->f;
+	fqdir = q->fqdir;
+	f = fqdir->f;
 	sum_truesize = inet_frag_rbtree_purge(&q->rb_fragments);
 	sum = sum_truesize + f->qsize;
 
 	call_rcu(&q->rcu, inet_frag_destroy_rcu);
 
-	sub_frag_mem_limit(nf, sum);
+	sub_frag_mem_limit(fqdir, sum);
 }
 EXPORT_SYMBOL(inet_frag_destroy);
 
-static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,
+static struct inet_frag_queue *inet_frag_alloc(struct fqdir *fqdir,
 					       struct inet_frags *f,
 					       void *arg)
 {
@@ -232,9 +232,9 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,
 	if (!q)
 		return NULL;
 
-	q->net = nf;
+	q->fqdir = fqdir;
 	f->constructor(q, arg);
-	add_frag_mem_limit(nf, f->qsize);
+	add_frag_mem_limit(fqdir, f->qsize);
 
 	timer_setup(&q->timer, f->frag_expire, 0);
 	spin_lock_init(&q->lock);
@@ -243,21 +243,21 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,
 	return q;
 }
 
-static struct inet_frag_queue *inet_frag_create(struct netns_frags *nf,
+static struct inet_frag_queue *inet_frag_create(struct fqdir *fqdir,
 						void *arg,
 						struct inet_frag_queue **prev)
 {
-	struct inet_frags *f = nf->f;
+	struct inet_frags *f = fqdir->f;
 	struct inet_frag_queue *q;
 
-	q = inet_frag_alloc(nf, f, arg);
+	q = inet_frag_alloc(fqdir, f, arg);
 	if (!q) {
 		*prev = ERR_PTR(-ENOMEM);
 		return NULL;
 	}
-	mod_timer(&q->timer, jiffies + nf->timeout);
+	mod_timer(&q->timer, jiffies + fqdir->timeout);
 
-	*prev = rhashtable_lookup_get_insert_key(&nf->rhashtable, &q->key,
+	*prev = rhashtable_lookup_get_insert_key(&fqdir->rhashtable, &q->key,
 						 &q->node, f->rhash_params);
 	if (*prev) {
 		q->flags |= INET_FRAG_COMPLETE;
@@ -269,18 +269,18 @@ static struct inet_frag_queue *inet_frag_create(struct netns_frags *nf,
 }
 
 /* TODO : call from rcu_read_lock() and no longer use refcount_inc_not_zero() */
-struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key)
+struct inet_frag_queue *inet_frag_find(struct fqdir *fqdir, void *key)
 {
 	struct inet_frag_queue *fq = NULL, *prev;
 
-	if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh)
+	if (!fqdir->high_thresh || frag_mem_limit(fqdir) > fqdir->high_thresh)
 		return NULL;
 
 	rcu_read_lock();
 
-	prev = rhashtable_lookup(&nf->rhashtable, key, nf->f->rhash_params);
+	prev = rhashtable_lookup(&fqdir->rhashtable, key, fqdir->f->rhash_params);
 	if (!prev)
-		fq = inet_frag_create(nf, key, &prev);
+		fq = inet_frag_create(fqdir, key, &prev);
 	if (prev && !IS_ERR(prev)) {
 		fq = prev;
 		if (!refcount_inc_not_zero(&fq->refcnt))
@@ -391,7 +391,7 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
 
 	delta += head->truesize;
 	if (delta)
-		add_frag_mem_limit(q->net, delta);
+		add_frag_mem_limit(q->fqdir, delta);
 
 	/* If the first fragment is fragmented itself, we split
 	 * it to two chunks: the first with data and paged part
@@ -413,7 +413,7 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
 		head->truesize += clone->truesize;
 		clone->csum = 0;
 		clone->ip_summed = head->ip_summed;
-		add_frag_mem_limit(q->net, clone->truesize);
+		add_frag_mem_limit(q->fqdir, clone->truesize);
 		skb_shinfo(head)->frag_list = clone;
 		nextp = &clone->next;
 	} else {
@@ -466,7 +466,7 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
 			rbn = rbnext;
 		}
 	}
-	sub_frag_mem_limit(q->net, head->truesize);
+	sub_frag_mem_limit(q->fqdir, head->truesize);
 
 	*nextp = NULL;
 	skb_mark_not_on_list(head);
@@ -494,7 +494,7 @@ struct sk_buff *inet_frag_pull_head(struct inet_frag_queue *q)
 	if (head == q->fragments_tail)
 		q->fragments_tail = NULL;
 
-	sub_frag_mem_limit(q->net, head->truesize);
+	sub_frag_mem_limit(q->fqdir, head->truesize);
 
 	return head;
 }
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index cf2b0a6a33372a7a6ed8bc30f505a350be463d91..c93e27cb0a8d1e404fd54e6aa5ea6a99ccecba4a 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -82,7 +82,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
 static void ip4_frag_init(struct inet_frag_queue *q, const void *a)
 {
 	struct ipq *qp = container_of(q, struct ipq, q);
-	struct netns_ipv4 *ipv4 = container_of(q->net, struct netns_ipv4,
+	struct netns_ipv4 *ipv4 = container_of(q->fqdir, struct netns_ipv4,
 					       frags);
 	struct net *net = container_of(ipv4, struct net, ipv4);
 
@@ -90,7 +90,7 @@ static void ip4_frag_init(struct inet_frag_queue *q, const void *a)
 
 	q->key.v4 = *key;
 	qp->ecn = 0;
-	qp->peer = q->net->max_dist ?
+	qp->peer = q->fqdir->max_dist ?
 		inet_getpeer_v4(net->ipv4.peers, key->saddr, key->vif, 1) :
 		NULL;
 }
@@ -142,7 +142,7 @@ static void ip_expire(struct timer_list *t)
 	int err;
 
 	qp = container_of(frag, struct ipq, q);
-	net = container_of(qp->q.net, struct net, ipv4.frags);
+	net = container_of(qp->q.fqdir, struct net, ipv4.frags);
 
 	rcu_read_lock();
 	spin_lock(&qp->q.lock);
@@ -222,7 +222,7 @@ static struct ipq *ip_find(struct net *net, struct iphdr *iph,
 static int ip_frag_too_far(struct ipq *qp)
 {
 	struct inet_peer *peer = qp->peer;
-	unsigned int max = qp->q.net->max_dist;
+	unsigned int max = qp->q.fqdir->max_dist;
 	unsigned int start, end;
 
 	int rc;
@@ -239,7 +239,7 @@ static int ip_frag_too_far(struct ipq *qp)
 	if (rc) {
 		struct net *net;
 
-		net = container_of(qp->q.net, struct net, ipv4.frags);
+		net = container_of(qp->q.fqdir, struct net, ipv4.frags);
 		__IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);
 	}
 
@@ -250,13 +250,13 @@ static int ip_frag_reinit(struct ipq *qp)
 {
 	unsigned int sum_truesize = 0;
 
-	if (!mod_timer(&qp->q.timer, jiffies + qp->q.net->timeout)) {
+	if (!mod_timer(&qp->q.timer, jiffies + qp->q.fqdir->timeout)) {
 		refcount_inc(&qp->q.refcnt);
 		return -ETIMEDOUT;
 	}
 
 	sum_truesize = inet_frag_rbtree_purge(&qp->q.rb_fragments);
-	sub_frag_mem_limit(qp->q.net, sum_truesize);
+	sub_frag_mem_limit(qp->q.fqdir, sum_truesize);
 
 	qp->q.flags = 0;
 	qp->q.len = 0;
@@ -273,7 +273,7 @@ static int ip_frag_reinit(struct ipq *qp)
 /* Add new segment to existing queue. */
 static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 {
-	struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
+	struct net *net = container_of(qp->q.fqdir, struct net, ipv4.frags);
 	int ihl, end, flags, offset;
 	struct sk_buff *prev_tail;
 	struct net_device *dev;
@@ -352,7 +352,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 	qp->q.stamp = skb->tstamp;
 	qp->q.meat += skb->len;
 	qp->ecn |= ecn;
-	add_frag_mem_limit(qp->q.net, skb->truesize);
+	add_frag_mem_limit(qp->q.fqdir, skb->truesize);
 	if (offset == 0)
 		qp->q.flags |= INET_FRAG_FIRST_IN;
 
@@ -399,7 +399,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
 			 struct sk_buff *prev_tail, struct net_device *dev)
 {
-	struct net *net = container_of(qp->q.net, struct net, ipv4.frags);
+	struct net *net = container_of(qp->q.fqdir, struct net, ipv4.frags);
 	struct iphdr *iph;
 	void *reasm_data;
 	int len, err;
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 3de0e9b0a48247f0eef0691295642d6d8f521280..5b877d732b2fff23adb5be64dcbed587ab4e8077 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -151,7 +151,7 @@ static void nf_ct_frag6_expire(struct timer_list *t)
 	struct net *net;
 
 	fq = container_of(frag, struct frag_queue, q);
-	net = container_of(fq->q.net, struct net, nf_frag.frags);
+	net = container_of(fq->q.fqdir, struct net, nf_frag.frags);
 
 	ip6frag_expire_frag_queue(net, fq);
 }
@@ -276,7 +276,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
 	fq->ecn |= ecn;
 	if (payload_len > fq->q.max_size)
 		fq->q.max_size = payload_len;
-	add_frag_mem_limit(fq->q.net, skb->truesize);
+	add_frag_mem_limit(fq->q.fqdir, skb->truesize);
 
 	/* The first fragment.
 	 * nhoffset is obtained from the first fragment, of course.
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 1a832f5e190bb00554026b50837329606371ed47..acd5a9a04415506570da67cc3dcee9cb61cfbd5b 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -79,7 +79,7 @@ static void ip6_frag_expire(struct timer_list *t)
 	struct net *net;
 
 	fq = container_of(frag, struct frag_queue, q);
-	net = container_of(fq->q.net, struct net, ipv6.frags);
+	net = container_of(fq->q.fqdir, struct net, ipv6.frags);
 
 	ip6frag_expire_frag_queue(net, fq);
 }
@@ -200,7 +200,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 	fq->q.stamp = skb->tstamp;
 	fq->q.meat += skb->len;
 	fq->ecn |= ecn;
-	add_frag_mem_limit(fq->q.net, skb->truesize);
+	add_frag_mem_limit(fq->q.fqdir, skb->truesize);
 
 	fragsize = -skb_network_offset(skb) + skb->len;
 	if (fragsize > fq->q.max_size)
@@ -254,7 +254,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *skb,
 			  struct sk_buff *prev_tail, struct net_device *dev)
 {
-	struct net *net = container_of(fq->q.net, struct net, ipv6.frags);
+	struct net *net = container_of(fq->q.fqdir, struct net, ipv6.frags);
 	unsigned int nhoff;
 	void *reasm_data;
 	int payload_len;
-- 
2.22.0.rc1.257.g3120a18244-goog



* [PATCH net-next 02/11] net: rename inet_frags_exit_net() to fqdir_exit()
  2019-05-24 16:03 [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle Eric Dumazet
  2019-05-24 16:03 ` [PATCH net-next 01/11] inet: rename netns_frags to fqdir Eric Dumazet
@ 2019-05-24 16:03 ` Eric Dumazet
  2019-05-24 16:03 ` [PATCH net-next 03/11] net: rename struct fqdir fields Eric Dumazet
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2019-05-24 16:03 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/inet_frag.h                 | 2 +-
 net/ieee802154/6lowpan/reassembly.c     | 4 ++--
 net/ipv4/inet_fragment.c                | 4 ++--
 net/ipv4/ip_fragment.c                  | 4 ++--
 net/ipv6/netfilter/nf_conntrack_reasm.c | 4 ++--
 net/ipv6/reassembly.c                   | 4 ++--
 6 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index b19b1ba44ac595215f44b9c86029d6ad27e26458..d1bfd5dbe2d439e1cd9c620e5197ffbffb05920a 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -109,7 +109,7 @@ static inline int inet_frags_init_net(struct fqdir *fqdir)
 	atomic_long_set(&fqdir->mem, 0);
 	return rhashtable_init(&fqdir->rhashtable, &fqdir->f->rhash_params);
 }
-void inet_frags_exit_net(struct fqdir *fqdir);
+void fqdir_exit(struct fqdir *fqdir);
 
 void inet_frag_kill(struct inet_frag_queue *q);
 void inet_frag_destroy(struct inet_frag_queue *q);
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 8551d307f2149d0c9e74c7d9f89665b082ed63bc..dc73452d3224b7b125405ba577e8d222e2ee9db3 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -464,7 +464,7 @@ static int __net_init lowpan_frags_init_net(struct net *net)
 		return res;
 	res = lowpan_frags_ns_sysctl_register(net);
 	if (res < 0)
-		inet_frags_exit_net(&ieee802154_lowpan->frags);
+		fqdir_exit(&ieee802154_lowpan->frags);
 	return res;
 }
 
@@ -474,7 +474,7 @@ static void __net_exit lowpan_frags_exit_net(struct net *net)
 		net_ieee802154_lowpan(net);
 
 	lowpan_frags_ns_sysctl_unregister(net);
-	inet_frags_exit_net(&ieee802154_lowpan->frags);
+	fqdir_exit(&ieee802154_lowpan->frags);
 }
 
 static struct pernet_operations lowpan_frags_ops = {
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index f8de2860e3a3e0941d29c9d65ab8336cfee56f65..a5ec5d9567931d03cfa3fbe1a370857ed8dc3b3d 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -145,13 +145,13 @@ static void inet_frags_free_cb(void *ptr, void *arg)
 	inet_frag_put(fq);
 }
 
-void inet_frags_exit_net(struct fqdir *fqdir)
+void fqdir_exit(struct fqdir *fqdir)
 {
 	fqdir->high_thresh = 0; /* prevent creation of new frags */
 
 	rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
 }
-EXPORT_SYMBOL(inet_frags_exit_net);
+EXPORT_SYMBOL(fqdir_exit);
 
 void inet_frag_kill(struct inet_frag_queue *fq)
 {
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index c93e27cb0a8d1e404fd54e6aa5ea6a99ccecba4a..9de13b5d23e37f585aa50e636e009e9d21083d02 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -685,14 +685,14 @@ static int __net_init ipv4_frags_init_net(struct net *net)
 		return res;
 	res = ip4_frags_ns_ctl_register(net);
 	if (res < 0)
-		inet_frags_exit_net(&net->ipv4.frags);
+		fqdir_exit(&net->ipv4.frags);
 	return res;
 }
 
 static void __net_exit ipv4_frags_exit_net(struct net *net)
 {
 	ip4_frags_ns_ctl_unregister(net);
-	inet_frags_exit_net(&net->ipv4.frags);
+	fqdir_exit(&net->ipv4.frags);
 }
 
 static struct pernet_operations ip4_frags_ops = {
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 5b877d732b2fff23adb5be64dcbed587ab4e8077..f08e1422c56dcd207d148ecb578df4a8f7ac2d9d 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -506,14 +506,14 @@ static int nf_ct_net_init(struct net *net)
 		return res;
 	res = nf_ct_frag6_sysctl_register(net);
 	if (res < 0)
-		inet_frags_exit_net(&net->nf_frag.frags);
+		fqdir_exit(&net->nf_frag.frags);
 	return res;
 }
 
 static void nf_ct_net_exit(struct net *net)
 {
 	nf_ct_frags6_sysctl_unregister(net);
-	inet_frags_exit_net(&net->nf_frag.frags);
+	fqdir_exit(&net->nf_frag.frags);
 }
 
 static struct pernet_operations nf_ct_net_ops = {
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index acd5a9a04415506570da67cc3dcee9cb61cfbd5b..f1142f5d5075a56164aa3f1f56c12328ab99747c 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -528,14 +528,14 @@ static int __net_init ipv6_frags_init_net(struct net *net)
 
 	res = ip6_frags_ns_sysctl_register(net);
 	if (res < 0)
-		inet_frags_exit_net(&net->ipv6.frags);
+		fqdir_exit(&net->ipv6.frags);
 	return res;
 }
 
 static void __net_exit ipv6_frags_exit_net(struct net *net)
 {
 	ip6_frags_ns_sysctl_unregister(net);
-	inet_frags_exit_net(&net->ipv6.frags);
+	fqdir_exit(&net->ipv6.frags);
 }
 
 static struct pernet_operations ip6_frags_ops = {
-- 
2.22.0.rc1.257.g3120a18244-goog



* [PATCH net-next 03/11] net: rename struct fqdir fields
  2019-05-24 16:03 [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle Eric Dumazet
  2019-05-24 16:03 ` [PATCH net-next 01/11] inet: rename netns_frags to fqdir Eric Dumazet
  2019-05-24 16:03 ` [PATCH net-next 02/11] net: rename inet_frags_exit_net() to fqdir_exit() Eric Dumazet
@ 2019-05-24 16:03 ` Eric Dumazet
  2019-05-24 16:03 ` [PATCH net-next 04/11] ipv4: no longer reference init_net in ip4_frags_ns_ctl_table[] Eric Dumazet
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2019-05-24 16:03 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

Rename the @frags fields in structs netns_ipv4, netns_ipv6,
netns_nf_frag and netns_ieee802154_lowpan to @fqdir.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/netns/ieee802154_6lowpan.h  |  2 +-
 include/net/netns/ipv4.h                |  2 +-
 include/net/netns/ipv6.h                |  4 +-
 net/ieee802154/6lowpan/reassembly.c     | 36 ++++++++---------
 net/ipv4/ip_fragment.c                  | 52 ++++++++++++-------------
 net/ipv4/proc.c                         |  4 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c | 40 +++++++++----------
 net/ipv6/proc.c                         |  4 +-
 net/ipv6/reassembly.c                   | 40 +++++++++----------
 9 files changed, 92 insertions(+), 92 deletions(-)

diff --git a/include/net/netns/ieee802154_6lowpan.h b/include/net/netns/ieee802154_6lowpan.h
index 48897cbcb538cbae7658bb03e5a5a702c2036739..d27ac64f8dfefcc2253c14c04af193e6265b9e02 100644
--- a/include/net/netns/ieee802154_6lowpan.h
+++ b/include/net/netns/ieee802154_6lowpan.h
@@ -16,7 +16,7 @@ struct netns_sysctl_lowpan {
 
 struct netns_ieee802154_lowpan {
 	struct netns_sysctl_lowpan sysctl;
-	struct fqdir	frags;
+	struct fqdir		fqdir;
 };
 
 #endif
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 22f712141962c2c86cd0210ea97a7f111de5ee16..3c270baa32e030b36594627a77abe3b4ffc775f5 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -72,7 +72,7 @@ struct netns_ipv4 {
 
 	struct inet_peer_base	*peers;
 	struct sock  * __percpu	*tcp_sk;
-	struct fqdir	frags;
+	struct fqdir		fqdir;
 #ifdef CONFIG_NETFILTER
 	struct xt_table		*iptable_filter;
 	struct xt_table		*iptable_mangle;
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index a22e8702d82866576c235489a810040adb4267c7..3dd2ae2a38e2aead40f2bcf13e40b79ca492ad48 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -58,7 +58,7 @@ struct netns_ipv6 {
 	struct ipv6_devconf	*devconf_all;
 	struct ipv6_devconf	*devconf_dflt;
 	struct inet_peer_base	*peers;
-	struct fqdir	frags;
+	struct fqdir		fqdir;
 #ifdef CONFIG_NETFILTER
 	struct xt_table		*ip6table_filter;
 	struct xt_table		*ip6table_mangle;
@@ -116,7 +116,7 @@ struct netns_ipv6 {
 
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
 struct netns_nf_frag {
-	struct fqdir	frags;
+	struct fqdir	fqdir;
 };
 #endif
 
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index dc73452d3224b7b125405ba577e8d222e2ee9db3..955047fe797a10995845bca51cb9770e3356a351 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -79,7 +79,7 @@ fq_find(struct net *net, const struct lowpan_802154_cb *cb,
 	key.src = *src;
 	key.dst = *dst;
 
-	q = inet_frag_find(&ieee802154_lowpan->frags, &key);
+	q = inet_frag_find(&ieee802154_lowpan->fqdir, &key);
 	if (!q)
 		return NULL;
 
@@ -326,23 +326,23 @@ int lowpan_frag_rcv(struct sk_buff *skb, u8 frag_type)
 static struct ctl_table lowpan_frags_ns_ctl_table[] = {
 	{
 		.procname	= "6lowpanfrag_high_thresh",
-		.data		= &init_net.ieee802154_lowpan.frags.high_thresh,
+		.data		= &init_net.ieee802154_lowpan.fqdir.high_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra1		= &init_net.ieee802154_lowpan.frags.low_thresh
+		.extra1		= &init_net.ieee802154_lowpan.fqdir.low_thresh
 	},
 	{
 		.procname	= "6lowpanfrag_low_thresh",
-		.data		= &init_net.ieee802154_lowpan.frags.low_thresh,
+		.data		= &init_net.ieee802154_lowpan.fqdir.low_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra2		= &init_net.ieee802154_lowpan.frags.high_thresh
+		.extra2		= &init_net.ieee802154_lowpan.fqdir.high_thresh
 	},
 	{
 		.procname	= "6lowpanfrag_time",
-		.data		= &init_net.ieee802154_lowpan.frags.timeout,
+		.data		= &init_net.ieee802154_lowpan.fqdir.timeout,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
@@ -377,11 +377,11 @@ static int __net_init lowpan_frags_ns_sysctl_register(struct net *net)
 		if (table == NULL)
 			goto err_alloc;
 
-		table[0].data = &ieee802154_lowpan->frags.high_thresh;
-		table[0].extra1 = &ieee802154_lowpan->frags.low_thresh;
-		table[1].data = &ieee802154_lowpan->frags.low_thresh;
-		table[1].extra2 = &ieee802154_lowpan->frags.high_thresh;
-		table[2].data = &ieee802154_lowpan->frags.timeout;
+		table[0].data = &ieee802154_lowpan->fqdir.high_thresh;
+		table[0].extra1 = &ieee802154_lowpan->fqdir.low_thresh;
+		table[1].data = &ieee802154_lowpan->fqdir.low_thresh;
+		table[1].extra2 = &ieee802154_lowpan->fqdir.high_thresh;
+		table[2].data = &ieee802154_lowpan->fqdir.timeout;
 
 		/* Don't export sysctls to unprivileged users */
 		if (net->user_ns != &init_user_ns)
@@ -454,17 +454,17 @@ static int __net_init lowpan_frags_init_net(struct net *net)
 		net_ieee802154_lowpan(net);
 	int res;
 
-	ieee802154_lowpan->frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
-	ieee802154_lowpan->frags.low_thresh = IPV6_FRAG_LOW_THRESH;
-	ieee802154_lowpan->frags.timeout = IPV6_FRAG_TIMEOUT;
-	ieee802154_lowpan->frags.f = &lowpan_frags;
+	ieee802154_lowpan->fqdir.high_thresh = IPV6_FRAG_HIGH_THRESH;
+	ieee802154_lowpan->fqdir.low_thresh = IPV6_FRAG_LOW_THRESH;
+	ieee802154_lowpan->fqdir.timeout = IPV6_FRAG_TIMEOUT;
+	ieee802154_lowpan->fqdir.f = &lowpan_frags;
 
-	res = inet_frags_init_net(&ieee802154_lowpan->frags);
+	res = inet_frags_init_net(&ieee802154_lowpan->fqdir);
 	if (res < 0)
 		return res;
 	res = lowpan_frags_ns_sysctl_register(net);
 	if (res < 0)
-		fqdir_exit(&ieee802154_lowpan->frags);
+		fqdir_exit(&ieee802154_lowpan->fqdir);
 	return res;
 }
 
@@ -474,7 +474,7 @@ static void __net_exit lowpan_frags_exit_net(struct net *net)
 		net_ieee802154_lowpan(net);
 
 	lowpan_frags_ns_sysctl_unregister(net);
-	fqdir_exit(&ieee802154_lowpan->frags);
+	fqdir_exit(&ieee802154_lowpan->fqdir);
 }
 
 static struct pernet_operations lowpan_frags_ops = {
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 9de13b5d23e37f585aa50e636e009e9d21083d02..f1831367cc2b188bdcc93f25818dda13e4348427 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -83,7 +83,7 @@ static void ip4_frag_init(struct inet_frag_queue *q, const void *a)
 {
 	struct ipq *qp = container_of(q, struct ipq, q);
 	struct netns_ipv4 *ipv4 = container_of(q->fqdir, struct netns_ipv4,
-					       frags);
+					       fqdir);
 	struct net *net = container_of(ipv4, struct net, ipv4);
 
 	const struct frag_v4_compare_key *key = a;
@@ -142,7 +142,7 @@ static void ip_expire(struct timer_list *t)
 	int err;
 
 	qp = container_of(frag, struct ipq, q);
-	net = container_of(qp->q.fqdir, struct net, ipv4.frags);
+	net = container_of(qp->q.fqdir, struct net, ipv4.fqdir);
 
 	rcu_read_lock();
 	spin_lock(&qp->q.lock);
@@ -211,7 +211,7 @@ static struct ipq *ip_find(struct net *net, struct iphdr *iph,
 	};
 	struct inet_frag_queue *q;
 
-	q = inet_frag_find(&net->ipv4.frags, &key);
+	q = inet_frag_find(&net->ipv4.fqdir, &key);
 	if (!q)
 		return NULL;
 
@@ -239,7 +239,7 @@ static int ip_frag_too_far(struct ipq *qp)
 	if (rc) {
 		struct net *net;
 
-		net = container_of(qp->q.fqdir, struct net, ipv4.frags);
+		net = container_of(qp->q.fqdir, struct net, ipv4.fqdir);
 		__IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);
 	}
 
@@ -273,7 +273,7 @@ static int ip_frag_reinit(struct ipq *qp)
 /* Add new segment to existing queue. */
 static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 {
-	struct net *net = container_of(qp->q.fqdir, struct net, ipv4.frags);
+	struct net *net = container_of(qp->q.fqdir, struct net, ipv4.fqdir);
 	int ihl, end, flags, offset;
 	struct sk_buff *prev_tail;
 	struct net_device *dev;
@@ -399,7 +399,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
 			 struct sk_buff *prev_tail, struct net_device *dev)
 {
-	struct net *net = container_of(qp->q.fqdir, struct net, ipv4.frags);
+	struct net *net = container_of(qp->q.fqdir, struct net, ipv4.fqdir);
 	struct iphdr *iph;
 	void *reasm_data;
 	int len, err;
@@ -544,30 +544,30 @@ static int dist_min;
 static struct ctl_table ip4_frags_ns_ctl_table[] = {
 	{
 		.procname	= "ipfrag_high_thresh",
-		.data		= &init_net.ipv4.frags.high_thresh,
+		.data		= &init_net.ipv4.fqdir.high_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra1		= &init_net.ipv4.frags.low_thresh
+		.extra1		= &init_net.ipv4.fqdir.low_thresh
 	},
 	{
 		.procname	= "ipfrag_low_thresh",
-		.data		= &init_net.ipv4.frags.low_thresh,
+		.data		= &init_net.ipv4.fqdir.low_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra2		= &init_net.ipv4.frags.high_thresh
+		.extra2		= &init_net.ipv4.fqdir.high_thresh
 	},
 	{
 		.procname	= "ipfrag_time",
-		.data		= &init_net.ipv4.frags.timeout,
+		.data		= &init_net.ipv4.fqdir.timeout,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
 	},
 	{
 		.procname	= "ipfrag_max_dist",
-		.data		= &init_net.ipv4.frags.max_dist,
+		.data		= &init_net.ipv4.fqdir.max_dist,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
@@ -600,12 +600,12 @@ static int __net_init ip4_frags_ns_ctl_register(struct net *net)
 		if (!table)
 			goto err_alloc;
 
-		table[0].data = &net->ipv4.frags.high_thresh;
-		table[0].extra1 = &net->ipv4.frags.low_thresh;
-		table[1].data = &net->ipv4.frags.low_thresh;
-		table[1].extra2 = &net->ipv4.frags.high_thresh;
-		table[2].data = &net->ipv4.frags.timeout;
-		table[3].data = &net->ipv4.frags.max_dist;
+		table[0].data = &net->ipv4.fqdir.high_thresh;
+		table[0].extra1 = &net->ipv4.fqdir.low_thresh;
+		table[1].data = &net->ipv4.fqdir.low_thresh;
+		table[1].extra2 = &net->ipv4.fqdir.high_thresh;
+		table[2].data = &net->ipv4.fqdir.timeout;
+		table[3].data = &net->ipv4.fqdir.max_dist;
 	}
 
 	hdr = register_net_sysctl(net, "net/ipv4", table);
@@ -668,31 +668,31 @@ static int __net_init ipv4_frags_init_net(struct net *net)
 	 * we will prune down to 3MB, making room for approx 8 big 64K
 	 * fragments 8x128k.
 	 */
-	net->ipv4.frags.high_thresh = 4 * 1024 * 1024;
-	net->ipv4.frags.low_thresh  = 3 * 1024 * 1024;
+	net->ipv4.fqdir.high_thresh = 4 * 1024 * 1024;
+	net->ipv4.fqdir.low_thresh  = 3 * 1024 * 1024;
 	/*
 	 * Important NOTE! Fragment queue must be destroyed before MSL expires.
 	 * RFC791 is wrong proposing to prolongate timer each fragment arrival
 	 * by TTL.
 	 */
-	net->ipv4.frags.timeout = IP_FRAG_TIME;
+	net->ipv4.fqdir.timeout = IP_FRAG_TIME;
 
-	net->ipv4.frags.max_dist = 64;
-	net->ipv4.frags.f = &ip4_frags;
+	net->ipv4.fqdir.max_dist = 64;
+	net->ipv4.fqdir.f = &ip4_frags;
 
-	res = inet_frags_init_net(&net->ipv4.frags);
+	res = inet_frags_init_net(&net->ipv4.fqdir);
 	if (res < 0)
 		return res;
 	res = ip4_frags_ns_ctl_register(net);
 	if (res < 0)
-		fqdir_exit(&net->ipv4.frags);
+		fqdir_exit(&net->ipv4.fqdir);
 	return res;
 }
 
 static void __net_exit ipv4_frags_exit_net(struct net *net)
 {
 	ip4_frags_ns_ctl_unregister(net);
-	fqdir_exit(&net->ipv4.frags);
+	fqdir_exit(&net->ipv4.fqdir);
 }
 
 static struct pernet_operations ip4_frags_ops = {
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index c3610b37bb4ce665b1976d8cc907b6dd0de42ab9..3927e00084e8eacc237c4bb46554458ad0266dcf 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -72,8 +72,8 @@ static int sockstat_seq_show(struct seq_file *seq, void *v)
 	seq_printf(seq, "RAW: inuse %d\n",
 		   sock_prot_inuse_get(net, &raw_prot));
 	seq_printf(seq,  "FRAG: inuse %u memory %lu\n",
-		   atomic_read(&net->ipv4.frags.rhashtable.nelems),
-		   frag_mem_limit(&net->ipv4.frags));
+		   atomic_read(&net->ipv4.fqdir.rhashtable.nelems),
+		   frag_mem_limit(&net->ipv4.fqdir));
 	return 0;
 }
 
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index f08e1422c56dcd207d148ecb578df4a8f7ac2d9d..46073e9a6c566b0f019c94de902f347f6e0f0cba 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -58,26 +58,26 @@ static struct inet_frags nf_frags;
 static struct ctl_table nf_ct_frag6_sysctl_table[] = {
 	{
 		.procname	= "nf_conntrack_frag6_timeout",
-		.data		= &init_net.nf_frag.frags.timeout,
+		.data		= &init_net.nf_frag.fqdir.timeout,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
 	},
 	{
 		.procname	= "nf_conntrack_frag6_low_thresh",
-		.data		= &init_net.nf_frag.frags.low_thresh,
+		.data		= &init_net.nf_frag.fqdir.low_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra2		= &init_net.nf_frag.frags.high_thresh
+		.extra2		= &init_net.nf_frag.fqdir.high_thresh
 	},
 	{
 		.procname	= "nf_conntrack_frag6_high_thresh",
-		.data		= &init_net.nf_frag.frags.high_thresh,
+		.data		= &init_net.nf_frag.fqdir.high_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra1		= &init_net.nf_frag.frags.low_thresh
+		.extra1		= &init_net.nf_frag.fqdir.low_thresh
 	},
 	{ }
 };
@@ -94,12 +94,12 @@ static int nf_ct_frag6_sysctl_register(struct net *net)
 		if (table == NULL)
 			goto err_alloc;
 
-		table[0].data = &net->nf_frag.frags.timeout;
-		table[1].data = &net->nf_frag.frags.low_thresh;
-		table[1].extra2 = &net->nf_frag.frags.high_thresh;
-		table[2].data = &net->nf_frag.frags.high_thresh;
-		table[2].extra1 = &net->nf_frag.frags.low_thresh;
-		table[2].extra2 = &init_net.nf_frag.frags.high_thresh;
+		table[0].data = &net->nf_frag.fqdir.timeout;
+		table[1].data = &net->nf_frag.fqdir.low_thresh;
+		table[1].extra2 = &net->nf_frag.fqdir.high_thresh;
+		table[2].data = &net->nf_frag.fqdir.high_thresh;
+		table[2].extra1 = &net->nf_frag.fqdir.low_thresh;
+		table[2].extra2 = &init_net.nf_frag.fqdir.high_thresh;
 	}
 
 	hdr = register_net_sysctl(net, "net/netfilter", table);
@@ -151,7 +151,7 @@ static void nf_ct_frag6_expire(struct timer_list *t)
 	struct net *net;
 
 	fq = container_of(frag, struct frag_queue, q);
-	net = container_of(fq->q.fqdir, struct net, nf_frag.frags);
+	net = container_of(fq->q.fqdir, struct net, nf_frag.fqdir);
 
 	ip6frag_expire_frag_queue(net, fq);
 }
@@ -169,7 +169,7 @@ static struct frag_queue *fq_find(struct net *net, __be32 id, u32 user,
 	};
 	struct inet_frag_queue *q;
 
-	q = inet_frag_find(&net->nf_frag.frags, &key);
+	q = inet_frag_find(&net->nf_frag.fqdir, &key);
 	if (!q)
 		return NULL;
 
@@ -496,24 +496,24 @@ static int nf_ct_net_init(struct net *net)
 {
 	int res;
 
-	net->nf_frag.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
-	net->nf_frag.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
-	net->nf_frag.frags.timeout = IPV6_FRAG_TIMEOUT;
-	net->nf_frag.frags.f = &nf_frags;
+	net->nf_frag.fqdir.high_thresh = IPV6_FRAG_HIGH_THRESH;
+	net->nf_frag.fqdir.low_thresh = IPV6_FRAG_LOW_THRESH;
+	net->nf_frag.fqdir.timeout = IPV6_FRAG_TIMEOUT;
+	net->nf_frag.fqdir.f = &nf_frags;
 
-	res = inet_frags_init_net(&net->nf_frag.frags);
+	res = inet_frags_init_net(&net->nf_frag.fqdir);
 	if (res < 0)
 		return res;
 	res = nf_ct_frag6_sysctl_register(net);
 	if (res < 0)
-		fqdir_exit(&net->nf_frag.frags);
+		fqdir_exit(&net->nf_frag.fqdir);
 	return res;
 }
 
 static void nf_ct_net_exit(struct net *net)
 {
 	nf_ct_frags6_sysctl_unregister(net);
-	fqdir_exit(&net->nf_frag.frags);
+	fqdir_exit(&net->nf_frag.fqdir);
 }
 
 static struct pernet_operations nf_ct_net_ops = {
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index 2356b4af7309c06bf1572ab4cbf8299b5ca86e51..f3e3118393c4b4841228797c168e757585dbb17b 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -48,8 +48,8 @@ static int sockstat6_seq_show(struct seq_file *seq, void *v)
 	seq_printf(seq, "RAW6: inuse %d\n",
 		       sock_prot_inuse_get(net, &rawv6_prot));
 	seq_printf(seq, "FRAG6: inuse %u memory %lu\n",
-		   atomic_read(&net->ipv6.frags.rhashtable.nelems),
-		   frag_mem_limit(&net->ipv6.frags));
+		   atomic_read(&net->ipv6.fqdir.rhashtable.nelems),
+		   frag_mem_limit(&net->ipv6.fqdir));
 	return 0;
 }
 
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index f1142f5d5075a56164aa3f1f56c12328ab99747c..5160fd9ed223b723249b1c3f8ac3e2a97c7ffc43 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -79,7 +79,7 @@ static void ip6_frag_expire(struct timer_list *t)
 	struct net *net;
 
 	fq = container_of(frag, struct frag_queue, q);
-	net = container_of(fq->q.fqdir, struct net, ipv6.frags);
+	net = container_of(fq->q.fqdir, struct net, ipv6.fqdir);
 
 	ip6frag_expire_frag_queue(net, fq);
 }
@@ -100,7 +100,7 @@ fq_find(struct net *net, __be32 id, const struct ipv6hdr *hdr, int iif)
 					    IPV6_ADDR_LINKLOCAL)))
 		key.iif = 0;
 
-	q = inet_frag_find(&net->ipv6.frags, &key);
+	q = inet_frag_find(&net->ipv6.fqdir, &key);
 	if (!q)
 		return NULL;
 
@@ -254,7 +254,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *skb,
 			  struct sk_buff *prev_tail, struct net_device *dev)
 {
-	struct net *net = container_of(fq->q.fqdir, struct net, ipv6.frags);
+	struct net *net = container_of(fq->q.fqdir, struct net, ipv6.fqdir);
 	unsigned int nhoff;
 	void *reasm_data;
 	int payload_len;
@@ -401,23 +401,23 @@ static const struct inet6_protocol frag_protocol = {
 static struct ctl_table ip6_frags_ns_ctl_table[] = {
 	{
 		.procname	= "ip6frag_high_thresh",
-		.data		= &init_net.ipv6.frags.high_thresh,
+		.data		= &init_net.ipv6.fqdir.high_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra1		= &init_net.ipv6.frags.low_thresh
+		.extra1		= &init_net.ipv6.fqdir.low_thresh
 	},
 	{
 		.procname	= "ip6frag_low_thresh",
-		.data		= &init_net.ipv6.frags.low_thresh,
+		.data		= &init_net.ipv6.fqdir.low_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra2		= &init_net.ipv6.frags.high_thresh
+		.extra2		= &init_net.ipv6.fqdir.high_thresh
 	},
 	{
 		.procname	= "ip6frag_time",
-		.data		= &init_net.ipv6.frags.timeout,
+		.data		= &init_net.ipv6.fqdir.timeout,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
@@ -449,11 +449,11 @@ static int __net_init ip6_frags_ns_sysctl_register(struct net *net)
 		if (!table)
 			goto err_alloc;
 
-		table[0].data = &net->ipv6.frags.high_thresh;
-		table[0].extra1 = &net->ipv6.frags.low_thresh;
-		table[1].data = &net->ipv6.frags.low_thresh;
-		table[1].extra2 = &net->ipv6.frags.high_thresh;
-		table[2].data = &net->ipv6.frags.timeout;
+		table[0].data = &net->ipv6.fqdir.high_thresh;
+		table[0].extra1 = &net->ipv6.fqdir.low_thresh;
+		table[1].data = &net->ipv6.fqdir.low_thresh;
+		table[1].extra2 = &net->ipv6.fqdir.high_thresh;
+		table[2].data = &net->ipv6.fqdir.timeout;
 	}
 
 	hdr = register_net_sysctl(net, "net/ipv6", table);
@@ -517,25 +517,25 @@ static int __net_init ipv6_frags_init_net(struct net *net)
 {
 	int res;
 
-	net->ipv6.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
-	net->ipv6.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
-	net->ipv6.frags.timeout = IPV6_FRAG_TIMEOUT;
-	net->ipv6.frags.f = &ip6_frags;
+	net->ipv6.fqdir.high_thresh = IPV6_FRAG_HIGH_THRESH;
+	net->ipv6.fqdir.low_thresh = IPV6_FRAG_LOW_THRESH;
+	net->ipv6.fqdir.timeout = IPV6_FRAG_TIMEOUT;
+	net->ipv6.fqdir.f = &ip6_frags;
 
-	res = inet_frags_init_net(&net->ipv6.frags);
+	res = inet_frags_init_net(&net->ipv6.fqdir);
 	if (res < 0)
 		return res;
 
 	res = ip6_frags_ns_sysctl_register(net);
 	if (res < 0)
-		fqdir_exit(&net->ipv6.frags);
+		fqdir_exit(&net->ipv6.fqdir);
 	return res;
 }
 
 static void __net_exit ipv6_frags_exit_net(struct net *net)
 {
 	ip6_frags_ns_sysctl_unregister(net);
-	fqdir_exit(&net->ipv6.frags);
+	fqdir_exit(&net->ipv6.fqdir);
 }
 
 static struct pernet_operations ip6_frags_ops = {
-- 
2.22.0.rc1.257.g3120a18244-goog



* [PATCH net-next 04/11] ipv4: no longer reference init_net in ip4_frags_ns_ctl_table[]
  2019-05-24 16:03 [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle Eric Dumazet
                   ` (2 preceding siblings ...)
  2019-05-24 16:03 ` [PATCH net-next 03/11] net: rename struct fqdir fields Eric Dumazet
@ 2019-05-24 16:03 ` Eric Dumazet
  2019-05-24 16:03 ` [PATCH net-next 05/11] ipv6: no longer reference init_net in ip6_frags_ns_ctl_table[] Eric Dumazet
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2019-05-24 16:03 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

(struct net *)->ipv4.fqdir will soon be a pointer, so make
sure ip4_frags_ns_ctl_table[] does not reference init_net.

ip4_frags_ns_ctl_register() can perform the needed initialization
for all netns.
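
The constraint being prepared for, as a standalone sketch (model
types only; the real ctl_table handling is in the hunk below): once
the fqdir is reached through a pointer, the addresses of its fields
are no longer compile-time constants, so they cannot sit in a
statically initialized table and must be filled in at registration
time.

#include <stdio.h>
#include <stdlib.h>

struct fqdir_model { long high_thresh; };
struct netns_ipv4_model { struct fqdir_model *fqdir; };
struct net_model { struct netns_ipv4_model ipv4; };

/* With an embedded struct, '.data = &init_net.ipv4.frags.high_thresh'
 * is an address constant and may appear in a static initializer.
 * Through a pointer it is not; for instance this would not compile:
 *
 *	static void *bad = &init_net_model.ipv4.fqdir->high_thresh;
 *
 * so the register function fills in the per-netns addresses at runtime.
 */
static void *ipfrag_sysctl_data[1];

static void ctl_register_model(struct net_model *net)
{
	ipfrag_sysctl_data[0] = &net->ipv4.fqdir->high_thresh;
}

int main(void)
{
	struct net_model net = {
		.ipv4.fqdir = calloc(1, sizeof(struct fqdir_model)),
	};

	if (!net.ipv4.fqdir)
		return 1;
	net.ipv4.fqdir->high_thresh = 4 << 20;
	ctl_register_model(&net);
	printf("%ld\n", *(long *)ipfrag_sysctl_data[0]);
	free(net.ipv4.fqdir);
	return 0;
}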

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ip_fragment.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index f1831367cc2b188bdcc93f25818dda13e4348427..fb035f4f36ca72c6a9013830a3fb327d802b3e00 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -544,30 +544,24 @@ static int dist_min;
 static struct ctl_table ip4_frags_ns_ctl_table[] = {
 	{
 		.procname	= "ipfrag_high_thresh",
-		.data		= &init_net.ipv4.fqdir.high_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra1		= &init_net.ipv4.fqdir.low_thresh
 	},
 	{
 		.procname	= "ipfrag_low_thresh",
-		.data		= &init_net.ipv4.fqdir.low_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra2		= &init_net.ipv4.fqdir.high_thresh
 	},
 	{
 		.procname	= "ipfrag_time",
-		.data		= &init_net.ipv4.fqdir.timeout,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
 	},
 	{
 		.procname	= "ipfrag_max_dist",
-		.data		= &init_net.ipv4.fqdir.max_dist,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
@@ -600,13 +594,13 @@ static int __net_init ip4_frags_ns_ctl_register(struct net *net)
 		if (!table)
 			goto err_alloc;
 
-		table[0].data = &net->ipv4.fqdir.high_thresh;
-		table[0].extra1 = &net->ipv4.fqdir.low_thresh;
-		table[1].data = &net->ipv4.fqdir.low_thresh;
-		table[1].extra2 = &net->ipv4.fqdir.high_thresh;
-		table[2].data = &net->ipv4.fqdir.timeout;
-		table[3].data = &net->ipv4.fqdir.max_dist;
 	}
+	table[0].data	= &net->ipv4.fqdir.high_thresh;
+	table[0].extra1	= &net->ipv4.fqdir.low_thresh;
+	table[1].data	= &net->ipv4.fqdir.low_thresh;
+	table[1].extra2	= &net->ipv4.fqdir.high_thresh;
+	table[2].data	= &net->ipv4.fqdir.timeout;
+	table[3].data	= &net->ipv4.fqdir.max_dist;
 
 	hdr = register_net_sysctl(net, "net/ipv4", table);
 	if (!hdr)
-- 
2.22.0.rc1.257.g3120a18244-goog



* [PATCH net-next 05/11] ipv6: no longer reference init_net in ip6_frags_ns_ctl_table[]
  2019-05-24 16:03 [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle Eric Dumazet
                   ` (3 preceding siblings ...)
  2019-05-24 16:03 ` [PATCH net-next 04/11] ipv4: no longer reference init_net in ip4_frags_ns_ctl_table[] Eric Dumazet
@ 2019-05-24 16:03 ` Eric Dumazet
  2019-05-24 16:03 ` [PATCH net-next 06/11] netfilter: ipv6: nf_defrag: no longer reference init_net in nf_ct_frag6_sysctl_table Eric Dumazet
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2019-05-24 16:03 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

(struct net *)->ipv6.fqdir will soon be a pointer, so make
sure ip6_frags_ns_ctl_table[] does not reference init_net.

ip6_frags_ns_ctl_register() can perform the needed initialization
for all netns.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/reassembly.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 5160fd9ed223b723249b1c3f8ac3e2a97c7ffc43..aabc9b2e83e4ba9ae4af6a6d7047fe926c391d59 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -401,23 +401,18 @@ static const struct inet6_protocol frag_protocol = {
 static struct ctl_table ip6_frags_ns_ctl_table[] = {
 	{
 		.procname	= "ip6frag_high_thresh",
-		.data		= &init_net.ipv6.fqdir.high_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra1		= &init_net.ipv6.fqdir.low_thresh
 	},
 	{
 		.procname	= "ip6frag_low_thresh",
-		.data		= &init_net.ipv6.fqdir.low_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra2		= &init_net.ipv6.fqdir.high_thresh
 	},
 	{
 		.procname	= "ip6frag_time",
-		.data		= &init_net.ipv6.fqdir.timeout,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
@@ -449,12 +444,12 @@ static int __net_init ip6_frags_ns_sysctl_register(struct net *net)
 		if (!table)
 			goto err_alloc;
 
-		table[0].data = &net->ipv6.fqdir.high_thresh;
-		table[0].extra1 = &net->ipv6.fqdir.low_thresh;
-		table[1].data = &net->ipv6.fqdir.low_thresh;
-		table[1].extra2 = &net->ipv6.fqdir.high_thresh;
-		table[2].data = &net->ipv6.fqdir.timeout;
 	}
+	table[0].data	= &net->ipv6.fqdir.high_thresh;
+	table[0].extra1	= &net->ipv6.fqdir.low_thresh;
+	table[1].data	= &net->ipv6.fqdir.low_thresh;
+	table[1].extra2	= &net->ipv6.fqdir.high_thresh;
+	table[2].data	= &net->ipv6.fqdir.timeout;
 
 	hdr = register_net_sysctl(net, "net/ipv6", table);
 	if (!hdr)
-- 
2.22.0.rc1.257.g3120a18244-goog



* [PATCH net-next 06/11] netfilter: ipv6: nf_defrag: no longer reference init_net in nf_ct_frag6_sysctl_table
  2019-05-24 16:03 [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle Eric Dumazet
                   ` (4 preceding siblings ...)
  2019-05-24 16:03 ` [PATCH net-next 05/11] ipv6: no longer reference init_net in ip6_frags_ns_ctl_table[] Eric Dumazet
@ 2019-05-24 16:03 ` Eric Dumazet
  2019-05-24 16:03 ` [PATCH net-next 07/11] ieee802154: 6lowpan: no longer reference init_net in lowpan_frags_ns_ctl_table Eric Dumazet
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2019-05-24 16:03 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

(struct net *)->nf_frag.fqdir will soon be a pointer, so make
sure nf_ct_frag6_sysctl_table[] does not reference init_net.

nf_ct_frag6_sysctl_register() can perform the needed initialization
for all netns.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/netfilter/nf_conntrack_reasm.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 46073e9a6c566b0f019c94de902f347f6e0f0cba..3387ce53040953f16de1fbb447c744af87e0cefa 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -58,26 +58,21 @@ static struct inet_frags nf_frags;
 static struct ctl_table nf_ct_frag6_sysctl_table[] = {
 	{
 		.procname	= "nf_conntrack_frag6_timeout",
-		.data		= &init_net.nf_frag.fqdir.timeout,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
 	},
 	{
 		.procname	= "nf_conntrack_frag6_low_thresh",
-		.data		= &init_net.nf_frag.fqdir.low_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra2		= &init_net.nf_frag.fqdir.high_thresh
 	},
 	{
 		.procname	= "nf_conntrack_frag6_high_thresh",
-		.data		= &init_net.nf_frag.fqdir.high_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra1		= &init_net.nf_frag.fqdir.low_thresh
 	},
 	{ }
 };
@@ -93,15 +88,15 @@ static int nf_ct_frag6_sysctl_register(struct net *net)
 				GFP_KERNEL);
 		if (table == NULL)
 			goto err_alloc;
-
-		table[0].data = &net->nf_frag.fqdir.timeout;
-		table[1].data = &net->nf_frag.fqdir.low_thresh;
-		table[1].extra2 = &net->nf_frag.fqdir.high_thresh;
-		table[2].data = &net->nf_frag.fqdir.high_thresh;
-		table[2].extra1 = &net->nf_frag.fqdir.low_thresh;
-		table[2].extra2 = &init_net.nf_frag.fqdir.high_thresh;
 	}
 
+	table[0].data	= &net->nf_frag.fqdir.timeout;
+	table[1].data	= &net->nf_frag.fqdir.low_thresh;
+	table[1].extra2	= &net->nf_frag.fqdir.high_thresh;
+	table[2].data	= &net->nf_frag.fqdir.high_thresh;
+	table[2].extra1	= &net->nf_frag.fqdir.low_thresh;
+	table[2].extra2	= &init_net.nf_frag.fqdir.high_thresh;
+
 	hdr = register_net_sysctl(net, "net/netfilter", table);
 	if (hdr == NULL)
 		goto err_reg;
-- 
2.22.0.rc1.257.g3120a18244-goog



* [PATCH net-next 07/11] ieee802154: 6lowpan: no longer reference init_net in lowpan_frags_ns_ctl_table
  2019-05-24 16:03 [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle Eric Dumazet
                   ` (5 preceding siblings ...)
  2019-05-24 16:03 ` [PATCH net-next 06/11] netfilter: ipv6: nf_defrag: no longer reference init_net in nf_ct_frag6_sysctl_table Eric Dumazet
@ 2019-05-24 16:03 ` Eric Dumazet
  2019-05-24 16:03 ` [PATCH net-next 08/11] net: rename inet_frags_init_net() to fqdir_init() Eric Dumazet
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2019-05-24 16:03 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

(struct net *)->ieee802154_lowpan.fqdir will soon be a pointer, so make
sure lowpan_frags_ns_ctl_table[] does not reference init_net.

lowpan_frags_ns_sysctl_register() can perform the needed initialization
for all netns.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ieee802154/6lowpan/reassembly.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 955047fe797a10995845bca51cb9770e3356a351..4bbd6999c58fc04ad660928a10f89a7ff0ed4cf2 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -326,23 +326,18 @@ int lowpan_frag_rcv(struct sk_buff *skb, u8 frag_type)
 static struct ctl_table lowpan_frags_ns_ctl_table[] = {
 	{
 		.procname	= "6lowpanfrag_high_thresh",
-		.data		= &init_net.ieee802154_lowpan.fqdir.high_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra1		= &init_net.ieee802154_lowpan.fqdir.low_thresh
 	},
 	{
 		.procname	= "6lowpanfrag_low_thresh",
-		.data		= &init_net.ieee802154_lowpan.fqdir.low_thresh,
 		.maxlen		= sizeof(unsigned long),
 		.mode		= 0644,
 		.proc_handler	= proc_doulongvec_minmax,
-		.extra2		= &init_net.ieee802154_lowpan.fqdir.high_thresh
 	},
 	{
 		.procname	= "6lowpanfrag_time",
-		.data		= &init_net.ieee802154_lowpan.fqdir.timeout,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
@@ -377,17 +372,17 @@ static int __net_init lowpan_frags_ns_sysctl_register(struct net *net)
 		if (table == NULL)
 			goto err_alloc;
 
-		table[0].data = &ieee802154_lowpan->fqdir.high_thresh;
-		table[0].extra1 = &ieee802154_lowpan->fqdir.low_thresh;
-		table[1].data = &ieee802154_lowpan->fqdir.low_thresh;
-		table[1].extra2 = &ieee802154_lowpan->fqdir.high_thresh;
-		table[2].data = &ieee802154_lowpan->fqdir.timeout;
-
 		/* Don't export sysctls to unprivileged users */
 		if (net->user_ns != &init_user_ns)
 			table[0].procname = NULL;
 	}
 
+	table[0].data	= &ieee802154_lowpan->fqdir.high_thresh;
+	table[0].extra1	= &ieee802154_lowpan->fqdir.low_thresh;
+	table[1].data	= &ieee802154_lowpan->fqdir.low_thresh;
+	table[1].extra2	= &ieee802154_lowpan->fqdir.high_thresh;
+	table[2].data	= &ieee802154_lowpan->fqdir.timeout;
+
 	hdr = register_net_sysctl(net, "net/ieee802154/6lowpan", table);
 	if (hdr == NULL)
 		goto err_reg;
-- 
2.22.0.rc1.257.g3120a18244-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next 08/11] net: rename inet_frags_init_net() to fdir_init()
  2019-05-24 16:03 [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle Eric Dumazet
                   ` (6 preceding siblings ...)
  2019-05-24 16:03 ` [PATCH net-next 07/11] ieee820154: 6lowpan: no longer reference init_net in lowpan_frags_ns_ctl_table Eric Dumazet
@ 2019-05-24 16:03 ` Eric Dumazet
  2019-05-24 16:03 ` [PATCH net-next 09/11] net: add a net pointer to struct fqdir Eric Dumazet
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2019-05-24 16:03 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

And pass an extra parameter, since we will soon
dynamically allocate fqdir structures.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/inet_frag.h                 | 3 ++-
 net/ieee802154/6lowpan/reassembly.c     | 3 +--
 net/ipv4/ip_fragment.c                  | 3 +--
 net/ipv6/netfilter/nf_conntrack_reasm.c | 3 +--
 net/ipv6/reassembly.c                   | 3 +--
 5 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index d1bfd5dbe2d439e1cd9c620e5197ffbffb05920a..fca246b0abd84d354c6ca1902af9867732ff49cc 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -104,8 +104,9 @@ struct inet_frags {
 int inet_frags_init(struct inet_frags *);
 void inet_frags_fini(struct inet_frags *);
 
-static inline int inet_frags_init_net(struct fqdir *fqdir)
+static inline int fqdir_init(struct fqdir *fqdir, struct inet_frags *f)
 {
+	fqdir->f = f;
 	atomic_long_set(&fqdir->mem, 0);
 	return rhashtable_init(&fqdir->rhashtable, &fqdir->f->rhash_params);
 }
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 4bbd6999c58fc04ad660928a10f89a7ff0ed4cf2..82db76ce0e61c06c6e56a00999530c7b47b49143 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -452,9 +452,8 @@ static int __net_init lowpan_frags_init_net(struct net *net)
 	ieee802154_lowpan->fqdir.high_thresh = IPV6_FRAG_HIGH_THRESH;
 	ieee802154_lowpan->fqdir.low_thresh = IPV6_FRAG_LOW_THRESH;
 	ieee802154_lowpan->fqdir.timeout = IPV6_FRAG_TIMEOUT;
-	ieee802154_lowpan->fqdir.f = &lowpan_frags;
 
-	res = inet_frags_init_net(&ieee802154_lowpan->fqdir);
+	res = fqdir_init(&ieee802154_lowpan->fqdir, &lowpan_frags);
 	if (res < 0)
 		return res;
 	res = lowpan_frags_ns_sysctl_register(net);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index fb035f4f36ca72c6a9013830a3fb327d802b3e00..d95592d5298136d852d9bb07a6f2091865f83e35 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -672,9 +672,8 @@ static int __net_init ipv4_frags_init_net(struct net *net)
 	net->ipv4.fqdir.timeout = IP_FRAG_TIME;
 
 	net->ipv4.fqdir.max_dist = 64;
-	net->ipv4.fqdir.f = &ip4_frags;
 
-	res = inet_frags_init_net(&net->ipv4.fqdir);
+	res = fqdir_init(&net->ipv4.fqdir, &ip4_frags);
 	if (res < 0)
 		return res;
 	res = ip4_frags_ns_ctl_register(net);
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 3387ce53040953f16de1fbb447c744af87e0cefa..e72a1cc42163cc801e441e18b3a10a4c9f578aa3 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -494,9 +494,8 @@ static int nf_ct_net_init(struct net *net)
 	net->nf_frag.fqdir.high_thresh = IPV6_FRAG_HIGH_THRESH;
 	net->nf_frag.fqdir.low_thresh = IPV6_FRAG_LOW_THRESH;
 	net->nf_frag.fqdir.timeout = IPV6_FRAG_TIMEOUT;
-	net->nf_frag.fqdir.f = &nf_frags;
 
-	res = inet_frags_init_net(&net->nf_frag.fqdir);
+	res = fqdir_init(&net->nf_frag.fqdir, &nf_frags);
 	if (res < 0)
 		return res;
 	res = nf_ct_frag6_sysctl_register(net);
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index aabc9b2e83e4ba9ae4af6a6d7047fe926c391d59..8235c5a8e8fe8d99c339e3f39979d8fe388f7c0a 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -515,9 +515,8 @@ static int __net_init ipv6_frags_init_net(struct net *net)
 	net->ipv6.fqdir.high_thresh = IPV6_FRAG_HIGH_THRESH;
 	net->ipv6.fqdir.low_thresh = IPV6_FRAG_LOW_THRESH;
 	net->ipv6.fqdir.timeout = IPV6_FRAG_TIMEOUT;
-	net->ipv6.fqdir.f = &ip6_frags;
 
-	res = inet_frags_init_net(&net->ipv6.fqdir);
+	res = fqdir_init(&net->ipv6.fqdir, &ip6_frags);
 	if (res < 0)
 		return res;
 
-- 
2.22.0.rc1.257.g3120a18244-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next 09/11] net: add a net pointer to struct fqdir
  2019-05-24 16:03 [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle Eric Dumazet
                   ` (7 preceding siblings ...)
  2019-05-24 16:03 ` [PATCH net-next 08/11] net: rename inet_frags_init_net() to fdir_init() Eric Dumazet
@ 2019-05-24 16:03 ` Eric Dumazet
  2019-05-24 16:03 ` [PATCH net-next 10/11] net: dynamically allocate fqdir structures Eric Dumazet
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2019-05-24 16:03 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

fqdir will soon be dynamically allocated.

We need to reach the struct net pointer from fqdir,
so add it, and replace the various container_of() constructs
by direct access to the new field.
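
As a condensed before/after illustration (taken from the ip_expire()
hunk in the diff below):

    /* before: recover the struct net from the embedded fqdir */
    net = container_of(qp->q.fqdir, struct net, ipv4.fqdir);

    /* after: follow the new back-pointer directly */
    net = qp->q.fqdir->net;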

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/inet_frag.h                 |  5 ++++-
 net/ieee802154/6lowpan/reassembly.c     |  2 +-
 net/ipv4/ip_fragment.c                  | 20 +++++++-------------
 net/ipv6/netfilter/nf_conntrack_reasm.c |  6 ++----
 net/ipv6/reassembly.c                   |  8 +++-----
 5 files changed, 17 insertions(+), 24 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index fca246b0abd84d354c6ca1902af9867732ff49cc..37cde5c1498c30ee6ddf8ac7defd55e73ef296dc 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -12,6 +12,7 @@ struct fqdir {
 	int			timeout;
 	int			max_dist;
 	struct inet_frags	*f;
+	struct net		*net;
 
 	struct rhashtable       rhashtable ____cacheline_aligned_in_smp;
 
@@ -104,9 +105,11 @@ struct inet_frags {
 int inet_frags_init(struct inet_frags *);
 void inet_frags_fini(struct inet_frags *);
 
-static inline int fqdir_init(struct fqdir *fqdir, struct inet_frags *f)
+static inline int fqdir_init(struct fqdir *fqdir, struct inet_frags *f,
+			     struct net *net)
 {
 	fqdir->f = f;
+	fqdir->net = net;
 	atomic_long_set(&fqdir->mem, 0);
 	return rhashtable_init(&fqdir->rhashtable, &fqdir->f->rhash_params);
 }
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 82db76ce0e61c06c6e56a00999530c7b47b49143..03a444c9e191925c2b35c805f2b982739ca499f2 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -453,7 +453,7 @@ static int __net_init lowpan_frags_init_net(struct net *net)
 	ieee802154_lowpan->fqdir.low_thresh = IPV6_FRAG_LOW_THRESH;
 	ieee802154_lowpan->fqdir.timeout = IPV6_FRAG_TIMEOUT;
 
-	res = fqdir_init(&ieee802154_lowpan->fqdir, &lowpan_frags);
+	res = fqdir_init(&ieee802154_lowpan->fqdir, &lowpan_frags, net);
 	if (res < 0)
 		return res;
 	res = lowpan_frags_ns_sysctl_register(net);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index d95592d5298136d852d9bb07a6f2091865f83e35..d59269bbe1b610c4a34c7b09ab15d05ab13b7afa 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -82,9 +82,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
 static void ip4_frag_init(struct inet_frag_queue *q, const void *a)
 {
 	struct ipq *qp = container_of(q, struct ipq, q);
-	struct netns_ipv4 *ipv4 = container_of(q->fqdir, struct netns_ipv4,
-					       fqdir);
-	struct net *net = container_of(ipv4, struct net, ipv4);
+	struct net *net = q->fqdir->net;
 
 	const struct frag_v4_compare_key *key = a;
 
@@ -142,7 +140,7 @@ static void ip_expire(struct timer_list *t)
 	int err;
 
 	qp = container_of(frag, struct ipq, q);
-	net = container_of(qp->q.fqdir, struct net, ipv4.fqdir);
+	net = qp->q.fqdir->net;
 
 	rcu_read_lock();
 	spin_lock(&qp->q.lock);
@@ -236,12 +234,8 @@ static int ip_frag_too_far(struct ipq *qp)
 
 	rc = qp->q.fragments_tail && (end - start) > max;
 
-	if (rc) {
-		struct net *net;
-
-		net = container_of(qp->q.fqdir, struct net, ipv4.fqdir);
-		__IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS);
-	}
+	if (rc)
+		__IP_INC_STATS(qp->q.fqdir->net, IPSTATS_MIB_REASMFAILS);
 
 	return rc;
 }
@@ -273,7 +267,7 @@ static int ip_frag_reinit(struct ipq *qp)
 /* Add new segment to existing queue. */
 static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 {
-	struct net *net = container_of(qp->q.fqdir, struct net, ipv4.fqdir);
+	struct net *net = qp->q.fqdir->net;
 	int ihl, end, flags, offset;
 	struct sk_buff *prev_tail;
 	struct net_device *dev;
@@ -399,7 +393,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
 			 struct sk_buff *prev_tail, struct net_device *dev)
 {
-	struct net *net = container_of(qp->q.fqdir, struct net, ipv4.fqdir);
+	struct net *net = qp->q.fqdir->net;
 	struct iphdr *iph;
 	void *reasm_data;
 	int len, err;
@@ -673,7 +667,7 @@ static int __net_init ipv4_frags_init_net(struct net *net)
 
 	net->ipv4.fqdir.max_dist = 64;
 
-	res = fqdir_init(&net->ipv4.fqdir, &ip4_frags);
+	res = fqdir_init(&net->ipv4.fqdir, &ip4_frags, net);
 	if (res < 0)
 		return res;
 	res = ip4_frags_ns_ctl_register(net);
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index e72a1cc42163cc801e441e18b3a10a4c9f578aa3..b6f7385ed93c264f60f3e92fc0ce2cdbe9af7fca 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -143,12 +143,10 @@ static void nf_ct_frag6_expire(struct timer_list *t)
 {
 	struct inet_frag_queue *frag = from_timer(frag, t, timer);
 	struct frag_queue *fq;
-	struct net *net;
 
 	fq = container_of(frag, struct frag_queue, q);
-	net = container_of(fq->q.fqdir, struct net, nf_frag.fqdir);
 
-	ip6frag_expire_frag_queue(net, fq);
+	ip6frag_expire_frag_queue(fq->q.fqdir->net, fq);
 }
 
 /* Creation primitives. */
@@ -495,7 +493,7 @@ static int nf_ct_net_init(struct net *net)
 	net->nf_frag.fqdir.low_thresh = IPV6_FRAG_LOW_THRESH;
 	net->nf_frag.fqdir.timeout = IPV6_FRAG_TIMEOUT;
 
-	res = fqdir_init(&net->nf_frag.fqdir, &nf_frags);
+	res = fqdir_init(&net->nf_frag.fqdir, &nf_frags, net);
 	if (res < 0)
 		return res;
 	res = nf_ct_frag6_sysctl_register(net);
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 8235c5a8e8fe8d99c339e3f39979d8fe388f7c0a..a6f26aa648fbee592ac1102e3ff1341df09d6385 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -76,12 +76,10 @@ static void ip6_frag_expire(struct timer_list *t)
 {
 	struct inet_frag_queue *frag = from_timer(frag, t, timer);
 	struct frag_queue *fq;
-	struct net *net;
 
 	fq = container_of(frag, struct frag_queue, q);
-	net = container_of(fq->q.fqdir, struct net, ipv6.fqdir);
 
-	ip6frag_expire_frag_queue(net, fq);
+	ip6frag_expire_frag_queue(fq->q.fqdir->net, fq);
 }
 
 static struct frag_queue *
@@ -254,7 +252,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *skb,
 			  struct sk_buff *prev_tail, struct net_device *dev)
 {
-	struct net *net = container_of(fq->q.fqdir, struct net, ipv6.fqdir);
+	struct net *net = fq->q.fqdir->net;
 	unsigned int nhoff;
 	void *reasm_data;
 	int payload_len;
@@ -516,7 +514,7 @@ static int __net_init ipv6_frags_init_net(struct net *net)
 	net->ipv6.fqdir.low_thresh = IPV6_FRAG_LOW_THRESH;
 	net->ipv6.fqdir.timeout = IPV6_FRAG_TIMEOUT;
 
-	res = fqdir_init(&net->ipv6.fqdir, &ip6_frags);
+	res = fqdir_init(&net->ipv6.fqdir, &ip6_frags, net);
 	if (res < 0)
 		return res;
 
-- 
2.22.0.rc1.257.g3120a18244-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next 10/11] net: dynamically allocate fqdir structures
  2019-05-24 16:03 [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle Eric Dumazet
                   ` (8 preceding siblings ...)
  2019-05-24 16:03 ` [PATCH net-next 09/11] net: add a net pointer to struct fqdir Eric Dumazet
@ 2019-05-24 16:03 ` Eric Dumazet
  2019-05-24 16:03 ` [PATCH net-next 11/11] inet: frags: rework rhashtable dismantle Eric Dumazet
  2019-05-26 21:26 ` [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle David Miller
  11 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2019-05-24 16:03 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

The following patch will add an RCU grace period before fqdir
rhashtable destruction, so we need to dynamically allocate
fqdir structures to avoid forcing expensive synchronize_rcu() calls
in the netns dismantle path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/inet_frag.h                 | 17 ++++++++++---
 include/net/netns/ieee802154_6lowpan.h  |  2 +-
 include/net/netns/ipv4.h                |  2 +-
 include/net/netns/ipv6.h                |  4 ++--
 net/ieee802154/6lowpan/reassembly.c     | 24 ++++++++++---------
 net/ipv4/inet_fragment.c                |  1 +
 net/ipv4/ip_fragment.c                  | 32 ++++++++++++-------------
 net/ipv4/proc.c                         |  4 ++--
 net/ipv6/netfilter/nf_conntrack_reasm.c | 27 +++++++++++----------
 net/ipv6/proc.c                         |  4 ++--
 net/ipv6/reassembly.c                   | 24 +++++++++----------
 11 files changed, 78 insertions(+), 63 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 37cde5c1498c30ee6ddf8ac7defd55e73ef296dc..5f754c660cfa34898e48d5dbbf17a3508fb8b3ba 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -105,14 +105,25 @@ struct inet_frags {
 int inet_frags_init(struct inet_frags *);
 void inet_frags_fini(struct inet_frags *);
 
-static inline int fqdir_init(struct fqdir *fqdir, struct inet_frags *f,
+static inline int fqdir_init(struct fqdir **fqdirp, struct inet_frags *f,
 			     struct net *net)
 {
+	struct fqdir *fqdir = kzalloc(sizeof(*fqdir), GFP_KERNEL);
+	int res;
+
+	if (!fqdir)
+		return -ENOMEM;
 	fqdir->f = f;
 	fqdir->net = net;
-	atomic_long_set(&fqdir->mem, 0);
-	return rhashtable_init(&fqdir->rhashtable, &fqdir->f->rhash_params);
+	res = rhashtable_init(&fqdir->rhashtable, &fqdir->f->rhash_params);
+	if (res < 0) {
+		kfree(fqdir);
+		return res;
+	}
+	*fqdirp = fqdir;
+	return 0;
 }
+
 void fqdir_exit(struct fqdir *fqdir);
 
 void inet_frag_kill(struct inet_frag_queue *q);
diff --git a/include/net/netns/ieee802154_6lowpan.h b/include/net/netns/ieee802154_6lowpan.h
index d27ac64f8dfefcc2253c14c04af193e6265b9e02..95406e1342cb54ee1f421b68e2feb1c421b2a768 100644
--- a/include/net/netns/ieee802154_6lowpan.h
+++ b/include/net/netns/ieee802154_6lowpan.h
@@ -16,7 +16,7 @@ struct netns_sysctl_lowpan {
 
 struct netns_ieee802154_lowpan {
 	struct netns_sysctl_lowpan sysctl;
-	struct fqdir		fqdir;
+	struct fqdir		*fqdir;
 };
 
 #endif
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 3c270baa32e030b36594627a77abe3b4ffc775f5..c07cee1e0c9ea21cfab457a90287358ed7650d72 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -72,7 +72,7 @@ struct netns_ipv4 {
 
 	struct inet_peer_base	*peers;
 	struct sock  * __percpu	*tcp_sk;
-	struct fqdir		fqdir;
+	struct fqdir		*fqdir;
 #ifdef CONFIG_NETFILTER
 	struct xt_table		*iptable_filter;
 	struct xt_table		*iptable_mangle;
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 3dd2ae2a38e2aead40f2bcf13e40b79ca492ad48..022a0fd1a5a466bd35fed5deab4fa901e2d5b1d3 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -58,7 +58,7 @@ struct netns_ipv6 {
 	struct ipv6_devconf	*devconf_all;
 	struct ipv6_devconf	*devconf_dflt;
 	struct inet_peer_base	*peers;
-	struct fqdir		fqdir;
+	struct fqdir		*fqdir;
 #ifdef CONFIG_NETFILTER
 	struct xt_table		*ip6table_filter;
 	struct xt_table		*ip6table_mangle;
@@ -116,7 +116,7 @@ struct netns_ipv6 {
 
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
 struct netns_nf_frag {
-	struct fqdir	fqdir;
+	struct fqdir	*fqdir;
 };
 #endif
 
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 03a444c9e191925c2b35c805f2b982739ca499f2..e59c3b7089691ef95ce3b7ce02afe68ffc256dcc 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -79,7 +79,7 @@ fq_find(struct net *net, const struct lowpan_802154_cb *cb,
 	key.src = *src;
 	key.dst = *dst;
 
-	q = inet_frag_find(&ieee802154_lowpan->fqdir, &key);
+	q = inet_frag_find(ieee802154_lowpan->fqdir, &key);
 	if (!q)
 		return NULL;
 
@@ -377,11 +377,11 @@ static int __net_init lowpan_frags_ns_sysctl_register(struct net *net)
 			table[0].procname = NULL;
 	}
 
-	table[0].data	= &ieee802154_lowpan->fqdir.high_thresh;
-	table[0].extra1	= &ieee802154_lowpan->fqdir.low_thresh;
-	table[1].data	= &ieee802154_lowpan->fqdir.low_thresh;
-	table[1].extra2	= &ieee802154_lowpan->fqdir.high_thresh;
-	table[2].data	= &ieee802154_lowpan->fqdir.timeout;
+	table[0].data	= &ieee802154_lowpan->fqdir->high_thresh;
+	table[0].extra1	= &ieee802154_lowpan->fqdir->low_thresh;
+	table[1].data	= &ieee802154_lowpan->fqdir->low_thresh;
+	table[1].extra2	= &ieee802154_lowpan->fqdir->high_thresh;
+	table[2].data	= &ieee802154_lowpan->fqdir->timeout;
 
 	hdr = register_net_sysctl(net, "net/ieee802154/6lowpan", table);
 	if (hdr == NULL)
@@ -449,16 +449,18 @@ static int __net_init lowpan_frags_init_net(struct net *net)
 		net_ieee802154_lowpan(net);
 	int res;
 
-	ieee802154_lowpan->fqdir.high_thresh = IPV6_FRAG_HIGH_THRESH;
-	ieee802154_lowpan->fqdir.low_thresh = IPV6_FRAG_LOW_THRESH;
-	ieee802154_lowpan->fqdir.timeout = IPV6_FRAG_TIMEOUT;
 
 	res = fqdir_init(&ieee802154_lowpan->fqdir, &lowpan_frags, net);
 	if (res < 0)
 		return res;
+
+	ieee802154_lowpan->fqdir->high_thresh = IPV6_FRAG_HIGH_THRESH;
+	ieee802154_lowpan->fqdir->low_thresh = IPV6_FRAG_LOW_THRESH;
+	ieee802154_lowpan->fqdir->timeout = IPV6_FRAG_TIMEOUT;
+
 	res = lowpan_frags_ns_sysctl_register(net);
 	if (res < 0)
-		fqdir_exit(&ieee802154_lowpan->fqdir);
+		fqdir_exit(ieee802154_lowpan->fqdir);
 	return res;
 }
 
@@ -468,7 +470,7 @@ static void __net_exit lowpan_frags_exit_net(struct net *net)
 		net_ieee802154_lowpan(net);
 
 	lowpan_frags_ns_sysctl_unregister(net);
-	fqdir_exit(&ieee802154_lowpan->fqdir);
+	fqdir_exit(ieee802154_lowpan->fqdir);
 }
 
 static struct pernet_operations lowpan_frags_ops = {
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index a5ec5d9567931d03cfa3fbe1a370857ed8dc3b3d..b4432f209c715dd09b0b201fdae16d5332770c85 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -150,6 +150,7 @@ void fqdir_exit(struct fqdir *fqdir)
 	fqdir->high_thresh = 0; /* prevent creation of new frags */
 
 	rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
+	kfree(fqdir);
 }
 EXPORT_SYMBOL(fqdir_exit);
 
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index d59269bbe1b610c4a34c7b09ab15d05ab13b7afa..1ffaec056821b8e0bd0915403b0a3a1d449690ec 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -209,7 +209,7 @@ static struct ipq *ip_find(struct net *net, struct iphdr *iph,
 	};
 	struct inet_frag_queue *q;
 
-	q = inet_frag_find(&net->ipv4.fqdir, &key);
+	q = inet_frag_find(net->ipv4.fqdir, &key);
 	if (!q)
 		return NULL;
 
@@ -589,12 +589,12 @@ static int __net_init ip4_frags_ns_ctl_register(struct net *net)
 			goto err_alloc;
 
 	}
-	table[0].data	= &net->ipv4.fqdir.high_thresh;
-	table[0].extra1	= &net->ipv4.fqdir.low_thresh;
-	table[1].data	= &net->ipv4.fqdir.low_thresh;
-	table[1].extra2	= &net->ipv4.fqdir.high_thresh;
-	table[2].data	= &net->ipv4.fqdir.timeout;
-	table[3].data	= &net->ipv4.fqdir.max_dist;
+	table[0].data	= &net->ipv4.fqdir->high_thresh;
+	table[0].extra1	= &net->ipv4.fqdir->low_thresh;
+	table[1].data	= &net->ipv4.fqdir->low_thresh;
+	table[1].extra2	= &net->ipv4.fqdir->high_thresh;
+	table[2].data	= &net->ipv4.fqdir->timeout;
+	table[3].data	= &net->ipv4.fqdir->max_dist;
 
 	hdr = register_net_sysctl(net, "net/ipv4", table);
 	if (!hdr)
@@ -642,6 +642,9 @@ static int __net_init ipv4_frags_init_net(struct net *net)
 {
 	int res;
 
+	res = fqdir_init(&net->ipv4.fqdir, &ip4_frags, net);
+	if (res < 0)
+		return res;
 	/* Fragment cache limits.
 	 *
 	 * The fragment memory accounting code, (tries to) account for
@@ -656,30 +659,27 @@ static int __net_init ipv4_frags_init_net(struct net *net)
 	 * we will prune down to 3MB, making room for approx 8 big 64K
 	 * fragments 8x128k.
 	 */
-	net->ipv4.fqdir.high_thresh = 4 * 1024 * 1024;
-	net->ipv4.fqdir.low_thresh  = 3 * 1024 * 1024;
+	net->ipv4.fqdir->high_thresh = 4 * 1024 * 1024;
+	net->ipv4.fqdir->low_thresh  = 3 * 1024 * 1024;
 	/*
 	 * Important NOTE! Fragment queue must be destroyed before MSL expires.
 	 * RFC791 is wrong proposing to prolongate timer each fragment arrival
 	 * by TTL.
 	 */
-	net->ipv4.fqdir.timeout = IP_FRAG_TIME;
+	net->ipv4.fqdir->timeout = IP_FRAG_TIME;
 
-	net->ipv4.fqdir.max_dist = 64;
+	net->ipv4.fqdir->max_dist = 64;
 
-	res = fqdir_init(&net->ipv4.fqdir, &ip4_frags, net);
-	if (res < 0)
-		return res;
 	res = ip4_frags_ns_ctl_register(net);
 	if (res < 0)
-		fqdir_exit(&net->ipv4.fqdir);
+		fqdir_exit(net->ipv4.fqdir);
 	return res;
 }
 
 static void __net_exit ipv4_frags_exit_net(struct net *net)
 {
 	ip4_frags_ns_ctl_unregister(net);
-	fqdir_exit(&net->ipv4.fqdir);
+	fqdir_exit(net->ipv4.fqdir);
 }
 
 static struct pernet_operations ip4_frags_ops = {
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 3927e00084e8eacc237c4bb46554458ad0266dcf..b613572c66168c0fa56b8051d63959727f65f04e 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -72,8 +72,8 @@ static int sockstat_seq_show(struct seq_file *seq, void *v)
 	seq_printf(seq, "RAW: inuse %d\n",
 		   sock_prot_inuse_get(net, &raw_prot));
 	seq_printf(seq,  "FRAG: inuse %u memory %lu\n",
-		   atomic_read(&net->ipv4.fqdir.rhashtable.nelems),
-		   frag_mem_limit(&net->ipv4.fqdir));
+		   atomic_read(&net->ipv4.fqdir->rhashtable.nelems),
+		   frag_mem_limit(net->ipv4.fqdir));
 	return 0;
 }
 
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index b6f7385ed93c264f60f3e92fc0ce2cdbe9af7fca..c5d59fa568d6a361fbd58d920ebb61e45c9e515f 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -90,12 +90,12 @@ static int nf_ct_frag6_sysctl_register(struct net *net)
 			goto err_alloc;
 	}
 
-	table[0].data	= &net->nf_frag.fqdir.timeout;
-	table[1].data	= &net->nf_frag.fqdir.low_thresh;
-	table[1].extra2	= &net->nf_frag.fqdir.high_thresh;
-	table[2].data	= &net->nf_frag.fqdir.high_thresh;
-	table[2].extra1	= &net->nf_frag.fqdir.low_thresh;
-	table[2].extra2	= &init_net.nf_frag.fqdir.high_thresh;
+	table[0].data	= &net->nf_frag.fqdir->timeout;
+	table[1].data	= &net->nf_frag.fqdir->low_thresh;
+	table[1].extra2	= &net->nf_frag.fqdir->high_thresh;
+	table[2].data	= &net->nf_frag.fqdir->high_thresh;
+	table[2].extra1	= &net->nf_frag.fqdir->low_thresh;
+	table[2].extra2	= &init_net.nf_frag.fqdir->high_thresh;
 
 	hdr = register_net_sysctl(net, "net/netfilter", table);
 	if (hdr == NULL)
@@ -162,7 +162,7 @@ static struct frag_queue *fq_find(struct net *net, __be32 id, u32 user,
 	};
 	struct inet_frag_queue *q;
 
-	q = inet_frag_find(&net->nf_frag.fqdir, &key);
+	q = inet_frag_find(net->nf_frag.fqdir, &key);
 	if (!q)
 		return NULL;
 
@@ -489,23 +489,24 @@ static int nf_ct_net_init(struct net *net)
 {
 	int res;
 
-	net->nf_frag.fqdir.high_thresh = IPV6_FRAG_HIGH_THRESH;
-	net->nf_frag.fqdir.low_thresh = IPV6_FRAG_LOW_THRESH;
-	net->nf_frag.fqdir.timeout = IPV6_FRAG_TIMEOUT;
-
 	res = fqdir_init(&net->nf_frag.fqdir, &nf_frags, net);
 	if (res < 0)
 		return res;
+
+	net->nf_frag.fqdir->high_thresh = IPV6_FRAG_HIGH_THRESH;
+	net->nf_frag.fqdir->low_thresh = IPV6_FRAG_LOW_THRESH;
+	net->nf_frag.fqdir->timeout = IPV6_FRAG_TIMEOUT;
+
 	res = nf_ct_frag6_sysctl_register(net);
 	if (res < 0)
-		fqdir_exit(&net->nf_frag.fqdir);
+		fqdir_exit(net->nf_frag.fqdir);
 	return res;
 }
 
 static void nf_ct_net_exit(struct net *net)
 {
 	nf_ct_frags6_sysctl_unregister(net);
-	fqdir_exit(&net->nf_frag.fqdir);
+	fqdir_exit(net->nf_frag.fqdir);
 }
 
 static struct pernet_operations nf_ct_net_ops = {
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index f3e3118393c4b4841228797c168e757585dbb17b..0bbefc440bcdd9fdbca1bd9258eaa9e6d315c0eb 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -48,8 +48,8 @@ static int sockstat6_seq_show(struct seq_file *seq, void *v)
 	seq_printf(seq, "RAW6: inuse %d\n",
 		       sock_prot_inuse_get(net, &rawv6_prot));
 	seq_printf(seq, "FRAG6: inuse %u memory %lu\n",
-		   atomic_read(&net->ipv6.fqdir.rhashtable.nelems),
-		   frag_mem_limit(&net->ipv6.fqdir));
+		   atomic_read(&net->ipv6.fqdir->rhashtable.nelems),
+		   frag_mem_limit(net->ipv6.fqdir));
 	return 0;
 }
 
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index a6f26aa648fbee592ac1102e3ff1341df09d6385..836ea964cf140d8b0134894455f18addc069e13e 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -98,7 +98,7 @@ fq_find(struct net *net, __be32 id, const struct ipv6hdr *hdr, int iif)
 					    IPV6_ADDR_LINKLOCAL)))
 		key.iif = 0;
 
-	q = inet_frag_find(&net->ipv6.fqdir, &key);
+	q = inet_frag_find(net->ipv6.fqdir, &key);
 	if (!q)
 		return NULL;
 
@@ -443,11 +443,11 @@ static int __net_init ip6_frags_ns_sysctl_register(struct net *net)
 			goto err_alloc;
 
 	}
-	table[0].data	= &net->ipv6.fqdir.high_thresh;
-	table[0].extra1	= &net->ipv6.fqdir.low_thresh;
-	table[1].data	= &net->ipv6.fqdir.low_thresh;
-	table[1].extra2	= &net->ipv6.fqdir.high_thresh;
-	table[2].data	= &net->ipv6.fqdir.timeout;
+	table[0].data	= &net->ipv6.fqdir->high_thresh;
+	table[0].extra1	= &net->ipv6.fqdir->low_thresh;
+	table[1].data	= &net->ipv6.fqdir->low_thresh;
+	table[1].extra2	= &net->ipv6.fqdir->high_thresh;
+	table[2].data	= &net->ipv6.fqdir->timeout;
 
 	hdr = register_net_sysctl(net, "net/ipv6", table);
 	if (!hdr)
@@ -510,24 +510,24 @@ static int __net_init ipv6_frags_init_net(struct net *net)
 {
 	int res;
 
-	net->ipv6.fqdir.high_thresh = IPV6_FRAG_HIGH_THRESH;
-	net->ipv6.fqdir.low_thresh = IPV6_FRAG_LOW_THRESH;
-	net->ipv6.fqdir.timeout = IPV6_FRAG_TIMEOUT;
-
 	res = fqdir_init(&net->ipv6.fqdir, &ip6_frags, net);
 	if (res < 0)
 		return res;
 
+	net->ipv6.fqdir->high_thresh = IPV6_FRAG_HIGH_THRESH;
+	net->ipv6.fqdir->low_thresh = IPV6_FRAG_LOW_THRESH;
+	net->ipv6.fqdir->timeout = IPV6_FRAG_TIMEOUT;
+
 	res = ip6_frags_ns_sysctl_register(net);
 	if (res < 0)
-		fqdir_exit(&net->ipv6.fqdir);
+		fqdir_exit(net->ipv6.fqdir);
 	return res;
 }
 
 static void __net_exit ipv6_frags_exit_net(struct net *net)
 {
 	ip6_frags_ns_sysctl_unregister(net);
-	fqdir_exit(&net->ipv6.fqdir);
+	fqdir_exit(net->ipv6.fqdir);
 }
 
 static struct pernet_operations ip6_frags_ops = {
-- 
2.22.0.rc1.257.g3120a18244-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next 11/11] inet: frags: rework rhashtable dismantle
  2019-05-24 16:03 [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle Eric Dumazet
                   ` (9 preceding siblings ...)
  2019-05-24 16:03 ` [PATCH net-next 10/11] net: dynamically allocate fqdir structures Eric Dumazet
@ 2019-05-24 16:03 ` Eric Dumazet
  2019-05-28  6:34   ` Herbert Xu
  2019-05-26 21:26 ` [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle David Miller
  11 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2019-05-24 16:03 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet, syzbot

syzbot found an interesting use-after-free [1] happening
while the IPv4 fragment rhashtable was being destroyed at netns dismantle.

While no insertions can possibly happen at the time a dismantling
netns is destroying this rhashtable, timers can still fire and
attempt to remove elements from this rhashtable.

This is forbidden, since rhashtable_free_and_destroy() has
no synchronization against concurrent inserts and deletes.

Add a new fqdir->dead flag so that timers do not attempt
a rhashtable_remove_fast() operation.

We also have to respect an RCU grace period before starting
rhashtable_free_and_destroy() from process context,
thus we use the rcu_work infrastructure.

This is a refinement of a prior rough attempt to fix this bug:
https://marc.info/?l=linux-netdev&m=153845936820900&w=2

Since the rhashtable cleanup is now deferred to a work queue,
netns dismantles should be slightly faster.
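
The scheme boils down to the following sketch (simplified; the
INET_FRAG_HASH_DEAD refcount bookkeeping is omitted, see the diff
below for the real code):

    /* netns dismantle, process context */
    fqdir->dead = true;  /* timers must no longer touch the hash table */
    INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
    queue_rcu_work(system_wq, &fqdir->destroy_rwork);
    /* fqdir_rwork_fn() runs only after an RCU grace period, and only
     * then calls rhashtable_free_and_destroy() and kfree(fqdir).
     */

    /* timer path, inet_frag_kill(), under rcu_read_lock() */
    if (!fqdir->dead) {
            rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
                                   fqdir->f->rhash_params);
            refcount_dec(&fq->refcnt);
    } else {
            fq->flags |= INET_FRAG_HASH_DEAD;
    }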

[1]
BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:194 [inline]
BUG: KASAN: use-after-free in rhashtable_last_table+0x162/0x180 lib/rhashtable.c:212
Read of size 8 at addr ffff8880a6497b70 by task kworker/0:0/5

CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.2.0-rc1+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events rht_deferred_worker
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 kasan_report+0x12/0x20 mm/kasan/common.c:614
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
 __read_once_size include/linux/compiler.h:194 [inline]
 rhashtable_last_table+0x162/0x180 lib/rhashtable.c:212
 rht_deferred_worker+0x111/0x2030 lib/rhashtable.c:411
 process_one_work+0x989/0x1790 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x354/0x420 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Allocated by task 32687:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_kmalloc mm/kasan/common.c:489 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503
 __do_kmalloc_node mm/slab.c:3620 [inline]
 __kmalloc_node+0x4e/0x70 mm/slab.c:3627
 kmalloc_node include/linux/slab.h:590 [inline]
 kvmalloc_node+0x68/0x100 mm/util.c:431
 kvmalloc include/linux/mm.h:637 [inline]
 kvzalloc include/linux/mm.h:645 [inline]
 bucket_table_alloc+0x90/0x480 lib/rhashtable.c:178
 rhashtable_init+0x3f4/0x7b0 lib/rhashtable.c:1057
 inet_frags_init_net include/net/inet_frag.h:109 [inline]
 ipv4_frags_init_net+0x182/0x410 net/ipv4/ip_fragment.c:683
 ops_init+0xb3/0x410 net/core/net_namespace.c:130
 setup_net+0x2d3/0x740 net/core/net_namespace.c:316
 copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439
 create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
 unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
 ksys_unshare+0x440/0x980 kernel/fork.c:2692
 __do_sys_unshare kernel/fork.c:2760 [inline]
 __se_sys_unshare kernel/fork.c:2758 [inline]
 __x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 7:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
 __cache_free mm/slab.c:3432 [inline]
 kfree+0xcf/0x220 mm/slab.c:3755
 kvfree+0x61/0x70 mm/util.c:460
 bucket_table_free+0x69/0x150 lib/rhashtable.c:108
 rhashtable_free_and_destroy+0x165/0x8b0 lib/rhashtable.c:1155
 inet_frags_exit_net+0x3d/0x50 net/ipv4/inet_fragment.c:152
 ipv4_frags_exit_net+0x73/0x90 net/ipv4/ip_fragment.c:695
 ops_exit_list.isra.0+0xaa/0x150 net/core/net_namespace.c:154
 cleanup_net+0x3fb/0x960 net/core/net_namespace.c:553
 process_one_work+0x989/0x1790 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x354/0x420 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff8880a6497b40
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 48 bytes inside of
 1024-byte region [ffff8880a6497b40, ffff8880a6497f40)
The buggy address belongs to the page:
page:ffffea0002992580 refcount:1 mapcount:0 mapping:ffff8880aa400ac0 index:0xffff8880a64964c0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea0002916e88 ffffea000218fe08 ffff8880aa400ac0
raw: ffff8880a64964c0 ffff8880a6496040 0000000100000005 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8880a6497a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880a6497a80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff8880a6497b00: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                             ^
 ffff8880a6497b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880a6497c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 include/net/inet_frag.h  |  4 ++++
 net/ipv4/inet_fragment.c | 49 ++++++++++++++++++++++++++++++----------
 2 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 5f754c660cfa34898e48d5dbbf17a3508fb8b3ba..002f23c1a1a7126e146f596300aee0e52b6cafc6 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -13,11 +13,13 @@ struct fqdir {
 	int			max_dist;
 	struct inet_frags	*f;
 	struct net		*net;
+	bool			dead;
 
 	struct rhashtable       rhashtable ____cacheline_aligned_in_smp;
 
 	/* Keep atomic mem on separate cachelines in structs that include it */
 	atomic_long_t		mem ____cacheline_aligned_in_smp;
+	struct rcu_work		destroy_rwork;
 };
 
 /**
@@ -26,11 +28,13 @@ struct fqdir {
  * @INET_FRAG_FIRST_IN: first fragment has arrived
  * @INET_FRAG_LAST_IN: final fragment has arrived
  * @INET_FRAG_COMPLETE: frag queue has been processed and is due for destruction
+ * @INET_FRAG_HASH_DEAD: inet_frag_kill() has not removed fq from rhashtable
  */
 enum {
 	INET_FRAG_FIRST_IN	= BIT(0),
 	INET_FRAG_LAST_IN	= BIT(1),
 	INET_FRAG_COMPLETE	= BIT(2),
+	INET_FRAG_HASH_DEAD	= BIT(3),
 };
 
 struct frag_v4_compare_key {
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index b4432f209c715dd09b0b201fdae16d5332770c85..6ca9523374dab03737cd073a1aa990130c4a87ca 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -124,34 +124,49 @@ void inet_frags_fini(struct inet_frags *f)
 }
 EXPORT_SYMBOL(inet_frags_fini);
 
+/* called from rhashtable_free_and_destroy() at netns_frags dismantle */
 static void inet_frags_free_cb(void *ptr, void *arg)
 {
 	struct inet_frag_queue *fq = ptr;
+	int count;
 
-	/* If we can not cancel the timer, it means this frag_queue
-	 * is already disappearing, we have nothing to do.
-	 * Otherwise, we own a refcount until the end of this function.
-	 */
-	if (!del_timer(&fq->timer))
-		return;
+	count = del_timer_sync(&fq->timer) ? 1 : 0;
 
 	spin_lock_bh(&fq->lock);
 	if (!(fq->flags & INET_FRAG_COMPLETE)) {
 		fq->flags |= INET_FRAG_COMPLETE;
-		refcount_dec(&fq->refcnt);
+		count++;
+	} else if (fq->flags & INET_FRAG_HASH_DEAD) {
+		count++;
 	}
 	spin_unlock_bh(&fq->lock);
 
-	inet_frag_put(fq);
+	if (refcount_sub_and_test(count, &fq->refcnt))
+		inet_frag_destroy(fq);
 }
 
-void fqdir_exit(struct fqdir *fqdir)
+static void fqdir_rwork_fn(struct work_struct *work)
 {
-	fqdir->high_thresh = 0; /* prevent creation of new frags */
+	struct fqdir *fqdir = container_of(to_rcu_work(work),
+					   struct fqdir, destroy_rwork);
 
 	rhashtable_free_and_destroy(&fqdir->rhashtable, inet_frags_free_cb, NULL);
 	kfree(fqdir);
 }
+
+void fqdir_exit(struct fqdir *fqdir)
+{
+	fqdir->high_thresh = 0; /* prevent creation of new frags */
+
+	/* paired with READ_ONCE() in inet_frag_kill() :
+	 * We want to prevent rhashtable_remove_fast() calls
+	 */
+	smp_store_release(&fqdir->dead, true);
+
+	INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
+	queue_rcu_work(system_wq, &fqdir->destroy_rwork);
+
+}
 EXPORT_SYMBOL(fqdir_exit);
 
 void inet_frag_kill(struct inet_frag_queue *fq)
@@ -163,8 +178,18 @@ void inet_frag_kill(struct inet_frag_queue *fq)
 		struct fqdir *fqdir = fq->fqdir;
 
 		fq->flags |= INET_FRAG_COMPLETE;
-		rhashtable_remove_fast(&fqdir->rhashtable, &fq->node, fqdir->f->rhash_params);
-		refcount_dec(&fq->refcnt);
+		rcu_read_lock();
+		/* This READ_ONCE() is paired with smp_store_release()
+		 * in inet_frags_exit_net().
+		 */
+		if (!READ_ONCE(fqdir->dead)) {
+			rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
+					       fqdir->f->rhash_params);
+			refcount_dec(&fq->refcnt);
+		} else {
+			fq->flags |= INET_FRAG_HASH_DEAD;
+		}
+		rcu_read_unlock();
 	}
 }
 EXPORT_SYMBOL(inet_frag_kill);
-- 
2.22.0.rc1.257.g3120a18244-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle
  2019-05-24 16:03 [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle Eric Dumazet
                   ` (10 preceding siblings ...)
  2019-05-24 16:03 ` [PATCH net-next 11/11] inet: frags: rework rhashtable dismantle Eric Dumazet
@ 2019-05-26 21:26 ` David Miller
  11 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2019-05-26 21:26 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet

From: Eric Dumazet <edumazet@google.com>
Date: Fri, 24 May 2019 09:03:29 -0700

> This patch series fixes a race happening on netns dismantle with
> frag queues. While rhashtable_free_and_destroy() is running,
> concurrent timers might run inet_frag_kill() and attempt
> rhashtable_remove_fast() calls. This is not allowed by
> rhashtable logic.
> 
> Since I do not want to add expensive synchronize_rcu() calls
> in the netns dismantle path, I had to no longer inline
> netns_frags structures, but dynamically allocate them.
> 
> The ten first patches make this preparation, so that
> the last patch clearly shows the fix.
> 
> As this patch series is not exactly trivial, I chose to
> target 5.3. We will backport it once soaked a bit.

Ok, applied to net-next.

Everything except the last patch looks trivially correct to me.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next 11/11] inet: frags: rework rhashtable dismantle
  2019-05-24 16:03 ` [PATCH net-next 11/11] inet: frags: rework rhashtable dismantle Eric Dumazet
@ 2019-05-28  6:34   ` Herbert Xu
  2019-05-28 13:31     ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Herbert Xu @ 2019-05-28  6:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, edumazet, eric.dumazet, syzkaller

Hi Eric:

Eric Dumazet <edumazet@google.com> wrote:
>
> +void fqdir_exit(struct fqdir *fqdir)
> +{
> +       fqdir->high_thresh = 0; /* prevent creation of new frags */
> +
> +       /* paired with READ_ONCE() in inet_frag_kill() :
> +        * We want to prevent rhashtable_remove_fast() calls
> +        */
> +       smp_store_release(&fqdir->dead, true);
> +
> +       INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
> +       queue_rcu_work(system_wq, &fqdir->destroy_rwork);
> +
> +}

What is the smp_store_release supposed to protect here? If it's
meant to separate the setting of dead and the subsequent destruction
work then it doesn't work because the barrier only protects the code
preceding it, not after.
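
To illustrate the one-way guarantee I mean (sketch only; x, flag and
do_later() are hypothetical names, not taken from the patch):

    x = 1;                        /* ordered before the release store */
    smp_store_release(&flag, 1);  /* one-way barrier */
    do_later();                   /* NOT ordered against the store and
                                   * may be observed before flag == 1
                                   */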

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next 11/11] inet: frags: rework rhashtable dismantle
  2019-05-28  6:34   ` Herbert Xu
@ 2019-05-28 13:31     ` Eric Dumazet
  2019-05-29  5:40       ` [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE Herbert Xu
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2019-05-28 13:31 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, netdev, Eric Dumazet, syzbot

On Mon, May 27, 2019 at 11:34 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> Hi Eric:
>
> Eric Dumazet <edumazet@google.com> wrote:
> >
> > +void fqdir_exit(struct fqdir *fqdir)
> > +{
> > +       fqdir->high_thresh = 0; /* prevent creation of new frags */
> > +
> > +       /* paired with READ_ONCE() in inet_frag_kill() :
> > +        * We want to prevent rhashtable_remove_fast() calls
> > +        */
> > +       smp_store_release(&fqdir->dead, true);
> > +
> > +       INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
> > +       queue_rcu_work(system_wq, &fqdir->destroy_rwork);
> > +
> > +}
>
> What is the smp_store_release supposed to protect here? If it's
> meant to separate the setting of dead and the subsequent destruction
> work then it doesn't work because the barrier only protects the code
> preceding it, not after.
>

This smp_store_release() is a left over of the first version of the patch, where
there was no rcu grace period enforcement.

I do not believe there is harm letting this, but if you disagree
please send a patch ;)

Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE
  2019-05-28 13:31     ` Eric Dumazet
@ 2019-05-29  5:40       ` Herbert Xu
  2019-05-29  5:43         ` Dmitry Vyukov
                           ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Herbert Xu @ 2019-05-29  5:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Eric Dumazet, syzbot

On Tue, May 28, 2019 at 06:31:00AM -0700, Eric Dumazet wrote:
>
> This smp_store_release() is a left over of the first version of the patch, where
> there was no rcu grace period enforcement.
> 
> I do not believe there is harm letting this, but if you disagree
> please send a patch ;)

I see now that it is actually relying on the barrier/locking
semantics of call_rcu vs. rcu_read_lock.  So the smp_store_release
and READ_ONCE are simply unnecessary and could be confusing to
future readers.
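
Put as a sketch, using the names from the patch under discussion, the
pairing that actually makes this safe is:

    /* fqdir_exit(), process context */
    fqdir->dead = true;
    queue_rcu_work(system_wq, &fqdir->destroy_rwork);
    /* queue_rcu_work() goes through call_rcu(), so the destroy work
     * cannot start until every rcu_read_lock() section that might
     * still see fqdir->dead == false has completed.
     */

    /* inet_frag_kill(), timer context */
    rcu_read_lock();
    if (!fqdir->dead)
            rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
                                   fqdir->f->rhash_params);
    rcu_read_unlock();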

---8<---
The smp_store_release call in fqdir_exit cannot protect the setting
of fqdir->dead as claimed because its memory barrier is only
guaranteed to be one-way and the barrier precedes the setting of
fqdir->dead.

IOW it doesn't provide any barriers between fqdir->dead and the following
hash table destruction.

In fact, the code is safe anyway because call_rcu does provide both
the memory barrier as well as a guarantee that when the destruction
work starts executing all RCU readers will see the updated value for
fqdir->dead.

Therefore this patch removes the unnecessary smp_store_release call
as well as the corresponding READ_ONCE on the read-side in order to
not confuse future readers of this code.  Comments have been added
in their places.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 2b816f1ebbb4..35e9784fab4e 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -193,10 +193,12 @@ void fqdir_exit(struct fqdir *fqdir)
 {
 	fqdir->high_thresh = 0; /* prevent creation of new frags */
 
-	/* paired with READ_ONCE() in inet_frag_kill() :
-	 * We want to prevent rhashtable_remove_fast() calls
+	fqdir->dead = true;
+
+	/* call_rcu is supposed to provide memory barrier semantics,
+	 * separating the setting of fqdir->dead with the destruction
+	 * work.  This implicit barrier is paired with inet_frag_kill().
 	 */
-	smp_store_release(&fqdir->dead, true);
 
 	INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
 	queue_rcu_work(system_wq, &fqdir->destroy_rwork);
@@ -214,10 +216,12 @@ void inet_frag_kill(struct inet_frag_queue *fq)
 
 		fq->flags |= INET_FRAG_COMPLETE;
 		rcu_read_lock();
-		/* This READ_ONCE() is paired with smp_store_release()
-		 * in inet_frags_exit_net().
+		/* The RCU read lock provides a memory barrier
+		 * guaranteeing that if fqdir->dead is false then
+		 * the hash table destruction will not start until
+		 * after we unlock.  Paired with inet_frags_exit_net().
 		 */
-		if (!READ_ONCE(fqdir->dead)) {
+		if (!fqdir->dead) {
 			rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
 					       fqdir->f->rhash_params);
 			refcount_dec(&fq->refcnt);
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE
  2019-05-29  5:40       ` [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE Herbert Xu
@ 2019-05-29  5:43         ` Dmitry Vyukov
  2019-05-29  5:47           ` Herbert Xu
  2019-05-29 21:30         ` Eric Dumazet
  2019-05-30 18:51         ` David Miller
  2 siblings, 1 reply; 28+ messages in thread
From: Dmitry Vyukov @ 2019-05-29  5:43 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric Dumazet, David Miller, netdev, Eric Dumazet, syzbot

On Wed, May 29, 2019 at 7:40 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Tue, May 28, 2019 at 06:31:00AM -0700, Eric Dumazet wrote:
> >
> > This smp_store_release() is a left over of the first version of the patch, where
> > there was no rcu grace period enforcement.
> >
> > I do not believe there is harm letting this, but if you disagree
> > please send a patch ;)
>
> I see now that it is actually relying on the barrier/locking
> semantics of call_rcu vs. rcu_read_lock.  So the smp_store_release
> and READ_ONCE are simply unnecessary and could be confusing to
> future readers.
>
> ---8<---
> The smp_store_release call in fqdir_exit cannot protect the setting
> of fqdir->dead as claimed because its memory barrier is only
> guaranteed to be one-way and the barrier precedes the setting of
> fqdir->dead.
>
> IOW it doesn't provide any barriers between fqdir->dead and the following
> hash table destruction.
>
> In fact, the code is safe anyway because call_rcu does provide both
> the memory barrier as well as a guarantee that when the destruction
> work starts executing all RCU readers will see the updated value for
> fqdir->dead.
>
> Therefore this patch removes the unnecessary smp_store_release call
> as well as the corresponding READ_ONCE on the read-side in order to
> not confuse future readers of this code.  Comments have been added
> in their places.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index 2b816f1ebbb4..35e9784fab4e 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -193,10 +193,12 @@ void fqdir_exit(struct fqdir *fqdir)
>  {
>         fqdir->high_thresh = 0; /* prevent creation of new frags */
>
> -       /* paired with READ_ONCE() in inet_frag_kill() :
> -        * We want to prevent rhashtable_remove_fast() calls
> +       fqdir->dead = true;
> +
> +       /* call_rcu is supposed to provide memory barrier semantics,
> +        * separating the setting of fqdir->dead with the destruction
> +        * work.  This implicit barrier is paired with inet_frag_kill().
>          */
> -       smp_store_release(&fqdir->dead, true);
>
>         INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
>         queue_rcu_work(system_wq, &fqdir->destroy_rwork);
> @@ -214,10 +216,12 @@ void inet_frag_kill(struct inet_frag_queue *fq)
>
>                 fq->flags |= INET_FRAG_COMPLETE;
>                 rcu_read_lock();
> -               /* This READ_ONCE() is paired with smp_store_release()
> -                * in inet_frags_exit_net().
> +               /* The RCU read lock provides a memory barrier
> +                * guaranteeing that if fqdir->dead is false then
> +                * the hash table destruction will not start until
> +                * after we unlock.  Paired with inet_frags_exit_net().
>                  */
> -               if (!READ_ONCE(fqdir->dead)) {
> +               if (!fqdir->dead) {

If fqdir->dead read/write are concurrent, then this still needs to be
READ_ONCE/WRITE_ONCE. Ordering is orthogonal to atomicity.

>                         rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
>                                                fqdir->f->rhash_params);
>                         refcount_dec(&fq->refcnt);

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE
  2019-05-29  5:43         ` Dmitry Vyukov
@ 2019-05-29  5:47           ` Herbert Xu
  2019-05-31  8:24             ` Dmitry Vyukov
  0 siblings, 1 reply; 28+ messages in thread
From: Herbert Xu @ 2019-05-29  5:47 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: Eric Dumazet, David Miller, netdev, Eric Dumazet, syzbot

On Wed, May 29, 2019 at 07:43:51AM +0200, Dmitry Vyukov wrote:
>
> If fqdir->dead read/write are concurrent, then this still needs to be
> READ_ONCE/WRITE_ONCE. Ordering is orthogonal to atomicity.

No they do not.  READ_ONCE/WRITE_ONCE are basically a more fine-tuned
version of barrier().  In this case we already have an implicit
barrier() call due to the memory barrier semantics so this is simply
unnecessary.

It's the same reason you don't need READ_ONCE/WRITE_ONCE when you do:

CPU1				CPU2
----				----
spin_lock
shared_var = 1			spin_lock
spin_unlock			if (shared_var == 1)
					...
				spin_unlock
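
Spelled out as C (illustrative sketch only; the point is that the
lock's acquire/release ordering already covers the plain accesses):

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(lock);
    static int shared_var;

    void cpu1(void)
    {
            spin_lock(&lock);
            shared_var = 1;         /* plain store, protected by the lock */
            spin_unlock(&lock);
    }

    void cpu2(void)
    {
            spin_lock(&lock);
            if (shared_var == 1) {  /* plain load, protected by the lock */
                    /* ... */
            }
            spin_unlock(&lock);
    }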

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE
  2019-05-29  5:40       ` [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE Herbert Xu
  2019-05-29  5:43         ` Dmitry Vyukov
@ 2019-05-29 21:30         ` Eric Dumazet
  2019-05-30 18:51         ` David Miller
  2 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2019-05-29 21:30 UTC (permalink / raw)
  To: Herbert Xu, Eric Dumazet; +Cc: David Miller, netdev, Eric Dumazet, syzbot



On 5/28/19 10:40 PM, Herbert Xu wrote:

> I see now that it is actually relying on the barrier/locking
> semantics of call_rcu vs. rcu_read_lock.  So the smp_store_release
> and READ_ONCE are simply unnecessary and could be confusing to
> future readers.
> 
> ---8<---
> The smp_store_release call in fqdir_exit cannot protect the setting
> of fqdir->dead as claimed because its memory barrier is only
> guaranteed to be one-way and the barrier precedes the setting of
> fqdir->dead.
> 
> IOW it doesn't provide any barriers between fqdir->dead and the following
> hash table destruction.
> 
> In fact, the code is safe anyway because call_rcu does provide both
> the memory barrier as well as a guarantee that when the destruction
> work starts executing all RCU readers will see the updated value for
> fqdir->dead.
> 
> Therefore this patch removes the unnecessary smp_store_release call
> as well as the corresponding READ_ONCE on the read-side in order to
> not confuse future readers of this code.  Comments have been added
> in their places.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 

SGTM, thanks.

Reviewed-by: Eric Dumazet <edumazet@google.com>

David, this targets net-next tree :)



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE
  2019-05-29  5:40       ` [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE Herbert Xu
  2019-05-29  5:43         ` Dmitry Vyukov
  2019-05-29 21:30         ` Eric Dumazet
@ 2019-05-30 18:51         ` David Miller
  2 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2019-05-30 18:51 UTC (permalink / raw)
  To: herbert; +Cc: edumazet, netdev, eric.dumazet, syzkaller

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 29 May 2019 13:40:26 +0800

> The smp_store_release call in fqdir_exit cannot protect the setting
> of fqdir->dead as claimed because its memory barrier is only
> guaranteed to be one-way and the barrier precedes the setting of
> fqdir->dead.
> 
> IOW it doesn't provide any barriers between fqdir->dead and the following
> hash table destruction.
> 
> In fact, the code is safe anyway because call_rcu does provide both
> the memory barrier as well as a guarantee that when the destruction
> work starts executing all RCU readers will see the updated value for
> fqdir->dead.
> 
> Therefore this patch removes the unnecessary smp_store_release call
> as well as the corresponding READ_ONCE on the read-side in order to
> not confuse future readers of this code.  Comments have been added
> in their places.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE
  2019-05-29  5:47           ` Herbert Xu
@ 2019-05-31  8:24             ` Dmitry Vyukov
  2019-05-31 14:45               ` Herbert Xu
  0 siblings, 1 reply; 28+ messages in thread
From: Dmitry Vyukov @ 2019-05-31  8:24 UTC (permalink / raw)
  To: Herbert Xu, Paul E. McKenney, Andrea Parri, Alan Stern
  Cc: Eric Dumazet, David Miller, netdev, Eric Dumazet, syzbot

On Wed, May 29, 2019 at 7:48 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Wed, May 29, 2019 at 07:43:51AM +0200, Dmitry Vyukov wrote:
> >
> > If fqdir->dead read/write are concurrent, then this still needs to be
> > READ_ONCE/WRITE_ONCE. Ordering is orthogonal to atomicity.
>
> No they do not.  READ_ONCE/WRITE_ONCE are basically a more fine-tuned
> version of barrier().  In this case we already have an implicit
> barrier() call due to the memory barrier semantics so this is simply
> unnecessary.
>
> It's the same reason you don't need READ_ONCE/WRITE_ONCE when you do:
>
> CPU1                            CPU2
> ----                            ----
> spin_lock
> shared_var = 1                  spin_lock
> spin_unlock                     if (shared_var == 1)
>                                         ...
>                                 spin_unlock

+Paul, Andrea, Alan

OK, let's call it barrier. But we need more than a barrier here then.

1. The C standard is very clear here -- this new code causes undefined
behavior in the kernel. Regardless of what one might think about the C
standard, it's still the current contract between us and compiler
writers, and nobody has created a better replacement.

2. Then Documentation/memory-barriers.txt (which one can hardly claim
is irrelevant here) effectively says the same:

 (*) It _must_not_ be assumed that the compiler will do what you want
     with memory references that are not protected by READ_ONCE() and
     WRITE_ONCE().  Without them, the compiler is within its rights to
     do all sorts of "creative" transformations, which are covered in
     the COMPILER BARRIER section.

3. The code is only correct under naive execution (all code executed
literally as written). But we do not want compilers to execute code
that way; we want them to employ every possible optimization trick to
make it faster. As a result, the compiler can break this code: it can
reload the condition multiple times, including within the same branch,
it can use variables as scratch storage, and it can do other things
that you and I cannot think of right now (a small sketch of the reload
problem follows this list). Some of these transformations are
reasonable, some are unreasonable yet still legal, just complex
compiler logic producing a surprising result in a corner case. Also,
if the current implementation details of rhashtable_remove_fast
change, surprising things can happen, e.g. executing just refcount_dec
but not rhashtable_remove_fast.
"Proving" the impossibility of any unexpected transformation is simply
not something we should spend time on again and again, e.g. the
hundred times people will skim through this code in the future while
chasing some bug.

4. READ_ONCE()/WRITE_ONCE() also improve code self-documentation.
There is a huge difference between a plain load and an inter-thread
synchronization algorithm, and that difference needs to be clearly
visible in the code, especially when one is hunting a racy memory
corruption in a large code base (which happens with the kernel all the
time).

5. Marking all shared accesses is required to enable data-race
detection tooling, which is an absolute must for the kernel given the
amount and complexity of its tricky multi-threaded code. We need this.

6. We also need all shared memory accesses to be marked for the kernel
memory model effort. There is no way the model can be defined without
that.

7. This is very different from the spin_lock example. spin_lock is a
synchronization primitive that effectively makes the code inside the
critical section single-threaded. But in this case the reads and
writes of ->dead execute concurrently.
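
A minimal sketch of the reload problem from point 3 (plain userspace C
with a simplified stand-in for the kernel's READ_ONCE(); this is only
an illustration, not the code under review):

#include <stdio.h>

/* Simplified stand-in for the kernel macro in include/linux/compiler.h */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

int dead;       /* written concurrently by another thread */

int plain_reader(void)
{
        /* The compiler may legally reload 'dead' between the test and
         * the return, so the two accesses can observe different values.
         */
        if (!dead)
                return dead;
        return -1;
}

int marked_reader(void)
{
        /* Exactly one load; it cannot be re-fetched or fused. */
        int d = READ_ONCE(dead);

        if (!d)
                return d;
        return -1;
}

int main(void)
{
        printf("%d %d\n", plain_reader(), marked_reader());
        return 0;
}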

So what are the reasons to give up all these nice things and go down
the rabbit hole of "proving" that we do not see how the compiler could
screw things up?
If there are no significant reasons that outweigh all of these points,
please use READ_ONCE()/WRITE_ONCE() for all shared accesses. Many will
say thank you.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE
  2019-05-31  8:24             ` Dmitry Vyukov
@ 2019-05-31 14:45               ` Herbert Xu
  2019-05-31 15:45                 ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Herbert Xu @ 2019-05-31 14:45 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Paul E. McKenney, Andrea Parri, Alan Stern, Eric Dumazet,
	David Miller, netdev, Eric Dumazet, syzbot

On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote:
>
> OK, let's call it barrier. But we need more than a barrier here then.

READ_ONCE/WRITE_ONCE is not some magical dust that you sprinkle
around in your code to make it work without locks.  You need to
understand exactly why you need them and why the code would be
buggy if you don't use them.

In this case the code doesn't need them because an implicit
barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
exists in both places.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE
  2019-05-31 14:45               ` Herbert Xu
@ 2019-05-31 15:45                 ` Eric Dumazet
  2019-05-31 16:29                   ` Andrea Parri
  2019-05-31 17:11                   ` Paul E. McKenney
  0 siblings, 2 replies; 28+ messages in thread
From: Eric Dumazet @ 2019-05-31 15:45 UTC (permalink / raw)
  To: Herbert Xu, Dmitry Vyukov
  Cc: Paul E. McKenney, Andrea Parri, Alan Stern, Eric Dumazet,
	David Miller, netdev, Eric Dumazet, syzbot



On 5/31/19 7:45 AM, Herbert Xu wrote:
> On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote:
>>
>> OK, let's call it barrier. But we need more than a barrier here then.
> 
> READ_ONCE/WRITE_ONCE is not some magical dust that you sprinkle
> around in your code to make it work without locks.  You need to
> understand exactly why you need them and why the code would be
> buggy if you don't use them.
> 
> In this case the code doesn't need them because an implicit
> barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> exists in both places.
>

Moreover, adding READ_ONCE() while not really needed prevents some compiler
optimizations.

( Not in this particular case, since fqdir->dead is read exactly once, but we could
have had a loop )
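
To illustrate the loop case with a hedged sketch (again a simplified
READ_ONCE() stand-in, not this code): with a plain load the compiler
may hoist the test out of the loop, while the marked load must be
redone on every iteration -- which is both the optimization READ_ONCE()
prevents and the reason it would be needed if the flag were read in a
loop.

#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

int stop;       /* set by another thread */

void wait_plain(void)
{
        /* Legal transformation: load 'stop' once and spin forever,
         * i.e. "while (!stop) ;" may become "if (!stop) for (;;) ;".
         */
        while (!stop)
                ;
}

void wait_marked(void)
{
        /* 'stop' is re-read on every iteration of the loop. */
        while (!READ_ONCE(stop))
                ;
}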

I have already explained that the READ_ONCE() was a leftover from the first version
of the patch, which I refined later, adding correct (and slightly more complex) RCU
barriers and rules.

Dmitry, the self-documentation argument is perfectly good, but Herbert
added much nicer ad hoc comments.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE
  2019-05-31 15:45                 ` Eric Dumazet
@ 2019-05-31 16:29                   ` Andrea Parri
  2019-05-31 17:11                     ` Eric Dumazet
  2019-05-31 17:11                   ` Paul E. McKenney
  1 sibling, 1 reply; 28+ messages in thread
From: Andrea Parri @ 2019-05-31 16:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Herbert Xu, Dmitry Vyukov, Paul E. McKenney, Alan Stern,
	Eric Dumazet, David Miller, netdev, syzbot

On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> On 5/31/19 7:45 AM, Herbert Xu wrote:

> > In this case the code doesn't need them because an implicit
> > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > exists in both places.


> I have already explained that the READ_ONCE() was a leftover of the first version
> of the patch, that I refined later, adding correct (and slightly more complex) RCU
> barriers and rules.

AFAICT, neither barrier() nor RCU synchronization can be used as
a replacement for {READ,WRITE}_ONCE() here (and in tons of other
different situations).  IOW, you might want to try harder.  ;-)

Thanks,
  Andrea

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE
  2019-05-31 16:29                   ` Andrea Parri
@ 2019-05-31 17:11                     ` Eric Dumazet
  2019-06-07 12:06                       ` Dmitry Vyukov
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2019-05-31 17:11 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Eric Dumazet, Herbert Xu, Dmitry Vyukov, Paul E. McKenney,
	Alan Stern, David Miller, netdev, syzbot

On Fri, May 31, 2019 at 9:29 AM Andrea Parri
<andrea.parri@amarulasolutions.com> wrote:
>
> On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> > On 5/31/19 7:45 AM, Herbert Xu wrote:
>
> > > In this case the code doesn't need them because an implicit
> > > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > > exists in both places.
>
>
> > I have already explained that the READ_ONCE() was a leftover of the first version
> > of the patch, that I refined later, adding correct (and slightly more complex) RCU
> > barriers and rules.
>
> AFAICT, neither barrier() nor RCU synchronization can be used as
> a replacement for {READ,WRITE}_ONCE() here (and in tons of other
> different situations).  IOW, you might want to try harder.  ;-)

At least the writer side is using queue_rcu_work(), which implies many
full memory barriers; it is not equivalent to a compiler barrier() :/
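
For reference, a rough sketch of that writer side as described in this
thread (a reconstruction for illustration only; the rcu_work field name
and the exact code may differ from the actual patch):

void fqdir_exit(struct fqdir *fqdir)
{
        fqdir->dead = true;     /* plain store */

        /* queue_rcu_work() waits for an RCU grace period before the
         * work runs and implies full memory barriers, so by the time
         * fqdir_rwork_fn() starts tearing down the hash table, every
         * RCU reader either sees dead == true or has already left its
         * read-side critical section.
         */
        INIT_RCU_WORK(&fqdir->destroy_rwork, fqdir_rwork_fn);
        queue_rcu_work(system_wq, &fqdir->destroy_rwork);
}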

David, Herbert, I really do not care; I want to move on to fixing real
bugs, not argue with memory-barrier experts.

Let's add back the READ_ONCE() and be happy.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE
  2019-05-31 15:45                 ` Eric Dumazet
  2019-05-31 16:29                   ` Andrea Parri
@ 2019-05-31 17:11                   ` Paul E. McKenney
  2019-05-31 17:18                     ` Eric Dumazet
  1 sibling, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2019-05-31 17:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Herbert Xu, Dmitry Vyukov, Andrea Parri, Alan Stern,
	Eric Dumazet, David Miller, netdev, syzbot

On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> 
> 
> On 5/31/19 7:45 AM, Herbert Xu wrote:
> > On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote:
> >>
> >> OK, let's call it barrier. But we need more than a barrier here then.
> > 
> > READ_ONCE/WRITE_ONCE is not some magical dust that you sprinkle
> > around in your code to make it work without locks.  You need to
> > understand exactly why you need them and why the code would be
> > buggy if you don't use them.
> > 
> > In this case the code doesn't need them because an implicit
> > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > exists in both places.
> >
> 
> Moreover, adding READ_ONCE() while not really needed prevents some compiler
> optimizations.
> 
> ( Not in this particular case, since fqdir->dead is read exactly once, but we could
> have had a loop )
> 
> I have already explained that the READ_ONCE() was a leftover from the first version
> of the patch, which I refined later, adding correct (and slightly more complex) RCU
> barriers and rules.
> 
> Dmitry, the self-documentation argument is perfectly good, but Herbert
> added much nicer ad hoc comments.

I don't see all the code, but let me see if I understand based on the
pieces that I do see...

o	fqdir_exit() does a store-release to ->dead, then arranges
	for fqdir_rwork_fn() to be called from workqueue context
	after a grace period has elapsed.

o	If inet_frag_kill() is invoked only from fqdir_rwork_fn(),
	and if they are using the same fqdir, then inet_frag_kill()
	would always see fqdir->dead==true.

	But then it would not be necessary to check it, so this seems
	unlikely.

o	If fqdir_exit() does store-releases to a number of ->dead
	fields under rcu_read_lock(), and if the next fqdir_exit()
	won't happen until after all the callbacks complete
	(combination of flushing workqueues and rcu_barrier(), for
	example), then ->dead would be stable when inet_frag_kill()
	is invoked, and might be true or not.  (This again requires
	inet_frag_kill() be only invoked from fqdir_rwork_fn().)

So I can imagine cases where this would in fact work.  But did I get
it right or is something else happening?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE
  2019-05-31 17:11                   ` Paul E. McKenney
@ 2019-05-31 17:18                     ` Eric Dumazet
  0 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2019-05-31 17:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Eric Dumazet, Herbert Xu, Dmitry Vyukov, Andrea Parri,
	Alan Stern, David Miller, netdev, syzbot

On Fri, May 31, 2019 at 10:11 AM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
>
> On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> >
> >
> > On 5/31/19 7:45 AM, Herbert Xu wrote:
> > > On Fri, May 31, 2019 at 10:24:08AM +0200, Dmitry Vyukov wrote:
> > >>
> > >> OK, let's call it barrier. But we need more than a barrier here then.
> > >
> > > READ_ONCE/WRITE_ONCE is not some magical dust that you sprinkle
> > > around in your code to make it work without locks.  You need to
> > > understand exactly why you need them and why the code would be
> > > buggy if you don't use them.
> > >
> > > In this case the code doesn't need them because an implicit
> > > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > > exists in both places.
> > >
> >
> > Moreover, adding READ_ONCE() while not really needed prevents some compiler
> > optimizations.
> >
> > ( Not in this particular case, since fqdir->dead is read exactly once, but we could
> > have had a loop )
> >
> > I have already explained that the READ_ONCE() was a leftover from the first version
> > of the patch, which I refined later, adding correct (and slightly more complex) RCU
> > barriers and rules.
> >
> > Dmitry, the self-documentation argument is perfectly good, but Herbert
> > added much nicer ad hoc comments.
>
> I don't see all the code, but let me see if I understand based on the
> pieces that I do see...
>
> o       fqdir_exit() does a store-release to ->dead, then arranges
>         for fqdir_rwork_fn() to be called from workqueue context
>         after a grace period has elapsed.
>
> o       If inet_frag_kill() is invoked only from fqdir_rwork_fn(),
>         and if they are using the same fqdir, then inet_frag_kill()
>         would always see fqdir->dead==true.
>
>         But then it would not be necessary to check it, so this seems
>         unlikely
>

Nope, inet_frag_kill() can be called from a timer handler, and there is
already an existing barrier (the spinlock) before we call it (also under
rcu_read_lock()):

ip_expire(struct timer_list *t)
{
        ...
        rcu_read_lock();
        spin_lock(&qp->q.lock);
        ...
        ipq_kill(qp);           /* -> inet_frag_kill() */
        ...
}
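
And, roughly, the read side under discussion (a sketch reconstructed
from this thread, not the exact patch; field names may differ):

void inet_frag_kill(struct inet_frag_queue *fq)
{
        ...
        struct fqdir *fqdir = fq->fqdir;

        rcu_read_lock();
        /* Paired with the grace period implied by queue_rcu_work() in
         * fqdir_exit(): if we do not observe dead == true here, the
         * hash table cannot be destroyed before rcu_read_unlock().
         */
        if (!fqdir->dead) {
                rhashtable_remove_fast(&fqdir->rhashtable, &fq->node,
                                       fqdir->f->rhash_params);
                refcount_dec(&fq->refcnt);
        }
        rcu_read_unlock();
        ...
}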





> o       If fqdir_exit() does store-releases to a number of ->dead
>         fields under rcu_read_lock(), and if the next fqdir_exit()
>         won't happen until after all the callbacks complete
>         (combination of flushing workqueues and rcu_barrier(), for
>         example), then ->dead would be stable when inet_frag_kill()
>         is invoked, and might be true or not.  (This again requires
>         inet_frag_kill() be only invoked from fqdir_rwork_fn().)
>
> So I can imagine cases where this would in fact work.  But did I get
> it right or is something else happening?
>
>                                                         Thanx, Paul
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE
  2019-05-31 17:11                     ` Eric Dumazet
@ 2019-06-07 12:06                       ` Dmitry Vyukov
  0 siblings, 0 replies; 28+ messages in thread
From: Dmitry Vyukov @ 2019-06-07 12:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrea Parri, Eric Dumazet, Herbert Xu, Paul E. McKenney,
	Alan Stern, David Miller, netdev, syzbot

On Fri, May 31, 2019 at 7:11 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, May 31, 2019 at 9:29 AM Andrea Parri
> <andrea.parri@amarulasolutions.com> wrote:
> >
> > On Fri, May 31, 2019 at 08:45:47AM -0700, Eric Dumazet wrote:
> > > On 5/31/19 7:45 AM, Herbert Xu wrote:
> >
> > > > In this case the code doesn't need them because an implicit
> > > > barrier() (which is *stronger* than READ_ONCE/WRITE_ONCE) already
> > > > exists in both places.
> >
> >
> > > I have already explained that the READ_ONCE() was a leftover of the first version
> > > of the patch, that I refined later, adding correct (and slightly more complex) RCU
> > > barriers and rules.
> >
> > AFAICT, neither barrier() nor RCU synchronization can be used as
> > a replacement for {READ,WRITE}_ONCE() here (and in tons of other
> > different situations).  IOW, you might want to try harder.  ;-)
>
> At least the writer side is using queue_rcu_work() which implies many
> full memory barriers,
> it is not equivalent to a compiler barrier() :/
>
> David, Herbert, I really do not care, I want to move on fixing real
> bugs, not arguing with memory barriers experts.
>
> Lets add back the READ_ONCE() and be happy.

We will have to get back to this later with LKMM and KTSAN.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread

Thread overview: 28+ messages
2019-05-24 16:03 [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle Eric Dumazet
2019-05-24 16:03 ` [PATCH net-next 01/11] inet: rename netns_frags to fqdir Eric Dumazet
2019-05-24 16:03 ` [PATCH net-next 02/11] net: rename inet_frags_exit_net() to fqdir_exit() Eric Dumazet
2019-05-24 16:03 ` [PATCH net-next 03/11] net: rename struct fqdir fields Eric Dumazet
2019-05-24 16:03 ` [PATCH net-next 04/11] ipv4: no longer reference init_net in ip4_frags_ns_ctl_table[] Eric Dumazet
2019-05-24 16:03 ` [PATCH net-next 05/11] ipv6: no longer reference init_net in ip6_frags_ns_ctl_table[] Eric Dumazet
2019-05-24 16:03 ` [PATCH net-next 06/11] netfilter: ipv6: nf_defrag: no longer reference init_net in nf_ct_frag6_sysctl_table Eric Dumazet
2019-05-24 16:03 ` [PATCH net-next 07/11] ieee820154: 6lowpan: no longer reference init_net in lowpan_frags_ns_ctl_table Eric Dumazet
2019-05-24 16:03 ` [PATCH net-next 08/11] net: rename inet_frags_init_net() to fdir_init() Eric Dumazet
2019-05-24 16:03 ` [PATCH net-next 09/11] net: add a net pointer to struct fqdir Eric Dumazet
2019-05-24 16:03 ` [PATCH net-next 10/11] net: dynamically allocate fqdir structures Eric Dumazet
2019-05-24 16:03 ` [PATCH net-next 11/11] inet: frags: rework rhashtable dismantle Eric Dumazet
2019-05-28  6:34   ` Herbert Xu
2019-05-28 13:31     ` Eric Dumazet
2019-05-29  5:40       ` [PATCH] inet: frags: Remove unnecessary smp_store_release/READ_ONCE Herbert Xu
2019-05-29  5:43         ` Dmitry Vyukov
2019-05-29  5:47           ` Herbert Xu
2019-05-31  8:24             ` Dmitry Vyukov
2019-05-31 14:45               ` Herbert Xu
2019-05-31 15:45                 ` Eric Dumazet
2019-05-31 16:29                   ` Andrea Parri
2019-05-31 17:11                     ` Eric Dumazet
2019-06-07 12:06                       ` Dmitry Vyukov
2019-05-31 17:11                   ` Paul E. McKenney
2019-05-31 17:18                     ` Eric Dumazet
2019-05-29 21:30         ` Eric Dumazet
2019-05-30 18:51         ` David Miller
2019-05-26 21:26 ` [PATCH net-next 00/11] inet: frags: avoid possible races at netns dismantle David Miller
