netdev.vger.kernel.org archive mirror
* [net-next PATCH 0/3] net: frag performance followup
@ 2013-03-27 15:54 Jesper Dangaard Brouer
  2013-03-27 15:55 ` [net-next PATCH 1/3] net: frag, avoid several CPUs grabbing same frag queue during LRU evictor loop Jesper Dangaard Brouer
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-03-27 15:54 UTC
  To: Eric Dumazet, David S. Miller
  Cc: Jesper Dangaard Brouer, netdev, Florian Westphal,
	Daniel Borkmann, Hannes Frederic Sowa

This patchset is a followup to my previously accepted fragmentation
patchset:
 http://thread.gmane.org/gmane.linux.network/257155

This patchset is not my entire patch queue, as I have left out the
patch I mentioned in:
 http://thread.gmane.org/gmane.linux.network/261924
 "RFC crap-patch [PATCH] net: Per CPU separate frag mem accounting"

I left it out because I'm working on another "replacement" patch, which
removes the LRU list, an approach I discussed with Eric Dumazet during
the Netfilter Workshop.  Some preliminary results of that work appear
later in this mail.


I'm uncertain whether this is net-next or net material.
(For now it's based on net-next, on top of commit f5a03cf461.)

Patch list:
 Patch-01: avoid several CPUs grabbing same frag queue during LRU evictor loop
 Patch-02: use the frag lru_lock to protect netns_frags.nqueues update
 Patch-03: frag queue per hash bucket locking
 (below not-included)
 Patch-XX: Try Impl. Eric's idea, no LRU and direct hash cleaning


Notice, I have changed the frag DoS generator script to be more
efficient/deadly.  Before, it would only hit one RX queue; now it sends
packets causing multi-queue RX, due to "better" RX hashing.

Same test setup:
 Two 10G interfaces, on separate NUMA nodes, are under test, and use
 Ethernet flow-control.  A third interface is used for generating the
 DoS attack (with trafgen).

Test types summary (netperf UDP_STREAM):
 Test-20G64K     == 2x10G with 65K fragments
 Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
 Test-20G64K+DoS == Same as 20G64K with frag DoS
 Test-20G3F+DoS  == Same as 20G3F  with frag DoS
 Test-20G64K+MQ  == Same as 20G64K with Multi-Queue frag DoS
 Test-20G3F+MQ   == Same as 20G3F  with Multi-Queue frag DoS


Performance table summary (in Mbit/s):

 Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
 ----------  -------   -------  ----------  ---------  --------  -------
  net-next:  18486.7   10723.2   3657.85     4560.64      99.9    189.1
  Patch-01:  18830.8   13388.4   4054.96     5377.27     127.9    433.4
  Patch-02:  18848.7   13230.1   4103.04     5310.36     130.0    440.2
  Patch-03:  18838.0   13490.5   4405.11     6814.72     196.6    461.6
  (below work-in-progress)
  Patch-XX:  18800.0   15698.4  10012.90    12039.00   4257.39   3305.8

After this patchset, the LRU list is the major bottleneck, as can also
be seen from my preliminary results of removing the LRU list.

---

Jesper Dangaard Brouer (3):
      net: frag queue per hash bucket locking
      net: use the frag lru_lock to protect netns_frags.nqueues update
      net: frag, avoid several CPUs grabbing same frag queue during LRU evictor loop


 include/net/inet_frag.h  |   11 +++++++-
 net/ipv4/inet_fragment.c |   65 +++++++++++++++++++++++++++++++++++-----------
 2 files changed, 60 insertions(+), 16 deletions(-)


--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* [net-next PATCH 1/3] net: frag, avoid several CPUs grabbing same frag queue during LRU evictor loop
  2013-03-27 15:54 [net-next PATCH 0/3] net: frag performance followup Jesper Dangaard Brouer
@ 2013-03-27 15:55 ` Jesper Dangaard Brouer
  2013-03-27 16:14   ` Eric Dumazet
  2013-03-27 15:55 ` [net-next PATCH 2/3] net: use the frag lru_lock to protect netns_frags.nqueues update Jesper Dangaard Brouer
  2013-03-27 15:56 ` [net-next PATCH 3/3] net: frag queue per hash bucket locking Jesper Dangaard Brouer
  2 siblings, 1 reply; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-03-27 15:55 UTC
  To: Eric Dumazet, David S. Miller
  Cc: Jesper Dangaard Brouer, netdev, Florian Westphal,
	Daniel Borkmann, Hannes Frederic Sowa

The LRU list is protected by its own lock, since commit 3ef0eb0db4
(net: frag, move LRU list maintenance outside of rwlock), and
no longer by a read_lock.

This makes it possible to remove the inet_frag_queue, which is about
to be "evicted", from the LRU list head.  This avoids the problem of
several CPUs grabbing the same frag queue.

Note, we cannot remove the inet_frag_lru_del() call in fq_unlink()
called by inet_frag_kill(), because inet_frag_kill() is also used in
other situations.  Thus, we use list_del_init() to allow this
double list_del to work.
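
As an illustration of why the double deletion is safe: list_del_init()
leaves the entry pointing at itself, so a later list_del() unlinks
nothing.  A minimal user-space sketch (simplified; the real helpers in
include/linux/list.h additionally poison pointers and carry debug
checks):

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h)
{
        h->next = h->prev = h;
}

static void __list_del(struct list_head *prev, struct list_head *next)
{
        next->prev = prev;
        prev->next = next;
}

static void list_del(struct list_head *e)       /* no poisoning here */
{
        __list_del(e->prev, e->next);
}

static void list_del_init(struct list_head *e)
{
        __list_del(e->prev, e->next);
        INIT_LIST_HEAD(e);              /* entry now points at itself */
}

int main(void)
{
        struct list_head head, a;

        INIT_LIST_HEAD(&head);
        a.next = &head; a.prev = &head;
        head.next = &a; head.prev = &a; /* list: head <-> a */

        list_del_init(&a);      /* the evictor's removal from the LRU */
        list_del(&a);           /* later inet_frag_lru_del(): no-op   */
        return head.next == &head ? 0 : 1;      /* 0: list intact */
}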

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 net/ipv4/inet_fragment.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 2bff045..8ba548a 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -204,6 +204,9 @@ int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force)
 		q = list_first_entry(&nf->lru_list,
 				struct inet_frag_queue, lru_list);
 		atomic_inc(&q->refcnt);
+		/* Remove q from list to avoid several CPUs grabbing it */
+		list_del_init(&q->lru_list);
+
 		spin_unlock(&nf->lru_lock);
 
 		spin_lock(&q->lock);


* [net-next PATCH 2/3] net: use the frag lru_lock to protect netns_frags.nqueues update
  2013-03-27 15:54 [net-next PATCH 0/3] net: frag performance followup Jesper Dangaard Brouer
  2013-03-27 15:55 ` [net-next PATCH 1/3] net: frag, avoid several CPUs grabbing same frag queue during LRU evictor loop Jesper Dangaard Brouer
@ 2013-03-27 15:55 ` Jesper Dangaard Brouer
  2013-03-27 16:21   ` Eric Dumazet
  2013-03-27 15:56 ` [net-next PATCH 3/3] net: frag queue per hash bucket locking Jesper Dangaard Brouer
  2 siblings, 1 reply; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-03-27 15:55 UTC
  To: Eric Dumazet, David S. Miller
  Cc: Jesper Dangaard Brouer, netdev, Florian Westphal,
	Daniel Borkmann, Hannes Frederic Sowa

Move the protection of netns_frags.nqueues updates under the lru_lock,
instead of the write lock, as they are located on the same cacheline.
This is also needed when transitioning to per hash bucket locking.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 include/net/inet_frag.h  |    2 ++
 net/ipv4/inet_fragment.c |    2 --
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 64b4e7d..7cac9c5 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -143,6 +143,7 @@ static inline void inet_frag_lru_del(struct inet_frag_queue *q)
 {
 	spin_lock(&q->net->lru_lock);
 	list_del(&q->lru_list);
+	q->net->nqueues--;
 	spin_unlock(&q->net->lru_lock);
 }
 
@@ -151,6 +152,7 @@ static inline void inet_frag_lru_add(struct netns_frags *nf,
 {
 	spin_lock(&nf->lru_lock);
 	list_add_tail(&q->lru_list, &nf->lru_list);
+	q->net->nqueues++;
 	spin_unlock(&nf->lru_lock);
 }
 
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 8ba548a..1206ca6 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -124,7 +124,6 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 {
 	write_lock(&f->lock);
 	hlist_del(&fq->list);
-	fq->net->nqueues--;
 	write_unlock(&f->lock);
 	inet_frag_lru_del(fq);
 }
@@ -260,7 +259,6 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 
 	atomic_inc(&qp->refcnt);
 	hlist_add_head(&qp->list, &f->hash[hash]);
-	nf->nqueues++;
 	write_unlock(&f->lock);
 	inet_frag_lru_add(nf, qp);
 	return qp;


* [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-27 15:54 [net-next PATCH 0/3] net: frag performance followup Jesper Dangaard Brouer
  2013-03-27 15:55 ` [net-next PATCH 1/3] net: frag, avoid several CPUs grabbing same frag queue during LRU evictor loop Jesper Dangaard Brouer
  2013-03-27 15:55 ` [net-next PATCH 2/3] net: use the frag lru_lock to protect netns_frags.nqueues update Jesper Dangaard Brouer
@ 2013-03-27 15:56 ` Jesper Dangaard Brouer
  2013-03-27 17:25   ` Eric Dumazet
  2 siblings, 1 reply; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-03-27 15:56 UTC
  To: Eric Dumazet, David S. Miller
  Cc: Jesper Dangaard Brouer, netdev, Florian Westphal,
	Daniel Borkmann, Hannes Frederic Sowa

This patch implements per hash bucket locking for the frag queue
hash.  This removes two write locks, and the only remaining write
lock is for protecting hash rebuild.  This essentially reduces the
readers-writer lock to a rebuild lock.
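
Schematically, the locking pattern this patch introduces looks as
follows (an illustrative sketch only, not the literal function bodies;
see the diff below):

static void frag_bucket_op(struct inet_frags *f,
                           struct inet_frag_queue *q)
{
        struct inet_frag_bucket *hb;
        unsigned int hash;

        read_lock(&f->lock);    /* only excludes hash rebuild       */
        hash = f->hashfn(q);    /* f->rnd is stable under the lock  */
        hb = &f->hash[hash];

        spin_lock_bh(&hb->chain_lock);  /* serializes this bucket   */
        /* ... add/remove/walk hb->chain ... */
        spin_unlock_bh(&hb->chain_lock);

        read_unlock(&f->lock);
}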

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 include/net/inet_frag.h  |    9 ++++++-
 net/ipv4/inet_fragment.c |   60 ++++++++++++++++++++++++++++++++++++----------
 2 files changed, 55 insertions(+), 14 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 7cac9c5..c4f5183 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -50,10 +50,17 @@ struct inet_frag_queue {
  */
 #define INETFRAGS_MAXDEPTH		128
 
+struct inet_frag_bucket {
+	struct hlist_head	chain;
+	spinlock_t		chain_lock;
+	u16			chain_len;
+};
+
 struct inet_frags {
-	struct hlist_head	hash[INETFRAGS_HASHSZ];
+	struct inet_frag_bucket	hash[INETFRAGS_HASHSZ];
 	/* This rwlock is a global lock (seperate per IPv4, IPv6 and
 	 * netfilter). Important to keep this on a seperate cacheline.
+	 * Its primarily a rebuild protection rwlock.
 	 */
 	rwlock_t		lock ____cacheline_aligned_in_smp;
 	int			secret_interval;
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 1206ca6..3471599 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -52,20 +52,27 @@ static void inet_frag_secret_rebuild(unsigned long dummy)
 	unsigned long now = jiffies;
 	int i;
 
+	/* Per bucket lock NOT needed here, due to write lock protection */
 	write_lock(&f->lock);
+
 	get_random_bytes(&f->rnd, sizeof(u32));
 	for (i = 0; i < INETFRAGS_HASHSZ; i++) {
+		struct inet_frag_bucket *hb;
 		struct inet_frag_queue *q;
 		struct hlist_node *n;
 
-		hlist_for_each_entry_safe(q, n, &f->hash[i], list) {
+		hb = &f->hash[i];
+		hlist_for_each_entry_safe(q, n, &hb->chain, list) {
 			unsigned int hval = f->hashfn(q);
 
 			if (hval != i) {
+				struct inet_frag_bucket *hb_dest;
+
 				hlist_del(&q->list);
 
 				/* Relink to new hash chain. */
-				hlist_add_head(&q->list, &f->hash[hval]);
+				hb_dest = &f->hash[hval];
+				hlist_add_head(&q->list, &hb_dest->chain);
 			}
 		}
 	}
@@ -78,9 +85,13 @@ void inet_frags_init(struct inet_frags *f)
 {
 	int i;
 
-	for (i = 0; i < INETFRAGS_HASHSZ; i++)
-		INIT_HLIST_HEAD(&f->hash[i]);
+	for (i = 0; i < INETFRAGS_HASHSZ; i++) {
+		struct inet_frag_bucket *hb = &f->hash[i];
 
+		spin_lock_init(&hb->chain_lock);
+		INIT_HLIST_HEAD(&hb->chain);
+		hb->chain_len = 0;
+	}
 	rwlock_init(&f->lock);
 
 	f->rnd = (u32) ((num_physpages ^ (num_physpages>>7)) ^
@@ -122,9 +133,19 @@ EXPORT_SYMBOL(inet_frags_exit_net);
 
 static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 {
-	write_lock(&f->lock);
+	struct inet_frag_bucket *hb;
+	unsigned int hash;
+
+	read_lock(&f->lock);
+	hash = f->hashfn(fq);
+	hb = &f->hash[hash];
+
+	spin_lock_bh(&hb->chain_lock);
 	hlist_del(&fq->list);
-	write_unlock(&f->lock);
+	hb->chain_len--;
+	spin_unlock_bh(&hb->chain_lock);
+
+	read_unlock(&f->lock);
 	inet_frag_lru_del(fq);
 }
 
@@ -226,27 +247,32 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 		struct inet_frag_queue *qp_in, struct inet_frags *f,
 		void *arg)
 {
+	struct inet_frag_bucket *hb;
 	struct inet_frag_queue *qp;
 #ifdef CONFIG_SMP
 #endif
 	unsigned int hash;
 
-	write_lock(&f->lock);
+	read_lock(&f->lock); /* Protects against hash rebuild */
 	/*
 	 * While we stayed w/o the lock other CPU could update
 	 * the rnd seed, so we need to re-calculate the hash
 	 * chain. Fortunatelly the qp_in can be used to get one.
 	 */
 	hash = f->hashfn(qp_in);
+	hb = &f->hash[hash];
+	spin_lock_bh(&hb->chain_lock);
+
 #ifdef CONFIG_SMP
 	/* With SMP race we have to recheck hash table, because
 	 * such entry could be created on other cpu, while we
-	 * promoted read lock to write lock.
+	 * released the hash bucket lock.
 	 */
-	hlist_for_each_entry(qp, &f->hash[hash], list) {
+	hlist_for_each_entry(qp, &hb->chain, list) {
 		if (qp->net == nf && f->match(qp, arg)) {
 			atomic_inc(&qp->refcnt);
-			write_unlock(&f->lock);
+			spin_unlock_bh(&hb->chain_lock);
+			read_unlock(&f->lock);
 			qp_in->last_in |= INET_FRAG_COMPLETE;
 			inet_frag_put(qp_in, f);
 			return qp;
@@ -258,8 +284,10 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 		atomic_inc(&qp->refcnt);
 
 	atomic_inc(&qp->refcnt);
-	hlist_add_head(&qp->list, &f->hash[hash]);
-	write_unlock(&f->lock);
+	hlist_add_head(&qp->list, &hb->chain);
+	hb->chain_len++;
+	spin_unlock_bh(&hb->chain_lock);
+	read_unlock(&f->lock);
 	inet_frag_lru_add(nf, qp);
 	return qp;
 }
@@ -300,17 +328,23 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 		struct inet_frags *f, void *key, unsigned int hash)
 	__releases(&f->lock)
 {
+	struct inet_frag_bucket *hb;
 	struct inet_frag_queue *q;
 	int depth = 0;
 
-	hlist_for_each_entry(q, &f->hash[hash], list) {
+	hb = &f->hash[hash];
+
+	spin_lock_bh(&hb->chain_lock);
+	hlist_for_each_entry(q, &hb->chain, list) {
 		if (q->net == nf && f->match(q, key)) {
 			atomic_inc(&q->refcnt);
+			spin_unlock_bh(&hb->chain_lock);
 			read_unlock(&f->lock);
 			return q;
 		}
 		depth++;
 	}
+	spin_unlock_bh(&hb->chain_lock);
 	read_unlock(&f->lock);
 
 	if (depth <= INETFRAGS_MAXDEPTH)


* Re: [net-next PATCH 1/3] net: frag, avoid several CPUs grabbing same frag queue during LRU evictor loop
  2013-03-27 15:55 ` [net-next PATCH 1/3] net: frag, avoid several CPUs grabbing same frag queue during LRU evictor loop Jesper Dangaard Brouer
@ 2013-03-27 16:14   ` Eric Dumazet
  2013-03-27 17:10     ` David Miller
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2013-03-27 16:14 UTC
  To: Jesper Dangaard Brouer
  Cc: David S. Miller, netdev, Florian Westphal, Daniel Borkmann,
	Hannes Frederic Sowa

On Wed, 2013-03-27 at 16:55 +0100, Jesper Dangaard Brouer wrote:
> The LRU list is protected by its own lock, since commit 3ef0eb0db4
> (net: frag, move LRU list maintenance outside of rwlock), and
> no longer by a read_lock.
>
> This makes it possible to remove the inet_frag_queue, which is about
> to be "evicted", from the LRU list head.  This avoids the problem of
> several CPUs grabbing the same frag queue.
>
> Note, we cannot remove the inet_frag_lru_del() call in fq_unlink()
> called by inet_frag_kill(), because inet_frag_kill() is also used in
> other situations.  Thus, we use list_del_init() to allow this
> double list_del to work.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
> 
>  net/ipv4/inet_fragment.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 

Acked-by: Eric Dumazet <edumazet@google.com>


* Re: [net-next PATCH 2/3] net: use the frag lru_lock to protect netns_frags.nqueues update
  2013-03-27 15:55 ` [net-next PATCH 2/3] net: use the frag lru_lock to protect netns_frags.nqueues update Jesper Dangaard Brouer
@ 2013-03-27 16:21   ` Eric Dumazet
  2013-03-27 17:10     ` David Miller
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2013-03-27 16:21 UTC
  To: Jesper Dangaard Brouer
  Cc: David S. Miller, netdev, Florian Westphal, Daniel Borkmann,
	Hannes Frederic Sowa

On Wed, 2013-03-27 at 16:55 +0100, Jesper Dangaard Brouer wrote:
> Move the protection of netns_frags.nqueues updates under the lru_lock,
> instead of the write lock, as they are located on the same cacheline.
> This is also needed when transitioning to per hash bucket locking.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>


* Re: [net-next PATCH 1/3] net: frag, avoid several CPUs grabbing same frag queue during LRU evictor loop
  2013-03-27 16:14   ` Eric Dumazet
@ 2013-03-27 17:10     ` David Miller
  0 siblings, 0 replies; 29+ messages in thread
From: David Miller @ 2013-03-27 17:10 UTC
  To: eric.dumazet; +Cc: brouer, netdev, fw, dborkman, hannes

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 27 Mar 2013 09:14:04 -0700

> On Wed, 2013-03-27 at 16:55 +0100, Jesper Dangaard Brouer wrote:
>> The LRU list is protected by its own lock, since commit 3ef0eb0db4
>> (net: frag, move LRU list maintenance outside of rwlock), and
>> no longer by a read_lock.
>>
>> This makes it possible to remove the inet_frag_queue, which is about
>> to be "evicted", from the LRU list head.  This avoids the problem of
>> several CPUs grabbing the same frag queue.
>>
>> Note, we cannot remove the inet_frag_lru_del() call in fq_unlink()
>> called by inet_frag_kill(), because inet_frag_kill() is also used in
>> other situations.  Thus, we use list_del_init() to allow this
>> double list_del to work.
>> 
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>> ---
>> 
>>  net/ipv4/inet_fragment.c |    3 +++
>>  1 files changed, 3 insertions(+), 0 deletions(-)
>> 
> 
> Acked-by: Eric Dumazet <edumazet@google.com>

Applied.


* Re: [net-next PATCH 2/3] net: use the frag lru_lock to protect netns_frags.nqueues update
  2013-03-27 16:21   ` Eric Dumazet
@ 2013-03-27 17:10     ` David Miller
  0 siblings, 0 replies; 29+ messages in thread
From: David Miller @ 2013-03-27 17:10 UTC
  To: eric.dumazet; +Cc: brouer, netdev, fw, dborkman, hannes

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 27 Mar 2013 09:21:48 -0700

> On Wed, 2013-03-27 at 16:55 +0100, Jesper Dangaard Brouer wrote:
>> Move the protection of netns_frags.nqueues updates under the lru_lock,
>> instead of the write lock, as they are located on the same cacheline.
>> This is also needed when transitioning to per hash bucket locking.
>> 
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>> ---
> 
> Acked-by: Eric Dumazet <edumazet@google.com>

Applied.


* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-27 15:56 ` [net-next PATCH 3/3] net: frag queue per hash bucket locking Jesper Dangaard Brouer
@ 2013-03-27 17:25   ` Eric Dumazet
  2013-03-28 18:57     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2013-03-27 17:25 UTC
  To: Jesper Dangaard Brouer
  Cc: David S. Miller, netdev, Florian Westphal, Daniel Borkmann,
	Hannes Frederic Sowa

On Wed, 2013-03-27 at 16:56 +0100, Jesper Dangaard Brouer wrote:
> This patch implements per hash bucket locking for the frag queue
> hash.  This removes two write locks, and the only remaining write
> lock is for protecting hash rebuild.  This essentially reduces the
> readers-writer lock to a rebuild lock.
> 

> @@ -226,27 +247,32 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
>  		struct inet_frag_queue *qp_in, struct inet_frags *f,
>  		void *arg)
>  {
> +	struct inet_frag_bucket *hb;
>  	struct inet_frag_queue *qp;
>  #ifdef CONFIG_SMP
>  #endif
>  	unsigned int hash;
>  
> -	write_lock(&f->lock);
> +	read_lock(&f->lock); /* Protects against hash rebuild */
>  	/*
>  	 * While we stayed w/o the lock other CPU could update
>  	 * the rnd seed, so we need to re-calculate the hash
>  	 * chain. Fortunatelly the qp_in can be used to get one.
>  	 */
>  	hash = f->hashfn(qp_in);
> +	hb = &f->hash[hash];
> +	spin_lock_bh(&hb->chain_lock);
> +
>  #ifdef CONFIG_SMP
>  	/* With SMP race we have to recheck hash table, because
>  	 * such entry could be created on other cpu, while we
> -	 * promoted read lock to write lock.
> +	 * released the hash bucket lock.
>  	 */
> -	hlist_for_each_entry(qp, &f->hash[hash], list) {
> +	hlist_for_each_entry(qp, &hb->chain, list) {
>  		if (qp->net == nf && f->match(qp, arg)) {
>  			atomic_inc(&qp->refcnt);
> -			write_unlock(&f->lock);
> +			spin_unlock_bh(&hb->chain_lock);
> +			read_unlock(&f->lock);
>  			qp_in->last_in |= INET_FRAG_COMPLETE;
>  			inet_frag_put(qp_in, f);
>  			return qp;
> @@ -258,8 +284,10 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
>  		atomic_inc(&qp->refcnt);
>  
>  	atomic_inc(&qp->refcnt);
> -	hlist_add_head(&qp->list, &f->hash[hash]);
> -	write_unlock(&f->lock);
> +	hlist_add_head(&qp->list, &hb->chain);
> +	hb->chain_len++;
> +	spin_unlock_bh(&hb->chain_lock);
> +	read_unlock(&f->lock);
>  	inet_frag_lru_add(nf, qp);
>  	return qp;
>  }


I am not sure why you added the _bh suffix to spin_lock()/spin_unlock()
here?

(Please check in other functions as well)


* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-27 17:25   ` Eric Dumazet
@ 2013-03-28 18:57     ` Hannes Frederic Sowa
  2013-03-28 19:03       ` David Miller
  2013-03-28 20:22       ` Eric Dumazet
  0 siblings, 2 replies; 29+ messages in thread
From: Hannes Frederic Sowa @ 2013-03-28 18:57 UTC
  To: Eric Dumazet
  Cc: Jesper Dangaard Brouer, David S. Miller, netdev,
	Florian Westphal, Daniel Borkmann

On Wed, Mar 27, 2013 at 10:25:59AM -0700, Eric Dumazet wrote:
> On Wed, 2013-03-27 at 16:56 +0100, Jesper Dangaard Brouer wrote:
> > This patch implements per hash bucket locking for the frag queue
> > hash.  This removes two write locks, and the only remaining write
> > lock is for protecting hash rebuild.  This essentially reduces the
> > readers-writer lock to a rebuild lock.
> > 
> 
> > @@ -226,27 +247,32 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
> >  		struct inet_frag_queue *qp_in, struct inet_frags *f,
> >  		void *arg)
> >  {
> > +	struct inet_frag_bucket *hb;
> >  	struct inet_frag_queue *qp;
> >  #ifdef CONFIG_SMP
> >  #endif
> >  	unsigned int hash;
> >  
> > -	write_lock(&f->lock);
> > +	read_lock(&f->lock); /* Protects against hash rebuild */
> >  	/*
> >  	 * While we stayed w/o the lock other CPU could update
> >  	 * the rnd seed, so we need to re-calculate the hash
> >  	 * chain. Fortunatelly the qp_in can be used to get one.
> >  	 */
> >  	hash = f->hashfn(qp_in);
> > +	hb = &f->hash[hash];
> > +	spin_lock_bh(&hb->chain_lock);
> > +
> >  #ifdef CONFIG_SMP
> >  	/* With SMP race we have to recheck hash table, because
> >  	 * such entry could be created on other cpu, while we
> > -	 * promoted read lock to write lock.
> > +	 * released the hash bucket lock.
> >  	 */
> > -	hlist_for_each_entry(qp, &f->hash[hash], list) {
> > +	hlist_for_each_entry(qp, &hb->chain, list) {
> >  		if (qp->net == nf && f->match(qp, arg)) {
> >  			atomic_inc(&qp->refcnt);
> > -			write_unlock(&f->lock);
> > +			spin_unlock_bh(&hb->chain_lock);
> > +			read_unlock(&f->lock);
> >  			qp_in->last_in |= INET_FRAG_COMPLETE;
> >  			inet_frag_put(qp_in, f);
> >  			return qp;
> > @@ -258,8 +284,10 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
> >  		atomic_inc(&qp->refcnt);
> >  
> >  	atomic_inc(&qp->refcnt);
> > -	hlist_add_head(&qp->list, &f->hash[hash]);
> > -	write_unlock(&f->lock);
> > +	hlist_add_head(&qp->list, &hb->chain);
> > +	hb->chain_len++;
> > +	spin_unlock_bh(&hb->chain_lock);
> > +	read_unlock(&f->lock);
> >  	inet_frag_lru_add(nf, qp);
> >  	return qp;
> >  }
> 
> 
> I am not sure why you added the _bh suffix to spin_lock()/spin_unlock()
> here?

I assume that it has to do with the usage of this code in
ipv6/netfilter/nf_conntrack_reasm.c, which could be invoked from process
context, if I read it correctly.


* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-28 18:57     ` Hannes Frederic Sowa
@ 2013-03-28 19:03       ` David Miller
  2013-03-28 19:10         ` Hannes Frederic Sowa
  2013-03-28 20:22       ` Eric Dumazet
  1 sibling, 1 reply; 29+ messages in thread
From: David Miller @ 2013-03-28 19:03 UTC
  To: hannes; +Cc: eric.dumazet, brouer, netdev, fw, dborkman

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 28 Mar 2013 19:57:21 +0100

> On Wed, Mar 27, 2013 at 10:25:59AM -0700, Eric Dumazet wrote:
>> On Wed, 2013-03-27 at 16:56 +0100, Jesper Dangaard Brouer wrote:
>> > This patch implements per hash bucket locking for the frag queue
>> > hash.  This removes two write locks, and the only remaining write
>> > lock is for protecting hash rebuild.  This essentially reduces the
>> > readers-writer lock to a rebuild lock.
 ...
>> I am not sure why you added the _bh suffix to spin_lock()/spin_unlock()
>> here?
> 
> I assume that it has to do with the usage of this code in
> ipv6/netfilter/nf_conntrack_reasm.c, which could be invoked from process
> context, if I read it correctly.

That appears to be the case, yes.

That's an odd environment for these routines to be invoked from,
so longer term we should probably make the nf conntrack code do
the BH disabling around the inet frag calls, rather than make the
inet frag code eat the extra overhead for the more common invocations.


* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-28 19:03       ` David Miller
@ 2013-03-28 19:10         ` Hannes Frederic Sowa
  2013-03-28 19:19           ` David Miller
  0 siblings, 1 reply; 29+ messages in thread
From: Hannes Frederic Sowa @ 2013-03-28 19:10 UTC
  To: David Miller; +Cc: eric.dumazet, brouer, netdev, fw, dborkman

On Thu, Mar 28, 2013 at 03:03:59PM -0400, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Thu, 28 Mar 2013 19:57:21 +0100
> 
> > On Wed, Mar 27, 2013 at 10:25:59AM -0700, Eric Dumazet wrote:
> >> On Wed, 2013-03-27 at 16:56 +0100, Jesper Dangaard Brouer wrote:
> >> > This patch implements per hash bucket locking for the frag queue
> >> > hash.  This removes two write locks, and the only remaining write
> >> > lock is for protecting hash rebuild.  This essentially reduces the
> >> > readers-writer lock to a rebuild lock.
>  ...
> >> I am not sure why you added the _bh suffix to spin_lock()/spin_unlock()
> >> here?
> > 
> > I assume that it has to do with the usage of this code in
> > ipv6/netfilter/nf_conntrack_reasm.c, which could be invoked from process
> > context, if I read it correctly.
> 
> That appears to be the case, yes.
> 
> That's an odd environment for these routines to be invoked from,
> so longer term we should probably make the nf conntrack code do
> the BH disabling around the inet frag calls, rather than make the
> inet frag code eat the extra overhead for the more common invocations.

My idea was a bh-safe flag in struct inet_frags, used to conditionally
pick the plain or _bh spin_lock/spin_unlock variants (these could be
wrapped in inline functions just for inet_fragment.c).
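
Roughly like this (a hypothetical sketch; "bh_safe" would be a new
field in struct inet_frags, set only by the netfilter user):

static inline void hb_chain_lock(struct inet_frags *f,
                                 struct inet_frag_bucket *hb)
{
        if (f->bh_safe)                 /* hypothetical flag */
                spin_lock_bh(&hb->chain_lock);
        else
                spin_lock(&hb->chain_lock);
}

static inline void hb_chain_unlock(struct inet_frags *f,
                                   struct inet_frag_bucket *hb)
{
        if (f->bh_safe)
                spin_unlock_bh(&hb->chain_lock);
        else
                spin_unlock(&hb->chain_lock);
}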


* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-28 19:10         ` Hannes Frederic Sowa
@ 2013-03-28 19:19           ` David Miller
  0 siblings, 0 replies; 29+ messages in thread
From: David Miller @ 2013-03-28 19:19 UTC
  To: hannes; +Cc: eric.dumazet, brouer, netdev, fw, dborkman

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 28 Mar 2013 20:10:50 +0100

> On Thu, Mar 28, 2013 at 03:03:59PM -0400, David Miller wrote:
>> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> Date: Thu, 28 Mar 2013 19:57:21 +0100
>> 
>> > On Wed, Mar 27, 2013 at 10:25:59AM -0700, Eric Dumazet wrote:
>> >> On Wed, 2013-03-27 at 16:56 +0100, Jesper Dangaard Brouer wrote:
>> >> > This patch implements per hash bucket locking for the frag queue
>> >> > hash.  This removes two write locks, and the only remaining write
>> >> > lock is for protecting hash rebuild.  This essentially reduce the
>> >> > readers-writer lock to a rebuild lock.
>>  ...
>> >> I am not sure why you added the _bh suffix to spin_lock()/spin_unlock()
>> >> here?
>> > 
>> > I assume that it has to do with the usage of this code in
>> > ipv6/netfilter/nf_conntrack_reasm.c, which could be invoked from process
>> > context, if I read it correctly.
>> 
>> That appears to be the case, yes.
>> 
>> That's an odd environment for these routines to be invoked from,
>> so longer term we should probably make the nf conntrack code do
>> the BH disabling around the inet frag calls, rather than make the
>> inet frag code eat the extra overhead for the more common invocations.
> 
> My idea was a bh-safe flag in struct inet_frags, used to conditionally
> pick the plain or _bh spin_lock/spin_unlock variants (these could be
> wrapped in inline functions just for inet_fragment.c).

That's an extra check eaten by all users.  Really, put the overhead into
the call sites that need it, and nowhere else.


* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-28 18:57     ` Hannes Frederic Sowa
  2013-03-28 19:03       ` David Miller
@ 2013-03-28 20:22       ` Eric Dumazet
  2013-03-28 23:30         ` Hannes Frederic Sowa
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2013-03-28 20:22 UTC
  To: Hannes Frederic Sowa
  Cc: Jesper Dangaard Brouer, David S. Miller, netdev,
	Florian Westphal, Daniel Borkmann

On Thu, 2013-03-28 at 19:57 +0100, Hannes Frederic Sowa wrote:

> I assume that it has to do with the usage of this code in
> ipv6/netfilter/nf_conntrack_reasm.c, which could be invoked from process
> context, if I read it correctly.

Then there would be a possible deadlock in current code.


* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-28 20:22       ` Eric Dumazet
@ 2013-03-28 23:30         ` Hannes Frederic Sowa
  2013-03-28 23:39           ` Eric Dumazet
  0 siblings, 1 reply; 29+ messages in thread
From: Hannes Frederic Sowa @ 2013-03-28 23:30 UTC
  To: Eric Dumazet
  Cc: Jesper Dangaard Brouer, David S. Miller, netdev,
	Florian Westphal, Daniel Borkmann

On Thu, Mar 28, 2013 at 01:22:44PM -0700, Eric Dumazet wrote:
> On Thu, 2013-03-28 at 19:57 +0100, Hannes Frederic Sowa wrote:
> 
> > I assume that it has to do with the usage of this code in
> > ipv6/netfilter/nf_conntrack_reasm.c, which could be invoked from process
> > context, if I read it correctly.
> 
> Then there would be a possible deadlock in current code.

Netfilter currently does a local_bh_disable() before entering inet_fragment
(and later enables it again).
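
Sketched (a hypothetical process-context caller, not the literal
nf_conntrack_reasm.c code; inet_frag_find() drops f->lock itself, as
its __releases(&f->lock) annotation shows):

        local_bh_disable();             /* done by the netfilter caller */
        read_lock(&f->lock);
        q = inet_frag_find(nf, f, key, hash);
        local_bh_enable();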


* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-28 23:30         ` Hannes Frederic Sowa
@ 2013-03-28 23:39           ` Eric Dumazet
  2013-03-29  0:33             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2013-03-28 23:39 UTC
  To: Hannes Frederic Sowa
  Cc: Jesper Dangaard Brouer, David S. Miller, netdev,
	Florian Westphal, Daniel Borkmann

On Fri, 2013-03-29 at 00:30 +0100, Hannes Frederic Sowa wrote:
> On Thu, Mar 28, 2013 at 01:22:44PM -0700, Eric Dumazet wrote:
> > On Thu, 2013-03-28 at 19:57 +0100, Hannes Frederic Sowa wrote:
> > 
> > > I assume that it has to do with the usage of this code in
> > > ipv6/netfilter/nf_conntrack_reasm.c, which could be invoked from process
> > > context, if I read it correctly.
> > 
> > Then there would be a possible deadlock in current code.
> 
> Netfilter currently does a local_bh_disable() before entering inet_fragment
> (and later enables it again).
> 

Good, so no need for the _bh() as I suspected.


* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-28 23:39           ` Eric Dumazet
@ 2013-03-29  0:33             ` Hannes Frederic Sowa
  2013-03-29 19:01               ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 29+ messages in thread
From: Hannes Frederic Sowa @ 2013-03-29  0:33 UTC
  To: Eric Dumazet
  Cc: Jesper Dangaard Brouer, David S. Miller, netdev,
	Florian Westphal, Daniel Borkmann

On Thu, Mar 28, 2013 at 04:39:42PM -0700, Eric Dumazet wrote:
> On Fri, 2013-03-29 at 00:30 +0100, Hannes Frederic Sowa wrote:
> > On Thu, Mar 28, 2013 at 01:22:44PM -0700, Eric Dumazet wrote:
> > > On Thu, 2013-03-28 at 19:57 +0100, Hannes Frederic Sowa wrote:
> > > 
> > > > I assume that it has to do with the usage of this code in
> > > > ipv6/netfilter/nf_conntrack_reasm.c, which could be invoked from process
> > > > context, if I read it correctly.
> > > 
> > > Then there would be a possible deadlock in current code.
> > 
> > Netfilter currently does a local_bh_disable() before entering inet_fragment
> > (and later enables it again).
> > 
> 
> Good, so no need for the _bh() as I suspected.

Ack.

I replaced the _bh spin_locks with plain spinlocks and tested the code
with sending fragments and receiving fragments (netfilter and reassembly
logic) with lockdep and didn't get any splats. Looks good so far.


* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-29  0:33             ` Hannes Frederic Sowa
@ 2013-03-29 19:01               ` Jesper Dangaard Brouer
  2013-03-29 19:05                 ` Eric Dumazet
                                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-03-29 19:01 UTC
  To: Hannes Frederic Sowa
  Cc: Eric Dumazet, David S. Miller, netdev, Florian Westphal, Daniel Borkmann

On Fri, 2013-03-29 at 01:33 +0100, Hannes Frederic Sowa wrote:
> On Thu, Mar 28, 2013 at 04:39:42PM -0700, Eric Dumazet wrote:
> > On Fri, 2013-03-29 at 00:30 +0100, Hannes Frederic Sowa wrote:
> > > On Thu, Mar 28, 2013 at 01:22:44PM -0700, Eric Dumazet wrote:
> > > > On Thu, 2013-03-28 at 19:57 +0100, Hannes Frederic Sowa wrote:
> > > > 
> > > > > I assume that it has to do with the usage of this code in
> > > > > ipv6/netfilter/nf_conntrack_reasm.c, which could be invoked from process
> > > > > context, if I read it correctly.
> > > > 
> > > > Then there would be a possible deadlock in current code.
> > > 
> > > Netfilter currently does a local_bh_disable() before entering inet_fragment
> > > (and later enables it again).
> > > 
> > 
> > Good, so no need for the _bh() as I suspected.
> 
> Ack.
> 
> I replaced the _bh spin_locks with plain spinlocks and tested the code
> with sending fragments and receiving fragments (netfilter and reassembly
> logic) with lockdep and didn't get any splats. Looks good so far.

Well, it's great to see that you are working on my patch proposal
while I'm on Easter vacation ;-)  Much appreciated.
I'm officially back from vacation Tuesday, and I'll repost then (after
testing it on my 10G testlab).

--Jesper


* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-29 19:01               ` Jesper Dangaard Brouer
@ 2013-03-29 19:05                 ` Eric Dumazet
  2013-03-29 19:22                 ` David Miller
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: Eric Dumazet @ 2013-03-29 19:05 UTC
  To: Jesper Dangaard Brouer
  Cc: Hannes Frederic Sowa, David S. Miller, netdev, Florian Westphal,
	Daniel Borkmann

On Fri, 2013-03-29 at 20:01 +0100, Jesper Dangaard Brouer wrote:

> Well, it's great to see that you are working on my patch proposal
> while I'm on Easter vacation ;-)  Much appreciated.
> I'm officially back from vacation Tuesday, and I'll repost then (after
> testing it on my 10G testlab).

Have a nice Easter break!


* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-29 19:01               ` Jesper Dangaard Brouer
  2013-03-29 19:05                 ` Eric Dumazet
@ 2013-03-29 19:22                 ` David Miller
  2013-04-02 15:23                 ` Jesper Dangaard Brouer
  2013-04-03 22:11                 ` Jesper Dangaard Brouer
  3 siblings, 0 replies; 29+ messages in thread
From: David Miller @ 2013-03-29 19:22 UTC
  To: brouer; +Cc: hannes, eric.dumazet, netdev, fw, dborkman

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Fri, 29 Mar 2013 20:01:33 +0100

> On Fri, 2013-03-29 at 01:33 +0100, Hannes Frederic Sowa wrote:
>> On Thu, Mar 28, 2013 at 04:39:42PM -0700, Eric Dumazet wrote:
>> > On Fri, 2013-03-29 at 00:30 +0100, Hannes Frederic Sowa wrote:
>> > > On Thu, Mar 28, 2013 at 01:22:44PM -0700, Eric Dumazet wrote:
>> > > > On Thu, 2013-03-28 at 19:57 +0100, Hannes Frederic Sowa wrote:
>> > > > 
>> > > > > I assume that it has to do with the usage of this code in
>> > > > > ipv6/netfilter/nf_conntrack_reasm.c, which could be invoked from process
>> > > > > context, if I read it correctly.
>> > > > 
>> > > > Then there would be a possible deadlock in current code.
>> > > 
>> > > Netfilter currently does a local_bh_disable() before entering inet_fragment
>> > > (and later enables it again).
>> > > 
>> > 
>> > Good, so no need for the _bh() as I suspected.
>> 
>> Ack.
>> 
>> I replaced the _bh spin_locks with plain spinlocks and tested the code
>> with sending fragments and receiving fragments (netfilter and reassembly
>> logic) with lockdep and didn't get any splats. Looks good so far.
> 
> Well, it's great to see that you are working on my patch proposal
> while I'm on Easter vacation ;-)  Much appreciated.
> I'm officially back from vacation Tuesday, and I'll repost then (after
> testing it on my 10G testlab).

Jesper, when you do this, please just put the whole of the excellent
description text you had in the "0/3" posting from this series into
the commit message for the respun patch #3.

Thanks.


* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-29 19:01               ` Jesper Dangaard Brouer
  2013-03-29 19:05                 ` Eric Dumazet
  2013-03-29 19:22                 ` David Miller
@ 2013-04-02 15:23                 ` Jesper Dangaard Brouer
  2013-04-03 22:11                 ` Jesper Dangaard Brouer
  3 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-04-02 15:23 UTC
  To: Hannes Frederic Sowa
  Cc: Eric Dumazet, David S. Miller, netdev, Florian Westphal, Daniel Borkmann

On Fri, 2013-03-29 at 20:01 +0100, Jesper Dangaard Brouer wrote:

> I'm officially back from vacation Tuesday, and I'll repost then (after
> testing it on my 10G testlab).

Argh - I just deleted a long mail with a lot of test results showing
strange performance numbers ... as I just realized that /boot ran dry
during my testing procedure, thus invalidating all the tests I spent most
of the day performing, argh!

--Jesper


* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
  2013-03-29 19:01               ` Jesper Dangaard Brouer
                                   ` (2 preceding siblings ...)
  2013-04-02 15:23                 ` Jesper Dangaard Brouer
@ 2013-04-03 22:11                 ` Jesper Dangaard Brouer
  2013-04-04  7:52                   ` [net-next PATCH V2] " Jesper Dangaard Brouer
  3 siblings, 1 reply; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-04-03 22:11 UTC
  To: Hannes Frederic Sowa
  Cc: Eric Dumazet, David S. Miller, netdev, Florian Westphal, Daniel Borkmann

On Fri, 2013-03-29 at 20:01 +0100, Jesper Dangaard Brouer wrote:
> On Fri, 2013-03-29 at 01:33 +0100, Hannes Frederic Sowa wrote:
> > On Thu, Mar 28, 2013 at 04:39:42PM -0700, Eric Dumazet wrote:
> > > On Fri, 2013-03-29 at 00:30 +0100, Hannes Frederic Sowa wrote:
> > > > On Thu, Mar 28, 2013 at 01:22:44PM -0700, Eric Dumazet wrote:
> > > > > On Thu, 2013-03-28 at 19:57 +0100, Hannes Frederic Sowa wrote:
> > > > > 
> > > > > > I assume that it has to do with the usage of this code in
> > > > > > ipv6/netfilter/nf_conntrack_reasm.c, which could be invoked from process
> > > > > > context, if I read it correctly.
> > > > > 
> > > > > Then there would be a possible deadlock in current code.
> > > > 
> > > > Netfilter currently does a local_bh_disable() before entering inet_fragment
> > > > (and later enables it again).
> > > > 
> > > 
> > > Good, so no need for the _bh() as I suspected.
> > 
> > Ack.
> > 
> > I replaced the _bh spin_locks with plain spinlocks and tested the code
> > with sending fragments and receiving fragments (netfilter and reassembly
> > logic) with lockdep and didn't get any splats. Looks good so far.
> 
> Well, it's great to see that you are working on my patch proposal
> while I'm on Easter vacation ;-)  Much appreciated.
> I'm officially back from vacation Tuesday, and I'll repost then (after
> testing it on my 10G testlab).

When I rebased patch-03 (on top of net-next commit a210576c) and
removed the _bh spinlock, I saw a performance regression.  BUT this
was caused by some unrelated change in between.  See tests below.

Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de.
Test (B) is a verifying re-test of commit 1b5ab0de, corresponding to patch-02.
Test (C) is what I reported before for patch-03.

Test (D) is net-next master HEAD (commit a210576c), which reveals some
(unknown) performance regression (compared against test (B)). And (D)
functions as a new base-test.

(#) Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
    ----------  -------   -------  ----------  ---------  --------  -------
(A) Patch-02  : 18848.7   13230.1   4103.04     5310.36     130.0    440.2
(B) 1b5ab0de  : 18841.5   13156.8   4101.08     5314.57     129.0    424.2
(C) Patch-03v1: 18838.0   13490.5   4405.11     6814.72     196.6    461.6

(D) a210576c  : 18321.5   11250.4   3635.34     5160.13     119.1    405.2
(E) with _bh  : 17247.3   11492.6   3994.74     6405.29     166.7    413.6
(F) without bh: 17471.3   11298.7   3818.05     6102.11     165.7    406.3

Tests (E) and (F) are patch-03, with and without the _bh spinlocks.

I cannot explain the slowdown for 20G64K (but it's an artificial
"lab-test" so I'm not worried).  But the other results do show
improvements.  And the test (E) "with _bh" version is slightly better.

P.S. Damn, it took a bit longer than expected to test this fairly small
correction to the patch...
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* [net-next PATCH V2] net: frag queue per hash bucket locking
  2013-04-03 22:11                 ` Jesper Dangaard Brouer
@ 2013-04-04  7:52                   ` Jesper Dangaard Brouer
  2013-04-04  9:03                     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-04-04  7:52 UTC
  To: Eric Dumazet, David S. Miller
  Cc: Jesper Dangaard Brouer, netdev, Florian Westphal,
	Daniel Borkmann, Hannes Frederic Sowa

This patch implements per hash bucket locking for the frag queue
hash.  This removes two write locks, and the only remaining write
lock is for protecting hash rebuild.  This essentially reduces the
readers-writer lock to a rebuild lock.

V2:
- By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't
  need the spinlock _bh versions, as Netfilter currently does a
  local_bh_disable() before entering inet_fragment.
- Fold-in desc from cover-mail

This patch is part of "net: frag performance followup"
 http://thread.gmane.org/gmane.linux.network/263644
of which two patches have already been accepted.

Same test setup as previous:
 (http://thread.gmane.org/gmane.linux.network/257155)
 Two 10G interfaces, on separate NUMA nodes, are under test, and use
 Ethernet flow-control.  A third interface is used for generating the
 DoS attack (with trafgen).

Notice, I have changed the frag DoS generator script to be more
efficient/deadly.  Before, it would only hit one RX queue; now it sends
packets causing multi-queue RX, due to "better" RX hashing.

Test types summary (netperf UDP_STREAM):
 Test-20G64K     == 2x10G with 65K fragments
 Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
 Test-20G64K+DoS == Same as 20G64K with frag DoS
 Test-20G3F+DoS  == Same as 20G3F  with frag DoS
 Test-20G64K+MQ  == Same as 20G64K with Multi-Queue frag DoS
 Test-20G3F+MQ   == Same as 20G3F  with Multi-Queue frag DoS

When I rebased this patch (03) (on top of net-next commit a210576c) and
removed the _bh spinlock, I saw a performance regression.  BUT this
was caused by some unrelated change in between.  See tests below.

Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de.
Test (B) is a verifying re-test of commit 1b5ab0de, corresponding to patch-02.
Test (C) is what I reported before for this patch.

Test (D) is net-next master HEAD (commit a210576c), which reveals some
(unknown) performance regression (compared against test (B)).
Test (D) functions as a new base-test.

Performance table summary (in Mbit/s):

(#) Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
    ----------  -------   -------  ----------  ---------  --------  -------
(A) Patch-02  : 18848.7   13230.1   4103.04     5310.36     130.0    440.2
(B) 1b5ab0de  : 18841.5   13156.8   4101.08     5314.57     129.0    424.2
(C) Patch-03v1: 18838.0   13490.5   4405.11     6814.72     196.6    461.6

(D) a210576c  : 18321.5   11250.4   3635.34     5160.13     119.1    405.2
(E) with _bh  : 17247.3   11492.6   3994.74     6405.29     166.7    413.6
(F) without bh: 17471.3   11298.7   3818.05     6102.11     165.7    406.3

Tests (E) and (F) are this patch (03), with (V1) and without (V2) the _bh spinlocks.

I cannot explain the slowdown for 20G64K (but it's an artificial
"lab-test" so I'm not worried).  But the other results do show
improvements.  And the test (E) "with _bh" version is slightly better.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 include/net/inet_frag.h  |    9 ++++++-
 net/ipv4/inet_fragment.c |   60 ++++++++++++++++++++++++++++++++++++----------
 2 files changed, 55 insertions(+), 14 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 7cac9c5..c4f5183 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -50,10 +50,17 @@ struct inet_frag_queue {
  */
 #define INETFRAGS_MAXDEPTH		128
 
+struct inet_frag_bucket {
+	struct hlist_head	chain;
+	spinlock_t		chain_lock;
+	u16			chain_len;
+};
+
 struct inet_frags {
-	struct hlist_head	hash[INETFRAGS_HASHSZ];
+	struct inet_frag_bucket	hash[INETFRAGS_HASHSZ];
 	/* This rwlock is a global lock (seperate per IPv4, IPv6 and
 	 * netfilter). Important to keep this on a seperate cacheline.
+	 * Its primarily a rebuild protection rwlock.
 	 */
 	rwlock_t		lock ____cacheline_aligned_in_smp;
 	int			secret_interval;
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 1206ca6..2bf15e8 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -52,20 +52,27 @@ static void inet_frag_secret_rebuild(unsigned long dummy)
 	unsigned long now = jiffies;
 	int i;
 
+	/* Per bucket lock NOT needed here, due to write lock protection */
 	write_lock(&f->lock);
+
 	get_random_bytes(&f->rnd, sizeof(u32));
 	for (i = 0; i < INETFRAGS_HASHSZ; i++) {
+		struct inet_frag_bucket *hb;
 		struct inet_frag_queue *q;
 		struct hlist_node *n;
 
-		hlist_for_each_entry_safe(q, n, &f->hash[i], list) {
+		hb = &f->hash[i];
+		hlist_for_each_entry_safe(q, n, &hb->chain, list) {
 			unsigned int hval = f->hashfn(q);
 
 			if (hval != i) {
+				struct inet_frag_bucket *hb_dest;
+
 				hlist_del(&q->list);
 
 				/* Relink to new hash chain. */
-				hlist_add_head(&q->list, &f->hash[hval]);
+				hb_dest = &f->hash[hval];
+				hlist_add_head(&q->list, &hb_dest->chain);
 			}
 		}
 	}
@@ -78,9 +85,13 @@ void inet_frags_init(struct inet_frags *f)
 {
 	int i;
 
-	for (i = 0; i < INETFRAGS_HASHSZ; i++)
-		INIT_HLIST_HEAD(&f->hash[i]);
+	for (i = 0; i < INETFRAGS_HASHSZ; i++) {
+		struct inet_frag_bucket *hb = &f->hash[i];
 
+		spin_lock_init(&hb->chain_lock);
+		INIT_HLIST_HEAD(&hb->chain);
+		hb->chain_len = 0;
+	}
 	rwlock_init(&f->lock);
 
 	f->rnd = (u32) ((num_physpages ^ (num_physpages>>7)) ^
@@ -122,9 +133,19 @@ EXPORT_SYMBOL(inet_frags_exit_net);
 
 static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 {
-	write_lock(&f->lock);
+	struct inet_frag_bucket *hb;
+	unsigned int hash;
+
+	read_lock(&f->lock);
+	hash = f->hashfn(fq);
+	hb = &f->hash[hash];
+
+	spin_lock(&hb->chain_lock);
 	hlist_del(&fq->list);
-	write_unlock(&f->lock);
+	hb->chain_len--;
+	spin_unlock(&hb->chain_lock);
+
+	read_unlock(&f->lock);
 	inet_frag_lru_del(fq);
 }
 
@@ -226,27 +247,32 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 		struct inet_frag_queue *qp_in, struct inet_frags *f,
 		void *arg)
 {
+	struct inet_frag_bucket *hb;
 	struct inet_frag_queue *qp;
 #ifdef CONFIG_SMP
 #endif
 	unsigned int hash;
 
-	write_lock(&f->lock);
+	read_lock(&f->lock); /* Protects against hash rebuild */
 	/*
 	 * While we stayed w/o the lock other CPU could update
 	 * the rnd seed, so we need to re-calculate the hash
 	 * chain. Fortunatelly the qp_in can be used to get one.
 	 */
 	hash = f->hashfn(qp_in);
+	hb = &f->hash[hash];
+	spin_lock(&hb->chain_lock);
+
 #ifdef CONFIG_SMP
 	/* With SMP race we have to recheck hash table, because
 	 * such entry could be created on other cpu, while we
-	 * promoted read lock to write lock.
+	 * released the hash bucket lock.
 	 */
-	hlist_for_each_entry(qp, &f->hash[hash], list) {
+	hlist_for_each_entry(qp, &hb->chain, list) {
 		if (qp->net == nf && f->match(qp, arg)) {
 			atomic_inc(&qp->refcnt);
-			write_unlock(&f->lock);
+			spin_unlock(&hb->chain_lock);
+			read_unlock(&f->lock);
 			qp_in->last_in |= INET_FRAG_COMPLETE;
 			inet_frag_put(qp_in, f);
 			return qp;
@@ -258,8 +284,10 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 		atomic_inc(&qp->refcnt);
 
 	atomic_inc(&qp->refcnt);
-	hlist_add_head(&qp->list, &f->hash[hash]);
-	write_unlock(&f->lock);
+	hlist_add_head(&qp->list, &hb->chain);
+	hb->chain_len++;
+	spin_unlock(&hb->chain_lock);
+	read_unlock(&f->lock);
 	inet_frag_lru_add(nf, qp);
 	return qp;
 }
@@ -300,17 +328,23 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 		struct inet_frags *f, void *key, unsigned int hash)
 	__releases(&f->lock)
 {
+	struct inet_frag_bucket *hb;
 	struct inet_frag_queue *q;
 	int depth = 0;
 
-	hlist_for_each_entry(q, &f->hash[hash], list) {
+	hb = &f->hash[hash];
+
+	spin_lock(&hb->chain_lock);
+	hlist_for_each_entry(q, &hb->chain, list) {
 		if (q->net == nf && f->match(q, key)) {
 			atomic_inc(&q->refcnt);
+			spin_unlock(&hb->chain_lock);
 			read_unlock(&f->lock);
 			return q;
 		}
 		depth++;
 	}
+	spin_unlock(&hb->chain_lock);
 	read_unlock(&f->lock);
 
 	if (depth <= INETFRAGS_MAXDEPTH)


* Re: [net-next PATCH V2] net: frag queue per hash bucket locking
  2013-04-04  7:52                   ` [net-next PATCH V2] " Jesper Dangaard Brouer
@ 2013-04-04  9:03                     ` Hannes Frederic Sowa
  2013-04-04  9:27                       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 29+ messages in thread
From: Hannes Frederic Sowa @ 2013-04-04  9:03 UTC
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, David S. Miller, netdev, Florian Westphal, Daniel Borkmann

On Thu, Apr 04, 2013 at 09:52:26AM +0200, Jesper Dangaard Brouer wrote:
> +struct inet_frag_bucket {
> +	struct hlist_head	chain;
> +	spinlock_t		chain_lock;
> +	u16			chain_len;
> +};
> +

I just noticed and wanted to ask what chain_len is needed for?  Could it
be dropped?

If the elements are swapped between the hash buckets in
inet_frag_secret_rebuild, it seems you forgot to update chain_len
correctly.
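
I.e. the rebuild loop would also have needed something like this
(an illustrative sketch of the missing accounting only):

        hlist_del(&q->list);
        hb->chain_len--;                /* source bucket shrinks */

        /* Relink to new hash chain. */
        hb_dest = &f->hash[hval];
        hlist_add_head(&q->list, &hb_dest->chain);
        hb_dest->chain_len++;           /* destination bucket grows */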

Thanks,

  Hannes


* Re: [net-next PATCH V2] net: frag queue per hash bucket locking
  2013-04-04  9:03                     ` Hannes Frederic Sowa
@ 2013-04-04  9:27                       ` Jesper Dangaard Brouer
  2013-04-04  9:38                         ` [net-next PATCH V3] " Jesper Dangaard Brouer
  0 siblings, 1 reply; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-04-04  9:27 UTC
  To: Hannes Frederic Sowa
  Cc: Eric Dumazet, David S. Miller, netdev, Florian Westphal, Daniel Borkmann

On Thu, 2013-04-04 at 11:03 +0200, Hannes Frederic Sowa wrote:
> On Thu, Apr 04, 2013 at 09:52:26AM +0200, Jesper Dangaard Brouer wrote:
> > +struct inet_frag_bucket {
> > +	struct hlist_head	chain;
> > +	spinlock_t		chain_lock;
> > +	u16			chain_len;
> > +};
> > +
> 
> I just noticed and wanted to ask what chain_len is needed for?  Could it
> be dropped?

It could be dropped from this patch.  It's part of my future hash cleanup
strategy.
I also wanted to use it to replace the nqueues counter, but that would
not be correct, because the nqueues counter is maintained per netns (network
namespace).

It's currently the netns separation which is causing "headaches" for my
LRU-removal and direct-hash-cleaning solution...


> If the elements are swapped between the hash buckets in
> inet_frag_secret_rebuild it seems you forgot to update chain_len
> correctly.

Ah, good catch.
Given it's not even correct, I'll remove the chain_len and repost a V3.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* [net-next PATCH V3] net: frag queue per hash bucket locking
  2013-04-04  9:27                       ` Jesper Dangaard Brouer
@ 2013-04-04  9:38                         ` Jesper Dangaard Brouer
  2013-04-04  9:58                           ` Hannes Frederic Sowa
  2013-04-04 16:24                           ` Eric Dumazet
  0 siblings, 2 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2013-04-04  9:38 UTC
  To: Eric Dumazet, David S. Miller
  Cc: Jesper Dangaard Brouer, netdev, Florian Westphal,
	Daniel Borkmann, Hannes Frederic Sowa

This patch implements per hash bucket locking for the frag queue
hash.  This removes two write locks, and the only remaining write
lock is for protecting hash rebuild.  This essentially reduces the
readers-writer lock to a rebuild lock.

This patch is part of "net: frag performance followup"
 http://thread.gmane.org/gmane.linux.network/263644
of which two patches have already been accepted.

Same test setup as previous:
 (http://thread.gmane.org/gmane.linux.network/257155)
 Two 10G interfaces, on separate NUMA nodes, are under test, and use
 Ethernet flow-control.  A third interface is used for generating the
 DoS attack (with trafgen).

Notice, I have changed the frag DoS generator script to be more
efficient/deadly.  Before, it would only hit one RX queue; now it sends
packets causing multi-queue RX, due to "better" RX hashing.

Test types summary (netperf UDP_STREAM):
 Test-20G64K     == 2x10G with 65K fragments
 Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
 Test-20G64K+DoS == Same as 20G64K with frag DoS
 Test-20G3F+DoS  == Same as 20G3F  with frag DoS
 Test-20G64K+MQ  == Same as 20G64K with Multi-Queue frag DoS
 Test-20G3F+MQ   == Same as 20G3F  with Multi-Queue frag DoS

When I rebased this patch (03) (on top of net-next commit a210576c) and
removed the _bh spinlocks, I saw a performance regression.  But this
turned out to be caused by some unrelated change in between.  See the
tests below.

Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de.
Test (B) is a verifying retest of commit 1b5ab0de, corresponding to patch-02.
Test (C) is what I reported before for this patch (V1).

Test (D) is net-next master HEAD (commit a210576c), which reveals some
(unknown) performance regression (compared against test (B)).
Test (D) functions as the new base test.

Performance table summary (in Mbit/s):

(#) Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
    ----------  -------   -------  ----------  ---------  --------  -------
(A) Patch-02  : 18848.7   13230.1   4103.04     5310.36     130.0    440.2
(B) 1b5ab0de  : 18841.5   13156.8   4101.08     5314.57     129.0    424.2
(C) Patch-03v1: 18838.0   13490.5   4405.11     6814.72     196.6    461.6

(D) a210576c  : 18321.5   11250.4   3635.34     5160.13     119.1    405.2
(E) with _bh  : 17247.3   11492.6   3994.74     6405.29     166.7    413.6
(F) without bh: 17471.3   11298.7   3818.05     6102.11     165.7    406.3

Tests (E) and (F) are this patch (03), with (V1) and without (V2) the _bh spinlocks.

I cannot explain the slowdown for 20G64K (but it's an artificial
"lab-test", so I'm not worried).  The other results do show
improvements, and the test (E) "with _bh" version is slightly better.
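
For the record, the difference between the two tested variants boils
down to this (sketch; the V2 note below explains why the plain
variant is also safe here):

	/* V1 "with _bh": disables softirqs around the section */
	spin_lock_bh(&hb->chain_lock);
	hlist_del(&fq->list);
	spin_unlock_bh(&hb->chain_lock);

	/* V2 "without _bh": relies on callers running with BHs
	 * already disabled, e.g. netfilter does local_bh_disable()
	 * before entering inet_fragment */
	spin_lock(&hb->chain_lock);
	hlist_del(&fq->list);
	spin_unlock(&hb->chain_lock);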

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

----
V2:
- By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't
  need the spinlock _bh versions, as Netfilter currently does a
  local_bh_disable() before entering inet_fragment.
- Fold-in desc from cover-mail
V3:
- Drop the chain_len counter per hash bucket.
---

 include/net/inet_frag.h  |    8 ++++++
 net/ipv4/inet_fragment.c |   57 ++++++++++++++++++++++++++++++++++++----------
 2 files changed, 51 insertions(+), 14 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 7cac9c5..6f41b45 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -50,10 +50,16 @@ struct inet_frag_queue {
  */
 #define INETFRAGS_MAXDEPTH		128
 
+struct inet_frag_bucket {
+	struct hlist_head	chain;
+	spinlock_t		chain_lock;
+};
+
 struct inet_frags {
-	struct hlist_head	hash[INETFRAGS_HASHSZ];
+	struct inet_frag_bucket	hash[INETFRAGS_HASHSZ];
 	/* This rwlock is a global lock (seperate per IPv4, IPv6 and
 	 * netfilter). Important to keep this on a seperate cacheline.
+	 * Its primarily a rebuild protection rwlock.
 	 */
 	rwlock_t		lock ____cacheline_aligned_in_smp;
 	int			secret_interval;
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 1206ca6..e97d66a 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -52,20 +52,27 @@ static void inet_frag_secret_rebuild(unsigned long dummy)
 	unsigned long now = jiffies;
 	int i;
 
+	/* Per bucket lock NOT needed here, due to write lock protection */
 	write_lock(&f->lock);
+
 	get_random_bytes(&f->rnd, sizeof(u32));
 	for (i = 0; i < INETFRAGS_HASHSZ; i++) {
+		struct inet_frag_bucket *hb;
 		struct inet_frag_queue *q;
 		struct hlist_node *n;
 
-		hlist_for_each_entry_safe(q, n, &f->hash[i], list) {
+		hb = &f->hash[i];
+		hlist_for_each_entry_safe(q, n, &hb->chain, list) {
 			unsigned int hval = f->hashfn(q);
 
 			if (hval != i) {
+				struct inet_frag_bucket *hb_dest;
+
 				hlist_del(&q->list);
 
 				/* Relink to new hash chain. */
-				hlist_add_head(&q->list, &f->hash[hval]);
+				hb_dest = &f->hash[hval];
+				hlist_add_head(&q->list, &hb_dest->chain);
 			}
 		}
 	}
@@ -78,9 +85,12 @@ void inet_frags_init(struct inet_frags *f)
 {
 	int i;
 
-	for (i = 0; i < INETFRAGS_HASHSZ; i++)
-		INIT_HLIST_HEAD(&f->hash[i]);
+	for (i = 0; i < INETFRAGS_HASHSZ; i++) {
+		struct inet_frag_bucket *hb = &f->hash[i];
 
+		spin_lock_init(&hb->chain_lock);
+		INIT_HLIST_HEAD(&hb->chain);
+	}
 	rwlock_init(&f->lock);
 
 	f->rnd = (u32) ((num_physpages ^ (num_physpages>>7)) ^
@@ -122,9 +132,18 @@ EXPORT_SYMBOL(inet_frags_exit_net);
 
 static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 {
-	write_lock(&f->lock);
+	struct inet_frag_bucket *hb;
+	unsigned int hash;
+
+	read_lock(&f->lock);
+	hash = f->hashfn(fq);
+	hb = &f->hash[hash];
+
+	spin_lock(&hb->chain_lock);
 	hlist_del(&fq->list);
-	write_unlock(&f->lock);
+	spin_unlock(&hb->chain_lock);
+
+	read_unlock(&f->lock);
 	inet_frag_lru_del(fq);
 }
 
@@ -226,27 +245,32 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 		struct inet_frag_queue *qp_in, struct inet_frags *f,
 		void *arg)
 {
+	struct inet_frag_bucket *hb;
 	struct inet_frag_queue *qp;
 #ifdef CONFIG_SMP
 #endif
 	unsigned int hash;
 
-	write_lock(&f->lock);
+	read_lock(&f->lock); /* Protects against hash rebuild */
 	/*
 	 * While we stayed w/o the lock other CPU could update
 	 * the rnd seed, so we need to re-calculate the hash
 	 * chain. Fortunatelly the qp_in can be used to get one.
 	 */
 	hash = f->hashfn(qp_in);
+	hb = &f->hash[hash];
+	spin_lock(&hb->chain_lock);
+
 #ifdef CONFIG_SMP
 	/* With SMP race we have to recheck hash table, because
 	 * such entry could be created on other cpu, while we
-	 * promoted read lock to write lock.
+	 * released the hash bucket lock.
 	 */
-	hlist_for_each_entry(qp, &f->hash[hash], list) {
+	hlist_for_each_entry(qp, &hb->chain, list) {
 		if (qp->net == nf && f->match(qp, arg)) {
 			atomic_inc(&qp->refcnt);
-			write_unlock(&f->lock);
+			spin_unlock(&hb->chain_lock);
+			read_unlock(&f->lock);
 			qp_in->last_in |= INET_FRAG_COMPLETE;
 			inet_frag_put(qp_in, f);
 			return qp;
@@ -258,8 +282,9 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 		atomic_inc(&qp->refcnt);
 
 	atomic_inc(&qp->refcnt);
-	hlist_add_head(&qp->list, &f->hash[hash]);
-	write_unlock(&f->lock);
+	hlist_add_head(&qp->list, &hb->chain);
+	spin_unlock(&hb->chain_lock);
+	read_unlock(&f->lock);
 	inet_frag_lru_add(nf, qp);
 	return qp;
 }
@@ -300,17 +325,23 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 		struct inet_frags *f, void *key, unsigned int hash)
 	__releases(&f->lock)
 {
+	struct inet_frag_bucket *hb;
 	struct inet_frag_queue *q;
 	int depth = 0;
 
-	hlist_for_each_entry(q, &f->hash[hash], list) {
+	hb = &f->hash[hash];
+
+	spin_lock(&hb->chain_lock);
+	hlist_for_each_entry(q, &hb->chain, list) {
 		if (q->net == nf && f->match(q, key)) {
 			atomic_inc(&q->refcnt);
+			spin_unlock(&hb->chain_lock);
 			read_unlock(&f->lock);
 			return q;
 		}
 		depth++;
 	}
+	spin_unlock(&hb->chain_lock);
 	read_unlock(&f->lock);
 
 	if (depth <= INETFRAGS_MAXDEPTH)
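
Note the __releases(&f->lock) annotation on inet_frag_find() above:
callers enter with the rebuild read-lock held, and the function
releases it on every return path.  A caller therefore looks roughly
like this (sketch, simplified from the ipv4 side):

	read_lock(&ip4_frags.lock);
	hash = ipqhashfn(iph->id, iph->saddr, iph->daddr,
			 iph->protocol);
	q = inet_frag_find(&net->ipv4.frags, &ip4_frags, &arg, hash);
	/* f->lock has been dropped here on all paths */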

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [net-next PATCH V3] net: frag queue per hash bucket locking
  2013-04-04  9:38                         ` [net-next PATCH V3] " Jesper Dangaard Brouer
@ 2013-04-04  9:58                           ` Hannes Frederic Sowa
  2013-04-04 16:24                           ` Eric Dumazet
  1 sibling, 0 replies; 29+ messages in thread
From: Hannes Frederic Sowa @ 2013-04-04  9:58 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, David S. Miller, netdev, Florian Westphal, Daniel Borkmann

On Thu, Apr 04, 2013 at 11:38:16AM +0200, Jesper Dangaard Brouer wrote:
 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> 
> ----
> V2:
> - By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't
>   need the spinlock _bh versions, as Netfilter currently does a
>   local_bh_disable() before entering inet_fragment.
> - Fold-in desc from cover-mail
> V3:
> - Drop the chain_len counter per hash bucket.

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net-next PATCH V3] net: frag queue per hash bucket locking
  2013-04-04  9:38                         ` [net-next PATCH V3] " Jesper Dangaard Brouer
  2013-04-04  9:58                           ` Hannes Frederic Sowa
@ 2013-04-04 16:24                           ` Eric Dumazet
  2013-04-04 21:38                             ` David Miller
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2013-04-04 16:24 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David S. Miller, netdev, Florian Westphal, Daniel Borkmann,
	Hannes Frederic Sowa

On Thu, 2013-04-04 at 11:38 +0200, Jesper Dangaard Brouer wrote:
> This patch implements per hash bucket locking for the frag queue
> hash.  This removes two write locks, and the only remaining write
> lock is for protecting hash rebuild.  This essentially reduces the
> readers-writer lock to a rebuild lock.

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [net-next PATCH V3] net: frag queue per hash bucket locking
  2013-04-04 16:24                           ` Eric Dumazet
@ 2013-04-04 21:38                             ` David Miller
  0 siblings, 0 replies; 29+ messages in thread
From: David Miller @ 2013-04-04 21:38 UTC (permalink / raw)
  To: eric.dumazet; +Cc: brouer, netdev, fw, dborkman, hannes

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 04 Apr 2013 09:24:15 -0700

> On Thu, 2013-04-04 at 11:38 +0200, Jesper Dangaard Brouer wrote:
>> This patch implements per hash bucket locking for the frag queue
>> hash.  This removes two write locks, and the only remaining write
>> lock is for protecting hash rebuild.  This essentially reduces the
>> readers-writer lock to a rebuild lock.
> 
> Acked-by: Eric Dumazet <edumazet@google.com>

Applied, thanks everyone.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2013-04-04 21:38 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-27 15:54 [net-next PATCH 0/3] net: frag performance followup Jesper Dangaard Brouer
2013-03-27 15:55 ` [net-next PATCH 1/3] net: frag, avoid several CPUs grabbing same frag queue during LRU evictor loop Jesper Dangaard Brouer
2013-03-27 16:14   ` Eric Dumazet
2013-03-27 17:10     ` David Miller
2013-03-27 15:55 ` [net-next PATCH 2/3] net: use the frag lru_lock to protect netns_frags.nqueues update Jesper Dangaard Brouer
2013-03-27 16:21   ` Eric Dumazet
2013-03-27 17:10     ` David Miller
2013-03-27 15:56 ` [net-next PATCH 3/3] net: frag queue per hash bucket locking Jesper Dangaard Brouer
2013-03-27 17:25   ` Eric Dumazet
2013-03-28 18:57     ` Hannes Frederic Sowa
2013-03-28 19:03       ` David Miller
2013-03-28 19:10         ` Hannes Frederic Sowa
2013-03-28 19:19           ` David Miller
2013-03-28 20:22       ` Eric Dumazet
2013-03-28 23:30         ` Hannes Frederic Sowa
2013-03-28 23:39           ` Eric Dumazet
2013-03-29  0:33             ` Hannes Frederic Sowa
2013-03-29 19:01               ` Jesper Dangaard Brouer
2013-03-29 19:05                 ` Eric Dumazet
2013-03-29 19:22                 ` David Miller
2013-04-02 15:23                 ` Jesper Dangaard Brouer
2013-04-03 22:11                 ` Jesper Dangaard Brouer
2013-04-04  7:52                   ` [net-next PATCH V2] " Jesper Dangaard Brouer
2013-04-04  9:03                     ` Hannes Frederic Sowa
2013-04-04  9:27                       ` Jesper Dangaard Brouer
2013-04-04  9:38                         ` [net-next PATCH V3] " Jesper Dangaard Brouer
2013-04-04  9:58                           ` Hannes Frederic Sowa
2013-04-04 16:24                           ` Eric Dumazet
2013-04-04 21:38                             ` David Miller
