netdev.vger.kernel.org archive mirror
* TUN problems (regression?)
@ 2012-12-20 23:16 Paul Moore
  2012-12-20 23:38 ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Moore @ 2012-12-20 23:16 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 528 bytes --]

[CC'ing netdev in case this is a known problem I just missed ...]

Hi Jason,

I started doing some more testing with the multiqueue TUN changes and I ran 
into a problem when running tunctl: running it once without arguments works as 
expected, but running it a second time fails with a kmem_cache_sanity_check() 
error.  The problem is very repeatable on my test VM and happens independently 
of the LSM/SELinux fixup patches.

Have you seen this before?

-- 
paul moore
security and virtualization @ redhat

[-- Attachment #2: tun_problem.png --]
[-- Type: image/png, Size: 13740 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: TUN problems (regression?)
  2012-12-20 23:16 TUN problems (regression?) Paul Moore
@ 2012-12-20 23:38 ` Eric Dumazet
  2012-12-20 23:50   ` Stephen Hemminger
  2012-12-21 16:27   ` Paul Moore
  0 siblings, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2012-12-20 23:38 UTC (permalink / raw)
  To: Paul Moore; +Cc: Jason Wang, netdev

On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
> [CC'ing netdev in case this is a known problem I just missed ...]
> 
> Hi Jason,
> 
> I started doing some more testing with the multiqueue TUN changes and I ran 
> into a problem when running tunctl: running it once w/o arguments works as 
> expected, but running it a second time results in failure and a 
> kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
> on my test VM and happens independent of the LSM/SELinux fixup patches.
> 
> Have you seen this before?
> 

Obviously code in tun_flow_init() is wrong...

static int tun_flow_init(struct tun_struct *tun)
{
        int i;

        tun->flow_cache = kmem_cache_create("tun_flow_cache",
                                            sizeof(struct tun_flow_entry), 0, 0,
                                            NULL);
        if (!tun->flow_cache)
                return -ENOMEM;
...
}


I have no idea why we would need a kmem_cache per tun_struct,
and why we even need a kmem_cache.
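
A minimal sketch of what goes wrong with a per-device cache, assuming the slab
sanity checks (CONFIG_DEBUG_VM) are enabled, as Paul's kmem_cache_sanity_check()
report suggests: every device registers a cache under the same name, and the
second kmem_cache_create() is rejected.  (Hypothetical demo module, not tun code.)

#include <linux/module.h>
#include <linux/slab.h>

static struct kmem_cache *first, *second;

static int __init dup_name_init(void)
{
	first = kmem_cache_create("tun_flow_cache", 64, 0, 0, NULL);
	/* With CONFIG_DEBUG_VM, kmem_cache_sanity_check() spots the
	 * duplicate name and this second create returns NULL. */
	second = kmem_cache_create("tun_flow_cache", 64, 0, 0, NULL);
	pr_info("first=%p second=%p\n", first, second);
	return 0;
}

static void __exit dup_name_exit(void)
{
	if (second)
		kmem_cache_destroy(second);
	if (first)
		kmem_cache_destroy(first);
}

module_init(dup_name_init);
module_exit(dup_name_exit);
MODULE_LICENSE("GPL");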


I would try the following patch:

 drivers/net/tun.c |   24 +++---------------------
 1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 504f7f1..fbd106e 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -180,7 +180,6 @@ struct tun_struct {
 	int debug;
 #endif
 	spinlock_t lock;
-	struct kmem_cache *flow_cache;
 	struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
 	struct timer_list flow_gc_timer;
 	unsigned long ageing_time;
@@ -209,8 +208,8 @@ static struct tun_flow_entry *tun_flow_create(struct tun_struct *tun,
 					      struct hlist_head *head,
 					      u32 rxhash, u16 queue_index)
 {
-	struct tun_flow_entry *e = kmem_cache_alloc(tun->flow_cache,
-						    GFP_ATOMIC);
+	struct tun_flow_entry *e = kmalloc(sizeof(*e), GFP_ATOMIC);
+
 	if (e) {
 		tun_debug(KERN_INFO, tun, "create flow: hash %u index %u\n",
 			  rxhash, queue_index);
@@ -223,19 +222,12 @@ static struct tun_flow_entry *tun_flow_create(struct tun_struct *tun,
 	return e;
 }
 
-static void tun_flow_free(struct rcu_head *head)
-{
-	struct tun_flow_entry *e
-		= container_of(head, struct tun_flow_entry, rcu);
-	kmem_cache_free(e->tun->flow_cache, e);
-}
-
 static void tun_flow_delete(struct tun_struct *tun, struct tun_flow_entry *e)
 {
 	tun_debug(KERN_INFO, tun, "delete flow: hash %u index %u\n",
 		  e->rxhash, e->queue_index);
 	hlist_del_rcu(&e->hash_link);
-	call_rcu(&e->rcu, tun_flow_free);
+	kfree_rcu(e, rcu);
 }
 
 static void tun_flow_flush(struct tun_struct *tun)
@@ -833,12 +825,6 @@ static int tun_flow_init(struct tun_struct *tun)
 {
 	int i;
 
-	tun->flow_cache = kmem_cache_create("tun_flow_cache",
-					    sizeof(struct tun_flow_entry), 0, 0,
-					    NULL);
-	if (!tun->flow_cache)
-		return -ENOMEM;
-
 	for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++)
 		INIT_HLIST_HEAD(&tun->flows[i]);
 
@@ -854,10 +840,6 @@ static void tun_flow_uninit(struct tun_struct *tun)
 {
 	del_timer_sync(&tun->flow_gc_timer);
 	tun_flow_flush(tun);
-
-	/* Wait for completion of call_rcu()'s */
-	rcu_barrier();
-	kmem_cache_destroy(tun->flow_cache);
 }
 
 /* Initialize net device. */

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: TUN problems (regression?)
  2012-12-20 23:38 ` Eric Dumazet
@ 2012-12-20 23:50   ` Stephen Hemminger
  2012-12-21  3:32     ` Jason Wang
  2012-12-21 16:27   ` Paul Moore
  1 sibling, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2012-12-20 23:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Paul Moore, Jason Wang, netdev

On Thu, 20 Dec 2012 15:38:17 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
> > [CC'ing netdev in case this is a known problem I just missed ...]
> > 
> > Hi Jason,
> > 
> > I started doing some more testing with the multiqueue TUN changes and I ran 
> > into a problem when running tunctl: running it once w/o arguments works as 
> > expected, but running it a second time results in failure and a 
> > kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
> > on my test VM and happens independent of the LSM/SELinux fixup patches.
> > 
> > Have you seen this before?
> > 
> 
> Obviously code in tun_flow_init() is wrong...
> 
> static int tun_flow_init(struct tun_struct *tun)
> {
>         int i;
> 
>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
>                                             sizeof(struct tun_flow_entry), 0, 0,
>                                             NULL);
>         if (!tun->flow_cache)
>                 return -ENOMEM;
> ...
> }
> 
> 
> I have no idea why we would need a kmem_cache per tun_struct,
> and why we even need a kmem_cache.

Normally flow malloc/free should be good enough.
It might make sense to use a private kmem_cache if doing hlist_nulls.
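
A rough sketch of that combination, with hypothetical field names: a single
global cache created with SLAB_DESTROY_BY_RCU (renamed SLAB_TYPESAFE_BY_RCU in
later kernels) lets freed entries be reused immediately, and the hlist_nulls
lookup stays safe because the reader takes a reference and re-checks the nulls
value at the end of each chain.

#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/atomic.h>
#include <linux/rculist_nulls.h>

struct flow_entry {
	u32			rxhash;
	u16			queue_index;
	atomic_t		refcnt;
	struct hlist_nulls_node	node;
};

static struct kmem_cache *flow_cachep;	/* created once, globally */

static int flow_cache_setup(void)
{
	/* SLAB_DESTROY_BY_RCU: entries may be recycled right away, so the
	 * lookup must revalidate instead of relying on a delayed free. */
	flow_cachep = kmem_cache_create("flow_entry", sizeof(struct flow_entry),
					0, SLAB_DESTROY_BY_RCU, NULL);
	return flow_cachep ? 0 : -ENOMEM;
}

/* RCU lookup in one hash chain; 'slot' is the nulls value stored in the
 * chain's terminator. */
static struct flow_entry *flow_lookup(struct hlist_nulls_head *head,
				      u32 rxhash, u32 slot)
{
	struct flow_entry *e;
	struct hlist_nulls_node *n;

begin:
	hlist_nulls_for_each_entry_rcu(e, n, head, node)
		if (e->rxhash == rxhash && atomic_inc_not_zero(&e->refcnt))
			return e;	/* caller re-checks e->rxhash */
	/* A recycled entry may have moved us onto another chain; the nulls
	 * value tells us, and we simply restart the walk. */
	if (get_nulls_value(n) != slot)
		goto begin;
	return NULL;
}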


Acked-by: Stephen Hemminger <shemminger@vyatta.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: TUN problems (regression?)
  2012-12-20 23:50   ` Stephen Hemminger
@ 2012-12-21  3:32     ` Jason Wang
  2012-12-21  3:39       ` Eric Dumazet
  2012-12-21 21:15       ` David Miller
  0 siblings, 2 replies; 13+ messages in thread
From: Jason Wang @ 2012-12-21  3:32 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, Paul Moore, netdev

On 12/21/2012 07:50 AM, Stephen Hemminger wrote:
> On Thu, 20 Dec 2012 15:38:17 -0800
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
>>> [CC'ing netdev in case this is a known problem I just missed ...]
>>>
>>> Hi Jason,
>>>
>>> I started doing some more testing with the multiqueue TUN changes and I ran 
>>> into a problem when running tunctl: running it once w/o arguments works as 
>>> expected, but running it a second time results in failure and a 
>>> kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
>>> on my test VM and happens independent of the LSM/SELinux fixup patches.
>>>
>>> Have you seen this before?
>>>
>> Obviously code in tun_flow_init() is wrong...
>>
>> static int tun_flow_init(struct tun_struct *tun)
>> {
>>         int i;
>>
>>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
>>                                             sizeof(struct tun_flow_entry), 0, 0,
>>                                             NULL);
>>         if (!tun->flow_cache)
>>                 return -ENOMEM;
>> ...
>> }
>>
>>
>> I have no idea why we would need a kmem_cache per tun_struct,
>> and why we even need a kmem_cache.
> Normally flow malloc/free should be good enough.
> It might make sense to use private kmem_cache if doing hlist_nulls.
>
>
> Acked-by: Stephen Hemminger <shemminger@vyatta.com>

It should be at least a global cache; I thought I could get some speed-up by
using a kmem_cache.

Acked-by: Jason Wang <jasowang@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: TUN problems (regression?)
  2012-12-21  3:32     ` Jason Wang
@ 2012-12-21  3:39       ` Eric Dumazet
  2012-12-21  4:26         ` Jason Wang
  2012-12-21 21:15       ` David Miller
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2012-12-21  3:39 UTC (permalink / raw)
  To: Jason Wang; +Cc: Stephen Hemminger, Paul Moore, netdev

On Fri, 2012-12-21 at 11:32 +0800, Jason Wang wrote:
> On 12/21/2012 07:50 AM, Stephen Hemminger wrote:
> > On Thu, 20 Dec 2012 15:38:17 -0800
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> >> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
> >>> [CC'ing netdev in case this is a known problem I just missed ...]
> >>>
> >>> Hi Jason,
> >>>
> >>> I started doing some more testing with the multiqueue TUN changes and I ran 
> >>> into a problem when running tunctl: running it once w/o arguments works as 
> >>> expected, but running it a second time results in failure and a 
> >>> kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
> >>> on my test VM and happens independent of the LSM/SELinux fixup patches.
> >>>
> >>> Have you seen this before?
> >>>
> >> Obviously code in tun_flow_init() is wrong...
> >>
> >> static int tun_flow_init(struct tun_struct *tun)
> >> {
> >>         int i;
> >>
> >>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
> >>                                             sizeof(struct tun_flow_entry), 0, 0,
> >>                                             NULL);
> >>         if (!tun->flow_cache)
> >>                 return -ENOMEM;
> >> ...
> >> }
> >>
> >>
> >> I have no idea why we would need a kmem_cache per tun_struct,
> >> and why we even need a kmem_cache.
> > Normally flow malloc/free should be good enough.
> > It might make sense to use private kmem_cache if doing hlist_nulls.
> >
> >
> > Acked-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> Should be at least a global cache, I thought I can get some speed-up by
> using kmem_cache.
> 
> Acked-by: Jason Wang <jasowang@redhat.com>

Was it with SLUB or SLAB?

Using generic kmalloc-64 is better than a dedicated kmem_cache of 48
bytes per object, as we guarantee each object is on a single cache line.
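
A quick hypothetical probe (not tun code) that shows the rounding: a 48-byte
request is served from the kmalloc-64 size class, so every entry owns a full
64-byte cache line.

#include <linux/slab.h>
#include <linux/printk.h>

static void show_kmalloc_rounding(void)
{
	void *p = kmalloc(48, GFP_KERNEL);

	if (p) {
		/* typically prints "usable size 64" */
		pr_info("requested 48 bytes, usable size %zu\n", ksize(p));
		kfree(p);
	}
}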

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: TUN problems (regression?)
  2012-12-21  3:39       ` Eric Dumazet
@ 2012-12-21  4:26         ` Jason Wang
  2012-12-28  0:41           ` Stephen Hemminger
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2012-12-21  4:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Stephen Hemminger, Paul Moore, netdev

On 12/21/2012 11:39 AM, Eric Dumazet wrote:
> On Fri, 2012-12-21 at 11:32 +0800, Jason Wang wrote:
>> On 12/21/2012 07:50 AM, Stephen Hemminger wrote:
>>> On Thu, 20 Dec 2012 15:38:17 -0800
>>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>>> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
>>>>> [CC'ing netdev in case this is a known problem I just missed ...]
>>>>>
>>>>> Hi Jason,
>>>>>
>>>>> I started doing some more testing with the multiqueue TUN changes and I ran 
>>>>> into a problem when running tunctl: running it once w/o arguments works as 
>>>>> expected, but running it a second time results in failure and a 
>>>>> kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
>>>>> on my test VM and happens independent of the LSM/SELinux fixup patches.
>>>>>
>>>>> Have you seen this before?
>>>>>
>>>> Obviously code in tun_flow_init() is wrong...
>>>>
>>>> static int tun_flow_init(struct tun_struct *tun)
>>>> {
>>>>         int i;
>>>>
>>>>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
>>>>                                             sizeof(struct tun_flow_entry), 0, 0,
>>>>                                             NULL);
>>>>         if (!tun->flow_cache)
>>>>                 return -ENOMEM;
>>>> ...
>>>> }
>>>>
>>>>
>>>> I have no idea why we would need a kmem_cache per tun_struct,
>>>> and why we even need a kmem_cache.
>>> Normally flow malloc/free should be good enough.
>>> It might make sense to use private kmem_cache if doing hlist_nulls.
>>>
>>>
>>> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
>> Should be at least a global cache, I thought I can get some speed-up by
>> using kmem_cache.
>>
>> Acked-by: Jason Wang <jasowang@redhat.com>
> Was it with SLUB or SLAB ?
>
> Using generic kmalloc-64 is better than a dedicated kmem_cache of 48
> bytes per object, as we guarantee each object is on a single cache line.
>
>

Right, thanks for the explanation.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: TUN problems (regression?)
  2012-12-20 23:38 ` Eric Dumazet
  2012-12-20 23:50   ` Stephen Hemminger
@ 2012-12-21 16:27   ` Paul Moore
  2012-12-21 17:17     ` [PATCH] tuntap: dont use a private kmem_cache Eric Dumazet
  1 sibling, 1 reply; 13+ messages in thread
From: Paul Moore @ 2012-12-21 16:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jason Wang, netdev

On Thursday, December 20, 2012 03:38:17 PM Eric Dumazet wrote:
> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
> > [CC'ing netdev in case this is a known problem I just missed ...]
> > 
> > Hi Jason,
> > 
> > I started doing some more testing with the multiqueue TUN changes and I
> > ran
> > into a problem when running tunctl: running it once w/o arguments works as
> > expected, but running it a second time results in failure and a
> > kmem_cache_sanity_check() failure.  The problem appears to be very
> > repeatable on my test VM and happens independent of the LSM/SELinux fixup
> > patches.
> > 
> > Have you seen this before?
> 
> Obviously code in tun_flow_init() is wrong...
> 
> static int tun_flow_init(struct tun_struct *tun)
> {
>         int i;
> 
>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
>                                             sizeof(struct tun_flow_entry),
> 0, 0, NULL);
>         if (!tun->flow_cache)
>                 return -ENOMEM;
> ...
> }
> 
> 
> I have no idea why we would need a kmem_cache per tun_struct,
> and why we even need a kmem_cache.
> 
> 
> I would try following patch :
> 
>  drivers/net/tun.c |   24 +++---------------------
>  1 file changed, 3 insertions(+), 21 deletions(-)

Thanks, that solved my problem.  Also, in case you were still curious, I was 
using SLUB.

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] tuntap: dont use a private kmem_cache
  2012-12-21 16:27   ` Paul Moore
@ 2012-12-21 17:17     ` Eric Dumazet
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2012-12-21 17:17 UTC (permalink / raw)
  To: Paul Moore, David Miller; +Cc: Jason Wang, netdev, Stephen Hemminger

From: Eric Dumazet <edumazet@google.com>

Commit 96442e42429 (tuntap: choose the txq based on rxq)
added a per tun_struct kmem_cache.

As soon as several tun_structs are in use, we get an error
because two caches cannot have the same name.

Use the default kmalloc()/kfree_rcu(), as it reduces code
size and has no performance impact here.

Reported-by: Paul Moore <pmoore@redhat.com>
Tested-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c |   24 +++---------------------
 1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 504f7f1..fbd106e 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -180,7 +180,6 @@ struct tun_struct {
 	int debug;
 #endif
 	spinlock_t lock;
-	struct kmem_cache *flow_cache;
 	struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
 	struct timer_list flow_gc_timer;
 	unsigned long ageing_time;
@@ -209,8 +208,8 @@ static struct tun_flow_entry *tun_flow_create(struct tun_struct *tun,
 					      struct hlist_head *head,
 					      u32 rxhash, u16 queue_index)
 {
-	struct tun_flow_entry *e = kmem_cache_alloc(tun->flow_cache,
-						    GFP_ATOMIC);
+	struct tun_flow_entry *e = kmalloc(sizeof(*e), GFP_ATOMIC);
+
 	if (e) {
 		tun_debug(KERN_INFO, tun, "create flow: hash %u index %u\n",
 			  rxhash, queue_index);
@@ -223,19 +222,12 @@ static struct tun_flow_entry *tun_flow_create(struct tun_struct *tun,
 	return e;
 }
 
-static void tun_flow_free(struct rcu_head *head)
-{
-	struct tun_flow_entry *e
-		= container_of(head, struct tun_flow_entry, rcu);
-	kmem_cache_free(e->tun->flow_cache, e);
-}
-
 static void tun_flow_delete(struct tun_struct *tun, struct tun_flow_entry *e)
 {
 	tun_debug(KERN_INFO, tun, "delete flow: hash %u index %u\n",
 		  e->rxhash, e->queue_index);
 	hlist_del_rcu(&e->hash_link);
-	call_rcu(&e->rcu, tun_flow_free);
+	kfree_rcu(e, rcu);
 }
 
 static void tun_flow_flush(struct tun_struct *tun)
@@ -833,12 +825,6 @@ static int tun_flow_init(struct tun_struct *tun)
 {
 	int i;
 
-	tun->flow_cache = kmem_cache_create("tun_flow_cache",
-					    sizeof(struct tun_flow_entry), 0, 0,
-					    NULL);
-	if (!tun->flow_cache)
-		return -ENOMEM;
-
 	for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++)
 		INIT_HLIST_HEAD(&tun->flows[i]);
 
@@ -854,10 +840,6 @@ static void tun_flow_uninit(struct tun_struct *tun)
 {
 	del_timer_sync(&tun->flow_gc_timer);
 	tun_flow_flush(tun);
-
-	/* Wait for completion of call_rcu()'s */
-	rcu_barrier();
-	kmem_cache_destroy(tun->flow_cache);
 }
 
 /* Initialize net device. */

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: TUN problems (regression?)
  2012-12-21  3:32     ` Jason Wang
  2012-12-21  3:39       ` Eric Dumazet
@ 2012-12-21 21:15       ` David Miller
  1 sibling, 0 replies; 13+ messages in thread
From: David Miller @ 2012-12-21 21:15 UTC (permalink / raw)
  To: jasowang; +Cc: shemminger, eric.dumazet, pmoore, netdev

From: Jason Wang <jasowang@redhat.com>
Date: Fri, 21 Dec 2012 11:32:43 +0800

> On 12/21/2012 07:50 AM, Stephen Hemminger wrote:
>> On Thu, 20 Dec 2012 15:38:17 -0800
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
>>>> [CC'ing netdev in case this is a known problem I just missed ...]
>>>>
>>>> Hi Jason,
>>>>
>>>> I started doing some more testing with the multiqueue TUN changes and I ran 
>>>> into a problem when running tunctl: running it once w/o arguments works as 
>>>> expected, but running it a second time results in failure and a 
>>>> kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
>>>> on my test VM and happens independent of the LSM/SELinux fixup patches.
>>>>
>>>> Have you seen this before?
>>>>
>>> Obviously code in tun_flow_init() is wrong...
>>>
>>> static int tun_flow_init(struct tun_struct *tun)
>>> {
>>>         int i;
>>>
>>>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
>>>                                             sizeof(struct tun_flow_entry), 0, 0,
>>>                                             NULL);
>>>         if (!tun->flow_cache)
>>>                 return -ENOMEM;
>>> ...
>>> }
>>>
>>>
>>> I have no idea why we would need a kmem_cache per tun_struct,
>>> and why we even need a kmem_cache.
>> Normally flow malloc/free should be good enough.
>> It might make sense to use private kmem_cache if doing hlist_nulls.
>>
>>
>> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> Should be at least a global cache, I thought I can get some speed-up by
> using kmem_cache.
> 
> Acked-by: Jason Wang <jasowang@redhat.com>

Applied.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: TUN problems (regression?)
  2012-12-21  4:26         ` Jason Wang
@ 2012-12-28  0:41           ` Stephen Hemminger
  2012-12-28  5:43             ` Jason Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2012-12-28  0:41 UTC (permalink / raw)
  To: Jason Wang; +Cc: Eric Dumazet, Paul Moore, netdev

On Fri, 21 Dec 2012 12:26:56 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 12/21/2012 11:39 AM, Eric Dumazet wrote:
> > On Fri, 2012-12-21 at 11:32 +0800, Jason Wang wrote:
> >> On 12/21/2012 07:50 AM, Stephen Hemminger wrote:
> >>> On Thu, 20 Dec 2012 15:38:17 -0800
> >>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>>
> >>>> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
> >>>>> [CC'ing netdev in case this is a known problem I just missed ...]
> >>>>>
> >>>>> Hi Jason,
> >>>>>
> >>>>> I started doing some more testing with the multiqueue TUN changes and I ran 
> >>>>> into a problem when running tunctl: running it once w/o arguments works as 
> >>>>> expected, but running it a second time results in failure and a 
> >>>>> kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
> >>>>> on my test VM and happens independent of the LSM/SELinux fixup patches.
> >>>>>
> >>>>> Have you seen this before?
> >>>>>
> >>>> Obviously code in tun_flow_init() is wrong...
> >>>>
> >>>> static int tun_flow_init(struct tun_struct *tun)
> >>>> {
> >>>>         int i;
> >>>>
> >>>>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
> >>>>                                             sizeof(struct tun_flow_entry), 0, 0,
> >>>>                                             NULL);
> >>>>         if (!tun->flow_cache)
> >>>>                 return -ENOMEM;
> >>>> ...
> >>>> }
> >>>>
> >>>>
> >>>> I have no idea why we would need a kmem_cache per tun_struct,
> >>>> and why we even need a kmem_cache.
> >>> Normally flow malloc/free should be good enough.
> >>> It might make sense to use private kmem_cache if doing hlist_nulls.
> >>>
> >>>
> >>> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
> >> Should be at least a global cache, I thought I can get some speed-up by
> >> using kmem_cache.
> >>
> >> Acked-by: Jason Wang <jasowang@redhat.com>
> > Was it with SLUB or SLAB ?
> >
> > Using generic kmalloc-64 is better than a dedicated kmem_cache of 48
> > bytes per object, as we guarantee each object is on a single cache line.
> >
> >
> 
> Right, thanks for the explanation.
> 

I wonder if TUN would be better if it used an array to translate the
receive hash to a receive queue. This is how real hardware works with the
indirection table, and it would allow RFS acceleration. The current flow
cache stuff is prone to DoS attacks and scaling problems with lots of
short-lived flows.
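
A sketch of the array approach, with hypothetical names (this mirrors a NIC
RSS indirection table rather than anything in tun.c today):

#include <linux/types.h>

#define TUN_INDIR_SIZE 128	/* hypothetical, like a small NIC RETA */

struct tun_indir_table {
	u16 queue[TUN_INDIR_SIZE];	/* hash bucket -> rx queue index */
};

/* O(1), stateless queue selection: no per-flow allocation, aging or
 * locking, so there is nothing for an attacker to exhaust. */
static u16 tun_select_queue_indir(const struct tun_indir_table *t, u32 rxhash)
{
	return t->queue[rxhash & (TUN_INDIR_SIZE - 1)];
}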

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: TUN problems (regression?)
  2012-12-28  0:41           ` Stephen Hemminger
@ 2012-12-28  5:43             ` Jason Wang
  2012-12-28  6:25               ` Stephen Hemminger
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2012-12-28  5:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, Paul Moore, netdev

On 12/28/2012 08:41 AM, Stephen Hemminger wrote:
> On Fri, 21 Dec 2012 12:26:56 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 12/21/2012 11:39 AM, Eric Dumazet wrote:
>>> On Fri, 2012-12-21 at 11:32 +0800, Jason Wang wrote:
>>>> On 12/21/2012 07:50 AM, Stephen Hemminger wrote:
>>>>> On Thu, 20 Dec 2012 15:38:17 -0800
>>>>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>>
>>>>>> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
>>>>>>> [CC'ing netdev in case this is a known problem I just missed ...]
>>>>>>>
>>>>>>> Hi Jason,
>>>>>>>
>>>>>>> I started doing some more testing with the multiqueue TUN changes and I ran 
>>>>>>> into a problem when running tunctl: running it once w/o arguments works as 
>>>>>>> expected, but running it a second time results in failure and a 
>>>>>>> kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
>>>>>>> on my test VM and happens independent of the LSM/SELinux fixup patches.
>>>>>>>
>>>>>>> Have you seen this before?
>>>>>>>
>>>>>> Obviously code in tun_flow_init() is wrong...
>>>>>>
>>>>>> static int tun_flow_init(struct tun_struct *tun)
>>>>>> {
>>>>>>         int i;
>>>>>>
>>>>>>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
>>>>>>                                             sizeof(struct tun_flow_entry), 0, 0,
>>>>>>                                             NULL);
>>>>>>         if (!tun->flow_cache)
>>>>>>                 return -ENOMEM;
>>>>>> ...
>>>>>> }
>>>>>>
>>>>>>
>>>>>> I have no idea why we would need a kmem_cache per tun_struct,
>>>>>> and why we even need a kmem_cache.
>>>>> Normally flow malloc/free should be good enough.
>>>>> It might make sense to use private kmem_cache if doing hlist_nulls.
>>>>>
>>>>>
>>>>> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
>>>> Should be at least a global cache, I thought I can get some speed-up by
>>>> using kmem_cache.
>>>>
>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>> Was it with SLUB or SLAB ?
>>>
>>> Using generic kmalloc-64 is better than a dedicated kmem_cache of 48
>>> bytes per object, as we guarantee each object is on a single cache line.
>>>
>>>
>> Right, thanks for the explanation.
>>
> I wonder if TUN would be better if it used a array to translate
> receive hash to receive queue. This is how real hardware works with the
> indirection table, and it would allow RFS acceleration. The current flow
> cache stuff is prone to DoS attack and scaling problems with lots of
> short lived flows.

The problem with an indirection table is hash collisions, which may happen
even when only a few flows exist.

For RFS, we could expose an API/ioctl for userspace to add or remove a
flow cache entry.

For the DoS/scaling issue, I have a few ideas:
- limit the total number of flow entries in tun/tap
- only update the flow entry every N packets (say 20, like ixgbe) or when
a TCP packet has the SYN flag set (see the sketch below)
- I'm not sure skb_get_rxhash() is lightweight enough; maybe switch to a
more lightweight hash?

Any suggestions?

Thanks
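
A minimal sketch of the second idea above, with hypothetical names; it assumes
the caller has already validated and set the IP and TCP headers on the skb.

#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/skbuff.h>

#define FLOW_UPDATE_INTERVAL 20	/* the "say 20 like ixgbe" suggestion */

/* Refresh the flow table on every TCP SYN (a new flow), but otherwise
 * only once every FLOW_UPDATE_INTERVAL packets on this queue, bounding
 * the update rate under floods of short-lived flows. */
static bool flow_should_update(const struct sk_buff *skb, u32 *pkt_count)
{
	if (skb->protocol == htons(ETH_P_IP) &&
	    ip_hdr(skb)->protocol == IPPROTO_TCP &&
	    tcp_hdr(skb)->syn)
		return true;

	return (++(*pkt_count) % FLOW_UPDATE_INTERVAL) == 0;
}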

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: TUN problems (regression?)
  2012-12-28  5:43             ` Jason Wang
@ 2012-12-28  6:25               ` Stephen Hemminger
  2013-01-04  5:04                 ` Jason Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2012-12-28  6:25 UTC (permalink / raw)
  To: Jason Wang; +Cc: Eric Dumazet, Paul Moore, netdev

On Fri, 28 Dec 2012 13:43:54 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 12/28/2012 08:41 AM, Stephen Hemminger wrote:
> > On Fri, 21 Dec 2012 12:26:56 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >
> >> On 12/21/2012 11:39 AM, Eric Dumazet wrote:
> >>> On Fri, 2012-12-21 at 11:32 +0800, Jason Wang wrote:
> >>>> On 12/21/2012 07:50 AM, Stephen Hemminger wrote:
> >>>>> On Thu, 20 Dec 2012 15:38:17 -0800
> >>>>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>>>>
> >>>>>> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
> >>>>>>> [CC'ing netdev in case this is a known problem I just missed ...]
> >>>>>>>
> >>>>>>> Hi Jason,
> >>>>>>>
> >>>>>>> I started doing some more testing with the multiqueue TUN changes and I ran 
> >>>>>>> into a problem when running tunctl: running it once w/o arguments works as 
> >>>>>>> expected, but running it a second time results in failure and a 
> >>>>>>> kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
> >>>>>>> on my test VM and happens independent of the LSM/SELinux fixup patches.
> >>>>>>>
> >>>>>>> Have you seen this before?
> >>>>>>>
> >>>>>> Obviously code in tun_flow_init() is wrong...
> >>>>>>
> >>>>>> static int tun_flow_init(struct tun_struct *tun)
> >>>>>> {
> >>>>>>         int i;
> >>>>>>
> >>>>>>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
> >>>>>>                                             sizeof(struct tun_flow_entry), 0, 0,
> >>>>>>                                             NULL);
> >>>>>>         if (!tun->flow_cache)
> >>>>>>                 return -ENOMEM;
> >>>>>> ...
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> I have no idea why we would need a kmem_cache per tun_struct,
> >>>>>> and why we even need a kmem_cache.
> >>>>> Normally flow malloc/free should be good enough.
> >>>>> It might make sense to use private kmem_cache if doing hlist_nulls.
> >>>>>
> >>>>>
> >>>>> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
> >>>> Should be at least a global cache, I thought I can get some speed-up by
> >>>> using kmem_cache.
> >>>>
> >>>> Acked-by: Jason Wang <jasowang@redhat.com>
> >>> Was it with SLUB or SLAB ?
> >>>
> >>> Using generic kmalloc-64 is better than a dedicated kmem_cache of 48
> >>> bytes per object, as we guarantee each object is on a single cache line.
> >>>
> >>>
> >> Right, thanks for the explanation.
> >>
> > I wonder if TUN would be better if it used a array to translate
> > receive hash to receive queue. This is how real hardware works with the
> > indirection table, and it would allow RFS acceleration. The current flow
> > cache stuff is prone to DoS attack and scaling problems with lots of
> > short lived flows.
> 
> The problem of indirection table is hash collision which may even happen
> when few flows existed.

Hash collisions are fine; as long as the statistical distribution of
hashes across queues is approximately even, it will be faster. A simple
array indirection is much faster than walking a hash table.

> For the RFS, we can open a API/ioctl for userspace to add or remove a
> flow cache.

RFS acceleration relies on programming the table. It is easier if
TUN looks more like hardware.
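
For context, accelerated RFS (CONFIG_RFS_ACCEL) programs the driver through
the ndo_rx_flow_steer hook; a purely hypothetical tun version might record
the flow in the driver's own table instead of a hardware filter.
tun_flow_record() below is an assumed helper, not an existing one.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* The stack asks the driver to steer this flow to rxq_index; a NIC driver
 * would program a hardware filter here, tun could instead update its
 * software flow table. */
static int tun_rx_flow_steer(struct net_device *dev,
			     const struct sk_buff *skb,
			     u16 rxq_index, u32 flow_id)
{
	struct tun_struct *tun = netdev_priv(dev);

	tun_flow_record(tun, skb->rxhash, rxq_index);
	return 0;	/* filter id, used later for expiry checks */
}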

> For the DoS/scaling issue, I have an idea of:
> - limit the total number of flow entries in tun/tap
> - only update the flow entry every N (say 20 like ixgbe) packets or the
> the tcp packet has sync flag
> - I'm not sure skb_get_rxhash() is lightweight enough, or change to more
> lightweight one?

Ideally the hash should be programmable (L2 vs. L3), but that is splitting
hairs at this point.

Flow tables are a scaling problem, especially on highly loaded servers where
they are most needed.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: TUN problems (regression?)
  2012-12-28  6:25               ` Stephen Hemminger
@ 2013-01-04  5:04                 ` Jason Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2013-01-04  5:04 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, Paul Moore, netdev

On 12/28/2012 02:25 PM, Stephen Hemminger wrote:
> On Fri, 28 Dec 2012 13:43:54 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 12/28/2012 08:41 AM, Stephen Hemminger wrote:
>>> On Fri, 21 Dec 2012 12:26:56 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>
>>>> On 12/21/2012 11:39 AM, Eric Dumazet wrote:
>>>>> On Fri, 2012-12-21 at 11:32 +0800, Jason Wang wrote:
>>>>>> On 12/21/2012 07:50 AM, Stephen Hemminger wrote:
>>>>>>> On Thu, 20 Dec 2012 15:38:17 -0800
>>>>>>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>>>>
>>>>>>>> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
>>>>>>>>> [CC'ing netdev in case this is a known problem I just missed ...]
>>>>>>>>>
>>>>>>>>> Hi Jason,
>>>>>>>>>
>>>>>>>>> I started doing some more testing with the multiqueue TUN changes and I ran 
>>>>>>>>> into a problem when running tunctl: running it once w/o arguments works as 
>>>>>>>>> expected, but running it a second time results in failure and a 
>>>>>>>>> kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
>>>>>>>>> on my test VM and happens independent of the LSM/SELinux fixup patches.
>>>>>>>>>
>>>>>>>>> Have you seen this before?
>>>>>>>>>
>>>>>>>> Obviously code in tun_flow_init() is wrong...
>>>>>>>>
>>>>>>>> static int tun_flow_init(struct tun_struct *tun)
>>>>>>>> {
>>>>>>>>         int i;
>>>>>>>>
>>>>>>>>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
>>>>>>>>                                             sizeof(struct tun_flow_entry), 0, 0,
>>>>>>>>                                             NULL);
>>>>>>>>         if (!tun->flow_cache)
>>>>>>>>                 return -ENOMEM;
>>>>>>>> ...
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> I have no idea why we would need a kmem_cache per tun_struct,
>>>>>>>> and why we even need a kmem_cache.
>>>>>>> Normally flow malloc/free should be good enough.
>>>>>>> It might make sense to use private kmem_cache if doing hlist_nulls.
>>>>>>>
>>>>>>>
>>>>>>> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
>>>>>> Should be at least a global cache, I thought I can get some speed-up by
>>>>>> using kmem_cache.
>>>>>>
>>>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>>> Was it with SLUB or SLAB ?
>>>>>
>>>>> Using generic kmalloc-64 is better than a dedicated kmem_cache of 48
>>>>> bytes per object, as we guarantee each object is on a single cache line.
>>>>>
>>>>>
>>>> Right, thanks for the explanation.
>>>>
>>> I wonder if TUN would be better if it used a array to translate
>>> receive hash to receive queue. This is how real hardware works with the
>>> indirection table, and it would allow RFS acceleration. The current flow
>>> cache stuff is prone to DoS attack and scaling problems with lots of
>>> short lived flows.
>> The problem of indirection table is hash collision which may even happen
>> when few flows existed.
> Hash collision is fine, as long as the the statistical average of
> hash across queue's is approximately equal it will be faster. A simple
> array indirection is much faster than walking a hash table.

True, but hash collisions may cause some negative effects, such as losing
flow affinity and packet re-ordering in the guest, which do not occur with
a perfect filter. Maybe we can implement both and let the user choose.
>
>> For the RFS, we can open a API/ioctl for userspace to add or remove a
>> flow cache.
> RFS acceleration relies on programming the table. It is easier if
> TUN looks more like hardware.
>
>> For the DoS/scaling issue, I have an idea of:
>> - limit the total number of flow entries in tun/tap
>> - only update the flow entry every N (say 20 like ixgbe) packets or the
>> the tcp packet has sync flag
>> - I'm not sure skb_get_rxhash() is lightweight enough, or change to more
>> lightweight one?
> Ideally the hash should be programmable L2 vs L3, but that is splitting
> hairs at this point.
>
> Flow tables are scaling problem, especially on highly loaded servers where
> they are most needed.
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-01-04  5:04 UTC | newest]

Thread overview: 13+ messages
2012-12-20 23:16 TUN problems (regression?) Paul Moore
2012-12-20 23:38 ` Eric Dumazet
2012-12-20 23:50   ` Stephen Hemminger
2012-12-21  3:32     ` Jason Wang
2012-12-21  3:39       ` Eric Dumazet
2012-12-21  4:26         ` Jason Wang
2012-12-28  0:41           ` Stephen Hemminger
2012-12-28  5:43             ` Jason Wang
2012-12-28  6:25               ` Stephen Hemminger
2013-01-04  5:04                 ` Jason Wang
2012-12-21 21:15       ` David Miller
2012-12-21 16:27   ` Paul Moore
2012-12-21 17:17     ` [PATCH] tuntap: dont use a private kmem_cache Eric Dumazet
