All of lore.kernel.org
 help / color / mirror / Atom feed
* netfilter queue throughput slowdown
@ 2011-06-29  9:17 Anders Nilsson Plymoth
  2011-06-29  9:47 ` Eric Dumazet
  2011-07-02 12:25 ` Pablo Neira Ayuso
  0 siblings, 2 replies; 28+ messages in thread
From: Anders Nilsson Plymoth @ 2011-06-29  9:17 UTC (permalink / raw)
  To: netfilter-devel

Hi,

I am using libnetfilter-queue on a router running Ubuntu 10.10 with
2.6.35-28-generic. The problem I am having is that I am experiencing a
very significant throughput slowdown whenever my NFQUEUE program is
running. This happens even when I use bare bone libnetfilter-queue
program that immediately issues an ACCEPT verdict as soon as it
receives a packet. Whenever this program is running, my max throughput
is cut in half, and the reason it happens is because nf_queue
overflows (nf_queue: full at 1024 entries, dropping packets(s)), and I
notice my CPU utilization is 100%. However, when my program is not
running and I am not passing packets through NFQUEUE and the router
routes packets as normal, I get full throughput with only 0.1% CPU
utilization.

I find this a bit strange, can the netfilter queue processing take the
cpu from 0.1% to 100% and start dropping packets even with no other
processing than setting immediately setting the verdict? We have two
of these machines, with identical hardware and OS, and they experience
the same behavior.
I am also confused as we have been using these machines previously and
been able to obtain full throughput with our netfilter program.

Does anyone have a clue here, or suggest what I should look into in
order to speed things up.

Thanks,
Anders

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-29  9:17 netfilter queue throughput slowdown Anders Nilsson Plymoth
@ 2011-06-29  9:47 ` Eric Dumazet
  2011-06-29  9:55   ` Anders Nilsson Plymoth
  2011-07-02 12:25 ` Pablo Neira Ayuso
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2011-06-29  9:47 UTC (permalink / raw)
  To: Anders Nilsson Plymoth; +Cc: netfilter-devel

Le mercredi 29 juin 2011 à 11:17 +0200, Anders Nilsson Plymoth a écrit :
> Hi,
> 
> I am using libnetfilter-queue on a router running Ubuntu 10.10 with
> 2.6.35-28-generic. The problem I am having is that I am experiencing a
> very significant throughput slowdown whenever my NFQUEUE program is
> running. This happens even when I use bare bone libnetfilter-queue
> program that immediately issues an ACCEPT verdict as soon as it
> receives a packet. Whenever this program is running, my max throughput
> is cut in half, and the reason it happens is because nf_queue
> overflows (nf_queue: full at 1024 entries, dropping packets(s)), and I
> notice my CPU utilization is 100%. However, when my program is not
> running and I am not passing packets through NFQUEUE and the router
> routes packets as normal, I get full throughput with only 0.1% CPU
> utilization.
> 
> I find this a bit strange, can the netfilter queue processing take the
> cpu from 0.1% to 100% and start dropping packets even with no other
> processing than setting immediately setting the verdict? We have two
> of these machines, with identical hardware and OS, and they experience
> the same behavior.
> I am also confused as we have been using these machines previously and
> been able to obtain full throughput with our netfilter program.
> 
> Does anyone have a clue here, or suggest what I should look into in
> order to speed things up.
> 

Hmm, this is a known problem.

net/netfilter/nfnetlink_queue.c uses a single list of packets per queue.

If your application gives verdict for a packet not at the head of queue,
find_dequeue_entry() spend a lot of time to find the packet.

So are you sure you dont forget to give verdict for some packets, and
queue fills to its limit ?

Some attempts in the past tried to convert this list in a tree but AFAIK
nothing was merged.

By the way, latest Ubuntu has more recent kernel, you could try it as it
includes commit c463ac972315a0 (netfilter: nfnetlink_queue: some
optimizations)



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-29  9:47 ` Eric Dumazet
@ 2011-06-29  9:55   ` Anders Nilsson Plymoth
  2011-06-29 10:08     ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Anders Nilsson Plymoth @ 2011-06-29  9:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netfilter-devel

Hi Eric,

Thanks for your reply.
Yes, I am sure I set the verdict, as right now I do it on all packets
by default.
I will try upgrading and see if it works. Do you know if commit
c463ac972315a0 solves the problem you mentioned?

Thanks,
Anders

On Wed, Jun 29, 2011 at 11:47 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mercredi 29 juin 2011 à 11:17 +0200, Anders Nilsson Plymoth a écrit :
>> Hi,
>>
>> I am using libnetfilter-queue on a router running Ubuntu 10.10 with
>> 2.6.35-28-generic. The problem I am having is that I am experiencing a
>> very significant throughput slowdown whenever my NFQUEUE program is
>> running. This happens even when I use bare bone libnetfilter-queue
>> program that immediately issues an ACCEPT verdict as soon as it
>> receives a packet. Whenever this program is running, my max throughput
>> is cut in half, and the reason it happens is because nf_queue
>> overflows (nf_queue: full at 1024 entries, dropping packets(s)), and I
>> notice my CPU utilization is 100%. However, when my program is not
>> running and I am not passing packets through NFQUEUE and the router
>> routes packets as normal, I get full throughput with only 0.1% CPU
>> utilization.
>>
>> I find this a bit strange, can the netfilter queue processing take the
>> cpu from 0.1% to 100% and start dropping packets even with no other
>> processing than setting immediately setting the verdict? We have two
>> of these machines, with identical hardware and OS, and they experience
>> the same behavior.
>> I am also confused as we have been using these machines previously and
>> been able to obtain full throughput with our netfilter program.
>>
>> Does anyone have a clue here, or suggest what I should look into in
>> order to speed things up.
>>
>
> Hmm, this is a known problem.
>
> net/netfilter/nfnetlink_queue.c uses a single list of packets per queue.
>
> If your application gives verdict for a packet not at the head of queue,
> find_dequeue_entry() spend a lot of time to find the packet.
>
> So are you sure you dont forget to give verdict for some packets, and
> queue fills to its limit ?
>
> Some attempts in the past tried to convert this list in a tree but AFAIK
> nothing was merged.
>
> By the way, latest Ubuntu has more recent kernel, you could try it as it
> includes commit c463ac972315a0 (netfilter: nfnetlink_queue: some
> optimizations)
>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-29  9:55   ` Anders Nilsson Plymoth
@ 2011-06-29 10:08     ` Eric Dumazet
  2011-06-30  6:20       ` Kuzin Andrey
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2011-06-29 10:08 UTC (permalink / raw)
  To: Anders Nilsson Plymoth; +Cc: netfilter-devel

Le mercredi 29 juin 2011 à 11:55 +0200, Anders Nilsson Plymoth a écrit :
> Hi Eric,
> 
> Thanks for your reply.
> Yes, I am sure I set the verdict, as right now I do it on all packets
> by default.
> I will try upgrading and see if it works. Do you know if commit
> c463ac972315a0 solves the problem you mentioned?

No, it doesnt solve the problem, only make things faster.

But if some packets are never dequeued (because something is wrong with
your program, failing to give verdict), they stay forever in the list
and each dequeue is slower since it has to go past these packets...

2.6.37 also contains commit 29b4433d991c88d86ca48a4c1cc33c671475be4b
(net: percpu net_device refcount) which helps a lot netfilter
queue/dequeue if your machine is SMP.



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-29 10:08     ` Eric Dumazet
@ 2011-06-30  6:20       ` Kuzin Andrey
  2011-06-30  6:47         ` Eric Dumazet
  2011-06-30 22:26         ` Sam Roberts
  0 siblings, 2 replies; 28+ messages in thread
From: Kuzin Andrey @ 2011-06-30  6:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Anders Nilsson Plymoth, netfilter-devel

  On 29.06.2011 14:08, Eric Dumazet wrote:
> Le mercredi 29 juin 2011 à 11:55 +0200, Anders Nilsson Plymoth a écrit :
>> Hi Eric,
>>
>> Thanks for your reply.
>> Yes, I am sure I set the verdict, as right now I do it on all packets
>> by default.
>> I will try upgrading and see if it works. Do you know if commit
>> c463ac972315a0 solves the problem you mentioned?
> No, it doesnt solve the problem, only make things faster.
>
> But if some packets are never dequeued (because something is wrong with
> your program, failing to give verdict), they stay forever in the list
> and each dequeue is slower since it has to go past these packets...
>
> 2.6.37 also contains commit 29b4433d991c88d86ca48a4c1cc33c671475be4b
> (net: percpu net_device refcount) which helps a lot netfilter
> queue/dequeue if your machine is SMP.
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
Hello.
I'am using NFQUEUE several years on our main network router, and problem 
with packets stuck in the queue is still not resolved.
But in the past when i use libipq, it not have this problem. I try to 
use simple program with ACCEPT verdict on all packets, and after some
time queue has packets without any verdict, result is large (and 
increased) delays on every packet and waste CPU time.

I'am write patch for this, but it doesn't apply by any netfilter 
developers. Today i'am not using any kernel without this NFQUEUE patch.

diff --git a/net/netfilter/nfnetlink_queue.c 
b/net/netfilter/nfnetlink_queue.c
index 8c86011..74fc322 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -169,17 +169,29 @@ __enqueue_entry(struct nfqnl_instance *queue, 
struct nf_queue_entry *entry)
         queue->queue_total++;
  }

+#define NFQNL_TIMEOUT_ENTRY_DROP 20
+
  static struct nf_queue_entry *
  find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
  {
-       struct nf_queue_entry *entry = NULL, *i;
+       struct nf_queue_entry *entry = NULL, *next, *i;
+       ktime_t kt = ktime_get_real();

         spin_lock_bh(&queue->lock);

-       list_for_each_entry(i, &queue->queue_list, list) {
+       list_for_each_entry_safe(i, next, &queue->queue_list, list) {
                 if (i->id == id) {
                         entry = i;
                         break;
+                } else {
+                       struct timeval tv = 
ktime_to_timeval(ktime_sub(kt, i->skb->tstamp));
+                       if (tv.tv_sec > NFQNL_TIMEOUT_ENTRY_DROP) {
+                               printk(KERN_ERR "nf_queue: drop 
timeouted packet "
+                                       "(queue_num=%d seq_id=%d)\n", 
queue->queue_num, i->id);
+                               list_del(&i->list);
+                               queue->queue_total--;
+                               nf_reinject(i, NF_DROP);
+                       }
                 }
         }

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30  6:20       ` Kuzin Andrey
@ 2011-06-30  6:47         ` Eric Dumazet
  2011-06-30  7:36           ` Kuzin Andrey
  2011-06-30 22:26         ` Sam Roberts
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2011-06-30  6:47 UTC (permalink / raw)
  To: Kuzin Andrey; +Cc: Anders Nilsson Plymoth, netfilter-devel

Le jeudi 30 juin 2011 à 10:20 +0400, Kuzin Andrey a écrit :
> On 29.06.2011 14:08, Eric Dumazet wrote:
> > Le mercredi 29 juin 2011 à 11:55 +0200, Anders Nilsson Plymoth a écrit :
> >> Hi Eric,
> >>
> >> Thanks for your reply.
> >> Yes, I am sure I set the verdict, as right now I do it on all packets
> >> by default.
> >> I will try upgrading and see if it works. Do you know if commit
> >> c463ac972315a0 solves the problem you mentioned?
> > No, it doesnt solve the problem, only make things faster.
> >
> > But if some packets are never dequeued (because something is wrong with
> > your program, failing to give verdict), they stay forever in the list
> > and each dequeue is slower since it has to go past these packets...
> >
> > 2.6.37 also contains commit 29b4433d991c88d86ca48a4c1cc33c671475be4b
> > (net: percpu net_device refcount) which helps a lot netfilter
> > queue/dequeue if your machine is SMP.
> >
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> >
> Hello.
> I'am using NFQUEUE several years on our main network router, and problem 
> with packets stuck in the queue is still not resolved.
> But in the past when i use libipq, it not have this problem. I try to 
> use simple program with ACCEPT verdict on all packets, and after some
> time queue has packets without any verdict, result is large (and 
> increased) delays on every packet and waste CPU time.
> 
> I'am write patch for this, but it doesn't apply by any netfilter 
> developers. Today i'am not using any kernel without this NFQUEUE patch.
> 
> diff --git a/net/netfilter/nfnetlink_queue.c 
> b/net/netfilter/nfnetlink_queue.c
> index 8c86011..74fc322 100644
> --- a/net/netfilter/nfnetlink_queue.c
> +++ b/net/netfilter/nfnetlink_queue.c
> @@ -169,17 +169,29 @@ __enqueue_entry(struct nfqnl_instance *queue, 
> struct nf_queue_entry *entry)
>          queue->queue_total++;
>   }
> 
> +#define NFQNL_TIMEOUT_ENTRY_DROP 20
> +
>   static struct nf_queue_entry *
>   find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
>   {
> -       struct nf_queue_entry *entry = NULL, *i;
> +       struct nf_queue_entry *entry = NULL, *next, *i;
> +       ktime_t kt = ktime_get_real();
> 
>          spin_lock_bh(&queue->lock);
> 
> -       list_for_each_entry(i, &queue->queue_list, list) {
> +       list_for_each_entry_safe(i, next, &queue->queue_list, list) {
>                  if (i->id == id) {
>                          entry = i;
>                          break;
> +                } else {
> +                       struct timeval tv = 
> ktime_to_timeval(ktime_sub(kt, i->skb->tstamp));
> +                       if (tv.tv_sec > NFQNL_TIMEOUT_ENTRY_DROP) {
> +                               printk(KERN_ERR "nf_queue: drop 
> timeouted packet "
> +                                       "(queue_num=%d seq_id=%d)\n", 
> queue->queue_num, i->id);
> +                               list_del(&i->list);
> +                               queue->queue_total--;
> +                               nf_reinject(i, NF_DROP);
> +                       }
>                  }
>          }
> 

You do realize we have to find out what happened to these packets, why
no verdict was given at all ?

Also, where skb->tstamp is set exactly ? This depends on
netstamp_needed ?



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30  6:47         ` Eric Dumazet
@ 2011-06-30  7:36           ` Kuzin Andrey
  2011-06-30 11:34             ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Kuzin Andrey @ 2011-06-30  7:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Anders Nilsson Plymoth, netfilter-devel

  On 30.06.2011 10:47, Eric Dumazet wrote:
> Le jeudi 30 juin 2011 à 10:20 +0400, Kuzin Andrey a écrit :
>> On 29.06.2011 14:08, Eric Dumazet wrote:
>>> Le mercredi 29 juin 2011 à 11:55 +0200, Anders Nilsson Plymoth a écrit :
>>>> Hi Eric,
>>>>
>>>> Thanks for your reply.
>>>> Yes, I am sure I set the verdict, as right now I do it on all packets
>>>> by default.
>>>> I will try upgrading and see if it works. Do you know if commit
>>>> c463ac972315a0 solves the problem you mentioned?
>>> No, it doesnt solve the problem, only make things faster.
>>>
>>> But if some packets are never dequeued (because something is wrong with
>>> your program, failing to give verdict), they stay forever in the list
>>> and each dequeue is slower since it has to go past these packets...
>>>
>>> 2.6.37 also contains commit 29b4433d991c88d86ca48a4c1cc33c671475be4b
>>> (net: percpu net_device refcount) which helps a lot netfilter
>>> queue/dequeue if your machine is SMP.
>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>> Hello.
>> I'am using NFQUEUE several years on our main network router, and problem
>> with packets stuck in the queue is still not resolved.
>> But in the past when i use libipq, it not have this problem. I try to
>> use simple program with ACCEPT verdict on all packets, and after some
>> time queue has packets without any verdict, result is large (and
>> increased) delays on every packet and waste CPU time.
>>
>> I'am write patch for this, but it doesn't apply by any netfilter
>> developers. Today i'am not using any kernel without this NFQUEUE patch.
>>
>> diff --git a/net/netfilter/nfnetlink_queue.c
>> b/net/netfilter/nfnetlink_queue.c
>> index 8c86011..74fc322 100644
>> --- a/net/netfilter/nfnetlink_queue.c
>> +++ b/net/netfilter/nfnetlink_queue.c
>> @@ -169,17 +169,29 @@ __enqueue_entry(struct nfqnl_instance *queue,
>> struct nf_queue_entry *entry)
>>           queue->queue_total++;
>>    }
>>
>> +#define NFQNL_TIMEOUT_ENTRY_DROP 20
>> +
>>    static struct nf_queue_entry *
>>    find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
>>    {
>> -       struct nf_queue_entry *entry = NULL, *i;
>> +       struct nf_queue_entry *entry = NULL, *next, *i;
>> +       ktime_t kt = ktime_get_real();
>>
>>           spin_lock_bh(&queue->lock);
>>
>> -       list_for_each_entry(i,&queue->queue_list, list) {
>> +       list_for_each_entry_safe(i, next,&queue->queue_list, list) {
>>                   if (i->id == id) {
>>                           entry = i;
>>                           break;
>> +                } else {
>> +                       struct timeval tv =
>> ktime_to_timeval(ktime_sub(kt, i->skb->tstamp));
>> +                       if (tv.tv_sec>  NFQNL_TIMEOUT_ENTRY_DROP) {
>> +                               printk(KERN_ERR "nf_queue: drop
>> timeouted packet "
>> +                                       "(queue_num=%d seq_id=%d)\n",
>> queue->queue_num, i->id);
>> +                               list_del(&i->list);
>> +                               queue->queue_total--;
>> +                               nf_reinject(i, NF_DROP);
>> +                       }
>>                   }
>>           }
>>
> You do realize we have to find out what happened to these packets, why
> no verdict was given at all ?
>
> Also, where skb->tstamp is set exactly ? This depends on
> netstamp_needed ?
>
Every day netfilter code become  more and more difficult to understand. 
IPQ mechanism don't have
this problems, migration to NFQUEUE by simply modifications of function 
names in original program
code led to these problems. I think it very hard to find this error in 
netfilter code and may be sometimes
NFQUEUE will be rewritten from scratch to NGQUEUE (next generation queue) ;)
I wrote this patch as simply as possible (one-hour solution) to solve 
the problem for the our paid services.
Early no one netfilter developer did not pay any attention to messages 
about this problem.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30  7:36           ` Kuzin Andrey
@ 2011-06-30 11:34             ` Eric Dumazet
  2011-06-30 11:59               ` Patrick McHardy
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2011-06-30 11:34 UTC (permalink / raw)
  To: Kuzin Andrey; +Cc: Anders Nilsson Plymoth, netfilter-devel

Le jeudi 30 juin 2011 à 11:36 +0400, Kuzin Andrey a écrit :

> Every day netfilter code become  more and more difficult to understand. 
> IPQ mechanism don't have
> this problems, migration to NFQUEUE by simply modifications of function 
> names in original program
> code led to these problems. I think it very hard to find this error in 
> netfilter code and may be sometimes
> NFQUEUE will be rewritten from scratch to NGQUEUE (next generation queue) ;)
> I wrote this patch as simply as possible (one-hour solution) to solve 
> the problem for the our paid services.
> Early no one netfilter developer did not pay any attention to messages 
> about this problem.
> 

Wow wow wow...

Maybe these netfilter guys you blame had some paid job to do at the time
you sent your bug report ? Or only you have real paid services ?

For the record, I also used NFQUEUE and never hit a single problem.

Maybe I was just lucky, I dont know.

Instead of trying to hide the bug, please be constructive and find a way
to pinpoint the bug, so that we can fix it for good.

A timeout of 20 seconds, on a machine receiving 100.000 packets per
second would not help anyway...



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30 11:34             ` Eric Dumazet
@ 2011-06-30 11:59               ` Patrick McHardy
  2011-06-30 15:15                 ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Patrick McHardy @ 2011-06-30 11:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Kuzin Andrey, Anders Nilsson Plymoth, netfilter-devel

Am 30.06.2011 13:34, schrieb Eric Dumazet:
> Le jeudi 30 juin 2011 à 11:36 +0400, Kuzin Andrey a écrit :
> 
>> Every day netfilter code become  more and more difficult to understand. 
>> IPQ mechanism don't have
>> this problems, migration to NFQUEUE by simply modifications of function 
>> names in original program
>> code led to these problems. I think it very hard to find this error in 
>> netfilter code and may be sometimes
>> NFQUEUE will be rewritten from scratch to NGQUEUE (next generation queue) ;)
>> I wrote this patch as simply as possible (one-hour solution) to solve 
>> the problem for the our paid services.
>> Early no one netfilter developer did not pay any attention to messages 
>> about this problem.
>>
> 
> Wow wow wow...
> 
> Maybe these netfilter guys you blame had some paid job to do at the time
> you sent your bug report ? Or only you have real paid services ?
> 
> For the record, I also used NFQUEUE and never hit a single problem.
> 
> Maybe I was just lucky, I dont know.
> 
> Instead of trying to hide the bug, please be constructive and find a way
> to pinpoint the bug, so that we can fix it for good.

Thanks Eric, I agree. Give us data and we'll fix it if really is a bug.

The fact that the timeout patch apparently helps indicates that some
packets don't receive verdicts.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30 15:15                 ` Eric Dumazet
@ 2011-06-30 14:32                   ` Stephen Clark
  2011-06-30 14:51                     ` Patrick McHardy
  2011-06-30 22:24                   ` Sam Roberts
  1 sibling, 1 reply; 28+ messages in thread
From: Stephen Clark @ 2011-06-30 14:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Patrick McHardy, Kuzin Andrey, Anders Nilsson Plymoth, netfilter-devel

On 06/30/2011 11:15 AM, Eric Dumazet wrote:
> Le jeudi 30 juin 2011 à 13:59 +0200, Patrick McHardy a écrit :
>
>    
>> Thanks Eric, I agree. Give us data and we'll fix it if really is a bug.
>>
>> The fact that the timeout patch apparently helps indicates that some
>> packets don't receive verdicts.
>>      
> My rough guess is that this user application gets an error in its
> nfq_set_verdict() call ( maybe a transient out of memory indication) and
> packet never gets its verdict.
>
> libnetfilter_queue/utils/nfqnl_test.c is buggy in this regard : It
> should at least log an error if nfq_set_verdict() fails, so that
> programmer using nfqnl_test.c as a template is aware of a possible
> problem here.
>
>
>   utils/nfqnl_test.c |    7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/utils/nfqnl_test.c b/utils/nfqnl_test.c
> index a554f2d..b7e0cf9 100644
> --- a/utils/nfqnl_test.c
> +++ b/utils/nfqnl_test.c
> @@ -69,8 +69,13 @@ static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
>   	      struct nfq_data *nfa, void *data)
>   {
>   	u_int32_t id = print_pkt(nfa);
> +	int res;
> +
>   	printf("entering callback\n");
> -	return nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
> +	res = nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
> +	if (res == -1)
> +		printf("nfq_set_verdict() error %d (packet stuck in queue !)\n", errno);
> +	return res;
>   }
>
>   int main(int argc, char **argv)
>
>    

So if you receive a -1 the proper recovery is to call nfq_set_verdict() 
again?

-- 

"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety."  (Ben Franklin)

"The course of history shows that as a government grows, liberty
decreases."  (Thomas Jefferson)



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30 14:32                   ` Stephen Clark
@ 2011-06-30 14:51                     ` Patrick McHardy
  2011-06-30 17:07                       ` Eric Leblond
  0 siblings, 1 reply; 28+ messages in thread
From: Patrick McHardy @ 2011-06-30 14:51 UTC (permalink / raw)
  To: sclark46
  Cc: Eric Dumazet, Kuzin Andrey, Anders Nilsson Plymoth, netfilter-devel

On 30.06.2011 16:32, Stephen Clark wrote:
> On 06/30/2011 11:15 AM, Eric Dumazet wrote:
>> Le jeudi 30 juin 2011 à 13:59 +0200, Patrick McHardy a écrit :
>>
>>   
>>> Thanks Eric, I agree. Give us data and we'll fix it if really is a bug.
>>>
>>> The fact that the timeout patch apparently helps indicates that some
>>> packets don't receive verdicts.
>>>      
>> My rough guess is that this user application gets an error in its
>> nfq_set_verdict() call ( maybe a transient out of memory indication) and
>> packet never gets its verdict.
>>
>> libnetfilter_queue/utils/nfqnl_test.c is buggy in this regard : It
>> should at least log an error if nfq_set_verdict() fails, so that
>> programmer using nfqnl_test.c as a template is aware of a possible
>> problem here.
>>
>>
>>   utils/nfqnl_test.c |    7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/utils/nfqnl_test.c b/utils/nfqnl_test.c
>> index a554f2d..b7e0cf9 100644
>> --- a/utils/nfqnl_test.c
>> +++ b/utils/nfqnl_test.c
>> @@ -69,8 +69,13 @@ static int cb(struct nfq_q_handle *qh, struct
>> nfgenmsg *nfmsg,
>>             struct nfq_data *nfa, void *data)
>>   {
>>       u_int32_t id = print_pkt(nfa);
>> +    int res;
>> +
>>       printf("entering callback\n");
>> -    return nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
>> +    res = nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
>> +    if (res == -1)
>> +        printf("nfq_set_verdict() error %d (packet stuck in queue
>> !)\n", errno);
>> +    return res;
>>   }
>>
>>   int main(int argc, char **argv)
>>
>>    
> 
> So if you receive a -1 the proper recovery is to call nfq_set_verdict()
> again?

Look at errno to see what's happening. But yes, this indicates the
verdict wasn't issues successfully, so you need to retransmit.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30 11:59               ` Patrick McHardy
@ 2011-06-30 15:15                 ` Eric Dumazet
  2011-06-30 14:32                   ` Stephen Clark
  2011-06-30 22:24                   ` Sam Roberts
  0 siblings, 2 replies; 28+ messages in thread
From: Eric Dumazet @ 2011-06-30 15:15 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Kuzin Andrey, Anders Nilsson Plymoth, netfilter-devel

Le jeudi 30 juin 2011 à 13:59 +0200, Patrick McHardy a écrit :

> Thanks Eric, I agree. Give us data and we'll fix it if really is a bug.
> 
> The fact that the timeout patch apparently helps indicates that some
> packets don't receive verdicts.

My rough guess is that this user application gets an error in its
nfq_set_verdict() call ( maybe a transient out of memory indication) and
packet never gets its verdict.

libnetfilter_queue/utils/nfqnl_test.c is buggy in this regard : It
should at least log an error if nfq_set_verdict() fails, so that
programmer using nfqnl_test.c as a template is aware of a possible
problem here.


 utils/nfqnl_test.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/utils/nfqnl_test.c b/utils/nfqnl_test.c
index a554f2d..b7e0cf9 100644
--- a/utils/nfqnl_test.c
+++ b/utils/nfqnl_test.c
@@ -69,8 +69,13 @@ static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
 	      struct nfq_data *nfa, void *data)
 {
 	u_int32_t id = print_pkt(nfa);
+	int res;
+
 	printf("entering callback\n");
-	return nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
+	res = nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
+	if (res == -1)
+		printf("nfq_set_verdict() error %d (packet stuck in queue !)\n", errno);
+	return res;
 }
 
 int main(int argc, char **argv)


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30 14:51                     ` Patrick McHardy
@ 2011-06-30 17:07                       ` Eric Leblond
  2011-06-30 17:45                         ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Leblond @ 2011-06-30 17:07 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: sclark46, Eric Dumazet, Kuzin Andrey, Anders Nilsson Plymoth,
	netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1290 bytes --]

Hello,

On Thu, 2011-06-30 at 16:51 +0200, Patrick McHardy wrote:
> On 30.06.2011 16:32, Stephen Clark wrote:
> > On 06/30/2011 11:15 AM, Eric Dumazet wrote:
> >> Le jeudi 30 juin 2011 à 13:59 +0200, Patrick McHardy a écrit :
> >>
> >>   
> >>> Thanks Eric, I agree. Give us data and we'll fix it if really is a bug.
...
> > 
> > So if you receive a -1 the proper recovery is to call nfq_set_verdict()
> > again?
> 
> Look at errno to see what's happening. But yes, this indicates the
> verdict wasn't issues successfully, so you need to retransmit.

As the verdict failure is bound to occur in a high load time,
retransmission of the verdict (which is necessary) will not help the
system to recover. Userspace has to deal with it but it has another
consequences which is that userspace software may suffer of case where
successive failures occurs.

In this scope, Florian's patch "netfilter: nfqueue: batch verdict
support" could be really useful. It could be used by userspace to
trigger an decide on all stucked packets. Issuing a massive ACCEPT could
lead to dynosaurus packet coming from ancient time but it could be ok if
batch occurs enough often.

Is there a plan to accept it in mainstream ?

BR,
-- 
Eric Leblond 
Blog: http://home.regit.org/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30 17:07                       ` Eric Leblond
@ 2011-06-30 17:45                         ` Eric Dumazet
  2011-06-30 18:08                           ` Eric Leblond
                                             ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Eric Dumazet @ 2011-06-30 17:45 UTC (permalink / raw)
  To: Eric Leblond
  Cc: Patrick McHardy, sclark46, Kuzin Andrey, Anders Nilsson Plymoth,
	netfilter-devel

Le jeudi 30 juin 2011 à 19:07 +0200, Eric Leblond a écrit :

> As the verdict failure is bound to occur in a high load time,
> retransmission of the verdict (which is necessary) will not help the
> system to recover. Userspace has to deal with it but it has another
> consequences which is that userspace software may suffer of case where
> successive failures occurs.
> 
> In this scope, Florian's patch "netfilter: nfqueue: batch verdict
> support" could be really useful. It could be used by userspace to
> trigger an decide on all stucked packets. Issuing a massive ACCEPT could
> lead to dynosaurus packet coming from ancient time but it could be ok if
> batch occurs enough often.
> 
> Is there a plan to accept it in mainstream ?

Given that apparently some apps are not aware some of their verdicts are
lost, I consider the BATCH idea would be a bad idea, unless DROP is
used.

If you have any doubt, only sane thing is to drop packets, not accept
them.

Maybe a single queue flag is needed : DROP_OLD_PACKETS, if user
application is handling packets in order.

Every time a verdict is given by application, automatically DROP all
previous un-verdicted packets.



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30 17:45                         ` Eric Dumazet
@ 2011-06-30 18:08                           ` Eric Leblond
  2011-07-01  6:39                           ` Amos Jeffries
                                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: Eric Leblond @ 2011-06-30 18:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Patrick McHardy, sclark46, Kuzin Andrey, Anders Nilsson Plymoth,
	netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1387 bytes --]

Hello,

On Thu, 2011-06-30 at 19:45 +0200, Eric Dumazet wrote:
> Le jeudi 30 juin 2011 à 19:07 +0200, Eric Leblond a écrit :
> 
> > As the verdict failure is bound to occur in a high load time,
> > retransmission of the verdict (which is necessary) will not help the
> > system to recover. Userspace has to deal with it but it has another
> > consequences which is that userspace software may suffer of case where
> > successive failures occurs.
> > 
> > In this scope, Florian's patch "netfilter: nfqueue: batch verdict
> > support" could be really useful. It could be used by userspace to
> > trigger an decide on all stucked packets. Issuing a massive ACCEPT could
> > lead to dynosaurus packet coming from ancient time but it could be ok if
> > batch occurs enough often.
> > 
> > Is there a plan to accept it in mainstream ?
> 
> Given that apparently some apps are not aware some of their verdicts are
> lost, I consider the BATCH idea would be a bad idea, unless DROP is
> used.
> 
> If you have any doubt, only sane thing is to drop packets, not accept
> them.

All depends of the application. For a security application this is a
sane behaviour (and maybe the only one acceptable) but we've seen
applications such as NFQUEUE based QoS implementation where ACCEPT may
be a decent decision.

BR,
-- 
Eric Leblond 
Blog: http://home.regit.org/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30 15:15                 ` Eric Dumazet
  2011-06-30 14:32                   ` Stephen Clark
@ 2011-06-30 22:24                   ` Sam Roberts
  2011-07-01  4:53                     ` Eric Dumazet
  1 sibling, 1 reply; 28+ messages in thread
From: Sam Roberts @ 2011-06-30 22:24 UTC (permalink / raw)
  To: Eric Dumazet, netfilter-devel

On Thu, Jun 30, 2011 at 8:15 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 30 juin 2011 à 13:59 +0200, Patrick McHardy a écrit :
>
>> Thanks Eric, I agree. Give us data and we'll fix it if really is a bug.
>>
>> The fact that the timeout patch apparently helps indicates that some
>> packets don't receive verdicts.
>
> My rough guess is that this user application gets an error in its
> nfq_set_verdict() call ( maybe a transient out of memory indication) and
> packet never gets its verdict.

Hows does ENOBUFS interact with the queue? Is it possible that the
kernel->userspace notification of the packet id gets dropped, and
userspace is thus never notified of the packet id, and can't issue a
verdict?

Sam
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30  6:20       ` Kuzin Andrey
  2011-06-30  6:47         ` Eric Dumazet
@ 2011-06-30 22:26         ` Sam Roberts
  2011-07-01  4:52           ` Eric Dumazet
  1 sibling, 1 reply; 28+ messages in thread
From: Sam Roberts @ 2011-06-30 22:26 UTC (permalink / raw)
  To: Kuzin Andrey; +Cc: Eric Dumazet, Anders Nilsson Plymoth, netfilter-devel

On Wed, Jun 29, 2011 at 11:20 PM, Kuzin Andrey <kuzinandrey@yandex.ru> wrote:
>  On 29.06.2011 14:08, Eric Dumazet wrote:
>> But if some packets are never dequeued (because something is wrong with
>> your program, failing to give verdict), they stay forever in the list
>> and each dequeue is slower since it has to go past these packets...

> I'am using NFQUEUE several years on our main network router, and problem
> with packets stuck in the queue is still not resolved.

Is there any way to see what the current queue is in the kernel? How
deep, and what  the IDs are?

Cheers,
Sam
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30 22:26         ` Sam Roberts
@ 2011-07-01  4:52           ` Eric Dumazet
  0 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2011-07-01  4:52 UTC (permalink / raw)
  To: Sam Roberts; +Cc: Kuzin Andrey, Anders Nilsson Plymoth, netfilter-devel

Le jeudi 30 juin 2011 à 15:26 -0700, Sam Roberts a écrit :
> On Wed, Jun 29, 2011 at 11:20 PM, Kuzin Andrey <kuzinandrey@yandex.ru> wrote:
> >  On 29.06.2011 14:08, Eric Dumazet wrote:
> >> But if some packets are never dequeued (because something is wrong with
> >> your program, failing to give verdict), they stay forever in the list
> >> and each dequeue is slower since it has to go past these packets...
> 
> > I'am using NFQUEUE several years on our main network router, and problem
> > with packets stuck in the queue is still not resolved.
> 
> Is there any way to see what the current queue is in the kernel? How
> deep, and what  the IDs are?
> 

check third column (queue_total) of :

# cat /proc/net/netfilter/nfnetlink_queue 
    0  15363  1024 2 65535   160     0     2208  1

queue_number peer_pid queue_total copy_mode copy_range queue_dropped queue_user_dropped id_sequence 1

Here my test program 'forgot' to give a verdict to 1024 packets.
So all new packets are dropped (160 so far)



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30 22:24                   ` Sam Roberts
@ 2011-07-01  4:53                     ` Eric Dumazet
  0 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2011-07-01  4:53 UTC (permalink / raw)
  To: Sam Roberts; +Cc: netfilter-devel

Le jeudi 30 juin 2011 à 15:24 -0700, Sam Roberts a écrit :
> On Thu, Jun 30, 2011 at 8:15 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le jeudi 30 juin 2011 à 13:59 +0200, Patrick McHardy a écrit :
> >
> >> Thanks Eric, I agree. Give us data and we'll fix it if really is a bug.
> >>
> >> The fact that the timeout patch apparently helps indicates that some
> >> packets don't receive verdicts.
> >
> > My rough guess is that this user application gets an error in its
> > nfq_set_verdict() call ( maybe a transient out of memory indication) and
> > packet never gets its verdict.
> 
> Hows does ENOBUFS interact with the queue? Is it possible that the
> kernel->userspace notification of the packet id gets dropped, and
> userspace is thus never notified of the packet id, and can't issue a
> verdict?

Its not possible.

If a paquet is in QUEUE, it means a friend packet was also queued on the
netlink socket.



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30 17:45                         ` Eric Dumazet
  2011-06-30 18:08                           ` Eric Leblond
@ 2011-07-01  6:39                           ` Amos Jeffries
  2011-07-01  7:00                           ` [RFC] nfnetlink_queue not scalable Eric Dumazet
  2011-07-01 15:08                           ` netfilter queue throughput slowdown Anders Nilsson Plymoth
  3 siblings, 0 replies; 28+ messages in thread
From: Amos Jeffries @ 2011-07-01  6:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Leblond, Patrick McHardy, sclark46, Kuzin Andrey,
	Anders Nilsson Plymoth, netfilter-devel

On 01/07/11 05:45, Eric Dumazet wrote:
> Le jeudi 30 juin 2011 à 19:07 +0200, Eric Leblond a écrit :
>
>> As the verdict failure is bound to occur in a high load time,
>> retransmission of the verdict (which is necessary) will not help the
>> system to recover. Userspace has to deal with it but it has another
>> consequences which is that userspace software may suffer of case where
>> successive failures occurs.
>>
>> In this scope, Florian's patch "netfilter: nfqueue: batch verdict
>> support" could be really useful. It could be used by userspace to
>> trigger an decide on all stucked packets. Issuing a massive ACCEPT could
>> lead to dynosaurus packet coming from ancient time but it could be ok if
>> batch occurs enough often.
>>
>> Is there a plan to accept it in mainstream ?
>
> Given that apparently some apps are not aware some of their verdicts are
> lost, I consider the BATCH idea would be a bad idea, unless DROP is
> used.
>
> If you have any doubt, only sane thing is to drop packets, not accept
> them.
>
> Maybe a single queue flag is needed : DROP_OLD_PACKETS, if user
> application is handling packets in order.
>
> Every time a verdict is given by application, automatically DROP all
> previous un-verdicted packets.
>

Would need to be paired with the option to do so for other verdicts.

What about an option of "verdict X on this packet Y on anything older" ?

AYJ
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC] nfnetlink_queue not scalable
  2011-06-30 17:45                         ` Eric Dumazet
  2011-06-30 18:08                           ` Eric Leblond
  2011-07-01  6:39                           ` Amos Jeffries
@ 2011-07-01  7:00                           ` Eric Dumazet
  2011-07-01  7:49                             ` Florian Westphal
  2011-07-01 15:08                           ` netfilter queue throughput slowdown Anders Nilsson Plymoth
  3 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2011-07-01  7:00 UTC (permalink / raw)
  To: Eric Leblond, Patrick McHardy, Florian Westphal
  Cc: sclark46, Kuzin Andrey, Anders Nilsson Plymoth, netfilter-devel

So I took a look at using several queues to spread load on several cpus,
and quickly found out some hot spots in our NFQUEUE processing, even
using cpu match, and correct cpu affinities.

iptables -N DISPATCH 2>/dev/null
iptables -F DISPATCH

for i in  `seq 0 15`
do
 iptables -A DISPATCH -m cpu --cpu $i -j NFQUEUE --queue-num $i
 nfqnl_test -q $i &
done
iptables -A FORWARD -j DISPATCH

<stress load, 500.000 pps >

# cat /proc/net/netfilter/nfnetlink_queue 
    0   4855     0 2 65535     0 37638   886112  1
    1   4858     0 2 65535     0 24530   886096  1
    2   4862     0 2 65535     0 40666   886103  1
    3   4866     0 2 65535     0  6480   830716  1
    4   4870     0 2 65535     0 35957   886099  1
    5   4872     0 2 65535     0 20001   886100  1
    6   4875     0 2 65535     0 41656   886099  1
    7   4877     0 2 65535     0 27196   886099  1
    8   4879     0 2 65535     0 39173   886112  1
    9   4881     0 2 65535     0 19840   886103  1
   10   4883     0 2 65535     0 29339   850670  1
   11   4885     0 2 65535     0 17933   886102  1
   12   4887     0 2 65535     0 37680   886099  1
   13   4889     0 2 65535     0  7327   830721  1
   14   4891     0 2 65535     0 35577   886100  1
   15   4893     0 2 65535     0 23854   886100  1


Number one offender is the nfnl_lock mutex hold each time we give a
verdict.

Number two is the nl_table_lock rwlock

# Events: 28K cycles
#
# Overhead       Command                Shared Object                               Symbol
# ........  ............  ...........................  ...................................
#
    10.60%    nfqnl_test  [kernel.kallsyms]            [k] mutex_spin_on_owner
              |
              --- mutex_spin_on_owner
                 |          
                 |--99.68%-- __mutex_lock_slowpath
                 |          mutex_lock
                 |          nfnl_lock
                 |          nfnetlink_rcv
                 |          netlink_unicast
                 |          netlink_sendmsg
                 |          sock_sendmsg
                 |          __sys_sendmsg
                 |          sys_sendmsg
                 |          compat_sys_sendmsg
                 |          compat_sys_socketcall
                 |          sysenter_dispatch
                 |          0xffffe430
                 |          nfnl_sendiov
                 |          __set_verdict
                 |          cb
                 |          __nfnl_handle_msg
                 |          nfnl_handle_packet
                 |          nfq_handle_packet
                 |          main
                 |          __libc_start_main
                 |          _start
                  --0.32%-- [...]

     4.12%    nfqnl_test  [kernel.kallsyms]            [k] __mutex_lock_slowpath
              |
              --- __mutex_lock_slowpath
                 |          
                 |--99.68%-- mutex_lock
                 |          nfnl_lock



Time to add RCU ;)




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] nfnetlink_queue not scalable
  2011-07-01  7:00                           ` [RFC] nfnetlink_queue not scalable Eric Dumazet
@ 2011-07-01  7:49                             ` Florian Westphal
  2011-07-01 15:27                               ` [PATCH 1/2] nfnetlink: add RCU in nfnetlink_rcv_msg() Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Florian Westphal @ 2011-07-01  7:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Leblond, Patrick McHardy, sclark46, Kuzin Andrey,
	Anders Nilsson Plymoth, netfilter-devel

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Number one offender is the nfnl_lock mutex hold each time we give a
> verdict.

Yes, the nfnl mutex is fairly annoying for nfqueue.

Unfortunately it is not possible to just remove it
completely since it also protects against module removal.

But I guess even having to grab a refcount would be
a huge win as opposed to holding on to the nfnl mutex...

We'd also need to audit all ->call implementations; most
of them assume the nfnl_mutex is being hold.

> Time to add RCU ;)

Thanks for volunteering ;-)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] nfnetlink: add RCU in nfnetlink_rcv_msg()
  2011-07-01 15:27                               ` [PATCH 1/2] nfnetlink: add RCU in nfnetlink_rcv_msg() Eric Dumazet
@ 2011-07-01 14:11                                 ` Florian Westphal
  2011-07-05 13:22                                 ` Patrick McHardy
  2011-07-18 14:06                                 ` Patrick McHardy
  2 siblings, 0 replies; 28+ messages in thread
From: Florian Westphal @ 2011-07-01 14:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Florian Westphal, Patrick McHardy, Eric Leblond, sclark46,
	Kuzin Andrey, Anders Nilsson Plymoth, netfilter-devel, netdev

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le vendredi 01 juillet 2011 à 09:49 +0200, Florian Westphal a écrit :
> > But I guess even having to grab a refcount would be
> > a huge win as opposed to holding on to the nfnl mutex...
> > 
> > We'd also need to audit all ->call implementations; most
> > of them assume the nfnl_mutex is being hold.
> 
> We can do another way : Introduce a new ->call_rcu() implementation
> and convert places where we prefer not holding nfnf_mutex.
> If/when all places are converted, remove the ->call() field for good.

Sounds reasonable to me.

Both patches look great, thanks a lot for doing this Eric!
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-30 17:45                         ` Eric Dumazet
                                             ` (2 preceding siblings ...)
  2011-07-01  7:00                           ` [RFC] nfnetlink_queue not scalable Eric Dumazet
@ 2011-07-01 15:08                           ` Anders Nilsson Plymoth
  3 siblings, 0 replies; 28+ messages in thread
From: Anders Nilsson Plymoth @ 2011-07-01 15:08 UTC (permalink / raw)
  To: netfilter-devel

Hi,

I upgraded the kernel and distro but some only a little improvent. I
saw no failures to set the verdict, but many many packets drops in the
netfilter queue proc entry.

I saw the two submitted patches today, and will try those.

I think having an option on what to do with stuck packets makes a lot
of sense. I can think of several ways to handle this, and each have
different consequences depending on the type of packets in the queue.
Having them stuck in the buffer is really bad though.
If you do a massive ACCEPT, packets will probably arrive out of order,
which may cause problems for some protocols and apps. If you do a
massive DROP, TCP will deflate its window and TCP will cut down its
rate significantly. Here it might be better to some RED type dropping,
maybe successive single DROPs untill queue is empty of stuck packets.

Thanks,
Anders

On Thu, Jun 30, 2011 at 7:45 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Le jeudi 30 juin 2011 à 19:07 +0200, Eric Leblond a écrit :
>
> > As the verdict failure is bound to occur in a high load time,
> > retransmission of the verdict (which is necessary) will not help the
> > system to recover. Userspace has to deal with it but it has another
> > consequences which is that userspace software may suffer of case where
> > successive failures occurs.
> >
> > In this scope, Florian's patch "netfilter: nfqueue: batch verdict
> > support" could be really useful. It could be used by userspace to
> > trigger an decide on all stucked packets. Issuing a massive ACCEPT could
> > lead to dynosaurus packet coming from ancient time but it could be ok if
> > batch occurs enough often.
> >
> > Is there a plan to accept it in mainstream ?
>
> Given that apparently some apps are not aware some of their verdicts are
> lost, I consider the BATCH idea would be a bad idea, unless DROP is
> used.
>
> If you have any doubt, only sane thing is to drop packets, not accept
> them.
>
> Maybe a single queue flag is needed : DROP_OLD_PACKETS, if user
> application is handling packets in order.
>
> Every time a verdict is given by application, automatically DROP all
> previous un-verdicted packets.
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/2] nfnetlink: add RCU in nfnetlink_rcv_msg()
  2011-07-01  7:49                             ` Florian Westphal
@ 2011-07-01 15:27                               ` Eric Dumazet
  2011-07-01 14:11                                 ` Florian Westphal
                                                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Eric Dumazet @ 2011-07-01 15:27 UTC (permalink / raw)
  To: Florian Westphal, Patrick McHardy
  Cc: Eric Leblond, sclark46, Kuzin Andrey, Anders Nilsson Plymoth,
	netfilter-devel, netdev

Le vendredi 01 juillet 2011 à 09:49 +0200, Florian Westphal a écrit :
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Number one offender is the nfnl_lock mutex hold each time we give a
> > verdict.
> 
> Yes, the nfnl mutex is fairly annoying for nfqueue.
> 
> Unfortunately it is not possible to just remove it
> completely since it also protects against module removal.
> 

I believe it can, just add appropriate synchronization points.

> But I guess even having to grab a refcount would be
> a huge win as opposed to holding on to the nfnl mutex...
> 
> We'd also need to audit all ->call implementations; most
> of them assume the nfnl_mutex is being hold.

CC netdev

We can do another way : Introduce a new ->call_rcu() implementation
and convert places where we prefer not holding nfnf_mutex.

If/when all places are converted, remove the ->call() field for good.

With following two patches, I was able to reach more than 2.000.000 pps
without losses on my setup (limited by my lab setup), instead of less
than 500.000 pps

Thanks

[PATCH 1/2] nfnetlink: add RCU in nfnetlink_rcv_msg()

Goal of this patch is to permit nfnetlink providers not mandate
nfnl_mutex being held while nfnetlink_rcv_msg() calls them.

If struct nfnl_callback contains a non NULL call_rcu(), then
nfnetlink_rcv_msg() will use it instead of call() field, holding
rcu_read_lock instead of nfnl_mutex

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Florian Westphal <fw@strlen.de>
CC: Eric Leblond <eric@regit.org>
CC: Patrick McHardy <kaber@trash.net>
---
 include/linux/netfilter/nfnetlink.h |    3 +
 net/netfilter/nfnetlink.c           |   40 +++++++++++++++++++-------
 2 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/include/linux/netfilter/nfnetlink.h b/include/linux/netfilter/nfnetlink.h
index 2b11fc1..74d3386 100644
--- a/include/linux/netfilter/nfnetlink.h
+++ b/include/linux/netfilter/nfnetlink.h
@@ -60,6 +60,9 @@ struct nfnl_callback {
 	int (*call)(struct sock *nl, struct sk_buff *skb, 
 		    const struct nlmsghdr *nlh,
 		    const struct nlattr * const cda[]);
+	int (*call_rcu)(struct sock *nl, struct sk_buff *skb, 
+		    const struct nlmsghdr *nlh,
+		    const struct nlattr * const cda[]);
 	const struct nla_policy *policy;	/* netlink attribute policy */
 	const u_int16_t attr_count;		/* number of nlattr's */
 };
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index b4a4532..1905976 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -37,7 +37,7 @@ MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_NETFILTER);
 
 static char __initdata nfversion[] = "0.30";
 
-static const struct nfnetlink_subsystem *subsys_table[NFNL_SUBSYS_COUNT];
+static const struct nfnetlink_subsystem __rcu *subsys_table[NFNL_SUBSYS_COUNT];
 static DEFINE_MUTEX(nfnl_mutex);
 
 void nfnl_lock(void)
@@ -59,7 +59,7 @@ int nfnetlink_subsys_register(const struct nfnetlink_subsystem *n)
 		nfnl_unlock();
 		return -EBUSY;
 	}
-	subsys_table[n->subsys_id] = n;
+	rcu_assign_pointer(subsys_table[n->subsys_id], n);
 	nfnl_unlock();
 
 	return 0;
@@ -71,7 +71,7 @@ int nfnetlink_subsys_unregister(const struct nfnetlink_subsystem *n)
 	nfnl_lock();
 	subsys_table[n->subsys_id] = NULL;
 	nfnl_unlock();
-
+	synchronize_rcu();
 	return 0;
 }
 EXPORT_SYMBOL_GPL(nfnetlink_subsys_unregister);
@@ -83,7 +83,7 @@ static inline const struct nfnetlink_subsystem *nfnetlink_get_subsys(u_int16_t t
 	if (subsys_id >= NFNL_SUBSYS_COUNT)
 		return NULL;
 
-	return subsys_table[subsys_id];
+	return rcu_dereference(subsys_table[subsys_id]);
 }
 
 static inline const struct nfnl_callback *
@@ -139,21 +139,27 @@ static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 
 	type = nlh->nlmsg_type;
 replay:
+	rcu_read_lock();
 	ss = nfnetlink_get_subsys(type);
 	if (!ss) {
 #ifdef CONFIG_MODULES
-		nfnl_unlock();
+		rcu_read_unlock();
 		request_module("nfnetlink-subsys-%d", NFNL_SUBSYS_ID(type));
-		nfnl_lock();
+		rcu_read_lock();
 		ss = nfnetlink_get_subsys(type);
 		if (!ss)
 #endif
+		{
+			rcu_read_unlock();
 			return -EINVAL;
+		}
 	}
 
 	nc = nfnetlink_find_client(type, ss);
-	if (!nc)
+	if (!nc) {
+		rcu_read_unlock();
 		return -EINVAL;
+	}
 
 	{
 		int min_len = NLMSG_SPACE(sizeof(struct nfgenmsg));
@@ -167,7 +173,23 @@ replay:
 		if (err < 0)
 			return err;
 
-		err = nc->call(net->nfnl, skb, nlh, (const struct nlattr **)cda);
+		if (nc->call_rcu) {
+			err = nc->call_rcu(net->nfnl, skb, nlh,
+					   (const struct nlattr **)cda);
+			rcu_read_unlock();
+		} else {
+			rcu_read_unlock();
+			nfnl_lock();
+			if (rcu_dereference_protected(
+					subsys_table[NFNL_SUBSYS_ID(type)],
+					lockdep_is_held(&nfnl_mutex)) != ss ||
+			    nfnetlink_find_client(type, ss) != nc)
+				err = -EAGAIN;
+			else
+				err = nc->call(net->nfnl, skb, nlh,
+						   (const struct nlattr **)cda);
+			nfnl_unlock();
+		}
 		if (err == -EAGAIN)
 			goto replay;
 		return err;
@@ -176,9 +198,7 @@ replay:
 
 static void nfnetlink_rcv(struct sk_buff *skb)
 {
-	nfnl_lock();
 	netlink_rcv_skb(skb, &nfnetlink_rcv_msg);
-	nfnl_unlock();
 }
 
 static int __net_init nfnetlink_net_init(struct net *net)


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: netfilter queue throughput slowdown
  2011-06-29  9:17 netfilter queue throughput slowdown Anders Nilsson Plymoth
  2011-06-29  9:47 ` Eric Dumazet
@ 2011-07-02 12:25 ` Pablo Neira Ayuso
  1 sibling, 0 replies; 28+ messages in thread
From: Pablo Neira Ayuso @ 2011-07-02 12:25 UTC (permalink / raw)
  To: Anders Nilsson Plymoth; +Cc: netfilter-devel

On 29/06/11 11:17, Anders Nilsson Plymoth wrote:
> Hi,
> 
> I am using libnetfilter-queue on a router running Ubuntu 10.10 with
> 2.6.35-28-generic. The problem I am having is that I am experiencing a
> very significant throughput slowdown whenever my NFQUEUE program is
> running. This happens even when I use bare bone libnetfilter-queue
> program that immediately issues an ACCEPT verdict as soon as it
> receives a packet. Whenever this program is running, my max throughput
> is cut in half, and the reason it happens is because nf_queue
> overflows (nf_queue: full at 1024 entries, dropping packets(s)), and I
> notice my CPU utilization is 100%. However, when my program is not
> running and I am not passing packets through NFQUEUE and the router
> routes packets as normal, I get full throughput with only 0.1% CPU
> utilization.
> 
> I find this a bit strange, can the netfilter queue processing take the
> cpu from 0.1% to 100% and start dropping packets even with no other
> processing than setting immediately setting the verdict? We have two
> of these machines, with identical hardware and OS, and they experience
> the same behavior.
> I am also confused as we have been using these machines previously and
> been able to obtain full throughput with our netfilter program.
> 
> Does anyone have a clue here, or suggest what I should look into in
> order to speed things up.

Did you have a look at the suggestion available in the documentation:

http://www.netfilter.org/projects/libnetfilter_queue/doxygen/

See Performance.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] nfnetlink: add RCU in nfnetlink_rcv_msg()
  2011-07-01 15:27                               ` [PATCH 1/2] nfnetlink: add RCU in nfnetlink_rcv_msg() Eric Dumazet
  2011-07-01 14:11                                 ` Florian Westphal
@ 2011-07-05 13:22                                 ` Patrick McHardy
  2011-07-18 14:06                                 ` Patrick McHardy
  2 siblings, 0 replies; 28+ messages in thread
From: Patrick McHardy @ 2011-07-05 13:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Florian Westphal, Eric Leblond, sclark46, Kuzin Andrey,
	Anders Nilsson Plymoth, netfilter-devel, netdev

On 01.07.2011 17:27, Eric Dumazet wrote:
> Le vendredi 01 juillet 2011 à 09:49 +0200, Florian Westphal a écrit :
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> Number one offender is the nfnl_lock mutex hold each time we give a
>>> verdict.
>>
>> Yes, the nfnl mutex is fairly annoying for nfqueue.
>>
>> Unfortunately it is not possible to just remove it
>> completely since it also protects against module removal.
>>
> 
> I believe it can, just add appropriate synchronization points.
> 
>> But I guess even having to grab a refcount would be
>> a huge win as opposed to holding on to the nfnl mutex...
>>
>> We'd also need to audit all ->call implementations; most
>> of them assume the nfnl_mutex is being hold.
> 
> CC netdev
> 
> We can do another way : Introduce a new ->call_rcu() implementation
> and convert places where we prefer not holding nfnf_mutex.
> 
> If/when all places are converted, remove the ->call() field for good.

We've talked about this a few times, but we have some pretty deep
call chains especially in ctnetlink, which are using sleeping
allocations. Not sure whether we really want to convert those.
An alternative would be to push locking down one level and have
the subsystem decide whether to use RCU or the mutex. However that
would require taking a reference to the subsystem in nfnetlink to
avoid module unloda races.

> With following two patches, I was able to reach more than 2.000.000 pps
> without losses on my setup (limited by my lab setup), instead of less
> than 500.000 pps

That sounds pretty impressive.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] nfnetlink: add RCU in nfnetlink_rcv_msg()
  2011-07-01 15:27                               ` [PATCH 1/2] nfnetlink: add RCU in nfnetlink_rcv_msg() Eric Dumazet
  2011-07-01 14:11                                 ` Florian Westphal
  2011-07-05 13:22                                 ` Patrick McHardy
@ 2011-07-18 14:06                                 ` Patrick McHardy
  2 siblings, 0 replies; 28+ messages in thread
From: Patrick McHardy @ 2011-07-18 14:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Florian Westphal, Eric Leblond, sclark46, Kuzin Andrey,
	Anders Nilsson Plymoth, netfilter-devel, netdev

On 01.07.2011 17:27, Eric Dumazet wrote:
> [PATCH 1/2] nfnetlink: add RCU in nfnetlink_rcv_msg()
> 
> Goal of this patch is to permit nfnetlink providers not mandate
> nfnl_mutex being held while nfnetlink_rcv_msg() calls them.
> 
> If struct nfnl_callback contains a non NULL call_rcu(), then
> nfnetlink_rcv_msg() will use it instead of call() field, holding
> rcu_read_lock instead of nfnl_mutex
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Florian Westphal <fw@strlen.de>
> CC: Eric Leblond <eric@regit.org>
> CC: Patrick McHardy <kaber@trash.net>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2011-07-18 14:06 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-29  9:17 netfilter queue throughput slowdown Anders Nilsson Plymoth
2011-06-29  9:47 ` Eric Dumazet
2011-06-29  9:55   ` Anders Nilsson Plymoth
2011-06-29 10:08     ` Eric Dumazet
2011-06-30  6:20       ` Kuzin Andrey
2011-06-30  6:47         ` Eric Dumazet
2011-06-30  7:36           ` Kuzin Andrey
2011-06-30 11:34             ` Eric Dumazet
2011-06-30 11:59               ` Patrick McHardy
2011-06-30 15:15                 ` Eric Dumazet
2011-06-30 14:32                   ` Stephen Clark
2011-06-30 14:51                     ` Patrick McHardy
2011-06-30 17:07                       ` Eric Leblond
2011-06-30 17:45                         ` Eric Dumazet
2011-06-30 18:08                           ` Eric Leblond
2011-07-01  6:39                           ` Amos Jeffries
2011-07-01  7:00                           ` [RFC] nfnetlink_queue not scalable Eric Dumazet
2011-07-01  7:49                             ` Florian Westphal
2011-07-01 15:27                               ` [PATCH 1/2] nfnetlink: add RCU in nfnetlink_rcv_msg() Eric Dumazet
2011-07-01 14:11                                 ` Florian Westphal
2011-07-05 13:22                                 ` Patrick McHardy
2011-07-18 14:06                                 ` Patrick McHardy
2011-07-01 15:08                           ` netfilter queue throughput slowdown Anders Nilsson Plymoth
2011-06-30 22:24                   ` Sam Roberts
2011-07-01  4:53                     ` Eric Dumazet
2011-06-30 22:26         ` Sam Roberts
2011-07-01  4:52           ` Eric Dumazet
2011-07-02 12:25 ` Pablo Neira Ayuso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.