* (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024 entries, dropping packets(s). Dropped: 582) - bug or just some defaults increase required?
@ 2009-02-06 18:14 Anton VG
  2009-02-08  1:34 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 23+ messages in thread
From: Anton VG @ 2009-02-06 18:14 UTC (permalink / raw)
  To: netfilter-devel

Hello Friends,

I have run into a problem with nfnetlink_queue.

I've created a service where users connect to a host, and every
connected user (over PPP) gets a separate NFQUEUE. In this queue I do
packet accounting, to a different destination. When the number of
simultaneous queues went to 40+ I ran into a problem: an endless loop
that continuously writes the following error to stderr (3GB written to
the log in 3 minutes):

nfnl_talk: recvmsg over-run

I attached GDB, and the backtrace showed the loop in the following:

in write () from /lib/libc.so.6
(gdb) bt
#0  0x00007f67b94c041f in write () from /lib/libc.so.6
#1  0x00007f67b946a743 in _IO_file_write () from /lib/libc.so.6
#2  0x00007f67b946baf8 in _IO_file_xsputn () from /lib/libc.so.6
#3  0x00007f67b9444442 in cuserid () from /lib/libc.so.6
#4  0x00007f67b944508f in vfprintf () from /lib/libc.so.6
#5  0x00007f67b944e328 in fprintf () from /lib/libc.so.6
#6  0x00007f67b930102d in nfnl_talk (nfnlh=0x53c4b0, n=<value optimized out>, peer=<value optimized out>, groups=<value optimized out>, answer=0x0, junk=0, jarg=0x0) at libnfnetlink.c:678
#7  0x00007f67b9be457f in __build_send_cfg_msg (h=0x5398d0, command=1 '\001', queuenum=<value optimized out>, pf=0) at libnetfilter_queue.c:114
#8  0x00007f67b9be46e6 in nfq_create_queue (h=0x5398d0, num=40, cb=0x41104a <cb>, data=0x5c8b68) at libnetfilter_queue.c:246
#9  0x0000000000410579 in nfqhandler::add_queue (this=0x53c3e0, group=40, dev=0x551578 "ppp40", ip=318845450) at nfqlib.cpp:369
#10 0x00000000004065df in hndpptp::setda (this=0x5301a0, pptp_pid=1505) at hndlib.cpp:418
#11 0x0000000000406b05 in hndpptp::dologin (this=0x5301a0, pi={_M_node = 0x588a40}) at hndlib.cpp:453
#12 0x0000000000408603 in hndpptp::run (this=0x5301a0) at hndlib.cpp:268
#13 0x000000000040522c in main () at nfman.cpp:34

I also see the following in dmesg. It does not kill the service, but
maybe it is related?

__ratelimit: 14 messages suppressed
nf_queue: full at 1024 entries, dropping packets(s). Dropped: 679

Further details: using kernel 2.6.26.5 and
libnetfilter_queue-0.0.16
libnfnetlink-0.0.39

Do you think that increasing NFQNL_QMAX_DEFAULT from 1024 to 10240
(in linux-2.6.26.5/net/netfilter/nfnetlink_queue.c) would solve the
problem, or is the problem deeper?

Maybe something like this is fixed in later versions of the kernel or
libraries?

I would be grateful for any help.

Anton.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024 entries, dropping packets(s). Dropped: 582) - bug or just some defaults increase required?
  2009-02-06 18:14 (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024 entries, dropping packets(s). Dropped: 582) - bug or just some defaults increase required? Anton VG
@ 2009-02-08  1:34 ` Pablo Neira Ayuso
  2009-02-09 10:56   ` Anton
  0 siblings, 1 reply; 23+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-08  1:34 UTC (permalink / raw)
  To: Anton VG; +Cc: netfilter-devel

Anton VG wrote:
> Hello Friends,
> 
> I have run into a problem with nfnetlink_queue.
> 
> I've created a service where users connect to a host, and every
> connected user (over PPP) gets a separate NFQUEUE. In this queue I do
> packet accounting, to a different destination. When the number of
> simultaneous queues went to 40+ I ran into a problem: an endless loop
> that continuously writes the following error to stderr (3GB written to
> the log in 3 minutes):
> 
> nfnl_talk: recvmsg over-run

This happens when netlink fails to deliver a packet from kernel to
userspace due to an overrun in the buffer.

> GDB connected and backtrace showed the loop in the following:
[...]
> Also I'm watching the following in the dmesg (though, it does not kill
> the service) - but maybe somehow influences?
> 
> __ratelimit: 14 messages suppressed
> nf_queue: full at 1024 entries, dropping packets(s). Dropped: 679

This message is triggered when you exceed queue_maxlen.

> Further details: using kernel 2.6.26.5 and
> libnetfilter_queue-0.0.16
> libnfnetlink-0.0.39
> 
> Do you think that increasing the NFQNL_QMAX_DEFAULT from 1024 to 10240
> would solve the problem
> (in linux-2.6.26.5/net/netfilter/nfnetlink_queue.c) - or the problem is deeper?

That would reduce the chances of hitting the printk error that you have
reported (which I think should be removed or disabled; we have the
/proc interface to report this error, so the point would be to
document this issue in the library).

For the ENOBUFS problem, you can increase the buffer size, which will
delay the appearance of the ENOBUFS problem. Please see
nfnl_rcvbufsiz() in libnfnetlink. Increasing the priority of the process
via nice() would also reduce the chances of hitting ENOBUFS.
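
A minimal sketch of the buffer-size suggestion above, assuming a plain
socket file descriptor rather than a libnfnetlink handle. grow_rcvbuf()
is a hypothetical helper written for this illustration; it is not part
of libnfnetlink. In a program like the one in this thread, the
descriptor would come from nfnl_fd(), or nfnl_rcvbufsiz() would be
called directly instead.

```cpp
#include <cassert>
#include <sys/socket.h>

// Hypothetical helper (not a libnfnetlink function): ask the kernel for a
// receive buffer of `bytes` on socket `fd` and return the size that is
// actually in effect afterwards, or -1 on error.
int grow_rcvbuf(int fd, int bytes) {
    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;

    // Read the value back: the kernel may adjust or cap the request
    // (on Linux the cap is the net.core.rmem_max sysctl).
    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

Reading the value back matters because a request above the system limit
is silently reduced, so a bigger number in the source does not always
mean a bigger buffer. A larger buffer only postpones ENOBUFS under
sustained load; it does not remove it.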

> Maybe something like this is fixed in later versions of the kernel or
> libraries?

ENOBUFS is there to tell userspace that Netlink cannot back off. It's
not a bug, it's a feature of Netlink.
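
Since ENOBUFS is an expected condition rather than a bug, a receive
loop can treat it as its own case. The sketch below is an assumption
about how an application might sort recv() results on a non-blocking
netlink socket; RecvAction and classify_recv() are names invented for
this example and do not come from libnfnetlink or libnetfilter_queue.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>

// Illustrative categories for the outcome of one recv() call.
enum class RecvAction { Process, Retry, Overrun, Fatal };

// Decide what to do with the return value of recv() and the errno value
// captured right after the call.
RecvAction classify_recv(ssize_t rv, int err) {
    if (rv >= 0)
        return RecvAction::Process;   // data (or an empty datagram) arrived
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR)
        return RecvAction::Retry;     // nothing to read right now
    if (err == ENOBUFS)
        return RecvAction::Overrun;   // messages were lost; log it and keep reading
    return RecvAction::Fatal;         // anything else is a real error
}
```

The key point is that the decision is made on the return value of
recv(), with errno consulted only after a failure. On Overrun, the
socket is still usable; what to do about the lost messages is up to the
application.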

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024 entries, dropping packets(s). Dropped: 582) - bug or just some defaults increase required?
  2009-02-08  1:34 ` Pablo Neira Ayuso
@ 2009-02-09 10:56   ` Anton
  2009-02-09 11:20     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 23+ messages in thread
From: Anton @ 2009-02-09 10:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo,

One more thought: this happened when my application attempted
to create a NEW queue while the kernel packet queue was
overrun. The call to nfq_create_queue() never returned;
there was only error message generation in a loop inside
the libnetfilter_queue library. That does not look like
proper behaviour. Am I missing something?

With best regards,
Anton.


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024 entries, dropping packets(s). Dropped: 582) - bug or just some defaults increase required?
  2009-02-09 10:56   ` Anton
@ 2009-02-09 11:20     ` Pablo Neira Ayuso
  2009-02-11  8:48       ` Anton
       [not found]       ` <49928B62.1090600@netfilter.org>
  0 siblings, 2 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-09 11:20 UTC (permalink / raw)
  To: Anton; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 545 bytes --]

Anton wrote:
> Pablo,
> 
> One more thought: this happened when my application attempted
> to create a NEW queue while the kernel packet queue was
> overrun. The call to nfq_create_queue() never returned;
> there was only error message generation in a loop inside
> the libnetfilter_queue library. That does not look like
> proper behaviour. Am I missing something?

Oh I see, I did not get the point initially. Does this patch fix the
problem?

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1066 bytes --]

diff --git a/src/libnetfilter_queue.c b/src/libnetfilter_queue.c
index 9e4903b..ec17595 100644
--- a/src/libnetfilter_queue.c
+++ b/src/libnetfilter_queue.c
@@ -141,7 +141,7 @@ __build_send_cfg_msg(struct nfq_handle *h, u_int8_t command,
 	cmd.pf = htons(pf);
 	nfnl_addattr_l(&u.nmh, sizeof(u), NFQA_CFG_CMD, &cmd, sizeof(cmd));
 
-	return nfnl_talk(h->nfnlh, &u.nmh, 0, 0, NULL, NULL, NULL);
+	return nfnl_query(h->nfnlh, &u.nmh);
 }
 
 static int __nfq_rcv_pkt(struct nlmsghdr *nlh, struct nfattr *nfa[],
@@ -553,7 +553,7 @@ int nfq_set_mode(struct nfq_q_handle *qh,
 	nfnl_addattr_l(&u.nmh, sizeof(u), NFQA_CFG_PARAMS, &params,
 			sizeof(params));
 
-	return nfnl_talk(qh->h->nfnlh, &u.nmh, 0, 0, NULL, NULL, NULL);
+	return nfnl_query(qh->h->nfnlh, &u.nmh);
 }
 
 /**
@@ -581,7 +581,7 @@ int nfq_set_queue_maxlen(struct nfq_q_handle *qh,
 	nfnl_addattr_l(&u.nmh, sizeof(u), NFQA_CFG_QUEUE_MAXLEN, &queue_maxlen,
 			sizeof(queue_maxlen));
 
-	return nfnl_talk(qh->h->nfnlh, &u.nmh, 0, 0, NULL, NULL, NULL);
+	return nfnl_query(qh->h->nfnlh, &u.nmh);
 }
 
 /**


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024 entries, dropping packets(s). Dropped: 582) - bug or just some defaults increase required?
  2009-02-09 11:20     ` Pablo Neira Ayuso
@ 2009-02-11  8:48       ` Anton
       [not found]       ` <49928B62.1090600@netfilter.org>
  1 sibling, 0 replies; 23+ messages in thread
From: Anton @ 2009-02-11  8:48 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

I'll try to create a test environment for this. It happened
on a production PC, so it is hard to just try the patch
there :)


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024 entries, dropping packets(s). Dropped: 582) - bug or just some defaults increase required?
       [not found]       ` <49928B62.1090600@netfilter.org>
@ 2009-02-11 12:26         ` Anton VG
  2009-02-11 16:41           ` Pablo Neira Ayuso
  0 siblings, 1 reply; 23+ messages in thread
From: Anton VG @ 2009-02-11 12:26 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel

Pablo,

On a 64-bit system, after applying the patch, nfq_create_queue() started
to return NULL often, and calling strerror(errno) gives the error
"Invalid or incomplete multibyte or wide character".

On 32-bit it works. This has been tested with only 2 simultaneous queues.
Any thoughts?

2009/2/11 Pablo Neira Ayuso <pablo@netfilter.org>:
> Updates on this?

* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024  entries, dropping packets(s). Dropped: 582) - bug or just some defaults  increase required?
  2009-02-11 12:26         ` Anton VG
@ 2009-02-11 16:41           ` Pablo Neira Ayuso
  2009-02-12 10:45             ` Anton
  0 siblings, 1 reply; 23+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-11 16:41 UTC (permalink / raw)
  To: Anton VG; +Cc: netfilter-devel

Anton VG wrote:
> Pablo,
> 
> On a 64-bit system, after applying the patch, nfq_create_queue() started
> to return NULL often, and calling strerror(errno) gives the error
> "Invalid or incomplete multibyte or wide character".
> 
> On 32-bit it works. This has been tested with only 2 simultaneous queues.
> Any thoughts?

Let me check this. Do you think that I can reproduce it with the test
file in libnetfilter_queue? If you can pass something similar to your
code to reproduce the problem, it would be great. Thanks.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024  entries, dropping packets(s). Dropped: 582) - bug or just some defaults  increase required?
  2009-02-11 16:41           ` Pablo Neira Ayuso
@ 2009-02-12 10:45             ` Anton
  2009-02-12 12:43               ` Pablo Neira Ayuso
  2009-02-14 17:13               ` Pablo Neira Ayuso
  0 siblings, 2 replies; 23+ messages in thread
From: Anton @ 2009-02-12 10:45 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo,

Some more info. After applying the patch, if we just create
100 queues with the test code on the test PC, with _no_
transit traffic routed to the queues, it works fine: the
queues are created with no problem.
But if we do this on the live PC, with transit traffic routed
to the queues, we hit the problem once every few queues.
We localized the place, and the call sequence is as follows:
nfnl_query=>nfnl_catch=>nfnl_process

and in nfnl_process:

if (nlh->nlmsg_seq && nlh->nlmsg_seq != h->seq) {
                errno = EILSEQ;
                return -1;
}

and the variables are
nlh->nlmsg_seq=1234422225, h->seq=1234422229.

EILSEQ=84
strerror(84) returns "Invalid or incomplete multibyte or wide
character"

Any clue on this?

Regards,
Anton.


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024  entries, dropping packets(s). Dropped: 582) - bug or just some defaults  increase required?
  2009-02-12 10:45             ` Anton
@ 2009-02-12 12:43               ` Pablo Neira Ayuso
  2009-02-14  9:03                 ` Anton
  2009-02-14 17:13               ` Pablo Neira Ayuso
  1 sibling, 1 reply; 23+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-12 12:43 UTC (permalink / raw)
  To: Anton; +Cc: netfilter-devel

Anton wrote:
> Pablo,
> 
> Some more info. After applying the patch, if we just create
> 100 queues with the test code on the test PC, with _no_
> transit traffic routed to the queues, it works fine: the
> queues are created with no problem.
> But if we do this on the live PC, with transit traffic routed
> to the queues, we hit the problem once every few queues.
> We localized the place, and the call sequence is as follows:
> nfnl_query=>nfnl_catch=>nfnl_process
> 
> and in nfnl_process:
> 
> if (nlh->nlmsg_seq && nlh->nlmsg_seq != h->seq) {
>                 errno = EILSEQ;
>                 return -1;
> }
> 
> and the variables are
> nlh->nlmsg_seq=1234422225, h->seq=1234422229.
> 
> EILSEQ=84
> strerror(84) returns "Invalid or incomplete multibyte or wide
> character"
> 
> Any clue on this?

There's some race condition. It seems that you're receiving packets from
kernel-space to nfqueue before the ACK message from kernel-space to
user-space to confirm subscription is sent. Let me investigate this.
Thanks for the accurate report.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024  entries, dropping packets(s). Dropped: 582) - bug or just some defaults  increase required?
  2009-02-12 12:43               ` Pablo Neira Ayuso
@ 2009-02-14  9:03                 ` Anton
  0 siblings, 0 replies; 23+ messages in thread
From: Anton @ 2009-02-14  9:03 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo,

Today the problem happened again in the unpatched case, with
the buffer size increased to 10240 and nice=0; I am now
trying with nice=-15.
There were no dmesg messages about overflows, just the loop again.
The patch is not usable yet, since queue creation fails too
often. Any update from you?

Regards,
Anton.


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024  entries, dropping packets(s). Dropped: 582) - bug or just some defaults  increase required?
  2009-02-12 10:45             ` Anton
  2009-02-12 12:43               ` Pablo Neira Ayuso
@ 2009-02-14 17:13               ` Pablo Neira Ayuso
  2009-02-16 13:19                 ` Anton
  1 sibling, 1 reply; 23+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-14 17:13 UTC (permalink / raw)
  To: Anton; +Cc: netfilter-devel

Anton wrote:
> Pablo,
> 
> Some more info. After applying the patch, if we just create
> 100 queues with the test code on the test PC, with _no_
> transit traffic routed to the queues, it works fine: the
> queues are created with no problem.
> But if we do this on the live PC, with transit traffic routed
> to the queues, we hit the problem once every few queues.
> We localized the place, and the call sequence is as follows:
> nfnl_query=>nfnl_catch=>nfnl_process
> 
> and in nfnl_process:
> 
> if (nlh->nlmsg_seq && nlh->nlmsg_seq != h->seq) {
>                 errno = EILSEQ;
>                 return -1;
> }
> 
> and the variables are
> nlh->nlmsg_seq=1234422225, h->seq=1234422229.

This means that we expected to receive 1234422229, but we got 1234422225
instead. I don't find any explanation for this, but it points to a
problem somewhere (in the library or your application) that nfnl_talk
silently ignores. Could you send me the code that you use to trigger this?

Even if you don't have the problem anymore, we have to replace
nfnl_talk(), which looks broken in several respects.
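
The check under discussion can be exercised in isolation. The sketch
below copies the condition quoted from nfnl_process() into a
free-standing function; check_seq() is a name made up for this example,
and the real library code works on struct nlmsghdr and the handle
rather than on two integers.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Same condition as the snippet quoted from nfnl_process(): a non-zero
// sequence number that differs from the expected one is rejected.
// Returns 0 when the message is accepted, -1 with errno = EILSEQ otherwise.
int check_seq(uint32_t msg_seq, uint32_t expected_seq) {
    if (msg_seq && msg_seq != expected_seq) {
        errno = EILSEQ;
        return -1;
    }
    return 0;
}
```

With the reported values, check_seq(1234422225, 1234422229) fails with
EILSEQ, which is the error the application printed. A message whose
sequence number is 0 always passes this condition, so only a non-zero,
non-matching number triggers the failure.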

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024  entries, dropping packets(s). Dropped: 582) - bug or just some defaults  increase required?
  2009-02-14 17:13               ` Pablo Neira Ayuso
@ 2009-02-16 13:19                 ` Anton
  2009-02-16 13:42                   ` Pablo Neira Ayuso
  0 siblings, 1 reply; 23+ messages in thread
From: Anton @ 2009-02-16 13:19 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

The code of the full application is quite big and loaded
with extra functionality like threading, database
connectivity and so on. We'll try to make a simple
emulation app to trigger the case and I'll send it.


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024  entries, dropping packets(s). Dropped: 582) - bug or just some defaults  increase required?
  2009-02-16 13:19                 ` Anton
@ 2009-02-16 13:42                   ` Pablo Neira Ayuso
  2009-02-16 14:38                     ` Anton VG
  0 siblings, 1 reply; 23+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-16 13:42 UTC (permalink / raw)
  To: Anton; +Cc: netfilter-devel

Anton wrote:
> The code of the full application is quite big and loaded
> with extra functionality like threading, database
> connectivity and so on. We'll try to make a simple
> emulation app to trigger the case and I'll send it.

Threading is the point that I wanted to hear about. Are you creating
those queues inside threads? The queue creation path is not thread-safe
and it requires a mutex.
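
A minimal sketch of what "requires a mutex" could look like in an
application, assuming one lock shared by every thread that uses the
same handle. create_queue_unlocked() is a hypothetical stand-in for the
non-thread-safe call; a real program would call into
libnetfilter_queue at that point.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a call that is not thread-safe, such as queue
// creation on a shared handle. Here it only counts invocations.
static int g_created = 0;
int create_queue_unlocked(int qnum) {
    assert(qnum >= 0);
    return ++g_created;
}

// One mutex for everything that touches the shared handle.
static std::mutex g_nfq_mutex;

// Wrapper that serializes callers; the lock is released when `guard`
// goes out of scope, including on early return or exception.
int create_queue_locked(int qnum) {
    std::lock_guard<std::mutex> guard(g_nfq_mutex);
    return create_queue_unlocked(qnum);
}
```

The lock only helps if every path that sends a request on the handle
and waits for its reply takes the same mutex.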

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024 entries, dropping packets(s). Dropped: 582) - bug or just some defaults increase required?
  2009-02-16 13:42                   ` Pablo Neira Ayuso
@ 2009-02-16 14:38                     ` Anton VG
  2009-02-16 15:23                       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 23+ messages in thread
From: Anton VG @ 2009-02-16 14:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, Vitaly Bodzhgua

[-- Attachment #1: Type: text/plain, Size: 1553 bytes --]

Pablo,
Attached is the code which triggers the case, and it does not use
threads (btw, we of course use mutexes in the threaded app).

How to use it:
First, the app creates 40 queues and attaches them to the OUTPUT chain.
Each of the first 40 queues has the corresponding
192.168.1.{queue_num} IP address assigned to it.
This means, for instance, that when you send a file to the IP address
192.168.1.37, it flows through queue 37.

Then the app starts a loop in which it randomly creates and destroys
extra queues (beyond the first 40) every second.

After starting the app, you need to send a big file, say 1GB, over FTP
to another PC with an IP address from the group of the first 40; we used
192.168.1.37.

Somewhere in the middle of sending the file, it triggers the error on
queue creation.

Please make sure that you use the OUTPUT chain and that you send the
file from the test PC.
With this test code we triggered this on both 64-bit and 32-bit systems.
Kernel versions on the test PCs: 2.6.26.1 and 2.6.26.5

Just let me know if anything needs clarification.

Regards,
Anton.


[-- Attachment #2: nftst.cpp --]
[-- Type: text/x-c++src, Size: 6842 bytes --]


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <netinet/udp.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/time.h>
#include <arpa/inet.h>

#include <string.h>
#include <signal.h>
#include <time.h>

#include <map>
#include <string>
#include <sstream>

using namespace std;


#ifdef  __cplusplus
extern "C" {
#endif

#include <linux/netfilter.h>		/* for NF_ACCEPT */
#include <libnfnetlink/libnfnetlink.h>
#include <libnetfilter_queue/libnetfilter_queue.h>
#include <libnetfilter_queue/linux_nfnetlink_queue.h>

#ifdef  __cplusplus
}
#endif


#define QNUM 40
#define QSTART 1000

int TERM=0;

void sig_term(int sig) {
 TERM=1;
}

struct info {
 int qnum;
 struct nfq_q_handle *qh;
};

map<int,info> queues; //queues[qnum]=info

//GLOBALS HERE
struct nfq_handle *h;
struct nfq_q_handle *qh;
struct nfnl_handle *nh;
int nfqfd;

int rv;
char buf[4096];
//=========================

template <class T>
std::string stringify(T x) {
 std::stringstream o;
 o << x;
 return o.str();
}

#include <sys/resource.h>

int force_core(unsigned long core_size_cur, unsigned long core_size_max) {
 struct  rlimit rlim;

 rlim.rlim_cur=core_size_cur;
 rlim.rlim_max=core_size_max;
 return setrlimit(RLIMIT_CORE,&rlim);
}

//HANDLES PACKETS HERE
static u_int32_t handle_pkt (struct nfq_data *tb,struct info *spec,int &verdict) {
  int id = 0;
  struct nfqnl_msg_packet_hdr *ph;
  u_int32_t mark;
  int ret;
  char *data;
//=	
  struct iphdr  *ip;
  struct tcphdr *tcp;
  struct udphdr *udp;
  char saddr[20],daddr[20];
  int sport=0;
  int dport=0;
  int i;
//=	
  
  verdict=NF_ACCEPT;  
	
  ph = nfq_get_msg_packet_hdr(tb);
  if (ph) {
	id = ntohl(ph->packet_id);
	printf("hw_protocol=0x%04x hook=%u id=%u ",
	ntohs(ph->hw_protocol), ph->hook, id);	 			
  }
	
  mark = nfq_get_nfmark(tb);
  printf("mark=%u ", mark);
  ret = nfq_get_payload(tb, &data);
  if (ret < (int) sizeof(struct iphdr)) { //no complete IP header to parse
    fputc('\n', stdout);
    return id;
  }

//======
  ip=(struct iphdr*) data;
  if (ip->protocol==6) {
    tcp=(struct tcphdr*) (data + (4 * ip->ihl));
    sport = ntohs(tcp->source); //network to host byte order
    dport = ntohs(tcp->dest);
  } else if (ip->protocol==17)  {
     udp=(struct udphdr*) (data + (4 * ip->ihl));
     sport = ntohs(udp->source);
     dport = ntohs(udp->dest);
  }

  strcpy(saddr,inet_ntoa(*(struct in_addr*)&ip->saddr));
  strcpy(daddr,inet_ntoa(*(struct in_addr*)&ip->daddr));

  printf("%i src=%s:%u  dst=%s:%u size=%u proto=%u",
                 spec->qnum,saddr,sport,daddr,dport,ntohs(ip->tot_len),ip->protocol);

//=======
  fputc('\n', stdout);
  return id;
}
	

static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
	      struct nfq_data *nfa, void *data) {
        int verdict=NF_DROP;	      
 	u_int32_t id = handle_pkt(nfa, (struct info *) data, verdict);
	return nfq_set_verdict(qh, id, verdict, 0, NULL);
}


//INIT NFQ
void init_nfq() {
 h=0;
 
 printf("opening library handle\n");
 h = nfq_open();
 if (!h) throw "error during nfq_open()";

 printf("unbinding existing nf_queue handler for AF_INET (if any)\n");
 if (nfq_unbind_pf(h, AF_INET) < 0) throw "error during nfq_unbind_pf()";

 printf("binding nfnetlink_queue as nf_queue handler for AF_INET\n");
 if (nfq_bind_pf(h, AF_INET) < 0) throw "error during nfq_bind_pf()";

 nh = nfq_nfnlh(h);
 nfqfd = nfnl_fd(nh);
    
 if (nfqfd>0) {
   fcntl(nfqfd,F_SETFL,O_NONBLOCK);
 } else throw "fail to set nfq nfnl fd";
}


//CHECK IF THERE ANY PACKETS IN QUEUE
int check_nfq(int timeout) {
 fd_set readfds;
 struct timeval tv;
 int res;

 if (!queues.size()) return -100;

 FD_ZERO(&readfds);
 FD_SET(nfqfd,&readfds);

 if (timeout<0) {
   res=select(nfqfd+1,&readfds,NULL,NULL,NULL);
 } else {
   tv.tv_sec=0; //timeout;
   tv.tv_usec=timeout;
   res=select(nfqfd+1,&readfds,NULL,NULL,&tv);
 }

 if (res>0) {
  res=0;
   int rv;
   char buf[4096];
   if (FD_ISSET(nfqfd,&readfds)) {
    res=1;
    rv = recv(nfqfd, buf, sizeof(buf), 0);
    if (rv < 0) throw strerror(errno); //recv() signals failure via its return value; errno is never negative
    nfq_handle_packet(h, buf, rv);
   }
 }
 
 return res;
 
}



void create_queue(int qnum) {
  struct info spec;
  struct info *pspec;
  map<int,info>::iterator ii; 
  
  if ((ii=queues.find(qnum))!=queues.end()) {
   printf("queue '%i' already queued\n",ii->first);
   return;
  }

  spec.qnum=qnum;
  queues[spec.qnum]=spec;

  pspec=&queues[spec.qnum];
  printf("binding this socket to queue '%i'\n",spec.qnum);
  pspec->qh = nfq_create_queue(h,  spec.qnum, &cb, pspec);
  if (!pspec->qh) throw "error during nfq_create_queue()";
  printf("setting copy_packet mode\n");
  if (nfq_set_mode(pspec->qh, NFQNL_COPY_PACKET, 0xffff) < 0) {
    fprintf(stderr,"can't set packet_copy mode (%s)\n",strerror(errno));
    nfq_destroy_queue(pspec->qh); //release the queue before forgetting it
    queues.erase(spec.qnum);
  }  
}

void delete_queue(int qnum) {
 map<int,info>::iterator ii; 
 if ((ii=queues.find(qnum))!=queues.end()) {
   printf("destroing queue '%i'\n",ii->first);
   nfq_destroy_queue(ii->second.qh);
 }
}

void enable_access(int qnum,int order, bool enable=true) {
 char what;
 
 if (enable) what='A';
 else what='D';
 
 string cmd=string("iptables -") + what +" OUTPUT -d 192.168.1." + stringify(order)+  " -j NFQUEUE --queue-num "+stringify(qnum);
 puts(cmd.c_str());
 system(cmd.c_str());
 
}

//RUN TEST

#define TIMEOUT 10000

void run_test() {
 int i,n;
 map<int,info>::iterator ii; 
 n=0;
 
 force_core(-1,-1); 
 srand(time(0));
 
 for (i=0; i<QNUM; i++) { //init queues
  create_queue(QSTART+i);    
  enable_access(QSTART+i,i);
 }
 
 printf("Initialized %i queues\n",(int)queues.size());
 
 time_t now=time(0);
 time_t last=0;
 while(!TERM) {
 //creating and destroying queues 
  check_nfq(TIMEOUT);
  now=time(0);
  if (now<last+2) continue;
  last=now;
  int x=(int) (40.0+160.0/RAND_MAX*rand()); //random 40..200
//  printf("%i\n",x);
//  continue;
  if (x & 1) { //try to create if x is odd
   create_queue(x+QSTART);
   enable_access(QSTART+x,x);
   continue;
  }
  if (queues.size()<QNUM) continue;
  delete_queue(x+QSTART); //try to destroy if x is even
  enable_access(QSTART+x,x,false);
 }
 
 for (ii=queues.begin();ii!=queues.end(); ++ii) { //destroy queues
  delete_queue(ii->first);
  enable_access(ii->first,ii->first-QSTART,false);  
 }
 
}

int main(int argc, char **argv) {

 signal(SIGTERM,sig_term);
 signal(SIGINT,sig_term);
 
 try { 
  init_nfq();
 } 
 catch(const char *msg) {
  fprintf(stderr,"Fail to init nfq: %s\n",msg);  
  exit(-1);
 } 

 try { 
  run_test();
 } 
 catch(const char *msg) {
  fprintf(stderr,"Failure during test running: %s\n",msg);  
  exit(-1);
 } 
 
 
 signal(SIGTERM,SIG_DFL);
 signal(SIGINT,SIG_DFL);


#ifdef INSANE
	/* normally, applications SHOULD NOT issue this command, since
	 * it detaches other programs/sockets from AF_INET, too ! */
	printf("unbinding from AF_INET\n");
	nfq_unbind_pf(h, AF_INET);
#endif
 printf("closing library handle\n");
 nfq_close(h);
 
 return 0;
}

[-- Attachment #3: Makefile --]
[-- Type: application/octet-stream, Size: 221 bytes --]

#PATCHED_PATH=/usr/src/NETFILTER/libnetfilter_queue-0.0.16.patched/src/.libs/
PATCHED_PATH=/usr/src/lib.patched/

nftst: nftst.cpp
	g++ -o $@ $^ -L$(PATCHED_PATH) -Wl,-rpath=$(PATCHED_PATH)  -lnetfilter_queue -lnfnetlink


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024  entries, dropping packets(s). Dropped: 582) - bug or just some defaults  increase required?
  2009-02-16 14:38                     ` Anton VG
@ 2009-02-16 15:23                       ` Pablo Neira Ayuso
  2009-02-16 15:33                         ` Anton VG
  0 siblings, 1 reply; 23+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-16 15:23 UTC (permalink / raw)
  To: Anton VG; +Cc: netfilter-devel, Vitaly Bodzhgua

Anton VG wrote:
> Pablo,
> Attached is the code that triggers the case; it does not use
> threads (btw, we do of course use mutexes in the threaded app).
> 
> How to use it:
> First, the app creates 40 queues and attaches them to OUTPUT. Each of
> the first 40 queues is assigned the corresponding
> 192.168.1.{queue_num} IP address.
> This means, for instance, that when you send a file to the IP address
> 192.168.1.37, it flows through queue 37.
> 
> Then the app starts a loop in which it randomly creates and destroys
> extra queues (above 40) every second.
> 
> After starting the app, you need to send a big file, say 1GB, over FTP
> to another PC with an IP address from the group of the first 40; we used
> 192.168.1.37.
> 
> Somewhere in the middle of sending the file, it triggers the error on
> queue creation.
> 
> Please make sure that you use the OUTPUT chain and that you send the
> file from the test PC.
> With this test code we triggered this on both 64-bit and 32-bit systems.
> Kernel versions on the test PCs: 2.6.26.1 and 2.6.26.5.
> 
> Just let me know if anything needs clarification.

void init_nfq()
...
  if (nfqfd>0) {
    fcntl(nfqfd,F_SETFL,O_NONBLOCK);
  } else throw "fail to set nfq nfnl fd";
}

With the current interface of libnetfilter_queue, the queue creation 
must be blocking to ensure serialization. I'll document this. I can add 
some functions to allow non-blocking queue creation but that's a 
different point.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024 entries, dropping packets(s). Dropped: 582) - bug or just some defaults increase required?
  2009-02-16 15:23                       ` Pablo Neira Ayuso
@ 2009-02-16 15:33                         ` Anton VG
  2009-02-16 15:41                           ` Anton VG
  0 siblings, 1 reply; 23+ messages in thread
From: Anton VG @ 2009-02-16 15:33 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel, Vitaly Bodzhgua

Same result with this line commented out:

// fcntl(nfqfd,F_SETFL,O_NONBLOCK);

Or am I missing something, and there is a flag that
forces blocking operation?
Any suggestions?

if (nfqfd>0) {
 // fcntl(nfqfd,F_SETFL,O_NONBLOCK);
 } else throw "fail to set nfq nfnl fd";
}


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024 entries, dropping packets(s). Dropped: 582) - bug or just some defaults increase required?
  2009-02-16 15:33                         ` Anton VG
@ 2009-02-16 15:41                           ` Anton VG
  2009-02-17 16:58                             ` Anton VG
  0 siblings, 1 reply; 23+ messages in thread
From: Anton VG @ 2009-02-16 15:41 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel, Vitaly Bodzhgua

From the fcntl man page:

If the O_NONBLOCK flag is not enabled, then the system call
is blocked until the lock is removed or converted to a mode that is
compatible with the access.

So, just to clarify: commenting out the non-blocking fcntl() call does
not solve the problem. The error persists.

2009/2/16 Anton VG <anton.vazir@gmail.com>:
> The same stuff with commented out
>
> // fcntl(nfqfd,F_SETFL,O_NONBLOCK);
>>
>> With the current interface of libnetfilter_queue, the queue creation must be
>> blocking to ensure serialization. I'll document this. I can add some
>> functions to allow non-blocking queue creation but that's a different point.


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024 entries, dropping packets(s). Dropped: 582) - bug or just some defaults increase required?
  2009-02-16 15:41                           ` Anton VG
@ 2009-02-17 16:58                             ` Anton VG
  2009-02-17 17:15                               ` Pablo Neira Ayuso
  0 siblings, 1 reply; 23+ messages in thread
From: Anton VG @ 2009-02-17 16:58 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel, Vitaly Bodzhgua

Pablo,
A little update: I just tried the non-patched variant with a blocking
socket. The only difference is that it generated the error only once
and then hung (waiting for data).
Any update from you?


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024  entries, dropping packets(s). Dropped: 582) - bug or just some defaults  increase required?
  2009-02-17 16:58                             ` Anton VG
@ 2009-02-17 17:15                               ` Pablo Neira Ayuso
  2009-02-17 17:31                                 ` Anton VG
  2009-02-17 17:34                                 ` Anton VG
  0 siblings, 2 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-17 17:15 UTC (permalink / raw)
  To: Anton VG; +Cc: netfilter-devel, Vitaly Bodzhgua

[-- Attachment #1: Type: text/plain, Size: 1269 bytes --]

Anton VG wrote:
> Pablo,
> A little update: I just tried the non-patched variant with a blocking
> socket. The only difference is that it generated the error only once
> and then hung (waiting for data).
> Any update from you?

Yes, I got a trace of the problem (with blocking behaviour):

userspace                       kernelspace
create queue (seq=x) --->
add iptables rule    --->
                     <--- (seq=0) packet sent
verdict (seq=x+1)    --->
                     <--- (seq=0) packet sent
verdict (seq=x+2)    --->
                     <--- (seq=x) ACK message

Then it hits EILSEQ. The attached patch applies to libnfnetlink; it
sets the sequence number only for messages for which we expect an answer
from kernelspace. With it, I can hit ENOBUFS (that's normal), but not
EILSEQ anymore.
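
The trace can be replayed with a toy model of the sender-side counter.
This is an illustration only, not libnfnetlink code: SeqModel and
ack_accepted() are hypothetical names, and the model assumes that a
reply is accepted when its sequence number is zero or equals the most
recently issued one, which is what the trace implies:

```cpp
#include <cstdint>

// Toy model of the per-handle sequence counter (not libnfnetlink code).
struct SeqModel {
    uint32_t seq = 100;      // arbitrary starting value
    bool only_when_reply;    // true models the patched header filling

    explicit SeqModel(bool patched) : only_when_reply(patched) {}

    // Returns the sequence number stamped on an outgoing message.
    uint32_t send(bool expects_reply) {
        if (only_when_reply && !expects_reply)
            return 0;        // e.g. verdicts no longer consume a number
        return ++seq;
    }

    // Assumption: zero marks kernel-originated messages; anything else
    // must match the last number handed out.
    bool reply_in_sequence(uint32_t reply_seq) const {
        return reply_seq == 0 || reply_seq == seq;
    }
};

// Replays the trace: create queue, two verdicts, then the delayed ACK.
bool ack_accepted(bool patched) {
    SeqModel m(patched);
    uint32_t create = m.send(true);      // create queue (seq=x)
    m.send(false);                       // verdict
    m.send(false);                       // verdict
    return m.reply_in_sequence(create);  // ACK arrives carrying seq=x
}
```

In this model ack_accepted(false) is false, because the two verdicts
moved the counter past x before the ACK arrived, while
ack_accepted(true) is true.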

With non-blocking behaviour, you may still hit EILSEQ (even with the
patch applied) since the current API does not allow non-blocking queue
creation.

BTW, why not open one socket handle per queue? That will reduce the
chances of hitting ENOBUFS. The drawback is that you'll have a lot of
descriptors to handle in userspace (select() is probably not the best
choice anymore), but you get more netlink bandwidth in return.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

[-- Attachment #2: x --]
[-- Type: text/plain, Size: 642 bytes --]

diff --git a/src/libnfnetlink.c b/src/libnfnetlink.c
index d4212f9..5cfe2f5 100644
--- a/src/libnfnetlink.c
+++ b/src/libnfnetlink.c
@@ -418,7 +418,11 @@ void nfnl_fill_hdr(struct nfnl_subsys_handle *ssh,
 	nlh->nlmsg_type = (ssh->subsys_id<<8)|msg_type;
 	nlh->nlmsg_flags = msg_flags;
 	nlh->nlmsg_pid = 0;
-	nlh->nlmsg_seq = ++ssh->nfnlh->seq;
+	/* set sequence number if we expect an answer from kernelspace */
+	if (msg_flags & (NLM_F_ACK | NLM_F_ECHO | NLM_F_DUMP))
+		nlh->nlmsg_seq = ++ssh->nfnlh->seq;
+	else
+		nlh->nlmsg_seq = 0;
 
 	/* check for wraparounds: assume that seqnum 0 is only used by events */
 	if (!ssh->nfnlh->seq)


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024 entries, dropping packets(s). Dropped: 582) - bug or just some defaults increase required?
  2009-02-17 17:15                               ` Pablo Neira Ayuso
@ 2009-02-17 17:31                                 ` Anton VG
  2009-02-18  2:48                                   ` Amos Jeffries
  2009-02-17 17:34                                 ` Anton VG
  1 sibling, 1 reply; 23+ messages in thread
From: Anton VG @ 2009-02-17 17:31 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, Vitaly Bodzhgua

Pablo,

Thanks so much for the patch; I will test it shortly.

Do you have any suggestion as to which method to use instead of select(),
given that we have to handle potentially thousands of queues on a single PC?

Sincerely,
Anton.


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024 entries, dropping packets(s). Dropped: 582) - bug or just some defaults increase required?
  2009-02-17 17:15                               ` Pablo Neira Ayuso
  2009-02-17 17:31                                 ` Anton VG
@ 2009-02-17 17:34                                 ` Anton VG
  2009-02-17 19:51                                   ` Pablo Neira Ayuso
  1 sibling, 1 reply; 23+ messages in thread
From: Anton VG @ 2009-02-17 17:34 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, Vitaly Bodzhgua

As I understand it, the previous patch should also be applied?


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024  entries, dropping packets(s). Dropped: 582) - bug or just some defaults  increase required?
  2009-02-17 17:34                                 ` Anton VG
@ 2009-02-17 19:51                                   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 23+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-17 19:51 UTC (permalink / raw)
  To: Anton VG; +Cc: netfilter-devel, Vitaly Bodzhgua

[-- Attachment #1: Type: text/plain, Size: 385 bytes --]

Anton VG wrote:
> As I understand, the previous patch should be applied also?

While testing this a bit more, I noticed that there are more race
conditions in the sequence tracking that the previous patch cannot fix.
As a temporary workaround, sequence tracking has been disabled.

I'm going to commit the following patches.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

[-- Attachment #2: libnfnl.patch --]
[-- Type: text/x-diff, Size: 3771 bytes --]

nfnl: allow disabling and enabling sequence tracking

This patch adds a couple of functions to enable and disable netlink
sequence tracking. Since nfqueue goes over a unicast socket, the
same channel to receive control messages and packets is used. This
leads to race conditions that may trigger spurious out-of-sequence
errors while creating queues and receiving high load of packets at
the same time.

Reported-by: Anton Vazir <anton.vazir@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---

 configure.in                        |    2 +-
 include/libnfnetlink/libnfnetlink.h |    4 ++++
 src/libnfnetlink.c                  |   37 +++++++++++++++++++++++++++++++----
 3 files changed, 38 insertions(+), 5 deletions(-)


diff --git a/configure.in b/configure.in
index 27b00c2..f760cd0 100644
--- a/configure.in
+++ b/configure.in
@@ -1,6 +1,6 @@
 dnl Process this file with autoconf to create configure.
 
-AC_INIT(libnfnetlink, 0.0.40)
+AC_INIT(libnfnetlink, 0.0.41)
 
 AC_CANONICAL_SYSTEM
 
diff --git a/include/libnfnetlink/libnfnetlink.h b/include/libnfnetlink/libnfnetlink.h
index b2f3652..10b6478 100644
--- a/include/libnfnetlink/libnfnetlink.h
+++ b/include/libnfnetlink/libnfnetlink.h
@@ -60,6 +60,10 @@ extern struct nfnl_subsys_handle *nfnl_subsys_open(struct nfnl_handle *,
 						   unsigned int);
 extern void nfnl_subsys_close(struct nfnl_subsys_handle *);
 
+/* set and unset sequence tracking */
+void nfnl_set_sequence_tracking(struct nfnl_handle *h);
+void nfnl_unset_sequence_tracking(struct nfnl_handle *h);
+
 /* set receive buffer size (for nfnl_catch) */
 extern void nfnl_set_rcv_buffer_size(struct nfnl_handle *h, unsigned int size);
 
diff --git a/src/libnfnetlink.c b/src/libnfnetlink.c
index d4212f9..a836de1 100644
--- a/src/libnfnetlink.c
+++ b/src/libnfnetlink.c
@@ -78,6 +78,9 @@ struct nfnl_subsys_handle {
 };
 
 #define		NFNL_MAX_SUBSYS			16 /* enough for now */
+
+#define NFNL_F_SEQTRACK_ENABLED		(1 << 0)
+
 struct nfnl_handle {
 	int			fd;
 	struct sockaddr_nl	local;
@@ -86,6 +89,7 @@ struct nfnl_handle {
 	u_int32_t		seq;
 	u_int32_t		dump;
 	u_int32_t		rcv_buffer_size;	/* for nfnl_catch */
+	u_int32_t		flags;
 	struct nlmsghdr 	*last_nlhdr;
 	struct nfnl_subsys_handle subsys[NFNL_MAX_SUBSYS+1];
 };
@@ -202,6 +206,8 @@ struct nfnl_handle *nfnl_open(void)
 		errno = EINVAL;
 		goto err_close;
 	}
+	/* sequence tracking enabled by default */
+	nfnlh->flags |= NFNL_F_SEQTRACK_ENABLED;
 
 	return nfnlh;
 
@@ -213,6 +219,24 @@ err_free:
 }
 
 /**
+ * nfnl_set_sequence_tracking - set netlink sequence tracking
+ * @h: nfnetlink handler
+ */
+void nfnl_set_sequence_tracking(struct nfnl_handle *h)
+{
+	h->flags |= NFNL_F_SEQTRACK_ENABLED;
+}
+
+/**
+ * nfnl_unset_sequence_tracking - unset netlink sequence tracking
+ * @h: nfnetlink handler
+ */
+void nfnl_unset_sequence_tracking(struct nfnl_handle *h)
+{
+	h->flags &= ~NFNL_F_SEQTRACK_ENABLED;
+}
+
+/**
  * nfnl_set_rcv_buffer_size - set the size of the receive buffer
  * @h: libnfnetlink handler
  * @size: buffer size
@@ -418,11 +442,16 @@ void nfnl_fill_hdr(struct nfnl_subsys_handle *ssh,
 	nlh->nlmsg_type = (ssh->subsys_id<<8)|msg_type;
 	nlh->nlmsg_flags = msg_flags;
 	nlh->nlmsg_pid = 0;
-	nlh->nlmsg_seq = ++ssh->nfnlh->seq;
 
-	/* check for wraparounds: assume that seqnum 0 is only used by events */
-	if (!ssh->nfnlh->seq)
-		nlh->nlmsg_seq = ssh->nfnlh->seq = time(NULL);
+	if (ssh->nfnlh->flags & NFNL_F_SEQTRACK_ENABLED) {
+		nlh->nlmsg_seq = ++ssh->nfnlh->seq;
+		/* kernel uses sequence number zero for events */
+		if (!ssh->nfnlh->seq)
+			nlh->nlmsg_seq = ssh->nfnlh->seq = time(NULL);
+	} else {
+		/* unset sequence number, ignore it */
+		nlh->nlmsg_seq = 0;
+	}
 
 	nfg->nfgen_family = family;
 	nfg->version = NFNETLINK_V0;

[-- Attachment #3: libnfq.patch --]
[-- Type: text/x-diff, Size: 2256 bytes --]

nfq: replace nfnl_talk by nfnl_query and disable sequence tracking

This patch replaces the nfnl_talk() calls by the newer nfnl_query().
This patch also disables netlink sequence tracking by default.
Spurious race conditions in the sequence tracking may occur while
creating queues and receiving high load of packets at the same time.

Reported-by: Anton Vazir <anton.vazir@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---

 configure.in             |    2 +-
 src/libnetfilter_queue.c |    9 ++++++---
 2 files changed, 7 insertions(+), 4 deletions(-)


diff --git a/configure.in b/configure.in
index d3ce4a0..15e03a1 100644
--- a/configure.in
+++ b/configure.in
@@ -18,7 +18,7 @@ case $target in
 esac
 
 dnl Dependencies
-LIBNFNETLINK_REQUIRED=0.0.38
+LIBNFNETLINK_REQUIRED=0.0.41
  
 PKG_CHECK_MODULES(LIBNFNETLINK, libnfnetlink >= $LIBNFNETLINK_REQUIRED,,
 	AC_MSG_ERROR(Cannot find libnfnetlink >= $LIBNFNETLINK_REQUIRED))
diff --git a/src/libnetfilter_queue.c b/src/libnetfilter_queue.c
index 9e4903b..a2d0de2 100644
--- a/src/libnetfilter_queue.c
+++ b/src/libnetfilter_queue.c
@@ -141,7 +141,7 @@ __build_send_cfg_msg(struct nfq_handle *h, u_int8_t command,
 	cmd.pf = htons(pf);
 	nfnl_addattr_l(&u.nmh, sizeof(u), NFQA_CFG_CMD, &cmd, sizeof(cmd));
 
-	return nfnl_talk(h->nfnlh, &u.nmh, 0, 0, NULL, NULL, NULL);
+	return nfnl_query(h->nfnlh, &u.nmh);
 }
 
 static int __nfq_rcv_pkt(struct nlmsghdr *nlh, struct nfattr *nfa[],
@@ -295,6 +295,9 @@ struct nfq_handle *nfq_open(void)
 	if (!nfnlh)
 		return NULL;
 
+	/* unset netlink sequence tracking by default */
+	nfnl_unset_sequence_tracking(nfnlh);
+
 	qh = nfq_open_nfnl(nfnlh);
 	if (!qh)
 		nfnl_close(nfnlh);
@@ -553,7 +556,7 @@ int nfq_set_mode(struct nfq_q_handle *qh,
 	nfnl_addattr_l(&u.nmh, sizeof(u), NFQA_CFG_PARAMS, &params,
 			sizeof(params));
 
-	return nfnl_talk(qh->h->nfnlh, &u.nmh, 0, 0, NULL, NULL, NULL);
+	return nfnl_query(qh->h->nfnlh, &u.nmh);
 }
 
 /**
@@ -581,7 +584,7 @@ int nfq_set_queue_maxlen(struct nfq_q_handle *qh,
 	nfnl_addattr_l(&u.nmh, sizeof(u), NFQA_CFG_QUEUE_MAXLEN, &queue_maxlen,
 			sizeof(queue_maxlen));
 
-	return nfnl_talk(qh->h->nfnlh, &u.nmh, 0, 0, NULL, NULL, NULL);
+	return nfnl_query(qh->h->nfnlh, &u.nmh);
 }
 
 /**


* Re: (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024  entries, dropping packets(s). Dropped: 582) - bug or just some defaults  increase required?
  2009-02-17 17:31                                 ` Anton VG
@ 2009-02-18  2:48                                   ` Amos Jeffries
  0 siblings, 0 replies; 23+ messages in thread
From: Amos Jeffries @ 2009-02-18  2:48 UTC (permalink / raw)
  To: Anton VG; +Cc: netfilter-devel

Anton VG wrote:
> Pablo,
> 
> Thanks so much for the patch; I will test it shortly.
> 
> Do you have any suggestion as to which method to use instead of select(),
> given that we have to handle potentially thousands of queues on a single PC?
> 
> Sincerely,
> Anton.

epoll and, where available, kqueue seem to be faster than select() and
work well with non-blocking descriptors.
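
A minimal sketch of the epoll approach on Linux, assuming one descriptor
per queue has already been opened. make_epoll() and wait_readable() are
hypothetical helper names; only epoll_create1(), epoll_ctl() and
epoll_wait() are assumed to be available:

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <vector>

// Registers every descriptor for read-readiness.
// Returns the epoll descriptor, or -1 on error.
int make_epoll(const std::vector<int>& fds) {
    int epfd = epoll_create1(0);
    if (epfd < 0) return -1;
    for (int fd : fds) {
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.fd = fd;
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
            close(epfd);
            return -1;
        }
    }
    return epfd;
}

// Waits up to timeout_ms milliseconds (-1 blocks indefinitely) and
// returns the descriptors that have data to read. An error or a
// timeout yields an empty vector.
std::vector<int> wait_readable(int epfd, int timeout_ms) {
    epoll_event events[64];
    std::vector<int> ready;
    int n = epoll_wait(epfd, events, 64, timeout_ms);
    for (int i = 0; i < n; ++i)
        if (events[i].events & EPOLLIN)
            ready.push_back(events[i].data.fd);
    return ready;
}
```

Unlike select(), the set of descriptors is registered once, and each
wait returns only the ready ones, so the cost of a wait does not grow
with the number of idle queues.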

AYJ



Thread overview: 23+ messages
2009-02-06 18:14 (nfnl_talk: recvmsg over-run) and (nf_queue: full at 1024 entries, dropping packets(s). Dropped: 582) - bug or just some defaults increase required? Anton VG
2009-02-08  1:34 ` Pablo Neira Ayuso
2009-02-09 10:56   ` Anton
2009-02-09 11:20     ` Pablo Neira Ayuso
2009-02-11  8:48       ` Anton
     [not found]       ` <49928B62.1090600@netfilter.org>
2009-02-11 12:26         ` Anton VG
2009-02-11 16:41           ` Pablo Neira Ayuso
2009-02-12 10:45             ` Anton
2009-02-12 12:43               ` Pablo Neira Ayuso
2009-02-14  9:03                 ` Anton
2009-02-14 17:13               ` Pablo Neira Ayuso
2009-02-16 13:19                 ` Anton
2009-02-16 13:42                   ` Pablo Neira Ayuso
2009-02-16 14:38                     ` Anton VG
2009-02-16 15:23                       ` Pablo Neira Ayuso
2009-02-16 15:33                         ` Anton VG
2009-02-16 15:41                           ` Anton VG
2009-02-17 16:58                             ` Anton VG
2009-02-17 17:15                               ` Pablo Neira Ayuso
2009-02-17 17:31                                 ` Anton VG
2009-02-18  2:48                                   ` Amos Jeffries
2009-02-17 17:34                                 ` Anton VG
2009-02-17 19:51                                   ` Pablo Neira Ayuso
