All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
@ 2011-01-23 23:16 Pablo Neira Ayuso
  2011-01-24  6:41 ` Eric Dumazet
  2011-02-20 20:48 ` Stephen Hemminger
  0 siblings, 2 replies; 17+ messages in thread
From: Pablo Neira Ayuso @ 2011-01-23 23:16 UTC (permalink / raw)
  To: netfilter-devel; +Cc: kaber

In 13ee6ac579574a2a95e982b19920fd2495dce8cd, we recovered spinlocks
to protect the dump of the conntrack table according to reports from
Stephen and acknowledgments on the issue from Eric.

However, Stephen removed the refcount bump in that patch that allows
to keep a reference to the current ct object we are interating over.
That code avoids race conditions between ct object destruction and
the iteration itself. This patch reintroduces these lines since the
ct object may vanish between two recvmgs() invocations.

This patch fixes ocasional crashes while dumping the conntrack table
intensively.

Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_netlink.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 93297aa..519e245 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -654,14 +654,16 @@ restart:
 			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = nf_ct_tuplehash_to_ctrack(h);
+			if (!atomic_inc_not_zero(&ct->ct_general.use))
+				continue;
 			/* Dump entries of a given L3 protocol number.
 			 * If it is not specified, ie. l3proto == 0,
 			 * then dump everything. */
 			if (l3proto && nf_ct_l3num(ct) != l3proto)
-				continue;
+				goto releasect;
 			if (cb->args[1]) {
 				if (ct != last)
-					continue;
+					goto releasect;
 				cb->args[1] = 0;
 			}
 			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
@@ -679,6 +681,8 @@ restart:
 				if (acct)
 					memset(acct, 0, sizeof(struct nf_conn_counter[IP_CT_DIR_MAX]));
 			}
+releasect:
+		nf_ct_put(ct);
 		}
 		if (cb->args[1]) {
 			cb->args[1] = 0;


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
  2011-01-23 23:16 [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy Pablo Neira Ayuso
@ 2011-01-24  6:41 ` Eric Dumazet
  2011-01-24  7:04   ` Eric Dumazet
  2011-01-24 11:57   ` Pablo Neira Ayuso
  2011-02-20 20:48 ` Stephen Hemminger
  1 sibling, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2011-01-24  6:41 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, kaber, Stephen Hemminger

Le lundi 24 janvier 2011 à 00:16 +0100, Pablo Neira Ayuso a écrit :
> In 13ee6ac579574a2a95e982b19920fd2495dce8cd, we recovered spinlocks
> to protect the dump of the conntrack table according to reports from
> Stephen and acknowledgments on the issue from Eric.
> 

When a commit is referred, always give its title, and you can use a
short id (12 digits are OK)

In 13ee6ac57957 (netfilter: fix race in conntrack between dump_table and
destroy)

> However, Stephen removed the refcount bump in that patch that allows
> to keep a reference to the current ct object we are interating over.
> That code avoids race conditions between ct object destruction and
> the iteration itself. This patch reintroduces these lines since the
> ct object may vanish between two recvmgs() invocations.
> 
> This patch fixes ocasional crashes while dumping the conntrack table
> intensively.
> 
> Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>

Thats not true, Stephen was not in CC on your mail

I add him right now.

> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
>  net/netfilter/nf_conntrack_netlink.c |    8 ++++++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index 93297aa..519e245 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -654,14 +654,16 @@ restart:
>  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
>  				continue;
>  			ct = nf_ct_tuplehash_to_ctrack(h);
> +			if (!atomic_inc_not_zero(&ct->ct_general.use))
> +				continue;
>  			/* Dump entries of a given L3 protocol number.
>  			 * If it is not specified, ie. l3proto == 0,
>  			 * then dump everything. */
>  			if (l3proto && nf_ct_l3num(ct) != l3proto)
> -				continue;
> +				goto releasect;
>  			if (cb->args[1]) {
>  				if (ct != last)
> -					continue;
> +					goto releasect;
>  				cb->args[1] = 0;
>  			}
>  			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
> @@ -679,6 +681,8 @@ restart:
>  				if (acct)
>  					memset(acct, 0, sizeof(struct nf_conn_counter[IP_CT_DIR_MAX]));
>  			}
> +releasect:
> +		nf_ct_put(ct);
>  		}
>  		if (cb->args[1]) {
>  			cb->args[1] = 0;
> 


Hmm...

Only the RCU lookup should need this extra check (atomic_inc_not_zero),
not a writer. (And a dumper is a "writer" since it holds nf_conntrack
spinlock)

In Stephen commit, we switched from RCU lookup to traditional one
(spinlock protected), so, we should not need to touch individual objects
refcount ?

I feel the right fix is not to increment refcount then decrement it.

Only to _test_ it being zero, and not dumping the ct, eventually ?


diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 93297aa..a977cc7 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -654,6 +654,8 @@ restart:
 			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = nf_ct_tuplehash_to_ctrack(h);
+			if (!atomic_read(&ct->ct_general.use))
+				continue;
 			/* Dump entries of a given L3 protocol number.
 			 * If it is not specified, ie. l3proto == 0,
 			 * then dump everything. */


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
  2011-01-24  6:41 ` Eric Dumazet
@ 2011-01-24  7:04   ` Eric Dumazet
  2011-01-24 11:57   ` Pablo Neira Ayuso
  1 sibling, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2011-01-24  7:04 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, kaber, Stephen Hemminger

Le lundi 24 janvier 2011 à 07:41 +0100, Eric Dumazet a écrit :
> Le lundi 24 janvier 2011 à 00:16 +0100, Pablo Neira Ayuso a écrit :
> > In 13ee6ac579574a2a95e982b19920fd2495dce8cd, we recovered spinlocks
> > to protect the dump of the conntrack table according to reports from
> > Stephen and acknowledgments on the issue from Eric.
> > 
> 
> When a commit is referred, always give its title, and you can use a
> short id (12 digits are OK)
> 
> In 13ee6ac57957 (netfilter: fix race in conntrack between dump_table and
> destroy)
> 
> > However, Stephen removed the refcount bump in that patch that allows
> > to keep a reference to the current ct object we are interating over.
> > That code avoids race conditions between ct object destruction and
> > the iteration itself. This patch reintroduces these lines since the
> > ct object may vanish between two recvmgs() invocations.
> > 
> > This patch fixes ocasional crashes while dumping the conntrack table
> > intensively.
> > 
> > Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>
> 
> Thats not true, Stephen was not in CC on your mail
> 
> I add him right now.
> 
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
> >  net/netfilter/nf_conntrack_netlink.c |    8 ++++++--
> >  1 files changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> > index 93297aa..519e245 100644
> > --- a/net/netfilter/nf_conntrack_netlink.c
> > +++ b/net/netfilter/nf_conntrack_netlink.c
> > @@ -654,14 +654,16 @@ restart:
> >  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
> >  				continue;
> >  			ct = nf_ct_tuplehash_to_ctrack(h);
> > +			if (!atomic_inc_not_zero(&ct->ct_general.use))
> > +				continue;
> >  			/* Dump entries of a given L3 protocol number.
> >  			 * If it is not specified, ie. l3proto == 0,
> >  			 * then dump everything. */
> >  			if (l3proto && nf_ct_l3num(ct) != l3proto)
> > -				continue;
> > +				goto releasect;
> >  			if (cb->args[1]) {
> >  				if (ct != last)
> > -					continue;
> > +					goto releasect;
> >  				cb->args[1] = 0;
> >  			}
> >  			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
> > @@ -679,6 +681,8 @@ restart:
> >  				if (acct)
> >  					memset(acct, 0, sizeof(struct nf_conn_counter[IP_CT_DIR_MAX]));
> >  			}
> > +releasect:
> > +		nf_ct_put(ct);
> >  		}
> >  		if (cb->args[1]) {
> >  			cb->args[1] = 0;
> > 
> 
> 
> Hmm...
> 
> Only the RCU lookup should need this extra check (atomic_inc_not_zero),
> not a writer. (And a dumper is a "writer" since it holds nf_conntrack
> spinlock)
> 
> In Stephen commit, we switched from RCU lookup to traditional one
> (spinlock protected), so, we should not need to touch individual objects
> refcount ?
> 
> I feel the right fix is not to increment refcount then decrement it.
> 
> Only to _test_ it being zero, and not dumping the ct, eventually ?
> 
> 
> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index 93297aa..a977cc7 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -654,6 +654,8 @@ restart:
>  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
>  				continue;
>  			ct = nf_ct_tuplehash_to_ctrack(h);
> +			if (!atomic_read(&ct->ct_general.use))
> +				continue;
>  			/* Dump entries of a given L3 protocol number.
>  			 * If it is not specified, ie. l3proto == 0,
>  			 * then dump everything. */
> 


I read ea781f197d6a (netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU
and get rid of call_rcu()) and can see previous code only had the
spinlock, not the refcount inc/dec.

Are you saying that my commit included a bug fix ? ;)



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
  2011-01-24  6:41 ` Eric Dumazet
  2011-01-24  7:04   ` Eric Dumazet
@ 2011-01-24 11:57   ` Pablo Neira Ayuso
  2011-01-24 12:46     ` Patrick McHardy
  1 sibling, 1 reply; 17+ messages in thread
From: Pablo Neira Ayuso @ 2011-01-24 11:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netfilter-devel, kaber, Stephen Hemminger

[-- Attachment #1: Type: text/plain, Size: 3886 bytes --]

On 24/01/11 07:41, Eric Dumazet wrote:
> Le lundi 24 janvier 2011 à 00:16 +0100, Pablo Neira Ayuso a écrit :
>> In 13ee6ac579574a2a95e982b19920fd2495dce8cd, we recovered spinlocks
>> to protect the dump of the conntrack table according to reports from
>> Stephen and acknowledgments on the issue from Eric.
> 
> When a commit is referred, always give its title, and you can use a
> short id (12 digits are OK)

I'm using the id that git log --oneline shows, is it OK?

> In 13ee6ac57957 (netfilter: fix race in conntrack between dump_table and
> destroy)
> 
>> However, Stephen removed the refcount bump in that patch that allows
>> to keep a reference to the current ct object we are interating over.
>> That code avoids race conditions between ct object destruction and
>> the iteration itself. This patch reintroduces these lines since the
>> ct object may vanish between two recvmgs() invocations.
>>
>> This patch fixes ocasional crashes while dumping the conntrack table
>> intensively.
>>
>> Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>
> 
> Thats not true, Stephen was not in CC on your mail

I'm sorry, I forgot to add --cc to stgit.

> I add him right now.
> 
>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>> ---
>>  net/netfilter/nf_conntrack_netlink.c |    8 ++++++--
>>  1 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
>> index 93297aa..519e245 100644
>> --- a/net/netfilter/nf_conntrack_netlink.c
>> +++ b/net/netfilter/nf_conntrack_netlink.c
>> @@ -654,14 +654,16 @@ restart:
>>  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
>>  				continue;
>>  			ct = nf_ct_tuplehash_to_ctrack(h);
>> +			if (!atomic_inc_not_zero(&ct->ct_general.use))
>> +				continue;
>>  			/* Dump entries of a given L3 protocol number.
>>  			 * If it is not specified, ie. l3proto == 0,
>>  			 * then dump everything. */
>>  			if (l3proto && nf_ct_l3num(ct) != l3proto)
>> -				continue;
>> +				goto releasect;
>>  			if (cb->args[1]) {
>>  				if (ct != last)
>> -					continue;
>> +					goto releasect;
>>  				cb->args[1] = 0;
>>  			}
>>  			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
>> @@ -679,6 +681,8 @@ restart:
>>  				if (acct)
>>  					memset(acct, 0, sizeof(struct nf_conn_counter[IP_CT_DIR_MAX]));
>>  			}
>> +releasect:
>> +		nf_ct_put(ct);
>>  		}
>>  		if (cb->args[1]) {
>>  			cb->args[1] = 0;
>>
> 
> 
> Hmm...
> 
> Only the RCU lookup should need this extra check (atomic_inc_not_zero),
> not a writer. (And a dumper is a "writer" since it holds nf_conntrack
> spinlock)
> 
> In Stephen commit, we switched from RCU lookup to traditional one
> (spinlock protected), so, we should not need to touch individual objects
> refcount ?
> 
> I feel the right fix is not to increment refcount then decrement it.
> 
> Only to _test_ it being zero, and not dumping the ct, eventually ?
> 
> 
> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index 93297aa..a977cc7 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -654,6 +654,8 @@ restart:
>  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
>  				continue;
>  			ct = nf_ct_tuplehash_to_ctrack(h);
> +			if (!atomic_read(&ct->ct_general.use))
> +				continue;
>  			/* Dump entries of a given L3 protocol number.
>  			 * If it is not specified, ie. l3proto == 0,
>  			 * then dump everything. */

No, that won't work.

Sorry, I didn't explain the problem appropriately. We still do a
nf_ct_put() for last ct. In that patch, you forgot to add the refcount
bump that we need to keep the reference to the object.

The following patch attached is smaller, it also fixes the problem and
we don't have to increase the refcount for each object. Now it looks
similar to how it was before the RCU patches.

[-- Attachment #2: fix-ct-dump.patch --]
[-- Type: text/x-patch, Size: 1329 bytes --]

netfilter: ctnetlink: fix missing refcount increment during dumps

From: Pablo Neira Ayuso <pablo@netfilter.org>

In 13ee6ac netfilter: fix race in conntrack between dump_table and
destroy, we recovered spinlocks to protect the dump of the conntrack
table according to reports from Stephen and acknowledgments on the
issue from Eric.

In that patch, the refcount bump that allows to keep a reference
to the current ct object was removed. However, we still decrement
the refcount for that object in the output path of
ctnetlink_dump_table():

        if (last)
                nf_ct_put(last)

Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_netlink.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 93297aa..6dd9296 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -667,6 +667,8 @@ restart:
 			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
 						cb->nlh->nlmsg_seq,
 						IPCTNL_MSG_CT_NEW, ct) < 0) {
+				if (!atomic_inc_not_zero(&ct->ct_general.use))
+					continue;
 				cb->args[1] = (unsigned long)ct;
 				goto out;
 			}

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
  2011-01-24 11:57   ` Pablo Neira Ayuso
@ 2011-01-24 12:46     ` Patrick McHardy
  2011-01-24 12:54       ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Patrick McHardy @ 2011-01-24 12:46 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Eric Dumazet, netfilter-devel, Stephen Hemminger

On 24.01.2011 12:57, Pablo Neira Ayuso wrote:
> On 24/01/11 07:41, Eric Dumazet wrote:
>> Le lundi 24 janvier 2011 à 00:16 +0100, Pablo Neira Ayuso a écrit :
>>> In 13ee6ac579574a2a95e982b19920fd2495dce8cd, we recovered spinlocks
>>> to protect the dump of the conntrack table according to reports from
>>> Stephen and acknowledgments on the issue from Eric.
>>
>> When a commit is referred, always give its title, and you can use a
>> short id (12 digits are OK)
> 
> I'm using the id that git log --oneline shows, is it OK?
> 
>> In 13ee6ac57957 (netfilter: fix race in conntrack between dump_table and
>> destroy)
>>
>>> However, Stephen removed the refcount bump in that patch that allows
>>> to keep a reference to the current ct object we are interating over.
>>> That code avoids race conditions between ct object destruction and
>>> the iteration itself. This patch reintroduces these lines since the
>>> ct object may vanish between two recvmgs() invocations.
>>>
>>> This patch fixes ocasional crashes while dumping the conntrack table
>>> intensively.
>>>
>>> Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>
>>
>> Thats not true, Stephen was not in CC on your mail
> 
> I'm sorry, I forgot to add --cc to stgit.
> 
>> I add him right now.
>>
>>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>>> ---
>>>  net/netfilter/nf_conntrack_netlink.c |    8 ++++++--
>>>  1 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
>>> index 93297aa..519e245 100644
>>> --- a/net/netfilter/nf_conntrack_netlink.c
>>> +++ b/net/netfilter/nf_conntrack_netlink.c
>>> @@ -654,14 +654,16 @@ restart:
>>>  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
>>>  				continue;
>>>  			ct = nf_ct_tuplehash_to_ctrack(h);
>>> +			if (!atomic_inc_not_zero(&ct->ct_general.use))
>>> +				continue;
>>>  			/* Dump entries of a given L3 protocol number.
>>>  			 * If it is not specified, ie. l3proto == 0,
>>>  			 * then dump everything. */
>>>  			if (l3proto && nf_ct_l3num(ct) != l3proto)
>>> -				continue;
>>> +				goto releasect;
>>>  			if (cb->args[1]) {
>>>  				if (ct != last)
>>> -					continue;
>>> +					goto releasect;
>>>  				cb->args[1] = 0;
>>>  			}
>>>  			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
>>> @@ -679,6 +681,8 @@ restart:
>>>  				if (acct)
>>>  					memset(acct, 0, sizeof(struct nf_conn_counter[IP_CT_DIR_MAX]));
>>>  			}
>>> +releasect:
>>> +		nf_ct_put(ct);
>>>  		}
>>>  		if (cb->args[1]) {
>>>  			cb->args[1] = 0;
>>>
>>
>>
>> Hmm...
>>
>> Only the RCU lookup should need this extra check (atomic_inc_not_zero),
>> not a writer. (And a dumper is a "writer" since it holds nf_conntrack
>> spinlock)
>>
>> In Stephen commit, we switched from RCU lookup to traditional one
>> (spinlock protected), so, we should not need to touch individual objects
>> refcount ?
>>
>> I feel the right fix is not to increment refcount then decrement it.
>>
>> Only to _test_ it being zero, and not dumping the ct, eventually ?
>>
>>
>> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
>> index 93297aa..a977cc7 100644
>> --- a/net/netfilter/nf_conntrack_netlink.c
>> +++ b/net/netfilter/nf_conntrack_netlink.c
>> @@ -654,6 +654,8 @@ restart:
>>  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
>>  				continue;
>>  			ct = nf_ct_tuplehash_to_ctrack(h);
>> +			if (!atomic_read(&ct->ct_general.use))
>> +				continue;
>>  			/* Dump entries of a given L3 protocol number.
>>  			 * If it is not specified, ie. l3proto == 0,
>>  			 * then dump everything. */
> 
> No, that won't work.
> 
> Sorry, I didn't explain the problem appropriately. We still do a
> nf_ct_put() for last ct. In that patch, you forgot to add the refcount
> bump that we need to keep the reference to the object.
> 
> The following patch attached is smaller, it also fixes the problem and
> we don't have to increase the refcount for each object. Now it looks
> similar to how it was before the RCU patches.

This looks correct to me, we need to keep a reference for continuations.
Eric, do you agree with this patch?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
  2011-01-24 12:46     ` Patrick McHardy
@ 2011-01-24 12:54       ` Eric Dumazet
  2011-01-24 13:06         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2011-01-24 12:54 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Pablo Neira Ayuso, netfilter-devel, Stephen Hemminger

Le lundi 24 janvier 2011 à 13:46 +0100, Patrick McHardy a écrit :
> On 24.01.2011 12:57, Pablo Neira Ayuso wrote:
> > On 24/01/11 07:41, Eric Dumazet wrote:
> >> Le lundi 24 janvier 2011 à 00:16 +0100, Pablo Neira Ayuso a écrit :
> >>> In 13ee6ac579574a2a95e982b19920fd2495dce8cd, we recovered spinlocks
> >>> to protect the dump of the conntrack table according to reports from
> >>> Stephen and acknowledgments on the issue from Eric.
> >>
> >> When a commit is referred, always give its title, and you can use a
> >> short id (12 digits are OK)
> > 
> > I'm using the id that git log --oneline shows, is it OK?
> > 
> >> In 13ee6ac57957 (netfilter: fix race in conntrack between dump_table and
> >> destroy)
> >>
> >>> However, Stephen removed the refcount bump in that patch that allows
> >>> to keep a reference to the current ct object we are interating over.
> >>> That code avoids race conditions between ct object destruction and
> >>> the iteration itself. This patch reintroduces these lines since the
> >>> ct object may vanish between two recvmgs() invocations.
> >>>
> >>> This patch fixes ocasional crashes while dumping the conntrack table
> >>> intensively.
> >>>
> >>> Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>
> >>
> >> Thats not true, Stephen was not in CC on your mail
> > 
> > I'm sorry, I forgot to add --cc to stgit.
> > 
> >> I add him right now.
> >>
> >>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> >>> ---
> >>>  net/netfilter/nf_conntrack_netlink.c |    8 ++++++--
> >>>  1 files changed, 6 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> >>> index 93297aa..519e245 100644
> >>> --- a/net/netfilter/nf_conntrack_netlink.c
> >>> +++ b/net/netfilter/nf_conntrack_netlink.c
> >>> @@ -654,14 +654,16 @@ restart:
> >>>  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
> >>>  				continue;
> >>>  			ct = nf_ct_tuplehash_to_ctrack(h);
> >>> +			if (!atomic_inc_not_zero(&ct->ct_general.use))
> >>> +				continue;
> >>>  			/* Dump entries of a given L3 protocol number.
> >>>  			 * If it is not specified, ie. l3proto == 0,
> >>>  			 * then dump everything. */
> >>>  			if (l3proto && nf_ct_l3num(ct) != l3proto)
> >>> -				continue;
> >>> +				goto releasect;
> >>>  			if (cb->args[1]) {
> >>>  				if (ct != last)
> >>> -					continue;
> >>> +					goto releasect;
> >>>  				cb->args[1] = 0;
> >>>  			}
> >>>  			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
> >>> @@ -679,6 +681,8 @@ restart:
> >>>  				if (acct)
> >>>  					memset(acct, 0, sizeof(struct nf_conn_counter[IP_CT_DIR_MAX]));
> >>>  			}
> >>> +releasect:
> >>> +		nf_ct_put(ct);
> >>>  		}
> >>>  		if (cb->args[1]) {
> >>>  			cb->args[1] = 0;
> >>>
> >>
> >>
> >> Hmm...
> >>
> >> Only the RCU lookup should need this extra check (atomic_inc_not_zero),
> >> not a writer. (And a dumper is a "writer" since it holds nf_conntrack
> >> spinlock)
> >>
> >> In Stephen commit, we switched from RCU lookup to traditional one
> >> (spinlock protected), so, we should not need to touch individual objects
> >> refcount ?
> >>
> >> I feel the right fix is not to increment refcount then decrement it.
> >>
> >> Only to _test_ it being zero, and not dumping the ct, eventually ?
> >>
> >>
> >> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> >> index 93297aa..a977cc7 100644
> >> --- a/net/netfilter/nf_conntrack_netlink.c
> >> +++ b/net/netfilter/nf_conntrack_netlink.c
> >> @@ -654,6 +654,8 @@ restart:
> >>  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
> >>  				continue;
> >>  			ct = nf_ct_tuplehash_to_ctrack(h);
> >> +			if (!atomic_read(&ct->ct_general.use))
> >> +				continue;
> >>  			/* Dump entries of a given L3 protocol number.
> >>  			 * If it is not specified, ie. l3proto == 0,
> >>  			 * then dump everything. */
> > 
> > No, that won't work.
> > 
> > Sorry, I didn't explain the problem appropriately. We still do a
> > nf_ct_put() for last ct. In that patch, you forgot to add the refcount
> > bump that we need to keep the reference to the object.
> > 
> > The following patch attached is smaller, it also fixes the problem and
> > we don't have to increase the refcount for each object. Now it looks
> > similar to how it was before the RCU patches.
> 
> This looks correct to me, we need to keep a reference for continuations.
> Eric, do you agree with this patch?

Taking a reference I agree for sure, this is the full
atomic_inc_not_zero(&ct->ct_general.use) I disagree.

Pablo, the refcount cannot be 0 at this point, or something is really
wrong ?

Why not using

atomic_inc(&ct->ct_general.use);

?


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
  2011-01-24 12:54       ` Eric Dumazet
@ 2011-01-24 13:06         ` Pablo Neira Ayuso
  2011-01-24 13:12           ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Pablo Neira Ayuso @ 2011-01-24 13:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, netfilter-devel, Stephen Hemminger

[-- Attachment #1: Type: text/plain, Size: 4837 bytes --]

On 24/01/11 13:54, Eric Dumazet wrote:
> Le lundi 24 janvier 2011 à 13:46 +0100, Patrick McHardy a écrit :
>> On 24.01.2011 12:57, Pablo Neira Ayuso wrote:
>>> On 24/01/11 07:41, Eric Dumazet wrote:
>>>> Le lundi 24 janvier 2011 à 00:16 +0100, Pablo Neira Ayuso a écrit :
>>>>> In 13ee6ac579574a2a95e982b19920fd2495dce8cd, we recovered spinlocks
>>>>> to protect the dump of the conntrack table according to reports from
>>>>> Stephen and acknowledgments on the issue from Eric.
>>>>
>>>> When a commit is referred, always give its title, and you can use a
>>>> short id (12 digits are OK)
>>>
>>> I'm using the id that git log --oneline shows, is it OK?
>>>
>>>> In 13ee6ac57957 (netfilter: fix race in conntrack between dump_table and
>>>> destroy)
>>>>
>>>>> However, Stephen removed the refcount bump in that patch that allows
>>>>> to keep a reference to the current ct object we are interating over.
>>>>> That code avoids race conditions between ct object destruction and
>>>>> the iteration itself. This patch reintroduces these lines since the
>>>>> ct object may vanish between two recvmgs() invocations.
>>>>>
>>>>> This patch fixes ocasional crashes while dumping the conntrack table
>>>>> intensively.
>>>>>
>>>>> Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>
>>>>
>>>> Thats not true, Stephen was not in CC on your mail
>>>
>>> I'm sorry, I forgot to add --cc to stgit.
>>>
>>>> I add him right now.
>>>>
>>>>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>>>>> ---
>>>>>  net/netfilter/nf_conntrack_netlink.c |    8 ++++++--
>>>>>  1 files changed, 6 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
>>>>> index 93297aa..519e245 100644
>>>>> --- a/net/netfilter/nf_conntrack_netlink.c
>>>>> +++ b/net/netfilter/nf_conntrack_netlink.c
>>>>> @@ -654,14 +654,16 @@ restart:
>>>>>  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
>>>>>  				continue;
>>>>>  			ct = nf_ct_tuplehash_to_ctrack(h);
>>>>> +			if (!atomic_inc_not_zero(&ct->ct_general.use))
>>>>> +				continue;
>>>>>  			/* Dump entries of a given L3 protocol number.
>>>>>  			 * If it is not specified, ie. l3proto == 0,
>>>>>  			 * then dump everything. */
>>>>>  			if (l3proto && nf_ct_l3num(ct) != l3proto)
>>>>> -				continue;
>>>>> +				goto releasect;
>>>>>  			if (cb->args[1]) {
>>>>>  				if (ct != last)
>>>>> -					continue;
>>>>> +					goto releasect;
>>>>>  				cb->args[1] = 0;
>>>>>  			}
>>>>>  			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
>>>>> @@ -679,6 +681,8 @@ restart:
>>>>>  				if (acct)
>>>>>  					memset(acct, 0, sizeof(struct nf_conn_counter[IP_CT_DIR_MAX]));
>>>>>  			}
>>>>> +releasect:
>>>>> +		nf_ct_put(ct);
>>>>>  		}
>>>>>  		if (cb->args[1]) {
>>>>>  			cb->args[1] = 0;
>>>>>
>>>>
>>>>
>>>> Hmm...
>>>>
>>>> Only the RCU lookup should need this extra check (atomic_inc_not_zero),
>>>> not a writer. (And a dumper is a "writer" since it holds nf_conntrack
>>>> spinlock)
>>>>
>>>> In Stephen commit, we switched from RCU lookup to traditional one
>>>> (spinlock protected), so, we should not need to touch individual objects
>>>> refcount ?
>>>>
>>>> I feel the right fix is not to increment refcount then decrement it.
>>>>
>>>> Only to _test_ it being zero, and not dumping the ct, eventually ?
>>>>
>>>>
>>>> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
>>>> index 93297aa..a977cc7 100644
>>>> --- a/net/netfilter/nf_conntrack_netlink.c
>>>> +++ b/net/netfilter/nf_conntrack_netlink.c
>>>> @@ -654,6 +654,8 @@ restart:
>>>>  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
>>>>  				continue;
>>>>  			ct = nf_ct_tuplehash_to_ctrack(h);
>>>> +			if (!atomic_read(&ct->ct_general.use))
>>>> +				continue;
>>>>  			/* Dump entries of a given L3 protocol number.
>>>>  			 * If it is not specified, ie. l3proto == 0,
>>>>  			 * then dump everything. */
>>>
>>> No, that won't work.
>>>
>>> Sorry, I didn't explain the problem appropriately. We still do a
>>> nf_ct_put() for last ct. In that patch, you forgot to add the refcount
>>> bump that we need to keep the reference to the object.
>>>
>>> The following patch attached is smaller, it also fixes the problem and
>>> we don't have to increase the refcount for each object. Now it looks
>>> similar to how it was before the RCU patches.
>>
>> This looks correct to me, we need to keep a reference for continuations.
>> Eric, do you agree with this patch?
> 
> Taking a reference I agree for sure, this is the full
> atomic_inc_not_zero(&ct->ct_general.use) I disagree.
> 
> Pablo, the refcount cannot be 0 at this point, or something is really
> wrong ?
> 
> Why not using
> 
> atomic_inc(&ct->ct_general.use);
> 
> ?

Yes, we can use nf_conntrack_get (which does atomic_inc) instead. New
patch attached.

[-- Attachment #2: fix-ct-dump.patch --]
[-- Type: text/x-patch, Size: 1300 bytes --]

netfilter: ctnetlink: fix missing refcount increment during dumps

From: Pablo Neira Ayuso <pablo@netfilter.org>

In 13ee6ac netfilter: fix race in conntrack between dump_table and
destroy, we recovered spinlocks to protect the dump of the conntrack
table according to reports from Stephen and acknowledgments on the
issue from Eric.

In that patch, the refcount bump that allows to keep a reference
to the current ct object was removed. However, we still decrement
the refcount for that object in the output path of
ctnetlink_dump_table():

        if (last)
                nf_ct_put(last)

Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_netlink.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 93297aa..eead9db 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -667,6 +667,7 @@ restart:
 			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
 						cb->nlh->nlmsg_seq,
 						IPCTNL_MSG_CT_NEW, ct) < 0) {
+				nf_conntrack_get(&ct->ct_general);
 				cb->args[1] = (unsigned long)ct;
 				goto out;
 			}

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
  2011-01-24 13:06         ` Pablo Neira Ayuso
@ 2011-01-24 13:12           ` Eric Dumazet
  2011-01-24 13:25             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2011-01-24 13:12 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Patrick McHardy, netfilter-devel, Stephen Hemminger

Le lundi 24 janvier 2011 à 14:06 +0100, Pablo Neira Ayuso a écrit :

> Yes, we can use nf_conntrack_get (which does atomic_inc) instead. New
> patch attached.

I feel now a bit uncomfortable, sorry ;)

Are we sure the refcount cannot reach 0 while we hold
nf_conntrack_lock ?



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
  2011-01-24 13:12           ` Eric Dumazet
@ 2011-01-24 13:25             ` Pablo Neira Ayuso
  2011-01-24 13:37               ` Patrick McHardy
  0 siblings, 1 reply; 17+ messages in thread
From: Pablo Neira Ayuso @ 2011-01-24 13:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, netfilter-devel, Stephen Hemminger

On 24/01/11 14:12, Eric Dumazet wrote:
> Le lundi 24 janvier 2011 à 14:06 +0100, Pablo Neira Ayuso a écrit :
> 
>> Yes, we can use nf_conntrack_get (which does atomic_inc) instead. New
>> patch attached.
> 
> I feel now a bit uncomfortable, sorry ;)
> 
> Are we sure the refcount cannot reach 0 while we hold
> nf_conntrack_lock ?

the ct deletion from the hash list is protected by spin lock, so
whatever deletion would wait until we have left the dump section.

with this patch, the code looks like it was in 2.6.24 before the rcu stuff.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
  2011-01-24 13:25             ` Pablo Neira Ayuso
@ 2011-01-24 13:37               ` Patrick McHardy
  2011-01-24 14:06                 ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Patrick McHardy @ 2011-01-24 13:37 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Eric Dumazet, netfilter-devel, Stephen Hemminger

On 24.01.2011 14:25, Pablo Neira Ayuso wrote:
> On 24/01/11 14:12, Eric Dumazet wrote:
>> Le lundi 24 janvier 2011 à 14:06 +0100, Pablo Neira Ayuso a écrit :
>>
>>> Yes, we can use nf_conntrack_get (which does atomic_inc) instead. New
>>> patch attached.
>>
>> I feel now a bit uncomfortable, sorry ;)
>>
>> Are we sure the refcount cannot reach 0 while we hold
>> nf_conntrack_lock ?
> 
> the ct deletion from the hash list is protected by spin lock, so
> whatever deletion would wait until we have left the dump section.
> 
> with this patch, the code looks like it was in 2.6.24 before the rcu stuff.

Yeah, we definitely have a reference while the conntrack is contained
in the hash table, and removal requires taking nf_conntrack_lock,
therefor using the conntrack entry while holding the lock is valid.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
  2011-01-24 13:37               ` Patrick McHardy
@ 2011-01-24 14:06                 ` Eric Dumazet
  2011-01-24 17:53                   ` Patrick McHardy
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2011-01-24 14:06 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Pablo Neira Ayuso, netfilter-devel, Stephen Hemminger

Le lundi 24 janvier 2011 à 14:37 +0100, Patrick McHardy a écrit :
> On 24.01.2011 14:25, Pablo Neira Ayuso wrote:
> > On 24/01/11 14:12, Eric Dumazet wrote:
> >> Le lundi 24 janvier 2011 à 14:06 +0100, Pablo Neira Ayuso a écrit :
> >>
> >>> Yes, we can use nf_conntrack_get (which does atomic_inc) instead. New
> >>> patch attached.
> >>
> >> I feel now a bit uncomfortable, sorry ;)
> >>
> >> Are we sure the refcount cannot reach 0 while we hold
> >> nf_conntrack_lock ?
> > 
> > the ct deletion from the hash list is protected by spin lock, so
> > whatever deletion would wait until we have left the dump section.
> > 
> > with this patch, the code looks like it was in 2.6.24 before the rcu stuff.
> 
> Yeah, we definitely have a reference while the conntrack is contained
> in the hash table, and removal requires taking nf_conntrack_lock,
> therefor using the conntrack entry while holding the lock is valid.

Yes, but to clarify my question, is the following possible ?

CPU 0                                           CPU1
dump()
spin_lock()
....
    if (not enough room in skb)
						decrement last refcount
      increment_ref_count()
spin_unlock();
						since refcount hit 0,
						spin_lock(nf_conntrack)
						remove ct from hashes


If yes we should use the atomic_inc_not_zero() form.


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
  2011-01-24 14:06                 ` Eric Dumazet
@ 2011-01-24 17:53                   ` Patrick McHardy
  2011-01-24 18:01                     ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Patrick McHardy @ 2011-01-24 17:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Pablo Neira Ayuso, netfilter-devel, Stephen Hemminger

Am 24.01.2011 15:06, schrieb Eric Dumazet:
> Le lundi 24 janvier 2011 à 14:37 +0100, Patrick McHardy a écrit :
>> On 24.01.2011 14:25, Pablo Neira Ayuso wrote:
>>> On 24/01/11 14:12, Eric Dumazet wrote:
>>>> Le lundi 24 janvier 2011 à 14:06 +0100, Pablo Neira Ayuso a écrit :
>>>>
>>>>> Yes, we can use nf_conntrack_get (which does atomic_inc) instead. New
>>>>> patch attached.
>>>>
>>>> I feel now a bit uncomfortable, sorry ;)
>>>>
>>>> Are we sure the refcount cannot reach 0 while we hold
>>>> nf_conntrack_lock ?
>>>
>>> the ct deletion from the hash list is protected by spin lock, so
>>> whatever deletion would wait until we have left the dump section.
>>>
>>> with this patch, the code looks like it was in 2.6.24 before the rcu stuff.
>>
>> Yeah, we definitely have a reference while the conntrack is contained
>> in the hash table, and removal requires taking nf_conntrack_lock,
>> therefor using the conntrack entry while holding the lock is valid.
> 
> Yes, but to clarify my question, is the following possible ?
> 
> CPU 0                                           CPU1
> dump()
> spin_lock()
> ....
>     if (not enough room in skb)
> 						decrement last refcount
>       increment_ref_count()
> spin_unlock();
> 						since refcount hit 0,
> 						spin_lock(nf_conntrack)
> 						remove ct from hashes
> 

No, that can't happen. Before the last refcount is released, the entry
is removed from the hash, which requires to take the lock. Each entry
contained in the hash table has a refcount of at least 1.

If there are no futher concerns, I'm going to apply Pablo's patch.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
  2011-01-24 17:53                   ` Patrick McHardy
@ 2011-01-24 18:01                     ` Eric Dumazet
  2011-01-24 18:16                       ` Patrick McHardy
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2011-01-24 18:01 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Pablo Neira Ayuso, netfilter-devel, Stephen Hemminger

Le lundi 24 janvier 2011 à 18:53 +0100, Patrick McHardy a écrit :

> No, that can't happen. Before the last refcount is released, the entry
> is removed from the hash, which requires to take the lock. Each entry
> contained in the hash table has a refcount of at least 1.
> 
> If there are no futher concerns, I'm going to apply Pablo's patch.

Thanks, that's perfect by me

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
  2011-01-24 18:01                     ` Eric Dumazet
@ 2011-01-24 18:16                       ` Patrick McHardy
  0 siblings, 0 replies; 17+ messages in thread
From: Patrick McHardy @ 2011-01-24 18:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Pablo Neira Ayuso, netfilter-devel, Stephen Hemminger

Am 24.01.2011 19:01, schrieb Eric Dumazet:
> Le lundi 24 janvier 2011 à 18:53 +0100, Patrick McHardy a écrit :
> 
>> No, that can't happen. Before the last refcount is released, the entry
>> is removed from the hash, which requires to take the lock. Each entry
>> contained in the hash table has a refcount of at least 1.
>>
>> If there are no futher concerns, I'm going to apply Pablo's patch.
> 
> Thanks, that's perfect by me
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks everyone.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy
  2011-01-23 23:16 [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy Pablo Neira Ayuso
  2011-01-24  6:41 ` Eric Dumazet
@ 2011-02-20 20:48 ` Stephen Hemminger
  2011-02-20 20:51   ` [PATCH 1/2] netfilter: fix race in conntrack " Stephen Hemminger
  2011-02-20 20:51   ` [PATCH 2/2] netfilter: ctnetlink: fix missing refcount increment during dumps Stephen Hemminger
  1 sibling, 2 replies; 17+ messages in thread
From: Stephen Hemminger @ 2011-02-20 20:48 UTC (permalink / raw)
  To: Pablo Neira Ayuso, David Miller, Greg KH; +Cc: netfilter-devel, kaber, stable

On Mon, 24 Jan 2011 00:16:02 +0100
Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> In 13ee6ac579574a2a95e982b19920fd2495dce8cd, we recovered spinlocks
> to protect the dump of the conntrack table according to reports from
> Stephen and acknowledgments on the issue from Eric.
> 
> However, Stephen removed the refcount bump in that patch that allows
> to keep a reference to the current ct object we are interating over.
> That code avoids race conditions between ct object destruction and
> the iteration itself. This patch reintroduces these lines since the
> ct object may vanish between two recvmgs() invocations.
> 
> This patch fixes ocasional crashes while dumping the conntrack table
> intensively.
> 
> Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

These two patches should have been submitted to the stable kernel.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/2] netfilter: fix race in conntrack between dump_table and destroy
  2011-02-20 20:48 ` Stephen Hemminger
@ 2011-02-20 20:51   ` Stephen Hemminger
  2011-02-20 20:51   ` [PATCH 2/2] netfilter: ctnetlink: fix missing refcount increment during dumps Stephen Hemminger
  1 sibling, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2011-02-20 20:51 UTC (permalink / raw)
  To: Greg KH; +Cc: Pablo Neira Ayuso, David Miller, netfilter-devel, kaber, stable


The netlink interface to dump the connection tracking table has a race
when entries are deleted at the same time. A customer reported a crash
and the backtrace showed thatctnetlink_dump_table was running while a
conntrack entry was being destroyed.
(see https://bugzilla.vyatta.com/show_bug.cgi?id=6402).

According to RCU documentation, when using hlist_nulls the reader
must handle the case of seeing a deleted entry and not proceed
further down the linked list.  The old code would continue
which caused the scan to walk into the free list.

This patch uses locking (rather than RCU) for this operation which
is guaranteed safe, and no longer requires getting reference while
doing dump operation.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(cherry picked from commit 13ee6ac579574a2a95e982b19920fd2495dce8cd)
---
 Patch against 2.6.37.1

 net/netfilter/nf_conntrack_netlink.c |   14 +++++---------
 1 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index b729ace..d57dcdb 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -642,25 +642,23 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
 	struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh);
 	u_int8_t l3proto = nfmsg->nfgen_family;
 
-	rcu_read_lock();
+	spin_lock_bh(&nf_conntrack_lock);
 	last = (struct nf_conn *)cb->args[1];
 	for (; cb->args[0] < net->ct.htable_size; cb->args[0]++) {
 restart:
-		hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[cb->args[0]],
+		hlist_nulls_for_each_entry(h, n, &net->ct.hash[cb->args[0]],
 					 hnnode) {
 			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = nf_ct_tuplehash_to_ctrack(h);
-			if (!atomic_inc_not_zero(&ct->ct_general.use))
-				continue;
 			/* Dump entries of a given L3 protocol number.
 			 * If it is not specified, ie. l3proto == 0,
 			 * then dump everything. */
 			if (l3proto && nf_ct_l3num(ct) != l3proto)
-				goto releasect;
+				continue;
 			if (cb->args[1]) {
 				if (ct != last)
-					goto releasect;
+					continue;
 				cb->args[1] = 0;
 			}
 			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
@@ -678,8 +676,6 @@ restart:
 				if (acct)
 					memset(acct, 0, sizeof(struct nf_conn_counter[IP_CT_DIR_MAX]));
 			}
-releasect:
-		nf_ct_put(ct);
 		}
 		if (cb->args[1]) {
 			cb->args[1] = 0;
@@ -687,7 +683,7 @@ releasect:
 		}
 	}
 out:
-	rcu_read_unlock();
+	spin_unlock_bh(&nf_conntrack_lock);
 	if (last)
 		nf_ct_put(last);
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2] netfilter: ctnetlink: fix missing refcount increment during dumps
  2011-02-20 20:48 ` Stephen Hemminger
  2011-02-20 20:51   ` [PATCH 1/2] netfilter: fix race in conntrack " Stephen Hemminger
@ 2011-02-20 20:51   ` Stephen Hemminger
  1 sibling, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2011-02-20 20:51 UTC (permalink / raw)
  To: Greg KH; +Cc: Pablo Neira Ayuso, David Miller, netfilter-devel, kaber, stable


>From Pablo Neira Ayuso <pablo@netfilter.org>

In 13ee6ac netfilter: fix race in conntrack between dump_table and
destroy, we recovered spinlocks to protect the dump of the conntrack
table according to reports from Stephen and acknowledgments on the
issue from Eric.

In that patch, the refcount bump that allows to keep a reference
to the current ct object was removed. However, we still decrement
the refcount for that object in the output path of
ctnetlink_dump_table():

        if (last)
                nf_ct_put(last)

Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
(cherry picked from commit c71caf4114a0e1da3451cc92fba6a152929cd4c2)
---
 Patch against v2.6.37.1

 net/netfilter/nf_conntrack_netlink.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index d57dcdb..742a6dc 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -664,6 +664,7 @@ restart:
 			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
 						cb->nlh->nlmsg_seq,
 						IPCTNL_MSG_CT_NEW, ct) < 0) {
+				nf_conntrack_get(&ct->ct_general);
 				cb->args[1] = (unsigned long)ct;
 				goto out;
 			}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2011-02-20 20:51 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-23 23:16 [PATCH] netfilter: ctnetlink: fix (really) race condition between dump_table and destroy Pablo Neira Ayuso
2011-01-24  6:41 ` Eric Dumazet
2011-01-24  7:04   ` Eric Dumazet
2011-01-24 11:57   ` Pablo Neira Ayuso
2011-01-24 12:46     ` Patrick McHardy
2011-01-24 12:54       ` Eric Dumazet
2011-01-24 13:06         ` Pablo Neira Ayuso
2011-01-24 13:12           ` Eric Dumazet
2011-01-24 13:25             ` Pablo Neira Ayuso
2011-01-24 13:37               ` Patrick McHardy
2011-01-24 14:06                 ` Eric Dumazet
2011-01-24 17:53                   ` Patrick McHardy
2011-01-24 18:01                     ` Eric Dumazet
2011-01-24 18:16                       ` Patrick McHardy
2011-02-20 20:48 ` Stephen Hemminger
2011-02-20 20:51   ` [PATCH 1/2] netfilter: fix race in conntrack " Stephen Hemminger
2011-02-20 20:51   ` [PATCH 2/2] netfilter: ctnetlink: fix missing refcount increment during dumps Stephen Hemminger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.