All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding
@ 2016-05-11 15:41 Florian Westphal
  2016-05-12  9:47 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2016-05-11 15:41 UTC (permalink / raw)
  To: netfilter-devel; +Cc: dale.4d, netdev, Florian Westphal, Eric W. Biederman

Under full load (unshare() in loop -> OOM conditions) we can
get kernel panic:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
[..]
task: ffff88012dfa3840 ti: ffff88012dffc000 task.ti: ffff88012dffc000
RIP: 0010:[<ffffffff81476c85>]  [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
RSP: 0000:ffff88012dfffd80  EFLAGS: 00010206
RAX: 0000000000000008 RBX: ffffffff81add0c0 RCX: ffff88013fd80000
[..]
Call Trace:
 [<ffffffff81474d98>] nf_queue_nf_hook_drop+0x18/0x20
 [<ffffffff814738eb>] nf_unregister_net_hook+0xdb/0x150
 [<ffffffff8147398f>] netfilter_net_exit+0x2f/0x60
 [<ffffffff8141b088>] ops_exit_list.isra.4+0x38/0x60
 [<ffffffff8141b652>] setup_net+0xc2/0x120
 [<ffffffff8141bd09>] copy_net_ns+0x79/0x120
 [<ffffffff8106965b>] create_new_namespaces+0x11b/0x1e0
 [<ffffffff810698a7>] unshare_nsproxy_namespaces+0x57/0xa0
 [<ffffffff8104baa2>] SyS_unshare+0x1b2/0x340
 [<ffffffff81608276>] entry_SYSCALL_64_fastpath+0x1e/0xa8
Code: 65 00 48 89 e5 41 56 41 55 41 54 53 83 e8 01 48 8b 97 70 12 00 00 48 98 49 89 f4 4c 8b 74 c2 18 4d 8d 6e 08 49 81 c6 88 00 00 00 <49> 8b 5d 00 48 85 db 74 1a 48 89 df 4c 89 e2 48 c7 c6 90 68 47

Problem is that we call into the nfqueue backend to zap
packets that might be queued to userspace when we unregister a
netfilter hook.

However, this assumes that the backend was initialized and
net_generic(net, nfnl_queue_net_id) returns valid memory.

This is only true if the hook unregister happens in the netns exit path.
If it happens during error unwind because a netns init hook returned
an error condition (e.g. out of memory), then the result of
net_generic(net, nfnl_queue_net_id) is undefined.

Only do the cleanup for namespaces that were on the
net_namespace_list list (i.e., all netns ->init() functions were ok).

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Dale Whitfield <dale.4d@gmail.com>
Fixes: 8405a8fff3f ("netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 AFAICS this works fine as well -- if netns was never on the
 net_namespace_list no packets can be queued so we don't need
 to care if the nf_queue init hook got called or not.

diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 5baa8e2..9722819 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -102,6 +102,13 @@ void nf_queue_nf_hook_drop(struct net *net, struct nf_hook_ops *ops)
 {
 	const struct nf_queue_handler *qh;
 
+	/* netns wasn't initialized, error unwind in progress.
+	 * Its possible that the nfq netns init function was not even
+	 * called, in which case nfq pernetns data is in undefined state.
+	 */
+	if (!net->list.next)
+		return;
+
 	rcu_read_lock();
 	qh = rcu_dereference(queue_handler);
 	if (qh)
-- 
2.7.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding
  2016-05-11 15:41 [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding Florian Westphal
@ 2016-05-12  9:47 ` Pablo Neira Ayuso
  2016-05-12 16:15   ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Pablo Neira Ayuso @ 2016-05-12  9:47 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, dale.4d, netdev, Eric W. Biederman

On Wed, May 11, 2016 at 05:41:13PM +0200, Florian Westphal wrote:
> diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
> index 5baa8e2..9722819 100644
> --- a/net/netfilter/nf_queue.c
> +++ b/net/netfilter/nf_queue.c
> @@ -102,6 +102,13 @@ void nf_queue_nf_hook_drop(struct net *net, struct nf_hook_ops *ops)
>  {
>  	const struct nf_queue_handler *qh;
>  
> +	/* netns wasn't initialized, error unwind in progress.
> +	 * Its possible that the nfq netns init function was not even
> +	 * called, in which case nfq pernetns data is in undefined state.
> +	 */
> +	if (!net->list.next)
> +		return;

Thanks Florian.

Question for the netns folks: Is this a stable assumption? My only
concern with this is that it relies on internal the netns
representation, so I'd like to make sure this thing doesn't break
again.

Let me know, thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding
  2016-05-12  9:47 ` Pablo Neira Ayuso
@ 2016-05-12 16:15   ` Eric W. Biederman
  2016-05-12 16:40     ` Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2016-05-12 16:15 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, dale.4d, netdev

Pablo Neira Ayuso <pablo@netfilter.org> writes:

> On Wed, May 11, 2016 at 05:41:13PM +0200, Florian Westphal wrote:
>> diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
>> index 5baa8e2..9722819 100644
>> --- a/net/netfilter/nf_queue.c
>> +++ b/net/netfilter/nf_queue.c
>> @@ -102,6 +102,13 @@ void nf_queue_nf_hook_drop(struct net *net, struct nf_hook_ops *ops)
>>  {
>>  	const struct nf_queue_handler *qh;
>>  
>> +	/* netns wasn't initialized, error unwind in progress.
>> +	 * Its possible that the nfq netns init function was not even
>> +	 * called, in which case nfq pernetns data is in undefined state.
>> +	 */
>> +	if (!net->list.next)
>> +		return;
>
> Thanks Florian.
>
> Question for the netns folks: Is this a stable assumption? My only
> concern with this is that it relies on internal the netns
> representation, so I'd like to make sure this thing doesn't break
> again.
>
> Let me know, thanks.

Ugh. I got distracted before I finished replying.

I don't think the analysis is correct.

The unwind and the netns exit path should maintain the same constraints.
If they don't there is a bug, perhaps in initialization ordering.

Code should be able to count on net_generic() from the beginning of the
corresponding .init to the end of the corresponding .fini

Something like that is not happening and if we can I would like to track
down why.

Otherwise we need code like the above all over the place.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding
  2016-05-12 16:15   ` Eric W. Biederman
@ 2016-05-12 16:40     ` Florian Westphal
  2016-05-13 19:40       ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2016-05-12 16:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Pablo Neira Ayuso, Florian Westphal, netfilter-devel, dale.4d, netdev

Eric W. Biederman <ebiederm@xmission.com> wrote:
> > On Wed, May 11, 2016 at 05:41:13PM +0200, Florian Westphal wrote:
> >> diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
> >> index 5baa8e2..9722819 100644
> >> --- a/net/netfilter/nf_queue.c
> >> +++ b/net/netfilter/nf_queue.c
> >> @@ -102,6 +102,13 @@ void nf_queue_nf_hook_drop(struct net *net, struct nf_hook_ops *ops)
> >>  {
> >>  	const struct nf_queue_handler *qh;
> >>  
> >> +	/* netns wasn't initialized, error unwind in progress.
> >> +	 * Its possible that the nfq netns init function was not even
> >> +	 * called, in which case nfq pernetns data is in undefined state.
> >> +	 */
> >> +	if (!net->list.next)
> >> +		return;
> >
> > Thanks Florian.
> >
> > Question for the netns folks: Is this a stable assumption? My only
> > concern with this is that it relies on internal the netns
> > representation, so I'd like to make sure this thing doesn't break
> > again.
> >
> > Let me know, thanks.
> 
> Ugh. I got distracted before I finished replying.
> 
> I don't think the analysis is correct.

I am afraid it is, see below for example

> The unwind and the netns exit path should maintain the same constraints.
> If they don't there is a bug, perhaps in initialization ordering.

Since 8405a8fff3f8545c888a872d6e3c0c8eecd4d348 any call to
nf_unregister_hook invokes nf_queue_nf_hook_drop().

If the nfnetlink_queue module is loaded, that means we call
nfqnl_nf_hook_drop function via the nf_queue nf_queue_handler struct
in nf_queue.c .

And since nfqnl_nf_hook_drop() uses net_generic() ...

> Code should be able to count on net_generic() from the beginning of the
> corresponding .init to the end of the corresponding .fini

... we cannot count on this.

Example:
1. we try to create new netns
2. net_namespace.c:ops_init walks the netns op list and calls ->init()
3. some of these init hooks register netfilter hooks
4. we call init hook for nfnetlink_queue, but this yields -EFOO (alternatively, another ->init handler before this failed)
5. netns init code invokes the ->exit() handlers in reverse order for unwinding
6. hooks get unregistered, which makes a call into nf_queue_nf_hook_drop
7. nf_queue then calls q->nfqnl_nf_hook_drop(net)
8. oops as nfnetlink_queue wasn't setup in this net ns and net_generic
returns some undefined result

> Something like that is not happening and if we can I would like to track
> down why.
> 
> Otherwise we need code like the above all over the place.

AFAICS no other callers do something similar, but yes,
we'd need this all over the place if there are others.

Maybe we need a saner fix, e.g. by adding bounds check to net_generic(),
and making sure that net_generic() returns non-NULL only if the per
netns memory was allocated properly.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding
  2016-05-12 16:40     ` Florian Westphal
@ 2016-05-13 19:40       ` Eric W. Biederman
  2016-05-13 20:04         ` Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2016-05-13 19:40 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel, dale.4d, netdev

Florian Westphal <fw@strlen.de> writes:

> Eric W. Biederman <ebiederm@xmission.com> wrote:
>> > On Wed, May 11, 2016 at 05:41:13PM +0200, Florian Westphal wrote:
>> >> diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
>> >> index 5baa8e2..9722819 100644
>> >> --- a/net/netfilter/nf_queue.c
>> >> +++ b/net/netfilter/nf_queue.c
>> >> @@ -102,6 +102,13 @@ void nf_queue_nf_hook_drop(struct net *net, struct nf_hook_ops *ops)
>> >>  {
>> >>  	const struct nf_queue_handler *qh;
>> >>  
>> >> +	/* netns wasn't initialized, error unwind in progress.
>> >> +	 * Its possible that the nfq netns init function was not even
>> >> +	 * called, in which case nfq pernetns data is in undefined state.
>> >> +	 */
>> >> +	if (!net->list.next)
>> >> +		return;
>> >
>> > Thanks Florian.
>> >
>> > Question for the netns folks: Is this a stable assumption? My only
>> > concern with this is that it relies on internal the netns
>> > representation, so I'd like to make sure this thing doesn't break
>> > again.
>> >
>> > Let me know, thanks.
>> 
>> Ugh. I got distracted before I finished replying.
>> 
>> I don't think the analysis is correct.
>
> I am afraid it is, see below for example
>
>> The unwind and the netns exit path should maintain the same constraints.
>> If they don't there is a bug, perhaps in initialization ordering.
>
> Since 8405a8fff3f8545c888a872d6e3c0c8eecd4d348 any call to
> nf_unregister_hook invokes nf_queue_nf_hook_drop().
>
> If the nfnetlink_queue module is loaded, that means we call
> nfqnl_nf_hook_drop function via the nf_queue nf_queue_handler struct
> in nf_queue.c .
>
> And since nfqnl_nf_hook_drop() uses net_generic() ...
>
>> Code should be able to count on net_generic() from the beginning of the
>> corresponding .init to the end of the corresponding .fini
>
> ... we cannot count on this.

I guess what I meant was that we should see if we can fix this.

> Example:
> 1. we try to create new netns
> 2. net_namespace.c:ops_init walks the netns op list and calls ->init()
> 3. some of these init hooks register netfilter hooks
> 4. we call init hook for nfnetlink_queue, but this yields -EFOO (alternatively, another ->init handler before this failed)
> 5. netns init code invokes the ->exit() handlers in reverse order for unwinding
> 6. hooks get unregistered, which makes a call into nf_queue_nf_hook_drop
> 7. nf_queue then calls q->nfqnl_nf_hook_drop(net)
> 8. oops as nfnetlink_queue wasn't setup in this net ns and net_generic
> returns some undefined result
>
>> Something like that is not happening and if we can I would like to track
>> down why.
>> 
>> Otherwise we need code like the above all over the place.
>
> AFAICS no other callers do something similar, but yes,
> we'd need this all over the place if there are others.
>
> Maybe we need a saner fix, e.g. by adding bounds check to net_generic(),
> and making sure that net_generic() returns non-NULL only if the per
> netns memory was allocated properly.

As a first approximiation I am thinking we should fix by making
nf_queue_register_handler a per netns operation.

Either that or give nfnetlink_queue it's own dedicated place in
struct net for struct nfnl_queue_net.

Let's sort out the interface so we can't get this wrong.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding
  2016-05-13 19:40       ` Eric W. Biederman
@ 2016-05-13 20:04         ` Florian Westphal
  2016-05-13 20:26           ` Eric W. Biederman
  2016-05-13 20:44           ` Eric W. Biederman
  0 siblings, 2 replies; 15+ messages in thread
From: Florian Westphal @ 2016-05-13 20:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Florian Westphal, Pablo Neira Ayuso, netfilter-devel, dale.4d, netdev

Eric W. Biederman <ebiederm@xmission.com> wrote:
> > AFAICS no other callers do something similar, but yes,
> > we'd need this all over the place if there are others.
> >
> > Maybe we need a saner fix, e.g. by adding bounds check to net_generic(),
> > and making sure that net_generic() returns non-NULL only if the per
> > netns memory was allocated properly.
> 
> As a first approximiation I am thinking we should fix by making
> nf_queue_register_handler a per netns operation.

We can do that but then we'd end up storing the very same address
for each namespace which I think isn't nice either.

> Either that or give nfnetlink_queue it's own dedicated place in
> struct net for struct nfnl_queue_net.

That would work too, but then I don't see why that is preferrable
to this proposed patch. We could add a helper for this so that
we have something like

static bool netns_was_inited(const struct net *n)
{
	return net->list.next != NULL;
}

Another alternative is this patch (not even compile tested, just to
illustrate the idea):

diff --git a/include/net/netns/generic.h b/include/net/netns/generic.h
--- a/include/net/netns/generic.h
+++ b/include/net/netns/generic.h
@@ -38,7 +38,11 @@ static inline void *net_generic(const struct net *net, int id)
 
 	rcu_read_lock();
 	ng = rcu_dereference(net->gen);
-	ptr = ng->ptr[id - 1];
+
+	if (ng->len < id)
+		ptr = ng->ptr[id - 1];
+	else
+		ptr = NULL;
 	rcu_read_unlock();
 
 	return ptr;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -111,6 +111,7 @@ static int ops_init(const struct pernet_operations *ops, struct net *net)
 		return 0;
 
 cleanup:
+	net_assign_generic(net, *ops->id, NULL);
 	kfree(data);
 
 out:
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -927,6 +927,9 @@ static void nfqnl_nf_hook_drop(struct net *net, struct nf_hook_ops *hook)
 	struct nfnl_queue_net *q = nfnl_queue_pernet(net);
 	int i;
 
+	if (!q)
+		return;
+
 	rcu_read_lock();
 	for (i = 0; i < INSTANCE_BUCKETS; i++) {
 		struct nfqnl_instance *inst;

What do you think?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding
  2016-05-13 20:04         ` Florian Westphal
@ 2016-05-13 20:26           ` Eric W. Biederman
  2016-05-13 21:07             ` Florian Westphal
  2016-05-13 20:44           ` Eric W. Biederman
  1 sibling, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2016-05-13 20:26 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel, dale.4d, netdev

Florian Westphal <fw@strlen.de> writes:

> Eric W. Biederman <ebiederm@xmission.com> wrote:
>> > AFAICS no other callers do something similar, but yes,
>> > we'd need this all over the place if there are others.
>> >
>> > Maybe we need a saner fix, e.g. by adding bounds check to net_generic(),
>> > and making sure that net_generic() returns non-NULL only if the per
>> > netns memory was allocated properly.
>> 
>> As a first approximiation I am thinking we should fix by making
>> nf_queue_register_handler a per netns operation.
>
> We can do that but then we'd end up storing the very same address
> for each namespace which I think isn't nice either.

Sort of.  At the same time we fundamentally need a per namespace
variable to ask if things were initialized.  So using the address as
that variable changes the equation not at all, and keeps the code
simpler.

>> Either that or give nfnetlink_queue it's own dedicated place in
>> struct net for struct nfnl_queue_net.
>
> That would work too, but then I don't see why that is preferrable
> to this proposed patch. We could add a helper for this so that
> we have something like
>
> static bool netns_was_inited(const struct net *n)
> {
> 	return net->list.next != NULL;
> }

That does not work because the real question were the things this piece
of code cares about initialized.

> Another alternative is this patch (not even compile tested, just to
> illustrate the idea):

Very much no.

I believe that changing net_generic to return a value so that
nfqnl_nf_hook_drop can detect if the timing is wrong is an error.

I also don't think your changes to net_generic will work as the number
of pointers should have been allocted properly from the start.  

Your proposed change is just papering over the problem and fixing it at
the wrong level.  (The obvious level yes) but still the wrong level.

> What do you think?

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding
  2016-05-13 20:04         ` Florian Westphal
  2016-05-13 20:26           ` Eric W. Biederman
@ 2016-05-13 20:44           ` Eric W. Biederman
  2016-05-13 21:20             ` Florian Westphal
  1 sibling, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2016-05-13 20:44 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel, dale.4d, netdev


Florian could you test and verify this patch fixes your issues?

Unlike the other possibilities that have been discussed this also
addresses the nf_queue path as well as the nf_queue_hook_drop path.

And frankly that is the best argument I have for fixing it this way
because potentially nf_queue could have similar issues.

Eric

From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Fri, 13 May 2016 15:26:03 -0500
Subject: [PATCH] nf_queue: Make the queue_handler pernet

Florian Weber reported:
> Under full load (unshare() in loop -> OOM conditions) we can
> get kernel panic:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
> [..]
> task: ffff88012dfa3840 ti: ffff88012dffc000 task.ti: ffff88012dffc000
> RIP: 0010:[<ffffffff81476c85>]  [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
> RSP: 0000:ffff88012dfffd80  EFLAGS: 00010206
> RAX: 0000000000000008 RBX: ffffffff81add0c0 RCX: ffff88013fd80000
> [..]
> Call Trace:
>  [<ffffffff81474d98>] nf_queue_nf_hook_drop+0x18/0x20
>  [<ffffffff814738eb>] nf_unregister_net_hook+0xdb/0x150
>  [<ffffffff8147398f>] netfilter_net_exit+0x2f/0x60
>  [<ffffffff8141b088>] ops_exit_list.isra.4+0x38/0x60
>  [<ffffffff8141b652>] setup_net+0xc2/0x120
>  [<ffffffff8141bd09>] copy_net_ns+0x79/0x120
>  [<ffffffff8106965b>] create_new_namespaces+0x11b/0x1e0
>  [<ffffffff810698a7>] unshare_nsproxy_namespaces+0x57/0xa0
>  [<ffffffff8104baa2>] SyS_unshare+0x1b2/0x340
>  [<ffffffff81608276>] entry_SYSCALL_64_fastpath+0x1e/0xa8
> Code: 65 00 48 89 e5 41 56 41 55 41 54 53 83 e8 01 48 8b 97 70 12 00 00 48 98 49 89 f4 4c 8b 74 c2 18 4d 8d 6e 08 49 81 c6 88 00 00 00 <49> 8b 5d 00 48 85 db 74 1a 48 89 df 4c 89 e2 48 c7 c6 90 68 47
>

The simple fix for this requires a new pernet variable for struct
nf_queue that indicates when it is safe to use the dynamically
allocated nf_queue state.

As we need a variable anyway make nf_register_queue_handler and
nf_unregister_queue_handler pernet.  This allows the existing logic of
when it is safe to use the state from the nfnetlink_queue module to be
reused with no changes except for making it per net.

The syncrhonize_rcu from nf_unregister_queue_handler is moved to a new
function nfnl_queue_net_exit_batch so that the worst case of having a
syncrhonize_rcu in the pernet exit path is not experienced in batch
mode.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/net/netfilter/nf_queue.h |  4 ++--
 include/net/netns/netfilter.h    |  2 ++
 net/netfilter/nf_queue.c         | 17 ++++++++---------
 net/netfilter/nfnetlink_queue.c  | 18 ++++++++++++------
 4 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index 9c5638ad872e..0dbce55437f2 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -28,8 +28,8 @@ struct nf_queue_handler {
 						struct nf_hook_ops *ops);
 };
 
-void nf_register_queue_handler(const struct nf_queue_handler *qh);
-void nf_unregister_queue_handler(void);
+void nf_register_queue_handler(struct net *net, const struct nf_queue_handler *qh);
+void nf_unregister_queue_handler(struct net *net);
 void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict);
 
 void nf_queue_entry_get_refs(struct nf_queue_entry *entry);
diff --git a/include/net/netns/netfilter.h b/include/net/netns/netfilter.h
index 38aa4983e2a9..36d723579af2 100644
--- a/include/net/netns/netfilter.h
+++ b/include/net/netns/netfilter.h
@@ -5,11 +5,13 @@
 
 struct proc_dir_entry;
 struct nf_logger;
+struct nf_queue_handler;
 
 struct netns_nf {
 #if defined CONFIG_PROC_FS
 	struct proc_dir_entry *proc_netfilter;
 #endif
+	const struct nf_queue_handler __rcu *queue_handler;
 	const struct nf_logger __rcu *nf_loggers[NFPROTO_NUMPROTO];
 #ifdef CONFIG_SYSCTL
 	struct ctl_table_header *nf_log_dir_header;
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 5baa8e24e6ac..b19ad20a705c 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -26,23 +26,21 @@
  * Once the queue is registered it must reinject all packets it
  * receives, no matter what.
  */
-static const struct nf_queue_handler __rcu *queue_handler __read_mostly;
 
 /* return EBUSY when somebody else is registered, return EEXIST if the
  * same handler is registered, return 0 in case of success. */
-void nf_register_queue_handler(const struct nf_queue_handler *qh)
+void nf_register_queue_handler(struct net *net, const struct nf_queue_handler *qh)
 {
 	/* should never happen, we only have one queueing backend in kernel */
-	WARN_ON(rcu_access_pointer(queue_handler));
-	rcu_assign_pointer(queue_handler, qh);
+	WARN_ON(rcu_access_pointer(net->nf.queue_handler));
+	rcu_assign_pointer(net->nf.queue_handler, qh);
 }
 EXPORT_SYMBOL(nf_register_queue_handler);
 
 /* The caller must flush their queue before this */
-void nf_unregister_queue_handler(void)
+void nf_unregister_queue_handler(struct net *net)
 {
-	RCU_INIT_POINTER(queue_handler, NULL);
-	synchronize_rcu();
+	RCU_INIT_POINTER(net->nf.queue_handler, NULL);
 }
 EXPORT_SYMBOL(nf_unregister_queue_handler);
 
@@ -103,7 +101,7 @@ void nf_queue_nf_hook_drop(struct net *net, struct nf_hook_ops *ops)
 	const struct nf_queue_handler *qh;
 
 	rcu_read_lock();
-	qh = rcu_dereference(queue_handler);
+	qh = rcu_dereference(net->nf.queue_handler);
 	if (qh)
 		qh->nf_hook_drop(net, ops);
 	rcu_read_unlock();
@@ -122,9 +120,10 @@ int nf_queue(struct sk_buff *skb,
 	struct nf_queue_entry *entry = NULL;
 	const struct nf_afinfo *afinfo;
 	const struct nf_queue_handler *qh;
+	struct net *net = state->net;
 
 	/* QUEUE == DROP if no one is waiting, to be safe. */
-	qh = rcu_dereference(queue_handler);
+	qh = rcu_dereference(net->nf.queue_handler);
 	if (!qh) {
 		status = -ESRCH;
 		goto err;
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index cb5b630a645b..2c13e41c508c 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -1377,21 +1377,29 @@ static int __net_init nfnl_queue_net_init(struct net *net)
 			 net->nf.proc_netfilter, &nfqnl_file_ops))
 		return -ENOMEM;
 #endif
+	nf_register_queue_handler(net, &nfqh);
 	return 0;
 }
 
 static void __net_exit nfnl_queue_net_exit(struct net *net)
 {
+	nf_unregister_queue_handler(net);
 #ifdef CONFIG_PROC_FS
 	remove_proc_entry("nfnetlink_queue", net->nf.proc_netfilter);
 #endif
 }
 
+static void nfnl_queue_net_exit_batch(struct list_head *net_exit_list)
+{
+	synchronize_rcu();
+}
+
 static struct pernet_operations nfnl_queue_net_ops = {
-	.init	= nfnl_queue_net_init,
-	.exit	= nfnl_queue_net_exit,
-	.id	= &nfnl_queue_net_id,
-	.size	= sizeof(struct nfnl_queue_net),
+	.init		= nfnl_queue_net_init,
+	.exit		= nfnl_queue_net_exit,
+	.exit_batch	= nfnl_queue_net_exit_batch,
+	.id		= &nfnl_queue_net_id,
+	.size		= sizeof(struct nfnl_queue_net),
 };
 
 static int __init nfnetlink_queue_init(void)
@@ -1412,7 +1420,6 @@ static int __init nfnetlink_queue_init(void)
 	}
 
 	register_netdevice_notifier(&nfqnl_dev_notifier);
-	nf_register_queue_handler(&nfqh);
 	return status;
 
 cleanup_netlink_notifier:
@@ -1424,7 +1431,6 @@ out:
 
 static void __exit nfnetlink_queue_fini(void)
 {
-	nf_unregister_queue_handler();
 	unregister_netdevice_notifier(&nfqnl_dev_notifier);
 	nfnetlink_subsys_unregister(&nfqnl_subsys);
 	netlink_unregister_notifier(&nfqnl_rtnl_notifier);
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding
  2016-05-13 20:26           ` Eric W. Biederman
@ 2016-05-13 21:07             ` Florian Westphal
  0 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2016-05-13 21:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Florian Westphal, Pablo Neira Ayuso, netfilter-devel, dale.4d, netdev

Eric W. Biederman <ebiederm@xmission.com> wrote:
> Florian Westphal <fw@strlen.de> writes:
> 
> > Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> > AFAICS no other callers do something similar, but yes,
> >> > we'd need this all over the place if there are others.
> >> >
> >> > Maybe we need a saner fix, e.g. by adding bounds check to net_generic(),
> >> > and making sure that net_generic() returns non-NULL only if the per
> >> > netns memory was allocated properly.
> >> 
> >> As a first approximiation I am thinking we should fix by making
> >> nf_queue_register_handler a per netns operation.
> >
> > We can do that but then we'd end up storing the very same address
> > for each namespace which I think isn't nice either.
> 
> Sort of.  At the same time we fundamentally need a per namespace
> variable to ask if things were initialized.  So using the address as
> that variable changes the equation not at all, and keeps the code
> simpler.

We already do this for nf_conntrack_event_cb and nf_expect_event_cb so
we store 16 byte per netns where one bit would suffice.

> >> Either that or give nfnetlink_queue it's own dedicated place in
> >> struct net for struct nfnl_queue_net.
> >
> > That would work too, but then I don't see why that is preferrable
> > to this proposed patch. We could add a helper for this so that
> > we have something like

Addendum: its also a non-starter as nfqueue data is quite big
and the module won't be loaded in most configurations.

Looking at 70e9942f17a6193e9172a804e6569a8806633d6b again it seems
the problem is somewhat related to this one, although not identical.

Perhaps we should consider adding

nf_netns_feature_ok(struct net *, enum nfnet_feature);

enum nfnet_feature {
	NFNET_QUEUE,
	NFNET_EVENT,
};

and then set/unset that from the netns init/exit handlers.

nf_netns_feature_ok would then just need one bit in struct nf_net
to store wheter whatever feature we care about is available.

We could then de-netnsify those pointers again.

> > static bool netns_was_inited(const struct net *n)
> > {
> > 	return net->list.next != NULL;
> > }
> 
> That does not work because the real question were the things this piece
> of code cares about initialized.

Hmm, AFAIU we can't have packets queued without the netns being on
the list, no?

> Very much no.
> 
> I believe that changing net_generic to return a value so that
> nfqnl_nf_hook_drop can detect if the timing is wrong is an error.

Well, I don't see how this is fixable without having some means of
detecting wheter 'timing' is wrong...

You could say that making calls that end up in nfnetlink_queue
without a successful ->init() is illegal but in that case you
will need some condition by which the netfilter core can know
wheter that call is ok in the first place.

So I do not think that detecting it within nfnetlink_queue is worse
than detecting this in the caller, in fact, detecting it in callee
is better since the caller can't forget the test...

> I also don't think your changes to net_generic will work as the number
> of pointers should have been allocted properly from the start.

As I said I did not test it.
I only saw that net_assign_generic() can expand this array so I
figured one has to check if ptr->[id] is useable.

> Your proposed change is just papering over the problem and fixing it at
> the wrong level.  (The obvious level yes) but still the wrong level.

Well, I have no more ideas.

Looking at the number of function pointers used within netfilter
I'm worried that we will end up pushing more and more redundant info
into struct net to solve this "problem".

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding
  2016-05-13 20:44           ` Eric W. Biederman
@ 2016-05-13 21:20             ` Florian Westphal
  2016-05-14  0:58               ` Eric W. Biederman
  2016-05-14  2:18               ` [PATCH] nf_queue: Make the queue_handler pernet Eric W. Biederman
  0 siblings, 2 replies; 15+ messages in thread
From: Florian Westphal @ 2016-05-13 21:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Florian Westphal, Pablo Neira Ayuso, netfilter-devel, dale.4d, netdev

Eric W. Biederman <ebiederm@xmission.com> wrote:
> Florian could you test and verify this patch fixes your issues?

Yes, this seems to work.

Pablo, I'm fine with this patch going into -nf/stable but I do not think
making the pointers per netns is a desireable option in the long term.

> Unlike the other possibilities that have been discussed this also
> addresses the nf_queue path as well as the nf_queue_hook_drop path.

The nf_queue path should have been fine, no?

Or putting it differently: can we start processing skbs before a netns
is fully initialized?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding
  2016-05-13 21:20             ` Florian Westphal
@ 2016-05-14  0:58               ` Eric W. Biederman
  2016-05-14 10:33                 ` Florian Westphal
  2016-05-14  2:18               ` [PATCH] nf_queue: Make the queue_handler pernet Eric W. Biederman
  1 sibling, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2016-05-14  0:58 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel, dale.4d, netdev

Florian Westphal <fw@strlen.de> writes:

> Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Florian could you test and verify this patch fixes your issues?
>
> Yes, this seems to work.
>
> Pablo, I'm fine with this patch going into -nf/stable but I do not think
> making the pointers per netns is a desireable option in the long term.
>
>> Unlike the other possibilities that have been discussed this also
>> addresses the nf_queue path as well as the nf_queue_hook_drop path.
>
> The nf_queue path should have been fine, no?
>
> Or putting it differently: can we start processing skbs before a netns
> is fully initialized?

The practical case that worries me is what happens when someone does
"rmmod nfnetlink_queue" while the system is running.  It appears to me
that today we could free the per netns data during the rcu grace period
and cause a similar issue in nfnl_queue_pernet.

That looks like it could affect both the nf_queue path and the
nf_queue_nf_hook_drop path.

Eric




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] nf_queue: Make the queue_handler pernet
  2016-05-13 21:20             ` Florian Westphal
  2016-05-14  0:58               ` Eric W. Biederman
@ 2016-05-14  2:18               ` Eric W. Biederman
  2016-05-30  9:31                 ` Pablo Neira Ayuso
  1 sibling, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2016-05-14  2:18 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, dale.4d, netdev, Florian Westphal


Florian Weber reported:
> Under full load (unshare() in loop -> OOM conditions) we can
> get kernel panic:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
> [..]
> task: ffff88012dfa3840 ti: ffff88012dffc000 task.ti: ffff88012dffc000
> RIP: 0010:[<ffffffff81476c85>]  [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
> RSP: 0000:ffff88012dfffd80  EFLAGS: 00010206
> RAX: 0000000000000008 RBX: ffffffff81add0c0 RCX: ffff88013fd80000
> [..]
> Call Trace:
>  [<ffffffff81474d98>] nf_queue_nf_hook_drop+0x18/0x20
>  [<ffffffff814738eb>] nf_unregister_net_hook+0xdb/0x150
>  [<ffffffff8147398f>] netfilter_net_exit+0x2f/0x60
>  [<ffffffff8141b088>] ops_exit_list.isra.4+0x38/0x60
>  [<ffffffff8141b652>] setup_net+0xc2/0x120
>  [<ffffffff8141bd09>] copy_net_ns+0x79/0x120
>  [<ffffffff8106965b>] create_new_namespaces+0x11b/0x1e0
>  [<ffffffff810698a7>] unshare_nsproxy_namespaces+0x57/0xa0
>  [<ffffffff8104baa2>] SyS_unshare+0x1b2/0x340
>  [<ffffffff81608276>] entry_SYSCALL_64_fastpath+0x1e/0xa8
> Code: 65 00 48 89 e5 41 56 41 55 41 54 53 83 e8 01 48 8b 97 70 12 00 00 48 98 49 89 f4 4c 8b 74 c2 18 4d 8d 6e 08 49 81 c6 88 00 00 00 <49> 8b 5d 00 48 85 db 74 1a 48 89 df 4c 89 e2 48 c7 c6 90 68 47
>

The simple fix for this requires a new pernet variable for struct
nf_queue that indicates when it is safe to use the dynamically
allocated nf_queue state.

As we need a variable anyway make nf_register_queue_handler and
nf_unregister_queue_handler pernet.  This allows the existing logic of
when it is safe to use the state from the nfnetlink_queue module to be
reused with no changes except for making it per net.

The syncrhonize_rcu from nf_unregister_queue_handler is moved to a new
function nfnl_queue_net_exit_batch so that the worst case of having a
syncrhonize_rcu in the pernet exit path is not experienced in batch
mode.

Reported-by: Florian Westphal <fw@strlen.de>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/net/netfilter/nf_queue.h |  4 ++--
 include/net/netns/netfilter.h    |  2 ++
 net/netfilter/nf_queue.c         | 17 ++++++++---------
 net/netfilter/nfnetlink_queue.c  | 18 ++++++++++++------
 4 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index 9c5638ad872e..0dbce55437f2 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -28,8 +28,8 @@ struct nf_queue_handler {
 						struct nf_hook_ops *ops);
 };
 
-void nf_register_queue_handler(const struct nf_queue_handler *qh);
-void nf_unregister_queue_handler(void);
+void nf_register_queue_handler(struct net *net, const struct nf_queue_handler *qh);
+void nf_unregister_queue_handler(struct net *net);
 void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict);
 
 void nf_queue_entry_get_refs(struct nf_queue_entry *entry);
diff --git a/include/net/netns/netfilter.h b/include/net/netns/netfilter.h
index 38aa4983e2a9..36d723579af2 100644
--- a/include/net/netns/netfilter.h
+++ b/include/net/netns/netfilter.h
@@ -5,11 +5,13 @@
 
 struct proc_dir_entry;
 struct nf_logger;
+struct nf_queue_handler;
 
 struct netns_nf {
 #if defined CONFIG_PROC_FS
 	struct proc_dir_entry *proc_netfilter;
 #endif
+	const struct nf_queue_handler __rcu *queue_handler;
 	const struct nf_logger __rcu *nf_loggers[NFPROTO_NUMPROTO];
 #ifdef CONFIG_SYSCTL
 	struct ctl_table_header *nf_log_dir_header;
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 5baa8e24e6ac..b19ad20a705c 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -26,23 +26,21 @@
  * Once the queue is registered it must reinject all packets it
  * receives, no matter what.
  */
-static const struct nf_queue_handler __rcu *queue_handler __read_mostly;
 
 /* return EBUSY when somebody else is registered, return EEXIST if the
  * same handler is registered, return 0 in case of success. */
-void nf_register_queue_handler(const struct nf_queue_handler *qh)
+void nf_register_queue_handler(struct net *net, const struct nf_queue_handler *qh)
 {
 	/* should never happen, we only have one queueing backend in kernel */
-	WARN_ON(rcu_access_pointer(queue_handler));
-	rcu_assign_pointer(queue_handler, qh);
+	WARN_ON(rcu_access_pointer(net->nf.queue_handler));
+	rcu_assign_pointer(net->nf.queue_handler, qh);
 }
 EXPORT_SYMBOL(nf_register_queue_handler);
 
 /* The caller must flush their queue before this */
-void nf_unregister_queue_handler(void)
+void nf_unregister_queue_handler(struct net *net)
 {
-	RCU_INIT_POINTER(queue_handler, NULL);
-	synchronize_rcu();
+	RCU_INIT_POINTER(net->nf.queue_handler, NULL);
 }
 EXPORT_SYMBOL(nf_unregister_queue_handler);
 
@@ -103,7 +101,7 @@ void nf_queue_nf_hook_drop(struct net *net, struct nf_hook_ops *ops)
 	const struct nf_queue_handler *qh;
 
 	rcu_read_lock();
-	qh = rcu_dereference(queue_handler);
+	qh = rcu_dereference(net->nf.queue_handler);
 	if (qh)
 		qh->nf_hook_drop(net, ops);
 	rcu_read_unlock();
@@ -122,9 +120,10 @@ int nf_queue(struct sk_buff *skb,
 	struct nf_queue_entry *entry = NULL;
 	const struct nf_afinfo *afinfo;
 	const struct nf_queue_handler *qh;
+	struct net *net = state->net;
 
 	/* QUEUE == DROP if no one is waiting, to be safe. */
-	qh = rcu_dereference(queue_handler);
+	qh = rcu_dereference(net->nf.queue_handler);
 	if (!qh) {
 		status = -ESRCH;
 		goto err;
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index cb5b630a645b..2c13e41c508c 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -1377,21 +1377,29 @@ static int __net_init nfnl_queue_net_init(struct net *net)
 			 net->nf.proc_netfilter, &nfqnl_file_ops))
 		return -ENOMEM;
 #endif
+	nf_register_queue_handler(net, &nfqh);
 	return 0;
 }
 
 static void __net_exit nfnl_queue_net_exit(struct net *net)
 {
+	nf_unregister_queue_handler(net);
 #ifdef CONFIG_PROC_FS
 	remove_proc_entry("nfnetlink_queue", net->nf.proc_netfilter);
 #endif
 }
 
+static void nfnl_queue_net_exit_batch(struct list_head *net_exit_list)
+{
+	synchronize_rcu();
+}
+
 static struct pernet_operations nfnl_queue_net_ops = {
-	.init	= nfnl_queue_net_init,
-	.exit	= nfnl_queue_net_exit,
-	.id	= &nfnl_queue_net_id,
-	.size	= sizeof(struct nfnl_queue_net),
+	.init		= nfnl_queue_net_init,
+	.exit		= nfnl_queue_net_exit,
+	.exit_batch	= nfnl_queue_net_exit_batch,
+	.id		= &nfnl_queue_net_id,
+	.size		= sizeof(struct nfnl_queue_net),
 };
 
 static int __init nfnetlink_queue_init(void)
@@ -1412,7 +1420,6 @@ static int __init nfnetlink_queue_init(void)
 	}
 
 	register_netdevice_notifier(&nfqnl_dev_notifier);
-	nf_register_queue_handler(&nfqh);
 	return status;
 
 cleanup_netlink_notifier:
@@ -1424,7 +1431,6 @@ out:
 
 static void __exit nfnetlink_queue_fini(void)
 {
-	nf_unregister_queue_handler();
 	unregister_netdevice_notifier(&nfqnl_dev_notifier);
 	nfnetlink_subsys_unregister(&nfqnl_subsys);
 	netlink_unregister_notifier(&nfqnl_rtnl_notifier);
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding
  2016-05-14  0:58               ` Eric W. Biederman
@ 2016-05-14 10:33                 ` Florian Westphal
  2016-05-15  3:00                   ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2016-05-14 10:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Florian Westphal, Pablo Neira Ayuso, netfilter-devel, dale.4d, netdev

Eric W. Biederman <ebiederm@xmission.com> wrote:
> Florian Westphal <fw@strlen.de> writes:
> 
> > Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> Florian could you test and verify this patch fixes your issues?
> >
> > Yes, this seems to work.
> >
> > Pablo, I'm fine with this patch going into -nf/stable but I do not think
> > making the pointers per netns is a desireable option in the long term.
> >
> >> Unlike the other possibilities that have been discussed this also
> >> addresses the nf_queue path as well as the nf_queue_hook_drop path.
> >
> > The nf_queue path should have been fine, no?
> >
> > Or putting it differently: can we start processing skbs before a netns
> > is fully initialized?
> 
> The practical case that worries me is what happens when someone does
> "rmmod nfnetlink_queue" while the system is running.  It appears to me
> that today we could free the per netns data during the rcu grace period
> and cause a similar issue in nfnl_queue_pernet.
>
> That looks like it could affect both the nf_queue path and the
> nf_queue_nf_hook_drop path.

OK, I'll check this again but I seem to recall this was fine (the
nfqueue module exit path sets the handler to NULL before doing anything
else).

The normal netns exit path should be fine too as exit and free happens
in two distinct loops, i.e. while (without your change) we can have
calls to nf_queue_hook_drop after the nfqueue netns exit function was
called, these calls will always happen before the pernets data is
freed.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding
  2016-05-14 10:33                 ` Florian Westphal
@ 2016-05-15  3:00                   ` Eric W. Biederman
  0 siblings, 0 replies; 15+ messages in thread
From: Eric W. Biederman @ 2016-05-15  3:00 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel, dale.4d, netdev

Florian Westphal <fw@strlen.de> writes:

> Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Florian Westphal <fw@strlen.de> writes:
>> 
>> > Eric W. Biederman <ebiederm@xmission.com> wrote:
>> >> Florian could you test and verify this patch fixes your issues?
>> >
>> > Yes, this seems to work.
>> >
>> > Pablo, I'm fine with this patch going into -nf/stable but I do not think
>> > making the pointers per netns is a desireable option in the long term.
>> >
>> >> Unlike the other possibilities that have been discussed this also
>> >> addresses the nf_queue path as well as the nf_queue_hook_drop path.
>> >
>> > The nf_queue path should have been fine, no?
>> >
>> > Or putting it differently: can we start processing skbs before a netns
>> > is fully initialized?
>> 
>> The practical case that worries me is what happens when someone does
>> "rmmod nfnetlink_queue" while the system is running.  It appears to me
>> that today we could free the per netns data during the rcu grace period
>> and cause a similar issue in nfnl_queue_pernet.
>>
>> That looks like it could affect both the nf_queue path and the
>> nf_queue_nf_hook_drop path.
>
> OK, I'll check this again but I seem to recall this was fine (the
> nfqueue module exit path sets the handler to NULL before doing anything
> else).

Good point.

Yes, the nfnetlink_queue module calls nf_unregister_queue_handler()
in the module fini method before it does anything else.  That does
set queue_handler to NULL and calls synchronize_rcu() before anything
else.

So in practice that is not a problem, but being disconnected from
everything else that is not immediately apparent.  Sigh.

> The normal netns exit path should be fine too as exit and free happens
> in two distinct loops, i.e. while (without your change) we can have
> calls to nf_queue_hook_drop after the nfqueue netns exit function was
> called, these calls will always happen before the pernets data is
> freed.

Ouch.  That is a little scary.  Today we only have remove_proc_entry
in nfnl_queue_net_exit.  If we had something more substantial those
calls after .exit (without my change) we could get into nasty to find
oopses.  So I guess I did prevent a possible future issue with my patch.


I am half wondering if we could make everything simpler by simply not
allowing nfnetlink_queue be a module.

Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] nf_queue: Make the queue_handler pernet
  2016-05-14  2:18               ` [PATCH] nf_queue: Make the queue_handler pernet Eric W. Biederman
@ 2016-05-30  9:31                 ` Pablo Neira Ayuso
  0 siblings, 0 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2016-05-30  9:31 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netfilter-devel, dale.4d, netdev, Florian Westphal

On Fri, May 13, 2016 at 09:18:52PM -0500, Eric W. Biederman wrote:
> 
> Florian Weber reported:
> > Under full load (unshare() in loop -> OOM conditions) we can
> > get kernel panic:
> >
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > IP: [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
> > [..]
> > task: ffff88012dfa3840 ti: ffff88012dffc000 task.ti: ffff88012dffc000
> > RIP: 0010:[<ffffffff81476c85>]  [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
> > RSP: 0000:ffff88012dfffd80  EFLAGS: 00010206
> > RAX: 0000000000000008 RBX: ffffffff81add0c0 RCX: ffff88013fd80000
> > [..]
> > Call Trace:
> >  [<ffffffff81474d98>] nf_queue_nf_hook_drop+0x18/0x20
> >  [<ffffffff814738eb>] nf_unregister_net_hook+0xdb/0x150
> >  [<ffffffff8147398f>] netfilter_net_exit+0x2f/0x60
> >  [<ffffffff8141b088>] ops_exit_list.isra.4+0x38/0x60
> >  [<ffffffff8141b652>] setup_net+0xc2/0x120
> >  [<ffffffff8141bd09>] copy_net_ns+0x79/0x120
> >  [<ffffffff8106965b>] create_new_namespaces+0x11b/0x1e0
> >  [<ffffffff810698a7>] unshare_nsproxy_namespaces+0x57/0xa0
> >  [<ffffffff8104baa2>] SyS_unshare+0x1b2/0x340
> >  [<ffffffff81608276>] entry_SYSCALL_64_fastpath+0x1e/0xa8
> > Code: 65 00 48 89 e5 41 56 41 55 41 54 53 83 e8 01 48 8b 97 70 12 00 00 48 98 49 89 f4 4c 8b 74 c2 18 4d 8d 6e 08 49 81 c6 88 00 00 00 <49> 8b 5d 00 48 85 db 74 1a 48 89 df 4c 89 e2 48 c7 c6 90 68 47
> >
> 
> The simple fix for this requires a new pernet variable for struct
> nf_queue that indicates when it is safe to use the dynamically
> allocated nf_queue state.
> 
> As we need a variable anyway make nf_register_queue_handler and
> nf_unregister_queue_handler pernet.  This allows the existing logic of
> when it is safe to use the state from the nfnetlink_queue module to be
> reused with no changes except for making it per net.
> 
> The syncrhonize_rcu from nf_unregister_queue_handler is moved to a new
> function nfnl_queue_net_exit_batch so that the worst case of having a
> syncrhonize_rcu in the pernet exit path is not experienced in batch
> mode.

Applied, thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-05-30  9:31 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-11 15:41 [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding Florian Westphal
2016-05-12  9:47 ` Pablo Neira Ayuso
2016-05-12 16:15   ` Eric W. Biederman
2016-05-12 16:40     ` Florian Westphal
2016-05-13 19:40       ` Eric W. Biederman
2016-05-13 20:04         ` Florian Westphal
2016-05-13 20:26           ` Eric W. Biederman
2016-05-13 21:07             ` Florian Westphal
2016-05-13 20:44           ` Eric W. Biederman
2016-05-13 21:20             ` Florian Westphal
2016-05-14  0:58               ` Eric W. Biederman
2016-05-14 10:33                 ` Florian Westphal
2016-05-15  3:00                   ` Eric W. Biederman
2016-05-14  2:18               ` [PATCH] nf_queue: Make the queue_handler pernet Eric W. Biederman
2016-05-30  9:31                 ` Pablo Neira Ayuso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.