netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanup
@ 2013-07-22 21:45 Hannes Frederic Sowa
  2013-07-23  8:28 ` Fan Du
  2013-07-25  0:03 ` David Miller
  0 siblings, 2 replies; 4+ messages in thread
From: Hannes Frederic Sowa @ 2013-07-22 21:45 UTC (permalink / raw)
  To: netdev; +Cc: srivatsa.bhat

Otherwise we end up dereferencing the already freed net->ipv6.mrt pointer
which leads to a panic (from Srivatsa S. Bhat):

BUG: unable to handle kernel paging request at ffff882018552020
IP: [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
PGD 290a067 PUD 207ffe0067 PMD 207ff1d067 PTE 8000002018552060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
+ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp ioatdma dca mlx4_core be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 7 Comm: kworker/u33:0 Not tainted 3.11.0-rc1-ea45e-a #4
Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
Workqueue: netns cleanup_net
task: ffff8810393641c0 ti: ffff881039366000 task.ti: ffff881039366000
RIP: 0010:[<ffffffffa0366b02>]  [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
RSP: 0018:ffff881039367bd8  EFLAGS: 00010286
RAX: ffff881039367fd8 RBX: ffff882018552000 RCX: dead000000200200
RDX: 0000000000000000 RSI: ffff881039367b68 RDI: ffff881039367b68
RBP: ffff881039367bf8 R08: ffff881039367b68 R09: 2222222222222222
R10: 2222222222222222 R11: 2222222222222222 R12: ffff882015a7a040
R13: ffff882014eb89c0 R14: ffff8820289e2800 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88103fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff882018552020 CR3: 0000000001c0b000 CR4: 00000000000407f0
Stack:
 ffff881039367c18 ffff882014eb89c0 ffff882015e28c00 0000000000000000
 ffff881039367c18 ffffffffa034d9d1 ffff8820289e2800 ffff882014eb89c0
 ffff881039367c58 ffffffff815bdecb ffffffff815bddf2 ffff882014eb89c0
Call Trace:
 [<ffffffffa034d9d1>] rawv6_close+0x21/0x40 [ipv6]
 [<ffffffff815bdecb>] inet_release+0xfb/0x220
 [<ffffffff815bddf2>] ? inet_release+0x22/0x220
 [<ffffffffa032686f>] inet6_release+0x3f/0x50 [ipv6]
 [<ffffffff8151c1d9>] sock_release+0x29/0xa0
 [<ffffffff81525520>] sk_release_kernel+0x30/0x70
 [<ffffffffa034f14b>] icmpv6_sk_exit+0x3b/0x80 [ipv6]
 [<ffffffff8152fff9>] ops_exit_list+0x39/0x60
 [<ffffffff815306fb>] cleanup_net+0xfb/0x1a0
 [<ffffffff81075e3a>] process_one_work+0x1da/0x610
 [<ffffffff81075dc9>] ? process_one_work+0x169/0x610
 [<ffffffff81076390>] worker_thread+0x120/0x3a0
 [<ffffffff81076270>] ? process_one_work+0x610/0x610
 [<ffffffff8107da2e>] kthread+0xee/0x100
 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8162a99c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70
Code: 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 4c 8b 67 30 49 89 fd e8 db 3c 1e e1 49 8b 9c 24 90 08 00 00 48 85 db 74 06 <4c> 39 6b 20 74 20 bb f3 ff ff ff e8 8e 3c 1e e1 89 d8 4c 8b 65
RIP  [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
 RSP <ffff881039367bd8>
CR2: ffff882018552020

Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/ip6mr.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 583e8d4..03986d3 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -259,10 +259,12 @@ static void __net_exit ip6mr_rules_exit(struct net *net)
 {
 	struct mr6_table *mrt, *next;
 
+	rtnl_lock();
 	list_for_each_entry_safe(mrt, next, &net->ipv6.mr6_tables, list) {
 		list_del(&mrt->list);
 		ip6mr_free_table(mrt);
 	}
+	rtnl_unlock();
 	fib_rules_unregister(net->ipv6.mr6_rules_ops);
 }
 #else
@@ -289,7 +291,10 @@ static int __net_init ip6mr_rules_init(struct net *net)
 
 static void __net_exit ip6mr_rules_exit(struct net *net)
 {
+	rtnl_lock();
 	ip6mr_free_table(net->ipv6.mrt6);
+	net->ipv6.mrt6 = NULL;
+	rtnl_unlock();
 }
 #endif
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net] ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanup
  2013-07-22 21:45 [PATCH net] ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanup Hannes Frederic Sowa
@ 2013-07-23  8:28 ` Fan Du
  2013-07-23 13:39   ` Hannes Frederic Sowa
  2013-07-25  0:03 ` David Miller
  1 sibling, 1 reply; 4+ messages in thread
From: Fan Du @ 2013-07-23  8:28 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: netdev, srivatsa.bhat

Hallo Hannes

On 2013年07月23日 05:45, Hannes Frederic Sowa wrote:
> Otherwise we end up dereferencing the already freed net->ipv6.mrt pointer
> which leads to a panic (from Srivatsa S. Bhat):
>
> BUG: unable to handle kernel paging request at ffff882018552020
> IP: [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
> PGD 290a067 PUD 207ffe0067 PMD 207ff1d067 PTE 8000002018552060
> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
> +ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp ioatdma dca mlx4_core be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod
> CPU: 0 PID: 7 Comm: kworker/u33:0 Not tainted 3.11.0-rc1-ea45e-a #4
> Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
> Workqueue: netns cleanup_net
> task: ffff8810393641c0 ti: ffff881039366000 task.ti: ffff881039366000
> RIP: 0010:[<ffffffffa0366b02>]  [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
> RSP: 0018:ffff881039367bd8  EFLAGS: 00010286
> RAX: ffff881039367fd8 RBX: ffff882018552000 RCX: dead000000200200
> RDX: 0000000000000000 RSI: ffff881039367b68 RDI: ffff881039367b68
> RBP: ffff881039367bf8 R08: ffff881039367b68 R09: 2222222222222222
> R10: 2222222222222222 R11: 2222222222222222 R12: ffff882015a7a040
> R13: ffff882014eb89c0 R14: ffff8820289e2800 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff88103fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff882018552020 CR3: 0000000001c0b000 CR4: 00000000000407f0
> Stack:
>   ffff881039367c18 ffff882014eb89c0 ffff882015e28c00 0000000000000000
>   ffff881039367c18 ffffffffa034d9d1 ffff8820289e2800 ffff882014eb89c0
>   ffff881039367c58 ffffffff815bdecb ffffffff815bddf2 ffff882014eb89c0
> Call Trace:
>   [<ffffffffa034d9d1>] rawv6_close+0x21/0x40 [ipv6]
>   [<ffffffff815bdecb>] inet_release+0xfb/0x220
>   [<ffffffff815bddf2>] ? inet_release+0x22/0x220
>   [<ffffffffa032686f>] inet6_release+0x3f/0x50 [ipv6]
>   [<ffffffff8151c1d9>] sock_release+0x29/0xa0
>   [<ffffffff81525520>] sk_release_kernel+0x30/0x70
>   [<ffffffffa034f14b>] icmpv6_sk_exit+0x3b/0x80 [ipv6]
>   [<ffffffff8152fff9>] ops_exit_list+0x39/0x60
>   [<ffffffff815306fb>] cleanup_net+0xfb/0x1a0
>   [<ffffffff81075e3a>] process_one_work+0x1da/0x610
>   [<ffffffff81075dc9>] ? process_one_work+0x169/0x610
>   [<ffffffff81076390>] worker_thread+0x120/0x3a0
>   [<ffffffff81076270>] ? process_one_work+0x610/0x610
>   [<ffffffff8107da2e>] kthread+0xee/0x100
>   [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70
>   [<ffffffff8162a99c>] ret_from_fork+0x7c/0xb0
>   [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70

This call trace actually comes a long way down from put_net, indicating net
reference count is ZERO, then clean all things up. Since Srivatsa didn't
enable CONFIG_IPV6_MROUTE_MULTIPLE_TABLES, so mrt could only be net->ipv6.mrt6,
which is allocated in ip6mr_rules_init. The only place to free mrt6 is
ip6mr_rules_exit in this configuration.

Question is how could mrt6 be freed in ip6mr_rules_exit, and then put_net has
decreased net->count to zero and then free all things up, all of this happened
in boot up process? Even if mrt6 has been freed, subsequent access on that area
didn't necessarily cause page fault, PTE 8000002018552060 shows the mapping is
not present at all. I think SLAB did tired down page entry for kfreed area once
pte has been setup already.

The scenario is weird for me, or my ignorance reach a new limit :)


> Code: 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 4c 8b 67 30 49 89 fd e8 db 3c 1e e1 49 8b 9c 24 90 08 00 00 48 85 db 74 06<4c>  39 6b 20 74 20 bb f3 ff ff ff e8 8e 3c 1e e1 89 d8 4c 8b 65
> RIP  [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
>   RSP<ffff881039367bd8>
> CR2: ffff882018552020
>
> Reported-by: Srivatsa S. Bhat<srivatsa.bhat@linux.vnet.ibm.com>
> Tested-by: Srivatsa S. Bhat<srivatsa.bhat@linux.vnet.ibm.com>
> Signed-off-by: Hannes Frederic Sowa<hannes@stressinduktion.org>
> ---
>   net/ipv6/ip6mr.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
> index 583e8d4..03986d3 100644
> --- a/net/ipv6/ip6mr.c
> +++ b/net/ipv6/ip6mr.c
> @@ -259,10 +259,12 @@ static void __net_exit ip6mr_rules_exit(struct net *net)
>   {
>   	struct mr6_table *mrt, *next;
>
> +	rtnl_lock();
>   	list_for_each_entry_safe(mrt, next,&net->ipv6.mr6_tables, list) {
>   		list_del(&mrt->list);
>   		ip6mr_free_table(mrt);
>   	}
> +	rtnl_unlock();
>   	fib_rules_unregister(net->ipv6.mr6_rules_ops);
>   }
>   #else
> @@ -289,7 +291,10 @@ static int __net_init ip6mr_rules_init(struct net *net)
>
>   static void __net_exit ip6mr_rules_exit(struct net *net)
>   {
> +	rtnl_lock();
>   	ip6mr_free_table(net->ipv6.mrt6);
> +	net->ipv6.mrt6 = NULL;
> +	rtnl_unlock();
>   }
>   #endif
>

-- 
浮沉随浪只记今朝笑

--fan

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net] ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanup
  2013-07-23  8:28 ` Fan Du
@ 2013-07-23 13:39   ` Hannes Frederic Sowa
  0 siblings, 0 replies; 4+ messages in thread
From: Hannes Frederic Sowa @ 2013-07-23 13:39 UTC (permalink / raw)
  To: Fan Du; +Cc: netdev, srivatsa.bhat

On Tue, Jul 23, 2013 at 04:28:14PM +0800, Fan Du wrote:
> This call trace actually comes a long way down from put_net, indicating net
> reference count is ZERO, then clean all things up. Since Srivatsa didn't
> enable CONFIG_IPV6_MROUTE_MULTIPLE_TABLES, so mrt could only be 
> net->ipv6.mrt6,
> which is allocated in ip6mr_rules_init. The only place to free mrt6 is
> ip6mr_rules_exit in this configuration.
> 
> Question is how could mrt6 be freed in ip6mr_rules_exit, and then put_net 
> has
> decreased net->count to zero and then free all things up, all of this 
> happened
> in boot up process? Even if mrt6 has been freed, subsequent access on that 
> area
> didn't necessarily cause page fault, PTE 8000002018552060 shows the mapping 
> is
> not present at all. I think SLAB did tired down page entry for kfreed area 
> once
> pte has been setup already.
> 
> The scenario is weird for me, or my ignorance reach a new limit :)

The important thing is: net->ipv6.mrt6 = NULL;

After we freed the mrt6 table we leave the pointer value intact. So when the
namespace cleanup calls the next callback and the icmpv6 socket closes, we try
to check mrt->mroute6_sk again.

See how ip6mr_for_each_table expands if you compile without
CONFIG_IPV6_MROUTE_MULTIPLE_TABLES:

#define ip6mr_for_each_table(mrt, net) \
        for (mrt = net->ipv6.mrt6; mrt; mrt = NULL)

It is merely a NULL check.

Does it make sense now? I added the rtnl_lock for ip6mr_free_table ==>
mroute_clean_tables ==> mif6_delete and the lock has nothing to do with this
"race", as the netspace cleanup is serialized in the workqueue.

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net] ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanup
  2013-07-22 21:45 [PATCH net] ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanup Hannes Frederic Sowa
  2013-07-23  8:28 ` Fan Du
@ 2013-07-25  0:03 ` David Miller
  1 sibling, 0 replies; 4+ messages in thread
From: David Miller @ 2013-07-25  0:03 UTC (permalink / raw)
  To: hannes; +Cc: netdev, srivatsa.bhat

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Mon, 22 Jul 2013 23:45:53 +0200

> Otherwise we end up dereferencing the already freed net->ipv6.mrt pointer
> which leads to a panic (from Srivatsa S. Bhat):
 ...
> Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Applied and queued up for -stable, thanks!

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-07-25  0:03 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-22 21:45 [PATCH net] ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanup Hannes Frederic Sowa
2013-07-23  8:28 ` Fan Du
2013-07-23 13:39   ` Hannes Frederic Sowa
2013-07-25  0:03 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).