netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [(RFC) PATCH ] NULL pointer dereference on rmmod iptable_mangle.
@ 2020-06-03 21:25 David Wilder
  2020-06-03 22:05 ` Florian Westphal
  0 siblings, 1 reply; 6+ messages in thread
From: David Wilder @ 2020-06-03 21:25 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, wilder, mkubecek

This crash happened on a ppc64le system running ltp network tests when ltp script ran "rmmod iptable_mangle".

[213425.602369] BUG: Kernel NULL pointer dereference at 0x00000010
[213425.602388] Faulting instruction address: 0xc008000000550bdc
[213425.602399] Oops: Kernel access of bad area, sig: 11 [#1]
[213425.602409] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[213425.602418] Modules linked in: nf_log_ipv4 nf_log_common iptable_mangle(-) iptable_nat nf_nat nf_conntrack iptable_filter ip_tables xt_limit xt_multiport xt_LOG xt_tcpudp nf_defrag_ipv6 nf_defrag_ipv4 x_tables sch_netem tcp_bbr rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver rds dummy sctp crypto_user veth uhid kvm_pr kvm vfio_iommu_spapr_tce vfio_spapr_eeh vfio hci_vhci bluetooth ecdh_generic ecc vhost_net tap vhost_vsock vmw_vsock_virtio_transport_common vhost vsock uinput n_gsm pps_ldisc ppp_synctty ppp_async ppp_generic slip slhc serport brd tun fuse vfat fat xfs ext4 crc16 mbcache jbd2 mlx5_ib ib_uverbs ib_core mlx5_core mlxfw tls loop be2net ibmveth(XX) st sr_mod cdrom lp parport_pc parport nvram xfrm_user joydev binfmt_misc rpadlpar_io(XX) rpaphp(XX) xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nfsv3 nfs_acl nfs lockd grace sunrpc fscache af_packet rfkill vmx_crypto gf128mul ibmvnic uio_pdrv_genirq crct10dif_vpmsum uio rtc_generic btrfs
[213425.602577]  libcrc32c xor raid6_pq dm_service_time sd_mod ibmvfc(XX) scsi_transport_fc crc32c_vpmsum dm_mirror dm_region_hash dm_log sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod [last unloaded: ipt_REJECT]
[213425.602659] Supported: No, Unreleased kernel
[213425.602671] CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G               X   5.3.18-14-default #1 SLE15-SP2 (unreleased)
[213425.602682] NIP:  c008000000550bdc LR: c008000001de00c8 CTR: c008000000550b48
[213425.602692] REGS: c000000002973250 TRAP: 0380   Tainted: G               X    (5.3.18-14-default)
[213425.602701] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 88082822  XER: 00000001
[213425.602726] CFAR: c008000000551050 IRQMASK: 0
                GPR00: c008000001de00c8 c0000000029734e0 c00800000055d800 c00000087b7c3600
                GPR04: c000000002973768 0000000000000000 0000000000000000 c0000007ab050800
                GPR08: 000000000000000e c0000007ab050814 c000000001558380 c008000001de04e0
                GPR12: c008000000550b48 c0000000021e0000 c00000000016b358 0000000000000100
                GPR16: 000000000000008e 00000000000000a0 0000000000000000 0000000000000005
                GPR20: 0000000000000000 c000000001168fa8 0000000000000000 c0000007ac4b46d4
                GPR24: c000000002973768 c008000000555f80 0000000000000001 c0000000011ee000
                GPR28: c0000000011ee000 c00000087b7c3600 c0000007ab05080e c000000002973768
[213425.602816] NIP [c008000000550bdc] ipt_do_table+0x94/0x980 [ip_tables]
[213425.602827] LR [c008000001de00c8] iptable_mangle_hook+0x50/0x180 [iptable_mangle]
[213425.602835] Call Trace:
[213425.602843] [c0000000029734e0] [c000000002973570] 0xc000000002973570 (unreliable)
[213425.602856] [c000000002973690] [c008000001de00c8] iptable_mangle_hook+0x50/0x180 [iptable_mangle]
[213425.602871] [c0000000029736f0] [c000000000a82b60] nf_hook_slow+0x70/0x140
[213425.602882] [c000000002973740] [c000000000a90cdc] ip_rcv+0xac/0x120
[213425.602894] [c0000000029737c0] [c0000000009d978c] __netif_receive_skb_core+0x42c/0x1160
[213425.602906] [c0000000029738a0] [c0000000009dab80] __netif_receive_skb_list_core+0x130/0x330
[213425.602919] [c000000002973940] [c0000000009dafa4] netif_receive_skb_list_internal+0x224/0x350
[213425.602932] [c0000000029739c0] [c0000000009db2b4] gro_normal_list.part.109+0x34/0x60
[213425.602943] [c0000000029739f0] [c0000000009dc0c8] napi_gro_receive+0x1b8/0x200
[213425.602957] [c000000002973a30] [c008000000e32368] ibmvnic_poll+0x2d0/0x410 [ibmvnic]
[213425.602969] [c000000002973b10] [c0000000009dcebc] net_rx_action+0x1ec/0x540
[213425.602982] [c000000002973c30] [c000000000c1ff68] __do_softirq+0x178/0x424
[213425.602994] [c000000002973d20] [c00000000013c924] run_ksoftirqd+0x64/0x90
[213425.603006] [c000000002973d40] [c0000000001717c0] smpboot_thread_fn+0x270/0x2c0
[213425.603018] [c000000002973db0] [c00000000016b4fc] kthread+0x1ac/0x1c0
[213425.603029] [c000000002973e20] [c00000000000b660] ret_from_kernel_thread+0x5c/0x7c
[213425.603038] Instruction dump:
[213425.603046] e8e300c0 82c40000 e92d1178 f9210118 39200000 2fbc0000 7fc74214 419e046c
[213425.603067] eb380010 2fb90000 419e0474 393e0006 <80850010> 38c00000 7d404e2c 39200001
[213425.603089] ---[ end trace f2babb2170f723cc ]---
[213425.690517]

In the crash we find in iptable_mangle_hook() that state->net->ipv4.iptable_mangle=NULL causing a NULL pointer dereference. net->ipv4.iptable_mangle is set to NULL in iptable_mangle_net_exit() and called when ip_mangle modules is unloaded. A rmmod task was found in the crash dump.  A 2nd crash showed the same problem when running "rmmod iptable_filter" (net->ipv4.iptable_filter=NULL).

Once a hook is registered packets will picked up a pointer from: net->ipv4.iptable_$table. The patch adds a call to synchronize_net() in ipt_unregister_table() to insure no packets are in flight that have picked up the pointer before completing the un-register.

This change has has prevented the problem in our testing.  However, we have concerns with this change as it would mean that on netns cleanup, we would need one synchronize_net() call for every table in use. Also, on module unload, there would be one synchronize_net() for every existing netns.

Signed-off-by: David Wilder <dwilder@us.ibm.com>
---
 net/ipv4/netfilter/ip_tables.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index c2670ea..97c4121 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1800,8 +1800,10 @@ int ipt_register_table(struct net *net, const struct xt_table *table,
 void ipt_unregister_table(struct net *net, struct xt_table *table,
 			  const struct nf_hook_ops *ops)
 {
-	if (ops)
+	if (ops) {
 		nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
+		synchronize_net();
+	}
 	__ipt_unregister_table(net, table);
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [(RFC) PATCH ] NULL pointer dereference on rmmod iptable_mangle.
  2020-06-03 21:25 [(RFC) PATCH ] NULL pointer dereference on rmmod iptable_mangle David Wilder
@ 2020-06-03 22:05 ` Florian Westphal
  2020-06-04  6:00   ` dwilder
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Westphal @ 2020-06-03 22:05 UTC (permalink / raw)
  To: David Wilder; +Cc: netdev, netfilter-devel, wilder, mkubecek

David Wilder <dwilder@us.ibm.com> wrote:
> This crash happened on a ppc64le system running ltp network tests when ltp script ran "rmmod iptable_mangle".
> 
> [213425.602369] BUG: Kernel NULL pointer dereference at 0x00000010
> [213425.602388] Faulting instruction address: 0xc008000000550bdc
[..]

> In the crash we find in iptable_mangle_hook() that state->net->ipv4.iptable_mangle=NULL causing a NULL pointer dereference. net->ipv4.iptable_mangle is set to NULL in iptable_mangle_net_exit() and called when ip_mangle modules is unloaded. A rmmod task was found in the crash dump.  A 2nd crash showed the same problem when running "rmmod iptable_filter" (net->ipv4.iptable_filter=NULL).
> 
> Once a hook is registered packets will picked up a pointer from: net->ipv4.iptable_$table. The patch adds a call to synchronize_net() in ipt_unregister_table() to insure no packets are in flight that have picked up the pointer before completing the un-register.
> 
> This change has has prevented the problem in our testing.  However, we have concerns with this change as it would mean that on netns cleanup, we would need one synchronize_net() call for every table in use. Also, on module unload, there would be one synchronize_net() for every existing netns.

Yes, I agree with the analysis.

> Signed-off-by: David Wilder <dwilder@us.ibm.com>
> ---
>  net/ipv4/netfilter/ip_tables.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
> index c2670ea..97c4121 100644
> --- a/net/ipv4/netfilter/ip_tables.c
> +++ b/net/ipv4/netfilter/ip_tables.c
> @@ -1800,8 +1800,10 @@ int ipt_register_table(struct net *net, const struct xt_table *table,
>  void ipt_unregister_table(struct net *net, struct xt_table *table,
>  			  const struct nf_hook_ops *ops)
>  {
> -	if (ops)
> +	if (ops) {
>  		nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
> +		synchronize_net();
> +	}

I'd wager ebtables, arptables and ip6tables have the same bug.

The extra synchronize_net() isn't ideal.  We could probably do it this
way and then improve in a second patch.

One way to fix this without a new synchronize_net() is to switch all
iptable_foo.c to use ".pre_exit" hook as well.

pre_exit would unregister the underlying hook and .exit would to the
table freeing.

Since the netns core already does an unconditional synchronize_rcu after
the pre_exit hooks this would avoid the problem as well.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [(RFC) PATCH ] NULL pointer dereference on rmmod iptable_mangle.
  2020-06-03 22:05 ` Florian Westphal
@ 2020-06-04  6:00   ` dwilder
  2020-06-04 10:38     ` Florian Westphal
  0 siblings, 1 reply; 6+ messages in thread
From: dwilder @ 2020-06-04  6:00 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, netfilter-devel, wilder, mkubecek

On 2020-06-03 15:05, Florian Westphal wrote:
> David Wilder <dwilder@us.ibm.com> wrote:
>> This crash happened on a ppc64le system running ltp network tests when 
>> ltp script ran "rmmod iptable_mangle".
>> 
>> [213425.602369] BUG: Kernel NULL pointer dereference at 0x00000010
>> [213425.602388] Faulting instruction address: 0xc008000000550bdc
> [..]
> 
>> In the crash we find in iptable_mangle_hook() that 
>> state->net->ipv4.iptable_mangle=NULL causing a NULL pointer 
>> dereference. net->ipv4.iptable_mangle is set to NULL in 
>> iptable_mangle_net_exit() and called when ip_mangle modules is 
>> unloaded. A rmmod task was found in the crash dump.  A 2nd crash 
>> showed the same problem when running "rmmod iptable_filter" 
>> (net->ipv4.iptable_filter=NULL).
>> 
>> Once a hook is registered packets will picked up a pointer from: 
>> net->ipv4.iptable_$table. The patch adds a call to synchronize_net() 
>> in ipt_unregister_table() to insure no packets are in flight that have 
>> picked up the pointer before completing the un-register.
>> 
>> This change has has prevented the problem in our testing.  However, we 
>> have concerns with this change as it would mean that on netns cleanup, 
>> we would need one synchronize_net() call for every table in use. Also, 
>> on module unload, there would be one synchronize_net() for every 
>> existing netns.
> 
> Yes, I agree with the analysis.
> 
>> Signed-off-by: David Wilder <dwilder@us.ibm.com>
>> ---
>>  net/ipv4/netfilter/ip_tables.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>> 
>> diff --git a/net/ipv4/netfilter/ip_tables.c 
>> b/net/ipv4/netfilter/ip_tables.c
>> index c2670ea..97c4121 100644
>> --- a/net/ipv4/netfilter/ip_tables.c
>> +++ b/net/ipv4/netfilter/ip_tables.c
>> @@ -1800,8 +1800,10 @@ int ipt_register_table(struct net *net, const 
>> struct xt_table *table,
>>  void ipt_unregister_table(struct net *net, struct xt_table *table,
>>  			  const struct nf_hook_ops *ops)
>>  {
>> -	if (ops)
>> +	if (ops) {
>>  		nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
>> +		synchronize_net();
>> +	}
> 
> I'd wager ebtables, arptables and ip6tables have the same bug.
> 
> The extra synchronize_net() isn't ideal.  We could probably do it this
> way and then improve in a second patch.
> 
> One way to fix this without a new synchronize_net() is to switch all
> iptable_foo.c to use ".pre_exit" hook as well.
> 
> pre_exit would unregister the underlying hook and .exit would to the
> table freeing.
> 
> Since the netns core already does an unconditional synchronize_rcu 
> after
> the pre_exit hooks this would avoid the problem as well.

Something like this?  (un-tested)

diff --git a/net/ipv4/netfilter/iptable_mangle.c 
b/net/ipv4/netfilter/iptable_mangle.c
index bb9266ea3785..0d448e4d5213 100644
--- a/net/ipv4/netfilter/iptable_mangle.c
+++ b/net/ipv4/netfilter/iptable_mangle.c
@@ -100,15 +100,26 @@ static int __net_init 
iptable_mangle_table_init(struct net *net)
         return ret;
  }

+static void __net_exit iptable_mangle_net_pre_exit(struct net *net)
+{
+       struct xt_table *table = net->ipv4.iptable_mangle;
+
+       if (mangle_ops)
+               nf_unregister_net_hooks(net, mangle_ops,
+                       hweight32(table->valid_hooks));
+}
+
+
  static void __net_exit iptable_mangle_net_exit(struct net *net)
  {
         if (!net->ipv4.iptable_mangle)
                 return;
-       ipt_unregister_table(net, net->ipv4.iptable_mangle, mangle_ops);
+       ipt_unregister_table(net, net->ipv4.iptable_mangle, NULL);
         net->ipv4.iptable_mangle = NULL;
  }

  static struct pernet_operations iptable_mangle_net_ops = {
+       .pre_exit = iptable_mangle_net_pre_exit,
         .exit = iptable_mangle_net_exit,
  };


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [(RFC) PATCH ] NULL pointer dereference on rmmod iptable_mangle.
  2020-06-04  6:00   ` dwilder
@ 2020-06-04 10:38     ` Florian Westphal
  2020-06-15 11:44       ` Florian Westphal
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Westphal @ 2020-06-04 10:38 UTC (permalink / raw)
  To: dwilder; +Cc: Florian Westphal, netdev, netfilter-devel, wilder, mkubecek

dwilder <dwilder@us.ibm.com> wrote:
> > Since the netns core already does an unconditional synchronize_rcu after
> > the pre_exit hooks this would avoid the problem as well.
> 
> Something like this?  (un-tested)

Yes.

> diff --git a/net/ipv4/netfilter/iptable_mangle.c
> b/net/ipv4/netfilter/iptable_mangle.c
> index bb9266ea3785..0d448e4d5213 100644
> --- a/net/ipv4/netfilter/iptable_mangle.c
> +++ b/net/ipv4/netfilter/iptable_mangle.c
> @@ -100,15 +100,26 @@ static int __net_init iptable_mangle_table_init(struct
> net *net)
>         return ret;
>  }
> 
> +static void __net_exit iptable_mangle_net_pre_exit(struct net *net)
> +{
> +       struct xt_table *table = net->ipv4.iptable_mangle;
> +
> +       if (mangle_ops)
> +               nf_unregister_net_hooks(net, mangle_ops,
> +                       hweight32(table->valid_hooks));
> +}

You probably need if (table) here, not mangle_ops.
I'm not sure if it makes sense to add a new

xt_unregister_table_hook() helper, I guess one would have to try
and see if that reduces copy&paste programming or not.

>  static void __net_exit iptable_mangle_net_exit(struct net *net)
>  {
>         if (!net->ipv4.iptable_mangle)
>                 return;
> -       ipt_unregister_table(net, net->ipv4.iptable_mangle, mangle_ops);
> +       ipt_unregister_table(net, net->ipv4.iptable_mangle, NULL);

I guess the 3rd arg could be removed from the helper.

But yes, this looks like what I had in mind.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [(RFC) PATCH ] NULL pointer dereference on rmmod iptable_mangle.
  2020-06-04 10:38     ` Florian Westphal
@ 2020-06-15 11:44       ` Florian Westphal
  2020-06-15 16:21         ` dwilder
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Westphal @ 2020-06-15 11:44 UTC (permalink / raw)
  To: Florian Westphal; +Cc: dwilder, netdev, netfilter-devel, wilder, mkubecek

Florian Westphal <fw@strlen.de> wrote:
> dwilder <dwilder@us.ibm.com> wrote:
> > > Since the netns core already does an unconditional synchronize_rcu after
> > > the pre_exit hooks this would avoid the problem as well.
> > 
> > Something like this?  (un-tested)
> 
> Yes.
> 
> > diff --git a/net/ipv4/netfilter/iptable_mangle.c
> > b/net/ipv4/netfilter/iptable_mangle.c
> > index bb9266ea3785..0d448e4d5213 100644
> > --- a/net/ipv4/netfilter/iptable_mangle.c
> > +++ b/net/ipv4/netfilter/iptable_mangle.c
> > @@ -100,15 +100,26 @@ static int __net_init iptable_mangle_table_init(struct
> > net *net)
> >         return ret;
> >  }
> > 
> > +static void __net_exit iptable_mangle_net_pre_exit(struct net *net)
> > +{
> > +       struct xt_table *table = net->ipv4.iptable_mangle;
> > +
> > +       if (mangle_ops)
> > +               nf_unregister_net_hooks(net, mangle_ops,
> > +                       hweight32(table->valid_hooks));
> > +}
> 
> You probably need if (table) here, not mangle_ops.
> I'm not sure if it makes sense to add a new
> 
> xt_unregister_table_hook() helper, I guess one would have to try
> and see if that reduces copy&paste programming or not.
> 
> >  static void __net_exit iptable_mangle_net_exit(struct net *net)
> >  {
> >         if (!net->ipv4.iptable_mangle)
> >                 return;
> > -       ipt_unregister_table(net, net->ipv4.iptable_mangle, mangle_ops);
> > +       ipt_unregister_table(net, net->ipv4.iptable_mangle, NULL);
> 
> I guess the 3rd arg could be removed from the helper.
> 
> But yes, this looks like what I had in mind.

Will there be a followup?  Otherwise I will place this on my todo-list.

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [(RFC) PATCH ] NULL pointer dereference on rmmod iptable_mangle.
  2020-06-15 11:44       ` Florian Westphal
@ 2020-06-15 16:21         ` dwilder
  0 siblings, 0 replies; 6+ messages in thread
From: dwilder @ 2020-06-15 16:21 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netdev, netfilter-devel, wilder, mkubecek, netfilter-devel-owner

On 2020-06-15 04:44, Florian Westphal wrote:
> Florian Westphal <fw@strlen.de> wrote:
>> dwilder <dwilder@us.ibm.com> wrote:
>> > > Since the netns core already does an unconditional synchronize_rcu after
>> > > the pre_exit hooks this would avoid the problem as well.
>> >
>> > Something like this?  (un-tested)
>> 
>> Yes.
>> 
>> > diff --git a/net/ipv4/netfilter/iptable_mangle.c
>> > b/net/ipv4/netfilter/iptable_mangle.c
>> > index bb9266ea3785..0d448e4d5213 100644
>> > --- a/net/ipv4/netfilter/iptable_mangle.c
>> > +++ b/net/ipv4/netfilter/iptable_mangle.c
>> > @@ -100,15 +100,26 @@ static int __net_init iptable_mangle_table_init(struct
>> > net *net)
>> >         return ret;
>> >  }
>> >
>> > +static void __net_exit iptable_mangle_net_pre_exit(struct net *net)
>> > +{
>> > +       struct xt_table *table = net->ipv4.iptable_mangle;
>> > +
>> > +       if (mangle_ops)
>> > +               nf_unregister_net_hooks(net, mangle_ops,
>> > +                       hweight32(table->valid_hooks));
>> > +}
>> 
>> You probably need if (table) here, not mangle_ops.
>> I'm not sure if it makes sense to add a new
>> 
>> xt_unregister_table_hook() helper, I guess one would have to try
>> and see if that reduces copy&paste programming or not.
>> 
>> >  static void __net_exit iptable_mangle_net_exit(struct net *net)
>> >  {
>> >         if (!net->ipv4.iptable_mangle)
>> >                 return;
>> > -       ipt_unregister_table(net, net->ipv4.iptable_mangle, mangle_ops);
>> > +       ipt_unregister_table(net, net->ipv4.iptable_mangle, NULL);
>> 
>> I guess the 3rd arg could be removed from the helper.
>> 
>> But yes, this looks like what I had in mind.
> 
> Will there be a followup?  Otherwise I will place this on my todo-list.
> 
> Thanks.

Hi Florian

I am working on a patch.  Will post soon.  Sorry for the delay.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-06-15 16:21 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-03 21:25 [(RFC) PATCH ] NULL pointer dereference on rmmod iptable_mangle David Wilder
2020-06-03 22:05 ` Florian Westphal
2020-06-04  6:00   ` dwilder
2020-06-04 10:38     ` Florian Westphal
2020-06-15 11:44       ` Florian Westphal
2020-06-15 16:21         ` dwilder

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).