All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] tun: Fix NULL pointer dereference in XDP redirect
@ 2018-05-25  4:32 Toshiaki Makita
  2018-05-25  9:59 ` Jason Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Toshiaki Makita @ 2018-05-25  4:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: Toshiaki Makita, netdev, Jason Wang

Calling XDP redirection requires preempt/bh disabled. Especially softirq
can call another XDP function and redirection functions, then percpu
value ri->map can be overwritten to NULL.

This is a generic XDP case called from tun.

[ 3535.736058] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 3535.743974] PGD 0 P4D 0
[ 3535.746530] Oops: 0000 [#1] SMP PTI
[ 3535.750049] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel crypto_simd cryptd enclosure hpwdt hpilo glue_helper ipmi_si pcspkr wmi mei_me ioatdma mei ipmi_devintf shpchp dca ipmi_msghandler lpc_ich acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm smartpqi i40e crc32c_intel scsi_transport_sas tg3 i2c_core ptp pps_core
[ 3535.813456] CPU: 5 PID: 1630 Comm: vhost-1614 Not tainted 4.17.0-rc4 #2
[ 3535.820127] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017
[ 3535.828732] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30
[ 3535.833740] RSP: 0018:ffffb4bc47bf7c58 EFLAGS: 00010246
[ 3535.839009] RAX: ffff9fdfcfea1c40 RBX: 0000000000000000 RCX: ffff9fdf27fe3100
[ 3535.846205] RDX: ffff9fdfca769200 RSI: 0000000000000000 RDI: 0000000000000000
[ 3535.853402] RBP: ffffb4bc491d9000 R08: 00000000000045ad R09: 0000000000000ec0
[ 3535.860597] R10: 0000000000000001 R11: ffff9fdf26c3ce4e R12: ffff9fdf9e72c000
[ 3535.867794] R13: 0000000000000000 R14: fffffffffffffff2 R15: ffff9fdfc82cdd00
[ 3535.874990] FS:  0000000000000000(0000) GS:ffff9fdfcfe80000(0000) knlGS:0000000000000000
[ 3535.883152] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3535.888948] CR2: 0000000000000018 CR3: 0000000bde724004 CR4: 00000000007626e0
[ 3535.896145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3535.903342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3535.910538] PKRU: 55555554
[ 3535.913267] Call Trace:
[ 3535.915736]  xdp_do_generic_redirect+0x7a/0x310
[ 3535.920310]  do_xdp_generic.part.117+0x285/0x370
[ 3535.924970]  tun_get_user+0x5b9/0x1260 [tun]
[ 3535.929279]  tun_sendmsg+0x52/0x70 [tun]
[ 3535.933237]  handle_tx+0x2ad/0x5f0 [vhost_net]
[ 3535.937721]  vhost_worker+0xa5/0x100 [vhost]
[ 3535.942030]  kthread+0xf5/0x130
[ 3535.945198]  ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost]
[ 3535.950031]  ? kthread_bind+0x10/0x10
[ 3535.953727]  ret_from_fork+0x35/0x40
[ 3535.957334] Code: 0e 74 15 83 f8 10 75 05 e9 49 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 29 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 49 a9 b3 ff 31 c0 c3
[ 3535.976387] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffffb4bc47bf7c58
[ 3535.982883] CR2: 0000000000000018
[ 3535.987096] ---[ end trace 383b299dd1430240 ]---
[ 3536.131325] Kernel panic - not syncing: Fatal exception
[ 3536.137484] Kernel Offset: 0x26a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 3536.281406] ---[ end Kernel panic - not syncing: Fatal exception ]---

And a kernel with generic case fixed still panics in tun driver XDP
redirect, because it did not disable bh.

[ 2055.128746] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 2055.136662] PGD 0 P4D 0
[ 2055.139219] Oops: 0000 [#1] SMP PTI
[ 2055.142736] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel ipmi_ssif crypto_simd enclosure cryptd hpwdt glue_helper ioatdma hpilo wmi dca pcspkr ipmi_si acpi_power_meter ipmi_devintf shpchp mei_me ipmi_msghandler mei lpc_ich sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm i40e smartpqi tg3 scsi_transport_sas crc32c_intel i2c_core ptp pps_core
[ 2055.206142] CPU: 6 PID: 1693 Comm: vhost-1683 Tainted: G        W         4.17.0-rc5-fix-tun+ #1
[ 2055.215011] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017
[ 2055.223617] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30
[ 2055.228624] RSP: 0018:ffff998b07607cc0 EFLAGS: 00010246
[ 2055.233892] RAX: ffff8dbd8e235700 RBX: ffff8dbd8ff21c40 RCX: 0000000000000004
[ 2055.241089] RDX: ffff998b097a9000 RSI: 0000000000000000 RDI: 0000000000000000
[ 2055.248286] RBP: 0000000000000000 R08: 00000000000065a8 R09: 0000000000005d80
[ 2055.255483] R10: 0000000000000040 R11: ffff8dbcf0100000 R12: ffff998b097a9000
[ 2055.262681] R13: ffff8dbd8c98c000 R14: 0000000000000000 R15: ffff998b07607d78
[ 2055.269879] FS:  0000000000000000(0000) GS:ffff8dbd8ff00000(0000) knlGS:0000000000000000
[ 2055.278039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2055.283834] CR2: 0000000000000018 CR3: 0000000c0c8cc005 CR4: 00000000007626e0
[ 2055.291030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2055.298227] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2055.305424] PKRU: 55555554
[ 2055.308153] Call Trace:
[ 2055.310624]  xdp_do_redirect+0x7b/0x380
[ 2055.314499]  tun_get_user+0x10fe/0x12a0 [tun]
[ 2055.318895]  tun_sendmsg+0x52/0x70 [tun]
[ 2055.322852]  handle_tx+0x2ad/0x5f0 [vhost_net]
[ 2055.327337]  vhost_worker+0xa5/0x100 [vhost]
[ 2055.331646]  kthread+0xf5/0x130
[ 2055.334813]  ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost]
[ 2055.339646]  ? kthread_bind+0x10/0x10
[ 2055.343343]  ret_from_fork+0x35/0x40
[ 2055.346950] Code: 0e 74 15 83 f8 10 75 05 e9 e9 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 c9 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 e9 a9 b3 ff 31 c0 c3
[ 2055.366004] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffff998b07607cc0
[ 2055.372500] CR2: 0000000000000018
[ 2055.375856] ---[ end trace 2a2dcc5e9e174268 ]---
[ 2055.523626] Kernel panic - not syncing: Fatal exception
[ 2055.529796] Kernel Offset: 0x2e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2055.677539] ---[ end Kernel panic - not syncing: Fatal exception ]---

Fixes: 761876c857cb ("tap: XDP support")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
 drivers/net/tun.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 45d8077..4fc7dbf 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1650,6 +1650,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	else
 		*skb_xdp = 0;
 
+	local_bh_disable();
 	preempt_disable();
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(tun->xdp_prog);
@@ -1676,6 +1677,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 				goto err_redirect;
 			rcu_read_unlock();
 			preempt_enable();
+			local_bh_enable();
 			return NULL;
 		case XDP_TX:
 			get_page(alloc_frag->page);
@@ -1685,6 +1687,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 			tun_xdp_flush(tun->dev);
 			rcu_read_unlock();
 			preempt_enable();
+			local_bh_enable();
 			return NULL;
 		case XDP_PASS:
 			delta = orig_data - xdp.data;
@@ -1704,6 +1707,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	if (!skb) {
 		rcu_read_unlock();
 		preempt_enable();
+		local_bh_enable();
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -1714,6 +1718,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 
 	rcu_read_unlock();
 	preempt_enable();
+	local_bh_enable();
 
 	return skb;
 
@@ -1722,6 +1727,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 err_xdp:
 	rcu_read_unlock();
 	preempt_enable();
+	local_bh_enable();
 	this_cpu_inc(tun->pcpu_stats->rx_dropped);
 	return NULL;
 }
@@ -1917,16 +1923,22 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		struct bpf_prog *xdp_prog;
 		int ret;
 
+		local_bh_disable();
+		preempt_disable();
 		rcu_read_lock();
 		xdp_prog = rcu_dereference(tun->xdp_prog);
 		if (xdp_prog) {
 			ret = do_xdp_generic(xdp_prog, skb);
 			if (ret != XDP_PASS) {
 				rcu_read_unlock();
+				preempt_enable();
+				local_bh_enable();
 				return total_len;
 			}
 		}
 		rcu_read_unlock();
+		preempt_enable();
+		local_bh_enable();
 	}
 
 	rcu_read_lock();
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net] tun: Fix NULL pointer dereference in XDP redirect
  2018-05-25  4:32 [PATCH net] tun: Fix NULL pointer dereference in XDP redirect Toshiaki Makita
@ 2018-05-25  9:59 ` Jason Wang
  2018-05-25 13:43   ` Toshiaki Makita
  0 siblings, 1 reply; 5+ messages in thread
From: Jason Wang @ 2018-05-25  9:59 UTC (permalink / raw)
  To: Toshiaki Makita, David S. Miller; +Cc: netdev



On 2018年05月25日 12:32, Toshiaki Makita wrote:
> Calling XDP redirection requires preempt/bh disabled. Especially softirq
> can call another XDP function and redirection functions, then percpu
> value ri->map can be overwritten to NULL.
>
> This is a generic XDP case called from tun.
>
> [ 3535.736058] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> [ 3535.743974] PGD 0 P4D 0
> [ 3535.746530] Oops: 0000 [#1] SMP PTI
> [ 3535.750049] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel crypto_simd cryptd enclosure hpwdt hpilo glue_helper ipmi_si pcspkr wmi mei_me ioatdma mei ipmi_devintf shpchp dca ipmi_msghandler lpc_ich acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm smartpqi i40e crc32c_intel scsi_transport_sas tg3 i2c_core ptp pps_core
> [ 3535.813456] CPU: 5 PID: 1630 Comm: vhost-1614 Not tainted 4.17.0-rc4 #2
> [ 3535.820127] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017
> [ 3535.828732] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30
> [ 3535.833740] RSP: 0018:ffffb4bc47bf7c58 EFLAGS: 00010246
> [ 3535.839009] RAX: ffff9fdfcfea1c40 RBX: 0000000000000000 RCX: ffff9fdf27fe3100
> [ 3535.846205] RDX: ffff9fdfca769200 RSI: 0000000000000000 RDI: 0000000000000000
> [ 3535.853402] RBP: ffffb4bc491d9000 R08: 00000000000045ad R09: 0000000000000ec0
> [ 3535.860597] R10: 0000000000000001 R11: ffff9fdf26c3ce4e R12: ffff9fdf9e72c000
> [ 3535.867794] R13: 0000000000000000 R14: fffffffffffffff2 R15: ffff9fdfc82cdd00
> [ 3535.874990] FS:  0000000000000000(0000) GS:ffff9fdfcfe80000(0000) knlGS:0000000000000000
> [ 3535.883152] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3535.888948] CR2: 0000000000000018 CR3: 0000000bde724004 CR4: 00000000007626e0
> [ 3535.896145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3535.903342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 3535.910538] PKRU: 55555554
> [ 3535.913267] Call Trace:
> [ 3535.915736]  xdp_do_generic_redirect+0x7a/0x310
> [ 3535.920310]  do_xdp_generic.part.117+0x285/0x370
> [ 3535.924970]  tun_get_user+0x5b9/0x1260 [tun]
> [ 3535.929279]  tun_sendmsg+0x52/0x70 [tun]
> [ 3535.933237]  handle_tx+0x2ad/0x5f0 [vhost_net]
> [ 3535.937721]  vhost_worker+0xa5/0x100 [vhost]
> [ 3535.942030]  kthread+0xf5/0x130
> [ 3535.945198]  ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost]
> [ 3535.950031]  ? kthread_bind+0x10/0x10
> [ 3535.953727]  ret_from_fork+0x35/0x40
> [ 3535.957334] Code: 0e 74 15 83 f8 10 75 05 e9 49 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 29 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 49 a9 b3 ff 31 c0 c3
> [ 3535.976387] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffffb4bc47bf7c58
> [ 3535.982883] CR2: 0000000000000018
> [ 3535.987096] ---[ end trace 383b299dd1430240 ]---
> [ 3536.131325] Kernel panic - not syncing: Fatal exception
> [ 3536.137484] Kernel Offset: 0x26a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 3536.281406] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> And a kernel with generic case fixed still panics in tun driver XDP
> redirect, because it did not disable bh.
>
> [ 2055.128746] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> [ 2055.136662] PGD 0 P4D 0
> [ 2055.139219] Oops: 0000 [#1] SMP PTI
> [ 2055.142736] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel ipmi_ssif crypto_simd enclosure cryptd hpwdt glue_helper ioatdma hpilo wmi dca pcspkr ipmi_si acpi_power_meter ipmi_devintf shpchp mei_me ipmi_msghandler mei lpc_ich sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm i40e smartpqi tg3 scsi_transport_sas crc32c_intel i2c_core ptp pps_core
> [ 2055.206142] CPU: 6 PID: 1693 Comm: vhost-1683 Tainted: G        W         4.17.0-rc5-fix-tun+ #1
> [ 2055.215011] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017
> [ 2055.223617] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30
> [ 2055.228624] RSP: 0018:ffff998b07607cc0 EFLAGS: 00010246
> [ 2055.233892] RAX: ffff8dbd8e235700 RBX: ffff8dbd8ff21c40 RCX: 0000000000000004
> [ 2055.241089] RDX: ffff998b097a9000 RSI: 0000000000000000 RDI: 0000000000000000
> [ 2055.248286] RBP: 0000000000000000 R08: 00000000000065a8 R09: 0000000000005d80
> [ 2055.255483] R10: 0000000000000040 R11: ffff8dbcf0100000 R12: ffff998b097a9000
> [ 2055.262681] R13: ffff8dbd8c98c000 R14: 0000000000000000 R15: ffff998b07607d78
> [ 2055.269879] FS:  0000000000000000(0000) GS:ffff8dbd8ff00000(0000) knlGS:0000000000000000
> [ 2055.278039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2055.283834] CR2: 0000000000000018 CR3: 0000000c0c8cc005 CR4: 00000000007626e0
> [ 2055.291030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2055.298227] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2055.305424] PKRU: 55555554
> [ 2055.308153] Call Trace:
> [ 2055.310624]  xdp_do_redirect+0x7b/0x380
> [ 2055.314499]  tun_get_user+0x10fe/0x12a0 [tun]
> [ 2055.318895]  tun_sendmsg+0x52/0x70 [tun]
> [ 2055.322852]  handle_tx+0x2ad/0x5f0 [vhost_net]
> [ 2055.327337]  vhost_worker+0xa5/0x100 [vhost]
> [ 2055.331646]  kthread+0xf5/0x130
> [ 2055.334813]  ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost]
> [ 2055.339646]  ? kthread_bind+0x10/0x10
> [ 2055.343343]  ret_from_fork+0x35/0x40
> [ 2055.346950] Code: 0e 74 15 83 f8 10 75 05 e9 e9 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 c9 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 e9 a9 b3 ff 31 c0 c3
> [ 2055.366004] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffff998b07607cc0
> [ 2055.372500] CR2: 0000000000000018
> [ 2055.375856] ---[ end trace 2a2dcc5e9e174268 ]---
> [ 2055.523626] Kernel panic - not syncing: Fatal exception
> [ 2055.529796] Kernel Offset: 0x2e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 2055.677539] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> Fixes: 761876c857cb ("tap: XDP support")
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> ---
>   drivers/net/tun.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 45d8077..4fc7dbf 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1650,6 +1650,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>   	else
>   		*skb_xdp = 0;
>   
> +	local_bh_disable();
>   	preempt_disable();
>   	rcu_read_lock();
>   	xdp_prog = rcu_dereference(tun->xdp_prog);
> @@ -1676,6 +1677,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>   				goto err_redirect;
>   			rcu_read_unlock();
>   			preempt_enable();
> +			local_bh_enable();
>   			return NULL;
>   		case XDP_TX:
>   			get_page(alloc_frag->page);
> @@ -1685,6 +1687,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>   			tun_xdp_flush(tun->dev);
>   			rcu_read_unlock();
>   			preempt_enable();
> +			local_bh_enable();
>   			return NULL;
>   		case XDP_PASS:
>   			delta = orig_data - xdp.data;
> @@ -1704,6 +1707,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>   	if (!skb) {
>   		rcu_read_unlock();
>   		preempt_enable();
> +		local_bh_enable();
>   		return ERR_PTR(-ENOMEM);
>   	}
>   
> @@ -1714,6 +1718,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>   
>   	rcu_read_unlock();
>   	preempt_enable();
> +	local_bh_enable();
>   
>   	return skb;
>   
> @@ -1722,6 +1727,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>   err_xdp:
>   	rcu_read_unlock();
>   	preempt_enable();
> +	local_bh_enable();
>   	this_cpu_inc(tun->pcpu_stats->rx_dropped);
>   	return NULL;
>   }
> @@ -1917,16 +1923,22 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>   		struct bpf_prog *xdp_prog;
>   		int ret;
>   
> +		local_bh_disable();
> +		preempt_disable();
>   		rcu_read_lock();
>   		xdp_prog = rcu_dereference(tun->xdp_prog);
>   		if (xdp_prog) {
>   			ret = do_xdp_generic(xdp_prog, skb);
>   			if (ret != XDP_PASS) {
>   				rcu_read_unlock();
> +				preempt_enable();
> +				local_bh_enable();
>   				return total_len;
>   			}
>   		}
>   		rcu_read_unlock();
> +		preempt_enable();
> +		local_bh_enable();
>   	}
>   
>   	rcu_read_lock();

Good catch, thanks.

But I think we can just replace preempt_disable()/enable() with 
local_bh_disable()/local_bh_enable() ?

Thanks

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net] tun: Fix NULL pointer dereference in XDP redirect
  2018-05-25  9:59 ` Jason Wang
@ 2018-05-25 13:43   ` Toshiaki Makita
  2018-05-28  2:24     ` Jason Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Toshiaki Makita @ 2018-05-25 13:43 UTC (permalink / raw)
  To: Jason Wang, Toshiaki Makita, David S. Miller; +Cc: netdev

On 18/05/25 (金) 18:59, Jason Wang wrote:
> On 2018年05月25日 12:32, Toshiaki Makita wrote:
>> Calling XDP redirection requires preempt/bh disabled. Especially softirq
>> can call another XDP function and redirection functions, then percpu
>> value ri->map can be overwritten to NULL.
>>
>> This is a generic XDP case called from tun.
>>
>> [ 3535.736058] BUG: unable to handle kernel NULL pointer dereference 
>> at 0000000000000018
>> [ 3535.743974] PGD 0 P4D 0
>> [ 3535.746530] Oops: 0000 [#1] SMP PTI
>> [ 3535.750049] Modules linked in: vhost_net vhost tap tun bridge stp 
>> llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter 
>> sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm 
>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif 
>> irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses 
>> aesni_intel crypto_simd cryptd enclosure hpwdt hpilo glue_helper 
>> ipmi_si pcspkr wmi mei_me ioatdma mei ipmi_devintf shpchp dca 
>> ipmi_msghandler lpc_ich acpi_power_meter sch_fq_codel ip_tables xfs 
>> libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea 
>> sysfillrect sysimgblt fb_sys_fops ttm drm smartpqi i40e crc32c_intel 
>> scsi_transport_sas tg3 i2c_core ptp pps_core
>> [ 3535.813456] CPU: 5 PID: 1630 Comm: vhost-1614 Not tainted 
>> 4.17.0-rc4 #2
>> [ 3535.820127] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 
>> Gen10, BIOS U32 11/14/2017
>> [ 3535.828732] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30
>> [ 3535.833740] RSP: 0018:ffffb4bc47bf7c58 EFLAGS: 00010246
>> [ 3535.839009] RAX: ffff9fdfcfea1c40 RBX: 0000000000000000 RCX: 
>> ffff9fdf27fe3100
>> [ 3535.846205] RDX: ffff9fdfca769200 RSI: 0000000000000000 RDI: 
>> 0000000000000000
>> [ 3535.853402] RBP: ffffb4bc491d9000 R08: 00000000000045ad R09: 
>> 0000000000000ec0
>> [ 3535.860597] R10: 0000000000000001 R11: ffff9fdf26c3ce4e R12: 
>> ffff9fdf9e72c000
>> [ 3535.867794] R13: 0000000000000000 R14: fffffffffffffff2 R15: 
>> ffff9fdfc82cdd00
>> [ 3535.874990] FS:  0000000000000000(0000) GS:ffff9fdfcfe80000(0000) 
>> knlGS:0000000000000000
>> [ 3535.883152] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 3535.888948] CR2: 0000000000000018 CR3: 0000000bde724004 CR4: 
>> 00000000007626e0
>> [ 3535.896145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>> 0000000000000000
>> [ 3535.903342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
>> 0000000000000400
>> [ 3535.910538] PKRU: 55555554
>> [ 3535.913267] Call Trace:
>> [ 3535.915736]  xdp_do_generic_redirect+0x7a/0x310
>> [ 3535.920310]  do_xdp_generic.part.117+0x285/0x370
>> [ 3535.924970]  tun_get_user+0x5b9/0x1260 [tun]
>> [ 3535.929279]  tun_sendmsg+0x52/0x70 [tun]
>> [ 3535.933237]  handle_tx+0x2ad/0x5f0 [vhost_net]
>> [ 3535.937721]  vhost_worker+0xa5/0x100 [vhost]
>> [ 3535.942030]  kthread+0xf5/0x130
>> [ 3535.945198]  ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost]
>> [ 3535.950031]  ? kthread_bind+0x10/0x10
>> [ 3535.953727]  ret_from_fork+0x35/0x40
>> [ 3535.957334] Code: 0e 74 15 83 f8 10 75 05 e9 49 aa b3 ff f3 c3 0f 
>> 1f 80 00 00 00 00 f3 c3 e9 29 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 
>> 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 49 a9 b3 ff 31 
>> c0 c3
>> [ 3535.976387] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffffb4bc47bf7c58
>> [ 3535.982883] CR2: 0000000000000018
>> [ 3535.987096] ---[ end trace 383b299dd1430240 ]---
>> [ 3536.131325] Kernel panic - not syncing: Fatal exception
>> [ 3536.137484] Kernel Offset: 0x26a00000 from 0xffffffff81000000 
>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> [ 3536.281406] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>
>> And a kernel with generic case fixed still panics in tun driver XDP
>> redirect, because it did not disable bh.
>>
>> [ 2055.128746] BUG: unable to handle kernel NULL pointer dereference 
>> at 0000000000000018
>> [ 2055.136662] PGD 0 P4D 0
>> [ 2055.139219] Oops: 0000 [#1] SMP PTI
>> [ 2055.142736] Modules linked in: vhost_net vhost tap tun bridge stp 
>> llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter 
>> sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm 
>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass 
>> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel 
>> ipmi_ssif crypto_simd enclosure cryptd hpwdt glue_helper ioatdma hpilo 
>> wmi dca pcspkr ipmi_si acpi_power_meter ipmi_devintf shpchp mei_me 
>> ipmi_msghandler mei lpc_ich sch_fq_codel ip_tables xfs libcrc32c 
>> sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect 
>> sysimgblt fb_sys_fops ttm drm i40e smartpqi tg3 scsi_transport_sas 
>> crc32c_intel i2c_core ptp pps_core
>> [ 2055.206142] CPU: 6 PID: 1693 Comm: vhost-1683 Tainted: G        
>> W         4.17.0-rc5-fix-tun+ #1
>> [ 2055.215011] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 
>> Gen10, BIOS U32 11/14/2017
>> [ 2055.223617] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30
>> [ 2055.228624] RSP: 0018:ffff998b07607cc0 EFLAGS: 00010246
>> [ 2055.233892] RAX: ffff8dbd8e235700 RBX: ffff8dbd8ff21c40 RCX: 
>> 0000000000000004
>> [ 2055.241089] RDX: ffff998b097a9000 RSI: 0000000000000000 RDI: 
>> 0000000000000000
>> [ 2055.248286] RBP: 0000000000000000 R08: 00000000000065a8 R09: 
>> 0000000000005d80
>> [ 2055.255483] R10: 0000000000000040 R11: ffff8dbcf0100000 R12: 
>> ffff998b097a9000
>> [ 2055.262681] R13: ffff8dbd8c98c000 R14: 0000000000000000 R15: 
>> ffff998b07607d78
>> [ 2055.269879] FS:  0000000000000000(0000) GS:ffff8dbd8ff00000(0000) 
>> knlGS:0000000000000000
>> [ 2055.278039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 2055.283834] CR2: 0000000000000018 CR3: 0000000c0c8cc005 CR4: 
>> 00000000007626e0
>> [ 2055.291030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>> 0000000000000000
>> [ 2055.298227] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
>> 0000000000000400
>> [ 2055.305424] PKRU: 55555554
>> [ 2055.308153] Call Trace:
>> [ 2055.310624]  xdp_do_redirect+0x7b/0x380
>> [ 2055.314499]  tun_get_user+0x10fe/0x12a0 [tun]
>> [ 2055.318895]  tun_sendmsg+0x52/0x70 [tun]
>> [ 2055.322852]  handle_tx+0x2ad/0x5f0 [vhost_net]
>> [ 2055.327337]  vhost_worker+0xa5/0x100 [vhost]
>> [ 2055.331646]  kthread+0xf5/0x130
>> [ 2055.334813]  ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost]
>> [ 2055.339646]  ? kthread_bind+0x10/0x10
>> [ 2055.343343]  ret_from_fork+0x35/0x40
>> [ 2055.346950] Code: 0e 74 15 83 f8 10 75 05 e9 e9 aa b3 ff f3 c3 0f 
>> 1f 80 00 00 00 00 f3 c3 e9 c9 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 
>> 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 e9 a9 b3 ff 31 
>> c0 c3
>> [ 2055.366004] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffff998b07607cc0
>> [ 2055.372500] CR2: 0000000000000018
>> [ 2055.375856] ---[ end trace 2a2dcc5e9e174268 ]---
>> [ 2055.523626] Kernel panic - not syncing: Fatal exception
>> [ 2055.529796] Kernel Offset: 0x2e000000 from 0xffffffff81000000 
>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> [ 2055.677539] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>
>> Fixes: 761876c857cb ("tap: XDP support")
>> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
>> ---
>>   drivers/net/tun.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 45d8077..4fc7dbf 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1650,6 +1650,7 @@ static struct sk_buff *tun_build_skb(struct 
>> tun_struct *tun,
>>       else
>>           *skb_xdp = 0;
>> +    local_bh_disable();
>>       preempt_disable();
>>       rcu_read_lock();
>>       xdp_prog = rcu_dereference(tun->xdp_prog);
>> @@ -1676,6 +1677,7 @@ static struct sk_buff *tun_build_skb(struct 
>> tun_struct *tun,
>>                   goto err_redirect;
>>               rcu_read_unlock();
>>               preempt_enable();
>> +            local_bh_enable();
>>               return NULL;
>>           case XDP_TX:
>>               get_page(alloc_frag->page);
>> @@ -1685,6 +1687,7 @@ static struct sk_buff *tun_build_skb(struct 
>> tun_struct *tun,
>>               tun_xdp_flush(tun->dev);
>>               rcu_read_unlock();
>>               preempt_enable();
>> +            local_bh_enable();
>>               return NULL;
>>           case XDP_PASS:
>>               delta = orig_data - xdp.data;
>> @@ -1704,6 +1707,7 @@ static struct sk_buff *tun_build_skb(struct 
>> tun_struct *tun,
>>       if (!skb) {
>>           rcu_read_unlock();
>>           preempt_enable();
>> +        local_bh_enable();
>>           return ERR_PTR(-ENOMEM);
>>       }
>> @@ -1714,6 +1718,7 @@ static struct sk_buff *tun_build_skb(struct 
>> tun_struct *tun,
>>       rcu_read_unlock();
>>       preempt_enable();
>> +    local_bh_enable();
>>       return skb;
>> @@ -1722,6 +1727,7 @@ static struct sk_buff *tun_build_skb(struct 
>> tun_struct *tun,
>>   err_xdp:
>>       rcu_read_unlock();
>>       preempt_enable();
>> +    local_bh_enable();
>>       this_cpu_inc(tun->pcpu_stats->rx_dropped);
>>       return NULL;
>>   }
>> @@ -1917,16 +1923,22 @@ static ssize_t tun_get_user(struct tun_struct 
>> *tun, struct tun_file *tfile,
>>           struct bpf_prog *xdp_prog;
>>           int ret;
>> +        local_bh_disable();
>> +        preempt_disable();
>>           rcu_read_lock();
>>           xdp_prog = rcu_dereference(tun->xdp_prog);
>>           if (xdp_prog) {
>>               ret = do_xdp_generic(xdp_prog, skb);
>>               if (ret != XDP_PASS) {
>>                   rcu_read_unlock();
>> +                preempt_enable();
>> +                local_bh_enable();
>>                   return total_len;
>>               }
>>           }
>>           rcu_read_unlock();
>> +        preempt_enable();
>> +        local_bh_enable();
>>       }
>>       rcu_read_lock();
> 
> Good catch, thanks.
> 
> But I think we can just replace preempt_disable()/enable() with 
> local_bh_disable()/local_bh_enable() ?

I actually thought the same, but noticed this patch.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9ea4c380066fbe

It looks like they do not think local_bh_disable() implies 
preempt_disable(). But I'm not sure why..

Toshiaki Makita

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net] tun: Fix NULL pointer dereference in XDP redirect
  2018-05-25 13:43   ` Toshiaki Makita
@ 2018-05-28  2:24     ` Jason Wang
  2018-05-28  4:28       ` Toshiaki Makita
  0 siblings, 1 reply; 5+ messages in thread
From: Jason Wang @ 2018-05-28  2:24 UTC (permalink / raw)
  To: Toshiaki Makita, Toshiaki Makita, David S. Miller
  Cc: netdev, Michael S. Tsirkin



On 2018年05月25日 21:43, Toshiaki Makita wrote:

[...]

>>> @@ -1917,16 +1923,22 @@ static ssize_t tun_get_user(struct 
>>> tun_struct *tun, struct tun_file *tfile,
>>>           struct bpf_prog *xdp_prog;
>>>           int ret;
>>> +        local_bh_disable();
>>> +        preempt_disable();
>>>           rcu_read_lock();
>>>           xdp_prog = rcu_dereference(tun->xdp_prog);
>>>           if (xdp_prog) {
>>>               ret = do_xdp_generic(xdp_prog, skb);
>>>               if (ret != XDP_PASS) {
>>>                   rcu_read_unlock();
>>> +                preempt_enable();
>>> +                local_bh_enable();
>>>                   return total_len;
>>>               }
>>>           }
>>>           rcu_read_unlock();
>>> +        preempt_enable();
>>> +        local_bh_enable();
>>>       }
>>>       rcu_read_lock();
>>
>> Good catch, thanks.
>>
>> But I think we can just replace preempt_disable()/enable() with 
>> local_bh_disable()/local_bh_enable() ?
>
> I actually thought the same, but noticed this patch.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9ea4c380066fbe 
>
>
> It looks like they do not think local_bh_disable() implies 
> preempt_disable(). But I'm not sure why..
>
> Toshiaki Makita 

I see, there're probably have some subtle differences and implications 
for e.g scheduler or others.

What we what here is to make sure the process is not moved to another 
CPU and bh is enabled. By checking preemptible() function, preemption 
should be disabled after local_bh_disable(). So I think we're safe here.

Thanks

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net] tun: Fix NULL pointer dereference in XDP redirect
  2018-05-28  2:24     ` Jason Wang
@ 2018-05-28  4:28       ` Toshiaki Makita
  0 siblings, 0 replies; 5+ messages in thread
From: Toshiaki Makita @ 2018-05-28  4:28 UTC (permalink / raw)
  To: Jason Wang; +Cc: Toshiaki Makita, David S. Miller, netdev, Michael S. Tsirkin

On 2018/05/28 11:24, Jason Wang wrote:
> On 2018年05月25日 21:43, Toshiaki Makita wrote:
> 
> [...]
> 
>>>> @@ -1917,16 +1923,22 @@ static ssize_t tun_get_user(struct
>>>> tun_struct *tun, struct tun_file *tfile,
>>>>           struct bpf_prog *xdp_prog;
>>>>           int ret;
>>>> +        local_bh_disable();
>>>> +        preempt_disable();
>>>>           rcu_read_lock();
>>>>           xdp_prog = rcu_dereference(tun->xdp_prog);
>>>>           if (xdp_prog) {
>>>>               ret = do_xdp_generic(xdp_prog, skb);
>>>>               if (ret != XDP_PASS) {
>>>>                   rcu_read_unlock();
>>>> +                preempt_enable();
>>>> +                local_bh_enable();
>>>>                   return total_len;
>>>>               }
>>>>           }
>>>>           rcu_read_unlock();
>>>> +        preempt_enable();
>>>> +        local_bh_enable();
>>>>       }
>>>>       rcu_read_lock();
>>>
>>> Good catch, thanks.
>>>
>>> But I think we can just replace preempt_disable()/enable() with
>>> local_bh_disable()/local_bh_enable() ?
>>
>> I actually thought the same, but noticed this patch.
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9ea4c380066fbe
>>
>>
>> It looks like they do not think local_bh_disable() implies
>> preempt_disable(). But I'm not sure why..
>>
>> Toshiaki Makita 
> 
> I see, there're probably have some subtle differences and implications
> for e.g scheduler or others.
> 
> What we what here is to make sure the process is not moved to another
> CPU and bh is enabled. By checking preemptible() function, preemption
> should be disabled after local_bh_disable(). So I think we're safe here.

OK. I checked retint_kernel which IIUC is the entry point of preemption
process on x86, and confirmed it just checks if __preempt_count is zero.

I haven't checked other archs but I was probably worried too much.
Will send v2.

Thanks,
Toshiaki Makita

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-05-28  4:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-25  4:32 [PATCH net] tun: Fix NULL pointer dereference in XDP redirect Toshiaki Makita
2018-05-25  9:59 ` Jason Wang
2018-05-25 13:43   ` Toshiaki Makita
2018-05-28  2:24     ` Jason Wang
2018-05-28  4:28       ` Toshiaki Makita

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.