All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG: KASAN: use-after-free Read in tun_net_xmit
@ 2019-04-22  3:57 YueHaibing
  2019-04-22  9:41 ` Jason Wang
  0 siblings, 1 reply; 28+ messages in thread
From: YueHaibing @ 2019-04-22  3:57 UTC (permalink / raw)
  To: netdev

We get a KASAN report as below, but don't have any reproducer.

Any comments are appreciated.

==================================================================
BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0

CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xca/0x13e lib/dump_stack.c:113
 print_address_description+0x79/0x330 mm/kasan/report.c:253
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
 tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
 __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
 netdev_start_xmit include/linux/netdevice.h:4309 [inline]
 xmit_one net/core/dev.c:3243 [inline]
 dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
 sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
 qdisc_restart net/sched/sch_generic.c:390 [inline]
 __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
 qdisc_run include/net/pkt_sched.h:120 [inline]
 __dev_xmit_skb net/core/dev.c:3438 [inline]
 __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
 neigh_output include/net/neighbour.h:501 [inline]
 ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
 ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
 NF_HOOK_COND include/linux/netfilter.h:278 [inline]
 ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
 dst_output include/net/dst.h:444 [inline]
 NF_HOOK include/linux/netfilter.h:289 [inline]
 mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
 mld_send_cr net/ipv6/mcast.c:1979 [inline]
 mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
 call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
 expire_timers kernel/time/timer.c:1363 [inline]
 __run_timers kernel/time/timer.c:1682 [inline]
 run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
 __do_softirq+0x26d/0xabd kernel/softirq.c:292
 invoke_softirq kernel/softirq.c:372 [inline]
 irq_exit+0x209/0x290 kernel/softirq.c:412
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
 </IRQ>
RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
 default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
 cpuidle_idle_call kernel/sched/idle.c:153 [inline]
 do_idle+0x2ca/0x420 kernel/sched/idle.c:262
 cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
 start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243

Allocated by task 19764:
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
 __kmalloc+0x11b/0x2d0 mm/slub.c:3750
 kmalloc include/linux/slab.h:518 [inline]
 sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
 sk_alloc+0x3d/0xc00 net/core/sock.c:1523
 tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
 misc_open+0x367/0x4e0 drivers/char/misc.c:141
 chrdev_open+0x212/0x580 fs/char_dev.c:417
 do_dentry_open+0x704/0x1050 fs/open.c:777
 do_last fs/namei.c:3418 [inline]
 path_openat+0x7ed/0x2ae0 fs/namei.c:3533
 do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
 do_sys_open+0x307/0x430 fs/open.c:1069
 do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 19764:
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
 slab_free_hook mm/slub.c:1370 [inline]
 slab_free_freelist_hook mm/slub.c:1397 [inline]
 slab_free mm/slub.c:2952 [inline]
 kfree+0xeb/0x2f0 mm/slub.c:3905
 sk_prot_free net/core/sock.c:1506 [inline]
 __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
 sk_destruct+0x48/0x70 net/core/sock.c:1596
 __sk_free+0xa9/0x270 net/core/sock.c:1607
 sk_free+0x2a/0x30 net/core/sock.c:1618
 sock_put include/net/sock.h:1696 [inline]
 __tun_detach+0x464/0xf70 drivers/net/tun.c:735
 tun_detach drivers/net/tun.c:747 [inline]
 tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
 __fput+0x27f/0x7f0 fs/file_table.c:278
 task_work_run+0x136/0x1b0 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:193 [inline]
 exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
 prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
 do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff88836cc26600
 which belongs to the cache kmalloc-4096 of size 4096
The buggy address is located 1136 bytes inside of
 4096-byte region [ffff88836cc26600, ffff88836cc27600)
The buggy address belongs to the page:
page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 index:0x0 compound_mapcount: 0
flags: 0x2fffff80008100(slab|head)
raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600
raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
 ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: KASAN: use-after-free Read in tun_net_xmit
  2019-04-22  3:57 BUG: KASAN: use-after-free Read in tun_net_xmit YueHaibing
@ 2019-04-22  9:41 ` Jason Wang
  2019-04-22 12:37   ` YueHaibing
  2019-04-23  6:00   ` Cong Wang
  0 siblings, 2 replies; 28+ messages in thread
From: Jason Wang @ 2019-04-22  9:41 UTC (permalink / raw)
  To: YueHaibing, netdev


On 2019/4/22 上午11:57, YueHaibing wrote:
> We get a KASAN report as below, but don't have any reproducer.
>
> Any comments are appreciated.
>
> ==================================================================
> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0


Which kernel version did you use? The calltrace points out the a use 
after free for tun_file structure which should be synchronized through 
RCU + RTNL lock.

Thanks


>
> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
> Call Trace:
>   <IRQ>
>   __dump_stack lib/dump_stack.c:77 [inline]
>   dump_stack+0xca/0x13e lib/dump_stack.c:113
>   print_address_description+0x79/0x330 mm/kasan/report.c:253
>   kasan_report_error mm/kasan/report.c:351 [inline]
>   kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
>   tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
>   __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
>   netdev_start_xmit include/linux/netdevice.h:4309 [inline]
>   xmit_one net/core/dev.c:3243 [inline]
>   dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
>   sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
>   qdisc_restart net/sched/sch_generic.c:390 [inline]
>   __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
>   qdisc_run include/net/pkt_sched.h:120 [inline]
>   __dev_xmit_skb net/core/dev.c:3438 [inline]
>   __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
>   neigh_output include/net/neighbour.h:501 [inline]
>   ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
>   ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
>   NF_HOOK_COND include/linux/netfilter.h:278 [inline]
>   ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
>   dst_output include/net/dst.h:444 [inline]
>   NF_HOOK include/linux/netfilter.h:289 [inline]
>   mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
>   mld_send_cr net/ipv6/mcast.c:1979 [inline]
>   mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
>   call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
>   expire_timers kernel/time/timer.c:1363 [inline]
>   __run_timers kernel/time/timer.c:1682 [inline]
>   run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
>   __do_softirq+0x26d/0xabd kernel/softirq.c:292
>   invoke_softirq kernel/softirq.c:372 [inline]
>   irq_exit+0x209/0x290 kernel/softirq.c:412
>   exiting_irq arch/x86/include/asm/apic.h:536 [inline]
>   smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
>   </IRQ>
> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
>   arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
>   default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
>   cpuidle_idle_call kernel/sched/idle.c:153 [inline]
>   do_idle+0x2ca/0x420 kernel/sched/idle.c:262
>   cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
>   start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
>   secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
>
> Allocated by task 19764:
>   set_track mm/kasan/kasan.c:460 [inline]
>   kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
>   __kmalloc+0x11b/0x2d0 mm/slub.c:3750
>   kmalloc include/linux/slab.h:518 [inline]
>   sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
>   sk_alloc+0x3d/0xc00 net/core/sock.c:1523
>   tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
>   misc_open+0x367/0x4e0 drivers/char/misc.c:141
>   chrdev_open+0x212/0x580 fs/char_dev.c:417
>   do_dentry_open+0x704/0x1050 fs/open.c:777
>   do_last fs/namei.c:3418 [inline]
>   path_openat+0x7ed/0x2ae0 fs/namei.c:3533
>   do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
>   do_sys_open+0x307/0x430 fs/open.c:1069
>   do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Freed by task 19764:
>   set_track mm/kasan/kasan.c:460 [inline]
>   __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
>   slab_free_hook mm/slub.c:1370 [inline]
>   slab_free_freelist_hook mm/slub.c:1397 [inline]
>   slab_free mm/slub.c:2952 [inline]
>   kfree+0xeb/0x2f0 mm/slub.c:3905
>   sk_prot_free net/core/sock.c:1506 [inline]
>   __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
>   sk_destruct+0x48/0x70 net/core/sock.c:1596
>   __sk_free+0xa9/0x270 net/core/sock.c:1607
>   sk_free+0x2a/0x30 net/core/sock.c:1618
>   sock_put include/net/sock.h:1696 [inline]
>   __tun_detach+0x464/0xf70 drivers/net/tun.c:735
>   tun_detach drivers/net/tun.c:747 [inline]
>   tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
>   __fput+0x27f/0x7f0 fs/file_table.c:278
>   task_work_run+0x136/0x1b0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:193 [inline]
>   exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
>   prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
>   do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> The buggy address belongs to the object at ffff88836cc26600
>   which belongs to the cache kmalloc-4096 of size 4096
> The buggy address is located 1136 bytes inside of
>   4096-byte region [ffff88836cc26600, ffff88836cc27600)
> The buggy address belongs to the page:
> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 index:0x0 compound_mapcount: 0
> flags: 0x2fffff80008100(slab|head)
> raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600
> raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>   ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                               ^
>   ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: KASAN: use-after-free Read in tun_net_xmit
  2019-04-22  9:41 ` Jason Wang
@ 2019-04-22 12:37   ` YueHaibing
  2019-04-23  6:00   ` Cong Wang
  1 sibling, 0 replies; 28+ messages in thread
From: YueHaibing @ 2019-04-22 12:37 UTC (permalink / raw)
  To: Jason Wang, netdev

On 2019/4/22 17:41, Jason Wang wrote:
> 
> On 2019/4/22 上午11:57, YueHaibing wrote:
>> We get a KASAN report as below, but don't have any reproducer.
>>
>> Any comments are appreciated.
>>
>> ==================================================================
>> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
>> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
> 
> 
> Which kernel version did you use? The calltrace points out the a use after free for tun_file structure which should be synchronized through RCU + RTNL lock.

We use lts 4.19.32 on x86_64.

I try to reproduce this by adding and deleting tun queue repeatedly
while tun_net_xmit sending packets, but never recurring it.


> 
> Thanks
> 
> 
>>
>> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
>> Call Trace:
>>   <IRQ>
>>   __dump_stack lib/dump_stack.c:77 [inline]
>>   dump_stack+0xca/0x13e lib/dump_stack.c:113
>>   print_address_description+0x79/0x330 mm/kasan/report.c:253
>>   kasan_report_error mm/kasan/report.c:351 [inline]
>>   kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
>>   tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
>>   __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
>>   netdev_start_xmit include/linux/netdevice.h:4309 [inline]
>>   xmit_one net/core/dev.c:3243 [inline]
>>   dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
>>   sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
>>   qdisc_restart net/sched/sch_generic.c:390 [inline]
>>   __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
>>   qdisc_run include/net/pkt_sched.h:120 [inline]
>>   __dev_xmit_skb net/core/dev.c:3438 [inline]
>>   __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
>>   neigh_output include/net/neighbour.h:501 [inline]
>>   ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
>>   ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
>>   NF_HOOK_COND include/linux/netfilter.h:278 [inline]
>>   ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
>>   dst_output include/net/dst.h:444 [inline]
>>   NF_HOOK include/linux/netfilter.h:289 [inline]
>>   mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
>>   mld_send_cr net/ipv6/mcast.c:1979 [inline]
>>   mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
>>   call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
>>   expire_timers kernel/time/timer.c:1363 [inline]
>>   __run_timers kernel/time/timer.c:1682 [inline]
>>   run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
>>   __do_softirq+0x26d/0xabd kernel/softirq.c:292
>>   invoke_softirq kernel/softirq.c:372 [inline]
>>   irq_exit+0x209/0x290 kernel/softirq.c:412
>>   exiting_irq arch/x86/include/asm/apic.h:536 [inline]
>>   smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
>>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
>>   </IRQ>
>> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
>> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
>> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
>> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
>> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
>>   arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
>>   default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
>>   cpuidle_idle_call kernel/sched/idle.c:153 [inline]
>>   do_idle+0x2ca/0x420 kernel/sched/idle.c:262
>>   cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
>>   start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
>>   secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
>>
>> Allocated by task 19764:
>>   set_track mm/kasan/kasan.c:460 [inline]
>>   kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
>>   __kmalloc+0x11b/0x2d0 mm/slub.c:3750
>>   kmalloc include/linux/slab.h:518 [inline]
>>   sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
>>   sk_alloc+0x3d/0xc00 net/core/sock.c:1523
>>   tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
>>   misc_open+0x367/0x4e0 drivers/char/misc.c:141
>>   chrdev_open+0x212/0x580 fs/char_dev.c:417
>>   do_dentry_open+0x704/0x1050 fs/open.c:777
>>   do_last fs/namei.c:3418 [inline]
>>   path_openat+0x7ed/0x2ae0 fs/namei.c:3533
>>   do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
>>   do_sys_open+0x307/0x430 fs/open.c:1069
>>   do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> Freed by task 19764:
>>   set_track mm/kasan/kasan.c:460 [inline]
>>   __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
>>   slab_free_hook mm/slub.c:1370 [inline]
>>   slab_free_freelist_hook mm/slub.c:1397 [inline]
>>   slab_free mm/slub.c:2952 [inline]
>>   kfree+0xeb/0x2f0 mm/slub.c:3905
>>   sk_prot_free net/core/sock.c:1506 [inline]
>>   __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
>>   sk_destruct+0x48/0x70 net/core/sock.c:1596
>>   __sk_free+0xa9/0x270 net/core/sock.c:1607
>>   sk_free+0x2a/0x30 net/core/sock.c:1618
>>   sock_put include/net/sock.h:1696 [inline]
>>   __tun_detach+0x464/0xf70 drivers/net/tun.c:735
>>   tun_detach drivers/net/tun.c:747 [inline]
>>   tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
>>   __fput+0x27f/0x7f0 fs/file_table.c:278
>>   task_work_run+0x136/0x1b0 kernel/task_work.c:113
>>   tracehook_notify_resume include/linux/tracehook.h:193 [inline]
>>   exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
>>   prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>>   syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
>>   do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> The buggy address belongs to the object at ffff88836cc26600
>>   which belongs to the cache kmalloc-4096 of size 4096
>> The buggy address is located 1136 bytes inside of
>>   4096-byte region [ffff88836cc26600, ffff88836cc27600)
>> The buggy address belongs to the page:
>> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 index:0x0 compound_mapcount: 0
>> flags: 0x2fffff80008100(slab|head)
>> raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600
>> raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
>> page dumped because: kasan: bad access detected
>>
>> Memory state around the buggy address:
>>   ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>   ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>> ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>                                                               ^
>>   ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>   ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ==================================================================
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: KASAN: use-after-free Read in tun_net_xmit
  2019-04-22  9:41 ` Jason Wang
  2019-04-22 12:37   ` YueHaibing
@ 2019-04-23  6:00   ` Cong Wang
  2019-04-23  6:42     ` Jason Wang
  1 sibling, 1 reply; 28+ messages in thread
From: Cong Wang @ 2019-04-23  6:00 UTC (permalink / raw)
  To: Jason Wang; +Cc: YueHaibing, netdev

On Mon, Apr 22, 2019 at 2:41 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2019/4/22 上午11:57, YueHaibing wrote:
> > We get a KASAN report as below, but don't have any reproducer.
> >
> > Any comments are appreciated.
> >
> > ==================================================================
> > BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
> > Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
>
>
> Which kernel version did you use? The calltrace points out the a use
> after free for tun_file structure which should be synchronized through
> RCU + RTNL lock.

The tfile socket has to be marked with SOCK_RCU_FREE in order
to fully respect the RCU grace period.

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e9ca1c088d0b..31c3210288cb 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -3431,6 +3431,7 @@ static int tun_chr_open(struct inode *inode,
struct file * file)
        file->private_data = tfile;
        INIT_LIST_HEAD(&tfile->next);

+       sock_set_flag(&tfile->sk, SOCK_RCU_FREE);
        sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);

        return 0;

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: BUG: KASAN: use-after-free Read in tun_net_xmit
  2019-04-23  6:00   ` Cong Wang
@ 2019-04-23  6:42     ` Jason Wang
  2019-04-23 16:41       ` Cong Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2019-04-23  6:42 UTC (permalink / raw)
  To: Cong Wang; +Cc: YueHaibing, netdev


On 2019/4/23 下午2:00, Cong Wang wrote:
> On Mon, Apr 22, 2019 at 2:41 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2019/4/22 上午11:57, YueHaibing wrote:
>>> We get a KASAN report as below, but don't have any reproducer.
>>>
>>> Any comments are appreciated.
>>>
>>> ==================================================================
>>> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
>>> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
>>
>> Which kernel version did you use? The calltrace points out the a use
>> after free for tun_file structure which should be synchronized through
>> RCU + RTNL lock.
> The tfile socket has to be marked with SOCK_RCU_FREE in order
> to fully respect the RCU grace period.
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index e9ca1c088d0b..31c3210288cb 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -3431,6 +3431,7 @@ static int tun_chr_open(struct inode *inode,
> struct file * file)
>          file->private_data = tfile;
>          INIT_LIST_HEAD(&tfile->next);
>
> +       sock_set_flag(&tfile->sk, SOCK_RCU_FREE);
>          sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
>
>          return 0;


We did a synchronize_net() when socket is detached from netdevice in 
__tun_detach() so it looks to me this is unnecessary.

But if it solves the use-after-free, it means we have bugs somewhere.

Thanks


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: KASAN: use-after-free Read in tun_net_xmit
  2019-04-23  6:42     ` Jason Wang
@ 2019-04-23 16:41       ` Cong Wang
  2019-04-24  9:11         ` Jason Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Cong Wang @ 2019-04-23 16:41 UTC (permalink / raw)
  To: Jason Wang; +Cc: YueHaibing, netdev

On Mon, Apr 22, 2019 at 11:42 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2019/4/23 下午2:00, Cong Wang wrote:
> > On Mon, Apr 22, 2019 at 2:41 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2019/4/22 上午11:57, YueHaibing wrote:
> >>> We get a KASAN report as below, but don't have any reproducer.
> >>>
> >>> Any comments are appreciated.
> >>>
> >>> ==================================================================
> >>> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
> >>> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
> >>
> >> Which kernel version did you use? The calltrace points out the a use
> >> after free for tun_file structure which should be synchronized through
> >> RCU + RTNL lock.
> > The tfile socket has to be marked with SOCK_RCU_FREE in order
> > to fully respect the RCU grace period.
> >
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index e9ca1c088d0b..31c3210288cb 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -3431,6 +3431,7 @@ static int tun_chr_open(struct inode *inode,
> > struct file * file)
> >          file->private_data = tfile;
> >          INIT_LIST_HEAD(&tfile->next);
> >
> > +       sock_set_flag(&tfile->sk, SOCK_RCU_FREE);
> >          sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
> >
> >          return 0;
>
>
> We did a synchronize_net() when socket is detached from netdevice in
> __tun_detach() so it looks to me this is unnecessary.

I knew, but it is only called conditionally, that is:

 695         if (tun && !tfile->detached) {
...
 710
 711                 synchronize_net();

And it looks like syzbot just skipped this condition, this is why I believe
you still need to respect RCU grace period _unconditionally_ for tfile.

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: KASAN: use-after-free Read in tun_net_xmit
  2019-04-23 16:41       ` Cong Wang
@ 2019-04-24  9:11         ` Jason Wang
  2019-04-24 12:25           ` YueHaibing
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2019-04-24  9:11 UTC (permalink / raw)
  To: Cong Wang; +Cc: YueHaibing, netdev


On 2019/4/24 上午12:41, Cong Wang wrote:
> On Mon, Apr 22, 2019 at 11:42 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2019/4/23 下午2:00, Cong Wang wrote:
>>> On Mon, Apr 22, 2019 at 2:41 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2019/4/22 上午11:57, YueHaibing wrote:
>>>>> We get a KASAN report as below, but don't have any reproducer.
>>>>>
>>>>> Any comments are appreciated.
>>>>>
>>>>> ==================================================================
>>>>> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
>>>>> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
>>>> Which kernel version did you use? The calltrace points out the a use
>>>> after free for tun_file structure which should be synchronized through
>>>> RCU + RTNL lock.
>>> The tfile socket has to be marked with SOCK_RCU_FREE in order
>>> to fully respect the RCU grace period.
>>>
>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>> index e9ca1c088d0b..31c3210288cb 100644
>>> --- a/drivers/net/tun.c
>>> +++ b/drivers/net/tun.c
>>> @@ -3431,6 +3431,7 @@ static int tun_chr_open(struct inode *inode,
>>> struct file * file)
>>>           file->private_data = tfile;
>>>           INIT_LIST_HEAD(&tfile->next);
>>>
>>> +       sock_set_flag(&tfile->sk, SOCK_RCU_FREE);
>>>           sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
>>>
>>>           return 0;
>>
>> We did a synchronize_net() when socket is detached from netdevice in
>> __tun_detach() so it looks to me this is unnecessary.
> I knew, but it is only called conditionally, that is:
>
>   695         if (tun && !tfile->detached) {
> ...
>   710
>   711                 synchronize_net();
>
> And it looks like syzbot just skipped this condition,


If tfile is detached, it should have gone for the path of 
synchronize_net() before. If tfile is never attached, tun_net_xmit() 
doesn't have the chance to access that. I wonder whether or not we 
should use WRITE_ONCE() for tun->numqueues-- in this fucntion. If the 
value was not committed to memory before synchronize_net(), we may race 
with tun_net_xmit() which check txq against tun->numqueues.


> this is why I believe
> you still need to respect RCU grace period _unconditionally_ for tfile.


This is true if I miss subtle race in the code.


Haibing: could you please try the following test?

1) start VM with multiple queue

2) using pktgen to inject packets to all queues through tap

3) using ethtool to change the combined channels in guest in a loop

4) kill the guest


Thanks


> Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: KASAN: use-after-free Read in tun_net_xmit
  2019-04-24  9:11         ` Jason Wang
@ 2019-04-24 12:25           ` YueHaibing
  2019-04-25 16:04             ` YueHaibing
  0 siblings, 1 reply; 28+ messages in thread
From: YueHaibing @ 2019-04-24 12:25 UTC (permalink / raw)
  To: Jason Wang, Cong Wang; +Cc: netdev

On 2019/4/24 17:11, Jason Wang wrote:
> 
> On 2019/4/24 上午12:41, Cong Wang wrote:
>> On Mon, Apr 22, 2019 at 11:42 PM Jason Wang <jasowang@redhat.com> wrote:
>>>
>>> On 2019/4/23 下午2:00, Cong Wang wrote:
>>>> On Mon, Apr 22, 2019 at 2:41 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>> On 2019/4/22 上午11:57, YueHaibing wrote:
>>>>>> We get a KASAN report as below, but don't have any reproducer.
>>>>>>
>>>>>> Any comments are appreciated.
>>>>>>
>>>>>> ==================================================================
>>>>>> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
>>>>>> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
>>>>> Which kernel version did you use? The calltrace points out the a use
>>>>> after free for tun_file structure which should be synchronized through
>>>>> RCU + RTNL lock.
>>>> The tfile socket has to be marked with SOCK_RCU_FREE in order
>>>> to fully respect the RCU grace period.
>>>>
>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>> index e9ca1c088d0b..31c3210288cb 100644
>>>> --- a/drivers/net/tun.c
>>>> +++ b/drivers/net/tun.c
>>>> @@ -3431,6 +3431,7 @@ static int tun_chr_open(struct inode *inode,
>>>> struct file * file)
>>>>           file->private_data = tfile;
>>>>           INIT_LIST_HEAD(&tfile->next);
>>>>
>>>> +       sock_set_flag(&tfile->sk, SOCK_RCU_FREE);
>>>>           sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
>>>>
>>>>           return 0;
>>>
>>> We did a synchronize_net() when socket is detached from netdevice in
>>> __tun_detach() so it looks to me this is unnecessary.
>> I knew, but it is only called conditionally, that is:
>>
>>   695         if (tun && !tfile->detached) {
>> ...
>>   710
>>   711                 synchronize_net();
>>
>> And it looks like syzbot just skipped this condition,
> 
> 
> If tfile is detached, it should have gone for the path of synchronize_net() before. If tfile is never attached, tun_net_xmit() doesn't have the chance to access that. I wonder whether or not we should use WRITE_ONCE() for tun->numqueues-- in this fucntion. If the value was not committed to memory before synchronize_net(), we may race with tun_net_xmit() which check txq against tun->numqueues.
> 
> 
>> this is why I believe
>> you still need to respect RCU grace period _unconditionally_ for tfile.
> 
> 
> This is true if I miss subtle race in the code.
> 
> 
> Haibing: could you please try the following test?
> 
> 1) start VM with multiple queue
> 
> 2) using pktgen to inject packets to all queues through tap
> 
> 3) using ethtool to change the combined channels in guest in a loop
> 
> 4) kill the guest
> 
> 

Ok, I will try this.

> Thanks
> 
> 
>> Thanks.
> 
> .
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: KASAN: use-after-free Read in tun_net_xmit
  2019-04-24 12:25           ` YueHaibing
@ 2019-04-25 16:04             ` YueHaibing
  2019-04-26  7:37               ` Jason Wang
  0 siblings, 1 reply; 28+ messages in thread
From: YueHaibing @ 2019-04-25 16:04 UTC (permalink / raw)
  To: Jason Wang, Cong Wang; +Cc: netdev

On 2019/4/24 20:25, YueHaibing wrote:
> On 2019/4/24 17:11, Jason Wang wrote:
>>
>> On 2019/4/24 上午12:41, Cong Wang wrote:
>>> On Mon, Apr 22, 2019 at 11:42 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>
>>>> On 2019/4/23 下午2:00, Cong Wang wrote:
>>>>> On Mon, Apr 22, 2019 at 2:41 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On 2019/4/22 上午11:57, YueHaibing wrote:
>>>>>>> We get a KASAN report as below, but don't have any reproducer.
>>>>>>>
>>>>>>> Any comments are appreciated.
>>>>>>>
>>>>>>> ==================================================================
>>>>>>> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
>>>>>>> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
>>>>>> Which kernel version did you use? The calltrace points out the a use
>>>>>> after free for tun_file structure which should be synchronized through
>>>>>> RCU + RTNL lock.
>>>>> The tfile socket has to be marked with SOCK_RCU_FREE in order
>>>>> to fully respect the RCU grace period.
>>>>>
>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>> index e9ca1c088d0b..31c3210288cb 100644
>>>>> --- a/drivers/net/tun.c
>>>>> +++ b/drivers/net/tun.c
>>>>> @@ -3431,6 +3431,7 @@ static int tun_chr_open(struct inode *inode,
>>>>> struct file * file)
>>>>>           file->private_data = tfile;
>>>>>           INIT_LIST_HEAD(&tfile->next);
>>>>>
>>>>> +       sock_set_flag(&tfile->sk, SOCK_RCU_FREE);
>>>>>           sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
>>>>>
>>>>>           return 0;
>>>>
>>>> We did a synchronize_net() when socket is detached from netdevice in
>>>> __tun_detach() so it looks to me this is unnecessary.
>>> I knew, but it is only called conditionally, that is:
>>>
>>>   695         if (tun && !tfile->detached) {
>>> ...
>>>   710
>>>   711                 synchronize_net();
>>>
>>> And it looks like syzbot just skipped this condition,
>>
>>
>> If tfile is detached, it should have gone for the path of synchronize_net() before. If tfile is never attached, tun_net_xmit() doesn't have the chance to access that. I wonder whether or not we should use WRITE_ONCE() for tun->numqueues-- in this fucntion. If the value was not committed to memory before synchronize_net(), we may race with tun_net_xmit() which check txq against tun->numqueues.
>>
>>
>>> this is why I believe
>>> you still need to respect RCU grace period _unconditionally_ for tfile.
>>
>>
>> This is true if I miss subtle race in the code.
>>
>>
>> Haibing: could you please try the following test?
>>
>> 1) start VM with multiple queue

 I configured 8 queues with virtio driver to start vm

>>
>> 2) using pktgen to inject packets to all queues through tap

 inject packet into tap nic in host

>>
>> 3) using ethtool to change the combined channels in guest in a loop

 repeat do as follow in vm:
 	ethtool -L eth0 combined 4
 	ethtool -L eth0 combined 8

>>
>> 4) kill the guest
>>

This cannot reproduce the issue.


>>
> 
> Ok, I will try this.
> 
>> Thanks
>>
>>
>>> Thanks.
>>
>> .
>>
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: KASAN: use-after-free Read in tun_net_xmit
  2019-04-25 16:04             ` YueHaibing
@ 2019-04-26  7:37               ` Jason Wang
  2019-04-28  3:05                 ` [PATCH] tun: Fix use-after-free " Yue Haibing
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2019-04-26  7:37 UTC (permalink / raw)
  To: YueHaibing, Cong Wang; +Cc: netdev


On 2019/4/26 上午12:04, YueHaibing wrote:
> On 2019/4/24 20:25, YueHaibing wrote:
>> On 2019/4/24 17:11, Jason Wang wrote:
>>> On 2019/4/24 上午12:41, Cong Wang wrote:
>>>> On Mon, Apr 22, 2019 at 11:42 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>> On 2019/4/23 下午2:00, Cong Wang wrote:
>>>>>> On Mon, Apr 22, 2019 at 2:41 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>> On 2019/4/22 上午11:57, YueHaibing wrote:
>>>>>>>> We get a KASAN report as below, but don't have any reproducer.
>>>>>>>>
>>>>>>>> Any comments are appreciated.
>>>>>>>>
>>>>>>>> ==================================================================
>>>>>>>> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
>>>>>>>> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
>>>>>>> Which kernel version did you use? The calltrace points out the a use
>>>>>>> after free for tun_file structure which should be synchronized through
>>>>>>> RCU + RTNL lock.
>>>>>> The tfile socket has to be marked with SOCK_RCU_FREE in order
>>>>>> to fully respect the RCU grace period.
>>>>>>
>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>>> index e9ca1c088d0b..31c3210288cb 100644
>>>>>> --- a/drivers/net/tun.c
>>>>>> +++ b/drivers/net/tun.c
>>>>>> @@ -3431,6 +3431,7 @@ static int tun_chr_open(struct inode *inode,
>>>>>> struct file * file)
>>>>>>            file->private_data = tfile;
>>>>>>            INIT_LIST_HEAD(&tfile->next);
>>>>>>
>>>>>> +       sock_set_flag(&tfile->sk, SOCK_RCU_FREE);
>>>>>>            sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
>>>>>>
>>>>>>            return 0;
>>>>> We did a synchronize_net() when socket is detached from netdevice in
>>>>> __tun_detach() so it looks to me this is unnecessary.
>>>> I knew, but it is only called conditionally, that is:
>>>>
>>>>    695         if (tun && !tfile->detached) {
>>>> ...
>>>>    710
>>>>    711                 synchronize_net();
>>>>
>>>> And it looks like syzbot just skipped this condition,
>>>
>>> If tfile is detached, it should have gone for the path of synchronize_net() before. If tfile is never attached, tun_net_xmit() doesn't have the chance to access that. I wonder whether or not we should use WRITE_ONCE() for tun->numqueues-- in this fucntion. If the value was not committed to memory before synchronize_net(), we may race with tun_net_xmit() which check txq against tun->numqueues.
>>>
>>>
>>>> this is why I believe
>>>> you still need to respect RCU grace period _unconditionally_ for tfile.
>>>
>>> This is true if I miss subtle race in the code.
>>>
>>>
>>> Haibing: could you please try the following test?
>>>
>>> 1) start VM with multiple queue
>   I configured 8 queues with virtio driver to start vm
>
>>> 2) using pktgen to inject packets to all queues through tap
>   inject packet into tap nic in host
>
>>> 3) using ethtool to change the combined channels in guest in a loop
>   repeat do as follow in vm:
>   	ethtool -L eth0 combined 4
>   	ethtool -L eth0 combined 8
>
>>> 4) kill the guest
>>>
> This cannot reproduce the issue.


Thanks for the testing. If possible, please keep doing the test. I will 
try to come some patch in one or two days.


>
>
>> Ok, I will try this.
>>
>>> Thanks
>>>
>>>
>>>> Thanks.
>>> .
>>>
>>
>> .
>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-26  7:37               ` Jason Wang
@ 2019-04-28  3:05                 ` Yue Haibing
  2019-04-28  3:24                   ` Jason Wang
                                     ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Yue Haibing @ 2019-04-28  3:05 UTC (permalink / raw)
  To: davem, jasowang, edumazet, brouer, mst, lirongqing,
	xiyou.wangcong, nicolas.dichtel, 3chas3, wangli39
  Cc: linux-kernel, netdev, YueHaibing

From: YueHaibing <yuehaibing@huawei.com>

KASAN report this:

BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0

CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xca/0x13e lib/dump_stack.c:113
 print_address_description+0x79/0x330 mm/kasan/report.c:253
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
 tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
 __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
 netdev_start_xmit include/linux/netdevice.h:4309 [inline]
 xmit_one net/core/dev.c:3243 [inline]
 dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
 sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
 qdisc_restart net/sched/sch_generic.c:390 [inline]
 __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
 qdisc_run include/net/pkt_sched.h:120 [inline]
 __dev_xmit_skb net/core/dev.c:3438 [inline]
 __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
 neigh_output include/net/neighbour.h:501 [inline]
 ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
 ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
 NF_HOOK_COND include/linux/netfilter.h:278 [inline]
 ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
 dst_output include/net/dst.h:444 [inline]
 NF_HOOK include/linux/netfilter.h:289 [inline]
 mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
 mld_send_cr net/ipv6/mcast.c:1979 [inline]
 mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
 call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
 expire_timers kernel/time/timer.c:1363 [inline]
 __run_timers kernel/time/timer.c:1682 [inline]
 run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
 __do_softirq+0x26d/0xabd kernel/softirq.c:292
 invoke_softirq kernel/softirq.c:372 [inline]
 irq_exit+0x209/0x290 kernel/softirq.c:412
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
 </IRQ>
RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
 default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
 cpuidle_idle_call kernel/sched/idle.c:153 [inline]
 do_idle+0x2ca/0x420 kernel/sched/idle.c:262
 cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
 start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243

Allocated by task 19764:
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
 __kmalloc+0x11b/0x2d0 mm/slub.c:3750
 kmalloc include/linux/slab.h:518 [inline]
 sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
 sk_alloc+0x3d/0xc00 net/core/sock.c:1523
 tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
 misc_open+0x367/0x4e0 drivers/char/misc.c:141
 chrdev_open+0x212/0x580 fs/char_dev.c:417
 do_dentry_open+0x704/0x1050 fs/open.c:777
 do_last fs/namei.c:3418 [inline]
 path_openat+0x7ed/0x2ae0 fs/namei.c:3533
 do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
 do_sys_open+0x307/0x430 fs/open.c:1069
 do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 19764:
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
 slab_free_hook mm/slub.c:1370 [inline]
 slab_free_freelist_hook mm/slub.c:1397 [inline]
 slab_free mm/slub.c:2952 [inline]
 kfree+0xeb/0x2f0 mm/slub.c:3905
 sk_prot_free net/core/sock.c:1506 [inline]
 __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
 sk_destruct+0x48/0x70 net/core/sock.c:1596
 __sk_free+0xa9/0x270 net/core/sock.c:1607
 sk_free+0x2a/0x30 net/core/sock.c:1618
 sock_put include/net/sock.h:1696 [inline]
 __tun_detach+0x464/0xf70 drivers/net/tun.c:735
 tun_detach drivers/net/tun.c:747 [inline]
 tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
 __fput+0x27f/0x7f0 fs/file_table.c:278
 task_work_run+0x136/0x1b0 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:193 [inline]
 exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
 prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
 do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff88836cc26600
 which belongs to the cache kmalloc-4096 of size 4096
The buggy address is located 1136 bytes inside of
 4096-byte region [ffff88836cc26600, ffff88836cc27600)
The buggy address belongs to the page:
page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 index:0x0 compound_mapcount: 0
flags: 0x2fffff80008100(slab|head)
raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600
raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
 ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

If tun driver have multiqueues, user close the last queue by
tun_detach, then tun->tfiles[index] is not cleared. Then a new
queue may add to the tun, which using rcu_assign_pointer
tun->tfiles[index] to the new tfile and increase the numqueues.
However if there send a packet during this time, which picking the last
queue, it may uses the old tun->tfiles[index], beacause there no
RCU grace period.

1) tun_chr_close //close the last queue
    --> __tun_detach  //close the last queue, but tun->tfiles[index] still exist

2) tun_chr_open  //attach a new queue
    --> tun_attach
      -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
	  //there need a RCU grace period

    -->tun->numqueues++;
			
3) tun_net_xmit //a new packet is sending, which pick the last queue
     -->if (txq >= tun->numqueues)
	   //above check passed, but tfile still not renew
     -->if (tfile->socket.sk->sk_filter ...
	   //use the old tfile,trigger use-after-free

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: c8d68e6be1c3 ("tuntap: multiqueue support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/tun.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e9ca1c0..3770aba 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
 	 */
 	rcu_assign_pointer(tfile->tun, tun);
 	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
+	synchronize_net();
 	tun->numqueues++;
 	tun_set_real_num_queues(tun);
 out:
-- 
2.7.0



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-28  3:05                 ` [PATCH] tun: Fix use-after-free " Yue Haibing
@ 2019-04-28  3:24                   ` Jason Wang
  2019-04-28  4:26                     ` Jason Wang
  2019-04-28 17:49                   ` Cong Wang
  2019-04-29 14:55                   ` Michael S. Tsirkin
  2 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2019-04-28  3:24 UTC (permalink / raw)
  To: Yue Haibing, davem, edumazet, brouer, mst, lirongqing,
	xiyou.wangcong, nicolas.dichtel, 3chas3, wangli39
  Cc: linux-kernel, netdev


On 2019/4/28 上午11:05, Yue Haibing wrote:
> From: YueHaibing <yuehaibing@huawei.com>
>
> KASAN report this:
>
> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
>
> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
> Call Trace:
>   <IRQ>
>   __dump_stack lib/dump_stack.c:77 [inline]
>   dump_stack+0xca/0x13e lib/dump_stack.c:113
>   print_address_description+0x79/0x330 mm/kasan/report.c:253
>   kasan_report_error mm/kasan/report.c:351 [inline]
>   kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
>   tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
>   __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
>   netdev_start_xmit include/linux/netdevice.h:4309 [inline]
>   xmit_one net/core/dev.c:3243 [inline]
>   dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
>   sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
>   qdisc_restart net/sched/sch_generic.c:390 [inline]
>   __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
>   qdisc_run include/net/pkt_sched.h:120 [inline]
>   __dev_xmit_skb net/core/dev.c:3438 [inline]
>   __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
>   neigh_output include/net/neighbour.h:501 [inline]
>   ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
>   ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
>   NF_HOOK_COND include/linux/netfilter.h:278 [inline]
>   ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
>   dst_output include/net/dst.h:444 [inline]
>   NF_HOOK include/linux/netfilter.h:289 [inline]
>   mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
>   mld_send_cr net/ipv6/mcast.c:1979 [inline]
>   mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
>   call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
>   expire_timers kernel/time/timer.c:1363 [inline]
>   __run_timers kernel/time/timer.c:1682 [inline]
>   run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
>   __do_softirq+0x26d/0xabd kernel/softirq.c:292
>   invoke_softirq kernel/softirq.c:372 [inline]
>   irq_exit+0x209/0x290 kernel/softirq.c:412
>   exiting_irq arch/x86/include/asm/apic.h:536 [inline]
>   smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
>   </IRQ>
> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
>   arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
>   default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
>   cpuidle_idle_call kernel/sched/idle.c:153 [inline]
>   do_idle+0x2ca/0x420 kernel/sched/idle.c:262
>   cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
>   start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
>   secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
>
> Allocated by task 19764:
>   set_track mm/kasan/kasan.c:460 [inline]
>   kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
>   __kmalloc+0x11b/0x2d0 mm/slub.c:3750
>   kmalloc include/linux/slab.h:518 [inline]
>   sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
>   sk_alloc+0x3d/0xc00 net/core/sock.c:1523
>   tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
>   misc_open+0x367/0x4e0 drivers/char/misc.c:141
>   chrdev_open+0x212/0x580 fs/char_dev.c:417
>   do_dentry_open+0x704/0x1050 fs/open.c:777
>   do_last fs/namei.c:3418 [inline]
>   path_openat+0x7ed/0x2ae0 fs/namei.c:3533
>   do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
>   do_sys_open+0x307/0x430 fs/open.c:1069
>   do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Freed by task 19764:
>   set_track mm/kasan/kasan.c:460 [inline]
>   __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
>   slab_free_hook mm/slub.c:1370 [inline]
>   slab_free_freelist_hook mm/slub.c:1397 [inline]
>   slab_free mm/slub.c:2952 [inline]
>   kfree+0xeb/0x2f0 mm/slub.c:3905
>   sk_prot_free net/core/sock.c:1506 [inline]
>   __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
>   sk_destruct+0x48/0x70 net/core/sock.c:1596
>   __sk_free+0xa9/0x270 net/core/sock.c:1607
>   sk_free+0x2a/0x30 net/core/sock.c:1618
>   sock_put include/net/sock.h:1696 [inline]
>   __tun_detach+0x464/0xf70 drivers/net/tun.c:735
>   tun_detach drivers/net/tun.c:747 [inline]
>   tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
>   __fput+0x27f/0x7f0 fs/file_table.c:278
>   task_work_run+0x136/0x1b0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:193 [inline]
>   exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
>   prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
>   do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> The buggy address belongs to the object at ffff88836cc26600
>   which belongs to the cache kmalloc-4096 of size 4096
> The buggy address is located 1136 bytes inside of
>   4096-byte region [ffff88836cc26600, ffff88836cc27600)
> The buggy address belongs to the page:
> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 index:0x0 compound_mapcount: 0
> flags: 0x2fffff80008100(slab|head)
> raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600
> raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>   ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                               ^
>   ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>
> If tun driver have multiqueues, user close the last queue by
> tun_detach, then tun->tfiles[index] is not cleared. Then a new
> queue may add to the tun, which using rcu_assign_pointer
> tun->tfiles[index] to the new tfile and increase the numqueues.
> However if there send a packet during this time, which picking the last
> queue, it may uses the old tun->tfiles[index], beacause there no
> RCU grace period.
>
> 1) tun_chr_close //close the last queue
>      --> __tun_detach  //close the last queue, but tun->tfiles[index] still exist
>
> 2) tun_chr_open  //attach a new queue
>      --> tun_attach
>        -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> 	  //there need a RCU grace period
>
>      -->tun->numqueues++;
> 			
> 3) tun_net_xmit //a new packet is sending, which pick the last queue
>       -->if (txq >= tun->numqueues)
> 	   //above check passed, but tfile still not renew
>       -->if (tfile->socket.sk->sk_filter ...
> 	   //use the old tfile,trigger use-after-free
>
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Fixes: c8d68e6be1c3 ("tuntap: multiqueue support")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
>   drivers/net/tun.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index e9ca1c0..3770aba 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
>   	 */
>   	rcu_assign_pointer(tfile->tun, tun);
>   	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> +	synchronize_net();
>   	tun->numqueues++;
>   	tun_set_real_num_queues(tun);
>   out:


Good catch, that's one of my suspicion as well. Assume that this has 
been tested.

Acked-by: Jason Wang <jasowang@redhat.com>

This patch is needed for -stable.

I will post another theoretical issue similar to this soon.

Thanks


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-28  3:24                   ` Jason Wang
@ 2019-04-28  4:26                     ` Jason Wang
  2019-04-28  5:40                       ` weiyongjun (A)
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2019-04-28  4:26 UTC (permalink / raw)
  To: Yue Haibing, davem, edumazet, brouer, mst, lirongqing,
	xiyou.wangcong, nicolas.dichtel, 3chas3, wangli39
  Cc: linux-kernel, netdev


On 2019/4/28 上午11:24, Jason Wang wrote:
>
> On 2019/4/28 上午11:05, Yue Haibing wrote:
>> From: YueHaibing <yuehaibing@huawei.com>
>>
>> KASAN report this:
>>
>> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 
>> drivers/net/tun.c:1104
>> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
>>
>> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> 1.10.2-1ubuntu1 04/01/2014
>> Call Trace:
>>   <IRQ>
>>   __dump_stack lib/dump_stack.c:77 [inline]
>>   dump_stack+0xca/0x13e lib/dump_stack.c:113
>>   print_address_description+0x79/0x330 mm/kasan/report.c:253
>>   kasan_report_error mm/kasan/report.c:351 [inline]
>>   kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
>>   tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
>>   __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
>>   netdev_start_xmit include/linux/netdevice.h:4309 [inline]
>>   xmit_one net/core/dev.c:3243 [inline]
>>   dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
>>   sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
>>   qdisc_restart net/sched/sch_generic.c:390 [inline]
>>   __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
>>   qdisc_run include/net/pkt_sched.h:120 [inline]
>>   __dev_xmit_skb net/core/dev.c:3438 [inline]
>>   __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
>>   neigh_output include/net/neighbour.h:501 [inline]
>>   ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
>>   ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
>>   NF_HOOK_COND include/linux/netfilter.h:278 [inline]
>>   ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
>>   dst_output include/net/dst.h:444 [inline]
>>   NF_HOOK include/linux/netfilter.h:289 [inline]
>>   mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
>>   mld_send_cr net/ipv6/mcast.c:1979 [inline]
>>   mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
>>   call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
>>   expire_timers kernel/time/timer.c:1363 [inline]
>>   __run_timers kernel/time/timer.c:1682 [inline]
>>   run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
>>   __do_softirq+0x26d/0xabd kernel/softirq.c:292
>>   invoke_softirq kernel/softirq.c:372 [inline]
>>   irq_exit+0x209/0x290 kernel/softirq.c:412
>>   exiting_irq arch/x86/include/asm/apic.h:536 [inline]
>>   smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
>>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
>>   </IRQ>
>> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
>> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 
>> ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> 
>> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
>> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
>> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
>> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
>>   arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
>>   default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
>>   cpuidle_idle_call kernel/sched/idle.c:153 [inline]
>>   do_idle+0x2ca/0x420 kernel/sched/idle.c:262
>>   cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
>>   start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
>>   secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
>>
>> Allocated by task 19764:
>>   set_track mm/kasan/kasan.c:460 [inline]
>>   kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
>>   __kmalloc+0x11b/0x2d0 mm/slub.c:3750
>>   kmalloc include/linux/slab.h:518 [inline]
>>   sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
>>   sk_alloc+0x3d/0xc00 net/core/sock.c:1523
>>   tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
>>   misc_open+0x367/0x4e0 drivers/char/misc.c:141
>>   chrdev_open+0x212/0x580 fs/char_dev.c:417
>>   do_dentry_open+0x704/0x1050 fs/open.c:777
>>   do_last fs/namei.c:3418 [inline]
>>   path_openat+0x7ed/0x2ae0 fs/namei.c:3533
>>   do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
>>   do_sys_open+0x307/0x430 fs/open.c:1069
>>   do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> Freed by task 19764:
>>   set_track mm/kasan/kasan.c:460 [inline]
>>   __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
>>   slab_free_hook mm/slub.c:1370 [inline]
>>   slab_free_freelist_hook mm/slub.c:1397 [inline]
>>   slab_free mm/slub.c:2952 [inline]
>>   kfree+0xeb/0x2f0 mm/slub.c:3905
>>   sk_prot_free net/core/sock.c:1506 [inline]
>>   __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
>>   sk_destruct+0x48/0x70 net/core/sock.c:1596
>>   __sk_free+0xa9/0x270 net/core/sock.c:1607
>>   sk_free+0x2a/0x30 net/core/sock.c:1618
>>   sock_put include/net/sock.h:1696 [inline]
>>   __tun_detach+0x464/0xf70 drivers/net/tun.c:735
>>   tun_detach drivers/net/tun.c:747 [inline]
>>   tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
>>   __fput+0x27f/0x7f0 fs/file_table.c:278
>>   task_work_run+0x136/0x1b0 kernel/task_work.c:113
>>   tracehook_notify_resume include/linux/tracehook.h:193 [inline]
>>   exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
>>   prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>>   syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
>>   do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
>>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> The buggy address belongs to the object at ffff88836cc26600
>>   which belongs to the cache kmalloc-4096 of size 4096
>> The buggy address is located 1136 bytes inside of
>>   4096-byte region [ffff88836cc26600, ffff88836cc27600)
>> The buggy address belongs to the page:
>> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 
>> index:0x0 compound_mapcount: 0
>> flags: 0x2fffff80008100(slab|head)
>> raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600
>> raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
>> page dumped because: kasan: bad access detected
>>
>> Memory state around the buggy address:
>>   ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>   ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>> ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>                                                               ^
>>   ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>   ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>
>> If tun driver have multiqueues, user close the last queue by
>> tun_detach, then tun->tfiles[index] is not cleared. Then a new
>> queue may add to the tun, which using rcu_assign_pointer
>> tun->tfiles[index] to the new tfile and increase the numqueues.
>> However if there send a packet during this time, which picking the last
>> queue, it may uses the old tun->tfiles[index], beacause there no
>> RCU grace period.
>>
>> 1) tun_chr_close //close the last queue
>>      --> __tun_detach  //close the last queue, but tun->tfiles[index] 
>> still exist
>>
>> 2) tun_chr_open  //attach a new queue
>>      --> tun_attach
>> -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>>       //there need a RCU grace period
>>
>>      -->tun->numqueues++;
>>
>> 3) tun_net_xmit //a new packet is sending, which pick the last queue
>>       -->if (txq >= tun->numqueues)
>>        //above check passed, but tfile still not renew
>>       -->if (tfile->socket.sk->sk_filter ...
>>        //use the old tfile,trigger use-after-free
>>
>> Reported-by: Hulk Robot <hulkci@huawei.com>
>> Fixes: c8d68e6be1c3 ("tuntap: multiqueue support")
>> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
>> ---
>>   drivers/net/tun.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index e9ca1c0..3770aba 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, 
>> struct file *file,
>>        */
>>       rcu_assign_pointer(tfile->tun, tun);
>>       rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>> +    synchronize_net();
>>       tun->numqueues++;
>>       tun_set_real_num_queues(tun);
>>   out:
>
>
> Good catch, that's one of my suspicion as well. Assume that this has 
> been tested.
>
> Acked-by: Jason Wang <jasowang@redhat.com>
>
> This patch is needed for -stable.
>
> I will post another theoretical issue similar to this soon.
>
> Thanks
>

Ok, a second thought. All evil come from the access of tun->numqueues 
without sufficient synchronization. This is a partial fix, another 
possible case is __tun_detach(), we need either

1) synchronize through pointers in tfiles

2) or smp_load_acquire()/smp_store_release() to solve it completely.

Let me post a complete fix.

Thanks


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-28  4:26                     ` Jason Wang
@ 2019-04-28  5:40                       ` weiyongjun (A)
  2019-04-28  7:51                         ` Jason Wang
  0 siblings, 1 reply; 28+ messages in thread
From: weiyongjun (A) @ 2019-04-28  5:40 UTC (permalink / raw)
  To: Jason Wang, yuehaibing, davem, edumazet, brouer, mst, lirongqing,
	xiyou.wangcong, nicolas.dichtel, 3chas3, wangli39
  Cc: linux-kernel, netdev

Hi Jason,

> On 2019/4/28 上午11:24, Jason Wang wrote:
> >
> > On 2019/4/28 上午11:05, Yue Haibing wrote:
> >> From: YueHaibing <yuehaibing@huawei.com>
> >>
> >> KASAN report this:
> >>
> >> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750
> >> drivers/net/tun.c:1104
> >> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
> >>
> >> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> >> 1.10.2-1ubuntu1 04/01/2014
> >> Call Trace:
> >>   <IRQ>
> >>   __dump_stack lib/dump_stack.c:77 [inline]
> >>   dump_stack+0xca/0x13e lib/dump_stack.c:113
> >>   print_address_description+0x79/0x330 mm/kasan/report.c:253
> >>   kasan_report_error mm/kasan/report.c:351 [inline]
> >>   kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
> >>   tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
> >>   __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
> >>   netdev_start_xmit include/linux/netdevice.h:4309 [inline]
> >>   xmit_one net/core/dev.c:3243 [inline]
> >>   dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
> >>   sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
> >>   qdisc_restart net/sched/sch_generic.c:390 [inline]
> >>   __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
> >>   qdisc_run include/net/pkt_sched.h:120 [inline]
> >>   __dev_xmit_skb net/core/dev.c:3438 [inline]
> >>   __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
> >>   neigh_output include/net/neighbour.h:501 [inline]
> >>   ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
> >>   ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
> >>   NF_HOOK_COND include/linux/netfilter.h:278 [inline]
> >>   ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
> >>   dst_output include/net/dst.h:444 [inline]
> >>   NF_HOOK include/linux/netfilter.h:289 [inline]
> >>   mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
> >>   mld_send_cr net/ipv6/mcast.c:1979 [inline]
> >>   mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
> >>   call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
> >>   expire_timers kernel/time/timer.c:1363 [inline]
> >>   __run_timers kernel/time/timer.c:1682 [inline]
> >>   run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
> >>   __do_softirq+0x26d/0xabd kernel/softirq.c:292
> >>   invoke_softirq kernel/softirq.c:372 [inline]
> >>   irq_exit+0x209/0x290 kernel/softirq.c:412
> >>   exiting_irq arch/x86/include/asm/apic.h:536 [inline]
> >>   smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
> >>   apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
> >>   </IRQ>
> >> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
> >> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9
> >> ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3>
> >> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
> >> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> >> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
> >> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
> >> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
> >> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
> >> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
> >>   arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
> >>   default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
> >>   cpuidle_idle_call kernel/sched/idle.c:153 [inline]
> >>   do_idle+0x2ca/0x420 kernel/sched/idle.c:262
> >>   cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
> >>   start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
> >>   secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
> >>
> >> Allocated by task 19764:
> >>   set_track mm/kasan/kasan.c:460 [inline]
> >>   kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
> >>   __kmalloc+0x11b/0x2d0 mm/slub.c:3750
> >>   kmalloc include/linux/slab.h:518 [inline]
> >>   sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
> >>   sk_alloc+0x3d/0xc00 net/core/sock.c:1523
> >>   tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
> >>   misc_open+0x367/0x4e0 drivers/char/misc.c:141
> >>   chrdev_open+0x212/0x580 fs/char_dev.c:417
> >>   do_dentry_open+0x704/0x1050 fs/open.c:777
> >>   do_last fs/namei.c:3418 [inline]
> >>   path_openat+0x7ed/0x2ae0 fs/namei.c:3533
> >>   do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
> >>   do_sys_open+0x307/0x430 fs/open.c:1069
> >>   do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
> >>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>
> >> Freed by task 19764:
> >>   set_track mm/kasan/kasan.c:460 [inline]
> >>   __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
> >>   slab_free_hook mm/slub.c:1370 [inline]
> >>   slab_free_freelist_hook mm/slub.c:1397 [inline]
> >>   slab_free mm/slub.c:2952 [inline]
> >>   kfree+0xeb/0x2f0 mm/slub.c:3905
> >>   sk_prot_free net/core/sock.c:1506 [inline]
> >>   __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
> >>   sk_destruct+0x48/0x70 net/core/sock.c:1596
> >>   __sk_free+0xa9/0x270 net/core/sock.c:1607
> >>   sk_free+0x2a/0x30 net/core/sock.c:1618
> >>   sock_put include/net/sock.h:1696 [inline]
> >>   __tun_detach+0x464/0xf70 drivers/net/tun.c:735
> >>   tun_detach drivers/net/tun.c:747 [inline]
> >>   tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
> >>   __fput+0x27f/0x7f0 fs/file_table.c:278
> >>   task_work_run+0x136/0x1b0 kernel/task_work.c:113
> >>   tracehook_notify_resume include/linux/tracehook.h:193 [inline]
> >>   exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
> >>   prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
> >>   syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
> >>   do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
> >>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>
> >> The buggy address belongs to the object at ffff88836cc26600
> >>   which belongs to the cache kmalloc-4096 of size 4096
> >> The buggy address is located 1136 bytes inside of
> >>   4096-byte region [ffff88836cc26600, ffff88836cc27600)
> >> The buggy address belongs to the page:
> >> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600
> >> index:0x0 compound_mapcount: 0
> >> flags: 0x2fffff80008100(slab|head)
> >> raw: 002fffff80008100 dead000000000100 dead000000000200
> ffff8883e280e600
> >> raw: 0000000000000000 0000000000070007 00000001ffffffff
> 0000000000000000
> >> page dumped because: kasan: bad access detected
> >>
> >> Memory state around the buggy address:
> >>   ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>   ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>> ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>                                                               ^
> >>   ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>   ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>
> >> If tun driver have multiqueues, user close the last queue by
> >> tun_detach, then tun->tfiles[index] is not cleared. Then a new
> >> queue may add to the tun, which using rcu_assign_pointer
> >> tun->tfiles[index] to the new tfile and increase the numqueues.
> >> However if there send a packet during this time, which picking the last
> >> queue, it may uses the old tun->tfiles[index], beacause there no
> >> RCU grace period.
> >>
> >> 1) tun_chr_close //close the last queue
> >>      --> __tun_detach  //close the last queue, but tun->tfiles[index]
> >> still exist
> >>
> >> 2) tun_chr_open  //attach a new queue
> >>      --> tun_attach
> >> -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> >>       //there need a RCU grace period
> >>
> >>      -->tun->numqueues++;
> >>
> >> 3) tun_net_xmit //a new packet is sending, which pick the last queue
> >>       -->if (txq >= tun->numqueues)
> >>        //above check passed, but tfile still not renew
> >>       -->if (tfile->socket.sk->sk_filter ...
> >>        //use the old tfile,trigger use-after-free
> >>
> >> Reported-by: Hulk Robot <hulkci@huawei.com>
> >> Fixes: c8d68e6be1c3 ("tuntap: multiqueue support")
> >> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> >> ---
> >>   drivers/net/tun.c | 1 +
> >>   1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >> index e9ca1c0..3770aba 100644
> >> --- a/drivers/net/tun.c
> >> +++ b/drivers/net/tun.c
> >> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun,
> >> struct file *file,
> >>        */
> >>       rcu_assign_pointer(tfile->tun, tun);
> >>       rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> >> +    synchronize_net();
> >>       tun->numqueues++;
> >>       tun_set_real_num_queues(tun);
> >>   out:
> >
> >
> > Good catch, that's one of my suspicion as well. Assume that this has
> > been tested.
> >
> > Acked-by: Jason Wang <jasowang@redhat.com>
> >
> > This patch is needed for -stable.
> >
> > I will post another theoretical issue similar to this soon.
> >
> > Thanks
> >
> 
> Ok, a second thought. All evil come from the access of tun->numqueues
> without sufficient synchronization. This is a partial fix, another
> possible case is __tun_detach(), we need either

In __tun_detach(),tun->tfiles changes should always call synchronize_net(),
and free tfile after RCU grace period. tun_net_xmit() doesn't have the chance to
access the change because it holding the rcu_read_lock().

So __tun_detach() path cannot cause the use after free, isn't it?

Regards

> 
> 1) synchronize through pointers in tfiles
> 
> 2) or smp_load_acquire()/smp_store_release() to solve it completely.
> 
> Let me post a complete fix.
> 
> Thanks


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-28  5:40                       ` weiyongjun (A)
@ 2019-04-28  7:51                         ` Jason Wang
  2019-04-28 17:59                           ` Cong Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2019-04-28  7:51 UTC (permalink / raw)
  To: weiyongjun (A),
	yuehaibing, davem, edumazet, brouer, mst, lirongqing,
	xiyou.wangcong, nicolas.dichtel, 3chas3, wangli39
  Cc: linux-kernel, netdev, Peter Xu


On 2019/4/28 下午1:40, weiyongjun (A) wrote:
> Hi Jason,
>
>> On 2019/4/28 上午11:24, Jason Wang wrote:
>>> On 2019/4/28 上午11:05, Yue Haibing wrote:
>>>> From: YueHaibing <yuehaibing@huawei.com>
>>>>
>>>> KASAN report this:
>>>>
>>>> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750
>>>> drivers/net/tun.c:1104
>>>> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
>>>>
>>>> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>>>> 1.10.2-1ubuntu1 04/01/2014
>>>> Call Trace:
>>>>    <IRQ>
>>>>    __dump_stack lib/dump_stack.c:77 [inline]
>>>>    dump_stack+0xca/0x13e lib/dump_stack.c:113
>>>>    print_address_description+0x79/0x330 mm/kasan/report.c:253
>>>>    kasan_report_error mm/kasan/report.c:351 [inline]
>>>>    kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
>>>>    tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
>>>>    __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
>>>>    netdev_start_xmit include/linux/netdevice.h:4309 [inline]
>>>>    xmit_one net/core/dev.c:3243 [inline]
>>>>    dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
>>>>    sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
>>>>    qdisc_restart net/sched/sch_generic.c:390 [inline]
>>>>    __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
>>>>    qdisc_run include/net/pkt_sched.h:120 [inline]
>>>>    __dev_xmit_skb net/core/dev.c:3438 [inline]
>>>>    __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
>>>>    neigh_output include/net/neighbour.h:501 [inline]
>>>>    ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
>>>>    ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
>>>>    NF_HOOK_COND include/linux/netfilter.h:278 [inline]
>>>>    ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
>>>>    dst_output include/net/dst.h:444 [inline]
>>>>    NF_HOOK include/linux/netfilter.h:289 [inline]
>>>>    mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
>>>>    mld_send_cr net/ipv6/mcast.c:1979 [inline]
>>>>    mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
>>>>    call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
>>>>    expire_timers kernel/time/timer.c:1363 [inline]
>>>>    __run_timers kernel/time/timer.c:1682 [inline]
>>>>    run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
>>>>    __do_softirq+0x26d/0xabd kernel/softirq.c:292
>>>>    invoke_softirq kernel/softirq.c:372 [inline]
>>>>    irq_exit+0x209/0x290 kernel/softirq.c:412
>>>>    exiting_irq arch/x86/include/asm/apic.h:536 [inline]
>>>>    smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
>>>>    apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
>>>>    </IRQ>
>>>> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
>>>> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9
>>>> ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3>
>>>> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
>>>> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>>>> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
>>>> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
>>>> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
>>>> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
>>>> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
>>>>    arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
>>>>    default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
>>>>    cpuidle_idle_call kernel/sched/idle.c:153 [inline]
>>>>    do_idle+0x2ca/0x420 kernel/sched/idle.c:262
>>>>    cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
>>>>    start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
>>>>    secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
>>>>
>>>> Allocated by task 19764:
>>>>    set_track mm/kasan/kasan.c:460 [inline]
>>>>    kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
>>>>    __kmalloc+0x11b/0x2d0 mm/slub.c:3750
>>>>    kmalloc include/linux/slab.h:518 [inline]
>>>>    sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
>>>>    sk_alloc+0x3d/0xc00 net/core/sock.c:1523
>>>>    tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
>>>>    misc_open+0x367/0x4e0 drivers/char/misc.c:141
>>>>    chrdev_open+0x212/0x580 fs/char_dev.c:417
>>>>    do_dentry_open+0x704/0x1050 fs/open.c:777
>>>>    do_last fs/namei.c:3418 [inline]
>>>>    path_openat+0x7ed/0x2ae0 fs/namei.c:3533
>>>>    do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
>>>>    do_sys_open+0x307/0x430 fs/open.c:1069
>>>>    do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
>>>>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>
>>>> Freed by task 19764:
>>>>    set_track mm/kasan/kasan.c:460 [inline]
>>>>    __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
>>>>    slab_free_hook mm/slub.c:1370 [inline]
>>>>    slab_free_freelist_hook mm/slub.c:1397 [inline]
>>>>    slab_free mm/slub.c:2952 [inline]
>>>>    kfree+0xeb/0x2f0 mm/slub.c:3905
>>>>    sk_prot_free net/core/sock.c:1506 [inline]
>>>>    __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
>>>>    sk_destruct+0x48/0x70 net/core/sock.c:1596
>>>>    __sk_free+0xa9/0x270 net/core/sock.c:1607
>>>>    sk_free+0x2a/0x30 net/core/sock.c:1618
>>>>    sock_put include/net/sock.h:1696 [inline]
>>>>    __tun_detach+0x464/0xf70 drivers/net/tun.c:735
>>>>    tun_detach drivers/net/tun.c:747 [inline]
>>>>    tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
>>>>    __fput+0x27f/0x7f0 fs/file_table.c:278
>>>>    task_work_run+0x136/0x1b0 kernel/task_work.c:113
>>>>    tracehook_notify_resume include/linux/tracehook.h:193 [inline]
>>>>    exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
>>>>    prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>>>>    syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
>>>>    do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
>>>>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>
>>>> The buggy address belongs to the object at ffff88836cc26600
>>>>    which belongs to the cache kmalloc-4096 of size 4096
>>>> The buggy address is located 1136 bytes inside of
>>>>    4096-byte region [ffff88836cc26600, ffff88836cc27600)
>>>> The buggy address belongs to the page:
>>>> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600
>>>> index:0x0 compound_mapcount: 0
>>>> flags: 0x2fffff80008100(slab|head)
>>>> raw: 002fffff80008100 dead000000000100 dead000000000200
>> ffff8883e280e600
>>>> raw: 0000000000000000 0000000000070007 00000001ffffffff
>> 0000000000000000
>>>> page dumped because: kasan: bad access detected
>>>>
>>>> Memory state around the buggy address:
>>>>    ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>    ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>> ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>                                                                ^
>>>>    ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>    ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>
>>>> If tun driver have multiqueues, user close the last queue by
>>>> tun_detach, then tun->tfiles[index] is not cleared. Then a new
>>>> queue may add to the tun, which using rcu_assign_pointer
>>>> tun->tfiles[index] to the new tfile and increase the numqueues.
>>>> However if there send a packet during this time, which picking the last
>>>> queue, it may uses the old tun->tfiles[index], beacause there no
>>>> RCU grace period.
>>>>
>>>> 1) tun_chr_close //close the last queue
>>>>       --> __tun_detach  //close the last queue, but tun->tfiles[index]
>>>> still exist
>>>>
>>>> 2) tun_chr_open  //attach a new queue
>>>>       --> tun_attach
>>>> -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>>>>        //there need a RCU grace period
>>>>
>>>>       -->tun->numqueues++;
>>>>
>>>> 3) tun_net_xmit //a new packet is sending, which pick the last queue
>>>>        -->if (txq >= tun->numqueues)
>>>>         //above check passed, but tfile still not renew
>>>>        -->if (tfile->socket.sk->sk_filter ...
>>>>         //use the old tfile,trigger use-after-free
>>>>
>>>> Reported-by: Hulk Robot <hulkci@huawei.com>
>>>> Fixes: c8d68e6be1c3 ("tuntap: multiqueue support")
>>>> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
>>>> ---
>>>>    drivers/net/tun.c | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>> index e9ca1c0..3770aba 100644
>>>> --- a/drivers/net/tun.c
>>>> +++ b/drivers/net/tun.c
>>>> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun,
>>>> struct file *file,
>>>>         */
>>>>        rcu_assign_pointer(tfile->tun, tun);
>>>>        rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>>>> +    synchronize_net();
>>>>        tun->numqueues++;
>>>>        tun_set_real_num_queues(tun);
>>>>    out:
>>>
>>> Good catch, that's one of my suspicion as well. Assume that this has
>>> been tested.
>>>
>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>
>>> This patch is needed for -stable.
>>>
>>> I will post another theoretical issue similar to this soon.
>>>
>>> Thanks
>>>
>> Ok, a second thought. All evil come from the access of tun->numqueues
>> without sufficient synchronization. This is a partial fix, another
>> possible case is __tun_detach(), we need either
> In __tun_detach(),tun->tfiles changes should always call synchronize_net(),
> and free tfile after RCU grace period.


Exactly, and that's why I suggest to check pointers in tfiles against 
NULL for method 1) I mentioned below.


> tun_net_xmit() doesn't have the chance to
> access the change because it holding the rcu_read_lock().


The problem is the following codes:


         --tun->numqueues;

         ...

         synchronize_net();

We need make sure the decrement of tun->numqueues be visible to readers 
after synchronize_net(). And in tun_net_xmit():


     rcu_read_lock();
     tfile = rcu_dereference(tun->tfiles[txq]);

     /* Drop packet if interface is not attached */
     if (txq >= tun->numqueues)
         goto drop;


We should make sure tun->numqueues were read after tfiles array dereference.

Unfortunately, we don't have such guarantee even with this patch. So we 
should change to use smp_load_acquire()/smp_store_release() for those 
cases (method 2).

It looks to me checking file against NULL (and NULL out 
tfiles[tun->numqueues] in __tun_detach) is much more easier to be 
understand.

What do you think?

Thanks


>
> So __tun_detach() path cannot cause the use after free, isn't it?
>
> Regards
>
>> 1) synchronize through pointers in tfiles
>>
>> 2) or smp_load_acquire()/smp_store_release() to solve it completely.
>>
>> Let me post a complete fix.
>>
>> Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-28  3:05                 ` [PATCH] tun: Fix use-after-free " Yue Haibing
  2019-04-28  3:24                   ` Jason Wang
@ 2019-04-28 17:49                   ` Cong Wang
  2019-04-29 14:55                   ` Michael S. Tsirkin
  2 siblings, 0 replies; 28+ messages in thread
From: Cong Wang @ 2019-04-28 17:49 UTC (permalink / raw)
  To: Yue Haibing
  Cc: David Miller, Jason Wang, Eric Dumazet, Jesper Dangaard Brouer,
	Michael S. Tsirkin, Li,Rongqing, Nicolas Dichtel, Chas Williams,
	wangli39, LKML, Linux Kernel Network Developers

On Sat, Apr 27, 2019 at 8:06 PM Yue Haibing <yuehaibing@huawei.com> wrote:
>
> If tun driver have multiqueues, user close the last queue by
> tun_detach, then tun->tfiles[index] is not cleared. Then a new
> queue may add to the tun, which using rcu_assign_pointer
> tun->tfiles[index] to the new tfile and increase the numqueues.
> However if there send a packet during this time, which picking the last
> queue, it may uses the old tun->tfiles[index], beacause there no
> RCU grace period.

This analysis makes sense. It is a normal scenario for RCU,
where readers could still read even after we unpublish the RCU
protected structure, we only need to worry about when we free it.


> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index e9ca1c0..3770aba 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
>          */
>         rcu_assign_pointer(tfile->tun, tun);
>         rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> +       synchronize_net();
>         tun->numqueues++;
>         tun_set_real_num_queues(tun);

But this fix doesn't make any sense, we only wait for RCU
grace period when freeing old ones, not for new ones. RCU
grace period is all about readers against free.

This is why I came up with the SOCK_RCU_FREE patch, which
is also blocking-free.


Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-28  7:51                         ` Jason Wang
@ 2019-04-28 17:59                           ` Cong Wang
  2019-04-29  2:23                             ` Jason Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Cong Wang @ 2019-04-28 17:59 UTC (permalink / raw)
  To: Jason Wang
  Cc: weiyongjun (A),
	yuehaibing, davem, edumazet, brouer, mst, lirongqing,
	nicolas.dichtel, 3chas3, wangli39, linux-kernel, netdev,
	Peter Xu

On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > tun_net_xmit() doesn't have the chance to
> > access the change because it holding the rcu_read_lock().
>
>
> The problem is the following codes:
>
>
>          --tun->numqueues;
>
>          ...
>
>          synchronize_net();
>
> We need make sure the decrement of tun->numqueues be visible to readers
> after synchronize_net(). And in tun_net_xmit():

It doesn't matter at all. Readers are okay to read it even they still use the
stale tun->numqueues, as long as the tfile is not freed readers can read
whatever they want...

The decrement of tun->numqueues is just how we unpublish the old
tfile, it is still valid for readers to read it _after_ unpublish, we only need
to worry about free, not about unpublish. This is the whole spirit of RCU.

You need to rethink about my SOCK_RCU_FREE patch.

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-28 17:59                           ` Cong Wang
@ 2019-04-29  2:23                             ` Jason Wang
  2019-04-29  3:53                               ` weiyongjun (A)
                                                 ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Jason Wang @ 2019-04-29  2:23 UTC (permalink / raw)
  To: Cong Wang
  Cc: weiyongjun (A),
	yuehaibing, davem, edumazet, brouer, mst, lirongqing,
	nicolas dichtel, 3chas3, wangli39, linux-kernel, netdev,
	Peter Xu


On 2019/4/29 上午1:59, Cong Wang wrote:
> On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <jasowang@redhat.com> wrote:
>>> tun_net_xmit() doesn't have the chance to
>>> access the change because it holding the rcu_read_lock().
>>
>>
>> The problem is the following codes:
>>
>>
>>          --tun->numqueues;
>>
>>          ...
>>
>>          synchronize_net();
>>
>> We need make sure the decrement of tun->numqueues be visible to readers
>> after synchronize_net(). And in tun_net_xmit():
>
> It doesn't matter at all. Readers are okay to read it even they still use the
> stale tun->numqueues, as long as the tfile is not freed readers can read
> whatever they want...

This is only true if we set SOCK_RCU_FREE, isn't it?

>
> The decrement of tun->numqueues is just how we unpublish the old
> tfile, it is still valid for readers to read it _after_ unpublish, we only need
> to worry about free, not about unpublish. This is the whole spirit of RCU.
>

The point is we don't convert tun->numqueues to RCU but use
synchronize_net().

> You need to rethink about my SOCK_RCU_FREE patch.

The code is wrote before SOCK_RCU_FREE is introduced and assume no
de-reference from device after synchronize_net(). It doesn't harm to
figure out the root cause which may give us more confidence to the fix
(e.g like SOCK_RCU_FREE).

I don't object to fix with SOCK_RCU_FREE, but then we should remove
the redundant synchronize_net(). But I still prefer to synchronize
everything explicitly like (completely untested):

From df91f77d35a6aa7943b6f2a7d4b329990896a0fe Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Mon, 29 Apr 2019 10:21:06 +0800
Subject: [PATCH] tuntap: synchronize through tfiles instead of numqueues

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 80bff1b4ec17..03715f605fb5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -698,6 +698,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
 
 		rcu_assign_pointer(tun->tfiles[index],
 				   tun->tfiles[tun->numqueues - 1]);
+		rcu_assign_pointer(tun->tfiles[tun->numqueues], NULL);
 		ntfile = rtnl_dereference(tun->tfiles[index]);
 		ntfile->queue_index = index;
 
@@ -1082,7 +1083,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	tfile = rcu_dereference(tun->tfiles[txq]);
 
 	/* Drop packet if interface is not attached */
-	if (txq >= tun->numqueues)
+	if (!tfile)
 		goto drop;
 
 	if (!rcu_dereference(tun->steering_prog))
@@ -1305,15 +1306,13 @@ static int tun_xdp_xmit(struct net_device *dev, int n,
 
 	rcu_read_lock();
 
-	numqueues = READ_ONCE(tun->numqueues);
-	if (!numqueues) {
+	tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
+					    tun->numqueues]);
+	if (!tfile) {
 		rcu_read_unlock();
 		return -ENXIO; /* Caller will free/return all frames */
 	}
 
-	tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
-					    numqueues]);
-
 	spin_lock(&tfile->tx_ring.producer_lock);
 	for (i = 0; i < n; i++) {
 		struct xdp_frame *xdp = frames[i];
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* RE: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-29  2:23                             ` Jason Wang
@ 2019-04-29  3:53                               ` weiyongjun (A)
  2019-04-29 13:07                               ` YueHaibing
  2019-04-29 16:38                               ` Cong Wang
  2 siblings, 0 replies; 28+ messages in thread
From: weiyongjun (A) @ 2019-04-29  3:53 UTC (permalink / raw)
  To: Jason Wang, Cong Wang
  Cc: yuehaibing, davem, edumazet, brouer, mst, lirongqing,
	nicolas dichtel, 3chas3, wangli39, linux-kernel, netdev,
	Peter Xu

> > On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <jasowang@redhat.com>
> wrote:
> >>> tun_net_xmit() doesn't have the chance to
> >>> access the change because it holding the rcu_read_lock().
> >>
> >>
> >> The problem is the following codes:
> >>
> >>
> >>          --tun->numqueues;
> >>
> >>          ...
> >>
> >>          synchronize_net();
> >>
> >> We need make sure the decrement of tun->numqueues be visible to
> readers
> >> after synchronize_net(). And in tun_net_xmit():
> >
> > It doesn't matter at all. Readers are okay to read it even they still use the
> > stale tun->numqueues, as long as the tfile is not freed readers can read
> > whatever they want...
> 
> This is only true if we set SOCK_RCU_FREE, isn't it?
> 
> >
> > The decrement of tun->numqueues is just how we unpublish the old
> > tfile, it is still valid for readers to read it _after_ unpublish, we only need
> > to worry about free, not about unpublish. This is the whole spirit of RCU.
> >
> 
> The point is we don't convert tun->numqueues to RCU but use
> synchronize_net().
> 
> > You need to rethink about my SOCK_RCU_FREE patch.
> 
> The code is wrote before SOCK_RCU_FREE is introduced and assume no
> de-reference from device after synchronize_net(). It doesn't harm to
> figure out the root cause which may give us more confidence to the fix
> (e.g like SOCK_RCU_FREE).
> 
> I don't object to fix with SOCK_RCU_FREE, but then we should remove
> the redundant synchronize_net(). But I still prefer to synchronize
> everything explicitly like (completely untested):
> 
> From df91f77d35a6aa7943b6f2a7d4b329990896a0fe Mon Sep 17 00:00:00
> 2001
> From: Jason Wang <jasowang@redhat.com>
> Date: Mon, 29 Apr 2019 10:21:06 +0800
> Subject: [PATCH] tuntap: synchronize through tfiles instead of numqueues
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/tun.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 80bff1b4ec17..03715f605fb5 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -698,6 +698,7 @@ static void __tun_detach(struct tun_file *tfile, bool
> clean)
> 
>  		rcu_assign_pointer(tun->tfiles[index],
>  				   tun->tfiles[tun->numqueues - 1]);
> +		rcu_assign_pointer(tun->tfiles[tun->numqueues], NULL);

Should be "rcu_assign_pointer(tun->tfiles[tun->numqueues - 1], NULL);"

>  		ntfile = rtnl_dereference(tun->tfiles[index]);
>  		ntfile->queue_index = index;
> 
> @@ -1082,7 +1083,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff
> *skb, struct net_device *dev)
>  	tfile = rcu_dereference(tun->tfiles[txq]);
> 
>  	/* Drop packet if interface is not attached */
> -	if (txq >= tun->numqueues)
> +	if (!tfile)
>  		goto drop;
> 
>  	if (!rcu_dereference(tun->steering_prog))
> @@ -1305,15 +1306,13 @@ static int tun_xdp_xmit(struct net_device *dev,
> int n,
> 
>  	rcu_read_lock();
> 
> -	numqueues = READ_ONCE(tun->numqueues);
> -	if (!numqueues) {
> +	tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
> +					    tun->numqueues]);
> +	if (!tfile) {
>  		rcu_read_unlock();
>  		return -ENXIO; /* Caller will free/return all frames */
>  	}
> 
> -	tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
> -					    numqueues]);
> -
>  	spin_lock(&tfile->tx_ring.producer_lock);
>  	for (i = 0; i < n; i++) {
>  		struct xdp_frame *xdp = frames[i];
> --
> 2.19.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-29  2:23                             ` Jason Wang
  2019-04-29  3:53                               ` weiyongjun (A)
@ 2019-04-29 13:07                               ` YueHaibing
  2019-04-29 16:38                               ` Cong Wang
  2 siblings, 0 replies; 28+ messages in thread
From: YueHaibing @ 2019-04-29 13:07 UTC (permalink / raw)
  To: Jason Wang, Cong Wang
  Cc: weiyongjun (A),
	davem, edumazet, brouer, mst, lirongqing, nicolas dichtel,
	3chas3, wangli39, linux-kernel, netdev, Peter Xu

On 2019/4/29 10:23, Jason Wang wrote:
> 
> On 2019/4/29 上午1:59, Cong Wang wrote:
>> On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> tun_net_xmit() doesn't have the chance to
>>>> access the change because it holding the rcu_read_lock().
>>>
>>>
>>> The problem is the following codes:
>>>
>>>
>>>          --tun->numqueues;
>>>
>>>          ...
>>>
>>>          synchronize_net();
>>>
>>> We need make sure the decrement of tun->numqueues be visible to readers
>>> after synchronize_net(). And in tun_net_xmit():
>>
>> It doesn't matter at all. Readers are okay to read it even they still use the
>> stale tun->numqueues, as long as the tfile is not freed readers can read
>> whatever they want...
> 
> This is only true if we set SOCK_RCU_FREE, isn't it?
> 
>>
>> The decrement of tun->numqueues is just how we unpublish the old
>> tfile, it is still valid for readers to read it _after_ unpublish, we only need
>> to worry about free, not about unpublish. This is the whole spirit of RCU.
>>
> 
> The point is we don't convert tun->numqueues to RCU but use
> synchronize_net().
> 
>> You need to rethink about my SOCK_RCU_FREE patch.
> 
> The code is wrote before SOCK_RCU_FREE is introduced and assume no
> de-reference from device after synchronize_net(). It doesn't harm to
> figure out the root cause which may give us more confidence to the fix
> (e.g like SOCK_RCU_FREE).
> 
> I don't object to fix with SOCK_RCU_FREE, but then we should remove
> the redundant synchronize_net(). But I still prefer to synchronize
> everything explicitly like (completely untested):
> 
>>From df91f77d35a6aa7943b6f2a7d4b329990896a0fe Mon Sep 17 00:00:00 2001
> From: Jason Wang <jasowang@redhat.com>
> Date: Mon, 29 Apr 2019 10:21:06 +0800
> Subject: [PATCH] tuntap: synchronize through tfiles instead of numqueues
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/tun.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 80bff1b4ec17..03715f605fb5 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -698,6 +698,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
>  
>  		rcu_assign_pointer(tun->tfiles[index],
>  				   tun->tfiles[tun->numqueues - 1]);
> +		rcu_assign_pointer(tun->tfiles[tun->numqueues], NULL);
>  		ntfile = rtnl_dereference(tun->tfiles[index]);

move here to avoid NULL pointer dereference,  it works for me

		rcu_assign_pointer(tun->tfiles[tun->numqueues -1 ], NULL);

>  		ntfile->queue_index = index;
>  
> @@ -1082,7 +1083,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>  	tfile = rcu_dereference(tun->tfiles[txq]);
>  
>  	/* Drop packet if interface is not attached */
> -	if (txq >= tun->numqueues)
> +	if (!tfile)
>  		goto drop;
>  
>  	if (!rcu_dereference(tun->steering_prog))
> @@ -1305,15 +1306,13 @@ static int tun_xdp_xmit(struct net_device *dev, int n,
>  
>  	rcu_read_lock();
>  
> -	numqueues = READ_ONCE(tun->numqueues);
> -	if (!numqueues) {
> +	tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
> +					    tun->numqueues]);
> +	if (!tfile) {
>  		rcu_read_unlock();
>  		return -ENXIO; /* Caller will free/return all frames */
>  	}
>  
> -	tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
> -					    numqueues]);
> -
>  	spin_lock(&tfile->tx_ring.producer_lock);
>  	for (i = 0; i < n; i++) {
>  		struct xdp_frame *xdp = frames[i];
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-28  3:05                 ` [PATCH] tun: Fix use-after-free " Yue Haibing
  2019-04-28  3:24                   ` Jason Wang
  2019-04-28 17:49                   ` Cong Wang
@ 2019-04-29 14:55                   ` Michael S. Tsirkin
  2019-04-29 16:58                     ` Cong Wang
  2 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-04-29 14:55 UTC (permalink / raw)
  To: Yue Haibing
  Cc: davem, jasowang, edumazet, brouer, lirongqing, xiyou.wangcong,
	nicolas.dichtel, 3chas3, wangli39, linux-kernel, netdev

On Sun, Apr 28, 2019 at 11:05:39AM +0800, Yue Haibing wrote:
> From: YueHaibing <yuehaibing@huawei.com>
> 
> KASAN report this:
> 
> BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
> Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
> 
> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0xca/0x13e lib/dump_stack.c:113
>  print_address_description+0x79/0x330 mm/kasan/report.c:253
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
>  tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
>  __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
>  netdev_start_xmit include/linux/netdevice.h:4309 [inline]
>  xmit_one net/core/dev.c:3243 [inline]
>  dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
>  sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
>  qdisc_restart net/sched/sch_generic.c:390 [inline]
>  __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
>  qdisc_run include/net/pkt_sched.h:120 [inline]
>  __dev_xmit_skb net/core/dev.c:3438 [inline]
>  __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
>  neigh_output include/net/neighbour.h:501 [inline]
>  ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
>  ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
>  NF_HOOK_COND include/linux/netfilter.h:278 [inline]
>  ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
>  dst_output include/net/dst.h:444 [inline]
>  NF_HOOK include/linux/netfilter.h:289 [inline]
>  mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
>  mld_send_cr net/ipv6/mcast.c:1979 [inline]
>  mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
>  call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
>  expire_timers kernel/time/timer.c:1363 [inline]
>  __run_timers kernel/time/timer.c:1682 [inline]
>  run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
>  __do_softirq+0x26d/0xabd kernel/softirq.c:292
>  invoke_softirq kernel/softirq.c:372 [inline]
>  irq_exit+0x209/0x290 kernel/softirq.c:412
>  exiting_irq arch/x86/include/asm/apic.h:536 [inline]
>  smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
>  </IRQ>
> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
> Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
> RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
> RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
> R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
>  arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
>  default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
>  cpuidle_idle_call kernel/sched/idle.c:153 [inline]
>  do_idle+0x2ca/0x420 kernel/sched/idle.c:262
>  cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
>  start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
>  secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
> 
> Allocated by task 19764:
>  set_track mm/kasan/kasan.c:460 [inline]
>  kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
>  __kmalloc+0x11b/0x2d0 mm/slub.c:3750
>  kmalloc include/linux/slab.h:518 [inline]
>  sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
>  sk_alloc+0x3d/0xc00 net/core/sock.c:1523
>  tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
>  misc_open+0x367/0x4e0 drivers/char/misc.c:141
>  chrdev_open+0x212/0x580 fs/char_dev.c:417
>  do_dentry_open+0x704/0x1050 fs/open.c:777
>  do_last fs/namei.c:3418 [inline]
>  path_openat+0x7ed/0x2ae0 fs/namei.c:3533
>  do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
>  do_sys_open+0x307/0x430 fs/open.c:1069
>  do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Freed by task 19764:
>  set_track mm/kasan/kasan.c:460 [inline]
>  __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
>  slab_free_hook mm/slub.c:1370 [inline]
>  slab_free_freelist_hook mm/slub.c:1397 [inline]
>  slab_free mm/slub.c:2952 [inline]
>  kfree+0xeb/0x2f0 mm/slub.c:3905
>  sk_prot_free net/core/sock.c:1506 [inline]
>  __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
>  sk_destruct+0x48/0x70 net/core/sock.c:1596
>  __sk_free+0xa9/0x270 net/core/sock.c:1607
>  sk_free+0x2a/0x30 net/core/sock.c:1618
>  sock_put include/net/sock.h:1696 [inline]
>  __tun_detach+0x464/0xf70 drivers/net/tun.c:735
>  tun_detach drivers/net/tun.c:747 [inline]
>  tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
>  __fput+0x27f/0x7f0 fs/file_table.c:278
>  task_work_run+0x136/0x1b0 kernel/task_work.c:113
>  tracehook_notify_resume include/linux/tracehook.h:193 [inline]
>  exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
>  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
>  do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> The buggy address belongs to the object at ffff88836cc26600
>  which belongs to the cache kmalloc-4096 of size 4096
> The buggy address is located 1136 bytes inside of
>  4096-byte region [ffff88836cc26600, ffff88836cc27600)
> The buggy address belongs to the page:
> page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 index:0x0 compound_mapcount: 0
> flags: 0x2fffff80008100(slab|head)
> raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600
> raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>  ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                              ^
>  ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> 
> If tun driver have multiqueues, user close the last queue by
> tun_detach, then tun->tfiles[index] is not cleared. Then a new
> queue may add to the tun, which using rcu_assign_pointer
> tun->tfiles[index] to the new tfile and increase the numqueues.
> However if there send a packet during this time, which picking the last
> queue, it may uses the old tun->tfiles[index], beacause there no
> RCU grace period.
> 
> 1) tun_chr_close //close the last queue
>     --> __tun_detach  //close the last queue, but tun->tfiles[index] still exist
> 
> 2) tun_chr_open  //attach a new queue
>     --> tun_attach
>       -->rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> 	  //there need a RCU grace period
> 
>     -->tun->numqueues++;
> 			
> 3) tun_net_xmit //a new packet is sending, which pick the last queue
>      -->if (txq >= tun->numqueues)
> 	   //above check passed, but tfile still not renew
>      -->if (tfile->socket.sk->sk_filter ...
> 	   //use the old tfile,trigger use-after-free
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Fixes: c8d68e6be1c3 ("tuntap: multiqueue support")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
>  drivers/net/tun.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index e9ca1c0..3770aba 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -876,6 +876,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
>  	 */
>  	rcu_assign_pointer(tfile->tun, tun);
>  	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> +	synchronize_net();
>  	tun->numqueues++;
>  	tun_set_real_num_queues(tun);
>  out:

The problem seems real enough, but an extra synchronize_net on tun_attach
might be a problem, slowing guest startup significantly.
Better ideas?

> -- 
> 2.7.0
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-29  2:23                             ` Jason Wang
  2019-04-29  3:53                               ` weiyongjun (A)
  2019-04-29 13:07                               ` YueHaibing
@ 2019-04-29 16:38                               ` Cong Wang
  2019-04-30  2:44                                 ` YueHaibing
  2019-05-05  9:09                                 ` Jason Wang
  2 siblings, 2 replies; 28+ messages in thread
From: Cong Wang @ 2019-04-29 16:38 UTC (permalink / raw)
  To: Jason Wang
  Cc: weiyongjun (A),
	yuehaibing, David Miller, Eric Dumazet, Jesper Dangaard Brouer,
	Michael S. Tsirkin, Li,Rongqing, nicolas dichtel, Chas Williams,
	wangli39, LKML, Linux Kernel Network Developers, Peter Xu

On Sun, Apr 28, 2019 at 7:23 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2019/4/29 上午1:59, Cong Wang wrote:
> > On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <jasowang@redhat.com> wrote:
> >>> tun_net_xmit() doesn't have the chance to
> >>> access the change because it holding the rcu_read_lock().
> >>
> >>
> >> The problem is the following codes:
> >>
> >>
> >>          --tun->numqueues;
> >>
> >>          ...
> >>
> >>          synchronize_net();
> >>
> >> We need make sure the decrement of tun->numqueues be visible to readers
> >> after synchronize_net(). And in tun_net_xmit():
> >
> > It doesn't matter at all. Readers are okay to read it even they still use the
> > stale tun->numqueues, as long as the tfile is not freed readers can read
> > whatever they want...
>
> This is only true if we set SOCK_RCU_FREE, isn't it?


Sure, this is how RCU is supposed to work.

>
> >
> > The decrement of tun->numqueues is just how we unpublish the old
> > tfile, it is still valid for readers to read it _after_ unpublish, we only need
> > to worry about free, not about unpublish. This is the whole spirit of RCU.
> >
>
> The point is we don't convert tun->numqueues to RCU but use
> synchronize_net().

Why tun->numqueues needs RCU? It is an integer, and reading a stale
value is _perfectly_ fine.

If you actually meant to say tun->tfiles[] itself, no, it is a fixed-size array,
it doesn't shrink or grow, so we don't need RCU for it. This is also why
a stale tun->numqueues is fine, as long as it never goes out-of-bound.


>
> > You need to rethink about my SOCK_RCU_FREE patch.
>
> The code is wrote before SOCK_RCU_FREE is introduced and assume no
> de-reference from device after synchronize_net(). It doesn't harm to
> figure out the root cause which may give us more confidence to the fix
> (e.g like SOCK_RCU_FREE).

I believe SOCK_RCU_FREE is the fix for the root cause, not just a
cover-up.


>
> I don't object to fix with SOCK_RCU_FREE, but then we should remove
> the redundant synchronize_net(). But I still prefer to synchronize
> everything explicitly like (completely untested):

I agree that synchronize_net() can be removed. However I don't
understand your untested patch at all, it looks like to fix a completely
different problem rather than this use-after-free.

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-29 14:55                   ` Michael S. Tsirkin
@ 2019-04-29 16:58                     ` Cong Wang
  2019-04-30  5:11                       ` weiyongjun (A)
  0 siblings, 1 reply; 28+ messages in thread
From: Cong Wang @ 2019-04-29 16:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Yue Haibing, David Miller, Jason Wang, Eric Dumazet,
	Jesper Dangaard Brouer, Li,Rongqing, Nicolas Dichtel,
	Chas Williams, wangli39, LKML, Linux Kernel Network Developers

On Mon, Apr 29, 2019 at 7:55 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> The problem seems real enough, but an extra synchronize_net on tun_attach
> might be a problem, slowing guest startup significantly.
> Better ideas?

Yes, I proposed the following patch in the other thread.

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e9ca1c088d0b..31c3210288cb 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -3431,6 +3431,7 @@ static int tun_chr_open(struct inode *inode,
struct file * file)
        file->private_data = tfile;
        INIT_LIST_HEAD(&tfile->next);

+       sock_set_flag(&tfile->sk, SOCK_RCU_FREE);
        sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);

        return 0;

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-29 16:38                               ` Cong Wang
@ 2019-04-30  2:44                                 ` YueHaibing
  2019-04-30 23:33                                   ` Cong Wang
  2019-05-05  9:09                                 ` Jason Wang
  1 sibling, 1 reply; 28+ messages in thread
From: YueHaibing @ 2019-04-30  2:44 UTC (permalink / raw)
  To: Cong Wang, Jason Wang
  Cc: weiyongjun (A),
	David Miller, Eric Dumazet, Jesper Dangaard Brouer,
	Michael S. Tsirkin, Li,Rongqing, nicolas dichtel, Chas Williams,
	wangli39, LKML, Linux Kernel Network Developers, Peter Xu



On 2019/4/30 0:38, Cong Wang wrote:
> On Sun, Apr 28, 2019 at 7:23 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>>
>> On 2019/4/29 上午1:59, Cong Wang wrote:
>>> On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>> tun_net_xmit() doesn't have the chance to
>>>>> access the change because it holding the rcu_read_lock().
>>>>
>>>>
>>>> The problem is the following codes:
>>>>
>>>>
>>>>          --tun->numqueues;
>>>>
>>>>          ...
>>>>
>>>>          synchronize_net();
>>>>
>>>> We need make sure the decrement of tun->numqueues be visible to readers
>>>> after synchronize_net(). And in tun_net_xmit():
>>>
>>> It doesn't matter at all. Readers are okay to read it even they still use the
>>> stale tun->numqueues, as long as the tfile is not freed readers can read
>>> whatever they want...
>>
>> This is only true if we set SOCK_RCU_FREE, isn't it?
> 
> 
> Sure, this is how RCU is supposed to work.
> 
>>
>>>
>>> The decrement of tun->numqueues is just how we unpublish the old
>>> tfile, it is still valid for readers to read it _after_ unpublish, we only need
>>> to worry about free, not about unpublish. This is the whole spirit of RCU.
>>>
>>
>> The point is we don't convert tun->numqueues to RCU but use
>> synchronize_net().
> 
> Why tun->numqueues needs RCU? It is an integer, and reading a stale
> value is _perfectly_ fine.
> 
> If you actually meant to say tun->tfiles[] itself, no, it is a fixed-size array,
> it doesn't shrink or grow, so we don't need RCU for it. This is also why
> a stale tun->numqueues is fine, as long as it never goes out-of-bound.
> 
> 
>>
>>> You need to rethink about my SOCK_RCU_FREE patch.
>>
>> The code is wrote before SOCK_RCU_FREE is introduced and assume no
>> de-reference from device after synchronize_net(). It doesn't harm to
>> figure out the root cause which may give us more confidence to the fix
>> (e.g like SOCK_RCU_FREE).
> 
> I believe SOCK_RCU_FREE is the fix for the root cause, not just a
> cover-up.

With SOCK_RCU_FREE tfile is ok ,

but tfile->sk is freed by sock_put in __tun_detach, it will trgger

use-after-free in tun_net_xmit if tun->numqueues check passed.

> 
> 
>>
>> I don't object to fix with SOCK_RCU_FREE, but then we should remove
>> the redundant synchronize_net(). But I still prefer to synchronize
>> everything explicitly like (completely untested):
> 
> I agree that synchronize_net() can be removed. However I don't
> understand your untested patch at all, it looks like to fix a completely
> different problem rather than this use-after-free.
> 
> Thanks.
> 
> .
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-29 16:58                     ` Cong Wang
@ 2019-04-30  5:11                       ` weiyongjun (A)
  2019-04-30 23:42                         ` Cong Wang
  0 siblings, 1 reply; 28+ messages in thread
From: weiyongjun (A) @ 2019-04-30  5:11 UTC (permalink / raw)
  To: Cong Wang, Michael S. Tsirkin
  Cc: yuehaibing, David Miller, Jason Wang, Eric Dumazet,
	Jesper Dangaard Brouer, Li,Rongqing, Nicolas Dichtel,
	Chas Williams, wangli39, LKML, Linux Kernel Network Developers

> Network Developers <netdev@vger.kernel.org>
> Subject: Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
> 
> On Mon, Apr 29, 2019 at 7:55 AM Michael S. Tsirkin <mst@redhat.com>
> wrote:
> > The problem seems real enough, but an extra synchronize_net on
> tun_attach
> > might be a problem, slowing guest startup significantly.
> > Better ideas?
> 
> Yes, I proposed the following patch in the other thread.
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index e9ca1c088d0b..31c3210288cb 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -3431,6 +3431,7 @@ static int tun_chr_open(struct inode *inode,
> struct file * file)
>         file->private_data = tfile;
>         INIT_LIST_HEAD(&tfile->next);
> 
> +       sock_set_flag(&tfile->sk, SOCK_RCU_FREE);
>         sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
> 
>         return 0;


This patch should not work. The key point is that when detach the queue
with index is equal to tun->numqueues - 1, we do not clear the point
in tun->tfiles:

static void __tun_detach(...)
{
...
        **** if index == tun->numqueues - 1, nothing changed ****
        rcu_assign_pointer(tun->tfiles[index],
                                                tun->tfiles[tun->numqueues - 1]);
....
}

And after tfile free, xmit have change to get and use the freed file point.

Regards


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-30  2:44                                 ` YueHaibing
@ 2019-04-30 23:33                                   ` Cong Wang
  0 siblings, 0 replies; 28+ messages in thread
From: Cong Wang @ 2019-04-30 23:33 UTC (permalink / raw)
  To: YueHaibing
  Cc: Jason Wang, weiyongjun (A),
	David Miller, Eric Dumazet, Jesper Dangaard Brouer,
	Michael S. Tsirkin, Li,Rongqing, nicolas dichtel, Chas Williams,
	wangli39, LKML, Linux Kernel Network Developers, Peter Xu

On Mon, Apr 29, 2019 at 7:44 PM YueHaibing <yuehaibing@huawei.com> wrote:
>
> With SOCK_RCU_FREE tfile is ok ,
>
> but tfile->sk is freed by sock_put in __tun_detach, it will trgger

SOCK_RCU_FREE is exactly for sock and for sock_put(),
you need to look into sock_put() path to see where SOCK_RCU_FREE
is tested.


>
> use-after-free in tun_net_xmit if tun->numqueues check passed.

Why do you believe we still have use-after-free with SOCK_RCU_FREE?

tun_net_xmit() holds RCU read lock, so with SOCK_RCU_FREE,
the sock won't be freed until tun_net_xmit() releases RCU read lock.
This is just how RCU works...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-30  5:11                       ` weiyongjun (A)
@ 2019-04-30 23:42                         ` Cong Wang
  0 siblings, 0 replies; 28+ messages in thread
From: Cong Wang @ 2019-04-30 23:42 UTC (permalink / raw)
  To: weiyongjun (A)
  Cc: Michael S. Tsirkin, yuehaibing, David Miller, Jason Wang,
	Eric Dumazet, Jesper Dangaard Brouer, Li,Rongqing,
	Nicolas Dichtel, Chas Williams, wangli39, LKML,
	Linux Kernel Network Developers

On Mon, Apr 29, 2019 at 10:11 PM weiyongjun (A) <weiyongjun1@huawei.com> wrote:
> This patch should not work. The key point is that when detach the queue
> with index is equal to tun->numqueues - 1, we do not clear the point
> in tun->tfiles:
>
> static void __tun_detach(...)
> {
> ...
>         **** if index == tun->numqueues - 1, nothing changed ****
>         rcu_assign_pointer(tun->tfiles[index],
>                                                 tun->tfiles[tun->numqueues - 1]);
> ....
> }


This is _perfectly_ fine. This is just how we _unpublish_ it, RCU is NOT
against unpublish, you keep missing this point.

Think about list_del_rcu(). RCU readers could still read the list entry
even _after_ list_del_rcu(), this is perfectly fine, list_del_rcu() just
unpublishes the list entry from a global list, kfree_rcu() is the one frees
it. So, RCU readers never hate "unpublish", they just hate "free".


>
> And after tfile free, xmit have change to get and use the freed file point.

With SOCK_RCU_FREE, it won't be freed until the last reader is gone.
This is the fundamental of RCU.

Please, at least look into sk_destruct().

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] tun: Fix use-after-free in tun_net_xmit
  2019-04-29 16:38                               ` Cong Wang
  2019-04-30  2:44                                 ` YueHaibing
@ 2019-05-05  9:09                                 ` Jason Wang
  1 sibling, 0 replies; 28+ messages in thread
From: Jason Wang @ 2019-05-05  9:09 UTC (permalink / raw)
  To: Cong Wang
  Cc: weiyongjun (A),
	yuehaibing, David Miller, Eric Dumazet, Jesper Dangaard Brouer,
	Michael S. Tsirkin, Li,Rongqing, nicolas dichtel, Chas Williams,
	wangli39, LKML, Linux Kernel Network Developers, Peter Xu


On 2019/4/30 上午12:38, Cong Wang wrote:
> On Sun, Apr 28, 2019 at 7:23 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2019/4/29 上午1:59, Cong Wang wrote:
>>> On Sun, Apr 28, 2019 at 12:51 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>> tun_net_xmit() doesn't have the chance to
>>>>> access the change because it holding the rcu_read_lock().
>>>>
>>>> The problem is the following codes:
>>>>
>>>>
>>>>           --tun->numqueues;
>>>>
>>>>           ...
>>>>
>>>>           synchronize_net();
>>>>
>>>> We need make sure the decrement of tun->numqueues be visible to readers
>>>> after synchronize_net(). And in tun_net_xmit():
>>> It doesn't matter at all. Readers are okay to read it even they still use the
>>> stale tun->numqueues, as long as the tfile is not freed readers can read
>>> whatever they want...
>> This is only true if we set SOCK_RCU_FREE, isn't it?
>
> Sure, this is how RCU is supposed to work.
>
>>> The decrement of tun->numqueues is just how we unpublish the old
>>> tfile, it is still valid for readers to read it _after_ unpublish, we only need
>>> to worry about free, not about unpublish. This is the whole spirit of RCU.
>>>
>> The point is we don't convert tun->numqueues to RCU but use
>> synchronize_net().
> Why tun->numqueues needs RCU? It is an integer, and reading a stale
> value is _perfectly_ fine.


I meant we don't want e.g tun_net_xmit() to see the stale value after 
synchronize_net() in __tun_detach(), since it has various other steps 
with the assumption that no tfile dereference from data path. E.g one 
example is XDP rxq information un-registering which looks racy in the 
case of XDP_TX.


>
> If you actually meant to say tun->tfiles[] itself, no, it is a fixed-size array,
> it doesn't shrink or grow, so we don't need RCU for it. This is also why
> a stale tun->numqueues is fine, as long as it never goes out-of-bound.


We do kind of shrinking or growing through tun->numqueues. That's why we 
check against it in various places. But, of course this is buggy.


>
>
>>> You need to rethink about my SOCK_RCU_FREE patch.
>> The code is wrote before SOCK_RCU_FREE is introduced and assume no
>> de-reference from device after synchronize_net(). It doesn't harm to
>> figure out the root cause which may give us more confidence to the fix
>> (e.g like SOCK_RCU_FREE).
> I believe SOCK_RCU_FREE is the fix for the root cause, not just a
> cover-up.
>
>
>> I don't object to fix with SOCK_RCU_FREE, but then we should remove
>> the redundant synchronize_net(). But I still prefer to synchronize
>> everything explicitly like (completely untested):
> I agree that synchronize_net() can be removed. However I don't
> understand your untested patch at all, it looks like to fix a completely
> different problem rather than this use-after-free.


As has been mentioned, the problem of current code is that we still 
leave pointers  to freed tfile in tfiles[] array in __tun_detach() and 
the check with tun->numqueues seems racy. So the patch just NULL out the 
detached tfile pointers and make sure no it can not be dereferenced from 
tfile after synchronize_net() by dereferencing tfile instead of checking 
tun->numqueues .


Thanks

>
> Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2019-05-05  9:09 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-22  3:57 BUG: KASAN: use-after-free Read in tun_net_xmit YueHaibing
2019-04-22  9:41 ` Jason Wang
2019-04-22 12:37   ` YueHaibing
2019-04-23  6:00   ` Cong Wang
2019-04-23  6:42     ` Jason Wang
2019-04-23 16:41       ` Cong Wang
2019-04-24  9:11         ` Jason Wang
2019-04-24 12:25           ` YueHaibing
2019-04-25 16:04             ` YueHaibing
2019-04-26  7:37               ` Jason Wang
2019-04-28  3:05                 ` [PATCH] tun: Fix use-after-free " Yue Haibing
2019-04-28  3:24                   ` Jason Wang
2019-04-28  4:26                     ` Jason Wang
2019-04-28  5:40                       ` weiyongjun (A)
2019-04-28  7:51                         ` Jason Wang
2019-04-28 17:59                           ` Cong Wang
2019-04-29  2:23                             ` Jason Wang
2019-04-29  3:53                               ` weiyongjun (A)
2019-04-29 13:07                               ` YueHaibing
2019-04-29 16:38                               ` Cong Wang
2019-04-30  2:44                                 ` YueHaibing
2019-04-30 23:33                                   ` Cong Wang
2019-05-05  9:09                                 ` Jason Wang
2019-04-28 17:49                   ` Cong Wang
2019-04-29 14:55                   ` Michael S. Tsirkin
2019-04-29 16:58                     ` Cong Wang
2019-04-30  5:11                       ` weiyongjun (A)
2019-04-30 23:42                         ` Cong Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.