linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: KASAN: use-after-free Read in __do_page_fault
       [not found] <94eb2c0433c8f42cac055cc86991@google.com>
@ 2017-10-30 19:15 ` Dmitry Vyukov
  2017-10-31 12:00   ` Vlastimil Babka
  0 siblings, 1 reply; 16+ messages in thread
From: Dmitry Vyukov @ 2017-10-30 19:15 UTC (permalink / raw)
  To: syzbot
  Cc: JBeulich, H. Peter Anvin, Josh Poimboeuf, Kirill A. Shutemov,
	ldufour, LKML, Andy Lutomirski, Ingo Molnar, syzkaller-bugs,
	Thomas Gleixner, the arch/x86 maintainers, Andrew Morton,
	Michal Hocko, Hugh Dickins, David Rientjes, linux-mm

On Mon, Oct 30, 2017 at 10:12 PM, syzbot
<bot+6a5269ce759a7bb12754ed9622076dc93f65a1f6@syzkaller.appspotmail.com>
wrote:
> Hello,
>
> syzkaller hit the following crash on
> 887c8ba753fbe809ba93fa3cfd0cc46db18d37d4
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
>
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
>
>
> BUG: KASAN: use-after-free in arch_local_irq_enable
> arch/x86/include/asm/paravirt.h:787 [inline]
> BUG: KASAN: use-after-free in __do_page_fault+0xc03/0xd60
> arch/x86/mm/fault.c:1357
> Read of size 8 at addr ffff8801cbfd3090 by task syz-executor7/3660
>
> CPU: 1 PID: 3660 Comm: syz-executor7 Not tainted 4.14.0-rc3+ #23
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
>  arch_local_irq_enable arch/x86/include/asm/paravirt.h:787 [inline]
>  __do_page_fault+0xc03/0xd60 arch/x86/mm/fault.c:1357
>  do_page_fault+0xee/0x720 arch/x86/mm/fault.c:1520
>  page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1066
> RIP: 0023:0x8073f4f
> RSP: 002b:00000000f7f89bd0 EFLAGS: 00010202
> RAX: 00000000f7f89c8c RBX: 0000000000000400 RCX: 000000000000000e
> RDX: 00000000f7f8aa88 RSI: 0000000020012fe0 RDI: 00000000f7f89c8c
> RBP: 0000000008128000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>
> Allocated by task 3660:
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
>  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3561
>  kmem_cache_zalloc include/linux/slab.h:656 [inline]
>  mmap_region+0x7ee/0x15a0 mm/mmap.c:1658
>  do_mmap+0x6a1/0xd50 mm/mmap.c:1468
>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
>  do_syscall_32_irqs_on arch/x86/entry/common.c:329 [inline]
>  do_fast_syscall_32+0x3f2/0xf05 arch/x86/entry/common.c:391
>  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:124
>
> Freed by task 3667:
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
>  __cache_free mm/slab.c:3503 [inline]
>  kmem_cache_free+0x77/0x280 mm/slab.c:3763
>  remove_vma+0x162/0x1b0 mm/mmap.c:176
>  remove_vma_list mm/mmap.c:2475 [inline]
>  do_munmap+0x82a/0xdf0 mm/mmap.c:2714
>  mmap_region+0x59e/0x15a0 mm/mmap.c:1631
>  do_mmap+0x6a1/0xd50 mm/mmap.c:1468
>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
>  do_syscall_32_irqs_on arch/x86/entry/common.c:329 [inline]
>  do_fast_syscall_32+0x3f2/0xf05 arch/x86/entry/common.c:391
>  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:124
>
> The buggy address belongs to the object at ffff8801cbfd3040
>  which belongs to the cache vm_area_struct of size 200
> The buggy address is located 80 bytes inside of
>  200-byte region [ffff8801cbfd3040, ffff8801cbfd3108)
> The buggy address belongs to the page:
> page:ffffea00072ff4c0 count:1 mapcount:0 mapping:ffff8801cbfd3040 index:0x0
> flags: 0x200000000000100(slab)
> raw: 0200000000000100 ffff8801cbfd3040 0000000000000000 000000010000000f
> raw: ffffea000730c7a0 ffffea00072ff7a0 ffff8801dae069c0 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>  ffff8801cbfd2f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  ffff8801cbfd3000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>>
>> ffff8801cbfd3080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>
>                          ^
>  ffff8801cbfd3100: fb fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb
>  ffff8801cbfd3180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================


I guess this is more related to mm rather than x86, so +mm maintainers.
This continues to happen, in particular on upstream
781402340475144bb360e32bb7437fa4b84cadc3 (Oct 28).


> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report.
> Once a fix for this bug is committed, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
>
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/94eb2c0433c8f42cac055cc86991%40google.com.
> For more options, visit https://groups.google.com/d/optout.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-10-30 19:15 ` KASAN: use-after-free Read in __do_page_fault Dmitry Vyukov
@ 2017-10-31 12:00   ` Vlastimil Babka
  2017-10-31 12:42     ` Dmitry Vyukov
  0 siblings, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2017-10-31 12:00 UTC (permalink / raw)
  To: Dmitry Vyukov, syzbot
  Cc: JBeulich, H. Peter Anvin, Josh Poimboeuf, Kirill A. Shutemov,
	ldufour, LKML, Andy Lutomirski, Ingo Molnar, syzkaller-bugs,
	Thomas Gleixner, the arch/x86 maintainers, Andrew Morton,
	Michal Hocko, Hugh Dickins, David Rientjes, linux-mm

On 10/30/2017 08:15 PM, Dmitry Vyukov wrote:
> On Mon, Oct 30, 2017 at 10:12 PM, syzbot
> <bot+6a5269ce759a7bb12754ed9622076dc93f65a1f6@syzkaller.appspotmail.com>
> wrote:
>> Hello,
>>
>> syzkaller hit the following crash on
>> 887c8ba753fbe809ba93fa3cfd0cc46db18d37d4
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>>
>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>> for information about syzkaller reproducers
>>
>>
>> BUG: KASAN: use-after-free in arch_local_irq_enable
>> arch/x86/include/asm/paravirt.h:787 [inline]
>> BUG: KASAN: use-after-free in __do_page_fault+0xc03/0xd60
>> arch/x86/mm/fault.c:1357
>> Read of size 8 at addr ffff8801cbfd3090 by task syz-executor7/3660

Why would local_irq_enable() touch a vma object? Is the stack unwinder
confused or what?
arch/x86/mm/fault.c:1357 means the "else" path of if (user_mode(regs)),
but the page fault's RIP is userspace? Strange.

>> CPU: 1 PID: 3660 Comm: syz-executor7 Not tainted 4.14.0-rc3+ #23
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:16 [inline]
>>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>>  kasan_report_error mm/kasan/report.c:351 [inline]
>>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
>>  arch_local_irq_enable arch/x86/include/asm/paravirt.h:787 [inline]
>>  __do_page_fault+0xc03/0xd60 arch/x86/mm/fault.c:1357
>>  do_page_fault+0xee/0x720 arch/x86/mm/fault.c:1520
>>  page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1066
>> RIP: 0023:0x8073f4f
>> RSP: 002b:00000000f7f89bd0 EFLAGS: 00010202
>> RAX: 00000000f7f89c8c RBX: 0000000000000400 RCX: 000000000000000e
>> RDX: 00000000f7f8aa88 RSI: 0000000020012fe0 RDI: 00000000f7f89c8c
>> RBP: 0000000008128000 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000000000
>> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>
>> Allocated by task 3660:
>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>>  set_track mm/kasan/kasan.c:459 [inline]
>>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
>>  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3561
>>  kmem_cache_zalloc include/linux/slab.h:656 [inline]
>>  mmap_region+0x7ee/0x15a0 mm/mmap.c:1658
>>  do_mmap+0x6a1/0xd50 mm/mmap.c:1468
>>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
>>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
>>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
>>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
>>  do_syscall_32_irqs_on arch/x86/entry/common.c:329 [inline]
>>  do_fast_syscall_32+0x3f2/0xf05 arch/x86/entry/common.c:391
>>  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:124
>>
>> Freed by task 3667:
>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>>  set_track mm/kasan/kasan.c:459 [inline]
>>  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
>>  __cache_free mm/slab.c:3503 [inline]
>>  kmem_cache_free+0x77/0x280 mm/slab.c:3763
>>  remove_vma+0x162/0x1b0 mm/mmap.c:176
>>  remove_vma_list mm/mmap.c:2475 [inline]
>>  do_munmap+0x82a/0xdf0 mm/mmap.c:2714
>>  mmap_region+0x59e/0x15a0 mm/mmap.c:1631
>>  do_mmap+0x6a1/0xd50 mm/mmap.c:1468
>>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
>>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
>>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
>>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
>>  do_syscall_32_irqs_on arch/x86/entry/common.c:329 [inline]
>>  do_fast_syscall_32+0x3f2/0xf05 arch/x86/entry/common.c:391
>>  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:124

This would mean that mmap_sem is not doing its job and we raced with a
vma removal. Or the rbtree is broken and contains a vma that has been
freed. Hmm, or the vmacache is broken? You could try removing the 3
lines starting with vmacache_find() in find_vma().

>> The buggy address belongs to the object at ffff8801cbfd3040
>>  which belongs to the cache vm_area_struct of size 200
>> The buggy address is located 80 bytes inside of
>>  200-byte region [ffff8801cbfd3040, ffff8801cbfd3108)

My vm_area_struct is 192 bytes, could be your layout is different due to
.config. At offset 80 I have vma->vm_flags. That is checked by
__do_page_fault(), but only after vma->vm_start (offset 0). Of course,
reordering is possible.

>> The buggy address belongs to the page:
>> page:ffffea00072ff4c0 count:1 mapcount:0 mapping:ffff8801cbfd3040 index:0x0
>> flags: 0x200000000000100(slab)
>> raw: 0200000000000100 ffff8801cbfd3040 0000000000000000 000000010000000f
>> raw: ffffea000730c7a0 ffffea00072ff7a0 ffff8801dae069c0 0000000000000000
>> page dumped because: kasan: bad access detected
>>
>> Memory state around the buggy address:
>>  ffff8801cbfd2f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>  ffff8801cbfd3000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>>>
>>> ffff8801cbfd3080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>
>>                          ^
>>  ffff8801cbfd3100: fb fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb
>>  ffff8801cbfd3180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ==================================================================
> 
> 
> I guess this is more related to mm rather than x86, so +mm maintainers.
> This continues to happen, in particular on upstream
> 781402340475144bb360e32bb7437fa4b84cadc3 (Oct 28).
> 
> 
>> ---
>> This bug is generated by a dumb bot. It may contain errors.
>> See https://goo.gl/tpsmEJ for details.
>> Direct all questions to syzkaller@googlegroups.com.
>>
>> syzbot will keep track of this bug report.
>> Once a fix for this bug is committed, please reply to this email with:
>> #syz fix: exact-commit-title
>> To mark this as a duplicate of another syzbot report, please reply with:
>> #syz dup: exact-subject-of-another-report
>> If it's a one-off invalid bug report, please reply with:
>> #syz invalid
>> Note: if the crash happens again, it will cause creation of a new bug
>> report.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "syzkaller-bugs" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to syzkaller-bugs+unsubscribe@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/syzkaller-bugs/94eb2c0433c8f42cac055cc86991%40google.com.
>> For more options, visit https://groups.google.com/d/optout.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-10-31 12:00   ` Vlastimil Babka
@ 2017-10-31 12:42     ` Dmitry Vyukov
  2017-10-31 13:20       ` Vlastimil Babka
  0 siblings, 1 reply; 16+ messages in thread
From: Dmitry Vyukov @ 2017-10-31 12:42 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: syzbot, Jan Beulich, H. Peter Anvin, Josh Poimboeuf,
	Kirill A. Shutemov, ldufour, LKML, Andy Lutomirski, Ingo Molnar,
	syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers,
	Andrew Morton, Michal Hocko, Hugh Dickins, David Rientjes,
	linux-mm

On Tue, Oct 31, 2017 at 3:00 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
> On 10/30/2017 08:15 PM, Dmitry Vyukov wrote:
>> On Mon, Oct 30, 2017 at 10:12 PM, syzbot
>> <bot+6a5269ce759a7bb12754ed9622076dc93f65a1f6@syzkaller.appspotmail.com>
>> wrote:
>>> Hello,
>>>
>>> syzkaller hit the following crash on
>>> 887c8ba753fbe809ba93fa3cfd0cc46db18d37d4
>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>>> compiler: gcc (GCC) 7.1.1 20170620
>>> .config is attached
>>> Raw console output is attached.
>>>
>>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>>> for information about syzkaller reproducers
>>>
>>>
>>> BUG: KASAN: use-after-free in arch_local_irq_enable
>>> arch/x86/include/asm/paravirt.h:787 [inline]
>>> BUG: KASAN: use-after-free in __do_page_fault+0xc03/0xd60
>>> arch/x86/mm/fault.c:1357
>>> Read of size 8 at addr ffff8801cbfd3090 by task syz-executor7/3660
>
> Why would local_irq_enable() touch a vma object? Is the stack unwinder
> confused or what?
> arch/x86/mm/fault.c:1357 means the "else" path of if (user_mode(regs)),
> but the page fault's RIP is userspace? Strange.
>
>>> CPU: 1 PID: 3660 Comm: syz-executor7 Not tainted 4.14.0-rc3+ #23
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>>> Google 01/01/2011
>>> Call Trace:
>>>  __dump_stack lib/dump_stack.c:16 [inline]
>>>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>>>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>>>  kasan_report_error mm/kasan/report.c:351 [inline]
>>>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>>>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
>>>  arch_local_irq_enable arch/x86/include/asm/paravirt.h:787 [inline]
>>>  __do_page_fault+0xc03/0xd60 arch/x86/mm/fault.c:1357
>>>  do_page_fault+0xee/0x720 arch/x86/mm/fault.c:1520
>>>  page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1066
>>> RIP: 0023:0x8073f4f
>>> RSP: 002b:00000000f7f89bd0 EFLAGS: 00010202
>>> RAX: 00000000f7f89c8c RBX: 0000000000000400 RCX: 000000000000000e
>>> RDX: 00000000f7f8aa88 RSI: 0000000020012fe0 RDI: 00000000f7f89c8c
>>> RBP: 0000000008128000 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000000000
>>> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>>
>>> Allocated by task 3660:
>>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>>>  set_track mm/kasan/kasan.c:459 [inline]
>>>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>>>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
>>>  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3561
>>>  kmem_cache_zalloc include/linux/slab.h:656 [inline]
>>>  mmap_region+0x7ee/0x15a0 mm/mmap.c:1658
>>>  do_mmap+0x6a1/0xd50 mm/mmap.c:1468
>>>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
>>>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
>>>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
>>>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
>>>  do_syscall_32_irqs_on arch/x86/entry/common.c:329 [inline]
>>>  do_fast_syscall_32+0x3f2/0xf05 arch/x86/entry/common.c:391
>>>  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:124
>>>
>>> Freed by task 3667:
>>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>>>  set_track mm/kasan/kasan.c:459 [inline]
>>>  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
>>>  __cache_free mm/slab.c:3503 [inline]
>>>  kmem_cache_free+0x77/0x280 mm/slab.c:3763
>>>  remove_vma+0x162/0x1b0 mm/mmap.c:176
>>>  remove_vma_list mm/mmap.c:2475 [inline]
>>>  do_munmap+0x82a/0xdf0 mm/mmap.c:2714
>>>  mmap_region+0x59e/0x15a0 mm/mmap.c:1631
>>>  do_mmap+0x6a1/0xd50 mm/mmap.c:1468
>>>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
>>>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
>>>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
>>>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
>>>  do_syscall_32_irqs_on arch/x86/entry/common.c:329 [inline]
>>>  do_fast_syscall_32+0x3f2/0xf05 arch/x86/entry/common.c:391
>>>  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:124
>
> This would mean that mmap_sem is not doing its job and we raced with a
> vma removal. Or the rbtree is broken and contains a vma that has been
> freed. Hmm, or the vmacache is broken? You could try removing the 3
> lines starting with vmacache_find() in find_vma().
>
>>> The buggy address belongs to the object at ffff8801cbfd3040
>>>  which belongs to the cache vm_area_struct of size 200
>>> The buggy address is located 80 bytes inside of
>>>  200-byte region [ffff8801cbfd3040, ffff8801cbfd3108)
>
> My vm_area_struct is 192 bytes, could be your layout is different due to
> .config. At offset 80 I have vma->vm_flags. That is checked by
> __do_page_fault(), but only after vma->vm_start (offset 0). Of course,
> reordering is possible.


It seems that compiler over-optimizes things and messes debug info.
I just re-reproduced this on upstream
15f859ae5c43c7f0a064ed92d33f7a5bc5de6de0 and got the same report:

==================================================================
BUG: KASAN: use-after-free in arch_local_irq_enable
arch/x86/include/asm/paravirt.h:787 [inline]
BUG: KASAN: use-after-free in __do_page_fault+0xc03/0xd60
arch/x86/mm/fault.c:1357
Read of size 8 at addr ffff880064d19aa0 by task syz-executor/8001

CPU: 0 PID: 8001 Comm: syz-executor Not tainted 4.14.0-rc6+ #12
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 print_address_description+0x73/0x250 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x25b/0x340 mm/kasan/report.c:409
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
 arch_local_irq_enable arch/x86/include/asm/paravirt.h:787 [inline]
 __do_page_fault+0xc03/0xd60 arch/x86/mm/fault.c:1357
 do_page_fault+0xee/0x720 arch/x86/mm/fault.c:1520
 do_async_page_fault+0x82/0x110 arch/x86/kernel/kvm.c:273
 async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
RIP: 0033:0x441bd0
RSP: 002b:00007f2ed8229798 EFLAGS: 00010202
RAX: 00007f2ed82297c0 RBX: 0000000000000000 RCX: 000000000000000e
RDX: 0000000000000400 RSI: 0000000020012fe0 RDI: 00007f2ed82297c0
RBP: 0000000000748020 R08: 0000000000000400 R09: 0000000000000000
R10: 0000000020012fee R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000008430 R14: 00000000006ec4d0 R15: 00007f2ed822a700

Allocated by task 8001:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3561
 kmem_cache_zalloc include/linux/slab.h:656 [inline]
 mmap_region+0x7ee/0x15a0 mm/mmap.c:1658
 do_mmap+0x69b/0xd40 mm/mmap.c:1468
 do_mmap_pgoff include/linux/mm.h:2150 [inline]
 vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
 SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
 SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
 SYSC_mmap arch/x86/kernel/sys_x86_64.c:99 [inline]
 SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:90
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 8007:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
 __cache_free mm/slab.c:3503 [inline]
 kmem_cache_free+0x77/0x280 mm/slab.c:3763
 remove_vma+0x162/0x1b0 mm/mmap.c:176
 remove_vma_list mm/mmap.c:2475 [inline]
 do_munmap+0x82a/0xdf0 mm/mmap.c:2714
 mmap_region+0x59e/0x15a0 mm/mmap.c:1631
 do_mmap+0x69b/0xd40 mm/mmap.c:1468
 do_mmap_pgoff include/linux/mm.h:2150 [inline]
 vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
 SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
 SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
 SYSC_mmap arch/x86/kernel/sys_x86_64.c:99 [inline]
 SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:90
 entry_SYSCALL_64_fastpath+0x1f/0xbe

The buggy address belongs to the object at ffff880064d19a50
 which belongs to the cache vm_area_struct of size 200
The buggy address is located 80 bytes inside of
 200-byte region [ffff880064d19a50, ffff880064d19b18)
The buggy address belongs to the page:
page:ffffea0001934640 count:1 mapcount:0 mapping:ffff880064d19000 index:0x0
flags: 0x100000000000100(slab)
raw: 0100000000000100 ffff880064d19000 0000000000000000 000000010000000f
raw: ffffea00018a3a60 ffffea0001940be0 ffff88006c5f79c0 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880064d19980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff880064d19a00: fb fb fc fc fc fc fc fc fc fc fb fb fb fb fb fb
>ffff880064d19a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                               ^
 ffff880064d19b00: fb fb fb fc fc fc fc fc fc fc fc fb fb fb fb fb
 ffff880064d19b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


Here is disasm of the function:
https://gist.githubusercontent.com/dvyukov/5a56c66ce605168c951a321d94df6e3a/raw/538d4ce72ceb5631dfcc866ccde46c74543de1cf/gistfile1.txt

Seems to be vma->vm_flags at offset 80.

I think the size of 200 reported by slab is OK as it can do some rounding.
Everything points to a vma object.


>>> The buggy address belongs to the page:
>>> page:ffffea00072ff4c0 count:1 mapcount:0 mapping:ffff8801cbfd3040 index:0x0
>>> flags: 0x200000000000100(slab)
>>> raw: 0200000000000100 ffff8801cbfd3040 0000000000000000 000000010000000f
>>> raw: ffffea000730c7a0 ffffea00072ff7a0 ffff8801dae069c0 0000000000000000
>>> page dumped because: kasan: bad access detected
>>>
>>> Memory state around the buggy address:
>>>  ffff8801cbfd2f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>>  ffff8801cbfd3000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>>>>
>>>> ffff8801cbfd3080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>
>>>                          ^
>>>  ffff8801cbfd3100: fb fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb
>>>  ffff8801cbfd3180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>> ==================================================================
>>
>>
>> I guess this is more related to mm rather than x86, so +mm maintainers.
>> This continues to happen, in particular on upstream
>> 781402340475144bb360e32bb7437fa4b84cadc3 (Oct 28).
>>
>>
>>> ---
>>> This bug is generated by a dumb bot. It may contain errors.
>>> See https://goo.gl/tpsmEJ for details.
>>> Direct all questions to syzkaller@googlegroups.com.
>>>
>>> syzbot will keep track of this bug report.
>>> Once a fix for this bug is committed, please reply to this email with:
>>> #syz fix: exact-commit-title
>>> To mark this as a duplicate of another syzbot report, please reply with:
>>> #syz dup: exact-subject-of-another-report
>>> If it's a one-off invalid bug report, please reply with:
>>> #syz invalid
>>> Note: if the crash happens again, it will cause creation of a new bug
>>> report.
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "syzkaller-bugs" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to syzkaller-bugs+unsubscribe@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/syzkaller-bugs/94eb2c0433c8f42cac055cc86991%40google.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-10-31 12:42     ` Dmitry Vyukov
@ 2017-10-31 13:20       ` Vlastimil Babka
  2017-10-31 13:57         ` Vlastimil Babka
  0 siblings, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2017-10-31 13:20 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot, Jan Beulich, H. Peter Anvin, Josh Poimboeuf,
	Kirill A. Shutemov, ldufour, LKML, Andy Lutomirski, Ingo Molnar,
	syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers,
	Andrew Morton, Michal Hocko, Hugh Dickins, David Rientjes,
	linux-mm

On 10/31/2017 01:42 PM, Dmitry Vyukov wrote:
>> My vm_area_struct is 192 bytes, could be your layout is different due to
>> .config. At offset 80 I have vma->vm_flags. That is checked by
>> __do_page_fault(), but only after vma->vm_start (offset 0). Of course,
>> reordering is possible.
> 
> 
> It seems that compiler over-optimizes things and messes debug info.
> I just re-reproduced this on upstream
> 15f859ae5c43c7f0a064ed92d33f7a5bc5de6de0 and got the same report:
> 
> ==================================================================
> BUG: KASAN: use-after-free in arch_local_irq_enable
> arch/x86/include/asm/paravirt.h:787 [inline]
> BUG: KASAN: use-after-free in __do_page_fault+0xc03/0xd60
> arch/x86/mm/fault.c:1357
> Read of size 8 at addr ffff880064d19aa0 by task syz-executor/8001
> 
> CPU: 0 PID: 8001 Comm: syz-executor Not tainted 4.14.0-rc6+ #12
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
>  arch_local_irq_enable arch/x86/include/asm/paravirt.h:787 [inline]
>  __do_page_fault+0xc03/0xd60 arch/x86/mm/fault.c:1357
>  do_page_fault+0xee/0x720 arch/x86/mm/fault.c:1520
>  do_async_page_fault+0x82/0x110 arch/x86/kernel/kvm.c:273
>  async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
> RIP: 0033:0x441bd0
> RSP: 002b:00007f2ed8229798 EFLAGS: 00010202
> RAX: 00007f2ed82297c0 RBX: 0000000000000000 RCX: 000000000000000e
> RDX: 0000000000000400 RSI: 0000000020012fe0 RDI: 00007f2ed82297c0
> RBP: 0000000000748020 R08: 0000000000000400 R09: 0000000000000000
> R10: 0000000020012fee R11: 0000000000000246 R12: 00000000ffffffff
> R13: 0000000000008430 R14: 00000000006ec4d0 R15: 00007f2ed822a700
> 
> Allocated by task 8001:
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
>  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3561
>  kmem_cache_zalloc include/linux/slab.h:656 [inline]
>  mmap_region+0x7ee/0x15a0 mm/mmap.c:1658
>  do_mmap+0x69b/0xd40 mm/mmap.c:1468
>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
>  SYSC_mmap arch/x86/kernel/sys_x86_64.c:99 [inline]
>  SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:90
>  entry_SYSCALL_64_fastpath+0x1f/0xbe
> 
> Freed by task 8007:
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
>  __cache_free mm/slab.c:3503 [inline]
>  kmem_cache_free+0x77/0x280 mm/slab.c:3763
>  remove_vma+0x162/0x1b0 mm/mmap.c:176
>  remove_vma_list mm/mmap.c:2475 [inline]
>  do_munmap+0x82a/0xdf0 mm/mmap.c:2714
>  mmap_region+0x59e/0x15a0 mm/mmap.c:1631
>  do_mmap+0x69b/0xd40 mm/mmap.c:1468
>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
>  SYSC_mmap arch/x86/kernel/sys_x86_64.c:99 [inline]
>  SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:90
>  entry_SYSCALL_64_fastpath+0x1f/0xbe
> 
> The buggy address belongs to the object at ffff880064d19a50
>  which belongs to the cache vm_area_struct of size 200
> The buggy address is located 80 bytes inside of
>  200-byte region [ffff880064d19a50, ffff880064d19b18)
> The buggy address belongs to the page:
> page:ffffea0001934640 count:1 mapcount:0 mapping:ffff880064d19000 index:0x0
> flags: 0x100000000000100(slab)
> raw: 0100000000000100 ffff880064d19000 0000000000000000 000000010000000f
> raw: ffffea00018a3a60 ffffea0001940be0 ffff88006c5f79c0 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>  ffff880064d19980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff880064d19a00: fb fb fc fc fc fc fc fc fc fc fb fb fb fb fb fb
>> ffff880064d19a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                ^
>  ffff880064d19b00: fb fb fb fc fc fc fc fc fc fc fc fb fb fb fb fb
>  ffff880064d19b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================
> 
> 
> Here is disasm of the function:
> https://gist.githubusercontent.com/dvyukov/5a56c66ce605168c951a321d94df6e3a/raw/538d4ce72ceb5631dfcc866ccde46c74543de1cf/gistfile1.txt
> 
> Seems to be vma->vm_flags at offset 80.

You can see it from the disasm? I can't make much of it, unfortunately,
the added kasan calls obscure it a lot for me. But I suspect it might be
the vma_pkey() thing which reads from vma->vm_flags. What happens when
CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled? (or is it already?)

Also did you try the vmacache shortcut test suggested in my previous mail?

> I think the size of 200 reported by slab is OK as it can do some rounding.
> Everything points to a vma object.
> 
> 
>>>> The buggy address belongs to the page:
>>>> page:ffffea00072ff4c0 count:1 mapcount:0 mapping:ffff8801cbfd3040 index:0x0
>>>> flags: 0x200000000000100(slab)
>>>> raw: 0200000000000100 ffff8801cbfd3040 0000000000000000 000000010000000f
>>>> raw: ffffea000730c7a0 ffffea00072ff7a0 ffff8801dae069c0 0000000000000000
>>>> page dumped because: kasan: bad access detected
>>>>
>>>> Memory state around the buggy address:
>>>>  ffff8801cbfd2f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>>>  ffff8801cbfd3000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>>>>>
>>>>> ffff8801cbfd3080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>
>>>>                          ^
>>>>  ffff8801cbfd3100: fb fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb
>>>>  ffff8801cbfd3180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>> ==================================================================
>>>
>>>
>>> I guess this is more related to mm rather than x86, so +mm maintainers.
>>> This continues to happen, in particular on upstream
>>> 781402340475144bb360e32bb7437fa4b84cadc3 (Oct 28).
>>>
>>>
>>>> ---
>>>> This bug is generated by a dumb bot. It may contain errors.
>>>> See https://goo.gl/tpsmEJ for details.
>>>> Direct all questions to syzkaller@googlegroups.com.
>>>>
>>>> syzbot will keep track of this bug report.
>>>> Once a fix for this bug is committed, please reply to this email with:
>>>> #syz fix: exact-commit-title
>>>> To mark this as a duplicate of another syzbot report, please reply with:
>>>> #syz dup: exact-subject-of-another-report
>>>> If it's a one-off invalid bug report, please reply with:
>>>> #syz invalid
>>>> Note: if the crash happens again, it will cause creation of a new bug
>>>> report.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups
>>>> "syzkaller-bugs" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>> email to syzkaller-bugs+unsubscribe@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/syzkaller-bugs/94eb2c0433c8f42cac055cc86991%40google.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>> --
>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>> see: http://www.linux-mm.org/ .
>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>
>>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-10-31 13:20       ` Vlastimil Babka
@ 2017-10-31 13:57         ` Vlastimil Babka
  2017-10-31 14:11           ` Kirill A. Shutemov
                             ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Vlastimil Babka @ 2017-10-31 13:57 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot, Jan Beulich, H. Peter Anvin, Josh Poimboeuf,
	Kirill A. Shutemov, ldufour, LKML, Andy Lutomirski, Ingo Molnar,
	syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers,
	Andrew Morton, Michal Hocko, Hugh Dickins, David Rientjes,
	linux-mm, Andrea Arcangeli, Linus Torvalds, Thorsten Leemhuis

+CC Andrea, Thorsten, Linus

On 10/31/2017 02:20 PM, Vlastimil Babka wrote:
> On 10/31/2017 01:42 PM, Dmitry Vyukov wrote:
>>> My vm_area_struct is 192 bytes, could be your layout is different due to
>>> .config. At offset 80 I have vma->vm_flags. That is checked by
>>> __do_page_fault(), but only after vma->vm_start (offset 0). Of course,
>>> reordering is possible.
>>
>>
>> It seems that compiler over-optimizes things and messes debug info.
>> I just re-reproduced this on upstream
>> 15f859ae5c43c7f0a064ed92d33f7a5bc5de6de0 and got the same report:
>>
>> ==================================================================
>> BUG: KASAN: use-after-free in arch_local_irq_enable
>> arch/x86/include/asm/paravirt.h:787 [inline]
>> BUG: KASAN: use-after-free in __do_page_fault+0xc03/0xd60
>> arch/x86/mm/fault.c:1357
>> Read of size 8 at addr ffff880064d19aa0 by task syz-executor/8001
>>
>> CPU: 0 PID: 8001 Comm: syz-executor Not tainted 4.14.0-rc6+ #12
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:16 [inline]
>>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>>  kasan_report_error mm/kasan/report.c:351 [inline]
>>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
>>  arch_local_irq_enable arch/x86/include/asm/paravirt.h:787 [inline]
>>  __do_page_fault+0xc03/0xd60 arch/x86/mm/fault.c:1357
>>  do_page_fault+0xee/0x720 arch/x86/mm/fault.c:1520
>>  do_async_page_fault+0x82/0x110 arch/x86/kernel/kvm.c:273
>>  async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
>> RIP: 0033:0x441bd0
>> RSP: 002b:00007f2ed8229798 EFLAGS: 00010202
>> RAX: 00007f2ed82297c0 RBX: 0000000000000000 RCX: 000000000000000e
>> RDX: 0000000000000400 RSI: 0000000020012fe0 RDI: 00007f2ed82297c0
>> RBP: 0000000000748020 R08: 0000000000000400 R09: 0000000000000000
>> R10: 0000000020012fee R11: 0000000000000246 R12: 00000000ffffffff
>> R13: 0000000000008430 R14: 00000000006ec4d0 R15: 00007f2ed822a700
>>
>> Allocated by task 8001:
>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>>  set_track mm/kasan/kasan.c:459 [inline]
>>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
>>  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3561
>>  kmem_cache_zalloc include/linux/slab.h:656 [inline]
>>  mmap_region+0x7ee/0x15a0 mm/mmap.c:1658
>>  do_mmap+0x69b/0xd40 mm/mmap.c:1468
>>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
>>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
>>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
>>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
>>  SYSC_mmap arch/x86/kernel/sys_x86_64.c:99 [inline]
>>  SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:90
>>  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>
>> Freed by task 8007:
>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>>  set_track mm/kasan/kasan.c:459 [inline]
>>  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
>>  __cache_free mm/slab.c:3503 [inline]
>>  kmem_cache_free+0x77/0x280 mm/slab.c:3763
>>  remove_vma+0x162/0x1b0 mm/mmap.c:176
>>  remove_vma_list mm/mmap.c:2475 [inline]
>>  do_munmap+0x82a/0xdf0 mm/mmap.c:2714
>>  mmap_region+0x59e/0x15a0 mm/mmap.c:1631
>>  do_mmap+0x69b/0xd40 mm/mmap.c:1468
>>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
>>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
>>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
>>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
>>  SYSC_mmap arch/x86/kernel/sys_x86_64.c:99 [inline]
>>  SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:90
>>  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>
>> The buggy address belongs to the object at ffff880064d19a50
>>  which belongs to the cache vm_area_struct of size 200
>> The buggy address is located 80 bytes inside of
>>  200-byte region [ffff880064d19a50, ffff880064d19b18)
>> The buggy address belongs to the page:
>> page:ffffea0001934640 count:1 mapcount:0 mapping:ffff880064d19000 index:0x0
>> flags: 0x100000000000100(slab)
>> raw: 0100000000000100 ffff880064d19000 0000000000000000 000000010000000f
>> raw: ffffea00018a3a60 ffffea0001940be0 ffff88006c5f79c0 0000000000000000
>> page dumped because: kasan: bad access detected
>>
>> Memory state around the buggy address:
>>  ffff880064d19980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>  ffff880064d19a00: fb fb fc fc fc fc fc fc fc fc fb fb fb fb fb fb
>>> ffff880064d19a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>                                ^
>>  ffff880064d19b00: fb fb fb fc fc fc fc fc fc fc fc fb fb fb fb fb
>>  ffff880064d19b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ==================================================================
>>
>>
>> Here is disasm of the function:
>> https://gist.githubusercontent.com/dvyukov/5a56c66ce605168c951a321d94df6e3a/raw/538d4ce72ceb5631dfcc866ccde46c74543de1cf/gistfile1.txt
>>
>> Seems to be vma->vm_flags at offset 80.
> 
> You can see it from the disasm? I can't make much of it, unfortunately,
> the added kasan calls obscure it a lot for me. But I suspect it might be
> the vma_pkey() thing which reads from vma->vm_flags. What happens when
> CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled? (or is it already?)

OK, so I opened the google groups link in the report's signature and
looked at the attached config there, which says protkeys are enabled.
Also looked at the repro.txt attachment:
#{Threaded:true Collide:true Repeat:true Procs:8 Sandbox:none Fault:false FaultCall:-1 FaultNth:0 EnableTun:true UseTmpDir:true HandleSegv:true WaitRepeat:true Debug:false Repro:false}
mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
mmap(&(0x7f0000011000/0x3000)=nil, 0x3000, 0x1, 0x32, 0xffffffffffffffff, 0x0)
r0 = userfaultfd(0x0)
ioctl$UFFDIO_API(r0, 0xc018aa3f, &(0x7f0000002000-0x18)={0xaa, 0x0, 0x0})
ioctl$UFFDIO_REGISTER(r0, 0xc020aa00, &(0x7f0000019000)={{&(0x7f0000012000/0x2000)=nil, 0x2000}, 0x1, 0x0})
r1 = gettid()
syz_open_dev$evdev(&(0x7f0000013000-0x12)="2f6465762f696e7075742f6576656e742300", 0x0, 0x0)
tkill(r1, 0x7)

The userfaultfd() caught my attention so I checked handle_userfault()
which seems to do up_read(&mm->mmap_sem); and in some cases later
followed by down_read(&mm->mmap_sem); return VM_FAULT_NOPAGE.
However, __do_page_fault() only expects that mmap_sem to be released
when handle_mm_fault() returns with VM_FAULT_RETRY. It doesn't expect it
to be released and then acquired again, because then vma can be indeed
gone. It seems vma hasn't been touched after that point until the
vma_pkey() was added by commit a3c4fb7c9c2e ("x86/mm: Fix fault error
path using unsafe vma pointer") in rc3. Which tried to fix a similar
problem, but run into this corner case?

So I suspect a3c4fb7c9c2e is the culprit and thus a regression.

> Also did you try the vmacache shortcut test suggested in my previous mail?
> 
>> I think the size of 200 reported by slab is OK as it can do some rounding.
>> Everything points to a vma object.
>>
>>
>>>>> The buggy address belongs to the page:
>>>>> page:ffffea00072ff4c0 count:1 mapcount:0 mapping:ffff8801cbfd3040 index:0x0
>>>>> flags: 0x200000000000100(slab)
>>>>> raw: 0200000000000100 ffff8801cbfd3040 0000000000000000 000000010000000f
>>>>> raw: ffffea000730c7a0 ffffea00072ff7a0 ffff8801dae069c0 0000000000000000
>>>>> page dumped because: kasan: bad access detected
>>>>>
>>>>> Memory state around the buggy address:
>>>>>  ffff8801cbfd2f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>>>>  ffff8801cbfd3000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>>>>>>
>>>>>> ffff8801cbfd3080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>>
>>>>>                          ^
>>>>>  ffff8801cbfd3100: fb fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb
>>>>>  ffff8801cbfd3180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>> ==================================================================
>>>>
>>>>
>>>> I guess this is more related to mm rather than x86, so +mm maintainers.
>>>> This continues to happen, in particular on upstream
>>>> 781402340475144bb360e32bb7437fa4b84cadc3 (Oct 28).
>>>>
>>>>
>>>>> ---
>>>>> This bug is generated by a dumb bot. It may contain errors.
>>>>> See https://goo.gl/tpsmEJ for details.
>>>>> Direct all questions to syzkaller@googlegroups.com.
>>>>>
>>>>> syzbot will keep track of this bug report.
>>>>> Once a fix for this bug is committed, please reply to this email with:
>>>>> #syz fix: exact-commit-title
>>>>> To mark this as a duplicate of another syzbot report, please reply with:
>>>>> #syz dup: exact-subject-of-another-report
>>>>> If it's a one-off invalid bug report, please reply with:
>>>>> #syz invalid
>>>>> Note: if the crash happens again, it will cause creation of a new bug
>>>>> report.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups
>>>>> "syzkaller-bugs" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>>> email to syzkaller-bugs+unsubscribe@googlegroups.com.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/syzkaller-bugs/94eb2c0433c8f42cac055cc86991%40google.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>> --
>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>> see: http://www.linux-mm.org/ .
>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>>
>>>
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-10-31 13:57         ` Vlastimil Babka
@ 2017-10-31 14:11           ` Kirill A. Shutemov
  2017-10-31 14:28             ` Vlastimil Babka
  2017-10-31 15:37           ` Linus Torvalds
  2017-11-02 10:00           ` Laurent Dufour
  2 siblings, 1 reply; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-10-31 14:11 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Dmitry Vyukov, syzbot, Jan Beulich, H. Peter Anvin,
	Josh Poimboeuf, Kirill A. Shutemov, ldufour, LKML,
	Andy Lutomirski, Ingo Molnar, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers, Andrew Morton, Michal Hocko,
	Hugh Dickins, David Rientjes, linux-mm, Andrea Arcangeli,
	Linus Torvalds, Thorsten Leemhuis

On Tue, Oct 31, 2017 at 02:57:58PM +0100, Vlastimil Babka wrote:
> +CC Andrea, Thorsten, Linus
> 
> On 10/31/2017 02:20 PM, Vlastimil Babka wrote:
> > On 10/31/2017 01:42 PM, Dmitry Vyukov wrote:
> >>> My vm_area_struct is 192 bytes, could be your layout is different due to
> >>> .config. At offset 80 I have vma->vm_flags. That is checked by
> >>> __do_page_fault(), but only after vma->vm_start (offset 0). Of course,
> >>> reordering is possible.
> >>
> >>
> >> It seems that compiler over-optimizes things and messes debug info.
> >> I just re-reproduced this on upstream
> >> 15f859ae5c43c7f0a064ed92d33f7a5bc5de6de0 and got the same report:
> >>
> >> ==================================================================
> >> BUG: KASAN: use-after-free in arch_local_irq_enable
> >> arch/x86/include/asm/paravirt.h:787 [inline]
> >> BUG: KASAN: use-after-free in __do_page_fault+0xc03/0xd60
> >> arch/x86/mm/fault.c:1357
> >> Read of size 8 at addr ffff880064d19aa0 by task syz-executor/8001
> >>
> >> CPU: 0 PID: 8001 Comm: syz-executor Not tainted 4.14.0-rc6+ #12
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >> Call Trace:
> >>  __dump_stack lib/dump_stack.c:16 [inline]
> >>  dump_stack+0x194/0x257 lib/dump_stack.c:52
> >>  print_address_description+0x73/0x250 mm/kasan/report.c:252
> >>  kasan_report_error mm/kasan/report.c:351 [inline]
> >>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
> >>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
> >>  arch_local_irq_enable arch/x86/include/asm/paravirt.h:787 [inline]
> >>  __do_page_fault+0xc03/0xd60 arch/x86/mm/fault.c:1357
> >>  do_page_fault+0xee/0x720 arch/x86/mm/fault.c:1520
> >>  do_async_page_fault+0x82/0x110 arch/x86/kernel/kvm.c:273
> >>  async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
> >> RIP: 0033:0x441bd0
> >> RSP: 002b:00007f2ed8229798 EFLAGS: 00010202
> >> RAX: 00007f2ed82297c0 RBX: 0000000000000000 RCX: 000000000000000e
> >> RDX: 0000000000000400 RSI: 0000000020012fe0 RDI: 00007f2ed82297c0
> >> RBP: 0000000000748020 R08: 0000000000000400 R09: 0000000000000000
> >> R10: 0000000020012fee R11: 0000000000000246 R12: 00000000ffffffff
> >> R13: 0000000000008430 R14: 00000000006ec4d0 R15: 00007f2ed822a700
> >>
> >> Allocated by task 8001:
> >>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
> >>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> >>  set_track mm/kasan/kasan.c:459 [inline]
> >>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
> >>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
> >>  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3561
> >>  kmem_cache_zalloc include/linux/slab.h:656 [inline]
> >>  mmap_region+0x7ee/0x15a0 mm/mmap.c:1658
> >>  do_mmap+0x69b/0xd40 mm/mmap.c:1468
> >>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
> >>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
> >>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
> >>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
> >>  SYSC_mmap arch/x86/kernel/sys_x86_64.c:99 [inline]
> >>  SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:90
> >>  entry_SYSCALL_64_fastpath+0x1f/0xbe
> >>
> >> Freed by task 8007:
> >>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
> >>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> >>  set_track mm/kasan/kasan.c:459 [inline]
> >>  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
> >>  __cache_free mm/slab.c:3503 [inline]
> >>  kmem_cache_free+0x77/0x280 mm/slab.c:3763
> >>  remove_vma+0x162/0x1b0 mm/mmap.c:176
> >>  remove_vma_list mm/mmap.c:2475 [inline]
> >>  do_munmap+0x82a/0xdf0 mm/mmap.c:2714
> >>  mmap_region+0x59e/0x15a0 mm/mmap.c:1631
> >>  do_mmap+0x69b/0xd40 mm/mmap.c:1468
> >>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
> >>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
> >>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
> >>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
> >>  SYSC_mmap arch/x86/kernel/sys_x86_64.c:99 [inline]
> >>  SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:90
> >>  entry_SYSCALL_64_fastpath+0x1f/0xbe
> >>
> >> The buggy address belongs to the object at ffff880064d19a50
> >>  which belongs to the cache vm_area_struct of size 200
> >> The buggy address is located 80 bytes inside of
> >>  200-byte region [ffff880064d19a50, ffff880064d19b18)
> >> The buggy address belongs to the page:
> >> page:ffffea0001934640 count:1 mapcount:0 mapping:ffff880064d19000 index:0x0
> >> flags: 0x100000000000100(slab)
> >> raw: 0100000000000100 ffff880064d19000 0000000000000000 000000010000000f
> >> raw: ffffea00018a3a60 ffffea0001940be0 ffff88006c5f79c0 0000000000000000
> >> page dumped because: kasan: bad access detected
> >>
> >> Memory state around the buggy address:
> >>  ffff880064d19980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>  ffff880064d19a00: fb fb fc fc fc fc fc fc fc fc fb fb fb fb fb fb
> >>> ffff880064d19a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >>                                ^
> >>  ffff880064d19b00: fb fb fb fc fc fc fc fc fc fc fc fb fb fb fb fb
> >>  ffff880064d19b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >> ==================================================================
> >>
> >>
> >> Here is disasm of the function:
> >> https://gist.githubusercontent.com/dvyukov/5a56c66ce605168c951a321d94df6e3a/raw/538d4ce72ceb5631dfcc866ccde46c74543de1cf/gistfile1.txt
> >>
> >> Seems to be vma->vm_flags at offset 80.
> > 
> > You can see it from the disasm? I can't make much of it, unfortunately,
> > the added kasan calls obscure it a lot for me. But I suspect it might be
> > the vma_pkey() thing which reads from vma->vm_flags. What happens when
> > CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled? (or is it already?)
> 
> OK, so I opened the google groups link in the report's signature and
> looked at the attached config there, which says protkeys are enabled.
> Also looked at the repro.txt attachment:
> #{Threaded:true Collide:true Repeat:true Procs:8 Sandbox:none Fault:false FaultCall:-1 FaultNth:0 EnableTun:true UseTmpDir:true HandleSegv:true WaitRepeat:true Debug:false Repro:false}
> mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
> mmap(&(0x7f0000011000/0x3000)=nil, 0x3000, 0x1, 0x32, 0xffffffffffffffff, 0x0)
> r0 = userfaultfd(0x0)
> ioctl$UFFDIO_API(r0, 0xc018aa3f, &(0x7f0000002000-0x18)={0xaa, 0x0, 0x0})
> ioctl$UFFDIO_REGISTER(r0, 0xc020aa00, &(0x7f0000019000)={{&(0x7f0000012000/0x2000)=nil, 0x2000}, 0x1, 0x0})
> r1 = gettid()
> syz_open_dev$evdev(&(0x7f0000013000-0x12)="2f6465762f696e7075742f6576656e742300", 0x0, 0x0)
> tkill(r1, 0x7)
> 
> The userfaultfd() caught my attention so I checked handle_userfault()
> which seems to do up_read(&mm->mmap_sem); and in some cases later
> followed by down_read(&mm->mmap_sem); return VM_FAULT_NOPAGE.
> However, __do_page_fault() only expects that mmap_sem to be released
> when handle_mm_fault() returns with VM_FAULT_RETRY. It doesn't expect it
> to be released and then acquired again, because then vma can be indeed
> gone. It seems vma hasn't been touched after that point until the
> vma_pkey() was added by commit a3c4fb7c9c2e ("x86/mm: Fix fault error
> path using unsafe vma pointer") in rc3. Which tried to fix a similar
> problem, but run into this corner case?
> 
> So I suspect a3c4fb7c9c2e is the culprit and thus a regression.

I wounder if we can move "pkey = vma_pkey(vma);" before handle_mm_fault()?
pkey can't change during page fault handing, can it?

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-10-31 14:11           ` Kirill A. Shutemov
@ 2017-10-31 14:28             ` Vlastimil Babka
  2017-10-31 19:15               ` Andrea Arcangeli
  0 siblings, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2017-10-31 14:28 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dmitry Vyukov, syzbot, Jan Beulich, H. Peter Anvin,
	Josh Poimboeuf, Kirill A. Shutemov, ldufour, LKML,
	Andy Lutomirski, Ingo Molnar, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers, Andrew Morton, Michal Hocko,
	Hugh Dickins, David Rientjes, linux-mm, Andrea Arcangeli,
	Linus Torvalds, Thorsten Leemhuis

On 10/31/2017 03:11 PM, Kirill A. Shutemov wrote:
> On Tue, Oct 31, 2017 at 02:57:58PM +0100, Vlastimil Babka wrote:
>> +CC Andrea, Thorsten, Linus
>>
>> On 10/31/2017 02:20 PM, Vlastimil Babka wrote:
>>> On 10/31/2017 01:42 PM, Dmitry Vyukov wrote:
>>>>> My vm_area_struct is 192 bytes, could be your layout is different due to
>>>>> .config. At offset 80 I have vma->vm_flags. That is checked by
>>>>> __do_page_fault(), but only after vma->vm_start (offset 0). Of course,
>>>>> reordering is possible.
>>>>
>>>>
>>>> It seems that compiler over-optimizes things and messes debug info.
>>>> I just re-reproduced this on upstream
>>>> 15f859ae5c43c7f0a064ed92d33f7a5bc5de6de0 and got the same report:
>>>>
>>>> ==================================================================
>>>> BUG: KASAN: use-after-free in arch_local_irq_enable
>>>> arch/x86/include/asm/paravirt.h:787 [inline]
>>>> BUG: KASAN: use-after-free in __do_page_fault+0xc03/0xd60
>>>> arch/x86/mm/fault.c:1357
>>>> Read of size 8 at addr ffff880064d19aa0 by task syz-executor/8001
>>>>
>>>> CPU: 0 PID: 8001 Comm: syz-executor Not tainted 4.14.0-rc6+ #12
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>>> Call Trace:
>>>>  __dump_stack lib/dump_stack.c:16 [inline]
>>>>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>>>>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>>>>  kasan_report_error mm/kasan/report.c:351 [inline]
>>>>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>>>>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
>>>>  arch_local_irq_enable arch/x86/include/asm/paravirt.h:787 [inline]
>>>>  __do_page_fault+0xc03/0xd60 arch/x86/mm/fault.c:1357
>>>>  do_page_fault+0xee/0x720 arch/x86/mm/fault.c:1520
>>>>  do_async_page_fault+0x82/0x110 arch/x86/kernel/kvm.c:273
>>>>  async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
>>>> RIP: 0033:0x441bd0
>>>> RSP: 002b:00007f2ed8229798 EFLAGS: 00010202
>>>> RAX: 00007f2ed82297c0 RBX: 0000000000000000 RCX: 000000000000000e
>>>> RDX: 0000000000000400 RSI: 0000000020012fe0 RDI: 00007f2ed82297c0
>>>> RBP: 0000000000748020 R08: 0000000000000400 R09: 0000000000000000
>>>> R10: 0000000020012fee R11: 0000000000000246 R12: 00000000ffffffff
>>>> R13: 0000000000008430 R14: 00000000006ec4d0 R15: 00007f2ed822a700
>>>>
>>>> Allocated by task 8001:
>>>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>>>>  set_track mm/kasan/kasan.c:459 [inline]
>>>>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>>>>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
>>>>  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3561
>>>>  kmem_cache_zalloc include/linux/slab.h:656 [inline]
>>>>  mmap_region+0x7ee/0x15a0 mm/mmap.c:1658
>>>>  do_mmap+0x69b/0xd40 mm/mmap.c:1468
>>>>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
>>>>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
>>>>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
>>>>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
>>>>  SYSC_mmap arch/x86/kernel/sys_x86_64.c:99 [inline]
>>>>  SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:90
>>>>  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>>>
>>>> Freed by task 8007:
>>>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>>>>  set_track mm/kasan/kasan.c:459 [inline]
>>>>  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
>>>>  __cache_free mm/slab.c:3503 [inline]
>>>>  kmem_cache_free+0x77/0x280 mm/slab.c:3763
>>>>  remove_vma+0x162/0x1b0 mm/mmap.c:176
>>>>  remove_vma_list mm/mmap.c:2475 [inline]
>>>>  do_munmap+0x82a/0xdf0 mm/mmap.c:2714
>>>>  mmap_region+0x59e/0x15a0 mm/mmap.c:1631
>>>>  do_mmap+0x69b/0xd40 mm/mmap.c:1468
>>>>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
>>>>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
>>>>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
>>>>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
>>>>  SYSC_mmap arch/x86/kernel/sys_x86_64.c:99 [inline]
>>>>  SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:90
>>>>  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>>>
>>>> The buggy address belongs to the object at ffff880064d19a50
>>>>  which belongs to the cache vm_area_struct of size 200
>>>> The buggy address is located 80 bytes inside of
>>>>  200-byte region [ffff880064d19a50, ffff880064d19b18)
>>>> The buggy address belongs to the page:
>>>> page:ffffea0001934640 count:1 mapcount:0 mapping:ffff880064d19000 index:0x0
>>>> flags: 0x100000000000100(slab)
>>>> raw: 0100000000000100 ffff880064d19000 0000000000000000 000000010000000f
>>>> raw: ffffea00018a3a60 ffffea0001940be0 ffff88006c5f79c0 0000000000000000
>>>> page dumped because: kasan: bad access detected
>>>>
>>>> Memory state around the buggy address:
>>>>  ffff880064d19980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>  ffff880064d19a00: fb fb fc fc fc fc fc fc fc fc fb fb fb fb fb fb
>>>>> ffff880064d19a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>                                ^
>>>>  ffff880064d19b00: fb fb fb fc fc fc fc fc fc fc fc fb fb fb fb fb
>>>>  ffff880064d19b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>> ==================================================================
>>>>
>>>>
>>>> Here is disasm of the function:
>>>> https://gist.githubusercontent.com/dvyukov/5a56c66ce605168c951a321d94df6e3a/raw/538d4ce72ceb5631dfcc866ccde46c74543de1cf/gistfile1.txt
>>>>
>>>> Seems to be vma->vm_flags at offset 80.
>>>
>>> You can see it from the disasm? I can't make much of it, unfortunately,
>>> the added kasan calls obscure it a lot for me. But I suspect it might be
>>> the vma_pkey() thing which reads from vma->vm_flags. What happens when
>>> CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled? (or is it already?)
>>
>> OK, so I opened the google groups link in the report's signature and
>> looked at the attached config there, which says protkeys are enabled.
>> Also looked at the repro.txt attachment:
>> #{Threaded:true Collide:true Repeat:true Procs:8 Sandbox:none Fault:false FaultCall:-1 FaultNth:0 EnableTun:true UseTmpDir:true HandleSegv:true WaitRepeat:true Debug:false Repro:false}
>> mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
>> mmap(&(0x7f0000011000/0x3000)=nil, 0x3000, 0x1, 0x32, 0xffffffffffffffff, 0x0)
>> r0 = userfaultfd(0x0)
>> ioctl$UFFDIO_API(r0, 0xc018aa3f, &(0x7f0000002000-0x18)={0xaa, 0x0, 0x0})
>> ioctl$UFFDIO_REGISTER(r0, 0xc020aa00, &(0x7f0000019000)={{&(0x7f0000012000/0x2000)=nil, 0x2000}, 0x1, 0x0})
>> r1 = gettid()
>> syz_open_dev$evdev(&(0x7f0000013000-0x12)="2f6465762f696e7075742f6576656e742300", 0x0, 0x0)
>> tkill(r1, 0x7)
>>
>> The userfaultfd() caught my attention so I checked handle_userfault()
>> which seems to do up_read(&mm->mmap_sem); and in some cases later
>> followed by down_read(&mm->mmap_sem); return VM_FAULT_NOPAGE.
>> However, __do_page_fault() only expects that mmap_sem to be released
>> when handle_mm_fault() returns with VM_FAULT_RETRY. It doesn't expect it
>> to be released and then acquired again, because then vma can be indeed
>> gone. It seems vma hasn't been touched after that point until the
>> vma_pkey() was added by commit a3c4fb7c9c2e ("x86/mm: Fix fault error
>> path using unsafe vma pointer") in rc3. Which tried to fix a similar
>> problem, but run into this corner case?
>>
>> So I suspect a3c4fb7c9c2e is the culprit and thus a regression.
> 
> I wounder if we can move "pkey = vma_pkey(vma);" before handle_mm_fault()?
> pkey can't change during page fault handing, can it?
 
Hmm that could indeed work, Dmitry can you try the patch below?
But it still seems rather fragile so I'd hope Andrea can do it more
robust, or at least make sure that we don't reintroduce this kind of
problem in the future (explicitly set vma to NULL with a comment?).

----8<----
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e2baeaa053a5..9bd16fc621db 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1441,6 +1441,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 * the fault.  Since we never set FAULT_FLAG_RETRY_NOWAIT, if
 	 * we get VM_FAULT_RETRY back, the mmap_sem has been unlocked.
 	 */
+	pkey = vma_pkey(vma);
 	fault = handle_mm_fault(vma, address, flags);
 	major |= fault & VM_FAULT_MAJOR;
 
@@ -1467,7 +1468,6 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 		return;
 	}
 
-	pkey = vma_pkey(vma);
 	up_read(&mm->mmap_sem);
 	if (unlikely(fault & VM_FAULT_ERROR)) {
 		mm_fault_error(regs, error_code, address, &pkey, fault);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-10-31 13:57         ` Vlastimil Babka
  2017-10-31 14:11           ` Kirill A. Shutemov
@ 2017-10-31 15:37           ` Linus Torvalds
  2017-10-31 19:13             ` Andrea Arcangeli
  2017-11-02 10:00           ` Laurent Dufour
  2 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2017-10-31 15:37 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Dmitry Vyukov, syzbot, Jan Beulich, H. Peter Anvin,
	Josh Poimboeuf, Kirill A. Shutemov, Laurent Dufour, LKML,
	Andy Lutomirski, Ingo Molnar, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers, Andrew Morton, Michal Hocko,
	Hugh Dickins, David Rientjes, linux-mm, Andrea Arcangeli,
	Thorsten Leemhuis

On Tue, Oct 31, 2017 at 6:57 AM, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> However, __do_page_fault() only expects that mmap_sem to be released
> when handle_mm_fault() returns with VM_FAULT_RETRY. It doesn't expect it
> to be released and then acquired again, because then vma can be indeed
> gone.

Yes. Accessing "vma" after calling "handle_mm_fault()" is a bug. An
unfortunate issue with userfaultfd.

The suggested fix to simply look up pkey beforehand seems sane and simple.

But sadly, from a quick check, it looks like arch/um/ has the same
bug, but even worse. It will do

 (a) handle_mm_fault() in a loop without re-calculating vma. Don't ask me why.

 (b) flush_tlb_page(vma, address); afterwards

but much more importantly, I think __get_user_pages() is broken in two ways:

 - faultin_page() does:

        ret = handle_mm_fault(vma, address, fault_flags);
        ...
        if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))

   (easily fixed the same way)

 - more annoyingly and harder to fix: the retry case in
__get_user_pages(), and the VMA saving there.

Ho humm.

Andrea, looking at that get_user_pages() case, I really think it's
userfaultfd that is broken.

Could we perhaps limit userfaultfd to _only_ do the VM_FAULT_RETRY,
and simply fail for non-retry faults?

             Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-10-31 15:37           ` Linus Torvalds
@ 2017-10-31 19:13             ` Andrea Arcangeli
  2017-11-01 15:26               ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Andrea Arcangeli @ 2017-10-31 19:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Vlastimil Babka, Dmitry Vyukov, syzbot, Jan Beulich,
	H. Peter Anvin, Josh Poimboeuf, Kirill A. Shutemov,
	Laurent Dufour, LKML, Andy Lutomirski, Ingo Molnar,
	syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers,
	Andrew Morton, Michal Hocko, Hugh Dickins, David Rientjes,
	linux-mm, Thorsten Leemhuis

On Tue, Oct 31, 2017 at 08:37:47AM -0700, Linus Torvalds wrote:
> Yes. Accessing "vma" after calling "handle_mm_fault()" is a bug. An
> unfortunate issue with userfaultfd.
> 
> The suggested fix to simply look up pkey beforehand seems sane and simple.

Agreed.

> 
> But sadly, from a quick check, it looks like arch/um/ has the same
> bug, but even worse. It will do
> 
>  (a) handle_mm_fault() in a loop without re-calculating vma. Don't ask me why.
> 
>  (b) flush_tlb_page(vma, address); afterwards

Yes, that flush_tlb_page is unsafe. Luckily it's only using it for
vma->vm_mm so it doesn't sound major issue to fix it.

> 
> but much more importantly, I think __get_user_pages() is broken in two ways:
> 
>  - faultin_page() does:
> 
>         ret = handle_mm_fault(vma, address, fault_flags);
>         ...
>         if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))
> 
>    (easily fixed the same way)
> 
>  - more annoyingly and harder to fix: the retry case in
> __get_user_pages(), and the VMA saving there.
> 
> Ho humm.
> 
> Andrea, looking at that get_user_pages() case, I really think it's
> userfaultfd that is broken.
> 
> Could we perhaps limit userfaultfd to _only_ do the VM_FAULT_RETRY,
> and simply fail for non-retry faults?

In the get_user_pages case we already limit it to do only
VM_FAULT_RETRY so no use after free should materialize whenever gup is
involved.

The problematic path for the return to userland (get_user_pages
returns to kernel) is this one:

	if (return_to_userland) {
		if (signal_pending(current) &&
		    !fatal_signal_pending(current)) {
			/*
			 * If we got a SIGSTOP or SIGCONT and this is
			 * a normal userland page fault, just let
			 * userland return so the signal will be
			 * handled and gdb debugging works.  The page
			 * fault code immediately after we return from
			 * this function is going to release the
			 * mmap_sem and it's not depending on it
			 * (unlike gup would if we were not to return
			 * VM_FAULT_RETRY).
			 *
			 * If a fatal signal is pending we still take
			 * the streamlined VM_FAULT_RETRY failure path
			 * and there's no need to retake the mmap_sem
			 * in such case.
			 */
			down_read(&mm->mmap_sem);
			ret = VM_FAULT_NOPAGE;
		}
	}

We could remove the above branch all together and then
handle_userfault() would always return VM_FAULT_RETRY whenever it
decides to release the mmap_sem. The above makes debugging with gdb
more user friendly and it potentially lowers the latency of signals as
signals can unblock handle_userfault. The downside is that the return
to userland cannot dereference the vma after calling handle_mm_fault.

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-10-31 14:28             ` Vlastimil Babka
@ 2017-10-31 19:15               ` Andrea Arcangeli
  2017-11-01  7:42                 ` Vlastimil Babka
  0 siblings, 1 reply; 16+ messages in thread
From: Andrea Arcangeli @ 2017-10-31 19:15 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kirill A. Shutemov, Dmitry Vyukov, syzbot, Jan Beulich,
	H. Peter Anvin, Josh Poimboeuf, Kirill A. Shutemov, ldufour,
	LKML, Andy Lutomirski, Ingo Molnar, syzkaller-bugs,
	Thomas Gleixner, the arch/x86 maintainers, Andrew Morton,
	Michal Hocko, Hugh Dickins, David Rientjes, linux-mm,
	Linus Torvalds, Thorsten Leemhuis

On Tue, Oct 31, 2017 at 03:28:26PM +0100, Vlastimil Babka wrote:
> Hmm that could indeed work, Dmitry can you try the patch below?
> But it still seems rather fragile so I'd hope Andrea can do it more
> robust, or at least make sure that we don't reintroduce this kind of
> problem in the future (explicitly set vma to NULL with a comment?).

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

> 
> ----8<----
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index e2baeaa053a5..9bd16fc621db 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1441,6 +1441,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
>  	 * the fault.  Since we never set FAULT_FLAG_RETRY_NOWAIT, if
>  	 * we get VM_FAULT_RETRY back, the mmap_sem has been unlocked.
>  	 */
> +	pkey = vma_pkey(vma);
>  	fault = handle_mm_fault(vma, address, flags);
>  	major |= fault & VM_FAULT_MAJOR;
>  
> @@ -1467,7 +1468,6 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
>  		return;
>  	}
>  
> -	pkey = vma_pkey(vma);
>  	up_read(&mm->mmap_sem);
>  	if (unlikely(fault & VM_FAULT_ERROR)) {
>  		mm_fault_error(regs, error_code, address, &pkey, fault);
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-10-31 19:15               ` Andrea Arcangeli
@ 2017-11-01  7:42                 ` Vlastimil Babka
  2017-11-01 10:17                   ` Andrea Arcangeli
  0 siblings, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2017-11-01  7:42 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Kirill A. Shutemov, Dmitry Vyukov, syzbot, Jan Beulich,
	H. Peter Anvin, Josh Poimboeuf, Kirill A. Shutemov, ldufour,
	LKML, Andy Lutomirski, Ingo Molnar, syzkaller-bugs,
	Thomas Gleixner, the arch/x86 maintainers, Andrew Morton,
	Michal Hocko, Hugh Dickins, David Rientjes, linux-mm,
	Linus Torvalds, Thorsten Leemhuis

On 10/31/2017 08:15 PM, Andrea Arcangeli wrote:
> On Tue, Oct 31, 2017 at 03:28:26PM +0100, Vlastimil Babka wrote:
>> Hmm that could indeed work, Dmitry can you try the patch below?
>> But it still seems rather fragile so I'd hope Andrea can do it more
>> robust, or at least make sure that we don't reintroduce this kind of
>> problem in the future (explicitly set vma to NULL with a comment?).
> 
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

Thanks. OK so here's the full patch for the immediate issue, unless we
decide to do something more general.

----8<----

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-11-01  7:42                 ` Vlastimil Babka
@ 2017-11-01 10:17                   ` Andrea Arcangeli
  2017-11-01 12:14                     ` Vlastimil Babka
  0 siblings, 1 reply; 16+ messages in thread
From: Andrea Arcangeli @ 2017-11-01 10:17 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kirill A. Shutemov, Dmitry Vyukov, syzbot, Jan Beulich,
	H. Peter Anvin, Josh Poimboeuf, Kirill A. Shutemov, ldufour,
	LKML, Andy Lutomirski, Ingo Molnar, syzkaller-bugs,
	Thomas Gleixner, the arch/x86 maintainers, Andrew Morton,
	Michal Hocko, Hugh Dickins, David Rientjes, linux-mm,
	Linus Torvalds, Thorsten Leemhuis

On Wed, Nov 01, 2017 at 08:42:57AM +0100, Vlastimil Babka wrote:
> The vma should be pinned by mmap_sem, but handle_userfault() will in some
> scenarios release it and then acquire again, so when we return to

In the above message and especially in the below comment, I would
suggest to take the opportunity to more accurately document the
specific scenario instead of "some scenario" which is only "A return
to userland to repeat the page fault later with a VM_FAULT_NOPAGE
retval (potentially after handling any pending signal during the
return to userland). The return to userland is identified whenever
FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in vmf->flags".

> +	 * in some scenario (and not return VM_FAULT_RETRY), we have to be

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-11-01 10:17                   ` Andrea Arcangeli
@ 2017-11-01 12:14                     ` Vlastimil Babka
  0 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2017-11-01 12:14 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Kirill A. Shutemov, Dmitry Vyukov, syzbot, Jan Beulich,
	H. Peter Anvin, Josh Poimboeuf, Kirill A. Shutemov, ldufour,
	LKML, Andy Lutomirski, Ingo Molnar, syzkaller-bugs,
	Thomas Gleixner, the arch/x86 maintainers, Andrew Morton,
	Michal Hocko, Hugh Dickins, David Rientjes, linux-mm,
	Linus Torvalds, Thorsten Leemhuis

On 11/01/2017 11:17 AM, Andrea Arcangeli wrote:
> On Wed, Nov 01, 2017 at 08:42:57AM +0100, Vlastimil Babka wrote:
>> The vma should be pinned by mmap_sem, but handle_userfault() will in some
>> scenarios release it and then acquire again, so when we return to
> 
> In the above message and especially in the below comment, I would
> suggest to take the opportunity to more accurately document the
> specific scenario instead of "some scenario" which is only "A return
> to userland to repeat the page fault later with a VM_FAULT_NOPAGE
> retval (potentially after handling any pending signal during the
> return to userland). The return to userland is identified whenever
> FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in vmf->flags".

OK, updated patch below
----8<----

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-10-31 19:13             ` Andrea Arcangeli
@ 2017-11-01 15:26               ` Linus Torvalds
  2017-11-02 19:36                 ` Andrea Arcangeli
  0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2017-11-01 15:26 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Vlastimil Babka, Dmitry Vyukov, syzbot, Jan Beulich,
	H. Peter Anvin, Josh Poimboeuf, Kirill A. Shutemov,
	Laurent Dufour, LKML, Andy Lutomirski, Ingo Molnar,
	syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers,
	Andrew Morton, Michal Hocko, Hugh Dickins, David Rientjes,
	linux-mm, Thorsten Leemhuis

On Tue, Oct 31, 2017 at 12:13 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
>
> The problematic path for the return to userland (get_user_pages
> returns to kernel) is this one:
>
>         if (return_to_userland) {
>                 if (signal_pending(current) &&
>                     !fatal_signal_pending(current)) {
>                         /*
>                          * If we got a SIGSTOP or SIGCONT and this is
>                          * a normal userland page fault, just let
>                          * userland return so the signal will be
>                          * handled and gdb debugging works.  The page
>                          * fault code immediately after we return from
>                          * this function is going to release the
>                          * mmap_sem and it's not depending on it
>                          * (unlike gup would if we were not to return
>                          * VM_FAULT_RETRY).
>                          *
>                          * If a fatal signal is pending we still take
>                          * the streamlined VM_FAULT_RETRY failure path
>                          * and there's no need to retake the mmap_sem
>                          * in such case.
>                          */
>                         down_read(&mm->mmap_sem);
>                         ret = VM_FAULT_NOPAGE;
>                 }
>         }
>
> We could remove the above branch all together and then
> handle_userfault() would always return VM_FAULT_RETRY whenever it
> decides to release the mmap_sem.

Honestly, I would *much* prefer that.

>    The above makes debugging with gdb
> more user friendly and it potentially lowers the latency of signals as
> signals can unblock handle_userfault.

I don't disagree about that, but why don't you use VM_FAULT_RETRY and
not re-take the mmap_sem? Then we wouldn't have a special case for
userfaultfd at all.

I see the gdb issue, but I wonder if we shouldn't fix that differently
by changing the retry logic in the fault handler.

In particular, right now we do

 -  Retry at most once

 - handle fatal signals specially

and I think the gdb case actually shows that both of those decisions
may have been wrong, or at least something we could improve on?

Maybe we should return to user space on _any_ pending signal? That
might help latency for other things than gdb (think ^Z etc, but also
things that catch SIGSEGV and abort).

And maybe we should allow FAULT_FLAG_ALLOW_RETRY to just go on forever
- although that will want to verify that every case that returns
VM_FAULT_RETRY does wait for the condition it dropped the mmap
semaphore and is retrying for.

There aren't that many places that return VM_FAULT_RETRY. The
important one is lock_page_or_retry(), and that one does wait for the
page.

(There's one in mm/shmem.c too, and obviously the userfaultfd case,
but those seem to do waiting too).

So maybe we could just fix the gdb case without that userfaultfd hack?

             Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-10-31 13:57         ` Vlastimil Babka
  2017-10-31 14:11           ` Kirill A. Shutemov
  2017-10-31 15:37           ` Linus Torvalds
@ 2017-11-02 10:00           ` Laurent Dufour
  2 siblings, 0 replies; 16+ messages in thread
From: Laurent Dufour @ 2017-11-02 10:00 UTC (permalink / raw)
  To: Vlastimil Babka, Dmitry Vyukov
  Cc: syzbot, Jan Beulich, H. Peter Anvin, Josh Poimboeuf,
	Kirill A. Shutemov, LKML, Andy Lutomirski, Ingo Molnar,
	syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers,
	Andrew Morton, Michal Hocko, Hugh Dickins, David Rientjes,
	linux-mm, Andrea Arcangeli, Linus Torvalds, Thorsten Leemhuis

Hi Vlastimil,

Sorry for the late answer I got a few day off.

On 31/10/2017 14:57, Vlastimil Babka wrote:
> +CC Andrea, Thorsten, Linus
> 
> On 10/31/2017 02:20 PM, Vlastimil Babka wrote:
>> On 10/31/2017 01:42 PM, Dmitry Vyukov wrote:
>>>> My vm_area_struct is 192 bytes, could be your layout is different due to
>>>> .config. At offset 80 I have vma->vm_flags. That is checked by
>>>> __do_page_fault(), but only after vma->vm_start (offset 0). Of course,
>>>> reordering is possible.
>>>
>>>
>>> It seems that compiler over-optimizes things and messes debug info.
>>> I just re-reproduced this on upstream
>>> 15f859ae5c43c7f0a064ed92d33f7a5bc5de6de0 and got the same report:
>>>
>>> ==================================================================
>>> BUG: KASAN: use-after-free in arch_local_irq_enable
>>> arch/x86/include/asm/paravirt.h:787 [inline]
>>> BUG: KASAN: use-after-free in __do_page_fault+0xc03/0xd60
>>> arch/x86/mm/fault.c:1357
>>> Read of size 8 at addr ffff880064d19aa0 by task syz-executor/8001
>>>
>>> CPU: 0 PID: 8001 Comm: syz-executor Not tainted 4.14.0-rc6+ #12
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>> Call Trace:
>>>  __dump_stack lib/dump_stack.c:16 [inline]
>>>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>>>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>>>  kasan_report_error mm/kasan/report.c:351 [inline]
>>>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>>>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
>>>  arch_local_irq_enable arch/x86/include/asm/paravirt.h:787 [inline]
>>>  __do_page_fault+0xc03/0xd60 arch/x86/mm/fault.c:1357
>>>  do_page_fault+0xee/0x720 arch/x86/mm/fault.c:1520
>>>  do_async_page_fault+0x82/0x110 arch/x86/kernel/kvm.c:273
>>>  async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
>>> RIP: 0033:0x441bd0
>>> RSP: 002b:00007f2ed8229798 EFLAGS: 00010202
>>> RAX: 00007f2ed82297c0 RBX: 0000000000000000 RCX: 000000000000000e
>>> RDX: 0000000000000400 RSI: 0000000020012fe0 RDI: 00007f2ed82297c0
>>> RBP: 0000000000748020 R08: 0000000000000400 R09: 0000000000000000
>>> R10: 0000000020012fee R11: 0000000000000246 R12: 00000000ffffffff
>>> R13: 0000000000008430 R14: 00000000006ec4d0 R15: 00007f2ed822a700
>>>
>>> Allocated by task 8001:
>>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>>>  set_track mm/kasan/kasan.c:459 [inline]
>>>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>>>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
>>>  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3561
>>>  kmem_cache_zalloc include/linux/slab.h:656 [inline]
>>>  mmap_region+0x7ee/0x15a0 mm/mmap.c:1658
>>>  do_mmap+0x69b/0xd40 mm/mmap.c:1468
>>>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
>>>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
>>>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
>>>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
>>>  SYSC_mmap arch/x86/kernel/sys_x86_64.c:99 [inline]
>>>  SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:90
>>>  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>>
>>> Freed by task 8007:
>>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>>>  set_track mm/kasan/kasan.c:459 [inline]
>>>  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
>>>  __cache_free mm/slab.c:3503 [inline]
>>>  kmem_cache_free+0x77/0x280 mm/slab.c:3763
>>>  remove_vma+0x162/0x1b0 mm/mmap.c:176
>>>  remove_vma_list mm/mmap.c:2475 [inline]
>>>  do_munmap+0x82a/0xdf0 mm/mmap.c:2714
>>>  mmap_region+0x59e/0x15a0 mm/mmap.c:1631
>>>  do_mmap+0x69b/0xd40 mm/mmap.c:1468
>>>  do_mmap_pgoff include/linux/mm.h:2150 [inline]
>>>  vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
>>>  SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
>>>  SyS_mmap_pgoff+0x23b/0x5f0 mm/mmap.c:1476
>>>  SYSC_mmap arch/x86/kernel/sys_x86_64.c:99 [inline]
>>>  SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:90
>>>  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>>
>>> The buggy address belongs to the object at ffff880064d19a50
>>>  which belongs to the cache vm_area_struct of size 200
>>> The buggy address is located 80 bytes inside of
>>>  200-byte region [ffff880064d19a50, ffff880064d19b18)
>>> The buggy address belongs to the page:
>>> page:ffffea0001934640 count:1 mapcount:0 mapping:ffff880064d19000 index:0x0
>>> flags: 0x100000000000100(slab)
>>> raw: 0100000000000100 ffff880064d19000 0000000000000000 000000010000000f
>>> raw: ffffea00018a3a60 ffffea0001940be0 ffff88006c5f79c0 0000000000000000
>>> page dumped because: kasan: bad access detected
>>>
>>> Memory state around the buggy address:
>>>  ffff880064d19980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>  ffff880064d19a00: fb fb fc fc fc fc fc fc fc fc fb fb fb fb fb fb
>>>> ffff880064d19a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>                                ^
>>>  ffff880064d19b00: fb fb fb fc fc fc fc fc fc fc fc fb fb fb fb fb
>>>  ffff880064d19b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>> ==================================================================
>>>
>>>
>>> Here is disasm of the function:
>>> https://gist.githubusercontent.com/dvyukov/5a56c66ce605168c951a321d94df6e3a/raw/538d4ce72ceb5631dfcc866ccde46c74543de1cf/gistfile1.txt
>>>
>>> Seems to be vma->vm_flags at offset 80.
>>
>> You can see it from the disasm? I can't make much of it, unfortunately,
>> the added kasan calls obscure it a lot for me. But I suspect it might be
>> the vma_pkey() thing which reads from vma->vm_flags. What happens when
>> CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled? (or is it already?)
> 
> OK, so I opened the google groups link in the report's signature and
> looked at the attached config there, which says protkeys are enabled.
> Also looked at the repro.txt attachment:
> #{Threaded:true Collide:true Repeat:true Procs:8 Sandbox:none Fault:false FaultCall:-1 FaultNth:0 EnableTun:true UseTmpDir:true HandleSegv:true WaitRepeat:true Debug:false Repro:false}
> mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
> mmap(&(0x7f0000011000/0x3000)=nil, 0x3000, 0x1, 0x32, 0xffffffffffffffff, 0x0)
> r0 = userfaultfd(0x0)
> ioctl$UFFDIO_API(r0, 0xc018aa3f, &(0x7f0000002000-0x18)={0xaa, 0x0, 0x0})
> ioctl$UFFDIO_REGISTER(r0, 0xc020aa00, &(0x7f0000019000)={{&(0x7f0000012000/0x2000)=nil, 0x2000}, 0x1, 0x0})
> r1 = gettid()
> syz_open_dev$evdev(&(0x7f0000013000-0x12)="2f6465762f696e7075742f6576656e742300", 0x0, 0x0)
> tkill(r1, 0x7)
> 
> The userfaultfd() caught my attention so I checked handle_userfault()
> which seems to do up_read(&mm->mmap_sem); and in some cases later
> followed by down_read(&mm->mmap_sem); return VM_FAULT_NOPAGE.
> However, __do_page_fault() only expects that mmap_sem to be released
> when handle_mm_fault() returns with VM_FAULT_RETRY. It doesn't expect it
> to be released and then acquired again, because then vma can be indeed
> gone. It seems vma hasn't been touched after that point until the
> vma_pkey() was added by commit a3c4fb7c9c2e ("x86/mm: Fix fault error
> path using unsafe vma pointer") in rc3. Which tried to fix a similar
> problem, but run into this corner case?
> 
> So I suspect a3c4fb7c9c2e is the culprit and thus a regression.

Yes that's my mistake.

My patch was removing the use of vma once mmap_sem is released but it was
mainly done in the error path and I moved the read of the vma's pkey before
releasing the mmap_sem, but in the no-error path also, leading to the use
after free you seen.

As suggested and done later in this thread, reading the vma's key value
before calling handle_mm_fault() will solve this issue. This is safe since
the vma's pkey can't be changed once the mmap_sem is held.

Thanks,
Laurent.

> 
>> Also did you try the vmacache shortcut test suggested in my previous mail?
>>
>>> I think the size of 200 reported by slab is OK as it can do some rounding.
>>> Everything points to a vma object.
>>>
>>>
>>>>>> The buggy address belongs to the page:
>>>>>> page:ffffea00072ff4c0 count:1 mapcount:0 mapping:ffff8801cbfd3040 index:0x0
>>>>>> flags: 0x200000000000100(slab)
>>>>>> raw: 0200000000000100 ffff8801cbfd3040 0000000000000000 000000010000000f
>>>>>> raw: ffffea000730c7a0 ffffea00072ff7a0 ffff8801dae069c0 0000000000000000
>>>>>> page dumped because: kasan: bad access detected
>>>>>>
>>>>>> Memory state around the buggy address:
>>>>>>  ffff8801cbfd2f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>>>>>  ffff8801cbfd3000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>>>>>>>
>>>>>>> ffff8801cbfd3080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>>>
>>>>>>                          ^
>>>>>>  ffff8801cbfd3100: fb fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb
>>>>>>  ffff8801cbfd3180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>>> ==================================================================
>>>>>
>>>>>
>>>>> I guess this is more related to mm rather than x86, so +mm maintainers.
>>>>> This continues to happen, in particular on upstream
>>>>> 781402340475144bb360e32bb7437fa4b84cadc3 (Oct 28).
>>>>>
>>>>>
>>>>>> ---
>>>>>> This bug is generated by a dumb bot. It may contain errors.
>>>>>> See https://goo.gl/tpsmEJ for details.
>>>>>> Direct all questions to syzkaller@googlegroups.com.
>>>>>>
>>>>>> syzbot will keep track of this bug report.
>>>>>> Once a fix for this bug is committed, please reply to this email with:
>>>>>> #syz fix: exact-commit-title
>>>>>> To mark this as a duplicate of another syzbot report, please reply with:
>>>>>> #syz dup: exact-subject-of-another-report
>>>>>> If it's a one-off invalid bug report, please reply with:
>>>>>> #syz invalid
>>>>>> Note: if the crash happens again, it will cause creation of a new bug
>>>>>> report.
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google Groups
>>>>>> "syzkaller-bugs" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>>>> email to syzkaller-bugs+unsubscribe@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/syzkaller-bugs/94eb2c0433c8f42cac055cc86991%40google.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>> --
>>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>>> see: http://www.linux-mm.org/ .
>>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>>>
>>>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KASAN: use-after-free Read in __do_page_fault
  2017-11-01 15:26               ` Linus Torvalds
@ 2017-11-02 19:36                 ` Andrea Arcangeli
  0 siblings, 0 replies; 16+ messages in thread
From: Andrea Arcangeli @ 2017-11-02 19:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Vlastimil Babka, Dmitry Vyukov, syzbot, Jan Beulich,
	H. Peter Anvin, Josh Poimboeuf, Kirill A. Shutemov,
	Laurent Dufour, LKML, Andy Lutomirski, Ingo Molnar,
	syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers,
	Andrew Morton, Michal Hocko, Hugh Dickins, David Rientjes,
	linux-mm, Thorsten Leemhuis

On Wed, Nov 01, 2017 at 08:26:24AM -0700, Linus Torvalds wrote:
> On Tue, Oct 31, 2017 at 12:13 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> >
> > The problematic path for the return to userland (get_user_pages
> > returns to kernel) is this one:
> >
> >         if (return_to_userland) {
> >                 if (signal_pending(current) &&
> >                     !fatal_signal_pending(current)) {
> >                         /*
> >                          * If we got a SIGSTOP or SIGCONT and this is
> >                          * a normal userland page fault, just let
> >                          * userland return so the signal will be
> >                          * handled and gdb debugging works.  The page
> >                          * fault code immediately after we return from
> >                          * this function is going to release the
> >                          * mmap_sem and it's not depending on it
> >                          * (unlike gup would if we were not to return
> >                          * VM_FAULT_RETRY).
> >                          *
> >                          * If a fatal signal is pending we still take
> >                          * the streamlined VM_FAULT_RETRY failure path
> >                          * and there's no need to retake the mmap_sem
> >                          * in such case.
> >                          */
> >                         down_read(&mm->mmap_sem);
> >                         ret = VM_FAULT_NOPAGE;
> >                 }
> >         }
> >
> > We could remove the above branch all together and then
> > handle_userfault() would always return VM_FAULT_RETRY whenever it
> > decides to release the mmap_sem.
> 
> Honestly, I would *much* prefer that.
> 
> >    The above makes debugging with gdb
> > more user friendly and it potentially lowers the latency of signals as
> > signals can unblock handle_userfault.
> 
> I don't disagree about that, but why don't you use VM_FAULT_RETRY and
> not re-take the mmap_sem? Then we wouldn't have a special case for
> userfaultfd at all.

Yes I prefer that as well, it's much more generic and cleaner that
way.

> I see the gdb issue, but I wonder if we shouldn't fix that differently
> by changing the retry logic in the fault handler.
> 
> In particular, right now we do
> 
>  -  Retry at most once
> 
>  - handle fatal signals specially
> 
> and I think the gdb case actually shows that both of those decisions
> may have been wrong, or at least something we could improve on?
> 
> Maybe we should return to user space on _any_ pending signal? That
> might help latency for other things than gdb (think ^Z etc, but also
> things that catch SIGSEGV and abort).

That would be an ideal solution and then we can drop the special case
from handle_userfault() entirely and the signal check will cover more
cases then.

If VM_FAULT_RETRY is returned but there are pending signals we and the
page fault was invoked on top of userland, we will ignore the
VM_FAULT_RETRY request and return to userland instead.

> And maybe we should allow FAULT_FLAG_ALLOW_RETRY to just go on forever

Returning VM_FAULT_RETRY more than once is already a dependency in the
future userfaultfd WP support (at the moment to stay on the safe side
it only allows one more VM_FAULT_RETRY), as we may writeprotect
individual pagetables while there's already a page fault in flight, in
turn requiring two VM_FAULT_RETRY in a row if the second page fault
invocation ends up in handle_userfault() because of a concurrent
UFFDIO_WRITEPROTECT.

Such an issue cannot materialize with "missing" userfaults because by
definition if the virtual page is missing there cannot be already
other page faults in flight on it.

It'd be cleaner to just let it go on forever.

> - although that will want to verify that every case that returns
> VM_FAULT_RETRY does wait for the condition it dropped the mmap
> semaphore and is retrying for.

Agreed.

> There aren't that many places that return VM_FAULT_RETRY. The
> important one is lock_page_or_retry(), and that one does wait for the
> page.
> 
> (There's one in mm/shmem.c too, and obviously the userfaultfd case,
> but those seem to do waiting too).
> 
> So maybe we could just fix the gdb case without that userfaultfd hack?

That would be great, I will look into implementing the above.

Thanks!
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-11-02 19:36 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <94eb2c0433c8f42cac055cc86991@google.com>
2017-10-30 19:15 ` KASAN: use-after-free Read in __do_page_fault Dmitry Vyukov
2017-10-31 12:00   ` Vlastimil Babka
2017-10-31 12:42     ` Dmitry Vyukov
2017-10-31 13:20       ` Vlastimil Babka
2017-10-31 13:57         ` Vlastimil Babka
2017-10-31 14:11           ` Kirill A. Shutemov
2017-10-31 14:28             ` Vlastimil Babka
2017-10-31 19:15               ` Andrea Arcangeli
2017-11-01  7:42                 ` Vlastimil Babka
2017-11-01 10:17                   ` Andrea Arcangeli
2017-11-01 12:14                     ` Vlastimil Babka
2017-10-31 15:37           ` Linus Torvalds
2017-10-31 19:13             ` Andrea Arcangeli
2017-11-01 15:26               ` Linus Torvalds
2017-11-02 19:36                 ` Andrea Arcangeli
2017-11-02 10:00           ` Laurent Dufour

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).