linux-kernel.vger.kernel.org archive mirror
* sched: divide error in sg_capacity_factor
@ 2015-03-09 17:46 Sasha Levin
  2015-03-10  4:29 ` Ingo Molnar
  0 siblings, 1 reply; 3+ messages in thread
From: Sasha Levin @ 2015-03-09 17:46 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar; +Cc: Dave Jones, LKML, nicolas.pitre

Hi all,

While fuzzing with trinity inside the latest -next kernel I've stumbled on:

[  936.784266] divide error: 0000 [#1] PREEMPT SMP KASAN
[  936.789198] Dumping ftrace buffer:
[  936.793957]    (ftrace buffer empty)
[  936.793957] Modules linked in:
[  936.793957] CPU: 52 PID: 22110 Comm: trinity-c52 Tainted: G        W       4.0.0-rc1-sasha-00044-ge21109a #2039
[  936.793957] task: ffff8807ff293000 ti: ffff880f81fe8000 task.ti: ffff880f81fe8000
[  936.793957] RIP: find_busiest_group (kernel/sched/fair.c:6152 kernel/sched/fair.c:6223 kernel/sched/fair.c:6341 kernel/sched/fair.c:6603)
[  936.829403] RSP: 0000:ffff8810c28079a8  EFLAGS: 00010206
[  936.829403] RAX: 00000000000003ff RBX: 000000000000004e RCX: 0000000000002000
[  936.829403] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  936.829403] RBP: ffff8810c2807be8 R08: 0000000000000001 R09: 0000000000000001
[  936.829403] R10: 0000000000000001 R11: 0000000000000008 R12: dffffc0000000000
[  936.829403] R13: 0000000000000001 R14: ffff8810c2807b40 R15: ffff8810c2807ce8
[  936.829403] FS:  00007f89c95ff700(0000) GS:ffff8810c2800000(0000) knlGS:0000000000000000
[  936.829403] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  936.829403] CR2: 0000000003503ff8 CR3: 0000000f8237b000 CR4: 00000000000007a0
[  936.829403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  936.829403] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000050602
[  936.829403] Stack:
[  936.829403]  0000000000000082 ffffffff00000001 ffff8810c28079d8 ffff8810c2817a88
[  936.829403]  0000000000000000 1ffff10218500f4b 00000000c2817a88 ffff8810c2807d14
[  936.829403]  ffff8810c2807b50 ffff8810c2807cfc ffff8810c0740010 0000000307418e2e
[  936.829403] Call Trace:
[  936.829403]  <IRQ>
[  936.829403] ? __enqueue_entity (kernel/sched/fair.c:501)
[  936.829403] ? update_group_capacity (kernel/sched/fair.c:6593)
[  936.829403] ? update_cfs_shares (kernel/sched/fair.c:2375)
[  936.829403] ? cpumask_next_and (lib/cpumask.c:40)
[  936.829403] load_balance (kernel/sched/fair.c:6857)
[  936.829403] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:163 kernel/locking/spinlock.c:191)
[  936.829403] ? update_blocked_averages (kernel/sched/fair.c:5743)
[  936.829403] ? find_busiest_group (kernel/sched/fair.c:6820)
[  936.829403] ? run_rebalance_domains (kernel/sched/fair.c:7450 kernel/sched/fair.c:7659)
[  936.829403] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2566)
[  936.829403] run_rebalance_domains (kernel/sched/fair.c:7494 kernel/sched/fair.c:7659)
[  936.829403] ? run_rebalance_domains (kernel/sched/fair.c:7450 kernel/sched/fair.c:7659)
[  936.829403] ? pick_next_task_fair (kernel/sched/fair.c:7654)
[  936.829403] ? irq_exit (kernel/softirq.c:350 kernel/softirq.c:391)
[  936.829403] __do_softirq (kernel/softirq.c:273 include/linux/jump_label.h:114 include/trace/events/irq.h:126 kernel/softirq.c:274)
[  936.829403] irq_exit (kernel/softirq.c:350 kernel/softirq.c:391)
[  936.829403] smp_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:918)
[  936.829403] apic_timer_interrupt (arch/x86/kernel/entry_64.S:958)
[  936.829403]  <EOI>
[  936.829403] ? is_module_address (kernel/module.c:3835)
[  936.829403] ? __kernel_text_address (kernel/extable.c:104)
[  936.829403] print_context_stack (arch/x86/kernel/dumpstack.c:105)
[  936.829403] dump_trace (arch/x86/kernel/dumpstack_64.c:244)
[  936.829403] save_stack_trace (arch/x86/kernel/stacktrace.c:64)
[  936.829403] __set_page_owner (mm/page_owner.c:72)
[  936.829403] ? __reset_page_owner (mm/page_owner.c:61)
[  936.829403] ? __inc_zone_state (mm/vmstat.c:271)
[  936.829403] get_page_from_freelist (include/linux/page_owner.h:26 mm/page_alloc.c:2176)
[  936.829403] __alloc_pages_nodemask (mm/page_alloc.c:2844)
[  936.829403] ? alloc_pages_vma (mm/mempolicy.c:2007)
[  936.829403] ? debug_check_no_locks_freed (kernel/locking/lockdep.c:3051)
[  936.829403] ? debug_check_no_locks_freed (kernel/locking/lockdep.c:3051)
[  936.829403] ? __alloc_pages_direct_compact (mm/page_alloc.c:2797)
[  936.829403] ? debug_check_no_locks_freed (kernel/locking/lockdep.c:3051)
[  936.829403] ? arch_local_irq_restore (init/do_mounts.h:19)
[  936.829403] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2566)
[  936.829403] alloc_pages_vma (mm/mempolicy.c:2007)
[  936.829403] ? handle_mm_fault (mm/memory.c:2156 mm/memory.c:3164 mm/memory.c:3269 mm/memory.c:3298)
[  936.829403] handle_mm_fault (mm/memory.c:2156 mm/memory.c:3164 mm/memory.c:3269 mm/memory.c:3298)
[  936.829403] ? debug_check_no_locks_freed (kernel/locking/lockdep.c:3051)
[  936.829403] ? __pmd_alloc (mm/memory.c:3280)
[  936.829403] ? perf_event_context_sched_in (kernel/events/core.c:2755)
[  936.829403] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2566)
[  936.829403] ? __do_page_fault (arch/x86/mm/fault.c:1173)
[  936.829403] ? ___might_sleep (kernel/sched/core.c:7297 (discriminator 1))
[  936.829403] ? find_vma (mm/mmap.c:2035)
[  936.829403] __do_page_fault (arch/x86/mm/fault.c:1235)
[  936.829403] ? finish_task_switch (kernel/sched/core.c:2214)
[  936.829403] ? finish_task_switch (kernel/sched/sched.h:1058 kernel/sched/core.c:2210)
[  936.829403] trace_do_page_fault (arch/x86/mm/fault.c:1329)
[  936.829403] do_async_page_fault (arch/x86/kernel/kvm.c:280)
[  936.829403] async_page_fault (arch/x86/kernel/entry_64.S:1295)
[ 936.829403] Code: 89 f8 48 c1 e8 03 42 0f b6 04 20 84 c0 74 08 3c 03 0f 8e 3a 18 00 00 8b 7e 08 44 89 e8 48 c1 e0 0a 48 8d 44 07 ff 48 89 fe 48 99 <48> f7 ff 31 d2 48 89 c7 44 89 e8 f7 f7 45 89 c5 49 81 c5 00 02
All code
========
   0:	89 f8                	mov    %edi,%eax
   2:	48 c1 e8 03          	shr    $0x3,%rax
   6:	42 0f b6 04 20       	movzbl (%rax,%r12,1),%eax
   b:	84 c0                	test   %al,%al
   d:	74 08                	je     0x17
   f:	3c 03                	cmp    $0x3,%al
  11:	0f 8e 3a 18 00 00    	jle    0x1851
  17:	8b 7e 08             	mov    0x8(%rsi),%edi
  1a:	44 89 e8             	mov    %r13d,%eax
  1d:	48 c1 e0 0a          	shl    $0xa,%rax
  21:	48 8d 44 07 ff       	lea    -0x1(%rdi,%rax,1),%rax
  26:	48 89 fe             	mov    %rdi,%rsi
  29:	48 99                	cqto
  2b:*	48 f7 ff             	idiv   %rdi		<-- trapping instruction
  2e:	31 d2                	xor    %edx,%edx
  30:	48 89 c7             	mov    %rax,%rdi
  33:	44 89 e8             	mov    %r13d,%eax
  36:	f7 f7                	div    %edi
  38:	45 89 c5             	mov    %r8d,%r13d
  3b:	49                   	rex.WB
  3c:	81                   	.byte 0x81
  3d:	c5 00 02             	(bad)
	...

Code starting with the faulting instruction
===========================================
   0:	48 f7 ff             	idiv   %rdi
   3:	31 d2                	xor    %edx,%edx
   5:	48 89 c7             	mov    %rax,%rdi
   8:	44 89 e8             	mov    %r13d,%eax
   b:	f7 f7                	div    %edi
   d:	45 89 c5             	mov    %r8d,%r13d
  10:	49                   	rex.WB
  11:	81                   	.byte 0x81
  12:	c5 00 02             	(bad)
	...
[  936.829403] RIP find_busiest_group (kernel/sched/fair.c:6152 kernel/sched/fair.c:6223 kernel/sched/fair.c:6341 kernel/sched/fair.c:6603)
[  936.829403]  RSP <ffff8810c28079a8>


Thanks,
Sasha

* Re: sched: divide error in sg_capacity_factor
  2015-03-09 17:46 sched: divide error in sg_capacity_factor Sasha Levin
@ 2015-03-10  4:29 ` Ingo Molnar
  2015-03-10 10:55   ` Sasha Levin
  0 siblings, 1 reply; 3+ messages in thread
From: Ingo Molnar @ 2015-03-10  4:29 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Peter Zijlstra, Dave Jones, LKML, nicolas.pitre, Linus Torvalds,
	Andrew Morton


* Sasha Levin <sasha.levin@oracle.com> wrote:

> Hi all,
> 
> While fuzzing with trinity inside the latest -next kernel I've stumbled on:
> 
> [  936.784266] divide error: 0000 [#1] PREEMPT SMP KASAN
> [  936.793957] RIP: find_busiest_group (kernel/sched/fair.c:6152 kernel/sched/fair.c:6223 kernel/sched/fair.c:6341 kernel/sched/fair.c:6603)

Hm, these line numbers don't seem to match up very well with my 
version of linux-next:

  28855005be1d Add linux-next specific files for 20150306

and the Git version info included in the oops seems useless:

  4.0.0-rc1-sasha-00044-ge21109a

  $ git log e21109a
  fatal: ambiguous argument 'e21109a': unknown revision or path not in the working tree.

I think the kernel's SHA1 should be made at least 12 char wide, 
regardless of the user's gitconfig::core.abbrev settings.
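
To make the length problem concrete, here is a minimal sketch in plain userspace C (not kernel code). It assumes the release string ends in a git-describe style "-g<hash>" suffix, as the one in the oops does. The helper name abbrev_hash_len() is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the length of the abbreviated hash that follows the last "-g"
 * in a release string such as "4.0.0-rc1-sasha-00044-ge21109a".
 * Returns 0 if there is no such suffix or if the suffix is not a run of
 * lowercase hex digits reaching the end of the string.
 */
static size_t abbrev_hash_len(const char *release)
{
    const char *p = release;
    const char *last = NULL;
    size_t len;

    /* Find the last occurrence of "-g". */
    while ((p = strstr(p, "-g")) != NULL) {
        last = p;
        p += 2;
    }
    if (last == NULL)
        return 0;

    len = strspn(last + 2, "0123456789abcdef");
    if (last[2 + len] != '\0')
        return 0;   /* something other than a hash follows, e.g. "-generic" */
    return len;
}
```

For the string in this oops the helper reports 7 hex digits, well short of the 12 suggested above.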

Also, latest linux-next is -rc2 based, while your version string says 
-rc1.

> [  936.829403] load_balance (kernel/sched/fair.c:6857)

this does not match up either.

> [  936.829403] run_rebalance_domains (kernel/sched/fair.c:7494 kernel/sched/fair.c:7659)

The line numbers are not even close to anything related: 
run_rebalance_domains() starts at line 7666 and ends at 7680.

Also, why are the offsets into the function missing from the output? 
Those would allow the rough determination of the crash site, even if 
debuginfo is crap.

I also checked Linus's latest, and they do seem to match up better:

  affb8172de39 Merge git://git.kernel.org/pub/scm/virt/kvm/kvm

and the line number gives:

        capacity_factor = min_t(unsigned,
                capacity_factor, DIV_ROUND_CLOSEST(capacity, SCHED_CAPACITY_SCALE));

but that's a division with a constant? Should not trap.

So I rebuilt a kernel with debug info, pattern-matched the disassembly 
you provided, and that gave me this division:

(gdb) list *0xffffffff8107d958
0xffffffff8107d958 is in find_busiest_group (kernel/sched/fair.c:6162).
6157            capacity = group->sgc->capacity;
6158            capacity_orig = group->sgc->capacity_orig;
6159            cpus = group->group_weight;
6160
6161            /* smt := ceil(cpus / capacity), assumes: 1 < smt_capacity < 2 */
6162            smt = DIV_ROUND_UP(SCHED_CAPACITY_SCALE * cpus, capacity_orig);
6163            capacity_factor = cpus / smt; /* cores */
6164
6165            capacity_factor = min_t(unsigned,
6166                    capacity_factor, DIV_ROUND_CLOSEST(capacity, SCHED_CAPACITY_SCALE));

So this too seems not very plausible: 'capacity_orig' comes straight 
from group->sgc->capacity_orig, which is:

 - boot time initialized

 - sometimes recalculated during CPU hot-plug: not sure how much of 
   that your tests are doing?

 - but otherwise it's fairly constant and should have crashed your 
   system early on if it was set up wrong

unless I missed something, that is.
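
For what it's worth, the register dump in the original report is at least consistent with a zero divisor at this site. Assuming the disassembly is read as "rax = (r13d << 10) + rdi - 1; idiv rdi", then R13 = 1 would be cpus, RDI = 0 would be capacity_orig, and RAX = 0x3ff is exactly 1024 * 1 + 0 - 1. A minimal userspace sketch of that arithmetic follows. The function names are hypothetical, and the guarded variant is only an illustration, not what fair.c does.

```c
#include <stdbool.h>

/* Value of SCHED_CAPACITY_SCALE in the kernel: 1 << 10. */
#define CAPACITY_SCALE 1024L

/*
 * Numerator formed by DIV_ROUND_UP(n, d) = (n + d - 1) / d before the
 * division, with n = CAPACITY_SCALE * cpus and d = capacity_orig.
 */
static long smt_numerator(unsigned int cpus, unsigned int capacity_orig)
{
    return CAPACITY_SCALE * (long)cpus + (long)capacity_orig - 1;
}

/*
 * Hypothetical guarded variant: reports failure instead of trapping
 * when capacity_orig is zero.
 */
static bool smt_checked(unsigned int cpus, unsigned int capacity_orig,
                        long *smt)
{
    if (capacity_orig == 0)
        return false;
    *smt = smt_numerator(cpus, capacity_orig) / (long)capacity_orig;
    return true;
}
```

With cpus = 1 and capacity_orig = 0 the numerator is 0x3ff, matching RAX in the dump. That only says which operand was zero, not how it got that way.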

> [  936.829403] __do_softirq (kernel/softirq.c:273 include/linux/jump_label.h:114 include/trace/events/irq.h:126 kernel/softirq.c:274)
> [  936.829403] irq_exit (kernel/softirq.c:350 kernel/softirq.c:391)
> [  936.829403] smp_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:918)
> [  936.829403] apic_timer_interrupt (arch/x86/kernel/entry_64.S:958)
> [  936.829403]  <EOI>
> [  936.829403] print_context_stack (arch/x86/kernel/dumpstack.c:105)
> [  936.829403] dump_trace (arch/x86/kernel/dumpstack_64.c:244)
> [  936.829403] save_stack_trace (arch/x86/kernel/stacktrace.c:64)
> [  936.829403] __set_page_owner (mm/page_owner.c:72)
> [  936.829403] get_page_from_freelist (include/linux/page_owner.h:26 mm/page_alloc.c:2176)
> [  936.829403] __alloc_pages_nodemask (mm/page_alloc.c:2844)
> [  936.829403] alloc_pages_vma (mm/mempolicy.c:2007)
> [  936.829403] handle_mm_fault (mm/memory.c:2156 mm/memory.c:3164 mm/memory.c:3269 mm/memory.c:3298)
> [  936.829403] __do_page_fault (arch/x86/mm/fault.c:1235)
> [  936.829403] trace_do_page_fault (arch/x86/mm/fault.c:1329)
> [  936.829403] do_async_page_fault (arch/x86/kernel/kvm.c:280)
> [  936.829403] async_page_fault (arch/x86/kernel/entry_64.S:1295)

So debug info weirdnesses aside, other divisions in 
find_busiest_group():

        sds.avg_load = (SCHED_CAPACITY_SCALE * sds.total_load)
                                                / sds.total_capacity;

total_capacity ought to be zero only on a totally borked machine 
(unlikely to boot), or on memory corruption.

if calculate_imbalance() got inlined, then:

                load_above_capacity /= busiest->group_capacity;

that too ought to only get corrupted in the most serious cases, we 
don't recalculate it at runtime.

So I'm baffled. Some tentative handwaving, pointing away from the 
scheduler:

 - Your stack trace is 'weird' not just due to debug info: an async 
   page fault doing allocations, doing a stack trace, interrupted by a 
   timer irq, doing scheduler rebalancing...

 - The (spectacularly misnamed [*] ) CONFIG_PAGE_OWNER=y page lifetime
   tracing facility got enabled explicitly via the page_owner=on boot 
   parameter, right? Not many people are doing that I suspect.

 - CONFIG_KASAN=y is enabled in your kernel. New, invasive option, 
   using compiler features that weren't used by kernel code before.

 - async page faults are virtualization specials: not used much 
   elsewhere.

 - There's a 'W' taint in your oops. Probably some harmless prior
   warning?

So your crash signature has the combination of 3 'uncommon' kernel 
features, and a scheduler crash with a relatively constant value that 
should never be zero and which should crash everywhere.

So right now I'd blame the other 3 guys, I wasn't even there that 
night, officer!

Cc:-ed others as well.

Thanks,

	Ingo

[*] Please name debugging features accordingly: CONFIG_DEBUG_PAGE_OWNER.
    Maybe even prefix them with the subsystem: CONFIG_DEBUG_VM_PAGE_OWNER.

    We already have a nice set of CONFIG_DEBUG_VM* options:

      CONFIG_DEBUG_VM
      CONFIG_DEBUG_VM_RB
      CONFIG_DEBUG_VM_VMACACHE

    Also, there's no penalty for including a verb, so that people know 
    wth it's doing, at a glance: CONFIG_DEBUG_VM_TRACK_PAGE_OWNER?

* Re: sched: divide error in sg_capacity_factor
  2015-03-10  4:29 ` Ingo Molnar
@ 2015-03-10 10:55   ` Sasha Levin
  0 siblings, 0 replies; 3+ messages in thread
From: Sasha Levin @ 2015-03-10 10:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Dave Jones, LKML, nicolas.pitre, Linus Torvalds,
	Andrew Morton, Andrey Ryabinin

On 03/10/2015 12:29 AM, Ingo Molnar wrote:
> 
> * Sasha Levin <sasha.levin@oracle.com> wrote:
> 
>> Hi all,
>>
>> While fuzzing with trinity inside the latest -next kernel I've stumbled on:
>>
>> [  936.784266] divide error: 0000 [#1] PREEMPT SMP KASAN
>> [  936.793957] RIP: find_busiest_group (kernel/sched/fair.c:6152 kernel/sched/fair.c:6223 kernel/sched/fair.c:6341 kernel/sched/fair.c:6603)
> 
> Hm, these line numbers don't seem to match up very well with my 
> version of linux-next:
> 
>   28855005be1d Add linux-next specific files for 20150306
> 
> and the Git version info included in the oops seems useless:
> 
>   4.0.0-rc1-sasha-00044-ge21109a
> 
>   $ git log e21109a
>   fatal: ambiguous argument 'e21109a': unknown revision or path not in the working tree.
> 
> I think the kernel's SHA1 should be made at least 12 char wide, 
> regardless of the user's gitconfig::core.abbrev settings.
> 
> Also, latest linux-next is -rc2 based, while your version string says 
> -rc1.
> 
>> [  936.829403] load_balance (kernel/sched/fair.c:6857)
> 
> this does not match up either.
> 
>> [  936.829403] run_rebalance_domains (kernel/sched/fair.c:7494 kernel/sched/fair.c:7659)
> 
> The line numbers are not even close to anything related: 
> run_rebalance_domains() starts at line 7666 and ends at 7680.

Right, this is my fuck up. It seems that I was fuzzing 4.0-rc1 rather than
-next as I thought I was. I forgot to go back to -next after I tested a few
things on Linus's tree.

So the line numbers should match correctly with Linus's tree as you've already
guessed below.

> Also, why are the offsets into the function missing from the output? 
> Those would allow the rough determination of the crash site, even if 
> debuginfo is crap.

I found that offsets are useless here because of the really odd things the
compiler does based on my config. There are so many things that got inlined
in this case that I think offsets wouldn't mean anything to you here.

For example, in this case the division by 0 happened on load_balance+0x88a/0x2399.
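
For reference, that notation is symbol+offset/size: the faulting address is 0x88a bytes into a load_balance() that is 0x2399 bytes long in this build. A small userspace sketch of pulling the fields apart follows. The struct and function names are hypothetical, not an existing kernel or tool API.

```c
#include <stdio.h>

struct sym_offset {
    char name[64];
    unsigned long offset;   /* bytes from the start of the symbol */
    unsigned long size;     /* total size of the symbol in bytes */
};

/*
 * Parse a "name+0xOFF/0xSIZE" string as printed in oops backtraces.
 * Returns 0 on success, -1 if the input is malformed or the offset
 * does not fall inside the symbol.
 */
static int parse_sym_offset(const char *s, struct sym_offset *out)
{
    if (sscanf(s, "%63[^+]+%lx/%lx",
               out->name, &out->offset, &out->size) != 3)
        return -1;
    return out->offset < out->size ? 0 : -1;
}
```

A function of roughly 9 KB with the trap a couple of KB in is what you would expect once find_busiest_group() and its helpers have been inlined into load_balance(), which is why the offset alone does not narrow things down much.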

> I also checked Linus's latest, and they do seem to match up better:
> 
>   affb8172de39 Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
> 
> and the line number gives:
> 
>         capacity_factor = min_t(unsigned,
>                 capacity_factor, DIV_ROUND_CLOSEST(capacity, SCHED_CAPACITY_SCALE));
> 
> but that's a division with a constant? Should not trap.
> 
> So I rebuilt a kernel with debug info, pattern-matched the disassembly 
> you provided, and that gave me this division:
> 
> (gdb) list *0xffffffff8107d958
> 0xffffffff8107d958 is in find_busiest_group (kernel/sched/fair.c:6162).
> 6157            capacity = group->sgc->capacity;
> 6158            capacity_orig = group->sgc->capacity_orig;
> 6159            cpus = group->group_weight;
> 6160
> 6161            /* smt := ceil(cpus / capacity), assumes: 1 < smt_capacity < 2 */
> 6162            smt = DIV_ROUND_UP(SCHED_CAPACITY_SCALE * cpus, capacity_orig);
> 6163            capacity_factor = cpus / smt; /* cores */
> 6164
> 6165            capacity_factor = min_t(unsigned,
> 6166                    capacity_factor, DIV_ROUND_CLOSEST(capacity, SCHED_CAPACITY_SCALE));

This is the division I was seeing as well.

> So this too seems not very plausible: 'capacity_orig' comes straight 
> from group->sgc->capacity_orig, which is:
> 
>  - boot time initialized
> 
>  - sometimes recalculated during CPU hot-plug: not sure how much of 
>    that your tests are doing?

I'm not forcing them, but they do happen pretty often.

>  - but otherwise it's fairly constant and should have crashed your 
>    system early on if it was set up wrong
> 
> unless I missed something, that is.
> 
>> [  936.829403] __do_softirq (kernel/softirq.c:273 include/linux/jump_label.h:114 include/trace/events/irq.h:126 kernel/softirq.c:274)
>> [  936.829403] irq_exit (kernel/softirq.c:350 kernel/softirq.c:391)
>> [  936.829403] smp_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:918)
>> [  936.829403] apic_timer_interrupt (arch/x86/kernel/entry_64.S:958)
>> [  936.829403]  <EOI>
>> [  936.829403] print_context_stack (arch/x86/kernel/dumpstack.c:105)
>> [  936.829403] dump_trace (arch/x86/kernel/dumpstack_64.c:244)
>> [  936.829403] save_stack_trace (arch/x86/kernel/stacktrace.c:64)
>> [  936.829403] __set_page_owner (mm/page_owner.c:72)
>> [  936.829403] get_page_from_freelist (include/linux/page_owner.h:26 mm/page_alloc.c:2176)
>> [  936.829403] __alloc_pages_nodemask (mm/page_alloc.c:2844)
>> [  936.829403] alloc_pages_vma (mm/mempolicy.c:2007)
>> [  936.829403] handle_mm_fault (mm/memory.c:2156 mm/memory.c:3164 mm/memory.c:3269 mm/memory.c:3298)
>> [  936.829403] __do_page_fault (arch/x86/mm/fault.c:1235)
>> [  936.829403] trace_do_page_fault (arch/x86/mm/fault.c:1329)
>> [  936.829403] do_async_page_fault (arch/x86/kernel/kvm.c:280)
>> [  936.829403] async_page_fault (arch/x86/kernel/entry_64.S:1295)
> 
> So debug info weirdnesses aside, other divisions in 
> find_busiest_group():
> 
>         sds.avg_load = (SCHED_CAPACITY_SCALE * sds.total_load)
>                                                 / sds.total_capacity;
> 
> total_capacity ought to be zero only on a totally borked machine 
> (unlikely to boot), or on memory corruption.
> 
> if calculate_imbalance() got inlined, then:
> 
>                 load_above_capacity /= busiest->group_capacity;
> 
> that too ought to only get corrupted in the most serious cases, we 
> don't recalculate it at runtime.
> 
> So I'm baffled. Some tentative handwaving, pointing away from the 
> scheduler:
> 
>  - Your stack trace is 'weird' not just due to debug info: an async 
>    page fault doing allocations, doing a stack trace, interrupted by a 
>    timer irq, doing scheduler rebalancing...

Yeah, it's not the most straightforward trace, but it's not "broken" -
it's a plausible scenario.

>  - The (spectacularly misnamed [*] ) CONFIG_PAGE_OWNER=y page lifetime
>    tracing facility got enabled explicitly via the page_owner=on boot 
>    parameter, right? Not many people are doing that I suspect.

Right. But all it did here was save a stack trace, no?

>  - CONFIG_KASAN=y is enabled in your kernel. New, invasive option, 
>    using compiler features that weren't used by kernel code before.

Agreed, I've Cc'ed Andrey and hope that he could see if there's anything
up with the assembly that's fishy.

>  - async page faults are virtualization specials: not used much 
>    elsewhere.
> 
>  - There's a 'W' taint in your oops. Probably some harmless prior
>    warning?

Yup, just one of my debug patches that helps me track down a different
bug.

> So your crash signature has the combination of 3 'uncommon' kernel 
> features, and a scheduler crash with a relatively constant value that 
> should never be zero and which should crash everywhere.
> 
> So right now I'd blame the other 3 guys, I wasn't even there that 
> night, officer!

Fair enough, I've only seen it once myself so I can ignore it until (if)
I hit it again.

Sorry again about the tree mixup. I'll go back to fuzzing -next now :)


Thanks,
Sasha

> Cc:-ed others as well.
> 
> Thanks,
> 
> 	Ingo
> 
> [*] Please name debugging features accordingly: CONFIG_DEBUG_PAGE_OWNER.
>     Maybe even prefix them with the subsystem: CONFIG_DEBUG_VM_PAGE_OWNER.
> 
>     We already have a nice set of CONFIG_DEBUG_VM* options:
> 
>       CONFIG_DEBUG_VM
>       CONFIG_DEBUG_VM_RB
>       CONFIG_DEBUG_VM_VMACACHE
> 
>     Also, there's no penalty for including a verb, so that people know 
>     wth it's doing, at a glance: CONFIG_DEBUG_VM_TRACK_PAGE_OWNER?
> 

