linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* task_stat splat
@ 2014-11-23 11:12 Borislav Petkov
  2014-11-23 17:22 ` Oleg Nesterov
  0 siblings, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2014-11-23 11:12 UTC (permalink / raw)
  To: lkml; +Cc: Rik van Riel, Peter Zijlstra, Oleg Nesterov, Steven Rostedt, x86-ml

Hi,

so I'm seeing the oops below on rc5 + tip/master from the 17th merged
ontop. I've seen it twice already after resuming the box so maybe not a
glitch.

So from looking at the splat I *think* I can see conky trying to read
/proc/.../stat and we end up in

proc_tgid_stat
|-> do_task_stat
    |-> thread_group_cputime_adjusted
        |-> thread_group_cputime

where we end up with a zero PMD. RIP is corrupted too so we're somewhere
off in the fields.

Machine wedges in completely after the NMI hardlockup detector dumps
splats on each core. I have those too, if anyone wants to see them.

Comment over thread_group_cputime() talks about dead tasks accounting
which might be relevant as we're seeing not mapped page hierarchy so
something must have gone away recently but we try to look at it.

So let me CC the people who have touched kernel/sched/cputime.c
recently, they might have an idea... :)

Thanks.

[   10.324923] PM: Image loading progress: 100%
[   10.325017] PM: Image loading done.
[   10.325127] PM: Read 3730208 kbytes in 6.37 seconds (585.58 MB/s)
[   10.329329] PM: Image successfully loaded
[   10.332518] serial 00:06: disabled
[   10.332591] serial 00:06: System wakeup disabled by ACPI
[42142.200246] r8169 0000:02:00.0 eth0: link up
[42460.368298] BUG: unable to handle kernel NULL pointer dereference at           (null)
[42460.371094] IP: [<          (null)>]           (null)
[42460.373859] PGD 42612c067 PUD 41ba89067 PMD 0 
[42460.376676] Oops: 0010 [#1] PREEMPT SMP 
[42460.379428] Modules linked in: tun ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_mangle iptable_nat nf_conntra
ck_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables sha256_ssse3 sha256_gene
ric cpufreq_powersave cpufreq_userspace cpufreq_stats cpufreq_conservative binfmt_misc ipv6 vfat fat fuse dm_cryp
t dm_mod kvm_amd kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd radeon amd64
_edac_mod k10temp fam15h_power edac_core drm_kms_helper ttm cfbfillrect cfbimgblt cfbcopyarea acpi_cpufreq
[42460.389566] CPU: 1 PID: 3739 Comm: conky Not tainted 3.18.0-rc5+ #1
[42460.389570] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
[42460.389573] task: ffff880426b53a50 ti: ffff8800b4e9c000 task.ti: ffff8800b4e9c000
[42460.389581] RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
[42460.389584] RSP: 0018:ffff8800b4e9fbc0  EFLAGS: 00010092
[42460.389587] RAX: ffff88042e3d3c80 RBX: ffff88042bb9a6e0 RCX: 0000015005a00fff
[42460.389589] RDX: ffffffff81672140 RSI: ffff88042d9d0dd8 RDI: ffff88042e3d3c80
[42460.389591] RBP: ffff8800b4e9fbf8 R08: 0000000000000118 R09: 0000000000000000
[42460.389594] R10: 0000000000000001 R11: 0000000000000028 R12: ffff88042babe900
[42460.389596] R13: ffff88042bb9aa98 R14: ffff88042bb9a6e0 R15: ffff8800b4e9fc90
[42460.389599] FS:  00007febc82e8700(0000) GS:ffff88042d800000(0000) knlGS:0000000000000000
[42460.389602] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42460.389604] CR2: 0000000000000000 CR3: 0000000426b88000 CR4: 00000000000407e0
[42460.389605] Stack:
[42460.389612]  ffffffff810813d9 ffff8800b4e9fbe8 ffff88042e3d3c80 ffff88042bb9a6e0
[42460.389618]  0000000000000082 ffff88042bb9a6e0 ffff88042babe900 ffff8800b4e9fc78
[42460.389624]  ffffffff81088b5c ffffffff81088e2b 0000000400000003 ffff88042babeb38
[42460.389625] Call Trace:
[42460.389635]  [<ffffffff810813d9>] ? task_sched_runtime+0x99/0xc0
[42460.389643]  [<ffffffff81088b5c>] thread_group_cputime+0x17c/0x2d0
[42460.389649]  [<ffffffff81088e2b>] ? thread_group_cputime_adjusted+0x2b/0x60
[42460.389656]  [<ffffffff81061f23>] ? __lock_task_sighand+0xc3/0x2f0
[42460.389662]  [<ffffffff81088e2b>] thread_group_cputime_adjusted+0x2b/0x60
[42460.389670]  [<ffffffff811ed9b9>] do_task_stat+0x8e9/0xb60
[42460.389682]  [<ffffffff811ee7e4>] proc_tgid_stat+0x14/0x20
[42460.389687]  [<ffffffff811e815f>] proc_single_show+0x5f/0xa0
[42460.389694]  [<ffffffff811a8e50>] seq_read+0xe0/0x3c0
[42460.389700]  [<ffffffff811a2658>] ? __fdget_pos+0x48/0x50
[42460.389707]  [<ffffffff81181282>] vfs_read+0xa2/0x160
[42460.389713]  [<ffffffff81181da2>] SyS_read+0x52/0xc0
[42460.389721]  [<ffffffff816563d6>] system_call_fastpath+0x16/0x1b
[42460.389730] Code:  Bad RIP value.
[42460.389733] RIP  [<          (null)>]           (null)
[42460.389734]  RSP <ffff8800b4e9fbc0>
[42460.389736] CR2: 0000000000000000
[42460.389740] ---[ end trace 9f7e43df784ab3e3 ]---
[42460.389769] note: conky[3739] exited with preempt_count 3

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: task_stat splat
  2014-11-23 11:12 task_stat splat Borislav Petkov
@ 2014-11-23 17:22 ` Oleg Nesterov
  2014-11-23 17:46   ` Borislav Petkov
  2014-11-23 17:56   ` Oleg Nesterov
  0 siblings, 2 replies; 8+ messages in thread
From: Oleg Nesterov @ 2014-11-23 17:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: lkml, Rik van Riel, Peter Zijlstra, Steven Rostedt, x86-ml

On 11/23, Borislav Petkov wrote:
>
> where we end up with a zero PMD. RIP is corrupted too so we're somewhere
> off in the fields.

PMD = 0 is fine I guess, addr == 0 is not mapped.

> Comment over thread_group_cputime() talks about dead tasks accounting

This comment simply means that we also need to read the accumulated
counters in tsk->signal.

> which might be relevant as we're seeing not mapped page hierarchy so
> something must have gone away recently but we try to look at it.

This is called under ->siglock, we can't race with exit/etc. But this
doesn't matter, it is not that we (say) get t == NULL or something like
this.

RIP == 0, and this looks "impossible", I do not see indirect function
calls in this paths.

> [42460.389625] Call Trace:
> [42460.389635]  [<ffffffff810813d9>] ? task_sched_runtime+0x99/0xc0
> [42460.389643]  [<ffffffff81088b5c>] thread_group_cputime+0x17c/0x2d0

Could you identify somehow the line which matches task_sched_runtime+0x99 ?
Probably do_task_delta_exec()...

Oleg.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: task_stat splat
  2014-11-23 17:22 ` Oleg Nesterov
@ 2014-11-23 17:46   ` Borislav Petkov
  2014-11-23 18:26     ` Oleg Nesterov
  2014-11-23 17:56   ` Oleg Nesterov
  1 sibling, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2014-11-23 17:46 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: lkml, Rik van Riel, Peter Zijlstra, Steven Rostedt, x86-ml

On Sun, Nov 23, 2014 at 06:22:56PM +0100, Oleg Nesterov wrote:
> Could you identify somehow the line which matches
> task_sched_runtime+0x99 ? Probably do_task_delta_exec()...

See http://marc.info/?l=linux-kernel&m=141675498417759&w=2

Chris triggers the same splat and reverting

6e998916dfe3 ("sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency")

fixes the issue for him. I'm testing that same thing reverted too, first
resume cycle looks good.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: task_stat splat
  2014-11-23 17:22 ` Oleg Nesterov
  2014-11-23 17:46   ` Borislav Petkov
@ 2014-11-23 17:56   ` Oleg Nesterov
  2014-11-23 18:17     ` Oleg Nesterov
  1 sibling, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2014-11-23 17:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: lkml, Rik van Riel, Peter Zijlstra, Steven Rostedt, x86-ml

On 11/23, Oleg Nesterov wrote:
>
> On 11/23, Borislav Petkov wrote:
> >
> > where we end up with a zero PMD. RIP is corrupted too so we're somewhere
> > off in the fields.
>
> PMD = 0 is fine I guess, addr == 0 is not mapped.
>
> > Comment over thread_group_cputime() talks about dead tasks accounting
>
> This comment simply means that we also need to read the accumulated
> counters in tsk->signal.
>
> > which might be relevant as we're seeing not mapped page hierarchy so
> > something must have gone away recently but we try to look at it.
>
> This is called under ->siglock, we can't race with exit/etc. But this
> doesn't matter, it is not that we (say) get t == NULL or something like
> this.
>
> RIP == 0, and this looks "impossible", I do not see indirect function
> calls in this paths.

Ah, I didn't notice you mentioned tip/master... so it looks as if
sched_class->update_curr is NULL?

>
> > [42460.389625] Call Trace:
> > [42460.389635]  [<ffffffff810813d9>] ? task_sched_runtime+0x99/0xc0
> > [42460.389643]  [<ffffffff81088b5c>] thread_group_cputime+0x17c/0x2d0
>
> Could you identify somehow the line which matches task_sched_runtime+0x99 ?
> Probably do_task_delta_exec()...
>
> Oleg.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: task_stat splat
  2014-11-23 17:56   ` Oleg Nesterov
@ 2014-11-23 18:17     ` Oleg Nesterov
  2014-11-23 18:21       ` Oleg Nesterov
  0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2014-11-23 18:17 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: lkml, Rik van Riel, Peter Zijlstra, Steven Rostedt, x86-ml

On 11/23, Oleg Nesterov wrote:
>
> On 11/23, Oleg Nesterov wrote:
> >
> > On 11/23, Borislav Petkov wrote:
> > >
> > > where we end up with a zero PMD. RIP is corrupted too so we're somewhere
> > > off in the fields.
> >
> > PMD = 0 is fine I guess, addr == 0 is not mapped.
> >
> > > Comment over thread_group_cputime() talks about dead tasks accounting
> >
> > This comment simply means that we also need to read the accumulated
> > counters in tsk->signal.
> >
> > > which might be relevant as we're seeing not mapped page hierarchy so
> > > something must have gone away recently but we try to look at it.
> >
> > This is called under ->siglock, we can't race with exit/etc. But this
> > doesn't matter, it is not that we (say) get t == NULL or something like
> > this.
> >
> > RIP == 0, and this looks "impossible", I do not see indirect function
> > calls in this paths.
>
> Ah, I didn't notice you mentioned tip/master... so it looks as if
> sched_class->update_curr is NULL?

Perhaps this is migration thread? stop_sched_class doesn't have ->update_curr.

could you try to cat /proc/pid-of-migration-thread/stat on your machine?

Oleg.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: task_stat splat
  2014-11-23 18:17     ` Oleg Nesterov
@ 2014-11-23 18:21       ` Oleg Nesterov
  2014-11-23 18:38         ` Borislav Petkov
  0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2014-11-23 18:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: lkml, Rik van Riel, Peter Zijlstra, Steven Rostedt, x86-ml

Damn, sorry for noise ;)

On 11/23, Oleg Nesterov wrote:
>
> On 11/23, Oleg Nesterov wrote:
> >
> > On 11/23, Oleg Nesterov wrote:
> > >
> > > On 11/23, Borislav Petkov wrote:
> > > >
> > > > where we end up with a zero PMD. RIP is corrupted too so we're somewhere
> > > > off in the fields.
> > >
> > > PMD = 0 is fine I guess, addr == 0 is not mapped.
> > >
> > > > Comment over thread_group_cputime() talks about dead tasks accounting
> > >
> > > This comment simply means that we also need to read the accumulated
> > > counters in tsk->signal.
> > >
> > > > which might be relevant as we're seeing not mapped page hierarchy so
> > > > something must have gone away recently but we try to look at it.
> > >
> > > This is called under ->siglock, we can't race with exit/etc. But this
> > > doesn't matter, it is not that we (say) get t == NULL or something like
> > > this.
> > >
> > > RIP == 0, and this looks "impossible", I do not see indirect function
> > > calls in this paths.
> >
> > Ah, I didn't notice you mentioned tip/master... so it looks as if
> > sched_class->update_curr is NULL?
>
> Perhaps this is migration thread? stop_sched_class doesn't have ->update_curr.

Yes, I think this can explain the problem, but

> could you try to cat /proc/pid-of-migration-thread/stat on your machine?

This won't trigger the crash unless it is running.

Oleg.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: task_stat splat
  2014-11-23 17:46   ` Borislav Petkov
@ 2014-11-23 18:26     ` Oleg Nesterov
  0 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2014-11-23 18:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: lkml, Rik van Riel, Peter Zijlstra, Steven Rostedt, x86-ml

On 11/23, Borislav Petkov wrote:
>
> On Sun, Nov 23, 2014 at 06:22:56PM +0100, Oleg Nesterov wrote:
> > Could you identify somehow the line which matches
> > task_sched_runtime+0x99 ? Probably do_task_delta_exec()...
>
> See http://marc.info/?l=linux-kernel&m=141675498417759&w=2
>
> Chris triggers the same splat and reverting
>
> 6e998916dfe3 ("sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency")
>
> fixes the issue for him. I'm testing that same thing reverted too, first
> resume cycle looks good.

Yes, this probably confirms the theory, stop_class needs update_curr.

Oleg.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: task_stat splat
  2014-11-23 18:21       ` Oleg Nesterov
@ 2014-11-23 18:38         ` Borislav Petkov
  0 siblings, 0 replies; 8+ messages in thread
From: Borislav Petkov @ 2014-11-23 18:38 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: lkml, Rik van Riel, Peter Zijlstra, Steven Rostedt, x86-ml

On Sun, Nov 23, 2014 at 07:21:08PM +0100, Oleg Nesterov wrote:
> > Perhaps this is migration thread? stop_sched_class doesn't have ->update_curr.
> 
> Yes, I think this can explain the problem, but
> 
> > could you try to cat /proc/pid-of-migration-thread/stat on your machine?
> 
> This won't trigger the crash unless it is running.

This ?

$ for pid in $(pgrep migration); do cat /proc/$pid/stat; done
10 (migration/0) S 2 0 0 0 -1 69238848 0 0 0 0 0 10 0 0 -100 0 1 0 8 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 0 99 1 0 0 0 0 0 0 0 0 0 0 0
13 (migration/1) S 2 0 0 0 -1 69238848 0 0 0 0 0 4 0 0 -100 0 1 0 9 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 1 99 1 0 0 0 0 0 0 0 0 0 0 0
18 (migration/2) S 2 0 0 0 -1 69238848 0 0 0 0 0 4 0 0 -100 0 1 0 11 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 2 99 1 0 0 0 0 0 0 0 0 0 0 0
23 (migration/3) S 2 0 0 0 -1 69238848 0 0 0 0 0 10 0 0 -100 0 1 0 14 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 3 99 1 0 0 0 0 0 0 0 0 0 0 0
28 (migration/4) S 2 0 0 0 -1 69238848 0 0 0 0 0 4 0 0 -100 0 1 0 16 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 4 99 1 0 0 0 0 0 0 0 0 0 0 0
33 (migration/5) S 2 0 0 0 -1 69238848 0 0 0 0 0 11 0 0 -100 0 1 0 19 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 5 99 1 0 0 0 0 0 0 0 0 0 0 0
38 (migration/6) S 2 0 0 0 -1 69238848 0 0 0 0 0 10 0 0 -100 0 1 0 21 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 6 99 1 0 0 0 0 0 0 0 0 0 0 0
43 (migration/7) S 2 0 0 0 -1 69238848 0 0 0 0 0 6 0 0 -100 0 1 0 23 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 0 17 7 99 1 0 0 0 0 0 0 0 0 0 0 0

It doesn't trigger, of course, as I've reverted the patch. Second resume
looks ok.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-11-23 18:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-23 11:12 task_stat splat Borislav Petkov
2014-11-23 17:22 ` Oleg Nesterov
2014-11-23 17:46   ` Borislav Petkov
2014-11-23 18:26     ` Oleg Nesterov
2014-11-23 17:56   ` Oleg Nesterov
2014-11-23 18:17     ` Oleg Nesterov
2014-11-23 18:21       ` Oleg Nesterov
2014-11-23 18:38         ` Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).