All of lore.kernel.org
 help / color / mirror / Atom feed
* New crashes walking proc with Saturday's git
@ 2014-11-23 15:02 Chris Mason
  2014-11-23 15:56 ` Chris Mason
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Mason @ 2014-11-23 15:02 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel

Hi everyone,

I was running some tests on Saturday before my pull, and I'm now hitting
this consistently across two boxes.  One box has plain linus git:

commit cb95413971d605b0d152d3ceecc47ba8991d66fb
Merge: ecde006 6bab4a8
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sat Nov 22 14:33:11 2014 -0800

The other has the btrfs locking fix I wanted to send in on top.  The
tests are just normal xfstests, and I don't think they are causing this.

The process triggering this is dynoProcMon, which is an internal
facebook program that walks proc and apparently checks process stats.
It's not part of the xfstests run, just internal monitoring.

I had the same xfstests in a loop on Friday with plain 3.18-rc5 trying to
trigger the skbuff memory corruption fixed by Dave's pull.  This
crash in /proc wasn't triggering then.

It's possible that dynoProcMon was changed to hammer on proc in new
ways.  These utils do get updated in the background sometimes (I'll
check).

It takes about an hour to crash, so not really bisectable.  I left
the boxes idle overnight and they were both dead in the morning.

Since it looks like a race between process exit and /proc, I'll try to
hammer on that for a better reproduction.  But, here's hoping that
someone has already seen this one:

[ 1333.162263] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1333.178174] IP: [<          (null)>]           (null)
[ 1333.188406] PGD 10153db067 PUD 10398d8067 PMD 0 
[ 1333.197825] Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
[ 1333.207051] Modules linked in: fuse k10temp coretemp hwmon btrfs raid6_pq zlib_deflate lzo_compress xor xfs exportfs libcrc32c tcp_d
iag inet_diag nfsv4 loop ip6table_filter ip6_tables xt_NFLOG nfnetlink_log nfnetlink xt_comment xt_statistic iptable_filter ip_tables x
_tables nfsv3 nfs lockd grace mptctl netconsole autofs4 rpcsec_gss_krb5 auth_rpcgss oid_registry sunrpc ipv6 ext3 jbd dm_mod iTCO_wdt i
TCO_vendor_support pcspkr i2c_i801 shpchp lpc_ich mfd_core ehci_pci ehci_hcd mlx4_en ptp pps_core mlx4_core rtc_cmos ipmi_si ipmi_msgha
ndler ses enclosure sg button megaraid_sas
[ 1333.310989] CPU: 12 PID: 8309 Comm: dynoProcMon Not tainted 3.18.0-rc5-mason+ #66
[ 1333.326070] Hardware name: ZTSYSTEMS Echo Ridge T4  /A9DRPF-10D, BIOS 1.07 05/10/2012
[ 1333.341847] task: ffff881035795cd0 ti: ffff88100b29c000 task.ti: ffff88100b29c000
[ 1333.356927] RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
[ 1333.372052] RSP: 0018:ffff88100b29fb90  EFLAGS: 00010092
[ 1333.382740] RAX: ffffffff8180dd80 RBX: ffff880853389390 RCX: 006a6f3444800000
[ 1333.397066] RDX: 0000013666a9a100 RSI: 00000000000001d1 RDI: ffff88085fd13a00
[ 1333.411385] RBP: ffff88100b29fbc8 R08: 0000000000000000 R09: 0000000000000000
[ 1333.425705] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88085fd13a00
[ 1333.440029] R13: ffff88100b29fc18 R14: ffff880853389390 R15: ffff880853389390
[ 1333.454349] FS:  00007f02f55fd700(0000) GS:ffff881077c80000(0000) knlGS:0000000000000000
[ 1333.470650] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1333.482207] CR2: 0000000000000000 CR3: 00000010153da000 CR4: 00000000000407e0
[ 1333.496530] Stack:
[ 1333.500624]  ffffffff8107e3ab 0000000000000000 ffff88100b29fc68 ffff880853389390
[ 1333.515665]  0000000000000086 ffff88100b29fc68 0000000000000000 ffff88100b29fc58
[ 1333.530714]  ffffffff81082f41 ffffffff81083042 ffffffff81060800 0000000000000000
[ 1333.545744] Call Trace:
[ 1333.550712]  [<ffffffff8107e3ab>] ? task_sched_runtime+0xab/0xb0
[ 1333.562783]  [<ffffffff81082f41>] thread_group_cputime+0x161/0x230
[ 1333.575198]  [<ffffffff81083042>] ? thread_group_cputime_adjusted+0x32/0x60
[ 1333.589180]  [<ffffffff81060800>] ? __sigqueue_alloc+0x140/0x150
[ 1333.601242]  [<ffffffff81083042>] thread_group_cputime_adjusted+0x32/0x60
[ 1333.614875]  [<ffffffff8121f058>] do_task_stat+0x8b8/0xb00
[ 1333.625901]  [<ffffffff8121f2b4>] proc_tgid_stat+0x14/0x20
[ 1333.636934]  [<ffffffff8121b474>] proc_single_show+0x64/0x90
[ 1333.648309]  [<ffffffff811d6716>] seq_read+0xc6/0x430
[ 1333.658474]  [<ffffffff811afed3>] vfs_read+0xa3/0x110
[ 1333.668637]  [<ffffffff811b04cd>] SyS_read+0x5d/0xd0
[ 1333.678629]  [<ffffffff81676ad2>] system_call_fastpath+0x12/0x17
[ 1333.690698] Code:  Bad RIP value.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: New crashes walking proc with Saturday's git
  2014-11-23 15:02 New crashes walking proc with Saturday's git Chris Mason
@ 2014-11-23 15:56 ` Chris Mason
  2014-11-23 16:11   ` Borislav Petkov
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Mason @ 2014-11-23 15:56 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel

On Sun, Nov 23, 2014 at 10:02 AM, Chris Mason <clm@fb.com> wrote:
> Since it looks like a race between process exit and /proc, I'll try to
> hammer on that for a better reproduction.  But, here's hoping that
> someone has already seen this one:

It falls over in less than 5 minutes with a fork bomb going.  I'll try 
to bisect.

-chris




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: New crashes walking proc with Saturday's git
  2014-11-23 15:56 ` Chris Mason
@ 2014-11-23 16:11   ` Borislav Petkov
  2014-11-23 16:16     ` Chris Mason
  0 siblings, 1 reply; 15+ messages in thread
From: Borislav Petkov @ 2014-11-23 16:11 UTC (permalink / raw)
  To: Chris Mason; +Cc: torvalds, linux-kernel

On Sun, Nov 23, 2014 at 10:56:27AM -0500, Chris Mason wrote:
> On Sun, Nov 23, 2014 at 10:02 AM, Chris Mason <clm@fb.com> wrote:
> >Since it looks like a race between process exit and /proc, I'll try to
> >hammer on that for a better reproduction.  But, here's hoping that
> >someone has already seen this one:
> 
> It falls over in less than 5 minutes with a fork bomb going.  I'll try to
> bisect.

I'm seeing the same thing:
https://lkml.kernel.org/r/20141123111220.GA6436@pd.tnic

Happens here after I resume from suspend to disk.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: New crashes walking proc with Saturday's git
  2014-11-23 16:11   ` Borislav Petkov
@ 2014-11-23 16:16     ` Chris Mason
  2014-11-23 16:32       ` Borislav Petkov
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Mason @ 2014-11-23 16:16 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: torvalds, linux-kernel



On Sun, Nov 23, 2014 at 11:11 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Sun, Nov 23, 2014 at 10:56:27AM -0500, Chris Mason wrote:
>>  On Sun, Nov 23, 2014 at 10:02 AM, Chris Mason <clm@fb.com> wrote:
>>  >Since it looks like a race between process exit and /proc, I'll 
>> try to
>>  >hammer on that for a better reproduction.  But, here's hoping that
>>  >someone has already seen this one:
>> 
>>  It falls over in less than 5 minutes with a fork bomb going.  I'll 
>> try to
>>  bisect.
> 
> I'm seeing the same thing:


It must be:

commit 6e998916dfe327e785e7c2447959b2c1a3ea4930
Author: Stanislaw Gruszka <sgruszka@redhat.com>
Date:   Wed Nov 12 16:58:44 2014 +0100

    sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency

I'll do two runs to confirm, but it's the only related patch between 
rc5 and now.

-chris





^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: New crashes walking proc with Saturday's git
  2014-11-23 16:16     ` Chris Mason
@ 2014-11-23 16:32       ` Borislav Petkov
  2014-11-23 16:36         ` Chris Mason
  2014-11-23 16:49         ` Chris Mason
  0 siblings, 2 replies; 15+ messages in thread
From: Borislav Petkov @ 2014-11-23 16:32 UTC (permalink / raw)
  To: Chris Mason; +Cc: torvalds, linux-kernel

On Sun, Nov 23, 2014 at 11:16:51AM -0500, Chris Mason wrote:
> It must be:
> 
> commit 6e998916dfe327e785e7c2447959b2c1a3ea4930
> Author: Stanislaw Gruszka <sgruszka@redhat.com>
> Date:   Wed Nov 12 16:58:44 2014 +0100
> 
>    sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency
> 
> I'll do two runs to confirm, but it's the only related patch between rc5 and
> now.

Cool, let me know and I'll revert here too to verify. It might take a
while to confirm as it happens only during resume but I could try the
fork bomb if I must.

>From looking at it I'm not getting any smarter. We iterate over the
threads under a rcu_read_lock() so we should be protected. Unless the
sched changes have somehow effect on RCU grace periods but WTH do I
know... This is crazy code.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: New crashes walking proc with Saturday's git
  2014-11-23 16:32       ` Borislav Petkov
@ 2014-11-23 16:36         ` Chris Mason
  2014-11-23 16:49         ` Chris Mason
  1 sibling, 0 replies; 15+ messages in thread
From: Chris Mason @ 2014-11-23 16:36 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: torvalds, linux-kernel

On Sun, Nov 23, 2014 at 11:32 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Sun, Nov 23, 2014 at 11:16:51AM -0500, Chris Mason wrote:
>>  It must be:
>> 
>>  commit 6e998916dfe327e785e7c2447959b2c1a3ea4930
>>  Author: Stanislaw Gruszka <sgruszka@redhat.com>
>>  Date:   Wed Nov 12 16:58:44 2014 +0100
>> 
>>     sched/cputime: Fix clock_nanosleep()/clock_gettime() 
>> inconsistency
>> 
>>  I'll do two runs to confirm, but it's the only related patch 
>> between rc5 and
>>  now.
> 
> Cool, let me know and I'll revert here too to verify. It might take a
> while to confirm as it happens only during resume but I could try the
> fork bomb if I must.
> 
> From looking at it I'm not getting any smarter. We iterate over the
> threads under a rcu_read_lock() so we should be protected. Unless the
> sched changes have somehow effect on RCU grace periods but WTH do I
> know... This is crazy code.

+               p->sched_class->update_curr(rq);

Total guess while I'm waiting for the test, but somehow this is NULL.

-chris




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: New crashes walking proc with Saturday's git
  2014-11-23 16:32       ` Borislav Petkov
  2014-11-23 16:36         ` Chris Mason
@ 2014-11-23 16:49         ` Chris Mason
  2014-11-23 16:59           ` Borislav Petkov
  2014-11-23 21:05           ` Thomas Gleixner
  1 sibling, 2 replies; 15+ messages in thread
From: Chris Mason @ 2014-11-23 16:49 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: torvalds, linux-kernel, Ingo Molnar, Stanislaw Gruszka

On Sun, Nov 23, 2014 at 11:32 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Sun, Nov 23, 2014 at 11:16:51AM -0500, Chris Mason wrote:
>>  It must be:
>> 
>>  commit 6e998916dfe327e785e7c2447959b2c1a3ea4930
>>  Author: Stanislaw Gruszka <sgruszka@redhat.com>
>>  Date:   Wed Nov 12 16:58:44 2014 +0100
>> 
>>     sched/cputime: Fix clock_nanosleep()/clock_gettime() 
>> inconsistency
>> 
>>  I'll do two runs to confirm, but it's the only related patch 
>> between rc5 and
>>  now.

I've adding Ingo and Stanislaw to the cc.  With 
6e998916dfe327e785e7c2447959b2c1a3ea4930 reverted, I'm no longer 
crashing.

Repeating the stack trace for the new cc list.  I see the crash with 
atop or similar walkers of /proc racing against exiting programs.  
Given the NULL rip, this line from the patch is probably broken, but it 
really feels like we should be falling over on p->sched_class and not 
on the update_curr func.

+               p->sched_class->update_curr(rq);

I'm leaving my fork bomb running on two machines with the patch 
reverted to make sure.

[ 1053.317472] BUG: unable to handle kernel NULL pointer dereference at 
          (null)
[ 1053.333312] IP: [<          (null)>]           (null)
[ 1053.343498] PGD 1050f5c067 PUD 1044f86067 PMD 0
[ 1053.352874] Oops: 0010 [#1] SMP
[ 1053.359457] Modules linked in: loop k10temp coretemp hwmon btrfs 
raid6_pq zlib_deflate lzo_compress xor fuse tcp_diag inet_diag nfsv
_tables x_tables nfsv3 nfs lockd grace mptctl netconsole autofs4 
rpcsec_gss_krb5 auth_rpcgss oid_registry sunrpc ipv6 ext3 jbd dm_mod r
shpchp ehci_pci ehci_hcd mlx4_en ptp pps_core mlx4_core sg ses 
enclosure button megaraid_sas
[ 1053.460866] CPU: 19 PID: 8404 Comm: atop Not tainted 
3.18.0-rc5-mason+ #35
[ 1053.474665] Hardware name: ZTSYSTEMS Echo Ridge T4  /A9DRPF-10D, 
BIOS 1.07 05/10/2012
[ 1053.490444] task: ffff8810449d0000 ti: ffff88103a1e0000 task.ti: 
ffff88103a1e0000
[ 1053.505527] RIP: 0010:[<0000000000000000>]  [<          (null)>]     
      (null)
[ 1053.520637] RSP: 0018:ffff88103a1e3bb0  EFLAGS: 00010096
[ 1053.531307] RAX: ffffffff8180dd80 RBX: ffff8810547b6040 RCX: 
0056d214af400000
[ 1053.545632] RDX: 000000f53e9ce885 RSI: 00000000000001d1 RDI: 
ffff88107fc32d80
[ 1053.559954] RBP: ffff88103a1e3be8 R08: 0000000000000001 R09: 
0000000000000000
[ 1053.574274] R10: 0000000000000001 R11: 0000000000000246 R12: 
ffff88107fc32d80
[ 1053.588596] R13: ffff88103a1e3c68 R14: ffff8810547b6040 R15: 
0000000000000000
[ 1053.602917] FS:  00007f37b298e700(0000) GS:ffff88085fd60000(0000) 
knlGS:0000000000000000
[ 1053.619215] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1053.630759] CR2: 0000000000000000 CR3: 000000104652d000 CR4: 
00000000000407e0
[ 1053.645084] Stack:
[ 1053.649176]  ffffffff81077d4b ffff88103a1e3be8 ffffffff811c18bf 
ffff88087fffcd80
[ 1053.664201]  0000000000000086 ffff8810547b6040 ffff8808542381ac 
ffff88103a1e3c58
[ 1053.679233]  ffffffff8107f94a ffff8810449d07c0 ffff8808542381a8 
0000000000000000
[ 1053.694263] Call Trace:
[ 1053.699227]  [<ffffffff81077d4b>] ? task_sched_runtime+0xab/0xb0
[ 1053.711288]  [<ffffffff811c18bf>] ? seq_open+0x4f/0xc0
[ 1053.721623]  [<ffffffff8107f94a>] thread_group_cputime+0xda/0x190
[ 1053.733868]  [<ffffffff8107fa32>] 
thread_group_cputime_adjusted+0x32/0x60
[ 1053.747498]  [<ffffffff8105e381>] ? __lock_task_sighand+0x51/0xb0
[ 1053.759741]  [<ffffffff81208348>] do_task_stat+0x8b8/0xb00
[ 1053.770769]  [<ffffffff812085a4>] proc_tgid_stat+0x14/0x20
[ 1053.781801]  [<ffffffff81205114>] proc_single_show+0x64/0x90
[ 1053.793177]  [<ffffffff811c1bbb>] seq_read+0xbb/0x410
[ 1053.803342]  [<ffffffff8119d143>] vfs_read+0xa3/0x110
[ 1053.813506]  [<ffffffff811ba753>] ? __fdget+0x13/0x20
[ 1053.823672]  [<ffffffff8119d6fa>] SyS_read+0x5a/0xd0
[ 1053.833664]  [<ffffffff816427d2>] system_call_fastpath+0x12/0x17
[ 1053.845733] Code:  Bad RIP value.
[ 1053.852490] RIP  [<          (null)>]           (null)
[ 1053.862854]  RSP <ffff88103a1e3bb0>
[ 1053.869883] CR2: 0000000000000000
[ 1053.877131] ---[ end trace a218425ffc5c90cd ]---




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: New crashes walking proc with Saturday's git
  2014-11-23 16:49         ` Chris Mason
@ 2014-11-23 16:59           ` Borislav Petkov
  2014-11-23 21:05           ` Thomas Gleixner
  1 sibling, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2014-11-23 16:59 UTC (permalink / raw)
  To: Chris Mason; +Cc: torvalds, linux-kernel, Ingo Molnar, Stanislaw Gruszka

On Sun, Nov 23, 2014 at 11:49:02AM -0500, Chris Mason wrote:
> I've adding Ingo and Stanislaw to the cc.  With
> 6e998916dfe327e785e7c2447959b2c1a3ea4930 reverted, I'm no longer crashing.

Ok, reverting here too, to confirm.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: New crashes walking proc with Saturday's git
  2014-11-23 16:49         ` Chris Mason
  2014-11-23 16:59           ` Borislav Petkov
@ 2014-11-23 21:05           ` Thomas Gleixner
  2014-11-23 21:11             ` Chris Mason
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2014-11-23 21:05 UTC (permalink / raw)
  To: Chris Mason
  Cc: Borislav Petkov, torvalds, linux-kernel, Ingo Molnar, Stanislaw Gruszka

On Sun, 23 Nov 2014, Chris Mason wrote:
> On Sun, Nov 23, 2014 at 11:32 AM, Borislav Petkov <bp@alien8.de> wrote:
> > On Sun, Nov 23, 2014 at 11:16:51AM -0500, Chris Mason wrote:
> > >  It must be:
> > > 
> > >  commit 6e998916dfe327e785e7c2447959b2c1a3ea4930
> > >  Author: Stanislaw Gruszka <sgruszka@redhat.com>
> > >  Date:   Wed Nov 12 16:58:44 2014 +0100
> > > 
> > >     sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency
> > > 
> > >  I'll do two runs to confirm, but it's the only related patch between rc5
> > > and
> > >  now.
> 
> I've adding Ingo and Stanislaw to the cc.  With
> 6e998916dfe327e785e7c2447959b2c1a3ea4930 reverted, I'm no longer crashing.
> 
> Repeating the stack trace for the new cc list.  I see the crash with atop or
> similar walkers of /proc racing against exiting programs.  Given the NULL rip,
> this line from the patch is probably broken, but it really feels like we
> should be falling over on p->sched_class and not on the update_curr func.
> 
> +               p->sched_class->update_curr(rq);
> 
> I'm leaving my fork bomb running on two machines with the patch reverted to
> make sure.

The sched_class instances which do not have update_curr are stop_task
and idle. Patch below.

I'm sure nobody thought about the stats read code path here.

[ 1053.759741]  [<ffffffff81208348>] do_task_stat+0x8b8/0xb00

do_task_stat(()
 thread_group_cputime_adjusted()
   thread_group_cputime()
     task_cputime()
       task_sched_runtime()
	if (task_current(rq, p) && task_on_rq_queued(p)) {
                update_rq_clock(rq);
                p->sched_class->update_curr(rq);
        }

Now if the stats are read for a stomp machine task, aka 'migration/N'
and that task is current on its cpu. Ooops.

I added the callback for idle tasks as well for completeness sake.

Thanks,

	tglx
---
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index 67ad4e7f506a..c65dac8c97cd 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -75,6 +75,10 @@ static unsigned int get_rr_interval_idle(struct rq *rq, struct task_struct *task
 	return 0;
 }
 
+static void update_curr_idle(struct rq *rq)
+{
+}
+
 /*
  * Simple, special scheduling class for the per-CPU idle tasks:
  */
@@ -101,4 +105,5 @@ const struct sched_class idle_sched_class = {
 
 	.prio_changed		= prio_changed_idle,
 	.switched_to		= switched_to_idle,
+	.update_curr		= update_curr_idle,
 };
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 67426e529f59..79ffec45a6ac 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -102,6 +102,10 @@ get_rr_interval_stop(struct rq *rq, struct task_struct *task)
 	return 0;
 }
 
+static void update_curr_stop(struct rq *rq)
+{
+}
+
 /*
  * Simple, special scheduling class for the per-CPU stop tasks:
  */
@@ -128,4 +132,5 @@ const struct sched_class stop_sched_class = {
 
 	.prio_changed		= prio_changed_stop,
 	.switched_to		= switched_to_stop,
+	.update_curr		= update_curr_stop,
 };


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: New crashes walking proc with Saturday's git
  2014-11-23 21:05           ` Thomas Gleixner
@ 2014-11-23 21:11             ` Chris Mason
  2014-11-23 21:29               ` Chris Mason
  2014-11-23 21:38               ` Thomas Gleixner
  0 siblings, 2 replies; 15+ messages in thread
From: Chris Mason @ 2014-11-23 21:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Borislav Petkov, torvalds, linux-kernel, Ingo Molnar, Stanislaw Gruszka



On Sun, Nov 23, 2014 at 4:05 PM, Thomas Gleixner <tglx@linutronix.de> 
wrote:
> On Sun, 23 Nov 2014, Chris Mason wrote:
>>  On Sun, Nov 23, 2014 at 11:32 AM, Borislav Petkov <bp@alien8.de> 
>> wrote:
>>  > On Sun, Nov 23, 2014 at 11:16:51AM -0500, Chris Mason wrote:
>>  > >  It must be:
>>  > >
>>  > >  commit 6e998916dfe327e785e7c2447959b2c1a3ea4930
>>  > >  Author: Stanislaw Gruszka <sgruszka@redhat.com>
>>  > >  Date:   Wed Nov 12 16:58:44 2014 +0100
>>  > >
>>  > >     sched/cputime: Fix clock_nanosleep()/clock_gettime() 
>> inconsistency
>>  > >
>>  > >  I'll do two runs to confirm, but it's the only related patch 
>> between rc5
>>  > > and
>>  > >  now.
>> 
>>  I've adding Ingo and Stanislaw to the cc.  With
>>  6e998916dfe327e785e7c2447959b2c1a3ea4930 reverted, I'm no longer 
>> crashing.
>> 
>>  Repeating the stack trace for the new cc list.  I see the crash 
>> with atop or
>>  similar walkers of /proc racing against exiting programs.  Given 
>> the NULL rip,
>>  this line from the patch is probably broken, but it really feels 
>> like we
>>  should be falling over on p->sched_class and not on the update_curr 
>> func.
>> 
>>  +               p->sched_class->update_curr(rq);
>> 
>>  I'm leaving my fork bomb running on two machines with the patch 
>> reverted to
>>  make sure.
> 
> The sched_class instances which do not have update_curr are stop_task
> and idle. Patch below.
> 
> I'm sure nobody thought about the stats read code path here.
> 
> [ 1053.759741]  [<ffffffff81208348>] do_task_stat+0x8b8/0xb00
> 
> do_task_stat(()
>  thread_group_cputime_adjusted()
>    thread_group_cputime()
>      task_cputime()
>        task_sched_runtime()
> 	if (task_current(rq, p) && task_on_rq_queued(p)) {
>                 update_rq_clock(rq);
>                 p->sched_class->update_curr(rq);
>         }
> 
> Now if the stats are read for a stomp machine task, aka 'migration/N'
> and that task is current on its cpu. Ooops.
> 
> I added the callback for idle tasks as well for completeness sake.

This does make sense, but it doesn't match with the crash being much 
more likely during the fork bomb.  The difference is crashing within a 
few hours vs crashing within 5 minutes.

But, maybe I just got lucky.  I'll try the patch.

-chris




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: New crashes walking proc with Saturday's git
  2014-11-23 21:11             ` Chris Mason
@ 2014-11-23 21:29               ` Chris Mason
  2014-11-23 21:38                 ` Borislav Petkov
  2014-11-23 21:38               ` Thomas Gleixner
  1 sibling, 1 reply; 15+ messages in thread
From: Chris Mason @ 2014-11-23 21:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Borislav Petkov, torvalds, linux-kernel, Ingo Molnar, Stanislaw Gruszka

On Sun, Nov 23, 2014 at 4:11 PM, Chris Mason <clm@fb.com> wrote:
> 
> 
> On Sun, Nov 23, 2014 at 4:05 PM, Thomas Gleixner <tglx@linutronix.de> 
> wrote:
>> On Sun, 23 Nov 2014, Chris Mason wrote:
>>>  On Sun, Nov 23, 2014 at 11:32 AM, Borislav Petkov <bp@alien8.de> 
>>> wrote:
>>>  > On Sun, Nov 23, 2014 at 11:16:51AM -0500, Chris Mason wrote:
>>>  > >  It must be:
>>>  > >
>>>  > >  commit 6e998916dfe327e785e7c2447959b2c1a3ea4930
>>>  > >  Author: Stanislaw Gruszka <sgruszka@redhat.com>
>>>  > >  Date:   Wed Nov 12 16:58:44 2014 +0100
>>>  > >
>>>  > >     sched/cputime: Fix clock_nanosleep()/clock_gettime() 
>>> inconsistency
>>>  > >
>>>  > >  I'll do two runs to confirm, but it's the only related patch 
>>> between rc5
>>>  > > and
>>>  > >  now.
>>> 
>>>  I've adding Ingo and Stanislaw to the cc.  With
>>>  6e998916dfe327e785e7c2447959b2c1a3ea4930 reverted, I'm no longer 
>>> crashing.
>>> 
>>>  Repeating the stack trace for the new cc list.  I see the crash 
>>> with atop or
>>>  similar walkers of /proc racing against exiting programs.  Given 
>>> the NULL rip,
>>>  this line from the patch is probably broken, but it really feels 
>>> like we
>>>  should be falling over on p->sched_class and not on the 
>>> update_curr func.
>>> 
>>>  +               p->sched_class->update_curr(rq);
>>> 
>>>  I'm leaving my fork bomb running on two machines with the patch 
>>> reverted to
>>>  make sure.
>> 
>> The sched_class instances which do not have update_curr are stop_task
>> and idle. Patch below.
>> 
>> I'm sure nobody thought about the stats read code path here.
>> 
>> [ 1053.759741]  [<ffffffff81208348>] do_task_stat+0x8b8/0xb00
>> 
>> do_task_stat(()
>>  thread_group_cputime_adjusted()
>>    thread_group_cputime()
>>      task_cputime()
>>        task_sched_runtime()
>> 	if (task_current(rq, p) && task_on_rq_queued(p)) {
>>                 update_rq_clock(rq);
>>                 p->sched_class->update_curr(rq);
>>         }
>> 
>> Now if the stats are read for a stomp machine task, aka 'migration/N'
>> and that task is current on its cpu. Ooops.
>> 
>> I added the callback for idle tasks as well for completeness sake.
> 
> This does make sense, but it doesn't match with the crash being much 
> more likely during the fork bomb.  The difference is crashing within 
> a few hours vs crashing within 5 minutes.
> 
> But, maybe I just got lucky.  I'll try the patch.

11 minutes later and it's still alive.  I'll keep an eye on it and yell 
if it falls over.

-chris




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: New crashes walking proc with Saturday's git
  2014-11-23 21:11             ` Chris Mason
  2014-11-23 21:29               ` Chris Mason
@ 2014-11-23 21:38               ` Thomas Gleixner
  2014-11-23 21:42                 ` Chris Mason
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2014-11-23 21:38 UTC (permalink / raw)
  To: Chris Mason
  Cc: Borislav Petkov, torvalds, linux-kernel, Ingo Molnar, Stanislaw Gruszka

On Sun, 23 Nov 2014, Chris Mason wrote:
> On Sun, Nov 23, 2014 at 4:05 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Sun, 23 Nov 2014, Chris Mason wrote:
> > >  On Sun, Nov 23, 2014 at 11:32 AM, Borislav Petkov <bp@alien8.de> wrote:
> > >  > On Sun, Nov 23, 2014 at 11:16:51AM -0500, Chris Mason wrote:
> > >  > >  It must be:
> > >  > >
> > >  > >  commit 6e998916dfe327e785e7c2447959b2c1a3ea4930
> > >  > >  Author: Stanislaw Gruszka <sgruszka@redhat.com>
> > >  > >  Date:   Wed Nov 12 16:58:44 2014 +0100
> > >  > >
> > >  > >     sched/cputime: Fix clock_nanosleep()/clock_gettime()
> > > inconsistency
> > >  > >
> > >  > >  I'll do two runs to confirm, but it's the only related patch between
> > > rc5
> > >  > > and
> > >  > >  now.
> > > 
> > >  I've adding Ingo and Stanislaw to the cc.  With
> > >  6e998916dfe327e785e7c2447959b2c1a3ea4930 reverted, I'm no longer
> > > crashing.
> > > 
> > >  Repeating the stack trace for the new cc list.  I see the crash with atop
> > > or
> > >  similar walkers of /proc racing against exiting programs.  Given the NULL
> > > rip,
> > >  this line from the patch is probably broken, but it really feels like we
> > >  should be falling over on p->sched_class and not on the update_curr func.
> > > 
> > >  +               p->sched_class->update_curr(rq);
> > > 
> > >  I'm leaving my fork bomb running on two machines with the patch reverted
> > > to
> > >  make sure.
> > 
> > The sched_class instances which do not have update_curr are stop_task
> > and idle. Patch below.
> > 
> > I'm sure nobody thought about the stats read code path here.
> > 
> > [ 1053.759741]  [<ffffffff81208348>] do_task_stat+0x8b8/0xb00
> > 
> > do_task_stat(()
> >  thread_group_cputime_adjusted()
> >    thread_group_cputime()
> >      task_cputime()
> >        task_sched_runtime()
> > 	if (task_current(rq, p) && task_on_rq_queued(p)) {
> >                 update_rq_clock(rq);
> >                 p->sched_class->update_curr(rq);
> >         }
> > 
> > Now if the stats are read for a stomp machine task, aka 'migration/N'
> > and that task is current on its cpu. Ooops.
> > 
> > I added the callback for idle tasks as well for completeness sake.
> 
> This does make sense, but it doesn't match with the crash being much more
> likely during the fork bomb.  The difference is crashing within a few hours vs
> crashing within 5 minutes.

The fork bomb will kick the migration task pretty often into life, so
the probablity of do_task_stat() to hit a running migration thread is
higher than on a normaly loaded machine.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: New crashes walking proc with Saturday's git
  2014-11-23 21:29               ` Chris Mason
@ 2014-11-23 21:38                 ` Borislav Petkov
  0 siblings, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2014-11-23 21:38 UTC (permalink / raw)
  To: Chris Mason
  Cc: Thomas Gleixner, torvalds, linux-kernel, Ingo Molnar, Stanislaw Gruszka

On Sun, Nov 23, 2014 at 04:29:53PM -0500, Chris Mason wrote:
> 11 minutes later and it's still alive. I'll keep an eye on it and yell
> if it falls over.

Ditto. Box here suspends and resumes fine even while the fork bomb is
running. I'll watch it the next days just in case though.

Reported-and-tested-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: New crashes walking proc with Saturday's git
  2014-11-23 21:38               ` Thomas Gleixner
@ 2014-11-23 21:42                 ` Chris Mason
  2014-11-23 22:04                   ` [PATCH] sched: Provide update_curr callbacks for stop/idle scheduling classes Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Mason @ 2014-11-23 21:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Borislav Petkov, torvalds, linux-kernel, Ingo Molnar, Stanislaw Gruszka

On Sun, Nov 23, 2014 at 4:38 PM, Thomas Gleixner <tglx@linutronix.de> 
wrote:
> On Sun, 23 Nov 2014, Chris Mason wrote:
>>  On Sun, Nov 23, 2014 at 4:05 PM, Thomas Gleixner 
>> <tglx@linutronix.de> wrote:
>>  > On Sun, 23 Nov 2014, Chris Mason wrote:
>>  > >  On Sun, Nov 23, 2014 at 11:32 AM, Borislav Petkov 
>> <bp@alien8.de> wrote:
>>  > >  > On Sun, Nov 23, 2014 at 11:16:51AM -0500, Chris Mason wrote:
>>  > >  > >  It must be:
>>  > >  > >
>>  > >  > >  commit 6e998916dfe327e785e7c2447959b2c1a3ea4930
>>  > >  > >  Author: Stanislaw Gruszka <sgruszka@redhat.com>
>>  > >  > >  Date:   Wed Nov 12 16:58:44 2014 +0100
>>  > >  > >
>>  > >  > >     sched/cputime: Fix clock_nanosleep()/clock_gettime()
>>  > > inconsistency
>>  > >  > >
>>  > >  > >  I'll do two runs to confirm, but it's the only related 
>> patch between
>>  > > rc5
>>  > >  > > and
>>  > >  > >  now.
>>  > >
>>  > >  I've adding Ingo and Stanislaw to the cc.  With
>>  > >  6e998916dfe327e785e7c2447959b2c1a3ea4930 reverted, I'm no 
>> longer
>>  > > crashing.
>>  > >
>>  > >  Repeating the stack trace for the new cc list.  I see the 
>> crash with atop
>>  > > or
>>  > >  similar walkers of /proc racing against exiting programs.  
>> Given the NULL
>>  > > rip,
>>  > >  this line from the patch is probably broken, but it really 
>> feels like we
>>  > >  should be falling over on p->sched_class and not on the 
>> update_curr func.
>>  > >
>>  > >  +               p->sched_class->update_curr(rq);
>>  > >
>>  > >  I'm leaving my fork bomb running on two machines with the 
>> patch reverted
>>  > > to
>>  > >  make sure.
>>  >
>>  > The sched_class instances which do not have update_curr are 
>> stop_task
>>  > and idle. Patch below.
>>  >
>>  > I'm sure nobody thought about the stats read code path here.
>>  >
>>  > [ 1053.759741]  [<ffffffff81208348>] do_task_stat+0x8b8/0xb00
>>  >
>>  > do_task_stat(()
>>  >  thread_group_cputime_adjusted()
>>  >    thread_group_cputime()
>>  >      task_cputime()
>>  >        task_sched_runtime()
>>  > 	if (task_current(rq, p) && task_on_rq_queued(p)) {
>>  >                 update_rq_clock(rq);
>>  >                 p->sched_class->update_curr(rq);
>>  >         }
>>  >
>>  > Now if the stats are read for a stomp machine task, aka 
>> 'migration/N'
>>  > and that task is current on its cpu. Ooops.
>>  >
>>  > I added the callback for idle tasks as well for completeness sake.
>> 
>>  This does make sense, but it doesn't match with the crash being 
>> much more
>>  likely during the fork bomb.  The difference is crashing within a 
>> few hours vs
>>  crashing within 5 minutes.
> 
> The fork bomb will kick the migration task pretty often into life, so
> the probablity of do_task_stat() to hit a running migration thread is
> higher than on a normaly loaded machine.

Fair enough, I just had crashes_in_proc == races_with_exit stuck in my 
head ;)  I've got a new xfstests run on the second machine to be sure, 
but this is definitely better.

-chris




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] sched: Provide update_curr callbacks for stop/idle scheduling classes
  2014-11-23 21:42                 ` Chris Mason
@ 2014-11-23 22:04                   ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2014-11-23 22:04 UTC (permalink / raw)
  To: Chris Mason
  Cc: Borislav Petkov, Linus Torvalds, LKML, Ingo Molnar,
	Stanislaw Gruszka, Peter Zijlstra

Chris bisected a NULL pointer deference in task_sched_runtime() to
commit 6e998916dfe3 'sched/cputime: Fix clock_nanosleep()/clock_gettime()
inconsistency'.

Chris observed crashes in atop or other /proc walking programs when he
started fork bombs on his machine. He assumed that this is a new exit
race, but that does not make any sense when looking at that commit.

What's interesting is that, the commit provides update_curr callbacks
for all scheduling classes except stop_task and idle_task.

While nothing can ever hit that via the clock_nanosleep() and
clock_gettime() interfaces, which have been the target of the commit
in question, the author obviously forgot, that there are other code
pathes which invoke task_sched_runtime()

do_task_stat(()
 thread_group_cputime_adjusted()
   thread_group_cputime()
     task_cputime()
       task_sched_runtime()
        if (task_current(rq, p) && task_on_rq_queued(p)) {
          update_rq_clock(rq);
          up->sched_class->update_curr(rq);
        }

If the stats are read for a stomp machine task, aka 'migration/N' and
that task is current on its cpu, this will happily call the NULL
pointer of stop_task->update_curr. Ooops.

Chris observation that this happens faster when he runs the fork bomb
makes sense as the fork bomb will kick migration threads more often so
the probability to hit the issue will increase.

Add the missing update_curr callbacks to the scheduler classes
stop_task and idle_task. While idle tasks cannot be monitored via
/proc we have other means to hit the idle case.

Fixes: 6e998916dfe3 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency'
Reported-by: Chris Mason <clm@fb.com>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>

---
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index 67ad4e7f506a..c65dac8c97cd 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -75,6 +75,10 @@ static unsigned int get_rr_interval_idle(struct rq *rq, struct task_struct *task
 	return 0;
 }
 
+static void update_curr_idle(struct rq *rq)
+{
+}
+
 /*
  * Simple, special scheduling class for the per-CPU idle tasks:
  */
@@ -101,4 +105,5 @@ const struct sched_class idle_sched_class = {
 
 	.prio_changed		= prio_changed_idle,
 	.switched_to		= switched_to_idle,
+	.update_curr		= update_curr_idle,
 };
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 67426e529f59..79ffec45a6ac 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -102,6 +102,10 @@ get_rr_interval_stop(struct rq *rq, struct task_struct *task)
 	return 0;
 }
 
+static void update_curr_stop(struct rq *rq)
+{
+}
+
 /*
  * Simple, special scheduling class for the per-CPU stop tasks:
  */
@@ -128,4 +132,5 @@ const struct sched_class stop_sched_class = {
 
 	.prio_changed		= prio_changed_stop,
 	.switched_to		= switched_to_stop,
+	.update_curr		= update_curr_stop,
 };

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-11-23 22:04 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-23 15:02 New crashes walking proc with Saturday's git Chris Mason
2014-11-23 15:56 ` Chris Mason
2014-11-23 16:11   ` Borislav Petkov
2014-11-23 16:16     ` Chris Mason
2014-11-23 16:32       ` Borislav Petkov
2014-11-23 16:36         ` Chris Mason
2014-11-23 16:49         ` Chris Mason
2014-11-23 16:59           ` Borislav Petkov
2014-11-23 21:05           ` Thomas Gleixner
2014-11-23 21:11             ` Chris Mason
2014-11-23 21:29               ` Chris Mason
2014-11-23 21:38                 ` Borislav Petkov
2014-11-23 21:38               ` Thomas Gleixner
2014-11-23 21:42                 ` Chris Mason
2014-11-23 22:04                   ` [PATCH] sched: Provide update_curr callbacks for stop/idle scheduling classes Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.