linux-kernel.vger.kernel.org archive mirror
* -next20181010 regression: thinkpad x60 (32 bit) dies during boot.
@ 2018-10-10 19:59 Pavel Machek
  2018-10-10 20:03 ` Pavel Machek
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Machek @ 2018-10-10 19:59 UTC (permalink / raw)
  To: kernel list, tglx, mingo, bp, hpa, x86

Hi!

I updated to today's next... and boot crashes with

..
Call Trace:
kick_ilb
trigger_load_balance
? active_load..
scheduler_tick
update_process_times
tick_nohz_handler

-next20181005 worked ok.

Tell me if x86-32 works for you...

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: -next20181010 regression: thinkpad x60 (32 bit) dies during boot.
  2018-10-10 19:59 -next20181010 regression: thinkpad x60 (32 bit) dies during boot Pavel Machek
@ 2018-10-10 20:03 ` Pavel Machek
  2018-10-11 18:03   ` -next20181010,1011 " Pavel Machek
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Machek @ 2018-10-10 20:03 UTC (permalink / raw)
  To: kernel list, tglx, mingo, bp, hpa, x86

Hi!

> I updated to today's next... and boot crashes with
> 
> ..
> Call Trace:
> kick_ilb
> trigger_load_balance
> ? active_load..
> scheduler_tick
> update_process_times
> tick_nohz_handler
> 
> -next20181005 worked ok.

The backtrace indicates a problem with nohz, so I added idle=poll.

Now I have

Run /sbin/init as init process
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
in: pgd_alloc...
...
dump_stack
....
pgd_alloc
mm_init.isra.
mm_alloc
__do_execve_file
do_execve
...

Good ideas welcome :-).

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* -next20181010,1011 regression: thinkpad x60 (32 bit) dies during boot.
  2018-10-10 20:03 ` Pavel Machek
@ 2018-10-11 18:03   ` Pavel Machek
  2018-10-11 20:09     ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Machek @ 2018-10-11 18:03 UTC (permalink / raw)
  To: kernel list, tglx, mingo, bp, hpa, x86

On Wed 2018-10-10 22:03:32, Pavel Machek wrote:
> Hi!
> 
> > I updated to today's next... and boot crashes with
> > 
> > ..
> > Call Trace:
> > kick_ilb
> > trigger_load_balance
> > ? active_load..
> > scheduler_tick
> > update_process_times
> > tick_nohz_handler
> > 
> > -next20181005 worked ok.

Problem is still there in today's next.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: -next20181010,1011 regression: thinkpad x60 (32 bit) dies during boot.
  2018-10-11 18:03   ` -next20181010,1011 " Pavel Machek
@ 2018-10-11 20:09     ` Thomas Gleixner
  2018-10-12 10:24       ` Pavel Machek
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2018-10-11 20:09 UTC (permalink / raw)
  To: Pavel Machek; +Cc: kernel list, mingo, bp, hpa, x86

On Thu, 11 Oct 2018, Pavel Machek wrote:

> On Wed 2018-10-10 22:03:32, Pavel Machek wrote:
> > Hi!
> > 
> > > I updated to today's next... and boot crashes with
> > > 
> > > ..
> > > Call Trace:
> > > kick_ilb
> > > trigger_load_balance
> > > ? active_load..
> > > scheduler_tick
> > > update_process_times
> > > tick_nohz_handler
> > > 
> > > -next20181005 worked ok.
> 
> Problem is still there in today's next.

So what came in between -next20181005 and the first bad one? kernel/sched/*
being the first place to look at.
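
One way to answer that question is a path-limited `git log` between the two snapshots. The sketch below is illustrative only: it builds a throwaway repository with made-up tags (`good-tag`, `bad-tag`) and made-up file names instead of using a real linux-next checkout. In linux-next the daily tags normally follow the `next-YYYYMMDD` pattern, so the equivalent query would presumably be `git log --oneline next-20181005..next-20181010 -- kernel/sched/`.

```shell
#!/bin/sh
# Toy demonstration: list only the commits that touch one directory
# between two tags. All names here are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name demo

mkdir -p kernel/sched arch/x86
echo 0 > kernel/sched/fair.c
echo 0 > arch/x86/pgtable.c
git add .
git commit -q -m "base"
git tag good-tag

# One commit outside kernel/sched/, one inside it.
echo 1 > arch/x86/pgtable.c
git commit -q -am "x86: touch pgtable"
echo 1 > kernel/sched/fair.c
git commit -q -am "sched: touch fair"
git tag bad-tag

# good-tag..bad-tag selects commits reachable from bad-tag but not from
# good-tag; the trailing path restricts the output to that directory.
sched_commits=$(git log --format=%s good-tag..bad-tag -- kernel/sched/)
echo "$sched_commits"
```

Only the scheduler commit is listed; the x86 commit is filtered out by the path argument.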

Thanks,

	tglx

* Re: -next20181010,1011 regression: thinkpad x60 (32 bit) dies during boot.
  2018-10-11 20:09     ` Thomas Gleixner
@ 2018-10-12 10:24       ` Pavel Machek
  2018-10-12 10:52         ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Machek @ 2018-10-12 10:24 UTC (permalink / raw)
  To: Thomas Gleixner, sfr; +Cc: kernel list, mingo, bp, hpa, x86

Hi!

> > > > I updated to today's next... and boot crashes with
> > > > 
> > > > ..
> > > > Call Trace:
> > > > kick_ilb
> > > > trigger_load_balance
> > > > ? active_load..
> > > > scheduler_tick
> > > > update_process_times
> > > > tick_nohz_handler
> > > > 
> > > > -next20181005 worked ok.
> > 
> > Problem is still there in today's next.
> 
> So what came in between -next20181005 and the first bad one? kernel/sched/*
> being the first place to look at.

kernel/sched does not seem to contain anything too scary.

I know that -next20181005 works ok, and I know -next20181010 is
bad. Is there an easy way to bisect using that information? I can do
bisect between -next and mainline, but that's a lot of patches and
thus not much fun :-(.
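
For what it is worth, `git bisect start` accepts any two revisions, so a known-good and a known-bad tag can be passed directly. When the good revision is not an ancestor of the bad one (likely with linux-next, since the tree is rebuilt for each release), git tests the merge base first. The sketch below shows the mechanics on a throwaway repository; the tags `known-good` and `known-bad`, the file `broken`, and the test command are all hypothetical stand-ins. In a linux-next checkout the start command would presumably be `git bisect start next-20181010 next-20181005`.

```shell
#!/bin/sh
# Toy demonstration of bisecting between a known-bad and a known-good tag.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name demo

# Eight commits; commit 5 introduces the "regression" (a file named "broken").
i=1
while [ "$i" -le 8 ]; do
    echo "$i" > version
    if [ "$i" -eq 5 ]; then
        echo oops > broken
    fi
    git add -A
    git commit -q -m "commit $i"
    if [ "$i" -eq 2 ]; then git tag known-good; fi
    if [ "$i" -eq 8 ]; then git tag known-bad; fi
    i=$((i + 1))
done

# git bisect start takes the bad revision first, then the good one.
git bisect start known-bad known-good >/dev/null
# The test command exits 0 for "good" and 1 for "bad".
git bisect run sh -c '! test -e broken' >/dev/null
# refs/bisect/bad now points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad: $first_bad"
```

For a boot failure the test cannot be scripted this easily, so each step would be marked by hand with `git bisect good` or `git bisect bad` after booting the kernel.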

In the meantime, I reproduced the failure with T40p. Is there someone
with working x86-32 in -next?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: -next20181010,1011 regression: thinkpad x60 (32 bit) dies during boot.
  2018-10-12 10:24       ` Pavel Machek
@ 2018-10-12 10:52         ` Ingo Molnar
  2018-10-12 12:35           ` Pavel Machek
  2018-10-12 18:10           ` Avoid VLA in pgd_alloc kills boot on 32-bit machines was " Pavel Machek
  0 siblings, 2 replies; 11+ messages in thread
From: Ingo Molnar @ 2018-10-12 10:52 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Thomas Gleixner, sfr, kernel list, mingo, bp, hpa, x86


* Pavel Machek <pavel@ucw.cz> wrote:

> Hi!
> 
> > > > > I updated to today's next... and boot crashes with
> > > > > 
> > > > > ..
> > > > > Call Trace:
> > > > > kick_ilb
> > > > > trigger_load_balance
> > > > > ? active_load..
> > > > > scheduler_tick
> > > > > update_process_times
> > > > > tick_nohz_handler
> > > > > 
> > > > > -next20181005 worked ok.
> > > 
> > > Problem is still there in today's next.
> > 
> > So what came in between -next20181005 and the first bad one? kernel/sched/*
> > being the first place to look at.
> 
> kernel/sched does not seem to contain anything too scary.
> 
> I know that -next20181005 works ok, and I know -next20181010 is
> bad. Is there an easy way to bisect using that information? I can do
> bisect between -next and mainline, but that's a lot of patches and
> thus not much fun :-(.
> 
> In the meantime, I reproduced the failure with T40p. Is there someone
> with working x86-32 in -next?

Does latest -tip fail too? If yes then I suspect bisection would be needed.

Thanks,

	Ingo

* Re: -next20181010,1011 regression: thinkpad x60 (32 bit) dies during boot.
  2018-10-12 10:52         ` Ingo Molnar
@ 2018-10-12 12:35           ` Pavel Machek
  2018-10-12 18:10           ` Avoid VLA in pgd_alloc kills boot on 32-bit machines was " Pavel Machek
  1 sibling, 0 replies; 11+ messages in thread
From: Pavel Machek @ 2018-10-12 12:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, sfr, kernel list, mingo, bp, hpa, x86

Hi!

> > > > Problem is still there in today's next.
> > > 
> > > So what came in between -next20181005 and the first bad one? kernel/sched/*
> > > being the first place to look at.
> > 
> > kernel/sched does not seem to contain anything too scary.
> > 
> > I know that -next20181005 works ok, and I know -next20181010 is
> > bad. Is there an easy way to bisect using that information? I can do
> > bisect between -next and mainline, but that's a lot of patches and
> > thus not much fun :-(.
> > 
> > In the meantime, I reproduced the failure with T40p. Is there someone
> > with working x86-32 in -next?
> 
> Does latest -tip fail too? If yes then I suspect bisection would be needed.

I already started bisect on -next (T40p is my test machine, so bisect
is not that bad).

The log so far is:

								Pavel

# bad: [771b65e89c8a51d611b8049718693a4202e4f732] Add linux-next specific files for 20181011
# good: [7876320f88802b22d4e2daf7eb027dd14175a0f8] Linux 4.19-rc4
git bisect start '771b65e89c8a51d611b8049718693a4202e4f732' '7876320f88802b22d4e2daf7eb027dd14175a0f8'
# good: [43faff25da004eabce691268da34065b3690f5ca] Merge remote-tracking branch 'net-next/master'
git bisect good 43faff25da004eabce691268da34065b3690f5ca
# good: [3e2beb7db82a880319aa2f0dcafa820f3f5206d3] Merge remote-tracking branch 'spi/for-next'
git bisect good 3e2beb7db82a880319aa2f0dcafa820f3f5206d3
# bad: [74411e5fd30ae540491c4d6142af6ee6b2b22f09] Merge remote-tracking branch 'char-misc/char-misc-next'
git bisect bad 74411e5fd30ae540491c4d6142af6ee6b2b22f09
# bad: [c810d907775aa2aa753e836a122613fd2416b14d] Merge remote-tracking branch 'kvm/linux-next'
git bisect bad c810d907775aa2aa753e836a122613fd2416b14d
# good: [fac07d2ba7b2764e3002ff9bc7861742a84a2ef6] Merge branch 'perf/core'
git bisect good fac07d2ba7b2764e3002ff9bc7861742a84a2ef6
# bad: [d74865bd3996c7a6f3e8ce6e626c1fe474e39494] Merge branch 'x86/mm'
git bisect bad d74865bd3996c7a6f3e8ce6e626c1fe474e39494
# good: [dcd2d0cece1608b2be9184786c900807ec947076] Merge branch 'x86/asm'
git bisect good dcd2d0cece1608b2be9184786c900807ec947076


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Avoid VLA in pgd_alloc kills boot on 32-bit machines was Re: -next20181010,1011 regression: thinkpad x60 (32 bit) dies during boot.
  2018-10-12 10:52         ` Ingo Molnar
  2018-10-12 12:35           ` Pavel Machek
@ 2018-10-12 18:10           ` Pavel Machek
  2018-10-12 18:13             ` Borislav Petkov
  2018-10-12 18:22             ` Pavel Machek
  1 sibling, 2 replies; 11+ messages in thread
From: Pavel Machek @ 2018-10-12 18:10 UTC (permalink / raw)
  To: Ingo Molnar, arnd, akpm, luto, dave.hansen, jroedel, keescook,
	torvalds, toshi.kani
  Cc: Thomas Gleixner, sfr, kernel list, mingo, bp, hpa, x86

Hi!

> > > So what came in between -next20181005 and the first bad one? kernel/sched/*
> > > being the first place to look at.
> > 
> > kernel/sched does not seem to contain anything too scary.
> > 
> > I know that -next20181005 works ok, and I know -next20181010 is
> > bad. Is there an easy way to bisect using that information? I can do
> > bisect between -next and mainline, but that's a lot of patches and
> > thus not much fun :-(.
> > 
> > In the meantime, I reproduced the failure with T40p. Is there someone
> > with working x86-32 in -next?
> 
> Does latest -tip fail too? If yes then I suspect bisection would be needed.

And the winner is...

[1be3f247c2882a82279cbcf43717581ea943b692] x86/mm: Avoid VLA in
pgd_alloc()

"Kernel stack is corrupted in: pgd_alloc" panic kind of suggests this
is the right commit.


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: Avoid VLA in pgd_alloc kills boot on 32-bit machines was Re: -next20181010,1011 regression: thinkpad x60 (32 bit) dies during boot.
  2018-10-12 18:10           ` Avoid VLA in pgd_alloc kills boot on 32-bit machines was " Pavel Machek
@ 2018-10-12 18:13             ` Borislav Petkov
  2018-10-12 18:57               ` Pavel Machek
  2018-10-12 18:22             ` Pavel Machek
  1 sibling, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2018-10-12 18:13 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Ingo Molnar, arnd, akpm, luto, dave.hansen, jroedel, keescook,
	torvalds, toshi.kani, Thomas Gleixner, sfr, kernel list, mingo,
	hpa, x86

On Fri, Oct 12, 2018 at 08:10:11PM +0200, Pavel Machek wrote:
> And the winner is...
> 
> [1be3f247c2882a82279cbcf43717581ea943b692] x86/mm: Avoid VLA in
> pgd_alloc()

That should be fixed now:

https://git.kernel.org/tip/184d47f0fd365108bd06ab26cdb3450b716269fd

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

* Re: Avoid VLA in pgd_alloc kills boot on 32-bit machines was Re: -next20181010,1011 regression: thinkpad x60 (32 bit) dies during boot.
  2018-10-12 18:10           ` Avoid VLA in pgd_alloc kills boot on 32-bit machines was " Pavel Machek
  2018-10-12 18:13             ` Borislav Petkov
@ 2018-10-12 18:22             ` Pavel Machek
  1 sibling, 0 replies; 11+ messages in thread
From: Pavel Machek @ 2018-10-12 18:22 UTC (permalink / raw)
  To: Ingo Molnar, arnd, akpm, luto, dave.hansen, jroedel, keescook,
	torvalds, toshi.kani
  Cc: Thomas Gleixner, sfr, kernel list, mingo, bp, hpa, x86


On Fri 2018-10-12 20:10:11, Pavel Machek wrote:
> Hi!
> 
> > > > So what came in between -next20181005 and the first bad one? kernel/sched/*
> > > > being the first place to look at.
> > > 
> > > kernel/sched does not seem to contain anything too scary.
> > > 
> > > I know that -next20181005 works ok, and I know -next20181010 is
> > > bad. Is there an easy way to bisect using that information? I can do
> > > bisect between -next and mainline, but that's a lot of patches and
> > > thus not much fun :-(.
> > > 
> > > In the meantime, I reproduced the failure with T40p. Is there someone
> > > with working x86-32 in -next?
> > 
> > Does latest -tip fail too? If yes then I suspect bisection would be needed.
> 
> And the winner is...
> 
> [1be3f247c2882a82279cbcf43717581ea943b692] x86/mm: Avoid VLA in
> pgd_alloc()
> 
> "Kernel stack is corrupted in: pgd_alloc" panic kind of suggests this
> is the right commit.

The git bisect log, for reference... and (roughly) my config.

									Pavel

# bad: [771b65e89c8a51d611b8049718693a4202e4f732] Add linux-next specific files for 20181011
# good: [7876320f88802b22d4e2daf7eb027dd14175a0f8] Linux 4.19-rc4
git bisect start '771b65e89c8a51d611b8049718693a4202e4f732' '7876320f88802b22d4e2daf7eb027dd14175a0f8'
# good: [43faff25da004eabce691268da34065b3690f5ca] Merge remote-tracking branch 'net-next/master'
git bisect good 43faff25da004eabce691268da34065b3690f5ca
# good: [3e2beb7db82a880319aa2f0dcafa820f3f5206d3] Merge remote-tracking branch 'spi/for-next'
git bisect good 3e2beb7db82a880319aa2f0dcafa820f3f5206d3
# bad: [74411e5fd30ae540491c4d6142af6ee6b2b22f09] Merge remote-tracking branch 'char-misc/char-misc-next'
git bisect bad 74411e5fd30ae540491c4d6142af6ee6b2b22f09
# bad: [c810d907775aa2aa753e836a122613fd2416b14d] Merge remote-tracking branch 'kvm/linux-next'
git bisect bad c810d907775aa2aa753e836a122613fd2416b14d
# good: [fac07d2ba7b2764e3002ff9bc7861742a84a2ef6] Merge branch 'perf/core'
git bisect good fac07d2ba7b2764e3002ff9bc7861742a84a2ef6
# bad: [d74865bd3996c7a6f3e8ce6e626c1fe474e39494] Merge branch 'x86/mm'
git bisect bad d74865bd3996c7a6f3e8ce6e626c1fe474e39494
# good: [dcd2d0cece1608b2be9184786c900807ec947076] Merge branch 'x86/asm'
git bisect good dcd2d0cece1608b2be9184786c900807ec947076
# bad: [ae9260d80e517c8702b91b8e00d117e1e2834c33] Merge branch 'x86/cache'
git bisect bad ae9260d80e517c8702b91b8e00d117e1e2834c33
# good: [d5a581d84ae6b8a4a740464b80d8d9cf1e7947b2] x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs
git bisect good d5a581d84ae6b8a4a740464b80d8d9cf1e7947b2
# good: [245e5707dd7df01428459d97a9121f14a57dac6b] Merge branch 'x86/build'
git bisect good 245e5707dd7df01428459d97a9121f14a57dac6b
# bad: [7d27cb68cc307ee103e116d357e9baca35151c55] Merge branch 'x86/urgent' into x86/cache
git bisect bad 7d27cb68cc307ee103e116d357e9baca35151c55
# good: [2cc81c6992248ea37d0241bc325977bab310bc3b] x86/intel_rdt: Show missing resctrl mount options
git bisect good 2cc81c6992248ea37d0241bc325977bab310bc3b
# bad: [e8bd1803aec89dfce5758d88022963fe3248bc4c] x86/intel_rdt: Fix out-of-bounds memory access in CBM tests
git bisect bad e8bd1803aec89dfce5758d88022963fe3248bc4c
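
A log in this format can be saved to a file and fed to `git bisect replay` to restore the bisection state in another checkout. As a small self-contained sketch, the snippet below stores the first few entries of the log above in a temporary file and counts the verdicts recorded so far. The variable names are illustrative, and the replay step is left as a comment because it needs a tree that contains these commits.

```shell
#!/bin/sh
# Save an excerpt of the bisect log (lines copied from the message above).
log=$(mktemp)
cat > "$log" <<'EOF'
# bad: [771b65e89c8a51d611b8049718693a4202e4f732] Add linux-next specific files for 20181011
# good: [7876320f88802b22d4e2daf7eb027dd14175a0f8] Linux 4.19-rc4
git bisect start '771b65e89c8a51d611b8049718693a4202e4f732' '7876320f88802b22d4e2daf7eb027dd14175a0f8'
# good: [43faff25da004eabce691268da34065b3690f5ca] Merge remote-tracking branch 'net-next/master'
git bisect good 43faff25da004eabce691268da34065b3690f5ca
# good: [3e2beb7db82a880319aa2f0dcafa820f3f5206d3] Merge remote-tracking branch 'spi/for-next'
git bisect good 3e2beb7db82a880319aa2f0dcafa820f3f5206d3
# bad: [74411e5fd30ae540491c4d6142af6ee6b2b22f09] Merge remote-tracking branch 'char-misc/char-misc-next'
git bisect bad 74411e5fd30ae540491c4d6142af6ee6b2b22f09
EOF

# Count the verdict commands; the "# good:" / "# bad:" comment lines are
# skipped because the patterns are anchored to lines starting with "git".
good_count=$(grep -c '^git bisect good' "$log")
bad_count=$(grep -c '^git bisect bad' "$log")
echo "good: $good_count bad: $bad_count"

# In a checkout that contains these commits, the state could be restored with:
#   git bisect replay "$log"
```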


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #1.2: config.gz --]
[-- Type: application/gzip, Size: 27906 bytes --]

* Re: Avoid VLA in pgd_alloc kills boot on 32-bit machines was Re: -next20181010,1011 regression: thinkpad x60 (32 bit) dies during boot.
  2018-10-12 18:13             ` Borislav Petkov
@ 2018-10-12 18:57               ` Pavel Machek
  0 siblings, 0 replies; 11+ messages in thread
From: Pavel Machek @ 2018-10-12 18:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, arnd, akpm, luto, dave.hansen, jroedel, keescook,
	torvalds, toshi.kani, Thomas Gleixner, sfr, kernel list, mingo,
	hpa, x86

On Fri 2018-10-12 20:13:35, Borislav Petkov wrote:
> On Fri, Oct 12, 2018 at 08:10:11PM +0200, Pavel Machek wrote:
> > And the winner is...
> > 
> > [1be3f247c2882a82279cbcf43717581ea943b692] x86/mm: Avoid VLA in
> > pgd_alloc()
> 
> That should be fixed now:
> 
> https://git.kernel.org/tip/184d47f0fd365108bd06ab26cdb3450b716269fd

Aha, ok.

-next20181012 indeed works.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Thread overview: 11+ messages
2018-10-10 19:59 -next20181010 regression: thinkpad x60 (32 bit) dies during boot Pavel Machek
2018-10-10 20:03 ` Pavel Machek
2018-10-11 18:03   ` -next20181010,1011 " Pavel Machek
2018-10-11 20:09     ` Thomas Gleixner
2018-10-12 10:24       ` Pavel Machek
2018-10-12 10:52         ` Ingo Molnar
2018-10-12 12:35           ` Pavel Machek
2018-10-12 18:10           ` Avoid VLA in pgd_alloc kills boot on 32-bit machines was " Pavel Machek
2018-10-12 18:13             ` Borislav Petkov
2018-10-12 18:57               ` Pavel Machek
2018-10-12 18:22             ` Pavel Machek
