All of lore.kernel.org
 help / color / mirror / Atom feed
* Bisected KVM hang on x86-32 between v3.12 and v3.13
@ 2014-04-06 15:19 Michele Ballabio
  2014-04-06 15:52 ` Toralf Förster
  2014-04-07 15:03 ` Peter Zijlstra
  0 siblings, 2 replies; 23+ messages in thread
From: Michele Ballabio @ 2014-04-06 15:19 UTC (permalink / raw)
  To: linux-kernel; +Cc: toralf.foerster, fweisbec, mingo, peterz

Toralf Förster reported this in
  http://article.gmane.org/gmane.linux.kernel/1662567
  http://article.gmane.org/gmane.linux.kernel/1658422
  http://article.gmane.org/gmane.linux.kernel/1657962

  "The issue happens here at a 32 bit stable Gentoo Linux if
   I try to start a KVM image. Kernels 3.12.X works fine,
   kernel >= v3.13 will hang shortly after I started the image
   with the virtual-manager. The last syslog messages are
   something like:
   Feb 28 16:22:00 n22 kernel: INFO: rcu_sched detected stalls
       on CPUs/tasks: {} (detected by 2, t=60002 jiffies,
       g=14689, c=14688, q=21051)
   Feb 28 16:22:00 n22 kernel: INFO: Stall ended before state
       dump start"

He correctly pointed out that the bisection blamed the merge
commit 37bf06375c90a42fe07b9bebdb07bc316ae5a0ce
"Merge tag 'v3.12-rc4' into sched/core".

This bug is obviously caused by at least two patches, one
on each side of the merge, that only when combined together
(at that merge point) cause the bug in kvm. By rebasing
the "sched/core" branch on "master" before the merge and
going on with the bisection, I found commit
3e8e42c69bb7d9fc12ebc23ff308e8523a2a59a0
"sched: Revert need_resched() to look at TIF_NEED_RESCHED"
as one of the causes. The other patch that contributes to the
bug is commit ded797547548a5b8e7b92383a41e4c0e6b0ecb7f
"irq: Force hardirq exit's softirq processing on its own stack".

Reverting either one of them solves the problem reported with kvm,
but revert is probably not the correct answer.

I wonder if the solution is as simple as this:

--->8---
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0af5250..f3b985d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -126,6 +126,7 @@ config X86
 	select RTC_LIB
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
+	select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_32
 	select HAVE_CC_STACKPROTECTOR

 config INSTRUCTION_DECODER
---8<---

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-06 15:19 Bisected KVM hang on x86-32 between v3.12 and v3.13 Michele Ballabio
@ 2014-04-06 15:52 ` Toralf Förster
  2014-04-06 17:40   ` Michele Ballabio
  2014-04-07 15:03 ` Peter Zijlstra
  1 sibling, 1 reply; 23+ messages in thread
From: Toralf Förster @ 2014-04-06 15:52 UTC (permalink / raw)
  To: Michele Ballabio, linux-kernel; +Cc: fweisbec, mingo, peterz

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 04/06/2014 05:19 PM, Michele Ballabio wrote:
> Toralf Förster reported this in
>   http://article.gmane.org/gmane.linux.kernel/1662567
>   http://article.gmane.org/gmane.linux.kernel/1658422
>   http://article.gmane.org/gmane.linux.kernel/1657962
> 
>   "The issue happens here at a 32 bit stable Gentoo Linux if
>    I try to start a KVM image. Kernels 3.12.X works fine,
>    kernel >= v3.13 will hang shortly after I started the image
>    with the virtual-manager. The last syslog messages are
>    something like:
>    Feb 28 16:22:00 n22 kernel: INFO: rcu_sched detected stalls
>        on CPUs/tasks: {} (detected by 2, t=60002 jiffies,
>        g=14689, c=14688, q=21051)
>    Feb 28 16:22:00 n22 kernel: INFO: Stall ended before state
>        dump start"
> 
> He correctly pointed out that the bisection blamed the merge
> commit 37bf06375c90a42fe07b9bebdb07bc316ae5a0ce
> "Merge tag 'v3.12-rc4' into sched/core".
> 
> This bug is obviously caused by at least two patches, one
> on each side of the merge, that only when combined together
> (at that merge point) cause the bug in kvm. By rebasing
> the "sched/core" branch on "master" before the merge and
> going on with the bisection, I found commit
> 3e8e42c69bb7d9fc12ebc23ff308e8523a2a59a0
> "sched: Revert need_resched() to look at TIF_NEED_RESCHED"
> as one of the causes. The other patch that contributes to the
> bug is commit ded797547548a5b8e7b92383a41e4c0e6b0ecb7f
> "irq: Force hardirq exit's softirq processing on its own stack".
> 
> Reverting either one of them solves the problem reported with kvm,
> but revert is probably not the correct answer.
> 
> I wonder if the solution is as simple as this:
> 
> --->8---
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 0af5250..f3b985d 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -126,6 +126,7 @@ config X86
>  	select RTC_LIB
>  	select HAVE_DEBUG_STACKOVERFLOW
>  	select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
> +	select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_32
>  	select HAVE_CC_STACKPROTECTOR
> 
>  config INSTRUCTION_DECODER
> ---8<---
> 
applied both to 3.13.9 and 3.14.0 - issue does not happened any longer

Thanks !



P.S..
	'By rebasing the "sched/core" branch on "master" before the merge and going on with the bisection'

Probably off-topic but I'm really interested what did you do in detail ? I'm asking b/c using git for my own and to bisect a remote tree, but I'm not too familiar in bisecting bugs of this kind. Furthermore probably worth an own section in one of the TODO's ?


- -- 
MfG/Sincerely
Toralf Förster
pgp finger print:1A37 6F99 4A9D 026F 13E2 4DCF C4EA CDDE 0076 E94E
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iF4EAREIAAYFAlNBeE4ACgkQxOrN3gB26U4zpgD/bEaIS17/FIxmsyHZvL15RoX6
Z0dLwOoPcIRJyi2pn44A/0qh9YmB9Bv2yIf7qsUaEZA+lpJ+ikWMZSVEW2JtZMV0
=QOaP
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-06 15:52 ` Toralf Förster
@ 2014-04-06 17:40   ` Michele Ballabio
  0 siblings, 0 replies; 23+ messages in thread
From: Michele Ballabio @ 2014-04-06 17:40 UTC (permalink / raw)
  To: Toralf Förster, linux-kernel; +Cc: fweisbec, mingo, peterz

On 06/04/2014 17:52, Toralf Förster wrote:
>> > Reverting either one of them solves the problem reported with kvm,
>> > but revert is probably not the correct answer.
>> > 
>> > I wonder if the solution is as simple as this:
>> > 
>> > --->8---
>> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> > index 0af5250..f3b985d 100644
>> > --- a/arch/x86/Kconfig
>> > +++ b/arch/x86/Kconfig
>> > @@ -126,6 +126,7 @@ config X86
>> >  	select RTC_LIB
>> >  	select HAVE_DEBUG_STACKOVERFLOW
>> >  	select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
>> > +	select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_32
>> >  	select HAVE_CC_STACKPROTECTOR
>> > 
>> >  config INSTRUCTION_DECODER
>> > ---8<---
>> > 
> applied both to 3.13.9 and 3.14.0 - issue does not happened any longer
> 
> Thanks !

You're welcome, but I'm not sure it's bug-free, I've only glanced at the
code. Let's hear what other people think.

> P.S.. 'By rebasing the "sched/core" branch on "master" before the 
> merge and going on with the bisection'
> 
> Probably off-topic but I'm really interested what did you do in 
> detail ? I'm asking b/c using git for my own and to bisect a remote 
> tree, but I'm not too familiar in bisecting bugs of this kind.

Let's say bisection blames the merge commit 'M' as the first bad
commit:

R---S---T---P---X---Y---Z---M
             \             /
              A---B---C---'

First, let's search for the merge base P:
  git merge-base Z C

Now: we know there are at least 2 patches (one on both sides of the
merge) that are responsible for the regression.
One is either X, Y or Z; the other is either A, B or C.

To find the first one (between X, Y and Z), we can rebase Z on top of
C, either with

  git rebase C Z

or with
  git format-patch Z ^C
  git am 00*
(It's almost the same thing, sometimes you might want to do this to
edit out some patches without using git bisect --interactive).

The result should be:

 P---X---Y---Z---M
  \             /
   A---B---C---'
            \
             X'---Y'---Z'

At this point
  git diff M Y'
should show nothing.

Now we can further our bisection:
  git bisect start Z' C
... should give us the first patch to blame: let's say it's Y'.

Now we search for the other one: the sequence I used is

# We know the other guilty commit is A, B or C
  git bisect start C P
# We will be brought on commit B.
# We know the regression triggers only with Y, so we merge with it
  git merge --no-commit Y
# The result will be something like this:
 P---X---Y---Z---M
  \       \     /
   A---B---N   /
        \     /
         C---'
          \
           X'---Y'---Z'
# Where N is a merge, but it's not a commit.
# 1st compile, install, reboot, test
# We are about to search another commit to test, so let's remove the
# temporary changes of the previous test
  git reset --hard
# What is the result of the 1st test?
  git bisect good|bad
# Here is another iteration
  git merge --no-commit Y
# 2nd compile, install, reboot, test

... and so on. Of course, there are some variations: if Y does not
depend on X, instead of a merge we could do:
  git cherry-pick --no-commit Y

In this case I used a merge because it was easier, and actually I
didn't merge _exactly_ with the blamed commit, but with the last
commit before the merge M, i.e. Z (I assumed it would be better
this way).

> Furthermore probably worth an own section in one of the TODO's ?

I think this (a regression blamed on a merge) is quite unusual,
and as such it could be useful on the man page of git-bisect,
but it should be written better and shorter than what I did
here.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-06 15:19 Bisected KVM hang on x86-32 between v3.12 and v3.13 Michele Ballabio
  2014-04-06 15:52 ` Toralf Förster
@ 2014-04-07 15:03 ` Peter Zijlstra
  2014-04-07 15:07   ` Peter Zijlstra
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2014-04-07 15:03 UTC (permalink / raw)
  To: Michele Ballabio
  Cc: linux-kernel, toralf.foerster, fweisbec, mingo, Steven Rostedt

On Sun, Apr 06, 2014 at 05:19:27PM +0200, Michele Ballabio wrote:
> Toralf Förster reported this in
>   http://article.gmane.org/gmane.linux.kernel/1662567
>   http://article.gmane.org/gmane.linux.kernel/1658422
>   http://article.gmane.org/gmane.linux.kernel/1657962
> 
>   "The issue happens here at a 32 bit stable Gentoo Linux if
>    I try to start a KVM image. Kernels 3.12.X works fine,
>    kernel >= v3.13 will hang shortly after I started the image
>    with the virtual-manager. The last syslog messages are
>    something like:
>    Feb 28 16:22:00 n22 kernel: INFO: rcu_sched detected stalls
>        on CPUs/tasks: {} (detected by 2, t=60002 jiffies,
>        g=14689, c=14688, q=21051)
>    Feb 28 16:22:00 n22 kernel: INFO: Stall ended before state
>        dump start"
> 
> He correctly pointed out that the bisection blamed the merge
> commit 37bf06375c90a42fe07b9bebdb07bc316ae5a0ce
> "Merge tag 'v3.12-rc4' into sched/core".
> 
> This bug is obviously caused by at least two patches, one
> on each side of the merge, that only when combined together
> (at that merge point) cause the bug in kvm. By rebasing
> the "sched/core" branch on "master" before the merge and
> going on with the bisection, I found commit
> 3e8e42c69bb7d9fc12ebc23ff308e8523a2a59a0
> "sched: Revert need_resched() to look at TIF_NEED_RESCHED"
> as one of the causes. The other patch that contributes to the
> bug is commit ded797547548a5b8e7b92383a41e4c0e6b0ecb7f
> "irq: Force hardirq exit's softirq processing on its own stack".
> 
> Reverting either one of them solves the problem reported with kvm,
> but revert is probably not the correct answer.
> 
> I wonder if the solution is as simple as this:
> 
> --->8---
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 0af5250..f3b985d 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -126,6 +126,7 @@ config X86
>  	select RTC_LIB
>  	select HAVE_DEBUG_STACKOVERFLOW
>  	select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
> +	select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_32
>  	select HAVE_CC_STACKPROTECTOR

Ohh ahh.. shiney!

So what I suspect at this point is that because i386 and x86_64 have a
difference in current_thread_info() (i386 is stack based), we end up
setting the TIF_NEED_RESCHED bit on the wrong stack.

Now I have some vague memories of propagating the TIF flags on stack
switch, but I cannot remember what arch we did that for. Let me stare at
this a little more.

Also, IFF this is the case, then the fingered patch above (and your
suggested 'fix') aren't the real curlpit/cure but simply make it
more/less likely to happen.

Now, Steve had a patch somewhere that would make i386 use per-cpu
variables for current_thread_info() just like x86_64 already does I
think. Let me go find them too.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-07 15:03 ` Peter Zijlstra
@ 2014-04-07 15:07   ` Peter Zijlstra
  2014-04-07 18:16     ` Toralf Förster
  2014-04-07 19:49     ` Michele Ballabio
  0 siblings, 2 replies; 23+ messages in thread
From: Peter Zijlstra @ 2014-04-07 15:07 UTC (permalink / raw)
  To: Michele Ballabio
  Cc: linux-kernel, toralf.foerster, fweisbec, mingo, Steven Rostedt

On Mon, Apr 07, 2014 at 05:03:37PM +0200, Peter Zijlstra wrote:
> So what I suspect at this point is that because i386 and x86_64 have a
> difference in current_thread_info() (i386 is stack based), we end up
> setting the TIF_NEED_RESCHED bit on the wrong stack.
> 
> Now I have some vague memories of propagating the TIF flags on stack
> switch, but I cannot remember what arch we did that for. Let me stare at
> this a little more.
> 
> Also, IFF this is the case, then the fingered patch above (and your
> suggested 'fix') aren't the real curlpit/cure but simply make it
> more/less likely to happen.
> 
> Now, Steve had a patch somewhere that would make i386 use per-cpu
> variables for current_thread_info() just like x86_64 already does I
> think. Let me go find them too.

Ohh, goodie, they're already in Linus' tree. Could you see if current
git still suffers this problem?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-07 15:07   ` Peter Zijlstra
@ 2014-04-07 18:16     ` Toralf Förster
  2014-04-07 18:56       ` Peter Zijlstra
  2014-04-07 18:59       ` Bisected KVM hang on x86-32 between v3.12 and v3.13 Frederic Weisbecker
  2014-04-07 19:49     ` Michele Ballabio
  1 sibling, 2 replies; 23+ messages in thread
From: Toralf Förster @ 2014-04-07 18:16 UTC (permalink / raw)
  To: Peter Zijlstra, Michele Ballabio
  Cc: linux-kernel, fweisbec, mingo, Steven Rostedt

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 04/07/2014 05:07 PM, Peter Zijlstra wrote:
> On Mon, Apr 07, 2014 at 05:03:37PM +0200, Peter Zijlstra wrote:
>> So what I suspect at this point is that because i386 and x86_64
>> have a difference in current_thread_info() (i386 is stack based),
>> we end up setting the TIF_NEED_RESCHED bit on the wrong stack.
>> 
>> Now I have some vague memories of propagating the TIF flags on
>> stack switch, but I cannot remember what arch we did that for.
>> Let me stare at this a little more.
>> 
>> Also, IFF this is the case, then the fingered patch above (and
>> your suggested 'fix') aren't the real curlpit/cure but simply
>> make it more/less likely to happen.
>> 
>> Now, Steve had a patch somewhere that would make i386 use
>> per-cpu variables for current_thread_info() just like x86_64
>> already does I think. Let me go find them too.
> 
> Ohh, goodie, they're already in Linus' tree. Could you see if
> current git still suffers this problem?
> 
v3.14-10353-g2b3a8fd works fine AFAICS
(BTW the fix is stable material, right ?)

- -- 
MfG/Sincerely
Toralf Förster
pgp finger print:1A37 6F99 4A9D 026F 13E2 4DCF C4EA CDDE 0076 E94E
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iF4EAREIAAYFAlNC63gACgkQxOrN3gB26U6MoQD/QlC0K92TivMFR6H35G9m1QzK
dkdGzHDVn5HtgIswDToA/RhmhblKnqM7bWYzOxwypF7oZGdKRnBQocWbbuMi5lf0
=tw9y
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-07 18:16     ` Toralf Förster
@ 2014-04-07 18:56       ` Peter Zijlstra
  2014-04-08 12:21         ` Peter Zijlstra
  2014-04-07 18:59       ` Bisected KVM hang on x86-32 between v3.12 and v3.13 Frederic Weisbecker
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2014-04-07 18:56 UTC (permalink / raw)
  To: Toralf Förster
  Cc: Michele Ballabio, linux-kernel, fweisbec, mingo, Steven Rostedt

On Mon, Apr 07, 2014 at 08:16:24PM +0200, Toralf Förster wrote:

> v3.14-10353-g2b3a8fd works fine AFAICS
> (BTW the fix is stable material, right ?)

I'm fairly sure its not; its a rather invasive series; see:

2432e1364bbe x86: Nuke the supervisor_stack field in i386 thread_info
b807902a88c4 x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386
0788aa6a23cb x86: Prepare removal of previous_esp from i386 thread_info structure
198d208df437 x86: Keep thread_info on thread stack in x86_32

Let me see if there's anything 'simpler' we can stuff in .13+

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-07 18:16     ` Toralf Förster
  2014-04-07 18:56       ` Peter Zijlstra
@ 2014-04-07 18:59       ` Frederic Weisbecker
  2014-04-07 19:57         ` Toralf Förster
  1 sibling, 1 reply; 23+ messages in thread
From: Frederic Weisbecker @ 2014-04-07 18:59 UTC (permalink / raw)
  To: Toralf Förster
  Cc: Peter Zijlstra, Michele Ballabio, linux-kernel, mingo, Steven Rostedt

On Mon, Apr 07, 2014 at 08:16:24PM +0200, Toralf Förster wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> 
> On 04/07/2014 05:07 PM, Peter Zijlstra wrote:
> > On Mon, Apr 07, 2014 at 05:03:37PM +0200, Peter Zijlstra wrote:
> >> So what I suspect at this point is that because i386 and x86_64
> >> have a difference in current_thread_info() (i386 is stack based),
> >> we end up setting the TIF_NEED_RESCHED bit on the wrong stack.
> >> 
> >> Now I have some vague memories of propagating the TIF flags on
> >> stack switch, but I cannot remember what arch we did that for.
> >> Let me stare at this a little more.
> >> 
> >> Also, IFF this is the case, then the fingered patch above (and
> >> your suggested 'fix') aren't the real curlpit/cure but simply
> >> make it more/less likely to happen.
> >> 
> >> Now, Steve had a patch somewhere that would make i386 use
> >> per-cpu variables for current_thread_info() just like x86_64
> >> already does I think. Let me go find them too.
> > 
> > Ohh, goodie, they're already in Linus' tree. Could you see if
> > current git still suffers this problem?
> > 
> v3.14-10353-g2b3a8fd works fine AFAICS
> (BTW the fix is stable material, right ?)

If we are reffering to 198d208df4371734ac4728f69cb585c284d20a15
(x86: Keep thread_info on thread stack in x86_32) it doesn't
carry a stable tag.

So to be clear, you are saying that v3.14 is fine but other release
are buggy? Which ones are these?

If a backport is needed, 198d208df4371734ac4728f69cb585c284d20a15 looks too large...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-07 15:07   ` Peter Zijlstra
  2014-04-07 18:16     ` Toralf Förster
@ 2014-04-07 19:49     ` Michele Ballabio
  1 sibling, 0 replies; 23+ messages in thread
From: Michele Ballabio @ 2014-04-07 19:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, toralf.foerster, fweisbec, mingo, Steven Rostedt

On 07/04/2014 17:07, Peter Zijlstra wrote:
> On Mon, Apr 07, 2014 at 05:03:37PM +0200, Peter Zijlstra wrote:
>> So what I suspect at this point is that because i386 and x86_64 have a
>> difference in current_thread_info() (i386 is stack based), we end up
>> setting the TIF_NEED_RESCHED bit on the wrong stack.
>>
>> Now I have some vague memories of propagating the TIF flags on stack
>> switch, but I cannot remember what arch we did that for. Let me stare at
>> this a little more.
>>
>> Also, IFF this is the case, then the fingered patch above (and your
>> suggested 'fix') aren't the real curlpit/cure but simply make it
>> more/less likely to happen.

Understood.

>> Now, Steve had a patch somewhere that would make i386 use per-cpu
>> variables for current_thread_info() just like x86_64 already does I
>> think. Let me go find them too.
> 
> Ohh, goodie, they're already in Linus' tree. Could you see if current
> git still suffers this problem?

It is fixed on current -master (3.14.0-10773-gb8780c3).

Thanks.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-07 18:59       ` Bisected KVM hang on x86-32 between v3.12 and v3.13 Frederic Weisbecker
@ 2014-04-07 19:57         ` Toralf Förster
  2014-04-07 22:43           ` Frederic Weisbecker
  0 siblings, 1 reply; 23+ messages in thread
From: Toralf Förster @ 2014-04-07 19:57 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Michele Ballabio, linux-kernel, mingo, Steven Rostedt

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 04/07/2014 08:59 PM, Frederic Weisbecker wrote:
> On Mon, Apr 07, 2014 at 08:16:24PM +0200, Toralf Förster wrote:
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
>> 
>> On 04/07/2014 05:07 PM, Peter Zijlstra wrote:
>>> On Mon, Apr 07, 2014 at 05:03:37PM +0200, Peter Zijlstra
>>> wrote:
>>>> So what I suspect at this point is that because i386 and
>>>> x86_64 have a difference in current_thread_info() (i386 is
>>>> stack based), we end up setting the TIF_NEED_RESCHED bit on
>>>> the wrong stack.
>>>> 
>>>> Now I have some vague memories of propagating the TIF flags
>>>> on stack switch, but I cannot remember what arch we did that
>>>> for. Let me stare at this a little more.
>>>> 
>>>> Also, IFF this is the case, then the fingered patch above
>>>> (and your suggested 'fix') aren't the real curlpit/cure but
>>>> simply make it more/less likely to happen.
>>>> 
>>>> Now, Steve had a patch somewhere that would make i386 use 
>>>> per-cpu variables for current_thread_info() just like x86_64 
>>>> already does I think. Let me go find them too.
>>> 
>>> Ohh, goodie, they're already in Linus' tree. Could you see if 
>>> current git still suffers this problem?
>>> 
>> v3.14-10353-g2b3a8fd works fine AFAICS (BTW the fix is stable
>> material, right ?)
> 
> If we are reffering to 198d208df4371734ac4728f69cb585c284d20a15 
> (x86: Keep thread_info on thread stack in x86_32) it doesn't carry
> a stable tag.
> 
> So to be clear, you are saying that v3.14 is fine but other
> release are buggy? Which ones are these?
> 
No, 3.13.x and 3.14 have the issue, latest git is fine and 3.12.x

> If a backport is needed, 198d208df4371734ac4728f69cb585c284d20a15
> looks too large...
> 


- -- 
MfG/Sincerely
Toralf Förster
pgp finger print:1A37 6F99 4A9D 026F 13E2 4DCF C4EA CDDE 0076 E94E
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iF4EAREIAAYFAlNDAwwACgkQxOrN3gB26U5Z7AD+OMcd+262s1fJV9Fg8NjL5TaI
Y4hZHzdv76fU/9P1lrkA/RUbQFYn6r9fZz61UrEIZ5lg5dUX04DDcQkfPUMZRow3
=Ifk8
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-07 19:57         ` Toralf Förster
@ 2014-04-07 22:43           ` Frederic Weisbecker
  0 siblings, 0 replies; 23+ messages in thread
From: Frederic Weisbecker @ 2014-04-07 22:43 UTC (permalink / raw)
  To: Toralf Förster
  Cc: Peter Zijlstra, Michele Ballabio, linux-kernel, mingo, Steven Rostedt

On Mon, Apr 07, 2014 at 09:57:00PM +0200, Toralf Förster wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> 
> On 04/07/2014 08:59 PM, Frederic Weisbecker wrote:
> > On Mon, Apr 07, 2014 at 08:16:24PM +0200, Toralf Förster wrote:
> >> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
> >> 
> >> On 04/07/2014 05:07 PM, Peter Zijlstra wrote:
> >>> On Mon, Apr 07, 2014 at 05:03:37PM +0200, Peter Zijlstra
> >>> wrote:
> >>>> So what I suspect at this point is that because i386 and
> >>>> x86_64 have a difference in current_thread_info() (i386 is
> >>>> stack based), we end up setting the TIF_NEED_RESCHED bit on
> >>>> the wrong stack.
> >>>> 
> >>>> Now I have some vague memories of propagating the TIF flags
> >>>> on stack switch, but I cannot remember what arch we did that
> >>>> for. Let me stare at this a little more.
> >>>> 
> >>>> Also, IFF this is the case, then the fingered patch above
> >>>> (and your suggested 'fix') aren't the real curlpit/cure but
> >>>> simply make it more/less likely to happen.
> >>>> 
> >>>> Now, Steve had a patch somewhere that would make i386 use 
> >>>> per-cpu variables for current_thread_info() just like x86_64 
> >>>> already does I think. Let me go find them too.
> >>> 
> >>> Ohh, goodie, they're already in Linus' tree. Could you see if 
> >>> current git still suffers this problem?
> >>> 
> >> v3.14-10353-g2b3a8fd works fine AFAICS (BTW the fix is stable
> >> material, right ?)
> > 
> > If we are reffering to 198d208df4371734ac4728f69cb585c284d20a15 
> > (x86: Keep thread_info on thread stack in x86_32) it doesn't carry
> > a stable tag.
> > 
> > So to be clear, you are saying that v3.14 is fine but other
> > release are buggy? Which ones are these?
> > 
> No, 3.13.x and 3.14 have the issue, latest git is fine and 3.12.x
> 

Ah ok. Hmm, Peter seemed to have an idea on a fix to backport so I'll let
him handle that :)

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-07 18:56       ` Peter Zijlstra
@ 2014-04-08 12:21         ` Peter Zijlstra
  2014-04-08 19:14           ` Michele Ballabio
                             ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Peter Zijlstra @ 2014-04-08 12:21 UTC (permalink / raw)
  To: Toralf Förster
  Cc: Michele Ballabio, linux-kernel, fweisbec, mingo, Steven Rostedt,
	David Cohen, Stefan Bader, Paolo Bonzini, Borislav Petkov

On Mon, Apr 07, 2014 at 08:56:58PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 07, 2014 at 08:16:24PM +0200, Toralf Förster wrote:
> 
> > v3.14-10353-g2b3a8fd works fine AFAICS
> > (BTW the fix is stable material, right ?)
> 
> I'm fairly sure its not; its a rather invasive series; see:
> 
> 2432e1364bbe x86: Nuke the supervisor_stack field in i386 thread_info
> b807902a88c4 x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386
> 0788aa6a23cb x86: Prepare removal of previous_esp from i386 thread_info structure
> 198d208df437 x86: Keep thread_info on thread stack in x86_32
> 
> Let me see if there's anything 'simpler' we can stuff in .13+

OK.. so could someone test the below patch to see if that makes things
work for .13 and .14?

---
 arch/x86/include/asm/preempt.h | 11 +++++++++++
 include/linux/preempt.h        |  4 ++++
 include/linux/thread_info.h    |  2 --
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index c8b051933b1b..7ada46d8d4d8 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -5,6 +5,17 @@
 #include <asm/percpu.h>
 #include <linux/thread_info.h>
 
+#ifdef CONFIG_X86_32
+/*
+ * i386's current_thread_info() depends on ESP and for interrupt/exception
+ * stacks this doesn't yield the actual task thread_info.
+ *
+ * We hard rely on the fact that all the TIF_NEED_RESCHED bits are
+ * the same, therefore use the slightly more expensive version below.
+ */
+#define tif_need_resched() test_tsk_thread_flag(current, TIF_NEED_RESCHED)
+#endif
+
 DECLARE_PER_CPU(int, __preempt_count);
 
 /*
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index de83b4eb1642..891c8f8b52fe 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -17,6 +17,10 @@
 
 #include <asm/preempt.h>
 
+#ifndef tif_need_resched
+#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
+#endif
+
 #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
 extern void preempt_count_add(int val);
 extern void preempt_count_sub(int val);
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index fddbe2023a5d..a629e4b23217 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -118,8 +118,6 @@ static inline __deprecated void set_need_resched(void)
 	 */
 }
 
-#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
-
 #if defined TIF_RESTORE_SIGMASK && !defined HAVE_SET_RESTORE_SIGMASK
 /*
  * An arch can define its own version of set_restore_sigmask() to get the

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-08 12:21         ` Peter Zijlstra
@ 2014-04-08 19:14           ` Michele Ballabio
  2014-04-08 19:51             ` Michele Ballabio
  2014-04-08 20:28           ` Toralf Förster
  2014-04-09  9:14           ` Stefan Bader
  2 siblings, 1 reply; 23+ messages in thread
From: Michele Ballabio @ 2014-04-08 19:14 UTC (permalink / raw)
  To: Peter Zijlstra, Toralf Förster
  Cc: linux-kernel, fweisbec, mingo, Steven Rostedt, David Cohen,
	Stefan Bader, Paolo Bonzini, Borislav Petkov

On 08/04/2014 14:21, Peter Zijlstra wrote:
> On Mon, Apr 07, 2014 at 08:56:58PM +0200, Peter Zijlstra wrote:
>> On Mon, Apr 07, 2014 at 08:16:24PM +0200, Toralf Förster wrote:
>>
>>> v3.14-10353-g2b3a8fd works fine AFAICS
>>> (BTW the fix is stable material, right ?)
>>
>> I'm fairly sure its not; its a rather invasive series; see:
>>
>> 2432e1364bbe x86: Nuke the supervisor_stack field in i386 thread_info
>> b807902a88c4 x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386
>> 0788aa6a23cb x86: Prepare removal of previous_esp from i386 thread_info structure
>> 198d208df437 x86: Keep thread_info on thread stack in x86_32
>>
>> Let me see if there's anything 'simpler' we can stuff in .13+
> 
> OK.. so could someone test the below patch to see if that makes things
> work for .13 and .14?
> 
> ---
>  arch/x86/include/asm/preempt.h | 11 +++++++++++
>  include/linux/preempt.h        |  4 ++++
>  include/linux/thread_info.h    |  2 --
>  3 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
> index c8b051933b1b..7ada46d8d4d8 100644
> --- a/arch/x86/include/asm/preempt.h
> +++ b/arch/x86/include/asm/preempt.h
> @@ -5,6 +5,17 @@
>  #include <asm/percpu.h>
>  #include <linux/thread_info.h>
>  
> +#ifdef CONFIG_X86_32
> +/*
> + * i386's current_thread_info() depends on ESP and for interrupt/exception
> + * stacks this doesn't yield the actual task thread_info.
> + *
> + * We hard rely on the fact that all the TIF_NEED_RESCHED bits are
> + * the same, therefore use the slightly more expensive version below.
> + */
> +#define tif_need_resched() test_tsk_thread_flag(current, TIF_NEED_RESCHED)
> +#endif
> +
>  DECLARE_PER_CPU(int, __preempt_count);
>  
>  /*
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index de83b4eb1642..891c8f8b52fe 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -17,6 +17,10 @@
>  
>  #include <asm/preempt.h>
>  
> +#ifndef tif_need_resched
> +#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
> +#endif
> +
>  #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
>  extern void preempt_count_add(int val);
>  extern void preempt_count_sub(int val);
> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index fddbe2023a5d..a629e4b23217 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -118,8 +118,6 @@ static inline __deprecated void set_need_resched(void)
>  	 */
>  }
>  
> -#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
> -
>  #if defined TIF_RESTORE_SIGMASK && !defined HAVE_SET_RESTORE_SIGMASK
>  /*
>   * An arch can define its own version of set_restore_sigmask() to get the
> 

It works fine for me on top of v3.14.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-08 19:14           ` Michele Ballabio
@ 2014-04-08 19:51             ` Michele Ballabio
  0 siblings, 0 replies; 23+ messages in thread
From: Michele Ballabio @ 2014-04-08 19:51 UTC (permalink / raw)
  To: Peter Zijlstra, Toralf Förster
  Cc: linux-kernel, fweisbec, mingo, Steven Rostedt, David Cohen,
	Stefan Bader, Paolo Bonzini, Borislav Petkov

On 08/04/2014 21:14, Michele Ballabio wrote:
> On 08/04/2014 14:21, Peter Zijlstra wrote:
>> On Mon, Apr 07, 2014 at 08:56:58PM +0200, Peter Zijlstra wrote:
>>> On Mon, Apr 07, 2014 at 08:16:24PM +0200, Toralf Förster wrote:
>>>
>>>> v3.14-10353-g2b3a8fd works fine AFAICS
>>>> (BTW the fix is stable material, right ?)
>>>
>>> I'm fairly sure its not; its a rather invasive series; see:
>>>
>>> 2432e1364bbe x86: Nuke the supervisor_stack field in i386 thread_info
>>> b807902a88c4 x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386
>>> 0788aa6a23cb x86: Prepare removal of previous_esp from i386 thread_info structure
>>> 198d208df437 x86: Keep thread_info on thread stack in x86_32
>>>
>>> Let me see if there's anything 'simpler' we can stuff in .13+
>>
>> OK.. so could someone test the below patch to see if that makes things
>> work for .13 and .14?
>>
>> ---
>>  arch/x86/include/asm/preempt.h | 11 +++++++++++
>>  include/linux/preempt.h        |  4 ++++
>>  include/linux/thread_info.h    |  2 --
>>  3 files changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
>> index c8b051933b1b..7ada46d8d4d8 100644
>> --- a/arch/x86/include/asm/preempt.h
>> +++ b/arch/x86/include/asm/preempt.h
>> @@ -5,6 +5,17 @@
>>  #include <asm/percpu.h>
>>  #include <linux/thread_info.h>
>>  
>> +#ifdef CONFIG_X86_32
>> +/*
>> + * i386's current_thread_info() depends on ESP and for interrupt/exception
>> + * stacks this doesn't yield the actual task thread_info.
>> + *
>> + * We hard rely on the fact that all the TIF_NEED_RESCHED bits are
>> + * the same, therefore use the slightly more expensive version below.
>> + */
>> +#define tif_need_resched() test_tsk_thread_flag(current, TIF_NEED_RESCHED)
>> +#endif
>> +
>>  DECLARE_PER_CPU(int, __preempt_count);
>>  
>>  /*
>> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
>> index de83b4eb1642..891c8f8b52fe 100644
>> --- a/include/linux/preempt.h
>> +++ b/include/linux/preempt.h
>> @@ -17,6 +17,10 @@
>>  
>>  #include <asm/preempt.h>
>>  
>> +#ifndef tif_need_resched
>> +#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
>> +#endif
>> +
>>  #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
>>  extern void preempt_count_add(int val);
>>  extern void preempt_count_sub(int val);
>> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
>> index fddbe2023a5d..a629e4b23217 100644
>> --- a/include/linux/thread_info.h
>> +++ b/include/linux/thread_info.h
>> @@ -118,8 +118,6 @@ static inline __deprecated void set_need_resched(void)
>>  	 */
>>  }
>>  
>> -#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
>> -
>>  #if defined TIF_RESTORE_SIGMASK && !defined HAVE_SET_RESTORE_SIGMASK
>>  /*
>>   * An arch can define its own version of set_restore_sigmask() to get the
>>
> 
> It works fine for me on top of v3.14.

And on top of v3.13, too.

Thanks.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-08 12:21         ` Peter Zijlstra
  2014-04-08 19:14           ` Michele Ballabio
@ 2014-04-08 20:28           ` Toralf Förster
  2014-04-09  9:14           ` Stefan Bader
  2 siblings, 0 replies; 23+ messages in thread
From: Toralf Förster @ 2014-04-08 20:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michele Ballabio, linux-kernel, fweisbec, mingo, Steven Rostedt,
	David Cohen, Stefan Bader, Paolo Bonzini, Borislav Petkov

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 04/08/2014 02:21 PM, Peter Zijlstra wrote:
> On Mon, Apr 07, 2014 at 08:56:58PM +0200, Peter Zijlstra wrote:
>> On Mon, Apr 07, 2014 at 08:16:24PM +0200, Toralf Förster wrote:
>> 
>>> v3.14-10353-g2b3a8fd works fine AFAICS (BTW the fix is stable
>>> material, right ?)
>> 
>> I'm fairly sure its not; its a rather invasive series; see:
>> 
>> 2432e1364bbe x86: Nuke the supervisor_stack field in i386
>> thread_info b807902a88c4 x86: Nuke GET_THREAD_INFO_WITH_ESP()
>> macro for i386 0788aa6a23cb x86: Prepare removal of previous_esp
>> from i386 thread_info structure 198d208df437 x86: Keep
>> thread_info on thread stack in x86_32
>> 
>> Let me see if there's anything 'simpler' we can stuff in .13+
> 
> OK.. so could someone test the below patch to see if that makes
> things work for .13 and .14?
> 
patch applied on 3.1.3.9 and 3.14.0 it fixed the issue AFAICT

> --- arch/x86/include/asm/preempt.h | 11 +++++++++++ 
> include/linux/preempt.h        |  4 ++++ 
> include/linux/thread_info.h    |  2 -- 3 files changed, 15
> insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/preempt.h
> b/arch/x86/include/asm/preempt.h index c8b051933b1b..7ada46d8d4d8
> 100644 --- a/arch/x86/include/asm/preempt.h +++
> b/arch/x86/include/asm/preempt.h @@ -5,6 +5,17 @@ #include
> <asm/percpu.h> #include <linux/thread_info.h>
> 
> +#ifdef CONFIG_X86_32 +/* + * i386's current_thread_info() depends
> on ESP and for interrupt/exception + * stacks this doesn't yield
> the actual task thread_info. + * + * We hard rely on the fact that
> all the TIF_NEED_RESCHED bits are + * the same, therefore use the
> slightly more expensive version below. + */ +#define
> tif_need_resched() test_tsk_thread_flag(current, TIF_NEED_RESCHED) 
> +#endif + DECLARE_PER_CPU(int, __preempt_count);
> 
> /* diff --git a/include/linux/preempt.h b/include/linux/preempt.h 
> index de83b4eb1642..891c8f8b52fe 100644 ---
> a/include/linux/preempt.h +++ b/include/linux/preempt.h @@ -17,6
> +17,10 @@
> 
> #include <asm/preempt.h>
> 
> +#ifndef tif_need_resched +#define tif_need_resched()
> test_thread_flag(TIF_NEED_RESCHED) +#endif + #if
> defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER) 
> extern void preempt_count_add(int val); extern void
> preempt_count_sub(int val); diff --git
> a/include/linux/thread_info.h b/include/linux/thread_info.h index
> fddbe2023a5d..a629e4b23217 100644 ---
> a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@
> -118,8 +118,6 @@ static inline __deprecated void
> set_need_resched(void) */ }
> 
> -#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED) - 
> #if defined TIF_RESTORE_SIGMASK && !defined
> HAVE_SET_RESTORE_SIGMASK /* * An arch can define its own version of
> set_restore_sigmask() to get the
> 


- -- 
MfG/Sincerely
Toralf Förster
pgp finger print:1A37 6F99 4A9D 026F 13E2 4DCF C4EA CDDE 0076 E94E
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iF0EAREIAAYFAlNEW+EACgkQxOrN3gB26U5RkAD4ysUp9CrCajM7zXLsdWRLTZFi
oX0lYgrKo/gCrcS9UwD/ZdaRPlP6vTKApfxQ7F48KTTA+tNcyTD4xSyCdwI8l0Q=
=WeKZ
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-08 12:21         ` Peter Zijlstra
  2014-04-08 19:14           ` Michele Ballabio
  2014-04-08 20:28           ` Toralf Förster
@ 2014-04-09  9:14           ` Stefan Bader
  2014-04-09  9:45             ` Peter Zijlstra
  2 siblings, 1 reply; 23+ messages in thread
From: Stefan Bader @ 2014-04-09  9:14 UTC (permalink / raw)
  To: Peter Zijlstra, Toralf Förster
  Cc: Michele Ballabio, linux-kernel, fweisbec, mingo, Steven Rostedt,
	David Cohen, Paolo Bonzini, Borislav Petkov

[-- Attachment #1: Type: text/plain, Size: 3041 bytes --]

On 08.04.2014 14:21, Peter Zijlstra wrote:
> On Mon, Apr 07, 2014 at 08:56:58PM +0200, Peter Zijlstra wrote:
>> On Mon, Apr 07, 2014 at 08:16:24PM +0200, Toralf Förster wrote:
>>
>>> v3.14-10353-g2b3a8fd works fine AFAICS
>>> (BTW the fix is stable material, right ?)
>>
>> I'm fairly sure its not; its a rather invasive series; see:
>>
>> 2432e1364bbe x86: Nuke the supervisor_stack field in i386 thread_info
>> b807902a88c4 x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386
>> 0788aa6a23cb x86: Prepare removal of previous_esp from i386 thread_info structure
>> 198d208df437 x86: Keep thread_info on thread stack in x86_32
>>
>> Let me see if there's anything 'simpler' we can stuff in .13+
> 
> OK.. so could someone test the below patch to see if that makes things
> work for .13 and .14?

Hi Peter,

sorry I only had time to try 3.13 but that looks good. I was able to get kvm up
and running for a longer than usual (without the patch) amount of time without
seeing any softlockups.

-Stefan
> 
> ---
>  arch/x86/include/asm/preempt.h | 11 +++++++++++
>  include/linux/preempt.h        |  4 ++++
>  include/linux/thread_info.h    |  2 --
>  3 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
> index c8b051933b1b..7ada46d8d4d8 100644
> --- a/arch/x86/include/asm/preempt.h
> +++ b/arch/x86/include/asm/preempt.h
> @@ -5,6 +5,17 @@
>  #include <asm/percpu.h>
>  #include <linux/thread_info.h>
>  
> +#ifdef CONFIG_X86_32
> +/*
> + * i386's current_thread_info() depends on ESP and for interrupt/exception
> + * stacks this doesn't yield the actual task thread_info.
> + *
> + * We hard rely on the fact that all the TIF_NEED_RESCHED bits are
> + * the same, therefore use the slightly more expensive version below.
> + */
> +#define tif_need_resched() test_tsk_thread_flag(current, TIF_NEED_RESCHED)
> +#endif
> +
>  DECLARE_PER_CPU(int, __preempt_count);
>  
>  /*
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index de83b4eb1642..891c8f8b52fe 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -17,6 +17,10 @@
>  
>  #include <asm/preempt.h>
>  
> +#ifndef tif_need_resched
> +#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
> +#endif
> +
>  #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
>  extern void preempt_count_add(int val);
>  extern void preempt_count_sub(int val);
> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index fddbe2023a5d..a629e4b23217 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -118,8 +118,6 @@ static inline __deprecated void set_need_resched(void)
>  	 */
>  }
>  
> -#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
> -
>  #if defined TIF_RESTORE_SIGMASK && !defined HAVE_SET_RESTORE_SIGMASK
>  /*
>   * An arch can define its own version of set_restore_sigmask() to get the
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Bisected KVM hang on x86-32 between v3.12 and v3.13
  2014-04-09  9:14           ` Stefan Bader
@ 2014-04-09  9:45             ` Peter Zijlstra
  2014-04-09 14:24               ` [PATCH -stable] x86,preempt: Fix preemption for i386 Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2014-04-09  9:45 UTC (permalink / raw)
  To: Stefan Bader
  Cc: Toralf Förster, Michele Ballabio, linux-kernel, fweisbec,
	mingo, Steven Rostedt, David Cohen, Paolo Bonzini,
	Borislav Petkov

On Wed, Apr 09, 2014 at 11:14:38AM +0200, Stefan Bader wrote:
> On 08.04.2014 14:21, Peter Zijlstra wrote:
> > On Mon, Apr 07, 2014 at 08:56:58PM +0200, Peter Zijlstra wrote:
> >> On Mon, Apr 07, 2014 at 08:16:24PM +0200, Toralf Förster wrote:
> >>
> >>> v3.14-10353-g2b3a8fd works fine AFAICS
> >>> (BTW the fix is stable material, right ?)
> >>
> >> I'm fairly sure its not; its a rather invasive series; see:
> >>
> >> 2432e1364bbe x86: Nuke the supervisor_stack field in i386 thread_info
> >> b807902a88c4 x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386
> >> 0788aa6a23cb x86: Prepare removal of previous_esp from i386 thread_info structure
> >> 198d208df437 x86: Keep thread_info on thread stack in x86_32
> >>
> >> Let me see if there's anything 'simpler' we can stuff in .13+
> > 
> > OK.. so could someone test the below patch to see if that makes things
> > work for .13 and .14?
> 
> Hi Peter,
> 
> sorry I only had time to try 3.13 but that looks good. I was able to get kvm up
> and running for a longer than usual (without the patch) amount of time without
> seeing any softlockups.

Great, I suppose I should go write a changelog and talk to Greg about
how to get this into .13 and .14 stable.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH -stable] x86,preempt: Fix preemption for i386
  2014-04-09  9:45             ` Peter Zijlstra
@ 2014-04-09 14:24               ` Peter Zijlstra
  2014-04-09 14:36                 ` Linus Torvalds
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2014-04-09 14:24 UTC (permalink / raw)
  To: Stefan Bader
  Cc: Toralf Förster, Michele Ballabio, linux-kernel, fweisbec,
	mingo, Steven Rostedt, David Cohen, Paolo Bonzini,
	Borislav Petkov, Linus Torvalds, Greg KH

Greg, Linus,

I'm not entirely clear on how acceptable it is to propose a different
patch for -stable than what we have upstream.

The below Changelog explain, but in short, we should either backport 4
rather invasive patches to .13-stable and .14-stable or do the small
patch provided.

The 4 patches by Steven are the right thing to do, but I feel they might
be too invasive to propose for -stable, therefore the alternative
approach.

---
Subject: x86,preempt: Fix preemption for i386
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue, 8 Apr 2014 14:21:28 +0200

Many people reported preemption/reschedule problems with i386 kernels
for .13 and .14. After Michele bisected this to a combination of

  3e8e42c69bb ("sched: Revert need_resched() to look at TIF_NEED_RESCHED")
  ded79754754 ("irq: Force hardirq exit's softirq processing on its own stack")

it finally dawned on me that i386's current_thread_info() was to
blame.

When we are on interrupt/exception stacks, we fail to observe the
right TIF_NEED_RESCHED bit and therefore the PREEMPT_NEED_RESCHED
folding malfunctions.

Current upstream fixes this by making i386 behave the same as x86_64
already did:

  2432e1364bbe ("x86: Nuke the supervisor_stack field in i386 thread_info")
  b807902a88c4 ("x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386")
  0788aa6a23cb ("x86: Prepare removal of previous_esp from i386 thread_info structure")
  198d208df437 ("x86: Keep thread_info on thread stack in x86_32")

However, that is far too much to stuff into -stable. Therefore I
propose we merge the below patch which uses task_thread_info(current)
for tif_need_resched() instead of the ESP based current_thread_info().

This makes sure we always observe the one true TIF_NEED_RESCHED bit
and things will work as expected again.

Cc: fweisbec@gmail.com
Cc: mingo@kernel.org
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Tested-by: Stefan Bader <stefan.bader@canonical.com>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Tested-by: Michele Ballabio <barra_cuda@katamail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/preempt.h |   11 +++++++++++
 include/linux/preempt.h        |    4 ++++
 include/linux/thread_info.h    |    2 --
 3 files changed, 15 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -5,6 +5,17 @@
 #include <asm/percpu.h>
 #include <linux/thread_info.h>
 
+#ifdef CONFIG_X86_32
+/*
+ * i386's current_thread_info() depends on ESP and for interrupt/exception
+ * stacks this doesn't yield the actual task thread_info.
+ *
+ * We hard rely on the fact that all the TIF_NEED_RESCHED bits are
+ * the same, therefore use the slightly more expensive version below.
+ */
+#define tif_need_resched() test_tsk_thread_flag(current, TIF_NEED_RESCHED)
+#endif
+
 DECLARE_PER_CPU(int, __preempt_count);
 
 /*
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -17,6 +17,10 @@
 
 #include <asm/preempt.h>
 
+#ifndef tif_need_resched
+#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
+#endif
+
 #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
 extern void preempt_count_add(int val);
 extern void preempt_count_sub(int val);
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -118,8 +118,6 @@ static inline __deprecated void set_need
 	 */
 }
 
-#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
-
 #if defined TIF_RESTORE_SIGMASK && !defined HAVE_SET_RESTORE_SIGMASK
 /*
  * An arch can define its own version of set_restore_sigmask() to get the

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -stable] x86,preempt: Fix preemption for i386
  2014-04-09 14:24               ` [PATCH -stable] x86,preempt: Fix preemption for i386 Peter Zijlstra
@ 2014-04-09 14:36                 ` Linus Torvalds
  2014-04-09 19:19                   ` Greg KH
  0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2014-04-09 14:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stefan Bader, Toralf Förster, Michele Ballabio,
	Linux Kernel Mailing List, Frédéric Weisbecker,
	Ingo Molnar, Steven Rostedt, David Cohen, Paolo Bonzini,
	Borislav Petkov, Greg KH

On Wed, Apr 9, 2014 at 7:24 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> I'm not entirely clear on how acceptable it is to propose a different
> patch for -stable than what we have upstream.

It's not all that common, but it certainly happens.

It's fine, as long as it mentions the commits that fix it upstream.
And as long as it's well tested, of course.

                     Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -stable] x86,preempt: Fix preemption for i386
  2014-04-09 14:36                 ` Linus Torvalds
@ 2014-04-09 19:19                   ` Greg KH
  2014-04-09 19:38                     ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Greg KH @ 2014-04-09 19:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Stefan Bader, Toralf Förster,
	Michele Ballabio, Linux Kernel Mailing List,
	Frédéric Weisbecker, Ingo Molnar, Steven Rostedt,
	David Cohen, Paolo Bonzini, Borislav Petkov

On Wed, Apr 09, 2014 at 07:36:23AM -0700, Linus Torvalds wrote:
> On Wed, Apr 9, 2014 at 7:24 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > I'm not entirely clear on how acceptable it is to propose a different
> > patch for -stable than what we have upstream.
> 
> It's not all that common, but it certainly happens.
> 
> It's fine, as long as it mentions the commits that fix it upstream.
> And as long as it's well tested, of course.

I agree, I can take this as long as you say it's correct and tested...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -stable] x86,preempt: Fix preemption for i386
  2014-04-09 19:19                   ` Greg KH
@ 2014-04-09 19:38                     ` Peter Zijlstra
  2014-04-09 19:57                       ` Greg KH
  2014-05-13 23:56                       ` Greg KH
  0 siblings, 2 replies; 23+ messages in thread
From: Peter Zijlstra @ 2014-04-09 19:38 UTC (permalink / raw)
  To: Greg KH
  Cc: Linus Torvalds, Stefan Bader, Toralf Förster,
	Michele Ballabio, Linux Kernel Mailing List,
	Frédéric Weisbecker, Ingo Molnar, Steven Rostedt,
	David Cohen, Paolo Bonzini, Borislav Petkov

On Wed, Apr 09, 2014 at 12:19:31PM -0700, Greg KH wrote:
> On Wed, Apr 09, 2014 at 07:36:23AM -0700, Linus Torvalds wrote:
> > On Wed, Apr 9, 2014 at 7:24 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > I'm not entirely clear on how acceptable it is to propose a different
> > > patch for -stable than what we have upstream.
> > 
> > It's not all that common, but it certainly happens.
> > 
> > It's fine, as long as it mentions the commits that fix it upstream.
> > And as long as it's well tested, of course.
> 
> I agree, I can take this as long as you say it's correct and tested...

As far as I understand the issue the patch is indeed correct and I have
3 independent people who confirm their previously reported issues are
now cured (as testified by the Tested-by tags).

There has also been confirmation that upstream does no longer suffer the
problem.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -stable] x86,preempt: Fix preemption for i386
  2014-04-09 19:38                     ` Peter Zijlstra
@ 2014-04-09 19:57                       ` Greg KH
  2014-05-13 23:56                       ` Greg KH
  1 sibling, 0 replies; 23+ messages in thread
From: Greg KH @ 2014-04-09 19:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Stefan Bader, Toralf Förster,
	Michele Ballabio, Linux Kernel Mailing List,
	Frédéric Weisbecker, Ingo Molnar, Steven Rostedt,
	David Cohen, Paolo Bonzini, Borislav Petkov

On Wed, Apr 09, 2014 at 09:38:45PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 09, 2014 at 12:19:31PM -0700, Greg KH wrote:
> > On Wed, Apr 09, 2014 at 07:36:23AM -0700, Linus Torvalds wrote:
> > > On Wed, Apr 9, 2014 at 7:24 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > I'm not entirely clear on how acceptable it is to propose a different
> > > > patch for -stable than what we have upstream.
> > > 
> > > It's not all that common, but it certainly happens.
> > > 
> > > It's fine, as long as it mentions the commits that fix it upstream.
> > > And as long as it's well tested, of course.
> > 
> > I agree, I can take this as long as you say it's correct and tested...
> 
> As far as I understand the issue the patch is indeed correct and I have
> 3 independent people who confirm their previously reported issues are
> now cured (as testified by the Tested-by tags).
> 
> There has also been confirmation that upstream does no longer suffer the
> problem.

Thanks for that, I'll queue it up in a bit.

greg k-h

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -stable] x86,preempt: Fix preemption for i386
  2014-04-09 19:38                     ` Peter Zijlstra
  2014-04-09 19:57                       ` Greg KH
@ 2014-05-13 23:56                       ` Greg KH
  1 sibling, 0 replies; 23+ messages in thread
From: Greg KH @ 2014-05-13 23:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Stefan Bader, Toralf Förster,
	Michele Ballabio, Linux Kernel Mailing List,
	Frédéric Weisbecker, Ingo Molnar, Steven Rostedt,
	David Cohen, Paolo Bonzini, Borislav Petkov

On Wed, Apr 09, 2014 at 09:38:45PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 09, 2014 at 12:19:31PM -0700, Greg KH wrote:
> > On Wed, Apr 09, 2014 at 07:36:23AM -0700, Linus Torvalds wrote:
> > > On Wed, Apr 9, 2014 at 7:24 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > I'm not entirely clear on how acceptable it is to propose a different
> > > > patch for -stable than what we have upstream.
> > > 
> > > It's not all that common, but it certainly happens.
> > > 
> > > It's fine, as long as it mentions the commits that fix it upstream.
> > > And as long as it's well tested, of course.
> > 
> > I agree, I can take this as long as you say it's correct and tested...
> 
> As far as I understand the issue the patch is indeed correct and I have
> 3 independent people who confirm their previously reported issues are
> now cured (as testified by the Tested-by tags).
> 
> There has also been confirmation that upstream does no longer suffer the
> problem.

Now applied, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-05-13 23:53 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-06 15:19 Bisected KVM hang on x86-32 between v3.12 and v3.13 Michele Ballabio
2014-04-06 15:52 ` Toralf Förster
2014-04-06 17:40   ` Michele Ballabio
2014-04-07 15:03 ` Peter Zijlstra
2014-04-07 15:07   ` Peter Zijlstra
2014-04-07 18:16     ` Toralf Förster
2014-04-07 18:56       ` Peter Zijlstra
2014-04-08 12:21         ` Peter Zijlstra
2014-04-08 19:14           ` Michele Ballabio
2014-04-08 19:51             ` Michele Ballabio
2014-04-08 20:28           ` Toralf Förster
2014-04-09  9:14           ` Stefan Bader
2014-04-09  9:45             ` Peter Zijlstra
2014-04-09 14:24               ` [PATCH -stable] x86,preempt: Fix preemption for i386 Peter Zijlstra
2014-04-09 14:36                 ` Linus Torvalds
2014-04-09 19:19                   ` Greg KH
2014-04-09 19:38                     ` Peter Zijlstra
2014-04-09 19:57                       ` Greg KH
2014-05-13 23:56                       ` Greg KH
2014-04-07 18:59       ` Bisected KVM hang on x86-32 between v3.12 and v3.13 Frederic Weisbecker
2014-04-07 19:57         ` Toralf Förster
2014-04-07 22:43           ` Frederic Weisbecker
2014-04-07 19:49     ` Michele Ballabio

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.