From: "Michael S. Tsirkin" <mst@redhat.com>
To: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org, linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org, benh@kernel.crashing.org,
	paulus@samba.org, akpm@linux-foundation.org,
	heiko.carstens@de.ibm.com, schwidefsky@de.ibm.com,
	borntraeger@de.ibm.com, mingo@kernel.org
Subject: Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic
Date: Wed, 26 Nov 2014 18:19:47 +0200
Message-ID: <20141126161947.GA10850@redhat.com>
In-Reply-To: <20141126170223.3b108b94@thinkpad-w530>

On Wed, Nov 26, 2014 at 05:02:23PM +0100, David Hildenbrand wrote:
> > > This is what happened on our side (very recent kernel):
> > > 
> > > spin_lock(&lock)
> > > copy_to_user(...)
> > > spin_unlock(&lock)
> > 
> > That's a deadlock even without copy_to_user - it's
> > enough for the thread to be preempted and another one
> > to try taking the lock.
> > 
> > 
> > > 1. s390 locks/unlocks a spin lock with a compare and swap, using the _cpu id_
> > >    as "old value"
> > > 2. we slept during copy_to_user()
> > > 3. the thread got scheduled onto another cpu
> > > 4. spin_unlock failed as the _cpu id_ didn't match (another cpu that locked
> > >    the spinlock tried to unlock it).
> > > 5. lock remained locked -> deadlock
> > > 
> > > Christian came up with the following explanation:
> > > Without preemption, spin_lock() will not touch the preempt counter.
> > > disable_pfault() will always touch it.
> > > 
> > > Therefore, with preemption disabled, copy_to_user() has no idea that it is
> > > running in atomic context - and will therefore try to sleep.
> > >
> > > So copy_to_user() will on s390:
> > > 1. run "as atomic" while spin_lock() with preemption enabled.
> > > 2. run "as not atomic" while spin_lock() with preemption disabled.
> > > 3. run "as atomic" while pagefault_disabled() with preemption enabled or
> > >    disabled.
> > > 4. run "as not atomic" when really not atomic.
> 
> should have been more clear at that point: 
> preemption enabled == kernel compiled with preemption support
> preemption disabled == kernel compiled without preemption support
> 
> > > 
> > > And exactly nr 2. is the thing that produced the deadlock in our scenario and
> > > the reason why I want a might_sleep() :)
> > 
> > IMHO it's not copy to user that causes the problem.
> > It's the misuse of spinlocks with preemption on.
> 
> As I said, preemption was off.

off -> disabled at compile time?

But the code is broken for people that do enable it.
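
The sequence in the quoted text (lock taken on one CPU, sleep inside
copy_to_user(), unlock attempted on another CPU) can be sketched as a small
userspace model. This is only an illustration under stated assumptions, not
the s390 implementation: every model_* name, the struct, and the "sleeping
moves the thread to the next CPU" rule are hypothetical. It assumes, as
described in the thread, that spin_lock() only touches the preempt counter
when preemption support is compiled in, and that pagefault_disable() always
touches it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these are the real kernel functions. */
#define LOCK_FREE 0

struct model {
    bool preempt_compiled_in; /* stands in for preemption support at build time */
    int preempt_count;        /* nonzero looks "atomic" to the copy path */
    int cpu;                  /* CPU the thread currently runs on (1-based) */
    int lock_owner;           /* LOCK_FREE or the CPU id that took the lock */
};

static void model_spin_lock(struct model *m)
{
    if (m->preempt_compiled_in)
        m->preempt_count++;   /* only touched when preemption is built in */
    m->lock_owner = m->cpu;   /* lock word records the locking CPU */
}

/* Compare-and-swap style unlock: succeeds only if the owner id matches. */
static bool model_spin_unlock(struct model *m)
{
    if (m->lock_owner != m->cpu)
        return false;         /* "old value" mismatch, lock stays held */
    m->lock_owner = LOCK_FREE;
    if (m->preempt_compiled_in)
        m->preempt_count--;
    return true;
}

static void model_pagefault_disable(struct model *m)
{
    m->preempt_count++;       /* always touches the counter */
}

static void model_pagefault_enable(struct model *m)
{
    assert(m->preempt_count > 0); /* calls must be balanced */
    m->preempt_count--;
}

/* Returns true if the copy "slept"; in this model sleeping migrates the thread. */
static bool model_copy_to_user(struct model *m)
{
    if (m->preempt_count != 0)
        return false;         /* looks atomic: do not sleep */
    m->cpu = m->cpu + 1;      /* looks non-atomic: sleep, wake up elsewhere */
    return true;
}

/* Runs lock / copy / unlock and reports whether the lock is left held. */
static bool lock_left_held(bool preempt_compiled_in)
{
    struct model m = { preempt_compiled_in, 0, 1, LOCK_FREE };

    model_spin_lock(&m);
    model_copy_to_user(&m);
    return !model_spin_unlock(&m);
}

/* Case 3 from the quoted list: pagefaults disabled, preemption compiled out. */
static bool sleeps_under_pagefault_disable(void)
{
    struct model m = { false, 0, 1, LOCK_FREE };
    bool slept;

    model_pagefault_disable(&m);
    slept = model_copy_to_user(&m);
    model_pagefault_enable(&m);
    return slept;
}
```

In this model only the "compiled out" case leaves the lock held: the counter
stays at zero under the lock, so the copy path sleeps and the owner id no
longer matches at unlock time.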


> > 
> > So might_sleep would make you think copy_to_user is
> > the problem, and e.g. let you paper over it by
> > moving copy_to_user out.
> 
> Actually implementing a different way of locking easily fixed the problem for us.
> The old might_sleep() checks would have given us the problem within a few
> seconds (I tested it).


Or enable CONFIG_PREEMPT, with the same effect (copy_to_user will report
an error).

Do you check the return code from copy_to_user?
If not, then you have another bug ...
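
As a hedged illustration of the return-code check: copy_to_user() returns the
number of bytes it could not copy, so any nonzero result is normally turned
into -EFAULT. The sketch below is a userspace stand-in, not kernel code.
mock_copy_to_user(), its fault_after parameter, and report_value() are
hypothetical names introduced only for this example.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_to_user(): like the kernel helper, it
 * returns the number of bytes NOT copied. fault_after is an assumption of
 * this mock and gives the number of bytes that succeed before a "fault".
 */
static unsigned long mock_copy_to_user(void *to, const void *from,
                                       unsigned long n,
                                       unsigned long fault_after)
{
    unsigned long ok = n < fault_after ? n : fault_after;

    memcpy(to, from, ok);
    return n - ok;
}

/* Caller pattern: treat any uncopied byte as a failure. */
static long report_value(void *ubuf, const int *val, unsigned long fault_after)
{
    if (mock_copy_to_user(ubuf, val, sizeof(*val), fault_after))
        return -EFAULT;
    return 0;
}
```

A caller that ignores the return value would continue as if the user buffer
had been filled, which is the second bug hinted at above.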

> > 
> > Enable lock prover and you will see what the real
> > issue is, which is you didn't disable preempt.
> > and if you did, copy_to_user would be okay.
> > 
> 
> Our kernel is compiled without preemption and we turned on all lock/atomic
> sleep debugging aid. No problem was detected.

But your code is still buggy with preemption on, isn't it?


> ----
> But the question is if we shouldn't rather provide a:
> 
>   copy_to_user_nosleep() implementation that can be called from
>     pagefault_disable() because it won't sleep.
> and a
>   copy_to_user_sleep() implementation that cannot be called from
>     pagefault_disable().
> 
> Another way to fix it would be a reworked pagefault_disable() function that
> somehow sets "a flag", so copy_to_user() knows that it is in fact called from a
> valid context, not just from "some atomic" context. So we could trigger
> might_sleep() when detecting a !pagefault_disable context.

I think all this is just directing people to paper over the
problem. You should normally disable preemption if you take
spinlocks.
Yes, it might happen to work if preempt is compiled out
and you don't trigger the scheduler, but Linux might
add scheduler calls at any point without notice, so
code must be preempt safe.

Maybe add a debug option warning about spinlocks taken
with preempt on.

That would make sense I think.


-- 
MST
