linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] VFIO fixes for v4.1-rc2
@ 2015-05-01 17:40 Alex Williamson
  2015-05-01 18:37 ` Linus Torvalds
  0 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2015-05-01 17:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, kvm

Hi Linus,

The following changes since commit b787f68c36d49bb1d9236f403813641efa74a031:

  Linux 4.1-rc1 (2015-04-26 17:59:10 -0700)

are available in the git repository at:

  git://github.com/awilliam/linux-vfio.git tags/vfio-v4.1-rc2

for you to fetch changes up to 82a0eaab980a3af92d46e93eaf299d6c1428c52b:

  vfio-pci: Log device requests more verbosely (2015-04-28 10:23:30 -0600)

----------------------------------------------------------------
Fix some undesirable behavior with the vfio device request interface:

- Flush signals on interrupted wait to retain polling interval (Alex Williamson)

- Increase verbosity of device request channel (Alex Williamson)

----------------------------------------------------------------
Alex Williamson (2):
      vfio: Flush signals on device request interruption
      vfio-pci: Log device requests more verbosely

 drivers/vfio/pci/vfio_pci.c |  8 +++++++-
 drivers/vfio/vfio.c         | 13 ++++++++++---
 2 files changed, 17 insertions(+), 4 deletions(-)



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [GIT PULL] VFIO fixes for v4.1-rc2
  2015-05-01 17:40 [GIT PULL] VFIO fixes for v4.1-rc2 Alex Williamson
@ 2015-05-01 18:37 ` Linus Torvalds
  2015-05-01 18:48   ` Alex Williamson
                     ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Linus Torvalds @ 2015-05-01 18:37 UTC (permalink / raw)
  To: Alex Williamson, Oleg Nesterov; +Cc: linux-kernel, kvm

On Fri, May 1, 2015 at 10:40 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> - Flush signals on interrupted wait to retain polling interval (Alex Williamson)

This cannot *possibly* be right. If I read this patch right, you're
randomly just getting rid of signals. No way in hell is that correct.

"flush_signals()" is only for kernel threads, where it's a hacky
alternative to actually handling them (since kernel threads never
rreturn to user space and cannot really "handle" a signal). But you're
doing it in the ->remove handler for the device, which can be called
by arbitrary system processes. This is not a kernel thread thing, as
far as I can see.

If you cannot handle signals, you damn well shouldn't be using
"wait_event_interruptible_timeout()" to begin with. Get rid of the
"interruptible", since it apparently *isn't* interruptible.

So I'm not pulling this.

Now I'm worried that other drivers do insane things like this. I
wonder if we should add some sanity test to flush_signals() to make
sure that it can only ever get called from a kernel thread.

Oleg?

                          Linus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [GIT PULL] VFIO fixes for v4.1-rc2
  2015-05-01 18:37 ` Linus Torvalds
@ 2015-05-01 18:48   ` Alex Williamson
  2015-05-01 20:23     ` Linus Torvalds
  2015-05-01 19:38   ` [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context Ingo Molnar
  2015-05-01 20:11   ` [GIT PULL] VFIO fixes for v4.1-rc2 Richard Weinberger
  2 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2015-05-01 18:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Oleg Nesterov, linux-kernel, kvm

On Fri, 2015-05-01 at 11:37 -0700, Linus Torvalds wrote:
> On Fri, May 1, 2015 at 10:40 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> >
> > - Flush signals on interrupted wait to retain polling interval (Alex Williamson)
> 
> This cannot *possibly* be right. If I read this patch right, you're
> randomly just getting rid of signals. No way in hell is that correct.
> 
> "flush_signals()" is only for kernel threads, where it's a hacky
> alternative to actually handling them (since kernel threads never
> rreturn to user space and cannot really "handle" a signal). But you're
> doing it in the ->remove handler for the device, which can be called
> by arbitrary system processes. This is not a kernel thread thing, as
> far as I can see.
> 
> If you cannot handle signals, you damn well shouldn't be using
> "wait_event_interruptible_timeout()" to begin with. Get rid of the
> "interruptible", since it apparently *isn't* interruptible.
> 
> So I'm not pulling this.

Ok.  It seemed like useful behavior to be able to provide some response
to the user in the event that a ->remove handler is blocked by a device
in-use and the user attempts to abort the action.  Thanks for reviewing,

Alex


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context
  2015-05-01 18:37 ` Linus Torvalds
  2015-05-01 18:48   ` Alex Williamson
@ 2015-05-01 19:38   ` Ingo Molnar
  2015-05-02  8:30     ` NeilBrown
                       ` (3 more replies)
  2015-05-01 20:11   ` [GIT PULL] VFIO fixes for v4.1-rc2 Richard Weinberger
  2 siblings, 4 replies; 21+ messages in thread
From: Ingo Molnar @ 2015-05-01 19:38 UTC (permalink / raw)
  To: Linus Torvalds, Neil Brown, Evgeniy Polyakov, Stephen Smalley
  Cc: Alex Williamson, Oleg Nesterov, linux-kernel, kvm


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, May 1, 2015 at 10:40 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> >
> > - Flush signals on interrupted wait to retain polling interval (Alex Williamson)
> 
> This cannot *possibly* be right. If I read this patch right, you're
> randomly just getting rid of signals. No way in hell is that correct.
> 
> "flush_signals()" is only for kernel threads, where it's a hacky
> alternative to actually handling them (since kernel threads never
> rreturn to user space and cannot really "handle" a signal). But you're
> doing it in the ->remove handler for the device, which can be called
> by arbitrary system processes. This is not a kernel thread thing, as
> far as I can see.
> 
> If you cannot handle signals, you damn well shouldn't be using
> "wait_event_interruptible_timeout()" to begin with. Get rid of the
> "interruptible", since it apparently *isn't* interruptible.
> 
> So I'm not pulling this.
> 
> Now I'm worried that other drivers do insane things like this. I
> wonder if we should add some sanity test to flush_signals() to make
> sure that it can only ever get called from a kernel thread.
> 
> Oleg?

So there are these uses:

  triton:~/tip> git grep -lw flush_signals
  arch/arm/common/bL_switcher.c

Looks safe: used within the bL_switcher_thread() kthread.

  drivers/block/drbd/drbd_main.c
  drivers/block/drbd/drbd_nl.c
  drivers/block/drbd/drbd_receiver.c
  drivers/block/drbd/drbd_worker.c

Couldn't convince myself it's safe, but it appears to be. (Call chains 
are obfuscated in various ways that makes it hard to tell where a 
given function execute.)

  drivers/md/md.c
  drivers/md/raid1.c
  drivers/md/raid5.c

Hm, so I'm not super sure about the flush_signals() in 
raid1.c:make_request() AFAICS we can do direct RAID1 writes in 
raid1_unplug(). That looks unsafe ... I've Cc:-ed Neil.

raid5.c seems safe: raid5_unplug() doesn't create requests directly, 
leaves it all for the mddev kthread.

  drivers/scsi/bnx2fc/bnx2fc_fcoe.c
  drivers/scsi/bnx2fc/bnx2fc_tgt.c
  drivers/scsi/bnx2i/bnx2i_iscsi.c
  drivers/scsi/libiscsi.c
  drivers/target/iscsi/iscsi_target_login.c
  drivers/target/iscsi/iscsi_target_nego.c

Couldn't fully check it due to excessive complexity, but seemed safe.

  drivers/staging/rtl8188eu/core/rtw_cmd.c
  drivers/staging/rtl8712/osdep_service.h

Looks safe: done in RTW_CMD_THREAD and 'padapter' kthreads.

  drivers/w1/w1_family.c
  drivers/w1/w1_int.c

Looks unsafe: called from various module exit handlers in:

  drivers/w1/slaves/w1_bq27000.c
  drivers/w1/slaves/w1_ds2406.c
  drivers/w1/slaves/w1_ds2408.c
  drivers/w1/slaves/w1_ds2413.c
  drivers/w1/slaves/w1_ds2423.c
  drivers/w1/slaves/w1_ds2431.c
  drivers/w1/slaves/w1_ds2433.c
  drivers/w1/slaves/w1_ds2760.c
  drivers/w1/slaves/w1_ds2780.c
  drivers/w1/slaves/w1_ds2781.c
  drivers/w1/slaves/w1_ds28e04.c
  drivers/w1/slaves/w1_smem.c
  drivers/w1/slaves/w1_therm.c

which would be executed in rmmod context, losing signals.
Cc:-ed Evgeniy.

  fs/lockd/svc.c
  fs/nfs/callback.c
  fs/nfsd/nfssvc.c

Looks safe: lockd, nfsd plus nfsv4.%u-svc kthreads.

  kernel/locking/rtmutex-tester.c

Looks safe: used within a kthread.

  include/linux/sched.h
  kernel/signal.c

Both safe ;-)

I also found a __flush_signals() use in:

  security/selinux/hooks.c

Now that's selinux_bprm_committed_creds(), apparently executed on 
exec(). Also does stuff like:

                memset(&itimer, 0, sizeof itimer);
                for (i = 0; i < 3; i++)
                        do_setitimer(i, &itimer, NULL);

and unblocks signals as well:

                        sigemptyset(&current->blocked);

but this appears to be kind of legit: the task failed to get the 
required permissions, and guns go off.

In any case, it seems to me that the patch below would be justified? 
Totally untested and so. __flush_signals() not affected.

Thanks,

	Ingo

---
 kernel/signal.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/signal.c b/kernel/signal.c
index d51c5ddd855c..100e30afe5d2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -427,6 +427,10 @@ void flush_signals(struct task_struct *t)
 {
 	unsigned long flags;
 
+	/* Only kthreads are allowed to destroy signals: */
+	if (WARN_ON_ONCE(!(current->flags & PF_KTHREAD)))
+		return;
+
 	spin_lock_irqsave(&t->sighand->siglock, flags);
 	__flush_signals(t);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [GIT PULL] VFIO fixes for v4.1-rc2
  2015-05-01 18:37 ` Linus Torvalds
  2015-05-01 18:48   ` Alex Williamson
  2015-05-01 19:38   ` [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context Ingo Molnar
@ 2015-05-01 20:11   ` Richard Weinberger
  2015-05-01 21:09     ` Richard Weinberger
  2 siblings, 1 reply; 21+ messages in thread
From: Richard Weinberger @ 2015-05-01 20:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alex Williamson, Oleg Nesterov, linux-kernel, kvm

On Fri, May 1, 2015 at 8:37 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> "flush_signals()" is only for kernel threads, where it's a hacky
> alternative to actually handling them (since kernel threads never
> rreturn to user space and cannot really "handle" a signal). But you're
> doing it in the ->remove handler for the device, which can be called
> by arbitrary system processes. This is not a kernel thread thing, as
> far as I can see.
>
> If you cannot handle signals, you damn well shouldn't be using
> "wait_event_interruptible_timeout()" to begin with. Get rid of the
> "interruptible", since it apparently *isn't* interruptible.
>
> So I'm not pulling this.
>
> Now I'm worried that other drivers do insane things like this. I
> wonder if we should add some sanity test to flush_signals() to make
> sure that it can only ever get called from a kernel thread.

Hmm, a quick grep exposes some questionable users.
At least w1 looks fishy.
drivers/w1/w1_family.c:w1_unregister_family
drivers/w1/w1_int.c:__w1_remove_master_device

What do you think about a WARN_ON like:

diff --git a/kernel/signal.c b/kernel/signal.c
index d51c5dd..b4079c3 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -427,6 +427,8 @@ void flush_signals(struct task_struct *t)
 {
        unsigned long flags;

+       WARN_ON((current->flags & PF_KTHREAD) == 0);
+
        spin_lock_irqsave(&t->sighand->siglock, flags);
        __flush_signals(t);
        spin_unlock_irqrestore(&t->sighand->siglock, flags);

-- 
Thanks,
//richard

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [GIT PULL] VFIO fixes for v4.1-rc2
  2015-05-01 18:48   ` Alex Williamson
@ 2015-05-01 20:23     ` Linus Torvalds
  2015-05-01 22:03       ` Alex Williamson
  0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2015-05-01 20:23 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Oleg Nesterov, linux-kernel, kvm

On Fri, May 1, 2015 at 11:48 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> Ok.  It seemed like useful behavior to be able to provide some response
> to the user in the event that a ->remove handler is blocked by a device
> in-use and the user attempts to abort the action.

Well, that kind of notification *might* be useful, but at the cost of
saying "somebody tried to send you a signal, so I am now telling you
about it, and then deleting that signal, and you'll never know what it
actually was"?

That's not useful, that's just wrong.

Now, what might in theory be useful - but I haven't actually seen
anybody do anything like that - is to start out with an interruptible
sleep, warn if you get interrupted, and then continue with an
un-interruptible sleep (leaving the signal active).

But even that sounds like a very special case, and I don't think
anything has ever done that.

In general, our signal handling falls into three distinct categories:

 (a) interruptible (and you can cancel the operation and return "try again")

 (b) killable (you can cancel the operation, knowing that the
requester will be killed and won't try again)

 (c) uninterruptible

where that (b) tends to be a special case of an operation that
technically isn't really interruptible (because the ABI doesn't allow
for retrying or error returns), but knowing that the caller will never
see the error case because it's killed means that you can do it. The
classic example of that is an NFS mount that is mounted "nointr" - you
can't return EINTR for a read or a write (because that invalidates
POSIX) but you want to let SIGKILL just kill the process in the middle
when the network is hung.

                      Linus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [GIT PULL] VFIO fixes for v4.1-rc2
  2015-05-01 20:11   ` [GIT PULL] VFIO fixes for v4.1-rc2 Richard Weinberger
@ 2015-05-01 21:09     ` Richard Weinberger
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Weinberger @ 2015-05-01 21:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alex Williamson, Oleg Nesterov, linux-kernel, kvm

...of course I meant t-> and not current->



On 5/1/15, Richard Weinberger <richard.weinberger@gmail.com> wrote:
> On Fri, May 1, 2015 at 8:37 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> "flush_signals()" is only for kernel threads, where it's a hacky
>> alternative to actually handling them (since kernel threads never
>> rreturn to user space and cannot really "handle" a signal). But you're
>> doing it in the ->remove handler for the device, which can be called
>> by arbitrary system processes. This is not a kernel thread thing, as
>> far as I can see.
>>
>> If you cannot handle signals, you damn well shouldn't be using
>> "wait_event_interruptible_timeout()" to begin with. Get rid of the
>> "interruptible", since it apparently *isn't* interruptible.
>>
>> So I'm not pulling this.
>>
>> Now I'm worried that other drivers do insane things like this. I
>> wonder if we should add some sanity test to flush_signals() to make
>> sure that it can only ever get called from a kernel thread.
>
> Hmm, a quick grep exposes some questionable users.
> At least w1 looks fishy.
> drivers/w1/w1_family.c:w1_unregister_family
> drivers/w1/w1_int.c:__w1_remove_master_device
>
> What do you think about a WARN_ON like:
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index d51c5dd..b4079c3 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -427,6 +427,8 @@ void flush_signals(struct task_struct *t)
>  {
>         unsigned long flags;
>
> +       WARN_ON((current->flags & PF_KTHREAD) == 0);
> +
>         spin_lock_irqsave(&t->sighand->siglock, flags);
>         __flush_signals(t);
>         spin_unlock_irqrestore(&t->sighand->siglock, flags);
>
> --
> Thanks,
> //richard
>


-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [GIT PULL] VFIO fixes for v4.1-rc2
  2015-05-01 20:23     ` Linus Torvalds
@ 2015-05-01 22:03       ` Alex Williamson
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Williamson @ 2015-05-01 22:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Oleg Nesterov, linux-kernel, kvm

On Fri, 2015-05-01 at 13:23 -0700, Linus Torvalds wrote:
> On Fri, May 1, 2015 at 11:48 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> >
> > Ok.  It seemed like useful behavior to be able to provide some response
> > to the user in the event that a ->remove handler is blocked by a device
> > in-use and the user attempts to abort the action.
> 
> Well, that kind of notification *might* be useful, but at the cost of
> saying "somebody tried to send you a signal, so I am now telling you
> about it, and then deleting that signal, and you'll never know what it
> actually was"?
> 
> That's not useful, that's just wrong.

Yep, it was a bad idea.

> Now, what might in theory be useful - but I haven't actually seen
> anybody do anything like that - is to start out with an interruptible
> sleep, warn if you get interrupted, and then continue with an
> un-interruptible sleep (leaving the signal active).

I was considering doing exactly this.

> But even that sounds like a very special case, and I don't think
> anything has ever done that.
> 
> In general, our signal handling falls into three distinct categories:
> 
>  (a) interruptible (and you can cancel the operation and return "try again")
> 
>  (b) killable (you can cancel the operation, knowing that the
> requester will be killed and won't try again)
> 
>  (c) uninterruptible
> 
> where that (b) tends to be a special case of an operation that
> technically isn't really interruptible (because the ABI doesn't allow
> for retrying or error returns), but knowing that the caller will never
> see the error case because it's killed means that you can do it. The
> classic example of that is an NFS mount that is mounted "nointr" - you
> can't return EINTR for a read or a write (because that invalidates
> POSIX) but you want to let SIGKILL just kill the process in the middle
> when the network is hung.

I think we're in that (c) case unless we want to change our driver API
to allow driver remove callbacks to return error.  Killing the caller
doesn't really help the situation without being able to back out of the
remove path.  Killing the task with the device open would help, but
seems rather harsh.  I expect we eventually want to be able to escalate
to revoking the device from the user, but currently we only have a
notifier to request the device from cooperative users.  In the event of
an uncooperative user, we block, which can be difficult to figure out,
especially when we're dealing with SR-IOV devices and a PF unbind
implicitly induces a VF unbind.  The interruptible component here is
simply a logging mechanism which should have turned into an
"interruptible_once" rather than a signal flush.

I try to avoid vfio being a special case, but maybe in this instance
it's worthwhile.  If you have other suggestions, please let me know.
Thanks,

Alex


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context
  2015-05-01 19:38   ` [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context Ingo Molnar
@ 2015-05-02  8:30     ` NeilBrown
  2015-05-02 16:27       ` Linus Torvalds
  2015-05-04 17:35       ` Oleg Nesterov
  2015-05-02 11:56     ` Evgeniy Polyakov
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 21+ messages in thread
From: NeilBrown @ 2015-05-02  8:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Evgeniy Polyakov, Stephen Smalley,
	Alex Williamson, Oleg Nesterov, linux-kernel, kvm

[-- Attachment #1: Type: text/plain, Size: 2044 bytes --]

On Fri, 1 May 2015 21:38:13 +0200 Ingo Molnar <mingo@kernel.org> wrote:


>   drivers/md/md.c
>   drivers/md/raid1.c
>   drivers/md/raid5.c
> 
> Hm, so I'm not super sure about the flush_signals() in 
> raid1.c:make_request() AFAICS we can do direct RAID1 writes in 
> raid1_unplug(). That looks unsafe ... I've Cc:-ed Neil.
> 
> raid5.c seems safe: raid5_unplug() doesn't create requests directly, 
> leaves it all for the mddev kthread.

Both raid1.c and raid5.c call flush_signals() in the make_request function
(in unusual circumstances).
I wanted a uninterruptible wait which didn't add to load-average.  That
approach works in kernel threads...

All the calls in md.c are in a kernel thread so safe, but I'd rather have an
explicit "uninterruptible, but no load-average" wait....

I should  probably change the make_request code to queue the request
somewhere rather than wait for it to be serviceable.

I'll look into that...



> In any case, it seems to me that the patch below would be justified? 
> Totally untested and so. __flush_signals() not affected.

Fine by me - does seem justified.

Thanks,
NeilBrown


> 
> Thanks,
> 
> 	Ingo
> 
> ---
>  kernel/signal.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index d51c5ddd855c..100e30afe5d2 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -427,6 +427,10 @@ void flush_signals(struct task_struct *t)
>  {
>  	unsigned long flags;
>  
> +	/* Only kthreads are allowed to destroy signals: */
> +	if (WARN_ON_ONCE(!(current->flags & PF_KTHREAD)))
> +		return;
> +
>  	spin_lock_irqsave(&t->sighand->siglock, flags);
>  	__flush_signals(t);
>  	spin_unlock_irqrestore(&t->sighand->siglock, flags);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context
  2015-05-01 19:38   ` [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context Ingo Molnar
  2015-05-02  8:30     ` NeilBrown
@ 2015-05-02 11:56     ` Evgeniy Polyakov
  2015-05-02 16:33     ` Richard Weinberger
  2015-05-03 17:34     ` Oleg Nesterov
  3 siblings, 0 replies; 21+ messages in thread
From: Evgeniy Polyakov @ 2015-05-02 11:56 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds, Neil Brown, Stephen Smalley
  Cc: Alex Williamson, Oleg Nesterov, linux-kernel, kvm

Hi Ingo

01.05.2015, 22:38, "Ingo Molnar" <mingo@kernel.org>:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>  On Fri, May 1, 2015 at 10:40 AM, Alex Williamson
>>  <alex.williamson@redhat.com> wrote:
>>>  - Flush signals on interrupted wait to retain polling interval (Alex Williamson)
>>  This cannot *possibly* be right. If I read this patch right, you're
>>  randomly just getting rid of signals. No way in hell is that correct.
>>
>>  "flush_signals()" is only for kernel threads, where it's a hacky
>>  alternative to actually handling them (since kernel threads never
>>  rreturn to user space and cannot really "handle" a signal). But you're
>>  doing it in the ->remove handler for the device, which can be called
>>  by arbitrary system processes. This is not a kernel thread thing, as
>>  far as I can see.
>>
>>  If you cannot handle signals, you damn well shouldn't be using
>>  "wait_event_interruptible_timeout()" to begin with. Get rid of the
>>  "interruptible", since it apparently *isn't* interruptible.
>>
>>  So I'm not pulling this.
>>
>>  Now I'm worried that other drivers do insane things like this. I
>>  wonder if we should add some sanity test to flush_signals() to make
>>  sure that it can only ever get called from a kernel thread.

> Looks unsafe: called from various module exit handlers in:
>
>   drivers/w1/slaves/w1_bq27000.c
>   drivers/w1/slaves/w1_ds2406.c
>   drivers/w1/slaves/w1_ds2408.c
>   drivers/w1/slaves/w1_ds2413.c
>   drivers/w1/slaves/w1_ds2423.c
>   drivers/w1/slaves/w1_ds2431.c
>   drivers/w1/slaves/w1_ds2433.c
>   drivers/w1/slaves/w1_ds2760.c
>   drivers/w1/slaves/w1_ds2780.c
>   drivers/w1/slaves/w1_ds2781.c
>   drivers/w1/slaves/w1_ds28e04.c
>   drivers/w1/slaves/w1_smem.c
>   drivers/w1/slaves/w1_therm.c
>
> which would be executed in rmmod context, losing signals.
> Cc:-ed Evgeniy.

w1 uses a little bit strange refcnt logic, basically every w1 module waits
for all its users to release their w1 resources and doesn't exit until its safe.

To wait for resources to be freed at module exit time it checks its refcnt to drop to zero periodically
and sleeps between the checks for a second. w1 flushes signals between these
checks for no particular reason, it can be safely removed from w1_unregister_family()
and interruptible sleep replaced with the normal one.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context
  2015-05-02  8:30     ` NeilBrown
@ 2015-05-02 16:27       ` Linus Torvalds
  2015-05-07 12:54         ` Peter Zijlstra
  2015-05-04 17:35       ` Oleg Nesterov
  1 sibling, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2015-05-02 16:27 UTC (permalink / raw)
  To: NeilBrown
  Cc: Ingo Molnar, Evgeniy Polyakov, Stephen Smalley, Alex Williamson,
	Oleg Nesterov, linux-kernel, kvm

On Sat, May 2, 2015 at 1:30 AM, NeilBrown <neilb@suse.de> wrote:
>
> All the calls in md.c are in a kernel thread so safe, but I'd rather have an
> explicit "uninterruptible, but no load-average" wait....

Hmm. Our task state is a bitmask anyway, so we could probably just add a

    #define __TASK_NOLOAD              16

(and move the EXIT_xyz defines *away* from the list that is actually
the task state), and teach our load average thing to not count those
kinds of waits. Then you could just use

    TASK_UNINTERRUPTIBLE | __TASK_NOLOAD

to make processes not count towards the load.

Or - probably preferably - we could really clean things up, and make
things much more like the bitmask it *should* be, and have explicit
bits for

 - SLEEPING/STOPPED/EXITING ("why not running?")
 - LOADAVG (accounted towards load)
 - WAKESIG (ie "interruptible")
 - WAKEKILL (this we already have)

and just make the rule be that we use "__TASK_xyz" for the actual
individual bits, and "TASK_xyz" for the helper combinations. So then
we'd have

   #define TASK_UNINTERRUPTIBLE (__TASK_SLEEPING | __TASK_LOADAVG)
   #define TASK_INTERRUPTIBLE (__TASK_SLEEPING | __TASK_WAKESIG)
   #define TASK_KILLABLE (__TASK_SLEEPING | __TASK_WAKEKILL)
   #define TASK_NOLOADAVG (__TASK_SLEEPING)

which is almost certainly how this *should* have been done, but isn't,
because of historical use.

Cleaning up like that *should* be fairly simple, but I'd be a bit
nervous about getting all the state comparisons right (we have an
unholy mix of "check this bit" and "check this whole state", and we'd
need to make sure we get those cases all right).

Ingo, what do you think? This is mostly a scheduler interface issue..

                       Linus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context
  2015-05-01 19:38   ` [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context Ingo Molnar
  2015-05-02  8:30     ` NeilBrown
  2015-05-02 11:56     ` Evgeniy Polyakov
@ 2015-05-02 16:33     ` Richard Weinberger
  2015-05-03 17:34     ` Oleg Nesterov
  3 siblings, 0 replies; 21+ messages in thread
From: Richard Weinberger @ 2015-05-02 16:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Neil Brown, Evgeniy Polyakov, Stephen Smalley,
	Alex Williamson, Oleg Nesterov, linux-kernel, kvm

On Fri, May 1, 2015 at 9:38 PM, Ingo Molnar <mingo@kernel.org> wrote:
> diff --git a/kernel/signal.c b/kernel/signal.c
> index d51c5ddd855c..100e30afe5d2 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -427,6 +427,10 @@ void flush_signals(struct task_struct *t)
>  {
>         unsigned long flags;
>
> +       /* Only kthreads are allowed to destroy signals: */
> +       if (WARN_ON_ONCE(!(current->flags & PF_KTHREAD)))

Shouldn't this be t->flags?

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context
  2015-05-01 19:38   ` [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context Ingo Molnar
                       ` (2 preceding siblings ...)
  2015-05-02 16:33     ` Richard Weinberger
@ 2015-05-03 17:34     ` Oleg Nesterov
  2015-05-04 16:45       ` [PATCH 0/1] signals: don't abuse __flush_signals() in selinux_bprm_committed_creds() Oleg Nesterov
  2015-05-06 10:19       ` [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context Ingo Molnar
  3 siblings, 2 replies; 21+ messages in thread
From: Oleg Nesterov @ 2015-05-03 17:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Neil Brown, Evgeniy Polyakov, Stephen Smalley,
	Alex Williamson, linux-kernel, kvm

On 05/01, Ingo Molnar wrote:
>
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > On Fri, May 1, 2015 at 10:40 AM, Alex Williamson
> > <alex.williamson@redhat.com> wrote:
> > >
> > > - Flush signals on interrupted wait to retain polling interval (Alex Williamson)
> >
> > This cannot *possibly* be right. If I read this patch right, you're
> > randomly just getting rid of signals. No way in hell is that correct.
> >
> > "flush_signals()" is only for kernel threads, where it's a hacky
> > alternative to actually handling them (since kernel threads never
> > rreturn to user space and cannot really "handle" a signal). But you're
> > doing it in the ->remove handler for the device, which can be called
> > by arbitrary system processes. This is not a kernel thread thing, as
> > far as I can see.
> >
> > If you cannot handle signals, you damn well shouldn't be using
> > "wait_event_interruptible_timeout()" to begin with. Get rid of the
> > "interruptible", since it apparently *isn't* interruptible.
> >
> > So I'm not pulling this.
> >
> > Now I'm worried that other drivers do insane things like this. I
> > wonder if we should add some sanity test to flush_signals() to make
> > sure that it can only ever get called from a kernel thread.
> >
> > Oleg?
>
> So there are these uses:
>
>   triton:~/tip> git grep -lw flush_signals
>   arch/arm/common/bL_switcher.c
>
>
> Looks safe: used within the bL_switcher_thread() kthread.

safe but wrong at first glance... I mean, it is pointless. Looks like,
bL_switcher_thread() wrongly thinks that wait_event_interruptible() can
lead to signal_pending().

>   drivers/block/drbd/drbd_main.c
>   drivers/block/drbd/drbd_nl.c
>   drivers/block/drbd/drbd_receiver.c
>   drivers/block/drbd/drbd_worker.c


Oh, I didn't know this helper is abused so much. I'll try to recheck
tomorrow, but it seems that it should die...


>   drivers/md/md.c

I can't understand this code... The comment says:

	/* We need to wait INTERRUPTIBLE so that
	 * we don't add to the load-average.
	 * That means we need to be sure no signals are
	 * pending
	 */
	if (signal_pending(current))
		flush_signals(current);

and this is wrong. However, signal_pending() can be true because of
allow_signal(SIGKILL) above. But why it does allow_signal() ?


>   fs/lockd/svc.c
>   fs/nfs/callback.c
>   fs/nfsd/nfssvc.c
>
> Looks safe: lockd, nfsd plus nfsv4.%u-svc kthreads.

Yes, this case looks fine. But perhaps it makes sense to add another
helper... Or not, I'll try to think more.

> I also found a __flush_signals() use in:
>
>   security/selinux/hooks.c
>
> Now that's selinux_bprm_committed_creds(), apparently executed on
> exec(). Also does stuff like:
>
>                 memset(&itimer, 0, sizeof itimer);
>                 for (i = 0; i < 3; i++)
>                         do_setitimer(i, &itimer, NULL);
>
> and unblocks signals as well:
>
>                         sigemptyset(&current->blocked);
>
> but this appears to be kind of legit: the task failed to get the
> required permissions, and guns go off.

and I simply can't understand this code... but I feel that it can't
be correct ;) Will try to re-read tomorrow.


> In any case, it seems to me that the patch below would be justified?
> Totally untested and so. __flush_signals() not affected.

Yes, agreed, it can't be right if the caller is not kthread.


> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -427,6 +427,10 @@ void flush_signals(struct task_struct *t)
>  {
>  	unsigned long flags;
>
> +	/* Only kthreads are allowed to destroy signals: */
> +	if (WARN_ON_ONCE(!(current->flags & PF_KTHREAD)))
> +		return;
> +

But I am not sure this can't make some buggy driver even more buggy.
Just suppose it does something

	do {
		if (signal_pending())
			flush_signals();
	} while (wait_event_interruptible(...));

and this change will turn this into busy-wait loop.

So perhaps another change which just adds WARN_ON_RATELIMIT() without
"return" will be safer?

Oleg.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 0/1] signals: don't abuse __flush_signals() in selinux_bprm_committed_creds()
  2015-05-03 17:34     ` Oleg Nesterov
@ 2015-05-04 16:45       ` Oleg Nesterov
  2015-05-04 16:45         ` [PATCH 1/1] " Oleg Nesterov
  2015-05-06 10:19       ` [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context Ingo Molnar
  1 sibling, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2015-05-04 16:45 UTC (permalink / raw)
  To: Andrew Morton, Ingo Molnar
  Cc: Linus Torvalds, Neil Brown, Evgeniy Polyakov, Stephen Smalley,
	Alex Williamson, linux-kernel, kvm, Paul Moore, Eric Paris

On 05/03, Oleg Nesterov wrote:
>
> On 05/01, Ingo Molnar wrote:
> >
>
> > I also found a __flush_signals() use in:
> >
> >   security/selinux/hooks.c
> >
> and I simply can't understand this code... but I feel that it can't
> be correct ;) Will try to re-read tomorrow.

Yes, this doesn't look right. Lets kill __flush_signals() first, there
are no other users.

And I am not sure it is fine to flush SIGSTOP... do we really want this?

Oleg.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/1] signals: don't abuse __flush_signals() in selinux_bprm_committed_creds()
  2015-05-04 16:45       ` [PATCH 0/1] signals: don't abuse __flush_signals() in selinux_bprm_committed_creds() Oleg Nesterov
@ 2015-05-04 16:45         ` Oleg Nesterov
  2015-05-04 19:43           ` Paul Moore
  0 siblings, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2015-05-04 16:45 UTC (permalink / raw)
  To: Andrew Morton, Ingo Molnar
  Cc: Linus Torvalds, Neil Brown, Evgeniy Polyakov, Stephen Smalley,
	Alex Williamson, linux-kernel, kvm, Paul Moore, Eric Paris

selinux_bprm_committed_creds()->__flush_signals() is not right, we
shouldn't clear TIF_SIGPENDING unconditionally. There can be other
reasons for signal_pending(): freezing(), JOBCTL_PENDING_MASK, and
potentially more.

Also change this code to check fatal_signal_pending() rather than
SIGNAL_GROUP_EXIT, it looks a bit better.

Now we can kill __flush_signals() before it finds another buggy user.

Note: this code looks racy, we can flush a signal which was sent after
the task SID has been updated.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/sched.h    |    1 -
 kernel/signal.c          |   13 ++++---------
 security/selinux/hooks.c |    6 ++++--
 3 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8db31ef..eb1ac84 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2345,7 +2345,6 @@ extern void sched_dead(struct task_struct *p);
 
 extern void proc_caches_init(void);
 extern void flush_signals(struct task_struct *);
-extern void __flush_signals(struct task_struct *);
 extern void ignore_signals(struct task_struct *);
 extern void flush_signal_handlers(struct task_struct *, int force_default);
 extern int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info);
diff --git a/kernel/signal.c b/kernel/signal.c
index 16a3052..837ca7d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -414,21 +414,16 @@ void flush_sigqueue(struct sigpending *queue)
 }
 
 /*
- * Flush all pending signals for a task.
+ * Flush all pending signals for this kthread.
  */
-void __flush_signals(struct task_struct *t)
-{
-	clear_tsk_thread_flag(t, TIF_SIGPENDING);
-	flush_sigqueue(&t->pending);
-	flush_sigqueue(&t->signal->shared_pending);
-}
-
 void flush_signals(struct task_struct *t)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&t->sighand->siglock, flags);
-	__flush_signals(t);
+	clear_tsk_thread_flag(t, TIF_SIGPENDING);
+	flush_sigqueue(&t->pending);
+	flush_sigqueue(&t->signal->shared_pending);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);
 }
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6da7532..6907d11 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2397,10 +2397,12 @@ static void selinux_bprm_committed_creds(struct linux_binprm *bprm)
 		for (i = 0; i < 3; i++)
 			do_setitimer(i, &itimer, NULL);
 		spin_lock_irq(&current->sighand->siglock);
-		if (!(current->signal->flags & SIGNAL_GROUP_EXIT)) {
-			__flush_signals(current);
+		if (!fatal_signal_pending(current)) {
+			flush_sigqueue(&current->pending);
+			flush_sigqueue(&current->signal->shared_pending);
 			flush_signal_handlers(current, 1);
 			sigemptyset(&current->blocked);
+			recalc_sigpending();
 		}
 		spin_unlock_irq(&current->sighand->siglock);
 	}
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context
  2015-05-02  8:30     ` NeilBrown
  2015-05-02 16:27       ` Linus Torvalds
@ 2015-05-04 17:35       ` Oleg Nesterov
  2015-05-07 13:33         ` Jiri Kosina
  1 sibling, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2015-05-04 17:35 UTC (permalink / raw)
  To: NeilBrown
  Cc: Ingo Molnar, Linus Torvalds, Evgeniy Polyakov, Stephen Smalley,
	Alex Williamson, linux-kernel, kvm

On 05/02, NeilBrown wrote:
>
> All the calls in md.c are in a kernel thread so safe, but I'd rather have an
> explicit "uninterruptible, but no load-average" wait....

Could you please explain why md_thread() does allow_signal(SIGKILL) ?

I am just curious. It looks as if we want to allow user-space to "call"
thread->run(), and this looks strange.

Oleg.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/1] signals: don't abuse __flush_signals() in selinux_bprm_committed_creds()
  2015-05-04 16:45         ` [PATCH 1/1] " Oleg Nesterov
@ 2015-05-04 19:43           ` Paul Moore
  0 siblings, 0 replies; 21+ messages in thread
From: Paul Moore @ 2015-05-04 19:43 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Ingo Molnar, Linus Torvalds, Neil Brown,
	Evgeniy Polyakov, Stephen Smalley, Alex Williamson, linux-kernel,
	kvm, Eric Paris, selinux

On Monday, May 04, 2015 06:45:58 PM Oleg Nesterov wrote:
> selinux_bprm_committed_creds()->__flush_signals() is not right, we
> shouldn't clear TIF_SIGPENDING unconditionally. There can be other
> reasons for signal_pending(): freezing(), JOBCTL_PENDING_MASK, and
> potentially more.
> 
> Also change this code to check fatal_signal_pending() rather than
> SIGNAL_GROUP_EXIT, it looks a bit better.
> 
> Now we can kill __flush_signals() before it finds another buggy user.

[NOTE: Added the SELinux list to the CC line]

This looks reasonable to me, I'm going to apply it to selinux#next today.

> Note: this code looks racy, we can flush a signal which was sent after
> the task SID has been updated.

The whole signal flush thread has started some discussions about how we are 
currently handling this, and if it still makes sense.  Like many things, it 
seemed like a good idea at the time, but after several years we're debating if 
that is still the case.  I expect we'll be changing this code soon.

> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  include/linux/sched.h    |    1 -
>  kernel/signal.c          |   13 ++++---------
>  security/selinux/hooks.c |    6 ++++--
>  3 files changed, 8 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 8db31ef..eb1ac84 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2345,7 +2345,6 @@ extern void sched_dead(struct task_struct *p);
> 
>  extern void proc_caches_init(void);
>  extern void flush_signals(struct task_struct *);
> -extern void __flush_signals(struct task_struct *);
>  extern void ignore_signals(struct task_struct *);
>  extern void flush_signal_handlers(struct task_struct *, int force_default);
> extern int dequeue_signal(struct task_struct *tsk, sigset_t *mask,
> siginfo_t *info); diff --git a/kernel/signal.c b/kernel/signal.c
> index 16a3052..837ca7d 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -414,21 +414,16 @@ void flush_sigqueue(struct sigpending *queue)
>  }
> 
>  /*
> - * Flush all pending signals for a task.
> + * Flush all pending signals for this kthread.
>   */
> -void __flush_signals(struct task_struct *t)
> -{
> -	clear_tsk_thread_flag(t, TIF_SIGPENDING);
> -	flush_sigqueue(&t->pending);
> -	flush_sigqueue(&t->signal->shared_pending);
> -}
> -
>  void flush_signals(struct task_struct *t)
>  {
>  	unsigned long flags;
> 
>  	spin_lock_irqsave(&t->sighand->siglock, flags);
> -	__flush_signals(t);
> +	clear_tsk_thread_flag(t, TIF_SIGPENDING);
> +	flush_sigqueue(&t->pending);
> +	flush_sigqueue(&t->signal->shared_pending);
>  	spin_unlock_irqrestore(&t->sighand->siglock, flags);
>  }
> 
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 6da7532..6907d11 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -2397,10 +2397,12 @@ static void selinux_bprm_committed_creds(struct
> linux_binprm *bprm) for (i = 0; i < 3; i++)
>  			do_setitimer(i, &itimer, NULL);
>  		spin_lock_irq(&current->sighand->siglock);
> -		if (!(current->signal->flags & SIGNAL_GROUP_EXIT)) {
> -			__flush_signals(current);
> +		if (!fatal_signal_pending(current)) {
> +			flush_sigqueue(&current->pending);
> +			flush_sigqueue(&current->signal->shared_pending);
>  			flush_signal_handlers(current, 1);
>  			sigemptyset(&current->blocked);
> +			recalc_sigpending();
>  		}
>  		spin_unlock_irq(&current->sighand->siglock);
>  	}

-- 
paul moore
www.paul-moore.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context
  2015-05-03 17:34     ` Oleg Nesterov
  2015-05-04 16:45       ` [PATCH 0/1] signals: don't abuse __flush_signals() in selinux_bprm_committed_creds() Oleg Nesterov
@ 2015-05-06 10:19       ` Ingo Molnar
  1 sibling, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2015-05-06 10:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Neil Brown, Evgeniy Polyakov, Stephen Smalley,
	Alex Williamson, linux-kernel, kvm


* Oleg Nesterov <oleg@redhat.com> wrote:

> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -427,6 +427,10 @@ void flush_signals(struct task_struct *t)
> >  {
> >  	unsigned long flags;
> >
> > +	/* Only kthreads are allowed to destroy signals: */
> > +	if (WARN_ON_ONCE(!(current->flags & PF_KTHREAD)))
> > +		return;
> > +
> 
> But I am not sure this can't make some buggy driver even more buggy.
> Just suppose it does something
> 
> 	do {
> 		if (signal_pending())
> 			flush_signals();
> 	} while (wait_event_interruptible(...));
> 
> and this change will turn this into busy-wait loop.
> 
> So perhaps another change which just adds WARN_ON_RATELIMIT() 
> without "return" will be safer?

Yeah, absolutely.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context
  2015-05-02 16:27       ` Linus Torvalds
@ 2015-05-07 12:54         ` Peter Zijlstra
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2015-05-07 12:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: NeilBrown, Ingo Molnar, Evgeniy Polyakov, Stephen Smalley,
	Alex Williamson, Oleg Nesterov, linux-kernel, kvm

On Sat, May 02, 2015 at 09:27:49AM -0700, Linus Torvalds wrote:
> On Sat, May 2, 2015 at 1:30 AM, NeilBrown <neilb@suse.de> wrote:
> >
> > All the calls in md.c are in a kernel thread so safe, but I'd rather have an
> > explicit "uninterruptible, but no load-average" wait....
> 
> Hmm. Our task state is a bitmask anyway, so we could probably just add a
> 
>     #define __TASK_NOLOAD              16
> 
> (and move the EXIT_xyz defines *away* from the list that is actually
> the task state), and teach our load average thing to not count those
> kinds of waits. Then you could just use
> 
>     TASK_UNINTERRUPTIBLE | __TASK_NOLOAD
> 
> to make processes not count towards the load.
> 
> Or - probably preferably - we could really clean things up, and make
> things much more like the bitmask it *should* be, and have explicit
> bits for
> 
>  - SLEEPING/STOPPED/EXITING ("why not running?")
>  - LOADAVG (accounted towards load)
>  - WAKESIG (ie "interruptible")
>  - WAKEKILL (this we already have)
> 
> and just make the rule be that we use "__TASK_xyz" for the actual
> individual bits, and "TASK_xyz" for the helper combinations. So then
> we'd have
> 
>    #define TASK_UNINTERRUPTIBLE (__TASK_SLEEPING | __TASK_LOADAVG)
>    #define TASK_INTERRUPTIBLE (__TASK_SLEEPING | __TASK_WAKESIG)
>    #define TASK_KILLABLE (__TASK_SLEEPING | __TASK_WAKEKILL)
>    #define TASK_NOLOADAVG (__TASK_SLEEPING)
> 
> which is almost certainly how this *should* have been done, but isn't,
> because of historical use.
> 
> Cleaning up like that *should* be fairly simple, but I'd be a bit
> nervous about getting all the state comparisons right (we have an
> unholy mix of "check this bit" and "check this whole state", and we'd
> need to make sure we get those cases all right).
> 
> Ingo, what do you think? This is mostly a scheduler interface issue..

Hehe, a little something like this:

  https://lkml.org/lkml/2013/11/12/710

Lemme go clean that up and finish it :-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context
  2015-05-04 17:35       ` Oleg Nesterov
@ 2015-05-07 13:33         ` Jiri Kosina
  2015-05-07 22:37           ` NeilBrown
  0 siblings, 1 reply; 21+ messages in thread
From: Jiri Kosina @ 2015-05-07 13:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: NeilBrown, Ingo Molnar, Linus Torvalds, Evgeniy Polyakov,
	Stephen Smalley, Alex Williamson, linux-kernel, kvm

On Mon, 4 May 2015, Oleg Nesterov wrote:

> > All the calls in md.c are in a kernel thread so safe, but I'd rather have an
> > explicit "uninterruptible, but no load-average" wait....
> 
> Could you please explain why md_thread() does allow_signal(SIGKILL) ?
> 
> I am just curious. It looks as if we want to allow user-space to "call"
> thread->run(), and this looks strange.

One would think that this is because md wants to be notified when system 
is going to be halted/rebooted, and userspace init (whatever that is) 
decides to do 'kill -9 -1' to perform the final shutdown of md (the 
question is why it really should be needed, becasue all filesystems should 
be R/O by that time anyway).

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context
  2015-05-07 13:33         ` Jiri Kosina
@ 2015-05-07 22:37           ` NeilBrown
  0 siblings, 0 replies; 21+ messages in thread
From: NeilBrown @ 2015-05-07 22:37 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Oleg Nesterov, Ingo Molnar, Linus Torvalds, Evgeniy Polyakov,
	Stephen Smalley, Alex Williamson, linux-kernel, kvm

[-- Attachment #1: Type: text/plain, Size: 1387 bytes --]

On Thu, 7 May 2015 15:33:53 +0200 (CEST) Jiri Kosina <jkosina@suse.cz> wrote:

> On Mon, 4 May 2015, Oleg Nesterov wrote:
> 
> > > All the calls in md.c are in a kernel thread so safe, but I'd rather have an
> > > explicit "uninterruptible, but no load-average" wait....
> > 
> > Could you please explain why md_thread() does allow_signal(SIGKILL) ?
> > 
> > I am just curious. It looks as if we want to allow user-space to "call"
> > thread->run(), and this looks strange.
> 
> One would think that this is because md wants to be notified when system 
> is going to be halted/rebooted, and userspace init (whatever that is) 
> decides to do 'kill -9 -1' to perform the final shutdown of md (the 
> question is why it really should be needed, becasue all filesystems should 
> be R/O by that time anyway).
> 

Something like that.  It is historical strangeness that might have seemed
like a good idea at the time, but is hard to justify.

There is an alt-sysrq that will remount filesystems read-only, but no
alt-sysrq to switch RAID arrays to "idle/clean".  So I leverages alt-sysrq-K,
which does "kill -9 -1" (I think).

Since md gained the ability to switch to idle/clean after a short timeout,
and also the ability to use a write-intent-bitmap so only little bits of the
array are ever non idle/clean, this all became much less interesting.

NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-05-07 22:37 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-01 17:40 [GIT PULL] VFIO fixes for v4.1-rc2 Alex Williamson
2015-05-01 18:37 ` Linus Torvalds
2015-05-01 18:48   ` Alex Williamson
2015-05-01 20:23     ` Linus Torvalds
2015-05-01 22:03       ` Alex Williamson
2015-05-01 19:38   ` [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context Ingo Molnar
2015-05-02  8:30     ` NeilBrown
2015-05-02 16:27       ` Linus Torvalds
2015-05-07 12:54         ` Peter Zijlstra
2015-05-04 17:35       ` Oleg Nesterov
2015-05-07 13:33         ` Jiri Kosina
2015-05-07 22:37           ` NeilBrown
2015-05-02 11:56     ` Evgeniy Polyakov
2015-05-02 16:33     ` Richard Weinberger
2015-05-03 17:34     ` Oleg Nesterov
2015-05-04 16:45       ` [PATCH 0/1] signals: don't abuse __flush_signals() in selinux_bprm_committed_creds() Oleg Nesterov
2015-05-04 16:45         ` [PATCH 1/1] " Oleg Nesterov
2015-05-04 19:43           ` Paul Moore
2015-05-06 10:19       ` [PATCH] signals: Generate warning when flush_signals() is called from non-kthread context Ingo Molnar
2015-05-01 20:11   ` [GIT PULL] VFIO fixes for v4.1-rc2 Richard Weinberger
2015-05-01 21:09     ` Richard Weinberger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).