All of lore.kernel.org
 help / color / mirror / Atom feed
* perf_counters issue with self-sampling threads
@ 2009-07-27 16:51 stephane eranian
  2009-07-27 16:56 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 49+ messages in thread
From: stephane eranian @ 2009-07-27 16:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Andrew Morton, Thomas Gleixner, Robert Richter,
	Peter Zijlstra, Paul Mackerras, Andi Kleen, Maynard Johnson,
	Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
	perfmon2-devel

Hi,

I believe there is a problem with the current perf_counters (PCL)
code for self-sampling threads. The problem is related to sample
notifications via signal.

PCL (just like perfmon) is using SIGIO, an asynchronous signal,
to notify user applications of the availability of data in the event
buffer.

POSIX does not mandate that asynchronous signals be delivered
to the thread in which they originated. Any thread in the process
may process the signal, assuming it does not have the signal
blocked.

This is a serious problem with any self-sampling program such as
those built on top of PAPI. When sampling, you do want the signal
to be delivered to the thread in which the counter overflowed. This is
not just for convenience but it is required if the signal handler needs
to operate on the thread's machine state. Although, there is always
a possibility of forwarding the signal via tkill() to the right thread, I
do not think this is the right solution as it incurs additional latency
and therefore skid.

Looking at the kernel source code related to that, it seems that
kill_fasync() ends up calling group_send_sig_info(). This function
adds the signal to the process SHARED sigpending queue. Then,
it picks a thread to "wakeup". It first tries the thread in which the
signal originated with the following selection test (wants_signal):
   - signal is not blocked
   - thread is not exiting
   - no signal private pending for this thread

If that does not work, it iterates over the other threads of the process.

This explains why in trivial tests, the SIGIO is always delivered
to the right thread. However it the monitored thread is using any
other signals, e.g., SIGALRM, then the SIGIO signal can go to the
wrong thread. The problem also arises if the first SIGIO is not already
processed by the time a 2nd is pended.

For self-sampling, we want (and in fact require)  asynchronous notifications.
But we want the extra guarantee that the signal is ALWAYS delivered to
the thread in which the event occurred.

It seems like we could either create a different version of kill_fasync() or
pass an extra parameter to force this function to use specific_send_sig_info().
This would be only when self-monitoring. When a tool is monitoring another
thread, it is probably okay to have the signal delivered to any threads. Most
likely, the tool is setup such that threads not processing notifications have
the signal blocked.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
  2009-07-27 16:51 perf_counters issue with self-sampling threads stephane eranian
@ 2009-07-27 16:56 ` Peter Zijlstra
  2009-07-27 21:25 ` Andi Kleen
  2009-07-29 12:19 ` Peter Zijlstra
  2 siblings, 0 replies; 49+ messages in thread
From: Peter Zijlstra @ 2009-07-27 16:56 UTC (permalink / raw)
  To: eranian
  Cc: Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner,
	Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson,
	Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
	perfmon2-devel

On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote:
> Hi,
> 
> I believe there is a problem with the current perf_counters (PCL)
> code for self-sampling threads. The problem is related to sample
> notifications via signal.
> 
> PCL (just like perfmon) is using SIGIO, an asynchronous signal,
> to notify user applications of the availability of data in the event
> buffer.
> 
> POSIX does not mandate that asynchronous signals be delivered
> to the thread in which they originated. Any thread in the process
> may process the signal, assuming it does not have the signal
> blocked.

Bugger, you're right. /me kicks POSIX again for creating these crazy ass
semantics.

I'll look at fixing this.

Thanks!


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
  2009-07-27 16:51 perf_counters issue with self-sampling threads stephane eranian
  2009-07-27 16:56 ` Peter Zijlstra
@ 2009-07-27 21:25 ` Andi Kleen
       [not found]   ` <7c86c4470907272213w2ee57080re50dd22a4d73a7e0@mail.gmail.com>
  2009-07-29 12:19 ` Peter Zijlstra
  2 siblings, 1 reply; 49+ messages in thread
From: Andi Kleen @ 2009-07-27 21:25 UTC (permalink / raw)
  To: eranian
  Cc: Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner,
	Robert Richter, Peter Zijlstra, Paul Mackerras, Andi Kleen,
	Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci,
	Dan Terpstra, perfmon2-devel


I wonder how SIGPROF gets around the same problem, or is it
buggy too?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
       [not found]   ` <7c86c4470907272213w2ee57080re50dd22a4d73a7e0@mail.gmail.com>
@ 2009-07-28  8:51     ` stephane eranian
  2009-07-28  8:56       ` Andi Kleen
  2009-08-04 16:09     ` stephane eranian
  1 sibling, 1 reply; 49+ messages in thread
From: stephane eranian @ 2009-07-28  8:51 UTC (permalink / raw)
  To: Andi Kleen
  Cc: perfmon2-devel, Maynard Johnson, Robert Richter, LKML,
	Thomas Gleixner, Paul Mackerras, Carl Love, Corey J Ashford,
	Andrew Morton, Peter Zijlstra, Philip Mucci, Dan Terpstra,
	Ingo Molnar

[Reposting the message because of stupid MIME-encoding error on my part]
Andi,

Looks like SIGPROF is calling _group_send_sig_info(), so I
think it is subject to the same problem.

I have built a perfmon example to test the problem, it is
relatively easy to trigger by simply self-monitoring a thread
which is using setitimer() and thus SIGALRM. You just have
to increase the timer frequency. At some point, the SIGPROF
signal will be delivered to the wrong thread.

One thing I did not mention in my message is that one would think
that forcing a specific signal via F_SETSIG could be a workaround
if that signal is synchronous, e.g., SIGFPE. F_SETSIG looks like a
Linux extension but it does not solve the problem. Linux determines
the mode of delivery not on the signal number but on the code path,
it seems. If you use F_ASYNC+F_SETOWN, then this is systematically
considered asynchronous regardless of the signal used.If you come from
traps.c, then the signal is delivered to the correct thread. All of this does
not look unreasonable to me.

I believe sampling, be it timer or PMU-based, may be
a special case here. We want asynchronous + specific
thread (only) when self- sampling.

An alternative may be to choose the pending queue based on the signal type
(synchronous, asynchronous). Then, we could use F_SETSIG to override
SIGIO with a synchronous signal instead. But I am not a POSIX signal expert.


On Jul 27, 2009 11:25 PM, "Andi Kleen" <andi@firstfloor.org> wrote:


I wonder how SIGPROF gets around the same problem, or is it
buggy too?

-Andi

--
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
  2009-07-28  8:51     ` stephane eranian
@ 2009-07-28  8:56       ` Andi Kleen
  2009-07-28  9:13         ` stephane eranian
  0 siblings, 1 reply; 49+ messages in thread
From: Andi Kleen @ 2009-07-28  8:56 UTC (permalink / raw)
  To: eranian
  Cc: Andi Kleen, perfmon2-devel, Maynard Johnson, Robert Richter,
	LKML, Thomas Gleixner, Paul Mackerras, Carl Love,
	Corey J Ashford, Andrew Morton, Peter Zijlstra, Philip Mucci,
	Dan Terpstra, Ingo Molnar

On Tue, Jul 28, 2009 at 10:51:04AM +0200, stephane eranian wrote:
> [Reposting the message because of stupid MIME-encoding error on my part]
> Andi,
> 
> Looks like SIGPROF is calling _group_send_sig_info(), so I
> think it is subject to the same problem.

So sounds like a whole class of signals can't be POSIX compliant.

Likely others will run into the same problem.

Perhaps should define a new sigaction flag for this to make
it all explicit.

-Andi


-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
  2009-07-28  8:56       ` Andi Kleen
@ 2009-07-28  9:13         ` stephane eranian
  0 siblings, 0 replies; 49+ messages in thread
From: stephane eranian @ 2009-07-28  9:13 UTC (permalink / raw)
  To: Andi Kleen
  Cc: perfmon2-devel, Maynard Johnson, Robert Richter, LKML,
	Thomas Gleixner, Paul Mackerras, Carl Love, Corey J Ashford,
	Andrew Morton, Peter Zijlstra, Philip Mucci, Dan Terpstra,
	Ingo Molnar

Andi,

On Tue, Jul 28, 2009 at 10:56 AM, Andi Kleen<andi@firstfloor.org> wrote:
> On Tue, Jul 28, 2009 at 10:51:04AM +0200, stephane eranian wrote:
>> [Reposting the message because of stupid MIME-encoding error on my part]
>> Andi,
>>
>> Looks like SIGPROF is calling _group_send_sig_info(), so I
>> think it is subject to the same problem.
>
> So sounds like a whole class of signals can't be POSIX compliant.
>

Well, the problem is that I don't where to find the POSIX spec that defines
signal types and how they should be handled in multi-threaded programs.
That would be a good starting point.

> Likely others will run into the same problem.
>
> Perhaps should define a new sigaction flag for this to make
> it all explicit.
>
That's probably a clean way of doing this.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
  2009-07-27 16:51 perf_counters issue with self-sampling threads stephane eranian
  2009-07-27 16:56 ` Peter Zijlstra
  2009-07-27 21:25 ` Andi Kleen
@ 2009-07-29 12:19 ` Peter Zijlstra
  2009-07-29 12:37   ` stephane eranian
  2009-07-29 22:17   ` Oleg Nesterov
  2 siblings, 2 replies; 49+ messages in thread
From: Peter Zijlstra @ 2009-07-29 12:19 UTC (permalink / raw)
  To: eranian
  Cc: Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner,
	Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson,
	Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
	perfmon2-devel, Michael Kerrisk, oleg

On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote:
> I believe there is a problem with the current perf_counters (PCL)
> code for self-sampling threads. The problem is related to sample
> notifications via signal.
> 
> PCL (just like perfmon) is using SIGIO, an asynchronous signal,
> to notify user applications of the availability of data in the event
> buffer.
> 
> POSIX does not mandate that asynchronous signals be delivered
> to the thread in which they originated. Any thread in the process
> may process the signal, assuming it does not have the signal
> blocked.

This signal stuff makes my head spin a little, however:

fcntl(2) for F_SETOWN says:

If a non-zero value is given to F_SETSIG  in  a  multi‐ threaded
process running with a threading library that supports thread groups
(e.g., NPTL),  then  a  positive value  given  to  F_SETOWN  has  a
different  meaning: instead of being a process ID identifying a whole
pro‐ cess,  it  is a thread ID identifying a specific thread within a
process.  Consequently, it may be necessary to pass  F_SETOWN  the
result of gettid(2) instead of get‐ pid(2) to get sensible results
when F_SETSIG  is  used.  (In  current  Linux  threading
implementations, a main thread’s thread ID is the same as its process
ID.  This means  that  a  single-threaded program can equally use
gettid(2) or getpid(2) in this scenario.)   Note,  how‐ ever,  that
the  statements  in  this paragraph do not apply to the SIGURG signal
generated  for  out-of-band data  on a socket: this signal is always
sent to either a process or a process group, depending  on  the  value
given  to  F_SETOWN.   Note  also  that Linux imposes a limit on the
number of real-time signals  that  may  be queued  to  a  process (see
getrlimit(2) and signal(7)) and if this limit is reached, then the
kernel  reverts to  delivering  SIGIO,  and this signal is delivered
to the entire process rather than to a specific thread.


Which seems to imply that when we feed fcntl(F_SETOWN) a TID instead of
a PID it should deliver SIGIO to the thread instead of the whole process
-- which, to me, seems a sane semantic.

However, 

  kill_fasync(SIGIO)
    __kill_fasync()
      send_sigio()
	/* if pid_type is a PIDTYPE_PID and pid a TID this should
           only iterate the one thread, I think */
        do_each_pid_task() {
          send_sigio_to_task();
        } while_each_pid_task();

where:

  send_sigio_to_task()
    group_send_sig_info()
      __group_send_sig_info()
        send_signal(.group = 1) /* uh-ow trouble */
          __send_signal()
            if (group)
               pending = &t->signal->shared_pending

which will result in the signal being send to the whole process anyway.


Now I was considering teaching send_sigio_to_task() to use
specific_send_sig_info() when fown->pid != fown->group_leader->pid or
something, but I'm not sure that won't break anything.

Alternatively, I've missed a detail and I either read the manpage wrong,
or the code, or both of them.




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
  2009-07-29 12:19 ` Peter Zijlstra
@ 2009-07-29 12:37   ` stephane eranian
  2009-07-29 12:46     ` Peter Zijlstra
  2009-07-29 22:17   ` Oleg Nesterov
  1 sibling, 1 reply; 49+ messages in thread
From: stephane eranian @ 2009-07-29 12:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner,
	Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson,
	Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
	perfmon2-devel, Michael Kerrisk, oleg

Peter,

On Wed, Jul 29, 2009 at 2:19 PM, Peter Zijlstra<a.p.zijlstra@chello.nl> wrote:
> On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote:
>> I believe there is a problem with the current perf_counters (PCL)
>> code for self-sampling threads. The problem is related to sample
>> notifications via signal.
>>
>> PCL (just like perfmon) is using SIGIO, an asynchronous signal,
>> to notify user applications of the availability of data in the event
>> buffer.
>>
>> POSIX does not mandate that asynchronous signals be delivered
>> to the thread in which they originated. Any thread in the process
>> may process the signal, assuming it does not have the signal
>> blocked.
>
> This signal stuff makes my head spin a little, however:
>
> fcntl(2) for F_SETOWN says:
>
> If a non-zero value is given to F_SETSIG  in  a  multi‐ threaded
> process running with a threading library that supports thread groups
> (e.g., NPTL),  then  a  positive value  given  to  F_SETOWN  has  a
> different  meaning: instead of being a process ID identifying a whole
> pro‐ cess,  it  is a thread ID identifying a specific thread within a
> process.  Consequently, it may be necessary to pass  F_SETOWN  the
> result of gettid(2) instead of get‐ pid(2) to get sensible results
> when F_SETSIG  is  used.  (In  current  Linux  threading
> implementations, a main thread’s thread ID is the same as its process
> ID.  This means  that  a  single-threaded program can equally use
> gettid(2) or getpid(2) in this scenario.)   Note,  how‐ ever,  that
> the  statements  in  this paragraph do not apply to the SIGURG signal
> generated  for  out-of-band data  on a socket: this signal is always
> sent to either a process or a process group, depending  on  the  value
> given  to  F_SETOWN.   Note  also  that Linux imposes a limit on the
> number of real-time signals  that  may  be queued  to  a  process (see
> getrlimit(2) and signal(7)) and if this limit is reached, then the
> kernel  reverts to  delivering  SIGIO,  and this signal is delivered
> to the entire process rather than to a specific thread.
>
>
> Which seems to imply that when we feed fcntl(F_SETOWN) a TID instead of
> a PID it should deliver SIGIO to the thread instead of the whole process
> -- which, to me, seems a sane semantic.
>
Yes, I remember that manpage. I got the same impression and in fact that is
what I document in some of my test programs. So you read this right.

> However,
>
>  kill_fasync(SIGIO)
>    __kill_fasync()
>      send_sigio()
>        /* if pid_type is a PIDTYPE_PID and pid a TID this should
>           only iterate the one thread, I think */
>        do_each_pid_task() {
>          send_sigio_to_task();
>        } while_each_pid_task();
>
> where:
>
>  send_sigio_to_task()
>    group_send_sig_info()
>      __group_send_sig_info()
>        send_signal(.group = 1) /* uh-ow trouble */
>          __send_signal()
>            if (group)
>               pending = &t->signal->shared_pending
>
> which will result in the signal being send to the whole process anyway.
>
Exactly! That is the code path and this is why this does not work as
expected. Nowhere along that path is there special casing for that
F_SETOWN of tid vs. pid. kill_fasync() implies group.


>
> Now I was considering teaching send_sigio_to_task() to use
> specific_send_sig_info() when fown->pid != fown->group_leader->pid or
> something, but I'm not sure that won't break anything.
>
Yes, that's the problem with touching this. I don't know if this will break
things. That's why I was suggested creating a parallel code path which
does what we want without modifying the existing path. Unless you know
some signal expert at redhat or elsewhere.

> Alternatively, I've missed a detail and I either read the manpage wrong,
> or the code, or both of them.
>
The code does not correspond to the manpage. Not clear which one
is correct though. This F_SETOWN trick looks very Linux specific.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
  2009-07-29 12:37   ` stephane eranian
@ 2009-07-29 12:46     ` Peter Zijlstra
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Zijlstra @ 2009-07-29 12:46 UTC (permalink / raw)
  To: eranian
  Cc: Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner,
	Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson,
	Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
	perfmon2-devel, Michael Kerrisk, oleg

On Wed, 2009-07-29 at 14:37 +0200, stephane eranian wrote:
> 
> >
> > Now I was considering teaching send_sigio_to_task() to use
> > specific_send_sig_info() when fown->pid != fown->group_leader->pid or
> > something, but I'm not sure that won't break anything.
> >
> Yes, that's the problem with touching this. I don't know if this will break
> things. That's why I was suggested creating a parallel code path which
> does what we want without modifying the existing path. Unless you know
> some signal expert at redhat or elsewhere.

His name is Oleg, and he's on CC ;-)

> > Alternatively, I've missed a detail and I either read the manpage wrong,
> > or the code, or both of them.
> >
> The code does not correspond to the manpage. Not clear which one
> is correct though. This F_SETOWN trick looks very Linux specific.

Linus specific sounds good enough to me. Michael might have something so
say on this though...


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
  2009-07-29 12:19 ` Peter Zijlstra
  2009-07-29 12:37   ` stephane eranian
@ 2009-07-29 22:17   ` Oleg Nesterov
  2009-07-30 11:31     ` Peter Zijlstra
  1 sibling, 1 reply; 49+ messages in thread
From: Oleg Nesterov @ 2009-07-29 22:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: eranian, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner,
	Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson,
	Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
	perfmon2-devel, Michael Kerrisk

(add Roland)

On 07/29, Peter Zijlstra wrote:
>
> On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote:
> >
> > POSIX does not mandate that asynchronous signals be delivered
> > to the thread in which they originated. Any thread in the process
> > may process the signal, assuming it does not have the signal
> > blocked.

Yes. I now nothing about POSIX, but this is what Linux does at least.
I don't think we can/should change this behaviour.

> fcntl(2) for F_SETOWN says:
>
> If a non-zero value is given to F_SETSIG  in  a  multi‐ threaded
> process running with a threading library that supports thread groups
> (e.g., NPTL),  then  a  positive value  given  to  F_SETOWN  has  a
> different  meaning: instead of being a process ID identifying a whole
> pro‐ cess,  it  is a thread ID identifying a specific thread within a
> process.

Heh. Definitely this is not what Linux does ;)

> Which seems to imply that when we feed fcntl(F_SETOWN) a TID instead of
> a PID it should deliver SIGIO to the thread instead of the whole process
> -- which, to me, seems a sane semantic.

I am not sure I understand the man above... But to me it looks like we
should always send a private signal when fown->signum != 0 ?

The change should be simple, but as you pointed out we can break things.

Oleg.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
  2009-07-29 22:17   ` Oleg Nesterov
@ 2009-07-30 11:31     ` Peter Zijlstra
  2009-07-30 19:20       ` Oleg Nesterov
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2009-07-30 11:31 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: eranian, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner,
	Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson,
	Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
	perfmon2-devel, Michael Kerrisk, roland

On Thu, 2009-07-30 at 00:17 +0200, Oleg Nesterov wrote:
> (add Roland)

but you seem to have forgotten to actually edit the CC line, fixed
that ;-)

> On 07/29, Peter Zijlstra wrote:
> >
> > On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote:
> > >
> > > POSIX does not mandate that asynchronous signals be delivered
> > > to the thread in which they originated. Any thread in the process
> > > may process the signal, assuming it does not have the signal
> > > blocked.
> 
> Yes. I now nothing about POSIX, but this is what Linux does at least.
> I don't think we can/should change this behaviour.

Well, we have plenty exceptions to that rule already, we have itimer
extentions, tkill sys_rt_tgsigqueueinfo and plenty more..

> > fcntl(2) for F_SETOWN says:
> >
> > If a non-zero value is given to F_SETSIG  in  a  multi‐ threaded
> > process running with a threading library that supports thread groups
> > (e.g., NPTL),  then  a  positive value  given  to  F_SETOWN  has  a
> > different  meaning: instead of being a process ID identifying a whole
> > pro‐ cess,  it  is a thread ID identifying a specific thread within a
> > process.
> 
> Heh. Definitely this is not what Linux does ;)

Right, so the question is, did we ever? Why does the man page say this.

Looking at the .12 source (git start) we did:

  440 			if (!send_sig_info(fown->signum, &si, p))
  441 				break;
  442 		/* fall-through: fall back on the old plain SIGIO signal */
  443 		case 0:
  444 			send_group_sig_info(SIGIO, SEND_SIG_PRIV, p);

Which was 'corrected' in:

commit fc9c9ab22d5650977c417ef2032d02f455011b23
Author: Bharath Ramesh <bramesh@vt.edu>
Date:   Sat Apr 16 15:25:41 2005 -0700

    [PATCH] AYSNC IO using singals other than SIGIO

    A question on sigwaitinfo based IO mechanism in multithreaded applications.

    I am trying to use RT signals to notify me of IO events using RT signals
    instead of SIGIO in a multithreaded applications.  I noticed that there was
    some discussion on lkml during november 1999 with the subject of the
    discussion as "Signal driven IO".  In the thread I noticed that RT signals
    were being delivered to the worker thread.  I am running 2.6.10 kernel and
    I am trying to use the very same mechanism and I find that only SIGIO being
    propogated to the worker threads and RT signals only being propogated to
    the main thread and not the worker threads where I actually want them to be
    propogated too.  On further inspection I found that the following patch
    which I have attached solves the problem.

    I am not sure if this is a bug or feature in the kernel.


    Roland McGrath <roland@redhat.com> said:

    This relates only to fcntl F_SETSIG, which is a Linux extension.  So there is
    no POSIX issue.  When changing various things like the normal SIGIO signalling
    to do group signals, I was concerned strictly with the POSIX semantics and
    generally avoided touching things in the domain of Linux inventions.  That's
    why I didn't change this when I changed the call right next to it.  There is
    no reason I can see that F_SETSIG-requested signals shouldn't use a group
    signal like normal SIGIO does.  I'm happy to ACK this patch, there is nothing
    wrong with its change to the semantics in my book.  But neither POSIX nor I
    care a whit what F_SETSIG does.

    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

> > Which seems to imply that when we feed fcntl(F_SETOWN) a TID instead of
> > a PID it should deliver SIGIO to the thread instead of the whole process
> > -- which, to me, seems a sane semantic.
> 
> I am not sure I understand the man above... But to me it looks like we
> should always send a private signal when fown->signum != 0 ?
> 
> The change should be simple, but as you pointed out we can break things.

Right, so the change I had in mind is like the below (except I don't
know if we can compare struct pid things by pointer value or if we
should look at the content).

In any case, we should either do something like the below (yay!), or
amend the manpage (Michael?) and introduce something like F_SETOWN2
which does have the below semantics :-(.

---
Index: linux-2.6/fs/fcntl.c
===================================================================
--- linux-2.6.orig/fs/fcntl.c
+++ linux-2.6/fs/fcntl.c
@@ -431,6 +431,16 @@ static void send_sigio_to_task(struct ta
 			       int fd,
 			       int reason)
 {
+	int (*send_sig)(int, struct siginfo *, struct task_struct *);
+
+	send_sig = group_send_sig_info;
+	/*
+	 * If the fown points to a specific TID instead of to a PID
+	 * we'll send the signal to the thread only.
+	 */
+	if (fown->pid_type == PIDTYPE_PID && fown->pid != task_tgid(p))
+		send_sig = send_sig_info;
+
 	/*
 	 * F_SETSIG can change ->signum lockless in parallel, make
 	 * sure we read it once and use the same value throughout.
@@ -461,11 +472,11 @@ static void send_sigio_to_task(struct ta
 			else
 				si.si_band = band_table[reason - POLL_IN];
 			si.si_fd    = fd;
-			if (!group_send_sig_info(signum, &si, p))
+			if (!send_sig(signum, &si, p))
 				break;
 		/* fall-through: fall back on the old plain SIGIO signal */
 		case 0:
-			group_send_sig_info(SIGIO, SEND_SIG_PRIV, p);
+			send_sig(SIGIO, SEND_SIG_PRIV, p);
 	}
 }
 



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
  2009-07-30 11:31     ` Peter Zijlstra
@ 2009-07-30 19:20       ` Oleg Nesterov
  2009-07-30 20:00         ` Peter Zijlstra
  0 siblings, 1 reply; 49+ messages in thread
From: Oleg Nesterov @ 2009-07-30 19:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: eranian, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner,
	Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson,
	Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
	perfmon2-devel, Michael Kerrisk, roland

On 07/30, Peter Zijlstra wrote:
>
> On Thu, 2009-07-30 at 00:17 +0200, Oleg Nesterov wrote:
> > (add Roland)
>
> but you seem to have forgotten to actually edit the CC line, fixed
> that ;-)

Yes, thanks ;)

> > On 07/29, Peter Zijlstra wrote:
> > >
> > > On Mon, 2009-07-27 at 18:51 +0200, stephane eranian wrote:
> > > >
> > > > POSIX does not mandate that asynchronous signals be delivered
> > > > to the thread in which they originated. Any thread in the process
> > > > may process the signal, assuming it does not have the signal
> > > > blocked.
> >
> > Yes. I now nothing about POSIX, but this is what Linux does at least.
> > I don't think we can/should change this behaviour.
>
> Well, we have plenty exceptions to that rule already, we have itimer
> extentions, tkill sys_rt_tgsigqueueinfo and plenty more..

Yes, yes, I meant the behaviour of kill(2), group_send_sig_info(), etc.

> > > fcntl(2) for F_SETOWN says:
> > >
> > > If a non-zero value is given to F_SETSIG  in  a  multi‐ threaded
> > > process running with a threading library that supports thread groups
> > > (e.g., NPTL),  then  a  positive value  given  to  F_SETOWN  has  a
> > > different  meaning: instead of being a process ID identifying a whole
> > > pro‐ cess,  it  is a thread ID identifying a specific thread within a
> > > process.
> >
> > Heh. Definitely this is not what Linux does ;)
>
> Right, so the question is, did we ever? Why does the man page say this.
>
> Looking at the .12 source (git start) we did:
>
>   440 			if (!send_sig_info(fown->signum, &si, p))
>   441 				break;
>   442 		/* fall-through: fall back on the old plain SIGIO signal */
>   443 		case 0:
>   444 			send_group_sig_info(SIGIO, SEND_SIG_PRIV, p);

Yes, the send_sig_info() above seems to match the manpage.

Another thing I can't understand, group_send_sig_info() calls
check_kill_permission(). But check_kill_permission() uses current, which
can be a "random" task if kill_fasync() is called from interrupt. Even
if not interrupt, I don't understand why (say) pipe_read() can't send a
signal here. sigio_perm() has already checked permissions, and it correctly
uses fown->cred.

> Which was 'corrected' in:
>
> commit fc9c9ab22d5650977c417ef2032d02f455011b23
> Author: Bharath Ramesh <bramesh@vt.edu>
> Date:   Sat Apr 16 15:25:41 2005 -0700
>
>     [PATCH] AYSNC IO using singals other than SIGIO
>
>     A question on sigwaitinfo based IO mechanism in multithreaded applications.
>
>     I am trying to use RT signals to notify me of IO events using RT signals
>     instead of SIGIO in a multithreaded applications.  I noticed that there was
>     some discussion on lkml during november 1999 with the subject of the
>     discussion as "Signal driven IO".  In the thread I noticed that RT signals
>     were being delivered to the worker thread.  I am running 2.6.10 kernel and
>     I am trying to use the very same mechanism and I find that only SIGIO being
>     propogated to the worker threads and RT signals only being propogated to
>     the main thread and not the worker threads where I actually want them to be
>     propogated too.  On further inspection I found that the following patch
>     which I have attached solves the problem.

So, some people want shared signals here.

> > I am not sure I understand the man above... But to me it looks like we
> > should always send a private signal when fown->signum != 0 ?
> >
> > The change should be simple, but as you pointed out we can break things.
>
> Right, so the change I had in mind is like the below (except I don't
> know if we can compare struct pid things by pointer value or if we
> should look at the content).

(yes, we can compare the pointers)

> In any case, we should either do something like the below (yay!), or
> amend the manpage (Michael?) and introduce something like F_SETOWN2
> which does have the below semantics :-(.
>
> ---
> Index: linux-2.6/fs/fcntl.c
> ===================================================================
> --- linux-2.6.orig/fs/fcntl.c
> +++ linux-2.6/fs/fcntl.c
> @@ -431,6 +431,16 @@ static void send_sigio_to_task(struct ta
>  			       int fd,
>  			       int reason)
>  {
> +	int (*send_sig)(int, struct siginfo *, struct task_struct *);
> +
> +	send_sig = group_send_sig_info;
> +	/*
> +	 * If the fown points to a specific TID instead of to a PID
> +	 * we'll send the signal to the thread only.
> +	 */
> +	if (fown->pid_type == PIDTYPE_PID && fown->pid != task_tgid(p))
> +		send_sig = send_sig_info;

Yes, this allows to send a private signal to sub-thread.

But this is a bit strange, because the user can't specify it wants
a thread-specific signal to the main thread, its tid == pid.

I don't know what should we do. Perhaps we can just add
"bool is_group_signal" to fown_struct as another Linux extension.

Oleg.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
  2009-07-30 19:20       ` Oleg Nesterov
@ 2009-07-30 20:00         ` Peter Zijlstra
  2009-07-30 20:28           ` Oleg Nesterov
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2009-07-30 20:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: eranian, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner,
	Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson,
	Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
	perfmon2-devel, Michael Kerrisk, roland

On Thu, 2009-07-30 at 21:20 +0200, Oleg Nesterov wrote:

> Yes, this allows to send a private signal to sub-thread.
> 
> But this is a bit strange, because the user can't specify it wants
> a thread-specific signal to the main thread, its tid == pid.

Ah, indeed. I'll make a patch for F_SETOWN_TID then, unless someone
comes up with a better name for the creature ;-)


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
  2009-07-30 20:00         ` Peter Zijlstra
@ 2009-07-30 20:28           ` Oleg Nesterov
  2009-07-30 21:09             ` stephane eranian
  2009-07-31  8:35             ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra
  0 siblings, 2 replies; 49+ messages in thread
From: Oleg Nesterov @ 2009-07-30 20:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: eranian, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner,
	Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson,
	Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
	perfmon2-devel, Michael Kerrisk, roland

On 07/30, Peter Zijlstra wrote:
>
> I'll make a patch for F_SETOWN_TID then, unless someone
> comes up with a better name for the creature ;-)

I think you are right. It is not safe to change the current
behaviour.

Oleg.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
  2009-07-30 20:28           ` Oleg Nesterov
@ 2009-07-30 21:09             ` stephane eranian
  2009-07-31  8:35             ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra
  1 sibling, 0 replies; 49+ messages in thread
From: stephane eranian @ 2009-07-30 21:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Andrew Morton,
	Thomas Gleixner, Robert Richter, Paul Mackerras, Andi Kleen,
	Maynard Johnson, Carl Love, Corey J Ashford, Philip Mucci,
	Dan Terpstra, perfmon2-devel, Michael Kerrisk, roland

On Thu, Jul 30, 2009 at 10:28 PM, Oleg Nesterov<oleg@redhat.com> wrote:
> On 07/30, Peter Zijlstra wrote:
>>
>> I'll make a patch for F_SETOWN_TID then, unless someone
>> comes up with a better name for the creature ;-)
>
> I think you are right. It is not safe to change the current
> behaviour.
>
I agree. As I said earlier, it is better to just add a new code
path.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC][PATCH] fcntl: F_[SG]ETOWN_TID
  2009-07-30 20:28           ` Oleg Nesterov
  2009-07-30 21:09             ` stephane eranian
@ 2009-07-31  8:35             ` Peter Zijlstra
  2009-07-31 14:01               ` stephane eranian
                                 ` (2 more replies)
  1 sibling, 3 replies; 49+ messages in thread
From: Peter Zijlstra @ 2009-07-31  8:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: eranian, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner,
	Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson,
	Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
	perfmon2-devel, Michael Kerrisk, roland

In order to direct the SIGIO signal to a particular thread of a
multi-threaded application we cannot, like suggested by the manpage, put
a TID into the regular fcntl(F_SETOWN) call. It will still be send to
the whole process of which that thread is part.

Since people do want to properly direct SIGIO we introduce F_SETOWN_TID,
which functions similarly to F_SETOWN, except positive arguments are
interpreted as TIDs and negative arguments are interpreted as PIDs.

This extension is fully bug compatible with the old F_GETOWN
implementation in that F_GETOWN_TID will be troubled by the negative
return value for PIDs similarly to F_GETOWN's trouble with process
groups.

[ compile tested only so far ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/parisc/include/asm/fcntl.h |    2 +
 fs/fcntl.c                      |   64 +++++++++++++++++++++++++++++++++-----
 include/asm-generic/fcntl.h     |    4 ++
 include/linux/fs.h              |   11 +++++-
 net/socket.c                    |    2 +-
 5 files changed, 71 insertions(+), 12 deletions(-)

diff --git a/arch/parisc/include/asm/fcntl.h b/arch/parisc/include/asm/fcntl.h
index 1e1c824..5d5235a 100644
--- a/arch/parisc/include/asm/fcntl.h
+++ b/arch/parisc/include/asm/fcntl.h
@@ -28,6 +28,8 @@
 #define F_SETOWN	12	/*  for sockets. */
 #define F_SETSIG	13	/*  for sockets. */
 #define F_GETSIG	14	/*  for sockets. */
+#define F_GETOWN_TID	15
+#define F_SETOWN_TID	16
 
 /* for posix fcntl() and lockf() */
 #define F_RDLCK		01
diff --git a/fs/fcntl.c b/fs/fcntl.c
index ae41308..8d0b7f9 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -197,13 +197,15 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
 }
 
 static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
-                     int force)
+                     int flags)
 {
 	write_lock_irq(&filp->f_owner.lock);
-	if (force || !filp->f_owner.pid) {
+	if ((flags & FF_SETOWN_FORCE) || !filp->f_owner.pid) {
 		put_pid(filp->f_owner.pid);
 		filp->f_owner.pid = get_pid(pid);
 		filp->f_owner.pid_type = type;
+		filp->f_owner.task_only =
+			(type == PIDTYPE_PID && (flags & FF_SETOWN_TID));
 
 		if (pid) {
 			const struct cred *cred = current_cred();
@@ -215,7 +217,7 @@ static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
 }
 
 int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
-		int force)
+		int flags)
 {
 	int err;
 
@@ -223,12 +225,12 @@ int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
 	if (err)
 		return err;
 
-	f_modown(filp, pid, type, force);
+	f_modown(filp, pid, type, flags);
 	return 0;
 }
 EXPORT_SYMBOL(__f_setown);
 
-int f_setown(struct file *filp, unsigned long arg, int force)
+int f_setown(struct file *filp, unsigned long arg, int flags)
 {
 	enum pid_type type;
 	struct pid *pid;
@@ -241,7 +243,7 @@ int f_setown(struct file *filp, unsigned long arg, int force)
 	}
 	rcu_read_lock();
 	pid = find_vpid(who);
-	result = __f_setown(filp, pid, type, force);
+	result = __f_setown(filp, pid, type, flags);
 	rcu_read_unlock();
 	return result;
 }
@@ -263,6 +265,40 @@ pid_t f_getown(struct file *filp)
 	return pid;
 }
 
+static int f_setown_tid(struct file *filp, unsigned long arg)
+{
+	int flags = FF_SETOWN_FORCE;
+	struct pid *pid;
+	int who = arg;
+	int ret = 0;
+
+	if (who < 0)
+		who = -who;
+	else
+		flags |= FF_SETOWN_TID;
+
+	rcu_read_lock();
+	pid = find_vpid(who);
+	ret = __f_setown(filp, pid, PIDTYPE_PID, flags);
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static pid_t f_getown_tid(struct file *filp)
+{
+	pid_t tid;
+
+	read_lock(&filp->f_owner.lock);
+	tid = pid_vnr(filp->f_owner.pid);
+	if (filp->f_owner.pid_type == PIDTYPE_PGID)
+		tid = 0;
+	if (!filp->f_owner.task_only)
+		tid = -tid;
+	read_unlock(&filp->f_owner.lock);
+	return tid;
+}
+
 static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		struct file *filp)
 {
@@ -311,7 +347,14 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		force_successful_syscall_return();
 		break;
 	case F_SETOWN:
-		err = f_setown(filp, arg, 1);
+		err = f_setown(filp, arg, FF_SETOWN_FORCE);
+		break;
+	case F_GETOWN_TID:
+		err = f_getown_tid(filp);
+		force_successful_syscall_return();
+		break;
+	case F_SETOWN_TID:
+		err = f_setown_tid(filp, arg);
 		break;
 	case F_GETSIG:
 		err = filp->f_owner.signum;
@@ -431,6 +474,7 @@ static void send_sigio_to_task(struct task_struct *p,
 			       int fd,
 			       int reason)
 {
+	int (*send_sig)(int, struct siginfo *, struct task_struct *);
 	/*
 	 * F_SETSIG can change ->signum lockless in parallel, make
 	 * sure we read it once and use the same value throughout.
@@ -440,6 +484,8 @@ static void send_sigio_to_task(struct task_struct *p,
 	if (!sigio_perm(p, fown, signum))
 		return;
 
+	send_sig = fown->task_only ? send_sig_info : group_send_sig_info;
+
 	switch (signum) {
 		siginfo_t si;
 		default:
@@ -461,11 +507,11 @@ static void send_sigio_to_task(struct task_struct *p,
 			else
 				si.si_band = band_table[reason - POLL_IN];
 			si.si_fd    = fd;
-			if (!group_send_sig_info(signum, &si, p))
+			if (!send_sig(signum, &si, p))
 				break;
 		/* fall-through: fall back on the old plain SIGIO signal */
 		case 0:
-			group_send_sig_info(SIGIO, SEND_SIG_PRIV, p);
+			send_sig(SIGIO, SEND_SIG_PRIV, p);
 	}
 }
 
diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
index 4d3e483..d7906b8 100644
--- a/include/asm-generic/fcntl.h
+++ b/include/asm-generic/fcntl.h
@@ -73,6 +73,10 @@
 #define F_SETSIG	10	/* for sockets. */
 #define F_GETSIG	11	/* for sockets. */
 #endif
+#ifndef F_SETOWN_TID
+#define F_SETOWN_TID	12
+#define F_GETOWN_TID	13
+#endif
 
 /* for F_[GET|SET]FL */
 #define FD_CLOEXEC	1	/* actually anything with low bit set goes */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0872372..42697e7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -872,6 +872,7 @@ struct fown_struct {
 	rwlock_t lock;          /* protects pid, uid, euid fields */
 	struct pid *pid;	/* pid or -pgrp where SIGIO should be sent */
 	enum pid_type pid_type;	/* Kind of process group SIGIO should be sent to */
+	bool task_only;		/* task or group signal */
 	uid_t uid, euid;	/* uid/euid of process setting the owner */
 	int signum;		/* posix.1b rt signal to be delivered on IO */
 };
@@ -1291,8 +1292,14 @@ extern void kill_fasync(struct fasync_struct **, int, int);
 /* only for net: no internal synchronization */
 extern void __kill_fasync(struct fasync_struct *, int, int);
 
-extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int force);
-extern int f_setown(struct file *filp, unsigned long arg, int force);
+/*
+ * setown flags
+ */ 
+#define FF_SETOWN_FORCE		1
+#define FF_SETOWN_TID		2
+
+extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int flags);
+extern int f_setown(struct file *filp, unsigned long arg, int flags);
 extern void f_delown(struct file *filp);
 extern pid_t f_getown(struct file *filp);
 extern int send_sigurg(struct fown_struct *fown);
diff --git a/net/socket.c b/net/socket.c
index 791d71a..ac57f8e 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -916,7 +916,7 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 			err = -EFAULT;
 			if (get_user(pid, (int __user *)argp))
 				break;
-			err = f_setown(sock->file, pid, 1);
+			err = f_setown(sock->file, pid, FF_SETOWN_FORCE);
 			break;
 		case FIOGETOWN:
 		case SIOCGPGRP:



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID
  2009-07-31  8:35             ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra
@ 2009-07-31 14:01               ` stephane eranian
  2009-07-31 20:52               ` Oleg Nesterov
  2009-07-31 21:11               ` Andrew Morton
  2 siblings, 0 replies; 49+ messages in thread
From: stephane eranian @ 2009-07-31 14:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner,
	Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson,
	Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
	perfmon2-devel, Michael Kerrisk, roland

Hi,

I have tried this patch in 2.6.30 and perfmon given I already had a
stress test program
for this problem. I can report that the patch seems to work for me.

This is a self-sampling multi-threaded program in which each thread also
uses setitimer(ITIMER_REAL) and thus is getting SIGALRM.

First run with stock F_SETOWN:

$ self_smpl_multi  8 20000000 29 0
program_time = 8, threshold = 20000000, signum = 29 fcntl(F_SETOWN)

launch main: fd: 3, tid: 5836, self: 0x7f75f43206e0
launch side: fd: 4, tid: 5837, self: 0x40b59950
1: fd = 3, count =   30, iter =  651, rate = 46/Kiter
1: fd = 4, count =   29, iter =  623, rate = 46/Kiter
(bad thread) ser = 287, fd = 4, tid = 5836, self = 0x7f75f43206e0
2: fd = 4, count =  119, iter = 2540, rate = 46/Kiter
2: fd = 3, count =  118, iter = 2510, rate = 47/Kiter
(bad thread) ser = 330, fd = 4, tid = 5836, self = 0x7f75f43206e0
(bad thread) ser = 395, fd = 4, tid = 5836, self = 0x7f75f43206e0


The "(bad thread)" message shows up when the signal goes to the wrong
thread.

Second run with F_SETOWN_TID:

$ ./self_smpl_multi  8 20000000 29 1
program_time = 8, threshold = 20000000, signum = 29 fcntl(F_SETOWN_TID)

launch main: fd: 3, tid: 5838, self: 0x7fa1800af6e0
launch side: fd: 4, tid: 5839, self: 0x4253a950
1: fd = 4, count =   52, iter = 1115, rate = 46/Kiter
1: fd = 3, count =   52, iter = 1115, rate = 46/Kiter
2: fd = 4, count =  119, iter = 2541, rate = 46/Kiter
2: fd = 3, count =  117, iter = 2490, rate = 46/Kiter
3: fd = 3, count =  114, iter = 2427, rate = 46/Kiter
3: fd = 4, count =  119, iter = 2541, rate = 46/Kiter
4: fd = 4, count =  119, iter = 2541, rate = 46/Kiter
4: fd = 3, count =  114, iter = 2428, rate = 46/Kiter
5: fd = 4, count =  119, iter = 2541, rate = 46/Kiter

No more error messages.

Thanks.


On Fri, Jul 31, 2009 at 10:35 AM, Peter Zijlstra<a.p.zijlstra@chello.nl> wrote:
> In order to direct the SIGIO signal to a particular thread of a
> multi-threaded application we cannot, like suggested by the manpage, put
> a TID into the regular fcntl(F_SETOWN) call. It will still be send to
> the whole process of which that thread is part.
>
> Since people do want to properly direct SIGIO we introduce F_SETOWN_TID,
> which functions similarly to F_SETOWN, except positive arguments are
> interpreted as TIDs and negative arguments are interpreted as PIDs.
>
> This extension is fully bug compatible with the old F_GETOWN
> implementation in that F_GETOWN_TID will be troubled by the negative
> return value for PIDs similarly to F_GETOWN's trouble with process
> groups.
>
> [ compile tested only so far ]
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  arch/parisc/include/asm/fcntl.h |    2 +
>  fs/fcntl.c                      |   64 +++++++++++++++++++++++++++++++++-----
>  include/asm-generic/fcntl.h     |    4 ++
>  include/linux/fs.h              |   11 +++++-
>  net/socket.c                    |    2 +-
>  5 files changed, 71 insertions(+), 12 deletions(-)
>
> diff --git a/arch/parisc/include/asm/fcntl.h b/arch/parisc/include/asm/fcntl.h
> index 1e1c824..5d5235a 100644
> --- a/arch/parisc/include/asm/fcntl.h
> +++ b/arch/parisc/include/asm/fcntl.h
> @@ -28,6 +28,8 @@
>  #define F_SETOWN       12      /*  for sockets. */
>  #define F_SETSIG       13      /*  for sockets. */
>  #define F_GETSIG       14      /*  for sockets. */
> +#define F_GETOWN_TID   15
> +#define F_SETOWN_TID   16
>
>  /* for posix fcntl() and lockf() */
>  #define F_RDLCK                01
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index ae41308..8d0b7f9 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -197,13 +197,15 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
>  }
>
>  static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
> -                     int force)
> +                     int flags)
>  {
>        write_lock_irq(&filp->f_owner.lock);
> -       if (force || !filp->f_owner.pid) {
> +       if ((flags & FF_SETOWN_FORCE) || !filp->f_owner.pid) {
>                put_pid(filp->f_owner.pid);
>                filp->f_owner.pid = get_pid(pid);
>                filp->f_owner.pid_type = type;
> +               filp->f_owner.task_only =
> +                       (type == PIDTYPE_PID && (flags & FF_SETOWN_TID));
>
>                if (pid) {
>                        const struct cred *cred = current_cred();
> @@ -215,7 +217,7 @@ static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
>  }
>
>  int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
> -               int force)
> +               int flags)
>  {
>        int err;
>
> @@ -223,12 +225,12 @@ int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
>        if (err)
>                return err;
>
> -       f_modown(filp, pid, type, force);
> +       f_modown(filp, pid, type, flags);
>        return 0;
>  }
>  EXPORT_SYMBOL(__f_setown);
>
> -int f_setown(struct file *filp, unsigned long arg, int force)
> +int f_setown(struct file *filp, unsigned long arg, int flags)
>  {
>        enum pid_type type;
>        struct pid *pid;
> @@ -241,7 +243,7 @@ int f_setown(struct file *filp, unsigned long arg, int force)
>        }
>        rcu_read_lock();
>        pid = find_vpid(who);
> -       result = __f_setown(filp, pid, type, force);
> +       result = __f_setown(filp, pid, type, flags);
>        rcu_read_unlock();
>        return result;
>  }
> @@ -263,6 +265,40 @@ pid_t f_getown(struct file *filp)
>        return pid;
>  }
>
> +static int f_setown_tid(struct file *filp, unsigned long arg)
> +{
> +       int flags = FF_SETOWN_FORCE;
> +       struct pid *pid;
> +       int who = arg;
> +       int ret = 0;
> +
> +       if (who < 0)
> +               who = -who;
> +       else
> +               flags |= FF_SETOWN_TID;
> +
> +       rcu_read_lock();
> +       pid = find_vpid(who);
> +       ret = __f_setown(filp, pid, PIDTYPE_PID, flags);
> +       rcu_read_unlock();
> +
> +       return ret;
> +}
> +
> +static pid_t f_getown_tid(struct file *filp)
> +{
> +       pid_t tid;
> +
> +       read_lock(&filp->f_owner.lock);
> +       tid = pid_vnr(filp->f_owner.pid);
> +       if (filp->f_owner.pid_type == PIDTYPE_PGID)
> +               tid = 0;
> +       if (!filp->f_owner.task_only)
> +               tid = -tid;
> +       read_unlock(&filp->f_owner.lock);
> +       return tid;
> +}
> +
>  static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
>                struct file *filp)
>  {
> @@ -311,7 +347,14 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
>                force_successful_syscall_return();
>                break;
>        case F_SETOWN:
> -               err = f_setown(filp, arg, 1);
> +               err = f_setown(filp, arg, FF_SETOWN_FORCE);
> +               break;
> +       case F_GETOWN_TID:
> +               err = f_getown_tid(filp);
> +               force_successful_syscall_return();
> +               break;
> +       case F_SETOWN_TID:
> +               err = f_setown_tid(filp, arg);
>                break;
>        case F_GETSIG:
>                err = filp->f_owner.signum;
> @@ -431,6 +474,7 @@ static void send_sigio_to_task(struct task_struct *p,
>                               int fd,
>                               int reason)
>  {
> +       int (*send_sig)(int, struct siginfo *, struct task_struct *);
>        /*
>         * F_SETSIG can change ->signum lockless in parallel, make
>         * sure we read it once and use the same value throughout.
> @@ -440,6 +484,8 @@ static void send_sigio_to_task(struct task_struct *p,
>        if (!sigio_perm(p, fown, signum))
>                return;
>
> +       send_sig = fown->task_only ? send_sig_info : group_send_sig_info;
> +
>        switch (signum) {
>                siginfo_t si;
>                default:
> @@ -461,11 +507,11 @@ static void send_sigio_to_task(struct task_struct *p,
>                        else
>                                si.si_band = band_table[reason - POLL_IN];
>                        si.si_fd    = fd;
> -                       if (!group_send_sig_info(signum, &si, p))
> +                       if (!send_sig(signum, &si, p))
>                                break;
>                /* fall-through: fall back on the old plain SIGIO signal */
>                case 0:
> -                       group_send_sig_info(SIGIO, SEND_SIG_PRIV, p);
> +                       send_sig(SIGIO, SEND_SIG_PRIV, p);
>        }
>  }
>
> diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
> index 4d3e483..d7906b8 100644
> --- a/include/asm-generic/fcntl.h
> +++ b/include/asm-generic/fcntl.h
> @@ -73,6 +73,10 @@
>  #define F_SETSIG       10      /* for sockets. */
>  #define F_GETSIG       11      /* for sockets. */
>  #endif
> +#ifndef F_SETOWN_TID
> +#define F_SETOWN_TID   12
> +#define F_GETOWN_TID   13
> +#endif
>
>  /* for F_[GET|SET]FL */
>  #define FD_CLOEXEC     1       /* actually anything with low bit set goes */
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 0872372..42697e7 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -872,6 +872,7 @@ struct fown_struct {
>        rwlock_t lock;          /* protects pid, uid, euid fields */
>        struct pid *pid;        /* pid or -pgrp where SIGIO should be sent */
>        enum pid_type pid_type; /* Kind of process group SIGIO should be sent to */
> +       bool task_only;         /* task or group signal */
>        uid_t uid, euid;        /* uid/euid of process setting the owner */
>        int signum;             /* posix.1b rt signal to be delivered on IO */
>  };
> @@ -1291,8 +1292,14 @@ extern void kill_fasync(struct fasync_struct **, int, int);
>  /* only for net: no internal synchronization */
>  extern void __kill_fasync(struct fasync_struct *, int, int);
>
> -extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int force);
> -extern int f_setown(struct file *filp, unsigned long arg, int force);
> +/*
> + * setown flags
> + */
> +#define FF_SETOWN_FORCE                1
> +#define FF_SETOWN_TID          2
> +
> +extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int flags);
> +extern int f_setown(struct file *filp, unsigned long arg, int flags);
>  extern void f_delown(struct file *filp);
>  extern pid_t f_getown(struct file *filp);
>  extern int send_sigurg(struct fown_struct *fown);
> diff --git a/net/socket.c b/net/socket.c
> index 791d71a..ac57f8e 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -916,7 +916,7 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
>                        err = -EFAULT;
>                        if (get_user(pid, (int __user *)argp))
>                                break;
> -                       err = f_setown(sock->file, pid, 1);
> +                       err = f_setown(sock->file, pid, FF_SETOWN_FORCE);
>                        break;
>                case FIOGETOWN:
>                case SIOCGPGRP:
>
>
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID
  2009-07-31  8:35             ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra
  2009-07-31 14:01               ` stephane eranian
@ 2009-07-31 20:52               ` Oleg Nesterov
  2009-07-31 21:11               ` Andrew Morton
  2 siblings, 0 replies; 49+ messages in thread
From: Oleg Nesterov @ 2009-07-31 20:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: eranian, Ingo Molnar, LKML, Andrew Morton, Thomas Gleixner,
	Robert Richter, Paul Mackerras, Andi Kleen, Maynard Johnson,
	Carl Love, Corey J Ashford, Philip Mucci, Dan Terpstra,
	perfmon2-devel, Michael Kerrisk, roland

On 07/31, Peter Zijlstra wrote:
>
> In order to direct the SIGIO signal to a particular thread of a
> multi-threaded application we cannot, like suggested by the manpage, put
> a TID into the regular fcntl(F_SETOWN) call. It will still be send to
> the whole process of which that thread is part.
>
> Since people do want to properly direct SIGIO we introduce F_SETOWN_TID,
> which functions similarly to F_SETOWN, except positive arguments are
> interpreted as TIDs and negative arguments are interpreted as PIDs.

I think this is correct. But,

> @@ -431,6 +474,7 @@ static void send_sigio_to_task(struct task_struct *p,
>  			       int fd,
>  			       int reason)
>  {
> +	int (*send_sig)(int, struct siginfo *, struct task_struct *);
>  	/*
>  	 * F_SETSIG can change ->signum lockless in parallel, make
>  	 * sure we read it once and use the same value throughout.
> @@ -440,6 +484,8 @@ static void send_sigio_to_task(struct task_struct *p,
>  	if (!sigio_perm(p, fown, signum))
>  		return;
>
> +	send_sig = fown->task_only ? send_sig_info : group_send_sig_info;

this is ugly.

I do not blame your patch, I blame signal.c which has a lot of helpers
to send a signal but it is never possible to find the right one.

I think we need a new trivial helper,

	int xxx(int sig, struct siginfo *info, struct task_struct *p, bool group)
	{
		unsigned long flags;
		int ret = -ESRCH;

		if (lock_task_sighand(p, &flags)) {
			ret = send_signal(sig, info, p, group);
			unlock_task_sighand(p, &flags);
		}

		return ret;
	}

send_sigio_to_task() can just do: send_signal(..., !fown->task_only).

group_send_sig_info(), send_sig_info() should use this helper too.


Also. without the new helper, F_SETOWN does check_kill_permission()
while F_SETOWN_TID does not. This doesn't really matter, but this
looks a bit odd.

Note that send_sigio_to_task() does not need check_kill_permission().
We use either SEND_SIG_PRIV or SI_FROMKERNEL() signals, so we never
actually check permissions. Even if we did, it would be just wrong
to deny the signal here using current_cred().


Peter, may I suggest to to re-diff your patch on top of the trivial
patch I am going to send a bit later today?

Oleg.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID
  2009-07-31  8:35             ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra
  2009-07-31 14:01               ` stephane eranian
  2009-07-31 20:52               ` Oleg Nesterov
@ 2009-07-31 21:11               ` Andrew Morton
  2009-08-01  1:27                 ` [PATCH 0/2] send_sigio/do_send_sig_info (Was: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID) Oleg Nesterov
                                   ` (4 more replies)
  2 siblings, 5 replies; 49+ messages in thread
From: Andrew Morton @ 2009-07-31 21:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: oleg, eranian, mingo, linux-kernel, tglx, robert.richter, paulus,
	andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel,
	mtk.manpages, roland

On Fri, 31 Jul 2009 10:35:20 +0200
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> In order to direct the SIGIO signal to a particular thread of a
> multi-threaded application we cannot, like suggested by the manpage, put
> a TID into the regular fcntl(F_SETOWN) call. It will still be send to
> the whole process of which that thread is part.
> 
> Since people do want to properly direct SIGIO we introduce F_SETOWN_TID,
> which functions similarly to F_SETOWN, except positive arguments are
> interpreted as TIDs and negative arguments are interpreted as PIDs.
> 
> This extension is fully bug compatible with the old F_GETOWN
> implementation in that F_GETOWN_TID will be troubled by the negative
> return value for PIDs similarly to F_GETOWN's trouble with process
> groups.

I'd be interested in seeing a bit more explanation about the "people do
want to properly direct SIGIO" thing - use cases, how the current code
causes them problems, etc.  As it stands, it's a bit of a mystery-patch.

> [ compile tested only so far ]

I will continue to lurk :)

> arch/parisc/include/asm/fcntl.h |    2 +
> fs/fcntl.c                      |   64 +++++++++++++++++++++++++++++++++-----
> include/asm-generic/fcntl.h     |    4 ++
> include/linux/fs.h              |   11 +++++-
> net/socket.c                    |    2 +-

OK.



Alpha has private definitions of F_SETSIG and F_GETSIG which are
identical to the generic ones.  That's somewhat logical, given that
alpha's F_SETOWN/F_GETOWN _differ_ from the asm-generic ones.

Alpha appears to have made the decision to spell out _all_ the F_*
flags, given that some of them are different.  That has some merit, I
guess.

But your patch broke that.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 0/2] send_sigio/do_send_sig_info (Was: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID)
  2009-07-31 21:11               ` Andrew Morton
@ 2009-08-01  1:27                 ` Oleg Nesterov
  2009-08-03 15:48                   ` [PATCH 3/2] fcntl: F_[SG]ETOWN_TID Peter Zijlstra
  2009-08-01  1:28                 ` [PATCH 1/2] signals: introduce do_send_sig_info() helper Oleg Nesterov
                                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 49+ messages in thread
From: Oleg Nesterov @ 2009-08-01  1:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

Preparation for Peter's changes, but imho makes sense anyway.

Oleg.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 1/2] signals: introduce do_send_sig_info() helper
  2009-07-31 21:11               ` Andrew Morton
  2009-08-01  1:27                 ` [PATCH 0/2] send_sigio/do_send_sig_info (Was: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID) Oleg Nesterov
@ 2009-08-01  1:28                 ` Oleg Nesterov
  2009-08-01  1:28                 ` [PATCH 2/2] signals: send_sigio: use do_send_sig_info() to avoid check_kill_permission() Oleg Nesterov
                                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 49+ messages in thread
From: Oleg Nesterov @ 2009-08-01  1:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

Introduce do_send_sig_info() and convert group_send_sig_info(),
send_sig_info(), do_send_specific() to use this helper.

Hopefully it will have more users soon, it allows to specify
specific/group behaviour via "bool group" argument.

Shaves 80 bytes from .text.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 include/linux/signal.h |    2 +
 kernel/signal.c        |   56 +++++++++++++++++++++++--------------------------
 2 files changed, 29 insertions(+), 29 deletions(-)

--- WAIT/include/linux/signal.h~1_HELPER	2009-06-11 14:16:46.000000000 +0200
+++ WAIT/include/linux/signal.h	2009-08-01 02:26:41.000000000 +0200
@@ -233,6 +233,8 @@ static inline int valid_signal(unsigned 
 }
 
 extern int next_signal(struct sigpending *pending, sigset_t *mask);
+extern int do_send_sig_info(int sig, struct siginfo *info,
+				struct task_struct *p, bool group);
 extern int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p);
 extern int __group_send_sig_info(int, struct siginfo *, struct task_struct *);
 extern long do_rt_tgsigqueueinfo(pid_t tgid, pid_t pid, int sig,
--- WAIT/kernel/signal.c~1_HELPER	2009-07-01 20:22:44.000000000 +0200
+++ WAIT/kernel/signal.c	2009-08-01 02:21:27.000000000 +0200
@@ -971,6 +971,20 @@ specific_send_sig_info(int sig, struct s
 	return send_signal(sig, info, t, 0);
 }
 
+int do_send_sig_info(int sig, struct siginfo *info, struct task_struct *p,
+			bool group)
+{
+	unsigned long flags;
+	int ret = -ESRCH;
+
+	if (lock_task_sighand(p, &flags)) {
+		ret = send_signal(sig, info, p, group);
+		unlock_task_sighand(p, &flags);
+	}
+
+	return ret;
+}
+
 /*
  * Force a signal that the process can't ignore: if necessary
  * we unblock the signal and change any SIG_IGN to SIG_DFL.
@@ -1068,18 +1082,10 @@ struct sighand_struct *lock_task_sighand
  */
 int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
 {
-	unsigned long flags;
-	int ret;
-
-	ret = check_kill_permission(sig, info, p);
+	int ret = check_kill_permission(sig, info, p);
 
-	if (!ret && sig) {
-		ret = -ESRCH;
-		if (lock_task_sighand(p, &flags)) {
-			ret = __group_send_sig_info(sig, info, p);
-			unlock_task_sighand(p, &flags);
-		}
-	}
+	if (!ret && sig)
+		ret = do_send_sig_info(sig, info, p, true);
 
 	return ret;
 }
@@ -1224,15 +1230,9 @@ static int kill_something_info(int sig, 
  * These are for backward compatibility with the rest of the kernel source.
  */
 
-/*
- * The caller must ensure the task can't exit.
- */
 int
 send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
 {
-	int ret;
-	unsigned long flags;
-
 	/*
 	 * Make sure legacy kernel users don't send in bad values
 	 * (normal paths check this in check_kill_permission).
@@ -1240,10 +1240,7 @@ send_sig_info(int sig, struct siginfo *i
 	if (!valid_signal(sig))
 		return -EINVAL;
 
-	spin_lock_irqsave(&p->sighand->siglock, flags);
-	ret = specific_send_sig_info(sig, info, p);
-	spin_unlock_irqrestore(&p->sighand->siglock, flags);
-	return ret;
+	return do_send_sig_info(sig, info, p, false);
 }
 
 #define __si_special(priv) \
@@ -2281,7 +2278,6 @@ static int
 do_send_specific(pid_t tgid, pid_t pid, int sig, struct siginfo *info)
 {
 	struct task_struct *p;
-	unsigned long flags;
 	int error = -ESRCH;
 
 	rcu_read_lock();
@@ -2291,14 +2287,16 @@ do_send_specific(pid_t tgid, pid_t pid, 
 		/*
 		 * The null signal is a permissions and process existence
 		 * probe.  No signal is actually delivered.
-		 *
-		 * If lock_task_sighand() fails we pretend the task dies
-		 * after receiving the signal. The window is tiny, and the
-		 * signal is private anyway.
 		 */
-		if (!error && sig && lock_task_sighand(p, &flags)) {
-			error = specific_send_sig_info(sig, info, p);
-			unlock_task_sighand(p, &flags);
+		if (!error && sig) {
+			error = do_send_sig_info(sig, info, p, false);
+			/*
+			 * If lock_task_sighand() failed we pretend the task
+			 * dies after receiving the signal. The window is tiny,
+			 * and the signal is private anyway.
+			 */
+			if (unlikely(error == -ESRCH))
+				error = 0;
 		}
 	}
 	rcu_read_unlock();


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 2/2] signals: send_sigio: use do_send_sig_info() to avoid check_kill_permission()
  2009-07-31 21:11               ` Andrew Morton
  2009-08-01  1:27                 ` [PATCH 0/2] send_sigio/do_send_sig_info (Was: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID) Oleg Nesterov
  2009-08-01  1:28                 ` [PATCH 1/2] signals: introduce do_send_sig_info() helper Oleg Nesterov
@ 2009-08-01  1:28                 ` Oleg Nesterov
  2009-08-03 12:53                 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID stephane eranian
  2009-08-03 15:21                 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra
  4 siblings, 0 replies; 49+ messages in thread
From: Oleg Nesterov @ 2009-08-01  1:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

group_send_sig_info()->check_kill_permission() assumes that current
is the sender and uses current_cred().

This is not true in send_sigio_to_task() case. From the security pov
the sender is not current, but the task which did fcntl(F_SETOWN),
that is why we have sigio_perm() which uses the right creds to check.

Fortunately, send_sigio() always sends either SEND_SIG_PRIV or
SI_FROMKERNEL() signal, so check_kill_permission() does nothing. But
still it would be tidier to avoid this bogus security check and save
a couple of cycles.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 fs/fcntl.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- WAIT/fs/fcntl.c~2_SIGIO	2009-06-17 14:11:26.000000000 +0200
+++ WAIT/fs/fcntl.c	2009-08-01 02:40:41.000000000 +0200
@@ -462,11 +462,11 @@ static void send_sigio_to_task(struct ta
 			else
 				si.si_band = band_table[reason - POLL_IN];
 			si.si_fd    = fd;
-			if (!group_send_sig_info(signum, &si, p))
+			if (!do_send_sig_info(signum, &si, p, true))
 				break;
 		/* fall-through: fall back on the old plain SIGIO signal */
 		case 0:
-			group_send_sig_info(SIGIO, SEND_SIG_PRIV, p);
+			do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, true);
 	}
 }
 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID
  2009-07-31 21:11               ` Andrew Morton
                                   ` (2 preceding siblings ...)
  2009-08-01  1:28                 ` [PATCH 2/2] signals: send_sigio: use do_send_sig_info() to avoid check_kill_permission() Oleg Nesterov
@ 2009-08-03 12:53                 ` stephane eranian
  2009-08-09  5:46                   ` F_SETOWN_TID: F_SETOWN was thread-specific for a while Jamie Lokier
  2009-08-03 15:21                 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra
  4 siblings, 1 reply; 49+ messages in thread
From: stephane eranian @ 2009-08-03 12:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, oleg, mingo, linux-kernel, tglx, robert.richter,
	paulus, andi, mpjohn, cel, cjashfor, mucci, terpstra,
	perfmon2-devel, mtk.manpages, roland

Andrew,

On Fri, Jul 31, 2009 at 11:11 PM, Andrew
Morton<akpm@linux-foundation.org> wrote:
> On Fri, 31 Jul 2009 10:35:20 +0200
> Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
>> In order to direct the SIGIO signal to a particular thread of a
>> multi-threaded application we cannot, like suggested by the manpage, put
>> a TID into the regular fcntl(F_SETOWN) call. It will still be send to
>> the whole process of which that thread is part.
>>
>> Since people do want to properly direct SIGIO we introduce F_SETOWN_TID,
>> which functions similarly to F_SETOWN, except positive arguments are
>> interpreted as TIDs and negative arguments are interpreted as PIDs.
>>
>> This extension is fully bug compatible with the old F_GETOWN
>> implementation in that F_GETOWN_TID will be troubled by the negative
>> return value for PIDs similarly to F_GETOWN's trouble with process
>> groups.
>
> I'd be interested in seeing a bit more explanation about the "people do
> want to properly direct SIGIO" thing - use cases, how the current code
> causes them problems, etc.  As it stands, it's a bit of a mystery-patch.
>
I have explained that  in my initial post on LKML. Here is a link to it:
http://lkml.org/lkml/2009/7/27/242

Basically, in the context of self-monitoring threads (which is very common),
you want the sample notification to be asynchronous and yet delivered to
a particular thread, i.e., the one in which the sample was recorded. The
asynchronous notification is required because you are monitoring yourself.
Asynchronous signals are normally delivered to any thread within the process.
In my message, I described the algorithm currently implemented by the kernel.
As Peter found out, the man page seems to indicate that if you specify a TID
instead a PID with F_SETOWN, then the kernel should interpret this as meaning
this particular thread, but this is not what is implemented right now.

Self-monitoring  has specific needs, you don't necessarily want SIGIO to
always be delivered to the thread in which it originated. That is why, I
have suggested a new code path. It is also important that backward
compatibility be maintained.

>> [ compile tested only so far ]
>
> I will continue to lurk :)
>
>> arch/parisc/include/asm/fcntl.h |    2 +
>> fs/fcntl.c                      |   64 +++++++++++++++++++++++++++++++++-----
>> include/asm-generic/fcntl.h     |    4 ++
>> include/linux/fs.h              |   11 +++++-
>> net/socket.c                    |    2 +-
>
> OK.
>
>
>
> Alpha has private definitions of F_SETSIG and F_GETSIG which are
> identical to the generic ones.  That's somewhat logical, given that
> alpha's F_SETOWN/F_GETOWN _differ_ from the asm-generic ones.
>
> Alpha appears to have made the decision to spell out _all_ the F_*
> flags, given that some of them are different.  That has some merit, I
> guess.
>
> But your patch broke that.
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID
  2009-07-31 21:11               ` Andrew Morton
                                   ` (3 preceding siblings ...)
  2009-08-03 12:53                 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID stephane eranian
@ 2009-08-03 15:21                 ` Peter Zijlstra
  4 siblings, 0 replies; 49+ messages in thread
From: Peter Zijlstra @ 2009-08-03 15:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: oleg, eranian, mingo, linux-kernel, tglx, robert.richter, paulus,
	andi, mpjohn, cel, cjashfor, mucci, terpstra, perfmon2-devel,
	mtk.manpages, roland

On Fri, 2009-07-31 at 14:11 -0700, Andrew Morton wrote:
> 
> 
> > arch/parisc/include/asm/fcntl.h |    2 +
> > fs/fcntl.c                      |   64 +++++++++++++++++++++++++++++++++-----
> > include/asm-generic/fcntl.h     |    4 ++
> > include/linux/fs.h              |   11 +++++-
> > net/socket.c                    |    2 +-
> 
> OK.
> 
> 
> 
> Alpha has private definitions of F_SETSIG and F_GETSIG which are
> identical to the generic ones.  That's somewhat logical, given that
> alpha's F_SETOWN/F_GETOWN _differ_ from the asm-generic ones.
> 
> Alpha appears to have made the decision to spell out _all_ the F_*
> flags, given that some of them are different.  That has some merit, I
> guess.
> 
> But your patch broke that.

Right, all I did was validate that I didn't overlap with any of the
existing fcntl numbers. I can explicitly add them to alpha if that makes
people happy..




^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 3/2] fcntl: F_[SG]ETOWN_TID
  2009-08-01  1:27                 ` [PATCH 0/2] send_sigio/do_send_sig_info (Was: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID) Oleg Nesterov
@ 2009-08-03 15:48                   ` Peter Zijlstra
  2009-08-03 17:16                     ` Oleg Nesterov
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2009-08-03 15:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland


In order to direct the SIGIO signal to a particular thread of a
multi-threaded application we cannot, like suggested by the manpage, put
a TID into the regular fcntl(F_SETOWN) call. It will still be send to
the whole process of which that thread is part.

Since people do want to properly direct SIGIO we introduce F_SETOWN_TID,
which functions similarly to F_SETOWN, except positive arguments are
interpreted as TIDs and negative arguments are interpreted as PIDs.

The need to direct SIGIO comes from self-monitoring profiling such as 
with perf-counters. Perf-counters uses SIGIO to notify that new sample
data is available. If the signal is delivered to the same task that
generated the new sample it can augment that data by inspecting the
task's user-space state right after it returns from the kernel. This 
is esp. convenient for interpreted or virtual machine driven environments.

This extension is fully bug compatible with the old F_GETOWN
implementation in that F_GETOWN_TID will be troubled by the negative
return value for PIDs similarly to F_GETOWN's trouble with process
groups.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: stephane eranian <eranian@googlemail.com>
---
 arch/alpha/include/asm/fcntl.h  |    2 +
 arch/parisc/include/asm/fcntl.h |    2 +
 fs/fcntl.c                      |   62 ++++++++++++++++++++++++++++++++++------
 include/asm-generic/fcntl.h     |    4 ++
 include/linux/fs.h              |   11 +++++--
 net/socket.c                    |    2 -
 6 files changed, 71 insertions(+), 12 deletions(-)

Index: linux-2.6/arch/parisc/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/parisc/include/asm/fcntl.h
+++ linux-2.6/arch/parisc/include/asm/fcntl.h
@@ -28,6 +28,8 @@
 #define F_SETOWN	12	/*  for sockets. */
 #define F_SETSIG	13	/*  for sockets. */
 #define F_GETSIG	14	/*  for sockets. */
+#define F_GETOWN_TID	15
+#define F_SETOWN_TID	16
 
 /* for posix fcntl() and lockf() */
 #define F_RDLCK		01
Index: linux-2.6/fs/fcntl.c
===================================================================
--- linux-2.6.orig/fs/fcntl.c
+++ linux-2.6/fs/fcntl.c
@@ -197,13 +197,15 @@ static int setfl(int fd, struct file * f
 }
 
 static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
-                     int force)
+		     int flags)
 {
 	write_lock_irq(&filp->f_owner.lock);
-	if (force || !filp->f_owner.pid) {
+	if ((flags & FF_SETOWN_FORCE) || !filp->f_owner.pid) {
 		put_pid(filp->f_owner.pid);
 		filp->f_owner.pid = get_pid(pid);
 		filp->f_owner.pid_type = type;
+		filp->f_owner.task_only =
+			(type == PIDTYPE_PID && (flags & FF_SETOWN_TID));
 
 		if (pid) {
 			const struct cred *cred = current_cred();
@@ -215,7 +217,7 @@ static void f_modown(struct file *filp, 
 }
 
 int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
-		int force)
+		int flags)
 {
 	int err;
 
@@ -223,12 +225,12 @@ int __f_setown(struct file *filp, struct
 	if (err)
 		return err;
 
-	f_modown(filp, pid, type, force);
+	f_modown(filp, pid, type, flags);
 	return 0;
 }
 EXPORT_SYMBOL(__f_setown);
 
-int f_setown(struct file *filp, unsigned long arg, int force)
+int f_setown(struct file *filp, unsigned long arg, int flags)
 {
 	enum pid_type type;
 	struct pid *pid;
@@ -241,7 +243,7 @@ int f_setown(struct file *filp, unsigned
 	}
 	rcu_read_lock();
 	pid = find_vpid(who);
-	result = __f_setown(filp, pid, type, force);
+	result = __f_setown(filp, pid, type, flags);
 	rcu_read_unlock();
 	return result;
 }
@@ -263,6 +265,40 @@ pid_t f_getown(struct file *filp)
 	return pid;
 }
 
+static int f_setown_tid(struct file *filp, unsigned long arg)
+{
+	int flags = FF_SETOWN_FORCE;
+	struct pid *pid;
+	int who = arg;
+	int ret = 0;
+
+	if (who < 0)
+		who = -who;
+	else
+		flags |= FF_SETOWN_TID;
+
+	rcu_read_lock();
+	pid = find_vpid(who);
+	ret = __f_setown(filp, pid, PIDTYPE_PID, flags);
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static pid_t f_getown_tid(struct file *filp)
+{
+	pid_t tid;
+
+	read_lock(&filp->f_owner.lock);
+	tid = pid_vnr(filp->f_owner.pid);
+	if (filp->f_owner.pid_type == PIDTYPE_PGID)
+		tid = 0;
+	if (!filp->f_owner.task_only)
+		tid = -tid;
+	read_unlock(&filp->f_owner.lock);
+	return tid;
+}
+
 static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		struct file *filp)
 {
@@ -311,7 +347,14 @@ static long do_fcntl(int fd, unsigned in
 		force_successful_syscall_return();
 		break;
 	case F_SETOWN:
-		err = f_setown(filp, arg, 1);
+		err = f_setown(filp, arg, FF_SETOWN_FORCE);
+		break;
+	case F_GETOWN_TID:
+		err = f_getown_tid(filp);
+		force_successful_syscall_return();
+		break;
+	case F_SETOWN_TID:
+		err = f_setown_tid(filp, arg);
 		break;
 	case F_GETSIG:
 		err = filp->f_owner.signum;
@@ -436,6 +479,7 @@ static void send_sigio_to_task(struct ta
 	 * sure we read it once and use the same value throughout.
 	 */
 	int signum = ACCESS_ONCE(fown->signum);
+	int group = !fown->task_only;
 
 	if (!sigio_perm(p, fown, signum))
 		return;
@@ -461,11 +505,11 @@ static void send_sigio_to_task(struct ta
 			else
 				si.si_band = band_table[reason - POLL_IN];
 			si.si_fd    = fd;
-			if (!do_send_sig_info(signum, &si, p, true))
+			if (!do_send_sig_info(signum, &si, p, group))
 				break;
 		/* fall-through: fall back on the old plain SIGIO signal */
 		case 0:
-			do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, true);
+			do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, group);
 	}
 }
 
Index: linux-2.6/include/asm-generic/fcntl.h
===================================================================
--- linux-2.6.orig/include/asm-generic/fcntl.h
+++ linux-2.6/include/asm-generic/fcntl.h
@@ -73,6 +73,10 @@
 #define F_SETSIG	10	/* for sockets. */
 #define F_GETSIG	11	/* for sockets. */
 #endif
+#ifndef F_SETOWN_TID
+#define F_SETOWN_TID	12
+#define F_GETOWN_TID	13
+#endif
 
 /* for F_[GET|SET]FL */
 #define FD_CLOEXEC	1	/* actually anything with low bit set goes */
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -872,6 +872,7 @@ struct fown_struct {
 	rwlock_t lock;          /* protects pid, uid, euid fields */
 	struct pid *pid;	/* pid or -pgrp where SIGIO should be sent */
 	enum pid_type pid_type;	/* Kind of process group SIGIO should be sent to */
+	bool task_only;		/* task or group signal */
 	uid_t uid, euid;	/* uid/euid of process setting the owner */
 	int signum;		/* posix.1b rt signal to be delivered on IO */
 };
@@ -1291,8 +1292,14 @@ extern void kill_fasync(struct fasync_st
 /* only for net: no internal synchronization */
 extern void __kill_fasync(struct fasync_struct *, int, int);
 
-extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int force);
-extern int f_setown(struct file *filp, unsigned long arg, int force);
+/*
+ * setown flags
+ */
+#define FF_SETOWN_FORCE		1
+#define FF_SETOWN_TID		2
+
+extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int flags);
+extern int f_setown(struct file *filp, unsigned long arg, int flags);
 extern void f_delown(struct file *filp);
 extern pid_t f_getown(struct file *filp);
 extern int send_sigurg(struct fown_struct *fown);
Index: linux-2.6/net/socket.c
===================================================================
--- linux-2.6.orig/net/socket.c
+++ linux-2.6/net/socket.c
@@ -916,7 +916,7 @@ static long sock_ioctl(struct file *file
 			err = -EFAULT;
 			if (get_user(pid, (int __user *)argp))
 				break;
-			err = f_setown(sock->file, pid, 1);
+			err = f_setown(sock->file, pid, FF_SETOWN_FORCE);
 			break;
 		case FIOGETOWN:
 		case SIOCGPGRP:
Index: linux-2.6/arch/alpha/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/alpha/include/asm/fcntl.h
+++ linux-2.6/arch/alpha/include/asm/fcntl.h
@@ -26,6 +26,8 @@
 #define F_GETOWN	6	/*  for sockets. */
 #define F_SETSIG	10	/*  for sockets. */
 #define F_GETSIG	11	/*  for sockets. */
+#define F_SETOWN_TID	12
+#define F_GETOWN_TID	13
 
 /* for posix fcntl() and lockf() */
 #define F_RDLCK		1



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/2] fcntl: F_[SG]ETOWN_TID
  2009-08-03 15:48                   ` [PATCH 3/2] fcntl: F_[SG]ETOWN_TID Peter Zijlstra
@ 2009-08-03 17:16                     ` Oleg Nesterov
  2009-08-03 17:47                       ` Peter Zijlstra
  0 siblings, 1 reply; 49+ messages in thread
From: Oleg Nesterov @ 2009-08-03 17:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

On 08/03, Peter Zijlstra wrote:
>
> --- linux-2.6.orig/fs/fcntl.c
> +++ linux-2.6/fs/fcntl.c
> @@ -197,13 +197,15 @@ static int setfl(int fd, struct file * f
>  }
>
>  static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
> -                     int force)
> +		     int flags)
>  {
>  	write_lock_irq(&filp->f_owner.lock);
> -	if (force || !filp->f_owner.pid) {
> +	if ((flags & FF_SETOWN_FORCE) || !filp->f_owner.pid) {
>  		put_pid(filp->f_owner.pid);
>  		filp->f_owner.pid = get_pid(pid);
>  		filp->f_owner.pid_type = type;
> +		filp->f_owner.task_only =
> +			(type == PIDTYPE_PID && (flags & FF_SETOWN_TID));

Do we need type == PIDTYPE_PID check? FF_SETOWN_TID must imply
PIDTYPE_PID, it is only used by f_setown_tid().

Hmm. Actually I am not sure we should change f_modown() at all. I was
going to suggest this as a subsequent cleanup, but now I am starting
to think it is better to do from the very beginning. Please see below.

> +static int f_setown_tid(struct file *filp, unsigned long arg)
> +{
> +	int flags = FF_SETOWN_FORCE;
> +	struct pid *pid;
> +	int who = arg;
> +	int ret = 0;
> +
> +	if (who < 0)
> +		who = -who;
> +	else
> +		flags |= FF_SETOWN_TID;

Hmm, OK. so fcntl(F_SETOWN_TID, -666) <=> fcntl(F_SETOWN, +666).

Not that I disagree, but I think this should be discussed. Perhaps
F_SETOWN_TID can just reject who < 0.

> +static pid_t f_getown_tid(struct file *filp)
> +{
> +	pid_t tid;
> +
> +	read_lock(&filp->f_owner.lock);
> +	tid = pid_vnr(filp->f_owner.pid);
> +	if (filp->f_owner.pid_type == PIDTYPE_PGID)
> +		tid = 0;
> +	if (!filp->f_owner.task_only)
> +		tid = -tid;

I didn't think about this before... What should F_GETOWN_TID return
if we did F_GETOWN ? (and vice versa). f_getown_tid() returns < 0
if !task_only and ->piD != 0, this helps.

but the caller of F_GETOWN can't know what the returned value actually
means if F_GETOWN_TID was used.


Do we really need fown->task_only? Not only this enlarges fown_struct,
we have to modify f_modown() and f_setown().

Perhaps we can just add

	#define F_PIDTYPE_THREAD	PIDTYPE_MAX

into fcntl.c ? Then,

	static int f_setown_xxx(struct file *filp, unsigned long arg, int force, bool group)
	{
		enum pid_type type;
		struct pid *pid;
		int who = arg;
		int result;

		type = PIDTYPE_PID;
		if (!group)
			type = F_PIDTYPE_THREAD
		else if (who < 0) {
			type = PIDTYPE_PGID;
			who = -who;
		}

		rcu_read_lock();
		pid = find_vpid(who);
		result = __f_setown(filp, pid, type, force);
		rcu_read_unlock();
		return result;
	}

	int f_setown(struct file *filp, unsigned long arg, int force)
	{
		return f_setown_xxx(..., true);
	}

Now we should also change send_sigio/send_sigurg, but this is trivial

		type = fown->pid_type;
	+	if (type == F_PIDTYPE_THREAD)
			type = PIDTYPE_PID;

send_sigio_to_task() is trivial too.

What do you think? I agree, this is a bit hackish, but otoh this lessens
the changes outside of fcntl.h.

Oleg.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/2] fcntl: F_[SG]ETOWN_TID
  2009-08-03 17:16                     ` Oleg Nesterov
@ 2009-08-03 17:47                       ` Peter Zijlstra
  2009-08-03 18:06                         ` Oleg Nesterov
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2009-08-03 17:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

On Mon, 2009-08-03 at 19:16 +0200, Oleg Nesterov wrote:

> > +		filp->f_owner.task_only =
> > +			(type == PIDTYPE_PID && (flags & FF_SETOWN_TID));
> 
> Do we need type == PIDTYPE_PID check? FF_SETOWN_TID must imply
> PIDTYPE_PID, it is only used by f_setown_tid().

Paranoia I guess.

> > +static int f_setown_tid(struct file *filp, unsigned long arg)
> > +{
> > +	int flags = FF_SETOWN_FORCE;
> > +	struct pid *pid;
> > +	int who = arg;
> > +	int ret = 0;
> > +
> > +	if (who < 0)
> > +		who = -who;
> > +	else
> > +		flags |= FF_SETOWN_TID;
> 
> Hmm, OK. so fcntl(F_SETOWN_TID, -666) <=> fcntl(F_SETOWN, +666).
> 
> Not that I disagree, but I think this should be discussed. Perhaps
> F_SETOWN_TID can just reject who < 0.

Yeah, I considered that. Opinions?

> > +static pid_t f_getown_tid(struct file *filp)
> > +{
> > +	pid_t tid;
> > +
> > +	read_lock(&filp->f_owner.lock);
> > +	tid = pid_vnr(filp->f_owner.pid);
> > +	if (filp->f_owner.pid_type == PIDTYPE_PGID)
> > +		tid = 0;
> > +	if (!filp->f_owner.task_only)
> > +		tid = -tid;
> 
> I didn't think about this before... What should F_GETOWN_TID return
> if we did F_GETOWN ? (and vice versa). f_getown_tid() returns < 0
> if !task_only and ->piD != 0, this helps.
> 
> but the caller of F_GETOWN can't know what the returned value actually
> means if F_GETOWN_TID was used.

Ah, I made GETOWN_TID deal with !PID but forgot the TID case in GETOWN.
Yeah, icky, esp since there is no room for errors in the return value :/
I guess I could make it return 0.

> Do we really need fown->task_only? Not only this enlarges fown_struct,
> we have to modify f_modown() and f_setown().
> 
> Perhaps we can just add
> 
> 	#define F_PIDTYPE_THREAD	PIDTYPE_MAX
> 
> into fcntl.c ? Then,
> 
> 	static int f_setown_xxx(struct file *filp, unsigned long arg, int force, bool group)
> 	{
> 		enum pid_type type;
> 		struct pid *pid;
> 		int who = arg;
> 		int result;
> 
> 		type = PIDTYPE_PID;
> 		if (!group)
> 			type = F_PIDTYPE_THREAD
> 		else if (who < 0) {
> 			type = PIDTYPE_PGID;
> 			who = -who;
> 		}
> 
> 		rcu_read_lock();
> 		pid = find_vpid(who);
> 		result = __f_setown(filp, pid, type, force);
> 		rcu_read_unlock();
> 		return result;
> 	}
> 
> 	int f_setown(struct file *filp, unsigned long arg, int force)
> 	{
> 		return f_setown_xxx(..., true);
> 	}
> 
> Now we should also change send_sigio/send_sigurg, but this is trivial
> 
> 		type = fown->pid_type;
> 	+	if (type == F_PIDTYPE_THREAD)
> 			type = PIDTYPE_PID;
> 
> send_sigio_to_task() is trivial too.
> 
> What do you think? I agree, this is a bit hackish, but otoh this lessens
> the changes outside of fcntl.h.

Right, I considered adding PIDTYPE_TID, but then I'd have to go through
the kernel and make everything consistent, which is where I gave up ;-)

You hack above makes sense, dunno if people will go for it though..


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/2] fcntl: F_[SG]ETOWN_TID
  2009-08-03 17:47                       ` Peter Zijlstra
@ 2009-08-03 18:06                         ` Oleg Nesterov
  2009-08-03 18:36                           ` Peter Zijlstra
  0 siblings, 1 reply; 49+ messages in thread
From: Oleg Nesterov @ 2009-08-03 18:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

On 08/03, Peter Zijlstra wrote:
>
> On Mon, 2009-08-03 at 19:16 +0200, Oleg Nesterov wrote:
>
> > Hmm, OK. so fcntl(F_SETOWN_TID, -666) <=> fcntl(F_SETOWN, +666).
> >
> > Not that I disagree, but I think this should be discussed. Perhaps
> > F_SETOWN_TID can just reject who < 0.
>
> Yeah, I considered that. Opinions?

Yes, please ;)

> > but the caller of F_GETOWN can't know what the returned value actually
> > means if F_GETOWN_TID was used.
>
> Ah, I made GETOWN_TID deal with !PID but forgot the TID case in GETOWN.
> Yeah, icky, esp since there is no room for errors in the return value :/
> I guess I could make it return 0.

Yes, this is confusing too, but probably better.

> > Perhaps we can just add
> >
> > 	#define F_PIDTYPE_THREAD	PIDTYPE_MAX
> >
> > into fcntl.c ? Then,
> >
> Right, I considered adding PIDTYPE_TID, but then I'd have to go through
> the kernel and make everything consistent, which is where I gave up ;-)

Note! we don't add the new PIDTYPE_TID actually. This F_PIDTYPE_THREAD
is not visible outsie of fcntl.c, we just re-use ->pid_type. Instead
we could add a bit, but using the impossible PIDTYPE_MAX is simpler.

> dunno if people will go for it though..

Yes, I am not sure people will like it.


As for F_XXXOWN/F_XXXWOWN_TID interaction. Another option, perhaps, is add
F_{SET,GET}OWN_EX which accepts a use-visible

	struct f_setown_struct {
		int pid;		// > 0
		int type;		// enumerates PIDTYPE_PID, PIDTYPE_PGID, F_PIDTYPE_THREAD
	}

pointer via arg. Instead of F_XXXOWN_TID + int who.

This way at least the users of new api can't be confused.

I dunno.

Oleg.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/2] fcntl: F_[SG]ETOWN_TID
  2009-08-03 18:06                         ` Oleg Nesterov
@ 2009-08-03 18:36                           ` Peter Zijlstra
  2009-08-03 19:02                             ` Oleg Nesterov
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2009-08-03 18:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

On Mon, 2009-08-03 at 20:06 +0200, Oleg Nesterov wrote:
> 
> As for F_XXXOWN/F_XXXWOWN_TID interaction. Another option, perhaps, is add
> F_{SET,GET}OWN_EX which accepts a use-visible
> 
>         struct f_setown_struct {
>                 int pid;                // > 0
>                 int type;               // enumerates PIDTYPE_PID, PIDTYPE_PGID, F_PIDTYPE_THREAD
>         }
> 
> pointer via arg. Instead of F_XXXOWN_TID + int who.
> 
> This way at least the users of new api can't be confused.

This would expose PIDTYPE* to userspace, and also fixes the value of
F_PIDTYPE_THREAD.

I'm not sure we want to go there (unless of course, PIDTYPE_* is already
exposed somewhere).


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/2] fcntl: F_[SG]ETOWN_TID
  2009-08-03 18:36                           ` Peter Zijlstra
@ 2009-08-03 19:02                             ` Oleg Nesterov
  2009-08-04 11:39                               ` [PATCH 3/2 -v3] fcntl: F_[SG]ETOWN_EX Peter Zijlstra
  0 siblings, 1 reply; 49+ messages in thread
From: Oleg Nesterov @ 2009-08-03 19:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

On 08/03, Peter Zijlstra wrote:
>
> On Mon, 2009-08-03 at 20:06 +0200, Oleg Nesterov wrote:
> >
> > As for F_XXXOWN/F_XXXWOWN_TID interaction. Another option, perhaps, is add
> > F_{SET,GET}OWN_EX which accepts a use-visible
> >
> >         struct f_setown_struct {
> >                 int pid;                // > 0
> >                 int type;               // enumerates PIDTYPE_PID, PIDTYPE_PGID, F_PIDTYPE_THREAD
> >         }
> >
> > pointer via arg. Instead of F_XXXOWN_TID + int who.
> >
> > This way at least the users of new api can't be confused.
>
> This would expose PIDTYPE* to userspace,

No, no, we should not export them of course.

f_setown_struct->type could be 0,1,2 (or whatever), fcntl() maps them
to fown_struct->enum_type or ->enum_type + ->task_only.


To clarify, I am not really sure this interface is better. Just for
discussion. Because it will be very painful to change this api later.
It is better to at least consider all options now.

Oleg.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 3/2 -v3] fcntl: F_[SG]ETOWN_EX
  2009-08-03 19:02                             ` Oleg Nesterov
@ 2009-08-04 11:39                               ` Peter Zijlstra
  2009-08-04 16:20                                 ` Oleg Nesterov
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2009-08-04 11:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

On Mon, 2009-08-03 at 21:02 +0200, Oleg Nesterov wrote:

> To clarify, I am not really sure this interface is better. Just for
> discussion. Because it will be very painful to change this api later.
> It is better to at least consider all options now.

You're completely right, how about the below?

---
Subject: fcntl: F_[SG]ETOWN_EX
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri, 31 Jul 2009 10:35:30 +0200

In order to direct the SIGIO signal to a particular thread of a
multi-threaded application we cannot, like suggested by the manpage, put
a TID into the regular fcntl(F_SETOWN) call. It will still be send to
the whole process of which that thread is part.

Since people do want to properly direct SIGIO we introduce F_SETOWN_EX.

The need to direct SIGIO comes from self-monitoring profiling such as
with perf-counters. Perf-counters uses SIGIO to notify that new sample
data is available. If the signal is delivered to the same task that
generated the new sample it can augment that data by inspecting the
task's user-space state right after it returns from the kernel. This
is esp. convenient for interpreted or virtual machine driven
environments.

Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex
as argument:

struct f_owner_ex {
	int   type;
	pid_t pid;
};

Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/alpha/include/asm/fcntl.h  |    2 
 arch/parisc/include/asm/fcntl.h |    2 
 fs/fcntl.c                      |   83 +++++++++++++++++++++++++++++++++++++---
 include/asm-generic/fcntl.h     |   13 ++++++
 4 files changed, 95 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/parisc/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/parisc/include/asm/fcntl.h
+++ linux-2.6/arch/parisc/include/asm/fcntl.h
@@ -28,6 +28,8 @@
 #define F_SETOWN	12	/*  for sockets. */
 #define F_SETSIG	13	/*  for sockets. */
 #define F_GETSIG	14	/*  for sockets. */
+#define F_GETOWN_EX	15
+#define F_SETOWN_EX	16
 
 /* for posix fcntl() and lockf() */
 #define F_RDLCK		01
Index: linux-2.6/fs/fcntl.c
===================================================================
--- linux-2.6.orig/fs/fcntl.c
+++ linux-2.6/fs/fcntl.c
@@ -263,6 +263,67 @@ pid_t f_getown(struct file *filp)
 	return pid;
 }
 
+static int f_setown_ex(struct file *filp, unsigned long arg)
+{
+	struct f_owner_ex * __user owner_p = (void * __user)arg;
+	struct f_owner_ex owner;
+	struct pid *pid;
+	int type;
+	int ret;
+
+	ret = copy_from_user(&owner, owner_p, sizeof(owner));
+	if (ret)
+		return ret;
+
+	switch (owner.type) {
+	case F_OWNER_TID:
+		type = PIDTYPE_MAX;
+		break;
+
+	case F_OWNER_PID:
+		type = PIDTYPE_PID;
+		break;
+
+	case F_OWNER_GID:
+		type = PIDTYPE_PGID;
+		break;
+	}
+
+	rcu_read_lock();
+	pid = find_vpid(owner.pid);
+	ret = __f_setown(filp, pid, type, 1);
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static int f_getown_ex(struct file *filp, unsigned long arg)
+{
+	struct f_owner_ex * __user owner_p = (void * __user)arg;
+	struct f_owner_ex owner;
+	int ret;
+
+	read_lock(&filp->f_owner.lock);
+	owner.pid = pid_vnr(filp->f_owner.pid);
+	switch ((int)filp->f_owner.pid_type) {
+	case PIDTYPE_MAX:
+		owner.type = F_OWNER_TID;
+		break;
+
+	case PIDTYPE_PID:
+		owner.type = F_OWNER_PID;
+		break;
+
+	case PIDTYPE_PGID:
+		owner.type = F_OWNER_GID;
+		break;
+	}
+	read_unlock(&filp->f_owner.lock);
+
+	ret = copy_to_user(owner_p, &owner, sizeof(owner));
+	return ret;
+}
+
 static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		struct file *filp)
 {
@@ -313,6 +374,12 @@ static long do_fcntl(int fd, unsigned in
 	case F_SETOWN:
 		err = f_setown(filp, arg, 1);
 		break;
+	case F_GETOWN_EX:
+		err = f_getown_ex(filp, arg);
+		break;
+	case F_SETOWN_EX:
+		err = f_setown_ex(filp, arg);
+		break;
 	case F_GETSIG:
 		err = filp->f_owner.signum;
 		break;
@@ -428,8 +495,7 @@ static inline int sigio_perm(struct task
 
 static void send_sigio_to_task(struct task_struct *p,
 			       struct fown_struct *fown,
-			       int fd,
-			       int reason)
+			       int fd, int reason, int group)
 {
 	/*
 	 * F_SETSIG can change ->signum lockless in parallel, make
@@ -461,11 +527,11 @@ static void send_sigio_to_task(struct ta
 			else
 				si.si_band = band_table[reason - POLL_IN];
 			si.si_fd    = fd;
-			if (!do_send_sig_info(signum, &si, p, true))
+			if (!do_send_sig_info(signum, &si, p, group))
 				break;
 		/* fall-through: fall back on the old plain SIGIO signal */
 		case 0:
-			do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, true);
+			do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, group);
 	}
 }
 
@@ -474,16 +540,23 @@ void send_sigio(struct fown_struct *fown
 	struct task_struct *p;
 	enum pid_type type;
 	struct pid *pid;
+	int group = 1;
 	
 	read_lock(&fown->lock);
+
 	type = fown->pid_type;
+	if (type == PIDTYPE_MAX) {
+		group = 0;
+		type = PIDTYPE_PID;
+	}
+
 	pid = fown->pid;
 	if (!pid)
 		goto out_unlock_fown;
 	
 	read_lock(&tasklist_lock);
 	do_each_pid_task(pid, type, p) {
-		send_sigio_to_task(p, fown, fd, band);
+		send_sigio_to_task(p, fown, fd, band, group);
 	} while_each_pid_task(pid, type, p);
 	read_unlock(&tasklist_lock);
  out_unlock_fown:
Index: linux-2.6/include/asm-generic/fcntl.h
===================================================================
--- linux-2.6.orig/include/asm-generic/fcntl.h
+++ linux-2.6/include/asm-generic/fcntl.h
@@ -73,6 +73,19 @@
 #define F_SETSIG	10	/* for sockets. */
 #define F_GETSIG	11	/* for sockets. */
 #endif
+#ifndef F_SETOWN_EX
+#define F_SETOWN_EX	12
+#define F_GETOWN_EX	13
+#endif
+
+#define F_OWNER_TID	0
+#define F_OWNER_PID	1
+#define F_OWNER_GID	2
+
+struct f_owner_ex {
+	int	type;
+	pid_t	pid;
+};
 
 /* for F_[GET|SET]FL */
 #define FD_CLOEXEC	1	/* actually anything with low bit set goes */
Index: linux-2.6/arch/alpha/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/alpha/include/asm/fcntl.h
+++ linux-2.6/arch/alpha/include/asm/fcntl.h
@@ -26,6 +26,8 @@
 #define F_GETOWN	6	/*  for sockets. */
 #define F_SETSIG	10	/*  for sockets. */
 #define F_GETSIG	11	/*  for sockets. */
+#define F_SETOWN_EX	12
+#define F_GETOWN_EX	13
 
 /* for posix fcntl() and lockf() */
 #define F_RDLCK		1



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: perf_counters issue with self-sampling threads
       [not found]   ` <7c86c4470907272213w2ee57080re50dd22a4d73a7e0@mail.gmail.com>
  2009-07-28  8:51     ` stephane eranian
@ 2009-08-04 16:09     ` stephane eranian
  1 sibling, 0 replies; 49+ messages in thread
From: stephane eranian @ 2009-08-04 16:09 UTC (permalink / raw)
  To: Andi Kleen
  Cc: perfmon2-devel, Maynard Johnson, Robert Richter, LKML,
	Thomas Gleixner, Paul Mackerras, Carl Love, Corey J Ashford,
	eranian, Andrew Morton, Peter Zijlstra, Philip Mucci,
	Dan Terpstra, Ingo Molnar

On Tue, Jul 28, 2009 at 7:13 AM, stephane eranian<eranian@googlemail.com> wrote:
> Andi,
>
> Looks like SIGPROF is calling _group_send_sig_info(), so I think it is
> subject to the same problem.
>
I did not look into SIGPROF a bit more.

First, SIGPROF is generated from ITIMER_PROF. My understanding is that this is
a global timer for the process. It may therefore fire in any thread.
Then SIGPROF
is pended to the shared signal queue as per the group_send_sig_info() code path.
That means the thread receiving the signal may not be the one in which the timer
expired. But typically things even out. But things change if the
monitored process
is using signal, and in particular signals pended to the private
signal queue which is
what happens with pthread_kill() I would think. Then, yes there may be
an imbalance.

But in the case of SIGPROF, it is not clear to me if you want to
change this behavior
as well. It all depends on the definition for ITIMER_PROF.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/2 -v3] fcntl: F_[SG]ETOWN_EX
  2009-08-04 11:39                               ` [PATCH 3/2 -v3] fcntl: F_[SG]ETOWN_EX Peter Zijlstra
@ 2009-08-04 16:20                                 ` Oleg Nesterov
  2009-08-04 16:52                                   ` Peter Zijlstra
  0 siblings, 1 reply; 49+ messages in thread
From: Oleg Nesterov @ 2009-08-04 16:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

On 08/04, Peter Zijlstra wrote:
>
> +static int f_setown_ex(struct file *filp, unsigned long arg)
> +{
> +	struct f_owner_ex * __user owner_p = (void * __user)arg;
> +	struct f_owner_ex owner;
> +	struct pid *pid;
> +	int type;
> +	int ret;
> +
> +	ret = copy_from_user(&owner, owner_p, sizeof(owner));
> +	if (ret)
> +		return ret;
> +
> +	switch (owner.type) {
> +	case F_OWNER_TID:
> +		type = PIDTYPE_MAX;
> +		break;
> +
> +	case F_OWNER_PID:
> +		type = PIDTYPE_PID;
> +		break;
> +
> +	case F_OWNER_GID:
> +		type = PIDTYPE_PGID;
> +		break;
> +	}

Note that send_sigio()->do_each_pid_task(type) must use the valid
type < PIDTYPE_MAX, or we can crash/etc.

This means f_setown_ex() should be careful with the wrong owner->type,
the switch() above needs

	default:
		return -EINVAL;

> +	rcu_read_lock();
> +	pid = find_vpid(owner.pid);
> +	ret = __f_setown(filp, pid, type, 1);
> +	rcu_read_unlock();
> +
> +	return ret;

Perhaps it makes sense to return -ESRCH if owner.pid && !pid, not
sure.

> @@ -474,16 +540,23 @@ void send_sigio(struct fown_struct *fown
>  	struct task_struct *p;
>  	enum pid_type type;
>  	struct pid *pid;
> +	int group = 1;
>  	
>  	read_lock(&fown->lock);
> +
>  	type = fown->pid_type;
> +	if (type == PIDTYPE_MAX) {
> +		group = 0;
> +		type = PIDTYPE_PID;
> +	}

And send_sigurg() needs the same change. I am not sure we should teach
send_sigurg_to_task() to handle the F_OWNER_TID, but we must ensure
send_sigurg()->do_each_pid_task() won't be called with PIDTYPE_MAX.


Otherwise, personally I think this is what we need to solve the problem.

Oleg.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/2 -v3] fcntl: F_[SG]ETOWN_EX
  2009-08-04 16:20                                 ` Oleg Nesterov
@ 2009-08-04 16:52                                   ` Peter Zijlstra
  2009-08-04 17:19                                     ` Oleg Nesterov
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2009-08-04 16:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

On Tue, 2009-08-04 at 18:20 +0200, Oleg Nesterov wrote:
> On 08/04, Peter Zijlstra wrote:
> >
> > +static int f_setown_ex(struct file *filp, unsigned long arg)
> > +{
> > +	struct f_owner_ex * __user owner_p = (void * __user)arg;
> > +	struct f_owner_ex owner;
> > +	struct pid *pid;
> > +	int type;
> > +	int ret;
> > +
> > +	ret = copy_from_user(&owner, owner_p, sizeof(owner));
> > +	if (ret)
> > +		return ret;
> > +
> > +	switch (owner.type) {
> > +	case F_OWNER_TID:
> > +		type = PIDTYPE_MAX;
> > +		break;
> > +
> > +	case F_OWNER_PID:
> > +		type = PIDTYPE_PID;
> > +		break;
> > +
> > +	case F_OWNER_GID:
> > +		type = PIDTYPE_PGID;
> > +		break;
> > +	}
> 
> Note that send_sigio()->do_each_pid_task(type) must use the valid
> type < PIDTYPE_MAX, or we can crash/etc.
> 
> This means f_setown_ex() should be careful with the wrong owner->type,
> the switch() above needs
> 
> 	default:
> 		return -EINVAL;

D'0h very true.

> > +	rcu_read_lock();
> > +	pid = find_vpid(owner.pid);
> > +	ret = __f_setown(filp, pid, type, 1);
> > +	rcu_read_unlock();
> > +
> > +	return ret;
> 
> Perhaps it makes sense to return -ESRCH if owner.pid && !pid, not
> sure.

We'd need that case to unset/clear the owner, so returning -ESRCH might
confuse users I think.

> > @@ -474,16 +540,23 @@ void send_sigio(struct fown_struct *fown
> >  	struct task_struct *p;
> >  	enum pid_type type;
> >  	struct pid *pid;
> > +	int group = 1;
> >  	
> >  	read_lock(&fown->lock);
> > +
> >  	type = fown->pid_type;
> > +	if (type == PIDTYPE_MAX) {
> > +		group = 0;
> > +		type = PIDTYPE_PID;
> > +	}
> 
> And send_sigurg() needs the same change. I am not sure we should teach
> send_sigurg_to_task() to handle the F_OWNER_TID, but we must ensure
> send_sigurg()->do_each_pid_task() won't be called with PIDTYPE_MAX.

How about the below delta, it changes send_sigurg_to_task() to also use
do_send_sig_info() which looses the check_kill_permission() check, but
your previous changes lost that same thing from SIGIO -- so I'm hoping
that's ok.

> Otherwise, personally I think this is what we need to solve the problem.

yay!

I'll look over the bits again and send out a -v4 later today.

----
Index: linux-2.6/fs/fcntl.c
===================================================================
--- linux-2.6.orig/fs/fcntl.c
+++ linux-2.6/fs/fcntl.c
@@ -287,6 +287,9 @@ static int f_setown_ex(struct file *filp
 	case F_OWNER_GID:
 		type = PIDTYPE_PGID;
 		break;
+
+	default:
+		return -EINVAL;
 	}
 
 	rcu_read_lock();
@@ -564,10 +567,10 @@ void send_sigio(struct fown_struct *fown
 }
 
 static void send_sigurg_to_task(struct task_struct *p,
-                                struct fown_struct *fown)
+                                struct fown_struct *fown, int group)
 {
 	if (sigio_perm(p, fown, SIGURG))
-		group_send_sig_info(SIGURG, SEND_SIG_PRIV, p);
+		do_send_sig_info(SIGURG, SEND_SIG_PRIV, p, group);
 }
 
 int send_sigurg(struct fown_struct *fown)
@@ -575,10 +578,17 @@ int send_sigurg(struct fown_struct *fown
 	struct task_struct *p;
 	enum pid_type type;
 	struct pid *pid;
+	int group = 1;
 	int ret = 0;
 	
 	read_lock(&fown->lock);
+
 	type = fown->pid_type;
+	if (type == PIDTYPE_MAX) {
+		group = 0;
+		type = PIDTYPE_PID;
+	}
+
 	pid = fown->pid;
 	if (!pid)
 		goto out_unlock_fown;
@@ -587,7 +597,7 @@ int send_sigurg(struct fown_struct *fown
 	
 	read_lock(&tasklist_lock);
 	do_each_pid_task(pid, type, p) {
-		send_sigurg_to_task(p, fown);
+		send_sigurg_to_task(p, fown, group);
 	} while_each_pid_task(pid, type, p);
 	read_unlock(&tasklist_lock);
  out_unlock_fown:




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/2 -v3] fcntl: F_[SG]ETOWN_EX
  2009-08-04 16:52                                   ` Peter Zijlstra
@ 2009-08-04 17:19                                     ` Oleg Nesterov
  2009-08-06 13:14                                       ` [PATCH 3/2 -v4] " Peter Zijlstra
  0 siblings, 1 reply; 49+ messages in thread
From: Oleg Nesterov @ 2009-08-04 17:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

On 08/04, Peter Zijlstra wrote:
>
> On Tue, 2009-08-04 at 18:20 +0200, Oleg Nesterov wrote:
>
> > > +	rcu_read_lock();
> > > +	pid = find_vpid(owner.pid);
> > > +	ret = __f_setown(filp, pid, type, 1);
> > > +	rcu_read_unlock();
> > > +
> > > +	return ret;
> >
> > Perhaps it makes sense to return -ESRCH if owner.pid && !pid, not
> > sure.
>
> We'd need that case to unset/clear the owner, so returning -ESRCH might
> confuse users I think.

Agreed. Perhaps we should do nothing but return -ESRCH if user passes
owner->pid != 0 and it is not valid.

But this is minor and can be tweaked later. (and to clarify again, not
that I really think we should do this, just a random thought).

> How about the below delta, it changes send_sigurg_to_task() to also use
> do_send_sig_info() which looses the check_kill_permission() check, but
> your previous changes lost that same thing from SIGIO -- so I'm hoping
> that's ok.

Yes, I think this is fine!

Oleg.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 3/2 -v4] fcntl: F_[SG]ETOWN_EX
  2009-08-04 17:19                                     ` Oleg Nesterov
@ 2009-08-06 13:14                                       ` Peter Zijlstra
  2009-08-06 19:05                                         ` Oleg Nesterov
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2009-08-06 13:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

Hopefully I got everything right this time ;-)

---
Subject: fcntl: F_[SG]ETOWN_EX
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri, 31 Jul 2009 10:35:30 +0200

In order to direct the SIGIO signal to a particular thread of a
multi-threaded application we cannot, like suggested by the manpage, put
a TID into the regular fcntl(F_SETOWN) call. It will still be send to
the whole process of which that thread is part.

Since people do want to properly direct SIGIO we introduce F_SETOWN_EX.

The need to direct SIGIO comes from self-monitoring profiling such as
with perf-counters. Perf-counters uses SIGIO to notify that new sample
data is available. If the signal is delivered to the same task that
generated the new sample it can augment that data by inspecting the
task's user-space state right after it returns from the kernel. This
is esp. convenient for interpreted or virtual machine driven
environments.

Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex
as argument:

struct f_owner_ex {
	int   type;
	pid_t pid;
};

Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/alpha/include/asm/fcntl.h  |    2 
 arch/parisc/include/asm/fcntl.h |    2 
 fs/fcntl.c                      |  108 +++++++++++++++++++++++++++++++++++++---
 include/asm-generic/fcntl.h     |   13 ++++
 4 files changed, 117 insertions(+), 8 deletions(-)

Index: linux-2.6/arch/parisc/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/parisc/include/asm/fcntl.h
+++ linux-2.6/arch/parisc/include/asm/fcntl.h
@@ -28,6 +28,8 @@
 #define F_SETOWN	12	/*  for sockets. */
 #define F_SETSIG	13	/*  for sockets. */
 #define F_GETSIG	14	/*  for sockets. */
+#define F_GETOWN_EX	15
+#define F_SETOWN_EX	16
 
 /* for posix fcntl() and lockf() */
 #define F_RDLCK		01
Index: linux-2.6/fs/fcntl.c
===================================================================
--- linux-2.6.orig/fs/fcntl.c
+++ linux-2.6/fs/fcntl.c
@@ -263,6 +263,79 @@ pid_t f_getown(struct file *filp)
 	return pid;
 }
 
+static int f_setown_ex(struct file *filp, unsigned long arg)
+{
+	struct f_owner_ex * __user owner_p = (void * __user)arg;
+	struct f_owner_ex owner;
+	struct pid *pid;
+	int type;
+	int ret;
+
+	ret = copy_from_user(&owner, owner_p, sizeof(owner));
+	if (ret)
+		return ret;
+
+	switch (owner.type) {
+	case F_OWNER_TID:
+		type = PIDTYPE_MAX;
+		break;
+
+	case F_OWNER_PID:
+		type = PIDTYPE_PID;
+		break;
+
+	case F_OWNER_GID:
+		type = PIDTYPE_PGID;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	rcu_read_lock();
+	pid = find_vpid(owner.pid);
+	if (owner.pid && !pid)
+		ret = -ESRCH;
+	else
+		ret = __f_setown(filp, pid, type, 1);
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static int f_getown_ex(struct file *filp, unsigned long arg)
+{
+	struct f_owner_ex * __user owner_p = (void * __user)arg;
+	struct f_owner_ex owner;
+	int ret = 0;
+
+	read_lock(&filp->f_owner.lock);
+	owner.pid = pid_vnr(filp->f_owner.pid);
+	switch (filp->f_owner.pid_type) {
+	case PIDTYPE_MAX:
+		owner.type = F_OWNER_TID;
+		break;
+
+	case PIDTYPE_PID:
+		owner.type = F_OWNER_PID;
+		break;
+
+	case PIDTYPE_PGID:
+		owner.type = F_OWNER_GID;
+		break;
+
+	default:
+		WARN_ON(1);
+		ret = -EINVAL;
+		break;
+	}
+	read_unlock(&filp->f_owner.lock);
+
+	if (!ret)
+		ret = copy_to_user(owner_p, &owner, sizeof(owner));
+	return ret;
+}
+
 static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		struct file *filp)
 {
@@ -313,6 +386,12 @@ static long do_fcntl(int fd, unsigned in
 	case F_SETOWN:
 		err = f_setown(filp, arg, 1);
 		break;
+	case F_GETOWN_EX:
+		err = f_getown_ex(filp, arg);
+		break;
+	case F_SETOWN_EX:
+		err = f_setown_ex(filp, arg);
+		break;
 	case F_GETSIG:
 		err = filp->f_owner.signum;
 		break;
@@ -428,8 +507,7 @@ static inline int sigio_perm(struct task
 
 static void send_sigio_to_task(struct task_struct *p,
 			       struct fown_struct *fown,
-			       int fd,
-			       int reason)
+			       int fd, int reason, int group)
 {
 	/*
 	 * F_SETSIG can change ->signum lockless in parallel, make
@@ -461,11 +539,11 @@ static void send_sigio_to_task(struct ta
 			else
 				si.si_band = band_table[reason - POLL_IN];
 			si.si_fd    = fd;
-			if (!do_send_sig_info(signum, &si, p, true))
+			if (!do_send_sig_info(signum, &si, p, group))
 				break;
 		/* fall-through: fall back on the old plain SIGIO signal */
 		case 0:
-			do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, true);
+			do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, group);
 	}
 }
 
@@ -474,16 +552,23 @@ void send_sigio(struct fown_struct *fown
 	struct task_struct *p;
 	enum pid_type type;
 	struct pid *pid;
+	int group = 1;
 	
 	read_lock(&fown->lock);
+
 	type = fown->pid_type;
+	if (type == PIDTYPE_MAX) {
+		group = 0;
+		type = PIDTYPE_PID;
+	}
+
 	pid = fown->pid;
 	if (!pid)
 		goto out_unlock_fown;
 	
 	read_lock(&tasklist_lock);
 	do_each_pid_task(pid, type, p) {
-		send_sigio_to_task(p, fown, fd, band);
+		send_sigio_to_task(p, fown, fd, band, group);
 	} while_each_pid_task(pid, type, p);
 	read_unlock(&tasklist_lock);
  out_unlock_fown:
@@ -491,10 +576,10 @@ void send_sigio(struct fown_struct *fown
 }
 
 static void send_sigurg_to_task(struct task_struct *p,
-                                struct fown_struct *fown)
+				struct fown_struct *fown, int group)
 {
 	if (sigio_perm(p, fown, SIGURG))
-		group_send_sig_info(SIGURG, SEND_SIG_PRIV, p);
+		do_send_sig_info(SIGURG, SEND_SIG_PRIV, p, group);
 }
 
 int send_sigurg(struct fown_struct *fown)
@@ -502,10 +587,17 @@ int send_sigurg(struct fown_struct *fown
 	struct task_struct *p;
 	enum pid_type type;
 	struct pid *pid;
+	int group = 1;
 	int ret = 0;
 	
 	read_lock(&fown->lock);
+
 	type = fown->pid_type;
+	if (type == PIDTYPE_MAX) {
+		group = 0;
+		type = PIDTYPE_PID;
+	}
+
 	pid = fown->pid;
 	if (!pid)
 		goto out_unlock_fown;
@@ -514,7 +606,7 @@ int send_sigurg(struct fown_struct *fown
 	
 	read_lock(&tasklist_lock);
 	do_each_pid_task(pid, type, p) {
-		send_sigurg_to_task(p, fown);
+		send_sigurg_to_task(p, fown, group);
 	} while_each_pid_task(pid, type, p);
 	read_unlock(&tasklist_lock);
  out_unlock_fown:
Index: linux-2.6/include/asm-generic/fcntl.h
===================================================================
--- linux-2.6.orig/include/asm-generic/fcntl.h
+++ linux-2.6/include/asm-generic/fcntl.h
@@ -73,6 +73,19 @@
 #define F_SETSIG	10	/* for sockets. */
 #define F_GETSIG	11	/* for sockets. */
 #endif
+#ifndef F_SETOWN_EX
+#define F_SETOWN_EX	12
+#define F_GETOWN_EX	13
+#endif
+
+#define F_OWNER_TID	0
+#define F_OWNER_PID	1
+#define F_OWNER_GID	2
+
+struct f_owner_ex {
+	int	type;
+	pid_t	pid;
+};
 
 /* for F_[GET|SET]FL */
 #define FD_CLOEXEC	1	/* actually anything with low bit set goes */
Index: linux-2.6/arch/alpha/include/asm/fcntl.h
===================================================================
--- linux-2.6.orig/arch/alpha/include/asm/fcntl.h
+++ linux-2.6/arch/alpha/include/asm/fcntl.h
@@ -26,6 +26,8 @@
 #define F_GETOWN	6	/*  for sockets. */
 #define F_SETSIG	10	/*  for sockets. */
 #define F_GETSIG	11	/*  for sockets. */
+#define F_SETOWN_EX	12
+#define F_GETOWN_EX	13
 
 /* for posix fcntl() and lockf() */
 #define F_RDLCK		1



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/2 -v4] fcntl: F_[SG]ETOWN_EX
  2009-08-06 13:14                                       ` [PATCH 3/2 -v4] " Peter Zijlstra
@ 2009-08-06 19:05                                         ` Oleg Nesterov
  2009-08-07 12:10                                           ` stephane eranian
  0 siblings, 1 reply; 49+ messages in thread
From: Oleg Nesterov @ 2009-08-06 19:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, eranian, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

On 08/06, Peter Zijlstra wrote:
>
> Subject: fcntl: F_[SG]ETOWN_EX
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Fri, 31 Jul 2009 10:35:30 +0200
>
> In order to direct the SIGIO signal to a particular thread of a
> multi-threaded application we cannot, like suggested by the manpage, put
> a TID into the regular fcntl(F_SETOWN) call. It will still be send to
> the whole process of which that thread is part.
>
> Since people do want to properly direct SIGIO we introduce F_SETOWN_EX.
>
> The need to direct SIGIO comes from self-monitoring profiling such as
> with perf-counters. Perf-counters uses SIGIO to notify that new sample
> data is available. If the signal is delivered to the same task that
> generated the new sample it can augment that data by inspecting the
> task's user-space state right after it returns from the kernel. This
> is esp. convenient for interpreted or virtual machine driven
> environments.
>
> Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex
> as argument:
>
> struct f_owner_ex {
> 	int   type;
> 	pid_t pid;
> };
>
> Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID.

I think the patch is right.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>

> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  arch/alpha/include/asm/fcntl.h  |    2 
>  arch/parisc/include/asm/fcntl.h |    2 
>  fs/fcntl.c                      |  108 +++++++++++++++++++++++++++++++++++++---
>  include/asm-generic/fcntl.h     |   13 ++++
>  4 files changed, 117 insertions(+), 8 deletions(-)
> 
> Index: linux-2.6/arch/parisc/include/asm/fcntl.h
> ===================================================================
> --- linux-2.6.orig/arch/parisc/include/asm/fcntl.h
> +++ linux-2.6/arch/parisc/include/asm/fcntl.h
> @@ -28,6 +28,8 @@
>  #define F_SETOWN	12	/*  for sockets. */
>  #define F_SETSIG	13	/*  for sockets. */
>  #define F_GETSIG	14	/*  for sockets. */
> +#define F_GETOWN_EX	15
> +#define F_SETOWN_EX	16
>  
>  /* for posix fcntl() and lockf() */
>  #define F_RDLCK		01
> Index: linux-2.6/fs/fcntl.c
> ===================================================================
> --- linux-2.6.orig/fs/fcntl.c
> +++ linux-2.6/fs/fcntl.c
> @@ -263,6 +263,79 @@ pid_t f_getown(struct file *filp)
>  	return pid;
>  }
>  
> +static int f_setown_ex(struct file *filp, unsigned long arg)
> +{
> +	struct f_owner_ex * __user owner_p = (void * __user)arg;
> +	struct f_owner_ex owner;
> +	struct pid *pid;
> +	int type;
> +	int ret;
> +
> +	ret = copy_from_user(&owner, owner_p, sizeof(owner));
> +	if (ret)
> +		return ret;
> +
> +	switch (owner.type) {
> +	case F_OWNER_TID:
> +		type = PIDTYPE_MAX;
> +		break;
> +
> +	case F_OWNER_PID:
> +		type = PIDTYPE_PID;
> +		break;
> +
> +	case F_OWNER_GID:
> +		type = PIDTYPE_PGID;
> +		break;
> +
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	rcu_read_lock();
> +	pid = find_vpid(owner.pid);
> +	if (owner.pid && !pid)
> +		ret = -ESRCH;
> +	else
> +		ret = __f_setown(filp, pid, type, 1);
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> +
> +static int f_getown_ex(struct file *filp, unsigned long arg)
> +{
> +	struct f_owner_ex * __user owner_p = (void * __user)arg;
> +	struct f_owner_ex owner;
> +	int ret = 0;
> +
> +	read_lock(&filp->f_owner.lock);
> +	owner.pid = pid_vnr(filp->f_owner.pid);
> +	switch (filp->f_owner.pid_type) {
> +	case PIDTYPE_MAX:
> +		owner.type = F_OWNER_TID;
> +		break;
> +
> +	case PIDTYPE_PID:
> +		owner.type = F_OWNER_PID;
> +		break;
> +
> +	case PIDTYPE_PGID:
> +		owner.type = F_OWNER_GID;
> +		break;
> +
> +	default:
> +		WARN_ON(1);
> +		ret = -EINVAL;
> +		break;
> +	}
> +	read_unlock(&filp->f_owner.lock);
> +
> +	if (!ret)
> +		ret = copy_to_user(owner_p, &owner, sizeof(owner));
> +	return ret;
> +}
> +
>  static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
>  		struct file *filp)
>  {
> @@ -313,6 +386,12 @@ static long do_fcntl(int fd, unsigned in
>  	case F_SETOWN:
>  		err = f_setown(filp, arg, 1);
>  		break;
> +	case F_GETOWN_EX:
> +		err = f_getown_ex(filp, arg);
> +		break;
> +	case F_SETOWN_EX:
> +		err = f_setown_ex(filp, arg);
> +		break;
>  	case F_GETSIG:
>  		err = filp->f_owner.signum;
>  		break;
> @@ -428,8 +507,7 @@ static inline int sigio_perm(struct task
>  
>  static void send_sigio_to_task(struct task_struct *p,
>  			       struct fown_struct *fown,
> -			       int fd,
> -			       int reason)
> +			       int fd, int reason, int group)
>  {
>  	/*
>  	 * F_SETSIG can change ->signum lockless in parallel, make
> @@ -461,11 +539,11 @@ static void send_sigio_to_task(struct ta
>  			else
>  				si.si_band = band_table[reason - POLL_IN];
>  			si.si_fd    = fd;
> -			if (!do_send_sig_info(signum, &si, p, true))
> +			if (!do_send_sig_info(signum, &si, p, group))
>  				break;
>  		/* fall-through: fall back on the old plain SIGIO signal */
>  		case 0:
> -			do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, true);
> +			do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, group);
>  	}
>  }
>  
> @@ -474,16 +552,23 @@ void send_sigio(struct fown_struct *fown
>  	struct task_struct *p;
>  	enum pid_type type;
>  	struct pid *pid;
> +	int group = 1;
>  	
>  	read_lock(&fown->lock);
> +
>  	type = fown->pid_type;
> +	if (type == PIDTYPE_MAX) {
> +		group = 0;
> +		type = PIDTYPE_PID;
> +	}
> +
>  	pid = fown->pid;
>  	if (!pid)
>  		goto out_unlock_fown;
>  	
>  	read_lock(&tasklist_lock);
>  	do_each_pid_task(pid, type, p) {
> -		send_sigio_to_task(p, fown, fd, band);
> +		send_sigio_to_task(p, fown, fd, band, group);
>  	} while_each_pid_task(pid, type, p);
>  	read_unlock(&tasklist_lock);
>   out_unlock_fown:
> @@ -491,10 +576,10 @@ void send_sigio(struct fown_struct *fown
>  }
>  
>  static void send_sigurg_to_task(struct task_struct *p,
> -                                struct fown_struct *fown)
> +				struct fown_struct *fown, int group)
>  {
>  	if (sigio_perm(p, fown, SIGURG))
> -		group_send_sig_info(SIGURG, SEND_SIG_PRIV, p);
> +		do_send_sig_info(SIGURG, SEND_SIG_PRIV, p, group);
>  }
>  
>  int send_sigurg(struct fown_struct *fown)
> @@ -502,10 +587,17 @@ int send_sigurg(struct fown_struct *fown
>  	struct task_struct *p;
>  	enum pid_type type;
>  	struct pid *pid;
> +	int group = 1;
>  	int ret = 0;
>  	
>  	read_lock(&fown->lock);
> +
>  	type = fown->pid_type;
> +	if (type == PIDTYPE_MAX) {
> +		group = 0;
> +		type = PIDTYPE_PID;
> +	}
> +
>  	pid = fown->pid;
>  	if (!pid)
>  		goto out_unlock_fown;
> @@ -514,7 +606,7 @@ int send_sigurg(struct fown_struct *fown
>  	
>  	read_lock(&tasklist_lock);
>  	do_each_pid_task(pid, type, p) {
> -		send_sigurg_to_task(p, fown);
> +		send_sigurg_to_task(p, fown, group);
>  	} while_each_pid_task(pid, type, p);
>  	read_unlock(&tasklist_lock);
>   out_unlock_fown:
> Index: linux-2.6/include/asm-generic/fcntl.h
> ===================================================================
> --- linux-2.6.orig/include/asm-generic/fcntl.h
> +++ linux-2.6/include/asm-generic/fcntl.h
> @@ -73,6 +73,19 @@
>  #define F_SETSIG	10	/* for sockets. */
>  #define F_GETSIG	11	/* for sockets. */
>  #endif
> +#ifndef F_SETOWN_EX
> +#define F_SETOWN_EX	12
> +#define F_GETOWN_EX	13
> +#endif
> +
> +#define F_OWNER_TID	0
> +#define F_OWNER_PID	1
> +#define F_OWNER_GID	2
> +
> +struct f_owner_ex {
> +	int	type;
> +	pid_t	pid;
> +};
>  
>  /* for F_[GET|SET]FL */
>  #define FD_CLOEXEC	1	/* actually anything with low bit set goes */
> Index: linux-2.6/arch/alpha/include/asm/fcntl.h
> ===================================================================
> --- linux-2.6.orig/arch/alpha/include/asm/fcntl.h
> +++ linux-2.6/arch/alpha/include/asm/fcntl.h
> @@ -26,6 +26,8 @@
>  #define F_GETOWN	6	/*  for sockets. */
>  #define F_SETSIG	10	/*  for sockets. */
>  #define F_GETSIG	11	/*  for sockets. */
> +#define F_SETOWN_EX	12
> +#define F_GETOWN_EX	13
>  
>  /* for posix fcntl() and lockf() */
>  #define F_RDLCK		1
> 
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 3/2 -v4] fcntl: F_[SG]ETOWN_EX
  2009-08-06 19:05                                         ` Oleg Nesterov
@ 2009-08-07 12:10                                           ` stephane eranian
  0 siblings, 0 replies; 49+ messages in thread
From: stephane eranian @ 2009-08-07 12:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Andrew Morton, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

HI,

On Thu, Aug 6, 2009 at 9:05 PM, Oleg Nesterov<oleg@redhat.com> wrote:
> On 08/06, Peter Zijlstra wrote:
>>
>> Subject: fcntl: F_[SG]ETOWN_EX
>> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> Date: Fri, 31 Jul 2009 10:35:30 +0200
>>
>> In order to direct the SIGIO signal to a particular thread of a
>> multi-threaded application we cannot, like suggested by the manpage, put
>> a TID into the regular fcntl(F_SETOWN) call. It will still be send to
>> the whole process of which that thread is part.
>>
>> Since people do want to properly direct SIGIO we introduce F_SETOWN_EX.
>>
>> The need to direct SIGIO comes from self-monitoring profiling such as
>> with perf-counters. Perf-counters uses SIGIO to notify that new sample
>> data is available. If the signal is delivered to the same task that
>> generated the new sample it can augment that data by inspecting the
>> task's user-space state right after it returns from the kernel. This
>> is esp. convenient for interpreted or virtual machine driven
>> environments.
>>
>> Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex
>> as argument:
>>
>> struct f_owner_ex {
>>       int   type;
>>       pid_t pid;
>> };
>>
>> Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID.
>
> I think the patch is right.
>
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
>

I have tested the patch in 2.6.30 (backport) + perfmon and it seems to
work in my test case.
Have not tried with perfcounters + 2.6.31.
I am glad there is finally a solution to this problem.
Thanks.

>> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> ---
>>  arch/alpha/include/asm/fcntl.h  |    2
>>  arch/parisc/include/asm/fcntl.h |    2
>>  fs/fcntl.c                      |  108 +++++++++++++++++++++++++++++++++++++---
>>  include/asm-generic/fcntl.h     |   13 ++++
>>  4 files changed, 117 insertions(+), 8 deletions(-)
>>
>> Index: linux-2.6/arch/parisc/include/asm/fcntl.h
>> ===================================================================
>> --- linux-2.6.orig/arch/parisc/include/asm/fcntl.h
>> +++ linux-2.6/arch/parisc/include/asm/fcntl.h
>> @@ -28,6 +28,8 @@
>>  #define F_SETOWN     12      /*  for sockets. */
>>  #define F_SETSIG     13      /*  for sockets. */
>>  #define F_GETSIG     14      /*  for sockets. */
>> +#define F_GETOWN_EX  15
>> +#define F_SETOWN_EX  16
>>
>>  /* for posix fcntl() and lockf() */
>>  #define F_RDLCK              01
>> Index: linux-2.6/fs/fcntl.c
>> ===================================================================
>> --- linux-2.6.orig/fs/fcntl.c
>> +++ linux-2.6/fs/fcntl.c
>> @@ -263,6 +263,79 @@ pid_t f_getown(struct file *filp)
>>       return pid;
>>  }
>>
>> +static int f_setown_ex(struct file *filp, unsigned long arg)
>> +{
>> +     struct f_owner_ex * __user owner_p = (void * __user)arg;
>> +     struct f_owner_ex owner;
>> +     struct pid *pid;
>> +     int type;
>> +     int ret;
>> +
>> +     ret = copy_from_user(&owner, owner_p, sizeof(owner));
>> +     if (ret)
>> +             return ret;
>> +
>> +     switch (owner.type) {
>> +     case F_OWNER_TID:
>> +             type = PIDTYPE_MAX;
>> +             break;
>> +
>> +     case F_OWNER_PID:
>> +             type = PIDTYPE_PID;
>> +             break;
>> +
>> +     case F_OWNER_GID:
>> +             type = PIDTYPE_PGID;
>> +             break;
>> +
>> +     default:
>> +             return -EINVAL;
>> +     }
>> +
>> +     rcu_read_lock();
>> +     pid = find_vpid(owner.pid);
>> +     if (owner.pid && !pid)
>> +             ret = -ESRCH;
>> +     else
>> +             ret = __f_setown(filp, pid, type, 1);
>> +     rcu_read_unlock();
>> +
>> +     return ret;
>> +}
>> +
>> +static int f_getown_ex(struct file *filp, unsigned long arg)
>> +{
>> +     struct f_owner_ex * __user owner_p = (void * __user)arg;
>> +     struct f_owner_ex owner;
>> +     int ret = 0;
>> +
>> +     read_lock(&filp->f_owner.lock);
>> +     owner.pid = pid_vnr(filp->f_owner.pid);
>> +     switch (filp->f_owner.pid_type) {
>> +     case PIDTYPE_MAX:
>> +             owner.type = F_OWNER_TID;
>> +             break;
>> +
>> +     case PIDTYPE_PID:
>> +             owner.type = F_OWNER_PID;
>> +             break;
>> +
>> +     case PIDTYPE_PGID:
>> +             owner.type = F_OWNER_GID;
>> +             break;
>> +
>> +     default:
>> +             WARN_ON(1);
>> +             ret = -EINVAL;
>> +             break;
>> +     }
>> +     read_unlock(&filp->f_owner.lock);
>> +
>> +     if (!ret)
>> +             ret = copy_to_user(owner_p, &owner, sizeof(owner));
>> +     return ret;
>> +}
>> +
>>  static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
>>               struct file *filp)
>>  {
>> @@ -313,6 +386,12 @@ static long do_fcntl(int fd, unsigned in
>>       case F_SETOWN:
>>               err = f_setown(filp, arg, 1);
>>               break;
>> +     case F_GETOWN_EX:
>> +             err = f_getown_ex(filp, arg);
>> +             break;
>> +     case F_SETOWN_EX:
>> +             err = f_setown_ex(filp, arg);
>> +             break;
>>       case F_GETSIG:
>>               err = filp->f_owner.signum;
>>               break;
>> @@ -428,8 +507,7 @@ static inline int sigio_perm(struct task
>>
>>  static void send_sigio_to_task(struct task_struct *p,
>>                              struct fown_struct *fown,
>> -                            int fd,
>> -                            int reason)
>> +                            int fd, int reason, int group)
>>  {
>>       /*
>>        * F_SETSIG can change ->signum lockless in parallel, make
>> @@ -461,11 +539,11 @@ static void send_sigio_to_task(struct ta
>>                       else
>>                               si.si_band = band_table[reason - POLL_IN];
>>                       si.si_fd    = fd;
>> -                     if (!do_send_sig_info(signum, &si, p, true))
>> +                     if (!do_send_sig_info(signum, &si, p, group))
>>                               break;
>>               /* fall-through: fall back on the old plain SIGIO signal */
>>               case 0:
>> -                     do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, true);
>> +                     do_send_sig_info(SIGIO, SEND_SIG_PRIV, p, group);
>>       }
>>  }
>>
>> @@ -474,16 +552,23 @@ void send_sigio(struct fown_struct *fown
>>       struct task_struct *p;
>>       enum pid_type type;
>>       struct pid *pid;
>> +     int group = 1;
>>
>>       read_lock(&fown->lock);
>> +
>>       type = fown->pid_type;
>> +     if (type == PIDTYPE_MAX) {
>> +             group = 0;
>> +             type = PIDTYPE_PID;
>> +     }
>> +
>>       pid = fown->pid;
>>       if (!pid)
>>               goto out_unlock_fown;
>>
>>       read_lock(&tasklist_lock);
>>       do_each_pid_task(pid, type, p) {
>> -             send_sigio_to_task(p, fown, fd, band);
>> +             send_sigio_to_task(p, fown, fd, band, group);
>>       } while_each_pid_task(pid, type, p);
>>       read_unlock(&tasklist_lock);
>>   out_unlock_fown:
>> @@ -491,10 +576,10 @@ void send_sigio(struct fown_struct *fown
>>  }
>>
>>  static void send_sigurg_to_task(struct task_struct *p,
>> -                                struct fown_struct *fown)
>> +                             struct fown_struct *fown, int group)
>>  {
>>       if (sigio_perm(p, fown, SIGURG))
>> -             group_send_sig_info(SIGURG, SEND_SIG_PRIV, p);
>> +             do_send_sig_info(SIGURG, SEND_SIG_PRIV, p, group);
>>  }
>>
>>  int send_sigurg(struct fown_struct *fown)
>> @@ -502,10 +587,17 @@ int send_sigurg(struct fown_struct *fown
>>       struct task_struct *p;
>>       enum pid_type type;
>>       struct pid *pid;
>> +     int group = 1;
>>       int ret = 0;
>>
>>       read_lock(&fown->lock);
>> +
>>       type = fown->pid_type;
>> +     if (type == PIDTYPE_MAX) {
>> +             group = 0;
>> +             type = PIDTYPE_PID;
>> +     }
>> +
>>       pid = fown->pid;
>>       if (!pid)
>>               goto out_unlock_fown;
>> @@ -514,7 +606,7 @@ int send_sigurg(struct fown_struct *fown
>>
>>       read_lock(&tasklist_lock);
>>       do_each_pid_task(pid, type, p) {
>> -             send_sigurg_to_task(p, fown);
>> +             send_sigurg_to_task(p, fown, group);
>>       } while_each_pid_task(pid, type, p);
>>       read_unlock(&tasklist_lock);
>>   out_unlock_fown:
>> Index: linux-2.6/include/asm-generic/fcntl.h
>> ===================================================================
>> --- linux-2.6.orig/include/asm-generic/fcntl.h
>> +++ linux-2.6/include/asm-generic/fcntl.h
>> @@ -73,6 +73,19 @@
>>  #define F_SETSIG     10      /* for sockets. */
>>  #define F_GETSIG     11      /* for sockets. */
>>  #endif
>> +#ifndef F_SETOWN_EX
>> +#define F_SETOWN_EX  12
>> +#define F_GETOWN_EX  13
>> +#endif
>> +
>> +#define F_OWNER_TID  0
>> +#define F_OWNER_PID  1
>> +#define F_OWNER_GID  2
>> +
>> +struct f_owner_ex {
>> +     int     type;
>> +     pid_t   pid;
>> +};
>>
>>  /* for F_[GET|SET]FL */
>>  #define FD_CLOEXEC   1       /* actually anything with low bit set goes */
>> Index: linux-2.6/arch/alpha/include/asm/fcntl.h
>> ===================================================================
>> --- linux-2.6.orig/arch/alpha/include/asm/fcntl.h
>> +++ linux-2.6/arch/alpha/include/asm/fcntl.h
>> @@ -26,6 +26,8 @@
>>  #define F_GETOWN     6       /*  for sockets. */
>>  #define F_SETSIG     10      /*  for sockets. */
>>  #define F_GETSIG     11      /*  for sockets. */
>> +#define F_SETOWN_EX  12
>> +#define F_GETOWN_EX  13
>>
>>  /* for posix fcntl() and lockf() */
>>  #define F_RDLCK              1
>>
>>
>
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* F_SETOWN_TID: F_SETOWN was thread-specific for a while
  2009-08-03 12:53                 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID stephane eranian
@ 2009-08-09  5:46                   ` Jamie Lokier
  2009-08-10 12:22                     ` stephane eranian
  0 siblings, 1 reply; 49+ messages in thread
From: Jamie Lokier @ 2009-08-09  5:46 UTC (permalink / raw)
  To: eranian
  Cc: Andrew Morton, Peter Zijlstra, oleg, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

stephane eranian wrote:
> As Peter found out, the man page seems to indicate that if you specify a TID
> instead a PID with F_SETOWN, then the kernel should interpret this as meaning
> this particular thread, but this is not what is implemented right now.

The behaviour was changed quietly in kernel 2.6.12.

I wrote that part of the man page, and it was true at the time.  SIGIO
signals _were_ thread-specific.

Starting in kernel 2.5.60, SIGIO signals were thread-specific when
F_SETSIG was used to enable queued siginfo.

Prompted by this F_SETOWN_TID patch, I checked through old kernel
patches, and found that signals set by F_SIGSIG were changed to
process-wide in 2.6.12:

    @@ -437,7 +438,7 @@ static void send_sigio_to_task(struct ta
			    else
				    si.si_band = band_table[reason - POLL_IN];
			    si.si_fd    = fd;
    -                       if (!send_sig_info(fown->signum, &si, p))
    +                       if (!send_group_sig_info(fown->signum, &si, p))
				    break;
		    /* fall-through: fall back on the old plain SIGIO signal */
		    case 0:

That's a bit annoying, because it breaks a library which uses queued
I/O signals as an I/O event mechanism when used with multiple threads.
(Fortunatelly epoll is available nowadays; it does I/O events much
better, but sometimes you want SIGIO from epoll's descriptor...)

So the man page is now incorrect, but if F_SETOWN_TID is added and has
the behaviour which F_SETOWN used to have, then the text can be
shuffled around.

If anyone is interested, I have a fairly detailed test program which
checks queued "SIGIO" (F_SETSIG) signals due to I/O on a socket,
including SIGURG and SIGIO on overflow, and checks whether each one is
delivered properly and is thread-specific.  I found quite a few Glibc
and kernel bugs with it in the past.

-- Jamie

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while
  2009-08-09  5:46                   ` F_SETOWN_TID: F_SETOWN was thread-specific for a while Jamie Lokier
@ 2009-08-10 12:22                     ` stephane eranian
  2009-08-10 17:03                       ` Oleg Nesterov
  0 siblings, 1 reply; 49+ messages in thread
From: stephane eranian @ 2009-08-10 12:22 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Andrew Morton, Peter Zijlstra, oleg, mingo, linux-kernel, tglx,
	robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

Jamie,

On Sun, Aug 9, 2009 at 7:46 AM, Jamie Lokier<jamie@shareable.org> wrote:
> stephane eranian wrote:
>> As Peter found out, the man page seems to indicate that if you specify a TID
>> instead a PID with F_SETOWN, then the kernel should interpret this as meaning
>> this particular thread, but this is not what is implemented right now.
>
> The behaviour was changed quietly in kernel 2.6.12.
>
> I wrote that part of the man page, and it was true at the time.  SIGIO
> signals _were_ thread-specific.
>

I knew at some point this worked because with perfmon we managed
to get signals delivered to the right thread. But I guess it was just
until 2.6.12 ;-<

> Starting in kernel 2.5.60, SIGIO signals were thread-specific when
> F_SETSIG was used to enable queued siginfo.
>
Thank you for bringing up F_SETSIG. I use it in some of my test programs.
I thought it was related to the shared vs. private queue. But I looked at it
again, and in fact it is related to another yet important issue when
sampling.

You must use F_SETSIG on SIGIO if you want your signal handler to
receive the file descriptor in siginfo. This is useful if you want to perform
some actions on the descriptor. That is the case in perfmon and this is
the case in certain situations with perfcounters as well.

Setting SA_SIGINFO provides siginfo, but the si_fd field is NOT set
correctly without F_SETSIG. I have verified this with perfcounters, and
this is indeed the case.

This behavior seems kind of odd to me. I think it may be a "lucky" side
effect of the intended behavior of F_SETSIG. It would be good to clarify
this point more.

> Prompted by this F_SETOWN_TID patch, I checked through old kernel
> patches, and found that signals set by F_SIGSIG were changed to
> process-wide in 2.6.12:
>
>    @@ -437,7 +438,7 @@ static void send_sigio_to_task(struct ta
>                            else
>                                    si.si_band = band_table[reason - POLL_IN];
>                            si.si_fd    = fd;
>    -                       if (!send_sig_info(fown->signum, &si, p))
>    +                       if (!send_group_sig_info(fown->signum, &si, p))
>                                    break;
>                    /* fall-through: fall back on the old plain SIGIO signal */
>                    case 0:
>
> That's a bit annoying, because it breaks a library which uses queued
> I/O signals as an I/O event mechanism when used with multiple threads.
> (Fortunatelly epoll is available nowadays; it does I/O events much
> better, but sometimes you want SIGIO from epoll's descriptor...)
>
> So the man page is now incorrect, but if F_SETOWN_TID is added and has
> the behaviour which F_SETOWN used to have, then the text can be
> shuffled around.
>
> If anyone is interested, I have a fairly detailed test program which
> checks queued "SIGIO" (F_SETSIG) signals due to I/O on a socket,
> including SIGURG and SIGIO on overflow, and checks whether each one is
> delivered properly and is thread-specific.  I found quite a few Glibc
> and kernel bugs with it in the past.
>
> -- Jamie
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while
  2009-08-10 12:22                     ` stephane eranian
@ 2009-08-10 17:03                       ` Oleg Nesterov
  2009-08-10 21:01                         ` stephane eranian
  2009-08-11 13:10                         ` Jamie Lokier
  0 siblings, 2 replies; 49+ messages in thread
From: Oleg Nesterov @ 2009-08-10 17:03 UTC (permalink / raw)
  To: stephane eranian
  Cc: Jamie Lokier, Andrew Morton, Peter Zijlstra, mingo, linux-kernel,
	tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

On 08/10, stephane eranian wrote:
>
> You must use F_SETSIG on SIGIO if you want your signal handler to
> receive the file descriptor in siginfo. This is useful if you want to perform
> some actions on the descriptor. That is the case in perfmon and this is
> the case in certain situations with perfcounters as well.
>
> Setting SA_SIGINFO provides siginfo, but the si_fd field is NOT set
> correctly without F_SETSIG. I have verified this with perfcounters, and
> this is indeed the case.
>
> This behavior seems kind of odd to me.

Agreed, this looks a bit odd. But at least this is documented. From
man 2 fcntl:

	By using F_SETSIG with a nonzero value, and setting SA_SIGINFO for the
	signal handler (see sigac- tion(2)), extra information about I/O events
	is passed to the handler in a  siginfo_t  structure.  If  the si_code
	field indicates the source is SI_SIGIO, the si_fd field gives the file
	descriptor associated with the event.  Otherwise, there is no indication
	which file descriptors are pending,

Not sure if it is safe to change the historical behaviour.

(the manpage is not exactly right though, and the comment in send_sigio_to_task()
 is not right too: SI_SIGIO (and, btw, SI_QUEUE/SI_DETHREAD) is never used).

Oleg.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while
  2009-08-10 17:03                       ` Oleg Nesterov
@ 2009-08-10 21:01                         ` stephane eranian
  2009-08-17 17:16                           ` Oleg Nesterov
  2009-08-11 13:10                         ` Jamie Lokier
  1 sibling, 1 reply; 49+ messages in thread
From: stephane eranian @ 2009-08-10 21:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jamie Lokier, Andrew Morton, Peter Zijlstra, mingo, linux-kernel,
	tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

On Mon, Aug 10, 2009 at 7:03 PM, Oleg Nesterov<oleg@redhat.com> wrote:
> On 08/10, stephane eranian wrote:
>>
>> You must use F_SETSIG on SIGIO if you want your signal handler to
>> receive the file descriptor in siginfo. This is useful if you want to perform
>> some actions on the descriptor. That is the case in perfmon and this is
>> the case in certain situations with perfcounters as well.
>>
>> Setting SA_SIGINFO provides siginfo, but the si_fd field is NOT set
>> correctly without F_SETSIG. I have verified this with perfcounters, and
>> this is indeed the case.
>>
>> This behavior seems kind of odd to me.
>
> Agreed, this looks a bit odd. But at least this is documented. From
> man 2 fcntl:
>
>        By using F_SETSIG with a nonzero value, and setting SA_SIGINFO for the
>        signal handler (see sigac- tion(2)), extra information about I/O events
>        is passed to the handler in a  siginfo_t  structure.  If  the si_code
>        field indicates the source is SI_SIGIO, the si_fd field gives the file
>        descriptor associated with the event.  Otherwise, there is no indication
>        which file descriptors are pending,
>
> Not sure if it is safe to change the historical behaviour.
>
Don't need to change it.
But for SIGIO, if you see SA_SIGINFO, then pass the si_fd.
>From Jamie, it seems F_SETSIG was not added just for this file descriptor trick.
The initial goal seems orthogonal to the file descriptor passing in si_fd.

> (the manpage is not exactly right though, and the comment in send_sigio_to_task()
>  is not right too: SI_SIGIO (and, btw, SI_QUEUE/SI_DETHREAD) is never used).
>
> Oleg.
>
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while
  2009-08-10 17:03                       ` Oleg Nesterov
  2009-08-10 21:01                         ` stephane eranian
@ 2009-08-11 13:10                         ` Jamie Lokier
  2009-08-17 17:05                           ` Oleg Nesterov
  1 sibling, 1 reply; 49+ messages in thread
From: Jamie Lokier @ 2009-08-11 13:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: stephane eranian, Andrew Morton, Peter Zijlstra, mingo,
	linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel,
	cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland

Oleg Nesterov wrote:
> Agreed, this looks a bit odd. But at least this is documented. From
> man 2 fcntl:
>
> 	By using F_SETSIG with a nonzero value, and setting SA_SIGINFO
> 	for the signal handler (see sigaction(2)), extra information
> 	about I/O events is passed to the handler in a siginfo_t
> 	structure.  If the si_code field indicates the source is
> 	SI_SIGIO, the si_fd field gives the file descriptor associated
> 	with the event.  Otherwise, there is no indication which file
> 	descriptors are pending,
> 
> Not sure if it is safe to change the historical behaviour.

The change in 2.6.12 breaks some code of mine, which uses RT queued
I/O signals on multiple threads but as far as I know it's not used
anywhere now.

In the <= 2.4 era, there were lots of web servers and benchmarks using
queued I/O signals for scalable event-driven I/O, but I don't know of
any implementation who dared do it with multiple threads, except mine.

It was regarded as "beware ye who enter here" territory, which I can
attest to from the long time it took to get it right and the multitude
of kernel bugs and version changes needing to be worked around.

Since 2.6, everyone uses epoll which is much better, except that
occasionally SIGIO comes in handy when an async notification is
required.

So the change in 2.6.12 does break something that probably isn't much
used, but it's too late now.  Occasionally thread-specific SIGIO (or
F_SETSIG) is useful; F_SETOWN_TID makes that nice and clear.

I would drop the pseudo-"bug compatible" behaviour of using negative
tid to mean pid; that's pointless.  I'd also make F_GETOWN return an
error when F_SETOWN_TID has been used, and F_GETOWN_TID return an
error when F_SETOWN has been used.

> (the manpage is not exactly right though, and the comment in
> send_sigio_to_task() is not right too: SI_SIGIO (and, btw,
> SI_QUEUE/SI_DETHREAD) is never used).

Ah, there's another historical change you see.  It was changed in
2.3.21 from SI_SIGIO to POLL_xxx, and si_band started being set at the
same time.  The man page could be updated to reflect that.

(My portable-to-ancient-Linux code checks for si_code == SI_SIGIO, in
which case it has the descriptor but doesn't know what type of event
(pre 2.3.21) so adds it to a poll() set, or checks for si_code ==
POLL_xxx, in which case it ignores the si_code value completely and
looks at si_band for the set of pending events because some patch that
was never mainlined could result in multiple si_band bits set).

-- Jamie

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while
  2009-08-11 13:10                         ` Jamie Lokier
@ 2009-08-17 17:05                           ` Oleg Nesterov
  0 siblings, 0 replies; 49+ messages in thread
From: Oleg Nesterov @ 2009-08-17 17:05 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: stephane eranian, Andrew Morton, Peter Zijlstra, mingo,
	linux-kernel, tglx, robert.richter, paulus, andi, mpjohn, cel,
	cjashfor, mucci, terpstra, perfmon2-devel, mtk.manpages, roland

Sorry for late reply.

And I am a bit confused.

On 08/11, Jamie Lokier wrote:
>
> Oleg Nesterov wrote:
> > Agreed, this looks a bit odd. But at least this is documented. From
> > man 2 fcntl:
> >
> > 	By using F_SETSIG with a nonzero value, and setting SA_SIGINFO
> > 	for the signal handler (see sigaction(2)), extra information
> > 	about I/O events is passed to the handler in a siginfo_t
> > 	structure.  If the si_code field indicates the source is
> > 	SI_SIGIO, the si_fd field gives the file descriptor associated
> > 	with the event.  Otherwise, there is no indication which file
> > 	descriptors are pending,
> >
> > Not sure if it is safe to change the historical behaviour.
>
> The change in 2.6.12 breaks some code of mine, which uses RT queued
> I/O signals on multiple threads but as far as I know it's not used
> anywhere now.
>
> In the <= 2.4 era, there were lots of web servers and benchmarks using
> queued I/O signals for scalable event-driven I/O, but I don't know of
> any implementation who dared do it with multiple threads, except mine.
>
> It was regarded as "beware ye who enter here" territory, which I can
> attest to from the long time it took to get it right and the multitude
> of kernel bugs and version changes needing to be worked around.
>
> Since 2.6, everyone uses epoll which is much better, except that
> occasionally SIGIO comes in handy when an async notification is
> required.
>
> So the change in 2.6.12 does break something that probably isn't much
> used, but it's too late now.

So, you seem to agree we should not change this odd behaviour?

> Occasionally thread-specific SIGIO (or
> F_SETSIG) is useful; F_SETOWN_TID makes that nice and clear.

Great. If you agree with F_SETOWN_TID, could you look at the next
Peter's patch

	"[PATCH 3/2 -v4] fcntl: F_[SG]ETOWN_EX"
	http://marc.info/?l=linux-kernel&m=124956452125468

and ack it?

> I would drop the pseudo-"bug compatible" behaviour of using negative
> tid to mean pid; that's pointless.

done,

> I'd also make F_GETOWN return an
> error when F_SETOWN_TID has been used,

This is not trivial, F_GETOWN can't return the error. A negative
result means PIDTYPE_PGID.

> and F_GETOWN_TID return an
> error when F_SETOWN has been used.

F_GETOWN_EX does this even better.

Oleg.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while
  2009-08-10 21:01                         ` stephane eranian
@ 2009-08-17 17:16                           ` Oleg Nesterov
  2009-08-17 17:40                             ` Oleg Nesterov
  2009-08-17 22:26                             ` stephane eranian
  0 siblings, 2 replies; 49+ messages in thread
From: Oleg Nesterov @ 2009-08-17 17:16 UTC (permalink / raw)
  To: stephane eranian
  Cc: Jamie Lokier, Andrew Morton, Peter Zijlstra, mingo, linux-kernel,
	tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

Sorry for late reply...

On 08/10, stephane eranian wrote:
>
> On Mon, Aug 10, 2009 at 7:03 PM, Oleg Nesterov<oleg@redhat.com> wrote:
> > On 08/10, stephane eranian wrote:
> >>
> >> You must use F_SETSIG on SIGIO if you want your signal handler to
> >> receive the file descriptor in siginfo. This is useful if you want to perform
> >> some actions on the descriptor. That is the case in perfmon and this is
> >> the case in certain situations with perfcounters as well.
> >>
> >> Setting SA_SIGINFO provides siginfo, but the si_fd field is NOT set
> >> correctly without F_SETSIG. I have verified this with perfcounters, and
> >> this is indeed the case.
> >>
> >> This behavior seems kind of odd to me.
> >
> > Agreed, this looks a bit odd. But at least this is documented. From
> > man 2 fcntl:
> >
> >        By using F_SETSIG with a nonzero value, and setting SA_SIGINFO for the
> >        signal handler (see sigac- tion(2)), extra information about I/O events
> >        is passed to the handler in a  siginfo_t  structure.  If  the si_code
> >        field indicates the source is SI_SIGIO, the si_fd field gives the file
> >        descriptor associated with the event.  Otherwise, there is no indication
> >        which file descriptors are pending,
> >
> > Not sure if it is safe to change the historical behaviour.
> >
> Don't need to change it.

Good,

> But for SIGIO, if you see SA_SIGINFO, then pass the si_fd.

But this means we do change the behaviour ;) Confused.


In any case. We should not look at SA_SIGINFO at all. If sys_sigaction() was
called without SA_SIGINFO, then it doesn'matter if we send SEND_SIG_PRIV or
siginfo_t with the correct si_fd/etc.

And again, this is even documented. The change is trivial but user-space
visible, it may confuse the (stupid) app which uses SIGIO + SA_SIGINFO
without F_SETSIG.

Oleg.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while
  2009-08-17 17:16                           ` Oleg Nesterov
@ 2009-08-17 17:40                             ` Oleg Nesterov
  2009-08-17 22:26                             ` stephane eranian
  1 sibling, 0 replies; 49+ messages in thread
From: Oleg Nesterov @ 2009-08-17 17:40 UTC (permalink / raw)
  To: stephane eranian
  Cc: Jamie Lokier, Andrew Morton, Peter Zijlstra, mingo, linux-kernel,
	tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

Forgot to show the patch,

On 08/17, Oleg Nesterov wrote:
>
> And again, this is even documented. The change is trivial but user-space
> visible, it may confuse the (stupid) app which uses SIGIO + SA_SIGINFO
> without F_SETSIG.

Oleg.

Personally I do not really think this change is good idea. (and in any
case it should be re-diffed on top of Peter's OWN_EX patch).

Btw. _in theory_, "case 0" is not right wrt security_file_send_sigiotask(sig).
I think we shouldn't worry.

--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -431,6 +431,7 @@ static void send_sigio_to_task(struct ta
 			       int fd,
 			       int reason)
 {
+	siginfo_t si;
 	/*
 	 * F_SETSIG can change ->signum lockless in parallel, make
 	 * sure we read it once and use the same value throughout.
@@ -439,33 +440,33 @@ static void send_sigio_to_task(struct ta
 
 	if (!sigio_perm(p, fown, signum))
 		return;
+	/* Queue a rt signal with the appropriate fd as its
+	   value.  We use SI_SIGIO as the source, not 
+	   SI_KERNEL, since kernel signals always get 
+	   delivered even if we can't queue.  Failure to
+	   queue in this case _should_ be reported; we fall
+	   back to SIGIO in that case. --sct */
+	si.si_errno = 0;
+	si.si_fd    = fd;
+	si.si_code  = reason;
+	/* Make sure we are called with one of the POLL_*
+	   reasons, otherwise we could leak kernel stack into
+	   userspace.  */
+	BUG_ON((reason & __SI_MASK) != __SI_POLL);
+	if (reason - POLL_IN >= NSIGPOLL)
+		si.si_band  = ~0L;
+	else
+		si.si_band = band_table[reason - POLL_IN];
 
 	switch (signum) {
-		siginfo_t si;
 		default:
-			/* Queue a rt signal with the appropriate fd as its
-			   value.  We use SI_SIGIO as the source, not 
-			   SI_KERNEL, since kernel signals always get 
-			   delivered even if we can't queue.  Failure to
-			   queue in this case _should_ be reported; we fall
-			   back to SIGIO in that case. --sct */
 			si.si_signo = signum;
-			si.si_errno = 0;
-		        si.si_code  = reason;
-			/* Make sure we are called with one of the POLL_*
-			   reasons, otherwise we could leak kernel stack into
-			   userspace.  */
-			BUG_ON((reason & __SI_MASK) != __SI_POLL);
-			if (reason - POLL_IN >= NSIGPOLL)
-				si.si_band  = ~0L;
-			else
-				si.si_band = band_table[reason - POLL_IN];
-			si.si_fd    = fd;
 			if (!group_send_sig_info(signum, &si, p))
 				break;
 		/* fall-through: fall back on the old plain SIGIO signal */
 		case 0:
-			group_send_sig_info(SIGIO, SEND_SIG_PRIV, p);
+			si.si_signo = SIGIO;
+			group_send_sig_info(SIGIO, &si, p);
 	}
 }
 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while
  2009-08-17 17:16                           ` Oleg Nesterov
  2009-08-17 17:40                             ` Oleg Nesterov
@ 2009-08-17 22:26                             ` stephane eranian
  2009-08-18 11:45                               ` Oleg Nesterov
  1 sibling, 1 reply; 49+ messages in thread
From: stephane eranian @ 2009-08-17 22:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jamie Lokier, Andrew Morton, Peter Zijlstra, mingo, linux-kernel,
	tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

Oleg,

On Mon, Aug 17, 2009 at 7:16 PM, Oleg Nesterov<oleg@redhat.com> wrote:
> Sorry for late reply...
>
> On 08/10, stephane eranian wrote:
>>
>> On Mon, Aug 10, 2009 at 7:03 PM, Oleg Nesterov<oleg@redhat.com> wrote:
>> > On 08/10, stephane eranian wrote:
>> >>
>> >> You must use F_SETSIG on SIGIO if you want your signal handler to
>> >> receive the file descriptor in siginfo. This is useful if you want to perform
>> >> some actions on the descriptor. That is the case in perfmon and this is
>> >> the case in certain situations with perfcounters as well.
>> >>
>> >> Setting SA_SIGINFO provides siginfo, but the si_fd field is NOT set
>> >> correctly without F_SETSIG. I have verified this with perfcounters, and
>> >> this is indeed the case.
>> >>
>> >> This behavior seems kind of odd to me.
>> >
>> > Agreed, this looks a bit odd. But at least this is documented. From
>> > man 2 fcntl:
>> >
>> >        By using F_SETSIG with a nonzero value, and setting SA_SIGINFO for the
>> >        signal handler (see sigac- tion(2)), extra information about I/O events
>> >        is passed to the handler in a  siginfo_t  structure.  If  the si_code
>> >        field indicates the source is SI_SIGIO, the si_fd field gives the file
>> >        descriptor associated with the event.  Otherwise, there is no indication
>> >        which file descriptors are pending,
>> >
>> > Not sure if it is safe to change the historical behaviour.
>> >
>> Don't need to change it.
>
> Good,
>
>> But for SIGIO, if you see SA_SIGINFO, then pass the si_fd.
>
> But this means we do change the behaviour ;) Confused.
>
I meant do not remove F_SETSIG and its side-effect on si_fd.

>
> In any case. We should not look at SA_SIGINFO at all. If sys_sigaction() was
> called without SA_SIGINFO, then it doesn'matter if we send SEND_SIG_PRIV or
> siginfo_t with the correct si_fd/etc.
>
What's the official role of SA_SIGINFO? Pass a siginfo struct?

Does POSIX describe the rules governing the content of si_fd?
Or is si_fd a Linux-ony extension in which case it goes with F_SETSIG.

> And again, this is even documented. The change is trivial but user-space
> visible, it may confuse the (stupid) app which uses SIGIO + SA_SIGINFO
> without F_SETSIG.
>
That would be an app that uses SIGINFO and fiddles with si_fd which has no
defined content. What kind of app would that be?

It would seem natural that in the siginfo passed to the handler of SIGIO, the
file descriptor be passed by default. That is all I am trying to say here.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while
  2009-08-17 22:26                             ` stephane eranian
@ 2009-08-18 11:45                               ` Oleg Nesterov
  2009-08-20 10:00                                 ` stephane eranian
  0 siblings, 1 reply; 49+ messages in thread
From: Oleg Nesterov @ 2009-08-18 11:45 UTC (permalink / raw)
  To: stephane eranian
  Cc: Jamie Lokier, Andrew Morton, Peter Zijlstra, mingo, linux-kernel,
	tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

On 08/18, stephane eranian wrote:
>
> On Mon, Aug 17, 2009 at 7:16 PM, Oleg Nesterov<oleg@redhat.com> wrote:
> > Sorry for late reply...
> >
> > On 08/10, stephane eranian wrote:
> >>
> >> On Mon, Aug 10, 2009 at 7:03 PM, Oleg Nesterov<oleg@redhat.com> wrote:
> >> >
> >> > Not sure if it is safe to change the historical behaviour.
> >> >
> >> Don't need to change it.
> >
> > Good,
> >
> >> But for SIGIO, if you see SA_SIGINFO, then pass the si_fd.
> >
> > But this means we do change the behaviour ;) Confused.
> >
> I meant do not remove F_SETSIG and its side-effect on si_fd.

Ah, now I see what you meant.

> > In any case. We should not look at SA_SIGINFO at all. If sys_sigaction() was
> > called without SA_SIGINFO, then it doesn'matter if we send SEND_SIG_PRIV or
> > siginfo_t with the correct si_fd/etc.
> >
> What's the official role of SA_SIGINFO? Pass a siginfo struct?
>
> Does POSIX describe the rules governing the content of si_fd?
> Or is si_fd a Linux-ony extension in which case it goes with F_SETSIG.

Not sure I understand your concern...

OK. You suggest to pass siginfo_t with .si_fd/etc when we detect SA_SIGINFO.

But, in that case we can _always_ pass siginfo_t, regardless of SA_SIGINFO.
If the task has a signal handler and sigaction() was called without
SA_SIGINFO, then the handler must not look into *info (the second arg of
sigaction->sa_sigaction). And in fact, __setup_rt_frame() doesn't even
copy the info to the user-space:

	if (ka->sa.sa_flags & SA_SIGINFO) {
		if (copy_siginfo_to_user(&frame->info, info))
			return -EFAULT;
	}

OK? Or I missed something?

> > And again, this is even documented. The change is trivial but user-space
> > visible, it may confuse the (stupid) app which uses SIGIO + SA_SIGINFO
> > without F_SETSIG.
> >
> That would be an app that uses SIGINFO and fiddles with si_fd which has no
> defined content. What kind of app would that be?

The stupid app. But it is always unsafe to make the user-visible changes
without good reasons. Even when we fix the bug (and the current code is not
buggy) sometimes we have "this patch breaks my app or test-case!" reports.
If nothing else, we can break the test-case which simply does

	void sigio_handler(int sig, siginfo_t *info, void *u)
	{
		assert(info->si_code == 0 && info->si_code = 0x80);
	}

Once again: this is _documented_ !

And we can't set .si_fd = fd whithout changing .si_code, this will break
copy_siginfo_to_user().

Or. Suppose that some app does:

	void io_handler(int sig, siginfo_t *info, void *u)
	{
		if ((info->si_code & __SI_MASK) != SI_POLL) {
			// RT signal failed! sig MUST be == SIGIO
			recover();
		} else {
			do_something(info->si_fd);
		}
	}

	int main(void)
	{
		sigaction(SIGRTMIN, { SA_SIGINFO, io_handler });
		sigaction(SIGIO,    { SA_SIGINFO, io_handler });
		...
	}

This is correct. But if we change the current behaviour, this app won't
be able to detect the overflow.

> It would seem natural that in the siginfo passed to the handler of SIGIO, the
> file descriptor be passed by default. That is all I am trying to say here.

Completely agreed! I was always puzzled by send_sigio_to_task(). I was never
able to understand why it looks so strange.

So, I think it should be

	static void send_sigio_to_task(struct task_struct *p,
				       struct fown_struct *fown,
				       int fd,
				       int reason)
	{
		siginfo_t si;
		/*
		 * F_SETSIG can change ->signum lockless in parallel, make
		 * sure we read it once and use the same value throughout.
		 */
		int signum = ACCESS_ONCE(fown->signum) ?: SIGIO;

		if (!sigio_perm(p, fown, signum))
			return;

		si.si_signo = signum;
		si.si_errno = 0;
		si.si_code  = reason;
		si.si_fd    = fd;
		/* Make sure we are called with one of the POLL_*
		   reasons, otherwise we could leak kernel stack into
		   userspace.  */
		BUG_ON((reason & __SI_MASK) != __SI_POLL);
		if (reason - POLL_IN >= NSIGPOLL)
			si.si_band  = ~0L;
		else
			si.si_band = band_table[reason - POLL_IN];

		/* Failure to queue an rt signal must be reported as SIGIO */
		if (!group_send_sig_info(signum, &si, p))
			group_send_sig_info(SIGIO, SEND_SIG_PRIV, p);
	}

(except it should be on top of fcntl-add-f_etown_ex.patch).
This way, at least we don't break the "detect RT signal failed" above.

What do you think?

But let me repeat: I just can't convince myself we have a good reason
to change the strange, but carefully documented behaviour.

In case you agree with the code above, I can send the patch. But only
if I have a "good" changelog from you + your Signed-of-by in advance ;)

Otherwise, please feel free to send this/similar change yourself.

Oleg.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: F_SETOWN_TID: F_SETOWN was thread-specific for a while
  2009-08-18 11:45                               ` Oleg Nesterov
@ 2009-08-20 10:00                                 ` stephane eranian
  0 siblings, 0 replies; 49+ messages in thread
From: stephane eranian @ 2009-08-20 10:00 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jamie Lokier, Andrew Morton, Peter Zijlstra, mingo, linux-kernel,
	tglx, robert.richter, paulus, andi, mpjohn, cel, cjashfor, mucci,
	terpstra, perfmon2-devel, mtk.manpages, roland

Oleg,

On Tue, Aug 18, 2009 at 1:45 PM, Oleg Nesterov<oleg@redhat.com> wrote:
> On 08/18, stephane eranian wrote:
>> > In any case. We should not look at SA_SIGINFO at all. If sys_sigaction() was
>> > called without SA_SIGINFO, then it doesn'matter if we send SEND_SIG_PRIV or
>> > siginfo_t with the correct si_fd/etc.
>> >
>> What's the official role of SA_SIGINFO? Pass a siginfo struct?
>>
>> Does POSIX describe the rules governing the content of si_fd?
>> Or is si_fd a Linux-ony extension in which case it goes with F_SETSIG.
>
> Not sure I understand your concern...
>
> OK. You suggest to pass siginfo_t with .si_fd/etc when we detect SA_SIGINFO.
>
The reason I refer to SA_SIGINFO is simply because of the excerpt from the
sigaction man page:

       If  SA_SIGINFO  is specified in sa_flags, then sa_sigaction (instead of
       sa_handler) specifies the signal-handling function  for  signum.   This
       function receives the signal number as its first argument, a pointer to
       a siginfo_t as its second argument and a pointer to a ucontext_t  (cast
       to void *) as its third argument.

In other words, I use this to emphasize the fact that to get a siginfo
struct, you need to pass SA_SIGINFO and use sa_sigaction instead of
sa_handler. That's all I am saying.


> But, in that case we can _always_ pass siginfo_t, regardless of SA_SIGINFO.
> If the task has a signal handler and sigaction() was called without
> SA_SIGINFO, then the handler must not look into *info (the second arg of
> sigaction->sa_sigaction). And in fact, __setup_rt_frame() doesn't even
> copy the info to the user-space:
>
>        if (ka->sa.sa_flags & SA_SIGINFO) {
>                if (copy_siginfo_to_user(&frame->info, info))
>                        return -EFAULT;
>        }
>
> OK? Or I missed something?
>
No, I think we are in agreement. To get meaningful siginfo use SA_SIGINFO.

> Or. Suppose that some app does:
>
>       void io_handler(int sig, siginfo_t *info, void *u)
>       {
>               if ((info->si_code & __SI_MASK) != SI_POLL) {
>                      // RT signal failed! sig MUST be == SIGIO
>                       recover();
>               } else {
>                      do_something(info->si_fd);
>               }
>       }
>
>       int main(void)
>       {
>               sigaction(SIGRTMIN, { SA_SIGINFO, io_handler });
>               sigaction(SIGIO,    { SA_SIGINFO, io_handler });
>               ...
>       }
>
I don't think you can check si_code without first checking which
signal it is in si_signo. The values for si_code overlap between
the different struct in siginfo->_sifields.

>> It would seem natural that in the siginfo passed to the handler of SIGIO, the
>> file descriptor be passed by default. That is all I am trying to say here.
>
> Completely agreed! I was always puzzled by send_sigio_to_task(). I was never
> able to understand why it looks so strange.
>
> So, I think it should be
>
>        static void send_sigio_to_task(struct task_struct *p,
>                                       struct fown_struct *fown,
>                                       int fd,
>                                       int reason)
>        {
>                siginfo_t si;
>                /*
>                 * F_SETSIG can change ->signum lockless in parallel, make
>                 * sure we read it once and use the same value throughout.
>                 */
>                int signum = ACCESS_ONCE(fown->signum) ?: SIGIO;
>
>                if (!sigio_perm(p, fown, signum))
>                        return;
>
>                si.si_signo = signum;
>                si.si_errno = 0;
>                si.si_code  = reason;
>                si.si_fd    = fd;
>                /* Make sure we are called with one of the POLL_*
>                   reasons, otherwise we could leak kernel stack into
>                   userspace.  */
>                BUG_ON((reason & __SI_MASK) != __SI_POLL);
>                if (reason - POLL_IN >= NSIGPOLL)
>                        si.si_band  = ~0L;
>                else
>                        si.si_band = band_table[reason - POLL_IN];
>
>                /* Failure to queue an rt signal must be reported as SIGIO */
>                if (!group_send_sig_info(signum, &si, p))
>                        group_send_sig_info(SIGIO, SEND_SIG_PRIV, p);
>        }
>
> (except it should be on top of fcntl-add-f_etown_ex.patch).
> This way, at least we don't break the "detect RT signal failed" above.
>
> What do you think?
>
> But let me repeat: I just can't convince myself we have a good reason
> to change the strange, but carefully documented behaviour.
>
I agree with you. Given the existing documentation in the man page
of fcntl() and the various code examples. I think it is possible for
developers to figure out how to get the si_fd in the handler. This
problem is not specific to perf_counters nor perfmon. Any SIGIO-based
program may be interested in getting si_fd, therefore I am assuming the
solution is well understood at this point.

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2009-08-20 10:00 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-07-27 16:51 perf_counters issue with self-sampling threads stephane eranian
2009-07-27 16:56 ` Peter Zijlstra
2009-07-27 21:25 ` Andi Kleen
     [not found]   ` <7c86c4470907272213w2ee57080re50dd22a4d73a7e0@mail.gmail.com>
2009-07-28  8:51     ` stephane eranian
2009-07-28  8:56       ` Andi Kleen
2009-07-28  9:13         ` stephane eranian
2009-08-04 16:09     ` stephane eranian
2009-07-29 12:19 ` Peter Zijlstra
2009-07-29 12:37   ` stephane eranian
2009-07-29 12:46     ` Peter Zijlstra
2009-07-29 22:17   ` Oleg Nesterov
2009-07-30 11:31     ` Peter Zijlstra
2009-07-30 19:20       ` Oleg Nesterov
2009-07-30 20:00         ` Peter Zijlstra
2009-07-30 20:28           ` Oleg Nesterov
2009-07-30 21:09             ` stephane eranian
2009-07-31  8:35             ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra
2009-07-31 14:01               ` stephane eranian
2009-07-31 20:52               ` Oleg Nesterov
2009-07-31 21:11               ` Andrew Morton
2009-08-01  1:27                 ` [PATCH 0/2] send_sigio/do_send_sig_info (Was: [RFC][PATCH] fcntl: F_[SG]ETOWN_TID) Oleg Nesterov
2009-08-03 15:48                   ` [PATCH 3/2] fcntl: F_[SG]ETOWN_TID Peter Zijlstra
2009-08-03 17:16                     ` Oleg Nesterov
2009-08-03 17:47                       ` Peter Zijlstra
2009-08-03 18:06                         ` Oleg Nesterov
2009-08-03 18:36                           ` Peter Zijlstra
2009-08-03 19:02                             ` Oleg Nesterov
2009-08-04 11:39                               ` [PATCH 3/2 -v3] fcntl: F_[SG]ETOWN_EX Peter Zijlstra
2009-08-04 16:20                                 ` Oleg Nesterov
2009-08-04 16:52                                   ` Peter Zijlstra
2009-08-04 17:19                                     ` Oleg Nesterov
2009-08-06 13:14                                       ` [PATCH 3/2 -v4] " Peter Zijlstra
2009-08-06 19:05                                         ` Oleg Nesterov
2009-08-07 12:10                                           ` stephane eranian
2009-08-01  1:28                 ` [PATCH 1/2] signals: introduce do_send_sig_info() helper Oleg Nesterov
2009-08-01  1:28                 ` [PATCH 2/2] signals: send_sigio: use do_send_sig_info() to avoid check_kill_permission() Oleg Nesterov
2009-08-03 12:53                 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID stephane eranian
2009-08-09  5:46                   ` F_SETOWN_TID: F_SETOWN was thread-specific for a while Jamie Lokier
2009-08-10 12:22                     ` stephane eranian
2009-08-10 17:03                       ` Oleg Nesterov
2009-08-10 21:01                         ` stephane eranian
2009-08-17 17:16                           ` Oleg Nesterov
2009-08-17 17:40                             ` Oleg Nesterov
2009-08-17 22:26                             ` stephane eranian
2009-08-18 11:45                               ` Oleg Nesterov
2009-08-20 10:00                                 ` stephane eranian
2009-08-11 13:10                         ` Jamie Lokier
2009-08-17 17:05                           ` Oleg Nesterov
2009-08-03 15:21                 ` [RFC][PATCH] fcntl: F_[SG]ETOWN_TID Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.