linux-kernel.vger.kernel.org archive mirror
* Scheduler: SIGSTOP on multi threaded processes
@ 2005-05-04 17:37 Olivier Croquette
  2005-05-04 18:16 ` Richard B. Johnson
  2005-05-04 19:10 ` Alexander Nyberg
  0 siblings, 2 replies; 19+ messages in thread
From: Olivier Croquette @ 2005-05-04 17:37 UTC (permalink / raw)
  To: LKML

Hello

On a 2.6.11 x86 system, I am SIGSTOP'ing processes which have started 
several threads before.

As expected, all threads are suspended.

But surprisingly, it can happen that some threads are still scheduled 
after the SIGSTOP has been issued.

Typically, they get scheduled 2 times within the next 5ms, before being 
really stopped.

Sadly, I could not reproduce that in a smaller example yet.

As this behaviour is IMO against the SIGSTOP concept, I tried to analyze 
the kernel code responsible for that. I could not really find the exact 
lines.

So here are my questions:

1. do you know any reason for which the SIGSTOP would not stop 
immediately all threads of a process?

2. where do the threads get suspended exactly in the kernel? I think it 
is in signal.c but I am not sure exactly where.

3. can you confirm that the bug MUST be in my code? :)

Thanks!

Best regards

Olivier


* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-04 17:37 Scheduler: SIGSTOP on multi threaded processes Olivier Croquette
@ 2005-05-04 18:16 ` Richard B. Johnson
  2005-05-04 19:16   ` Daniel Jacobowitz
  2005-05-05  1:04   ` Andy Isaacson
  2005-05-04 19:10 ` Alexander Nyberg
  1 sibling, 2 replies; 19+ messages in thread
From: Richard B. Johnson @ 2005-05-04 18:16 UTC (permalink / raw)
  To: Olivier Croquette; +Cc: LKML

On Wed, 4 May 2005, Olivier Croquette wrote:

> Hello
>
> On a 2.6.11 x86 system, I am SIGSTOP'ing processes which have started
> several threads before.
>
> As expected, all threads are suspended.
>
> But surprisingly, it can happen that some threads are still scheduled
> after the SIGSTOP has been issued.
>
> Typically, they get scheduled 2 times within the next 5ms, before being
> really stopped.
>
> Sadly, I could not reproduce that in a smaller example yet.
>
> As this behaviour is IMO against the SIGSTOP concept, I tried to analyze
> the kernel code responsible for that. I could not really find the exact
> lines.
>
> So here are my questions:
>
> 1. do you know any reason for which the SIGSTOP would not stop
> immediately all threads of a process?
>
> 2. where do the threads get suspended exactly in the kernel? I think it
> is in signal.c but I am not sure exactly where.
>
> 3. can you confirm that the bug MUST be in my code? :)
>
> Thanks!
>
> Best regards
>
> Olivier


The kernel doesn't do SIGSTOP or SIGCONT. Within init, there is
a SIGSTOP and SIGCONT handler. These can be inherited by others
unless changed, perhaps by a 'C' runtime library. Basically,
the SIGSTOP handler executes pause() until the SIGCONT signal
is received.

Any delay in stopping is the time necessary for the signal to
be delivered. It is possible that the section of code that
contains the STOP/CONT handler was paged out and needs to be
paged in before the signal can be delivered.

You might quicken this up by installing your own handler for
SIGSTOP and SIGCONT....

static int stp;

static void contsig(int sig)	// SIGCONT handler
{
    stp = 0;
}

static void stopsig(int sig)  // SIGSTOP handler
{
     stp = 1;
     while(stp)
         pause();
}

Put this near the code that will be executing most of the time.



Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by Dictator Bush.
                  98.36% of all statistics are fiction.


* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-04 17:37 Scheduler: SIGSTOP on multi threaded processes Olivier Croquette
  2005-05-04 18:16 ` Richard B. Johnson
@ 2005-05-04 19:10 ` Alexander Nyberg
  1 sibling, 0 replies; 19+ messages in thread
From: Alexander Nyberg @ 2005-05-04 19:10 UTC (permalink / raw)
  To: Olivier Croquette; +Cc: LKML

> On a 2.6.11 x86 system, I am SIGSTOP'ing processes which have started 
> several threads before.
> 
> As expected, all threads are suspended.
> 
> But surprisingly, it can happen that some threads are still scheduled 
> after the SIGSTOP has been issued.
> 
> Typically, they get scheduled 2 times within the next 5ms, before being 
> really stopped.
> 
> Sadly, I could not reproduce that in a smaller example yet.
> 
> As this behaviour is IMO against the SIGSTOP concept, I tried to analyze 
> the kernel code responsible for that. I could not really find the exact 
> lines.
> 
> So here are my questions:
> 
> 1. do you know any reason for which the SIGSTOP would not stop 
> immediately all threads of a process?

The following scenario is possible:
program1 with a thread thread1

1) you send SIGSTOP to program1
2) thread1 is now scheduled and run.
3) program1 is now run, and before it is scheduled off it notices it has
a signal set and makes sure all threads in the group get SIGSTOP set.
4) thread1 is now scheduled and run again. now before it is scheduled
off it will find a signal pending and set itself in SIGSTOP.

There are absolutely no guarantees when a signal will be delivered.
Signals are delivered asynchronously.

> 2. where do the threads get suspended exactly in the kernel? I think it 
> is in signal.c but I am not sure exactly where.

do_notify_resume()
	do_signal()
		get_signal_to_deliver()
			do_signal_stop()
				finish_stop()

> 3. can you confirm that the bug MUST be in my code? :)

You'll have to use reliable mechanisms to achieve what you're looking
for.
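
A minimal sketch of one such mechanism, assuming the stopper is the parent
of the target process: instead of treating the return of kill() as proof
that the target is stopped, wait for the kernel to report the stop through
waitpid() with WUNTRACED. The names stop_and_wait() and demo_stop_child()
are made up for this illustration. As far as I understand it, Linux reports
the stop of a multi-threaded child only once the whole thread group has
stopped, but treat that as an assumption to verify on your kernel version.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGSTOP to a direct child and block until the kernel reports it
 * as stopped. Returns 0 once the stop is confirmed, -1 on any error. */
int stop_and_wait(pid_t child)
{
    int status;

    assert(child > 0);          /* only meaningful for a real child pid */

    if (kill(child, SIGSTOP) == -1)
        return -1;

    /* WUNTRACED makes waitpid() return for stopped children as well. */
    if (waitpid(child, &status, WUNTRACED) == -1)
        return -1;

    return WIFSTOPPED(status) ? 0 : -1;
}

/* Hypothetical demo: fork an idle child, stop it, then clean up. */
int demo_stop_child(void)
{
    pid_t child = fork();
    int rc;

    if (child == -1)
        return -1;
    if (child == 0) {
        for (;;)
            pause();            /* child idles until it is killed */
    }

    rc = stop_and_wait(child);

    kill(child, SIGKILL);       /* SIGKILL also terminates a stopped process */
    waitpid(child, NULL, 0);    /* reap the child */
    return rc;
}
```

Calling demo_stop_child() from main() should return 0 when the stop was
confirmed. This only works for direct children; an unrelated process would
need a different mechanism.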



* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-04 18:16 ` Richard B. Johnson
@ 2005-05-04 19:16   ` Daniel Jacobowitz
  2005-05-04 21:06     ` Alex Riesen
  2005-05-05  0:33     ` Richard B. Johnson
  2005-05-05  1:04   ` Andy Isaacson
  1 sibling, 2 replies; 19+ messages in thread
From: Daniel Jacobowitz @ 2005-05-04 19:16 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: Olivier Croquette, LKML

On Wed, May 04, 2005 at 02:16:24PM -0400, Richard B. Johnson wrote:
> The kernel doesn't do SIGSTOP or SIGCONT. Within init, there is
> a SIGSTOP and SIGCONT handler. These can be inherited by others
> unless changed, perhaps by a 'C' runtime library. Basically,
> the SIGSTOP handler executes pause() until the SIGCONT signal
> is received.
> 
> Any delay in stopping is the time necessary for the signal to
> be delivered. It is possible that the section of code that
> contains the STOP/CONT handler was paged out and needs to be
> paged in before the signal can be delivered.
> 
> You might quicken this up by installing your own handler for
> SIGSTOP and SIGCONT....

I don't know what RTOSes you've been working with recently, but none of
the above is true for Linux.  I don't think it ever has been.

-- 
Daniel Jacobowitz
CodeSourcery, LLC


* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-04 19:16   ` Daniel Jacobowitz
@ 2005-05-04 21:06     ` Alex Riesen
  2005-05-05  0:42       ` Richard B. Johnson
  2005-05-05  0:33     ` Richard B. Johnson
  1 sibling, 1 reply; 19+ messages in thread
From: Alex Riesen @ 2005-05-04 21:06 UTC (permalink / raw)
  To: Richard B. Johnson, Olivier Croquette, LKML

On 5/4/05, Daniel Jacobowitz <dan@debian.org> wrote:
> On Wed, May 04, 2005 at 02:16:24PM -0400, Richard B. Johnson wrote:
> > The kernel doesn't do SIGSTOP or SIGCONT. Within init, there is
> > a SIGSTOP and SIGCONT handler. These can be inherited by others
> > unless changed, perhaps by a 'C' runtime library. Basically,
> > the SIGSTOP handler executes pause() until the SIGCONT signal
> > is received.
> >
> > Any delay in stopping is the time necessary for the signal to
> > be delivered. It is possible that the section of code that
> > contains the STOP/CONT handler was paged out and needs to be
> > paged in before the signal can be delivered.
> >
> > You might quicken this up by installing your own handler for
> > SIGSTOP and SIGCONT....
> 
> I don't know what RTOSes you've been working with recently, but none of
> the above is true for Linux.  I don't think it ever has been.
> 

I don't even think it was true for anything. It's his usual way of
saying things.


* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-04 19:16   ` Daniel Jacobowitz
  2005-05-04 21:06     ` Alex Riesen
@ 2005-05-05  0:33     ` Richard B. Johnson
  2005-05-05  0:45       ` Richard B. Johnson
  2005-05-05 12:24       ` Richard B. Johnson
  1 sibling, 2 replies; 19+ messages in thread
From: Richard B. Johnson @ 2005-05-05  0:33 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: Olivier Croquette, LKML

On Wed, 4 May 2005, Daniel Jacobowitz wrote:

> On Wed, May 04, 2005 at 02:16:24PM -0400, Richard B. Johnson wrote:
>> The kernel doesn't do SIGSTOP or SIGCONT. Within init, there is
>> a SIGSTOP and SIGCONT handler. These can be inherited by others
>> unless changed, perhaps by a 'C' runtime library. Basically,
>> the SIGSTOP handler executes pause() until the SIGCONT signal
>> is received.
>>
>> Any delay in stopping is the time necessary for the signal to
>> be delivered. It is possible that the section of code that
>> contains the STOP/CONT handler was paged out and needs to be
>> paged in before the signal can be delivered.
>>
>> You might quicken this up by installing your own handler for
>> SIGSTOP and SIGCONT....
>
> I don't know what RTOSes you've been working with recently, but none of
> the above is true for Linux.  I don't think it ever has been.
>
> -- 
> Daniel Jacobowitz
> CodeSourcery, LLC
>

Grab a copy of your favorite init source. SIGSTOP and SIGCONT are
signals. They are handled by signal handlers, always have been
on Unix and Unix clones like Linux.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by Dictator Bush.
                  98.36% of all statistics are fiction.


* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-04 21:06     ` Alex Riesen
@ 2005-05-05  0:42       ` Richard B. Johnson
  0 siblings, 0 replies; 19+ messages in thread
From: Richard B. Johnson @ 2005-05-05  0:42 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Olivier Croquette, LKML

On Wed, 4 May 2005, Alex Riesen wrote:

> On 5/4/05, Daniel Jacobowitz <dan@debian.org> wrote:
>> On Wed, May 04, 2005 at 02:16:24PM -0400, Richard B. Johnson wrote:
>>> The kernel doesn't do SIGSTOP or SIGCONT. Within init, there is
>>> a SIGSTOP and SIGCONT handler. These can be inherited by others
>>> unless changed, perhaps by a 'C' runtime library. Basically,
>>> the SIGSTOP handler executes pause() until the SIGCONT signal
>>> is received.
>>>
>>> Any delay in stopping is the time necessary for the signal to
>>> be delivered. It is possible that the section of code that
>>> contains the STOP/CONT handler was paged out and needs to be
>>> paged in before the signal can be delivered.
>>>
>>> You might quicken this up by installing your own handler for
>>> SIGSTOP and SIGCONT....
>>
>> I don't know what RTOSes you've been working with recently, but none of
>> the above is true for Linux.  I don't think it ever has been.
>>
>
> I don't even think it was true for anything. It's his usual way of
> saying things.
>

Nope, I thought he was talking about the terminal stopper/starter,
SIGTSTP used for X-ON and X-OFF. I thought he was sending that signal,
timing it, then restarting with SIGCONT. You can't restart or
even trap a SIGSTOP signal.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by Dictator Bush.
                  98.36% of all statistics are fiction.


* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-05  0:33     ` Richard B. Johnson
@ 2005-05-05  0:45       ` Richard B. Johnson
  2005-05-05 12:24       ` Richard B. Johnson
  1 sibling, 0 replies; 19+ messages in thread
From: Richard B. Johnson @ 2005-05-05  0:45 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: Olivier Croquette, LKML

On Wed, 4 May 2005, linux-os (Dick Johnson) wrote:

> On Wed, 4 May 2005, Daniel Jacobowitz wrote:
>
>> On Wed, May 04, 2005 at 02:16:24PM -0400, Richard B. Johnson wrote:
>>> The kernel doesn't do SIGSTOP or SIGCONT. Within init, there is
>>> a SIGSTOP and SIGCONT handler. These can be inherited by others
>>> unless changed, perhaps by a 'C' runtime library. Basically,
>>> the SIGSTOP handler executes pause() until the SIGCONT signal
>>> is received.
>>>
>>> Any delay in stopping is the time necessary for the signal to
>>> be delivered. It is possible that the section of code that
>>> contains the STOP/CONT handler was paged out and needs to be
>>> paged in before the signal can be delivered.
>>>
>>> You might quicken this up by installing your own handler for
>>> SIGSTOP and SIGCONT....
>>
>> I don't know what RTOSes you've been working with recently, but none of
>> the above is true for Linux.  I don't think it ever has been.
>>
>> --
>> Daniel Jacobowitz
>> CodeSourcery, LLC
>>
>
> Grab a copy of your favorite init source. SIGSTOP and SIGCONT are
> signals. They are handled by signal handlers, always have been
> on Unix and Unix clones like Linux.
>

Sorry. I thought he was talking about SIGTSTP and SIGCONT, the
X-ON X-OFF signals. I thought he was sending a SIGTSTP signal
to a task, timing it, then continuing with SIGCONT. He said that
it didn't operate fast enough.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by Dictator Bush.
                  98.36% of all statistics are fiction.


* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-04 18:16 ` Richard B. Johnson
  2005-05-04 19:16   ` Daniel Jacobowitz
@ 2005-05-05  1:04   ` Andy Isaacson
  1 sibling, 0 replies; 19+ messages in thread
From: Andy Isaacson @ 2005-05-05  1:04 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: Olivier Croquette, LKML

On Wed, May 04, 2005 at 02:16:24PM -0400, Richard B. Johnson wrote:
> On Wed, 4 May 2005, Olivier Croquette wrote:
> >On a 2.6.11 x86 system, I am SIGSTOP'ing processes which have started
> >several threads before.
> 
> The kernel doesn't do SIGSTOP or SIGCONT.

Dear Wrongbot,

No.

-andy


* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-05  0:33     ` Richard B. Johnson
  2005-05-05  0:45       ` Richard B. Johnson
@ 2005-05-05 12:24       ` Richard B. Johnson
  2005-05-05 13:14         ` Denis Vlasenko
                           ` (3 more replies)
  1 sibling, 4 replies; 19+ messages in thread
From: Richard B. Johnson @ 2005-05-05 12:24 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: Olivier Croquette, LKML


I don't think the kernel handler gets a chance to do anything
because SYS-V init installs its own handler(s). There are comments
about Linux misbehavior in the code. It turns out that I was
right about SIGSTOP and SIGCONT...


Source-code header..... Current init version is 2.85 but I can't find
the source. This is 2.62

/*
  * Init		A System-V Init Clone.
  *
  * Usage:	/sbin/init
  *		     init [0123456SsQqAaBbCc]
  *		  telinit [0123456SsQqAaBbCc]
  *
  * Version:	@(#)init.c  2.62  29-May-1996  MvS
  *
  *		This file is part of the sysvinit suite,

[SNIPPED...]

/*
  * Linux ignores all signals sent to init when the
  * SIG_DFL handler is installed. Therefore we must catch SIGTSTP
  * and SIGCONT, or else they won't work....
  *
  * The SIGCONT handler
  */
void cont_handler()
{
   got_cont = 1;
}

/*
  * The SIGSTOP & SIGTSTP handler
  */
void stop_handler()
{
   got_cont = 0;
   while(!got_cont) pause();
   got_cont = 0;
}


Now, if POSIX threads signals were implemented within the kernel,
without first purging the universe of all copies of the SYS-V init
that was distributed with early copies of RedHat and others (don't
know about current copies, a very long search failed to find the
source), then whatever you do in the kernel is wasted.

On Wed, 4 May 2005, Richard B. Johnson wrote:
> On Wed, 4 May 2005, Daniel Jacobowitz wrote:
>
>> On Wed, May 04, 2005 at 02:16:24PM -0400, Richard B. Johnson wrote:
>>> The kernel doesn't do SIGSTOP or SIGCONT. Within init, there is
>>> a SIGSTOP and SIGCONT handler. These can be inherited by others
>>> unless changed, perhaps by a 'C' runtime library. Basically,
>>> the SIGSTOP handler executes pause() until the SIGCONT signal
>>> is received.
>>> 
>>> Any delay in stopping is the time necessary for the signal to
>>> be delivered. It is possible that the section of code that
>>> contains the STOP/CONT handler was paged out and needs to be
>>> paged in before the signal can be delivered.
>>> 
>>> You might quicken this up by installing your own handler for
>>> SIGSTOP and SIGCONT....
>> 
>> I don't know what RTOSes you've been working with recently, but none of
>> the above is true for Linux.  I don't think it ever has been.
>> 
>> -- 
>> Daniel Jacobowitz
>> CodeSourcery, LLC
>> 
>
> Grab a copy of your favorite init source. SIGSTOP and SIGCONT are
> signals. They are handled by signal handlers, always have been
> on Unix and Unix clones like Linux.
>
> Cheers,
> Dick Johnson
> Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
> Notice : All mail here is now cached for review by Dictator Bush.
>                 98.36% of all statistics are fiction.
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by Dictator Bush.
                  98.36% of all statistics are fiction.


* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-05 12:24       ` Richard B. Johnson
@ 2005-05-05 13:14         ` Denis Vlasenko
  2005-05-05 13:30         ` Andreas Schwab
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: Denis Vlasenko @ 2005-05-05 13:14 UTC (permalink / raw)
  To: linux-os, Daniel Jacobowitz; +Cc: Olivier Croquette, LKML

On Thursday 05 May 2005 15:24, Richard B. Johnson wrote:
> 
> I don't think the kernel handler gets a chance to do anything
> because SYS-V init installs its own handler(s). There are comments
> about Linux misbehavior in the code. It turns out that I was
> right about SIGSTOP and SIGCONT...

No you are not.
--
vda



* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-05 12:24       ` Richard B. Johnson
  2005-05-05 13:14         ` Denis Vlasenko
@ 2005-05-05 13:30         ` Andreas Schwab
  2005-05-05 22:04         ` Miquel van Smoorenburg
  2005-05-10 20:59         ` Scheduler: SIGSTOP on multi threaded processes Olivier Croquette
  3 siblings, 0 replies; 19+ messages in thread
From: Andreas Schwab @ 2005-05-05 13:30 UTC (permalink / raw)
  To: linux-os; +Cc: Daniel Jacobowitz, Olivier Croquette, LKML

"Richard B. Johnson" <linux-os@analogic.com> writes:

> I don't think the kernel handler gets a chance to do anything
> because SYS-V init installs its own handler(s).

It's impossible to install a handler for SIGSTOP.

> There are comments about Linux misbehavior in the code. It turns out
> that I was right about SIGSTOP and SIGCONT...

No, you are wrong.  SIGTSTP != SIGSTOP.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."
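
The distinction between SIGSTOP and SIGTSTP can be checked from user space.
The sketch below is an illustration, not code from the thread: the helper
name try_install_handler() is made up, and it relies on sigaction()
rejecting SIGSTOP with EINVAL while accepting SIGTSTP and SIGCONT.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

static void noop_handler(int sig)
{
    (void)sig;                  /* handler body is irrelevant here */
}

/* Try to install a handler for `sig`. Returns 0 on success, or the errno
 * value reported by sigaction() on failure. On success the previous
 * disposition is restored, so the caller's signal setup is unchanged. */
int try_install_handler(int sig)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);

    if (sigaction(sig, &sa, &old) == -1)
        return errno;

    sigaction(sig, &old, NULL); /* put the old disposition back */
    return 0;
}
```

try_install_handler(SIGSTOP) is expected to return EINVAL, while the same
call with SIGTSTP or SIGCONT is expected to return 0.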


* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-05 12:24       ` Richard B. Johnson
  2005-05-05 13:14         ` Denis Vlasenko
  2005-05-05 13:30         ` Andreas Schwab
@ 2005-05-05 22:04         ` Miquel van Smoorenburg
  2005-05-06 23:15           ` Problem while stopping many threads within a module Yuly Finkelberg
  2005-05-10 20:59         ` Scheduler: SIGSTOP on multi threaded processes Olivier Croquette
  3 siblings, 1 reply; 19+ messages in thread
From: Miquel van Smoorenburg @ 2005-05-05 22:04 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.4.61.0505050814340.24130@chaos.analogic.com>,
Richard B. Johnson <linux-os@analogic.com> wrote:
>
>I don't think the kernel handler gets a chance to do anything
>because SYS-V init installs its own handler(s). There are comments
>about Linux misbehavior in the code. It turns out that I was
>right about SIGSTOP and SIGCONT...

No, you're confused. Sysvinit catches SIGTSTP and SIGCONT (not SIGSTOP)
because pid #1 is special - unlike all other processes, SIG_DFL for
pid #1 is equal to SIG_IGN.

And remember - signal handlers are not inherited (how could they be..)
so there is no such thing as "init installing a signal handler
for all processes".

Right now you should go out and buy a copy of the Stevens book,
"Advanced Programming in the UNIX Environment", and study it.

Mike.



* Problem while stopping many threads within a module
  2005-05-05 22:04         ` Miquel van Smoorenburg
@ 2005-05-06 23:15           ` Yuly Finkelberg
  2006-04-20  8:43             ` shikha
  0 siblings, 1 reply; 19+ messages in thread
From: Yuly Finkelberg @ 2005-05-06 23:15 UTC (permalink / raw)
  To: linux-kernel

Hello -

I'm having a strange thread scheduling issue in a project that I'm
working on.  We have a module, with an interface that can be called by
many (currently 50) threads simultaneously.  Threads that have entered
the kernel, sleep on a wait queue until everyone else has entered.  At
this point, a "master" process wakes up the first thread, which does
some work, then wakes up the second, etc.  After waking up its
successor, each thread changes its state to STOPPED and sends itself a
SIGSTOP.  Note that the threads are created with
CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND but NOT CLONE_THREAD so
there is no group stop.

Basically, the structure is the following:
kernel_entry_point() {
        wait until it's your turn
        ...... do some work .... (serialized)
        wake up the next thread
        send SIGSTOP to yourself
}

At the same time, a monitoring process polls until all the threads
have stopped themselves:
monitor() {
repeat:
        for each thread
               if (thread->state < TASK_STOPPED)
                       yield()
                       goto repeat
}

Now, here's the problem.  On 2.6.9 UP (Preempt), it is often the case
that one thread gets "stuck" in between the wake up of the next thread
and stopping itself -- this causes the monitor to poll for extended
periods of time, since the thread remains RUNNING.  Strangely enough,
it generally gets unstuck by itself, sometimes within 10 seconds,
sometimes after as long as 10 minutes.  When peeking at the kernel
stack of the offending process via the monitor, I only see that it is
in schedule and the stack looks like this:

       c55e7ad0 00000086 c55e6000 c55e7a94 00000046 c55e6000 c55e7ad0 c0109c2d 
       00000000 c03ddae0 00000001 fd0b6c12 0013bc9f c6502130 001770fe fd478e5c 
       0013bc9f c55d546c c05d3960 00002710 c05d3960 c55e6000 c0106f25 c05d3960 
Call Trace:
[<c0106f25>] need_resched+0x27/0x32

It also continues to be charged ticks, indicating that it's being
scheduled but is making no progress?  However, I can't find anything
that this thread could be spinning on.  Also, I don't understand why
there is no further context on the stack -- the thread does eventually
finish and never leaves the kernel, so the stack shouldn't be
corrupted...  How can it finish if it has nowhere to return?

I realize that this is a long shot, but if anyone has any ideas, I'd
appreciate hearing them.  Please let me know if I can provide any
further information.

Thanks,
-Yuly


* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-05 12:24       ` Richard B. Johnson
                           ` (2 preceding siblings ...)
  2005-05-05 22:04         ` Miquel van Smoorenburg
@ 2005-05-10 20:59         ` Olivier Croquette
  2005-05-10 21:12           ` Roland McGrath
  2005-05-10 23:05           ` Alex Riesen
  3 siblings, 2 replies; 19+ messages in thread
From: Olivier Croquette @ 2005-05-10 20:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: roland, alexn, mingo


Hi all


I worked on my problem over the last few days, and I arrived at these two 
main questions:

- Can a SIGSTOP be in a pending state in Linux?

- If kill(..., SIGSTOP) returns, does that mean that the corresponding 
process is completely suspended?


I thought until now that SIGSTOP was so special that it could never be
pending, and that as soon as:
kill(pid, SIGSTOP)
returned, then it was assured that the corresponding process (and all
its threads) were suspended.

This would make sense in my opinion, but apparently it is not always the
case, and the POSIX standard does not say anything about that.
Any hint?


I did also some experiments, with one program which fork()s into:

- a child which potentially starts threads and does some stuff

- a parent which regularly sends SIGSTOP to the child, checks whether the 
activity really stopped, and then sends SIGCONT again

You will find the source code below.

I tried that with different scheduling policies (SCHED_OTHER and 
SCHED_RR) and different number of threads:
- 0: no thread started (ie. mono threaded child)
- 1: 1 thread started, and the main task just pthread_join() it
- 2: 2 threads started, and the main task pthread_join() them

I came to the following results:

Threads     SCHED_OTHER   SCHED_RR
0           OK            OK
1           FAIL          OK
2           FAIL          FAIL (1)


- the answers to my 2 questions (see above) seem to be No and Yes 
respectively when no thread is started

- (1) For RR with 2 threads, there are 2 observed behaviours, apparently 
happening randomly:

  * either the parent's call always stops all threads instantaneously (like 
when no thread is started), and does so for a long time

  * or right at the beginning, we can observe that the parent cannot do 
that

I find this behaviour really strange.

Any idea?

Can one rely on the fact that the SIGSTOP operates instantaneously for 
non-threaded applications?

Would it be possible to provide that for all applications?




#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>
#include <signal.h>   /* kill(), SIGSTOP, SIGCONT, SIGKILL */
#include <sys/time.h>

#include <sys/types.h>
#include <sys/wait.h>
#include <sys/ipc.h>
#include <sys/shm.h>


#include <pthread.h>


int set_process_sched(pid_t pid, int policy, int priority) {
   struct sched_param p;

   p.sched_priority = priority;

   if ( 1 || policy != sched_getscheduler(pid) ) {
     if ( sched_setscheduler(pid,policy,&p) ) {
       perror("sched_setscheduler()");
       return 1;
     }
   }

   return 0;
}

unsigned long long gettime(void ) {

   struct timeval tv;

   if ( gettimeofday(&tv, NULL) ) {
     perror("gettimeofday()");
     return 0;
   }

   return (tv.tv_usec + tv.tv_sec * 1000000LL);
}

typedef struct {
   int         thread_nb; /* id defined by us */
   pthread_t   thread_id; /* system id of the thread */
} thread_data;


int   cont_main_loop = 1;


void sigterm_handler(int dummy) {
   printf("sigterm_handler\n");
   return;
}


/* We use a shared memory to communicate between the parent and the child
    They all only work in the first few bytes
*/
int     shmid;
unsigned long long int     *shared_array;
#define SHM_SIZE 1024

static inline void conf_shmem(void ) {

   shmid = shmget(IPC_PRIVATE, SHM_SIZE, 0666 | IPC_CREAT);
   if (shmid == -1) {
     perror("shmget()");
     exit(0);
   }

   shared_array = (unsigned long long int *) shmat(shmid, 0, 0);
   /* shmat() reports failure with (void *) -1, not NULL */
   if ( shared_array == (void *) -1 ) {
     perror("shmat()");
     exit(0);
   }
}


void loop(int marker) {
   unsigned long long int begin = gettime();
   /* run for 2 minutes at max
      (useful in case we end up with a busy loop in SCHED_RR... */
   while ( gettime() - begin < 120000000LL ) {
     /* write in the shared memory */
     shared_array[0] = marker;
   }
}

void *go_thread(void *dummy) {
   thread_data *data = (thread_data *) dummy;
   loop(data->thread_nb);
   fprintf(stderr,"%llu\tQuitting!\n",gettime());
   return NULL;
}


#define MAX_THREADS 100

int main(int argc, char **argv)
{
   int pid;
   int test_failed = 0;
   unsigned long long exec_begin = gettime();
   int nb_threads = 0;


   conf_shmem();
   shared_array[0] = 0;

   if ( argc > 1 )
     nb_threads = atoi(argv[1]);
   if ( nb_threads > MAX_THREADS )
     nb_threads = MAX_THREADS;

   pid = fork();

   switch ( pid ) {

     case 0: /* child */
     {
       int thread;
       thread_data threads[MAX_THREADS];

       if ( nb_threads == 0 ) {
         /* no multi threading */
         loop(1);
         break;
       }

       /* start the threads */
       for ( thread = 0 ; thread < nb_threads ; thread ++) {
         threads[thread].thread_nb = thread + 1;
         if ( pthread_create (  & threads[thread].thread_id,
                           NULL,
                           go_thread,
                           (void *)&threads[thread]) )
           perror("pthread_create");

       }

       {
         int thread;
         for ( thread = 0 ; thread < nb_threads ; thread ++) {
           pthread_join (  threads[thread].thread_id, NULL);
         }
       }
       exit(0);
     }

     default: /* parent */
     {
       unsigned long long begin = gettime();

       /* depending whether we set the priorities or not,
          we get different results.
       */

       set_process_sched(0, SCHED_RR, 65);
       set_process_sched(pid, SCHED_RR, 60);


       /* run for 10s */
       while ( gettime() - begin < 10000000 ) {
         unsigned long long int b_stop, a_stop;

         /* let the child run a little bit */
         usleep(1000);

         /* stop it */
         kill(pid, SIGSTOP);

         /* Reset our flag */
         shared_array[0] = 0;

         /* Wait to see if someone dare overwriting our nice zero */
         usleep(1000);
         if ( shared_array[0] > 0 ) {
           test_failed = shared_array[0];
           break;
         }
         kill(pid, SIGCONT);
       }
       kill(pid, SIGKILL);
       break;
     }

     case -1:
       perror("fork()");
       exit(0);
   }

   system("uname -a");
   printf("%d thread(s)\n",nb_threads);
   if ( ! test_failed )
     printf("test passed");
   else
     printf("test FAILED (%d)",test_failed);
   printf(" after %f s\n\n", ( gettime() - exec_begin) / 1000000.0 );

   return 0;
}




* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-10 20:59         ` Scheduler: SIGSTOP on multi threaded processes Olivier Croquette
@ 2005-05-10 21:12           ` Roland McGrath
  2005-05-11 18:58             ` Olivier Croquette
  2005-05-10 23:05           ` Alex Riesen
  1 sibling, 1 reply; 19+ messages in thread
From: Roland McGrath @ 2005-05-10 21:12 UTC (permalink / raw)
  To: Olivier Croquette; +Cc: linux-kernel, alexn, mingo

> - Can a SIGSTOP be in a pending state in Linux?

For short periods.

> - If kill(..., SIGSTOP) returns, does that mean that the corresponding 
> process is completely suspended?

No.  One or more threads of the process may still be running on another CPU
momentarily before they process the interrupt and stop for the signal.
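
A minimal sketch of how a parent could wait for the stop to be reported
instead of assuming it is complete when kill() returns. It assumes the
target is a direct child of the caller, since waitpid() only works on
children. The names stop_and_wait() and spawn_idle_child() are hypothetical
and do not come from this thread. As far as I understand it, Linux reports
the stop once the last thread of the group has stopped, but treat that as an
assumption to verify on the kernel in question.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send SIGSTOP to a direct child and block until the kernel reports the
 * stop. Returns 0 once the stop has been reported, -1 on error or if the
 * child terminated instead of stopping. */
static int stop_and_wait(pid_t child)
{
    int status;
    pid_t r;

    if (kill(child, SIGSTOP) == -1)
        return -1;

    /* WUNTRACED makes waitpid() return for stopped children as well as
     * terminated ones; retry if a signal interrupts the wait. */
    do {
        r = waitpid(child, &status, WUNTRACED);
    } while (r == -1 && errno == EINTR);

    if (r == -1)
        return -1;
    return WIFSTOPPED(status) ? 0 : -1;
}

/* Hypothetical helper for trying this out: fork a child that only sleeps.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
static pid_t spawn_idle_child(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        for (;;)
            pause();
    }
    return pid;
}
```

A caller would check that stop_and_wait(pid) returned 0 before inspecting
any state shared with the child, and send SIGCONT afterwards to resume it.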


* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-10 20:59         ` Scheduler: SIGSTOP on multi threaded processes Olivier Croquette
  2005-05-10 21:12           ` Roland McGrath
@ 2005-05-10 23:05           ` Alex Riesen
  1 sibling, 0 replies; 19+ messages in thread
From: Alex Riesen @ 2005-05-10 23:05 UTC (permalink / raw)
  To: Olivier Croquette; +Cc: linux-kernel, roland, alexn, mingo

This: http://www.opengroup.org/onlinepubs/009695399/toc.htm
and probably all the other issues from the Open Group are very interesting
reading.


* Re: Scheduler: SIGSTOP on multi threaded processes
  2005-05-10 21:12           ` Roland McGrath
@ 2005-05-11 18:58             ` Olivier Croquette
  0 siblings, 0 replies; 19+ messages in thread
From: Olivier Croquette @ 2005-05-11 18:58 UTC (permalink / raw)
  To: Roland McGrath; +Cc: linux-kernel, mingo


Hello Roland

Thanks for your reply.

>>- Can a SIGSTOP be in a pending state in Linux?
> 
> For short periods.
> 
>>- If kill(SIGSTOP,...) returns, does that mean that the corresponding 
>>process is completely suspended?
> 
> No.  One or more threads of the process may still be running on another CPU
> momentarily before they process the interrupt and stop for the signal.


I sometimes get a 150ms delay between the return of kill() and the
suspension of the last of the 3 threads, on a single-CPU system (Pentium 4).

It seems understandable to me to have a delay of <=1ms, especially on SMP 
systems, but I really can't understand:

- such large delays (like the 150ms)

- why only multi-threaded applications cause problems

- why the scheduling policy of the programs has an impact on the results

- why for some executions, the SIGSTOP effect is instantaneous hundreds of 
times in a row, until the end of the test, and the next execution shows 
delays right from the beginning


I don't have much experience hacking the kernel, and these behaviours 
are quite difficult for me to monitor or trace.
I am beginning to run out of ideas to test further :(

Could it be that my observations uncover a problem?
Or are they a consequence of the Linux implementation?
Or do I have a problem in my test bench?

Can anyone reproduce and/or validate these observations?

Any hint would be appreciated!


* Re: Problem while stopping many threads within a module
  2005-05-06 23:15           ` Problem while stopping many threads within a module Yuly Finkelberg
@ 2006-04-20  8:43             ` shikha
  0 siblings, 0 replies; 19+ messages in thread
From: shikha @ 2006-04-20  8:43 UTC (permalink / raw)
  To: linux-kernel

Yuly Finkelberg <liquidicecube <at> gmail.com> writes:

> 
> Hello -
> 
> I'm having a strange thread scheduling issue in a project that I'm
> information.
> 
> Thanks,
> -Yuly
> 


Is there any patch for this problem? We are facing the same
problem with Java threads on Linux.
Thanks,
shikha




Thread overview: 19+ messages
2005-05-04 17:37 Scheduler: SIGSTOP on multi threaded processes Olivier Croquette
2005-05-04 18:16 ` Richard B. Johnson
2005-05-04 19:16   ` Daniel Jacobowitz
2005-05-04 21:06     ` Alex Riesen
2005-05-05  0:42       ` Richard B. Johnson
2005-05-05  0:33     ` Richard B. Johnson
2005-05-05  0:45       ` Richard B. Johnson
2005-05-05 12:24       ` Richard B. Johnson
2005-05-05 13:14         ` Denis Vlasenko
2005-05-05 13:30         ` Andreas Schwab
2005-05-05 22:04         ` Miquel van Smoorenburg
2005-05-06 23:15           ` Problem while stopping many threads within a module Yuly Finkelberg
2006-04-20  8:43             ` shikha
2005-05-10 20:59         ` Scheduler: SIGSTOP on multi threaded processes Olivier Croquette
2005-05-10 21:12           ` Roland McGrath
2005-05-11 18:58             ` Olivier Croquette
2005-05-10 23:05           ` Alex Riesen
2005-05-05  1:04   ` Andy Isaacson
2005-05-04 19:10 ` Alexander Nyberg
