linux-kernel.vger.kernel.org archive mirror
* [BUG - HRT patch] disabling timer hangs system when multiple overruns
@ 2003-01-13 18:26 Fleischer, Julie N
From: Fleischer, Julie N @ 2003-01-13 18:26 UTC
  To: 'george@mvista.com'
  Cc: 'high-res-timers-discourse@lists.sourceforge.net',
	'linux-kernel@vger.kernel.org'

George -
I'm testing your 2.5.54-bk1 high-res-timers patches and am debugging an
issue where my system hangs (i.e., it stops accepting input and I have to
reboot).  It happens when I disable the timer by setting it_value to 0.
I've narrowed it down: it only happens after multiple overruns have been
generated.  That is, when I set up a repeating timer and keep its signal
blocked for more than one timer expiry, the system hangs when I then try
to disable that timer (I disable it before unblocking the signal).
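The trigger described above (a repeating timer whose signal stays blocked across several expiries) can be observed directly from user space. A minimal sketch, assuming a POSIX system with timer_create(), sigtimedwait() and timer_getoverrun() (older glibc needs -lrt); the function name and the 10 ms / 200 ms values are illustrative, not taken from the test suite. While SIGALRM is blocked only one signal is queued for the timer, and the extra expiries are reported as overruns once that signal is accepted:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>

/*
 * Hypothetical helper: arm a repeating CLOCK_REALTIME timer while SIGALRM
 * is blocked, sleep for wait_ns, then accept the one pending SIGALRM with
 * sigtimedwait() and return timer_getoverrun() for it (-1 on error).
 * Both arguments must be below one second (single tv_nsec field).
 */
int count_blocked_overruns(long period_ns, long wait_ns)
{
	sigset_t set, old;
	struct sigevent ev = { 0 };
	struct itimerspec its = { 0 };
	struct timespec nap = { 0, wait_ns };
	struct timespec limit = { 1, 0 };	/* signal should already be pending */
	struct timespec zero = { 0, 0 };
	timer_t tid;
	int overrun = -1;

	assert(period_ns > 0 && period_ns < 1000000000L);
	assert(wait_ns > 0 && wait_ns < 1000000000L);

	sigemptyset(&set);
	sigaddset(&set, SIGALRM);
	if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
		return -1;

	ev.sigev_notify = SIGEV_SIGNAL;
	ev.sigev_signo = SIGALRM;
	if (timer_create(CLOCK_REALTIME, &ev, &tid) != 0)
		goto restore;

	its.it_value.tv_nsec = period_ns;	/* first expiry */
	its.it_interval.tv_nsec = period_ns;	/* then every period_ns */
	if (timer_settime(tid, 0, &its, NULL) == 0) {
		/* Only one SIGALRM is queued for this timer; expiries that
		 * happen while it is pending are counted as overruns. */
		nanosleep(&nap, NULL);
		if (sigtimedwait(&set, NULL, &limit) == SIGALRM)
			overrun = timer_getoverrun(tid);
	}
	timer_delete(tid);
	/* Drop any SIGALRM queued before the delete, then restore the mask. */
	while (sigtimedwait(&set, NULL, &zero) == SIGALRM)
		;
restore:
	sigprocmask(SIG_SETMASK, &old, NULL);
	return overrun;
}
```

For example, count_blocked_overruns(10000000L, 200000000L) should report several overruns, which is the "multiple overruns" state the hang depends on.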

I know "system hang" is not very descriptive.  If you can suggest which
logs I should look at to figure out what's really going on, or other ways
to debug this, I'll do that.

I have added the tests I'm using to reproduce this issue to
http://posixtest.sf.net.  I first noticed it in
posixtestsuite/conformance/interfaces/timer_gettime/2-3.c, after Jim
Houston's bug fix.  I then added
posixtestsuite/conformance/interfaces/timer_settime/3-2.c and 3-3.c to help
me find the root cause.  To reproduce the issue, either run
timer_gettime/2-3.c, or modify timer_settime/3-3.c to use a repeating timer
(with a nanosecond interval).  The modified 3-3.c is included below.

==> One related, possibly naive, question: I wanted to test this against
your latest version (2.5.54-bk6), but when I went today to get the bk
patches for 2.5.54, I couldn't find them.  Are those only available for the
current kernel version?  That would make sense; I should have been quicker.
But I wanted to check whether there is another way for me to get that
version.

Additional information is below:
kernel used = 2.5.54-bk1
HRT patches applied = 
 hrtimers-core-2.5.54-bk1-1.0.patch
 hrtimers-hrposix-2.5.54-bk1-1.0.patch
 hrtimers-i386-2.5.54-bk1-1.0.patch
 hrtimers-posix-2.5.54-bk1-1.0.patch
 hrtimers-support-2.5.52-1.0.patch

Thanks.
- Julie Fleischer

timer_settime/3-3.c, modified to show the issue:
/*   
 * Copyright (c) 2002, Intel Corporation. All rights reserved.
 * Created by:  julie.n.fleischer REMOVE-THIS AT intel DOT com
 * This file is licensed under the GPL license.  For the full content
 * of this license, see the COPYING file at the top level of this 
 * source tree.

 * Test that if value.it_value = 0, the timer is disarmed.  Test by
 * disarming a currently armed and blocked timer.
 *
 * For this test, signal SIGTOTEST will be used, clock CLOCK_REALTIME
 * will be used.
 */

#include <time.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include "posixtest.h"

#define TIMEREXPIRENSEC 10000000
#define SLEEPTIME 1

#define SIGTOTEST SIGALRM

void handler(int signo)
{
	/* One delivery of the signal already pending at disarm time is OK. */
	(void)signo;
	printf("OK to be in once\n");
}

int main(int argc, char *argv[])
{
	sigset_t set;
	struct sigevent ev;
	struct sigaction act;
	timer_t tid;
	struct itimerspec its;
	struct timespec ts;

	ev.sigev_notify = SIGEV_SIGNAL;
	ev.sigev_signo = SIGTOTEST;

	if (sigemptyset(&set) != 0) {
		perror("sigemptyset() did not return success");
		return PTS_UNRESOLVED;
	}

	if (sigaddset(&set, SIGTOTEST) != 0) {
		perror("sigaddset() did not return success");
		return PTS_UNRESOLVED;
	}

	if (sigprocmask(SIG_SETMASK, &set, NULL) != 0) {
		perror("sigprocmask() did not return success");
		return PTS_UNRESOLVED;
	}

	if (timer_create(CLOCK_REALTIME, &ev, &tid) != 0) {
		perror("timer_create() did not return success");
		return PTS_UNRESOLVED;
	}

	/*
	 * First set up timer to be blocked
	 */
	its.it_interval.tv_sec = 0;
	its.it_interval.tv_nsec = 5*TIMEREXPIRENSEC;
	its.it_value.tv_sec = 0;
	its.it_value.tv_nsec = TIMEREXPIRENSEC;

	printf("setup first timer\n");
	if (timer_settime(tid, 0, &its, NULL) != 0) {
		perror("timer_settime() did not return success");
		return PTS_UNRESOLVED;
	}

	printf("sleep\n");
	sleep(SLEEPTIME);
	printf("awoke\n");

	/*
	 * Second, set value.it_value = 0 and set up handler to catch
	 * signal.
	 */
	act.sa_handler=handler;
	act.sa_flags=0;

	if (sigemptyset(&act.sa_mask) == -1) {
		perror("Error calling sigemptyset");
		return PTS_UNRESOLVED;
	}
	if (sigaction(SIGTOTEST, &act, 0) == -1) {
		perror("Error calling sigaction");
		return PTS_UNRESOLVED;
	}

	its.it_interval.tv_sec = 0;
	its.it_interval.tv_nsec = 0;
	its.it_value.tv_sec = 0;
	its.it_value.tv_nsec = 0;

	printf("setup second timer\n");
	if (timer_settime(tid, 0, &its, NULL) != 0) {
		perror("timer_settime() did not return success");
		return PTS_UNRESOLVED;
	}

	printf("unblock\n");
	if (sigprocmask(SIG_UNBLOCK, &set, NULL) != 0) {
		perror("sigprocmask() did not return success");
		return PTS_UNRESOLVED;
	}

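	/*
	 * Ensure the SLEEPTIME-second sleep is not interrupted.  The signal
	 * that was already pending is delivered when it is unblocked above,
	 * so a disarmed timer should not interrupt nanosleep().
	 */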
	ts.tv_sec=SLEEPTIME;
	ts.tv_nsec=0;

	printf("sleep again\n");
	if (nanosleep(&ts, NULL) == -1) {
		printf("nanosleep() interrupted\n");
		printf("Test FAILED\n");
		return PTS_FAIL;
	}

	printf("Test PASSED\n");
	return PTS_PASS;
}

**These views are not necessarily those of my employer.**


* Re: [BUG - HRT patch] disabling timer hangs system when multiple  overruns
@ 2003-01-14 22:26 george anzinger
From: george anzinger @ 2003-01-14 22:26 UTC
  To: Fleischer, Julie N
  Cc: 'high-res-timers-discourse@lists.sourceforge.net',
	'linux-kernel@vger.kernel.org'

"Fleischer, Julie N" wrote:
> 
> George -
> I'm testing your 2.5.54-bk1 high-res-timers patches and am debugging an
> issue where my system hangs (i.e., it stops accepting input and I have to
> reboot).  It happens when I disable the timer by setting it_value to 0.
> I've narrowed it down: it only happens after multiple overruns have been
> generated.  That is, when I set up a repeating timer and keep its signal
> blocked for more than one timer expiry, the system hangs when I then try
> to disable that timer (I disable it before unblocking the signal).
> 
> I know "system hang" is not very descriptive.  If you can suggest which
> logs I should look at to figure out what's really going on, or other ways
> to debug this, I'll do that.

I suspect that you have run into a bug I fixed in the latest
version, in the hand-off of a timer from the ID lookup to the
spin lock on the timer.  I was releasing the lookup lock
before taking the timer lock, which allowed an interrupt to
sneak in between and set up a deadlock with the interrupt
code.  This is most likely to happen when processing
overrunning timers.

This is fixed in the latest patch.
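The ordering described above can be illustrated with a user-space analogy. In this sketch the names are hypothetical, pthread mutexes stand in for the kernel spin locks, and the interrupt side is not modelled at all (in the kernel the fix presumably also relies on interrupts staying disabled across the hand-off). The per-timer lock is taken while the lookup lock is still held, so there is no window between finding the timer and locking it. Build with -pthread:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model: an ID -> timer table guarded by a lookup lock,
 * plus a per-timer lock.  Not the HRT patch's actual data structures. */
struct htimer {
	pthread_mutex_t lock;	/* stands in for the per-timer spin lock */
	int id;
	int overrun;
};

#define NTIMERS 8
static pthread_mutex_t lookup_lock = PTHREAD_MUTEX_INITIALIZER;
static struct htimer *table[NTIMERS];

void register_htimer(struct htimer *t)
{
	assert(t->id >= 0 && t->id < NTIMERS);
	pthread_mutex_lock(&lookup_lock);
	table[t->id] = t;
	pthread_mutex_unlock(&lookup_lock);
}

/*
 * Return the timer for id with its lock held, or NULL.  The timer lock is
 * acquired while lookup_lock is still held, and only then is lookup_lock
 * released.  The racy order would unlock lookup_lock first and take
 * t->lock afterwards, leaving a window in which others can intervene.
 */
struct htimer *lock_htimer(int id)
{
	struct htimer *t;

	if (id < 0 || id >= NTIMERS)
		return NULL;
	pthread_mutex_lock(&lookup_lock);
	t = table[id];
	if (t != NULL)
		pthread_mutex_lock(&t->lock);
	pthread_mutex_unlock(&lookup_lock);
	return t;
}

void unlock_htimer(struct htimer *t)
{
	pthread_mutex_unlock(&t->lock);
}
```

Code that removes a timer from the table would take the locks in the same order (lookup lock, then timer lock), so it cannot free a timer that another caller has found but not yet locked.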


> ==> One related, possibly naive, question: I wanted to test this against
> your latest version (2.5.54-bk6), but when I went today to get the bk
> patches for 2.5.54, I couldn't find them.  Are those only available for the
> current kernel version?  That would make sense; I should have been quicker.
> But I wanted to check whether there is another way for me to get that
> version.

Oh, you mean the kernel.org patches.  Yes, they are only on
kernel.org until the next version.  It is a rather large
patch; I suppose I could send it if you can stand
multi-megabyte attachments.

I have been offline trying to bring up my new computer on
RH8.0, so I have not moved to the latest kernel yet.

-g

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml



* RE: [BUG - HRT patch] disabling timer hangs system when multiple  overruns
@ 2003-01-16  0:12 Fleischer, Julie N
From: Fleischer, Julie N @ 2003-01-16  0:12 UTC
  To: 'george anzinger'
  Cc: 'high-res-timers-discourse@lists.sourceforge.net',
	'linux-kernel@vger.kernel.org'

> George Anzinger wrote:
> I suspect that you have run into a bug I fixed in the latest
> version, in the hand-off of a timer from the ID lookup to the
> spin lock on the timer.  I was releasing the lookup lock
> before taking the timer lock, which allowed an interrupt to
> sneak in between and set up a deadlock with the interrupt
> code.  This is most likely to happen when processing
> overrunning timers.
> 
> This is fixed in the latest patch.

George -
Again, sorry about not testing on your latest version (and thanks for the
bk6 patch! :) ).  I ran this test again with your latest patches (the
2.5.54-bk6-1.0 patches), and the test case still hangs.  There is a good
difference, though, which I think is due to your fix.  With 2.5.54-bk1, I
usually had to reboot the system after the hang (2 out of 3 times).  Now I
do not have to reboot the system (in 3 out of 3 runs), but the test case
itself still hangs (i.e., I have to manually kill the session the test
case was started in).

I forgot to mention reproducibility before.  The test-case hang is always
reproducible (with either the bk6 or the bk1 patches).  As I mentioned, the
system hang no longer happens with bk6 (probably the issue you fixed), but
with the bk1 patches the system would hang ~1/3 of the time.

Someone also suggested I use strace to get more output.  That helped
pinpoint that the hang occurs during the "timer_settime(<an its with
it_value=0>)" call.

Here's that output:
(...)
write(1, "setup first timer\n", 18setup first timer
)     = 18
ipc_subcall(0x8000003, 0, 0xbffff8e0, 0) = 0
write(1, "sleep\n", 6sleep
)                  = 6
rt_sigprocmask(SIG_BLOCK, [CHLD], [ALRM], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [ALRM], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})               = 0
write(1, "awoke\n", 6awoke
)                  = 6
rt_sigaction(SIGALRM, {0x80485c0, [], 0x4000000}, NULL, 8) = 0
write(1, "setup second timer\n", 19setup second timer
)    = 19
ipc_subcall(0x8000003, 0, 0xbffff8e0, 0

This is the point where the test case hangs.  If I'm using strace, I just do
a Ctrl-C to get out.
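Since the hang is now confined to the test process, a watchdog can keep a test run from needing a manual Ctrl-C. A minimal sketch, assuming POSIX fork()/waitpid(); the helper names and the shortened timings (10 ms first expiry, 50 ms interval, 200 ms blocked) are illustrative, loosely following the modified 3-3.c. The child runs the disarm-while-pending sequence, and the parent kills it if it has not exited within the timeout, reporting RUN_TIMED_OUT instead of hanging. It cannot help with a whole-system hang like the one seen with the bk1 patches:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

enum run_result { RUN_OK, RUN_FAILED, RUN_TIMED_OUT, RUN_ERROR };

static void on_alarm(int signo) { (void)signo; }

/* Hypothetical stand-in for the 3-3.c sequence: block SIGALRM, let a
 * repeating timer overrun, disarm it with it_value = 0, then unblock.
 * Returns 0 on success. */
int disarm_while_pending(void)
{
	sigset_t set;
	struct sigaction act = { 0 };
	struct sigevent ev = { 0 };
	struct itimerspec its = { 0 }, off = { 0 };
	struct timespec nap = { 0, 200000000L };	/* 200 ms: several overruns */
	timer_t tid;

	sigemptyset(&set);
	sigaddset(&set, SIGALRM);
	if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
		return 1;
	ev.sigev_notify = SIGEV_SIGNAL;
	ev.sigev_signo = SIGALRM;
	if (timer_create(CLOCK_REALTIME, &ev, &tid) != 0)
		return 1;
	its.it_value.tv_nsec = 10000000L;	/* 10 ms, as in 3-3.c */
	its.it_interval.tv_nsec = 50000000L;	/* 50 ms, as in 3-3.c */
	if (timer_settime(tid, 0, &its, NULL) != 0)
		return 1;
	nanosleep(&nap, NULL);

	act.sa_handler = on_alarm;
	sigemptyset(&act.sa_mask);
	if (sigaction(SIGALRM, &act, NULL) != 0)
		return 1;
	if (timer_settime(tid, 0, &off, NULL) != 0)	/* the call that hung */
		return 1;
	if (sigprocmask(SIG_UNBLOCK, &set, NULL) != 0)
		return 1;
	return timer_delete(tid) == 0 ? 0 : 1;
}

/* Run test_fn() in a child process and SIGKILL it after timeout_s seconds. */
enum run_result run_with_watchdog(int (*test_fn)(void), int timeout_s)
{
	struct timespec tick = { 0, 100000000L };	/* poll every 100 ms */
	int status, polls;
	pid_t pid;

	assert(timeout_s > 0);
	pid = fork();
	if (pid < 0)
		return RUN_ERROR;
	if (pid == 0)
		_exit(test_fn() == 0 ? 0 : 1);

	for (polls = 0; polls < timeout_s * 10; polls++) {
		pid_t r = waitpid(pid, &status, WNOHANG);

		if (r == pid)
			return (WIFEXITED(status) && WEXITSTATUS(status) == 0)
				? RUN_OK : RUN_FAILED;
		if (r < 0)
			return RUN_ERROR;
		nanosleep(&tick, NULL);
	}
	kill(pid, SIGKILL);
	waitpid(pid, &status, 0);
	return RUN_TIMED_OUT;
}
```

For example, run_with_watchdog(disarm_while_pending, 5) should return RUN_OK on a kernel where the disarm completes, and RUN_TIMED_OUT if timer_settime() blocks as in the strace output above.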

I'm using these patches:
  hrtimers-core-2.5.54-bk6-1.0.patch 
  hrtimers-hrposix-2.5.54-bk6-1.0.patch
  hrtimers-i386-2.5.54-bk6-1.0.patch 
  hrtimers-posix-2.5.54-bk6-1.0.patch 
  hrtimers-support-2.5.52-1.0.patch

Thanks.
- Julie

**These views are not necessarily those of my employer.**
 
