All of lore.kernel.org
 help / color / mirror / Atom feed
* tst-cputimer1 and tst-timer4
@ 2011-07-03 17:16 Carlos O'Donell
  2011-07-03 23:22 ` John David Anglin
  0 siblings, 1 reply; 14+ messages in thread
From: Carlos O'Donell @ 2011-07-03 17:16 UTC (permalink / raw)
  To: linux-parisc, John David Anglin

Dave,

Instrumenting tst-cputimer1 yields:
~~
carlos@firin:~/fsrc/glibc-work/tests/tst-cputimer1$ ./tst-cputimer1
clock_gettime returned timespec = { 0, 40000000 }
clock_getres returned timespec = { 0, 1 }
timer_helper_thread: creating thread...

timer_segev_thread: tid 1329
timer_helper_thread: creating thread...
~~~

The first thread starts, the second thread doesn't.

You expressed some worry over this failure:
~~~
1309709905.557733 futex(0xc043e2c8, FUTEX_WAKE_PRIVATE, 1) = 0
1309709905.557733 futex(0xc043e2c8,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, 0) = -1
EINVAL (Invalid argument)
~~~
This is part of nptl/npt-init.c where the threading system is trying
to determine, at runtime, certain functionality.
The first test is to determine if private futexes e.g.
FUTEX_PRIVATE_FLAG are available, and they are.
The second test is to determine if FUTEX_CLOCK_REALTIME is available,
and it is, the test looks only for ENOSYS on return.

The first thread has a futex operation fail completely:
~~~
[pid  8589] 1309709905.789735 clone(Process 8590 attached
child_stack=0x40e65040,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x416644e8, tls=0x41664900, child_tidptr=0x416644e8) =
8590
[pid  8590] 1309709905.789735 set_robust_list(0x416644f0, 0xc <unfinished ...>
[pid  8589] 1309709905.789735 rt_sigtimedwait([RTMIN],  <unfinished ...>
[pid  8590] 1309709905.789735 <... set_robust_list resumed> ) = 0
[pid  8590] 1309709905.789735 gettid()  = 8590
[pid  8590] 1309709905.789735 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  8590] 1309709905.789735 clock_gettime(0xfffffffa /* CLOCK_???
*/, {0, 252000000}) = 0
[pid  8590] 1309709905.789735 futex(0x14548, FUTEX_WAKE_OP_PRIVATE, 1,
1, 0x14540, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = -1 ENOSYS
(Function not implemented)
~~~
This is a worrisome problem, that's a feature that's expected to work
and doesn't.

It's part of the internal NPTL thread handling, I don't know
specifically which futext it is, but it's important :-)

This needs fixing. I'll look into that next.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: tst-cputimer1 and tst-timer4
  2011-07-03 17:16 tst-cputimer1 and tst-timer4 Carlos O'Donell
@ 2011-07-03 23:22 ` John David Anglin
  2011-07-04 15:21   ` Carlos O'Donell
  0 siblings, 1 reply; 14+ messages in thread
From: John David Anglin @ 2011-07-03 23:22 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: linux-parisc, John David Anglin

Hi Carlos,

 From what I see below, you have learned a lot about the failure.   
There's probably an operation
in the kernel that isn't hooked up although one would tend to think  
most of this should be generic.

we went canoeing today,
Dave

On 3-Jul-11, at 1:16 PM, Carlos O'Donell wrote:

> Dave,
>
> Instrumenting tst-cputimer1 yields:
> ~~
> carlos@firin:~/fsrc/glibc-work/tests/tst-cputimer1$ ./tst-cputimer1
> clock_gettime returned timespec = { 0, 40000000 }
> clock_getres returned timespec = { 0, 1 }
> timer_helper_thread: creating thread...
>
> timer_segev_thread: tid 1329
> timer_helper_thread: creating thread...
> ~~~
>
> The first thread starts, the second thread doesn't.
>
> You expressed some worry over this failure:
> ~~~
> 1309709905.557733 futex(0xc043e2c8, FUTEX_WAKE_PRIVATE, 1) = 0
> 1309709905.557733 futex(0xc043e2c8,
> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, 0) = -1
> EINVAL (Invalid argument)
> ~~~
> This is part of nptl/npt-init.c where the threading system is trying
> to determine, at runtime, certain functionality.
> The first test is to determine if private futexes e.g.
> FUTEX_PRIVATE_FLAG are available, and they are.
> The second test is to determine if FUTEX_CLOCK_REALTIME is available,
> and it is, the test looks only for ENOSYS on return.
>
> The first thread has a futex operation fail completely:
> ~~~
> [pid  8589] 1309709905.789735 clone(Process 8590 attached
> child_stack=0x40e65040,
> flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD| 
> CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
> parent_tidptr=0x416644e8, tls=0x41664900, child_tidptr=0x416644e8) =
> 8590
> [pid  8590] 1309709905.789735 set_robust_list(0x416644f0, 0xc  
> <unfinished ...>
> [pid  8589] 1309709905.789735 rt_sigtimedwait([RTMIN],   
> <unfinished ...>
> [pid  8590] 1309709905.789735 <... set_robust_list resumed> ) = 0
> [pid  8590] 1309709905.789735 gettid()  = 8590
> [pid  8590] 1309709905.789735 rt_sigprocmask(SIG_SETMASK, [], NULL,  
> 8) = 0
> [pid  8590] 1309709905.789735 clock_gettime(0xfffffffa /* CLOCK_???
> */, {0, 252000000}) = 0
> [pid  8590] 1309709905.789735 futex(0x14548, FUTEX_WAKE_OP_PRIVATE, 1,
> 1, 0x14540, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = -1 ENOSYS
> (Function not implemented)
> ~~~
> This is a worrisome problem, that's a feature that's expected to work
> and doesn't.
>
> It's part of the internal NPTL thread handling, I don't know
> specifically which futext it is, but it's important :-)
>
> This needs fixing. I'll look into that next.
>
> Cheers,
> Carlos.
>

--
John David Anglin	dave.anglin@bell.net




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: tst-cputimer1 and tst-timer4
  2011-07-03 23:22 ` John David Anglin
@ 2011-07-04 15:21   ` Carlos O'Donell
  2011-07-04 15:26     ` Carlos O'Donell
  0 siblings, 1 reply; 14+ messages in thread
From: Carlos O'Donell @ 2011-07-04 15:21 UTC (permalink / raw)
  To: John David Anglin; +Cc: linux-parisc, John David Anglin

On Sun, Jul 3, 2011 at 7:22 PM, John David Anglin <dave.anglin@bell.net=
> wrote:
> Hi Carlos,
>
> From what I see below, you have learned a lot about the failure. =A0T=
here's
> probably an operation
> in the kernel that isn't hooked up although one would tend to think m=
ost of
> this should be generic.

OK, fixed the kernel. We were still using the no-op futex
implementation I checked in 5 years ago :-(

=46ixing the kernel doesn't fix the testcase but instrumenting glibc
further reveals the cause of the failure:
~~~
carlos@firin:~/fsrc/glibc-work/tests/tst-cputimer1$ ./tst-cputimer1
do_clone: ARCH_CLONE...

clock_gettime returned timespec =3D { 0, 40000000 }
clock_getres returned timespec =3D { 0, 1 }
do_clone: ARCH_CLONE...

timer_helper_thread: creating thread...

do_clone: ARCH_CLONE...

timer_helper_thread: pthread_create ret 0
timer_segev_thread: tid 20077
timer_helper_thread: creating thread...

pthread_create: ALLOCATE_STACK failed.

timer_helper_thread: pthread_create ret 22
~~~

The second threads stack fails to allocate.

I don't know why, the stack allocation on hppa is tricky because of
S_G_U and the placement of stack guards.

Why it would fail for one thread and not the other is odd.

I'll keep digging.

Cheers,
Carlos.
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: tst-cputimer1 and tst-timer4
  2011-07-04 15:21   ` Carlos O'Donell
@ 2011-07-04 15:26     ` Carlos O'Donell
  2011-07-04 16:09       ` Carlos O'Donell
  2011-07-04 16:58       ` John David Anglin
  0 siblings, 2 replies; 14+ messages in thread
From: Carlos O'Donell @ 2011-07-04 15:26 UTC (permalink / raw)
  To: John David Anglin; +Cc: linux-parisc, John David Anglin

On Mon, Jul 4, 2011 at 11:21 AM, Carlos O'Donell
<carlos@systemhalted.org> wrote:
> pthread_create: ALLOCATE_STACK failed.
>
> timer_helper_thread: pthread_create ret 22
> ~~~
>
> The second threads stack fails to allocate.

1309792564.144002 set_robust_list(0x4000b4f0, 0xc) = 0
1309792564.144002 rt_sigtimedwait([RTMIN], {si_signo=SIGRTMIN,
si_code=SI_TIMER, si_pid=4, si_uid=0, si_value={int=90456,
ptr=0x16158}}, NULL, 8) = 37
1309792564.368004 write(1, "timer_helper_thread: creating th"..., 40) = 40
1309792564.368004 write(1, "\n", 1)     = 1
1309792564.368004 mmap(NULL, 8388608, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x41349000
1309792564.368004 mprotect(0x41b47000, 4096, PROT_NONE) = 0
1309792564.368004 write(1, "do_clone: ARCH_CLONE...\n", 24) = 24
1309792564.368004 write(1, "\n", 1)     = 1
1309792564.372004 clone(child_stack=0x41349040,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x41b484e8, tls=0x41b48900, child_tidptr=0x41b484e8) =
21795
1309792564.372004 write(1, "timer_helper_thread: pthread_cre"..., 42) = 42
1309792564.372004 rt_sigtimedwait([RTMIN], {si_signo=SIGRTMIN,
si_code=SI_TIMER, si_pid=5, si_uid=0, si_value={int=90520,
ptr=0x16198}}, NULL, 8) = 37
1309792564.476005 write(1, "timer_helper_thread: creating th"..., 40) = 40
1309792564.476005 write(1, "\n", 1)     = 1
1309792564.480005 mprotect(0x41b47480, 4096,
PROT_READ|PROT_WRITE|PROT_EXEC) = -1 EINVAL (Invalid argument)
1309792564.480005 munmap(0x41349000, 8388608) = 0
1309792564.480005 write(1, "pthread_create: ALLOCATE_STACK f"..., 39) = 39

Why would this mprotect fail with -EINVAL? It's not page aligned?

You can see the earlier mmap/mprotect from thread one's stack looks
page aligned.

I'll look into this.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: tst-cputimer1 and tst-timer4
  2011-07-04 15:26     ` Carlos O'Donell
@ 2011-07-04 16:09       ` Carlos O'Donell
  2011-07-04 16:58       ` John David Anglin
  1 sibling, 0 replies; 14+ messages in thread
From: Carlos O'Donell @ 2011-07-04 16:09 UTC (permalink / raw)
  To: John David Anglin; +Cc: linux-parisc, John David Anglin

On Mon, Jul 4, 2011 at 11:26 AM, Carlos O'Donell
<carlos@systemhalted.org> wrote:
> On Mon, Jul 4, 2011 at 11:21 AM, Carlos O'Donell
> <carlos@systemhalted.org> wrote:
>> pthread_create: ALLOCATE_STACK failed.
>>
>> timer_helper_thread: pthread_create ret 22
>> ~~~
>>
>> The second threads stack fails to allocate.
>
> 1309792564.144002 set_robust_list(0x4000b4f0, 0xc) =3D 0
> 1309792564.144002 rt_sigtimedwait([RTMIN], {si_signo=3DSIGRTMIN,
> si_code=3DSI_TIMER, si_pid=3D4, si_uid=3D0, si_value=3D{int=3D90456,
> ptr=3D0x16158}}, NULL, 8) =3D 37
> 1309792564.368004 write(1, "timer_helper_thread: creating th"..., 40)=
 =3D 40
> 1309792564.368004 write(1, "\n", 1) =A0 =A0 =3D 1
> 1309792564.368004 mmap(NULL, 8388608, PROT_READ|PROT_WRITE|PROT_EXEC,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =3D 0x41349000
> 1309792564.368004 mprotect(0x41b47000, 4096, PROT_NONE) =3D 0
> 1309792564.368004 write(1, "do_clone: ARCH_CLONE...\n", 24) =3D 24
> 1309792564.368004 write(1, "\n", 1) =A0 =A0 =3D 1
> 1309792564.372004 clone(child_stack=3D0x41349040,
> flags=3DCLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLON=
E_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
> parent_tidptr=3D0x41b484e8, tls=3D0x41b48900, child_tidptr=3D0x41b484=
e8) =3D
> 21795
> 1309792564.372004 write(1, "timer_helper_thread: pthread_cre"..., 42)=
 =3D 42
> 1309792564.372004 rt_sigtimedwait([RTMIN], {si_signo=3DSIGRTMIN,
> si_code=3DSI_TIMER, si_pid=3D5, si_uid=3D0, si_value=3D{int=3D90520,
> ptr=3D0x16198}}, NULL, 8) =3D 37
> 1309792564.476005 write(1, "timer_helper_thread: creating th"..., 40)=
 =3D 40
> 1309792564.476005 write(1, "\n", 1) =A0 =A0 =3D 1
> 1309792564.480005 mprotect(0x41b47480, 4096,
> PROT_READ|PROT_WRITE|PROT_EXEC) =3D -1 EINVAL (Invalid argument)
> 1309792564.480005 munmap(0x41349000, 8388608) =3D 0
> 1309792564.480005 write(1, "pthread_create: ALLOCATE_STACK f"..., 39)=
 =3D 39
>
> Why would this mprotect fail with -EINVAL? It's not page aligned?
>
> You can see the earlier mmap/mprotect from thread one's stack looks
> page aligned.
>
> I'll look into this.

Alright, the documetnation for mprotect definately says "addr must be
aligned to a page boundary."

And matches the return failure:
"EINVAL addr is not a valid pointer, or not a multiple of the system pa=
ge size."

So it would seem that this just works in the _S_G_D case because of
the way addresses line up.

I'm instrumenting the stack allocation code and guard setup to see if
I missed anything.

This part of glibc has been heavily patched by me to enable _S_G_U,
and it's probably the ugliest part of the thread startup code.

Cheers,
Carlos.
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: tst-cputimer1 and tst-timer4
  2011-07-04 15:26     ` Carlos O'Donell
  2011-07-04 16:09       ` Carlos O'Donell
@ 2011-07-04 16:58       ` John David Anglin
  2011-07-04 17:49         ` Carlos O'Donell
  1 sibling, 1 reply; 14+ messages in thread
From: John David Anglin @ 2011-07-04 16:58 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: linux-parisc, John David Anglin

On 7/4/2011 11:26 AM, Carlos O'Donell wrote:
> On Mon, Jul 4, 2011 at 11:21 AM, Carlos O'Donell
> <carlos@systemhalted.org>  wrote:
>> pthread_create: ALLOCATE_STACK failed.
>>
>> timer_helper_thread: pthread_create ret 22
>> ~~~
>>
>> The second threads stack fails to allocate.
> 1309792564.144002 set_robust_list(0x4000b4f0, 0xc) = 0
> 1309792564.144002 rt_sigtimedwait([RTMIN], {si_signo=SIGRTMIN,
> si_code=SI_TIMER, si_pid=4, si_uid=0, si_value={int=90456,
> ptr=0x16158}}, NULL, 8) = 37
> 1309792564.368004 write(1, "timer_helper_thread: creating th"..., 40) = 40
> 1309792564.368004 write(1, "\n", 1)     = 1
> 1309792564.368004 mmap(NULL, 8388608, PROT_READ|PROT_WRITE|PROT_EXEC,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x41349000
> 1309792564.368004 mprotect(0x41b47000, 4096, PROT_NONE) = 0
> 1309792564.368004 write(1, "do_clone: ARCH_CLONE...\n", 24) = 24
> 1309792564.368004 write(1, "\n", 1)     = 1
> 1309792564.372004 clone(child_stack=0x41349040,
> flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
> parent_tidptr=0x41b484e8, tls=0x41b48900, child_tidptr=0x41b484e8) =
> 21795
> 1309792564.372004 write(1, "timer_helper_thread: pthread_cre"..., 42) = 42
> 1309792564.372004 rt_sigtimedwait([RTMIN], {si_signo=SIGRTMIN,
> si_code=SI_TIMER, si_pid=5, si_uid=0, si_value={int=90520,
> ptr=0x16198}}, NULL, 8) = 37
> 1309792564.476005 write(1, "timer_helper_thread: creating th"..., 40) = 40
> 1309792564.476005 write(1, "\n", 1)     = 1
> 1309792564.480005 mprotect(0x41b47480, 4096,
> PROT_READ|PROT_WRITE|PROT_EXEC) = -1 EINVAL (Invalid argument)
> 1309792564.480005 munmap(0x41349000, 8388608) = 0
> 1309792564.480005 write(1, "pthread_create: ALLOCATE_STACK f"..., 39) = 39
>
> Why would this mprotect fail with -EINVAL? It's not page aligned?
>
> You can see the earlier mmap/mprotect from thread one's stack looks
> page aligned.

I believe that the stack value needs to be aligned.  See comment in
arch_get_unmapped_area.  Not sure if this will fix mprotect.

I submitted a patch to return an error if  but there was no comment.

Dave

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: tst-cputimer1 and tst-timer4
  2011-07-04 16:58       ` John David Anglin
@ 2011-07-04 17:49         ` Carlos O'Donell
  2011-07-04 18:50           ` John David Anglin
  0 siblings, 1 reply; 14+ messages in thread
From: Carlos O'Donell @ 2011-07-04 17:49 UTC (permalink / raw)
  To: John David Anglin; +Cc: linux-parisc, John David Anglin

On Mon, Jul 4, 2011 at 12:58 PM, John David Anglin <dave.anglin@bell.ne=
t> wrote:
>> Why would this mprotect fail with -EINVAL? It's not page aligned?
>>
>> You can see the earlier mmap/mprotect from thread one's stack looks
>> page aligned.
>
> I believe that the stack value needs to be aligned. =A0See comment in
> arch_get_unmapped_area. =A0Not sure if this will fix mprotect.

What needs to be aligned, the stack value or the addr in a call to mpro=
tect?

The latter, yes, that appears to be the case.

The stack will be at least aligned to the target requirement.

> I submitted a patch to return an error if =A0but there was no comment=
=2E

Reference?

Cheers,
Carlos.
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: tst-cputimer1 and tst-timer4
  2011-07-04 17:49         ` Carlos O'Donell
@ 2011-07-04 18:50           ` John David Anglin
  2011-07-04 18:59             ` Carlos O'Donell
  0 siblings, 1 reply; 14+ messages in thread
From: John David Anglin @ 2011-07-04 18:50 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: linux-parisc, John David Anglin

On 7/4/2011 1:49 PM, Carlos O'Donell wrote:
> On Mon, Jul 4, 2011 at 12:58 PM, John David Anglin<dave.anglin@bell.net>  wrote:
>>> Why would this mprotect fail with -EINVAL? It's not page aligned?
>>>
>>> You can see the earlier mmap/mprotect from thread one's stack looks
>>> page aligned.
>> I believe that the stack value needs to be aligned.  See comment in
>> arch_get_unmapped_area.  Not sure if this will fix mprotect.
> What needs to be aligned, the stack value or the addr in a call to mprotect?

Actually, it's not the stack address but the address used in the malloc 
call to allocate the
thread's stack that needs alignment.  I believe 64 bytes are used by the 
parent.  In current
thread setup.  So, it looks like child_stack value is ok.

The Linux man page says the mprotect addr must be a valid pointer or a 
multiple of PAGESIZE.
It's not clear what the mprotect call is trying to protect but it is 
definitely not page aligned.
>> I submitted a patch to return an error if  but there was no comment.
> Reference?
>
http://www.spinics.net/lists/linux-parisc/msg03352.html

Dave

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: tst-cputimer1 and tst-timer4
  2011-07-04 18:50           ` John David Anglin
@ 2011-07-04 18:59             ` Carlos O'Donell
  2011-07-05 17:28               ` Carlos O'Donell
  0 siblings, 1 reply; 14+ messages in thread
From: Carlos O'Donell @ 2011-07-04 18:59 UTC (permalink / raw)
  To: John David Anglin; +Cc: linux-parisc, John David Anglin

On Mon, Jul 4, 2011 at 2:50 PM, John David Anglin <dave.anglin@bell.net=
> wrote:
> The Linux man page says the mprotect addr must be a valid pointer or =
a
> multiple of PAGESIZE.
> It's not clear what the mprotect call is trying to protect but it is
> definitely not page aligned.

It's trying to protect the new stack for the thread, which is
obviously in the wrong spot.

The new stack is in the middle of the last threads guard.

I don't know what conditions triggered here, but I'm tracking it down.

The C library has a stack cache which might be the culprit, but I'd
believed all these issues fixed, I guess not.

I like easy bugs... this is simply a matter of debugging at this point.

>>> I submitted a patch to return an error if =A0but there was no comme=
nt.
>>
>> Reference?
>>
> http://www.spinics.net/lists/linux-parisc/msg03352.html

Interesting. Thanks.

Cheers,
Carlos.
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: tst-cputimer1 and tst-timer4
  2011-07-04 18:59             ` Carlos O'Donell
@ 2011-07-05 17:28               ` Carlos O'Donell
  2011-07-05 17:51                 ` Rolf Eike Beer
  0 siblings, 1 reply; 14+ messages in thread
From: Carlos O'Donell @ 2011-07-05 17:28 UTC (permalink / raw)
  To: John David Anglin; +Cc: linux-parisc, John David Anglin

On Mon, Jul 4, 2011 at 2:59 PM, Carlos O'Donell <carlos@systemhalted.org> wrote:
> On Mon, Jul 4, 2011 at 2:50 PM, John David Anglin <dave.anglin@bell.net> wrote:
>> The Linux man page says the mprotect addr must be a valid pointer or a
>> multiple of PAGESIZE.
>> It's not clear what the mprotect call is trying to protect but it is
>> definitely not page aligned.
>
> It's trying to protect the new stack for the thread, which is
> obviously in the wrong spot.

Good news.

I have fixed tst-cputimer1.

When a thread uses a cached stack block the location of the guard is
incorrectly computed.

The incorrect guard location causes mprotect to fail, failing the
thread stack allocation, and triggering a thread creation failure.

The defect results:

* in a missed timer event because of the silent death of the timer
thread (SIGEV_THREAD) that used the cached stack.

* in a pthread_create failure with error code EINVAL (22).

With the fix I can now run tst-cputimer1 without a hang, but the test
still complains about timers expiring early, that may be kernel
related.

This is a *serious* defect and would have caused a lot of userspace
instability for any program that uses short running threads (causing
the stack cache to be used).

I'm surprised that things worked so well, it's probably a testament to
the fact that few programs use SIGEV_THREAD, threads, or short running
threads.

When debugging if you slow down thr1 sufficiently, then thr2 can't
reuse the stack and it gets a new one, which works correctly. This
explains the odd behaviour I saw during debugging.

In a similar vein, I found that madvise was being passed an unaligned
block during thread shutdown, resulting in memory not being correctly
given back to the OS.

These two bug fixes should be a serious win for threaded applications on hppa.

~~~ tst-cputimer1 run on patched glibc ~~~
carlos@firin:~/fsrc/glibc-work/tests/tst-cputimer1$ ./tst-cputimer1 >& run.txt
carlos@firin:~/fsrc/glibc-work/tests/tst-cputimer1$ cat run.txt
clock_gettime returned timespec = { 0, 36000000 }
clock_getres returned timespec = { 0, 1 }
*** Calling clock_gettime

*** Checking thr2_cont > =5

*** Incrementing thr2_count

*** timer sig1 invoked too soon: 0.436000000 instead of expected 0.440000000
*** timer sig2 invoked too soon: 0.536000000 instead of expected 0.540000000
*** Calling clock_gettime

*** Checking thr2_cont > =5

*** Incrementing thr2_count

*** Calling clock_gettime

*** Checking thr2_cont > =5

*** Incrementing thr2_count

*** Calling clock_gettime

*** Checking thr2_cont > =5

*** Incrementing thr2_count

*** Calling clock_gettime

*** Checking thr2_cont > =5

*** Incrementing thr2_count

*** Calling clock_gettime

*** Checking thr2_cont > =5

*** Incrementing thr2_count

*** timer thr2 invoked too soon: 2.928000000 instead of expected 2.936000000
*** timer sig1 invoked too soon: 3.424000000 instead of expected 3.436000000
*** timer sig2 invoked too soon: 3.924000000 instead of expected 3.936000000
carlos@firin:~/fsrc/glibc-work/tests/tst-cputimer1$
~~~

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: tst-cputimer1 and tst-timer4
  2011-07-05 17:28               ` Carlos O'Donell
@ 2011-07-05 17:51                 ` Rolf Eike Beer
  2011-07-05 22:17                   ` Carlos O'Donell
  0 siblings, 1 reply; 14+ messages in thread
From: Rolf Eike Beer @ 2011-07-05 17:51 UTC (permalink / raw)
  To: linux-parisc

[-- Attachment #1: Type: Text/Plain, Size: 694 bytes --]

Am Dienstag 05 Juli 2011, 19:28:35 schrieben Sie:
> On Mon, Jul 4, 2011 at 2:59 PM, Carlos O'Donell <carlos@systemhalted.org> 
wrote:
> > On Mon, Jul 4, 2011 at 2:50 PM, John David Anglin <dave.anglin@bell.net> 
wrote:
> >> The Linux man page says the mprotect addr must be a valid pointer or a
> >> multiple of PAGESIZE.
> >> It's not clear what the mprotect call is trying to protect but it is
> >> definitely not page aligned.
> > 
> > It's trying to protect the new stack for the thread, which is
> > obviously in the wrong spot.
> 
> Good news.
> 
> I have fixed tst-cputimer1.

You also wrote "OK, fixed the kernel." So where an when will these fixes show 
up?

Eike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: tst-cputimer1 and tst-timer4
  2011-07-05 17:51                 ` Rolf Eike Beer
@ 2011-07-05 22:17                   ` Carlos O'Donell
  2011-07-08 15:00                     ` Rolf Eike Beer
  0 siblings, 1 reply; 14+ messages in thread
From: Carlos O'Donell @ 2011-07-05 22:17 UTC (permalink / raw)
  To: Rolf Eike Beer; +Cc: linux-parisc

On Tue, Jul 5, 2011 at 1:51 PM, Rolf Eike Beer <eike@sf-mail.de> wrote:
> Am Dienstag 05 Juli 2011, 19:28:35 schrieben Sie:
>> On Mon, Jul 4, 2011 at 2:59 PM, Carlos O'Donell <carlos@systemhalted.org>
> wrote:
>> > On Mon, Jul 4, 2011 at 2:50 PM, John David Anglin <dave.anglin@bell.net>
> wrote:
>> >> The Linux man page says the mprotect addr must be a valid pointer or a
>> >> multiple of PAGESIZE.
>> >> It's not clear what the mprotect call is trying to protect but it is
>> >> definitely not page aligned.
>> >
>> > It's trying to protect the new stack for the thread, which is
>> > obviously in the wrong spot.
>>
>> Good news.
>>
>> I have fixed tst-cputimer1.
>
> You also wrote "OK, fixed the kernel." So where an when will these fixes show
> up?

Unfortunately I hosed my setup and I didn't have git installed to make
a proper diff so here's a diff versus a somewhat recent tree.

 WIP patch: http://www.parisc-linux.org/~carlos/futex.diff

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: tst-cputimer1 and tst-timer4
  2011-07-05 22:17                   ` Carlos O'Donell
@ 2011-07-08 15:00                     ` Rolf Eike Beer
  2011-07-08 15:18                       ` John David Anglin
  0 siblings, 1 reply; 14+ messages in thread
From: Rolf Eike Beer @ 2011-07-08 15:00 UTC (permalink / raw)
  To: linux-parisc


[-- Attachment #1.1: Type: Text/Plain, Size: 1270 bytes --]

Am Mittwoch 06 Juli 2011, 00:17:03 schrieben Sie:
> On Tue, Jul 5, 2011 at 1:51 PM, Rolf Eike Beer <eike@sf-mail.de> wrote:
> > Am Dienstag 05 Juli 2011, 19:28:35 schrieben Sie:
> >> On Mon, Jul 4, 2011 at 2:59 PM, Carlos O'Donell
> >> <carlos@systemhalted.org>
> > 
> > wrote:
> >> > On Mon, Jul 4, 2011 at 2:50 PM, John David Anglin
> >> > <dave.anglin@bell.net>
> > 
> > wrote:
> >> >> The Linux man page says the mprotect addr must be a valid pointer or
> >> >> a multiple of PAGESIZE.
> >> >> It's not clear what the mprotect call is trying to protect but it is
> >> >> definitely not page aligned.
> >> > 
> >> > It's trying to protect the new stack for the thread, which is
> >> > obviously in the wrong spot.
> >> 
> >> Good news.
> >> 
> >> I have fixed tst-cputimer1.
> > 
> > You also wrote "OK, fixed the kernel." So where an when will these fixes
> > show up?
> 
> Unfortunately I hosed my setup and I didn't have git installed to make
> a proper diff so here's a diff versus a somewhat recent tree.
> 
>  WIP patch: http://www.parisc-linux.org/~carlos/futex.diff

Half of your diff (e.g. the int->u32 conversion) is already in upstream. I've 
rediffed it against Linus tree and fixed some of the whitespace damage.

Eike

[-- Attachment #1.2: hppa-futex.patch --]
[-- Type: text/x-patch, Size: 3326 bytes --]

From 26950303065f2d1d5e36bbb818d3ee8621a6fc2c Mon Sep 17 00:00:00 2001
From: Carlos O'Donell <carlos@systemhalted.org>
Date: Fri, 8 Jul 2011 09:42:31 +0200
Subject: [PATCH] fix futexes

Signed-off-by: Carlos O'Donell <carlos@systemhalted.org>
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
---
 arch/parisc/include/asm/futex.h |   65 ++++++++++++++++++++++++++++++++++++---
 1 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h
index 67a33cc..3008d06 100644
--- a/arch/parisc/include/asm/futex.h
+++ b/arch/parisc/include/asm/futex.h
@@ -5,11 +5,14 @@
 
 #include <linux/futex.h>
 #include <linux/uaccess.h>
+#include <asm/atomic.h>
 #include <asm/errno.h>
 
 static inline int
 futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
 {
+	unsigned long int flags;
+	u32 val;
 	int op = (encoded_op >> 28) & 7;
 	int cmp = (encoded_op >> 24) & 15;
 	int oparg = (encoded_op << 8) >> 20;
@@ -23,16 +26,53 @@ futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
 
 	pagefault_disable();
 
+	_atomic_spin_lock_irqsave(uaddr, flags);
+
 	switch (op) {
 	case FUTEX_OP_SET:
+		/* *(int *)UADDR2 = OPARG; */
+		ret = get_user(oldval, uaddr);
+		if (!ret)
+			ret = put_user(oparg, uaddr);
+		break;
 	case FUTEX_OP_ADD:
+		/* *(int *)UADDR2 += OPARG; */
+		ret = get_user(oldval, uaddr);
+		if (!ret) {
+			val = oldval + oparg;
+			ret = put_user(val, uaddr);
+		}
+		break;
 	case FUTEX_OP_OR:
+		/* *(int *)UADDR2 |= OPARG; */
+		ret = get_user(oldval, uaddr);
+		if (!ret) {
+			val = oldval | oparg;
+			ret = put_user(val, uaddr);
+		}
+		break;
 	case FUTEX_OP_ANDN:
+		/* *(int *)UADDR2 &= ~OPARG; */
+		ret = get_user(oldval, uaddr);
+		if (!ret) {
+			val = oldval & ~oparg;
+			ret = put_user(val, uaddr);
+		}
+		break;
 	case FUTEX_OP_XOR:
+		/* *(int *)UADDR2 ^= OPARG; */
+		ret = get_user(oldval, uaddr);
+		if (!ret) {
+			val = oldval ^ oparg;
+			ret = put_user(val, uaddr);
+		}
+		break;
 	default:
 		ret = -ENOSYS;
 	}
 
+	_atomic_spin_unlock_irqrestore(uaddr, flags);
+
 	pagefault_enable();
 
 	if (!ret) {
@@ -54,7 +94,9 @@ static inline int
 futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 			      u32 oldval, u32 newval)
 {
+	int ret;
 	u32 val;
+	unsigned long flags;
 
 	/* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is
 	 * our gateway page, and causes no end of trouble...
@@ -65,12 +107,25 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
 		return -EFAULT;
 
-	if (get_user(val, uaddr))
-		return -EFAULT;
-	if (val == oldval && put_user(newval, uaddr))
-		return -EFAULT;
+	/* HPPA has no cmpxchg in hardware and therefore the
+	 * best we can do here is use an array of locks. The
+	 * lock selected is based on a hash of the userspace
+	 * address. This should scale to a couple of CPUs.
+	 */
+
+	_atomic_spin_lock_irqsave(uaddr, flags);
+
+	ret = get_user(val, uaddr);
+
+	if (!ret)
+		if (val == oldval)
+			ret = put_user(newval, uaddr);
+
 	*uval = val;
-	return 0;
+
+	_atomic_spin_unlock_irqrestore(uaddr, flags);
+
+	return ret;
 }
 
 #endif /*__KERNEL__*/
-- 
1.7.6


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: tst-cputimer1 and tst-timer4
  2011-07-08 15:00                     ` Rolf Eike Beer
@ 2011-07-08 15:18                       ` John David Anglin
  0 siblings, 0 replies; 14+ messages in thread
From: John David Anglin @ 2011-07-08 15:18 UTC (permalink / raw)
  To: Rolf Eike Beer; +Cc: linux-parisc

On 7/8/2011 11:00 AM, Rolf Eike Beer wrote:
> Am Mittwoch 06 Juli 2011, 00:17:03 schrieben Sie:
>> On Tue, Jul 5, 2011 at 1:51 PM, Rolf Eike Beer<eike@sf-mail.de>  wrote:
>>> Am Dienstag 05 Juli 2011, 19:28:35 schrieben Sie:
>>>> On Mon, Jul 4, 2011 at 2:59 PM, Carlos O'Donell
>>>> <carlos@systemhalted.org>
>>> wrote:
>>>>> On Mon, Jul 4, 2011 at 2:50 PM, John David Anglin
>>>>> <dave.anglin@bell.net>
>>> wrote:
>>>>>> The Linux man page says the mprotect addr must be a valid pointer or
>>>>>> a multiple of PAGESIZE.
>>>>>> It's not clear what the mprotect call is trying to protect but it is
>>>>>> definitely not page aligned.
>>>>> It's trying to protect the new stack for the thread, which is
>>>>> obviously in the wrong spot.
>>>> Good news.
>>>>
>>>> I have fixed tst-cputimer1.
>>> You also wrote "OK, fixed the kernel." So where an when will these fixes
>>> show up?
>> Unfortunately I hosed my setup and I didn't have git installed to make
>> a proper diff so here's a diff versus a somewhat recent tree.
>>
>>   WIP patch: http://www.parisc-linux.org/~carlos/futex.diff
> Half of your diff (e.g. the int->u32 conversion) is already in upstream. I've
> rediffed it against Linus tree and fixed some of the whitespace damage.
>
> Eike

+	if (!ret)
+		if (val == oldval)
+			ret = put_user(newval, uaddr);

This bit is ugly.

Dave

-- 
John David Anglin    dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-07-08 15:18 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-03 17:16 tst-cputimer1 and tst-timer4 Carlos O'Donell
2011-07-03 23:22 ` John David Anglin
2011-07-04 15:21   ` Carlos O'Donell
2011-07-04 15:26     ` Carlos O'Donell
2011-07-04 16:09       ` Carlos O'Donell
2011-07-04 16:58       ` John David Anglin
2011-07-04 17:49         ` Carlos O'Donell
2011-07-04 18:50           ` John David Anglin
2011-07-04 18:59             ` Carlos O'Donell
2011-07-05 17:28               ` Carlos O'Donell
2011-07-05 17:51                 ` Rolf Eike Beer
2011-07-05 22:17                   ` Carlos O'Donell
2011-07-08 15:00                     ` Rolf Eike Beer
2011-07-08 15:18                       ` John David Anglin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.