* Problems with the new pthread clock implementations
@ 2020-11-21  6:59 Michael Kerrisk (man-pages)
  2020-11-21 17:54 ` Mike Crowe
  2020-11-21 19:50 ` Thomas Gleixner
  0 siblings, 2 replies; 7+ messages in thread
From: Michael Kerrisk (man-pages) @ 2020-11-21  6:59 UTC (permalink / raw)
  To: Mike Crowe; +Cc: Adhemerval Zanella, libc-alpha, Linux API

Hello Mike,

I've been taking a closer look at the new pthread*clock*() APIs:
pthread_clockjoin_np()
pthread_cond_clockwait()
pthread_mutex_clocklock()
pthread_rwlock_clockrdlock()
pthread_rwlock_clockwrlock()
sem_clockwait()

I've noticed some oddities, and at least a couple of bugs.

First off, I just note that there's a surprisingly wide variation in
the low-level futex calls being used by these APIs when implementing
CLOCK_REALTIME support:

pthread_rwlock_clockrdlock()
pthread_rwlock_clockwrlock()
sem_clockwait()
pthread_cond_clockwait()
    futex(addr,
        FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 3,
        {abstimespec}, FUTEX_BITSET_MATCH_ANY)
    (This implementation seems to be okay)

pthread_clockjoin_np()
    futex(addr, FUTEX_WAIT, 48711, {reltimespec})
    (This is buggy; see below.)

pthread_mutex_clocklock()
    futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec})
    (There are bugs and strangeness here; see below.)

=== Bugs ===

pthread_clockjoin_np():
As already recognized in another mail thread [1], this API accepts any
kind of clockid, even though it doesn't support most of them.

A further bug is that even if CLOCK_REALTIME is specified,
pthread_clockjoin_np() sleeps against the CLOCK_MONOTONIC clock.
(Currently it does this for *all* clockid values.) The problem here is
that the FUTEX_WAIT operation sleeps against the CLOCK_MONOTONIC clock
by default. At the least, the FUTEX_CLOCK_REALTIME flag is required in
this case. Alternatively, an implementation using
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME (like the first four
functions listed above) might be appropriate.

===

pthread_mutex_clocklock():
First of all, there's a small oddity. Suppose we specify the clockid
as CLOCK_REALTIME, and then while the call is blocked, we set the
realtime clock backwards. Then, there will be further futex calls to
handle the modification to the clock (and possibly multiple futex
calls if the realtime clock is adjusted repeatedly):

        futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec1})
        futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec2})
        ...

Then there seems to be a bug. If we specify the clockid as
CLOCK_REALTIME, and while the call is blocked we set the realtime
clock forwards, then the blocking interval of the call is *not*
adjusted (shortened), when of course it should be.

===

I've attached a couple of small test programs at the end of this mail.

Thanks,

Michael

[1] https://lore.kernel.org/linux-man/20201119120034.GA20599@mcrowe.com/

=============================================

/* t_pthread_clockjoin_np.c

   Copyright Michael Kerrisk, 2020. Licensed GPLv2+
*/
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>       /* clock_gettime(), struct timespec */
#include <unistd.h>
#include <errno.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

#define errExitEN(en, msg) \
                        do { errno = en; perror(msg); \
                             exit(EXIT_FAILURE); } while (0)

static void *
threadFunc(void *arg)
{
    printf("New thread started\n");
    sleep(2000);
    return NULL;
}

int
main(int argc, char *argv[])
{
    pthread_t t1;
    int nsecs;
    int s;

    nsecs = (argc > 1) ? atoi(argv[1]) : 10;

    s = pthread_create(&t1, NULL, threadFunc, (void *) 1);
    if (s != 0)
        errExitEN(s, "pthread_create");

    struct timespec ts, tsm1, tsm2;
    if (clock_gettime(CLOCK_MONOTONIC, &tsm1) == -1)
        errExit("clock_gettime");

    clockid_t clockid = CLOCK_REALTIME;
    if (clock_gettime(clockid, &ts) == -1)
        errExit("clock_gettime");

    ts.tv_sec += nsecs;
    printf("ts.tv_sec = %ld\n", (long) ts.tv_sec);

    s = pthread_clockjoin_np(t1, NULL, clockid, &ts);
    if (s == 0) {
        printf("pthread_clockjoin_np() successfully joined\n");
    } else {
        if (s == ETIMEDOUT)
            printf("pthread_clockjoin_np() timed out\n");
        else
            errExitEN(s, "pthread_clockjoin_np");
    }

    if (clock_gettime(CLOCK_MONOTONIC, &tsm2) == -1)
        errExit("clock_gettime");

    printf("CLOCK_MONOTONIC diff = %ld secs\n",
           (long) (tsm2.tv_sec - tsm1.tv_sec));

    exit(EXIT_SUCCESS);
}

=============================================


/* t_pthread_mutex_clocklock.c

   Copyright Michael Kerrisk, 2020. Licensed GPLv2+
*/
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>       /* clock_gettime(), struct timespec */
#include <unistd.h>
#include <errno.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

#define errExitEN(en, msg) \
                        do { errno = en; perror(msg); \
                             exit(EXIT_FAILURE); } while (0)

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

static void *
threadFunc(void *arg)
{
    int s;

    printf("Thread created\n");
    s = pthread_mutex_lock(&mtx);
    if (s != 0)
        errExitEN(s, "pthread_mutex_lock");
    else
        printf("Thread 1 locked mutex\n");

    pause();
    return NULL;
}

int
main(int argc, char *argv[])
{
    pthread_t t1;
    int nsecs;
    int s;

    nsecs = (argc > 1) ? atoi(argv[1]) : 10;

    s = pthread_create(&t1, NULL, threadFunc, argv[1]);
    if (s != 0)
        errExitEN(s, "pthread_create");

    usleep(100000);

    struct timespec ts, tsm1, tsm2;
    if (clock_gettime(CLOCK_MONOTONIC, &tsm1) == -1)
        errExit("clock_gettime");

    clockid_t clockid = CLOCK_REALTIME;
    if (clock_gettime(clockid, &ts) == -1)
        errExit("clock_gettime");

    ts.tv_sec += nsecs;
    printf("ts.tv_sec = %ld\n", (long) ts.tv_sec);

    s = pthread_mutex_clocklock(&mtx, clockid, &ts);
    if (s == 0) {
        printf("pthread_mutex_clocklock() got lock\n");
    } else {
        if (s == ETIMEDOUT)
            printf("pthread_mutex_clocklock() timed out\n");
        else
            errExitEN(s, "pthread_mutex_clocklock");
    }

    if (clock_gettime(CLOCK_MONOTONIC, &tsm2) == -1)
        errExit("clock_gettime");

    printf("CLOCK_MONOTONIC diff = %ld secs\n",
           (long) (tsm2.tv_sec - tsm1.tv_sec));

    exit(EXIT_SUCCESS);
}


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Problems with the new pthread clock implementations
  2020-11-21  6:59 Problems with the new pthread clock implementations Michael Kerrisk (man-pages)
@ 2020-11-21 17:54 ` Mike Crowe
  2020-11-21 21:41   ` Michael Kerrisk (man-pages)
  2020-11-21 19:50 ` Thomas Gleixner
  1 sibling, 1 reply; 7+ messages in thread
From: Mike Crowe @ 2020-11-21 17:54 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: Adhemerval Zanella, libc-alpha, Linux API

Hi Michael,

On Saturday 21 November 2020 at 07:59:04 +0100, Michael Kerrisk (man-pages) wrote:
> I've been taking a closer look at the new pthread*clock*() APIs:
> pthread_clockjoin_np()
> pthread_cond_clockwait()
> pthread_mutex_clocklock()
> pthread_rwlock_clockrdlock()
> pthread_rwlock_clockwrlock()
> sem_clockwait()
> 
> I've noticed some oddities, and at least a couple of bugs.
> 
> First off, I just note that there's a surprisingly wide variation in
> the low-level futex calls being used by these APIs when implementing
> CLOCK_REALTIME support:
> 
> pthread_rwlock_clockrdlock()
> pthread_rwlock_clockwrlock()
> sem_clockwait()
> pthread_cond_clockwait()
>     futex(addr,
>         FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 3,
>         {abstimespec}, FUTEX_BITSET_MATCH_ANY)
>     (This implementation seems to be okay)
> 
> pthread_clockjoin_np()
>     futex(addr, FUTEX_WAIT, 48711, {reltimespec})
>     (This is buggy; see below.)
> 
> pthread_mutex_clocklock()
>     futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec})
>     (There's bugs and strangeness here; see below.)

Yes, I found it very confusing when I started adding the new
pthread*clock*() functions, and it still takes me a while to find the right
functions when I look now. I believe that Adhemerval was talking about
simplifying some of this.

> === Bugs ===
> 
> pthread_clockjoin_np():
> As already recognized in another mail thread [1], this API accepts any
> kind of clockid, even though it doesn't support most of them.

Well, it sort of does support them at least as well as many other
implementations of such functions do - it just calculates a relative
timeout using the supplied clock and then uses that. But, ...

> A further bug is that even if CLOCK_REALTIME is specified,
> pthread_clockjoin_np() sleeps against the CLOCK_MONOTONIC clock.
> (Currently it does this for *all* clockid values.) The problem here is
> that the FUTEX_WAIT operation sleeps against the CLOCK_MONOTONIC clock
> by default. At the least, the FUTEX_CLOCK_REALTIME is required for
> this case. Alternatively, an implementation using
> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME (like the first four
> functions listed above) might be appropriate.

...this is one downside of that. That bug was inherited from the
existing pthread_timedjoin_np implementation.

I was planning to write a patch to just limit the supported clocks, but
I'll have a go at fixing the bug you describe properly instead first which
will limit the implementation to CLOCK_REALTIME and CLOCK_MONOTONIC anyway.

> ===
> 
> pthread_mutex_clocklock():
> First of all, there's a small oddity. Suppose we specify the clockid
> as CLOCK_REALTIME, and then while the call is blocked, we set the
> clock realtime backwards. Then, there will be further futex calls to
> handle the modification to the clock (and possibly multiple futex
> calls if the realtime clock is adjusted repeatedly):
> 
>         futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec1})
>         futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec2})
>         ...
> 
> Then there seems to be a bug. If we specify the clockid as
> CLOCK_REALTIME, and while the call is blocked we set the realtime
> clock forwards, then the blocking interval of the call is *not*
> adjusted (shortened), when of course it should be.

This is because __lll_clocklock_wait ends up doing a relative wait rather
than an absolute one so it suffers from the same problem as
pthread_clockjoin_np.

> ===
> 
> I've attached a couple of small test programs at the end of this mail.

Thanks for looking at this in detail.

AFAIK, all of these bugs also affected the corresponding existing
pthread*timed*() functions. When I added the new pthread*clock*() functions
I was trying to keep my changes to the existing code as small as possible.
(I started out trying to "scratch the itch" of libstdc++
std::condition_variable::wait_for misbehaving[2] when the system clock was
warped in 2015 and all of this ballooned from that.) Now that the functions
are in, I think there's definitely scope for improving the implementation
and I will try to do so as time and confidence allows - the implementation
of __pthread_mutex_clocklock_common scares me greatly!

Thanks.

Mike.

[1] https://lore.kernel.org/linux-man/20201119120034.GA20599@mcrowe.com/
[2] https://randombitsofuselessinformation.blogspot.com/2018/06/its-about-time-monotonic-time.html


* Re: Problems with the new pthread clock implementations
  2020-11-21  6:59 Problems with the new pthread clock implementations Michael Kerrisk (man-pages)
  2020-11-21 17:54 ` Mike Crowe
@ 2020-11-21 19:50 ` Thomas Gleixner
  2020-11-21 20:10   ` Mike Crowe
  1 sibling, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2020-11-21 19:50 UTC (permalink / raw)
  To: mtk.manpages, Mike Crowe; +Cc: Adhemerval Zanella, libc-alpha, Linux API

On Sat, Nov 21 2020 at 07:59, Michael Kerrisk wrote:
> I've been taking a closer look at the new pthread*clock*() APIs:
> pthread_clockjoin_np()
> pthread_cond_clockwait()
> pthread_mutex_clocklock()
> pthread_rwlock_clockrdlock()
> pthread_rwlock_clockwrlock()
> sem_clockwait()

Is there any at least rudimentary specification of these functions and
what they are supposed to do?

Thanks,

        tglx


* Re: Problems with the new pthread clock implementations
  2020-11-21 19:50 ` Thomas Gleixner
@ 2020-11-21 20:10   ` Mike Crowe
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Crowe @ 2020-11-21 20:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: mtk.manpages@gmail.com, Adhemerval Zanella, libc-alpha, Linux API

Hi Thomas,

On Saturday 21 November 2020 at 20:50:38 +0100, Thomas Gleixner wrote:
> On Sat, Nov 21 2020 at 07:59, Michael Kerrisk wrote:
> > I've been taking a closer look at the new pthread*clock*() APIs:
> > pthread_clockjoin_np()
> > pthread_cond_clockwait()
> > pthread_mutex_clocklock()
> > pthread_rwlock_clockrdlock()
> > pthread_rwlock_clockwrlock()
> > sem_clockwait()
> 
> Is there any at least rudimentary specification of these functions and
> what they are supposed to do?

The short answer is that they are like their "timed" equivalents, but they
also accept a clockid. The longer answer is at
https://www.austingroupbugs.net/view.php?id=1216 and if you have access to
it
https://www.opengroup.org/austin/restricted/newapis-p1/Additional_APIs_for_Issue_8-part1-changedpages.pdf

HTH.

Mike.


* Re: Problems with the new pthread clock implementations
  2020-11-21 17:54 ` Mike Crowe
@ 2020-11-21 21:41   ` Michael Kerrisk (man-pages)
  2020-11-23 14:38     ` Adhemerval Zanella
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Kerrisk (man-pages) @ 2020-11-21 21:41 UTC (permalink / raw)
  To: Mike Crowe; +Cc: mtk.manpages, Adhemerval Zanella, libc-alpha, Linux API

Hello Mike,

On 11/21/20 6:54 PM, Mike Crowe wrote:
> Hi Michael,
> 
> On Saturday 21 November 2020 at 07:59:04 +0100, Michael Kerrisk (man-pages) wrote:
>> I've been taking a closer look at the new pthread*clock*() APIs:
>> pthread_clockjoin_np()
>> pthread_cond_clockwait()
>> pthread_mutex_clocklock()
>> pthread_rwlock_clockrdlock()
>> pthread_rwlock_clockwrlock()
>> sem_clockwait()
>>
>> I've noticed some oddities, and at least a couple of bugs.
>>
>> First off, I just note that there's a surprisingly wide variation in
>> the low-level futex calls being used by these APIs when implementing
>> CLOCK_REALTIME support:
>>
>> pthread_rwlock_clockrdlock()
>> pthread_rwlock_clockwrlock()
>> sem_clockwait()
>> pthread_cond_clockwait()
>>     futex(addr,
>>         FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 3,
>>         {abstimespec}, FUTEX_BITSET_MATCH_ANY)
>>     (This implementation seems to be okay)
>>
>> pthread_clockjoin_np()
>>     futex(addr, FUTEX_WAIT, 48711, {reltimespec})
>>     (This is buggy; see below.)
>>
>> pthread_mutex_clocklock()
>>     futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec})
>>     (There's bugs and strangeness here; see below.)
> 
> Yes, I found it very confusing when I started adding the new
> pthread*clock*() functions, and it still takes me a while to find the right
> functions when I look now. I believe that Adhemerval was talking about
> simplifying some of this.
> 
>> === Bugs ===
>>
>> pthread_clockjoin_np():
>> As already recognized in another mail thread [1], this API accepts any
>> kind of clockid, even though it doesn't support most of them.
> 
> Well, it sort of does support them at least as well as many other
> implementations of such functions do - it just calculates a relative
> timeout using the supplied clock and then uses that. But, ...
> 
>> A further bug is that even if CLOCK_REALTIME is specified,
>> pthread_clockjoin_np() sleeps against the CLOCK_MONOTONIC clock.
>> (Currently it does this for *all* clockid values.) The problem here is
>> that the FUTEX_WAIT operation sleeps against the CLOCK_MONOTONIC clock
>> by default. At the least, the FUTEX_CLOCK_REALTIME is required for
>> this case. Alternatively, an implementation using
>> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME (like the first four
>> functions listed above) might be appropriate.
> 
> ...this is one downside of that. That bug was inherited from the
> existing pthread_timedjoin_np implementation.

Oh -- that's pretty sad. I hadn't considered the possibility that
the (longstanding) "timed" functions might have the same bug.

> I was planning to write a patch to just limit the supported clocks, but
> I'll have a go at fixing the bug you describe properly instead first which
> will limit the implementation to CLOCK_REALTIME and CLOCK_MONOTONIC anyway.
> 
>> ===
>>
>> pthread_mutex_clocklock():
>> First of all, there's a small oddity. Suppose we specify the clockid
>> as CLOCK_REALTIME, and then while the call is blocked, we set the
>> clock realtime backwards. Then, there will be further futex calls to
>> handle the modification to the clock (and possibly multiple futex
>> calls if the realtime clock is adjusted repeatedly):
>>
>>         futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec1})
>>         futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec2})
>>         ...
>>
>> Then there seems to be a bug. If we specify the clockid as
>> CLOCK_REALTIME, and while the call is blocked we set the realtime
>> clock forwards, then the blocking interval of the call is *not*
>> adjusted (shortened), when of course it should be.
> 
> This is because __lll_clocklock_wait ends up doing a relative wait rather
> than an absolute one so it suffers from the same problem as
> pthread_clockjoin_np.
> 
>> ===
>>
>> I've attached a couple of small test programs at the end of this mail.
> 
> Thanks for looking at this in detail.
> 
> AFAIK, all of these bugs also affected the corresponding existing
> pthread*timed*() functions. When I added the new pthread*clock*() functions
> I was trying to keep my changes to the existing code as small as possible.
> (I started out trying to "scratch the itch" of libstdc++
> std::condition_variable::wait_for misbehaving[2] when the system clock was
> warped in 2015 and all of this ballooned from that.) Now that the functions
> are in, I think there's definitely scope for improving the implementation
> and I will try to do so as time and confidence allows - the implementation
> of __pthread_mutex_clocklock_common scares me greatly!

Yeah, a lot of glibc code is not so easy to follow... Thank you for
taking a look.

Cheers,

Michael

> [1] https://lore.kernel.org/linux-man/20201119120034.GA20599@mcrowe.com/
> [2] https://randombitsofuselessinformation.blogspot.com/2018/06/its-about-time-monotonic-time.html
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: Problems with the new pthread clock implementations
  2020-11-21 21:41   ` Michael Kerrisk (man-pages)
@ 2020-11-23 14:38     ` Adhemerval Zanella
  2020-11-23 16:12       ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 7+ messages in thread
From: Adhemerval Zanella @ 2020-11-23 14:38 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages), Mike Crowe; +Cc: libc-alpha, Linux API



On 21/11/2020 18:41, Michael Kerrisk (man-pages) wrote:
> Hello Mike,
> 
> On 11/21/20 6:54 PM, Mike Crowe wrote:
>> Hi Michael,
>>
>> On Saturday 21 November 2020 at 07:59:04 +0100, Michael Kerrisk (man-pages) wrote:
>>> I've been taking a closer look at the new pthread*clock*() APIs:
>>> pthread_clockjoin_np()
>>> pthread_cond_clockwait()
>>> pthread_mutex_clocklock()
>>> pthread_rwlock_clockrdlock()
>>> pthread_rwlock_clockwrlock()
>>> sem_clockwait()
>>>
>>> I've noticed some oddities, and at least a couple of bugs.
>>>
>>> First off, I just note that there's a surprisingly wide variation in
>>> the low-level futex calls being used by these APIs when implementing
>>> CLOCK_REALTIME support:
>>>
>>> pthread_rwlock_clockrdlock()
>>> pthread_rwlock_clockwrlock()
>>> sem_clockwait()
>>> pthread_cond_clockwait()
>>>     futex(addr,
>>>         FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 3,
>>>         {abstimespec}, FUTEX_BITSET_MATCH_ANY)
>>>     (This implementation seems to be okay)
>>>
>>> pthread_clockjoin_np()
>>>     futex(addr, FUTEX_WAIT, 48711, {reltimespec})
>>>     (This is buggy; see below.)
>>>
>>> pthread_mutex_clocklock()
>>>     futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec})
>>>     (There's bugs and strangeness here; see below.)
>>
>> Yes, I found it very confusing when I started adding the new
>> pthread*clock*() functions, and it still takes me a while to find the right
>> functions when I look now. I believe that Adhemerval was talking about
>> simplifying some of this.
>>
>>> === Bugs ===
>>>
>>> pthread_clockjoin_np():
>>> As already recognized in another mail thread [1], this API accepts any
>>> kind of clockid, even though it doesn't support most of them.
>>
>> Well, it sort of does support them at least as well as many other
>> implementations of such functions do - it just calculates a relative
>> timeout using the supplied clock and then uses that. But, ...
>>
>>> A further bug is that even if CLOCK_REALTIME is specified,
>>> pthread_clockjoin_np() sleeps against the CLOCK_MONOTONIC clock.
>>> (Currently it does this for *all* clockid values.) The problem here is
>>> that the FUTEX_WAIT operation sleeps against the CLOCK_MONOTONIC clock
>>> by default. At the least, the FUTEX_CLOCK_REALTIME is required for
>>> this case. Alternatively, an implementation using
>>> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME (like the first four
>>> functions listed above) might be appropriate.
>>
>> ...this is one downside of that. That bug was inherited from the
>> existing pthread_timedjoin_np implementation.
> 

Indeed, I am working on refactoring the internal futex usage to fix
this issue.  On reflection, I see that using FUTEX_WAIT_BITSET without
any additional clock adjustment should be better than calling
clock_gettime plus FUTEX_WAIT.

> Oh -- that's pretty sad. I hadn't considered the possibility that
> the (longstanding) "timed" functions might have the same bug.
> 
>> I was planning to write a patch to just limit the supported clocks, but
>> I'll have a go at fixing the bug you describe properly instead first which
>> will limit the implementation to CLOCK_REALTIME and CLOCK_MONOTONIC anyway.

I am working on this as well.

>>
>>> ===
>>>
>>> pthread_mutex_clocklock():
>>> First of all, there's a small oddity. Suppose we specify the clockid
>>> as CLOCK_REALTIME, and then while the call is blocked, we set the
>>> clock realtime backwards. Then, there will be further futex calls to
>>> handle the modification to the clock (and possibly multiple futex
>>> calls if the realtime clock is adjusted repeatedly):
>>>
>>>         futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec1})
>>>         futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec2})
>>>         ...
>>>
>>> Then there seems to be a bug. If we specify the clockid as
>>> CLOCK_REALTIME, and while the call is blocked we set the realtime
>>> clock forwards, then the blocking interval of the call is *not*
>>> adjusted (shortened), when of course it should be.
>>
>> This is because __lll_clocklock_wait ends up doing a relative wait rather
>> than an absolute one so it suffers from the same problem as
>> pthread_clockjoin_np.

It is another indication that it would be better to use FUTEX_WAIT_BITSET
instead.

>>
>>> ===
>>>
>>> I've attached a couple of small test programs at the end of this mail.
>>
>> Thanks for looking at this in detail.
>>
>> AFAIK, all of these bugs also affected the corresponding existing
>> pthread*timed*() functions. When I added the new pthread*clock*() functions
>> I was trying to keep my changes to the existing code as small as possible.
>> (I started out trying to "scratch the itch" of libstdc++
>> std::condition_variable::wait_for misbehaving[2] when the system clock was
>> warped in 2015 and all of this ballooned from that.) Now that the functions
>> are in, I think there's definitely scope for improving the implementation
>> and I will try to do so as time and confidence allows - the implementation
>> of __pthread_mutex_clocklock_common scares me greatly!
> 
> Yeah, a lot of glibc code is not so easy to follow... Thank you for
> taking a look.

The futex code is indeed convoluted: it was initially all coded in
lowlevellock.h.  Then it was moved out to lowlevellock-futex.h for the
NaCl port (which required overriding the futex call to implement
the NaCl libcalls).

Later, futex-internal.h was added, which duplicated some
lowlevellock-futex.h calls with inline functions plus some error
checking (as libstdc++ does).

So currently the nptl pthread code uses both interfaces, which is
confusing and duplicates the logic.  The patchset I am working on makes
the NPTL code use only futex-internal.h, removes some unneeded functions
from it, and simplifies the functions provided by futex-internal.c.

The idea is that lowlevellock-futex.h would be used only by
lowlevellock.h and futex-internal.h.  I am also wondering whether it is
worth keeping lowlevellock-futex.h at all; it is just a thin wrapper
over the futex syscall with a *lot* of unused macros and without proper
y2038 support (which futex-internal.h does have).


* Re: Problems with the new pthread clock implementations
  2020-11-23 14:38     ` Adhemerval Zanella
@ 2020-11-23 16:12       ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Kerrisk (man-pages) @ 2020-11-23 16:12 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Mike Crowe, libc-alpha, Linux API

Hello Adhemerval,

On Mon, 23 Nov 2020 at 15:39, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 21/11/2020 18:41, Michael Kerrisk (man-pages) wrote:
> > Hello Mike,
> >
> > On 11/21/20 6:54 PM, Mike Crowe wrote:
> >> Hi Michael,
> >>
> >> On Saturday 21 November 2020 at 07:59:04 +0100, Michael Kerrisk (man-pages) wrote:
> >>> I've been taking a closer look at the new pthread*clock*() APIs:
> >>> pthread_clockjoin_np()
> >>> pthread_cond_clockwait()
> >>> pthread_mutex_clocklock()
> >>> pthread_rwlock_clockrdlock()
> >>> pthread_rwlock_clockwrlock()
> >>> sem_clockwait()
> >>>
> >>> I've noticed some oddities, and at least a couple of bugs.
> >>>
> >>> First off, I just note that there's a surprisingly wide variation in
> >>> the low-level futex calls being used by these APIs when implementing
> >>> CLOCK_REALTIME support:
> >>>
> >>> pthread_rwlock_clockrdlock()
> >>> pthread_rwlock_clockwrlock()
> >>> sem_clockwait()
> >>> pthread_cond_clockwait()
> >>>     futex(addr,
> >>>         FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 3,
> >>>         {abstimespec}, FUTEX_BITSET_MATCH_ANY)
> >>>     (This implementation seems to be okay)
> >>>
> >>> pthread_clockjoin_np()
> >>>     futex(addr, FUTEX_WAIT, 48711, {reltimespec})
> >>>     (This is buggy; see below.)
> >>>
> >>> pthread_mutex_clocklock()
> >>>     futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec})
> >>>     (There's bugs and strangeness here; see below.)
> >>
> >> Yes, I found it very confusing when I started adding the new
> >> pthread*clock*() functions, and it still takes me a while to find the right
> >> functions when I look now. I believe that Adhemerval was talking about
> >> simplifying some of this.
> >>
> >>> === Bugs ===
> >>>
> >>> pthread_clockjoin_np():
> >>> As already recognized in another mail thread [1], this API accepts any
> >>> kind of clockid, even though it doesn't support most of them.
> >>
> >> Well, it sort of does support them at least as well as many other
> >> implementations of such functions do - it just calculates a relative
> >> timeout using the supplied clock and then uses that. But, ...
> >>
> >>> A further bug is that even if CLOCK_REALTIME is specified,
> >>> pthread_clockjoin_np() sleeps against the CLOCK_MONOTONIC clock.
> >>> (Currently it does this for *all* clockid values.) The problem here is
> >>> that the FUTEX_WAIT operation sleeps against the CLOCK_MONOTONIC clock
> >>> by default. At the least, the FUTEX_CLOCK_REALTIME is required for
> >>> this case. Alternatively, an implementation using
> >>> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME (like the first four
> >>> functions listed above) might be appropriate.
> >>
> >> ...this is one downside of that. That bug was inherited from the
> >> existing pthread_timedjoin_np implementation.
> >
>
> Indeed, I am working on refactoring the internal futex usage to fix
> this issue.  On reflection, I see that using FUTEX_WAIT_BITSET without
> any additional clock adjustment should be better than calling
> clock_gettime plus FUTEX_WAIT.

Yes, that would be my estimate as well.
>
> > Oh -- that's pretty sad. I hadn't considered the possibility that
> > the (longstanding) "timed" functions might have the same bug.
> >
> >> I was planning to write a patch to just limit the supported clocks, but
> >> I'll have a go at fixing the bug you describe properly instead first which
> >> will limit the implementation to CLOCK_REALTIME and CLOCK_MONOTONIC anyway.
>
> I am working on this as well.

Thanks.

> >>> ===
> >>>
> >>> pthread_mutex_clocklock():
> >>> First of all, there's a small oddity. Suppose we specify the clockid
> >>> as CLOCK_REALTIME, and then while the call is blocked, we set the
> >>> clock realtime backwards. Then, there will be further futex calls to
> >>> handle the modification to the clock (and possibly multiple futex
> >>> calls if the realtime clock is adjusted repeatedly):
> >>>
> >>>         futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec1})
> >>>         futex(addr, FUTEX_WAIT_PRIVATE, 2, {reltimespec2})
> >>>         ...
> >>>
> >>> Then there seems to be a bug. If we specify the clockid as
> >>> CLOCK_REALTIME, and while the call is blocked we set the realtime
> >>> clock forwards, then the blocking interval of the call is *not*
> >>> adjusted (shortened), when of course it should be.
> >>
> >> This is because __lll_clocklock_wait ends up doing a relative wait rather
> >> than an absolute one so it suffers from the same problem as
> >> pthread_clockjoin_np.
>
> It is another indication that it would be better to use FUTEX_WAIT_BITSET
> instead.

:-)

> >>> ===
> >>>
> >>> I've attached a couple of small test programs at the end of this mail.
> >>
> >> Thanks for looking at this in detail.
> >>
> >> AFAIK, all of these bugs also affected the corresponding existing
> >> pthread*timed*() functions. When I added the new pthread*clock*() functions
> >> I was trying to keep my changes to the existing code as small as possible.
> >> (I started out trying to "scratch the itch" of libstdc++
> >> std::condition_variable::wait_for misbehaving[2] when the system clock was
> >> warped in 2015 and all of this ballooned from that.) Now that the functions
> >> are in, I think there's definitely scope for improving the implementation
> >> and I will try to do so as time and confidence allows - the implementation
> >> of __pthread_mutex_clocklock_common scares me greatly!
> >
> > Yeah, a lot of glibc code is not so easy to follow... Thank you for
> > taking a look.
>
> The futex code is indeed convoluted: it was initially all coded in
> lowlevellock.h.  Then it was moved out to lowlevellock-futex.h for the
> NaCl port (which required overriding the futex call to implement
> the NaCl libcalls).
>
> Later, futex-internal.h was added, which duplicated some
> lowlevellock-futex.h calls with inline functions plus some error
> checking (as libstdc++ does).
>
> So currently the nptl pthread code uses both interfaces, which is
> confusing and duplicates the logic.  The patchset I am working on makes
> the NPTL code use only futex-internal.h, removes some unneeded functions
> from it, and simplifies the functions provided by futex-internal.c.
>
> The idea is that lowlevellock-futex.h would be used only by
> lowlevellock.h and futex-internal.h.  I am also wondering whether it is
> worth keeping lowlevellock-futex.h at all; it is just a thin wrapper
> over the futex syscall with a *lot* of unused macros and without proper
> y2038 support (which futex-internal.h does have).

Thanks, Adhemerval. And more generally, thanks for all of the clean-up
work you do in the codebase. That's just so valuable!

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

