* Re: [PATCH] timerfd_create.2: Included return value 0
From: Michael Kerrisk (man-pages) @ 2020-03-26 8:45 UTC
To: devi R.K; +Cc: mtk.manpages, linux-man

Hello Devi,

On 3/18/20 3:04 PM, devi R.K wrote:
> Added a return value 0 for timerfd_read which happens when there is a
> bigger backward time drift.
>
> Signed-off-by: DEVI RK <devi.feb27@gmail.com>

Can you say some more please about how you verified this and/or
point me at the relevant kernel source code? At a simple attempt,
I can't replicate the behavior you describe.

Thanks,

Michael

> ---
>  man2/timerfd_create.2 | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/man2/timerfd_create.2 b/man2/timerfd_create.2
> index 066e0be..ccced98 100644
> --- a/man2/timerfd_create.2
> +++ b/man2/timerfd_create.2
> @@ -317,6 +317,10 @@ fails with the error
>  if the real-time clock undergoes a discontinuous change.
>  (This allows the reading application to discover
>  such discontinuous changes to the clock.)
> +.IP
> +A
> +.BR read (2)
> +may return 0 if the system clock undergoes a discontinuous change.
>  .TP
>  .BR poll "(2), " select "(2) (and similar)"
>  The file descriptor is readable
>

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

* Re: [PATCH] timerfd_create.2: Included return value 0
From: Michael Kerrisk (man-pages) @ 2020-03-29 21:06 UTC
To: devi R.K; +Cc: mtk.manpages, linux-man, Thomas Gleixner, lkml, arul.jeniston

[CC widened]

Hello Devi,

[It's best not to top post. I've rearranged the reply for readability.]

[Greetings, Thomas; now I recall a conversation we had in Lyon :-) ]

On 3/27/20 5:29 AM, devi R.K wrote:
>
> On Thu, Mar 26, 2020 at 2:16 PM Michael Kerrisk (man-pages) <
> mtk.manpages@gmail.com> wrote:
>
>> Hello Devi,
>>
>> On 3/18/20 3:04 PM, devi R.K wrote:
>>> Added a return value 0 for timerfd_read which happens when there is a
>>> bigger backward time drift.
>>>
>>> Signed-off-by: DEVI RK <devi.feb27@gmail.com>
>>
>> Can you say some more please about how you verified this and/or
>> point me at the relevant kernel source code? At a simple attempt,
>> I can't replicate the behavior you describe.

> We have written a program using real time clock and it has been raised to
> the community.
>
> https://lore.kernel.org/lkml/alpine.DEB.2.21.1908191943280.1796@nanos.tec.linutronix.de/T/

It would be helpful if you had pointed me to this in the first
place, and also CCed the people from that earlier discussion.
I've widened the CC list.

Thanks for pointing me at that thread. In particular, the test
program at
https://lore.kernel.org/lkml/alpine.DEB.2.21.1908191943280.1796@nanos.tec.linutronix.de/T/#m489d81abdfbb2699743e18c37657311f8d52a4cd

I've now replicated this behavior with a program of my own.

>>> ---
>>>  man2/timerfd_create.2 | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/man2/timerfd_create.2 b/man2/timerfd_create.2
>>> index 066e0be..ccced98 100644
>>> --- a/man2/timerfd_create.2
>>> +++ b/man2/timerfd_create.2
>>> @@ -317,6 +317,10 @@ fails with the error
>>>  if the real-time clock undergoes a discontinuous change.
>>>  (This allows the reading application to discover
>>>  such discontinuous changes to the clock.)
>>> +.IP
>>> +A
>>> +.BR read (2)
>>> +may return 0 if the system clock undergoes a discontinuous change.
>>>  .TP
>>>  .BR poll "(2), " select "(2) (and similar)"
>>>  The file descriptor is readable

I think this patch does not really capture the details
properly. The immediately preceding paragraph says:

    If the associated clock is either CLOCK_REALTIME or
    CLOCK_REALTIME_ALARM, the timer is absolute
    (TFD_TIMER_ABSTIME), and the flag TFD_TIMER_CANCEL_ON_SET
    was specified when calling timerfd_settime(), then read(2)
    fails with the error ECANCELED if the real-time clock
    undergoes a discontinuous change. (This allows the reading
    application to discover such discontinuous changes to the
    clock.)

Following on from that, I think we should have a paragraph that says
something like:

    If the associated clock is either CLOCK_REALTIME or
    CLOCK_REALTIME_ALARM, the timer is absolute
    (TFD_TIMER_ABSTIME), and the flag TFD_TIMER_CANCEL_ON_SET
    was not specified when calling timerfd_settime(), then a
    discontinuous negative change to the clock
    (e.g., clock_settime(2)) may cause read(2) to unblock, but
    return a value of 0 (i.e., no bytes read), if the clock
    change occurs after the time expired, but before the
    read(2) on the timerfd file descriptor.

This seems consistent with Thomas's observations in
https://lore.kernel.org/lkml/alpine.DEB.2.21.1908191943280.1796@nanos.tec.linutronix.de/T/#m49b78122b573a2749a05b720dc9fa036546db490

Do you agree?

Thomas, if you had a moment, your input would, as always, be
appreciated.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

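A minimal sketch of how a reader of a timerfd might tell apart the
outcomes discussed above: a normal 8-byte expiration count, the
ECANCELED error for TFD_TIMER_CANCEL_ON_SET timers, and the zero-byte
return described in the proposed paragraph. The names tfd_result,
tfd_classify() and tfd_read_expirations() are hypothetical and exist
only for this illustration; read(2) on a descriptor obtained from
timerfd_create(2) is the only real interface it relies on.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not part of any kernel API. */
enum tfd_result {
    TFD_EXPIRED,    /* *count holds the number of expirations */
    TFD_ZERO_READ,  /* read(2) returned 0 (no bytes read) */
    TFD_CANCELED,   /* read(2) failed with ECANCELED */
    TFD_ERROR       /* any other failure or short read */
};

/* Map the return value of read(2) and the saved errno to a result code. */
static enum tfd_result tfd_classify(ssize_t n, int err)
{
    if (n == (ssize_t)sizeof(uint64_t))
        return TFD_EXPIRED;
    if (n == 0)
        return TFD_ZERO_READ;
    if (n < 0 && err == ECANCELED)
        return TFD_CANCELED;
    return TFD_ERROR;
}

/* Read the expiration counter from a timerfd, retrying on EINTR. */
static enum tfd_result tfd_read_expirations(int fd, uint64_t *count)
{
    ssize_t n;

    do {
        n = read(fd, count, sizeof(*count));
    } while (n < 0 && errno == EINTR);

    /* errno is only inspected by tfd_classify() when n is negative. */
    return tfd_classify(n, errno);
}
```

What to do on TFD_ZERO_READ is an application decision; one plausible
choice is to treat it as "no expiration yet" and go back to waiting in
poll(2) or read(2) rather than treating it as end-of-file.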
* Re: [PATCH] timerfd_create.2: Included return value 0
From: Thomas Gleixner @ 2020-03-29 22:50 UTC
To: Michael Kerrisk (man-pages), devi R.K
Cc: mtk.manpages, linux-man, lkml, arul.jeniston

Michael,

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> [Greetings, Thomas; now I recall a conversation we had in Lyon :-) ]

Hehe.

> I think this patch does not really capture the details
> properly. The immediately preceding paragraph says:
>
>     If the associated clock is either CLOCK_REALTIME or
>     CLOCK_REALTIME_ALARM, the timer is absolute
>     (TFD_TIMER_ABSTIME), and the flag TFD_TIMER_CANCEL_ON_SET
>     was specified when calling timerfd_settime(), then read(2)
>     fails with the error ECANCELED if the real-time clock
>     undergoes a discontinuous change. (This allows the reading
>     application to discover such discontinuous changes to the
>     clock.)
>
> Following on from that, I think we should have a paragraph that says
> something like:
>
>     If the associated clock is either CLOCK_REALTIME or
>     CLOCK_REALTIME_ALARM, the timer is absolute
>     (TFD_TIMER_ABSTIME), and the flag TFD_TIMER_CANCEL_ON_SET
>     was not specified when calling timerfd_settime(), then a
>     discontinuous negative change to the clock
>     (e.g., clock_settime(2)) may cause read(2) to unblock, but
>     return a value of 0 (i.e., no bytes read), if the clock
>     change occurs after the time expired, but before the
>     read(2) on the timerfd file descriptor.

Yes, that's correct. Accurate as always!

This is pretty much in line with clock_nanosleep(CLOCK_REALTIME,
TIMER_ABSTIME) which has a similar problem vs. observability in user
space.

clock_nanosleep(2) mutters:

  "POSIX.1 specifies that after changing the value of the CLOCK_REALTIME
   clock via clock_settime(2), the new clock value shall be used to
   determine the time at which a thread blocked on an absolute
   clock_nanosleep() will wake up; if the new clock value falls past the
   end of the sleep interval, then the clock_nanosleep() call will return
   immediately."

which can be interpreted as a guarantee that clock_nanosleep() never
returns prematurely, i.e. the assert() in the below code would indicate
a kernel failure:

   ret = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &expiry, NULL);
   if (!ret) {
         clock_gettime(CLOCK_REALTIME, &now);
         assert(now >= expiry);
   }

But that assert can trigger when CLOCK_REALTIME was modified after the
timer fired and the kernel decided to wake up the task and let it return
to user space.

   clock_nanosleep(..., &expiry)
     arm_timer(expires);
     schedule();

   -> timer interrupt
      now = ktime_get_real();
      if (expires <= now)
   -------------------------------- After this point
         wakeup();                  clock_settime(2) or
                                    adjtimex(2) which
                                    makes CLOCK_REALTIME
                                    jump back far enough will
                                    cause the above assert
                                    to trigger.

     ...
     return from syscall (retval == 0)

There is no guarantee against clock_settime() coming after the
wakeup. Even if we put another check into the return to user path then
we won't catch a clock_settime() which comes right after that and before
user space invokes clock_gettime().

POSIX spec Issue 7 (2018 edition) says:

  The suspension for the absolute clock_nanosleep() function (that is,
  with the TIMER_ABSTIME flag set) shall be in effect at least until the
  value of the corresponding clock reaches the absolute time specified by
  rqtp.

And that's what the kernel implements for clock_nanosleep() and timerfd
behaves exactly the same way.

The wakeup of the waiter, i.e. task blocked in clock_nanosleep(2),
read(2), poll(2), is not happening _before_ the absolute time specified
is reached.

If clock_settime() happens right before the expiry check, then it does
the right thing, but any modification to the clock after the wakeup
cannot be mitigated. At least not in a way which would make the assert()
in the example code above a reliable indicator for a kernel fail.

That's the reason why I rejected the attempt to mitigate that particular
0 tick issue in timerfd as it would just scratch a particular itch but
still not provide any guarantee. So having the '0' return documented is
the right way to go.

Thanks,

        tglx

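The assert() in the fragment above compares two struct timespec values
directly, which C does not allow, so a compilable version needs an
explicit comparison. The following is a minimal sketch under that
assumption; ts_before() and sleep_until_realtime() are hypothetical
helper names, not existing interfaces. Instead of asserting, it reports
through set_back whether CLOCK_REALTIME reads earlier than the requested
expiry after a successful wakeup, which, per the explanation above, can
happen and does not indicate a kernel bug.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: true if *a is strictly earlier than *b. */
static bool ts_before(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec;
    return a->tv_nsec < b->tv_nsec;
}

/*
 * Hypothetical helper: sleep until the absolute CLOCK_REALTIME time
 * *expiry. Returns 0 on success or an error number. On success,
 * *set_back is true if the clock reads earlier than *expiry after the
 * wakeup, i.e. it was moved backwards after the kernel's expiry check.
 */
static int sleep_until_realtime(const struct timespec *expiry, bool *set_back)
{
    struct timespec now;
    int ret;

    *set_back = false;

    /* clock_nanosleep() returns the error number directly, not via errno. */
    do {
        ret = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, expiry, NULL);
    } while (ret == EINTR);
    if (ret != 0)
        return ret;

    if (clock_gettime(CLOCK_REALTIME, &now) != 0)
        return errno;

    *set_back = ts_before(&now, expiry);
    return 0;
}
```

A caller that sees set_back == true can decide whether to sleep again
until the same expiry or to carry on; the check itself is still racy in
the way described above, since the clock can be changed again right
after clock_gettime() returns.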
* Re: [PATCH] timerfd_create.2: Included return value 0
From: Michael Kerrisk (man-pages) @ 2020-03-30 21:09 UTC
To: Thomas Gleixner, devi R.K; +Cc: mtk.manpages, linux-man, lkml, arul.jeniston

Hello Thomas,

On 3/30/20 12:50 AM, Thomas Gleixner wrote:
> Michael,
>
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>> [Greetings, Thomas; now I recall a conversation we had in Lyon :-) ]
>
> Hehe.
>
>> I think this patch does not really capture the details
>> properly. The immediately preceding paragraph says:
>>
>>     If the associated clock is either CLOCK_REALTIME or
>>     CLOCK_REALTIME_ALARM, the timer is absolute
>>     (TFD_TIMER_ABSTIME), and the flag TFD_TIMER_CANCEL_ON_SET
>>     was specified when calling timerfd_settime(), then read(2)
>>     fails with the error ECANCELED if the real-time clock
>>     undergoes a discontinuous change. (This allows the reading
>>     application to discover such discontinuous changes to the
>>     clock.)
>>
>> Following on from that, I think we should have a paragraph that says
>> something like:
>>
>>     If the associated clock is either CLOCK_REALTIME or
>>     CLOCK_REALTIME_ALARM, the timer is absolute
>>     (TFD_TIMER_ABSTIME), and the flag TFD_TIMER_CANCEL_ON_SET
>>     was not specified when calling timerfd_settime(), then a
>>     discontinuous negative change to the clock
>>     (e.g., clock_settime(2)) may cause read(2) to unblock, but
>>     return a value of 0 (i.e., no bytes read), if the clock
>>     change occurs after the time expired, but before the
>>     read(2) on the timerfd file descriptor.
>
> Yes, that's correct. Accurate as always!

Thanks. (It took me a while to nut it out, actually.)

> This is pretty much in line with clock_nanosleep(CLOCK_REALTIME,
> TIMER_ABSTIME) which has a similar problem vs. observability in user
> space.
>
> clock_nanosleep(2) mutters:
>
>   "POSIX.1 specifies that after changing the value of the CLOCK_REALTIME
>    clock via clock_settime(2), the new clock value shall be used to
>    determine the time at which a thread blocked on an absolute
>    clock_nanosleep() will wake up; if the new clock value falls past the
>    end of the sleep interval, then the clock_nanosleep() call will return
>    immediately."
>
> which can be interpreted as a guarantee that clock_nanosleep() never
> returns prematurely,

<nod>

> i.e. the assert() in the below code would indicate
> a kernel failure:
>
>    ret = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &expiry, NULL);
>    if (!ret) {
>          clock_gettime(CLOCK_REALTIME, &now);
>          assert(now >= expiry);
>    }
>
> But that assert can trigger when CLOCK_REALTIME was modified after the
> timer fired and the kernel decided to wake up the task and let it return
> to user space.
>
>    clock_nanosleep(..., &expiry)
>      arm_timer(expires);
>      schedule();
>
>    -> timer interrupt
>       now = ktime_get_real();
>       if (expires <= now)
>    -------------------------------- After this point
>          wakeup();                  clock_settime(2) or
>                                     adjtimex(2) which
>                                     makes CLOCK_REALTIME
>                                     jump back far enough will
>                                     cause the above assert
>                                     to trigger.
>
>      ...
>      return from syscall (retval == 0)
>
> There is no guarantee against clock_settime() coming after the
> wakeup. Even if we put another check into the return to user path then
> we won't catch a clock_settime() which comes right after that and before
> user space invokes clock_gettime().

<nod>

> POSIX spec Issue 7 (2018 edition) says:
>
>   The suspension for the absolute clock_nanosleep() function (that is,
>   with the TIMER_ABSTIME flag set) shall be in effect at least until the
>   value of the corresponding clock reaches the absolute time specified by
>   rqtp.
>
> And that's what the kernel implements for clock_nanosleep() and timerfd
> behaves exactly the same way.
>
> The wakeup of the waiter, i.e. task blocked in clock_nanosleep(2),
> read(2), poll(2), is not happening _before_ the absolute time specified
> is reached.
>
> If clock_settime() happens right before the expiry check, then it does
> the right thing, but any modification to the clock after the wakeup
> cannot be mitigated. At least not in a way which would make the assert()
> in the example code above a reliable indicator for a kernel fail.
>
> That's the reason why I rejected the attempt to mitigate that particular
> 0 tick issue in timerfd as it would just scratch a particular itch but
> still not provide any guarantee. So having the '0' return documented is
> the right way to go.

Thanks for the incredibly detailed follow-up Thomas (which all
landed in my commit message). I've applied a patch almost exactly
the same as the text I showed above (and it's pushed to Git).

All of 2020 is a bust, I expect. Perhaps we see us at a conference
in 2021 ;-).

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

* Re: [PATCH] timerfd_create.2: Included return value 0
From: Thomas Gleixner @ 2020-03-30 21:29 UTC
To: Michael Kerrisk (man-pages), devi R.K
Cc: mtk.manpages, linux-man, lkml, arul.jeniston

Michael,

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> On 3/30/20 12:50 AM, Thomas Gleixner wrote:
>> That's the reason why I rejected the attempt to mitigate that particular
>> 0 tick issue in timerfd as it would just scratch a particular itch but
>> still not provide any guarantee. So having the '0' return documented is
>> the right way to go.
>
> Thanks for the incredibly detailed follow-up Thomas (which all
> landed in my commit message). I've applied a patch almost exactly
> the same as the text I showed above (and it's pushed to Git).

 "timerfd_create.2: Negetive..."

That first word after the colon looks weird :)

> All of 2020 is a bust, I expect. Perhaps we see us at a conference
> in 2021 ;-).

Let's see how that evolves and in the worst case we can meet for a beer
once we gained antibodies one way or the other. For now let's stay safe
and all I can offer is a virtual 'Prosit' :)

Thanks,

        tglx