Hi Thomas, I see a strange behavior on the parisc platform, for which I'm not sure if it's intended or if there is a bug somewhere. The program calls timerfd_settime() and sets a timer (e.g. sec=0, nsec=100000000). Directly after setting the timer it calls timerfd_gettime() and receives (sec=0, nsec=103914413). The second nsec is higher than the initial nsec value which was set. Does timerfd_settime() maybe tries to add the initial time it takes to start the timer? Any idea or hint? Thanks, Helge Background: I'm debugging the build-failure on debian for the liblinux-fd-perl package: https://buildd.debian.org/status/fetch.php?pkg=liblinux-fd-perl&arch=hppa&ver=0.011-1&stamp=1443355593 Here is a log which I get from kernel after adding some printks. The problematic line is #3. [ 465.888000] timerfd_settime: interval (sec=0, nsec=0) it_value (sec=0, nsec=100000000) [ 466.196000] timerfd_settime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=100000000) [ 466.300000] timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=103914413) [ 466.404000] timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=97444552) [ 466.508000] timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=92611704) [ 466.616000] timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=87376859) [ 466.720000] timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=82538534) [ 466.824000] timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=77293289) [ 466.928000] timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=72501584) [ 467.036000] timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=67377673) [ 467.140000] timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=62631601) [ 467.244000] timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=57401824)
Helge, On Sat, 26 Dec 2015, Helge Deller wrote: > I see a strange behavior on the parisc platform, for which I'm not sure if > it's intended or if there is a bug somewhere. > The program calls timerfd_settime() and sets a timer (e.g. sec=0, nsec=100000000). > Directly after setting the timer it calls timerfd_gettime() and receives > (sec=0, nsec=103914413). > The second nsec is higher than the initial nsec value which was set. > > Does timerfd_settime() maybe tries to add the initial time it takes to start > the timer? > > Any idea or hint? Yes. This is a fallout from the power aware batching magic. Interesting that nobody noticed this within 7 years. Does the patch below fix your issue? Thanks, tglx 8<---------------- diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 76dd4f0da5ca..0f4a3e8734f1 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -268,7 +268,7 @@ static inline s64 hrtimer_get_expires_ns(const struct hrtimer *timer) static inline ktime_t hrtimer_expires_remaining(const struct hrtimer *timer) { - return ktime_sub(timer->node.expires, timer->base->get_time()); + return ktime_sub(timer->_softexpires, timer->base->get_time()); } static inline ktime_t hrtimer_cb_get_time(struct hrtimer *timer)
On 29.12.2015 10:44, Thomas Gleixner wrote: > On Sat, 26 Dec 2015, Helge Deller wrote: >> I see a strange behavior on the parisc platform, for which I'm not sure if >> it's intended or if there is a bug somewhere. > >> The program calls timerfd_settime() and sets a timer (e.g. sec=0, nsec=100000000). >> Directly after setting the timer it calls timerfd_gettime() and receives >> (sec=0, nsec=103914413). >> The second nsec is higher than the initial nsec value which was set. >> >> Does timerfd_settime() maybe tries to add the initial time it takes to start >> the timer? >> >> Any idea or hint? > > Yes. This is a fallout from the power aware batching magic. Interesting that > nobody noticed this within 7 years. > > Does the patch below fix your issue? No, the patch below doesn't help. I still see: [ 644.916000] timerfd_settime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=100000000) [ 645.024000] timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=103029949) Helge > diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h > index 76dd4f0da5ca..0f4a3e8734f1 100644 > --- a/include/linux/hrtimer.h > +++ b/include/linux/hrtimer.h > @@ -268,7 +268,7 @@ static inline s64 hrtimer_get_expires_ns(const struct hrtimer *timer) > > static inline ktime_t hrtimer_expires_remaining(const struct hrtimer *timer) > { > - return ktime_sub(timer->node.expires, timer->base->get_time()); > + return ktime_sub(timer->_softexpires, timer->base->get_time()); > } > > static inline ktime_t hrtimer_cb_get_time(struct hrtimer *timer)
On Tue, 29 Dec 2015, Helge Deller wrote:
> No, the patch below doesn't help.
>
> I still see:
> [ 644.916000] timerfd_settime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=100000000)
> [ 645.024000] timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=103029949)
>
Right. It can't help. Sorry for the distraction.
Looking deeper I found the issue. It's caused by CONFIG_TIME_LOW_RES. See the
comment in hrtimer_start_range_ns(). We round the expiry time to the next
jiffies period to avoid short timeouts. Assuming you are running with HZ=250
this is exactly 4ms. So that's where your extra time comes from.
Not sure what to do about that.
Thanks,
tglx
Hi Thomas, * Thomas Gleixner <tglx@linutronix.de>: > On Tue, 29 Dec 2015, Helge Deller wrote: > > No, the patch below doesn't help. > > > > I still see: > > [ 644.916000] timerfd_settime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=100000000) > > [ 645.024000] timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=103029949) > > > > Right. It can't help. Sorry for the distraction. > > Looking deeper I found the issue. It's caused by CONFIG_TIME_LOW_RES. See the > comment in hrtimer_start_range_ns(). We round the expiry time to the next > jiffies period to avoid short timeouts. Assuming you are running with HZ=250 > this is exactly 4ms. So that's where your extra time comes from. Yes, that seems right. > Not sure what to do about that. The patch below seems to solve the issue for me. But I'm not sure if there might be any side-effects. What do you think? Does the patch seems correct? It just adjusts to values returned to userspace (and thus hides the internal roundings). Thanks, Helge diff --git a/fs/timerfd.c b/fs/timerfd.c index b94fa6c..9b6cc73 100644 --- a/fs/timerfd.c +++ b/fs/timerfd.c @@ -152,8 +152,16 @@ static ktime_t timerfd_get_remaining(struct timerfd_ctx *ctx) if (isalarm(ctx)) remaining = alarm_expires_remaining(&ctx->t.alarm); - else + else { remaining = hrtimer_expires_remaining(&ctx->t.tmr); +#ifdef CONFIG_TIME_LOW_RES + /* Expiry time was rounded up in hrtimer_start_range_ns() + * to the next jiffies period to avoid short timeouts. + * Subtract it here again to avoid userspace seeing higher + * values than expected. */ + remaining.tv64 -= hrtimer_resolution; +#endif + } return remaining.tv64 < 0 ? ktime_set(0, 0): remaining; }
On architectures where CONFIG_TIME_LOW_RES is set (currently parisc, m68k, frv) calling timerfd_settime() to set a timeout and directly afterwards calling timerfd_gettime() to get the remaining time shows a behaviour that the remaining time can be higher than the originally set timeout. Here is an example showing the problem, that the nsec value of it_value is higher than the set nsec value: timerfd_settime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=100000000) timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=103029949) This happens because in hrtimer_start_range_ns() the expiry time is rounded to the next jiffies period to avoid short timeouts. When running with HZ=250 this is 4ms which can be seen in the example above. This behaviour confuses userspace programs. For example, the debian liblinux-fd-perl and libnanomsg-raw-perl packges fail to build because the timeout is higher than expected. Fix this problem by subtracting the value added by hrtimer_start_range_ns() before returning the timeout back to userspace. Signed-off-by: Helge Deller <deller@gmx.de> diff --git a/fs/timerfd.c b/fs/timerfd.c index b94fa6c..098ac0a 100644 --- a/fs/timerfd.c +++ b/fs/timerfd.c @@ -152,8 +152,17 @@ static ktime_t timerfd_get_remaining(struct timerfd_ctx *ctx) if (isalarm(ctx)) remaining = alarm_expires_remaining(&ctx->t.alarm); - else + else { remaining = hrtimer_expires_remaining(&ctx->t.tmr); +#ifdef CONFIG_TIME_LOW_RES + /* Expiry time was rounded up in hrtimer_start_range_ns() + * to the next jiffies period to avoid short timeouts. + * Subtract it here again to avoid userspace seeing higher + * values than originally programmed. */ + if (!(ctx->settime_flags & TFD_TIMER_ABSTIME)) + remaining.tv64 -= hrtimer_resolution; +#endif + } return remaining.tv64 < 0 ? ktime_set(0, 0): remaining; }
On architectures where CONFIG_TIME_LOW_RES is set (currently parisc, m68k, frv) calling timerfd_settime() to set a timeout and directly afterwards calling timerfd_gettime() to get the remaining time shows a behaviour that the remaining time can be higher than the originally set timeout. Here is an example showing the problem, that the nsec value of it_value is higher than the set nsec value: timerfd_settime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=100000000) timerfd_gettime: interval (sec=0, nsec=100000000) it_value (sec=0, nsec=103029949) This happens because in hrtimer_start_range_ns() the expiry time is rounded to the next jiffies period to avoid short timeouts. When running with HZ=250 this is 4ms which can be seen in the example above. This behaviour confuses userspace programs. For example, the debian liblinux-fd-perl and libnanomsg-raw-perl packges fail to build because the timeout is higher than expected. Fix this problem by subtracting the value added by hrtimer_start_range_ns() before returning the timeout back to userspace. Signed-off-by: Helge Deller <deller@gmx.de> diff --git a/fs/timerfd.c b/fs/timerfd.c index b94fa6c..098ac0a 100644 --- a/fs/timerfd.c +++ b/fs/timerfd.c @@ -152,8 +152,17 @@ static ktime_t timerfd_get_remaining(struct timerfd_ctx *ctx) if (isalarm(ctx)) remaining = alarm_expires_remaining(&ctx->t.alarm); - else + else { remaining = hrtimer_expires_remaining(&ctx->t.tmr); +#ifdef CONFIG_TIME_LOW_RES + /* Expiry time was rounded up in hrtimer_start_range_ns() + * to the next jiffies period to avoid short timeouts. + * Subtract it here again to avoid userspace seeing higher + * values than originally programmed. */ + if (!(ctx->settime_flags & TFD_TIMER_ABSTIME)) + remaining.tv64 -= hrtimer_resolution; +#endif + } return remaining.tv64 < 0 ? ktime_set(0, 0): remaining; }