* [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 'hrtimer_forward_now()' returns zero due to bigger backward time drift. This causes timerfd_read to return 0. As per man page, read on timerfd is not expected to return 0. This patch fixes this problem. Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com> @ 2019-08-16 8:32 arul.jeniston 2019-08-16 9:05 ` [PATCH] FS: timerfd: [Trimmed unreadable long subject line ] Thomas Gleixner 2019-08-16 10:22 ` [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function Arul Jeniston 0 siblings, 2 replies; 26+ messages in thread From: arul.jeniston @ 2019-08-16 8:32 UTC (permalink / raw) To: viro, tglx; +Cc: linux-fsdevel, linux-kernel, arul_mc, ARUL JENISTON MC From: ARUL JENISTON MC <arul.jeniston@gmail.com> --- fs/timerfd.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/timerfd.c b/fs/timerfd.c index 6a6fc8aa1de7..f5094e070e9a 100644 --- a/fs/timerfd.c +++ b/fs/timerfd.c @@ -284,8 +284,16 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count, &ctx->t.alarm, ctx->tintv) - 1; alarm_restart(&ctx->t.alarm); } else { - ticks += hrtimer_forward_now(&ctx->t.tmr, - ctx->tintv) - 1; + u64 nooftimeo = hrtimer_forward_now(&ctx->t.tmr, + ctx->tintv); + /* + * ticks shouldn't become zero at this point. + * Ignore if hrtimer_forward_now returns 0 + * due to larger backward time drift. + */ + if (likely(nooftimeo)) { + ticks += nooftimeo - 1; + } hrtimer_restart(&ctx->t.tmr); } } -- 2.11.0 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: [Trimmed unreadable long subject line ] 2019-08-16 8:32 [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 'hrtimer_forward_now()' returns zero due to bigger backward time drift. This causes timerfd_read to return 0. As per man page, read on timerfd is not expected to return 0. This patch fixes this problem. Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com> arul.jeniston @ 2019-08-16 9:05 ` Thomas Gleixner 2019-08-16 10:22 ` [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function Arul Jeniston 1 sibling, 0 replies; 26+ messages in thread From: Thomas Gleixner @ 2019-08-16 9:05 UTC (permalink / raw) To: ARUL JENISTON MC; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Arul, On Fri, 16 Aug 2019, arul.jeniston@gmail.com wrote: Please write the subject as a short precise sentence which fits into 70 characters and put the long explanation into the body, i.e. here. See Documentation/process/submitting-patches.rst > From: ARUL JENISTON MC <arul.jeniston@gmail.com> This lacks a Signed-off-by > --- > fs/timerfd.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/fs/timerfd.c b/fs/timerfd.c > index 6a6fc8aa1de7..f5094e070e9a 100644 > --- a/fs/timerfd.c > +++ b/fs/timerfd.c > @@ -284,8 +284,16 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count, > &ctx->t.alarm, ctx->tintv) - 1; > alarm_restart(&ctx->t.alarm); > } else { > - ticks += hrtimer_forward_now(&ctx->t.tmr, > - ctx->tintv) - 1; > + u64 nooftimeo = hrtimer_forward_now(&ctx->t.tmr, > + ctx->tintv); > + /* > + * ticks shouldn't become zero at this point. > + * Ignore if hrtimer_forward_now returns 0 > + * due to larger backward time drift. > + */ What? Backward time drift? Can you please explain how this would happen? > + if (likely(nooftimeo)) { > + ticks += nooftimeo - 1; > + } Pointless brackets. Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-08-16 8:32 [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 'hrtimer_forward_now()' returns zero due to bigger backward time drift. This causes timerfd_read to return 0. As per man page, read on timerfd is not expected to return 0. This patch fixes this problem. Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com> arul.jeniston 2019-08-16 9:05 ` [PATCH] FS: timerfd: [Trimmed unreadable long subject line ] Thomas Gleixner @ 2019-08-16 10:22 ` Arul Jeniston 2019-08-16 10:45 ` Thomas Gleixner 1 sibling, 1 reply; 26+ messages in thread From: Arul Jeniston @ 2019-08-16 10:22 UTC (permalink / raw) To: viro, tglx; +Cc: linux-fsdevel, linux-kernel, arul_mc, ARUL JENISTON MC 'hrtimer_forward_now()' returns zero due to bigger backward time drift. This causes timerfd_read to return 0. As per man page, read on timerfd is not expected to return 0. This problem is well explained in https://lkml.org/lkml/2019/7/31/442 . This patch fixes this problem. Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com> --- fs/timerfd.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/timerfd.c b/fs/timerfd.c index 6a6fc8aa1de7..f5094e070e9a 100644 --- a/fs/timerfd.c +++ b/fs/timerfd.c @@ -284,8 +284,16 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count, &ctx->t.alarm, ctx->tintv) - 1; alarm_restart(&ctx->t.alarm); } else { - ticks += hrtimer_forward_now(&ctx->t.tmr, - ctx->tintv) - 1; + u64 nooftimeo = hrtimer_forward_now(&ctx->t.tmr, + ctx->tintv); + /* + * ticks shouldn't become zero at this point. + * Ignore if hrtimer_forward_now returns 0 + * due to larger backward time drift. + */ + if (likely(nooftimeo)) { + ticks += nooftimeo - 1; + } hrtimer_restart(&ctx->t.tmr); } } -- 2.11.0 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-08-16 10:22 ` [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function Arul Jeniston @ 2019-08-16 10:45 ` Thomas Gleixner 2019-08-16 16:55 ` Arul Jeniston 0 siblings, 1 reply; 26+ messages in thread From: Thomas Gleixner @ 2019-08-16 10:45 UTC (permalink / raw) To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Arul, On Fri, 16 Aug 2019, Arul Jeniston wrote: > Subject: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. The prefix is not 'FS: timerfd:' 1) The usual prefix for fs/* is: 'fs:' but... 2) git log fs/timerfd.c gives you a pretty good hint for the proper prefix. Look at the commits which actually do functional changes to that file, not at those which do (sub)system wide cleanups/adjustments. Also 'timerfd_read function' can be written as 'timerfd_read()' which spares the redundant function and clearly marks it as function via the brackets. > 'hrtimer_forward_now()' returns zero due to bigger backward time drift. > This causes timerfd_read to return 0. As per man page, read on timerfd > is not expected to return 0. > This problem is well explained in https://lkml.org/lkml/2019/7/31/442 1) The explanation needs to be in the changelog itself. Links can point to discussions, bug-reports which have supplementary information. 2) Please do not use lkml.org links. Again: Please read and follow Documentation/process/submitting-patches.rst > . This patch fixes this problem. > Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com> Missing empty line before Signed-off-by. Please use git-log to see how changelogs are properly formatted. Also: 'This patch fixes this problem' is not helpful at all. Again see the document I already pointed you to. > --- > fs/timerfd.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/fs/timerfd.c b/fs/timerfd.c > index 6a6fc8aa1de7..f5094e070e9a 100644 > --- a/fs/timerfd.c > +++ b/fs/timerfd.c > @@ -284,8 +284,16 @@ static ssize_t timerfd_read(struct file *file, > char __user *buf, size_t count, > &ctx->t.alarm, ctx->tintv) - 1; > alarm_restart(&ctx->t.alarm); > } else { > - ticks += hrtimer_forward_now(&ctx->t.tmr, > - ctx->tintv) - 1; > + u64 nooftimeo = hrtimer_forward_now(&ctx->t.tmr, > + ctx->tintv); nooftimeo is pretty non-intuitive. The function documentation of hrtimer_forward_now() says: Returns the number of overruns. So the obvious variable name is overruns, right? > + /* > + * ticks shouldn't become zero at this point. > + * Ignore if hrtimer_forward_now returns 0 > + * due to larger backward time drift. Again. This explanation does not make any sense at all. Time does not go backwards, except if it is CLOCK_REALTIME which can be set backwards via clock_settime() or settimeofday(). > + */ > + if (likely(nooftimeo)) { > + ticks += nooftimeo - 1; > + } Again: Pointless brackets. If you disagree with my review comment, then tell me in a reply. If not, then fix it. If you decide to ignore my comments, then don't wonder if I ignore your patches. Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-08-16 10:45 ` Thomas Gleixner @ 2019-08-16 16:55 ` Arul Jeniston 2019-08-16 17:00 ` Arul Jeniston 0 siblings, 1 reply; 26+ messages in thread From: Arul Jeniston @ 2019-08-16 16:55 UTC (permalink / raw) To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Hi tglx, Thank you for your comments. Please find my commend in-lined On Fri, Aug 16, 2019 at 4:15 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > Arul, > > On Fri, 16 Aug 2019, Arul Jeniston wrote: > > > Subject: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. > > The prefix is not 'FS: timerfd:' > > 1) The usual prefix for fs/* is: 'fs:' but... > > 2) git log fs/timerfd.c gives you a pretty good hint for the proper > prefix. Look at the commits which actually do functional changes to that > file, not at those which do (sub)system wide cleanups/adjustments. > > Also 'timerfd_read function' can be written as 'timerfd_read()' which > spares the redundant function and clearly marks it as function via the > brackets. > > > 'hrtimer_forward_now()' returns zero due to bigger backward time drift. > > This causes timerfd_read to return 0. As per man page, read on timerfd > > is not expected to return 0. > > This problem is well explained in https://lkml.org/lkml/2019/7/31/442 > > 1) The explanation needs to be in the changelog itself. Links can point to > discussions, bug-reports which have supplementary information. > > 2) Please do not use lkml.org links. > > Again: Please read and follow Documentation/process/submitting-patches.rst > > > . This patch fixes this problem. > > Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com> > > Missing empty line before Signed-off-by. Please use git-log to see how > changelogs are properly formatted. > > Also: 'This patch fixes this problem' is not helpful at all. Again see the > document I already pointed you to. > Agreed. Would incorporate all the above comments. > > --- > > fs/timerfd.c | 12 ++++++++++-- > > 1 file changed, 10 insertions(+), 2 deletions(-) > > > > diff --git a/fs/timerfd.c b/fs/timerfd.c > > index 6a6fc8aa1de7..f5094e070e9a 100644 > > --- a/fs/timerfd.c > > +++ b/fs/timerfd.c > > @@ -284,8 +284,16 @@ static ssize_t timerfd_read(struct file *file, > > char __user *buf, size_t count, > > &ctx->t.alarm, ctx->tintv) - 1; > > alarm_restart(&ctx->t.alarm); > > } else { > > - ticks += hrtimer_forward_now(&ctx->t.tmr, > > - ctx->tintv) - 1; > > + u64 nooftimeo = hrtimer_forward_now(&ctx->t.tmr, > > + ctx->tintv); > > nooftimeo is pretty non-intuitive. The function documentation of > hrtimer_forward_now() says: > > Returns the number of overruns. > > So the obvious variable name is overruns, right? > Agreed. Would change the variable name to overruns. > > + /* > > + * ticks shouldn't become zero at this point. > > + * Ignore if hrtimer_forward_now returns 0 > > + * due to larger backward time drift. > > Again. This explanation does not make any sense at all. > > Time does not go backwards, except if it is CLOCK_REALTIME which can be set > backwards via clock_settime() or settimeofday(). > > > + */ > > + if (likely(nooftimeo)) { > > + ticks += nooftimeo - 1; > > + } > > Again: Pointless brackets. > > If you disagree with my review comment, then tell me in a reply. If not, > then fix it. If you decide to ignore my comments, then don't wonder if I > ignore your patches. > We use CLOCK_REALTIME while creating timer_fd. Can read() on timerfd return 0 when the clock is set to CLOCK_REALTIME? We have Intel rangely 4 cpu system running debian stretch linux kernel. The current clock source is set to tsc. During our testing, we observed the time drifts backward occasionally. Through kernel instrumentation, we observed, sometimes clocksource_delta() finds the current time lesser than last time. and returns 0 delta. This causes the following code flow to return 0 ktime_get()-->timekeeping_get_ns()-->timekeeping_get_delta()-->clocksource_delta() > Thanks, > > tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-08-16 16:55 ` Arul Jeniston @ 2019-08-16 17:00 ` Arul Jeniston 2019-08-16 21:17 ` Thomas Gleixner 0 siblings, 1 reply; 26+ messages in thread From: Arul Jeniston @ 2019-08-16 17:00 UTC (permalink / raw) To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Adding few more data points... On Fri, Aug 16, 2019 at 10:25 PM Arul Jeniston <arul.jeniston@gmail.com> wrote: > > Hi tglx, > > Thank you for your comments. > Please find my commend in-lined > > On Fri, Aug 16, 2019 at 4:15 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > Arul, > > > > On Fri, 16 Aug 2019, Arul Jeniston wrote: > > > > > Subject: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. > > > > The prefix is not 'FS: timerfd:' > > > > 1) The usual prefix for fs/* is: 'fs:' but... > > > > 2) git log fs/timerfd.c gives you a pretty good hint for the proper > > prefix. Look at the commits which actually do functional changes to that > > file, not at those which do (sub)system wide cleanups/adjustments. > > > > Also 'timerfd_read function' can be written as 'timerfd_read()' which > > spares the redundant function and clearly marks it as function via the > > brackets. > > > > > 'hrtimer_forward_now()' returns zero due to bigger backward time drift. > > > This causes timerfd_read to return 0. As per man page, read on timerfd > > > is not expected to return 0. > > > This problem is well explained in https://lkml.org/lkml/2019/7/31/442 > > > > 1) The explanation needs to be in the changelog itself. Links can point to > > discussions, bug-reports which have supplementary information. > > > > 2) Please do not use lkml.org links. > > > > Again: Please read and follow Documentation/process/submitting-patches.rst > > > > > . This patch fixes this problem. > > > Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com> > > > > Missing empty line before Signed-off-by. Please use git-log to see how > > changelogs are properly formatted. > > > > Also: 'This patch fixes this problem' is not helpful at all. Again see the > > document I already pointed you to. > > > > Agreed. Would incorporate all the above comments. > > > > --- > > > fs/timerfd.c | 12 ++++++++++-- > > > 1 file changed, 10 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/timerfd.c b/fs/timerfd.c > > > index 6a6fc8aa1de7..f5094e070e9a 100644 > > > --- a/fs/timerfd.c > > > +++ b/fs/timerfd.c > > > @@ -284,8 +284,16 @@ static ssize_t timerfd_read(struct file *file, > > > char __user *buf, size_t count, > > > &ctx->t.alarm, ctx->tintv) - 1; > > > alarm_restart(&ctx->t.alarm); > > > } else { > > > - ticks += hrtimer_forward_now(&ctx->t.tmr, > > > - ctx->tintv) - 1; > > > + u64 nooftimeo = hrtimer_forward_now(&ctx->t.tmr, > > > + ctx->tintv); > > > > nooftimeo is pretty non-intuitive. The function documentation of > > hrtimer_forward_now() says: > > > > Returns the number of overruns. > > > > So the obvious variable name is overruns, right? > > > > Agreed. Would change the variable name to overruns. > > > > + /* > > > + * ticks shouldn't become zero at this point. > > > + * Ignore if hrtimer_forward_now returns 0 > > > + * due to larger backward time drift. > > > > Again. This explanation does not make any sense at all. > > > > Time does not go backwards, except if it is CLOCK_REALTIME which can be set > > backwards via clock_settime() or settimeofday(). > > > > > + */ > > > + if (likely(nooftimeo)) { > > > + ticks += nooftimeo - 1; > > > + } > > > > Again: Pointless brackets. > > > > If you disagree with my review comment, then tell me in a reply. If not, > > then fix it. If you decide to ignore my comments, then don't wonder if I > > ignore your patches. > > > > We use CLOCK_REALTIME while creating timer_fd. > Can read() on timerfd return 0 when the clock is set to CLOCK_REALTIME? > > We have Intel rangely 4 cpu system running debian stretch linux > kernel. The current clock source is set to tsc. During our testing, we > observed the time drifts backward occasionally. Through kernel > instrumentation, we observed, sometimes clocksource_delta() finds the > current time lesser than last time. and returns 0 delta. > This causes the following code flow to return a time which is lesser than previously fetched time. ktime_get()-->timekeeping_get_ns()-->timekeeping_get_delta()-->clocksource_delta() Since ktime_get() returns a time which is lesser than the expiry time, hrtimer_forward_now return 0. This in-turn causes timerfd_read to return 0. Is it not a bug? > > Thanks, > > > > tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-08-16 17:00 ` Arul Jeniston @ 2019-08-16 21:17 ` Thomas Gleixner [not found] ` <CACAVd4hT6QYtgtDsBcgy7c_s9WVBAH+1m0r5geBe7BUWJWYhbA@mail.gmail.com> 0 siblings, 1 reply; 26+ messages in thread From: Thomas Gleixner @ 2019-08-16 21:17 UTC (permalink / raw) To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Arul, On Fri, 16 Aug 2019, Arul Jeniston wrote: > Adding few more data points... Can you please trim your replies? It's annoying to have to search for the meat of your mail by scrolling down several pages and paying attention to not skip something useful inside of useless information. > On Fri, Aug 16, 2019 at 10:25 PM Arul Jeniston <arul.jeniston@gmail.com> wrote: > > On Fri, Aug 16, 2019 at 4:15 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > We use CLOCK_REALTIME while creating timer_fd. > > Can read() on timerfd return 0 when the clock is set to CLOCK_REALTIME? As CLOCK_REALTIME is subject to be set by various mechanisms, yes. See timerfd_clock_was_set(). If that's the case, your application is missing something. But see below ... > > We have Intel rangely 4 cpu system running debian stretch linux > > kernel. The current clock source is set to tsc. During our testing, we > > observed the time drifts backward occasionally. Through kernel > > instrumentation, we observed, sometimes clocksource_delta() finds the > > current time lesser than last time. and returns 0 delta. That has absolutely nothing to do with CLOCK_REALTIME. Your machines TSC is either going backwards or not synchronized between cores. Hint: Dell has a track record of BIOS doing the wrong things to TSC in order to hide their 'value add' features stealing CPU time. > This causes the following code flow to return a time which is lesser > than previously fetched time. > ktime_get()-->timekeeping_get_ns()-->timekeeping_get_delta()-->clocksource_delta() ktime_get() is CLOCK_MONOTONIC and not CLOCK_REALTIME. > Since ktime_get() returns a time which is lesser than the expiry time, > hrtimer_forward_now return 0. > This in-turn causes timerfd_read to return 0. > Is it not a bug? It's a bug, but either a hardware or a BIOS bug and you are trying to paper over it at the place where you observe the symptom, which is obviously the wrong place because: 1) Any other time related function even in timerfd is affected as well 2) We do not cure symptoms, we cure the root cause. And clearly the root cause hase not been explained and addressed. Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <CACAVd4hT6QYtgtDsBcgy7c_s9WVBAH+1m0r5geBe7BUWJWYhbA@mail.gmail.com>]
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. [not found] ` <CACAVd4hT6QYtgtDsBcgy7c_s9WVBAH+1m0r5geBe7BUWJWYhbA@mail.gmail.com> @ 2019-08-17 19:23 ` Thomas Gleixner 2019-08-19 6:07 ` Arul Jeniston 0 siblings, 1 reply; 26+ messages in thread From: Thomas Gleixner @ 2019-08-17 19:23 UTC (permalink / raw) To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Arul, On Sat, 17 Aug 2019, Arul Jeniston wrote: > Do you agree the possibility of returning zero value from timerfd_read()? Obviosuly it happens. > > That has absolutely nothing to do with CLOCK_REALTIME. Your machines > > TSC is either going backwards or not synchronized between cores. > > > > Hint: Dell has a track record of BIOS doing the wrong things to TSC in > > order to hide their 'value add' features stealing CPU time. > > We haven't seen any issue in Dell hardware but we would definitely check > the possibility of hardware bug. BIOS is the more likely candidate. > Let us say, due to some reason the tsc goes backwards. Isn't it handled in > clocksource_delta(). No. clocksource_delta() does damage limitation. It prevents insane large time jumps which would result if the read out TSC value is less than the base value which is used to calculate the delta. It cannot do more than that. > Is timerfd_read expected to return 0 if tsc drifts backwards? If so, why it > is not documented? > Being a system call, we expect all return values of read() on timerfd to be > documented in the man page. We expect TSC not to go backwards. If it does it's broken and not usable as a clocksource for the kernel. The problem is that this is not necessarily easy to detect. Fact is, that your machines TSC goes backwards or is not properly synchronized between the cores. Otherwise the problem would not exist at all. That's the problem which needs to be fixed and not papered over with crude hacks and weird justifications. > > ktime_get() is CLOCK_MONOTONIC and not CLOCK_REALTIME. > > We see the same base used for CLOCK_MONOTONIC, CLOCK_REALTIME timers here. > both MONOTONIC, REALTIME timers hits the following code flow. we confirmed > it through instrumentation. > timerfd_read()-->hrtimer_forward_now()-->ktime_get()-->timekeeping_get_ns()-->timekeeping_get_delta()-->clocksource_delta(). > Do you want me to share any other logs to confirm it? No. That's the case when you use a relative timer with CLOCK_REALTIME because only absolute timers are affected by modifications of CLOCK_REALTIME. So it's NOT an issue of a CLOCK_REALTIME modification via settimeofday() or adjtimex(). > > It's a bug, but either a hardware or a BIOS bug and you are trying to > > paper over it at the place where you observe the symptom, which is > > obviously the wrong place because: > > > > 1) Any other time related function even in timerfd is affected as well > > > > 2) We do not cure symptoms, we cure the root cause. And clearly the root > > cause hase not been explained and addressed. > > if we don't fix this in kernel, can we document this return value in > timerfd read() man page? Again: You cannot fix a hardware problem by hacking around it at exactly one place where you can observe it. If that problem exists on your machine, then any other time related function is affected as well. Are you going to submit patches against _ALL_ time{r} related syscalls to fix^Wpaper over this? Either against the kernel or against the man pages? As this is a 4 core Rangely, it has a properly synchronized TSC on all 4 cores which runs with constant frequency and is not affected by deeper C-States. Here is the flow: timerfd_read() waitfortick() timer interrupt() time = ktime_get() expire timer time >= timer_time tick++; wakeup_reader() hrtimer_forward_now() now = ktime_get() In the failure case now < timer_time i.e. time went backwards since the timer was expired. That's absolutely unexpected behaviour and no, we are not papering over it. Did you ever quantify how much time goes backwards in that case? Is the timer expiry and the timerfd_read() on the same CPU or on different ones? Can you please provide a full dmesg from boot to after the point where this failure happens? Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-08-17 19:23 ` Thomas Gleixner @ 2019-08-19 6:07 ` Arul Jeniston 2019-08-19 8:04 ` Thomas Gleixner 0 siblings, 1 reply; 26+ messages in thread From: Arul Jeniston @ 2019-08-19 6:07 UTC (permalink / raw) To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Hi tgls, > BIOS is the more likely candidate. We would check BIOS and update. > We expect TSC not to go backwards. If it does it's broken and not usable as > a clocksource for the kernel. The problem is that this is not necessarily > easy to detect. We used relative timer with CLOCK_REALTIME which behaves similar to CLOCK_MONOTONIC. We are unable to recreate this problem. So, we have instrumented kernel code to find the possibility. During normal run, we do see small amount (~1000 cycles) of backward time drifts one in a while. This is likely happens due to the race between multiple processors and ISR routines. We have added a hook to read_tsc() and observed backward time drift when isr comes between reading tsc register and returning the value. This drifting time differs based on the number of isr handled and the time taken to service each isr. During our testing, we observed, if the drifting time goes beyond 5000 cycles, timerfd_read() returns 0 sometimes. We don't see any problem, if the drifting time is lesser than 5000 cycles. > Again: > > You cannot fix a hardware problem by hacking around it at exactly one place > where you can observe it. If that problem exists on your machine, then any > other time related function is affected as well. The system is up and running more than 6 months. We don't see any log in dmesg to comment on whether it is a hardware issue. Other than timerfd_read(), we see no functionality issue . Do we have any data collected in the kernel to help us in analyzing in the direction of BIOS or hardware? > Are you going to submit patches against _ALL_ time{r} related syscalls to > fix^Wpaper over this? Either against the kernel or against the man pages > > i.e. time went backwards since the timer was expired. That's absolutely > unexpected behaviour and no, we are not papering over it. Agreed. Our intention is not to put a workaround. Intention is to write a reliable application that handles all values returned by a system call. At present, the application doesn't know whether 0 return value is a bug or valid case. > Did you ever quantify how much time goes backwards in that case? It should be more than 5000 cycles. Found it through kernel instrumentation. > Is the timer expiry and the timerfd_read() on the same CPU or on different > ones? We don't have data to answer this. However, the kernel is configured to allow timer migration. So, we believe, the timer expiry and timerfd_read happens on different CPUs. > Can you please provide a full dmesg from boot to after the point where this > failure happens? We don't see any logs in dmesg during the occurrence of this problem. We may not be able to share complete dmesg logs due to security reasons. We haven't seen any time drifting related messages too. Let us know, if you are looking for any specific log message. Thanks, Arul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-08-19 6:07 ` Arul Jeniston @ 2019-08-19 8:04 ` Thomas Gleixner 2019-08-19 14:25 ` Arul Jeniston 0 siblings, 1 reply; 26+ messages in thread From: Thomas Gleixner @ 2019-08-19 8:04 UTC (permalink / raw) To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Arul, On Mon, 19 Aug 2019, Arul Jeniston wrote: > During normal run, we do see small amount (~1000 cycles) of backward > time drifts one in a while. > This is likely happens due to the race between multiple processors and > ISR routines. No. > We have added a hook to read_tsc() and observed backward time drift > when isr comes between reading tsc register and returning the value. > This drifting time differs based on the number of isr handled and the > time taken to service each isr. This is not a drift. Please do not misuse technical expressions which have a well defined meaning. rdtsc() val = read() interrupt .... return val Time does not go backwards in that case simply because at the point it was taken it was correct. Versus that timerfd problem this situaiton is completely irrelevant simply because hrtimer_forward_now() happens _AFTER_ the timer was expired not before. So the read of CLOCK_MONOTONIC in hrtimer_forward_now() is _AFTER_ the read of CLOCK_MONOTONIC in hrtimer_interrupt() which expires the timer and there are only two issue which can make that read in hrtimer_forward_now() go backwards vs. the time which was read when the timer was expired: 1) TSCs are out of sync or affected otherwise 2) Timekeeping has a bug. That's where the problem lies it needs to be analyzed whether this is caused by #1 or by #2. Once we know that we can discuss solutions. > Agreed. Our intention is not to put a workaround. Intention is to > write a reliable application that handles all values returned by a > system call. > At present, the application doesn't know whether 0 return value is a > bug or valid case. Again, you are tackling the wrong end. You need to find, analyze and fix the root cause. > > Is the timer expiry and the timerfd_read() on the same CPU or on different > > ones? > > We don't have data to answer this. However, the kernel is configured > to allow timer migration. > So, we believe, the timer expiry and timerfd_read happens on different CPUs. Believe is a matter of religion and pretty useless to analyze technical problems. It's not rocket science to figure this out with tracing. > > Can you please provide a full dmesg from boot to after the point where this > > failure happens? > > We don't see any logs in dmesg during the occurrence of this problem. > We may not be able to share complete dmesg logs due to security reasons. > We haven't seen any time drifting related messages too. > Let us know, if you are looking for any specific log message. I was asking for a full boot log for a reason. Is it impossible to stick that into a mail? Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-08-19 8:04 ` Thomas Gleixner @ 2019-08-19 14:25 ` Arul Jeniston 2019-08-19 14:52 ` Thomas Gleixner 0 siblings, 1 reply; 26+ messages in thread From: Arul Jeniston @ 2019-08-19 14:25 UTC (permalink / raw) To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc hi Tglx, ... > > > Is the timer expiry and the timerfd_read() on the same CPU or on different > > > ones? > > We have 10000+ units running in production. We see this problem ~20 switches in a span of one year time. The problem is not seen more than once in the same unit. The occurrence is random and unpredictable. We tried to recreate this problem in LAB by loading the units similar to the production unit. so far we are not able to recreate it. It is very difficult to predict what triggered this in the production units. So, to find root cause, we studied and instrumented kernel code. > 1) TSCs are out of sync or affected otherwise If the TSC clock is unstable and not synchronized, Linux kernel throws dmesg logs and changes the current clock source to next best timer (hpet). But we didn't see these logs in any of the 10000 units. > 2) Timekeeping has a bug. As per our analysis, After the timer expiry, after tsc is read in hrtimer_forward_now() -->ktime_get()-->timekeeping_get_ns(), if the current thread (t1) is interrupted and/or some other thread running in different CPU (t2) updates timekeeper cycle_last value with a latest tsc than t1, clocksource_delta() and timekeeping_get_delta() would return 0. Eventually timekeeping_delta_to_ns() would return a smaller value based on the other two parameters (mult, xtime_nsec). If base(timekeeper.tkr_mono.base) is not updated all this time, then ktime_get() could return a value lesser than expiry time. Note: CONFIG_DEBUG_TIMEKEEPING is not configured in our system. > I was asking for a full boot log for a reason. Is it impossible to stick > that into a mail? Give me a day time to sync-up internally and update. Regards, Arul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-08-19 14:25 ` Arul Jeniston @ 2019-08-19 14:52 ` Thomas Gleixner 2019-08-19 15:26 ` Arul Jeniston 0 siblings, 1 reply; 26+ messages in thread From: Thomas Gleixner @ 2019-08-19 14:52 UTC (permalink / raw) To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Arul, On Mon, 19 Aug 2019, Arul Jeniston wrote: > > 1) TSCs are out of sync or affected otherwise > > If the TSC clock is unstable and not synchronized, Linux kernel throws > dmesg logs and changes the current clock source to next best timer > (hpet). But we didn't see these logs in any of the 10000 units. Did you see "TSC ADJUST" entries? > > 2) Timekeeping has a bug. > > As per our analysis, > > After the timer expiry, after tsc is read in hrtimer_forward_now() > -->ktime_get()-->timekeeping_get_ns(), if the current thread (t1) is > interrupted and/or some other thread running in different CPU (t2) > updates timekeeper cycle_last value with a latest tsc than t1, > clocksource_delta() and timekeeping_get_delta() would return 0. > Eventually timekeeping_delta_to_ns() would return a smaller value > based on the other two parameters (mult, xtime_nsec). If > base(timekeeper.tkr_mono.base) is not updated all this time, then > ktime_get() could return a value lesser than expiry time. > Note: CONFIG_DEBUG_TIMEKEEPING is not configured in our system. Sorry, but your analysis is wrong. The timekeeping code does never return time going backwards, except for the case where the hardware is buggered and the failure cannot be detected. But for the above scenario: ktime_get() do { seq = read_seqcount_begin(&tk_core.seq); base = tk->tkr_mono.base; nsecs = timekeeping_get_ns(&tk->tkr_mono); } while (read_seqcount_retry(&tk_core.seq, seq)); So if the interrupt which updates the timekeeper hits in the middle of timekeeping_get_ns() then the result is discarded because the sequence count changed and read_seqcount_retry() returns true. So it takes another round which will be perfectly aligned with the updated time keeper. Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-08-19 14:52 ` Thomas Gleixner @ 2019-08-19 15:26 ` Arul Jeniston 2019-08-19 15:59 ` Thomas Gleixner 0 siblings, 1 reply; 26+ messages in thread From: Arul Jeniston @ 2019-08-19 15:26 UTC (permalink / raw) To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc hi Tglx, > But for the above scenario: > > ktime_get() > do { > seq = read_seqcount_begin(&tk_core.seq); > base = tk->tkr_mono.base; > nsecs = timekeeping_get_ns(&tk->tkr_mono); > > } while (read_seqcount_retry(&tk_core.seq, seq)); > > So if the interrupt which updates the timekeeper hits in the middle of > timekeeping_get_ns() then the result is discarded because the sequence > count changed and read_seqcount_retry() returns true. So it takes another > round which will be perfectly aligned with the updated time keeper. > Do you mean to say the timekeeper updates always happen from ktime_get? My point was, when one thread is in ktime_get other thread/isr updates timekeeper from different flow. Regards, Arul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-08-19 15:26 ` Arul Jeniston @ 2019-08-19 15:59 ` Thomas Gleixner [not found] ` <CACAVd4iRN7=eq_B1+Yb-xcspU-Sg1dmMo_=VtLXXVPkjN1hY5Q@mail.gmail.com> 0 siblings, 1 reply; 26+ messages in thread From: Thomas Gleixner @ 2019-08-19 15:59 UTC (permalink / raw) To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc On Mon, 19 Aug 2019, Arul Jeniston wrote: > hi Tglx, > > But for the above scenario: > > > > ktime_get() > > do { > > seq = read_seqcount_begin(&tk_core.seq); > > base = tk->tkr_mono.base; > > nsecs = timekeeping_get_ns(&tk->tkr_mono); > > > > } while (read_seqcount_retry(&tk_core.seq, seq)); > > > > So if the interrupt which updates the timekeeper hits in the middle of > > timekeeping_get_ns() then the result is discarded because the sequence > > count changed and read_seqcount_retry() returns true. So it takes another > > round which will be perfectly aligned with the updated time keeper. > > > > Do you mean to say the timekeeper updates always happen from ktime_get? > My point was, when one thread is in ktime_get other thread/isr updates > timekeeper from different flow. Timekeeper updates happen of course NOT from ktime_get(), but ktime_get() is protected against concurrent updates via the seqcount. Simplified without all the required barriers etc. ktime_get() do { seq = tk->seq; if (seq & 1) continue; base = tk->base; nsec = get_nsec(); while (seq != tk->seq); update() tk->seq++; update_data(); tk-<seq++; It does not matter whether the update is an interrupt on the same CPU which hits ktime_get() or whether it happens concurrent on a different CPU. ktime_get() can never use inconsistent tk data for calculating the time. Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <CACAVd4iRN7=eq_B1+Yb-xcspU-Sg1dmMo_=VtLXXVPkjN1hY5Q@mail.gmail.com>]
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. [not found] ` <CACAVd4iRN7=eq_B1+Yb-xcspU-Sg1dmMo_=VtLXXVPkjN1hY5Q@mail.gmail.com> @ 2019-08-19 18:29 ` Thomas Gleixner [not found] ` <CACAVd4jAJ5QcOH=q=Q9kAz20X4_nAc7=vVU_gPWTS1UuiGK-fg@mail.gmail.com> 0 siblings, 1 reply; 26+ messages in thread From: Thomas Gleixner @ 2019-08-19 18:29 UTC (permalink / raw) To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Arul, On Mon, 19 Aug 2019, Arul Jeniston wrote: > > hits ktime_get() or whether it happens concurrent on a different CPU. > > ktime_get() can never use inconsistent tk data for calculating the time. > > Agreed. I think, I am not making my point clear here. > Do you mean to say ktime_get() would always return incremental time > irrespective of isr and multi-processors? Yes. The only exception is when the TSC is either jumping or not fully in sync between cores. > If yes, this is where, I have difference of understanding. And your understanding is still wrong. I explain it to you _once_ more: The side which updates the timekeeper: - increments the sequence count _BEFORE_ it changes any data. After that increment the sequence count is odd, i.e. bit 0 is set. - updates data (base, last, mult, shift ....) - increments it again _AFTER_ it updated data. After that increment the sequence count is even, i.e. bit 0 is cleared. The read out side: start: - reads the sequence count - if sequence count is odd (update in progress) go back to start - reads base from timekeeper data - reads TSC and calculates the delta with timekeeper data (last, mult, shift ...), i.e. timekeeping_get_ns(). - reads the sequence count again. If it is even and the same as read above, the data is valid and consistent and the result is returned. If the sequence count is different to the original value it goes back to start. It does not matter at all if timekeeping_get_ns() returns occasionally a wrong value due to timekeeper data being updated concurrently. The result is discarded and never returned to the caller. It tries again. All places which update the timer keeper issue the sequence count increment protection and are properly serialized against each other. So there is no occacional point where ktime_get() would return random crap due to being interrupted by an update or due to a concurrent update on a different CPU. This is a protection mechanism which is well understood in computer science (seqlock with lockless readers) and it works in kernel timekeeping for way more than a decade without any issue except when the underlying hardware clocksource (TSC in that case) misbehaves. There is no way to protect the code against this and we are not going to do anything about it simply because we can't. The fact that you can observe the (cycles < last) condition is not proving anything. Just looking at the (cycles < last) condition is wrong. You need to proof that the result is returned from ktime_get() without a retry despite the sequence counter being changed. I doubt you can. If you can prove that the condition is met _AND_ the sequence counter has NOT changed, then you have proven that the TSC on your machine is not correctly synchronized or otherwise returning crap values. You can make up further weird theories about the incorrectness of that code, but these theories wont become magically true. Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <CACAVd4jAJ5QcOH=q=Q9kAz20X4_nAc7=vVU_gPWTS1UuiGK-fg@mail.gmail.com>]
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. [not found] ` <CACAVd4jAJ5QcOH=q=Q9kAz20X4_nAc7=vVU_gPWTS1UuiGK-fg@mail.gmail.com> @ 2019-08-20 8:49 ` Thomas Gleixner 2019-08-20 9:42 ` Arul Jeniston 0 siblings, 1 reply; 26+ messages in thread From: Thomas Gleixner @ 2019-08-20 8:49 UTC (permalink / raw) To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Arul, A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing in e-mail? A: No. Q: Should I include quotations after my reply? http://daringfireball.net/2007/07/on_top On Tue, 20 Aug 2019, Arul Jeniston wrote: > Agreed. Please find the dmesg output below. We see the problem on > [1456773.366951]. > Let us know if you find anything suspicious. The most suspicious thing is: > [ 0.000000] Linux version 4.9.0-8-amd64 ^^^^^ IOW, a (dead) kernel, which does not contain the new TSC ADJUST verification mechanism. If you send patches against mainline in future, then please always verify that the problem persists with the mainline kernel. Can you please boot something more recent - at least 4.19 LTS - on that machine and let it run for a couple of days to check whether there are 'TSC ADJUST' related entries in dmesg? Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-08-20 8:49 ` Thomas Gleixner @ 2019-08-20 9:42 ` Arul Jeniston 2019-09-05 8:48 ` Arul Jeniston 0 siblings, 1 reply; 26+ messages in thread From: Arul Jeniston @ 2019-08-20 9:42 UTC (permalink / raw) To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Hi Tglx, > Can you please boot something more recent - at least 4.19 LTS - on that > machine and let it run for a couple of days to check whether there are 'TSC > ADJUST' related entries in dmesg? Sure. Would check and update. Regards, Arul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-08-20 9:42 ` Arul Jeniston @ 2019-09-05 8:48 ` Arul Jeniston 2019-09-05 15:34 ` Thomas Gleixner 0 siblings, 1 reply; 26+ messages in thread From: Arul Jeniston @ 2019-09-05 8:48 UTC (permalink / raw) To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc hi Tglx, When we adjust the date setting using date command we observed 'timerfd_read()' on CLOCK_REALTIME (TFD_TIMER_ABSTIME flag is set) returns 0. we don't see any hardware influence here and we are able to recreate it consistently. Is it expected? if yes, isn't it something to be documented in timerfd read() man page? Steps followed to make timerfd read to return 0. 1. run the following script in background to move the date front and back root@sonic:~/test/exp2# cat ./clock_change_test.sh while [ 1 ] do date --set="04 sep 2019 09:50:21" > /dev/null date --set="04 sep 2019 09:58:21" > /dev/null done root@sonic:~/test/exp2# ./clock_change_test.sh 2. Execute the following program to do read on CLOCK_REALTIME timer. root@sonic:~/test/exp2# cat timerfdtest.c #include <errno.h> #include <inttypes.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <string.h> #include <sys/epoll.h> #include <sys/timerfd.h> int main() { int tfd; struct itimerspec ts; tfd = timerfd_create(CLOCK_REALTIME, 0); ts.it_interval.tv_sec = 0; ts.it_interval.tv_nsec = 10; ts.it_value.tv_sec = 10; ts.it_value.tv_nsec = 1000; if (timerfd_settime(tfd, TFD_TIMER_ABSTIME, &ts, NULL) < 0) { return EXIT_FAILURE; } while(1) { uint64_t noftimeo; if (read(tfd, &noftimeo, sizeof(noftimeo)) == 0) { printf("read returned 0 nooftimeo:%d\n", noftimeo); } } /* not reached */ return EXIT_SUCCESS; } 3. Observed the following read failure logs within few minutes. root@sonic:~/test/exp2# ./timerfdtest read returned 0 nooftimeo:1392 Regards, Arul On Tue, Aug 20, 2019 at 3:12 PM Arul Jeniston <arul.jeniston@gmail.com> wrote: > > Hi Tglx, > > > Can you please boot something more recent - at least 4.19 LTS - on that > > machine and let it run for a couple of days to check whether there are 'TSC > > ADJUST' related entries in dmesg? > > Sure. Would check and update. > > Regards, > Arul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-09-05 8:48 ` Arul Jeniston @ 2019-09-05 15:34 ` Thomas Gleixner 2019-09-06 16:36 ` Arul Jeniston 0 siblings, 1 reply; 26+ messages in thread From: Thomas Gleixner @ 2019-09-05 15:34 UTC (permalink / raw) To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Arul, On Thu, 5 Sep 2019, Arul Jeniston wrote: > When we adjust the date setting using date command we observed > 'timerfd_read()' on CLOCK_REALTIME (TFD_TIMER_ABSTIME flag is set) > returns 0. > we don't see any hardware influence here and we are able to recreate > it consistently. Is it expected? if yes, isn't it something to be > documented in timerfd read() man page? It's expected, yes. Simply because it hits the following condition: armtimer(T1) settime(T1 + X) --> causes timer to fire wakeup reader settime(T0) read number of intervals: 0 i.e. timer did not expire Changing the return value to 1 would be just a cosmetic workaround. We could also jump back and wait again. But that's all not consistent because armtimer(T1) settime(T1 + X) --> causes timer to fire wakeup reader read number of intervals: 1 settime(T0) user space reads time and figures that the returned tick is bogus. So I rather change the documentation (this applies only to CLOCK_REALTIME and CLOCK_REALTIME_ALARM) and explain the rationale. For applications which care about notifications when the time was set, timerfd_settime() provides TFD_TIMER_CANCEL_ON_SET which causes the timer to be canceled when time is set and returns -ECANCELED from the read. That's unambiguous. Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-09-05 15:34 ` Thomas Gleixner @ 2019-09-06 16:36 ` Arul Jeniston 2019-09-07 14:38 ` Thomas Gleixner 0 siblings, 1 reply; 26+ messages in thread From: Arul Jeniston @ 2019-09-06 16:36 UTC (permalink / raw) To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Hi tglx, >Changing the return value to 1 would be just a cosmetic workaround. Agreed. Returning 1 is incorrect as It causes the next read() to return before the interval time passed. >So I rather change the documentation (this applies only to CLOCK_REALTIME >and CLOCK_REALTIME_ALARM) and explain the rationale. When timerfd_read() returns 0, hrtimer_forward() doesn't change expiry time, So, instead of modifying the man page, can we call timerfd_read() functionality once again from kernel. For example:- timerfd_read_wrapper() { do { ret = timerfd_read(...); } while (ret == 0); } Let us know whether you see any problem in handling this race in kernel. Regards, Arul On Thu, Sep 5, 2019 at 9:04 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > Arul, > > On Thu, 5 Sep 2019, Arul Jeniston wrote: > > When we adjust the date setting using date command we observed > > 'timerfd_read()' on CLOCK_REALTIME (TFD_TIMER_ABSTIME flag is set) > > returns 0. > > we don't see any hardware influence here and we are able to recreate > > it consistently. Is it expected? if yes, isn't it something to be > > documented in timerfd read() man page? > > It's expected, yes. Simply because it hits the following condition: > > armtimer(T1) > > settime(T1 + X) --> causes timer to fire > > wakeup reader > settime(T0) > > read number of intervals: 0 > > i.e. timer did not expire > > Changing the return value to 1 would be just a cosmetic workaround. We > could also jump back and wait again. But that's all not consistent because > > armtimer(T1) > > settime(T1 + X) --> causes timer to fire > > wakeup reader > > read number of intervals: 1 > settime(T0) > > user space reads time and figures that > the returned tick is bogus. > > So I rather change the documentation (this applies only to CLOCK_REALTIME > and CLOCK_REALTIME_ALARM) and explain the rationale. > > For applications which care about notifications when the time was set, > timerfd_settime() provides TFD_TIMER_CANCEL_ON_SET which causes the timer > to be canceled when time is set and returns -ECANCELED from the > read. That's unambiguous. > > Thanks, > > tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-09-06 16:36 ` Arul Jeniston @ 2019-09-07 14:38 ` Thomas Gleixner 2019-11-05 5:01 ` Arul Jeniston 0 siblings, 1 reply; 26+ messages in thread From: Thomas Gleixner @ 2019-09-07 14:38 UTC (permalink / raw) To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Arul, On Fri, 6 Sep 2019, Arul Jeniston wrote: > >Changing the return value to 1 would be just a cosmetic workaround. > > Agreed. Returning 1 is incorrect as It causes the next read() to > return before the interval time passed. > > >So I rather change the documentation (this applies only to CLOCK_REALTIME > >and CLOCK_REALTIME_ALARM) and explain the rationale. > > When timerfd_read() returns 0, hrtimer_forward() doesn't change expiry > time, So, instead of modifying the man page, can we call > timerfd_read() functionality once again from kernel. > > For example:- > timerfd_read_wrapper() > { > do { > ret = timerfd_read(...); > } while (ret == 0); > } > > Let us know whether you see any problem in handling this race in kernel. There is no race. It's defined behaviour and I explained it to you in great length why it is correct to return 0 and document that in the man page. Any CLOCK_REALTIME ABSTIME based interface of the kernel is affected by this and no, we are not papering over it in one particular place just because. If clock REALTIME gets set then all bets are off. The syscalls can return either early or userspace cam observe that the return value is bogus when it actually reads the time. You cannot handle this by any means. The only way to handle this gracefully is by using the TFD_TIMER_CANCEL_ON_SET flag and reevaluate the situation in user space. So I'm going to send a patch to document that in the manpage. Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-09-07 14:38 ` Thomas Gleixner @ 2019-11-05 5:01 ` Arul Jeniston 2019-11-05 10:01 ` Thomas Gleixner 0 siblings, 1 reply; 26+ messages in thread From: Arul Jeniston @ 2019-11-05 5:01 UTC (permalink / raw) To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc hi Tglx, > So I'm going to send a patch to document that in the manpage. Did you get a chance to make the manpage patch? if yes, please help us by sharing the link where we can find the patch. Regards, Arul On Sat, Sep 7, 2019 at 8:08 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > Arul, > > On Fri, 6 Sep 2019, Arul Jeniston wrote: > > >Changing the return value to 1 would be just a cosmetic workaround. > > > > Agreed. Returning 1 is incorrect as It causes the next read() to > > return before the interval time passed. > > > > >So I rather change the documentation (this applies only to CLOCK_REALTIME > > >and CLOCK_REALTIME_ALARM) and explain the rationale. > > > > When timerfd_read() returns 0, hrtimer_forward() doesn't change expiry > > time, So, instead of modifying the man page, can we call > > timerfd_read() functionality once again from kernel. > > > > For example:- > > timerfd_read_wrapper() > > { > > do { > > ret = timerfd_read(...); > > } while (ret == 0); > > } > > > > Let us know whether you see any problem in handling this race in kernel. > > There is no race. It's defined behaviour and I explained it to you in great > length why it is correct to return 0 and document that in the man page. > > Any CLOCK_REALTIME ABSTIME based interface of the kernel is affected by > this and no, we are not papering over it in one particular place just > because. > > If clock REALTIME gets set then all bets are off. The syscalls can return > either early or userspace cam observe that the return value is bogus when > it actually reads the time. You cannot handle this by any means. > > The only way to handle this gracefully is by using the > TFD_TIMER_CANCEL_ON_SET flag and reevaluate the situation in user space. > > So I'm going to send a patch to document that in the manpage. > > Thanks, > > tglx > > > > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-11-05 5:01 ` Arul Jeniston @ 2019-11-05 10:01 ` Thomas Gleixner 2019-11-06 3:38 ` Arul Jeniston 0 siblings, 1 reply; 26+ messages in thread From: Thomas Gleixner @ 2019-11-05 10:01 UTC (permalink / raw) To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc Arul, On Tue, 5 Nov 2019, Arul Jeniston wrote: > > So I'm going to send a patch to document that in the manpage. > > Did you get a chance to make the manpage patch? if yes, please help us > by sharing the link where we can find the patch. No. I would have Cc'ed you when posting. It's somewhere on my todo list. Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-11-05 10:01 ` Thomas Gleixner @ 2019-11-06 3:38 ` Arul Jeniston 2020-02-12 18:14 ` Arul Jeniston 0 siblings, 1 reply; 26+ messages in thread From: Arul Jeniston @ 2019-11-06 3:38 UTC (permalink / raw) To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc hi Tglx, Thank you for the update. We have few customers who are waiting for this patch. Please prioritize it. Regards, Arul On Tue, Nov 5, 2019 at 3:31 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > Arul, > > On Tue, 5 Nov 2019, Arul Jeniston wrote: > > > So I'm going to send a patch to document that in the manpage. > > > > Did you get a chance to make the manpage patch? if yes, please help us > > by sharing the link where we can find the patch. > > No. I would have Cc'ed you when posting. It's somewhere on my todo list. > > Thanks, > > tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2019-11-06 3:38 ` Arul Jeniston @ 2020-02-12 18:14 ` Arul Jeniston 2020-02-13 12:11 ` Thomas Gleixner 0 siblings, 1 reply; 26+ messages in thread From: Arul Jeniston @ 2020-02-12 18:14 UTC (permalink / raw) To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc hi Tglx, Did you get a chance to update the timerfd man page? Our customers are expecting these changes to happen asap. Regards, Arul On Wed, Nov 6, 2019 at 9:08 AM Arul Jeniston <arul.jeniston@gmail.com> wrote: > > hi Tglx, > > Thank you for the update. We have few customers who are waiting for > this patch. Please prioritize it. > > Regards, > Arul > > On Tue, Nov 5, 2019 at 3:31 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > Arul, > > > > On Tue, 5 Nov 2019, Arul Jeniston wrote: > > > > So I'm going to send a patch to document that in the manpage. > > > > > > Did you get a chance to make the manpage patch? if yes, please help us > > > by sharing the link where we can find the patch. > > > > No. I would have Cc'ed you when posting. It's somewhere on my todo list. > > > > Thanks, > > > > tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 2020-02-12 18:14 ` Arul Jeniston @ 2020-02-13 12:11 ` Thomas Gleixner 0 siblings, 0 replies; 26+ messages in thread From: Thomas Gleixner @ 2020-02-13 12:11 UTC (permalink / raw) To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc, Matt Domsch Arul, Arul Jeniston <arul.jeniston@gmail.com> writes: > Did you get a chance to update the timerfd man page? Obviously not. > Our customers are expecting these changes to happen asap. You surely understand that both the kernel and the manpages are available to you (Dell) free of charge. If Dell provides these things under a commercial contract to customers, then Dells customers surely can have expectations from Dell. But that's none of my and any other contributors business. If you need these changes ASAP, then either make them yourself or contract someone who can do it for you. Just making demands on a public mailing list in order to sort your business problems is neither appropriate nor acceptable. You're neither my boss nor do I have any contractual obligations with Dell. You surely achieved something. The priority of this has moved right to the bottom of my todo list. It's going to be worked on when I run out of other things to fix or if I really get bored. Both won't happen anytime soon. Thanks, Thomas ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2020-02-13 12:11 UTC | newest] Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-08-16 8:32 [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 'hrtimer_forward_now()' returns zero due to bigger backward time drift. This causes timerfd_read to return 0. As per man page, read on timerfd is not expected to return 0. This patch fixes this problem. Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com> arul.jeniston 2019-08-16 9:05 ` [PATCH] FS: timerfd: [Trimmed unreadable long subject line ] Thomas Gleixner 2019-08-16 10:22 ` [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function Arul Jeniston 2019-08-16 10:45 ` Thomas Gleixner 2019-08-16 16:55 ` Arul Jeniston 2019-08-16 17:00 ` Arul Jeniston 2019-08-16 21:17 ` Thomas Gleixner [not found] ` <CACAVd4hT6QYtgtDsBcgy7c_s9WVBAH+1m0r5geBe7BUWJWYhbA@mail.gmail.com> 2019-08-17 19:23 ` Thomas Gleixner 2019-08-19 6:07 ` Arul Jeniston 2019-08-19 8:04 ` Thomas Gleixner 2019-08-19 14:25 ` Arul Jeniston 2019-08-19 14:52 ` Thomas Gleixner 2019-08-19 15:26 ` Arul Jeniston 2019-08-19 15:59 ` Thomas Gleixner [not found] ` <CACAVd4iRN7=eq_B1+Yb-xcspU-Sg1dmMo_=VtLXXVPkjN1hY5Q@mail.gmail.com> 2019-08-19 18:29 ` Thomas Gleixner [not found] ` <CACAVd4jAJ5QcOH=q=Q9kAz20X4_nAc7=vVU_gPWTS1UuiGK-fg@mail.gmail.com> 2019-08-20 8:49 ` Thomas Gleixner 2019-08-20 9:42 ` Arul Jeniston 2019-09-05 8:48 ` Arul Jeniston 2019-09-05 15:34 ` Thomas Gleixner 2019-09-06 16:36 ` Arul Jeniston 2019-09-07 14:38 ` Thomas Gleixner 2019-11-05 5:01 ` Arul Jeniston 2019-11-05 10:01 ` Thomas Gleixner 2019-11-06 3:38 ` Arul Jeniston 2020-02-12 18:14 ` Arul Jeniston 2020-02-13 12:11 ` Thomas Gleixner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).