Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.  'hrtimer_forward_now()' returns zero due to bigger backward time drift.  This causes timerfd_read to return 0. As per man page, read on timerfd  is not expected to return 0. This patch fixes this problem.  Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com>
@ 2019-08-16  8:32 arul.jeniston
  2019-08-16  9:05 ` [PATCH] FS: timerfd: [Trimmed unreadable long subject line ] Thomas Gleixner
  2019-08-16 10:22 ` [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function Arul Jeniston
  0 siblings, 2 replies; 24+ messages in thread
From: arul.jeniston @ 2019-08-16  8:32 UTC (permalink / raw)
  To: viro, tglx; +Cc: linux-fsdevel, linux-kernel, arul_mc, ARUL JENISTON MC

From: ARUL JENISTON MC <arul.jeniston@gmail.com>

---
 fs/timerfd.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index 6a6fc8aa1de7..f5094e070e9a 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -284,8 +284,16 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
 					&ctx->t.alarm, ctx->tintv) - 1;
 				alarm_restart(&ctx->t.alarm);
 			} else {
-				ticks += hrtimer_forward_now(&ctx->t.tmr,
-							     ctx->tintv) - 1;
+				u64 nooftimeo = hrtimer_forward_now(&ctx->t.tmr,
+								 ctx->tintv);
+				/*
+				 * ticks shouldn't become zero at this point.
+				 * Ignore if hrtimer_forward_now returns 0
+				 * due to larger backward time drift.
+				 */
+				if (likely(nooftimeo)) {
+					ticks += nooftimeo - 1;
+				}
 				hrtimer_restart(&ctx->t.tmr);
 			}
 		}
-- 
2.11.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: [Trimmed unreadable long subject line ]
  2019-08-16  8:32 [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 'hrtimer_forward_now()' returns zero due to bigger backward time drift. This causes timerfd_read to return 0. As per man page, read on timerfd is not expected to return 0. This patch fixes this problem. Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com> arul.jeniston
@ 2019-08-16  9:05 ` Thomas Gleixner
  2019-08-16 10:22 ` [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function Arul Jeniston
  1 sibling, 0 replies; 24+ messages in thread
From: Thomas Gleixner @ 2019-08-16  9:05 UTC (permalink / raw)
  To: ARUL JENISTON MC; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Arul,

On Fri, 16 Aug 2019, arul.jeniston@gmail.com wrote:

Please write the subject as a short precise sentence which fits into 70
characters and put the long explanation into the body, i.e. here.

See Documentation/process/submitting-patches.rst

> From: ARUL JENISTON MC <arul.jeniston@gmail.com>

This lacks a Signed-off-by

> ---
>  fs/timerfd.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/timerfd.c b/fs/timerfd.c
> index 6a6fc8aa1de7..f5094e070e9a 100644
> --- a/fs/timerfd.c
> +++ b/fs/timerfd.c
> @@ -284,8 +284,16 @@ static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
>  					&ctx->t.alarm, ctx->tintv) - 1;
>  				alarm_restart(&ctx->t.alarm);
>  			} else {
> -				ticks += hrtimer_forward_now(&ctx->t.tmr,
> -							     ctx->tintv) - 1;
> +				u64 nooftimeo = hrtimer_forward_now(&ctx->t.tmr,
> +								 ctx->tintv);
> +				/*
> +				 * ticks shouldn't become zero at this point.
> +				 * Ignore if hrtimer_forward_now returns 0
> +				 * due to larger backward time drift.
> +				 */

What? Backward time drift? Can you please explain how this would happen?

> +				if (likely(nooftimeo)) {
> +					ticks += nooftimeo - 1;
> +				}

Pointless brackets.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-08-16  8:32 [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 'hrtimer_forward_now()' returns zero due to bigger backward time drift. This causes timerfd_read to return 0. As per man page, read on timerfd is not expected to return 0. This patch fixes this problem. Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com> arul.jeniston
  2019-08-16  9:05 ` [PATCH] FS: timerfd: [Trimmed unreadable long subject line ] Thomas Gleixner
@ 2019-08-16 10:22 ` Arul Jeniston
  2019-08-16 10:45   ` Thomas Gleixner
  1 sibling, 1 reply; 24+ messages in thread
From: Arul Jeniston @ 2019-08-16 10:22 UTC (permalink / raw)
  To: viro, tglx; +Cc: linux-fsdevel, linux-kernel, arul_mc, ARUL JENISTON MC

'hrtimer_forward_now()' returns zero due to bigger backward time drift.
This causes timerfd_read to return 0. As per man page, read on timerfd
 is not expected to return 0.
This problem is well explained in https://lkml.org/lkml/2019/7/31/442
. This patch fixes this problem.
Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com>


---
 fs/timerfd.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index 6a6fc8aa1de7..f5094e070e9a 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -284,8 +284,16 @@ static ssize_t timerfd_read(struct file *file,
char __user *buf, size_t count,
                                        &ctx->t.alarm, ctx->tintv) - 1;
                                alarm_restart(&ctx->t.alarm);
                        } else {
-                               ticks += hrtimer_forward_now(&ctx->t.tmr,
-                                                            ctx->tintv) - 1;
+                               u64 nooftimeo = hrtimer_forward_now(&ctx->t.tmr,
+                                                                ctx->tintv);
+                               /*
+                                * ticks shouldn't become zero at this point.
+                                * Ignore if hrtimer_forward_now returns 0
+                                * due to larger backward time drift.
+                                */
+                               if (likely(nooftimeo)) {
+                                       ticks += nooftimeo - 1;
+                               }
                                hrtimer_restart(&ctx->t.tmr);
                        }
                }
--
2.11.0

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-08-16 10:22 ` [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function Arul Jeniston
@ 2019-08-16 10:45   ` Thomas Gleixner
  2019-08-16 16:55     ` Arul Jeniston
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2019-08-16 10:45 UTC (permalink / raw)
  To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Arul,

On Fri, 16 Aug 2019, Arul Jeniston wrote:

> Subject: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.

The prefix is not 'FS: timerfd:'

1) The usual prefix for fs/* is: 'fs:' but...

2) git log fs/timerfd.c gives you a pretty good hint for the proper
   prefix. Look at the commits which actually do functional changes to that
   file, not at those which do (sub)system wide cleanups/adjustments.

Also 'timerfd_read function' can be written as 'timerfd_read()' which
spares the redundant function and clearly marks it as function via the
brackets.

> 'hrtimer_forward_now()' returns zero due to bigger backward time drift.
> This causes timerfd_read to return 0. As per man page, read on timerfd
>  is not expected to return 0.
> This problem is well explained in https://lkml.org/lkml/2019/7/31/442

1) The explanation needs to be in the changelog itself. Links can point to
   discussions, bug-reports which have supplementary information.

2) Please do not use lkml.org links.

Again: Please read and follow Documentation/process/submitting-patches.rst 

> . This patch fixes this problem.
> Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com>

Missing empty line before Signed-off-by. Please use git-log to see how
changelogs are properly formatted.

Also: 'This patch fixes this problem' is not helpful at all. Again see the
document I already pointed you to.

> ---
>  fs/timerfd.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/timerfd.c b/fs/timerfd.c
> index 6a6fc8aa1de7..f5094e070e9a 100644
> --- a/fs/timerfd.c
> +++ b/fs/timerfd.c
> @@ -284,8 +284,16 @@ static ssize_t timerfd_read(struct file *file,
> char __user *buf, size_t count,
>                                         &ctx->t.alarm, ctx->tintv) - 1;
>                                 alarm_restart(&ctx->t.alarm);
>                         } else {
> -                               ticks += hrtimer_forward_now(&ctx->t.tmr,
> -                                                            ctx->tintv) - 1;
> +                               u64 nooftimeo = hrtimer_forward_now(&ctx->t.tmr,
> +                                                                ctx->tintv);

nooftimeo is pretty non-intuitive. The function documentation of
hrtimer_forward_now() says:

      Returns the number of overruns.

So the obvious variable name is overruns, right?

> +                               /*
> +                                * ticks shouldn't become zero at this point.
> +                                * Ignore if hrtimer_forward_now returns 0
> +                                * due to larger backward time drift.

Again. This explanation does not make any sense at all.

Time does not go backwards, except if it is CLOCK_REALTIME which can be set
backwards via clock_settime() or settimeofday().

> +                                */
> +                               if (likely(nooftimeo)) {
> +                                       ticks += nooftimeo - 1;
> +                               }

Again: Pointless brackets.

If you disagree with my review comment, then tell me in a reply. If not,
then fix it. If you decide to ignore my comments, then don't wonder if I
ignore your patches.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-08-16 10:45   ` Thomas Gleixner
@ 2019-08-16 16:55     ` Arul Jeniston
  2019-08-16 17:00       ` Arul Jeniston
  0 siblings, 1 reply; 24+ messages in thread
From: Arul Jeniston @ 2019-08-16 16:55 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Hi tglx,

Thank you for your comments.
Please find my commend in-lined

On Fri, Aug 16, 2019 at 4:15 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Arul,
>
> On Fri, 16 Aug 2019, Arul Jeniston wrote:
>
> > Subject: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
>
> The prefix is not 'FS: timerfd:'
>
> 1) The usual prefix for fs/* is: 'fs:' but...
>
> 2) git log fs/timerfd.c gives you a pretty good hint for the proper
>    prefix. Look at the commits which actually do functional changes to that
>    file, not at those which do (sub)system wide cleanups/adjustments.
>
> Also 'timerfd_read function' can be written as 'timerfd_read()' which
> spares the redundant function and clearly marks it as function via the
> brackets.
>
> > 'hrtimer_forward_now()' returns zero due to bigger backward time drift.
> > This causes timerfd_read to return 0. As per man page, read on timerfd
> >  is not expected to return 0.
> > This problem is well explained in https://lkml.org/lkml/2019/7/31/442
>
> 1) The explanation needs to be in the changelog itself. Links can point to
>    discussions, bug-reports which have supplementary information.
>
> 2) Please do not use lkml.org links.
>
> Again: Please read and follow Documentation/process/submitting-patches.rst
>
> > . This patch fixes this problem.
> > Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com>
>
> Missing empty line before Signed-off-by. Please use git-log to see how
> changelogs are properly formatted.
>
> Also: 'This patch fixes this problem' is not helpful at all. Again see the
> document I already pointed you to.
>

Agreed. Would incorporate all the above comments.

> > ---
> >  fs/timerfd.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/timerfd.c b/fs/timerfd.c
> > index 6a6fc8aa1de7..f5094e070e9a 100644
> > --- a/fs/timerfd.c
> > +++ b/fs/timerfd.c
> > @@ -284,8 +284,16 @@ static ssize_t timerfd_read(struct file *file,
> > char __user *buf, size_t count,
> >                                         &ctx->t.alarm, ctx->tintv) - 1;
> >                                 alarm_restart(&ctx->t.alarm);
> >                         } else {
> > -                               ticks += hrtimer_forward_now(&ctx->t.tmr,
> > -                                                            ctx->tintv) - 1;
> > +                               u64 nooftimeo = hrtimer_forward_now(&ctx->t.tmr,
> > +                                                                ctx->tintv);
>
> nooftimeo is pretty non-intuitive. The function documentation of
> hrtimer_forward_now() says:
>
>       Returns the number of overruns.
>
> So the obvious variable name is overruns, right?
>

Agreed. Would change the variable name to overruns.

> > +                               /*
> > +                                * ticks shouldn't become zero at this point.
> > +                                * Ignore if hrtimer_forward_now returns 0
> > +                                * due to larger backward time drift.
>
> Again. This explanation does not make any sense at all.
>
> Time does not go backwards, except if it is CLOCK_REALTIME which can be set
> backwards via clock_settime() or settimeofday().
>
> > +                                */
> > +                               if (likely(nooftimeo)) {
> > +                                       ticks += nooftimeo - 1;
> > +                               }
>
> Again: Pointless brackets.
>
> If you disagree with my review comment, then tell me in a reply. If not,
> then fix it. If you decide to ignore my comments, then don't wonder if I
> ignore your patches.
>

We use CLOCK_REALTIME while creating timer_fd.
Can read() on timerfd return 0 when the clock is set to CLOCK_REALTIME?

We have Intel rangely 4 cpu system running debian stretch linux
kernel. The current clock source is set to tsc. During our testing, we
observed the time drifts backward occasionally. Through kernel
instrumentation, we observed, sometimes clocksource_delta() finds the
current time lesser than last time. and returns 0 delta.
This causes  the following code flow to return 0
 ktime_get()-->timekeeping_get_ns()-->timekeeping_get_delta()-->clocksource_delta()

> Thanks,
>
>         tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-08-16 16:55     ` Arul Jeniston
@ 2019-08-16 17:00       ` Arul Jeniston
  2019-08-16 21:17         ` Thomas Gleixner
  0 siblings, 1 reply; 24+ messages in thread
From: Arul Jeniston @ 2019-08-16 17:00 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Adding few more data points...

On Fri, Aug 16, 2019 at 10:25 PM Arul Jeniston <arul.jeniston@gmail.com> wrote:
>
> Hi tglx,
>
> Thank you for your comments.
> Please find my commend in-lined
>
> On Fri, Aug 16, 2019 at 4:15 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Arul,
> >
> > On Fri, 16 Aug 2019, Arul Jeniston wrote:
> >
> > > Subject: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
> >
> > The prefix is not 'FS: timerfd:'
> >
> > 1) The usual prefix for fs/* is: 'fs:' but...
> >
> > 2) git log fs/timerfd.c gives you a pretty good hint for the proper
> >    prefix. Look at the commits which actually do functional changes to that
> >    file, not at those which do (sub)system wide cleanups/adjustments.
> >
> > Also 'timerfd_read function' can be written as 'timerfd_read()' which
> > spares the redundant function and clearly marks it as function via the
> > brackets.
> >
> > > 'hrtimer_forward_now()' returns zero due to bigger backward time drift.
> > > This causes timerfd_read to return 0. As per man page, read on timerfd
> > >  is not expected to return 0.
> > > This problem is well explained in https://lkml.org/lkml/2019/7/31/442
> >
> > 1) The explanation needs to be in the changelog itself. Links can point to
> >    discussions, bug-reports which have supplementary information.
> >
> > 2) Please do not use lkml.org links.
> >
> > Again: Please read and follow Documentation/process/submitting-patches.rst
> >
> > > . This patch fixes this problem.
> > > Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com>
> >
> > Missing empty line before Signed-off-by. Please use git-log to see how
> > changelogs are properly formatted.
> >
> > Also: 'This patch fixes this problem' is not helpful at all. Again see the
> > document I already pointed you to.
> >
>
> Agreed. Would incorporate all the above comments.
>
> > > ---
> > >  fs/timerfd.c | 12 ++++++++++--
> > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/timerfd.c b/fs/timerfd.c
> > > index 6a6fc8aa1de7..f5094e070e9a 100644
> > > --- a/fs/timerfd.c
> > > +++ b/fs/timerfd.c
> > > @@ -284,8 +284,16 @@ static ssize_t timerfd_read(struct file *file,
> > > char __user *buf, size_t count,
> > >                                         &ctx->t.alarm, ctx->tintv) - 1;
> > >                                 alarm_restart(&ctx->t.alarm);
> > >                         } else {
> > > -                               ticks += hrtimer_forward_now(&ctx->t.tmr,
> > > -                                                            ctx->tintv) - 1;
> > > +                               u64 nooftimeo = hrtimer_forward_now(&ctx->t.tmr,
> > > +                                                                ctx->tintv);
> >
> > nooftimeo is pretty non-intuitive. The function documentation of
> > hrtimer_forward_now() says:
> >
> >       Returns the number of overruns.
> >
> > So the obvious variable name is overruns, right?
> >
>
> Agreed. Would change the variable name to overruns.
>
> > > +                               /*
> > > +                                * ticks shouldn't become zero at this point.
> > > +                                * Ignore if hrtimer_forward_now returns 0
> > > +                                * due to larger backward time drift.
> >
> > Again. This explanation does not make any sense at all.
> >
> > Time does not go backwards, except if it is CLOCK_REALTIME which can be set
> > backwards via clock_settime() or settimeofday().
> >
> > > +                                */
> > > +                               if (likely(nooftimeo)) {
> > > +                                       ticks += nooftimeo - 1;
> > > +                               }
> >
> > Again: Pointless brackets.
> >
> > If you disagree with my review comment, then tell me in a reply. If not,
> > then fix it. If you decide to ignore my comments, then don't wonder if I
> > ignore your patches.
> >
>
> We use CLOCK_REALTIME while creating timer_fd.
> Can read() on timerfd return 0 when the clock is set to CLOCK_REALTIME?
>
> We have Intel rangely 4 cpu system running debian stretch linux
> kernel. The current clock source is set to tsc. During our testing, we
> observed the time drifts backward occasionally. Through kernel
> instrumentation, we observed, sometimes clocksource_delta() finds the
> current time lesser than last time. and returns 0 delta.
>

This causes the following code flow to return a time which is lesser
than previously fetched time.
ktime_get()-->timekeeping_get_ns()-->timekeeping_get_delta()-->clocksource_delta()

Since ktime_get() returns a time which is lesser than the expiry time,
hrtimer_forward_now return 0.
This in-turn causes timerfd_read to return 0.
Is it not a bug?

> > Thanks,
> >
> >         tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-08-16 17:00       ` Arul Jeniston
@ 2019-08-16 21:17         ` Thomas Gleixner
       [not found]           ` <CACAVd4hT6QYtgtDsBcgy7c_s9WVBAH+1m0r5geBe7BUWJWYhbA@mail.gmail.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2019-08-16 21:17 UTC (permalink / raw)
  To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Arul,

On Fri, 16 Aug 2019, Arul Jeniston wrote:

> Adding few more data points...

Can you please trim your replies? It's annoying to have to search for the
meat of your mail by scrolling down several pages and paying attention to
not skip something useful inside of useless information.

> On Fri, Aug 16, 2019 at 10:25 PM Arul Jeniston <arul.jeniston@gmail.com> wrote:
> > On Fri, Aug 16, 2019 at 4:15 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > We use CLOCK_REALTIME while creating timer_fd.
> > Can read() on timerfd return 0 when the clock is set to CLOCK_REALTIME?

As CLOCK_REALTIME is subject to be set by various mechanisms, yes. See
timerfd_clock_was_set(). If that's the case, your application is missing
something. But see below ...

> > We have Intel rangely 4 cpu system running debian stretch linux
> > kernel. The current clock source is set to tsc. During our testing, we
> > observed the time drifts backward occasionally. Through kernel
> > instrumentation, we observed, sometimes clocksource_delta() finds the
> > current time lesser than last time. and returns 0 delta.

That has absolutely nothing to do with CLOCK_REALTIME. Your machines TSC is
either going backwards or not synchronized between cores.

Hint: Dell has a track record of BIOS doing the wrong things to TSC in
order to hide their 'value add' features stealing CPU time.

> This causes the following code flow to return a time which is lesser
> than previously fetched time.
> ktime_get()-->timekeeping_get_ns()-->timekeeping_get_delta()-->clocksource_delta()

ktime_get() is CLOCK_MONOTONIC and not CLOCK_REALTIME.
 
> Since ktime_get() returns a time which is lesser than the expiry time,
> hrtimer_forward_now return 0.
> This in-turn causes timerfd_read to return 0.
> Is it not a bug?

It's a bug, but either a hardware or a BIOS bug and you are trying to paper
over it at the place where you observe the symptom, which is obviously the
wrong place because:

 1) Any other time related function even in timerfd is affected as well

 2) We do not cure symptoms, we cure the root cause. And clearly the root
    cause hase not been explained and addressed.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
       [not found]           ` <CACAVd4hT6QYtgtDsBcgy7c_s9WVBAH+1m0r5geBe7BUWJWYhbA@mail.gmail.com>
@ 2019-08-17 19:23             ` Thomas Gleixner
  2019-08-19  6:07               ` Arul Jeniston
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2019-08-17 19:23 UTC (permalink / raw)
  To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Arul,

On Sat, 17 Aug 2019, Arul Jeniston wrote:

> Do you agree the possibility of returning zero value from timerfd_read()?

Obviosuly it happens.
 
> > That has absolutely nothing to do with CLOCK_REALTIME. Your machines
> > TSC is either going backwards or not synchronized between cores.
> >
> > Hint: Dell has a track record of BIOS doing the wrong things to TSC in
> > order to hide their 'value add' features stealing CPU time.
> 
> We haven't seen any issue in Dell hardware but we would definitely check
> the possibility of hardware bug.

BIOS is the more likely candidate.

> Let us say, due to some reason the tsc goes backwards. Isn't it handled in
> clocksource_delta().

No. clocksource_delta() does damage limitation. It prevents insane large
time jumps which would result if the read out TSC value is less than the
base value which is used to calculate the delta. It cannot do more than
that.

> Is timerfd_read expected to return 0 if tsc drifts backwards? If so, why it
> is not documented?
> Being a system call, we expect all return values of read() on timerfd to be
> documented in the man page.

We expect TSC not to go backwards. If it does it's broken and not usable as
a clocksource for the kernel. The problem is that this is not necessarily
easy to detect.

Fact is, that your machines TSC goes backwards or is not properly
synchronized between the cores. Otherwise the problem would not exist at
all. That's the problem which needs to be fixed and not papered over with
crude hacks and weird justifications.

> > ktime_get() is CLOCK_MONOTONIC and not CLOCK_REALTIME.
> 
> We see the same base used for CLOCK_MONOTONIC, CLOCK_REALTIME timers here.
> both MONOTONIC, REALTIME timers hits the following code flow. we confirmed
> it through instrumentation.
> timerfd_read()-->hrtimer_forward_now()-->ktime_get()-->timekeeping_get_ns()-->timekeeping_get_delta()-->clocksource_delta().
>  Do you want me to share any other logs to confirm it?

No. That's the case when you use a relative timer with CLOCK_REALTIME
because only absolute timers are affected by modifications of
CLOCK_REALTIME. So it's NOT an issue of a CLOCK_REALTIME modification via
settimeofday() or adjtimex().

> > It's a bug, but either a hardware or a BIOS bug and you are trying to
> > paper over it at the place where you observe the symptom, which is
> > obviously the wrong place because:
> >
> >  1) Any other time related function even in timerfd is affected as well
> >
> >  2) We do not cure symptoms, we cure the root cause. And clearly the root
> >     cause hase not been explained and addressed.
> 
> if we don't fix this in kernel, can we document this return value in
> timerfd read() man page?

Again:

You cannot fix a hardware problem by hacking around it at exactly one place
where you can observe it. If that problem exists on your machine, then any
other time related function is affected as well.

Are you going to submit patches against _ALL_ time{r} related syscalls to
fix^Wpaper over this? Either against the kernel or against the man pages?

As this is a 4 core Rangely, it has a properly synchronized TSC on all 4
cores which runs with constant frequency and is not affected by deeper
C-States.

Here is the flow:

timerfd_read()

   waitfortick()

timer interrupt()
      time = ktime_get()
      expire timer			time >= timer_time
        tick++;
	wakeup_reader()

   hrtimer_forward_now()
	now = ktime_get()		In the failure case now < timer_time

i.e. time went backwards since the timer was expired. That's absolutely
unexpected behaviour and no, we are not papering over it.

Did you ever quantify how much time goes backwards in that case?

Is the timer expiry and the timerfd_read() on the same CPU or on different
ones?

Can you please provide a full dmesg from boot to after the point where this
failure happens?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-08-17 19:23             ` Thomas Gleixner
@ 2019-08-19  6:07               ` Arul Jeniston
  2019-08-19  8:04                 ` Thomas Gleixner
  0 siblings, 1 reply; 24+ messages in thread
From: Arul Jeniston @ 2019-08-19  6:07 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Hi tgls,


> BIOS is the more likely candidate.

We would check BIOS and update.


> We expect TSC not to go backwards. If it does it's broken and not usable as
> a clocksource for the kernel. The problem is that this is not necessarily
> easy to detect.

We used  relative timer with CLOCK_REALTIME which behaves similar to
CLOCK_MONOTONIC.
We are unable to recreate this problem. So, we have instrumented
kernel code to find the possibility.
During normal run, we do see small amount (~1000 cycles) of backward
time drifts one in a while.
This is likely happens due to the race between multiple processors and
ISR routines.
We have added a hook to read_tsc() and observed backward time drift
when isr comes between reading tsc register and returning the value.
This drifting time differs based on the number of isr handled and the
time taken to service each isr.
During our testing, we observed, if the drifting time goes beyond 5000
cycles, timerfd_read() returns 0 sometimes.
We don't see any problem, if the drifting time is lesser than 5000 cycles.


> Again:
>
> You cannot fix a hardware problem by hacking around it at exactly one place
> where you can observe it. If that problem exists on your machine, then any
> other time related function is affected as well.


The system is up and running more than 6 months.
We don't see any log in dmesg to comment on whether it is a hardware issue.
Other than timerfd_read(), we see no functionality issue . Do we have
any data collected in the kernel to help us in analyzing in the
direction of BIOS or hardware?

> Are you going to submit patches against _ALL_ time{r} related syscalls to
> fix^Wpaper over this? Either against the kernel or against the man pages
>
> i.e. time went backwards since the timer was expired. That's absolutely
> unexpected behaviour and no, we are not papering over it.


Agreed. Our intention is not to put a workaround. Intention is to
write a reliable application that handles all values returned by a
system call.
At present, the application doesn't know  whether 0 return value is a
bug or valid case.


> Did you ever quantify how much time goes backwards in that case?

It should be more than 5000 cycles. Found it through kernel instrumentation.


> Is the timer expiry and the timerfd_read() on the same CPU or on different
> ones?


We don't have data to answer this. However, the kernel is configured
to allow timer migration.
So, we believe, the timer expiry and timerfd_read happens on different CPUs.

> Can you please provide a full dmesg from boot to after the point where this
> failure happens?

We don't see any logs in dmesg during the occurrence of this problem.
We may not be able to share complete dmesg logs due to security reasons.
We haven't seen any time drifting related messages too.
Let us know, if you are looking for any specific log message.

Thanks,
Arul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-08-19  6:07               ` Arul Jeniston
@ 2019-08-19  8:04                 ` Thomas Gleixner
  2019-08-19 14:25                   ` Arul Jeniston
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2019-08-19  8:04 UTC (permalink / raw)
  To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Arul,

On Mon, 19 Aug 2019, Arul Jeniston wrote:
> During normal run, we do see small amount (~1000 cycles) of backward
> time drifts one in a while.
> This is likely happens due to the race between multiple processors and
> ISR routines.

No.

> We have added a hook to read_tsc() and observed backward time drift
> when isr comes between reading tsc register and returning the value.
> This drifting time differs based on the number of isr handled and the
> time taken to service each isr.

This is not a drift. Please do not misuse technical expressions which have
a well defined meaning.

rdtsc()
  val = read()

interrupt
	....

  return val

Time does not go backwards in that case simply because at the point it was
taken it was correct.

Versus that timerfd problem this situaiton is completely irrelevant simply
because hrtimer_forward_now() happens _AFTER_ the timer was expired not
before.

So the read of CLOCK_MONOTONIC in hrtimer_forward_now() is _AFTER_ the read
of CLOCK_MONOTONIC in hrtimer_interrupt() which expires the timer and there
are only two issue which can make that read in hrtimer_forward_now() go
backwards vs. the time which was read when the timer was expired:

  1) TSCs are out of sync or affected otherwise

  2) Timekeeping has a bug.

That's where the problem lies it needs to be analyzed whether this is
caused by #1 or by #2. Once we know that we can discuss solutions.

> Agreed. Our intention is not to put a workaround. Intention is to
> write a reliable application that handles all values returned by a
> system call.
> At present, the application doesn't know  whether 0 return value is a
> bug or valid case.

Again, you are tackling the wrong end. You need to find, analyze and fix
the root cause.

> > Is the timer expiry and the timerfd_read() on the same CPU or on different
> > ones?
> 
> We don't have data to answer this. However, the kernel is configured
> to allow timer migration.
> So, we believe, the timer expiry and timerfd_read happens on different CPUs.

Believe is a matter of religion and pretty useless to analyze technical
problems. It's not rocket science to figure this out with tracing.

> > Can you please provide a full dmesg from boot to after the point where this
> > failure happens?
> 
> We don't see any logs in dmesg during the occurrence of this problem.
> We may not be able to share complete dmesg logs due to security reasons.
> We haven't seen any time drifting related messages too.
> Let us know, if you are looking for any specific log message.

I was asking for a full boot log for a reason. Is it impossible to stick
that into a mail?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-08-19  8:04                 ` Thomas Gleixner
@ 2019-08-19 14:25                   ` Arul Jeniston
  2019-08-19 14:52                     ` Thomas Gleixner
  0 siblings, 1 reply; 24+ messages in thread
From: Arul Jeniston @ 2019-08-19 14:25 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

hi Tglx,

...
> > > Is the timer expiry and the timerfd_read() on the same CPU or on different
> > > ones?
> >

We have 10000+ units running in production. We see this problem ~20
switches in a span of one year time.
The problem is not seen more than once in the same unit. The
occurrence is random and unpredictable.
We tried to recreate this problem in LAB by loading the units similar
to the production unit.
so far we are not able to recreate it. It is very difficult to predict
what triggered this in the production units.
So, to find root cause, we studied and instrumented kernel code.

>   1) TSCs are out of sync or affected otherwise

If the TSC clock is unstable and not synchronized, Linux kernel throws
dmesg logs and changes the current clock source to next best timer
(hpet). But we didn't see these logs in any of the 10000 units.

>   2) Timekeeping has a bug.

As per our analysis,

After the timer expiry, after tsc is read in hrtimer_forward_now()
-->ktime_get()-->timekeeping_get_ns(), if the current thread (t1) is
interrupted and/or some other thread running in different CPU (t2)
updates timekeeper cycle_last value with a latest tsc than t1,
clocksource_delta() and timekeeping_get_delta() would return 0.
Eventually   timekeeping_delta_to_ns() would return a smaller value
based on the other two parameters (mult, xtime_nsec). If
base(timekeeper.tkr_mono.base) is not updated all this time, then
ktime_get() could return a value lesser than expiry time.
Note: CONFIG_DEBUG_TIMEKEEPING is not configured in our system.

> I was asking for a full boot log for a reason. Is it impossible to stick
> that into a mail?

Give me a day time to sync-up internally and update.

Regards,
Arul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-08-19 14:25                   ` Arul Jeniston
@ 2019-08-19 14:52                     ` Thomas Gleixner
  2019-08-19 15:26                       ` Arul Jeniston
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2019-08-19 14:52 UTC (permalink / raw)
  To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Arul,

On Mon, 19 Aug 2019, Arul Jeniston wrote:
> >   1) TSCs are out of sync or affected otherwise
> 
> If the TSC clock is unstable and not synchronized, Linux kernel throws
> dmesg logs and changes the current clock source to next best timer
> (hpet). But we didn't see these logs in any of the 10000 units.

Did you see "TSC ADJUST" entries?

> >   2) Timekeeping has a bug.
> 
> As per our analysis,
> 
> After the timer expiry, after tsc is read in hrtimer_forward_now()
> -->ktime_get()-->timekeeping_get_ns(), if the current thread (t1) is
> interrupted and/or some other thread running in different CPU (t2)
> updates timekeeper cycle_last value with a latest tsc than t1,
> clocksource_delta() and timekeeping_get_delta() would return 0.
> Eventually   timekeeping_delta_to_ns() would return a smaller value
> based on the other two parameters (mult, xtime_nsec). If
> base(timekeeper.tkr_mono.base) is not updated all this time, then
> ktime_get() could return a value lesser than expiry time.
> Note: CONFIG_DEBUG_TIMEKEEPING is not configured in our system.

Sorry, but your analysis is wrong.

The timekeeping code does never return time going backwards, except for the
case where the hardware is buggered and the failure cannot be detected.

But for the above scenario:

ktime_get()
  	do {
                seq = read_seqcount_begin(&tk_core.seq);
	        base = tk->tkr_mono.base;
		nsecs = timekeeping_get_ns(&tk->tkr_mono);

        } while (read_seqcount_retry(&tk_core.seq, seq));

So if the interrupt which updates the timekeeper hits in the middle of
timekeeping_get_ns() then the result is discarded because the sequence
count changed and read_seqcount_retry() returns true. So it takes another
round which will be perfectly aligned with the updated time keeper.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-08-19 14:52                     ` Thomas Gleixner
@ 2019-08-19 15:26                       ` Arul Jeniston
  2019-08-19 15:59                         ` Thomas Gleixner
  0 siblings, 1 reply; 24+ messages in thread
From: Arul Jeniston @ 2019-08-19 15:26 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

hi Tglx,
> But for the above scenario:
>
> ktime_get()
>         do {
>                 seq = read_seqcount_begin(&tk_core.seq);
>                 base = tk->tkr_mono.base;
>                 nsecs = timekeeping_get_ns(&tk->tkr_mono);
>
>         } while (read_seqcount_retry(&tk_core.seq, seq));
>
> So if the interrupt which updates the timekeeper hits in the middle of
> timekeeping_get_ns() then the result is discarded because the sequence
> count changed and read_seqcount_retry() returns true. So it takes another
> round which will be perfectly aligned with the updated time keeper.
>

Do you mean to say the timekeeper  updates always happen from ktime_get?
My point was, when one thread is in ktime_get other thread/isr updates
timekeeper from different flow.

Regards,
Arul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-08-19 15:26                       ` Arul Jeniston
@ 2019-08-19 15:59                         ` Thomas Gleixner
       [not found]                           ` <CACAVd4iRN7=eq_B1+Yb-xcspU-Sg1dmMo_=VtLXXVPkjN1hY5Q@mail.gmail.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2019-08-19 15:59 UTC (permalink / raw)
  To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

On Mon, 19 Aug 2019, Arul Jeniston wrote:

> hi Tglx,
> > But for the above scenario:
> >
> > ktime_get()
> >         do {
> >                 seq = read_seqcount_begin(&tk_core.seq);
> >                 base = tk->tkr_mono.base;
> >                 nsecs = timekeeping_get_ns(&tk->tkr_mono);
> >
> >         } while (read_seqcount_retry(&tk_core.seq, seq));
> >
> > So if the interrupt which updates the timekeeper hits in the middle of
> > timekeeping_get_ns() then the result is discarded because the sequence
> > count changed and read_seqcount_retry() returns true. So it takes another
> > round which will be perfectly aligned with the updated time keeper.
> >
> 
> Do you mean to say the timekeeper  updates always happen from ktime_get?
> My point was, when one thread is in ktime_get other thread/isr updates
> timekeeper from different flow.

Timekeeper updates happen of course NOT from ktime_get(), but ktime_get()
is protected against concurrent updates via the seqcount. Simplified
without all the required barriers etc.

ktime_get()

    do {
       seq = tk->seq;
       if (seq & 1)
          continue;
       base = tk->base;
       nsec = get_nsec();
    while (seq != tk->seq);

update()

   tk->seq++;
   update_data();
   tk-<seq++;

It does not matter whether the update is an interrupt on the same CPU which
hits ktime_get() or whether it happens concurrent on a different CPU.

ktime_get() can never use inconsistent tk data for calculating the time.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
       [not found]                           ` <CACAVd4iRN7=eq_B1+Yb-xcspU-Sg1dmMo_=VtLXXVPkjN1hY5Q@mail.gmail.com>
@ 2019-08-19 18:29                             ` Thomas Gleixner
       [not found]                               ` <CACAVd4jAJ5QcOH=q=Q9kAz20X4_nAc7=vVU_gPWTS1UuiGK-fg@mail.gmail.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2019-08-19 18:29 UTC (permalink / raw)
  To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Arul,

On Mon, 19 Aug 2019, Arul Jeniston wrote:

> > hits ktime_get() or whether it happens concurrent on a different CPU.
> > ktime_get() can never use inconsistent tk data for calculating the time.
> 
> Agreed. I think, I am not making my point clear here.
> Do you mean to say ktime_get() would always return incremental time
> irrespective of isr and multi-processors?

Yes. The only exception is when the TSC is either jumping or not fully in
sync between cores.

> If yes, this is where, I have difference of understanding.

And your understanding is still wrong. I explain it to you _once_ more:

The side which updates the timekeeper:

    - increments the sequence count _BEFORE_ it changes any data. After that
      increment the sequence count is odd, i.e. bit 0 is set.

    - updates data (base, last, mult, shift ....)

    - increments it again _AFTER_ it updated data.  After that increment
      the sequence count is even, i.e. bit 0 is cleared.

The read out side:

start:
    - reads the sequence count
      - if sequence count is odd (update in progress) go back to start

    - reads base from timekeeper data
    - reads TSC and calculates the delta with timekeeper data
      (last, mult, shift ...), i.e. timekeeping_get_ns().

    - reads the sequence count again.

      If it is even and the same as read above, the data is valid and
      consistent and the result is returned.

      If the sequence count is different to the original value it goes back
      to start.

It does not matter at all if timekeeping_get_ns() returns occasionally a
wrong value due to timekeeper data being updated concurrently. The result
is discarded and never returned to the caller. It tries again.

All places which update the timer keeper issue the sequence count increment
protection and are properly serialized against each other. So there is no
occacional point where ktime_get() would return random crap due to being
interrupted by an update or due to a concurrent update on a different CPU.

This is a protection mechanism which is well understood in computer science
(seqlock with lockless readers) and it works in kernel timekeeping for way
more than a decade without any issue except when the underlying hardware
clocksource (TSC in that case) misbehaves. There is no way to protect the
code against this and we are not going to do anything about it simply
because we can't.

The fact that you can observe the (cycles < last) condition is not proving
anything. Just looking at the (cycles < last) condition is wrong. You need
to proof that the result is returned from ktime_get() without a retry
despite the sequence counter being changed. I doubt you can.

If you can prove that the condition is met _AND_ the sequence counter has
NOT changed, then you have proven that the TSC on your machine is not
correctly synchronized or otherwise returning crap values.

You can make up further weird theories about the incorrectness of that
code, but these theories wont become magically true.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
       [not found]                               ` <CACAVd4jAJ5QcOH=q=Q9kAz20X4_nAc7=vVU_gPWTS1UuiGK-fg@mail.gmail.com>
@ 2019-08-20  8:49                                 ` Thomas Gleixner
  2019-08-20  9:42                                   ` Arul Jeniston
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2019-08-20  8:49 UTC (permalink / raw)
  To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Arul,

 A: Because it messes up the order in which people normally read text.
 Q: Why is top-posting such a bad thing?
 A: Top-posting.
 Q: What is the most annoying thing in e-mail?

 A: No.
 Q: Should I include quotations after my reply?

 http://daringfireball.net/2007/07/on_top

On Tue, 20 Aug 2019, Arul Jeniston wrote:

> Agreed. Please find the dmesg output below. We see the problem on
> [1456773.366951].
> Let us know if you find anything suspicious.

The most suspicious thing is:

> [    0.000000] Linux version 4.9.0-8-amd64

       		       	       ^^^^^

IOW, a (dead) kernel, which does not contain the new TSC ADJUST verification
mechanism.

If you send patches against mainline in future, then please always verify
that the problem persists with the mainline kernel.

Can you please boot something more recent - at least 4.19 LTS - on that
machine and let it run for a couple of days to check whether there are 'TSC
ADJUST' related entries in dmesg?

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-08-20  8:49                                 ` Thomas Gleixner
@ 2019-08-20  9:42                                   ` Arul Jeniston
  2019-09-05  8:48                                     ` Arul Jeniston
  0 siblings, 1 reply; 24+ messages in thread
From: Arul Jeniston @ 2019-08-20  9:42 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Hi Tglx,

> Can you please boot something more recent - at least 4.19 LTS - on that
> machine and let it run for a couple of days to check whether there are 'TSC
> ADJUST' related entries in dmesg?

Sure. Would check and update.

Regards,
Arul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-08-20  9:42                                   ` Arul Jeniston
@ 2019-09-05  8:48                                     ` Arul Jeniston
  2019-09-05 15:34                                       ` Thomas Gleixner
  0 siblings, 1 reply; 24+ messages in thread
From: Arul Jeniston @ 2019-09-05  8:48 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

hi Tglx,

When we adjust the date setting using date command we observed
'timerfd_read()' on CLOCK_REALTIME (TFD_TIMER_ABSTIME flag is set)
returns 0.
we don't see any hardware influence here and we are able to recreate
it consistently. Is it expected? if yes, isn't it something to be
documented in timerfd read() man page?

Steps followed to make timerfd read to return 0.

1. run the following script in background to move the date front and back

root@sonic:~/test/exp2# cat ./clock_change_test.sh
while [ 1 ]
do
date --set="04 sep 2019 09:50:21" > /dev/null
date --set="04 sep 2019 09:58:21" > /dev/null
done

root@sonic:~/test/exp2# ./clock_change_test.sh

2. Execute the following program to do read on CLOCK_REALTIME timer.

root@sonic:~/test/exp2# cat timerfdtest.c
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>

int main()
{
        int tfd;
        struct itimerspec ts;
        tfd = timerfd_create(CLOCK_REALTIME, 0);
        ts.it_interval.tv_sec = 0;
        ts.it_interval.tv_nsec = 10;
        ts.it_value.tv_sec = 10;
        ts.it_value.tv_nsec = 1000;
        if (timerfd_settime(tfd, TFD_TIMER_ABSTIME, &ts, NULL) < 0) {
          return EXIT_FAILURE;
        }

        while(1) {
                uint64_t noftimeo;
                if (read(tfd, &noftimeo, sizeof(noftimeo)) == 0) {
                   printf("read returned 0 nooftimeo:%d\n", noftimeo);
                }
        }
        /* not reached */
        return EXIT_SUCCESS;
}

3. Observed the following read failure logs within few minutes.
root@sonic:~/test/exp2# ./timerfdtest
read returned 0 nooftimeo:1392


Regards,
Arul


On Tue, Aug 20, 2019 at 3:12 PM Arul Jeniston <arul.jeniston@gmail.com> wrote:
>
> Hi Tglx,
>
> > Can you please boot something more recent - at least 4.19 LTS - on that
> > machine and let it run for a couple of days to check whether there are 'TSC
> > ADJUST' related entries in dmesg?
>
> Sure. Would check and update.
>
> Regards,
> Arul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-09-05  8:48                                     ` Arul Jeniston
@ 2019-09-05 15:34                                       ` Thomas Gleixner
  2019-09-06 16:36                                         ` Arul Jeniston
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2019-09-05 15:34 UTC (permalink / raw)
  To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Arul,

On Thu, 5 Sep 2019, Arul Jeniston wrote:
> When we adjust the date setting using date command we observed
> 'timerfd_read()' on CLOCK_REALTIME (TFD_TIMER_ABSTIME flag is set)
> returns 0.
> we don't see any hardware influence here and we are able to recreate
> it consistently. Is it expected? if yes, isn't it something to be
> documented in timerfd read() man page?

It's expected, yes. Simply because it hits the following condition:

     armtimer(T1)

     settime(T1 + X)  --> causes timer to fire

     		      	  	 wakeup reader
     settime(T0)

				 read number of intervals: 0

				 i.e. timer did not expire

Changing the return value to 1 would be just a cosmetic workaround. We
could also jump back and wait again. But that's all not consistent because

     armtimer(T1)

     settime(T1 + X)  --> causes timer to fire

     		      	  	 wakeup reader

				 read number of intervals: 1
     settime(T0)

				 user space reads time and figures that
				 the returned tick is bogus.

So I rather change the documentation (this applies only to CLOCK_REALTIME
and CLOCK_REALTIME_ALARM) and explain the rationale.

For applications which care about notifications when the time was set,
timerfd_settime() provides TFD_TIMER_CANCEL_ON_SET which causes the timer
to be canceled when time is set and returns -ECANCELED from the
read. That's unambiguous.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-09-05 15:34                                       ` Thomas Gleixner
@ 2019-09-06 16:36                                         ` Arul Jeniston
  2019-09-07 14:38                                           ` Thomas Gleixner
  0 siblings, 1 reply; 24+ messages in thread
From: Arul Jeniston @ 2019-09-06 16:36 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Hi tglx,

>Changing the return value to 1 would be just a cosmetic workaround.

Agreed. Returning 1 is incorrect as It causes the next read() to
return before the interval time passed.

>So I rather change the documentation (this applies only to CLOCK_REALTIME
>and CLOCK_REALTIME_ALARM) and explain the rationale.

When timerfd_read() returns 0, hrtimer_forward() doesn't change expiry
time, So, instead of modifying the man page, can we call
timerfd_read() functionality once again from kernel.

For example:-
timerfd_read_wrapper()
{
   do {
     ret = timerfd_read(...);
   } while (ret == 0);
}

Let us know whether you see any problem in handling this race in kernel.

Regards,
Arul


On Thu, Sep 5, 2019 at 9:04 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Arul,
>
> On Thu, 5 Sep 2019, Arul Jeniston wrote:
> > When we adjust the date setting using date command we observed
> > 'timerfd_read()' on CLOCK_REALTIME (TFD_TIMER_ABSTIME flag is set)
> > returns 0.
> > we don't see any hardware influence here and we are able to recreate
> > it consistently. Is it expected? if yes, isn't it something to be
> > documented in timerfd read() man page?
>
> It's expected, yes. Simply because it hits the following condition:
>
>      armtimer(T1)
>
>      settime(T1 + X)  --> causes timer to fire
>
>                                  wakeup reader
>      settime(T0)
>
>                                  read number of intervals: 0
>
>                                  i.e. timer did not expire
>
> Changing the return value to 1 would be just a cosmetic workaround. We
> could also jump back and wait again. But that's all not consistent because
>
>      armtimer(T1)
>
>      settime(T1 + X)  --> causes timer to fire
>
>                                  wakeup reader
>
>                                  read number of intervals: 1
>      settime(T0)
>
>                                  user space reads time and figures that
>                                  the returned tick is bogus.
>
> So I rather change the documentation (this applies only to CLOCK_REALTIME
> and CLOCK_REALTIME_ALARM) and explain the rationale.
>
> For applications which care about notifications when the time was set,
> timerfd_settime() provides TFD_TIMER_CANCEL_ON_SET which causes the timer
> to be canceled when time is set and returns -ECANCELED from the
> read. That's unambiguous.
>
> Thanks,
>
>         tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-09-06 16:36                                         ` Arul Jeniston
@ 2019-09-07 14:38                                           ` Thomas Gleixner
  2019-11-05  5:01                                             ` Arul Jeniston
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2019-09-07 14:38 UTC (permalink / raw)
  To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Arul,

On Fri, 6 Sep 2019, Arul Jeniston wrote:
> >Changing the return value to 1 would be just a cosmetic workaround.
> 
> Agreed. Returning 1 is incorrect as It causes the next read() to
> return before the interval time passed.
> 
> >So I rather change the documentation (this applies only to CLOCK_REALTIME
> >and CLOCK_REALTIME_ALARM) and explain the rationale.
> 
> When timerfd_read() returns 0, hrtimer_forward() doesn't change expiry
> time, So, instead of modifying the man page, can we call
> timerfd_read() functionality once again from kernel.
> 
> For example:-
> timerfd_read_wrapper()
> {
>    do {
>      ret = timerfd_read(...);
>    } while (ret == 0);
> }
> 
> Let us know whether you see any problem in handling this race in kernel.

There is no race. It's defined behaviour and I explained it to you in great
length why it is correct to return 0 and document that in the man page.

Any CLOCK_REALTIME ABSTIME based interface of the kernel is affected by
this and no, we are not papering over it in one particular place just
because.

If clock REALTIME gets set then all bets are off. The syscalls can return
either early or userspace cam observe that the return value is bogus when
it actually reads the time. You cannot handle this by any means.

The only way to handle this gracefully is by using the
TFD_TIMER_CANCEL_ON_SET flag and reevaluate the situation in user space.

So I'm going to send a patch to document that in the manpage.

Thanks,

	tglx





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-09-07 14:38                                           ` Thomas Gleixner
@ 2019-11-05  5:01                                             ` Arul Jeniston
  2019-11-05 10:01                                               ` Thomas Gleixner
  0 siblings, 1 reply; 24+ messages in thread
From: Arul Jeniston @ 2019-11-05  5:01 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

hi Tglx,

>  So I'm going to send a patch to document that in the manpage.

Did you get a chance to make the manpage patch? if yes, please help us
by sharing the link where we can find the patch.

Regards,
Arul

On Sat, Sep 7, 2019 at 8:08 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Arul,
>
> On Fri, 6 Sep 2019, Arul Jeniston wrote:
> > >Changing the return value to 1 would be just a cosmetic workaround.
> >
> > Agreed. Returning 1 is incorrect as It causes the next read() to
> > return before the interval time passed.
> >
> > >So I rather change the documentation (this applies only to CLOCK_REALTIME
> > >and CLOCK_REALTIME_ALARM) and explain the rationale.
> >
> > When timerfd_read() returns 0, hrtimer_forward() doesn't change expiry
> > time, So, instead of modifying the man page, can we call
> > timerfd_read() functionality once again from kernel.
> >
> > For example:-
> > timerfd_read_wrapper()
> > {
> >    do {
> >      ret = timerfd_read(...);
> >    } while (ret == 0);
> > }
> >
> > Let us know whether you see any problem in handling this race in kernel.
>
> There is no race. It's defined behaviour and I explained it to you in great
> length why it is correct to return 0 and document that in the man page.
>
> Any CLOCK_REALTIME ABSTIME based interface of the kernel is affected by
> this and no, we are not papering over it in one particular place just
> because.
>
> If clock REALTIME gets set then all bets are off. The syscalls can return
> either early or userspace cam observe that the return value is bogus when
> it actually reads the time. You cannot handle this by any means.
>
> The only way to handle this gracefully is by using the
> TFD_TIMER_CANCEL_ON_SET flag and reevaluate the situation in user space.
>
> So I'm going to send a patch to document that in the manpage.
>
> Thanks,
>
>         tglx
>
>
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-11-05  5:01                                             ` Arul Jeniston
@ 2019-11-05 10:01                                               ` Thomas Gleixner
  2019-11-06  3:38                                                 ` Arul Jeniston
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2019-11-05 10:01 UTC (permalink / raw)
  To: Arul Jeniston; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

Arul,

On Tue, 5 Nov 2019, Arul Jeniston wrote:
> >  So I'm going to send a patch to document that in the manpage.
> 
> Did you get a chance to make the manpage patch? if yes, please help us
> by sharing the link where we can find the patch.

No. I would have Cc'ed you when posting. It's somewhere on my todo list.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
  2019-11-05 10:01                                               ` Thomas Gleixner
@ 2019-11-06  3:38                                                 ` Arul Jeniston
  0 siblings, 0 replies; 24+ messages in thread
From: Arul Jeniston @ 2019-11-06  3:38 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: viro, linux-fsdevel, linux-kernel, arul_mc

hi Tglx,

Thank you for the update. We have few customers who are waiting for
this patch. Please prioritize it.

Regards,
Arul

On Tue, Nov 5, 2019 at 3:31 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Arul,
>
> On Tue, 5 Nov 2019, Arul Jeniston wrote:
> > >  So I'm going to send a patch to document that in the manpage.
> >
> > Did you get a chance to make the manpage patch? if yes, please help us
> > by sharing the link where we can find the patch.
>
> No. I would have Cc'ed you when posting. It's somewhere on my todo list.
>
> Thanks,
>
>         tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, back to index

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-16  8:32 [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 'hrtimer_forward_now()' returns zero due to bigger backward time drift. This causes timerfd_read to return 0. As per man page, read on timerfd is not expected to return 0. This patch fixes this problem. Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com> arul.jeniston
2019-08-16  9:05 ` [PATCH] FS: timerfd: [Trimmed unreadable long subject line ] Thomas Gleixner
2019-08-16 10:22 ` [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function Arul Jeniston
2019-08-16 10:45   ` Thomas Gleixner
2019-08-16 16:55     ` Arul Jeniston
2019-08-16 17:00       ` Arul Jeniston
2019-08-16 21:17         ` Thomas Gleixner
     [not found]           ` <CACAVd4hT6QYtgtDsBcgy7c_s9WVBAH+1m0r5geBe7BUWJWYhbA@mail.gmail.com>
2019-08-17 19:23             ` Thomas Gleixner
2019-08-19  6:07               ` Arul Jeniston
2019-08-19  8:04                 ` Thomas Gleixner
2019-08-19 14:25                   ` Arul Jeniston
2019-08-19 14:52                     ` Thomas Gleixner
2019-08-19 15:26                       ` Arul Jeniston
2019-08-19 15:59                         ` Thomas Gleixner
     [not found]                           ` <CACAVd4iRN7=eq_B1+Yb-xcspU-Sg1dmMo_=VtLXXVPkjN1hY5Q@mail.gmail.com>
2019-08-19 18:29                             ` Thomas Gleixner
     [not found]                               ` <CACAVd4jAJ5QcOH=q=Q9kAz20X4_nAc7=vVU_gPWTS1UuiGK-fg@mail.gmail.com>
2019-08-20  8:49                                 ` Thomas Gleixner
2019-08-20  9:42                                   ` Arul Jeniston
2019-09-05  8:48                                     ` Arul Jeniston
2019-09-05 15:34                                       ` Thomas Gleixner
2019-09-06 16:36                                         ` Arul Jeniston
2019-09-07 14:38                                           ` Thomas Gleixner
2019-11-05  5:01                                             ` Arul Jeniston
2019-11-05 10:01                                               ` Thomas Gleixner
2019-11-06  3:38                                                 ` Arul Jeniston

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git