Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Arul Jeniston <arul.jeniston@gmail.com>
Cc: viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, arul_mc@dell.com
Subject: Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.
Date: Sat, 17 Aug 2019 21:23:36 +0200 (CEST)
Message-ID: <alpine.DEB.2.21.1908171942370.1923@nanos.tec.linutronix.de> (raw)
In-Reply-To: <CACAVd4hT6QYtgtDsBcgy7c_s9WVBAH+1m0r5geBe7BUWJWYhbA@mail.gmail.com>

Arul,

On Sat, 17 Aug 2019, Arul Jeniston wrote:

> Do you agree the possibility of returning zero value from timerfd_read()?

Obviosuly it happens.
 
> > That has absolutely nothing to do with CLOCK_REALTIME. Your machines
> > TSC is either going backwards or not synchronized between cores.
> >
> > Hint: Dell has a track record of BIOS doing the wrong things to TSC in
> > order to hide their 'value add' features stealing CPU time.
> 
> We haven't seen any issue in Dell hardware but we would definitely check
> the possibility of hardware bug.

BIOS is the more likely candidate.

> Let us say, due to some reason the tsc goes backwards. Isn't it handled in
> clocksource_delta().

No. clocksource_delta() does damage limitation. It prevents insane large
time jumps which would result if the read out TSC value is less than the
base value which is used to calculate the delta. It cannot do more than
that.

> Is timerfd_read expected to return 0 if tsc drifts backwards? If so, why it
> is not documented?
> Being a system call, we expect all return values of read() on timerfd to be
> documented in the man page.

We expect TSC not to go backwards. If it does it's broken and not usable as
a clocksource for the kernel. The problem is that this is not necessarily
easy to detect.

Fact is, that your machines TSC goes backwards or is not properly
synchronized between the cores. Otherwise the problem would not exist at
all. That's the problem which needs to be fixed and not papered over with
crude hacks and weird justifications.

> > ktime_get() is CLOCK_MONOTONIC and not CLOCK_REALTIME.
> 
> We see the same base used for CLOCK_MONOTONIC, CLOCK_REALTIME timers here.
> both MONOTONIC, REALTIME timers hits the following code flow. we confirmed
> it through instrumentation.
> timerfd_read()-->hrtimer_forward_now()-->ktime_get()-->timekeeping_get_ns()-->timekeeping_get_delta()-->clocksource_delta().
>  Do you want me to share any other logs to confirm it?

No. That's the case when you use a relative timer with CLOCK_REALTIME
because only absolute timers are affected by modifications of
CLOCK_REALTIME. So it's NOT an issue of a CLOCK_REALTIME modification via
settimeofday() or adjtimex().

> > It's a bug, but either a hardware or a BIOS bug and you are trying to
> > paper over it at the place where you observe the symptom, which is
> > obviously the wrong place because:
> >
> >  1) Any other time related function even in timerfd is affected as well
> >
> >  2) We do not cure symptoms, we cure the root cause. And clearly the root
> >     cause hase not been explained and addressed.
> 
> if we don't fix this in kernel, can we document this return value in
> timerfd read() man page?

Again:

You cannot fix a hardware problem by hacking around it at exactly one place
where you can observe it. If that problem exists on your machine, then any
other time related function is affected as well.

Are you going to submit patches against _ALL_ time{r} related syscalls to
fix^Wpaper over this? Either against the kernel or against the man pages?

As this is a 4 core Rangely, it has a properly synchronized TSC on all 4
cores which runs with constant frequency and is not affected by deeper
C-States.

Here is the flow:

timerfd_read()

   waitfortick()

timer interrupt()
      time = ktime_get()
      expire timer			time >= timer_time
        tick++;
	wakeup_reader()

   hrtimer_forward_now()
	now = ktime_get()		In the failure case now < timer_time

i.e. time went backwards since the timer was expired. That's absolutely
unexpected behaviour and no, we are not papering over it.

Did you ever quantify how much time goes backwards in that case?

Is the timer expiry and the timerfd_read() on the same CPU or on different
ones?

Can you please provide a full dmesg from boot to after the point where this
failure happens?

Thanks,

	tglx

  parent reply index

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-16  8:32 [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function. 'hrtimer_forward_now()' returns zero due to bigger backward time drift. This causes timerfd_read to return 0. As per man page, read on timerfd is not expected to return 0. This patch fixes this problem. Signed-off-by: Arul Jeniston <arul.jeniston@gmail.com> arul.jeniston
2019-08-16  9:05 ` [PATCH] FS: timerfd: [Trimmed unreadable long subject line ] Thomas Gleixner
2019-08-16 10:22 ` [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function Arul Jeniston
2019-08-16 10:45   ` Thomas Gleixner
2019-08-16 16:55     ` Arul Jeniston
2019-08-16 17:00       ` Arul Jeniston
2019-08-16 21:17         ` Thomas Gleixner
     [not found]           ` <CACAVd4hT6QYtgtDsBcgy7c_s9WVBAH+1m0r5geBe7BUWJWYhbA@mail.gmail.com>
2019-08-17 19:23             ` Thomas Gleixner [this message]
2019-08-19  6:07               ` Arul Jeniston
2019-08-19  8:04                 ` Thomas Gleixner
2019-08-19 14:25                   ` Arul Jeniston
2019-08-19 14:52                     ` Thomas Gleixner
2019-08-19 15:26                       ` Arul Jeniston
2019-08-19 15:59                         ` Thomas Gleixner
     [not found]                           ` <CACAVd4iRN7=eq_B1+Yb-xcspU-Sg1dmMo_=VtLXXVPkjN1hY5Q@mail.gmail.com>
2019-08-19 18:29                             ` Thomas Gleixner
     [not found]                               ` <CACAVd4jAJ5QcOH=q=Q9kAz20X4_nAc7=vVU_gPWTS1UuiGK-fg@mail.gmail.com>
2019-08-20  8:49                                 ` Thomas Gleixner
2019-08-20  9:42                                   ` Arul Jeniston
2019-09-05  8:48                                     ` Arul Jeniston
2019-09-05 15:34                                       ` Thomas Gleixner
2019-09-06 16:36                                         ` Arul Jeniston
2019-09-07 14:38                                           ` Thomas Gleixner
2019-11-05  5:01                                             ` Arul Jeniston
2019-11-05 10:01                                               ` Thomas Gleixner
2019-11-06  3:38                                                 ` Arul Jeniston

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1908171942370.1923@nanos.tec.linutronix.de \
    --to=tglx@linutronix.de \
    --cc=arul.jeniston@gmail.com \
    --cc=arul_mc@dell.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git