From: Thomas Gleixner <tglx@linutronix.de>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: John Stultz <john.stultz@linaro.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Liav Rehana <liavr@mellanox.com>,
	Chris Metcalf <cmetcalf@mellanox.com>,
	Richard Cochran <richardcochran@gmail.com>,
	Ingo Molnar <mingo@kernel.org>,
	Prarit Bhargava <prarit@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	"Christopher S . Hall" <christopher.s.hall@intel.com>,
	"4.6+" <stable@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH] timekeeping: Change type of nsec variable to unsigned in its calculation.
Date: Fri, 2 Dec 2016 09:36:42 +0100 (CET)
Message-ID: <alpine.DEB.2.20.1612020921500.4295@nanos>
In-Reply-To: <20161201233210.GB31412@umbus.fritz.box>

On Fri, 2 Dec 2016, David Gibson wrote:
> On Thu, Dec 01, 2016 at 12:59:51PM +0100, Thomas Gleixner wrote:
> > So I assume that you are talking about a VM which was not scheduled by the
> > host due to overcommitment (who ever thought that this is a good idea) or
> > whatever other reason (yes, people were complaining about wreckage caused
> > by stopping kernels with debuggers) for a long enough time to trigger that
> > overflow situation. If that's the case then the unsigned conversion will
> > just make it more unlikely but it still will happen.
> 
> It was essentially the stopped by debugger case.  I forget exactly
> why, but the guest was being explicitly stopped from outside, it
> wasn't just scheduling lag.  I think it was something in the vicinity
> of 10 minutes stopped.

Ok. Debuggers stopping stuff is one issue, but if I understood Liav
correctly, then he is seeing the issue on a heavily loaded machine.

Liav, can you please describe the scenario in detail? Are you observing
this on bare metal or in a VM which gets scheduled out long enough or was
there debugging/hypervisor intervention involved?

> It's long enough ago that I can't be sure, but I thought we'd tried
> various different stoppage periods, which should have also triggered
> the unsigned overflow you're describing, and didn't observe the crash
> once the change was applied.  Note that there have been other changes
> to the timekeeping code since then, which might have made a
> difference.
> 
> I agree that it's not reasonable for the guest to be entirely
> unaffected by such a large stoppage: I'd have no complaints if the
> guest time was messed up, and/or it spewed warnings.  But complete
> guest death seems a rather more fragile response to the situation than
> we'd like.

Guest death? Is it really dead/crashed or just stuck in that endless loop
trying to add that huge negative value piecewise?

That's at least what Liav was describing as he mentioned
__iter_div_u64_rem() explicitly.
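
To illustrate why such a loop looks like a hang rather than a crash, here
is a rough userspace sketch of an iterative divide. It is a model, not the
kernel code: the names iter_div_rem() and iter_div_steps() are made up for
this example, and it assumes the negative nanosecond value ends up being
treated as an unsigned 64-bit dividend with NSEC_PER_SEC as the divisor.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000u

/*
 * Hypothetical model of an iterative divide: subtract the divisor
 * until the remainder is smaller than it, counting the subtractions.
 * Cheap when the quotient is small, very slow when it is huge.
 */
static uint32_t iter_div_rem(uint64_t dividend, uint32_t divisor,
			     uint64_t *remainder)
{
	uint32_t ret = 0;

	while (dividend >= divisor) {
		dividend -= divisor;
		ret++;
	}
	*remainder = dividend;
	return ret;
}

/*
 * Number of iterations the loop above needs for a given input,
 * computed directly so that it can be evaluated for huge values.
 */
static uint64_t iter_div_steps(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}
```

Under these assumptions a value of -1 reinterpreted as unsigned needs
iter_div_steps((uint64_t)(int64_t)-1, NSEC_PER_SEC) iterations, which is
on the order of 1.8 * 10^10 subtractions, so the CPU is busy for a very
long time instead of being dead.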

While I'm less worried about debuggers, I worry about the real thing.

I agree that we should not starve after resume from a debug stop, but in
that case the least of my worries is time going backwards.

Though if the signed mult overrun is observable in a live system, then we
need to worry about time going backwards even with the unsigned
conversion, simply because once we have fixed the starvation issue, people
with insane enough setups will trigger the unsigned overrun and complain
about time going backwards.
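
The point about the unsigned conversion only moving the limit can be
sketched as follows. This is an illustration under stated assumptions, not
the timekeeping code: the helper names are invented, the conversion is
simplified to (delta * mult) >> shift, and mult = 1 << 24 with shift = 24
is a hypothetical clocksource where one cycle equals one nanosecond.

```c
#include <stdint.h>

/* Largest cycle delta whose product with mult still fits in a signed 64-bit value. */
static uint64_t max_delta_signed(uint32_t mult)
{
	return (uint64_t)INT64_MAX / mult;
}

/* Largest cycle delta whose product with mult still fits in an unsigned 64-bit value. */
static uint64_t max_delta_unsigned(uint32_t mult)
{
	return UINT64_MAX / mult;
}

/*
 * Returns 1 if the 64-bit product reads as negative when stored in a
 * signed variable. The unsigned-to-signed conversion is
 * implementation-defined in C; two's complement is assumed here.
 */
static int product_negative_as_s64(uint64_t delta, uint32_t mult)
{
	return (int64_t)(delta * mult) < 0;
}

/*
 * Simplified cycles-to-nanoseconds conversion with unsigned math.
 * Unsigned overflow wraps silently, so a large enough delta yields a
 * small result, i.e. time appears to go backwards.
 */
static uint64_t cyc_to_ns_wrapping(uint64_t delta, uint32_t mult,
				   uint32_t shift)
{
	return (delta * mult) >> shift;
}
```

With these hypothetical values the signed limit is 2^39 - 1 cycles (about
9 minutes) and the unsigned limit is 2^40 - 1 cycles (about 18 minutes).
A delta of 2^40 + 5 cycles converts to 5 ns, so the unsigned type roughly
doubles the window but does not remove the problem.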

Thanks,

	tglx
