All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	David Gibson <david@gibson.dropbear.id.au>,
	Liav Rehana <liavr@mellanox.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Parit Bhargava <prarit@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	"Christopher S. Hall" <christopher.s.hall@intel.com>,
	Chris Metcalf <cmetcalf@mellanox.com>,
	Richard Cochran <richardcochran@gmail.com>,
	John Stultz <john.stultz@linaro.org>
Subject: [PATCH 4.8 11/96] timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion
Date: Fri,  6 Jan 2017 22:42:59 +0100	[thread overview]
Message-ID: <20170106214228.148345925@linuxfoundation.org> (raw)
In-Reply-To: <20170106214227.601120243@linuxfoundation.org>

4.8-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 9c1645727b8fa90d07256fdfcc45bf831242a3ab upstream.

The clocksource delta to nanoseconds conversion is using signed math, but
the delta is unsigned. This makes the conversion space smaller than
necessary and in case of a multiplication overflow the conversion can
become negative. The conversion is done with scaled math:

    s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;

Shifting a signed integer right obvioulsy preserves the sign, which has
interesting consequences:

 - Time jumps backwards

 - __iter_div_u64_rem() which is used in one of the calling code pathes
   will take forever to piecewise calculate the seconds/nanoseconds part.

This has been reported by several people with different scenarios:

David observed that when stopping a VM with a debugger:

 "It was essentially the stopped by debugger case.  I forget exactly why,
  but the guest was being explicitly stopped from outside, it wasn't just
  scheduling lag.  I think it was something in the vicinity of 10 minutes
  stopped."

 When lifting the stop the machine went dead.

The stopped by debugger case is not really interesting, but nevertheless it
would be a good thing not to die completely.

But this was also observed on a live system by Liav:

 "When the OS is too overloaded, delta will get a high enough value for the
  msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
  after the shift the nsec variable will gain a value similar to
  0xffffffffff000000."

Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8
("time: Add cycles to nanoseconds translation"). It had been fixed a year
ago already in commit 35a4933a8959 ("time: Avoid signed overflow in
timekeeping_get_ns()").

Though it's not surprising that the issue has been reintroduced because the
function itself and the whole call chain uses s64 for the result and the
propagation of it. The change in this recent commit is subtle:

   s64 nsec;

-  nsec = (d * m + n) >> s:
+  nsec = d * m + n;
+  nsec >>= s;

d being type of cycle_t adds another level of obfuscation.

This wouldn't have happened if the previous change to unsigned computation
would have made the 'nsec' variable u64 right away and a follow up patch
had cleaned up the whole call chain.

There have been patches submitted which basically did a revert of the above
patch leaving everything else unchanged as signed. Back to square one. This
spawned a admittedly pointless discussion about potential users which rely
on the unsigned behaviour until someone pointed out that it had been fixed
before. The changelogs of said patches added further confusion as they made
finally false claims about the consequences for eventual users which expect
signed results.

Despite delta being cycle_t, aka. u64, it's very well possible to hand in
a signed negative value and the signed computation will happily return the
correct result. But nobody actually sat down and analyzed the code which
was added as user after the propably unintended signed conversion.

Though in sensitive code like this it's better to analyze it proper and
make sure that nothing relies on this than hunting the subtle wreckage half
a year later. After analyzing all call chains it stands that no caller can
hand in a negative value (which actually would work due to the s64 cast)
and rely on the signed math to do the right thing.

Change the conversion function to unsigned math. The conversion of all call
chains is done in a follow up patch.

This solves the starvation issue, which was caused by the negative result,
but it does not solve the underlying problem. It merily procrastinates
it. When the timekeeper update is deferred long enough that the unsigned
multiplication overflows, then time going backwards is observable again.

It does neither solve the issue of clocksources with a small counter width
which will wrap around possibly several times and cause random time stamps
to be generated. But those are usually not found on systems used for
virtualization, so this is likely a non issue.

I took the liberty to claim authorship for this simply because
analyzing all callsites and writing the changelog took substantially
more time than just making the simple s/s64/u64/ change and ignore the
rest.

Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation")
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Reported-by: Liav Rehana <liavr@mellanox.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Parit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/time/timekeeping.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -299,10 +299,10 @@ u32 (*arch_gettimeoffset)(void) = defaul
 static inline u32 arch_gettimeoffset(void) { return 0; }
 #endif
 
-static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
+static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
 					  cycle_t delta)
 {
-	s64 nsec;
+	u64 nsec;
 
 	nsec = delta * tkr->mult + tkr->xtime_nsec;
 	nsec >>= tkr->shift;

  parent reply	other threads:[~2017-01-06 21:48 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20170106215010epcas3p298f6cd3d6c81baf1e1e724741444d929@epcas3p2.samsung.com>
2017-01-06 21:42 ` [PATCH 4.8 00/96] 4.8.17-stable review Greg Kroah-Hartman
2017-01-06 21:42   ` [PATCH 4.8 01/96] ssb: Fix error routine when fallback SPROM fails Greg Kroah-Hartman
2017-01-06 21:42   ` [PATCH 4.8 02/96] rtlwifi: Fix enter/exit power_save Greg Kroah-Hartman
2017-01-06 21:42   ` [PATCH 4.8 03/96] perf/x86: Fix exclusion of BTS and LBR for Goldmont Greg Kroah-Hartman
2017-01-06 21:42   ` [PATCH 4.8 04/96] perf/x86/intel/cstate: Prevent hotplug callback leak Greg Kroah-Hartman
2017-01-06 21:42   ` [PATCH 4.8 05/96] rtl8xxxu: Work around issue with 8192eu and 8723bu devices not reconnecting Greg Kroah-Hartman
2017-01-06 21:42   ` [PATCH 4.8 06/96] cfg80211/mac80211: fix BSS leaks when abandoning assoc attempts Greg Kroah-Hartman
2017-01-06 21:42   ` [PATCH 4.8 07/96] ath9k: fix ath9k_hw_gpio_get() to return 0 or 1 on success Greg Kroah-Hartman
2017-01-06 21:42   ` [PATCH 4.8 08/96] ath9k: Really fix LED polarity for some Mini PCI AR9220 MB92 cards Greg Kroah-Hartman
2017-01-06 21:42   ` [PATCH 4.8 09/96] mmc: sdhci: Fix recovery from tuning timeout Greg Kroah-Hartman
2017-01-06 21:42   ` [PATCH 4.8 10/96] regulator: stw481x-vmmc: fix ages old enable error Greg Kroah-Hartman
2017-01-06 21:42   ` Greg Kroah-Hartman [this message]
2017-01-06 21:43   ` [PATCH 4.8 12/96] gpio: chardev: Return error for seek operations Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 13/96] arm64: tegra: Add VDD_GPU regulator to Jetson TX1 Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 14/96] clk: bcm2835: Avoid overwriting the div info when disabling a pll_div clk Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 15/96] thermal: hwmon: Properly report critical temperature in sysfs Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 16/96] docs: sphinx-extensions: make rstFlatTable work with docutils 0.13 Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 17/96] hv: acquire vmbus_connection.channel_mutex in vmbus_free_channels() Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 18/96] staging: comedi: ni_mio_common: fix M Series ni_ai_insn_read() data mask Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 19/96] staging: comedi: ni_mio_common: fix E series ni_ai_insn_read() data Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 20/96] ACPI / video: Add force_native quirk for Dell XPS 17 L702X Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 21/96] ACPI / video: Add force_native quirk for HP Pavilion dv6 Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 22/96] drm/amd/amdgpu: enable GUI idle INT after enabling CGCG Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 23/96] drm/nouveau/gr: fallback to legacy paths during firmware lookup Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 24/96] drm/nouveau/kms: lvds panel strap moved again on maxwell Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 25/96] drm/nouveau/bios: require checksum to match for fast acpi shadow method Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 26/96] drm/nouveau/ltc: protect clearing of comptags with mutex Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 27/96] drm/nouveau/ttm: wait for bo fence to signal before unmapping vmas Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 28/96] drm/nouveau/i2c/gk110b,gm10x: use the correct implementation Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 29/96] drm/nouveau/fifo/gf100-: protect channel preempt with subdev mutex Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 32/96] drm/radeon: add additional pci revision to dpm workaround Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 33/96] drm/radeon/si: load the proper firmware on 0x87 oland boards Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 34/96] drm/gma500: Add compat ioctl Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 35/96] drm/amdgpu: fix init save/restore list in gfx_v8.0 Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 36/96] drivers/gpu/drm/ast: Fix infinite loop if read fails Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 37/96] mei: request async autosuspend at the end of enumeration Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 38/96] mei: me: add lewisburg device ids Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 39/96] block: protect iterate_bdevs() against concurrent close Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 40/96] vt: fix Scroll Lock LED trigger name Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 41/96] stm class: Fix device leak in open error path Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 42/96] scsi: megaraid_sas: For SRIOV enabled firmware, ensure VF driver waits for 30secs before reset Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 43/96] scsi: megaraid_sas: Do not set MPI2_TYPE_CUDA for JBOD FP path for FW which does not support JBOD sequence map Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 44/96] iscsi-target: Return error if unable to add network portal Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 45/96] scsi: zfcp: fix use-after-"free" in FC ingress path after TMF Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 46/96] scsi: zfcp: do not trace pure benign residual HBA responses at default level Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 47/96] scsi: zfcp: fix rport unblock race with LUN recovery Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 48/96] scsi: avoid a permanent stop of the scsi devices request queue Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 49/96] ARC: mm: arc700: Dont assume 2 colours for aliasing VIPT dcache Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 50/96] firmware: fix usermode helper fallback loading Greg Kroah-Hartman
2017-01-06 21:54     ` Yves-Alexis Perez
2017-01-13 10:58       ` Greg Kroah-Hartman
2017-01-13 10:58         ` Greg Kroah-Hartman
2017-03-24 20:01         ` Ben Gamari
2017-03-30  4:06           ` Luis R. Rodriguez
2017-03-30  4:06             ` Luis R. Rodriguez
2017-01-06 21:43   ` [PATCH 4.8 51/96] s390/vmlogrdr: fix IUCV buffer allocation Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 52/96] s390/kexec: use node 0 when re-adding crash kernel memory Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 53/96] arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 54/96] sc16is7xx: Drop bogus use of IRQF_ONESHOT Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 55/96] md/raid5: limit request size according to implementation limits Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 56/96] scsi: aacraid: remove wildcard for series 9 controllers Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 57/96] KVM: PPC: Book3S HV: Save/restore XER in checkpointed register state Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 58/96] KVM: PPC: Book3S HV: Dont lose hardware R/C bit updates in H_PROTECT Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 59/96] kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF) Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 60/96] fsnotify: Fix possible use-after-free in inode iteration on umount Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 61/96] vsock/virtio: fix src/dst cid format Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 62/96] ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 63/96] platform/x86: asus-nb-wmi.c: Add X45U quirk Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 64/96] fgraph: Handle a case where a tracer ignores set_graph_notrace Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 65/96] IB/mad: Fix an array index check Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 66/96] IPoIB: Avoid reading an uninitialized member variable Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 67/96] IB/multicast: Check ib_find_pkey() return value Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 68/96] IB/rxe: Fix a memory leak in rxe_qp_cleanup() Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 69/96] IB/cma: Fix a race condition in iboe_addr_get_sgid() Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 70/96] [media] mn88472: fix chip id check on probe Greg Kroah-Hartman
2017-01-06 21:43   ` [PATCH 4.8 71/96] [media] mn88473: " Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 72/96] [media] s5p-mfc: fix failure path of s5p_mfc_alloc_memdev() Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 73/96] [media] media: solo6x10: fix lockup by avoiding delayed register write Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 74/96] [media] v4l: tvp5150: Add missing break in set control handler Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 75/96] Input: drv260x - fix input devices parent assignment Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 76/96] i40iw: Use correct src address in memcpy to rdma stats counters Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 77/96] PCI: Check for PME in targeted sleep state Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 78/96] libceph: verify authorize reply on connect Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 79/96] nfs_write_end(): fix handling of short copies Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 80/96] pNFS: On error, do not send LAYOUTGET until the LAYOUTRETURN has completed Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 81/96] pNFS: Dont clear the layout stateid if a layout return is outstanding Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 82/96] pNFS: Clear NFS_LAYOUT_RETURN_REQUESTED when invalidating the layout stateid Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 83/96] pNFS: Fix a deadlock between read resends and layoutreturn Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 84/96] SUNRPC: fix refcounting problems with auth_gss messages Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 85/96] powerpc/64e: Convert cmpi to cmpwi in head_64.S Greg Kroah-Hartman
2017-01-06 21:44     ` Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 86/96] powerpc/ps3: Fix system hang with GCC 5 builds Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 87/96] libnvdimm, pfn: fix align attribute Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 88/96] target/user: Fix use-after-free of tcmu_cmds if they are expired Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 89/96] kconfig/nconf: Fix hang when editing symbol with a long prompt Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 90/96] sg_write()/bsg_write() is not fit to be called under KERNEL_DS Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 91/96] net: mvpp2: fix dma unmapping of TX buffers for fragments Greg Kroah-Hartman
2017-01-06 21:44   ` [PATCH 4.8 96/96] drm/i915: skip the first 4k of stolen memory on everything >= gen8 Greg Kroah-Hartman
2017-01-07  2:04   ` [PATCH 4.8 00/96] 4.8.17-stable review Shuah Khan
2017-01-07 15:53   ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170106214228.148345925@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=christopher.s.hall@intel.com \
    --cc=cmetcalf@mellanox.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=john.stultz@linaro.org \
    --cc=liavr@mellanox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lvivier@redhat.com \
    --cc=peterz@infradead.org \
    --cc=prarit@redhat.com \
    --cc=richardcochran@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.