linux-kernel.vger.kernel.org archive mirror
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH AUTOSEL 4.20 36/60] psi: avoid divide-by-zero crash inside virtual machines
Date: Wed, 13 Mar 2019 15:09:57 -0400
Message-ID: <20190313191021.158171-36-sashal@kernel.org>
In-Reply-To: <20190313191021.158171-1-sashal@kernel.org>

From: Johannes Weiner <hannes@cmpxchg.org>

[ Upstream commit 4e37504d1c49eec6434d0cc97278d2b51c9e8763 ]

We've been seeing hard-to-trigger psi crashes when running inside VM
instances:

    divide error: 0000 [#1] SMP PTI
    Modules linked in: [...]
    CPU: 0 PID: 212 Comm: kworker/0:2 Not tainted 4.16.18-119_fbk9_3817_gfe944c98d695 #119
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
    Workqueue: events psi_clock
    RIP: 0010:psi_update_stats+0x270/0x490
    RSP: 0018:ffffc90001117e10 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8800a35a13f8
    RDX: 0000000000000000 RSI: ffff8800a35a1340 RDI: 0000000000000000
    RBP: 0000000000000658 R08: ffff8800a35a1470 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000f8502
    FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fbe370fa000 CR3: 00000000b1e3a000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     psi_clock+0x12/0x50
     process_one_work+0x1e0/0x390
     worker_thread+0x2b/0x3c0
     ? rescuer_thread+0x330/0x330
     kthread+0x113/0x130
     ? kthread_create_worker_on_cpu+0x40/0x40
     ? SyS_exit_group+0x10/0x10
     ret_from_fork+0x35/0x40
    Code: 48 0f 47 c7 48 01 c2 45 85 e4 48 89 16 0f 85 e6 00 00 00 4c 8b 49 10 4c 8b 51 08 49 69 d9 f2 07 00 00 48 6b c0 64 4c 8b 29 31 d2 <48> f7 f7 49 69 d5 8d 06 00 00 48 89 c5 4c 69 f0 00 98 0b 00 48

The Code-line points to `period` being 0 inside update_stats(), and we
divide by that when calculating that period's pressure percentage.

The elapsed period should never be 0.  The reason this can happen is due
to an off-by-one in the idle time / missing period calculation combined
with a coarse sched_clock() in the virtual machine.

The target time for aggregation is advanced into the future on a fixed
grid to prevent clock drift.  So when an aggregation runs after some idle
period, we cannot just set it to "now + psi_period", but have to
calculate the downtime and advance the target time relative to itself.

However, if the aggregator was disabled for exactly one psi_period (ns),
we drop one idle period in the calculation because we compare with >
when we should use >=.
In that case, next_update will be advanced from 'now - psi_period' to
'now' when it should be moved to 'now + psi_period'.  The run finishes
with last_update == next_update == sched_clock().
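
The grid arithmetic described above can be sketched in plain C as follows.
This is only an illustration under stated assumptions, not the kernel code:
advance_target() and its parameters are hypothetical names, and the rule
"expires + (missed_periods + 1) * period" is inferred from the description
of how next_update moves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: advance the aggregation target on a fixed grid.
 * 'inclusive' selects the >= comparison; false selects the older >.
 * The update rule is an assumption inferred from the text above.
 */
uint64_t advance_target(uint64_t now, uint64_t expires,
			uint64_t period, bool inclusive)
{
	uint64_t missed_periods = 0;
	uint64_t late;

	/* The caller is expected to bail out earlier when now < expires. */
	assert(period != 0 && now >= expires);
	late = now - expires;

	if (inclusive ? late >= period : late > period)
		missed_periods = late / period;

	/* Move the target relative to itself so it stays on the grid. */
	return expires + (missed_periods + 1) * period;
}
```

With now = 20, expires = 10 and period = 10, the exclusive comparison
returns 20 (equal to now), while the inclusive one returns 30 (one period
into the future).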

With hardware clocks, this exact nanosecond match isn't likely in the
first place; but if it does happen, the clock will still have moved on and
the period will be non-zero by the time the worker runs.  A pointlessly short
period, but besides the extra work, no harm no foul.  However, a slow
sched_clock() like we have on VMs might not have advanced either by the
time the worker runs again.  And when we calculate the elapsed period, the
result, our pressure divisor, will be 0.  Ouch.
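
To make the clock effect concrete, here is a small sketch of how a coarse
clock can report a zero elapsed period.  The granularity model and the names
coarse_clock() and elapsed_period() are assumptions made for illustration;
the real sched_clock() behaviour inside a VM is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical coarse clock: the reported time only advances in steps
 * of 'granularity' nanoseconds (an assumed model, for illustration).
 */
uint64_t coarse_clock(uint64_t true_ns, uint64_t granularity)
{
	assert(granularity != 0);
	return true_ns - (true_ns % granularity);
}

/*
 * Elapsed period between two aggregation runs as seen through the
 * coarse clock.  A result of 0 is the value that would end up as the
 * divisor in the pressure percentage calculation.
 */
uint64_t elapsed_period(uint64_t last_true_ns, uint64_t now_true_ns,
			uint64_t granularity)
{
	return coarse_clock(now_true_ns, granularity) -
	       coarse_clock(last_true_ns, granularity);
}
```

Two runs 500 ns apart show an elapsed period of 500 with a 1 ns clock, but
0 with a 1 ms clock when both readings fall into the same tick.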

Fix this by correctly handling the situation when the elapsed time between
aggregation runs is precisely two periods, and advance the expiration
timestamp to one period into the future.

Link: http://lkml.kernel.org/r/20190214193157.15788-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Łukasz Siudut <lsiudut@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/sched/psi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index fe24de3fbc93..ba4ab443bb67 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -321,7 +321,7 @@ static bool update_stats(struct psi_group *group)
 	expires = group->next_update;
 	if (now < expires)
 		goto out;
-	if (now - expires > psi_period)
+	if (now - expires >= psi_period)
 		missed_periods = div_u64(now - expires, psi_period);
 
 	/*
-- 
2.19.1
