linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Parth Shah <parth@linux.ibm.com>,
	Ihor Pasichnyk <Ihor.Pasichnyk@ibm.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Waiman Long <longman@redhat.com>,
	"Gautham R. Shenoy" <ego@linux.vnet.ibm.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Phil Auld <pauld@redhat.com>,
	Vaidyanathan Srinivasan <svaidy@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>
Subject: [PATCH 4.19 60/84] powerpc/vcpu: Assume dedicated processors as non-preempt
Date: Sat, 11 Jan 2020 10:50:37 +0100	[thread overview]
Message-ID: <20200111094908.859752069@linuxfoundation.org> (raw)
In-Reply-To: <20200111094845.328046411@linuxfoundation.org>

From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

commit 14c73bd344da60abaf7da3ea2e7733ddda35bbac upstream.

With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on
pre-empted vCPUs"), the scheduler avoids preempted vCPUs to schedule
tasks on wakeup. This leads to wrong choice of CPU, which in-turn
leads to larger wakeup latencies. Eventually, it leads to performance
regression in latency sensitive benchmarks like soltp, schbench etc.

On Powerpc, vcpu_is_preempted() only looks at yield_count. If the
yield_count is odd, the vCPU is assumed to be preempted. However
yield_count is increased whenever the LPAR enters CEDE state (idle).
So any CPU that has entered CEDE state is assumed to be preempted.

Even if vCPU of dedicated LPAR is preempted/donated, it should have
right of first-use since they are supposed to own the vCPU.

On a Power9 System with 32 cores:
  # lscpu
  Architecture:        ppc64le
  Byte Order:          Little Endian
  CPU(s):              128
  On-line CPU(s) list: 0-127
  Thread(s) per core:  8
  Core(s) per socket:  1
  Socket(s):           16
  NUMA node(s):        2
  Model:               2.2 (pvr 004e 0202)
  Model name:          POWER9 (architected), altivec supported
  Hypervisor vendor:   pHyp
  Virtualization type: para
  L1d cache:           32K
  L1i cache:           32K
  L2 cache:            512K
  L3 cache:            10240K
  NUMA node0 CPU(s):   0-63
  NUMA node1 CPU(s):   64-127

  # perf stat -a -r 5 ./schbench
  v5.4                               v5.4 + patch
  Latency percentiles (usec)         Latency percentiles (usec)
        50.0000th: 45                      50.0th: 45
        75.0000th: 62                      75.0th: 63
        90.0000th: 71                      90.0th: 74
        95.0000th: 77                      95.0th: 78
        *99.0000th: 91                     *99.0th: 82
        99.5000th: 707                     99.5th: 83
        99.9000th: 6920                    99.9th: 86
        min=0, max=10048                   min=0, max=96
  Latency percentiles (usec)         Latency percentiles (usec)
        50.0000th: 45                      50.0th: 46
        75.0000th: 61                      75.0th: 64
        90.0000th: 72                      90.0th: 75
        95.0000th: 79                      95.0th: 79
        *99.0000th: 691                    *99.0th: 83
        99.5000th: 3972                    99.5th: 85
        99.9000th: 8368                    99.9th: 91
        min=0, max=16606                   min=0, max=117
  Latency percentiles (usec)         Latency percentiles (usec)
        50.0000th: 45                      50.0th: 46
        75.0000th: 61                      75.0th: 64
        90.0000th: 71                      90.0th: 75
        95.0000th: 77                      95.0th: 79
        *99.0000th: 106                    *99.0th: 83
        99.5000th: 2364                    99.5th: 84
        99.9000th: 7480                    99.9th: 90
        min=0, max=10001                   min=0, max=95
  Latency percentiles (usec)         Latency percentiles (usec)
        50.0000th: 45                      50.0th: 47
        75.0000th: 62                      75.0th: 65
        90.0000th: 72                      90.0th: 75
        95.0000th: 78                      95.0th: 79
        *99.0000th: 93                     *99.0th: 84
        99.5000th: 108                     99.5th: 85
        99.9000th: 6792                    99.9th: 90
        min=0, max=17681                   min=0, max=117
  Latency percentiles (usec)         Latency percentiles (usec)
        50.0000th: 46                      50.0th: 45
        75.0000th: 62                      75.0th: 64
        90.0000th: 73                      90.0th: 75
        95.0000th: 79                      95.0th: 79
        *99.0000th: 113                    *99.0th: 82
        99.5000th: 2724                    99.5th: 83
        99.9000th: 6184                    99.9th: 93
        min=0, max=9887                    min=0, max=111

   Performance counter stats for 'system wide' (5 runs):

  context-switches    43,373  ( +-  0.40% )   44,597 ( +-  0.55% )
  cpu-migrations       1,211  ( +-  5.04% )      220 ( +-  6.23% )
  page-faults         15,983  ( +-  5.21% )   15,360 ( +-  3.38% )

Waiman Long suggested using static_keys.

Fixes: 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted vCPUs")
Cc: stable@vger.kernel.org # v4.18+
Reported-by: Parth Shah <parth@linux.ibm.com>
Reported-by: Ihor Pasichnyk <Ihor.Pasichnyk@ibm.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Acked-by: Waiman Long <longman@redhat.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Tested-by: Parth Shah <parth@linux.ibm.com>
[mpe: Move the key and setting of the key to pseries/setup.c]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191213035036.6913-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/powerpc/include/asm/spinlock.h    |    4 +++-
 arch/powerpc/platforms/pseries/setup.c |    7 +++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -53,10 +53,12 @@
 #endif
 
 #ifdef CONFIG_PPC_PSERIES
+DECLARE_STATIC_KEY_FALSE(shared_processor);
+
 #define vcpu_is_preempted vcpu_is_preempted
 static inline bool vcpu_is_preempted(int cpu)
 {
-	if (!firmware_has_feature(FW_FEATURE_SPLPAR))
+	if (!static_branch_unlikely(&shared_processor))
 		return false;
 	return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
 }
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -75,6 +75,9 @@
 #include "pseries.h"
 #include "../../../../drivers/pci/pci.h"
 
+DEFINE_STATIC_KEY_FALSE(shared_processor);
+EXPORT_SYMBOL_GPL(shared_processor);
+
 int CMO_PrPSP = -1;
 int CMO_SecPSP = -1;
 unsigned long CMO_PageSize = (ASM_CONST(1) << IOMMU_PAGE_SHIFT_4K);
@@ -761,6 +764,10 @@ static void __init pSeries_setup_arch(vo
 
 	if (firmware_has_feature(FW_FEATURE_LPAR)) {
 		vpa_init(boot_cpuid);
+
+		if (lppaca_shared_proc(get_lppaca()))
+			static_branch_enable(&shared_processor);
+
 		ppc_md.power_save = pseries_lpar_idle;
 		ppc_md.enable_pmcs = pseries_lpar_enable_pmcs;
 #ifdef CONFIG_PCI_IOV



  parent reply	other threads:[~2020-01-11 10:17 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-11  9:49 [PATCH 4.19 00/84] 4.19.95-stable review Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 01/84] USB: dummy-hcd: use usb_urb_dir_in instead of usb_pipein Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 02/84] USB: dummy-hcd: increase max number of devices to 32 Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 03/84] bpf: Fix passing modified ctx to ld/abs/ind instruction Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 04/84] regulator: fix use after free issue Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 05/84] ASoC: max98090: fix possible race conditions Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 06/84] locking/spinlock/debug: Fix various data races Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 07/84] netfilter: ctnetlink: netns exit must wait for callbacks Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 08/84] mwifiex: Fix heap overflow in mmwifiex_process_tdls_action_frame() Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 09/84] libtraceevent: Fix lib installation with O= Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 10/84] x86/efi: Update e820 with reserved EFI boot services data to fix kexec breakage Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 11/84] ASoC: Intel: bytcr_rt5640: Update quirk for Teclast X89 Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 12/84] efi/gop: Return EFI_NOT_FOUND if there are no usable GOPs Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 13/84] efi/gop: Return EFI_SUCCESS if a usable GOP was found Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 14/84] efi/gop: Fix memory leak in __gop_query32/64() Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 15/84] ARM: dts: imx6ul: imx6ul-14x14-evk.dtsi: Fix SPI NOR probing Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 16/84] ARM: vexpress: Set-up shared OPP table instead of individual for each CPU Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 17/84] netfilter: uapi: Avoid undefined left-shift in xt_sctp.h Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 18/84] netfilter: nft_set_rbtree: bogus lookup/get on consecutive elements in named sets Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 19/84] netfilter: nf_tables: validate NFT_SET_ELEM_INTERVAL_END Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 20/84] netfilter: nf_tables: validate NFT_DATA_VALUE after nft_data_init() Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 21/84] ARM: dts: BCM5301X: Fix MDIO node address/size cells Greg Kroah-Hartman
2020-01-11  9:49 ` [PATCH 4.19 22/84] selftests/ftrace: Fix multiple kprobe testcase Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 23/84] ARM: dts: Cygnus: Fix MDIO node address/size cells Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 24/84] spi: spi-cavium-thunderx: Add missing pci_release_regions() Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 25/84] ASoC: topology: Check return value for soc_tplg_pcm_create() Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 26/84] ARM: dts: bcm283x: Fix critical trip point Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 27/84] bnxt_en: Return error if FW returns more data than dump length Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 28/84] bpf, mips: Limit to 33 tail calls Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 29/84] spi: spi-ti-qspi: Fix a bug when accessing non default CS Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 30/84] ARM: dts: am437x-gp/epos-evm: fix panel compatible Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 31/84] samples: bpf: Replace symbol compare of trace_event Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 32/84] samples: bpf: fix syscall_tp due to unused syscall Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 33/84] powerpc: Ensure that swiotlb buffer is allocated from low memory Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 34/84] btrfs: Fix error messages in qgroup_rescan_init Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 35/84] bpf: Clear skb->tstamp in bpf_redirect when necessary Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 36/84] bnx2x: Do not handle requests from VFs after parity Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 37/84] bnx2x: Fix logic to get total no. of PFs per engine Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 38/84] cxgb4: Fix kernel panic while accessing sge_info Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 39/84] net: usb: lan78xx: Fix error message format specifier Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 40/84] parisc: add missing __init annotation Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 41/84] rfkill: Fix incorrect check to avoid NULL pointer dereference Greg Kroah-Hartman
2020-01-13  8:08   ` Pavel Machek
2020-01-11  9:50 ` [PATCH 4.19 42/84] ASoC: wm8962: fix lambda value Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 43/84] regulator: rn5t618: fix module aliases Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 44/84] iommu/iova: Init the struct iova to fix the possible memleak Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 45/84] kconfig: dont crash on NULL expressions in expr_eq() Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 46/84] perf/x86/intel: Fix PT PMI handling Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 47/84] fs: avoid softlockups in s_inodes iterators Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 48/84] net: stmmac: Do not accept invalid MTU values Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 49/84] net: stmmac: xgmac: Clear previous RX buffer size Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 50/84] net: stmmac: RX buffer size must be 16 byte aligned Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 51/84] net: stmmac: Always arm TX Timer at end of transmission start Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 52/84] s390/purgatory: do not build purgatory with kcov, kasan and friends Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 53/84] drm/exynos: gsc: add missed component_del Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 54/84] s390/dasd/cio: Interpret ccw_device_get_mdc return value correctly Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 55/84] s390/dasd: fix memleak in path handling error case Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 56/84] block: fix memleak when __blk_rq_map_user_iov() is failed Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 57/84] parisc: Fix compiler warnings in debug_core.c Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 58/84] llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c) Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 59/84] hv_netvsc: Fix unwanted rx_table reset Greg Kroah-Hartman
2020-01-11  9:50 ` Greg Kroah-Hartman [this message]
2020-01-11  9:50 ` [PATCH 4.19 61/84] powerpc/spinlocks: Include correct header for static key Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 62/84] cpufreq: imx6q: read OCOTP through nvmem for imx6ul/imx6ull Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 63/84] ARM: dts: imx6ul: use nvmem-cells for cpu speed grading Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 64/84] PCI/switchtec: Read all 64 bits of part_event_bitmap Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 65/84] arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set Greg Kroah-Hartman
2020-01-11 12:30   ` Naresh Kamboju
2020-01-11 17:44     ` Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 66/84] gtp: fix bad unlock balance in gtp_encap_enable_socket Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 67/84] macvlan: do not assume mac_header is set in macvlan_broadcast() Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 68/84] net: dsa: mv88e6xxx: Preserve priority when setting CPU port Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 69/84] net: stmmac: dwmac-sun8i: Allow all RGMII modes Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 70/84] net: stmmac: dwmac-sunxi: " Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 71/84] net: usb: lan78xx: fix possible skb leak Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 72/84] pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 73/84] sch_cake: avoid possible divide by zero in cake_enqueue() Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 74/84] sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 75/84] tcp: fix "old stuff" D-SACK causing SACK to be treated as D-SACK Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 76/84] vxlan: fix tos value before xmit Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 77/84] vlan: fix memory leak in vlan_dev_set_egress_priority Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 78/84] vlan: vlan_changelink() should propagate errors Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 79/84] mlxsw: spectrum_qdisc: Ignore grafting of invisible FIFO Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 80/84] net: sch_prio: When ungrafting, replace with FIFO Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 81/84] usb: dwc3: gadget: Fix request complete check Greg Kroah-Hartman
2020-01-11  9:50 ` [PATCH 4.19 82/84] USB: core: fix check for duplicate endpoints Greg Kroah-Hartman
2020-01-11  9:51 ` [PATCH 4.19 83/84] USB: serial: option: add Telit ME910G1 0x110a composition Greg Kroah-Hartman
2020-01-11  9:51 ` [PATCH 4.19 84/84] usb: missing parentheses in USE_NEW_SCHEME Greg Kroah-Hartman
2020-01-11 16:02 ` [PATCH 4.19 00/84] 4.19.95-stable review Guenter Roeck
2020-01-11 17:47   ` Greg Kroah-Hartman
2020-01-11 20:10     ` Guenter Roeck
2020-01-11 20:41       ` Greg Kroah-Hartman
2020-01-12  4:57     ` Naresh Kamboju
2020-01-12  8:14       ` Greg Kroah-Hartman
2020-01-13 15:48 ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200111094908.859752069@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=Ihor.Pasichnyk@ibm.com \
    --cc=ego@linux.vnet.ibm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=parth@linux.ibm.com \
    --cc=pauld@redhat.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=stable@vger.kernel.org \
    --cc=svaidy@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).