Message-Id: <20170913212902.530704676@linutronix.de>
Date: Wed, 13 Sep 2017 23:29:02 +0200
From: Thomas Gleixner
To: LKML
Cc: Ingo Molnar, Peter Anvin, Marc Zyngier, Peter Zijlstra, Borislav Petkov, Chen Yu, Rui Zhang, "Rafael J. Wysocki", Len Brown, Dan Williams, Christoph Hellwig, Paolo Bonzini, Joerg Roedel, Boris Ostrovsky, Juergen Gross, Tony Luck, "K. Y. Srinivasan", Alok Kataria, Steven Rostedt, Arjan van de Ven
Subject: [patch 00/52] x86: Rework the vector management
X-Mailing-List: linux-kernel@vger.kernel.org

Sorry for the large CC list, but this is major surgery.

The vector management in x86, including the surrounding code, is a conglomerate of ancient bits and pieces which have been subject to 'modernization' and featuritis over the years. The most obscure parts are the vector allocation mechanics, the cleanup vector handling and the CPU hotplug machinery. Replacing these pieces of art had been on my todo list for a long time.

Recent attempts to 'solve' CPU offline / hibernation issues, which are partially caused by the current vector management implementation, made me look into this for real. Further information in this thread:

  http://lkml.kernel.org/r/cover.1504235838.git.yu.c.chen@intel.com

Aside from drivers allocating gazillions of interrupts, there are quite a few things which can be addressed in the x86 vector management and in the core code.
- Multi CPU affinities

  A dubious property which is not available on all machines and causes major complexity both in the allocator and in the cleanup/hotplug management. See:

    http://lkml.kernel.org/r/alpine.DEB.2.20.1709071045440.1827@nanos

- Priority level spreading

  An obscure and undocumented property which, I think, is sufficiently argued to be not required in:

    http://lkml.kernel.org/r/alpine.DEB.2.20.1709071045440.1827@nanos

- Allocation of vectors when interrupt descriptors are allocated

  This is a historical implementation detail, which is not really required when the vector allocation is delayed up to the point when request_irq() is invoked. This might make request_irq() fail when the vector space is exhausted, but drivers should handle request_irq() failures anyway.

  The upside of changing this is that the active vector space becomes smaller, especially on hibernation/CPU offline when drivers shut down the queue interrupts of outgoing CPUs.

  Some of this is already addressed with the managed interrupt facility, but that was bolted on top of the existing vector management because proper integration was not possible at that point. I take the blame for this, but the tradeoff of not doing it would have been more broken driver boilerplate code all over the place. So I went for the lesser of two evils.

- Allocation of vectors in the wrong place

  Even for managed interrupts, the vector allocation at descriptor allocation time happens in the wrong place and gets fixed after the fact with a call to set_affinity(). For non-remapped interrupts this results in at least one interrupt on the wrong CPU before it is migrated to the desired target.

- Lack of instrumentation

  All of this is a black box which allows no insight into the actual vector usage.

The series addresses these points and converts the x86 vector management to a bitmap-based allocator which provides proper reservation management for 'managed interrupts' and best-effort reservation for regular interrupts.
The latter allows overcommitment, which 'fixes' some of the hotplug/hibernation problems in a clean way. It can't fix all of them, depending on the driver involved.

This rework is no excuse for driver writers to do exhaustive vector allocations instead of utilizing the managed interrupt infrastructure, but it addresses long-standing issues in this code with the side effect of mitigating some of the driver oddities. The proper solution for multi-queue management is 'managed interrupts', which have proven themselves in the block-mq work. They solve issues which other drivers work around in creative ways, with lots of copied code and often enough broken attempts to handle interrupt affinity and CPU hotplug problems.

The new bitmap allocator and the x86 vector management code are instrumented with tracepoints, and the irq domain debugfs files allow deep insight into the vector allocation and reservations.

The patches work on machines with and without interrupt remapping and inside KVM guests of various flavours, though I have no idea what I broke along the way with other hypervisors, posted interrupts, etc. So I kindly ask for your support in testing and review.

The series applies on top of Linus' tree and is available as a git branch:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/apic

Note that this branch is Linus' tree plus scheduler and x86 fixes which I required for proper testing. They have outstanding pull requests and might already be merged when you read this.
Thanks,

	tglx
---
 arch/x86/include/asm/x2apic.h              |   49 -
 b/arch/x86/Kconfig                         |    1
 b/arch/x86/include/asm/apic.h              |  255 +-----
 b/arch/x86/include/asm/desc.h              |    2
 b/arch/x86/include/asm/hw_irq.h            |    6
 b/arch/x86/include/asm/io_apic.h           |    2
 b/arch/x86/include/asm/irq.h               |    4
 b/arch/x86/include/asm/irq_vectors.h       |    8
 b/arch/x86/include/asm/irqdomain.h         |    5
 b/arch/x86/include/asm/kvm_host.h          |    2
 b/arch/x86/include/asm/trace/irq_vectors.h |  244 ++++++
 b/arch/x86/kernel/apic/Makefile            |    2
 b/arch/x86/kernel/apic/apic.c              |   38 -
 b/arch/x86/kernel/apic/apic_common.c       |   46 +
 b/arch/x86/kernel/apic/apic_flat_64.c      |   10
 b/arch/x86/kernel/apic/apic_noop.c         |   25
 b/arch/x86/kernel/apic/apic_numachip.c     |   12
 b/arch/x86/kernel/apic/bigsmp_32.c         |    8
 b/arch/x86/kernel/apic/htirq.c             |    5
 b/arch/x86/kernel/apic/io_apic.c           |   94 --
 b/arch/x86/kernel/apic/msi.c               |    5
 b/arch/x86/kernel/apic/probe_32.c          |   29
 b/arch/x86/kernel/apic/vector.c            | 1090 +++++++++++++++++------------
 b/arch/x86/kernel/apic/x2apic.h            |    9
 b/arch/x86/kernel/apic/x2apic_cluster.c    |  196 +----
 b/arch/x86/kernel/apic/x2apic_phys.c       |   44 +
 b/arch/x86/kernel/apic/x2apic_uv_x.c       |   17
 b/arch/x86/kernel/i8259.c                  |    1
 b/arch/x86/kernel/idt.c                    |   12
 b/arch/x86/kernel/irq.c                    |  101 --
 b/arch/x86/kernel/irqinit.c                |    1
 b/arch/x86/kernel/setup.c                  |   12
 b/arch/x86/kernel/smpboot.c                |   14
 b/arch/x86/kernel/traps.c                  |    2
 b/arch/x86/kernel/vsmp_64.c                |   19
 b/arch/x86/platform/uv/uv_irq.c            |    5
 b/arch/x86/xen/apic.c                      |    6
 b/drivers/gpio/gpio-xgene-sb.c             |    7
 b/drivers/iommu/amd_iommu.c                |   44 -
 b/drivers/iommu/intel_irq_remapping.c      |   43 -
 b/drivers/irqchip/irq-gic-v3-its.c         |    5
 b/drivers/pinctrl/stm32/pinctrl-stm32.c    |    5
 b/include/linux/irq.h                      |   22
 b/include/linux/irqdesc.h                  |    1
 b/include/linux/irqdomain.h                |   14
 b/include/linux/msi.h                      |    5
 b/include/trace/events/irq_matrix.h        |  201 +++++
 b/kernel/irq/Kconfig                       |    3
 b/kernel/irq/Makefile                      |    1
 b/kernel/irq/autoprobe.c                   |    2
 b/kernel/irq/chip.c                        |   37
 b/kernel/irq/debugfs.c                     |   12
 b/kernel/irq/internals.h                   |   19
 b/kernel/irq/irqdesc.c                     |    3
 b/kernel/irq/irqdomain.c                   |   43 -
 b/kernel/irq/manage.c                      |   18
 b/kernel/irq/matrix.c                      |  443 +++++++++++
 b/kernel/irq/msi.c                         |   32
 58 files changed, 2133 insertions(+), 1208 deletions(-)