From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751970AbdINLV6 (ORCPT ); Thu, 14 Sep 2017 07:21:58 -0400
Received: from mx2.suse.de ([195.135.220.15]:51556 "EHLO mx1.suse.de"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1751789AbdINLVz (ORCPT ); Thu, 14 Sep 2017 07:21:55 -0400
Subject: Re: [patch 00/52] x86: Rework the vector management
To: Thomas Gleixner, LKML
Cc: Ingo Molnar, Peter Anvin, Marc Zyngier, Peter Zijlstra,
	Borislav Petkov, Chen Yu, Rui Zhang, "Rafael J. Wysocki",
	Len Brown, Dan Williams, Christoph Hellwig, Paolo Bonzini,
	Joerg Roedel, Boris Ostrovsky, Tony Luck, "K. Y. Srinivasan",
	Alok Kataria, Steven Rostedt, Arjan van de Ven
References: <20170913212902.530704676@linutronix.de>
From: Juergen Gross
Message-ID: <01b6b691-c080-d8b0-588b-0e1d59a1c2ec@suse.com>
Date: Thu, 14 Sep 2017 13:21:47 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
	Thunderbird/52.3.0
MIME-Version: 1.0
In-Reply-To: <20170913212902.530704676@linutronix.de>
Content-Type: text/plain; charset=utf-8
Content-Language: de-DE
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 13/09/17 23:29, Thomas Gleixner wrote:
> Sorry for the large CC list, but this is a major surgery.
>
> The vector management in x86 including the surrounding code is a
> conglomerate of ancient bits and pieces which have been subject to
> 'modernization' and featuritis over the years. The most obscure parts are
> the vector allocation mechanics, the cleanup vector handling and the cpu
> hotplug machinery. Replacing these pieces of art was on my todo list for a
> long time.
>
> Recent attempts to 'solve' CPU offline / hibernation issues which are
> partially caused by the current vector management implementation made me
> look for real.
> Further information in this thread:
>
>     http://lkml.kernel.org/r/cover.1504235838.git.yu.c.chen@intel.com
>
> Aside of drivers allocating gazillion of interrupts, there are quite some
> things which can be addressed in the x86 vector management and in the core
> code.
>
>   - Multi CPU affinities:
>
>     A dubious property which is not available on all machines and causes
>     major complexity both in the allocator and the cleanup/hotplug
>     management. See:
>
>        http://lkml.kernel.org/r/alpine.DEB.2.20.1709071045440.1827@nanos
>
>   - Priority level spreading:
>
>     An obscure and undocumented property which I think is sufficiently
>     argued to be not required in:
>
>        http://lkml.kernel.org/r/alpine.DEB.2.20.1709071045440.1827@nanos
>
>   - Allocation of vectors when interrupt descriptors are allocated.
>
>     This is a historical implementation detail, which is not really
>     required when the vector allocation is delayed up to the point when
>     request_irq() is invoked. This might make request_irq() fail, when the
>     vector space is exhausted, but drivers should handle request_irq()
>     fails anyway.
>
>     The upside of changing this is that the active vector space becomes
>     smaller especially on hibernation/cpu offline when drivers shut down
>     queue interrupts of outgoing CPUs.
>
>     Some of this is already addressed with the managed interrupt facility,
>     but that was bolted on top of the existing vector management because
>     proper integration was not possible at that point. I take the blame
>     for this, but the tradeoff of not doing it would have been more
>     broken driver boiler plate code all over the place. So I went for the
>     lesser of two evils.
>
>   - Allocation of vectors on the wrong place
>
>     Even for managed interrupts the vector allocation at descriptor
>     allocation happens on the wrong place and gets fixed after the fact
>     with a call to set_affinity().
>     In case of not remapped interrupts
>     this results in at least one interrupt on the wrong CPU before it is
>     migrated to the desired target.
>
>   - Lack of instrumentation
>
>     All of this is a black box which allows no insight into the actual
>     vector usage.
>
> The series addresses these points and converts the x86 vector management to
> a bitmap based allocator which provides proper reservation management for
> 'managed interrupts' and best effort reservation for regular interrupts.
> The latter allows overcommitment, which 'fixes' some of hotplug/hibernation
> problems in a clean way. It can't fix all of them depending on the driver
> involved.
>
> This rework is no excuse for driver writers to do exhaustive vector
> allocations instead of utilizing the managed interrupt infrastructure, but
> it addresses long standing issues in this code with the side effect of
> mitigating some of the driver oddities. The proper solution for multi queue
> management are 'managed interrupts' which has been proven in the block-mq
> work as they solve issues which are worked around in other drivers in
> creative ways with lots of copied code and often enough broken attempts to
> handle interrupt affinity and CPU hotplug problems.
>
> The new bitmap allocator and the x86 vector management code are
> instrumented with tracepoints and the irq domain debugfs files allow deep
> insight into the vector allocation and reservations.
>
> The patches work on machines with and without interrupt remapping and
> inside of KVM guests of various flavours, though I have no idea what I
> broke on the way with other hypervisors, posted interrupts etc. So I kindly
> ask for your support in testing and review.
>
> The series applies on top of Linus tree and is available as git branch:
>
>     git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/apic
>
> Note, that this branch is Linus tree plus scheduler and x86 fixes which I
> required to do proper testing.
> They have outstanding pull requests and
> might be merged already when you read this.
>
> Thanks,
>
> 	tglx
> ---
>  arch/x86/include/asm/x2apic.h              |   49 -
>  b/arch/x86/Kconfig                         |    1
>  b/arch/x86/include/asm/apic.h              |  255 +-----
>  b/arch/x86/include/asm/desc.h              |    2
>  b/arch/x86/include/asm/hw_irq.h            |    6
>  b/arch/x86/include/asm/io_apic.h           |    2
>  b/arch/x86/include/asm/irq.h               |    4
>  b/arch/x86/include/asm/irq_vectors.h       |    8
>  b/arch/x86/include/asm/irqdomain.h         |    5
>  b/arch/x86/include/asm/kvm_host.h          |    2
>  b/arch/x86/include/asm/trace/irq_vectors.h |  244 ++++++
>  b/arch/x86/kernel/apic/Makefile            |    2
>  b/arch/x86/kernel/apic/apic.c              |   38 -
>  b/arch/x86/kernel/apic/apic_common.c       |   46 +
>  b/arch/x86/kernel/apic/apic_flat_64.c      |   10
>  b/arch/x86/kernel/apic/apic_noop.c         |   25
>  b/arch/x86/kernel/apic/apic_numachip.c     |   12
>  b/arch/x86/kernel/apic/bigsmp_32.c         |    8
>  b/arch/x86/kernel/apic/htirq.c             |    5
>  b/arch/x86/kernel/apic/io_apic.c           |   94 --
>  b/arch/x86/kernel/apic/msi.c               |    5
>  b/arch/x86/kernel/apic/probe_32.c          |   29
>  b/arch/x86/kernel/apic/vector.c            | 1090 +++++++++++++++++------------
>  b/arch/x86/kernel/apic/x2apic.h            |    9
>  b/arch/x86/kernel/apic/x2apic_cluster.c    |  196 +----
>  b/arch/x86/kernel/apic/x2apic_phys.c       |   44 +
>  b/arch/x86/kernel/apic/x2apic_uv_x.c       |   17
>  b/arch/x86/kernel/i8259.c                  |    1
>  b/arch/x86/kernel/idt.c                    |   12
>  b/arch/x86/kernel/irq.c                    |  101 --
>  b/arch/x86/kernel/irqinit.c                |    1
>  b/arch/x86/kernel/setup.c                  |   12
>  b/arch/x86/kernel/smpboot.c                |   14
>  b/arch/x86/kernel/traps.c                  |    2
>  b/arch/x86/kernel/vsmp_64.c                |   19
>  b/arch/x86/platform/uv/uv_irq.c            |    5
>  b/arch/x86/xen/apic.c                      |    6
>  b/drivers/gpio/gpio-xgene-sb.c             |    7
>  b/drivers/iommu/amd_iommu.c                |   44 -
>  b/drivers/iommu/intel_irq_remapping.c      |   43 -
>  b/drivers/irqchip/irq-gic-v3-its.c         |    5
>  b/drivers/pinctrl/stm32/pinctrl-stm32.c    |    5
>  b/include/linux/irq.h                      |   22
>  b/include/linux/irqdesc.h                  |    1
>  b/include/linux/irqdomain.h                |   14
>  b/include/linux/msi.h                      |    5
>  b/include/trace/events/irq_matrix.h        |  201 +++++
>  b/kernel/irq/Kconfig                       |    3
>  b/kernel/irq/Makefile                      |    1
>  b/kernel/irq/autoprobe.c                   |    2
>  b/kernel/irq/chip.c                        |   37
>  b/kernel/irq/debugfs.c                     |   12
>  b/kernel/irq/internals.h                   |   19
>  b/kernel/irq/irqdesc.c                     |    3
>  b/kernel/irq/irqdomain.c                   |   43 -
>  b/kernel/irq/manage.c                      |   18
>  b/kernel/irq/matrix.c                      |  443 +++++++++++
>  b/kernel/irq/msi.c                         |   32
>  58 files changed, 2133 insertions(+), 1208 deletions(-)

Complete series tested with paravirt + xen enabled 64 bit kernel:

bare metal boot okay
boot as Xen dom0 okay
boot as Xen pv-domU okay
boot as Xen HVM-domU with PV-drivers okay
Vcpu onlining/offlining in pv-domU okay

So you can add my:

Tested-by: Juergen Gross
Acked-by: Juergen Gross

Juergen