linux-kernel.vger.kernel.org archive mirror
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>, X86 Kernel <x86@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	iommu@lists.linux.dev, Lu Baolu <baolu.lu@linux.intel.com>,
	kvm@vger.kernel.org, Dave Hansen <dave.hansen@intel.com>,
	Joerg Roedel <joro@8bytes.org>, "H. Peter Anvin" <hpa@zytor.com>,
	Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@redhat.com>,
	Paul Luse <paul.e.luse@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Jens Axboe <axboe@kernel.dk>, Raj Ashok <ashok.raj@intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	maz@kernel.org, seanjc@google.com,
	Robin Murphy <robin.murphy@arm.com>,
	jim.harris@samsung.com, a.manzanares@samsung.com,
	Bjorn Helgaas <helgaas@kernel.org>,
	guang.zeng@intel.com, robert.hoo.linux@gmail.com,
	kan.liang@intel.com, "Kleen, Andi" <andi.kleen@intel.com>
Subject: Re: [PATCH v2 05/13] x86/irq: Reserve a per CPU IDT vector for posted MSIs
Date: Fri, 19 Apr 2024 17:07:17 -0300
Message-ID: <ZiLO9RUdMsNlCtI_@x1>
In-Reply-To: <87jzkuxaqv.ffs@tglx>

On Fri, Apr 19, 2024 at 06:00:24AM +0200, Thomas Gleixner wrote:
> On Mon, Apr 15 2024 at 13:43, Jacob Pan wrote:
> > On Mon, 15 Apr 2024 11:53:58 -0700, Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> >> On Thu, 11 Apr 2024 18:51:14 +0200, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> > If we really care then we do it properly for _all_ of them. Something like
> >> > the uncompiled below. There is certainly a smarter way to do the build
> >> > thing, but my kbuild foo is rusty.  
> >> I too was concerned about wasting system vectors, but did not know how
> >> to fix it. Your code below works well. Tested without KVM in
> >> .config to show the gaps:
> >> 
> >> In VECTOR IRQ domain.
> >> 
> >> BEFORE:
> >> System: 46: 0-31,50,235-236,244,246-255
> >> 
> >> AFTER:
> >> System: 46: 0-31,50,241-242,245-255
> >> 
> >> The only gap is MANAGED_IRQ_SHUTDOWN_VECTOR(243), which is expected on a
> >> running system.
> >> 
> >> Verified in irqvectors.s: .ascii "->MANAGED_IRQ_SHUTDOWN_VECTOR $243
> >> 
> >> POSTED MSI/first system vector moved up from 235 to 241 for this case.
> >> 
> >> Will try to let tools/arch/x86/include/asm/irq_vectors.h also use it
> >> instead of manually copying it over each time. Any suggestions are
> >> greatly appreciated.
> >>
> > On second thought, if we make the system IRQ vectors determined at
> > compile time based on different CONFIG options, will it break userspace
> > tools such as perf? More importantly, would it violate the rule of not
> > breaking userspace?

The rule for tools/perf is "don't impose _any requirement_ on the kernel
developers; they don't have to test whether any change they make outside
of tools/ will break something inside tools/."
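A hypothetical sketch of the compile-time packing idea behind the BEFORE/AFTER numbers above (not the uncompiled patch from this thread, which is not quoted here): each system vector gets a slot counted down from 0xff, and slots for disabled features are compiled out, so the remaining vectors stay contiguous at the top of the range. The CONFIG_* macros are local stand-ins for Kconfig; sysvec_slot, SLOT_TO_VECTOR() and sysvec_count() are invented names, only a handful of vectors are modeled, and the values below the top three do not match the kernel's irq_vectors.h.

```c
#include <assert.h>

/* Stand-ins for Kconfig: pretend Hyper-V is enabled and KVM is not. */
#define CONFIG_HYPERV 1
/* #define CONFIG_KVM 1 */

/*
 * Slots count down from the top of the IDT (slot 0 is vector 0xff).
 * A slot whose feature is compiled out does not exist at all, so the
 * slots after it move up and no vector is left as a hole.
 */
enum sysvec_slot {
	SYSVEC_SLOT_SPURIOUS,		/* always present */
	SYSVEC_SLOT_ERROR,		/* always present */
	SYSVEC_SLOT_RESCHEDULE,		/* always present */
#ifdef CONFIG_KVM
	SYSVEC_SLOT_POSTED_INTR,
	SYSVEC_SLOT_POSTED_INTR_WAKEUP,
	SYSVEC_SLOT_POSTED_INTR_NESTED,
#endif
#ifdef CONFIG_HYPERV
	SYSVEC_SLOT_HYPERV_STIMER0,
#endif
	SYSVEC_SLOT_POSTED_MSI,		/* the new per-CPU vector */
	NR_SYSVEC_SLOTS
};

#define SLOT_TO_VECTOR(slot)	(0xff - (int)(slot))

/* Lowest system vector; everything from here up to 0xff is in use. */
#define FIRST_SYSTEM_VECTOR	SLOT_TO_VECTOR(NR_SYSVEC_SLOTS - 1)

/* Vectors 0x00-0x1f are CPU exceptions and must stay untouched. */
static_assert(FIRST_SYSTEM_VECTOR >= 0x20, "system vectors overlap exceptions");

/* How many vectors this configuration reserves for system use. */
static inline int sysvec_count(void)
{
	return 0xff - FIRST_SYSTEM_VECTOR + 1;
}
```

Uncommenting CONFIG_KVM inserts three slots and moves POSTED_MSI down to 0xf8, while the KVM-less build keeps it at 0xfb with no holes. The flip side is the one Jacob raised: the numeric value of each vector now depends on .config, so a userspace table with hardcoded numbers cannot be right for every kernel.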
 
> tools/arch/x86/include/asm/irq_vectors.h is only used to generate the
> list of system vectors for pretty output. And your change already broke
> that.

Yeah, I even moved that from tools/arch/x86/include/asm/irq_vectors.h
to tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h (for the
next merge window).

Having it in tools/arch/x86/include/asm/irq_vectors.h was a bad decision
since, as you mentioned, it is only used to generate string tables:

⬢[acme@toolbox perf-tools-next]$ tools/perf/trace/beauty/tracepoints/x86_irq_vectors.sh 
static const char *x86_irq_vectors[] = {
	[0x02] = "NMI",
	[0x80] = "IA32_SYSCALL",
	[0xec] = "LOCAL_TIMER",
	[0xed] = "HYPERV_STIMER0",
	[0xee] = "HYPERV_REENLIGHTENMENT",
	[0xef] = "MANAGED_IRQ_SHUTDOWN",
	[0xf0] = "POSTED_INTR_NESTED",
	[0xf1] = "POSTED_INTR_WAKEUP",
	[0xf2] = "POSTED_INTR",
	[0xf3] = "HYPERVISOR_CALLBACK",
	[0xf4] = "DEFERRED_ERROR",
	[0xf6] = "IRQ_WORK",
	[0xf7] = "X86_PLATFORM_IPI",
	[0xf8] = "REBOOT",
	[0xf9] = "THRESHOLD_APIC",
	[0xfa] = "THERMAL_APIC",
	[0xfb] = "CALL_FUNCTION_SINGLE",
	[0xfc] = "CALL_FUNCTION",
	[0xfd] = "RESCHEDULE",
	[0xfe] = "ERROR_APIC",
	[0xff] = "SPURIOUS_APIC",
};

⬢[acme@toolbox perf-tools-next]$
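As a rough sketch of how a tool can consume such a table (a hypothetical helper, not perf's actual code): the vector taken from the tracepoint payload indexes the array directly, and vectors without an entry, such as 0xf5 here, or values past the end of the array fall back to printing the raw number. Only a subset of the entries above is copied; x86_irq_vector_name() and format_vector() are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Subset of the generated table; designated initializers leave holes as NULL. */
static const char *x86_irq_vectors[] = {
	[0x02] = "NMI",
	[0x80] = "IA32_SYSCALL",
	[0xf6] = "IRQ_WORK",
	[0xfd] = "RESCHEDULE",
	[0xff] = "SPURIOUS_APIC",
};

/* Hypothetical helper: map a vector number to its name, or NULL if unknown. */
static const char *x86_irq_vector_name(unsigned int vector)
{
	size_t n = sizeof(x86_irq_vectors) / sizeof(x86_irq_vectors[0]);

	if (vector >= n)
		return NULL;
	return x86_irq_vectors[vector];
}

/*
 * Format "vector: NAME", or "vector: 0x.." when there is no name.
 * Returns the snprintf() result, i.e. the length of the full string.
 */
static int format_vector(char *buf, size_t len, unsigned int vector)
{
	const char *name = x86_irq_vector_name(vector);

	assert(buf != NULL && len > 0);
	if (name)
		return snprintf(buf, len, "vector: %s", name);
	return snprintf(buf, len, "vector: %#x", vector);
}
```

Because the array is indexed by vector number, this only works while the numbers baked into the table match the running kernel, which is exactly what a config-dependent vector layout would break.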

Used in:

root@number:~# perf trace -a -e irq_vectors:irq_work_entry/max-stack=32/ --max-events=1
     0.000 kworker/u57:0-/9912 irq_vectors:irq_work_entry(vector: IRQ_WORK)
                                       __sysvec_irq_work ([kernel.kallsyms])
                                       __sysvec_irq_work ([kernel.kallsyms])
                                       sysvec_irq_work ([kernel.kallsyms])
                                       asm_sysvec_irq_work ([kernel.kallsyms])
                                       _raw_spin_unlock_irqrestore ([kernel.kallsyms])
                                       dma_fence_wait_timeout ([kernel.kallsyms])
                                       intel_atomic_commit_tail ([kernel.kallsyms])
                                       process_one_work ([kernel.kallsyms])
                                       worker_thread ([kernel.kallsyms])
                                       kthread ([kernel.kallsyms])
                                       ret_from_fork ([kernel.kallsyms])
                                       ret_from_fork_asm ([kernel.kallsyms])
root@number:~#

But, as the original cset introducing this explains, these irq_vectors:
tracepoints operate on just one vector each, hence irq_work_entry(vector:
IRQ_WORK), irq_vectors:reschedule_exit(vector: RESCHEDULE), etc.

> The obvious solution to that is to expose that list in sysfs for
> consumption by perf.

Nah, the best thing these days is to stop using 'int' for the vector and
use 'enum irq_vector' instead. Then, since we have BTF, we can use it to
do the enum -> string translation, as pahole does here (using
/sys/kernel/btf/vmlinux, which is pretty much available everywhere these
days):

root@number:~# pahole clocksource_ids
enum clocksource_ids {
	CSID_GENERIC          = 0,
	CSID_ARM_ARCH_COUNTER = 1,
	CSID_MAX              = 2,
};

root@number:~# pahole skb_drop_reason | head
enum skb_drop_reason {
	SKB_NOT_DROPPED_YET                     = 0,
	SKB_CONSUMED                            = 1,
	SKB_DROP_REASON_NOT_SPECIFIED           = 2,
	SKB_DROP_REASON_NO_SOCKET               = 3,
	SKB_DROP_REASON_PKT_TOO_SMALL           = 4,
	SKB_DROP_REASON_TCP_CSUM                = 5,
	SKB_DROP_REASON_SOCKET_FILTER           = 6,
	SKB_DROP_REASON_UDP_CSUM                = 7,
	SKB_DROP_REASON_NETFILTER_DROP          = 8,
root@number:~#

Then it's easy to go from 0 to CSID_GENERIC, etc.
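In BTF an enum type is a list of (name, value) pairs, which is what pahole prints above. A minimal self-contained sketch of the value -> name direction, with the pairs hardcoded from the clocksource_ids output instead of being read from /sys/kernel/btf/vmlinux; struct enum_entry and enum_value_to_name() are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One enumerator as BTF describes it: a name and a value. */
struct enum_entry {
	const char *name;
	long long value;
};

/* Hardcoded copy of what pahole printed for enum clocksource_ids. */
static const struct enum_entry clocksource_ids[] = {
	{ "CSID_GENERIC",          0 },
	{ "CSID_ARM_ARCH_COUNTER", 1 },
	{ "CSID_MAX",              2 },
};

/*
 * Linear search by value: enum values need not be dense or in
 * declaration order, so the value cannot be used as an array index.
 */
static const char *enum_value_to_name(const struct enum_entry *entries,
				      size_t nr_entries, long long value)
{
	assert(entries != NULL || nr_entries == 0);
	for (size_t i = 0; i < nr_entries; i++) {
		if (entries[i].value == value)
			return entries[i].name;
	}
	return NULL;
}
```

With 'enum irq_vector' as the tracepoint field type, perf could fill such a table at run time from the running kernel's BTF (for example with libbpf's btf__load_vmlinux_btf() and btf__find_by_name_kind()), so the names would follow whatever vector numbering that kernel was built with.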

⬢[acme@toolbox pahole]$ perf stat -e cycles pahole skb_drop_reason > /dev/null

 Performance counter stats for 'pahole skb_drop_reason':

         6,095,427      cpu_atom/cycles:u/                                                      (2.82%)
       103,694,633      cpu_core/cycles:u/                                                      (97.18%)

       0.039031759 seconds time elapsed

       0.016028000 seconds user
       0.023007000 seconds sys


⬢[acme@toolbox pahole]$

- Arnaldo
 
> But we don't have to do any of that right away. It's an orthogonal
> issue. Just waste the extra system vector to start with and then we can
> add the compile-time dependent change on top if we really care about
> gaining back the vectors.
> 
> Thanks,
> 
>         tglx


Thread overview: 47+ messages
2024-04-05 22:30 [PATCH v2 00/13] Coalesced Interrupt Delivery with posted MSI Jacob Pan
2024-04-05 22:30 ` [PATCH v2 01/13] x86/irq: Move posted interrupt descriptor out of vmx code Jacob Pan
2024-04-17  0:34   ` Sean Christopherson
2024-04-17 18:33     ` Jacob Pan
2024-04-05 22:30 ` [PATCH v2 02/13] x86/irq: Unionize PID.PIR for 64bit access w/o casting Jacob Pan
2024-04-05 22:31 ` [PATCH v2 03/13] x86/irq: Remove bitfields in posted interrupt descriptor Jacob Pan
2024-04-17  0:39   ` Sean Christopherson
2024-04-17 18:01     ` Jacob Pan
2024-04-18 17:30       ` Thomas Gleixner
2024-04-18 18:10         ` Jacob Pan
2024-04-05 22:31 ` [PATCH v2 04/13] x86/irq: Add a Kconfig option for posted MSI Jacob Pan
2024-04-05 22:31 ` [PATCH v2 05/13] x86/irq: Reserve a per CPU IDT vector for posted MSIs Jacob Pan
2024-04-11 16:51   ` Thomas Gleixner
2024-04-15 18:53     ` Jacob Pan
2024-04-15 20:43       ` Jacob Pan
2024-04-19  4:00         ` Thomas Gleixner
2024-04-19 20:07           ` Arnaldo Carvalho de Melo [this message]
2024-04-22 22:32             ` Jacob Pan
2024-04-12  9:14   ` Tian, Kevin
2024-04-12 14:27     ` Sean Christopherson
2024-04-16  3:45       ` Tian, Kevin
2024-04-05 22:31 ` [PATCH v2 06/13] x86/irq: Set up per host CPU posted interrupt descriptors Jacob Pan
2024-04-12  9:16   ` Tian, Kevin
2024-04-12 17:54     ` Jacob Pan
2024-04-05 22:31 ` [PATCH v2 07/13] x86/irq: Factor out calling ISR from common_interrupt Jacob Pan
2024-04-12  9:21   ` Tian, Kevin
2024-04-12 16:50     ` Jacob Pan
2024-04-05 22:31 ` [PATCH v2 08/13] x86/irq: Install posted MSI notification handler Jacob Pan
2024-04-11  7:52   ` Tian, Kevin
2024-04-11 17:38     ` Jacob Pan
2024-04-11 16:54   ` Thomas Gleixner
2024-04-11 18:29     ` Jacob Pan
2024-04-05 22:31 ` [PATCH v2 09/13] x86/irq: Factor out common code for checking pending interrupts Jacob Pan
2024-04-05 22:31 ` [PATCH v2 10/13] x86/irq: Extend checks for pending vectors to posted interrupts Jacob Pan
2024-04-12  9:25   ` Tian, Kevin
2024-04-12 18:23     ` Jacob Pan
2024-04-16  3:47       ` Tian, Kevin
2024-04-05 22:31 ` [PATCH v2 11/13] iommu/vt-d: Make posted MSI an opt-in cmdline option Jacob Pan
2024-04-06  4:31   ` Robert Hoo
2024-04-08 23:33     ` Jacob Pan
2024-04-13 10:59       ` Robert Hoo
2024-04-12  9:31   ` Tian, Kevin
2024-04-15 23:20     ` Jacob Pan
2024-04-05 22:31 ` [PATCH v2 12/13] iommu/vt-d: Add an irq_chip for posted MSIs Jacob Pan
2024-04-12  9:36   ` Tian, Kevin
2024-04-16 22:15     ` Jacob Pan
2024-04-05 22:31 ` [PATCH v2 13/13] iommu/vt-d: Enable posted mode for device MSIs Jacob Pan
