linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Andy Lutomirski <luto@kernel.org>,
	Sohil Mehta <sohil.mehta@intel.com>,
	the arch/x86 maintainers <x86@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>, Jens Axboe <axboe@kernel.dk>,
	Christian Brauner <christian@brauner.io>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Shuah Khan <shuah@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Jonathan Corbet <corbet@lwn.net>, Raj Ashok <ashok.raj@intel.com>,
	Jacob Pan <jacob.jun.pan@linux.intel.com>,
	Gayatri Kammela <gayatri.kammela@intel.com>,
	Zeng Guang <guang.zeng@intel.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>,
	Randy E Witt <randy.e.witt@intel.com>,
	"Shankar, Ravi V" <ravi.v.shankar@intel.com>,
	Ramesh Thomas <ramesh.thomas@intel.com>,
	Linux API <linux-api@vger.kernel.org>,
	linux-arch@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-kselftest@vger.kernel.org
Subject: Re: [RFC PATCH 11/13] x86/uintr: Introduce uintr_wait() syscall
Date: Fri, 01 Oct 2021 02:01:37 +0200	[thread overview]
Message-ID: <87tui162am.ffs@tglx> (raw)
In-Reply-To: <b537a890-4b9f-462e-8c17-5c7aa9b60138@www.fastmail.com>

On Thu, Sep 30 2021 at 15:01, Andy Lutomirski wrote:
> On Thu, Sep 30, 2021, at 12:29 PM, Thomas Gleixner wrote:
>>
>> But even with that we still need to keep track of the armed ones per CPU
>> so we can handle CPU hotunplug correctly. Sigh...
>
> I don’t think any real work is needed. We will only ever have armed
> UPIDs (with notification interrupts enabled) for running tasks, and
> hot-unplugged CPUs don’t have running tasks.

That's not the problem. The problem is the wait for uintr case where the
task is obviously not running:

CPU 1
     upid = T1->upid;
     upid->vector = UINTR_WAIT_VECTOR;
     upid->ndst = local_apic_id();
     ...
     do {
         ....
         schedule();
     }

CPU 0
    unplug CPU 1

    SENDUPI(index)
        // Hardware does:
        tblentry = &ttable[index];
        upid = tblentry->upid;
        upid->pir |= tblentry->uv;
        send_IPI(upid->vector, upid->ndst);

So SENDUPI will send the IPI to the APIC ID provided by T1->upid.ndst
which points to the offlined CPU 1 and therefore is obviously going to
/dev/null. IOW, lost wakeup...

> We do need a way to drain pending IPIs before we offline a CPU, but
> that’s a separate problem and may be unsolvable for all I know. Is
> there a magic APIC operation to wait until all initiated IPIs
> targeting the local CPU arrive?  I guess we can also just mask the
> notification vector so that it won’t crash us if we get a stale IPI
> after going offline.

All of this is solved already otherwise CPU hot unplug would explode in
your face every time. The software IPI send side is carefully
synchronized vs. hotplug (at least in theory). May I ask you politely to
make yourself familiar with all that before touting "We do need..." based
on random assumptions?

The above SENDUIPI vs. CPU hotplug scenario is the same problem as we
have with regular device interrupts which are targeted at an outgoing
CPU. We have magic mechanisms in place to handle that to the extent
possible, but due to the insanity of X86 interrupt handling mechanics
that still leaves a very tiny hole which might cause a lost and
subsequently stale interrupt. Nothing we can fix in software.

So on CPU offline the hotplug code walks through all device interrupts
and checks whether they are targeted at the outgoing CPU. If so they are
rerouted to an online CPU with lots of care to make the possible race
window as small as it gets. That's nowadays only a problem on systems
where interrupt remapping is not available or disabled via commandline.

For tasks which just have the user interrupt armed there is no problem
because SENDUPI modifies UPID->PIR which is reevaluated when the task
which got migrated to an online CPU is going back to user space.

The uintr_wait() syscall creates the very same problem as we have with
device interrupts. Which means we need to make that wait thing:

     upid = T1->upid;
     upid->vector = UINTR_WAIT_VECTOR;
     upid->ndst = local_apic_id();
     list_add(this_cpu_ptr(pcp_uintrs), upid->pcp_uintr);
     ...
     do {
         ....
         schedule();
     }
     list_del_init(upid->pcp_uintr);

and the hotplug code do:

    for_each_entry_safe(upid, this_cpu_ptr(pcp_uintrs), ...) {
       list_del(upid->pcp_uintr);
       upid->ndst = apic_id_of_random_online_cpu();
       if (do_magic_checks_whether_ipi_is_pending())
         send_ipi(upid->vector, upid->ndst);
    }

See?

We could map that to the interrupt subsystem by creating a virtual
interrupt domain for this, but that would make uintr_wait() look like
this:

     irq = uintr_alloc_irq();
     request_irq(irq, ......);
     upid = T1->upid;
     upid->vector = UINTR_WAIT_VECTOR;
     upid->ndst = local_apic_id();
     list_add(this_cpu_ptr(pcp_uintrs), upid->pcp_uintr);
     ...
     do {
         ....
         schedule();
     }
     list_del_init(upid->pcp_uintr);
     free_irq(irq);

But the benefit of that is dubious as it creates overhead on both sides
of the sleep and the only real purpose of the irq request would be to
handle CPU hotunplug without the above per CPU list mechanics.

Welcome to my wonderful world!

Thanks,

        tglx

  reply	other threads:[~2021-10-01  0:01 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-13 20:01 [RFC PATCH 00/13] x86 User Interrupts support Sohil Mehta
2021-09-13 20:01 ` [RFC PATCH 01/13] x86/uintr/man-page: Include man pages draft for reference Sohil Mehta
2021-09-13 20:01 ` [RFC PATCH 02/13] Documentation/x86: Add documentation for User Interrupts Sohil Mehta
2021-09-13 20:01 ` [RFC PATCH 03/13] x86/cpu: Enumerate User Interrupts support Sohil Mehta
2021-09-23 22:24   ` Thomas Gleixner
2021-09-24 19:59     ` Sohil Mehta
2021-09-27 20:42     ` Sohil Mehta
2021-09-13 20:01 ` [RFC PATCH 04/13] x86/fpu/xstate: Enumerate User Interrupts supervisor state Sohil Mehta
2021-09-23 22:34   ` Thomas Gleixner
2021-09-27 22:25     ` Sohil Mehta
2021-09-13 20:01 ` [RFC PATCH 05/13] x86/irq: Reserve a user IPI notification vector Sohil Mehta
2021-09-23 23:07   ` Thomas Gleixner
2021-09-25 13:30     ` Thomas Gleixner
2021-09-26 12:39       ` Thomas Gleixner
2021-09-27 19:07         ` Sohil Mehta
2021-09-28  8:11           ` Thomas Gleixner
2021-09-27 19:26     ` Sohil Mehta
2021-09-13 20:01 ` [RFC PATCH 06/13] x86/uintr: Introduce uintr receiver syscalls Sohil Mehta
2021-09-23 12:26   ` Greg KH
2021-09-24  0:05     ` Thomas Gleixner
2021-09-27 23:20     ` Sohil Mehta
2021-09-28  4:39       ` Greg KH
2021-09-28 16:47         ` Sohil Mehta
2021-09-23 23:52   ` Thomas Gleixner
2021-09-27 23:57     ` Sohil Mehta
2021-09-13 20:01 ` [RFC PATCH 07/13] x86/process/64: Add uintr task context switch support Sohil Mehta
2021-09-24  0:41   ` Thomas Gleixner
2021-09-28  0:30     ` Sohil Mehta
2021-09-13 20:01 ` [RFC PATCH 08/13] x86/process/64: Clean up uintr task fork and exit paths Sohil Mehta
2021-09-24  1:02   ` Thomas Gleixner
2021-09-28  1:23     ` Sohil Mehta
2021-09-13 20:01 ` [RFC PATCH 09/13] x86/uintr: Introduce vector registration and uintr_fd syscall Sohil Mehta
2021-09-24 10:33   ` Thomas Gleixner
2021-09-28 20:40     ` Sohil Mehta
2021-09-13 20:01 ` [RFC PATCH 10/13] x86/uintr: Introduce user IPI sender syscalls Sohil Mehta
2021-09-23 12:28   ` Greg KH
2021-09-28 18:01     ` Sohil Mehta
2021-09-29  7:04       ` Greg KH
2021-09-29 14:27         ` Sohil Mehta
2021-09-24 10:54   ` Thomas Gleixner
2021-09-13 20:01 ` [RFC PATCH 11/13] x86/uintr: Introduce uintr_wait() syscall Sohil Mehta
2021-09-24 11:04   ` Thomas Gleixner
2021-09-25 12:08     ` Thomas Gleixner
2021-09-28 23:13       ` Sohil Mehta
2021-09-28 23:08     ` Sohil Mehta
2021-09-26 14:41   ` Thomas Gleixner
2021-09-29  1:09     ` Sohil Mehta
2021-09-29  3:30   ` Andy Lutomirski
2021-09-29  4:56     ` Sohil Mehta
2021-09-30 18:08       ` Andy Lutomirski
2021-09-30 19:29         ` Thomas Gleixner
2021-09-30 22:01           ` Andy Lutomirski
2021-10-01  0:01             ` Thomas Gleixner [this message]
2021-10-01  4:41               ` Andy Lutomirski
2021-10-01  9:56                 ` Thomas Gleixner
2021-10-01 15:13                   ` Andy Lutomirski
2021-10-01 18:04                     ` Sohil Mehta
2021-10-01 21:29                     ` Thomas Gleixner
2021-10-01 23:00                       ` Sohil Mehta
2021-10-01 23:04                       ` Andy Lutomirski
2021-09-13 20:01 ` [RFC PATCH 12/13] x86/uintr: Wire up the user interrupt syscalls Sohil Mehta
2021-09-13 20:01 ` [RFC PATCH 13/13] selftests/x86: Add basic tests for User IPI Sohil Mehta
2021-09-13 20:27 ` [RFC PATCH 00/13] x86 User Interrupts support Dave Hansen
2021-09-14 19:03   ` Mehta, Sohil
2021-09-23 12:19     ` Greg KH
2021-09-23 14:09       ` Greg KH
2021-09-23 14:46         ` Dave Hansen
2021-09-23 15:07           ` Greg KH
2021-09-23 23:24         ` Sohil Mehta
2021-09-23 23:09       ` Sohil Mehta
2021-09-24  0:17       ` Sohil Mehta
2021-09-23 14:39 ` Jens Axboe
2021-09-29  4:31 ` Andy Lutomirski
2021-09-30 16:30   ` Stefan Hajnoczi
2021-09-30 17:24     ` Sohil Mehta
2021-09-30 17:26       ` Andy Lutomirski
2021-10-01 16:35       ` Stefan Hajnoczi
2021-10-01 16:41         ` Richard Henderson
2021-09-30 16:26 ` Stefan Hajnoczi
2021-10-01  0:40   ` Sohil Mehta
2021-10-01  8:19 ` Pavel Machek
2021-11-18 22:19   ` Sohil Mehta
2021-11-16  3:49 ` Prakash Sangappa
2021-11-18 21:44   ` Sohil Mehta
2021-12-22 16:17 ` Chrisma Pakha
2022-01-07  2:08   ` Sohil Mehta
2022-01-17  1:14     ` Chrisma Pakha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tui162am.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=arnd@arndb.de \
    --cc=ashok.raj@intel.com \
    --cc=axboe@kernel.dk \
    --cc=bp@alien8.de \
    --cc=christian@brauner.io \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=gayatri.kammela@intel.com \
    --cc=guang.zeng@intel.com \
    --cc=hpa@zytor.com \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=ramesh.thomas@intel.com \
    --cc=randy.e.witt@intel.com \
    --cc=ravi.v.shankar@intel.com \
    --cc=shuah@kernel.org \
    --cc=sohil.mehta@intel.com \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).