All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Beulich <jbeulich@suse.com>
To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@epam.com>
Cc: "Oleksandr Tyshchenko" <Oleksandr_Tyshchenko@epam.com>,
	"Volodymyr Babchuk" <Volodymyr_Babchuk@epam.com>,
	"Artem Mygaiev" <Artem_Mygaiev@epam.com>,
	"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>,
	"george.dunlap@citrix.com" <george.dunlap@citrix.com>,
	"paul@xen.org" <paul@xen.org>,
	"Bertrand Marquis" <bertrand.marquis@arm.com>,
	"Rahul Singh" <rahul.singh@arm.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Roger Pau Monné" <roger.pau@citrix.com>,
	"julien@xen.org" <julien@xen.org>
Subject: Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal
Date: Mon, 22 Nov 2021 15:57:38 +0100	[thread overview]
Message-ID: <f16c70be-f5f6-5f80-54f1-7ba69adc114e@suse.com> (raw)
In-Reply-To: <d8d6cbcb-478a-a5a7-4e93-036b0f75c6d0@epam.com>

On 22.11.2021 15:45, Oleksandr Andrushchenko wrote:
> 
> 
> On 22.11.21 16:37, Jan Beulich wrote:
>> On 22.11.2021 15:21, Oleksandr Andrushchenko wrote:
>>> On 19.11.21 15:34, Oleksandr Andrushchenko wrote:
>>>> On 19.11.21 15:25, Jan Beulich wrote:
>>>>> On 19.11.2021 14:16, Oleksandr Andrushchenko wrote:
>>>>>> On 19.11.21 15:00, Jan Beulich wrote:
>>>>>>> On 19.11.2021 13:34, Oleksandr Andrushchenko wrote:
>>>>>>>> Possible locking and other work needed:
>>>>>>>> =======================================
>>>>>>>>
>>>>>>>> 1. pcidevs_{lock|unlock} is too heavy and is per-host
>>>>>>>> 2. pdev->vpci->lock cannot be used as vpci is freed by vpci_remove_device
>>>>>>>> 3. We may want a dedicated per-domain rw lock to be implemented:
>>>>>>>>
>>>>>>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>>>>>>> index 28146ee404e6..ebf071893b21 100644
>>>>>>>> --- a/xen/include/xen/sched.h
>>>>>>>> +++ b/xen/include/xen/sched.h
>>>>>>>> @@ -444,6 +444,7 @@ struct domain
>>>>>>>>
>>>>>>>>       #ifdef CONFIG_HAS_PCI
>>>>>>>>           struct list_head pdev_list;
>>>>>>>> +    rwlock_t vpci_rwlock;
>>>>>>>> +    bool vpci_terminating; <- atomic?
>>>>>>>>       #endif
>>>>>>>> then vpci_remove_device is a writer (cold path) and vpci_process_pending and
>>>>>>>> vpci_mmio_{read|write} are readers (hot path).
>>>>>>> Right - you need such a lock for other purposes anyway, as per the
>>>>>>> discussion with Julien.
>>>>>> What about bool vpci_terminating? Do you see it as an atomic type or just bool?
>>>>> Having seen only ...
>>>>>
>>>>>>>> do_physdev_op(PHYSDEVOP_pci_device_remove) will need hypercall_create_continuation
>>>>>>>> to be implemented, so when re-start removal if need be:
>>>>>>>>
>>>>>>>> vpci_remove_device()
>>>>>>>> {
>>>>>>>>        d->vpci_terminating = true;
>>>>> ... this use so far, I can't tell yet. But at a first glance a boolean
>>>>> looks to be what you need.
>>>>>
>>>>>>>>        remove vPCI register handlers <- this will cut off PCI_COMMAND emulation among others
>>>>>>>>        if ( !write_trylock(d->vpci_rwlock) )
>>>>>>>>          return -ERESTART;
>>>>>>>>        xfree(pdev->vpci);
>>>>>>>>        pdev->vpci = NULL;
>>>>>>>> }
>>>>>>>>
>>>>>>>> Then this d->vpci_rwlock becomes a dedicated vpci per-domain lock for
>>>>>>>> other operations which may require it, e.g. virtual bus topology can
>>>>>>>> use it when assigning vSBDF etc.
>>>>>>>>
>>>>>>>> 4. vpci_remove_device needs to be removed from vpci_process_pending
>>>>>>>> and do nothing for Dom0 and crash DomU otherwise:
>>>>>>> Why is this? I'm not outright opposed, but I don't immediately see why
>>>>>>> trying to remove the problematic device wouldn't be a reasonable course
>>>>>>> of action anymore. vpci_remove_device() may need to become more careful
>>>>>>> as to not crashing,
>>>>>> vpci_remove_device does not crash, vpci_process_pending does
>>>>>>>      though.
>>>>>> Assume we are in an error state in vpci_process_pending *on one of the vCPUs*
>>>>>> and we call vpci_remove_device. vpci_remove_device tries to acquire the
>>>>>> lock and it can't just because there are some other vpci code is running on other vCPU.
>>>>>> Then what do we do here? We are in SoftIRQ context now and we can't spin
>>>>>> trying to acquire d->vpci_rwlock forever. Neither we can blindly free vpci
>>>>>> structure because it is seen by all vCPUs and may crash them.
>>>>>>
>>>>>> If vpci_remove_device is in hypercall context it just returns -ERESTART and
>>>>>> hypercall continuation helps here. But not in SoftIRQ context.
>>>>> Maybe then you want to invoke this cleanup from RCU context (whether
>>>>> vpci_remove_device() itself or a suitable clone there of is TBD)? (I
>>>>> will admit though that I didn't check whether that would satisfy all
>>>>> constraints.)
>>>>>
>>>>> Then again it also hasn't become clear to me why you use write_trylock()
>>>>> there. The lock contention you describe doesn't, on the surface, look
>>>>> any different from situations elsewhere.
>>>> I use write_trylock in vpci_remove_device because if we can't
>>>> acquire the lock then we defer device removal. This would work
>>>> well if called from a hypercall which will employ hypercall continuation.
>>>> But SoftIRQ getting -ERESTART is something that we can't probably
>>>> handle by restarting as hypercall can, thus I only see that vpci_process_pending
>>>> will need to spin and wait until vpci_remove_device succeeds.
>>> Does anybody have any better solution for preventing SoftIRQ from
>>> spinning on vpci_remove_device and -ERESTART?
>> Well, at this point I can suggest only a marginal improvement: Instead of
>> spinning inside the softirq handler, you want to re-raise the softirq and
>> exit the handler. That way at least higher "priority" softirqs won't be
>> starved.
>>
>> Beyond that - maybe the guest (or just a vcpu of it) needs pausing in such
>> an event, with the work deferred to a tasklet?
>>
>> Yet I don't think my earlier question regarding the use of write_trylock()
>> was really answered. What you said in reply doesn't explain (to me at
>> least) why write_lock() is not an option.
> I was thinking that we do not want to freeze in case we are calling vpci_remove_device
> from SoftIRQ context, thus we try to lock and if we can't we return -ERESTART
> indicating that the removal needs to be deferred. If we use write_lock, then
> SoftIRQ -> write_lock will spin there waiting for readers to release the lock.
> 
> write_lock actually makes things a lot easier, but I just don't know if it
> is ok to use it. If so, then vpci_remove_device becomes synchronous and
> there is no need in hypercall continuation and other heavy machinery for
> re-scheduling SoftIRQ...

I'm inclined to ask: If it wasn't okay to use here, then where would it be
okay to use? Of course I realize there are cases when long spinning times
can be a problem. But I expect there aren't going to be excessively long
lock holding regions for this lock, and I also would expect average
contention to not be overly bad. But in the end you know better the code
that you're writing (and which may lead to issues with the lock usage) than
I do ...

Jan



  reply	other threads:[~2021-11-22 14:58 UTC|newest]

Thread overview: 101+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-05  6:56 [PATCH v4 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
2021-11-05  6:56 ` [PATCH v4 01/11] vpci: fix function attributes for vpci_process_pending Oleksandr Andrushchenko
2021-11-05  6:56 ` [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal Oleksandr Andrushchenko
2021-11-15 16:56   ` Jan Beulich
2021-11-16  7:32     ` Oleksandr Andrushchenko
2021-11-16  8:01       ` Jan Beulich
2021-11-16  8:23         ` Oleksandr Andrushchenko
2021-11-16 11:38           ` Jan Beulich
2021-11-16 13:27             ` Oleksandr Andrushchenko
2021-11-16 14:11               ` Jan Beulich
2021-11-16 13:41           ` Oleksandr Andrushchenko
2021-11-16 14:12             ` Jan Beulich
2021-11-16 14:24               ` Oleksandr Andrushchenko
2021-11-16 14:37                 ` Oleksandr Andrushchenko
2021-11-16 16:09                 ` Jan Beulich
2021-11-16 18:02                 ` Julien Grall
2021-11-18 12:57                   ` Oleksandr Andrushchenko
2021-11-17  8:28   ` Jan Beulich
2021-11-18  7:49     ` Oleksandr Andrushchenko
2021-11-18  8:36       ` Jan Beulich
2021-11-18  8:54         ` Oleksandr Andrushchenko
2021-11-18  9:15           ` Jan Beulich
2021-11-18  9:32             ` Oleksandr Andrushchenko
2021-11-18 13:25               ` Jan Beulich
2021-11-18 13:48                 ` Oleksandr Andrushchenko
2021-11-18 14:04                   ` Roger Pau Monné
2021-11-18 14:14                     ` Oleksandr Andrushchenko
2021-11-18 14:35                       ` Jan Beulich
2021-11-18 15:11                         ` Oleksandr Andrushchenko
2021-11-18 15:16                           ` Jan Beulich
2021-11-18 15:21                             ` Oleksandr Andrushchenko
2021-11-18 15:41                               ` Jan Beulich
2021-11-18 15:46                                 ` Oleksandr Andrushchenko
2021-11-18 15:53                                   ` Jan Beulich
2021-11-19 12:34                                     ` Oleksandr Andrushchenko
2021-11-19 13:00                                       ` Jan Beulich
2021-11-19 13:16                                         ` Oleksandr Andrushchenko
2021-11-19 13:25                                           ` Jan Beulich
2021-11-19 13:34                                             ` Oleksandr Andrushchenko
2021-11-22 14:21                                               ` Oleksandr Andrushchenko
2021-11-22 14:37                                                 ` Jan Beulich
2021-11-22 14:45                                                   ` Oleksandr Andrushchenko
2021-11-22 14:57                                                     ` Jan Beulich [this message]
2021-11-22 15:02                                                       ` Oleksandr Andrushchenko
2021-11-05  6:56 ` [PATCH v4 03/11] vpci: make vpci registers removal a dedicated function Oleksandr Andrushchenko
2021-11-15 16:57   ` Jan Beulich
2021-11-16  8:02     ` Oleksandr Andrushchenko
2021-11-05  6:56 ` [PATCH v4 04/11] vpci: add hooks for PCI device assign/de-assign Oleksandr Andrushchenko
2021-11-15 17:06   ` Jan Beulich
2021-11-16  9:38     ` Oleksandr Andrushchenko
2021-11-05  6:56 ` [PATCH v4 05/11] vpci/header: implement guest BAR register handlers Oleksandr Andrushchenko
2021-11-19 11:58   ` Jan Beulich
2021-11-19 12:10     ` Oleksandr Andrushchenko
2021-11-19 12:37       ` Jan Beulich
2021-11-19 12:46         ` Oleksandr Andrushchenko
2021-11-19 12:49           ` Jan Beulich
2021-11-19 12:54             ` Oleksandr Andrushchenko
2021-11-19 13:02               ` Jan Beulich
2021-11-19 13:17                 ` Oleksandr Andrushchenko
2021-11-23 15:14                 ` Oleksandr Andrushchenko
2021-11-24 12:32                   ` Roger Pau Monné
2021-11-24 12:36                     ` Oleksandr Andrushchenko
2021-11-05  6:56 ` [PATCH v4 06/11] vpci/header: handle p2m range sets per BAR Oleksandr Andrushchenko
2021-11-19 12:05   ` Jan Beulich
2021-11-19 12:13     ` Oleksandr Andrushchenko
2021-11-19 12:45       ` Jan Beulich
2021-11-19 12:50         ` Oleksandr Andrushchenko
2021-11-19 13:06           ` Jan Beulich
2021-11-19 13:19             ` Oleksandr Andrushchenko
2021-11-19 13:29               ` Jan Beulich
2021-11-19 13:38                 ` Oleksandr Andrushchenko
2021-11-19 13:16   ` Jan Beulich
2021-11-19 13:41     ` Oleksandr Andrushchenko
2021-11-19 13:57       ` Jan Beulich
2021-11-19 14:09         ` Oleksandr Andrushchenko
2021-11-22  8:24           ` Jan Beulich
2021-11-22  8:31             ` Oleksandr Andrushchenko
2021-11-05  6:56 ` [PATCH v4 07/11] vpci/header: program p2m with guest BAR view Oleksandr Andrushchenko
2021-11-19 12:33   ` Jan Beulich
2021-11-19 12:44     ` Oleksandr Andrushchenko
2021-11-05  6:56 ` [PATCH v4 08/11] vpci/header: emulate PCI_COMMAND register for guests Oleksandr Andrushchenko
2021-11-05  6:56 ` [PATCH v4 09/11] vpci/header: reset the command register when adding devices Oleksandr Andrushchenko
2021-11-05  6:56 ` [PATCH v4 10/11] vpci: add initial support for virtual PCI bus topology Oleksandr Andrushchenko
2021-11-18 16:45   ` Jan Beulich
2021-11-24 11:28     ` Oleksandr Andrushchenko
2021-11-24 12:36       ` Roger Pau Monné
2021-11-24 12:43         ` Oleksandr Andrushchenko
2021-11-05  6:56 ` [PATCH v4 11/11] xen/arm: translate virtual PCI bus topology for guests Oleksandr Andrushchenko
2021-11-08 11:10   ` Jan Beulich
2021-11-08 11:16     ` Oleksandr Andrushchenko
2021-11-08 14:23       ` Roger Pau Monné
2021-11-08 15:28         ` Oleksandr Andrushchenko
2021-11-24 11:31           ` Oleksandr Andrushchenko
2021-11-19 13:56 ` [PATCH v4 00/11] PCI devices passthrough on Arm, part 3 Jan Beulich
2021-11-19 14:06   ` Oleksandr Andrushchenko
2021-11-19 14:23   ` Roger Pau Monné
2021-11-19 14:26     ` Oleksandr Andrushchenko
2021-11-20  9:47       ` Roger Pau Monné
2021-11-22  8:22     ` Jan Beulich
2021-11-22  8:34       ` Oleksandr Andrushchenko
2021-11-22  8:44         ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f16c70be-f5f6-5f80-54f1-7ba69adc114e@suse.com \
    --to=jbeulich@suse.com \
    --cc=Artem_Mygaiev@epam.com \
    --cc=Oleksandr_Andrushchenko@epam.com \
    --cc=Oleksandr_Tyshchenko@epam.com \
    --cc=Volodymyr_Babchuk@epam.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=bertrand.marquis@arm.com \
    --cc=george.dunlap@citrix.com \
    --cc=julien@xen.org \
    --cc=paul@xen.org \
    --cc=rahul.singh@arm.com \
    --cc=roger.pau@citrix.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.