xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@epam.com>
To: "Roger Pau Monné" <roger.pau@citrix.com>,
	"Jan Beulich" <jbeulich@suse.com>
Cc: "julien@xen.org" <julien@xen.org>,
	"sstabellini@kernel.org" <sstabellini@kernel.org>,
	Oleksandr Tyshchenko <Oleksandr_Tyshchenko@epam.com>,
	Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>,
	Artem Mygaiev <Artem_Mygaiev@epam.com>,
	"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>,
	"george.dunlap@citrix.com" <george.dunlap@citrix.com>,
	"paul@xen.org" <paul@xen.org>,
	Bertrand Marquis <bertrand.marquis@arm.com>,
	Rahul Singh <rahul.singh@arm.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Oleksandr Andrushchenko <Oleksandr_Andrushchenko@epam.com>
Subject: Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
Date: Tue, 8 Feb 2022 11:13:41 +0000	[thread overview]
Message-ID: <0f83fa71-c252-6e6c-47c9-3ef899b45901@epam.com> (raw)
In-Reply-To: <YgJK2zPszyTXGxMM@Air-de-Roger>



On 08.02.22 12:50, Roger Pau Monné wrote:
> On Tue, Feb 08, 2022 at 07:35:34AM +0000, Oleksandr Andrushchenko wrote:
>>
>> On 07.02.22 18:44, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 18:37, Jan Beulich wrote:
>>>> On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
>>>>> On 07.02.22 18:15, Jan Beulich wrote:
>>>>>> On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
>>>>>>> On 07.02.22 17:26, Jan Beulich wrote:
>>>>>>>> 1b. Make vpci_write use write lock for writes to command register and BARs
>>>>>>>> only; keep using the read lock for all other writes.
>>>>>>> I am not quite sure how to do that. Do you mean something like:
>>>>>>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>>>>>>                      uint32_t data)
>>>>>>> [snip]
>>>>>>>          list_for_each_entry ( r, &pdev->vpci->handlers, node )
>>>>>>> {
>>>>>>> [snip]
>>>>>>>          if ( r->needs_write_lock)
>>>>>>>              write_lock(d->vpci_lock)
>>>>>>>          else
>>>>>>>              read_lock(d->vpci_lock)
>>>>>>> ....
>>>>>>>
>>>>>>> And provide rw as an argument to:
>>>>>>>
>>>>>>> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>>>>>>                            vpci_write_t *write_handler, unsigned int offset,
>>>>>>>                            unsigned int size, void *data, --->>> bool write_path <<<-----)
>>>>>>>
>>>>>>> Is this what you mean?
>>>>>> This sounds overly complicated. You can derive locally in vpci_write(),
>>>>>> from just its "reg" and "size" parameters, whether the lock needs taking
>>>>>> in write mode.
>>>>> Yes, I started writing a reply with that. So, the summary (ROM
>>>>> position depends on header type):
>>>>> if ( (reg == PCI_COMMAND) || (reg == ROM) )
>>>>> {
>>>>>         read PCI_COMMAND and see if memory or IO decoding are enabled.
>>>>>         if ( enabled )
>>>>>             write_lock(d->vpci_lock)
>>>>>         else
>>>>>             read_lock(d->vpci_lock)
>>>>> }
>>>> Hmm, yes, you can actually get away without using "size", since both
>>>> command register and ROM BAR are 32-bit aligned registers, and 64-bit
>>>> accesses get split in vpci_ecam_write().
>>> But, OS may want reading a single byte of ROM BAR, so I think
>>> I'll need to check if reg+size fall into PCI_COMAND and ROM BAR
>>> ranges
>>>> For the command register the memory- / IO-decoding-enabled check may
>>>> end up a little more complicated, as the value to be written also
>>>> matters. Maybe read the command register only for the ROM BAR write,
>>>> using the write lock uniformly for all command register writes?
>>> Sounds good for the start.
>>> Another concern is that if we go with a read_lock and then in the
>>> underlying code we disable memory decoding and try doing
>>> something and calling cmd_write handler for any reason then....
>>>
>>> I mean that the check in the vpci_write is somewhat we can tolerate,
>>> but then it is must be considered that no code in the read path
>>> is allowed to perform write path functions. Which brings a pretty
>>> valid use-case: say in read mode we detect an unrecoverable error
>>> and need to remove the device:
>>> vpci_process_pending -> ERROR -> vpci_remove_device or similar.
>>>
>>> What do we do then? It is all going to be fragile...
>> I have tried to summarize the options we have wrt locking
>> and would love to hear from @Roger and @Jan.
>>
>> In every variant there is a task of dealing with the overlap
>> detection in modify_bars, so this is the only place as of now
>> which needs special treatment.
>>
>> Existing limitations: there is no way to upgrade a read lock to a write
>> lock, so paths which may require write lock protection need to use
>> write lock from the very beginning. Workarounds can be applied.
>>
>> 1. Per-domain rw lock, aka d->vpci_lock
>> ==============================================================
>> Note: with per-domain rw lock it is possible to do without introducing
>> per-device locks, so pdev->vpci->lock can be removed and no pdev->vpci_lock
>> should be required.
> Er, no, I think you still need a per-device lock unless you intent to
> take the per-domain rwlock in write mode every time you modify data
> in vpci.
This is exactly the assumption stated below. I am trying to discuss
all the possible options, so this one is also listed
>   I still think you need pdev->vpci->lock. It's possible this
> approach doesn't require moving the lock outside of the vpci struct.
>
>> This is only going to work in case if vpci_write always takes the write lock
>> and vpci_read takes a read lock and no path in vpci_read is allowed to
>> perform write path operations.
> I think that's likely too strong?
>
> You could get away with both vpci_{read,write} only taking the read
> lock and use a per-device vpci lock?
But as discussed before:
- if pdev->vpci_lock is used this still leads to ABBA
- we should know about if to take the write lock beforehand
>
> Otherwise you are likely to introduce contention in msix_write if a
> guest makes heavy use of the MSI-X entry mask bit.
>
>> vpci_process_pending uses write lock as it have vpci_remove_device in its
>> error path.
>>
>> Pros:
>> - no per-device vpci lock is needed?
>> - solves overlap code ABBA in modify_bars
>>
>> Cons:
>> - all writes are serialized
>> - need to carefully select read paths, so they are guaranteed not to lead
>>     to lock upgrade use-cases
>>
>> 1.1. Semi read lock upgrade in modify bars
>> --------------------------------------------------------------
>> In this case both vpci_read and vpci_write take a read lock and when it comes
>> to modify_bars:
>>
>> 1. read_unlock(d->vpci_lock)
>> 2. write_lock(d->vpci_lock)
>> 3. Check that pdev->vpci is still available and is the same object:
>> if (pdev->vpci && (pdev->vpci == old_vpci) )
>> {
>>       /* vpci structure is valid and can be used. */
>> }
>> else
>> {
>>       /* vpci has gone, return an error. */
>> }
>>
>> Pros:
>> - no per-device vpci lock is needed?
>> - solves overlap code ABBA in modify_bars
>> - readers and writers are NOT serialized
>> - NO need to carefully select read paths, so they are guaranteed not to lead
>>     to lock upgrade use-cases
>>
>> Cons:
>> - ???
>>
>> 2. per-device lock (pdev->vpci_lock) + d->overlap_chk_lock
>> ==============================================================
>> In order to solve overlap ABBA, we introduce a per-domain helper
>> lock to protect the overlapping code in modify_bars:
>>
>>       old_vpci = pdev->vpci;
>>       spin_unlock(pdev->vpci_lock);
>>       spin_lock(pdev->domain->overlap_chk_lock);
> Since you drop the pdev lock you get a window here where either vpci
> or even pdev itself could be removed under your feet, so using
> pdev->vpci_lock like you do below could dereference a stale pdev.
pdev is anyways not protected with pcidevs lock here, so even
now it is possible to have pdev disapear in between.
We do not use pcidevs_lock in MMIO handlers...
>
>>       spin_lock(pdev->vpci_lock);
>>       if ( pdev->vpci && (pdev->vpci == old_vpci) )
>>           for_each_pdev ( pdev->domain, tmp )
>>           {
>>               if ( tmp != pdev )
>>               {
>>                   spin_lock(tmp->vpci_lock);
>>                   if ( tmp->vpci )
>>                       ...
>>               }
>>           }
>>
>> Pros:
>> - all accesses are independent, only the same device access is serialized
>> - no need to care about readers and writers wrt read lock upgrade issues
>>
>> Cons:
>> - helper spin lock
>>
>> 3. Move overlap detection into process pending
>> ==============================================================
>> There is a Roger's patch [1] which adds a possibility for vpci_process_pending
>> to perform different tasks rather than just map/unmap. With this patch extended
>> in a way that it can hold a request queue it is possible to delay execution
>> of the overlap code until no pdev->vpci_lock is held, but before returning to
>> a guest after vpci_{read|write} or similar.
>>
>> Pros:
>> - no need to emulate read lock upgrade
>> - fully parallel read/write
>> - queue in the vpci_process_pending will later on be used by SR-IOV,
>>     so this is going to help the future code
>> Cons:
>> - ???
> Maybe? It's hard to devise how that would end up looking like, and
> whether it won't still require such kind of double locking. We would
> still need to prevent doing a rangeset_remove_range for the device we
> are trying to setup the mapping for, at which point we still need to
> lock the current device plus the device we are iterating against?
>
> Since the code in vpci_process_pending is always executed in guest
> vCPU context requiring all guest vCPUs to be paused when doing a
> device addition or removal would prevent devices from going away, but
> we could still have issues with concurrent accesses from other vCPUs.
Yes, I understand that this may not be easily done, but this is still
an option,
>
>> 4. Re-write overlap detection code
>> ==============================================================
>> It is possible to re-write overlap detection code, so the information about the
>> mapped/unmapped regions is not read from vpci->header->bars[i] of each device,
>> but instead there is a per-domain structure which holds the regions and
>> implements reference counting.
>>
>> Pros:
>> - solves ABBA
>>
>> Cons:
>> - very complex code is expected
>>
>> 5. You name it
>> ==============================================================
>>
>>   From all the above I would recommend we go with option 2 which seems to reliably
>> solve ABBA and does not bring cons of the other approaches.
> 6. per-domain rwlock + per-device vpci lock
>
> Introduce vpci_header_write_lock(start, {end, size}) helper: return
> whether a range requires the per-domain lock in write mode. This will
> only return true if the range overlaps with the BAR ROM or the command
> register.
>
> In vpci_{read,write}:
>
> if ( vpci_header_write_lock(...) )
>      /* Gain exclusive access to all of the domain pdevs vpci. */
>      write_lock(d->vpci);
> else
> {
>      read_lock(d->vpci);
>      spin_lock(vpci->lock);
> }
> ...
>
> The vpci assign/deassign functions would need to be modified to write
> lock the per-domain rwlock. The MSI-X table MMIO handler will also
> need to read lock the per domain vpci lock.
Ok, so it seems you are in favor of this implementation and I have
no objection as well. The only limitation we should be aware of is
that once a path has acquired the read lock it is not possible to do
any write path operations in there.
vpci_process_pending will acquire write lock though as it can
lead to vpci_remove_device on its error path.

So, I am going to implement pdev->vpci->lock + d->vpci_lock
>
> I think it's either something along the lines of my suggestion above,
> or maybe option 3, albeit you would have to investigate how to
> implement option 3.
>
> Thanks, Roger.

@Roger, @Jan!
Thank you!!

  reply	other threads:[~2022-02-08 11:14 UTC|newest]

Thread overview: 138+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-04  6:34 [PATCH v6 00/13] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole Oleksandr Andrushchenko
2022-02-04  8:51   ` Julien Grall
2022-02-04  9:01     ` Oleksandr Andrushchenko
2022-02-04  9:41       ` Julien Grall
2022-02-04  9:47         ` Oleksandr Andrushchenko
2022-02-04  9:57           ` Julien Grall
2022-02-04 10:35             ` Oleksandr Andrushchenko
2022-02-04 11:00               ` Julien Grall
2022-02-04 11:25                 ` Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 02/13] rangeset: add RANGESETF_no_print flag Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 03/13] vpci: move lock outside of struct vpci Oleksandr Andrushchenko
2022-02-04  7:52   ` Jan Beulich
2022-02-04  8:13     ` Oleksandr Andrushchenko
2022-02-04  8:36       ` Jan Beulich
2022-02-04  8:58     ` Oleksandr Andrushchenko
2022-02-04  9:15       ` Jan Beulich
2022-02-04 10:12         ` Oleksandr Andrushchenko
2022-02-04 10:49           ` Jan Beulich
2022-02-04 11:13             ` Roger Pau Monné
2022-02-04 11:37               ` Jan Beulich
2022-02-04 12:37                 ` Oleksandr Andrushchenko
2022-02-04 12:47                   ` Jan Beulich
2022-02-04 12:53                     ` Oleksandr Andrushchenko
2022-02-04 13:03                       ` Jan Beulich
2022-02-04 13:06                       ` Roger Pau Monné
2022-02-04 14:43                         ` Oleksandr Andrushchenko
2022-02-04 14:57                           ` Roger Pau Monné
2022-02-07 11:08                             ` Oleksandr Andrushchenko
2022-02-07 12:34                               ` Jan Beulich
2022-02-07 12:57                                 ` Oleksandr Andrushchenko
2022-02-07 13:02                                   ` Jan Beulich
2022-02-07 12:46                               ` Roger Pau Monné
2022-02-07 13:53                                 ` Oleksandr Andrushchenko
2022-02-07 14:11                                   ` Jan Beulich
2022-02-07 14:27                                     ` Roger Pau Monné
2022-02-07 14:33                                       ` Jan Beulich
2022-02-07 14:35                                       ` Oleksandr Andrushchenko
2022-02-07 15:11                                         ` Oleksandr Andrushchenko
2022-02-07 15:26                                           ` Jan Beulich
2022-02-07 16:07                                             ` Oleksandr Andrushchenko
2022-02-07 16:15                                               ` Jan Beulich
2022-02-07 16:21                                                 ` Oleksandr Andrushchenko
2022-02-07 16:37                                                   ` Jan Beulich
2022-02-07 16:44                                                     ` Oleksandr Andrushchenko
2022-02-08  7:35                                                       ` Oleksandr Andrushchenko
2022-02-08  8:57                                                         ` Jan Beulich
2022-02-08  9:03                                                           ` Oleksandr Andrushchenko
2022-02-08 10:50                                                         ` Roger Pau Monné
2022-02-08 11:13                                                           ` Oleksandr Andrushchenko [this message]
2022-02-08 13:38                                                             ` Roger Pau Monné
2022-02-08 13:52                                                               ` Oleksandr Andrushchenko
2022-02-08  8:53                                                       ` Jan Beulich
2022-02-08  9:00                                                         ` Oleksandr Andrushchenko
2022-02-08 10:11                                                     ` Roger Pau Monné
2022-02-08 10:32                                                       ` Oleksandr Andrushchenko
2022-02-07 16:08                                             ` Roger Pau Monné
2022-02-07 16:12                                               ` Jan Beulich
2022-02-07 14:28                                     ` Oleksandr Andrushchenko
2022-02-07 14:19                                   ` Roger Pau Monné
2022-02-07 14:27                                     ` Oleksandr Andrushchenko
2022-02-04 11:37               ` Oleksandr Andrushchenko
2022-02-04 12:15                 ` Roger Pau Monné
2022-02-04 10:57           ` Roger Pau Monné
2022-02-04  6:34 ` [PATCH v6 04/13] vpci: restrict unhandled read/write operations for guests Oleksandr Andrushchenko
2022-02-04 14:11   ` Jan Beulich
2022-02-04 14:24     ` Oleksandr Andrushchenko
2022-02-08  8:00       ` Oleksandr Andrushchenko
2022-02-08  9:04         ` Jan Beulich
2022-02-08  9:09           ` Oleksandr Andrushchenko
2022-02-08  9:05         ` Roger Pau Monné
2022-02-08  9:10           ` Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 05/13] vpci: add hooks for PCI device assign/de-assign Oleksandr Andrushchenko
2022-02-07 16:28   ` Jan Beulich
2022-02-08  8:32     ` Oleksandr Andrushchenko
2022-02-08  9:13       ` Jan Beulich
2022-02-08  9:27         ` Oleksandr Andrushchenko
2022-02-08  9:44           ` Jan Beulich
2022-02-08  9:55             ` Oleksandr Andrushchenko
2022-02-08 10:09               ` Jan Beulich
2022-02-08 10:22                 ` Oleksandr Andrushchenko
2022-02-08 10:29                   ` Jan Beulich
2022-02-08 10:52                     ` Oleksandr Andrushchenko
2022-02-08 11:00                       ` Jan Beulich
2022-02-08 11:25                         ` Oleksandr Andrushchenko
2022-02-10  8:21                           ` Oleksandr Andrushchenko
2022-02-10  9:22                             ` Jan Beulich
2022-02-10  9:33                               ` Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 06/13] vpci/header: implement guest BAR register handlers Oleksandr Andrushchenko
2022-02-07 17:06   ` Jan Beulich
2022-02-08  8:06     ` Oleksandr Andrushchenko
2022-02-08  9:16       ` Jan Beulich
2022-02-08  9:29         ` Roger Pau Monné
2022-02-08  9:25   ` Roger Pau Monné
2022-02-08  9:31     ` Oleksandr Andrushchenko
2022-02-08  9:48       ` Jan Beulich
2022-02-08  9:57         ` Oleksandr Andrushchenko
2022-02-08 10:15           ` Jan Beulich
2022-02-08 10:29             ` Oleksandr Andrushchenko
2022-02-08 13:58               ` Roger Pau Monné
2022-02-04  6:34 ` [PATCH v6 07/13] vpci/header: handle p2m range sets per BAR Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 08/13] vpci/header: program p2m with guest BAR view Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 09/13] vpci/header: emulate PCI_COMMAND register for guests Oleksandr Andrushchenko
2022-02-04 14:25   ` Jan Beulich
2022-02-08  8:13     ` Oleksandr Andrushchenko
2022-02-08  9:33       ` Jan Beulich
2022-02-08  9:38         ` Oleksandr Andrushchenko
2022-02-08  9:52           ` Jan Beulich
2022-02-08  9:58             ` Oleksandr Andrushchenko
2022-02-08 11:11               ` Roger Pau Monné
2022-02-08 11:29                 ` Oleksandr Andrushchenko
2022-02-08 14:09                   ` Roger Pau Monné
2022-02-08 14:13                     ` Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 10/13] vpci/header: reset the command register when adding devices Oleksandr Andrushchenko
2022-02-04 14:30   ` Jan Beulich
2022-02-04 14:37     ` Oleksandr Andrushchenko
2022-02-07  7:29       ` Jan Beulich
2022-02-07 11:27         ` Oleksandr Andrushchenko
2022-02-07 12:38           ` Jan Beulich
2022-02-07 12:51             ` Oleksandr Andrushchenko
2022-02-07 12:54               ` Jan Beulich
2022-02-07 14:17                 ` Oleksandr Andrushchenko
2022-02-07 14:31                   ` Jan Beulich
2022-02-07 14:46                     ` Oleksandr Andrushchenko
2022-02-07 15:05                       ` Jan Beulich
2022-02-07 15:14                         ` Oleksandr Andrushchenko
2022-02-07 15:28                           ` Jan Beulich
2022-02-07 15:59                             ` Oleksandr Andrushchenko
2022-02-10 12:54                     ` Oleksandr Andrushchenko
2022-02-10 13:36                       ` Jan Beulich
2022-02-10 13:56                         ` Oleksandr Andrushchenko
2022-02-10 12:59                     ` Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 11/13] vpci: add initial support for virtual PCI bus topology Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 12/13] xen/arm: translate virtual PCI bus topology for guests Oleksandr Andrushchenko
2022-02-04  7:56   ` Jan Beulich
2022-02-04  8:18     ` Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 13/13] xen/arm: account IO handlers for emulated PCI MSI-X Oleksandr Andrushchenko
2022-02-11 15:28   ` Julien Grall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0f83fa71-c252-6e6c-47c9-3ef899b45901@epam.com \
    --to=oleksandr_andrushchenko@epam.com \
    --cc=Artem_Mygaiev@epam.com \
    --cc=Oleksandr_Tyshchenko@epam.com \
    --cc=Volodymyr_Babchuk@epam.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=bertrand.marquis@arm.com \
    --cc=george.dunlap@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=paul@xen.org \
    --cc=rahul.singh@arm.com \
    --cc=roger.pau@citrix.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).