xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@epam.com>
Cc: Jan Beulich <jbeulich@suse.com>,
	"julien@xen.org" <julien@xen.org>,
	"sstabellini@kernel.org" <sstabellini@kernel.org>,
	Oleksandr Tyshchenko <Oleksandr_Tyshchenko@epam.com>,
	Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>,
	Artem Mygaiev <Artem_Mygaiev@epam.com>,
	"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>,
	"george.dunlap@citrix.com" <george.dunlap@citrix.com>,
	"paul@xen.org" <paul@xen.org>,
	Bertrand Marquis <bertrand.marquis@arm.com>,
	Rahul Singh <rahul.singh@arm.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
Date: Tue, 8 Feb 2022 11:50:03 +0100	[thread overview]
Message-ID: <YgJK2zPszyTXGxMM@Air-de-Roger> (raw)
In-Reply-To: <e91965c5-0802-adf8-0c17-522f86ebf231@epam.com>

On Tue, Feb 08, 2022 at 07:35:34AM +0000, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 18:44, Oleksandr Andrushchenko wrote:
> >
> > On 07.02.22 18:37, Jan Beulich wrote:
> >> On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
> >>> On 07.02.22 18:15, Jan Beulich wrote:
> >>>> On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
> >>>>> On 07.02.22 17:26, Jan Beulich wrote:
> >>>>>> 1b. Make vpci_write use write lock for writes to command register and BARs
> >>>>>> only; keep using the read lock for all other writes.
> >>>>> I am not quite sure how to do that. Do you mean something like:
> >>>>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
> >>>>>                     uint32_t data)
> >>>>> [snip]
> >>>>>         list_for_each_entry ( r, &pdev->vpci->handlers, node )
> >>>>> {
> >>>>> [snip]
> >>>>>         if ( r->needs_write_lock)
> >>>>>             write_lock(d->vpci_lock)
> >>>>>         else
> >>>>>             read_lock(d->vpci_lock)
> >>>>> ....
> >>>>>
> >>>>> And provide rw as an argument to:
> >>>>>
> >>>>> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
> >>>>>                           vpci_write_t *write_handler, unsigned int offset,
> >>>>>                           unsigned int size, void *data, --->>> bool write_path <<<-----)
> >>>>>
> >>>>> Is this what you mean?
> >>>> This sounds overly complicated. You can derive locally in vpci_write(),
> >>>> from just its "reg" and "size" parameters, whether the lock needs taking
> >>>> in write mode.
> >>> Yes, I started writing a reply with that. So, the summary (ROM
> >>> position depends on header type):
> >>> if ( (reg == PCI_COMMAND) || (reg == ROM) )
> >>> {
> >>>        read PCI_COMMAND and see if memory or IO decoding are enabled.
> >>>        if ( enabled )
> >>>            write_lock(d->vpci_lock)
> >>>        else
> >>>            read_lock(d->vpci_lock)
> >>> }
> >> Hmm, yes, you can actually get away without using "size", since both
> >> command register and ROM BAR are 32-bit aligned registers, and 64-bit
> >> accesses get split in vpci_ecam_write().
> > But, OS may want reading a single byte of ROM BAR, so I think
> > I'll need to check if reg+size fall into PCI_COMAND and ROM BAR
> > ranges
> >> For the command register the memory- / IO-decoding-enabled check may
> >> end up a little more complicated, as the value to be written also
> >> matters. Maybe read the command register only for the ROM BAR write,
> >> using the write lock uniformly for all command register writes?
> > Sounds good for the start.
> > Another concern is that if we go with a read_lock and then in the
> > underlying code we disable memory decoding and try doing
> > something and calling cmd_write handler for any reason then....
> >
> > I mean that the check in the vpci_write is somewhat we can tolerate,
> > but then it is must be considered that no code in the read path
> > is allowed to perform write path functions. Which brings a pretty
> > valid use-case: say in read mode we detect an unrecoverable error
> > and need to remove the device:
> > vpci_process_pending -> ERROR -> vpci_remove_device or similar.
> >
> > What do we do then? It is all going to be fragile...
> I have tried to summarize the options we have wrt locking
> and would love to hear from @Roger and @Jan.
> 
> In every variant there is a task of dealing with the overlap
> detection in modify_bars, so this is the only place as of now
> which needs special treatment.
> 
> Existing limitations: there is no way to upgrade a read lock to a write
> lock, so paths which may require write lock protection need to use
> write lock from the very beginning. Workarounds can be applied.
> 
> 1. Per-domain rw lock, aka d->vpci_lock
> ==============================================================
> Note: with per-domain rw lock it is possible to do without introducing
> per-device locks, so pdev->vpci->lock can be removed and no pdev->vpci_lock
> should be required.

Er, no, I think you still need a per-device lock unless you intent to
take the per-domain rwlock in write mode every time you modify data
in vpci. I still think you need pdev->vpci->lock. It's possible this
approach doesn't require moving the lock outside of the vpci struct.

> This is only going to work in case if vpci_write always takes the write lock
> and vpci_read takes a read lock and no path in vpci_read is allowed to
> perform write path operations.

I think that's likely too strong?

You could get away with both vpci_{read,write} only taking the read
lock and use a per-device vpci lock?

Otherwise you are likely to introduce contention in msix_write if a
guest makes heavy use of the MSI-X entry mask bit.

> vpci_process_pending uses write lock as it have vpci_remove_device in its
> error path.
> 
> Pros:
> - no per-device vpci lock is needed?
> - solves overlap code ABBA in modify_bars
> 
> Cons:
> - all writes are serialized
> - need to carefully select read paths, so they are guaranteed not to lead
>    to lock upgrade use-cases
> 
> 1.1. Semi read lock upgrade in modify bars
> --------------------------------------------------------------
> In this case both vpci_read and vpci_write take a read lock and when it comes
> to modify_bars:
> 
> 1. read_unlock(d->vpci_lock)
> 2. write_lock(d->vpci_lock)
> 3. Check that pdev->vpci is still available and is the same object:
> if (pdev->vpci && (pdev->vpci == old_vpci) )
> {
>      /* vpci structure is valid and can be used. */
> }
> else
> {
>      /* vpci has gone, return an error. */
> }
> 
> Pros:
> - no per-device vpci lock is needed?
> - solves overlap code ABBA in modify_bars
> - readers and writers are NOT serialized
> - NO need to carefully select read paths, so they are guaranteed not to lead
>    to lock upgrade use-cases
> 
> Cons:
> - ???
> 
> 2. per-device lock (pdev->vpci_lock) + d->overlap_chk_lock
> ==============================================================
> In order to solve overlap ABBA, we introduce a per-domain helper
> lock to protect the overlapping code in modify_bars:
> 
>      old_vpci = pdev->vpci;
>      spin_unlock(pdev->vpci_lock);
>      spin_lock(pdev->domain->overlap_chk_lock);

Since you drop the pdev lock you get a window here where either vpci
or even pdev itself could be removed under your feet, so using
pdev->vpci_lock like you do below could dereference a stale pdev.

>      spin_lock(pdev->vpci_lock);
>      if ( pdev->vpci && (pdev->vpci == old_vpci) )
>          for_each_pdev ( pdev->domain, tmp )
>          {
>              if ( tmp != pdev )
>              {
>                  spin_lock(tmp->vpci_lock);
>                  if ( tmp->vpci )
>                      ...
>              }
>          }
> 
> Pros:
> - all accesses are independent, only the same device access is serialized
> - no need to care about readers and writers wrt read lock upgrade issues
> 
> Cons:
> - helper spin lock
> 
> 3. Move overlap detection into process pending
> ==============================================================
> There is a Roger's patch [1] which adds a possibility for vpci_process_pending
> to perform different tasks rather than just map/unmap. With this patch extended
> in a way that it can hold a request queue it is possible to delay execution
> of the overlap code until no pdev->vpci_lock is held, but before returning to
> a guest after vpci_{read|write} or similar.
> 
> Pros:
> - no need to emulate read lock upgrade
> - fully parallel read/write
> - queue in the vpci_process_pending will later on be used by SR-IOV,
>    so this is going to help the future code
> Cons:
> - ???

Maybe? It's hard to devise how that would end up looking like, and
whether it won't still require such kind of double locking. We would
still need to prevent doing a rangeset_remove_range for the device we
are trying to setup the mapping for, at which point we still need to
lock the current device plus the device we are iterating against?

Since the code in vpci_process_pending is always executed in guest
vCPU context requiring all guest vCPUs to be paused when doing a
device addition or removal would prevent devices from going away, but
we could still have issues with concurrent accesses from other vCPUs.

> 
> 4. Re-write overlap detection code
> ==============================================================
> It is possible to re-write overlap detection code, so the information about the
> mapped/unmapped regions is not read from vpci->header->bars[i] of each device,
> but instead there is a per-domain structure which holds the regions and
> implements reference counting.
> 
> Pros:
> - solves ABBA
> 
> Cons:
> - very complex code is expected
> 
> 5. You name it
> ==============================================================
> 
>  From all the above I would recommend we go with option 2 which seems to reliably
> solve ABBA and does not bring cons of the other approaches.

6. per-domain rwlock + per-device vpci lock

Introduce vpci_header_write_lock(start, {end, size}) helper: return
whether a range requires the per-domain lock in write mode. This will
only return true if the range overlaps with the BAR ROM or the command
register.

In vpci_{read,write}:

if ( vpci_header_write_lock(...) )
    /* Gain exclusive access to all of the domain pdevs vpci. */
    write_lock(d->vpci);
else
{
    read_lock(d->vpci);
    spin_lock(vpci->lock);
}
...

The vpci assign/deassign functions would need to be modified to write
lock the per-domain rwlock. The MSI-X table MMIO handler will also
need to read lock the per domain vpci lock.

I think it's either something along the lines of my suggestion above,
or maybe option 3, albeit you would have to investigate how to
implement option 3.

Thanks, Roger.


  parent reply	other threads:[~2022-02-08 10:50 UTC|newest]

Thread overview: 138+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-04  6:34 [PATCH v6 00/13] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole Oleksandr Andrushchenko
2022-02-04  8:51   ` Julien Grall
2022-02-04  9:01     ` Oleksandr Andrushchenko
2022-02-04  9:41       ` Julien Grall
2022-02-04  9:47         ` Oleksandr Andrushchenko
2022-02-04  9:57           ` Julien Grall
2022-02-04 10:35             ` Oleksandr Andrushchenko
2022-02-04 11:00               ` Julien Grall
2022-02-04 11:25                 ` Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 02/13] rangeset: add RANGESETF_no_print flag Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 03/13] vpci: move lock outside of struct vpci Oleksandr Andrushchenko
2022-02-04  7:52   ` Jan Beulich
2022-02-04  8:13     ` Oleksandr Andrushchenko
2022-02-04  8:36       ` Jan Beulich
2022-02-04  8:58     ` Oleksandr Andrushchenko
2022-02-04  9:15       ` Jan Beulich
2022-02-04 10:12         ` Oleksandr Andrushchenko
2022-02-04 10:49           ` Jan Beulich
2022-02-04 11:13             ` Roger Pau Monné
2022-02-04 11:37               ` Jan Beulich
2022-02-04 12:37                 ` Oleksandr Andrushchenko
2022-02-04 12:47                   ` Jan Beulich
2022-02-04 12:53                     ` Oleksandr Andrushchenko
2022-02-04 13:03                       ` Jan Beulich
2022-02-04 13:06                       ` Roger Pau Monné
2022-02-04 14:43                         ` Oleksandr Andrushchenko
2022-02-04 14:57                           ` Roger Pau Monné
2022-02-07 11:08                             ` Oleksandr Andrushchenko
2022-02-07 12:34                               ` Jan Beulich
2022-02-07 12:57                                 ` Oleksandr Andrushchenko
2022-02-07 13:02                                   ` Jan Beulich
2022-02-07 12:46                               ` Roger Pau Monné
2022-02-07 13:53                                 ` Oleksandr Andrushchenko
2022-02-07 14:11                                   ` Jan Beulich
2022-02-07 14:27                                     ` Roger Pau Monné
2022-02-07 14:33                                       ` Jan Beulich
2022-02-07 14:35                                       ` Oleksandr Andrushchenko
2022-02-07 15:11                                         ` Oleksandr Andrushchenko
2022-02-07 15:26                                           ` Jan Beulich
2022-02-07 16:07                                             ` Oleksandr Andrushchenko
2022-02-07 16:15                                               ` Jan Beulich
2022-02-07 16:21                                                 ` Oleksandr Andrushchenko
2022-02-07 16:37                                                   ` Jan Beulich
2022-02-07 16:44                                                     ` Oleksandr Andrushchenko
2022-02-08  7:35                                                       ` Oleksandr Andrushchenko
2022-02-08  8:57                                                         ` Jan Beulich
2022-02-08  9:03                                                           ` Oleksandr Andrushchenko
2022-02-08 10:50                                                         ` Roger Pau Monné [this message]
2022-02-08 11:13                                                           ` Oleksandr Andrushchenko
2022-02-08 13:38                                                             ` Roger Pau Monné
2022-02-08 13:52                                                               ` Oleksandr Andrushchenko
2022-02-08  8:53                                                       ` Jan Beulich
2022-02-08  9:00                                                         ` Oleksandr Andrushchenko
2022-02-08 10:11                                                     ` Roger Pau Monné
2022-02-08 10:32                                                       ` Oleksandr Andrushchenko
2022-02-07 16:08                                             ` Roger Pau Monné
2022-02-07 16:12                                               ` Jan Beulich
2022-02-07 14:28                                     ` Oleksandr Andrushchenko
2022-02-07 14:19                                   ` Roger Pau Monné
2022-02-07 14:27                                     ` Oleksandr Andrushchenko
2022-02-04 11:37               ` Oleksandr Andrushchenko
2022-02-04 12:15                 ` Roger Pau Monné
2022-02-04 10:57           ` Roger Pau Monné
2022-02-04  6:34 ` [PATCH v6 04/13] vpci: restrict unhandled read/write operations for guests Oleksandr Andrushchenko
2022-02-04 14:11   ` Jan Beulich
2022-02-04 14:24     ` Oleksandr Andrushchenko
2022-02-08  8:00       ` Oleksandr Andrushchenko
2022-02-08  9:04         ` Jan Beulich
2022-02-08  9:09           ` Oleksandr Andrushchenko
2022-02-08  9:05         ` Roger Pau Monné
2022-02-08  9:10           ` Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 05/13] vpci: add hooks for PCI device assign/de-assign Oleksandr Andrushchenko
2022-02-07 16:28   ` Jan Beulich
2022-02-08  8:32     ` Oleksandr Andrushchenko
2022-02-08  9:13       ` Jan Beulich
2022-02-08  9:27         ` Oleksandr Andrushchenko
2022-02-08  9:44           ` Jan Beulich
2022-02-08  9:55             ` Oleksandr Andrushchenko
2022-02-08 10:09               ` Jan Beulich
2022-02-08 10:22                 ` Oleksandr Andrushchenko
2022-02-08 10:29                   ` Jan Beulich
2022-02-08 10:52                     ` Oleksandr Andrushchenko
2022-02-08 11:00                       ` Jan Beulich
2022-02-08 11:25                         ` Oleksandr Andrushchenko
2022-02-10  8:21                           ` Oleksandr Andrushchenko
2022-02-10  9:22                             ` Jan Beulich
2022-02-10  9:33                               ` Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 06/13] vpci/header: implement guest BAR register handlers Oleksandr Andrushchenko
2022-02-07 17:06   ` Jan Beulich
2022-02-08  8:06     ` Oleksandr Andrushchenko
2022-02-08  9:16       ` Jan Beulich
2022-02-08  9:29         ` Roger Pau Monné
2022-02-08  9:25   ` Roger Pau Monné
2022-02-08  9:31     ` Oleksandr Andrushchenko
2022-02-08  9:48       ` Jan Beulich
2022-02-08  9:57         ` Oleksandr Andrushchenko
2022-02-08 10:15           ` Jan Beulich
2022-02-08 10:29             ` Oleksandr Andrushchenko
2022-02-08 13:58               ` Roger Pau Monné
2022-02-04  6:34 ` [PATCH v6 07/13] vpci/header: handle p2m range sets per BAR Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 08/13] vpci/header: program p2m with guest BAR view Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 09/13] vpci/header: emulate PCI_COMMAND register for guests Oleksandr Andrushchenko
2022-02-04 14:25   ` Jan Beulich
2022-02-08  8:13     ` Oleksandr Andrushchenko
2022-02-08  9:33       ` Jan Beulich
2022-02-08  9:38         ` Oleksandr Andrushchenko
2022-02-08  9:52           ` Jan Beulich
2022-02-08  9:58             ` Oleksandr Andrushchenko
2022-02-08 11:11               ` Roger Pau Monné
2022-02-08 11:29                 ` Oleksandr Andrushchenko
2022-02-08 14:09                   ` Roger Pau Monné
2022-02-08 14:13                     ` Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 10/13] vpci/header: reset the command register when adding devices Oleksandr Andrushchenko
2022-02-04 14:30   ` Jan Beulich
2022-02-04 14:37     ` Oleksandr Andrushchenko
2022-02-07  7:29       ` Jan Beulich
2022-02-07 11:27         ` Oleksandr Andrushchenko
2022-02-07 12:38           ` Jan Beulich
2022-02-07 12:51             ` Oleksandr Andrushchenko
2022-02-07 12:54               ` Jan Beulich
2022-02-07 14:17                 ` Oleksandr Andrushchenko
2022-02-07 14:31                   ` Jan Beulich
2022-02-07 14:46                     ` Oleksandr Andrushchenko
2022-02-07 15:05                       ` Jan Beulich
2022-02-07 15:14                         ` Oleksandr Andrushchenko
2022-02-07 15:28                           ` Jan Beulich
2022-02-07 15:59                             ` Oleksandr Andrushchenko
2022-02-10 12:54                     ` Oleksandr Andrushchenko
2022-02-10 13:36                       ` Jan Beulich
2022-02-10 13:56                         ` Oleksandr Andrushchenko
2022-02-10 12:59                     ` Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 11/13] vpci: add initial support for virtual PCI bus topology Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 12/13] xen/arm: translate virtual PCI bus topology for guests Oleksandr Andrushchenko
2022-02-04  7:56   ` Jan Beulich
2022-02-04  8:18     ` Oleksandr Andrushchenko
2022-02-04  6:34 ` [PATCH v6 13/13] xen/arm: account IO handlers for emulated PCI MSI-X Oleksandr Andrushchenko
2022-02-11 15:28   ` Julien Grall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YgJK2zPszyTXGxMM@Air-de-Roger \
    --to=roger.pau@citrix.com \
    --cc=Artem_Mygaiev@epam.com \
    --cc=Oleksandr_Andrushchenko@epam.com \
    --cc=Oleksandr_Tyshchenko@epam.com \
    --cc=Volodymyr_Babchuk@epam.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=bertrand.marquis@arm.com \
    --cc=george.dunlap@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=paul@xen.org \
    --cc=rahul.singh@arm.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).