linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dan Streetman <ddstreet@ieee.org>
To: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
	jgross@suse.com, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Dan Streetman <dan.streetman@canonical.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	xen-devel@lists.xenproject.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-pci@vger.kernel.org
Subject: Re: [Xen-devel] [PATCH] xen: do not re-use pirq number cached in pci device msi msg data
Date: Fri, 13 Jan 2017 15:13:54 -0500	[thread overview]
Message-ID: <CALZtONAkxjxBYees128HVAqW5=SFqQkp3e98gc4NEvAk-XafcA@mail.gmail.com> (raw)
In-Reply-To: <04f2a09f-59be-a720-bc98-4afb53171790@oracle.com>

On Fri, Jan 13, 2017 at 3:00 PM, Boris Ostrovsky
<boris.ostrovsky@oracle.com> wrote:
> On 01/13/2017 01:44 PM, Stefano Stabellini wrote:
>> On Fri, 13 Jan 2017, Dan Streetman wrote:
>>> On Wed, Jan 11, 2017 at 6:25 PM, Dan Streetman <ddstreet@ieee.org> wrote:
>>>> On Wed, Jan 11, 2017 at 1:46 PM, Stefano Stabellini
>>>> <sstabellini@kernel.org> wrote:
>>>>> On Wed, 11 Jan 2017, Dan Streetman wrote:
>>>>>> On Tue, Jan 10, 2017 at 8:25 PM, Stefano Stabellini
>>>>>> <sstabellini@kernel.org> wrote:
>>>>>>> On Tue, 10 Jan 2017, Dan Streetman wrote:
>>>>>>>> On Tue, Jan 10, 2017 at 2:03 PM, Stefano Stabellini
>>>>>>>> <sstabellini@kernel.org> wrote:
>>>>>>>>> On Tue, 10 Jan 2017, Dan Streetman wrote:
>>>>>>>>>> On Tue, Jan 10, 2017 at 10:57 AM, Dan Streetman <ddstreet@ieee.org> wrote:
>>>>>>>>>>> On Mon, Jan 9, 2017 at 2:30 PM, Stefano Stabellini
>>>>>>>>>>> <sstabellini@kernel.org> wrote:
>>>>>>>>>>>> On Mon, 9 Jan 2017, Konrad Rzeszutek Wilk wrote:
>>>>>>>>>>>>> On Mon, Jan 09, 2017 at 10:42:41AM -0500, Dan Streetman wrote:
>>>>>>>>>>>>>> On Mon, Jan 9, 2017 at 9:59 AM, Boris Ostrovsky
>>>>>>>>>>>>>> <boris.ostrovsky@oracle.com> wrote:
>>>>>>>>>>>>>>> On 01/06/2017 08:06 PM, Konrad Rzeszutek Wilk wrote:
>>>>>>>>>>>>>>>> On Thu, Jan 05, 2017 at 02:28:56PM -0500, Dan Streetman wrote:
>>>>>>>>>>>>>>>>> Do not read a pci device's msi message data to see if a pirq was
>>>>>>>>>>>>>>>>> previously configured for the device's msi/msix, as the old pirq was
>>>>>>>>>>>>>>>>> unmapped and may now be in use by another pci device.  The previous
>>>>>>>>>>>>>>>>> pirq should never be re-used; instead a new pirq should always be
>>>>>>>>>>>>>>>>> allocated from the hypervisor.
>>>>>>>>>>>>>>>> Won't this cause a starvation problem? That is we will run out of PIRQs
>>>>>>>>>>>>>>>> as we are not reusing them?
>>>>>>>>>>>>>>> Don't we free the pirq when we unmap it?
>>>>>>>>>>>>>> I think this is actually a bit worse than I initially thought.  After
>>>>>>>>>>>>>> looking a bit closer, and I think there's an asymmetry with pirq
>>>>>>>>>>>>>> allocation:
>>>>>>>>>>>>> Lets include Stefano,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you for digging in this! This has quite the deja-vu
>>>>>>>>>>>>> feeling as I believe I hit this at some point in the past and
>>>>>>>>>>>>> posted some possible ways of fixing this. But sadly I can't
>>>>>>>>>>>>> find the thread.
>>>>>>>>>>>> This issue seems to be caused by:
>>>>>>>>>>>>
>>>>>>>>>>>> commit af42b8d12f8adec6711cb824549a0edac6a4ae8f
>>>>>>>>>>>> Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>>>>>>>>>>> Date:   Wed Dec 1 14:51:44 2010 +0000
>>>>>>>>>>>>
>>>>>>>>>>>>     xen: fix MSI setup and teardown for PV on HVM guests
>>>>>>>>>>>>
>>>>>>>>>>>> which was a fix to a bug:
>>>>>>>>>>>>
>>>>>>>>>>>>     This fixes a bug in xen_hvm_setup_msi_irqs that manifests itself when
>>>>>>>>>>>>     trying to enable the same MSI for the second time: the old MSI to pirq
>>>>>>>>>>>>     mapping is still valid at this point but xen_hvm_setup_msi_irqs would
>>>>>>>>>>>>     try to assign a new pirq anyway.
>>>>>>>>>>>>     A simple way to reproduce this bug is to assign an MSI capable network
>>>>>>>>>>>>     card to a PV on HVM guest, if the user brings down the corresponding
>>>>>>>>>>>>     ethernet interface and up again, Linux would fail to enable MSIs on the
>>>>>>>>>>>>     device.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't remember any of the details. From the description of this bug,
>>>>>>>>>>>> it seems that Xen changed behavior in the past few years: before it used
>>>>>>>>>>>> to keep the pirq-MSI mapping, while today it doesn't. If I wrote "the
>>>>>>>>>>>> old MSI to pirq mapping is still valid at this point", the pirq couldn't
>>>>>>>>>>>> have been completely freed, then reassigned to somebody else the way it
>>>>>>>>>>>> is described in this email.
>>>>>>>>>>>>
>>>>>>>>>>>> I think we should indentify the changeset or Xen version that introduced
>>>>>>>>>>>> the new behavior. If it is old enough, we might be able to just revert
>>>>>>>>>>>> af42b8d12f8adec6711cb824549a0edac6a4ae8f. Otherwise we could make the
>>>>>>>>>>>> behavior conditional to the Xen version.
>>>>>>>>>>> Are PT devices the only MSI-capable devices available in a Xen guest?
>>>>>>>>>>> That's where I'm seeing this problem, with PT NVMe devices.
>>>>>>>>> They are the main ones. It is possible to have emulated virtio devices
>>>>>>>>> with emulated MSIs, for example virtio-net. Althought they are not in
>>>>>>>>> the Xen Project CI-loop, so I wouldn't be surprised if they are broken
>>>>>>>>> too.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> I can say that on the Xen guest with NVMe PT devices I'm testing on,
>>>>>>>>>>> with the patch from this thread (which essentially reverts your commit
>>>>>>>>>>> above) as well as some added debug to see the pirq numbers, cycles of
>>>>>>>>>>> 'modprobe nvme ; rmmod nvme' don't cause pirq starvation, as the
>>>>>>>>>>> hypervisor provides essentially the same pirqs for each modprobe,
>>>>>>>>>>> since they were freed by the rmmod.
>>>>>>>>> I am fine with reverting the old patch, but we need to understand what
>>>>>>>>> caused the change in behavior first. Maybe somebody else with a Xen PCI
>>>>>>>>> passthrough setup at hand can help with that.
>>>>>>>>>
>>>>>>>>> In the Xen code I can still see:
>>>>>>>>>
>>>>>>>>>     case ECS_PIRQ: {
>>>>>>>>>         struct pirq *pirq = pirq_info(d1, chn1->u.pirq.irq);
>>>>>>>>>
>>>>>>>>>         if ( !pirq )
>>>>>>>>>             break;
>>>>>>>>>         if ( !is_hvm_domain(d1) )
>>>>>>>>>             pirq_guest_unbind(d1, pirq);
>>>>>>>>>
>>>>>>>>> which means that pirq_guest_unbind should only be called on evtchn_close
>>>>>>>>> if the guest is not an HVM guest.
>>>>>>>> I tried an experiment to call get_free_pirq on both sides of a
>>>>>>>> evtchn_close hcall, using two SRIOV nics.  When I rmmod/modprobe, I
>>>>>>>> see something interesting; each nic uses 3 MSIs, and it looks like
>>>>>>>> when they shut down, each nic's 3 pirqs are not available until after
>>>>>>>> the nic is done shutting down, so it seems like closing the evtchn
>>>>>>>> isn't what makes the pirq free.
>>>>>>>>
>>>>>>>> [3697700.390188] xen:events: creating evtchn using pirq 101 irq 290
>>>>>>>> [3697700.390214] xen:events: creating evtchn using pirq 100 irq 291
>>>>>>>> [3697700.390228] xen:events: creating evtchn using pirq 99 irq 292
>>>>>>>> [3697700.392789] ixgbevf 0000:00:03.0: NIC Link is Up 10 Gbps
>>>>>>>> [3697700.406167] xen:events: creating evtchn using pirq 98 irq 293
>>>>>>>> [3697700.406222] xen:events: creating evtchn using pirq 97 irq 294
>>>>>>>> [3697700.406259] xen:events: creating evtchn using pirq 96 irq 295
>>>>>>>> [3697700.408345] ixgbevf 0000:00:04.0: NIC Link is Up 10 Gbps
>>>>>>>>
>>>>>>>> nic 3 uses pirq 99-101, while nic 4 uses pirq 96-98.
>>>>>>>>
>>>>>>>> [3697705.470151] xen:events: shutdown_pirq: xen_domain() == 1,
>>>>>>>> xen_pv_domain() == 0, xen_hvm_domain() == 1, xen_initial_domain() ==
>>>>>>>> 0, xen_pvh_domain() == 0
>>>>>>>>
>>>>>>>> just to be sure, a bit of dbg so I know what domain this is :-)
>>>>>>>>
>>>>>>>> [3697778.781463] xen:events: shutdown_pirq: get_free_pirq returned pirq 93
>>>>>>>> [3697778.781465] xen:events: shutdown_pirq: closing evtchn for pirq 96 irq 295
>>>>>>>> [3697778.781475] xen:events: shutdown_pirq: get_free_pirq returned pirq 92
>>>>>>>> [3697778.781489] xen:events: shutdown_pirq: get_free_pirq returned pirq 91
>>>>>>>> [3697778.781490] xen:events: shutdown_pirq: closing evtchn for pirq 97 irq 294
>>>>>>>> [3697778.781498] xen:events: shutdown_pirq: get_free_pirq returned pirq 90
>>>>>>>> [3697778.781508] xen:events: shutdown_pirq: get_free_pirq returned pirq 89
>>>>>>>> [3697778.781509] xen:events: shutdown_pirq: closing evtchn for pirq 98 irq 293
>>>>>>>> [3697778.781517] xen:events: shutdown_pirq: get_free_pirq returned pirq 88
>>>>>>>>
>>>>>>>> nic 4 is shutdown first, and closes its evtchns for pirqs 96-98; but
>>>>>>>> none of those become available for get_free_pirq.
>>>>>>>>
>>>>>>>> [3697779.005501] xen:events: shutdown_pirq: get_free_pirq returned pirq 98
>>>>>>>>
>>>>>>>> aha, now nic 4 has fully finished shutting down, and nic 3 has started
>>>>>>>> shutdown; we see those pirqs from nic 4 are now available.  So it must
>>>>>>>> not be evtchn closing that frees the pirqs.
>>>>>>>>
>>>>>>>> [3697779.005503] xen:events: shutdown_pirq: closing evtchn for pirq 99 irq 292
>>>>>>>> [3697779.005512] xen:events: shutdown_pirq: get_free_pirq returned pirq 97
>>>>>>>> [3697779.005524] xen:events: shutdown_pirq: get_free_pirq returned pirq 96
>>>>>>>> [3697779.005526] xen:events: shutdown_pirq: closing evtchn for pirq 100 irq 291
>>>>>>>> [3697779.005540] xen:events: shutdown_pirq: get_free_pirq returned pirq 87
>>>>>>>> [3697779.005611] xen:events: shutdown_pirq: get_free_pirq returned pirq 86
>>>>>>>> [3697779.005624] xen:events: shutdown_pirq: closing evtchn for pirq 101 irq 290
>>>>>>>> [3697779.005659] xen:events: shutdown_pirq: get_free_pirq returned pirq 85
>>>>>>>>
>>>>>>>>
>>>>>>>> so, since pci_disable_msix eventually calls xen_teardown_msi_irq()
>>>>>>>> which calls xen_destroy_irq(), i moved the dbg to xen_destroy_irq()
>>>>>>>> (and recompiled/rebooted) and found the pirqs have already been freed
>>>>>>>> before that is called:
>>>>>>>>
>>>>>>>> [3700084.714686] xen:events: shutdown_pirq: closing evtchn for pirq 98 irq 295
>>>>>>>> [3700084.714702] xen:events: shutdown_pirq: closing evtchn for pirq 99 irq 294
>>>>>>>> [3700084.714708] xen:events: shutdown_pirq: closing evtchn for pirq 100 irq 293
>>>>>>>> [3700084.775598] xen:events: xen_destroy_irq: get_free_pirq returned pirq 100
>>>>>>>> [3700084.775599] xen:events: xen_destroy_irq: pirq 100 irq 293
>>>>>>>> [3700084.775624] xen:events: xen_destroy_irq: get_free_pirq returned pirq 99
>>>>>>>> [3700084.775631] xen:events: xen_destroy_irq: get_free_pirq returned pirq 98
>>>>>>>> [3700084.775632] xen:events: xen_destroy_irq: pirq 99 irq 294
>>>>>>>> [3700084.775646] xen:events: xen_destroy_irq: get_free_pirq returned pirq 97
>>>>>>>> [3700084.775653] xen:events: xen_destroy_irq: get_free_pirq returned pirq 96
>>>>>>>> [3700084.775654] xen:events: xen_destroy_irq: pirq 98 irq 295
>>>>>>>> [3700084.775666] xen:events: xen_destroy_irq: get_free_pirq returned pirq 95
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm still following thru the kernel code, but it's not immediately
>>>>>>>> obvious what exactly is telling the hypervisor to free the pirqs; any
>>>>>>>> idea?
>>>>>>>>
>>>>>>>> >From the hypervisor code, it seems that the pirq is "available" via
>>>>>>>> is_free_pirq():
>>>>>>>>     return !pirq || (!pirq->arch.irq && (!is_hvm_domain(d) ||
>>>>>>>>         pirq->arch.hvm.emuirq == IRQ_UNBOUND));
>>>>>>>>
>>>>>>>> when the evtchn is closed, it does:
>>>>>>>>         if ( is_hvm_domain(d1) && domain_pirq_to_irq(d1, pirq->pirq) > 0 )
>>>>>>>>             unmap_domain_pirq_emuirq(d1, pirq->pirq);
>>>>>>>>
>>>>>>>> and that call to unmap_domain_pirq_emuirq does:
>>>>>>>>         info->arch.hvm.emuirq = IRQ_UNBOUND;
>>>>>>>>
>>>>>>>> so, the only thing left is to clear pirq->arch.irq,but the only place
>>>>>>>> I can find that does that is clear_domain_irq_pirq(), which is only
>>>>>>>> called from pirq_guest_unbind() and unmap_domain_pirq(), but I'm not
>>>>>>>> seeing where either of those would be called when all the kernel is
>>>>>>>> doing is disabling a pci device.
>>>>>>> Thanks for the info. I think I know what causes the pirq to be unmapped:
>>>>>>> when Linux disables msi or msix on the device, using the regular pci
>>>>>>> config space based method, QEMU (which emulates the PCI config space)
>>>>>>> tells Xen to unmap the pirq.
>>>>>> aha, via a XEN_DOMCTL_unbind_pt_irq, maybe?  Well that makes more sense then.
>>>>>>
>>>>>>> If that's the case, and if it isn't possible for xen_hvm_setup_msi_irqs
>>>>>>> to be called a second time without msis being disabled first, then I
>>>>>>> think we can revert the patch.
>>>>>> It doesn't seem possible to call it twice from a correctly-behaved
>>>>>> driver, but in case of a driver bug that does try to enable msi/msix
>>>>>> multiple times without disabling, __pci_enable_msix() only does
>>>>>> WARN_ON(!!dev->msix_enabled), and __pci_enable_msi_range() only does
>>>>>> WARN_ON(!!dev->msi_enabled); they both will continue.  Maybe that
>>>>>> should be changed to warn and also return error, to prevent
>>>>>> re-configuring msi/msix if already configured?  Or, maybe the warning
>>>>>> is enough - the worst thing that reverting the patch does is use extra
>>>>>> pirqs, right?
>>>>> I think the warning is enough.  Can you confirm that with
>>>>> af42b8d12f8adec6711cb824549a0edac6a4ae8f reverted, also
>>>>>
>>>>> ifconfig eth0 down; ifconfig eth0 up
>>>>>
>>>>> still work as expected, no warnings?
>>>> yes, with the patch that started this thread - which essentially
>>>> reverts af42b8d12f8adec6711cb824549a0edac6a4ae8f - there are no
>>>> warnings and ifconfig down ; ifconfig up work as expected.
>>>>
>>>>>
>>>>> It looks like the patch that changed hypervisor (QEMU actually) behavior
>>>>> is:
>>>>>
>>>>> commit c976437c7dba9c7444fb41df45468968aaa326ad
>>>>> Author: Zhenzhong Duan <zhenzhong.duan@oracle.com>
>>>>> Date:   Wed May 7 13:41:48 2014 +0000
>>>>>
>>>>>     qemu-xen: free all the pirqs for msi/msix when driver unload
>>>>>
>>>>> From this commit onward, QEMU started unmapping pirqs when MSIs are
>>>>> disabled. c976437c7dba9c7444fb41df45468968aaa326ad is present in 4.8,
>>>>> 4.7, 4.6, 4.5. The newest release without the commit is Xen 4.4.
>>>>>
>>>>> If we revert af42b8d12f8adec6711cb824549a0edac6a4ae8f, we fix the bug on
>>>>> all Xen versions from 4.5 onward, but we break the behavior on Xen 4.4
>>>>> and older. Given that Xen 4.4 is out of support, I think we should go
>>>>> ahead with it.  Opinions?
>>> Looks like there's no complaints; is my patch from the start of this
>>> thread ok to use, or can you craft a patch to use?  My patch's
>>> description could use updating to add some of the info/background from
>>> this discussion...
>> Hi Dan, I would like an explicit Ack from the other maintainers, Boris
>> and Juergen. Let me place them in To: to make it more obvious.
>
>
> Where is the patch? I don't think 'git revert' will work.

I just re-sent the same patch that originally started this long
thread, except with the description updated to include details on the
previous commits and a Fixes: line.  It reverts the main part of
commit af42b8d12f8adec6711cb824549a0edac6a4ae8f, but there are other
changes in that commit that either don't apply anymore and/or were
already removed/changed (like changes to xen_allocate_pirq_msi
function), or changes that don't need reverting (like adding the
XEN_PIRQ_MSI_DATA #define, which should stay).

The only part that really needed reverting was the code in
xen_hvm_setup_msi_irqs() that checked the pirq number that was cached
in the pci msi message data.

Thanks!

>
> And Konrad will need to ack it too as he is Xen-PCI maintainer.
>
> -boris
>

      parent reply	other threads:[~2017-01-13 20:14 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-05 19:28 [PATCH] xen: do not re-use pirq number cached in pci device msi msg data Dan Streetman
2017-01-07  1:06 ` Konrad Rzeszutek Wilk
2017-01-09 14:59   ` Boris Ostrovsky
2017-01-09 15:42     ` [Xen-devel] " Dan Streetman
2017-01-09 15:59       ` Konrad Rzeszutek Wilk
2017-01-09 19:30         ` Stefano Stabellini
2017-01-10 15:57           ` Dan Streetman
2017-01-10 18:41             ` Dan Streetman
2017-01-10 19:03               ` Stefano Stabellini
2017-01-10 21:32                 ` Dan Streetman
2017-01-10 23:28                   ` Boris Ostrovsky
2017-01-11  1:25                   ` Stefano Stabellini
2017-01-11 15:26                     ` Dan Streetman
2017-01-11 18:46                       ` Stefano Stabellini
2017-01-11 23:25                         ` Dan Streetman
2017-01-13 18:31                           ` Dan Streetman
2017-01-13 18:44                             ` Stefano Stabellini
2017-01-13 20:00                               ` Boris Ostrovsky
2017-01-13 20:07                                 ` Dan Streetman
2017-01-13 20:54                                   ` Stefano Stabellini
2017-01-13 21:49                                   ` Boris Ostrovsky
2017-01-13 22:30                                   ` Konrad Rzeszutek Wilk
2017-02-21 15:31                                     ` Dan Streetman
2017-02-21 15:45                                       ` Juergen Gross
2017-02-21 15:58                                         ` Boris Ostrovsky
2017-02-22 14:28                                           ` Bjorn Helgaas
2017-02-22 15:14                                             ` Boris Ostrovsky
2017-05-03 18:19                                               ` [Xen-devel] " David Woodhouse
2017-05-03 18:43                                                 ` Boris Ostrovsky
2017-05-03 22:59                                                   ` Stefano Stabellini
2017-05-03 23:06                                                     ` Greg KH
2017-05-03 23:12                                                       ` Stefano Stabellini
2017-05-03 23:19                                                         ` Greg KH
2017-01-13 20:13                                 ` Dan Streetman [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALZtONAkxjxBYees128HVAqW5=SFqQkp3e98gc4NEvAk-XafcA@mail.gmail.com' \
    --to=ddstreet@ieee.org \
    --cc=bhelgaas@google.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=dan.streetman@canonical.com \
    --cc=jgross@suse.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).