From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Jan Beulich" Subject: Re: [PATCH v2 2/4] x86/MSI-X: access MSI-X table only after having enabled MSI-X Date: Mon, 13 Apr 2015 10:05:14 +0100 Message-ID: <552BA2EA02000078000715D3@mail.emea.novell.com> References: <5512F1B3020000780006D924@mail.emea.novell.com> <5512F2E5020000780006D945@mail.emea.novell.com> <20150410200218.GA1814@l.oracle.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=__Part4A7E39DA.1__=" Return-path: Received: from mail6.bemta5.messagelabs.com ([195.245.231.135]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1YhaIy-0005yr-PV for xen-devel@lists.xenproject.org; Mon, 13 Apr 2015 09:05:20 +0000 In-Reply-To: <20150410200218.GA1814@l.oracle.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Konrad Rzeszutek Wilk Cc: Andrew Cooper , Keir Fraser , xen-devel List-Id: xen-devel@lists.xenproject.org This is a MIME message. If you are reading this text, you may want to consider changing to a mail reader or gateway that understands how to properly handle MIME multipart messages. --=__Part4A7E39DA.1__= Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Content-Disposition: inline >>> On 10.04.15 at 22:02, wrote: > On Wed, Mar 25, 2015 at 04:39:49PM +0000, Jan Beulich wrote: >> As done in Linux by f598282f51 ("PCI: Fix the NIU MSI-X problem in a >> better way") and its broken predecessor, make sure we don't access the >> MSI-X table without having enabled MSI-X first, using the mask-all flag >> instead to prevent interrupts from occurring. >=20 > This causes an regression with an Linux guest that has the XSA120 + = XSA120 > addendum with PV guests (hadn't tried yet HVM). You mentioning XSA-120 and its addendum - are these requirements for the problem to be seen? I admit I may have tested a PV guest only with an SR-IOV VF (and only a HVM guest also with an "ordinary" device), but I'd like to be clear about the validity of the connection. > When PV guest requests an MSI-X, pciback gets: >=20 > [ 122.778654] xen-pciback: 0000:0a:00.0: enable MSI-X > [ 122.778861] pciback 0000:0a:00.0: xen map irq failed -6 for 1 domain > [ 122.778887] xen_pciback: 0000:0a:00.0: error enabling MSI-X for guest = 1: err -6! Yeah, there were a few adjustments necessary to get similar issues under control for HVM guests - maybe there's a path left not covered by what's done for HVM guests. > The device has the PCI_COMMAND enabled correctly: >=20 > # lspci -s 0a:0.0 -vvv | head > 0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network=20 > Connection (rev 01) > Subsystem: Super Micro Computer Inc Device 10c9 > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- = ParErr-=20 > Stepping- SERR- FastB2B- DisINTx- >=20 > Attaching the 'xl dmesg', 'dmesg', and 'lspci' Right - MSI-X is still disabled if the lspci really matches the state at the time the failure occurred. I'm attaching the original patch with some debugging code left in, which I suppose would show where the problematic path is in case you can get to it before I would. Jan --=__Part4A7E39DA.1__= Content-Type: text/plain; name="x86-MSI-X-enable.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="x86-MSI-X-enable.patch" =0ATODO: drop //temp-s=0A=0AAs done in Linux by f598282f51 ("PCI: Fix the = NIU MSI-X problem in a=0Abetter way") and its broken predecessor, make = sure we don't access the=0AMSI-X table without having enabled MSI-X first, = using the mask-all flag=0Ainstead to prevent interrupts from occurring.=0A= =0A--- unstable.orig/xen/arch/x86/msi.c 2015-02-10 08:19:17.000000000 = +0100=0A+++ unstable/xen/arch/x86/msi.c 2015-02-10 09:05:12.000000000 = +0100=0A@@ -142,6 +142,23 @@ static bool_t memory_decoded(const struc=0A = PCI_COMMAND_MEMORY);=0A }=0A =0A+static bool_t msix_memory_deco= ded(const struct pci_dev *dev, unsigned int pos)=0A+{=0A+ u16 control = =3D pci_conf_read16(dev->seg, dev->bus,=0A+ = PCI_SLOT(dev->devfn),=0A+ PCI_FUNC(dev->= devfn),=0A+ msix_control_reg(pos));=0A+=0A= + if ( !(control & PCI_MSIX_FLAGS_ENABLE) )=0A+{//temp=0A+ static = bool_t warned;=0A+ WARN_ON(!test_and_set_bool(warned));=0A+ return = 0;=0A+}=0A+=0A+ return memory_decoded(dev);=0A+}=0A+=0A /*=0A * MSI = message composition=0A */=0A@@ -219,7 +236,8 @@ static bool_t read_msi_msg= (struct msi_de=0A void __iomem *base;=0A base =3D = entry->mask_base;=0A =0A- if ( unlikely(!memory_decoded(entry->dev))= )=0A+ if ( unlikely(!msix_memory_decoded(entry->dev,=0A+ = entry->msi_attrib.pos)) )=0A = return 0;=0A msg->address_lo =3D readl(base + PCI_MSIX_ENTRY_LOWER_= ADDR_OFFSET);=0A msg->address_hi =3D readl(base + PCI_MSIX_ENTRY_UP= PER_ADDR_OFFSET);=0A@@ -285,7 +303,8 @@ static int write_msi_msg(struct = msi_desc=0A void __iomem *base;=0A base =3D entry->mask_bas= e;=0A =0A- if ( unlikely(!memory_decoded(entry->dev)) )=0A+ = if ( unlikely(!msix_memory_decoded(entry->dev,=0A+ = entry->msi_attrib.pos)) )=0A return = -ENXIO;=0A writel(msg->address_lo,=0A base + = PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);=0A@@ -379,7 +398,7 @@ static bool_t = msi_set_mask_bit(struct ir=0A {=0A struct msi_desc *entry =3D = desc->msi_desc;=0A struct pci_dev *pdev;=0A- u16 seg;=0A+ u16 = seg, control;=0A u8 bus, slot, func;=0A =0A ASSERT(spin_is_locked(&= desc->lock));=0A@@ -401,35 +420,38 @@ static bool_t msi_set_mask_bit(struct= ir=0A }=0A break;=0A case PCI_CAP_ID_MSIX:=0A+ = control =3D pci_conf_read16(seg, bus, slot, func,=0A+ = msix_control_reg(entry->msi_attrib.pos));=0A+ if ( = unlikely(!(control & PCI_MSIX_FLAGS_ENABLE)) )=0A+ pci_conf_writ= e16(seg, bus, slot, func,=0A+ msix_control_reg(= entry->msi_attrib.pos),=0A+ control | = (PCI_MSIX_FLAGS_ENABLE |=0A+ = PCI_MSIX_FLAGS_MASKALL));=0A if ( likely(memory_decoded(pdev)) = )=0A {=0A writel(flag, entry->mask_base + PCI_MSIX_ENTR= Y_VECTOR_CTRL_OFFSET);=0A readl(entry->mask_base + PCI_MSIX_ENT= RY_VECTOR_CTRL_OFFSET);=0A- break;=0A+ if ( = likely(control & PCI_MSIX_FLAGS_ENABLE) )=0A+ break;=0A+ = flag =3D 1;=0A }=0A- if ( flag )=0A+ else if = ( flag && !(control & PCI_MSIX_FLAGS_MASKALL) )=0A {=0A- = u16 control;=0A domid_t domid =3D pdev->domain->domain_id;=0A = =0A- control =3D pci_conf_read16(seg, bus, slot, func,=0A- = msix_control_reg(entry->msi_attrib.pos));= =0A- if ( control & PCI_MSIX_FLAGS_MASKALL )=0A- = break;=0A- pci_conf_write16(seg, bus, slot, func,=0A- = msix_control_reg(entry->msi_attrib.pos),=0A- = control | PCI_MSIX_FLAGS_MASKALL);=0A+ control = |=3D PCI_MSIX_FLAGS_MASKALL;=0A if ( pdev->msix->warned !=3D = domid )=0A {=0A pdev->msix->warned =3D = domid;=0A printk(XENLOG_G_WARNING=0A- = "cannot mask IRQ %d: masked MSI-X on Dom%d's %04x:%02x:%02x.%u\n",=0A+ = "cannot mask IRQ %d: masking MSI-X on Dom%d's %04x:%02x:= %02x.%u\n",=0A desc->irq, domid, pdev->seg, = pdev->bus,=0A PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->= devfn));=0A }=0A- break;=0A }=0A- /* = fall through */=0A+ pci_conf_write16(seg, bus, slot, func,=0A+ = msix_control_reg(entry->msi_attrib.pos), control);=0A+ = return flag;=0A default:=0A return 0;=0A }=0A@@ = -454,7 +476,8 @@ static int msi_get_mask_bit(const struct=0A = entry->msi.mpos) >>=0A entry->msi_attrib.= entry_nr) & 1;=0A case PCI_CAP_ID_MSIX:=0A- if ( unlikely(!memor= y_decoded(entry->dev)) )=0A+ if ( unlikely(!msix_memory_decoded(entr= y->dev,=0A+ entry->msi_attrib.pos= )) )=0A break;=0A return readl(entry->mask_base + = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET) & 1;=0A }=0A@@ -775,16 +798,32 @@ = static int msix_capability_init(struct p=0A =0A pos =3D pci_find_cap_of= fset(seg, bus, slot, func, PCI_CAP_ID_MSIX);=0A control =3D pci_conf_re= ad16(seg, bus, slot, func, msix_control_reg(pos));=0A- msix_set_enable(d= ev, 0);/* Ensure msix is disabled as I set it up */=0A+ /*=0A+ * = Ensure MSI-X interrupts are masked during setup. Some devices require=0A+ = * MSI-X to be enabled before we can touch the MSI-X registers. We = need=0A+ * to mask all the vectors to prevent interrupts coming in = before they're=0A+ * fully set up.=0A+ */=0A+ pci_conf_write16(s= eg, bus, slot, func, msix_control_reg(pos),=0A+ = control | (PCI_MSIX_FLAGS_ENABLE |=0A+ = PCI_MSIX_FLAGS_MASKALL));=0A =0A if ( unlikely(!memory_decoded(dev)) = )=0A+ {=0A+ pci_conf_write16(seg, bus, slot, func, msix_control_r= eg(pos),=0A+ control & ~PCI_MSIX_FLAGS_ENABLE);=0A = return -ENXIO;=0A+ }=0A =0A if ( desc )=0A {=0A = entry =3D alloc_msi_entry(1);=0A if ( !entry )=0A+ {=0A+ = pci_conf_write16(seg, bus, slot, func, msix_control_reg(pos),=0A+ = control & ~PCI_MSIX_FLAGS_ENABLE);=0A = return -ENOMEM;=0A+ }=0A ASSERT(msi);=0A }=0A =0A@@ = -815,6 +854,8 @@ static int msix_capability_init(struct p=0A {=0A = if ( !msi || !msi->table_base )=0A {=0A+ pci_conf_wri= te16(seg, bus, slot, func, msix_control_reg(pos),=0A+ = control & ~PCI_MSIX_FLAGS_ENABLE);=0A xfree(entry);=0A = return -ENXIO;=0A }=0A@@ -857,6 +898,8 @@ static int = msix_capability_init(struct p=0A =0A if ( idx < 0 )=0A = {=0A+ pci_conf_write16(seg, bus, slot, func, msix_control_reg(po= s),=0A+ control & ~PCI_MSIX_FLAGS_ENABLE);=0A = xfree(entry);=0A return idx;=0A }=0A@@ = -912,8 +955,7 @@ static int msix_capability_init(struct p=0A ++msix->us= ed_entries;=0A =0A /* Restore MSI-X enabled bits */=0A- pci_conf_wri= te16(seg, bus, slot, func, msix_control_reg(pos),=0A- = control & ~PCI_MSIX_FLAGS_MASKALL);=0A+ pci_conf_write16(seg, bus, = slot, func, msix_control_reg(pos), control);=0A =0A return 0;=0A = }=0A@@ -1062,7 +1104,10 @@ static void __pci_disable_msix(struct ms=0A =0A = pos =3D pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSIX);=0A = control =3D pci_conf_read16(seg, bus, slot, func, msix_control_reg(pos)= );=0A- msix_set_enable(dev, 0);=0A+ if ( unlikely(!(control & = PCI_MSIX_FLAGS_ENABLE)) )=0A+ pci_conf_write16(seg, bus, slot, = func, msix_control_reg(pos),=0A+ control | = (PCI_MSIX_FLAGS_ENABLE |=0A+ PCI_MSIX_FL= AGS_MASKALL));=0A =0A BUG_ON(list_empty(&dev->msi_list));=0A =0A@@ = -1188,6 +1234,8 @@ int pci_restore_msi_state(struct pci_dev=0A = list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list )=0A {=0A = unsigned int i =3D 0, nr =3D 1;=0A+ u16 control =3D 0;=0A+ = u8 slot =3D PCI_SLOT(pdev->devfn), func =3D PCI_FUNC(pdev->devfn);=0A = =0A irq =3D entry->irq;=0A desc =3D &irq_desc[irq];=0A@@ = -1214,10 +1262,18 @@ int pci_restore_msi_state(struct pci_dev=0A = }=0A else if ( entry->msi_attrib.type =3D=3D PCI_CAP_ID_MSIX )=0A = {=0A- msix_set_enable(pdev, 0);=0A+ control = =3D pci_conf_read16(pdev->seg, pdev->bus, slot, func,=0A+ = msix_control_reg(entry->msi_attrib.pos));=0A+ = pci_conf_write16(pdev->seg, pdev->bus, slot, func,=0A+ = msix_control_reg(entry->msi_attrib.pos),=0A+ = control | (PCI_MSIX_FLAGS_ENABLE |=0A+ = PCI_MSIX_FLAGS_MASKALL));=0A if ( unlikely(!memory_d= ecoded(pdev)) )=0A {=0A spin_unlock_irqrestore(= &desc->lock, flags);=0A+ pci_conf_write16(pdev->seg, = pdev->bus, slot, func,=0A+ msix_control_reg= (entry->msi_attrib.pos),=0A+ control & = ~PCI_MSIX_FLAGS_ENABLE);=0A return -ENXIO;=0A = }=0A }=0A@@ -1246,11 +1302,9 @@ int pci_restore_msi_state(struct = pci_dev=0A if ( entry->msi_attrib.type =3D=3D PCI_CAP_ID_MSI )=0A = {=0A unsigned int cpos =3D msi_control_reg(entry->msi_at= trib.pos);=0A- u16 control =3D pci_conf_read16(pdev->seg, = pdev->bus,=0A- PCI_SLOT(pdev->devf= n),=0A- PCI_FUNC(pdev->devfn), = cpos);=0A =0A- control &=3D ~PCI_MSI_FLAGS_QSIZE;=0A+ = control =3D pci_conf_read16(pdev->seg, pdev->bus, slot, func, cpos) &=0A+ = ~PCI_MSI_FLAGS_QSIZE;=0A multi_msi_enable(= control, entry->msi.nvec);=0A pci_conf_write16(pdev->seg, = pdev->bus, PCI_SLOT(pdev->devfn),=0A = PCI_FUNC(pdev->devfn), cpos, control);=0A@@ -1258,7 +1312,9 @@ int = pci_restore_msi_state(struct pci_dev=0A msi_set_enable(pdev, = 1);=0A }=0A else if ( entry->msi_attrib.type =3D=3D = PCI_CAP_ID_MSIX )=0A- msix_set_enable(pdev, 1);=0A+ = pci_conf_write16(pdev->seg, pdev->bus, slot, func,=0A+ = msix_control_reg(entry->msi_attrib.pos),=0A+ = control | PCI_MSIX_FLAGS_ENABLE);=0A }=0A =0A return 0;=0A --=__Part4A7E39DA.1__= Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel --=__Part4A7E39DA.1__=--