From: Laurence Oberman <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
Cc: Bart Van Assche
	<bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>,
	Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in swiotlb_alloc_coherent()
Date: Mon, 13 Jun 2016 18:30:04 -0400 (EDT)	[thread overview]
Message-ID: <450384210.42057823.1465857004662.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <887623939.42004497.1465831339845.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>



----- Original Message -----
> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Yishai Hadas" <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Monday, June 13, 2016 11:22:19 AM
> Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in swiotlb_alloc_coherent()
> 
> 
> 
> ----- Original Message -----
> > From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> > Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Yishai Hadas"
> > <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Sent: Monday, June 13, 2016 10:19:57 AM
> > Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in
> > swiotlb_alloc_coherent()
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Leon Romanovsky" <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > > To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > > Cc: "Yishai Hadas" <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman"
> > > <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > Sent: Monday, June 13, 2016 10:07:47 AM
> > > Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in
> > > swiotlb_alloc_coherent()
> > > 
> > > On Sun, Jun 12, 2016 at 11:32:53PM -0700, Bart Van Assche wrote:
> > > > On 06/12/2016 03:40 PM, Laurence Oberman wrote:
> > > > >Jun  8 10:12:52 jumpclient kernel: mlx5_core 0000:08:00.1: swiotlb buffer is full (sz: 266240 bytes)
> > > > >Jun  8 10:12:52 jumpclient kernel: swiotlb: coherent allocation failed for device 0000:08:00.1 size=266240
> > > > 
> > > > Hello,
> > > > 
> > > > I think the above means that the coherent memory allocation succeeded but
> > > > that the test dev_addr + size - 1 <= DMA_BIT_MASK(32) failed. Can someone
> > > > from Mellanox tell us whether or not it would be safe to set
> > > > coherent_dma_mask to DMA_BIT_MASK(64) for the mlx4 and mlx5 drivers?
> > > 
> > > Bart and Laurence,
> > > We are actually doing it for the mlx5 driver.
> > > 
> > > static int mlx5_pci_init(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
> > > <...>
> > >         err = set_dma_caps(pdev);
> > > 
> > > static int set_dma_caps(struct pci_dev *pdev)
> > > <...>
> > >         err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
> > >         if (err) {
> > >                 dev_warn(&pdev->dev,
> > >                          "Warning: couldn't set 64-bit consistent PCI DMA mask\n");
> > >                 err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
> > >                 if (err) {
> > >                         dev_err(&pdev->dev,
> > >                                 "Can't set consistent PCI DMA mask, aborting\n");
> > >                         return err;
> > >                 }
> > >         }
> > > 
> > > static inline int pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask)
> > > {
> > >         return dma_set_coherent_mask(&dev->dev, mask);
> > > }
> > > 
> > > > 
> > > > Thanks,
> > > > 
> > > > Bart.
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> > > > in
> > > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > Hi Leon,
> > 
> > OK I see it now
> > 
> > static int set_dma_caps(struct pci_dev *pdev)
> > {
> >         int err;
> > 
> >         err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
> >         if (err) {
> > 
> > Thanks
> > Laurence
> > 
> 
> Replying to my own email.
> Leon, what is the implication of the mapping failure?
> It only happens in the reconnect path, when I am restarting controllers.
> With the messaging and stack dump masked I still see the failure, but it
> seems transparent in that all the paths come back.
> 
> [ 1595.167812] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
> [ 1595.379133] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
> [ 1595.460627] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
> [ 1598.121096] scsi host1: reconnect attempt 3 failed (-48)
> [ 1608.187869] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
> [ 1615.911705] scsi host1: reconnect attempt 4 failed (-12)
> [ 1641.446017] scsi host1: ib_srp: Got failed path rec status -110
> [ 1641.482947] scsi host1: ib_srp: Path record query failed
> [ 1641.513454] scsi host1: reconnect attempt 5 failed (-110)
> [ 1662.330883] scsi host1: ib_srp: Got failed path rec status -110
> [ 1662.361224] scsi host1: ib_srp: Path record query failed
> [ 1662.390768] scsi host1: reconnect attempt 6 failed (-110)
> [ 1683.892311] scsi host1: ib_srp: Got failed path rec status -110
> [ 1683.922653] scsi host1: ib_srp: Path record query failed
> [ 1683.952717] scsi host1: reconnect attempt 7 failed (-110)
> SM port is up
> 
> Entering MASTER state
> 
> [ 1705.254048] scsi host1:   REJ reason 0x8
> [ 1705.274869] scsi host1: reconnect attempt 8 failed (-104)
> [ 1723.264914] scsi host1:   REJ reason 0x8
> [ 1723.285193] scsi host1: reconnect attempt 9 failed (-104)
> [ 1743.658091] scsi host1:   REJ reason 0x8
> [ 1743.678562] scsi host1: reconnect attempt 10 failed (-104)
> [ 1761.911512] scsi host1:   REJ reason 0x8
> [ 1761.932006] scsi host1: reconnect attempt 11 failed (-104)
> [ 1782.209020] scsi host1: ib_srp: reconnect succeeded
> 
> 

Hi Leon

The call path looks like this.

swiotlb_alloc_coherent()
	u64 dma_mask = DMA_BIT_MASK(32);

	The mask may get overwritten here, I assume with a 64-bit mask, right?

	if (hwdev && hwdev->coherent_dma_mask)
		dma_mask = hwdev->coherent_dma_mask;	***** Is this now DMA_BIT_MASK(64)?
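
To make the mask selection above concrete, here is a minimal user-space sketch of it. DMA_BIT_MASK() is written the way I believe the kernel defines it; struct fake_device and effective_coherent_mask() are made-up names for illustration only and do not exist in the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Written to match the kernel's DMA_BIT_MASK(); n == 64 is special-cased
 * because shifting a 64-bit value by 64 is undefined in C. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for struct device: only the field used here. */
struct fake_device {
	uint64_t coherent_dma_mask;
};

/* Mirrors the selection at the top of swiotlb_alloc_coherent():
 * start from a 32-bit default and let a non-zero device mask override it. */
static uint64_t effective_coherent_mask(const struct fake_device *hwdev)
{
	uint64_t dma_mask = DMA_BIT_MASK(32);

	if (hwdev && hwdev->coherent_dma_mask)
		dma_mask = hwdev->coherent_dma_mask;

	return dma_mask;
}
```

So if set_dma_caps() succeeded with DMA_BIT_MASK(64), the mask used in the later address check would be all ones; only a NULL device or a zero coherent_dma_mask leaves the 32-bit default in place.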


We fail at this allocation and then try map_single(), but that fails too, so we see the warning:
 ret = (void *)__get_free_pages(flags, order);

It seems to be a non-critical event, in that when we are able to reconnect we do.
I am missing how we recover here; I assume the next attempt succeeds.
I will add some instrumentation to figure this out.
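
As a rough back-of-the-envelope check on the two allocation paths for this 266240-byte request, here is a small user-space sketch. It assumes 4 KiB pages, and it assumes the swiotlb constants IO_TLB_SHIFT = 11 and IO_TLB_SEGSIZE = 128; both assumptions should be checked against the kernel under test. The helper names are made up for illustration.

```c
#include <stddef.h>

/* Assumptions, not read from the running kernel: */
#define PAGE_SHIFT      12   /* 4 KiB pages */
#define IO_TLB_SHIFT    11   /* assumed swiotlb slab size: 2 KiB */
#define IO_TLB_SEGSIZE  128  /* assumed max contiguous slabs per mapping */

/* Smallest order such that (PAGE_SIZE << order) >= size, i.e. what the
 * kernel's get_order() computes for the __get_free_pages() call. */
static int order_for_size(size_t size)
{
	int order = 0;

	while (((size_t)1 << (PAGE_SHIFT + order)) < size)
		order++;
	return order;
}

/* Largest contiguous bounce-buffer mapping under the assumed constants. */
static size_t swiotlb_max_contig(void)
{
	return (size_t)IO_TLB_SEGSIZE << IO_TLB_SHIFT;
}

/* Non-zero when a single mapping of this size could fit in one segment. */
static int fits_in_swiotlb_segment(size_t size)
{
	return size <= swiotlb_max_contig();
}
```

Under those assumptions order_for_size(266240) is 7, so the page allocator has to find 512 KiB of contiguous memory, and the request is 4096 bytes larger than the 262144-byte limit of one swiotlb segment. If that holds on this kernel, the map_single() fallback could not satisfy this size at all, which would be consistent with "swiotlb buffer is full" showing up whenever the page allocation fails.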


	if (!ret) {
		/*
		 * We are either out of memory or the device can't DMA to
		 * GFP_DMA memory; fall back on map_single(), which
		 * will grab memory from the lowest available address range.
		 */
		phys_addr_t paddr = map_single(hwdev, 0, size, DMA_FROM_DEVICE);
		if (paddr == SWIOTLB_MAP_ERROR)
			goto err_warn;

		ret = phys_to_virt(paddr);
		dev_addr = phys_to_dma(hwdev, paddr);

		/* Confirm address can be DMA'd by device */
		if (dev_addr + size - 1 > dma_mask) {
			printk("hwdev DMA mask = 0x%016Lx, dev_addr = 0x%016Lx\n",
			       (unsigned long long)dma_mask,
			       (unsigned long long)dev_addr);

			/* DMA_TO_DEVICE to avoid memcpy in unmap_single */
			swiotlb_tbl_unmap_single(hwdev, paddr,
						 size, DMA_TO_DEVICE);
			goto err_warn;
		}
	}
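
The address check in the snippet above can be tried out in user space with a few sample values. This is only a sketch: exceeds_dma_mask() is a made-up helper name, and the sample addresses used with it are invented, not taken from the failing system.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Written to match the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* True when the last byte of [dev_addr, dev_addr + size) lies above the
 * mask, which is the condition that leads to "goto err_warn". */
static bool exceeds_dma_mask(uint64_t dev_addr, size_t size, uint64_t dma_mask)
{
	return dev_addr + size - 1 > dma_mask;
}
```

For example, a 266240-byte (0x41000) buffer at the invented address 0xfffc0000 ends at 0x100000fff, so exceeds_dma_mask() returns true with DMA_BIT_MASK(32) and false with DMA_BIT_MASK(64). With a 64-bit coherent mask this check should therefore not be what triggers the warning, leaving the map_single() failure as the remaining candidate.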

