* bnx2x DMA mapping errors cause iscsi problems
@ 2014-04-23  8:07 Patrick Vranckx
  2014-04-23 11:59 ` Jan Beulich
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Patrick Vranckx @ 2014-04-23  8:07 UTC
  To: xen-devel

Hi,

We are running open source Xen 4.1.4 on Debian 7.4 amd64.
The hardware is an HP BL460c Gen8. The NIC is a Broadcom Corporation
NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10).

We are experiencing sporadic network blackouts after a few days on eth4,
which is used for iSCSI block storage for the VMs. The VMs' file systems
switch to read-only, so we lose all the VMs. We then have to
reboot the hypervisor to regain network connectivity.

MTU = 9000 on eth4.

We were using the Broadcom kernel driver from the official Debian 7.4
kernel (3.2.54-2). Now that we have updated to the latest driver
published on the Broadcom website, we get some more logging:

[1200406.207855] [bnx2x_alloc_rx_data:1009(eth4)]Can't map rx data
[1200406.207978] [bnx2x_alloc_rx_data:1009(eth4)]Can't map rx data
.....

Here are the bnx2x module versions we tried:
Debian 7.4 stock kernel: 1.70.30
Broadcom website (latest): 1.78.58
Broadcom firmware: latest from the HP BROADCOM 2.9.26 CP021537 package

Looking at the bnx2x source code (bnx2x_cmn.c), it appears that this error
is caused by a DMA mapping failure for rx buffers (memory leak?):

static int bnx2x_alloc_rx_data(struct bnx2x *bp, struct bnx2x_fastpath *fp,
			       u16 index, gfp_t gfp_mask)
{
	....
	mapping = dma_map_single(&bp->pdev->dev, data + NET_SKB_PAD,
				 fp->rx_buf_size,
				 DMA_FROM_DEVICE);

	if (unlikely(dma_mapping_error(&bp->pdev->dev, mapping))) {
#ifdef BCM_HAS_BUILD_SKB /* BNX2X_UPSTREAM */
		bnx2x_frag_free(fp, data);
#else
		dev_kfree_skb_any(data);
#endif
		BNX2X_ERR("Can't map rx data\n");
		return -ENOMEM;
	}
	...

We found several other reports from people suffering from the same problem.
Here are two threads concerning Citrix XenServer 6 that show exactly the
same problem on BL460c G6 and Gen8:
http://discussions.citrix.com/topic/324343-xenserver-61-bnx2x-sw-iommu/
http://discussions.citrix.com/topic/333281-xenserver-62-crash-bug/page-3

It seems from other reports that, most of the time, similar problems
occurring with this driver are related to virtualized environments.

We found a rather old workaround from VMware: reduce the number of
queues used by the driver (num_queues parameter).
Unfortunately, the problem still occurs, only after a longer period.

There are threads on this mailing list related to DMA allocation in Xen
( http://markmail.org/message/uududlw5w6xlqcp2 ) but I cannot tell
whether those threads are related to our problem.

Thanks for your help,

Patrick


* Re: bnx2x DMA mapping errors cause iscsi problems
  2014-04-23  8:07 bnx2x DMA mapping errors cause iscsi problems Patrick Vranckx
@ 2014-04-23 11:59 ` Jan Beulich
  2014-04-23 12:10 ` Malcolm Crossley
  2014-04-24 13:55 ` Patrick Vranckx
  2 siblings, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2014-04-23 11:59 UTC
  To: Patrick Vranckx; +Cc: xen-devel

>>> On 23.04.14 at 10:07, <Patrick.Vranckx@uclouvain.be> wrote:
> Looking at bnx2x source code (bnx2x_cmn.c), it appears this error is 
> caused by a DMA mapping error for rx buffers (memory leak ?)
> 
> static int bnx2x_alloc_rx_data(struct bnx2x *bp, struct bnx2x_fastpath *fp,
> u16 index, gfp_t gfp_mask)
> {
> ....
> mapping = dma_map_single(&bp->pdev->dev, data + NET_SKB_PAD,
> fp->rx_buf_size,
> DMA_FROM_DEVICE);

With that, did you try increasing the SWIOTLB size ("swiotlb=" on the
kernel command line)?
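
As a rough sizing aid for the "swiotlb=" suggestion: in mainline Linux the
value is a number of I/O TLB slabs of 2 KiB each, and the default bounce
buffer is 64 MiB. The sketch below only does that arithmetic. The helper
name is made up for illustration, and vendor or Xen-patched kernels may
interpret the value differently, so check the documentation of the kernel
actually in use.

```sh
# Hypothetical helper (not part of any tool): convert a desired bounce
# buffer size in MiB into a slab count, assuming 2 KiB slabs
# (IO_TLB_SHIFT = 11 in mainline Linux).
swiotlb_slabs_for_mib() {
    echo $(( $1 * 1024 * 1024 / 2048 ))
}

# Example: a 128 MiB buffer, i.e. twice the mainline default of 64 MiB.
echo "swiotlb=$(swiotlb_slabs_for_mib 128)"
```

The printed option would then be appended to the dom0 kernel command line
in the boot loader configuration.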

Jan


* Re: bnx2x DMA mapping errors cause iscsi problems
  2014-04-23  8:07 bnx2x DMA mapping errors cause iscsi problems Patrick Vranckx
  2014-04-23 11:59 ` Jan Beulich
@ 2014-04-23 12:10 ` Malcolm Crossley
  2014-04-24 13:55 ` Patrick Vranckx
  2 siblings, 0 replies; 4+ messages in thread
From: Malcolm Crossley @ 2014-04-23 12:10 UTC
  To: xen-devel

Hi Patrick,

Sorry, this email won't resolve your issue right away, but I want to point
out that a design I have proposed should help resolve this problem in the
long run.

On 23/04/14 09:07, Patrick Vranckx wrote:
> Hi,
> 
> We are running open source Xen 4.1.4 on Debian 7.4 amd64
> HW is HP BL460c Gen8. Nic is Broadcom Corporation NetXtreme II BCM57810
> 10 Gigabit Ethernet (rev 10)
> 
> We are experiencing sporadic network blackouts after a few days on eth4,
> which is used for iSCSI block storage for the VMs. The VMs' file systems
> switch to read-only, so we lose all the VMs. We then have to
> reboot the hypervisor to regain network connectivity.
> 
> MTU = 9000 on eth4.
> 
> We were using the Broadcom kernel driver from the official Debian 7.4
> kernel (3.2.54-2). Now that we have updated to the latest driver
> published on the Broadcom website, we get some more logging:
> 
> [1200406.207855] [bnx2x_alloc_rx_data:1009(eth4)]Can't map rx data
> [1200406.207978] [bnx2x_alloc_rx_data:1009(eth4)]Can't map rx data
> .....
>

This is exactly the issue that the linked design is trying to address:

http://lists.xen.org/archives/html/xen-devel/2014-04/msg01632.html

> Here are bnx2x module versions we tried :
> Debian 7.4 stock kernel : 1.70.30
> Broadcom website (latest) : 1.78.58
> Broadcom Firmware : Latest from HP BROADCOM 2.9.26 CP021537 package
> 
> Looking at bnx2x source code (bnx2x_cmn.c), it appears this error is
> caused by a DMA mapping error for rx buffers (memory leak ?)
> 
> static int bnx2x_alloc_rx_data(struct bnx2x *bp, struct bnx2x_fastpath *fp,
> u16 index, gfp_t gfp_mask)
> {
> ....
> mapping = dma_map_single(&bp->pdev->dev, data + NET_SKB_PAD,
> fp->rx_buf_size,
> DMA_FROM_DEVICE);
> 
> if (unlikely(dma_mapping_error(&bp->pdev->dev, mapping))) {
> 
> #ifdef BCM_HAS_BUILD_SKB /* BNX2X_UPSTREAM */
> bnx2x_frag_free(fp, data);
> #else
> dev_kfree_skb_any(data);
> #endif
> BNX2X_ERR("Can't map rx data\n");
> return -ENOMEM;
> }
> ...
> 
> We found several other references of people suffering from the same
> problem.
> Here are two threads concerning Citrix XenServer 6 showing the exact
> same problem on BL460C G6 and Gen8
> http://discussions.citrix.com/topic/324343-xenserver-61-bnx2x-sw-iommu/
> http://discussions.citrix.com/topic/333281-xenserver-62-crash-bug/page-3
> 
> It seems from other reports that, most of the time, similar problems
> occurring with this driver are related to virtualized environments.
> 
> We found a rather old workaround from VMWare. The solution is to reduce
> the number of queues used by the driver (num_queues parameter).
> Unfortunately, the problem still occurs but after a longer period.

To work around this issue, you can try reducing the number of queues,
increasing the swiotlb size (as Jan suggested), and adding
"disable_tpa=1" to the bnx2x module parameters. These settings may
reduce network performance.
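
A minimal sketch of how the two module parameters named in this thread
could be made persistent. It assumes the usual modprobe convention of an
"options" line in a file under /etc/modprobe.d/ (the file name mentioned
below is an assumption), and the helper name is made up for illustration.
The swiotlb size is a kernel command line option and is not covered here.

```sh
# Hypothetical helper: print a modprobe "options" line for bnx2x.
#   $1 = value for num_queues, $2 = value for disable_tpa
bnx2x_options_line() {
    printf 'options bnx2x num_queues=%s disable_tpa=%s\n' "$1" "$2"
}

# Print the line for 4 queues with TPA disabled.
bnx2x_options_line 4 1
```

Redirecting that output to a file such as /etc/modprobe.d/bnx2x.conf and
reloading the module (or rebooting) would apply it. If the driver is loaded
from the initramfs, the initramfs may need to be regenerated as well.
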
> 
> There are threads in this mailing list related to DMA allocation in Xen
> ( http://markmail.org/message/uududlw5w6xlqcp2 ) but I'm not able to
> understand if those threads are related to our problem.
> 
> Thanks for your help,
> 
> Patrick
> 

Malcolm


* Re: bnx2x DMA mapping errors cause iscsi problems
  2014-04-23  8:07 bnx2x DMA mapping errors cause iscsi problems Patrick Vranckx
  2014-04-23 11:59 ` Jan Beulich
  2014-04-23 12:10 ` Malcolm Crossley
@ 2014-04-24 13:55 ` Patrick Vranckx
  2 siblings, 0 replies; 4+ messages in thread
From: Patrick Vranckx @ 2014-04-24 13:55 UTC
  To: xen-devel

Hi Malcolm,

Thank you for your answer!

We have already tried tuning several parameters in order to find a
workaround for our problem:

- swiotlb size:

We didn't increase the swiotlb size. We found a similar case on the
Citrix forums (
http://discussions.citrix.com/topic/324343-xenserver-61-bnx2x-sw-iommu/
) where using swiotlb=256 did not help, so we didn't try it ourselves.
Unfortunately, that thread mentions no solution, only a patch for the
bnx2x driver (Driver Disk for Broadcom bnx2x driver v1.74.22 for
XenServer 6.1.0 with Hotfix XS61E018), and I still have to verify
whether it is related to our problem.
I should mention that we see no "Out of SW-IOMMU space" error messages,
but this may be due to the verbosity settings of the driver or the kernel.

- disable_tpa=1

This should already be the case, since LRO is disabled (correct?). Here
is the output of ethtool:

root@xen2-pyth:~# ethtool -k eth4
Features for eth4:
rx-checksumming: on
tx-checksumming: on
         tx-checksum-ipv4: on
         tx-checksum-unneeded: off [fixed]
         tx-checksum-ip-generic: off [fixed]
         tx-checksum-ipv6: on
         tx-checksum-fcoe-crc: off [fixed]
         tx-checksum-sctp: off [fixed]
scatter-gather: on
         tx-scatter-gather: on
         tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
         tx-tcp-segmentation: on
         tx-tcp-ecn-segmentation: on
         tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on [fixed]
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off

- reducing the queues.

We reduced the number of queues to 4 (the default was 11). When the
problem happened this week, we changed the parameter again, dynamically,
to num_queues=1. We were then able to carry on without rebooting the
hypervisor. No more 'Can't map rx data' messages so far... but for how
long? Could setting the number of queues as low as 1 have a long-term
effect?
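
To compare hosts or re-check settings after a driver reload, the offload
states in the ethtool listing above can be extracted mechanically. This
sketch reads "ethtool -k" output on standard input; the function name is
made up for illustration. It only reports what ethtool prints and does not
settle whether "large-receive-offload: off" is equivalent to disable_tpa=1.

```sh
# Hypothetical helper: print the state ("on" or "off") of one feature
# from "ethtool -k" output supplied on stdin.
#   $1 = feature name exactly as ethtool prints it
offload_state() {
    awk -F': ' -v feature="$1" '
        {
            name = $1
            sub(/^[ \t]+/, "", name)      # sub-features are indented
            if (!found && name == feature) {
                split($2, words, " ")     # drop a trailing "[fixed]"
                print words[1]
                found = 1
            }
        }'
}

# Self-contained demonstration using two lines from the listing above.
printf '%s\n' 'generic-receive-offload: on' 'large-receive-offload: off' |
    offload_state large-receive-offload
```

On the host itself this would be used as
"ethtool -k eth4 | offload_state large-receive-offload".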

I've read the draft you wrote to solve the problem. As far as I
understand it (this is very complex for me), the issue it describes could
be the root cause of our problem. But how can we monitor the relevant
parameters (DMA, SW-IOMMU space, ...) when the problem occurs, in order
to validate this assumption?
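
One low-effort way to watch for the symptom is to count the two kernel log
messages mentioned in this thread. The sketch below is an assumption about
how that could be scripted, with a made-up function name. It counts
symptoms only and does not measure how full the SW-IOMMU bounce buffer is.

```sh
# Hypothetical helper: count kernel log lines on stdin that contain
# either of the two messages mentioned in this thread.
count_dma_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches, so the
    # exit status is normalized with "|| true".
    grep -c -e "Can't map rx data" -e "Out of SW-IOMMU space" || true
}

# Self-contained demonstration with the two log lines quoted earlier.
printf '%s\n' \
    "[1200406.207855] [bnx2x_alloc_rx_data:1009(eth4)]Can't map rx data" \
    "[1200406.207978] [bnx2x_alloc_rx_data:1009(eth4)]Can't map rx data" |
    count_dma_errors
```

Running "dmesg | count_dma_errors" periodically, for example from cron,
would show whether the count keeps growing after a parameter change.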

BTW, what is the time frame for implementing the solution proposed in
your draft? We run Xen 4.1.4: are there improvements related to this
problem in newer versions?

Regards,

Patrick

