* decent performance drop for SCSI LLD / SAN initiator when iommu is turned on
@ 2013-05-01 23:11 Or Gerlitz
       [not found] ` <CAJZOPZJ8eF-Q+WFzA-_vvzkpSb41PQjKFo27_Wi3McUccOqs9A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Or Gerlitz @ 2013-05-01 23:11 UTC (permalink / raw)
  To: Roland Dreier, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Yan Burman, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Michael S. Tsirkin

Hi Roland, IOMMU folks,

So we've noticed that when we configure the kernel and boot with the Intel
IOMMU turned on on a physical node (not a VM, and without SR-IOV being
enabled by the HW device driver), raw performance of the iSER (iSCSI RDMA)
SAN initiator drops notably. For example, in the testbed we looked at today
we got ~260K 1KB random IOPS and 5.5 GB/s of bandwidth for 128KB IOs against
a single LUN with the IOMMU turned off, and ~150K IOPS and 4 GB/s with the
IOMMU turned on. Nothing changed on the target node between runs.

This was done on kernel 3.5.x; I will re-run tomorrow with the latest
upstream and send the top perf hits for both cases, but does this ring
any bells? Some extra latency makes sense, but I didn't expect the IOPS
and BW drop to be so notable. Is it possible that at these rates the
IOMMU is thrashed so heavily that it can't sustain the performance we
get without it?

We're a SCSI LLD that is called through the queuecommand API by the
SCSI midlayer. We get a scatter-gather list pointing to a set of pages,
issue dma_map_sg on the SG list, and then register this set of DMA
addresses with the HCA, which produces a token (rkey) that we send to
the target node with the request. The target node does RDMA using that
token without the initiator's CPU being involved (the two HCAs talk
directly). Later the target sends a response, which means the
transaction is done; once we get the response we undo the registration
at the HCA and call dma_unmap_sg.
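
Roughly, the per-command flow looks like this. This is a simplified
sketch only: dma_dev, hca, iser_register_rkey()/iser_unregister_rkey()
and lld_send_request() are illustrative placeholders for our actual
device handles and HCA registration/send path, and error handling is
trimmed.

#include <linux/dma-mapping.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_host.h>

/* placeholders for illustration only, not our real symbols */
struct iser_hca;
extern struct device *dma_dev;
extern struct iser_hca *hca;
extern u32 iser_register_rkey(struct iser_hca *hca, struct scatterlist *sg, int nents);
extern void iser_unregister_rkey(struct iser_hca *hca, u32 rkey);
extern int lld_send_request(struct scsi_cmnd *sc, u32 rkey);

static int lld_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc)
{
	struct scatterlist *sg = scsi_sglist(sc);
	int nents = scsi_sg_count(sc);
	int mapped;
	u32 rkey;

	/* map the SG list for the HCA -- this is where the IOMMU cost shows up */
	mapped = dma_map_sg(dma_dev, sg, nents, sc->sc_data_direction);
	if (!mapped)
		return SCSI_MLQUEUE_HOST_BUSY;

	/* register the DMA addresses with the HCA, producing the rkey */
	rkey = iser_register_rkey(hca, sg, mapped);

	/* send the request + rkey; the target then RDMAs the data directly */
	return lld_send_request(sc, rkey);
}

static void lld_complete(struct scsi_cmnd *sc, u32 rkey)
{
	/* response arrived: undo the HCA registration, then unmap */
	iser_unregister_rkey(hca, rkey);
	dma_unmap_sg(dma_dev, scsi_sglist(sc), scsi_sg_count(sc),
		     sc->sc_data_direction);
	sc->scsi_done(sc);
}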

Or.

* Re: decent performance drop for SCSI LLD / SAN initiator when iommu is turned on
       [not found] ` <CAJZOPZJ8eF-Q+WFzA-_vvzkpSb41PQjKFo27_Wi3McUccOqs9A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-05-02  0:13   ` Roland Dreier
  2013-05-02  1:56   ` Michael S. Tsirkin
  1 sibling, 0 replies; 11+ messages in thread
From: Roland Dreier @ 2013-05-02  0:13 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Yan Burman,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Michael S. Tsirkin

On Wed, May 1, 2013 at 4:11 PM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> This was done over kernel 3.5.x, I will re-run tomorrow with latest
> upstream and send the top perf hits for both cases, but does this
> rings some bells? basically it makes sense for some extra latency, but
> I didn't expect the IOPS and BW drop to be so notable. Is it possible
> that at these rates the IOMMU is so much trashed that it can't keep
> the perf we have without it being on?

Using the IOMMU makes the dma mapping API *much* more expensive.  Look
at the recent thread about NFS/RDMA performance for another instance
of this.

 - R.

* Re: decent performance drop for SCSI LLD / SAN initiator when iommu is turned on
       [not found] ` <CAJZOPZJ8eF-Q+WFzA-_vvzkpSb41PQjKFo27_Wi3McUccOqs9A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2013-05-02  0:13   ` Roland Dreier
@ 2013-05-02  1:56   ` Michael S. Tsirkin
       [not found]     ` <20130502015603.GC26105-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2013-05-02  1:56 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Roland Dreier, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Yan Burman, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, May 02, 2013 at 02:11:15AM +0300, Or Gerlitz wrote:
> Hi Roland, IOMMU folks,
> 
> So we've noted that when configuring the kernel && booting with intel
> iommu set to on on a physical node (non VM, and without enabling SRIOV
> by the HW device driver) raw performance of the iSER (iSCSI RDMA) SAN
> initiator is reduced notably, e.g in the testbed we looked today we
> had ~260K 1KB random IOPS and 5.5GBs BW for 128KB IOs with iommu
> turned off for single LUN, and ~150K IOPS and 4GBs BW with iommu
> turned on. No change on the target node between runs.

That's why we have iommu=pt.
See definition of iommu_pass_through in arch/x86/kernel/pci-dma.c.
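
Conceptually (this is only a sketch, not the actual pci-dma.c or
intel-iommu code), pass-through puts host-owned devices into an
identity-mapped domain, so the per-IO "mapping" degenerates into
address arithmetic instead of building IOMMU page-table entries:

#include <linux/dma-mapping.h>
#include <asm/io.h>

/* Conceptual sketch only: with iommu=pt a host-owned device sees a 1:1
 * (identity) mapping, so mapping a page is just computing its physical
 * address -- no per-IO IOMMU PTE setup or IOTLB flush.
 */
static dma_addr_t pt_map_page(struct device *dev, struct page *page,
			      unsigned long offset, size_t size,
			      enum dma_data_direction dir)
{
	return page_to_phys(page) + offset;	/* bus address == phys address */
}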

* RE: decent performance drop for SCSI LLD / SAN initiator when iommu is turned on
       [not found]     ` <20130502015603.GC26105-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2013-05-02 14:13       ` Yan Burman
       [not found]         ` <0EE9A1CDC8D6434DB00095CD7DB873462CF9D73E-fViJhHBwANKuSA5JZHE7gA@public.gmane.org>
  2013-05-06 21:39       ` Or Gerlitz
  1 sibling, 1 reply; 11+ messages in thread
From: Yan Burman @ 2013-05-02 14:13 UTC (permalink / raw)
  To: Michael S. Tsirkin, Or Gerlitz
  Cc: Roland Dreier, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA



> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org]
> Sent: Thursday, May 02, 2013 04:56
> To: Or Gerlitz
> Cc: Roland Dreier; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; Yan Burman; linux-
> rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: decent performance drop for SCSI LLD / SAN initiator when
> iommu is turned on
> 
> On Thu, May 02, 2013 at 02:11:15AM +0300, Or Gerlitz wrote:
> > Hi Roland, IOMMU folks,
> >
> > So we've noted that when configuring the kernel && booting with intel
> > iommu set to on on a physical node (non VM, and without enabling SRIOV
> > by the HW device driver) raw performance of the iSER (iSCSI RDMA) SAN
> > initiator is reduced notably, e.g in the testbed we looked today we
> > had ~260K 1KB random IOPS and 5.5GBs BW for 128KB IOs with iommu
> > turned off for single LUN, and ~150K IOPS and 4GBs BW with iommu
> > turned on. No change on the target node between runs.
> 
> That's why we have iommu=pt.
> See definition of iommu_pass_through in arch/x86/kernel/pci-dma.c.

I tried passing "intel_iommu=on iommu=pt" to a 3.8.11 kernel and I still get the performance degradation:
I get the same numbers with iommu=pt as without it.

I wanted to send perf output, but currently I seem to have some problem with it.
I will try to get the perf differences next week.

Yan



* Re: decent performance drop for SCSI LLD / SAN initiator when iommu is turned on
       [not found]         ` <0EE9A1CDC8D6434DB00095CD7DB873462CF9D73E-fViJhHBwANKuSA5JZHE7gA@public.gmane.org>
@ 2013-05-03 19:40           ` Don Dutile
       [not found]             ` <518412AC.3070507-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Don Dutile @ 2013-05-03 19:40 UTC (permalink / raw)
  To: Yan Burman
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Or Gerlitz,
	Michael S. Tsirkin

On 05/02/2013 10:13 AM, Yan Burman wrote:
>
>
>> -----Original Message-----
>> From: Michael S. Tsirkin [mailto:mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org]
>> Sent: Thursday, May 02, 2013 04:56
>> To: Or Gerlitz
>> Cc: Roland Dreier; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; Yan Burman; linux-
>> rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: Re: decent performance drop for SCSI LLD / SAN initiator when
>> iommu is turned on
>>
>> On Thu, May 02, 2013 at 02:11:15AM +0300, Or Gerlitz wrote:
>>> Hi Roland, IOMMU folks,
>>>
>>> So we've noted that when configuring the kernel&&  booting with intel
>>> iommu set to on on a physical node (non VM, and without enabling SRIOV
>>> by the HW device driver) raw performance of the iSER (iSCSI RDMA) SAN
>>> initiator is reduced notably, e.g in the testbed we looked today we
>>> had ~260K 1KB random IOPS and 5.5GBs BW for 128KB IOs with iommu
>>> turned off for single LUN, and ~150K IOPS and 4GBs BW with iommu
>>> turned on. No change on the target node between runs.
>>
>> That's why we have iommu=pt.
>> See definition of iommu_pass_through in arch/x86/kernel/pci-dma.c.
>
> I tried passing "intel_iommu=on iommu=pt" to 3.8.11 kernel and I still get performance degradation.
> I get the same numbers with iommu=pt as without it.
>
> I wanted to send perf output, but currently I seem to have some problem with its output.
> Will try to get perf differences next week.
>
> Yan
>
>
> _______________________________________________
> iommu mailing list
> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
Can you post a dmesg dump? I'm interested to see if x2apic is on, and if MSI is used (or not).


* RE: decent performance drop for SCSI LLD / SAN initiator when iommu is turned on
       [not found]             ` <518412AC.3070507-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2013-05-05 14:06               ` Yan Burman
  0 siblings, 0 replies; 11+ messages in thread
From: Yan Burman @ 2013-05-05 14:06 UTC (permalink / raw)
  To: Don Dutile
  Cc: Michael S. Tsirkin, Or Gerlitz, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA



> -----Original Message-----
> From: Don Dutile [mailto:ddutile-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org]
> Sent: Friday, May 03, 2013 22:41
> To: Yan Burman
> Cc: Michael S. Tsirkin; Or Gerlitz; Roland Dreier; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> Subject: Re: decent performance drop for SCSI LLD / SAN initiator when
> iommu is turned on
> 
> On 05/02/2013 10:13 AM, Yan Burman wrote:
> >
> >
> >> -----Original Message-----
> >> From: Michael S. Tsirkin [mailto:mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org]
> >> Sent: Thursday, May 02, 2013 04:56
> >> To: Or Gerlitz
> >> Cc: Roland Dreier; iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; Yan Burman;
> >> linux- rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> Subject: Re: decent performance drop for SCSI LLD / SAN initiator
> >> when iommu is turned on
> >>
> >> On Thu, May 02, 2013 at 02:11:15AM +0300, Or Gerlitz wrote:
> >>> Hi Roland, IOMMU folks,
> >>>
> >>> So we've noted that when configuring the kernel&&  booting with
> >>> intel iommu set to on on a physical node (non VM, and without
> >>> enabling SRIOV by the HW device driver) raw performance of the iSER
> >>> (iSCSI RDMA) SAN initiator is reduced notably, e.g in the testbed we
> >>> looked today we had ~260K 1KB random IOPS and 5.5GBs BW for 128KB
> >>> IOs with iommu turned off for single LUN, and ~150K IOPS and 4GBs BW
> >>> with iommu turned on. No change on the target node between runs.
> >>
> >> That's why we have iommu=pt.
> >> See definition of iommu_pass_through in arch/x86/kernel/pci-dma.c.
> >
> > I tried passing "intel_iommu=on iommu=pt" to 3.8.11 kernel and I still get
> performance degradation.
> > I get the same numbers with iommu=pt as without it.
> >
> > I wanted to send perf output, but currently I seem to have some problem
> with its output.
> > Will try to get perf differences next week.
> >
> > Yan
> >
> >
> > _______________________________________________
> > iommu mailing list
> > iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> > https://lists.linuxfoundation.org/mailman/listinfo/iommu
> dmesg dump? -- interested to see if x2apic is on, and if MSI is used (or not)


The entire dmesg is 98K, so I won't send it here (I can send it off-list if you need it), but I see that x2apic is not enabled:

[    0.019051] ------------[ cut here ]------------
[    0.019175] WARNING: at drivers/iommu/intel_irq_remapping.c:542 intel_enable_irq_remapping+0x78/0x279()
[    0.019362] Hardware name: ProLiant DL380p Gen8
[    0.019481] Your BIOS is broken and requested that x2apic be disabled
[    0.019481] This will leave your machine vulnerable to irq-injection attacks
[    0.019481] Use 'intremap=no_x2apic_optout' to override BIOS request
[    0.019750] Modules linked in:
[    0.019921] Pid: 1, comm: swapper/0 Not tainted 3.8.11-perf #4
[    0.020040] Call Trace:
[    0.020159]  [<ffffffff8103d22a>] warn_slowpath_common+0x7a/0xb0
[    0.020279]  [<ffffffff8103d301>] warn_slowpath_fmt+0x41/0x50
[    0.020399]  [<ffffffff8168f300>] intel_enable_irq_remapping+0x78/0x279
[    0.020522]  [<ffffffff8168f563>] irq_remapping_enable+0x1b/0x24
[    0.020646]  [<ffffffff8166faf5>] enable_IR+0x3c/0x3e
[    0.020768]  [<ffffffff8166fb7f>] enable_IR_x2apic+0x88/0x1e7
[    0.020892]  [<ffffffff81672089>] default_setup_apic_routing+0x15/0x6e
[    0.021015]  [<ffffffff8166ef7d>] native_smp_prepare_cpus+0x361/0x395
[    0.021139]  [<ffffffff816625d0>] kernel_init_freeable+0x5e/0x191
[    0.021263]  [<ffffffff8138a810>] ? rest_init+0x80/0x80
[    0.021384]  [<ffffffff8138a819>] kernel_init+0x9/0xf0
[    0.021505]  [<ffffffff8139132c>] ret_from_fork+0x7c/0xb0
[    0.021630]  [<ffffffff8138a810>] ? rest_init+0x80/0x80
[    0.021755] ---[ end trace 307c85faec0be3b4 ]---
[    0.022289] Enabled IRQ remapping in xapic mode
[    0.022409] x2apic not enabled, IRQ remapping is in xapic mode
[    0.022543] Switched APIC routing to physical flat.



MSI is being used:
cat /proc/interrupts | grep mlx
  98:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      mlx4-comp-0@pci:0000:07:00.0
  99:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      mlx4-comp-1@pci:0000:07:00.0
 100:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      mlx4-comp-2@pci:0000:07:00.0
 101:       3877          0          0          0       5503          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      mlx4-async@pci:0000:07:00.0
 102:        108          0          0          0          0    2012115          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      mlx4-ib-1-0@PCI Bus 0000:07
 103:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      mlx4-ib-1-1@PCI Bus 0000:07
 104:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      mlx4-ib-1-2@PCI Bus 0000:07
 105:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      mlx4-ib-1-3@PCI Bus 0000:07
 106:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      mlx4-ib-1-4@PCI Bus 0000:07
 107:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      mlx4-ib-1-5@PCI Bus 0000:07
 108:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      mlx4-ib-1-6@PCI Bus 0000:07
 109:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge      mlx4-ib-1-7@PCI Bus 0000:07


I tried passing in 'intremap=no_x2apic_optout' along with iommu=pt, but saw no difference in performance.

I did see a difference in the boot messages with and without iommu=pt (I don't know if it matters):
Without iommu=pt, I get a lot of "IOMMU: Setting identity map for device 0000:07:00.0 [0xe8000 - 0xe8fff]" messages.
With iommu=pt, I first get a lot of "IOMMU: hardware identity mapping for device 0000:20:04.7" messages and then the "Setting identity map for device" messages,
but the device in question (0000:07:00.0) does not appear in the "IOMMU: hardware identity mapping" messages.

Yan

* Re: decent performance drop for SCSI LLD / SAN initiator when iommu is turned on
       [not found]     ` <20130502015603.GC26105-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2013-05-02 14:13       ` Yan Burman
@ 2013-05-06 21:39       ` Or Gerlitz
       [not found]         ` <CAJZOPZLWgXNCEpZjzuizVGPEVPg1G+cHh373ZCoumMx9eAabvQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 11+ messages in thread
From: Or Gerlitz @ 2013-05-06 21:39 UTC (permalink / raw)
  To: Michael S. Tsirkin, Alexander Duyck
  Cc: Roland Dreier, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Yan Burman, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Paolo Bonzini,
	Asias He

On Thu, May 2, 2013 at 4:56 AM, Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>
> On Thu, May 02, 2013 at 02:11:15AM +0300, Or Gerlitz wrote:
> > So we've noted that when configuring the kernel && booting with intel
> > iommu set to on on a physical node (non VM, and without enabling SRIOV
> > by the HW device driver) raw performance of the iSER (iSCSI RDMA) SAN
> > initiator is reduced notably, e.g in the testbed we looked today we
> > had ~260K 1KB random IOPS and 5.5GBs BW for 128KB IOs with iommu
> > turned off for single LUN, and ~150K IOPS and 4GBs BW with iommu
> > turned on. No change on the target node between runs.
>
> That's why we have iommu=pt.
> See definition of iommu_pass_through in arch/x86/kernel/pci-dma.c.



Hi Michael (hope you feel better),

We did some runs with the pt approach you suggested and still didn't
get the promised gain. In parallel we came across the 2012 commit
f800326dc ("ixgbe: Replace standard receive path with a page based
receive"), where they say "[...] we are able to see a considerable
performance gain when an IOMMU is enabled because we are no longer
unmapping every buffer on receive [...] instead we can simply call
sync_single_range [...]". Looking at the commit, you can see that they
allocate a page/skb and dma_map it once up front, and later in the life
cycle of that buffer they use dma_sync_for_device/cpu, avoiding
dma_map/unmap on the fast path.
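
Roughly, the pattern there is the following (a simplified sketch of the
idea, not the driver's actual code -- the rx_buffer struct and the ring
handling are illustrative):

#include <linux/dma-mapping.h>
#include <linux/gfp.h>

struct rx_buffer {
	struct page *page;
	dma_addr_t dma;			/* mapped once, at allocation time */
};

static int rx_buffer_init(struct device *dev, struct rx_buffer *buf)
{
	buf->page = alloc_page(GFP_ATOMIC);
	if (!buf->page)
		return -ENOMEM;
	buf->dma = dma_map_page(dev, buf->page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
	return dma_mapping_error(dev, buf->dma) ? -ENOMEM : 0;
}

/* on receive: make the received bytes visible to the CPU -- no unmap */
static void rx_buffer_for_cpu(struct device *dev, struct rx_buffer *buf,
			      unsigned int off, unsigned int len)
{
	dma_sync_single_range_for_cpu(dev, buf->dma, off, len,
				      DMA_FROM_DEVICE);
	/* ... attach/copy the data into an skb here ... */
}

/* when recycling the buffer: hand it back to the HW -- again no remap */
static void rx_buffer_for_device(struct device *dev, struct rx_buffer *buf,
				 unsigned int off, unsigned int len)
{
	dma_sync_single_range_for_device(dev, buf->dma, off, len,
					 DMA_FROM_DEVICE);
}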

A few questions on which I'd love to hear people's opinions: 1st, this
approach seems nice for a network device RX path, but what about the TX
path: any idea how to avoid dma_map there, or why calling
dma_map/unmap for every buffer on the TX path doesn't cause a notable
perf hit? 2nd, I don't see how to apply the method to a block device,
since these devices don't allocate buffers but rather get a
scatter-gather list of pages from the upper layers, issue dma_map_sg on
it and submit the IO, and later, when done, call dma_unmap_sg.

Or.

* Re: decent performance drop for SCSI LLD / SAN initiator when iommu is turned on
       [not found]         ` <CAJZOPZLWgXNCEpZjzuizVGPEVPg1G+cHh373ZCoumMx9eAabvQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-05-06 22:35           ` Alexander Duyck
       [not found]             ` <5188304E.9050603-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Alexander Duyck @ 2013-05-06 22:35 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Michael S. Tsirkin, Roland Dreier,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Yan Burman,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Paolo Bonzini, Asias He

On 05/06/2013 02:39 PM, Or Gerlitz wrote:
> On Thu, May 2, 2013 at 4:56 AM, Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> On Thu, May 02, 2013 at 02:11:15AM +0300, Or Gerlitz wrote:
>>> So we've noted that when configuring the kernel && booting with intel
>>> iommu set to on on a physical node (non VM, and without enabling SRIOV
>>> by the HW device driver) raw performance of the iSER (iSCSI RDMA) SAN
>>> initiator is reduced notably, e.g in the testbed we looked today we
>>> had ~260K 1KB random IOPS and 5.5GBs BW for 128KB IOs with iommu
>>> turned off for single LUN, and ~150K IOPS and 4GBs BW with iommu
>>> turned on. No change on the target node between runs.
>> That's why we have iommu=pt.
>> See definition of iommu_pass_through in arch/x86/kernel/pci-dma.c.
>
>
> Hi Michael (hope you feel better),
>
> We did some runs with the pt approach you suggested and still didn't
> get the promised gain -- in parallel we came across this 2012 commit
> f800326dc "ixgbe: Replace standard receive path with a page based
> receive" where they say "[...] we are able to see a considerable
> performance gain when an IOMMU is enabled because we are no longer
> unmapping every buffer on receive [...] instead we can simply call
> sync_single_range [...]"  looking on the commit you can see that they
> allocate a page/skb dma_map it initially and later of the life cycle
> of that buffer use dma_sync_for_device/cpu and avoid dma_map/unmap on
> the fast path.
>
> Well few questions which I'd love to hear people's opinion -- 1st this
> approach seems cool for network device RX path, but what about the TX
> path, any idea how to avoid dma_map for it? or why on the TX path
> calling dma_map/unmap for every buffer doesn't involve a notable perf
> hit? 2nd I don't see how to apply the method on block device since
> these devices don't allocate buffers, but rather get a scatter-gather
> list of pages from upper layers, issue dma_map_sg on them and submit
> the IO, later when done call dma_unmap_sg
>
> Or.

The Tx path ends up taking a performance hit if the IOMMU is enabled.  It
just isn't as severe, due to things like TSO.

One way to work around the performance penalty is to allocate bounce
buffers and just leave them statically mapped.  Then you can simply memcpy
the data into the buffers and avoid the locking overhead of
allocating/freeing IOMMU resources.  It consumes more memory but works
around the IOMMU limitations.
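
Something like the following, as a sketch (illustrative only; a real
driver would size and recycle a pool of these per ring and handle
completions properly):

#include <linux/dma-mapping.h>
#include <linux/slab.h>
#include <linux/string.h>

#define BOUNCE_SIZE	2048		/* illustrative per-buffer size */

struct bounce_buf {
	void *vaddr;
	dma_addr_t dma;			/* mapped once, at setup time */
};

static int bounce_buf_init(struct device *dev, struct bounce_buf *b)
{
	b->vaddr = kmalloc(BOUNCE_SIZE, GFP_KERNEL);
	if (!b->vaddr)
		return -ENOMEM;
	/* one-time mapping: no IOMMU allocate/free on the per-packet path */
	b->dma = dma_map_single(dev, b->vaddr, BOUNCE_SIZE, DMA_TO_DEVICE);
	return dma_mapping_error(dev, b->dma) ? -ENOMEM : 0;
}

/* per-packet fast path: copy into the pre-mapped buffer and sync it */
static dma_addr_t bounce_buf_fill(struct device *dev, struct bounce_buf *b,
				  const void *data, size_t len)
{
	memcpy(b->vaddr, data, len);	/* the extra memcpy mentioned above */
	dma_sync_single_for_device(dev, b->dma, len, DMA_TO_DEVICE);
	return b->dma;			/* hand this address to the HW */
}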

Thanks,

Alex

* Re: decent performance drop for SCSI LLD / SAN initiator when iommu is turned on
       [not found]             ` <5188304E.9050603-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2013-05-07 12:12               ` Or Gerlitz
  2013-05-07 12:22               ` Michael S. Tsirkin
  1 sibling, 0 replies; 11+ messages in thread
From: Or Gerlitz @ 2013-05-07 12:12 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Or Gerlitz, Michael S. Tsirkin, Roland Dreier,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Yan Burman,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Paolo Bonzini, Asias He

On 07/05/2013 01:35, Alexander Duyck wrote:
> The Tx path ends up taking a performance hit if IOMMU is enabled.  It
> just isn't as severe due to things like TSO.

In testing done by some Mellanox folks, I think they saw a major penalty
on the RX side but hardly any on the TX side; I'll check that.

> One way to work around the performance penalty is to allocate bounce
> buffers and just leave them static mapped.  Then you can simply memcpy
> the data to the buffers and avoid the locking overhead of
> allocating/freeing IOMMU resources.  It consumes more memory but works
> around the IOMMU limitations.

I don't think that would be an applicable approach for fast networking/storage drivers.

Or.

* Re: decent performance drop for SCSI LLD / SAN initiator when iommu is turned on
       [not found]             ` <5188304E.9050603-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2013-05-07 12:12               ` Or Gerlitz
@ 2013-05-07 12:22               ` Michael S. Tsirkin
       [not found]                 ` <20130507122235.GA21361-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2013-05-07 12:22 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Roland Dreier, Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Yan Burman, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Paolo Bonzini, Asias He

On Mon, May 06, 2013 at 03:35:58PM -0700, Alexander Duyck wrote:
> On 05/06/2013 02:39 PM, Or Gerlitz wrote:
> > On Thu, May 2, 2013 at 4:56 AM, Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> >> On Thu, May 02, 2013 at 02:11:15AM +0300, Or Gerlitz wrote:
> >>> So we've noted that when configuring the kernel && booting with intel
> >>> iommu set to on on a physical node (non VM, and without enabling SRIOV
> >>> by the HW device driver) raw performance of the iSER (iSCSI RDMA) SAN
> >>> initiator is reduced notably, e.g in the testbed we looked today we
> >>> had ~260K 1KB random IOPS and 5.5GBs BW for 128KB IOs with iommu
> >>> turned off for single LUN, and ~150K IOPS and 4GBs BW with iommu
> >>> turned on. No change on the target node between runs.
> >> That's why we have iommu=pt.
> >> See definition of iommu_pass_through in arch/x86/kernel/pci-dma.c.
> >
> >
> > Hi Michael (hope you feel better),
> >
> > We did some runs with the pt approach you suggested and still didn't
> > get the promised gain -- in parallel we came across this 2012 commit
> > f800326dc "ixgbe: Replace standard receive path with a page based
> > receive" where they say "[...] we are able to see a considerable
> > performance gain when an IOMMU is enabled because we are no longer
> > unmapping every buffer on receive [...] instead we can simply call
> > sync_single_range [...]"  looking on the commit you can see that they
> > allocate a page/skb dma_map it initially and later of the life cycle
> > of that buffer use dma_sync_for_device/cpu and avoid dma_map/unmap on
> > the fast path.
> >
> > Well few questions which I'd love to hear people's opinion -- 1st this
> > approach seems cool for network device RX path, but what about the TX
> > path, any idea how to avoid dma_map for it? or why on the TX path
> > calling dma_map/unmap for every buffer doesn't involve a notable perf
> > hit? 2nd I don't see how to apply the method on block device since
> > these devices don't allocate buffers, but rather get a scatter-gather
> > list of pages from upper layers, issue dma_map_sg on them and submit
> > the IO, later when done call dma_unmap_sg
> >
> > Or.
> 
> The Tx path ends up taking a performance hit if IOMMU is enabled.  It
> just isn't as severe due to things like TSO.
> 
> One way to work around the performance penalty is to allocate bounce
> buffers and just leave them static mapped.  Then you can simply memcpy
> the data to the buffers and avoid the locking overhead of
> allocating/freeing IOMMU resources.  It consumes more memory but works
> around the IOMMU limitations.
> 
> Thanks,
> 
> Alex

But why isn't iommu=pt effective?
AFAIK the whole point of it was to give up on security
for host-controlled devices, but still get a
measure of security for assigned devices.

-- 
MST


* Re: decent performance drop for SCSI LLD / SAN initiator when iommu is turned on
       [not found]                 ` <20130507122235.GA21361-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2013-05-07 14:50                   ` Or Gerlitz
  0 siblings, 0 replies; 11+ messages in thread
From: Or Gerlitz @ 2013-05-07 14:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alexander Duyck, Roland Dreier,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Yan Burman,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Paolo Bonzini, Asias He

On Tue, May 7, 2013 at 3:22 PM, Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> But why isn't iommu=pt effective?
> AFAIK the whole point of it was to give up on security
> for host-controlled devices, but still get a
> measure of security for assigned devices.


Good question. From the tests Yan has done so far, he didn't see this
yield an improvement for the iSER SCSI driver; we haven't tried it yet
with a netdevice. Alex/Asias, did you try that option? Any conclusions?

Or.
