* PCIe RC\EP virtio rdma solution discussion.
@ 2023-02-07 19:45 Frank Li
  2023-02-14  3:24 ` Shunsuke Mie
  2023-02-15  8:23 ` Manivannan Sadhasivam
  0 siblings, 2 replies; 4+ messages in thread
From: Frank Li @ 2023-02-07 19:45 UTC (permalink / raw)
  To: mie, imx
  Cc: Frank.Li, bhelgaas, jasowang, jdmason, kishon, kw, linux-kernel,
	linux-pci, lpieralisi, mani, mst, renzhijie2, taki,
	virtualization

From: Frank Li <Frank.li@nxp.com>

Recently more and more people have become interested in connecting a PCI RC
and EP, especially for network use cases. I upstreamed a vntb solution last
year, but its transfer speed is not good enough. I started a discussion
at https://lore.kernel.org/imx/d098a631-9930-26d3-48f3-8f95386c8e50@ti.com/T/#t
 
  ┌─────────────────────────────────┐   ┌──────────────┐
  │                                 │   │              │
  │                                 │   │              │
  │   VirtQueue             RX      │   │  VirtQueue   │
  │     TX                 ┌──┐     │   │    TX        │
  │  ┌─────────┐           │  │     │   │ ┌─────────┐  │
  │  │ SRC LEN ├─────┐  ┌──┤  │◄────┼───┼─┤ SRC LEN │  │
  │  ├─────────┤     │  │  │  │     │   │ ├─────────┤  │
  │  │         │     │  │  │  │     │   │ │         │  │
  │  ├─────────┤     │  │  │  │     │   │ ├─────────┤  │
  │  │         │     │  │  │  │     │   │ │         │  │
  │  └─────────┘     │  │  └──┘     │   │ └─────────┘  │
  │                  │  │           │   │              │
  │     RX       ┌───┼──┘   TX      │   │    RX        │
  │  ┌─────────┐ │   │     ┌──┐     │   │ ┌─────────┐  │
  │  │         │◄┘   └────►│  ├─────┼───┼─┤         │  │
  │  ├─────────┤           │  │     │   │ ├─────────┤  │
  │  │         │           │  │     │   │ │         │  │
  │  ├─────────┤           │  │     │   │ ├─────────┤  │
  │  │         │           │  │     │   │ │         │  │
  │  └─────────┘           │  │     │   │ └─────────┘  │
  │   virtio_net           └──┘     │   │ virtio_net   │
  │  Virtual PCI BUS   EDMA Queue   │   │              │
  ├─────────────────────────────────┤   │              │
  │  PCI EP Controller with eDMA    │   │  PCI Host    │
  └─────────────────────────────────┘   └──────────────┘

The basic idea is:
	1.	Both the EP and the host probe the virtio_net driver.
	2.	There are two queues: one on the EP side (EQ) and the other on the host side.
	3.	The EP-side epf driver maps the host side's queue into the EP's address space; call it HQ.
	4.	One worker thread:
	5.	picks one TX descriptor from EQ and one RX descriptor from HQ, combines them into an eDMA request,
and puts it into the DMA TX queue;
	6.	picks one RX descriptor from EQ and one TX descriptor from HQ, combines them into an eDMA request,
and puts it into the DMA RX queue.
	7.	The eDMA completion IRQ marks the related items in EQ and HQ as finished.

The whole transfer is zero-copy and uses a DMA queue; a rough sketch of the
worker-thread pairing step is shown below.
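
For illustration only, here is a minimal sketch of what one pairing step of the
worker thread could look like on the EP side. It assumes the generic dmaengine
API; the epf_vq_desc structure, the function name and the callback wiring are
made up for this example and are not an existing driver interface.

#include <linux/dmaengine.h>
#include <linux/minmax.h>

/* Hypothetical descriptor view: one entry taken from EQ or HQ. */
struct epf_vq_desc {
	dma_addr_t addr;
	size_t len;
};

/*
 * Pair one source buffer (TX side) with one destination buffer (RX side)
 * and hand the copy to the eDMA channel (steps 5 and 6 above).
 */
static int epf_vnet_dma_xfer_one(struct dma_chan *chan,
				 struct epf_vq_desc *src,
				 struct epf_vq_desc *dst,
				 dma_async_tx_callback done, void *ctx)
{
	struct dma_async_tx_descriptor *tx;
	size_t len = min(src->len, dst->len);

	tx = dmaengine_prep_dma_memcpy(chan, dst->addr, src->addr, len,
				       DMA_PREP_INTERRUPT);
	if (!tx)
		return -EIO;

	/* Step 7: the completion callback marks both queue items finished. */
	tx->callback = done;
	tx->callback_param = ctx;
	dmaengine_submit(tx);
	dma_async_issue_pending(chan);

	return 0;
}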

Shunsuke Mie implemented the above idea:
 https://lore.kernel.org/linux-pci/CANXvt5q_qgLuAfF7dxxrqUirT_Ld4B=POCq8JcB9uPRvCGDiKg@mail.gmail.com/T/#t


A similar solution was posted in 2019, except that it used memcpy from/to the PCI EP map windows.
Using DMA should be simpler, because the eDMA can access the whole host/EP memory space.
https://lore.kernel.org/linux-pci/9f8e596f-b601-7f97-a98a-111763f966d1@ti.com/T/

Solution 1 (based on Shunsuke's work):

Both the EP and host sides use virtio.
eDMA is used to simplify data transfer and improve transfer speed.
RDMA is implemented based on RoCE.
- proposal: https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
- presentation at KVM Forum: https://youtu.be/Qrhv6hC_YK4

Solution 2 (2020, Kishon):

Previous posting: https://lore.kernel.org/linux-pci/20200702082143.25259-1-kishon@ti.com/
The EP side uses vhost, the RC side uses virtio.
I don't think anyone is working on this thread now.
If eDMA is used, both sides need a transfer queue,
and I don't know how to easily implement that on the vhost side.

Solution 3 (which I am working on):

Implement an InfiniBand RDMA driver on both the EP and RC sides.
The EP side builds the eDMA hardware queue from the EP/RC sides' send and receive
queues and, when the eDMA finishes, writes the status to the completion queue on
both the EP and RC sides. IPoIB is used for the network transfer; a rough sketch
of the completion-queue write follows below.
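
Purely as an illustration of the completion-queue idea (all names, structures
and the record layout below are hypothetical, not an existing API), the eDMA
completion path could write a small completion record into a CQ ring that is
mapped on each side:

#include <linux/io.h>

/* Hypothetical completion record and ring; the layout is only an example. */
struct rdma_epf_cqe {
	__le32 wr_id;
	__le32 status;
};

struct rdma_epf_cq {
	void __iomem *ring;	/* peer CQ mapped via an outbound/ATU window */
	u32 head;
	u32 depth;
};

/* Called from the eDMA completion callback for one finished transfer. */
static void rdma_epf_post_cqe(struct rdma_epf_cq *cq, u32 wr_id, u32 status)
{
	struct rdma_epf_cqe cqe = {
		.wr_id  = cpu_to_le32(wr_id),
		.status = cpu_to_le32(status),
	};

	/* Write the status into the peer's completion queue (RC or EP side). */
	memcpy_toio(cq->ring + cq->head * sizeof(cqe), &cqe, sizeof(cqe));
	cq->head = (cq->head + 1) % cq->depth;
}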


The whole upstream effort for any of these is quite large. I don't want to waste
time and effort because the direction is wrong.

I think Solution 1 is the easiest path.





* Re: PCIe RC\EP virtio rdma solution discussion.
  2023-02-07 19:45 PCIe RC\EP virtio rdma solution discussion Frank Li
@ 2023-02-14  3:24 ` Shunsuke Mie
  2023-02-14 15:28   ` [EXT] " Frank Li
  2023-02-15  8:23 ` Manivannan Sadhasivam
  1 sibling, 1 reply; 4+ messages in thread
From: Shunsuke Mie @ 2023-02-14  3:24 UTC (permalink / raw)
  To: Frank Li, imx
  Cc: bhelgaas, jasowang, jdmason, kishon, kw, linux-kernel, linux-pci,
	lpieralisi, mani, mst, renzhijie2, taki, virtualization

Thanks for organizing the discussion.

On 2023/02/08 4:45, Frank Li wrote:
> From: Frank Li <Frank.li@nxp.com>
>
> Recently more and more people have become interested in connecting a PCI RC
> and EP, especially for network use cases. I upstreamed a vntb solution last
> year, but its transfer speed is not good enough. I started a discussion
> at https://lore.kernel.org/imx/d098a631-9930-26d3-48f3-8f95386c8e50@ti.com/T/#t
I've investigated the vntb + ntbnet device that uses NTB transfer. Would it be
difficult to adapt the eDMA to the NTB transfer? It is likely one of the
solutions to the performance problem.
>   
>    ┌─────────────────────────────────┐   ┌──────────────┐
>    │                                 │   │              │
>    │                                 │   │              │
>    │   VirtQueue             RX      │   │  VirtQueue   │
>    │     TX                 ┌──┐     │   │    TX        │
>    │  ┌─────────┐           │  │     │   │ ┌─────────┐  │
>    │  │ SRC LEN ├─────┐  ┌──┤  │◄────┼───┼─┤ SRC LEN │  │
>    │  ├─────────┤     │  │  │  │     │   │ ├─────────┤  │
>    │  │         │     │  │  │  │     │   │ │         │  │
>    │  ├─────────┤     │  │  │  │     │   │ ├─────────┤  │
>    │  │         │     │  │  │  │     │   │ │         │  │
>    │  └─────────┘     │  │  └──┘     │   │ └─────────┘  │
>    │                  │  │           │   │              │
>    │     RX       ┌───┼──┘   TX      │   │    RX        │
>    │  ┌─────────┐ │   │     ┌──┐     │   │ ┌─────────┐  │
>    │  │         │◄┘   └────►│  ├─────┼───┼─┤         │  │
>    │  ├─────────┤           │  │     │   │ ├─────────┤  │
>    │  │         │           │  │     │   │ │         │  │
>    │  ├─────────┤           │  │     │   │ ├─────────┤  │
>    │  │         │           │  │     │   │ │         │  │
>    │  └─────────┘           │  │     │   │ └─────────┘  │
>    │   virtio_net           └──┘     │   │ virtio_net   │
>    │  Virtual PCI BUS   EDMA Queue   │   │              │
>    ├─────────────────────────────────┤   │              │
>    │  PCI EP Controller with eDMA    │   │  PCI Host    │
>    └─────────────────────────────────┘   └──────────────┘
>
> The basic idea is:
> 	1.	Both the EP and the host probe the virtio_net driver.
> 	2.	There are two queues: one on the EP side (EQ) and the other on the host side.
> 	3.	The EP-side epf driver maps the host side's queue into the EP's address space; call it HQ.
> 	4.	One worker thread:
> 	5.	picks one TX descriptor from EQ and one RX descriptor from HQ, combines them into an eDMA request,
> and puts it into the DMA TX queue;
> 	6.	picks one RX descriptor from EQ and one TX descriptor from HQ, combines them into an eDMA request,
> and puts it into the DMA RX queue.
> 	7.	The eDMA completion IRQ marks the related items in EQ and HQ as finished.
>
> The whole transfer is zero-copy and uses a DMA queue.
>
> Shunsuke Mie implemented the above idea:
>   https://lore.kernel.org/linux-pci/CANXvt5q_qgLuAfF7dxxrqUirT_Ld4B=POCq8JcB9uPRvCGDiKg@mail.gmail.com/T/#t
>
>
> A similar solution was posted in 2019, except that it used memcpy from/to the PCI EP map windows.
> Using DMA should be simpler, because the eDMA can access the whole host/EP memory space.
> https://lore.kernel.org/linux-pci/9f8e596f-b601-7f97-a98a-111763f966d1@ti.com/T/
>
> Solution 1 (based on Shunsuke's work):
>
> Both the EP and host sides use virtio.
> eDMA is used to simplify data transfer and improve transfer speed.
> RDMA is implemented based on RoCE.
> - proposal: https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> - presentation at KVM Forum: https://youtu.be/Qrhv6hC_YK4
>
> Solution 2 (2020, Kishon):
>
> Previous posting: https://lore.kernel.org/linux-pci/20200702082143.25259-1-kishon@ti.com/
> The EP side uses vhost, the RC side uses virtio.
> I don't think anyone is working on this thread now.
> If eDMA is used, both sides need a transfer queue,
> and I don't know how to easily implement that on the vhost side.
We had implemented this solution at the design stage of our proposal.
It requires preparing a network device and registering it with the kernel
from scratch on the endpoint side. There is a lot of duplicated code, so we
think Solution 1 is better, as Frank said.
> Solution 3 (which I am working on):
>
> Implement an InfiniBand RDMA driver on both the EP and RC sides.
> The EP side builds the eDMA hardware queue from the EP/RC sides' send and receive
> queues and, when the eDMA finishes, writes the status to the completion queue on
> both the EP and RC sides. IPoIB is used for the network transfer.
A new InfiniBand device has to implement an InfiniBand network layer, which
I think is over-engineered for this peer-to-peer communication. In addition,
a driver for the InfiniBand device would have to be implemented, or an
existing InfiniBand device emulated so that its upstream driver can be used.
We want to reduce the cost of implementation and maintenance.
> The whole upstream effort for any of these is quite large. I don't want to waste
> time and effort because the direction is wrong.
>
> I think Solution 1 is the easiest path.
>
>
>
Best,

Shunsuke.



* RE: [EXT] Re: PCIe RC\EP virtio rdma solution discussion.
  2023-02-14  3:24 ` Shunsuke Mie
@ 2023-02-14 15:28   ` Frank Li
  0 siblings, 0 replies; 4+ messages in thread
From: Frank Li @ 2023-02-14 15:28 UTC (permalink / raw)
  To: Shunsuke Mie, imx
  Cc: bhelgaas, jasowang, jdmason, kishon, kw, linux-kernel, linux-pci,
	lpieralisi, mani, mst, renzhijie2, taki, virtualization

> > The EP side uses vhost, the RC side uses virtio.
> > I don't think anyone is working on this thread now.
> > If eDMA is used, both sides need a transfer queue,
> > and I don't know how to easily implement that on the vhost side.
> We had implemented this solution at the design stage of our proposal.
> It requires preparing a network device and registering it with the kernel
> from scratch on the endpoint side. There is a lot of duplicated code, so we
> think Solution 1 is better, as Frank said.
> > Solution 3 (which I am working on):
> >
> > Implement an InfiniBand RDMA driver on both the EP and RC sides.
> > The EP side builds the eDMA hardware queue from the EP/RC sides' send and
> > receive queues and, when the eDMA finishes, writes the status to the
> > completion queue on both the EP and RC sides. IPoIB is used for the
> > network transfer.
> A new InfiniBand device has to implement an InfiniBand network layer, which
> I think is over-engineered for this peer-to-peer communication. In addition,
> a driver for the InfiniBand device would have to be implemented, or an
> existing InfiniBand device emulated so that its upstream driver can be used.
> We want to reduce the cost of implementation and maintenance.

The InfiniBand driver is quite complex; that is why progress is slow on my
side. I hope the endpoint maintainer (kw) and the PCI maintainer (Bjorn) can
provide comments.

> > The whole upstream effort for any of these is quite large. I don't want to
> > waste time and effort because the direction is wrong.
> >
> > I think Solution 1 is the easiest path.
> >
> >
> >
> Best,
> 
> Shunsuke.



* Re: PCIe RC\EP virtio rdma solution discussion.
  2023-02-07 19:45 PCIe RC\EP virtio rdma solution discussion Frank Li
  2023-02-14  3:24 ` Shunsuke Mie
@ 2023-02-15  8:23 ` Manivannan Sadhasivam
  1 sibling, 0 replies; 4+ messages in thread
From: Manivannan Sadhasivam @ 2023-02-15  8:23 UTC (permalink / raw)
  To: Frank Li
  Cc: mie, imx, bhelgaas, jasowang, jdmason, kishon, kw, linux-kernel,
	linux-pci, lpieralisi, mani, mst, renzhijie2, taki,
	virtualization

On Tue, Feb 07, 2023 at 02:45:27PM -0500, Frank Li wrote:
> From: Frank Li <Frank.li@nxp.com>
> 
> Recently more and more people have become interested in connecting a PCI RC
> and EP, especially for network use cases. I upstreamed a vntb solution last
> year, but its transfer speed is not good enough. I started a discussion
> at https://lore.kernel.org/imx/d098a631-9930-26d3-48f3-8f95386c8e50@ti.com/T/#t
>  
>   ┌─────────────────────────────────┐   ┌──────────────┐
>   │                                 │   │              │
>   │                                 │   │              │
>   │   VirtQueue             RX      │   │  VirtQueue   │
>   │     TX                 ┌──┐     │   │    TX        │
>   │  ┌─────────┐           │  │     │   │ ┌─────────┐  │
>   │  │ SRC LEN ├─────┐  ┌──┤  │◄────┼───┼─┤ SRC LEN │  │
>   │  ├─────────┤     │  │  │  │     │   │ ├─────────┤  │
>   │  │         │     │  │  │  │     │   │ │         │  │
>   │  ├─────────┤     │  │  │  │     │   │ ├─────────┤  │
>   │  │         │     │  │  │  │     │   │ │         │  │
>   │  └─────────┘     │  │  └──┘     │   │ └─────────┘  │
>   │                  │  │           │   │              │
>   │     RX       ┌───┼──┘   TX      │   │    RX        │
>   │  ┌─────────┐ │   │     ┌──┐     │   │ ┌─────────┐  │
>   │  │         │◄┘   └────►│  ├─────┼───┼─┤         │  │
>   │  ├─────────┤           │  │     │   │ ├─────────┤  │
>   │  │         │           │  │     │   │ │         │  │
>   │  ├─────────┤           │  │     │   │ ├─────────┤  │
>   │  │         │           │  │     │   │ │         │  │
>   │  └─────────┘           │  │     │   │ └─────────┘  │
>   │   virtio_net           └──┘     │   │ virtio_net   │
>   │  Virtual PCI BUS   EDMA Queue   │   │              │
>   ├─────────────────────────────────┤   │              │
>   │  PCI EP Controller with eDMA    │   │  PCI Host    │
>   └─────────────────────────────────┘   └──────────────┘
> 
> The basic idea is:
> 	1.	Both the EP and the host probe the virtio_net driver.
> 	2.	There are two queues: one on the EP side (EQ) and the other on the host side.
> 	3.	The EP-side epf driver maps the host side's queue into the EP's address space; call it HQ.
> 	4.	One worker thread:
> 	5.	picks one TX descriptor from EQ and one RX descriptor from HQ, combines them into an eDMA request,
> and puts it into the DMA TX queue;
> 	6.	picks one RX descriptor from EQ and one TX descriptor from HQ, combines them into an eDMA request,
> and puts it into the DMA RX queue.
> 	7.	The eDMA completion IRQ marks the related items in EQ and HQ as finished.
> 
> The whole transfer is zero-copy and uses a DMA queue.
> 
> Shunsuke Mie implemented the above idea:
>  https://lore.kernel.org/linux-pci/CANXvt5q_qgLuAfF7dxxrqUirT_Ld4B=POCq8JcB9uPRvCGDiKg@mail.gmail.com/T/#t
> 
> 
> A similar solution was posted in 2019, except that it used memcpy from/to the PCI EP map windows.
> Using DMA should be simpler, because the eDMA can access the whole host/EP memory space.
> https://lore.kernel.org/linux-pci/9f8e596f-b601-7f97-a98a-111763f966d1@ti.com/T/
> 
> Solution 1 (based on Shunsuke's work):
> 
> Both the EP and host sides use virtio.
> eDMA is used to simplify data transfer and improve transfer speed.
> RDMA is implemented based on RoCE.
> - proposal: https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
> - presentation at KVM Forum: https://youtu.be/Qrhv6hC_YK4
> 
> Solution 2 (2020, Kishon):
> 
> Previous posting: https://lore.kernel.org/linux-pci/20200702082143.25259-1-kishon@ti.com/
> The EP side uses vhost, the RC side uses virtio.
> I don't think anyone is working on this thread now.
> If eDMA is used, both sides need a transfer queue,
> and I don't know how to easily implement that on the vhost side.
> 
> Solution 3 (which I am working on):
> 
> Implement an InfiniBand RDMA driver on both the EP and RC sides.
> The EP side builds the eDMA hardware queue from the EP/RC sides' send and receive
> queues and, when the eDMA finishes, writes the status to the completion queue on
> both the EP and RC sides. IPoIB is used for the network transfer.
> 
> 
> The whole upstream effort for any of these is quite large. I don't want to waste
> time and effort because the direction is wrong.
> 
> I think Solution 1 is the easiest path.
> 

I haven't had time to look into Shunsuke's series, but from an initial look at
the proposed solutions, option 1 seems to be the best to me.

Thanks,
Mani

> 
> 

-- 
மணிவண்ணன் சதாசிவம்

