From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Tan, Jianfeng"
Subject: Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
Date: Tue, 13 Sep 2016 14:51:31 +0800
Message-ID:
References: <1473072871-16108-1-git-send-email-pankaj.chauhan@nxp.com> <1473072871-16108-2-git-send-email-pankaj.chauhan@nxp.com> <84848c11-fc26-015f-b7f8-a27d0558ef0b@nxp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Cc: hemant.agrawal@nxp.com, yuanhan.liu@linux.intel.com, maxime.coquelin@redhat.com
To: Pankaj Chauhan , dev@dpdk.org
Return-path:
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id A83D5567A for ; Tue, 13 Sep 2016 08:51:34 +0200 (CEST)
In-Reply-To: <84848c11-fc26-015f-b7f8-a27d0558ef0b@nxp.com>
List-Id: patches and discussions about DPDK
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Hi Pankaj,

On 9/12/2016 6:55 PM, Pankaj Chauhan wrote:
> On 9/9/2016 2:26 PM, Tan, Jianfeng wrote:
>> Hi Pankaj,
>>
>> Thanks for the massive and great work.
>
> Hi Jianfeng,
>
> Thanks for the review.
>>
>> On 9/5/2016 6:54 PM, Pankaj Chauhan wrote:
>>> Introduce support for a generic framework for handling of switching between
>>> physical and vhost devices. The vswitch framework introduces the following
>>> concepts:
>>>
>>> 1. vswitch_dev: A vswitch device is a logical switch which can have physical
>>> and virtio devices. The devices are operated/used via the standard rte_eth
>>> API for physical devices and the lib_vhost API for vhost devices; the PMD
>>> API is not used for vhost devices.
>>>
>>> 2. vswitch_port: Any physical or virtio device that is added to the vswitch.
>>> The port can have its own tx/rx functions for doing data transfer, which are
>>> exposed to the framework using generic function pointers
>>> (vs_port->do_tx/do_rx).
>>> This way the generic code can do tx/rx without understanding the type of
>>> device (physical or virtio). Similarly, each port has its own functions to
>>> select tx/rx queues. The framework provides default tx/rx queue selection
>>> functions to the port when the port is added (for both physical and vhost
>>> devices), but it allows the switch implementation to override the queue
>>> selection functions (vs_port->get_txq/rxq) if required.
>>>
>>> 3. vswitch_ops: The ops is a set of function pointers which are used to do
>>> operations like learning, unlearning, add/delete port, and
>>> lookup_and_forward. The user of the vswitch framework (vhost/main.[c,h])
>>> uses these function pointers to perform the above-mentioned operations, and
>>> thus remains agnostic of the underlying implementation.
>>>
>>> Different switching logics can implement their own vswitch_device and
>>> vswitch_ops and register with the framework. This framework makes the
>>> vhost-switch application scalable in terms of:
>>>
>>> 1. Different switching logics (one of them is VMDQ, vhost/vmdq.[c,h]).
>>> 2. Number of ports.
>>> 3. Policies of selecting ports for rx and tx.
>>>
>>> Signed-off-by: Pankaj Chauhan
>>
>> After this patch set, what is the mapping relationship between cores and
>> vswitch_dev? The old vhost example does not have the concept of a switch, so
>> each core is bound to some vdevs. Do we still keep the original logic?
>>
>> (a) If yes, given that a phys device could have no direct relationship
>> with vdevs in other switching logics, we may need to bind those physical
>> devices to cores too. In that case, switch_worker() will receive pkts
>> from all devices (phys or virtual) and dispatch them, which looks like:
>>
>> switch_worker() {
>>     FOR each port bound to this core {
>>         rx(port, pkts[], count);
>>         vs_lookup_n_fwd( information_needed );
>>     }
>> }
>
> Since we support only one switch device running at one time, we bind
> the ports of the switchdev to the core.
> But the switch might have its own logic to bind the port to the core. For
> example, VMDQ only supports one physical port; other switches can support
> more than one physical port. In order to take care of that, I have added two
> ops in switchdev_ops:
>
> 1. vs_sched_rx_port:
>
> struct vswitch_port *vs_sched_rx_port(struct vswitch_dev *vs_dev,
>                                       enum vswitch_port_type ptype,
>                                       uint16_t core_id)
>
> 2. vs_sched_tx_port:
>
> struct vswitch_port *vs_sched_tx_port(struct vswitch_dev *vs_dev,
>                                       enum vswitch_port_type ptype,
>                                       uint16_t core_id)
>
> The idea of providing these functions is that vhost/main requests the
> underlying switch implementation to schedule a port for rx or tx; the
> currently running core_id is also passed as a parameter. So the switch can
> decide which port to use for rx or tx based on the core id, and maybe on
> some other custom policy.
>
> For example, VMDQ always returns its one single physical port in
> response to these functions called from any core. Other switches
> can return the ports bound to the cores.
>
> Similarly, there are two port operations (vs_port->get_rxq and
> vs_port->get_txq); here also we pass core_id as a parameter so that
> the underlying switch implementation can schedule the rx or tx queue
> based on the core_id.
>
> The above-mentioned ops are currently used in drain_eth_rx() and
> do_drain_mbuf_table() (main.c), and they leave the binding of the
> physical port and the queues to the underlying implementation. This
> way we can accommodate VMDQ, which uses only one physical port and
> VMDQ-based rx queues, or any other switch which uses multiple
> physical ports and rx/tx queue scheduling based on some other logic.
>
> Please suggest if this way of scheduling ports and tx/rx queues is
> fine or not?

According to the above explanation, in the VMDQ switch we cannot schedule two
queues (belonging to the same port) on the same core, right?

>>
>> (b) If no, we may bind each core with n switches?
>> If so, switch_worker()
>> also needs to be reworked to keep receiving pkts from all ports of the
>> switch, and dispatch:
>>
>> switch_worker() {
>>     FOR each switch bound to this core {
>>         FOR each port of switch {
>>             rx(port, pkts[], count);
>>             vs_lookup_n_fwd( information_needed );
>>         }
>>     }
>> }
>
> Since we currently support only one switch_dev in a running instance of
> vhost_switch(), we can use the binding of ports to cores as you suggested
> in #(a).

OK.

>>
>> In all, (1) we'd better not use vdev to find the phys dev in switch_worker
>> any more;
>
> I agree with you. I have tried to do the following in drain_eth_rx():
>
> core_id = rte_lcore_id();
> rx_port = vs_sched_rx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
>                            core_id);
> if (unlikely(!rx_port))
>     goto out;
>
> rxq = rx_port->get_rxq(rx_port, vdev, core_id);
> rx_count = rx_port->do_rx(rx_port, rxq, NULL, pkts,
>                           MAX_PKT_BURST);
>
> Here we don't assume any relation between vdev and the physical device
> or rxq, but pass vdev to the underlying switch so that it can choose
> the rxq for this vdev. In the case of VMDQ it will return the VMDQ queue
> number for this vdev. Any other switch implementation can have its own
> logic to decide the rxq.

Thanks for the explanation.

>> (2) we'd better not differentiate phys devices and virtual
>> devices in the generic framework (it's just an attribute of vswitch_port).
>>
>> What do you think?
>
> I agree with your thought that, given the current API in this patchset, we
> should aim for making switch_worker agnostic of the port type. Ideally
> it should look something like this:
>
> switch_worker() {
>
>     rx_port_mask = VSWITCH_PTYPE_PHYS | VSWITCH_PTYPE_VIRTIO;
>
>     rx_port = vs_sched_rx_port(vswitch_dev_g, rx_port_mask, core_id);
>     rxq = rx_port->get_rxq(rx_port, vdev, core_id);
>     rx_port->do_rx(rx_port, rxq, NULL, pkts, MAX_PKT_BURST);

Can we hide queues inside struct vswitch_port?
I mean: for the VMDQ switch, treat (port_id, queue_id) as a vswitch_port. So far
you've already stored "struct vhost_dev *" into vswitch_port.priv when it's a
virtual port; how about storing the queue_id into vswitch_port.priv when it's a
physical port? For the arp_learning switch, make (port_id, all_enabled_queues) a
vswitch_port. To summarize the two: we treat (port_id, all_enabled_queues[]) as
a vswitch_port. How about it?

>
>     vs_lookup_n_fwd(rx_port, pkts, rx_count, rxq);
>
> }
>
> Here I have two problems in achieving switch_worker as mentioned above:
>
> 1. While doing rx from a physical port, we need the associated vdev to
> know the VMDQ rx queue. That was the reason I kept the current logic of
> keeping vdevs bound to the core in switch_worker. If we don't keep a
> list of vdevs assigned to the core, how will we get the rxq on the
> physical device in the case of the VMDQ switch implementation?

I think my proposal above can solve this problem.

> 2. drain_mbuf_table() currently assumes that there is only one
> physical device on which we have to transmit. We may have to rethink
> where to keep the tx queues (&lcore_tx_queue[core_id]); in case we have
> multiple physical ports we might have to move the tx queues to each port.

This function, drain_mbuf_table(), and its related data structures are a cache
on the tx direction. But it's not only useful for physical ports, but also for
virtual ports (vhost_dev). I suggest making such a cache a field of each
vswitch_port. And then again, all vswitch_ports are connected to a core, and
each core will periodically drain those vswitch_ports.

Thanks,
Jianfeng

> Since I didn't have a good solution currently for #1 and #2 mentioned
> above, I kept the assumption of one single physical port as it is and tried
> to integrate VMDQ into the framework while leaving these assumptions (and
> thus the main structure of switch_worker) as they are.
>
> Please suggest if you have some solution to the above-mentioned points in
> your mind.
> Ideally I agree we should have switch_worker() as I
> mentioned in the pseudo-code I gave above.
>>
>> Thanks,
>> Jianfeng
>>
>
>