From: Pankaj Chauhan
Subject: Re: [RFC][PATCH V2 1/3] examples/vhost: Add vswitch (generic switch) framework
Date: Thu, 15 Sep 2016 14:30:55 +0530
Message-ID: <23bebfde-54ad-04fd-64cb-c99d7a4a78e9@nxp.com>
References: <1473072871-16108-1-git-send-email-pankaj.chauhan@nxp.com>
 <1473072871-16108-2-git-send-email-pankaj.chauhan@nxp.com>
 <84848c11-fc26-015f-b7f8-a27d0558ef0b@nxp.com>
To: "Tan, Jianfeng"

On 9/13/2016 12:21 PM, Tan, Jianfeng wrote:

Hi Jianfeng,

>>>
>>> On 9/5/2016 6:54 PM, Pankaj Chauhan wrote:
>>>> Introduce support for a generic framework for handling of switching
>>>> between physical and vhost devices. The vswitch framework introduces
>>>> the following concepts:
>>>>
>>>> 1. vswitch_dev: A vswitch device is a logical switch which can have
>>>> physical and virtio devices. The devices are operated/used using the
>>>> standard rte_eth API for physical devices and the lib_vhost API for
>>>> vhost devices; the PMD API is not used for vhost devices.
>>>>
>>>> 2. vswitch_port: Any physical or virtio device that is added to the
>>>> vswitch. The port can have its own tx/rx functions for doing data
>>>> transfer, which are exposed to the framework using generic function
>>>> pointers (vs_port->do_tx/do_rx). This way the generic code can do
>>>> tx/rx without understanding the type of device (physical or virtio).
>>>> Similarly, each port has its own functions to select tx/rx queues.
>>>> The framework provides default tx/rx queue selection functions to the
>>>> port when the port is added (for both physical and vhost devices),
>>>> but it allows the switch implementation to override the queue
>>>> selection functions (vs_port->get_txq/rxq) if required.
>>>>
>>>> 3. vswitch_ops: The ops is a set of function pointers which are used
>>>> to do operations like learning, unlearning, add/delete port, and
>>>> lookup_and_forward. The user of the vswitch framework
>>>> (vhost/main.[c,h]) uses these function pointers to perform the above
>>>> operations, and thus remains agnostic of the underlying implementation.
>>>>
>>>> Different switching logics can implement their vswitch_device and
>>>> vswitch_ops, and register with the framework. This framework makes the
>>>> vhost-switch application scalable in terms of:
>>>>
>>>> 1. Different switching logics (one of them is vmdq, vhost/vmdq.[c,h]).
>>>> 2. Number of ports.
>>>> 3. Policies of selecting ports for rx and tx.
>>>>
>>>> Signed-off-by: Pankaj Chauhan
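
For reference while reading the discussion below, the abstraction boils
down to roughly the following. This is only an illustrative sketch; the
exact structure layout, names and prototypes in the patch may differ:

/* Illustrative sketch only, not the literal patch code. */
#include <stdint.h>
#include <rte_mbuf.h>

enum vswitch_port_type {
	VSWITCH_PTYPE_PHYS   = 1 << 0,	/* physical (rte_eth) device */
	VSWITCH_PTYPE_VIRTIO = 1 << 1,	/* vhost device */
};

struct vswitch_ops;

struct vswitch_dev {
	struct vswitch_ops *ops;	/* switch implementation callbacks */
	void *priv;			/* implementation-private data */
};

struct vswitch_port {
	enum vswitch_port_type type;
	uint16_t port_id;
	struct vswitch_dev *vs_dev;
	void *priv;	/* e.g. struct vhost_dev * for a virtio port */

	/* per-port datapath hooks, so generic code never checks the type */
	uint16_t (*do_rx)(struct vswitch_port *port, uint16_t rxq, void *arg,
			  struct rte_mbuf **pkts, uint16_t count);
	uint16_t (*do_tx)(struct vswitch_port *port, uint16_t txq, void *arg,
			  struct rte_mbuf **pkts, uint16_t count);
	uint16_t (*get_rxq)(struct vswitch_port *port, void *vdev,
			    uint16_t core_id);
	uint16_t (*get_txq)(struct vswitch_port *port, uint16_t core_id);
};

struct vswitch_ops {
	int (*add_port)(struct vswitch_dev *vs_dev, struct vswitch_port *port);
	int (*del_port)(struct vswitch_dev *vs_dev, struct vswitch_port *port);
	int (*learn)(struct vswitch_dev *vs_dev, struct vswitch_port *port,
		     struct rte_mbuf *pkt);
	int (*unlearn)(struct vswitch_dev *vs_dev, struct vswitch_port *port);
	int (*lookup_n_fwd)(struct vswitch_dev *vs_dev, struct rte_mbuf **pkts,
			    uint16_t count, struct vswitch_port *in_port);
};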
>>>
>>> After this patch set, what is the mapping relationship between cores
>>> and vswitch_dev? The old vhost example does not have the concept of a
>>> switch, so each core is bound to some VDEVs. Now, do we still keep the
>>> original logic?
>>>
>>> (a) If yes, provided that a phys device could have no direct
>>> relationship with vdevs in other switching logics, we may need to bind
>>> those physical devices to cores too? In that case, switch_worker()
>>> will receive pkts from all devices (phys or virtual) and dispatch,
>>> which looks like:
>>>
>>> switch_worker() {
>>>     FOR each port binding to this core {
>>>         rx(port, pkts[], count);
>>>         vs_lookup_n_fwd( information_needed );
>>>     }
>>> }
>>
>> Since we support only one switch device running at one time, we bind
>> the ports of the switchdev to the core. But the switch might have its
>> own logic to bind the port to the core. For example, VMDQ supports only
>> one physical port, while other switches can support more than one
>> physical port. In order to take care of that I have added two ops in
>> switchdev_ops:
>>
>> 1. vs_sched_rx_port:
>>
>>     struct vswitch_port *vs_sched_rx_port(struct vswitch_dev *vs_dev,
>>                                           enum vswitch_port_type ptype,
>>                                           uint16_t core_id)
>>
>> 2. vs_sched_tx_port:
>>
>>     struct vswitch_port *vs_sched_tx_port(struct vswitch_dev *vs_dev,
>>                                           enum vswitch_port_type ptype,
>>                                           uint16_t core_id)
>>
>> The idea of providing these functions is that vhost/main requests the
>> underlying switch implementation to schedule a port for rx or tx; the
>> currently running core_id is also passed as a parameter. So the switch
>> can decide which port to do rx or tx on based on the core id, and maybe
>> some other custom policy.
>>
>> For example, VMDQ always returns the one single physical port in
>> response to these functions called from any core. Another switch
>> could return the ports bound to the cores.
>>
>> Similarly, there are two port operations (vs_port->get_rxq and
>> vs_port->get_txq); here also we pass core_id as a parameter so that
>> the underlying switch implementation can schedule the rx or tx queue
>> based on the core_id.
>>
>> The above mentioned ops are used in drain_eth_rx() and
>> do_drain_mbuf_table() (main.c) currently, and they leave the binding of
>> the physical port and the queues to the underlying implementation. This
>> way we can accommodate VMDQ, which uses only one physical port and
>> rx queues based on VMDQ, or any other switch which uses multiple
>> physical ports and rx/tx queue scheduling based on some other logic.
>>
>> Please suggest if this way of scheduling ports and tx/rx queues is
>> fine or not?
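
To make the intent of these two ops a bit more concrete, a VMDQ-style
backend could implement the rx scheduling hook roughly as below, building
on the sketch earlier in this mail. Names such as vmdq_switch_priv and
core_to_vdev are made up for illustration and are not from the patch:

/* Illustrative only -- assumes the backend keeps private data behind
 * vs_dev->priv. */
#include <rte_lcore.h>

struct vmdq_switch_priv {
	struct vswitch_port *phys_port;	/* the single physical port */
	struct vswitch_port *core_to_vdev[RTE_MAX_LCORE];
};

static struct vswitch_port *
vmdq_sched_rx_port(struct vswitch_dev *vs_dev, enum vswitch_port_type ptype,
		   uint16_t core_id)
{
	struct vmdq_switch_priv *priv = vs_dev->priv;

	/* VMDQ drives exactly one physical port, so every core that asks
	 * for a PHYS port is handed the same one. */
	if (ptype == VSWITCH_PTYPE_PHYS)
		return priv->phys_port;

	/* Virtio ports stay bound to cores, as in the current example. */
	return priv->core_to_vdev[core_id];
}

A multi-port switch would implement the same op with its own
core-to-port policy, and main.c would not need to change.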
>
> According to the above explanation, in the VMDQ switch, we cannot
> schedule two queues (belonging to the same port) on the same core, right?

Yes. I believe this would be a limitation which will not be acceptable.
The method that you suggested further down in the email (treat port +
queue_id as a vs_port) can solve this.

>
>>>
>>> (b) If no, we may bind each core with n switches? If so, switch_worker()
>>> also needs to be reworked to keep receiving pkts from all ports of the
>>> switch, and dispatch.
>>>
>>> switch_worker() {
>>>     FOR each switch binding to this core {
>>>         FOR each port of switch {
>>>             rx(port, pkts[], count);
>>>             vs_lookup_n_fwd( information_needed );
>>>         }
>>>     }
>>> }
>>
>> Since we currently support only one switch_dev in a running instance of
>> vhost-switch, we can use binding of ports to cores as you suggested in
>> #(a).
>
> OK.
>
>>>
>>> In all, (1) we'd better not use vdev to find phys dev in switch_worker
>>> any more;
>>
>> I agree with you. I have tried to do the following in drain_eth_rx():
>>
>>     core_id = rte_lcore_id();
>>     rx_port = vs_sched_rx_port(vswitch_dev_g, VSWITCH_PTYPE_PHYS,
>>                                core_id);
>>     if (unlikely(!rx_port))
>>         goto out;
>>
>>     rxq = rx_port->get_rxq(rx_port, vdev, core_id);
>>     rx_count = rx_port->do_rx(rx_port, rxq, NULL, pkts,
>>                               MAX_PKT_BURST);
>>
>> Here we don't assume any relation between the vdev and the physical
>> device or rxq, but pass the vdev to the underlying switch so that it can
>> choose the rxq for this vdev. In the case of VMDQ it will return the
>> VMDQ queue number for this vdev. Any other switch implementation can
>> have its own logic to decide the rxq.
>
> Thanks for the explanation.
>
>>>
>>> (2) we'd better not differentiate phys device and virtual device in the
>>> generic framework (it's just an attribute of vswitch_port).
>>> What do you think?
>>
>> I agree with your thought that, given the current API in this patchset,
>> we should aim for making switch_worker agnostic of the port type.
>> Ideally it should look something like this:
>>
>> switch_worker() {
>>
>>     rx_port_mask = VSWITCH_PTYPE_PHYS | VSWITCH_PTYPE_VIRTIO;
>>
>>     rx_port = vs_sched_rx_port(vswitch_dev_g, rx_port_mask, core_id);
>>     rxq = rx_port->get_rxq(rx_port, vdev, core_id);
>>     rx_port->do_rx(rx_port, rxq, NULL, pkts, MAX_PKT_BURST);
>
> Can we hide queues inside struct vswitch_port? I mean:
> For the VMDQ switch, treat (port_id, queue_id) as a vswitch_port; so far
> you've already stored "struct vhost_dev *" into vswitch_port.priv when
> it's a virtual port, so how about storing queue_id into vswitch_port.priv
> when it's a physical port?
> For the arp_learning switch, make (port_id, all_enabled_queues) a
> vswitch_port.
> To summarize the above two: we treat (port_id, all_enabled_queues[]) as a
> vswitch_port.
>
> How about it?

Thanks for the wonderful suggestion! Yes, it should be possible: instead
of having one vs_port per physical device we could create one vs_port for
each phys device + queue_id combination. Then we can bind the vs_ports to
the cores, and do away with binding vdevs to the cores.

In order to add the extra vs_ports (port + queue_id) there are two methods:

1. vhost/main.c (or vswitch_common.c) queries the underlying switch,
through a new accessor, about how many rx queues to support; VMDQ would
return the number of VMDQ queues in this case. After getting the number of
rx queues, vhost/main.c creates the extra ports, i.e. one vs_port for each
physical port and queue id combination.

2. The switch implementation, for example vmdq.c, creates the extra ports
(port + queue_id) as part of the vs_port->add_port operation for the
physical port.

Which method do you think we should follow? I think #1 should be done so
that the same logic can be used for other switches also. A rough sketch of
what I mean is below.
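
The accessor and helper names here (vs_switch_get_max_rxqs, vs_port_alloc,
vs_add_port) are made up for illustration and are not from the patch:

/* Illustrative sketch of method #1: register one vswitch_port per
 * (physical port, rx queue) pair, with the queue id stashed in priv. */
#include <stdint.h>

/* hypothetical helpers, assumed to be provided by the framework */
extern uint16_t vs_switch_get_max_rxqs(struct vswitch_dev *vs_dev,
					uint16_t eth_port_id);
extern struct vswitch_port *vs_port_alloc(struct vswitch_dev *vs_dev,
					  enum vswitch_port_type type);
extern int vs_add_port(struct vswitch_dev *vs_dev,
		       struct vswitch_port *port);

static int
vs_add_phys_port_per_queue(struct vswitch_dev *vs_dev, uint16_t eth_port_id)
{
	uint16_t q, nr_rxq;

	/* VMDQ would report the number of VMDQ rx queues configured on
	 * the device here */
	nr_rxq = vs_switch_get_max_rxqs(vs_dev, eth_port_id);

	for (q = 0; q < nr_rxq; q++) {
		struct vswitch_port *port;

		port = vs_port_alloc(vs_dev, VSWITCH_PTYPE_PHYS);
		if (port == NULL)
			return -1;

		port->port_id = eth_port_id;
		/* the queue id goes into priv, as you suggested, so
		 * do_rx()/get_rxq() no longer need the vdev */
		port->priv = (void *)(uintptr_t)q;

		vs_add_port(vs_dev, port);	/* backend add_port op */
	}

	return 0;
}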
>
>>
>>     vs_lookup_n_fwd(rx_port, pkts, rx_count, rxq);
>>
>> }
>>
>> Here I have two problems in achieving switch_worker as mentioned above:
>>
>> 1. While doing rx from a physical port, we need the associated vdev to
>> know the VMDQ rx_q. That was the reason I kept the current logic of
>> keeping vdevs bound to the core in switch_worker. If we don't keep a
>> list of vdevs assigned to the core, how will we get the rxq on the
>> physical device in case of the VMDQ switch implementation?
>
> I think my proposal above can solve this problem.

Yes, thanks!

>
>>
>> 2. drain_mbuf_table() currently assumes that there is only one physical
>> device on which we have to transmit. We may have to rethink where to
>> keep the tx queues (&lcore_tx_queue[core_id]); in case we have multiple
>> physical ports we might have to move the tx queues to each port.
>
> This function, drain_mbuf_table(), and its related data structures, are
> used as a cache on the tx direction. But it's not only useful for the
> physical port, but also for virtual ports (vhost_dev). I suggest making
> such a cache a field of each vswitch_port. And again, all vswitch_ports
> are connected to a core, and each core will periodically clean those
> vswitch_ports.

Okay. So we add similar tx_q structures (one per core) to each
vswitch_port, correct? Something like the sketch below, perhaps.

Thanks,
Pankaj
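
(Sketch of what I understood by a per-port tx cache; the structure and
helper names below are only illustrative, not code from the patch or the
existing example:)

#include <rte_lcore.h>
#include <rte_mbuf.h>

#define VS_TXQ_BURST 32	/* analogous to MAX_PKT_BURST in the example */

struct vswitch_port_txq {
	uint16_t len;				/* packets buffered so far */
	struct rte_mbuf *pkts[VS_TXQ_BURST];
};

/* one small tx cache per core, kept inside each vswitch_port */
struct vswitch_port_tx_cache {
	struct vswitch_port_txq txq[RTE_MAX_LCORE];
};

static void
vs_port_flush_txq(struct vswitch_port *port, struct vswitch_port_txq *txq,
		  uint16_t core_id)
{
	uint16_t sent;

	if (txq->len == 0)
		return;

	/* do_tx() hides whether this is a physical port or a vhost port */
	sent = port->do_tx(port, port->get_txq(port, core_id), NULL,
			   txq->pkts, txq->len);
	while (sent < txq->len)
		rte_pktmbuf_free(txq->pkts[sent++]);
	txq->len = 0;
}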