From: Yuanhan Liu
Subject: [RFC] packed (virtio-net) headers
Date: Tue, 28 Feb 2017 13:47:19 +0800
Message-ID: <20170228054719.GJ18844__46444.9029569097$1488675806$gmane$org@yliu-dev.sh.intel.com>
References: <20160915223915.qjlnlvf2w7u37bu3@redhat.com>
In-Reply-To: <20160915223915.qjlnlvf2w7u37bu3@redhat.com>
To: "Michael S. Tsirkin"
Cc: virtio-dev@lists.oasis-open.org, virtualization@lists.linux-foundation.org

Hi,

For virtio-net, we use 2 descs to represent a (small) pkt: one for the
virtio-net header and another for the pkt data. This has two issues:

- the desc buffer for storing pkt data is halved

  We later introduced 2 more options to overcome this: ANY_LAYOUT and
  indirect desc. The indirect desc has another issue, though: it
  introduces an extra cache line visit.

- the virtio-net headers could be scattered

  Assume the ANY_LAYOUT case, where the header is prepended to each
  mbuf (or skb in the kernel). In DPDK, a burst receive in the vhost pmd
  means 32 different cache line visits for the virtio headers.

  For the legacy layout and indirect desc, the cache issue could be
  diminished somewhat: we could arrange the virtio headers in the same
  memory block and let each header desc point to the right one. But
  that's still not good enough: the virtio-net headers aren't accessed
  in batch; they have to be accessed one by one (by reading the desc).
  So it's still not that good for cache utilization.

So I'm proposing packed headers:

- put all virtio-net headers in one memory block.

  A burst of 32 pkts needs to access only (32 * 12) / 64 = 6 cache lines.
  Before, it could be 32 cache lines.

- introduce a header desc to reference the above memory block.

    desc->addr = starting addr of the net headers mem block
    desc->len  = size of all virtio-net headers (burst size * header size)

Thus, for a burst size of 32, we need only 33 descs: one for the headers
and the others for storing the corresponding pkt data. More importantly,
we can use the "len" field to compute the batch size. We can then load
the virtio-net headers at once; we can also prefetch all the descs at
once.

Note that it could also be adapted to virtio 0.95 and 1.0.

I also made a simple prototype with DPDK (yet again, Tx path only), and
I saw an impressive boost (about 30%) in a micro benchmark.

I think such a proposal should help other devices too, if they also
have a small header for each piece of data.

Thoughts?

	--yliu
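
For reference, the header desc and the arithmetic above can be sketched
in C roughly as follows. This is only a minimal sketch, not the DPDK
prototype mentioned in the mail. The descriptor struct mirrors the
split-ring layout from virtio 1.0; the helper names
(packed_hdr_desc_fill, packed_hdr_batch_size, cache_lines_packed,
descs_needed) are hypothetical, and it assumes a 12-byte virtio-net
header, 64-byte cache lines, and a header block that starts on a cache
line boundary.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u /* assumption: 64-byte cache lines */
#define NET_HDR_SIZE    12u /* assumption: header size used in the mail */

/* Same field layout as a virtio 1.0 split-ring descriptor. */
struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Hypothetical helper: fill the one desc that covers all headers of a burst. */
static inline void packed_hdr_desc_fill(struct vring_desc *d,
					uint64_t hdrs_addr, uint32_t burst)
{
	/* Guard against overflow of the 32-bit len field. */
	assert(burst <= UINT32_MAX / NET_HDR_SIZE);

	d->addr  = hdrs_addr;            /* starting addr of the headers block */
	d->len   = burst * NET_HDR_SIZE; /* burst size * header size */
	d->flags = 0;
	d->next  = 0;
}

/*
 * Hypothetical helper: recover the batch size from desc->len.
 * Returns 0 if len is not a whole number of headers.
 */
static inline uint32_t packed_hdr_batch_size(const struct vring_desc *d)
{
	if (d->len % NET_HDR_SIZE != 0)
		return 0;
	return d->len / NET_HDR_SIZE;
}

/* Cache lines touched when the headers of a burst are stored back to back. */
static inline uint32_t cache_lines_packed(uint32_t burst)
{
	return (burst * NET_HDR_SIZE + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE;
}

/* One desc for the packed headers plus one data desc per pkt. */
static inline uint32_t descs_needed(uint32_t burst)
{
	return burst + 1;
}
```

With a burst of 32, packed_hdr_desc_fill() sets len to 384 bytes,
packed_hdr_batch_size() gives back 32, cache_lines_packed(32) is 6 and
descs_needed(32) is 33, matching the numbers in the mail.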