dev.dpdk.org archive mirror
 help / color / mirror / Atom feed
From: Flavio Leitner <fbl@sysclose.org>
To: Shahaf Shuler <shahafs@mellanox.com>
Cc: David Marchand <david.marchand@redhat.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	Maxime Coquelin <maxime.coquelin@redhat.com>,
	Tiwei Bie <tiwei.bie@intel.com>,
	Zhihong Wang <zhihong.wang@intel.com>,
	Obrembski MichalX <michalx.obrembski@intel.com>,
	Stokes Ian <ian.stokes@intel.com>
Subject: Re: [dpdk-dev] [PATCH] vhost: add support to large linear mbufs
Date: Wed, 2 Oct 2019 09:58:31 -0300	[thread overview]
Message-ID: <20191002095831.5927af93@p50.lan> (raw)
In-Reply-To: <AM0PR0502MB3795E94C2D0570A715525C3AC39C0@AM0PR0502MB3795.eurprd05.prod.outlook.com>


Hi Shahaf,

Thanks for looking into this, see my inline comments.

On Wed, 2 Oct 2019 09:00:11 +0000
Shahaf Shuler <shahafs@mellanox.com> wrote:

> Wednesday, October 2, 2019 11:05 AM, David Marchand:
> > Subject: Re: [dpdk-dev] [PATCH] vhost: add support to large linear
> > mbufs
> > 
> > Hello Shahaf,
> > 
> > On Wed, Oct 2, 2019 at 6:46 AM Shahaf Shuler <shahafs@mellanox.com>
> > wrote:  
> > >
> > > Wednesday, October 2, 2019 1:20 AM, Flavio Leitner:  
> > > > Subject: [dpdk-dev] [PATCH] vhost: add support to large linear
> > > > mbufs
> > > >
> > > > The rte_vhost_dequeue_burst supports two ways of dequeuing
> > > > data. If the data fits into a buffer, then all data is copied
> > > > and a single linear buffer is returned. Otherwise it allocates
> > > > additional mbufs and chains them together to return a multiple
> > > > segments mbuf.
> > > >
> > > > While that covers most use cases, it forces applications that
> > > > need to work with larger data sizes to support multiple
> > > > segments mbufs. The non-linear characteristic brings complexity
> > > > and performance implications to the application.
> > > >
> > > > To resolve the issue, change the API so that the application can
> > > > optionally provide a second mempool containing larger mbufs. If
> > > > that is not provided (NULL), the behavior remains as before the
> > > > change. Otherwise, the data size is checked and the
> > > > corresponding mempool is used to return linear mbufs.  
> > >
> > > I understand the motivation.
> > > However, providing a static pool w/ large buffers is not so
> > > efficient in terms  
> > of memory footprint. You will need to prepare to worst case (all
> > packet are large) w/ max size of 64KB.  
> > > Also, the two mempools are quite restrictive as the memory fill
> > > of the  
> > mbufs might be very sparse. E.g. mempool1 mbuf.size = 1.5K ,
> > mempool2 mbuf.size = 64K, packet size 4KB.  
> > >
> > > Instead, how about using the mbuf external buffer feature?
> > > The flow will be:
> > > 1. vhost PMD always receive a single mempool (like today) 2. on
> > > dequeue, PMD looks on the virtio packet size. If smaller than the
> > > mbuf size use the mbuf as is (like today) 3. otherwise, allocate
> > > a new buffer (inside the PMD) and link it to the mbuf as external
> > > buffer (rte_pktmbuf_attach_extbuf)  
> > 
> > I am missing some piece here.
> > Which pool would the PMD take those external buffers from?  
> 
> The mbuf is always taken from the single mempool associated w/ the
> rxq. The buffer for the mbuf may be allocated (in case virtio payload
> is bigger than current mbuf size) from DPDK hugepages or any other
> system memory and be attached to the mbuf.
> 
> You can see example implementation of it in mlx5 PMD (checkout
> rte_pktmbuf_attach_extbuf call)

Thanks, I wasn't aware of external buffers.

I see that attaching external buffers of the correct size would be more
efficient in terms of saving memory/avoiding sparsing.

However, we still need to be prepared to the worse case scenario (all
packets 64K), so that doesn't help with the total memory required.

The current patch pushes the decision to the application which knows
better the workload. If more memory is available, it can optionally use
large buffers, otherwise just don't pass that. Or even decide whether
to share the same 64K mempool between multiple vhost ports or use one
mempool per port.

Perhaps I missed something, but managing memory with mempool still
require us to have buffers of 64K regardless if the data consumes less
space. Otherwise the application or the PMD will have to manage memory
itself.

If we let the PMD manages the memory, what happens if a port/queue
is closed and one or more buffers are still in use (switching)? I don't
see how to solve this cleanly.

fbl

> 
> > 
> > If it is from an additional mempool passed to the vhost pmd, I
> > can't see the difference with Flavio proposal.
> > 
> >   
> > > The pros of this approach is that you have full flexibility on
> > > the memory  
> > allocation, and therefore a lower footprint.  
> > > The cons is the OVS will need to know how to handle mbuf w/
> > > external  
> > buffers (not too complex IMO).
> > 
> > 
> > --
> > David Marchand  


  reply	other threads:[~2019-10-02 12:58 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-01 22:19 [dpdk-dev] [PATCH] vhost: add support to large linear mbufs Flavio Leitner
2019-10-01 23:10 ` Flavio Leitner
2019-10-02  4:45 ` Shahaf Shuler
2019-10-02  8:04   ` David Marchand
2019-10-02  9:00     ` Shahaf Shuler
2019-10-02 12:58       ` Flavio Leitner [this message]
2019-10-02 17:50         ` Shahaf Shuler
2019-10-02 18:15           ` Flavio Leitner
2019-10-03 16:57             ` Ilya Maximets
2019-10-03 21:25               ` Flavio Leitner
2019-10-02  7:51 ` Maxime Coquelin
2019-10-04 20:10 ` [dpdk-dev] [PATCH v2] vhost: add support for large buffers Flavio Leitner
2019-10-06  4:47   ` Shahaf Shuler
2019-10-10  5:12   ` Tiwei Bie
2019-10-10 12:12     ` Flavio Leitner
2019-10-11 17:09   ` [dpdk-dev] [PATCH v3] " Flavio Leitner
2019-10-14  2:44     ` Tiwei Bie
2019-10-15 16:17     ` [dpdk-dev] [PATCH v4] " Flavio Leitner
2019-10-15 17:41       ` Ilya Maximets
2019-10-15 18:44         ` Flavio Leitner
2019-10-15 18:59       ` [dpdk-dev] [PATCH v5] " Flavio Leitner
2019-10-16 10:02         ` Maxime Coquelin
2019-10-16 11:13         ` Maxime Coquelin
2019-10-16 13:32           ` Ilya Maximets
2019-10-16 13:46             ` Maxime Coquelin
2019-10-16 14:02               ` Flavio Leitner
2019-10-16 14:08                 ` Ilya Maximets
2019-10-16 14:14                   ` Flavio Leitner
2019-10-16 14:05               ` Ilya Maximets
2019-10-29  9:02         ` David Marchand
2019-10-29 12:21           ` Flavio Leitner
2019-10-29 16:19             ` David Marchand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191002095831.5927af93@p50.lan \
    --to=fbl@sysclose.org \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=ian.stokes@intel.com \
    --cc=maxime.coquelin@redhat.com \
    --cc=michalx.obrembski@intel.com \
    --cc=shahafs@mellanox.com \
    --cc=tiwei.bie@intel.com \
    --cc=zhihong.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).