All of lore.kernel.org
 help / color / mirror / Atom feed
From: wangyunjian <wangyunjian@huawei.com>
To: Paolo Abeni <pabeni@redhat.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"willemdebruijn.kernel@gmail.com"
	<willemdebruijn.kernel@gmail.com>,
	"jasowang@redhat.com" <jasowang@redhat.com>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"bjorn@kernel.org" <bjorn@kernel.org>,
	"magnus.karlsson@intel.com" <magnus.karlsson@intel.com>,
	"maciej.fijalkowski@intel.com" <maciej.fijalkowski@intel.com>,
	"jonathan.lemon@gmail.com" <jonathan.lemon@gmail.com>,
	"davem@davemloft.net" <davem@davemloft.net>
Cc: "bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"virtualization@lists.linux.dev" <virtualization@lists.linux.dev>,
	xudingke <xudingke@huawei.com>,
	"liwei (DT)" <liwei395@huawei.com>
Subject: RE: [PATCH net-next v2 3/3] tun: AF_XDP Tx zero-copy support
Date: Fri, 1 Mar 2024 11:45:52 +0000	[thread overview]
Message-ID: <223aeca6435342ec8a4d57c959c23303@huawei.com> (raw)
In-Reply-To: <7d478cb842e28094f4d6102e593e3de25ab27dfe.camel@redhat.com>

> -----Original Message-----
> From: Paolo Abeni [mailto:pabeni@redhat.com]
> Sent: Thursday, February 29, 2024 7:13 PM
> To: wangyunjian <wangyunjian@huawei.com>; mst@redhat.com;
> willemdebruijn.kernel@gmail.com; jasowang@redhat.com; kuba@kernel.org;
> bjorn@kernel.org; magnus.karlsson@intel.com; maciej.fijalkowski@intel.com;
> jonathan.lemon@gmail.com; davem@davemloft.net
> Cc: bpf@vger.kernel.org; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
> virtualization@lists.linux.dev; xudingke <xudingke@huawei.com>; liwei (DT)
> <liwei395@huawei.com>
> Subject: Re: [PATCH net-next v2 3/3] tun: AF_XDP Tx zero-copy support
> 
> On Wed, 2024-02-28 at 19:05 +0800, Yunjian Wang wrote:
> > @@ -2661,6 +2776,54 @@ static int tun_ptr_peek_len(void *ptr)
> >  	}
> >  }
> >
> > +static void tun_peek_xsk(struct tun_file *tfile) {
> > +	struct xsk_buff_pool *pool;
> > +	u32 i, batch, budget;
> > +	void *frame;
> > +
> > +	if (!ptr_ring_empty(&tfile->tx_ring))
> > +		return;
> > +
> > +	spin_lock(&tfile->pool_lock);
> > +	pool = tfile->xsk_pool;
> > +	if (!pool) {
> > +		spin_unlock(&tfile->pool_lock);
> > +		return;
> > +	}
> > +
> > +	if (tfile->nb_descs) {
> > +		xsk_tx_completed(pool, tfile->nb_descs);
> > +		if (xsk_uses_need_wakeup(pool))
> > +			xsk_set_tx_need_wakeup(pool);
> > +	}
> > +
> > +	spin_lock(&tfile->tx_ring.producer_lock);
> > +	budget = min_t(u32, tfile->tx_ring.size, TUN_XDP_BATCH);
> > +
> > +	batch = xsk_tx_peek_release_desc_batch(pool, budget);
> > +	if (!batch) {
> 
> This branch looks like an unneeded "optimization". The generic loop below
> should have the same effect with no measurable perf delta - and smaller code.
> Just remove this.
> 
> > +		tfile->nb_descs = 0;
> > +		spin_unlock(&tfile->tx_ring.producer_lock);
> > +		spin_unlock(&tfile->pool_lock);
> > +		return;
> > +	}
> > +
> > +	tfile->nb_descs = batch;
> > +	for (i = 0; i < batch; i++) {
> > +		/* Encode the XDP DESC flag into lowest bit for consumer to differ
> > +		 * XDP desc from XDP buffer and sk_buff.
> > +		 */
> > +		frame = tun_xdp_desc_to_ptr(&pool->tx_descs[i]);
> > +		/* The budget must be less than or equal to tx_ring.size,
> > +		 * so enqueuing will not fail.
> > +		 */
> > +		__ptr_ring_produce(&tfile->tx_ring, frame);
> > +	}
> > +	spin_unlock(&tfile->tx_ring.producer_lock);
> > +	spin_unlock(&tfile->pool_lock);
> 
> More related to the general design: it looks wrong. What if
> get_rx_bufs() will fail (ENOBUF) after successful peeking? With no more
> incoming packets, later peek will return 0 and it looks like that the
> half-processed packets will stay in the ring forever???
> 
> I think the 'ring produce' part should be moved into tun_do_read().

Currently, the vhost-net obtains a batch descriptors/sk_buffs from the
ptr_ring and enqueue the batch descriptors/sk_buffs to the virtqueue'queue,
and then consumes the descriptors/sk_buffs from the virtqueue'queue in
sequence. As a result, TUN does not know whether the batch descriptors have
been used up, and thus does not know when to return the batch descriptors.

So, I think it's reasonable that when vhost-net checks ptr_ring is empty,
it calls peek_len to get new xsk's descs and return the descriptors.

Thanks
> 
> Cheers,
> 
> Paolo


  parent reply	other threads:[~2024-03-01 11:45 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-28 11:05 [PATCH net-next v2 3/3] tun: AF_XDP Tx zero-copy support Yunjian Wang
2024-02-29 10:24 ` kernel test robot
2024-02-29 11:12 ` Paolo Abeni
2024-02-29 13:15   ` wangyunjian
2024-03-01 11:45   ` wangyunjian [this message]
2024-03-01 11:53     ` Michael S. Tsirkin
2024-03-04 13:45       ` wangyunjian
2024-03-11  4:00         ` Jason Wang
2024-03-11 13:27           ` wangyunjian
2024-03-12  6:07             ` Jason Wang
2024-02-29 15:44 ` kernel test robot
2024-03-01 14:11 ` Maciej Fijalkowski
2024-03-01 18:40   ` Willem de Bruijn
2024-03-06  5:32     ` Jason Wang
2024-03-04  6:55 ` Jason Wang
2024-03-04 11:23   ` wangyunjian
2024-03-06  2:11     ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=223aeca6435342ec8a4d57c959c23303@huawei.com \
    --to=wangyunjian@huawei.com \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=davem@davemloft.net \
    --cc=jasowang@redhat.com \
    --cc=jonathan.lemon@gmail.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liwei395@huawei.com \
    --cc=maciej.fijalkowski@intel.com \
    --cc=magnus.karlsson@intel.com \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=virtualization@lists.linux.dev \
    --cc=willemdebruijn.kernel@gmail.com \
    --cc=xudingke@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.