From: Tiwei Bie <tiwei.bie@intel.com>
To: Jason Wang <jasowang@redhat.com>
Cc: mst@redhat.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, wexu@redhat.com
Subject: Re: [PATCH RFC 2/2] virtio_ring: support packed ring
Date: Fri, 16 Mar 2018 18:04:13 +0800	[thread overview]
Message-ID: <20180316100413.vtqwatregzrmhvt3__5134.0028913275$1521194633$gmane$org@debian> (raw)
In-Reply-To: <02a3da02-8226-ba4e-1b47-d25755b2c429@redhat.com>

On Fri, Mar 16, 2018 at 04:34:28PM +0800, Jason Wang wrote:
> On 2018-03-16 15:40, Tiwei Bie wrote:
> > On Fri, Mar 16, 2018 at 02:44:12PM +0800, Jason Wang wrote:
> > > On 2018-03-16 14:10, Tiwei Bie wrote:
> > > > On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
> > > > > On 2018-02-23 19:18, Tiwei Bie wrote:
> > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > ---
> > > > > >     drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
> > > > > >     include/linux/virtio_ring.h  |   8 +-
> > > > > >     2 files changed, 618 insertions(+), 89 deletions(-)
> > [...]
> > > > > >     			      cpu_addr, size, direction);
> > > > > >     }
> > > > > > -static void vring_unmap_one(const struct vring_virtqueue *vq,
> > > > > > -			    struct vring_desc *desc)
> > > > > > +static void vring_unmap_one(const struct vring_virtqueue *vq, void *_desc)
> > > > > >     {
> > > > > Let's split the helpers to packed/split version like other helpers?
> > > > > (Consider the caller has already known the type of vq).
> > > > Okay.
> > > > 
> > > [...]
> > > 
> > > > > > +				desc[i].flags = flags;
> > > > > > +
> > > > > > +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > > > > > +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > > > > > +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> > > > > If it's part of a chain, we only need to do this for the last buffer, I think.
> > > > I'm not sure I've got your point about the "last buffer".
> > > > But, yes, id just needs to be set for the last desc.
> > > Right, I think I meant "last descriptor" :)
> > > 
> > > > > > +			prev = i;
> > > > > > +			i++;
> > > > > It looks to me prev is always i - 1?
> > > > No. prev will be (vq->vring_packed.num - 1) when i becomes 0.
> > > Right, so prev = i ? i - 1 : vq->vring_packed.num - 1.
> > Yes, i wraps together with vq->wrap_counter in the following code:
> > 
> > > > > > +			if (!indirect && i >= vq->vring_packed.num) {
> > > > > > +				i = 0;
> > > > > > +				vq->wrap_counter ^= 1;
> > > > > > +			}
> > 
> > > > > > +		}
> > > > > > +	}
> > > > > > +	for (; n < (out_sgs + in_sgs); n++) {
> > > > > > +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> > > > > > +			if (vring_mapping_error(vq, addr))
> > > > > > +				goto unmap_release;
> > > > > > +
> > > > > > +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> > > > > > +					VRING_DESC_F_WRITE |
> > > > > > +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
> > > > > > +					VRING_DESC_F_USED(!vq->wrap_counter));
> > > > > > +			if (!indirect && i == head)
> > > > > > +				head_flags = flags;
> > > > > > +			else
> > > > > > +				desc[i].flags = flags;
> > > > > > +
> > > > > > +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > > > > > +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > > > > > +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> > > > > > +			prev = i;
> > > > > > +			i++;
> > > > > > +			if (!indirect && i >= vq->vring_packed.num) {
> > > > > > +				i = 0;
> > > > > > +				vq->wrap_counter ^= 1;
> > > > > > +			}
> > > > > > +		}
> > > > > > +	}
> > > > > > +	/* Last one doesn't continue. */
> > > > > > +	if (!indirect && (head + 1) % vq->vring_packed.num == i)
> > > > > > +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > > > > I can't see why we need this here.
> > > > If only one desc is used, we will need to clear the
> > > > VRING_DESC_F_NEXT flag from the head_flags.
> > > Yes, I meant why following desc[prev].flags won't work for this?
> > Because the update of desc[head].flags (in above case,
> > prev == head) has been delayed. The flags is saved in
> > head_flags.
> 
> Ok, but let's try to avoid the modulo operation here, e.g. by tracking the
> number of sgs in a counter.
> 
> And I see lots of duplication in the above two loops. I believe we can unify
> them into a single loop; the only difference is the DMA direction and the
> write flag.

The above implementation for the packed ring is basically
a mirror of the existing implementation in the split ring,
as I want to keep the coding style consistent. Below
is the corresponding code in the split ring:

static inline int virtqueue_add(struct virtqueue *_vq,
				struct scatterlist *sgs[],
				unsigned int total_sg,
				unsigned int out_sgs,
				unsigned int in_sgs,
				void *data,
				void *ctx,
				gfp_t gfp)
{
	......

	for (n = 0; n < out_sgs; n++) {
		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
			if (vring_mapping_error(vq, addr))
				goto unmap_release;

			desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT);
			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
			prev = i;
			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
		}
	}
	for (; n < (out_sgs + in_sgs); n++) {
		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
			if (vring_mapping_error(vq, addr))
				goto unmap_release;

			desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT | VRING_DESC_F_WRITE);
			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
			prev = i;
			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
		}
	}

	......
}

> 
> > 
> > > > > > +	else
> > > > > > +		desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > > > > > +
> > > > > > +	if (indirect) {
> > > > > > +		/* FIXME: to be implemented */
> > > > > > +
> > > > > > +		/* Now that the indirect table is filled in, map it. */
> > > > > > +		dma_addr_t addr = vring_map_single(
> > > > > > +			vq, desc, total_sg * sizeof(struct vring_packed_desc),
> > > > > > +			DMA_TO_DEVICE);
> > > > > > +		if (vring_mapping_error(vq, addr))
> > > > > > +			goto unmap_release;
> > > > > > +
> > > > > > +		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
> > > > > > +					     VRING_DESC_F_AVAIL(wrap_counter) |
> > > > > > +					     VRING_DESC_F_USED(!wrap_counter));
> > > > > > +		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
> > > > > > +		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
> > > > > > +				total_sg * sizeof(struct vring_packed_desc));
> > > > > > +		vq->vring_packed.desc[head].id = cpu_to_virtio32(_vq->vdev, head);
> > > > > > +	}
> > > > > > +
> > > > > > +	/* We're using some buffers from the free list. */
> > > > > > +	vq->vq.num_free -= descs_used;
> > > > > > +
> > > > > > +	/* Update free pointer */
> > > > > > +	if (indirect) {
> > > > > > +		n = head + 1;
> > > > > > +		if (n >= vq->vring_packed.num) {
> > > > > > +			n = 0;
> > > > > > +			vq->wrap_counter ^= 1;
> > > > > > +		}
> > > > > > +		vq->free_head = n;
> > > > > detach_buf_packed() does not even touch free_head here, so need to explain
> > > > > its meaning for packed ring.
> > > > Above code is for indirect support which isn't really
> > > > implemented in this patch yet.
> > > > 
> > > > For your question, free_head stores the index of the
> > > > next avail desc. I'll add a comment for it or move it
> > > > to union and give it a better name in next version.
> > > Yes, something like avail_idx might be better.
> > > 
> > > > > > +	} else
> > > > > > +		vq->free_head = i;
> > > > > ID is only valid in the last descriptor in the list, so head + 1 should be
> > > > > ok too?
> > > > I don't really get your point. The vq->free_head stores
> > > > the index of the next avail desc.
> > > I think I get your idea now, free_head has two meanings:
> > > 
> > > - next avail index
> > > - buffer id
> > In my design, free_head is just the index of the next
> > avail desc.
> > 
> > Driver can set anything to buffer ID.
> 
> Then you need another method to map id to context, e.g. hashing.

I keep the context in desc_state[desc_idx]. So there is
no extra method needed to track the context.

> 
> >   And in my design,
> > I save desc index in buffer ID.
> > 
> > I'll add comments for them.
> > 
> > > If I'm correct, let's better add a comment for this.
> > > 
> > > > > > +
> > > > > > +	/* Store token and indirect buffer state. */
> > > > > > +	vq->desc_state[head].num = descs_used;
> > > > > > +	vq->desc_state[head].data = data;
> > > > > > +	if (indirect)
> > > > > > +		vq->desc_state[head].indir_desc = desc;
> > > > > > +	else
> > > > > > +		vq->desc_state[head].indir_desc = ctx;
> > > > > > +
> > > > > > +	virtio_wmb(vq->weak_barriers);
> > > > > Let's add a comment to explain the barrier here.
> > > > Okay.
> > > > 
> > > > > > +	vq->vring_packed.desc[head].flags = head_flags;
> > > > > > +	vq->num_added++;
> > > > > > +
> > > > > > +	pr_debug("Added buffer head %i to %p\n", head, vq);
> > > > > > +	END_USE(vq);
> > > > > > +
> > > > > > +	return 0;
> > > > > > +
> > > > > > +unmap_release:
> > > > > > +	err_idx = i;
> > > > > > +	i = head;
> > > > > > +
> > > > > > +	for (n = 0; n < total_sg; n++) {
> > > > > > +		if (i == err_idx)
> > > > > > +			break;
> > > > > > +		vring_unmap_one(vq, &desc[i]);
> > > > > > +		i++;
> > > > > > +		if (!indirect && i >= vq->vring_packed.num)
> > > > > > +			i = 0;
> > > > > > +	}
> > > > > > +
> > > > > > +	vq->wrap_counter = wrap_counter;
> > > > > > +
> > > > > > +	if (indirect)
> > > > > > +		kfree(desc);
> > > > > > +
> > > > > > +	END_USE(vq);
> > > > > > +	return -EIO;
> > > > > > +}
> > [...]
> > > > > > @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
> > > > > >     	if (!queue) {
> > > > > >     		/* Try to get a single page. You are my only hope! */
> > > > > > -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> > > > > > +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> > > > > > +							     packed),
> > > > > >     					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
> > > > > >     	}
> > > > > >     	if (!queue)
> > > > > >     		return NULL;
> > > > > > -	queue_size_in_bytes = vring_size(num, vring_align);
> > > > > > -	vring_init(&vring, num, queue, vring_align);
> > > > > > +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
> > > > > > +	if (packed)
> > > > > > +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> > > > > > +	else
> > > > > > +		vring_init(&vring.vring_split, num, queue, vring_align);
> > > > > Let's rename vring_init to vring_init_split() like other helpers?
> > > > The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
> > > > I don't think we can rename it.
> > > I see, then this need more thoughts to unify the API.
> > My thought is to keep the old API as is, and introduce
> > new types and helpers for packed ring.
> 
> I admit it's not a fault of this patch. But we'd better think about this in
> the future, considering we may have new kinds of rings.
> 
> > 
> > More details can be found in this patch:
> > https://lkml.org/lkml/2018/2/23/243
> > (PS. The type which has bit fields is just for reference,
> >   and will be changed in next version.)
> > 
> > Do you have any other suggestions?
> 
> No.

Hmm.. Sorry, I didn't describe my question well.
What I meant was: do you have any suggestions about
the API design for the packed ring in the uapi header?
Currently I introduced the two new helpers below:

static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
				     void *p, unsigned long align);
static inline unsigned vring_packed_size(unsigned int num, unsigned long align);

When new rings are introduced in the future, the
above helpers can't be reused. Maybe we should make
the helpers able to determine the ring type?

Best regards,
Tiwei Bie

> 
> Thanks
> 
> > 
> > Best regards,
> > Tiwei Bie
> > 
> > > > > > -	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > > > > > -				   notify, callback, name);
> > > > > > +	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > > > > > +				   context, notify, callback, name);
> > > > > >     	if (!vq) {
> > > > > >     		vring_free_queue(vdev, queue_size_in_bytes, queue,
> > > > > >     				 dma_addr);
> > [...]
> 
