From: Paolo Bonzini <pbonzini@redhat.com>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-kernel@vger.kernel.org, Wanlong Gao <gaowanlong@cn.fujitsu.com>,
	asias@redhat.com, mst@redhat.com, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 0/9] virtio: new API for addition of buffers, scatterlist changes
Date: Thu, 14 Feb 2013 10:23:37 +0100
Message-ID: <511CAD19.2010902@redhat.com>
In-Reply-To: <87r4kjjuyn.fsf@rustcorp.com.au>

On 14/02/2013 07:00, Rusty Russell wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
>> This series adds a different set of APIs for adding a buffer to a
>> virtqueue.  The new API lets you pass the buffers piecewise, wrapping
>> multiple calls to virtqueue_add_sg between virtqueue_start_buf and
>> virtqueue_end_buf.  Letting drivers call virtqueue_add_sg multiple
>> times if they already have a scatterlist provided by someone else
>> simplifies the code and, for virtio-scsi, it saves the copying and
>> related locking.
>
> They are ugly though.  It's convoluted because we do actually know all
> the buffers at once, we don't need a piecemeal API.
>
> As a result, you now have arbitrary changes to the indirect heuristic,
> because the API is now piecemeal.

Note that I have sent v2 of patch 1/9, keeping the original indirect
heuristic.  It was indeed a bad idea to conflate it in this series (it
was born there because originally virtqueue_add_buf was not sharing any
code, but now it's a different story).

> How about this as a first step?
>
> virtio_ring: virtqueue_add_sgs, to add multiple sgs.
>
> virtio_scsi and virtio_blk can really use these, to avoid their current
> hack of copying the whole sg array.
>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

It's much better than the other prototype you had posted, but I still
dislike this...  You pay for additional counting of scatterlists when
the caller knows the number of buffers, and the nested loops aren't
free, either.
My piecemeal API tried hard to keep things as fast as virtqueue_add_buf
when possible; I'm worried that this approach requires a lot more
benchmarking.  Probably you would also need a fast-path
virtqueue_add_buf_single, and (unlike my version) that one couldn't
share much code, if any, with virtqueue_add_sgs.

So I can resend based on this patch, but I'm not sure it's really
better...  Also, see below for a comment.

> @@ -197,8 +213,47 @@ int virtqueue_add_buf(struct virtqueue *_vq,
>  		      void *data,
>  		      gfp_t gfp)
>  {
> +	struct scatterlist *sgs[2];
> +	unsigned int i;
> +
> +	sgs[0] = sg;
> +	sgs[1] = sg + out;
> +
> +	/* Workaround until callers pass well-formed sgs. */
> +	for (i = 0; i < out + in; i++)
> +		sg_unmark_end(sg + i);
> +
> +	sg_unmark_end(sg + out + in);
> +	if (out && in)
> +		sg_unmark_end(sg + out);

What's this second sg_unmark_end block for?  Doesn't it access after
the end of sg?  If you wanted it to be sg_mark_end, that must be:

	if (out)
		sg_mark_end(sg + out - 1);
	if (in)
		sg_mark_end(sg + out + in - 1);

with a corresponding unmark afterwards.

Paolo

> +	return virtqueue_add_sgs(_vq, sgs, out ? 1 : 0, in ? 1 : 0, data, gfp);
> +}
> +
> +/**
> + * virtqueue_add_sgs - expose buffers to other end
> + * @vq: the struct virtqueue we're talking about.
> + * @sgs: array of terminated scatterlists.
> + * @out_num: the number of scatterlists readable by other side
> + * @in_num: the number of scatterlists which are writable (after readable ones)
> + * @data: the token identifying the buffer.
> + * @gfp: how to do memory allocations (if necessary).
> + *
> + * Caller must ensure we don't call this with other virtqueue operations
> + * at the same time (except where noted).
> + *
> + * Returns zero or a negative error (ie. ENOSPC, ENOMEM).
> + */
> +int virtqueue_add_sgs(struct virtqueue *_vq,
> +		      struct scatterlist *sgs[],
> +		      unsigned int out_sgs,
> +		      unsigned int in_sgs,
> +		      void *data,
> +		      gfp_t gfp)
> +{
>  	struct vring_virtqueue *vq = to_vvq(_vq);
> -	unsigned int i, avail, uninitialized_var(prev);
> +	struct scatterlist *sg;
> +	unsigned int i, n, avail, uninitialized_var(prev), total_sg;
>  	int head;
>
>  	START_USE(vq);
> @@ -218,46 +273,59 @@ int virtqueue_add_buf(struct virtqueue *_vq,
>  	}
>  #endif
>
> +	/* Count them first. */
> +	for (i = total_sg = 0; i < out_sgs + in_sgs; i++) {
> +		struct scatterlist *sg;
> +		for (sg = sgs[i]; sg; sg = sg_next(sg))
> +			total_sg++;
> +	}
> +
> +
>  	/* If the host supports indirect descriptor tables, and we have multiple
>  	 * buffers, then go indirect. FIXME: tune this threshold */
> -	if (vq->indirect && (out + in) > 1 && vq->vq.num_free) {
> -		head = vring_add_indirect(vq, sg, out, in, gfp);
> +	if (vq->indirect && total_sg > 1 && vq->vq.num_free) {
> +		head = vring_add_indirect(vq, sgs, total_sg, out_sgs, in_sgs,
> +					  gfp);
>  		if (likely(head >= 0))
>  			goto add_head;
>  	}
>
> -	BUG_ON(out + in > vq->vring.num);
> -	BUG_ON(out + in == 0);
> +	BUG_ON(total_sg > vq->vring.num);
> +	BUG_ON(total_sg == 0);
>
> -	if (vq->vq.num_free < out + in) {
> +	if (vq->vq.num_free < total_sg) {
>  		pr_debug("Can't add buf len %i - avail = %i\n",
> -			 out + in, vq->vq.num_free);
> +			 total_sg, vq->vq.num_free);
>  		/* FIXME: for historical reasons, we force a notify here if
>  		 * there are outgoing parts to the buffer.  Presumably the
>  		 * host should service the ring ASAP. */
> -		if (out)
> +		if (out_sgs)
>  			vq->notify(&vq->vq);
>  		END_USE(vq);
>  		return -ENOSPC;
>  	}
>
>  	/* We're about to use some buffers from the free list.
>  	 */
> -	vq->vq.num_free -= out + in;
> -
> -	head = vq->free_head;
> -	for (i = vq->free_head; out; i = vq->vring.desc[i].next, out--) {
> -		vq->vring.desc[i].flags = VRING_DESC_F_NEXT;
> -		vq->vring.desc[i].addr = sg_phys(sg);
> -		vq->vring.desc[i].len = sg->length;
> -		prev = i;
> -		sg++;
> +	vq->vq.num_free -= total_sg;
> +
> +	head = i = vq->free_head;
> +	for (n = 0; n < out_sgs; n++) {
> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +			vq->vring.desc[i].flags = VRING_DESC_F_NEXT;
> +			vq->vring.desc[i].addr = sg_phys(sg);
> +			vq->vring.desc[i].len = sg->length;
> +			prev = i;
> +			i = vq->vring.desc[i].next;
> +		}
>  	}
> -	for (; in; i = vq->vring.desc[i].next, in--) {
> -		vq->vring.desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
> -		vq->vring.desc[i].addr = sg_phys(sg);
> -		vq->vring.desc[i].len = sg->length;
> -		prev = i;
> -		sg++;
> +	for (; n < (out_sgs + in_sgs); n++) {
> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +			vq->vring.desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
> +			vq->vring.desc[i].addr = sg_phys(sg);
> +			vq->vring.desc[i].len = sg->length;
> +			prev = i;
> +			i = vq->vring.desc[i].next;
> +		}
>  	}
>  	/* Last one doesn't continue. */
>  	vq->vring.desc[prev].flags &= ~VRING_DESC_F_NEXT;
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index ff6714e..6eff15b 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -40,6 +40,13 @@ int virtqueue_add_buf(struct virtqueue *vq,
>  		      void *data,
>  		      gfp_t gfp);
>
> +int virtqueue_add_sgs(struct virtqueue *vq,
> +		      struct scatterlist *sgs[],
> +		      unsigned int out_sgs,
> +		      unsigned int in_sgs,
> +		      void *data,
> +		      gfp_t gfp);
> +
>  void virtqueue_kick(struct virtqueue *vq);
>
>  bool virtqueue_kick_prepare(struct virtqueue *vq);