dev.dpdk.org archive mirror
 help / color / mirror / Atom feed
From: lin li <chuckylinchuckylin@gmail.com>
To: Tiwei Bie <tiwei.bie@intel.com>
Cc: "maxime.coquelin@redhat.com" <maxime.coquelin@redhat.com>,
	 "zhihong.wang@intel.com" <zhihong.wang@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	 "dariusz.stojaczyk@intel.com" <dariusz.stojaczyk@intel.com>,
	 "changpeng.liu@intel.com" <changpeng.liu@intel.com>,
	 "james.r.harris@intel.com" <james.r.harris@intel.com>,
	lilin24 <lilin24@baidu.com>, Ni Xun <nixun@baidu.com>,
	 Zhang Yu <zhangyu31@baidu.com>,
	"mst@redhat.com" <mst@redhat.com>,
	xieyongji@baidu.com
Subject: Re: [dpdk-dev] [PATCH v3] vhost: support inflight share memory protocol feature
Date: Tue, 30 Apr 2019 16:37:58 +0800	[thread overview]
Message-ID: <CAF+hgq2rrH5=ms62Sx+2K-Z5WiEJC=ZpM2uyQEgrQdy2OGiVkw@mail.gmail.com> (raw)
In-Reply-To: <20190429105425.GA12868@___>

Tiwei Bie <tiwei.bie@intel.com> 于2019年4月29日周一 下午6:55写道:
>
> On Mon, Apr 29, 2019 at 12:07:05PM +0800, lin li wrote:
> > Tiwei Bie <tiwei.bie@intel.com> 于2019年4月28日周日 下午7:18写道:
> > > On Fri, Apr 26, 2019 at 05:40:21AM -0400, Li Lin wrote:
> [...]
> > > > @@ -98,12 +102,26 @@ struct rte_vhost_memory {
> > > >       struct rte_vhost_mem_region regions[];
> > > >  };
> > > >
> > > > +typedef struct VhostUserInflightEntry {
> > > > +     uint8_t inflight;
> > > > +} VhostUserInflightEntry;
> > > > +
> > > > +typedef struct VhostInflightInfo {
> > > > +     uint16_t version;
> > > > +     uint16_t last_inflight_io;
> > > > +     uint16_t used_idx;
> > > > +     VhostUserInflightEntry desc[0];
> > > > +} VhostInflightInfo;
> > >
> > > Is there any details on above structure? Why does it not match
> > > QueueRegionSplit or QueueRegionPacked structures described in
> > > qemu/docs/interop/vhost-user.txt?
> >
> > Qemu have its vhost-user backend,
>
> Do you mean contrib/libvhost-user in QEMU?
>
> > qemu did the submission of IO in it.
>
> What does this mean?
>
> > The implementation of dpdk is more general. It is just to mark inflight entry.
>
> Based on the discussion in below threads:
>
> https://patchwork.kernel.org/patch/10753893/
> https://patchwork.kernel.org/patch/10817735/
>
> IIUC, above structure is the extra buffer allocated by slave to
> store the information of inflight descriptors and share with master
> for persistent. All vhost-user backends' implementation should
> follow the vhost-user protocol (including the format of above
> structure) defined in qemu/docs/interop/vhost-user.txt. Right?

Yes you're right. I've confirmed with QEMU that these data structures
are consistent with qemu.
It will be consistent with QEMU in the next version.

>
> > The submission of inflight entry is handle over to different backends.
> > They have their own ways to handle it, such as spdk.
> > So there are some differences in data structure.
> >
> > >
> [...]
> > > > +static int
> > > > +vhost_user_set_inflight_fd(struct virtio_net **pdev, VhostUserMsg *msg,
> > > > +             int main_fd __rte_unused)
> > > > +{
> > > > +     int fd, i;
> > > > +     uint64_t mmap_size, mmap_offset;
> > > > +     uint16_t num_queues, queue_size;
> > > > +     uint32_t pervq_inflight_size;
> > > > +     void *rc;
> > > > +     struct vhost_virtqueue *vq;
> > > > +     struct virtio_net *dev = *pdev;
> > > > +
> > > > +     fd = msg->fds[0];
> > > > +     if (msg->size != sizeof(msg->payload.inflight) || fd < 0) {
> > > > +             RTE_LOG(ERR, VHOST_CONFIG, "Invalid set_inflight_fd message size is %d,fd is %d\n",
> > > > +                     msg->size, fd);
> > > > +             return -1;
> > >
> > > Ditto.
> > >
> > > > +     }
> > > > +
> > > > +     mmap_size = msg->payload.inflight.mmap_size;
> > > > +     mmap_offset = msg->payload.inflight.mmap_offset;
> > > > +     num_queues = msg->payload.inflight.num_queues;
> > > > +     queue_size = msg->payload.inflight.queue_size;
> > > > +     pervq_inflight_size = get_pervq_shm_size(queue_size);
> > > > +
> > > > +     RTE_LOG(INFO, VHOST_CONFIG,
> > > > +             "set_inflight_fd mmap_size: %lu\n", mmap_size);
> > > > +     RTE_LOG(INFO, VHOST_CONFIG,
> > > > +             "set_inflight_fd mmap_offset: %lu\n", mmap_offset);
> > > > +     RTE_LOG(INFO, VHOST_CONFIG,
> > > > +             "set_inflight_fd num_queues: %u\n", num_queues);
> > > > +     RTE_LOG(INFO, VHOST_CONFIG,
> > > > +             "set_inflight_fd queue_size: %u\n", queue_size);
> > > > +     RTE_LOG(INFO, VHOST_CONFIG,
> > > > +             "set_inflight_fd fd: %d\n", fd);
> > > > +     RTE_LOG(INFO, VHOST_CONFIG,
> > > > +             "set_inflight_fd pervq_inflight_size: %d\n",
> > > > +             pervq_inflight_size);
> > > > +
> > > > +     if (dev->inflight_info.addr)
> > > > +             munmap(dev->inflight_info.addr, dev->inflight_info.size);
> > > > +
> > > > +     rc = mmap(0, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> > > > +                     fd, mmap_offset);
> > >
> > > Why call it rc? Maybe addr is a better name?
> >
> > In some scenarios, shared memory is reallocated or resized by qemu, so
> > again mmap is needed.
>
> In which case the shared memory will be reallocated or resized by QEMU?

QEMU replies that it needs to copy inflight share memory data when
live migration occurs.
If the size of inflight buffer on two machines is not the same, QEMU
will reallocate memory according to the old inflight buffer size.

>
> >
> > >
> > > > +     if (rc == MAP_FAILED) {
> > > > +             RTE_LOG(ERR, VHOST_CONFIG, "failed to mmap share memory.\n");
> > > > +             return -1;
> > >
> > > Should always return RTE_VHOST_MSG_RESULT_* in
> > > message handler.
> > >
> > > > +     }
> > > > +
> > > > +     if (dev->inflight_info.fd)
> > > > +             close(dev->inflight_info.fd);
> > > > +
> > > > +     dev->inflight_info.fd = fd;
> > > > +     dev->inflight_info.addr = rc;
> > > > +     dev->inflight_info.size = mmap_size;
> > > > +
> > > > +     for (i = 0; i < num_queues; i++) {
> > > > +             vq = dev->virtqueue[i];
> > > > +             vq->inflight = (VhostInflightInfo *)rc;
> > > > +             rc = (void *)((char *)rc + pervq_inflight_size);
> > > > +     }
> > > > +
> > > > +     return RTE_VHOST_MSG_RESULT_OK;
> > > > +}
> [...]
> > > > diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
> > > > index 2a650fe4b..35f969b1b 100644
> > > > --- a/lib/librte_vhost/vhost_user.h
> > > > +++ b/lib/librte_vhost/vhost_user.h
> > > > @@ -23,7 +23,8 @@
> > > >                                        (1ULL << VHOST_USER_PROTOCOL_F_CRYPTO_SESSION) | \
> > > >                                        (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \
> > > >                                        (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \
> > > > -                                      (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT))
> > > > +                                     (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT) | \
> > > > +                                     (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD))
> > >
> > > It will advertise this feature for builtin net and crypto
> > > backends. It's probably not what you intended.
> > >
> >
> > Indeed, this feature is mainly used for spdk-like backends. You mean
> > this function is disabled by default?
>
> External backends should use rte_vhost_driver_set_protocol_features()
> to advertise the protocol features they support.

Thanks,It will be modified in the next version.

  reply	other threads:[~2019-04-30  8:38 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-25  7:56 [dpdk-dev] [PATCH] [resend] vhost: support inflight share memory protocol feature Li Lin
2019-04-25 10:56 ` lin li
2019-04-25 11:10 ` [dpdk-dev] [PATCH v2] " Li Lin
2019-04-26  9:40   ` [dpdk-dev] [PATCH v3] " Li Lin
2019-04-28 11:17     ` Tiwei Bie
2019-04-29  4:07       ` lin li
2019-04-29 10:54         ` Tiwei Bie
2019-04-30  8:37           ` lin li [this message]
2019-05-05  9:02     ` [dpdk-dev] [PATCH v4] " Li Lin
2019-05-17 15:47       ` Maxime Coquelin
2019-05-20  2:13         ` Tiwei Bie
2019-05-21  8:29           ` lin li
2019-05-21  8:46             ` Tiwei Bie
2019-05-22 10:15               ` [dpdk-dev] 答复: " Li,Lin(ACG Cloud)
2019-05-22 11:04         ` Li,Lin(ACG Cloud)
2019-06-12  8:07           ` Maxime Coquelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAF+hgq2rrH5=ms62Sx+2K-Z5WiEJC=ZpM2uyQEgrQdy2OGiVkw@mail.gmail.com' \
    --to=chuckylinchuckylin@gmail.com \
    --cc=changpeng.liu@intel.com \
    --cc=dariusz.stojaczyk@intel.com \
    --cc=dev@dpdk.org \
    --cc=james.r.harris@intel.com \
    --cc=lilin24@baidu.com \
    --cc=maxime.coquelin@redhat.com \
    --cc=mst@redhat.com \
    --cc=nixun@baidu.com \
    --cc=tiwei.bie@intel.com \
    --cc=xieyongji@baidu.com \
    --cc=zhangyu31@baidu.com \
    --cc=zhihong.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).