From: Aviad Yehezkel <aviadye@dev.mellanox.co.il>
To: 858585 jemmy <jemmy858585@gmail.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Aviad Yehezkel <aviadye@mellanox.com>
Cc: Adi Dotan <adido@mellanox.com>,
	zhang.zhanghailiang@huawei.com,
	Juan Quintela <quintela@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Lidong Chen <lidongchen@tencent.com>
Subject: Re: [Qemu-devel] [PATCH v4 11/12] migration: poll the cm event while wait RDMA work request completion
Date: Sun, 3 Jun 2018 18:04:12 +0300
Message-ID: <c7a6c24d-4699-cfb1-b097-93f62060e36d@dev.mellanox.co.il>
In-Reply-To: <CAOGPPbe=8AOX0S8vBAOmkV_vnb4A0EV_amUif8Qig0PuJQCwFg@mail.gmail.com>

+Gal

Gal, please comment with our findings.

Thanks!


On 5/31/2018 10:36 AM, 858585 jemmy wrote:
> On Thu, May 31, 2018 at 1:33 AM, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
>> * Lidong Chen (jemmy858585@gmail.com) wrote:
>>> If the peer qemu has crashed, the qemu_rdma_wait_comp_channel function
>>> may loop forever. So we should also poll the cm event fd, and when we
>>> receive any cm event, we treat it as an error.
>>>
>>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> I don't understand enough about the way the InfiniBand fds work to
>> fully review this, so I'd appreciate it if someone who does could
>> comment/add their review.
> Hi Aviad:
>      We need your help. I also could not find any documentation about the
> CQ channel event fd and the CM channel event fd.
>      Should we set the events to G_IO_IN | G_IO_HUP | G_IO_ERR, or is
> G_IO_IN enough?
>      pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>      pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>      Thanks.
>
>>> ---
>>>   migration/rdma.c | 35 ++++++++++++++++++++++++-----------
>>>   1 file changed, 24 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>> index 1b9e261..d611a06 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -1489,6 +1489,9 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
>>>    */
>>>   static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>>>   {
>>> +    struct rdma_cm_event *cm_event;
>>> +    int ret = -1;
>>> +
>>>       /*
>>>        * Coroutine doesn't start until migration_fd_process_incoming()
>>>        * so don't yield unless we know we're running inside of a coroutine.
>>> @@ -1504,25 +1507,35 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>>>            * But we need to be able to handle 'cancel' or an error
>>>            * without hanging forever.
>>>            */
>>> -        while (!rdma->error_state  && !rdma->received_error) {
>>> -            GPollFD pfds[1];
>>> +        while (!rdma->error_state && !rdma->received_error) {
>>> +            GPollFD pfds[2];
>>>               pfds[0].fd = rdma->comp_channel->fd;
>>>               pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>>> +            pfds[0].revents = 0;
>>> +
>>> +            pfds[1].fd = rdma->channel->fd;
>>> +            pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
>>> +            pfds[1].revents = 0;
>>> +
>>>               /* 0.1s timeout, should be fine for a 'cancel' */
>>> -            switch (qemu_poll_ns(pfds, 1, 100 * 1000 * 1000)) {
>>> -            case 1: /* fd active */
>>> -                return 0;
>>> +            qemu_poll_ns(pfds, 2, 100 * 1000 * 1000);
>> Shouldn't we still check the return value of this; if it's negative
>> something has gone wrong.
> I will fix this.
> Thanks.
>
>> Dave
>>
>>> -            case 0: /* Timeout, go around again */
>>> -                break;
>>> +            if (pfds[1].revents) {
>>> +                ret = rdma_get_cm_event(rdma->channel, &cm_event);
>>> +                if (!ret) {
>>> +                    rdma_ack_cm_event(cm_event);
>>> +                }
>>> +                error_report("receive cm event while wait comp channel,"
>>> +                             "cm event is %d", cm_event->event);
>>>
>>> -            default: /* Error of some type -
>>> -                      * I don't trust errno from qemu_poll_ns
>>> -                     */
>>> -                error_report("%s: poll failed", __func__);
>>> +                /* consider any rdma communication event as an error */
>>>                   return -EPIPE;
>>>               }
>>>
>>> +            if (pfds[0].revents) {
>>> +                return 0;
>>> +            }
>>> +
>>>               if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
>>>                   /* Bail out and let the cancellation happen */
>>>                   return -EPIPE;
>>> --
>>> 1.8.3.1
>>>
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
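
The polling pattern in the patch quoted above, together with Dave's point
about checking the return value, can be sketched with plain poll(2). This is
only a minimal illustration, not the QEMU code: the function name
wait_two_fds, its return convention (0 for data, 1 for timeout, a negative
errno value otherwise) and the use of raw file descriptors in place of
rdma->comp_channel->fd and rdma->channel->fd are all assumptions made for the
example. With poll(2), POLLERR and POLLHUP are reported in revents whether or
not they are requested in events, so the sketch requests only POLLIN.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait on a "data" fd and a "control" fd at once.
 * Returns 0 if the data fd is ready, 1 on timeout, -EPIPE if the control
 * fd reported any event, or -errno if poll() itself failed.
 */
int wait_two_fds(int data_fd, int ctrl_fd, int timeout_ms)
{
    struct pollfd pfds[2];
    int n;

    assert(data_fd >= 0 && ctrl_fd >= 0);

    pfds[0].fd = data_fd;
    pfds[0].events = POLLIN;
    pfds[0].revents = 0;

    pfds[1].fd = ctrl_fd;
    pfds[1].events = POLLIN;
    pfds[1].revents = 0;

    /* Retry if a signal interrupts the wait. */
    do {
        n = poll(pfds, 2, timeout_ms);
    } while (n < 0 && errno == EINTR);

    if (n < 0) {
        return -errno;      /* poll itself failed */
    }
    if (n == 0) {
        return 1;           /* timeout, caller may go around again */
    }
    if (pfds[1].revents) {
        return -EPIPE;      /* any control-channel event counts as an error */
    }
    if (pfds[0].revents) {
        return 0;           /* data fd active */
    }
    return 1;               /* nothing of interest, treat like a timeout */
}
```

A caller would loop on this helper with a short timeout (the patch uses
0.1s) and re-check its own cancel or error state each time the helper
returns 1.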
