From: Michael Galaxy <mgalaxy@akamai.com>
To: Yu Zhang <yu.zhang@ionos.com>, Peter Xu <peterx@redhat.com>
Cc: "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com>,
	"Jinpu Wang" <jinpu.wang@ionos.com>,
	"Elmar Gerdes" <elmar.gerdes@ionos.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Yuval Shaia" <yuval.shaia.ml@gmail.com>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Prasanna Kumar Kalever" <prasanna.kalever@redhat.com>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Michael Roth" <michael.roth@amd.com>,
	"Prasanna Kumar Kalever" <prasanna4324@gmail.com>,
	"integration@gluster.org" <integration@gluster.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"devel@lists.libvirt.org" <devel@lists.libvirt.org>,
	"Hanna Reitz" <hreitz@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Song Gao" <gaosong@loongson.cn>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Wainer dos Santos Moschetta" <wainersm@redhat.com>,
	"Beraldo Leal" <bleal@redhat.com>,
	arei.gonglei@huawei.com, pannengyuan@huawei.com
Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
Date: Mon, 29 Apr 2024 08:08:10 -0500
Message-ID: <46f5e323-632d-7bda-f2c5-3cfa7b1c6b68@akamai.com>
In-Reply-To: <CAHEcVy7POArt+CmY8dyNTzLJp3XxXgjh3k8=C=9K+_cw1CSJFA@mail.gmail.com>

Hi All (and Peter),

My name is Michael Galaxy (formerly Hines). Yes, I changed my last name
(highly irregular for a male) and yes, that's my real last name:
https://www.linkedin.com/in/mrgalaxy/

I'm the original author of the RDMA implementation. I've been talking
with Yu Zhang for a little while about potentially handing over
maintainership of the codebase to his team.

Unfortunately, I have zero access to RoCE or InfiniBand hardware, so
I've never been able to run tests or use what I wrote at work. As all
of you know, if you don't have a way to test something, you can't
maintain it.

Yu Zhang put a (very kind) proposal forward to me to ask the community 
if they feel comfortable training his team to maintain the codebase (and 
run tests) while they learn about it.

If you don't mind, I'd like to let him send over his (very detailed)
proposal.

- Michael

On 4/11/24 11:36, Yu Zhang wrote:
>> 1) Either a CI test covering at least the major RDMA paths, or at least
>>      periodic tests for each QEMU release, will be needed.
> We use a batch of regression test cases for the stack, which also covers
> QEMU. I have run these tests for most of the QEMU releases planned as
> candidates for rollout.
>
> The migration test needs a pair of (either physical or virtual) servers with
> an InfiniBand network, which makes it difficult to run on a single server.
> Nested VMs could be a possible approach, for which we may need a virtual
> InfiniBand network. Is SoftRoCE [1] an option? I will try it and let you know.
>
> [1]  https://enterprise-support.nvidia.com/s/article/howto-configure-soft-roce
>
> Thanks and best regards!
>
> On Thu, Apr 11, 2024 at 4:20 PM Peter Xu <peterx@redhat.com> wrote:
>> On Wed, Apr 10, 2024 at 09:49:15AM -0400, Peter Xu wrote:
>>> On Wed, Apr 10, 2024 at 02:28:59AM +0000, Zhijian Li (Fujitsu) via wrote:
>>>>
>>>> on 4/10/2024 3:46 AM, Peter Xu wrote:
>>>>
>>>>>> Is there a document/link about the unit tests/CI for migration tests? Why
>>>>>> are those tests missing?
>>>>>> Is it hard or very special to set up an environment for that? Maybe we
>>>>>> can help in this regard.
>>>>> See tests/qtest/migration-test.c.  We put most of our migration tests
>>>>> there and that's covered in CI.
>>>>>
>>>>> I think one major issue is that CI systems don't normally have rdma devices.
>>>>> Can an rdma migration test be carried out without real hardware?
>>>> Yeah, RXE, a.k.a. SOFT-RoCE, is able to emulate RDMA, for example:
>>>> $ sudo rdma link add rxe_eth0 type rxe netdev eth0  # on host
>>>> then we get a new RDMA interface "rxe_eth0".
>>>> This new RDMA interface is able to do the QEMU RDMA migration.
>>>>
>>>> Also, the loopback (lo) device can be used to emulate an RDMA interface,
>>>> "rxe_lo". However, when I tried (years ago) to do RDMA migration over this
>>>> interface (rdma:127.0.0.1:3333), something went wrong.
>>>> So I gave up enabling the RDMA migration qtest at that time.
>>> Thanks, Zhijian.
>>>
>>> I'm not sure adding an emu-link for rdma is doable for CI systems, though.
>>> Maybe someone more familiar with how CI works can chime in.
>> Some people got dropped from the cc list for an unknown reason; I'm adding
>> them back (Fabiano, Peter Maydell, Phil).  Let's make sure nobody is dropped
>> by accident.
>>
>> I'll try to summarize what is still missing, and I think these will be
>> greatly helpful if we don't want to deprecate rdma migration:
>>
>>    1) Either a CI test covering at least the major RDMA paths, or at least
>>       periodic tests for each QEMU release, will be needed.
>>
>>    2) Some performance tests comparing modern RDMA and NIC devices are
>>       welcome.  The current understanding is that modern NICs can perform
>>       similarly to RDMA, in which case it's debatable why we still maintain
>>       so much rdma-specific code.
>>
>>    3) This one doesn't need to be solid patchsets, but some plan to improve
>>       the RDMA migration code so that it is not almost isolated from the
>>       rest of the protocols.
>>
>>    4) Someone to look after this code for real.
>>
>> For 2) and 3) more info is here:
>>
>> https://lore.kernel.org/r/ZhWa0YeAb9ySVKD1@x1n
>>
>> Here 4) may be the most important, as Markus pointed out.  We just haven't
>> gotten there yet in the discussions, but maybe Markus is right that we should
>> talk about that first.
>>
>> Thanks,
>>
>> --
>> Peter Xu
>>

