Hi Andrey,

We just tried a 5.16 kernel built from the amd-staging-drm-next branch of https://gitlab.freedesktop.org/agd5f/linux.git and found that hotplug did not work out of the box for the ROCm compute stack.

We did not try the rendering stack, since we are currently more focused on AI workloads.

We have also created a patch against the amd-staging-drm-next branch to enable hotplug for the ROCm stack, which was sent in a separate, later email with the same subject. I am also attaching the patch to this email, in case you would prefer to delete that later email.

Best regards,

Shuotao

From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Date: Wednesday, April 6, 2022 at 10:13 PM
To: Shuotao Xu <shuotaoxu@microsoft.com>, amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Ziyue Yang <Ziyue.Yang@microsoft.com>, Lei Qu <Lei.Qu@microsoft.com>, Peng Cheng <pengc@microsoft.com>, Ran Shu <Ran.Shu@microsoft.com>
Subject: [EXTERNAL] Re: Code Review Request for AMDGPU Hotplug Support

Looks like you are using a 5.13 kernel for this work. FYI, we added
hot-plug support for the graphics stack in the 5.14 kernel (see
https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.14-AMDGPU-Hot-Unplug).


I am not sure about the code itself, since it all touches the KFD driver
(the KFD team can comment on that), but I was wondering: if you try a 5.14
kernel, would things just work for you out of the box?

Andrey

On 2022-04-05 22:45, Shuotao Xu wrote:
> Dear AMD Colleagues,
>
> We are from Microsoft Research, and are working on GPU disaggregation
> technology.
>
> We have created a new pull request, "Add PCIe hotplug support for amdgpu"
> (Pull Request #131, RadeonOpenCompute/ROCK-Kernel-Driver)
> <https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/pull/131>, in
> ROCK-Kernel-Driver, which will enable PCIe hot-plug support for amdgpu.
>
> We believe hot-plug support for GPU devices can open doors for many
> advanced applications in the data center in the next few years, and we
> would like to have some reviewers on this PR so we can continue further
> technical discussions around this feature.
>
> Would you please help review this PR?
>
> Thank you very much!
>
> Best regards,
>
> Shuotao Xu
>