From: "Liu, Monk" <Monk.Liu-5C7GfCeVMHo@public.gmane.org>
To: "Koenig,
	Christian" <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>,
	"amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Cc: "Chen, Horace" <Horace.Chen-5C7GfCeVMHo@public.gmane.org>
Subject: RE: [PATCH 13/18] drm/amdgpu:fix driver unloading bug
Date: Mon, 18 Sep 2017 10:12:00 +0000
Message-ID: <BLUPR12MB0449D3944109EA4A7D151A2684630@BLUPR12MB0449.namprd12.prod.outlook.com>
In-Reply-To: <1821bf91-83d8-c933-704d-fcd8db07def1-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Christian,

Let's discuss this patch and the following one, which skips freeing the KIQ MQD to avoid the SAVE_FAIL issue.


For the patch that skips the KIQ MQD deallocation, I think I will drop it and use a new approach:
We allocate the KIQ MQD in the VRAM domain. This BO can then be safely freed when the driver is unloaded, because after the driver is unloaded no one will *usually* change the data in it.
(The exception: a root application can, for example, map visible VRAM and alter the values in it.)

For this patch ("skip unbinding the GART mapping to keep the KIQ MQD always valid"):
On the hypervisor side a couple of hardware components are always working, and they rely on the GMC being kept alive, so this is very different from bare metal. In other words, this is the only way we can do it.

Besides, more patches will come later for L1 secure mode, which forbids the VF from accessing GMC registers. Under L1 secure mode the driver will always skip GMC programming under SRIOV, in both init and fini.

BR Monk
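A minimal, hypothetical model of the unload order in the patch as posted. This is not amdgpu code: every name below is invented for illustration, and `is_vf` stands in for `amdgpu_sriov_vf(adev)`. On a VF, hw_fini unmaps each KCQ through the KIQ before sw_fini frees the KCQ MQDs, while the KIQ MQD and the GART binding are left in place. The KIQ MQD branch reflects the posted patch, which the reply above proposes to replace with a VRAM allocation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical teardown actions; none of these are real amdgpu symbols. */
enum teardown_action { UNMAP_KCQ, HALT_CP, FREE_KCQ_MQD, FREE_KIQ_MQD, UNBIND_GART };

#define MAX_LOG 32

struct model_dev {
	bool is_vf;                        /* stands in for amdgpu_sriov_vf(adev) */
	int num_compute_rings;
	enum teardown_action log[MAX_LOG]; /* actions, in execution order */
	int nlog;
};

static void record(struct model_dev *d, enum teardown_action a)
{
	if (d->nlog < MAX_LOG)
		d->log[d->nlog++] = a;
}

/* hw_fini: a VF unmaps every KCQ through the KIQ; bare metal halts the CP. */
static void model_hw_fini(struct model_dev *d)
{
	if (d->is_vf) {
		for (int i = 0; i < d->num_compute_rings; i++)
			record(d, UNMAP_KCQ);
		return;
	}
	record(d, HALT_CP);
}

/* sw_fini: KCQ MQDs are freed; on a VF the KIQ MQD is kept for RLCV. */
static void model_sw_fini(struct model_dev *d)
{
	for (int i = 0; i < d->num_compute_rings; i++)
		record(d, FREE_KCQ_MQD);
	if (!d->is_vf)
		record(d, FREE_KIQ_MQD);
}

/* gart_fini: on a VF the GART stays bound because other hw still uses it. */
static void model_gart_fini(struct model_dev *d)
{
	if (!d->is_vf)
		record(d, UNBIND_GART);
}

static void model_unload(struct model_dev *d)
{
	model_hw_fini(d);
	model_sw_fini(d);
	model_gart_fini(d);
}

static int count_action(const struct model_dev *d, enum teardown_action a)
{
	int n = 0;

	for (int i = 0; i < d->nlog; i++)
		n += (d->log[i] == a);
	return n;
}

/* True if no UNMAP_KCQ is logged after the first FREE_KCQ_MQD. */
static bool unmaps_precede_frees(const struct model_dev *d)
{
	bool freed = false;

	for (int i = 0; i < d->nlog; i++) {
		if (d->log[i] == FREE_KCQ_MQD)
			freed = true;
		else if (d->log[i] == UNMAP_KCQ && freed)
			return false;
	}
	return true;
}
```

With `is_vf = true`, `model_unload()` should log all KCQ unmaps before any MQD free and log neither `FREE_KIQ_MQD` nor `UNBIND_GART`. With `is_vf = false` it logs `HALT_CP` (standing in for `gfx_v9_0_cp_enable(adev, false)`) plus both frees and the GART unbind.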



-----Original Message-----
From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com] 
Sent: September 18, 2017 17:28
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Chen, Horace <Horace.Chen@amd.com>
Subject: Re: [PATCH 13/18] drm/amdgpu:fix driver unloading bug

On 18.09.2017 at 08:11, Monk Liu wrote:
> [SWDEV-126631] - fix hypervisor save_vf failure that occurred after the
> driver was removed:
> 1. Because the KIQ and KCQ were not unmapped, save_vf fails if the driver
>    frees the MQDs of the KIQ and KCQ.
> 2. The KIQ can't be unmapped since RLCV always needs it, so bo_free on the
>    KIQ MQD should be skipped.
> 3. The KCQ can be unmapped, and should be unmapped during hw_fini.
> 4. RLCV still needs to access other MC addresses from some hw even after
>    the driver is unloaded, so we should not unbind the GART for a VF.
>
> Change-Id: I320487a9a848f41484c5f8cc11be34aca807b424
> Signed-off-by: Horace Chen <horace.chen@amd.com>
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

I absolutely can't judge if this is correct or not, but keeping the GART and KIQ alive after the driver is unloaded sounds really fishy to me.

Isn't there any other clean way of handling this?

Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c |  3 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  5 +++
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 60 +++++++++++++++++++++++++++++++-
>   3 files changed, 66 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> index f437008..2fee071 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> @@ -394,7 +394,8 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
>    */
>   void amdgpu_gart_fini(struct amdgpu_device *adev)
>   {
> -	if (adev->gart.ready) {
> +	/* gart is still used by other hw under SRIOV, don't unbind it */
> +	if (adev->gart.ready && !amdgpu_sriov_vf(adev)) {
>   		/* unbind pages */
>   		amdgpu_gart_unbind(adev, 0, adev->gart.num_cpu_pages);
>   	}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 4f6c68f..bf6656f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -309,6 +309,11 @@ void amdgpu_gfx_compute_mqd_sw_fini(struct amdgpu_device *adev)
>   				      &ring->mqd_ptr);
>   	}
>   
> +	/* don't deallocate KIQ mqd because the bo is still used by RLCV even
> +	 * after the guest VM is shut down */
> +	if (amdgpu_sriov_vf(adev))
> +		return;
> +
>   	ring = &adev->gfx.kiq.ring;
>   	kfree(adev->gfx.mec.mqd_backup[AMDGPU_MAX_COMPUTE_RINGS]);
>   	amdgpu_bo_free_kernel(&ring->mqd_obj,
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 44960b3..a577bbc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -2892,14 +2892,72 @@ static int gfx_v9_0_hw_init(void *handle)
>   	return r;
>   }
>   
> +static int gfx_v9_0_kcq_disable(struct amdgpu_ring *kiq_ring, struct amdgpu_ring *ring)
> +{
> +	struct amdgpu_device *adev = kiq_ring->adev;
> +	uint32_t scratch, tmp = 0;
> +	int r, i;
> +
> +	r = amdgpu_gfx_scratch_get(adev, &scratch);
> +	if (r) {
> +		DRM_ERROR("Failed to get scratch reg (%d).\n", r);
> +		return r;
> +	}
> +	WREG32(scratch, 0xCAFEDEAD);
> +
> +	r = amdgpu_ring_alloc(kiq_ring, 10);
> +	if (r) {
> +		DRM_ERROR("Failed to lock KIQ (%d).\n", r);
> +		amdgpu_gfx_scratch_free(adev, scratch);
> +		return r;
> +	}
> +
> +	/* unmap queues */
> +	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_UNMAP_QUEUES, 4));
> +	amdgpu_ring_write(kiq_ring, /* Q_sel: 0, vmid: 0, engine: 0, num_Q: 1 */
> +						PACKET3_UNMAP_QUEUES_ACTION(1) | /* RESET_QUEUES */
> +						PACKET3_UNMAP_QUEUES_QUEUE_SEL(0) |
> +						PACKET3_UNMAP_QUEUES_ENGINE_SEL(0) |
> +						PACKET3_UNMAP_QUEUES_NUM_QUEUES(1));
> +	amdgpu_ring_write(kiq_ring, PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
> +	amdgpu_ring_write(kiq_ring, 0);
> +	amdgpu_ring_write(kiq_ring, 0);
> +	amdgpu_ring_write(kiq_ring, 0);
> +	/* write to scratch for completion */
> +	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_UCONFIG_REG, 1));
> +	amdgpu_ring_write(kiq_ring, (scratch - PACKET3_SET_UCONFIG_REG_START));
> +	amdgpu_ring_write(kiq_ring, 0xDEADBEEF);
> +	amdgpu_ring_commit(kiq_ring);
> +
> +	for (i = 0; i < adev->usec_timeout; i++) {
> +		tmp = RREG32(scratch);
> +		if (tmp == 0xDEADBEEF)
> +			break;
> +		DRM_UDELAY(1);
> +	}
> +	if (i >= adev->usec_timeout) {
> +		DRM_ERROR("KCQ disable failed (scratch(0x%04X)=0x%08X)\n", scratch, tmp);
> +		r = -EINVAL;
> +	}
> +	amdgpu_gfx_scratch_free(adev, scratch);
> +	return r;
> +}
> +
> +
>   static int gfx_v9_0_hw_fini(void *handle)
>   {
>   	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +	int i, r;
>   
>   	amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
>   	amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
>   	if (amdgpu_sriov_vf(adev)) {
> -		pr_debug("For SRIOV client, shouldn't do anything.\n");
> +		/* disable KCQ to keep CPC from touching memory that is no longer valid */
> +		for (i = 0; i < adev->gfx.num_compute_rings; i++) {
> +			r = gfx_v9_0_kcq_disable(&adev->gfx.kiq.ring, &adev->gfx.compute_ring[i]);
> +			if (r)
> +				return r;
> +		}
>   		return 0;
>   	}
>   	gfx_v9_0_cp_enable(adev, false);
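A userspace sketch of the completion handshake in `gfx_v9_0_kcq_disable()`, assuming a simulated ring, scratch register and command processor (all names here are hypothetical). The driver seeds the scratch register with 0xCAFEDEAD, queues UNMAP_QUEUES (header + 5 dwords) followed by SET_UCONFIG_REG (header + 2 dwords) that writes 0xDEADBEEF, then polls until the marker appears or the timeout expires. That is 9 dwords, which fits the 10 reserved by `amdgpu_ring_alloc(kiq_ring, 10)`. The `PACKET3()` layout follows the PM4 type-3 header; the two opcode values are assumptions recalled from `soc15d.h`, and the UNMAP_QUEUES payload bits are simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PM4 type-3 header: type in bits 31:30, count (body dwords - 1) in
 * bits 29:16, opcode in bits 15:8. */
#define PACKET3(op, n) ((3u << 30) | (((n) & 0x3FFFu) << 16) | (((op) & 0xFFu) << 8))
#define OP_UNMAP_QUEUES    0xA3u /* assumed value of PACKET3_UNMAP_QUEUES */
#define OP_SET_UCONFIG_REG 0x79u /* assumed value of PACKET3_SET_UCONFIG_REG */

#define RING_DWORDS 10 /* the patch reserves 10 dwords on the KIQ ring */
#define MARKER_IDLE 0xCAFEDEADu
#define MARKER_DONE 0xDEADBEEFu

struct model_ring {
	uint32_t buf[RING_DWORDS];
	int wptr;
};

static uint32_t scratch_reg; /* simulated scratch register */

static bool ring_write(struct model_ring *r, uint32_t v)
{
	if (r->wptr >= RING_DWORDS)
		return false; /* would overrun the reserved space */
	r->buf[r->wptr++] = v;
	return true;
}

static int packet3_body_dwords(uint32_t hdr)
{
	return (int)((hdr >> 16) & 0x3FFFu) + 1;
}

/* Emit the same 9 dwords as the patch; payload encodings are simplified. */
static bool model_emit_kcq_disable(struct model_ring *r, uint32_t doorbell,
				   uint32_t scratch_off)
{
	const uint32_t pkt[] = {
		PACKET3(OP_UNMAP_QUEUES, 4),
		1u,          /* ACTION(1) etc.; real bit layout omitted */
		doorbell,    /* real code uses ..._DOORBELL_OFFSET0(doorbell) */
		0, 0, 0,
		PACKET3(OP_SET_UCONFIG_REG, 1),
		scratch_off, /* scratch - PACKET3_SET_UCONFIG_REG_START */
		MARKER_DONE,
	};

	for (size_t i = 0; i < sizeof(pkt) / sizeof(pkt[0]); i++)
		if (!ring_write(r, pkt[i]))
			return false;
	return true;
}

/* Simulated CP: walk the packets and perform the SET_UCONFIG_REG write. */
static void model_cp_execute(const struct model_ring *r)
{
	int i = 0;

	while (i < r->wptr) {
		uint32_t hdr = r->buf[i];

		if (((hdr >> 8) & 0xFFu) == OP_SET_UCONFIG_REG)
			scratch_reg = r->buf[i + 2]; /* body: offset, value */
		i += 1 + packet3_body_dwords(hdr);
	}
}

/* Seed, submit, then poll up to usec_timeout times like the driver. */
static int model_kcq_disable(uint32_t doorbell, bool cp_alive, int usec_timeout)
{
	struct model_ring ring = { .wptr = 0 };

	scratch_reg = MARKER_IDLE;
	if (!model_emit_kcq_disable(&ring, doorbell, 0))
		return -2;
	if (cp_alive)
		model_cp_execute(&ring);
	for (int i = 0; i < usec_timeout; i++) {
		if (scratch_reg == MARKER_DONE)
			return 0;
		/* the driver waits DRM_UDELAY(1) between reads */
	}
	return -1; /* the patch returns -EINVAL here */
}
```

Passing `cp_alive = false` models a CP that never processes the KIQ ring: the scratch register keeps its seed value and the poll times out, which corresponds to the patch's `DRM_ERROR` path.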


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Thread overview: 61+ messages
2017-09-18  6:11 [PATCH 00/18] *** misc patches for SRIOV *** Monk Liu
2017-09-18  6:11   ` [PATCH 01/18] drm/amdgpu/sriov:fix missing error handling Monk Liu
2017-09-18  9:04       ` Christian König
2017-09-18  6:11   ` [PATCH 02/18] drm/amdgpu:no kiq in IH Monk Liu
2017-09-18  9:05       ` Christian König
2017-09-18  6:11   ` [PATCH 03/18] drm/amdgpu/sriov:move in_reset to adev and rename Monk Liu
2017-09-18  9:05       ` Christian König
2017-09-18  6:11   ` [PATCH 04/18] drm/amdgpu/sriov:don't load psp fw during gpu reset Monk Liu
2017-09-18  9:06       ` Christian König
2017-09-20  1:32           ` Quan, Evan
2017-09-20  1:54               ` Liu, Monk
2017-09-18  6:11   ` [PATCH 05/18] drm/amdgpu:make ctx_add_fence interruptible Monk Liu
2017-09-18  9:10       ` Christian König
2017-09-18  6:11   ` [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset Monk Liu
2017-09-18  9:12       ` Christian König
2017-09-18 10:47           ` Liu, Monk
2017-09-18 11:34               ` Christian König
2017-09-20  2:27                   ` Liu, Monk
2017-09-18  6:11   ` [PATCH 07/18] drm/amdgpu:add hdp golden setting register name hint Monk Liu
2017-09-18  9:13       ` Christian König
2017-09-18  6:11   ` [PATCH 08/18] drm/amdgpu:halt when vm fault Monk Liu
2017-09-18  9:14       ` Christian König
2017-09-18  6:11   ` [PATCH 09/18] drm/amdgpu:insert TMZ_BEGIN Monk Liu
2017-09-18  9:15       ` Christian König
2017-09-18  6:11   ` [PATCH 10/18] drm/amdgpu:hdp flush should be put it initialized Monk Liu
2017-09-18  9:16       ` Christian König
2017-09-18  6:11   ` [PATCH 11/18] drm/amdgpu:add vgt_flush for gfx9 Monk Liu
2017-09-18  9:18       ` Christian König
2017-09-18 15:48           ` Marek Olšák
2017-09-18  6:11   ` [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate Monk Liu
2017-09-18  9:19       ` Christian König
2017-09-18 11:03           ` Liu, Monk
2017-09-18 11:39               ` Christian König
2017-09-19  4:04                   ` Liu, Monk
2017-09-19  4:25                       ` Zhou, David(ChunMing)
2017-09-19  6:46                           ` Liu, Monk
2017-09-19  6:50                               ` zhoucm1
2017-09-19  7:00                                   ` Liu, Monk
2017-09-19  7:02                                       ` zhoucm1
2017-09-19  8:30                                           ` Christian König
2017-09-19  9:34                                               ` Zhang, Jerry (Junwei)
2017-09-19 13:42                                               ` Alex Deucher
2017-09-18  6:11   ` [PATCH 13/18] drm/amdgpu:fix driver unloading bug Monk Liu
2017-09-18  9:27       ` Christian König
2017-09-18 10:12           ` Liu, Monk [this message]
2017-09-18 11:53               ` Christian König
2017-09-19  4:14                   ` Liu, Monk
2017-09-19  8:26                       ` Christian König
2017-09-19 11:37                           ` Liu, Monk
2017-09-18  6:11   ` [PATCH 14/18] drm/amdgpu: Fix amdgpu reload failure under SRIOV Monk Liu
2017-09-18  9:10       ` Yu, Xiangliang
2017-09-18  9:31       ` Christian König
2017-09-18 21:07           ` Alex Deucher
2017-09-19  1:52               ` Yu, Xiangliang
2017-09-18  6:11   ` [PATCH 15/18] drm/amdgpu/sriov: fix page fault issue of driver unload Monk Liu
2017-09-18  9:22       ` Christian König
2017-09-18  6:12   ` [PATCH 16/18] drm/amdgpu: increate mailbox polling timeout to 12s Monk Liu
2017-09-18  9:23       ` Christian König
2017-09-18  6:12   ` [PATCH 17/18] drm/amdgpu:fix uvd ring fini routine Monk Liu
2017-09-18  9:25       ` Christian König
2017-09-18  6:12   ` [PATCH 18/18] drm/amdgpu/sriov:init csb for gfxv9 Monk Liu
