Subject: Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"
To: "Kim, Jonathan", "Kuehling, Felix", "Deucher, Alexander"
From: Christian König
Date: Wed, 15 Apr 2020 10:11:11 +0200
Cc: "Russell, Kent", amd-gfx@lists.freedesktop.org

Hi Jon,

> Also cwsr tests fail on Vega20 with or without the revert with the
> same RAS error.

That sounds like the system/setup has a more general problem.

Could it be that we are seeing RAS errors because there really is some
hardware failure, but with the MM path we don't trigger a RAS interrupt?

Thanks,
Christian.

On 14.04.20 at 22:30, Kim, Jonathan wrote:
>
> [AMD Official Use Only - Internal Distribution Only]
>
> If we’re passing the test on the revert, then the only thing that’s
> different is we’re not invalidating HDP and doing a copy to host
> anymore in amdgpu_device_vram_access since the function is still
> called in ttm access_memory with BAR.
>
> Also cwsr tests fail on Vega20 with or without the revert with the
> same RAS error.
>
> Thanks,
>
> Jon
>
> *From:* Kuehling, Felix
> *Sent:* Tuesday, April 14, 2020 2:32 PM
> *To:* Kim, Jonathan; Koenig, Christian; Deucher, Alexander
> *Cc:* Russell, Kent; amd-gfx@lists.freedesktop.org
> *Subject:* Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in
> amdgpu_device_vram_access v2"
>
> I wouldn't call it premature. Reverting is common practice when there
> is a serious regression that isn't fully understood or root-caused. As
> far as I can tell, the problem has been reproduced on multiple
> systems, different GPUs, and clearly regressed to Christian's commit.
> I think that justifies reverting it for now.
>
> I agree with Christian that a general HDP memory access problem
> causing RAS errors would potentially cause problems in other tests as
> well. For example, common operations like GART table updates, GPUVM
> page table updates and PCIe peer2peer accesses in ROCm applications
> use HDP. But we're not seeing obvious problems from those. So we need
> to understand what's special about this test. I asked questions to
> that effect on our other email thread.
>
> Regards,
>   Felix
>
> On 2020-04-14 at 10:51 a.m., Kim, Jonathan wrote:
>
> [AMD Official Use Only - Internal Distribution Only]
>
> I think it’s premature to push this revert.
>
> With more testing, I’m getting failures from different tests or
> sometimes none at all on my machine.
>
> Kent, let’s continue the discussion on the original thread.
>
> Thanks,
>
> Jon
>
> *From:* Koenig, Christian
> *Sent:* Tuesday, April 14, 2020 10:47 AM
> *To:* Deucher, Alexander
> *Cc:* Russell, Kent; amd-gfx@lists.freedesktop.org; Kuehling, Felix;
> Kim, Jonathan
> *Subject:* Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible
> in amdgpu_device_vram_access v2"
>
> That's exactly my concern as well.
>
> This looks a bit like the test creates erroneous data somehow, but
> there doesn't seem to be a RAS check in the MM data path.
>
> And now that we use the BAR path it goes up in flames.
>
> I just don't see how we can create erroneous data in a test case?
>
> Christian.
>
> On 14.04.2020 16:35, "Deucher, Alexander" wrote:
>
> [AMD Public Use]
>
> If this causes an issue, any access to vram via the BAR could
> cause an issue.
>
> Alex
>
> ------------------------------------------------------------------------
> *From:* amd-gfx on behalf of Russell, Kent
> *Sent:* Tuesday, April 14, 2020 10:19 AM
> *To:* Koenig, Christian; amd-gfx@lists.freedesktop.org
> *Cc:* Kuehling, Felix; Kim, Jonathan
> *Subject:* RE: [PATCH] Revert "drm/amdgpu: use the BAR if
> possible in amdgpu_device_vram_access v2"
>
> [AMD Official Use Only - Internal Distribution Only]
>
> On VG20 or MI100, as soon as we run the subtest, we get the
> dmesg output below, and then the kernel ends up hanging. I
> don't know enough about the test itself to know why this is
> occurring, but Jon Kim and Felix were discussing it on a
> separate thread when the issue was first reported, so they can
> hopefully provide some additional information.
>
>  Kent
>
> > -----Original Message-----
> > From: Christian König
> > Sent: Tuesday, April 14, 2020 9:52 AM
> > To: Russell, Kent; amd-gfx@lists.freedesktop.org
> > Subject: Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in
> > amdgpu_device_vram_access v2"
> >
> > On 13.04.20 at 20:20, Kent Russell wrote:
> > > This reverts commit c12b84d6e0d70f1185e6daddfd12afb671791b6e.
> > > The original patch causes a RAS event and subsequent kernel hard-hang
> > > when running the KFDMemoryTest.PtraceAccessInvisibleVram on VG20 and
> > > Arcturus
> > >
> > > dmesg output at hang time:
> > > [drm] RAS event of type ERREVENT_ATHUB_INTERRUPT detected!
> > > amdgpu 0000:67:00.0: GPU reset begin!
> > > Evicting PASID 0x8000 queues
> > > Started evicting pasid 0x8000
> > > qcm fence wait loop timeout expired
> > > The cp might be in an unrecoverable state due to an unsuccessful
> > > queues preemption Failed to evict process queues Failed to suspend
> > > process 0x8000 Finished evicting pasid 0x8000 Started restoring pasid
> > > 0x8000 Finished restoring pasid 0x8000 [drm] UVD VCPU state may lost
> > > due to RAS ERREVENT_ATHUB_INTERRUPT
> > > amdgpu: [powerplay] Failed to send message 0x26, response 0x0
> > > amdgpu: [powerplay] Failed to set soft min gfxclk !
> > > amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
> > > amdgpu: [powerplay] Failed to send message 0x7, response 0x0
> > > amdgpu: [powerplay] [DisableAllSMUFeatures] Failed to disable all smu features!
> > > amdgpu: [powerplay] [DisableDpmTasks] Failed to disable all smu features!
> > > amdgpu: [powerplay] [PowerOffAsic] Failed to disable DPM!
> > > [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP
> > > block failed -5
> >
> > Do you have more information on what's going wrong here since this is a really
> > important patch for KFD debugging.
> >
> > >
> > > Signed-off-by: Kent Russell
> > > Reviewed-by: Christian König
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 ----------------------
> > >  1 file changed, 26 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index cf5d6e585634..a3f997f84020 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -254,32 +254,6 @@ void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
> > >  	uint32_t hi = ~0;
> > >  	uint64_t last;
> > >
> > > -
> > > -#ifdef CONFIG_64BIT
> > > -	last = min(pos + size, adev->gmc.visible_vram_size);
> > > -	if (last > pos) {
> > > -		void __iomem *addr = adev->mman.aper_base_kaddr + pos;
> > > -		size_t count = last - pos;
> > > -
> > > -		if (write) {
> > > -			memcpy_toio(addr, buf, count);
> > > -			mb();
> > > -			amdgpu_asic_flush_hdp(adev, NULL);
> > > -		} else {
> > > -			amdgpu_asic_invalidate_hdp(adev, NULL);
> > > -			mb();
> > > -			memcpy_fromio(buf, addr, count);
> > > -		}
> > > -
> > > -		if (count == size)
> > > -			return;
> > > -
> > > -		pos += count;
> > > -		buf += count / 4;
> > > -		size -= count;
> > > -	}
> > > -#endif
> > > -
> > >  	spin_lock_irqsave(&adev->mmio_idx_lock, flags);
> > >  	for (last = pos + size; pos < last; pos += 4) {
> > >  		uint32_t tmp = pos >> 31;
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Calexander.deucher%40amd.com%7C68e0bfea2a5f4a909ab108d7e07ed164%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637224707637289768&sdata=ttNOHJt0IwywpOIWahKjjuC6OkT1jxduc6iMzYzndpg%3D&reserved=0 > --------------E16AA641935D14B04B1B31B2 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 8bit
Hi Jon,

> Also cwsr tests fail on Vega20 with or without the revert with the same RAS error.

That sounds like the system/setup has a more general problem.

Could it be that we are seeing RAS errors because there really is some hardware failure, but with the MM path we don't trigger a RAS interrupt?

Thanks,
Christian.

Am 14.04.20 um 22:30 schrieb Kim, Jonathan:

[AMD Official Use Only - Internal Distribution Only]

 

If we’re passing the test with the revert, then the only difference is that we’re no longer invalidating the HDP and copying to the host in amdgpu_device_vram_access, since the function is still called from TTM access_memory with the BAR.

 

Also cwsr tests fail on Vega20 with or without the revert with the same RAS error.

 

Thanks,

 

Jon

 

From: Kuehling, Felix <Felix.Kuehling@amd.com>
Sent: Tuesday, April 14, 2020 2:32 PM
To: Kim, Jonathan <Jonathan.Kim@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>
Cc: Russell, Kent <Kent.Russell@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"

 

I wouldn't call it premature. Reverting is standard practice when there is a serious regression that isn't fully understood or root-caused. As far as I can tell, the problem has been reproduced on multiple systems and different GPUs, and was clearly traced to Christian's commit. I think that justifies reverting it for now.

I agree with Christian that a general HDP memory access problem causing RAS errors would potentially cause problems in other tests as well. For example, common operations such as GART table updates, GPUVM page table updates, and PCIe peer-to-peer accesses in ROCm applications use HDP. But we're not seeing obvious problems from those. So we need to understand what's special about this test. I asked questions to that effect on our other email thread.

Regards,
  Felix

Am 2020-04-14 um 10:51 a.m. schrieb Kim, Jonathan:

[AMD Official Use Only - Internal Distribution Only]

 

I think it’s premature to push this revert.

 

With more testing, I’m getting failures from different tests or sometimes none at all on my machine.

 

Kent, let’s continue the discussion on the original thread.

 

Thanks,

 

Jon

 

From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Tuesday, April 14, 2020 10:47 AM
To: Deucher, Alexander <Alexander.Deucher@amd.com>
Cc: Russell, Kent <Kent.Russell@amd.com>; amd-gfx@lists.freedesktop.org; Kuehling, Felix <Felix.Kuehling@amd.com>; Kim, Jonathan <Jonathan.Kim@amd.com>
Subject: Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"

 

That's exactly my concern as well.

 

This looks a bit like the test creates erroneous data somehow, but there doesn't seem to be a RAS check in the MM data path.

 

And now that we use the BAR path it goes up in flames.

 

I just don't see how we can create erroneous data in a test case?

 

Christian.

 

Am 14.04.2020 16:35 schrieb "Deucher, Alexander" <Alexander.Deucher@amd.com>:

[AMD Public Use]

 

If this causes an issue, any access to vram via the BAR could cause an issue.

 

Alex


From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Russell, Kent <Kent.Russell@amd.com>
Sent: Tuesday, April 14, 2020 10:19 AM
To: Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Kuehling, Felix <Felix.Kuehling@amd.com>; Kim, Jonathan <Jonathan.Kim@amd.com>
Subject: RE: [PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"

 

[AMD Official Use Only - Internal Distribution Only]

On VG20 or MI100, as soon as we run the subtest, we get the dmesg output below, and then the kernel ends up hanging. I don't know enough about the test itself to know why this is occurring, but Jon Kim and Felix were discussing it on a separate thread when the issue was first reported, so they can hopefully provide some additional information.

 Kent

> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken@gmail.com>
> Sent: Tuesday, April 14, 2020 9:52 AM
> To: Russell, Kent <Kent.Russell@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in
> amdgpu_device_vram_access v2"
>
> Am 13.04.20 um 20:20 schrieb Kent Russell:
> > This reverts commit c12b84d6e0d70f1185e6daddfd12afb671791b6e.
> > The original patch causes a RAS event and subsequent kernel hard-hang
> > when running the KFDMemoryTest.PtraceAccessInvisibleVram on VG20 and
> > Arcturus
> >
> > dmesg output at hang time:
> > [drm] RAS event of type ERREVENT_ATHUB_INTERRUPT detected!
> > amdgpu 0000:67:00.0: GPU reset begin!
> > Evicting PASID 0x8000 queues
> > Started evicting pasid 0x8000
> > qcm fence wait loop timeout expired
> > The cp might be in an unrecoverable state due to an unsuccessful queues preemption
> > Failed to evict process queues
> > Failed to suspend process 0x8000
> > Finished evicting pasid 0x8000
> > Started restoring pasid 0x8000
> > Finished restoring pasid 0x8000
> > [drm] UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT
> > amdgpu: [powerplay] Failed to send message 0x26, response 0x0
> > amdgpu: [powerplay] Failed to set soft min gfxclk !
> > amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
> > amdgpu: [powerplay] Failed to send message 0x7, response 0x0
> > amdgpu: [powerplay] [DisableAllSMUFeatures] Failed to disable all smu features!
> > amdgpu: [powerplay] [DisableDpmTasks] Failed to disable all smu features!
> > amdgpu: [powerplay] [PowerOffAsic] Failed to disable DPM!
> > [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -5
>
> Do you have more information on what's going wrong here since this is a really
> important patch for KFD debugging.
>
> >
> > Signed-off-by: Kent Russell <kent.russell@amd.com>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 ----------------------
> >   1 file changed, 26 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index cf5d6e585634..a3f997f84020 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -254,32 +254,6 @@ void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
> >      uint32_t hi = ~0;
> >      uint64_t last;
> >
> > -
> > -#ifdef CONFIG_64BIT
> > -   last = min(pos + size, adev->gmc.visible_vram_size);
> > -   if (last > pos) {
> > -           void __iomem *addr = adev->mman.aper_base_kaddr + pos;
> > -           size_t count = last - pos;
> > -
> > -           if (write) {
> > -                   memcpy_toio(addr, buf, count);
> > -                   mb();
> > -                   amdgpu_asic_flush_hdp(adev, NULL);
> > -           } else {
> > -                   amdgpu_asic_invalidate_hdp(adev, NULL);
> > -                   mb();
> > -                   memcpy_fromio(buf, addr, count);
> > -           }
> > -
> > -           if (count == size)
> > -                   return;
> > -
> > -           pos += count;
> > -           buf += count / 4;
> > -           size -= count;
> > -   }
> > -#endif
> > -
> >      spin_lock_irqsave(&adev->mmio_idx_lock, flags);
> >      for (last = pos + size; pos < last; pos += 4) {
> >              uint32_t tmp = pos >> 31;
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

 

 


