Re: [PATCH 3/3] drm/amdgpu: Fix multi-level page table bugs for large BOs

From: "Christian König" <deathsimple-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
To: "Zhang,
	Jerry (Junwei)" <Jerry.Zhang-5C7GfCeVMHo@public.gmane.org>,
	"Kuehling, Felix" <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>,
	"amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>,
	"Koenig,
	Christian" <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>,
	"Zhou,
	David(ChunMing)" <David1.Zhou-5C7GfCeVMHo@public.gmane.org>
Cc: "Deucher,
	Alexander" <Alexander.Deucher-5C7GfCeVMHo@public.gmane.org>,
	"Russell, Kent" <Kent.Russell-5C7GfCeVMHo@public.gmane.org>
Subject: Re: [PATCH 3/3] drm/amdgpu: Fix multi-level page table bugs for large BOs
Date: Wed, 29 Mar 2017 08:52:53 +0200	[thread overview]
Message-ID: <8acdbd65-f99d-3e0e-b61b-318cf27d7a3b@vodafone.de> (raw)
In-Reply-To: <58DB4D02.7020903-5C7GfCeVMHo@public.gmane.org>

Am 29.03.2017 um 07:58 schrieb Zhang, Jerry (Junwei):
> Hi Felix,
>
> Thanks for your illustration with patience.
> I got your meaning then, and thanks to fix that.
>
> > I think you're right, there are some extra high bits in saddr and 
> eaddr, but
> > they get discarded by the modulo-division in the next recursion 
> level. For
> > added clarity the modulo division (or masking of high bits) could be 
> moved up
> > to the caller.
>
> It may be more clear if modulo-division called before the caller.

Yeah, indeed that isn't easy to get on first glance.

Please clean that up and also remove all those WARN_ON(), we don't want 
to spam the system log with backtraces like this.

Regards,
Christian.

> Anyway,
> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
>
>
> On 03/29/2017 11:24 AM, Kuehling, Felix wrote:
>> Hi Jerry,
>>
>> Let me clarify my understanding of the problem and the intention of 
>> the fix.
>>
>> Address range per page: 4KB
>> Address range per PT (level 3): 2MB
>> Address range per PD (level 2): 1GB
>> Address range per PD (level 1): 512GB
>> Address range per PD (level 0): 256TB
>>
>> Imagine for example a mapping that spans multiple level 2 page 
>> directories. Say
>> from 0.5GB to 2.5GB.
>>
>> At level 0:
>> from = 0
>> to = 0
>>
>> At level 1:
>> from = 0
>> to = 2
>>
>> So for level 2 amdgpu_vm_alloc_levels gets called 3 times (pt_idx = 
>> [0..2]).
>> Without my change, it gets called with the same saddr and eaddr 3 
>> times. So it
>> calculates the same "from" and "to" each time:
>> from = 256 % 512 = 256
>> to = 767 % 512 = 255
>>
>> Obviously that doesn't work. What we really want is this (from..to in 
>> level 2
>> when called for pt_idx values from level 1):
>>
>> For pt_idx=0 we want to fill the second half of the level 2 page 
>> directory with
>> page tables:
>> from = 256
>> to = 511
>>
>> For pt_idx=1 we need to fill the entire level 2 page directories with 
>> page tables:
>> from = 0
>> to = 511
>>
>> For pt_idx=2 we want to fill the first half of the level 2 page 
>> directory with
>> page tables:
>> from = 0
>> to = 255
>>
>> This creates a contiguous range of page tables across the three level 
>> 2 page
>> directories that are spanned by the allocation. My change achieves that.
>>
>> I think you're right, there are some extra high bits in saddr and 
>> eaddr, but
>> they get discarded by the modulo-division in the next recursion 
>> level. For
>> added clarity the modulo division (or masking of high bits) could be 
>> moved up
>> to the caller.
>>
>> Regards,
>>    Felix
>>
>>
>> -- 
>> F e l i x   K u e h l i n g
>> SMTS Software Development Engineer | Vertical Workstation/Compute
>> 1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
>> (O) +1(289)695-1597
>>     _     _   _   _____   _____
>>    / \   | \ / | |  _  \  \ _  |
>>   / A \  | \M/ | | |D) )  /|_| |
>> /_/ \_\ |_| |_| |_____/ |__/ \|   facebook.com/AMD | amd.com
>> ------------------------------------------------------------------------------- 
>>
>> *From:* Zhang, Jerry
>> *Sent:* Tuesday, March 28, 2017 10:54:03 PM
>> *To:* Kuehling, Felix; amd-gfx@lists.freedesktop.org; Koenig, 
>> Christian; Zhou,
>> David(ChunMing)
>> *Cc:* Deucher, Alexander; Russell, Kent
>> *Subject:* Re: [PATCH 3/3] drm/amdgpu: Fix multi-level page table 
>> bugs for
>> large BOs
>> On 03/29/2017 09:00 AM, Felix Kuehling wrote:
>>> Fix the start/end address calculation for address ranges that span
>>> multiple page directories in amdgpu_vm_alloc_levels.
>>>
>>> Add WARN_ONs if page tables aren't found. Otherwise the page table
>>> update would just fail silently.
>>>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 ++++++++++---
>>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 818747f..44aa039 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -278,6 +278,8 @@ static int amdgpu_vm_alloc_levels(struct 
>>> amdgpu_device *adev,
>>>        from = (saddr >> shift) % amdgpu_vm_num_entries(adev, level);
>>>        to = (eaddr >> shift) % amdgpu_vm_num_entries(adev, level);
>>>
>>> +     WARN_ON(from > to);
>>> +
>>>        if (to > parent->last_entry_used)
>>>                parent->last_entry_used = to;
>>>
>>> @@ -312,8 +314,11 @@ static int amdgpu_vm_alloc_levels(struct 
>>> amdgpu_device *adev,
>>>                }
>>>
>>>                if (level < adev->vm_manager.num_level) {
>>> -                     r = amdgpu_vm_alloc_levels(adev, vm, entry, 
>>> saddr,
>>> -                                                eaddr, level);
>>> +                     uint64_t sub_saddr = (pt_idx == from) ? saddr 
>>> : 0;
>>> +                     uint64_t sub_eaddr = (pt_idx == to) ? eaddr :
>>> +                             ((1 << shift) - 1);
>>> +                     r = amdgpu_vm_alloc_levels(adev, vm, entry, 
>>> sub_saddr,
>>> +                                                sub_eaddr, level);
>>
>> Do you mean to create a full sub-pt if pt_idx != from and != to?
>> I didn't catch it up clear.
>>
>> In my perspective, we may need to clear the saddr and eaddr upper 
>> level bit
>> before going on to allocate the next level PT.
>> Otherwise, the number of next PT entry will be incorrect, more than 
>> num_entries
>> as the upper level PT number calculated in.
>>
>> e.g.
>> there is a address contains
>> PD + PT1 + PT2 + PT3
>>
>> for the first round, we could get correct "from" and "to" address 
>> with right
>> "shift"(3 blocks), then walk over the address space in PD range.
>>
>> Then for the 2nd round, we cannot get the expected "from" and "to" 
>> address with
>> "shift"(2 blocks), as it walks over the address space in PD + PT1 range.
>> While we'd like it's in range PT1 indeed.
>>
>> The fault way goes on with later round.
>>
>> Perhaps, that's the root cause about the issue you face.
>>
>> Jerry.
>>
>>>                        if (r)
>>>                                return r;
>>>                }
>>> @@ -957,9 +962,11 @@ static struct amdgpu_bo 
>>> *amdgpu_vm_get_pt(struct amdgpu_pte_update_params *p,
>>>                entry = &entry->entries[idx];
>>>        }
>>>
>>> -     if (level)
>>> +     if (WARN_ON(level))
>>>                return NULL;
>>>
>>> +     WARN_ON(!entry->bo);
>>> +
>>>        return entry->bo;
>>>   }
>>>
>>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx