Found and fixed. If nobody has any more objections are batter names for the new flags I'm going to push this to amd-staging-drm-next tomorrow. Thanks, Christian. Am 17.05.22 um 08:33 schrieb Christian König: > Ok that sounds like a rather simple bug. I will try to take a look. > > Thanks, > Christian. > > Am 17.05.22 um 02:12 schrieb Marek Olšák: >> Dmesg doesn't contain anything. There is no backtrace because it's >> not a crash. The VA map ioctl just fails with the new flag. It looks >> like the flag is considered invalid. >> >> Marek >> >> On Mon., May 16, 2022, 12:13 Christian König, >> wrote: >> >> I don't have access to any gfx10 hardware. >> >> Can you give me a dmesg and/or backtrace, etc..? >> >> I can't push this unless it's working properly. >> >> Christian. >> >> Am 16.05.22 um 14:56 schrieb Marek Olšák: >>> Reproduction steps: >>> - use mesa/main on gfx10.3 (not sure what other GPUs do) >>> - run: radeonsi_mall_noalloc=true glxgears >>> >>> Marek >>> >>> On Mon, May 16, 2022 at 7:53 AM Christian König >>> wrote: >>> >>> Crap, do you have a link to the failure? >>> >>> Am 16.05.22 um 13:10 schrieb Marek Olšák: >>>> I forgot to say: The NOALLOC flag causes an allocation >>>> failure, so there is a kernel bug somewhere. >>>> >>>> Marek >>>> >>>> On Mon, May 16, 2022 at 7:06 AM Marek Olšák >>>> wrote: >>>> >>>> FYI, I think it's time to merge this because the Mesa >>>> commits are going to be merged in ~30 minutes if Gitlab >>>> CI is green, and that includes updated amdgpu_drm.h. >>>> >>>> Marek >>>> >>>> On Wed, May 11, 2022 at 2:55 PM Marek Olšák >>>> wrote: >>>> >>>> Ok sounds good. >>>> >>>> Marek >>>> >>>> On Wed., May 11, 2022, 03:43 Christian König, >>>> wrote: >>>> >>>> It really *is* a NOALLOC feature. In other >>>> words there is no latency improvement on reads >>>> because the cache is always checked, even with >>>> the noalloc flag set. >>>> >>>> The only thing it affects is that misses not >>>> enter the cache and so don't cause any >>>> additional pressure on evicting cache lines. >>>> >>>> You might want to double check with the >>>> hardware guys, but I'm something like 95% sure >>>> that it works this way. >>>> >>>> Christian. >>>> >>>> Am 11.05.22 um 09:22 schrieb Marek Olšák: >>>>> Bypass means that the contents of the cache >>>>> are ignored, which decreases latency at the >>>>> cost of no coherency between bypassed and >>>>> normal memory requests. NOA (noalloc) means >>>>> that the cache is checked and can give you >>>>> cache hits, but misses are not cached and the >>>>> overall latency is higher. I don't know what >>>>> the hw does, but I hope it was misnamed and it >>>>> really means bypass because there is no point >>>>> in doing cache lookups on every memory request >>>>> if the driver wants to disable caching to >>>>> *decrease* latency in the situations when the >>>>> cache isn't helping. >>>>> >>>>> Marek >>>>> >>>>> On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo >>>>> wrote: >>>>> >>>>> >>>>> >>>>> On 5/11/2022 11:36 AM, Christian König wrote: >>>>> > Mhm, it doesn't really bypass MALL. It >>>>> just doesn't allocate any MALL >>>>> > entries on write. >>>>> > >>>>> > How about AMDGPU_VM_PAGE_NO_MALL ? >>>>> >>>>> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = >>>>> last level cache, * = some sort >>>>> of attribute which decides LLC behaviour] >>>>> >>>>> Thanks, >>>>> Lijo >>>>> >>>>> > >>>>> > Christian. >>>>> > >>>>> > Am 10.05.22 um 23:21 schrieb Marek Olšák: >>>>> >> A better name would be: >>>>> >> AMDGPU_VM_PAGE_BYPASS_MALL >>>>> >> >>>>> >> Marek >>>>> >> >>>>> >> On Fri, May 6, 2022 at 7:23 AM >>>>> Christian König >>>>> >> wrote: >>>>> >> >>>>> >>     Add the AMDGPU_VM_NOALLOC flag to >>>>> let userspace control MALL >>>>> >>     allocation. >>>>> >> >>>>> >>     Only compile tested! >>>>> >> >>>>> >>  Signed-off-by: Christian König >>>>> >>>>> >>     --- >>>>> >> >>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>>> | 2 ++ >>>>> >> >>>>>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | >>>>> 3 +++ >>>>> >> >>>>>   drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | >>>>> 3 +++ >>>>> >>   include/uapi/drm/amdgpu_drm.h         >>>>>  | 2 ++ >>>>> >>      4 files changed, 10 insertions(+) >>>>> >> >>>>> >>     diff --git >>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>>> >>  b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>>> >>     index bf97d8f07f57..d8129626581f 100644 >>>>> >>     --- >>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>>> >>     +++ >>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>>> >>     @@ -650,6 +650,8 @@ uint64_t >>>>> amdgpu_gem_va_map_flags(struct >>>>> >>  amdgpu_device *adev, uint32_t flags) >>>>> >>     pte_flag |= AMDGPU_PTE_WRITEABLE; >>>>> >>             if (flags & AMDGPU_VM_PAGE_PRT) >>>>> >>     pte_flag |= AMDGPU_PTE_PRT; >>>>> >>     +       if (flags & >>>>> AMDGPU_VM_PAGE_NOALLOC) >>>>> >>     +      pte_flag |= AMDGPU_PTE_NOALLOC; >>>>> >> >>>>> >>             if >>>>> (adev->gmc.gmc_funcs->map_mtype) >>>>> >>     pte_flag |= amdgpu_gmc_map_mtype(adev, >>>>> >>     diff --git >>>>> a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >>>>> >>  b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >>>>> >>     index b8c79789e1e4..9077dfccaf3c 100644 >>>>> >>     --- >>>>> a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >>>>> >>     +++ >>>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >>>>> >>     @@ -613,6 +613,9 @@ static void >>>>> gmc_v10_0_get_vm_pte(struct >>>>> >>  amdgpu_device *adev, >>>>> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK; >>>>> >> *flags |= (mapping->flags & >>>>> AMDGPU_PTE_MTYPE_NV10_MASK); >>>>> >> >>>>> >>     +  *flags &= ~AMDGPU_PTE_NOALLOC; >>>>> >>     +  *flags |= (mapping->flags & >>>>> AMDGPU_PTE_NOALLOC); >>>>> >>     + >>>>> >>             if (mapping->flags & >>>>> AMDGPU_PTE_PRT) { >>>>> >>     *flags |= AMDGPU_PTE_PRT; >>>>> >>     *flags |= AMDGPU_PTE_SNOOPED; >>>>> >>     diff --git >>>>> a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c >>>>> >>  b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c >>>>> >>     index 8d733eeac556..32ee56adb602 100644 >>>>> >>     --- >>>>> a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c >>>>> >>     +++ >>>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c >>>>> >>     @@ -508,6 +508,9 @@ static void >>>>> gmc_v11_0_get_vm_pte(struct >>>>> >>  amdgpu_device *adev, >>>>> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK; >>>>> >> *flags |= (mapping->flags & >>>>> AMDGPU_PTE_MTYPE_NV10_MASK); >>>>> >> >>>>> >>     +  *flags &= ~AMDGPU_PTE_NOALLOC; >>>>> >>     +  *flags |= (mapping->flags & >>>>> AMDGPU_PTE_NOALLOC); >>>>> >>     + >>>>> >>             if (mapping->flags & >>>>> AMDGPU_PTE_PRT) { >>>>> >>     *flags |= AMDGPU_PTE_PRT; >>>>> >>     *flags |= AMDGPU_PTE_SNOOPED; >>>>> >>     diff --git >>>>> a/include/uapi/drm/amdgpu_drm.h >>>>> >>  b/include/uapi/drm/amdgpu_drm.h >>>>> >>     index 57b9d8f0133a..9d71d6330687 100644 >>>>> >>     --- a/include/uapi/drm/amdgpu_drm.h >>>>> >>     +++ b/include/uapi/drm/amdgpu_drm.h >>>>> >>     @@ -533,6 +533,8 @@ struct >>>>> drm_amdgpu_gem_op { >>>>> >>      #define AMDGPU_VM_MTYPE_UC       >>>>>  (4 << 5) >>>>> >>      /* Use Read Write MTYPE instead of >>>>> default MTYPE */ >>>>> >>      #define AMDGPU_VM_MTYPE_RW       >>>>>  (5 << 5) >>>>> >>     +/* don't allocate MALL */ >>>>> >>     +#define AMDGPU_VM_PAGE_NOALLOC     >>>>>    (1 << 9) >>>>> >> >>>>> >>      struct drm_amdgpu_gem_va { >>>>> >>             /** GEM object handle */ >>>>> >>     -- >>>>> >>     2.25.1 >>>>> >> >>>>> > >>>>> >>>> >>> >> >