Ok that sounds like a rather simple bug. I will try to take a look. Thanks, Christian. Am 17.05.22 um 02:12 schrieb Marek Olšák: > Dmesg doesn't contain anything. There is no backtrace because it's not > a crash. The VA map ioctl just fails with the new flag. It looks like > the flag is considered invalid. > > Marek > > On Mon., May 16, 2022, 12:13 Christian König, > wrote: > > I don't have access to any gfx10 hardware. > > Can you give me a dmesg and/or backtrace, etc..? > > I can't push this unless it's working properly. > > Christian. > > Am 16.05.22 um 14:56 schrieb Marek Olšák: >> Reproduction steps: >> - use mesa/main on gfx10.3 (not sure what other GPUs do) >> - run: radeonsi_mall_noalloc=true glxgears >> >> Marek >> >> On Mon, May 16, 2022 at 7:53 AM Christian König >> wrote: >> >> Crap, do you have a link to the failure? >> >> Am 16.05.22 um 13:10 schrieb Marek Olšák: >>> I forgot to say: The NOALLOC flag causes an allocation >>> failure, so there is a kernel bug somewhere. >>> >>> Marek >>> >>> On Mon, May 16, 2022 at 7:06 AM Marek Olšák >>> wrote: >>> >>> FYI, I think it's time to merge this because the Mesa >>> commits are going to be merged in ~30 minutes if Gitlab >>> CI is green, and that includes updated amdgpu_drm.h. >>> >>> Marek >>> >>> On Wed, May 11, 2022 at 2:55 PM Marek Olšák >>> wrote: >>> >>> Ok sounds good. >>> >>> Marek >>> >>> On Wed., May 11, 2022, 03:43 Christian König, >>> wrote: >>> >>> It really *is* a NOALLOC feature. In other words >>> there is no latency improvement on reads because >>> the cache is always checked, even with the >>> noalloc flag set. >>> >>> The only thing it affects is that misses not >>> enter the cache and so don't cause any >>> additional pressure on evicting cache lines. >>> >>> You might want to double check with the hardware >>> guys, but I'm something like 95% sure that it >>> works this way. >>> >>> Christian. >>> >>> Am 11.05.22 um 09:22 schrieb Marek Olšák: >>>> Bypass means that the contents of the cache are >>>> ignored, which decreases latency at the cost of >>>> no coherency between bypassed and normal memory >>>> requests. NOA (noalloc) means that the cache is >>>> checked and can give you cache hits, but misses >>>> are not cached and the overall latency is >>>> higher. I don't know what the hw does, but I >>>> hope it was misnamed and it really means bypass >>>> because there is no point in doing cache >>>> lookups on every memory request if the driver >>>> wants to disable caching to *decrease* latency >>>> in the situations when the cache isn't helping. >>>> >>>> Marek >>>> >>>> On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo >>>> wrote: >>>> >>>> >>>> >>>> On 5/11/2022 11:36 AM, Christian König wrote: >>>> > Mhm, it doesn't really bypass MALL. It >>>> just doesn't allocate any MALL >>>> > entries on write. >>>> > >>>> > How about AMDGPU_VM_PAGE_NO_MALL ? >>>> >>>> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = >>>> last level cache, * = some sort >>>> of attribute which decides LLC behaviour] >>>> >>>> Thanks, >>>> Lijo >>>> >>>> > >>>> > Christian. >>>> > >>>> > Am 10.05.22 um 23:21 schrieb Marek Olšák: >>>> >> A better name would be: >>>> >> AMDGPU_VM_PAGE_BYPASS_MALL >>>> >> >>>> >> Marek >>>> >> >>>> >> On Fri, May 6, 2022 at 7:23 AM Christian >>>> König >>>> >> wrote: >>>> >> >>>> >>     Add the AMDGPU_VM_NOALLOC flag to >>>> let userspace control MALL >>>> >>     allocation. >>>> >> >>>> >>     Only compile tested! >>>> >> >>>> >>  Signed-off-by: Christian König >>>> >>>> >>     --- >>>> >> >>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | >>>> 2 ++ >>>> >>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >>>> | 3 +++ >>>> >>   drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c >>>> | 3 +++ >>>> >>   include/uapi/drm/amdgpu_drm.h         >>>>  | 2 ++ >>>> >>      4 files changed, 10 insertions(+) >>>> >> >>>> >>     diff --git >>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>> >>  b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>> >>     index bf97d8f07f57..d8129626581f 100644 >>>> >>     --- >>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>> >>     +++ >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >>>> >>     @@ -650,6 +650,8 @@ uint64_t >>>> amdgpu_gem_va_map_flags(struct >>>> >>     amdgpu_device *adev, uint32_t flags) >>>> >>   pte_flag |= AMDGPU_PTE_WRITEABLE; >>>> >>             if (flags & AMDGPU_VM_PAGE_PRT) >>>> >>   pte_flag |= AMDGPU_PTE_PRT; >>>> >>     +       if (flags & >>>> AMDGPU_VM_PAGE_NOALLOC) >>>> >>     +    pte_flag |= AMDGPU_PTE_NOALLOC; >>>> >> >>>> >>             if >>>> (adev->gmc.gmc_funcs->map_mtype) >>>> >>   pte_flag |= amdgpu_gmc_map_mtype(adev, >>>> >>     diff --git >>>> a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >>>> >>  b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >>>> >>     index b8c79789e1e4..9077dfccaf3c 100644 >>>> >>     --- >>>> a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >>>> >>     +++ >>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >>>> >>     @@ -613,6 +613,9 @@ static void >>>> gmc_v10_0_get_vm_pte(struct >>>> >>     amdgpu_device *adev, >>>> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK; >>>> >> *flags |= (mapping->flags & >>>> AMDGPU_PTE_MTYPE_NV10_MASK); >>>> >> >>>> >>     +  *flags &= ~AMDGPU_PTE_NOALLOC; >>>> >>     +  *flags |= (mapping->flags & >>>> AMDGPU_PTE_NOALLOC); >>>> >>     + >>>> >>             if (mapping->flags & >>>> AMDGPU_PTE_PRT) { >>>> >>   *flags |= AMDGPU_PTE_PRT; >>>> >>   *flags |= AMDGPU_PTE_SNOOPED; >>>> >>     diff --git >>>> a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c >>>> >>  b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c >>>> >>     index 8d733eeac556..32ee56adb602 100644 >>>> >>     --- >>>> a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c >>>> >>     +++ >>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c >>>> >>     @@ -508,6 +508,9 @@ static void >>>> gmc_v11_0_get_vm_pte(struct >>>> >>     amdgpu_device *adev, >>>> >> *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK; >>>> >> *flags |= (mapping->flags & >>>> AMDGPU_PTE_MTYPE_NV10_MASK); >>>> >> >>>> >>     +  *flags &= ~AMDGPU_PTE_NOALLOC; >>>> >>     +  *flags |= (mapping->flags & >>>> AMDGPU_PTE_NOALLOC); >>>> >>     + >>>> >>             if (mapping->flags & >>>> AMDGPU_PTE_PRT) { >>>> >>   *flags |= AMDGPU_PTE_PRT; >>>> >>   *flags |= AMDGPU_PTE_SNOOPED; >>>> >>     diff --git >>>> a/include/uapi/drm/amdgpu_drm.h >>>> >>  b/include/uapi/drm/amdgpu_drm.h >>>> >>     index 57b9d8f0133a..9d71d6330687 100644 >>>> >>     --- a/include/uapi/drm/amdgpu_drm.h >>>> >>     +++ b/include/uapi/drm/amdgpu_drm.h >>>> >>     @@ -533,6 +533,8 @@ struct >>>> drm_amdgpu_gem_op { >>>> >>      #define AMDGPU_VM_MTYPE_UC      (4 >>>> << 5) >>>> >>      /* Use Read Write MTYPE instead of >>>> default MTYPE */ >>>> >>      #define AMDGPU_VM_MTYPE_RW      (5 >>>> << 5) >>>> >>     +/* don't allocate MALL */ >>>> >>     +#define AMDGPU_VM_PAGE_NOALLOC     >>>>  (1 << 9) >>>> >> >>>> >>      struct drm_amdgpu_gem_va { >>>> >>             /** GEM object handle */ >>>> >>     -- >>>> >>     2.25.1 >>>> >> >>>> > >>>> >>> >> >