Dmesg doesn't contain anything. There is no backtrace because it's not a
crash. The VA map ioctl just fails with the new flag. It looks like the
flag is considered invalid.
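
A minimal sketch of how that could happen, assuming the VA ioctl checks the
requested flags against an allow-list mask that was not extended for the new
flag. Every sketch_* name below is hypothetical and not taken from the
driver; only the NOALLOC bit value (1 << 9) comes from the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the uapi flags. Only the NOALLOC value
 * (1 << 9) is taken from the quoted patch; the rest are illustrative. */
#define SKETCH_VM_PAGE_READABLE   (1u << 1)
#define SKETCH_VM_PAGE_WRITEABLE  (1u << 2)
#define SKETCH_VM_PAGE_NOALLOC    (1u << 9)

/* Reject any request that carries a bit outside the allow-list. */
static int sketch_check_va_flags(uint32_t flags, uint32_t valid_flags)
{
	if (flags & ~valid_flags)
		return -EINVAL;
	return 0;
}

/* Assumed behaviour before a fix: the new flag is defined in the header,
 * but the allow-list was never extended, so the map request is refused. */
int sketch_va_map_without_noalloc(uint32_t flags)
{
	const uint32_t valid_flags = SKETCH_VM_PAGE_READABLE |
				     SKETCH_VM_PAGE_WRITEABLE;

	return sketch_check_va_flags(flags, valid_flags);
}

/* Assumed behaviour after a fix: the allow-list also names the new flag. */
int sketch_va_map_with_noalloc(uint32_t flags)
{
	const uint32_t valid_flags = SKETCH_VM_PAGE_READABLE |
				     SKETCH_VM_PAGE_WRITEABLE |
				     SKETCH_VM_PAGE_NOALLOC;

	return sketch_check_va_flags(flags, valid_flags);
}

/* Small self-check showing the difference between the two variants. */
void sketch_self_check(void)
{
	uint32_t req = SKETCH_VM_PAGE_READABLE | SKETCH_VM_PAGE_NOALLOC;

	assert(sketch_va_map_without_noalloc(req) == -EINVAL);
	assert(sketch_va_map_with_noalloc(req) == 0);
}
```

If the real ioctl has a check of this shape, it would explain an immediate
error return with nothing logged in dmesg.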

Marek

On Mon., May 16, 2022, 12:13 Christian König, <
ckoenig.leichtzumerken@gmail.com> wrote:

> I don't have access to any gfx10 hardware.
>
> Can you give me a dmesg and/or backtrace, etc.?
>
> I can't push this unless it's working properly.
>
> Christian.
>
> On 16.05.22 at 14:56, Marek Olšák wrote:
>
> Reproduction steps:
> - use mesa/main on gfx10.3 (not sure what other GPUs do)
> - run: radeonsi_mall_noalloc=true glxgears
>
> Marek
>
> On Mon, May 16, 2022 at 7:53 AM Christian König <
> ckoenig.leichtzumerken@gmail.com> wrote:
>
>> Crap, do you have a link to the failure?
>>
>> On 16.05.22 at 13:10, Marek Olšák wrote:
>>
>> I forgot to say: The NOALLOC flag causes an allocation failure, so there
>> is a kernel bug somewhere.
>>
>> Marek
>>
>> On Mon, May 16, 2022 at 7:06 AM Marek Olšák <maraeo@gmail.com> wrote:
>>
>>> FYI, I think it's time to merge this because the Mesa commits are going
>>> to be merged in ~30 minutes if Gitlab CI is green, and that includes
>>> updated amdgpu_drm.h.
>>>
>>> Marek
>>>
>>> On Wed, May 11, 2022 at 2:55 PM Marek Olšák <maraeo@gmail.com> wrote:
>>>
>>>> Ok sounds good.
>>>>
>>>> Marek
>>>>
>>>> On Wed., May 11, 2022, 03:43 Christian König, <
>>>> ckoenig.leichtzumerken@gmail.com> wrote:
>>>>
>>>>> It really *is* a NOALLOC feature. In other words there is no latency
>>>>> improvement on reads because the cache is always checked, even with the
>>>>> noalloc flag set.
>>>>>
>>>>> The only thing it affects is that misses do not enter the cache and so
>>>>> don't cause any additional pressure on evicting cache lines.
>>>>>
>>>>> You might want to double check with the hardware guys, but I'm
>>>>> something like 95% sure that it works this way.
>>>>>
>>>>> Christian.
>>>>>
>>>>> On 11.05.22 at 09:22, Marek Olšák wrote:
>>>>>
>>>>> Bypass means that the contents of the cache are ignored, which
>>>>> decreases latency at the cost of no coherency between bypassed and normal
>>>>> memory requests. NOA (noalloc) means that the cache is checked and can give
>>>>> you cache hits, but misses are not cached and the overall latency is
>>>>> higher. I don't know what the hw does, but I hope it was misnamed and it
>>>>> really means bypass because there is no point in doing cache lookups on
>>>>> every memory request if the driver wants to disable caching to *decrease*
>>>>> latency in the situations when the cache isn't helping.
>>>>>
>>>>> Marek
>>>>>
>>>>> On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo <lijo.lazar@amd.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On 5/11/2022 11:36 AM, Christian König wrote:
>>>>>> > Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL
>>>>>> > entries on write.
>>>>>> >
>>>>>> > How about AMDGPU_VM_PAGE_NO_MALL ?
>>>>>>
>>>>>> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some sort
>>>>>> of attribute which decides LLC behaviour]
>>>>>>
>>>>>> Thanks,
>>>>>> Lijo
>>>>>>
>>>>>> >
>>>>>> > Christian.
>>>>>> >
>>>>>> > On 10.05.22 at 23:21, Marek Olšák wrote:
>>>>>> >> A better name would be:
>>>>>> >> AMDGPU_VM_PAGE_BYPASS_MALL
>>>>>> >>
>>>>>> >> Marek
>>>>>> >>
>>>>>> >> On Fri, May 6, 2022 at 7:23 AM Christian König
>>>>>> >> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>>> >>
>>>>>> >>     Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>>>>>> >>     allocation.
>>>>>> >>
>>>>>> >>     Only compile tested!
>>>>>> >>
>>>>>> >>     Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>> >>     ---
>>>>>> >>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>>>>> >>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>>>>>> >>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>>>>>> >>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>>>>>> >>      4 files changed, 10 insertions(+)
>>>>>> >>
>>>>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>>> >>     index bf97d8f07f57..d8129626581f 100644
>>>>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>>> >>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct amdgpu_device *adev, uint32_t flags)
>>>>>> >>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>>>>>> >>             if (flags & AMDGPU_VM_PAGE_PRT)
>>>>>> >>                     pte_flag |= AMDGPU_PTE_PRT;
>>>>>> >>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>>>>>> >>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>>>>>> >>
>>>>>> >>             if (adev->gmc.gmc_funcs->map_mtype)
>>>>>> >>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>>>>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>>> >>     index b8c79789e1e4..9077dfccaf3c 100644
>>>>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>>> >>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct amdgpu_device *adev,
>>>>>> >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>>>>> >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>>>>> >>
>>>>>> >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>>>>> >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>>>>> >>     +
>>>>>> >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>>>>> >>                     *flags |= AMDGPU_PTE_PRT;
>>>>>> >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>>>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>> >>     index 8d733eeac556..32ee56adb602 100644
>>>>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>> >>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct amdgpu_device *adev,
>>>>>> >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>>>>> >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>>>>> >>
>>>>>> >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>>>>> >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>>>>> >>     +
>>>>>> >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>>>>> >>                     *flags |= AMDGPU_PTE_PRT;
>>>>>> >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>>>>> >>     diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>>>>>> >>     index 57b9d8f0133a..9d71d6330687 100644
>>>>>> >>     --- a/include/uapi/drm/amdgpu_drm.h
>>>>>> >>     +++ b/include/uapi/drm/amdgpu_drm.h
>>>>>> >>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>>>>>> >>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
>>>>>> >>      /* Use Read Write MTYPE instead of default MTYPE */
>>>>>> >>      #define AMDGPU_VM_MTYPE_RW             (5 << 5)
>>>>>> >>     +/* don't allocate MALL */
>>>>>> >>     +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
>>>>>> >>
>>>>>> >>      struct drm_amdgpu_gem_va {
>>>>>> >>             /** GEM object handle */
>>>>>> >>     --
>>>>>> >>     2.25.1
>>>>>> >>
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>
>