* Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
From: Alex Xu (Hello71) @ 2020-04-06 19:51 UTC
To: linux-mm, dri-devel, linux-kernel, thomas_os
Cc: pv-drivers, linux-graphics-maintainer, Andrew Morton,
Michal Hocko, Matthew Wilcox (Oracle),
Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
Christian König, Dan Williams, Roland Scheidegger
Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
start filling dmesg, and then closing programs causes more BUGs and
hangs, and then everything grinds to a halt (can't start more programs,
can't even reboot through systemd).
Using master and reverting that branch up to that point fixes the
problem.
I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
board with IOMMU enabled.
* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
From: Thomas Hellström (VMware) @ 2020-04-06 20:25 UTC
To: Alex Xu (Hello71), linux-mm, dri-devel, linux-kernel
Cc: pv-drivers, linux-graphics-maintainer, Andrew Morton,
Michal Hocko, Matthew Wilcox (Oracle),
Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
Christian König, Dan Williams, Roland Scheidegger
On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
> Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
> rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
> start filling dmesg, and then closing programs causes more BUGs and
> hangs, and then everything grinds to a halt (can't start more programs,
> can't even reboot through systemd).
>
> Using master and reverting that branch up to that point fixes the
> problem.
>
> I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
> board with IOMMU enabled.
Hmm. That sounds bad. Could you send a copy of your config?
Meanwhile, I'll prepare a small patch that disables the generic TTM
huge_fault() callback used by drivers other than vmwgfx until we've
figured out what's happening.
/Thomas
* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
From: Thomas Hellström (VMware) @ 2020-04-06 21:04 UTC
To: Alex Xu (Hello71), linux-mm, dri-devel, linux-kernel
Cc: pv-drivers, linux-graphics-maintainer, Andrew Morton,
Michal Hocko, Matthew Wilcox (Oracle),
Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
Christian König, Dan Williams, Roland Scheidegger
[-- Attachment #1: Type: text/plain, Size: 631 bytes --]
Hi,
On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
> Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
> rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
> start filling dmesg, and then closing programs causes more BUGs and
> hangs, and then everything grinds to a halt (can't start more programs,
> can't even reboot through systemd).
>
> Using master and reverting that branch up to that point fixes the
> problem.
>
> I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
> board with IOMMU enabled.
If you could try the attached patch, that'd be great!
Thanks,
Thomas
[-- Attachment #2: 0001-drm-ttm-Temporarily-disable-the-huge_fault-callback.patch --]
[-- Type: text/x-patch, Size: 2774 bytes --]
From b630b9b4dcc1d01514d97a84cbb7f0cb85333154 Mon Sep 17 00:00:00 2001
From: "Thomas Hellstrom (VMware)" <thomas_os@shipmail.org>
Date: Mon, 6 Apr 2020 22:55:13 +0200
Subject: [PATCH] drm/ttm: Temporarily disable the huge_fault() callback

Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org>
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 63 ---------------------------------
 1 file changed, 63 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 6ee3b96f0d13..0ad30b112982 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -442,66 +442,6 @@ vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 }
 EXPORT_SYMBOL(ttm_bo_vm_fault);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-/**
- * ttm_pgprot_is_wrprotecting - Is a page protection value write-protecting?
- * @prot: The page protection value
- *
- * Return: true if @prot is write-protecting. false otherwise.
- */
-static bool ttm_pgprot_is_wrprotecting(pgprot_t prot)
-{
-	/*
-	 * This is meant to say "pgprot_wrprotect(prot) == prot" in a generic
-	 * way. Unfortunately there is no generic pgprot_wrprotect.
-	 */
-	return pte_val(pte_wrprotect(__pte(pgprot_val(prot)))) ==
-		pgprot_val(prot);
-}
-
-static vm_fault_t ttm_bo_vm_huge_fault(struct vm_fault *vmf,
-				       enum page_entry_size pe_size)
-{
-	struct vm_area_struct *vma = vmf->vma;
-	pgprot_t prot;
-	struct ttm_buffer_object *bo = vma->vm_private_data;
-	vm_fault_t ret;
-	pgoff_t fault_page_size = 0;
-	bool write = vmf->flags & FAULT_FLAG_WRITE;
-
-	switch (pe_size) {
-	case PE_SIZE_PMD:
-		fault_page_size = HPAGE_PMD_SIZE >> PAGE_SHIFT;
-		break;
-#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
-	case PE_SIZE_PUD:
-		fault_page_size = HPAGE_PUD_SIZE >> PAGE_SHIFT;
-		break;
-#endif
-	default:
-		WARN_ON_ONCE(1);
-		return VM_FAULT_FALLBACK;
-	}
-
-	/* Fallback on write dirty-tracking or COW */
-	if (write && ttm_pgprot_is_wrprotecting(vma->vm_page_prot))
-		return VM_FAULT_FALLBACK;
-
-	ret = ttm_bo_vm_reserve(bo, vmf);
-	if (ret)
-		return ret;
-
-	prot = vm_get_page_prot(vma->vm_flags);
-	ret = ttm_bo_vm_fault_reserved(vmf, prot, 1, fault_page_size);
-	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
-		return ret;
-
-	dma_resv_unlock(bo->base.resv);
-
-	return ret;
-}
-#endif
-
 void ttm_bo_vm_open(struct vm_area_struct *vma)
 {
 	struct ttm_buffer_object *bo = vma->vm_private_data;
@@ -604,9 +544,6 @@ static const struct vm_operations_struct ttm_bo_vm_ops = {
 	.open = ttm_bo_vm_open,
 	.close = ttm_bo_vm_close,
 	.access = ttm_bo_vm_access,
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	.huge_fault = ttm_bo_vm_huge_fault,
-#endif
 };
 
 static struct ttm_buffer_object *ttm_bo_vm_lookup(struct ttm_bo_device *bdev,
-- 
2.21.1
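For readers following the patch: before mapping anything, the removed ttm_bo_vm_huge_fault() makes two decisions. It works out how many base pages one huge entry covers, and it falls back to normal-size PTEs for a write fault on a write-protected mapping (dirty tracking or COW). Below is a minimal userspace model of just those two decisions. It is not kernel code. It assumes x86-64 sizes (4 KiB base pages, 2 MiB PMD entries, 1 GiB PUD entries) and uses a hypothetical single "write" bit in place of the architecture's real pgprot layout. All model_* names are invented for illustration. In the kernel, the PUD case is only compiled in with CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD.

```c
/*
 * Userspace model (not kernel code) of two checks in the removed
 * ttm_bo_vm_huge_fault(). Assumed: x86-64 page sizes and a hypothetical
 * one-bit write permission; all model_* names are illustrative only.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12                      /* assumed 4 KiB base pages */
#define MODEL_HPAGE_PMD_SIZE (UINT64_C(1) << 21) /* assumed 2 MiB PMD entry */
#define MODEL_HPAGE_PUD_SIZE (UINT64_C(1) << 30) /* assumed 1 GiB PUD entry */
#define MODEL_PROT_WRITE (UINT64_C(1) << 1)      /* hypothetical write bit */

enum model_entry_size { MODEL_PE_SIZE_PTE, MODEL_PE_SIZE_PMD, MODEL_PE_SIZE_PUD };

/* Base pages covered by one huge entry; 0 stands for "fall back". */
static uint64_t model_fault_page_size(enum model_entry_size pe_size)
{
	switch (pe_size) {
	case MODEL_PE_SIZE_PMD:
		return MODEL_HPAGE_PMD_SIZE >> MODEL_PAGE_SHIFT;
	case MODEL_PE_SIZE_PUD:
		return MODEL_HPAGE_PUD_SIZE >> MODEL_PAGE_SHIFT;
	default:
		/* The kernel code warns once and returns VM_FAULT_FALLBACK. */
		return 0;
	}
}

/* Models "wrprotect(prot) == prot": clearing the write bit changes nothing. */
static bool model_prot_is_wrprotecting(uint64_t prot)
{
	return (prot & ~MODEL_PROT_WRITE) == prot;
}

/*
 * Models "if (write && ttm_pgprot_is_wrprotecting(vma->vm_page_prot))
 * return VM_FAULT_FALLBACK;": a write to a write-protected mapping must
 * be handled by the ordinary per-page fault path.
 */
static bool model_must_fall_back(bool write, uint64_t vm_page_prot)
{
	return write && model_prot_is_wrprotecting(vm_page_prot);
}
```

Under these assumptions, a PMD fault would map 2 MiB >> 12 = 512 base pages at once, and a PUD fault 1 GiB >> 12 = 262144.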
* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
From: Alex Xu (Hello71) @ 2020-04-07 0:38 UTC
To: dri-devel, linux-kernel, linux-mm, Thomas Hellström (VMware)
Cc: Andrew Morton, Christian König, Dan Williams,
Jérôme Glisse, Kirill A. Shutemov,
linux-graphics-maintainer, Michal Hocko, pv-drivers,
Ralph Campbell, Roland Scheidegger, Matthew Wilcox (Oracle)
Excerpts from Thomas Hellström (VMware)'s message of April 6, 2020 5:04 pm:
> Hi,
>
> If you could try the attached patch, that'd be great!
>
> Thanks,
>
> Thomas
>
Yeah, that works too. Kernel config sent off-list.
Regards,
Alex.
* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
From: Thomas Hellström (VMware) @ 2020-04-07 11:26 UTC
To: Alex Xu (Hello71), dri-devel, linux-kernel, linux-mm
Cc: Andrew Morton, Christian König, Dan Williams,
Jérôme Glisse, Kirill A. Shutemov,
linux-graphics-maintainer, Michal Hocko, pv-drivers,
Ralph Campbell, Roland Scheidegger, Matthew Wilcox (Oracle)
On 4/7/20 2:38 AM, Alex Xu (Hello71) wrote:
> Excerpts from Thomas Hellström (VMware)'s message of April 6, 2020 5:04 pm:
>> Hi,
>>
>> If you could try the attached patch, that'd be great!
>>
>> Thanks,
>>
>> Thomas
>>
> Yeah, that works too. Kernel config sent off-list.
>
> Regards,
> Alex.
Thanks. Do you want me to add your Reported-by: and Tested-by: tags
to this patch?
/Thomas
* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
From: Alex Xu (Hello71) @ 2020-04-07 15:36 UTC
To: dri-devel, linux-kernel, linux-mm, Thomas Hellström (VMware)
Cc: Andrew Morton, Christian König, Dan Williams,
Jérôme Glisse, Kirill A. Shutemov,
linux-graphics-maintainer, Michal Hocko, pv-drivers,
Ralph Campbell, Roland Scheidegger, Matthew Wilcox (Oracle)
Excerpts from Thomas Hellström (VMware)'s message of April 7, 2020 7:26 am:
> On 4/7/20 2:38 AM, Alex Xu (Hello71) wrote:
>> Excerpts from Thomas Hellström (VMware)'s message of April 6, 2020 5:04 pm:
>>> Hi,
>>>
>>> If you could try the attached patch, that'd be great!
>>>
>>> Thanks,
>>>
>>> Thomas
>>>
>> Yeah, that works too. Kernel config sent off-list.
>>
>> Regards,
>> Alex.
>
> Thanks. Do you want me to add your
>
> Reported-by: and Tested-by: To this patch?
>
> /Thomas
>
>
Sure. Shouldn't we fix it properly though?
* Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
From: Thomas Hellström (VMware) @ 2020-04-07 19:57 UTC
To: Alex Xu (Hello71), dri-devel, linux-kernel, linux-mm
Cc: Andrew Morton, Christian König, Dan Williams,
Jérôme Glisse, Kirill A. Shutemov,
linux-graphics-maintainer, Michal Hocko, pv-drivers,
Ralph Campbell, Roland Scheidegger, Matthew Wilcox (Oracle)
On 4/7/20 5:36 PM, Alex Xu (Hello71) wrote:
> Excerpts from Thomas Hellström (VMware)'s message of April 7, 2020 7:26 am:
>> On 4/7/20 2:38 AM, Alex Xu (Hello71) wrote:
>>> Excerpts from Thomas Hellström (VMware)'s message of April 6, 2020 5:04 pm:
>>>> Hi,
>>>>
>>>> If you could try the attached patch, that'd be great!
>>>>
>>>> Thanks,
>>>>
>>>> Thomas
>>>>
>>> Yeah, that works too. Kernel config sent off-list.
>>>
>>> Regards,
>>> Alex.
>> Thanks. Do you want me to add your
>>
>> Reported-by: and Tested-by: To this patch?
>>
>> /Thomas
>>
>>
> Sure. Shouldn't we fix it properly though?
It's still enabled for vmwgfx, where it is reasonably well tested and
where I haven't seen any such errors.
The code this patch removes enables huge page-table entries in some
circumstances for other drivers, but given the problems you're seeing
with amdgpu, it's better to enable it on a per-driver basis after
thorough testing. Since I don't have amdgpu hardware, I'm not sure what
it does differently, and I can't debug the issue properly.
/Thomas
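The per-driver approach described above can be illustrated with a minimal userspace model (not kernel code). As a simplification, it assumes the core fault path tries an optional huge_fault hook first and uses the ordinary fault hook when the hook is absent or declines. The struct and function names are hypothetical and only mimic the shape of vm_operations_struct. A shared ops table without the hook stands in for TTM after this patch; a driver that has been tested with huge entries opts in by installing its own table.

```c
/*
 * Userspace model (not kernel code) of per-driver opt-in for huge faults.
 * All names are hypothetical; they only mimic vm_operations_struct.
 */
#include <assert.h>
#include <stddef.h>

enum model_fault_result { MODEL_FAULT_OK, MODEL_FAULT_FALLBACK };

struct model_vm_ops {
	enum model_fault_result (*fault)(void);      /* base-page path */
	enum model_fault_result (*huge_fault)(void); /* optional huge-entry path */
};

static int model_small_entries; /* faults served with base-page entries */
static int model_huge_entries;  /* faults served with huge entries */

static enum model_fault_result model_fault(void)
{
	model_small_entries++;
	return MODEL_FAULT_OK;
}

static enum model_fault_result model_huge_fault(void)
{
	model_huge_entries++;
	return MODEL_FAULT_OK;
}

/* Shared table used by every driver: no huge_fault hook at all. */
static const struct model_vm_ops model_generic_ops = {
	.fault = model_fault,
};

/* A driver that has been tested with huge entries opts in explicitly. */
static const struct model_vm_ops model_tested_driver_ops = {
	.fault = model_fault,
	.huge_fault = model_huge_fault,
};

/* Try a huge entry only if the hook exists; otherwise use base pages. */
static enum model_fault_result model_handle_fault(const struct model_vm_ops *ops)
{
	assert(ops != NULL && ops->fault != NULL);
	if (ops->huge_fault && ops->huge_fault() != MODEL_FAULT_FALLBACK)
		return MODEL_FAULT_OK;
	return ops->fault();
}
```

Keeping the hook out of the shared table means an untested driver cannot reach the huge-entry path by accident; opting in becomes an explicit, reviewable change in that driver's own code.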