nouveau.lists.freedesktop.org archive mirror
* [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()
@ 2024-04-26 15:41 Lyude Paul
  2024-04-26 15:41 ` [PATCH 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3 Lyude Paul
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Lyude Paul @ 2024-04-26 15:41 UTC (permalink / raw)
  To: nouveau
  Cc: dri-devel, Karol Herbst, Danilo Krummrich, David Airlie,
	Daniel Vetter, Dave Airlie, Ben Skeggs, Justin Stitt, Kees Cook,
	open list

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

  kernel BUG at include/linux/scatterlist.h:187!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
  Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
  RIP: 0010:sg_init_one+0x85/0xa0
  Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
  24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
  0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
  RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
  RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
  RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
  R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
  R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
  FS:  00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
  Call Trace:
   <TASK>
   ? die+0x36/0x90
   ? do_trap+0xdd/0x100
   ? sg_init_one+0x85/0xa0
   ? do_error_trap+0x65/0x80
   ? sg_init_one+0x85/0xa0
   ? exc_invalid_op+0x50/0x70
   ? sg_init_one+0x85/0xa0
   ? asm_exc_invalid_op+0x1a/0x20
   ? sg_init_one+0x85/0xa0
   nvkm_firmware_ctor+0x14a/0x250 [nouveau]
   nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
   ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
   r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
   ? srso_return_thunk+0x5/0x5f
   ? srso_return_thunk+0x5/0x5f
   ? nvkm_udevice_new+0x95/0x140 [nouveau]
   ? srso_return_thunk+0x5/0x5f
   ? srso_return_thunk+0x5/0x5f
   ? ktime_get+0x47/0xb0
   ? srso_return_thunk+0x5/0x5f
   nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
   nvkm_subdev_init_+0x39/0x140 [nouveau]
   ? srso_return_thunk+0x5/0x5f
   nvkm_subdev_init+0x44/0x90 [nouveau]
   nvkm_device_init+0x166/0x2e0 [nouveau]
   nvkm_udevice_init+0x47/0x70 [nouveau]
   nvkm_object_init+0x41/0x1c0 [nouveau]
   nvkm_ioctl_new+0x16a/0x290 [nouveau]
   ? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
   ? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
   nvkm_ioctl+0x126/0x290 [nouveau]
   nvif_object_ctor+0x112/0x190 [nouveau]
   nvif_device_ctor+0x23/0x60 [nouveau]
   nouveau_cli_init+0x164/0x640 [nouveau]
   nouveau_drm_device_init+0x97/0x9e0 [nouveau]
   ? srso_return_thunk+0x5/0x5f
   ? pci_update_current_state+0x72/0xb0
   ? srso_return_thunk+0x5/0x5f
   nouveau_drm_probe+0x12c/0x280 [nouveau]
   ? srso_return_thunk+0x5/0x5f
   local_pci_probe+0x45/0xa0
   pci_device_probe+0xc7/0x270
   really_probe+0xe6/0x3a0
   __driver_probe_device+0x87/0x160
   driver_probe_device+0x1f/0xc0
   __driver_attach+0xec/0x1f0
   ? __pfx___driver_attach+0x10/0x10
   bus_for_each_dev+0x88/0xd0
   bus_add_driver+0x116/0x220
   driver_register+0x59/0x100
   ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
   do_one_initcall+0x5b/0x320
   do_init_module+0x60/0x250
   init_module_from_file+0x86/0xc0
   idempotent_init_module+0x120/0x2b0
   __x64_sys_finit_module+0x5e/0xb0
   do_syscall_64+0x83/0x160
   ? srso_return_thunk+0x5/0x5f
   entry_SYSCALL_64_after_hwframe+0x71/0x79
  RIP: 0033:0x7feeb5cc20cd
  Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
  f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
  ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
  RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
  RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
  RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
  R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
  R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
   </TASK>

We hit this because when initializing firmware of type
NVKM_FIRMWARE_IMG_DMA we allocate coherent memory and then attempt to
include that coherent memory in a scatterlist. What we actually mean to do
here though is to pass a CPU-allocated memory address, since that's the
only thing that would make sense to put in the scatterlist.

Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/nouveau/nvkm/core/firmware.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/core/firmware.c b/drivers/gpu/drm/nouveau/nvkm/core/firmware.c
index adc60b25f8e6c..141b0a513bf52 100644
--- a/drivers/gpu/drm/nouveau/nvkm/core/firmware.c
+++ b/drivers/gpu/drm/nouveau/nvkm/core/firmware.c
@@ -205,7 +205,9 @@ nvkm_firmware_dtor(struct nvkm_firmware *fw)
 		break;
 	case NVKM_FIRMWARE_IMG_DMA:
 		nvkm_memory_unref(&memory);
-		dma_free_coherent(fw->device->dev, sg_dma_len(&fw->mem.sgl), fw->img, fw->phys);
+		dma_unmap_single(fw->device->dev, fw->phys, sg_dma_len(&fw->mem.sgl),
+				 DMA_TO_DEVICE);
+		kfree(fw->img);
 		break;
 	case NVKM_FIRMWARE_IMG_SGT:
 		nvkm_memory_unref(&memory);
@@ -235,14 +237,17 @@ nvkm_firmware_ctor(const struct nvkm_firmware_func *func, const char *name,
 		fw->img = kmemdup(src, fw->len, GFP_KERNEL);
 		break;
 	case NVKM_FIRMWARE_IMG_DMA: {
-		dma_addr_t addr;
-
 		len = ALIGN(fw->len, PAGE_SIZE);
 
-		fw->img = dma_alloc_coherent(fw->device->dev, len, &addr, GFP_KERNEL);
-		if (fw->img) {
-			memcpy(fw->img, src, fw->len);
-			fw->phys = addr;
+		fw->img = kmalloc(len, GFP_KERNEL);
+		if (!fw->img)
+			return -ENOMEM;
+
+		memcpy(fw->img, src, fw->len);
+		fw->phys = dma_map_single(fw->device->dev, fw->img, len, DMA_TO_DEVICE);
+		if (dma_mapping_error(fw->device->dev, fw->phys)) {
+			kfree(fw->img);
+			return -EFAULT;
 		}
 
 		sg_init_one(&fw->mem.sgl, fw->img, len);
-- 
2.44.0



* [PATCH 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3
  2024-04-26 15:41 [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor() Lyude Paul
@ 2024-04-26 15:41 ` Lyude Paul
  2024-04-29  6:03   ` Dave Airlie
  2024-04-26 15:47 ` [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor() Timur Tabi
  2024-04-29 18:23 ` [PATCH v2 " Lyude Paul
  2 siblings, 1 reply; 11+ messages in thread
From: Lyude Paul @ 2024-04-26 15:41 UTC (permalink / raw)
  To: nouveau
  Cc: dri-devel, stable, Karol Herbst, Danilo Krummrich, David Airlie,
	Daniel Vetter, Dave Airlie, Ben Skeggs, Timur Tabi,
	Dan Carpenter, open list

Currently we allocate all 3 levels of radix3 page tables using
nvkm_gsp_mem_ctor(), which uses dma_alloc_coherent() for allocating all of
the relevant memory. This can end up failing in scenarios where the system
has very high memory fragmentation, and we can't find enough contiguous
memory to allocate level 2 of the page table.

Currently, this can result in runtime PM issues on systems where memory
fragmentation is high - as we'll fail to allocate the page table for our
suspend/resume buffer:

  kworker/10:2: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL),
  nodemask=(null),cpuset=/,mems_allowed=0
  CPU: 10 PID: 479809 Comm: kworker/10:2 Not tainted
  6.8.6-201.ChopperV6.fc39.x86_64 #1
  Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024
  Workqueue: pm pm_runtime_work
  Call Trace:
   <TASK>
   dump_stack_lvl+0x64/0x80
   warn_alloc+0x165/0x1e0
   ? __alloc_pages_direct_compact+0xb3/0x2b0
   __alloc_pages_slowpath.constprop.0+0xd7d/0xde0
   __alloc_pages+0x32d/0x350
   __dma_direct_alloc_pages.isra.0+0x16a/0x2b0
   dma_direct_alloc+0x70/0x270
   nvkm_gsp_radix3_sg+0x5e/0x130 [nouveau]
   r535_gsp_fini+0x1d4/0x350 [nouveau]
   nvkm_subdev_fini+0x67/0x150 [nouveau]
   nvkm_device_fini+0x95/0x1e0 [nouveau]
   nvkm_udevice_fini+0x53/0x70 [nouveau]
   nvkm_object_fini+0xb9/0x240 [nouveau]
   nvkm_object_fini+0x75/0x240 [nouveau]
   nouveau_do_suspend+0xf5/0x280 [nouveau]
   nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau]
   pci_pm_runtime_suspend+0x67/0x1e0
   ? __pfx_pci_pm_runtime_suspend+0x10/0x10
   __rpm_callback+0x41/0x170
   ? __pfx_pci_pm_runtime_suspend+0x10/0x10
   rpm_callback+0x5d/0x70
   ? __pfx_pci_pm_runtime_suspend+0x10/0x10
   rpm_suspend+0x120/0x6a0
   pm_runtime_work+0x98/0xb0
   process_one_work+0x171/0x340
   worker_thread+0x27b/0x3a0
   ? __pfx_worker_thread+0x10/0x10
   kthread+0xe5/0x120
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x31/0x50
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1b/0x30

Luckily, we don't actually need to allocate coherent memory for the page
table thanks to being able to pass the GPU a radix3 page table for
suspend/resume data. So, let's rewrite nvkm_gsp_radix3_sg() to use the sg
allocator for level 2. We continue using coherent allocations for lvl0 and
1, since they only take a single page.
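
For reference, here's a rough sketch (not part of the patch) of how the three
levels chain together after this change - note that an order:7 failure like
the one in the trace above means the kernel couldn't find 2^7 = 128
contiguous pages, i.e. 512KiB with 4KiB pages:

  /*
   * lvl0: one coherent page; pte[0] holds the DMA address of the lvl1 page
   * lvl1: one coherent page; pte[i] holds the DMA address of the i-th page
   *       of the lvl2 sg_table
   * lvl2: an sg_table whose pages need not be contiguous; its ptes hold the
   *       DMA address of each page of the payload sgt (ELF/suspend buffer)
   *
   * Only lvl0's DMA address is handed to the GSP (sysmemAddrOfRadix3Elf /
   * sysmemAddrOfSuspendResumeData), so level 2 no longer has to be one big
   * physically-contiguous allocation.
   */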

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
---
 .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h |  4 +-
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 71 ++++++++++++-------
 2 files changed, 47 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
index 6f5d376d8fcc1..a11d16a16c3b2 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
@@ -15,7 +15,9 @@ struct nvkm_gsp_mem {
 };
 
 struct nvkm_gsp_radix3 {
-	struct nvkm_gsp_mem mem[3];
+	struct nvkm_gsp_mem lvl0;
+	struct nvkm_gsp_mem lvl1;
+	struct sg_table lvl2;
 };
 
 int nvkm_gsp_sg(struct nvkm_device *, u64 size, struct sg_table *);
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 9858c1438aa7f..2bf9077d37118 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -1624,7 +1624,7 @@ r535_gsp_wpr_meta_init(struct nvkm_gsp *gsp)
 	meta->magic = GSP_FW_WPR_META_MAGIC;
 	meta->revision = GSP_FW_WPR_META_REVISION;
 
-	meta->sysmemAddrOfRadix3Elf = gsp->radix3.mem[0].addr;
+	meta->sysmemAddrOfRadix3Elf = gsp->radix3.lvl0.addr;
 	meta->sizeOfRadix3Elf = gsp->fb.wpr2.elf.size;
 
 	meta->sysmemAddrOfBootloader = gsp->boot.fw.addr;
@@ -1919,8 +1919,9 @@ nvkm_gsp_sg(struct nvkm_device *device, u64 size, struct sg_table *sgt)
 static void
 nvkm_gsp_radix3_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_radix3 *rx3)
 {
-	for (int i = ARRAY_SIZE(rx3->mem) - 1; i >= 0; i--)
-		nvkm_gsp_mem_dtor(gsp, &rx3->mem[i]);
+	nvkm_gsp_sg_free(gsp->subdev.device, &rx3->lvl2);
+	nvkm_gsp_mem_dtor(gsp, &rx3->lvl1);
+	nvkm_gsp_mem_dtor(gsp, &rx3->lvl0);
 }
 
 /**
@@ -1960,36 +1961,52 @@ static int
 nvkm_gsp_radix3_sg(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size,
 		   struct nvkm_gsp_radix3 *rx3)
 {
-	u64 addr;
+	struct sg_dma_page_iter sg_dma_iter;
+	struct scatterlist *sg;
+	size_t bufsize;
+	u64 *pte;
+	int ret, i, page_idx = 0;
 
-	for (int i = ARRAY_SIZE(rx3->mem) - 1; i >= 0; i--) {
-		u64 *ptes;
-		size_t bufsize;
-		int ret, idx;
-
-		bufsize = ALIGN((size / GSP_PAGE_SIZE) * sizeof(u64), GSP_PAGE_SIZE);
-		ret = nvkm_gsp_mem_ctor(gsp, bufsize, &rx3->mem[i]);
-		if (ret)
-			return ret;
+	ret = nvkm_gsp_mem_ctor(gsp, GSP_PAGE_SIZE, &rx3->lvl0);
+	if (ret)
+		return ret;
 
-		ptes = rx3->mem[i].data;
-		if (i == 2) {
-			struct scatterlist *sgl;
+	ret = nvkm_gsp_mem_ctor(gsp, GSP_PAGE_SIZE, &rx3->lvl1);
+	if (ret)
+		goto lvl1_fail;
 
-			for_each_sgtable_dma_sg(sgt, sgl, idx) {
-				for (int j = 0; j < sg_dma_len(sgl) / GSP_PAGE_SIZE; j++)
-					*ptes++ = sg_dma_address(sgl) + (GSP_PAGE_SIZE * j);
-			}
-		} else {
-			for (int j = 0; j < size / GSP_PAGE_SIZE; j++)
-				*ptes++ = addr + GSP_PAGE_SIZE * j;
+	// Allocate level 2
+	bufsize = ALIGN((size / GSP_PAGE_SIZE) * sizeof(u64), GSP_PAGE_SIZE);
+	ret = nvkm_gsp_sg(gsp->subdev.device, bufsize, &rx3->lvl2);
+	if (ret)
+		goto lvl2_fail;
+
+	// Write the bus address of level 1 to level 0
+	pte = rx3->lvl0.data;
+	*pte = rx3->lvl1.addr;
+
+	// Write the bus address of each page in level 2 to level 1
+	pte = rx3->lvl1.data;
+	for_each_sgtable_dma_page(&rx3->lvl2, &sg_dma_iter, 0)
+		*pte++ = sg_page_iter_dma_address(&sg_dma_iter);
+
+	// Finally, write the bus address of each page in sgt to level 2
+	for_each_sgtable_sg(&rx3->lvl2, sg, i) {
+		pte = sg_virt(sg);
+		for_each_sgtable_dma_page(sgt, &sg_dma_iter, page_idx) {
+			*pte++ = sg_page_iter_dma_address(&sg_dma_iter);
+			page_idx++;
 		}
+	}
 
-		size = rx3->mem[i].size;
-		addr = rx3->mem[i].addr;
+	if (ret) {
+lvl2_fail:
+		nvkm_gsp_mem_dtor(gsp, &rx3->lvl1);
+lvl1_fail:
+		nvkm_gsp_mem_dtor(gsp, &rx3->lvl0);
 	}
 
-	return 0;
+	return ret;
 }
 
 int
@@ -2021,7 +2038,7 @@ r535_gsp_fini(struct nvkm_gsp *gsp, bool suspend)
 		sr = gsp->sr.meta.data;
 		sr->magic = GSP_FW_SR_META_MAGIC;
 		sr->revision = GSP_FW_SR_META_REVISION;
-		sr->sysmemAddrOfSuspendResumeData = gsp->sr.radix3.mem[0].addr;
+		sr->sysmemAddrOfSuspendResumeData = gsp->sr.radix3.lvl0.addr;
 		sr->sizeOfSuspendResumeData = len;
 
 		mbox0 = lower_32_bits(gsp->sr.meta.addr);
-- 
2.44.0



* Re: [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()
  2024-04-26 15:41 [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor() Lyude Paul
  2024-04-26 15:41 ` [PATCH 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3 Lyude Paul
@ 2024-04-26 15:47 ` Timur Tabi
  2024-04-28 15:52   ` Lyude Paul
  2024-04-29 18:23 ` [PATCH v2 " Lyude Paul
  2 siblings, 1 reply; 11+ messages in thread
From: Timur Tabi @ 2024-04-26 15:47 UTC (permalink / raw)
  To: nouveau, lyude
  Cc: bskeggs, kherbst, airlied, justinstitt, dri-devel, keescook,
	airlied, dakr, linux-kernel, daniel

On Fri, 2024-04-26 at 11:41 -0400, Lyude Paul wrote:
> We hit this because when initializing firmware of type
> NVKM_FIRMWARE_IMG_DMA we allocate coherent memory and then attempt to
> include that coherent memory in a scatterlist. 

I'm sure this patch is a good one, and I will try to test it soon, but I am
very curious to know why including coherent memory in a scatterlist is bad.


* Re: [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()
  2024-04-26 15:47 ` [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor() Timur Tabi
@ 2024-04-28 15:52   ` Lyude Paul
  2024-04-28 18:19     ` Timur Tabi
  0 siblings, 1 reply; 11+ messages in thread
From: Lyude Paul @ 2024-04-28 15:52 UTC (permalink / raw)
  To: Timur Tabi, nouveau
  Cc: bskeggs, kherbst, airlied, justinstitt, dri-devel, keescook,
	airlied, dakr, linux-kernel, daniel

On Fri, 2024-04-26 at 15:47 +0000, Timur Tabi wrote:
> On Fri, 2024-04-26 at 11:41 -0400, Lyude Paul wrote:
> > We hit this because when initializing firmware of type
> > NVKM_FIRMWARE_IMG_DMA we allocate coherent memory and then attempt
> > to
> > include that coherent memory in a scatterlist. 
> 
> I'm sure this patch is a good one, and I will try to test it soon,
> but I am
> very curious to know why including coherent memory in a scatterlist
> is bad.

Thanks for asking this - I think you unintentionally pointed out that the
explanation I gave doesn't make sense, so I looked a bit more into it.
The issue isn't coherent memory in the scatterlist per se, the issue is
that we're allocating with dma_alloc_coherent(). According to the comment
in dma_alloc_attrs() (which dma_alloc_coherent() is just a wrapper for):

   /*
    * DMA allocations can never be turned back into a page pointer, so
    * requesting compound pages doesn't make sense (and can't even be
    * supported at all by various backends).
    */
   if (WARN_ON_ONCE(flag & __GFP_COMP))
   	return NULL;

Which explains the check in sg_set_buf() that this patch stops us from
hitting:

   BUG_ON(!virt_addr_valid(buf));

Scatterlists need page pointers (sg_set_buf() uses one a little further down:)

   sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));

We can't get a page pointer from an allocation made by
dma_alloc_coherent(), but we can from a plain kmalloc() allocation (which
is what this patch switches to). I'll fix the patch explanation in the
next version - I have to send out another version anyhow, since I realized
that patch #2 still needs one more check to work properly.
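
As a rough illustration (a sketch only - not code from the patch, and it
assumes a valid struct device *dev), the difference boils down to:

   struct scatterlist sgl;
   dma_addr_t dma;
   /* error handling omitted for brevity */
   void *coherent = dma_alloc_coherent(dev, PAGE_SIZE, &dma, GFP_KERNEL);
   void *plain = kmalloc(PAGE_SIZE, GFP_KERNEL);

   /* Fine: kmalloc() memory lives in the kernel's direct mapping, so
    * virt_addr_valid() and virt_to_page() both work on it. */
   sg_init_one(&sgl, plain, PAGE_SIZE);

   /* Not fine: dma_alloc_coherent() memory may come from a remapped or
    * otherwise non-linear region, so virt_addr_valid() can fail on it and,
    * with CONFIG_DEBUG_SG enabled, sg_set_buf() hits the BUG_ON() above. */
   sg_init_one(&sgl, coherent, PAGE_SIZE);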
-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat



* Re: [PATCH v2 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3
  2024-04-29 18:23   ` [PATCH v2 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3 Lyude Paul
@ 2024-04-28 16:36     ` Ben Skeggs
  0 siblings, 0 replies; 11+ messages in thread
From: Ben Skeggs @ 2024-04-28 16:36 UTC (permalink / raw)
  To: nouveau

On 30/4/24 04:23, Lyude Paul wrote:

> Currently we allocate all 3 levels of radix3 page tables using
> nvkm_gsp_mem_ctor(), which uses dma_alloc_coherent() for allocating all of
> the relevant memory. This can end up failing in scenarios where the system
> has very high memory fragmentation, and we can't find enough contiguous
> memory to allocate level 2 of the page table.
>
> Currently, this can result in runtime PM issues on systems where memory
> fragmentation is high - as we'll fail to allocate the page table for our
> suspend/resume buffer:
>
>    kworker/10:2: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL),
>    nodemask=(null),cpuset=/,mems_allowed=0
>    CPU: 10 PID: 479809 Comm: kworker/10:2 Not tainted
>    6.8.6-201.ChopperV6.fc39.x86_64 #1
>    Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024
>    Workqueue: pm pm_runtime_work
>    Call Trace:
>     <TASK>
>     dump_stack_lvl+0x64/0x80
>     warn_alloc+0x165/0x1e0
>     ? __alloc_pages_direct_compact+0xb3/0x2b0
>     __alloc_pages_slowpath.constprop.0+0xd7d/0xde0
>     __alloc_pages+0x32d/0x350
>     __dma_direct_alloc_pages.isra.0+0x16a/0x2b0
>     dma_direct_alloc+0x70/0x270
>     nvkm_gsp_radix3_sg+0x5e/0x130 [nouveau]
>     r535_gsp_fini+0x1d4/0x350 [nouveau]
>     nvkm_subdev_fini+0x67/0x150 [nouveau]
>     nvkm_device_fini+0x95/0x1e0 [nouveau]
>     nvkm_udevice_fini+0x53/0x70 [nouveau]
>     nvkm_object_fini+0xb9/0x240 [nouveau]
>     nvkm_object_fini+0x75/0x240 [nouveau]
>     nouveau_do_suspend+0xf5/0x280 [nouveau]
>     nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau]
>     pci_pm_runtime_suspend+0x67/0x1e0
>     ? __pfx_pci_pm_runtime_suspend+0x10/0x10
>     __rpm_callback+0x41/0x170
>     ? __pfx_pci_pm_runtime_suspend+0x10/0x10
>     rpm_callback+0x5d/0x70
>     ? __pfx_pci_pm_runtime_suspend+0x10/0x10
>     rpm_suspend+0x120/0x6a0
>     pm_runtime_work+0x98/0xb0
>     process_one_work+0x171/0x340
>     worker_thread+0x27b/0x3a0
>     ? __pfx_worker_thread+0x10/0x10
>     kthread+0xe5/0x120
>     ? __pfx_kthread+0x10/0x10
>     ret_from_fork+0x31/0x50
>     ? __pfx_kthread+0x10/0x10
>     ret_from_fork_asm+0x1b/0x30
>
> Luckily, we don't actually need to allocate coherent memory for the page
> table thanks to being able to pass the GPU a radix3 page table for
> suspend/resume data. So, let's rewrite nvkm_gsp_radix3_sg() to use the sg
> allocator for level 2. We continue using coherent allocations for lvl0 and
> 1, since they only take a single page.
>
> V2:
> * Don't forget to actually jump to the next scatterlist when we reach the
>    end of the scatterlist we're currently on when writing out the page table
>    for level 2
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: stable@vger.kernel.org

Hey Lyude,

These are looking good!  Thank you for looking at this issue. For both 
patches:

Reviewed-by: Ben Skeggs <bskeggs@nvidia.com>


Ben.

> ---
>   .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h |  4 +-
>   .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 77 ++++++++++++-------
>   2 files changed, 54 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> index 6f5d376d8fcc1..a11d16a16c3b2 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> @@ -15,7 +15,9 @@ struct nvkm_gsp_mem {
>   };
>   
>   struct nvkm_gsp_radix3 {
> -	struct nvkm_gsp_mem mem[3];
> +	struct nvkm_gsp_mem lvl0;
> +	struct nvkm_gsp_mem lvl1;
> +	struct sg_table lvl2;
>   };
>   
>   int nvkm_gsp_sg(struct nvkm_device *, u64 size, struct sg_table *);
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> index 9858c1438aa7f..fd4e80ba6adfc 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> @@ -1624,7 +1624,7 @@ r535_gsp_wpr_meta_init(struct nvkm_gsp *gsp)
>   	meta->magic = GSP_FW_WPR_META_MAGIC;
>   	meta->revision = GSP_FW_WPR_META_REVISION;
>   
> -	meta->sysmemAddrOfRadix3Elf = gsp->radix3.mem[0].addr;
> +	meta->sysmemAddrOfRadix3Elf = gsp->radix3.lvl0.addr;
>   	meta->sizeOfRadix3Elf = gsp->fb.wpr2.elf.size;
>   
>   	meta->sysmemAddrOfBootloader = gsp->boot.fw.addr;
> @@ -1919,8 +1919,9 @@ nvkm_gsp_sg(struct nvkm_device *device, u64 size, struct sg_table *sgt)
>   static void
>   nvkm_gsp_radix3_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_radix3 *rx3)
>   {
> -	for (int i = ARRAY_SIZE(rx3->mem) - 1; i >= 0; i--)
> -		nvkm_gsp_mem_dtor(gsp, &rx3->mem[i]);
> +	nvkm_gsp_sg_free(gsp->subdev.device, &rx3->lvl2);
> +	nvkm_gsp_mem_dtor(gsp, &rx3->lvl1);
> +	nvkm_gsp_mem_dtor(gsp, &rx3->lvl0);
>   }
>   
>   /**
> @@ -1960,36 +1961,60 @@ static int
>   nvkm_gsp_radix3_sg(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size,
>   		   struct nvkm_gsp_radix3 *rx3)
>   {
> -	u64 addr;
> +	struct sg_dma_page_iter sg_dma_iter;
> +	struct scatterlist *sg;
> +	size_t bufsize;
> +	u64 *pte;
> +	int ret, i, page_idx = 0;
>   
> -	for (int i = ARRAY_SIZE(rx3->mem) - 1; i >= 0; i--) {
> -		u64 *ptes;
> -		size_t bufsize;
> -		int ret, idx;
> +	ret = nvkm_gsp_mem_ctor(gsp, GSP_PAGE_SIZE, &rx3->lvl0);
> +	if (ret)
> +		return ret;
>   
> -		bufsize = ALIGN((size / GSP_PAGE_SIZE) * sizeof(u64), GSP_PAGE_SIZE);
> -		ret = nvkm_gsp_mem_ctor(gsp, bufsize, &rx3->mem[i]);
> -		if (ret)
> -			return ret;
> +	ret = nvkm_gsp_mem_ctor(gsp, GSP_PAGE_SIZE, &rx3->lvl1);
> +	if (ret)
> +		goto lvl1_fail;
>   
> -		ptes = rx3->mem[i].data;
> -		if (i == 2) {
> -			struct scatterlist *sgl;
> +	// Allocate level 2
> +	bufsize = ALIGN((size / GSP_PAGE_SIZE) * sizeof(u64), GSP_PAGE_SIZE);
> +	ret = nvkm_gsp_sg(gsp->subdev.device, bufsize, &rx3->lvl2);
> +	if (ret)
> +		goto lvl2_fail;
>   
> -			for_each_sgtable_dma_sg(sgt, sgl, idx) {
> -				for (int j = 0; j < sg_dma_len(sgl) / GSP_PAGE_SIZE; j++)
> -					*ptes++ = sg_dma_address(sgl) + (GSP_PAGE_SIZE * j);
> -			}
> -		} else {
> -			for (int j = 0; j < size / GSP_PAGE_SIZE; j++)
> -				*ptes++ = addr + GSP_PAGE_SIZE * j;
> +	// Write the bus address of level 1 to level 0
> +	pte = rx3->lvl0.data;
> +	*pte = rx3->lvl1.addr;
> +
> +	// Write the bus address of each page in level 2 to level 1
> +	pte = rx3->lvl1.data;
> +	for_each_sgtable_dma_page(&rx3->lvl2, &sg_dma_iter, 0)
> +		*pte++ = sg_page_iter_dma_address(&sg_dma_iter);
> +
> +	// Finally, write the bus address of each page in sgt to level 2
> +	for_each_sgtable_sg(&rx3->lvl2, sg, i) {
> +		void *sgl_end;
> +
> +		pte = sg_virt(sg);
> +		sgl_end = (void*)pte + sg->length;
> +
> +		for_each_sgtable_dma_page(sgt, &sg_dma_iter, page_idx) {
> +			*pte++ = sg_page_iter_dma_address(&sg_dma_iter);
> +			page_idx++;
> +
> +			// Go to the next scatterlist for level 2 if we've reached the end
> +			if ((void*)pte >= sgl_end)
> +				break;
>   		}
> +	}
>   
> -		size = rx3->mem[i].size;
> -		addr = rx3->mem[i].addr;
> +	if (ret) {
> +lvl2_fail:
> +		nvkm_gsp_mem_dtor(gsp, &rx3->lvl1);
> +lvl1_fail:
> +		nvkm_gsp_mem_dtor(gsp, &rx3->lvl0);
>   	}
>   
> -	return 0;
> +	return ret;
>   }
>   
>   int
> @@ -2021,7 +2046,7 @@ r535_gsp_fini(struct nvkm_gsp *gsp, bool suspend)
>   		sr = gsp->sr.meta.data;
>   		sr->magic = GSP_FW_SR_META_MAGIC;
>   		sr->revision = GSP_FW_SR_META_REVISION;
> -		sr->sysmemAddrOfSuspendResumeData = gsp->sr.radix3.mem[0].addr;
> +		sr->sysmemAddrOfSuspendResumeData = gsp->sr.radix3.lvl0.addr;
>   		sr->sizeOfSuspendResumeData = len;
>   
>   		mbox0 = lower_32_bits(gsp->sr.meta.addr);


* Re: [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()
  2024-04-28 15:52   ` Lyude Paul
@ 2024-04-28 18:19     ` Timur Tabi
  0 siblings, 0 replies; 11+ messages in thread
From: Timur Tabi @ 2024-04-28 18:19 UTC (permalink / raw)
  To: nouveau, lyude
  Cc: bskeggs, kherbst, airlied, justinstitt, dri-devel, keescook,
	airlied, dakr, linux-kernel, daniel

On Sun, 2024-04-28 at 11:52 -0400, Lyude Paul wrote:
> But we can't get a page pointer from an allocation made by
> dma_alloc_coherent() - but we can from vmalloc(). I'll fix the patch
> explanation in the next version, I have to send out another version
> anyhow since I realized that patch #2 still needs one more check to
> work properly

Glad my confusion was helpful!  You wouldn't happen to know why this
statement is true?

	DMA allocations can never be turned back into a page pointer

I'm guessing DMA allocations are taken from some memory pool that isn't
tracked using page pointers, but I don't understand why.  It seems weird that
you can have a virtual address to the buffer, therefore it is surely mapped on
a page-by-page basis, so why are there no page pointers?



* Re: [PATCH 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3
  2024-04-26 15:41 ` [PATCH 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3 Lyude Paul
@ 2024-04-29  6:03   ` Dave Airlie
  2024-04-29 17:54     ` Lyude Paul
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Airlie @ 2024-04-29  6:03 UTC (permalink / raw)
  To: Lyude Paul
  Cc: nouveau, dri-devel, Karol Herbst, Danilo Krummrich,
	Daniel Vetter, Dave Airlie, Ben Skeggs, Timur Tabi,
	Dan Carpenter, open list

> Currently, this can result in runtime PM issues on systems where memory
> Luckily, we don't actually need to allocate coherent memory for the page
> table thanks to being able to pass the GPU a radix3 page table for
> suspend/resume data. So, let's rewrite nvkm_gsp_radix3_sg() to use the sg
> allocator for level 2. We continue using coherent allocations for lvl0 and
> 1, since they only take a single page.
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Cc: stable@vger.kernel.org
> ---
>  .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h |  4 +-
>  .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 71 ++++++++++++-------
>  2 files changed, 47 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> index 6f5d376d8fcc1..a11d16a16c3b2 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> @@ -15,7 +15,9 @@ struct nvkm_gsp_mem {
>  };
>
>  struct nvkm_gsp_radix3 {
> -       struct nvkm_gsp_mem mem[3];
> +       struct nvkm_gsp_mem lvl0;
> +       struct nvkm_gsp_mem lvl1;
> +       struct sg_table lvl2;

This looks great! Could we go a step further and combine lvl0 and lvl1
into a single 2-page allocation? Then again, that might actually be a bad
idea under memory pressure.

Dave.


* Re: [PATCH 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3
  2024-04-29  6:03   ` Dave Airlie
@ 2024-04-29 17:54     ` Lyude Paul
  0 siblings, 0 replies; 11+ messages in thread
From: Lyude Paul @ 2024-04-29 17:54 UTC (permalink / raw)
  To: Dave Airlie
  Cc: nouveau, dri-devel, Karol Herbst, Danilo Krummrich,
	Daniel Vetter, Dave Airlie, Ben Skeggs, Timur Tabi,
	Dan Carpenter, open list

On Mon, 2024-04-29 at 16:03 +1000, Dave Airlie wrote:
> > Currently, this can result in runtime PM issues on systems where
> > memory
> > Luckily, we don't actually need to allocate coherent memory for the
> > page
> > table thanks to being able to pass the GPU a radix3 page table for
> > suspend/resume data. So, let's rewrite nvkm_gsp_radix3_sg() to use
> > the sg
> > allocator for level 2. We continue using coherent allocations for
> > lvl0 and
> > 1, since they only take a single page.
> > 
> > Signed-off-by: Lyude Paul <lyude@redhat.com>
> > Cc: stable@vger.kernel.org
> > ---
> >  .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h |  4 +-
> >  .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 71 ++++++++++++---
> > ----
> >  2 files changed, 47 insertions(+), 28 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> > b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> > index 6f5d376d8fcc1..a11d16a16c3b2 100644
> > --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> > +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
> > @@ -15,7 +15,9 @@ struct nvkm_gsp_mem {
> >  };
> > 
> >  struct nvkm_gsp_radix3 {
> > -       struct nvkm_gsp_mem mem[3];
> > +       struct nvkm_gsp_mem lvl0;
> > +       struct nvkm_gsp_mem lvl1;
> > +       struct sg_table lvl2;
> 
> This looks great, could we go a step further and combine lvl0 and
> lvl1
> into a 2 page allocation, I thought we could combine lvl0/lvl1 into a
> 2 page alloc, but that actually might be a bad idea under memory
> pressure.

I'm not sure I understand :P - do we want to go for that or not? TBH, I'm
not sure there's any hardware reason we couldn't do the whole radix3 table
as an sg allocation with two additional memory pages added on for levels 0
and 1 - since both of those can only ever be a single page anyway, it
probably doesn't make much of a difference.

The main reason I didn't end up doing that, though, is that it would make
the codepath in nvkm_gsp_radix3_sg() a lot uglier. We need the virtual
addresses of the first/only pages of levels 0-2 in order to populate them,
and we also need the DMA addresses of levels 1-2. There isn't an iterator
that lets you go through both the DMA and virtual addresses as far as I
can tell - and even if there were, we'd have to keep track of when we
reach the end of a page in the loop and make sure pte always starts at the
address of the third sg page on the first iteration. IMO, scatterlist
could definitely benefit from an iterator that does both and can be
stepped through manually as well as via the for_each macros (like Iterator
in Rust).

So - it's definitely possible, but considering:

 * nvkm_gsp_mem isn't a very big struct
 * We're only allocating a single page for level 0 and 1, so at least
   according to the advice I got from Sima this should be a safe amount
   to allocate coherently under memory pressure.
 * It's just a lot easier code-wise having direct access to the
   DMA/virt addresses for the first two levels

I decided to stay with nvkm_gsp_mem_ctor() for the first two pages and
just use nvkm_gsp_sg() for the rest. I can definitely convert the whole
thing to using nvkm_gsp_sg() if we really want though - but I don't
think it'll give us much benefit.
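
Just to spell out the bookkeeping difference I mean (rough sketch;
lvl1_dma is a made-up name, nothing from the patch):

   /* With nvkm_gsp_mem_ctor(), both addresses of a level are right there: */
   u64 *pte      = rx3->lvl1.data;  /* CPU address - used to fill the PTEs */
   u64  lvl1_dma = rx3->lvl1.addr;  /* bus address - written into lvl0     */

   /* With one combined sg allocation we'd only have an sg_table, so getting
    * those same two things for "page 1 of the table" would mean sg_virt()
    * on the right scatterlist entry plus a separate DMA page walk with
    * manual page counting - which is the ugliness I'd rather avoid. */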

I'll send out the new version of the patch without these changes, plus a
fix for one of the issues with this patch that I already mentioned to
Timur - just let me know what you end up deciding, and I can revise the
patch if you want.

> 
> Dave.
> 

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat



* [PATCH v2 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()
  2024-04-26 15:41 [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor() Lyude Paul
  2024-04-26 15:41 ` [PATCH 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3 Lyude Paul
  2024-04-26 15:47 ` [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor() Timur Tabi
@ 2024-04-29 18:23 ` Lyude Paul
  2024-04-29 18:23   ` [PATCH v2 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3 Lyude Paul
  2024-04-29 23:16   ` [PATCH v2 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor() David Airlie
  2 siblings, 2 replies; 11+ messages in thread
From: Lyude Paul @ 2024-04-29 18:23 UTC (permalink / raw)
  To: nouveau
  Cc: dri-devel, Karol Herbst, Danilo Krummrich, David Airlie,
	Daniel Vetter, Justin Stitt, Dave Airlie, Kees Cook, Ben Skeggs,
	open list

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

  kernel BUG at include/linux/scatterlist.h:187!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
  Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
  RIP: 0010:sg_init_one+0x85/0xa0
  Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
  24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
  0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
  RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
  RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
  RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
  R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
  R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
  FS:  00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
  Call Trace:
   <TASK>
   ? die+0x36/0x90
   ? do_trap+0xdd/0x100
   ? sg_init_one+0x85/0xa0
   ? do_error_trap+0x65/0x80
   ? sg_init_one+0x85/0xa0
   ? exc_invalid_op+0x50/0x70
   ? sg_init_one+0x85/0xa0
   ? asm_exc_invalid_op+0x1a/0x20
   ? sg_init_one+0x85/0xa0
   nvkm_firmware_ctor+0x14a/0x250 [nouveau]
   nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
   ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
   r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
   ? srso_return_thunk+0x5/0x5f
   ? srso_return_thunk+0x5/0x5f
   ? nvkm_udevice_new+0x95/0x140 [nouveau]
   ? srso_return_thunk+0x5/0x5f
   ? srso_return_thunk+0x5/0x5f
   ? ktime_get+0x47/0xb0
   ? srso_return_thunk+0x5/0x5f
   nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
   nvkm_subdev_init_+0x39/0x140 [nouveau]
   ? srso_return_thunk+0x5/0x5f
   nvkm_subdev_init+0x44/0x90 [nouveau]
   nvkm_device_init+0x166/0x2e0 [nouveau]
   nvkm_udevice_init+0x47/0x70 [nouveau]
   nvkm_object_init+0x41/0x1c0 [nouveau]
   nvkm_ioctl_new+0x16a/0x290 [nouveau]
   ? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
   ? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
   nvkm_ioctl+0x126/0x290 [nouveau]
   nvif_object_ctor+0x112/0x190 [nouveau]
   nvif_device_ctor+0x23/0x60 [nouveau]
   nouveau_cli_init+0x164/0x640 [nouveau]
   nouveau_drm_device_init+0x97/0x9e0 [nouveau]
   ? srso_return_thunk+0x5/0x5f
   ? pci_update_current_state+0x72/0xb0
   ? srso_return_thunk+0x5/0x5f
   nouveau_drm_probe+0x12c/0x280 [nouveau]
   ? srso_return_thunk+0x5/0x5f
   local_pci_probe+0x45/0xa0
   pci_device_probe+0xc7/0x270
   really_probe+0xe6/0x3a0
   __driver_probe_device+0x87/0x160
   driver_probe_device+0x1f/0xc0
   __driver_attach+0xec/0x1f0
   ? __pfx___driver_attach+0x10/0x10
   bus_for_each_dev+0x88/0xd0
   bus_add_driver+0x116/0x220
   driver_register+0x59/0x100
   ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
   do_one_initcall+0x5b/0x320
   do_init_module+0x60/0x250
   init_module_from_file+0x86/0xc0
   idempotent_init_module+0x120/0x2b0
   __x64_sys_finit_module+0x5e/0xb0
   do_syscall_64+0x83/0x160
   ? srso_return_thunk+0x5/0x5f
   entry_SYSCALL_64_after_hwframe+0x71/0x79
  RIP: 0033:0x7feeb5cc20cd
  Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
  f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
  ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
  RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
  RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
  RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
  R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
  R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
   </TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with kmalloc() instead.

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
---
 drivers/gpu/drm/nouveau/nvkm/core/firmware.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/core/firmware.c b/drivers/gpu/drm/nouveau/nvkm/core/firmware.c
index adc60b25f8e6c..141b0a513bf52 100644
--- a/drivers/gpu/drm/nouveau/nvkm/core/firmware.c
+++ b/drivers/gpu/drm/nouveau/nvkm/core/firmware.c
@@ -205,7 +205,9 @@ nvkm_firmware_dtor(struct nvkm_firmware *fw)
 		break;
 	case NVKM_FIRMWARE_IMG_DMA:
 		nvkm_memory_unref(&memory);
-		dma_free_coherent(fw->device->dev, sg_dma_len(&fw->mem.sgl), fw->img, fw->phys);
+		dma_unmap_single(fw->device->dev, fw->phys, sg_dma_len(&fw->mem.sgl),
+				 DMA_TO_DEVICE);
+		kfree(fw->img);
 		break;
 	case NVKM_FIRMWARE_IMG_SGT:
 		nvkm_memory_unref(&memory);
@@ -235,14 +237,17 @@ nvkm_firmware_ctor(const struct nvkm_firmware_func *func, const char *name,
 		fw->img = kmemdup(src, fw->len, GFP_KERNEL);
 		break;
 	case NVKM_FIRMWARE_IMG_DMA: {
-		dma_addr_t addr;
-
 		len = ALIGN(fw->len, PAGE_SIZE);
 
-		fw->img = dma_alloc_coherent(fw->device->dev, len, &addr, GFP_KERNEL);
-		if (fw->img) {
-			memcpy(fw->img, src, fw->len);
-			fw->phys = addr;
+		fw->img = kmalloc(len, GFP_KERNEL);
+		if (!fw->img)
+			return -ENOMEM;
+
+		memcpy(fw->img, src, fw->len);
+		fw->phys = dma_map_single(fw->device->dev, fw->img, len, DMA_TO_DEVICE);
+		if (dma_mapping_error(fw->device->dev, fw->phys)) {
+			kfree(fw->img);
+			return -EFAULT;
 		}
 
 		sg_init_one(&fw->mem.sgl, fw->img, len);
-- 
2.44.0



* [PATCH v2 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3
  2024-04-29 18:23 ` [PATCH v2 " Lyude Paul
@ 2024-04-29 18:23   ` Lyude Paul
  2024-04-28 16:36     ` Ben Skeggs
  2024-04-29 23:16   ` [PATCH v2 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor() David Airlie
  1 sibling, 1 reply; 11+ messages in thread
From: Lyude Paul @ 2024-04-29 18:23 UTC (permalink / raw)
  To: nouveau
  Cc: dri-devel, stable, Karol Herbst, Danilo Krummrich, David Airlie,
	Daniel Vetter, Dave Airlie, Ben Skeggs, Timur Tabi,
	Dan Carpenter, open list

Currently we allocate all 3 levels of radix3 page tables using
nvkm_gsp_mem_ctor(), which uses dma_alloc_coherent() for allocating all of
the relevant memory. This can end up failing in scenarios where the system
has very high memory fragmentation, and we can't find enough contiguous
memory to allocate level 2 of the page table.

Currently, this can result in runtime PM issues on systems where memory
fragmentation is high - as we'll fail to allocate the page table for our
suspend/resume buffer:

  kworker/10:2: page allocation failure: order:7, mode:0xcc0(GFP_KERNEL),
  nodemask=(null),cpuset=/,mems_allowed=0
  CPU: 10 PID: 479809 Comm: kworker/10:2 Not tainted
  6.8.6-201.ChopperV6.fc39.x86_64 #1
  Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024
  Workqueue: pm pm_runtime_work
  Call Trace:
   <TASK>
   dump_stack_lvl+0x64/0x80
   warn_alloc+0x165/0x1e0
   ? __alloc_pages_direct_compact+0xb3/0x2b0
   __alloc_pages_slowpath.constprop.0+0xd7d/0xde0
   __alloc_pages+0x32d/0x350
   __dma_direct_alloc_pages.isra.0+0x16a/0x2b0
   dma_direct_alloc+0x70/0x270
   nvkm_gsp_radix3_sg+0x5e/0x130 [nouveau]
   r535_gsp_fini+0x1d4/0x350 [nouveau]
   nvkm_subdev_fini+0x67/0x150 [nouveau]
   nvkm_device_fini+0x95/0x1e0 [nouveau]
   nvkm_udevice_fini+0x53/0x70 [nouveau]
   nvkm_object_fini+0xb9/0x240 [nouveau]
   nvkm_object_fini+0x75/0x240 [nouveau]
   nouveau_do_suspend+0xf5/0x280 [nouveau]
   nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau]
   pci_pm_runtime_suspend+0x67/0x1e0
   ? __pfx_pci_pm_runtime_suspend+0x10/0x10
   __rpm_callback+0x41/0x170
   ? __pfx_pci_pm_runtime_suspend+0x10/0x10
   rpm_callback+0x5d/0x70
   ? __pfx_pci_pm_runtime_suspend+0x10/0x10
   rpm_suspend+0x120/0x6a0
   pm_runtime_work+0x98/0xb0
   process_one_work+0x171/0x340
   worker_thread+0x27b/0x3a0
   ? __pfx_worker_thread+0x10/0x10
   kthread+0xe5/0x120
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x31/0x50
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1b/0x30

Luckily, we don't actually need to allocate coherent memory for the page
table thanks to being able to pass the GPU a radix3 page table for
suspend/resume data. So, let's rewrite nvkm_gsp_radix3_sg() to use the sg
allocator for level 2. We continue using coherent allocations for lvl0 and
1, since they only take a single page.

V2:
* Don't forget to actually jump to the next scatterlist when we reach the
  end of the scatterlist we're currently on when writing out the page table
  for level 2

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
---
 .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h |  4 +-
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    | 77 ++++++++++++-------
 2 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
index 6f5d376d8fcc1..a11d16a16c3b2 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
@@ -15,7 +15,9 @@ struct nvkm_gsp_mem {
 };
 
 struct nvkm_gsp_radix3 {
-	struct nvkm_gsp_mem mem[3];
+	struct nvkm_gsp_mem lvl0;
+	struct nvkm_gsp_mem lvl1;
+	struct sg_table lvl2;
 };
 
 int nvkm_gsp_sg(struct nvkm_device *, u64 size, struct sg_table *);
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 9858c1438aa7f..fd4e80ba6adfc 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -1624,7 +1624,7 @@ r535_gsp_wpr_meta_init(struct nvkm_gsp *gsp)
 	meta->magic = GSP_FW_WPR_META_MAGIC;
 	meta->revision = GSP_FW_WPR_META_REVISION;
 
-	meta->sysmemAddrOfRadix3Elf = gsp->radix3.mem[0].addr;
+	meta->sysmemAddrOfRadix3Elf = gsp->radix3.lvl0.addr;
 	meta->sizeOfRadix3Elf = gsp->fb.wpr2.elf.size;
 
 	meta->sysmemAddrOfBootloader = gsp->boot.fw.addr;
@@ -1919,8 +1919,9 @@ nvkm_gsp_sg(struct nvkm_device *device, u64 size, struct sg_table *sgt)
 static void
 nvkm_gsp_radix3_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_radix3 *rx3)
 {
-	for (int i = ARRAY_SIZE(rx3->mem) - 1; i >= 0; i--)
-		nvkm_gsp_mem_dtor(gsp, &rx3->mem[i]);
+	nvkm_gsp_sg_free(gsp->subdev.device, &rx3->lvl2);
+	nvkm_gsp_mem_dtor(gsp, &rx3->lvl1);
+	nvkm_gsp_mem_dtor(gsp, &rx3->lvl0);
 }
 
 /**
@@ -1960,36 +1961,60 @@ static int
 nvkm_gsp_radix3_sg(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size,
 		   struct nvkm_gsp_radix3 *rx3)
 {
-	u64 addr;
+	struct sg_dma_page_iter sg_dma_iter;
+	struct scatterlist *sg;
+	size_t bufsize;
+	u64 *pte;
+	int ret, i, page_idx = 0;
 
-	for (int i = ARRAY_SIZE(rx3->mem) - 1; i >= 0; i--) {
-		u64 *ptes;
-		size_t bufsize;
-		int ret, idx;
+	ret = nvkm_gsp_mem_ctor(gsp, GSP_PAGE_SIZE, &rx3->lvl0);
+	if (ret)
+		return ret;
 
-		bufsize = ALIGN((size / GSP_PAGE_SIZE) * sizeof(u64), GSP_PAGE_SIZE);
-		ret = nvkm_gsp_mem_ctor(gsp, bufsize, &rx3->mem[i]);
-		if (ret)
-			return ret;
+	ret = nvkm_gsp_mem_ctor(gsp, GSP_PAGE_SIZE, &rx3->lvl1);
+	if (ret)
+		goto lvl1_fail;
 
-		ptes = rx3->mem[i].data;
-		if (i == 2) {
-			struct scatterlist *sgl;
+	// Allocate level 2
+	bufsize = ALIGN((size / GSP_PAGE_SIZE) * sizeof(u64), GSP_PAGE_SIZE);
+	ret = nvkm_gsp_sg(gsp->subdev.device, bufsize, &rx3->lvl2);
+	if (ret)
+		goto lvl2_fail;
 
-			for_each_sgtable_dma_sg(sgt, sgl, idx) {
-				for (int j = 0; j < sg_dma_len(sgl) / GSP_PAGE_SIZE; j++)
-					*ptes++ = sg_dma_address(sgl) + (GSP_PAGE_SIZE * j);
-			}
-		} else {
-			for (int j = 0; j < size / GSP_PAGE_SIZE; j++)
-				*ptes++ = addr + GSP_PAGE_SIZE * j;
+	// Write the bus address of level 1 to level 0
+	pte = rx3->lvl0.data;
+	*pte = rx3->lvl1.addr;
+
+	// Write the bus address of each page in level 2 to level 1
+	pte = rx3->lvl1.data;
+	for_each_sgtable_dma_page(&rx3->lvl2, &sg_dma_iter, 0)
+		*pte++ = sg_page_iter_dma_address(&sg_dma_iter);
+
+	// Finally, write the bus address of each page in sgt to level 2
+	for_each_sgtable_sg(&rx3->lvl2, sg, i) {
+		void *sgl_end;
+
+		pte = sg_virt(sg);
+		sgl_end = (void*)pte + sg->length;
+
+		for_each_sgtable_dma_page(sgt, &sg_dma_iter, page_idx) {
+			*pte++ = sg_page_iter_dma_address(&sg_dma_iter);
+			page_idx++;
+
+			// Go to the next scatterlist for level 2 if we've reached the end
+			if ((void*)pte >= sgl_end)
+				break;
 		}
+	}
 
-		size = rx3->mem[i].size;
-		addr = rx3->mem[i].addr;
+	if (ret) {
+lvl2_fail:
+		nvkm_gsp_mem_dtor(gsp, &rx3->lvl1);
+lvl1_fail:
+		nvkm_gsp_mem_dtor(gsp, &rx3->lvl0);
 	}
 
-	return 0;
+	return ret;
 }
 
 int
@@ -2021,7 +2046,7 @@ r535_gsp_fini(struct nvkm_gsp *gsp, bool suspend)
 		sr = gsp->sr.meta.data;
 		sr->magic = GSP_FW_SR_META_MAGIC;
 		sr->revision = GSP_FW_SR_META_REVISION;
-		sr->sysmemAddrOfSuspendResumeData = gsp->sr.radix3.mem[0].addr;
+		sr->sysmemAddrOfSuspendResumeData = gsp->sr.radix3.lvl0.addr;
 		sr->sizeOfSuspendResumeData = len;
 
 		mbox0 = lower_32_bits(gsp->sr.meta.addr);
-- 
2.44.0



* Re: [PATCH v2 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()
  2024-04-29 18:23 ` [PATCH v2 " Lyude Paul
  2024-04-29 18:23   ` [PATCH v2 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3 Lyude Paul
@ 2024-04-29 23:16   ` David Airlie
  1 sibling, 0 replies; 11+ messages in thread
From: David Airlie @ 2024-04-29 23:16 UTC (permalink / raw)
  To: Lyude Paul
  Cc: nouveau, dri-devel, Karol Herbst, Danilo Krummrich, David Airlie,
	Daniel Vetter, Justin Stitt, Kees Cook, Ben Skeggs, open list

> V2:
> * Fixup explanation as the prior one was bogus

For v2 of both patches,

Reviewed-by: Dave Airlie <airlied@redhat.com>

Please apply to drm-misc-fixes

Dave.



end of thread, other threads:[~2024-04-29 23:16 UTC | newest]

Thread overview: 11+ messages
2024-04-26 15:41 [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor() Lyude Paul
2024-04-26 15:41 ` [PATCH 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3 Lyude Paul
2024-04-29  6:03   ` Dave Airlie
2024-04-29 17:54     ` Lyude Paul
2024-04-26 15:47 ` [PATCH 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor() Timur Tabi
2024-04-28 15:52   ` Lyude Paul
2024-04-28 18:19     ` Timur Tabi
2024-04-29 18:23 ` [PATCH v2 " Lyude Paul
2024-04-29 18:23   ` [PATCH v2 2/2] drm/nouveau/gsp: Use the sg allocator for level 2 of radix3 Lyude Paul
2024-04-28 16:36     ` Ben Skeggs
2024-04-29 23:16   ` [PATCH v2 1/2] drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor() David Airlie
