linux-rt-users.vger.kernel.org archive mirror
* Scheduling while atomic due to i915 intel_uncore->lock
@ 2023-01-31 13:16 Richard Weinberger
  2023-02-16 18:04 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 2+ messages in thread
From: Richard Weinberger @ 2023-01-31 13:16 UTC
  To: rt-users

Hi,

On 5.15.40-rt43 I came across the following splat followed by a lockup:

[10416.548215] BUG: scheduling while atomic: X/447/0x00000002
[10416.548226] Modules linked in: magic(O) mei_hdcp mei_me mei
[10416.548238] CPU: 2 PID: 447 Comm: X Tainted: G           O 5.15.40-rt43 #1
[10416.548241] Hardware name: FUJITSU D3544-Sx/D3544-Sx, BIOS V5.0.0.13 R1.12.0 for D3544-Sxx                    06/28/2020
[10416.548244] Call Trace:
[10416.548250]  <TASK>
[10416.548253]  dump_stack_lvl+0x34/0x44
[10416.548263]  __schedule_bug.cold+0x47/0x53
[10416.548267]  __schedule+0x108d/0x1450
[10416.548271]  ? raw_spin_rq_lock_nested+0x1a/0xe0
[10416.548276]  ? sched_clock_cpu+0x9/0xe0
[10416.548279]  ? update_rq_clock+0x31/0x160
[10416.548281]  ? rt_mutex_setprio+0x188/0x520
[10416.548284]  schedule_rtlock+0x1b/0x40
[10416.548286]  rtlock_slowlock_locked+0x373/0xcf0
[10416.548291]  ? rt_spin_unlock+0x13/0x40
[10416.548294]  rt_spin_lock+0x41/0x60
[10416.548296]  intel_gt_flush_ggtt_writes+0x45/0x70
[10416.548301]  reloc_cache_reset.constprop.0+0x71/0x110
[10416.548306]  eb_relocate_vma+0x125/0x150
[10416.548309]  ? rt_spin_unlock+0x13/0x40
[10416.548312]  ? kvfree_call_rcu+0x67/0x2d0
[10416.548316]  ? __kmalloc+0x145/0x2e0
[10416.548321]  ? ksize+0x14/0x30
[10416.548324]  ? i915_vma_pin_ww+0x731/0x920
[10416.548327]  ? eb_validate_vmas+0x24b/0x7f0
[10416.548329]  ? i915_gem_object_userptr_submit_init+0x20b/0x3f0
[10416.548333]  i915_gem_do_execbuffer+0x1082/0x1f90
[10416.548338]  ? ___slab_alloc+0x106/0x8c0
[10416.548341]  ? rt_spin_lock+0x26/0x60
[10416.548344]  ? i915_gem_execbuffer2_ioctl+0xb5/0x250
[10416.548346]  ? __i915_gem_object_set_pages+0x1b4/0x200
[10416.548349]  ? i915_gem_userptr_get_pages+0x17f/0x190
[10416.548352]  ? __kmalloc_node+0x153/0x340
[10416.548355]  i915_gem_execbuffer2_ioctl+0x106/0x250
[10416.548357]  ? i915_gem_do_execbuffer+0x1f90/0x1f90
[10416.548360]  drm_ioctl_kernel+0x84/0xd0
[10416.548364]  drm_ioctl+0x1ff/0x3d0
[10416.548366]  ? i915_gem_do_execbuffer+0x1f90/0x1f90
[10416.548370]  __x64_sys_ioctl+0x7f/0xb0
[10416.548374]  do_syscall_64+0x35/0x80
[10416.548378]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[10416.548381] RIP: 0033:0x7fe88b8cfd6c
[10416.548385] Code: 1e fa 48 8d 44 24 08 48 89 54 24 e0 48 89 44 24 c0 48 8d 44 24 d0 48 89 44 24 c8 b8 10 00 00 00 c7 44 24 b8 10 00 00 00 0f 05 <3d> 00 f0 ff ff 41 89 c0 77 0a 44 89 c0 c3 66 0f 1f 44 00 00 48 8b
[10416.548387] RSP: 002b:00007ffdec4f0598 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[10416.548390] RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007fe88b8cfd6c
[10416.548391] RDX: 00007ffdec4f05d0 RSI: 0000000040406469 RDI: 000000000000000b
[10416.548393] RBP: 000055c4c7893b10 R08: 0000000000000000 R09: 0000000000003fd0
[10416.548394] R10: 00007fe88971e000 R11: 0000000000000246 R12: 00007ffdec4f05d0
[10416.548395] R13: 000055c4c7d18530 R14: 00007fe88972f488 R15: 00007fe88972f000
[10416.548398]  </TASK>

A reliable trigger for the problem is using the Chromium browser when
it utilizes OpenGL.

The root of the problem seems to be that struct intel_uncore->lock is a
regular spinlock but is taken in atomic context.
AFAICT the atomic context is a result of kmap_atomic() and
io_mapping_map_atomic_wc() usage in i915.
Converting that lock to a raw spinlock cures the problem, but the
overall latency almost doubles.

I'm pretty sure the problem also exists on v6.2-rc3-rt1, because there
the affected lock is still a spinlock and is used below kmap_atomic()
and io_mapping_map_atomic_wc().

-- 
Thanks,
//richard


* Re: Scheduling while atomic due to i915 intel_uncore->lock
  2023-01-31 13:16 Scheduling while atomic due to i915 intel_uncore->lock Richard Weinberger
@ 2023-02-16 18:04 ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 2+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-02-16 18:04 UTC
  To: Richard Weinberger; +Cc: rt-users

On 2023-01-31 14:16:48 [+0100], Richard Weinberger wrote:
> Hi,
Hi!

> A reliable trigger for the problem is using the Chromium browser when
> it utilizes OpenGL.

Need to try that. OpenGL as in watching a video with HW acceleration
enabled, or is there something else?

> The root of the problem seems to be that struct intel_uncore->lock is
> a regular spinlock but taken in atomic context.
> AFAICT the atomic context is a result of kmap_atomic() and
> io_mapping_map_atomic_wc()

kmap_atomic() should be fine, as on PREEMPT_RT it does not disable
preemption (it only disables migration). io_mapping_map_atomic_wc(), on
the other hand, does disable preemption.

Now looking at this, it seems to happen in eb_relocate_vma(). From
staring at it, you might get lucky and get away with

diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h
index 09d4f17c8d3b6..5ddc70a4e7e3e 100644
--- a/include/linux/io-mapping.h
+++ b/include/linux/io-mapping.h
@@ -162,7 +162,7 @@ static inline void __iomem *
 io_mapping_map_atomic_wc(struct io_mapping *mapping,
 			 unsigned long offset)
 {
-	preempt_disable();
+	migrate_disable();
 	pagefault_disable();
 	return io_mapping_map_wc(mapping, offset, PAGE_SIZE);
 }
@@ -172,7 +172,7 @@ io_mapping_unmap_atomic(void __iomem *vaddr)
 {
 	io_mapping_unmap(vaddr);
 	pagefault_enable();
-	preempt_enable();
+	migrate_enable();
 }
 
 static inline void __iomem *

which is based on 6.2, but you get the idea.

> usage in i915.
> Converting the said lock to a raw spinlock cures the problem but the
> overall latency almost doubles.

Yeah, I tried that once, too.

> I'm pretty sure the problem exists also on v6.2-rc3-rt1 because there
> the affected lock is also still a spinlock and used below
> kmap_atomic() and io_mapping_map_atomic_wc().

Could you please try the hack above and check if it appears to work?

Sebastian

