* drm_clflush_sg() loops for over 3ms
@ 2020-01-13 14:34 ` David Laight
From: David Laight @ 2020-01-13 14:34 UTC
  To: 'maarten.lankhorst@linux.intel.com',
	'mripard@kernel.org', 'sean@poorly.run',
	'airlied@linux.ie', 'daniel@ffwll.ch',
	'dri-devel@lists.freedesktop.org',
	'linux-kernel@vger.kernel.org'

I've been looking at why some RT processes don't get scheduled promptly.
In my test the RT process's affinity ties it to a single cpu (this may not be as
good an idea as it seems).

What I've found is that the Intel i915 graphics driver uses the 'events_unbound'
kernel workqueue to periodically execute drm_clflush_sg().
(see https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/drm_cache.c)
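
For reference, the relevant loop looks roughly like this (paraphrased from the
drm_cache.c file linked above, so treat it as a sketch rather than a verbatim
quote):

void drm_clflush_sg(struct sg_table *st)
{
#if defined(CONFIG_X86)
	if (static_cpu_has(X86_FEATURE_CLFLUSH)) {
		struct sg_page_iter sg_iter;

		mb();	/* order the clflushes against earlier writes */
		for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
			drm_clflush_page(sg_page_iter_page(&sg_iter));
		mb();	/* make sure every line is flushed before returning */

		return;
	}

	/* no CLFLUSH: fall back to write-back-invalidate on every cpu */
	if (wbinvd_on_all_cpus())
		pr_err("Timed out waiting for cache flush\n");
#endif
}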

I'm guessing this is to ensure that any writes to graphics memory become
visible in a semi-timely manner.

This loop takes about 1us per iteration, split fairly evenly between whatever is in
for_each_sg_page() and drm_clflush_page().
With a 2560x1440 display at 4 bytes/pixel the framebuffer is 3600 4k pages, so the
loop runs 3600 times and the whole function takes around 3.3ms.
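
For what it's worth, the arithmetic behind those numbers (assuming 4096 byte pages):

	2560 x 1440 pixels x 4 bytes/pixel = 14,745,600 bytes
	14,745,600 / 4096 bytes per page   = 3,600 pages (loop iterations)
	3,600 iterations x ~1us            = ~3.6ms, in line with the measured 3.3ms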

Since the kernel isn't preemptible (I thought that wasn't much harder than SMP)
nothing else can run on that cpu until the loop finishes.

Adding a cond_resched() to the loop (maybe every 64 iterations, roughly as
sketched below) would allow higher priority processes to run.
But really the code needs to be a lot faster.
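
Something like this is what I have in mind - only a sketch against the loop above,
and the batch size of 64 is arbitrary:

	struct sg_page_iter sg_iter;
	unsigned int i = 0;

	mb();
	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0) {
		drm_clflush_page(sg_page_iter_page(&sg_iter));
		/* give higher priority tasks a chance to run every 64 pages */
		if (!(++i & 63))
			cond_resched();
	}
	mb();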

I actually suspect that the (I assume IPI based) wbinvd_on_all_cpus() would be
a lot faster - especially if done from a per-cpu work queue?
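
A very rough sketch of what I mean by the per-cpu work queue variant - the names
here are made up, this is not existing driver code, and it is x86 only:

#include <linux/workqueue.h>
#include <asm/special_insns.h>	/* wbinvd() */

static void wbinvd_work(struct work_struct *work)
{
	wbinvd();	/* write back and invalidate this cpu's caches */
}

/* flush from schedulable (preemptible) context on every cpu, no IPI */
static int flush_caches_per_cpu(void)
{
	return schedule_on_each_cpu(wbinvd_work);
}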

I had moderate difficulty getting from the process name (kworker/u8:3) to the
name of the worker thread pool, never mind the actual work item.
Fortunately it runs for so long that some of the output from 'echo t >/proc/sysrq-trigger'
still linked the pid (which I knew from the ftrace scheduler events and schedviz)
to the actual work item name.
(Oh, after I'd written a program to tidy up the raw ftrace output so schedviz
didn't barf on a trace that had wrapped.)

Is there anything in /proc (etc) that shows all the work queues and their current
work?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

