From: Steven Price <steven.price@arm.com>
To: Boris Brezillon <boris.brezillon@collabora.com>,
	dri-devel@lists.freedesktop.org
Cc: "Nicolas Boichat" <drinkcat@chromium.org>,
	kernel@collabora.com, "Daniel Stone" <daniels@collabora.com>,
	"Neil Armstrong" <neil.armstrong@linaro.org>,
	"Liviu Dudau" <Liviu.Dudau@arm.com>,
	"Clément Péron" <peron.clem@gmail.com>,
	"Marty E . Plummer" <hanetzer@startmail.com>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Faith Ekstrand" <faith.ekstrand@collabora.com>
Subject: Re: [PATCH v3 10/14] drm/panthor: Add the scheduler logical block
Date: Mon, 11 Dec 2023 16:27:36 +0000
Message-ID: <d100db02-fc65-452e-8081-ff6794284e70@arm.com>
In-Reply-To: <20231204173313.2098733-11-boris.brezillon@collabora.com>

On 04/12/2023 17:33, Boris Brezillon wrote:
> This is the piece of software interacting with the FW scheduler, and
> taking care of some scheduling aspects when the FW runs short of
> scheduling slots. Indeed, the FW only exposes a few slots, and the
> kernel has to give all submission contexts a chance to execute their
> jobs.
> 
> The kernel-side scheduler is timeslice-based, with a round-robin queue
> per priority level.
> 
> Job submission is handled with a 1:1 drm_sched_entity:drm_gpu_scheduler,
> allowing us to delegate the dependency tracking to the core.
> 
> All the gory details should be documented inline.
> 
> v3:
> - Rework the FW event handling logic to avoid races
> - Make sure MMU faults kill the group immediately
> - Use the panthor_kernel_bo abstraction for group/queue buffers
> - Make in_progress an atomic_t, so we can check it without the reset lock
>   held
> - Don't limit the number of groups per context to the FW scheduler
>   capacity. Fix the limit to 128 for now.
> - Add a panthor_job_vm() helper
> - Account for panthor_vm changes
> - Add our job fence as DMA_RESV_USAGE_WRITE to all external objects
>   (was previously DMA_RESV_USAGE_BOOKKEEP). I don't get why, given
>   we're supposed to be fully-explicit, but other drivers do that, so
>   there must be a good reason
> - Account for drm_sched changes
> - Provide a panthor_queue_put_syncwait_obj()
> - Unconditionally return groups to their idle list in
>   panthor_sched_suspend()
> - Fix the sched_queue_{,delayed_}work condition so work is only queued
>   when a reset isn't pending or in progress.
> - Several typos in comments fixed.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Signed-off-by: Steven Price <steven.price@arm.com>

Two minor comments below, but either way:

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>  drivers/gpu/drm/panthor/panthor_sched.c | 3410 +++++++++++++++++++++++
>  drivers/gpu/drm/panthor/panthor_sched.h |   48 +
>  2 files changed, 3458 insertions(+)
>  create mode 100644 drivers/gpu/drm/panthor/panthor_sched.c
>  create mode 100644 drivers/gpu/drm/panthor/panthor_sched.h
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> new file mode 100644
> index 000000000000..08e5662f4879
> --- /dev/null
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -0,0 +1,3410 @@
> +// SPDX-License-Identifier: GPL-2.0 or MIT
> +/* Copyright 2023 Collabora ltd. */
> +
> +#include <drm/panthor_drm.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_exec.h>
> +#include <drm/drm_gem_shmem_helper.h>
> +#include <drm/drm_managed.h>
> +#include <drm/gpu_scheduler.h>
> +
> +#include <linux/build_bug.h>
> +#include <linux/clk.h>
> +#include <linux/delay.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/firmware.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/iopoll.h>
> +#include <linux/iosys-map.h>
> +#include <linux/module.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/dma-resv.h>
> +
> +#include "panthor_sched.h"
> +#include "panthor_devfreq.h"
> +#include "panthor_device.h"
> +#include "panthor_gem.h"
> +#include "panthor_heap.h"
> +#include "panthor_regs.h"
> +#include "panthor_gpu.h"
> +#include "panthor_fw.h"
> +#include "panthor_mmu.h"
> +
> +/**
> + * DOC: Scheduler
> + *
> + * Mali CSF hardware adopts a firmware-assisted scheduling model, where
> + * the firmware takes care of scheduling aspects, to some extent.
> + *
> + * Scheduling happens at the scheduling group level: each group
> + * contains 1 to N queues (N is FW/hardware dependent, and exposed
> + * through the firmware interface). Each queue is assigned a command
> + * stream ring buffer, which serves as a way to get jobs submitted to
> + * the GPU, among other things.
> + *
> + * The firmware can schedule a maximum of M groups (M is FW/hardware
> + * dependent, and exposed through the firmware interface). Past
> + * this maximum number of groups, the kernel must take care of
> + * rotating the groups passed to the firmware so every group gets
> + * a chance to have its queues scheduled for execution.
> + *
> + * The current implementation only supports kernel-mode queues.
> + * In other words, userspace doesn't have access to the ring-buffer.
> + * Instead, userspace passes indirect command stream buffers that are
> + * called from the queue ring-buffer by the kernel using a pre-defined
> + * sequence of command stream instructions to ensure the userspace driver
> + * always gets consistent results (cache maintenance,
> + * synchronization, ...).
> + *
> + * We rely on the drm_gpu_scheduler framework to deal with job
> + * dependencies and submission. Like any other driver dealing with a
> + * FW scheduler, we use the 1:1 entity:scheduler mode, such that each
> + * entity has its own job scheduler. When a job is ready to be executed
> + * (all its dependencies are met), it is pushed to the appropriate
> + * queue ring-buffer, and the group is scheduled for execution if it
> + * wasn't already active.
> + *
> + * Kernel-side group scheduling is timeslice-based. When we have fewer
> + * groups than slots, the periodic tick is disabled and we just let
> + * the FW schedule the active groups. When there are more groups than
> + * slots, we give each group a chance to execute for a given amount of
> + * time, then re-evaluate and pick new groups to schedule. The group
> + * selection algorithm is based on priority+round-robin.
> + *
> + * Even though user-mode queues are out of scope right now, the
> + * current design takes them into account by not guessing at
> + * group/queue state based on information we wouldn't have if
> + * userspace were in charge of the ring-buffer. That's also one of the
> + * reasons we don't do 'cooperative' scheduling (encoding FW group slot
> + * reservation as dma_fence that would be returned from the
> + * drm_gpu_scheduler::prepare_job() hook, and treating group rotation as
> + * a queue of waiters, ordered by job submission order). This approach
> + * would work for kernel-mode queues, but would make user-mode queues a
> + * lot more complicated to retrofit.
> + */
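
Nice write-up -- the design is easy to follow. Just to check my
understanding of the tick behaviour, I'd paraphrase the
priority+round-robin pick as something like the sketch below (the
helper name and shape are made up on my side, not a quote of the
actual code):

static struct panthor_group *
pick_next_runnable_group(struct panthor_scheduler *sched)
{
	int prio;

	/* The highest priority level with runnable groups wins. */
	for (prio = PANTHOR_CSG_PRIORITY_COUNT - 1; prio >= 0; prio--) {
		struct panthor_group *group;

		/*
		 * Head of the per-priority runnable list goes next;
		 * moving a group back to the tail when its timeslice
		 * expires is what gives the round-robin behaviour.
		 */
		group = list_first_entry_or_null(&sched->groups.runnable[prio],
						 struct panthor_group, run_node);
		if (group)
			return group;
	}

	return NULL;
}

If that's roughly right, the doc is doing its job.
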
> +
> +#define JOB_TIMEOUT_MS				5000
> +
> +#define MIN_CS_PER_CSG				8
> +
> +#define MIN_CSGS				3
> +#define MAX_CSG_PRIO				0xf
> +
> +struct panthor_group;
> +
> +/**
> + * struct panthor_csg_slot - Command stream group slot
> + *
> + * This represents a FW slot for a scheduling group.
> + */
> +struct panthor_csg_slot {
> +	/** @group: Scheduling group bound to this slot. */
> +	struct panthor_group *group;
> +
> +	/** @priority: Group priority. */
> +	u8 priority;
> +
> +	/**
> +	 * @idle: True if the group bound to this slot is idle.
> +	 *
> +	 * A group is idle when it has nothing waiting for execution on
> +	 * all its queues, or when queues are blocked waiting for something
> +	 * to happen (synchronization object).
> +	 */
> +	bool idle;
> +};
> +
> +/**
> + * enum panthor_csg_priority - Group priority
> + */
> +enum panthor_csg_priority {
> +	/** @PANTHOR_CSG_PRIORITY_LOW: Low priority group. */
> +	PANTHOR_CSG_PRIORITY_LOW = 0,
> +
> +	/** @PANTHOR_CSG_PRIORITY_MEDIUM: Medium priority group. */
> +	PANTHOR_CSG_PRIORITY_MEDIUM,
> +
> +	/** @PANTHOR_CSG_PRIORITY_HIGH: High priority group. */
> +	PANTHOR_CSG_PRIORITY_HIGH,
> +
> +	/**
> +	 * @PANTHOR_CSG_PRIORITY_RT: Real-time priority group.
> +	 *
> +	 * Real-time priority allows one to preempt scheduling of other
> +	 * non-real-time groups. When such a group becomes executable,
> +	 * it will evict the group with the lowest non-rt priority if
> +	 * there's no free group slot available.
> +	 *
> +	 * Currently not exposed to userspace.
> +	 */
> +	PANTHOR_CSG_PRIORITY_RT,
> +
> +	/** @PANTHOR_CSG_PRIORITY_COUNT: Number of priority levels. */
> +	PANTHOR_CSG_PRIORITY_COUNT,
> +};
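
On PANTHOR_CSG_PRIORITY_RT: my reading of the eviction rule is
something like the sketch below. This is entirely made up on my side,
and it assumes the victim is picked among the currently slotted groups,
with RT groups sitting at MAX_CSG_PRIO:

static struct panthor_csg_slot *
pick_eviction_victim(struct panthor_scheduler *sched)
{
	struct panthor_csg_slot *victim = NULL;
	u32 i;

	for (i = 0; i < sched->csg_slot_count; i++) {
		struct panthor_csg_slot *slot = &sched->csg_slots[i];

		/* Skip free slots and (assumed) RT slots. */
		if (!slot->group || slot->priority == MAX_CSG_PRIO)
			continue;

		/* The lowest non-RT priority loses its slot. */
		if (!victim || slot->priority < victim->priority)
			victim = slot;
	}

	return victim;
}

If that's the intent, it might be worth a sentence on how ties between
equal-priority candidates are broken.
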
> +
> +/**
> + * struct panthor_scheduler - Object used to manage the scheduler
> + */
> +struct panthor_scheduler {
> +	/** @ptdev: Device. */
> +	struct panthor_device *ptdev;
> +
> +	/**
> +	 * @wq: Workqueue used by our internal scheduler logic.
> +	 *
> +	 * Used for the scheduler tick, group update or other kinds of FW
> +	 * event processing that can't be handled in the threaded interrupt
> +	 * path.
> +	 */
> +	struct workqueue_struct *wq;
> +
> +	/**
> +	 * @drm_sched_wq: Workqueue passed to the drm_gpu_scheduler.
> +	 *
> +	 * The driver doesn't use this queue; it's left entirely to the
> +	 * drm_sched for job dequeuing/cleanup.
> +	 */
> +	struct workqueue_struct *drm_sched_wq;
> +
> +	/** @tick_work: Work executed on a scheduling tick. */
> +	struct delayed_work tick_work;
> +
> +	/**
> +	 * @sync_upd_work: Work used to process synchronization object updates.
> +	 *
> +	 * We use this work to unblock queues/groups that were waiting on a
> +	 * synchronization object.
> +	 */
> +	struct work_struct sync_upd_work;
> +
> +	/**
> +	 * @fw_events_work: Work used to process FW events outside the interrupt path.
> +	 *
> +	 * Even if the interrupt is threaded, we need any event processing
> +	 * that require taking the panthor_scheduler::lock to be processed
> +	 * outside the interrupt path so we don't block the tick logic when
> +	 * it calls panthor_fw_{csg,wait}_wait_acks(). Since most of the
> +	 * even processing require taking this lock, we just delegate all

           ^^^^^^^^^^^^^^^^^^^^^^^
           event processing requires

> +	 * FW event processing to the scheduler workqueue.
> +	 */
> +	struct work_struct fw_events_work;
> +
> +	/**
> +	 * @fw_events: Bitmask encoding pending FW events.
> +	 */
> +	atomic_t fw_events;
> +
> +	/**
> +	 * @resched_target: When the next tick should occur.
> +	 *
> +	 * Expressed in jiffies.
> +	 */
> +	u64 resched_target;
> +
> +	/**
> +	 * @last_tick: When the last tick occurred.
> +	 *
> +	 * Expressed in jiffies.
> +	 */
> +	u64 last_tick;
> +
> +	/** @tick_period: Tick period in jiffies. */
> +	u64 tick_period;
> +
> +	/**
> +	 * @lock: Lock protecting access to all the scheduler fields.
> +	 *
> +	 * Should be taken in the tick work, the irq handler, and anywhere the @groups
> +	 * fields are touched.
> +	 */
> +	struct mutex lock;
> +
> +	/** @groups: Various lists used to classify groups. */
> +	struct {
> +		/**
> +		 * @runnable: Runnable group lists.
> +		 *
> +		 * When a group has queues that want to execute something,
> +		 * its panthor_group::run_node should be inserted here.
> +		 *
> +		 * One list per priority.
> +		 */
> +		struct list_head runnable[PANTHOR_CSG_PRIORITY_COUNT];
> +
> +		/**
> +		 * @idle: Idle group lists.
> +		 *
> +		 * When all queues of a group are idle (either because they
> +		 * have nothing to execute, or because they are blocked), the
> +		 * panthor_group::run_node field should be inserted here.
> +		 *
> +		 * One list per priority.
> +		 */
> +		struct list_head idle[PANTHOR_CSG_PRIORITY_COUNT];
> +
> +		/**
> +		 * @waiting: List of groups whose queues are blocked on a
> +		 * synchronization object.
> +		 *
> +		 * Insert panthor_group::wait_node here when a group is waiting
> +		 * for synchronization objects to be signaled.
> +		 *
> +		 * This list is evaluated in the @sync_upd_work work.
> +		 */
> +		struct list_head waiting;
> +	} groups;
> +
> +	/**
> +	 * @csg_slots: FW command stream group slots.
> +	 */
> +	struct panthor_csg_slot csg_slots[MAX_CSGS];
> +
> +	/** @csg_slot_count: Number of command stream group slots exposed by the FW. */
> +	u32 csg_slot_count;
> +
> +	/** @cs_slot_count: Number of command stream slots per group slot exposed by the FW. */
> +	u32 cs_slot_count;
> +
> +	/** @as_slot_count: Number of address space slots supported by the MMU. */
> +	u32 as_slot_count;
> +
> +	/** @used_csg_slot_count: Number of command stream group slots currently used. */
> +	u32 used_csg_slot_count;
> +
> +	/** @sb_slot_count: Number of scoreboard slots. */
> +	u32 sb_slot_count;
> +
> +	/**
> +	 * @might_have_idle_groups: True if an active group might have become idle.
> +	 *
> +	 * This will force a tick, so other runnable groups can be scheduled if one
> +	 * or more active groups became idle.
> +	 */
> +	bool might_have_idle_groups;
> +
> +	/** @pm: Power management related fields. */
> +	struct {
> +		/** @has_ref: True if the scheduler owns a runtime PM reference. */
> +		bool has_ref;
> +	} pm;
> +
> +	/** @reset: Reset related fields. */
> +	struct {
> +		/** @lock: Lock protecting the other reset fields. */
> +		struct mutex lock;
> +
> +		/**
> +		 * @in_progress: True if a reset is in progress.
> +		 *
> +		 * Set to true in panthor_sched_pre_reset() and back to false in
> +		 * panthor_sched_post_reset().
> +		 */
> +		atomic_t in_progress;
> +
> +		/**
> +		 * @stopped_groups: List containing all groups that were stopped
> +		 * before a reset.
> +		 *
> +		 * Insert panthor_group::run_node in the pre_reset path.
> +		 */
> +		struct list_head stopped_groups;
> +	} reset;
> +};

<snip>

> +
> +static void process_fw_events_work(struct work_struct *work)
> +{
> +	struct panthor_scheduler *sched = container_of(work, struct panthor_scheduler,
> +						      fw_events_work);
> +	u32 events = atomic_fetch_and(0, &sched->fw_events);

I think atomic_xchg() would be clearer here.
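
i.e. something like:

	u32 events = atomic_xchg(&sched->fw_events, 0);

which makes the read-and-clear intent obvious at a glance.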

> +	struct panthor_device *ptdev = sched->ptdev;
> +
> +	mutex_lock(&sched->lock);
> +
> +	if (events & JOB_INT_GLOBAL_IF) {
> +		sched_process_global_irq_locked(ptdev);
> +		events &= ~JOB_INT_GLOBAL_IF;
> +	}
> +
> +	while (events) {
> +		u32 csg_id = ffs(events) - 1;
> +		sched_process_csg_irq_locked(ptdev, csg_id);
> +		events &= ~BIT(csg_id);
> +	}
> +
> +	mutex_unlock(&sched->lock);
> +}
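
Also not a blocker, but if events were copied into an unsigned long,
the CSG bit walk could use the stock iterator (just a thought, the
explicit ffs() loop is fine too):

	unsigned long pending = events;
	u32 csg_id;

	/* JOB_INT_GLOBAL_IF was cleared above, only CSG bits remain. */
	for_each_set_bit(csg_id, &pending, MAX_CSGS)
		sched_process_csg_irq_locked(ptdev, csg_id);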

<snip>
Thanks,

Steve

