From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH 2/2] drm/sched: serialize job_timeout and scheduler
To: Daniel Vetter, Rob Clark
References: <1630406139-19621-1-git-send-email-Monk.Liu@amd.com>
 <1630406139-19621-2-git-send-email-Monk.Liu@amd.com>
From: Christian König
Date: Wed, 10 Nov 2021 11:09:50 +0100
List-Id: Direct Rendering Infrastructure - Development
Cc: "Liu, Monk", "amd-gfx@lists.freedesktop.org",
"dri-devel@lists.freedesktop.org" , "Chen, Jingwen" Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Am 10.11.21 um 10:50 schrieb Daniel Vetter: > On Tue, Nov 09, 2021 at 08:17:01AM -0800, Rob Clark wrote: >> On Tue, Nov 9, 2021 at 1:07 AM Daniel Vetter wrote: >>> On Mon, Nov 08, 2021 at 03:39:17PM -0800, Rob Clark wrote: >>>> I stumbled across this thread when I ran into the same issue, while >>>> working out how to move drm/msm to use scheduler's retire + >>>> timeout/recovery (and get rid of our own mirror list of in-flight >>>> jobs). We already have hw error detection enabled, and it can signal >>>> quite fast, so assuming the first job on the list is the guilty job >>>> just won't work. >>>> >>>> But I was considering a slightly different approach to fixing this, >>>> instead just handling it all in drm_sched_main() and getting rid of >>>> the complicated kthread parking gymnastics. Ie. something along the >>>> lines of: >>> So handling timeouts in the main sched thread wont work as soon as you >>> have multiple engines and reset that impacts across engines: >>> >>> - Nothing is simplified since you still need to stop the other scheduler >>> threads. >>> >>> - You get deadlocks if 2 schedulers time out at the same time, and both >>> want to stop the other one. >>> >>> Hence workqueue. Now the rule for the wq is that you can only have one per >>> reset domain, so >>> - single engine you just take the one drm/sched provides >>> - if reset affects all your engines in the chip, then you allocate on in >>> the drm_device and pass that to all >>> - if you have a complex of gpus all interconnected (e.g. xgmi hive for >>> amd), then it's one wq for the entire hive >>> >>> _All_ reset related things must be run on that workqueue or things breaks, >>> which means if you get hw fault that also needs to be run there. 
I guess >>> we should either patch drm/sched to check you call that function from the >>> right workqueue, or just handle it internally. >> Hmm, ok.. I guess it would be useful to better document the reasoning >> for the current design, that would have steered me more towards the >> approach taken in this patch. > Maybe this was because you worked on an old kernel? Boris did update the > kerneldoc as part of making gpu reset work for panfrost, which has this > multi-engine reset problem. If that's not yet clear then we need to > improve the docs further. > > AMD's problem is even worse, because their reset domain is the entire xgmi > hive, so multiple pci devices. I'm pushing for quite a while that we get something like an amdgpu_reset_domain structure or similar for this, but we unfortunately don't have that yet. Maybe it should be a good idea to have something like a drm_sched_domain or similar with all the necessary information for the inter scheduler handling. E.g. a workqueue for reset etc... Regards, Christian. > > Also there might more issues in drm/sched ofc, e.g. I've looked a bit at > ordering/barriers and I'm pretty sure a lot are still missing. Or at least > we should have comments in the code explaining why it all works. 
> -Daniel
>
>> BR,
>> -R
>>
>>> -Daniel
>>>
>>>> ---------------------
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index 67382621b429..4d6ce775c316 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -764,6 +764,45 @@ static bool drm_sched_blocked(struct drm_gpu_scheduler *sched)
>>>>  	return false;
>>>>  }
>>>>
>>>> +static bool handle_timeout(struct drm_gpu_scheduler *sched)
>>>> +{
>>>> +	struct drm_sched_job *bad;
>>>> +
>>>> +	if (!sched->has_timeout)
>>>> +		return false;
>>>> +
>>>> +	sched->has_timeout = false;
>>>> +
>>>> +	spin_lock(&sched->job_list_lock);
>>>> +	bad = list_first_entry_or_null(&sched->pending_list,
>>>> +				       struct drm_sched_job, list);
>>>> +
>>>> +	if (!bad) {
>>>> +		spin_unlock(&sched->job_list_lock);
>>>> +		return false;
>>>> +	}
>>>> +
>>>> +	spin_unlock(&sched->job_list_lock);
>>>> +
>>>> +	if (sched->timeout_wq == system_wq) {
>>>> +		/*
>>>> +		 * If driver has no specific requirements about serializing
>>>> +		 * reset wrt. other engines, just call timedout_job() directly
>>>> +		 */
>>>> +		sched->ops->timedout_job(bad);
>>>> +	} else {
>>>> +		/*
>>>> +		 * Otherwise queue it on timeout_wq and wait for it to complete
>>>> +		 */
>>>> +		... more typing needed here ...
>>>> +	}
>>>> +
>>>> +	if (sched->free_guilty) {
>>>> +		sched->ops->free_job(bad);
>>>> +		sched->free_guilty = false;
>>>> +	}
>>>> +
>>>> +	return true;
>>>> +}
>>>> +
>>>>  /**
>>>>   * drm_sched_main - main scheduler thread
>>>>   *
>>>> @@ -787,6 +826,7 @@ static int drm_sched_main(void *param)
>>>>
>>>>  	wait_event_interruptible(sched->wake_up_worker,
>>>>  				 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
>>>> +				 handle_timeout(sched) ||
>>>>  				 (!drm_sched_blocked(sched) &&
>>>>  				 (entity = drm_sched_select_entity(sched))) ||
>>>>  				 kthread_should_stop());
>>>> ---------------------
>>>>
>>>> drm_sched_fault() and the sw timeout handler would just set
>>>> sched->has_timeout and kick sched->wake_up_worker.
>>>>
>>>> And since we handle the timeout case after
>>>> drm_sched_get_cleanup_job(), we know that all of the successfully
>>>> completed jobs have already been popped off the list, and won't be
>>>> unfairly maligned.
>>>>
>>>> BR,
>>>> -R
>>>>
>>>> On Tue, Aug 31, 2021 at 6:29 PM Liu, Monk wrote:
>>>>> [AMD Official Use Only]
>>>>>
>>>>> Okay, I will re-prepare this patch.
>>>>>
>>>>> Thanks
>>>>>
>>>>> ------------------------------------------
>>>>> Monk Liu | Cloud-GPU Core team
>>>>> ------------------------------------------
>>>>>
>>>>> -----Original Message-----
>>>>> From: Daniel Vetter
>>>>> Sent: Tuesday, August 31, 2021 9:02 PM
>>>>> To: Liu, Monk
>>>>> Cc: amd-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Chen, Jingwen
>>>>> Subject: Re: [PATCH 2/2] drm/sched: serialize job_timeout and scheduler
>>>>>
>>>>> On Tue, Aug 31, 2021 at 02:59:02PM +0200, Daniel Vetter wrote:
>>>>>> Can we please have some actual commit message here, with a detailed
>>>>>> explanation of the race/bug/whatever, how you fix it and why this is
>>>>>> the best option?
>>>>>>
>>>>>> On Tue, Aug 31, 2021 at 06:35:39PM +0800, Monk Liu wrote:
>>>>>>> tested-by: jingwen chen
>>>>>>> Signed-off-by: Monk Liu
>>>>>>> Signed-off-by: jingwen chen
>>>>>>> ---
>>>>>>>  drivers/gpu/drm/scheduler/sched_main.c | 24 ++++-------------------
>>>>>>>  1 file changed, 4 insertions(+), 20 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> index ecf8140..894fdb24 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> @@ -319,19 +319,17 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>>>>>>  	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>>>>>>>
>>>>>>>  	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
>>>>>>> +	if (!__kthread_should_park(sched->thread))
>>>>>>
>>>>>> This is a __ function, i.e. considered internal, and it's lockless
>>>>>> atomic, i.e. unordered. And you're not explaining why this works.
>>>>>>
>>>>>> Iow it's probably buggy, and just unconditionally parking the
>>>>>> kthread is probably the right thing to do. If it's not the right
>>>>>> thing to do, there's a bug here for sure.
>>>>>
>>>>> Also why don't we reuse the function drivers already have to stop a
>>>>> scheduler thread? We seem to have two kthread_park now, that's
>>>>> probably one too much.
>>>>> -Daniel
>>>>>
>>>>>>> +		kthread_park(sched->thread);
>>>>>>> +
>>>>>>>  	spin_lock(&sched->job_list_lock);
>>>>>>>  	job = list_first_entry_or_null(&sched->pending_list,
>>>>>>>  				       struct drm_sched_job, list);
>>>>>>>
>>>>>>>  	if (job) {
>>>>>>> -		/*
>>>>>>> -		 * Remove the bad job so it cannot be freed by concurrent
>>>>>>> -		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
>>>>>>> -		 * is parked at which point it's safe.
>>>>>>> -		 */
>>>>>>> -		list_del_init(&job->list);
>>>>>>>  		spin_unlock(&sched->job_list_lock);
>>>>>>>
>>>>>>> +		/* vendor's timeout_job should call drm_sched_start() */
>>>>>>>  		status = job->sched->ops->timedout_job(job);
>>>>>>>
>>>>>>>  		/*
>>>>>>> @@ -393,20 +391,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>>>>>>  	kthread_park(sched->thread);
>>>>>>>
>>>>>>>  	/*
>>>>>>> -	 * Reinsert back the bad job here - now it's safe as
>>>>>>> -	 * drm_sched_get_cleanup_job cannot race against us and release the
>>>>>>> -	 * bad job at this point - we parked (waited for) any in progress
>>>>>>> -	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
>>>>>>> -	 * now until the scheduler thread is unparked.
>>>>>>> -	 */
>>>>>>> -	if (bad && bad->sched == sched)
>>>>>>> -		/*
>>>>>>> -		 * Add at the head of the queue to reflect it was the earliest
>>>>>>> -		 * job extracted.
>>>>>>> -		 */
>>>>>>> -		list_add(&bad->list, &sched->pending_list);
>>>>>>> -
>>>>>>> -	/*
>>>>>>>  	 * Iterate the job list from later to earlier one and either deactive
>>>>>>>  	 * their HW callbacks or remove them from pending list if they already
>>>>>>>  	 * signaled.
>>>>>>> --
>>>>>>> 2.7.4
>>>>>>>
>>>>>> --
>>>>>> Daniel Vetter
>>>>>> Software Engineer, Intel Corporation
>>>>>> http://blog.ffwll.ch
>>>>> --
>>>>> Daniel Vetter
>>>>> Software Engineer, Intel Corporation
>>>>> http://blog.ffwll.ch
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> http://blog.ffwll.ch