From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <65722ac9-e76b-8473-e1d5-3209c2d59a89@amd.com>
Date: Tue, 23 Aug 2022 17:37:48 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.1.2
Subject: Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq
Content-Language: en-CA
From: Luben Tuikov
To: Andrey Grodzovsky , dri-devel@lists.freedesktop.org
References: <20220822200917.440681-1-andrey.grodzovsky@amd.com>
 <1eadb06c-7d2c-0317-a34a-c219c68b93c3@amd.com>
 <54bbcee9-68c7-5b4d-1050-7f098c96a805@amd.com>
In-Reply-To: <54bbcee9-68c7-5b4d-1050-7f098c96a805@amd.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
List-Id: Direct Rendering Infrastructure - Development
Cc: ckoenig.leichtzumerken@gmail.com, Li Yunxiang , amd-gfx@lists.freedesktop.org

On 2022-08-23 14:57, Andrey Grodzovsky wrote:
> On 2022-08-23 14:30, Luben Tuikov wrote:
>
>>
>> On 2022-08-23 14:13, Andrey Grodzovsky wrote:
>>> On 2022-08-23 12:58, Luben Tuikov wrote:
>>>> Inlined:
>>>>
>>>> On 2022-08-22 16:09, Andrey Grodzovsky wrote:
>>>>> Poblem: Given many entities competing for same rq on
>>>> ^Problem
>>>>
>>>>> same scheduler an uncceptabliy long wait time for some
>>>> ^unacceptably
>>>>
>>>>> jobs waiting stuck in rq before being picked up are
>>>>> observed (seen using GPUVis).
>>>>> The issue is due to Round Robin policy used by scheduler
>>>>> to pick up the next entity for execution. Under stress
>>>>> of many entities and long job queus within entity some
>>>> ^queues
>>>>
>>>>> jobs could be stack for very long time in it's entity's
>>>>> queue before being popped from the queue and executed
>>>>> while for other entites with samller job queues a job
>>>> ^entities; smaller
>>>>
>>>>> might execute ealier even though that job arrived later
>>>> ^earlier
>>>>
>>>>> then the job in the long queue.
>>>>>
>>>>> Fix:
>>>>> Add FIFO selection policy to entites in RQ, chose next enitity
>>>>> on rq in such order that if job on one entity arrived
>>>>> ealrier then job on another entity the first job will start
>>>>> executing ealier regardless of the length of the entity's job
>>>>> queue.
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky
>>>>> Tested-by: Li Yunxiang (Teddy)
>>>>> ---
>>>>>   drivers/gpu/drm/scheduler/sched_entity.c |  2 +
>>>>>   drivers/gpu/drm/scheduler/sched_main.c   | 65 ++++++++++++++++++++++--
>>>>>   include/drm/gpu_scheduler.h              |  8 +++
>>>>>   3 files changed, 71 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> index 6b25b2f4f5a3..3bb7f69306ef 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>>>> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>>>>>   	atomic_inc(entity->rq->sched->score);
>>>>>   	WRITE_ONCE(entity->last_user, current->group_leader);
>>>>>   	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
>>>>> +	sched_job->submit_ts = ktime_get();
>>>>> +
>>>>>
>>>>>   	/* first job wakes up scheduler */
>>>>>   	if (first) {
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 68317d3a7a27..c123aa120d06 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -59,6 +59,19 @@
>>>>>   #define CREATE_TRACE_POINTS
>>>>>   #include "gpu_scheduler_trace.h"
>>>>>
>>>>> +
>>>>> +
>>>>> +int drm_sched_policy = -1;
>>>>> +
>>>>> +/**
>>>>> + * DOC: sched_policy (int)
>>>>> + * Used to override default entites scheduling policy in a run queue.
>>>>> + */
>>>>> +MODULE_PARM_DESC(sched_policy,
>>>>> +	"specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO");
>>>>> +module_param_named(sched_policy, drm_sched_policy, int, 0444);
>>>> As per Christian's comments, you can drop the "auto" and perhaps leave one as the default,
>>>> say the RR.
>>>>
>>>> I do think it is beneficial to have a module parameter control the scheduling policy, as shown above.
>>>
>>> Christian is not against it, just against adding 'auto' here - like the
>>> default.
>> Exactly what I said.
>>
>> Also, I still think an O(1) scheduling (picking next to run) should be
>> what we strive for in such a FIFO patch implementation.
>> A FIFO mechanism is by its nature an O(1) mechanism for picking the next
>> element.
>>
>> Regards,
>> Luben
>
>
> The only solution I see for this now is keeping a global per-rq jobs
> list parallel to the SPSC queue per entity - we use this list when we
> switch to FIFO scheduling; we can even start building it ONLY when we
> switch to FIFO, building it gradually as more jobs come. Do you have
> another solution in mind?

The idea is to "sort" on insertion, not on picking the next one to run.

cont'd below:

>
> Andrey
>
>>
>>>
>>>>> +
>>>>> +
>>>>>   #define to_drm_sched_job(sched_job) \
>>>>>   	container_of((sched_job), struct drm_sched_job, queue_node)
>>>>>
>>>>> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>>>>>   }
>>>>>
>>>>>   /**
>>>>> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run
>>>>> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
>>>>>    *
>>>>>    * @rq: scheduler run queue to check.
>>>>>    *
>>>>> - * Try to find a ready entity, returns NULL if none found.
>>>>> + * Try to find a ready entity, in round robin manner.
>>>>> + *
>>>>> + * Returns NULL if none found.
>>>>>    */
>>>>>   static struct drm_sched_entity *
>>>>> -drm_sched_rq_select_entity(struct drm_sched_rq *rq)
>>>>> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
>>>>>   {
>>>>>   	struct drm_sched_entity *entity;
>>>>>
>>>>> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
>>>>>   	return NULL;
>>>>>   }
>>>>>
>>>>> +/**
>>>>> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run
>>>>> + *
>>>>> + * @rq: scheduler run queue to check.
>>>>> + *
>>>>> + * Try to find a ready entity, based on FIFO order of jobs arrivals.
>>>>> + *
>>>>> + * Returns NULL if none found.
>>>>> + */
>>>>> +static struct drm_sched_entity *
>>>>> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
>>>>> +{
>>>>> +	struct drm_sched_entity *tmp, *entity = NULL;
>>>>> +	ktime_t oldest_ts = KTIME_MAX;
>>>>> +	struct drm_sched_job *sched_job;
>>>>> +
>>>>> +	spin_lock(&rq->lock);
>>>>> +
>>>>> +	list_for_each_entry(tmp, &rq->entities, list) {
>>>>> +
>>>>> +		if (drm_sched_entity_is_ready(tmp)) {
>>>>> +			sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue));
>>>>> +
>>>>> +			if (ktime_before(sched_job->submit_ts, oldest_ts)) {
>>>>> +				oldest_ts = sched_job->submit_ts;
>>>>> +				entity = tmp;
>>>>> +			}
>>>>> +		}
>>>>> +	}
>>>> Here I think we need an O(1) lookup of the next job to pick out to run.
>>>> I see a number of optimizations, for instance keeping the current/oldest
>>>> timestamp in the rq struct itself,
>>>
>>> This was my original design: an rb-tree-based min-heap per rq, keyed on the
>>> time stamp of the oldest job waiting in each entity's job queue (the head of
>>> the entity's SPSC job queue). But how, in this case, do you record the
>>> timestamps of all the jobs waiting in the entity's SPSC queue behind the
>>> head job? If you record only the oldest job and more jobs come, you have no
>>> place to store their timestamps, so on the next job selection after the
>>> current HEAD, how will you know which of the two entities' job queues came
>>> before or after the other?
>>>
>>>
>>>> or better yet keeping the next job
>>>> to pick out to run at a head of list (a la timer wheel implementation).
>>>> For such an optimization to work, you'd prep things up on job insertion, rather
>>>> than on job removal/pick to run.
>>>
>>> I looked at the timer wheel and I don't see how it applies here - it assumes
>>> you know, before job submission, the deadline time at which the job's
>>> execution must start - which we don't, so I don't see how we can use it.
>>> It seems more suitable for a real-time scheduler implementation, where you
>>> have a hard requirement to execute a job by some specific time T.

In a timer wheel you instantly know the "soonest" job to run--it's naturally
your "next" job, regardless of in what order the timers were added and what
their timeout time is.

>>>
>>> I also mentioned a list; obviously there is the brute-force solution of
>>> just ordering all jobs in one giant list and getting a natural FIFO
>>> ordering that way, with O(1) insertions and extractions, but I assume we
>>> don't want one giant job queue for all the entities, to avoid more
>>> dependencies between them (like locking the entire list when one specific
>>> entity is killed and needs to remove its jobs from the SW queue).

You can also have a list of list pointers. It'd be trivial to remove a whole
list from the main list, by simply removing an element--akin to locking out
a rq, or should you need to edit the rq's entity list.
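To make the "list of list pointers" idea concrete, here is a rough, untested
sketch -- all struct and function names below are invented for illustration
and are not part of the current scheduler code:

/* Hypothetical illustration only -- not the existing drm/scheduler API. */
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/ktime.h>

struct fifo_job {
	struct list_head entity_node;	/* FIFO position within its entity */
	ktime_t submit_ts;		/* stamped at push time */
};

struct fifo_entity {
	struct list_head rq_node;	/* links this entity into the rq */
	struct list_head jobs;		/* this entity's pending jobs, oldest first */
};

struct fifo_rq {
	spinlock_t lock;
	struct list_head entities;	/* the "list of list pointers" */
};

/* O(1): unlink an entity and, with it, its whole job sublist. */
static void fifo_rq_remove_entity(struct fifo_rq *rq, struct fifo_entity *e)
{
	spin_lock(&rq->lock);
	list_del_init(&e->rq_node);	/* the entity's jobs go with it */
	spin_unlock(&rq->lock);
}

Each entity keeps its own FIFO of jobs and the rq only keeps the list of
entities, so killing one entity (or fencing off a whole rq) is a single
list_del rather than a walk over a shared global job list.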
>>>
>>>> I'm also surprised that there is no job transition from one queue to another,
>>>> as it is picked to run next--for the above optimizations to be implemented, you'd
>>>> want a state transition from (state) queue to queue.
>>>
>>> Not sure what you have in mind here? How does this help?

I think I've explained this a few times now--each list represents a state,
and a job/entity travels through lists as it travels through states, which
states more or less represent the state of execution in the hardware--it
could be as simple as incoming --> pending --> done. It allows a finer grain
when resetting the hardware (should the hardware allow it).

Note that this isn't directly related to the O(1) mechanism I brought up
here. As I said, I was surprised to find out no such distinction
existed--that's all. Don't fixate on this.

Regards,
Luben

>>>
>>> Andrey
>>>
>>>
>>>> Regards,
>>>> Luben
>>>>
>>> In my original design
>>>
>>>>> +
>>>>> +	if (entity) {
>>>>> +		rq->current_entity = entity;
>>>>> +		reinit_completion(&entity->entity_idle);
>>>>> +	}
>>>>> +
>>>>> +	spin_unlock(&rq->lock);
>>>>> +	return entity;
>>>>> +}
>>>>> +
>>>>>   /**
>>>>>    * drm_sched_job_done - complete a job
>>>>>    * @s_job: pointer to the job which is done
>>>>> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>>>>>
>>>>>   	/* Kernel run queue has higher priority than normal run queue*/
>>>>>   	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
>>>>> -		entity = drm_sched_rq_select_entity(&sched->sched_rq[i]);
>>>>> +		entity = drm_sched_policy != 1 ?
>>>>> +				drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) :
>>>>> +				drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]);
>>>>> +
>>>>>   		if (entity)
>>>>>   			break;
>>>>>   	}
>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>> index addb135eeea6..95865881bfcf 100644
>>>>> --- a/include/drm/gpu_scheduler.h
>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>> @@ -314,6 +314,14 @@ struct drm_sched_job {
>>>>>
>>>>>   	/** @last_dependency: tracks @dependencies as they signal */
>>>>>   	unsigned long last_dependency;
>>>>> +
>>>>> +	/**
>>>>> +	 * @submit_ts:
>>>>> +	 *
>>>>> +	 * Marks job submit time
>>>>> +	 */
>>>>> +	ktime_t submit_ts;
>>>>> +
>>>>>   };
>>>>>
>>>>>   static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
>> Regards,

Regards,
--
Luben