From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Jordan
To: Andrew Morton, Herbert Xu, Steffen Klassert
Cc: Alex Williamson, Alexander Duyck, Dan Williams, Dave Hansen,
	David Hildenbrand, Jason Gunthorpe, Jonathan Corbet, Josh Triplett,
	Kirill Tkhai, Michal Hocko, Pavel Machek, Pavel Tatashin,
	Peter Zijlstra, Randy Dunlap, Shile Zhang, Tejun Heo, Zi Yan,
	linux-crypto@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Daniel Jordan
Subject: [PATCH 3/7] padata: allocate work structures for parallel jobs from a pool
Date: Thu, 30 Apr 2020 16:11:21 -0400
Message-Id: <20200430201125.532129-4-daniel.m.jordan@oracle.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20200430201125.532129-1-daniel.m.jordan@oracle.com>
References: <20200430201125.532129-1-daniel.m.jordan@oracle.com>
MIME-Version: 1.0

padata allocates per-CPU, per-instance work structs for parallel jobs.
A do_parallel call assigns a job to a sequence number and hashes the
number to a CPU, where the job will eventually run using the
corresponding work.
This approach fit with how padata used to bind a job to each CPU
round-robin, but it makes less sense after commit bfde23ce200e6
("padata: unbind parallel jobs from specific CPUs") because a work
isn't bound to a particular CPU anymore, and isn't needed at all for
multithreaded jobs because they don't have sequence numbers.

Replace the per-CPU works with a preallocated pool, which allows
sharing them between existing padata users and the upcoming
multithreaded user.  The pool will also facilitate setting NUMA-aware
concurrency limits with later users.

The pool is sized according to the number of possible CPUs.  With this
limit, MAX_OBJ_NUM no longer makes sense, so remove it.

If the global pool is exhausted, a parallel job is run in the current
task instead to throttle a system trying to do too much in parallel.

Signed-off-by: Daniel Jordan
---
 include/linux/padata.h |   8 +--
 kernel/padata.c        | 118 +++++++++++++++++++++++++++--------------
 2 files changed, 78 insertions(+), 48 deletions(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 476ecfa41f363..3bfa503503ac5 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -24,7 +24,6 @@
  * @list: List entry, to attach to the padata lists.
  * @pd: Pointer to the internal control structure.
  * @cb_cpu: Callback cpu for serializatioon.
- * @cpu: Cpu for parallelization.
  * @seq_nr: Sequence number of the parallelized data object.
  * @info: Used to pass information from the parallel to the serial function.
  * @parallel: Parallel execution function.
@@ -34,7 +33,6 @@ struct padata_priv {
 	struct list_head	list;
 	struct parallel_data	*pd;
 	int			cb_cpu;
-	int			cpu;
 	unsigned int		seq_nr;
 	int			info;
 	void			(*parallel)(struct padata_priv *padata);
@@ -68,15 +66,11 @@ struct padata_serial_queue {
 /**
  * struct padata_parallel_queue - The percpu padata parallel queue
  *
- * @parallel: List to wait for parallelization.
  * @reorder: List to wait for reordering after parallel processing.
- * @work: work struct for parallelization.
  * @num_obj: Number of objects that are processed by this cpu.
  */
 struct padata_parallel_queue {
-	struct padata_list	parallel;
 	struct padata_list	reorder;
-	struct work_struct	work;
 	atomic_t		num_obj;
 };
 
@@ -111,7 +105,7 @@ struct parallel_data {
 	struct padata_parallel_queue	__percpu *pqueue;
 	struct padata_serial_queue	__percpu *squeue;
 	atomic_t			refcnt;
-	atomic_t			seq_nr;
+	unsigned int			seq_nr;
 	unsigned int			processed;
 	int				cpu;
 	struct padata_cpumask		cpumask;
diff --git a/kernel/padata.c b/kernel/padata.c
index b05cd30f8905b..edd3ff551e262 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -32,7 +32,15 @@
 #include
 #include
 
-#define MAX_OBJ_NUM 1000
+struct padata_work {
+	struct work_struct	pw_work;
+	struct list_head	pw_list;  /* padata_free_works linkage */
+	void			*pw_data;
+};
+
+static DEFINE_SPINLOCK(padata_works_lock);
+static struct padata_work *padata_works;
+static LIST_HEAD(padata_free_works);
 
 static void padata_free_pd(struct parallel_data *pd);
 
@@ -58,30 +66,44 @@ static int padata_cpu_hash(struct parallel_data *pd, unsigned int seq_nr)
 	return padata_index_to_cpu(pd, cpu_index);
 }
 
-static void padata_parallel_worker(struct work_struct *parallel_work)
+static struct padata_work *padata_work_alloc(void)
 {
-	struct padata_parallel_queue *pqueue;
-	LIST_HEAD(local_list);
+	struct padata_work *pw;
 
-	local_bh_disable();
-	pqueue = container_of(parallel_work,
-			      struct padata_parallel_queue, work);
+	lockdep_assert_held(&padata_works_lock);
 
-	spin_lock(&pqueue->parallel.lock);
-	list_replace_init(&pqueue->parallel.list, &local_list);
-	spin_unlock(&pqueue->parallel.lock);
+	if (list_empty(&padata_free_works))
+		return NULL;	/* No more work items allowed to be queued. */
 
-	while (!list_empty(&local_list)) {
-		struct padata_priv *padata;
+	pw = list_first_entry(&padata_free_works, struct padata_work, pw_list);
+	list_del(&pw->pw_list);
+	return pw;
+}
 
-		padata = list_entry(local_list.next,
-				    struct padata_priv, list);
+static void padata_work_init(struct padata_work *pw, work_func_t work_fn,
+			     void *data)
+{
+	INIT_WORK(&pw->pw_work, work_fn);
+	pw->pw_data = data;
+}
 
-		list_del_init(&padata->list);
+static void padata_work_free(struct padata_work *pw)
+{
+	lockdep_assert_held(&padata_works_lock);
+	list_add(&pw->pw_list, &padata_free_works);
+}
 
-		padata->parallel(padata);
-	}
+static void padata_parallel_worker(struct work_struct *parallel_work)
+{
+	struct padata_work *pw = container_of(parallel_work, struct padata_work,
+					      pw_work);
+	struct padata_priv *padata = pw->pw_data;
 
+	local_bh_disable();
+	padata->parallel(padata);
+	spin_lock(&padata_works_lock);
+	padata_work_free(pw);
+	spin_unlock(&padata_works_lock);
 	local_bh_enable();
 }
 
@@ -105,9 +127,9 @@ int padata_do_parallel(struct padata_shell *ps,
 		       struct padata_priv *padata, int *cb_cpu)
 {
 	struct padata_instance *pinst = ps->pinst;
-	int i, cpu, cpu_index, target_cpu, err;
-	struct padata_parallel_queue *queue;
+	int i, cpu, cpu_index, err;
 	struct parallel_data *pd;
+	struct padata_work *pw;
 
 	rcu_read_lock_bh();
 
@@ -135,25 +157,25 @@ int padata_do_parallel(struct padata_shell *ps,
 	if ((pinst->flags & PADATA_RESET))
 		goto out;
 
-	if (atomic_read(&pd->refcnt) >= MAX_OBJ_NUM)
-		goto out;
-
-	err = 0;
 	atomic_inc(&pd->refcnt);
 	padata->pd = pd;
 	padata->cb_cpu = *cb_cpu;
 
-	padata->seq_nr = atomic_inc_return(&pd->seq_nr);
-	target_cpu = padata_cpu_hash(pd, padata->seq_nr);
-	padata->cpu = target_cpu;
-	queue = per_cpu_ptr(pd->pqueue, target_cpu);
-
-	spin_lock(&queue->parallel.lock);
-	list_add_tail(&padata->list, &queue->parallel.list);
-	spin_unlock(&queue->parallel.lock);
+	rcu_read_unlock_bh();
 
-	queue_work(pinst->parallel_wq, &queue->work);
+	spin_lock(&padata_works_lock);
+	padata->seq_nr = ++pd->seq_nr;
+	pw = padata_work_alloc();
+	spin_unlock(&padata_works_lock);
+	if (pw) {
+		padata_work_init(pw, padata_parallel_worker, padata);
+		queue_work(pinst->parallel_wq, &pw->pw_work);
+	} else {
+		/* Maximum works limit exceeded, run in the current task. */
+		padata->parallel(padata);
+	}
 
+	return 0;
 out:
 	rcu_read_unlock_bh();
 
@@ -324,8 +346,9 @@ static void padata_serial_worker(struct work_struct *serial_work)
 void padata_do_serial(struct padata_priv *padata)
 {
 	struct parallel_data *pd = padata->pd;
+	int hashed_cpu = padata_cpu_hash(pd, padata->seq_nr);
 	struct padata_parallel_queue *pqueue = per_cpu_ptr(pd->pqueue,
-							   padata->cpu);
+							   hashed_cpu);
 	struct padata_priv *cur;
 
 	spin_lock(&pqueue->reorder.lock);
@@ -416,8 +439,6 @@ static void padata_init_pqueues(struct parallel_data *pd)
 		pqueue = per_cpu_ptr(pd->pqueue, cpu);
 
 		__padata_list_init(&pqueue->reorder);
-		__padata_list_init(&pqueue->parallel);
-		INIT_WORK(&pqueue->work, padata_parallel_worker);
 		atomic_set(&pqueue->num_obj, 0);
 	}
 }
@@ -451,7 +472,7 @@ static struct parallel_data *padata_alloc_pd(struct padata_shell *ps)
 
 	padata_init_pqueues(pd);
 	padata_init_squeues(pd);
-	atomic_set(&pd->seq_nr, -1);
+	pd->seq_nr = -1;
 	atomic_set(&pd->refcnt, 1);
 	spin_lock_init(&pd->lock);
 	pd->cpu = cpumask_first(pd->cpumask.pcpu);
@@ -1050,6 +1071,7 @@ EXPORT_SYMBOL(padata_free_shell);
 
 void __init padata_init(void)
 {
+	unsigned int i, possible_cpus;
 #ifdef CONFIG_HOTPLUG_CPU
 	int ret;
 
@@ -1061,13 +1083,27 @@ void __init padata_init(void)
 
 	ret = cpuhp_setup_state_multi(CPUHP_PADATA_DEAD, "padata:dead",
 				      NULL, padata_cpu_dead);
-	if (ret < 0) {
-		cpuhp_remove_multi_state(hp_online);
-		goto err;
-	}
+	if (ret < 0)
+		goto remove_online_state;
+#endif
+
+	possible_cpus = num_possible_cpus();
+	padata_works = kmalloc_array(possible_cpus, sizeof(struct padata_work),
+				     GFP_KERNEL);
+	if (!padata_works)
+		goto remove_dead_state;
+
+	for (i = 0; i < possible_cpus; ++i)
+		list_add(&padata_works[i].pw_list, &padata_free_works);
 
 	return;
+
+remove_dead_state:
+#ifdef CONFIG_HOTPLUG_CPU
+	cpuhp_remove_multi_state(CPUHP_PADATA_DEAD);
+remove_online_state:
+	cpuhp_remove_multi_state(hp_online);
+err:
+#endif
+	pr_warn("padata: initialization failed\n");
 }
-- 
2.26.2