Date: Mon, 29 Nov 2021 13:23:19 +0000
From: Matthew Wilcox
To: Michal Hocko
Cc: Hao Lee, Linux MM, Johannes Weiner, vdavydov.dev@gmail.com,
	Shakeel Butt, cgroups@vger.kernel.org, LKML
Subject: Re: [PATCH] mm: reduce spinlock contention in release_pages()
References: <20211125080238.GA7356@haolee.io>
	<20211125123133.GA7758@haolee.io>
	<20211126162623.GA10277@haolee.io>

On Mon, Nov 29, 2021 at 09:39:16AM +0100, Michal Hocko wrote:
> On Fri 26-11-21 16:26:23, Hao Lee wrote:
> [...]
> > I will try Matthew's idea to use a semaphore or mutex to limit the
> > number of BE jobs that are in the exiting path. This sounds like a
> > feasible approach for our scenario...
>
> I am not really sure this is something that would be acceptable. Your
> problem is resource partitioning. Papering that over with a lock is
> not the right way to go. Besides that, you will likely hit a hard
> question of how many tasks to allow to run concurrently. Whatever the
> value, some workload is very likely going to suffer. We cannot expect
> the admin to choose the right value because there is no clear answer
> for that. Not to mention other potential problems - e.g. even more
> priority inversions etc.

I don't see how we get priority inversions. These tasks are exiting; at
the point they take the semaphore, they should not be holding any
locks. They're holding a resource (memory) that needs to be released,
but a task wanting to acquire memory must already be prepared to sleep.

I see this as a thundering herd problem: we have dozens, maybe hundreds
of tasks all trying to free their memory at once. If we force the herd
to go through a narrow gap, they arrive at the spinlock in an orderly
manner.
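
To make the "narrow gap" concrete, here is a rough, untested sketch of
the kind of thing I mean. The identifier exit_release_sem and the count
of 4 are made up for illustration (picking the count is exactly the
hard question raised above), and this is not taken from any posted
patch:

#include <linux/semaphore.h>
#include <linux/mm.h>

/* Let at most four exiting tasks tear down their pages at once. */
static struct semaphore exit_release_sem =
	__SEMAPHORE_INITIALIZER(exit_release_sem, 4);

static void gated_release_pages(struct page **pages, int nr)
{
	/*
	 * Exiting tasks hold no other locks at this point, so sleeping
	 * on the semaphore is safe.  The herd now reaches the LRU
	 * spinlock a few tasks at a time instead of all at once.
	 */
	down(&exit_release_sem);
	release_pages(pages, nr);
	up(&exit_release_sem);
}

The point is that the excess tasks queue up asleep on the semaphore
instead of burning CPU spinning on the LRU lock, so the lock hold times
stay short for whoever gets through the gap.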