From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH] mm: Do not stall register_shrinker
To: Minchan Kim, Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team, Michal Hocko, Tetsuo Handa, Shakeel Butt, Johannes Weiner
From: Anshuman Khandual
Date: Mon, 27 Nov 2017 11:16:46 +0530
In-Reply-To: <1511481899-20335-1-git-send-email-minchan@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/24/2017 05:34 AM, Minchan Kim wrote:
> Shakeel Butt reported that he has observed on a production system that
> the job loader gets stuck for tens of seconds while doing a mount
> operation.
> It turns out that it was stuck in register_shrinker() while some
> unrelated job was under memory pressure and spending time in
> shrink_slab(). Machines have a lot of shrinkers registered, and jobs
> under memory pressure have to traverse all of those memcg-aware
> shrinkers, which in turn affects unrelated jobs that want to register
> their own shrinkers.
>
> To solve the issue, this patch simply bails out of slab shrinking
> once it finds that someone wants to register a shrinker in parallel.
> A downside is that it could cause unfair shrinking between shrinkers.
> However, that should be rare, and we can add more complicated logic
> once we find it is not enough.
>
> Link: http://lkml.kernel.org/r/20171115005602.GB23810@bbox
> Cc: Michal Hocko
> Cc: Tetsuo Handa
> Acked-by: Johannes Weiner
> Reported-and-tested-by: Shakeel Butt
> Signed-off-by: Shakeel Butt
> Signed-off-by: Minchan Kim
> ---
>  mm/vmscan.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 6a5a72baccd5..6698001787bd 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -486,6 +486,14 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>  			sc.nid = 0;
>
>  		freed += do_shrink_slab(&sc, shrinker, priority);
> +		/*
> +		 * Bail out if someone wants to register a new shrinker, to
> +		 * prevent a long stall caused by parallel ongoing shrinking.
> +		 */
> +		if (rwsem_is_contended(&shrinker_rwsem)) {
> +			freed = freed ? : 1;
> +			break;
> +		}

This is similar to when it aborts at the beginning for not being able
to grab shrinker_rwsem:

	if (!down_read_trylock(&shrinker_rwsem)) {
		/*
		 * If we would return 0, our callers would understand that we
		 * have nothing else to shrink and give up trying. By returning
		 * 1 we keep it going and assume we'll be able to shrink next
		 * time.
		 */
		freed = 1;
		goto out;
	}

Right now, shrink_slab() is getting called from three places: twice in
shrink_node() and once in drop_slab_node().
But the return value from shrink_slab() is checked only inside
drop_slab_node(), which uses it as a heuristic to decide whether to
keep scanning over the registered memcg-aware shrinkers. The question
is: does aborting here still guarantee forward progress for all the
contexts that might be attempting to allocate memory and have
eventually invoked shrink_slab()? Maybe the memory allocation request
has higher priority than a context getting a bit delayed while stuck
waiting on shrinker_rwsem.