From: Konstantin Khlebnikov
Date: Thu, 17 Dec 2020 18:47:16 +0300
Subject: Re: [PATCH RFC 0/8] dcache: increase poison resistance
To: Junxiao Bi
Cc: Konstantin Khlebnikov, Linux Kernel Mailing List, linux-fsdevel,
    linux-mm@kvack.org, Alexander Viro, Waiman Long,
    Gautham Ananthakrishna, matthew.wilcox@oracle.com
In-Reply-To: <4bcbd2e7-b5e3-6f45-51cf-8658f9c9009d@oracle.com>
References: <158893941613.200862.4094521350329937435.stgit@buzz>
    <97ece625-2799-7ae6-28b5-73c52c7c497b@oracle.com>
    <04b4d5cf-780d-83a9-2b2b-80ae6029ae2c@oracle.com>
    <4bcbd2e7-b5e3-6f45-51cf-8658f9c9009d@oracle.com>

On Wed, Dec 16, 2020 at 9:47 PM Junxiao Bi wrote:

> Hi Konstantin,
>
> How would you like to proceed with this patch set?
>
> This patch set as it is already fixed the customer issue we faced: it
> stops the memory fragmentation caused by negative dentries, and we saw
> no performance regression in our tests. In production workloads it is
> common for an app to keep creating and removing tmp files. This leaves
> a lot of negative dentries over time; eventually it causes memory
> fragmentation, and the system runs into memory compaction and becomes
> unresponsive. It would be good to push it for upstream merge. If you
> are busy, we can try to push it again.
>

Feel free to try.

You may drop the first hunk of changes, which fixes slow inotify, and
keep only the dances around hash chains.

> Thanks,
>
> Junxiao.
>
> On 12/14/20 3:10 PM, Junxiao Bi wrote:
> > On 12/13/20 11:43 PM, Konstantin Khlebnikov wrote:
> >
> >>
> >> On Sun, Dec 13, 2020 at 9:52 PM Junxiao Bi wrote:
> >>
> >>     On 12/11/20 11:32 PM, Konstantin Khlebnikov wrote:
> >>
> >>     > On Thu, Dec 10, 2020 at 2:01 AM Junxiao Bi wrote:
> >>     >
> >>     >     Hi Konstantin,
> >>     >
> >>     >     We tested this patch set recently and found it limiting
> >>     >     negative dentries to a small part of total memory. We also
> >>     >     don't see any performance regression with it. Do you have
> >>     >     any plan to integrate it into mainline? It will help a lot
> >>     >     with the memory fragmentation issue caused by the dentry
> >>     >     slab. There were a lot of customer cases where sys% was
> >>     >     very high since most CPUs were doing memory compaction; the
> >>     >     dentry slab was taking too much memory, and nearly all
> >>     >     dentries there were negative.
> >>     >
> >>     >
> >>     > Right now I don't have any plans for this. I suspect such
> >>     > problems will appear much more often since machines are
> >>     > getting bigger. So, somebody will take care of it.
> >>     We already had a lot of customer cases. It made no sense to leave
> >>     so many negative dentries in the system; they caused memory
> >>     fragmentation and brought not much benefit.
> >>
> >>
> >> Dcache could grow so big only if the system lacks memory pressure.
> >>
> >> The simplest solution is a cronjob which provides such pressure by
> >> creating a sparse file on a disk-based fs and then reading it.
> >> This should wash away all inactive caches with no IO and zero chance
> >> of OOM.
> > Sounds good, will try.
> >>
> >>     >
> >>     > First part, which collects negative dentries at the end of the
> >>     > list of siblings, could be done in a more obvious way by
> >>     > splitting the list in two. But this touches much more code.
> >>     That would add a new field to dentry?
> >>
> >>
> >> Yep. Decision is up to maintainers.
> >>
> >>     >
> >>     > Last patch isn't very rigid but does non-trivial changes.
> >>     > Probably it's better to call some garbage collector thingy
> >>     > periodically. The LRU list needs pressure to age and reorder
> >>     > entries properly.
> >>
> >>     Swap the negative dentry to the head of the hash list when it
> >>     gets accessed? Extra ones can be easily trimmed when swapping.
> >>     Is using GC meant to reduce perf impact?
> >>
> >>
> >> Reclaimer/shrinker scans dentries in LRU lists; it's another list.
> >
> > Ah, you mean GC to reclaim from the LRU list. I am not sure it could
> > keep up with the speed at which negative dentries are generated.
> >
> > Thanks,
> >
> > Junxiao.
> >
> >> My patch uses order in hash lists in a very unusual way. Don't be
> >> confused.
> >>
> >> There are four lists:
> >> parent - siblings
> >> hashtable - hashchain
> >> LRU
> >> inode - alias
> >>
> >>
> >>     Thanks,
> >>
> >>     Junxiao.
> >>
> >>     >
> >>     > GC could be off by default or thresholds set very high (50% of
> >>     > RAM, for example). Final setup could be left up to owners of
> >>     > large systems, which need fine tuning.
> >>
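
The cron job suggested in the thread (create a sparse file on a
disk-based filesystem, then read it back) could look roughly like the
sketch below. This is only an illustration, not something posted in the
thread: the function name apply_cache_pressure, the 1 MiB chunk size and
the choice of directory and file size are all assumptions. The thread
does not say how large the file should be; presumably it needs to be on
the order of the memory you want reclaimed.

```python
import os
import tempfile

CHUNK = 1 << 20  # read in 1 MiB pieces (arbitrary choice)


def apply_cache_pressure(directory, size_bytes, chunk=CHUNK):
    """Create a sparse file in `directory`, read it once, then delete it.

    Returns the number of bytes read. `directory` should live on a
    disk-based filesystem, as the thread suggests.
    """
    fd, path = tempfile.mkstemp(dir=directory, prefix="pressure-")
    try:
        # Extending an empty file with ftruncate leaves a hole:
        # the file has a size but no data blocks are written.
        os.ftruncate(fd, size_bytes)
        total = 0
        while True:
            buf = os.read(fd, chunk)  # holes read back as zero bytes
            if not buf:
                break
            total += len(buf)
        return total
    finally:
        os.close(fd)
        os.unlink(path)
```

A scheduled job could call, for example,
apply_cache_pressure("/var/tmp", 64 << 30), where both the path and the
size are placeholders to be adjusted per machine. Whether this reclaims
enough negative dentries in practice is something to measure rather
than assume.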
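
To judge whether negative dentries dominate the cache, as described in
the customer cases above, one option is to look at
/proc/sys/fs/dentry-state. The sketch below is not from the thread or
the patch set. It assumes the six-field layout documented for recent
kernels (nr_dentry, nr_unused, age_limit, want_pages, nr_negative,
dummy); as far as I know, kernels older than 5.0 report 0 in the fifth
field. The helper names are made up for illustration.

```python
FIELDS = ("nr_dentry", "nr_unused", "age_limit",
          "want_pages", "nr_negative", "dummy")


def parse_dentry_state(text):
    """Map the whitespace-separated counters to their field names."""
    values = [int(v) for v in text.split()]
    return dict(zip(FIELDS, values))


def negative_fraction(state):
    """Share of unused dentries that are negative (0.0 if none unused)."""
    unused = state["nr_unused"]
    return state["nr_negative"] / unused if unused else 0.0


def read_dentry_state(path="/proc/sys/fs/dentry-state"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_dentry_state(f.read())
```

On a Linux machine, negative_fraction(read_dentry_state()) gives a
rough indicator that could be logged before and after applying memory
pressure.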