From: "Yang Shi" <yang.s@alibaba-inc.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Jan Kara <jack@suse.cz>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
linux-mm@kvack.org, linux-kernel <linux-kernel@vger.kernel.org>,
linux-api@vger.kernel.org
Subject: Re: [RFC PATCH] fs: fsnotify: account fsnotify metadata to kmemcg
Date: Wed, 25 Oct 2017 08:34:41 +0800 [thread overview]
Message-ID: <1400291b-0fe7-3a22-2f8a-84ff488b8fc1@alibaba-inc.com> (raw)
In-Reply-To: <CAOQ4uxiVbA1HxPt9mjn-AL0XzMuOYU5dMeMoHxZbxHLzaS=niQ@mail.gmail.com>
On 10/23/17 10:42 PM, Amir Goldstein wrote:
> On Tue, Oct 24, 2017 at 7:12 AM, Yang Shi <yang.s@alibaba-inc.com> wrote:
>>
>>
>> On 10/22/17 1:24 AM, Amir Goldstein wrote:
>>>
>>> On Sat, Oct 21, 2017 at 12:07 AM, Yang Shi <yang.s@alibaba-inc.com> wrote:
>>>>
>>>>
>>>>
>>>> On 10/19/17 8:14 PM, Amir Goldstein wrote:
>>>>>
>>>>>
>>>>> On Fri, Oct 20, 2017 at 12:20 AM, Yang Shi <yang.s@alibaba-inc.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> We observed some misbehaved user applications might consume significant
>>>>>> amount of fsnotify slabs silently. It'd better to account those slabs
>>>>>> in
>>>>>> kmemcg so that we can get heads up before misbehaved applications use
>>>>>> too
>>>>>> much memory silently.
>>>>>
>>>>>
>>>>>
>>>>> In what way do they misbehave? create a lot of marks? create a lot of
>>>>> events?
>>>>> Not reading events in their queue?
>>>>
>>>>
>>>>
>>>> It looks both a lot marks and events. I'm not sure if it is the latter
>>>> case.
>>>> If I knew more about the details of the behavior, I would elaborated more
>>>> in
>>>> the commit log.
>>>
>>>
>>> If you are not sure, do not refer to user application as "misbehaved".
>>> Is updatedb(8) a misbehaved application because it produces a lot of
>>> access
>>> events?
>>
>>
>> Should be not. It sounds like our in-house applications. But, it is a sort
>> of blackbox to me.
>>
>
> If you know which process is "misbehaving" you can look at
> ls -l /proc/<pid>/fd |grep notify
> and see the anonymous inotify/fanotify file descriptors
>
> then you can look at /proc/<pid>/fdinfo/<fd> file of those
> file descriptors to learn more about the fanotify flags etc.
Thanks for the hints.
>
> ...
>
>>
>>>
>>> But I think there is another problem, not introduced by your change, but
>>> could
>>> be amplified because of it - when a non-permission event allocation fails,
>>> the
>>> event is silently dropped, AFAICT, with no indication to listener.
>>> That seems like a bug to me, because there is a perfectly safe way to deal
>>> with
>>> event allocation failure - queue the overflow event.
>>
>>
>> I'm not sure if such issue could be amplified by the accounting since once
>> the usage exceeds the limit any following kmem allocation would fail. So, it
>> might fail at fsnotify event allocation, or other places, i.e. fork, open
>> syscall, etc. So, in most cases the generator even can't generate new event
>> any more.
>>
>
> To be clear, I did not mean that kmem limit would cause a storm of dropped
> events. I meant if you have a listener outside memcp watching a single file
> for access/modifications and you have many containers each with its own
> limited memcg, then event drops probability goes to infinity as you run more
> of those kmem limited containers with event producers.
>
>> The typical output from my LTP test is filesystem dcache allocation error or
>> fork error due to kmem limit is reached.
>
> And that should be considered a success result of the test.
> The only failure case is when producer touches the file and event is
> not delivered
> nor an overflow event delivered.
> You can probably try to reduce allocation failure for fork and dentry by:
> 1. pin dentry cache of subject file on test init by opening the file
> 2. set the low kmem limit after forking
>
> Then you should probably loop the test enough times
> in some of the times, producer may fail to access the file
> in others if will succeed and produce events properly
> and many some times, producer will access the file and event
> will be dropped, so event count is lower than access count.
I still have not very direct test result shows the patch works as
expected. But, I did get some prove from running fanotify test *without*
your overflow event patch.
Running fanotify without your overflow patch, sometimes I can see:
[ 122.222455] SLUB: Unable to allocate memory on node -1,
gfp=0x14000c0(GFP_KERNEL)
[ 122.224198] cache: fanotify_event_info(110:fsnotify), object size:
56, buffer size: 88, default order: 0, min order: 0
[ 122.226578] node 0: slabs: 11, objs: 506, free: 0
[ 122.227655] node 1: slabs: 0, objs: 0, free: 0
[ 122.229251] SLUB: Unable to allocate memory on node -1,
gfp=0x14000c0(GFP_KERNEL)
[ 122.230912] cache: fanotify_event_info(110:fsnotify), object size:
56, buffer size: 88, default order: 0, min order: 0
[ 122.233266] node 0: slabs: 11, objs: 506, free: 0
[ 122.234337] node 1: slabs: 0, objs: 0, free: 0
The slub oom information is printed a couple of times, it should mean
neither the event is delivered nor the overflow event is delivered.
With your overflow patch, I've never seen such message.
Regards,
Yang
>
>
>
>>
>>> I am not going to be the one to determine if fixing this alleged bug is a
>>> prerequisite for merging your patch, but I think enforcing memory limits
>>> on
>>> event allocation could amplify that bug, so it should be fixed.
>>>
>>> The upside is that with both your accounting fix and ENOMEM = overlflow
>>> fix, it going to be easy to write a test that verifies both of them:
>>> - Run a listener in memcg with limited kmem and unlimited (or very
>>> large) event queue
>>> - Produce events inside memcg without listener reading them
>>> - Read event and expect an OVERFLOW even
>>>
>>> This is a simple variant of LTP tests inotify05 and fanotify05.
>>
>>
>> I tried to test your patch with LTP, but it sounds not that easy to setup a
>> scenario to make fsnotify event allocation just hit the kmem limit, since
>> the limit may be hit before a new event is allocated, for example allocating
>> dentry cache in open syscall may hit the limit.
>>
>> So, it sounds the overflow event might be not generated by the producer in
>> most cases.
>>
>
> Right. not as simple, but maybe still possible as I described above.
> Assuming that my patch is not buggy...
>
> Thanks,
> Amir.
>
next prev parent reply other threads:[~2017-10-25 0:34 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-19 21:20 [RFC PATCH] fs: fsnotify: account fsnotify metadata to kmemcg Yang Shi
2017-10-20 3:14 ` Amir Goldstein
2017-10-20 21:07 ` Yang Shi
2017-10-22 8:24 ` Amir Goldstein
2017-10-24 4:12 ` Yang Shi
2017-10-24 5:42 ` Amir Goldstein
2017-10-25 0:34 ` Yang Shi [this message]
2017-10-31 10:50 ` Jan Kara
2017-10-31 11:51 ` Amir Goldstein
2017-10-31 16:52 ` Jan Kara
2017-10-31 17:01 ` Amir Goldstein
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1400291b-0fe7-3a22-2f8a-84ff488b8fc1@alibaba-inc.com \
--to=yang.s@alibaba-inc.com \
--cc=amir73il@gmail.com \
--cc=jack@suse.cz \
--cc=linux-api@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).