All of lore.kernel.org
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: fanotify_merge improvements
Date: Tue, 26 Jan 2021 18:21:26 +0200	[thread overview]
Message-ID: <CAOQ4uxiSSYr4bejwZBBPDjs1Vg_BUSSjY4YiUAgri=adHdOLuQ@mail.gmail.com> (raw)
In-Reply-To: <20210125130149.GC1175@quack2.suse.cz>

On Mon, Jan 25, 2021 at 3:01 PM Jan Kara <jack@suse.cz> wrote:
>
> On Sat 23-01-21 15:30:59, Amir Goldstein wrote:
> > On Fri, Jan 22, 2021 at 3:59 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > > > > Hum, now thinking about this, maybe we could clean this up even a bit more.
> > > > > > event->inode is currently used only by inotify and fanotify for merging
> > > > > > purposes. Now inotify could use its 'wd' instead of inode with exactly the
> > > > > > same results, fanotify path or fid check is at least as strong as the inode
> > > > > > check. So only for the case of pure "inode" events, we need to store inode
> > > > > > identifier in struct fanotify_event - and we can do that in the union with
> > > > > > struct path and completely remove the 'inode' member from fsnotify_event.
> > > > > > Am I missing something?
> > > > >
> > > > > That generally sounds good and I did notice it is strange that wd is not
> > > > > being compared.  However, I think I was worried that comparing fid+name
> > > > > (in following patches) would be more expensive than comparing dentry (or
> > > > > object inode) as a "rule out first" in merge, so I preferred to keep the
> > > > > tag/dentry/id comparison for fanotify_fid case.
> > > >
> > > > Yes, that could be a concern.
> > > >
> > > > > Given this analysis (and assuming it is correct), would you like me to
> > > > > just go a head with the change suggested above? or anything beyond that?
> > > >
> > > > Let's go just with the change suggested above for now. We can work on this
> > > > later (probably with optimizing of the fanotify merging code).
> > > >
> > >
> > > Hi Jan,
> > >
> > > Recap:
> > > - fanotify_merge is very inefficient and uses extensive CPU if queue contains
> > >   many events, so it is rather easy for a poorly written listener to
> > > cripple the system
> > > - You had an idea to store in event->objectid a hash of all the compared
> > >   fields (e.g. fid+name)
> > > - I think you had an idea to keep a hash table of events in the queue
> > > to find the
> > >   merge candidates faster
> > > - For internal uses, I carry a patch that limits the linear search for
> > > last 128 events
> > >   which is enough to relieve the CPU overuse in case of unattended long queues
> > >
> > > I tried looking into implementing the hash table idea, assuming I understood you
> > > correctly and I struggled to choose appropriate table sizes. It seemed to make
> > > sense to use a global hash table, such as inode/dentry cache for all the groups
> > > but that would add complexity to locking rules of queue/dequeue and
> > > group cleanup.
> > >
> > > A simpler solution I considered, similar to my 128 events limit patch,
> > > is to limit
> > > the linear search to events queued in the last X seconds.
> > > The rationale is that event merging is not supposed to be long term at all.
> > > If a listener fails to perform read from the queue, it is not fsnotify's job to
> > > try and keep the queue compact. I think merging events mechanism was
> > > mainly meant to merge short bursts of events on objects, which are quite
> > > common and surely can happen concurrently on several objects.
> > >
> > > My intuition is that making event->objectid into event->hash in addition
> > > to limiting the age of events to merge would address the real life workloads.
> > > One question if we do choose this approach is what should the age limit be?
> > > Should it be configurable? Default to infinity and let distro cap the age or
> > > provide a sane default by kernel while slightly changing behavior (yes please).
> > >
> > > What are your thoughts about this?
> >
> > Aha! found it:
> > https://lore.kernel.org/linux-fsdevel/20200227112755.GZ10728@quack2.suse.cz/
> > You suggested a small hash table per group (128 slots).
> >
> > My intuition is that this will not be good enough for the worst case, which is
> > not that hard to hit is real life:
> > 1. Listener sets FAN_UNLIMITED_QUEUE
> > 2. Listener adds a FAN_MARK_FILESYSTEM watch
> > 3. Many thousands of events are queued
> > 4. Listener lingers (due to bad implementation?) in reading events
> > 5. Every single event now incurs a huge fanotify_merge() cost
> >
> > Reducing the cost of merge from O(N) to O(N/128) doesn't really fix the
> > problem.
>
> So my thought was that indeed reducing the overhead of merging by a factor
> of 128 should be enough for any practical case as much as I agree that in
> principle the computational complexity remains the same. And I've picked
> per-group hash table to avoid interferences among notification groups and
> to keep locking simple. That being said I'm not opposed to combining this
> with a limit on the number of elements traversed in a hash chain (e.g.
> those 128 you use yourself) - it will be naturally ordered by queue order
> if we are a bit careful. This will provide efficient and effective merging
> for ~8k queued events which seems enough to me. I find time based limits
> not really worth it. Yes, they provide more predictable behavior but less
> predictable runtime and overall I don't find the complexity worth the
> benefit.
>

Sounds reasonable.
If you have time, please take a look at this WIP branch:
https://github.com/amir73il/linux/commits/fanotify_merge
and let me know if you like the direction it is taking.

This branch is only compile tested, but I am asking w.r.t to the chosen
data structures.
So far it is just an array of queues selected by (yet unmodified) objectid.
Reading is just from any available queue.

My goal was to avoid having to hang the event on multiple list/hlist and
the idea is to implement read by order of events as follows:

- With multi queue, high bit of obejctid will be masked for merge compare.
- Instead, they will be used to store the next_qid to read from

For example:
- event #1 is added to queue 6
- set group->last_qid = 6
- set group->next_qid = 6 (because group->num_events == 1)
- event #2 is added to queue 13
- the next_qid bits of the last event in last_qid (6) queue are set to 13
- set group->last_qid = 13

- read() checks value of group->next_qid and reads the first event
from queue 6 (event #1)
- event #1 has 13 stored in next_qid bits so set group->next_qid = 13
- read() reads first event from queue 13 (event #2)

Permission events require special care, but that is the idea of a simple singly
linked list using qid's for reading events by insert order and merging by hashed
queue.

The advantage of this method is that most of the generic code remains unaware
of the multi queue changes (see WIP) and I think it gets the job done without
complicating the code too much.

Thoughts?

Thanks,
Amir.

  reply	other threads:[~2021-01-26 16:22 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-17 13:14 [PATCH v2 00/16] Fanotify event with name info Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 01/16] fsnotify: tidy up FS_ and FAN_ constants Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 02/16] fsnotify: factor helpers fsnotify_dentry() and fsnotify_file() Amir Goldstein
2020-02-25 13:46   ` Jan Kara
2020-02-25 14:27     ` Amir Goldstein
2020-02-26 13:59       ` Jan Kara
2020-02-17 13:14 ` [PATCH v2 03/16] fsnotify: funnel all dirent events through fsnotify_name() Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 04/16] fsnotify: use helpers to access data by data_type Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 05/16] fsnotify: simplify arguments passing to fsnotify_parent() Amir Goldstein
2020-02-19 10:50   ` kbuild test robot
2020-02-19 10:50     ` kbuild test robot
2020-02-19 11:11   ` Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 06/16] fsnotify: pass dentry instead of inode for events possible on child Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 07/16] fsnotify: replace inode pointer with tag Amir Goldstein
2020-02-26  8:20   ` Jan Kara
2020-02-26  9:34     ` Amir Goldstein
2020-02-26  8:52   ` Jan Kara
2020-02-17 13:14 ` [PATCH v2 08/16] fanotify: merge duplicate events on parent and child Amir Goldstein
2020-02-26  9:18   ` Jan Kara
2020-02-26 12:14     ` Amir Goldstein
2020-02-26 14:38       ` Jan Kara
2021-01-22 13:59         ` fanotify_merge improvements Amir Goldstein
2021-01-23 13:30           ` Amir Goldstein
2021-01-25 13:01             ` Jan Kara
2021-01-26 16:21               ` Amir Goldstein [this message]
2021-01-27 11:24                 ` Jan Kara
2021-01-27 12:57                   ` Amir Goldstein
2021-01-27 15:15                     ` Jan Kara
2021-01-27 18:03                       ` Amir Goldstein
2021-01-28 10:27                         ` Jan Kara
2021-01-28 18:50                           ` Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 09/16] fanotify: fix merging marks masks with FAN_ONDIR Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 10/16] fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 11/16] fanotify: prepare to encode both parent and child fid's Amir Goldstein
2020-02-26 10:23   ` Jan Kara
2020-02-26 11:53     ` Amir Goldstein
2020-02-26 17:07       ` Jan Kara
2020-02-26 17:50         ` Amir Goldstein
2020-02-27  9:06           ` Amir Goldstein
2020-02-27 11:27             ` Jan Kara
2020-02-27 12:12               ` Amir Goldstein
2020-02-27 13:30                 ` Jan Kara
2020-02-27 14:06                   ` Amir Goldstein
2020-03-01 16:26                     ` Amir Goldstein
2020-03-05 15:49                       ` Jan Kara
2020-03-06 11:19                         ` Amir Goldstein
2020-03-08  7:29                           ` Amir Goldstein
2020-03-18 17:51                             ` Jan Kara
2020-03-18 18:50                               ` Amir Goldstein
2020-03-19  9:30                                 ` Jan Kara
2020-03-19 10:07                                   ` Amir Goldstein
2020-03-30 19:29                                 ` Amir Goldstein
2020-02-27 11:01           ` Jan Kara
2020-02-17 13:14 ` [PATCH v2 12/16] fanotify: record name info for FAN_DIR_MODIFY event Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 13/16] fanotify: report " Amir Goldstein
2020-02-19  9:43   ` kbuild test robot
2020-02-19  9:43     ` kbuild test robot
2020-02-19 10:17   ` kbuild test robot
2020-02-19 10:17     ` kbuild test robot
2020-02-19 11:22   ` Amir Goldstein
2020-04-16 12:16   ` Michael Kerrisk (man-pages)
2020-04-20 15:53     ` Jan Kara
2020-04-20 18:45     ` Amir Goldstein
2020-04-20 18:47       ` Michael Kerrisk (man-pages)
2020-02-17 13:14 ` [PATCH v2 14/16] fanotify: report parent fid + name with FAN_REPORT_NAME Amir Goldstein
2020-02-17 13:14 ` [PATCH v2 15/16] fanotify: refine rules for when name is reported Amir Goldstein
2020-02-17 13:14 ` [BONUS][PATCH v2 16/16] fanotify: support limited functionality for unprivileged users Amir Goldstein
2020-02-20 22:10 ` [PATCH v2 00/16] Fanotify event with name info Matthew Bobrowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOQ4uxiSSYr4bejwZBBPDjs1Vg_BUSSjY4YiUAgri=adHdOLuQ@mail.gmail.com' \
    --to=amir73il@gmail.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.