linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Beata Michalska <b.michalska@samsung.com>
To: Greg KH <greg@kroah.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-api@vger.kernel.org, tytso@mit.edu,
	adilger.kernel@dilger.ca, hughd@google.com, lczerner@redhat.com,
	hch@infradead.org, linux-ext4@vger.kernel.org,
	linux-mm@kvack.org, kyungmin.park@samsung.com,
	kmpark@infradead.org
Subject: Re: [RFC v2 1/4] fs: Add generic file system event notifications
Date: Tue, 26 May 2015 18:39:48 +0200	[thread overview]
Message-ID: <5564A1D4.4040309@samsung.com> (raw)
In-Reply-To: <554B5329.8040907@samsung.com>

Hi,

On 05/07/2015 01:57 PM, Beata Michalska wrote:
> Hi,
> 
> On 05/05/2015 02:16 PM, Beata Michalska wrote:
>> Hi again,
>>
>> On 04/29/2015 11:13 AM, Greg KH wrote:
>>> On Wed, Apr 29, 2015 at 09:42:59AM +0200, Jan Kara wrote:
>>>> On Wed 29-04-15 09:03:08, Beata Michalska wrote:
>>>>> On 04/28/2015 07:39 PM, Greg KH wrote:
>>>>>> On Tue, Apr 28, 2015 at 04:46:46PM +0200, Beata Michalska wrote:
>>>>>>> On 04/28/2015 04:09 PM, Greg KH wrote:
>>>>>>>> On Tue, Apr 28, 2015 at 03:56:53PM +0200, Jan Kara wrote:
>>>>>>>>> On Mon 27-04-15 17:37:11, Greg KH wrote:
>>>>>>>>>> On Mon, Apr 27, 2015 at 05:08:27PM +0200, Beata Michalska wrote:
>>>>>>>>>>> On 04/27/2015 04:24 PM, Greg KH wrote:
>>>>>>>>>>>> On Mon, Apr 27, 2015 at 01:51:41PM +0200, Beata Michalska wrote:
>>>>>>>>>>>>> Introduce configurable generic interface for file
>>>>>>>>>>>>> system-wide event notifications, to provide file
>>>>>>>>>>>>> systems with a common way of reporting any potential
>>>>>>>>>>>>> issues as they emerge.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The notifications are to be issued through generic
>>>>>>>>>>>>> netlink interface by newly introduced multicast group.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Threshold notifications have been included, allowing
>>>>>>>>>>>>> triggering an event whenever the amount of free space drops
>>>>>>>>>>>>> below a certain level - or levels to be more precise as two
>>>>>>>>>>>>> of them are being supported: the lower and the upper range.
>>>>>>>>>>>>> The notifications work both ways: once the threshold level
>>>>>>>>>>>>> has been reached, an event shall be generated whenever
>>>>>>>>>>>>> the number of available blocks goes up again re-activating
>>>>>>>>>>>>> the threshold.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The interface has been exposed through a vfs. Once mounted,
>>>>>>>>>>>>> it serves as an entry point for the set-up where one can
>>>>>>>>>>>>> register for particular file system events.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Beata Michalska <b.michalska@samsung.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>  Documentation/filesystems/events.txt |  231 ++++++++++
>>>>>>>>>>>>>  fs/Makefile                          |    1 +
>>>>>>>>>>>>>  fs/events/Makefile                   |    6 +
>>>>>>>>>>>>>  fs/events/fs_event.c                 |  770 ++++++++++++++++++++++++++++++++++
>>>>>>>>>>>>>  fs/events/fs_event.h                 |   25 ++
>>>>>>>>>>>>>  fs/events/fs_event_netlink.c         |   99 +++++
>>>>>>>>>>>>>  fs/namespace.c                       |    1 +
>>>>>>>>>>>>>  include/linux/fs.h                   |    6 +-
>>>>>>>>>>>>>  include/linux/fs_event.h             |   58 +++
>>>>>>>>>>>>>  include/uapi/linux/fs_event.h        |   54 +++
>>>>>>>>>>>>>  include/uapi/linux/genetlink.h       |    1 +
>>>>>>>>>>>>>  net/netlink/genetlink.c              |    7 +-
>>>>>>>>>>>>>  12 files changed, 1257 insertions(+), 2 deletions(-)
>>>>>>>>>>>>>  create mode 100644 Documentation/filesystems/events.txt
>>>>>>>>>>>>>  create mode 100644 fs/events/Makefile
>>>>>>>>>>>>>  create mode 100644 fs/events/fs_event.c
>>>>>>>>>>>>>  create mode 100644 fs/events/fs_event.h
>>>>>>>>>>>>>  create mode 100644 fs/events/fs_event_netlink.c
>>>>>>>>>>>>>  create mode 100644 include/linux/fs_event.h
>>>>>>>>>>>>>  create mode 100644 include/uapi/linux/fs_event.h
>>>>>>>>>>>>
>>>>>>>>>>>> Any reason why you just don't do uevents for the block devices today,
>>>>>>>>>>>> and not create a new type of netlink message and userspace tool required
>>>>>>>>>>>> to read these?
>>>>>>>>>>>
>>>>>>>>>>> The idea here is to have support for filesystems with no backing device as well.
>>>>>>>>>>> Parsing the message with libnl is really simple and requires few lines of code
>>>>>>>>>>> (sample application has been presented in the initial version of this RFC)
>>>>>>>>>>
>>>>>>>>>> I'm not saying it's not "simple" to parse, just that now you are doing
>>>>>>>>>> something that requires a different tool.  If you have a block device,
>>>>>>>>>> you should be able to emit uevents for it, you don't need a backing
>>>>>>>>>> device, we handle virtual filesystems in /sys/block/ just fine :)
>>>>>>>>>>
>>>>>>>>>> People already have tools that listen to libudev for system monitoring
>>>>>>>>>> and management, why require them to hook up to yet-another-library?  And
>>>>>>>>>> what is going to provide the ability for multiple userspace tools to
>>>>>>>>>> listen to these netlink messages in case you have more than one program
>>>>>>>>>> that wants to watch for these things (i.e. multiple desktop filesystem
>>>>>>>>>> monitoring tools, system-health checkers, etc.)?
>>>>>>>>>   As much as I understand your concerns I'm not convinced uevent interface
>>>>>>>>> is a good fit. There are filesystems that don't have underlying block
>>>>>>>>> device - think of e.g. tmpfs or filesystems working directly on top of
>>>>>>>>> flash devices.  These still want to send notification to userspace (one of
>>>>>>>>> primary motivation for this interfaces was so that tmpfs can notify about
>>>>>>>>> something). And creating some fake nodes in /sys/block for tmpfs and
>>>>>>>>> similar filesystems seems like doing more harm than good to me...
>>>>>>>>
>>>>>>>> If these are "fake" block devices, what's going to be present in the
>>>>>>>> block major/minor fields of the netlink message?  For some reason I
>>>>>>>> thought it was a required field, and because of that, I thought we had a
>>>>>>>> "real" filesystem somewhere to refer to, otherwise how would userspace
>>>>>>>> know what filesystem was creating these events?
>>>>>>>>
>>>>>>>> What am I missing here?
>>>>>>>>
>>>>>>>> confused,
>>>>>>>>
>>>>>>>> greg k-h
>>>>>>>>
>>>>>>>
>>>>>>> For those 'fake' block devs, upon mount, get_anon_bdev will assign
>>>>>>> the major:minor numbers. Userspace might get those through stat.
>>>>>>
>>>>>> How can userspace do the mapping backwards from this "anonymous"
>>>>>> major:minor number for these types of filesystems in such a way that
>>>>>> they can "know" how to report the block device that is causing the
>>>>>> event?
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> greg k-h
>>>>>>
>>>>>
>>>>> It needs to be done internally by the app but is doable.
>>>>> The app knows what it is watching, so it can maintain the mappings.
>>>>> So prior to activating the notifications it can call 'stat' on the mount point.
>>>>> Stat struct gives the 'st_dev' which is the device id. Same will be reported
>>>>> within the message payload (through major:minor numbers). So having this,
>>>>> the app is able to get any other information it needs. 
>>>>> Note that the events refer to the file system as a whole and they may not
>>>>> necessarily have anything to do with the actual block device. 
>>>
>>> How are you going to show an event for a filesystem that is made up of
>>> multiple block devices?
>>>
>>>>   Or you can use /proc/self/mountinfo for the mapping. There you can see
>>>> device numbers, real device names if applicable and mountpoints. This has
>>>> the advantage that it works even if filesystem mountpoints change.
>>>
>>> Ok, then that brings up my next question, how does this handle
>>> namespaces?  What namespace is the event being sent in?  block devices
>>> aren't namespaced, but the mount points are, is that going to cause
>>> problems?
>>>
>>> thanks,
>>>
>>> greg k-h
>>>
>>
>> Getting back to the namespaces ... 
>> In the current state the notifications will be sent to the init network namespace,
>> which means that processes belonging to a different net namespace will not
>> be able to receive them. To be more precise, those processes will not be 
>> able to subscribe to the multicast group, though this can be easily changed.
>> Furthermore, the notifications might also be sent to specific namespace.
>> In this case, the one, with which the trace for the mount point has been registered,
>> which as I believe would be the best approach.
>>
>> As for the mount namespaces, reading the config file needs to be slightly tweaked, 
>> to hide away all the registered mount points which does not belong to the current
>> mount namespace.
>>
>> Still, there is one possible 'issue' - the private/slave mount points. 
>> As the notifications will be sent to all the listeners (within the same netns),
>> the events might be visible to processes outside the given mount ns.
>> This should be limited to only those listeners that share the mount namespace,
>> to which such private/slave mount points belong. As using the generic netlink
>> to filter the outgoing messages is doable (with small changes to current
>> implementation), the filters themselves seem rather cumbersome, as they would require
>> finding the socket’s owner mount namespace, which just doesn't seems right.
>> On the other hand, identifying the file system, which generated the event, will
>> not be possible for processes outside such namespace, as device major:minor
>> numbers are not bound to any namespace (afaict) so they will not provide any
>> valid information. They will remain unresolved.
>>
>> The best way out here though, is to leave it to userspace to properly setup new namespaces:
>> the mount namespace with possible private/slave mounts should have a separate 
>> network namespace to isolate the potential fs events, if required.
>>
>>
>> BR
>> Beata
>>
>>
>>
> 
> I'm not really sure where we are with this RFC now (?).
> Just wanted to let You know I won't be available for the next two weeks,
> in case this comes around.
> 
> Best Regards
> Beata
> 
> 

Things has gone a bit quiet thread wise ...
As I believe I've managed to snap back to reality, I was hoping we could continue with this?
I'm not sure if we've got everything cleared up or ... have we reached a dead end?
Please let me know if we can move to the next stage? Or, if there are any showstoppers?

Thank You,

Best Regards
Beata


  reply	other threads:[~2015-05-26 16:39 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-27 11:51 [RFC v2 0/4] fs: Add generic file system event notifications Beata Michalska
2015-04-27 11:51 ` [RFC v2 1/4] " Beata Michalska
2015-04-27 14:24   ` Greg KH
2015-04-27 15:08     ` Beata Michalska
2015-04-27 15:37       ` Greg KH
2015-04-28  9:05         ` Beata Michalska
2015-04-28 13:56         ` Jan Kara
2015-04-28 14:09           ` Greg KH
2015-04-28 14:46             ` Beata Michalska
2015-04-28 17:39               ` Greg KH
2015-04-29  7:03                 ` Beata Michalska
2015-04-29  7:42                   ` Jan Kara
2015-04-29  9:13                     ` Greg KH
2015-04-29 11:10                       ` Beata Michalska
2015-04-29 13:45                         ` Greg KH
2015-04-29 15:48                           ` Beata Michalska
2015-04-29 15:55                             ` Greg KH
2015-04-30  8:21                               ` Beata Michalska
2015-05-05 12:16                       ` Beata Michalska
2015-05-07 11:57                         ` Beata Michalska
2015-05-26 16:39                           ` Beata Michalska [this message]
2015-05-27  2:34                             ` Greg KH
2015-05-27 13:32                               ` Beata Michalska
2015-04-27 11:51 ` [RFC v2 2/4] ext4: Add helper function to mark group as corrupted Beata Michalska
2015-04-27 11:51 ` [RFC v2 3/4] ext4: Add support for generic FS events Beata Michalska
2015-04-27 11:51 ` [RFC v2 4/4] shmem: " Beata Michalska

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5564A1D4.4040309@samsung.com \
    --to=b.michalska@samsung.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=greg@kroah.com \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=kmpark@infradead.org \
    --cc=kyungmin.park@samsung.com \
    --cc=lczerner@redhat.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).