All of lore.kernel.org
 help / color / mirror / Atom feed
* A few concerns about fanotify implementation.
@ 2010-10-26 12:13 Vasily Novikov
  2010-10-26 12:58 ` [malware-list] " Tvrtko Ursulin
  0 siblings, 1 reply; 17+ messages in thread
From: Vasily Novikov @ 2010-10-26 12:13 UTC (permalink / raw)
  To: eparis-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Novikov, Vasily,
	malware-list-h+Im9A44IAFcMpApZELgcQ

Hi Eric,

We are interested in using fanotify in anti-malware applications. I
found a few concerns in fanotify implementation from the recently
released 2.6.36 kernel:

1. Race in cache implementation.

The cache is implemented as inode ignored mark. I suppose there could be
a race here.
Consider the following scenario with hostile processes A and B, and
victim process C:
1. Process A opens new file for writing. File check request is
generated.
2. File check is performed in userspace. Check result is "file has no
malware".
3. The "permit" response is delivered to kernel space.
4. File ignored mark set.
5. Process A writes dummy bytes to the file. File ignored flags are
cleared.
6. Process B opens the same file for reading. File check request is
generated.
7. File check is performed in userspace. Check result is "file has no
malware".
8. Process A writes malware bytes to the file. There is no cached
response yet.
9. The "permit" response is delivered to kernel space and is cached in
fanotify.
10. File ignored mark set.
11. Now any process C will be permitted to open the malware file.
There is a race between steps 8 and 10.

The race could be easily reproduced by Andreas's fanotify example:
  console1# ./fanotify -s1 -o open_perm,modify,close -m /mnt
  console2# while :; do echo 123 >> /mnt/123.txt; done
echo command opens, then writes, so write should clean ignore mask and
every open call should be intercepted but actually only every 2-nd call
is intercepted.

I be believe it could be solved by introducing two more ignore mark
flags. The fist one to set before the scan starts. It could be cleaned
by write operation. The second one to ask fanotify to set ignore flags
only if the first flag is still set. In this case we will never have
file with not scanned file changes in cache.

2. As I understood it was intended to flush cache via FAN_MARK_FLUSH
flag but it is currently disabled and there is no notion about it in the
man page. There are cases when it is necessary to flush all cache, for
example on anti-malware bases update.

3. I read the discussion about how to define paths to scan but anyway.
We would prefer to have global listener that was defined in the first
version of the interface and mark unnecessary mount points with
persistent ignore flags.

4. FAN_DENY response has no effect at the moment.

Regards,
  Vasily

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [malware-list] A few concerns about fanotify implementation.
  2010-10-26 12:13 A few concerns about fanotify implementation Vasily Novikov
@ 2010-10-26 12:58 ` Tvrtko Ursulin
  2010-10-26 13:58   ` Vasily Novikov
  2010-10-27  8:54   ` [malware-list] " Vasily Novikov
  0 siblings, 2 replies; 17+ messages in thread
From: Tvrtko Ursulin @ 2010-10-26 12:58 UTC (permalink / raw)
  To: malware-list; +Cc: Vasily Novikov, eparis, linux-fsdevel


[trimming the CC list because it looked really funky on my end]

Hi Vasily,

On Tuesday 26 Oct 2010 13:13:15 Vasily Novikov wrote:
> Hi Eric,
>
> We are interested in using fanotify in anti-malware applications. I
> found a few concerns in fanotify implementation from the recently
> released 2.6.36 kernel:
>
> 1. Race in cache implementation.

[snip]

> I be believe it could be solved by introducing two more ignore mark
> flags. The fist one to set before the scan starts. It could be cleaned
> by write operation. The second one to ask fanotify to set ignore flags
> only if the first flag is still set. In this case we will never have
> file with not scanned file changes in cache.

Interesting that you have also found this - I suspected it but did not
actually got round verifying it.

Another possible (and simpler) solution is to refuse (ignore) adding ignore
marks if file (well inode) is opened for writing (inode->i_writecount > 0)?
More or less this is the approach we use in Talpa.

> 2. As I understood it was intended to flush cache via FAN_MARK_FLUSH
> flag but it is currently disabled and there is no notion about it in the
> man page. There are cases when it is necessary to flush all cache, for
> example on anti-malware bases update.

Where do you see this as disabled?

> 3. I read the discussion about how to define paths to scan but anyway.
> We would prefer to have global listener that was defined in the first
> version of the interface and mark unnecessary mount points with
> persistent ignore flags.

Yeah, but according to Eric there was fierce opposition against global mode
and hence he dropped it. I personally think anti-global mode arguments are not
that solid but what can you do.

I am pursuing another path of trying to add support for mount marks which
automatically propagate to sub-mounts. That way you can mark root with a mount
mark and when a new filesystem appears under it it will automatically inherit
that mark. I have a proof of concept patch which works but needs some
refactoring to comply with fanotify locking rules. Hopefully it will be
possible to do it in which case I will post it for review.

> 4. FAN_DENY response has no effect at the moment.

I posted a patch for this some time ago (check the archive for September 7th).
It wasn't 100% correct because of my misunderstanding of how mark merging
works so I believe Eric will fix that properly in the next release.

Tvrtko

Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 348 3873 20.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [malware-list] A few concerns about fanotify implementation.
  2010-10-26 12:58 ` [malware-list] " Tvrtko Ursulin
@ 2010-10-26 13:58   ` Vasily Novikov
  2010-10-26 14:22     ` Tvrtko Ursulin
  2010-10-27  8:54   ` [malware-list] " Vasily Novikov
  1 sibling, 1 reply; 17+ messages in thread
From: Vasily Novikov @ 2010-10-26 13:58 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: malware-list, eparis, linux-fsdevel

Hi Tvrtko,

> > 1. Race in cache implementation.
> 
> [snip]
> 
> > I be believe it could be solved by introducing two more ignore mark
> > flags. The fist one to set before the scan starts. It could be cleaned
> > by write operation. The second one to ask fanotify to set ignore flags
> > only if the first flag is still set. In this case we will never have
> > file with not scanned file changes in cache.
> 
> Interesting that you have also found this - I suspected it but did not
> actually got round verifying it.
> 
> Another possible (and simpler) solution is to refuse (ignore) adding ignore
> marks if file (well inode) is opened for writing (inode->i_writecount > 0)?
> More or less this is the approach we use in Talpa.

I agree. It's simpler and more clear and doesn't require changing the
interface.

> > 2. As I understood it was intended to flush cache via FAN_MARK_FLUSH
> > flag but it is currently disabled and there is no notion about it in the
> > man page. There are cases when it is necessary to flush all cache, for
> > example on anti-malware bases update.
> 
> Where do you see this as disabled?

In fanotify_mark syscall flags argument is checked and if FAN_MARK_FLUSH
bit is set then EINVAL error is returned.

include/linux/fanotify.h:
    31  #define FAN_MARK_ADD            0x00000001
    32  #define FAN_MARK_REMOVE         0x00000002
    33  #define FAN_MARK_DONT_FOLLOW    0x00000004
    34  #define FAN_MARK_ONLYDIR        0x00000008
    35  #define FAN_MARK_MOUNT          0x00000010
    36  #define FAN_MARK_IGNORED_MASK   0x00000020
    37  #define FAN_MARK_IGNORED_SURV_MODIFY    0x00000040
    38  #define FAN_MARK_FLUSH          0x00000080
    39  
    40  #define FAN_ALL_MARK_FLAGS      (FAN_MARK_ADD |\
    41                                   FAN_MARK_REMOVE |\
    42                                   FAN_MARK_DONT_FOLLOW |\
    43                                   FAN_MARK_ONLYDIR |\
    44                                   FAN_MARK_MOUNT |\
    45                                   FAN_MARK_IGNORED_MASK |\
    46                                   FAN_MARK_IGNORED_SURV_MODIFY)

fs/notify/fanotify/fanotify_user.c:
   678  SYSCALL_DEFINE(fanotify_mark)(int fanotify_fd, unsigned int
flags,
   679                                __u64 mask, int dfd,
   680                                const char  __user * pathname)
   ...
   696          if (flags & ~FAN_ALL_MARK_FLAGS)
   697                  return -EINVAL;


Regards,
  Vasily


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [malware-list] A few concerns about fanotify implementation.
  2010-10-26 13:58   ` Vasily Novikov
@ 2010-10-26 14:22     ` Tvrtko Ursulin
       [not found]       ` <201010261522.34157.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Tvrtko Ursulin @ 2010-10-26 14:22 UTC (permalink / raw)
  To: Vasily Novikov; +Cc: malware-list, eparis, linux-fsdevel

On Tuesday 26 Oct 2010 14:58:34 Vasily Novikov wrote:
> > > 2. As I understood it was intended to flush cache via FAN_MARK_FLUSH
> > > flag but it is currently disabled and there is no notion about it in
> > > the man page. There are cases when it is necessary to flush all cache,
> > > for example on anti-malware bases update.
> >
> > Where do you see this as disabled?
>
> In fanotify_mark syscall flags argument is checked and if FAN_MARK_FLUSH
> bit is set then EINVAL error is returned.

[snip]

>     40  #define FAN_ALL_MARK_FLAGS      (FAN_MARK_ADD |\
>     41                                   FAN_MARK_REMOVE |\
>     42                                   FAN_MARK_DONT_FOLLOW |\
>     43                                   FAN_MARK_ONLYDIR |\
>     44                                   FAN_MARK_MOUNT |\
>     45                                   FAN_MARK_IGNORED_MASK |\
>     46                                   FAN_MARK_IGNORED_SURV_MODIFY)
>
> fs/notify/fanotify/fanotify_user.c:
>    678  SYSCALL_DEFINE(fanotify_mark)(int fanotify_fd, unsigned int
> flags,
>    679                                __u64 mask, int dfd,
>    680                                const char  __user * pathname)
>    ...
>    696          if (flags & ~FAN_ALL_MARK_FLAGS)
>    697                  return -EINVAL;

You are right. I suspect it is just an accidental omission of FAN_MARK_FLUSH
from FAN_ALL_MARK_FLAGS. Eric would probably appreciate if you send him a
patch for this since he is quite busy these days.

Tvrtko

Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 348 3873 20.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: A few concerns about fanotify implementation.
       [not found]       ` <201010261522.34157.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
@ 2010-10-26 14:58         ` Eric Paris
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Paris @ 2010-10-26 14:58 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Vasily Novikov, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	malware-list-h+Im9A44IAFcMpApZELgcQ

On Tue, 2010-10-26 at 15:22 +0100, Tvrtko Ursulin wrote:
> On Tuesday 26 Oct 2010 14:58:34 Vasily Novikov wrote:
> > > > 2. As I understood it was intended to flush cache via FAN_MARK_FLUSH
> > > > flag but it is currently disabled and there is no notion about it in
> > > > the man page. There are cases when it is necessary to flush all cache,
> > > > for example on anti-malware bases update.
> > >
> > > Where do you see this as disabled?
> >
> > In fanotify_mark syscall flags argument is checked and if FAN_MARK_FLUSH
> > bit is set then EINVAL error is returned.
> 
> [snip]
> 
> >     40  #define FAN_ALL_MARK_FLAGS      (FAN_MARK_ADD |\
> >     41                                   FAN_MARK_REMOVE |\
> >     42                                   FAN_MARK_DONT_FOLLOW |\
> >     43                                   FAN_MARK_ONLYDIR |\
> >     44                                   FAN_MARK_MOUNT |\
> >     45                                   FAN_MARK_IGNORED_MASK |\
> >     46                                   FAN_MARK_IGNORED_SURV_MODIFY)
> >
> > fs/notify/fanotify/fanotify_user.c:
> >    678  SYSCALL_DEFINE(fanotify_mark)(int fanotify_fd, unsigned int
> > flags,
> >    679                                __u64 mask, int dfd,
> >    680                                const char  __user * pathname)
> >    ...
> >    696          if (flags & ~FAN_ALL_MARK_FLAGS)
> >    697                  return -EINVAL;
> 
> You are right. I suspect it is just an accidental omission of FAN_MARK_FLUSH
> from FAN_ALL_MARK_FLAGS. Eric would probably appreciate if you send him a
> patch for this since he is quite busy these days.

I actually started to get the fanotify tree into order last night.  I'll
try to address all of your comments and send my series of patches today.
Thanks SOOOOOOOOOO much for the review!

-Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [malware-list] A few concerns about fanotify implementation.
  2010-10-26 12:58 ` [malware-list] " Tvrtko Ursulin
  2010-10-26 13:58   ` Vasily Novikov
@ 2010-10-27  8:54   ` Vasily Novikov
  2010-10-27 15:58     ` Eric Paris
  1 sibling, 1 reply; 17+ messages in thread
From: Vasily Novikov @ 2010-10-27  8:54 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: malware-list, eparis, linux-fsdevel

> > 3. I read the discussion about how to define paths to scan but anyway.
> > We would prefer to have global listener that was defined in the first
> > version of the interface and mark unnecessary mount points with
> > persistent ignore flags.
> 
> Yeah, but according to Eric there was fierce opposition against global mode
> and hence he dropped it. I personally think anti-global mode arguments are not
> that solid but what can you do.
> 
> I am pursuing another path of trying to add support for mount marks which
> automatically propagate to sub-mounts. That way you can mark root with a mount
> mark and when a new filesystem appears under it it will automatically inherit
> that mark. I have a proof of concept patch which works but needs some
> refactoring to comply with fanotify locking rules. Hopefully it will be
> possible to do it in which case I will post it for review.

It looks good. I can help you with testing the patch.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [malware-list] A few concerns about fanotify implementation.
  2010-10-27  8:54   ` [malware-list] " Vasily Novikov
@ 2010-10-27 15:58     ` Eric Paris
       [not found]       ` <1288195134.2655.202.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Paris @ 2010-10-27 15:58 UTC (permalink / raw)
  To: Vasily Novikov; +Cc: Tvrtko Ursulin, malware-list, linux-fsdevel

On Wed, 2010-10-27 at 12:54 +0400, Vasily Novikov wrote:
> > > 3. I read the discussion about how to define paths to scan but anyway.
> > > We would prefer to have global listener that was defined in the first
> > > version of the interface and mark unnecessary mount points with
> > > persistent ignore flags.
> > 
> > Yeah, but according to Eric there was fierce opposition against global mode
> > and hence he dropped it. I personally think anti-global mode arguments are not
> > that solid but what can you do.
> > 
> > I am pursuing another path of trying to add support for mount marks which
> > automatically propagate to sub-mounts. That way you can mark root with a mount
> > mark and when a new filesystem appears under it it will automatically inherit
> > that mark. I have a proof of concept patch which works but needs some
> > refactoring to comply with fanotify locking rules. Hopefully it will be
> > possible to do it in which case I will post it for review.
> 
> It looks good. I can help you with testing the patch.

I've got about 20 patches in

http://git.infradead.org/users/eparis/notify.git/shortlog/refs/heads/for-next

Which I hope addresses everything people have been asking for (except
Tvrtko's mount point inheritance wish)

-Eric


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: A few concerns about fanotify implementation.
       [not found]       ` <1288195134.2655.202.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2011-06-03  9:43         ` Vasily Novikov
       [not found]           ` <4DE8ACAD.2080003-BkmlMuIjteXqlBn2x/YWAg@public.gmane.org>
  2011-06-06 10:27           ` [malware-list] " Lino Sanfilippo
  0 siblings, 2 replies; 17+ messages in thread
From: Vasily Novikov @ 2011-06-03  9:43 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	malware-list-h+Im9A44IAFcMpApZELgcQ

[-- Attachment #1: Type: text/plain, Size: 1635 bytes --]

Hi Eric,

We are moving to fanotify at the moment. It meets our needs except 
couple minor issues we would like to be fixed in future versions:

1. The file is thrown out of the cache only when it is modified. But in 
case there are different scan options for different dirs that's not 
enough. So we also need it to be evicted from cache on rename or number 
of hard links change.
It is the most important issue for us.
The patch is in attachment: add_clear_cache_events.patch

2. We can't use unlimited cache because it can potentially grab too much 
memory and slow down a system. In case we use limited cache it can be 
easily filled with not very frequently used files. So the only option we 
have at the moment is to clear cache every time it is full.
The solution we consider the most appropriate is to introduce 
replaceable marks and LRU cache for them in fanotify.
So we suggest to introduce a new flag FAN_MARK_REPLACEABLE for marks. 
That will not break the current API.
The patch is in attachment: marks_lru_cache.patch

3. The fanotify file descriptor is always ready to be written to it. But 
it's poll method says the opposite. In case you handle it by yourself 
that's not a problem. But in case you use some async io library as we do 
that polls fds internally it doesn't work.
The patch is in attachment: fanotify_write_poll.patch

-- 
Best regards,

Vasily Novikov | Software developer | Kaspersky Lab

Direct: +7 495 123 45 67 x2344 | Mobile: +7 964 786 44 82 | 
vasily.novikov-BkmlMuIjteXqlBn2x/YWAg@public.gmane.org
10/1, 1st Volokolamsky Proezd, Moscow, 123060, Russia | 
www.kaspersky.com,  www.securelist.com

[-- Attachment #2: add_clear_cache_events.patch --]
[-- Type: text/x-patch, Size: 1726 bytes --]


#
# Patch managed by http://www.holgerschurig.de/patcher.html
#

--- //media/storage/src/linux-2.6.38/fs/notify/fsnotify.c~add_clear_cache_events
+++ //media/storage/src/linux-2.6.38/fs/notify/fsnotify.c
@@ -140,7 +140,7 @@
 	}
 
 	/* clear ignored on inode modification */
-	if (mask & FS_MODIFY) {
+	if (mask & FS_CLEAR_IGNORED_MASK_EVENTS) {
 		if (inode_mark &&
 		    !(inode_mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY))
 			inode_mark->ignored_mask = 0;
@@ -220,19 +220,19 @@
 	 * otherwise return if neither the inode nor the vfsmount care about
 	 * this type of event.
 	 */
-	if (!(mask & FS_MODIFY) &&
+	if (!(mask & FS_CLEAR_IGNORED_MASK_EVENTS) &&
 	    !(test_mask & to_tell->i_fsnotify_mask) &&
 	    !(mnt && test_mask & mnt->mnt_fsnotify_mask))
 		return 0;
 
 	idx = srcu_read_lock(&fsnotify_mark_srcu);
 
-	if ((mask & FS_MODIFY) ||
+	if ((mask & FS_CLEAR_IGNORED_MASK_EVENTS) ||
 	    (test_mask & to_tell->i_fsnotify_mask))
 		inode_node = srcu_dereference(to_tell->i_fsnotify_marks.first,
 					      &fsnotify_mark_srcu);
 
-	if (mnt && ((mask & FS_MODIFY) ||
+	if (mnt && ((mask & FS_CLEAR_IGNORED_MASK_EVENTS) ||
 		    (test_mask & mnt->mnt_fsnotify_mask))) {
 		vfsmount_node = srcu_dereference(mnt->mnt_fsnotify_marks.first,
 						 &fsnotify_mark_srcu);
--- //media/storage/src/linux-2.6.38/include/linux/fsnotify_backend.h~add_clear_cache_events
+++ //media/storage/src/linux-2.6.38/include/linux/fsnotify_backend.h
@@ -75,6 +75,8 @@
 			     FS_ISDIR | FS_IN_ONESHOT | FS_DN_RENAME | \
 			     FS_DN_MULTISHOT | FS_EVENT_ON_CHILD)
 
+#define FS_CLEAR_IGNORED_MASK_EVENTS (FS_MODIFY | FS_ATTRIB | FS_MOVE_SELF)
+
 struct fsnotify_group;
 struct fsnotify_event;
 struct fsnotify_mark;

[-- Attachment #3: fanotify_write_poll.patch --]
[-- Type: text/x-patch, Size: 751 bytes --]


#
# Patch managed by http://www.holgerschurig.de/patcher.html
#

--- //media/storage/src/linux-2.6.38/fs/notify/fanotify/fanotify_user.c~fanotify_write_poll
+++ //media/storage/src/linux-2.6.38/fs/notify/fanotify/fanotify_user.c
@@ -301,12 +301,16 @@
 static unsigned int fanotify_poll(struct file *file, poll_table *wait)
 {
 	struct fsnotify_group *group = file->private_data;
+#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
+	int ret = POLLOUT | POLLWRNORM;
+#else
 	int ret = 0;
+#endif
 
 	poll_wait(file, &group->notification_waitq, wait);
 	mutex_lock(&group->notification_mutex);
 	if (!fsnotify_notify_queue_is_empty(group))
-		ret = POLLIN | POLLRDNORM;
+		ret |= POLLIN | POLLRDNORM;
 	mutex_unlock(&group->notification_mutex);
 
 	return ret;

[-- Attachment #4: marks_lru_cache.patch --]
[-- Type: text/x-patch, Size: 11082 bytes --]


#
# Patch managed by http://www.holgerschurig.de/patcher.html
#

--- //media/storage/src/linux-2.6.38/include/linux/fanotify.h~marks_lru_cache
+++ //media/storage/src/linux-2.6.38/include/linux/fanotify.h
@@ -49,6 +49,7 @@
 #define FAN_MARK_IGNORED_MASK	0x00000020
 #define FAN_MARK_IGNORED_SURV_MODIFY	0x00000040
 #define FAN_MARK_FLUSH		0x00000080
+#define FAN_MARK_REPLACEABLE	0x00000200
 #ifdef __KERNEL__
 /* not valid from userspace, only kernel internal */
 #define FAN_MARK_ONDIR		0x00000100
@@ -61,7 +62,8 @@
 				 FAN_MARK_MOUNT |\
 				 FAN_MARK_IGNORED_MASK |\
 				 FAN_MARK_IGNORED_SURV_MODIFY |\
-				 FAN_MARK_FLUSH)
+				 FAN_MARK_FLUSH |\
+				 FAN_MARK_REPLACEABLE)
 
 /*
  * All of the events - we build the list by hand so that we can add flags in
--- //media/storage/src/linux-2.6.38/include/linux/fsnotify_backend.h~marks_lru_cache
+++ //media/storage/src/linux-2.6.38/include/linux/fsnotify_backend.h
@@ -93,6 +93,9 @@
  * freeing-mark - this means that a mark has been flagged to die when everything
  *		finishes using it.  The function is supplied with what must be a
  *		valid group and inode to use to clean up.
+ * on_ignored_event - called when an event is masked by ignored_mask and
+ *		therefore should_send_event & handle_event are not called.
+ *		This callback is required to implement LRU cache in fanotify.
  */
 struct fsnotify_ops {
 	bool (*should_send_event)(struct fsnotify_group *group, struct inode *inode,
@@ -106,6 +109,10 @@
 	void (*free_group_priv)(struct fsnotify_group *group);
 	void (*freeing_mark)(struct fsnotify_mark *mark, struct fsnotify_group *group);
 	void (*free_event_priv)(struct fsnotify_event_private_data *priv);
+	void (*on_ignored_event)(struct fsnotify_group *group, struct inode *inode,
+				  struct fsnotify_mark *inode_mark,
+				  struct fsnotify_mark *vfsmount_mark,
+				  __u32 mask, __u32 ignored_mask, void *data, int data_type);
 };
 
 /*
@@ -173,6 +180,8 @@
 			int f_flags;
 			unsigned int max_marks;
 			struct user_struct *user;
+			struct list_head lru_cache_list;
+			spinlock_t lru_cache_lock;		/* protect g_lru_cache_list */
 		} fanotify_data;
 #endif /* CONFIG_FANOTIFY */
 	};
--- //media/storage/src/linux-2.6.38/fs/notify/dnotify/dnotify.c~marks_lru_cache
+++ //media/storage/src/linux-2.6.38/fs/notify/dnotify/dnotify.c
@@ -157,6 +157,7 @@
 	.free_group_priv = NULL,
 	.freeing_mark = NULL,
 	.free_event_priv = NULL,
+	.on_ignored_event = NULL,
 };
 
 /*
--- //media/storage/src/linux-2.6.38/fs/notify/fanotify/fanotify.c~marks_lru_cache
+++ //media/storage/src/linux-2.6.38/fs/notify/fanotify/fanotify.c
@@ -9,6 +9,8 @@
 #include <linux/types.h>
 #include <linux/wait.h>
 
+#include "fanotify.h"
+
 static bool should_merge(struct fsnotify_event *old, struct fsnotify_event *new)
 {
 	pr_debug("%s: old=%p new=%p\n", __func__, old, new);
@@ -219,10 +221,49 @@
 	free_uid(user);
 }
 
+static void fanotify_freeing_mark(struct fsnotify_mark *fsn_mark, struct fsnotify_group *group)
+{
+	struct fanotify_mark *fan_mark = container_of(fsn_mark, struct fanotify_mark, fsn_mark);
+
+	spin_lock(&group->fanotify_data.lru_cache_lock);
+	if (!list_empty(&fan_mark->lru_cache_list))
+		list_del_init(&fan_mark->lru_cache_list);
+	spin_unlock(&group->fanotify_data.lru_cache_lock);
+}
+
+static void fanotify_lru_cache_hit(struct fsnotify_group *group, struct fsnotify_mark *fsn_mark)
+{
+	if (!fsn_mark) {
+		return;
+	} else {
+		struct fanotify_mark *fan_mark = container_of(fsn_mark, struct fanotify_mark, fsn_mark);
+
+		/* move fsn_mark to the head of lru_cache_list */
+		spin_lock(&group->fanotify_data.lru_cache_lock);
+		if (!list_empty(&fan_mark->lru_cache_list)) {
+			list_del_init(&fan_mark->lru_cache_list);
+			list_add(&fan_mark->lru_cache_list, &group->fanotify_data.lru_cache_list);
+		}
+		spin_unlock(&group->fanotify_data.lru_cache_lock);
+	}
+}
+
+static void fanotify_ignored_event(struct fsnotify_group *group,
+				       struct inode *to_tell,
+				       struct fsnotify_mark *inode_mark,
+				       struct fsnotify_mark *vfsmnt_mark,
+				       __u32 event_mask, __u32 ignored_mask,
+				       void *data, int data_type)
+{
+	fanotify_lru_cache_hit(group, inode_mark);
+	fanotify_lru_cache_hit(group, vfsmnt_mark);
+}
+
 const struct fsnotify_ops fanotify_fsnotify_ops = {
 	.handle_event = fanotify_handle_event,
 	.should_send_event = fanotify_should_send_event,
 	.free_group_priv = fanotify_free_group_priv,
 	.free_event_priv = NULL,
-	.freeing_mark = NULL,
+	.freeing_mark = fanotify_freeing_mark,
+	.on_ignored_event = fanotify_ignored_event,
 };
--- //media/storage/src/linux-2.6.38/fs/notify/inotify/inotify_fsnotify.c~marks_lru_cache
+++ //media/storage/src/linux-2.6.38/fs/notify/inotify/inotify_fsnotify.c
@@ -218,4 +218,5 @@
 	.free_group_priv = inotify_free_group_priv,
 	.free_event_priv = inotify_free_event_priv,
 	.freeing_mark = inotify_freeing_mark,
+	.on_ignored_event = NULL,
 };
--- //media/storage/src/linux-2.6.38/fs/notify/fsnotify.c~marks_lru_cache
+++ //media/storage/src/linux-2.6.38/fs/notify/fsnotify.c
@@ -133,6 +133,7 @@
 	struct fsnotify_group *group = NULL;
 	__u32 inode_test_mask = 0;
 	__u32 vfsmount_test_mask = 0;
+	__u32 ignored_mask = 0;
 
 	if (unlikely(!inode_mark && !vfsmount_mark)) {
 		BUG();
@@ -154,6 +155,7 @@
 		group = inode_mark->group;
 		inode_test_mask = (mask & ~FS_EVENT_ON_CHILD);
 		inode_test_mask &= inode_mark->mask;
+		ignored_mask |= inode_test_mask & inode_mark->ignored_mask;
 		inode_test_mask &= ~inode_mark->ignored_mask;
 	}
 
@@ -162,9 +164,12 @@
 		vfsmount_test_mask = (mask & ~FS_EVENT_ON_CHILD);
 		group = vfsmount_mark->group;
 		vfsmount_test_mask &= vfsmount_mark->mask;
+		ignored_mask |= vfsmount_test_mask & vfsmount_mark->ignored_mask;
 		vfsmount_test_mask &= ~vfsmount_mark->ignored_mask;
-		if (inode_mark)
+		if (inode_mark) {
+			ignored_mask |= vfsmount_test_mask & inode_mark->ignored_mask;
 			vfsmount_test_mask &= ~inode_mark->ignored_mask;
+		}
 	}
 
 	pr_debug("%s: group=%p to_tell=%p mnt=%p mask=%x inode_mark=%p"
@@ -174,8 +179,14 @@
 		 inode_test_mask, vfsmount_mark, vfsmount_test_mask, data,
 		 data_is, cookie, *event);
 
-	if (!inode_test_mask && !vfsmount_test_mask)
+	if (!inode_test_mask && !vfsmount_test_mask) {
+		if (group->ops->on_ignored_event && ignored_mask) {
+			group->ops->on_ignored_event(group, to_tell, inode_mark,
+						     vfsmount_mark, mask, ignored_mask,
+						     data, data_is);
+		}
 		return 0;
+	}
 
 	if (group->ops->should_send_event(group, to_tell, inode_mark,
 					  vfsmount_mark, mask, data,
--- //media/storage/src/linux-2.6.38/fs/notify/fanotify/fanotify_user.c~marks_lru_cache
+++ //media/storage/src/linux-2.6.38/fs/notify/fanotify/fanotify_user.c
@@ -16,6 +16,8 @@
 
 #include <asm/ioctls.h>
 
+#include "fanotify.h"
+
 #define FANOTIFY_DEFAULT_MAX_EVENTS	16384
 #define FANOTIFY_DEFAULT_MAX_MARKS	8192
 #define FANOTIFY_DEFAULT_MAX_LISTENERS	128
@@ -464,7 +466,51 @@
 
 static void fanotify_free_mark(struct fsnotify_mark *fsn_mark)
 {
-	kmem_cache_free(fanotify_mark_cache, fsn_mark);
+	struct fanotify_mark *fan_mark = container_of(fsn_mark, struct fanotify_mark, fsn_mark);
+	kmem_cache_free(fanotify_mark_cache, fan_mark);
+}
+
+static struct fsnotify_mark* fanotify_alloc_mark(void)
+{
+	struct fanotify_mark *fan_mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL);
+
+	if (!fan_mark)
+		return NULL;
+
+	fsnotify_init_mark(&fan_mark->fsn_mark, fanotify_free_mark);
+	INIT_LIST_HEAD(&fan_mark->lru_cache_list);
+
+	return &fan_mark->fsn_mark;
+}
+
+static void fanotify_lru_cache_add(struct fsnotify_mark *fsn_mark, struct fsnotify_group *group)
+{
+	struct fanotify_mark *fan_mark = container_of(fsn_mark, struct fanotify_mark, fsn_mark);
+	
+	spin_lock(&group->fanotify_data.lru_cache_lock);
+	list_add(&fan_mark->lru_cache_list, &group->fanotify_data.lru_cache_list);
+	spin_unlock(&group->fanotify_data.lru_cache_lock);
+}
+
+static bool fanotify_lru_cache_evict(struct fsnotify_group *group)
+{
+	struct fanotify_mark *fan_mark = NULL;
+
+	/* evict lru_cache_list tail */
+	spin_lock(&group->fanotify_data.lru_cache_lock);
+	if (!list_empty(&group->fanotify_data.lru_cache_list)) {
+		fan_mark = list_entry(group->fanotify_data.lru_cache_list.prev, struct fanotify_mark, lru_cache_list);
+		fsnotify_get_mark(&fan_mark->fsn_mark);
+		list_del_init(&fan_mark->lru_cache_list);
+	}
+	spin_unlock(&group->fanotify_data.lru_cache_lock);
+
+	if (fan_mark) {
+		fsnotify_destroy_mark(&fan_mark->fsn_mark);
+		fsnotify_put_mark(&fan_mark->fsn_mark);
+	}
+
+	return !!fan_mark;
 }
 
 static int fanotify_find_path(int dfd, const char __user *filename,
@@ -614,16 +660,19 @@
 	fsn_mark = fsnotify_find_vfsmount_mark(group, mnt);
 	if (!fsn_mark) {
 		if (atomic_read(&group->num_marks) > group->fanotify_data.max_marks)
-			return -ENOSPC;
+			if (!fanotify_lru_cache_evict(group))
+				return -ENOSPC;
 
-		fsn_mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL);
+		fsn_mark = fanotify_alloc_mark();
 		if (!fsn_mark)
 			return -ENOMEM;
 
-		fsnotify_init_mark(fsn_mark, fanotify_free_mark);
 		ret = fsnotify_add_mark(fsn_mark, group, NULL, mnt, 0);
 		if (ret)
 			goto err;
+
+		if (flags & FAN_MARK_REPLACEABLE)
+			fanotify_lru_cache_add(fsn_mark, group);
 	}
 	added = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
 
@@ -657,16 +706,19 @@
 	fsn_mark = fsnotify_find_inode_mark(group, inode);
 	if (!fsn_mark) {
 		if (atomic_read(&group->num_marks) > group->fanotify_data.max_marks)
-			return -ENOSPC;
+			if (!fanotify_lru_cache_evict(group))
+				return -ENOSPC;
 
-		fsn_mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL);
+		fsn_mark = fanotify_alloc_mark();
 		if (!fsn_mark)
 			return -ENOMEM;
 
-		fsnotify_init_mark(fsn_mark, fanotify_free_mark);
 		ret = fsnotify_add_mark(fsn_mark, group, inode, NULL, 0);
 		if (ret)
 			goto err;
+
+		if (flags & FAN_MARK_REPLACEABLE)
+			fanotify_lru_cache_add(fsn_mark, group);
 	}
 	added = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
 
@@ -722,6 +774,9 @@
 	INIT_LIST_HEAD(&group->fanotify_data.access_list);
 	atomic_set(&group->fanotify_data.bypass_perm, 0);
 #endif
+	INIT_LIST_HEAD(&group->fanotify_data.lru_cache_list);
+	spin_lock_init(&group->fanotify_data.lru_cache_lock);
+
 	switch (flags & FAN_ALL_CLASS_BITS) {
 	case FAN_CLASS_NOTIF:
 		group->priority = FS_PRIO_0;
@@ -886,7 +941,7 @@
  */
 static int __init fanotify_user_setup(void)
 {
-	fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC);
+	fanotify_mark_cache = KMEM_CACHE(fanotify_mark, SLAB_PANIC);
 	fanotify_response_event_cache = KMEM_CACHE(fanotify_response_event,
 						   SLAB_PANIC);
 
--- /dev/null
+++ //media/storage/src/linux-2.6.38/fs/notify/fanotify/fanotify.h
@@ -0,0 +1,12 @@
+#ifndef __FS_NOTIFY_FANOTIFY_FANOTIFY_H_
+#define __FS_NOTIFY_FANOTIFY_FANOTIFY_H_
+
+#include <linux/list.h>
+#include <linux/fsnotify.h>
+
+struct fanotify_mark {
+	struct fsnotify_mark fsn_mark;
+	struct list_head lru_cache_list;	/* list of marks that can be replaced when cache in full */
+};
+
+#endif /* __FS_NOTIFY_FANOTIFY_FANOTIFY_H_ */

[-- Attachment #5: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: A few concerns about fanotify implementation.
       [not found]           ` <4DE8ACAD.2080003-BkmlMuIjteXqlBn2x/YWAg@public.gmane.org>
@ 2011-06-06  9:02             ` Douglas Leeder
  2011-06-06  9:19               ` [malware-list] " Vasily Novikov
       [not found]               ` <C511438CDC161C41B3C47B91D99ABA8D37B4B42114-u5UUZ0l8pcxUerCGrXd8jcc3qqyFMPEu@public.gmane.org>
  0 siblings, 2 replies; 17+ messages in thread
From: Douglas Leeder @ 2011-06-06  9:02 UTC (permalink / raw)
  To: Vasily Novikov, Eric Paris
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	malware-list-h+Im9A44IAFcMpApZELgcQ

On 03/06/11 10:43, Vasily Novikov wrote:
> Hi Eric,
>
> We are moving to fanotify at the moment. It meets our needs except
> couple minor issues we would like to be fixed in future versions:
>
> 1. The file is thrown out of the cache only when it is modified. But in
> case there are different scan options for different dirs that's not
> enough. So we also need it to be evicted from cache on rename or number
> of hard links change.
> It is the most important issue for us.
> The patch is in attachment: add_clear_cache_events.patch


This is interesting, as it makes the cache less efficient for those
users who don't have different scanning within a filesystem.

Our only equivalent things are exclusions, which we handle by not
marking the responses for exclusions as cache-able.


>
> 2. We can't use unlimited cache because it can potentially grab too much
> memory and slow down a system. In case we use limited cache it can be
> easily filled with not very frequently used files. So the only option we
> have at the moment is to clear cache every time it is full.
> The solution we consider the most appropriate is to introduce
> replaceable marks and LRU cache for them in fanotify.
> So we suggest to introduce a new flag FAN_MARK_REPLACEABLE for marks.
> That will not break the current API.
> The patch is in attachment: marks_lru_cache.patch


IIRC the cache is stored in the inodes themselves, so will automatically
be culled as inodes are pushed out of memory?


>
> 3. The fanotify file descriptor is always ready to be written to it. But
> it's poll method says the opposite. In case you handle it by yourself
> that's not a problem. But in case you use some async io library as we do
> that polls fds internally it doesn't work.
> The patch is in attachment: fanotify_write_poll.patch
>

This seems like a good fix.


--
Douglas Leeder, Senior Software Engineer

Sophos Limited, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 991 2418 08.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [malware-list] A few concerns about fanotify implementation.
  2011-06-06  9:02             ` Douglas Leeder
@ 2011-06-06  9:19               ` Vasily Novikov
       [not found]                 ` <4DEC9B86.6060506-BkmlMuIjteXqlBn2x/YWAg@public.gmane.org>
       [not found]               ` <C511438CDC161C41B3C47B91D99ABA8D37B4B42114-u5UUZ0l8pcxUerCGrXd8jcc3qqyFMPEu@public.gmane.org>
  1 sibling, 1 reply; 17+ messages in thread
From: Vasily Novikov @ 2011-06-06  9:19 UTC (permalink / raw)
  To: Douglas Leeder; +Cc: Eric Paris, linux-fsdevel, malware-list

>> 1. The file is thrown out of the cache only when it is modified. But in
>> case there are different scan options for different dirs that's not
>> enough. So we also need it to be evicted from cache on rename or number
>> of hard links change.
>
> This is interesting, as it makes the cache less efficient for those
> users who don't have different scanning within a filesystem.
>
> Our only equivalent things are exclusions, which we handle by not
> marking the responses for exclusions as cache-able.
>

I suppose rename or make hard link is less frequent operation then 
modify so I believe it won't add a significant overhead.

>> 2. We can't use unlimited cache because it can potentially grab too much
>> memory and slow down a system. In case we use limited cache it can be
>> easily filled with not very frequently used files. So the only option we
>> have at the moment is to clear cache every time it is full.
>> The solution we consider the most appropriate is to introduce
>> replaceable marks and LRU cache for them in fanotify.
>> So we suggest to introduce a new flag FAN_MARK_REPLACEABLE for marks.
>
> IIRC the cache is stored in the inodes themselves, so will automatically
> be culled as inodes are pushed out of memory?

If I understood the code correctly there is no cache by itself. It's 
just implemented through marks and it's ignored_mask field. So there is 
a list of marks for each inode that is empty initially. But when you add 
mark to this list you allocate fsnotify_mark structure which is about 64 
bytes.

-- 
Best regards,

Vasily Novikov | Software developer | Kaspersky Lab

Direct: +7 495 123 45 67 x2344 | Mobile: +7 964 786 44 82 | 
vasily.novikov@kaspersky.com
10/1, 1st Volokolamsky Proezd, Moscow, 123060, Russia | 
www.kaspersky.com,  www.securelist.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: A few concerns about fanotify implementation.
       [not found]               ` <C511438CDC161C41B3C47B91D99ABA8D37B4B42114-u5UUZ0l8pcxUerCGrXd8jcc3qqyFMPEu@public.gmane.org>
@ 2011-06-06  9:42                 ` Vasily Novikov
  0 siblings, 0 replies; 17+ messages in thread
From: Vasily Novikov @ 2011-06-06  9:42 UTC (permalink / raw)
  To: Douglas Leeder
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	malware-list-h+Im9A44IAFcMpApZELgcQ, Eric Paris

Douglas,

>> 1. The file is thrown out of the cache only when it is modified. But in
>> case there are different scan options for different dirs that's not
>> enough. So we also need it to be evicted from cache on rename or number
>> of hard links change.
>
> This is interesting, as it makes the cache less efficient for those
> users who don't have different scanning within a filesystem.

If you consider overhead is a problem here it could be solved by adding 
some flag to a fsnotify group that would be responsible for whether file 
would be evicted from cache on modify only or on renaming or changing 
attributes as well for each group.

Another thought about this issue: it solves the problem only if a file 
is moved/renamed but not a directory. I just don't know how to resolve 
it without adding too much overhead.

Forgot to write that it would also be nice to have a possibility to set 
cache size (i.e. group->fanotify_data.max_marks).

-- 
Best regards,

Vasily Novikov | Software developer | Kaspersky Lab

Direct: +7 495 123 45 67 x2344 | Mobile: +7 964 786 44 82 | 
vasily.novikov-BkmlMuIjteXqlBn2x/YWAg@public.gmane.org
10/1, 1st Volokolamsky Proezd, Moscow, 123060, Russia | 
www.kaspersky.com,  www.securelist.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [malware-list] A few concerns about fanotify implementation.
  2011-06-03  9:43         ` Vasily Novikov
       [not found]           ` <4DE8ACAD.2080003-BkmlMuIjteXqlBn2x/YWAg@public.gmane.org>
@ 2011-06-06 10:27           ` Lino Sanfilippo
  2011-06-06 11:17             ` [malware-list] A few concerns about fanotify implementation ([PATCH] inside) Lino Sanfilippo
  1 sibling, 1 reply; 17+ messages in thread
From: Lino Sanfilippo @ 2011-06-06 10:27 UTC (permalink / raw)
  To: Vasily Novikov; +Cc: Eric Paris, Tvrtko Ursulin, malware-list, linux-fsdevel

On Fri, Jun 03, 2011 at 01:43:09PM +0400, Vasily Novikov wrote:

> 3. The fanotify file descriptor is always ready to be written to it. But  

This is not correct. The fanotify file is only writeable if there is
at least one event on the access list (meaning at least one file access event
has been read but not already been confirmed by userspace).
Otherwise you will get -ENOENT.

The applied patch should handle this correctly.

Lino


Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
---
 fs/notify/fanotify/fanotify_user.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9fde1c0..f39bcc4 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -304,10 +304,15 @@ static unsigned int fanotify_poll(struct file *file, poll_table *wait)
 	int ret = 0;
 
 	poll_wait(file, &group->notification_waitq, wait);
+	poll_wait(file, &group->fanotify_data.access_waitq, wait);
 	mutex_lock(&group->notification_mutex);
 	if (!fsnotify_notify_queue_is_empty(group))
-		ret = POLLIN | POLLRDNORM;
+		ret |= POLLIN | POLLRDNORM;
 	mutex_unlock(&group->notification_mutex);
+	mutex_lock(&group->fanotify_data.access_mutex);
+	if (!list_empty(&group->fanotify_data.access_list))
+		ret |= POLLOUT | POLLWRNORM;
+	mutex_unlock(&group->fanotify_data.access_mutex);
 
 	return ret;
 }
-- 
1.5.6.5



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [malware-list] A few concerns about fanotify implementation ([PATCH] inside).
  2011-06-06 10:27           ` [malware-list] " Lino Sanfilippo
@ 2011-06-06 11:17             ` Lino Sanfilippo
  0 siblings, 0 replies; 17+ messages in thread
From: Lino Sanfilippo @ 2011-06-06 11:17 UTC (permalink / raw)
  To: Lino Sanfilippo
  Cc: Vasily Novikov, Eric Paris, Tvrtko Ursulin, malware-list, linux-fsdevel

On Mon, Jun 06, 2011 at 12:27:54PM +0200, Lino Sanfilippo wrote:
> The applied patch should handle this correctly.

Should but doesnt, since access permission handling might not be
enabled.

So here is an improved version:


Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
---
 fs/notify/fanotify/fanotify_user.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9fde1c0..527fbb0 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -306,8 +306,15 @@ static unsigned int fanotify_poll(struct file *file, poll_table *wait)
 	poll_wait(file, &group->notification_waitq, wait);
 	mutex_lock(&group->notification_mutex);
 	if (!fsnotify_notify_queue_is_empty(group))
-		ret = POLLIN | POLLRDNORM;
+		ret |= POLLIN | POLLRDNORM;
 	mutex_unlock(&group->notification_mutex);
+#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
+	poll_wait(file, &group->fanotify_data.access_waitq, wait);
+	mutex_lock(&group->fanotify_data.access_mutex);
+	if (!list_empty(&group->fanotify_data.access_list))
+		ret |= POLLOUT | POLLWRNORM;
+	mutex_unlock(&group->fanotify_data.access_mutex);
+#endif
 
 	return ret;
 }
-- 
1.5.6.5



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: A few concerns about fanotify implementation.
       [not found]                 ` <4DEC9B86.6060506-BkmlMuIjteXqlBn2x/YWAg@public.gmane.org>
@ 2011-06-06 13:43                   ` Eric Paris
  2011-06-06 14:42                     ` [malware-list] " Vasily Novikov
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Paris @ 2011-06-06 13:43 UTC (permalink / raw)
  To: Vasily Novikov
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	malware-list-h+Im9A44IAFcMpApZELgcQ, Douglas Leeder

On Mon, 2011-06-06 at 13:19 +0400, Vasily Novikov wrote:
> >> 2. We can't use unlimited cache because it can potentially grab too much
> >> memory and slow down a system. In case we use limited cache it can be
> >> easily filled with not very frequently used files. So the only option we
> >> have at the moment is to clear cache every time it is full.
> >> The solution we consider the most appropriate is to introduce
> >> replaceable marks and LRU cache for them in fanotify.
> >> So we suggest to introduce a new flag FAN_MARK_REPLACEABLE for marks.
> >
> > IIRC the cache is stored in the inodes themselves, so will automatically
> > be culled as inodes are pushed out of memory?
> 
> If I understood the code correctly there is no cache by itself. It's 
> just implemented through marks and it's ignored_mask field. So there is 
> a list of marks for each inode that is empty initially. But when you add 
> mark to this list you allocate fsnotify_mark structure which is about 64 
> bytes.

Well, you are both correct.  If you add a mark with only 'ignored'
events set it will not pin the inode into the kernel.  If the system
starts to get under memory pressure the kernel will kick unused inodes
and any associated ignored marks out of ram.  Inodes with 'real' events
attached will be pinned in memory and cannot be evicted under memory
pressure.

-Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [malware-list] A few concerns about fanotify implementation.
  2011-06-06 13:43                   ` Eric Paris
@ 2011-06-06 14:42                     ` Vasily Novikov
       [not found]                       ` <4DECE76E.4060507-BkmlMuIjteXqlBn2x/YWAg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Vasily Novikov @ 2011-06-06 14:42 UTC (permalink / raw)
  To: Eric Paris; +Cc: Douglas Leeder, linux-fsdevel, malware-list

On 06/06/2011 05:43 PM, Eric Paris wrote:
> Well, you are both correct.  If you add a mark with only 'ignored'
> events set it will not pin the inode into the kernel.  If the system
> starts to get under memory pressure the kernel will kick unused inodes
> and any associated ignored marks out of ram.  Inodes with 'real' events
> attached will be pinned in memory and cannot be evicted under memory
> pressure.

So if we use marks with only 'ignored' events then under memory pressure 
mm subsystem will shrink inode cache that will free our marks and 
therefore it's safe to use FAN_UNLIMITED_MARKS in this case?
If it really works then we don't need LRU cache in fanotify because it's 
already implemented in dentry_cache/inode_cache.

-- 
Best regards,

Vasily Novikov | Software developer | Kaspersky Lab

Direct: +7 495 123 45 67 x2344 | Mobile: +7 964 786 44 82 | 
vasily.novikov@kaspersky.com
10/1, 1st Volokolamsky Proezd, Moscow, 123060, Russia | 
www.kaspersky.com,  www.securelist.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: A few concerns about fanotify implementation.
       [not found]                       ` <4DECE76E.4060507-BkmlMuIjteXqlBn2x/YWAg@public.gmane.org>
@ 2011-06-06 15:53                         ` Eric Paris
  2011-06-07 12:35                           ` [malware-list] " Vasily Novikov
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Paris @ 2011-06-06 15:53 UTC (permalink / raw)
  To: Vasily Novikov
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	malware-list-h+Im9A44IAFcMpApZELgcQ, Douglas Leeder

On Mon, 2011-06-06 at 18:42 +0400, Vasily Novikov wrote:
> On 06/06/2011 05:43 PM, Eric Paris wrote:
> > Well, you are both correct.  If you add a mark with only 'ignored'
> > events set it will not pin the inode into the kernel.  If the system
> > starts to get under memory pressure the kernel will kick unused inodes
> > and any associated ignored marks out of ram.  Inodes with 'real' events
> > attached will be pinned in memory and cannot be evicted under memory
> > pressure.
> 
> So if we use marks with only 'ignored' events then under memory pressure 
> mm subsystem will shrink inode cache that will free our marks and 
> therefore it's safe to use FAN_UNLIMITED_MARKS in this case?
> If it really works then we don't need LRU cache in fanotify because it's 
> already implemented in dentry_cache/inode_cache.

That's how it's supposed to work.  Just remember, if you set a real
event, the inode becomes pinned in core and the mm will be unable to
evict either the inode or the mark.

-Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [malware-list] A few concerns about fanotify implementation.
  2011-06-06 15:53                         ` Eric Paris
@ 2011-06-07 12:35                           ` Vasily Novikov
  0 siblings, 0 replies; 17+ messages in thread
From: Vasily Novikov @ 2011-06-07 12:35 UTC (permalink / raw)
  To: Eric Paris; +Cc: Douglas Leeder, linux-fsdevel, malware-list

Eric,

>> So if we use marks with only 'ignored' events then under memory pressure
>> mm subsystem will shrink inode cache that will free our marks and
>> therefore it's safe to use FAN_UNLIMITED_MARKS in this case?
>> If it really works then we don't need LRU cache in fanotify because it's
>> already implemented in dentry_cache/inode_cache.
>
> That's how it's supposed to work.  Just remember, if you set a real
> event, the inode becomes pinned in core and the mm will be unable to
> evict either the inode or the mark.

It really works) On machine with 2GB ram it holds no more than about 
3500000 'ignored' marks in 10 groups. After that it begins to evict LRU 
files. So it completely satisfies our needs.

What you think about clearing ignored mask not only on FS_MODIFY but 
also on FS_ATTRIB and FS_MOVE_SELF?

-- 
Best regards,

Vasily Novikov | Software developer | Kaspersky Lab

Direct: +7 495 123 45 67 x2344 | Mobile: +7 964 786 44 82 | 
vasily.novikov@kaspersky.com
10/1, 1st Volokolamsky Proezd, Moscow, 123060, Russia | 
www.kaspersky.com,  www.securelist.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2011-06-07 12:32 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-26 12:13 A few concerns about fanotify implementation Vasily Novikov
2010-10-26 12:58 ` [malware-list] " Tvrtko Ursulin
2010-10-26 13:58   ` Vasily Novikov
2010-10-26 14:22     ` Tvrtko Ursulin
     [not found]       ` <201010261522.34157.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
2010-10-26 14:58         ` Eric Paris
2010-10-27  8:54   ` [malware-list] " Vasily Novikov
2010-10-27 15:58     ` Eric Paris
     [not found]       ` <1288195134.2655.202.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2011-06-03  9:43         ` Vasily Novikov
     [not found]           ` <4DE8ACAD.2080003-BkmlMuIjteXqlBn2x/YWAg@public.gmane.org>
2011-06-06  9:02             ` Douglas Leeder
2011-06-06  9:19               ` [malware-list] " Vasily Novikov
     [not found]                 ` <4DEC9B86.6060506-BkmlMuIjteXqlBn2x/YWAg@public.gmane.org>
2011-06-06 13:43                   ` Eric Paris
2011-06-06 14:42                     ` [malware-list] " Vasily Novikov
     [not found]                       ` <4DECE76E.4060507-BkmlMuIjteXqlBn2x/YWAg@public.gmane.org>
2011-06-06 15:53                         ` Eric Paris
2011-06-07 12:35                           ` [malware-list] " Vasily Novikov
     [not found]               ` <C511438CDC161C41B3C47B91D99ABA8D37B4B42114-u5UUZ0l8pcxUerCGrXd8jcc3qqyFMPEu@public.gmane.org>
2011-06-06  9:42                 ` Vasily Novikov
2011-06-06 10:27           ` [malware-list] " Lino Sanfilippo
2011-06-06 11:17             ` [malware-list] A few concerns about fanotify implementation ([PATCH] inside) Lino Sanfilippo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.