From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mx2.suse.de ([195.135.220.15]:50972 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753045AbdDCPef (ORCPT ); Mon, 3 Apr 2017 11:34:35 -0400
From: Jan Kara
Cc: Miklos Szeredi, Amir Goldstein, Paul Moore, Jan Kara
Subject: [PATCH 0/35 v7] fsnotify: Avoid SRCU stalls with fanotify permission events
Date: Mon, 3 Apr 2017 17:33:49 +0200
Message-Id: <20170403153424.24945-1-jack@suse.cz>
Sender: linux-fsdevel-owner@vger.kernel.org

Hello,

This is the seventh revision of my patches to avoid SRCU stalls when
fanotify waits for a response to permission events from userspace
processes. Thanks to Amir, Paul, and Miklos for review! The series also
passes a new LTP test that tries to provoke hangs in the fanotify
subsystem when there are unanswered fanotify permission events. If nobody
has more objections, I'll push the changes to my tree to queue them for
the next merge window.

Changes since v6:
* Added Reviewed-by tags from Miklos
* Improved a couple of comments as suggested by Miklos
* Fixed possible NULL pointer dereference in audit_tree
* Cleaned up some patches based on Miklos' feedback

Changes since v5:
* Added Reviewed-by tags from Amir
* Fixed up __rcu annotation
* Fixed minor issues spotted by 0-day in the middle of the series
* Added fsnotify_attach_connector_to_object()
* Removed igrab()/iput() from fsnotify_recalc_mask()

Changes since v4:
* Further split up of patches as requested by Miklos
* Moved some hunks between patches to make things more logical
* Couple of smaller improvements suggested by Miklos
* Rebased on top of 4.11-rc2

Changes since v3:
* added Reviewed-by tags
* split adding of fsnotify_mark_connector into 4 smaller parts as Miklos
  asked
* simplified API of fsnotify_prepare/finish_user_wait()

Changes since v2:
* added Reviewed-by tags
* dropped fsnotify_put_list() abstraction
* use rcu_assign_pointer() where appropriate

Changes since v1:
* renamed fsnotify_mark_list to fsnotify_mark_connector and a couple of
  other things
* updated some comments and changelogs to better explain what is going on
* made audit use inode pointer as a key again
* added Reviewed-by tags
* dropped two audit fixes that got already merged
* added cleanup of mark destruction functions

Patch set overview
------------------

Currently, fanotify waits for a response to a permission event from a
userspace process while holding the fsnotify_mark_srcu lock. As a
consequence, when the userspace process takes long to respond or does not
respond at all, the fsnotify_mark_srcu grace period can never complete.
That blocks reclaim of any notification marks and also blocks any process
that called synchronize_srcu() on fsnotify_mark_srcu. Effectively, this
eventually blocks anybody interacting with the notification subsystem.
Miklos has some real world reports of this happening. Although this is in
principle a problem of a broken userspace application (which furthermore
has to have CAP_SYS_ADMIN in init_user_ns, so it is not a security
problem), it is still nasty that a simple error can block the kernel like
this.

This patch set solves this problem.

The basic idea of the solution is that when fanotify needs to wait for a
response from a userspace process, it grabs a reference to the mark which
generated the event and drops the fsnotify_mark_srcu lock. When userspace
responds, we grab fsnotify_mark_srcu again, drop the mark reference, and
continue iterating the list of marks attached to the inode / vfsmount,
delivering the event to other notification groups.

What complicates this simple approach is that the mark for which we wait
for a response has to stay pinned in the list of marks attached to the
inode / vfsmount so that we can resume iteration of the list when
userspace responds. On the other hand, when the inode gets unlinked while
we wait for the userspace response, we need to destroy the mark (or at
least detach it from the inode).
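To make the ordering of the steps concrete, here is a minimal userspace
sketch of the idea described above. It is not code from the series: the
struct, the counter standing in for the fsnotify_mark_srcu read-side
section, and all function names are hypothetical, and the wait for
userspace is replaced by a stub that only records what it observes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a notification mark (not the kernel struct). */
struct mark {
	int refcnt;		/* 1 = reference held by the list itself */
	bool attached;		/* still attached to its inode / vfsmount? */
	struct mark *next;	/* next mark on the same object's list */
};

/* Models how many read-side sections of the SRCU lock are open. */
static int srcu_readers;

/* Observations recorded by the stubbed wait below. */
static bool wait_saw_lock_held;
static bool wait_saw_unpinned_mark;

static void srcu_lock(void)   { srcu_readers++; }
static void srcu_unlock(void) { srcu_readers--; }

/* Pin the mark, then drop the lock so a slow responder blocks nobody. */
static void sketch_prepare_user_wait(struct mark *m)
{
	m->refcnt++;
	srcu_unlock();
}

/* Retake the lock first, then drop the pin, so iteration can resume. */
static void sketch_finish_user_wait(struct mark *m)
{
	srcu_lock();
	m->refcnt--;
}

/* Stub for "wait for userspace"; also simulates an unlink during the wait. */
static void wait_for_user_response(struct mark *m)
{
	if (srcu_readers != 0)
		wait_saw_lock_held = true;
	if (m->refcnt < 2)
		wait_saw_unpinned_mark = true;
	m->attached = false;	/* detached from the object, but still in the list */
}

/* Walk the list of marks, waiting for a response on each one. */
static int deliver_event(struct mark *head)
{
	int delivered = 0;

	srcu_lock();
	for (struct mark *m = head; m != NULL; m = m->next) {
		sketch_prepare_user_wait(m);
		wait_for_user_response(m);
		sketch_finish_user_wait(m);
		delivered++;
	}
	srcu_unlock();
	return delivered;
}
```

The point of the sketch is only the ordering: the pin is taken before the
lock is dropped and released after the lock is retaken, so m->next stays
usable even though the mark was detached from its object during the wait.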

The first 5 patches contain some initial fixes and cleanups. Patches 6-17
implement attaching of marks to inode / vfsmount via a dedicated structure
which allows us to detach list of marks from the object without having to
destroy the list itself. Patches 18-20 implement removal of mark from the
list of marks attached to an object when last mark reference is dropped.
Patches 21-24 then implement dropping of SRCU lock when waiting on
response from userspace. Patches 25-33 are mostly trivial cleanups that
get rid of trivial wrappers and one pointer in the mark structure.

Patches have survived testing with inotify/fanotify tests in LTP.

Finally, to ease experimenting with the patches I've pushed them out to

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git fsnotify

								Honza