Date: Wed, 6 Feb 2019 11:53:54 -0800
Message-Id: <20190206195354.40576-1-sqazi@google.com>
Mime-Version: 1.0
X-Mailer: git-send-email 2.20.1.611.gfbb209baf1-goog
Subject: [PATCH] fs, ipc: Use an asynchronous version of kern_unmount in IPC
From: Salman Qazi
To: Alexander Viro, Eric Biederman, Eric Dumazet, linux-fsdevel@vger.kernel.org
Cc: Salman Qazi
Content-Type: text/plain; charset="UTF-8"
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

Prior to this patch, the kernel can spend a lot of time in the
following stack trace:

[] __wait_rcu_gp+0x93/0xe0
[] synchronize_sched+0x48/0x60
[] kern_unmount+0x3a/0x46
[] mq_put_mnt+0x15/0x17
[] put_ipc_ns+0x36/0x8b

This patch solves the issue by removing the synchronize_rcu call from
the mq_put_mnt path.  This is done by implementing an asynchronous
version of kern_unmount that waits for the grace period with call_rcu
instead of blocking the caller.

Since mntput() can sleep, it cannot be called directly from the RCU
callback, so it is deferred to a workqueue.

Additionally, the callers of mq_put_mnt appear to be safe with it
behaving asynchronously.  In particular, put_ipc_ns calls
mq_clear_sbinfo, which renders the inode inaccessible for the purposes
of mqueue_create by making s_fs_info NULL.
This appears to be what prevents access while free_ipc_ns is taking
place, so the unmount should be able to proceed lazily.

Tested: Ran the following program:

#define _GNU_SOURCE	/* for unshare() and the CLONE_* flags */
#include <assert.h>
#include <sched.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int pid;
	int status;
	int i;

	/* Each child creates and immediately drops a new IPC namespace. */
	for (i = 0; i < 1000; i++) {
		pid = fork();
		if (!pid) {
			assert(!unshare(CLONE_NEWUSER|
				CLONE_NEWIPC|CLONE_NEWNS));
			return 0;
		}

		assert(waitpid(pid, &status, 0) == pid);
	}
}

Before:

$ time ./unshare2

real	0m9.784s
user	0m0.428s
sys	0m0.000s

After:

$ time ./unshare2

real	0m0.368s
user	0m0.226s
sys	0m0.122s

Signed-off-by: Salman Qazi
---
 fs/namespace.c     | 41 +++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |  1 +
 ipc/mqueue.c       |  2 +-
 3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a677b59efd74..caa51ca81605 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3323,6 +3323,47 @@ void kern_unmount(struct vfsmount *mnt)
 }
 EXPORT_SYMBOL(kern_unmount);
 
+struct async_unmount_cb {
+	struct vfsmount *mnt;
+	struct work_struct work;
+	struct rcu_head rcu_head;
+};
+
+static void kern_unmount_work(struct work_struct *work)
+{
+	struct async_unmount_cb *cb = container_of(work,
+			struct async_unmount_cb, work);
+
+	mntput(cb->mnt);
+	kfree(cb);
+}
+
+static void kern_unmount_rcu_cb(struct rcu_head *rcu_head)
+{
+	struct async_unmount_cb *cb = container_of(rcu_head,
+			struct async_unmount_cb, rcu_head);
+
+	INIT_WORK(&cb->work, kern_unmount_work);
+	schedule_work(&cb->work);
+
+}
+
+void kern_unmount_async(struct vfsmount *mnt)
+{
+	/* release long term mount so mount point can be released */
+	if (!IS_ERR_OR_NULL(mnt)) {
+		struct async_unmount_cb *cb = kmalloc(sizeof(*cb), GFP_KERNEL);
+
+		if (cb) {
+			real_mount(mnt)->mnt_ns = NULL;
+			cb->mnt = mnt;
+			call_rcu(&cb->rcu_head, kern_unmount_rcu_cb);
+		} else {
+			kern_unmount(mnt);
+		}
+	}
+}
+
 bool our_mnt(struct vfsmount *mnt)
 {
 	return check_mnt(real_mount(mnt));
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 29d8e2cfed0e..8865997a8722 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2274,6 +2274,7 @@ extern int register_filesystem(struct file_system_type *);
 extern int unregister_filesystem(struct file_system_type *);
 extern struct vfsmount *kern_mount_data(struct file_system_type *, void *data);
 #define kern_mount(type) kern_mount_data(type, NULL)
+extern void kern_unmount_async(struct vfsmount *mnt);
 extern void kern_unmount(struct vfsmount *mnt);
 extern int may_umount_tree(struct vfsmount *);
 extern int may_umount(struct vfsmount *);
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index c595bed7bfcb..a8c2465ac0cb 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1554,7 +1554,7 @@ void mq_clear_sbinfo(struct ipc_namespace *ns)
 
 void mq_put_mnt(struct ipc_namespace *ns)
 {
-	kern_unmount(ns->mq_mnt);
+	kern_unmount_async(ns->mq_mnt);
 }
 
 static int __init init_mqueue_fs(void)
-- 
2.20.1.611.gfbb209baf1-goog
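
For readers unfamiliar with the pattern, the following is a rough
userspace sketch of the two-stage deferral that kern_unmount_async()
relies on.  It is not kernel code and is not part of the patch.  All
names in it (fake_rcu_head, fake_work, async_release, release_async,
demo_async_release, and the pending_* slots) are hypothetical stand-ins:
a single pending slot plays the role of call_rcu(), another plays the
role of schedule_work(), and an int plays the role of the vfsmount.  The
only thing it tries to show is how one heap-allocated control block,
recovered with container_of at each stage, carries the resource from the
caller to the grace-period callback and then to the worker that does the
final release.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(): recover the
 * enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct rcu_head and struct work_struct. */
struct fake_rcu_head { void (*func)(struct fake_rcu_head *); };
struct fake_work { void (*func)(struct fake_work *); };

/* Mirrors the shape of struct async_unmount_cb in the patch. */
struct async_release {
	int *resource;			/* plays the role of the vfsmount */
	struct fake_work work;
	struct fake_rcu_head rcu;
};

/* One-slot queues standing in for call_rcu() and schedule_work(). */
static struct fake_rcu_head *pending_rcu;
static struct fake_work *pending_work;

/* Stage 2: runs where sleeping would be allowed; does the release. */
static void release_work(struct fake_work *work)
{
	struct async_release *cb =
		container_of(work, struct async_release, work);

	*cb->resource = 0;
	free(cb);
}

/* Stage 1: runs after the simulated grace period; only queues work. */
static void release_rcu_cb(struct fake_rcu_head *rcu)
{
	struct async_release *cb =
		container_of(rcu, struct async_release, rcu);

	cb->work.func = release_work;
	pending_work = &cb->work;
}

/* Caller-facing entry point: returns without waiting for either stage. */
static int release_async(int *resource)
{
	struct async_release *cb = malloc(sizeof(*cb));

	if (!cb)
		return -1;
	cb->resource = resource;
	cb->rcu.func = release_rcu_cb;
	pending_rcu = &cb->rcu;
	return 0;
}

/* Drives both stages by hand; returns 1 if the resource stayed held
 * until the worker ran, -1 on allocation failure, 0 otherwise. */
int demo_async_release(void)
{
	int resource = 1;

	if (release_async(&resource))
		return -1;
	int held_after_call = resource;

	/* The simulated grace period elapses. */
	struct fake_rcu_head *r = pending_rcu;
	pending_rcu = NULL;
	r->func(r);
	int held_after_rcu = resource;

	/* The simulated worker thread runs. */
	struct fake_work *w = pending_work;
	pending_work = NULL;
	w->func(w);

	return held_after_call == 1 && held_after_rcu == 1 && resource == 0;
}
```

Calling demo_async_release() from a main() exercises the whole chain.
The sketch leaves out what the real patch does when allocation fails
(it falls back to the synchronous kern_unmount), and it does not model
concurrency at all.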