From: Ian Kent <email@example.com>
To: Greg Kroah-Hartman <firstname.lastname@example.org>
Cc: Tejun Heo <email@example.com>,
    Stephen Rothwell <firstname.lastname@example.org>,
    Andrew Morton <email@example.com>,
    Al Viro <viro@ZenIV.linux.org.uk>,
    Rick Lindsley <firstname.lastname@example.org>,
    David Howells <email@example.com>,
    Miklos Szeredi <firstname.lastname@example.org>,
    linux-fsdevel <email@example.com>,
    Kernel Mailing List <firstname.lastname@example.org>
Subject: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement
Date: Wed, 17 Jun 2020 15:37:43 +0800
Message-ID: <email@example.com>

For very large IBM Power mainframe systems with hundreds of CPUs and
TBs of RAM, booting can take a very long time.

Initial reports showed that booting a configuration of several hundred
CPUs and 64TB of RAM would take more than 30 minutes and require kernel
parameters of udev.children-max=1024 systemd.default_timeout_start_sec=3600
to prevent dropping into emergency mode.

Gathering information about what's happening during the boot is a bit
challenging, but the two main issues appeared to be a large number of
path lookups for non-existent files, and very high lock contention in
the VFS during path walks, particularly in the dentry allocation code
path.

The underlying cause of this was thought to be the sheer number of
sysfs memory objects, 100,000+ for a 64TB memory configuration, as the
hardware divides the memory into 256MB logical blocks. This is believed
to be due to either IBM Power hardware design or a requirement of the
mainframe software used to create logical partitions (LPARs, which are
used to install an operating system to provide services), since these
can be made up of a wide range of resources: CPU, memory, disks, etc.

It's unclear yet whether the creation of sysfs nodes for these memory
devices can be postponed or spread out over a larger amount of time.
That's because the high overhead looks to be due to notifications
received by udev, which invokes a systemd program for them, and attempts
by the systemd folks to improve this have not focused on changing the
handling of these notifications, possibly because of difficulties with
doing so. This remains an avenue of investigation.

Kernel traces show there are many path walks, with a fairly large
portion of those for non-existent paths. However, looking at the systemd
code invoked by the udev action, it appears there's only one additional
lookup for each invocation, so the large number of negative lookups is
most likely due to the large number of notifications rather than a
fault with the systemd program.

The series here tries to reduce the locking needed during path walks,
based on the assumption that there are many path walks with a fairly
large portion of those for non-existent paths, as described above.

That was done by adding kernfs negative dentry caching (for non-existent
paths) to avoid the continual alloc/free cycle of dentries, and by
introducing a read/write semaphore to increase kernfs concurrency during
path walks.

With these changes we still need kernel parameters of
udev.children-max=2048 and systemd.default_timeout_start_sec=300 for
the fastest boot times of under 5 minutes.

There may be opportunities for further improvements, but the series here
has seen a fair amount of testing, along with some thought about what
those improvements could be. Discussing it with Rick Lindsley, I suspect
improvements will get more difficult to implement for somewhat less
gain, so I think what we have here is a good start for now.

Changes since v1:
- fix locking in .permission() and .getattr() by re-factoring the
  attribute handling code.
---

Ian Kent (6):
      kernfs: switch kernfs to use an rwsem
      kernfs: move revalidate to be near lookup
      kernfs: improve kernfs path resolution
      kernfs: use revision to identify directory node changes
      kernfs: refactor attr locking
      kernfs: make attr_mutex a local kernfs node lock

 fs/kernfs/dir.c             |  284 ++++++++++++++++++++++++++++---------------
 fs/kernfs/file.c            |    4 -
 fs/kernfs/inode.c           |   58 +++++----
 fs/kernfs/kernfs-internal.h |   29 ++++
 fs/kernfs/mount.c           |   12 +-
 fs/kernfs/symlink.c         |    4 -
 include/linux/kernfs.h      |    7 +
 7 files changed, 259 insertions(+), 139 deletions(-)

--
Ian