Date: Thu, 28 Mar 2024 20:03:36 +0000
From: "Eric Van Hensbergen"
Message-ID: <4d7d7f9020fecc66d34c3b925591371e2aeac42f@linux.dev>
Subject: Re: [PATCH v2 2/4] fs/9p: drop inodes immediately on non-.L too
To: "Joakim Sindholt"
Cc: v9fs@lists.linux.dev
In-Reply-To: <6a4f3bed800f9f5021e17311e6d7288f9a2b56c6@linux.dev>
References: <20240318112542.18863-1-opensource@zhasha.com>
 <20240318112542.18863-3-opensource@zhasha.com>
 <145fc52b2452095c44260825670141d0d3b42c7d@linux.dev>
 <20240328163753.eb3541f4c8595de40f5e3c82@zhasha.com>
 <6a4f3bed800f9f5021e17311e6d7288f9a2b56c6@linux.dev>

Also digging a bit deeper -- it looks like for legacy we set nlink to 1
always, so the server doesn't have anything to do with that, and it may be
a problem in the newer code that we aren't coping with properly. I guess I
need to compare a legacy trace to a dotL trace and make sure that's right
-- I'm a bit worried about how directories show up, since they should have
more than 1 nlink (. and ..) and this code looks generic.

     -eric
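To make the nlink point above concrete: the legacy stat->inode path as
described reportedly pins the link count at 1, roughly as in the sketch
below. The function name is hypothetical and the body is illustrative,
not verbatim fs/9p code.

#include <linux/fs.h>

/* Illustrative sketch of the legacy (non-.L) stat->inode behaviour
 * described above; not verbatim kernel code. */
static void legacy_stat2inode_sketch(struct inode *inode)
{
	/* The 9p2000/9p2000.u wire stat carries no link count, so the
	 * client just reports a single link for everything: */
	set_nlink(inode, 1);
	/* As a result directories never show the extra links they would
	 * have on a local filesystem, and generic VFS code that keys off
	 * i_nlink sees the same value for files and directories alike. */
}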
March 28, 2024 at 2:47 PM, "Eric Van Hensbergen" wrote:

> Not sure if your server source is public or not, but it would be useful
> to test against it and see what I see. It's possible a mismatch in the
> nlink stat information could indeed cause inodes to stick around even
> under memory pressure, and that'd cause all sorts of problems.
>
> The newer kernel versions have much better qid->inode mapping that
> should prevent some of the weirdness in the earlier implementation
> (there were lots of inodes with duplicate ino_num because of the way we
> crushed the allocated inode number with the qid.path), so it's
> definitely worth testing against the latest and greatest if you can. I
> will try to allocate some time to revive my regressions against 9p2000
> and 9p2000.u servers so we catch some of this earlier in the future,
> but I'm not sure I would have caught any degradation.
>
>  -eric
>
> March 28, 2024 at 10:37 AM, "Joakim Sindholt" wrote:
>
> > On Thu, 28 Mar 2024 15:08:59 +0000, "Eric Van Hensbergen" wrote:
> >
> > > Slowly parsing through these, thanks for the fixes.
> > >
> > > This one has a bit of a problem in that we don't have a
> > > v9fs-specific drop_inode anymore (either in legacy or .L). The
> > > release of the fid should be handled by v9fs_dir_release in both
> > > versions of the protocol.
> >
> > My apologies. I didn't realize that it had been changed in the next
> > branch until after sending it and I haven't had the time to check how
> > it's supposed to work now.
> >
> > > I had convinced myself we could just use generic_drop_inode because
> > > we decoupled clunking fids from the dentry/inode structures and just
> > > handled them directly in v9fs_dir_release -- so really, by
> > > recovering the inode structure every time, we were just forcing
> > > churn in the inode alloc/dealloc routines that seemed unnecessary
> > > (even in the uncached mode).
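For reference, the generic_drop_inode() behaviour being leaned on here
boils down to roughly the following check; paraphrased from fs/inode.c,
so treat it as a sketch rather than a verbatim quote:

#include <linux/fs.h>

int generic_drop_inode(struct inode *inode)
{
	/* Evict at the final iput() only if the inode has been unhashed
	 * or its link count has gone to zero; otherwise keep it cached. */
	return !inode->i_nlink || inode_unhashed(inode);
}

So if a legacy mount reports i_nlink as 1 for everything and the inode
stays hashed, nothing ever qualifies for an immediate drop, and the cache
only shrinks when the inode shrinker is forced to reclaim.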
> > I agree that the alloc followed by immediately dropping it from the
> > cache is a waste of cycles. This was just the most reliable way I knew
> > of fixing it without impacting anything else.
> >
> > > Was your concern the performance of the client-side lookup of the
> > > inode based on qid, or the server-side lookup based on fid? Can you
> > > give me a bit more information on what you are seeing? Are you not
> > > seeing fid clunks under certain conditions (i.e. transient fids,
> > > etc.)?
> >
> > This is what I'm seeing in slabtop on a system with no caching enabled
> > on a 6.1 kernel. I never tested if it persisted on the 6.6 kernel but
> > I assume it would.
> >
> > Active / Total Objects (% used)    : 2738808 / 2788118 (98.2%)
> > Active / Total Slabs (% used)      : 89853 / 89853 (100.0%)
> > Active / Total Caches (% used)     : 164 / 261 (62.8%)
> > Active / Total Size (% used)       : 1071558.97K / 1084648.27K (98.8%)
> > Minimum / Average / Maximum Object : 0.02K / 0.39K / 16.01K
> >
> >     OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> >  1235968 1235915  99%    0.06K  19312       64     77248K lsm_inode_cache
> >  1200339 1200339 100%    0.75K  57159       21    914544K v9fs_inode_cache
> >    62136   54815  88%    0.11K   1726       36      6904K buffer_head
> >    49400   45639  92%    0.20K   2600       19     10400K dentry
> >    26368   26364  99%    0.03K    206      128       824K avtab_node
> >    26222   15713  59%    0.57K   1873       14     14984K radix_tree_node
> >    19162   18337  95%    1.20K   1474       13     23584K ext4_inode_cache
> > ...
> >
> > What this is not showing, and it's really rather difficult to show, is
> > that it also causes a large amount of CPU usage on any file
> > operations. I'm not sure if it's simply because the table is enormous
> > or if it's because most of the qids v9fs gets to contend with are
> > identical ones from a whole host of synthetic file systems that it
> > then has to iterate over linearly.
> >
> > The v9fs_inode_cache grows until it takes up all available RAM.
> >
> > And snatching this one from your second mail:
> >
> > > Which server are you testing against? I'm wondering if the
> > > underlying problem might actually be i_nlink being maintained
> > > incorrectly by the server, which would lead to inodes lingering,
> > > since that is the primary way generic_drop_inode differentiates. It
> > > is possible that we could add a v9fs_inode_always_drop in for legacy
> > > if the servers weren't reporting i_nlink in a compatible fashion,
> > > although that might result in never caching legacy 9p2000 (which is
> > > probably what most legacy folks want anyway).
> >
> > I'm running against a regular old 9P2000 server that I wrote myself.
> > No extensions of any kind, so there are no links. I don't really have
> > any objection to the kind of qid.version-based caching you implemented
> > in 6.6 and would actually quite like to use it in the near future, but
> > I wouldn't be terribly sad if you removed it either.
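A v9fs_inode_always_drop hook of the kind floated in the quoted mail would
presumably amount to an always-true .drop_inode callback for legacy
mounts, along the lines of the sketch below. The name comes from the mail
above; the body and the super_operations wiring are assumptions, not taken
from a posted patch.

#include <linux/fs.h>

/* Always evict at the final iput(), regardless of i_nlink, so a server
 * that cannot report link counts never pins inodes in the cache. */
static int v9fs_inode_always_drop(struct inode *inode)
{
	return 1;	/* same effect as generic_delete_inode() */
}

/* ...hooked up in the legacy super_operations, e.g.:
 *	.drop_inode = v9fs_inode_always_drop,
 */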