From: Christian Schoenebeck
To: qemu-devel@nongnu.org
Cc: Greg Kurz
Subject: Re: [PATCH v4 10/11] 9pfs: T_readdir latency optimization
Date: Wed, 11 Mar 2020 20:54:58 +0100
Message-ID: <3047810.ezBau4JpT1@silver>
In-Reply-To: <20200311171408.3b3a2dfa@bahia.home>
References: <2408892.h2sFNKg3ST@silver> <20200311171408.3b3a2dfa@bahia.home>
On Wednesday, March 11, 2020 17:14:08 CET Greg Kurz wrote:
> On Wed, 11 Mar 2020 02:18:04 +0100
> Christian Schoenebeck wrote:
> > On Tuesday, March 10, 2020 19:33:36 CET Greg Kurz wrote:
> > > > This patch is also too big for my preference, but I don't see a
> > > > viable way to split it further into separate patches. I already
> > > > separated all the patches I could. If you have suggestions, very
> > > > much appreciated!
> > >
> > > Well, the patch could be split in two or three parts at least:
> > >
> > > (1) introduce the new function that reads multiple entries in codir.c
> > >
> > > (2) use it from 9p.c
> > >
> > > (3) remove unused stuff if anything remains
> > >
> > > This doesn't seem to change much but the smaller diffstats
> > > for each individual patch make them less scary :) and with
> > > (1) applied only it is easier to compare what the old code
> > > in 9p.c and the new one in codir.c do.
> > >
> > > > The reason for this is that in order to fix these issues with the
> > > > current T_readdir implementation, it requires separating what's
> > > > currently one task (i.e. one function) into two separate tasks
> > > > (i.e. two functions). There is no sensible way to do that.
> > >
> > > Yeah, I won't ask for finer grain.
> >
> > I'm confused. Does that mean your split suggestion was just
> > theoretical, or do you need it?
>
> I need it and I won't ask for more splitting. Promised! :)

Okay then. :)

> > > > Current code on master is much more tricky and error prone due to
> > > > the huge amount of potential branches, individual error/cleanup
> > > > handlers, high amount of thread dispatching and much more. In the
> > > > patched version all these code complexities and error sources are
> > > > removed.
> > >
> > > Come on :) The 9pfs code has been a can of worms from the beginning.
> > > It produced more than the average amount of security-related bugs,
> > > and most sadly, due to the overall lack of interest, it bitrotted
> > > and missed a lot of cool improvements like appropriate support of
> > > unlinked files, QOM-ification of fsdev, conversion of proxy fsdev to
> > > vhost-user, hot plug/unplug support, live migration support and
> > > "much more"... The performance aspect of things is a full time job
> >
> > No, the performance issues are actually very manageable in the case of
> > 9pfs. I already addressed readdir with this patch (by far the worst
> > performance
>
> They're very manageable if someone cares and spends time. Thanks again
> for doing this.

Thanks!

> > My intention is not to squeeze out the last fractional percent of
> > performance for 9pfs, but you certainly agree that a simple "ls"
> > blocking for more than 1 second is something that should be fixed, and
> > fortunately the amount of
>
> I never observed such timeouts with my personal toy use of 9p but
> you did and this motivated you to step in, which is very welcome.

You probably don't notice it much because of the dirent cache on the
guest side. If the guest's dirent cache is cold and you do a readdir()
("ls") on some directory with e.g. several hundred entries, you should
notice it.
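If you want to see it for yourself, something like the following works
as a reproducer. This is illustrative only, not part of the patch
series: it times one full enumeration of a directory inside the guest;
the path argument is whatever 9p mount point your setup provides, and
the absolute numbers of course depend on the setup:

/* Hypothetical reproducer: time one readdir() pass over a directory.
 * Run it inside the guest, on a 9p mount, with a cold dirent cache. */
#include <dirent.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "."; /* e.g. a 9p mount */
    struct timespec t0, t1;
    struct dirent *e;
    long n = 0;

    DIR *d = opendir(path);
    if (!d) {
        perror("opendir");
        return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((e = readdir(d)) != NULL) {
        n++; /* just count the entries */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    closedir(d);

    printf("%ld entries in %.3f s\n", n,
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}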
> > I think the cause of the disagreements we have is solely our use cases
> > of 9pfs: your personal use case does not seem to require any
> > performance considerations or multi-user aspects, whereas my use case
> > requires at least some minimum performance grade for utilizing 9pfs
> > for server applications.
>
> Your point about the personal use case is right indeed but our
> disagreements, if any, aren't uniquely related to that. It's more about
> maintainership and available time really. I'm 100% benevolent "Odd fixer"
> now and I just try to avoid being forced into fixing things after my PR
> is merged. If you want to go faster, then you're welcome to upgrade to
> maintainer and send PRs. This would make sense since you care for 9p, you
> showed a good understanding of the code and you provided beneficial
> contributions so far :)

That maintainership upgrade is planned, the question is just when. What
was your idea, co-maintainership?

> > > > > Oh, so we'd still have the current implementation being used,
> > > > > even with this patch applied... This would be okay for an RFC
> > > > > patch but in the end I'd really like to merge something that
> > > > > also converts v9fs_do_readdir_with_stat().
> > > >
> > > > Yes, I know, but I would not recommend mixing these things at this
> > > > point, because it would be an entire effort on its own.
> > > >
> > > > v9fs_do_readdir_with_stat() is used for 9P2000.u, while
> > > > v9fs_do_readdir() is used for 9P2000.L. They behave very
> > > > differently, so it would not only require me to update
> > > > v9fs_do_readdir_with_stat() and v9fs_read(), I would also need to
> > > > write their own test cases (plural, since there are none at all
> > > > yet) and benchmarks, and of course somebody would need to review
> > > > all that additional code, and who would actually test it? I am
> > > > actively testing my 9P2000.L changes, but I am not actually using
> > > > 9P2000.u, so I could not guarantee the same quality.
> > > >
> > > > Considering the current commitment situation regarding 9pfs, we
> > > > can only bring things forward with compromises. Hence I suggest I
> > > > concentrate on fixing the worst performance issues on the 9P2000.L
> > > > side first, and later on, if there are really people interested in
> > > > seeing these improvements on the 9P2000.u side and willing to at
> > > > least help out with testing, then I can definitely adjust the
> > > > 9P2000.u side accordingly as well.
> > > >
> > > > But to also make this clear: this patch 10 does not introduce any
> > > > behaviour change on the 9P2000.u side.
> > >
> > > Ok, fair enough to leave 9p2000.U behind for now but I had to ask :)
> > > /me is even wondering if we should start deprecating 9p2000.U since
> > > most clients can use 9p2000.L now...
> >
> > I probably wouldn't. On macOS for instance there's only 9p2000.u. In
> > the end
>
> Hmm... the only thing I've heard about is:
>
> https://github.com/benavento/mac9p
>
> and unless I'm missing something, this seems to only have a socket based
> transport... the server in QEMU is for virtio-9p and Xen 9p devices
> only. Unless mac9p seriously plans to implement a driver for either of
> those, I'm still not convinced it is worth keeping .U around... and BTW,
> our deprecation policy imposes a 2 QEMU versions cooling period before
> actually removing the code. This would leave ample time for someone to
> speak up.

Well, I leave that up to you. I could consider adding a transport for
macOS guests, but that's definitely not going to happen in any near
future.

> > > Not sure we're talking about the same loop here...
> > >
> > > I'm just suggesting that v9fs_readdir_lock() could be something like:
> > >
> > > static inline void v9fs_readdir_lock(V9fsDir *dir)
> > > {
> > >     while (qemu_mutex_trylock(&dir->readdir_mutex)) {
> > >         warn_report_once("blah");
> > >     }
> > > }
> >
> > Ah yeah, in fact I got you wrong on this one. I thought you meant to
> > move the lock within do_readdir_lowlat(). About your actual suggestion
> > above ...
> >
> > > > the loop. The normal case is not requiring a lock at all, like the
> > > > comment describes. Doing multiple locks inside the loop
> > > > unnecessarily reduces performance for no reason.
> > > >
> > > > About *_trylock() instead of v9fs_readdir_lock(): might be a
> > > > candidate to
> > >
> > > Hmm... are you suggesting that do_readdir_lowlat() should return if
> > > it can't take the lock?
> >
> > ... yep, that's what I had in mind for it later on. I would not mind
> > running trylock in a loop like you suggested; if it fails, log a
> > warning and return. Because if the lock fails, then the client is
> > buggy. The user can read the warning in the logs and report the bug
> > for the client to be fixed. Not worth handling that case in any fancy
> > way on the server.
>
> The locking is just a safety net because readdir(3) isn't necessarily
> thread safe. It was never my intention to add an error path for that.
> And thinking again, sending concurrent Treaddir requests on the same
> fid is certainly a weird thing to do, but is it really a bug?

Well, I was unresolved on that issue as well, hence my decision to only
leave a TODO comment for now. My expectation would be a separate fid for
concurrent readdir, but you are right of course: Treaddir (unlike its
9p2000.u counterpart) is reentrant by design, since it has an offset
parameter, so from a protocol perspective concurrent Treaddir is not per
se impossible.

The lock would not be noticeable either with this patch, since unlike on
master, the lock is now just retained on the driver side (with this
patch), which returns very fast even on large directories, as you can
see from the benchmark output.

So yes, returning from the function with an error is probably not the
best choice. My tendency is still to at least log a message. I don't
care about it enough, though, to deal with trylock() in a loop or
something right now. There are more important things to take care of
ATM.
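Just to make "log a message at least" concrete, that variant could look
something like this (a sketch only, not part of this patch: it keeps
the blocking behaviour of the current lock and merely leaves a trace on
contention; warn_report_once() is QEMU's print-once warning helper from
"qemu/error-report.h", and the message text is made up):

static inline void v9fs_readdir_lock(V9fsDir *dir)
{
    if (qemu_mutex_trylock(&dir->readdir_mutex)) {
        /* contended: a client sent concurrent Treaddir requests on the
         * same fid; log once, then block as before */
        warn_report_once("9pfs: concurrent readdir on the same fid");
        qemu_mutex_lock(&dir->readdir_mutex);
    }
}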
> > avoid similar mistakes by placing appropriate comment(s).
> >
> > > > > > > +                                  struct V9fsDirEnt **entries,
> > > > > > > +                                  int32_t maxsize, bool dostat)
> > > > >
> > > > > s/maxsize/max_count since this is the wording used in the caller.
> > > >
> > > > I wouldn't do that. This is the driver side, not the 9p protocol
> > > > request handler. As you might have realized before, "max_count" is
> > > > semantically completely wrong. This variable does not count
> > > > integral entries, it is rather a maximum size (in bytes) of the
> > > > destination buffer being filled.
> > > >
> > > > On the 9p request handler side it is ok to use "max_count"
> > > > instead, because the protocol definition uses exactly this term
> > > > for the request parameter, but on the driver side it's IMO
> > > > inappropriate.
> > >
> > > I agree the max_count wording sucks and I don't care that much to
> > > name variables according to protocol definitions, but I do care
> > > to use the same name in caller/callee when it makes sense, for a
> > > better grep or cscope experience.
> >
> > So you insist on that max_count change?
>
> Rather the opposite. Kill max_count and make it maxsize, in a
> preparatory patch with the reasons you've exposed.

Ah got it. OK then.
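To make the maxsize semantics concrete, here is a reduced sketch of
such a driver-side function (illustrative only: apart from the names in
the quoted signature everything is invented, dostat handling is
omitted, and the per-entry size is crudely estimated, whereas the real
code would account for the entry's 9p wire size):

#include <dirent.h>
#include <stdint.h>
#include <stdlib.h>

struct V9fsDirEnt {              /* reduced for illustration */
    struct dirent dent;
    struct V9fsDirEnt *next;
};

/* Read ahead as many directory entries as fit into a destination
 * buffer of maxsize bytes, in one go, so that the request handler
 * needs only a single driver round trip. Returns the number of entries
 * collected, or -1 on allocation failure; the caller owns and frees
 * the resulting list. */
static int readdir_many(DIR *dir, struct V9fsDirEnt **entries,
                        int32_t maxsize)
{
    struct V9fsDirEnt **tail = entries;
    int32_t size = 0;
    int count = 0;
    struct dirent *dent;

    *entries = NULL;
    while ((dent = readdir(dir)) != NULL) {
        /* crude estimate; maxsize is a byte budget for the destination
         * buffer, not a count of integral entries */
        int32_t esize = sizeof(struct dirent);

        if (size + esize > maxsize) {
            break; /* budget exhausted, stop reading ahead */
        }
        struct V9fsDirEnt *e = malloc(sizeof(*e));
        if (!e) {
            return -1;
        }
        e->dent = *dent;
        e->next = NULL;
        *tail = e;
        tail = &e->next;
        size += esize;
        count++;
    }
    return count;
}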
Best regards,
Christian Schoenebeck