From: Qu Wenruo
Date: Tue, 21 Mar 2023 07:37:46 +0800
To: Christoph Hellwig
Cc: Chris Mason, Josef Bacik, David Sterba, Johannes Thumshirn, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 03/10] btrfs: offload all write I/O completions to a workqueue
In-Reply-To: <20230320123059.GB9008@lst.de>
References: <20230314165910.373347-1-hch@lst.de> <20230314165910.373347-4-hch@lst.de> <2aa047a7-984e-8f6f-163e-8fe6d12a41d8@gmx.com> <20230320123059.GB9008@lst.de>

On 2023/3/20 20:30, Christoph Hellwig wrote:
> On Mon, Mar 20, 2023 at 07:29:38PM +0800, Qu Wenruo wrote:
>> Sure, they are called in a very strict context, thus we should keep
>> them short.
>> But on the other hand, we already have too many workqueues, and I'm
>> always wondering under what situations they can lead to deadlocks.
>> (e.g. why do we need to queue endios for the free space and regular
>> data inodes into different workqueues?)
>
> In general the reason for separate workqueues is if one workqueue depends
> on the execution of another.  It seems like this is Josef's area, but
> my impression is that finishing an ordered extent can cause writeback
> and a possible wait for data in the freespace inode.  Normally such
> workqueue splits should have comments in the code to explain them, but
> so far I haven't found one.
>
>> My current method is to always consider the workqueue as having only 1
>> max_active, but I'm still not sure, in such a case, what would happen
>> if one work slept?
>
> That's my understanding of the workqueue mechanisms, yes.
>
>> Would the workqueue be able to choose the next work item? Or is the
>> workqueue stalled until the only active work gets woken?
>
> I think it is stalled.  That's why the workqueue documentation heavily
> discourages limiting max_active unless you have a good reason to, and
> most callers follow that advice.

To me, this looks more like hiding a bug in the workqueue user.
Shouldn't we expose such bugs instead of hiding them?

Furthermore, although I'm a big fan of plain workqueues, the old btrfs
workqueues allowed us to apply max_active to the underlying plain
workqueues, but now we don't have that ability any more.

Thus, at the very least, we should bring back the old behavior, and
possibly fix whatever bugs exist in our workqueue usage.

>
>> Personally speaking, I'd like to keep the btrfs bio endio function
>> calls in the old soft/hard irq context, and let the higher layer
>> queue the work.
>
> Can you explain why?

Just to keep the context consistent.

Imagine a situation where we put some sleeping functions into an endio
function without a second thought, fully relying on the fact that the
endio function is executed in workqueue context.
Otherwise we'd have to add an extra comment explaining the situation
every time we do something like this.

Thanks,
Qu

>
>> However, we have already loosened the endio context for btrfs bio,
>> from the old soft/hard irq context to the current workqueue context...
>
> Read I/O is executed in a workqueue.  For write completions there also
> are various offloads, but none that is consistent and dependable so far.
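
P.S. Since the max_active == 1 semantics keep coming up, here is a
minimal, untested sketch against the stock workqueue API (all names are
made up) of the stall behavior described above: a sleeping work item
keeps occupying the single max_active slot, so the second item sits on
the pending list instead of being picked up by another worker.

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

static void slow_fn(struct work_struct *work)
{
	msleep(5000);		/* the one active item sleeps... */
}

static void fast_fn(struct work_struct *work)
{
	pr_info("fast work finally ran\n");
}

static DECLARE_WORK(slow_work, slow_fn);
static DECLARE_WORK(fast_work, fast_fn);

static int __init demo_init(void)
{
	/* max_active == 1: at most one in-flight item (per CPU) */
	struct workqueue_struct *wq = alloc_workqueue("demo", 0, 1);

	if (!wq)
		return -ENOMEM;

	queue_work(wq, &slow_work);
	/*
	 * fast_work stalls for ~5 seconds: slow_work is sleeping but
	 * still counts against max_active, so this queue won't start
	 * another item until slow_fn() returns.
	 */
	queue_work(wq, &fast_work);
	return 0;
}
module_init(demo_init);
MODULE_LICENSE("GPL");

Note that the stock API also allows tuning the limit after creation via
workqueue_set_max_active(), so re-exposing the old btrfs_workqueue
max_active knob on top of plain workqueues looks doable.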
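
And to make the "let the higher layer queue the work" point concrete,
the generic hand-off pattern I have in mind looks something like the
following (again untested; write_ctx and the function names are made
up, and the real btrfs code would go through btrfs_bio instead):

#include <linux/bio.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct write_ctx {
	struct work_struct work;
	struct bio *bio;
};

/* Process context: sleeping, taking mutexes etc. are fine here. */
static void write_end_workfn(struct work_struct *work)
{
	struct write_ctx *ctx = container_of(work, struct write_ctx, work);

	/* ... the actual, possibly sleeping, completion handling ... */
	bio_put(ctx->bio);
	kfree(ctx);
}

/* Soft/hard irq context: never sleep here, only hand off. */
static void write_end_io(struct bio *bio)
{
	struct write_ctx *ctx = bio->bi_private;

	ctx->bio = bio;
	INIT_WORK(&ctx->work, write_end_workfn);
	queue_work(system_wq, &ctx->work);
}

The submitter would set bio->bi_end_io = write_end_io and point
bio->bi_private at a preallocated write_ctx, so the irq-context part
stays trivially short while everything that might sleep runs in a
well-defined workqueue context.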