Subject: Re: [PATCH 2/2] iommu/iova: Improve restart logic
From: John Garry
To: Robin Murphy
Date: Thu, 18 Mar 2021 16:07:11 +0000

> Well yeah, in your particular case you're allocating from a heavily
> over-contended address space, so much of the time it is genuinely full.
> Plus you're primarily churning one or two sizes of IOVA, so there's a
> high chance that you will either allocate immediately from the cached
> node (after a previous free), or search the whole space and fail. In
> case it was missed, searching only some arbitrary subset of the space
> before giving up is not a good behaviour for an allocator to have in
> general.
>
>>> So since the retry means that we search through the complete pfn
>>> range most of the time (due to poor success rate), we should be able
>>> to do a better job at maintaining an accurate max alloc size, by
>>> calculating it from the range search, and not relying on max alloc
>>> failed or resetting it frequently. Hopefully that would mean that
>>> we're smarter about not trying the allocation.
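For reference, the shape of the idea is roughly as below -- an untested
sketch only, not the actual diff, and iova_largest_gap() is just a name
I've made up here. In a real change the bookkeeping would be folded into
the existing tree walk in __alloc_and_insert_iova_range() rather than
re-walking, and size-alignment is ignored for simplicity (types from
<linux/iova.h>, helpers from <linux/rbtree.h> and <linux/minmax.h>):

/*
 * Untested sketch: find the largest free gap below limit_pfn, so that
 * on failure max32_alloc_size can record what is genuinely available,
 * rather than just the size which happened to fail. Caller holds
 * iova_rbtree_lock.
 */
static unsigned long iova_largest_gap(struct iova_domain *iovad,
                                      unsigned long limit_pfn)
{
        unsigned long high = limit_pfn, max_gap = 0;
        struct rb_node *node;

        for (node = rb_last(&iovad->rbroot); node; node = rb_prev(node)) {
                struct iova *iova = rb_entry(node, struct iova, node);

                /* skip entries at or above the limit (incl. the anchor) */
                if (iova->pfn_lo >= high)
                        continue;

                /* free pfns between this entry and the bound above it */
                if (high > iova->pfn_hi + 1)
                        max_gap = max(max_gap, high - iova->pfn_hi - 1);
                high = iova->pfn_lo;
        }

        /* whatever is left between start_pfn and the lowest entry */
        if (high > iovad->start_pfn)
                max_gap = max(max_gap, high - iovad->start_pfn);

        return max_gap;
}

The failure path would then set iovad->max32_alloc_size = max_gap + 1
instead of the current iovad->max32_alloc_size = size, so that the
"size >= max32_alloc_size" early bail-out only turns away requests
which genuinely cannot fit.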
>>
>> So I tried that out, and we seem to be able to scrape back an
>> appreciable amount of performance. Maybe 80% of the original, with
>> another change, below.
>
> TBH if you really want to make allocation more efficient I think there
> are more radical changes that would be worth experimenting with, like
> using some form of augmented rbtree to also encode the amount of free
> space under each branch, or representing the free space in its own
> parallel tree, or whether some other structure entirely might be a
> better bet these days.
>
> And if you just want to make your thing acceptably fast, now I'm going
> to say stick a quirk somewhere to force the "forcedac" option on your
> platform ;)

Easier said than done :) But still, I'd like to just be able to cache
all IOVA sizes for my DMA engine, so we should not have to go near the
RB tree often.

I have put together a series to allow the upper limit of the rcache
range to be increased per domain, which naturally gives better
performance than we originally had. I don't want to prejudice the
solution by saying what I think of it now, so I will send it out...

> [...]
>>>>> @@ -219,7 +256,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>>>>>           if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
>>>>>               high_pfn = limit_pfn;
>>>>>               low_pfn = retry_pfn;
>>>>> -            curr = &iovad->anchor.node;
>>>>> +            curr = iova_find_limit(iovad, limit_pfn);
>>
>> I see that it is now applied. However, alternatively could we just
>> add a zero-length 32b boundary marker node for the 32b pfn restart
>> point?
>
> That would need special cases all over the place to prevent the marker
> getting merged into reservations or hit by lookups, and at worst break
> the ordering of the tree if a legitimate node straddles the boundary.
> I did consider having the insert/delete routines keep track of yet
> another cached node for whatever's currently the first thing above the
> 32-bit boundary, but I was worried that might be a bit too invasive.

Yeah, I did think of that. I don't think that it would have too much
overhead.

> FWIW I'm currently planning to come back to this again when I have a
> bit more time, since the optimum thing to do (modulo replacing the
> entire algorithm...) is actually to make the second part of the search
> *upwards* from the cached node to the limit. Furthermore, to revive my
> arch/arm conversion I think we're realistically going to need a
> compatibility option for bottom-up allocation to avoid too many nasty
> surprises, so I'd like to generalise things to tackle both concerns at
> once.

Thanks,
John