Subject: Re: [PATCH 0/2] /proc/stat: Reduce irqs counting performance overhead
From: Waiman Long <longman@redhat.com>
To: Dave Chinner
Cc: Andrew Morton, Alexey Dobriyan, Luis Chamberlain, Kees Cook,
    Jonathan Corbet, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    Davidlohr Bueso, Miklos Szeredi, Daniel Colascione, Randy Dunlap
Date: Tue, 8 Jan 2019 11:58:26 -0500
Message-ID: <56954b42-4258-7268-53b5-ddca28758193@redhat.com>
In-Reply-To: <20190108020422.GA27534@dastard>
References: <1546873978-27797-1-git-send-email-longman@redhat.com>
 <20190107223214.GZ6311@dastard>
 <9b4208b7-f97b-047c-4dab-15bd3791e7de@redhat.com>
 <20190108020422.GA27534@dastard>

On 01/07/2019 09:04 PM, Dave Chinner wrote:
> On Mon, Jan 07, 2019 at 05:41:39PM -0500, Waiman Long wrote:
>> On 01/07/2019 05:32 PM, Dave Chinner wrote:
>>> On Mon, Jan 07, 2019 at 10:12:56AM -0500, Waiman Long wrote:
>>>> As newer systems have more and more IRQs and CPUs, the performance
>>>> of reading /proc/stat frequently is getting worse and worse.
>>> Because the "roll-your-own" per-cpu counter implementation has been
>>> optimised for the lowest possible addition overhead on the premise
>>> that summing the counters is rare and isn't a performance issue.
>>> This patchset is a direct indication that this "summing is rare and
>>> can be slow" premise is now invalid.
>>>
>>> We have percpu counter infrastructure that trades off a small amount
>>> of addition overhead for zero-cost reading of the counter value.
>>> i.e. why not just convert this whole mess to percpu_counters and
>>> then just use percpu_counter_read_positive()? Then we just don't
>>> care how often userspace reads the /proc file because there is no
>>> summing involved at all...
>>>
>>> Cheers,
>>>
>>> Dave.
>> Yes, percpu_counter_read_positive() is cheap. However, you still need
>> to pay the price somewhere. In the case of percpu_counter, the update
>> is more expensive.
> Ummm, that's exactly what I just said. It's a percpu counter that
> solves the "sum is expensive and frequent" problem, just like you
> are encountering here. I do not need basic scalability algorithms
> explained to me.

What I am trying to say is that the "sum is expensive and frequent"
case applies only to a very small percentage of applications; it is not
true for most of them. I am hesitant to add latency to the interrupt
path that will affect all applications.

>> I would say the percentage of applications that will hit this problem
>> is small. But for them, this problem has some significant performance
>> overhead.
> Well, duh!
>
> What I was suggesting is that you change the per-cpu counter
> implementation to the /generic infrastructure/ that solves this
> problem, and then determine if the extra update overhead is at all
> measurable. If you can't measure any difference in update overhead,
> then slapping complexity on the existing counter to attempt to
> mitigate the summing overhead is the wrong solution.
>
> Indeed, it may be that you need to use a custom batch scaling curve
> for the generic per-cpu counter infrastructure to mitigate the
> update overhead, but the fact is we already have generic
> infrastructure that solves your problem and so the solution should
> be "use the generic infrastructure" until it can be proven not to
> work.
>
> i.e. prove the generic infrastructure is not fit for purpose and
> cannot be improved sufficiently to work for this use case before
> implementing a complex, one-off snowflake counter implementation...
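Just to make sure we are talking about the same thing, I take "use the
generic infrastructure" to mean something like the sketch below
(untested, the counter name and hook points are made up):
percpu_counter_add() in the interrupt accounting path and
percpu_counter_read_positive() in the /proc/stat read path.

#include <linux/gfp.h>
#include <linux/init.h>
#include <linux/percpu_counter.h>

/* Illustrative only: aggregate interrupt count as a percpu_counter. */
static struct percpu_counter kstat_irqs_sum_pc;

static int __init kstat_irqs_sum_pc_init(void)
{
	/* The per-cpu counters need an explicit allocation at init time. */
	return percpu_counter_init(&kstat_irqs_sum_pc, 0, GFP_KERNEL);
}
early_initcall(kstat_irqs_sum_pc_init);

/* Interrupt accounting hot path: local add, occasional locked fold. */
static inline void kstat_account_irq(void)
{
	percpu_counter_add(&kstat_irqs_sum_pc, 1);
}

/* /proc/stat read side: O(1), no walk over all possible CPUs. */
static s64 kstat_irqs_sum_read(void)
{
	return percpu_counter_read_positive(&kstat_irqs_sum_pc);
}

With that, the read side stops being a per-cpu summing loop and we no
longer care how often userspace reads /proc/stat. The cost is that every
interrupt now goes through the batch check in percpu_counter_add()
instead of a plain per-cpu increment, which is exactly the update
overhead I am worried about.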
I see your point. I like the deferred summation approach that I am
currently using. If I have to modify the current per-cpu counter
implementation to support that, and probably add counter grouping
support to amortize the overhead, it could be a major undertaking. This
is not a high priority item for me at the moment, so I may have to wait
until I have some spare time.

Thanks,
Longman
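P.S. If I do get back to this, the first experiment along the lines you
suggest would probably be a larger batch for these counters, e.g.
something like the line below in the interrupt accounting path
(hypothetical, reusing the counter from the sketch above), and then
measuring whether the update overhead is still visible:

	/*
	 * Hypothetical, untested: a large fixed batch keeps the hot path
	 * to a local per-cpu add almost all of the time; fbc->lock is
	 * only taken when a CPU's local count reaches the batch size.
	 */
	percpu_counter_add_batch(&kstat_irqs_sum_pc, 1, 1024);

percpu_counter_read_positive() could then lag the true sum by up to
roughly num_online_cpus() * batch, which should not matter for
/proc/stat.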