Subject: Re: [PATCH v2 0/5] Introduce memcg_stock_pcp remote draining
From: Leonardo Brás
To: Roman Gushchin
Cc: Michal Hocko, Marcelo Tosatti, Johannes Weiner, Shakeel Butt,
    Muchun Song, Andrew Morton, cgroups@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Tue, 07 Feb 2023 00:18:01 -0300
Message-ID: <4b232f47e038ab6fcaa0114f73c28d4bf8799f84.camel@redhat.com>

On Sun, 2023-02-05 at 11:49 -0800, Roman Gushchin wrote:
> Hi Leonardo!

Hello Roman,
Thanks a lot for replying!

>
> > Yes, but we are exchanging an "always schedule_work_on()", which is a kind of
> > contention, for a "sometimes we hit spinlock contention".
> >
> > For the spinlock proposal, on the local cpu side, the *worst case* contention
> > is:
> > 1 - wait the spin_unlock() for a complete drain,
> > 2 - wait a cache hit for the local per-cpu cacheline.
> >
> > With what is currently implemented (the schedule_work_on() approach), on the
> > local cpu side there is *always* this contention:
> > 1 - wait for a context switch,
> > 2 - wait a cache hit from its local per-cpu cacheline,
> > 3 - wait a complete drain,
> > 4 - then wait for a new context switch to the current thread.
>
> I think both Michal and I are thinking of a more generic case in which the cpu
> is not exclusively consumed by 1 special process, so that the draining work can
> be executed during an idle time. In this case the work is basically free.

Oh, that makes sense.
But in such scenarios, wouldn't the same happen with spinlocks?

I mean, most of the contention with spinlocks only happens if the remote cpu is
trying to drain the cache while the local cpu happens to be draining/charging,
which is quite rare due to how fast the local cpu operations are.

Also, if the cpu has some idle time, spending a little more of it on a possible
spinlock contention should not be a problem. Right?

>
> And the introduction of a spin_lock() on the hot path is what we are concerned
> about. I agree that on some hardware platforms it won't be that expensive,
>

IIRC most multicore hardware platforms supported by the kernel should behave
the same way, since it's better to rely on cache coherence than on locking the
memory bus.

For instance, the other popular architectures supported by Linux use the LR/SC
strategy for atomic operations (tested on ARM, POWER, RISC-V), and IIRC the
Load-Reserve slow path waits for cacheline exclusivity, which the local cpu
already has on this per-cpu structure.

> but in general not having any spinlocks is so much better.

I agree that spinlocks may bring contention, which is not ideal in many cases.
In this case, though, it may not be a big issue, due to the very rare remote
accesses to the structure in the usual case (non-pre-OOM memcg).
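To make the tradeoff concrete, here is a minimal sketch of the scheme being
discussed (illustrative only: field and function names are simplified
stand-ins, and irq handling plus the rest of the charge path are omitted):

	/* Illustrative sketch, not the actual patch: a per-cpu stock whose
	 * local fast path and remote drain share one spinlock. */
	struct memcg_stock_pcp {
		spinlock_t lock;
		struct mem_cgroup *cached;	/* memcg the stock belongs to */
		unsigned int nr_pages;		/* pre-charged pages */
	};

	/* Local fast path: almost always uncontended, so spin_lock() is
	 * little more than an atomic op on an already-exclusive cacheline. */
	static bool consume_stock(struct memcg_stock_pcp *stock,
				  struct mem_cgroup *memcg, unsigned int nr_pages)
	{
		bool ret = false;

		spin_lock(&stock->lock);
		if (stock->cached == memcg && stock->nr_pages >= nr_pages) {
			stock->nr_pages -= nr_pages;
			ret = true;
		}
		spin_unlock(&stock->lock);

		return ret;
	}

	/* Remote drain: instead of schedule_work_on() (two context switches
	 * on the target cpu), the draining cpu takes the lock directly.
	 * Contention is limited to the rare window where the owner cpu is
	 * charging or draining at the same moment. */
	static void drain_remote_stock(struct memcg_stock_pcp *stock)
	{
		spin_lock(&stock->lock);
		/* ... return stock->nr_pages to stock->cached counters ... */
		stock->nr_pages = 0;
		spin_unlock(&stock->lock);
	}

The point is that the local path trades the unconditional context-switch pair
of schedule_work_on() for a lock that is almost always uncontended and
cache-hot.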
>
> >
> > So moving from schedule_work_on() to spinlocks will save 2 context switches
> > per cpu every time drain_all_stock() is called.
> >
> > On the remote cpu side, my tests point that doing the remote draining is
> > faster than scheduling a local draining, so it's also a gain.
> >
> > Also, IIUC the possible contention in the spinlock approach happens only on
> > page faults and syscalls, versus the schedule_work_on() approach, which can
> > interrupt the user workload at any time.
> >
> > In fact, not interrupting the user workload on isolated cpus is just a bonus
> > of using spinlocks.
>
> I believe it significantly depends on the preemption model: you're right
> regarding fully preemptive kernels, but with voluntary/none preemption it's
> exactly the opposite: the draining work will be executed at some point later
> (probably with 0 cost),

So, this is about the case of voluntary/none preemption with some free cpu time.

> while the remote access from another cpu will potentially cause delays on the
> spin lock as well as a need to refill the stock.

But if there is some free CPU time, what is the problem with some (potential)
delays due to spinlock contention?

I am probably missing the whole picture, but when I think of a performance
improvement, I think of doing more with the same cputime. If we can use free
cputime to do stuff later, it's only fair to also use it in case of contention,
right?

I know there are some cases that need to be more predictable (mostly RT), but
when I think of memory allocation, I don't expect it to always take the same
time (as there are caches, pre-OOM, and so on).

Also, as previously discussed, in the case of a busy cpu, the spinlock approach
will probably allow more work to be done.

>
> Overall I'd expect a noticeable performance regression from an introduction of
> spin locks and remote draining. Maybe not on all platforms, but at least on
> some. That's my main concern.
>

I see.

On the platform I have tested (x86), I noticed better overall performance with
spinlocks than with the upstream solution. For other popular platforms, I have
briefly read some documentation on locking/atomicity, and I think we may keep
the performance gains.

But to be sure, I could rerun the tests on other platforms, such as ARM, POWER,
and RISC-V, or even perform extra suggested tests.

With that info, would you feel less concerned about a possible change in the
memcg pcp cache locking scheme?

> And I don't think the problem we're aiming to solve here
> justifies this potential regression.
>

Strictly speaking, the isolated-cpu scheduling problem is already fixed by the
housekeeping patch (with some limitations).

At this point, I am trying to bring focus to a (possible) performance
improvement in the memcg pcp cache locking scheme.

> Thanks!
>

Thank you for helping me better understand your arguments and concerns.
I really appreciate it!

Best regards,
Leo