From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 05E86C433F5 for ; Fri, 28 Jan 2022 06:25:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346463AbiA1GZQ (ORCPT ); Fri, 28 Jan 2022 01:25:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43188 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346438AbiA1GZM (ORCPT ); Fri, 28 Jan 2022 01:25:12 -0500 Received: from mail-il1-x129.google.com (mail-il1-x129.google.com [IPv6:2607:f8b0:4864:20::129]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B0A7EC06173B for ; Thu, 27 Jan 2022 22:25:11 -0800 (PST) Received: by mail-il1-x129.google.com with SMTP id s1so4564113ilj.7 for ; Thu, 27 Jan 2022 22:25:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=ioX5kk3sWh56bJEPWUdFpYPz8fw0aCA2n/24TeU+OSY=; b=DcclhvcH9slBX27nMJsYCkSK9cLOnuPv4HRZuib0o9hVmuMNVGIzDoiczsSQ8VFYPE svvSySicRb0BiJOBVYeLV1Ins06BzzqKHc6Pwg+L5oFV2an9NyexJXWi5FJ+3tQxn77Q gakw+O/53ZKq/LhbsLkca4TvPqywSGffho6xD31OksHO69E/Zz++g2Cz5A3nlQC8sWIN tJKYtaeTQODRO0QaitU9oEhXKC7S7iFJIXtHgewf9PLq+lBP0oXHn8IP0Wi0OjI/7g6H wxwqv4ToOcQQ1UqO3ZGMPMWZm5sr+WTG4VFJaEX3OFbmnooQD2plVouBI4ScfsKOmuFn D0Hw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ioX5kk3sWh56bJEPWUdFpYPz8fw0aCA2n/24TeU+OSY=; b=zhs0wkBqV6Hdzi7yVHrdqgZF/MJhv4jlVcoVdWqLfULdHUmw4wjgiWWbfMREC/5HrQ pLmV7Pe7/eGCjMyJ3wAbsYSQmj9d25XBbiuIS2yOOH/G9lbuv8078D6g9KaLMrQETXjt pahBvxrgxau8494BzjBKEyIZgKcng5JTmGEPNuYYk87IGTvVJ+oxyp1nYNwFUIoj/LqA AqcMD3c82AzXIwqb7sk9wtztmFwzq2XUYWB73KduJJtrMIv0oacK7CYAIgkvi5bVHeMP Gr8gZLPM5HYxAcAJXjYzGb2TMJ1T93B5yubiq/PXjwy+KAi+yt+ijRy8sK6onBLMLxN5 sD9A== X-Gm-Message-State: AOAM533as214CSJICE4F7lAgTPlt/6+rkNZ2tzQvaeGhod1FR9+VO/hP DjWD0rgwl65+owAZUYJrpdNAg38TmI+uiaPpbHro3Q== X-Google-Smtp-Source: ABdhPJw8xJtMdMXLpq5DNJh0qHQnEwUlREutuMllw/zjqf3plgXpZY3RK/5z+FWQU/NwE5f50hf7bHaEt/Ldv33+DuM= X-Received: by 2002:a05:6e02:188a:: with SMTP id o10mr4893492ilu.89.1643351110790; Thu, 27 Jan 2022 22:25:10 -0800 (PST) MIME-Version: 1.0 References: <20220125204602.4137477-1-irogers@google.com> <20220128142348.17d51894dbdb35c9a9449567@kernel.org> In-Reply-To: <20220128142348.17d51894dbdb35c9a9449567@kernel.org> From: Ian Rogers Date: Thu, 27 Jan 2022 22:24:59 -0800 Message-ID: Subject: Re: [PATCH v2 0/4] Reference count checker and related fixes To: Masami Hiramatsu Cc: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Thomas Gleixner , Darren Hart , Davidlohr Bueso , =?UTF-8?Q?Andr=C3=A9_Almeida?= , James Clark , John Garry , Riccardo Mancini , Yury Norov , Andy Shevchenko , Andrew Morton , Jin Yao , Adrian Hunter , Leo Yan , Andi Kleen , Thomas Richter , Kan Liang , Madhavan Srinivasan , Shunsuke Nakamura , Song Liu , Steven Rostedt , Miaoqian Lin , Stephen Brennan , Kajol Jain , Alexey Bayduraev , German Gomez , linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, Eric Dumazet , Dmitry Vyukov , masami.hiramatsu.pt@hitachi.com, eranian@google.com Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 27, 2022 at 9:24 PM Masami Hiramatsu wrote: > > On Thu, 27 Jan 2022 13:33:23 -0800 > Ian Rogers wrote: > > > On Tue, Jan 25, 2022 at 12:46 PM Ian Rogers wrote: > > > > > > This v2 patch set has the main reference count patch for cpu map from > > > the first set and then adds reference count checking to nsinfo. The > > > reference count checking on nsinfo helped diagnose a data race bug > > > which is fixed in the independent patches 2 and 3. > > > > > > The perf tool has a class of memory problems where reference counts > > > are used incorrectly. Memory/address sanitizers and valgrind don't > > > provide useful ways to debug these problems, you see a memory leak > > > where the only pertinent information is the original allocation > > > site. What would be more useful is knowing where a get fails to have a > > > corresponding put, where there are double puts, etc. > > > > > > This work was motivated by the roll-back of: > > > https://lore.kernel.org/linux-perf-users/20211118193714.2293728-1-irogers@google.com/ > > > where fixing a missed put resulted in a use-after-free in a different > > > context. There was a sense in fixing the issue that a game of > > > wac-a-mole had been embarked upon in adding missed gets and puts. > > > > > > The basic approach of the change is to add a level of indirection at > > > the get and put calls. Get allocates a level of indirection that, if > > > no corresponding put is called, becomes a memory leak (and associated > > > stack trace) that leak sanitizer can report. Similarly if two puts are > > > called for the same get, then a double free can be detected by address > > > sanitizer. This can also detect the use after put, which should also > > > yield a segv without a sanitizer. > > > > > > Adding reference count checking to cpu map was done as a proof of > > > concept, it yielded little other than a location where the use of get > > > could be cleaner by using its result. Reference count checking on > > > nsinfo identified a double free of the indirection layer and the > > > related threads, thereby identifying a data race as discussed here: > > > https://lore.kernel.org/linux-perf-users/CAP-5=fWZH20L4kv-BwVtGLwR=Em3AOOT+Q4QGivvQuYn5AsPRg@mail.gmail.com/ > > > Accordingly the dso->lock was extended and use to cover the race. > > > > > > An alternative that was considered was ref_tracker: > > > https://lwn.net/Articles/877603/ > > > ref_tracker requires use of a reference counted struct to also use a > > > cookie/tracker. The cookie is combined with data in a ref_tracker_dir > > > to spot double puts. When an object is finished with leaks can be > > > detected, as with this approach when leak analysis happens. This > > > approach was preferred as it doesn't introduce cookies, spots use > > > after put and appears moderately more neutral to the API. Weaknesses > > > of the implemented approcah are not being able to do adhoc leak > > > detection and a preference for adding an accessor API to structs. I > > > believe there are other issues and welcome suggestions. > > > > And so we've been here before (Dec 2015 to be exact). Namhyung pointed me to: > > https://lore.kernel.org/all/20151209021047.10245.8918.stgit@localhost.localdomain/ > > by Masami Hiramatsu. In this work he adds a leak sanitizer style > > reference count checker that will describe locations of puts and gets > > for diagnosis. Firstly that's an awesome achievement! This work is > > different in that it isn't trying to invent a leak sanitizer, it is > > just using the existing one. By adding a level of indirection this > > work can catch use after put and pairs gets with puts to make lifetime > > analysis more automatic. An advantage of Masami's work is that it > > doesn't change data-structures and after the initial patch-set is > > somewhat transparent. Overall I prefer the approach in these patches, > > future patches can look to clean up the API as Masami has. > > Thanks for referring my series :-D The series aimed to solve the refcount > usage issue in the perf which lead the object leaks. At that moment, > I found that there were 2 patterns, refcount start from 0 and start from 1. > That made me confused what I should do for using a object. > But the perf uses linux/refcount.h now, I hope such issue has already gone. > (but the object leakage seems not fixed fully yet, as you found.) > > BTW, I think the introducing UNWRAP_* macro may put a burden on future > development. If it is inevitable, we should consider it as carefully as > possible. Or, it may cause another issue (it is easily missed that the new > patch does not use UNWRAP_* for object reference, because it is natual.) > > So I agree with you that you to clean up the API. :-) > I think we can make yet another refcount.h for user space debugging and > replace it with the linux/refcount.h. Thanks Masami, Agreed on the UNWRAP_ macros, hence wanting to hide them behind accessors. Making accessors could be automated with macros, for example, have a list of variables, have a macro declare the struct using the list, another macro can use the list to declare accessors. I didn't find adding the UNWRAP_ macros in this change particularly burdensome as any use of the wrapping pointer as the original type caused a compile time error telling you what and where to fix. The macro is extra stuff in the way of using just the raw object, but that's fairly typical in C++ with shared_ptr, scoped_lock, etc. The question is, is it worth it to make sure use of the reference counted object is correct and misuse is easier to diagnose? I think it is near as least offensive as possible while providing the best information - hence being able to solve the dso->nsinfo put data race, that has been a problem to solve up to this point. I'm open to better suggestions though :-) Thanks again, Ian > Thank you, > > -- > Masami Hiramatsu