From: Andrii Nakryiko
Date: Tue, 26 Apr 2022 08:54:17 -0700
Subject: Re: [PATCH] sched/tracing: append prev_state to tp args instead
To: Qais Yousef
Cc: Peter Zijlstra, Alexei Starovoitov, Delyan Kratunov, Namhyung Kim,
    Arnaldo Carvalho de Melo, "bigeasy@linutronix.de",
    "dietmar.eggemann@arm.com", "keescook@chromium.org", "x86@kernel.org",
    "andrii@kernel.org", "u.kleine-koenig@pengutronix.de",
    "vincent.guittot@linaro.org", "akpm@linux-foundation.org",
    "mingo@kernel.org", "linux-kernel@vger.kernel.org",
    "rdunlap@infradead.org", "rostedt@goodmis.org", "Kenta.Tada@sony.com",
    "tglx@linutronix.de", "bristot@redhat.com", "ebiederm@xmission.com",
    "ast@kernel.org", "legion@kernel.org", "adharmap@quicinc.com",
    "valentin.schneider@arm.com", "ed.tsai@mediatek.com",
    "juri.lelli@redhat.com"
References: <20220120162520.570782-1-valentin.schneider@arm.com>
    <93a20759600c05b6d9e4359a1517c88e06b44834.camel@fb.com>
    <20220422110903.GW2731@worktop.programming.kicks-ass.net>
    <056e9bb0d0e3fc20572d42db7386face1d0665d6.camel@fb.com>
    <20220426140959.op6u5m7id57aq7yc@wubuntu>
In-Reply-To: <20220426140959.op6u5m7id57aq7yc@wubuntu>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 26, 2022 at 7:10 AM Qais Yousef wrote:
>
> On 04/26/22 14:28, Peter Zijlstra wrote:
> > On Fri, Apr 22, 2022 at 11:30:12AM -0700, Alexei Starovoitov wrote:
> > > On Fri, Apr 22, 2022 at 10:22 AM Delyan Kratunov wrote:
> > > >
> > > > On Fri, 2022-04-22 at 13:09 +0200, Peter Zijlstra wrote:
> > > > > And on the other hand; those users need to be fixed anyway, right?
> > > > > Accessing prev->__state is equally broken.
> > > >
> > > > The users that access prev->__state would most likely have to be fixed, for sure.
> > > >
> > > > However, not all users access prev->__state. `offcputime` for example just takes a
> > > > stack trace and associates it with the switched out task. This kind of user
> > > > would continue working with the proposed patch.
> > > >
> > > > > If bpf wants to ride on them, it needs to suffer the pain of doing so.
> > > >
> > > > Sure, I'm just advocating for a fairly trivial patch to avoid some of the suffering,
> > > > hopefully without being a burden to development. If that's not the case, then it's a
> > > > clear no-go.
> > >
> > >
> > > Namhyung just sent this patch set:
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20220422053401.208207-3-namhyung@kernel.org/
> >
> > That has:
> >
> > + * recently task_struct->state renamed to __state so it made an incompatible
> > + * change.
> >
> > git tells me:
> >
> >   2f064a59a11f ("sched: Change task_struct::state")
> >
> > is almost a year old by now. That don't qualify as recently in my book.
> > That says that 'old kernels used to call this...'.
> >
> > > to add off-cpu profiling to perf.
> > > It also hooks into sched_switch tracepoint.
> > > Notice it deals with state->__state rename just fine.
> >
> > So I don't speak BPF much; it always takes me more time to make bpf work
> > than to just hack up the kernel, which makes it hard to get motivated.
> >
> > However, it was not just a rename, state changed type too, which is why I
> > did the rename, to make sure all users would get a compile fail and
> > could adjust.
> >
> > If you're silently making it work by frobbing the name, you lose that.
> >
> > Specifically, task_struct::state used to be 'volatile long', while
> > task_struct::__state is 'unsigned int'. As such, any user must now be
> > very careful to use READ_ONCE(). I don't see that happening with just
> > frobbing the name.
> >
> > Additionally, by shrinking the field, I suppose BE systems get to keep
> > the pieces?
> >
> > > But it will have a hard time without this patch
> > > until we add all the extra CO-RE features to detect
> > > and automatically adjust bpf progs when tracepoint
> > > arguments order changed.
> >
> > Could be me, but silently making it work sounds like fail :/ There's a
> > reason code changes, users need to adapt, not silently pretend stuff is
> > as before.
> >
> > How will you know you need to fix your tool?
>
> If libbpf doesn't fail, then yeah it's a big problem. I wonder how users of
> kprobe who I suppose are more prone to this kind of problem have been coping.

See my reply to Peter. In general, libbpf can't know the user's intent,
so it can't fail this automatically. In the cases where it can
accommodate a change automatically, it does. In other cases it provides
tools for the user to handle it (bpf_core_field_size(),
BPF_CORE_READ_BITFIELD(), etc). But in the end, none of this eliminates
the need to test your application for correctness. Tracing programs do
break on kernel changes and BPF users do adapt to them. Sometimes
adapting is easy (like the state -> __state transition), sometimes it's
much more involved (like this argument order change).

>
> >
> > > We will do it eventually, of course.
> > > There will be additional work in llvm, libbpf, kernel, etc.
> > > But for now I think it would be good to land Delyan's patch
> > > to avoid unnecessary pain to all the users.
> > >
> > > Peter, do you mind?
> >
> > I suppose I can help out this time, but I really don't want to set a
> > precedent for these things. Broken is broken.
> >
> > The down-side for me is that the argument order no longer makes any
> > sense.
>
> I'm intending to backport fa2c3254d7cf to 5.10 and 5.15 but waiting for
> a Tested-by. If you take this one, then it'll need to be backported too.
>
> Cheers
>
> --
> Qais Yousef
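
A userspace sketch of the width concern quoted above: task_struct::state
was a 'volatile long' and task_struct::__state is an 'unsigned int', so a
reader that still assumes the old width picks up neighbouring bytes. This
is not kernel or BPF code. struct fake_task, its neighbour member,
read_field() and demo_width_mismatch() are made up for illustration, and
the 8-byte read assumes the old field was a 64-bit long. In a BPF program,
bpf_core_field_size() is what would supply the real field size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the new layout: a 4-byte state field
 * followed directly by unrelated data. Not the real task_struct. */
struct fake_task {
	unsigned int __state;
	unsigned int neighbour;
};

/* Read an unsigned field of `size` bytes at `offset` inside `base`.
 * Only 4- and 8-byte fields are handled; other sizes return 0. */
static uint64_t read_field(const void *base, size_t offset, size_t size)
{
	const unsigned char *p = (const unsigned char *)base + offset;

	if (size == sizeof(uint32_t)) {
		uint32_t v;
		memcpy(&v, p, sizeof(v));
		return v;
	}
	if (size == sizeof(uint64_t)) {
		uint64_t v;
		memcpy(&v, p, sizeof(v));
		return v;
	}
	return 0;
}

/* Compare a read that uses the field's real size with one that still
 * assumes the old 8-byte width. */
void demo_width_mismatch(void)
{
	/* 2 is used here as an example state value. */
	struct fake_task t = { .__state = 2, .neighbour = 0xdeadbeefu };
	size_t off = offsetof(struct fake_task, __state);

	/* Correctly sized read: returns the state value. */
	assert(read_field(&t, off, sizeof(t.__state)) == 2);

	/* Stale 8-byte read: the neighbouring field leaks into the
	 * result on both little- and big-endian machines. */
	assert(read_field(&t, off, sizeof(uint64_t)) != 2);
}
```

If neighbour happened to be 0, the stale 8-byte read would still return 2
on a little-endian machine but 2 << 32 on a big-endian one, so the bug can
stay hidden on little-endian systems.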