From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.4 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 926B2C433E7 for ; Thu, 8 Oct 2020 18:04:41 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D0EF020725 for ; Thu, 8 Oct 2020 18:04:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="cK8+ANXF" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D0EF020725 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 03DD7940007; Thu, 8 Oct 2020 14:04:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id F301E900002; Thu, 8 Oct 2020 14:04:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E1FCC940007; Thu, 8 Oct 2020 14:04:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0118.hostedemail.com [216.40.44.118]) by kanga.kvack.org (Postfix) with ESMTP id B4668900002 for ; Thu, 8 Oct 2020 14:04:39 -0400 (EDT) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 3548A3623 for ; Thu, 8 Oct 2020 18:04:39 +0000 (UTC) X-FDA: 77349533478.07.drink46_3107de7271da Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin07.hostedemail.com (Postfix) with ESMTP id 089BE1803F9AA for ; Thu, 8 Oct 2020 18:04:39 +0000 (UTC) X-HE-Tag: drink46_3107de7271da X-Filterd-Recvd-Size: 6993 Received: from mail-pf1-f195.google.com (mail-pf1-f195.google.com [209.85.210.195]) by imf30.hostedemail.com (Postfix) with ESMTP for ; Thu, 8 Oct 2020 18:04:38 +0000 (UTC) Received: by mail-pf1-f195.google.com with SMTP id x22so4604257pfo.12 for ; Thu, 08 Oct 2020 11:04:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=6l1DK1e7gAIZzYCOUa+9DeHIxDGazGiF/4pcELUamXc=; b=cK8+ANXFk3tz3l8K+RbBq0vC7MmDU/n/U9ksam7NlERPgeqtRkRL9UCcc2mqxnZQCT NCq6AyXAl1Ut/Bxk52WLelFJtVbD9SPVXNToK8KcxNdCmMopH0AhOhsHoFVBu5FmAGq8 DxpqLiX8SwZ3LuejagSLYNLvcEzK1RkrA4H/hMI21cnodnO3S1gXiNxEIDWKKie+szDD YVxaWPQN/llo8TE1y2FbxBl1vKZVJ9KvOsgStZD9xZbLkDiHX/FyKSUC/OSTNQcn6s5+ jLWZRIYYDBuMcOM5BJfnUIkldky/8g/B70BFeAw6r+GcuH+ZPIZoJhSko29U+KqocyDD +O/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=6l1DK1e7gAIZzYCOUa+9DeHIxDGazGiF/4pcELUamXc=; b=EZ+O5FBcMInpz4gGDlfObaWWPnu69V+H8Dgij3pChzcT7pW4gF+cZqC9Gov3mvm/6N FBEspnqinTQ3PldUoSdFw3Ik5DVd9jJqTbLGuiW1Hk75PvnH2k1553Rg3TvT6V3Oa5MX NWNdKmiHmG4q6lkOlqecE8S3mDMEkhqOw+4N+AeqMYXQ8eS+h+Z9MXXi+BX2AS8KigBB dS4W4JimJrc36vSEHwhpIRc1luZFS55qLhtNnAvpUmd83QI6BJYA+q3JYguocBvz2H5B 0ldHqgyJqOOrDQXS0yVW7J4/luvXmd4D2jWG9xQqmeEysgsvbgare5aVQPUu3R69GYFK 01Dw== X-Gm-Message-State: AOAM530j/wD5anZM2BXaDOk9e76gcZz7FAGB7eN2sj70dVYAN8arBKHM 7z7mQumrKlBs+WprgZsdy0s2ZjnofyuKbVl2Gfp5Kg== X-Google-Smtp-Source: ABdhPJzmzHf4fPlIRX7zPtPyr2lguPurZk3aFDe2PCGkXR1mMJ+DUReB5Ul00NgUBULZPmZgbaHSi2QkLNv30QeJzHg= X-Received: by 2002:a62:2bd4:0:b029:152:8c07:d72b with SMTP id r203-20020a622bd40000b02901528c07d72bmr8843188pfr.42.1602180277281; Thu, 08 Oct 2020 11:04:37 -0700 (PDT) MIME-Version: 1.0 References: <20201007184403.1902111-1-axelrasmussen@google.com> <20201007184403.1902111-3-axelrasmussen@google.com> In-Reply-To: From: Axel Rasmussen Date: Thu, 8 Oct 2020 11:04:00 -0700 Message-ID: Subject: Re: [PATCH v2 2/2] mmap_lock: add tracepoints around lock acquisition To: Michel Lespinasse Cc: Steven Rostedt , Ingo Molnar , Andrew Morton , Vlastimil Babka , Daniel Jordan , Laurent Dufour , Jann Horn , Chinwen Chang , Yafang Shao , LKML , linux-mm Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Oct 8, 2020 at 12:40 AM Michel Lespinasse wrote: > > On Wed, Oct 7, 2020 at 11:44 AM Axel Rasmussen wrote: > > The goal of these tracepoints is to be able to debug lock contention > > issues. This lock is acquired on most (all?) mmap / munmap / page fault > > operations, so a multi-threaded process which does a lot of these can > > experience significant contention. > > > > We trace just before we start acquisition, when the acquisition returns > > (whether it succeeded or not), and when the lock is released (or > > downgraded). The events are broken out by lock type (read / write). > > > > The events are also broken out by memcg path. For container-based > > workloads, users often think of several processes in a memcg as a single > > logical "task", so collecting statistics at this level is useful. > > > > The end goal is to get latency information. This isn't directly included > > in the trace events. Instead, users are expected to compute the time > > between "start locking" and "acquire returned", using e.g. synthetic > > events or BPF. The benefit we get from this is simpler code. > > > > Because we use tracepoint_enabled() to decide whether or not to trace, > > this patch has effectively no overhead unless tracepoints are enabled at > > runtime. If tracepoints are enabled, there is a performance impact, but > > how much depends on exactly what e.g. the BPF program does. > > > > Signed-off-by: Axel Rasmussen > > Thanks for working on this. > > I like that there is no overhead unless CONFIG_TRACING is set. Actually, it's even a bit better. Even if CONFIG_TRACING is set, the only overhead we add is a single static_key branch (much cheaper than a normal branch). The overhead is only introduced when the tracepoints are enabled at runtime, e.g. by writing to /sys/kernel/trace/events/(event)/enabled. > However, I think the __mmap_lock_traced_lock and similar functions are > the wrong level of abstraction, especially considering that we are > considering to switch away from using rwsem as the underlying lock > implementation. Would you consider something along the following lines > instead for include/linux/mmap_lock.h ? Sure, there's no reason that can't work. It's just a tradeoff: Breaking up these helpers the way you describe makes the locking functions more verbose (more repeated code; overall this file would be longer), but there are a few less lines to change when we switch from rwsem to something else (because we'd only need to change the lock functions vs. as-is we'd have to change a couple of lines in each of the helpers too). I prefer the way it is, but I don't feel that strongly about it. I'll refactor and send a v3 that looks as you describe. > > #ifdef CONFIG_TRACING > > DECLARE_TRACEPOINT(...); > > void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write); > > static inline void mmap_lock_trace_start_locking(struct mm_struct *mm, > bool write) > { > if (tracepoint_enabled(mmap_lock_start_locking)) > __mmap_lock_do_trace_start_locking(mm, write); > } > > #else > > static inline void mmap_lock_trace_start_locking(struct mm_struct *mm, > bool write) {} > > #endif > > static inline void mmap_write_lock(struct mm_struct *mm) > { > mmap_lock_trace_start_locking(mm, true); > down_write(&mm->mmap_lock); > mmap_lock_trace_acquire_returned(mm, true, true); > } > > I think this is more straightforward, and also the > mmap_lock_trace_start_locking and similar functions don't depend on > the underlying lock implementation. > > The changes to the other files look fine to me. > > -- > Michel "Walken" Lespinasse > A program is never fully debugged until the last user dies.