From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5701FC33CB6 for ; Fri, 17 Jan 2020 16:58:06 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D80862072E for ; Fri, 17 Jan 2020 16:58:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=intel-com.20150623.gappssmtp.com header.i=@intel-com.20150623.gappssmtp.com header.b="YRuMaL7f" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D80862072E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 3657E6B04CB; Fri, 17 Jan 2020 11:58:05 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 315F96B04CC; Fri, 17 Jan 2020 11:58:05 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 22CCD6B04CD; Fri, 17 Jan 2020 11:58:05 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0070.hostedemail.com [216.40.44.70]) by kanga.kvack.org (Postfix) with ESMTP id 0A4806B04CB for ; Fri, 17 Jan 2020 11:58:05 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with SMTP id A4962181AEF10 for ; Fri, 17 Jan 2020 16:58:04 +0000 (UTC) X-FDA: 76387733688.14.egg48_29528a7fc405e X-HE-Tag: egg48_29528a7fc405e X-Filterd-Recvd-Size: 9390 Received: from mail-ot1-f65.google.com (mail-ot1-f65.google.com [209.85.210.65]) by imf50.hostedemail.com (Postfix) with ESMTP for ; Fri, 17 Jan 2020 16:58:03 +0000 (UTC) Received: by mail-ot1-f65.google.com with SMTP id k14so23133386otn.4 for ; Fri, 17 Jan 2020 08:58:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=SVdmKszNnhoaHDNeCmKwKg+oKUWstdwrLDQTNGyOrtg=; b=YRuMaL7fGyr27X1i82sHIbXhAORF5u2Ym2ug9SABWZbWxeWrsckjPfMR7nQSgBasws wonoYfYIfFS1mntoWdBSNrrkhYxkdQaruSKny15ufforyO0fUntf8yJhURCop8gYv0RK RItVhTIaa8EJ9a0bFB57G0/17NoTea9XXE9yjT2DRg+XvUaNGeCYr4csHKZUTmX2h0Dv u+vm201ZbxReZC2ebgQFg4B3vZ0t29B6IkMhDtOyGtxMmYsT/T5UBFO1VGqYp4kjbWGA 29f0cmYunZuedtDvxKjeXShJuVELAvafbVdbHdA4kNpupDWzpMFMNFVtXJON18R9heCj dIHQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=SVdmKszNnhoaHDNeCmKwKg+oKUWstdwrLDQTNGyOrtg=; b=JJNwLP40LCfa2HdslmvPIjU5pv/leaP5kyiOiZSZkl7azD+lWe/zrU//PKnhYiUHSS NFrgHcXu6XLR6QngYJ9xHx+xfv2jDhuk4TgClBnSSou8uc1b/ffpgKtwaagYng008PjQ FC0aC+BIvJRRjT9jvEbPPivQDOMeaV2PLTRZSEP+hNViLb+Z9ITyYpMhN46e+oZKzHSR qFLape95e4oTkYJ83nHN4f+NO80j5Mcoi5A9EzPiGUwChKWWE4v7vP3VRK29ke2wQb50 EPcVV0Hv0Cjcw80ieGSl2WwU/VXOmHaB3VUuEW8rLEdliTnI53/e9jjNkbsF53R1r1gf eN1A== X-Gm-Message-State: APjAAAUGDmXu62xZx6hdiyDyE0u6vAgJbF0euikRQcAfFCB24cQELiwO 8Www5G9e0aCE90o4NOHX90PMWrvmDGvDxWuk82qspA== X-Google-Smtp-Source: APXvYqxgOgqv3ymmNMR4Qi8eZqfarlt0GXyxGWemoSfAqaNgr6Z1q23zsOW9tFkDz0e186QlkX0vwUXqICR3gNOvyhQ= X-Received: by 2002:a9d:68cc:: with SMTP id i12mr6744052oto.207.1579280282992; Fri, 17 Jan 2020 08:58:02 -0800 (PST) MIME-Version: 1.0 References: <20200117105759.27905-1-david@redhat.com> <20200117113353.GT19428@dhcp22.suse.cz> <20200117145233.GB19428@dhcp22.suse.cz> <65606e2e-1cf7-de3b-10b1-33653cb41a52@redhat.com> <20200117152947.GK19428@dhcp22.suse.cz> <25a94f61-46a1-59a6-6b54-8cc6b35790d2@redhat.com> In-Reply-To: <25a94f61-46a1-59a6-6b54-8cc6b35790d2@redhat.com> From: Dan Williams Date: Fri, 17 Jan 2020 08:57:51 -0800 Message-ID: Subject: Re: [PATCH RFC v1] mm: is_mem_section_removable() overhaul To: David Hildenbrand Cc: Michal Hocko , Linux Kernel Mailing List , Linux MM , Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman , Greg Kroah-Hartman , "Rafael J. Wysocki" , Andrew Morton , Leonardo Bras , Nathan Lynch , Allison Randal , Nathan Fontenot , Thomas Gleixner , Stephen Rothwell , Anshuman Khandual , lantianyu1986@gmail.com, linuxppc-dev Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Jan 17, 2020 at 8:11 AM David Hildenbrand wrote: > > On 17.01.20 16:54, Dan Williams wrote: > > On Fri, Jan 17, 2020 at 7:30 AM Michal Hocko wrote: > >> > >> On Fri 17-01-20 15:58:26, David Hildenbrand wrote: > >>> On 17.01.20 15:52, Michal Hocko wrote: > >>>> On Fri 17-01-20 14:08:06, David Hildenbrand wrote: > >>>>> On 17.01.20 12:33, Michal Hocko wrote: > >>>>>> On Fri 17-01-20 11:57:59, David Hildenbrand wrote: > >>>>>>> Let's refactor that code. We want to check if we can offline memory > >>>>>>> blocks. Add a new function is_mem_section_offlineable() for that and > >>>>>>> make it call is_mem_section_offlineable() for each contained section. > >>>>>>> Within is_mem_section_offlineable(), add some more sanity checks and > >>>>>>> directly bail out if the section contains holes or if it spans multiple > >>>>>>> zones. > >>>>>> > >>>>>> I didn't read the patch (yet) but I am wondering. If we want to touch > >>>>>> this code, can we simply always return true there? I mean whoever > >>>>>> depends on this check is racy and the failure can happen even after > >>>>>> the sysfs says good to go, right? The check is essentially as expensive > >>>>>> as calling the offlining code itself. So the only usecase I can think of > >>>>>> is a dumb driver to crawl over blocks and check which is removable and > >>>>>> try to hotremove it. But just trying to offline one block after another > >>>>>> is essentially going to achieve the same. > >>>>> > >>>>> Some thoughts: > >>>>> > >>>>> 1. It allows you to check if memory is likely to be offlineable without > >>>>> doing expensive locking and trying to isolate pages (meaning: > >>>>> zone->lock, mem_hotplug_lock. but also, calling drain_all_pages() > >>>>> when isolating) > >>>>> > >>>>> 2. There are use cases that want to identify a memory block/DIMM to > >>>>> unplug. One example is PPC DLPAR code (see this patch). Going over all > >>>>> memory block trying to offline them is an expensive operation. > >>>>> > >>>>> 3. powerpc-utils (https://github.com/ibm-power-utilities/powerpc-utils) > >>>>> makes use of /sys/.../removable to speed up the search AFAIK. > >>>> > >>>> Well, while I do see those points I am not really sure they are worth > >>>> having a broken (by-definition) interface. > >>> > >>> It's a pure speedup. And for that, the interface has been working > >>> perfectly fine for years? > >>> > >>>> > >>>>> 4. lsmem displays/groups by "removable". > >>>> > >>>> Is anybody really using that? > >>> > >>> Well at least I am using that when testing to identify which > >>> (ZONE_NORMAL!) block I can easily offline/re-online (e.g., to validate > >>> all the zone shrinking stuff I have been fixing) > >>> > >>> So there is at least one user ;) > >> > >> Fair enough. But I would argue that there are better ways to do the same > >> solely for testing purposes. Rather than having a subtly broken code to > >> maintain. > >> > >>>> > >>>>>> Or does anybody see any reasonable usecase that would break if we did > >>>>>> that unconditional behavior? > >>>>> > >>>>> If we would return always "true", then the whole reason the > >>>>> interface originally was introduced would be "broken" (meaning, less > >>>>> performant as you would try to offline any memory block). > >>>> > >>>> I would argue that the whole interface is broken ;). Not the first time > >>>> in the kernel development history and not the last time either. What I > >>>> am trying to say here is that unless there are _real_ usecases depending > >>>> on knowing that something surely is _not_ offlineable then I would just > >>>> try to drop the functionality while preserving the interface and see > >>>> what happens. > >>> > >>> I can see that, but I can perfectly well understand why - especially > >>> powerpc - wants a fast way to sense which blocks actually sense to try > >>> to online. > >>> > >>> The original patch correctly states > >>> "which sections of > >>> memory are likely to be removable before attempting the potentially > >>> expensive operation." > >>> > >>> It works as designed I would say. > >> > >> Then I would just keep it crippled the same way it has been for years > >> without anybody noticing. > > > > I tend to agree. At least the kmem driver that wants to unplug memory > > could not use an interface that does not give stable answers. It just > > relies on remove_memory() to return a definitive error. > > > > Just because kmem cannot reuse such an interface doesn't mean we should > not touch it (or I am not getting your point). Especially, this > interface is about "can it be likely be offlined and then eventually be > removed (if there is a HW interface for that)" (as documented), not > about "will remove_memory()" work. Unless the user is willing to hold the device_hotplug_lock over the evaluation then the result is unreliable. For example the changes to removable_show() are no better than they were previously because the result is invalidated as soon as the lock is dropped. > We do have users and if we agree to keep it (what I think we should as I > expressed) then I think we should un-cripple and fix it. After all we > have to maintain it. The current interface provides what was documented > - "likely to be offlineable". (the chosen name was just horribly bad - > as I expressed a while ago already :) ) Are there users that misbehave because they try to offline a section with holes? I brought up kmem because it's an unplug user that does not care whether the failure was due to pinned pages or holes in sections. Do others care about that precision in a meaningful way?