References: <20200430102908.10107-1-david@redhat.com>
 <20200430102908.10107-3-david@redhat.com>
 <87pnbp2dcz.fsf@x220.int.ebiederm.org>
 <1b49c3be-6e2f-57cb-96f7-f66a8f8a9380@redhat.com>
 <871ro52ary.fsf@x220.int.ebiederm.org>
 <373a6898-4020-4af1-5b3d-f827d705dd77@redhat.com>
 <875zdg26hp.fsf@x220.int.ebiederm.org>
From: Dan Williams
Date: Thu, 30 Apr 2020 11:58:16 -0700
Subject: Re: [PATCH v2 2/3] mm/memory_hotplug: Introduce MHP_NO_FIRMWARE_MEMMAP
To: David Hildenbrand
Cc: "Eric W. Biederman", Andrew Morton, Linux Kernel Mailing List, Linux MM,
 virtio-dev@lists.oasis-open.org, virtualization@lists.linux-foundation.org,
 linuxppc-dev, Linux ACPI, linux-nvdimm, linux-hyperv@vger.kernel.org,
 linux-s390, xen-devel, Michal Hocko, "Michael S .
 Tsirkin", Michal Hocko, Pankaj Gupta, Wei Yang, Baoquan He

On Thu, Apr 30, 2020 at 11:44 AM David Hildenbrand wrote:
>
> >>> If the class of memory is different then please by all means let's mark
> >>> it differently in struct resource so everyone knows it is different.
> >>> But that difference needs to be more than hotplug.
> >>>
> >>> That difference needs to be the hypervisor loaned us memory and might
> >>> take it back at any time, or this memory is persistent and so it has
> >>> these different characteristics so don't use it as ordinary ram.
> >>
> >> Yes, and I think kmem took an excellent approach of explicitly putting
> >> that "System RAM" into a resource hierarchy. That "System RAM" won't
> >> show up as a root node under /proc/iomem (see patch #3), which already
> >> results in kexec-tools to treat it in a special way. I am thinking about
> >> doing the same for virtio-mem.
> >
> > Reading this and your patch cover letters again my concern is that
> > the justification seems to be letting the tail wag the dog.
> >
> > You want kexec-tools to behave in a certain way so you are changing the
> > kernel.
> >
> > Rather it should be change the kernel to clearly reflect reality and if
> > you can get away without a change to kexec-tools that is a bonus.
> >
>
> Right, because user space has to have a way to figure out what to do.
>
> But talking about the firmware memmap, indicating something via a "raw
> firmware-provided memory map", that is not actually in the "raw
> firmware-provided memory map" feels wrong to me. (below)
>
>
> >>> That information is also useful to other people looking at the system
> >>> and seeing what is going on.
> >>>
> >>> Just please don't muddle the concepts, or assume that whatever subset of
> >>> hotplug memory you are dealing with is the only subset.
> >>
> >> I can certainly rephrase the subject/description/comment, stating that
> >> this is not to be used for ordinary hotplugged DIMMs - only when the
> >> device driver is under control to decide what to do with that memory -
> >> especially when kexec'ing.
> >>
> >> (previously, I called this flag MHP_DRIVER_MANAGED, but I think
> >> MHP_NO_FIRMWARE_MEMMAP is clearer, we just need a better description)
> >>
> >> Would that make it clearer?
> >
> > I am not certain, but Andrew Morton deliberately added that
> > firmware_map_add_hotplug call. Which means that there is a reason
> > for putting hotplugged memory in the firmware map.
> >
> > So the justification needs to take that reason into account. The
> > justification can not be it is hotplugged therefore it should not belong
> > in the firmware memory map. Unless you can show that
> > firmware_map_add_hotplug that was actually a bug and should be removed.
> > But as it has been that way since 2010 that seems like a long shot.
> >
> > So my question is what is right for the firmware map?
>
> We have documentation for that since 2008. Andrew's patch is from 2010.
>
> Documentation/ABI/testing/sysfs-firmware-memmap
>
> It clearly talks about "raw firmware-provided memory map" and why the
> interface was introduced at all ("on most architectures that
> firmware-provided memory map is modified afterwards by the kernel itself").
>
>
> > Why does the firmware map support hotplug entries?
>
> I assume:
>
> The firmware memmap was added primarily for x86-64 kexec (and still, is
> mostly used on x86-64 only IIRC). There, we had ACPI hotplug. When DIMMs
> get hotplugged on real HW, they get added to e820. Same applies to
> memory added via HyperV balloon (unless memory is unplugged via
> ballooning and you reboot ... the e820 is changed as well). I assume
> we wanted to be able to reflect that, to make kexec look like a real reboot.

I can at least say that this breakdown makes sense to me.
Traditional memory hotplug results in a permanent change to the raw
firmware memory map reported by the host at the next reboot. These
device-driver-owned memory regions really want a hotplug policy per
kernel boot instance and should fall back to the default reserved state
at reboot (kexec or otherwise). When I say "hotplug policy" I mean
whether the current kernel wants to treat the device range as System RAM
or leave it as device-managed. The intent is that the follow-on kernel
needs to re-decide the device policy.

>
> This worked for a while. Then came dax/kmem. Now comes virtio-mem.
>