Date: Thu, 15 Oct 2020 09:08:52 -0400
From: "Michael S. Tsirkin"
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, virtualization@lists.linux-foundation.org, Andrew Morton, Jason Wang, Pankaj Gupta, Michal Hocko, Oscar Salvador, Wei Yang
Subject: Re: [PATCH v1 27/29] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block
Message-ID: <20201015090815-mutt-send-email-mst@kernel.org>
References: <20201012125323.17509-1-david@redhat.com> <20201012125323.17509-28-david@redhat.com>
In-Reply-To: <20201012125323.17509-28-david@redhat.com>

On Mon, Oct 12, 2020 at 02:53:21PM +0200, David Hildenbrand wrote:
> virtio-mem soon wants to use offline_and_remove_memory() memory that
> exceeds a single Linux memory block (memory_block_size_bytes()). Let's
> remove that restriction.
>
> Let's remember the old state and try to restore that if anything goes
> wrong. While re-onlining can, in general, fail, it's highly unlikely to
> happen (usually only when a notifier fails to allocate memory, and these
> are rather rare).
>
> This will be used by virtio-mem to offline+remove memory ranges that are
> bigger than a single memory block - for example, with a device block
> size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
> block size of 128MB.
>
> While we could compress the state into 2 bit, using 8 bit is much
> easier.
>
> This handling is similar, but different to acpi_scan_try_to_offline():
>
> a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
> optimization is still relevant - it should only apply to ZONE_NORMAL
> (where we have no guarantees). If relevant, we can always add it.
>
> b) acpi_scan_try_to_offline() simply onlines all memory in case
> something goes wrong. It doesn't restore previous online type. Let's do
> that, so we won't overwrite what e.g., user space configured.
>
> Cc: "Michael S. Tsirkin"
> Cc: Jason Wang
> Cc: Pankaj Gupta
> Cc: Michal Hocko
> Cc: Oscar Salvador
> Cc: Wei Yang
> Cc: Andrew Morton
> Signed-off-by: David Hildenbrand

Could I get some acks from mm folks for this one?
The rest can go in through my tree I guess ...

Andrew?

Thanks!

> ---
>  mm/memory_hotplug.c | 105 +++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 89 insertions(+), 16 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index b44d4c7ba73b..217080ca93e5 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1806,39 +1806,112 @@ int remove_memory(int nid, u64 start, u64 size)
>  }
>  EXPORT_SYMBOL_GPL(remove_memory);
>
> +static int try_offline_memory_block(struct memory_block *mem, void *arg)
> +{
> +	uint8_t online_type = MMOP_ONLINE_KERNEL;
> +	uint8_t **online_types = arg;
> +	struct page *page;
> +	int rc;
> +
> +	/*
> +	 * Sense the online_type via the zone of the memory block. Offlining
> +	 * with multiple zones within one memory block will be rejected
> +	 * by offlining code ... so we don't care about that.
> +	 */
> +	page = pfn_to_online_page(section_nr_to_pfn(mem->start_section_nr));
> +	if (page && zone_idx(page_zone(page)) == ZONE_MOVABLE)
> +		online_type = MMOP_ONLINE_MOVABLE;
> +
> +	rc = device_offline(&mem->dev);
> +	/*
> +	 * Default is MMOP_OFFLINE - change it only if offlining succeeded,
> +	 * so try_reonline_memory_block() can do the right thing.
> +	 */
> +	if (!rc)
> +		**online_types = online_type;
> +
> +	(*online_types)++;
> +	/* Ignore if already offline. */
> +	return rc < 0 ? rc : 0;
> +}
> +
> +static int try_reonline_memory_block(struct memory_block *mem, void *arg)
> +{
> +	uint8_t **online_types = arg;
> +	int rc;
> +
> +	if (**online_types != MMOP_OFFLINE) {
> +		mem->online_type = **online_types;
> +		rc = device_online(&mem->dev);
> +		if (rc < 0)
> +			pr_warn("%s: Failed to re-online memory: %d",
> +				__func__, rc);
> +	}
> +
> +	/* Continue processing all remaining memory blocks. */
> +	(*online_types)++;
> +	return 0;
> +}
> +
>  /*
> - * Try to offline and remove a memory block. Might take a long time to
> - * finish in case memory is still in use. Primarily useful for memory devices
> - * that logically unplugged all memory (so it's no longer in use) and want to
> - * offline + remove the memory block.
> + * Try to offline and remove memory. Might take a long time to finish in case
> + * memory is still in use. Primarily useful for memory devices that logically
> + * unplugged all memory (so it's no longer in use) and want to offline + remove
> + * that memory.
>   */
>  int offline_and_remove_memory(int nid, u64 start, u64 size)
>  {
> -	struct memory_block *mem;
> -	int rc = -EINVAL;
> +	const unsigned long mb_count = size / memory_block_size_bytes();
> +	uint8_t *online_types, *tmp;
> +	int rc;
>
>  	if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
> -	    size != memory_block_size_bytes())
> -		return rc;
> +	    !IS_ALIGNED(size, memory_block_size_bytes()) || !size)
> +		return -EINVAL;
> +
> +	/*
> +	 * We'll remember the old online type of each memory block, so we can
> +	 * try to revert whatever we did when offlining one memory block fails
> +	 * after offlining some others succeeded.
> +	 */
> +	online_types = kmalloc_array(mb_count, sizeof(*online_types),
> +				     GFP_KERNEL);
> +	if (!online_types)
> +		return -ENOMEM;
> +	/*
> +	 * Initialize all states to MMOP_OFFLINE, so when we abort processing in
> +	 * try_offline_memory_block(), we'll skip all unprocessed blocks in
> +	 * try_reonline_memory_block().
> +	 */
> +	memset(online_types, MMOP_OFFLINE, mb_count);
>
>  	lock_device_hotplug();
> -	mem = find_memory_block(__pfn_to_section(PFN_DOWN(start)));
> -	if (mem)
> -		rc = device_offline(&mem->dev);
> -	/* Ignore if the device is already offline. */
> -	if (rc > 0)
> -		rc = 0;
> +
> +	tmp = online_types;
> +	rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block);
>
>  	/*
> -	 * In case we succeeded to offline the memory block, remove it.
> +	 * In case we succeeded to offline all memory, remove it.
>  	 * This cannot fail as it cannot get onlined in the meantime.
>  	 */
>  	if (!rc) {
>  		rc = try_remove_memory(nid, start, size);
> -		WARN_ON_ONCE(rc);
> +		if (rc)
> +			pr_err("%s: Failed to remove memory: %d", __func__, rc);
> +	}
> +
> +	/*
> +	 * Rollback what we did. While memory onlining might theoretically fail
> +	 * (nacked by a notifier), it barely ever happens.
> +	 */
> +	if (rc) {
> +		tmp = online_types;
> +		walk_memory_blocks(start, size, &tmp,
> +				   try_reonline_memory_block);
>  	}
>  	unlock_device_hotplug();
>
> +	kfree(online_types);
>  	return rc;
>  }
>  EXPORT_SYMBOL_GPL(offline_and_remove_memory);
> --
> 2.26.2
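
For readers following the thread, the "remember the old state, restore on failure" idea in the quoted patch can be tried out in user space. What follows is only a rough sketch under stated assumptions, not the kernel code: every name in it (sim_block, sim_offline(), sim_offline_all(), sim_block_count(), the SIM_* constants) is hypothetical, the numeric error values merely stand in for -EBUSY, -ENOMEM and -EINVAL, and re-onlining is modeled as something that cannot fail, which the patch explicitly does not assume.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's MMOP_* online types. */
enum { SIM_OFFLINE = 0, SIM_ONLINE_KERNEL = 1, SIM_ONLINE_MOVABLE = 2 };

/* Hypothetical model of one memory block: its current online type plus a
 * flag that makes offlining fail (think: memory still in use). */
struct sim_block {
	uint8_t state;
	int busy;
};

/* Number of memory blocks covered by a range, as in the patch's mb_count. */
static unsigned long sim_block_count(uint64_t size, uint64_t block_size)
{
	return (unsigned long)(size / block_size);
}

/* Offline one block: 0 on success, 1 if it was already offline, negative
 * on failure. This mirrors how the patch interprets the return code. */
static int sim_offline(struct sim_block *b)
{
	if (b->state == SIM_OFFLINE)
		return 1;
	if (b->busy)
		return -16; /* stands in for -EBUSY */
	b->state = SIM_OFFLINE;
	return 0;
}

/* Offline all blocks or none: record each block's previous online type
 * and restore it if a later block cannot be offlined. */
static int sim_offline_all(struct sim_block *blocks, size_t count)
{
	uint8_t *saved;
	size_t i;
	int rc = 0;

	assert(blocks != NULL);
	if (!count)
		return -22; /* stands in for -EINVAL */
	saved = malloc(count);
	if (!saved)
		return -12; /* stands in for -ENOMEM */
	/* SIM_OFFLINE means "nothing to undo" for blocks we never changed. */
	memset(saved, SIM_OFFLINE, count);

	for (i = 0; i < count; i++) {
		uint8_t old = blocks[i].state;

		rc = sim_offline(&blocks[i]);
		if (rc < 0)
			break;
		if (rc == 0)
			saved[i] = old; /* we offlined it, remember how it was */
		rc = 0; /* already offline is not an error */
	}

	if (rc) {
		/* Roll back only the blocks this call took offline. */
		for (i = 0; i < count; i++)
			if (saved[i] != SIM_OFFLINE)
				blocks[i].state = saved[i];
	}
	free(saved);
	return rc;
}
```

With the sizes from the commit message, sim_block_count() for a 1 GiB device block and a 128 MiB memory block gives 8 blocks, so one byte of state per block costs 8 bytes here. Blocks that were already offline keep SIM_OFFLINE in the saved array and are therefore skipped during rollback, which is the property the patch relies on to avoid overwriting what user space configured.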