linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: wei.w.wang@intel.com
Cc: virtio-dev@lists.oasis-open.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	virtualization <virtualization@lists.linux-foundation.org>,
	KVM list <kvm@vger.kernel.org>, linux-mm <linux-mm@kvack.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Michal Hocko <mhocko@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	liliang.opensource@gmail.com, yang.zhang.wz@gmail.com,
	quan.xu0@gmail.com, nilal@redhat.com,
	Rik van Riel <riel@redhat.com>,
	peterx@redhat.com
Subject: Re: [PATCH v35 1/5] mm: support to get hints of free page blocks
Date: Tue, 10 Jul 2018 18:44:34 -0700	[thread overview]
Message-ID: <CA+55aFzqj8wxXnHAdUTiOomipgFONVbqKMjL_tfk7e5ar1FziQ@mail.gmail.com> (raw)
In-Reply-To: <5B455D50.90902@intel.com>

On Tue, Jul 10, 2018 at 6:24 PM Wei Wang <wei.w.wang@intel.com> wrote:
>
> We only get addresses of the "MAX_ORDER-1" blocks into the array. The
> max size of the array that could be allocated by kmalloc is
> KMALLOC_MAX_SIZE (i.e. 4MB on x86). With that max array, we could load
> "4MB / sizeof(u64)" addresses of "MAX_ORDER-1" blocks, that is, 2TB free
> memory at most. We thought about removing that 2TB limitation by passing
> in multiple such max arrays (a list of them).

No.

Stop this already./

You're doing everthing wrong.

If the array has to describe *all* memory you will ever free, then you
have already lost.

Just do it in chunks.

I don't want the VM code to even fill in that big of an array anyway -
this all happens under the zone lock, and you're walking a list that
is bad for caching anyway.

So plan on an interface that allows _incremental_ freeing, because any
plan that starts with "I worry that maybe two TERABYTES of memory
isn't big enough" is so broken that it's laughable.

That was what I tried to encourage with actually removing the pages
form the page list. That would be an _incremental_ interface. You can
remove MAX_ORDER-1 pages one by one (or a hundred at a time), and mark
them free for ballooning that way. And if you still feel you have tons
of free memory, just continue removing more pages from the free list.

Notice? Incremental. Not "I want to have a crazy array that is enough
to hold 2TB at one time".

So here's the rule:

 - make it a simple array interface

 - make the array *small*. Not megabytes. Kilobytes. Because if you're
filling in megabytes worth of free pointers while holding the zone
lock, you're doing something wrong.

 - design the interface so that you do not *need* to have this crazy
"all or nothing" approach.

See what I'm trying to push for. Think "low latency". Think "small
arrays". Think "simple and straightforward interfaces".

At no point should you ever worry about "2TB". Never.

           Linus

  reply	other threads:[~2018-07-11  1:45 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-10  9:31 [PATCH v35 0/5] Virtio-balloon: support free page reporting Wei Wang
2018-07-10  9:31 ` [PATCH v35 1/5] mm: support to get hints of free page blocks Wei Wang
2018-07-10 10:16   ` Wang, Wei W
2018-07-10 17:33   ` Linus Torvalds
2018-07-11  1:28     ` Wei Wang
2018-07-11  1:44       ` Linus Torvalds [this message]
2018-07-11  9:21         ` Michal Hocko
2018-07-11 10:52           ` Wei Wang
2018-07-11 11:09             ` Michal Hocko
2018-07-11 13:55               ` Wang, Wei W
2018-07-11 14:38                 ` Michal Hocko
2018-07-11 19:36               ` Michael S. Tsirkin
2018-07-11 16:23           ` Linus Torvalds
2018-07-12  2:21             ` Wei Wang
2018-07-12  2:30               ` Linus Torvalds
2018-07-12  2:52                 ` Wei Wang
2018-07-12  8:13                   ` Michal Hocko
2018-07-12 11:34                     ` Wei Wang
2018-07-12 11:49                       ` Michal Hocko
2018-07-13  0:33                         ` Wei Wang
2018-07-12 13:12             ` Michal Hocko
2018-07-11  4:00     ` Michael S. Tsirkin
2018-07-11  4:04       ` Michael S. Tsirkin
2018-07-10  9:31 ` [PATCH v35 2/5] virtio-balloon: remove BUG() in init_vqs Wei Wang
2018-07-10  9:31 ` [PATCH v35 3/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT Wei Wang
2018-07-10  9:31 ` [PATCH v35 4/5] mm/page_poison: expose page_poisoning_enabled to kernel modules Wei Wang
2018-07-10  9:31 ` [PATCH v35 5/5] virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON Wei Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+55aFzqj8wxXnHAdUTiOomipgFONVbqKMjL_tfk7e5ar1FziQ@mail.gmail.com \
    --to=torvalds@linux-foundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=kvm@vger.kernel.org \
    --cc=liliang.opensource@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mst@redhat.com \
    --cc=nilal@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=quan.xu0@gmail.com \
    --cc=riel@redhat.com \
    --cc=virtio-dev@lists.oasis-open.org \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=wei.w.wang@intel.com \
    --cc=yang.zhang.wz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).