From: Christopher Lameter <cl@linux.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>, Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>, Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org,
	dm-devel@redhat.com, Mike Snitzer <msnitzer@redhat.com>
Subject: Re: [PATCH] slab: introduce the flag SLAB_MINIMIZE_WASTE
Date: Wed, 21 Mar 2018 13:40:31 -0500 (CDT)
Message-ID: <alpine.DEB.2.20.1803211335240.13978@nuc-kabylake> (raw)
In-Reply-To: <alpine.LRH.2.02.1803211406180.26409@file01.intranet.prod.int.rdu2.redhat.com>

On Wed, 21 Mar 2018, Mikulas Patocka wrote:

> > > F.e. you could optimize the allocations > 2x PAGE_SIZE so that they do not
> > > allocate powers of two pages. It would be relatively easy to make
> > > kmalloc_large round the allocation to the next page size and then allocate
> > > N consecutive pages via alloc_pages_exact() and free the remainder unused
> > > pages or some such thing.
>
> alloc_pages_exact() has O(n*log n) complexity with respect to the number
> of requested pages. It would have to be reworked and optimized if it were
> to be used for the dm-bufio cache. (It could be optimized down to O(log n)
> if it didn't split the compound page into a lot of separate pages, but
> split it into power-of-two clusters instead.)

Well then a memory pool of page allocator requests may address that
issue? Have a look at include/linux/mempool.h.

> > I don't know if that's a good idea. That will contribute to fragmentation
> > if the allocation is held onto for a short-to-medium length of time.
> > If the allocation is for a very long period of time then those pages
> > would have been unavailable anyway, but if the user of the tail pages
> > holds them beyond the lifetime of the large allocation, then this is
> > probably a bad tradeoff to make.

Fragmentation is sadly a big issue.
You could create a mempool on bootup, or early after boot, to ensure that
you have a sufficient number of contiguous pages available.

> The problem with alloc_pages_exact() is that it exhausts all the
> high-order pages and leaves many free low-order pages around. So you'll
> end up in a system with a lot of free memory, but with all high-order
> pages missing. As there would be a lot of free memory, the kswapd thread
> would not be woken up to free some high-order pages.

I think that logic is properly balanced and will take into account pages
that have been removed from the LRU expiration logic.

> I think that using slab with high order is better, because it at least
> doesn't leave many low-order pages behind.

Any request to the slab via kmalloc with a size > 2x page size will
simply lead to a page allocator request. You have the same issue. If you
want to rely on the slab allocator buffering large segments for you, then
a mempool will also solve the issue, and you have more control over the
pool.

> BTW. it could be possible to open the file
> "/sys/kernel/slab/<cache>/order" from the dm-bufio kernel driver and write
> the requested value there, but it seems very dirty. It would be better to
> have a kernel interface for that.

Hehehe, you could directly write to the kmem_cache structure and increase
the order. AFAICT this would be dirty but work. But the increased page
order will still get you into trouble with fragmentation when the system
runs for a long time. That is the reason we try to limit the allocation
sizes coming from the slab allocator.