* [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
@ 2018-05-21 15:20 Huaisheng Ye
  0 siblings, 0 replies; 24+ messages in thread
From: Huaisheng Ye @ 2018-05-21 15:20 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: kstewart, mhocko, Huaisheng Ye, hehy1, gregkh, linux-kernel,
	willy, alexander.levin, iommu, linux-btrfs, chengnt, xen-devel,
	colyli, mgorman, vbabka

From: Huaisheng Ye <yehs1@lenovo.com>

Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with an encoded zone number.

Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from the GFP bitmasks;
the bottom three bits of the GFP mask are reserved for storing the encoded
zone number.

The encoding method is XOR. Take the zone number from enum zone_type,
then XOR it with ZONE_NORMAL. The goal is to make sure ZONE_NORMAL
encodes to zero, so that compatibility is preserved: GFP_KERNEL and
GFP_ATOMIC, for example, can be used as before.

Reserve __GFP_MOVABLE in bit 3, so that it can continue to be used as
a flag. As before, __GFP_MOVABLE represents the movable migrate type
for ZONE_DMA, ZONE_DMA32, and ZONE_NORMAL. But when it is combined with
__GFP_HIGHMEM, ZONE_MOVABLE shall be returned instead of ZONE_HIGHMEM.
__GFP_ZONE_MOVABLE is created to realize this.

With this patch, just enabling __GFP_MOVABLE and __GFP_HIGHMEM is not
enough to get ZONE_MOVABLE from gfp_zone. All callers should use
GFP_HIGHUSER_MOVABLE or __GFP_ZONE_MOVABLE directly to achieve that.

Decode the zone number directly from the bottom three bits of the flags
in gfp_zone. Encoding and decoding rely on the identity
        A ^ B ^ B = A
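
The encode/decode idea described above can be sketched in userspace C.
This is only an illustration, not the code from the patch: the enum
values, the SKETCH_* names and the helper functions are assumptions made
for the example, and the real enum zone_type numbering depends on the
kernel configuration.

```c
#include <assert.h>

/* Assumed numbering for illustration only; the real enum zone_type
 * depends on the kernel configuration. */
enum sketch_zone {
	SKETCH_ZONE_DMA,      /* 0 */
	SKETCH_ZONE_DMA32,    /* 1 */
	SKETCH_ZONE_NORMAL,   /* 2 */
	SKETCH_ZONE_HIGHMEM,  /* 3 */
	SKETCH_ZONE_MOVABLE   /* 4 */
};

/* The lowest three bits of the flags word hold the encoded zone. */
#define SKETCH_ZONE_MASK 0x7u

/* Encode: XOR with ZONE_NORMAL, so ZONE_NORMAL itself becomes 0. */
static unsigned int sketch_encode_zone(enum sketch_zone zone)
{
	return ((unsigned int)zone ^ SKETCH_ZONE_NORMAL) & SKETCH_ZONE_MASK;
}

/* Decode: mask off the other flag bits, then XOR with ZONE_NORMAL again. */
static enum sketch_zone sketch_decode_zone(unsigned int gfp_flags)
{
	return (enum sketch_zone)((gfp_flags & SKETCH_ZONE_MASK) ^
				  SKETCH_ZONE_NORMAL);
}

/* Every zone must survive an encode/decode round trip (A ^ B ^ B = A). */
static void sketch_check_roundtrip(void)
{
	int z;

	for (z = SKETCH_ZONE_DMA; z <= SKETCH_ZONE_MOVABLE; z++)
		assert(sketch_decode_zone(sketch_encode_zone((enum sketch_zone)z)) ==
		       (enum sketch_zone)z);
}
```

With this assumed numbering, a flags word whose low three bits are zero
decodes to the normal zone, which is why masks that never set a zone
modifier keep their meaning.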

Changes since v1,

v2: Add __GFP_ZONE_MOVABLE and modify GFP_HIGHUSER_MOVABLE to help
callers get ZONE_MOVABLE. Add __GFP_ZONE_MASK to mask the lowest 3
bits of the GFP bitmasks.
Modify some callers' gfp flags to update their usage of address zone
modifiers.
Modify the inline function gfp_zone to get better performance, following
Matthew's suggestion.

Link: https://marc.info/?l=linux-mm&m=152596791931266&w=2

Huaisheng Ye (12):
  include/linux/gfp.h: get rid of GFP_ZONE_TABLE/BAD
  arch/x86/kernel/amd_gart_64: update usage of address zone modifiers
  arch/x86/kernel/pci-calgary_64: update usage of address zone modifiers
  drivers/iommu/amd_iommu: update usage of address zone modifiers
  include/linux/dma-mapping: update usage of address zone modifiers
  drivers/xen/swiotlb-xen: update usage of address zone modifiers
  fs/btrfs/extent_io: update usage of address zone modifiers
  drivers/block/zram/zram_drv: update usage of address zone modifiers
  mm/vmpressure: update usage of address zone modifiers
  mm/zsmalloc: update usage of address zone modifiers
  include/linux/highmem: update usage of movableflags
  arch/x86/include/asm/page.h: update usage of movableflags

 arch/x86/include/asm/page.h      |  3 +-
 arch/x86/kernel/amd_gart_64.c    |  2 +-
 arch/x86/kernel/pci-calgary_64.c |  2 +-
 drivers/block/zram/zram_drv.c    |  6 +--
 drivers/iommu/amd_iommu.c        |  2 +-
 drivers/xen/swiotlb-xen.c        |  2 +-
 fs/btrfs/extent_io.c             |  2 +-
 include/linux/dma-mapping.h      |  2 +-
 include/linux/gfp.h              | 98 +++++-----------------------------------
 include/linux/highmem.h          |  4 +-
 mm/vmpressure.c                  |  2 +-
 mm/zsmalloc.c                    |  4 +-
 12 files changed, 26 insertions(+), 103 deletions(-)

-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
  2018-05-25 12:00           ` Matthew Wilcox
@ 2018-05-28 13:33               ` Michal Hocko
  0 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2018-05-28 13:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Huaisheng Ye, akpm, linux-mm, vbabka, mgorman, kstewart,
	alexander.levin, gregkh, colyli, chengnt, hehy1, linux-kernel,
	iommu, xen-devel, linux-btrfs, Huaisheng Ye

On Fri 25-05-18 05:00:44, Matthew Wilcox wrote:
> On Thu, May 24, 2018 at 05:29:43PM +0200, Michal Hocko wrote:
> > > ie if we had more,
> > > could we solve our pain by making them more generic?
> > 
> > Well, if you have more you will consume more bits in the struct pages,
> > right?
> 
> Not necessarily ... the zone number is stored in the struct page
> currently, so either two or three bits are used right now.  In my
> proposal, one can infer the zone of a page from its PFN, except for
> ZONE_MOVABLE.  So we could trim down to just one bit per struct page
> for 32-bit machines while using 3 bits on 64-bit machines, where there
> is plenty of space.

Just be warned that page_zone is called from many hot paths. I am not
sure adding something more complex there is going to fly.

> > > it more-or-less sucks that the devices with 28-bit DMA limits are forced
> > > to allocate from the low 16MB when they're perfectly capable of using the
> > > low 256MB.
> > 
> > Do we actually care all that much about those? If yes then we should
> > probably follow the ZONE_DMA (x86) path and use a CMA region for them.
> > I mean most devices should be good with very limited addressability or
> > below 4G, no?
> 
> Sure.  One other thing I meant to mention was the media devices
> (TV capture cards and so on) which want a vmalloc_32() allocation.
> On 32-bit machines right now, we allocate from LOWMEM, when we really
> should be allocating from the 1GB-4GB region.  32-bit machines generally
> don't have a ZONE_DMA32 today.

Well, _I_ think that vmalloc on 32b is just a lost case...

-- 
Michal Hocko
SUSE Labs

* Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
  2018-05-24 15:29         ` Michal Hocko
  2018-05-25 12:00           ` Matthew Wilcox
@ 2018-05-25 12:00           ` Matthew Wilcox
  2018-05-28 13:33               ` Michal Hocko
  1 sibling, 1 reply; 24+ messages in thread
From: Matthew Wilcox @ 2018-05-25 12:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Huaisheng Ye, akpm, linux-mm, vbabka, mgorman, kstewart,
	alexander.levin, gregkh, colyli, chengnt, hehy1, linux-kernel,
	iommu, xen-devel, linux-btrfs, Huaisheng Ye

On Thu, May 24, 2018 at 05:29:43PM +0200, Michal Hocko wrote:
> > ie if we had more,
> > could we solve our pain by making them more generic?
> 
> Well, if you have more you will consume more bits in the struct pages,
> right?

Not necessarily ... the zone number is stored in the struct page
currently, so either two or three bits are used right now.  In my
proposal, one can infer the zone of a page from its PFN, except for
ZONE_MOVABLE.  So we could trim down to just one bit per struct page
for 32-bit machines while using 3 bits on 64-bit machines, where there
is plenty of space.
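
As a rough illustration of inferring a zone from a PFN, a lookup could
compare the PFN against per-zone boundaries and keep a single bit for the
movable case. Everything below is a hypothetical sketch: the zone set,
the boundary values (4 KiB pages assumed) and the sk_* names are made up
for the example and are not taken from any posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical zone set; address-based zones come first, in address order. */
enum sk_zone { SK_DMA, SK_DMA32, SK_NORMAL, SK_MOVABLE };

/* Assumed exclusive upper PFN bound of the bounded zones (4 KiB pages). */
static const uint64_t sk_zone_end_pfn[] = {
	(1ULL << 24) >> 12,	/* SK_DMA: physical addresses below 16 MB */
	(1ULL << 32) >> 12,	/* SK_DMA32: physical addresses below 4 GB */
};

#define SK_NR_BOUNDED (sizeof(sk_zone_end_pfn) / sizeof(sk_zone_end_pfn[0]))

/*
 * Infer the zone from the PFN alone. Only the movable case needs a
 * per-page bit, because a movable page has no address guarantee.
 */
static enum sk_zone sk_page_zone(uint64_t pfn, bool movable_bit)
{
	size_t i;

	if (movable_bit)
		return SK_MOVABLE;
	for (i = 0; i < SK_NR_BOUNDED; i++)
		if (pfn < sk_zone_end_pfn[i])
			return (enum sk_zone)i;
	return SK_NORMAL;
}

/* Sanity check: the boundary table must be sorted for the loop to work. */
static void sk_check_bounds(void)
{
	assert(sk_zone_end_pfn[0] < sk_zone_end_pfn[1]);
}
```

The comparison loop is more work than reading a bitfield out of the page
flags, so the cost of such a lookup would need measuring before relying
on it.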

> > it more-or-less sucks that the devices with 28-bit DMA limits are forced
> > to allocate from the low 16MB when they're perfectly capable of using the
> > low 256MB.
> 
> Do we actually care all that much about those? If yes then we should
> probably follow the ZONE_DMA (x86) path and use a CMA region for them.
> I mean most devices should be good with very limited addressability or
> below 4G, no?

Sure.  One other thing I meant to mention was the media devices
(TV capture cards and so on) which want a vmalloc_32() allocation.
On 32-bit machines right now, we allocate from LOWMEM, when we really
should be allocating from the 1GB-4GB region.  32-bit machines generally
don't have a ZONE_DMA32 today.

* Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
  2018-05-24 15:18       ` Matthew Wilcox
  2018-05-24 15:29         ` Michal Hocko
@ 2018-05-24 15:29         ` Michal Hocko
  2018-05-25 12:00           ` Matthew Wilcox
  2018-05-25 12:00           ` Matthew Wilcox
  1 sibling, 2 replies; 24+ messages in thread
From: Michal Hocko @ 2018-05-24 15:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Huaisheng Ye, akpm, linux-mm, vbabka, mgorman, kstewart,
	alexander.levin, gregkh, colyli, chengnt, hehy1, linux-kernel,
	iommu, xen-devel, linux-btrfs, Huaisheng Ye

On Thu 24-05-18 08:18:18, Matthew Wilcox wrote:
> On Thu, May 24, 2018 at 02:23:23PM +0200, Michal Hocko wrote:
> > > If we had eight ZONEs, we could offer:
> > 
> > No, please no more zones. What we have is quite a maint. burden on its
> > own. Ideally we should only have lowmem, highmem and special/device
> > zones for directly kernel accessible memory, the one that the kernel
> > cannot or must not use and completely special memory managed out of
> > the page allocator. All the remaining constraints should better be
> > implemented on top.
> 
> I believe you when you say that they're a maintenance pain.  Is that
> maintenance pain because they're so specialised?

Well, it used to be LRU balancing which is gone with the node reclaim
but that brings new challenges. Now as you say their meaning is not
really clear to users and that leads to bugs left and right.

> ie if we had more,
> could we solve our pain by making them more generic?

Well, if you have more you will consume more bits in the struct pages,
right?

[...]

> > But those already do have a proper API, IIUC. So do we really need to
> > make our GFP_*/Zone API more complicated than it already is?
> 
> I don't want to change the driver API (setting the DMA mask, etc),
> but we don't actually have a good API to the page allocator for the
> implementation of dma_alloc_foo() to request pages.  More or less,
> architectures do:
> 
> 	if (mask < 4GB)
> 		alloc_page(GFP_DMA)
> 	else if (mask < 64EB)
> 		alloc_page(GFP_DMA32)
> 	else
> 		alloc_page(GFP_HIGHMEM)
> 
> it more-or-less sucks that the devices with 28-bit DMA limits are forced
> to allocate from the low 16MB when they're perfectly capable of using the
> low 256MB.

Do we actually care all that much about those? If yes then we should
probably follow the ZONE_DMA (x86) path and use a CMA region for them.
I mean most devices should be good with very limited addressability or
below 4G, no?
-- 
Michal Hocko
SUSE Labs

* Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
  2018-05-24 12:23       ` Michal Hocko
  (?)
  (?)
@ 2018-05-24 15:18       ` Matthew Wilcox
  2018-05-24 15:29         ` Michal Hocko
  2018-05-24 15:29         ` Michal Hocko
  -1 siblings, 2 replies; 24+ messages in thread
From: Matthew Wilcox @ 2018-05-24 15:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Huaisheng Ye, akpm, linux-mm, vbabka, mgorman, kstewart,
	alexander.levin, gregkh, colyli, chengnt, hehy1, linux-kernel,
	iommu, xen-devel, linux-btrfs, Huaisheng Ye

On Thu, May 24, 2018 at 02:23:23PM +0200, Michal Hocko wrote:
> > If we had eight ZONEs, we could offer:
> 
> No, please no more zones. What we have is quite a maint. burden on its
> own. Ideally we should only have lowmem, highmem and special/device
> zones for directly kernel accessible memory, the one that the kernel
> cannot or must not use and completely special memory managed out of
> the page allocator. All the remaining constraints should better be
> implemented on top.

I believe you when you say that they're a maintenance pain.  Is that
maintenance pain because they're so specialised?  ie if we had more,
could we solve our pain by making them more generic?

> > ZONE_16M	// 24 bit
> > ZONE_256M	// 28 bit
> > ZONE_LOWMEM	// CONFIG_32BIT only
> > ZONE_4G		// 32 bit
> > ZONE_64G	// 36 bit
> > ZONE_1T		// 40 bit
> > ZONE_ALL	// everything larger
> > ZONE_MOVABLE	// movable allocations; no physical address guarantees
> > 
> > #ifdef CONFIG_64BIT
> > #define ZONE_NORMAL	ZONE_ALL
> > #else
> > #define ZONE_NORMAL	ZONE_LOWMEM
> > #endif
> > 
> > This would cover most driver DMA mask allocations; we could tweak the
> > offered zones based on analysis of what people need.
> 
> But those already do have a proper API, IIUC. So do we really need to
> make our GFP_*/Zone API more complicated than it already is?

I don't want to change the driver API (setting the DMA mask, etc),
but we don't actually have a good API to the page allocator for the
implementation of dma_alloc_foo() to request pages.  More or less,
architectures do:

	if (mask < 4GB)
		alloc_page(GFP_DMA)
	else if (mask < 64EB)
		alloc_page(GFP_DMA32)
	else
		alloc_page(GFP_HIGHMEM)

it more-or-less sucks that the devices with 28-bit DMA limits are forced
to allocate from the low 16MB when they're perfectly capable of using the
low 256MB.  Sure, my proposal doesn't help 27 or 26 bit DMA mask devices,
but those are pretty rare.
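
To make the 28-bit case concrete, here is a small sketch of how an
allocator-side helper might map a DMA mask onto the zones listed in the
proposal. It is an illustration under assumptions, not existing kernel
code: the SKZ_* names and skz_zone_for_mask() are invented for the
example, ZONE_LOWMEM and ZONE_MOVABLE are left out, and only a full
64-bit mask is treated as able to use ZONE_ALL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical zones, in increasing address order, as in the proposal. */
enum skz_zone {
	SKZ_16M,	/* 24 bit */
	SKZ_256M,	/* 28 bit */
	SKZ_4G,		/* 32 bit */
	SKZ_64G,	/* 36 bit */
	SKZ_1T,		/* 40 bit */
	SKZ_ALL,	/* everything larger */
	SKZ_NONE	/* mask too small for any zone */
};

/* Address width covered by each bounded zone, indexed by enum value. */
static const unsigned int skz_bits[] = { 24, 28, 32, 36, 40 };

#define SKZ_NR_BOUNDED (sizeof(skz_bits) / sizeof(skz_bits[0]))

/* Pick the largest zone that the device can address in full. */
static enum skz_zone skz_zone_for_mask(uint64_t dma_mask)
{
	enum skz_zone best = SKZ_NONE;
	size_t i;

	if (dma_mask == UINT64_MAX)
		return SKZ_ALL;
	for (i = 0; i < SKZ_NR_BOUNDED; i++) {
		uint64_t top = ((uint64_t)1 << skz_bits[i]) - 1;

		if (dma_mask >= top)
			best = (enum skz_zone)i;
	}
	return best;
}

/* Sanity check: the width table must be sorted in ascending order. */
static void skz_check_table(void)
{
	size_t i;

	for (i = 1; i < SKZ_NR_BOUNDED; i++)
		assert(skz_bits[i - 1] < skz_bits[i]);
}
```

Under these assumptions a 28-bit mask lands in the 256MB zone rather than
the 16MB one, while a 27-bit mask still falls back to the 16MB zone,
which matches the limitation acknowledged above.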

I'm sure you don't need reminding what a mess vmalloc_32 is, and the
implementation of saa7146_vmalloc_build_pgtable() just hurts.

> > #define GFP_HIGHUSER		(GFP_USER | ZONE_ALL)
> > #define GFP_HIGHUSER_MOVABLE	(GFP_USER | ZONE_MOVABLE)
> > 
> > One other thing I want to see is that fallback from zones happens from
> > highest to lowest normally (ie if you fail to allocate in 1T, then you
> > try to allocate from 64G), but movable allocations happen from lowest
> > to highest.  So ZONE_16M ends up full of page cache pages which are
> > readily evictable for the rare occasions when we need to allocate memory
> > below 16MB.
> > 
> > I'm sure there are lots of good reasons why this won't work, which is
> > why I've been hesitant to propose it before now.
> 
> I am worried you are playing with a can of worms...

Yes.  Me too.

* Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
@ 2018-05-24 12:23       ` Michal Hocko
  0 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2018-05-24 12:23 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Huaisheng Ye, akpm, linux-mm, vbabka, mgorman, kstewart,
	alexander.levin, gregkh, colyli, chengnt, hehy1, linux-kernel,
	iommu, xen-devel, linux-btrfs, Huaisheng Ye

On Wed 23-05-18 22:19:19, Matthew Wilcox wrote:
> On Tue, May 22, 2018 at 08:37:28PM +0200, Michal Hocko wrote:
> > So why is this any better than the current code? Sure I am not a great
> > fan of GFP_ZONE_TABLE because of how it is incomprehensible but this
> > doesn't look too much better, yet we are losing a check for incompatible
> > gfp flags. The diffstat looks really sound but then you just look and
> > see that the large part is the comment that at least explained the gfp
> > zone modifiers somehow and the debugging code. So what is the selling
> > point?
> 
> I have a plan, but it's not exactly fully-formed yet.
> 
> One of the big problems we have today is that we have a lot of users
> who have constraints on the physical memory they want to allocate,
> but we have very limited abilities to provide them with what they're
> asking for.  The various different ZONEs have different meanings on
> different architectures and are generally a mess.

Agreed.

> If we had eight ZONEs, we could offer:

No, please no more zones. What we have is quite a maint. burden on its
own. Ideally we should only have lowmem, highmem and special/device
zones for directly kernel accessible memory, the one that the kernel
cannot or must not use and completely special memory managed out of
the page allocator. All the remaining constraints should better be
implemented on top.

> ZONE_16M	// 24 bit
> ZONE_256M	// 28 bit
> ZONE_LOWMEM	// CONFIG_32BIT only
> ZONE_4G		// 32 bit
> ZONE_64G	// 36 bit
> ZONE_1T		// 40 bit
> ZONE_ALL	// everything larger
> ZONE_MOVABLE	// movable allocations; no physical address guarantees
> 
> #ifdef CONFIG_64BIT
> #define ZONE_NORMAL	ZONE_ALL
> #else
> #define ZONE_NORMAL	ZONE_LOWMEM
> #endif
> 
> This would cover most driver DMA mask allocations; we could tweak the
> offered zones based on analysis of what people need.

But those already do have a proper API, IIUC. So do we really need to
make our GFP_*/Zone API more complicated than it already is?

> #define GFP_HIGHUSER		(GFP_USER | ZONE_ALL)
> #define GFP_HIGHUSER_MOVABLE	(GFP_USER | ZONE_MOVABLE)
> 
> One other thing I want to see is that fallback from zones happens from
> highest to lowest normally (ie if you fail to allocate in 1T, then you
> try to allocate from 64G), but movable allocations happen from lowest
> to highest.  So ZONE_16M ends up full of page cache pages which are
> readily evictable for the rare occasions when we need to allocate memory
> below 16MB.
> 
> I'm sure there are lots of good reasons why this won't work, which is
> why I've been hesitant to propose it before now.

I am worried you are playing with a can of worms...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
@ 2018-05-24 12:23       ` Michal Hocko
  0 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2018-05-24 12:23 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: kstewart-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r, Huaisheng Ye,
	hehy1-6jq1YtArVR3QT0dZR+AlfA,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	alexander.levin-H+0wwilmMs1BDgjK7y7TUQ,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-btrfs-u79uwXL29TY76Z2rM5mHXA, Huaisheng Ye,
	chengnt-6jq1YtArVR3QT0dZR+AlfA,
	xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, colyli-l3A5Bk7waGM,
	mgorman-3eNAlZScCAx27rWaFMvyedHuzzzSOjJt, vbabka-AlSwsSmVLrQ

On Wed 23-05-18 22:19:19, Matthew Wilcox wrote:
> On Tue, May 22, 2018 at 08:37:28PM +0200, Michal Hocko wrote:
> > So why is this any better than the current code. Sure I am not a great
> > fan of GFP_ZONE_TABLE because of how it is incomprehensible but this
> > doesn't look too much better, yet we are losing a check for incompatible
> > gfp flags. The diffstat looks really sound but then you just look and
> > see that the large part is the comment that at least explained the gfp
> > zone modifiers somehow and the debugging code. So what is the selling
> > point?
> 
> I have a plan, but it's not exactly fully-formed yet.
> 
> One of the big problems we have today is that we have a lot of users
> who have constraints on the physical memory they want to allocate,
> but we have very limited abilities to provide them with what they're
> asking for.  The various different ZONEs have different meanings on
> different architectures and are generally a mess.

Agreed.

> If we had eight ZONEs, we could offer:

No, please no more zones. What we have is quite a maint. burden on its
own. Ideally we should only have lowmem, highmem and special/device
zones for directly kernel accessible memory, the one that the kernel
cannot or must not use and completely special memory managed out of
the page allocator. All the remaining constrains should better be
implemented on top.

> ZONE_16M	// 24 bit
> ZONE_256M	// 28 bit
> ZONE_LOWMEM	// CONFIG_32BIT only
> ZONE_4G		// 32 bit
> ZONE_64G	// 36 bit
> ZONE_1T		// 40 bit
> ZONE_ALL	// everything larger
> ZONE_MOVABLE	// movable allocations; no physical address guarantees
> 
> #ifdef CONFIG_64BIT
> #define ZONE_NORMAL	ZONE_ALL
> #else
> #define ZONE_NORMAL	ZONE_LOWMEM
> #endif
> 
> This would cover most driver DMA mask allocations; we could tweak the
> offered zones based on analysis of what people need.

But those already do have aproper API, IIUC. So do we really need to
make our GFP_*/Zone API more complicated than it already is?

> #define GFP_HIGHUSER		(GFP_USER | ZONE_ALL)
> #define GFP_HIGHUSER_MOVABLE	(GFP_USER | ZONE_MOVABLE)
> 
> One other thing I want to see is that fallback from zones happens from
> highest to lowest normally (ie if you fail to allocate in 1T, then you
> try to allocate from 64G), but movable allocations hapen from lowest
> to highest.  So ZONE_16M ends up full of page cache pages which are
> readily evictable for the rare occasions when we need to allocate memory
> below 16MB.
> 
> I'm sure there are lots of good reasons why this won't work, which is
> why I've been hesitant to propose it before now.

I am worried you are playing with a can of worms...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
  2018-05-24  5:19     ` Matthew Wilcox
  (?)
  (?)
@ 2018-05-24 12:23     ` Michal Hocko
  -1 siblings, 0 replies; 24+ messages in thread
From: Michal Hocko @ 2018-05-24 12:23 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: kstewart, Huaisheng Ye, hehy1, gregkh, linux-kernel,
	alexander.levin, linux-mm, iommu, linux-btrfs, Huaisheng Ye,
	chengnt, xen-devel, akpm, colyli, mgorman, vbabka

On Wed 23-05-18 22:19:19, Matthew Wilcox wrote:
> On Tue, May 22, 2018 at 08:37:28PM +0200, Michal Hocko wrote:
> > So why is this any better than the current code. Sure I am not a great
> > fan of GFP_ZONE_TABLE because of how it is incomprehensible but this
> > doesn't look too much better, yet we are losing a check for incompatible
> > gfp flags. The diffstat looks really sound but then you just look and
> > see that the large part is the comment that at least explained the gfp
> > zone modifiers somehow and the debugging code. So what is the selling
> > point?
> 
> I have a plan, but it's not exactly fully-formed yet.
> 
> One of the big problems we have today is that we have a lot of users
> who have constraints on the physical memory they want to allocate,
> but we have very limited abilities to provide them with what they're
> asking for.  The various different ZONEs have different meanings on
> different architectures and are generally a mess.

Agreed.

> If we had eight ZONEs, we could offer:

No, please no more zones. What we have is quite a maintenance burden on
its own. Ideally we should only have lowmem, highmem and special/device
zones for directly kernel accessible memory, the one that the kernel
cannot or must not use and completely special memory managed out of
the page allocator. All the remaining constraints would be better
implemented on top.

> ZONE_16M	// 24 bit
> ZONE_256M	// 28 bit
> ZONE_LOWMEM	// CONFIG_32BIT only
> ZONE_4G		// 32 bit
> ZONE_64G	// 36 bit
> ZONE_1T		// 40 bit
> ZONE_ALL	// everything larger
> ZONE_MOVABLE	// movable allocations; no physical address guarantees
> 
> #ifdef CONFIG_64BIT
> #define ZONE_NORMAL	ZONE_ALL
> #else
> #define ZONE_NORMAL	ZONE_LOWMEM
> #endif
> 
> This would cover most driver DMA mask allocations; we could tweak the
> offered zones based on analysis of what people need.

But those already do have a proper API, IIUC. So do we really need to
make our GFP_*/Zone API more complicated than it already is?

> #define GFP_HIGHUSER		(GFP_USER | ZONE_ALL)
> #define GFP_HIGHUSER_MOVABLE	(GFP_USER | ZONE_MOVABLE)
> 
> One other thing I want to see is that fallback from zones happens from
> highest to lowest normally (ie if you fail to allocate in 1T, then you
> try to allocate from 64G), but movable allocations happen from lowest
> to highest.  So ZONE_16M ends up full of page cache pages which are
> readily evictable for the rare occasions when we need to allocate memory
> below 16MB.
> 
> I'm sure there are lots of good reasons why this won't work, which is
> why I've been hesitant to propose it before now.

I am worried you are playing with a can of worms...
-- 
Michal Hocko
SUSE Labs

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
@ 2018-05-24  5:19     ` Matthew Wilcox
  0 siblings, 0 replies; 24+ messages in thread
From: Matthew Wilcox @ 2018-05-24  5:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Huaisheng Ye, akpm, linux-mm, vbabka, mgorman, kstewart,
	alexander.levin, gregkh, colyli, chengnt, hehy1, linux-kernel,
	iommu, xen-devel, linux-btrfs, Huaisheng Ye

On Tue, May 22, 2018 at 08:37:28PM +0200, Michal Hocko wrote:
> So why is this any better than the current code? Sure, I am not a great
> fan of GFP_ZONE_TABLE because of how it is incomprehensible but this
> doesn't look too much better, yet we are losing a check for incompatible
> gfp flags. The diffstat looks really sound but then you just look and
> see that the large part is the comment that at least explained the gfp
> zone modifiers somehow and the debugging code. So what is the selling
> point?

I have a plan, but it's not exactly fully-formed yet.

One of the big problems we have today is that we have a lot of users
who have constraints on the physical memory they want to allocate,
but we have very limited abilities to provide them with what they're
asking for.  The various different ZONEs have different meanings on
different architectures and are generally a mess.

If we had eight ZONEs, we could offer:

ZONE_16M	// 24 bit
ZONE_256M	// 28 bit
ZONE_LOWMEM	// CONFIG_32BIT only
ZONE_4G		// 32 bit
ZONE_64G	// 36 bit
ZONE_1T		// 40 bit
ZONE_ALL	// everything larger
ZONE_MOVABLE	// movable allocations; no physical address guarantees

#ifdef CONFIG_64BIT
#define ZONE_NORMAL	ZONE_ALL
#else
#define ZONE_NORMAL	ZONE_LOWMEM
#endif

This would cover most driver DMA mask allocations; we could tweak the
offered zones based on analysis of what people need.
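
To make the mapping from DMA masks to the zones listed above concrete,
here is a minimal sketch in plain C. It is only an illustration:
zone_for_dma_bits() and enum proposed_zone are hypothetical names, not
existing kernel API, and skipping ZONE_LOWMEM (which is tied to
CONFIG_32BIT rather than to an address width) is an assumption.

```c
#include <assert.h>

/* The eight zones from the list above, lowest addresses first. */
enum proposed_zone {
	ZONE_16M, ZONE_256M, ZONE_LOWMEM, ZONE_4G,
	ZONE_64G, ZONE_1T, ZONE_ALL, ZONE_MOVABLE,
};

/*
 * Hypothetical helper: return the largest zone whose pages are all
 * reachable by a device with a DMA mask of `bits` address bits,
 * or -1 if even ZONE_16M is out of reach.
 */
static inline int zone_for_dma_bits(unsigned int bits)
{
	assert(bits <= 64);	/* DMA masks are at most 64 bits wide */

	if (bits >= 64)
		return ZONE_ALL;	/* no addressing limit at all */
	if (bits >= 40)
		return ZONE_1T;
	if (bits >= 36)
		return ZONE_64G;
	if (bits >= 32)
		return ZONE_4G;
	if (bits >= 28)
		return ZONE_256M;
	if (bits >= 24)
		return ZONE_16M;
	return -1;		/* narrower than any offered zone */
}
```

Under these assumptions a 32-bit device would allocate from ZONE_4G,
and a device with an in-between mask (say 35 bits) would be rounded
down to the next zone that it can fully address.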

#define GFP_HIGHUSER		(GFP_USER | ZONE_ALL)
#define GFP_HIGHUSER_MOVABLE	(GFP_USER | ZONE_MOVABLE)

One other thing I want to see is that fallback from zones happens from
highest to lowest normally (ie if you fail to allocate in 1T, then you
try to allocate from 64G), but movable allocations happen from lowest
to highest.  So ZONE_16M ends up full of page cache pages which are
readily evictable for the rare occasions when we need to allocate memory
below 16MB.
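
The two fallback directions can be sketched as follows. This is a
simplified model, not allocator code: enum addr_zone and
fallback_order() are hypothetical names, and the sketch only covers
the address-limited zones (ZONE_LOWMEM and ZONE_MOVABLE are left out
as an assumption, to keep the ordering unambiguous).

```c
#include <assert.h>
#include <stddef.h>

/* Address-limited zones only, lowest first (simplified, hypothetical). */
enum addr_zone { Z_16M, Z_256M, Z_4G, Z_64G, Z_1T, Z_ALL, NR_ADDR_ZONES };

/*
 * Fill `out` with the zones to try, in order, for an allocation that
 * may use any zone up to and including `highest`. Returns the number
 * of entries written.
 *
 * Normal allocations walk from the highest allowed zone down to the
 * lowest; movable allocations walk from the lowest zone upwards.
 */
static inline size_t fallback_order(enum addr_zone highest, int movable,
				    enum addr_zone out[NR_ADDR_ZONES])
{
	size_t n = 0;
	int z;

	assert(highest < NR_ADDR_ZONES);

	if (movable) {
		for (z = Z_16M; z <= (int)highest; z++)
			out[n++] = (enum addr_zone)z;
	} else {
		for (z = (int)highest; z >= (int)Z_16M; z--)
			out[n++] = (enum addr_zone)z;
	}
	return n;
}
```

In this model an unmovable allocation limited to 1T tries Z_1T first
and Z_16M last, while a movable allocation fills Z_16M first, which is
what leaves the low zones holding easily evictable pages.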

I'm sure there are lots of good reasons why this won't work, which is
why I've been hesitant to propose it before now.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
  2018-05-21 15:20 Huaisheng Ye
                   ` (2 preceding siblings ...)
  2018-05-22 18:37 ` Michal Hocko
@ 2018-05-22 18:37 ` Michal Hocko
  2018-05-24  5:19     ` Matthew Wilcox
  2018-05-24  5:19   ` Matthew Wilcox
  3 siblings, 2 replies; 24+ messages in thread
From: Michal Hocko @ 2018-05-22 18:37 UTC (permalink / raw)
  To: Huaisheng Ye
  Cc: akpm, linux-mm, willy, vbabka, mgorman, kstewart,
	alexander.levin, gregkh, colyli, chengnt, hehy1, linux-kernel,
	iommu, xen-devel, linux-btrfs, Huaisheng Ye

On Mon 21-05-18 23:20:21, Huaisheng Ye wrote:
> From: Huaisheng Ye <yehs1@lenovo.com>
> 
> Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with encoded zone number.
> 
> Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from GFP bitmasks;
> the bottom three bits of the GFP mask are reserved for storing the
> encoded zone number.
> 
> The encoding method is XOR. Get the zone number from enum zone_type,
> then encode it by XORing it with ZONE_NORMAL.
> The goal is to make sure ZONE_NORMAL is encoded as zero, so that
> compatibility is preserved: GFP_KERNEL and GFP_ATOMIC
> can be used as before.
> 
> Reserve __GFP_MOVABLE in bit 3, so that it can continue to be used as
> a flag. As before, __GFP_MOVABLE represents the movable migrate type
> for ZONE_DMA, ZONE_DMA32, and ZONE_NORMAL. But when it is combined with
> __GFP_HIGHMEM, ZONE_MOVABLE shall be returned instead of ZONE_HIGHMEM.
> __GFP_ZONE_MOVABLE is created to achieve this.
> 
> With this patch, just enabling __GFP_MOVABLE and __GFP_HIGHMEM is not
> enough to get ZONE_MOVABLE from gfp_zone. All callers should use
> GFP_HIGHUSER_MOVABLE or __GFP_ZONE_MOVABLE directly to achieve that.
> 
> Decode zone number directly from bottom three bits of flags in gfp_zone.
> The theory of encoding and decoding is,
>         A ^ B ^ B = A

So why is this any better than the current code? Sure, I am not a great
fan of GFP_ZONE_TABLE because of how it is incomprehensible but this
doesn't look too much better, yet we are losing a check for incompatible
gfp flags. The diffstat looks really sound but then you just look and
see that the large part is the comment that at least explained the gfp
zone modifiers somehow and the debugging code. So what is the selling
point?

> Changes since v1,
> 
> v2: Add __GFP_ZONE_MOVABLE and modify GFP_HIGHUSER_MOVABLE to help
> callers to get ZONE_MOVABLE. Add __GFP_ZONE_MASK to mask lowest 3
> bits of GFP bitmasks.
> Modify some callers' gfp flag to update usage of address zone
> modifiers.
> Modify inline function gfp_zone to get better performance according
> to Matthew's suggestion.
> 
> Link: https://marc.info/?l=linux-mm&m=152596791931266&w=2
> 
> Huaisheng Ye (12):
>   include/linux/gfp.h: get rid of GFP_ZONE_TABLE/BAD
>   arch/x86/kernel/amd_gart_64: update usage of address zone modifiers
>   arch/x86/kernel/pci-calgary_64: update usage of address zone modifiers
>   drivers/iommu/amd_iommu: update usage of address zone modifiers
>   include/linux/dma-mapping: update usage of address zone modifiers
>   drivers/xen/swiotlb-xen: update usage of address zone modifiers
>   fs/btrfs/extent_io: update usage of address zone modifiers
>   drivers/block/zram/zram_drv: update usage of address zone modifiers
>   mm/vmpressure: update usage of address zone modifiers
>   mm/zsmalloc: update usage of address zone modifiers
>   include/linux/highmem: update usage of movableflags
>   arch/x86/include/asm/page.h: update usage of movableflags
> 
>  arch/x86/include/asm/page.h      |  3 +-
>  arch/x86/kernel/amd_gart_64.c    |  2 +-
>  arch/x86/kernel/pci-calgary_64.c |  2 +-
>  drivers/block/zram/zram_drv.c    |  6 +--
>  drivers/iommu/amd_iommu.c        |  2 +-
>  drivers/xen/swiotlb-xen.c        |  2 +-
>  fs/btrfs/extent_io.c             |  2 +-
>  include/linux/dma-mapping.h      |  2 +-
>  include/linux/gfp.h              | 98 +++++-----------------------------------
>  include/linux/highmem.h          |  4 +-
>  mm/vmpressure.c                  |  2 +-
>  mm/zsmalloc.c                    |  4 +-
>  12 files changed, 26 insertions(+), 103 deletions(-)
> 
> -- 
> 1.8.3.1
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
@ 2018-05-22 10:22 ` Huaisheng HS1 Ye
  0 siblings, 0 replies; 24+ messages in thread
From: Huaisheng HS1 Ye @ 2018-05-22 10:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: akpm, linux-mm, mhocko, willy, vbabka, mgorman, kstewart,
	alexander.levin, gregkh, colyli, NingTing Cheng, Ocean HY1 He,
	linux-kernel, iommu, xen-devel, linux-btrfs, Huaisheng Ye

From: owner-linux-mm@kvack.org On Behalf Of Christoph Hellwig
> This seems to be missing patch 1 and generally be in somewhat odd format.
> Can you try to resend it with git-send-email and against current Linus'
> tree?
> 
Sure, I will rebase them to current mainline ASAP.

> Also I'd suggest you do cleanups like adding and using __GFP_ZONE_MASK
> at the beginning of the series before doing any real changes.

Ok, thanks for your suggestion.

Sincerely,
Huaisheng Ye

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
@ 2018-05-22  9:40   ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2018-05-22  9:40 UTC (permalink / raw)
  To: Huaisheng Ye
  Cc: akpm, linux-mm, mhocko, willy, vbabka, mgorman, kstewart,
	alexander.levin, gregkh, colyli, chengnt, hehy1, linux-kernel,
	iommu, xen-devel, linux-btrfs, Huaisheng Ye

This seems to be missing patch 1 and generally be in somewhat odd format.
Can you try to resend it with git-send-email and against current Linus'
tree?

Also I'd suggest you do cleanups like adding and using __GFP_ZONE_MASK
at the beginning of the series before doing any real changes.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
@ 2018-05-21 15:20 Huaisheng Ye
  2018-05-22  9:40 ` Christoph Hellwig
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Huaisheng Ye @ 2018-05-21 15:20 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: mhocko, willy, vbabka, mgorman, kstewart, alexander.levin,
	gregkh, colyli, chengnt, hehy1, linux-kernel, iommu, xen-devel,
	linux-btrfs, Huaisheng Ye

From: Huaisheng Ye <yehs1@lenovo.com>

Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with encoded zone number.

Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from GFP bitmasks;
the bottom three bits of the GFP mask are reserved for storing the
encoded zone number.

The encoding method is XOR. Get the zone number from enum zone_type,
then encode it by XORing it with ZONE_NORMAL.
The goal is to make sure ZONE_NORMAL is encoded as zero, so that
compatibility is preserved: GFP_KERNEL and GFP_ATOMIC
can be used as before.

Reserve __GFP_MOVABLE in bit 3, so that it can continue to be used as
a flag. As before, __GFP_MOVABLE represents the movable migrate type
for ZONE_DMA, ZONE_DMA32, and ZONE_NORMAL. But when it is combined with
__GFP_HIGHMEM, ZONE_MOVABLE shall be returned instead of ZONE_HIGHMEM.
__GFP_ZONE_MOVABLE is created to achieve this.

With this patch, just enabling __GFP_MOVABLE and __GFP_HIGHMEM is not
enough to get ZONE_MOVABLE from gfp_zone. All callers should use
GFP_HIGHUSER_MOVABLE or __GFP_ZONE_MOVABLE directly to achieve that.
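
The difference between the movable flag and the movable zone can be
illustrated with a small userspace sketch. Everything here is an
assumption made for illustration: the zone numbering, the SK_* macro
names and sketch_gfp_zone() are hypothetical, and the real definitions
live in patch 01/12, which is not reproduced in this cover letter.

```c
#include <assert.h>

/* Assumed zone numbering; the real one depends on the kernel config. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

#define SK_ZONE_MASK	0x7u	/* bits 0-2: encoded zone number */
#define SK_MOVABLE	0x8u	/* bit 3: movable migrate-type flag */

/* XOR-encode a zone number so that ZONE_NORMAL becomes 0. */
#define SK_ENC(z)	((unsigned int)(z) ^ (unsigned int)ZONE_NORMAL)

#define SK_HIGHMEM	SK_ENC(ZONE_HIGHMEM)
/* Assumed: the dedicated modifier carries both the zone and the flag. */
#define SK_ZONE_MOVABLE	(SK_ENC(ZONE_MOVABLE) | SK_MOVABLE)

/* Decode the zone from the bottom three bits only. */
static inline enum zone_type sketch_gfp_zone(unsigned int flags)
{
	unsigned int z = (flags & SK_ZONE_MASK) ^ (unsigned int)ZONE_NORMAL;

	assert(z <= (unsigned int)ZONE_MOVABLE);	/* must name a real zone */
	return (enum zone_type)z;
}
```

In this model SK_HIGHMEM | SK_MOVABLE still decodes to ZONE_HIGHMEM,
because bit 3 is outside the zone field; only SK_ZONE_MOVABLE decodes
to ZONE_MOVABLE.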

Decode zone number directly from bottom three bits of flags in gfp_zone.
The theory of encoding and decoding is,
        A ^ B ^ B = A
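
As a worked example of the XOR round trip, here is a minimal userspace
sketch. The zone numbering and the sketch_* names are assumptions for
illustration only; the actual enum zone_type depends on the kernel
configuration, and the real gfp_zone() is in patch 01/12.

```c
#include <assert.h>

/* Assumed zone numbering; the real enum zone_type depends on the
 * kernel configuration. */
enum zone_type {
	ZONE_DMA,	/* 0 */
	ZONE_DMA32,	/* 1 */
	ZONE_NORMAL,	/* 2 */
	ZONE_HIGHMEM,	/* 3 */
	ZONE_MOVABLE,	/* 4 */
};

#define SKETCH_ZONE_MASK 0x7u	/* bottom three bits of the flags word */

/* Encode: XOR with ZONE_NORMAL, so ZONE_NORMAL itself becomes 0. */
static inline unsigned int sketch_encode_zone(enum zone_type z)
{
	assert((unsigned int)z <= SKETCH_ZONE_MASK);
	return (unsigned int)z ^ (unsigned int)ZONE_NORMAL;
}

/* Decode: mask the zone bits, then XOR with ZONE_NORMAL again
 * (A ^ B ^ B = A). Higher flag bits are ignored. */
static inline enum zone_type sketch_decode_zone(unsigned int flags)
{
	return (enum zone_type)((flags & SKETCH_ZONE_MASK) ^
				(unsigned int)ZONE_NORMAL);
}
```

With this numbering ZONE_DMA encodes to 2 and ZONE_HIGHMEM to 1, while
a flags word with all zone bits clear decodes to ZONE_NORMAL, which is
why existing GFP_KERNEL and GFP_ATOMIC users need no change.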

Changes since v1,

v2: Add __GFP_ZONE_MOVABLE and modify GFP_HIGHUSER_MOVABLE to help
callers to get ZONE_MOVABLE. Add __GFP_ZONE_MASK to mask lowest 3
bits of GFP bitmasks.
Modify some callers' gfp flag to update usage of address zone
modifiers.
Modify inline function gfp_zone to get better performance according
to Matthew's suggestion.

Link: https://marc.info/?l=linux-mm&m=152596791931266&w=2

Huaisheng Ye (12):
  include/linux/gfp.h: get rid of GFP_ZONE_TABLE/BAD
  arch/x86/kernel/amd_gart_64: update usage of address zone modifiers
  arch/x86/kernel/pci-calgary_64: update usage of address zone modifiers
  drivers/iommu/amd_iommu: update usage of address zone modifiers
  include/linux/dma-mapping: update usage of address zone modifiers
  drivers/xen/swiotlb-xen: update usage of address zone modifiers
  fs/btrfs/extent_io: update usage of address zone modifiers
  drivers/block/zram/zram_drv: update usage of address zone modifiers
  mm/vmpressure: update usage of address zone modifiers
  mm/zsmalloc: update usage of address zone modifiers
  include/linux/highmem: update usage of movableflags
  arch/x86/include/asm/page.h: update usage of movableflags

 arch/x86/include/asm/page.h      |  3 +-
 arch/x86/kernel/amd_gart_64.c    |  2 +-
 arch/x86/kernel/pci-calgary_64.c |  2 +-
 drivers/block/zram/zram_drv.c    |  6 +--
 drivers/iommu/amd_iommu.c        |  2 +-
 drivers/xen/swiotlb-xen.c        |  2 +-
 fs/btrfs/extent_io.c             |  2 +-
 include/linux/dma-mapping.h      |  2 +-
 include/linux/gfp.h              | 98 +++++-----------------------------------
 include/linux/highmem.h          |  4 +-
 mm/vmpressure.c                  |  2 +-
 mm/zsmalloc.c                    |  4 +-
 12 files changed, 26 insertions(+), 103 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-05-28 15:56 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-21 15:20 [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD Huaisheng Ye
  -- strict thread matches above, loose matches on Subject: below --
2018-05-22 10:22 Huaisheng HS1 Ye
2018-05-22 10:22 Huaisheng HS1 Ye
2018-05-22 10:22 ` Huaisheng HS1 Ye
2018-05-21 15:20 Huaisheng Ye
2018-05-22  9:40 ` Christoph Hellwig
2018-05-22  9:40 ` Christoph Hellwig
2018-05-22  9:40   ` Christoph Hellwig
2018-05-22 18:37 ` Michal Hocko
2018-05-22 18:37 ` Michal Hocko
2018-05-24  5:19   ` Matthew Wilcox
2018-05-24  5:19     ` Matthew Wilcox
2018-05-24 12:23     ` Michal Hocko
2018-05-24 12:23       ` Michal Hocko
2018-05-24 15:18       ` Matthew Wilcox
2018-05-24 15:18       ` Matthew Wilcox
2018-05-24 15:29         ` Michal Hocko
2018-05-24 15:29         ` Michal Hocko
2018-05-25 12:00           ` Matthew Wilcox
2018-05-25 12:00           ` Matthew Wilcox
2018-05-28 13:33             ` Michal Hocko
2018-05-28 13:33               ` Michal Hocko
2018-05-24 12:23     ` Michal Hocko
2018-05-24  5:19   ` Matthew Wilcox
