linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: optimize PageHighMem() check
@ 2015-09-29  7:54 Vineet Gupta
  2015-10-01 13:12 ` Michal Hocko
  2015-10-01 23:25 ` Andrew Morton
  0 siblings, 2 replies; 7+ messages in thread
From: Vineet Gupta @ 2015-09-29  7:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Vineet Gupta, Naoya Horiguchi, Hugh Dickins,
	Kirill A. Shutemov, Michal Hocko, Jennifer Herbert,
	Konstantin Khlebnikov, linux-kernel

This came up when implementing HIHGMEM/PAE40 for ARC.
The kmap() / kmap_atomic() generated code seemed needlessly bloated due
to the way PageHighMem() macro is implemented.
It derives the exact zone for page and then does pointer subtraction
with first zone to infer the zone_type.
The pointer arithmatic in turn generates the code bloat.

PageHighMem(page)
  is_highmem(page_zone(page))
     zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones

Instead use is_highmem_idx() to work on zone_type available in page flags

   ----- Before -----
80756348:	mov_s      r13,r0
8075634a:	ld_s       r2,[r13,0]
8075634c:	lsr_s      r2,r2,30
8075634e:	mpy        r2,r2,0x2a4
80756352:	add_s      r2,r2,0x80aef880
80756358:	ld_s       r3,[r2,28]
8075635a:	sub_s      r2,r2,r3
8075635c:	breq       r2,0x2a4,80756378 <kmap+0x48>
80756364:	breq       r2,0x548,80756378 <kmap+0x48>

   ----- After  -----
80756330:	mov_s      r13,r0
80756332:	ld_s       r2,[r13,0]
80756334:	lsr_s      r2,r2,30
80756336:	sub_s      r2,r2,1
80756338:	brlo       r2,2,80756348 <kmap+0x30>

For x86 defconfig build (32 bit only) it saves around 900 bytes.
For ARC defconfig with HIGHMEM, it saved around 2K bytes.

   ---->8-------
./scripts/bloat-o-meter x86/vmlinux-defconfig-pre x86/vmlinux-defconfig-post
add/remove: 0/0 grow/shrink: 0/36 up/down: 0/-934 (-934)
function                                     old     new   delta
saveable_page                                162     154      -8
saveable_highmem_page                        154     146      -8
skb_gro_reset_offset                         147     131     -16
...
...
__change_page_attr_set_clr                  1715    1678     -37
setup_data_read                              434     394     -40
mon_bin_event                               1967    1927     -40
swsusp_save                                 1148    1105     -43
_set_pages_array                             549     493     -56
   ---->8-------

e.g. For ARC kmap()

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jennifer Herbert <jennifer.herbert@citrix.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 include/linux/page-flags.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 41c93844fb1d..2953aaa06d67 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -252,7 +252,7 @@ PAGEFLAG(Readahead, reclaim) TESTCLEARFLAG(Readahead, reclaim)
  * Must use a macro here due to header dependency issues. page_zone() is not
  * available at this point.
  */
-#define PageHighMem(__p) is_highmem(page_zone(__p))
+#define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
 #else
 PAGEFLAG_FALSE(HighMem)
 #endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: optimize PageHighMem() check
  2015-09-29  7:54 [PATCH] mm: optimize PageHighMem() check Vineet Gupta
@ 2015-10-01 13:12 ` Michal Hocko
  2015-10-01 23:25 ` Andrew Morton
  1 sibling, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2015-10-01 13:12 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Andrew Morton, linux-mm, Naoya Horiguchi, Hugh Dickins,
	Kirill A. Shutemov, Jennifer Herbert, Konstantin Khlebnikov,
	linux-kernel

On Tue 29-09-15 13:24:20, Vineet Gupta wrote:
> This came up when implementing HIHGMEM/PAE40 for ARC.
> The kmap() / kmap_atomic() generated code seemed needlessly bloated due
> to the way PageHighMem() macro is implemented.
> It derives the exact zone for page and then does pointer subtraction
> with first zone to infer the zone_type.
> The pointer arithmatic in turn generates the code bloat.
> 
> PageHighMem(page)
>   is_highmem(page_zone(page))
>      zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones
> 
> Instead use is_highmem_idx() to work on zone_type available in page flags
> 
>    ----- Before -----
> 80756348:	mov_s      r13,r0
> 8075634a:	ld_s       r2,[r13,0]
> 8075634c:	lsr_s      r2,r2,30
> 8075634e:	mpy        r2,r2,0x2a4
> 80756352:	add_s      r2,r2,0x80aef880
> 80756358:	ld_s       r3,[r2,28]
> 8075635a:	sub_s      r2,r2,r3
> 8075635c:	breq       r2,0x2a4,80756378 <kmap+0x48>
> 80756364:	breq       r2,0x548,80756378 <kmap+0x48>
> 
>    ----- After  -----
> 80756330:	mov_s      r13,r0
> 80756332:	ld_s       r2,[r13,0]
> 80756334:	lsr_s      r2,r2,30
> 80756336:	sub_s      r2,r2,1
> 80756338:	brlo       r2,2,80756348 <kmap+0x30>
> 
> For x86 defconfig build (32 bit only) it saves around 900 bytes.
> For ARC defconfig with HIGHMEM, it saved around 2K bytes.
> 
>    ---->8-------
> ./scripts/bloat-o-meter x86/vmlinux-defconfig-pre x86/vmlinux-defconfig-post
> add/remove: 0/0 grow/shrink: 0/36 up/down: 0/-934 (-934)
> function                                     old     new   delta
> saveable_page                                162     154      -8
> saveable_highmem_page                        154     146      -8
> skb_gro_reset_offset                         147     131     -16
> ...
> ...
> __change_page_attr_set_clr                  1715    1678     -37
> setup_data_read                              434     394     -40
> mon_bin_event                               1967    1927     -40
> swsusp_save                                 1148    1105     -43
> _set_pages_array                             549     493     -56
>    ---->8-------
> 
> e.g. For ARC kmap()
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Jennifer Herbert <jennifer.herbert@citrix.com>
> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

Looks reasonably to me.
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/page-flags.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 41c93844fb1d..2953aaa06d67 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -252,7 +252,7 @@ PAGEFLAG(Readahead, reclaim) TESTCLEARFLAG(Readahead, reclaim)
>   * Must use a macro here due to header dependency issues. page_zone() is not
>   * available at this point.
>   */
> -#define PageHighMem(__p) is_highmem(page_zone(__p))
> +#define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
>  #else
>  PAGEFLAG_FALSE(HighMem)
>  #endif
> -- 
> 1.9.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: optimize PageHighMem() check
  2015-09-29  7:54 [PATCH] mm: optimize PageHighMem() check Vineet Gupta
  2015-10-01 13:12 ` Michal Hocko
@ 2015-10-01 23:25 ` Andrew Morton
  2015-10-02  7:15   ` Vineet Gupta
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2015-10-01 23:25 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: linux-mm, Naoya Horiguchi, Hugh Dickins, Kirill A. Shutemov,
	Michal Hocko, Jennifer Herbert, Konstantin Khlebnikov,
	linux-kernel

On Tue, 29 Sep 2015 13:24:20 +0530 Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:

> This came up when implementing HIHGMEM/PAE40 for ARC.
> The kmap() / kmap_atomic() generated code seemed needlessly bloated due
> to the way PageHighMem() macro is implemented.
> It derives the exact zone for page and then does pointer subtraction
> with first zone to infer the zone_type.
> The pointer arithmatic in turn generates the code bloat.
> 
> PageHighMem(page)
>   is_highmem(page_zone(page))
>      zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones
> 
> Instead use is_highmem_idx() to work on zone_type available in page flags
> 
>    ----- Before -----
> 80756348:	mov_s      r13,r0
> 8075634a:	ld_s       r2,[r13,0]
> 8075634c:	lsr_s      r2,r2,30
> 8075634e:	mpy        r2,r2,0x2a4
> 80756352:	add_s      r2,r2,0x80aef880
> 80756358:	ld_s       r3,[r2,28]
> 8075635a:	sub_s      r2,r2,r3
> 8075635c:	breq       r2,0x2a4,80756378 <kmap+0x48>
> 80756364:	breq       r2,0x548,80756378 <kmap+0x48>
> 
>    ----- After  -----
> 80756330:	mov_s      r13,r0
> 80756332:	ld_s       r2,[r13,0]
> 80756334:	lsr_s      r2,r2,30
> 80756336:	sub_s      r2,r2,1
> 80756338:	brlo       r2,2,80756348 <kmap+0x30>
> 
> For x86 defconfig build (32 bit only) it saves around 900 bytes.
> For ARC defconfig with HIGHMEM, it saved around 2K bytes.
> 
>    ---->8-------
> ./scripts/bloat-o-meter x86/vmlinux-defconfig-pre x86/vmlinux-defconfig-post
> add/remove: 0/0 grow/shrink: 0/36 up/down: 0/-934 (-934)
> function                                     old     new   delta
> saveable_page                                162     154      -8
> saveable_highmem_page                        154     146      -8
> skb_gro_reset_offset                         147     131     -16
> ...
> ...
> __change_page_attr_set_clr                  1715    1678     -37
> setup_data_read                              434     394     -40
> mon_bin_event                               1967    1927     -40
> swsusp_save                                 1148    1105     -43
> _set_pages_array                             549     493     -56
>    ---->8-------
> 
> e.g. For ARC kmap()
> 

is_highmem() is deranged.  Can't we use a bit in zone->flags or
something?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: optimize PageHighMem() check
  2015-10-01 23:25 ` Andrew Morton
@ 2015-10-02  7:15   ` Vineet Gupta
  2015-10-02 20:53     ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Vineet Gupta @ 2015-10-02  7:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Naoya Horiguchi, Hugh Dickins, Kirill A. Shutemov,
	Michal Hocko, Jennifer Herbert, Konstantin Khlebnikov,
	linux-kernel

On Friday 02 October 2015 04:55 AM, Andrew Morton wrote:
> On Tue, 29 Sep 2015 13:24:20 +0530 Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
> 
>> > This came up when implementing HIHGMEM/PAE40 for ARC.
>> > The kmap() / kmap_atomic() generated code seemed needlessly bloated due
>> > to the way PageHighMem() macro is implemented.
>> > It derives the exact zone for page and then does pointer subtraction
>> > with first zone to infer the zone_type.
>> > The pointer arithmatic in turn generates the code bloat.
>> > 
>> > PageHighMem(page)
>> >   is_highmem(page_zone(page))
>> >      zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones
>> > 
>> > Instead use is_highmem_idx() to work on zone_type available in page flags
>> > 
>> >    ----- Before -----
>> > 80756348:	mov_s      r13,r0
>> > 8075634a:	ld_s       r2,[r13,0]
>> > 8075634c:	lsr_s      r2,r2,30
>> > 8075634e:	mpy        r2,r2,0x2a4
>> > 80756352:	add_s      r2,r2,0x80aef880
>> > 80756358:	ld_s       r3,[r2,28]
>> > 8075635a:	sub_s      r2,r2,r3
>> > 8075635c:	breq       r2,0x2a4,80756378 <kmap+0x48>
>> > 80756364:	breq       r2,0x548,80756378 <kmap+0x48>
>> > 
>> >    ----- After  -----
>> > 80756330:	mov_s      r13,r0
>> > 80756332:	ld_s       r2,[r13,0]
>> > 80756334:	lsr_s      r2,r2,30
>> > 80756336:	sub_s      r2,r2,1
>> > 80756338:	brlo       r2,2,80756348 <kmap+0x30>
>> > 
>> > For x86 defconfig build (32 bit only) it saves around 900 bytes.
>> > For ARC defconfig with HIGHMEM, it saved around 2K bytes.
>> > 
>> >    ---->8-------
>> > ./scripts/bloat-o-meter x86/vmlinux-defconfig-pre x86/vmlinux-defconfig-post
>> > add/remove: 0/0 grow/shrink: 0/36 up/down: 0/-934 (-934)
>> > function                                     old     new   delta
>> > saveable_page                                162     154      -8
>> > saveable_highmem_page                        154     146      -8
>> > skb_gro_reset_offset                         147     131     -16
>> > ...
>> > ...
>> > __change_page_attr_set_clr                  1715    1678     -37
>> > setup_data_read                              434     394     -40
>> > mon_bin_event                               1967    1927     -40
>> > swsusp_save                                 1148    1105     -43
>> > _set_pages_array                             549     493     -56
>> >    ---->8-------
>> > 
>> > e.g. For ARC kmap()
>> > 
> is_highmem() is deranged.  Can't we use a bit in zone->flags or
> something?

It won't be "a" bit since zone_type is an enum. However zone->flags could be split
into 2 bitfields to hold enum zone_flags and enum zone_type.
However this patch still is independent of that since we have struct page as
starting point and zone_type is available from there directly w/o monkeying around
with any zone structs.

-Vineet

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: optimize PageHighMem() check
  2015-10-02  7:15   ` Vineet Gupta
@ 2015-10-02 20:53     ` Andrew Morton
  2015-10-03 10:19       ` Vineet Gupta
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2015-10-02 20:53 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: linux-mm, Naoya Horiguchi, Hugh Dickins, Kirill A. Shutemov,
	Michal Hocko, Jennifer Herbert, Konstantin Khlebnikov,
	linux-kernel

On Fri, 2 Oct 2015 12:45:53 +0530 Vineet Gupta <vgupta@synopsys.com> wrote:

> On Friday 02 October 2015 04:55 AM, Andrew Morton wrote:
> > On Tue, 29 Sep 2015 13:24:20 +0530 Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
> > 
> >> > This came up when implementing HIHGMEM/PAE40 for ARC.
> >> > The kmap() / kmap_atomic() generated code seemed needlessly bloated due
> >> > to the way PageHighMem() macro is implemented.
> >> > It derives the exact zone for page and then does pointer subtraction
> >> > with first zone to infer the zone_type.
> >> > The pointer arithmatic in turn generates the code bloat.
> >> > 
> >> > PageHighMem(page)
> >> >   is_highmem(page_zone(page))
> >> >      zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones
> >> > 
> >> > Instead use is_highmem_idx() to work on zone_type available in page flags
> >> > 
> >> >    ----- Before -----
> >> > 80756348:	mov_s      r13,r0
> >> > 8075634a:	ld_s       r2,[r13,0]
> >> > 8075634c:	lsr_s      r2,r2,30
> >> > 8075634e:	mpy        r2,r2,0x2a4
> >> > 80756352:	add_s      r2,r2,0x80aef880
> >> > 80756358:	ld_s       r3,[r2,28]
> >> > 8075635a:	sub_s      r2,r2,r3
> >> > 8075635c:	breq       r2,0x2a4,80756378 <kmap+0x48>
> >> > 80756364:	breq       r2,0x548,80756378 <kmap+0x48>
> >> > 
> >> >    ----- After  -----
> >> > 80756330:	mov_s      r13,r0
> >> > 80756332:	ld_s       r2,[r13,0]
> >> > 80756334:	lsr_s      r2,r2,30
> >> > 80756336:	sub_s      r2,r2,1
> >> > 80756338:	brlo       r2,2,80756348 <kmap+0x30>
> >> > 
> >> > For x86 defconfig build (32 bit only) it saves around 900 bytes.
> >> > For ARC defconfig with HIGHMEM, it saved around 2K bytes.
> >> > 
> >> >    ---->8-------
> >> > ./scripts/bloat-o-meter x86/vmlinux-defconfig-pre x86/vmlinux-defconfig-post
> >> > add/remove: 0/0 grow/shrink: 0/36 up/down: 0/-934 (-934)
> >> > function                                     old     new   delta
> >> > saveable_page                                162     154      -8
> >> > saveable_highmem_page                        154     146      -8
> >> > skb_gro_reset_offset                         147     131     -16
> >> > ...
> >> > ...
> >> > __change_page_attr_set_clr                  1715    1678     -37
> >> > setup_data_read                              434     394     -40
> >> > mon_bin_event                               1967    1927     -40
> >> > swsusp_save                                 1148    1105     -43
> >> > _set_pages_array                             549     493     -56
> >> >    ---->8-------
> >> > 
> >> > e.g. For ARC kmap()
> >> > 
> > is_highmem() is deranged.  Can't we use a bit in zone->flags or
> > something?
> 
> It won't be "a" bit since zone_type is an enum.

Yes it will!

static inline int is_highmem(struct zone *zone)
{
	return test_bit(ZONE_HIGHMEM, &zone->flags);
}

> ...
>
> However this patch still is independent of that since we have struct page as
> starting point and zone_type is available from there directly w/o monkeying around
> with any zone structs.

yup.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: optimize PageHighMem() check
  2015-10-02 20:53     ` Andrew Morton
@ 2015-10-03 10:19       ` Vineet Gupta
  2015-11-06 10:21         ` Vineet Gupta
  0 siblings, 1 reply; 7+ messages in thread
From: Vineet Gupta @ 2015-10-03 10:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Naoya Horiguchi, Hugh Dickins, Kirill A. Shutemov,
	Michal Hocko, Jennifer Herbert, Konstantin Khlebnikov,
	linux-kernel

On Saturday 03 October 2015 02:23 AM, Andrew Morton wrote:
> On Fri, 2 Oct 2015 12:45:53 +0530 Vineet Gupta <vgupta@synopsys.com> wrote:
>
>> On Friday 02 October 2015 04:55 AM, Andrew Morton wrote:
>>> On Tue, 29 Sep 2015 13:24:20 +0530 Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
>>>
>>>>> This came up when implementing HIHGMEM/PAE40 for ARC.
>>>>> The kmap() / kmap_atomic() generated code seemed needlessly bloated due
>>>>> to the way PageHighMem() macro is implemented.
>>>>> It derives the exact zone for page and then does pointer subtraction
>>>>> with first zone to infer the zone_type.
>>>>> The pointer arithmatic in turn generates the code bloat.
>>>>>
>>>>> PageHighMem(page)
>>>>>   is_highmem(page_zone(page))
>>>>>      zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones
>>>>>
>>>>> Instead use is_highmem_idx() to work on zone_type available in page flags
>>>>>
>>>>>    ----- Before -----
>>>>> 80756348:	mov_s      r13,r0
>>>>> 8075634a:	ld_s       r2,[r13,0]
>>>>> 8075634c:	lsr_s      r2,r2,30
>>>>> 8075634e:	mpy        r2,r2,0x2a4
>>>>> 80756352:	add_s      r2,r2,0x80aef880
>>>>> 80756358:	ld_s       r3,[r2,28]
>>>>> 8075635a:	sub_s      r2,r2,r3
>>>>> 8075635c:	breq       r2,0x2a4,80756378 <kmap+0x48>
>>>>> 80756364:	breq       r2,0x548,80756378 <kmap+0x48>
>>>>>
>>>>>    ----- After  -----
>>>>> 80756330:	mov_s      r13,r0
>>>>> 80756332:	ld_s       r2,[r13,0]
>>>>> 80756334:	lsr_s      r2,r2,30
>>>>> 80756336:	sub_s      r2,r2,1
>>>>> 80756338:	brlo       r2,2,80756348 <kmap+0x30>
>>>>>
>>>>> For x86 defconfig build (32 bit only) it saves around 900 bytes.
>>>>> For ARC defconfig with HIGHMEM, it saved around 2K bytes.
>>>>>
>>>>>    ---->8-------
>>>>> ./scripts/bloat-o-meter x86/vmlinux-defconfig-pre x86/vmlinux-defconfig-post
>>>>> add/remove: 0/0 grow/shrink: 0/36 up/down: 0/-934 (-934)
>>>>> function                                     old     new   delta
>>>>> saveable_page                                162     154      -8
>>>>> saveable_highmem_page                        154     146      -8
>>>>> skb_gro_reset_offset                         147     131     -16
>>>>> ...
>>>>> ...
>>>>> __change_page_attr_set_clr                  1715    1678     -37
>>>>> setup_data_read                              434     394     -40
>>>>> mon_bin_event                               1967    1927     -40
>>>>> swsusp_save                                 1148    1105     -43
>>>>> _set_pages_array                             549     493     -56
>>>>>    ---->8-------
>>>>>
>>>>> e.g. For ARC kmap()
>>>>>
>>> is_highmem() is deranged.  Can't we use a bit in zone->flags or
>>> something?
>> It won't be "a" bit since zone_type is an enum.
> Yes it will!
>
> static inline int is_highmem(struct zone *zone)
> {
> 	return test_bit(ZONE_HIGHMEM, &zone->flags);
> }

Point is do we want to fix this specific case, or do we want to improve zone_idx()
in general.

#define zone_idx(zone)        ((zone) - (zone)->zone_pgdat->node_zones

If former, I can split up zone->flags into 31:1 bit-field. Otherwise I will split
it into 24:8 to hold both zone_flags and zone_type

>> ...
>>
>> However this patch still is independent of that since we have struct page as
>> starting point and zone_type is available from there directly w/o monkeying around
>> with any zone structs.
> yup.

Do u want me to carry this as part of the the ARC HIGHMEM/PAE40 patchset or would
you rather prefer merging it via mm tree.

Thx,
-Vineet

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: optimize PageHighMem() check
  2015-10-03 10:19       ` Vineet Gupta
@ 2015-11-06 10:21         ` Vineet Gupta
  0 siblings, 0 replies; 7+ messages in thread
From: Vineet Gupta @ 2015-11-06 10:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Naoya Horiguchi, Hugh Dickins, Kirill A. Shutemov,
	Michal Hocko, Jennifer Herbert, Konstantin Khlebnikov,
	linux-kernel, arcml

On Saturday 03 October 2015 03:49 PM, Vineet Gupta wrote:
> On Saturday 03 October 2015 02:23 AM, Andrew Morton wrote:
>> > On Fri, 2 Oct 2015 12:45:53 +0530 Vineet Gupta <vgupta@synopsys.com> wrote:
>> >
>>> >> On Friday 02 October 2015 04:55 AM, Andrew Morton wrote:
>>>> >>> On Tue, 29 Sep 2015 13:24:20 +0530 Vineet Gupta <Vineet.Gupta1@synopsys.com> wrote:
>>>> >>>
>>>>>> >>>>> This came up when implementing HIHGMEM/PAE40 for ARC.
>>>>>> >>>>> The kmap() / kmap_atomic() generated code seemed needlessly bloated due
>>>>>> >>>>> to the way PageHighMem() macro is implemented.
>>>>>> >>>>> It derives the exact zone for page and then does pointer subtraction
>>>>>> >>>>> with first zone to infer the zone_type.
>>>>>> >>>>> The pointer arithmatic in turn generates the code bloat.
>>>>>> >>>>>
>>>>>> >>>>> PageHighMem(page)
>>>>>> >>>>>   is_highmem(page_zone(page))
>>>>>> >>>>>      zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones
>>>>>> >>>>>
>>>>>> >>>>> Instead use is_highmem_idx() to work on zone_type available in page flags
>>>>>> >>>>>
>>>>>> >>>>>    ----- Before -----
>>>>>> >>>>> 80756348:	mov_s      r13,r0
>>>>>> >>>>> 8075634a:	ld_s       r2,[r13,0]
>>>>>> >>>>> 8075634c:	lsr_s      r2,r2,30
>>>>>> >>>>> 8075634e:	mpy        r2,r2,0x2a4
>>>>>> >>>>> 80756352:	add_s      r2,r2,0x80aef880
>>>>>> >>>>> 80756358:	ld_s       r3,[r2,28]
>>>>>> >>>>> 8075635a:	sub_s      r2,r2,r3
>>>>>> >>>>> 8075635c:	breq       r2,0x2a4,80756378 <kmap+0x48>
>>>>>> >>>>> 80756364:	breq       r2,0x548,80756378 <kmap+0x48>
>>>>>> >>>>>
>>>>>> >>>>>    ----- After  -----
>>>>>> >>>>> 80756330:	mov_s      r13,r0
>>>>>> >>>>> 80756332:	ld_s       r2,[r13,0]
>>>>>> >>>>> 80756334:	lsr_s      r2,r2,30
>>>>>> >>>>> 80756336:	sub_s      r2,r2,1
>>>>>> >>>>> 80756338:	brlo       r2,2,80756348 <kmap+0x30>
>>>>>> >>>>>
>>>>>> >>>>> For x86 defconfig build (32 bit only) it saves around 900 bytes.
>>>>>> >>>>> For ARC defconfig with HIGHMEM, it saved around 2K bytes.
>>>>>> >>>>>
>>>>>> >>>>>    ---->8-------
>>>>>> >>>>> ./scripts/bloat-o-meter x86/vmlinux-defconfig-pre x86/vmlinux-defconfig-post
>>>>>> >>>>> add/remove: 0/0 grow/shrink: 0/36 up/down: 0/-934 (-934)
>>>>>> >>>>> function                                     old     new   delta
>>>>>> >>>>> saveable_page                                162     154      -8
>>>>>> >>>>> saveable_highmem_page                        154     146      -8
>>>>>> >>>>> skb_gro_reset_offset                         147     131     -16
>>>>>> >>>>> ...
>>>>>> >>>>> ...
>>>>>> >>>>> __change_page_attr_set_clr                  1715    1678     -37
>>>>>> >>>>> setup_data_read                              434     394     -40
>>>>>> >>>>> mon_bin_event                               1967    1927     -40
>>>>>> >>>>> swsusp_save                                 1148    1105     -43
>>>>>> >>>>> _set_pages_array                             549     493     -56
>>>>>> >>>>>    ---->8-------
>>>>>> >>>>>
>>>>>> >>>>> e.g. For ARC kmap()
>>>>>> >>>>>
>>>> >>> is_highmem() is deranged.  Can't we use a bit in zone->flags or
>>>> >>> something?
>>> >> It won't be "a" bit since zone_type is an enum.
>> > Yes it will!
>> >
>> > static inline int is_highmem(struct zone *zone)
>> > {
>> > 	return test_bit(ZONE_HIGHMEM, &zone->flags);
>> > }
> Point is do we want to fix this specific case, or do we want to improve zone_idx()
> in general.
>
> #define zone_idx(zone)        ((zone) - (zone)->zone_pgdat->node_zones
>
> If former, I can split up zone->flags into 31:1 bit-field. Otherwise I will split
> it into 24:8 to hold both zone_flags and zone_type


Andrew, what do u think. Do we need to shoehorn type (idx) and flags into same
placeholder or simply add a new zone_idx field.
There will be slight complications with former to avoid any possible collisions
when someone calls set_bit() for flags holder bitfield.

IMHO zone struct is already so bloated, adding another word would be OK specially
when number of zone structures is not too many !

-Vineet

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-11-06 10:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-29  7:54 [PATCH] mm: optimize PageHighMem() check Vineet Gupta
2015-10-01 13:12 ` Michal Hocko
2015-10-01 23:25 ` Andrew Morton
2015-10-02  7:15   ` Vineet Gupta
2015-10-02 20:53     ` Andrew Morton
2015-10-03 10:19       ` Vineet Gupta
2015-11-06 10:21         ` Vineet Gupta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).