linux-kernel.vger.kernel.org archive mirror
* PG_reserved and compound pages
From: Frank Mehnert @ 2016-04-06 11:28 UTC
  To: linux-kernel

Hi,

Linux 4.5 introduced additional checks to ensure that compound pages are
never marked as reserved. In our code we use PG_reserved to ensure that
the kernel never swaps out such pages, e.g.

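  int i;
  /* order 4: one compound allocation of 1 << 4 = 16 contiguous pages */
  struct page *pages = alloc_pages(GFP_HIGHUSER | __GFP_COMP, 4);

  if (!pages)
    return -ENOMEM;
  for (i = 0; i < 16; i++)
    SetPageReserved(&pages[i]);  /* rejected for compound pages since 4.5 */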

The purpose of setting PG_reserved is to prevent the kernel from swapping
this memory out. This worked with older kernels but not with Linux 4.5, as
setting PG_reserved on compound pages is not allowed any more.
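The rule can be illustrated with a small userspace model. This is not kernel code: struct fake_page, make_compound(), model_set_reserved() and model_reserve_all() are hypothetical names. The model only captures the idea that, as far as we can tell from the 4.5 page-flags rework, PG_reserved now uses a "no compound pages" policy (PF_NO_COMPOUND in include/linux/page-flags.h), so none of the 16 pages of an order-4 __GFP_COMP allocation may carry it. In the kernel a violation is caught by a debug assertion rather than reported through a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page, reduced to what this model needs. */
struct fake_page {
    bool compound;  /* belongs to a __GFP_COMP allocation (head or tail) */
    bool head;      /* first page of that allocation */
    bool reserved;  /* models PG_reserved */
};

/* An order-n allocation covers 1 << n base pages: order 4 -> 16 pages. */
static size_t pages_in_order(unsigned int order)
{
    return (size_t)1 << order;
}

/* Model an order-n __GFP_COMP allocation: page 0 is the head, the rest are tails. */
static void make_compound(struct fake_page *pages, unsigned int order)
{
    size_t i, n = pages_in_order(order);

    assert(pages != NULL);
    for (i = 0; i < n; i++) {
        pages[i].compound = true;
        pages[i].head = (i == 0);
        pages[i].reserved = false;
    }
}

/*
 * Model of a "no compound pages" policy for PG_reserved: refuse the flag
 * on any page that is part of a compound allocation. Returns true only
 * if the flag was actually set.
 */
static bool model_set_reserved(struct fake_page *page)
{
    if (page->compound)
        return false;  /* the kernel asserts here instead of returning */
    page->reserved = true;
    return true;
}

/* Try to reserve every page of an order-n allocation; return how many accepted. */
static size_t model_reserve_all(struct fake_page *pages, unsigned int order)
{
    size_t i, accepted = 0, n = pages_in_order(order);

    for (i = 0; i < n; i++)
        accepted += model_set_reserved(&pages[i]);
    return accepted;
}
```

After make_compound(p, 4) on a 16-entry array, model_reserve_all(p, 4) accepts none of the pages, which is the model's version of the loop above no longer being acceptable; a lone non-compound page still accepts the flag.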

Can somebody explain how we can achieve the same result in accordance with
the new Linux 4.5 rules?

Thanks,

Frank
-- 
Dr.-Ing. Frank Mehnert | Software Development Director, VirtualBox
ORACLE Deutschland B.V. & Co. KG | Werkstr. 24 | 71384 Weinstadt, Germany

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstraße 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Alexander van der Ven, Jan Schultheiss, Val Maher

* Re: PG_reserved and compound pages
From: Michal Hocko @ 2016-04-06 15:02 UTC
  To: Frank Mehnert; +Cc: linux-kernel, linux-mm

[CCing linux-mm mailing list]

On Wed 06-04-16 13:28:37, Frank Mehnert wrote:
> Hi,
> 
> Linux 4.5 introduced additional checks to ensure that compound pages are
> never marked as reserved. In our code we use PG_reserved to ensure that
> the kernel never swaps out such pages, e.g.

Are you putting your pages on the LRU list? If not, how could they get
swapped out?

> 
>   int i;
>   struct page *pages = alloc_pages(GFP_HIGHUSER | __GFP_COMP, 4);
>   for (i = 0; i < 16; i++)
>     SetPageReserved(&pages[i]);
> 
> The purpose of setting PG_reserved is to prevent the kernel from swapping
> this memory out. This worked with older kernels but not with Linux 4.5, as
> setting PG_reserved on compound pages is not allowed any more.
> 
> Can somebody explain how we can achieve the same result in accordance with
> the new Linux 4.5 rules?
> 
> Thanks,
> 
> Frank
-- 
Michal Hocko
SUSE Labs

* Re: Re: PG_reserved and compound pages
From: Frank Mehnert @ 2016-04-06 15:12 UTC
  To: Michal Hocko; +Cc: linux-kernel, linux-mm

Hi Michal,

On Wednesday 06 April 2016 17:02:06 Michal Hocko wrote:
> [CCing linux-mm mailing list]
> 
> On Wed 06-04-16 13:28:37, Frank Mehnert wrote:
> > Hi,
> > 
> > Linux 4.5 introduced additional checks to ensure that compound pages are
> > never marked as reserved. In our code we use PG_reserved to ensure that
> > the kernel never swaps out such pages, e.g.
> 
> Are you putting your pages on the LRU list? If not, how could they get
> swapped out?

No, we do nothing like that. My understanding was that, at least with
older kernels, pages allocated with alloc_pages() could be swapped out
or otherwise manipulated, but I might be wrong. Also, we need the
physical address of each page to be known and never to change. I know
there might be problems with automatic NUMA page migration, but that's
another story.

> >   int i;
> >   struct page *pages = alloc_pages(GFP_HIGHUSER | __GFP_COMP, 4);
> >   for (i = 0; i < 16; i++)
> >     SetPageReserved(&pages[i]);
> > 
> > The purpose of setting PG_reserved is to prevent the kernel from swapping
> > this memory out. This worked with older kernels but not with Linux 4.5, as
> > setting PG_reserved on compound pages is not allowed any more.
> > 
> > Can somebody explain how we can achieve the same result in accordance with
> > the new Linux 4.5 rules?

Frank

* Re: Re: PG_reserved and compound pages
From: Michal Hocko @ 2016-04-06 15:33 UTC
  To: Frank Mehnert; +Cc: linux-kernel, linux-mm

On Wed 06-04-16 17:12:43, Frank Mehnert wrote:
> Hi Michal,
> 
> On Wednesday 06 April 2016 17:02:06 Michal Hocko wrote:
> > [CCing linux-mm mailing list]
> > 
> > On Wed 06-04-16 13:28:37, Frank Mehnert wrote:
> > > Hi,
> > > 
> > > Linux 4.5 introduced additional checks to ensure that compound pages are
> > > never marked as reserved. In our code we use PG_reserved to ensure that
> > > the kernel never swaps out such pages, e.g.
> > 
> > Are you putting your pages on the LRU list? If not, how could they get
> > swapped out?
> 
> No, we do nothing like that. My understanding was that, at least with
> older kernels, pages allocated with alloc_pages() could be swapped out
> or otherwise manipulated, but I might be wrong.

I do not see anything like that. All evictable pages should be on
an LRU.
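Michal's point can be pictured with another hypothetical userspace model: reclaim only ever walks the LRU lists, so a page that a driver obtained from alloc_pages() and never put on an LRU is never a candidate for eviction, whatever its flags. struct model_page, struct model_lru, lru_add() and model_reclaim() are invented for illustration; real reclaim keeps several LRU lists (active/inactive, anon/file, unevictable) and applies many more checks than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page: reclaim can only see it while it is linked on the LRU. */
struct model_page {
    struct model_page *lru_next;  /* next page on the model LRU list */
    bool on_lru;
    bool evicted;                 /* set when reclaim swaps out or drops the page */
};

/* A single LRU list; the kernel keeps several, this model keeps one. */
struct model_lru {
    struct model_page *head;
};

/* Anonymous and page-cache pages get added to an LRU; driver pages need not be. */
static void lru_add(struct model_lru *lru, struct model_page *page)
{
    assert(!page->on_lru);
    page->lru_next = lru->head;
    lru->head = page;
    page->on_lru = true;
}

/*
 * Model reclaim pass: evict everything found on the LRU list and nothing
 * else. Pages that were never added are simply never visited.
 */
static size_t model_reclaim(struct model_lru *lru)
{
    size_t evicted = 0;

    while (lru->head != NULL) {
        struct model_page *page = lru->head;

        lru->head = page->lru_next;   /* unlink from the LRU */
        page->lru_next = NULL;
        page->on_lru = false;
        page->evicted = true;
        evicted++;
    }
    return evicted;
}
```

In this model a page that was passed to lru_add() is evicted by model_reclaim(), while a page that never was stays untouched; no PG_reserved-like flag is needed to protect it.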

> Also, we need the physical address of each page to be known and never
> to change. I know there might be problems with automatic NUMA page
> migration, but that's another story.

Do you map your pages to userspace? If so, a VMA with VM_IO or
VM_PFNMAP should keep any such attempts away from those pages.
-- 
Michal Hocko
SUSE Labs

* Re: Re: Re: PG_reserved and compound pages
From: Frank Mehnert @ 2016-04-07 13:45 UTC
  To: Michal Hocko; +Cc: linux-kernel, linux-mm

On Wednesday 06 April 2016 17:33:43 Michal Hocko wrote:
> On Wed 06-04-16 17:12:43, Frank Mehnert wrote:
> > Hi Michal,
> > 
> > On Wednesday 06 April 2016 17:02:06 Michal Hocko wrote:
> > > [CCing linux-mm mailing list]
> > > 
> > > On Wed 06-04-16 13:28:37, Frank Mehnert wrote:
> > > > Hi,
> > > > 
> > > > Linux 4.5 introduced additional checks to ensure that compound pages are
> > > > never marked as reserved. In our code we use PG_reserved to ensure that
> > > > the kernel never swaps out such pages, e.g.
> > > 
> > > Are you putting your pages on the LRU list? If not, how could they get
> > > swapped out?
> > 
> > No, we do nothing like that. My understanding was that, at least with
> > older kernels, pages allocated with alloc_pages() could be swapped out
> > or otherwise manipulated, but I might be wrong.
> 
> I do not see anything like that. All evictable pages should be on
> an LRU.

OK. It seems to work if I simply don't mark these pages as PG_reserved.
I need to do further tests.

> > Also, we need the physical address of each page to be known and never
> > to change. I know there might be problems with automatic NUMA page
> > migration, but that's another story.
> 
> Do you map your pages to userspace? If so, a VMA with VM_IO or
> VM_PFNMAP should keep any such attempts away from those pages.

Yes, such memory objects are also mapped to userland. Do you think that
VM_IO or VM_PFNMAP would guard against NUMA page migration? I ask because
when NUMA page migration was introduced (I believe with Linux 3.8), I
tested both flags and saw that they didn't prevent migration on such VM
areas. Maybe this has changed in the meantime; do you have more
information about that?

The drawback of VM_IO, at least, is that such memory is not included in a
core dump. Currently we use vm_insert_page() for the userland mapping and
mark the VM areas as

  VM_DONTEXPAND | VM_DONTDUMP

We used VM_RESERVED for pre-3.7 kernels (the old documentation says
``VM_RESERVED tells the memory management system not to attempt to swap
out this VMA; it should be set in most device mappings.''), but this no
longer works for 3.7+ kernels.

Thanks,

Frank

* Re: Re: Re: PG_reserved and compound pages
From: Michal Hocko @ 2016-04-07 15:22 UTC
  To: Frank Mehnert; +Cc: linux-kernel, linux-mm

On Thu 07-04-16 15:45:02, Frank Mehnert wrote:
> On Wednesday 06 April 2016 17:33:43 Michal Hocko wrote:
[...]
> > Do you map your pages to the userspace? If yes then vma with VM_IO or
> > VM_PFNMAP should keep any attempt away from those pages.
> 
> Yes, such memory objects are also mapped to userland. Do you think that
> VM_IO or VM_PFNMAP would guard against NUMA page migration?

Both automatic NUMA balancing and manual NUMA migration check
vma_migratable(), which excludes both VM flags.
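The vma_migratable() argument reduces to a flag test, sketched here as a userspace model. The MODEL_VM_* bits are placeholders chosen for this model, not the kernel's VM_* values, and model_vma_migratable() is a hypothetical name that keeps only the VM_IO/VM_PFNMAP test; the real helper in include/linux/mempolicy.h has further conditions (hugetlb, file-backed zone limits) that are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bits for this model only; the real VM_* values live in include/linux/mm.h. */
enum {
    MODEL_VM_PFNMAP     = 1u << 0,
    MODEL_VM_IO         = 1u << 1,
    MODEL_VM_DONTEXPAND = 1u << 2,
    MODEL_VM_DONTDUMP   = 1u << 3,
};

/* The model relies on each flag being a distinct bit. */
_Static_assert((MODEL_VM_IO & MODEL_VM_PFNMAP) == 0, "flags must not overlap");

/*
 * Reduced model of the vma_migratable() test discussed above: a mapping
 * carrying VM_IO or VM_PFNMAP is never a migration candidate.
 */
static bool model_vma_migratable(unsigned long vm_flags)
{
    return (vm_flags & (MODEL_VM_IO | MODEL_VM_PFNMAP)) == 0;
}
```

In the model, VM_DONTEXPAND | VM_DONTDUMP on their own leave a mapping migratable; only adding VM_IO or VM_PFNMAP opts it out. Whether other parts of the real kernel exclude such a mapping is outside what this sketch covers.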

> I ask because
> when NUMA page migration was introduced (I believe with Linux 3.8), I
> tested both flags and saw that they didn't prevent migration on such VM
> areas. Maybe this has changed in the meantime; do you have more
> information about that?

I haven't checked the history much, but vma_migratable() should have been
around for quite some time. Maybe it wasn't used in the past. Dunno.

> The drawback of VM_IO, at least, is that such memory is not included in a
> core dump.

That seems to be correct, as per vma_dump_size().

> Currently we use vm_insert_page() for the userland mapping and
> mark the VM areas as
> 
>   VM_DONTEXPAND | VM_DONTDUMP

But that means it won't end up in the dump either. Or am I missing
your point?
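The core-dump side of the trade-off can be modelled the same way. model_vma_dump_size() is a hypothetical reduction of the two vma_dump_size() checks mentioned in the thread: a mapping with VM_DONTDUMP or VM_IO contributes nothing to the dump. The MODEL_VM_* bits are placeholders, not the kernel's values, and the real function (in fs/binfmt_elf.c) also handles the vdso, hugetlb mappings and the coredump_filter settings, none of which are modelled; for everything else the model simply assumes the whole mapping is dumped.

```c
#include <assert.h>

/* Placeholder bits for this model only; the real VM_* values live in include/linux/mm.h. */
enum {
    MODEL_VM_IO         = 1u << 0,
    MODEL_VM_DONTEXPAND = 1u << 1,
    MODEL_VM_DONTDUMP   = 1u << 2,
};

/*
 * Reduced model of the vma_dump_size() checks discussed above. Returns
 * how many bytes of a mapping of vma_len bytes would be written to a
 * core dump.
 */
static unsigned long model_vma_dump_size(unsigned long vm_flags,
                                         unsigned long vma_len)
{
    if (vm_flags & MODEL_VM_DONTDUMP)
        return 0;          /* explicitly excluded from dumps */
    if (vm_flags & MODEL_VM_IO)
        return 0;          /* I/O mappings are not dumped */
    return vma_len;        /* model assumption: everything else is dumped whole */
}
```

Under this model, both VM_IO and the VM_DONTEXPAND | VM_DONTDUMP combination yield a dump size of 0, which is the point Michal makes: neither choice puts these pages into a core dump.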

-- 
Michal Hocko
SUSE Labs

* Re: Re: Re: Re: PG_reserved and compound pages
From: Frank Mehnert @ 2016-04-19 10:34 UTC
  To: Michal Hocko; +Cc: linux-kernel, linux-mm

Hi Michal,

On Thursday 07 April 2016 17:22:35 Michal Hocko wrote:
> On Thu 07-04-16 15:45:02, Frank Mehnert wrote:
> > On Wednesday 06 April 2016 17:33:43 Michal Hocko wrote:
> [...]
> 
> > > Do you map your pages to the userspace? If yes then vma with VM_IO or
> > > VM_PFNMAP should keep any attempt away from those pages.
> > 
> > Yes, such memory objects are also mapped to userland. Do you think that
> > VM_IO or VM_PFNMAP would guard against NUMA page migration?
> 
> Both automatic NUMA balancing and manual NUMA migration check
> vma_migratable(), which excludes both VM flags.
> 
> > I ask because
> > when NUMA page migration was introduced (I believe with Linux 3.8), I
> > tested both flags and saw that they didn't prevent migration on such VM
> > areas. Maybe this has changed in the meantime; do you have more
> > information about that?
> 
> I haven't checked the history much, but vma_migratable() should have been
> around for quite some time. Maybe it wasn't used in the past. Dunno.

I did some further tests and, indeed, with Linux 3.8 through 3.12 I was
able to reproduce NUMA page faults, while with Linux 3.14 (3.13 didn't run
on my hardware for some reason) I can no longer reproduce them. The
important point is that with Linux 3.8, all pages are unmapped from time
to time, and the page fault handler then decides whether the page should
be migrated to another NUMA node. So even if vma_migratable() returns
false, it wouldn't help us, because the page has already faulted.

But as I said, the behaviour with Linux >= 3.14 is different, which helps
us a lot!
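The difference Frank describes can be expressed as a toy model of one NUMA-balancing scan pass. Everything here is hypothetical (struct model_vma, model_numa_scan() and the check_migratable switch, plus placeholder MODEL_VM_* bits): the switch only stands for "does the scanner skip non-migratable mappings before unmapping their pages", which is how the 3.8 to 3.12 versus 3.14+ behaviour is described above. The model does not claim where or when such a check was added to the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bits for this model only, not the kernel's VM_* values. */
enum { MODEL_VM_PFNMAP = 1u << 0, MODEL_VM_IO = 1u << 1 };

struct model_vma {
    unsigned long vm_flags;
    size_t nr_pages;
    size_t nr_armed;   /* pages unmapped so their next access takes a hinting fault */
};

static bool model_vma_migratable(const struct model_vma *vma)
{
    return (vma->vm_flags & (MODEL_VM_IO | MODEL_VM_PFNMAP)) == 0;
}

/*
 * One scan pass over n VMAs; returns how many pages were armed for a
 * NUMA hinting fault. With check_migratable == false every page is armed
 * (the behaviour reported for 3.8 to 3.12), so a VM_IO mapping still
 * faults even if a later migration decision would reject it. With
 * check_migratable == true non-migratable VMAs are skipped up front,
 * matching what was observed on 3.14 and later.
 */
static size_t model_numa_scan(struct model_vma *vmas, size_t n,
                              bool check_migratable)
{
    size_t i, armed = 0;

    assert(vmas != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (check_migratable && !model_vma_migratable(&vmas[i]))
            continue;  /* never unmapped, so it never takes a hinting fault */
        vmas[i].nr_armed = vmas[i].nr_pages;
        armed += vmas[i].nr_pages;
    }
    return armed;
}
```

With one 16-page VM_IO mapping and one ordinary 8-page mapping, the pass without the check arms all 24 pages, while the pass with the check arms only the 8 ordinary ones.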

> > The drawback of VM_IO, at least, is that such memory is not included in a
> > core dump.
> 
> That seems to be correct, as per vma_dump_size().
> 
> > Currently we use vm_insert_page() for the userland mapping and
> > mark the VM areas as
> > 
> >   VM_DONTEXPAND | VM_DONTDUMP
> 
> But that means it won't end up in the dump either. Or am I missing
> your point?

I guess you are right, and these pages probably don't end up in core
dumps either. We can live with that.

Thank you for your suggestions and explanations, it was very helpful!

Frank