Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page

From: Anshuman Khandual <anshuman.khandual@arm.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	kirill@shutemov.name, kirill.shutemov@linux.intel.com,
	vbabka@suse.cz, will.deacon@arm.com, dave.hansen@intel.com
Subject: Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page
Date: Mon, 18 Feb 2019 08:37:11 +0530	[thread overview]
Message-ID: <bd382e05-487e-06c8-9239-a2303f48b578@arm.com> (raw)
In-Reply-To: <20190215092746.GU4525@dhcp22.suse.cz>

On 02/15/2019 02:57 PM, Michal Hocko wrote:
> On Fri 15-02-19 14:15:58, Anshuman Khandual wrote:
>> On 02/14/2019 05:58 PM, Michal Hocko wrote:
>>> It is hard to assume any further access for migrated pages here. Then we
>>> have an explicit move_pages syscall and I would expect this to be
>>> somewhere in the middle. One would expect that the caller knows why the
>>> memory is migrated and it will be used but again, we cannot really
>>> assume anything.
>>
>> What if the caller knows that it wont be used ever again or in near future
>> and hence trying to migrate to a different node which has less expensive and
>> slower memory. Kernel should not assume either way on it but can decide to
>> be conservative in spending time in preparing for future exec faults.
>>
>> But being conservative during migration risks additional exec faults which
>> would have been avoided if exec permission should have stayed on followed
>> by an I-cache invalidation. Deferral of the I-cache invalidation requires
>> removing the exec permission completely (unless there is some magic which
>> I am not aware about) i.e unmapping page for exec permission and risking
>> an exec fault next time around.
>>
>> This problem gets particularly amplified for mixed permission (WRITE | EXEC)
>> user space mappings where things like NUMA migration, compaction etc probably
>> gets triggered by write faults and additional exec permission there never
>> really gets used.
> 
> Please quantify that and provide us with some _data_> 
>>> This would suggest that this depends on the migration reason quite a
>>> lot. So I would really like to see a more comprehensive analysis of
>>> different workloads to see whether this is really worth it.
>>
>> Sure. Could you please give some more details on how to go about this and
>> what specifically you are looking for ?
> 
> You are proposing an optimization without actually providing any
> justification. The overhead is not removed it is just shifted from one
> path to another. So you should have some pretty convincing arguments
> to back that shift as a general win. You can go an test on wider range
> of workloads and isolate the worst/best case behavior. I fully realize
> that this is tedious. Another option would be to define conditions when
> the optimization is going to be a huge win and have some convincing

Yeah conditional approach might narrow down the field and provide better
probability for a general win. The system call (move_pages/mbind) based
migrations from the user space are better placed for an win because the
user might just want to put those pages aside for rare exec accesses in
the future and the worst case cost for those deferral is not too high as
well. A hint regarding probable rare exec access in the future for the
kernel would have been better but I am afraid it would then require a new
user interface. But I think lazy exec decision can be taken right away
for MR_SYSCALL triggered migrations for VMAs with mixed permission
([VM_READ]|VM_WRITE|VM_EXEC) knowing the fact that in worst case the
cost is just getting migrated.

MR_NUMA_MISPLACED triggered migrations requires explicit tracking of fault
type (exec/write/[read]) per VMA along with it's applicable permission to
determine if exec permission deferral would be helpful or not. These stats
can also be used for all other kernel or user initiated migrations like
MR_COMPACTION, MR_MEMORY_FAILURE, MR_MEMORY_HOTPLUG and MR_CONTIG_RANGE.

Would it be worth adding explicit fault type tracking per VMA ? Can it be
used for some other purpose as well.

> arguments that many/most workloads are falling into that category while
> pathological ones are not suffering much.
> 
> This is no different from any other optimizations/heuristics we have.

Sure. Will think about this further.

> 
> Btw. have you considered to have this optimization conditional based on
> the migration reason or vma flags?

Started considering it after our discussions here. It makes sense to look
into the migration reason and the VMA flags right away but as I mentioned
earlier VMA fault type stats can really help on this as well.