From: Muhammad Usama Anjum <usama.anjum@collabora.com>
To: Andrei Vagin <avagin@gmail.com>
Cc: "Muhammad Usama Anjum" <usama.anjum@collabora.com>,
	"Michał Mirosław" <emmir@google.com>,
	"Danylo Mocherniuk" <mdanylo@google.com>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Greg KH" <gregkh@linuxfoundation.org>,
	"Christian Brauner" <brauner@kernel.org>,
	"Peter Xu" <peterx@redhat.com>, "Yang Shi" <shy828301@gmail.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Zach O'Keefe" <zokeefe@google.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Gustavo A. R. Silva" <gustavoars@kernel.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	kernel@collabora.com,
	"Gabriel Krisman Bertazi" <krisman@collabora.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Peter Enderborg" <peter.enderborg@sony.com>,
	"open list : KERNEL SELFTEST FRAMEWORK"
	<linux-kselftest@vger.kernel.org>,
	"Shuah Khan" <shuah@kernel.org>,
	"open list" <linux-kernel@vger.kernel.org>,
	"open list : PROC FILESYSTEM" <linux-fsdevel@vger.kernel.org>,
	"open list : MEMORY MANAGEMENT" <linux-mm@kvack.org>,
	"Paul Gofman" <pgofman@codeweavers.com>
Subject: Re: [PATCH v6 2/3] fs/proc/task_mmu: Implement IOCTL to get and/or the clear info about PTEs
Date: Fri, 11 Nov 2022 15:10:01 +0500	[thread overview]
Message-ID: <b5a67c87-e901-4d0c-8367-d1bf1293d5c4@collabora.com> (raw)
In-Reply-To: <Y2w9sWZf5mlNV7Z3@gmail.com>

Hello Andrei,

Thank you for reviewing.

On 11/10/22 4:54 AM, Andrei Vagin wrote:
[...]
>> +static int add_to_out(bool sd, bool file, bool pres, bool swap, struct pagemap_scan_private *p,
>> +		      unsigned long addr, unsigned int len)
>> +{
>> +	unsigned long bitmap, cur = sd | file << 1 | pres << 2 | swap << 3;
> 
> Should we define constants for each of these bits?
I think I can define a macro to hide this messy bit shifting inside the function.

> 
>> +	bool cpy = true;
>> +
>> +	if (p->required_mask)
>> +		cpy = ((p->required_mask & cur) == p->required_mask);
>> +	if (cpy && p->anyof_mask)
>> +		cpy = (p->anyof_mask & cur);
>> +	if (cpy && p->excluded_mask)
>> +		cpy = !(p->excluded_mask & cur);
>> +
>> +	bitmap = cur & p->return_mask;
>> +
>> +	if (cpy && bitmap) {
>> +		if ((p->vec_index) && (p->vec[p->vec_index - 1].bitmap == bitmap) &&
>> +		    (p->vec[p->vec_index - 1].start + p->vec[p->vec_index - 1].len * PAGE_SIZE ==
>> +		     addr)) {
> 
> I think it is better to define a variable for p->vec_index - 1.
Will do in the next revision.

> nit: len can be in bytes rather than pages.
We are considering memory in page units. The memory given to this IOCTL
must be PAGE_SIZE-aligned. Otherwise we error out (this behavior is picked
from mincore()).

>> +static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long addr,
>> +				  unsigned long end, struct mm_walk *walk)
>> +{
>> +	struct pagemap_scan_private *p = walk->private;
>> +	struct vm_area_struct *vma = walk->vma;
>> +	unsigned int len;
>> +	spinlock_t *ptl;
>> +	int ret = 0;
>> +	pte_t *pte;
>> +	bool dirty_vma = (p->flags & PAGEMAP_NO_REUSED_REGIONS) ?
>> +			 (false) : (vma->vm_flags & VM_SOFTDIRTY);
>> +
>> +	if ((walk->vma->vm_end < addr) || (p->max_pages && p->found_pages == p->max_pages))
>> +		return 0;
>> +
>> +	end = min(end, walk->vma->vm_end);
>> +
>> +	ptl = pmd_trans_huge_lock(pmd, vma);
>> +	if (ptl) {
>> +		if (dirty_vma || check_soft_dirty_pmd(vma, addr, pmd, false)) {
>> +			/*
>> +			 * Break huge page into small pages if operation needs to be performed is
>> +			 * on a portion of the huge page or the return buffer cannot store complete
>> +			 * data.
>> +			 */
>> +			if ((IS_CLEAR_OP(p) && (end - addr < HPAGE_SIZE))) {
>> +				spin_unlock(ptl);
>> +				split_huge_pmd(vma, pmd, addr);
>> +				goto process_smaller_pages;
>> +			}
>> +
>> +			if (IS_GET_OP(p)) {
>> +				len = (end - addr)/PAGE_SIZE;
>> +				if (p->max_pages && p->found_pages + len > p->max_pages)
>> +					len = p->max_pages - p->found_pages;
>> +
>> +				ret = add_to_out(dirty_vma ||
>> +						 check_soft_dirty_pmd(vma, addr, pmd, false),
> 
> can we reuse a return code of the previous call of check_soft_dirty_pmd?
Yes, will do.

> 
>> +						 vma->vm_file, pmd_present(*pmd), is_swap_pmd(*pmd),
>> +						 p, addr, len);
>> +			}
>> +			if (!ret && IS_CLEAR_OP(p))
>> +				check_soft_dirty_pmd(vma, addr, pmd, true);
> 
> should we return an error in this case? We need to be sure that:
> * we stop walking page tables after this point.
I'll update the implementation to return an error. It immediately terminates
the walk as well.
> * return this error to the user-space if we are not able to add anything
>   in the vector.
I'm not returning an error to userspace if no page matched the masks. The
total number of filled page_region structures is returned from the IOCTL.
If the IOCTL returns 0, it means no matching page was found, but the IOCTL
executed successfully.

[...]
>> +static long do_pagemap_sd_cmd(struct mm_struct *mm, struct pagemap_scan_arg *arg)
>> +{
>> +	struct mmu_notifier_range range;
>> +	unsigned long __user start, end;
>> +	struct pagemap_scan_private p;
>> +	int ret;
>> +
>> +	start = (unsigned long)untagged_addr(arg->start);
>> +	if ((!IS_ALIGNED(start, PAGE_SIZE)) || (!access_ok((void __user *)start, arg->len)))
>> +		return -EINVAL;
>> +
>> +	if (IS_GET_OP(arg) &&
>> +	    ((arg->vec_len == 0) || (!access_ok((struct page_region *)arg->vec, arg->vec_len))))
>> +		return -ENOMEM;
>> +
>> +#ifndef CONFIG_MEM_SOFT_DIRTY
>> +	if (IS_SD_OP(arg) || (arg->required_mask & PAGE_IS_SOFTDIRTY) ||
>> +	    (arg->anyof_mask & PAGE_IS_SOFTDIRTY))
>> +		return -EINVAL;
>> +#endif
>> +
>> +	if ((arg->flags & ~PAGEMAP_SD_FLAGS) || (arg->required_mask & ~PAGEMAP_OP_MASK) ||
>> +	    (arg->anyof_mask & ~PAGEMAP_OP_MASK) || (arg->excluded_mask & ~PAGEMAP_OP_MASK) ||
>> +	    (arg->return_mask & ~PAGEMAP_OP_MASK))
>> +		return -EINVAL;
>> +
>> +	if ((!arg->required_mask && !arg->anyof_mask && !arg->excluded_mask) || !arg->return_mask)
>> +		return -EINVAL;
>> +
>> +	if (IS_SD_OP(arg) && ((arg->required_mask & PAGEMAP_NONSD_OP_MASK) ||
>> +	     (arg->anyof_mask & PAGEMAP_NONSD_OP_MASK)))
>> +		return -EINVAL;
>> +
>> +	end = start + arg->len;
>> +	p.max_pages = arg->max_pages;
>> +	p.found_pages = 0;
>> +	p.flags = arg->flags;
>> +	p.required_mask = arg->required_mask;
>> +	p.anyof_mask = arg->anyof_mask;
>> +	p.excluded_mask = arg->excluded_mask;
>> +	p.return_mask = arg->return_mask;
>> +	p.vec_index = 0;
>> +	p.vec_len = arg->vec_len;
>> +
>> +	if (IS_GET_OP(arg)) {
>> +		p.vec = vzalloc(arg->vec_len * sizeof(struct page_region));
> 
> I think we need to set a reasonable limit for vec_len to avoid large
> allocations on the kernel. We can consider to use kmalloc or kvmalloc
> here.
I'll update it to kvzalloc(), which uses kmalloc for smaller allocations
and falls back to vmalloc when kmalloc fails. Thanks for suggesting it. But
by itself it won't limit the size of the allocation.

> 
> Thanks,
> Andrei

-- 
BR,
Muhammad Usama Anjum

