From: Jan Kara <jack@suse.cz>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtio-fs@redhat.com, miklos@szeredi.hu, stefanha@redhat.com,
	dgilbert@redhat.com, Dan Williams <dan.j.williams@intel.com>,
	linux-nvdimm@lists.01.org
Subject: Re: [PATCH v2 02/20] dax: Create a range version of dax_layout_busy_page()
Date: Mon, 17 Aug 2020 18:53:39 +0200
Message-ID: <20200817165339.GA22500@quack2.suse.cz>
In-Reply-To: <20200807195526.426056-3-vgoyal@redhat.com>

On Fri 07-08-20 15:55:08, Vivek Goyal wrote:
> virtiofs device has a range of memory which is mapped into file inodes
> using dax. This memory is mapped in qemu on host and maps different
> sections of real file on host. Size of this memory is limited
> (determined by administrator) and depending on filesystem size, we will
> soon reach a situation where all the memory is in use and we need to
> reclaim some.
> 
> As part of reclaim process, we will need to make sure that there are
> no active references to pages (taken by get_user_pages()) on the memory
> range we are trying to reclaim. I am planning to use
> dax_layout_busy_page() for this. But in current form this is per inode
> and scans through all the pages of the inode.
> 
> We want to reclaim only a portion of memory (say 2MB page). So we want
> to make sure that only that 2MB range of pages do not have any
> references  (and don't want to unmap all the pages of inode).
> 
> Hence, create a range version of this function named
> dax_layout_busy_page_range() which can be used to pass a range which
> needs to be unmapped.
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-nvdimm@lists.01.org
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>

The API looks OK. Some comments WRT the implementation below.

> diff --git a/fs/dax.c b/fs/dax.c
> index 11b16729b86f..0d51b0fbb489 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -558,27 +558,20 @@ static void *grab_mapping_entry(struct xa_state *xas,
>  	return xa_mk_internal(VM_FAULT_FALLBACK);
>  }
>  
> -/**
> - * dax_layout_busy_page - find first pinned page in @mapping
> - * @mapping: address space to scan for a page with ref count > 1
> - *
> - * DAX requires ZONE_DEVICE mapped pages. These pages are never
> - * 'onlined' to the page allocator so they are considered idle when
> - * page->count == 1. A filesystem uses this interface to determine if
> - * any page in the mapping is busy, i.e. for DMA, or other
> - * get_user_pages() usages.
> - *
> - * It is expected that the filesystem is holding locks to block the
> - * establishment of new mappings in this address_space. I.e. it expects
> - * to be able to run unmap_mapping_range() and subsequently not race
> - * mapping_mapped() becoming true.
> +/*
> + * Partial pages are included. If end is LLONG_MAX, pages in the range from
> + * start to end of the file are inluded.
>   */

I think the big kerneldoc comment should stay with
dax_layout_busy_page_range(), since dax_layout_busy_page() will be just a
trivial wrapper around it.

> -struct page *dax_layout_busy_page(struct address_space *mapping)
> +struct page *dax_layout_busy_page_range(struct address_space *mapping,
> +					loff_t start, loff_t end)
>  {
> -	XA_STATE(xas, &mapping->i_pages, 0);
>  	void *entry;
>  	unsigned int scanned = 0;
>  	struct page *page = NULL;
> +	pgoff_t start_idx = start >> PAGE_SHIFT;
> +	pgoff_t end_idx = end >> PAGE_SHIFT;
> +	XA_STATE(xas, &mapping->i_pages, start_idx);
> +	loff_t len, lstart = round_down(start, PAGE_SIZE);
>  
>  	/*
>  	 * In the 'limited' case get_user_pages() for dax is disabled.
> @@ -589,6 +582,22 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
>  	if (!dax_mapping(mapping) || !mapping_mapped(mapping))
>  		return NULL;
>  
> +	/* If end == LLONG_MAX, all pages from start to till end of file */
> +	if (end == LLONG_MAX) {
> +		end_idx = ULONG_MAX;
> +		len = 0;
> +	} else {
> +		/* length is being calculated from lstart and not start.
> +		 * This is due to behavior of unmap_mapping_range(). If
> +		 * start is say 4094 and end is on 4096 then we want to
> +		 * unamp two pages, idx 0 and 1. But unmap_mapping_range()
> +		 * will unmap only page at idx 0. If we calculate len
> +		 * from the rounded down start, this problem should not
> +		 * happen.
> +		 */
> +		len = end - lstart + 1;
> +	}

Maybe it would be more understandable to use
	unmap_mapping_pages(mapping, start_idx, end_idx - start_idx + 1, 0);
below and avoid all this rounding and special-casing.
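
To illustrate the index arithmetic, here is a minimal userspace sketch, not
kernel code. It assumes 4 KiB pages (PAGE_SHIFT of 12); struct page_span and
span_from_bytes() are hypothetical names used only for this example. It
models turning the byte range [start, end] into a first page index and a
page count, with end == LLONG_MAX mapped to ULONG_MAX as in the patch.

```c
#include <assert.h>
#include <limits.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical type for this sketch; not a kernel structure. */
struct page_span {
	unsigned long first;	/* index of the first page to unmap */
	unsigned long nr;	/* page count; wraps to 0 for "whole file" */
};

/* Hypothetical helper: convert an inclusive byte range to page indices. */
static struct page_span span_from_bytes(long long start, long long end)
{
	struct page_span s;
	unsigned long end_idx;

	assert(start >= 0 && start <= end);

	s.first = (unsigned long)(start >> PAGE_SHIFT);
	/* Same convention as the patch: LLONG_MAX means "to end of file". */
	end_idx = (end == LLONG_MAX) ? ULONG_MAX
				     : (unsigned long)(end >> PAGE_SHIFT);
	/* Unsigned arithmetic, so 0..ULONG_MAX wraps to a count of 0. */
	s.nr = end_idx - s.first + 1;
	return s;
}
```

With start = 4094 and end = 4096 this gives first = 0 and nr = 2, so both
pages are covered without rounding start down. For the whole-file case the
count wraps to 0, which, as far as I can tell, unmap_mapping_pages() treats
as "up to the last possible index"; that is worth double-checking in
mm/memory.c.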

> +
>  	/*
>  	 * If we race get_user_pages_fast() here either we'll see the
>  	 * elevated page count in the iteration and wait, or
> @@ -601,10 +610,10 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
>  	 * guaranteed to either see new references or prevent new
>  	 * references from being established.
>  	 */
> -	unmap_mapping_range(mapping, 0, 0, 0);
> +	unmap_mapping_range(mapping, start, len, 0);
>  
>  	xas_lock_irq(&xas);
> -	xas_for_each(&xas, entry, ULONG_MAX) {
> +	xas_for_each(&xas, entry, end_idx) {
>  		if (WARN_ON_ONCE(!xa_is_value(entry)))
>  			continue;
>  		if (unlikely(dax_is_locked(entry)))
> @@ -625,6 +634,27 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
>  	xas_unlock_irq(&xas);
>  	return page;
>  }
> +EXPORT_SYMBOL_GPL(dax_layout_busy_page_range);
> +
> +/**
> + * dax_layout_busy_page - find first pinned page in @mapping
> + * @mapping: address space to scan for a page with ref count > 1
> + *
> + * DAX requires ZONE_DEVICE mapped pages. These pages are never
> + * 'onlined' to the page allocator so they are considered idle when
> + * page->count == 1. A filesystem uses this interface to determine if
> + * any page in the mapping is busy, i.e. for DMA, or other
> + * get_user_pages() usages.
> + *
> + * It is expected that the filesystem is holding locks to block the
> + * establishment of new mappings in this address_space. I.e. it expects
> + * to be able to run unmap_mapping_range() and subsequently not race
> + * mapping_mapped() becoming true.
> + */
> +struct page *dax_layout_busy_page(struct address_space *mapping)
> +{
> +	return dax_layout_busy_page_range(mapping, 0, 0);

Should the 'end' rather be LLONG_MAX?
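
A small userspace sketch of why this matters, again assuming 4 KiB pages;
scan_end_idx() is a hypothetical model of how the patch derives end_idx from
'end', not an existing kernel function.

```c
#include <limits.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical model of the end_idx computation in the quoted patch. */
static unsigned long scan_end_idx(long long end)
{
	/* The patch treats LLONG_MAX as "scan to the end of the file". */
	if (end == LLONG_MAX)
		return ULONG_MAX;
	return (unsigned long)(end >> PAGE_SHIFT);
}
```

Here scan_end_idx(0) is 0, so a wrapper passing (0, 0) would unmap and scan
only the page at index 0, while scan_end_idx(LLONG_MAX) is ULONG_MAX and
covers the whole mapping as the old dax_layout_busy_page() did.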

Otherwise the patch looks good to me.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
