All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Song Liu <songliubraving@fb.com>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, matthew.wilcox@oracle.com,
	kirill.shutemov@linux.intel.com, kernel-team@fb.com,
	william.kucharski@oracle.com, akpm@linux-foundation.org,
	hdanton@sina.com
Subject: Re: [PATCH v9 5/6] mm,thp: add read-only THP support for (non-shmem) FS
Date: Wed, 10 Jul 2019 14:48:11 -0400	[thread overview]
Message-ID: <20190710184811.GF11197@cmpxchg.org> (raw)
In-Reply-To: <20190625001246.685563-6-songliubraving@fb.com>

On Mon, Jun 24, 2019 at 05:12:45PM -0700, Song Liu wrote:
> This patch is (hopefully) the first step to enable THP for non-shmem
> filesystems.
> 
> This patch enables an application to put part of its text sections to THP
> via madvise, for example:
> 
>     madvise((void *)0x600000, 0x200000, MADV_HUGEPAGE);
> 
> We tried to reuse the logic for THP on tmpfs.
> 
> Currently, write is not supported for non-shmem THP. khugepaged will only
> process vma with VM_DENYWRITE. sys_mmap() ignores VM_DENYWRITE requests
> (see ksys_mmap_pgoff). The only way to create vma with VM_DENYWRITE is
> execve(). This requirement limits non-shmem THP to text sections.
> 
> The next patch will handle writes, which would only happen when the all
> the vmas with VM_DENYWRITE are unmapped.
> 
> An EXPERIMENTAL config, READ_ONLY_THP_FOR_FS, is added to gate this
> feature.
> 
> Acked-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>

This is really cool, and less invasive than I anticipated. Nice work.

I only have one concern and one question:

> @@ -1392,6 +1401,29 @@ static void collapse_file(struct mm_struct *mm,
>  				result = SCAN_FAIL;
>  				goto xa_unlocked;
>  			}
> +		} else if (!page || xa_is_value(page)) {
> +			xas_unlock_irq(&xas);
> +			page_cache_sync_readahead(mapping, &file->f_ra, file,
> +						  index, PAGE_SIZE);
> +			/* drain pagevecs to help isolate_lru_page() */
> +			lru_add_drain();
> +			page = find_lock_page(mapping, index);
> +			if (unlikely(page == NULL)) {
> +				result = SCAN_FAIL;
> +				goto xa_unlocked;
> +			}
> +		} else if (!PageUptodate(page)) {
> +			VM_BUG_ON(is_shmem);
> +			xas_unlock_irq(&xas);
> +			wait_on_page_locked(page);
> +			if (!trylock_page(page)) {
> +				result = SCAN_PAGE_LOCK;
> +				goto xa_unlocked;
> +			}
> +			get_page(page);
> +		} else if (!is_shmem && PageDirty(page)) {
> +			result = SCAN_FAIL;
> +			goto xa_locked;
>  		} else if (trylock_page(page)) {
>  			get_page(page);
>  			xas_unlock_irq(&xas);

The many else ifs here check fairly complex page state and are hard to
follow and verify mentally. In fact, it's a bit easier now in the
patch when you see how it *used* to work with just shmem, but the end
result is fragile from a maintenance POV.

The shmem and file cases have little in common - basically only the
trylock_page(). Can you please make one big 'if (is_shmem) {} {}'
structure instead that keeps those two scenarios separate?

> @@ -1426,6 +1458,12 @@ static void collapse_file(struct mm_struct *mm,
>  			goto out_unlock;
>  		}
>  
> +		if (page_has_private(page) &&
> +		    !try_to_release_page(page, GFP_KERNEL)) {
> +			result = SCAN_PAGE_HAS_PRIVATE;
> +			break;
> +		}
> +
>  		if (page_mapped(page))
>  			unmap_mapping_pages(mapping, index, 1, false);

> @@ -1607,6 +1658,17 @@ static void khugepaged_scan_file(struct mm_struct *mm,
>  			break;
>  		}
>  
> +		if (page_has_private(page) && trylock_page(page)) {
> +			int ret;
> +
> +			ret = try_to_release_page(page, GFP_KERNEL);
> +			unlock_page(page);
> +			if (!ret) {
> +				result = SCAN_PAGE_HAS_PRIVATE;
> +				break;
> +			}
> +		}
> +
>  		if (page_count(page) != 1 + page_mapcount(page)) {
>  			result = SCAN_PAGE_COUNT;
>  			break;

There is already a try_to_release() inside the page lock section in
collapse_file(). I'm assuming you added this one because private data
affects the refcount. But it seems a bit overkill just for that; we
could also still fail the check, in which case we'd have dropped the
buffers in vain. Can you fix the check instead?

There is an is_page_cache_freeable() function in vmscan.c that handles
private fs references:

static inline int is_page_cache_freeable(struct page *page)
{
	/*
	 * A freeable page cache page is referenced only by the caller
	 * that isolated the page, the page cache and optional buffer
	 * heads at page->private.
	 */
	int page_cache_pins = PageTransHuge(page) && PageSwapCache(page) ?
		HPAGE_PMD_NR : 1;
	return page_count(page) - page_has_private(page) == 1 + page_cache_pins;
}

Wouldn't this work here as well?

The rest looks great to me.

  reply	other threads:[~2019-07-10 18:48 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-25  0:12 [PATCH v9 0/6] Enable THP for text section of non-shmem files Song Liu
2019-06-25  0:12 ` [PATCH v9 1/6] filemap: check compound_head(page)->mapping in filemap_fault() Song Liu
2019-07-10 17:44   ` Johannes Weiner
2019-07-10 17:51   ` Johannes Weiner
2019-06-25  0:12 ` [PATCH v9 2/6] filemap: update offset check " Song Liu
2019-07-10 17:52   ` Johannes Weiner
2019-06-25  0:12 ` [PATCH v9 3/6] mm,thp: stats for file backed THP Song Liu
2019-07-10 17:59   ` Johannes Weiner
2019-06-25  0:12 ` [PATCH v9 4/6] khugepaged: rename collapse_shmem() and khugepaged_scan_shmem() Song Liu
2019-06-27 13:19   ` Rik van Riel
2019-06-27 13:19     ` Rik van Riel
2019-07-10 18:21   ` Johannes Weiner
2019-06-25  0:12 ` [PATCH v9 5/6] mm,thp: add read-only THP support for (non-shmem) FS Song Liu
2019-07-10 18:48   ` Johannes Weiner [this message]
2019-07-22 23:41     ` Song Liu
2019-07-23 23:59   ` Huang, Kai
2019-07-28  6:41     ` Song Liu
2019-06-25  0:12 ` [PATCH v9 6/6] mm,thp: avoid writes to file with THP in pagecache Song Liu
2019-06-27 13:18   ` Rik van Riel
2019-06-27 13:18     ` Rik van Riel
2019-07-10 19:11   ` Johannes Weiner
2019-06-27 12:46 ` [PATCH v9 0/6] Enable THP for text section of non-shmem files Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190710184811.GF11197@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=hdanton@sina.com \
    --cc=kernel-team@fb.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=matthew.wilcox@oracle.com \
    --cc=songliubraving@fb.com \
    --cc=william.kucharski@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.