From: Yang Shi <shy828301@gmail.com> To: Hugh Dickins <hughd@google.com> Cc: Andrew Morton <akpm@linux-foundation.org>, Shakeel Butt <shakeelb@google.com>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Miaohe Lin <linmiaohe@huawei.com>, Mike Kravetz <mike.kravetz@oracle.com>, Michal Hocko <mhocko@suse.com>, Rik van Riel <riel@surriel.com>, Christoph Hellwig <hch@infradead.org>, Matthew Wilcox <willy@infradead.org>, "Eric W. Biederman" <ebiederm@xmission.com>, Alexey Gladkov <legion@kernel.org>, Chris Wilson <chris@chris-wilson.co.uk>, Matthew Auld <matthew.auld@intel.com>, Linux FS-devel Mailing List <linux-fsdevel@vger.kernel.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, linux-api@vger.kernel.org, Linux MM <linux-mm@kvack.org> Subject: Re: [PATCH 01/16] huge tmpfs: fix fallocate(vanilla) advance over huge pages Date: Mon, 2 Aug 2021 13:36:04 -0700 [thread overview] Message-ID: <CAHbLzkrvB6r5CKxwhcKmLZN+H3t10UuqeXT5vgi=YzkZjA2qnw@mail.gmail.com> (raw) In-Reply-To: <422db5c4-2490-749c-964b-dd2b93286ed5@google.com> On Sat, Jul 31, 2021 at 8:38 PM Hugh Dickins <hughd@google.com> wrote: > > On Fri, 30 Jul 2021, Yang Shi wrote: > > On Fri, Jul 30, 2021 at 12:25 AM Hugh Dickins <hughd@google.com> wrote: > > > > > > shmem_fallocate() goes to a lot of trouble to leave its newly allocated > > > pages !Uptodate, partly to identify and undo them on failure, partly to > > > leave the overhead of clearing them until later. But the huge page case > > > did not skip to the end of the extent, walked through the tail pages one > > > by one, and appeared to work just fine: but in doing so, cleared and > > > Uptodated the huge page, so there was no way to undo it on failure. > > > > > > Now advance immediately to the end of the huge extent, with a comment on > > > why this is more than just an optimization. But although this speeds up > > > huge tmpfs fallocation, it does leave the clearing until first use, and > > > some users may have come to appreciate slow fallocate but fast first use: > > > if they complain, then we can consider adding a pass to clear at the end. > > > > > > Fixes: 800d8c63b2e9 ("shmem: add huge pages support") > > > Signed-off-by: Hugh Dickins <hughd@google.com> > > > > Reviewed-by: Yang Shi <shy828301@gmail.com> > > Many thanks for reviewing so many of these. > > > > > A nit below: > > > > > --- > > > mm/shmem.c | 19 ++++++++++++++++--- > > > 1 file changed, 16 insertions(+), 3 deletions(-) > > > > > > diff --git a/mm/shmem.c b/mm/shmem.c > > > index 70d9ce294bb4..0cd5c9156457 100644 > > > --- a/mm/shmem.c > > > +++ b/mm/shmem.c > > > @@ -2736,7 +2736,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset, > > > inode->i_private = &shmem_falloc; > > > spin_unlock(&inode->i_lock); > > > > > > - for (index = start; index < end; index++) { > > > + for (index = start; index < end; ) { > > > struct page *page; > > > > > > /* > > > @@ -2759,13 +2759,26 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset, > > > goto undone; > > > } > > > > > > + index++; > > > + /* > > > + * Here is a more important optimization than it appears: > > > + * a second SGP_FALLOC on the same huge page will clear it, > > > + * making it PageUptodate and un-undoable if we fail later. > > > + */ > > > + if (PageTransCompound(page)) { > > > + index = round_up(index, HPAGE_PMD_NR); > > > + /* Beware 32-bit wraparound */ > > > + if (!index) > > > + index--; > > > + } > > > + > > > /* > > > * Inform shmem_writepage() how far we have reached. > > > * No need for lock or barrier: we have the page lock. > > > */ > > > - shmem_falloc.next++; > > > if (!PageUptodate(page)) > > > - shmem_falloc.nr_falloced++; > > > + shmem_falloc.nr_falloced += index - shmem_falloc.next; > > > + shmem_falloc.next = index; > > > > This also fixed the wrong accounting of nr_falloced, so it should be > > able to avoid returning -ENOMEM prematurely IIUC. Is it worth > > mentioning in the commit log? > > It took me a long time to see your point there: ah yes, because it made > the whole huge page Uptodate when it reached the first tail, there would > have been only one nr_falloced++ for the whole of the huge page: well > spotted, thanks, I hadn't realized that. > > Though I'm not so sure about your premature -ENOMEM: because once it has > made the huge page Uptodate, the other end (shmem_writepage()) will not > be incrementing nr_unswapped at all: so -ENOMEM would have been deferred > rather than premature, wouldn't it? Ah, ok, I didn't pay too much attention to how nr_unswapped is incremented. Just thought nr_falloced will be incremented by 512 rather than 1, so it is more unlikely to return -ENOMEM. > > Add a comment on this in the commit log: yes, I guess so, but I haven't > worked out what to write yet. > > Hugh > > > > > > > > > /* > > > * If !PageUptodate, leave it that way so that freeable pages > > > -- > > > 2.26.2
next prev parent reply other threads:[~2021-08-02 20:36 UTC|newest] Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-07-30 7:22 [PATCH 00/16] tmpfs: HUGEPAGE and MEM_LOCK fcntls and memfds Hugh Dickins 2021-07-30 7:25 ` [PATCH 01/16] huge tmpfs: fix fallocate(vanilla) advance over huge pages Hugh Dickins 2021-07-30 21:36 ` Yang Shi 2021-08-01 3:38 ` Hugh Dickins 2021-08-02 20:36 ` Yang Shi [this message] 2021-07-30 7:28 ` [PATCH 02/16] huge tmpfs: fix split_huge_page() after FALLOC_FL_KEEP_SIZE Hugh Dickins 2021-07-30 23:48 ` Yang Shi 2021-07-30 7:30 ` [PATCH 03/16] huge tmpfs: remove shrinklist addition from shmem_setattr() Hugh Dickins 2021-07-30 21:50 ` Yang Shi 2021-07-30 7:36 ` [PATCH 04/16] huge tmpfs: revert shmem's use of transhuge_vma_enabled() Hugh Dickins 2021-07-30 21:56 ` Yang Shi 2021-08-01 4:01 ` Hugh Dickins 2021-08-02 20:39 ` Yang Shi 2021-07-30 7:39 ` [PATCH 05/16] huge tmpfs: move shmem_huge_enabled() upwards Hugh Dickins 2021-07-30 21:57 ` Yang Shi 2021-07-30 7:42 ` [PATCH 06/16] huge tmpfs: shmem_is_huge(vma, inode, index) Hugh Dickins 2021-07-30 23:34 ` Yang Shi 2021-08-01 5:22 ` Hugh Dickins 2021-08-01 5:37 ` Hugh Dickins 2021-08-02 21:14 ` Yang Shi 2021-08-04 8:28 ` Hugh Dickins 2021-08-04 19:01 ` Yang Shi 2021-08-06 5:21 ` Hugh Dickins 2021-08-06 17:41 ` Yang Shi 2021-08-05 23:04 ` Yang Shi 2021-08-06 5:43 ` Hugh Dickins 2021-08-06 17:57 ` Yang Shi 2021-08-12 18:19 ` Yang Shi 2021-07-30 7:45 ` [PATCH 07/16] memfd: memfd_create(name, MFD_HUGEPAGE) for shmem huge pages Hugh Dickins 2021-08-04 14:03 ` Kirill A. Shutemov 2021-08-06 3:33 ` Hugh Dickins 2021-07-30 7:48 ` [PATCH 08/16] huge tmpfs: fcntl(fd, F_HUGEPAGE) and fcntl(fd, F_NOHUGEPAGE) Hugh Dickins 2021-08-04 14:08 ` Kirill A. Shutemov 2021-08-06 4:34 ` Hugh Dickins 2021-07-30 7:51 ` [PATCH 09/16] huge tmpfs: decide stat.st_blksize by shmem_is_huge() Hugh Dickins 2021-07-30 23:40 ` Yang Shi 2021-07-30 7:55 ` [PATCH 10/16] tmpfs: fcntl(fd, F_MEM_LOCK) to memlock a tmpfs file Hugh Dickins 2021-08-03 1:38 ` Matthew Wilcox 2021-08-04 9:15 ` Hugh Dickins 2021-07-30 7:57 ` [PATCH 11/16] tmpfs: fcntl(fd, F_MEM_LOCKED) to test if memlocked Hugh Dickins 2021-07-30 8:00 ` [PATCH 12/16] tmpfs: refuse memlock when fallocated beyond i_size Hugh Dickins 2021-07-30 8:03 ` [PATCH 13/16] mm: bool user_shm_lock(loff_t size, struct ucounts *) Hugh Dickins 2021-07-30 8:06 ` [PATCH 14/16] mm: user_shm_lock(,,getuc) and user_shm_unlock(,,putuc) Hugh Dickins 2021-07-30 8:09 ` [PATCH 15/16] tmpfs: permit changing size of memlocked file Hugh Dickins 2021-07-30 8:13 ` [PATCH 16/16] memfd: memfd_create(name, MFD_MEM_LOCK) for memlocked shmem Hugh Dickins
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CAHbLzkrvB6r5CKxwhcKmLZN+H3t10UuqeXT5vgi=YzkZjA2qnw@mail.gmail.com' \ --to=shy828301@gmail.com \ --cc=akpm@linux-foundation.org \ --cc=chris@chris-wilson.co.uk \ --cc=ebiederm@xmission.com \ --cc=hch@infradead.org \ --cc=hughd@google.com \ --cc=kirill.shutemov@linux.intel.com \ --cc=legion@kernel.org \ --cc=linmiaohe@huawei.com \ --cc=linux-api@vger.kernel.org \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=matthew.auld@intel.com \ --cc=mhocko@suse.com \ --cc=mike.kravetz@oracle.com \ --cc=riel@surriel.com \ --cc=shakeelb@google.com \ --cc=willy@infradead.org \ --subject='Re: [PATCH 01/16] huge tmpfs: fix fallocate(vanilla) advance over huge pages' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).