From: Andy Lutomirski <luto@amacapital.net>
To: linux-mm@kvack.org, linux-ext4@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Andy Lutomirski <luto@amacapital.net>
Subject: [RFC 0/3] Add madvise(..., MADV_WILLWRITE)
Date: Mon, 5 Aug 2013 12:43:58 -0700
Message-ID: <cover.1375729665.git.luto@amacapital.net>

My application fallocates and mmaps (shared, writable) a lot (several
GB) of data at startup.  Those mappings are mlocked, and they live on
ext4.  The first write to any given page is slow because
ext4_da_get_block_prep can block.  This means that, to get decent
performance, I need to write something to all of these pages at
startup.  This, in turn, causes a giant IO storm as several GB of zeros
get pointlessly written to disk.

This series is an attempt to add madvise(..., MADV_WILLWRITE) to signal
to the kernel that I will eventually write to the referenced pages.  It
should cause any expensive operations that happen on the first write to
happen immediately, but it should not result in dirtying the pages.

madvise(addr, len, MADV_WILLWRITE) returns the number of bytes that the
operation succeeded on or a negative error code if there was an actual
failure.  A return value of zero signifies that the kernel doesn't know
how to "willwrite" the range and that userspace should implement a
fallback.

For now, it only works on shared writable ext4 mappings.  Eventually it
should support other filesystems as well as private pages (it should COW
the pages but not cause swap IO) and anonymous pages (it should COW the
zero page if applicable).

The implementation leaves much to be desired.  In particular, it
generates dirty buffer heads on a clean page, and this scares me.

Thoughts?

Andy Lutomirski (3):
  mm: Add MADV_WILLWRITE to indicate that a range will be written to
  fs: Add block_willwrite
  ext4: Implement willwrite for the delalloc case

 fs/buffer.c                            | 57 ++++++++++++++++++++++++++++++++++
 fs/ext4/ext4.h                         |  2 ++
 fs/ext4/file.c                         |  1 +
 fs/ext4/inode.c                        | 22 +++++++++++++
 include/linux/buffer_head.h            |  3 ++
 include/linux/mm.h                     | 12 +++++++
 include/uapi/asm-generic/mman-common.h |  3 ++
 mm/madvise.c                           | 28 +++++++++++++++--
 8 files changed, 126 insertions(+), 2 deletions(-)

-- 
1.8.3.1
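
For illustration, the return-value contract described in the cover letter (bytes handled, zero meaning "fall back", negative on real failure) might be consumed from userspace roughly as below. This is only a sketch under several assumptions that are not in the posting: the numeric value of MADV_WILLWRITE is a made-up placeholder, the helper names prepare_for_write() and touch_pages() are hypothetical, a partial success is assumed to cover a page-aligned prefix of the range, and EINVAL is treated as "kernel without the patch". The raw syscall is used so that the byte count comes back as a long.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* HYPOTHETICAL placeholder: the real value would come from the patched
 * include/uapi/asm-generic/mman-common.h, which is not shown here. */
#ifndef MADV_WILLWRITE
#define MADV_WILLWRITE 0x7ffffff0
#endif

/* Fallback described in the cover letter: write something to every page.
 * Rewriting the first byte in place keeps the contents but dirties the page. */
static void touch_pages(volatile unsigned char *p, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    for (size_t off = 0; off < len; off += page)
        p[off] = p[off];
}

/* Ask the kernel to do the expensive first-write work up front, and touch
 * whatever part of the range it did not handle.
 * Returns the number of bytes the kernel handled (0 if the fallback did
 * all the work), or -1 with errno set on a failure other than EINVAL.
 * The range must be mapped writable; the fallback writes to it. */
static long prepare_for_write(void *addr, size_t len)
{
    long done = syscall(SYS_madvise, addr, len, MADV_WILLWRITE);

    if (done < 0) {
        if (errno != EINVAL)
            return -1;
        done = 0;   /* assumption: unknown advice, i.e. unpatched kernel */
    }

    /* Assumption: a partial result covers a prefix of the range. */
    if ((size_t)done < len)
        touch_pages((unsigned char *)addr + done, len - (size_t)done);

    return done;
}
```

A caller would invoke prepare_for_write(map, map_len) once after mmap() and mlock(). On a kernel without the series this degrades to the write-every-page behaviour the cover letter is trying to avoid.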