From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Wang Shilong <wshilong@ddn.com>,
Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 02/41] lustre: llite: make readahead aware of hints
Date: Sun, 4 Apr 2021 20:50:31 -0400
Message-ID: <1617583870-32029-3-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1617583870-32029-1-git-send-email-jsimmons@infradead.org>
From: Wang Shilong <wshilong@ddn.com>
Calling madvise(MADV_SEQUENTIAL) and madvise(MADV_RANDOM) sets the
VM_SEQ_READ and VM_RAND_READ hints in vma->vm_flags. These should
be used to guide the Lustre readahead for better performance.
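
For reference, an application would set these hints roughly as follows. This is a minimal user-space sketch, not part of the patch; set_read_hint() is a hypothetical helper name, and the only real interface used is madvise(2).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* expose madvise() and MADV_* on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Apply a read-pattern hint to an existing mapping (hypothetical helper).
 * Returns 0 on success, or -1 with errno set, exactly as madvise() does.
 */
static int set_read_hint(void *addr, size_t len, int sequential)
{
	/* MADV_SEQUENTIAL sets VM_SEQ_READ, MADV_RANDOM sets VM_RAND_READ. */
	return madvise(addr, len, sequential ? MADV_SEQUENTIAL : MADV_RANDOM);
}
```

A caller would typically mmap() the file first and then pass the returned address and length to the helper before touching the pages.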
Disable the kernel readahead for mmap() pages and use the llite
readahead instead. There was also a bug in __ll_fault() that would
set both VM_SEQ_READ and VM_RAND_READ at the same time, which was
confusing the detection of the VM_SEQ_READ case, since VM_RAND_READ
was being checked first.
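
The precedence this patch adds to ll_fault_io_init() can be modelled in user space as below. The flag values and the sketch_* names are stand-ins chosen for illustration only; the real VM_SEQ_READ and VM_RAND_READ are defined by the kernel.

```c
#include <stdbool.h>

/* Stand-in flag values for this sketch only, not the kernel's definitions. */
#define SKETCH_VM_SEQ_READ	0x1UL
#define SKETCH_VM_RAND_READ	0x2UL

/* Models the two new cl_io bits, ci_seq_read and ci_rand_read. */
struct sketch_hints {
	bool seq_read;
	bool rand_read;
};

/*
 * Sequential is tested first, so a VMA that carries both flags is treated
 * as sequential and at most one hint is ever set.
 */
static struct sketch_hints sketch_hints_from_flags(unsigned long vm_flags)
{
	struct sketch_hints h = { false, false };

	if (vm_flags & SKETCH_VM_SEQ_READ)
		h.seq_read = true;
	else if (vm_flags & SKETCH_VM_RAND_READ)
		h.rand_read = true;
	return h;
}
```

With this ordering a mapping that has both flags set no longer loses its sequential hint.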
This changes the readahead for mmap from submitting mostly 4KB RPCs
to a large number of 1MB RPCs for the application profiled:
llite.*.read_ahead_stats    before    patched
------------------------    ------    -------
hits                          2408     135924 samples [pages]
misses                       34160       2384 samples [pages]

osc.*.rpc_stats      read before        read patched
---------------      -------------      -------------
pages per rpc        rpcs   % cum%      rpcs   % cum%
1:                   6542  95   95       351  55   55
2:                    224   3   99        76  12   67
4:                     32   0   99        28   4   72
8:                      2   0   99         9   1   73
16:                    25   0   99        32   5   78
32:                     0   0   99         8   1   80
64:                     0   0   99         5   0   80
128:                    0   0   99        15   2   83
256:                    2   0   99       102  16   99
512:                    0   0   99         0   0   99
1024:                   1   0  100         3   0  100
The readahead hit rate improved from 6% to 98%, 4KB RPCs dropped from
95% to 55%, and 1MB+ RPCs increased from 0% to 16% (79% of all pages).
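
The percentages can be rechecked from the tables with a few lines of arithmetic. The sketch below is only an illustration with hypothetical helper names. It assumes each histogram bucket label is the exact RPC size in pages and truncates to whole percent. Under those assumptions the 256-page (1MB) bucket alone accounts for the 79% page share.

```c
#include <stddef.h>

/* Integer percentage of readahead hits; 0 when there are no samples. */
static unsigned long hit_rate_pct(unsigned long hits, unsigned long misses)
{
	unsigned long total = hits + misses;

	return total ? hits * 100 / total : 0;
}

/* One histogram row: RPC size in pages and the number of such RPCs. */
struct rpc_bucket {
	unsigned long pages_per_rpc;
	unsigned long rpcs;
};

/* Percentage of all pages that were carried by RPCs of exactly 'size' pages. */
static unsigned long page_share_pct(const struct rpc_bucket *b, size_t n,
				    unsigned long size)
{
	unsigned long total = 0, match = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long pages = b[i].pages_per_rpc * b[i].rpcs;

		total += pages;
		if (b[i].pages_per_rpc == size)
			match += pages;
	}
	return total ? match * 100 / total : 0;
}

/* The "read patched" column of the osc.*.rpc_stats table. */
static const struct rpc_bucket patched[] = {
	{ 1, 351 }, { 2, 76 }, { 4, 28 }, { 8, 9 }, { 16, 32 }, { 32, 8 },
	{ 64, 5 }, { 128, 15 }, { 256, 102 }, { 512, 0 }, { 1024, 3 },
};
```

Calling hit_rate_pct() with the "before" and "patched" hit and miss counts gives 6 and 98, and page_share_pct() over the patched column for the 256-page bucket gives 79.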
Add debug messages to ll_file_mmap(), ll_fault() and ll_fault_io_init()
so that VMA state can be traced for future IO optimizations.
WC-bug-id: https://jira.whamcloud.com/browse/LU-13669
Lustre-commit: 7542820698696ed ("LU-13669 llite: make readahead aware of hints")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/41228
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
fs/lustre/include/cl_object.h | 10 +++++++++-
fs/lustre/llite/file.c | 2 ++
fs/lustre/llite/llite_mmap.c | 42 ++++++++++++++++++++++--------------------
fs/lustre/llite/rw.c | 20 ++++++++++++++++----
4 files changed, 49 insertions(+), 25 deletions(-)
diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 4f34e5d..739fe5b 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -1974,7 +1974,15 @@ struct cl_io {
* the read IO will check to-be-read OSCs' status, and make fast-switch
* another mirror if some of the OSTs are not healthy.
*/
- ci_tried_all_mirrors:1;
+ ci_tried_all_mirrors:1,
+ /**
+ * Random read hints, readahead will be disabled.
+ */
+ ci_rand_read:1,
+ /**
+ * Sequential read hints.
+ */
+ ci_seq_read:1;
/**
* Bypass quota check
*/
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 7c7ac01..fd01e14 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -736,6 +736,8 @@ static int ll_local_open(struct file *file, struct lookup_intent *it,
file->private_data = fd;
ll_readahead_init(inode, &fd->fd_ras);
fd->fd_omode = it->it_flags & (FMODE_READ | FMODE_WRITE | FMODE_EXEC);
+ /* turn off the kernel's read-ahead */
+ file->f_ra.ra_pages = 0;
/* ll_cl_context initialize */
rwlock_init(&fd->fd_lock);
diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c
index f0be7ba..b9a73e0 100644
--- a/fs/lustre/llite/llite_mmap.c
+++ b/fs/lustre/llite/llite_mmap.c
@@ -84,13 +84,11 @@ struct vm_area_struct *our_vma(struct mm_struct *mm, unsigned long addr,
* @vma virtual memory area addressed to page fault
* @env corespondent lu_env to processing
* @index page index corespondent to fault.
- * @ra_flags vma readahead flags.
*
- * \return error codes from cl_io_init.
+ * RETURN error codes from cl_io_init.
*/
static struct cl_io *
-ll_fault_io_init(struct lu_env *env, struct vm_area_struct *vma,
- pgoff_t index, unsigned long *ra_flags)
+ll_fault_io_init(struct lu_env *env, struct vm_area_struct *vma, pgoff_t index)
{
struct file *file = vma->vm_file;
struct inode *inode = file_inode(file);
@@ -110,18 +108,15 @@ struct vm_area_struct *our_vma(struct mm_struct *mm, unsigned long addr,
fio->ft_index = index;
fio->ft_executable = vma->vm_flags & VM_EXEC;
- /*
- * disable VM_SEQ_READ and use VM_RAND_READ to make sure that
- * the kernel will not read other pages not covered by ldlm in
- * filemap_nopage. we do our readahead in ll_readpage.
- */
- if (ra_flags)
- *ra_flags = vma->vm_flags & (VM_RAND_READ | VM_SEQ_READ);
- vma->vm_flags &= ~VM_SEQ_READ;
- vma->vm_flags |= VM_RAND_READ;
+ CDEBUG(D_MMAP,
+ DFID": vma=%p start=%#lx end=%#lx vm_flags=%#lx idx=%lu\n",
+ PFID(&ll_i2info(inode)->lli_fid), vma, vma->vm_start,
+ vma->vm_end, vma->vm_flags, fio->ft_index);
- CDEBUG(D_MMAP, "vm_flags: %lx (%lu %d)\n", vma->vm_flags,
- fio->ft_index, fio->ft_executable);
+ if (vma->vm_flags & VM_SEQ_READ)
+ io->ci_seq_read = 1;
+ else if (vma->vm_flags & VM_RAND_READ)
+ io->ci_rand_read = 1;
rc = cl_io_init(env, io, CIT_FAULT, io->ci_obj);
if (rc == 0) {
@@ -161,7 +156,7 @@ static int __ll_page_mkwrite(struct vm_area_struct *vma, struct page *vmpage,
if (IS_ERR(env))
return PTR_ERR(env);
- io = ll_fault_io_init(env, vma, vmpage->index, NULL);
+ io = ll_fault_io_init(env, vma, vmpage->index);
if (IS_ERR(io)) {
result = PTR_ERR(io);
goto out;
@@ -277,7 +272,6 @@ static vm_fault_t __ll_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
struct cl_io *io;
struct vvp_io *vio = NULL;
struct page *vmpage;
- unsigned long ra_flags;
int result = 0;
vm_fault_t fault_ret = 0;
u16 refcheck;
@@ -314,7 +308,7 @@ static vm_fault_t __ll_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
fault_ret = 0;
}
- io = ll_fault_io_init(env, vma, vmf->pgoff, &ra_flags);
+ io = ll_fault_io_init(env, vma, vmf->pgoff);
if (IS_ERR(io)) {
fault_ret = to_fault_error(PTR_ERR(io));
goto out;
@@ -350,8 +344,6 @@ static vm_fault_t __ll_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
}
cl_io_fini(env, io);
- vma->vm_flags |= ra_flags;
-
out:
cl_env_put(env, &refcheck);
if (result != 0 && !(fault_ret & VM_FAULT_RETRY))
@@ -375,6 +367,10 @@ static vm_fault_t ll_fault(struct vm_fault *vmf)
if (cached)
goto out;
+ CDEBUG(D_MMAP, DFID": vma=%p start=%#lx end=%#lx vm_flags=%#lx\n",
+ PFID(&ll_i2info(file_inode(vma->vm_file))->lli_fid),
+ vma, vma->vm_start, vma->vm_end, vma->vm_flags);
+
/* Only SIGKILL and SIGTERM are allowed for fault/nopage/mkwrite
* so that it can be killed by admin but not cause segfault by
* other signals.
@@ -385,6 +381,7 @@ static vm_fault_t ll_fault(struct vm_fault *vmf)
/* make sure offset is not a negative number */
if (vmf->pgoff > (MAX_LFS_FILESIZE >> PAGE_SHIFT))
return VM_FAULT_SIGBUS;
+
restart:
result = __ll_fault(vmf->vma, vmf);
if (vmf->page &&
@@ -545,6 +542,11 @@ int ll_file_mmap(struct file *file, struct vm_area_struct *vma)
bool cached;
int rc;
+ CDEBUG(D_VFSTRACE | D_MMAP,
+ "VFS_Op: fid="DFID" vma=%p start=%#lx end=%#lx vm_flags=%#lx\n",
+ PFID(&ll_i2info(inode)->lli_fid),
+ vma, vma->vm_start, vma->vm_end, vma->vm_flags);
+
if (ll_file_nolock(file))
return -EOPNOTSUPP;
diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index 096e015..8bba97f 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -1255,7 +1255,7 @@ static bool index_in_stride_window(struct ll_readahead_state *ras,
*/
static void ras_update(struct ll_sb_info *sbi, struct inode *inode,
struct ll_readahead_state *ras, pgoff_t index,
- enum ras_update_flags flags)
+ enum ras_update_flags flags, struct cl_io *io)
{
struct ll_ra_info *ra = &sbi->ll_ra_info;
bool hit = flags & LL_RAS_HIT;
@@ -1276,6 +1276,18 @@ static void ras_update(struct ll_sb_info *sbi, struct inode *inode,
if (ras->ras_no_miss_check)
goto out_unlock;
+ if (io && io->ci_rand_read)
+ goto out_unlock;
+
+ if (io && io->ci_seq_read) {
+ if (!hit) {
+ /* to avoid many small read RPC here */
+ ras->ras_window_pages = sbi->ll_ra_info.ra_range_pages;
+ ll_ra_stats_inc_sbi(sbi, RA_STAT_MMAP_RANGE_READ);
+ }
+ goto skip;
+ }
+
if (flags & LL_RAS_MMAP) {
unsigned long ra_pages;
@@ -1594,7 +1606,7 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
flags |= LL_RAS_HIT;
if (!vio->vui_ra_valid)
flags |= LL_RAS_MMAP;
- ras_update(sbi, inode, ras, vvp_index(vpg), flags);
+ ras_update(sbi, inode, ras, vvp_index(vpg), flags, io);
}
cl_2queue_init(queue);
@@ -1613,7 +1625,7 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
io_start_index = cl_index(io->ci_obj, io->u.ci_rw.crw_pos);
io_end_index = cl_index(io->ci_obj, io->u.ci_rw.crw_pos +
io->u.ci_rw.crw_count - 1);
- if (ll_readahead_enabled(sbi) && ras) {
+ if (ll_readahead_enabled(sbi) && ras && !io->ci_rand_read) {
pgoff_t skip_index = 0;
if (ras->ras_next_readahead_idx < vvp_index(vpg))
@@ -1802,7 +1814,7 @@ int ll_readpage(struct file *file, struct page *vmpage)
* if the page is hit in cache because non cache page
* case will be handled by slow read later.
*/
- ras_update(sbi, inode, ras, vvp_index(vpg), flags);
+ ras_update(sbi, inode, ras, vvp_index(vpg), flags, io);
/* avoid duplicate ras_update() call */
vpg->vpg_ra_updated = 1;
--
1.8.3.1
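
For readers following the rw.c hunk, the hint handling added to ras_update() can be modelled in user space as below. This is a simplified sketch, not the kernel code: every sketch_* name is hypothetical, the counter stands in for the RA_STAT_MMAP_RANGE_READ statistic, and what happens after the skip label is not shown in the diff, so it is represented only by a return value.

```c
#include <stdbool.h>

/* Models the two hint bits added to struct cl_io. */
struct sketch_io {
	bool rand_read;
	bool seq_read;
};

/* Models the parts of the readahead state that the new code touches. */
struct sketch_ras {
	unsigned long window_pages;	/* stands in for ras_window_pages */
	unsigned long range_reads;	/* stands in for the range-read stat */
};

enum sketch_action {
	SKETCH_SKIP_UPDATE,		/* "goto out_unlock" */
	SKETCH_SEQ_FAST_PATH,		/* "goto skip" */
	SKETCH_DEFAULT_DETECTION,	/* fall through to the existing logic */
};

static enum sketch_action
sketch_apply_hint(const struct sketch_io *io, struct sketch_ras *ras,
		  bool hit, unsigned long ra_range_pages)
{
	/* Random hint: leave the readahead state untouched. */
	if (io && io->rand_read)
		return SKETCH_SKIP_UPDATE;

	if (io && io->seq_read) {
		if (!hit) {
			/* On a miss, open the window to avoid many small RPCs. */
			ras->window_pages = ra_range_pages;
			ras->range_reads++;
		}
		return SKETCH_SEQ_FAST_PATH;
	}

	return SKETCH_DEFAULT_DETECTION;
}
```

A NULL io behaves like an unhinted read, which matches the "io &&" guards in the hunk.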