* [lustre-devel] Limitations of kernel read ahead
@ 2018-10-29 17:06 James Simmons
  2018-10-30  2:00 ` Li Xi
  0 siblings, 1 reply; 5+ messages in thread
From: James Simmons @ 2018-10-29 17:06 UTC (permalink / raw)
  To: lustre-devel


Currently the Lustre client has its own read ahead handling in the CLIO
layer. The reason for this is some limitations in the Linux kernel's
read ahead code. Some work to use the kernel's read ahead was
attempted as part of LU-8964, but the general work for LU-8964 had other
issues. Alternative work to LU-8964 has emerged under ticket

https://jira.whamcloud.com/browse/LU-8709

with early code at:

https://review.whamcloud.com/#/c/23552

I have also included a link to a presentation of this work, which gives
insight into how Lustre does its own read ahead.

https://www.eofs.eu/_media/events/lad16/19_parallel_readahead_framework_li_xi.pdf

Now that this seems to be the targeted work for read ahead, the question
has come up again of why this new work doesn't use the kernel's read
ahead. I wasn't involved in the discussion about the limitations, but I
have included the people interested in this work so progress can be made
on improving the Linux kernel's version of read ahead.


* [lustre-devel] Limitations of kernel read ahead
  2018-10-29 17:06 [lustre-devel] Limitations of kernel read ahead James Simmons
@ 2018-10-30  2:00 ` Li Xi
  2018-10-30  7:16   ` Andreas Dilger
  2018-11-07 20:32   ` Latham, Robert J.
  0 siblings, 2 replies; 5+ messages in thread
From: Li Xi @ 2018-10-30  2:00 UTC (permalink / raw)
  To: lustre-devel

Thank you for summarizing this, James!

I think everyone agrees that the current readahead algorithm of Lustre needs to be improved. Evidence also shows that the readahead algorithm of the Linux kernel would not be suitable for Lustre either. There are several reasons for this. In general, the kernel's readahead algorithm is designed for local file systems with a small readahead window. It is single-threaded and synchronous, and only usable for sequential reads. Because a Lustre read has longer latency than a local file system read, while Lustre's bandwidth is typically higher, we need a totally different algorithm for Lustre readahead. The readahead algorithm needs to 1) be asynchronous, to hide latency from the application; 2) be multi-threaded, to utilize the high bandwidth; 3) use a large readahead window that aligns with the large RPC size; and 4) work for sequential reads, strided reads, and potentially small and random reads.
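
Requirements 3 and 4 can be made concrete with a minimal Python sketch. This is not the LU-8709 code: the function names `align_window` and `classify_pattern`, the 4 MiB RPC size used in the example, the fixed I/O size, and the three-read minimum are all assumptions made purely for illustration.

```python
MiB = 1 << 20


def align_window(offset, length, rpc_size):
    """Expand the byte range [offset, offset + length) to RPC boundaries.

    Returns (start, size) of the aligned window. rpc_size is a
    hypothetical bulk RPC size in bytes, not a value taken from Lustre.
    """
    start = (offset // rpc_size) * rpc_size
    end = -(-(offset + length) // rpc_size) * rpc_size  # round up
    return start, end - start


def classify_pattern(offsets, io_size):
    """Classify recent read offsets as sequential, strided or random.

    Assumes every read has the same io_size; a real detector would have
    to cope with variable sizes and interleaved streams.
    """
    if len(offsets) < 3:
        return "unknown"  # too little history to decide
    gaps = [b - a for a, b in zip(offsets, offsets[1:])]
    if all(g == io_size for g in gaps):
        return "sequential"
    if gaps[0] > io_size and all(g == gaps[0] for g in gaps):
        return "strided"
    return "random"
```

For example, `align_window(5 * MiB, 1 * MiB, 4 * MiB)` widens a 1 MiB request at offset 5 MiB to the full 4 MiB RPC starting at 4 MiB, and `classify_pattern([0, MiB, 2 * MiB], 4096)` reports a strided stream, which a sequential-only detector would treat as random.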

The work on LU-8709 was started with these targets and got pretty good numbers even without detailed tuning. We (the Whamcloud team) would like to rework it with the goal of merging it in one of the next Lustre releases.

Regards,
Li Xi


* [lustre-devel] Limitations of kernel read ahead
  2018-10-30  2:00 ` Li Xi
@ 2018-10-30  7:16   ` Andreas Dilger
  2018-11-04 21:37     ` James Simmons
  2018-11-07 20:32   ` Latham, Robert J.
  1 sibling, 1 reply; 5+ messages in thread
From: Andreas Dilger @ 2018-10-30  7:16 UTC (permalink / raw)
  To: lustre-devel

One other enhancement that would be good to make for the readahead is to add opportunistic readahead for random access of smaller files (as determined by file size and client RAM) as discussed in LU-11416.  This has shown significant improvement for random access (about 40x speedup when doing random read IOPS on a 1GB file).
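
A minimal Python sketch of such a size-and-RAM check follows. The thread does not state the actual LU-11416 heuristic, so the function name `should_prefetch_whole_file` and the `ram_fraction` and `size_cap` tunables, including their default values, are hypothetical.

```python
GiB = 1 << 30


def should_prefetch_whole_file(file_size, client_ram, ram_fraction=0.05,
                               size_cap=2 * GiB):
    """Decide whether to read an entire file ahead on random access.

    ram_fraction and size_cap are hypothetical tunables: the file must
    fit within ram_fraction of client RAM and be no larger than size_cap.
    """
    if file_size <= 0:
        return False  # nothing to prefetch
    limit = min(size_cap, int(client_ram * ram_fraction))
    return file_size <= limit
```

With these assumed defaults, a 1 GiB file qualifies on a client with 64 GiB of RAM but not on one with 8 GiB, where the RAM-based limit is about 0.4 GiB.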


Cheers, Andreas
---
Andreas Dilger
CTO Whamcloud


* [lustre-devel] Limitations of kernel read ahead
  2018-10-30  7:16   ` Andreas Dilger
@ 2018-11-04 21:37     ` James Simmons
  0 siblings, 0 replies; 5+ messages in thread
From: James Simmons @ 2018-11-04 21:37 UTC (permalink / raw)
  To: lustre-devel


> One other enhancement that would be good to make for the readahead is to add opportunistic readahead for random access of smaller files (as determined by file size and client RAM) as discussed in LU-11416.  This has shown significant improvement for random access (about 40x speedup when doing random read IOPS on a 1GB file).

Has anyone created patches for the kernel's read ahead to get this work
started?
 

* [lustre-devel] Limitations of kernel read ahead
  2018-10-30  2:00 ` Li Xi
  2018-10-30  7:16   ` Andreas Dilger
@ 2018-11-07 20:32   ` Latham, Robert J.
  1 sibling, 0 replies; 5+ messages in thread
From: Latham, Robert J. @ 2018-11-07 20:32 UTC (permalink / raw)
  To: lustre-devel

On Tue, 2018-10-30 at 02:00 +0000, Li Xi wrote:
> Thank you for summarizing this, James!
> 
> I think everyone agrees that the current readahead algorithm of
> Lustre needs to be improved. Evidence also shows that the readahead
> algorithm of the Linux kernel would not be suitable for Lustre
> either. There are several reasons for this. In general, the kernel's
> readahead algorithm is designed for local file systems with a small
> readahead window. It is single-threaded and synchronous, and only
> usable for sequential reads. Because a Lustre read has longer
> latency than a local file system read, while Lustre's bandwidth is
> typically higher, we need a totally different algorithm for Lustre
> readahead. The readahead algorithm needs to 1) be asynchronous, to
> hide latency from the application; 2) be multi-threaded, to utilize
> the high bandwidth; 3) use a large readahead window that aligns with
> the large RPC size; and 4) work for sequential reads, strided reads,
> and potentially small and random reads.

Please don't forget that HPC workloads are likely to fall into category
4.

==rob

