> On Feb 25, 2021, at 12:40 AM, Peter Zijlstra wrote:
>
> On Wed, Feb 24, 2021 at 11:29:04PM -0800, Nadav Amit wrote:
>> From: Nadav Amit
>>
>> Just as applications can use prefetch instructions to overlap
>> computations and memory accesses, applications may want to overlap the
>> page-faults and compute or overlap the I/O accesses that are required
>> for page-faults of different pages.
>>
>> Applications can use multiple threads and cores for this matter, by
>> running one thread that prefetches the data (i.e., faults in the data)
>> and another that does the compute, but this scheme is inefficient. Using
>> mincore() can tell whether a page is mapped, but might not tell whether
>> the page is in the page-cache and does not fault in the data.
>>
>> Introduce prefetch_page() vDSO-call to prefetch, i.e. fault-in memory
>> asynchronously. The semantic of this call is: try to prefetch a page
>> at a given address and return zero if the page is accessible following
>> the call. Start I/O operations to retrieve the page if such operations
>> are required and there is no high memory pressure that might introduce
>> slowdowns.
>>
>> Note that as usual the page might be paged-out at any point and
>> therefore, similarly to mincore(), there is no guarantee that the page
>> will be present at the time that the user application uses the data that
>> resides on the page. Nevertheless, it is expected that in the vast
>> majority of the cases this would not happen, since prefetch_page()
>> accesses the page and therefore sets the PTE access-bit (if it is
>> clear).
>>
>> The implementation is as follows. The vDSO code accesses the data,
>> triggering a page-fault if it is not present. The handler detects based on
>> the instruction pointer that this is an asynchronous-#PF, using the
>> recently introduced vDSO exception tables. If the page can be brought
>> without waiting (e.g., the page is already in the page-cache), the
>> kernel handles the fault and returns success (zero).
>> If there is memory pressure that prevents the proper handling of the
>> fault (i.e., requires heavy-weight reclamation) it returns a failure.
>> Otherwise, it starts an I/O to bring the page and returns failure.
>>
>> Compilers can be extended to issue the prefetch_page() calls when
>> needed.
>
> Interesting, but given we've been removing explicit prefetch from some
> parts of the kernel how useful is this in actual use? I'm thinking there
> should at least be a real user and performance numbers with this before
> merging.

Can you give me a reference to the “removing explicit prefetch from some
parts of the kernel”?

I will work on an llvm/gcc plugin to provide some performance numbers.
I wanted to make sure that the idea is not a complete obscenity first.
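
To make the intended usage pattern concrete, here is a rough userspace
approximation. It is only a sketch and not the proposed vDSO call: the
names prefetch_page_sketch() and sum_overlapped() are hypothetical, and
the stand-in uses the existing mincore() and madvise(MADV_WILLNEED)
calls, which cost a syscall each and do not set up the PTE the way the
patch describes. The point is only the calling convention: zero means
"go ahead and compute on this page", non-zero means "I/O may have been
started, do something else and come back later".

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for mincore(), madvise() on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the proposed prefetch_page().
 * Returns 0 if mincore() reports the page containing addr as resident.
 * Otherwise asks the kernel to start read-ahead and returns -1.
 */
static int prefetch_page_sketch(const void *addr)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char vec = 0;
    void *base;

    assert(page > 0);
    base = (void *)((uintptr_t)addr & ~((uintptr_t)page - 1));

    if (mincore(base, (size_t)page, &vec) == 0 && (vec & 1))
        return 0;

    (void)madvise(base, (size_t)page, MADV_WILLNEED);
    return -1;
}

/* Plain compute step: sum n bytes starting at p. */
static unsigned long sum_bytes(const unsigned char *p, size_t n)
{
    unsigned long s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/*
 * Sum a buffer page by page. Pages reported as ready are processed in
 * the first pass; the others are deferred so that their I/O can overlap
 * with that compute, and are processed (possibly faulting) afterwards.
 */
static unsigned long sum_overlapped(const unsigned char *buf, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *deferred = calloc(npages ? npages : 1, 1);
    unsigned long total = 0;

    if (!deferred)
        return sum_bytes(buf, len);   /* fall back to a single pass */

    for (size_t i = 0; i < npages; i++) {
        size_t off = i * page;
        size_t n = len - off < page ? len - off : page;

        if (prefetch_page_sketch(buf + off) == 0)
            total += sum_bytes(buf + off, n);
        else
            deferred[i] = 1;
    }

    for (size_t i = 0; i < npages; i++) {
        size_t off = i * page;
        size_t n = len - off < page ? len - off : page;

        if (deferred[i])
            total += sum_bytes(buf + off, n);
    }

    free(deferred);
    return total;
}
```

A caller would typically pass a page-aligned, mmap()ed region to
sum_overlapped(). The result is the same as a single linear pass; only
the order in which pages are touched changes.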