From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752539Ab2LPDfu (ORCPT );
	Sat, 15 Dec 2012 22:35:50 -0500
Received: from dcvr.yhbt.net ([64.71.152.64]:59968 "EHLO dcvr.yhbt.net"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752191Ab2LPDft (ORCPT );
	Sat, 15 Dec 2012 22:35:49 -0500
Date: Sun, 16 Dec 2012 03:35:49 +0000
From: Eric Wong
To: Dave Chinner
Cc: Alan Cox, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue
Message-ID: <20121216033549.GA30446@dcvr.yhbt.net>
References: <20121215005448.GA7698@dcvr.yhbt.net>
	<20121215223448.08272fd5@pyramind.ukuu.org.uk>
	<20121216002549.GA19402@dcvr.yhbt.net>
	<20121216030302.GI9806@dastard>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20121216030302.GI9806@dastard>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Dave Chinner wrote:
> On Sun, Dec 16, 2012 at 12:25:49AM +0000, Eric Wong wrote:
> > Alan Cox wrote:
> > > On Sat, 15 Dec 2012 00:54:48 +0000
> > > Eric Wong wrote:
> > >
> > > > Applications streaming large files may want to reduce disk spinups and
> > > > I/O latency by performing large amounts of readahead up front
> > >
> > > How does it compare benchmark wise with a user thread or using the
> > > readahead() call ?
> >
> > Very well.
> >
> > My main concern is for the speed of the initial pread()/read() call
> > after open().
> >
> > Setting EARLY_EXIT means my test program _exit()s immediately after the
> > first pread().  In my test program (below), I wait for the background
> > thread to become ready before open() so I would not take overhead from
> > pthread_create() into account.
> >
> > RA=1 uses a pthread + readahead()
> > Not setting RA uses fadvise (with my patch)
>
> And if you don't use fadvise/readahead at all?

Sorry for the confusion.  I believe my other reply to you summarized
what I wanted to say in my commit message and also reply to Alan.

I want all the following things:

- I want the first read to be fast.
- I want to read the whole file eventually (probably slowly,
  as processing takes a while).
- I want to let my disk spin down for as long as possible.

This could also be a use case for an audio/video player.

> You're not timing how long the first pread() takes at all. You're
> timing the entire set of operations, including cloning a thread and
> for the readahead(2) call and messages to be passed back and forth
> through the eventfd interface to read the entire file.

You're right, I screwed up the measurement.  Using clock_gettime(),
there's hardly a difference between the approaches and I can't
get consistent timings between them.

So no, there's no difference that matters between the approaches.
But I think doing this in the kernel is easier for userspace users.

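To make the measurement point concrete, here is a minimal sketch of timing only the first pread(), with the clock started after the readahead hint has been issued. The helper names elapsed_ns() and time_first_pread() are made up for illustration and are not part of the patch or of the test program in this mail; the sketch assumes a Linux/POSIX system and says nothing about how the numbers compare between approaches.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: nanoseconds elapsed from *start to *finish. */
static int64_t elapsed_ns(const struct timespec *start,
			  const struct timespec *finish)
{
	return (int64_t)(finish->tv_sec - start->tv_sec) * 1000000000LL
		+ (finish->tv_nsec - start->tv_nsec);
}

/*
 * Hypothetical helper: time only the first pread() at offset 0.
 * When use_fadvise is nonzero, the WILLNEED hint is issued before the
 * clock starts, so the cost of the hint itself is not measured.
 * Returns the elapsed time in nanoseconds, or -1 on any error.
 */
static int64_t time_first_pread(int fd, void *buf, size_t len,
				int use_fadvise)
{
	struct timespec start, finish;
	ssize_t r;

	assert(buf != NULL && len > 0);

	/* posix_fadvise() returns 0 on success, an error number otherwise */
	if (use_fadvise && posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED) != 0)
		return -1;

	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1;
	r = pread(fd, buf, len, 0);
	if (clock_gettime(CLOCK_MONOTONIC, &finish) != 0)
		return -1;

	if (r < 0)
		return -1;
	return elapsed_ns(&start, &finish);
}
```

A caller would open() the file, pass the descriptor and a buffer to time_first_pread(), and print the result. Thread creation and any eventfd handshaking stay outside the timed region by construction. Page cache state still dominates the result, so timings from repeated runs are only comparable if the cache is dropped in between.
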
---------------------------------- 8<----------------------------
/* gcc -O2 -Wall -o first_read first_read.c -lpthread -lrt */
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

static int efd1; /* child -> parent: thread is up, go ahead and open() */
static int efd2; /* parent -> child: carries the opened fd */

/* a -= b, normalizing tv_nsec into [0, 1000000000) */
static void clock_diff(struct timespec *a, const struct timespec *b)
{
	a->tv_sec -= b->tv_sec;
	a->tv_nsec -= b->tv_nsec;
	if (a->tv_nsec < 0) {
		--a->tv_sec;
		a->tv_nsec += 1000000000;
	}
}

static void * start_ra(void *unused)
{
	struct stat st;
	eventfd_t val;
	int fd;

	/* tell parent to open() */
	assert(eventfd_write(efd1, 1) == 0);

	/* wait for parent to tell us fd is ready */
	assert(eventfd_read(efd2, &val) == 0);
	fd = (int)val;

	assert(fstat(fd, &st) == 0);
	assert(readahead(fd, 0, st.st_size) == 0);

	return NULL;
}

int main(int argc, char *argv[])
{
	char buf[16384];
	pthread_t thr;
	int fd;
	struct timespec start;
	struct timespec finish;
	char *do_ra = getenv("RA");

	if (argc != 2) {
		fprintf(stderr, "Usage: strace -T %s LARGE_FILE\n", argv[0]);
		return 1;
	}

	if (do_ra) {
		eventfd_t val;

		efd1 = eventfd(0, 0);
		efd2 = eventfd(0, 0);
		assert(efd1 >= 0 && efd2 >= 0 && "eventfd failed");
		assert(pthread_create(&thr, NULL, start_ra, NULL) == 0);

		/* wait for child thread to spawn */
		assert(eventfd_read(efd1, &val) == 0);
	}

	fd = open(argv[1], O_RDONLY);
	assert(fd >= 0 && "open failed");

	/* timed region: readahead request plus the first pread() */
	assert(clock_gettime(CLOCK_MONOTONIC, &start) == 0);

	if (do_ra) {
		/* wake up the child thread, give it a chance to run */
		assert(eventfd_write(efd2, fd) == 0);
		sched_yield();
	} else
		assert(posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED) == 0);

	assert(pread(fd, buf, sizeof(buf), 0) == sizeof(buf));
	assert(clock_gettime(CLOCK_MONOTONIC, &finish) == 0);
	clock_diff(&finish, &start);
	fprintf(stderr, "elapsed: %lu.%09lu\n",
		(unsigned long)finish.tv_sec, (unsigned long)finish.tv_nsec);

	if (getenv("FULL_READ")) {
		ssize_t r;

		/* read the rest of the file sequentially until EOF */
		do {
			r = read(fd, buf, sizeof(buf));
		} while (r > 0);
		assert(r == 0 && "EOF not reached");
	}

	if (getenv("EXIT_EARLY"))
		_exit(0);

	if (do_ra) {
		assert(pthread_join(thr, NULL) == 0);
		assert(close(efd1) == 0);
		assert(close(efd2) == 0);
	}

	assert(close(fd) == 0);

	return 0;
}