From: Christoph Hellwig <firstname.lastname@example.org>
To: David Howells <email@example.com>
Cc: Christoph Hellwig <firstname.lastname@example.org>,
    Qu Wenruo <email@example.com>,
    Andreas Dilger <firstname.lastname@example.org>,
    linux-fsdevel <email@example.com>,
    Al Viro <firstname.lastname@example.org>,
    "Theodore Y. Ts'o" <email@example.com>,
    "Darrick J. Wong" <firstname.lastname@example.org>,
    Chris Mason <email@example.com>,
    Josef Bacik <firstname.lastname@example.org>,
    David Sterba <email@example.com>,
    linux-ext4 <firstname.lastname@example.org>,
    linux-xfs <email@example.com>,
    linux-btrfs <firstname.lastname@example.org>,
    Linux Kernel Mailing List <email@example.com>
Subject: Re: Problems with determining data presence by examining extents?
Date: Wed, 15 Jan 2020 15:48:39 +0100
Message-ID: <20200115144839.GA30301@lst.de>
In-Reply-To: <firstname.lastname@example.org>

On Wed, Jan 15, 2020 at 02:35:22PM +0000, David Howells wrote:
> > If we can't get that easily it can be emulated using lseek SEEK_DATA /
> > SEEK_HOLE assuming no other thread could be writing to the file, or the
> > raciness doesn't matter.
>
> Another thread could be writing to the file, and the raciness matters if I
> want to cache the result of calling SEEK_HOLE - though it might be possible
> just to mask it off.

Well, if you have other threads changing the file (writing, punching holes,
truncating, etc.) you have lost with any interface that isn't an atomic
"give me that data or tell me it's a hole".  And even then, if you allow
threads that aren't part of your fscache implementation to do the
modifications, you have lost.  If on the other hand they are part of
fscache, you should be able to synchronize your threads somehow.

> One problem I have with SEEK_HOLE is that there's no upper bound on it.  Say
> I have a 1GiB cachefile that's completely populated and I want to find out if
> the first byte is present or not.  I call:
>
> 	end = vfs_llseek(file, 0, SEEK_HOLE);
>
> It will have to scan the metadata of the entire 1GiB file and will then
> presumably return the EOF position.  Now this might only be a mild irritation
> as I can cache this information for later use, but it does potentially put
> a performance hiccough in the case of someone only reading the first page or
> so of the file (say the file program).  On the other hand, probably most of
> the files in the cache are likely to be complete - in which case, it's
> probably quite cheap.

At least for XFS all the metadata is read from disk at once anyway,
so you only spend a few more cycles walking through a pretty efficient
in-memory data structure.

> However, SEEK_HOLE doesn't help with the issue of the filesystem 'altering'
> the content of the file by adding or removing blocks of zeros.

Neither does any other method.  If you need that fine-grained control you
need to track the information yourself.
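
For illustration only, here is a minimal userspace sketch of the lseek
SEEK_DATA / SEEK_HOLE emulation discussed above.  It is not code from
fscache or from this thread: the helper names byte_is_data(),
data_run_end() and demo_probe() are hypothetical, and it uses the
userspace lseek(2) call rather than vfs_llseek().  It assumes a libc that
defines SEEK_DATA and SEEK_HOLE.  It is only meaningful if no other thread
is modifying the file, or if the raciness doesn't matter.  A filesystem is
also free to report zero-filled blocks as data or as holes, so the answer
says what the filesystem reports, not what was written.

```c
#define _GNU_SOURCE             /* exposes SEEK_DATA / SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the byte at 'pos' is reported as data,
 * 0 if it is reported as a hole or lies at or beyond EOF, -1 on other errors.
 */
static int byte_is_data(int fd, off_t pos)
{
	off_t next;

	assert(pos >= 0);
	/* SEEK_DATA returns 'pos' itself when 'pos' already points at data. */
	next = lseek(fd, pos, SEEK_DATA);
	if (next == (off_t)-1)
		return errno == ENXIO ? 0 : -1;	/* ENXIO: no data at or after pos */
	return next == pos;
}

/*
 * Hypothetical helper: offset of the first hole at or after 'pos' (EOF counts
 * as a hole), or (off_t)-1 on error.
 */
static off_t data_run_end(int fd, off_t pos)
{
	return lseek(fd, pos, SEEK_HOLE);
}

/*
 * Usage demo: write 4 KiB to a scratch file and probe it.
 * Returns 0 if the probes behave as described above, nonzero otherwise.
 */
static int demo_probe(void)
{
	char path[] = "/tmp/seekdemo-XXXXXX";
	char buf[4096];
	int fd, ok;

	fd = mkstemp(path);
	if (fd < 0)
		return 1;
	unlink(path);			/* file disappears once fd is closed */

	memset(buf, 'x', sizeof(buf));
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
		close(fd);
		return 2;
	}

	ok = byte_is_data(fd, 0) == 1 &&	/* first byte is present */
	     byte_is_data(fd, 4096) == 0 &&	/* nothing at EOF */
	     data_run_end(fd, 0) == 4096;	/* data run ends at EOF */
	close(fd);
	return ok ? 0 : 3;
}
```

Note that lseek() moves the file offset as a side effect, so a caller
that also does read()/write() on the same descriptor would want pread()/
pwrite() or a separate descriptor.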