linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Max Kellermann <mk@cm4all.com>
To: David Howells <dhowells@redhat.com>
Cc: Max Kellermann <mk@cm4all.com>,
	linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: fscache corruption in Linux 5.17?
Date: Tue, 19 Apr 2022 18:41:38 +0200	[thread overview]
Message-ID: <Yl7mQr05hPg4vELb@rabbit.intern.cm-ag> (raw)
In-Reply-To: <508603.1650385022@warthog.procyon.org.uk>

On 2022/04/19 18:17, David Howells <dhowells@redhat.com> wrote:
> 	find /var/cache/fscache -inum $((0xiiii))
> 
> and see if you can see the corruption in there.  Note that there may be blocks
> of zeroes corresponding to unfetched file blocks.

I checked several known-corrupt files, but unfortunately, all
corruption have disappeared :-(

The /var/cache/fscache/ files have a time stamp half an hour ago
(17:53 CET = 15:53 GMT).  I don't know what happened at that time -
too bad this disappeared after a week, just when we started
investigating it.

All those new files are all-zero.  No data is stored in any of them.

Note that I had to enable
/sys/kernel/debug/tracing/events/cachefiles/enable; the trace events
you named (read/write/trunc/io_error/vfs_error) do not emit anything.
This is what I see:

  kworker/u98:11-1446185 [016] ..... 1813913.318370: cachefiles_ref: c=00014bd5 o=12080f1c u=1 NEW obj
  kworker/u98:11-1446185 [016] ..... 1813913.318379: cachefiles_lookup: o=12080f1c dB=3e01ee B=3e5580 e=0
  kworker/u98:11-1446185 [016] ..... 1813913.318380: cachefiles_mark_active: o=12080f1c B=3e5580
  kworker/u98:11-1446185 [016] ..... 1813913.318401: cachefiles_coherency: o=12080f1c OK       B=3e5580 c=0
  kworker/u98:11-1446185 [016] ..... 1813913.318402: cachefiles_ref: c=00014bd5 o=12080f1c u=1 SEE lookup_cookie

> Also, what filesystem is backing your cachefiles cache?  It could be useful to
> dump the extent list of the file.  You should be able to do this with
> "filefrag -e".

It's ext4.

 Filesystem type is: ef53
 File size of /var/cache/fscache/cache/Infs,3.0,2,,a4214ac,c0000208,,,3002c0,10000,10000,12c,1770,bb8,1770,1/@58/T,c0000208,,1cf4167,184558d9,c0000208,,40,36bab37,40, is 188416 (46 blocks of 4096 bytes)
 /var/cache/fscache/cache/Infs,3.0,2,,a4214ac,c0000208,,,3002c0,10000,10000,12c,1770,bb8,1770,1/@58/T,c0000208,,1cf4167,184558d9,c0000208,,40,36bab37,40,: 0 extents found
 File size of /var/cache/fscache/cache/Infs,3.0,2,,a4214ac,c0000208,,,3002c0,10000,10000,12c,1770,bb8,1770,1/@ea/T,c0000208,,10cc976,1208c7f6,c0000208,,40,36bab37,40, is 114688 (28 blocks of 4096 bytes)
 /var/cache/fscache/cache/Infs,3.0,2,,a4214ac,c0000208,,,3002c0,10000,10000,12c,1770,bb8,1770,1/@ea/T,c0000208,,10cc976,1208c7f6,c0000208,,40,36bab37,40,: 0 extents found

> As to why this happens, a write that's misaligned by 31 bytes should cause DIO
> to a disk to fail - so it shouldn't be possible to write that.  However, I'm
> doing fallocate and truncate on the file to shape it so that DIO will work on
> it, so it's possible that there's a bug there.  The cachefiles_trunc trace
> lines may help catch that.

I don't think any write is misaligned.  This was triggered by a
WordPress update, so I think the WordPress updater truncated and
rewrote all files.  Random guess: some pages got transferred to the
NFS server, but the local copy in fscache did not get updated.

Max

  reply	other threads:[~2022-04-19 16:45 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-12 15:10 fscache corruption in Linux 5.17? Max Kellermann
2022-04-16 11:38 ` Thorsten Leemhuis
2022-04-16 19:55   ` Max Kellermann
2022-04-19 13:02 ` David Howells
2022-04-19 14:18   ` Max Kellermann
2022-04-19 15:23     ` [Linux-cachefs] " David Wysochanski
2022-04-19 16:17   ` David Howells
2022-04-19 16:41     ` Max Kellermann [this message]
2022-04-19 16:47     ` David Howells
2022-04-19 15:56 ` David Howells
2022-04-19 16:06   ` Max Kellermann
2022-04-19 16:42   ` David Howells
2022-04-19 18:01     ` Max Kellermann
2022-04-19 21:27     ` Max Kellermann
2022-04-20 13:55     ` David Howells
2022-05-04  8:38       ` Max Kellermann
2022-05-31  8:35       ` David Howells
2022-05-31  8:41         ` Max Kellermann
2022-05-31  9:13         ` David Howells
2022-06-20  7:11           ` Thorsten Leemhuis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yl7mQr05hPg4vELb@rabbit.intern.cm-ag \
    --to=mk@cm4all.com \
    --cc=dhowells@redhat.com \
    --cc=linux-cachefs@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).