From: Max Kellermann <mk@cm4all.com>
To: David Howells <dhowells@redhat.com>
Cc: Max Kellermann <mk@cm4all.com>,
	linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: fscache corruption in Linux 5.17?
Date: Tue, 19 Apr 2022 18:41:38 +0200
Message-ID: <Yl7mQr05hPg4vELb@rabbit.intern.cm-ag>
In-Reply-To: <508603.1650385022@warthog.procyon.org.uk>

On 2022/04/19 18:17, David Howells <dhowells@redhat.com> wrote:
> 	find /var/cache/fscache -inum $((0xiiii))
> 
> and see if you can see the corruption in there.  Note that there may be blocks
> of zeroes corresponding to unfetched file blocks.

I checked several known-corrupt files, but unfortunately, all of the
corruption has disappeared :-(

The /var/cache/fscache/ files have a timestamp from half an hour ago
(17:53 CEST = 15:53 GMT).  I don't know what happened at that time -
too bad the corruption disappeared after a week, just when we started
investigating it.

All those new files are all-zero.  No data is stored in any of them.
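
A check like this can be scripted.  The following is only a minimal
sketch of one way to test whether a cache file contains nothing but
zero bytes; the helper name is_all_zero and the chunk size are
illustrative assumptions, not part of any fscache tooling:

```python
def is_all_zero(path, chunk_size=1 << 20):
    """Return True if every byte in the file is zero.

    Hypothetical helper for illustration; an empty file counts as all-zero.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end of file without seeing a non-zero byte.
                return True
            if chunk.count(0) != len(chunk):
                # At least one byte in this chunk is non-zero.
                return False
```

Note that this reads the file through the page cache, so holes in a
sparse file are indistinguishable from blocks of stored zeroes.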

Note that I had to enable
/sys/kernel/debug/tracing/events/cachefiles/enable; the trace events
you named (read/write/trunc/io_error/vfs_error) do not emit anything.
This is what I see:

  kworker/u98:11-1446185 [016] ..... 1813913.318370: cachefiles_ref: c=00014bd5 o=12080f1c u=1 NEW obj
  kworker/u98:11-1446185 [016] ..... 1813913.318379: cachefiles_lookup: o=12080f1c dB=3e01ee B=3e5580 e=0
  kworker/u98:11-1446185 [016] ..... 1813913.318380: cachefiles_mark_active: o=12080f1c B=3e5580
  kworker/u98:11-1446185 [016] ..... 1813913.318401: cachefiles_coherency: o=12080f1c OK       B=3e5580 c=0
  kworker/u98:11-1446185 [016] ..... 1813913.318402: cachefiles_ref: c=00014bd5 o=12080f1c u=1 SEE lookup_cookie
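
To follow a single object (such as o=12080f1c) through a longer trace,
lines like the above can be split into fields.  This is a rough sketch
that assumes the line layout shown here (task-pid, [cpu], flags,
timestamp, event name, key=value payload); the regular expression and
the name parse_trace_line are illustrative assumptions and may need
adjusting for other ftrace output options:

```python
import re

# Assumed layout: "task-pid [cpu] flags timestamp: event: payload"
TRACE_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s+(?P<payload>.*)$"
)

def parse_trace_line(line):
    """Parse one trace line into a dict, or return None if it does not match."""
    m = TRACE_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Collect key=value tokens (e.g. o=12080f1c); bare words such as
    # "NEW obj" or "OK" stay only in rec["payload"].
    rec["fields"] = dict(
        tok.split("=", 1) for tok in rec["payload"].split() if "=" in tok
    )
    return rec
```

Filtering on rec["fields"].get("o") then gives all events for one
object.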

> Also, what filesystem is backing your cachefiles cache?  It could be useful to
> dump the extent list of the file.  You should be able to do this with
> "filefrag -e".

It's ext4.

 Filesystem type is: ef53
 File size of /var/cache/fscache/cache/Infs,3.0,2,,a4214ac,c0000208,,,3002c0,10000,10000,12c,1770,bb8,1770,1/@58/T,c0000208,,1cf4167,184558d9,c0000208,,40,36bab37,40, is 188416 (46 blocks of 4096 bytes)
 /var/cache/fscache/cache/Infs,3.0,2,,a4214ac,c0000208,,,3002c0,10000,10000,12c,1770,bb8,1770,1/@58/T,c0000208,,1cf4167,184558d9,c0000208,,40,36bab37,40,: 0 extents found
 File size of /var/cache/fscache/cache/Infs,3.0,2,,a4214ac,c0000208,,,3002c0,10000,10000,12c,1770,bb8,1770,1/@ea/T,c0000208,,10cc976,1208c7f6,c0000208,,40,36bab37,40, is 114688 (28 blocks of 4096 bytes)
 /var/cache/fscache/cache/Infs,3.0,2,,a4214ac,c0000208,,,3002c0,10000,10000,12c,1770,bb8,1770,1/@ea/T,c0000208,,10cc976,1208c7f6,c0000208,,40,36bab37,40,: 0 extents found
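
"0 extents found" together with a non-zero file size means the file is
entirely a hole.  The same condition can be probed without filefrag by
asking the kernel for the first data region via lseek() with SEEK_DATA.
This is a minimal sketch, assuming Linux and a filesystem that reports
holes (ext4 does); the helper name has_allocated_data is made up for
illustration.  Filesystems without hole support report the whole file
as data:

```python
import errno
import os

def has_allocated_data(path):
    """Return True if the file has at least one data region.

    Returns False if the file is empty or consists of a single hole.
    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seek to the first data region at or after offset 0.
        os.lseek(fd, 0, os.SEEK_DATA)
        return True
    except OSError as e:
        # ENXIO: there is no data at or after the given offset.
        if e.errno == errno.ENXIO:
            return False
        raise
    finally:
        os.close(fd)
```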

> As to why this happens, a write that's misaligned by 31 bytes should cause DIO
> to a disk to fail - so it shouldn't be possible to write that.  However, I'm
> doing fallocate and truncate on the file to shape it so that DIO will work on
> it, so it's possible that there's a bug there.  The cachefiles_trunc trace
> lines may help catch that.

I don't think any write is misaligned.  This was triggered by a
WordPress update, so I think the WordPress updater truncated and
rewrote all files.  Random guess: some pages got transferred to the
NFS server, but the local copy in fscache did not get updated.

Max
