linux-ext4.vger.kernel.org archive mirror
From: Donald Buczek <buczek@molgen.mpg.de>
To: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	dm-devel@redhat.com
Cc: it+linux@molgen.mpg.de,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: ext4_writepages: jbd2_start: 5120 pages, ino 11; err -5
Date: Thu, 14 Apr 2022 17:19:49 +0200
Message-ID: <4e83fb26-4d4a-d482-640c-8104973b7ebf@molgen.mpg.de>

We have a cluster scheduler which provides each cluster job with a private scratch filesystem (TMPDIR). These are created when a job starts and removed when a job completes. Setup runs fallocate, losetup, mkfs.ext4, mkdir, mount, "losetup -d" and rm; teardown is just umount and rmdir.
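
A minimal sketch of that setup and teardown order, written as a dry run that only builds the command lines. The function names, paths, size and loop device are hypothetical, and the exact options the scheduler passes to each tool are not stated in this mail, so the arguments here are assumptions.

```python
import shlex


def setup_commands(backing_file, size, loopdev, mountpoint):
    """Return the setup steps as argv lists. Nothing is executed."""
    return [
        ["fallocate", "-l", size, backing_file],  # reserve space for the scratch fs
        ["losetup", loopdev, backing_file],       # attach backing file to loop device
        ["mkfs.ext4", "-q", loopdev],             # mkfs options are an assumption
        ["mkdir", mountpoint],
        ["mount", loopdev, mountpoint],
        ["losetup", "-d", loopdev],               # detach request, issued while mounted
        ["rm", backing_file],                     # unlink the name; the file stays open
    ]


def teardown_commands(mountpoint):
    """Return the plain teardown steps as argv lists. Nothing is executed."""
    return [
        ["umount", mountpoint],  # this is where the dirty data gets synced
        ["rmdir", mountpoint],
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    steps = setup_commands("/scratch/job42.img", "100G", "/dev/loop0", "/scratch/job42")
    for argv in steps + teardown_commands("/scratch/job42"):
        print(shlex.join(argv))
```

Because the backing file is already unlinked and the loop device already asked to detach, the reserved space only becomes free once the umount has finished, which is why a slow umount delays reuse of the space.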

This works but there is one nuisance: The systems usually have a lot of memory and some jobs write a lot of data to their scratch filesystems. So when a job finishes, there often is a lot to sync by umount which sometimes takes many minutes and wastes a lot of I/O bandwidth. Additionally, the reserved space can't be returned and reused until the umount is finished and the backing file is deleted.

So I was looking for a way to avoid that but didn't find anything straightforward. The workaround I've found so far is to put a dm-device (linear target) between the filesystem and the loop device and then use this sequence for teardown:

- ioctl EXT4_IOC_SHUTDOWN with EXT4_GOING_FLAGS_NOLOGFLUSH
- dmsetup reload $dmname --table "0 $sectors zero"
- dmsetup resume $dmname --noflush
- umount $mountpoint
- dmsetup remove --deferred $dmname
- rmdir $mountpoint
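
The teardown sequence above can be sketched roughly as follows. The ioctl number is derived from the kernel definition `_IOR('X', 125, __u32)` using the bit layout common to x86 and arm; other architectures encode ioctl numbers differently, so treat the constant as an assumption to verify against your kernel headers. The helper names are hypothetical, and the dmsetup steps are only built as command lines, not executed.

```python
import fcntl
import os
import struct


def _IOR(type_char, nr, size):
    """Encode a read-direction ioctl number (asm-generic layout, e.g. x86/arm)."""
    ioc_read = 2
    return (ioc_read << 30) | (size << 16) | (ord(type_char) << 8) | nr


EXT4_IOC_SHUTDOWN = _IOR("X", 125, 4)  # _IOR('X', 125, __u32)
EXT4_GOING_FLAGS_NOLOGFLUSH = 0x2      # matches "shut down requested (2)" in the log


def ext4_shutdown(mountpoint):
    """Ask ext4 to shut down the filesystem mounted at mountpoint (needs root)."""
    fd = os.open(mountpoint, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # The kernel reads a 32-bit flags word from the buffer we pass in.
        fcntl.ioctl(fd, EXT4_IOC_SHUTDOWN, struct.pack("I", EXT4_GOING_FLAGS_NOLOGFLUSH))
    finally:
        os.close(fd)


def fast_teardown_commands(dmname, sectors, mountpoint):
    """Return the steps that follow the shutdown ioctl as argv lists."""
    return [
        ["dmsetup", "reload", dmname, "--table", f"0 {sectors} zero"],
        ["dmsetup", "resume", dmname, "--noflush"],
        ["umount", mountpoint],
        ["dmsetup", "remove", "--deferred", dmname],
        ["rmdir", mountpoint],
    ]
```

A caller would run `ext4_shutdown(mountpoint)` first and then execute each returned argv in order, for example with `subprocess.run(argv, check=True)`.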

This seems to do what I want. The unnecessary flushing of the temporary data is redirected from the backing file into the zero target and it works really fast. There is one remaining problem though, which might be just a cosmetic one: Although ext4 is shut down to prevent it from writing, I sometimes get the error message from the subject in the logs:

[2963044.462043] EXT4-fs (dm-1): mounted filesystem without journal. Opts: (null)
[2963044.686994] EXT4-fs (dm-0): mounted filesystem without journal. Opts: (null)
[2963044.728391] EXT4-fs (dm-2): mounted filesystem without journal. Opts: (null)
[2963055.585198] EXT4-fs (dm-2): shut down requested (2)
[2963064.821246] EXT4-fs (dm-2): mounted filesystem without journal. Opts: (null)
[2963074.838259] EXT4-fs (dm-2): shut down requested (2)
[2963095.979089] EXT4-fs (dm-0): shut down requested (2)
[2963096.066376] EXT4-fs (dm-0): ext4_writepages: jbd2_start: 5120 pages, ino 11; err -5
[2963108.636648] EXT4-fs (dm-0): mounted filesystem without journal. Opts: (null)
[2963125.194740] EXT4-fs (dm-0): shut down requested (2)
[2963166.708088] EXT4-fs (dm-1): shut down requested (2)
[2963169.334437] EXT4-fs (dm-0): mounted filesystem without journal. Opts: (null)
[2963227.515974] EXT4-fs (dm-0): shut down requested (2)
[2966222.515143] EXT4-fs (dm-0): mounted filesystem without journal. Opts: (null)
[2966222.523390] EXT4-fs (dm-1): mounted filesystem without journal. Opts: (null)
[2966222.598071] EXT4-fs (dm-2): mounted filesystem without journal. Opts: (null)
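
For reading such logs, here is a small sketch that picks the ext4_writepages line out of kernel log text and decodes the error number. The regular expression is written against the one message shown above and is an assumption about its general format; `decode_writepages_error` is a hypothetical helper name.

```python
import errno
import re

# Pattern modelled on the single message in the log excerpt above.
WRITEPAGES_ERR = re.compile(
    r"EXT4-fs \((?P<dev>[^)]+)\): ext4_writepages: jbd2_start: "
    r"(?P<pages>\d+) pages, ino (?P<ino>\d+); err (?P<err>-?\d+)"
)


def decode_writepages_error(line):
    """Return device, page count, inode and symbolic errno, or None if no match."""
    m = WRITEPAGES_ERR.search(line)
    if m is None:
        return None
    err = int(m["err"])
    return {
        "dev": m["dev"],
        "pages": int(m["pages"]),
        "ino": int(m["ino"]),
        # Kernel messages print negative errno values; look up the symbolic name.
        "errno": errno.errorcode.get(-err, "unknown"),
    }


if __name__ == "__main__":
    sample = ("[2963096.066376] EXT4-fs (dm-0): ext4_writepages: "
              "jbd2_start: 5120 pages, ino 11; err -5")
    print(decode_writepages_error(sample))
```

On Linux, errno 5 is EIO, so the message reports an I/O error on dm-0 less than a tenth of a second after the shutdown request for the same device.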

So I'd like to ask a few questions:

- Is this error message expected or is it a bug?
- Can it be ignored or is there a leak or something on that error path?
- Is there a better way to do what I want? Something I've overlooked?
- I'm considering creating a new dm target or adding an option to an existing one, because I feel that "zero" underneath a filesystem is asking for problems: a filesystem expects to read back the data that it wrote. The "error" target, on the other hand, would trigger lots of errors during the writeback attempts. What I really want is a target which silently discards writes and returns errors on reads. Any opinion about that?
- But using devicemapper to eat away the I/O is also just a workaround for the fact that we can't pass some flag to umount to say that we are okay to lose all data and leave the filesystem in a corrupted state if this was the last reference to it. Would this be a useful feature?
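
To make the difference between the targets concrete, here is a toy userspace model of the read/write semantics only. It is not kernel code and says nothing about how a real dm target would be implemented; the class names are hypothetical, and the third class models the target proposed above, which does not exist as far as this mail is concerned.

```python
import errno


class ZeroTarget:
    """Models dm "zero": writes are dropped, reads return zeroes."""

    def write(self, sector, data):
        return len(data)          # reported as success, data is discarded

    def read(self, sector, length):
        return bytes(length)      # not what the filesystem wrote earlier


class ErrorTarget:
    """Models dm "error": every read and write fails."""

    def write(self, sector, data):
        raise OSError(errno.EIO, "write failed")

    def read(self, sector, length):
        raise OSError(errno.EIO, "read failed")


class DiscardWritesFailReadsTarget:
    """Models the proposed target: writes are dropped, reads fail."""

    def write(self, sector, data):
        return len(data)          # writeback completes without noise

    def read(self, sector, length):
        raise OSError(errno.EIO, "read failed")  # never hands back stale zeroes
```

With the proposed semantics, writeback during umount would complete without errors as with "zero", while any attempt to read back data would fail loudly instead of silently returning zeroes.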

Best
   Donald
-- 
Donald Buczek
buczek@molgen.mpg.de

Thread overview: 5+ messages
2022-04-14 15:19 Donald Buczek [this message]
2022-05-31 10:38 ` ext4_writepages: jbd2_start: 5120 pages, ino 11; err -5 Jan Kara
2022-05-31 13:48   ` Donald Buczek
2022-05-31 15:39     ` Theodore Ts'o
2022-09-28 10:10       ` Donald Buczek
