linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Theodore Y. Ts'o" <tytso@mit.edu>
To: Ming Lei <ming.lei@redhat.com>
Cc: Andrea Vai <andrea.vai@unipv.it>,
	"Schmid, Carsten" <Carsten_Schmid@mentor.com>,
	Finn Thain <fthain@telegraphics.com.au>,
	Damien Le Moal <Damien.LeMoal@wdc.com>,
	Alan Stern <stern@rowland.harvard.edu>,
	Jens Axboe <axboe@kernel.dk>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	USB list <linux-usb@vger.kernel.org>,
	SCSI development list <linux-scsi@vger.kernel.org>,
	Himanshu Madhani <himanshu.madhani@cavium.com>,
	Hannes Reinecke <hare@suse.com>, Omar Sandoval <osandov@fb.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Greg KH <gregkh@linuxfoundation.org>,
	Hans Holmberg <Hans.Holmberg@wdc.com>,
	Kernel development list <linux-kernel@vger.kernel.org>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: AW: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6
Date: Wed, 25 Dec 2019 00:17:22 -0500	[thread overview]
Message-ID: <20191225051722.GA119634@mit.edu> (raw)
In-Reply-To: <20191224012707.GA13083@ming.t460p>

On Tue, Dec 24, 2019 at 09:27:07AM +0800, Ming Lei wrote:
> The ext4_release_file() should be run from read() or write() syscall if
> Fedora 30's 'cp' is implemented correctly. IMO, it isn't expected behavior
> for ext4_release_file() to be run thousands of times when just
> running 'cp' once, see comment of ext4_release_file():

What's your evidence of that?  As opposed to the writeback taking a
long time, leading to the *one* call of ext4_release_file taking a
long time?  If it's a big file, we might very well be calliing
ext4_writepages multiple times, from a single call to
__filemap_fdatawrite_range().

You confused mightily from that assertion, and that caused me to make
assumptions that cp was doing something crazy.  But I'm quite conviced
now that this is almost certainly not what is happening.

> > I suspect the next step is use a blktrace, to see what kind of I/O is
> > being sent to the USB drive, and how long it takes for the I/O to
> > complete.  You might also try to capture the output of "iostat -x 1"
> > while the script is running, and see what the difference might be
> > between a kernel version that has the problem and one that doesn't,
> > and see if that gives us a clue.
> 
> That isn't necessary, given we have concluded that the bad write
> performance is caused by broken write order.

I didn't see any evidence of that from what I had in my inbox, so I
went back to the mailing list archives to figure out what you were
talking about.  Part of the problem is this has been a very
long-spanning thread, and I had deleted from my inbox all of the parts
relating to the MQ scheduler since that was clearly Not My Problem.  :-)

So, summarizing the most of the thread.  The problem started when we
removed the legacy I/O scheduler, since we are now only using the MQ
scheduler.  What the kernel is sending is long writes (240 sectors),
but it is being sent as an interleaved stream of two sequential
writes.  This particular pendrive can't handle this workload, because
it has a very simplistic Flash Translation Layer.  Now, this is not
*broken*, from a storage perspective; it's just that it's more than
the simple little brain of this particular pen drive can handle.

Previously, with a single queue, and specially since the queue depth
supported by this pen drive is 1, the elevator algorithm would sort
the I/O requests so that it would be mostly sequential, and this
wouldn't be much of a problem.  However, once the legacy I/O stack was
removed, the MQ stack is designed so that we don't have to take a
global lock in order to submit an I/O request.  That also means that
we can't do a full elevator sort since that would require locking all
of the queues.

This is not a problem, since HDD's generally have a 16 deep queue, and
SSD's have a super-deep queue depth since they get their speed via
parallel writes to different flash chips.  Unfortunately, it *is* a
problem for super primitive USB sticks.

> So far, the reason points to the extra writeback path from exit_to_usermode_loop().
> If it is not from close() syscall, the issue should be related with file reference
> count. If it is from close() syscall, the issue might be in 'cp''s
> implementation.

Oh, it's probably from the close system call; and it's *only* from a
single close system call.  Because there is the auto delayed
allocation resolution to protect against buggy userspace, under
certain circumstances, as I explained earlier, we force a full
writeout on a close for a file decsriptor which was opened with an
O_TRUNC.  This is by *design*, since we are trying to protect against
buggy userspace (application programmers vastly outnumber file system
programmers, and far too many of them want O_PONY).  This is Working
As Intended.

You can disable it by deleting the test file before the cp:

    rm -f /mnt/pendrive/$testfile

Or you can disable the protection against stupid userspace by using
the noauto_da_alloc mount option.  (But then if you have a buggy game
program which writes the top-ten score file by using open(2) w/
O_TRUNC, and then said program closes the OpenGL library, and the
proprietary 3rd party binary-only video driver wedges the X server
requiring a hard reset to recover, and the top-ten score file becomes
a zero-length file, don't come crying to me...  Or if a graphical text
editor forgets to use fsync(2) before saving a source file you spent
hours working on, and then the system crashes at exactly the wrong
moment and your source file becomes zero-length, against, don't come
crying to me.  Blame the stupid application programmer which wrote
your text editor who decided to skip the fsync(2), or who decided that
copying the ACL's and xattrs was Too Hard(tm), and so opening the file
with O_TRUNC and rewriting the file in place was easier for the
application programmer.)

In any case, I think this is all working all as intended.  The MQ I/O
stack is optimized for modern HDD and SSD's, and especially SSD's.
And the file system assumes that parallel sequential writes,
especially if they are large, is really not a big deal, since that's
what NCQ or massive parallelism of pretty much all SSD's want.
(Again, ignoring the legacy of crappy flash drives.

You can argue with storage stack folks about whether we need to have
super-dumb mode for slow, crappy flash which uses a global lock and a
global elevator scheduler for super-crappy flash if you want.  I'm
going to stay out of that argument.

					- Ted

  parent reply	other threads:[~2019-12-25  5:17 UTC|newest]

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <307581a490b610c3025ee80f79a465a89d68ed19.camel@unipv.it>
2019-08-20 17:13 ` Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6 Alan Stern
2019-08-23 10:39   ` Andrea Vai
2019-08-23 20:42     ` Alan Stern
2019-08-26  6:09       ` Andrea Vai
2019-08-26 16:33         ` Alan Stern
2019-09-18 15:25           ` Andrea Vai
2019-09-18 16:30             ` Alan Stern
2019-09-19  7:33               ` Andrea Vai
2019-09-19 17:54                 ` Alan Stern
2019-09-20  7:25                   ` Andrea Vai
2019-09-20  7:44                     ` Greg KH
2019-09-19  8:26               ` Damien Le Moal
2019-09-19  8:55                 ` Ming Lei
2019-09-19  9:09                   ` Damien Le Moal
2019-09-19  9:21                     ` Ming Lei
2019-09-19 14:01                 ` Alan Stern
2019-09-19 14:14                   ` Damien Le Moal
2019-09-20  7:03                     ` Andrea Vai
2019-09-25 19:30                       ` Alan Stern
2019-09-25 19:36                         ` Jens Axboe
2019-09-27 15:47                           ` Andrea Vai
2019-11-04 16:00                             ` Andrea Vai
2019-11-04 18:20                               ` Alan Stern
2019-11-05 11:48                                 ` Andrea Vai
2019-11-05 18:31                                   ` Alan Stern
2019-11-05 23:29                                     ` Jens Axboe
2019-11-06 16:03                                       ` Alan Stern
2019-11-06 22:13                                         ` Damien Le Moal
2019-11-07  7:04                                           ` Andrea Vai
2019-11-07  7:54                                             ` Damien Le Moal
2019-11-07 18:59                                               ` Andrea Vai
2019-11-08  8:42                                                 ` Damien Le Moal
2019-11-08 14:33                                                   ` Jens Axboe
2019-11-11 10:46                                                     ` Andrea Vai
2019-11-09 10:09                                                   ` Ming Lei
2019-11-09 22:28                                                 ` Ming Lei
2019-11-11 10:50                                                   ` Andrea Vai
2019-11-11 11:05                                                     ` Ming Lei
2019-11-11 11:13                                                       ` Andrea Vai
2019-11-22 19:16                                                   ` Andrea Vai
2019-11-23  7:28                                                     ` Ming Lei
2019-11-23 15:44                                                       ` Andrea Vai
2019-11-25  3:54                                                         ` Ming Lei
2019-11-25 10:11                                                           ` Andrea Vai
2019-11-25 10:29                                                             ` Ming Lei
2019-11-25 14:58                                                               ` Andrea Vai
2019-11-25 15:15                                                                 ` Ming Lei
2019-11-25 18:51                                                                   ` Andrea Vai
2019-11-26  2:32                                                                     ` Ming Lei
2019-11-26  7:46                                                                       ` Andrea Vai
2019-11-26  9:15                                                                         ` Ming Lei
2019-11-26 10:24                                                                           ` Ming Lei
2019-11-26 11:14                                                                           ` Andrea Vai
2019-11-27  2:05                                                                             ` Ming Lei
2019-11-27  9:39                                                                               ` Andrea Vai
2019-11-27 13:08                                                                                 ` Ming Lei
2019-11-27 15:01                                                                                   ` Andrea Vai
2019-11-27  0:21                                                                         ` Finn Thain
2019-11-27  8:14                                                                           ` AW: " Schmid, Carsten
2019-11-27 21:49                                                                             ` Finn Thain
2019-11-28  7:46                                                                             ` Andrea Vai
2019-11-28  8:12                                                                               ` AW: " Schmid, Carsten
2019-11-28 11:40                                                                                 ` Andrea Vai
2019-11-28 17:39                                                                                 ` Alan Stern
2019-11-28  9:17                                                                               ` Ming Lei
2019-11-28 17:34                                                                                 ` Andrea Vai
2019-11-29  0:57                                                                                   ` Ming Lei
2019-11-29  2:35                                                                                     ` Ming Lei
2019-11-29 14:41                                                                                       ` Andrea Vai
2019-12-03  2:23                                                                                         ` Ming Lei
2019-12-10  7:35                                                                                           ` Andrea Vai
2019-12-10  8:05                                                                                             ` Ming Lei
2019-12-11  2:41                                                                                               ` Theodore Y. Ts'o
2019-12-11  4:00                                                                                                 ` Ming Lei
2019-12-11 16:07                                                                                                   ` Theodore Y. Ts'o
2019-12-11 21:33                                                                                                     ` Ming Lei
2019-12-12  7:34                                                                                                       ` Andrea Vai
2019-12-18  8:25                                                                                                       ` Andrea Vai
2019-12-18  9:48                                                                                                         ` Ming Lei
     [not found]                                                                                                           ` <b1b6a0e9d690ecd9432025acd2db4ac09f834040.camel@unipv.it>
2019-12-23 13:08                                                                                                             ` Ming Lei
2019-12-23 14:02                                                                                                               ` Andrea Vai
2019-12-24  1:32                                                                                                                 ` Ming Lei
2019-12-24  8:04                                                                                                                   ` Andrea Vai
2019-12-24  8:47                                                                                                                     ` Ming Lei
2019-12-23 16:26                                                                                                               ` Theodore Y. Ts'o
2019-12-23 16:29                                                                                                                 ` Andrea Vai
2019-12-23 17:22                                                                                                                   ` Theodore Y. Ts'o
2019-12-23 18:45                                                                                                                     ` Andrea Vai
2019-12-23 19:53                                                                                                                       ` Theodore Y. Ts'o
2019-12-24  1:27                                                                                                                         ` Ming Lei
2019-12-24  6:49                                                                                                                           ` Andrea Vai
2019-12-24  8:51                                                                                                                           ` Andrea Vai
2019-12-24  9:35                                                                                                                             ` Ming Lei
2019-12-25  5:17                                                                                                                           ` Theodore Y. Ts'o [this message]
2019-12-26  2:27                                                                                                                             ` Ming Lei
2019-12-26  3:30                                                                                                                               ` Theodore Y. Ts'o
2019-12-26  8:37                                                                                                                                 ` Ming Lei
2020-01-07  7:51                                                                                                                                   ` Andrea Vai
     [not found]                                                                                                                                 ` <20200101074310.10904-1-hdanton@sina.com>
2020-01-01 13:53                                                                                                                                   ` slow IO on USB media Ming Lei
2019-11-29 11:44                                                                                     ` AW: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6 Bernd Schubert
2019-12-02  7:01                                                                                       ` Andrea Vai
2019-11-28 17:10                                                                           ` Andrea Vai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191225051722.GA119634@mit.edu \
    --to=tytso@mit.edu \
    --cc=Carsten_Schmid@mentor.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=Hans.Holmberg@wdc.com \
    --cc=andrea.vai@unipv.it \
    --cc=axboe@kernel.dk \
    --cc=fthain@telegraphics.com.au \
    --cc=gregkh@linuxfoundation.org \
    --cc=hare@suse.com \
    --cc=himanshu.madhani@cavium.com \
    --cc=jthumshirn@suse.de \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ming.lei@redhat.com \
    --cc=osandov@fb.com \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).