Re: BLKSECDISCARD ioctl and hung tasks

From: Ming Lei <ming.lei@redhat.com>
To: Salman Qazi <sqazi@google.com>
Cc: Ming Lei <tom.leiming@gmail.com>,
	Bart Van Assche <bvanassche@acm.org>,
	Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-block <linux-block@vger.kernel.org>,
	Gwendal Grignou <gwendal@google.com>,
	Jesse Barnes <jsbarnes@google.com>
Subject: Re: BLKSECDISCARD ioctl and hung tasks
Date: Sat, 15 Feb 2020 11:46:52 +0800	[thread overview]
Message-ID: <20200215034652.GA19867@ming.t460p> (raw)
In-Reply-To: <CAKUOC8Xss0YPefhKfwBiBar-7QQ=QrVh3d_8NBfidCCxUuxcgg@mail.gmail.com>

On Fri, Feb 14, 2020 at 11:42:32AM -0800, Salman Qazi wrote:
> On Fri, Feb 14, 2020 at 1:23 AM Ming Lei <tom.leiming@gmail.com> wrote:
> >
> > On Fri, Feb 14, 2020 at 1:50 PM Bart Van Assche <bvanassche@acm.org> wrote:
> > >
> > > On 2020-02-13 11:21, Salman Qazi wrote:
> > > > AFAICT, This is not actually sufficient, because the issuer of the bio
> > > > is waiting for the entire bio, regardless of how it is split later.
> > > > But, also there isn't a good mapping between the size of the secure
> > > > discard and how long it will take.  If given the geometry of a flash
> > > > device, it is not hard to construct a scenario where a relatively
> > > > small secure discard (few thousand sectors) will take a very long time
> > > > (multiple seconds).
> > > >
> > > > Having said that, I don't like neutering the hung task timer either.
> > >
> > > Hi Salman,
> > >
> > > How about modifying the block layer such that completions of bio
> > > fragments are considered as task activity? I think that bio splitting is
> > > rare enough for such a change not to affect performance of the hot path.
> >
> > Are you sure that the task hung warning won't be triggered in case of
> > non-splitting?
> 
> I demonstrated a few emails ago that it doesn't take a very large
> secure discard command to trigger this.  So, I am sceptical that we
> will be able to use splitting to solve this.
> 
> >
> > >
> > > How about setting max_discard_segments such that a discard always
> > > completes in less than half the hung task timeout? This may make
> > > discards a bit slower for one particular block driver but I think that's
> > > better than hung task complaints.
> >
> > I am afraid you can't find a golden setting max_discard_segments working
> > for every drivers. Even it is found, the performance  may have been affected.
> >
> > So just wondering why not take the simple approach used in blk_execute_rq()?
> 
> My colleague Gwendal pointed out another issue which I had missed:
> secure discard is an exclusive command: it monopolizes the device.
> Even if we fix this via your approach, it will show up somewhere else,
> because other operations to the drive will not make progress for that
> length of time.

What are the 'other operations'? Are they block IOs?

If yes, that is why I suggest to fix submit_bio_wait(), which should cover
most of sync bio submission.

Anyway, the fix is simple & generic enough, I'd plan to post a formal
patch if no one figures out better doable approaches.

> 
> For Chromium OS purposes, if we had a blank slate, this is how I would solve it:
> 
> * Under the assumption that the truly sensitive data is not very big:
>     * Keep secure data on a separate partition to make sure that those
> LBAs have controlled history
>     * Treat the files in that partition as immutable (i.e. no
> overwriting the contents of the file without first secure erasing the
> existing contents).
>     * By never letting more than one version of the file accumulate,
> we can guarantee that the secure erase will always be fast for
> moderate sized files.
> 
> But for all the existing machines with keys on them, we will need to
> do something else.

The issue you reported is a generic one, not Chromium only.

Thanks,
Ming