Re: [RFC 0/5] fix races in CDC-WDM

From: Oliver Neukum <oneukum@suse.com>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: bjorn@mork.no, linux-usb@vger.kernel.org
Subject: Re: [RFC 0/5] fix races in CDC-WDM
Date: Mon, 21 Sep 2020 12:52:58 +0200
Message-ID: <1600685578.2424.72.camel@suse.com>
In-Reply-To: <52714f66-c2ec-7a31-782a-9365ba900111@i-love.sakura.ne.jp>

On Friday, 18.09.2020, at 01:17 +0900, Tetsuo Handa wrote:
> On 2020/09/17 23:17, Oliver Neukum wrote:
> > The API and its semantics are clear. Write schedules a write:
> > 
> >        A successful return from write() does not make any guarantee
> >        that data has been committed to disk.  On some filesystems,
> >        including NFS, it does not even guarantee that space has
> >        successfully been reserved for the data.  In this case, some
> >        errors might be delayed until a future write(2), fsync(2), or
> >        even close(2).  The only way to be sure is to call fsync(2)
> >        after you are done writing all your data.
> 
> But I think that this leaves room for allowing write() to imply fflush()
> (i.e. write() is allowed to wait for data to be committed to disk).

That would be inferior and very bad for the non-blocking case.

> > If user space does not call fsync(), the error is supposed to be reported
> > by the next write() and if there is no next write(), close() shall report it.
> 
> Where does "the next" (and not "the next after the next") write() come from?

We would indeed be on spec. However, we perform best if we return an
error as soon as possible.

> You are saying that if user space does not call fsync(), the error is allowed to be
> reported by the next after the next (in other words, (N+2)'th) write() ?

Yes. The man page is clear on that.
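
To make the user-space side of that contract concrete, here is a minimal
sketch of a caller that does not rely on any single write() to report
failure. save_buffer() is a hypothetical helper, not part of any driver or
library; it only shows that write(), fsync() and close() each have to be
checked, because any of them may be the call that surfaces a deferred error.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success or a negative errno value. */
static int save_buffer(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd, err = 0;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -errno;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			/* May be an error left over from an earlier write(). */
			err = -errno;
			break;
		}
		p += n;
		len -= (size_t)n;
	}

	/* write() only schedules the data; fsync() forces errors to surface. */
	if (!err && fsync(fd) < 0)
		err = -errno;

	/* close() can carry a deferred error too, so its result is kept. */
	if (close(fd) < 0 && !err)
		err = -errno;

	return err;
}
```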

> > > . At least I think that
> > > 
> > >         spin_lock_irq(&desc->iuspin);
> > >         we = desc->werr;
> > >         desc->werr = 0;
> > >         spin_unlock_irq(&desc->iuspin);
> > >         if (we < 0)
> > >                 return usb_translate_errors(we);
> > > 
> > > in wdm_write() should be moved to after !test_bit(WDM_IN_USE, &desc->flags).
> > 
> > Why?
> 
> Otherwise, we can't make sure (N+1)'th write() will report error from N'th write().

We should move the test for reporting errors later, so that it is sure
to be carried out? I am afraid I cannot follow that logic.
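
For reference, the ordering under discussion can be modelled in user space.
This is a toy sketch, not the driver code: struct toy_wdm, toy_write() and
toy_complete() are made-up names, the model is single-threaded, and it
assumes a non-blocking writer that gets -EAGAIN while a write is in flight.
It keeps the check-and-clear of the latched error ahead of the in-use test,
as in the quoted snippet.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the driver state; not the real struct wdm_device. */
struct toy_wdm {
	int werr;	/* latched status of the last completed write */
	bool in_use;	/* models the WDM_IN_USE bit */
};

/* Models the completion callback: latch the status, free the slot. */
static void toy_complete(struct toy_wdm *d, int status)
{
	assert(status <= 0);
	d->werr = status;
	d->in_use = false;
}

/* Models a non-blocking write: report a latched error first, then submit. */
static int toy_write(struct toy_wdm *d)
{
	int we = d->werr;

	d->werr = 0;
	if (we < 0)
		return we;
	if (d->in_use)
		return -EAGAIN;
	d->in_use = true;
	return 0;
}
```

In this model, if write N is still in flight when write N+1 arrives, N+1
sees -EAGAIN, and the error from N is reported by whichever write comes
first after the completion has run.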

> Since I don't know the characteristics of data passed via wdm_write() (I guess that
> the data is some stateful controlling commands rather than meaningless byte stream),
> I guess that (N+1)'th wdm_write() attempt should be made only after confirming that
> N'th wdm_write() attempt received wdm_callback() response. To preserve state / data
> used by N'th wdm_write() attempt, reporting the error from too late write() attempt
> would be useless.

We cannot make assumptions on how user space uses the driver. Somebody
manually connecting and typing in commands letter by letter must also
work.

We can optimize for the common case, but we must operate according to
the specs.
> 
> 
> 
> > > In addition, is
> > > 
> > >         /* using write lock to protect desc->count */
> > >         mutex_lock(&desc->wlock);
> > > 
> > > required? Isn't wdm_mutex that is actually protecting desc->count from modification?
> > > If it is desc->wlock that is actually protecting desc->count, the !desc->count check
> > > in wdm_release() and the desc->count == 1 check in wdm_open() have to be done with
> > > desc->wlock held.
> > 
> > Correct. So should wdm_mutex be dropped earlier?
> 
> If recover_from_urb_loss() can tolerate stale desc->count value, wdm_mutex already

It cannot.

> protects desc->count. I don't know how this module works. I don't know whether
> wdm_mutex and/or desc->wlock is held when recover_from_urb_loss() is called from
> wdm_resume(). It seems that desc->wlock is held but wdm_mutex is not held when
> recover_from_urb_loss() is called from wdm_post_reset().

Indeed.
> 
> 
> 
> By the way, after the fixes, we could replace
> 
>   spin_lock_irq(&desc->iuspin);
>   rv = desc->werr;
>   desc->werr = 0;
>   spin_unlock_irq(&desc->iuspin);
> 
> with
> 
>   rv = xchg(&desc->werr, 0);
> 
> and avoid spin_lock_irq()/spin_unlock_irq() because there are many
> locations which need to check and clear the error...

Have you checked whether this has implications on memory ordering?
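
To make the question concrete, here is a user-space analogue in C11
atomics, where the ordering has to be spelled out at each call site. The
names toy_werr, toy_record_error() and toy_take_error() are hypothetical,
and the memory orders chosen are an assumption for illustration, not a
statement about what the driver needs. Note also that the spinlock in the
driver may be ordering accesses to other fields, which a lone exchange on
one variable would not cover.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical latched error; 0 means "no error pending". */
static atomic_int toy_werr;

/* Completion side: publish a negative status for a later reader. */
static void toy_record_error(int status)
{
	assert(status < 0);
	atomic_store_explicit(&toy_werr, status, memory_order_release);
}

/* Reader side: fetch and clear in one atomic step, so an error is
 * reported exactly once even with concurrent readers. */
static int toy_take_error(void)
{
	return atomic_exchange_explicit(&toy_werr, 0, memory_order_acq_rel);
}
```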

	Regards
		Oliver