All of lore.kernel.org
 help / color / mirror / Atom feed
* usbtmc: stb race condition introduced by 4f3c8d6eddc272b386464524235440a418ed2029
@ 2020-08-31 14:31 George McCollister
  2020-09-01 18:59 ` guido
  0 siblings, 1 reply; 8+ messages in thread
From: George McCollister @ 2020-08-31 14:31 UTC (permalink / raw)
  To: linux-usb; +Cc: guido, guido.kiener

Since 4f3c8d6eddc272b386464524235440a418ed2029 I'm experiencing this
STB race condition. TL;DR an old, cached STB value can be used after a
new one is reported in reply to a control message. Hacking up the
latest driver code to ignore the cached stb value gets around the
issue.

My USBTMC device has an interrupt endpoint with a 1ms interval.

1) A query is sent to the USBTMC device.

2) An SRQ is reported via the interrupt endpoint with MAV set.

3) Userspace performs a read to get the reply since MAV is set after
being notified by poll().

4) A condition on the USBTMC device triggers an SRQ before the
response is read but is not reported until later due to 1ms poll
interval. MAV is still set in the reported stb.

5) The USBTMC device submits a reply and clears MAV.

6) Userspace receives the reply and does USBTMC488_IOCTL_READ_STB.

[ 2584.683589] 000: usbtmc 1-1.1:1.0: Enter ioctl_read_stb iin_ep_present: 1

7) The SRQ comes in but too late to be used as a cached value by
usbtmc488_ioctl_read_stb() so it is stored for later.
[ 2584.683673] 000: usbtmc 1-1.1:1.0: int status: 0 len 2
[ 2584.683684] 000: usbtmc 1-1.1:1.0: srq received bTag 81 stb 51

8) A control msg is sent and the interrupt endpoint replies with the
STB. MAV is (correctly) not set.
[ 2584.684657] 000: usbtmc 1-1.1:1.0: int status: 0 len 2
[ 2584.687642] 000: usbtmc 1-1.1:1.0: stb:0x41 received 0

9) Userspace sends a query in response to the device condition
reported in the first STB bit.

10) Userspace does USBTMC488_IOCTL_READ_STB before waiting with poll()
and finds MAV is already set (incorrectly) since the cached value from
the last SRQ is used even though the bit was more recently reported
not set in response to a control message.

11) Userspace then attempts to read based on this false MAV 3and finds
no response.


Am I making any incorrect assumptions about how this should work? If
not what could we do to fix this behavior other than not using a
cached stb value? FWIW I'm the author of the userspace code and the
USBTMC device firmware.

Regards,
George McCollister
Software Architect
NovaTech LLC

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: usbtmc: stb race condition introduced by 4f3c8d6eddc272b386464524235440a418ed2029
  2020-08-31 14:31 usbtmc: stb race condition introduced by 4f3c8d6eddc272b386464524235440a418ed2029 George McCollister
@ 2020-09-01 18:59 ` guido
  2020-09-01 22:21   ` George McCollister
  0 siblings, 1 reply; 8+ messages in thread
From: guido @ 2020-09-01 18:59 UTC (permalink / raw)
  To: George McCollister; +Cc: linux-usb, guido.kiener


George,

Thanks for your question.
> Since 4f3c8d6eddc272b386464524235440a418ed2029 I'm experiencing this
> STB race condition. TL;DR an old, cached STB value can be used after a
> new one is reported in reply to a control message. Hacking up the
> latest driver code to ignore the cached stb value gets around the
> issue.

The SRQ notification is an important message and must not be missed
in userspace applications. The new implementation does not miss the
SRQ event anymore, even when multiple threads are waiting for the SRQ event.

Your coding relies on the previous behavior and did not fail for your
use case and timing. However we could not develop reliable applications with
the previous implementation.

There are now two ways to wait for the SRQ event:
1. Using the poll/select method
2. Using the new ioctl USBTMC488_IOCTL_WAIT_SRQ (preferred way)

When receiving the SRQ event, you should immediately read the stb with
USBTMC488_IOCTL_READ_STB within the same thread to prevent races or
reading stale status byte information.

More info see:
https://github.com/GuidoKiener/linux-usbtmc/blob/master/README.md

> My USBTMC device has an interrupt endpoint with a 1ms interval.
>
> 1) A query is sent to the USBTMC device.
>
> 2) An SRQ is reported via the interrupt endpoint with MAV set.
>
> 3) Userspace performs a read to get the reply since MAV is set after
> being notified by poll().

Did you start reading (read()) without checking the Status Byte after  
poll() event?

Regards,

Guido



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: usbtmc: stb race condition introduced by 4f3c8d6eddc272b386464524235440a418ed2029
  2020-09-01 18:59 ` guido
@ 2020-09-01 22:21   ` George McCollister
  2020-09-02 15:25     ` guido
  0 siblings, 1 reply; 8+ messages in thread
From: George McCollister @ 2020-09-01 22:21 UTC (permalink / raw)
  To: guido; +Cc: linux-usb, guido.kiener

On Tue, Sep 1, 2020 at 1:59 PM <guido@kiener-muenchen.de> wrote:
>
>
> George,
>
> Thanks for your question.
> > Since 4f3c8d6eddc272b386464524235440a418ed2029 I'm experiencing this
> > STB race condition. TL;DR an old, cached STB value can be used after a
> > new one is reported in reply to a control message. Hacking up the
> > latest driver code to ignore the cached stb value gets around the
> > issue.
>
> The SRQ notification is an important message and must not be missed
> in userspace applications. The new implementation does not miss the
> SRQ event anymore, even when multiple threads are waiting for the SRQ event.
>
> Your coding relies on the previous behavior and did not fail for your
> use case and timing. However we could not develop reliable applications with
> the previous implementation.
>
> There are now two ways to wait for the SRQ event:
> 1. Using the poll/select method
> 2. Using the new ioctl USBTMC488_IOCTL_WAIT_SRQ (preferred way)
>
> When receiving the SRQ event, you should immediately read the stb with
> USBTMC488_IOCTL_READ_STB within the same thread to prevent races or
> reading stale status byte information.

I do. I call it as soon as poll indicates there is an SRQ.

>
> More info see:
> https://github.com/GuidoKiener/linux-usbtmc/blob/master/README.md
>
> > My USBTMC device has an interrupt endpoint with a 1ms interval.
> >
> > 1) A query is sent to the USBTMC device.
> >
> > 2) An SRQ is reported via the interrupt endpoint with MAV set.
> >
> > 3) Userspace performs a read to get the reply since MAV is set after
> > being notified by poll().
>
> Did you start reading (read()) without checking the Status Byte after
> poll() event?

I check the status byte after poll(). The problem seems to be that I
also check the status byte in a loop (until there is nothing to
service) before calling poll again. As long as you only check the
status byte when there is a cached value available it should be no
problem. However if you call USBTMC488_IOCTL_READ_STB when there is
not a cached SRQ value an SRQ can occur after the lock is released in
usbtmc488_ioctl_read_stb() and cache an older value (which will be
read by the next USBTMC488_IOCTL_READ_STB) than the one returned. This
is a race condition and sounds broken to me but if this is the
intended behavior I can adjust my userspace code to only do
USBTMC488_IOCTL_READ_STB once after poll indicates an SRQ and live
with it.

It doesn't seem right for USBTMC488_IOCTL_READ_STB to ever report an
older value than what was reported on the previous call.

>
> Regards,
>
> Guido
>
>

Thanks for your reply.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: usbtmc: stb race condition introduced by 4f3c8d6eddc272b386464524235440a418ed2029
  2020-09-01 22:21   ` George McCollister
@ 2020-09-02 15:25     ` guido
  2020-09-03 14:42       ` George McCollister
  0 siblings, 1 reply; 8+ messages in thread
From: guido @ 2020-09-02 15:25 UTC (permalink / raw)
  To: George McCollister; +Cc: linux-usb, guido.kiener

>> More info see:
>> https://github.com/GuidoKiener/linux-usbtmc/blob/master/README.md
>>
>> > My USBTMC device has an interrupt endpoint with a 1ms interval.
>> >
>> > 1) A query is sent to the USBTMC device.
>> >
>> > 2) An SRQ is reported via the interrupt endpoint with MAV set.
>> >
>> > 3) Userspace performs a read to get the reply since MAV is set after
>> > being notified by poll().
>>
>> Did you start reading (read()) without checking the Status Byte after
>> poll() event?
>
> I check the status byte after poll(). The problem seems to be that I
> also check the status byte in a loop (until there is nothing to
> service) before calling poll again.

This is not a problem. You can check the status byte several times
as long as you like. When RQS Bit (0x40) is set, then you know it was
an intermediate SRQ, sent according to chapter 3.4.1
(USBTMC_usb488_subclass_1_00.pdf). Otherwise it is a requested status
byte according to chapter 3.4.2.

> As long as you only check the
> status byte when there is a cached value available it should be no
> problem. However if you call USBTMC488_IOCTL_READ_STB when there is
> not a cached SRQ value an SRQ can occur after the lock is released in
> usbtmc488_ioctl_read_stb() and cache an older value (which will be
> read by the next USBTMC488_IOCTL_READ_STB) than the one returned.

Yes, interrupts (here SRQ) can happen at any time. Therefore the user
can enable/disable interrupts (e.g. with SCPI command SRE) and
postpone interrupt handling.

The SRQ sends a status byte with RQS bit set (chapter 3.4.1) and the
request initiated by the client returns a status byte without RQS bit
(chapter 3.4.2).

> This
> is a race condition and sounds broken to me but if this is the
> intended behavior I can adjust my userspace code to only do
> USBTMC488_IOCTL_READ_STB once after poll indicates an SRQ and live
> with it.
> It doesn't seem right for USBTMC488_IOCTL_READ_STB to ever report an
> older value than what was reported on the previous call.

I agree that this can result in an odd behavior (E.g. the MAV bit
is unset with reading the subsequent cached status byte).
I was not aware about this problem and I need to check whether this is
a common problem in VISA implementations or whether we just need to fix
the kernel (e.g. drop requested status byte and return older cached value).

A workaround to avoid this odd behavior is to read the status byte
again with USBTMC488_IOCTL_READ_STB when your user application detects
the RQS bit.

BTW when you look at the old implementation
https://elixir.bootlin.com/linux/v4.7.10/source/drivers/usb/class/usbtmc.c#L1332
then you can see that you will never get a status byte with RQS bit set.
The status byte was never stored in data->bNotify2

Regards,

Guido




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: usbtmc: stb race condition introduced by 4f3c8d6eddc272b386464524235440a418ed2029
  2020-09-02 15:25     ` guido
@ 2020-09-03 14:42       ` George McCollister
  2020-09-03 16:54         ` guido
  0 siblings, 1 reply; 8+ messages in thread
From: George McCollister @ 2020-09-03 14:42 UTC (permalink / raw)
  To: guido; +Cc: linux-usb, guido.kiener

On Wed, Sep 2, 2020 at 10:25 AM <guido@kiener-muenchen.de> wrote:
>
> >> More info see:
> >> https://github.com/GuidoKiener/linux-usbtmc/blob/master/README.md
> >>
> >> > My USBTMC device has an interrupt endpoint with a 1ms interval.
> >> >
> >> > 1) A query is sent to the USBTMC device.
> >> >
> >> > 2) An SRQ is reported via the interrupt endpoint with MAV set.
> >> >
> >> > 3) Userspace performs a read to get the reply since MAV is set after
> >> > being notified by poll().
> >>
> >> Did you start reading (read()) without checking the Status Byte after
> >> poll() event?
> >
> > I check the status byte after poll(). The problem seems to be that I
> > also check the status byte in a loop (until there is nothing to
> > service) before calling poll again.
>
> This is not a problem. You can check the status byte several times
> as long as you like. When RQS Bit (0x40) is set, then you know it was
> an intermediate SRQ, sent according to chapter 3.4.1
> (USBTMC_usb488_subclass_1_00.pdf). Otherwise it is a requested status
> byte according to chapter 3.4.2.

I wouldn't call myself a USBTMC or 488 expert. I read the USBTMC spec,
488.1 and 488.2 but it's been three years since I implemented most of
my device firmware and wrote the userspace code. It runs in a
contained environment so I haven't done any interoperability testing.
However, my understanding is that MSS shares the same bit with RQS and
if something is still needing service the MSS bit will be set in the
status byte given in reply to a READ_STATUS_BYTE request. If that is
the case how would you tell a status byte sent as an SRQ vs one with
MSS sent in reply to READ_STATUS_BYTE? See IEEE Std 488.2-1992 Figure
11-3—Service Request Enabling.

>
> > As long as you only check the
> > status byte when there is a cached value available it should be no
> > problem. However if you call USBTMC488_IOCTL_READ_STB when there is
> > not a cached SRQ value an SRQ can occur after the lock is released in
> > usbtmc488_ioctl_read_stb() and cache an older value (which will be
> > read by the next USBTMC488_IOCTL_READ_STB) than the one returned.
>
> Yes, interrupts (here SRQ) can happen at any time. Therefore the user
> can enable/disable interrupts (e.g. with SCPI command SRE) and
> postpone interrupt handling.

This wasn't necessary before but perhaps it could be used to work
around the issue.

>
> The SRQ sends a status byte with RQS bit set (chapter 3.4.1) and the
> request initiated by the client returns a status byte without RQS bit
> (chapter 3.4.2).

Again maybe I'm confused by how MSS is supposed to work.

>
> > This
> > is a race condition and sounds broken to me but if this is the
> > intended behavior I can adjust my userspace code to only do
> > USBTMC488_IOCTL_READ_STB once after poll indicates an SRQ and live
> > with it.
> > It doesn't seem right for USBTMC488_IOCTL_READ_STB to ever report an
> > older value than what was reported on the previous call.
>
> I agree that this can result in an odd behavior (E.g. the MAV bit
> is unset with reading the subsequent cached status byte).
> I was not aware about this problem and I need to check whether this is
> a common problem in VISA implementations or whether we just need to fix
> the kernel (e.g. drop requested status byte and return older cached value).
>
> A workaround to avoid this odd behavior is to read the status byte
> again with USBTMC488_IOCTL_READ_STB when your user application detects
> the RQS bit.
>
> BTW when you look at the old implementation
> https://elixir.bootlin.com/linux/v4.7.10/source/drivers/usb/class/usbtmc.c#L1332
> then you can see that you will never get a status byte with RQS bit set.
> The status byte was never stored in data->bNotify2
>
> Regards,
>
> Guido
>
>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: usbtmc: stb race condition introduced by 4f3c8d6eddc272b386464524235440a418ed2029
  2020-09-03 14:42       ` George McCollister
@ 2020-09-03 16:54         ` guido
  2020-09-22 15:33           ` guido
  0 siblings, 1 reply; 8+ messages in thread
From: guido @ 2020-09-03 16:54 UTC (permalink / raw)
  To: George McCollister; +Cc: linux-usb, guido.kiener


>> >> More info see:
>> >> https://github.com/GuidoKiener/linux-usbtmc/blob/master/README.md
>> >>
>> >> > My USBTMC device has an interrupt endpoint with a 1ms interval.
>> >> >
>> >> > 1) A query is sent to the USBTMC device.
>> >> >
>> >> > 2) An SRQ is reported via the interrupt endpoint with MAV set.
>> >> >
>> >> > 3) Userspace performs a read to get the reply since MAV is set after
>> >> > being notified by poll().
>> >>
>> >> Did you start reading (read()) without checking the Status Byte after
>> >> poll() event?
>> >
>> > I check the status byte after poll(). The problem seems to be that I
>> > also check the status byte in a loop (until there is nothing to
>> > service) before calling poll again.
>>
>> This is not a problem. You can check the status byte several times
>> as long as you like. When RQS Bit (0x40) is set, then you know it was
>> an intermediate SRQ, sent according to chapter 3.4.1
>> (USBTMC_usb488_subclass_1_00.pdf). Otherwise it is a requested status
>> byte according to chapter 3.4.2.
>
> I wouldn't call myself a USBTMC or 488 expert. I read the USBTMC spec,
> 488.1 and 488.2 but it's been three years since I implemented most of
> my device firmware and wrote the userspace code. It runs in a
> contained environment so I haven't done any interoperability testing.
> However, my understanding is that MSS shares the same bit with RQS and
> if something is still needing service the MSS bit will be set in the
> status byte given in reply to a READ_STATUS_BYTE request. If that is
> the case how would you tell a status byte sent as an SRQ vs one with
> MSS sent in reply to READ_STATUS_BYTE? See IEEE Std 488.2-1992 Figure
> 11-3—Service Request Enabling.
>
Yes RQS and MSS share the same bit. When you look at Figure 11-2 then you
can see the difference in handling this bit:
1. RQS is returned by a Serial Poll (=viReadSTB()). The RQS bit is reset
after the Serial Poll. See chapter 11.2.2.1 Reading with a Serial Poll.
Therefore the RQS bit is only sent with the SRQ message and deleted after
sending the message. As a consequence the device need not to set the RQS bit
when the status request is initiated by the READ_STATUS_BYTE.

2. The MSS bit does not toggle after reading, but it is only sent in the
SCPI layer as a reponse to the SCPI query "*STB?".

This is true for 488.2 devices and when you support the Interrupt pipe.
It should be true for 488.1 devices, too, however it is very difficult to
get the whole picture from this spec.
Most people, who have written that spec, are now retired.

>>
>> > As long as you only check the
>> > status byte when there is a cached value available it should be no
>> > problem. However if you call USBTMC488_IOCTL_READ_STB when there is
>> > not a cached SRQ value an SRQ can occur after the lock is released in
>> > usbtmc488_ioctl_read_stb() and cache an older value (which will be
>> > read by the next USBTMC488_IOCTL_READ_STB) than the one returned.
>>
>> Yes, interrupts (here SRQ) can happen at any time. Therefore the user
>> can enable/disable interrupts (e.g. with SCPI command SRE) and
>> postpone interrupt handling.
>
> This wasn't necessary before but perhaps it could be used to work
> around the issue.
>
>>
>> The SRQ sends a status byte with RQS bit set (chapter 3.4.1) and the
>> request initiated by the client returns a status byte without RQS bit
>> (chapter 3.4.2).
>
> Again maybe I'm confused by how MSS is supposed to work.
>

Regards,

Guido



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: usbtmc: stb race condition introduced by 4f3c8d6eddc272b386464524235440a418ed2029
  2020-09-03 16:54         ` guido
@ 2020-09-22 15:33           ` guido
  2020-09-28 13:58             ` George McCollister
  0 siblings, 1 reply; 8+ messages in thread
From: guido @ 2020-09-22 15:33 UTC (permalink / raw)
  To: George McCollister; +Cc: linux-usb, guido.kiener, dpenkler, steve_bayless


Hi George,

We discussed some solutions to solve the race you found.

Since we want to avoid future requests and compatibility problems
regarding the SRQ and Status Byte, we will fix the current ioctl.
Furthermore we add two ioctls which return the original status
message and avoid assumptions about the USB488 subclass definition.

We will have the 3 STB ioctls:

1) USBTMC488_IOCTL_READ_STB always reads the STB from the device and
if the associated file descriptor has the srq_asserted bit set it ors
in the RQS bit to the returned STB and clears the srq_asserted bit.

Comment: This ioctl will return the latest status byte again, but will
not loose the RQS bit sent by intermediate SRQ messages. This ioctl
should be conform with 488.2 devices.

2) USBTMC_IOCTL_GET_STB always reads the STB from the device and returns
the STB unmodified to the application. The srq_asserted bit is not changed.

3) USBTMC_IOCTL_GET_SRQ_STB if the associated file descriptor has the
srq_asserted bit set it returns the STB originally sent in the device
SRQ message and clears the srq_asserted bit otherwise it returns error
code ENOMSG.

Please let us know your feedback and comments.

Regards,

Guido



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: usbtmc: stb race condition introduced by 4f3c8d6eddc272b386464524235440a418ed2029
  2020-09-22 15:33           ` guido
@ 2020-09-28 13:58             ` George McCollister
  0 siblings, 0 replies; 8+ messages in thread
From: George McCollister @ 2020-09-28 13:58 UTC (permalink / raw)
  To: guido; +Cc: linux-usb, Guido Kiener, dpenkler, steve_bayless

On Tue, Sep 22, 2020 at 10:33 AM <guido@kiener-muenchen.de> wrote:
>
>
> Hi George,
>
> We discussed some solutions to solve the race you found.
>
> Since we want to avoid future requests and compatibility problems
> regarding the SRQ and Status Byte, we will fix the current ioctl.
> Furthermore we add two ioctls which return the original status
> message and avoid assumptions about the USB488 subclass definition.
>
> We will have the 3 STB ioctls:
>
> 1) USBTMC488_IOCTL_READ_STB always reads the STB from the device and
> if the associated file descriptor has the srq_asserted bit set it ors
> in the RQS bit to the returned STB and clears the srq_asserted bit.
>
> Comment: This ioctl will return the latest status byte again, but will
> not loose the RQS bit sent by intermediate SRQ messages. This ioctl
> should be conform with 488.2 devices.
>
> 2) USBTMC_IOCTL_GET_STB always reads the STB from the device and returns
> the STB unmodified to the application. The srq_asserted bit is not changed.
>
> 3) USBTMC_IOCTL_GET_SRQ_STB if the associated file descriptor has the
> srq_asserted bit set it returns the STB originally sent in the device
> SRQ message and clears the srq_asserted bit otherwise it returns error
> code ENOMSG.
>
> Please let us know your feedback and comments.

Sounds great (and sorry for the late reply). This should allow for a
variety of reliable application and device implementations. I can't
think of anything else to add.

>
> Regards,
>
> Guido
>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-09-28 13:58 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-31 14:31 usbtmc: stb race condition introduced by 4f3c8d6eddc272b386464524235440a418ed2029 George McCollister
2020-09-01 18:59 ` guido
2020-09-01 22:21   ` George McCollister
2020-09-02 15:25     ` guido
2020-09-03 14:42       ` George McCollister
2020-09-03 16:54         ` guido
2020-09-22 15:33           ` guido
2020-09-28 13:58             ` George McCollister

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.