* [PATCH 1/1] scsi: scsi_dh_alua: do not set h->sdev to NULL before removing the list entry
@ 2020-09-11 16:21 Brian Bunker
  2020-09-18 18:41 ` Ewan D. Milne
  0 siblings, 1 reply; 6+ messages in thread
From: Brian Bunker @ 2020-09-11 16:21 UTC (permalink / raw)
  To: linux-scsi

A race exists where the BUG_ON(!h->sdev) will fire if the device handler detach
path in one thread removes a list entry while another thread is evaluating the
target port group state.

Do not set h->sdev to NULL in the detach path; the handler data is freed at the
end of that function anyway. Also remove the BUG_ONs, since the condition that
caused them to fire can no longer occur.

Signed-off-by: Brian Bunker <brian@purestorage.com>
Acked-by: Krishna Kant <krishna.kant@purestorage.com>
---
--- a/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-10 12:29:03.000000000 -0700
+++ b/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-11 09:14:15.000000000 -0700
@@ -658,8 +658,6 @@
                                        rcu_read_lock();
                                        list_for_each_entry_rcu(h,
                                                &tmp_pg->dh_list, node) {
-                                               /* h->sdev should always be valid */
-                                               BUG_ON(!h->sdev);
                                                h->sdev->access_state = desc[0];
                                        }
                                        rcu_read_unlock();
@@ -705,7 +703,6 @@
                        pg->expiry = 0;
                        rcu_read_lock();
                        list_for_each_entry_rcu(h, &pg->dh_list, node) {
-                               BUG_ON(!h->sdev);
                                h->sdev->access_state =
                                        (pg->state & SCSI_ACCESS_STATE_MASK);
                                if (pg->pref)
@@ -1147,7 +1144,6 @@
        spin_lock(&h->pg_lock);
        pg = rcu_dereference_protected(h->pg, lockdep_is_held(&h->pg_lock));
        rcu_assign_pointer(h->pg, NULL);
-       h->sdev = NULL;
        spin_unlock(&h->pg_lock);
        if (pg) {
                spin_lock_irq(&pg->lock);
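
For reference, the race being closed is between an RCU reader walking the
dh_list and the detach path clearing h->sdev before the entry is unlinked.
A rough sketch of the two sides, simplified from the functions the hunks
above touch (not the exact upstream code):

    /* Reader side (RTPG evaluation), simplified: */
    rcu_read_lock();
    list_for_each_entry_rcu(h, &pg->dh_list, node) {
            /*
             * If the detach path has already set h->sdev = NULL but has not
             * yet unlinked the entry, this dereference (or the old
             * BUG_ON(!h->sdev)) trips over the cleared pointer.
             */
            h->sdev->access_state = (pg->state & SCSI_ACCESS_STATE_MASK);
    }
    rcu_read_unlock();

    /* Detach side, simplified: */
    spin_lock(&h->pg_lock);
    pg = rcu_dereference_protected(h->pg, lockdep_is_held(&h->pg_lock));
    rcu_assign_pointer(h->pg, NULL);
    h->sdev = NULL;                 /* the assignment this patch removes */
    spin_unlock(&h->pg_lock);
    if (pg) {
            spin_lock_irq(&pg->lock);
            list_del_rcu(&h->node); /* entry unlinked only after sdev was cleared */
            spin_unlock_irq(&pg->lock);
            kref_put(&pg->kref, release_port_group);
    }
    kfree(h);

With the assignment gone, a reader that still sees the entry during its
grace period dereferences a valid scsi_device pointer.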






* Re: [PATCH 1/1] scsi: scsi_dh_alua: do not set h->sdev to NULL before removing the list entry
  2020-09-11 16:21 [PATCH 1/1] scsi: scsi_dh_alua: do not set h->sdev to NULL before removing the list entry Brian Bunker
@ 2020-09-18 18:41 ` Ewan D. Milne
  2020-09-18 22:07   ` Brian Bunker
  0 siblings, 1 reply; 6+ messages in thread
From: Ewan D. Milne @ 2020-09-18 18:41 UTC (permalink / raw)
  To: Brian Bunker, linux-scsi

On Fri, 2020-09-11 at 09:21 -0700, Brian Bunker wrote:
> A race exists where the BUG_ON(!h->sdev) will fire if the device handler detach
> path in one thread removes a list entry while another thread is evaluating the
> target port group state.
> 
> Do not set h->sdev to NULL in the detach path; the handler data is freed at the
> end of that function anyway. Also remove the BUG_ONs, since the condition that
> caused them to fire can no longer occur.
> 
> Signed-off-by: Brian Bunker <brian@purestorage.com>
> Acked-by: Krishna Kant <krishna.kant@purestorage.com>
> ---
> --- a/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-10 12:29:03.000000000 -0700
> +++ b/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-11 09:14:15.000000000 -0700
> @@ -658,8 +658,6 @@
>                                         rcu_read_lock();
>                                         list_for_each_entry_rcu(h,
>                                                 &tmp_pg->dh_list, node) {
> -                                               /* h->sdev should always be valid */
> -                                               BUG_ON(!h->sdev);
>                                                 h->sdev->access_state = desc[0];
>                                         }
>                                         rcu_read_unlock();
> @@ -705,7 +703,6 @@
>                         pg->expiry = 0;
>                         rcu_read_lock();
>                         list_for_each_entry_rcu(h, &pg->dh_list, node) {
> -                               BUG_ON(!h->sdev);
>                                 h->sdev->access_state =
>                                         (pg->state & SCSI_ACCESS_STATE_MASK);
>                                 if (pg->pref)
> @@ -1147,7 +1144,6 @@
>         spin_lock(&h->pg_lock);
>         pg = rcu_dereference_protected(h->pg, lockdep_is_held(&h->pg_lock));
>         rcu_assign_pointer(h->pg, NULL);
> -       h->sdev = NULL;
>         spin_unlock(&h->pg_lock);
>         if (pg) {
>                 spin_lock_irq(&pg->lock);
> 

We ran this change through fault insertion regression testing and
did not see any new problems.  (Our tests didn't hit the original
bug being fixed here though.)

-Ewan




* Re: [PATCH 1/1] scsi: scsi_dh_alua: do not set h->sdev to NULL before removing the list entry
  2020-09-18 18:41 ` Ewan D. Milne
@ 2020-09-18 22:07   ` Brian Bunker
  2020-09-23  8:23     ` Hannes Reinecke
  0 siblings, 1 reply; 6+ messages in thread
From: Brian Bunker @ 2020-09-18 22:07 UTC (permalink / raw)
  To: Ewan D. Milne; +Cc: linux-scsi

Internally we had discussions about adding a synchronize_rcu here:

--- a/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-10 12:29:03.000000000 -0700
+++ b/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-18 14:59:59.000000000 -0700
@@ -658,8 +658,6 @@
                                        rcu_read_lock();
                                        list_for_each_entry_rcu(h,
                                                &tmp_pg->dh_list, node) {
-                                               /* h->sdev should always be valid */
-                                               BUG_ON(!h->sdev);
                                                h->sdev->access_state = desc[0];
                                        }
                                        rcu_read_unlock();
@@ -705,7 +703,6 @@
                        pg->expiry = 0;
                        rcu_read_lock();
                        list_for_each_entry_rcu(h, &pg->dh_list, node) {
-                               BUG_ON(!h->sdev);
                                h->sdev->access_state =
                                        (pg->state & SCSI_ACCESS_STATE_MASK);
                                if (pg->pref)
@@ -1147,7 +1144,6 @@
        spin_lock(&h->pg_lock);
        pg = rcu_dereference_protected(h->pg, lockdep_is_held(&h->pg_lock));
        rcu_assign_pointer(h->pg, NULL);
-       h->sdev = NULL;
        spin_unlock(&h->pg_lock);
        if (pg) {
                spin_lock_irq(&pg->lock);
@@ -1156,6 +1152,7 @@
                kref_put(&pg->kref, release_port_group);
        }
        sdev->handler_data = NULL;
+       synchronize_rcu();
        kfree(h);
 }

We were thinking that this would allow any running readers to finish before the object is freed. Would
that make sense there? From what I can tell, this is the only RCU usage in scsi in this kernel version,
so I couldn't find many examples to follow.
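
In other words, the intent is the usual unlink -> wait -> free ordering. A
minimal sketch (kernel-style, simplified) of how the tail of the detach path
reads with the change above:

    /* Unlink the entry so new readers cannot find it... */
    spin_lock_irq(&pg->lock);
    list_del_rcu(&h->node);
    spin_unlock_irq(&pg->lock);
    kref_put(&pg->kref, release_port_group);

    sdev->handler_data = NULL;
    /* ...wait for readers already inside an rcu_read_lock()/rcu_read_unlock()
     * section that may still hold h... */
    synchronize_rcu();
    /* ...and only then free it. */
    kfree(h);

kfree_rcu() would be the non-blocking alternative, at the cost of embedding
an rcu_head in the handler data.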
 
Brian Bunker
SW Eng
brian@purestorage.com



> On Sep 18, 2020, at 11:41 AM, Ewan D. Milne <emilne@redhat.com> wrote:
> 
> On Fri, 2020-09-11 at 09:21 -0700, Brian Bunker wrote:
>> A race exists where the BUG_ON(!h->sdev) will fire if the device handler detach
>> path in one thread removes a list entry while another thread is evaluating the
>> target port group state.
>> 
>> Do not set h->sdev to NULL in the detach path; the handler data is freed at the
>> end of that function anyway. Also remove the BUG_ONs, since the condition that
>> caused them to fire can no longer occur.
>> 
>> Signed-off-by: Brian Bunker <brian@purestorage.com>
>> Acked-by: Krishna Kant <krishna.kant@purestorage.com>
>> ---
>> --- a/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-10 12:29:03.000000000 -0700
>> +++ b/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-11 09:14:15.000000000 -0700
>> @@ -658,8 +658,6 @@
>>                                         rcu_read_lock();
>>                                         list_for_each_entry_rcu(h,
>>                                                 &tmp_pg->dh_list, node) {
>> -                                               /* h->sdev should always be valid */
>> -                                               BUG_ON(!h->sdev);
>>                                                 h->sdev->access_state = desc[0];
>>                                         }
>>                                         rcu_read_unlock();
>> @@ -705,7 +703,6 @@
>>                         pg->expiry = 0;
>>                         rcu_read_lock();
>>                         list_for_each_entry_rcu(h, &pg->dh_list, node) {
>> -                               BUG_ON(!h->sdev);
>>                                 h->sdev->access_state =
>>                                         (pg->state & SCSI_ACCESS_STATE_MASK);
>>                                 if (pg->pref)
>> @@ -1147,7 +1144,6 @@
>>         spin_lock(&h->pg_lock);
>>         pg = rcu_dereference_protected(h->pg, lockdep_is_held(&h->pg_lock));
>>         rcu_assign_pointer(h->pg, NULL);
>> -       h->sdev = NULL;
>>         spin_unlock(&h->pg_lock);
>>         if (pg) {
>>                 spin_lock_irq(&pg->lock);
>> 
> 
> We ran this change through fault insertion regression testing and
> did not see any new problems.  (Our tests didn't hit the original
> bug being fixed here though.)
> 
> -Ewan
> 
> 



* Re: [PATCH 1/1] scsi: scsi_dh_alua: do not set h->sdev to NULL before removing the list entry
  2020-09-18 22:07   ` Brian Bunker
@ 2020-09-23  8:23     ` Hannes Reinecke
  2020-09-23 22:21       ` Brian Bunker
  0 siblings, 1 reply; 6+ messages in thread
From: Hannes Reinecke @ 2020-09-23  8:23 UTC (permalink / raw)
  To: Brian Bunker, Ewan D. Milne; +Cc: linux-scsi

On 9/19/20 12:07 AM, Brian Bunker wrote:
> Internally we had discussions about adding a synchronize_rcu here:
> 
> --- a/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-10 12:29:03.000000000 -0700
> +++ b/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-18 14:59:59.000000000 -0700
> @@ -658,8 +658,6 @@
>                                          rcu_read_lock();
>                                          list_for_each_entry_rcu(h,
>                                                  &tmp_pg->dh_list, node) {
> -                                               /* h->sdev should always be valid */
> -                                               BUG_ON(!h->sdev);
>                                                  h->sdev->access_state = desc[0];
>                                          }
>                                          rcu_read_unlock();
> @@ -705,7 +703,6 @@
>                          pg->expiry = 0;
>                          rcu_read_lock();
>                          list_for_each_entry_rcu(h, &pg->dh_list, node) {
> -                               BUG_ON(!h->sdev);
>                                  h->sdev->access_state =
>                                          (pg->state & SCSI_ACCESS_STATE_MASK);
>                                  if (pg->pref)
> @@ -1147,7 +1144,6 @@
>          spin_lock(&h->pg_lock);
>          pg = rcu_dereference_protected(h->pg, lockdep_is_held(&h->pg_lock));
>          rcu_assign_pointer(h->pg, NULL);
> -       h->sdev = NULL;
>          spin_unlock(&h->pg_lock);
>          if (pg) {
>                  spin_lock_irq(&pg->lock);
> @@ -1156,6 +1152,7 @@
>                  kref_put(&pg->kref, release_port_group);
>          }
>          sdev->handler_data = NULL;
> +       synchronize_rcu();
>          kfree(h);
>   }
> 
> We were thinking that this would allow any running readers to finish before the object is freed. Would
> that make sense there? From what I can tell, this is the only RCU usage in scsi in this kernel version,
> so I couldn't find many examples to follow.

That actually looks quite viable (if the indentation gets fixed :-).
Initially I was thinking of using 'h->sdev = NULL' as a marker for the
other code to figure out that the device is about to be deleted; that
worked, but with rather unintended consequences.
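
I.e. something along these lines on the reader side (a sketch only, ignoring
READ_ONCE()/barrier details):

    rcu_read_lock();
    list_for_each_entry_rcu(h, &pg->dh_list, node) {
            struct scsi_device *sdev = h->sdev;

            if (!sdev)
                    /* entry is being detached; skip it instead of BUG() */
                    continue;
            sdev->access_state = (pg->state & SCSI_ACCESS_STATE_MASK);
    }
    rcu_read_unlock();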

Have you tried with your patch?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH 1/1] scsi: scsi_dh_alua: do not set h->sdev to NULL before removing the list entry
  2020-09-23  8:23     ` Hannes Reinecke
@ 2020-09-23 22:21       ` Brian Bunker
  2020-09-24 10:42         ` Hannes Reinecke
  0 siblings, 1 reply; 6+ messages in thread
From: Brian Bunker @ 2020-09-23 22:21 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Ewan D. Milne, linux-scsi

We have tried our patch here and it works. We have not tried it at the
customer site where they hit the crash. Since they hit the BUG_ON line, which we
can see in the logs we have, we expect that removing the race as we did
would avoid the crash. We also remove the BUG_ONs in our patch, so it cannot hit
the same crash; if there is another similar race, a NULL pointer dereference could
still happen with our patch. I saw you had a patch that only uses the value if the
pointer is not NULL. That would also stop the crash, but it would hide the race,
which the BUG_ON was helpful in finding.

Trying our fix at the customer site would be more difficult for us, since the operating
system where the crash occurred belongs to Oracle; that is why you see their patch for
the same issue. Our interest in getting this fixed goes beyond this one customer: more
Linux vendors inherit this code as they move to newer kernels, and we rely on ALUA.
We hope to catch it here.

Should I put together a patch that removes the h->sdev = NULL assignment from the
detach function, adds the synchronize_rcu(), and removes the BUG_ONs, or did you
want me to diff against your commit where you have already removed the BUG_ONs?

Thanks,
Brian

Brian Bunker
SW Eng
brian@purestorage.com



> On Sep 23, 2020, at 1:23 AM, Hannes Reinecke <hare@suse.de> wrote:
> 
> On 9/19/20 12:07 AM, Brian Bunker wrote:
>> Internally we had discussions about adding a synchronize_rcu here:
>> --- a/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-10 12:29:03.000000000 -0700
>> +++ b/scsi/drivers/scsi/device_handler/scsi_dh_alua.c   2020-09-18 14:59:59.000000000 -0700
>> @@ -658,8 +658,6 @@
>>                                         rcu_read_lock();
>>                                         list_for_each_entry_rcu(h,
>>                                                 &tmp_pg->dh_list, node) {
>> -                                               /* h->sdev should always be valid */
>> -                                               BUG_ON(!h->sdev);
>>                                                 h->sdev->access_state = desc[0];
>>                                         }
>>                                         rcu_read_unlock();
>> @@ -705,7 +703,6 @@
>>                         pg->expiry = 0;
>>                         rcu_read_lock();
>>                         list_for_each_entry_rcu(h, &pg->dh_list, node) {
>> -                               BUG_ON(!h->sdev);
>>                                 h->sdev->access_state =
>>                                         (pg->state & SCSI_ACCESS_STATE_MASK);
>>                                 if (pg->pref)
>> @@ -1147,7 +1144,6 @@
>>         spin_lock(&h->pg_lock);
>>         pg = rcu_dereference_protected(h->pg, lockdep_is_held(&h->pg_lock));
>>         rcu_assign_pointer(h->pg, NULL);
>> -       h->sdev = NULL;
>>         spin_unlock(&h->pg_lock);
>>         if (pg) {
>>                 spin_lock_irq(&pg->lock);
>> @@ -1156,6 +1152,7 @@
>>                 kref_put(&pg->kref, release_port_group);
>>         }
>>         sdev->handler_data = NULL;
>> +       synchronize_rcu();
>>         kfree(h);
>>  }
>> We were thinking that this would allow any running readers to finish before the object is freed. Would
>> that make sense there? From what I can tell, this is the only RCU usage in scsi in this kernel version,
>> so I couldn't find many examples to follow.
> That actually looks quite viable (if the indentation gets fixed :-).
> Initially I was thinking of using 'h->sdev = NULL' as a marker for the other code to figure out that the device is about to be deleted; that worked, but with rather unintended consequences.
> 
> Have you tried with your patch?
> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer



* Re: [PATCH 1/1] scsi: scsi_dh_alua: do not set h->sdev to NULL before removing the list entry
  2020-09-23 22:21       ` Brian Bunker
@ 2020-09-24 10:42         ` Hannes Reinecke
  0 siblings, 0 replies; 6+ messages in thread
From: Hannes Reinecke @ 2020-09-24 10:42 UTC (permalink / raw)
  To: Brian Bunker; +Cc: Ewan D. Milne, linux-scsi

On 9/24/20 12:21 AM, Brian Bunker wrote:
> We have tried our patch here and it works. We have not tried it at the
> customer site where they hit the crash. Since they hit the BUG_ON line, which we
> can see in the logs we have, we expect that removing the race as we did
> would avoid the crash. We also remove the BUG_ONs in our patch, so it cannot hit
> the same crash; if there is another similar race, a NULL pointer dereference could
> still happen with our patch. I saw you had a patch that only uses the value if the
> pointer is not NULL. That would also stop the crash, but it would hide the race,
> which the BUG_ON was helpful in finding.
> 
> Trying our fix at the customer site would be more difficult for us, since the operating
> system where the crash occurred belongs to Oracle; that is why you see their patch for
> the same issue. Our interest in getting this fixed goes beyond this one customer: more
> Linux vendors inherit this code as they move to newer kernels, and we rely on ALUA.
> We hope to catch it here.
> 
> Should I put together a patch that removes the h->sdev = NULL assignment from the
> detach function, adds the synchronize_rcu(), and removes the BUG_ONs, or did you
> want me to diff against your commit where you have already removed the BUG_ONs?
> 
No need, I already sent a patch to the Oracle folks, attached to another
mail.
Guess I'll be sending an 'official' patch now, seeing that I have 
confirmation.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

