linux-kernel.vger.kernel.org archive mirror
* [PATCH] virtio-scsi: Fix the race condition in virtscsi_handle_event
@ 2015-01-05  6:04 Fam Zheng
  2015-01-05 19:51 ` Venkatesh Srinivas
       [not found] ` <CAHdzE-_QmNZVhmChHmy3FgRjJ692Apw2Va-SGoTz25s0fLjbvQ@mail.gmail.com>
  0 siblings, 2 replies; 5+ messages in thread
From: Fam Zheng @ 2015-01-05  6:04 UTC
  To: linux-scsi
  Cc: James E.J. Bottomley, linux-kernel, Paolo Bonzini,
	Christoph Hellwig, Michael S. Tsirkin

There is a race condition in virtscsi_handle_event when many device
hotplug/unplug events arrive in quick succession.

The scsi_remove_device call in virtscsi_handle_transport_reset may
trigger the BUG_ON in scsi_target_reap, because the target state is
changed underneath it, probably by the scsi_scan_host call of another
event. I'm able to reproduce it by repeatedly plugging and unplugging a
SCSI disk with the same LUN.

To make it safe, add a mutex to struct virtio_scsi and hold it in
virtscsi_handle_event, so that all events are processed one at a time.
With this lock, the panic goes away.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 drivers/scsi/virtio_scsi.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index c52bb5d..7f194d4 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -110,6 +110,9 @@ struct virtio_scsi {
 	/* CPU hotplug notifier */
 	struct notifier_block nb;
 
+	/* Protect the hotplug/unplug event handling */
+	struct mutex scan_lock;
+
 	/* Protected by event_vq lock */
 	bool stop_events;
 
@@ -377,6 +380,7 @@ static void virtscsi_handle_event(struct work_struct *work)
 	struct virtio_scsi *vscsi = event_node->vscsi;
 	struct virtio_scsi_event *event = &event_node->event;
 
+	mutex_lock(&vscsi->scan_lock);
 	if (event->event &
 	    cpu_to_virtio32(vscsi->vdev, VIRTIO_SCSI_T_EVENTS_MISSED)) {
 		event->event &= ~cpu_to_virtio32(vscsi->vdev,
@@ -397,6 +401,7 @@ static void virtscsi_handle_event(struct work_struct *work)
 		pr_err("Unsupport virtio scsi event %x\n", event->event);
 	}
 	virtscsi_kick_event(vscsi, event_node);
+	mutex_unlock(&vscsi->scan_lock);
 }
 
 static void virtscsi_complete_event(struct virtio_scsi *vscsi, void *buf)
@@ -894,6 +899,7 @@ static int virtscsi_init(struct virtio_device *vdev,
 	const char **names;
 	struct virtqueue **vqs;
 
+	mutex_init(&vscsi->scan_lock);
 	num_vqs = vscsi->num_queues + VIRTIO_SCSI_VQ_BASE;
 	vqs = kmalloc(num_vqs * sizeof(struct virtqueue *), GFP_KERNEL);
 	callbacks = kmalloc(num_vqs * sizeof(vq_callback_t *), GFP_KERNEL);
-- 
1.9.3
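
The locking pattern in the patch can be illustrated outside the kernel.
What follows is only a userspace sketch with pthreads, not driver code:
struct fake_vscsi, fake_handle_event and run_events are hypothetical
names, and the pthread mutex merely stands in for scan_lock. It shows
that once the whole handler body runs under one lock, no two handlers
are ever inside it at the same time.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_EVENTS 64

/* Hypothetical userspace stand-in for struct virtio_scsi. */
struct fake_vscsi {
	pthread_mutex_t scan_lock; /* plays the role of scan_lock in the patch */
	int in_handler;            /* handlers currently inside the locked region */
	int max_in_handler;        /* highest overlap ever observed */
	int handled;               /* total events processed */
};

/* Analog of virtscsi_handle_event(): the entire body runs under scan_lock. */
static void *fake_handle_event(void *arg)
{
	struct fake_vscsi *v = arg;

	pthread_mutex_lock(&v->scan_lock);
	v->in_handler++;
	assert(v->in_handler == 1); /* the lock guarantees exclusivity */
	if (v->in_handler > v->max_in_handler)
		v->max_in_handler = v->in_handler;
	/* the scan or remove work for this event would happen here */
	v->handled++;
	v->in_handler--;
	pthread_mutex_unlock(&v->scan_lock);
	return NULL;
}

/*
 * Start up to nevents handlers concurrently, wait for all of them, store
 * the number of processed events in *handled and return the peak overlap.
 */
static int run_events(int nevents, int *handled)
{
	struct fake_vscsi v = { .in_handler = 0 };
	pthread_t tids[MAX_EVENTS];
	int i, started = 0;

	if (nevents > MAX_EVENTS)
		nevents = MAX_EVENTS;
	pthread_mutex_init(&v.scan_lock, NULL);
	for (i = 0; i < nevents; i++)
		if (pthread_create(&tids[started], NULL, fake_handle_event, &v) == 0)
			started++;
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&v.scan_lock);
	*handled = v.handled;
	return v.max_in_handler;
}
```

Calling run_events(8, &handled) from a test program should process all
eight events while the returned peak overlap stays at 1.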



* Re: [PATCH] virtio-scsi: Fix the race condition in virtscsi_handle_event
  2015-01-05  6:04 [PATCH] virtio-scsi: Fix the race condition in virtscsi_handle_event Fam Zheng
@ 2015-01-05 19:51 ` Venkatesh Srinivas
       [not found] ` <CAHdzE-_QmNZVhmChHmy3FgRjJ692Apw2Va-SGoTz25s0fLjbvQ@mail.gmail.com>
  1 sibling, 0 replies; 5+ messages in thread
From: Venkatesh Srinivas @ 2015-01-05 19:51 UTC
  To: Fam Zheng
  Cc: linux-scsi, James E.J. Bottomley, Linux Kernel Developers List,
	Paolo Bonzini, Christoph Hellwig, Michael S. Tsirkin

Nice find.

This fix does have the effect of serializing all event handling via
scan_lock; perhaps you want to instead create a singlethreaded
workqueue in virtio_scsi and queue handle_event there, rather than
waiting on scan_lock on the system workqueue?

Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>

-- vs;
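
The single-threaded workqueue idea can also be sketched in userspace.
This is an illustration under stated assumptions, not a proposed driver
change: struct ordered_wq and the wq_* helpers are hypothetical names,
and a single pthread worker draining a FIFO stands in for what an
ordered workqueue (for example one created with alloc_ordered_workqueue)
would provide in the kernel. Because there is only one worker, events
are handled strictly one at a time and in submission order, without a
separate lock around the handler.

```c
#include <assert.h>
#include <pthread.h>

#define QCAP 16 /* total events this toy queue accepts over its lifetime */

/* Hypothetical userspace analog of a single-threaded (ordered) workqueue. */
struct ordered_wq {
	pthread_mutex_t lock;
	pthread_cond_t more;
	int items[QCAP]; /* pending event ids, FIFO */
	int head, tail;
	int stopping;
	int log[QCAP];   /* order in which the worker handled events */
	int nlogged;
	pthread_t worker;
};

/* The only worker thread: pops events in FIFO order, one at a time. */
static void *wq_worker(void *arg)
{
	struct ordered_wq *wq = arg;

	pthread_mutex_lock(&wq->lock);
	for (;;) {
		while (wq->head == wq->tail && !wq->stopping)
			pthread_cond_wait(&wq->more, &wq->lock);
		if (wq->head == wq->tail)
			break; /* asked to stop and nothing left to drain */
		int ev = wq->items[wq->head++];
		assert(wq->nlogged < QCAP);
		/* "Handle" the event; a real handler would do the scan here. */
		wq->log[wq->nlogged++] = ev;
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

/* Initialise the queue and start its worker; returns 0 on success. */
static int wq_start(struct ordered_wq *wq)
{
	wq->head = wq->tail = wq->stopping = wq->nlogged = 0;
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->more, NULL);
	return pthread_create(&wq->worker, NULL, wq_worker, wq);
}

/* Queue one event; returns 0 on success, -1 if the queue is full. */
static int wq_queue(struct ordered_wq *wq, int ev)
{
	int ret = -1;

	pthread_mutex_lock(&wq->lock);
	if (wq->tail < QCAP) {
		wq->items[wq->tail++] = ev;
		pthread_cond_signal(&wq->more);
		ret = 0;
	}
	pthread_mutex_unlock(&wq->lock);
	return ret;
}

/* Let the worker drain pending events, then stop and join it. */
static void wq_destroy(struct ordered_wq *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->stopping = 1;
	pthread_cond_signal(&wq->more);
	pthread_mutex_unlock(&wq->lock);
	pthread_join(wq->worker, NULL);
	pthread_cond_destroy(&wq->more);
	pthread_mutex_destroy(&wq->lock);
}
```

Queueing events 1, 2, 3 and then calling wq_destroy() should leave
exactly those three ids in wq.log, in that order. The trade-off against
the mutex approach is that waiting events sit in the queue instead of
blocking worker threads of a shared workqueue.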


* Re: [PATCH] virtio-scsi: Fix the race condition in virtscsi_handle_event
       [not found] ` <CAHdzE-_QmNZVhmChHmy3FgRjJ692Apw2Va-SGoTz25s0fLjbvQ@mail.gmail.com>
@ 2015-01-05 22:10   ` Michael S. Tsirkin
  2015-01-06  3:57     ` Fam Zheng
  2015-01-06  7:15     ` Michael S. Tsirkin
  0 siblings, 2 replies; 5+ messages in thread
From: Michael S. Tsirkin @ 2015-01-05 22:10 UTC
  To: Venkatesh Srinivas
  Cc: Fam Zheng, linux-scsi, James E.J. Bottomley,
	Linux Kernel Developers List, Paolo Bonzini, Christoph Hellwig

On Mon, Jan 05, 2015 at 11:48:47AM -0800, Venkatesh Srinivas wrote:
> Nice find.
> 
> This fix does have the effect of serializing all event handling via scan_lock;
> perhaps you want to instead create a singlethreaded workqueue in virtio_scsi
> and queue handle_event there, rather than waiting on scan_lock on the system
> workqueue?

Or use the system single-threaded wq.


> Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
> 
> -- vs;


* Re: [PATCH] virtio-scsi: Fix the race condition in virtscsi_handle_event
  2015-01-05 22:10   ` Michael S. Tsirkin
@ 2015-01-06  3:57     ` Fam Zheng
  2015-01-06  7:15     ` Michael S. Tsirkin
  1 sibling, 0 replies; 5+ messages in thread
From: Fam Zheng @ 2015-01-06  3:57 UTC
  To: Michael S. Tsirkin
  Cc: Venkatesh Srinivas, linux-scsi, James E.J. Bottomley,
	Linux Kernel Developers List, Paolo Bonzini, Christoph Hellwig

On Tue, 01/06 00:10, Michael S. Tsirkin wrote:
> On Mon, Jan 05, 2015 at 11:48:47AM -0800, Venkatesh Srinivas wrote:
> > Nice find.
> > 
> > This fix does have the effect of serializing all event handling via scan_lock;
> > perhaps you want to instead create a singlethreaded workqueue in virtio_scsi
> > and queue handle_event there, rather than waiting on scan_lock on the system
> > workqueue?
> 
> Or use the system single-threaded wq.

Good idea, I'll change to that.

Thanks,

Fam

> 
> 
> > Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
> > 
> > -- vs;


* Re: [PATCH] virtio-scsi: Fix the race condition in virtscsi_handle_event
  2015-01-05 22:10   ` Michael S. Tsirkin
  2015-01-06  3:57     ` Fam Zheng
@ 2015-01-06  7:15     ` Michael S. Tsirkin
  1 sibling, 0 replies; 5+ messages in thread
From: Michael S. Tsirkin @ 2015-01-06  7:15 UTC
  To: Venkatesh Srinivas
  Cc: Fam Zheng, linux-scsi, James E.J. Bottomley,
	Linux Kernel Developers List, Paolo Bonzini, Christoph Hellwig

On Tue, Jan 06, 2015 at 12:10:59AM +0200, Michael S. Tsirkin wrote:
> On Mon, Jan 05, 2015 at 11:48:47AM -0800, Venkatesh Srinivas wrote:
> > Nice find.
> > 
> > This fix does have the effect of serializing all event handling via scan_lock;
> > perhaps you want to instead create a singlethreaded workqueue in virtio_scsi
> > and queue handle_event there, rather than waiting on scan_lock on the system
> > workqueue?
> 
> Or use the system single-threaded wq.


I was sure we have one, but apparently not :(

Pls ignore the comment, sorry about the noise.

> 
> > Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
> > 
> > -- vs;

