All of lore.kernel.org
 help / color / mirror / Atom feed
From: Miles Chen <miles.chen@mediatek.com>
To: <alice.chao@mediatek.com>
Cc: <cc.chou@mediatek.com>, <chaotian.jing@mediatek.com>,
	<chun-hung.wu@mediatek.com>, <jejb@linux.ibm.com>,
	<jiajie.hao@mediatek.com>, <lin.gui@mediatek.com>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>,
	<linux-mediatek@lists.infradead.org>,
	<linux-scsi@vger.kernel.org>, <martin.petersen@oracle.com>,
	<matthias.bgg@gmail.com>, <peter.wang@mediatek.com>,
	<powen.kao@mediatek.com>, <qilin.tan@mediatek.com>,
	<stanley.chu@mediatek.com>, <wsd_upstream@mediatek.com>,
	<yanxu.wei@mediatek.com>
Subject: Re: [PATCH v3 1/1] scsi: Fix racing between dev init and dev reset
Date: Fri, 15 Apr 2022 13:52:33 +0800	[thread overview]
Message-ID: <20220415055233.29264-1-miles.chen@mediatek.com> (raw)
In-Reply-To: <20220415040446.26451-2-alice.chao@mediatek.com>

Hi Alice,

> Device reset thread uses kobject_uevent_env() to get kobj.parent, and it
> aces with device init thread which calls device_add() to add kobj.parent

"aces" may be "races"?

> before kobject_uevent_env().
> 
> Device init call:           Device reset call:
>  scsi_probe_and_add_lun()    scsi_evt_thread()
>   scsi_add_lun()             scsi_evt_emit()
>    scsi_sysfs_add_sdev()      kobject_uevent_env() //get kobj.parent
>     scsi_target_add()           kobject_get_path()
>                                  len = get_kobj_path_length () // len=1 because parent hasn't created yet
>     device_add() // add kobj.parent
>       kobject_uevent_env()
>        kobject_get_path()         path = kzalloc()
>         fill_kobj_path()           fill_kobj_path() // --length; length -= cur is a negative value
>                                     memcpy(path + length, kobject_name(parent), cur); // slab OOB!
> 
> Above backtrace describes the problem, device reset thread will get wrong
> kobj.parent when device init thread didn’t add kobj.parent yet. When this
> racing happened, it triggers the a KASAN dump on the final iteration:
> 
> BUG: KASAN: slab-out-of-bounds in kobject_get_path+0xf8/0x1b8
> Write of size 11 at addr ffffff80d6bb94f5 by task kworker/3:1/58
> <snip>
> Call trace:
>  __kasan_report+0x124/0x1c8
>  kasan_report+0x54/0x84
>  kasan_check_range+0x200/0x208
>  memcpy+0xb8/0xf0
>  kobject_get_path+0xf8/0x1b8
>  kobject_uevent_env+0x228/0xa88
>  scsi_evt_thread+0x2d0/0x5b0
>  process_one_work+0x570/0xf94
>  worker_thread+0x7cc/0xf80
>  kthread+0x2c4/0x388
> 
> These two jobs are scheduled asynchronously, we can't guaranteed that
> kobj.parent will be created in device init thread before device reset
> thread calls kobject_get_path().
> 
> To resolve the racing issue between device init thread and device reset
> thread, we use wait_event() in scsi_evt_emit() to wait for device_add()
> to complete the creation of kobj.parent.
> 
> Device init call:                Device reset call:
> ufshcd_async_scan()              scsi_evt_thread()
>  scsi_scan_host()                 scsi_evt_emit() <- add wait_event()
>   do_scsi_scan_host() <- add wake_up()
>    scsi_scan_host_selected()
>     scsi_scan_channel()
>      scsi_probe_and_add_lun()
>       scsi_target_add()
>        device_add() // add kobj.parent
>         kobject_uevent_env()
>          kobject_get_path()
>           fill_kobj_path()
>   do_scan_async() <- wake_up()     kobject_uevent_env() // add kobj.parent

There is no do_scan_async() changes in this patch. It this a typo?
From the patch, the flow looks like:

Device init call                        Device reset call:
do_scsi_scan_host()                     scsi_evt_thread()
 scsi_scan_host_selected()               scsi_evt_emit() <- add wait_event()
  scsi_scan_channel()
   scsi_probe_and_add_lun()
    scsi_target_add()
     device_add() // add kobj.parent
      kobject_uevent_env()
       kobject_get_path()
        fill_kobj_path()
 //call wake_up() after scsi_scan_host_selected is done
                                        kobject_uevent_env()
                                         kobject_get_path() // get valid kobj.parent
					 ...
                                           fill_kobj_path()

>                                     kobject_get_path() // get valid kobj.parent
>                                      fill_kobj_path()
> 
> After we add wake_up at do_scsi_scan_host() in device init thread, we can
> ensure that device reset thread will get kobject after device init thread
> finishes adding parent.
> 
> Signed-off-by: Alice Chao <alice.chao@mediatek.com>
> 
> ---
> 
> Change since v2
> -Change commit: Describes the preblem first and then the solution.
> -Add commit: Add KASAN error log.

Please keep all change history.

e.g.,

See https://lore.kernel.org/lkml/20220326022728.2969-1-jianjun.wang@mediatek.com/
as an example


Thanks,
Miles

> 
> ---
>  drivers/scsi/scsi_lib.c  | 1 +
>  drivers/scsi/scsi_scan.c | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0a70aa763a96..abf9a71ed77c 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2461,6 +2461,7 @@ static void scsi_evt_emit(struct scsi_device *sdev, struct scsi_event *evt)
>  		break;
>  	case SDEV_EVT_POWER_ON_RESET_OCCURRED:
>  		envp[idx++] = "SDEV_UA=POWER_ON_RESET_OCCURRED";
> +		wait_event(sdev->host->host_wait, sdev->sdev_gendev.kobj.parent != NULL);
>  		break;
>  	default:
>  		/* do nothing */
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index f4e6c68ac99e..431f229ac435 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1904,6 +1904,7 @@ static void do_scsi_scan_host(struct Scsi_Host *shost)
>  	} else {
>  		scsi_scan_host_selected(shost, SCAN_WILD_CARD, SCAN_WILD_CARD,
>  				SCAN_WILD_CARD, 0);
> +		wake_up(&shost->host_wait);
>  	}
>  }
>  
> -- 
> 2.18.0
> 
> 

WARNING: multiple messages have this Message-ID (diff)
From: Miles Chen <miles.chen@mediatek.com>
To: <alice.chao@mediatek.com>
Cc: <cc.chou@mediatek.com>, <chaotian.jing@mediatek.com>,
	<chun-hung.wu@mediatek.com>, <jejb@linux.ibm.com>,
	<jiajie.hao@mediatek.com>,  <lin.gui@mediatek.com>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>,
	<linux-mediatek@lists.infradead.org>,
	<linux-scsi@vger.kernel.org>, <martin.petersen@oracle.com>,
	<matthias.bgg@gmail.com>, <peter.wang@mediatek.com>,
	<powen.kao@mediatek.com>, <qilin.tan@mediatek.com>,
	<stanley.chu@mediatek.com>, <wsd_upstream@mediatek.com>,
	<yanxu.wei@mediatek.com>
Subject: Re: [PATCH v3 1/1] scsi: Fix racing between dev init and dev reset
Date: Fri, 15 Apr 2022 13:52:33 +0800	[thread overview]
Message-ID: <20220415055233.29264-1-miles.chen@mediatek.com> (raw)
In-Reply-To: <20220415040446.26451-2-alice.chao@mediatek.com>

Hi Alice,

> Device reset thread uses kobject_uevent_env() to get kobj.parent, and it
> aces with device init thread which calls device_add() to add kobj.parent

"aces" may be "races"?

> before kobject_uevent_env().
> 
> Device init call:           Device reset call:
>  scsi_probe_and_add_lun()    scsi_evt_thread()
>   scsi_add_lun()             scsi_evt_emit()
>    scsi_sysfs_add_sdev()      kobject_uevent_env() //get kobj.parent
>     scsi_target_add()           kobject_get_path()
>                                  len = get_kobj_path_length () // len=1 because parent hasn't created yet
>     device_add() // add kobj.parent
>       kobject_uevent_env()
>        kobject_get_path()         path = kzalloc()
>         fill_kobj_path()           fill_kobj_path() // --length; length -= cur is a negative value
>                                     memcpy(path + length, kobject_name(parent), cur); // slab OOB!
> 
> Above backtrace describes the problem, device reset thread will get wrong
> kobj.parent when device init thread didn’t add kobj.parent yet. When this
> racing happened, it triggers the a KASAN dump on the final iteration:
> 
> BUG: KASAN: slab-out-of-bounds in kobject_get_path+0xf8/0x1b8
> Write of size 11 at addr ffffff80d6bb94f5 by task kworker/3:1/58
> <snip>
> Call trace:
>  __kasan_report+0x124/0x1c8
>  kasan_report+0x54/0x84
>  kasan_check_range+0x200/0x208
>  memcpy+0xb8/0xf0
>  kobject_get_path+0xf8/0x1b8
>  kobject_uevent_env+0x228/0xa88
>  scsi_evt_thread+0x2d0/0x5b0
>  process_one_work+0x570/0xf94
>  worker_thread+0x7cc/0xf80
>  kthread+0x2c4/0x388
> 
> These two jobs are scheduled asynchronously, we can't guaranteed that
> kobj.parent will be created in device init thread before device reset
> thread calls kobject_get_path().
> 
> To resolve the racing issue between device init thread and device reset
> thread, we use wait_event() in scsi_evt_emit() to wait for device_add()
> to complete the creation of kobj.parent.
> 
> Device init call:                Device reset call:
> ufshcd_async_scan()              scsi_evt_thread()
>  scsi_scan_host()                 scsi_evt_emit() <- add wait_event()
>   do_scsi_scan_host() <- add wake_up()
>    scsi_scan_host_selected()
>     scsi_scan_channel()
>      scsi_probe_and_add_lun()
>       scsi_target_add()
>        device_add() // add kobj.parent
>         kobject_uevent_env()
>          kobject_get_path()
>           fill_kobj_path()
>   do_scan_async() <- wake_up()     kobject_uevent_env() // add kobj.parent

There is no do_scan_async() changes in this patch. It this a typo?
From the patch, the flow looks like:

Device init call                        Device reset call:
do_scsi_scan_host()                     scsi_evt_thread()
 scsi_scan_host_selected()               scsi_evt_emit() <- add wait_event()
  scsi_scan_channel()
   scsi_probe_and_add_lun()
    scsi_target_add()
     device_add() // add kobj.parent
      kobject_uevent_env()
       kobject_get_path()
        fill_kobj_path()
 //call wake_up() after scsi_scan_host_selected is done
                                        kobject_uevent_env()
                                         kobject_get_path() // get valid kobj.parent
					 ...
                                           fill_kobj_path()

>                                     kobject_get_path() // get valid kobj.parent
>                                      fill_kobj_path()
> 
> After we add wake_up at do_scsi_scan_host() in device init thread, we can
> ensure that device reset thread will get kobject after device init thread
> finishes adding parent.
> 
> Signed-off-by: Alice Chao <alice.chao@mediatek.com>
> 
> ---
> 
> Change since v2
> -Change commit: Describes the preblem first and then the solution.
> -Add commit: Add KASAN error log.

Please keep all change history.

e.g.,

See https://lore.kernel.org/lkml/20220326022728.2969-1-jianjun.wang@mediatek.com/
as an example


Thanks,
Miles

> 
> ---
>  drivers/scsi/scsi_lib.c  | 1 +
>  drivers/scsi/scsi_scan.c | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0a70aa763a96..abf9a71ed77c 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2461,6 +2461,7 @@ static void scsi_evt_emit(struct scsi_device *sdev, struct scsi_event *evt)
>  		break;
>  	case SDEV_EVT_POWER_ON_RESET_OCCURRED:
>  		envp[idx++] = "SDEV_UA=POWER_ON_RESET_OCCURRED";
> +		wait_event(sdev->host->host_wait, sdev->sdev_gendev.kobj.parent != NULL);
>  		break;
>  	default:
>  		/* do nothing */
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index f4e6c68ac99e..431f229ac435 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1904,6 +1904,7 @@ static void do_scsi_scan_host(struct Scsi_Host *shost)
>  	} else {
>  		scsi_scan_host_selected(shost, SCAN_WILD_CARD, SCAN_WILD_CARD,
>  				SCAN_WILD_CARD, 0);
> +		wake_up(&shost->host_wait);
>  	}
>  }
>  
> -- 
> 2.18.0
> 
> 

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

WARNING: multiple messages have this Message-ID (diff)
From: Miles Chen <miles.chen@mediatek.com>
To: <alice.chao@mediatek.com>
Cc: <cc.chou@mediatek.com>, <chaotian.jing@mediatek.com>,
	<chun-hung.wu@mediatek.com>, <jejb@linux.ibm.com>,
	<jiajie.hao@mediatek.com>,  <lin.gui@mediatek.com>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>,
	<linux-mediatek@lists.infradead.org>,
	<linux-scsi@vger.kernel.org>, <martin.petersen@oracle.com>,
	<matthias.bgg@gmail.com>, <peter.wang@mediatek.com>,
	<powen.kao@mediatek.com>, <qilin.tan@mediatek.com>,
	<stanley.chu@mediatek.com>, <wsd_upstream@mediatek.com>,
	<yanxu.wei@mediatek.com>
Subject: Re: [PATCH v3 1/1] scsi: Fix racing between dev init and dev reset
Date: Fri, 15 Apr 2022 13:52:33 +0800	[thread overview]
Message-ID: <20220415055233.29264-1-miles.chen@mediatek.com> (raw)
In-Reply-To: <20220415040446.26451-2-alice.chao@mediatek.com>

Hi Alice,

> Device reset thread uses kobject_uevent_env() to get kobj.parent, and it
> aces with device init thread which calls device_add() to add kobj.parent

"aces" may be "races"?

> before kobject_uevent_env().
> 
> Device init call:           Device reset call:
>  scsi_probe_and_add_lun()    scsi_evt_thread()
>   scsi_add_lun()             scsi_evt_emit()
>    scsi_sysfs_add_sdev()      kobject_uevent_env() //get kobj.parent
>     scsi_target_add()           kobject_get_path()
>                                  len = get_kobj_path_length () // len=1 because parent hasn't created yet
>     device_add() // add kobj.parent
>       kobject_uevent_env()
>        kobject_get_path()         path = kzalloc()
>         fill_kobj_path()           fill_kobj_path() // --length; length -= cur is a negative value
>                                     memcpy(path + length, kobject_name(parent), cur); // slab OOB!
> 
> Above backtrace describes the problem, device reset thread will get wrong
> kobj.parent when device init thread didn’t add kobj.parent yet. When this
> racing happened, it triggers the a KASAN dump on the final iteration:
> 
> BUG: KASAN: slab-out-of-bounds in kobject_get_path+0xf8/0x1b8
> Write of size 11 at addr ffffff80d6bb94f5 by task kworker/3:1/58
> <snip>
> Call trace:
>  __kasan_report+0x124/0x1c8
>  kasan_report+0x54/0x84
>  kasan_check_range+0x200/0x208
>  memcpy+0xb8/0xf0
>  kobject_get_path+0xf8/0x1b8
>  kobject_uevent_env+0x228/0xa88
>  scsi_evt_thread+0x2d0/0x5b0
>  process_one_work+0x570/0xf94
>  worker_thread+0x7cc/0xf80
>  kthread+0x2c4/0x388
> 
> These two jobs are scheduled asynchronously, we can't guaranteed that
> kobj.parent will be created in device init thread before device reset
> thread calls kobject_get_path().
> 
> To resolve the racing issue between device init thread and device reset
> thread, we use wait_event() in scsi_evt_emit() to wait for device_add()
> to complete the creation of kobj.parent.
> 
> Device init call:                Device reset call:
> ufshcd_async_scan()              scsi_evt_thread()
>  scsi_scan_host()                 scsi_evt_emit() <- add wait_event()
>   do_scsi_scan_host() <- add wake_up()
>    scsi_scan_host_selected()
>     scsi_scan_channel()
>      scsi_probe_and_add_lun()
>       scsi_target_add()
>        device_add() // add kobj.parent
>         kobject_uevent_env()
>          kobject_get_path()
>           fill_kobj_path()
>   do_scan_async() <- wake_up()     kobject_uevent_env() // add kobj.parent

There is no do_scan_async() changes in this patch. It this a typo?
From the patch, the flow looks like:

Device init call                        Device reset call:
do_scsi_scan_host()                     scsi_evt_thread()
 scsi_scan_host_selected()               scsi_evt_emit() <- add wait_event()
  scsi_scan_channel()
   scsi_probe_and_add_lun()
    scsi_target_add()
     device_add() // add kobj.parent
      kobject_uevent_env()
       kobject_get_path()
        fill_kobj_path()
 //call wake_up() after scsi_scan_host_selected is done
                                        kobject_uevent_env()
                                         kobject_get_path() // get valid kobj.parent
					 ...
                                           fill_kobj_path()

>                                     kobject_get_path() // get valid kobj.parent
>                                      fill_kobj_path()
> 
> After we add wake_up at do_scsi_scan_host() in device init thread, we can
> ensure that device reset thread will get kobject after device init thread
> finishes adding parent.
> 
> Signed-off-by: Alice Chao <alice.chao@mediatek.com>
> 
> ---
> 
> Change since v2
> -Change commit: Describes the preblem first and then the solution.
> -Add commit: Add KASAN error log.

Please keep all change history.

e.g.,

See https://lore.kernel.org/lkml/20220326022728.2969-1-jianjun.wang@mediatek.com/
as an example


Thanks,
Miles

> 
> ---
>  drivers/scsi/scsi_lib.c  | 1 +
>  drivers/scsi/scsi_scan.c | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0a70aa763a96..abf9a71ed77c 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2461,6 +2461,7 @@ static void scsi_evt_emit(struct scsi_device *sdev, struct scsi_event *evt)
>  		break;
>  	case SDEV_EVT_POWER_ON_RESET_OCCURRED:
>  		envp[idx++] = "SDEV_UA=POWER_ON_RESET_OCCURRED";
> +		wait_event(sdev->host->host_wait, sdev->sdev_gendev.kobj.parent != NULL);
>  		break;
>  	default:
>  		/* do nothing */
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index f4e6c68ac99e..431f229ac435 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1904,6 +1904,7 @@ static void do_scsi_scan_host(struct Scsi_Host *shost)
>  	} else {
>  		scsi_scan_host_selected(shost, SCAN_WILD_CARD, SCAN_WILD_CARD,
>  				SCAN_WILD_CARD, 0);
> +		wake_up(&shost->host_wait);
>  	}
>  }
>  
> -- 
> 2.18.0
> 
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-04-15  5:52 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-15  4:04 [PATCH v3 1/1] scsi: Fix racing between dev init and dev reset Alice Chao
2022-04-15  4:04 ` Alice Chao
2022-04-15  4:04 ` Alice Chao
2022-04-15  5:52 ` Miles Chen [this message]
2022-04-15  5:52   ` Miles Chen
2022-04-15  5:52   ` Miles Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220415055233.29264-1-miles.chen@mediatek.com \
    --to=miles.chen@mediatek.com \
    --cc=alice.chao@mediatek.com \
    --cc=cc.chou@mediatek.com \
    --cc=chaotian.jing@mediatek.com \
    --cc=chun-hung.wu@mediatek.com \
    --cc=jejb@linux.ibm.com \
    --cc=jiajie.hao@mediatek.com \
    --cc=lin.gui@mediatek.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mediatek@lists.infradead.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=matthias.bgg@gmail.com \
    --cc=peter.wang@mediatek.com \
    --cc=powen.kao@mediatek.com \
    --cc=qilin.tan@mediatek.com \
    --cc=stanley.chu@mediatek.com \
    --cc=wsd_upstream@mediatek.com \
    --cc=yanxu.wei@mediatek.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.