From mboxrd@z Thu Jan 1 00:00:00 1970 From: James Smart Subject: Re: [patch 19/30] Fix the reproducible oops in scsi Date: Fri, 16 Feb 2007 09:33:15 -0500 Message-ID: <45D5C0AB.3040902@emulex.com> References: <200702160946.l1G9kb4I026673@shell0.pdx.osdl.net> Reply-To: James.Smart@Emulex.Com Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Return-path: Received: from emulex.emulex.com ([138.239.112.1]:55248 "EHLO emulex.emulex.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1422634AbXBPOeL (ORCPT ); Fri, 16 Feb 2007 09:34:11 -0500 In-Reply-To: <200702160946.l1G9kb4I026673@shell0.pdx.osdl.net> Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: akpm@linux-foundation.org Cc: James.Bottomley@steeleye.com, linux-scsi@vger.kernel.org, hirofumi@mail.parknet.co.jp Andrew, Please do not include this fix. Per a discussion earlier this week, Hannes Reinecke has proposed a fix: http://marc.theaimsgroup.com/?l=linux-scsi&m=117139230702785&w=2 -- james s akpm@linux-foundation.org wrote: > From: OGAWA Hirofumi > > I got the oops after some hotplug events. And the similar oops can > reproduce by the following step. > > plug usb-storage (e.g. scsi_host of usb is "host5", and device is sde) > # mount /dev/sde1 /mnt > # echo 1 > /sys/block/sde/device/delete > # echo - - - > /sys/class/scsi_host/host5/scan > # echo - - - > /sys/class/scsi_host/host5/scan > # echo - - - > /sys/class/scsi_host/host5/scan > unplug usb-storage > # umount /mnt > > general protection fault: 0000 [1] SMP > CPU 1 > Modules linked in: nls_iso8859_1 nls_cp437 vfat fat nls_base nfsd exportfs p4_clockmod speedstep_lib cpufreq_ondemand freq_table thermal fan button processor ac battery autofs4 ipv6 nfs lockd nfs_acl sunrpc deflate zlib_deflate zlib_inflate twofish twofish_common serpent blowfish des cbc ecb blkcipher aes xcbc sha256 sha1 crypto_null hmac crypto_hash cryptomgr af_key binfmt_misc dm_snapshot dm_mirror dm_mod eth1394 usb_storage snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer r8169 snd i2c_i801 bitrev crc32 ehci_hcd uhci_hcd intel_agp ohci1394 ieee1394 rng_core usbcore soundcore snd_page_alloc i2c_core parport_pc parport evdev sg sr_mod cdrom floppy > Pid: 3780, comm: umount Not tainted 2.6.20-rc4 #1 > RIP: 0010:[] [] sysfs_remove_file+0x8/0x12 > RSP: 0018:ffff810059703b58 EFLAGS: 00010286 > RAX: ffff8100596bba50 RBX: ffff8100596bba50 RCX: ffff810059703b68 > RDX: ffff810059703b48 RSI: 6b6b6b6b6b6b6b6b RDI: 6b6b6b6b6b6b6b6b > RBP: ffff810059703b58 R08: ffff8100596bbac8 R09: 0000000000000000 > R10: ffff8100596bba28 R11: 0000000000000400 R12: 6b6b6b6b6b6b6b6b > R13: ffff81005e303650 R14: ffff81005e9cf1b8 R15: ffff8100596bba50 > FS: 00002af1e22111d0(0000) GS:ffff81005f7a9898(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 00007fff3c0145e8 CR3: 0000000059600000 CR4: 00000000000006e0 > Process umount (pid: 3780, threadinfo ffff810059702000, task ffff81005f3997f0) > Stack: ffff810059703b78 ffffffff8023a07a ffff8100596bba50 ffff8100596bba50 > ffff810059703ba8 ffffffff8023a110 ffff81005e303478 ffff8100596bba50 > ffff8100596bba28 ffff81005e303478 ffff810059703bd8 ffffffff80249049 > Call Trace: > [] device_remove_file+0x26/0x33 > [] device_del+0x3a/0x1d1 > [] scsi_target_reap_usercontext+0x54/0xbb > [] execute_in_process_context+0x27/0x56 > [] scsi_target_reap+0x98/0xac > [] scsi_device_dev_release_usercontext+0xab/0xd8 > [] execute_in_process_context+0x27/0x56 > [] scsi_device_dev_release+0x17/0x19 > [] device_release+0x31/0x70 > [] kobject_cleanup+0x53/0x77 > [] kobject_release+0x0/0xf > [] kobject_release+0xd/0xf > [] kref_put+0x63/0x6d > [] kobject_put+0x19/0x1b > [] put_device+0x15/0x17 > [] scsi_device_put+0x3d/0x42 > [] scsi_disk_put+0x2e/0x3f > [] sd_release+0x7b/0x82 > [] __blkdev_put+0x7c/0x15e > [] __blkdev_put+0x12b/0x15e > [] blkdev_put+0xb/0xd > [] close_bdev_excl+0x19/0x1e > [] kill_block_super+0x36/0x3b > [] deactivate_super+0x6f/0x84 > [] mntput_no_expire+0x56/0x87 > [] path_release_on_umount+0x1d/0x21 > [] sys_umount+0x21f/0x254 > [] sys_newstat+0x22/0x3c > [] system_call+0x7e/0x83 > > By "echo 1 > /sys/block/sde/device/delete", sdev->sdev_state became SDEV_DEL. > > The problem is in, > scsi_probe_and_add_lun() > -> scsi_device_lookup_by_target() > -> __scsi_device_lookup_by_target() > > by "echo - - - > /sys/class/scsi_host/host5/scan". > > The first scan is still not problem. Since founded existence device is > already SDEV_DEL, it just adds a new device. > > The problem is second re-scan, it is looking a existence device up. In > this case it is SDEV_DEL device, not newly added device by first scan. > > This patch fixes the oops by excluding the SDEV_DEL devices in > __scsi_device_lookup_by_target(). > > Signed-off-by: OGAWA Hirofumi > Cc: James Smart > > > > On Wed, 10 Jan 2007 08:31:54 -0500 > James Smart wrote: > >> I don't believe this is a valid fix. This is yet another case >> of the reuse-after-free issues on sdevs. The real issue is the >> deleted sdev isn't truly getting deleted due to references, and >> we're deadlocked trying to allocate a new one while the old one >> is outstanding. This fix just jumps over things. You're actually >> using a partially torn down sdev that, if the refcounts ever >> decremented, would be zapped - and you would be in a bunch of trouble. > > Signed-off-by: Andrew Morton > --- > > drivers/scsi/scsi.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff -puN drivers/scsi/scsi.c~fix-the-reproducible-oops-in-scsi drivers/scsi/scsi.c > --- a/drivers/scsi/scsi.c~fix-the-reproducible-oops-in-scsi > +++ a/drivers/scsi/scsi.c > @@ -937,6 +937,8 @@ struct scsi_device *__scsi_device_lookup > struct scsi_device *sdev; > > list_for_each_entry(sdev, &starget->devices, same_target_siblings) { > + if (sdev->sdev_state == SDEV_DEL) > + continue; > if (sdev->lun ==lun) > return sdev; > } > _ >