Subject: Re: [BISECTED] WARNING: CPU: 2 PID: 142 at block/genhd.c:626 add_disk+0x480/0x4e0()
To: Laura Abbott, Christoph Hellwig, "Martin K. Petersen", Mike Snitzer, James Bottomley
Cc: linux-scsi@vger.kernel.org, Linux Kernel Mailing List, awilliam@redhat.com
From: Hannes Reinecke
Date: Thu, 10 Dec 2015 07:52:20 +0100
Message-ID: <56692124.6000208@suse.de>
In-Reply-To: <5668F8C8.6000601@redhat.com>
References: <5668F8C8.6000601@redhat.com>

On 12/10/2015 05:00 AM, Laura Abbott wrote:
> Hi,
>
> We received a report (https://bugzilla.redhat.com/show_bug.cgi?id=1288687) that
> live images with the rawhide kernel were failing to boot on USB sticks.
> Similar issues were reported when just inserting a USB stick into a boot from a
> CD instead of USB ("I see /dev/sdb, but no /dev/sdb1 etc."
> per the report)
> I reduced the test scenario to:
>
> 1) insert scsi_dh_alua module
> 2) insert Live USB drive
>
> which gives
>
> [ 125.107185] sd 6:0:0:0: alua: supports implicit and explicit TPGS
> [ 125.107778] sd 6:0:0:0: [sdb] 15634432 512-byte logical blocks: (8.00 GB/7.46 GiB)
> [ 125.107973] sd 6:0:0:0: alua: No target port descriptors found
> [ 125.107975] sd 6:0:0:0: alua: Attach failed (-22)
> [ 125.107978] sd 6:0:0:0: failed to add device handler: -22
> [ 125.108462] sd 6:0:0:0: [sdb] Write Protect is off
> [ 125.108465] sd 6:0:0:0: [sdb] Mode Sense: 43 00 00 00
> [ 125.108468] sd 6:0:0:0: [sdb] Asking for cache data failed
> [ 125.108469] sd 6:0:0:0: [sdb] Assuming drive cache: write through
> [ 125.109122] ------------[ cut here ]------------
> [ 125.109127] WARNING: CPU: 2 PID: 142 at block/genhd.c:626 add_disk+0x480/0x4e0()
> [ 125.109128] Modules linked in: uas usb_storage scsi_dh_alua fuse xt_CHECKSUM
> ipt_MASQUERADE nf_nat_masquerade_ipv4 ccm tun nf_conntrack_netbios_ns
> nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack
> ebtable_filter ebtable_nat ebtable_broute bridge stp llc ebtables ip6table_raw
> ip6table_security ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> ip6table_mangle ip6table_filter ip6_tables iptable_raw iptable_security
> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
> iptable_mangle bnep snd_hda_codec_hdmi arc4 iwlmvm mac80211 i915 intel_rapl
> iosf_mbi x86_pkg_temp_thermal coretemp iwlwifi kvm_intel kvm
> snd_hda_codec_realtek uvcvideo snd_hda_codec_generic btusb snd_hda_intel btrtl
> videobuf2_vmalloc cfg80211 snd_hda_codec btbcm iTCO_wdt videobuf2_v4l2
> [ 125.109164] btintel iTCO_vendor_support videobuf2_core irqbypass
> videobuf2_memops bluetooth v4l2_common snd_hda_core ghash_clmulni_intel
> videodev snd_hwdep snd_seq media pcspkr joydev snd_seq_device rtsx_pci_ms
> snd_pcm memstick thinkpad_acpi snd_timer mei_me snd i2c_algo_bit
> mei drm_kms_helper ie31200_edac rfkill tpm_tis edac_core shpchp soundcore tpm
> i2c_i801 lpc_ich wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc
> dm_crypt hid_microsoft rtsx_pci_sdmmc mmc_core crct10dif_pclmul crc32_pclmul
> crc32c_intel serio_raw drm e1000e ptp rtsx_pci pps_core fjes video
> [ 125.109197] CPU: 2 PID: 142 Comm: kworker/u16:6 Tainted: G W 4.4.0-rc4-usbbadness-next-20151209+ #3
> [ 125.109198] Hardware name: LENOVO 20BFS0EC00/20BFS0EC00, BIOS GMET62WW (2.10 ) 03/19/2014
> [ 125.109202] Workqueue: events_unbound async_run_entry_fn
> [ 125.109204] 0000000000000000 00000000202f2ede ffff880402ccfc38 ffffffff81434509
> [ 125.109206] 0000000000000000 ffff880402ccfc70 ffffffff810ad9c2 ffff880407a1e000
> [ 125.109208] ffff880407a1e0b0 ffff880407a1e00c ffff880401e48ef0 ffff8800c90d0600
> [ 125.109211] Call Trace:
> [ 125.109214] [] dump_stack+0x4b/0x72
> [ 125.109218] [] warn_slowpath_common+0x82/0xc0
> [ 125.109220] [] warn_slowpath_null+0x1a/0x20
> [ 125.109222] [] add_disk+0x480/0x4e0
> [ 125.109225] [] sd_probe_async+0x115/0x1d0
> [ 125.109227] [] async_run_entry_fn+0x4a/0x140
> [ 125.109231] [] process_one_work+0x239/0x6b0
> [ 125.109233] [] ? process_one_work+0x1a2/0x6b0
> [ 125.109235] [] worker_thread+0x4e/0x490
> [ 125.109237] [] ? process_one_work+0x6b0/0x6b0
> [ 125.109238] [] kthread+0x101/0x120
> [ 125.109242] [] ? trace_hardirqs_on_caller+0x129/0x1b0
> [ 125.109243] [] ? kthread_create_on_node+0x250/0x250
> [ 125.109247] [] ret_from_fork+0x3f/0x70
> [ 125.109248] [] ? kthread_create_on_node+0x250/0x250
> [ 125.109250] ---[ end trace d54b73ed8d1295d5 ]---
> [ 125.109272] sd 6:0:0:0: [sdb] Attached SCSI removable disk
>
> and no partitions so the drive can't be mounted. Note the alua -EINVAL
> error is there even when the drive can be mounted so the warning and
> lack of partitions is the real indication of the problem.
>
> I did a bisect and came up with this as the first bad commit:
>
> commit 086b91d052ebe4ead5d28021afe3bdfd70af15bf
> Author: Christoph Hellwig
> Date: Thu Aug 27 14:16:57 2015 +0200
>
>     scsi_dh: integrate into the core SCSI code
>
>     Stop building scsi_dh as a separate module and integrate it fully into the
>     core SCSI code with explicit callouts at bus scan time. For now the
>     callouts are placed at the same point as the old bus notifiers were called,
>     but in the future we will be able to look at ALUA INQUIRY data earlier on.
>
>     Note that this also means that the device handler modules need to be loaded
>     by the time we scan the bus. The next patches will add support for
>     autoloading device handlers at bus scan time to make sure they are always
>     loaded if they are enabled in the kernel config.
>
>     Signed-off-by: Christoph Hellwig
>     Reviewed-by: Martin K. Petersen
>     Reviewed-by: Hannes Reinecke
>     Acked-by: Mike Snitzer
>     Signed-off-by: James Bottomley
>
> This was an involved commit so I didn't try to revert. Any ideas here?
> Full bisect log is below
>
There's a patchset to update the ALUA handler in Martin Petersen's tree
which should help here; most notably, the commit
'scsi: ignore errors from scsi_dh_add_device()'
should fix this particular issue.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                zSeries & Storage
hare@suse.de                       +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)