From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jens Axboe
Subject: Re: [PATCH] libata: revert "libata: use blk taging" et al.
Date: Wed, 11 Mar 2015 17:45:39 -0400
Message-ID: <5500B783.20106@fb.com>
References: <5500863D.4070807@cybernetics.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:1207 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751511AbbCKVqJ (ORCPT ); Wed, 11 Mar 2015 17:46:09 -0400
In-Reply-To: <5500863D.4070807@cybernetics.com>
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: Tony Battersby, Tejun Heo, Shaohua Li, linux-ide@vger.kernel.org
Cc: Christoph Hellwig, Dan Williams, linux-kernel@vger.kernel.org

On 03/11/2015 02:15 PM, Tony Battersby wrote:
> This reverts commits 12cb5ce101abfaf74421f8cc9f196e708209eb79 and
> 98bd4be1ba95f2fe7f543910792b7163a5de06eb.
>
> Commit 12cb5ce101ab ("libata: use blk taging") causes the following oops
> with scsi-mq enabled:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
> IP: [] ata_qc_new_init+0x3e/0x120
> PGD 32adf0067 PUD 32adf1067 PMD 0
> Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
> Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi igb
> i2c_algo_bit ptp pps_core pm80xx libsas scsi_transport_sas sg coretemp
> eeprom w83795 i2c_i801
> CPU: 4 PID: 1450 Comm: cydiskbench Not tainted 4.0.0-rc3 #1
> Hardware name: Supermicro X8DTH-i/6/iF/6F/X8DTH, BIOS 2.1b 05/04/12
> task: ffff8800ba86d500 ti: ffff88032a064000 task.ti: ffff88032a064000
> RIP: 0010:[] [] ata_qc_new_init+0x3e/0x120
> RSP: 0018:ffff88032a067858 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: ffff8800ba0d2230 RCX: 000000000000002a
> RDX: ffffffff80505ae0 RSI: 0000000000000020 RDI: ffff8800ba0d2230
> RBP: ffff88032a067868 R08: 0000000000000201 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800ba0d0000
> R13: ffff8800ba0d2230 R14: ffffffff80505ae0 R15: ffff8800ba0d0000
> FS: 0000000041223950(0063) GS:ffff88033e480000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000058 CR3: 000000032a0a3000 CR4: 00000000000006e0
> Stack:
> ffff880329eee758 ffff880329eee758 ffff88032a0678a8 ffffffff80502dad
> ffff8800ba167978 ffff880329eee758 ffff88032bf9c520 ffff8800ba167978
> ffff88032bf9c520 ffff88032bf9a290 ffff88032a0678b8 ffffffff80506909
> Call Trace:
> [] ata_scsi_translate+0x3d/0x1b0
> [] ata_sas_queuecmd+0x149/0x2a0
> [] sas_queuecommand+0xa0/0x1f0 [libsas]
> [] scsi_dispatch_cmd+0xd4/0x1a0
> [] scsi_queue_rq+0x66f/0x7f0
> [] __blk_mq_run_hw_queue+0x208/0x3f0
> [] blk_mq_run_hw_queue+0x88/0xc0
> [] blk_mq_insert_request+0xc4/0x130
> [] blk_execute_rq_nowait+0x73/0x160
> [] sg_common_write+0x3da/0x720 [sg]
> [] ? might_fault+0x5e/0xb0
> [] sg_new_write+0x250/0x360 [sg]
> [] ? __lock_acquire+0x50c/0xc10
> [] ? lock_release_non_nested+0xa7/0x360
> [] ? _raw_spin_unlock_irqrestore+0x3b/0x60
> [] ? might_fault+0x5e/0xb0
> [] ? might_fault+0x5e/0xb0
> [] sg_write+0x13b/0x450 [sg]
> [] ? __lock_acquire+0x50c/0xc10
> [] ? do_futex+0x109/0xbf0
> [] ? might_fault+0x5e/0xb0
> [] vfs_write+0xd1/0x1b0
> [] SyS_write+0x54/0xc0
> [] system_call_fastpath+0x12/0x17
> Code: 24 20 04 0f 85 ec 00 00 00 49 83 3c 24 00 0f 84 cf 00 00 00 83 fe 1f
> 0f 87 dc 00 00 00 89 f0 48 69 c0 f0 00 00 00 49 8d 44 04 40 <89> 70 58 48
> c7 40 10 00 00 00 00 4c 89 20 48 89 58 08 c7 40 64
> RIP [] ata_qc_new_init+0x3e/0x120
> RSP
> CR2: 0000000000000058
> ---[ end trace 43f5eefb64627eff ]---
>
>
> scsi-mq uses a host-wide tag map shared among all devices with some
> integer tag values >= ATA_MAX_QUEUE. These unexpectedly high tag values
> cause __ata_qc_from_tag() to return NULL, which is then dereferenced in
> ata_qc_new_init(), causing the oops above.

Wait, something is missing here.
We should not be getting tag values that are >= ATA_MAX_QUEUE. Instead of reverting this, we need to figure out why this is happening, and fix it. That is the correct way forward here.

What setup is this being reproduced on?

-- 
Jens Axboe
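
For readers following the thread, the failure mode described in the quoted report (a tag at or above ATA_MAX_QUEUE makes the lookup return NULL, and the caller then writes through that pointer) can be sketched in plain C. This is only an illustration under stated assumptions, not the libata code: struct qc, struct port, qc_from_tag() and qc_new_init() are hypothetical stand-ins, and the value 32 for ATA_MAX_QUEUE is assumed here rather than taken from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 32 is used for illustration; check the value in your tree. */
#define ATA_MAX_QUEUE 32

/* Hypothetical, heavily simplified stand-in for a queued command. */
struct qc {
    unsigned int tag;
    int in_use;
};

/* Hypothetical stand-in for a port with a fixed per-port command array. */
struct port {
    struct qc qcmd[ATA_MAX_QUEUE];
};

/* Look up a command slot by tag; any out-of-range tag yields NULL. */
struct qc *qc_from_tag(struct port *ap, unsigned int tag)
{
    assert(ap != NULL);
    if (tag >= ATA_MAX_QUEUE)
        return NULL;
    return &ap->qcmd[tag];
}

/*
 * Initialise the slot for a tag. The NULL check is the step whose
 * absence would turn an oversized tag into a NULL dereference.
 */
struct qc *qc_new_init(struct port *ap, unsigned int tag)
{
    struct qc *qc = qc_from_tag(ap, tag);

    if (qc == NULL)
        return NULL; /* caller must treat this as "no slot available" */

    qc->tag = tag;
    qc->in_use = 1;
    return qc;
}
```

With a zero-initialised struct port, qc_new_init(&p, 5) returns a slot whose tag is 5, while qc_new_init(&p, 32) returns NULL instead of faulting. A check like this only masks the symptom; it does not explain why an out-of-range tag reached the port in the first place, which is the question raised in the reply.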