From: bugzilla-daemon@bugzilla.kernel.org
Subject: [Bug 81861] Oops by mvsas v0.8.16: sas: ataX: end_device-Y:0:Z: dev error handler -> general protection fault, RIP: mvs_task_prep_ata+0x80/0x3a0
Date: Fri, 26 Sep 2014 07:04:54 +0000
To: linux-scsi@vger.kernel.org

https://bugzilla.kernel.org/show_bug.cgi?id=81861

Leon Woestenberg changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sidebranch.linux@gmail.com

--- Comment #17 from Leon Woestenberg ---
With TXQ_PHY_SHIFT being 12 and TXQ_CMD_SHIFT being 29, the one-hot PHY encoding appears to occupy bits 12 through 28 inclusive, i.e. only about 16 PHY IDs (17 bit positions) can be encoded. The value written to the controller appears to be a fixed 32-bit register, so this looks like a hardware limitation rather than a driver limitation.

        del_q = TXQ_MODE_I | tag |
                (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
                (MVS_PHY_ID << TXQ_PHY_SHIFT) |
                (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
        /* debug: print the TX producer index before writing the entry */
        printk(KERN_DEBUG "%d\n", mvi->tx_prod);
        mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);

Remaining question: how is this supposed to work with port expanders, where PHY IDs can exceed 16?
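A minimal user-space sketch of the bit packing described in the comment, assuming only the two shift values it quotes (TXQ_PHY_SHIFT = 12, TXQ_CMD_SHIFT = 29). The STP command code and the helper names `phy_one_hot`, `phy_field_bits`, `phy_fits_below_cmd` and `pack_del_q` are hypothetical; TXQ_MODE_I and the SRS field are left out because their values are not given here. It shows that the one-hot bit for PHY id 17 already lands in the command field, and for id 20 or above it is shifted out of the 32-bit word entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shift values quoted in the comment above. */
#define TXQ_PHY_SHIFT 12
#define TXQ_CMD_SHIFT 29

/* Everything from TXQ_CMD_SHIFT up to bit 31 of the 32-bit word. */
#define TXQ_CMD_FIELD_MASK (0x7u << TXQ_CMD_SHIFT)

/* ASSUMPTION: placeholder STP command code; check the mvsas headers. */
#define TXQ_CMD_STP_ASSUMED 3u

static_assert(TXQ_PHY_SHIFT < TXQ_CMD_SHIFT,
              "PHY field must sit below the command field");

/* One-hot PHY mask as MVS_PHY_ID builds it, (1U << id).
 * Returns 0 for id >= 32 to avoid an undefined shift. */
static uint32_t phy_one_hot(unsigned id)
{
    return id < 32 ? (uint32_t)1 << id : 0;
}

/* PHY mask shifted into place; bits pushed past bit 31 are lost. */
static uint32_t phy_field_bits(unsigned id)
{
    return (uint32_t)(phy_one_hot(id) << TXQ_PHY_SHIFT);
}

/* True if the one-hot bit for this PHY id lands strictly below the
 * command field. */
static bool phy_fits_below_cmd(unsigned id)
{
    return id + TXQ_PHY_SHIFT < TXQ_CMD_SHIFT;
}

/* tag | command | PHY mask, as in the quoted del_q line (mode flag and
 * SRS field omitted). */
static uint32_t pack_del_q(uint32_t tag, unsigned phy_id)
{
    return tag |
           (TXQ_CMD_STP_ASSUMED << TXQ_CMD_SHIFT) |
           phy_field_bits(phy_id);
}
```

For example, `phy_field_bits(16)` is bit 28, `phy_field_bits(17)` overlaps TXQ_CMD_FIELD_MASK, and `phy_field_bits(20)` evaluates to 0 because the bit falls off the top of the word.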
Thanks to an extensive debug report by e-mail from Rob Elliott (HP Server Storage), which I copied verbatim. Thanks, Rob!

---

1. Although MVS_PHY_ID looks like a constant, it's really not:

       #define MVS_PHY_ID (1U << sas_phy->id)

2. This fault:

       [   32.271218] BUG: unable to handle kernel NULL pointer dereference at 0000000000000255

   (although 255 looks like decimal 255, i.e. 0xff, it's really hex 0x255) at this line:

       0xffffffffa01c481e <+1838>: mov 0x254(%rbx),%ecx

   implies that rbx contains 1, so 0x254 + 1 = 0x255.

3. pahole drivers/scsi/mvsas/mv_sas.o shows there are two structures with fields at offset 596 (0x254):
   * asd_sas_phy.id
   * asd_sas_port.sas_addr[8]

4. objdump -drS drivers/scsi/mvsas/mv_sas.o shows only a few lines with 0x254(%something), one of which is the del_q line you've identified:

       mvs_task_prep_ata(struct mvs_info *mvi, struct mvs_task_exec_info *tei):
           struct sas_ha_struct *sha = mvi->sas;
           struct sas_task *task = tei->task;
           struct domain_device *dev = task->dev;
           struct sas_phy *sphy = dev->phy;
           struct asd_sas_phy *sas_phy = sha->sas_phy[sphy->number];
           ...
           del_q = TXQ_MODE_I | tag |
                   (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
                   (MVS_PHY_ID << TXQ_PHY_SHIFT) |
                   (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
           mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);

   MVS_PHY_ID depends on sas_phy->id, and expanding sas_phy gives:

       sas_phy->id = sha->sas_phy[sphy->number]->id
                   = mvi->sas->sas_phy[dev->phy->number]->id
                   = mvi->sas->sas_phy[task->dev->phy->number]->id
                   = mvi->sas->sas_phy[tei->task->dev->phy->number]->id

   Looking at the offsets reported by pahole, that means (the last offset, 254, is the hex 0x254 from the mov instruction):

       %rdi->56->344[%rsi->0->0->56->688]->254

   mvi->sas->sas_phy is a pointer to a pointer:

       struct sas_ha_struct {
               ...
               struct asd_sas_phy **      sas_phy;              /*   344     8 */
               ...
       };

You might look for somewhere that could accidentally be setting sas_phy[something] to a for-loop index, with a typecast hiding the problem from the compiler. Or the phy->number value being passed might be out of range; if there were discovery errors, something might not have been initialized the way this function expects.
Rob Elliott
HP Server Storage
---
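Point 2 of the report works backwards from the fault address to the pointer value. A small user-space sketch of that arithmetic, using a hypothetical padded struct (`fake_asd_sas_phy`, not the real libsas layout) whose `id` member sits at offset 596 (0x254), as pahole reports for asd_sas_phy.id: if the `sas_phy` pointer held the value 1, reading `->id` would touch address 0x255, matching the oops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL stand-in, not the real struct asd_sas_phy: the padding
 * places `id` at offset 596 (0x254), the offset pahole reports. */
struct fake_asd_sas_phy {
    char pad[596];
    int id;
};

static_assert(offsetof(struct fake_asd_sas_phy, id) == 0x254,
              "id must sit at offset 0x254");

/* Address that `mov 0x254(%rbx),%ecx` reads when %rbx holds `base`,
 * i.e. the address of ((struct fake_asd_sas_phy *)base)->id. */
static uintptr_t id_field_address(uintptr_t base)
{
    return base + offsetof(struct fake_asd_sas_phy, id);
}
```

Only the address is computed here; nothing is dereferenced, so the sketch is safe to run.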
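The report's second hypothesis is that phy->number may be out of range, or that a slot in sas_phy[] was never filled in after discovery errors. A minimal sketch of a defensive lookup, assuming hypothetical stand-in types (`struct ha` with a `num_phys` count and an array of `struct phy_entry *`, and the helper `lookup_phy`); the real sas_ha_struct should be checked for the equivalent fields. It returns NULL for an out-of-range index or an empty slot instead of dereferencing it.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL minimal stand-ins; not the libsas definitions. */
struct phy_entry {
    int id;
};

struct ha {
    struct phy_entry **sas_phy; /* array of pointers, like sas_ha_struct.sas_phy */
    int num_phys;               /* number of valid slots in sas_phy[] */
};

/* Return the entry for `number`, or NULL if the table is missing, the
 * index is out of range, or the slot was never initialized. */
static struct phy_entry *lookup_phy(const struct ha *sha, int number)
{
    if (sha == NULL || sha->sas_phy == NULL)
        return NULL;
    if (number < 0 || number >= sha->num_phys)
        return NULL;
    return sha->sas_phy[number];
}
```

A caller in the position of mvs_task_prep_ata() could then fail the task when the lookup returns NULL rather than dereferencing a garbage pointer; whether that is the right recovery inside the driver is a separate question.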