From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753255Ab2D3IPY (ORCPT ); Mon, 30 Apr 2012 04:15:24 -0400
Received: from s15943758.onlinehome-server.info ([217.160.130.188]:39056 "EHLO mail.x86-64.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751933Ab2D3IPW (ORCPT ); Mon, 30 Apr 2012 04:15:22 -0400
Date: Mon, 30 Apr 2012 10:15:13 +0200
From: Borislav Petkov
To: Mauro Carvalho Chehab
Cc: Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Mark Gross, Jason Uhlenkott, Tim Small, Ranganathan Desikan, "Arvind R.", Olof Johansson, Egor Martovetsky, Chris Metcalf, Michal Marek, Jiri Kosina, Joe Perches, Dmitry Eremin-Solenikov, Benjamin Herrenschmidt, Hitoshi Mitake, Andrew Morton, Niklas Söderlund, Shaohui Xie, Josh Boyer, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH EDACv16 1/2] edac: Change internal representation to work with layers
Message-ID: <20120430081513.GD8182@aftab.osrc.amd.com>
References: <1335289087-11337-1-git-send-email-mchehab@redhat.com> <1335291342-14922-1-git-send-email-mchehab@redhat.com> <20120427133304.GE9626@aftab.osrc.amd.com> <4F9ABCEC.9090807@redhat.com> <20120428090523.GD26065@aftab.osrc.amd.com> <4F9D46F8.1020104@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4F9D46F8.1020104@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Apr 29, 2012 at 10:49:44AM -0300, Mauro Carvalho Chehab wrote:
> > [ 10.486440] EDAC MC: DCT0 chip selects:
> > [ 10.486443] EDAC amd64: MC: 0: 2048MB 1: 2048MB
> > [ 10.486445] EDAC amd64: MC: 2: 2048MB 3: 2048MB
> > [ 10.486448] EDAC amd64: MC: 4: 0MB 5: 0MB
> > [ 10.486450] EDAC amd64: MC: 6: 0MB 7: 0MB
> > [ 10.486453] EDAC DEBUG: amd64_debug_display_dimm_sizes: F2x180 (DRAM Bank Address Mapping): 0x00000088
> > [ 10.486455] EDAC MC: DCT1 chip selects:
> > [ 10.486458] EDAC amd64: MC: 0: 2048MB 1: 2048MB
> > [ 10.486460] EDAC amd64: MC: 2: 2048MB 3: 2048MB
> > [ 10.486463] EDAC amd64: MC: 4: 0MB 5: 0MB
> > [ 10.486465] EDAC amd64: MC: 6: 0MB 7: 0MB
> > [ 10.486467] EDAC amd64: using x8 syndromes.
> > [ 10.486469] EDAC DEBUG: amd64_dump_dramcfg_low: F2x190 (DRAM Cfg Low): 0x00083100
> > [ 10.486472] EDAC DEBUG: amd64_dump_dramcfg_low: DIMM type: buffered; all DIMMs support ECC: yes
> > [ 10.486475] EDAC DEBUG: amd64_dump_dramcfg_low: PAR/ERR parity: enabled
> > [ 10.486478] EDAC DEBUG: amd64_dump_dramcfg_low: DCT 128bit mode width: 64b
> > [ 10.486481] EDAC DEBUG: amd64_dump_dramcfg_low: x4 logical DIMMs present: L0: yes L1: yes L2: no L3: no
> > [ 10.486485] EDAC DEBUG: f1x_early_channel_count: Data width is not 128 bits - need more decoding
> > [ 10.486488] EDAC amd64: MCT channel count: 2
> > [ 10.486493] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc(): allocating 3692 bytes for mci data (16 ranks, 16 csrows/channels)
> > [ 10.486501] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 0: rank0 (0:0:0): row 0, chan 0
> > [ 10.486506] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 1: rank1 (0:1:0): row 0, chan 1
> > [ 10.486510] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 2: rank2 (1:0:0): row 1, chan 0
> > [ 10.486514] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 3: rank3 (1:1:0): row 1, chan 1
> > [ 10.486518] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 4: rank4 (2:0:0): row 2, chan 0
> > [ 10.486522] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 5: rank5 (2:1:0): row 2, chan 1
> > [ 10.486526] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 6: rank6 (3:0:0): row 3, chan 0
> > [ 10.486530] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 7: rank7 (3:1:0): row 3, chan 1
> > [ 10.486534] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 8: rank8 (4:0:0): row 4, chan 0
> > [ 10.486538] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 9: rank9 (4:1:0): row 4, chan 1
> > [ 10.486542] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 10: rank10 (5:0:0): row 5, chan 0
> > [ 10.486546] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 11: rank11 (5:1:0): row 5, chan 1
> > [ 10.486550] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 12: rank12 (6:0:0): row 6, chan 0
> > [ 10.486554] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 13: rank13 (6:1:0): row 6, chan 1
> > [ 10.486558] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 14: rank14 (7:0:0): row 7, chan 0
> > [ 10.486562] EDAC DEBUG: new_edac_mc_alloc: new_edac_mc_alloc: 15: rank15 (7:1:0): row 7, chan 1
> >
> > DCT0 has 4 ranks + DCT1 also 4 ranks = 8 ranks total.
> >
> > Now your change is showing 16 ranks. Still b0rked.
> >
>
> No, DCT0+DCT1 have 16 ranks, 8 filled and 8 empty. So, it is OK.
>
> As I said before when you pointed out this bug (likely at v3 review), edac_mc_alloc
> doesn't know how many ranks are filled, as the driver logic first calls it to
> allocate for the max amount of ranks, and then fills the ranks with their info
> (or leaves them untouched with 0 pages, if they're empty).

Basically you're saying you're generating dimm_info structs for all
_possible_ dimms, and the loop where this debug message comes from goes
and merrily initializes them all although some of them are empty:

+	for (i = 0; i < tot_dimms; i++) {
+		chan = &csi[row].channels[chn];
+		dimm = EDAC_DIMM_PTR(lay, mci->dimms, n_layers,
+				     pos[0], pos[1], pos[2]);
+		dimm->mci = mci;
+
+		debugf2("%s: %d: dimm%zd (%d:%d:%d): row %d, chan %d\n", __func__,
+			i, (dimm - mci->dimms),
+			pos[0], pos[1], pos[2], row, chn);
+
+		/* Copy DIMM location */
+		for (j = 0; j < n_layers; j++)
+			dimm->location[j] = pos[j];
...

definitely superfluous.

Oh well, looking at edac_mc_alloc, it used to allocate structs for all
csrows on the controller even though some of them were empty...

Ok, then please remove this debug call because it is misleading.
Having

[ 10.486493] EDAC DEBUG: new_edac_mc_alloc: allocating 3692 bytes for mci data (16 ranks, 16 csrows/channels)

is enough. You probably want to say how many channels/csrows there
are, though:

[ 10.486493] EDAC DEBUG: new_edac_mc_alloc: allocating 3692 bytes for mci data (16 ranks, 8 csrows, 2 channels)

or something similar. Simply dump tot_dimms, tot_channels and
tot_csrows and that's it.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551