* [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
@ 2022-01-13 18:44 ` Konrad Dybcio
  0 siblings, 0 replies; 42+ messages in thread
From: Konrad Dybcio @ 2022-01-13 18:44 UTC (permalink / raw)
  To: ~postmarketos/upstreaming
  Cc: martin.botka, angelogioacchino.delregno, marijn.suijten,
	jamipkettunen, Konrad Dybcio, Manivannan Sadhasivam,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	linux-mtd, linux-arm-msm, linux-kernel

While I have absolutely 0 idea why and how, running clear_bam_transaction
when READID is issued makes the DMA totally clog up and refuse to function
at all on mdm9607. In fact, it is so bad that all the data gets garbled
and after a short while in the nand probe flow, the CPU decides that
seppuku is the only option.

Removing _READID from the if condition makes it work like a charm, I can
read data and mount partitions without a problem.

Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
---
This is totally just an observation which took me an inhumane amount of
debug prints to find.. perhaps there's a better reason behind this, but
I can't seem to find any answers.. Therefore, this is a BIG RFC!


 drivers/mtd/nand/raw/qcom_nandc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/mtd/nand/raw/qcom_nandc.c b/drivers/mtd/nand/raw/qcom_nandc.c
index 04e6f7b26706..506006ccdf1a 100644
--- a/drivers/mtd/nand/raw/qcom_nandc.c
+++ b/drivers/mtd/nand/raw/qcom_nandc.c
@@ -1459,8 +1459,7 @@ static void pre_command(struct qcom_nand_host *host, int command)
 
 	clear_read_regs(nandc);
 
-	if (command == NAND_CMD_RESET || command == NAND_CMD_READID ||
-	    command == NAND_CMD_PARAM || command == NAND_CMD_ERASE1)
+	if (command == NAND_CMD_RESET || command == NAND_CMD_PARAM || command == NAND_CMD_ERASE1)
 		clear_bam_transaction(nandc);
 }
 
-- 
2.34.1
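
For readers following along, the effect of the one-line change above can be
modelled outside the kernel. This is only an illustrative sketch, not driver
code: the two helper functions are hypothetical names invented for this note,
and the opcode values are assumed to match the standard definitions in
include/linux/mtd/rawnand.h.

```c
#include <stdbool.h>

/* Standard raw NAND opcodes (values as in include/linux/mtd/rawnand.h). */
#define NAND_CMD_READ0   0x00
#define NAND_CMD_ERASE1  0x60
#define NAND_CMD_READID  0x90
#define NAND_CMD_PARAM   0xec
#define NAND_CMD_RESET   0xff

/* Hypothetical helper: the pre_command() condition before this patch. */
static bool clears_bam_before_patch(int command)
{
	return command == NAND_CMD_RESET || command == NAND_CMD_READID ||
	       command == NAND_CMD_PARAM || command == NAND_CMD_ERASE1;
}

/* Hypothetical helper: the condition after this patch (READID dropped). */
static bool clears_bam_after_patch(int command)
{
	return command == NAND_CMD_RESET || command == NAND_CMD_PARAM ||
	       command == NAND_CMD_ERASE1;
}
```

Under these assumptions READID (0x90) is the only opcode for which the two
helpers disagree; RESET, PARAM and ERASE1 still lead to
clear_bam_transaction() being called.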


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-01-13 18:44 ` Konrad Dybcio
@ 2022-01-13 18:45   ` Konrad Dybcio
  -1 siblings, 0 replies; 42+ messages in thread
From: Konrad Dybcio @ 2022-01-13 18:45 UTC (permalink / raw)
  To: ~postmarketos/upstreaming
  Cc: martin.botka, angelogioacchino.delregno, marijn.suijten,
	jamipkettunen, Manivannan Sadhasivam, Miquel Raynal,
	Richard Weinberger, Vignesh Raghavendra, linux-mtd,
	linux-arm-msm, linux-kernel



On 13.01.2022 19:44, Konrad Dybcio wrote:
> While I have absolutely 0 idea why and how, running clear_bam_transaction
> when READID is issued makes the DMA totally clog up and refuse to function
> at all on mdm9607. In fact, it is so bad that all the data gets garbled
> and after a short while in the nand probe flow, the CPU decides that
> seppuku is the only option.
> 
> Removing _READID from the if condition makes it work like a charm, I can
> read data and mount partitions without a problem.
> 
> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
> ---
> This is totally just an observation which took me an inhumane amount of
> debug prints to find.. perhaps there's a better reason behind this, but
> I can't seem to find any answers.. Therefore, this is a BIG RFC!
> 
Somehow I didn't put RFC in the title though, sorry!

Konrad

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-01-13 18:44 ` Konrad Dybcio
@ 2022-01-14  7:27   ` Miquel Raynal
  -1 siblings, 0 replies; 42+ messages in thread
From: Miquel Raynal @ 2022-01-14  7:27 UTC (permalink / raw)
  To: Konrad Dybcio
  Cc: ~postmarketos/upstreaming, martin.botka,
	angelogioacchino.delregno, marijn.suijten, jamipkettunen,
	Manivannan Sadhasivam, Richard Weinberger, Vignesh Raghavendra,
	linux-mtd, linux-arm-msm, linux-kernel, mdalam, sricharan

Hi Konrad,

konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:

> While I have absolutely 0 idea why and how, running clear_bam_transaction
> when READID is issued makes the DMA totally clog up and refuse to function
> at all on mdm9607. In fact, it is so bad that all the data gets garbled
> and after a short while in the nand probe flow, the CPU decides that
> seppuku is the only option.
> 
> Removing _READID from the if condition makes it work like a charm, I can
> read data and mount partitions without a problem.
> 
> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
> ---
> This is totally just an observation which took me an inhumane amount of
> debug prints to find.. perhaps there's a better reason behind this, but
> I can't seem to find any answers.. Therefore, this is a BIG RFC!

I'm adding two people from codeaurora who worked a lot on this driver.
Hopefully they will have an idea :)

Thanks,
Miquèl

> 
> 
>  drivers/mtd/nand/raw/qcom_nandc.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/mtd/nand/raw/qcom_nandc.c b/drivers/mtd/nand/raw/qcom_nandc.c
> index 04e6f7b26706..506006ccdf1a 100644
> --- a/drivers/mtd/nand/raw/qcom_nandc.c
> +++ b/drivers/mtd/nand/raw/qcom_nandc.c
> @@ -1459,8 +1459,7 @@ static void pre_command(struct qcom_nand_host *host, int command)
>  
>  	clear_read_regs(nandc);
>  
> -	if (command == NAND_CMD_RESET || command == NAND_CMD_READID ||
> -	    command == NAND_CMD_PARAM || command == NAND_CMD_ERASE1)
> +	if (command == NAND_CMD_RESET || command == NAND_CMD_PARAM || command == NAND_CMD_ERASE1)
>  		clear_bam_transaction(nandc);
>  }
>  

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-01-14  7:27   ` Miquel Raynal
@ 2022-01-26 10:16     ` Miquel Raynal
  -1 siblings, 0 replies; 42+ messages in thread
From: Miquel Raynal @ 2022-01-26 10:16 UTC (permalink / raw)
  To: Konrad Dybcio
  Cc: ~postmarketos/upstreaming, martin.botka,
	angelogioacchino.delregno, marijn.suijten, jamipkettunen,
	Manivannan Sadhasivam, Richard Weinberger, Vignesh Raghavendra,
	linux-mtd, linux-arm-msm, linux-kernel, mdalam, sricharan

Hello,

miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:

> Hi Konrad,
> 
> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:
> 
> > While I have absolutely 0 idea why and how, running clear_bam_transaction
> > when READID is issued makes the DMA totally clog up and refuse to function
> > at all on mdm9607. In fact, it is so bad that all the data gets garbled
> > and after a short while in the nand probe flow, the CPU decides that
> > seppuku is the only option.
> > 
> > Removing _READID from the if condition makes it work like a charm, I can
> > read data and mount partitions without a problem.
> > 
> > Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
> > ---
> > This is totally just an observation which took me an inhumane amount of
> > debug prints to find.. perhaps there's a better reason behind this, but
> > I can't seem to find any answers.. Therefore, this is a BIG RFC!  
> 
> I'm adding two people from codeaurora who worked a lot on this driver.
> Hopefully they will have an idea :)

Sadre, I've spent a significant amount of time reviewing your patches,
now it's your turn to not take a month to answer to your peers
proposals.

Please help reviewing this patch.

BTW why is this driver still using cmdfunc? It should have been
migrated to ->exec_op() a long time ago.

Thanks,
Miquèl
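
As a rough illustration of the difference raised above: the legacy cmdfunc
hook is handed a bare opcode (plus column and page), while ->exec_op()
receives a whole operation described as a list of instructions. The sketch
below is not kernel code. It uses simplified stand-in types invented for this
note (sketch_instr, sketch_op and the helper are hypothetical names), and the
8-byte ID length is only an example value.

```c
#include <stddef.h>

/* Simplified stand-ins for instruction kinds; not the kernel's real types. */
enum sketch_instr_type {
	SKETCH_CMD,     /* command cycle */
	SKETCH_ADDR,    /* address cycle */
	SKETCH_DATA_IN, /* read bytes back from the chip */
};

struct sketch_instr {
	enum sketch_instr_type type;
	unsigned int value; /* opcode or address byte; unused for DATA_IN */
	size_t len;         /* byte count for DATA_IN; 0 otherwise */
};

struct sketch_op {
	const struct sketch_instr *instrs;
	size_t ninstrs;
};

/* READ ID as a full sequence: command 0x90, address 0x00, then data in. */
static const struct sketch_instr readid_instrs[] = {
	{ SKETCH_CMD, 0x90, 0 },
	{ SKETCH_ADDR, 0x00, 0 },
	{ SKETCH_DATA_IN, 0, 8 },
};

static const struct sketch_op readid_op = {
	.instrs = readid_instrs,
	.ninstrs = sizeof(readid_instrs) / sizeof(readid_instrs[0]),
};

/* Hypothetical helper: total bytes to read back for an operation. */
static size_t sketch_op_data_in_len(const struct sketch_op *op)
{
	size_t total = 0;

	for (size_t i = 0; i < op->ninstrs; i++)
		if (op->instrs[i].type == SKETCH_DATA_IN)
			total += op->instrs[i].len;
	return total;
}
```

The point of the sketch is that a controller driver handed the full sequence
can see up front how many cycles and data bytes are involved, instead of
guessing from the opcode alone as a cmdfunc-style hook has to.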

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-01-26 10:16     ` Miquel Raynal
@ 2022-01-26 10:33       ` Manivannan Sadhasivam
  -1 siblings, 0 replies; 42+ messages in thread
From: Manivannan Sadhasivam @ 2022-01-26 10:33 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Konrad Dybcio, ~postmarketos/upstreaming, martin.botka,
	angelogioacchino.delregno, marijn.suijten, jamipkettunen,
	Richard Weinberger, Vignesh Raghavendra, linux-mtd,
	linux-arm-msm, linux-kernel, mdalam, sricharan

On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
> Hello,
> 
> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
> 
> > Hi Konrad,
> > 
> > konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:
> > 
> > > While I have absolutely 0 idea why and how, running clear_bam_transaction
> > > when READID is issued makes the DMA totally clog up and refuse to function
> > > at all on mdm9607. In fact, it is so bad that all the data gets garbled
> > > and after a short while in the nand probe flow, the CPU decides that
> > > seppuku is the only option.
> > > 
> > > Removing _READID from the if condition makes it work like a charm, I can
> > > read data and mount partitions without a problem.
> > > 
> > > Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
> > > ---
> > > This is totally just an observation which took me an inhumane amount of
> > > debug prints to find.. perhaps there's a better reason behind this, but
> > > I can't seem to find any answers.. Therefore, this is a BIG RFC!  
> > 
> > I'm adding two people from codeaurora who worked a lot on this driver.
> > Hopefully they will have an idea :)
> 
> Sadre, I've spent a significant amount of time reviewing your patches,
> now it's your turn to not take a month to answer to your peers
> proposals.
> 
> Please help reviewing this patch.
> 

Sorry. I was hoping that Qcom folks would chime in as I don't have any idea
about the mdm9607 platform. It could be that the mail server migration from
codeaurora to quicinc put a barrier here.

Let me ping them internally.

> BTW why is this driver still using cmdfunc? It should have been
> migrated to ->exec_op() a long time ago.

I'll look into it.

Thanks,
Mani
> 
> Thanks,
> Miquèl

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-01-26 10:33       ` Manivannan Sadhasivam
@ 2022-01-26 10:42         ` Miquel Raynal
  -1 siblings, 0 replies; 42+ messages in thread
From: Miquel Raynal @ 2022-01-26 10:42 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Konrad Dybcio, ~postmarketos/upstreaming, martin.botka,
	angelogioacchino.delregno, marijn.suijten, jamipkettunen,
	Richard Weinberger, Vignesh Raghavendra, linux-mtd,
	linux-arm-msm, linux-kernel, mdalam, sricharan

Hi Mani,

mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:

> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
> > Hello,
> > 
> > miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
> >   
> > > Hi Konrad,
> > > 
> > > konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:
> > >   
> > > > While I have absolutely 0 idea why and how, running clear_bam_transaction
> > > > when READID is issued makes the DMA totally clog up and refuse to function
> > > > at all on mdm9607. In fact, it is so bad that all the data gets garbled
> > > > and after a short while in the nand probe flow, the CPU decides that
> > > > seppuku is the only option.
> > > > 
> > > > Removing _READID from the if condition makes it work like a charm, I can
> > > > read data and mount partitions without a problem.
> > > > 
> > > > Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
> > > > ---
> > > > This is totally just an observation which took me an inhumane amount of
> > > > debug prints to find.. perhaps there's a better reason behind this, but
> > > > I can't seem to find any answers.. Therefore, this is a BIG RFC!    
> > > 
> > > I'm adding two people from codeaurora who worked a lot on this driver.
> > > Hopefully they will have an idea :)  
> > 
> > Sadre, I've spent a significant amount of time reviewing your patches,
> > now it's your turn to not take a month to answer to your peers
> > proposals.
> > 
> > Please help reviewing this patch.
> >   
> 
> Sorry. I was hoping that Qcom folks would chime in as I don't have any idea
> about the mdm9607 platform. It could be that the mail server migration from
> codeaurora to quicinc put a barrier here.
> 
> Let me ping them internally.

Oh, ok, I didn't know. Thanks!

> > BTW why is this driver still using cmdfunc? It should have been
> > migrated to ->exec_op() a long time ago.  
> 
> I'll look into it.

That would be great, given the number of updates this driver has
received, it would be nice to tackle the legacy bits there.

Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-01-26 10:42         ` Miquel Raynal
@ 2022-01-26 11:36           ` Manivannan Sadhasivam
  -1 siblings, 0 replies; 42+ messages in thread
From: Manivannan Sadhasivam @ 2022-01-26 11:36 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Konrad Dybcio, ~postmarketos/upstreaming, martin.botka,
	angelogioacchino.delregno, marijn.suijten, jamipkettunen,
	Richard Weinberger, Vignesh Raghavendra, linux-mtd,
	linux-arm-msm, linux-kernel, mdalam, sricharan

On Wed, Jan 26, 2022 at 11:42:00AM +0100, Miquel Raynal wrote:
> Hi Mani,
> 
> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
> 
> > On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
> > > Hello,
> > > 
> > > miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
> > >   
> > > > Hi Konrad,
> > > > 
> > > > konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:
> > > >   
> > > > > While I have absolutely 0 idea why and how, running clear_bam_transaction
> > > > > when READID is issued makes the DMA totally clog up and refuse to function
> > > > > at all on mdm9607. In fact, it is so bad that all the data gets garbled
> > > > > and after a short while in the nand probe flow, the CPU decides that
> > > > > seppuku is the only option.
> > > > > 
> > > > > Removing _READID from the if condition makes it work like a charm, I can
> > > > > read data and mount partitions without a problem.
> > > > > 
> > > > > Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
> > > > > ---
> > > > > This is totally just an observation which took me an inhumane amount of
> > > > > debug prints to find.. perhaps there's a better reason behind this, but
> > > > > I can't seem to find any answers.. Therefore, this is a BIG RFC!    
> > > > 
> > > > I'm adding two people from codeaurora who worked a lot on this driver.
> > > > Hopefully they will have an idea :)  
> > > 
> > > Sadre, I've spent a significant amount of time reviewing your patches,
> > > now it's your turn to not take a month to answer to your peers
> > > proposals.
> > > 
> > > Please help reviewing this patch.
> > >   
> > 
> > Sorry. I was hoping that Qcom folks would chime in as I don't have any idea
> > about the mdm9607 platform. It could be that the mail server migration from
> > codeaurora to quicinc put a barrier here.
> > 
> > Let me ping them internally.
> 
> Oh, ok, I didn't know. Thanks!
> 

Pinged them.

> > > BTW why is this driver still using cmdfunc? It should have been
> > > migrated to ->exec_op() a long time ago.  
> > 
> > I'll look into it.
> 
> That would be great, given the number of updates this driver has
> received, it would be nice to tackle the legacy bits there.
> 

Sure! I've added this to my to-do list for the coming weeks.

Thanks,
Mani

> Thanks,
> Miquèl

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-01-26 10:42         ` Miquel Raynal
@ 2022-01-28  4:25           ` Sricharan Ramabadhran
  -1 siblings, 0 replies; 42+ messages in thread
From: Sricharan Ramabadhran @ 2022-01-28  4:25 UTC (permalink / raw)
  To: Miquel Raynal, Manivannan Sadhasivam
  Cc: Konrad Dybcio, ~postmarketos/upstreaming, martin.botka,
	angelogioacchino.delregno, marijn.suijten, jamipkettunen,
	Richard Weinberger, Vignesh Raghavendra, linux-mtd,
	linux-arm-msm, linux-kernel, mdalam

Hi Miquel,

On 1/26/2022 4:12 PM, Miquel Raynal wrote:
> Hi Mani,
>
> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
>
>> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
>>> Hello,
>>>
>>> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
>>>    
>>>> Hi Konrad,
>>>>
>>>> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:
>>>>    
>>>>> While I have absolutely 0 idea why and how, running clear_bam_transaction
>>>>> when READID is issued makes the DMA totally clog up and refuse to function
>>>>> at all on mdm9607. In fact, it is so bad that all the data gets garbled
>>>>> and after a short while in the nand probe flow, the CPU decides that
>>>>> sepuku is the only option.
>>>>>
>>>>> Removing _READID from the if condition makes it work like a charm, I can
>>>>> read data and mount partitions without a problem.
>>>>>
>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
>>>>> ---
>>>>> This is totally just an observation which took me an inhumane amount of
>>>>> debug prints to find.. perhaps there's a better reason behind this, but
>>>>> I can't seem to find any answers.. Therefore, this is a BIG RFC!
>>>> I'm adding two people from codeaurora who worked a lot on this driver.
>>>> Hopefully they will have an idea :)
>>> Sadre, I've spent a significant amount of time reviewing your patches,
>>> now it's your turn to not take a month to answer to your peers
>>> proposals.
>>>
>>> Please help reviewing this patch.
>>>    
>> Sorry. I was hoping that Qcom folks would chime in as I don't have any idea
>> about the mdm9607 platform. It could be that the mail server migration from
>> codeaurora to quicinc put a barrier here.
>>
>> Let me ping them internally.
> Oh, ok, I didn't know. Thanks!

    Sorry Miquel, somehow we did not get this email in our inbox.
    Thanks to Mani for pinging us; we will test this today and get back to you.

Regards,
    Sricharan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-01-28  4:25           ` Sricharan Ramabadhran
@ 2022-01-28 17:50             ` Sricharan Ramabadhran
  -1 siblings, 0 replies; 42+ messages in thread
From: Sricharan Ramabadhran @ 2022-01-28 17:50 UTC (permalink / raw)
  To: Miquel Raynal, Manivannan Sadhasivam
  Cc: Konrad Dybcio, ~postmarketos/upstreaming, martin.botka,
	angelogioacchino.delregno, marijn.suijten, jamipkettunen,
	Richard Weinberger, Vignesh Raghavendra, linux-mtd,
	linux-arm-msm, linux-kernel, mdalam

Hi Konrad,

On 1/28/2022 9:55 AM, Sricharan Ramabadhran wrote:
> Hi Miquel,
>
> On 1/26/2022 4:12 PM, Miquel Raynal wrote:
>> Hi Mani,
>>
>> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
>>
>>> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
>>>> Hello,
>>>>
>>>> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
>>>>> Hi Konrad,
>>>>>
>>>>> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 
>>>>> +0100:
>>>>>> While I have absolutely 0 idea why and how, running 
>>>>>> clear_bam_transaction
>>>>>> when READID is issued makes the DMA totally clog up and refuse to 
>>>>>> function
>>>>>> at all on mdm9607. In fact, it is so bad that all the data gets 
>>>>>> garbled
>>>>>> and after a short while in the nand probe flow, the CPU decides that
>>>>>> sepuku is the only option.
>>>>>>
>>>>>> Removing _READID from the if condition makes it work like a 
>>>>>> charm, I can
>>>>>> read data and mount partitions without a problem.
>>>>>>
>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
>>>>>> ---
>>>>>> This is totally just an observation which took me an inhumane 
>>>>>> amount of
>>>>>> debug prints to find.. perhaps there's a better reason behind 
>>>>>> this, but
>>>>>> I can't seem to find any answers.. Therefore, this is a BIG RFC!
>>>>> I'm adding two people from codeaurora who worked a lot on this 
>>>>> driver.
>>>>> Hopefully they will have an idea :)
>>>> Sadre, I've spent a significant amount of time reviewing your patches,
>>>> now it's your turn to not take a month to answer to your peers
>>>> proposals.
>>>>
>>>> Please help reviewing this patch.
>>> Sorry. I was hoping that Qcom folks would chime in as I don't have 
>>> any idea
>>> about the mdm9607 platform. It could be that the mail server 
>>> migration from
>>> codeaurora to quicinc put a barrier here.
>>>
>>> Let me ping them internally.
>> Oh, ok, I didn't know. Thanks!
>
>    Sorry Miquel, somehow we did not get this email in our inbox.
>    Thanks to Mani for pinging us, we will test this up today and get 
> back.
>
       We could not reproduce this issue on our ipq boards (we do not
       have an mdm9607 right now), and the cause is not obvious.
       Could you please share the debug logs you collected, stage by
       stage?

   Regards,
       Sricharan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-01-28 17:50             ` Sricharan Ramabadhran
@ 2022-01-31  9:52               ` Miquel Raynal
  -1 siblings, 0 replies; 42+ messages in thread
From: Miquel Raynal @ 2022-01-31  9:52 UTC (permalink / raw)
  To: Sricharan Ramabadhran
  Cc: Manivannan Sadhasivam, Konrad Dybcio, ~postmarketos/upstreaming,
	martin.botka, angelogioacchino.delregno, marijn.suijten,
	jamipkettunen, Richard Weinberger, Vignesh Raghavendra,
	linux-mtd, linux-arm-msm, linux-kernel, mdalam

Hi Sricharan,

sricharan@codeaurora.org wrote on Fri, 28 Jan 2022 23:20:04 +0530:

> Hi Konrad,
> 
> On 1/28/2022 9:55 AM, Sricharan Ramabadhran wrote:
> > Hi Miquel,
> >
> > On 1/26/2022 4:12 PM, Miquel Raynal wrote:  
> >> Hi Mani,
> >>
> >> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
> >>  
> >>> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:  
> >>>> Hello,
> >>>>
> >>>> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:  
> >>>>> Hi Konrad,
> >>>>>
> >>>>> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:
> >>>>>> While I have absolutely 0 idea why and how, running clear_bam_transaction
> >>>>>> when READID is issued makes the DMA totally clog up and refuse to function
> >>>>>> at all on mdm9607. In fact, it is so bad that all the data gets garbled
> >>>>>> and after a short while in the nand probe flow, the CPU decides that
> >>>>>> sepuku is the only option.
> >>>>>>
> >>>>>> Removing _READID from the if condition makes it work like a charm, I can
> >>>>>> read data and mount partitions without a problem.
> >>>>>>
> >>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
> >>>>>> ---
> >>>>>> This is totally just an observation which took me an inhumane amount of
> >>>>>> debug prints to find.. perhaps there's a better reason behind this, but
> >>>>>> I can't seem to find any answers.. Therefore, this is a BIG RFC!
> >>>>> I'm adding two people from codeaurora who worked a lot on this driver.
> >>>>> Hopefully they will have an idea :)  
> >>>> Sadre, I've spent a significant amount of time reviewing your patches,
> >>>> now it's your turn to not take a month to answer to your peers
> >>>> proposals.
> >>>>
> >>>> Please help reviewing this patch.  
> >>> Sorry. I was hoping that Qcom folks would chime in as I don't have any idea
> >>> about the mdm9607 platform. It could be that the mail server migration from
> >>> codeaurora to quicinc put a barrier here.
> >>>
> >>> Let me ping them internally.  
> >> Oh, ok, I didn't know. Thanks!  
> >
> >    Sorry Miquel, somehow we did not get this email in our inbox.
> >    Thanks to Mani for pinging us, we will test this up today and get back.
> >  
>        While we could not reproduce this issue on our ipq boards (do not have a mdm9607 right now) and
>         issue does not look any obvious.
>        can you please give the debug logs that you did for the above stage by stage ?

Thanks for stepping up, it is really appreciated. Good luck to you both
with the debugging.

Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-01-28 17:50             ` Sricharan Ramabadhran
@ 2022-01-31 10:09               ` Konrad Dybcio
  -1 siblings, 0 replies; 42+ messages in thread
From: Konrad Dybcio @ 2022-01-31 10:09 UTC (permalink / raw)
  To: Sricharan Ramabadhran, Miquel Raynal, Manivannan Sadhasivam
  Cc: ~postmarketos/upstreaming, martin.botka,
	angelogioacchino.delregno, marijn.suijten, jamipkettunen,
	Richard Weinberger, Vignesh Raghavendra, linux-mtd,
	linux-arm-msm, linux-kernel, mdalam


On 28/01/2022 18:50, Sricharan Ramabadhran wrote:
> Hi Konrad,
>
> On 1/28/2022 9:55 AM, Sricharan Ramabadhran wrote:
>> Hi Miquel,
>>
>> On 1/26/2022 4:12 PM, Miquel Raynal wrote:
>>> Hi Mani,
>>>
>>> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
>>>
>>>> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
>>>>> Hello,
>>>>>
>>>>> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
>>>>>> Hi Konrad,
>>>>>>
>>>>>> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 
>>>>>> +0100:
>>>>>>> While I have absolutely 0 idea why and how, running 
>>>>>>> clear_bam_transaction
>>>>>>> when READID is issued makes the DMA totally clog up and refuse 
>>>>>>> to function
>>>>>>> at all on mdm9607. In fact, it is so bad that all the data gets 
>>>>>>> garbled
>>>>>>> and after a short while in the nand probe flow, the CPU decides 
>>>>>>> that
>>>>>>> sepuku is the only option.
>>>>>>>
>>>>>>> Removing _READID from the if condition makes it work like a 
>>>>>>> charm, I can
>>>>>>> read data and mount partitions without a problem.
>>>>>>>
>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
>>>>>>> ---
>>>>>>> This is totally just an observation which took me an inhumane 
>>>>>>> amount of
>>>>>>> debug prints to find.. perhaps there's a better reason behind 
>>>>>>> this, but
>>>>>>> I can't seem to find any answers.. Therefore, this is a BIG RFC!
>>>>>> I'm adding two people from codeaurora who worked a lot on this 
>>>>>> driver.
>>>>>> Hopefully they will have an idea :)
>>>>> Sadre, I've spent a significant amount of time reviewing your 
>>>>> patches,
>>>>> now it's your turn to not take a month to answer to your peers
>>>>> proposals.
>>>>>
>>>>> Please help reviewing this patch.
>>>> Sorry. I was hoping that Qcom folks would chime in as I don't have 
>>>> any idea
>>>> about the mdm9607 platform. It could be that the mail server 
>>>> migration from
>>>> codeaurora to quicinc put a barrier here.
>>>>
>>>> Let me ping them internally.
>>> Oh, ok, I didn't know. Thanks!
>>
>>    Sorry Miquel, somehow we did not get this email in our inbox.
>>    Thanks to Mani for pinging us, we will test this up today and get 
>> back.
>>
>       While we could not reproduce this issue on our ipq boards (do 
> not have a mdm9607 right now) and
>        issue does not look any obvious.
>       can you please give the debug logs that you did for the above 
> stage by stage ?

I won't have access to the board for about two weeks, sorry.

When I get to it, I'll surely try to send you the logs, though there
wasn't much more than just something jumping to who-knows-where
after clear_bam_transaction was called, resulting in values associated with
the NAND being all zeroed out in pr_err/_debug/etc.


Konrad

>
>   Regards,
>       Sricharan
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-01-31 10:09               ` Konrad Dybcio
@ 2022-01-31 14:13                 ` Sricharan Ramabadhran
  -1 siblings, 0 replies; 42+ messages in thread
From: Sricharan Ramabadhran @ 2022-01-31 14:13 UTC (permalink / raw)
  To: Konrad Dybcio, Miquel Raynal, Manivannan Sadhasivam, pragalla
  Cc: ~postmarketos/upstreaming, martin.botka,
	angelogioacchino.delregno, marijn.suijten, jamipkettunen,
	Richard Weinberger, Vignesh Raghavendra, linux-mtd,
	linux-arm-msm, linux-kernel, mdalam

Hi Konrad,

On 1/31/2022 3:39 PM, Konrad Dybcio wrote:
>
> On 28/01/2022 18:50, Sricharan Ramabadhran wrote:
>> Hi Konrad,
>>
>> On 1/28/2022 9:55 AM, Sricharan Ramabadhran wrote:
>>> Hi Miquel,
>>>
>>> On 1/26/2022 4:12 PM, Miquel Raynal wrote:
>>>> Hi Mani,
>>>>
>>>> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
>>>>
>>>>> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
>>>>>> Hello,
>>>>>>
>>>>>> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
>>>>>>> Hi Konrad,
>>>>>>>
>>>>>>> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 
>>>>>>> +0100:
>>>>>>>> While I have absolutely 0 idea why and how, running 
>>>>>>>> clear_bam_transaction
>>>>>>>> when READID is issued makes the DMA totally clog up and refuse 
>>>>>>>> to function
>>>>>>>> at all on mdm9607. In fact, it is so bad that all the data gets 
>>>>>>>> garbled
>>>>>>>> and after a short while in the nand probe flow, the CPU decides 
>>>>>>>> that
>>>>>>>> sepuku is the only option.
>>>>>>>>
>>>>>>>> Removing _READID from the if condition makes it work like a 
>>>>>>>> charm, I can
>>>>>>>> read data and mount partitions without a problem.
>>>>>>>>
>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
>>>>>>>> ---
>>>>>>>> This is totally just an observation which took me an inhumane 
>>>>>>>> amount of
>>>>>>>> debug prints to find.. perhaps there's a better reason behind 
>>>>>>>> this, but
>>>>>>>> I can't seem to find any answers.. Therefore, this is a BIG RFC!
>>>>>>> I'm adding two people from codeaurora who worked a lot on this 
>>>>>>> driver.
>>>>>>> Hopefully they will have an idea :)
>>>>>> Sadre, I've spent a significant amount of time reviewing your 
>>>>>> patches,
>>>>>> now it's your turn to not take a month to answer to your peers
>>>>>> proposals.
>>>>>>
>>>>>> Please help reviewing this patch.
>>>>> Sorry. I was hoping that Qcom folks would chime in as I don't have 
>>>>> any idea
>>>>> about the mdm9607 platform. It could be that the mail server 
>>>>> migration from
>>>>> codeaurora to quicinc put a barrier here.
>>>>>
>>>>> Let me ping them internally.
>>>> Oh, ok, I didn't know. Thanks!
>>>
>>>    Sorry Miquel, somehow we did not get this email in our inbox.
>>>    Thanks to Mani for pinging us, we will test this up today and get 
>>> back.
>>>
>>       While we could not reproduce this issue on our ipq boards (do 
>> not have a mdm9607 right now) and
>>        issue does not look any obvious.
>>       can you please give the debug logs that you did for the above 
>> stage by stage ?
>
> I won't have access to the board for about two weeks, sorry.
>
> When I get to it, I'll surely try to send you the logs, though there
>
> wasn't much more than just something jumping to who-knows-where
>
> after clear_bam_transaction was called, resulting in values associated 
> with
>
> the NAND being all zeroed out in pr_err/_debug/etc.
>
>
    OK, sure. So was the READID command itself failing, or the subsequent one?
    We can check which parameter reset by clear_bam_transaction is causing the
    failure. Meanwhile, I am looping in Pradeep, who has access to the board
    and is therefore in a better position to debug.

Regards,
    Sricharan



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-01-31 14:13                 ` Sricharan Ramabadhran
@ 2022-01-31 19:54                   ` Konrad Dybcio
  -1 siblings, 0 replies; 42+ messages in thread
From: Konrad Dybcio @ 2022-01-31 19:54 UTC (permalink / raw)
  To: Sricharan Ramabadhran, Miquel Raynal, Manivannan Sadhasivam, pragalla
  Cc: ~postmarketos/upstreaming, martin.botka,
	angelogioacchino.delregno, marijn.suijten, jamipkettunen,
	Richard Weinberger, Vignesh Raghavendra, linux-mtd,
	linux-arm-msm, linux-kernel, mdalam


On 31/01/2022 15:13, Sricharan Ramabadhran wrote:
> Hi Konrad,
>
> On 1/31/2022 3:39 PM, Konrad Dybcio wrote:
>>
>> On 28/01/2022 18:50, Sricharan Ramabadhran wrote:
>>> Hi Konrad,
>>>
>>> On 1/28/2022 9:55 AM, Sricharan Ramabadhran wrote:
>>>> Hi Miquel,
>>>>
>>>> On 1/26/2022 4:12 PM, Miquel Raynal wrote:
>>>>> Hi Mani,
>>>>>
>>>>> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
>>>>>
>>>>>> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
>>>>>>>> Hi Konrad,
>>>>>>>>
>>>>>>>> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 
>>>>>>>> +0100:
>>>>>>>>> While I have absolutely 0 idea why and how, running 
>>>>>>>>> clear_bam_transaction
>>>>>>>>> when READID is issued makes the DMA totally clog up and refuse 
>>>>>>>>> to function
>>>>>>>>> at all on mdm9607. In fact, it is so bad that all the data 
>>>>>>>>> gets garbled
>>>>>>>>> and after a short while in the nand probe flow, the CPU 
>>>>>>>>> decides that
>>>>>>>>> sepuku is the only option.
>>>>>>>>>
>>>>>>>>> Removing _READID from the if condition makes it work like a 
>>>>>>>>> charm, I can
>>>>>>>>> read data and mount partitions without a problem.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
>>>>>>>>> ---
>>>>>>>>> This is totally just an observation which took me an inhumane 
>>>>>>>>> amount of
>>>>>>>>> debug prints to find.. perhaps there's a better reason behind 
>>>>>>>>> this, but
>>>>>>>>> I can't seem to find any answers.. Therefore, this is a BIG RFC!
>>>>>>>> I'm adding two people from codeaurora who worked a lot on this 
>>>>>>>> driver.
>>>>>>>> Hopefully they will have an idea :)
>>>>>>> Sadre, I've spent a significant amount of time reviewing your 
>>>>>>> patches,
>>>>>>> now it's your turn to not take a month to answer to your peers
>>>>>>> proposals.
>>>>>>>
>>>>>>> Please help reviewing this patch.
>>>>>> Sorry. I was hoping that Qcom folks would chime in as I don't 
>>>>>> have any idea
>>>>>> about the mdm9607 platform. It could be that the mail server 
>>>>>> migration from
>>>>>> codeaurora to quicinc put a barrier here.
>>>>>>
>>>>>> Let me ping them internally.
>>>>> Oh, ok, I didn't know. Thanks!
>>>>
>>>>    Sorry Miquel, somehow we did not get this email in our inbox.
>>>>    Thanks to Mani for pinging us, we will test this up today and 
>>>> get back.
>>>>
>>>       While we could not reproduce this issue on our ipq boards (do 
>>> not have a mdm9607 right now) and
>>>        issue does not look any obvious.
>>>       can you please give the debug logs that you did for the above 
>>> stage by stage ?
>>
>> I won't have access to the board for about two weeks, sorry.
>>
>> When I get to it, I'll surely try to send you the logs, though there
>>
>> wasn't much more than just something jumping to who-knows-where
>>
>> after clear_bam_transaction was called, resulting in values 
>> associated with
>>
>> the NAND being all zeroed out in pr_err/_debug/etc.
>>
>>
>     Ok sure. So was the READID command itself failing (or) the 
> subsequent one ?
>    We can check which parameter reset by the clear_bam_transaction is 
> causing the
>    failure.  Meanwhile, looping in Pradeep who has access to the 
> board, so in a better
>    position to debug.

I'm sorry I have so few details on hand, and no kernel tree (no access 
to that machine either, for now).


I will try to describe to the best of my abilities what I recall.


My methodology for making sure things didn't go haywire was to print the
OOB size of our NAND basically every two lines of code (yes, I was very
desperate at one point), as that was zeroed out when *the bug* happened,
leading to a kernel bug/panic/stall (I can't recall what exactly it was,
but it said something along the lines of "no support for oob size 0" and
then it didn't fail gracefully, leading to some bad jumps and ultimately
a dead platform..)
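
The checkpoint approach described above can be sketched in user space as
follows. This is only an illustration under stated assumptions: the names
oob_watch, oob_checkpoint and OOB_CHECKPOINT are hypothetical and do not
exist in qcom_nandc.c, and in the kernel pr_err() would take the place of
fprintf().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical checkpoint state: remembers where a watched value first hit 0. */
struct oob_watch {
	const char *first_bad_func;	/* NULL while the value is still sane */
	int first_bad_line;
};

/*
 * Log the watched OOB size and record the first checkpoint at which it
 * reads as zero. Returns 1 if the value is non-zero, 0 otherwise.
 */
static int oob_checkpoint(struct oob_watch *w, unsigned int oobsize,
			  const char *func, int line)
{
	fprintf(stderr, "%s:%d oobsize=%u\n", func, line, oobsize);
	if (oobsize != 0)
		return 1;
	if (!w->first_bad_func) {
		w->first_bad_func = func;
		w->first_bad_line = line;
	}
	return 0;
}

/* Convenience wrapper that fills in the call site automatically. */
#define OOB_CHECKPOINT(w, oobsize) \
	oob_checkpoint((w), (oobsize), __func__, __LINE__)
```

Recording only the first failing call site narrows the search to the code
between the last good checkpoint and the first bad one.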


After hours of digging, I found out that everything goes fine until
clear_bam_transaction is called. After that gets executed, every NAND op
starts reading all zeroes (for example in the JEDEC ID check), so I added
the changes from this patch, and things magically started working... My
suspicion is that the underlying FIFO isn't fully drained (is it a FIFO
on 9607? bah, I work on too many SoCs at once) and this function only
makes Linux think it is, without actually draining it, and the leftover
commands get executed with some parts of them getting overwritten,
resulting in the famous garbage in - garbage out situation, but that's
only a guesstimate..
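
That suspicion can be modelled with a small user-space sketch. It only
illustrates the guess, not the driver's actual behaviour: hw_fifo, sw_txn,
queue_desc, clear_sw_only and views_diverge are hypothetical names, and
QUEUE_DEPTH is an arbitrary value rather than a real BAM parameter.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH 8	/* hypothetical depth, not a real BAM parameter */

/* Hypothetical stand-in for a hardware descriptor FIFO. */
struct hw_fifo {
	int desc[QUEUE_DEPTH];
	size_t pending;		/* descriptors the hardware has not consumed yet */
};

/* Hypothetical stand-in for the driver's software bookkeeping. */
struct sw_txn {
	size_t queued;		/* what the driver believes is outstanding */
};

/* Queue one descriptor; both views are updated together. */
static bool queue_desc(struct hw_fifo *hw, struct sw_txn *sw, int desc)
{
	if (hw->pending == QUEUE_DEPTH)
		return false;
	hw->desc[hw->pending++] = desc;
	sw->queued++;
	return true;
}

/* Resets only the software view and leaves the hardware queue untouched. */
static void clear_sw_only(struct sw_txn *sw)
{
	sw->queued = 0;
}

/* True when driver and hardware disagree about outstanding work. */
static bool views_diverge(const struct hw_fifo *hw, const struct sw_txn *sw)
{
	return hw->pending != sw->queued;
}
```

In this model, any descriptor queued after clear_sw_only() is accounted for
as if it were the first one, while the hardware still holds the earlier ones.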


Do note this somehow worked fine on 5.11 and then broke on 5.12/13. I
went as far as replacing most of the kernel with the updated/downgraded
parts via git checkout (I tried many combinations), to no avail.. I even
tried different compilers and optimization levels, thinking it could
have been a codegen issue, but no luck either.


I.. do understand this email is a total mess to read, as much as it was
to write, but without access to my code and the machine itself I can't
give you solid details, and the fact this situation is far from ordinary
doesn't help either..


The latest (ancient, not quite pretty, but probably working if my memory
is correct) version of my patches for the mdm9607 is available at [1]. I
will push the new revision after I get access to the workstation.


Konrad


[1] https://github.com/SoMainline/linux/commits/konrad/pinemodem


>
> Regards,
>    Sricharan
>
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-01-31 19:54                   ` Konrad Dybcio
@ 2022-02-01 13:52                     ` Miquel Raynal
  -1 siblings, 0 replies; 42+ messages in thread
From: Miquel Raynal @ 2022-02-01 13:52 UTC (permalink / raw)
  To: Konrad Dybcio
  Cc: Sricharan Ramabadhran, Manivannan Sadhasivam, pragalla,
	~postmarketos/upstreaming, martin.botka,
	angelogioacchino.delregno, marijn.suijten, jamipkettunen,
	Richard Weinberger, Vignesh Raghavendra, linux-mtd,
	linux-arm-msm, linux-kernel, mdalam

Hi Konrad,

konrad.dybcio@somainline.org wrote on Mon, 31 Jan 2022 20:54:12 +0100:

> On 31/01/2022 15:13, Sricharan Ramabadhran wrote:
> > Hi Konrad,
> >
> > On 1/31/2022 3:39 PM, Konrad Dybcio wrote:  
> >>
> >> On 28/01/2022 18:50, Sricharan Ramabadhran wrote:  
> >>> Hi Konrad,
> >>>
> >>> On 1/28/2022 9:55 AM, Sricharan Ramabadhran wrote:  
> >>>> Hi Miquel,
> >>>>
> >>>> On 1/26/2022 4:12 PM, Miquel Raynal wrote:  
> >>>>> Hi Mani,
> >>>>>
> >>>>> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
> >>>>>  
> >>>>>> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:  
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:  
> >>>>>>>> Hi Konrad,
> >>>>>>>>
> >>>>>>>> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:
> >>>>>>>>> While I have absolutely 0 idea why and how, running clear_bam_transaction
> >>>>>>>>> when READID is issued makes the DMA totally clog up and refuse to function
> >>>>>>>>> at all on mdm9607. In fact, it is so bad that all the data gets garbled
> >>>>>>>>> and after a short while in the nand probe flow, the CPU decides that
> >>>>>>>>> sepuku is the only option.
> >>>>>>>>>
> >>>>>>>>> Removing _READID from the if condition makes it work like a charm, I can
> >>>>>>>>> read data and mount partitions without a problem.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
> >>>>>>>>> ---
> >>>>>>>>> This is totally just an observation which took me an inhumane amount of
> >>>>>>>>> debug prints to find.. perhaps there's a better reason behind this, but
> >>>>>>>>> I can't seem to find any answers.. Therefore, this is a BIG RFC!
> >>>>>>>> I'm adding two people from codeaurora who worked a lot on this driver.
> >>>>>>>> Hopefully they will have an idea :)
> >>>>>>> Sadre, I've spent a significant amount of time reviewing your patches,
> >>>>>>> now it's your turn to not take a month to answer to your peers
> >>>>>>> proposals.
> >>>>>>>
> >>>>>>> Please help reviewing this patch.  
> >>>>>> Sorry. I was hoping that Qcom folks would chime in as I don't have any idea
> >>>>>> about the mdm9607 platform. It could be that the mail server migration from
> >>>>>> codeaurora to quicinc put a barrier here.
> >>>>>>
> >>>>>> Let me ping them internally.  
> >>>>> Oh, ok, I didn't know. Thanks!  
> >>>>
> >>>>    Sorry Miquel, somehow we did not get this email in our inbox.
> >>>>    Thanks to Mani for pinging us, we will test this up today and get back.
> >>>>  
> >>>       While we could not reproduce this issue on our ipq boards (do not have a mdm9607 right now) and
> >>>        issue does not look any obvious.
> >>>       can you please give the debug logs that you did for the above stage by stage ?
> >>
> >> I won't have access to the board for about two weeks, sorry.
> >>
> >> When I get to it, I'll surely try to send you the logs, though there
> >>
> >> wasn't much more than just something jumping to who-knows-where
> >>
> >> after clear_bam_transaction was called, resulting in values associated with
> >>
> >> the NAND being all zeroed out in pr_err/_debug/etc.
> >>
> >>  
> >     Ok sure. So was the READID command itself failing (or) the subsequent one ?
> >    We can check which parameter reset by the clear_bam_transaction is causing the
> >    failure.  Meanwhile, looping in Pradeep who has access to the board, so in a better
> >    position to debug.  
> 
> I'm sorry I have so few details on hand, and no kernel tree (no access to that machine either, for now).
> 
> 
> I will try to describe to the best of my abilities what I recall.
> 
> 
> My methodology of making sure things don't go haywire was to print the oob size
> 
> of our NAND basically every two lines of code (yes, i was very desperate at one point),
> 
> as that was zeroed out when *the bug* happened,

This does look like a pointer error at some point and some kernel data
has been corrupted very badly by the driver.

> leading to a kernel bug/panic/stall
> 
> (can't recall what exactly it was, but it said something along the lines of "no support for
> 
> oob size 0" and then it didn't fail graceully, leading to some bad jumps and ultimately
> 
> a dead platform..)
> 
> 
> after hours of digging, I found out that everything goes fine until clear_bam_transaction is called,

Do you remember if this function was called for the first time when
this happened?

> after that gets executed every nand op starts reading all zeroes (for example in JEDEC ID check)
> 
> so I added the changes from this patch, and things magically started working... My suspicion is
> 
> that the underlying FIFO isn't fully drained (is it a FIFO on 9607? bah, i work on too many socs at once)

I don't see it in the list of supported devices, what's the exact
compatible used?

> 
> and this function only makes Linux think it is, without actually draining it, and the leftover
> 
> commands get executed with some parts of them getting overwritten, resulting in the
> 
> famous garbage in - garbage out situation, but that's only a guesstimate..

I would bet on a non-allocated bam-ish pointer that is reset to zero
in the clear_bam_transaction() helper.

Can you get your hands on the board again?
It would be nice to check whether the allocation always occurs before
use and, if so, for how many bytes.

If the pointer is not dangling, then perhaps something else smashes
that pointer.
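
The kind of check suggested above (was the buffer allocated before use, and
was it large enough for what gets cleared) can be sketched in user space as
follows. This is a minimal model under stated assumptions: txn_model,
txn_alloc, txn_clear_checked and txn_free are hypothetical names and do not
mirror the real layout of the driver's transaction structure.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a transaction buffer; not the driver's struct. */
struct txn_model {
	unsigned char *buf;	/* NULL until allocated */
	size_t alloc_len;	/* bytes actually allocated */
};

/* Allocate a zeroed buffer of len bytes. Returns 0 on success, -1 on failure. */
static int txn_alloc(struct txn_model *t, size_t len)
{
	t->buf = calloc(1, len);
	if (!t->buf)
		return -1;
	t->alloc_len = len;
	return 0;
}

/*
 * Clear clear_len bytes, but only if the buffer exists and is large enough.
 * Returns 0 on success, -1 if the clear would touch unallocated memory.
 */
static int txn_clear_checked(struct txn_model *t, size_t clear_len)
{
	if (!t->buf || clear_len > t->alloc_len)
		return -1;
	memset(t->buf, 0, clear_len);
	return 0;
}

/* Release the buffer and reset the bookkeeping. */
static void txn_free(struct txn_model *t)
{
	free(t->buf);
	t->buf = NULL;
	t->alloc_len = 0;
}
```

A clear that returns -1 here corresponds to the two cases in question: a
clear issued before any allocation, or one that covers more bytes than were
allocated.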

> Do note this somehow worked fine on 5.11 and then broke on 5.12/13. I went as far as replacing most
> 
> of the kernel with the updated/downgraded parts via git checkout (i tried many combinations),
> 
> to no avail.. I even tried different compilers and optimization levels, thinking it could have been
> 
> a codegen issue, but no luck either.
> 
> 
> I.. do understand this email is a total mess to read, as much as it was to write, but
> 
> without access to my code and the machine itself I can't give you solid details, and
> 
> the fact this situation is far from ordinary doesn't help either..
> 
> 
> The latest (ancient, not quite pretty, but probably working if my memory is correct) version of my patches
> 
> for the mdm9607 is available at [1], I will push the new revision after I get access to the workstation.
> 
> 
> Konrad
> 
> 
> [1] https://github.com/SoMainline/linux/commits/konrad/pinemodem
> 
> 
> >
> > Regards,
> >    Sricharan
> >
> >  


Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-02-01 13:52                     ` Miquel Raynal
@ 2022-02-01 15:51                       ` Konrad Dybcio
  -1 siblings, 0 replies; 42+ messages in thread
From: Konrad Dybcio @ 2022-02-01 15:51 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Sricharan Ramabadhran, Manivannan Sadhasivam, pragalla,
	~postmarketos/upstreaming, martin.botka,
	angelogioacchino.delregno, marijn.suijten, jamipkettunen,
	Richard Weinberger, Vignesh Raghavendra, linux-mtd,
	linux-arm-msm, linux-kernel, mdalam


On 01/02/2022 14:52, Miquel Raynal wrote:
> Hi Konrad,
>
> konrad.dybcio@somainline.org wrote on Mon, 31 Jan 2022 20:54:12 +0100:
>
>> On 31/01/2022 15:13, Sricharan Ramabadhran wrote:
>>> Hi Konrad,
>>>
>>> On 1/31/2022 3:39 PM, Konrad Dybcio wrote:
>>>> On 28/01/2022 18:50, Sricharan Ramabadhran wrote:
>>>>> Hi Konrad,
>>>>>
>>>>> On 1/28/2022 9:55 AM, Sricharan Ramabadhran wrote:
>>>>>> Hi Miquel,
>>>>>>
>>>>>> On 1/26/2022 4:12 PM, Miquel Raynal wrote:
>>>>>>> Hi Mani,
>>>>>>>
>>>>>>> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
>>>>>>>   
>>>>>>>> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
>>>>>>>>>> Hi Konrad,
>>>>>>>>>>
>>>>>>>>>> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:
>>>>>>>>>>> While I have absolutely 0 idea why and how, running clear_bam_transaction
>>>>>>>>>>> when READID is issued makes the DMA totally clog up and refuse to function
>>>>>>>>>>> at all on mdm9607. In fact, it is so bad that all the data gets garbled
>>>>>>>>>>> and after a short while in the nand probe flow, the CPU decides that
>>>>>>>>>>> sepuku is the only option.
>>>>>>>>>>>
>>>>>>>>>>> Removing _READID from the if condition makes it work like a charm, I can
>>>>>>>>>>> read data and mount partitions without a problem.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
>>>>>>>>>>> ---
>>>>>>>>>>> This is totally just an observation which took me an inhumane amount of
>>>>>>>>>>> debug prints to find.. perhaps there's a better reason behind this, but
>>>>>>>>>>> I can't seem to find any answers.. Therefore, this is a BIG RFC!
>>>>>>>>>> I'm adding two people from codeaurora who worked a lot on this driver.
>>>>>>>>>> Hopefully they will have an idea :)
>>>>>>>>> Sadre, I've spent a significant amount of time reviewing your patches,
>>>>>>>>> now it's your turn to not take a month to answer to your peers
>>>>>>>>> proposals.
>>>>>>>>>
>>>>>>>>> Please help reviewing this patch.
>>>>>>>> Sorry. I was hoping that Qcom folks would chime in as I don't have any idea
>>>>>>>> about the mdm9607 platform. It could be that the mail server migration from
>>>>>>>> codeaurora to quicinc put a barrier here.
>>>>>>>>
>>>>>>>> Let me ping them internally.
>>>>>>> Oh, ok, I didn't know. Thanks!
>>>>>>     Sorry Miquel, somehow we did not get this email in our inbox.
>>>>>>     Thanks to Mani for pinging us, we will test this up today and get back.
>>>>>>   
>>>>>        While we could not reproduce this issue on our ipq boards (do not have a mdm9607 right now) and
>>>>>         issue does not look any obvious.
>>>>>        can you please give the debug logs that you did for the above stage by stage ?
>>>> I won't have access to the board for about two weeks, sorry.
>>>>
>>>> When I get to it, I'll surely try to send you the logs, though there
>>>>
>>>> wasn't much more than just something jumping to who-knows-where
>>>>
>>>> after clear_bam_transaction was called, resulting in values associated with
>>>>
>>>> the NAND being all zeroed out in pr_err/_debug/etc.
>>>>
>>>>   
>>>      Ok sure. So was the READID command itself failing (or) the subsequent one ?
>>>     We can check which parameter reset by the clear_bam_transaction is causing the
>>>     failure.  Meanwhile, looping in Pradeep who has access to the board, so in a better
>>>     position to debug.
>> I'm sorry I have so few details on hand, and no kernel tree (no access to that machine either, for now).
>>
>>
>> I will try to describe to the best of my abilities what I recall.
>>
>>
>> My methodology of making sure things don't go haywire was to print the oob size
>>
>> of our NAND basically every two lines of code (yes, i was very desperate at one point),
>>
>> as that was zeroed out when *the bug* happened,
> This does look like a pointer error at some point and some kernel data
> has been corrupted very badly by the driver.
>
>> leading to a kernel bug/panic/stall
>>
>> (can't recall what exactly it was, but it said something along the lines of "no support for
>>
>> oob size 0" and then it didn't fail gracefully, leading to some bad jumps and ultimately
>>
>> a dead platform..)
>>
>>
>> after hours of digging, I found out that everything goes fine until clear_bam_transaction is called,
> Do you remember if this function was called for the first time when
> this happened?

I think so, if I recall correctly there are no more callers in this 
path, as readid is the first nand command executed in flash probe flow.



>
>> after that gets executed every nand op starts reading all zeroes (for example in JEDEC ID check)
>>
>> so I added the changes from this patch, and things magically started working... My suspicion is
>>
>> that the underlying FIFO isn't fully drained (is it a FIFO on 9607? bah, i work on too many socs at once)
> I don't see it in the list of supported devices, what's the exact
> compatible used?

qcom,ipq4019-nand



>
>> and this function only makes Linux think it is, without actually draining it, and the leftover
>>
>> commands get executed with some parts of them getting overwritten, resulting in the
>>
>> famous garbage in - garbage out situation, but that's only a guesstimate..
> I would bet for a non allocated bam-ish pointer that is reset to zero
> in the clear_bam_transaction() helper.
>
> Can you get your hands on the board again?

Sure, but as I mentioned previously, only in about 2 weeks, I can't 
really do any dev before then.. :(



> It would be nice to check if the allocation always occurs before use,
> and if yes on how much bytes.
>
> If the pointer is not dangling, then perhaps something else smashes
> that pointer.


Konrad

>
>> Do note this somehow worked fine on 5.11 and then broke on 5.12/13. I went as far as replacing most
>>
>> of the kernel with the updated/downgraded parts via git checkout (i tried many combinations),
>>
>> to no avail.. I even tried different compilers and optimization levels, thinking it could have been
>>
>> a codegen issue, but no luck either.
>>
>>
>> I.. do understand this email is a total mess to read, as much as it was to write, but
>>
>> without access to my code and the machine itself I can't give you solid details, and
>>
>> the fact this situation is far from ordinary doesn't help either..
>>
>>
>> The latest (ancient, not quite pretty, but probably working if my memory is correct) version of my patches
>>
>> for the mdm9607 is available at [1], I will push the new revision after I get access to the workstation.
>>
>>
>> Konrad
>>
>>
>> [1] https://github.com/SoMainline/linux/commits/konrad/pinemodem
>>
>>
>>> Regards,
>>>     Sricharan
>>>
>>>   
>
> Thanks,
> Miquèl
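
To illustrate the check suggested above (that the allocation always occurs
before use, and with how many bytes), here is a minimal userspace sketch. It
is not the driver's code: struct model_txn, struct model_nandc and the
model_*() helpers are hypothetical names made up for this example, and the
real clear_bam_transaction() operates on different structures. The sketch
only shows the shape of a guard that refuses to reset a transaction that was
never allocated or is smaller than expected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's BAM transaction bookkeeping. */
struct model_txn {
	unsigned int cmd_pos;
	unsigned int tx_pos;
	unsigned int rx_pos;
};

/* Hypothetical stand-in for the controller state owning the transaction. */
struct model_nandc {
	struct model_txn *txn;	/* NULL until model_alloc_txn() succeeds */
	size_t txn_size;	/* bytes actually allocated behind txn */
	int is_bam;		/* non-zero when the BAM path is in use */
};

/* Allocate the transaction and record how many bytes were obtained. */
int model_alloc_txn(struct model_nandc *nandc)
{
	nandc->txn = calloc(1, sizeof(*nandc->txn));
	if (!nandc->txn)
		return -1;
	nandc->txn_size = sizeof(*nandc->txn);
	return 0;
}

/*
 * Reset the transaction positions. Returns 0 on success and -1 when the
 * transaction was never allocated or is smaller than the struct, which is
 * the situation the review comment asks to rule out.
 */
int model_clear_txn(struct model_nandc *nandc)
{
	if (!nandc->is_bam)
		return 0;

	if (!nandc->txn || nandc->txn_size < sizeof(*nandc->txn)) {
		fprintf(stderr, "clear before alloc: txn=%p size=%zu\n",
			(void *)nandc->txn, nandc->txn_size);
		return -1;
	}

	nandc->txn->cmd_pos = 0;
	nandc->txn->tx_pos = 0;
	nandc->txn->rx_pos = 0;
	return 0;
}

/* Small usage example: clearing fails before allocation, works after. */
void model_selftest(void)
{
	struct model_nandc nandc = { .txn = NULL, .txn_size = 0, .is_bam = 1 };

	assert(model_clear_txn(&nandc) == -1);
	assert(model_alloc_txn(&nandc) == 0);
	nandc.txn->cmd_pos = 3;
	assert(model_clear_txn(&nandc) == 0);
	assert(nandc.txn->cmd_pos == 0);
	free(nandc.txn);
}
```

In the driver itself the equivalent would be a temporary debug print or
WARN_ON() around the reset, which is an assumption about how one might
instrument it, not something already present in the code.
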

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-02-01 15:51                       ` Konrad Dybcio
@ 2022-02-02  7:24                         ` Sricharan Ramabadhran
  -1 siblings, 0 replies; 42+ messages in thread
From: Sricharan Ramabadhran @ 2022-02-02  7:24 UTC (permalink / raw)
  To: Konrad Dybcio, Miquel Raynal
  Cc: Manivannan Sadhasivam, pragalla, ~postmarketos/upstreaming,
	martin.botka, angelogioacchino.delregno, marijn.suijten,
	jamipkettunen, Richard Weinberger, Vignesh Raghavendra,
	linux-mtd, linux-arm-msm, linux-kernel, mdalam, bbhatt, hemantk

Hi Konrad/Miquel,

On 2/1/2022 9:21 PM, Konrad Dybcio wrote:
>
> On 01/02/2022 14:52, Miquel Raynal wrote:
>> Hi Konrad,
>>
>> konrad.dybcio@somainline.org wrote on Mon, 31 Jan 2022 20:54:12 +0100:
>>
>>> On 31/01/2022 15:13, Sricharan Ramabadhran wrote:
>>>> Hi Konrad,
>>>>
>>>> On 1/31/2022 3:39 PM, Konrad Dybcio wrote:
>>>>> On 28/01/2022 18:50, Sricharan Ramabadhran wrote:
>>>>>> Hi Konrad,
>>>>>>
>>>>>> On 1/28/2022 9:55 AM, Sricharan Ramabadhran wrote:
>>>>>>> Hi Miquel,
>>>>>>>
>>>>>>> On 1/26/2022 4:12 PM, Miquel Raynal wrote:
>>>>>>>> Hi Mani,
>>>>>>>>
>>>>>>>> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
>>>>>>>>> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
>>>>>>>>>>> Hi Konrad,
>>>>>>>>>>>
>>>>>>>>>>> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:
>>>>>>>>>>>> While I have absolutely 0 idea why and how, running clear_bam_transaction
>>>>>>>>>>>> when READID is issued makes the DMA totally clog up and refuse to function
>>>>>>>>>>>> at all on mdm9607. In fact, it is so bad that all the data gets garbled
>>>>>>>>>>>> and after a short while in the nand probe flow, the CPU decides that
>>>>>>>>>>>> sepuku is the only option.
>>>>>>>>>>>>
>>>>>>>>>>>> Removing _READID from the if condition makes it work like a charm, I can
>>>>>>>>>>>> read data and mount partitions without a problem.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
>>>>>>>>>>>> ---
>>>>>>>>>>>> This is totally just an observation which took me an inhumane amount of
>>>>>>>>>>>> debug prints to find.. perhaps there's a better reason behind this, but
>>>>>>>>>>>> I can't seem to find any answers.. Therefore, this is a BIG RFC!
>>>>>>>>>>> I'm adding two people from codeaurora who worked a lot on this driver.
>>>>>>>>>>> Hopefully they will have an idea :)
>>>>>>>>>> Sadre, I've spent a significant amount of time reviewing your patches,
>>>>>>>>>> now it's your turn to not take a month to answer to your peers
>>>>>>>>>> proposals.
>>>>>>>>>>
>>>>>>>>>> Please help reviewing this patch.
>>>>>>>>> Sorry. I was hoping that Qcom folks would chime in as I don't have any idea
>>>>>>>>> about the mdm9607 platform. It could be that the mail server migration from
>>>>>>>>> codeaurora to quicinc put a barrier here.
>>>>>>>>>
>>>>>>>>> Let me ping them internally.
>>>>>>>> Oh, ok, I didn't know. Thanks!
>>>>>>>     Sorry Miquel, somehow we did not get this email in our inbox.
>>>>>>>     Thanks to Mani for pinging us, we will test this up today and get back.
>>>>>>        While we could not reproduce this issue on our ipq boards (do not have a mdm9607 right now) and
>>>>>>         issue does not look any obvious.
>>>>>>        can you please give the debug logs that you did for the above stage by stage ?
>>>>> I won't have access to the board for about two weeks, sorry.
>>>>>
>>>>> When I get to it, I'll surely try to send you the logs, though there
>>>>>
>>>>> wasn't much more than just something jumping to who-knows-where
>>>>>
>>>>> after clear_bam_transaction was called, resulting in values associated with
>>>>>
>>>>> the NAND being all zeroed out in pr_err/_debug/etc.
>>>>>
>>>>      Ok sure. So was the READID command itself failing (or) the subsequent one ?
>>>>     We can check which parameter reset by the clear_bam_transaction is causing the
>>>>     failure.  Meanwhile, looping in Pradeep who has access to the board, so in a better
>>>>     position to debug.
>>> I'm sorry I have so few details on hand, and no kernel tree (no 
>>> access to that machine either, for now).
>>>
>>>
>>> I will try to describe to the best of my abilities what I recall.
>>>
>>>
>>> My methodology of making sure things don't go haywire was to print 
>>> the oob size
>>>
>>> of our NAND basically every two lines of code (yes, i was very 
>>> desperate at one point),
>>>
>>> as that was zeroed out when *the bug* happened,
>> This does look like a pointer error at some point and some kernel data
>> has been corrupted very badly by the driver.
>>
>>> leading to a kernel bug/panic/stall
>>>
>>> (can't recall what exactly it was, but it said something along the 
>>> lines of "no support for
>>>
>>> oob size 0" and then it didn't fail gracefully, leading to some bad 
>>> jumps and ultimately
>>>
>>> a dead platform..)
>>>
>>>
>>> after hours of digging, I found out that everything goes fine until 
>>> clear_bam_transaction is called,
>> Do you remember if this function was called for the first time when
>> this happened?
>
> I think so, if I recall correctly there are no more callers in this 
> path, as readid is the first nand command executed in flash probe flow.
>
>
>
>>
>>> after that gets executed every nand op starts reading all zeroes 
>>> (for example in JEDEC ID check)
>>>
>>> so I added the changes from this patch, and things magically started 
>>> working... My suspicion is
>>>
>>> that the underlying FIFO isn't fully drained (is it a FIFO on 9607? 
>>> bah, i work on too many socs at once)
>> I don't see it in the list of supported devices, what's the exact
>> compatible used?
>
> qcom,ipq4019-nand
>
>
>
>>
>>> and this function only makes Linux think it is, without actually 
>>> draining it, and the leftover
>>>
>>> commands get executed with some parts of them getting overwritten, 
>>> resulting in the
>>>
>>> famous garbage in - garbage out situation, but that's only a 
>>> guesstimate..
>> I would bet for a non allocated bam-ish pointer that is reset to zero
>> in the clear_bam_transaction() helper.
>>
>> Can you get your hands on the board again?
>
> Sure, but as I mentioned previously, only in about 2 weeks, I can't 
> really do any dev before then.. :(
>
>
>
>> It would be nice to check if the allocation always occurs before use,
>> and if yes on how much bytes.
>>
>> If the pointer is not dangling, then perhaps something else smashes
>> that pointer.
>
>
> Konrad
>
>>
>>> Do note this somehow worked fine on 5.11 and then broke on 5.12/13. 
>>> I went as far as replacing most
>>>
>>> of the kernel with the updated/downgraded parts via git checkout (i 
>>> tried many combinations),
>>>
>>> to no avail.. I even tried different compilers and optimization 
>>> levels, thinking it could have been
>>>
>>> a codegen issue, but no luck either.
>>>
>>>
>>> I.. do understand this email is a total mess to read, as much as it 
>>> was to write, but
>>>
>>> without access to my code and the machine itself I can't give you 
>>> solid details, and
>>>
>>> the fact this situation is far from ordinary doesn't help either..
>>>
>>>
>>> The latest (ancient, not quite pretty, but probably working if my 
>>> memory is correct) version of my patches
>>>
>>> for the mdm9607 is available at [1], I will push the new revision 
>>> after I get access to the workstation.
>>>
   + a few more people who have access to the board.

    Going by the description, which points to kernel memory corruption, we can try out a KASAN build.
    Since you mentioned it worked up to 5.11, did you bisect the driver up to the 5.11 head and confirm that it worked?

Regards,
    Sricharan
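
For readers without the board, the kind of corruption a KASAN build is meant
to catch can be modelled in userspace. The following is only a rough sketch
under stated assumptions: it is neither KASAN nor driver code, and every name
in it (struct guarded_buf, the guarded_*() helpers, GUARD_LEN, GUARD_BYTE) is
hypothetical. It places a guard pattern after a buffer and reports when a
"clear" routine with a wrong length has overwritten it. KASAN achieves a
similar effect inside the kernel with redzones and shadow memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  16u		/* hypothetical guard size in bytes */
#define GUARD_BYTE 0xA5u	/* hypothetical guard pattern */

/* A payload buffer followed by GUARD_LEN bytes of guard pattern. */
struct guarded_buf {
	uint8_t *raw;	/* payload, then the guard bytes */
	size_t len;	/* payload length in bytes */
};

/* Allocate a zeroed payload of len bytes and arm the guard behind it. */
int guarded_alloc(struct guarded_buf *b, size_t len)
{
	b->raw = malloc(len + GUARD_LEN);
	if (!b->raw)
		return -1;
	b->len = len;
	memset(b->raw, 0, len);
	memset(b->raw + len, GUARD_BYTE, GUARD_LEN);
	return 0;
}

/*
 * Model of a reset helper that zeroes n bytes. A caller passing n > len
 * models the bug. The length is clamped to the raw allocation so the model
 * itself never writes outside memory it owns.
 */
void guarded_clear(struct guarded_buf *b, size_t n)
{
	if (n > b->len + GUARD_LEN)
		n = b->len + GUARD_LEN;
	memset(b->raw, 0, n);
}

/* Returns 1 when the guard is untouched, 0 when something overwrote it. */
int guarded_intact(const struct guarded_buf *b)
{
	for (size_t i = 0; i < GUARD_LEN; i++) {
		if (b->raw[b->len + i] != GUARD_BYTE)
			return 0;
	}
	return 1;
}

void guarded_free(struct guarded_buf *b)
{
	free(b->raw);
	b->raw = NULL;
	b->len = 0;
}

/* Small usage example: a correct clear keeps the guard, an overlong one does not. */
void guarded_selftest(void)
{
	struct guarded_buf b;

	assert(guarded_alloc(&b, 32) == 0);
	guarded_clear(&b, 32);
	assert(guarded_intact(&b) == 1);
	guarded_clear(&b, 40);
	assert(guarded_intact(&b) == 0);
	guarded_free(&b);
}
```

Unlike this model, KASAN reports the faulting access itself together with a
stack trace, which is what would help narrow down the corruption here.
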





^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-02-02  7:24                         ` Sricharan Ramabadhran
@ 2022-02-04 17:17                           ` Sricharan Ramabadhran
  -1 siblings, 0 replies; 42+ messages in thread
From: Sricharan Ramabadhran @ 2022-02-04 17:17 UTC (permalink / raw)
  To: Konrad Dybcio, Miquel Raynal
  Cc: Manivannan Sadhasivam, pragalla, ~postmarketos/upstreaming,
	martin.botka, angelogioacchino.delregno, marijn.suijten,
	jamipkettunen, Richard Weinberger, Vignesh Raghavendra,
	linux-mtd, linux-arm-msm, linux-kernel, mdalam, bbhatt, hemantk


On 2/2/2022 12:54 PM, Sricharan Ramabadhran wrote:
> Hi Konrad/Miquel,
>
> On 2/1/2022 9:21 PM, Konrad Dybcio wrote:
>>
>> On 01/02/2022 14:52, Miquel Raynal wrote:
>>> Hi Konrad,
>>>
>>> konrad.dybcio@somainline.org wrote on Mon, 31 Jan 2022 20:54:12 +0100:
>>>
>>>> On 31/01/2022 15:13, Sricharan Ramabadhran wrote:
>>>>> Hi Konrad,
>>>>>
>>>>> On 1/31/2022 3:39 PM, Konrad Dybcio wrote:
>>>>>> On 28/01/2022 18:50, Sricharan Ramabadhran wrote:
>>>>>>> Hi Konrad,
>>>>>>>
>>>>>>> On 1/28/2022 9:55 AM, Sricharan Ramabadhran wrote:
>>>>>>>> Hi Miquel,
>>>>>>>>
>>>>>>>> On 1/26/2022 4:12 PM, Miquel Raynal wrote:
>>>>>>>>> Hi Mani,
>>>>>>>>>
>>>>>>>>> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
>>>>>>>>>> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
>>>>>>>>>>>> Hi Konrad,
>>>>>>>>>>>>
>>>>>>>>>>>> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:
>>>>>>>>>>>>> While I have absolutely 0 idea why and how, running clear_bam_transaction
>>>>>>>>>>>>> when READID is issued makes the DMA totally clog up and refuse to function
>>>>>>>>>>>>> at all on mdm9607. In fact, it is so bad that all the data gets garbled
>>>>>>>>>>>>> and after a short while in the nand probe flow, the CPU decides that
>>>>>>>>>>>>> sepuku is the only option.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Removing _READID from the if condition makes it work like a charm, I can
>>>>>>>>>>>>> read data and mount partitions without a problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> This is totally just an observation which took me an inhumane amount of
>>>>>>>>>>>>> debug prints to find.. perhaps there's a better reason behind this, but
>>>>>>>>>>>>> I can't seem to find any answers.. Therefore, this is a BIG RFC!
>>>>>>>>>>>> I'm adding two people from codeaurora who worked a lot on this driver.
>>>>>>>>>>>> Hopefully they will have an idea :)
>>>>>>>>>>> Sadre, I've spent a significant amount of time reviewing your patches,
>>>>>>>>>>> now it's your turn to not take a month to answer to your peers
>>>>>>>>>>> proposals.
>>>>>>>>>>>
>>>>>>>>>>> Please help reviewing this patch.
>>>>>>>>>> Sorry. I was hoping that Qcom folks would chime in as I don't have any idea
>>>>>>>>>> about the mdm9607 platform. It could be that the mail server migration from
>>>>>>>>>> codeaurora to quicinc put a barrier here.
>>>>>>>>>>
>>>>>>>>>> Let me ping them internally.
>>>>>>>>> Oh, ok, I didn't know. Thanks!
>>>>>>>>     Sorry Miquel, somehow we did not get this email in our inbox.
>>>>>>>>     Thanks to Mani for pinging us, we will test this up today and get back.
>>>>>>>        While we could not reproduce this issue on our ipq boards (do not have a mdm9607 right now) and
>>>>>>>         issue does not look any obvious.
>>>>>>>        can you please give the debug logs that you did for the above stage by stage ?
>>>>>> I won't have access to the board for about two weeks, sorry.
>>>>>>
>>>>>> When I get to it, I'll surely try to send you the logs, though there
>>>>>>
>>>>>> wasn't much more than just something jumping to who-knows-where
>>>>>>
>>>>>> after clear_bam_transaction was called, resulting in values associated with
>>>>>>
>>>>>> the NAND being all zeroed out in pr_err/_debug/etc.
>>>>>>
>>>>>      Ok sure. So was the READID command itself failing (or) the subsequent one ?
>>>>>     We can check which parameter reset by the clear_bam_transaction is causing the
>>>>>     failure.  Meanwhile, looping in Pradeep who has access to the board, so in a better
>>>>>     position to debug.
>>>> I'm sorry I have so few details on hand, and no kernel tree (no 
>>>> access to that machine either, for now).
>>>>
>>>>
>>>> I will try to describe to the best of my abilities what I recall.
>>>>
>>>>
>>>> My methodology of making sure things don't go haywire was to print 
>>>> the oob size
>>>>
>>>> of our NAND basically every two lines of code (yes, i was very 
>>>> desperate at one point),
>>>>
>>>> as that was zeroed out when *the bug* happened,
>>> This does look like a pointer error at some point and some kernel data
>>> has been corrupted very badly by the driver.
>>>
>>>> leading to a kernel bug/panic/stall
>>>>
>>>> (can't recall what exactly it was, but it said something along the 
>>>> lines of "no support for
>>>>
>>>> oob size 0" and then it didn't fail gracefully, leading to some bad 
>>>> jumps and ultimately
>>>>
>>>> a dead platform..)
>>>>
>>>>
>>>> after hours of digging, I found out that everything goes fine until 
>>>> clear_bam_transaction is called,
>>> Do you remember if this function was called for the first time when
>>> this happened?
>>
>> I think so, if I recall correctly there are no more callers in this 
>> path, as readid is the first nand command executed in flash probe flow.
>>
>>
>>
>>>
>>>> after that gets executed every nand op starts reading all zeroes 
>>>> (for example in JEDEC ID check)
>>>>
>>>> so I added the changes from this patch, and things magically 
>>>> started working... My suspicion is
>>>>
>>>> that the underlying FIFO isn't fully drained (is it a FIFO on 9607? 
>>>> bah, i work on too many socs at once)
>>> I don't see it in the list of supported devices, what's the exact
>>> compatible used?
>>
>> qcom,ipq4019-nand
>>
>>
>>
>>>
>>>> and this function only makes Linux think it is, without actually 
>>>> draining it, and the leftover
>>>>
>>>> commands get executed with some parts of them getting overwritten, 
>>>> resulting in the
>>>>
>>>> famous garbage in - garbage out situation, but that's only a 
>>>> guesstimate..
>>> I would bet for a non allocated bam-ish pointer that is reset to zero
>>> in the clear_bam_transaction() helper.
>>>
>>> Can you get your hands on the board again?
>>
>> Sure, but as I mentioned previously, only in about 2 weeks, I can't 
>> really do any dev before then.. :(
>>
>>
>>
>>> It would be nice to check if the allocation always occurs before use,
>>> and if yes on how much bytes.
>>>
>>> If the pointer is not dangling, then perhaps something else smashes
>>> that pointer.
>>
>>
>> Konrad
>>
>>>
>>>> Do note this somehow worked fine on 5.11 and then broke on 5.12/13. 
>>>> I went as far as replacing most
>>>>
>>>> of the kernel with the updated/downgraded parts via git checkout (i 
>>>> tried many combinations),
>>>>
>>>> to no avail.. I even tried different compilers and optimization 
>>>> levels, thinking it could have been
>>>>
>>>> a codegen issue, but no luck either.
>>>>
>>>>
>>>> I.. do understand this email is a total mess to read, as much as it 
>>>> was to write, but
>>>>
>>>> without access to my code and the machine itself I can't give you 
>>>> solid details, and
>>>>
>>>> the fact this situation is far from ordinary doesn't help either..
>>>>
>>>>
>>>> The latest (ancient, not quite pretty, but probably working if my 
>>>> memory is correct) version of my patches
>>>>
>>>> for the mdm9607 is available at [1], I will push the new revision 
>>>> after I get access to the workstation.
>>>>
>   + few more who have access to the board.
>
>    Going by the description, for kernel corruption, we can try out a 
> KASAN build.
>    Since you have mentioned it worked till 5.11, you bisected the 
> driver till 5.11 head and it worked ?
>
    Tried running a KASAN-enabled image on an IPQ board, but no luck.
Nothing came out of it.
    We can proceed only if someone with the board can help here.


Regards,
   Sricharan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-02-04 17:17                           ` Sricharan Ramabadhran
@ 2022-02-08 16:45                             ` Konrad Dybcio
  -1 siblings, 0 replies; 42+ messages in thread
From: Konrad Dybcio @ 2022-02-08 16:45 UTC (permalink / raw)
  To: Sricharan Ramabadhran, Miquel Raynal
  Cc: Manivannan Sadhasivam, pragalla, ~postmarketos/upstreaming,
	martin.botka, angelogioacchino.delregno, marijn.suijten,
	jamipkettunen, Richard Weinberger, Vignesh Raghavendra,
	linux-mtd, linux-arm-msm, linux-kernel, mdalam, bbhatt, hemantk



On 4.02.2022 18:17, Sricharan Ramabadhran wrote:
> 
> On 2/2/2022 12:54 PM, Sricharan Ramabadhran wrote:
>> Hi Konrad/Miquel,
>>
>> On 2/1/2022 9:21 PM, Konrad Dybcio wrote:
>>>
>>> On 01/02/2022 14:52, Miquel Raynal wrote:
>>>> Hi Konrad,
>>>>
>>>> konrad.dybcio@somainline.org wrote on Mon, 31 Jan 2022 20:54:12 +0100:
>>>>
>>>>> On 31/01/2022 15:13, Sricharan Ramabadhran wrote:
>>>>>> Hi Konrad,
>>>>>>
>>>>>> On 1/31/2022 3:39 PM, Konrad Dybcio wrote:
>>>>>>> On 28/01/2022 18:50, Sricharan Ramabadhran wrote:
>>>>>>>> Hi Konrad,
>>>>>>>>
>>>>>>>> On 1/28/2022 9:55 AM, Sricharan Ramabadhran wrote:
>>>>>>>>> Hi Miquel,
>>>>>>>>>
>>>>>>>>> On 1/26/2022 4:12 PM, Miquel Raynal wrote:
>>>>>>>>>> Hi Mani,
>>>>>>>>>>
>>>>>>>>>> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
>>>>>>>>>>> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
>>>>>>>>>>>>> Hi Konrad,
>>>>>>>>>>>>>
>>>>>>>>>>>>> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:
>>>>>>>>>>>>>> While I have absolutely 0 idea why and how, running clear_bam_transaction
>>>>>>>>>>>>>> when READID is issued makes the DMA totally clog up and refuse to function
>>>>>>>>>>>>>> at all on mdm9607. In fact, it is so bad that all the data gets garbled
>>>>>>>>>>>>>> and after a short while in the nand probe flow, the CPU decides that
>>>>>>>>>>>>>> sepuku is the only option.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Removing _READID from the if condition makes it work like a charm, I can
>>>>>>>>>>>>>> read data and mount partitions without a problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> This is totally just an observation which took me an inhumane amount of
>>>>>>>>>>>>>> debug prints to find.. perhaps there's a better reason behind this, but
>>>>>>>>>>>>>> I can't seem to find any answers.. Therefore, this is a BIG RFC!
>>>>>>>>>>>>> I'm adding two people from codeaurora who worked a lot on this driver.
>>>>>>>>>>>>> Hopefully they will have an idea :)
>>>>>>>>>>>> Sadre, I've spent a significant amount of time reviewing your patches,
>>>>>>>>>>>> now it's your turn to not take a month to answer to your peers
>>>>>>>>>>>> proposals.
>>>>>>>>>>>>
>>>>>>>>>>>> Please help reviewing this patch.
>>>>>>>>>>> Sorry. I was hoping that Qcom folks would chime in as I don't have any idea
>>>>>>>>>>> about the mdm9607 platform. It could be that the mail server migration from
>>>>>>>>>>> codeaurora to quicinc put a barrier here.
>>>>>>>>>>>
>>>>>>>>>>> Let me ping them internally.
>>>>>>>>>> Oh, ok, I didn't know. Thanks!
>>>>>>>>>     Sorry Miquel, somehow we did not get this email in our inbox.
>>>>>>>>>     Thanks to Mani for pinging us, we will test this up today and get back.
>>>>>>>>        While we could not reproduce this issue on our ipq boards (do not have a mdm9607 right now) and
>>>>>>>>         issue does not look any obvious.
>>>>>>>>        can you please give the debug logs that you did for the above stage by stage ?
>>>>>>> I won't have access to the board for about two weeks, sorry.
>>>>>>>
>>>>>>> When I get to it, I'll surely try to send you the logs, though there
>>>>>>>
>>>>>>> wasn't much more than just something jumping to who-knows-where
>>>>>>>
>>>>>>> after clear_bam_transaction was called, resulting in values associated with
>>>>>>>
>>>>>>> the NAND being all zeroed out in pr_err/_debug/etc.
>>>>>>>
>>>>>>      Ok sure. So was the READID command itself failing (or) the subsequent one ?
>>>>>>     We can check which parameter reset by the clear_bam_transaction is causing the
>>>>>>     failure.  Meanwhile, looping in Pradeep who has access to the board, so in a better
>>>>>>     position to debug.
>>>>> I'm sorry I have so few details on hand, and no kernel tree (no access to that machine either, for now).
>>>>>
>>>>>
>>>>> I will try to describe to the best of my abilities what I recall.
>>>>>
>>>>>
>>>>> My methodology of making sure things don't go haywire was to print the oob size
>>>>>
>>>>> of our NAND basically every two lines of code (yes, i was very desperate at one point),
>>>>>
>>>>> as that was zeroed out when *the bug* happened,
>>>> This does look like a pointer error at some point and some kernel data
>>>> has been corrupted very badly by the driver.
>>>>
>>>>> leading to a kernel bug/panic/stall
>>>>>
>>>>> (can't recall what exactly it was, but it said something along the lines of "no support for
>>>>>
>>>>> oob size 0" and then it didn't fail gracefully, leading to some bad jumps and ultimately
>>>>>
>>>>> a dead platform..)
>>>>>
>>>>>
>>>>> after hours of digging, I found out that everything goes fine until clear_bam_transaction is called,
>>>> Do you remember if this function was called for the first time when
>>>> this happened?
>>>
>>> I think so, if I recall correctly there are no more callers in this path, as readid is the first nand command executed in flash probe flow.
>>>
>>>
>>>
>>>>
>>>>> after that gets executed every nand op starts reading all zeroes (for example in JEDEC ID check)
>>>>>
>>>>> so I added the changes from this patch, and things magically started working... My suspicion is
>>>>>
>>>>> that the underlying FIFO isn't fully drained (is it a FIFO on 9607? bah, i work on too many socs at once)
>>>> I don't see it in the list of supported devices, what's the exact
>>>> compatible used?
>>>
>>> qcom,ipq4019-nand
>>>
>>>
>>>
>>>>
>>>>> and this function only makes Linux think it is, without actually draining it, and the leftover
>>>>>
>>>>> commands get executed with some parts of them getting overwritten, resulting in the
>>>>>
>>>>> famous garbage in - garbage out situation, but that's only a guesstimate..
>>>> I would bet for a non allocated bam-ish pointer that is reset to zero
>>>> in the clear_bam_transaction() helper.
>>>>
>>>> Can you get your hands on the board again?
>>>
>>> Sure, but as I mentioned previously, only in about 2 weeks, I can't really do any dev before then.. :(
>>>
>>>
>>>
>>>> It would be nice to check if the allocation always occurs before use,
>>>> and if yes on how much bytes.
>>>>
>>>> If the pointer is not dangling, then perhaps something else smashes
>>>> that pointer.
>>>
>>>
>>> Konrad
>>>
>>>>
>>>>> Do note this somehow worked fine on 5.11 and then broke on 5.12/13. I went as far as replacing most
>>>>>
>>>>> of the kernel with the updated/downgraded parts via git checkout (i tried many combinations),
>>>>>
>>>>> to no avail.. I even tried different compilers and optimization levels, thinking it could have been
>>>>>
>>>>> a codegen issue, but no luck either.
>>>>>
>>>>>
>>>>> I.. do understand this email is a total mess to read, as much as it was to write, but
>>>>>
>>>>> without access to my code and the machine itself I can't give you solid details, and
>>>>>
>>>>> the fact this situation is far from ordinary doesn't help either..
>>>>>
>>>>>
>>>>> The latest (ancient, not quite pretty, but probably working if my memory is correct) version of my patches
>>>>>
>>>>> for the mdm9607 is available at [1], I will push the new revision after I get access to the workstation.
>>>>>
>>   + few more who have access to the board.
>>
>>    Going by the description, for kernel corruption, we can try out a KASAN build.
>>    Since you have mentioned it worked till 5.11, you bisected the driver till 5.11 head and it worked ?
>>
>    Tried running a KASAN enabled image on IPQ board, but no luck. Nothing came out.
>    Only if someone with the board can help here, we can proceed
> 
> 
> Regards,
>   Sricharan
> 
I have the board with me again. Please tell me where we should start :)

Konrad

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-02-08 16:45                             ` Konrad Dybcio
@ 2022-02-24  7:33                               ` Sricharan Ramabadhran
  -1 siblings, 0 replies; 42+ messages in thread
From: Sricharan Ramabadhran @ 2022-02-24  7:33 UTC (permalink / raw)
  To: Konrad Dybcio, Miquel Raynal
  Cc: Manivannan Sadhasivam, pragalla, ~postmarketos/upstreaming,
	martin.botka, angelogioacchino.delregno, marijn.suijten,
	jamipkettunen, Richard Weinberger, Vignesh Raghavendra,
	linux-mtd, linux-arm-msm, linux-kernel, mdalam, bbhatt, hemantk

Hi Konrad,

On 2/8/2022 10:15 PM, Konrad Dybcio wrote:
>
> On 4.02.2022 18:17, Sricharan Ramabadhran wrote:
>> On 2/2/2022 12:54 PM, Sricharan Ramabadhran wrote:
>>> Hi Konrad/Miquel,
>>>
>>> On 2/1/2022 9:21 PM, Konrad Dybcio wrote:
>>>> On 01/02/2022 14:52, Miquel Raynal wrote:
>>>>> Hi Konrad,
>>>>>
>>>>> konrad.dybcio@somainline.org wrote on Mon, 31 Jan 2022 20:54:12 +0100:
>>>>>
>>>>>> On 31/01/2022 15:13, Sricharan Ramabadhran wrote:
>>>>>>> Hi Konrad,
>>>>>>>
>>>>>>> On 1/31/2022 3:39 PM, Konrad Dybcio wrote:
>>>>>>>> On 28/01/2022 18:50, Sricharan Ramabadhran wrote:
>>>>>>>>> Hi Konrad,
>>>>>>>>>
>>>>>>>>> On 1/28/2022 9:55 AM, Sricharan Ramabadhran wrote:
>>>>>>>>>> Hi Miquel,
>>>>>>>>>>
>>>>>>>>>> On 1/26/2022 4:12 PM, Miquel Raynal wrote:
>>>>>>>>>>> Hi Mani,
>>>>>>>>>>>
>>>>>>>>>>> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
>>>>>>>>>>>> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
>>>>>>>>>>>>>> Hi Konrad,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:
>>>>>>>>>>>>>>> While I have absolutely 0 idea why and how, running clear_bam_transaction
>>>>>>>>>>>>>>> when READID is issued makes the DMA totally clog up and refuse to function
>>>>>>>>>>>>>>> at all on mdm9607. In fact, it is so bad that all the data gets garbled
>>>>>>>>>>>>>>> and after a short while in the nand probe flow, the CPU decides that
>>>>>>>>>>>>>>> sepuku is the only option.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Removing _READID from the if condition makes it work like a charm, I can
>>>>>>>>>>>>>>> read data and mount partitions without a problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>> This is totally just an observation which took me an inhumane amount of
>>>>>>>>>>>>>>> debug prints to find.. perhaps there's a better reason behind this, but
>>>>>>>>>>>>>>> I can't seem to find any answers.. Therefore, this is a BIG RFC!
>>>>>>>>>>>>>> I'm adding two people from codeaurora who worked a lot on this driver.
>>>>>>>>>>>>>> Hopefully they will have an idea :)
>>>>>>>>>>>>> Sadre, I've spent a significant amount of time reviewing your patches,
>>>>>>>>>>>>> now it's your turn to not take a month to answer to your peers
>>>>>>>>>>>>> proposals.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please help reviewing this patch.
>>>>>>>>>>>> Sorry. I was hoping that Qcom folks would chime in as I don't >>>>>> have any idea
>>>>>>>>>>>> about the mdm9607 platform. It could be that the mail server >>>>>> migration from
>>>>>>>>>>>> codeaurora to quicinc put a barrier here.
>>>>>>>>>>>>
>>>>>>>>>>>> Let me ping them internally.
>>>>>>>>>>> Oh, ok, I didn't know. Thanks!
>>>>>>>>>>      Sorry Miquel, somehow we did not get this email in our inbox.
>>>>>>>>>>      Thanks to Mani for pinging us, we will test this up today and >>>> get back.
>>>>>>>>>         While we could not reproduce this issue on our ipq boards (do >>> not have a mdm9607 right now) and
>>>>>>>>>          issue does not look any obvious.
>>>>>>>>>         can you please give the debug logs that you did for the above >>> stage by stage ?
>>>>>>>> I won't have access to the board for about two weeks, sorry.
>>>>>>>>
>>>>>>>> When I get to it, I'll surely try to send you the logs, though there
>>>>>>>>
>>>>>>>> wasn't much more than just something jumping to who-knows-where
>>>>>>>>
>>>>>>>> after clear_bam_transaction was called, resulting in values >> associated with
>>>>>>>>
>>>>>>>> the NAND being all zeroed out in pr_err/_debug/etc.
>>>>>>>>
>>>>>>>       Ok sure. So was the READID command itself failing (or) the > subsequent one ?
>>>>>>>      We can check which parameter reset by the clear_bam_transaction is > causing the
>>>>>>>      failure.  Meanwhile, looping in Pradeep who has access to the > board, so in a better
>>>>>>>      position to debug.
>>>>>> I'm sorry I have so few details on hand, and no kernel tree (no access to that machine either, for now).
>>>>>>
>>>>>>
>>>>>> I will try to describe to the best of my abilities what I recall.
>>>>>>
>>>>>>
>>>>>> My methodology of making sure things don't go haywire was to print the oob size
>>>>>>
>>>>>> of our NAND basically every two lines of code (yes, i was very desperate at one point),
>>>>>>
>>>>>> as that was zeroed out when *the bug* happened,
>>>>> This does look like a pointer error at some point and some kernel data
>>>>> has been corrupted very badly by the driver.
>>>>>
>>>>>> leading to a kernel bug/panic/stall
>>>>>>
>>>>>> (can't recall what exactly it was, but it said something along the lines of "no support for
>>>>>>
>>>>>> oob size 0" and then it didn't fail graceully, leading to some bad jumps and ultimately
>>>>>>
>>>>>> a dead platform..)
>>>>>>
>>>>>>
>>>>>> after hours of digging, I found out that everything goes fine until clear_bam_transaction is called,
>>>>> Do you remember if this function was called for the first time when
>>>>> this happened?
>>>> I think so, if I recall correctly there are no more callers in this path, as readid is the first nand command executed in flash probe flow.
>>>>
>>>>
>>>>
>>>>>> after that gets executed every nand op starts reading all zeroes (for example in JEDEC ID check)
>>>>>>
>>>>>> so I added the changes from this patch, and things magically started working... My suspicion is
>>>>>>
>>>>>> that the underlying FIFO isn't fully drained (is it a FIFO on 9607? bah, i work on too many socs at once)
>>>>> I don't see it in the list of supported devices, what's the exact
>>>>> compatible used?
>>>> qcom,ipq4019-nand
>>>>
>>>>
>>>>
>>>>>> and this function only makes Linux think it is, without actually draining it, and the leftover
>>>>>>
>>>>>> commands get executed with some parts of them getting overwritten, resulting in the
>>>>>>
>>>>>> famous garbage in - garbage out situation, but that's only a guesstimate..
>>>>> I would bet for a non allocated bam-ish pointer that is reset to zero
>>>>> in the clear_bam_transaction() helper.
>>>>>
>>>>> Can you get your hands on the board again?
>>>> Sure, but as I mentioned previously, only in about 2 weeks, I can't really do any dev before then.. :(
>>>>
>>>>
>>>>
>>>>> It would be nice to check if the allocation always occurs before use,
>>>>> and if yes on how much bytes.
>>>>>
>>>>> If the pointer is not dangling, then perhaps something else smashes
>>>>> that pointer.
>>>>
>>>> Konrad
>>>>
>>>>>> Do note this somehow worked fine on 5.11 and then broke on 5.12/13. I went as far as replacing most
>>>>>>
>>>>>> of the kernel with the updated/downgraded parts via git checkout (i tried many combinations),
>>>>>>
>>>>>> to no avail.. I even tried different compilers and optimization levels, thinking it could have been
>>>>>>
>>>>>> a codegen issue, but no luck either.
>>>>>>
>>>>>>
>>>>>> I.. do understand this email is a total mess to read, as much as it was to write, but
>>>>>>
>>>>>> without access to my code and the machine itself I can't give you solid details, and
>>>>>>
>>>>>> the fact this situation is far from ordinary doesn't help either..
>>>>>>
>>>>>>
>>>>>> The latest (ancient, not quite pretty, but probably working if my memory is correct) version of my patches
>>>>>>
>>>>>> for the mdm9607 is available at [1], I will push the new revision after I get access to the workstation.
>>>>>>
>>>    + few more who have access to the board.
>>>
>>>     Going by the description, for kernel corruption, we can try out a KASAN build.
>>>     Since you have mentioned it worked till 5.11, you bisected the driver till 5.11 head and it worked ?
>>>
>>     Tried running a KASAN enabled image on IPQ board, but no luck. Nothing came out.
>>     Only if someone with the board can help here, we can proceed
>>
>>
>> Regards,
>>    Sricharan
>>
> I have the board with me again. Please tell me where do we start :)

Sorry for the delayed response.

As a first step, can you enable KASAN and check whether you get any
warnings?

Then, can you check which specific parameter reset inside
clear_bam_transaction is causing the issue?
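
One way to narrow down the second question is to make each reset step
individually skippable and then try the skip masks one at a time. The
following is only a minimal user-space sketch of that idea: the struct,
its field names, the STEP_* bits and clear_txn_masked() are hypothetical
stand-ins chosen for illustration, not the driver's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-transaction state. */
struct txn_model {
	unsigned int ce_pos;
	unsigned int cmd_sgl_pos;
	unsigned int data_sgl_pos;
	void *last_data_desc;
	bool wait_second_completion;
};

/* One bit per reset step so that each step can be skipped on its own. */
enum reset_step {
	STEP_CE_POS       = 1 << 0,
	STEP_CMD_SGL_POS  = 1 << 1,
	STEP_DATA_SGL_POS = 1 << 2,
	STEP_LAST_DESC    = 1 << 3,
	STEP_WAIT_SECOND  = 1 << 4,
	STEP_ALL          = (1 << 5) - 1,
};

/*
 * Reset every field whose bit is NOT set in skip_mask.
 * Returns the mask of the steps that were actually performed,
 * which is handy to print next to the observed behaviour.
 */
static unsigned int clear_txn_masked(struct txn_model *t,
				     unsigned int skip_mask)
{
	unsigned int done = 0;

	assert(t != NULL);

	if (!(skip_mask & STEP_CE_POS)) {
		t->ce_pos = 0;
		done |= STEP_CE_POS;
	}
	if (!(skip_mask & STEP_CMD_SGL_POS)) {
		t->cmd_sgl_pos = 0;
		done |= STEP_CMD_SGL_POS;
	}
	if (!(skip_mask & STEP_DATA_SGL_POS)) {
		t->data_sgl_pos = 0;
		done |= STEP_DATA_SGL_POS;
	}
	if (!(skip_mask & STEP_LAST_DESC)) {
		t->last_data_desc = NULL;
		done |= STEP_LAST_DESC;
	}
	if (!(skip_mask & STEP_WAIT_SECOND)) {
		t->wait_second_completion = false;
		done |= STEP_WAIT_SECOND;
	}

	return done;
}
```

Booting once per single-bit skip mask, and noting which boot survives
the probe, would point at the reset step worth looking at more closely.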


Regards,
   Sricharan



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-02-24  7:33                               ` Sricharan Ramabadhran
@ 2022-03-11 21:22                                 ` Konrad Dybcio
  -1 siblings, 0 replies; 42+ messages in thread
From: Konrad Dybcio @ 2022-03-11 21:22 UTC (permalink / raw)
  To: Sricharan Ramabadhran, Miquel Raynal
  Cc: Manivannan Sadhasivam, pragalla, ~postmarketos/upstreaming,
	martin.botka, angelogioacchino.delregno, marijn.suijten,
	jamipkettunen, Richard Weinberger, Vignesh Raghavendra,
	linux-mtd, linux-arm-msm, linux-kernel, mdalam, bbhatt, hemantk



On 24.02.2022 08:33, Sricharan Ramabadhran wrote:
> Hi Konrad,
> 
> On 2/8/2022 10:15 PM, Konrad Dybcio wrote:
>>
>> On 4.02.2022 18:17, Sricharan Ramabadhran wrote:
>>> On 2/2/2022 12:54 PM, Sricharan Ramabadhran wrote:
>>>> Hi Konrad/Miquel,
>>>>
>>>> On 2/1/2022 9:21 PM, Konrad Dybcio wrote:
>>>>> On 01/02/2022 14:52, Miquel Raynal wrote:
>>>>>> Hi Konrad,
>>>>>>
>>>>>> konrad.dybcio@somainline.org wrote on Mon, 31 Jan 2022 20:54:12 +0100:
>>>>>>
>>>>>>> On 31/01/2022 15:13, Sricharan Ramabadhran wrote:
>>>>>>>> Hi Konrad,
>>>>>>>>
>>>>>>>> On 1/31/2022 3:39 PM, Konrad Dybcio wrote:
>>>>>>>>> On 28/01/2022 18:50, Sricharan Ramabadhran wrote:
>>>>>>>>>> Hi Konrad,
>>>>>>>>>>
>>>>>>>>>> On 1/28/2022 9:55 AM, Sricharan Ramabadhran wrote:
>>>>>>>>>>> Hi Miquel,
>>>>>>>>>>>
>>>>>>>>>>> On 1/26/2022 4:12 PM, Miquel Raynal wrote:
>>>>>>>>>>>> Hi Mani,
>>>>>>>>>>>>
>>>>>>>>>>>> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
>>>>>>>>>>>>> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
>>>>>>>>>>>>>>> Hi Konrad,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:
>>>>>>>>>>>>>>>> While I have absolutely 0 idea why and how, running clear_bam_transaction
>>>>>>>>>>>>>>>> when READID is issued makes the DMA totally clog up and refuse to function
>>>>>>>>>>>>>>>> at all on mdm9607. In fact, it is so bad that all the data gets garbled
>>>>>>>>>>>>>>>> and after a short while in the nand probe flow, the CPU decides that
>>>>>>>>>>>>>>>> sepuku is the only option.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Removing _READID from the if condition makes it work like a charm, I can
>>>>>>>>>>>>>>>> read data and mount partitions without a problem.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>> This is totally just an observation which took me an inhumane amount of
>>>>>>>>>>>>>>>> debug prints to find.. perhaps there's a better reason behind this, but
>>>>>>>>>>>>>>>> I can't seem to find any answers.. Therefore, this is a BIG RFC!
>>>>>>>>>>>>>>> I'm adding two people from codeaurora who worked a lot on this driver.
>>>>>>>>>>>>>>> Hopefully they will have an idea :)
>>>>>>>>>>>>>> Sadre, I've spent a significant amount of time reviewing your patches,
>>>>>>>>>>>>>> now it's your turn to not take a month to answer to your peers
>>>>>>>>>>>>>> proposals.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please help reviewing this patch.
>>>>>>>>>>>>> Sorry. I was hoping that Qcom folks would chime in as I don't have any idea
>>>>>>>>>>>>> about the mdm9607 platform. It could be that the mail server migration from
>>>>>>>>>>>>> codeaurora to quicinc put a barrier here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let me ping them internally.
>>>>>>>>>>>> Oh, ok, I didn't know. Thanks!
>>>>>>>>>>>      Sorry Miquel, somehow we did not get this email in our inbox.
>>>>>>>>>>>      Thanks to Mani for pinging us, we will test this up today and get back.
>>>>>>>>>>         While we could not reproduce this issue on our ipq boards (do not have a mdm9607 right now) and
>>>>>>>>>>          issue does not look any obvious.
>>>>>>>>>>         can you please give the debug logs that you did for the above stage by stage ?
>>>>>>>>> I won't have access to the board for about two weeks, sorry.
>>>>>>>>>
>>>>>>>>> When I get to it, I'll surely try to send you the logs, though there
>>>>>>>>> wasn't much more than just something jumping to who-knows-where
>>>>>>>>> after clear_bam_transaction was called, resulting in values associated with
>>>>>>>>> the NAND being all zeroed out in pr_err/_debug/etc.
>>>>>>>>>
>>>>>>>>       Ok sure. So was the READID command itself failing (or) the subsequent one ?
>>>>>>>>      We can check which parameter reset by the clear_bam_transaction is causing the
>>>>>>>>      failure.  Meanwhile, looping in Pradeep who has access to the board, so in a better
>>>>>>>>      position to debug.
>>>>>>> I'm sorry I have so few details on hand, and no kernel tree (no access to that machine either, for now).
>>>>>>>
>>>>>>>
>>>>>>> I will try to describe to the best of my abilities what I recall.
>>>>>>>
>>>>>>>
>>>>>>> My methodology of making sure things don't go haywire was to print the oob size
>>>>>>>
>>>>>>> of our NAND basically every two lines of code (yes, i was very desperate at one point),
>>>>>>>
>>>>>>> as that was zeroed out when *the bug* happened,
>>>>>> This does look like a pointer error at some point and some kernel data
>>>>>> has been corrupted very badly by the driver.
>>>>>>
>>>>>>> leading to a kernel bug/panic/stall
>>>>>>>
>>>>>>> (can't recall what exactly it was, but it said something along the lines of "no support for
>>>>>>>
>>>>>>> oob size 0" and then it didn't fail graceully, leading to some bad jumps and ultimately
>>>>>>>
>>>>>>> a dead platform..)
>>>>>>>
>>>>>>>
>>>>>>> after hours of digging, I found out that everything goes fine until clear_bam_transaction is called,
>>>>>> Do you remember if this function was called for the first time when
>>>>>> this happened?
>>>>> I think so, if I recall correctly there are no more callers in this path, as readid is the first nand command executed in flash probe flow.
>>>>>
>>>>>
>>>>>
>>>>>>> after that gets executed every nand op starts reading all zeroes (for example in JEDEC ID check)
>>>>>>>
>>>>>>> so I added the changes from this patch, and things magically started working... My suspicion is
>>>>>>>
>>>>>>> that the underlying FIFO isn't fully drained (is it a FIFO on 9607? bah, i work on too many socs at once)
>>>>>> I don't see it in the list of supported devices, what's the exact
>>>>>> compatible used?
>>>>> qcom,ipq4019-nand
>>>>>
>>>>>
>>>>>
>>>>>>> and this function only makes Linux think it is, without actually draining it, and the leftover
>>>>>>>
>>>>>>> commands get executed with some parts of them getting overwritten, resulting in the
>>>>>>>
>>>>>>> famous garbage in - garbage out situation, but that's only a guesstimate..
>>>>>> I would bet for a non allocated bam-ish pointer that is reset to zero
>>>>>> in the clear_bam_transaction() helper.
>>>>>>
>>>>>> Can you get your hands on the board again?
>>>>> Sure, but as I mentioned previously, only in about 2 weeks, I can't really do any dev before then.. :(
>>>>>
>>>>>
>>>>>
>>>>>> It would be nice to check if the allocation always occurs before use,
>>>>>> and if yes on how much bytes.
>>>>>>
>>>>>> If the pointer is not dangling, then perhaps something else smashes
>>>>>> that pointer.
>>>>>
>>>>> Konrad
>>>>>
>>>>>>> Do note this somehow worked fine on 5.11 and then broke on 5.12/13. I went as far as replacing most
>>>>>>>
>>>>>>> of the kernel with the updated/downgraded parts via git checkout (i tried many combinations),
>>>>>>>
>>>>>>> to no avail.. I even tried different compilers and optimization levels, thinking it could have been
>>>>>>>
>>>>>>> a codegen issue, but no luck either.
>>>>>>>
>>>>>>>
>>>>>>> I.. do understand this email is a total mess to read, as much as it was to write, but
>>>>>>>
>>>>>>> without access to my code and the machine itself I can't give you solid details, and
>>>>>>>
>>>>>>> the fact this situation is far from ordinary doesn't help either..
>>>>>>>
>>>>>>>
>>>>>>> The latest (ancient, not quite pretty, but probably working if my memory is correct) version of my patches
>>>>>>>
>>>>>>> for the mdm9607 is available at [1], I will push the new revision after I get access to the workstation.
>>>>>>>
>>>>    + few more who have access to the board.
>>>>
>>>>     Going by the description, for kernel corruption, we can try out a KASAN build.
>>>>     Since you have mentioned it worked till 5.11, you bisected the driver till 5.11 head and it worked ?
>>>>
>>>     Tried running a KASAN enabled image on IPQ board, but no luck. Nothing came out.
>>>     Only if someone with the board can help here, we can proceed
>>>
>>>
>>> Regards,
>>>    Sricharan
>>>
>> I have the board with me again. Please tell me where do we start :)
> 
>  Sorry for the delayed response.
[Looks at the calendar] What can I say... lots of things happened :)


> 
>      As a first step, Can you enable KASAN and check if you get any warnings ?
> 
>      Then, can you check inside clear_bam_transaction, which parameter resetting specifically is causing the issue ?
> 
I have 3 logs for you:

[1] is KASAN=y, with this patch
[2] is KASAN=y, WITHOUT this patch (should die, but doesn't - does KASAN prevent it from doing something stupid?)
[3] is KASAN=n, WITHOUT this patch (dies as expected)

Looks like there's a lot happening..


Konrad
> 
> Regards,
>   Sricharan
> 
> 

[1] https://paste.debian.net/1233873/
[2] https://paste.debian.net/1233874/
[3] https://paste.debian.net/1233878/

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
@ 2022-03-11 21:22                                 ` Konrad Dybcio
  0 siblings, 0 replies; 42+ messages in thread
From: Konrad Dybcio @ 2022-03-11 21:22 UTC (permalink / raw)
  To: Sricharan Ramabadhran, Miquel Raynal
  Cc: Manivannan Sadhasivam, pragalla, ~postmarketos/upstreaming,
	martin.botka, angelogioacchino.delregno, marijn.suijten,
	jamipkettunen, Richard Weinberger, Vignesh Raghavendra,
	linux-mtd, linux-arm-msm, linux-kernel, mdalam, bbhatt, hemantk



On 24.02.2022 08:33, Sricharan Ramabadhran wrote:
> Hi Konrad,
> 
> On 2/8/2022 10:15 PM, Konrad Dybcio wrote:
>>
>> On 4.02.2022 18:17, Sricharan Ramabadhran wrote:
>>> On 2/2/2022 12:54 PM, Sricharan Ramabadhran wrote:
>>>> Hi Konrad/Miquel,
>>>>
>>>> On 2/1/2022 9:21 PM, Konrad Dybcio wrote:
>>>>> On 01/02/2022 14:52, Miquel Raynal wrote:
>>>>>> Hi Konrad,
>>>>>>
>>>>>> konrad.dybcio@somainline.org wrote on Mon, 31 Jan 2022 20:54:12 +0100:
>>>>>>
>>>>>>> On 31/01/2022 15:13, Sricharan Ramabadhran wrote:
>>>>>>>> Hi Konrad,
>>>>>>>>
>>>>>>>> On 1/31/2022 3:39 PM, Konrad Dybcio wrote:
>>>>>>>>> On 28/01/2022 18:50, Sricharan Ramabadhran wrote:
>>>>>>>>>> Hi Konrad,
>>>>>>>>>>
>>>>>>>>>> On 1/28/2022 9:55 AM, Sricharan Ramabadhran wrote:
>>>>>>>>>>> Hi Miquel,
>>>>>>>>>>>
>>>>>>>>>>> On 1/26/2022 4:12 PM, Miquel Raynal wrote:
>>>>>>>>>>>> Hi Mani,
>>>>>>>>>>>>
>>>>>>>>>>>> mani@kernel.org wrote on Wed, 26 Jan 2022 16:03:16 +0530:
>>>>>>>>>>>>> On Wed, Jan 26, 2022 at 11:16:13AM +0100, Miquel Raynal wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> miquel.raynal@bootlin.com wrote on Fri, 14 Jan 2022 08:27:18 +0100:
>>>>>>>>>>>>>>> Hi Konrad,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> konrad.dybcio@somainline.org wrote on Thu, 13 Jan 2022 19:44:26 +0100:
>>>>>>>>>>>>>>>> While I have absolutely 0 idea why and how, running clear_bam_transaction
>>>>>>>>>>>>>>>> when READID is issued makes the DMA totally clog up and refuse to function
>>>>>>>>>>>>>>>> at all on mdm9607. In fact, it is so bad that all the data gets garbled
>>>>>>>>>>>>>>>> and after a short while in the nand probe flow, the CPU decides that
>>>>>>>>>>>>>>>> sepuku is the only option.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Removing _READID from the if condition makes it work like a charm, I can
>>>>>>>>>>>>>>>> read data and mount partitions without a problem.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>> This is totally just an observation which took me an inhumane amount of
>>>>>>>>>>>>>>>> debug prints to find.. perhaps there's a better reason behind this, but
>>>>>>>>>>>>>>>> I can't seem to find any answers.. Therefore, this is a BIG RFC!
>>>>>>>>>>>>>>> I'm adding two people from codeaurora who worked a lot on this driver.
>>>>>>>>>>>>>>> Hopefully they will have an idea :)
>>>>>>>>>>>>>> Sadre, I've spent a significant amount of time reviewing your patches,
>>>>>>>>>>>>>> now it's your turn to not take a month to answer to your peers
>>>>>>>>>>>>>> proposals.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please help reviewing this patch.
>>>>>>>>>>>>> Sorry. I was hoping that Qcom folks would chime in as I don't have any idea
>>>>>>>>>>>>> about the mdm9607 platform. It could be that the mail server migration from
>>>>>>>>>>>>> codeaurora to quicinc put a barrier here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let me ping them internally.
>>>>>>>>>>>> Oh, ok, I didn't know. Thanks!
>>>>>>>>>>>      Sorry Miquel, somehow we did not get this email in our inbox.
>>>>>>>>>>>      Thanks to Mani for pinging us, we will test this up today and get back.
>>>>>>>>>>         While we could not reproduce this issue on our ipq boards (do not have a mdm9607 right now) and
>>>>>>>>>>          issue does not look any obvious.
>>>>>>>>>>         can you please give the debug logs that you did for the above stage by stage ?
>>>>>>>>> I won't have access to the board for about two weeks, sorry.
>>>>>>>>>
>>>>>>>>> When I get to it, I'll surely try to send you the logs, though there
>>>>>>>>> wasn't much more than just something jumping to who-knows-where
>>>>>>>>> after clear_bam_transaction was called, resulting in values associated with
>>>>>>>>> the NAND being all zeroed out in pr_err/_debug/etc.
>>>>>>>>>
>>>>>>>>       Ok sure. So was the READID command itself failing (or) the subsequent one ?
>>>>>>>>      We can check which parameter reset by the clear_bam_transaction is causing the
>>>>>>>>      failure.  Meanwhile, looping in Pradeep who has access to the board, so in a better
>>>>>>>>      position to debug.
>>>>>>> I'm sorry I have so few details on hand, and no kernel tree (no access to that machine either, for now).
>>>>>>>
>>>>>>>
>>>>>>> I will try to describe to the best of my abilities what I recall.
>>>>>>>
>>>>>>>
>>>>>>> My methodology of making sure things don't go haywire was to print the oob size
>>>>>>>
>>>>>>> of our NAND basically every two lines of code (yes, i was very desperate at one point),
>>>>>>>
>>>>>>> as that was zeroed out when *the bug* happened,
>>>>>> This does look like a pointer error at some point and some kernel data
>>>>>> has been corrupted very badly by the driver.
>>>>>>
>>>>>>> leading to a kernel bug/panic/stall
>>>>>>>
>>>>>>> (can't recall what exactly it was, but it said something along the lines of "no support for
>>>>>>>
>>>>>>> oob size 0" and then it didn't fail graceully, leading to some bad jumps and ultimately
>>>>>>>
>>>>>>> a dead platform..)
>>>>>>>
>>>>>>>
>>>>>>> after hours of digging, I found out that everything goes fine until clear_bam_transaction is called,
>>>>>> Do you remember if this function was called for the first time when
>>>>>> this happened?
>>>>> I think so, if I recall correctly there are no more callers in this path, as readid is the first nand command executed in flash probe flow.
>>>>>
>>>>>
>>>>>
>>>>>>> after that gets executed every nand op starts reading all zeroes (for example in JEDEC ID check)
>>>>>>>
>>>>>>> so I added the changes from this patch, and things magically started working... My suspicion is
>>>>>>>
>>>>>>> that the underlying FIFO isn't fully drained (is it a FIFO on 9607? bah, i work on too many socs at once)
>>>>>> I don't see it in the list of supported devices, what's the exact
>>>>>> compatible used?
>>>>> qcom,ipq4019-nand
>>>>>
>>>>>
>>>>>
>>>>>>> and this function only makes Linux think it is, without actually draining it, and the leftover
>>>>>>> commands get executed with some parts of them getting overwritten, resulting in the
>>>>>>> famous garbage in - garbage out situation, but that's only a guesstimate..
>>>>>> I would bet for a non allocated bam-ish pointer that is reset to zero
>>>>>> in the clear_bam_transaction() helper.
>>>>>>
>>>>>> Can you get your hands on the board again?
>>>>> Sure, but as I mentioned previously, only in about 2 weeks, I can't really do any dev before then.. :(
>>>>>
>>>>>
>>>>>
>>>>>> It would be nice to check whether the allocation always occurs before use,
>>>>>> and if so, how many bytes are allocated.
>>>>>>
>>>>>> If the pointer is not dangling, then perhaps something else smashes
>>>>>> that pointer.
>>>>>
>>>>> Konrad
>>>>>
>>>>>>> Do note this somehow worked fine on 5.11 and then broke on 5.12/13. I went as far as replacing most
>>>>>>> of the kernel with the updated/downgraded parts via git checkout (I tried many combinations),
>>>>>>> to no avail.. I even tried different compilers and optimization levels, thinking it could have been
>>>>>>> a codegen issue, but no luck either.
>>>>>>>
>>>>>>>
>>>>>>> I.. do understand this email is a total mess to read, as much as it was to write, but
>>>>>>> without access to my code and the machine itself I can't give you solid details, and
>>>>>>> the fact this situation is far from ordinary doesn't help either..
>>>>>>>
>>>>>>>
>>>>>>> The latest (ancient, not quite pretty, but probably working if my memory is correct) version of my patches
>>>>>>> for the mdm9607 is available at [1], I will push the new revision after I get access to the workstation.
>>>>>>>
>>>>    + few more who have access to the board.
>>>>
>>>>     Going by the description, for kernel corruption, we can try out a KASAN build.
>>>>     Since you mentioned it worked until 5.11, did you bisect the driver up to the 5.11 head and confirm it worked?
>>>>
>>>     Tried running a KASAN-enabled image on an IPQ board, but no luck. Nothing came out.
>>>     We can proceed only if someone with the board can help here.
>>>
>>>
>>> Regards,
>>>    Sricharan
>>>
>> I have the board with me again. Please tell me where do we start :)
> 
>  Sorry for the delayed response.
[Looks at the calendar] What can I say... lots of things happened :)


> 
>      As a first step, can you enable KASAN and check if you get any warnings?
> 
>      Then, can you check inside clear_bam_transaction which specific parameter reset is causing the issue?
> 
I have 3 logs for you:

[1] is KASAN=y, with this patch
[2] is KASAN=y, WITHOUT this patch (should die, but doesn't - does KASAN prevent it from doing something stupid?)
[3] is KASAN=n, WITHOUT this patch (dies as expected)

Looks like there's a lot happening..


Konrad
> 
> Regards,
>   Sricharan
> 
> 

[1] https://paste.debian.net/1233873/
[2] https://paste.debian.net/1233874/
[3] https://paste.debian.net/1233878/


* Re: [PATCH] mtd: nand: raw: qcom_nandc: Don't clear_bam_transaction on READID
  2022-03-11 21:22                                 ` Konrad Dybcio
@ 2022-04-08 13:29                                   ` Manivannan Sadhasivam
  -1 siblings, 0 replies; 42+ messages in thread
From: Manivannan Sadhasivam @ 2022-04-08 13:29 UTC (permalink / raw)
  To: Konrad Dybcio
  Cc: Sricharan Ramabadhran, Miquel Raynal, Manivannan Sadhasivam,
	pragalla, ~postmarketos/upstreaming, martin.botka,
	angelogioacchino.delregno, marijn.suijten, jamipkettunen,
	Richard Weinberger, Vignesh Raghavendra, linux-mtd,
	linux-arm-msm, linux-kernel, mdalam, bbhatt, hemantk

Hi,

On Fri, Mar 11, 2022 at 10:22:51PM +0100, Konrad Dybcio wrote:
> 
>

[...]
 
> I have 3 logs for you:
> 
> [1] is KASAN=y, with this patch
> [2] is KASAN=y, WITHOUT this patch (should die, but doesn't - does KASAN prevent it from doing something stupid?)
> [3] is KASAN=n, WITHOUT this patch (dies as expected)
> 

We reproduced the same issue on the SDX65-MTP board, and your hack worked :)
Since this board is available inside Qcom, Sadre and Sricharan should now be
able to investigate it properly.

Thanks,
Mani

> Looks like there's a lot happening..
> 
> 
> Konrad
> > 
> > Regards,
> >   Sricharan
> > 
> > 
> 
> [1] https://paste.debian.net/1233873/
> [2] https://paste.debian.net/1233874/
> [3] https://paste.debian.net/1233878/


