All of lore.kernel.org
 help / color / mirror / Atom feed
From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Manivannan Sadhasivam <mani@kernel.org>
Cc: Md Sadre Alam <quic_mdalam@quicinc.com>,
	richard@nod.at, vigneshr@ti.com, linux-mtd@lists.infradead.org,
	linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
	konrad.dybcio@somainline.org, quic_srichara@quicinc.com
Subject: Re: [PATCH] mtd: rawnand: qcom: fix memory corruption that causes panic
Date: Thu, 14 Apr 2022 16:59:09 +0200	[thread overview]
Message-ID: <20220414165909.249c2325@xps13> (raw)
In-Reply-To: <20220414143907.GA20493@thinkpad>

Hi Manivannan,

mani@kernel.org wrote on Thu, 14 Apr 2022 20:09:07 +0530:

> On Thu, Apr 14, 2022 at 02:42:36PM +0200, Miquel Raynal wrote:
> > Hi Md,
> > 
> > quic_mdalam@quicinc.com wrote on Thu, 14 Apr 2022 17:50:48 +0530:
> >   
> > > > Hi Md,
> > > >
> > > > quic_mdalam@quicinc.com wrote on Thu, 14 Apr 2022 11:09:33 +0530:
> > > >    
> > > >> This patch fixes a memory corruption that occurred in the
> > > >> nand_scan() path for Hynix nand device.
> > > >>
> > > >> On boot, for Hynix nand device will panic at a weird place:
> > > >> | Unable to handle kernel NULL pointer dereference at virtual
> > > >>    address 00000070
> > > >> | [00000070] *pgd=00000000
> > > >> | Internal error: Oops: 5 [#1] PREEMPT SMP ARM Modules linked in:
> > > >> | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-01473-g13ae1769cfb0
> > > >>    #38
> > > >> | Hardware name: Generic DT based system PC is at
> > > >> | nandc_set_reg+0x8/0x1c LR is at qcom_nandc_command+0x20c/0x5d0
> > > >> | pc : [<c088b74c>]    lr : [<c088d9c8>]    psr: 00000113
> > > >> | sp : c14adc50  ip : c14ee208  fp : c0cc970c
> > > >> | r10: 000000a3  r9 : 00000000  r8 : 00000040
> > > >> | r7 : c16f6a00  r6 : 00000090  r5 : 00000004  r4 :c14ee040
> > > >> | r3 : 00000000  r2 : 0000000b  r1 : 00000000  r0 :c14ee040
> > > >> | Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM Segment none
> > > >> | Control: 10c5387d  Table: 8020406a  DAC: 00000051 Register r0
> > > >> | information: slab kmalloc-2k start c14ee000 pointer offset
> > > >>    64 size 2048
> > > >> | Process swapper/0 (pid: 1, stack limit = 0x(ptrval)) nandc_set_reg
> > > >> | from qcom_nandc_command+0x20c/0x5d0 qcom_nandc_command from
> > > >> | nand_readid_op+0x198/0x1e8 nand_readid_op from
> > > >> | hynix_nand_has_valid_jedecid+0x30/0x78
> > > >> | hynix_nand_has_valid_jedecid from hynix_nand_init+0xb8/0x454
> > > >> | hynix_nand_init from nand_scan_with_ids+0xa30/0x14a8
> > > >> | nand_scan_with_ids from qcom_nandc_probe+0x648/0x7b0
> > > >> | qcom_nandc_probe from platform_probe+0x58/0xac
> > > >>
> > > >> The problem is that the nand_scan()'s qcom_nand_attach_chip callback
> > > >> is updating the nandc->max_cwperpage from 1 to 4.This causes the
> > > >> sg_init_table of clear_bam_transaction() in the driver's
> > > >> qcom_nandc_command() to memset much more than what was initially
> > > >> allocated by alloc_bam_transaction().    
> > > > Thanks for investigating!
> > > >    
> > > >> This patch will update nandc->max_cwperpage 1 to 4 after nand_scan()
> > > >> returns, and remove updating nandc->max_cwperpage from
> > > >> qcom_nand_attach_chip call back.    
> > > > The fix does not look right, as far as I understand, this should be properly handled during the attach phase. That is where we have all information about the chip and do the configuration for this chip.
> > > >
> > > > If you update max_cwperpage there you should probably update other internal variables that depend on it as well.    
> > > 
> > >     Currently we are updating max_cwperpage  in qcom_nand_attach_chip(), but we are seeing issue for Hynix nand device since nand_scan_tail() is getting called after nand_attach() and in nand_attach() we are updating max_cwperpage to 4 or 8 based on page size.
> > > 
> > >      From nand_scan_tail() there is a call for nand_manufacturer_init() , specific to Hynix nand read_id is getting called that's why we are seeing this issue only for Hynix nand device. Read id sequence as below
> > > 
> > >     hynix_nand_has_valid_jedecid()
> > > 
> > >                  |
> > > 
> > >     nand_readid_op()
> > > 
> > >               |
> > > 
> > >   qcom_nandc_command()
> > > 
> > >              |
> > > 
> > > pre_command()
> > > 
> > >            |
> > > 
> > > clear_bam_transaction()   --> In this call we are doing sg_init_table() which is calling memset() based on max_cwperpage.Since initially we have allocated bam transaction as per max_cwperpage =1 and , since nand_chip_attach() updated max_cwperpage,  now we are doing memset as per max_cwperpage = 4 or 8.
> > > 
> > > 
> > > So anyway we have to updated max_cwperpage after nand_scan() call only.  Since there is no other dependency on max_cwperpage in nand_attach_chip() and we are using this in bam_alloc() and bam_clear().  
> > 
> > Why don't you update the sg table after increasing max_cwperpage?
> >   
> 
> Or we could move the bam reallocation inside qcom_nand_attach_chip() as below?

Much better approach, yes.

> 
> diff --git a/drivers/mtd/nand/raw/qcom_nandc.c b/drivers/mtd/nand/raw/qcom_nandc.c
> index 7c6efa3b6255..58c16054630f 100644
> --- a/drivers/mtd/nand/raw/qcom_nandc.c
> +++ b/drivers/mtd/nand/raw/qcom_nandc.c
> @@ -2653,9 +2653,23 @@ static int qcom_nand_attach_chip(struct nand_chip *chip)
>  
>         mtd_set_ooblayout(mtd, &qcom_nand_ooblayout_ops);
>  
> +       /* Free the initially allocated BAM transaction for reading the ONFI params */
> +       if (nandc->props->is_bam)
> +               free_bam_transaction(nandc);
> +
>         nandc->max_cwperpage = max_t(unsigned int, nandc->max_cwperpage,
>                                      cwperpage);
>  
> +       /* Now allocate the BAM transaction based on updated max_cwperpage */
> +       if (nandc->props->is_bam) {
> +               nandc->bam_txn = alloc_bam_transaction(nandc);
> +               if (!nandc->bam_txn) {
> +                       dev_err(nandc->dev,
> +                               "failed to allocate bam transaction\n");
> +                       return -ENOMEM;
> +               }
> +       }
> +
>         /*
>          * DATA_UD_BYTES varies based on whether the read/write command protects
>          * spare data with ECC too. We protect spare data by default, so we set
> @@ -2956,17 +2970,6 @@ static int qcom_nand_host_init_and_register(struct qcom_nand_controller *nandc,
>         if (ret)
>                 return ret;
>  
> -       if (nandc->props->is_bam) {
> -               free_bam_transaction(nandc);
> -               nandc->bam_txn = alloc_bam_transaction(nandc);
> -               if (!nandc->bam_txn) {
> -                       dev_err(nandc->dev,
> -                               "failed to allocate bam transaction\n");
> -                       nand_cleanup(chip);
> -                       return -ENOMEM;
> -               }
> -       }
> -
>         ret = mtd_device_parse_register(mtd, probes, NULL, NULL, 0);
>         if (ret)
>                 nand_cleanup(chip);
> 
> Thanks,
> Mani


Thanks,
Miquèl

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

WARNING: multiple messages have this Message-ID (diff)
From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Manivannan Sadhasivam <mani@kernel.org>
Cc: Md Sadre Alam <quic_mdalam@quicinc.com>,
	richard@nod.at, vigneshr@ti.com, linux-mtd@lists.infradead.org,
	linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
	konrad.dybcio@somainline.org, quic_srichara@quicinc.com
Subject: Re: [PATCH] mtd: rawnand: qcom: fix memory corruption that causes panic
Date: Thu, 14 Apr 2022 16:59:09 +0200	[thread overview]
Message-ID: <20220414165909.249c2325@xps13> (raw)
In-Reply-To: <20220414143907.GA20493@thinkpad>

Hi Manivannan,

mani@kernel.org wrote on Thu, 14 Apr 2022 20:09:07 +0530:

> On Thu, Apr 14, 2022 at 02:42:36PM +0200, Miquel Raynal wrote:
> > Hi Md,
> > 
> > quic_mdalam@quicinc.com wrote on Thu, 14 Apr 2022 17:50:48 +0530:
> >   
> > > > Hi Md,
> > > >
> > > > quic_mdalam@quicinc.com wrote on Thu, 14 Apr 2022 11:09:33 +0530:
> > > >    
> > > >> This patch fixes a memory corruption that occurred in the
> > > >> nand_scan() path for Hynix nand device.
> > > >>
> > > >> On boot, for Hynix nand device will panic at a weird place:
> > > >> | Unable to handle kernel NULL pointer dereference at virtual
> > > >>    address 00000070
> > > >> | [00000070] *pgd=00000000
> > > >> | Internal error: Oops: 5 [#1] PREEMPT SMP ARM Modules linked in:
> > > >> | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-01473-g13ae1769cfb0
> > > >>    #38
> > > >> | Hardware name: Generic DT based system PC is at
> > > >> | nandc_set_reg+0x8/0x1c LR is at qcom_nandc_command+0x20c/0x5d0
> > > >> | pc : [<c088b74c>]    lr : [<c088d9c8>]    psr: 00000113
> > > >> | sp : c14adc50  ip : c14ee208  fp : c0cc970c
> > > >> | r10: 000000a3  r9 : 00000000  r8 : 00000040
> > > >> | r7 : c16f6a00  r6 : 00000090  r5 : 00000004  r4 :c14ee040
> > > >> | r3 : 00000000  r2 : 0000000b  r1 : 00000000  r0 :c14ee040
> > > >> | Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM Segment none
> > > >> | Control: 10c5387d  Table: 8020406a  DAC: 00000051 Register r0
> > > >> | information: slab kmalloc-2k start c14ee000 pointer offset
> > > >>    64 size 2048
> > > >> | Process swapper/0 (pid: 1, stack limit = 0x(ptrval)) nandc_set_reg
> > > >> | from qcom_nandc_command+0x20c/0x5d0 qcom_nandc_command from
> > > >> | nand_readid_op+0x198/0x1e8 nand_readid_op from
> > > >> | hynix_nand_has_valid_jedecid+0x30/0x78
> > > >> | hynix_nand_has_valid_jedecid from hynix_nand_init+0xb8/0x454
> > > >> | hynix_nand_init from nand_scan_with_ids+0xa30/0x14a8
> > > >> | nand_scan_with_ids from qcom_nandc_probe+0x648/0x7b0
> > > >> | qcom_nandc_probe from platform_probe+0x58/0xac
> > > >>
> > > >> The problem is that the nand_scan()'s qcom_nand_attach_chip callback
> > > >> is updating the nandc->max_cwperpage from 1 to 4.This causes the
> > > >> sg_init_table of clear_bam_transaction() in the driver's
> > > >> qcom_nandc_command() to memset much more than what was initially
> > > >> allocated by alloc_bam_transaction().    
> > > > Thanks for investigating!
> > > >    
> > > >> This patch will update nandc->max_cwperpage 1 to 4 after nand_scan()
> > > >> returns, and remove updating nandc->max_cwperpage from
> > > >> qcom_nand_attach_chip call back.    
> > > > The fix does not look right, as far as I understand, this should be properly handled during the attach phase. That is where we have all information about the chip and do the configuration for this chip.
> > > >
> > > > If you update max_cwperpage there you should probably update other internal variables that depend on it as well.    
> > > 
> > >     Currently we are updating max_cwperpage  in qcom_nand_attach_chip(), but we are seeing issue for Hynix nand device since nand_scan_tail() is getting called after nand_attach() and in nand_attach() we are updating max_cwperpage to 4 or 8 based on page size.
> > > 
> > >      From nand_scan_tail() there is a call for nand_manufacturer_init() , specific to Hynix nand read_id is getting called that's why we are seeing this issue only for Hynix nand device. Read id sequence as below
> > > 
> > >     hynix_nand_has_valid_jedecid()
> > > 
> > >                  |
> > > 
> > >     nand_readid_op()
> > > 
> > >               |
> > > 
> > >   qcom_nandc_command()
> > > 
> > >              |
> > > 
> > > pre_command()
> > > 
> > >            |
> > > 
> > > clear_bam_transaction()   --> In this call we are doing sg_init_table() which is calling memset() based on max_cwperpage.Since initially we have allocated bam transaction as per max_cwperpage =1 and , since nand_chip_attach() updated max_cwperpage,  now we are doing memset as per max_cwperpage = 4 or 8.
> > > 
> > > 
> > > So anyway we have to updated max_cwperpage after nand_scan() call only.  Since there is no other dependency on max_cwperpage in nand_attach_chip() and we are using this in bam_alloc() and bam_clear().  
> > 
> > Why don't you update the sg table after increasing max_cwperpage?
> >   
> 
> Or we could move the bam reallocation inside qcom_nand_attach_chip() as below?

Much better approach, yes.

> 
> diff --git a/drivers/mtd/nand/raw/qcom_nandc.c b/drivers/mtd/nand/raw/qcom_nandc.c
> index 7c6efa3b6255..58c16054630f 100644
> --- a/drivers/mtd/nand/raw/qcom_nandc.c
> +++ b/drivers/mtd/nand/raw/qcom_nandc.c
> @@ -2653,9 +2653,23 @@ static int qcom_nand_attach_chip(struct nand_chip *chip)
>  
>         mtd_set_ooblayout(mtd, &qcom_nand_ooblayout_ops);
>  
> +       /* Free the initially allocated BAM transaction for reading the ONFI params */
> +       if (nandc->props->is_bam)
> +               free_bam_transaction(nandc);
> +
>         nandc->max_cwperpage = max_t(unsigned int, nandc->max_cwperpage,
>                                      cwperpage);
>  
> +       /* Now allocate the BAM transaction based on updated max_cwperpage */
> +       if (nandc->props->is_bam) {
> +               nandc->bam_txn = alloc_bam_transaction(nandc);
> +               if (!nandc->bam_txn) {
> +                       dev_err(nandc->dev,
> +                               "failed to allocate bam transaction\n");
> +                       return -ENOMEM;
> +               }
> +       }
> +
>         /*
>          * DATA_UD_BYTES varies based on whether the read/write command protects
>          * spare data with ECC too. We protect spare data by default, so we set
> @@ -2956,17 +2970,6 @@ static int qcom_nand_host_init_and_register(struct qcom_nand_controller *nandc,
>         if (ret)
>                 return ret;
>  
> -       if (nandc->props->is_bam) {
> -               free_bam_transaction(nandc);
> -               nandc->bam_txn = alloc_bam_transaction(nandc);
> -               if (!nandc->bam_txn) {
> -                       dev_err(nandc->dev,
> -                               "failed to allocate bam transaction\n");
> -                       nand_cleanup(chip);
> -                       return -ENOMEM;
> -               }
> -       }
> -
>         ret = mtd_device_parse_register(mtd, probes, NULL, NULL, 0);
>         if (ret)
>                 nand_cleanup(chip);
> 
> Thanks,
> Mani


Thanks,
Miquèl

  reply	other threads:[~2022-04-14 14:59 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-14  5:39 [PATCH] mtd: rawnand: qcom: fix memory corruption that causes panic Md Sadre Alam
2022-04-14  5:39 ` Md Sadre Alam
2022-04-14  8:15 ` Miquel Raynal
2022-04-14  8:15   ` Miquel Raynal
     [not found]   ` <DM6PR02MB580382FA47C4884AFC1A98D0FAEF9@DM6PR02MB5803.namprd02.prod.outlook.com>
2022-04-14 12:20     ` FW: " Md Sadre Alam
2022-04-14 12:20       ` Md Sadre Alam
2022-04-14 12:42       ` Miquel Raynal
2022-04-14 12:42         ` Miquel Raynal
2022-04-14 14:39         ` Manivannan Sadhasivam
2022-04-14 14:39           ` Manivannan Sadhasivam
2022-04-14 14:59           ` Miquel Raynal [this message]
2022-04-14 14:59             ` Miquel Raynal
2022-04-14 15:31             ` Md Sadre Alam
2022-04-14 15:31               ` Md Sadre Alam
  -- strict thread matches above, loose matches on Subject: below --
2018-12-23  0:31 Christian Lamparter
2019-01-08 11:01 ` Miquel Raynal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220414165909.249c2325@xps13 \
    --to=miquel.raynal@bootlin.com \
    --cc=konrad.dybcio@somainline.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mtd@lists.infradead.org \
    --cc=mani@kernel.org \
    --cc=quic_mdalam@quicinc.com \
    --cc=quic_srichara@quicinc.com \
    --cc=richard@nod.at \
    --cc=vigneshr@ti.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.