nvdimm.lists.linux.dev archive mirror
* Is nvdimm driver RT compatible?
@ 2019-02-14  4:10 Liu, Yongxin
  2019-02-14  6:49 ` Pankaj Gupta
  0 siblings, 1 reply; 7+ messages in thread
From: Liu, Yongxin @ 2019-02-14  4:10 UTC (permalink / raw)
  To: linux-nvdimm

Hi experts,

Could anyone tell me whether the Linux nvdimm driver is RT compatible?

When I was testing PMEM performance with fio using the following command, I got the call trace below.

# fio -filename=/dev/pmem0s -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=16k -size=1G -numjobs=30 -runtime=100 -group_reporting -name=mytest

    BUG: scheduling while atomic: fio/2514/0x00000002
    Modules linked in: intel_rapl nd_pmem nd_btt skx_edac iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp 
    crct10dif_pclmul crct10dif_common aesni_intel aes_x86_64 crypto_simd    cryptd i40e nvme glue_helper nvme_core lpc_ich i2c_i801 nfit pcc_cpufreq wmi libnvdimm acpi_pad   
    acpi_power_meter
    Preemption disabled at:
    [<ffffffffc03608d9>] nd_region_acquire_lane+0x19/0x80 [libnvdimm]
    CPU: 44 PID: 2514 Comm: fio Tainted: G        W         4.18.20-rt8-preempt-rt #1
    Call Trace:
     dump_stack+0x4f/0x6a
     ? nd_region_acquire_lane+0x19/0x80 [libnvdimm]
     __schedule_bug.cold.17+0x38/0x55
     __schedule+0x484/0x6c0
     ? _raw_spin_lock+0x17/0x40
     schedule+0x3d/0xe0
     rt_spin_lock_slowlock_locked+0x118/0x2a0
     rt_spin_lock_slowlock+0x57/0x90
     rt_spin_lock+0x52/0x60
     btt_write_pg.isra.16+0x280/0x4b0 [nd_btt]
     btt_make_request+0x1b1/0x320 [nd_btt]
     generic_make_request+0x1dc/0x3f0
     submit_bio+0x49/0x140

nd_region_acquire_lane() disables preemption with get_cpu(), which causes "scheduling while atomic" spews on RT.

Is it safe to replace get_cpu()/put_cpu() with get_cpu_light()/put_cpu_light() in nd_region_acquire_lane()/nd_region_release_lane()?
After this replacement, the code protected by nd_region_acquire_lane()/nd_region_release_lane() would become preemptible.
Is that code reentrant?



Thanks,
Yongxin


_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


* Re: Is nvdimm driver RT compatible?
  2019-02-14  4:10 Is nvdimm driver RT compatible? Liu, Yongxin
@ 2019-02-14  6:49 ` Pankaj Gupta
  2019-02-14 20:45   ` Dan Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Pankaj Gupta @ 2019-02-14  6:49 UTC (permalink / raw)
  To: Yongxin Liu; +Cc: linux-nvdimm


Hi Yongxin,

> 
> Hi experts,
> 
> Could anyone tell me whether Linux nvdimm driver is RT compatible?
> 
> When I was testing PMEM performance with fio using the following command, I
> got calltrace below.
> 
> # fio -filename=/dev/pmem0s -direct=1 -iodepth 1 -thread -rw=randrw
> -rwmixread=70 -ioengine=psync -bs=16k -size=1G -numjobs=30 -runtime=100
> -group_reporting -name=mytest
> 
>     BUG: scheduling while atomic: fio/2514/0x00000002
>     Modules linked in: intel_rapl nd_pmem nd_btt skx_edac iTCO_wdt
>     iTCO_vendor_support intel_powerclamp coretemp
>     crct10dif_pclmul crct10dif_common aesni_intel aes_x86_64 crypto_simd
>     cryptd i40e nvme glue_helper nvme_core lpc_ich i2c_i801 nfit
>     pcc_cpufreq wmi libnvdimm acpi_pad
>     acpi_power_meter
>     Preemption disabled at:
>     [<ffffffffc03608d9>] nd_region_acquire_lane+0x19/0x80 [libnvdimm]
>     CPU: 44 PID: 2514 Comm: fio Tainted: G        W
>     4.18.20-rt8-preempt-rt #1
>     Call Trace:
>      dump_stack+0x4f/0x6a
>      ? nd_region_acquire_lane+0x19/0x80 [libnvdimm]
>      __schedule_bug.cold.17+0x38/0x55
>      __schedule+0x484/0x6c0
>      ? _raw_spin_lock+0x17/0x40
>      schedule+0x3d/0xe0
>      rt_spin_lock_slowlock_locked+0x118/0x2a0
>      rt_spin_lock_slowlock+0x57/0x90
>      rt_spin_lock+0x52/0x60
>      btt_write_pg.isra.16+0x280/0x4b0 [nd_btt]
>      btt_make_request+0x1b1/0x320 [nd_btt]
>      generic_make_request+0x1dc/0x3f0
>      submit_bio+0x49/0x140
> 
> nd_region_acquire_lane() disables preemption with get_cpu() which causes
> "scheduling while atomic" spews on RT.
> 
> Is it safe to replace get_cpu()/put_cpu() with
> get_cpu_light()/put_cpu_light() in
> nd_region_acquire_lane()/nd_region_release_lane()?

get_cpu() disables preemption on a core, while get_cpu_light() only prevents task migration across cores. If we change this to get/put_cpu_light(), I think there is a possibility of a race here.

> After this replacement, the codes protected by
> nd_region_release_lane/nd_region_release_lane would become pre-emptible.
> So are these codes reentrant?

Another thing I see: when get_cpu() is called in nd_region_acquire_lane(), there is no corresponding put_cpu() in the same function; put_cpu() is called only in nd_region_release_lane(). This means preemption stays disabled for the entire span between the two calls.

For mainline, we cannot simply change preemption-disable to migration-disable here, as there is code that could race as a result.

IIUC, an RT-only patch would have the same problem: the spin_lock protection is taken only on a condition, so the path before/without the spinlock may also race. We could use something like local_lock_* => local_lock_cpu instead.

These are my two cents.

Thanks,
Pankaj

> 
> 
> 
> Thanks,
> Yongxin
> 
> 
> 


* Re: Is nvdimm driver RT compatible?
  2019-02-14  6:49 ` Pankaj Gupta
@ 2019-02-14 20:45   ` Dan Williams
  2019-02-14 23:46     ` Liu, Yongxin
  2019-02-15  7:24     ` Pankaj Gupta
  0 siblings, 2 replies; 7+ messages in thread
From: Dan Williams @ 2019-02-14 20:45 UTC (permalink / raw)
  To: Pankaj Gupta; +Cc: Yongxin Liu, linux-nvdimm

On Wed, Feb 13, 2019 at 10:50 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
>
> Hi Yongxin,
>
> >
> > Hi experts,
> >
> > Could anyone tell me whether Linux nvdimm driver is RT compatible?
> >
> > When I was testing PMEM performance with fio using the following command, I
> > got calltrace below.
> >
> > # fio -filename=/dev/pmem0s -direct=1 -iodepth 1 -thread -rw=randrw
> > -rwmixread=70 -ioengine=psync -bs=16k -size=1G -numjobs=30 -runtime=100
> > -group_reporting -name=mytest
> >
> >     BUG: scheduling while atomic: fio/2514/0x00000002
> >     Modules linked in: intel_rapl nd_pmem nd_btt skx_edac iTCO_wdt
> >     iTCO_vendor_support intel_powerclamp coretemp
> >     crct10dif_pclmul crct10dif_common aesni_intel aes_x86_64 crypto_simd
> >     cryptd i40e nvme glue_helper nvme_core lpc_ich i2c_i801 nfit
> >     pcc_cpufreq wmi libnvdimm acpi_pad
> >     acpi_power_meter
> >     Preemption disabled at:
> >     [<ffffffffc03608d9>] nd_region_acquire_lane+0x19/0x80 [libnvdimm]
> >     CPU: 44 PID: 2514 Comm: fio Tainted: G        W
> >     4.18.20-rt8-preempt-rt #1
> >     Call Trace:
> >      dump_stack+0x4f/0x6a
> >      ? nd_region_acquire_lane+0x19/0x80 [libnvdimm]
> >      __schedule_bug.cold.17+0x38/0x55
> >      __schedule+0x484/0x6c0
> >      ? _raw_spin_lock+0x17/0x40
> >      schedule+0x3d/0xe0
> >      rt_spin_lock_slowlock_locked+0x118/0x2a0
> >      rt_spin_lock_slowlock+0x57/0x90
> >      rt_spin_lock+0x52/0x60
> >      btt_write_pg.isra.16+0x280/0x4b0 [nd_btt]
> >      btt_make_request+0x1b1/0x320 [nd_btt]
> >      generic_make_request+0x1dc/0x3f0
> >      submit_bio+0x49/0x140
> >
> > nd_region_acquire_lane() disables preemption with get_cpu() which causes
> > "scheduling while atomic" spews on RT.
> >
> > Is it safe to replace get_cpu()/put_cpu() with
> > get_cpu_light()/put_cpu_light() in
> > nd_region_acquire_lane()/nd_region_release_lane()?
>
> get_cpu disables preemption on a core and get_cpu_light will just avoid
> task migration across the cores. If we change this to get/put_cpu_light
> I think there is possibility of a race here.
>
> > After this replacement, the codes protected by
> > nd_region_release_lane/nd_region_release_lane would become pre-emptible.
> > So are these codes reentrant?
>
> Another thing I see is when get_cpu is called in "nd_region_acquire_lane"
> there is no corresponding call for "put_cpu" in same function. This is
> called only in "nd_region_release_lane". It means for the duration of
> these calls preemption will be disabled.
>
> Here, we cannot change preemption disable to migration disable for mainline.
> As there is code which can result in race.
>
> For RT only patchset as well, IIUC will have same problem as there is spin_lock
> protection on a condition and path before/without spinlock may also race.
> We can have something like local_lock_* =>  local_lock_cpu
>
> This are my two cents.

The btt implementation assumes that the btt code will not be
re-entered on the same cpu. So the pre-emption disable is required for
performance, but for correctness the implementation can fall back to
using a lock along with migration being disabled. In other words I
think changing nd_region_acquire_lane() like the following would be
sufficient for RT kernels:

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index e2818f94f292..e720fc42bbc0 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -947,18 +947,15 @@ int nd_blk_region_init(struct nd_region *nd_region)
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
 {
        unsigned int cpu, lane;
+       struct nd_percpu_lane *ndl_lock, *ndl_count;

-       cpu = get_cpu();
-       if (nd_region->num_lanes < nr_cpu_ids) {
-               struct nd_percpu_lane *ndl_lock, *ndl_count;
+       cpu = get_cpu_light();

-               lane = cpu % nd_region->num_lanes;
-               ndl_count = per_cpu_ptr(nd_region->lane, cpu);
-               ndl_lock = per_cpu_ptr(nd_region->lane, lane);
-               if (ndl_count->count++ == 0)
-                       spin_lock(&ndl_lock->lock);
-       } else
-               lane = cpu;
+       lane = cpu % nd_region->num_lanes;
+       ndl_count = per_cpu_ptr(nd_region->lane, cpu);
+       ndl_lock = per_cpu_ptr(nd_region->lane, lane);
+       if (ndl_count->count++ == 0)
+               spin_lock(&ndl_lock->lock);

        return lane;
 }
@@ -966,17 +963,14 @@ EXPORT_SYMBOL(nd_region_acquire_lane);

 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
 {
-       if (nd_region->num_lanes < nr_cpu_ids) {
-               unsigned int cpu = get_cpu();
-               struct nd_percpu_lane *ndl_lock, *ndl_count;
-
-               ndl_count = per_cpu_ptr(nd_region->lane, cpu);
-               ndl_lock = per_cpu_ptr(nd_region->lane, lane);
-               if (--ndl_count->count == 0)
-                       spin_unlock(&ndl_lock->lock);
-               put_cpu();
-       }
-       put_cpu();
+       unsigned int cpu = get_cpu();
+       struct nd_percpu_lane *ndl_lock, *ndl_count;
+
+       ndl_count = per_cpu_ptr(nd_region->lane, cpu);
+       ndl_lock = per_cpu_ptr(nd_region->lane, lane);
+       if (--ndl_count->count == 0)
+               spin_unlock(&ndl_lock->lock);
+       put_cpu_light();
 }
 EXPORT_SYMBOL(nd_region_release_lane);


* RE: Is nvdimm driver RT compatible?
  2019-02-14 20:45   ` Dan Williams
@ 2019-02-14 23:46     ` Liu, Yongxin
  2019-02-15  7:24     ` Pankaj Gupta
  1 sibling, 0 replies; 7+ messages in thread
From: Liu, Yongxin @ 2019-02-14 23:46 UTC (permalink / raw)
  To: Dan Williams, Pankaj Gupta; +Cc: linux-nvdimm

Thank you all for the valuable opinions and suggestions.
Appreciate it so much.


-Yongxin


> -----Original Message-----
> From: Dan Williams [mailto:dan.j.williams@intel.com]
> Sent: Friday, February 15, 2019 04:45
> To: Pankaj Gupta
> Cc: Liu, Yongxin; linux-nvdimm
> Subject: Re: Is nvdimm driver RT compatible?
> 
> On Wed, Feb 13, 2019 at 10:50 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> >
> >
> > Hi Yongxin,
> >
> > >
> > > Hi experts,
> > >
> > > Could anyone tell me whether Linux nvdimm driver is RT compatible?
> > >
> > > When I was testing PMEM performance with fio using the following
> command, I
> > > got calltrace below.
> > >
> > > # fio -filename=/dev/pmem0s -direct=1 -iodepth 1 -thread -rw=randrw
> > > -rwmixread=70 -ioengine=psync -bs=16k -size=1G -numjobs=30 -
> runtime=100
> > > -group_reporting -name=mytest
> > >
> > >     BUG: scheduling while atomic: fio/2514/0x00000002
> > >     Modules linked in: intel_rapl nd_pmem nd_btt skx_edac iTCO_wdt
> > >     iTCO_vendor_support intel_powerclamp coretemp
> > >     crct10dif_pclmul crct10dif_common aesni_intel aes_x86_64
> crypto_simd
> > >     cryptd i40e nvme glue_helper nvme_core lpc_ich i2c_i801 nfit
> > >     pcc_cpufreq wmi libnvdimm acpi_pad
> > >     acpi_power_meter
> > >     Preemption disabled at:
> > >     [<ffffffffc03608d9>] nd_region_acquire_lane+0x19/0x80 [libnvdimm]
> > >     CPU: 44 PID: 2514 Comm: fio Tainted: G        W
> > >     4.18.20-rt8-preempt-rt #1
> > >     Call Trace:
> > >      dump_stack+0x4f/0x6a
> > >      ? nd_region_acquire_lane+0x19/0x80 [libnvdimm]
> > >      __schedule_bug.cold.17+0x38/0x55
> > >      __schedule+0x484/0x6c0
> > >      ? _raw_spin_lock+0x17/0x40
> > >      schedule+0x3d/0xe0
> > >      rt_spin_lock_slowlock_locked+0x118/0x2a0
> > >      rt_spin_lock_slowlock+0x57/0x90
> > >      rt_spin_lock+0x52/0x60
> > >      btt_write_pg.isra.16+0x280/0x4b0 [nd_btt]
> > >      btt_make_request+0x1b1/0x320 [nd_btt]
> > >      generic_make_request+0x1dc/0x3f0
> > >      submit_bio+0x49/0x140
> > >
> > > nd_region_acquire_lane() disables preemption with get_cpu() which
> causes
> > > "scheduling while atomic" spews on RT.
> > >
> > > Is it safe to replace get_cpu()/put_cpu() with
> > > get_cpu_light()/put_cpu_light() in
> > > nd_region_acquire_lane()/nd_region_release_lane()?
> >
> > get_cpu disables preemption on a core and get_cpu_light will just avoid
> > task migration across the cores. If we change this to get/put_cpu_light
> > I think there is possibility of a race here.
> >
> > > After this replacement, the codes protected by
> > > nd_region_release_lane/nd_region_release_lane would become pre-
> emptible.
> > > So are these codes reentrant?
> >
> > Another thing I see is when get_cpu is called in
> "nd_region_acquire_lane"
> > there is no corresponding call for "put_cpu" in same function. This is
> > called only in "nd_region_release_lane". It means for the duration of
> > these calls preemption will be disabled.
> >
> > Here, we cannot change preemption disable to migration disable for
> mainline.
> > As there is code which can result in race.
> >
> > For RT only patchset as well, IIUC will have same problem as there is
> spin_lock
> > protection on a condition and path before/without spinlock may also
> race.
> > We can have something like local_lock_* =>  local_lock_cpu
> >
> > This are my two cents.
> 
> The btt implementation assumes that the btt code will not be
> re-entered on the same cpu. So the pre-emption disable is required for
> performance, but for correctness the implementation can fall back to
> using a lock along with migration being disabled. In other words I
> think changing nd_region_acquire_lane() like the following would be
> sufficient for RT kernels:
> 
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index e2818f94f292..e720fc42bbc0 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -947,18 +947,15 @@ int nd_blk_region_init(struct nd_region *nd_region)
>  unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
>  {
>         unsigned int cpu, lane;
> +       struct nd_percpu_lane *ndl_lock, *ndl_count;
> 
> -       cpu = get_cpu();
> -       if (nd_region->num_lanes < nr_cpu_ids) {
> -               struct nd_percpu_lane *ndl_lock, *ndl_count;
> +       cpu = get_cpu_light();
> 
> -               lane = cpu % nd_region->num_lanes;
> -               ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> -               ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> -               if (ndl_count->count++ == 0)
> -                       spin_lock(&ndl_lock->lock);
> -       } else
> -               lane = cpu;
> +       lane = cpu % nd_region->num_lanes;
> +       ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> +       ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> +       if (ndl_count->count++ == 0)
> +               spin_lock(&ndl_lock->lock);
> 
>         return lane;
>  }
> @@ -966,17 +963,14 @@ EXPORT_SYMBOL(nd_region_acquire_lane);
> 
>  void nd_region_release_lane(struct nd_region *nd_region, unsigned int
> lane)
>  {
> -       if (nd_region->num_lanes < nr_cpu_ids) {
> -               unsigned int cpu = get_cpu();
> -               struct nd_percpu_lane *ndl_lock, *ndl_count;
> -
> -               ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> -               ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> -               if (--ndl_count->count == 0)
> -                       spin_unlock(&ndl_lock->lock);
> -               put_cpu();
> -       }
> -       put_cpu();
> +       unsigned int cpu = get_cpu();
> +       struct nd_percpu_lane *ndl_lock, *ndl_count;
> +
> +       ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> +       ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> +       if (--ndl_count->count == 0)
> +               spin_unlock(&ndl_lock->lock);
> +       put_cpu_light();
>  }
>  EXPORT_SYMBOL(nd_region_release_lane);


* Re: Is nvdimm driver RT compatible?
  2019-02-14 20:45   ` Dan Williams
  2019-02-14 23:46     ` Liu, Yongxin
@ 2019-02-15  7:24     ` Pankaj Gupta
  2019-02-15 16:31       ` Dan Williams
  1 sibling, 1 reply; 7+ messages in thread
From: Pankaj Gupta @ 2019-02-15  7:24 UTC (permalink / raw)
  To: Dan Williams; +Cc: Yongxin Liu, linux-nvdimm


> >
> > Hi Yongxin,
> >
> > >
> > > Hi experts,
> > >
> > > Could anyone tell me whether Linux nvdimm driver is RT compatible?
> > >
> > > When I was testing PMEM performance with fio using the following command,
> > > I
> > > got calltrace below.
> > >
> > > # fio -filename=/dev/pmem0s -direct=1 -iodepth 1 -thread -rw=randrw
> > > -rwmixread=70 -ioengine=psync -bs=16k -size=1G -numjobs=30 -runtime=100
> > > -group_reporting -name=mytest
> > >
> > >     BUG: scheduling while atomic: fio/2514/0x00000002
> > >     Modules linked in: intel_rapl nd_pmem nd_btt skx_edac iTCO_wdt
> > >     iTCO_vendor_support intel_powerclamp coretemp
> > >     crct10dif_pclmul crct10dif_common aesni_intel aes_x86_64 crypto_simd
> > >     cryptd i40e nvme glue_helper nvme_core lpc_ich i2c_i801 nfit
> > >     pcc_cpufreq wmi libnvdimm acpi_pad
> > >     acpi_power_meter
> > >     Preemption disabled at:
> > >     [<ffffffffc03608d9>] nd_region_acquire_lane+0x19/0x80 [libnvdimm]
> > >     CPU: 44 PID: 2514 Comm: fio Tainted: G        W
> > >     4.18.20-rt8-preempt-rt #1
> > >     Call Trace:
> > >      dump_stack+0x4f/0x6a
> > >      ? nd_region_acquire_lane+0x19/0x80 [libnvdimm]
> > >      __schedule_bug.cold.17+0x38/0x55
> > >      __schedule+0x484/0x6c0
> > >      ? _raw_spin_lock+0x17/0x40
> > >      schedule+0x3d/0xe0
> > >      rt_spin_lock_slowlock_locked+0x118/0x2a0
> > >      rt_spin_lock_slowlock+0x57/0x90
> > >      rt_spin_lock+0x52/0x60
> > >      btt_write_pg.isra.16+0x280/0x4b0 [nd_btt]
> > >      btt_make_request+0x1b1/0x320 [nd_btt]
> > >      generic_make_request+0x1dc/0x3f0
> > >      submit_bio+0x49/0x140
> > >
> > > nd_region_acquire_lane() disables preemption with get_cpu() which causes
> > > "scheduling while atomic" spews on RT.
> > >
> > > Is it safe to replace get_cpu()/put_cpu() with
> > > get_cpu_light()/put_cpu_light() in
> > > nd_region_acquire_lane()/nd_region_release_lane()?
> >
> > get_cpu disables preemption on a core and get_cpu_light will just avoid
> > task migration across the cores. If we change this to get/put_cpu_light
> > I think there is possibility of a race here.
> >
> > > After this replacement, the codes protected by
> > > nd_region_release_lane/nd_region_release_lane would become pre-emptible.
> > > So are these codes reentrant?
> >
> > Another thing I see is when get_cpu is called in "nd_region_acquire_lane"
> > there is no corresponding call for "put_cpu" in same function. This is
> > called only in "nd_region_release_lane". It means for the duration of
> > these calls preemption will be disabled.
> >
> > Here, we cannot change preemption disable to migration disable for
> > mainline.
> > As there is code which can result in race.
> >
> > For RT only patchset as well, IIUC will have same problem as there is
> > spin_lock
> > protection on a condition and path before/without spinlock may also race.
> > We can have something like local_lock_* =>  local_lock_cpu
> >
> > This are my two cents.
> 
> The btt implementation assumes that the btt code will not be
> re-entered on the same cpu. So the pre-emption disable is required for
> performance, but for correctness the implementation can fall back to
> using a lock along with migration being disabled. In other words I
> think changing nd_region_acquire_lane() like the following would be
> sufficient for RT kernels:

OK. I am not sure it is a good idea to have different implementations for mainline and the RT kernel, if it is at all possible to have a single implementation. But I don't understand all of the optimizations in the original code, so this might be okay.

> 
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index e2818f94f292..e720fc42bbc0 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -947,18 +947,15 @@ int nd_blk_region_init(struct nd_region *nd_region)
>  unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
>  {
>         unsigned int cpu, lane;
> +       struct nd_percpu_lane *ndl_lock, *ndl_count;
> 
> -       cpu = get_cpu();
> -       if (nd_region->num_lanes < nr_cpu_ids) {
> -               struct nd_percpu_lane *ndl_lock, *ndl_count;
> +       cpu = get_cpu_light();
> 
> -               lane = cpu % nd_region->num_lanes;
> -               ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> -               ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> -               if (ndl_count->count++ == 0)
> -                       spin_lock(&ndl_lock->lock);
> -       } else
> -               lane = cpu;
> +       lane = cpu % nd_region->num_lanes;
> +       ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> +       ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> +       if (ndl_count->count++ == 0)
> +               spin_lock(&ndl_lock->lock);

I could not understand why the spinlock here is taken only conditionally. If two tasks share the same lane on the same CPU and the first consumer increments the count, won't the second task skip waiting/spinning because the count is already elevated?

Or am I misunderstanding it?

Thanks,
Pankaj


> 
>         return lane;
>  }
> @@ -966,17 +963,14 @@ EXPORT_SYMBOL(nd_region_acquire_lane);
> 
>  void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
>  {
> -       if (nd_region->num_lanes < nr_cpu_ids) {
> -               unsigned int cpu = get_cpu();
> -               struct nd_percpu_lane *ndl_lock, *ndl_count;
> -
> -               ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> -               ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> -               if (--ndl_count->count == 0)
> -                       spin_unlock(&ndl_lock->lock);
> -               put_cpu();
> -       }
> -       put_cpu();
> +       unsigned int cpu = get_cpu();

Should this be get_cpu_light() here?

> +       struct nd_percpu_lane *ndl_lock, *ndl_count;
> +
> +       ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> +       ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> +       if (--ndl_count->count == 0)
> +               spin_unlock(&ndl_lock->lock);
> +       put_cpu_light();
>  }
>  EXPORT_SYMBOL(nd_region_release_lane);
> 

Thanks,
Pankaj


* Re: Is nvdimm driver RT compatible?
  2019-02-15  7:24     ` Pankaj Gupta
@ 2019-02-15 16:31       ` Dan Williams
  2019-02-18  9:57         ` Pankaj Gupta
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2019-02-15 16:31 UTC (permalink / raw)
  To: Pankaj Gupta; +Cc: Yongxin Liu, linux-nvdimm

On Thu, Feb 14, 2019 at 11:25 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
>
> > >
> > > Hi Yongxin,
> > >
> > > >
> > > > Hi experts,
> > > >
> > > > Could anyone tell me whether Linux nvdimm driver is RT compatible?
> > > >
> > > > When I was testing PMEM performance with fio using the following command,
> > > > I
> > > > got calltrace below.
> > > >
> > > > # fio -filename=/dev/pmem0s -direct=1 -iodepth 1 -thread -rw=randrw
> > > > -rwmixread=70 -ioengine=psync -bs=16k -size=1G -numjobs=30 -runtime=100
> > > > -group_reporting -name=mytest
> > > >
> > > >     BUG: scheduling while atomic: fio/2514/0x00000002
> > > >     Modules linked in: intel_rapl nd_pmem nd_btt skx_edac iTCO_wdt
> > > >     iTCO_vendor_support intel_powerclamp coretemp
> > > >     crct10dif_pclmul crct10dif_common aesni_intel aes_x86_64 crypto_simd
> > > >     cryptd i40e nvme glue_helper nvme_core lpc_ich i2c_i801 nfit
> > > >     pcc_cpufreq wmi libnvdimm acpi_pad
> > > >     acpi_power_meter
> > > >     Preemption disabled at:
> > > >     [<ffffffffc03608d9>] nd_region_acquire_lane+0x19/0x80 [libnvdimm]
> > > >     CPU: 44 PID: 2514 Comm: fio Tainted: G        W
> > > >     4.18.20-rt8-preempt-rt #1
> > > >     Call Trace:
> > > >      dump_stack+0x4f/0x6a
> > > >      ? nd_region_acquire_lane+0x19/0x80 [libnvdimm]
> > > >      __schedule_bug.cold.17+0x38/0x55
> > > >      __schedule+0x484/0x6c0
> > > >      ? _raw_spin_lock+0x17/0x40
> > > >      schedule+0x3d/0xe0
> > > >      rt_spin_lock_slowlock_locked+0x118/0x2a0
> > > >      rt_spin_lock_slowlock+0x57/0x90
> > > >      rt_spin_lock+0x52/0x60
> > > >      btt_write_pg.isra.16+0x280/0x4b0 [nd_btt]
> > > >      btt_make_request+0x1b1/0x320 [nd_btt]
> > > >      generic_make_request+0x1dc/0x3f0
> > > >      submit_bio+0x49/0x140
> > > >
> > > > nd_region_acquire_lane() disables preemption with get_cpu() which causes
> > > > "scheduling while atomic" spews on RT.
> > > >
> > > > Is it safe to replace get_cpu()/put_cpu() with
> > > > get_cpu_light()/put_cpu_light() in
> > > > nd_region_acquire_lane()/nd_region_release_lane()?
> > >
> > > get_cpu disables preemption on a core and get_cpu_light will just avoid
> > > task migration across the cores. If we change this to get/put_cpu_light
> > > I think there is possibility of a race here.
> > >
> > > > After this replacement, the codes protected by
> > > > nd_region_release_lane/nd_region_release_lane would become pre-emptible.
> > > > So are these codes reentrant?
> > >
> > > Another thing I see is when get_cpu is called in "nd_region_acquire_lane"
> > > there is no corresponding call for "put_cpu" in same function. This is
> > > called only in "nd_region_release_lane". It means for the duration of
> > > these calls preemption will be disabled.
> > >
> > > Here, we cannot change preemption disable to migration disable for
> > > mainline.
> > > As there is code which can result in race.
> > >
> > > For RT only patchset as well, IIUC will have same problem as there is
> > > spin_lock
> > > protection on a condition and path before/without spinlock may also race.
> > > We can have something like local_lock_* =>  local_lock_cpu
> > >
> > > This are my two cents.
> >
> > The btt implementation assumes that the btt code will not be
> > re-entered on the same cpu. So the pre-emption disable is required for
> > performance, but for correctness the implementation can fall back to
> > using a lock along with migration being disabled. In other words I
> > think changing nd_region_acquire_lane() like the following would be
> > sufficient for RT kernels:
>
> o.k
> Not sure if its good idea to have different implementation for both mainline
> and RT Kernel. It at all its possible to have single implementation. But I
> don't understand all of optimizations in original code so this might be okay.
>

I agree; the patch was just to clarify the implementation, and I would otherwise not recommend shipping different driver implementations. Unless/until get_cpu_light() is available in mainline, I think this patch should be considered reference-only.

> >
> > diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> > index e2818f94f292..e720fc42bbc0 100644
> > --- a/drivers/nvdimm/region_devs.c
> > +++ b/drivers/nvdimm/region_devs.c
> > @@ -947,18 +947,15 @@ int nd_blk_region_init(struct nd_region *nd_region)
> >  unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
> >  {
> >         unsigned int cpu, lane;
> > +       struct nd_percpu_lane *ndl_lock, *ndl_count;
> >
> > -       cpu = get_cpu();
> > -       if (nd_region->num_lanes < nr_cpu_ids) {
> > -               struct nd_percpu_lane *ndl_lock, *ndl_count;
> > +       cpu = get_cpu_light();
> >
> > -               lane = cpu % nd_region->num_lanes;
> > -               ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> > -               ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> > -               if (ndl_count->count++ == 0)
> > -                       spin_lock(&ndl_lock->lock);
> > -       } else
> > -               lane = cpu;
> > +       lane = cpu % nd_region->num_lanes;
> > +       ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> > +       ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> > +       if (ndl_count->count++ == 0)
> > +               spin_lock(&ndl_lock->lock);
>
> I could not understand why spinlock here with condition?.
> Are two tasks sharing same lane in same cpu and we are increasing
> the count with first consumer. Other task won't wait/spin as count is
> already elevated?

When the number of cpus exceeds the number of lanes, or if RT makes
this routine re-entrant on the same cpu, then the lock is needed to
ensure a single user of the lane. A cpu is allowed to acquire the
same lane twice, hence the ndl_count.

> Or I am misunderstanding it.
>
> Thanks,
> Pankaj
>
>
> >
> >         return lane;
> >  }
> > @@ -966,17 +963,14 @@ EXPORT_SYMBOL(nd_region_acquire_lane);
> >
> >  void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
> >  {
> > -       if (nd_region->num_lanes < nr_cpu_ids) {
> > -               unsigned int cpu = get_cpu();
> > -               struct nd_percpu_lane *ndl_lock, *ndl_count;
> > -
> > -               ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> > -               ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> > -               if (--ndl_count->count == 0)
> > -                       spin_unlock(&ndl_lock->lock);
> > -               put_cpu();
> > -       }
> > -       put_cpu();
> > +       unsigned int cpu = get_cpu();
>
> Is this get_cpu_light here?

Yes, good catch.


* Re: Is nvdimm driver RT compatible?
  2019-02-15 16:31       ` Dan Williams
@ 2019-02-18  9:57         ` Pankaj Gupta
  0 siblings, 0 replies; 7+ messages in thread
From: Pankaj Gupta @ 2019-02-18  9:57 UTC (permalink / raw)
  To: Dan Williams; +Cc: Yongxin Liu, linux-nvdimm


> >
> > > >
> > > > Hi Yongxin,
> > > >
> > > > >
> > > > > Hi experts,
> > > > >
> > > > > Could anyone tell me whether Linux nvdimm driver is RT compatible?
> > > > >
> > > > > When I was testing PMEM performance with fio using the following
> > > > > command,
> > > > > I
> > > > > got calltrace below.
> > > > >
> > > > > # fio -filename=/dev/pmem0s -direct=1 -iodepth 1 -thread -rw=randrw
> > > > > -rwmixread=70 -ioengine=psync -bs=16k -size=1G -numjobs=30
> > > > > -runtime=100
> > > > > -group_reporting -name=mytest
> > > > >
> > > > >     BUG: scheduling while atomic: fio/2514/0x00000002
> > > > >     Modules linked in: intel_rapl nd_pmem nd_btt skx_edac iTCO_wdt
> > > > >     iTCO_vendor_support intel_powerclamp coretemp
> > > > >     crct10dif_pclmul crct10dif_common aesni_intel aes_x86_64
> > > > >     crypto_simd
> > > > >     cryptd i40e nvme glue_helper nvme_core lpc_ich i2c_i801 nfit
> > > > >     pcc_cpufreq wmi libnvdimm acpi_pad
> > > > >     acpi_power_meter
> > > > >     Preemption disabled at:
> > > > >     [<ffffffffc03608d9>] nd_region_acquire_lane+0x19/0x80 [libnvdimm]
> > > > >     CPU: 44 PID: 2514 Comm: fio Tainted: G        W
> > > > >     4.18.20-rt8-preempt-rt #1
> > > > >     Call Trace:
> > > > >      dump_stack+0x4f/0x6a
> > > > >      ? nd_region_acquire_lane+0x19/0x80 [libnvdimm]
> > > > >      __schedule_bug.cold.17+0x38/0x55
> > > > >      __schedule+0x484/0x6c0
> > > > >      ? _raw_spin_lock+0x17/0x40
> > > > >      schedule+0x3d/0xe0
> > > > >      rt_spin_lock_slowlock_locked+0x118/0x2a0
> > > > >      rt_spin_lock_slowlock+0x57/0x90
> > > > >      rt_spin_lock+0x52/0x60
> > > > >      btt_write_pg.isra.16+0x280/0x4b0 [nd_btt]
> > > > >      btt_make_request+0x1b1/0x320 [nd_btt]
> > > > >      generic_make_request+0x1dc/0x3f0
> > > > >      submit_bio+0x49/0x140
> > > > >
> > > > > nd_region_acquire_lane() disables preemption with get_cpu() which
> > > > > causes
> > > > > "scheduling while atomic" spews on RT.
> > > > >
> > > > > Is it safe to replace get_cpu()/put_cpu() with
> > > > > get_cpu_light()/put_cpu_light() in
> > > > > nd_region_acquire_lane()/nd_region_release_lane()?
> > > >
> > > > get_cpu() disables preemption on a core, while get_cpu_light() only
> > > > avoids task migration across cores. If we change this to
> > > > get/put_cpu_light() I think there is a possibility of a race here.
> > > >
> > > > > After this replacement, the code protected by
> > > > > nd_region_acquire_lane()/nd_region_release_lane() would become
> > > > > preemptible.
> > > > > So is this code reentrant?
> > > >
> > > > Another thing I see: when get_cpu() is called in
> > > > nd_region_acquire_lane(), there is no corresponding put_cpu() in the
> > > > same function; it is called only in nd_region_release_lane(). This
> > > > means preemption stays disabled for the whole span between the two
> > > > calls.
> > > >
> > > > For mainline, we cannot change the preemption disable to a migration
> > > > disable, as there is code that could then race.
> > > >
> > > > For the RT-only patchset as well, IIUC we will have the same problem:
> > > > the spin_lock protection is conditional, and the path before/without
> > > > the spinlock may also race. We could use something like
> > > > local_lock_* => local_lock_cpu.
> > > >
> > > > These are my two cents.
> > >
> > > The btt implementation assumes that the btt code will not be
> > > re-entered on the same cpu. So the pre-emption disable is required for
> > > performance, but for correctness the implementation can fall back to
> > > using a lock along with migration being disabled. In other words I
> > > think changing nd_region_acquire_lane() like the following would be
> > > sufficient for RT kernels:
> >
> > o.k.
> > Not sure it's a good idea to have different implementations for mainline
> > and the RT kernel, if at all it's possible to have a single one. But I
> > don't understand all of the optimizations in the original code, so this
> > might be okay.
> >
> 
> I agree, that was just to clarify the implementation; I would
> otherwise not recommend shipping different driver implementations.
> Unless/until get_cpu_light() is available in mainline, I think this
> patch should be considered reference only.
> 
> > >
> > > diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> > > index e2818f94f292..e720fc42bbc0 100644
> > > --- a/drivers/nvdimm/region_devs.c
> > > +++ b/drivers/nvdimm/region_devs.c
> > > @@ -947,18 +947,15 @@ int nd_blk_region_init(struct nd_region *nd_region)
> > >  unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
> > >  {
> > >         unsigned int cpu, lane;
> > > +       struct nd_percpu_lane *ndl_lock, *ndl_count;
> > >
> > > -       cpu = get_cpu();
> > > -       if (nd_region->num_lanes < nr_cpu_ids) {
> > > -               struct nd_percpu_lane *ndl_lock, *ndl_count;
> > > +       cpu = get_cpu_light();
> > >
> > > -               lane = cpu % nd_region->num_lanes;
> > > -               ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> > > -               ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> > > -               if (ndl_count->count++ == 0)
> > > -                       spin_lock(&ndl_lock->lock);
> > > -       } else
> > > -               lane = cpu;
> > > +       lane = cpu % nd_region->num_lanes;
> > > +       ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> > > +       ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> > > +       if (ndl_count->count++ == 0)
> > > +               spin_lock(&ndl_lock->lock);
> >
> > I could not understand why the spinlock here is conditional.
> > Are two tasks sharing the same lane on the same cpu, with the first
> > consumer raising the count? The other task won't wait/spin because the
> > count is already elevated?
> 
> When the number of cpus exceeds the number of lanes, or if RT makes
> this routine re-entrant on the same cpu, the lock is needed to ensure
> a single user of the lane. A cpu is allowed to acquire the same lane
> twice, hence the ndl_count.

o.k. Thanks for explaining. I just have one small comment:

For non-RT, since preemption is disabled on the CPU, the only concern is
reentrant code; no other task on the same CPU can execute the critical
section in any case.

For RT, preemption is enabled, so unless something is explicitly done in
the design, some other task can execute the critical section
concurrently. If there is any possibility of that, we are not protected
against it.

Or am I missing something here?

Thanks,
Pankaj

> 
> > Or am I misunderstanding it?
> >
> > Thanks,
> > Pankaj
> >
> >
> > >
> > >         return lane;
> > >  }
> > > @@ -966,17 +963,14 @@ EXPORT_SYMBOL(nd_region_acquire_lane);
> > >
> > >  void nd_region_release_lane(struct nd_region *nd_region, unsigned int
> > >  lane)
> > >  {
> > > -       if (nd_region->num_lanes < nr_cpu_ids) {
> > > -               unsigned int cpu = get_cpu();
> > > -               struct nd_percpu_lane *ndl_lock, *ndl_count;
> > > -
> > > -               ndl_count = per_cpu_ptr(nd_region->lane, cpu);
> > > -               ndl_lock = per_cpu_ptr(nd_region->lane, lane);
> > > -               if (--ndl_count->count == 0)
> > > -                       spin_unlock(&ndl_lock->lock);
> > > -               put_cpu();
> > > -       }
> > > -       put_cpu();
> > > +       unsigned int cpu = get_cpu();
> >
> > Should this be get_cpu_light() here?
> 
> Yes, good catch.
> 


end of thread, other threads:[~2019-02-18  9:57 UTC | newest]

Thread overview: 7+ messages
2019-02-14  4:10 Is nvdimm driver RT compatible? Liu, Yongxin
2019-02-14  6:49 ` Pankaj Gupta
2019-02-14 20:45   ` Dan Williams
2019-02-14 23:46     ` Liu, Yongxin
2019-02-15  7:24     ` Pankaj Gupta
2019-02-15 16:31       ` Dan Williams
2019-02-18  9:57         ` Pankaj Gupta
