* [BUG] NULL pointer de-ref when setting io.cost.qos on LUKS devices
@ 2020-03-03 14:13 Benjamin Berg
       [not found] ` <1dbdcbb0c8db70a08aac467311a80abcf7779575.camel-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Benjamin Berg @ 2020-03-03 14:13 UTC (permalink / raw)
  To: cgroups-u79uwXL29TY76Z2rM5mHXA

Hello,

so, I tried to set io.latency for some cgroups for the root device,
which is ext4 inside LVM inside LUKS.

I tried doing so with systemd (it has the IODeviceLatencyTargetSec
option). What was interesting is that it selects the LUKS device if I
ask it to set the latency for the root partition, i.e. setting:

  IODeviceLatencyTargetSec=/usr/bin 20ms

results in io.latency:

  253:0 target=20000

and 253:0 corresponds to the LUKS device. 253:1 would be the root
partition itself, 8:2 the partition on disk and 8:0 the disk. I am not
sure the systemd selection works as intended. And I wonder which device
systemd should select in this scenario.
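
For reference, a device number such as 253:0 can be mapped back to a
kernel device name through sysfs. The following is only a minimal
sketch, assuming the usual /sys/dev/block/MAJ:MIN/uevent layout; the
function name devname_of and the SYSFS_ROOT variable are hypothetical
and not part of any existing tool:

```shell
#!/bin/sh
# Map a "MAJ:MIN" device number to its kernel device name via sysfs.
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a fake tree; on a real system it defaults to /sys.
devname_of() {
    uevent="${SYSFS_ROOT:-/sys}/dev/block/$1/uevent"
    # Fail if the device number is unknown to the kernel.
    [ -r "$uevent" ] || return 1
    # uevent holds KEY=VALUE lines; DEVNAME is the kernel device name.
    sed -n 's/^DEVNAME=//p' "$uevent"
}
```

On a setup like the one described, "devname_of 253:0" would be expected
to print a dm-N style name and "devname_of 8:0" the name of the disk.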


Anyway, I then thought I might need to enable the QOS controller in
io.cost.qos before it will take effect. Unfortunately, trying to do so
reliably results in an oops on my system, i.e.:

  # echo 253:1 enable=1 >io.cost.qos
or
  # echo 253:0 enable=1 >io.cost.qos

results in the below oops. A similar setup on a different machine
without LUKS seems to work fine.

The kernel version was 5.5.6-201.fc31.x86_64

Benjamin


Mar 02 15:06:31 ben-x1 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000138
Mar 02 15:06:31 ben-x1 kernel: #PF: supervisor read access in kernel mode
Mar 02 15:06:31 ben-x1 kernel: #PF: error_code(0x0000) - not-present page
Mar 02 15:06:31 ben-x1 kernel: PGD 0 P4D 0 
Mar 02 15:06:31 ben-x1 kernel: Oops: 0000 [#1] SMP PTI
Mar 02 15:06:31 ben-x1 kernel: CPU: 3 PID: 3413 Comm: bash Tainted: G           OE     5.5.6-201.fc31.x86_64 #1
Mar 02 15:06:31 ben-x1 kernel: Hardware name: LENOVO 20FCS0RV0G/20FCS0RV0G, BIOS N1FET36W (1.10 ) 03/09/2016
Mar 02 15:06:31 ben-x1 kernel: RIP: 0010:ioc_pd_init+0x126/0x190
Mar 02 15:06:31 ben-x1 kernel: Code: 48 8b 45 28 48 8b 00 8b 80 f8 00 00 00 41 89 84 24 38 01 00 00 48 85 ed 74 28 48 63 0d d3 40 25 01 48 83 c1 1c 48 8b 44 cd 08 <48> 63 90 38 01 00 00 49 89 84 d4 40 01 00 00 48 8b 6d 38 48 85 ed
Mar 02 15:06:31 ben-x1 kernel: RSP: 0018:ffffb0c3880ffcd8 EFLAGS: 00010086
Mar 02 15:06:31 ben-x1 kernel: RAX: 0000000000000000 RBX: ffff89609be87e00 RCX: 000000000000001e
Mar 02 15:06:31 ben-x1 kernel: RDX: 0000000000000003 RSI: 0000000000000001 RDI: ffff89609b8e0d28
Mar 02 15:06:31 ben-x1 kernel: RBP: ffff896099e2be00 R08: ffff8960c21b0140 R09: ffff89609b8e0000
Mar 02 15:06:31 ben-x1 kernel: R10: ffff8960c1002e00 R11: ffff896083bdc000 R12: ffff89609b8e0c00
Mar 02 15:06:31 ben-x1 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff8a745e40
Mar 02 15:06:31 ben-x1 kernel: FS:  00007fba669b4740(0000) GS:ffff8960c2180000(0000) knlGS:0000000000000000
Mar 02 15:06:31 ben-x1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 02 15:06:31 ben-x1 kernel: CR2: 0000000000000138 CR3: 000000021a52c003 CR4: 00000000003606e0
Mar 02 15:06:31 ben-x1 kernel: Call Trace:
Mar 02 15:06:31 ben-x1 kernel:  blkcg_activate_policy+0x11d/0x2b0
Mar 02 15:06:31 ben-x1 kernel:  blk_iocost_init+0x15c/0x1e0
Mar 02 15:06:31 ben-x1 kernel:  ioc_qos_write+0x2d1/0x3e0
Mar 02 15:06:31 ben-x1 kernel:  ? do_filp_open+0xa5/0x100
Mar 02 15:06:31 ben-x1 kernel:  cgroup_file_write+0x8a/0x150
Mar 02 15:06:31 ben-x1 kernel:  ? __check_object_size+0x136/0x147
Mar 02 15:06:31 ben-x1 kernel:  kernfs_fop_write+0xce/0x1b0
Mar 02 15:06:31 ben-x1 kernel:  vfs_write+0xb6/0x1a0
Mar 02 15:06:31 ben-x1 kernel:  ksys_write+0x5f/0xe0
Mar 02 15:06:31 ben-x1 kernel:  do_syscall_64+0x5b/0x1c0
Mar 02 15:06:31 ben-x1 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 02 15:06:31 ben-x1 kernel: RIP: 0033:0x7fba66aa94b7
Mar 02 15:06:31 ben-x1 kernel: Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
Mar 02 15:06:31 ben-x1 kernel: RSP: 002b:00007ffc71074178 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Mar 02 15:06:31 ben-x1 kernel: RAX: ffffffffffffffda RBX: 000000000000000f RCX: 00007fba66aa94b7
Mar 02 15:06:31 ben-x1 kernel: RDX: 000000000000000f RSI: 00005612c8793640 RDI: 0000000000000001
Mar 02 15:06:31 ben-x1 kernel: RBP: 00005612c8793640 R08: 000000000000000a R09: 0000000000000008
Mar 02 15:06:31 ben-x1 kernel: R10: 00005612c90f56b0 R11: 0000000000000246 R12: 000000000000000f
Mar 02 15:06:31 ben-x1 kernel: R13: 00007fba66b7a500 R14: 000000000000000f R15: 00007fba66b7a700
Mar 02 15:06:31 ben-x1 kernel: Modules linked in: uinput ccm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp rfcomm tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtab>
Mar 02 15:06:31 ben-x1 kernel:  snd_hda_codec_hdmi mc snd_hda_codec_conexant snd_hda_codec_generic snd_compress ac97_bus snd_pcm_dmaengine cfg80211 snd_hda_intel ecdh_generic ecc snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep irqbypass intel_cstate snd_seq intel_uncore snd_>
Mar 02 15:06:31 ben-x1 kernel: CR2: 0000000000000138
Mar 02 15:06:31 ben-x1 kernel: ---[ end trace d1bdee4e9a482594 ]---

* Re: [BUG] NULL pointer de-ref when setting io.cost.qos on LUKS devices
       [not found] ` <1dbdcbb0c8db70a08aac467311a80abcf7779575.camel-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
@ 2020-03-03 14:19   ` Tejun Heo
       [not found]     ` <20200303141902.GB189690-146+VewaZzwNjtGbbfXrCEEOCMrvLtNR@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2020-03-03 14:19 UTC (permalink / raw)
  To: Benjamin Berg; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

Hello,

On Tue, Mar 03, 2020 at 03:13:10PM +0100, Benjamin Berg wrote:
> so, I tried to set io.latency for some cgroups for the root device,
> which is ext4 inside LVM inside LUKS.

It's pointless on compound devices. I think the right thing to do here
is to disallow enabling it on those devices.

Thanks.

-- 
tejun

* Re: [BUG] NULL pointer de-ref when setting io.cost.qos on LUKS devices
       [not found]     ` <20200303141902.GB189690-146+VewaZzwNjtGbbfXrCEEOCMrvLtNR@public.gmane.org>
@ 2020-03-03 14:40       ` Benjamin Berg
       [not found]         ` <24bd31cdaa3ea945908bc11cea05d6aae6929240.camel-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Benjamin Berg @ 2020-03-03 14:40 UTC (permalink / raw)
  To: Tejun Heo; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

On Tue, 2020-03-03 at 09:19 -0500, Tejun Heo wrote:
> Hello,
> 
> On Tue, Mar 03, 2020 at 03:13:10PM +0100, Benjamin Berg wrote:
> > so, I tried to set io.latency for some cgroups for the root device,
> > which is ext4 inside LVM inside LUKS.
> 
> It's pointless on compound devices. I think the right thing to do here
> is to disallow enabling it on those devices.

I believe systemd tries to resolve to /dev/sda but that seems to fail
for me. So I think there is a bug in that code; I'll verify that and
submit a fix if so.

Which device should actually be selected? Is it /dev/sda or the mapper
device that / is mounted from?

Benjamin

* Re: [BUG] NULL pointer de-ref when setting io.cost.qos on LUKS devices
       [not found]         ` <24bd31cdaa3ea945908bc11cea05d6aae6929240.camel-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
@ 2020-03-04 16:42           ` Tejun Heo
       [not found]             ` <20200304164205.GH189690-146+VewaZzwNjtGbbfXrCEEOCMrvLtNR@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2020-03-04 16:42 UTC (permalink / raw)
  To: Benjamin Berg; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

Hello, Benjamin.

On Tue, Mar 03, 2020 at 03:40:38PM +0100, Benjamin Berg wrote:
> I believe systemd tries to resolve to /dev/sda but that seems to fail
> for me. So I think there is a bug in that code; I'll verify that and
> submit a fix if so.
> 
> Which device should actually be selected? Is it /dev/sda or the mapper
> device that / is mounted from?

Right now, the situation isn't great with dm. When pagecache
writebacks go through dm, in some cases including dm-crypt, the cgroup
ownership information is completely lost and all writes end up being
issued as the root cgroup, so it breaks down when dm is in use.

In the longer term, what we wanna do is control at the physical
devices (sda here) and then update dm so that it can maintain and
propagate the ownership correctly, but we aren't there yet.
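
As a rough illustration of "controlling at physical devices", userspace
can walk a stacked device down to the disks underneath it through the
sysfs "slaves" directories. This is only a sketch under assumptions: the
function name physical_disks and the SYSFS_ROOT override are
hypothetical, readlink -f is assumed to be available, and this is not
what systemd or the kernel actually does:

```shell
#!/bin/sh
# Print the whole disks underneath a block device such as "dm-1".
# The body runs in a subshell "( ... )" so variables stay local
# across the recursion. SYSFS_ROOT is a hypothetical test override.
physical_disks() (
    dir="${SYSFS_ROOT:-/sys}/class/block/$1"
    [ -d "$dir" ] || exit 1
    if [ -d "$dir/slaves" ] && [ -n "$(ls -A "$dir/slaves")" ]; then
        # Stacked device (dm, md, ...): recurse into each lower device.
        for s in "$dir"/slaves/*; do
            physical_disks "$(basename "$s")"
        done
    elif [ -e "$dir/partition" ]; then
        # Partition: its sysfs directory sits inside the parent disk's.
        basename "$(dirname "$(readlink -f "$dir")")"
    else
        # Whole disk with nothing underneath.
        echo "$1"
    fi
)
```

A device reachable through several paths is printed once per path, so
callers may want to pipe the output through "sort -u".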

Thanks.

-- 
tejun

* Re: [BUG] NULL pointer de-ref when setting io.cost.qos on LUKS devices
       [not found]             ` <20200304164205.GH189690-146+VewaZzwNjtGbbfXrCEEOCMrvLtNR@public.gmane.org>
@ 2020-03-05 10:31               ` Benjamin Berg
       [not found]                 ` <71515f7a143937ab9ab11625485659bb7288f024.camel-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Benjamin Berg @ 2020-03-05 10:31 UTC (permalink / raw)
  To: Tejun Heo; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed, 2020-03-04 at 11:42 -0500, Tejun Heo wrote:
> [SNIP]
> 
> Right now, the situation isn't great with dm. When pagecache
> writebacks go through dm, in some cases including dm-crypt, the cgroup
> ownership information is completely lost and all writes end up being
> issued as the root cgroup, so it breaks down when dm is in use.

Fair enough.

> In the longer term, what we wanna do is control at the physical
> devices (sda here) and then update dm so that it can maintain and
> propagate the ownership correctly, but we aren't there yet.

Perfect, so what I am seeing is really just a small systemd bug. Thanks
for confirming, I'll submit a patch to fix it.

Benjamin

* Re: [BUG] NULL pointer de-ref when setting io.cost.qos on LUKS devices
       [not found]                 ` <71515f7a143937ab9ab11625485659bb7288f024.camel-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
@ 2020-03-05 15:20                   ` Tejun Heo
  0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2020-03-05 15:20 UTC (permalink / raw)
  To: Benjamin Berg; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

Hello,

On Thu, Mar 05, 2020 at 11:31:29AM +0100, Benjamin Berg wrote:
> > In the longer term, what we wanna do is control at the physical
> > devices (sda here) and then update dm so that it can maintain and
> > propagate the ownership correctly, but we aren't there yet.
> 
> Perfect, so what I am seeing is really just a small systemd bug. Thanks
> for confirming, I'll submit a patch to fix it.

IO control is a bit confusing right now. Here's the breakdown.

* There are four controllers - io.latency, io.cost, io.max and bfq's
  weight implementation.

* io.latency and io.cost when combined with btrfs can control all IOs
  including metadata IOs and writebacks while avoiding priority
  inversions.

* wbt may interfere with IO control. It can be disabled with "echo 0 >
  /sys/block/DEV/queue/wbt_lat_usec".

* io.latency is useful to protect one thing against everything else
  but it gets tricky when multiple entities compete at different
  priority levels.

* io.cost is what we're verifying against and deploying. While
  system-level configuration is a bit involved
  (/sys/fs/cgroup/io.cost.model and /sys/fs/cgroup/io.cost.qos),
  actual cgroup configuration is really simple. Simply enabling the IO
  controller and leaving all weights at default often can achieve most
  of what's needed.
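
The steps in the list above could be scripted roughly as follows. This
is only a sketch: the function names and the SYSFS_ROOT / CGROUP_ROOT
overrides are hypothetical, and only the bare "enable=1" setting seen
earlier in this thread is written, with no model or QoS tuning:

```shell
#!/bin/sh
# Sketch of the system-level steps from the list above.
# SYSFS_ROOT and CGROUP_ROOT are hypothetical overrides for dry runs;
# they default to the real /sys and /sys/fs/cgroup.

# Turn off writeback throttling (wbt) for one disk, e.g. "sda".
disable_wbt() {
    echo 0 > "${SYSFS_ROOT:-/sys}/block/$1/queue/wbt_lat_usec"
}

# Enable io.cost QoS for a device number, e.g. "8:0".
enable_iocost() {
    echo "$1 enable=1" > "${CGROUP_ROOT:-/sys/fs/cgroup}/io.cost.qos"
}

# Enable the IO controller for child cgroups of the root cgroup.
enable_io_controller() {
    echo "+io" > "${CGROUP_ROOT:-/sys/fs/cgroup}/cgroup.subtree_control"
}
```

Given the oops reported at the top of this thread, the device number
passed to enable_iocost should be that of the physical disk (8:0 in the
report), not a dm device.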

Thanks.

-- 
tejun
