* uml segfault during I/O
@ 2019-10-17 11:37 Ritesh Raj Sarraf
  [not found] ` <1ccf27d8-6b6a-7d08-acef-93077f07511b@cambridgegreys.com>
  0 siblings, 1 reply; 12+ messages in thread

From: Ritesh Raj Sarraf @ 2019-10-17 11:37 UTC (permalink / raw)
To: linux-um; +Cc: Anton Ivanov

[-- Attachment #1.1: Type: text/plain, Size: 2580 bytes --]

This happens on the 5.2 Linux kernel with Debian patches on top. The
config details are:

Linux Host: 5.2 (Debian)
Linux UML:  5.2 (Debian)

This tends to happen very easily when doing a good amount of I/O, in
this case an `apt upgrade`:

[  OK  ] Started Update UTMP about System Runlevel Changes.
NET: Registered protocol family 10
Segment Routing with IPv6
random: crng init done
Modules linked in: ipv6 crc_ccitt
Pid: 870, comm: kworker/0:1H Not tainted 5.2.17
RIP: 0033:[<00000000607e9c03>]
RSP: 000000009d40fde0  EFLAGS: 00010297
RAX: 0000000000000000 RBX: 0000000000004801 RCX: 000000009d40fe30
RDX: 0000000000000001 RSI: 000000009dbc35c0 RDI: 0000000000000000
RBP: 000000009d40fe30 R08: 0000000c00000204 R09: 8080808080808080
R10: fefefefefefefeff R11: 0000000000000246 R12: 0000000000000000
R13: 000000009dbc35c0 R14: 000000009dbc3608 R15: 000000009dbb9800
Kernel panic - not syncing: Segfault with no mm
CPU: 0 PID: 870 Comm: kworker/0:1H Not tainted 5.2.17 #2
Workqueue: kblockd blk_mq_requeue_work
Stack:
 00000000 100000000000001 607d4425 00000001
 9dbc35c0 00000000 00000000 00000000
 9dbb9800 607d44e9 9d40fe30 9d40fe30
Call Trace:
 [<607d4425>] ? blk_mq_sched_insert_request+0x0/0xff
 [<607d44e9>] ? blk_mq_sched_insert_request+0xc4/0xff
 [<607d4425>] ? blk_mq_sched_insert_request+0x0/0xff
 [<607d07c6>] ? blk_mq_requeue_work+0xd0/0x133
 [<60434285>] ? process_one_work+0x1e4/0x34c
 [<60431fd0>] ? move_linked_works+0x0/0x4f
 [<609791f0>] ? __schedule+0x0/0x41d
 [<6043569f>] ? wq_worker_running+0xd/0x2f
 [<60431fd0>] ? move_linked_works+0x0/0x4f
 [<60435362>] ? worker_thread+0x324/0x45e
 [<6043215f>] ? set_pf_worker+0x0/0x5e
 [<604167c7>] ? get_signals+0x0/0xa
 [<60439485>] ? __kthread_parkme+0x4a/0x94
 [<60421a53>] ? do_exit+0x0/0x948
 [<6043503e>] ? worker_thread+0x0/0x45e
 [<604399aa>] ? kthread+0x198/0x1a0
 [<603fea08>] ? new_thread_handler+0x82/0xb8

/home/rrs/bin/uml-debian: line 8: 16743 Aborted (core dumped)
linux ubd0=~/rrs-home/Libvirt-Images/uml.img eth0=daemon mem=1024M rw
16:58 ♒ ॐ ☹ 😟=> 134

And the `linux` processes remain stray on the host machine:

rrs 18159 0.0 0.0 1057692 11496 ? Ss 16:57 0:00 linux ubd0=/home/rrs/rrs-home/Libvirt-Images/uml.img eth0=daemon mem=1024M rw
rrs 18187 0.0 0.0 1057704 11496 ? Ss 16:57 0:00 linux ubd0=/home/rrs/rrs-home/Libvirt-Images/uml.img eth0=daemon mem=1024M rw

--
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
[-- Attachment #2: Type: text/plain, Size: 152 bytes --]

_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um

^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <1ccf27d8-6b6a-7d08-acef-93077f07511b@cambridgegreys.com>]
* Re: uml segfault during I/O
  [not found] ` <1ccf27d8-6b6a-7d08-acef-93077f07511b@cambridgegreys.com>
@ 2019-10-17 13:30 ` Ritesh Raj Sarraf
  2019-10-17 13:33   ` Anton Ivanov
  0 siblings, 1 reply; 12+ messages in thread

From: Ritesh Raj Sarraf @ 2019-10-17 13:30 UTC (permalink / raw)
To: Anton Ivanov, linux-um

On Thu, 2019-10-17 at 14:02 +0100, Anton Ivanov wrote:
> Looking into that. I have not run into anything like that, but I have
> not used any of the legacy networking for 5 odd years now.

Do you think this is related to networking ? I ask because there was no
network activity going on.

apt is just an example. The packages were all already downloaded and
there was no network activity. Rather, there was block I/O.

One thing I noticed, which may or may not be useful to this report. I
was booting the uml guest and immediately logging into it and running
the block I/O. And the segfault would occur.

If I, instead, booted the uml vm and let it remain idle for a minute or
so and then did the I/O, it worked fine. So I am not sure if there is
any lazy initialization happening in the background which gets
corrupted during immediate hot boot I/O.

On the other hand, if you think there can be any number of commands to
run locally that could generate more information, please assist me so.

Thanks,
Ritesh

--
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."
* Re: uml segfault during I/O
  2019-10-17 13:30 ` Ritesh Raj Sarraf
@ 2019-10-17 13:33 ` Anton Ivanov
  2019-10-17 15:03   ` Anton Ivanov
  0 siblings, 1 reply; 12+ messages in thread

From: Anton Ivanov @ 2019-10-17 13:33 UTC (permalink / raw)
To: rrs, linux-um

On 17/10/2019 14:30, Ritesh Raj Sarraf wrote:
> On Thu, 2019-10-17 at 14:02 +0100, Anton Ivanov wrote:
>> Looking into that. I have not run into anything like that, but I have
>> not used any of the legacy networking for 5 odd years now.
>
> Do you think this is related to networking ? I ask because there was no
> network activity going on.

no, it's disk somewhere. I managed to reproduce it with 5.2 stock on
Debian 5.2 host.

> apt is just an example. The packages were all already downloaded and
> there was no network activity. Rather, there was block I/O.

I am looking into that.

> One thing I noticed, which may or may not be useful to this report. I
> was booting the uml guest and immediately logging into it and running
> the block I/O. And the segfault would occur.
>
> If I, instead, booted the uml vm and let it remain idle for a minute or
> so and then did the I/O, it worked fine. So I am not sure if there is
> any lazy initialization happening in the background which gets
> corrupted during immediate hot boot I/O.

Interesting...

> On the other hand, if you think there can be any number of commands to
> run locally that could generate more information, please assist me so.

As I said - I managed to reproduce it, I am looking at it. In first
instance I am trying with a couple of version up/down to see if this is
5.2 specific.

> Thanks,
> Ritesh

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
* Re: uml segfault during I/O
  2019-10-17 13:33 ` Anton Ivanov
@ 2019-10-17 15:03 ` Anton Ivanov
  2019-10-17 16:29   ` Anton Ivanov
  0 siblings, 1 reply; 12+ messages in thread

From: Anton Ivanov @ 2019-10-17 15:03 UTC (permalink / raw)
To: rrs, linux-um

On 17/10/2019 14:33, Anton Ivanov wrote:
> On 17/10/2019 14:30, Ritesh Raj Sarraf wrote:
>> On Thu, 2019-10-17 at 14:02 +0100, Anton Ivanov wrote:
>>> Looking into that. I have not run into anything like that, but I have
>>> not used any of the legacy networking for 5 odd years now.
>>
>> Do you think this is related to networking ? I ask because there was no
>> network activity going on.
>
> no, it's disk somewhere. I managed to reproduce it with 5.2 stock on
> Debian 5.2 host.
>
>> apt is just an example. The packages were all already downloaded and
>> there was no network activity. Rather, there was block I/O.
>
> I am looking into that.
>
>> One thing I noticed, which may or may not be useful to this report. I
>> was booting the uml guest and immediately logging into it and running
>> the block I/O. And the segfault would occur.
>>
>> If I, instead, booted the uml vm and let it remain idle for a minute or
>> so and then did the I/O, it worked fine. So I am not sure if there is
>> any lazy initialization happening in the background which gets
>> corrupted during immediate hot boot I/O.
>
> Interesting...
>
>> On the other hand, if you think there can be any number of commands to
>> run locally that could generate more information, please assist me so.
>
> As I said - I managed to reproduce it, I am looking at it. In first
> instance I am trying with a couple of version up/down to see if this is
> 5.2 specific.

I cannot even get it to start on 5.4-rc1, 5.3 shows the same symptoms.

>> Thanks,
>> Ritesh

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
* Re: uml segfault during I/O
  2019-10-17 15:03 ` Anton Ivanov
@ 2019-10-17 16:29 ` Anton Ivanov
  2019-10-18  7:35   ` Anton Ivanov
  0 siblings, 1 reply; 12+ messages in thread

From: Anton Ivanov @ 2019-10-17 16:29 UTC (permalink / raw)
To: rrs, linux-um

On 17/10/2019 16:03, Anton Ivanov wrote:
> On 17/10/2019 14:33, Anton Ivanov wrote:
>> On 17/10/2019 14:30, Ritesh Raj Sarraf wrote:
>>> On Thu, 2019-10-17 at 14:02 +0100, Anton Ivanov wrote:
>>>> Looking into that. I have not run into anything like that, but I have
>>>> not used any of the legacy networking for 5 odd years now.
>>>
>>> Do you think this is related to networking ? I ask because there was no
>>> network activity going on.
>>
>> no, it's disk somewhere. I managed to reproduce it with 5.2 stock on
>> Debian 5.2 host.
>>
>>> apt is just an example. The packages were all already downloaded and
>>> there was no network activity. Rather, there was block I/O.
>>
>> I am looking into that.
>>
>>> One thing I noticed, which may or may not be useful to this report. I
>>> was booting the uml guest and immediately logging into it and running
>>> the block I/O. And the segfault would occur.
>>>
>>> If I, instead, booted the uml vm and let it remain idle for a minute or
>>> so and then did the I/O, it worked fine. So I am not sure if there is
>>> any lazy initialization happening in the background which gets
>>> corrupted during immediate hot boot I/O.
>>
>> Interesting...
>>
>>> On the other hand, if you think there can be any number of commands to
>>> run locally that could generate more information, please assist me so.
>>
>> As I said - I managed to reproduce it, I am looking at it. In first
>> instance I am trying with a couple of version up/down to see if this
>> is 5.2 specific.
>
> I cannot even get it to start on 5.4-rc1, 5.3 shows the same symptoms.

This is something outside the UBD driver as such. There were no notable
changes to it since we ported it to block-mq and added DISCARD which was
quite a while back.

I am going to check the other usual suspects such as IRQs, but that is
something I test quite extensively when working on the networking side.

So I suspect that this is something outside UML which is showing only in
UML for some reason.

A.

>>> Thanks,
>>> Ritesh

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
* Re: uml segfault during I/O
  2019-10-17 16:29 ` Anton Ivanov
@ 2019-10-18  7:35 ` Anton Ivanov
  2019-10-18  8:57   ` Johannes Berg
  2019-10-18  9:55   ` Anton Ivanov
  0 siblings, 2 replies; 12+ messages in thread

From: Anton Ivanov @ 2019-10-18 7:35 UTC (permalink / raw)
To: rrs, linux-um

On 17/10/2019 17:29, Anton Ivanov wrote:
> On 17/10/2019 16:03, Anton Ivanov wrote:
>> On 17/10/2019 14:33, Anton Ivanov wrote:
>>> On 17/10/2019 14:30, Ritesh Raj Sarraf wrote:
>>>> On Thu, 2019-10-17 at 14:02 +0100, Anton Ivanov wrote:
>>>>> Looking into that. I have not run into anything like that, but I have
>>>>> not used any of the legacy networking for 5 odd years now.
>>>>
>>>> Do you think this is related to networking ? I ask because there was no
>>>> network activity going on.
>>>
>>> no, it's disk somewhere. I managed to reproduce it with 5.2 stock on
>>> Debian 5.2 host.
>>>
>>>> apt is just an example. The packages were all already downloaded and
>>>> there was no network activity. Rather, there was block I/O.
>>>
>>> I am looking into that.
>>>
>>>> One thing I noticed, which may or may not be useful to this report. I
>>>> was booting the uml guest and immediately logging into it and running
>>>> the block I/O. And the segfault would occur.
>>>>
>>>> If I, instead, booted the uml vm and let it remain idle for a minute or
>>>> so and then did the I/O, it worked fine. So I am not sure if there is
>>>> any lazy initialization happening in the background which gets
>>>> corrupted during immediate hot boot I/O.
>>>
>>> Interesting...
>>>
>>>> On the other hand, if you think there can be any number of commands to
>>>> run locally that could generate more information, please assist me so.
>>>
>>> As I said - I managed to reproduce it, I am looking at it. In first
>>> instance I am trying with a couple of version up/down to see if this
>>> is 5.2 specific.
>>
>> I cannot even get it to start on 5.4-rc1, 5.3 shows the same symptoms.
>
> This is something outside the UBD driver as such. There were no notable
> changes to it since we ported it to block-mq and added DISCARD which was
> quite a while back.
>
> I am going to check the other usual suspects such as IRQs, but that is
> something I test quite extensively when working on the networking side.
>
> So I suspect that this is something outside UML which is showing only in
> UML for some reason.

Still happening with 5.2.21, albeit a bit more difficult to reproduce.

Looking at the backtraces it is ALWAYS a result of a re-queue.

[ 263.990000] [<60440633>] blk_mq_dispatch_rq_list+0xf3/0x5f0
[ 263.990000] [<60440501>] ? blk_mq_get_driver_tag+0xc1/0x100
[ 263.990000] [<60440540>] ? blk_mq_dispatch_rq_list+0x0/0x5f0
[ 263.990000] [<60445966>] blk_mq_do_dispatch_sched+0x66/0xe0
[ 263.990000] [<60446107>] blk_mq_sched_dispatch_requests+0x107/0x190
[ 263.990000] [<6043f130>] ? blk_mq_run_hw_queue+0x0/0x120
[ 263.990000] [<6043efb4>] __blk_mq_run_hw_queue+0x74/0xd0
[ 263.990000] [<6043f0d4>] __blk_mq_delay_run_hw_queue+0xc4/0xd0
[ 263.990000] [<6043f1d9>] blk_mq_run_hw_queue+0xa9/0x120
[ 263.990000] [<6043f292>] blk_mq_run_hw_queues+0x42/0x60
[ 263.990000] [<60440bc0>] ? blk_mq_request_bypass_insert+0x0/0x90
[ 263.990000] [<604462a0>] ? blk_mq_sched_insert_request+0x0/0x1c0
ALWAYS >>[ 263.990000] [<604414b0>] blk_mq_requeue_work+0x160/0x170
[ 263.990000] [<6008c91b>] process_one_work+0x1eb/0x490
[ 263.990000] [<607f5aa0>] ? __schedule+0x0/0x620
[ 263.990000] [<6008e000>] ? wq_worker_running+0x10/0x40
[ 263.990000] [<6008c730>] ? process_one_work+0x0/0x490
[ 263.990000] [<6008cc06>] worker_thread+0x46/0x670
[ 263.990000] [<600931c1>] ? __kthread_parkme+0xa1/0xd0
[ 263.990000] [<6008cbc0>] ? worker_thread+0x0/0x670
[ 263.990000] [<6008cbc0>] ? worker_thread+0x0/0x670
[ 263.990000] [<60093bc4>] kthread+0x194/0x1c0
[ 263.990000] [<6002a091>] new_thread_handler+0x81/0xc0

A.

> A.
>
>>>> Thanks,
>>>> Ritesh

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
* Re: uml segfault during I/O
  2019-10-18  7:35 ` Anton Ivanov
@ 2019-10-18  8:57 ` Johannes Berg
  2019-10-18  9:13   ` Anton Ivanov
  1 sibling, 1 reply; 12+ messages in thread

From: Johannes Berg @ 2019-10-18 8:57 UTC (permalink / raw)
To: Anton Ivanov, rrs, linux-um

On Fri, 2019-10-18 at 08:35 +0100, Anton Ivanov wrote:
>
> Still happening with 5.2.21, albeit a bit more difficult to reproduce.

Just randomly reviewing the code, isn't there a bug in io_thread()?

--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -1602,7 +1602,8 @@ int io_thread(void *arg)
 		written = 0;

 		do {
-			res = os_write_file(kernel_fd, ((char *) io_req_buffer) + written, n);
+			res = os_write_file(kernel_fd, ((char *) io_req_buffer) + written,
+					    n - written);
 			if (res >= 0) {
 				written += res;
 			}

johannes
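[Editorial note] The off-by-`written` bug in the patch above is easy to demonstrate outside the kernel. Below is a hedged userspace sketch, not the ubd code itself: `short_write()` is an invented stand-in for `os_write_file()` that, like `write(2)` on a pipe, may complete only part of a request. The loop retries with the remaining length `n - written`, as in the patch; the pre-patch code passed the full `n` on every retry, so after a short write it would read past the end of the buffer and write too many bytes in total.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Invented stand-in for os_write_file(): like write(2) on a pipe it may
 * perform a short write; here it copies at most 4 bytes per call. */
static char sink[64];
static size_t sink_len;

static long short_write(const char *buf, size_t len)
{
	size_t chunk = len < 4 ? len : 4;

	memcpy(sink + sink_len, buf, chunk);
	sink_len += chunk;
	return (long)chunk;
}

/* The corrected loop from the patch: every retry starts at buf + written
 * and asks only for the remaining n - written bytes. */
static size_t write_all(const char *buf, size_t n)
{
	size_t written = 0;
	long res;

	do {
		res = short_write(buf + written, n - written);
		if (res >= 0)
			written += (size_t)res;
	} while (written < n && res >= 0);

	return written;
}
```

With the pre-patch argument (`n` instead of `n - written`) the same loop would pass a correct offset but an over-long length, so the final iteration reads beyond the request buffer and the total written exceeds `n`.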
* Re: uml segfault during I/O
  2019-10-18  8:57 ` Johannes Berg
@ 2019-10-18  9:13 ` Anton Ivanov
  2019-10-18  9:49   ` Anton Ivanov
  0 siblings, 1 reply; 12+ messages in thread

From: Anton Ivanov @ 2019-10-18 9:13 UTC (permalink / raw)
To: Johannes Berg, rrs, linux-um

On 18/10/2019 09:57, Johannes Berg wrote:
> On Fri, 2019-10-18 at 08:35 +0100, Anton Ivanov wrote:
>>
>> Still happening with 5.2.21, albeit a bit more difficult to reproduce.
>
> Just randomly reviewing the code, isn't there a bug in io_thread()?

Indeed. Well spotted. It is good that a short write to a block device is
so rare :)

> --- a/arch/um/drivers/ubd_kern.c
> +++ b/arch/um/drivers/ubd_kern.c
> @@ -1602,7 +1602,8 @@ int io_thread(void *arg)
>  		written = 0;
>
>  		do {
> -			res = os_write_file(kernel_fd, ((char *) io_req_buffer) + written, n);
> +			res = os_write_file(kernel_fd, ((char *) io_req_buffer) + written,
> +					    n - written);
>  			if (res >= 0) {
>  				written += res;
>  			}
>
> johannes

I have verified this.

It is always in a requeue and always after the io thread has returned
EAGAIN on the attempt to submit a request.

That part of the code has been tested very heavily before and it works
fine in 4.19

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
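[Editorial note] Anton's observation that the crash always follows an EAGAIN from the I/O thread points at the requeue path: in a blk-mq driver, a transient submission failure is reported to the block layer as a resource-type error so the request is requeued and retried later, which is exactly the `blk_mq_requeue_work` path in the backtraces. A hedged model of that decision follows; the enum and function are illustrative stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins for block-layer status codes: OK means the
 * request was accepted, DEV_RESOURCE asks the block layer to requeue
 * and retry later, IOERR fails the request outright. */
enum blk_status {
	BLK_STS_OK = 0,
	BLK_STS_DEV_RESOURCE,
	BLK_STS_IOERR,
};

/* Model of a queue_rq-style decision: a full submission pipe (-EAGAIN)
 * is a transient condition, so the request is requeued rather than
 * failed; any other error is treated as fatal for the request. */
static enum blk_status submit_result_to_status(int res)
{
	if (res >= 0)
		return BLK_STS_OK;
	if (res == -EAGAIN)
		return BLK_STS_DEV_RESOURCE;
	return BLK_STS_IOERR;
}
```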
* Re: uml segfault during I/O
  2019-10-18  9:13 ` Anton Ivanov
@ 2019-10-18  9:49 ` Anton Ivanov
  0 siblings, 0 replies; 12+ messages in thread

From: Anton Ivanov @ 2019-10-18 9:49 UTC (permalink / raw)
To: Johannes Berg, rrs, linux-um

On 18/10/2019 10:13, Anton Ivanov wrote:
> On 18/10/2019 09:57, Johannes Berg wrote:
>> On Fri, 2019-10-18 at 08:35 +0100, Anton Ivanov wrote:
>>>
>>> Still happening with 5.2.21, albeit a bit more difficult to reproduce.
>>
>> Just randomly reviewing the code, isn't there a bug in io_thread()?
>
> Indeed. Well spotted. It is good that a short write to a block device is
> so rare :)
>
>> --- a/arch/um/drivers/ubd_kern.c
>> +++ b/arch/um/drivers/ubd_kern.c
>> @@ -1602,7 +1602,8 @@ int io_thread(void *arg)
>>  		written = 0;
>>
>>  		do {
>> -			res = os_write_file(kernel_fd, ((char *) io_req_buffer) + written, n);
>> +			res = os_write_file(kernel_fd, ((char *) io_req_buffer) + written,
>> +					    n - written);
>>  			if (res >= 0) {
>>  				written += res;
>>  			}
>>
>> johannes
>
> I have verified this.
>
> It is always in a requeue and always after the io thread has returned
> EAGAIN on the attempt to submit a request.
>
> That part of the code has been tested very heavily before and it works
> fine in 4.19

Applied and it is not it.

This is something in the blk-mq code and we should forward the whole
thread to the blk_mq maintainers.

A

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
* Re: uml segfault during I/O
  2019-10-18  7:35 ` Anton Ivanov
  2019-10-18  8:57   ` Johannes Berg
@ 2019-10-18  9:55 ` Anton Ivanov
  2019-10-18 10:51   ` Anton Ivanov
  1 sibling, 1 reply; 12+ messages in thread

From: Anton Ivanov @ 2019-10-18 9:55 UTC (permalink / raw)
To: rrs, linux-um; +Cc: axboe, hch

Adding Jens and Christoph

On 18/10/2019 08:35, Anton Ivanov wrote:
> On 17/10/2019 17:29, Anton Ivanov wrote:
>> On 17/10/2019 16:03, Anton Ivanov wrote:
>>> On 17/10/2019 14:33, Anton Ivanov wrote:
>>>> On 17/10/2019 14:30, Ritesh Raj Sarraf wrote:
>>>>> On Thu, 2019-10-17 at 14:02 +0100, Anton Ivanov wrote:
>>>>>> Looking into that. I have not run into anything like that, but I have
>>>>>> not used any of the legacy networking for 5 odd years now.
>>>>>
>>>>> Do you think this is related to networking ? I ask because there was no
>>>>> network activity going on.
>>>>
>>>> no, it's disk somewhere. I managed to reproduce it with 5.2 stock on
>>>> Debian 5.2 host.
>>>>
>>>>> apt is just an example. The packages were all already downloaded and
>>>>> there was no network activity. Rather, there was block I/O.
>>>>
>>>> I am looking into that.
>>>>
>>>>> One thing I noticed, which may or may not be useful to this report. I
>>>>> was booting the uml guest and immediately logging into it and running
>>>>> the block I/O. And the segfault would occur.
>>>>>
>>>>> If I, instead, booted the uml vm and let it remain idle for a minute or
>>>>> so and then did the I/O, it worked fine. So I am not sure if there is
>>>>> any lazy initialization happening in the background which gets
>>>>> corrupted during immediate hot boot I/O.
>>>>
>>>> Interesting...
>>>>
>>>>> On the other hand, if you think there can be any number of commands to
>>>>> run locally that could generate more information, please assist me so.
>>>>
>>>> As I said - I managed to reproduce it, I am looking at it. In first
>>>> instance I am trying with a couple of version up/down to see if this
>>>> is 5.2 specific.
>>>
>>> I cannot even get it to start on 5.4-rc1, 5.3 shows the same symptoms.
>>
>> This is something outside the UBD driver as such. There were no notable
>> changes to it since we ported it to block-mq and added DISCARD which was
>> quite a while back.
>>
>> I am going to check the other usual suspects such as IRQs, but that is
>> something I test quite extensively when working on the networking side.
>>
>> So I suspect that this is something outside UML which is showing only in
>> UML for some reason.
>
> Still happening with 5.2.21, albeit a bit more difficult to reproduce.
>
> Looking at the backtraces it is ALWAYS a result of a re-queue.
>
> [ 263.990000] [<60440633>] blk_mq_dispatch_rq_list+0xf3/0x5f0
> [ 263.990000] [<60440501>] ? blk_mq_get_driver_tag+0xc1/0x100
> [ 263.990000] [<60440540>] ? blk_mq_dispatch_rq_list+0x0/0x5f0
> [ 263.990000] [<60445966>] blk_mq_do_dispatch_sched+0x66/0xe0
> [ 263.990000] [<60446107>] blk_mq_sched_dispatch_requests+0x107/0x190
> [ 263.990000] [<6043f130>] ? blk_mq_run_hw_queue+0x0/0x120
> [ 263.990000] [<6043efb4>] __blk_mq_run_hw_queue+0x74/0xd0
> [ 263.990000] [<6043f0d4>] __blk_mq_delay_run_hw_queue+0xc4/0xd0
> [ 263.990000] [<6043f1d9>] blk_mq_run_hw_queue+0xa9/0x120
> [ 263.990000] [<6043f292>] blk_mq_run_hw_queues+0x42/0x60
> [ 263.990000] [<60440bc0>] ? blk_mq_request_bypass_insert+0x0/0x90
> [ 263.990000] [<604462a0>] ? blk_mq_sched_insert_request+0x0/0x1c0
> ALWAYS >>[ 263.990000] [<604414b0>] blk_mq_requeue_work+0x160/0x170
> [ 263.990000] [<6008c91b>] process_one_work+0x1eb/0x490
> [ 263.990000] [<607f5aa0>] ? __schedule+0x0/0x620
> [ 263.990000] [<6008e000>] ? wq_worker_running+0x10/0x40
> [ 263.990000] [<6008c730>] ? process_one_work+0x0/0x490
> [ 263.990000] [<6008cc06>] worker_thread+0x46/0x670
> [ 263.990000] [<600931c1>] ? __kthread_parkme+0xa1/0xd0
> [ 263.990000] [<6008cbc0>] ? worker_thread+0x0/0x670
> [ 263.990000] [<6008cbc0>] ? worker_thread+0x0/0x670
> [ 263.990000] [<60093bc4>] kthread+0x194/0x1c0
> [ 263.990000] [<6002a091>] new_thread_handler+0x81/0xc0
>
> A.
>
>> A.
>
>>>>> Thanks,
>>>>> Ritesh

To put a long story short we have a reproducible segfault when UML
re-queues a block request in 5.x

This used to work in 4.x

We found a couple of minor things in UML when looking at it which we
will fix shortly, but the root problem seems to be either in block_mq or
in the way we are using it to re-queue requests.

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
* Re: uml segfault during I/O
  2019-10-18  9:55 ` Anton Ivanov
@ 2019-10-18 10:51 ` Anton Ivanov
  2019-10-28 16:51   ` Anton Ivanov
  0 siblings, 1 reply; 12+ messages in thread

From: Anton Ivanov @ 2019-10-18 10:51 UTC (permalink / raw)
To: rrs, linux-um; +Cc: axboe, hch

On 18/10/2019 10:55, Anton Ivanov wrote:
> Adding Jens and Christoph
>
> On 18/10/2019 08:35, Anton Ivanov wrote:
>> On 17/10/2019 17:29, Anton Ivanov wrote:
>>> On 17/10/2019 16:03, Anton Ivanov wrote:
>>>> On 17/10/2019 14:33, Anton Ivanov wrote:
>>>>> On 17/10/2019 14:30, Ritesh Raj Sarraf wrote:
>>>>>> On Thu, 2019-10-17 at 14:02 +0100, Anton Ivanov wrote:
>>>>>>> Looking into that. I have not run into anything like that, but I have
>>>>>>> not used any of the legacy networking for 5 odd years now.
>>>>>>
>>>>>> Do you think this is related to networking ? I ask because there was no
>>>>>> network activity going on.
>>>>>
>>>>> no, it's disk somewhere. I managed to reproduce it with 5.2 stock on
>>>>> Debian 5.2 host.
>>>>>
>>>>>> apt is just an example. The packages were all already downloaded and
>>>>>> there was no network activity. Rather, there was block I/O.
>>>>>
>>>>> I am looking into that.
>>>>>
>>>>>> One thing I noticed, which may or may not be useful to this report. I
>>>>>> was booting the uml guest and immediately logging into it and running
>>>>>> the block I/O. And the segfault would occur.
>>>>>>
>>>>>> If I, instead, booted the uml vm and let it remain idle for a minute or
>>>>>> so and then did the I/O, it worked fine. So I am not sure if there is
>>>>>> any lazy initialization happening in the background which gets
>>>>>> corrupted during immediate hot boot I/O.
>>>>>
>>>>> Interesting...
>>>>>
>>>>>> On the other hand, if you think there can be any number of commands to
>>>>>> run locally that could generate more information, please assist me so.
>>>>>
>>>>> As I said - I managed to reproduce it, I am looking at it. In first
>>>>> instance I am trying with a couple of version up/down to see if this
>>>>> is 5.2 specific.
>>>>
>>>> I cannot even get it to start on 5.4-rc1, 5.3 shows the same symptoms.
>>>
>>> This is something outside the UBD driver as such. There were no notable
>>> changes to it since we ported it to block-mq and added DISCARD which was
>>> quite a while back.
>>>
>>> I am going to check the other usual suspects such as IRQs, but that is
>>> something I test quite extensively when working on the networking side.
>>>
>>> So I suspect that this is something outside UML which is showing only
>>> in UML for some reason.
>>
>> Still happening with 5.2.21, albeit a bit more difficult to reproduce.
>>
>> Looking at the backtraces it is ALWAYS a result of a re-queue.
>>
>> [ 263.990000] [<60440633>] blk_mq_dispatch_rq_list+0xf3/0x5f0
>> [ 263.990000] [<60440501>] ? blk_mq_get_driver_tag+0xc1/0x100
>> [ 263.990000] [<60440540>] ? blk_mq_dispatch_rq_list+0x0/0x5f0
>> [ 263.990000] [<60445966>] blk_mq_do_dispatch_sched+0x66/0xe0
>> [ 263.990000] [<60446107>] blk_mq_sched_dispatch_requests+0x107/0x190
>> [ 263.990000] [<6043f130>] ? blk_mq_run_hw_queue+0x0/0x120
>> [ 263.990000] [<6043efb4>] __blk_mq_run_hw_queue+0x74/0xd0
>> [ 263.990000] [<6043f0d4>] __blk_mq_delay_run_hw_queue+0xc4/0xd0
>> [ 263.990000] [<6043f1d9>] blk_mq_run_hw_queue+0xa9/0x120
>> [ 263.990000] [<6043f292>] blk_mq_run_hw_queues+0x42/0x60
>> [ 263.990000] [<60440bc0>] ? blk_mq_request_bypass_insert+0x0/0x90
>> [ 263.990000] [<604462a0>] ? blk_mq_sched_insert_request+0x0/0x1c0
>> ALWAYS >>[ 263.990000] [<604414b0>] blk_mq_requeue_work+0x160/0x170
>> [ 263.990000] [<6008c91b>] process_one_work+0x1eb/0x490
>> [ 263.990000] [<607f5aa0>] ? __schedule+0x0/0x620
>> [ 263.990000] [<6008e000>] ? wq_worker_running+0x10/0x40
>> [ 263.990000] [<6008c730>] ? process_one_work+0x0/0x490
>> [ 263.990000] [<6008cc06>] worker_thread+0x46/0x670
>> [ 263.990000] [<600931c1>] ? __kthread_parkme+0xa1/0xd0
>> [ 263.990000] [<6008cbc0>] ? worker_thread+0x0/0x670
>> [ 263.990000] [<6008cbc0>] ? worker_thread+0x0/0x670
>> [ 263.990000] [<60093bc4>] kthread+0x194/0x1c0
>> [ 263.990000] [<6002a091>] new_thread_handler+0x81/0xc0
>>
>> A.
>>
>>> A.
>>
>>>>>> Thanks,
>>>>>> Ritesh
>
> To put a long story short we have a reproducible segfault when UML
> re-queues a block request in 5.x
>
> This used to work in 4.x
>
> We found a couple of minor things in UML when looking at it which we
> will fix shortly, but the root problem seems to be either in block_mq or
> in the way we are using it to re-queue requests.

My (probably wrong) guess is that it is something related to the changes
from 4.x to 5.x where blk-mq is no longer getting the hw context out of
the cpu map. For some reason it gets a null deref at some point. I may
be wrong of course.

I will not be able to pick this up again before Tuesday so if someone
can have a go before then it will be greatly appreciated.

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
* Re: uml segfault during I/O
  2019-10-18 10:51                 ` Anton Ivanov
@ 2019-10-28 16:51                   ` Anton Ivanov
  0 siblings, 0 replies; 12+ messages in thread
From: Anton Ivanov @ 2019-10-28 16:51 UTC (permalink / raw)
  To: rrs, linux-um; +Cc: axboe, hch

On 18/10/2019 11:51, Anton Ivanov wrote:
>
> On 18/10/2019 10:55, Anton Ivanov wrote:
>> Adding Jens and Christoph
>>
>> On 18/10/2019 08:35, Anton Ivanov wrote:
>>>
>>> On 17/10/2019 17:29, Anton Ivanov wrote:
>>>>
>>>> On 17/10/2019 16:03, Anton Ivanov wrote:
>>>>>
>>>>> On 17/10/2019 14:33, Anton Ivanov wrote:
>>>>>>
>>>>>> On 17/10/2019 14:30, Ritesh Raj Sarraf wrote:
>>>>>>> On Thu, 2019-10-17 at 14:02 +0100, Anton Ivanov wrote:
>>>>>>>> Looking into that. I have not run into anything like that, but
>>>>>>>> I have not used any of the legacy networking for 5-odd years now.
>>>>>>>
>>>>>>> Do you think this is related to networking? I ask because there
>>>>>>> was no network activity going on.
>>>>>>
>>>>>> No, it's disk somewhere. I managed to reproduce it with 5.2 stock
>>>>>> on a Debian 5.2 host.
>>>>>>
>>>>>>> apt is just an example. The packages were all already downloaded
>>>>>>> and there was no network activity. Rather, there was block I/O.
>>>>>>
>>>>>> I am looking into that.
>>>>>>
>>>>>>> One thing I noticed, which may or may not be useful to this
>>>>>>> report: I was booting the uml guest, immediately logging into it
>>>>>>> and running the block I/O, and the segfault would occur.
>>>>>>>
>>>>>>> If I, instead, booted the uml vm and let it remain idle for a
>>>>>>> minute or so and then did the I/O, it worked fine. So I am not
>>>>>>> sure if there is any lazy initialization happening in the
>>>>>>> background which gets corrupted by immediate I/O right after boot.
>>>>>>
>>>>>> Interesting...
>>>>>>
>>>>>>> On the other hand, if you think there are any commands I could
>>>>>>> run locally to generate more information, please let me know.
>>>>>>
>>>>>> As I said - I managed to reproduce it, I am looking at it. In
>>>>>> first instance I am trying with a couple of versions up/down to
>>>>>> see if this is 5.2 specific.

[...]

Unless I am mistaken, it should not re-queue. Xen does not:

https://elixir.bootlin.com/linux/latest/source/drivers/block/xen-blkfront.c#L910

Unless I am mistaken, if you return a form of BUSY, the upper layers
will re-queue for you.
That explains the crash: we have the same bio enqueued multiple times
when the device stalls.

I am testing a patch and, if it all tests out, I will submit it shortly.

-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

^ permalink raw reply	[flat|nested] 12+ messages in thread
end of thread, other threads:[~2019-10-28 17:21 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-17 11:37 uml segfault during I/O Ritesh Raj Sarraf
     [not found] ` <1ccf27d8-6b6a-7d08-acef-93077f07511b@cambridgegreys.com>
2019-10-17 13:30   ` Ritesh Raj Sarraf
2019-10-17 13:33     ` Anton Ivanov
2019-10-17 15:03       ` Anton Ivanov
2019-10-17 16:29         ` Anton Ivanov
2019-10-18  7:35           ` Anton Ivanov
2019-10-18  8:57             ` Johannes Berg
2019-10-18  9:13               ` Anton Ivanov
2019-10-18  9:49             ` Anton Ivanov
2019-10-18  9:55               ` Anton Ivanov
2019-10-18 10:51                 ` Anton Ivanov
2019-10-28 16:51                   ` Anton Ivanov