* Re: Changes in kernel 5.18-rc1 leads to crashes in VirtualBox Virtual Machines
  [not found] <SJ0PR10MB5742C02D9F7F146A1313BD5DE9CE9@SJ0PR10MB5742.namprd10.prod.outlook.com>
@ 2022-05-17 18:33 ` Larry Finger
  0 siblings, 0 replies; 8+ messages in thread
From: Larry Finger @ 2022-05-17 18:33 UTC
  To: Vadim Galitsin, larry.finger, Jason; +Cc: LKML

On 5/17/22 12:27, Vadim Galitsin wrote:
> Hi Larry and Jason,
>
> I am from the VirtualBox team. I noticed your conversation here:
>
> https://lore.kernel.org/lkml/Ym8uPcuQpq1xBS6d@zx2c4.com/T/#mea7aa731b5524a05ac3b3e8588c0c42235bb33d6
>
> Please let me add my 5c. I agree with Larry, the issue started happening
> after 6e8ec2552c7d. I did not do a complete bisection, but rather tried this
> revision and the one before (with dcd03ba15947cbad1a34cfed370c4feb41058469
> -- I do not see the issue).
>
> For me this issue is quite reproducible with an Ubuntu 20.04 Linux guest
> (other guests are also affected). It happens even if no VBox Guest Additions
> are installed in the guest. The guest kernel version does not play much of a
> role. Running kernel 5.18-rc1+ on the host side is essential.
>
> The first way for me to reproduce it is to run the stress-ng(1) tool inside
> the guest and perform random mouse cursor movements (basically, generating
> mouse or keyboard interrupts is somehow essential here). The tool will
> report the following error:
>
> root@test-VirtualBox:~# stress-ng --vm 4 -t 10
> stress-ng: info: [5463] dispatching hogs: 4 vm
> stress-ng: fail: [5464] stress-ng-vm: detected 194065152 bit errors while
> stressing memory
> stress-ng: error: [5463] process 5464 (stress-ng-vm) terminated with an error,
> exit status=1 (stress-ng core failure)
> stress-ng: info: [5463] unsuccessful run completed in 10.06s
>
> This approach does not work in 100% of cases, but it triggers the issue
> quite frequently.
>
> The second approach is much more reliable for me. I basically start
> compiling a kernel inside the guest (say, with make -j4) and start moving
> the mouse (or generate keyboard interrupts by pressing keys randomly). In
> this case, gcc processes will randomly receive SEGFAULT.
>
> Important note: if I do not touch the mouse or keyboard in both cases above
> -- all works as normal.
>
> My initial guess was that this might have something to do with kstack
> randomization, but booting the host kernel with randomize_kstack_offset=0
> does not seem to change anything in this regard.
>
> I am currently running out of ideas about what exactly might trigger such
> behavior. Hopefully, this additional info might shed additional light.
>
> Best regards,
> Vadim
>

Vadim,

I had an extended e-mail exchange with Jason Donenfeld over this issue. Sorry
that most of this was private because some large files needed to be
transmitted that were not appropriate for LKML. LKML is added back in to this
reply.

My test for the fault was to start a VM running Windows 10 and use Edge to
load the VirtualBox web page. Usually within a few seconds, Edge or Windows
would crash. In the latter case, the log for the VM might show an unhandled
exception while in kernel mode. I thought the browser was hitting the random
number generator hard, but there is mouse activity, of course.

Jason has created a patch entitled "random: do not use input pool from hard
IRQs" that fixes the problem for me. It can be found at
https://lore.kernel.org/lkml/20220510140025.81168-1-Jason@zx2c4.com/. I had
expected this patch to be merged into the mainline kernel by now. Jason should
be able to shed light on any delays.

The bottom line and good news for Oracle/VirtualBox and those of us that
package VB for distros is that this is a kernel regression - which is a
conclusion I hesitated to make earlier. It is not a problem with VirtualBox,
VB just exposes the kernel problem.

I certainly hope that this problem is fixed before 5.18 is released. If not, I
will need to campaign to prevent openSUSE Tumbleweed from switching to 5.18.
That would normally happen with the release of 5.18.1!

Larry
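When the stress-ng reproduction Vadim describes is repeated many times, the failure can be detected mechanically from the tool's output. The following is only a minimal sketch: it assumes the "fail:" line keeps the wording shown in his quoted run, and `bit_errors` is a hypothetical helper name, not part of stress-ng.

```python
import re

# Pattern based on the single "fail:" line quoted in this thread; other
# stress-ng versions may word the message differently (assumption).
# \s+ also matches the line break that mail wrapping put inside the message.
_BIT_ERRORS = re.compile(r"detected\s+(\d+)\s+bit\s+errors")


def bit_errors(output: str) -> int:
    """Return the total number of bit errors reported in stress-ng output."""
    return sum(int(n) for n in _BIT_ERRORS.findall(output))


sample = (
    "stress-ng: info: [5463] dispatching hogs: 4 vm\n"
    "stress-ng: fail: [5464] stress-ng-vm: detected 194065152 bit errors while\n"
    "stressing memory\n"
)
print(bit_errors(sample))
```

A run that reports no bit errors yields 0, so a wrapper loop could stop at the first nonzero result.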
* Changes in kernel 5.18-rc1 leads to crashes in VirtualBox Virtual Machines
@ 2022-05-01 17:26 Larry Finger
  2022-05-01 17:47 ` Jason A. Donenfeld
  0 siblings, 1 reply; 8+ messages in thread
From: Larry Finger @ 2022-05-01 17:26 UTC
  To: Jason A. Donenfeld; +Cc: LKML

Jason,

I maintain VirtualBox for openSUSE. When kernel 5.18-rc1 was released, I fixed
the usual set of API changes needed to compile the external kernel modules for
VB. Despite a clean compile, I am still getting random crashes in the VMs. For
Linux instances, the desktop disappears, but for Windows guests, the VM
crashes with unhandled kernel exceptions.

As I have no experience tracing such crashes, I decided to bisect the kernel
to find the commit that started these problems. Surprisingly, the bisection
pointed to commit 6e8ec2552c7d ("random: use computational hash for entropy
extraction"). I am very sure of the bisection as the kernel built from the
commit that immediately precedes this one, cfb92440ee71 - a tag commit by
Linus, runs correctly.

Note that I do not believe there is anything wrong with your changes to the
random number generators. It seems to be a problem with the way the emulator
is accessing them. The VirtualBox code is quite complicated, and I am no
expert with C++. Are there changes that would be required to the X86_64
emulator's access to the random number code as a result of your changes? I
have found places where the emulator accesses /dev/urandom or /dev/random.
There are also places that use the rdrand and rdseed instructions.

Thanks for reading this,

Larry
* Re: Changes in kernel 5.18-rc1 leads to crashes in VirtualBox Virtual Machines
  2022-05-01 17:26 Larry Finger
@ 2022-05-01 17:47 ` Jason A. Donenfeld
  2022-05-01 21:07   ` Larry Finger
  0 siblings, 1 reply; 8+ messages in thread
From: Jason A. Donenfeld @ 2022-05-01 17:47 UTC
  To: Larry Finger; +Cc: LKML

Hi Larry,

Thanks for the report. Several questions:

1) Can you reproduce with 5.18-rc4?

2) Can you send me a stacktrace from the crash or any relevant console
   output?

3) Does the crash happen in the guest or the host?

Question two is very important.

Thanks,
Jason
* Re: Changes in kernel 5.18-rc1 leads to crashes in VirtualBox Virtual Machines
  2022-05-01 17:47 ` Jason A. Donenfeld
@ 2022-05-01 21:07   ` Larry Finger
  2022-05-01 23:32     ` Jason A. Donenfeld
  0 siblings, 1 reply; 8+ messages in thread
From: Larry Finger @ 2022-05-01 21:07 UTC
  To: Jason A. Donenfeld; +Cc: LKML

On 5/1/22 12:47, Jason A. Donenfeld wrote:
> Hi Larry,
>
> Thanks for the report. Several questions:
>
> 1) Can you reproduce with 5.18-rc4?
>
> 2) Can you send me a stacktrace from the crash or any relevant console
> output?
>
> 3) Does the crash happen in the guest or the host?
>
> Question two is very important.

Jason,

1. Yes, the problem happens with 5.18-rc4 and -rc5.

3. The crash is in the guest. Nothing unusual is logged in the host.

2. My answer here will be incomplete. There are no stacktraces or console
output on the host from any of the guest crashes, either in dmesg or under
journalctl. The desktop just disappears. The VirtualBox log files show nothing
for the Linux guest, and the following for the Windows instance:

00:00:57.908011 GUI: UIMachineLogicNormal::sltCheckForRequestedVisualStateType: Requested-state=0, Machine-state=5
00:01:24.502961 GIM: HyperV: Guest indicates a fatal condition! P0=0x1e P1=0xffffffffc0000005 P2=0xfffff8054c61e97c P3=0x0 P4=0x28
00:01:24.503053 GIMHv: BugCheck 1e {ffffffffc0000005, fffff8054c61e97c, 0, 28}
00:01:24.503054 KMODE_EXCEPTION_NOT_HANDLED
00:01:24.503054 P1: ffffffffc0000005 - exception code - STATUS_ACCESS_VIOLATION
00:01:24.503054 P2: fffff8054c61e97c - EIP/RIP
00:01:24.503054 P3: 0000000000000000 - Xcpt param #0
00:01:24.503054 P4: 0000000000000028 - Xcpt param #1

Running a 3rd party dump analyzer shows that the crash happens at
ntoskrnl.exe+3f7d50. I have installed the Windows debugger, but I think the
learning curve will be steep. At this point, I have no further info available.

Thanks,

Larry
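The GIMHv line in the log above packs the bug check code and its four parameters into one record. As a rough illustration of how those fields relate to the decoded lines that follow it, here is a minimal sketch; `parse_bugcheck` is a hypothetical helper written for this example, and the only outside fact it relies on is that NTSTATUS values are 32 bits wide, so the log's 0xffffffffc0000005 is 0xC0000005 (STATUS_ACCESS_VIOLATION) sign-extended to 64 bits.

```python
import re

# NTSTATUS values are 32 bits wide; the log shows P1 sign-extended to 64 bits.
STATUS_ACCESS_VIOLATION = 0xC0000005


def parse_bugcheck(line: str):
    """Split a 'BugCheck <code> {p1, p2, p3, p4}' log line into integers."""
    m = re.search(r"BugCheck\s+([0-9a-fA-F]+)\s+\{([^}]*)\}", line)
    if m is None:
        raise ValueError("no BugCheck record in line")
    code = int(m.group(1), 16)
    params = [int(p.strip(), 16) for p in m.group(2).split(",")]
    return code, params


line = "00:01:24.503053 GIMHv: BugCheck 1e {ffffffffc0000005, fffff8054c61e97c, 0, 28}"
code, params = parse_bugcheck(line)
ntstatus = params[0] & 0xFFFFFFFF  # drop the sign extension
print(hex(code), hex(ntstatus), ntstatus == STATUS_ACCESS_VIOLATION)
```

The second parameter is the faulting instruction pointer and the last two are the exception parameters, matching the P2 to P4 lines VirtualBox prints itself.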
* Re: Changes in kernel 5.18-rc1 leads to crashes in VirtualBox Virtual Machines
  2022-05-01 21:07   ` Larry Finger
@ 2022-05-01 23:32     ` Jason A. Donenfeld
  2022-05-02  0:11       ` Jason A. Donenfeld
  0 siblings, 1 reply; 8+ messages in thread
From: Jason A. Donenfeld @ 2022-05-01 23:32 UTC
  To: Larry Finger; +Cc: LKML

Hi Larry,

On Sun, May 01, 2022 at 04:07:39PM -0500, Larry Finger wrote:
> 1. Yes, the problem happens with 5.18-rc4 and -rc5.

Do you still have your bisection logs handy? Something about this seems
a bit fishy to me, and it might be helpful.

> 2. My answer here will be incomplete. There are no stacktraces or console output

You're going to have to make it more complete somehow...

> on the host from any of the guest crashes, either in dmesg or under journalctl.
> The desktop just disappears. The VirtualBox log files show nothing for the Linux

What do you mean "just disappears"? What is the "desktop"? Do you mean
that the X server segfaults or something? Can you attach a debugger
somewhere and try again? There's got to be something you can do to get
more info.

> guest, and the following for the Windows instance:
>
> 00:00:57.908011 GUI: UIMachineLogicNormal::sltCheckForRequestedVisualStateType:
> Requested-state=0, Machine-state=5
> 00:01:24.502961 GIM: HyperV: Guest indicates a fatal condition! P0=0x1e
> P1=0xffffffffc0000005 P2=0xfffff8054c61e97c P3=0x0 P4=0x28
> 00:01:24.503053 GIMHv: BugCheck 1e {ffffffffc0000005, fffff8054c61e97c, 0, 28}
> 00:01:24.503054 KMODE_EXCEPTION_NOT_HANDLED
> 00:01:24.503054 P1: ffffffffc0000005 - exception code - STATUS_ACCESS_VIOLATION
> 00:01:24.503054 P2: fffff8054c61e97c - EIP/RIP
> 00:01:24.503054 P3: 0000000000000000 - Xcpt param #0
> 00:01:24.503054 P4: 0000000000000028 - Xcpt param #1
>
> Running a 3rd party dump analyzer shows that the crash happens at
> ntoskrnl.exe+3f7d50. I have installed the Windows debugger, but I think the
> learning curve will be steep. At this point, I have no further info available.

Can you email me the minidump files from the crash? In another life
that's not supposed to intersect with lkml, windbg keeps me up at
night...

Also, if you've got some easy steps at repro, that'd be helpful. If I
have to install OpenSUSE in a VM or something and type some commands and
twiddle things here and there, let me know what it takes to get an
environment going. Or, better, if you've got a VM already baked with
vbox installed in it with a VM inside of that that exhibits the issue,
that'd let me take a look.

Jason
* Re: Changes in kernel 5.18-rc1 leads to crashes in VirtualBox Virtual Machines
  2022-05-01 23:32     ` Jason A. Donenfeld
@ 2022-05-02  0:11       ` Jason A. Donenfeld
  2022-05-02  1:05         ` Jason A. Donenfeld
  0 siblings, 1 reply; 8+ messages in thread
From: Jason A. Donenfeld @ 2022-05-02 0:11 UTC
  To: Larry Finger; +Cc: LKML

Hey again,

I just installed VirtualBox on top of 5.18-rc4, and then I made a new VM
with a fresh install of OpenSUSE, and everything is fine. No issues at
all.

So you're going to have to provide more information.

Jason
* Re: Changes in kernel 5.18-rc1 leads to crashes in VirtualBox Virtual Machines
  2022-05-02  0:11       ` Jason A. Donenfeld
@ 2022-05-02  1:05         ` Jason A. Donenfeld
  2022-05-02 10:49           ` Jason A. Donenfeld
  0 siblings, 1 reply; 8+ messages in thread
From: Jason A. Donenfeld @ 2022-05-02 1:05 UTC
  To: Larry Finger; +Cc: LKML

Hi Larry,

On Mon, May 02, 2022 at 02:11:13AM +0200, Jason A. Donenfeld wrote:
> Hey again,
>
> I just installed VirtualBox on top of 5.18-rc4, and then I made a new VM
> with a fresh install of OpenSUSE, and everything is fine. No issues at
> all.
>
> So you're going to have to provide more information.
>
> Jason

With still no more information provided from you, I've gone scouring and
found your much more informative bug report here:
https://www.virtualbox.org/ticket/20914 along with a larger log here
https://www.virtualbox.org/attachment/ticket/20914/Windows%2010%20Clone-2022-04-24-20-55-56.log

Why would you not have sent me all this information right away? Surely
you know how to report bugs. If you're going to concern me with the
possibility that I've broken something, at least give me enough detail
to be able to do something. Otherwise it's pure frustration.

Anyway, it's still too little information, but I could extract the
Windows build from that log file, pull down ntoskrnl.exe and hope it
roughly matches, and then go to work in IDA Pro trying to figure out
what's going on at ntoskrnl.exe+3f7d50, and if I managed to grab the
right build -- which I more than likely did not -- then that's a `mov
byte ptr gs:853h, 0` in KiInterruptDispatch, which seems entirely
unrelated to the change you mentioned.

So I think it'd be a good moment for you to show your bisect logs so we
can be certain we're after the right thing.

Jason
* Re: Changes in kernel 5.18-rc1 leads to crashes in VirtualBox Virtual Machines
  2022-05-02  1:05         ` Jason A. Donenfeld
@ 2022-05-02 10:49           ` Jason A. Donenfeld
  0 siblings, 0 replies; 8+ messages in thread
From: Jason A. Donenfeld @ 2022-05-02 10:49 UTC
  To: Larry Finger; +Cc: LKML

Hi Larry,

On Mon, May 2, 2022 at 4:55 AM Larry Finger <Larry.Finger@lwfinger.net> wrote:
> On 5/1/22 20:05, Jason A. Donenfeld wrote:
> > Hi Larry,
> >
> > On Mon, May 02, 2022 at 02:11:13AM +0200, Jason A. Donenfeld wrote:
> >> Hey again,
> >>
> >> I just installed VirtualBox on top of 5.18-rc4, and then I made a new VM
> >> with a fresh install of OpenSUSE, and everything is fine. No issues at
> >> all.
> >>
> >> So you're going to have to provide more information.
> >>
> >> Jason
> >
> > With still no more information provided from you, I've gone scouring and
> > found your much more informative bug report here:
> > https://www.virtualbox.org/ticket/20914 along with a larger log here
> > https://www.virtualbox.org/attachment/ticket/20914/Windows%2010%20Clone-2022-04-24-20-55-56.log
> >
> > Why would you not have sent me all this information right away? Surely
> > you know how to report bugs. If you're going to concern me with the
> > possibility that I've broken something, at least give me enough detail
> > to be able to do something. Otherwise it's pure frustration.
> >
> > Anyway, it's still too little information, but I could extract the
> > Windows build from that log file, pull down ntoskrnl.exe and hope it
> > roughly matches, and then go to work in IDA Pro trying to figure out
> > what's going on at ntoskrnl.exe+3f7d50, and if I managed to grab the
> > right build -- which I more than likely did not -- then that's a `mov
> > byte ptr gs:853h, 0` in KiInterruptDispatch, which seems entirely
> > unrelated to the change you mentioned.
> >
> > So I think it'd be a good moment for you to show your bisect logs so we
> > can be certain we're after the right thing.
>
> LKML removed from cc due to large files.
>
> Yes, I do know how to report bugs. If you remember my first E-mail, I was just
> looking for some suggestions on how using rdrand and rdseed could conflict with
> your changes. I'm sorry that you think I'm wasting your time.
>
> Where did you get your copy of VirtualBox? Perhaps they have some fixes that I
> do not know about.

I patched <https://dev.gentoo.org/~polynomial-c/virtualbox/vbox-kernel-module-src-6.1.34.tar.xz>
using <https://xn--4db.cc/AtB1jwli>.

> My bisect logs are gone. I will need to recreate them and I should have them
> tomorrow. I do have my paper log to create the bisect. I will have it for you
> tomorrow.
>
> I ran the VM again and got a slightly different result. The kernel exception was
> at ntoskrnl.exe+458647. The mini dump is attached. The ntosknl.exe is available
> at https:/lwfinger.com/download/ntosknl.exe.gz.

You spelled your URL wrong in two places. Had to guess how to fix it.
Please spend more time with your bug reports. This is already more
painful than it should be.

From looking at the minidump you sent, I don't see how this is related
to the RNG. Maybe something else is wrong with your VirtualBox, and
you're just experiencing a 5.17->5.18 transition. The VirtualBox team
themselves said they haven't released the modules for 5.18 yet. Then on
top of that, maybe you're bisecting wrong.

Anyway, from that minidump...

PROCESS_NAME: svchost.exe

STACK_TEXT:
ffff8603`177407f8 fffff806`30464647 : 00000000`0000001e ffffffff`c0000005 fffff806`3062797c 00000000`00000000 : nt!KeBugCheckEx
ffff8603`17740800 fffff806`30415dac : 00000000`00001000 ffff8603`177410a0 ffff8000`00000000 00000000`00000000 : nt!KiDispatchException+0x17c287
ffff8603`17740ec0 fffff806`30411f43 : 00000000`00000001 ffffa20d`a3e00340 00000000`00000060 00000000`00000000 : nt!KiExceptionDispatch+0x12c
ffff8603`177410a0 fffff806`3062797c : 00000000`000000c8 fffff806`30248da4 00000000`00000000 00000000`00000001 : nt!KiPageFault+0x443
ffff8603`17741230 fffff806`3064606e : 00000000`00000000 ffffdd8e`e4fe9970 00000000`00000000 00000000`00000000 : nt!MiPfPrepareReadList+0x4c
ffff8603`17741320 fffff806`30645de4 : ffffa20d`ac52dcc0 00000000`00000000 00000000`00000000 ffffdd8e`e4fe9970 : nt!MmPrefetchPagesEx+0x96
ffff8603`17741390 fffff806`3064b349 : 00000000`00000000 ffff8603`00000000 ffffa20d`00000000 00000000`00000006 : nt!PfpPrefetchFilesTrickle+0x2a8
ffff8603`17741480 fffff806`3064bb6e : ffffa20d`abf59000 ffffa20d`abf59000 ffff8603`177416a0 00000000`00000000 : nt!PfpPrefetchRequestPerform+0x299
ffff8603`177415f0 fffff806`30651679 : 00000000`00000001 fffff806`302c0c01 ffffdd8e`e9e81760 ffffa20d`abf59000 : nt!PfpPrefetchRequest+0x132
ffff8603`17741670 fffff806`3065050d : ffffdd8e`00000000 00000000`00000000 00000000`1d16c86a 00000000`1d16c801 : nt!PfSetSuperfetchInformation+0x155
ffff8603`17741770 fffff806`304156b5 : 00000000`00000000 00000000`00000000 ffff8603`17741b80 00000000`00000000 : nt!NtSetSystemInformation+0x9bd
ffff8603`17741b00 00007fff`5b9b0274 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25
00000075`ba37f9c8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007fff`5b9b0274

SYMBOL_NAME: nt!MiPfPrepareReadList+4c
MODULE_NAME: nt
IMAGE_VERSION: 10.0.19041.1682

Loading up the kernel image, we see:

PAGE:000000014061B946 mov r13, rcx
[...]
PAGE:000000014061B96F mov rax, [r13+0]
[...]
PAGE:000000014061B97C mov rdx, [rax+28h]

So it dereferences the first argument of MiPfPrepareReadList(), and then
dereferences offset 0x28 of that, and crashes there. Looks like the same
thing happens in your other traces too, based on the bugcheck code
showing offset 0x28 in those too.

Anyway, until I can see that bisect log, this is beginning to smell like
a big waste of time.

Jason
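Jason's reading of the two crashes can be cross-checked with a little arithmetic on the addresses quoted in this thread. The sketch below is illustrative only and rests on two stated assumptions: that the kernel image is relocated by a multiple of the 4 KiB page size, so the low 12 bits of an instruction address are the same in the disassembly and at run time, and that for an access violation the fourth bug check parameter is the address that was touched. A matching page offset is consistent with both crashes hitting the same instruction, but it does not prove it.

```python
PAGE_MASK = 0xFFF  # 4 KiB pages: low 12 bits survive a page-aligned relocation

disasm_addr = 0x14061B97C      # "mov rdx, [rax+28h]" in the listing above
crash_rips = [
    0xFFFFF8054C61E97C,        # P2 (RIP) from the first VirtualBox log
    0xFFFFF8063062797C,        # bug check argument 2 in the minidump stack
]


def same_page_offset(a: int, b: int) -> bool:
    """True if both addresses sit at the same offset within a 4 KiB page."""
    return (a & PAGE_MASK) == (b & PAGE_MASK)


for rip in crash_rips:
    print(hex(rip), same_page_offset(rip, disasm_addr))

# Assumption: P4 = 0x28 is the faulting address. Subtracting the 0x28
# displacement of the instruction gives the value rax would have held.
fault_addr = 0x28
displacement = 0x28
print(hex(fault_addr - displacement))
```

Under those assumptions the pointer loaded from the first argument was zero, which fits the description of a dereference at offset 0x28 failing.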