On 03.04.19 at 22:05, Matheus Fillipe wrote:
> Okay I found a way to get it working and there was also a huge mistake
> on my last boot-config, the resume was commented out :P
> I basically followed this: https://askubuntu.com/a/1064114
> but changed to:
> resume=/dev/disk/by-uuid/70d967e6-ad52-4c21-baf0-01a813ccc6ac (just
> the uuid wouldn't work) and this is probably the most important thing
> to do. It worked!
> I also set the resume variable in initramfs to my swap partition but
> this might not be so important anyway since it's automatically
> detected.
>
> I tested both systemctl hibernate and pm-hibernate, I guess they call
> the same thing anyway. I attached a screenshot. Seems to be working
> fine without uswsusp and with nvidia proprietary drivers!
>
> On Wed, Apr 3, 2019 at 2:55 PM Rainer Fiebig wrote:
>>
>> On 03.04.19 at 18:59, Matheus Fillipe wrote:
>>> Yes I can sorta confirm the bug is in uswsusp. I removed the package
>>> and pm-utils
>>
>> Matheus,
>>
>> there is no need to uninstall pm-utils. You actually need this to have
>> comfortable suspend/hibernate.
>>
>> The only additional option you will get from uswsusp is true s2both
>> (which is nice, imo).
>>
>> pm-utils provides something similar called "suspend-hybrid" which means
>> that the computer suspends and after a configurable time wakes up again
>> to go into hibernation.
>>
>>> and used both "systemctl hibernate" and "echo disk > /sys/power/state"
>>> to hibernate. It seems to succeed and shuts down, I
>>> am just not able to resume from it, which seems to be a classical
>>> problem solved just by setting the resume swap file/partition on grub.
>>> (which i tried and didn't work even with nvidia disabled)
>>>
>>> Anyway uswsusp is still necessary because the default kernel
>>> hibernation doesn't work with the proprietary nvidia drivers as long
>>> as I know and tested.
>>
>> What doesn't work: hibernating or resuming?
>> And /var/log/pm-suspend.log might give you a clue what causes the problem.
>>
>>>
>>> Is there any way I could get any workaround to this bug on my current
>>> OS by the way?
>>
>> *I* don't know, I don't use Ubuntu. But what I would do now is
>> re-install pm-utils *without* uswsusp and make sure that you have got
>> the swap-partition/file right in grub.cfg or menu.lst (grub legacy).
>>
>> Then do a few pm-hibernate/resume and tell us what happened.
>>
>> So long!
>>
>>>
>>> On Wed, Apr 3, 2019 at 7:04 AM Rainer Fiebig wrote:
>>>>
>>>> On 03.04.19 at 11:34, Jan Kara wrote:
>>>>> On Tue 02-04-19 16:25:00, Andrew Morton wrote:
>>>>>>
>>>>>> I cc'ed a bunch of people from bugzilla.
>>>>>>
>>>>>> Folks, please please please remember to reply via emailed
>>>>>> reply-to-all. Don't use the bugzilla interface!
>>>>>>
>>>>>> On Mon, 16 Jun 2014 18:29:26 +0200 "Rafael J. Wysocki" wrote:
>>>>>>
>>>>>>> On 6/13/2014 6:55 AM, Johannes Weiner wrote:
>>>>>>>> On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote:
>>>>>>>>> On 6/13/2014 12:02 AM, Johannes Weiner wrote:
>>>>>>>>>> On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote:
>>>>>>>>>>> On 5/6/2014 1:33 AM, Johannes Weiner wrote:
>>>>>>>>>>>> Hi Oliver,
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote:
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Attached a full function-trace log + other SysRq outputs, see [1]
>>>>>>>>>>>>> attached.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I saw bdi_...() calls in the s2disk paths, but didn't check in detail
>>>>>>>>>>>>> Probably more efficient when one of you guys looks directly.
>>>>>>>>>>>> Thanks, this looks interesting. balance_dirty_pages() wakes up the
>>>>>>>>>>>> bdi_wq workqueue as it should:
>>>>>>>>>>>>
>>>>>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits <-balance_dirty_pages_ratelimited
>>>>>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : global_dirtyable_memory <-global_dirty_limits
>>>>>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : writeback_in_progress <-balance_dirty_pages_ratelimited
>>>>>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : bdi_start_background_writeback <-balance_dirty_pages_ratelimited
>>>>>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on <-balance_dirty_pages_ratelimited
>>>>>>>>>>>> but the worker wakeup doesn't actually do anything:
>>>>>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch <-__schedule
>>>>>>>>>>>> [ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq <-worker_thread
>>>>>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550431us : need_to_create_worker <-worker_thread
>>>>>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle <-worker_thread
>>>>>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers <-worker_enter_idle
>>>>>>>>>>>> [ 249.148009] kworker/-3466 2.... 48550432us : schedule <-worker_thread
>>>>>>>>>>>> [ 249.148009] kworker/-3466 2.... 48550432us : __schedule <-worker_thread
>>>>>>>>>>>>
>>>>>>>>>>>> My suspicion is that this fails because the bdi_wq is frozen at this
>>>>>>>>>>>> point and so the flush work never runs until resume, whereas before my
>>>>>>>>>>>> patch the effective dirty limit was high enough so that image could be
>>>>>>>>>>>> written in one go without being throttled; followed by an fsync() that
>>>>>>>>>>>> then writes the pages in the context of the unfrozen s2disk.
>>>>>>>>>>>>
>>>>>>>>>>>> Does this make sense? Rafael? Tejun?
>>>>>>>>>>> Well, it does seem to make sense to me.
>>>>>>>>>> From what I see, this is a deadlock in the userspace suspend model and
>>>>>>>>>> just happened to work by chance in the past.
>>>>>>>>> Well, it had been working for quite a while, so it was a rather large
>>>>>>>>> opportunity
>>>>>>>>> window it seems. :-)
>>>>>>>> No doubt about that, and I feel bad that it broke.
>>>>>>>> But it's still a
>>>>>>>> deadlock that can't reasonably be accommodated from dirty throttling.
>>>>>>>>
>>>>>>>> It can't just put the flushers to sleep and then issue a large amount
>>>>>>>> of buffered IO, hoping it doesn't hit the dirty limits. Don't shoot
>>>>>>>> the messenger, this bug needs to be addressed, not get papered over.
>>>>>>>>
>>>>>>>>>> Can we patch suspend-utils as follows?
>>>>>>>>> Perhaps we can. Let's ask the new maintainer.
>>>>>>>>>
>>>>>>>>> Rodolfo, do you think you can apply the patch below to suspend-utils?
>>>>>>>>>
>>>>>>>>>> Alternatively, suspend-utils
>>>>>>>>>> could clear the dirty limits before it starts writing and restore them
>>>>>>>>>> post-resume.
>>>>>>>>> That (and the patch too) doesn't seem to address the problem with existing
>>>>>>>>> suspend-utils
>>>>>>>>> binaries, however.
>>>>>>>> It's userspace that freezes the system before issuing buffered IO, so
>>>>>>>> my conclusion was that the bug is in there. This is arguable. I also
>>>>>>>> wouldn't be opposed to a patch that sets the dirty limits to infinity
>>>>>>>> from the ioctl that freezes the system or creates the image.
>>>>>>>
>>>>>>> OK, that sounds like a workable plan.
>>>>>>>
>>>>>>> How do I set those limits to infinity?
>>>>>>
>>>>>> Five years have passed and people are still hitting this.
>>>>>>
>>>>>> Killian described the workaround in comment 14 at
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=75101.
>>>>>>
>>>>>> People can use this workaround manually by hand or in scripts. But we
>>>>>> really should find a proper solution. Maybe special-case the freezing
>>>>>> of the flusher threads until all the writeout has completed. Or
>>>>>> something else.
>>>>>
>>>>> I've refreshed my memory wrt this bug and I believe the bug is really on
>>>>> the side of suspend-utils (uswsusp or however it is called).
>>>>> They are low
>>>>> level system tools, they ask the kernel to freeze all processes
>>>>> (SNAPSHOT_FREEZE ioctl), and then they rely on buffered writeback (which is
>>>>> relatively heavyweight infrastructure) to work. That is wrong in my
>>>>> opinion.
>>>>>
>>>>> I can see Johannes was suggesting in comment 11 to use O_SYNC in
>>>>> suspend-utils which worked but was too slow. Indeed O_SYNC is a rather big
>>>>> hammer but using O_DIRECT should be what they need and get better
>>>>> performance - no additional buffering in the kernel, no dirty throttling,
>>>>> etc. They only need their buffer & device offsets sector aligned - they
>>>>> seem to be even page aligned in suspend-utils so they should be fine. And
>>>>> if the performance still sucks (currently they appear to do mostly random
>>>>> 4k writes so it probably would for rotating disks), they could use AIO DIO
>>>>> to get multiple pages in flight (as many as they dare to allocate buffers)
>>>>> and then the IO scheduler will reorder things as well as it can and they
>>>>> should get reasonable performance.
>>>>>
>>>>> Is there someone who works on suspend-utils these days? Because the repo
>>>>> I've found on kernel.org seems to be long dead (last commit in 2012).
>>>>>
>>>>> Honza
>>>>>
>>>>
>>>> Whether it's suspend-utils (or uswsusp) or not could be answered quickly
>>>> by de-installing this package and using the kernel-methods instead.
>>>>
>>>>
>>
>>

So you got hibernate working now with pm-utils *and* the proprietary
Nvidia drivers. That's good - although a bit contrary to what you said
in Comment 29:

> Anyway uswsusp is still necessary because the default kernel
> hibernation doesn't work with the proprietary nvidia drivers as long
> as I know and tested

Never mind. Stick with it if you don't need s2both.

What still puzzles me is that while others are having problems,
suspend-utils/uswsusp work for me almost 100 % of the time, except for
a few extreme test-cases in the past.
You also said that it worked "flawlessly" for you until you upgraded
your system. So I'm wondering whether used-up swap space might play a
role in this matter, too. At least for the cases that I've seen on my
system, I can't rule this out.

And when I look at the screenshot you provided in Comment 27
(https://launchpadlibrarian.net/417327528/i915.jpg), scarce swap space
could have been a factor in that case as well, because roughly 3.5 GB of
free swap space doesn't seem like much for a 16-GB-RAM box.
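
A rough way to check the swap-space hypothesis before hibernating is to
compare free swap with the RAM currently in use. The following is only a
minimal sketch, assuming a Linux /proc/meminfo that reports MemTotal,
MemAvailable and SwapFree in kB. The function names are hypothetical, and
the heuristic ignores image compression and the kernel's own image-size
limits, so a negative result is a hint, not proof.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


def swap_headroom_kib(fields):
    """Free swap minus RAM in use, in kB.

    Assumption: RAM in use is approximated as MemTotal - MemAvailable.
    A negative value suggests the image might not fit into free swap.
    """
    used_ram = fields["MemTotal"] - fields["MemAvailable"]
    return fields["SwapFree"] - used_ram
```

Calling swap_headroom_kib(read_meminfo()) on the affected machine right
before pm-hibernate would show whether free swap was already short at
that point.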