On Sun, 2015-01-18 at 23:37 -0500, David Miller wrote:
> From: Marcel Holtmann
> Date: Sun, 18 Jan 2015 18:10:46 -0800
>
> > Hi Scott,
> >
> >> This patch needs to be reverted ASAP. git bisect landed me here also;
> >> my processes are getting the OOM msgs. What testing was done?
> >>
> >> Seems someone does care that nlmsg_end() returns skb->len.
> >
> > I still wonder how this affects userspace. I have not figured that
> > out. Something goes wrong pretty badly somewhere.
> >
> > Have you tried the small diff with the two locations that were
> > problematic for me?
>
> There were a lot more cases not converted properly, I hope the
> patch below gets them all.
>
> Johannes, this was either not tested or tested very poorly, please
> don't submit changes like this. Even neighbour entry and route
> dumping were hosed.
>
> ====================
> [PATCH] netlink: Fix bugs in nlmsg_end() conversions.
>
> Commit 053c095a82cf ("netlink: make nlmsg_end() and genlmsg_end()
> void") didn't catch all of the cases where callers were breaking out
> on the return value being equal to zero, which they no longer should
> when zero means success.
>
> Fix all such cases.

I'm not sure if this is entirely fixed. In Fedora 22 (4.0.0-rc5-git4)
I'm occasionally seeing glibc deadlock in __check_pf() on a netlink
recvmsg(), here:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/check_pf.c;h=162606d7;hb=glibc-2.21#l166

As I understand it, this shouldn't happen. Even if messages are dropped
(which surely shouldn't happen as often as I'm seeing this), glibc
should get ENOBUFS from the recvmsg() call.

https://bugzilla.redhat.com/show_bug.cgi?id=1209433

I haven't bisected and proved that it *was* this commit which
introduced the problem, as it only happens after a day or two of
running Evolution and I haven't managed to trigger it more reliably.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation
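
For illustration only, the class of bug the quoted patch describes can
be modelled in plain userspace C. This is a rough sketch, not kernel
code: fill_entry(), dump_stale() and dump_fixed() are hypothetical
names, and the assumption that a stale caller tests the result with
"<= 0" is mine, not something the commit message spells out.

```c
#include <stddef.h>

/* Hypothetical stand-in for a fill helper after the conversion:
 * returns 0 on success and a negative value when the buffer is full.
 * Before the conversion such helpers would have returned a positive
 * length on success instead. */
static int fill_entry(int *buf, size_t cap, size_t *used, int value)
{
    if (*used >= cap)
        return -1;              /* stands in for an error such as -EMSGSIZE */
    buf[(*used)++] = value;
    return 0;                   /* success is now 0, not a positive length */
}

/* Stale caller (assumed pattern): still treats 0 as "stop", so it
 * breaks out of the loop right after the first successful fill. */
static size_t dump_stale(int *buf, size_t cap, size_t n)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        if (fill_entry(buf, cap, &used, (int)i) <= 0)
            break;
    }
    return used;
}

/* Converted caller: only a negative return ends the dump early. */
static size_t dump_fixed(int *buf, size_t cap, size_t n)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        if (fill_entry(buf, cap, &used, (int)i) < 0)
            break;
    }
    return used;
}
```

With an 8-slot buffer and 5 entries to dump, dump_stale() stops after a
single entry while dump_fixed() writes all 5. How a truncated dump like
that turns into the userspace symptoms reported in this thread is not
something this sketch tries to show.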