From: Linus Torvalds
Date: Fri, 5 May 2017 14:07:53 -0700
Subject: Re: GFS2: Pull request (merge window)
To: Bob Peterson
Cc: cluster-devel, Linux Kernel Mailing List
In-Reply-To: <1070215647.4566044.1494016107082.JavaMail.zimbra@redhat.com>
References: <909713995.4565875.1494016018520.JavaMail.zimbra@redhat.com> <1070215647.4566044.1494016107082.JavaMail.zimbra@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 5, 2017 at 1:28 PM, Bob Peterson wrote:
>
> I asked around, but nobody could tell me what went wrong. Strangely,
> this command:
>
>     git log --oneline --right-only origin/master...FETCH_HEAD --stat
>
> doesn't show this, but this one does:
>
>     git diff --stat --right-only origin/master...FETCH_HEAD

So the fundamental difference between "git log" and "git diff" is that
one is a "set operation" on the commits in question, and the other is
fundamentally an "operation between two endpoints".

And that's why "git log" will always show the "right" thing - because
even in the presence of complex history, there's no ambiguity about
which commits are part of the new set, and which are in the old set.
So "git log" just does a set difference, and shows the commits in one
set but not the other.

But "git diff", because it is fundamentally about two particular
points in history, can have a hard time once you have complex history:
what are the two points?
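A toy model can make the "set operation" point concrete. The sketch below is not how git is implemented: it assumes a made-up history stored as a Python dict mapping each commit to its parents, and the commit names (A, U1, T1, ...) and the helpers ancestors() and right_only() are all hypothetical. It only shows that the "new" set is a plain set difference, even with a back-merge in the history.

```python
# Toy commit graph: commit name -> list of parent commits.
# All names are invented for illustration; this is not git's real data model.
PARENTS = {
    "A": [],             # common root
    "U1": ["A"],         # upstream work
    "U2": ["U1"],        # upstream tip (think origin/master)
    "T1": ["A"],         # topic work
    "T2": ["T1", "U1"],  # back-merge of upstream into the topic branch
    "T3": ["T2"],        # topic tip (think FETCH_HEAD)
}


def ancestors(parents, tip):
    """Return the set of commits reachable from tip, including tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def right_only(parents, left, right):
    """Commits reachable from right but not from left (a set difference)."""
    return ancestors(parents, right) - ancestors(parents, left)


print(sorted(right_only(PARENTS, "U2", "T3")))  # ['T1', 'T2', 'T3']
```

The back-merge T2 pulls U1 into the topic history, but U1 is already reachable from U2, so it drops out of the difference without anybody having to choose a single starting point.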
In particular, what "git diff origin/master...FETCH_HEAD" means is
really:

 - find the common point (git calls it "merge base" because the common
   point is also used for merging) between the two commits
   (origin/master and FETCH_HEAD)

 - do the diff from that common point to the end result (FETCH_HEAD)

and for linear history that is all very obvious and unambiguous.

But once you have non-linear history, and particularly once you have
back-merges (ie you're not just merging work that is uniquely your own
from multiple of your *own* branches, but you're also doing merges of
upstream code), the notion of that "common point" is no longer
unambiguous. There is not necessarily any *one* common base: there can
be multiple points in history that are common between the two
branches, but are distinct points of history (ie one is not an
ancestor of another).

And since a diff is fundamentally about just two end-points ("what are
the differences between these two points in the history"), "git diff"
fundamentally cannot handle that case without help.

So "git diff" will pick the first of the merge bases it finds, and
just use that. Which even in the presence of more complex history will
sometimes work by luck, but more often just means that you'll see
differences that aren't all from your tree, but some of them came from
the *other* common point(s).

For example, after doing the pull, I can then do:

    git merge-base --all HEAD^ HEAD^2

to see the merge bases of the merge in HEAD.
In this case, because of your back-merge, there are two of them (with
more complex history, there can be more still):

    f9fe1c12d126 rhashtable: Add rhashtable_lookup_get_insert_fast
    69eea5a4ab9c Merge branch 'for-linus' of git://git.kernel.dk/linux-block

and because "git diff" will just pick the first one, you will
basically have done

    git diff f9fe1c12d126..FETCH_HEAD

and if you then look at the *set* of changes (with "git log" of that
range), you'll see why that diff also ends up containing those block
changes (because they came in from that other merge base: commit
69eea5a4ab9c that had that linux-block merge).

Now, doing a *merge* in git will take _all_ of those merge bases into
account, and do something *much* more complicated than just a two-way
diff. It will internally first create a single merge base (by
recursively merging up all the other merge bases into a new internal
commit), and then using that single merge base it will then do a
normal three-way merge of the two branches.

"git diff" doesn't do that kind of complicated operation, and although
it *could* do that merge base recursive merging dance, the problem
would be what to do about conflicts (which "git merge" obviously can
also have, but with git merge you have that whole manual conflict
resolution case).

So once you have complex history that isn't just about merging your
own local changes from other local branches, you'll start hitting this
situation. Visualizing the history with "gitk" for those cases is
often a great way to see why there's no single point that can be
diffed against.

But once you *do* have that kind of complex history, you're also
expected to have the expertise to handle it:

> So I created a temporary local branch and used git merge to
> generate a correct diffstat.

That's the correct thing to do. Hopefully the above explains *why*
it's the correct thing to do.
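The merge-base ambiguity described above can be reproduced on a toy history. What follows is only a sketch under stated assumptions: the graph is made up (N, Q, K, U and T1..T3 are stand-ins, not the real commits), merge_bases() is a naive set-based definition rather than git's actual algorithm, and sets of commits stand in for the contents of a diff.

```python
# Toy commit graph: commit name -> list of parents. Hypothetical names only.
PARENTS = {
    "A": [],
    "N": ["A"],         # some upstream commit the topic branch started from
    "K": ["A"],         # unrelated work (think: block changes)
    "Q": ["A", "K"],    # upstream merge that brought in K
    "U": ["Q", "N"],    # upstream tip (think origin/master)
    "T1": ["N"],        # topic work based on N
    "T2": ["T1", "Q"],  # back-merge of upstream commit Q into the topic
    "T3": ["T2"],       # topic tip (think FETCH_HEAD)
}


def ancestors(parents, tip):
    """Return the set of commits reachable from tip, including tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c in ancestors(parents, other)
                   for other in common if other != c)
    }


def commit_range(parents, base, tip):
    """Commits in base..tip: reachable from tip but not from base."""
    return ancestors(parents, tip) - ancestors(parents, base)


bases = merge_bases(PARENTS, "U", "T3")
print(sorted(bases))  # ['N', 'Q']

really_new = commit_range(PARENTS, "U", "T3")     # the set "git log" works on
from_one_base = commit_range(PARENTS, "N", "T3")  # range from a single base
print(sorted(really_new))                  # ['T1', 'T2', 'T3']
print(sorted(from_one_base - really_new))  # ['K', 'Q']
```

Whichever of the two bases is used as the single starting point, the range picks up commits that arrived through the other one (K and Q here when starting from N), which mirrors the stray block changes in the diffstat.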
(Although to be honest, I am also pretty used to parsing the wrong
diffs, and people sometimes just send me the garbage diffstat and say
"I don't know what happened", and I'll figure it out and can still
validate that the garbage diffstat they sent me is what I too get if I
do just a silly "git diff" without taking merge bases into account).

             Linus