Skip to content

Conversation

@tulinkry
Copy link
Contributor

@tulinkry tulinkry commented Jan 24, 2017

This should fix #1335 and it might fix #1258.
Implements #794

There should be no harm with this. If anything does not work, it will work like it did before - no renamed files tracking.

List<String> cmd = new ArrayList<>();
cmd.add(RepoCommand);
cmd.add("log");
cmd.add("-M8"); // similarity 80%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use 100% ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, maybe use long variant of the option (--find-renames) so that it is more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value is 50% so I pushed that up to 80%. 100% is to much I think...one can change the location of the file while changing some lines


@Override
public void processStream(InputStream input) throws IOException {
/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add the sequence of command that lead to this particular log

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

entries.add(entry);
}

history.setHistoryEntries(entries);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the setHistoryEntries() no longer necessary ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the history is now created as a return value directly

return new History(entries, parser.getRenamedFiles());
throw new HistoryException("Failed to get history for: \"" +
file.getAbsolutePath() + "\" Exit code: " + status);
throw new HistoryException("Failed to get history for: \""
+ file.getAbsolutePath() + "\" Exit code: " + status);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure about the style here - where do we generally keep the operator when wrapping lines ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I get when I autoformat in netbeans with default settings. I changed those lines anyway to use String.format and not string composition.

try (BufferedReader in = new BufferedReader(new InputStreamReader(input))) {
String line;
while ((line = in.readLine()) != null) {
if (line.startsWith("R")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be more strict here - preferably use regexp matching

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

origpath = findOriginalName(fullpath, rev);
} catch (IOException exp) {
LOGGER.log(Level.SEVERE,
"Failed to get original revision: {0}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

@tulinkry tulinkry Jan 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now it is

protected String expandGitRenamedOutput(String renamedFile) {
// the pattern should cover format like this:
// moved/{renamed.c => renamed2.c} (100%)
Pattern pattern = Pattern.compile("\\s*([^\\{]+)\\s*\\{\\s*(.+?)\\s+=>\\s+([^\\}]+)\\s*\\}(.*)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

catch PatternSyntaxException exception just to be sure ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the pattern is hardcoded so the exception can't occur

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless the compiler changes.. better to catch it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to mention that I changed the command to obtain more predictable output format (without these {, } constructs. So this method is no longer available :-)

file = split[0];
}

if (rev.equals(changeset)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how this is supposed work given the same piece of code on lines 360-362

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mistake. removed

if (split.length < 2) {
throw new IOException(
String.format(
"Invalid format of the git output. Expected "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add more info about the file path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Test
public void testRenamedFiles() throws Exception {
String[][] tests = new String[][]{
{"moved/renamed2.c", "67dfbe26", "moved/renamed2.c"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about testing renames across distinct directories ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test case

@vladak
Copy link
Member

vladak commented Jan 26, 2017

Could you do some basic perf. measurements for some Git repo with renames ? Also, you should see standalone git log processes to be forked for the renamed files.

@tulinkry
Copy link
Contributor Author

tulinkry commented Jan 27, 2017

Ok. I created a small testing repo and did some renames. These commands I executed to create the repository:

Basically:

  • the file in 1/file.txt got moved twice (to 4 and back to 1)
  • all of the files from 1 were moved to 4
  • file from 3/3/file.txt was moved to root and changed a little bit

also download repo.txt

mkdir renamed cd renamed/ mkdir 1 mkdir 2 mkdir 3 tr -dc A-Za-z0-9 </dev/urandom | head -c 20kB > 1/file.txt tr -dc A-Za-z0-9 </dev/urandom | head -c 25kB > 2/file.txt tr -dc A-Za-z0-9 </dev/urandom | head -c 30kB > 3/file.txt mkdir -p {1,2,3}/{1,2,3} tr -dc A-Za-z0-9 </dev/urandom | head -c 30kB > 1/1/file.txt tr -dc A-Za-z0-9 </dev/urandom | head -c 30kB > 1/2/file.txt tr -dc A-Za-z0-9 </dev/urandom | head -c 30kB > 1/3/file.txt tr -dc A-Za-z0-9 </dev/urandom | head -c 30kB > 2/1/file.txt tr -dc A-Za-z0-9 </dev/urandom | head -c 30kB > 2/2/file.txt tr -dc A-Za-z0-9 </dev/urandom | head -c 30kB > 2/3/file.txt tr -dc A-Za-z0-9 </dev/urandom | head -c 30kB > 3/1/file.txt tr -dc A-Za-z0-9 </dev/urandom | head -c 30kB > 3/2/file.txt tr -dc A-Za-z0-9 </dev/urandom | head -c 30kB > 3/3/file.txt git init git add 1/ git commit -m "first" git add 2/1/ git commit -m "second" git mv 1 4 git add 1 git add 1/ git add 4/ git commit -m "first rename" git add 2/2 git commit -m "third" git add 3 git commit "fourth" git commit -m "fourth" mkdir 1 mv 4/1/file.txt 1/ git add 4 git add 1 git commit -m "second rename" git add 2/3 git commit -m "fifth" mv 3/3/file.txt . cat file.txt | head -c 20Kb > tmp && cp tmp file.txt && rm tmp git add 3 git add file.txt git commit -m "third rename" git add 2/ git commit -m "sixth" git log git status git commit

Then I ran the index process from scratch for current master (without) and this pr (with).

$ time strace -f 2>&1 ./OpenGrok index | egrep "fork|exec.*" > exec-without.log real	0m2.776s user	0m2.740s sys	0m0.960s $ time strace -f 2>&1 ./OpenGrok index | egrep "fork|exec.*" > exec-with.log real	0m2.335s user	0m2.548s sys	0m1.092s

I did that several times, once the with version had like 5s. Most of the time it was same.

These are the outputs:

download exec-without.txt

[pid 16355] execve(["--help"]) [pid 16357] execve(["remote", "-v"]) [pid 16358] execve(["branch"]) [pid 16360] execve(["log", "-1", "--pretty=%cd: %h %an %s", "--date=rfc"]) [pid 16363] execve(["--help"]) [pid 16365] execve(["remote", "-v"]) [pid 16366] execve(["branch"]) [pid 16368] execve(["log", "-1", "--pretty=%cd: %h %an %s", "--date=rfc"]) [pid 16372] execve(["log", "--abbrev-commit", "--abbrev=8", "--name-only", "--pretty=fuller", "--date=rfc"]) [pid 16376] execve(["log", "--abbrev-commit", "--abbrev=8", "--name-only", "--pretty=fuller", "--date=rfc", "dbe628c0.."]) 

download exec-with.txt

[pid 16648] execve(["--help"] [pid 16650] execve(["remote", "-v"] [pid 16651] execve(["branch"] [pid 16653] execve(["log", "-1", "--pretty=%cd: %h %an %s", "--date=rfc"]) [pid 16656] execve(["--help"]) [pid 16658] execve(["remote", "-v"]) [pid 16659] execve(["branch"]) [pid 16661] execve(["log", "-1", "--pretty=%cd: %h %an %s", "--date=rfc"]) [pid 16665] execve(["log", "--abbrev-commit", "--abbrev=8", "--name-only", "--pretty=fuller", "--date=rfc"]) [pid 16668] execve(["log", "--find-renames=8", "--summary", "--abbrev=8", "--name-status", "--oneline"]) [pid 16675] execve(["log", "--abbrev-commit", "--abbrev=8", "--name-only", "--pretty=fuller", "--date=rfc", "--follow", "4/2/file.txt"]) [pid 16679] execve(["log", "--abbrev-commit", "--abbrev=8", "--name-only", "--pretty=fuller", "--date=rfc", "--follow", "file.txt"], [/ [pid 16680] execve(["log", "--abbrev-commit", "--abbrev=8", "--name-only", "--pretty=fuller", "--date=rfc", "--follow", "4/3/file.txt"], [pid 16686] execve(["log", "--abbrev-commit", "--abbrev=8", "--name-only", "--pretty=fuller", "--date=rfc", "--follow", "1/file.txt"], [pid 16688] execve(["log", "--find-renames=8", "--summary", "--abbrev=8", "--name-status", "--oneline", "--follow", "--", "4/3/file.txt"], [pid 16692] execve(["log", "--find-renames=8", "--summary", "--abbrev=8", "--name-status", "--oneline", "--follow", "--", "4/2/file.txt"], [pid 16697] execve(["log", "--find-renames=8", "--summary", "--abbrev=8", "--name-status", "--oneline", "--follow", "--", "file.txt"], [pid 16699] execve(["log", "--find-renames=8", "--summary", "--abbrev=8", "--name-status", "--oneline", "--follow", "--", "1/file.txt"], [pid 16703] execve(["log", "--abbrev-commit", "--abbrev=8", "--name-only", "--pretty=fuller", "--date=rfc", "dbe628c0.."]) [pid 16706] execve(["log", "--find-renames=8", "--summary", "--abbrev=8", "--name-status", "--oneline", "dbe628c0.."]) 

It's clearly visible that there is one extra git command for finding renamed files. All of the renamed files have their dedicated call to git log command to find their renamed history.

In the 'with' version the commands are duplicated. This is caused by the fact the the history generation calls generic getHistory and this method does not know if it is for renamed file or not. So our implementation must perform the traditional log command with --follow (which gets all the changesets) and then again find all renames of the given file (which is not necessary for a file but it is for a directory)

<zip destfile="${build.test.classes.dir}/org/opensolaris/opengrok/history/repositories.zip"
basedir="${test.repositories}"
excludes="mercurial/hg/**, mercurial/hgignore, git/git"
excludes="mercurial/hg/**, mercurial/hgignore, git/git/**"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me the compiled test-classes still contained the both git and .git directory which caused a null revision error for getting annotations from the git repository (as they aren't tracked by git but still are in the repository).

This exact same line is in the root build.xml with these ** at the end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

2 participants