= My First Object Walk

== What's an Object Walk?

The object walk is a key concept in Git - this is the process that underpins
operations like object transfer and fsck. Beginning from a given commit, the
list of objects is found by walking parent relationships between commits (commit
X based on commit W) and containment relationships between objects (tree Y is
contained within commit X, and blob Z is located within tree Y, giving our
working tree for commit X something like `y/z.txt`).

A related concept is the revision walk, which is focused on commit objects and
their parent relationships and does not delve into other object types. The
revision walk is used for operations like `git log`.
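
Here is a toy model in plain C that makes the two kinds of relationship concrete. It is not Git's code. `struct toy_object`, its fields, and both walk functions are hypothetical and exist only for illustration. The revision walk follows only commit-to-parent edges. The object walk also follows commit-to-tree and tree-to-entry edges. A `seen` marker, loosely analogous to Git's `SEEN` flag, ensures that shared objects are counted only once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of Git's object types; not Git's real structs. */
enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_object {
	enum toy_type type;
	const char *name;           /* e.g. "X", "y", "z.txt" */
	struct toy_object *kids[4]; /* parents (commit) or entries (tree) */
	size_t nr_kids;
	struct toy_object *tree;    /* root tree; used by commits only */
	int seen;                   /* visited marker, like Git's SEEN flag */
};

/* Revision walk: follow only commit -> parent edges. */
size_t toy_revision_walk(struct toy_object *commit)
{
	size_t n = 1;

	if (!commit || commit->seen)
		return 0;
	assert(commit->type == TOY_COMMIT);
	commit->seen = 1;
	for (size_t i = 0; i < commit->nr_kids; i++)
		n += toy_revision_walk(commit->kids[i]);
	return n;
}

/* Object walk: also follow commit -> tree and tree -> entry edges. */
size_t toy_object_walk(struct toy_object *obj)
{
	size_t n = 1;

	if (!obj || obj->seen)
		return 0;
	obj->seen = 1;
	if (obj->type == TOY_COMMIT)
		n += toy_object_walk(obj->tree);
	for (size_t i = 0; i < obj->nr_kids; i++)
		n += toy_object_walk(obj->kids[i]);
	return n;
}
```

Suppose commit X has parent W, and both commits point at a root tree containing tree `y`, which holds blob `z.txt`. Starting from X, the revision walk visits 2 objects. After the `seen` markers are reset, the object walk visits 5, because the shared root tree is counted only once.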

=== Related Reading

- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
 the revision walker in its various incarnations.
- `revision.h`
- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
 gives a good overview of the types of objects in Git and what your object
 walk is really describing.

== Setting Up

Create a new branch from `master`.

----
git checkout -b revwalk origin/master
----

We'll put our fiddling into a new command. For fun, let's name it `git walken`.
Open up a new file `builtin/walken.c` and set up the command handler:

----
/*
 * "git walken"
 *
 * Part of the "My First Object Walk" tutorial.
 */

#include "builtin.h"

int cmd_walken(int argc, const char **argv, const char *prefix)
{
	trace_printf(_("cmd_walken incoming...\n"));
	return 0;
}
----

NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
off at runtime. For the purposes of this tutorial, we will write `walken` as
though it is intended for use as a "plumbing" command: that is, a command which
is used primarily in scripts, rather than interactively by humans (a "porcelain"
command). So we will send our debug output to `trace_printf()` instead. When
running, enable trace output by setting the environment variable `GIT_TRACE`.
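
The sketch below gives a rough model of what "turned on or off at runtime" means by gating output on the `GIT_TRACE` environment variable. `toy_trace_enabled()` and `toy_trace_printf()` are hypothetical names and do not reflect Git's implementation. Following the documented `GIT_TRACE` behavior, an unset or empty value, `0`, or `false` (case-insensitive) disables tracing. Real Git also interprets small integers as file descriptors and absolute paths as log files. This sketch treats every such value as "enabled" and always writes to `stderr`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive string equality (avoids relying on POSIX strcasecmp). */
static int toy_eq_ignore_case(const char *a, const char *b)
{
	while (*a && *b &&
	       tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
		a++;
		b++;
	}
	return *a == '\0' && *b == '\0';
}

/* Hypothetical: does this GIT_TRACE value turn tracing on? */
int toy_trace_enabled(const char *value)
{
	if (!value || !*value)
		return 0;
	if (!strcmp(value, "0") || toy_eq_ignore_case(value, "false"))
		return 0;
	return 1;
}

/* Hypothetical stand-in for trace_printf(): silent unless GIT_TRACE is on. */
void toy_trace_printf(const char *fmt, ...)
{
	va_list ap;

	assert(fmt != NULL);
	if (!toy_trace_enabled(getenv("GIT_TRACE")))
		return;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
}
```

A program that calls `toy_trace_printf("walking\n")` prints nothing by default and prints to `stderr` when run with `GIT_TRACE=1`. Its `stdout` stays clean for scripts.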

Add usage text and `-h` handling, like all subcommands should consistently do
(our test suite will notice and complain if you fail to do so).
We'll need to include the `parse-options.h` header.

----
#include "parse-options.h"

...

int cmd_walken(int argc, const char **argv, const char *prefix)
{
	const char * const walken_usage[] = {
		N_("git walken"),
		NULL,
	};
	struct option options[] = {
		OPT_END()
	};

	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);

	...
}
----

Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix);
----

Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
maintaining alphabetical ordering:

----
{ "walken", cmd_walken, RUN_SETUP },
----

Add it to the `Makefile` near the line for `builtin/worktree.o`:

----
BUILTIN_OBJS += builtin/walken.o
----

Build and test out your command, without forgetting to ensure the `DEVELOPER`
flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:

----
$ echo DEVELOPER=1 >>config.mak
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken
----

NOTE: For a more exhaustive overview of the new command process, take a look at
`Documentation/MyFirstContribution.txt`.

NOTE: A reference implementation can be found at
https://github.com/nasamuffin/git/tree/revwalk.

=== `struct rev_cmdline_info`

The definition of `struct rev_cmdline_info` can be found in `revision.h`.

This struct is contained within the `rev_info` struct and is used to reflect
parameters provided by the user over the CLI.

`nr` represents the number of `rev_cmdline_entry` items present in the array.

`alloc` is used by the `ALLOC_GROW` macro (see `cache.h`) to track the
allocated size of the list.
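
The `nr`/`alloc` pair is a common Git idiom. `nr` counts the slots in use, `alloc` counts the slots allocated, and the array is reallocated only when `nr` would exceed `alloc`. The standalone sketch below re-creates that idiom using hypothetical `toy_` names. Its growth rule follows what we understand Git's `alloc_nr()` macro to compute, `((x) + 16) * 3 / 2`. Check `cache.h` for the authoritative definition.

```c
#include <assert.h>
#include <stdlib.h>

/* Growth rule modeled on Git's alloc_nr(); see cache.h for the real one. */
#define TOY_ALLOC_NR(x) (((x) + 16) * 3 / 2)

/* Hypothetical, simplified cousins of rev_cmdline_info and its entries. */
struct toy_cmdline_entry {
	const char *name; /* what the user typed, e.g. "HEAD" */
	unsigned flags;   /* e.g. an UNINTERESTING-like bit */
};

struct toy_cmdline_info {
	size_t nr;    /* entries in use */
	size_t alloc; /* entries allocated */
	struct toy_cmdline_entry *rev;
};

/* Append one entry, growing the array ALLOC_GROW-style. Returns 0 on success. */
int toy_add_entry(struct toy_cmdline_info *info, const char *name,
		  unsigned flags)
{
	size_t want;

	assert(info && name);
	want = info->nr + 1;
	if (want > info->alloc) {
		size_t grown = TOY_ALLOC_NR(info->alloc);
		size_t new_alloc = grown < want ? want : grown;
		struct toy_cmdline_entry *p =
			realloc(info->rev, new_alloc * sizeof(*p));

		if (!p)
			return -1;
		info->rev = p;
		info->alloc = new_alloc;
	}
	info->rev[info->nr].name = name;
	info->rev[info->nr].flags = flags;
	info->nr++;
	return 0;
}
```

Starting from an empty struct, the first append allocates 24 slots. The 25th append grows the array to 60 slots, so reallocation happens rarely as the list grows.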

Per entry, we find:

`item` is the object provided upon which to base the object walk. Items in Git
can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)

`name` is the name by which the user specified the object on the command line,
such as a ref name or an object ID (OID). An OID is a hex string you may be
familiar with from using Git to organize your source in the past. See the
beginning of the tutorial mentioned above for a discussion of where the OID can
come from.

`whence` indicates some information about what to do with the parents of the
specified object. We'll explore this flag more later on; take a look at
`Documentation/revisions.txt` to get an idea of what could set the `whence`
value.

`flags` are used to hint the beginning of the revision walk and are defined in
the first block under the `#include`s in `revision.h`. The most likely ones to
be set in the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these
same flags can be used during the walk, as well.

=== `struct rev_info`

This struct is quite a bit longer, and many of its fields are not configuration
options but are used only by `revision.c` during the walk. Most of the
configurable flags in `struct rev_info` have a mirror in
`Documentation/rev-list-options.txt`. It's a good idea to take some time and
read through that document.

== Basic Commit Walk

First, let's see if we can replicate the output of `git log --oneline`. We'll
refer back to the implementation frequently to discover norms when performing
an object walk of our own.

To do so, we'll first find all the commits, in order, which preceded the current
commit. We'll extract the name and subject of the commit from each.

Ideally, we will also be able to find out which ones are currently at the tip of
various branches.

=== Setting Up

Preparing for your object walk has some distinct stages.

1. Perform default setup for this mode, and others which may be invoked.
2. Check configuration files for relevant settings.
3. Set up the `rev_info` struct.
4. Tweak the initialized `rev_info` to suit the current walk.
5. Prepare the `rev_info` for the walk.
6. Iterate over the objects, processing each one.

==== Default Setups

Before examining configuration files which may modify command behavior, set up
default state for switches or options your command may have. If your command
utilizes other Git components, ask them to set up their default states as well.
For instance, `git log` takes advantage of `grep` and `diff` functionality, so
its `init_log_defaults()` sets its own state (`decoration_style`) and asks
`grep` and `diff` to initialize themselves by calling each of their
initialization functions.

==== Configuring From `.gitconfig`

Next, we should have a look at any relevant configuration settings (i.e.,
settings readable and settable from `git config`). This is done by providing a
callback to `git_config()`; within that callback, you can also invoke the
config callbacks of any other components that need to intercept these options.
Your callback will be invoked once for each configuration value which Git knows
about (global, local, worktree, etc.).

Similarly to the default values, we don't have anything to do here yet
ourselves; however, we should call `git_default_config()` if we aren't calling
any other existing config callbacks.

Add a new function to `builtin/walken.c`.
We'll also need to include the `config.h` header:

----
#include "config.h"

...

static int git_walken_config(const char *var, const char *value, void *cb)
{
	/*
	 * For now, we don't have any custom configuration, so fall back to
	 * the default config.
	 */
	return git_default_config(var, value, cb);
}
----

Make sure to invoke `git_config()` with it in your `cmd_walken()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	...

	git_config(git_walken_config, NULL);

	...
}
----
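
The callback pattern is easy to model outside of Git. In the sketch below, `toy_config()` plays the role of `git_config()` by invoking a callback once for each key/value pair. `toy_walken_config()` handles its own keys and delegates everything else, just as `git_walken_config()` falls back to `git_default_config()`. All `toy_` names and the `walken.*` key prefix are hypothetical. The callback shape mirrors the three-argument signature used in this tutorial.

```c
#include <assert.h>
#include <string.h>

/* Callback shape mirroring git_walken_config() above; names are hypothetical. */
typedef int (*toy_config_fn)(const char *var, const char *value, void *cb);

struct toy_pair {
	const char *var;
	const char *value;
};

/* Stand-in for git_config(): invoke fn once per known key/value pair. */
int toy_config(const struct toy_pair *pairs, size_t n, toy_config_fn fn,
	       void *cb)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(pairs[i].var, pairs[i].value, cb);

		if (ret < 0)
			return ret; /* in this toy, a negative return stops early */
	}
	return 0;
}

/* Where our toy callbacks record what they saw. */
struct toy_settings {
	int default_seen;
	int walken_seen;
};

/* Stand-in for git_default_config(): accepts every key it is given. */
int toy_default_config(const char *var, const char *value, void *cb)
{
	struct toy_settings *s = cb;

	(void)var;
	(void)value;
	s->default_seen++;
	return 0;
}

/* Like git_walken_config(): handle our own keys, delegate the rest. */
int toy_walken_config(const char *var, const char *value, void *cb)
{
	struct toy_settings *s = cb;

	assert(var != NULL);
	if (!strncmp(var, "walken.", strlen("walken."))) {
		s->walken_seen++;
		return 0;
	}
	return toy_default_config(var, value, cb);
}
```

Consider a configuration containing `core.editor`, `walken.verbose`, and `user.name`. The walken callback handles one of these keys and passes the other two to the default handler.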

==== Setting Up `rev_info`

Now that we've gathered external configuration and options, it's time to
initialize the `rev_info` object which we will use to perform the walk. This is
typically done by calling `repo_init_revisions()` with the repository you intend
to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
struct.

Add the `struct rev_info` and the `repo_init_revisions()` call.
We'll also need to include the `revision.h` header:

----
#include "revision.h"

...

int cmd_walken(int argc, const char **argv, const char *prefix)
{
	/* This can go wherever you like in your declarations. */
	struct rev_info rev;
	...

	/* This should go after the git_config() call. */
	repo_init_revisions(the_repository, &rev, prefix);

	...
}
----

==== Tweaking `rev_info` For the Walk

We're getting close, but we're still not quite ready to go. Now that `rev` is
initialized, we can modify it to fit our needs. This is usually done within a
helper for clarity, so let's add one:

----
static void final_rev_info_setup(struct rev_info *rev)
{
	/*
	 * We want to mimic the appearance of `git log --oneline`, so let's
	 * force oneline format.
	 */
	get_commit_format("oneline", rev);

	/* Start our object walk at HEAD. */
	add_head_to_pending(rev);
}
----

[NOTE]
====
Instead of using the shorthand `add_head_to_pending()`, you could do
something like this:
----
struct setup_revision_opt opt;

memset(&opt, 0, sizeof(opt));
opt.def = "HEAD";
opt.revarg_opt = REVARG_COMMITTISH;
setup_revisions(argc, argv, rev, &opt);
----
Using a `setup_revision_opt` gives you finer control over your walk's starting
point. Note that this needs `argc` and `argv` in scope, so
`final_rev_info_setup()` would have to take them as parameters.
====

Then let's invoke `final_rev_info_setup()` after the call to
`repo_init_revisions()`:

----
int cmd_walken(int argc, const char **argv, const char *prefix)
{
	...

	final_rev_info_setup(&rev);

	...
}
----

Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
now, this is all we need.

==== Preparing `rev_info` For the Walk

Now that `rev` is all initialized and configured, we've got one more setup step
before we get rolling. We can do this in a helper, which will both prepare the
`rev_info` for the walk, and perform the walk itself. Let's start the helper
with the call to `prepare_revision_walk()`, which can return an error without
dying on its own:

----
static void walken_commit_walk(struct rev_info *rev)
{
	if (prepare_revision_walk(rev))
		die(_("revision walk setup failed"));
}
----

NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
`stderr` it's likely to be seen by a human, so we will localize it.

==== Performing the Walk!

Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
can also be used as an iterator; we move to the next item in the walk by using
`get_revision()` repeatedly. Add the listed variable declarations at the top and
the walk loop below the `prepare_revision_walk()` call within your
`walken_commit_walk()`:

----
static void walken_commit_walk(struct rev_info *rev)
{
	struct commit *commit;
	struct strbuf prettybuf = STRBUF_INIT;

	...

	while ((commit = get_revision(rev))) {
		strbuf_reset(&prettybuf);
		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
		puts(prettybuf.buf);
	}
	strbuf_release(&prettybuf);
}
----

NOTE: `puts()` prints a `char*` to `stdout`, followed by a newline. Since this
is the part of the command we expect to be machine-parsed, we're sending it
directly to `stdout`.

Give it a shot.

----
$ make
$ ./bin-wrappers/git walken
----

You should see the subject lines of all the commits in your tree's history, in
order, ending with the initial commit, "Initial revision of "git", the
information manager from hell". Congratulations! You've written your first
revision walk. You can play with printing some additional fields from each
commit if you're curious; have a look at the functions available in
`commit.h`.
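
To build intuition for what `get_revision()` does on each call, here is a heavily simplified, hypothetical model. Pending commits sit in a queue. Each call removes the commit with the newest date and then queues that commit's parents, each at most once. This roughly matches the newest-first order you see from a plain `git log`. Git's real implementation uses a priority queue and many more flags. The linear scan and every `toy_` name here are our own simplifications.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy commit; Git's struct commit is much richer. */
struct toy_commit {
	const char *subject;
	long date;
	struct toy_commit *parents[2];
	int nr_parents;
	int added; /* already queued once */
};

#define TOY_QUEUE_MAX 64

/* Hypothetical toy walk state: just the pending queue. */
struct toy_rev_info {
	struct toy_commit *queue[TOY_QUEUE_MAX];
	int nr;
};

/* Queue a commit unless it was queued before. */
void toy_push(struct toy_rev_info *rev, struct toy_commit *c)
{
	if (c->added)
		return;
	assert(rev->nr < TOY_QUEUE_MAX);
	c->added = 1;
	rev->queue[rev->nr++] = c;
}

/* Like get_revision(): return the newest pending commit, or NULL when done. */
struct toy_commit *toy_get_revision(struct toy_rev_info *rev)
{
	struct toy_commit *c;
	int best = 0;

	if (!rev->nr)
		return NULL;
	for (int i = 1; i < rev->nr; i++)
		if (rev->queue[i]->date > rev->queue[best]->date)
			best = i;
	c = rev->queue[best];
	rev->queue[best] = rev->queue[--rev->nr]; /* unordered removal */
	for (int i = 0; i < c->nr_parents; i++)
		toy_push(rev, c->parents[i]);
	return c;
}
```

A loop of the form `while ((c = toy_get_revision(&rev)))` mirrors the `get_revision()` loop above. Consider a diamond-shaped history: d has parents b and c, both of which have parent a, with dates increasing from a to d. Starting from d, the loop yields d, c, b, a. Commit a is returned only once.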

=== Adding a Filter

Next, let's try to filter the commits we see based on their author. This is
equivalent to running `git log --author=<pattern>`. We can add a filter by
modifying `rev_info.grep_filter`, which is a `struct grep_opt`.

First some setup. Add `grep_config()` to `git_walken_config()`:

----
static int git_walken_config(const char *var, const char *value, void *cb)
{
	grep_config(var, value, cb);
	return git_default_config(var, value, cb);
}
----

Next, we can modify the `grep_filter`. This is done with convenience functions
found in `grep.h`. For fun, we're filtering to only commits from folks using a
`gmail.com` email address, a not-very-precise guess at who may be working on
Git as a hobby. Since we're checking the author, which is a specific line in the
header, we'll use the `append_header_grep_pattern()` helper. We can use
the `enum grep_header_field` to indicate which part of the commit header we want
to search.

In `final_rev_info_setup()`, add your filter line. From here on, the helper
also takes `argc`, `argv`, and `prefix` (for example, so that it can call
`setup_revisions()` as described in the earlier note); update its call in
`cmd_walken()` to `final_rev_info_setup(argc, argv, prefix, &rev);` to match.

----
static void final_rev_info_setup(int argc, const char **argv,
				 const char *prefix, struct rev_info *rev)
{
	...

	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
				   "gmail");
	compile_grep_patterns(&rev->grep_filter);

	...
}
----

`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
it won't work unless we compile it with `compile_grep_patterns()`.

NOTE: If you are using `setup_revisions()` (for example, if you are passing a
`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.

NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
`enum grep_pat_token` for us.
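
What makes a header pattern different from a plain message grep is the region it searches. The hypothetical sketch below searches only the `author` header line of a raw commit buffer, where headers end at the first empty line. As a result, a `gmail` address in the `committer` line or in the message body does not match. Git's grep machinery supports regular expressions and boolean combinations, whereas this toy uses a plain substring search and is not Git's implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: does `pattern` occur in the first `len` bytes of `s`? */
static int toy_contains(const char *s, size_t len, const char *pattern)
{
	size_t plen = strlen(pattern);

	for (size_t i = 0; i + plen <= len; i++)
		if (!strncmp(s + i, pattern, plen))
			return 1;
	return 0;
}

/*
 * Hypothetical: search for `pattern` only on the "author " header line of
 * a raw commit buffer. Headers end at the first empty line.
 */
int toy_author_matches(const char *raw_commit, const char *pattern)
{
	const char *line = raw_commit;

	assert(raw_commit && pattern);
	while (line && *line && *line != '\n') {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (!strncmp(line, "author ", strlen("author ")))
			return toy_contains(line, len, pattern);
		line = eol ? eol + 1 : NULL;
	}
	return 0;
}
```

Restricting the search to one header line is why an author filter does not match commits that only mention "gmail" elsewhere in the commit.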

=== Changing the Order

There are a few ways that we can change the order of the commits during a
revision walk. First, we can use the `enum rev_sort_order` to choose from some
typical orderings.

`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
before all of its children have been shown, and we avoid mixing commits which
are in different lines of history. (`git help log`'s section on `--topo-order`
has a very nice diagram to illustrate this.)
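
The core guarantee that no parent is shown before all of its children can be sketched with Kahn's algorithm. Count each commit's children, and emit a commit only once that count reaches zero. Among the commits that are ready, the sketch picks the newest by either commit date or author date. This is the kind of tie-breaking that the `REV_SORT_BY_COMMIT_DATE` and `REV_SORT_BY_AUTHOR_DATE` orderings select. The code is a standalone illustration that uses hypothetical names and a fixed-size array. It is not Git's `sort_in_topological_order()` and does not attempt the "avoid mixing lines of history" behavior.

```c
#include <assert.h>

#define TOY_MAX_NODES 8

/* Hypothetical toy commit: two dates plus parent indices (-1 = none). */
struct toy_node {
	long author_date;
	long commit_date;
	int parents[2];
};

static long toy_date(const struct toy_node *c, int by_author)
{
	return by_author ? c->author_date : c->commit_date;
}

/*
 * Kahn-style topological sort: a commit is "ready" only once all of its
 * children have been emitted, so no parent precedes a child. Among ready
 * commits, emit the newest by the chosen date. Writes commit indices to
 * `out` and returns how many were written.
 */
int toy_topo_sort(const struct toy_node *c, int n, int by_author, int *out)
{
	int children[TOY_MAX_NODES] = { 0 };
	int done[TOY_MAX_NODES] = { 0 };
	int emitted = 0;

	assert(n <= TOY_MAX_NODES);
	for (int i = 0; i < n; i++)
		for (int p = 0; p < 2; p++)
			if (c[i].parents[p] >= 0)
				children[c[i].parents[p]]++;

	while (emitted < n) {
		int best = -1;

		for (int i = 0; i < n; i++) {
			if (done[i] || children[i])
				continue;
			if (best < 0 ||
			    toy_date(&c[i], by_author) > toy_date(&c[best], by_author))
				best = i;
		}
		if (best < 0)
			break; /* only possible if the input has a cycle */
		done[best] = 1;
		out[emitted++] = best;
		for (int p = 0; p < 2; p++)
			if (c[best].parents[p] >= 0)
				children[c[best].parents[p]]--;
	}
	return emitted;
}
```

Consider a merge whose two parents have crossed dates: one parent has the newer author date, and the other has the newer commit date. The two orderings emit those parents in opposite order. In both orderings, the merge comes first and the root comes last.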

Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
`REV_SORT_BY_AUTHOR_DATE`. Add the following:

----
static void final_rev_info_setup(int argc, const char **argv,
				 const char *prefix, struct rev_info *rev)
{
	...

	rev->topo_order = 1;
	rev->sort_order = REV_SORT_BY_COMMIT_DATE;

	...
}
----

Let's output this into a file so we can easily diff it with the walk sorted by
author date.

----
$ make
$ ./bin-wrappers/git walken > commit-date.txt
----

Then, let's sort by author date and run it again.

----
static void final_rev_info_setup(int argc, const char **argv,
				 const char *prefix, struct rev_info *rev)
{
	...

	rev->topo_order = 1;
	rev->sort_order = REV_SORT_BY_AUTHOR_DATE;

	...
}
----

----
$ make
$ ./bin-wrappers/git walken > author-date.txt
----

Finally, compare the two. This is a little less helpful without object names or
dates, but hopefully we get the idea.

----
$ diff -u commit-date.txt author-date.txt
----

This display indicates that a commit's author date and commit date can diverge
after the commit is first written. For example, when `git rebase` rewrites a
commit, the new commit gets a fresh commit date but keeps its original author
date.

Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
Set that flag somewhere inside of `final_rev_info_setup()`:

----
static void final_rev_info_setup(int argc, const char **argv,
				 const char *prefix, struct rev_info *rev)
{
	...

	rev->reverse = 1;

	...
}
----

Run your walk again and note the difference in order. (If you remove the grep
pattern, you should see the last commit this call gives you as your current
HEAD.)
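
A reverse walk has a cost worth noticing. The first commit to print is the last one the forward walk would find, so nothing can be emitted until the whole walk is known. As we read `revision.c`, `get_revision()` handles `reverse` by first draining the forward walk into a list. The toy below models only the final step, applied to an array of walk positions. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical: given the complete forward walk order (e.g. commit indices,
 * newest first), reverse it in place so the oldest commit comes first.
 */
void toy_reverse_walk(int *order, int n)
{
	assert(n >= 0);
	for (int i = 0, j = n - 1; i < j; i++, j--) {
		int tmp = order[i];

		order[i] = order[j];
		order[j] = tmp;
	}
}
```

Because the whole array must exist before it can be reversed, a reverse walk holds every commit it will show in memory at once.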

== Basic Object Walk

So far we've been walking only commits. But Git has more types of objects than
that! Let's see if we can walk _all_ objects, and find out some information
about each one.

We can base our work on an example. `git pack-objects` prepares all kinds of
objects for packing into a bitmap or packfile. The work we are interested in
resides in `builtin/pack-objects.c:get_object_list()`; examination of that
function shows that the all-object walk is being performed by
`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
functions reside in `list-objects.c`; examining the source shows that, despite
the name, these functions traverse all kinds of objects. Let's have a look at
the arguments to `traverse_commit_list()`.

- `struct rev_info *revs`: This is the `rev_info` used for the walk. If
 its `filter` member is not `NULL`, then `filter` contains information for
 how to filter the object list.
- `show_commit_fn show_commit`: A callback which will be used to handle each
 individual commit object.
- `show_object_fn show_object`: A callback which will be used to handle each
 non-commit object (so each blob, tree, or tag).
- `void *show_data`: A context buffer which is passed in turn to `show_commit`
 and `show_object`.

In addition, `traverse_commit_list_filtered()` takes one more parameter:

- `struct oidset *omitted`: A set of object IDs which the provided
 filter caused to be omitted.

Rather than requiring us to call an iterator like `get_revision()` repeatedly
ourselves, these functions invoke callbacks we provide. Cool! Let's add the
callbacks first.

For the sake of this tutorial, we'll simply keep track of how many of each kind
of object we find. At file scope in `builtin/walken.c` add the following
tracking variables:

----
static int commit_count;
static int tag_count;
static int blob_count;
static int tree_count;
----

Commits are handled by a different callback than other objects; let's do that
one first:

----
static void walken_show_commit(struct commit *cmt, void *buf)
{
	commit_count++;
}
----

The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
the `buf` argument is actually the context buffer that we can provide to the
traversal calls, `show_data`, which we mentioned a moment ago.

Since we have the `struct commit` object, we can look at all the same parts that
we looked at in our earlier commit-only walk. For the sake of this tutorial,
though, we'll just increment the commit counter and move on.

The callback for non-commits is a little different, as we'll need to check
which kind of object we're dealing with:

----
static void walken_show_object(struct object *obj, const char *str, void *buf)
{
	switch (obj->type) {
	case OBJ_TREE:
		tree_count++;
		break;
	case OBJ_BLOB:
		blob_count++;
		break;
	case OBJ_TAG:
		tag_count++;
		break;
	case OBJ_COMMIT:
		BUG("unexpected commit object in walken_show_object");
	default:
		BUG("unexpected object type %s in walken_show_object",
			type_name(obj->type));
	}
}
----

Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
context pointer that `walken_show_commit()` receives: the `show_data` argument
to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
`str` contains the name of the object, which ends up being something like
`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).

To help assure us that we aren't double-counting commits, we'll include some
complaining if a commit object is routed through our non-commit callback; we'll
also complain if we see an invalid object type. Since those two cases should be
unreachable, and would only change in the event of a semantic change to the Git
codebase, we complain by using `BUG()`. This is a signal to a developer that
the change they made caused unintended consequences, and the rest of the
codebase needs to be updated to understand that change. `BUG()` is not intended
to be seen by the public, so it is not localized.
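
The file-scope counters work, but the `show_data` pointer exists so that callbacks don't need globals. The hypothetical sketch below shows the same dispatch pattern outside of Git. `toy_traverse()` stands in for `traverse_commit_list()`: it routes commits and non-commits to different callbacks and passes a `struct toy_counts` through as `show_data`. None of these names exist in Git, and `abort()` merely stands in for `BUG()`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical toy object model; not Git's struct object. */
enum toy_obj_type { TOY_OBJ_COMMIT, TOY_OBJ_TREE, TOY_OBJ_BLOB, TOY_OBJ_TAG };

struct toy_obj {
	enum toy_obj_type type;
	const char *name; /* e.g. "foo.txt", "bar/baz", "v1.2.3" */
};

typedef void (*toy_show_commit_fn)(const struct toy_obj *commit, void *data);
typedef void (*toy_show_object_fn)(const struct toy_obj *obj,
				   const char *name, void *data);

/* Counters travel through show_data instead of living at file scope. */
struct toy_counts {
	int commits, trees, blobs, tags;
};

void toy_show_commit(const struct toy_obj *commit, void *data)
{
	struct toy_counts *counts = data;

	(void)commit;
	counts->commits++;
}

void toy_show_object(const struct toy_obj *obj, const char *name, void *data)
{
	struct toy_counts *counts = data;

	(void)name;
	switch (obj->type) {
	case TOY_OBJ_TREE:
		counts->trees++;
		break;
	case TOY_OBJ_BLOB:
		counts->blobs++;
		break;
	case TOY_OBJ_TAG:
		counts->tags++;
		break;
	default:
		/* Stand-in for BUG(): commits must never reach this callback. */
		fprintf(stderr, "BUG: unexpected object type %d\n", (int)obj->type);
		abort();
	}
}

/* Stand-in for traverse_commit_list(): dispatch each object to a callback. */
void toy_traverse(const struct toy_obj *objs, int n,
		  toy_show_commit_fn show_commit,
		  toy_show_object_fn show_object, void *show_data)
{
	for (int i = 0; i < n; i++) {
		if (objs[i].type == TOY_OBJ_COMMIT)
			show_commit(&objs[i], show_data);
		else
			show_object(&objs[i], objs[i].name, show_data);
	}
}
```

Because the counters live in the caller's struct, two walks can run back to back without any global state to reset. With the file-scope counters, by contrast, `walken_object_walk()` has to zero them explicitly before each walk.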

Our main object walk implementation is substantially different from our commit
walk implementation, so let's make a new function to perform the object walk. We
can perform setup which is applicable to all objects here, too, keeping it
separate from setup which is applicable to commit-only walks.

We'll start by enabling all types of objects in the `struct rev_info`. We'll
also turn on `tree_blobs_in_commit_order`, which means that we will walk a
commit's tree and everything it points to immediately after we find each commit,
as opposed to waiting for the end and walking through all trees after the commit
history has been discovered. With the appropriate settings configured, we are
ready to call `prepare_revision_walk()`.

----
static void walken_object_walk(struct rev_info *rev)
{
	rev->tree_objects = 1;
	rev->blob_objects = 1;
	rev->tag_objects = 1;
	rev->tree_blobs_in_commit_order = 1;

	if (prepare_revision_walk(rev))
		die(_("revision walk setup failed"));

	commit_count = 0;
	tag_count = 0;
	blob_count = 0;
	tree_count = 0;
----

Let's start by calling just the unfiltered walk and reporting our counts.
Complete your implementation of `walken_object_walk()`.
We'll also need to include the `list-objects.h` header.

----
#include "list-objects.h"

...

	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);

	printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
		blob_count, tag_count, tree_count);
}
----

NOTE: This output is intended to be machine-parsed. Therefore, we are not
sending it to `trace_printf()`, and we are not localizing it: we need scripts
to be able to count on the formatting to be exactly the way it is shown here.
If we were intending this output to be read by humans, we would need to localize
it with `_()`.

Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
command line options is out of scope for this tutorial, so we'll just hardcode
a branch we can change at compile time. Where you call `final_rev_info_setup()`
and `walken_commit_walk()`, instead branch like so:

----
	if (1) {
		add_head_to_pending(&rev);
		walken_object_walk(&rev);
	} else {
		final_rev_info_setup(argc, argv, prefix, &rev);
		walken_commit_walk(&rev);
	}
----

NOTE: For simplicity, we've avoided all the filters and sorts we applied in
`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
want, you can certainly use the filters we added before by moving
`final_rev_info_setup()` out of the conditional and removing the call to
`add_head_to_pending()`.

Now we can try to run our command! It should take noticeably longer than the
commit walk, but an examination of the output will give you an idea why. Your
output should look similar to this example, but with different counts:

----
commits 55733
blobs 100274
tags 0
trees 104210
----

This makes sense. We have more trees than commits because the Git project has
lots of subdirectories which can change, plus at least one tree per commit. We
have no tags because we started on a commit (`HEAD`) and while tags can point to
commits, commits can't point to tags.

NOTE: You will have different counts when you run this yourself! The number of
objects grows along with the Git project.

=== Adding a Filter

There are a handful of filters that we can apply to the object walk laid out in
`Documentation/rev-list-options.txt`. These filters are typically useful for
operations such as creating packfiles or performing a partial clone. They are
defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
will use the "tree:1" filter, which causes the walk to omit all trees and blobs
which are not directly referenced by commits reachable from the commit in
`pending` when the walk begins. (`pending` is the list of objects which need to
be traversed during a walk; it may help to imagine a breadth-first tree
traversal. In our case, that means we omit trees and blobs not directly
referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
`HEAD` in the `pending` list.)

For now, we are not going to track the omitted objects, so we can keep calling
`traverse_commit_list()`, which does not take an `omitted` parameter. For the
sake of simplicity, we'll add a simple build-time branch to use our filter or
not. Preface the line calling `traverse_commit_list()` with the following,
which will remind us which kind of walk we've just performed:

----
	if (0) {
		/* Unfiltered: */
		trace_printf(_("Unfiltered object walk.\n"));
	} else {
		trace_printf(
			_("Filtered object walk with filterspec 'tree:1'.\n"));
		CALLOC_ARRAY(rev->filter, 1);
		parse_list_objects_filter(rev->filter, "tree:1");
	}
	traverse_commit_list(rev, walken_show_commit,
			     walken_show_object, NULL);
----

The `rev->filter` member is usually built directly from a command
line argument, so the module provides an easy way to build one from a string.
Even though we aren't taking user input right now, we can still build one with
a hardcoded string using `parse_list_objects_filter()`.

With the filter spec "tree:1", we are expecting to see _only_ the root tree for
each commit; therefore, the tree object count should be less than or equal to
the number of commits. (For an example of why it can be strictly less: a commit
made by `git revert` to undo its parent points to the same tree object as its
grandparent.)
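
The depth rule behind "tree:<depth>" can be sketched on a single toy tree. According to the `--filter=tree:<depth>` description in `Documentation/rev-list-options.txt`, objects whose depth from the root tree is at least `<depth>` are omitted. The root tree is at depth 0, so "tree:1" keeps only the root tree. The hypothetical `toy_filter_walk()` still descends into omitted trees so that it can count them. This mirrors the extra work required when omitted objects must be reported. Unlike Git, the toy ignores objects shared between commits and the minimum-depth rule for objects reachable at several depths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy tree entry: a tree (with children) or a blob. */
struct toy_entry {
	int is_tree;
	const struct toy_entry *kids[4];
	int nr_kids;
};

struct toy_filter_result {
	int trees, blobs, omitted;
};

/*
 * Sketch of a "tree:<max_depth>" filter: objects at depth < max_depth are
 * shown, deeper ones are omitted. The root tree is depth 0.
 */
void toy_filter_walk(const struct toy_entry *e, int depth, int max_depth,
		     struct toy_filter_result *r)
{
	assert(e && r);
	if (depth >= max_depth)
		r->omitted++;
	else if (e->is_tree)
		r->trees++;
	else
		r->blobs++;
	/* Keep descending so omitted objects can be counted, too. */
	for (int i = 0; i < e->nr_kids; i++)
		toy_filter_walk(e->kids[i], depth + 1, max_depth, r);
}
```

Consider a root tree that holds one blob and one subtree, where the subtree holds another blob. The "tree:1" filter shows 1 tree and omits 3 objects, while a large depth shows all 4. In both cases, shown plus omitted equals the unfiltered total.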

=== Counting Omitted Objects

We also have the capability to enumerate all objects which were omitted by a
filter, like with `git rev-list --objects --filter=<spec> --filter-print-omitted`.
Asking `traverse_commit_list_filtered()` to populate the `omitted` list means
that our object walk does not perform any better than an unfiltered object walk;
all reachable objects are walked in order to populate the list.

First, add the `struct oidset` and related items we will use to iterate it:

----
static void walken_object_walk(
	...

	struct oidset omitted;
	struct oidset_iter oit;
	struct object_id *oid = NULL;
	int omitted_count = 0;
	oidset_init(&omitted, 0);

	...
----

Replace the call to `traverse_commit_list()` with a call to
`traverse_commit_list_filtered()` that passes in your `omitted` object:

----
	...

	traverse_commit_list_filtered(rev,
		walken_show_commit, walken_show_object, NULL, &omitted);

	...
----

Then, after your traversal, iterating over the `oidset` is pretty
straightforward. Count all the objects within, modify the print statement, and
free the set once you are done with it:

----
	/* Count the omitted objects. */
	oidset_iter_init(&omitted, &oit);

	while ((oid = oidset_iter_next(&oit)))
		omitted_count++;

	printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
		commit_count, blob_count, tag_count, tree_count, omitted_count);

	/* Release the memory held by the set. */
	oidset_clear(&omitted);
----

By running your walk with and without the filter, you should find that the total
object count in each case is identical: the filtered walk's counts plus its
`omitted` count add up to the unfiltered walk's counts. You can also time each
invocation of the `walken` subcommand, with and without `omitted` being passed
in, to confirm to yourself the runtime impact of tracking all omitted objects.

=== Changing the Order

Finally, let's demonstrate that you can also reorder walks of all objects, not
just walks of commits. First, we'll make our handlers chattier: modify
`walken_show_commit()` and `walken_show_object()` to print the object as they
go:

----
static void walken_show_commit(struct commit *cmt, void *buf)
{
	trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
	commit_count++;
}

static void walken_show_object(struct object *obj, const char *str, void *buf)
{
	trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));

	...
}
----

NOTE: Since we will be examining this output directly as humans, we'll use
`trace_printf()` here. Additionally, since this change introduces a significant
number of printed lines, using `trace_printf()` will allow us to easily silence
those lines without having to recompile.

(Leave the counter increment logic in place.)

With only that change, rebuild and run again (but save yourself some
scrollback). With `GIT_TRACE=1`, trace output goes to `stderr`, so redirect it
into the pipe:

----
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
----

Take a look at the top commit with `git show` and the object ID you printed; it
should be the same as the output of `git show HEAD`.

Next, let's change a setting on our `struct rev_info` within
`walken_object_walk()`. Find where you're changing the other settings on `rev`,
such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
`reverse` setting at the bottom:

----
	...

	rev->tree_objects = 1;
	rev->blob_objects = 1;
	rev->tag_objects = 1;
	rev->tree_blobs_in_commit_order = 1;
	rev->reverse = 1;

	...
----

Now, run again, but this time, let's grab the last handful of objects instead
of the first handful. (The counts printed to `stdout` will most likely appear
at the very end.)

----
$ make
$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
----

The last commit object given should have the same OID as the one we saw at the
top before, and running `git show <oid>` with that OID should give you again
the same results as `git show HEAD`. Furthermore, if you run and examine the
first ten lines again (with `head` instead of `tail` like we did before applying
the `reverse` setting), you should see that now the first commit printed is the
initial commit, `e83c5163`.

== Wrapping Up

Let's review. In this tutorial, we:

- Built a commit walk from the ground up
- Enabled a grep filter for that commit walk
- Changed the sort order of that filtered commit walk
- Built an object walk (tags, commits, trees, and blobs) from the ground up
- Learned how to add a filter-spec to an object walk
- Changed the display order of the filtered object walk