Blame - MyFirstObjectWalk.txt - external/gitster/git-htmldocs

blob: 8d9e85566e642ecc47620cc4622414d587d3b993 [file] [log] [blame]

Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	1	= My First Object Walk
				2
				3	== What's an Object Walk?
				4
				5	The object walk is a key concept in Git - this is the process that underpins
				6	operations like object transfer and fsck. Beginning from a given commit, the
				7	list of objects is found by walking parent relationships between commits (commit
				8	X based on commit W) and containment relationships between objects (tree Y is
				9	contained within commit X, and blob Z is located within tree Y, giving our
				10	working tree for commit X something like `y/z.txt`).
				11
				12	A related concept is the revision walk, which is focused on commit objects and
				13	their parent relationships and does not delve into other object types. The
				14	revision walk is used for operations like `git log`.
				15
				16	=== Related Reading
				17
				18	- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
				19	the revision walker in its various incarnations.
Junio C Hamano	2267da5	2019-12-18 23:09:43	[diff] [blame]	20	- `revision.h`
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	21	- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
				22	gives a good overview of the types of objects in Git and what your object
				23	walk is really describing.
				24
				25	== Setting Up
				26
				27	Create a new branch from `master`.
				28
				29	----
				30	git checkout -b revwalk origin/master
				31	----
				32
				33	We'll put our fiddling into a new command. For fun, let's name it `git walken`.
				34	Open up a new file `builtin/walken.c` and set up the command handler:
				35
				36	----
				37	/*
				38	* "git walken"
				39	*
				40	* Part of the "My First Object Walk" tutorial.
				41	*/
				42
				43	#include "builtin.h"
				44
				45	int cmd_walken(int argc, const char *argv, const char prefix)
				46	{
				47	trace_printf(_("cmd_walken incoming...\n"));
				48	return 0;
				49	}
				50	----
				51
				52	NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
				53	off at runtime. For the purposes of this tutorial, we will write `walken` as
				54	though it is intended for use as a "plumbing" command: that is, a command which
				55	is used primarily in scripts, rather than interactively by humans (a "porcelain"
				56	command). So we will send our debug output to `trace_printf()` instead. When
				57	running, enable trace output by setting the environment variable `GIT_TRACE`.
				58
				59	Add usage text and `-h` handling, like all subcommands should consistently do
				60	(our test suite will notice and complain if you fail to do so).
Junio C Hamano	59a32b0	2021-12-10 22:53:38	[diff] [blame]	61	We'll need to include the `parse-options.h` header.
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	62
				63	----
Junio C Hamano	59a32b0	2021-12-10 22:53:38	[diff] [blame]	64	#include "parse-options.h"
				65
				66	...
				67
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	68	int cmd_walken(int argc, const char *argv, const char prefix)
				69	{
				70	const char * const walken_usage[] = {
				71	N_("git walken"),
				72	NULL,
Junio C Hamano	59a32b0	2021-12-10 22:53:38	[diff] [blame]	73	};
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	74	struct option options[] = {
				75	OPT_END()
				76	};
				77
				78	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
				79
				80	...
				81	}
				82	----
				83
				84	Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
				85
				86	----
				87	int cmd_walken(int argc, const char *argv, const char prefix);
				88	----
				89
				90	Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
				91	maintaining alphabetical ordering:
				92
				93	----
				94	{ "walken", cmd_walken, RUN_SETUP },
				95	----
				96
				97	Add it to the `Makefile` near the line for `builtin/worktree.o`:
				98
				99	----
				100	BUILTIN_OBJS += builtin/walken.o
				101	----
				102
				103	Build and test out your command, without forgetting to ensure the `DEVELOPER`
				104	flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:
				105
				106	----
				107	$ echo DEVELOPER=1 >>config.mak
				108	$ make
				109	$ GIT_TRACE=1 ./bin-wrappers/git walken
				110	----
				111
				112	NOTE: For a more exhaustive overview of the new command process, take a look at
				113	`Documentation/MyFirstContribution.txt`.
				114
				115	NOTE: A reference implementation can be found at
				116	https://github.com/nasamuffin/git/tree/revwalk.
				117
				118	=== `struct rev_cmdline_info`
				119
				120	The definition of `struct rev_cmdline_info` can be found in `revision.h`.
				121
				122	This struct is contained within the `rev_info` struct and is used to reflect
				123	parameters provided by the user over the CLI.
				124
				125	`nr` represents the number of `rev_cmdline_entry` present in the array.
				126
Junio C Hamano	2267da5	2019-12-18 23:09:43	[diff] [blame]	127	`alloc` is used by the `ALLOC_GROW` macro. Check `cache.h` - this variable is
				128	used to track the allocated size of the list.
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	129
				130	Per entry, we find:
				131
				132	`item` is the object provided upon which to base the object walk. Items in Git
				133	can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
				134
				135	`name` is the object ID (OID) of the object - a hex string you may be familiar
				136	with from using Git to organize your source in the past. Check the tutorial
				137	mentioned above towards the top for a discussion of where the OID can come
				138	from.
				139
				140	`whence` indicates some information about what to do with the parents of the
				141	specified object. We'll explore this flag more later on; take a look at
				142	`Documentation/revisions.txt` to get an idea of what could set the `whence`
				143	value.
				144
				145	`flags` are used to hint the beginning of the revision walk and are the first
				146	block under the `#include`s in `revision.h`. The most likely ones to be set in
				147	the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
				148	can be used during the walk, as well.
				149
				150	=== `struct rev_info`
				151
				152	This one is quite a bit longer, and many fields are only used during the walk
				153	by `revision.c` - not configuration options. Most of the configurable flags in
				154	`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
				155	good idea to take some time and read through that document.
				156
				157	== Basic Commit Walk
				158
				159	First, let's see if we can replicate the output of `git log --oneline`. We'll
				160	refer back to the implementation frequently to discover norms when performing
				161	an object walk of our own.
				162
				163	To do so, we'll first find all the commits, in order, which preceded the current
				164	commit. We'll extract the name and subject of the commit from each.
				165
				166	Ideally, we will also be able to find out which ones are currently at the tip of
				167	various branches.
				168
				169	=== Setting Up
				170
				171	Preparing for your object walk has some distinct stages.
				172
				173	1. Perform default setup for this mode, and others which may be invoked.
				174	2. Check configuration files for relevant settings.
				175	3. Set up the `rev_info` struct.
				176	4. Tweak the initialized `rev_info` to suit the current walk.
				177	5. Prepare the `rev_info` for the walk.
				178	6. Iterate over the objects, processing each one.
				179
				180	==== Default Setups
				181
				182	Before examining configuration files which may modify command behavior, set up
				183	default state for switches or options your command may have. If your command
				184	utilizes other Git components, ask them to set up their default states as well.
				185	For instance, `git log` takes advantage of `grep` and `diff` functionality, so
				186	its `init_log_defaults()` sets its own state (`decoration_style`) and asks
				187	`grep` and `diff` to initialize themselves by calling each of their
				188	initialization functions.
				189
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	190	==== Configuring From `.gitconfig`
				191
				192	Next, we should have a look at any relevant configuration settings (i.e.,
				193	settings readable and settable from `git config`). This is done by providing a
				194	callback to `git_config()`; within that callback, you can also invoke methods
				195	from other components you may need that need to intercept these options. Your
				196	callback will be invoked once per each configuration value which Git knows about
				197	(global, local, worktree, etc.).
				198
				199	Similarly to the default values, we don't have anything to do here yet
				200	ourselves; however, we should call `git_default_config()` if we aren't calling
				201	any other existing config callbacks.
				202
Junio C Hamano	59a32b0	2021-12-10 22:53:38	[diff] [blame]	203	Add a new function to `builtin/walken.c`.
				204	We'll also need to include the `config.h` header:
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	205
				206	----
Junio C Hamano	59a32b0	2021-12-10 22:53:38	[diff] [blame]	207	#include "config.h"
				208
				209	...
				210
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	211	static int git_walken_config(const char var, const char value, void *cb)
				212	{
				213	/*
				214	* For now, we don't have any custom configuration, so fall back to
				215	* the default config.
				216	*/
				217	return git_default_config(var, value, cb);
				218	}
				219	----
				220
				221	Make sure to invoke `git_config()` with it in your `cmd_walken()`:
				222
				223	----
				224	int cmd_walken(int argc, const char *argv, const char prefix)
				225	{
				226	...
				227
				228	git_config(git_walken_config, NULL);
				229
				230	...
				231	}
				232	----
				233
				234	==== Setting Up `rev_info`
				235
				236	Now that we've gathered external configuration and options, it's time to
				237	initialize the `rev_info` object which we will use to perform the walk. This is
				238	typically done by calling `repo_init_revisions()` with the repository you intend
				239	to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
				240	struct.
				241
Junio C Hamano	59a32b0	2021-12-10 22:53:38	[diff] [blame]	242	Add the `struct rev_info` and the `repo_init_revisions()` call.
				243	We'll also need to include the `revision.h` header:
				244
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	245	----
Junio C Hamano	59a32b0	2021-12-10 22:53:38	[diff] [blame]	246	#include "revision.h"
				247
				248	...
				249
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	250	int cmd_walken(int argc, const char *argv, const char prefix)
				251	{
				252	/* This can go wherever you like in your declarations.*/
				253	struct rev_info rev;
				254	...
				255
				256	/* This should go after the git_config() call. */
				257	repo_init_revisions(the_repository, &rev, prefix);
				258
				259	...
				260	}
				261	----
				262
				263	==== Tweaking `rev_info` For the Walk
				264
				265	We're getting close, but we're still not quite ready to go. Now that `rev` is
				266	initialized, we can modify it to fit our needs. This is usually done within a
				267	helper for clarity, so let's add one:
				268
				269	----
				270	static void final_rev_info_setup(struct rev_info *rev)
				271	{
				272	/*
				273	* We want to mimic the appearance of `git log --oneline`, so let's
				274	* force oneline format.
				275	*/
				276	get_commit_format("oneline", rev);
				277
				278	/* Start our object walk at HEAD. */
				279	add_head_to_pending(rev);
				280	}
				281	----
				282
				283	[NOTE]
				284	====
				285	Instead of using the shorthand `add_head_to_pending()`, you could do
				286	something like this:
				287	----
				288	struct setup_revision_opt opt;
				289
				290	memset(&opt, 0, sizeof(opt));
				291	opt.def = "HEAD";
				292	opt.revarg_opt = REVARG_COMMITTISH;
				293	setup_revisions(argc, argv, rev, &opt);
				294	----
				295	Using a `setup_revision_opt` gives you finer control over your walk's starting
				296	point.
				297	====
				298
				299	Then let's invoke `final_rev_info_setup()` after the call to
				300	`repo_init_revisions()`:
				301
				302	----
				303	int cmd_walken(int argc, const char *argv, const char prefix)
				304	{
				305	...
				306
				307	final_rev_info_setup(&rev);
				308
				309	...
				310	}
				311	----
				312
				313	Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
				314	now, this is all we need.
				315
				316	==== Preparing `rev_info` For the Walk
				317
				318	Now that `rev` is all initialized and configured, we've got one more setup step
				319	before we get rolling. We can do this in a helper, which will both prepare the
				320	`rev_info` for the walk, and perform the walk itself. Let's start the helper
				321	with the call to `prepare_revision_walk()`, which can return an error without
				322	dying on its own:
				323
				324	----
				325	static void walken_commit_walk(struct rev_info *rev)
				326	{
				327	if (prepare_revision_walk(rev))
				328	die(_("revision walk setup failed"));
				329	}
				330	----
				331
				332	NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
				333	`stderr` it's likely to be seen by a human, so we will localize it.
				334
				335	==== Performing the Walk!
				336
				337	Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
				338	can also be used as an iterator; we move to the next item in the walk by using
				339	`get_revision()` repeatedly. Add the listed variable declarations at the top and
				340	the walk loop below the `prepare_revision_walk()` call within your
				341	`walken_commit_walk()`:
				342
				343	----
				344	static void walken_commit_walk(struct rev_info *rev)
				345	{
				346	struct commit *commit;
				347	struct strbuf prettybuf = STRBUF_INIT;
				348
				349	...
				350
				351	while ((commit = get_revision(rev))) {
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	352	strbuf_reset(&prettybuf);
				353	pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
				354	puts(prettybuf.buf);
				355	}
				356	strbuf_release(&prettybuf);
				357	}
				358	----
				359
				360	NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
				361	command we expect to be machine-parsed, we're sending it directly to stdout.
				362
				363	Give it a shot.
				364
				365	----
				366	$ make
				367	$ ./bin-wrappers/git walken
				368	----
				369
				370	You should see all of the subject lines of all the commits in
				371	your tree's history, in order, ending with the initial commit, "Initial revision
				372	of "git", the information manager from hell". Congratulations! You've written
				373	your first revision walk. You can play with printing some additional fields
				374	from each commit if you're curious; have a look at the functions available in
				375	`commit.h`.
				376
				377	=== Adding a Filter
				378
				379	Next, let's try to filter the commits we see based on their author. This is
				380	equivalent to running `git log --author=<pattern>`. We can add a filter by
				381	modifying `rev_info.grep_filter`, which is a `struct grep_opt`.
				382
Junio C Hamano	992fbdc	2020-12-09 00:14:29	[diff] [blame]	383	First some setup. Add `grep_config()` to `git_walken_config()`:
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	384
				385	----
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	386	static int git_walken_config(const char var, const char value, void *cb)
				387	{
				388	grep_config(var, value, cb);
				389	return git_default_config(var, value, cb);
				390	}
				391	----
				392
				393	Next, we can modify the `grep_filter`. This is done with convenience functions
				394	found in `grep.h`. For fun, we're filtering to only commits from folks using a
				395	`gmail.com` email address - a not-very-precise guess at who may be working on
				396	Git as a hobby. Since we're checking the author, which is a specific line in the
				397	header, we'll use the `append_header_grep_pattern()` helper. We can use
				398	the `enum grep_header_field` to indicate which part of the commit header we want
				399	to search.
				400
				401	In `final_rev_info_setup()`, add your filter line:
				402
				403	----
				404	static void final_rev_info_setup(int argc, const char **argv,
				405	const char prefix, struct rev_info rev)
				406	{
				407	...
				408
				409	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
				410	"gmail");
				411	compile_grep_patterns(&rev->grep_filter);
				412
				413	...
				414	}
				415	----
				416
				417	`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
				418	it won't work unless we compile it with `compile_grep_patterns()`.
				419
				420	NOTE: If you are using `setup_revisions()` (for example, if you are passing a
				421	`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
				422	to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
				423
				424	NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
				425	wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
				426	`enum grep_pat_token` for us.
				427
				428	=== Changing the Order
				429
				430	There are a few ways that we can change the order of the commits during a
				431	revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
				432	typical orderings.
				433
				434	`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
				435	before all of its children have been shown, and we avoid mixing commits which
				436	are in different lines of history. (`git help log`'s section on `--topo-order`
				437	has a very nice diagram to illustrate this.)
				438
				439	Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
				440	`REV_SORT_BY_AUTHOR_DATE`. Add the following:
				441
				442	----
				443	static void final_rev_info_setup(int argc, const char **argv,
				444	const char prefix, struct rev_info rev)
				445	{
				446	...
				447
				448	rev->topo_order = 1;
				449	rev->sort_order = REV_SORT_BY_COMMIT_DATE;
				450
				451	...
				452	}
				453	----
				454
				455	Let's output this into a file so we can easily diff it with the walk sorted by
				456	author date.
				457
				458	----
				459	$ make
				460	$ ./bin-wrappers/git walken > commit-date.txt
				461	----
				462
				463	Then, let's sort by author date and run it again.
				464
				465	----
				466	static void final_rev_info_setup(int argc, const char **argv,
				467	const char prefix, struct rev_info rev)
				468	{
				469	...
				470
				471	rev->topo_order = 1;
				472	rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
				473
				474	...
				475	}
				476	----
				477
				478	----
				479	$ make
				480	$ ./bin-wrappers/git walken > author-date.txt
				481	----
				482
				483	Finally, compare the two. This is a little less helpful without object names or
				484	dates, but hopefully we get the idea.
				485
				486	----
				487	$ diff -u commit-date.txt author-date.txt
				488	----
				489
				490	This display indicates that commits can be reordered after they're written, for
				491	example with `git rebase`.
				492
				493	Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
				494	Set that flag somewhere inside of `final_rev_info_setup()`:
				495
				496	----
				497	static void final_rev_info_setup(int argc, const char *argv, const char prefix,
				498	struct rev_info *rev)
				499	{
				500	...
				501
				502	rev->reverse = 1;
				503
				504	...
				505	}
				506	----
				507
				508	Run your walk again and note the difference in order. (If you remove the grep
				509	pattern, you should see the last commit this call gives you as your current
				510	HEAD.)
				511
				512	== Basic Object Walk
				513
				514	So far we've been walking only commits. But Git has more types of objects than
				515	that! Let's see if we can walk _all_ objects, and find out some information
				516	about each one.
				517
				518	We can base our work on an example. `git pack-objects` prepares all kinds of
				519	objects for packing into a bitmap or packfile. The work we are interested in
				520	resides in `builtins/pack-objects.c:get_object_list()`; examination of that
				521	function shows that the all-object walk is being performed by
				522	`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
				523	functions reside in `list-objects.c`; examining the source shows that, despite
				524	the name, these functions traverse all kinds of objects. Let's have a look at
Junio C Hamano	67fef49	2022-03-27 17:31:23	[diff] [blame]	525	the arguments to `traverse_commit_list()`.
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	526
Junio C Hamano	67fef49	2022-03-27 17:31:23	[diff] [blame]	527	- `struct rev_info *revs`: This is the `rev_info` used for the walk. If
				528	its `filter` member is not `NULL`, then `filter` contains information for
				529	how to filter the object list.
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	530	- `show_commit_fn show_commit`: A callback which will be used to handle each
				531	individual commit object.
				532	- `show_object_fn show_object`: A callback which will be used to handle each
				533	non-commit object (so each blob, tree, or tag).
				534	- `void *show_data`: A context buffer which is passed in turn to `show_commit`
				535	and `show_object`.
Junio C Hamano	67fef49	2022-03-27 17:31:23	[diff] [blame]	536
				537	In addition, `traverse_commit_list_filtered()` has an additional paramter:
				538
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	539	- `struct oidset *omitted`: A linked-list of object IDs which the provided
				540	filter caused to be omitted.
				541
Junio C Hamano	67fef49	2022-03-27 17:31:23	[diff] [blame]	542	It looks like these methods use callbacks we provide instead of needing us
				543	to call it repeatedly ourselves. Cool! Let's add the callbacks first.
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	544
				545	For the sake of this tutorial, we'll simply keep track of how many of each kind
				546	of object we find. At file scope in `builtin/walken.c` add the following
				547	tracking variables:
				548
				549	----
				550	static int commit_count;
				551	static int tag_count;
				552	static int blob_count;
				553	static int tree_count;
				554	----
				555
				556	Commits are handled by a different callback than other objects; let's do that
				557	one first:
				558
				559	----
				560	static void walken_show_commit(struct commit cmt, void buf)
				561	{
				562	commit_count++;
				563	}
				564	----
				565
				566	The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
				567	the `buf` argument is actually the context buffer that we can provide to the
				568	traversal calls - `show_data`, which we mentioned a moment ago.
				569
				570	Since we have the `struct commit` object, we can look at all the same parts that
				571	we looked at in our earlier commit-only walk. For the sake of this tutorial,
				572	though, we'll just increment the commit counter and move on.
				573
				574	The callback for non-commits is a little different, as we'll need to check
				575	which kind of object we're dealing with:
				576
				577	----
				578	static void walken_show_object(struct object obj, const char str, void *buf)
				579	{
				580	switch (obj->type) {
				581	case OBJ_TREE:
				582	tree_count++;
				583	break;
				584	case OBJ_BLOB:
				585	blob_count++;
				586	break;
				587	case OBJ_TAG:
				588	tag_count++;
				589	break;
				590	case OBJ_COMMIT:
				591	BUG("unexpected commit object in walken_show_object\n");
				592	default:
				593	BUG("unexpected object type %s in walken_show_object\n",
				594	type_name(obj->type));
				595	}
				596	}
				597	----
				598
				599	Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
				600	context pointer that `walken_show_commit()` receives: the `show_data` argument
				601	to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
				602	`str` contains the name of the object, which ends up being something like
				603	`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).
				604
				605	To help assure us that we aren't double-counting commits, we'll include some
				606	complaining if a commit object is routed through our non-commit callback; we'll
				607	also complain if we see an invalid object type. Since those two cases should be
				608	unreachable, and would only change in the event of a semantic change to the Git
				609	codebase, we complain by using `BUG()` - which is a signal to a developer that
				610	the change they made caused unintended consequences, and the rest of the
				611	codebase needs to be updated to understand that change. `BUG()` is not intended
				612	to be seen by the public, so it is not localized.
				613
				614	Our main object walk implementation is substantially different from our commit
				615	walk implementation, so let's make a new function to perform the object walk. We
				616	can perform setup which is applicable to all objects here, too, to keep separate
				617	from setup which is applicable to commit-only walks.
				618
				619	We'll start by enabling all types of objects in the `struct rev_info`. We'll
				620	also turn on `tree_blobs_in_commit_order`, which means that we will walk a
				621	commit's tree and everything it points to immediately after we find each commit,
				622	as opposed to waiting for the end and walking through all trees after the commit
				623	history has been discovered. With the appropriate settings configured, we are
				624	ready to call `prepare_revision_walk()`.
				625
				626	----
				627	static void walken_object_walk(struct rev_info *rev)
				628	{
				629	rev->tree_objects = 1;
				630	rev->blob_objects = 1;
				631	rev->tag_objects = 1;
				632	rev->tree_blobs_in_commit_order = 1;
				633
				634	if (prepare_revision_walk(rev))
				635	die(_("revision walk setup failed"));
				636
				637	commit_count = 0;
				638	tag_count = 0;
				639	blob_count = 0;
				640	tree_count = 0;
				641	----
				642
				643	Let's start by calling just the unfiltered walk and reporting our counts.
Junio C Hamano	59a32b0	2021-12-10 22:53:38	[diff] [blame]	644	Complete your implementation of `walken_object_walk()`.
				645	We'll also need to include the `list-objects.h` header.
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	646
				647	----
Junio C Hamano	59a32b0	2021-12-10 22:53:38	[diff] [blame]	648	#include "list-objects.h"
				649
				650	...
				651
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	652	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
				653
				654	printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
				655	blob_count, tag_count, tree_count);
				656	}
				657	----
				658
				659	NOTE: This output is intended to be machine-parsed. Therefore, we are not
				660	sending it to `trace_printf()`, and we are not localizing it - we need scripts
				661	to be able to count on the formatting to be exactly the way it is shown here.
				662	If we were intending this output to be read by humans, we would need to localize
				663	it with `_()`.
				664
				665	Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
				666	command line options is out of scope for this tutorial, so we'll just hardcode
				667	a branch we can change at compile time. Where you call `final_rev_info_setup()`
				668	and `walken_commit_walk()`, instead branch like so:
				669
				670	----
				671	if (1) {
				672	add_head_to_pending(&rev);
				673	walken_object_walk(&rev);
				674	} else {
				675	final_rev_info_setup(argc, argv, prefix, &rev);
				676	walken_commit_walk(&rev);
				677	}
				678	----
				679
				680	NOTE: For simplicity, we've avoided all the filters and sorts we applied in
				681	`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
				682	want, you can certainly use the filters we added before by moving
				683	`final_rev_info_setup()` out of the conditional and removing the call to
				684	`add_head_to_pending()`.
				685
				686	Now we can try to run our command! It should take noticeably longer than the
				687	commit walk, but an examination of the output will give you an idea why. Your
				688	output should look similar to this example, but with different counts:
				689
				690	----
				691	Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees.
				692	----
				693
				694	This makes sense. We have more trees than commits because the Git project has
				695	lots of subdirectories which can change, plus at least one tree per commit. We
				696	have no tags because we started on a commit (`HEAD`) and while tags can point to
				697	commits, commits can't point to tags.
				698
				699	NOTE: You will have different counts when you run this yourself! The number of
				700	objects grows along with the Git project.
				701
				702	=== Adding a Filter
				703
				704	There are a handful of filters that we can apply to the object walk laid out in
				705	`Documentation/rev-list-options.txt`. These filters are typically useful for
				706	operations such as creating packfiles or performing a partial clone. They are
				707	defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
				708	will use the "tree:1" filter, which causes the walk to omit all trees and blobs
				709	which are not directly referenced by commits reachable from the commit in
				710	`pending` when the walk begins. (`pending` is the list of objects which need to
				711	be traversed during a walk; you can imagine a breadth-first tree traversal to
				712	help understand. In our case, that means we omit trees and blobs not directly
				713	referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
				714	`HEAD` in the `pending` list.)
				715
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	716	For now, we are not going to track the omitted objects, so we'll replace those
				717	parameters with `NULL`. For the sake of simplicity, we'll add a simple
Junio C Hamano	67fef49	2022-03-27 17:31:23	[diff] [blame]	718	build-time branch to use our filter or not. Preface the line calling
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	719	`traverse_commit_list()` with the following, which will remind us which kind of
				720	walk we've just performed:
				721
				722	----
				723	if (0) {
				724	/* Unfiltered: */
				725	trace_printf(_("Unfiltered object walk.\n"));
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	726	} else {
				727	trace_printf(
				728	_("Filtered object walk with filterspec 'tree:1'.\n"));
Junio C Hamano	67fef49	2022-03-27 17:31:23	[diff] [blame]	729	CALLOC_ARRAY(rev->filter, 1);
				730	parse_list_objects_filter(rev->filter, "tree:1");
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	731	}
Junio C Hamano	67fef49	2022-03-27 17:31:23	[diff] [blame]	732	traverse_commit_list(rev, walken_show_commit,
				733	walken_show_object, NULL);
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	734	----
				735
Junio C Hamano	67fef49	2022-03-27 17:31:23	[diff] [blame]	736	The `rev->filter` member is usually built directly from a command
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	737	line argument, so the module provides an easy way to build one from a string.
				738	Even though we aren't taking user input right now, we can still build one with
				739	a hardcoded string using `parse_list_objects_filter()`.
				740
				741	With the filter spec "tree:1", we are expecting to see _only_ the root tree for
				742	each commit; therefore, the tree object count should be less than or equal to
				743	the number of commits. (For an example of why that's true: `git commit --revert`
				744	points to the same tree object as its grandparent.)
				745
				746	=== Counting Omitted Objects
				747
				748	We also have the capability to enumerate all objects which were omitted by a
				749	filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking
				750	`traverse_commit_list_filtered()` to populate the `omitted` list means that our
				751	object walk does not perform any better than an unfiltered object walk; all
				752	reachable objects are walked in order to populate the list.
				753
				754	First, add the `struct oidset` and related items we will use to iterate it:
				755
				756	----
				757	static void walken_object_walk(
				758	...
				759
				760	struct oidset omitted;
				761	struct oidset_iter oit;
				762	struct object_id *oid = NULL;
				763	int omitted_count = 0;
				764	oidset_init(&omitted, 0);
				765
				766	...
				767	----
				768
				769	Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
				770	object:
				771
				772	----
				773	...
				774
Junio C Hamano	67fef49	2022-03-27 17:31:23	[diff] [blame]	775	traverse_commit_list_filtered(rev,
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	776	walken_show_commit, walken_show_object, NULL, &omitted);
				777
				778	...
				779	----
				780
				781	Then, after your traversal, the `oidset` traversal is pretty straightforward.
				782	Count all the objects within and modify the print statement:
				783
				784	----
				785	/* Count the omitted objects. */
				786	oidset_iter_init(&omitted, &oit);
				787
				788	while ((oid = oidset_iter_next(&oit)))
				789	omitted_count++;
				790
Junio C Hamano	7d6f46e	2021-09-10 19:54:21	[diff] [blame]	791	printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
Junio C Hamano	8ac8a3d	2019-11-11 04:33:46	[diff] [blame]	792	commit_count, blob_count, tag_count, tree_count, omitted_count);
				793	----
				794
				795	By running your walk with and without the filter, you should find that the total
				796	object count in each case is identical. You can also time each invocation of
				797	the `walken` subcommand, with and without `omitted` being passed in, to confirm
				798	to yourself the runtime impact of tracking all omitted objects.
				799
				800	=== Changing the Order
				801
				802	Finally, let's demonstrate that you can also reorder walks of all objects, not
				803	just walks of commits. First, we'll make our handlers chattier - modify
				804	`walken_show_commit()` and `walken_show_object()` to print the object as they
				805	go:
				806
				807	----
				808	static void walken_show_commit(struct commit cmt, void buf)
				809	{
				810	trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
				811	commit_count++;
				812	}
				813
				814	static void walken_show_object(struct object obj, const char str, void *buf)
				815	{
				816	trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
				817
				818	...
				819	}
				820	----
				821
				822	NOTE: Since we will be examining this output directly as humans, we'll use
				823	`trace_printf()` here. Additionally, since this change introduces a significant
				824	number of printed lines, using `trace_printf()` will allow us to easily silence
				825	those lines without having to recompile.
				826
				827	(Leave the counter increment logic in place.)
				828
				829	With only that change, run again (but save yourself some scrollback):
				830
				831	----
				832	$ GIT_TRACE=1 ./bin-wrappers/git walken \| head -n 10
				833	----
				834
				835	Take a look at the top commit with `git show` and the object ID you printed; it
				836	should be the same as the output of `git show HEAD`.
				837
				838	Next, let's change a setting on our `struct rev_info` within
				839	`walken_object_walk()`. Find where you're changing the other settings on `rev`,
				840	such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
				841	`reverse` setting at the bottom:
				842
				843	----
				844	...
				845
				846	rev->tree_objects = 1;
				847	rev->blob_objects = 1;
				848	rev->tag_objects = 1;
				849	rev->tree_blobs_in_commit_order = 1;
				850	rev->reverse = 1;
				851
				852	...
				853	----
				854
				855	Now, run again, but this time, let's grab the last handful of objects instead
				856	of the first handful:
				857
				858	----
				859	$ make
				860	$ GIT_TRACE=1 ./bin-wrappers git walken \| tail -n 10
				861	----
				862
				863	The last commit object given should have the same OID as the one we saw at the
				864	top before, and running `git show <oid>` with that OID should give you again
				865	the same results as `git show HEAD`. Furthermore, if you run and examine the
				866	first ten lines again (with `head` instead of `tail` like we did before applying
				867	the `reverse` setting), you should see that now the first commit printed is the
				868	initial commit, `e83c5163`.
				869
				870	== Wrapping Up
				871
				872	Let's review. In this tutorial, we:
				873
				874	- Built a commit walk from the ground up
				875	- Enabled a grep filter for that commit walk
				876	- Changed the sort order of that filtered commit walk
				877	- Built an object walk (tags, commits, trees, and blobs) from the ground up
				878	- Learned how to add a filter-spec to an object walk
				879	- Changed the display order of the filtered object walk