blob: ad7a5bddd24d91ceda78d430fd86a204b21fd005 [file] [log] [blame]
Junio C Hamano5b3533d2014-02-27 23:07:151hashmap API
2===========
3
4The hashmap API is a generic implementation of hash-based key-value mappings.
5
6Data Structures
7---------------
8
9`struct hashmap`::
10
Junio C Hamanodbe51052014-07-22 19:39:5111The hash table structure. Members can be used as follows, but should
12not be modified directly:
Junio C Hamano5b3533d2014-02-27 23:07:1513+
Junio C Hamanodbe51052014-07-22 19:39:5114The `size` member keeps track of the total number of entries (0 means the
15hashmap is empty).
16+
17`tablesize` is the allocated size of the hash table. A non-0 value indicates
18that the hashmap is initialized. It may also be useful for statistical purposes
19(i.e. `size / tablesize` is the current load factor).
20+
21`cmpfn` stores the comparison function specified in `hashmap_init()`. In
22advanced scenarios, it may be useful to change this, e.g. to switch between
23case-sensitive and case-insensitive lookup.
Junio C Hamano5b3533d2014-02-27 23:07:1524
25`struct hashmap_entry`::
26
27An opaque structure representing an entry in the hash table, which must
28be used as first member of user data structures. Ideally it should be
29followed by an int-sized member to prevent unused memory on 64-bit
30systems due to alignment.
31+
32The `hash` member is the entry's hash code and the `next` member points to the
33next entry in case of collisions (i.e. if multiple entries map to the same
34bucket).
35
36`struct hashmap_iter`::
37
38An iterator structure, to be used with hashmap_iter_* functions.
39
40Types
41-----
42
43`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::
44
45User-supplied function to test two hashmap entries for equality. Shall
46return 0 if the entries are equal.
47+
48This function is always called with non-NULL `entry` / `entry_or_key`
49parameters that have the same hash code. When looking up an entry, the `key`
50and `keydata` parameters to hashmap_get and hashmap_remove are always passed
51as second and third argument, respectively. Otherwise, `keydata` is NULL.
52
53Functions
54---------
55
56`unsigned int strhash(const char *buf)`::
57`unsigned int strihash(const char *buf)`::
58`unsigned int memhash(const void *buf, size_t len)`::
59`unsigned int memihash(const void *buf, size_t len)`::
60
61Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
62http://www.isthe.com/chongo/tech/comp/fnv).
63+
64`strhash` and `strihash` take 0-terminated strings, while `memhash` and
65`memihash` operate on arbitrary-length memory.
66+
67`strihash` and `memihash` are case insensitive versions.
68
Junio C Hamanodbe51052014-07-22 19:39:5169`unsigned int sha1hash(const unsigned char *sha1)`::
70
71Converts a cryptographic hash (e.g. SHA-1) into an int-sized hash code
72for use in hash tables. Cryptographic hashes are supposed to have
73uniform distribution, so in contrast to `memhash()`, this just copies
74the first `sizeof(int)` bytes without shuffling any bits. Note that
75the results will be different on big-endian and little-endian
76platforms, so they should not be stored or transferred over the net.
77
Junio C Hamano5b3533d2014-02-27 23:07:1578`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::
79
80Initializes a hashmap structure.
81+
82`map` is the hashmap to initialize.
83+
84The `equals_function` can be specified to compare two entries for equality.
85If NULL, entries are considered equal if their hash codes are equal.
86+
87If the total number of entries is known in advance, the `initial_size`
88parameter may be used to preallocate a sufficiently large table and thus
89prevent expensive resizing. If 0, the table is dynamically resized.
90
91`void hashmap_free(struct hashmap *map, int free_entries)`::
92
93Frees a hashmap structure and allocated memory.
94+
95`map` is the hashmap to free.
96+
97If `free_entries` is true, each hashmap_entry in the map is freed as well
98(using stdlib's free()).
99
100`void hashmap_entry_init(void *entry, unsigned int hash)`::
101
102Initializes a hashmap_entry structure.
103+
104`entry` points to the entry to initialize.
105+
106`hash` is the hash code of the entry.
107
108`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::
109
110Returns the hashmap entry for the specified key, or NULL if not found.
111+
112`map` is the hashmap structure.
113+
114`key` is a hashmap_entry structure (or user data structure that starts with
115hashmap_entry) that has at least been initialized with the proper hash code
116(via `hashmap_entry_init`).
117+
118If an entry with matching hash code is found, `key` and `keydata` are passed
119to `hashmap_cmp_fn` to decide whether the entry matches the key.
120
Junio C Hamanodbe51052014-07-22 19:39:51121`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`::
122
123Returns the hashmap entry for the specified hash code and key data,
124or NULL if not found.
125+
126`map` is the hashmap structure.
127+
128`hash` is the hash code of the entry to look up.
129+
130If an entry with matching hash code is found, `keydata` is passed to
131`hashmap_cmp_fn` to decide whether the entry matches the key. The
132`entry_or_key` parameter points to a bogus hashmap_entry structure that
133should not be used in the comparison.
134
Junio C Hamano5b3533d2014-02-27 23:07:15135`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
136
137Returns the next equal hashmap entry, or NULL if not found. This can be
138used to iterate over duplicate entries (see `hashmap_add`).
139+
140`map` is the hashmap structure.
141+
142`entry` is the hashmap_entry to start the search from, obtained via a previous
143call to `hashmap_get` or `hashmap_get_next`.
144
145`void hashmap_add(struct hashmap *map, void *entry)`::
146
147Adds a hashmap entry. This allows to add duplicate entries (i.e.
148separate values with the same key according to hashmap_cmp_fn).
149+
150`map` is the hashmap structure.
151+
152`entry` is the entry to add.
153
154`void *hashmap_put(struct hashmap *map, void *entry)`::
155
156Adds or replaces a hashmap entry. If the hashmap contains duplicate
157entries equal to the specified entry, only one of them will be replaced.
158+
159`map` is the hashmap structure.
160+
161`entry` is the entry to add or replace.
162+
163Returns the replaced entry, or NULL if not found (i.e. the entry was added).
164
165`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::
166
167Removes a hashmap entry matching the specified key. If the hashmap
168contains duplicate entries equal to the specified key, only one of
169them will be removed.
170+
171`map` is the hashmap structure.
172+
173`key` is a hashmap_entry structure (or user data structure that starts with
174hashmap_entry) that has at least been initialized with the proper hash code
175(via `hashmap_entry_init`).
176+
177If an entry with matching hash code is found, `key` and `keydata` are
178passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
179+
180Returns the removed entry, or NULL if not found.
181
182`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
183`void *hashmap_iter_next(struct hashmap_iter *iter)`::
184`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::
185
186Used to iterate over all entries of a hashmap.
187+
188`hashmap_iter_init` initializes a `hashmap_iter` structure.
189+
190`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
191more entries.
192+
193`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
194and returns the first entry, if any).
195
Junio C Hamanodbe51052014-07-22 19:39:51196`const char *strintern(const char *string)`::
197`const void *memintern(const void *data, size_t len)`::
198
199Returns the unique, interned version of the specified string or data,
200similar to the `String.intern` API in Java and .NET, respectively.
201Interned strings remain valid for the entire lifetime of the process.
202+
203Can be used as `[x]strdup()` or `xmemdupz` replacement, except that interned
204strings / data must not be modified or freed.
205+
206Interned strings are best used for short strings with high probability of
207duplicates.
208+
209Uses a hashmap to store the pool of interned strings.
210
Junio C Hamano5b3533d2014-02-27 23:07:15211Usage example
212-------------
213
214Here's a simple usage example that maps long keys to double values.
Junio C Hamano5b3533d2014-02-27 23:07:15215------------
216struct hashmap map;
217
218struct long2double {
219struct hashmap_entry ent; /* must be the first member! */
220long key;
221double value;
222};
223
224static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused)
225{
226return !(e1->key == e2->key);
227}
228
229void long2double_init(void)
230{
231hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
232}
233
234void long2double_free(void)
235{
236hashmap_free(&map, 1);
237}
238
239static struct long2double *find_entry(long key)
240{
241struct long2double k;
242hashmap_entry_init(&k, memhash(&key, sizeof(long)));
243k.key = key;
244return hashmap_get(&map, &k, NULL);
245}
246
247double get_value(long key)
248{
249struct long2double *e = find_entry(key);
250return e ? e->value : 0;
251}
252
253void set_value(long key, double value)
254{
255struct long2double *e = find_entry(key);
256if (!e) {
257e = malloc(sizeof(struct long2double));
258hashmap_entry_init(e, memhash(&key, sizeof(long)));
259e->key = key;
260hashmap_add(&map, e);
261}
262e->value = value;
263}
264------------
265
266Using variable-sized keys
267-------------------------
268
269The `hashmap_entry_get` and `hashmap_entry_remove` functions expect an ordinary
270`hashmap_entry` structure as key to find the correct entry. If the key data is
271variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
272to create a full-fledged entry structure on the heap and copy all the key data
273into the structure.
274
275In this case, the `keydata` parameter can be used to pass
276variable-sized key data directly to the comparison function, and the `key`
277parameter can be a stripped-down, fixed size entry structure allocated on the
278stack.
279
280See test-hashmap.c for an example using arbitrary-length strings as keys.