Mstdlib-1.24.0
Hashtable generic/base implementation

Data Structures

struct  M_hashtable_enum_t
 
struct  M_hashtable_callbacks
 
union  M_hashtable_enum.entry
 
struct  M_hashtable_enum.entry.unordered
 
struct  M_hashtable_enum.entry.ordered
 

Macros

#define M_HASHTABLE_MAX_BUCKETS   (1U<<24)
 

Typedefs

typedef struct M_hashtable M_hashtable_t
 
typedef M_uint32(* M_hashtable_hash_func) (const void *, M_uint32)
 
typedef void *(* M_hashtable_duplicate_func) (const void *)
 
typedef void(* M_hashtable_free_func) (void *)
 

Enumerations

enum  M_hashtable_flags_t {
  M_HASHTABLE_NONE = 0 ,
  M_HASHTABLE_KEYS_ORDERED = 1 << 0 ,
  M_HASHTABLE_KEYS_SORTED = 1 << 1 ,
  M_HASHTABLE_MULTI_VALUE = 1 << 2 ,
  M_HASHTABLE_MULTI_SORTED = 1 << 3 ,
  M_HASHTABLE_MULTI_GETLAST = 1 << 4 ,
  M_HASHTABLE_STATIC_SEED = 1 << 5
}
 

Functions

M_hashtable_tM_hashtable_create (size_t size, M_uint8 fillpct, M_hashtable_hash_func key_hash, M_sort_compar_t key_equality, M_uint32 flags, const struct M_hashtable_callbacks *callbacks) M_MALLOC
 
void M_hashtable_destroy (M_hashtable_t *h, M_bool destroy_vals) M_FREE(1)
 
M_bool M_hashtable_insert (M_hashtable_t *h, const void *key, const void *value)
 
M_bool M_hashtable_remove (M_hashtable_t *h, const void *key, M_bool destroy_vals)
 
M_bool M_hashtable_get (const M_hashtable_t *h, const void *key, void **value)
 
M_bool M_hashtable_is_multi (const M_hashtable_t *h)
 
M_bool M_hashtable_multi_len (const M_hashtable_t *h, const void *key, size_t *len)
 
M_bool M_hashtable_multi_get (const M_hashtable_t *h, const void *key, size_t idx, void **value)
 
M_bool M_hashtable_multi_remove (M_hashtable_t *h, const void *key, size_t idx, M_bool destroy_vals)
 
M_uint32 M_hashtable_size (const M_hashtable_t *h)
 
size_t M_hashtable_num_collisions (const M_hashtable_t *h)
 
size_t M_hashtable_num_expansions (const M_hashtable_t *h)
 
size_t M_hashtable_num_keys (const M_hashtable_t *h)
 
size_t M_hashtable_enumerate (const M_hashtable_t *h, M_hashtable_enum_t *hashenum)
 
M_bool M_hashtable_enumerate_next (const M_hashtable_t *h, M_hashtable_enum_t *hashenum, const void **key, const void **value)
 
void M_hashtable_merge (M_hashtable_t **dest, M_hashtable_t *src)
 
M_hashtable_tM_hashtable_duplicate (const M_hashtable_t *h)
 

Detailed Description

Hashtable, meant for storing key/value pairs.

This should not be used directly. It is a base implementation that should be used by a type safe wrapper. For example: M_hash_dict.

The h can uses a set of callback functions to determine behavior. Such as if it should duplicate or free values.

An optional hash algorithm can be specified when creating a type safe wrapper. It is highly recommended to provide a hash algorithm. The default algorithm is an FNV1a variant using the pointer of the key.

The currently provided wrappers (str and u64) use an FNV1a variant. Multiple hashing algorithms were considered but FNV1a was ultimately chosen because testing with real world data sets it was found to provide the best performance.

The following hash functions were evaluated:

Overall performance was tested. We looked at time to generate the hash, time for insert, and lookup time. The insert and lookup are specific to see how chaining due to increased collisions impacted overall performance.

FNV1a had average collision performance and average hash time. Some hash functions had fewer collisions but the time it took to generate the hash far exceeded the chaining time. Others had very fast generation time but had so many collisions that the chaining time exceeded the benefit of being quick.

FNV1a was found to have few enough collisions to keep any chains sort and the combined hash generation and chaining time (when chaining happened) was overall faster than the other hash's times.

In order to prevent denial of service attacks by an attacker causing generation of extremely large chains FNV1a was modified. A random hash seed that is unique per hashtable object (each hashtable created using _create(...)) is used as the offset bias for the algorithm.

According to draft-eastlake-fnv-09 at https://tools.ietf.org/html/draft-eastlake-fnv-09#section-2.2 . "In the general case, almost any offset_basis will serve so long as it is non-zero." This information can also be found on Noll's website http://isthe.com/chongo/tech/comp/fnv/index.html in the section, "Parameters of the FNV-1/FNV-1a hash".

In our variation care has been taken to ensure the bias is never 0.

The random seed is created using M_rand. While M_rand is not a secure random number generator the random seed for M_rand is created from unlikely to be known data such as stack and heap memory addresses at the time the hashtable is created. It is unlikely an attacker would be able to determine the random seed to be able to get the hash seed. Nor is it likely for an attacker to be able to determine the hash seed. Testing using a random hash seed was found to alleviate chaining attacks.


Data Structure Documentation

◆ M_hashtable_enum

struct M_hashtable_enum

State tracking object for enumerating a Hashtable. This is explicitly not hidden so it doesn't require a malloc()

Data Fields
union M_hashtable_enum.entry entry
size_t valueidx

When multi-value is in use which index of next value.

◆ M_hashtable_callbacks

struct M_hashtable_callbacks

Structure of callbacks that can be registered to override default behavior for h implementation.

This allows a great deal of flexibility. For instance, you may want the HashTable to take ownership of the 'value' passed to it and clean up when the entry is replaced, removed, or the h is destroyed. In this implementation, you could use NULL for 'value_duplicate' so the pointer passed in is used directly, but register an appropriate 'value_free' to auto-cleanup.

Note that there are two duplicate callbacks for keys and values. There are two times a key or value can be duplicated. When it is first inserted into the h and when the h itself is duplicated.

In some cases the key or value needs to be duplicated by the h wrapper instead of by the base itself. For example storing unbounded binary data as a value. To prevent extra allocations and additional wrapping the value is duplicated by the wrapper and the length is prepended. This duplicate needs the length in order to work where the other duplicate (copy of h) will get the length from the fist few bytes of the value itself.

Data Fields
M_hashtable_duplicate_func key_duplicate_insert

Callback to duplicate a key on insert. Default if NULL is pass-thru pointer

M_hashtable_duplicate_func key_duplicate_copy

Callback to duplicate a key on copy. Default if NULL is pass-thru pointer

M_hashtable_free_func key_free

Callback to free a key. Default if NULL is no-op

M_hashtable_duplicate_func value_duplicate_insert

Callback to duplicate a value on insert. Default if NULL is pass-thru pointer

M_hashtable_duplicate_func value_duplicate_copy

Callback to duplicate a value on copy. Default if NULL is pass-thru pointer

M_sort_compar_t value_equality

Callback used to determine if two values are equal. Primarily used for sorting muli-values stores. Default is all values are equal.

M_hashtable_free_func value_free

Callback to free a value. Default if NULL is a no-op

◆ M_hashtable_enum.entry

union M_hashtable_enum.entry
Data Fields
struct M_hashtable_enum.entry.unordered unordered
struct M_hashtable_enum.entry.ordered ordered

◆ M_hashtable_enum.entry.unordered

struct M_hashtable_enum.entry.unordered
Data Fields
M_uint32 hash

Hash of last processed entry

size_t chainid

1-based offset within linked list of clashes of last processed entry. This value is 1-based specifically so when starting an enumeration, a 0,0 value would indicate this

◆ M_hashtable_enum.entry.ordered

struct M_hashtable_enum.entry.ordered
Data Fields
M_llist_node_t * keynode

When ordered keys are in use this is the node of the key currently being processed.

Macro Definition Documentation

◆ M_HASHTABLE_MAX_BUCKETS

#define M_HASHTABLE_MAX_BUCKETS   (1U<<24)

Typedef Documentation

◆ M_hashtable_t

typedef struct M_hashtable M_hashtable_t

◆ M_hashtable_hash_func

typedef M_uint32(* M_hashtable_hash_func) (const void *, M_uint32)

Function definition for callback to hash a key

◆ M_hashtable_duplicate_func

typedef void *(* M_hashtable_duplicate_func) (const void *)

Function definition to duplicate a key or value

◆ M_hashtable_free_func

typedef void(* M_hashtable_free_func) (void *)

Function definition to free a key or value

Enumeration Type Documentation

◆ M_hashtable_flags_t

Flags for controlling the behavior of the hash

Enumerator
M_HASHTABLE_NONE 

Case sensitive single value (new values replace).

M_HASHTABLE_KEYS_ORDERED 

Keys should be ordered. Default is insertion order unless the sorted option is specified.

M_HASHTABLE_KEYS_SORTED 

When the keys are ordered sort them using the key_equality function.

M_HASHTABLE_MULTI_VALUE 

Allow keys to contain multiple values. Sorted in insertion order another sorting is specified.

M_HASHTABLE_MULTI_SORTED 

Allow keys to contain multiple values sorted in ascending order

M_HASHTABLE_MULTI_GETLAST 

When using the get function will get the last value from the list when allowing multiple values. The default is to get the first value.

M_HASHTABLE_STATIC_SEED 

Use a static seed for hash function initialization. This greatly reduces the security of the hashtable and removes collision attack protections. This should only be used as a performance optimization when creating millions of hashtables with static data specifically for quick look up. DO NOT use this flag with any hashtable that could store user generated data! Be very careful about duplicating a hashtable that was created with this flag. All duplicates will use the static seed.

Function Documentation

◆ M_hashtable_create()

M_hashtable_t * M_hashtable_create ( size_t  size,
M_uint8  fillpct,
M_hashtable_hash_func  key_hash,
M_sort_compar_t  key_equality,
M_uint32  flags,
const struct M_hashtable_callbacks callbacks 
)

Create a new h.

The h will pre-allocate an array of buckets based on the rounded up size specified. Any hash collisions will result in those collisions being chained together via a linked list. The h will auto-expand by a power of 2 when the fill percentage specified is reached. All key entries are compared in a case-insensitive fashion, and are duplicated internally. Values are duplicated. Case is preserved for both keys and values.

Parameters
[in]sizeSize of the hash table. If not specified as a power of 2, will be rounded up to the nearest power of 2.
[in]fillpctThe maximum fill percentage before the hash table is expanded. If 0 is specified, the h will never expand, otherwise the value must be between 1 and 99 (recommended: 75).
[in]key_hashThe function to use for hashing a key. If not specified will use the pointer address as the key and use FNV1a.
[in]key_equalityThe function to use to determine if two keys are equal. If not specified, will compare pointer addresses.
[in]flagsM_hash_strvp_flags_t flags for modifying behavior.
[in]callbacksRegister callbacks for overriding default behavior.
Returns
Allocated h.
See also
M_hashtable_destroy

◆ M_hashtable_destroy()

void M_hashtable_destroy ( M_hashtable_t h,
M_bool  destroy_vals 
)

Destroy the h.

Parameters
[in]hHashtable to destroy
[in]destroy_valsM_TRUE if the values held by the h should be destroyed. This will almost always be M_TRUE. This should only be set to M_FALSE when all values held by the h are being managed externally.

◆ M_hashtable_insert()

M_bool M_hashtable_insert ( M_hashtable_t h,
const void *  key,
const void *  value 
)

Insert an entry into the h.

Parameters
[in]hHashtable being referenced.
[in]keyKey to insert.
[in]valueValue to insert into h. Value will not be duplicated. The h will take ownership of the value. Maybe NULL.
Returns
M_TRUE on success, or M_FALSE on failure.

◆ M_hashtable_remove()

M_bool M_hashtable_remove ( M_hashtable_t h,
const void *  key,
M_bool  destroy_vals 
)

Remove an entry from the h.

Parameters
[in]hHashtable being referenced.
[in]keyKey to remove from the h.
[in]destroy_valsM_TRUE if the value held by the h should be destroyed. This will almost always be M_TRUE. This should only be set to M_FALSE when the value held by the h is being managed externally.
Returns
M_TRUE on success, or M_FALSE if key does not exist.

◆ M_hashtable_get()

M_bool M_hashtable_get ( const M_hashtable_t h,
const void *  key,
void **  value 
)

Retrieve the value for a key from the h.

Parameters
[in]hHashtable being referenced.
[in]keyKey for value.
[out]valuePointer to value stored in the h. Optional, pass NULL if not needed.
Returns
M_TRUE if value retrieved, M_FALSE if key does not exist.

◆ M_hashtable_is_multi()

M_bool M_hashtable_is_multi ( const M_hashtable_t h)

Wether the hashtable a multi value table.

Parameters
[in]hHashtable being referenced.
Returns
M_TRUE if a multi value hashtable.

◆ M_hashtable_multi_len()

M_bool M_hashtable_multi_len ( const M_hashtable_t h,
const void *  key,
size_t *  len 
)

Get the number of values for a given key.

Parameters
[in]hHashtable being referenced.
[in]keyKey for value to retrieve.
[out]lenThe number of values.
Returns
M_TRUE if length is retrieved, M_FALSE if key does not exist.

◆ M_hashtable_multi_get()

M_bool M_hashtable_multi_get ( const M_hashtable_t h,
const void *  key,
size_t  idx,
void **  value 
)

Retrieve the value for a key from the given index when supporting muli-values.

Parameters
[in]hHashtable being referenced.
[in]keyKey for value to retrieve.
[in]idxThe index the value resides at.
[out]valuePointer to value stored. Optional, pass NULL if not needed.
Returns
M_TRUE if value retrieved, M_FALSE if key does not exist

◆ M_hashtable_multi_remove()

M_bool M_hashtable_multi_remove ( M_hashtable_t h,
const void *  key,
size_t  idx,
M_bool  destroy_vals 
)

Remove a value from the h when supporting muli-values.

If all values have been removed then the key will be removed.

Parameters
[in]hHashtable being referenced
[in]keyKey for value to retrieve.
[in]idxThe index the value resides at.
[in]destroy_valsM_TRUE if the value held by the h should be destroyed. This will almost always be M_TRUE. This should only be set to M_FALSE when the value held by the h is being managed externally.
Returns
M_TRUE if the value was removed, M_FALSE if key does not exist.

◆ M_hashtable_size()

M_uint32 M_hashtable_size ( const M_hashtable_t h)

Retrieve the current size (number of buckets/slots, not necessarily used).

Parameters
[in]hHashtable being referenced.
Returns
Size of the h

◆ M_hashtable_num_collisions()

size_t M_hashtable_num_collisions ( const M_hashtable_t h)

Retrieve the number of collisions for h entries that has occurred since creation.

Parameters
[in]hHashtable being referenced.
Returns
Number of collisions.

◆ M_hashtable_num_expansions()

size_t M_hashtable_num_expansions ( const M_hashtable_t h)

Retrieve the number of expansions/rehashes since creation.

Parameters
[in]hHashtable being referenced.
Returns
number of expansions/rehashes.

◆ M_hashtable_num_keys()

size_t M_hashtable_num_keys ( const M_hashtable_t h)

Retrieve the number of entries in the h.

This is the number of keys stored.

Parameters
[in]hHashtable being referenced.
Returns
number of entries in the h.

◆ M_hashtable_enumerate()

size_t M_hashtable_enumerate ( const M_hashtable_t h,
M_hashtable_enum_t *  hashenum 
)

Start an enumeration of the keys within a h.

Parameters
[in]hHashtable being referenced.
[out]hashenumOutputs an initialized state variable for starting an enumeration.
Returns
Number of items in the h

◆ M_hashtable_enumerate_next()

M_bool M_hashtable_enumerate_next ( const M_hashtable_t h,
M_hashtable_enum_t *  hashenum,
const void **  key,
const void **  value 
)

Retrieve the next item from a h enumeration.

If multi-value, keys will appear multiple times as each value will be retrieved individually.

Parameters
[in]hHashtable being referenced.
[in,out]hashenumState variable for tracking the enumeration process.
[out]keyValue of next enumerated key. Optional, pass NULL if not needed.
[out]valueValue of next enumerated value. Optional, pass NULL if not needed.
Returns
M_TRUE if enumeration succeeded, M_FALSE if no more keys.

◆ M_hashtable_merge()

void M_hashtable_merge ( M_hashtable_t **  dest,
M_hashtable_t src 
)

Merge two hashtables together.

The second (src) h will be destroyed automatically upon completion of this function. Any key/value pointers for the h will be directly copied over to the destination h, they will not be duplicated. Any keys which exist in 'dest' that also exist in 'src' will be overwritten by the 'src' value.

If dest and src are multi-value, all values from src will be copied into dest and the values from dest will not be removed. If dest is not multi-value and src is, then only the last value in src will be present in dest. If dest is multi-value and src is not, then the value from src will be added to dest. A value_equality function in dest is very important to ensure duplicate values are not present in a given key with multiple values.

Parameters
[in,out]destPointer by reference to the h receiving the key/value pairs. if dest is NULL, the src address will simply be copied to dest.
[in,out]srcPointer to the h giving up its key/value pairs.

◆ M_hashtable_duplicate()

M_hashtable_t * M_hashtable_duplicate ( const M_hashtable_t h)

Duplicate an existing h.

Copying all keys and values. As well as other elements such as callbacks.

Parameters
[in]hHashtable to be copied.
Returns
Duplicated h.