Mstdlib-1.24.0
|
Data Structures | |
struct | M_hashtable_enum_t |
struct | M_hashtable_callbacks |
union | M_hashtable_enum.entry |
struct | M_hashtable_enum.entry.unordered |
struct | M_hashtable_enum.entry.ordered |
Macros | |
#define | M_HASHTABLE_MAX_BUCKETS (1U<<24) |
Typedefs | |
typedef struct M_hashtable | M_hashtable_t |
typedef M_uint32(* | M_hashtable_hash_func) (const void *, M_uint32) |
typedef void *(* | M_hashtable_duplicate_func) (const void *) |
typedef void(* | M_hashtable_free_func) (void *) |
Enumerations | |
enum | M_hashtable_flags_t { M_HASHTABLE_NONE = 0 , M_HASHTABLE_KEYS_ORDERED = 1 << 0 , M_HASHTABLE_KEYS_SORTED = 1 << 1 , M_HASHTABLE_MULTI_VALUE = 1 << 2 , M_HASHTABLE_MULTI_SORTED = 1 << 3 , M_HASHTABLE_MULTI_GETLAST = 1 << 4 , M_HASHTABLE_STATIC_SEED = 1 << 5 } |
Functions | |
M_hashtable_t * | M_hashtable_create (size_t size, M_uint8 fillpct, M_hashtable_hash_func key_hash, M_sort_compar_t key_equality, M_uint32 flags, const struct M_hashtable_callbacks *callbacks) M_MALLOC |
void | M_hashtable_destroy (M_hashtable_t *h, M_bool destroy_vals) M_FREE(1) |
M_bool | M_hashtable_insert (M_hashtable_t *h, const void *key, const void *value) |
M_bool | M_hashtable_remove (M_hashtable_t *h, const void *key, M_bool destroy_vals) |
M_bool | M_hashtable_get (const M_hashtable_t *h, const void *key, void **value) |
M_bool | M_hashtable_is_multi (const M_hashtable_t *h) |
M_bool | M_hashtable_multi_len (const M_hashtable_t *h, const void *key, size_t *len) |
M_bool | M_hashtable_multi_get (const M_hashtable_t *h, const void *key, size_t idx, void **value) |
M_bool | M_hashtable_multi_remove (M_hashtable_t *h, const void *key, size_t idx, M_bool destroy_vals) |
M_uint32 | M_hashtable_size (const M_hashtable_t *h) |
size_t | M_hashtable_num_collisions (const M_hashtable_t *h) |
size_t | M_hashtable_num_expansions (const M_hashtable_t *h) |
size_t | M_hashtable_num_keys (const M_hashtable_t *h) |
size_t | M_hashtable_enumerate (const M_hashtable_t *h, M_hashtable_enum_t *hashenum) |
M_bool | M_hashtable_enumerate_next (const M_hashtable_t *h, M_hashtable_enum_t *hashenum, const void **key, const void **value) |
void | M_hashtable_merge (M_hashtable_t **dest, M_hashtable_t *src) |
M_hashtable_t * | M_hashtable_duplicate (const M_hashtable_t *h) |
Hashtable, meant for storing key/value pairs.
This should not be used directly. It is a base implementation that should be used by a type safe wrapper. For example: M_hash_dict.
The h can uses a set of callback functions to determine behavior. Such as if it should duplicate or free values.
An optional hash algorithm can be specified when creating a type safe wrapper. It is highly recommended to provide a hash algorithm. The default algorithm is an FNV1a variant using the pointer of the key.
The currently provided wrappers (str and u64) use an FNV1a variant. Multiple hashing algorithms were considered but FNV1a was ultimately chosen because testing with real world data sets it was found to provide the best performance.
The following hash functions were evaluated:
Overall performance was tested. We looked at time to generate the hash, time for insert, and lookup time. The insert and lookup are specific to see how chaining due to increased collisions impacted overall performance.
FNV1a had average collision performance and average hash time. Some hash functions had fewer collisions but the time it took to generate the hash far exceeded the chaining time. Others had very fast generation time but had so many collisions that the chaining time exceeded the benefit of being quick.
FNV1a was found to have few enough collisions to keep any chains sort and the combined hash generation and chaining time (when chaining happened) was overall faster than the other hash's times.
In order to prevent denial of service attacks by an attacker causing generation of extremely large chains FNV1a was modified. A random hash seed that is unique per hashtable object (each hashtable created using _create(...)) is used as the offset bias for the algorithm.
According to draft-eastlake-fnv-09 at https://tools.ietf.org/html/draft-eastlake-fnv-09#section-2.2 . "In the general case, almost any offset_basis will serve so long as it is non-zero." This information can also be found on Noll's website http://isthe.com/chongo/tech/comp/fnv/index.html in the section, "Parameters of the FNV-1/FNV-1a hash".
In our variation care has been taken to ensure the bias is never 0.
The random seed is created using M_rand. While M_rand is not a secure random number generator the random seed for M_rand is created from unlikely to be known data such as stack and heap memory addresses at the time the hashtable is created. It is unlikely an attacker would be able to determine the random seed to be able to get the hash seed. Nor is it likely for an attacker to be able to determine the hash seed. Testing using a random hash seed was found to alleviate chaining attacks.
struct M_hashtable_enum |
State tracking object for enumerating a Hashtable. This is explicitly not hidden so it doesn't require a malloc()
Data Fields | ||
---|---|---|
union M_hashtable_enum.entry | entry | |
size_t | valueidx |
When multi-value is in use which index of next value. |
struct M_hashtable_callbacks |
Structure of callbacks that can be registered to override default behavior for h implementation.
This allows a great deal of flexibility. For instance, you may want the HashTable to take ownership of the 'value' passed to it and clean up when the entry is replaced, removed, or the h is destroyed. In this implementation, you could use NULL for 'value_duplicate' so the pointer passed in is used directly, but register an appropriate 'value_free' to auto-cleanup.
Note that there are two duplicate callbacks for keys and values. There are two times a key or value can be duplicated. When it is first inserted into the h and when the h itself is duplicated.
In some cases the key or value needs to be duplicated by the h wrapper instead of by the base itself. For example storing unbounded binary data as a value. To prevent extra allocations and additional wrapping the value is duplicated by the wrapper and the length is prepended. This duplicate needs the length in order to work where the other duplicate (copy of h) will get the length from the fist few bytes of the value itself.
Data Fields | ||
---|---|---|
M_hashtable_duplicate_func | key_duplicate_insert |
Callback to duplicate a key on insert. Default if NULL is pass-thru pointer |
M_hashtable_duplicate_func | key_duplicate_copy |
Callback to duplicate a key on copy. Default if NULL is pass-thru pointer |
M_hashtable_free_func | key_free |
Callback to free a key. Default if NULL is no-op |
M_hashtable_duplicate_func | value_duplicate_insert |
Callback to duplicate a value on insert. Default if NULL is pass-thru pointer |
M_hashtable_duplicate_func | value_duplicate_copy |
Callback to duplicate a value on copy. Default if NULL is pass-thru pointer |
M_sort_compar_t | value_equality |
Callback used to determine if two values are equal. Primarily used for sorting muli-values stores. Default is all values are equal. |
M_hashtable_free_func | value_free |
Callback to free a value. Default if NULL is a no-op |
union M_hashtable_enum.entry |
Data Fields | ||
---|---|---|
struct M_hashtable_enum.entry.unordered | unordered | |
struct M_hashtable_enum.entry.ordered | ordered |
struct M_hashtable_enum.entry.unordered |
struct M_hashtable_enum.entry.ordered |
Data Fields | ||
---|---|---|
M_llist_node_t * | keynode |
When ordered keys are in use this is the node of the key currently being processed. |
#define M_HASHTABLE_MAX_BUCKETS (1U<<24) |
typedef struct M_hashtable M_hashtable_t |
typedef M_uint32(* M_hashtable_hash_func) (const void *, M_uint32) |
Function definition for callback to hash a key
typedef void *(* M_hashtable_duplicate_func) (const void *) |
Function definition to duplicate a key or value
typedef void(* M_hashtable_free_func) (void *) |
Function definition to free a key or value
enum M_hashtable_flags_t |
Flags for controlling the behavior of the hash
M_hashtable_t * M_hashtable_create | ( | size_t | size, |
M_uint8 | fillpct, | ||
M_hashtable_hash_func | key_hash, | ||
M_sort_compar_t | key_equality, | ||
M_uint32 | flags, | ||
const struct M_hashtable_callbacks * | callbacks | ||
) |
Create a new h.
The h will pre-allocate an array of buckets based on the rounded up size specified. Any hash collisions will result in those collisions being chained together via a linked list. The h will auto-expand by a power of 2 when the fill percentage specified is reached. All key entries are compared in a case-insensitive fashion, and are duplicated internally. Values are duplicated. Case is preserved for both keys and values.
[in] | size | Size of the hash table. If not specified as a power of 2, will be rounded up to the nearest power of 2. |
[in] | fillpct | The maximum fill percentage before the hash table is expanded. If 0 is specified, the h will never expand, otherwise the value must be between 1 and 99 (recommended: 75). |
[in] | key_hash | The function to use for hashing a key. If not specified will use the pointer address as the key and use FNV1a. |
[in] | key_equality | The function to use to determine if two keys are equal. If not specified, will compare pointer addresses. |
[in] | flags | M_hash_strvp_flags_t flags for modifying behavior. |
[in] | callbacks | Register callbacks for overriding default behavior. |
void M_hashtable_destroy | ( | M_hashtable_t * | h, |
M_bool | destroy_vals | ||
) |
Destroy the h.
[in] | h | Hashtable to destroy |
[in] | destroy_vals | M_TRUE if the values held by the h should be destroyed. This will almost always be M_TRUE. This should only be set to M_FALSE when all values held by the h are being managed externally. |
M_bool M_hashtable_insert | ( | M_hashtable_t * | h, |
const void * | key, | ||
const void * | value | ||
) |
Insert an entry into the h.
[in] | h | Hashtable being referenced. |
[in] | key | Key to insert. |
[in] | value | Value to insert into h. Value will not be duplicated. The h will take ownership of the value. Maybe NULL. |
M_bool M_hashtable_remove | ( | M_hashtable_t * | h, |
const void * | key, | ||
M_bool | destroy_vals | ||
) |
Remove an entry from the h.
[in] | h | Hashtable being referenced. |
[in] | key | Key to remove from the h. |
[in] | destroy_vals | M_TRUE if the value held by the h should be destroyed. This will almost always be M_TRUE. This should only be set to M_FALSE when the value held by the h is being managed externally. |
M_bool M_hashtable_get | ( | const M_hashtable_t * | h, |
const void * | key, | ||
void ** | value | ||
) |
Retrieve the value for a key from the h.
[in] | h | Hashtable being referenced. |
[in] | key | Key for value. |
[out] | value | Pointer to value stored in the h. Optional, pass NULL if not needed. |
M_bool M_hashtable_is_multi | ( | const M_hashtable_t * | h | ) |
Wether the hashtable a multi value table.
[in] | h | Hashtable being referenced. |
M_bool M_hashtable_multi_len | ( | const M_hashtable_t * | h, |
const void * | key, | ||
size_t * | len | ||
) |
Get the number of values for a given key.
[in] | h | Hashtable being referenced. |
[in] | key | Key for value to retrieve. |
[out] | len | The number of values. |
M_bool M_hashtable_multi_get | ( | const M_hashtable_t * | h, |
const void * | key, | ||
size_t | idx, | ||
void ** | value | ||
) |
Retrieve the value for a key from the given index when supporting muli-values.
[in] | h | Hashtable being referenced. |
[in] | key | Key for value to retrieve. |
[in] | idx | The index the value resides at. |
[out] | value | Pointer to value stored. Optional, pass NULL if not needed. |
M_bool M_hashtable_multi_remove | ( | M_hashtable_t * | h, |
const void * | key, | ||
size_t | idx, | ||
M_bool | destroy_vals | ||
) |
Remove a value from the h when supporting muli-values.
If all values have been removed then the key will be removed.
[in] | h | Hashtable being referenced |
[in] | key | Key for value to retrieve. |
[in] | idx | The index the value resides at. |
[in] | destroy_vals | M_TRUE if the value held by the h should be destroyed. This will almost always be M_TRUE. This should only be set to M_FALSE when the value held by the h is being managed externally. |
M_uint32 M_hashtable_size | ( | const M_hashtable_t * | h | ) |
Retrieve the current size (number of buckets/slots, not necessarily used).
[in] | h | Hashtable being referenced. |
size_t M_hashtable_num_collisions | ( | const M_hashtable_t * | h | ) |
Retrieve the number of collisions for h entries that has occurred since creation.
[in] | h | Hashtable being referenced. |
size_t M_hashtable_num_expansions | ( | const M_hashtable_t * | h | ) |
Retrieve the number of expansions/rehashes since creation.
[in] | h | Hashtable being referenced. |
size_t M_hashtable_num_keys | ( | const M_hashtable_t * | h | ) |
Retrieve the number of entries in the h.
This is the number of keys stored.
[in] | h | Hashtable being referenced. |
size_t M_hashtable_enumerate | ( | const M_hashtable_t * | h, |
M_hashtable_enum_t * | hashenum | ||
) |
Start an enumeration of the keys within a h.
[in] | h | Hashtable being referenced. |
[out] | hashenum | Outputs an initialized state variable for starting an enumeration. |
M_bool M_hashtable_enumerate_next | ( | const M_hashtable_t * | h, |
M_hashtable_enum_t * | hashenum, | ||
const void ** | key, | ||
const void ** | value | ||
) |
Retrieve the next item from a h enumeration.
If multi-value, keys will appear multiple times as each value will be retrieved individually.
[in] | h | Hashtable being referenced. |
[in,out] | hashenum | State variable for tracking the enumeration process. |
[out] | key | Value of next enumerated key. Optional, pass NULL if not needed. |
[out] | value | Value of next enumerated value. Optional, pass NULL if not needed. |
void M_hashtable_merge | ( | M_hashtable_t ** | dest, |
M_hashtable_t * | src | ||
) |
Merge two hashtables together.
The second (src) h will be destroyed automatically upon completion of this function. Any key/value pointers for the h will be directly copied over to the destination h, they will not be duplicated. Any keys which exist in 'dest' that also exist in 'src' will be overwritten by the 'src' value.
If dest and src are multi-value, all values from src will be copied into dest and the values from dest will not be removed. If dest is not multi-value and src is, then only the last value in src will be present in dest. If dest is multi-value and src is not, then the value from src will be added to dest. A value_equality function in dest is very important to ensure duplicate values are not present in a given key with multiple values.
[in,out] | dest | Pointer by reference to the h receiving the key/value pairs. if dest is NULL, the src address will simply be copied to dest. |
[in,out] | src | Pointer to the h giving up its key/value pairs. |
M_hashtable_t * M_hashtable_duplicate | ( | const M_hashtable_t * | h | ) |
Duplicate an existing h.
Copying all keys and values. As well as other elements such as callbacks.
[in] | h | Hashtable to be copied. |