Future

CodingBlocks

96. Data Structures – Hashtable vs Dictionary

Just in time to help you spread some cheer this holiday season, the dad jokes are back as we dig into the details of hash tables and dictionaries.

Hey you. Yes you. Are you reading this via your podcast player? You can find this episode’s full show notes and be a part of the conversation by visiting https://www.codingblocks.net/episode96.

Sponsors

  • Manning Publications – Sign up to Manning’s Deal of the Day (here) to participate in the Countdown to 2019 to be in the running to win awesome prizes and get special daily discounts.
  • Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard.

Survey Says

Anonymous Vote
Sign in with Wordpress
How do you plan to spend your time off this holiday season?
  • Spending time with the family because the holidays are all about the memories.
  • I'm not avoiding the family, I'm building my next great project.
  • Escaping the family _and_ the keyboard.
  • What time off?

News

  • Thank you to everyone that took a moment to make our holidays a little bit brighter by leaving a review:
    • iTunes: krauseling, Jla115, Ross44ross, hutchybong, JonMabale, code_in_my_robe, Saltoshi, SXENedger, BubbaCow
    • Stitcher: Järvinen, EliavCodeIsrael, sbgood, maxim_bendiks, Nathan Iyer
  • Thanks to chumak84 for the clarification of string interning. (Wikipedia)
  • We play a little game of How much data is generated every minute by use? taken from the data provided by Domo’s Data Never Sleeps 5.0 infographic. (Domo)

Hashtables vs Dictionary

Hashtable

About hash tables

  • When it comes to reading data, arrays are great (and fast) because getting to the next element is just a matter of math (i.e. the offset is the size of the type of thing in the array).
  • But when it comes to writing, or more specifically inserting data, linked lists are great because we only need adjust some pointers.
  • A hash table takes the best of these two worlds and smashes them together … sometimes
  • The MSDN documentation succinctly describes the Hashtable class as _Represents a collection of key/value pairs that are organized based on the hash code of the key._

How do they work?

  • Consider the core structure to the hash table to be an array.
  • Each element contains an object that contains the key and the value(s).
    • In the Wikipedia article for hash table, this object is called a bucket. Bucket?!
  • The data that you want to use as the key to read and/or write is used to create a hash that serves as the index to the array.
  • Implementation can vary based on how collisions are handled (i.e. keys that produce the same hash).
Popular Collision Resolution Strategies
  • Separate Chaining
    • In this implementation, the elements in the array are pointers to a linked list of buckets.
    • During a collision, you traverse the list looking for your key.
    • A good hash table will rarely have more than 3 items at the same array index, so the list traversal is small.
  • Open Addressing
    • In this implementation, the elements in the array are the buckets themselves.
    • The hash value serves as a starting point.
      • Meaning that during a write operation, if we go to that index in the array, and something is already there, we will continue on to the next element in the array looking for an empty slot to write our value.
      • And for a read operation, we will similarly start at the index of the hash and look for the key we’re looking for and continue until we either find it or get to an empty slot in the array.
  • Other strategies include:
    • Cuckoo hashing
    • Hopscotch hashing
    • Robin Hood
    • 2-choice hashing
      • As the name implies, two hashing functions are used.
      • During a write operation, the new key/value pair is written to the array’s index location that has the fewest objects already at each hash value.

Complexity

  • Space: Average and Worst case = O(n)
  • Search, Insert, and Delete: Average = O(1) and Worst case = O(n)

Pros

  • Speed. Hash tables are fast. Reads are generally O(1) (ignoring collisions).
    • If there is a collision, the read/write time _can be_ reduced to O(n/k) where k is the size of the hash table, which can be reduced to just O(n).
    • Assuming a good hashing algorithm is used, performance will usually be O(1).
      • But this assumes that by “good”, the performance of the hashing algorithm is considered … i.e. a slow algorithm might still be O(1) but be slower than alternatives.

Cons

  • Depending on language, looking at you C#, the Hashtable type is loosely typed.
  • The cost of the hashing function can be more than just looping over the list, specially for few entries.
  • Because hash table entries are spread around, there is poor “locality of reference” and can trigger processor cache misses.
  • Cache performance can also be poor or ineffective depending on implementations, such as separate chaining.
  • Performance degrades when there are many collisions.

When to use

  • If we’re talking language specific implementations of the Hashtable _type_
    • Um. Don’t? At least, not in C#? Prefer the Dictionary type.
    • And um, not in Java either. Use HashMap.
  • In more general terms, use them when …
    • You need an associative array … i.e. you want to access the array by some key, like a name, rather than by index.
    • Database indexing
    • Cache
    • Sets

Dictionary

What is it? And how does it work?

  • Like a hash table, dictionary holds key/value pairs.
  • By definition hash tables are weakly typed, but dictionaries are not
    • Hashtable numbers = new Hashtable();
      Dictionary<int, string> dictionary = new Dictionary<int, string >();
  • C# Detail, Hashtable and Dictionary use a different collision strategy. Hashtable uses a rehash strategy, while Dictionary utilizes “chaining”.
    • Chaining is using a secondary data structure (sparse array) rather than re-hashing.
    • Chaining is…complicated, and there are different methods for doing it (separate chaining vs open-addressing).
    • Wait, but didn’t we just talk about how adding and removing items from an array takes time? Doesn’t that mean that chaining will be slower than a normal hash lookup?
    • What about memory size? Sounds like it will potentially be double?
      • Nah, it’s fine. Chaining ends up running in O(1) on average time as well, because of the implementation details of chaining.
        • Weird math ahead …
        • Namely that the length of the list can never exceed the bucket size. This keeps the “load factor” (ratio of nodes to buckets) low so the average lookup time ends up being known as the separate chaining ends up being O(n/m) and since n/m are equal, you get 1 … on average.

Pros

  • Same pros as hash tables.
  • In static languages, dictionary are more efficient because there is no boxing necessary.
  • Operations are safer, because errors are caught at compile time.

Cons

  • Same cons as hash tables.

When to use

  • Whenever you need a hash table like data structure, but want type safety.
  • Fast insert, delete, lookup – sparse data.

What about..

  • Hashes and Dictionaries technically don’t support ordering, so what the heck is IOrderedDictionary?

Dictionary vs Hashtable in C#

  • Hashtable uses the type object as both the key and value. Meaning they are loosely typed.
    • This also means value types like int get boxed/unboxed during use (see episode 2).
  • Dictionaryon the other hand is strongly typed. So the key and value types are explicitly defined in the code.
    • This favors failing at compile time rather than run time.

This is a valid use of a Hashtable in C#:

using System;
using System.Collections;
					
public class Program
{
	public static void Main()
	{
		var h = new Hashtable();
		h.Add("foo", "bar");
		h.Add(1, 2);
		
		foreach(var key in h.Keys) {
			Console.WriteLine(h[key]);
		}
	}
}

Closing Thoughts
  • C#’s Dictionary is like Java’s Hashtable.
  • In C# – Dictionary<TKey,TValue>
  • In Java – Hashtable<K,V>
    • In fact, in Java, Hashtable extends Dictionary (abstract)
    • Although you’re supposed to use HashMap instead.
  • And a Javascript Object can be used like a dictionary (aka associative array).

Resources We Like

Tip of the Week

  • Join the Slack community and join in on all of the fun in #pet-pictures.
  • Visualize your time series data using Grafana. (Grafana Labs)
  • Add emojis to your file (_never_ in your code!) in Visual Studio Code by using WIN+;

Episode source