Introduction

One thing I’ve come to discover as a professional developer is how useful hash tables are for providing an easy, fast, and concise way to look up values.

Whether it be strategically grabbing a function or accumulating some quantity on objects when processing large batches of data, I use hash tables all the time when writing software.

Quick Background on Hash Tables

Hash tables are clever data structures. They are a structure in which you can store a value in association with a key. You can think about them like a collection of PO boxes.

You store values using a specific key- and you can look them back up using that same key later.

In JavaScript an object is implemented as a hash table. In this example we are storing the value "Craig" using the key "firstName" in an object called hashtable

const hashtable = {}
hashtable.firstName = "Craig"

To retrieve the value from the hash table we simply provide the key "firstName"

console.log(hashtable.firstName) // outputs "Craig"

Hash tables are convenient because they provide O(1) look up times while being able to use a variety of data types as a key. This means the time it takes to look up a value is (on average) constant.

Even if your hash table contains a lot of entries it won’t impact your look up time significantly. And we aren’t constrained to using indexes directly to achieve O(1) look up times like with an array.

The implementation underneath a hash table depends on the language you use. But to give you a better idea of how it works, let’s describe a basic example of how a hash table cleverly converts your key into an index under-the-hood to instantly retrieve your value.

Basic Hash Table Implementation

When you pass in a key and a value to a hash table your key is taken and plugged into a hashing algorithm. This hashing algorithm generates some number from your key. We can call this hashedKeyNumber

const hashtable = {}
hashtable.someKey = 'someValue'

// Under the hood- the hash table is hashing your key 
// (in this case, a string called 'someKey')
const hashedKeyNumber = someHashingAlgorithm('someKey')

The hash table will take this hashed value, which is a number, and use the modulo operator on it. The divisor for this operation will be the length of the underlying array that your hash table is storing your values in.

const hashedKeyNumber = someHashingAlgorithm('someKey')
const index = hashedKeyNumber % underlyingArray.length

By using the length of the underlying array as the divisor in the modulo operation the hash table ensures it will generate an index that is not out of bounds.

And that’s it! The index we get from all of that is where your value is stored for the someKey key. And because looking up values by index is an O(1) operation in arrays, our hash table is able to use our key to look up our value in O(1) time.

Hashing Functions Are Pure

The hashing algorithm or function that a hash table uses to derive an index from a given key is pure. This means for a given key the resulting index will always be the same. If the key firstName results in an index of 2 it will always result in an index of 2.

This makes hash tables ideal for operations where you need to keep track of the identity of some object and a piece of data associated with it. Rather than having to search through an array to find an object with the identifier you are looking for (an O(n) operation) you can simply use the identifier as the key in a hash table and look it up instantly.

Let’s go through an example of why this is so incredibly useful for certain problems.

Tracking Data Using IDs

Hash tables are great for tracking data points across objects that have some unique identifier.

Here’s a common example:

You run an online store. You want to pull in customer order data from an API and update the sales volume of products in your database accordingly.

The response from the API is a list of orders that include a specific product UPC and the quantity of it that was ordered. Often times even thousands of orders may only pertain to a handful of unique products.

We could run through this list and send a separate request to the database for each order that updates the sales volume of the appropriate product. This is a simple approach that is easy to implement. It will require querying the database for every single order. If we assume we are processing 10,000 orders with an average response time of 150ms from the database this would cost 1500 seconds of total latency.

But what if we simply condensed this data in our application logic first using a hash table?

We can create a hash table for storing specific product UPCs and their total sales quantity:

const productSales = {}

With this hash table we can easily run through all the orders and condense thousands of them into a handful of unique entries.

const customerOrders = getAllCustomerOrders()

console.log(customerOrders.length) // outputs 10,000

customerOrders.forEach((order) => {
    const upc = order.upc
    const quantity = order.quantity
    
    if (!productSales[upc]) productSales[upc] = 0
    
    productSales[upc] += quantity
})

Now when we iterate over the hash table to update the sales volume in our database we are making only 20 requests as opposed to ten thousand. This results in a total latency time of 3 seconds. Which means we just shaved off 1497 seconds (nearly 25 minutes) of latency.

console.log(Object.keys(productSales).length) // outputs 20


Object.entries(productSales).forEach((entry) => {
    const [upc, totalSales] = entry
    knex("products").increment('sales', totalSales).where("upc", "=", upc)
})

This is a fairly extreme example but it illustrates the immense performance benefits of condensing data in your application logic before you start querying your database, and when you use hash tables to do it it’s easy and fast.

Storing Functions

One of my favorite uses for hash tables is for storing functions. It pairs incredibly well with the strategy pattern which I go into more detail here.

Briefly, if you’re having to execute similar but distinct logic based on the type of input you get, plugging that input into a hash table to retrieve the function you need to execute can be really convenient.

For example: let’s say you’re building an HTTP library to handle requests. You’ll need to be able to handle decompressing various responses based on the compression algorithm used.

const decompressGzip = (resData) => {...}
const decompressDeflate = (resData) => {...}

const decompressionAlgorithms = {
    'gzip': decompressGzip,
    'deflate': decompressDeflate
}

const handleResponse = (encodingType, resData) => {
    const decompressionAlgorithm = decompressionAlgorithms[encodingType]
    return decompressionAlgorithm(resData)
}