Sets - Basic Operations

Sets are fascinating constructs in both mathematics and computer science. In this lesson we learn how to implement the Set abstract data type with a Map as the underlying data structure to achieve O(1) operations for certain methods.

We're working on inlining examples! Until then, please find the examples here on GitHub.

Sets

Sets are fascinating constructs in the mathematics world. Without them we wouldn't have a proper definition of a function, since functions are defined by their inputs and outputs, both of which happen to be sets. They have a a number of variants that offer different properties and possibilities; infinite, countably infinite, finite sets, and more. As such, sets play a huge part in computer science as well. So much so that I'm hoping to have a dedicated course on set theory at some point in the future.

However, for this lesson, we are going to narrow our focus on the Set abstract data type, which is built as the Finite Set in mathematics. It can store a collection of unique values without order. So while I can define a 5 element array as Array(1,2,2,2,3), if I were to transform it to a set that contains only unique values, it would only have 3 elements, Set(1,2,3). And then because sets do not have a particular order, a set of 1,2,3 is equal to the set of 3,2,1. Set(1,2,3) = Set(3,2,1). On a higher level, we could even say we have a set of all natural numbers, which is all positive integers 0 through infinity Set(0,1,2,...) it is still the set of all natural numbers even if you shake it around and they get all jumbled.

Another thing that makes sets stand out from lists or arrays is that you wouldn't really reach for a set if you wanted to store something just to access it later. So you won't commonly see something like Set(1,2,3)[2] accessing the second element of a set, because it doesn't make much sense. It's unordered and unindexed, so indexing into it is kind of like reaching into a bowl of M&M's - any individual one could be the first, twelfth, or forty second. Now that's not to say you cannot have an ordered set - some languages do indeed offer a type called ordered sets. But we are going for the baseline.

A set is better suited when you need to store a collection of items but don't want duplicates, or if you just need to check to see if something belongs to a certain collection. In JavaScript, if you find yourself using a list just to run a bunch of .indexOf(foo) !== -1 checks, you're wasting a lot of CPU cycles because that check is running in linear time. If it's a short list, then the point is moot, but if you have a few hundred or thousand elements to check repeatedly, that's going to start to add up. Checking if an element belongs to a collection should be a cheap operation, but it's not specified by the spec so that's just a shared opinion by many people on the topic.

As such, since I'd like to keep membership lookup cheap, we are going to implement our set type in terms of a Map type which is available in EcmaScript2015 and higher. We could use a plain old object for this, but I don't see enough usage of the Map type in the wild yet so we're going to add to that list. Now, if you already know about the Map type, then you're likely also aware that ES2015 also added a new Set type of it's own. I highly recommend checking it out after you're done with this lesson, but for now just pretend it's not there. But, since the perfect name has been taken, we're forced to name ours something silly like MySet.

class MySet {

}

So as you can see I've started this class already, but I've only defined an empty constructor and a static method that will be executable on the MySet class. All this static method does is create a new instance of our object from a collection of elements. I'm skipping this part, and leaving this clever one-liner that also works because I will go into a tangent grumbling about how using the class keyword here complicates things that are easier to do with functions, but, alas, it makes showing the actual content of this screencast cleaner.

Moving right along, we're going to be implementing the basic operations for a set, which I have in comments at the top of this test file here. Let's take a minute to look over what operations a set needs to function, bearing in mind that some of these are optional or can be named differently. Our first three here, create, build, and create_from, are all about creating an initializing a new set, whether empty, or populated by arguments or a collection of some sort. Pretty standard for collection types. Next we have add, which adds the constraint that it will only add an element if it's not already present. This fulfills the constraint of a set that all elements must be unique. Next, remove will get rid of an element in the set if it exists, otherwise it's a no-op. is_empty just returns true or false for whether a set is empty or not. size or cardinality will return the number of elements in a set, just like the length property on JavaScript arrays. enumerate will return a list of all elements in the set, not dependent on any given order, for use with other code that may not support the Set type, and finally is_element_of will return true or false for whether the given element is present in the set.

I've again translated a bunch of the operation and equality specs into tests so we can just start implementing the code and move forward with confidence that we are adhering to the definition of the abstract data type correctly.

So let's pick this top spec, and run only this test. This failure is telling me that we don't have a length property to test with, because I'm using the lengthOf assertion from chai. I could fix this alone by setting length to something, but we're going to skip running the test again with, because it'll be wrong unless we are also saving passed in arguments into storage for our set.

So in the constructor I'm going to use the spread operator to collect all the arguments in an array. I'm going to create a new property on the instance named data, and that is going to be an instance of Map like we mentioned earlier. Next we're going to iterate over each element passed into our constructor and set it on data by passing the argument as the key, and we'll always set the value to 1 since we're just using the map for its key-lookup abilities and don't care about the value. Then we'll set the length of our set instance to be the size of the map. The reason for this is that there may be duplicates passed into our set, but our map will overwrite keys with new entries, so the map's keys will tell us how many unique values were passed in.

constructor(...args) {
  this.data = new Map();

  args.forEach((arg) => {
    this.data.set(arg, 1);
  });

  this.length = this.data.size;
}

Let's run our tests again...great, that works. So lets run all of the Initialization specs next. Looks like we can construct an empty set, populate one with arguments, and ensure uniqueness properly, but our static method is failing because we haven't made a method named add yet. So we know what's next.

We'll define add, then have it take an element as an argument. Like we did above, we'll store el in data with the value of 1. Then we'll update our length, and finally we'll return our object just to offer a convenient API. And we know this will already prevent duplicate items since it uses the same code as above, so we don't have to test that again.

add(el) {
  this.data.set(el, 1);
  this.length = this.data.size;

  return this;
}

Now if we run our tests... everything in here passes. So now we can start to focus on our basic operations. Let's run those specs and see what fails first. Our first spec passes because we've already created the add method to satisfy the static method, so our first failure is that the remove method doesn't exist. Let's go create that now. This method will also take an element, the one we want to remove from our set. We will then delete that element from the map, update our length again so that if it existed we'll lower the length of the set. And then again we'll just return this.

remove(el) {
  this.data.delete(el);
  this.length = this.data.size;

  return this;
}

Okay that one passes, and now we are seeing that our spec determining if our set can tell if it is empty is failing because it has no empty method. We can define empty just as you'd expect by returning whether our length is 0 or not.

empty() {
  return this.length === 0;
}

Running our tests again shows a greenlight there, but we fail on checking cardinality, which is the size of the set. I decided to add a size() method even though we have a length property because I like uniform APIs and the spec wanted one. So we'll just create the method, return the length, and keep going.

size() {
  return this.length;
}

Okay now we are failing on verifying we can return our set as an enumerable, which is part of the the standard set operations. So we'll define the method it says it's missing, toList, and then because we can't get a list of our keys from the map, only an iterator. So we're going to again use the spread operator for our benefit and return an array with keys spread out in it.

toList() {
  return [...this.data.keys()];
}

Running our specs again, that test worked out well, and now we need to determine if an element is a member of our set.

So we'll define the method has and it'll take an argument to check, so now we can directly leverage the Map type's quick lookup time and use its has method to determine if it is a key in the map.

has(el) {
  return this.data.has(el);
}

Looks like that did it, so now just need to cover the equality section down here, where we define equality such that order is not important for equality, but two sets must have the same number of the same elements to be equal. So let's see if we can write that up real quick.

So we'll define the method equals as taking another set, then we will return whether the size of our set and the size of the other set are equal, then we'll and that together by reducing over the elements in our set's list with a reducer and setting the default value to true. Okay so our reducer will look a bit dense, but it is going to take the accumulation of values as a, and the current element as e, and it will return the value of logically ANDing our accumulated value, which starts as true, and then whether our other array has this element in it. If any element returns false, then the whole thing will be false, otherwise it'll return true and we'll know that these sets are equal.

equals(other) {
  const reducer = (a, e) => a && other.has(e)

  return (this.size() === other.size()) && this.toList().reduce(reducer, true);
}

Let's run the tests one more time... and we're good to go! And reading through our passing specs gives us a nice rundown of all the features we've implemented over the course of this lesson. We gave our set a way to be initialized empty, or with elements populated by a list or arguments. We verified that elements are unique, which is a major gain from using Sets. Then we ensured we could run through each of the major pieces of functionality - we can add and remove elements, determine the size of the set and when it is empty. We can return an enumerable version of our set for consumption by other JS functionality, and then we can determine if an element belongs to our set. As mentioned before if you're using a set, you typically want guaranteeing uniqueness and a way to quickly determine if an element is in the set.

And then at the bottom we verified that when checking equality of a set we do not care about order, so long as it has the exact same elements.

MySet
  Initialization
    ✓ can be created empty
    ✓ can be populated from arguments
    ✓ can be populated from a list
    ✓ only contains unique elements
  Basic operations
    ✓ can add a new element to set
    ✓ can remove elements from a set
    ✓ can determine when empty
    ✓ can return cardinality
    ✓ can return enumerable list
    ✓ can determine when element belongs to set
    ✓ can determine when element does not belong to set
  Equality
    ✓ is not order dependent
    ✓ must have the number of elements
    ✓ must have the same elements

If we had chosen to use a typical array as the underlying data structure, we would have to iterate through each element of the array if we wanted to see if an element was a member. The same is true if we wanted to add an element, because we'd have to check to make sure we don't insert a duplicate, or if we wanted to remove an element because we'd have to find it. Since we'd have to iterate through each element of an array of size n, we would call this a O(n), or linear time operation. However, since we opted to use a map structure that can give us direct lookup, adding, or removing of a key without iteration, that gave us operations that occur in constant time, or O(1). Sometimes sets are implemented as trees as well, but I'll leave you to find out what some of the tradeoffs there are.

And because sets are such fascinating constructs, there are many more things we can do with them. We're going to continue looking at sets next lesson by looking at some of the algebraic operations we can perform on sets.