Abstract Data Type

There are many different abstract data types out there, but many people aren't sure what that term means. Find out what it is, and how it allows us to design-by-contract, much like interfaces or protocols.

We're working on inlining examples! Until then, please find the examples here on GitHub.

Abstract Data Types

While learning about various data structures and types, you'll frequently come across the term "abstract data type". It may seem like an intimidating term, but the concept is fairly simple. It is the definition of a data type's behavior from the perspective of a user, without specifying the concrete implementation.

Now that I've phrased it in that way, you might be thinking of a language feature common in languages like PHP, Java, Swift, and several more. Often this idea is called an interface, like in PHP, Go, or Java. Sometimes it's called a protocol, as it is in Objective-C and Swift, and other times they're named contracts. The idea is you can define the functions or methods for everything your type or class can do, without specifying an exact implementation. An abstract data type and a construct like these are still different, but let's try creating a quick interface in PHP to see if understanding one makes understanding the other easier.

I'm going to open up AbstractArray.php and we are going to translate the abstract array data type to an interface. I'll point out that I'm using PHP7 for this example so that I have better typehinting support than PHP5 would be able to provide to make the example a bit more realistic. I've already copied the definition and axioms from Wikipedia into a comment block at the top for convenience so we don't need to switch between the browser and editor.

The first thing we need to do is declare that we are creating an interface named AbstractArray.

<?php

/*
Operations:
get(A, I): the data stored in the element of the array A whose indices are the integer tuple I.
set(A,I,V): the array that results by setting the value of that element to V.

Axioms:
get(set(A,I, V), I) = V
get(set(A,I, V), J) = get(A, J) if I ≠ J
for any array state A, any value V, and any tuples I, J for which the operations are defined.
*/

interface AbstractArray
{

}

Let's work from the top down - we see two operations here that we need to implement - get and set. Get retrieves the data stored in the element of array A whose indices are the integer tuple I. Now that seems a bit strange, doesn't it? Most languages you've likely used don't accept a tuple as an index for an array, they want a single integer. Let's just accept this as-is for now and we'll explain what it's for once we implement the type. Next we have a set, which sets the value at element I to be the value V. Okay, now that we've outlined the methods, let's define them in our interface with slight tweaks to accommodate this being an interface for a class.

Since we are defining the interface for classes and methods, we don't need to pass A into the get function anymore, it is always available on the object holding the methods. As such, we'll define our function get as only accepting one element, tuple I; which since PHP doesn't have native tuples we'll simply make an array. Then it looks like that will return whatever the value stored at that index will be, so I'm not going to give it any return type.

public function get(array $i);

Next we'll define set, again ignoring the array argument, which takes the index tuple $i like get, and the value to be inserted, $j which could be of any type. According to the definition, it returns the array that results from this operation - in our case, the AbstractArray.

public function set(array $i, $v) : AbstractArray;
interface AbstractArray
{
    public function get(array $i);
    public function set(array $i, $j) : AbstractArray;
}

So we have defined the two functions...but the thing about our interface is that we cannot define any pre-conditions or post-conditions. That is, we can't ensure the array is at a given state, nor that it results in another state. For instance, in set, I have defined that it'll take an array as a first argument, and that it'll return an AbstractArray, but I have no guarantees that it won't return the same unmodified AbstractArray, a different implementation of the AbstractArray class, or an AbstractArray with the value $v set at 3 random locations in the AbstractArray. I mostly wanted to point this out, that an interface isn't a complete language implementation of an abstract data type, but it can at least help you get started implementing one.

So let's take it the rest of the way and define a new class below, MyArray, that implements the AbstractArray interface.

The constructor will reasonably take an array as an argument. And because set and get take tuples as their indices, that implies our data type should handle multi-dimensional arrays, that is, an array of arrays. Now I could just use the tuple values as-is and access a nested array by indexing the row and then column. But since the implementation is up to us, and I want to show you how to calculate a single-dimensional array as though you wanted to access a bi-dimensional, we're going to flatten our arrays into a single array.

class MyArray implements AbstractArray
{
    public function __construct(array $arr)
    {
    }
}

So if the array isn't empty, and it finds another array to be the first element of the array, then I know the user has passed in a bi-dimensional array. As such, I am going to use array_reduce to iterate over our array of arrays and reduce their elements into a single array by merging all of the arrays together in order. And then should our array be empty or single-dimensional already, we'll simply grab the values of the array so that we get a fresh copy of the array.

    public function __construct(array $arr)
    {
        if (!empty($arr) && is_array($arr[0])) {
            $this->arr = array_reduce($arr, function($acc, $i) {
                return array_merge($acc, array_values($i));
            }, []);

            $this->rows = 2;
        } else {
            $this->arr = array_values($arr);

            $this->rows = 1;
        }
    }

Now lets check out our test file that I've already populated with tests for the definition and axioms. The first test just verifies that we can access an element randomly, and the second ensures axiom one is valid, that if we fetch a set index, we'll get the set value back in return. This third test here verifies that we satisfy the second axiom, that setting one element does not disrupt another element, such that if I set the first element to 42, the second index should still be its initialized value of 1.

Finally, the last two tests are identical to the first two, except that they validate indexing into a bi-dimensional array properly.

So lets run our tests to see what happens. If you've used a PHP or another language with interfaces, you may have guessed this error where it's squawking at me for not implementing the methods on the interface, so lets declare those to get a step toward a real failure. And since set requires that we return an AbstractArray, we will return $this to satisfy that constraint.

Okay there we go, some real failures where we are not accessing an element at a given index. To implement get, we could first try naively implementing it by returning the first element from $i for our internal storage return $this->arr[$i[0]];. However, if we run the tests we can see this triggers one error, then failures for our bi-dimensional arrays since we are missing half of the index.

So lets instead change our index to a function named tupleToIndex and we'll pass in our tuple $this->tupleToIndex($i). Next at the bottom we'll declare a new private method by that name, and then we'll check to see if we have two rows. If so, we'll return the number of rows, 2, times our first index, plus the second index. Otherwise we'll simply return the first index. And I'll note that I've only done the work for if this is a one or two dimensional array. It's possible to go even further, but for time reasons we're sticking with two.

private function tupleToIndex(array $tuple)
{
    if (count($tuple) === 2) {
        return 2 * $tuple[0] + $tuple[1];
    }

    return $tuple[0];
}

Now the reason for the calculation in our if statement is this, if we have an array of pairs, 0 1 and 2 3, we know that each row is 2 elements long. So if we try to access the first element, our calculation would be 2 * 0 + 0, which results in accessing element 0 of a flat array, and that would translate to the element at row 0 column 0. And if we wanted to access the first element of the second row, the calculation would be 2*1+0, which would be the element 2, the third element of the array, meaning we'd be accessing row 1, column 0 of a bi-dimensional array. So it's all pretty simple mathematics,

//[[0,1],[2,3]]
// 2 * 0 + 0 => element[0] => element[0][0]
// 2 * 1 + 0 => element[2] => element[1][0] => 2 elements per row

Alright let's run the tests again...now we're down to 1 error and 1 failure, and looking at the lines they have issues with, 19 and 40, they both fail when calling set because it isn't implemented yet, so let's do that now. I'm just going to copy our code from above, remove the return statement since we already have one, and then set that index equal to the value $j. And now when we run them...everything passes!

This means we've just implemented the abstract array data type, utilizing a flat array as the underlying data structure. Looking back at our definition and axioms, we've implemented get such that it will return us the value at that index, implemented set so that it will set the value of that index, and satisfied the axioms that getting a value we just set will return that value, and will not disrupt a different element. This was a pretty easy one to implement overall, and I think gives us a nice introduction for implementing the abstract data types that are to follow in this course. As we learn more data structures and data types, we'll likely see some interesting combinations along the way, so keep an eye on the perks of various implementations and why they may be favorable.