Arrays

What is an array? It seems like such a simple question, but we'll discover that an array is two independent concepts that happen to converge in some languages, but not entirely in others.

We're working on inlining examples! Until then, please find the examples here on GitHub.

Arrays

Intro

Arrays are a programming construct you've undoubtably used if you've been programming for any notable amount of time. Simply put, an array is the default data type you'd reach for to hold a collection of values that you may want to iterate over, or access directly via indexing at some point.

So here in this JavaScript REPL, I'm going to define an array of integers containing 4, 8, 15, 16, 23, and 42. If I want to iterate over them I may call numbers.forEach and pass in a function I'd like called on each element. In this case we'll simply log out to the screen: "Value is ${i}", where i is the current element we're on in the array.

When we run this code each element gets printed out in order with the phrase we've specified. I can also say that I'd like the 3rd element by passing in index 2, since in JavaScript and any other language with 0-based indexing, 0 will be the first index.

Data Type vs. Data Structure

I'm sure none of this is has been new to you, but one thing that may not be obvious is that an array is both a data type and a data structure. That is, while your language may have an array type, it may not be implemented in terms of the array data structure, but possibly as a linked list or some variant of a tree.

The array data structure in the way we commonly know it, is a data structure that represents a collection of elements with the position of each element being computable via some sort of formula. What this resolves to in many implementations is an array being stored in a contiguous block of memory, with each index representing the offset in memory to reach that element.

Explain C++ Implementation

Let's look how C and C++ handle native arrays. In C I'd have to define the type for my array, so if we wanted to represent our numbers array, I'd declare it as int numbers[6], because I'd have to specify the fixed width of the array at compile time and we have 6 numbers.

Now we should make a set of assumptions before we proceed. We'll assume that the memory address given to our array at runtime happens to be the number 1000. Next we'll assume we are on a system where an integer takes up 4 bytes of memory.

Knowing these values, we can calculate the memory address of each element in the array using a very simple formula. Let's create a function to house it named addressFor that will take the starting address, the size of the type we're representing, and the offset we need to make into the array, which is the index.

function addressFor(start, size, offet) { return }

Now we will calculate the location in memory by taking the start address and adding it to the type size multipled by the offset.

function addressFor(start, size, offet) { return start + size * offset; }

Let's test it out with our assumptions. We'll pass in the start address, which is 1000, the size of the type, which is 4 bytes, and then the offset which will be index 0. If we did this right, we should get our start address since that is the first location reservered for elements in our array.

If I pass it 1 as the index, we'll get 1004, because the first integer will only take up bytes 1000 through 1003.

C++ Implementation

Now that this primer is out of the way, let's look at some real C++ code for just a moment. If you've never seen C++ before, don't panic! You don't need to understand the syntax, beyond what I explain.

#include <iostream>

using namespace std;

int main() {
  const int ARRAY_SIZE = 6;
  int numbers[ARRAY_SIZE] = { 4, 8, 15, 16, 23, 42 };
  std::cout << "Size of Int " << sizeof(int) << " bytes" << endl;
  std::cout << "Base Address: " << numbers << endl;

  for (int i = 0; i < ARRAY_SIZE; ++i) {
    cout << "Index " << i << " address: " << &numbers[i] << endl;
  }
}

Okay so on line 6 I define the length of my array, then below declare it similarly to as we showed before, but here I've also listed out the values of the array.

Next, to make sure we know the size of an integer on my system, I'm printing it to stdout, followed by the starting address of our array which is represented as a hexadecimal number. We only need to be concerned with the last two digits, which conveniently start at 00. Next we see 04, just like our last example, followed by 08, then 0c, which is 12 in hex, and it continues on till the last element of our array.

Having these easily resolved addresses in memory allows our program to look up positions in memory extremely efficiently and in constant time. If constant time is a new term, it effectively means that whether our array has one element or one billion, it will perform the same number of operations to complete a task, which we demonstrated in our JS example here by having a single function that can tell us where any element is located in memory.

Outro

As mentioned before, languages may or may not use array structures to implement their array types internally. JavaScript arrays, for the most part, uses C arrays under the hood. Java seems to have some form of array structure without a guarantee of a contiguous block of memory, and some other languages use linked lists or trees for a variety of reasons typically related to tradeoffs of time and complexity for various operations.

While it's not terribly important to know your platform's internal representation, it can sometimes help debug issues involving speed or space. Additionally in many languages like Ruby, Java, or PHP, there exist several array-like data types that provide interfaces compatible with arrays, or allow you to create a data structure of your own. It's certainly a topic worth exploring as you start looking into creating collections objects.