Type Safety Done Right - PHP Array Hacking
Published on October 31, 2023

In my first five or six years of developing larger-scale PHP applications, I both ran into and created many of the issues that people starting out in backend development will often have seen. It's only now, especially after inheriting the Vonage PHP SDK, that I've realised I am in a position to educate others not to do the same. A lot of these problems stem from how I've seen PHP in particular being written over the years, regardless of whether a framework is being used or not. Let's throw the first grenade:

PHP Arrays are sort of awful

OK, they're not awful, that would be an odd thing to say. What I mean is, the way they are often used is awful. There is a very good reason for this; starting out with PHP as my first language (after VBA and SQL) I was not aware of how other languages handle collections of data. I just assumed they were the same as PHP:

  • You have an indexed array, which has numeric keys, or

  • You have an associative array, which has defined alphanumeric keys.

Both are considered the same type of variable, i.e. array; the engine works out which kind of array you want from the keys you give it.

$myAssociativeArray = ['foo' => 'bar']; // array
$myIndexedArray = ['foo', 'bar']; // also array
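
To see how little the engine distinguishes between the two, here is a small sketch. The variable names are purely illustrative, and array_is_list() assumes PHP 8.1 or later:

```php
<?php

declare(strict_types=1);

// Both literals produce the same built-in type.
$indexed = ['foo', 'bar'];
$associative = ['foo' => 'bar'];

$indexedType = gettype($indexed);         // "array"
$associativeType = gettype($associative); // "array"

// Nothing stops a single array from mixing both styles of key.
$mixed = ['foo', 'name' => 'bar', 'baz'];
$mixedKeys = array_keys($mixed); // [0, 'name', 1]

// PHP 8.1+ can tell you, after the fact, whether an array happens to be a list.
$indexedIsList = array_is_list($indexed); // true
$mixedIsList = array_is_list($mixed);     // false
```

The check only describes the current contents: adding one string key turns a "list" into something else without changing its type.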

That's how everything else probably works, right?

No, it isn't. My naivety with using just PHP has meant that I was not aware of the following statement:

In most other major backend languages, indexed and associative arrays are split into two or more different types.

Furthermore, "associative array" is the name PHP favours; other languages usually call the same structure a hash, a map or a dictionary. From now on, we'll call associative arrays hashed arrays as a shorthand for that whole family.

Here is the behaviour for other languages:

NodeJS

  • Array (indexed array)

  • Object (hashed array)

Python

  • List (indexed array)

  • Dictionary (hashed array)

Ruby

  • Array (indexed array)

  • Hash (hashed array)

It's worth noting that Ruby can actually have an associative array in the same way as PHP can, but because of the existence of the Hash object it is very rare to see an array being used as a hash.

Go(lang)

  • Slice (indexed array)

  • Map (hashed array)

Java

  • Array (indexed array, primitive)

  • List (interface, implemented by ArrayList)

  • ArrayList (indexed array, resizes by reallocating its backing array)

  • HashMap (hashed array, implementation of Map)

  • LinkedHashMap (hashed array, retains insertion order)

It's worth noting that, as you can see, Java is -very- strict about its data structures. A plain array is mutable in that you can change its values, but you cannot change its length once it has been created. That is why ArrayList exists: it can grow because it swaps in a larger backing array internally when it runs out of room.

C#

  • Array (indexed array)

  • List (indexed array with generics, mutable)

  • Dictionary (hashed array)

So, given that all of these different classes exist in the base API of those languages, they all have the ability to instil stricter type behaviour.

So What?

We don't have that in PHP. It means PHP arrays can sort of be anything. It is this fact that has resulted in the sort of code I have been reading for years that looks like this:

public function updateRecord(int $id, array $options) {
	// do some stuff here
}

I want to update the ID of something. And then... I have $options. Because they are options. But it's a plain PHP array. What's in it? The function has no idea.

What's wrong with this code? If personal experience is anything to go by, four hours of xdebug stepping to find out where an array key came from. We've got no idea what it's supposed to look like. This particular bugbear of mine is something I used to code myself for years, especially in agency environments where time is money. If the only concern is "Does it work according to the client's requirements?", then corners will be cut as much as possible to ship on time. The person after you that has to fix a bug introduced from some side-effect? That's tech-debt in action for you.
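
A minimal sketch of how that goes wrong. The function body and the option names (notify, audit) are hypothetical, invented purely to illustrate the failure mode:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for updateRecord(): it reads two optional keys
 * and falls back to defaults when they are missing.
 */
function updateRecord(int $id, array $options): array
{
    return [
        'id' => $id,
        'notify' => $options['notify'] ?? false,
        'audit' => $options['audit'] ?? true,
    ];
}

// The caller misspells "notify". PHP raises no error: the unknown key is
// ignored and the default is used instead.
$result = updateRecord(42, ['notfy' => true]);

var_dump($result['notify']); // bool(false)
```

Nothing in the signature tells the caller which keys exist, so the typo only shows up later as "the notification never went out".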

OK, What Now?

The path I've trodden for maintainable code is to use type hints everywhere, all the time. This is the type of software development that, if done throughout and thoroughly, should result in the mythical self-documenting code. But, we can only type hint an array, so that doesn't help us here. So, we have three options:

  • Generics

  • Array shape (via hacked Generics)

  • Value objects

Generics

Sorry, I tricked you: we don't have generics in PHP because of limitations within the engine. However, as you may see the word bandied around a bit, consider the following code from TypeScript:

interface UserProfile {
    id: string;
    name: string;
    age: number;
}

class Collection<Type> {
    private items: Type[] = [];

    add(item: Type) {
        this.items.push(item);
    }

    get(index: number): Type | undefined {
        return this.items[index];
    }
}

const userProfiles = new Collection<UserProfile>();

As someone who is very much in the "make this PHP as robust as possible, make it harder than Valyrian Steel!" camp, generics like this are a deep joy to work with. The <> angle brackets state that when creating a new Collection object, it must only contain the type given during its instantiation. So, when we create a new Collection<UserProfile>, we're saying that this object can only contain UserProfile objects. That, friends and enemies, is self-documenting code.

Can PHP Do This Some Other Way?

I'm glad you asked: yes, it can! This leads us to bullet point number two, the array shape. It sounds so stupid for me to be saying this, but until I joined Vonage it had never occurred to me that I could extend core SPL classes within the PHP API. So, you can extend ArrayObject, which behaves like an array!

class Collection extends \ArrayObject {

	// Public, so the class can still be instantiated with "new" from outside
	public function __construct(...$args) {
		parent::__construct(...$args);
	}
}

Currently, this does nothing apart from using the splat operator to pass its arguments to the parent class, ArrayObject. But we've extended it, which means we can override the behaviour!

We need to do two things here:

  • Set a type, the same way we did in TypeScript

  • Override the offsetSet() method, which is the method ArrayObject calls whenever a value is assigned to it after instantiation. The constructor does not go through offsetSet(), so the initial values need checking there as well.

So, let's address this:

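class Collection extends \ArrayObject {

    protected $typeError = 'Only UserProfile objects can be added!';

    public function __construct(...$args) {

        // Check the initial values; ArrayObject's constructor does not call offsetSet()
        foreach ($args as $arg) {
            if (!$arg instanceof UserProfile) {
                throw new \TypeError($this->typeError);
            }
        }

        // ArrayObject expects a single array as its first argument, so pass $args whole
        parent::__construct($args);
    }

    public function offsetSet($key, $value): void
    {
        // Reject anything that is not a UserProfile before it is stored
        if (!$value instanceof UserProfile) {
            throw new \TypeError($this->typeError);
        }

        parent::offsetSet($key, $value);
    }
}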

And there we have it. offsetSet() is overridden so that only UserProfile objects can be added, and when creating the Collection you can pass any number of arguments in as long as they're UserProfile objects.
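
To get a little closer to the TypeScript version, the allowed class could be handed over at construction time rather than hard-coded. What follows is only a sketch under my own assumptions: TypedCollection and the cut-down UserProfile are hypothetical names, not part of any SDK, and the code assumes PHP 8.0 or later:

```php
<?php

declare(strict_types=1);

// Stand-in class so the example is self-contained.
final class UserProfile
{
    public function __construct(public string $name)
    {
    }
}

/**
 * Hypothetical ArrayObject subclass that takes the allowed class name at
 * construction, loosely imitating Collection<UserProfile>.
 */
class TypedCollection extends \ArrayObject
{
    public function __construct(private string $type, array $items = [])
    {
        parent::__construct();

        // Route the initial items through offsetSet() so they are checked too.
        foreach ($items as $key => $item) {
            $this->offsetSet($key, $item);
        }
    }

    public function offsetSet(mixed $key, mixed $value): void
    {
        if (!$value instanceof $this->type) {
            throw new \TypeError("Only {$this->type} objects can be added!");
        }

        parent::offsetSet($key, $value);
    }
}

$profiles = new TypedCollection(UserProfile::class, [new UserProfile('Ada')]);
$profiles[] = new UserProfile('Grace'); // accepted, appended at index 1

try {
    $profiles[] = 'not a profile'; // rejected by offsetSet()
} catch (\TypeError $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

This is a runtime check only, so static analysers still see a plain ArrayObject. Other ArrayObject methods such as exchangeArray() replace the contents wholesale, so a stricter version would need to guard those as well.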

Value Objects

There is a much simpler solution to the problem of throwing arrays around: don't. I have continued the work carried out by my predecessors on the Vonage PHP Core SDK to make sure that anything that is passed into Client methods is a value object.

Yes, there are downsides to this. It means class properties need to be defined, and it means that getter and setter methods need to be added. It means more code. However, we also have tooling to make the code less bloated in modern PHP, with the use of either native enum classes or constructor property promotion, complete with access modifiers, to keep the line count down.
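
As a rough sketch of how compact that can be, here is a hypothetical UpdateOptions value object standing in for the $options array passed to updateRecord() earlier. The class, its properties and the function body are invented for illustration, and readonly assumes PHP 8.1 or later:

```php
<?php

declare(strict_types=1);

// Hypothetical value object replacing the untyped $options array.
final class UpdateOptions
{
    public function __construct(
        public readonly bool $notify = false,
        public readonly bool $audit = true,
    ) {
    }
}

function updateRecord(int $id, UpdateOptions $options): string
{
    return sprintf(
        'record %d: notify=%s, audit=%s',
        $id,
        var_export($options->notify, true),
        var_export($options->audit, true),
    );
}

// Named arguments document the call site.
$summary = updateRecord(42, new UpdateOptions(notify: true));
echo $summary, PHP_EOL; // record 42: notify=true, audit=true

// A misspelled option now fails immediately with an Error
// instead of being silently ignored.
try {
    new UpdateOptions(notfy: true);
} catch (\Error $e) {
    $caught = $e->getMessage();
}
```

The whole object is a dozen lines, and the signature of updateRecord() now says exactly what it accepts.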

A good example of this was when I coded the PHP SDK implementation of Verify v2. Sending a verification request would look like this without value objects:

$payload = [
	'locale' => 'en-us',
	'channel_timeout' => 300,
	'client_ref' => 'a-reference',
	'code_length' => 4,
	'workflow' => [
		'channel' => 'sms',
		'to' => '123456789'
	]
];

$myVonageClient = new Client('apiKey', 'apiSecret');
$myVonageClient->verify2()->startVerification($payload);

The client object knows nothing about that array being passed into it. What if a key is wrong? What if a key is missing? There's nothing within the code that tells you about the behaviour, and if I did put validation into the startVerification() method then you'd have to trawl through my code to find out what was in my head.

Instead, using value objects, we shift the logic into well-typed PHP. It becomes self-documenting and hardened by design.

class SMSRequest extends BaseVerifyRequest  
{
	public function __construct(  
	    protected string $to,  
	    protected string $brand,  
	    protected ?VerificationLocale $locale = null,  
	) {  
	    if (!$this->locale) {  
	        $this->locale = new VerificationLocale();  
	    }  
	  
	    $workflow = new VerificationWorkflow(VerificationWorkflow::WORKFLOW_SMS, $to);  
	  
	    $this->addWorkflow($workflow);  
	}
}

This object is now passed to the startVerification() method. What you can and cannot do in the objects passed in can now be defined at the lowest level of the code. For example, here is the VerificationLocale object.

class VerificationLocale  
{  
    private array $allowedCodes = [  
        'en-us',  
        'en-gb',  
        'es-es',  
        'es-mx',  
        'es-us',  
        'it-it',  
        'fr-fr',  
        'de-de',  
        'ru-ru',  
        'hi-in',  
        'pt-br',  
        'pt-pt',  
        'id-id',  
    ];  
  
    public function __construct(protected string $code = 'en-us')  
    {  
        if (! in_array($code, $this->allowedCodes, true)) {  
            throw new \InvalidArgumentException('Invalid Locale Code Provided');  
        }  
    }  
  
    public function getCode(): string  
    {  
        return $this->code;  
    }  
  
    public function setCode(string $code): static  
    {  
        $this->code = $code;  
  
        return $this;  
    }  
}
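
The same allow-list could also be expressed with the native enums mentioned earlier. This is only a sketch of that alternative, not how the SDK does it: LocaleCode and LocalisedRequest are hypothetical names, only a handful of the codes are repeated, and backed enums assume PHP 8.1 or later:

```php
<?php

declare(strict_types=1);

// Hypothetical backed enum; only a few of the codes are repeated here.
enum LocaleCode: string
{
    case EnUs = 'en-us';
    case EnGb = 'en-gb';
    case EsEs = 'es-es';
    case FrFr = 'fr-fr';
    case DeDe = 'de-de';
}

// Hypothetical request object that accepts the enum instead of a string.
final class LocalisedRequest
{
    public function __construct(
        private LocaleCode $locale = LocaleCode::EnUs,
    ) {
    }

    public function getLocaleCode(): string
    {
        return $this->locale->value;
    }
}

$request = new LocalisedRequest(LocaleCode::EnGb);
echo $request->getLocaleCode(), PHP_EOL; // en-gb

// Strings from outside still need converting; tryFrom() returns null
// for anything that is not a declared case.
$unknown = LocaleCode::tryFrom('xx-xx'); // null
```

With an enum the invalid state cannot be constructed at all, so there is no validation branch to write or to forget.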

And there we have it. Self-documenting code. Yes, you have to create all the objects to stitch them all together, but also how our SDK behaves is both strict and explicit.

James Seconde, Senior PHP Developer Advocate

A trained actor with a dissertation on standup comedy, I came into PHP development via the meetup scene. You can find me speaking and writing on tech, or playing/buying odd records from my vinyl collection.
