Generators in the Real World

Generators are one of the coolest things in modern PHP! They were introduced in PHP 5.5 and expanded in PHP 7. I’ve read a lot about concurrency and how many event loops use generators to pause and resume code so that several things happen at (seemingly) the same time. However, that is not the most useful application of generators in typical PHP applications. Let’s take a look at a couple of ways generators can make life easier in typical, synchronous PHP applications.

Processing Large Files

Probably the most useful aspect of generators is their ability to fix a common problem encountered when reading a full file into memory. During development, the test file being parsed might not be very large, but over time files tend to grow larger and larger! One of the easiest ways to parse a file is to simply read the whole thing in and make a big array. That gives us the ability to foreach over it, which is very easy and a standard approach that developers learn when they are getting started in PHP (and for good reason). However, as files get bigger, the array of data gets bigger, eventually getting big enough that it exhausts the amount of memory allowed by PHP.

Sure, we can increase the memory limit in the php.ini file, but how long will that increase keep things working? And what about when the server itself has no more memory to give? With this in mind, it is better to rethink the method being used. Fortunately, generators can easily be dropped in without having to refactor everything. If you were originally looping over an array, a generator can replace that array with something you can iterate over without loading all the data into memory.
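
As a rough sketch of that drop-in idea, assume a hypothetical function longestLine() that was originally written to loop over an array of lines, and a hypothetical generator readLines() that reads a file one line at a time. Neither name comes from a real library, and the generator syntax itself is explained in the next section. Declaring the parameter as iterable (available as of PHP 7.1) lets the same loop accept either an array or a generator:

```php
<?php
// Hypothetical legacy function: it used to receive an array of lines.
// Typed as iterable (PHP 7.1+), it accepts an array or a generator.
function longestLine(iterable $lines): int {
	$longest = 0;
	foreach ($lines as $line) {
		$longest = max($longest, strlen(rtrim($line, "\r\n")));
	}
	return $longest;
}

// Hypothetical generator: yields one line at a time instead of building an array.
function readLines(string $path) {
	$handle = fopen($path, "r");
	if ($handle === false) {
		return; // could not open the file: nothing is yielded
	}
	try {
		while (($line = fgets($handle)) !== false) {
			yield $line;
		}
	} finally {
		fclose($handle);
	}
}

// Throwaway sample file so the sketch runs on its own.
$path = tempnam(sys_get_temp_dir(), "gen");
file_put_contents($path, "short\na much longer line\nmid\n");

$fromArray = longestLine(file($path));          // whole file held in an array
$fromGenerator = longestLine(readLines($path)); // one line held at a time
unlink($path);
```

Both calls produce the same answer; the only change on the calling side is what gets passed in.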

Let’s build a simple generator to parse through a CSV file. A generator is created by using the yield keyword in a standard function. It looks just like a function in every way, except that it contains a yield statement. When PHP finds a yield anywhere in a function, calling that function returns a Generator object, which is iterable, instead of running the body right away.

function func ($var){
	//i am a function.
	return $var;
}

function gen ($var){
	//I am a generator.
	yield $var;
}

Calling func($var) will just return the $var, but calling gen($var) will return a Generator. This object can then be iterated over to get the data.

$a = "test";
$first = func($a);
$second = gen($a);

var_dump($first);  // string(4) "test"
var_dump($second); // object(Generator)#1 (0) {}
foreach ($second as $line){ var_dump($line); } // string(4) "test"
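
To see when a generator’s body actually runs, here is a minimal sketch. The function countTo() and the $log array are made up for this illustration; the log simply records each step so the order of events can be inspected afterwards:

```php
<?php
// Hypothetical helper: records a message each time its body makes progress.
function countTo(int $limit, array &$log) {
	$log[] = "started";
	for ($i = 1; $i <= $limit; $i++) {
		$log[] = "yielding $i";
		yield $i;
	}
	$log[] = "finished";
}

$log = [];
$numbers = countTo(2, $log);

// Calling the function has not run any of its body yet, so the log is still empty.
$beforeLoop = $log;

$seen = [];
foreach ($numbers as $n) {
	$seen[] = $n; // the body runs only as far as the next yield each time
}

// $log is now ["started", "yielding 1", "yielding 2", "finished"]
```

Nothing inside countTo() executes until the foreach asks for the first value, which is what keeps memory use flat no matter how many values are eventually produced.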

A generator works by running its code only until the next yield, handing that value to the foreach, and then pausing until the next value is requested. With that in mind, we can open a file and read just as much data as we need to return each time. If the loop should receive an array of columns in a file, then the generator should yield the array, but if a line of data is needed, the generator should yield the line. I’ll go ahead and throw in some code to give us an associative array for files with a header row:

function csvReader($file, $hasHeaders = true) {
	$handle = fopen($file, "r");
	if ($handle === false) {
		return; // could not open the file: the generator yields nothing
	}
	try {
		// read the header row, if there is one
		$headers = $hasHeaders ? fgetcsv($handle) : [];
		if ($headers === false) {
			return; // empty file
		}
		while (($data = fgetcsv($handle)) !== false) {
			// key each row by its header; a row whose column count
			// does not match the header row keeps its numeric keys
			if ($hasHeaders && count($data) === count($headers)) {
				$data = array_combine($headers, $data);
			}
			yield $data;
		}
	} finally {
		// runs even if the consumer stops iterating early
		fclose($handle);
	}
}

Much of this code is there to handle the header row and turn each row into an associative array. The rest is super simple! The only “new” thing here is the yield keyword.

Let’s say we have a CSV file called test.csv that has the following data:

Header 1,Header 2,Header 3
Data 1.1,Data 1.2,Data 1.3
Data 2.1,Data 2.2,Data 2.3

We can then run a small test:

$csvGenerator = csvReader('test.csv');
foreach ($csvGenerator as $line){
	var_dump($line);
}

This should give us the following:

array(3) {
  ["Header 1"]=>
  string(8) "Data 1.1"
  ["Header 2"]=>
  string(8) "Data 1.2"
  ["Header 3"]=>
  string(8) "Data 1.3"
}
array(3) {
  ["Header 1"]=>
  string(8) "Data 2.1"
  ["Header 2"]=>
  string(8) "Data 2.2"
  ["Header 3"]=>
  string(8) "Data 2.3"
}

The only downside to this that I can come up with is that the data (in this case the associative arrays) does not persist. Once the foreach has finished, we cannot foreach over $csvGenerator again; PHP throws an exception if we try. Instead we have to create the generator again by calling csvReader() again, which re-reads the file. This might be slower for small files that need to be used multiple times, but for large files, loading everything into an array will not work at all, so using generators is the way to go in my opinion.
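
A small sketch of that limitation, using a hypothetical two-value generator named letters() in place of the CSV reader so it runs without a file:

```php
<?php
// Hypothetical generator standing in for csvReader().
function letters() {
	yield "a";
	yield "b";
}

$gen = letters();
$firstPass = iterator_to_array($gen); // runs the generator to completion

$secondPassFailed = false;
try {
	foreach ($gen as $letter) {
		// never reached: a finished generator cannot be traversed again
	}
} catch (Exception $e) {
	$secondPassFailed = true;
}

// The fix is to call the generator function again to get a fresh Generator.
$secondPass = iterator_to_array(letters());
```

If the same data is needed in several places, passing around something that can create the generator (the function name or a closure) rather than the Generator object itself avoids this problem.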

Combining Multiple Iterables

In addition to saving memory when reading in large files, generators can also be used to stitch multiple data sources together. Let’s say that we have a mysqli_result as well as an array of data not yet in the database. Iterating over the mysqli_result hands us one row at a time, but if we want to put the two sources together, we would otherwise have to read all of the data into an array and go from there. Sound familiar?

Instead of looping over each resource separately, a generator can be used together with yield from, which is available as of PHP 7.0. In a whopping three (real) lines of code we can stitch them together without reading any of them fully into memory:

function stitchTogether(...$resources){
	foreach ($resources as $resource){
		yield from $resource;
	}
}

Rather than coming up with a completely new scenario, let’s reuse the previous setup and put “two” CSV files together:

$csvGenerator1 = csvReader('test.csv');
$csvGenerator2 = csvReader('test.csv');

$stitchedGenerator = stitchTogether($csvGenerator1, $csvGenerator2);

foreach ($stitchedGenerator as $line){
	var_dump($line);
}

This will give us the same results as before, except that it will double the data – showing the first instance of the file first, and then the second.
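
One thing to watch for: yield from keeps the keys of each inner source, so the keys repeat across sources. The sketch below repeats stitchTogether() so it runs on its own, and uses a made-up savedRows() generator as a stand-in for rows coming from a database. A foreach is unaffected, but iterator_to_array() with its default of preserving keys will overwrite earlier values:

```php
<?php
function stitchTogether(...$resources) {
	foreach ($resources as $resource) {
		yield from $resource;
	}
}

// Hypothetical stand-in for rows that would come from a database result.
function savedRows() {
	yield "row from db 1";
	yield "row from db 2";
}

$pending = ["pending row 1", "pending row 2"];

// foreach sees all four values even though the keys repeat (0, 1, 0, 1).
$viaForeach = [];
foreach (stitchTogether(savedRows(), $pending) as $row) {
	$viaForeach[] = $row;
}

// Keys preserved (the default): the second source overwrites the first.
$collapsed = iterator_to_array(stitchTogether(savedRows(), $pending));

// Passing false discards the keys and keeps every value.
$complete = iterator_to_array(stitchTogether(savedRows(), $pending), false);
```

This also shows that stitchTogether() accepts a plain array alongside a generator, since yield from works with arrays as well as Traversable objects.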

Conclusion

These two uses of generators are definitely great tools to have, but I doubt they will change the life of the PHP developer. To be honest, I’ve only used this kind of technique once in production over the last two years, but I believe it saved me hours and hours of refactoring some legacy code that expected an array of data!

One other use of generators worth mentioning is how Amp uses them. Amp has taken generators to the next level, allowing an event loop to move in and out of any number of functions (all of which are generators). This technique quite possibly will change the life of the PHP developer, as it allows us to do more asynchronous kinds of things. I’ll be sure to write about some uses of Amp very soon. One such example is the Aerys web server, which can eliminate the need for Apache/NGINX and also allows the PHP developer to use WebSockets effectively! These concepts are still being actively developed, but I believe we are on the verge of successfully using them in production!

