Scaling Solutions – Primes part 1

Recently I attended DConf 2016, where I got to know a lot of people involved with the D programming language. Let’s just say that before attending D had my interest, but now it has my attention 😉

As I mentioned to several people at the conference I’m interested in using prime number calculation as a method for learning when and why software architectures make sense – in classes only the very basic stuff is taught, but as soon as you would like to find prime numbers that are ‘interesting’ (say, for use in encryption) you would do better to use quite a different approach. To make an analogy, you don’t really need help to build a house of cards, but when building a cathedral, different principles apply as well.

That said, my experience with the D language is rather limited, so at least initially I’ll be going over some actual basics for those who are not familiar with the language. Gradually the topics covered in this series will cover more advanced topics and techniques, but I’m starting out assuming that you don’t know the D language at all. I’m developing using Microsoft Visual Studio 2015, with the Visual D plugin.

So let’s get started – a fresh project in the D language. To keep things as simple as possible, just start out with a console application. You should have a single “main.d” file available in the solution with the following content:

import std.stdio;

int main(string[] argv)
{
    writeln("Hello D-World!");
    return 0;
}

I’m coming from a C++ background, so this seems very familiar. The details are all different, of course. And pretty irrelevant for our little project. We want to calculate primes; let’s start with the first 10 and output them to the standard console output — choosing such a low number allows visual verification of the outputs’ correctness, and as such is a reasonable starting point for any algorithm.

import std.stdio;

bool isPrime(ulong x, const ref ulong[] currentSet) {
	// if any of the primes in the current working set cleanly divides the value x it is not prime
	foreach (prime; currentSet)
		if ((x % prime) == 0)
			return false;

	return true;
}

int main() {
	auto primes = new ulong[1];	// create a dynamic array that can be appended to

	primes[0] = 2;			// provide the first prime number

	while (primes.length < 10) {
		auto lastPrimeFound = primes[$ - 1];		
		auto value = lastPrimeFound + 1;	// the next prime number is always larger than the last one found

		while (!isPrime(value, primes))		// test for a prime number
			++value;			// if it wasn't prime, increment and repeat

		// append the prime number to the working set
		primes ~= value;
	}

	// loop over the values found and print them to the console
	foreach (value; primes)
		writeln(value);

    return 0;
}

Of course this is algorithmically a bit silly; it works, but is one of the most naive approaches one can take. It does highlight a few features of the D language that might seem a bit alien - arrays have a .length property, the end of the array can be referenced by '$' and the ~= operator is used to append to it. Automatic type deduction, const references, foreach loops. Nothing shocking, which is how it is supposed to be.
Let's address a couple of those algorithmic inefficiencies - one of the biggest improvements for this iterative search is to test just the odd numbers. Also, not the entire set of primes needs to be checked -- if we've reached a prime that is larger than the root of the number we're considering there's no way that any of the remaining prime numbers will cleanly divide it either.

import std.stdio;

bool isPrime(ulong x, const ref ulong[] currentSet) {
	// if any of the primes in the current working set cleanly divides the value x it is not prime
	foreach (prime; currentSet) {
		if ((x % prime) == 0)
			return false;

		if ((prime ^^ 2) > x)	// the ^^ is the operator for powers, so prime^^2 is prime squared
			return true;
	}

	return true;
}

int main() {
	auto primes = new ulong[2];	// create a dynamic array that can be appended to

	primes[0] = 2;				// provide the first prime number
	primes[1] = 3;				// we're only checking odd numbers, so the first odd number needs to be provided by hand

	while (primes.length < 10) {
		auto lastPrimeFound = primes[$ - 1];		
		auto value = lastPrimeFound + 2;	// the next prime number is always larger than the last one found

		while (!isPrime(value, primes))		// test for a prime number
			value += 2;			// if it wasn't prime, increment and repeat (just the odd ones now)

		primes ~= value;			// append the prime number to the working set
	}

	// loop over the values found and print them to the console
	foreach (value; primes)
		writeln(value);

    return 0;
}

These are very small additions that should be greatly beneficial at very low cost. Far fewer numbers are considered, and the iteration over the set of primes can be terminated earlier. Both of these are key algorithmic optimization techniques, albeit very obvious ones.
The next step is to try and find a larger set, let's say 1000. Now the problem (not the algorithm) changes slightly, in that the console is no longer a very appropriate output method. It's also more difficult to verify the correctness of the program. Luckily, prime numbers are a well-researched topic and the first 1000 primes can be found easily online at the Prime Pages. As a very careful step we could just get the first 1000 primes and verify that the last one is equal to 7919. We could store the entire table in a file as well.

import std.stdio;
import std.conv : to;

bool isPrime(ulong x, const ref ulong[] currentSet) {
	// if any of the primes in the current working set cleanly divides the value x it is not prime
	foreach (prime; currentSet) {
		if ((x % prime) == 0)
			return false;

		if ((prime ^^ 2) > x)
			return true;
	}

	return true;
}

int main() {
	auto primes = new ulong[2];	// create a dynamic array that can be appended to

	primes[0] = 2;				// provide the first prime number
	primes[1] = 3;				// we're only checking odd numbers, so the first odd number needs to be provided by hand

	while (primes.length < 1000) {
		auto lastPrimeFound = primes[$ - 1];		
		auto value = lastPrimeFound + 2;		// the next prime number is always larger than the last one found

		while (!isPrime(value, primes))			// test for a prime number
			value += 2;							// if it wasn't prime, increment and repeat

		primes ~= value;						// append the prime number to the working set
	}

	// loop over the values found and store them in a file
	{
		import std.file;

		auto file = File("primes.txt", "w"); // opens the file (closed at end of scope)

		foreach (value; primes)
			file.writeln(value);

		writeln("Final prime: " ~ to!string(primes[$ - 1]));
	}

    return 0;
}

unittest {
	auto primes = new ulong[2];	// create a dynamic array that can be appended to

	primes[0] = 2;				// provide the first prime number
	primes[1] = 3;				// we're only checking odd numbers, so the first odd number needs to be provided by hand

	while (primes.length < 1000) {
		auto lastPrimeFound = primes[$ - 1];		
		auto value = lastPrimeFound + 2;		// the next prime number is always larger than the last one found

		while (!isPrime(value, primes))			// test for a prime number
			value += 2;							// if it wasn't prime, increment and repeat

		primes ~= value;						// append the prime number to the working set
	}

	assert(primes[$ - 1] == 7919, "Unexpected value: " ~ to!string(primes[$ - 1]));

	writeln("Unittest completed");
}

This illustrates several D features - template instantiation (to!string), built-in unit testing, standard file I/O, scoped imports and RAII. I think I'll leave it here for now, as this would be about the point where homework exercises would end. Next time we'll raise the limit to 50000 at the very least, where performance improvements start to become noticable.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.