Game Engine part 4

And we’re back! And there are (almost) no templates in sight! This episode I’m going to show an approach to a fairly generic parallel task scheduling and execution system, which should scale relatively well with the number of processor cores available. I’ll be making use of a thread pool with worker threads, mutexes and synchronization, a bit of boost::bind magic and the event publishing system from [part 3]. As you might expect, this is going to be a long tutorial, but fear not! It’s really not all that complicated.

So let’s begin by creating a flexible singlethreaded task system and work from there. Start out with the assumption that there are going to be tasks and a task manager (aka scheduler), and the manager needs to keep track of the tasks and possibly their progress. A straightforward approach is to declare an abstract superclass for the tasks and make use of pointers in the manager. The most important progress indication that I’m interested in is the moment the task is completed, which, given a type, could neatly be broadcast through the event channel. This means that the task superclass needs a pure virtual function and some unique type in order to indicate that the task has been completed. To improve usability, the task completion event should contain a pointer to the specific task that was just completed:

class Task {
public:
	struct TaskCompleted {
		TaskCompleted(boost::shared_ptr t): mTask(t) {}
		boost::shared_ptr mTask;
	};

	Task();
	virtual ~Task();
	
	virtual void run() = 0;
};

This seems like a good place to start the design of the task manager. The manager needs at least some kind of functionality to add new tasks, and a method that executes them. The execution method is also a convenient place to broadcast the task completion event:

class TaskManager {
public:
	typedef boost::shared_ptr TaskPtr;
	typedef std::deque TaskList;

	TaskManager() {}
	~TaskManager() {}
	
	void add(TaskPtr task) {
		mTaskList.push_back(task);
	}
	
	void execute(TaskPtr t) {
		EventChannel chan;
		
		t->run();
		chan.broadcast(Task::TaskCompleted(t));
	}
	
private:
	TaskList mTaskList;
};

Now, the manager needs some kind of loop that keeps executing any available tasks, until the program needs to terminate. Checking to see if the program quits can be done in many ways, but I prefer to make use of the event system and respond to a shutdown signal, so let’s add the following to the manager code:

struct TerminationEvent {};

void start() {
	EventChannel chan;
	mRunning = true;
	
	chan.add(*this);
	
	while (mRunning) {
		if (!mTaskList.empty()) {
			TaskPtr t = mTaskList.front();
			mTaskList.pop_front();			
			execute(t);
		}
	}
}

void stop() {
	mRunning = false;
}

void handle(const TerminationEvent& ) {
	stop();
}

This way, the manager will keep executing tasks until a termination event is broadcast through the event channel. There are a couple of ways to proceed from here, but basically this is a suitable (and more importantly, extensible) replacement of a typical ‘main loop’ that can be found in singlethreaded engines. The next step from here means adding more capabilities to the task system, for example whether or not the task should be repeated after a single execution. Again, there are a couple of options to achieve this, but I plan to add a couple capabilities of the boolean type so I pick the packed bitfield option:

class Task {
public:
	struct TaskCompleted {
		TaskCompleted(boost::shared_ptr t): mTask(t) {}
		boost::shared_ptr mTask;
	};
	
	enum {
		REPEATING = 0x1 << 0,
		
		SINGLETHREADED = NO_FLAGS,
		SINGLETHREADED_REPEATING = REPEATING,
		
		NO_FLAGS = 0x0,
		ALL_FLAGS = ~0x0
	};

	Task(unsigned int taskFlags = SINGLETHREADED): mTaskFlags(taskFlags) {}
	virtual ~Task() {}
	
	virtual void run() = 0;
	
	unsigned int getTaskFlags() const { return mTaskFlags; }
	
private:
	unsigned int mTaskFlags;
};

Note that the enum contains a couple of different types of value-setting methods - the REPEATING value is set as a single bit through a method of value assignment which is convenient for assigning single bits, SINGLETHREADED and SINGLETHREADED_REPEATING are aliases of other values in the enum and lastly the ALL_FLAGS makes use of a shorthand notation for the binary inverse of 0x0, which means 'all bits to 1'. Now then, we can repeat tasks in the manager by listening to the TaskCompleted event and re-adding the task if it had the repeating flag turned on:

//...in TaskManager::start()
	chan.add(*this);
//...

void TaskManager::handle(const Task::TaskCompleted& tc) {
	if (tc.mTask->getTaskFlags() & Task::REPEATING)
		add(tc.mTask);
}

Pretty much trivial, but still kind of neat. Now then, about adding multithreaded capabilities to this... first off, there is no single solution to doing this. It's also pretty much a guaranteed way to make your code more complex. By a lot. But, first things first, the Tasklist needs to be some kind of thread-safe storage class. I prefer to make a wrapper for an existing STL container, something like this:

template 
class ConcurrentQueue {
private:
	typedef std::queue Queue;
	typedef boost::mutex Mutex;
	typedef Mutex::scoped_lock ScopedLock;
	typedef boost::condition_variable Condition;

public:
	bool empty() const;
	std::size_t size() const;
	void push(const T& value);
	bool try_pop(T& result);
	T wait_pop();

private:
	Queue mQueue;
	mutable Mutex mMutex;
	Condition mCondition;
};

/* implementation */
template 
bool ConcurrentQueue::empty() const {
	ScopedLock lock(mMutex);
	return mQueue.empty();
}

template 
std::size_t ConcurrentQueue::size() const {
	ScopedLock lock(mMutex);
	return mQueue.size();
}

template 
void ConcurrentQueue::push(const T& value) {
	ScopedLock lock(mMutex);
	
	mQueue.push(value);
	lock.unlock();

	mCondition.notify_one();
}

template 
bool ConcurrentQueue::try_pop(T& result) {
	ScopedLock lock(mMutex);

	if (mQueue.empty())
		return false;

	result = mQueue.front();
	mQueue.pop();

	return true;
}

template 
T ConcurrentQueue::wait_pop() {
	ScopedLock lock(mMutex);

	while (mQueue.empty())
		mCondition.wait(lock);

	T result(mQueue.front());
	mQueue.pop();

	return result;
}

The STL queue is a good starting point, but all containers can similarily be wrapped and made threadsafe. Note that in the implementation above I provide both a blocking (wait_pop) and a nonblocking (try_pop) pop method. This may come in handy later on, but for the task manager really only the blocking method is relevant. Now that we have a replacement for the old queue, we can start work on implementing background task execution. First off, let's start by adding a second property flag to the Tasks' enumeration:

//...Task::enum
THREADSAFE = 0x1 << 1,

BACKGROUND = THREADSAFE,
BACKGROUND_REPEATING = THREADSAFE | REPEATING,
//...

Now that there is something to test for, I'd like to have two separate queues in the manager, one for tasks that are to be executed in the main thread and one for tasks that can be executed from any thread. Also, I'd like to implement a thread pool that employs worker threads to execute background threads:

//...class TaskManager
	typedef ConcurrentQueue TaskList;
	
	TaskList mTaskList;
	TaskList mBackgroundTaskList;
	
	boost::thread_group mThreads;

//...void TaskManager::start()
	unsigned int numThreads = boost::thread::hardware_concurrency() + 1;
	
	for (unsigned int i = 0; i < numThreads; ++i)
		mThreads.add_thread(
			new boost::thread(
				boost::bind(&TaskManager::worker, this)
			)
		);
		
	while (mRunning) {
		execute(mTaskList.wait_pop());
		boost::thread::yield();
	}

//...void TaskManager::worker()
void TaskManager::worker() {
	while (mRunning) {
		execute(mBackgroundTasks.wait_pop());
		boost::thread::yield();
	}
}

//...TaskManager::~TaskManager()
TaskManager::~TaskManager() {
	mThreads.join_all();
}

I planned to explain a bit about boost::bind here, but I don't think it can be any clearer than the explanation found at [Think-Async]. Ok then, now for the final part of this tutorial, I'll add one more task property, relevant for background tasks - synchronization. It is conceivable that some tasks can run in the background without worrying about when they are completed, but some tasks might need to be synchronized with each other. First, let's add another flag to the Task enum:

//...Task enum
	FRAME_SYNC = 0x1 << 2,
	
	BACKGROUND_SYNC = THREADSAFE | FRAME_SYNC,
	BACKGROUND_REPEATING_SYNC = THREADSAFE | REPEATING | FRAME_SYNC
//...

Now then, there is a need for some kind of synchronization point, for which I choose the moment that all of the singlethreaded tasks have been executed once. Right now, that's going to be a problem, but let's introduce a second queue for the main thread so that there is one queue that can be processed and one queue for the 'next frame'. This way, you can just check to see if the current queue is empty.

//...TaskManager
TaskList mTaskList[2];
	
unsigned int mReadList;
unsigned int mWriteList;
	
//...TasManager::TaskManager()
	mReadList = 0;
	mWriteList = 1;
	
//...TaskManager::add(TaskPtr task)
	mTaskList[mWriteList].push(task);
	
//...TaskManager::start()
	while (mRunning) {
		if (!mTaskList[mReadList].empty())
			execute(mTaskList[mReadList].wait_pop());
		else {
			//the current frame is complete
			std::swap(mReadList, mWriteList);
		}
	}
//...

To me, this is reminiscent of front/back buffer techniques used in graphics code, with similar advantages and disadvantages. Now then, let's continue with the synchronized background tasks. The manager isn't neccesarily interested in which specific tasks need to be waited for, but mostly whether or not there are any background tasks left at all in the current 'frame'. On the other hand, repeating synchronized tasks need to be prevented from being executed multiple times per frame. So that's two separate points that need to be taken into consideration here.

Let's start by adressing the multiple-executions-per-frame thing. One way to resolve this is to delay the adding of those tasks until a synchronization point is reached:

//...TaskManager

TaskList mBackgroundSyncTasks;

//...

void TaskManager::add(TaskPtr task) {
	unsigned int flags = task->getTaskFlags();
	
	if (flags & Task::THREADSAFE) {
		if (flags & Task::FRAME_SYNC)
			mSyncTasks.push(task);
		else
			mBackgroundTasks.push(task);
	}
	else
		mTaskList[mWriteList].push(task);
}

//...TaskManager::start
	while(mRunning) {
		if (!mTaskList[mReadList].empty()) {
			execute(mTaskList[mReadList].wait_pop());
		}
		else {
			synchronize();
			std::swap(mReadList, mWriteList);
		}
		
		boost::thread::yield();
	}

Now, all that's left is to write the synchronization part, which should first wait until the requisite number of synchronized background tasks have completed, and then dump the SyncTask queue into the BackgroundTasks queue (after storing how many there were):

//...TaskManager

typedef boost::mutex Mutex;
typedef boost::condition_variable Condition;
typedef Mutex::scoped_lock ScopedLock;

mutable Mutex mSyncMutex;
Condition mSyncCondition;
unsigned int mNumTasksToWaitFor;

//...

void TaskManager::synchronize() {
	ScopedLock lock(mSyncMutex);
	
	while (mNumTasksToWaitFor > 0)
		mSyncCondition.wait(lock);
		
	mNumTasksToWaitFor = mSyncTasks.size();
	
	while (!mSyncTasks.empty())
		mBackgroundTasks.push(mSyncTasks.wait_pop());
}

//...TaskManager::worker()
	while (mRunning) {
		TaskPtr task = mBackgroundTasks.wait_pop();
		execute(task);
		
		if (task->getTaskFlags() & Task::FRAME_SYNC) {
			ScopedLock lock(mSyncMutex);
			--mNumTasksToWaitFor;
			lock.unlock();
			mSyncCondition.notify_one();
		}
	}

And there you have it. The manager doesn't necessarily care to know which specific tasks to wait for, just that there are tasks to wait for, which means that the number of tasks to wait for is really the only thing you need. There are a couple more things that could be added such as priorities, task dependencies, timing and so on, but as a multithreaded task execution system, this is pretty useable already.

If it turns out that the centralized job pool is a source of a lot of contention problems, I may upgrade this design at a later stage to make use of a slightly more advanced job-stealing scheme (such as used by TBB).

Next time I'll continue with component design and configuration, 'till then!

Source for this tutorial can be found [here]

Edit: There is a slight flaw in the above design, which will affect the shutdown code - if there are no background tasks the background threads will wait forever, including during shutdown. There are two easy fixes for this, one of which is to simply interrupt the background threads in the destructor:

TaskManager::~TaskManager() {
	mThreads.interrupt_all();
	mThreads.join_all();
}

Another relatively easy fix is to guarantee that there is at least one background task, which can be implemented like this:

class BackgroundDummyTask: public Task {
public:
	BackgroundDummyTask(): Task(BACKGROUND_REPEATING) {}
	virtual ~BackgroundDummyTask() {}
	virtual void run() {}
};

//...in TaskManager::start()
	add(TaskPtr(new BackgroundDummyTask()));

The latter has the advantage that it actually allows background tasks to finish what they're doing.

6 thoughts on “Game Engine part 4

  1. I’ve played with your event/task code into my engine and I like it. I’ve made a few minor changes to remove the Boost dependency and I was thinking about using them both in my engine and uploading my version to GitHub.

    However, I’d like to know what copyright/attribution you’d like for your code. I want to give credit where credit is due and honor proper usage.

    1. Nice to know you like the concept. I really don’t recall why I had the boost dependency for these classes, plain std library should be fine.

      As for copyright/attribution, I guess that a standard MIT license should be fine. Thanks again for your interest and contributions 🙂

  2. Hey Ian and Grandmaster, you both did some fantastic work here. Helped me to understand and grasp on more of the advanced techniques.

    However, on the Eventhandlerbridge, I’m doing some tests and whenever I try to remove the channel, the actual bool EventBridge::operator == (const U& handler) const always returns false.

    Basically both mHandler and the passed handler are two different pointers and I don’t understand why. Could be an oversight of mine of sorts. The code currently is almost the same as in Tutorial 3.

    1. And it seems it was human error (or my error) at not placing a & at a specific point. Anyway, many thanks guys for the good work and Grandmaster for these wonderful tutorials.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.