How Futures Aid Task Based Multithreading

Task based multithreading makes a lot of sense and is a great way to break work into manageable parcels that can be processed in parallel. A task can be conceptualised as a finite piece of work that produces a result when complete. Using a thread to carry out a task is logical, but the problem with using one is that they have no in built concept of returning a result or communicating if they have finished the work or not. Working with this limitation can lead to all sorts of problems and backasswards idioms being applied (like polling the thread for the result). Futures are the correct solution to this problem and as of boost 1.41.0 there is a solid C++ implementation freely available.

Futures were designed specifically to solve the problem described above and fundamentally they allow you to get the return value from a thread when it's ready. The only caveat here is that whoever is retrieving the result must be prepared to block while waiting for it. So to solve the question of: “is the result ready yet?” A future answers: “You'll just have to wait until it is!”. In a lot of situations this is perfectly fine but if you can't afford to wait then you should consider using another technique.

A Concrete Example

The example I have chosen to illustrate the use of futures is a parallel hashing function. If you don't know what hashing is then you can find a concise summary here. Basically hashing a large number of values can take a long time and is a good candidate for being processed in parallel. Incidentally the example provided, using 1 additional thread, close to halves the execution time when processing the hashes when compared to a single threaded version.

You can get the example code for this tutorial here and it should be referred to when reading the rest of this tutorial. As per usual the example code uses some of the boost libraries so you will need to have them installed if you want to compile and run it.

Running the Example Code

Running the example code will produce something close to the following output:

Generating random numbers to hash...

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

Finished generating numbers, hashing...

Hashing Took: 00:00:00.676386

So very simply; numbers to hash are generated, the numbers are then hashed and a summary of the time it took is displayed.

To complement this here is the guts of the code for the main function:


/** typdefs to improve readability */
typedef boost::uniform_real<> UniformDist;
typedef boost::mt19937 BaseRandomNumberGenerator;
typedef boost::variate_generator<BaseRandomNumberGenerator&,UniformDist>
    NumberGenerator;

typedef boost::posix_time::microsec_clock Clock;

const unsigned int NUM_VALUES = 1E07;

/** set up the random number generator */
UniformDist distribution(0, NUM_VALUES);
BaseRandomNumberGenerator generator;
NumberGenerator numberGenerator(generator, distribution);
generator.seed(std::time(0)); // seed with seconds since Jan 1 1970

std::cout << std::endl << "Generating random numbers to hash..."
          << std::endl;

boost::shared_array<float> valuesToHash(new float[NUM_VALUES]);
boost::progress_display showProgress(NUM_VALUES);
for (int i=0; i<NUM_VALUES; ++i)
{
  valuesToHash[i] = numberGenerator();
  ++showProgress;
}

std::cout << std::endl << "Finished generating numbers, hashing..."
          << std::endl << std::endl;

boost::posix_time::ptime start = Clock::local_time();
boost::shared_array<size_t> valueHashes =
    hash::getHashValuesThreaded(valuesToHash.get(), NUM_VALUES);

std::cout << "Hashing Took: " << Clock::local_time() - start 
          << std::endl;
So values are generated using Boost.Random and then the hashes are generated using
hash::getHashValuesThreaded
.

Generating the Hashes

This is where the use of futures is so pay attention!


boost::shared_array<size_t> getHashValuesThreaded(float* values,
                                                  const int numValues)
{
  /** typedefs and constants to aid comprehensibility */
  typedef boost::packaged_task<boost::shared_array<size_t> > HashingTask;
  typedef boost::unique_future<boost::shared_array<size_t> > HashingFuture;

  const size_t middle = numValues / 2;
  const size_t halfValuesRoundedDown = numValues / 2;
  const size_t halfValuesRoundedUp = numValues - halfValuesRoundedDown;

  /** create a task and bind the hasher with the bottom
    * half of the values. */
  HashingTask hashTask( boost::bind(&hash::getHashValues,
                                    &values[0],
                                    halfValuesRoundedDown) );
  /** get the future from the task */
  HashingFuture hashFuture = hashTask.get_future();

  /** kick off a thread to hash the bottom half */
  boost::thread hashThread( boost::move(hashTask) );

  /** the thread is hashing the bottom half so now hash
    * the top half at the same time */
  boost::shared_array<size_t> topHalfResults =
      getHashValues(&values[middle], halfValuesRoundedUp);

  /** get the results from the future, blocks until they are ready. */
  boost::shared_array<size_t> bottomHalfResults = hashFuture.get();

  /** copy the two halves of the hashed values into the
      final array to be returned. */
  boost::shared_array<size_t> fullResults( new size_t[numValues] );
  std::copy(&bottomHalfResults[0], 
            &bottomHalfResults[halfValuesRoundedDown],
            &fullResults[0]);
  
  std::copy(&topHalfResults[0], 
            &topHalfResults[halfValuesRoundedUp],
            &fullResults[numValues/2]);

  return fullResults;

}

The threaded hashing function operates very simply. It splits the values to hash in two and hashes one in a thread and does the other itself. Once both are complete it copies the hashes into a unified array and returns it.

The concepts to be aware of here are that of a task and a future. A task is a thread that is set up to run and return a value through a future when it completes. A future wraps the return value of a task and is used to get that value, incurring a wait to a client if it is not yet ready.

The concept of a future greatly simplifies the task of parallelising this type of operation and the code ends up being easy to follow and understand.

Beyond the Example

The example above shows only the simplest application of futures. The boost implementation of futures contains more features to aid flexibility and I would recommend you check out the documentation on boost.org if you want more detail.

4 Responses to “How Futures Aid Task Based Multithreading”


  1. Grand Central Dispatch?

    http://en.wikipedia.org/wiki/Grand_Central_Dispatch
    http://developer.apple.com/library/mac/documentation/Performance/Reference/GCD_libdispatch_Ref/Reference/reference.html

    By the way, I haven’t read the entirety of your article yet, just the first few lines, but this is the first thing it made me think of.

    Phil
    December 17th, 2010
    • That looks to be a pretty cool library if your working with Apple tech. You’re right as well; GCD does firmly stand in the task based multithreading domain, although a lot higher level than what I go over here.

      radman
      December 17th, 2010
  2. Starting threads like that during processing is certainly not something I would advise. What about using a thread pool to do tasks ? GCD might be interesting, it’s not the only solution. Look at TBB, e.g.

    Barbie
    January 12th, 2011
    • Starting threads during processing is much less expensive than you might expect (in this example it adds negligible overhead when compared to the sorting task), but I do agree that a thread pool is better for situations where sorting happens multiple times. In the context of explaining futures I tried to keep the example as simple as possible so as not to confuse matters.

      Thumbs up for the TBB reference too, definitely applicable in the example i’m using.

      radman
      January 13th, 2011

Leave a Comment