Skip to content

Introduction to PDL Threading

Zakariyya Mughal edited this page Jan 5, 2015 · 2 revisions

This needs a little bit of clean-up in the examples at the beginning (before the Game of Life example) to make sure that variables are consistent across lines and with the text. Lambd 05:44, 7 December 2009 (UTC)

TOC

Introducing PDL Threads

One of the most powerful features of PDL is threading, which can produce very compact and very fast PDL code by avoiding multiple nested for loops that C and BASIC users may be familiar with. The trouble is that it takes a little bit of getting used to, since (at least for me!) it took a while to appreciate the power of threading.

Other vector based languages, such as MATLAB, use a subset of threading techniques, but PDL shines by completely generalizing them for all sorts of vector-based uses.

Thinking in Terms of Threading

If you have used PDL for a little while already, you may have been using threading without realising it.

It may be useful to start up a pdl2 shell - we'll be using this for most of our examples.

> pdl2

In these examples we'll use a simple image, i.e. a two-dimensional array of numbers:

$a = sequence(11,9);

Okay, so let's see what that looks like:

pdl> p $a 

[
 [ 0  1  2  3  4  5  6  7  8  9 10]
 [11 12 13 14 15 16 17 18 19 20 21]
 [22 23 24 25 26 27 28 29 30 31 32]
 [33 34 35 36 37 38 39 40 41 42 43]
 [44 45 46 47 48 49 50 51 52 53 54]
 [55 56 57 58 59 60 61 62 63 64 65]
 [66 67 68 69 70 71 72 73 74 75 76]
 [77 78 79 80 81 82 83 84 85 86 87]
 [88 89 90 91 92 93 94 95 96 97 98]
]

A PDL variable (sometimes known as a piddle) consists of a set of numbers (with a default precision of 'double') that can go from a single simple number up to a multi-dimensional space. Now, let's use the dims command to see what the dimensions are:

pdl> p $a->dims                                                              
11 9

...and some more detailed information can be obtained with the function info:

pdl> p $a->info
PDL: Double D [11,9]

So far, so good. PDL sees a two-dimensional array with dimensions of length 11 and 9. In keeping with Perl numbering, the first dimension is dimension 0 and the second dimension is dimension 1, and so on.

Now, if we wanted to add 3 to all elements in an array of size

nx,ny

a traditional language would use two nested for loops, one in dimension 0 and the other in dimension 1 - here's some pseudo code: for (x=0; x

$a

:

$line = sequence(11);
pdl> p $line;                                                                
[0 1 2 3 4 5 6 7 8 9 10]
pdl> $c = $a - $line;
pdl> p $c;                                                             

[
 [ 0  0  0  0  0  0  0  0  0  0  0]
 [11 11 11 11 11 11 11 11 11 11 11]
 [22 22 22 22 22 22 22 22 22 22 22]
 [33 33 33 33 33 33 33 33 33 33 33]
 [44 44 44 44 44 44 44 44 44 44 44]
 [55 55 55 55 55 55 55 55 55 55 55]
 [66 66 66 66 66 66 66 66 66 66 66]
 [77 77 77 77 77 77 77 77 77 77 77]
 [88 88 88 88 88 88 88 88 88 88 88]
]

Two things to note here - firstly, the value of

$a

is still the same - try

p $a

again and you'll see that it's unchanged. Secondly, PDL automatically subtracted

$line

off each row in

$a

. Why did it do that? Let's look at the dimensions of

$a

,

$line

and

$c

:

pdl> p $a->info                                                              
PDL: Double D [11,9]
pdl> p $line->info                                                           
PDL: Double D [11]
pdl> p $c->info                                                              
PDL: Double D [11,9]

So, both

$a

and

$line

have the same number of elements in the 0th dimension! What PDL then did was thread over the higher dimensions in

$a

and repeated the same operation 9 times to all the rows on

$a

. This is

PDL threading in action.

Do not worry - if you did want to subtract off

$line

from the first line in

$a

ONLY, you can do that by specifying the line explicitly, as so:

$a(:,0) -= $line;

Specifying chunks and bits of piddles is a separate topic, but you can

go read

PDL::NiceSlice

for more on that.

Back to the threading - the true power comes when you realize that the PDL can be any number of dimensions!

Let's make a 4 dimensional PDL:

$b = sequence(11,3,7,2);

I wouldn't try printing this out, but give it a go if you want to.

Let's subtract off

$line

again:

$c = $b - $line;

pdl> p $b->info;
PDL: Double D [11,3,7,2]
pdl> p $c->info;
PDL: Double D [11,3,7,2]

PDL now threads over the three higher dimensions

automatically, subtracting off

$line

all the way.

But, maybe you don't want to subtract off from the rows (dimension 0), but off the columns (dimension 1), how do I subtract off a column of

numbers from each column in

$a

?

$cols = sequence(9);

p $a - $cols;

pdl> p $a - $cols                                                            
PDL: PDL::Ops::minus(a,b,c): Parameter 'b'
PDL: Mismatched implicit thread dimension 0: should be 11, is 9
    
Caught at file (eval 76), line 4, pkg main

Uh oh, we have a problem. PDL didn't do anything, and that's right - we're not subtracting off a line, which is in the 0th dimension, we're subtracting off a column, which is in the 1st dimension:

pdl> p $a->info                                                              
PDL: Double D [11,9]
pdl> p $cols->info                                                           
PDL: Double D [9]

How do we tell PDL that we want to subtract off the 1st dimension?

Manipulating Dimensions

There are a whole set of PDL functions that allow you to rearrange the dimensions of PDLs, and they are mostly covered in

PDL::Slices

The three most useful ones are:

xchg
reorder
mv

In PDL, as in Perl, There Is More Than One Way To Do It - in our first

approach, we will rearrange the order of the dimensions of

$a

so that it is the 0th dimension. Note that we don't lose the original shape of

$a

at all -

$a

is in

fact still sitting there and accessible as it ever was.

$a = sequence(11,9,3);

$a_redone = $a->xchg(1,0);

p $a->info

pdl> p $a->info                                                              
PDL: Double D [11,9,3]
pdl> p $a_redone->info
PDL: Double D [9,11,3]                                                       

So,

mv()

moves dimension 1 into the slot of dimension 0, and now we can subtract off

$col

$c = $a_redone - $cols;

Presto!

To Flow Or Not to Flow

You need to be slightly careful when you start to play with PDL,

especially with the assignment operator (

.=

).

By default, PDLs are linked together so that changes on one PDL will go back and affect the original piddle as well.

Back with the earlier example:

$a = sequence(9,11,3);
$a_redone = $a->xchg(1,0);

Here,

$a_redone

is not a separate piddle, but it is a way of looking at

$a

with a different index. If you change a value in

$a_redone

, it flows back and affects

$a

! Sometimes you want this behaviour, and

sometimes you want to avoid it.

If we do:

$c = $a_redone - $cols;


$a

and

$c

are two independent piddles, which are not connected together.

However, if we do:

$a_redone .= $a_redone - $cols;

Then not only is

$a_redone

different, but

$a

is changed as well!

$a_redone = $a->copy->xchg(1,0);

...generates a copy of

$a

, and then this distinct, new copy has

$a_redone

pointed to it instead.

Life is but a Game

Okay, enough theory. Let's do something a bit more interesting and write a game of Life in PDL and see how powerful it can be!

The Game of Life is a simulation run on a big two dimensional grid. Each cell in the grid can either be alive or dead (represented by 1 or 0). The next generation of cells in the grid is calculated with simple rules according to the number of living cells in it's immediate neighbourhood - is an empty cell has exactly three neighbours, a living cell is generated. If a living cell has less than two neighbours, it dies of overfeeding, or if it has 4 or more neighbours, it dies from starvation.

Only the first generation of cells is determined by the programmer - after that, the simulation runs completely according to these rules.

To calculate the next generation, you need to look at each cell in the 2D field (requiring two loops), calculate the number of live cells adjacent to this cell (requiring another two loops) and then fill the next generation grid.

Here's a classic way of writing this in Perl (although we do use PDL for addressing individual cells):

#!/usr/bin/perl -w
use PDL;
use PDL::NiceSlice;

# make a board for the game of life
my $nx = 10;
my $ny = 11;

# current generation
my $a = zeroes($nx, $ny);

# next generation
my $n = zeroes($nx, $ny);

# put in a simple glider
$a(1:3,1).=1;$a(1,2).=1;$a(2,3).=1;

while (1) {

    $n = zeroes($nx, $ny);
    $new_a = $a->copy;
    for ($x = 0; $x 

If you run this, you will see a small glider crawl diagonally across the grid of zeroes. On my machine, it prints out a couple of generations per second.

And here's the threaded version in PDL - just three lines of PDL code, and one of those is printing out the latest generation! If you run this program, it runs at a much faster rate.

#!/usr/bin/perl -w
use strict;
use PDL;
use PDL::NiceSlice;

my $a = zeroes(40,41);

# put in a simple glider
$a(1:3,1:3) .= pdl ( [1,1,1],
                     [0,0,1],
                     [0,1,0] );

while (1) {

    # calculate the number of neighbours per cell
    my $n = ($a->range(ndcoords($a)-1,3,3)->reorder(2,3,0,1)->sumover->sumover)-$a;

    # calculate the next generation
    $a = ((($n == 2) + ($n == 3))* $a) + (($n==3) * !$a);

    print $a;
}

So, how did we get the four

for

loops into PDL threads?

A lot of PDL functions are written to help carry out PDL threading.

Reading through

PDL::Slices

we encounter the function

range

.

Let's start with an example again.

range extracts a selected chunk from a source piddle -

$a = sequence(9,11);

pdl> p $a                                                                    

[
 [ 0  1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16 17]
 [18 19 20 21 22 23 24 25 26]
 [27 28 29 30 31 32 33 34 35]
 [36 37 38 39 40 41 42 43 44]
 [45 46 47 48 49 50 51 52 53]
 [54 55 56 57 58 59 60 61 62]
 [63 64 65 66 67 68 69 70 71]
 [72 73 74 75 76 77 78 79 80]
 [81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98]
]

pdl> p $a->range(pdl([2,3]),3)                                                    

[
 [29 30 31]
 [38 39 40]
 [47 48 49]
]

pdl> p $a->range(pdl([2,3]),3)->info                                              
PDL: Double D [3,3]

...so for the inner loops, we have a way of selecting a 3 by 3 box starting from the upper left corner. What we

really want for a given

[x,y]

position in

$a

is the range of

[x-1:x+1,y-1:y+1]

. But, since our starting position is itself a piddle, we can

simply subtract 1 from that piddle to start our 3x3 box with an upper left corner to the upper left of our target cell!

pdl> p $a->range(pdl([2,3])-1,3)                                             

[
 [19 20 21]
 [28 29 30]
 [37 38 39]
]

pdl> p $a->range(pdl([2,3])-1,3)->info                                       
PDL: Double D [3,3]

In a life cell, we want the sum of all these cells to add up all the neighbours - the function we will use is sumover, which adds up all the values along the 0th dimension of a piddle:

pdl> p $a->range(pdl([2,3])-1,3)->sumover                                    
[60 87 114]
pdl> p $a->range(pdl([2,3])-1,3)->sumover->sumover                           
261

BUT we've added the contents of the central cell itself, so we need to

subtract off the original value of

$a

:

pdl> p (($a->range(pdl([2,3])-1,3)->sumover->sumover)-$a(2,3))               

[
 [232]
]

So we have an expression that can calculate the number of neighbours of a given cell - we've taken care of two for loops in the original program.

Now, what we want to do is not just calculate a single cell's neighbour, but all cell neighbours. How to do this?

Looking at the

PDL::Slice

documentation, we find the function

ndcoords()

If you give it an input piddle with P dimensions, it will produce a piddle with one higher dimension, with the 0th dimension containing P elements representing the coordinates of each element in the input piddle. Once again, an example makes it clearer for me:

pdl> $c = zeroes(3,4)                                                        

pdl> p ndcoords($c)->info                                                    
PDL: Double D [2,3,4]
pdl> p ndcoords($c)                                                          

[
 [
  [0 0]
  [1 0]
  [2 0]
 ]
 [
  [0 1]
  [1 1]
  [2 1]
 ]
 [
  [0 2]
  [1 2]
  [2 2]
 ]
 [
  [0 3]
  [1 3]
  [2 3]
 ]
]

...so if we do

ndcoords($a)

then we will get a huge list of all the

x,y

positions in

$a

- perfect for input into

range()

. Let's look at what kind of piddle is generated using

ndcoords

and

range

:

pdl> $a = sequence(9,11)                                                        
pdl> $b = $a->range(ndcoords($a)-1,3,3)                                      

pdl> p $b->info                                                              
PDL: Double D [9,11,3,3]

Wow, this is a four dimensional piddle! And it is no coincidence that we have four for loops in the original Life algorithm.

This monster piddle contains:

An array of each cell in

$a

, and each of these cells contains the 3x3

set of cells in it. If we wanted to look at the neighbours of cell 2,3 we would write:

p $b(2,3,:,:)

or more simply:

p $b(2,2,,)

But this prints out a long straggly list of nested PDL dimensions - we

can use the

squeeze

function (

PDL::Core

) to get rid of all the dims with

a size of 1, and then this looks cleaner:

pdl> p $b(2,3,,)->info                                                       
PDL: Double D [1,1,3,3]
pdl> p $b(2,3,,)->squeeze                                                    

[
 [19 20 21]
 [28 29 30]
 [37 38 39]
]

pdl> p $b(2,3,,;-)                                                           

[
 [19 20 21]
 [28 29 30]
 [37 38 39]
]

pdl> p $b(2,3,,;-)->info                                                     
PDL: Double D [3,3]

You may have noticed that the

range()

function above had an extra

variable added to it - an extra '3' as the last input. This tells

range()

how to deal with the boundary conditions of range, i.e. what to

do when the 3x3 box falls off the edge of the array. In this case, '3' tells range to wrap to the other side of the array:

p $b(0,1,,;-)

[
 [ 8  0  1]
 [17  9 10]
 [26 18 19]
]

...so the Life grid wraps at each edge, making for a finite but unbounded universe.

The final command is the

reorder()

function - if we look at the output again of

$b

:

$b = $a->range(ndcoords($a)-1,3,3)                                      

pdl> p $b->info PDL: Double D [9,11,3,3]

To calculate the number of neighbours, we want to add up the number of cells over the last two dimensions - dimensions 2 and 3. But, the function sumover() only sums over dimension 0! So, what do we do? We rearrange the dimension order!

p $b->reorder(2,3,0,1)->info
PDL: Double D [3,3,9,11]

And now we can do two

sumover()

to get the total number of neighbours.

p ($b->range(ndcoords($a)-1,3,3)->reorder(2,3,0,1)->sumover->sumover)

pdl> p $b->reorder(2,3,0,1)->sumover->sumover                                

[
 [324 306 315 324 333 342 351 360 342]
 [108  90  99 108 117 126 135 144 126]
 [189 171 180 189 198 207 216 225 207]
 [270 252 261 270 279 288 297 306 288]
 [351 333 342 351 360 369 378 387 369]
 [432 414 423 432 441 450 459 468 450]
 [513 495 504 513 522 531 540 549 531]
 [594 576 585 594 603 612 621 630 612]
 [675 657 666 675 684 693 702 711 693]
 [756 738 747 756 765 774 783 792 774]
 [540 522 531 540 549 558 567 576 558]
]

pdl> p $b->reorder(2,3,0,1)->sumover->sumover->info()                        
PDL: Double D [9,11]

...and now, as a bonus, this expression is the same dimensions as

$a

, so we can simply subtract off

$a

to calculate the neighbours!

The next line simply the neighbour rules for the next generation:

$a = ((($n == 2) + ($n == 3))* $a) + (($n==3) * !$a);

...and we're done.

Conclusions

Here we've gone through PDL threading, and an example which (hopefully) demonstrates why it's worth getting to grips with it in PDL.

  • Threading helps eliminate slow

    for
    

loops from your code

  • It can make your code shorter and algorthims more concise
  • Feel free to move the the dimensions you want to operate on to the start of your PDL dimension list, and let PDL thread over the higher dimensions!
Clone this wiki locally