Factorie
The `factorie` package, written by Andrew McCallum, is a great tool for research on factor graphs. However, like most packages coming from the research community, the code is not well documented. This post logs my exploration of the package.
Under the hood, though, things are not that simple. The `Gaussian` class is a `DirectedFamily3` subclass. Each Gaussian variable depends on a mean variable and a variance variable. This makes things pretty nifty. See the example in `GaussianDemo.scala`:
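```scala
// Imports follow the Factorie 1.x package layout; other releases may differ.
import cc.factorie.directed._
import cc.factorie.variable.DoubleVariable

// The model and the random source are picked up implicitly by `:~`.
implicit val model = DirectedModel()
implicit val random = new scala.util.Random(0)
val mean = new DoubleVariable(10)
val variance = new DoubleVariable(1.0)
// Draw 1000 values, each a child of `mean` and `variance`.
val data = for (i <- 1 to 1000) yield new DoubleVariable :~ Gaussian(mean, variance)
```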
These few lines automatically generate 1000 Gaussian variables, together with the factors that tie each of them to `mean` and `variance`. This means you can use the generated data directly, for example:

```scala
MaximizeGaussianMean(mean, model)
```
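To get a feel for what such a maximization step should recover, here is a minimal sketch in plain Scala that does not use factorie at all. It assumes that maximizing the mean with the variance held fixed amounts to taking the sample mean of the data (the usual maximum-likelihood estimate); whether `MaximizeGaussianMean` does exactly this internally is an assumption, and `GaussianSketch` is a made-up name.

```scala
// Plain-Scala illustration; nothing here is Factorie API.
object GaussianSketch {
  private val random = new scala.util.Random(0)

  val trueMean = 10.0
  val variance = 1.0

  // 1000 draws from a Gaussian with the given mean and variance.
  val data: Vector[Double] =
    Vector.fill(1000)(trueMean + math.sqrt(variance) * random.nextGaussian())

  // Maximum-likelihood estimate of the mean when the variance is fixed.
  val estimate: Double = data.sum / data.length
}
```

With 1000 samples and unit variance, `GaussianSketch.estimate` should land close to `trueMean`.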
Factorie implements factors in a very flexible way, taking advantage of Scala's traits. A factor should have three parts: the neighboring variables, the `score`, and the `statistics` for the neighboring variables.
Take the factor `Factor3` as an example. A factor of type `Factor3` has three neighbors and can return an unnormalized score representing the compatibility of the current world. So the first question is how to compute the score from the three neighbors. Of course, you can write the computation directly. However, there are some common cases that can be abstracted.
- If each neighbor can return a `Tensor`, then you can use `TensorFactor3`. Two methods need to be implemented: `statistics` returns a `Tensor` based on the three tensors returned by the three neighbors, and `statisticsScore` computes the final `score` based on the `statistics`.
  - Based on `TensorFactor3`, if the `statistics` can be computed as the outer product of the three tensors, then you can use `TensorFactorWithStatistics3`. Only `statisticsScore` needs to be implemented.
  - Based on `TensorFactor3` again, if the `statisticsScore` can be computed as the dot product between `statistics` and a given `weights`, then you can use `DotFactor3`. In this case, you need to define `weights` and `statistics` for the factor.
    - Combining `TensorFactorWithStatistics3` and `DotFactor3`, you get `DotFactorWithStatistics3`.
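The split between `statistics` and `statisticsScore` can be sketched in plain Scala, without any factorie types. Everything below (`FactorSketch`, `ToyDotFactor3`, `outer3`, `dot`, and the use of `Vector[Double]` in place of `Tensor`) is hypothetical and only mirrors the idea: statistics as an outer product of the neighbors, score as a dot product with the weights.

```scala
// Plain-Scala illustration; none of these names are Factorie API.
object FactorSketch {
  type Vec = Vector[Double]

  // Flattened outer product: entry (i, j, k) = a(i) * b(j) * c(k), with k varying fastest.
  def outer3(a: Vec, b: Vec, c: Vec): Vec =
    for (x <- a; y <- b; z <- c) yield x * y * z

  def dot(u: Vec, v: Vec): Double = {
    require(u.length == v.length, "dimension mismatch")
    u.zip(v).map { case (x, y) => x * y }.sum
  }

  // Toy three-neighbor factor: only the weights have to be supplied.
  final case class ToyDotFactor3(n1: Vec, n2: Vec, n3: Vec, weights: Vec) {
    def statistics: Vec = outer3(n1, n2, n3)
    def statisticsScore(stats: Vec): Double = dot(weights, stats)
    def score: Double = statisticsScore(statistics)
  }

  // One-hot neighbors (indices 1, 0, 1) pick out flat index 1*4 + 0*2 + 1 = 5.
  val example = ToyDotFactor3(
    n1 = Vector(0.0, 1.0),
    n2 = Vector(1.0, 0.0),
    n3 = Vector(0.0, 1.0),
    weights = Vector(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8)
  )
}
```

With one-hot neighbors the statistics vector has a single 1.0 entry, so the score is just the weight at that position.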
To facilitate the generation of factors, factorie defines `Template` as a factor template. Corresponding to `Factor3`, we have `Family3` to generate `Factor3`s. One difference here is that we now also have a `TupleFamily3` trait.
- `TensorFamily3` --> `TensorFactor3`
  - `TensorFamilyWithStatistics3` --> `TensorFactorWithStatistics3`
  - `DotFamily3` --> `DotFactor3`
    - `DotFamilyWithStatistics3` --> `DotFactorWithStatistics3`
- `TupleFamily3`: here the `statistics` is defined as a tuple whose component types are the three neighbors' `Value`s.
  - `TupleFamilyWithStatistics3` goes one step further: the value of the `statistics` is the tuple of the three neighbors' values.
There are some requirements for defining your desired factor graph. Suppose the nodes in your factor graph are of class `Node`. During inference, some fields of a `Node` instance will change. These fields should be defined as subclasses of `*Variable`, such as `BooleanVariable`. All these classes inherit from the trait `MutableVar`, which defines a method `set(newValue: Value)(implicit d: DiffList)`. The sampling process relies on the `DiffList` to score proposals and do inference. So what is a `DiffList`?
A `DiffList` is a list of `Diff`s, each of which stores the `variable` it refers to and its `redo`/`undo` functions. The `DiffList` tracks all the changes to variables in a proposal, so we can compute the model score before and after the proposed changes. The `DiffList` class provides a `scoreAndUndo(model: Model)` method, which computes the model score with and without the proposal and returns the score difference. Wait, why is the `DiffList` responsible for computing the model score, and how does it compute it? The `Model` class, which as you may guess is a collection of factors, knows which factors touch (contain) a given variable, and each `Diff` contains a pointer to its variable, so we can ask the `Model` to compute the score over all the variables involved in a `DiffList`.
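The mechanism described above can be sketched in plain Scala. This is not factorie's implementation: `ToyVariable`, `ToyDiff`, `ToyDiffList`, and `toyScore` are hypothetical names, and the model is reduced to a function from the touched variables to a score. The order of operations inside `scoreAndUndo` (score, undo, score again) is also an assumption about how such a method could work.

```scala
import scala.collection.mutable.ArrayBuffer

// Plain-Scala illustration; none of these names are Factorie API.
object DiffSketch {
  // One recorded change: the variable plus closures to revert or reapply it.
  final case class ToyDiff(variable: ToyVariable, undo: () => Unit, redo: () => Unit)

  final class ToyDiffList {
    private val diffs = ArrayBuffer.empty[ToyDiff]
    def +=(d: ToyDiff): Unit = { diffs += d; () }
    def variables: Seq[ToyVariable] = diffs.map(_.variable).distinct.toSeq
    def undo(): Unit = diffs.reverseIterator.foreach(_.undo())
    def redo(): Unit = diffs.foreach(_.redo())

    // Score with the proposal minus score without it; the variables end up undone.
    def scoreAndUndo(score: Seq[ToyVariable] => Double): Double = {
      val after = score(variables)
      undo()
      val before = score(variables)
      after - before
    }
  }

  // A mutable variable whose setter logs every change to the implicit diff list.
  final class ToyVariable(private var current: Double) {
    def value: Double = current
    def set(newValue: Double)(implicit d: ToyDiffList): Unit = {
      val old = current
      current = newValue
      d += ToyDiff(this, () => current = old, () => current = newValue)
    }
  }

  // Toy "model": each variable prefers to be close to 3.0.
  def toyScore(vars: Seq[ToyVariable]): Double =
    vars.map(v => -math.pow(v.value - 3.0, 2)).sum

  // Propose x: 0.0 -> 2.0, then return (score difference, value after undo).
  def demo(): (Double, Double) = {
    val x = new ToyVariable(0.0)
    implicit val d: ToyDiffList = new ToyDiffList
    x.set(2.0)
    val delta = d.scoreAndUndo(toyScore)
    (delta, x.value)
  }
}
```

In `demo`, the proposal raises the toy score from -9.0 to -1.0, so the returned difference is positive, and the variable is back at its old value afterwards. A sampler could then decide whether to call `redo()` to accept the proposal.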
Notice this raises a problem when creating factors using a factor family. The `Family` class generates one or more `Factor`s using the `unroll` method, based on the type of the involved variable. Each neighbor has a corresponding `unroll` method, such as `unroll1`, `unroll2`, and `unroll3` in a `Family3` class. If any neighbor variable changes in a proposal, its `unroll` method generates a factor. Some proposals change multiple neighbor variables at the same time and thus generate the same factor multiple times. This is solved by defining the touching factors in a `Model` as a `LinkedHashSet` and implementing the `equals` method for `Factor`. In this way, factorie ensures there are no duplicate factors for a proposal.
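The deduplication idea can be illustrated with a small plain-Scala sketch. The names `Var`, `ToyFactor3`, `ToyFamily3`, and `touchingFactors` are hypothetical, and the equality rule used here (same family name, same neighbor objects) is an assumption about what a sensible `equals` could look like, not a copy of factorie's.

```scala
import scala.collection.mutable

// Plain-Scala illustration; none of these names are Factorie API.
object UnrollSketch {
  // Variables are compared by identity.
  final class Var(val name: String)

  // Factors are equal when they share a family and the same neighbor objects.
  final class ToyFactor3(val family: String, val n1: Var, val n2: Var, val n3: Var) {
    override def equals(other: Any): Boolean = other match {
      case f: ToyFactor3 =>
        family == f.family && (n1 eq f.n1) && (n2 eq f.n2) && (n3 eq f.n3)
      case _ => false
    }
    override def hashCode(): Int =
      (family,
       System.identityHashCode(n1),
       System.identityHashCode(n2),
       System.identityHashCode(n3)).hashCode
  }

  // Stand-in for a family over one fixed triple: any changed neighbor
  // regenerates a fresh factor object over the same three variables.
  final class ToyFamily3(name: String, a: Var, b: Var, c: Var) {
    def unroll(changed: Var): Seq[ToyFactor3] =
      if ((changed eq a) || (changed eq b) || (changed eq c))
        Seq(new ToyFactor3(name, a, b, c))
      else Seq.empty
  }

  // Collect factors for all changed variables; the set drops duplicates
  // while keeping insertion order.
  def touchingFactors(family: ToyFamily3, changed: Seq[Var]): mutable.LinkedHashSet[ToyFactor3] = {
    val result = mutable.LinkedHashSet.empty[ToyFactor3]
    changed.foreach(v => result ++= family.unroll(v))
    result
  }

  // Two neighbors change in one proposal, yet only one factor survives.
  def demo(): Int = {
    val (a, b, c) = (new Var("a"), new Var("b"), new Var("c"))
    val family = new ToyFamily3("toy", a, b, c)
    touchingFactors(family, Seq(a, b)).size
  }
}
```

Without the `equals`/`hashCode` overrides, the two factor objects created for `a` and `b` would be treated as distinct and the set would contain both.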