-
-
Notifications
You must be signed in to change notification settings - Fork 44
Garbage Collector Algorithm
The Felix GC uses a naive mark sweep algorithm. Copying collectors may be more efficient but require objects to be movable and so are not suitable for use in a system designed to interface with C/C++.
The collector operates in 4 phases.
Initially, all objects are assumed unreachable. The collector searches the machine stacks of all threads to find potential pointers into managed objects and adds the bases of these objects to the set of register roots to find the initial root set.
Starting with the root set, the collector marks objects it knows about reachable. It then finds the pointers in each of these objects, marks what the point at as reachable if the pointers are into managed objects, and then recursively scans those objects.
The destruction phase runs through the complete set of managed objects, checking to see if each one is reachable. If not, the objects finaliser is executed, and the object is removed from the managed set and placed into a set of objects to be deleted.
During this phase, all the objects scheduled for deletion in the destruction phase are deallocated.
A finaliser is a C function which is required to release any unmanaged resources associated with an object, it has the following interface:
typedef void finaliser_t (collector_t*, void*);
The finaliser of an object defaults to its C++ destructor. When binding a primitive in Felix, a finaliser can be provided if the destructor is not what is desired. Compiler generated objects always use the C++ destructor.
Finalisers for C objects allow emulation of C++ destruction.
There is another use for finalisers, however. It is possible for a finaliser to force the finalisation of another object before itself, thereby imposing a partial order on otherwise random finalisation. For example if a file object contains a pointer to a lock object, and the lock must be released before the file is closed, the file's finaliser can destroy the lock before closing the file. To ensure the lock is not destroyed twice, it can be marked as released.
A scanner is a C function that is responsible for finding all the pointers in an object. It is a C function with the following interface:
typedef void *scanner_t(collector_t*, gc_shape_t *, void *, size_t, int);
The arguments, in order are: a pointer to the collector, a pointer to the registered shape of the object, a pointer to some arbitrary scanner specific data, the number of elements if the object is a varray, and an integer which restricts the depth of any recursions that the scanner can make, to protect the machine stack from overflow.
When the programmer defines a primitive type by binding it to a C/C++ they may provide a scanner for that type. If no scanner is provided, the object should not contain managed pointers. More precisely, these pointers are weak, and do not contribute to the determination as to whether what they point at is reachable.
The scan object method is a generic method used to invoke the scanner of an object. It checks if its argument is a pointer interior to a managed object. If not, it does nothing.
If it is, the pointer is adjusted to the base of the managed object, the shape of the object is found, and the scanner registered in the shape is called.
void flx_collector_t::scan_object(void *p, int reclimit)
{
again:
memdata_t memdata = check_interior(p);
if(memdata.head == NULL) return;
size_t dyncount = get_used((void*)memdata.head);
if(memdata.pshape->scanner) {
void *r = memdata.pshape->scanner(this, memdata.pshape,memdata.head,dyncount,reclimit);
if (r) { p = r; goto again; }
}
}
The scanner is required to return NULL unless the pointer is managed, and it is the sole pointer in the object. In this case the scan_object method does a self-tail recursion optimisation by jumping back to the start of its body.
The garbage collector provides a method register_pointer()
.
When a scanner finds a pointer, it calls it, passing the pointer
and recursion limit, possibly decremented by one.
The method is shown here, it is the core of the algorithm:
void flx_collector_t::register_pointer(void *q, int reclimit)
{
if (inrange(q)) {
if(reclimit==0)
Judy1Set(&j_tmp,(Word_t)q,&je);
else
scan_object(q, reclimit-1);
}
}
In detail: the method first checks if the machine word is between the bounds of memory allocated so far. The allocation method keeps track of the lowest allocated address, and the highest allocated address plus the size of that object. Any address value outside this range cannot be a pointer to a managed object. This is an optimisation.
The next step is fundamental. We check the recursion limit. If we have not hit the limit on recursions, we call the scan_object method with the recursion limit decremented. This will eventually call register_pointer again. This is the recursive descent examining a subtree of the data structure for pointers.
If we have hit the recursion bound, we add the pointer to the j_tmp
buffer. This buffer is a Judy1Array, which is an ultra-high performance
implementation of a set of machine words. The implementation uses a cache
tuned digital trie. Because it models a set, adding a value already in
the set does not cause any change. This is an additional optimisation.
In the case of a multi-threaded scanning operation, a condition variable is used to serialise access to the buffer (not shown).
When the Felix compiler generates an object of a product
type, it constructs a shape object containing pointer
to the standard scanner scan_by_offsets
. This scanner
uses an array of offsets into the object which the
compiler puts into the shape table to find the
pointers in the object. For nested products, the offset
array is linearised by the compiler.
NOTE: there is a design fault here. A product component cannot use a custom scanner. A pointer to an object with custom scanner is fine!
Below is the single threaded mark algorithm. The multi-thread algorithm is effectively identical.
void flx_collector_t::mark_single(pthread::memory_ranges_t *px, int reclimit)
{
if(px)
{
// for all pthreads
std::vector<pthread::memory_range_t>::iterator end = (*px).end();
for(
std::vector<pthread::memory_range_t>::iterator i = (*px).begin();
i != end;
++i
)
{
// get the stack extent for one pthread
pthread::memory_range_t range = *i;
void *end = range.e;
for ( void *i = range.b; i != end; i = (void*)((void**)i+1))
scan_object(*(void**)i, reclimit);
}
}
// Now scan all the registered roots
rootmap_t::iterator const end = roots.end();
for(
rootmap_t::iterator i = roots.begin();
i != end;
++i
)
{
scan_object((*i).first, reclimit);
}
Word_t toscan = 0;
int res = Judy1First(j_tmp,&toscan,&je); // get one object scheduled for scanning
while(res) {
Judy1Unset(&j_tmp,toscan,&je); // remove it immediately
scan_object((void*)toscan, reclimit); // scan it, it will either be marked or discarded
toscan = 0;
res = Judy1First(j_tmp,&toscan,&je);
}
assert(j_tmp == 0);
}
If the FLX_GCTHREADS
environment variable was set to an integer greater
than 1 when the system started up, then the specified number of threads
is used to perform the mark phase. All the threads use the same data structures
start from the same initial conditions, and terminate from the same final
conditions.
Locking is used ONLY on the buffer object. Surprisingly this is not only sufficient, it is maximally optimal. Given the same pointer, two threads are going to perform the same recursive descent. It would seem to lead to a lot of duplication of effort. This is not the case! Instead, because access to the buffer is serialised, once a thread has safely stashed a pointer in the buffer the recursion path through that pointer is blocked. The blockage is required even for a single thread, to prevent an infinite loop when pointers form a cycle. With the serialisation, this path cutting also prevents a thread following the same path as another with some lag.
There is actually a race going on here, but the race has a definite winner, and it is in fact this very competition which makes the algorithm so efficient. Once one thread gets ahead of another, the second thread must follow a different path. It is possible due to the presence of cycles that the paths will converge again. As before the contention is automatically resolved.