Home
The Component Rating System lets you evaluate individual components of a software system based on performance evaluations of component combinations. It provides a default implementation based on Microsoft's TrueSkill(tm) player rating system, placed behind a generic interface that alternative implementations can also use.
The Component Rating System (CRS) is written in Scala, which compiles to JVM bytecode and interoperates with Java. The CRS can therefore be used in any Scala or Java application. One way to set up a Scala development environment is the Scala IDE for Eclipse. Scala 2.9.2 is required for compilation.
To use the Component Rating System, you need to instantiate an implementation of it. The following example instantiates the TrueSkill-based implementation:
val crs: ComponentRatingSystem = alesia.componentrating.TrueSkillRatingSystem()
This instance manages the component ratings and receives updates in the form of component combination comparisons. Initially, nothing is known about your components. This changes as more comparison results are submitted.
A comparison result is an ordering of components that reflects their relative performance in the most recent comparison. A simple example illustrates this:
Let's say we have a set of table tennis players and we want to find the best player, or more precisely, a ranking of the players according to their performance in doubles matches (2 vs. 2). To begin, we randomly form two teams of two players: Alan and Bob play against Caesar and Dionysos. Each player can be viewed as a component, and a team's performance as the result of those components working together. Likewise, a game is a comparison of components. The first game is played, and now we want to find out how our ranking of players/components has changed. First, we have to submit the game result to the rating system:
val comparisonResult = List( Set("Caesar","Dionysos"), Set("Alan","Bob") )
crs.submitResults(comparisonResult)
Components are identified by a string. If a new component is submitted to the rating system, it is assigned the default rating values. In this example, the ancient players were victorious, which is represented by their team being at the head of the list. The ordering of teams in a submitted result is determined by their relative performance in the game, best first.
Each submission changes the knowledge base and updates what the rating system believes to be the real strength of each component. After several updates, provided the submitted comparison results are not too noisy and each component appears in at least some updates, the rating system arrives at a ranking of the components that approximates their "real" strength (with regard to the submitted comparison results).
To retrieve the ranking of the components, the knowledge base can be accessed in two ways. Firstly, the components can be compared with regard to their current strength using the comparator interface:
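In a software setting, the ordering usually has to be derived from some measurement. The following is a minimal sketch, assuming that each component combination has a measured runtime in milliseconds and that a lower runtime is better. The object `ComparisonResultSketch` and the method `fromRuntimes` are hypothetical helpers and not part of the CRS.

```scala
object ComparisonResultSketch {
  // Hypothetical helper, not part of the CRS: orders teams from best to worst,
  // assuming that a lower measured runtime (in milliseconds) is better.
  def fromRuntimes(runtimes: Map[Set[String], Long]): List[Set[String]] =
    runtimes.toList.sortBy(_._2).map(_._1)
}

// Usage: the resulting list has the shape expected by crs.submitResults(...)
val ordered = ComparisonResultSketch.fromRuntimes(Map(
  Set("Alan", "Bob") -> 1240L,
  Set("Caesar", "Dionysos") -> 980L
))
```

The fastest combination ends up at the head of the list, matching the convention described above.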
val comparisonOfTwoComponents = crs.compare("Alan", "Caesar") // will yield some double < 0 here
Secondly, the strength and uncertainty of each component in the knowledge base can be accessed directly:
val strength = crs.getPoints("Caesar") // will yield some double around 30.0 here
val uncertainty = crs.getUncertainty("Caesar") // will yield some double around 9.0 here
The meaning of these values is straightforward: the component with more points is more likely to win a comparison, and a higher uncertainty means that the rating is less reliable, for example because recently submitted comparison results were contradictory.
To overwrite the rating of a component, or to add a component with a specific rating instead of the default one, use injectRating:
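To make "more likely to win" concrete, the sketch below estimates the probability that one component outperforms another under the Gaussian model that TrueSkill is based on, ignoring draws. It is not a CRS method: `WinProbabilitySketch`, `Rating` and `winProbability` are hypothetical names, the value of beta must be supplied by the caller, and the normal CDF uses the Abramowitz-Stegun approximation of the error function.

```scala
object WinProbabilitySketch {
  // Hypothetical stand-in for the values returned by getPoints / getUncertainty.
  case class Rating(points: Double, uncertainty: Double)

  // Abramowitz-Stegun approximation 7.1.26 of erf (absolute error about 1.5e-7).
  private def erf(x: Double): Double = {
    val sign = if (x < 0) -1.0 else 1.0
    val ax = math.abs(x)
    val t = 1.0 / (1.0 + 0.3275911 * ax)
    val poly = t * (0.254829592 + t * (-0.284496736 +
      t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))))
    sign * (1.0 - poly * math.exp(-ax * ax))
  }

  // Cumulative distribution function of the standard normal distribution.
  def normalCdf(x: Double): Double = 0.5 * (1.0 + erf(x / math.sqrt(2.0)))

  // Probability that a outperforms b: the performance difference is Gaussian with
  // mean (a.points - b.points) and variance 2*beta^2 + sigma_a^2 + sigma_b^2.
  def winProbability(a: Rating, b: Rating, beta: Double): Double = {
    val spread = math.sqrt(2.0 * beta * beta +
      a.uncertainty * a.uncertainty + b.uncertainty * b.uncertainty)
    normalCdf((a.points - b.points) / spread)
  }
}
```

Two components with equal points get a probability of about 0.5 regardless of their uncertainties. For a fixed difference in points, larger uncertainties pull the probability toward 0.5.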
val componentName = "MyComponent"
val componentPoints = 40.0 // mean
val componentUncertainty = 8.0 // sigma
crs.injectRating(componentName, componentPoints, componentUncertainty)
A component may itself be updated, so that its strength has presumably changed considerably, while the old rating should still be taken into account. In that case, use componentUpdated:
crs.componentUpdated("ComponentName")
The uncertainty of the component is increased in the knowledge base to account for the update. At the time of writing, it is multiplied by 1.5.
This implementation of the Component Rating System uses TrueSkill to determine how the result of a component comparison influences the points and uncertainty of the involved components. The result is the TrueSkill Rating System.
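The effect on a single rating can be sketched as follows. This is only an illustration of the rule described above, not the CRS code: `ComponentUpdatedSketch`, `Rating` and `afterUpdate` are hypothetical names, and the factor 1.5 is taken from the text and may differ in the current implementation.

```scala
object ComponentUpdatedSketch {
  // Hypothetical stand-in for a knowledge base entry.
  case class Rating(points: Double, uncertainty: Double)

  // Factor quoted in the text; assumed here, may differ in the actual code.
  val UncertaintyFactor = 1.5

  // The points are kept, only the uncertainty is inflated.
  def afterUpdate(rating: Rating): Rating =
    rating.copy(uncertainty = rating.uncertainty * UncertaintyFactor)
}

// A rating of (30.0, 6.0) becomes (30.0, 9.0).
val inflated = ComponentUpdatedSketch.afterUpdate(ComponentUpdatedSketch.Rating(30.0, 6.0))
```

Because the points are kept, the old rating still serves as the starting point, while the higher uncertainty lets subsequent comparison results move it more quickly.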
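For intuition, here is a minimal sketch of the textbook TrueSkill update for the simplest case: one component against one component, with a decisive result and no draw margin. It is not the code used by the CRS, which handles teams and multiple ranks. The names `TwoPlayerUpdateSketch`, `Rating` and `update` are hypothetical, the dynamics term tau is omitted, and the approximated normal CDF makes the sketch unsuitable for extreme rating differences.

```scala
object TwoPlayerUpdateSketch {
  // Hypothetical rating type: mean corresponds to points, sigma to uncertainty.
  case class Rating(mean: Double, sigma: Double)

  // Abramowitz-Stegun approximation 7.1.26 of erf (absolute error about 1.5e-7).
  private def erf(x: Double): Double = {
    val sign = if (x < 0) -1.0 else 1.0
    val ax = math.abs(x)
    val t = 1.0 / (1.0 + 0.3275911 * ax)
    val poly = t * (0.254829592 + t * (-0.284496736 +
      t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))))
    sign * (1.0 - poly * math.exp(-ax * ax))
  }

  private def pdf(x: Double): Double = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.Pi)
  private def cdf(x: Double): Double = 0.5 * (1.0 + erf(x / math.sqrt(2.0)))

  // Returns (updated winner, updated loser) for a decisive 1-vs-1 result.
  def update(winner: Rating, loser: Rating, beta: Double): (Rating, Rating) = {
    val c2 = 2.0 * beta * beta + winner.sigma * winner.sigma + loser.sigma * loser.sigma
    val c = math.sqrt(c2)
    val t = (winner.mean - loser.mean) / c
    val v = pdf(t) / cdf(t) // additive correction for the means
    val w = v * (v + t)     // multiplicative correction for the variances
    def adjust(r: Rating, direction: Double): Rating = {
      val s2 = r.sigma * r.sigma
      Rating(r.mean + direction * s2 / c * v, math.sqrt(s2 * (1.0 - s2 / c2 * w)))
    }
    (adjust(winner, 1.0), adjust(loser, -1.0))
  }
}
```

The winner's mean rises, the loser's mean falls, and both uncertainties shrink. An unexpected result (a small or negative t) produces a larger correction than an expected one.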
The TrueSkill algorithm relies on some basic variables that influence the updates.
For those variables, default values are given in the papers that introduced TrueSkill. The Alesia Component Rating System holds these variables in the class TrueSkillDefaultValues, an instance of which can be implicitly passed to TrueSkillRatingSystem. For most purposes nothing needs to be passed, and the default values given in the paper by Herbrich et al. are used.
If you want to change a variable, refer to the following example (setting the "skill chain" beta to 500):
val myDefaultValues = new TrueSkillDefaultValues(beta = 500.0)
val crs = alesia.componentrating.TrueSkillRatingSystem()(myDefaultValues, new AdvancedOptions)
Note that the new AdvancedOptions part is necessary here due to a limitation of Scala: when an implicit parameter list is passed explicitly, all of its parameters need to be stated.
The default values you can change are: the default skill consisting of mean and uncertainty (example: defaultSkill = new NormalDist( myMean, myUncertainty )), the draw probability pDraw, the skill chain beta, the additive uncertainty factor tau, the accuracy of the approximative algorithm delta (note: the closer to 0, the more accurate), and the debug flag debug.
The inventors of TrueSkill considered a scenario in which a player does not participate in the whole game but leaves or joins partway through. Partial play was developed to take that into account.
When submitting game results to the TrueSkillRatingSystem, a player can be marked as contributing only partly to its team's performance in this game. To set the partial play factor for a component to 50%:
crs.setPartialPlayFactor("myComponentName", 0.5)
This is persistent within the TrueSkillRatingSystem instance.
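The idea behind the factor can be sketched as a weighted team strength in which each member's points count in proportion to its partial play factor. This is a simplified illustration under that assumption, not the way the CRS or TrueSkill performs the full update. `PartialPlaySketch`, `Member` and `teamStrength` are hypothetical names.

```scala
object PartialPlaySketch {
  // Hypothetical team member; a factor of 1.0 means full participation.
  case class Member(name: String, points: Double, partialPlayFactor: Double = 1.0)

  // Each member contributes its points weighted by its partial play factor.
  def teamStrength(team: Seq[Member]): Double =
    team.map(m => m.points * m.partialPlayFactor).sum
}

// A member at 50% contributes half of its points: 30.0 + 0.5 * 20.0 = 40.0
val strength = PartialPlaySketch.teamStrength(Seq(
  PartialPlaySketch.Member("full", 30.0),
  PartialPlaySketch.Member("half", 20.0, 0.5)
))
```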
One of TrueSkill's more controversial features is the influence that team size has on the assumed team strength: the strength of a team equals the sum of the strengths of its components. As a result, components that more often play in bigger teams are ranked lower.
For some applications a different approach is preferred. For example, the team strength could be close to the average strength of the team's components, so that the influence of team size is (almost) removed.
This can be set in the advancedOptions, which are then passed to the TrueSkillRatingSystem on creation:
val crs: ComponentRatingSystem = TrueSkillRatingSystem()(new DefaultOptions, new AdvancedOptions(useVPBalancing=true))
Note that if any options are passed, both must be present. The above example uses the Virtual Player Team Size Balancing, which was observed to come somewhat closer to the desired effect than the Partial Play Team Size Balancing.
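The difference between summed and averaged team strength can be illustrated with a small sketch. It only contrasts the two notions of team strength described above and does not show how either balancing option is implemented in the CRS. `TeamSizeSketch` and its methods are hypothetical names.

```scala
object TeamSizeSketch {
  // Default TrueSkill view: team strength is the sum of the member strengths.
  def summedStrength(points: Seq[Double]): Double = points.sum

  // Alternative view: team strength is the average of the member strengths.
  def averagedStrength(points: Seq[Double]): Double =
    if (points.isEmpty) 0.0 else points.sum / points.size
}

val smallTeam = Seq(30.0, 30.0)       // two strong components
val largeTeam = Seq(25.0, 25.0, 25.0) // three weaker components

// Summed: 60.0 vs. 75.0, so the larger team is expected to win.
// Averaged: 30.0 vs. 25.0, so the smaller team is expected to win.
```

Under the summed view, a large team that merely performs on par with a small one has underperformed its expectation, which pushes the ratings of its members down.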
- Paper describing the use of TrueSkill for component-based simulation systems: Jonathan Wienß, Michael Stein, Roland Ewald: Evaluating Simulation Software Components with Player Rating Systems (presented at SIMUTools 2013): https://dl.acm.org/doi/10.5555/2512734.2512740
- TrueSkill homepage at Microsoft Research: http://research.microsoft.com/en-us/projects/trueskill/
- Original paper on TrueSkill by Ralf Herbrich, Tom Minka, and Thore Graepel: http://research.microsoft.com/apps/pubs/default.aspx?id=67956
- Microsoft Research's online rank calculator, useful for checking results: http://atom.research.microsoft.com/trueskill/rankcalculator.aspx
- F# implementation from Microsoft Research, published on its Applied Games Group blog: http://blogs.technet.com/b/apg/archive/2008/06/16/trueskill-in-f.aspx
- Article by Jeff Moser about TrueSkill that explains much of the math as well as his C# implementation. Recommended for an initial understanding of TrueSkill: http://www.moserware.com/2010/03/computing-your-skill.html