[optimization] parallelize #6

bddap · 2020-11-25T20:01:58Z

During each single round of reasoning, the RDF store does not receive writes. Instead, new inferences are queued for addition. This structure should lend itself well to a multi threaded reasoner. Each rule can be processed by a separate thread. Once all rules are processed for a round, merge all resulting inferences together an insert them into the RDF store.

Maybe work can be split up more granularly than per-rule but perhaps not worth the complexity.

bddap · 2021-01-26T19:35:16Z

Initial experiments adding rayon.

ancestry benchmark, AMD Ryzen 9 3900X 12-Core Processor (24 logical cores), 50 entities

	24 rules (22 of them duplicates)	2 rules
single threaded	294,074,410 ns/iter	20,142,490 ns/iter
rayon version	51,397,531 ns/iter	12,489,231 ns/iter

benchmark source

#[bench]
fn ancestry_high_bench(b: &mut Bencher) {
    let mut next_uniq = 0u32;
    let parent = inc(&mut next_uniq);
    let ancestor = inc(&mut next_uniq);
    let nodes: Vec<u32> = (0usize..50).map(|_| inc(&mut next_uniq)).collect();
    let facts: Vec<Claim<u32>> = nodes
        .iter()
        .zip(nodes.iter().cycle().skip(1))
        .map(|(a, b)| [*a, parent, *b])
        .collect();
    let arule: [&[Claim<Entity<_, _>>]; 2] = [
        &[
            [Unbound("a"), Bound(ancestor), Unbound("b")],
            [Unbound("b"), Bound(ancestor), Unbound("c")],
        ],
        &[[Unbound("a"), Bound(ancestor), Unbound("c")]],
    ];
    let rules = decl_rules(&[
        [
            &[[Unbound("a"), Bound(parent), Unbound("b")]],
            &[[Unbound("a"), Bound(ancestor), Unbound("b")]],
        ],
        [
            &[
                [Unbound("a"), Bound(ancestor), Unbound("b")],
                [Unbound("b"), Bound(ancestor), Unbound("c")],
            ],
            &[[Unbound("a"), Bound(ancestor), Unbound("c")]],
        ],
    ]);
    let composite_claims = vec![
        [nodes[0], ancestor, *nodes.last().unwrap()],
        [*nodes.last().unwrap(), ancestor, nodes[0]],
        [nodes[0], ancestor, nodes[0]],
        [nodes[0], parent, nodes[1]], // (first node, parent,  second node) is a premise
    ];
    b.iter(|| {
        prove::<&str, u32>(&facts, &composite_claims, &rules).unwrap();
    })
}

Looks like, the parallel implementation improves performance when rule count is high, otherwise it hiders performance.

bddap · 2021-01-26T19:35:53Z

Better to work on #1 first.

bddap · 2021-04-08T17:57:29Z

Better to work on #9 before working on this.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[optimization] parallelize #6

[optimization] parallelize #6

bddap commented Nov 25, 2020

bddap commented Jan 26, 2021

bddap commented Jan 26, 2021

bddap commented Apr 8, 2021 •

edited

Loading

[optimization] parallelize #6

[optimization] parallelize #6

Comments

bddap commented Nov 25, 2020

bddap commented Jan 26, 2021

bddap commented Jan 26, 2021

bddap commented Apr 8, 2021 • edited Loading

bddap commented Apr 8, 2021 •

edited

Loading