While investigating method chaining (e.g. sub_) for my other issue, I realized that a significant amount of time is spent instantiating zero-filled Arrays, and very often these are intermediate results that never see the light of day, e.g.:
let v123 = v1 - v2 - v3   // the result of v1 - v2 is such a temporary
This takes much longer than using sub_(), which doesn't create a new array.
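To make the difference concrete, here is a minimal sketch in plain Swift. The helpers `sub` and `subInPlace` are hypothetical names made up for illustration; they are not BaseMath's actual implementation, only stand-ins for an operator that returns a new array and a sub_-style method that reuses the receiver.

```swift
// Hypothetical helpers for illustration only; not BaseMath's implementation.

/// Out-of-place: allocates and zero-fills a fresh result array on every call.
func sub(_ a: [Float], _ b: [Float]) -> [Float] {
    precondition(a.count == b.count)
    var out = [Float](repeating: 0, count: a.count)
    for i in 0..<a.count { out[i] = a[i] - b[i] }
    return out
}

/// In-place: writes the result into `a`, so no new array is created.
func subInPlace(_ a: inout [Float], _ b: [Float]) {
    precondition(a.count == b.count)
    for i in 0..<a.count { a[i] -= b[i] }
}

let v1: [Float] = [10, 20, 30]
let v2: [Float] = [1, 2, 3]
let v3: [Float] = [4, 5, 6]

// Two result arrays: one for the temporary (v1 - v2), one for the final value.
let chained = sub(sub(v1, v2), v3)

// One result array: v1 is copied once, then the same buffer is reused.
var reused = v1
subInPlace(&reused, v2)
subInPlace(&reused, v3)
```

Both paths compute the same values; the chained form just pays for an extra zero-filled temporary per operator.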
I did a quick benchmark (release):
benchmark(title: "array.create", num_trials: 10) {
let a = [Float](repeating: 0, count: len)
}
benchmark(title: "array.create faster", num_trials: 10) {
let p = UnsafeMutablePointer<Float>.allocate(capacity: len)
}
Results:
array.create: 9.340 ms
array.create faster: 0.005 ms
That's a huge speed difference! I strongly suspect one could get away with just the pointer for intermediate results, and wrap it in an Array again only when it is returned "outside" of BaseMath.
E.g. if you use a pure Accelerate API where a destination pointer is specified, that destination only needs .allocate; zeroing isn't necessary.
Just a thought for a possible optimization. This library is looking very good. As an experiment, I was able to create my own library that depends on BaseMath but uses the Accelerate API in place of the explicit pointer loop, and I really like the concise, clean syntax that looks like regular Swift code. The only thing that bothers me is the init that fills the arrays with zeros, which is not necessary for Accelerate (and probably elsewhere).
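One way to get an Array without the zero fill, sketched here under assumptions: the standard library (Swift 5.1 and later) has `Array(unsafeUninitializedCapacity:initializingWith:)`, which hands the closure uninitialized storage to write into. The function name `subNoZeroFill` is hypothetical, and the plain loop stands in for whatever fills the destination; an Accelerate call that takes a destination pointer could presumably write into `base` instead.

```swift
/// Hypothetical helper: elementwise a - b, written straight into
/// uninitialized array storage (no zero-fill pass beforehand).
func subNoZeroFill(_ a: [Float], _ b: [Float]) -> [Float] {
    precondition(a.count == b.count)
    let n = a.count
    return [Float](unsafeUninitializedCapacity: n) { buffer, initializedCount in
        // For n == 0 there may be no base address and nothing to write.
        guard n > 0, let base = buffer.baseAddress else {
            initializedCount = 0
            return
        }
        // Every element is initialized exactly once.
        for i in 0..<n {
            (base + i).initialize(to: a[i] - b[i])
        }
        // Tell Array how many elements are now valid.
        initializedCount = n
    }
}

let diff = subNoZeroFill([10, 20, 30], [1, 2, 3])
let emptyDiff = subNoZeroFill([], [])
```

The result is an ordinary `[Float]`, so callers outside the library would not see any difference apart from the missing initialization cost.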
I noticed AlignedStorage may not have this problem. Here:
benchmark(title: "AlignedStorage.create") {
let v1 = AlignedStorage<Float>(len)
}
AlignedStorage.create: 0.007 ms
This is really nice. I am still able to do arithmetic and math with this object (needs more testing), as well as divert it to use Accelerate in my own lib.
However, my previous point may still be valid if one has to use a Swift Array or is simply not aware of AlignedStorage.
Note: for CoreML (Apple's machine learning framework), there's a type called MLMultiArray, which is a multi-dimensional array, but I think it is laid out row-major underneath. You can obtain an UnsafeMutableBufferPointer from it, instantiate an AlignedStorage with that, and off you go running nontrivial math algorithms (with Accelerate, if that's profiled to be faster).
Thanks a lot. I had wanted something like BaseMath for a while. People have blogged about it, but I wasn't aware of any concrete/complete project until this one.
Update: someone told me about these. I will just dump them here for future investigation:
var a = [Float]()
a.reserveCapacity(len) // NB: a.count will still be 0.
let a = ContiguousArray<Float>(repeating: 0, count: len) // this isn't much faster.
var a = ContiguousArray<Float>()
a.reserveCapacity(len) // NB: a.count will still be 0.
The a.count of 0 worries me: does it mean it isn't doing any eager allocation, or what? I don't know enough Swift here.
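A small check of what `reserveCapacity` does, as a sketch: as far as I understand, the storage is reserved right away (capacity grows to at least the requested size), but the array still holds no elements, so `count` stays 0 and indexing `a[0]` would be out of bounds until something is appended. The value of `len` below is an arbitrary choice for the example.

```swift
let len = 1_000

var a = [Float]()
a.reserveCapacity(len)

// Storage has been reserved: capacity is at least `len`...
let capacityAfterReserve = a.capacity
// ...but no elements exist yet, so count is still 0.
let countAfterReserve = a.count

// Appending up to the reserved capacity should not need to reallocate.
for i in 0..<len {
    a.append(Float(i))
}
let countAfterAppend = a.count
```

So `reserveCapacity` avoids the zero fill, but the elements then have to be appended one by one rather than written through a pointer of length `len`.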