-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved ArrayList performance by removing size
field.
#255
Conversation
Removed the `size` field from the `List` struct and replaced it with the built-in Go slice length implementation. + Achieved an average reduction of nearly 40% in execution time for `ArrayListGet`. + Achieved an average reduction of nearly 23% in memory usage for `ArrayListAdd`. + However, this change slightly increased the execution time for `ArrayListAdd` by 2.7%. ``` goos: linux goarch: amd64 pkg: github.com/emirpasic/gods/v2/lists/arraylist cpu: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ ArrayListGet100-8 51.20n ± 0% 32.49n ± 0% -36.53% (n=50) ArrayListGet1000-8 447.5n ± 0% 270.3n ± 1% -39.60% (n=50) ArrayListGet10000-8 4.418µ ± 1% 2.540µ ± 0% -42.52% (n=50) ArrayListGet100000-8 44.06µ ± 0% 25.15µ ± 0% -42.91% (n=50) ArrayListAdd100-8 726.5n ± 1% 760.5n ± 0% +4.69% (p=0.000 n=50) ArrayListAdd1000-8 7.437µ ± 2% 7.389µ ± 1% ~ (p=0.746 n=50) ArrayListAdd10000-8 70.06µ ± 1% 74.34µ ± 1% +6.11% (p=0.000 n=50) ArrayListAdd100000-8 740.2µ ± 1% 728.9µ ± 2% ~ (p=0.147 n=50) ArrayListRemove100-8 233.8n ± 0% 233.9n ± 0% ~ (p=0.162 n=50) ArrayListRemove1000-8 2.275µ ± 0% 2.276µ ± 0% ~ (p=0.452 n=50) ArrayListRemove10000-8 22.75µ ± 0% 22.75µ ± 0% ~ (p=0.956 n=50) ArrayListRemove100000-8 1.323m ± 1% 1.331m ± 1% ~ (p=0.120 n=50) geomean 7.218µ 6.119µ -15.22% │ old.txt │ new.txt │ │ B/op │ B/op vs base │ ArrayListGet100-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListGet1000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListGet10000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListGet100000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListAdd100-8 2.987Ki ± 0% 2.087Ki ± 1% -30.14% (n=50) ArrayListAdd1000-8 27.50Ki ± 1% 23.31Ki ± 2% -15.24% (p=0.000 n=50) ArrayListAdd10000-8 293.9Ki ± 1% 204.7Ki ± 1% -30.36% (n=50) ArrayListAdd100000-8 2.667Mi ± 1% 2.244Mi ± 12% -15.86% (p=0.000 n=50) ArrayListRemove100-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListRemove1000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListRemove10000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListRemove100000-8 453.5 ± 1% 457.5 ± 1% ~ (p=0.059 n=50) geomean ² -8.38% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ old.txt │ new.txt │ │ allocs/op │ allocs/op vs base │ ArrayListGet100-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListGet1000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListGet10000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListGet100000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListAdd100-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListAdd1000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListAdd10000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListAdd100000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListRemove100-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListRemove1000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListRemove10000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ ArrayListRemove100000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=50) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean ```
@ksw2000 interesting, but expected benchmarks, i.e. the native implementation will always be faster. @PapaCharlie what do you think? Tests are passing, if OK, let's merge... |
Gave this a quick once over, I think we can do even better! Specifically around Add. If we just used append, I bet we'd see some nice speedups. Let me open some comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple comments around using stdlib stuff now that it's available. LGTM otherwise! Thank you for this
lists/arraylist/arraylist.go
Outdated
@@ -99,14 +97,14 @@ func (list *List[T]) Contains(values ...T) bool { | |||
|
|||
// Values returns all elements in the list. | |||
func (list *List[T]) Values() []T { | |||
newElements := make([]T, list.size, list.size) | |||
copy(newElements, list.elements[:list.size]) | |||
newElements := make([]T, len(list.elements)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be replaced with slices.Copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no slices.Copy()
in slices package. 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, clone exists though https://pkg.go.dev/slices#Clone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created some benchmarks to test Values()
. The slices.Clone()
also improves performance when the array is large.
func (list *List[T]) Values() []T {
- newElements := make([]T, len(list.elements))
- copy(newElements, list.elements)
- return newElements
+ return slices.Clone(list.elements)
}
goos: linux
goarch: amd64
pkg: github.com/emirpasic/gods/v2/lists/arraylist
cpu: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
│ copy.txt │ slices.clone.txt │
│ sec/op │ sec/op vs base │
ArrayListValues100-8 526.1n ± 15% 457.1n ± 17% ~ (p=0.174 n=30)
ArrayListValues1000-8 4.606µ ± 7% 4.068µ ± 19% ~ (p=0.096 n=30)
ArrayListValues10000-8 39.26µ ± 18% 34.02µ ± 18% ~ (p=0.279 n=30)
ArrayListValues100000-8 375.3µ ± 17% 298.7µ ± 25% -20.42% (p=0.004 n=30)
geomean 13.75µ 11.72µ -14.71%
│ copy.txt │ slices.clone.txt │
│ B/op │ B/op vs base │
ArrayListValues100-8 896.0 ± 0% 896.0 ± 0% ~ (p=1.000 n=30) ¹
ArrayListValues1000-8 8.000Ki ± 0% 8.000Ki ± 0% ~ (p=1.000 n=30) ¹
ArrayListValues10000-8 80.00Ki ± 0% 80.00Ki ± 0% ~ (p=1.000 n=30) ¹
ArrayListValues100000-8 784.0Ki ± 0% 784.0Ki ± 0% +0.00% (p=0.000 n=30)
geomean 25.74Ki 25.74Ki +0.00%
¹ all samples are equal
│ copy.txt │ slices.clone.txt │
│ allocs/op │ allocs/op vs base │
ArrayListValues100-8 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=30) ¹
ArrayListValues1000-8 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=30) ¹
ArrayListValues10000-8 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=30) ¹
ArrayListValues100000-8 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=30) ¹
geomean 1.000 1.000 +0.00%
¹ all samples are equal
func benchmarkValues(b *testing.B, list *List[int]) {
for i := 0; i < b.N; i++ {
list.Values()
}
}
func BenchmarkArrayListValues100000(b *testing.B) {
b.StopTimer()
size := 100000
list := New[int]()
for n := 0; n < size; n++ {
list.Add(n)
}
b.StartTimer()
benchmarkValues(b, list)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PapaCharlie
Should I commit the benchmark I created in arraylist_test.go
?
In fact, I tried func (list *List[T]) Add(values ...T) {
l := len(list.elements)
list.growBy(len(values))
+ list.elements = append(list.elements[:l], values...)
- for i := range values {
- list.elements[l+i] = values[i]
- }
}
|
Hmmm that's very interesting... What about |
The performance of func (list *List[T]) Add(values ...T) {
l := len(list.elements)
list.growBy(len(values))
+ copy(list.elements[l:], values)
- for i := range values {
- list.elements[l+i] = values[i]
- }
}
|
The performance of `slices.Delete()` is better ``` goos: linux goarch: amd64 pkg: github.com/emirpasic/gods/v2/lists/arraylist cpu: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz │ old-remove.txt │ new-remove.txt │ │ sec/op │ sec/op vs base │ ArrayListRemove100-8 234.2n ± 1% 211.2n ± 2% -9.82% (p=0.000 n=10) ArrayListRemove1000-8 2.293µ ± 1% 2.063µ ± 4% -10.05% (p=0.000 n=10) ArrayListRemove10000-8 22.78µ ± 1% 20.53µ ± 2% -9.86% (p=0.000 n=10) ArrayListRemove100000-8 1.318m ± 3% 1.279m ± 2% -2.96% (p=0.019 n=10) geomean 11.27µ 10.34µ -8.22% │ old-remove.txt │ new-remove.txt │ │ B/op │ B/op vs base │ ArrayListRemove100-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ArrayListRemove1000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ArrayListRemove10000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ArrayListRemove100000-8 452.0 ± 4% 444.5 ± 1% ~ (p=0.224 n=10) geomean ² -0.42% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ old-remove.txt │ new-remove.txt │ │ allocs/op │ allocs/op vs base │ ArrayListRemove100-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ArrayListRemove1000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ArrayListRemove10000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ArrayListRemove100000-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean ```
There's probably something else at play here. Can you put your copy and append code here? And when you run the benchmarks, can you run the profiler (with |
Oh and can you show your benchmarks? All of these will behave differently depending on how you're benchmarking them. For example, how many elements are you adding, etc |
I use the benchmark in
func benchmarkAdd(b *testing.B, list *List[int], size int) {
for i := 0; i < b.N; i++ {
for n := 0; n < size; n++ {
list.Add(n)
}
}
}
func BenchmarkArrayListAdd100(b *testing.B) {
b.StopTimer()
size := 100
list := New[int]()
b.StartTimer()
benchmarkAdd(b, list, size)
} The append method func (list *List[T]) Add(values ...T) {
l := len(list.elements)
list.growBy(len(values))
list.elements = append(list.elements[:l], values...)
} The copy method func (list *List[T]) Add(values ...T) {
l := len(list.elements)
list.growBy(len(values))
copy(list.elements[l:], values)
} I will run the benchmark again using a profiler (with |
I also found there are some weird benchmark in The func benchmarkAdd(b *testing.B, list *List[int], size int) {
for i := 0; i < b.N; i++ {
for n := 0; n < size; n++ {
list.Add(n)
}
}
} The func BenchmarkArrayListAdd100(b *testing.B) {
b.StopTimer()
size := 100
list := New[int]()
b.StartTimer()
benchmarkAdd(b, list, size)
} HOWEVER The func BenchmarkArrayListAdd1000(b *testing.B) {
b.StopTimer()
size := 1000
list := New[int]()
for n := 0; n < size; n++ {
list.Add(n)
}
b.StartTimer()
benchmarkAdd(b, list, size)
} It also occurs in Why In fact, the action of |
Hi @PapaCharlie , I hope you're doing well. I was wondering if you had a chance to review my PR? Please let me know if there's anything I can do to help move this forward. Thank you! |
Thank you for this and for the in-depth analysis! I'm still very surprised by the result as a whole, namely that memcpy is somehow slower than the for loop you have. This is good to go 👍 |
Removed the
size
field from theList
struct and replaced it with the built-in Go slice length implementation.ArrayListGet
.ArrayListAdd
.ArrayListAdd
by 2.7%.