Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved ArrayList performance by removing size field. #255

Merged
merged 5 commits into from
Jul 18, 2024

Conversation

ksw2000
Copy link
Contributor

@ksw2000 ksw2000 commented Jun 18, 2024

Removed the size field from the List struct and replaced it with the built-in Go slice length implementation.

  • Achieved an average reduction of nearly 40% in execution time for ArrayListGet.
  • Achieved an average reduction of nearly 23% in memory usage for ArrayListAdd.
  • However, this change slightly increased the execution time for ArrayListAdd by 2.7%.
goos: linux
goarch: amd64
pkg: github.com/emirpasic/gods/v2/lists/arraylist
cpu: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
                        │   old.txt   │               new.txt               │
                        │   sec/op    │   sec/op     vs base                │
ArrayListGet100-8         51.20n ± 0%   32.49n ± 0%  -36.53% (n=50)
ArrayListGet1000-8        447.5n ± 0%   270.3n ± 1%  -39.60% (n=50)
ArrayListGet10000-8       4.418µ ± 1%   2.540µ ± 0%  -42.52% (n=50)
ArrayListGet100000-8      44.06µ ± 0%   25.15µ ± 0%  -42.91% (n=50)
ArrayListAdd100-8         726.5n ± 1%   760.5n ± 0%   +4.69% (p=0.000 n=50)
ArrayListAdd1000-8        7.437µ ± 2%   7.389µ ± 1%        ~ (p=0.746 n=50)
ArrayListAdd10000-8       70.06µ ± 1%   74.34µ ± 1%   +6.11% (p=0.000 n=50)
ArrayListAdd100000-8      740.2µ ± 1%   728.9µ ± 2%        ~ (p=0.147 n=50)
ArrayListRemove100-8      233.8n ± 0%   233.9n ± 0%        ~ (p=0.162 n=50)
ArrayListRemove1000-8     2.275µ ± 0%   2.276µ ± 0%        ~ (p=0.452 n=50)
ArrayListRemove10000-8    22.75µ ± 0%   22.75µ ± 0%        ~ (p=0.956 n=50)
ArrayListRemove100000-8   1.323m ± 1%   1.331m ± 1%        ~ (p=0.120 n=50)
geomean                   7.218µ        6.119µ       -15.22%

                        │    old.txt     │                 new.txt                 │
                        │      B/op      │     B/op       vs base                  │
ArrayListGet100-8           0.000 ± 0%       0.000 ±  0%        ~ (p=1.000 n=50) ¹
ArrayListGet1000-8          0.000 ± 0%       0.000 ±  0%        ~ (p=1.000 n=50) ¹
ArrayListGet10000-8         0.000 ± 0%       0.000 ±  0%        ~ (p=1.000 n=50) ¹
ArrayListGet100000-8        0.000 ± 0%       0.000 ±  0%        ~ (p=1.000 n=50) ¹
ArrayListAdd100-8         2.987Ki ± 0%     2.087Ki ±  1%  -30.14% (n=50)
ArrayListAdd1000-8        27.50Ki ± 1%     23.31Ki ±  2%  -15.24% (p=0.000 n=50)
ArrayListAdd10000-8       293.9Ki ± 1%     204.7Ki ±  1%  -30.36% (n=50)
ArrayListAdd100000-8      2.667Mi ± 1%     2.244Mi ± 12%  -15.86% (p=0.000 n=50)
ArrayListRemove100-8        0.000 ± 0%       0.000 ±  0%        ~ (p=1.000 n=50) ¹
ArrayListRemove1000-8       0.000 ± 0%       0.000 ±  0%        ~ (p=1.000 n=50) ¹
ArrayListRemove10000-8      0.000 ± 0%       0.000 ±  0%        ~ (p=1.000 n=50) ¹
ArrayListRemove100000-8     453.5 ± 1%       457.5 ±  1%        ~ (p=0.059 n=50)
geomean                                ²                   -8.38%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                        │   old.txt    │               new.txt               │
                        │  allocs/op   │ allocs/op   vs base                 │
ArrayListGet100-8         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListGet1000-8        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListGet10000-8       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListGet100000-8      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListAdd100-8         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListAdd1000-8        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListAdd10000-8       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListAdd100000-8      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListRemove100-8      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListRemove1000-8     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListRemove10000-8    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListRemove100000-8   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
geomean                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Removed the `size` field from the `List` struct and replaced it with the built-in Go slice length implementation.
+ Achieved an average reduction of nearly 40% in execution time for `ArrayListGet`.
+ Achieved an average reduction of nearly 23% in memory usage for `ArrayListAdd`.
+ However, this change slightly increased the execution time for `ArrayListAdd` by 2.7%.
```
goos: linux
goarch: amd64
pkg: github.com/emirpasic/gods/v2/lists/arraylist
cpu: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
                        │   old.txt   │               new.txt               │
                        │   sec/op    │   sec/op     vs base                │
ArrayListGet100-8         51.20n ± 0%   32.49n ± 0%  -36.53% (n=50)
ArrayListGet1000-8        447.5n ± 0%   270.3n ± 1%  -39.60% (n=50)
ArrayListGet10000-8       4.418µ ± 1%   2.540µ ± 0%  -42.52% (n=50)
ArrayListGet100000-8      44.06µ ± 0%   25.15µ ± 0%  -42.91% (n=50)
ArrayListAdd100-8         726.5n ± 1%   760.5n ± 0%   +4.69% (p=0.000 n=50)
ArrayListAdd1000-8        7.437µ ± 2%   7.389µ ± 1%        ~ (p=0.746 n=50)
ArrayListAdd10000-8       70.06µ ± 1%   74.34µ ± 1%   +6.11% (p=0.000 n=50)
ArrayListAdd100000-8      740.2µ ± 1%   728.9µ ± 2%        ~ (p=0.147 n=50)
ArrayListRemove100-8      233.8n ± 0%   233.9n ± 0%        ~ (p=0.162 n=50)
ArrayListRemove1000-8     2.275µ ± 0%   2.276µ ± 0%        ~ (p=0.452 n=50)
ArrayListRemove10000-8    22.75µ ± 0%   22.75µ ± 0%        ~ (p=0.956 n=50)
ArrayListRemove100000-8   1.323m ± 1%   1.331m ± 1%        ~ (p=0.120 n=50)
geomean                   7.218µ        6.119µ       -15.22%

                        │    old.txt     │                 new.txt                 │
                        │      B/op      │     B/op       vs base                  │
ArrayListGet100-8           0.000 ± 0%       0.000 ±  0%        ~ (p=1.000 n=50) ¹
ArrayListGet1000-8          0.000 ± 0%       0.000 ±  0%        ~ (p=1.000 n=50) ¹
ArrayListGet10000-8         0.000 ± 0%       0.000 ±  0%        ~ (p=1.000 n=50) ¹
ArrayListGet100000-8        0.000 ± 0%       0.000 ±  0%        ~ (p=1.000 n=50) ¹
ArrayListAdd100-8         2.987Ki ± 0%     2.087Ki ±  1%  -30.14% (n=50)
ArrayListAdd1000-8        27.50Ki ± 1%     23.31Ki ±  2%  -15.24% (p=0.000 n=50)
ArrayListAdd10000-8       293.9Ki ± 1%     204.7Ki ±  1%  -30.36% (n=50)
ArrayListAdd100000-8      2.667Mi ± 1%     2.244Mi ± 12%  -15.86% (p=0.000 n=50)
ArrayListRemove100-8        0.000 ± 0%       0.000 ±  0%        ~ (p=1.000 n=50) ¹
ArrayListRemove1000-8       0.000 ± 0%       0.000 ±  0%        ~ (p=1.000 n=50) ¹
ArrayListRemove10000-8      0.000 ± 0%       0.000 ±  0%        ~ (p=1.000 n=50) ¹
ArrayListRemove100000-8     453.5 ± 1%       457.5 ±  1%        ~ (p=0.059 n=50)
geomean                                ²                   -8.38%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                        │   old.txt    │               new.txt               │
                        │  allocs/op   │ allocs/op   vs base                 │
ArrayListGet100-8         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListGet1000-8        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListGet10000-8       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListGet100000-8      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListAdd100-8         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListAdd1000-8        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListAdd10000-8       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListAdd100000-8      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListRemove100-8      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListRemove1000-8     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListRemove10000-8    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
ArrayListRemove100000-8   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=50) ¹
geomean                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```
@emirpasic
Copy link
Owner

@ksw2000 interesting, but expected benchmarks, i.e. the native implementation will always be faster.

@PapaCharlie what do you think? Tests are passing, if OK, let's merge...

@PapaCharlie
Copy link
Collaborator

Gave this a quick once over, I think we can do even better! Specifically around Add. If we just used append, I bet we'd see some nice speedups. Let me open some comments

Copy link
Collaborator

@PapaCharlie PapaCharlie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple comments around using stdlib stuff now that it's available. LGTM otherwise! Thank you for this

lists/arraylist/arraylist.go Show resolved Hide resolved
lists/arraylist/arraylist.go Outdated Show resolved Hide resolved
@@ -99,14 +97,14 @@ func (list *List[T]) Contains(values ...T) bool {

// Values returns all elements in the list.
func (list *List[T]) Values() []T {
newElements := make([]T, list.size, list.size)
copy(newElements, list.elements[:list.size])
newElements := make([]T, len(list.elements))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be replaced with slices.Copy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no slices.Copy() in slices package. 😂

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, clone exists though https://pkg.go.dev/slices#Clone

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created some benchmarks to test Values(). The slices.Clone() also improves performance when the array is large.

func (list *List[T]) Values() []T {
-	newElements := make([]T, len(list.elements))
-	copy(newElements, list.elements)
-	return newElements
+	return slices.Clone(list.elements)
}
goos: linux
goarch: amd64
pkg: github.com/emirpasic/gods/v2/lists/arraylist
cpu: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
                        │   copy.txt   │           slices.clone.txt           │
                        │    sec/op    │    sec/op     vs base                │
ArrayListValues100-8      526.1n ± 15%   457.1n ± 17%        ~ (p=0.174 n=30)
ArrayListValues1000-8     4.606µ ±  7%   4.068µ ± 19%        ~ (p=0.096 n=30)
ArrayListValues10000-8    39.26µ ± 18%   34.02µ ± 18%        ~ (p=0.279 n=30)
ArrayListValues100000-8   375.3µ ± 17%   298.7µ ± 25%  -20.42% (p=0.004 n=30)
geomean                   13.75µ         11.72µ        -14.71%

                        │   copy.txt   │           slices.clone.txt            │
                        │     B/op     │     B/op      vs base                 │
ArrayListValues100-8        896.0 ± 0%     896.0 ± 0%       ~ (p=1.000 n=30) ¹
ArrayListValues1000-8     8.000Ki ± 0%   8.000Ki ± 0%       ~ (p=1.000 n=30) ¹
ArrayListValues10000-8    80.00Ki ± 0%   80.00Ki ± 0%       ~ (p=1.000 n=30) ¹
ArrayListValues100000-8   784.0Ki ± 0%   784.0Ki ± 0%  +0.00% (p=0.000 n=30)
geomean                   25.74Ki        25.74Ki       +0.00%
¹ all samples are equal

                        │  copy.txt  │          slices.clone.txt           │
                        │ allocs/op  │ allocs/op   vs base                 │
ArrayListValues100-8      1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=30) ¹
ArrayListValues1000-8     1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=30) ¹
ArrayListValues10000-8    1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=30) ¹
ArrayListValues100000-8   1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=30) ¹
geomean                   1.000        1.000       +0.00%
¹ all samples are equal
func benchmarkValues(b *testing.B, list *List[int]) {
	for i := 0; i < b.N; i++ {
		list.Values()
	}
}

func BenchmarkArrayListValues100000(b *testing.B) {
	b.StopTimer()
	size := 100000
	list := New[int]()
	for n := 0; n < size; n++ {
		list.Add(n)
	}
	b.StartTimer()
	benchmarkValues(b, list)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PapaCharlie
Should I commit the benchmark I created in arraylist_test.go?

lists/arraylist/arraylist.go Outdated Show resolved Hide resolved
lists/arraylist/arraylist.go Show resolved Hide resolved
@ksw2000
Copy link
Contributor Author

ksw2000 commented Jun 19, 2024

In fact, I tried append before making the pull request. The performance of append() is worse than for loop.

func (list *List[T]) Add(values ...T) {
	l := len(list.elements)
	list.growBy(len(values))
+	list.elements = append(list.elements[:l], values...)
-	for i := range values {
-		list.elements[l+i] = values[i]
-	}
}
goos: linux
goarch: amd64
pkg: github.com/emirpasic/gods/v2/lists/arraylist
cpu: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
                     │ for-loop.txt │              append.txt              │
                     │    sec/op    │    sec/op     vs base                │
ArrayListAdd100-8       788.6n ± 1%    837.4n ± 1%   +6.19% (p=0.000 n=30)
ArrayListAdd1000-8      7.732µ ± 2%   12.390µ ± 6%  +60.25% (p=0.000 n=30)
ArrayListAdd10000-8     76.45µ ± 0%   124.02µ ± 1%  +62.23% (p=0.000 n=30)
ArrayListAdd100000-8    764.5µ ± 2%   1205.2µ ± 3%  +57.64% (p=0.000 n=30)
geomean                 24.43µ         35.29µ       +44.43%

                     │ for-loop.txt │              append.txt              │
                     │     B/op     │     B/op      vs base                │
ArrayListAdd100-8      2.075Ki ± 1%   1.647Ki ± 2%  -20.64% (p=0.000 n=30)
ArrayListAdd1000-8     24.15Ki ± 3%   26.86Ki ± 5%  +11.25% (p=0.009 n=30)
ArrayListAdd10000-8    204.5Ki ± 1%   286.3Ki ± 1%  +40.04% (p=0.000 n=30)
ArrayListAdd100000-8   2.307Mi ± 4%   2.658Mi ± 5%  +15.26% (p=0.000 n=30)
geomean                70.14Ki        76.63Ki        +9.26%

                     │ for-loop.txt │             append.txt              │
                     │  allocs/op   │ allocs/op   vs base                 │
ArrayListAdd100-8      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=30) ¹
ArrayListAdd1000-8     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=30) ¹
ArrayListAdd10000-8    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=30) ¹
ArrayListAdd100000-8   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=30) ¹
geomean                           ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

@PapaCharlie
Copy link
Collaborator

Hmmm that's very interesting... What about copy? You know the bounds on the destination and source

@ksw2000
Copy link
Contributor Author

ksw2000 commented Jun 19, 2024

The performance of copy() is still not better than for-loop.

func (list *List[T]) Add(values ...T) {
	l := len(list.elements)
	list.growBy(len(values))
+	copy(list.elements[l:], values)
-	for i := range values {
-		list.elements[l+i] = values[i]
-	}
}
goos: linux
goarch: amd64
pkg: github.com/emirpasic/gods/v2/lists/arraylist
cpu: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
                     │ for-loop.txt │               copy.txt               │
                     │    sec/op    │    sec/op     vs base                │
ArrayListAdd100-8       788.6n ± 1%   1162.5n ± 1%  +47.41% (p=0.000 n=30)
ArrayListAdd1000-8      7.732µ ± 2%   10.980µ ± 2%  +42.02% (p=0.000 n=30)
ArrayListAdd10000-8     76.45µ ± 0%   114.61µ ± 1%  +49.92% (p=0.000 n=30)
ArrayListAdd100000-8    764.5µ ± 2%   1105.4µ ± 2%  +44.59% (p=0.000 n=30)
geomean                 24.43µ         35.66µ       +45.95%

                     │ for-loop.txt │               copy.txt               │
                     │     B/op     │     B/op      vs base                │
ArrayListAdd100-8      2.075Ki ± 1%   2.712Ki ± 1%  +30.68% (p=0.000 n=30)
ArrayListAdd1000-8     24.15Ki ± 3%   29.42Ki ± 3%  +21.83% (p=0.000 n=30)
ArrayListAdd10000-8    204.5Ki ± 1%   267.4Ki ± 1%  +30.79% (p=0.000 n=30)
ArrayListAdd100000-8   2.307Mi ± 4%   2.887Mi ± 1%  +25.17% (p=0.000 n=30)
geomean                70.14Ki        89.12Ki       +27.06%

                     │ for-loop.txt │              copy.txt               │
                     │  allocs/op   │ allocs/op   vs base                 │
ArrayListAdd100-8      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=30) ¹
ArrayListAdd1000-8     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=30) ¹
ArrayListAdd10000-8    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=30) ¹
ArrayListAdd100000-8   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=30) ¹
geomean                           ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

ksw2000 added 2 commits June 19, 2024 23:38
The performance of `slices.Delete()` is better
```
goos: linux
goarch: amd64
pkg: github.com/emirpasic/gods/v2/lists/arraylist
cpu: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
                        │ old-remove.txt │           new-remove.txt            │
                        │     sec/op     │   sec/op     vs base                │
ArrayListRemove100-8         234.2n ± 1%   211.2n ± 2%   -9.82% (p=0.000 n=10)
ArrayListRemove1000-8        2.293µ ± 1%   2.063µ ± 4%  -10.05% (p=0.000 n=10)
ArrayListRemove10000-8       22.78µ ± 1%   20.53µ ± 2%   -9.86% (p=0.000 n=10)
ArrayListRemove100000-8      1.318m ± 3%   1.279m ± 2%   -2.96% (p=0.019 n=10)
geomean                      11.27µ        10.34µ        -8.22%

                        │ old-remove.txt │           new-remove.txt            │
                        │      B/op      │    B/op     vs base                 │
ArrayListRemove100-8        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ArrayListRemove1000-8       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ArrayListRemove10000-8      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ArrayListRemove100000-8     452.0 ± 4%     444.5 ± 1%       ~ (p=0.224 n=10)
geomean                                ²               -0.42%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                        │ old-remove.txt │           new-remove.txt            │
                        │   allocs/op    │ allocs/op   vs base                 │
ArrayListRemove100-8        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ArrayListRemove1000-8       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ArrayListRemove10000-8      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
ArrayListRemove100000-8     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```
@PapaCharlie
Copy link
Collaborator

There's probably something else at play here. Can you put your copy and append code here? And when you run the benchmarks, can you run the profiler (with -cpuprofile) and show the results here? You can get some graphs with go tool pprof

@PapaCharlie
Copy link
Collaborator

Oh and can you show your benchmarks? All of these will behave differently depending on how you're benchmarking them. For example, how many elements are you adding, etc

@ksw2000
Copy link
Contributor Author

ksw2000 commented Jun 19, 2024

I use the benchmark in /lists/arraylist/arraylist_test.go

  • ArrayListAdd100 adds 100 elements into arrayList
  • ArrayListAdd1000 adds 1000 elements into arrayList
  • ArrayListAdd10000 adds 10000 elements into arrayList
  • ArrayListAdd100000 adds 100000 elements into arrayList
func benchmarkAdd(b *testing.B, list *List[int], size int) {
	for i := 0; i < b.N; i++ {
		for n := 0; n < size; n++ {
			list.Add(n)
		}
	}
}

func BenchmarkArrayListAdd100(b *testing.B) {
	b.StopTimer()
	size := 100
	list := New[int]()
	b.StartTimer()
	benchmarkAdd(b, list, size)
}

The append method

func (list *List[T]) Add(values ...T) {
	l := len(list.elements)
	list.growBy(len(values))
	list.elements = append(list.elements[:l], values...)
}

The copy method

func (list *List[T]) Add(values ...T) {
	l := len(list.elements)
	list.growBy(len(values))
	copy(list.elements[l:], values)
}

I will run the benchmark again using a profiler (with -cpuprofile) and show the results here later.

@ksw2000
Copy link
Contributor Author

ksw2000 commented Jun 19, 2024

The for-loop method

func (list *List[T]) Add(values ...T) {
	l := len(list.elements)
	list.growBy(len(values))
	for i := range values {
		list.elements[l+i] = values[i]
	}
}

for

The append method

func (list *List[T]) Add(values ...T) {
	l := len(list.elements)
	list.growBy(len(values))
	list.elements = append(list.elements[:l], values...)
}

append

The copy method

func (list *List[T]) Add(values ...T) {
	l := len(list.elements)
	list.growBy(len(values))
	copy(list.elements[l:], values)
}

copy


The left part is the append method, and the right part is the for-loop method. Maybe the memmove caused the difference between these two implementations. Similarly, implementing with copy also leads to a path that involves memmove.

image

If runtime.memmove appears frequently in a CPU profile, it suggests that a lot of memory copying operations are occurring, which can be a performance bottleneck. This often happens in programs that do a lot of slice manipulations, such as frequent appends or deletes.

@ksw2000
Copy link
Contributor Author

ksw2000 commented Jun 20, 2024

I also found there are some weird benchmark in arraylist_test.go

The benchmarkAdd(b *testing.B, list *List[int], size int) will add size elements into list.

func benchmarkAdd(b *testing.B, list *List[int], size int) {
	for i := 0; i < b.N; i++ {
		for n := 0; n < size; n++ {
			list.Add(n)
		}
	}
}

The BenchmarkArrayListAdd100() initially creates an empty list, and call benchmarkAdd()

func BenchmarkArrayListAdd100(b *testing.B) {
	b.StopTimer()
	size := 100
	list := New[int]()
	b.StartTimer()
	benchmarkAdd(b, list, size)
}

HOWEVER

The BenchmarkArrayListAdd1000 adds elements before the timer starts.

func BenchmarkArrayListAdd1000(b *testing.B) {
	b.StopTimer()
	size := 1000
	list := New[int]()
	for n := 0; n < size; n++ {
		list.Add(n)
	}
	b.StartTimer()
	benchmarkAdd(b, list, size)
}

It also occurs in BenchmarkArrayListAdd10000 and BenchmarkArrayListAdd10000.

Why BenchmarkArrayListAdd1000, BenchmarkArrayListAdd10000 and BenchmarkArrayListAdd10000 add the elements before calling benchmarkAdd?
This code causes the BenchmarkArrayListAdd1000 to measure the time for adding elements from 1000 to 2000, rather than from 0 to 1000.


In fact, the action of benchmarkAdd is strange too. When running the benchmark, the code in for i:0 i<b.N; i++ will be run more than once. Thus, before doing list.Add() iteratively, the length of list.elements may be larger than 0.

@ksw2000 ksw2000 requested a review from PapaCharlie June 25, 2024 12:45
@ksw2000
Copy link
Contributor Author

ksw2000 commented Jun 28, 2024

Hi @PapaCharlie ,

I hope you're doing well. I was wondering if you had a chance to review my PR? Please let me know if there's anything I can do to help move this forward.

Thank you!

lists/arraylist/arraylist.go Show resolved Hide resolved
@PapaCharlie
Copy link
Collaborator

Thank you for this and for the in-depth analysis! I'm still very surprised by the result as a whole, namely that memcpy is somehow slower than the for loop you have. This is good to go 👍

@PapaCharlie PapaCharlie merged commit 454720c into emirpasic:master Jul 18, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants