From 95653904a116a8220972108a94d70a15827f3c66 Mon Sep 17 00:00:00 2001 From: panbingkun Date: Thu, 12 Oct 2023 11:08:43 +0800 Subject: [PATCH] [SPARK-42881][SQL][FOLLOWUP] Update the results of JsonBenchmark-jdk21 after get_json_object supports codgen ### What changes were proposed in this pull request? The pr aims to followup https://github.com/apache/spark/pull/40506, update JsonBenchmark-jdk21-results.txt for it. ### Why are the changes needed? Update JsonBenchmark-jdk21-results.txt. https://github.com/panbingkun/spark/actions/runs/6489918873 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Only update the results of the benchmark, ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43346 from panbingkun/get_json_object_followup. Authored-by: panbingkun Signed-off-by: yangjie01 --- .../JsonBenchmark-jdk21-results.txt | 153 +++++++++--------- 1 file changed, 77 insertions(+), 76 deletions(-) diff --git a/sql/core/benchmarks/JsonBenchmark-jdk21-results.txt b/sql/core/benchmarks/JsonBenchmark-jdk21-results.txt index 3b48a59e660a1..f0e19c0ecf9af 100644 --- a/sql/core/benchmarks/JsonBenchmark-jdk21-results.txt +++ b/sql/core/benchmarks/JsonBenchmark-jdk21-results.txt @@ -3,127 +3,128 @@ Benchmark for performance of JSON parsing ================================================================================================ Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +OpenJDK 64-Bit Server VM 21+35-LTS on Linux 5.15.0-1047-azure +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz JSON schema inferring: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -No encoding 2855 2912 65 1.8 571.0 1.0X -UTF-8 is set 4699 4723 31 1.1 939.9 0.6X +No encoding 2944 3061 191 1.7 588.8 1.0X +UTF-8 is set 4437 4465 26 1.1 887.5 0.7X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +OpenJDK 64-Bit Server VM 21+35-LTS on Linux 5.15.0-1047-azure +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz count a short column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -No encoding 2946 2952 10 1.7 589.1 1.0X -UTF-8 is set 4557 4580 32 1.1 911.4 0.6X +No encoding 2545 2567 31 2.0 509.0 1.0X +UTF-8 is set 4020 4028 9 1.2 804.1 0.6X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +OpenJDK 64-Bit Server VM 21+35-LTS on Linux 5.15.0-1047-azure +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz count a wide column: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -No encoding 6977 7229 433 0.1 6977.2 1.0X -UTF-8 is set 6373 6394 25 0.2 6372.9 1.1X +No encoding 6786 6939 264 0.1 6785.7 1.0X +UTF-8 is set 5668 5680 11 0.2 5668.1 1.2X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +OpenJDK 64-Bit Server VM 21+35-LTS on Linux 5.15.0-1047-azure +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz select wide row: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -No encoding 15128 15242 148 0.0 302554.9 1.0X -UTF-8 is set 16572 16678 143 0.0 331438.1 0.9X +No encoding 12016 12190 274 0.0 240310.5 1.0X +UTF-8 is set 13209 13266 50 0.0 264186.2 0.9X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +OpenJDK 64-Bit Server VM 21+35-LTS on Linux 5.15.0-1047-azure +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz Select a subset of 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -Select 10 columns 2698 2717 30 0.4 2698.0 1.0X -Select 1 column 1713 1722 11 0.6 1713.3 1.6X +Select 10 columns 2433 2436 5 0.4 2432.7 1.0X +Select 1 column 1675 1678 5 0.6 1675.3 1.5X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +OpenJDK 64-Bit Server VM 21+35-LTS on Linux 5.15.0-1047-azure +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz creation of JSON parser per line: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -Short column without encoding 837 858 33 1.2 837.4 1.0X -Short column with UTF-8 1151 1156 4 0.9 1151.4 0.7X -Wide column without encoding 7283 7353 79 0.1 7283.2 0.1X -Wide column with UTF-8 8935 9006 109 0.1 8935.4 0.1X +Short column without encoding 714 725 15 1.4 714.3 1.0X +Short column with UTF-8 1020 1024 4 1.0 1020.4 0.7X +Wide column without encoding 6743 6807 73 0.1 6743.2 0.1X +Wide column with UTF-8 9714 9734 19 0.1 9713.7 0.1X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +OpenJDK 64-Bit Server VM 21+35-LTS on Linux 5.15.0-1047-azure +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz JSON functions: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -Text read 80 83 3 12.5 80.0 1.0X -from_json 2247 2276 41 0.4 2246.5 0.0X -json_tuple 2205 2214 11 0.5 2205.1 0.0X -get_json_object 2111 2115 5 0.5 2111.2 0.0X +Text read 74 75 1 13.5 74.1 1.0X +from_json 1691 1703 13 0.6 1691.2 0.0X +json_tuple 1830 1849 22 0.5 1830.3 0.0X +get_json_object wholestage off 1761 1767 5 0.6 1761.4 0.0X +get_json_object wholestage on 1648 1656 9 0.6 1647.6 0.0X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +OpenJDK 64-Bit Server VM 21+35-LTS on Linux 5.15.0-1047-azure +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz Dataset of json strings: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -Text read 332 334 2 15.0 66.5 1.0X -schema inferring 2319 2321 5 2.2 463.8 0.1X -parsing 3706 3735 49 1.3 741.1 0.1X +Text read 303 305 2 16.5 60.6 1.0X +schema inferring 2336 2346 9 2.1 467.2 0.1X +parsing 3154 3175 26 1.6 630.8 0.1X Preparing data for benchmarking ... -OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +OpenJDK 64-Bit Server VM 21+35-LTS on Linux 5.15.0-1047-azure +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz Json files in the per-line mode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -Text read 811 817 5 6.2 162.3 1.0X -Schema inferring 2964 2965 0 1.7 592.9 0.3X -Parsing without charset 3803 3806 4 1.3 760.6 0.2X -Parsing with UTF-8 5557 5563 6 0.9 1111.4 0.1X +Text read 739 750 16 6.8 147.8 1.0X +Schema inferring 3175 3187 12 1.6 635.0 0.2X +Parsing without charset 3359 3370 9 1.5 671.8 0.2X +Parsing with UTF-8 4819 4828 11 1.0 963.8 0.2X -OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +OpenJDK 64-Bit Server VM 21+35-LTS on Linux 5.15.0-1047-azure +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz Write dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -Create a dataset of timestamps 198 203 5 5.1 197.6 1.0X -to_json(timestamp) 962 974 12 1.0 961.8 0.2X -write timestamps to files 859 872 14 1.2 859.3 0.2X -Create a dataset of dates 183 192 8 5.5 183.0 1.1X -to_json(date) 770 776 6 1.3 769.6 0.3X -write dates to files 614 631 22 1.6 613.8 0.3X +Create a dataset of timestamps 138 148 13 7.3 137.5 1.0X +to_json(timestamp) 917 924 12 1.1 917.3 0.1X +write timestamps to files 873 883 9 1.1 873.1 0.2X +Create a dataset of dates 153 165 10 6.5 152.9 0.9X +to_json(date) 683 689 8 1.5 682.6 0.2X +write dates to files 598 605 8 1.7 598.3 0.2X -OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +OpenJDK 64-Bit Server VM 21+35-LTS on Linux 5.15.0-1047-azure +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz Read dates and timestamps: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ----------------------------------------------------------------------------------------------------------------------------------------------------- -read timestamp text from files 217 224 7 4.6 216.9 1.0X -read timestamps from files 2614 2645 48 0.4 2614.0 0.1X -infer timestamps from files 6395 6411 20 0.2 6395.4 0.0X -read date text from files 192 197 9 5.2 191.6 1.1X -read date from files 920 923 2 1.1 920.3 0.2X -timestamp strings 209 215 7 4.8 209.3 1.0X -parse timestamps from Dataset[String] 2799 2812 13 0.4 2799.2 0.1X -infer timestamps from Dataset[String] 6517 6537 19 0.2 6516.8 0.0X -date strings 278 289 10 3.6 277.5 0.8X -parse dates from Dataset[String] 1251 1252 1 0.8 1250.9 0.2X -from_json(timestamp) 4256 4260 4 0.2 4256.0 0.1X -from_json(date) 2716 2731 19 0.4 2715.9 0.1X -infer error timestamps from Dataset[String] with default format 1838 1855 15 0.5 1838.5 0.1X -infer error timestamps from Dataset[String] with user-provided format 1846 1870 33 0.5 1846.3 0.1X -infer error timestamps from Dataset[String] with legacy format 1822 1857 34 0.5 1822.3 0.1X +read timestamp text from files 186 190 7 5.4 185.7 1.0X +read timestamps from files 2596 2638 60 0.4 2595.9 0.1X +infer timestamps from files 6351 6355 4 0.2 6350.9 0.0X +read date text from files 175 177 2 5.7 174.7 1.1X +read date from files 843 844 0 1.2 843.3 0.2X +timestamp strings 196 199 5 5.1 195.6 0.9X +parse timestamps from Dataset[String] 2903 2907 3 0.3 2903.2 0.1X +infer timestamps from Dataset[String] 6634 6638 6 0.2 6633.9 0.0X +date strings 260 263 2 3.8 260.2 0.7X +parse dates from Dataset[String] 1253 1259 6 0.8 1253.1 0.1X +from_json(timestamp) 3891 3900 8 0.3 3890.9 0.0X +from_json(date) 2089 2103 13 0.5 2088.6 0.1X +infer error timestamps from Dataset[String] with default format 1717 1729 17 0.6 1717.2 0.1X +infer error timestamps from Dataset[String] with user-provided format 1722 1728 9 0.6 1722.4 0.1X +infer error timestamps from Dataset[String] with legacy format 1705 1708 5 0.6 1704.6 0.1X -OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +OpenJDK 64-Bit Server VM 21+35-LTS on Linux 5.15.0-1047-azure +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz Filters pushdown: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -w/o filters 18911 18925 13 0.0 189110.9 1.0X -pushdown disabled 18841 18860 18 0.0 188411.4 1.0X -w/ filters 1015 1033 16 0.1 10153.1 18.6X +w/o filters 18530 18533 5 0.0 185299.9 1.0X +pushdown disabled 18343 18365 24 0.0 183429.8 1.0X +w/ filters 828 833 6 0.1 8279.8 22.4X -OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +OpenJDK 64-Bit Server VM 21+35-LTS on Linux 5.15.0-1047-azure +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz Partial JSON results: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -parse invalid JSON 3721 3848 201 0.0 372114.6 1.0X +parse invalid JSON 3262 3291 47 0.0 326246.2 1.0X