From 2968e8070f4bd89c20ba1e50610c336450c1cd91 Mon Sep 17 00:00:00 2001 From: Umut Date: Thu, 27 Jun 2024 15:06:06 +0300 Subject: [PATCH] docs(frontend): add performance tips section --- docs/SUMMARY.md | 14 +++ .../complexity_and_timing_per_bit_width.png | Bin 0 -> 29995 bytes .../improve-parallelism/dataflow.md | 102 ++++++++++++++++++ docs/optimization/improve-parallelism/self.md | 11 ++ .../improve-parallelism/tensorization.md | 55 ++++++++++ .../composition.md | 65 +++++++++++ .../p-error.md | 31 ++++++ .../optimize-cryptographic-parameters/self.md | 5 + .../optimize-table-lookups/approximate.md | 36 +++++++ .../optimize-table-lookups/bit-extraction.md | 38 +++++++ .../optimize-table-lookups/reducing-amount.md | 94 ++++++++++++++++ .../optimize-table-lookups/round-truncate.md | 37 +++++++ .../optimize-table-lookups/self.md | 78 ++++++++++++++ .../optimize-table-lookups/strategies.md | 98 +++++++++++++++++ docs/optimization/self.md | 8 ++ docs/optimization/summary.md | 15 +++ 16 files changed, 687 insertions(+) create mode 100644 docs/_static/compilation/performance_tips/complexity_and_timing_per_bit_width.png create mode 100644 docs/optimization/improve-parallelism/dataflow.md create mode 100644 docs/optimization/improve-parallelism/self.md create mode 100644 docs/optimization/improve-parallelism/tensorization.md create mode 100644 docs/optimization/optimize-cryptographic-parameters/composition.md create mode 100644 docs/optimization/optimize-cryptographic-parameters/p-error.md create mode 100644 docs/optimization/optimize-cryptographic-parameters/self.md create mode 100644 docs/optimization/optimize-table-lookups/approximate.md create mode 100644 docs/optimization/optimize-table-lookups/bit-extraction.md create mode 100644 docs/optimization/optimize-table-lookups/reducing-amount.md create mode 100644 docs/optimization/optimize-table-lookups/round-truncate.md create mode 100644 docs/optimization/optimize-table-lookups/self.md create mode 100644 
docs/optimization/optimize-table-lookups/strategies.md create mode 100644 docs/optimization/self.md create mode 100644 docs/optimization/summary.md diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index bb5512a3f5..5129171941 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -35,6 +35,7 @@ * [Simulation](execution-analysis/simulation.md) * [Debugging and artifact](execution-analysis/debug.md) +* [Performance](optimization/summary.md) * [GPU acceleration](execution-analysis/gpu_acceleration.md) * Other * [Statistics](compilation/statistics.md) @@ -46,6 +47,19 @@ * [Configure](guides/configure.md) * [Manage keys](guides/manage_keys.md) * [Deploy](guides/deploy.md) +* [Optimization](optimization/self.md) + * [Improve parallelism](optimization/improve-parallelism/self.md) + * [Dataflow parallelism](optimization/improve-parallelism/dataflow.md) + * [Tensorizing operations](optimization/improve-parallelism/tensorization.md) + * [Optimize table lookups](optimization/optimize-table-lookups/self.md) + * [Reducing TLU](optimization/optimize-table-lookups/reducing-amount.md) + * [Implementation strategies](optimization/optimize-table-lookups/strategies.md) + * [Round/truncating](optimization/optimize-table-lookups/round-truncate.md) + * [Approximate mode](optimization/optimize-table-lookups/approximate.md) + * [Bit extraction](optimization/optimize-table-lookups/bit-extraction.md) + * [Optimize cryptographic parameters](optimization/optimize-cryptographic-parameters/self.md) + * [Error probability](optimization/optimize-cryptographic-parameters/p-error.md) + * [Composition](optimization/optimize-cryptographic-parameters/composition.md) ## Tutorials diff --git a/docs/_static/compilation/performance_tips/complexity_and_timing_per_bit_width.png b/docs/_static/compilation/performance_tips/complexity_and_timing_per_bit_width.png new file mode 100644 index 0000000000000000000000000000000000000000..712c05b34d2898a3c6b2432829b2e70b324c0112 GIT binary patch literal 29995 
zcmb@uby$?&_bxm%NK1=|f`W*&bcxcf2*}Va-OT`s5+Wkq@ez>j?(UNA?(Q7o-2PS#-}&$5k_#uEXZBuuueI)VuX~TbqPzs|y$AOo5D2c6r09DH1lbD$L7Kxv2hVW! zPc4Bzy!K)r?Uk&I?45LN4Iy&6_SWWB_U0yfj~xwd?M$pJIhc8wSzkW>WN&Y6$H&5A z@t-#^TiF`3yh4>M1vkO6mQ=HYK=5@DzewrAX(kYeeUy}_$Oq@R?P(|HFy)!n{oU@T z9P2f|#Pk07Msx4EUf%PC-#gIJ(ibm2Iu0K@Z2vZEnIUmgLqioqW%&NVM0)oZ!Q&cw zj_BysD_T|pOah&d=%=Tvl`GOK8Fl_|EPGvgdFh&1aYVsSN((Ez5<&=mOf>F!(SRSO zAIM35etweJ$&lyZ;Xvba$anC#nDzhXmz}-7w*+)lFRO6Hv47>%ta`_%-}+8F*5(?Ou^~L&9-KiGJhZkTRw&uhT{JGrh=8+O?{~MRJs9qNMGMVG$p|+^D4evI+ zr$pAxrSBK~Hl@9eG_BkZU~WresArWBPa-Y@Ox?FWko@GISC+osQ)Jq|m*BfL;SqY9 zL;hdgcAgwHBSDB2E`IKD^gh1$&Irl>lll&J*t1jzXYNMxf;Ii6BWw0dhE*759;UN*FG>zxU)JrYWZ{6n^iJx&66)GhD6{nE}6+u7V6TKLoQFhlFzD_ zTN_iYs@-CsGD8Zkn^l4c+!Gf=`}ndLK6}HeX6nmpCy5E63kB2OeRiE&WxcXYNI)fb z{sj$!`X`mswFiur2ID=rs?>72ez{TkmxAQYwXfiaK>rN)T zOC|8A_+!p1&)64zlqxbbnfBuETmBi|*l;4uQtY@h)!~JNB2YKDyhjXa7~GBx3llXp%?!#ye7t_DM8$@L@U#D8DDNP| z;IWA=F$A&KpyX(;|9^Y=LCeZ;?05<1tq_3lK5cuug_j?mK)YyGn|7kBcbff-tT zyDt5C`2!?j9Iv)6>nfW_RUQ8yd`oPORa8~eJ1*K#=mciCKL~KLM78nWT@PheH{Mh_ zTCHx{q6%*z3u&S4)&*7l{^EDoE4aj_-}dtvJNv+)w5rTMAVzw#;&|_JWI@PB*c#i0 z+qHULKu8r5-t3s=gR>s%N z>rdS$F3tLuS_9AEA(adR`aUGKBROWrz9&Bwq8d_wE{jTYbP})62^| zrBFzB-yYT4bV)q4m-`#=P$+IGt!<&Z zCza1F&DgVFRr~W`$4=WL<5T@0x6)lfEwCu=i?OpKZx!j~@p5U{ha8uPQu@J-a0G22 z(<8aOOV=oGH9;IOT?+y4U3$)IYa6B8jKZoXsY55rC(&N&vM1k`<$j|YJ8%F z{l0jIjY!DO-?GZ1GFm=EG^QA_LObfJJn8e=SNs{w$YDnGwRMXI1EGx-D>dJi!pAoS zgxJSA(+Bz3=SxK{c~*$UPh^%8Z{Xg%n|6Y2)x5?W)ZLSvp-pbAHbXjJP*BQOuD*Rx zu6nSkKOAtI{GD4?cK;Lz3bB@zQ0?xvz^G{aycVcbmc)V=5Bh317IIHb}tsugy0SJPk|l^85)i;x^y$u`qRy z_RYu-%60=v!i(0GY_W0zHa7V3>mQ1iG%eM_S07+#ahwuNch>&C1xRh%eO76O%zto6 zn|fNj`Qh0=hqF{eLqZl@tcu3*?2h`ei_rI);?op&tl6#fg@QcmVv3^(&>j7bH((E9 zZq5CWtZ93t%24gsgiokmH%3V&=bybjYY_f^I&0Ybkpzs}|Md!*Vuy<`Im0<(<%*{{ zewVq~KBctlbcUUXgIhfldf9Fl_|-27Ix4HcXDb&db?YkaHVoS3eu=bZE7z24udXm9 z;236J(T^B{yCb5Q zLyDbiTVDTQktrujRftw>Nw8K%aIrsD%e*a{nFTXZ#P*#3ydZ=%T2@22oGPz->h)x~ z;(*>4m+Q3Sr{ZHg{DS?)iABt}9%MM5Gwu7YPV_oG;`(OixminAFnD^lT2mX=3f 
zHY8uRgxRVs>~*#m!#qbN7GAQ#o^4<5{AoejkyMqC_|rdc-Ji2)B{byEbR_6o@|FmO zaO`U&(P2?I3hya$cQc@BSS(+1cW2V z^C%vF4kJ}Xa^eR5(WAxeBNd^Hw{=}XKNFBZSP$0ca@%mf=S}1Hs*0z??hA;KuD9De zSpR*E9ler_nRrz~VuBav>w@1Ie?1vMROO|sqLJ`DV7^gHU;w4(dfX5n z{7;*cdQ0gHw+r^phX0|gW+_$LGcWmhqsr2QJj33l_J2PptS6zTuYfbhlNR0K$cEGQ z<-gCHpY70J|1iU^7lXZ8w^d5H-R8$DFQ3o(Zj+GHaa^gRh2AIG+6-ADzYUC%HQ}=7 z32Ge_5DoTxZ>xY{{Xch?lK~!z1jTFCd!<&9=`T|Pxyv5 z7QZijydAJ)4Y7Pw`__;UM8o!t`Yb7Jo7mmd+pQ};vbr%;z0t=isE?k$57P{Jh;a*% zcz++b9G>%LQ$bdbBx%M>_ie;%1@30zcR(+%CogRU@i~<_1QV>jdpcjk(6C#rAn?9A zd4Y%#x|83rgOqLDzpqJqUIwfA3qs2C6&@gww)p;$zd-|lY$@MSKBoIJB?iJ%t?(Vd zIIG*76a3cJz=++v3o0?uE*Ws(Ahm#Yw6?a@P!T_P-4fLmAEsqDI@> z*W0lhi>A%7cVm6Lllfw#4_d!5<{0IrTN4Uw;v9SzP*RRo6Qd~xDZE6e^rU`Z9@vO2 z$lVR<*KhyOsPGB0zjHw=lDD=s&FJmWT~3T0f7E|3tEIU@5PDwubf5j!xEZMi2gF9H z1`uoUGEtf@$T0O^&hd9Cg4{=BzS+N9K7-=Q&BmwFGnV%Div9dNwWbhcGM6XszD~Zx z*}lKK8Ipp8fD0e1qq@~C4VxI=ejF^JOyw|oxF2`$EnrMQ57o@=L~n6?Rq<&1t9Uvn z-BQjOVt9hpOgO!Tdv~K{5$T*n>g`??WbHo%V7gWte_>&J^)l%jj5x_Ee4k8z3) zdj`CW*X)4F>j?$Lhi+{}GxhUo7gS?|vpjg23Z{Jn@2chu?xGk`i`r?UeFupy*g~K zf9Ua&CMP(3rxGuZ#`yY7gy$!HUdn*?s%3sG_Op04oE>Bt*kZo$MN6udS?a;(SqG#5 zfaw+)67sEf>Xi6e2+5528z}}J-a2ow;?9BX_cR4#M^kDDbMM~)E2eJ2GH~TZ5&_EK`g|5gii#MW;!oCHA)>WzK#!1kWBB{^+A_JxkaHSpMkv?G|T<8 z-?Kv;Zc1N$zW|HMq`2=Cszx~axiJaz_giF&0p6L>XXnilhh+yNaXm{B?CcG*~A`(a-Zt((itxmPy`R3i?KkcA5@pm?d#e`sXhQw>Y&}x*UaMbxKf|F2!T|bfK`JyVb_zRf z{LQG!yuAs@&Sprp>G8&JO3D^?i9^XXlBn-T5Qac`U}n+BR7B+Yb$M^*+K|Yqd!{*! zJ3+E6#joknBp=t;C`Cr~!!END;)|HOTVon~2pQI0Z>YhYoD)}uFE9umJ(3p{{krOq z4*7)3I!OC0#08E!XI+K6V$q#u6`TH6>!yb{)_Hj~@JF2oq~<+7Y)XqifRQ2F`WL|? 
z`(}O0(7PM^g)k_!Cp)$r4;7wuw&mYBcSRFaN(RYvsis0b!x$`)@9Qr%J+Y8Mu-tH?u=zrZwtt#JPzcv7Va;K?XiIJ%{K zp$;WpWURWdCUlc8xo@pm)7YU^kE_7Q7;E-r$M2NP7_du$uPyErRq9~xnYBi21cnP_BRO=yr0tvZepT7 zK8U>C#vH0bR*!n81dPgd@ay-4nB4Y!Z-dt>w-zG$0=_>?RiJt)-Os12`0x+%2JWY{D7*%6}jetv-MGIR{1!{S$9-mWj`c$d~`Pw&59-2_ks~ zu)jCWF=J%9b6SoMuWV?m z%Pe#0cV{FezD47y<>l@j>-!KgvmIexuYaExu%Y39tM`#A;>S2@lj}1C8sfN{u;lU1 zbXZ?v-_}q}xG=hdAMF#G_BLhzy1Q5|E={M zS*oTSWN1yBm-sfJ7y+vJc{KaCaGUK8PBcP7;`+bL`-rK9-AzI9WHYb)`}e3po5zsV zQ`tsRoqZeSg}ghnd|vP;vWa5qmfOHehpXPz7AgLAq@?fp-0%X3s@|D?{6s?7?7t0Y zrie4P+YhA6UGk+)s*KSdP(Yje{llOe@FH3FJ#TUTKr@L#oT$rd0&JQIQ8C;3hrovd^fRl zU}dDJK^NAf2I}{occ1brD?4sroumC;fN>7tcSM>`0BK$(2J>!O=|^datS=bPQ`<}+ zIP1g`=21Ou%2G=6kx2P-g9_u$7Zlms!?8?}n)1uko)53j_7}(PcU-c!;~RYbNh@&M z`Z(q2Myt#SMt%03UcMP(lzM{p>g?^$pR;K7A=Dm^SJ#dGC^^47&U(C6jAS;i+oy{= zKX-VfE#PvGx3BK?RQkK(n`^g_1m`ay)PF+YjRAog6$tQQZLpoB*H1GdQAXEA@!C4+ zThRa!Of%xu6Av=Tq3}b6q4r==O+IAxwx7kWM}*xXBE+ymi}*HGW_5Av*;-M&Dgrap zd%E=abPHdw(=QME#B(*kPP7S=y~gv>h9_dwk53a}m7tv!J~h$1d;xiYhoPLMLdf?i z$vC#doX&pk^@~WByu-LTAy_)Jq;pVitGzpAD^eT4=A~b!1(VxhVPVc%si~=2R}Vp? 
z!fj#20n=|ZWanVaq)qCtvpu@pNW|`Kq_M8ZZ*Co#u=EdTsWp^|Uqy^eRriT{UiCd3 zlsNg{x#e-?>en{y@E=tC5fap}q1aRazFvLrH0-i5pKSeJoIv~`)(@-1RyxgDJMx*_ z5A71jqX1Afw7bDApjWF;CSZrS40+YE<8j*Uu&d*@35WG4_=*}DTP`_THI;;n0EX~7 zwuG8Sc*IVgYCxFRq5<6LxZ#TwUr}Z9Lh65~Q!vM;Q}4^jlIT@l&ezugJ2#|Zsq|Y5 zrbWWs7@$Zl42i#Si7HpOY%mZy!~fP})MPZt{Q=6p)p*cN^CxFY z?AtV2eN6nbY93gP>h7|tW<7*(H8H3t>DQF1R%>{yZ;m=QwC=Dp4ZEDrjXvsRd8Az3 z3Mtfo%7$fes)kZV7H2*U_nAjG`K&wWGX|s2b$2D3?`TM)ct>kfp?^A9U0I2y9<|*a z*DT);JF&m%3rg7b=4;;XueQ*tiJV1y?G=@k`*bCfadT>PWiiOKh8qo)z!Uy_!HRmA zJQWc4XQKN{40E<@o^$6Ps{T=bj~Y+Z1boH0zfv=`+FdNsyR^**%8+jDe5~a$ihy4< z7?b=LPV%ut5c1px>K=Jf#mBs<$`}^f;hJx<{2u4c$ROf&bDwgKW&ms(YTLB<5&Foh z1C@*c!o}a@O229|(_9Ai2u5(j#hpuBwO%<4UG7gN1=J(Gt$Ip8aer+wGbSi##cDZ+ zH@1VRXpo8Fiv-Jtyg2kje49t@C;nOH9=++@7^!HtKjH^{c9kQ1GpsmTakUA3WXA8` ztI6Gz_6x988`nVhahm3Qg`L{x!~Fjiq=#%C=md|5KW;9|nIfpUeWhng zp3T-gV`FgQ!mjwQVTj#<(i4F`o@T!VxxBR3Xzu@=~?&lBd6JtBS$m6fd7 zf!&c>m%sEkRbE6gB~RRM?>mVh5%xLlpYoIl`QMer7+(-$w)LDXW%IXjQ;Q({qF^^a z<~bBWK?huaLXm{}NOhIviq--RyYA?Ciqk_MR|hg`mOKX}D}gH<8Wt*%p=Xfl zpxF2+(Vs9z%xrRuB%6h@}~BfNTf#?|;$JzQ2?)f27)JYuj7qOXkBjQn9Bo;* zha&dO=l`>=ft`NG$3D=Y)vZgN1Ra?ada8;)Zne;cPO281-Ox80QMgJd*Q+Obkm`si zv%FN>;TPZOWerY=6bd&B+Wf2MkP*|jKHZtA3-k*`EfgK50wxd;SCtfZ z=0H`~!BjHH#QSA`6#ubM^&f0!iFbxMiEw)C(s}EOisQl5l3b39AM#TUD#noik<3@j zTSdLu{bKnezA}Q|_w>_Ve?46jk50_G{)Y-vk`^#PApdAEr#n~(@PBYlX1+b?_A>)o zmykotlDzqSd8Ryd)ZoAQ|5|1VP+#W*T8S2}efn-BmqYdF9D}*6A=}NDtKBufN6PL@ zb7nKx1hmj>DyLB~pqTkX3E74%O~!89T3TjEQ+x;PZ!mCw57Xikd9+eQA>$_$OvJ=U zH=nP%ZAO56)YCh&Gb@&+^*yUS$;mXOQ5$ za*jHn2VXG`S2?~D@1I30!z<#7h;Hd3-iGNa=rW;jnqqlK|V|ZBpes@u| z10_hjMpazEPE5M zAbvj;YC*J6`mI zbx*l}pA}z5|Bl-f606zJHs@@VtikNf>m1l0F9@>Id&%-$?KZdDx#T!Eib5O{c;n>G zNyd=7;|8zV?b@UU9pA%0_MMurJ$WjDPW5UzNSEAzOx5MMwv7r>m7|7R-^$UOvVBua z_n7aZJOF4-Or&splZ_s5>d3D)a#$L-69W+7fw6{Xb)Xz){|{C zKoyZ3`;&rIhZH0-~+xVt) zr5j1dC`(K6eIr^;NcGGD?X{V*Bb(wNaycLdKS#i}y*`t2Ooz~vyGX*{#=*>gGD3D{ zx3idaz=V=j*_Yj{`Pfd^S`;Q3A+IjVzMU_ue1=THkMBKt6fyI>&%@P%=2O|UM>}ko 
zmv^Qz5D^Fqop$PkeYD<~gomcIKSnPO+_!kQK-3qwHT?<2vzioA_MSX}0m~d@Taw(RWlAxlp(x%`-K(22!&oz7Q<#+1L1o6S)`J5A}Ei%Ku3z^GJ ze+<9IXs_KB(z~%KDzl_Tn2lH=lj2f0&K13giwtSwI=pqh#6&0Lkw;X%l7ivtTl{xi zZg{;@8r@Hx6klz~eBpCC8L2j=}b_dITAh z0Eom0jpN0Z2U0)mPDwQio<6?>*mCF}_iN-hUS1$-I(>PiGI-%M_kw^1x513b3-BA` zHoL+$E6X7#JNzGBYRQgRlo7&uhM)eR@McgfiUx(uR*BP|?^X5u%Q4Um7r3EknvAHn zU9sw_| zE_9lL3Tx>3>A{TXUu9(j;j_nP@FT;+d|hBvtoq!x#64MD#3~N(78a(TT{*EvM}@~A z)&J_d)w2G|NhR=m&h!$I=$N-FPm7OCmc~hCKc+Uwiy{Q3!prVg8gmb{=#SO z{f=+4Nl7g`jOQaRp>SC`VO+1=FK=EJ6ku35sj7^8U;;C0RA$zcK0euwdi&<&o#hf^h_|bKn)yi_jPm4b?eDxB32SkmyTvddQOI7u#ub7; ztku>$%#Muk2(2Xd`K(JWA+$&CvYq%4%V}C{MUGp4wwSPK`8S9v*5*^9EeFfR=WDc9 z(}>oGl&z}rJJ1Y*WML#o%v|2rdy^A4#n6kG!MTwXsntzkG}HMqq&NtBdF9DAML+j& z*6BVJTSv0aqd1laNrTxuo?q#DQUFY=`0CQ`i<|wwmPsj$=CFwjCya{gF3x0yFmHj5 zK}%Lvp)}FSPG0PJoj)VR1ZTqN^++V*SBbyvaPDh(HGI=3Z>B2?pX#G}8QuDRTaAkp zA_@KTsc*n)B!GD(O}Ppw@H%`#D6T9VXkF$8Jm64xue3r6ke9IZeC!KMD?Xj50YR#= zvvUKM{p{t|UDz~GUR_cYI^~9V>U z^tM*bs0~{EF+C2?0j72nB-GzBOO>iG2N4XiLl|bgC*`cg(S;3EQLJrtiDhG+y*(M^ z;`+LKGt~j4Gb0m1hGJaXdT(o7^A_)U<+OROrLrR~r3V{jX7I{PgC}m>P5cILH+e@p zV>_3F-wwXo^|3tF>)H*9>^ZB_ULELCeNk3>H;TVVDAmfoh^T3xU4#z5IF8CzS^0C+ zT477HE=6cBi+So`vS8f18Qt8xMlcgw6`iQEi=3qIPEI6&`$s}L1ytU{B9C(y^uAx8 z;-2)xLh>EfB!~QDfDF{-kPBc4xz?G~*+zNc_>RqcNF~=D{~XH(E~PSFygGwbbDdP{ z?dx$R*wCRRx__xkK?Q6B_+NJ`N8fL~{e|R8m4>v$!u=zTkCR#4>daMfox;~VTl={o z_ZaYWK#XYb0$;FwK zBd5M@E}^4Ph?C$658-OOM3%&(R)YeFPPxXG>k%8cB@dA6s^)EmUGAaqDhR&?uRkew zV!7Aq*?dB2^}A_^8Z%u#gSTm32Y$@0;~)6MVTuR&{(aE=`}~ZiQJq>2wBw_~uK37Q zMUW!KJ{o;g`ivVl#(Z08N78xx{=b3A$6x;eCN-oGs>!lYcFQ>WOQfimqNaU_n)XElA4(0S(n3BRuhE9}M8EClp1tpMD2G9%eY9Ox{yo41er_CvjDnQ^$5f39 z)Ri?g({;gQTJ0&qrU~m0a#LpwgcGOj@4su{JYPPHl{YPhyD~x0Q_~j5s1e!UL zb6Aa#=b|!`MW9r=GT9XH93E^4r15bC444eEPB9P6xCQ`vCl$z(+uas>-p&|rjTMNB zi@R#AudlZqMT6oHFBM3*e{7hVYgV$BjG-f#X;B~4i*ojT)ieQ)&`aAkk;#A;nY_!< z(SYAUY?CQF76@;jpz5hmXofD`H^-`i=GKs-F@3N;APS|qI^mt~)=6A$CvONWf5vz; zw^Ow{qdNPO>fUD>ng!(+BC|@*CYDFNPF+qA*6N^2&*i@-LYq!Ww!TE_ucvEb+By+1 
z`p0@3EYNH1dHzT1<<-%Noz5=$$IQ7qb_gFD)hGf+7@@d6e7PHGq*X=j4vo;$f-zlosd|YFjo(YZe8a3 zn4l)=TIhsPhS&gq9^TZM%BnDmOjjuZ$UbkJfaNvss8p+i)UT?gw#wT8gTH4tyyr6+ z*r73)yggPK?say?*MwN)x0C#5v`0$~?$sp^z3g0zX`cVB%HbIh$nIoDH-$n*w_iN_ zTbM`1`d9A{;JvuBK|I;a@!EcmTqm0ilkCYsR&x!k0HU0G`xejJrW23V2 z4ptWD4WR~M>y+X1l{$kw}~jLyy<`Q}A%Z{pHNk#+jZ zh{lfhnt;ogsa%hFq6Tz)*oBbWT%Yq`-|K)a8v=WeRXQXe^IN%9dSw+keKDCG3$IUk zIxqwComLQ*>u+hol9YYAvXSLqvo3B0dj%4RYu3kN%+=obsHP^hn2~_r+NuOVu%_ed zy;?{zK5Iwo#ywF3pr@U-M|AnSGD5e*o`l>neG5ibm4UL?)AXPtm9xX(b{QFBiSI+sf$HAE!njkBu8cV0-$%N9)2{&ydgzSJq7<4}G zuKrrJOLrKdJ`TD%V|-4n#ES3%fBdQ4TLg5LT4Q%3rsd8Ss$u{VnRurNBfsCnK$(ja z>wUAj{W-Lm*rG8rB-WelGKNHcGP5eje2*)UTVgJzXk*L#1u1j(`E~bUG+d4O!Tw^VTJ>ZN!r8;OxR)#5 zFG1^T;UcVwHeZ`lwRj-4s7*-B^Z;8(ADpH5q9Bwyd?UD)BZDZqv6MY8KkIjq%8H0A zC0gI>($@Yuj!JeUec)hXOTODX-AIBmP@e7i00QK(9o;3o?#FI)Ww?UnEjdn5;+oT9 zJRuH;?I(F=Y_U?9_Rs^M41kbNvF${?@p^^lJJ2=t;y^*^koqg#9%$$v>8mLi5>@-Tzqp9>YiH58s#kOcCX z(C#nynBDAz3{Pkk77r4o7{}E#9R8k6KT{TXWire*`hhI31VR@C>r{_Kl;P~1Mm?>X z8nUnTXw;!;MJ}E0=Accwld!fq08KeMnovp=xZG?#5jV5OzeeHga7M!9S8&{CLM;LbWYQy)o<{{|Y9k{!T=y1Bvxdq{nS9*m-_A z?*C)nlgG1AFaqJIFm!~r`3y}YC_$HxQ&DS1(6$go>nwiZeApB7stwapO$Sa&pE&6g zZ~TSa?4#RS{Bdo zW;3y>mIu8+$OmS%#$5VDy;F|9UqgO9c_4V<_N3<;wF%##$N%?0{PBO*YLAB4TePnZqtnu+O~OEVr2tT57? 
z17WiH+obdU0ZyKWBM6fNi5@hDQ#@}J9?p%EI&Ne<9se?BGpdPb_)6{KI7xSoJUnY!HA_Tz^`;(gQ zThLhgj;lT4kv-K{T}JZeWKBSRfmz?X@M7oEodOU&paexI@3G2oX-XeWcjnM|pV@RX z$ZACNfc6(ES`3h(1OhD|R`J=oG(vlcyr73)D`Jkblj$u{i~z;T-?ZJO=8Y3ScZhZt z-F!QtI^Qp%@S^jyH2vR_j~T@>Rj4ecdSdNd{;1or6)Y$#fxryn$-lqnEmqxNX2Z@y z>kT?&W=_f^6?rxC*5ZqD=7`q#uTC;%y-^|s2!XKct9j$NhU^mFu!(rUcGuc^c+&uQAUlociY57|R~Q=H^oTr}!Z?%UCP0vE9{N zX*IhYt))}XN&DPuL{|hz{Fh7ITmay4SeNxR5q>Y=!v3>|bFf}i!&)e~$wwD&Kz0qx zp}3;6v%PYg3-D*C$7>=3p$a`*U5#E}nxXFL(U`37qeM_yQz zf^91+EVc$^IWAaNtcQ@xgA)kUsxRc!j6}W}AMIWPDaMP3%%Dm&>w}QO&Yz7qZsSi_ zZjJ^2W~sgN)CZb7GKA>3eeHBdFueP)J0mx;?WDu=@N2x|gMFvK{Vfh0Z;aZjP!a(A(J>B=Fn3FPCA&DQ z749{`Qxw;&-vMpSYKw#a>1qjt+^z{1Xi~V^93}dYL>OtP%dGw(qT~0~kUu|4uG+>>yN_Lmrfm_S{DO(ocN-A_`5^cQ z0YZKO9^GeomnA`?fM z%&4m8|DEPIaJ1m-HnuZ|%I-?t;0W?wd%iuo!;eGdjDc=M03M^on=8B!1l2&Ao1C)e=kpktn2(p2ZAua>w9B%>`n87y zPMwzhb1GbQ)MrUv^z;TL`|j;xdD>og)g8>CO^=1B_F@3BCE#Q#HoJQfB@qfjPpYI0 zx+xVbfhbACZ}AEFbi9nU!GQ-#VW@0HQ;XELzVp)Wo!S>lF&XW@UTakqvfNzRFx@2V z(IXiVQD4I#-jqo|=y!-X1n|mGd2^&-QTr5$0Ja=1s$FZPivlo2PU6caMM)teY)^n? zg^`&qNM6jG&ReOU&6VY74Dm;U`ti}ZrEyAaY4InYCXtbNr}y#kayqdiDYa)bf4pGZwA4Mbt*Zg0`s@u|o;^u+ zdgb_8;Va$STp%J707*D1II;1Ko@Kp*QuTNlhy<%${~ezSadM1mMTPo*6EM6(8iv(cv`z!&6A`S3_9IpUFA#1vt!B{X!T_>Md0Yr z$}k)ubCV4Cz;(*)D17w_f^|A0GxqCA0<0x`c#747?U}Ot1;9!fgo13*vb8Uip?(>j zJ4-RAWcP_b=_)&7;iXY}4ZNVw>zP=)lLbfy=+72@qjH+Xwd4+t)Ogc9n|HWnSAmPj zTCsLrQr~eFa`IC?-Wa<*qBdasCCY9bsUnMS(KW@~~JCoT&=QH2qf> z0mW%rH)xwaokJ9=DNo;P!%{~=Qni_Sm!<`inG&g^MS{X4wZmU9DY};jTAY&rT}5?? 
z7uaTCYa>)##pJwwLK5dw6;WTjrQEiakrz&K^7r5)$dBUA=pjxC{r=cMbC)S=@=q?$ zKb%4GS}ld!-=bvc93xfOY{v0GCcssVADhTJ9eBkG(g_VvKH;J*wxRqIaC|AUzMcS6 z`$lrPxcKZt6;AJF|93WH`vc|ZfiKfs#V9_y*2`7c8QG3DbHKc!tE=Oo1Ap!pWb^6DV1W|`*%esbg;q>Kf=QO6 zeo2(e=7vX#C>lH;8r*uTo9qdhcMOHyT#b$9w%n(bnJ~-K!yD%gu|4>1u9cw1QLz&m zd6c9b>zy`BV3LcF$)%`ux*aP^r}fnnlS^(e<`G22ZH=Ews1Tld)F6As%A2&cUz>oy z(oBE*?s0}_zVPbNSxC~beh}e6S&*@|yc7wyXEAEkU=N^nraZN{wZj<}ZUvG9*&lK> z#7yetdeU!%%qPHV&eh=_9Pd5FQt9^i+Aw=`{bK}1Nr)?_N?E`exu#h{f<%a2ew%>iYR8T=$g z;6{Khprr^QH;yhBpbQvqrc^yiwa`e%&m!4pLg-^m4!~)fQP1nQ!{3EjCB#EcS~ov7 zp7tbj=^tBO^II@Mf^cMv72C?+YD%YwA+)B!ba``ErwpJM@bP6z?GeogmxTa)Wc!8T z5m^^2PWeIeh=yQ(&h9;ueIgG^2)58wSYTRW!-Q7Nxb67zhwFWbnqJ21Yqt^mZSj@# zNy_p!AVp)O78wPD-aX{0&>rL;6b#oBk}&>q-AlbrDrR|8>Lyw;hg0Qf8jlth6rB!y z;a`VG9L<3b#Hy_7o}|l=mCia7^BKLBCwX;N~!~dDe!1kkLi?p z!kB$@_x7B!5KK3#I%HRCGI32YD~OSFpBiWraecirm9sFZiu15$4Y-K-tRIJWMN%JB zSC4A7-KTszq^24zbJEfRc>@FQt-_c==4?ikBT-f9z1>e)Im5#u+jI+eu$}+! z6mX`SOE+);P7>LZl78;`_Kd4^#8F1$Wp8ygUtb0wGJjjuA_{{|hoA!ew)+$m>n^Ej zY1CHZz_0uq1&@hIu*vlBu$~+rk7D}$JlTq^dJ@$3CmVh4+OEphbyO6wi_0#VA&;0; z)p*BmDuYtmsI!jA>gWIa(2eEX3o+4V^?SlMaliZOKuck1HQ`yhopE0+lgskd$@+b8 zkdzSi9s~wAYLyoYb+^aok$)%G>oqvkd*Q2U2_e~F^i6(k$WFItv&eqf1PDoT$?3&J6 z!ane>5AuCa6F)i!)8a&hw#!H>;6ItP7WiHl)NM z@XoGs+rxZ^P4P{qnG<^?RpVV$rBF}zccP*j4b~OywX0eWDY*%M{d&ET1)|>s*3?xMIog85%#b``#z@1=-u=e*3%Nx@hlB6eozDB;al`=rAP1bbX!kr7goGrx z7pk0fy~v)Ure^tP!q8ePNacr6a%f&Bv574rf;O`#Gi&$u`d*{F-V(kRS^7rid=fqx zKXuf9|DcNsNTT;H4Qtq9ITut@YH#1a`(bxNY~^2=ZG*m_ybG413U}w)l-JBLf`H6u zIo=FTq<7pUF(Gytfyg&!BtYn{RuDk&xd!qGt7Xa&ws_hL||m>!$H?C(#hz2*b$-M;G%#zGgD6>&(ygZ@wt3q5z` z%vVIf4?ETExI)#90KJJCq%{=(qd1)}NVBdgjLZYtm&pHp{N0C+w`70pnmV?BZpPI9 zCIxYKpYq(QR;mME>B(cSA>h$+a{F6w+AKe}IxVF7p0^W+ksc4$8QP3NqDqJ=^yxQt zT!8b-(+M)fna!w6YDL%dE;3Rv19HWB)Xajm2BaZ{F{NtvViqr2nIu-w9h4M{=?7+QeQOXSHKq5UUglB!1m| zLA&v!Ep>gntztEy1<;M3iV|*e+P5YAZ^udaFTa&QRWiQep^}&S|F!nz z@ld~Q+oO~SWhpzQkSvj(eQO~~l1fS#N@U;nEfk_i5-qe)c2n86v6FSmHd)6$wlVf~ 
z@SI=$-urp(`+4s7`Qv^6=<_L>?_A&Oy3X@Dm*Y6D9pcA(peGnbs$3d2bMfE2WVhq<4nNcg{w(#N%uYJ$Jnex4phPhgTBp9lB%)zx`2jp4J~x|>-;QQ= zU2R9OwJN%+zcg6=7sTS4lS*cl@|H=;|K;$UO#1uqj4`kOA8T9$!|abKd*GnwH0r`Q zcWpNLQ)#Z^RcRGRS#j7vCmPwOXe9qBU;d_~lu?XncxgWKZxC*lsNT&{C+1^8N{KBz zS7f=IE>Wep6?Q=xC~X>A<5UTCEDvm+gf=87Bt`Q z`=gA?O9pk1X4hkUX_cy7xcKX=Z;{RD8h8K=tz7uotw=yaVr!i$L4M1PRqyU;5|<;k zI4Nwual(}6_VTH>o=|K;I+LfiCBGWu7ugWiB{bWh_3hiYlRkfrdj0yMRhr=8Lrj(@ zptGB2>?R*!+p}+`Eu?c;wb~|kmo$!g%e(Rt0_}jKmc@=sExFo(PhV8~v0p47&t~=| zn5V~sF5icGSi#>bA> z5mNdKx#m4=8mv$=s`(&0z844N zGW##t$8SS&RNL>XRcq};if8pGTZ+ZJ-rNrh6zznDrn8kd!bH*wA;`Iu_HNOVr{;R> zV#d%%25!2kh{tkj!BJ2F?*(HJ<;5~!!p7&0{Xwfobpsy;(3lf`ly6(eYXAF z-2$q4h}c*V%e&PxW(ht`0CIv(v~q@G3I)DhUYh79tnk|Ekx} z3(K+D`QOqp{#j{u`9+B^aw2^u58nmfVE%q zBh_h}=%9b&;As<7Ca=%WY@7uB%=XxUn`VcZ4l(~I#xDvfo%k8j5%xz4W+a8Zqhp#~ zIe0QdbXQxeCdUf%mvUb!@-z9jaetDBDgP-g(4l}W$G@Wz6P5UVs$M7+T8>cZvBZ>J zIJ1b=Ep5{$k9Ap2p^y>`{UkdP-grb}#|HZ8R5qSM^^kfeFs$jcBwVwzc8`%53x~Rd zq3bmem`ZIn%{tDH-Qm?OT~qN_DSq!m_Yi_5K}t@W5;P^Jd*!y=AurgQ7dbj;+UKkH zCA2*Vu}AJGXvTi8?cF2J7R82EYhsX{?jgE3PbF*BpUg?q5%r#`4uqwDwQdpMGu zaJ--n8^8lq4ECjoYdf*?^@53)%aDJnMFi};bkg0f({7tYRaAAq*M1NtQdwn?aiU%yF;+kO~6?`Q;&UK~*+x!pXUg*a%uNSFV{I zv^VaH{BR~5j@hVtI~xfFO@AodS9CUF%8>|4iw$(GIFSe{?AvwxeB-)d;BK)JXKjDx z${?Rx-$J<&6#+PUvG)rMS#=UF_oriY<74)mF)`pw#gqoiyATwxNI3+`QYfTk0Lbvg zi>3Td6l9ASoPhrMV)DcY%xi=;z<8=_nA{u1)Qn`9vt0>3e&P9tX9ucPso0&(g`mzg z68%QKT5;!t?0`Gk9%XZ?eO84nGUsG-Haq@Jw*O31j6P&FsjfaZP50#clsvMEtpEs} zBXJ^rvM3|;ucOGCifMw9+DHVK2A~^J08k4poKd*D<;U}CHb}@(JYTWUkwPAcRUA*? zu7D2uJsIn2jOKM>{tgFD*n{l*&J!B-Zb&Co(*a}b%6n(ELmve57&n3$P}HtELW5bw zjGiT-np)S_>i9;mi_D`7W4lehcZFi-FWt2-ih4Lcu7God7w%y&O z9yAxf6Dwt2$La5^H^*y)39#dzOeh)%Sx({dX28SWzk?z$w{qe__qPZo>F`~a;aVe? 
zG%j=5C2hW}rxp}kA7FtPI0SgMKYZ5?HTZ26nx8j0^6S|4xoK6qqY`toBJL$}_movL z`=52(9fML?6!)RteTTv|@69%%;{^<*KHA)x=3e6jP#=J?{Axq&$KDGIR4BljR#oEdP+~I_SIvvtR!V*qv&KEiuMB=-)Jzi@(J$tsW4XBihl` zH3Fe{4U1r4BV<(bwvf1#q*Jdz#+jq}cAOqhXMA7&Xt zF1##uXVUfAJK}ptzyf}r4Vvt=&+h@cUcw4+_38{Up`ofED)<6=D6G)$0G%t8>G=?5 z)%7cIx~Jo%A&B;H?Yw}*8?cQgj96Cjg+^T8ZJ6emAmytQA(Pw@cD$_$XJnUo`a&G! z+4JLxkupyA`T6^aCr|Q&NZg6lRhFk|CsF6FC_GvH<<8uitW%n$YpC1&w#;TIh>&+x zkF?k1L99ouQB(j_g4Ev#z@z{0@EUNK+`gDtoim2LrnqK*vhO^>Go+JTrUTT;5T%*F zp+Czdgm3p~5V(rHx3E2qX`(si_`mQStW| zDNM1b=e`7A>f^^>IOu4L7y}mi`O3V=an%*>0AXQ}!3cOmdu0V(cTFuR^)xpl-S&#= zJi()F!TnvX?VTZVNzkk_uEtD;Z;lg43Q(5zu&mt+;pwiQf-;Yaag$A7jqBeG+)mmv z*Z9AnanNq+@BIUGjhZ8VC0ZK^*|nble_E;wfT%e^V5$M1V5D@J6N;TEs3KklGOL=g z57U>2$F`;I*!P0gS5A;eL-7cjZnU z-|ZTd+GA>`o}Y|{QeVZ+2E5t}M5SjJ5lJR=9@M&XZUmcNmGPEAp5)uiiGm2Hy5EQ8 zm;VOW*uXk)u3fy3B3+aHGHj@WcXJ(v>q!NUn%+Ra=F0G6vgAa(E=6rE@G*F_7Z$DF z5CTa0Z!1dTjlu;)GmP=Kiqjtvzhqj}zUf0jj-92D%8(!58Bp-GD7lt7>J?J2tgZtd zt&l~p+{v+1sB%_TwI)4%X6S=atKscw^K92fku(B;JNOKWd0^CLYr#U(m81s@=)Xgv z$QzkPT00#`<>DQm7Hf0?e43!d(T~RX-;l~Qs@e}U*1VSMla6#j{}g!wt9v6e zOtTRw54Zl9ew+V%Wm4oNMIzTcsO$2GK-H5nk_?b3z)w^<(ID)kkcGUv?lX5vhx@aO ze;!kx`VjtO5TPUf#jMfEIB)D94}9?oBbILd7S;C<+Shev8G%KcyU%a+ij^!hnqMMG zr@JrOvox2rkQyGkNiygaX=X3|qGXmxTOa_5A_4W;arM4>&2WX0Il}h0Xks$vSo+r& zZr?sa^;rMPR1v0TD*uAg%^Nn3e(MS{#7@Iw!ly%# z+EuGxszK@bk$==H{>Yg8A7#BU7=uy|^iqQq!6NSH#^a6iGM{r@u|!+<04$%9W2K*I zyI+RbMxL919`EU|=S5c%`W}z}Jf_p%ABU_^Nwv}Ki-XDUUu?G&7lc{g>_k4ji5%*#>=E z)eSF&knxA3lOi{(?~=5qed!Ys*#ju(6I2sCX#kf20L!w@A_9b11U~$?2iDD#Jf?Xl zJB826`1if99AL{_I|a&0g7?dQErMc-bP{OZcw@=kCj;#=`Pt{EML}}K!lAC% z-@i+j8G=7a5|-+<1Gc#UIY!^nrjPE-86PFb{Ju-=pJVTS?>v=*bSfWDOua7Q=R!d* zha`SC^WzNRslEBCin*aQ7|~-+a}Z8 z_gsr06Jq?$!opDiv=Pg=SP`YeYdVNKwU#Lyt~EvHGorL^uzr&zTwWw>g~_M*`B#ud zvqnbW%g%O?Zx<|Q^gHU3&1MwRrGTl7xYJb9zp2yw6BDu7MH_#>u5lDN)$wQ!!}jl zES4hLiUl2jb*`zS;uISb>VXG&K-QOv09f*J(jvKNMVL^Qc84CoOtkjt-neZ&d$UYN z@@es>13-X`fNhs(Yq-xI-A9n3`A}nX7utH*+b`5F>PdsKf(xtbvv93AJb>O^a5>nN 
z9V=dm|Ni~Erk2(*DXHl8_8XvC%I}wScjB~D-Lu5IgXd6}D=MlDT@QVeHH;KEnP!YX zk(oBuht8h`f`zbq0r1GhgZ<;%9&g#^dWD3_jw5wwU>9rhq;+N-UHR+?X!YF>p=gmJ zl67Qe?CrkLNDM(b&B#Z|p9_QQ*ECgcraPj1jeixSp*UKFucW(&hT@Qt%5)vpQwq+< za}$xOoi<-@g>pL5I_CFZGI5$8)ag&`9vzz^WP7X;8KzIB-Y!i9q$Xnf&c*qNR~y<9 z@_2YmBpf8C6IYAL<)ppQPeK=PLR%3%-|}qlV5|%J_C|027z4mY*Bdm|87H%Y_gbc# zLb1Zs50)62=2##(Y5k-}ZaGSBmOZgfkiBZurp@J6%K`Q(bk;}#Kg9{>y|ob|ocFjl z@!;UAgAF)b4K%WpQgaE)6fu5eV~V{1?Pp3p;o~gi|I^Ye6~Q1VOo{siO166BDbe0{ zxa_vhxsAMYs7hO^otE$W-iWJjx55rFZ6}q*t0k#B#=N>VTS3F>BggBmt}SZ|usLw5 z+T#zYmar{=wWPRrEBM_;XM2T+l{dKt^*BwX@`Td;Lgs(~ejrI5Y7kdJb{y2od1r?1 zqM$lu9Y_#PZueH3seO0aI1U@Zx77Bh%D z0|qzuoxDz;zPxA9L~}TFhzcp>tSXX^Q>L>gc_+R8t^Kc7mFg&wDA3S8|6 zEa@GJYDtz*yawrN3Ho>Pg>rc$T8}&Vq_v`$$)?Lsm4oCNS>s+^00ez#!R)!quhtow zeu!i*bZ0#CDKb%H)CWHZ@={K?uK0=WORzaK+mP~vU!$JkG_Bim=AG33Jcf;ga*JtC zKa99w`XTGLGPip2YP#LX_&h`}SXkn#@rrnA+87w^LVvH=jp^H^_e?f!ZF$>vM zR+0fQjimFdFG2jdZke=EWSu`VmFIfsFz>or_UrQYb^|J0_ZNn|JX>1LOO<9SdGaeK zdBj!Tl*P!7*?f9{o8&3|qin{}D|r(cVfF_YAwx?_R&7+w<9j`Mo9B$1S6YmU8NKZQ z7=?Y9zbzuYC%VHs(Kg-44r}*$(!t;@Uyu z^QpAFlB_Mylw0gBXzC0{v?&5Y>n}Ki_?rnMf^`OQ5q>tJtx1Fjp2=`0mxxSf&W>u?8<1=hN1Z&Ww6g<8)Q@-U`J zNga4l-?Lz7Y&T2?&@5p1jSyt&px{M4Sn{xXk zbWz7HUc3)RpczhUNvVW*{SizZ-)Z{Y?w)(HDtEy;6wvK=JbZZQq-ZzUm#QVc*ean> zO4y$>z_6Kc6YQ4-x0L!hg=Hlg>$_P$jf;H_u2G3&!JQMJgcW5wg;6k z%?LU{fs=7B$4w9ondIu(3#+Uk@Tg67HHir%V00W5<(dw0v(CU*hS(M=hSsZ@TxzK3 z%O!O)$GyBkzK`QPU)WcOPB;3T+#z0^5(cnTaH^G~S&3^is9#3|lGJFTxj76)Mg%nW zL~|cfGfZK~{^9_Hi|Ab+e(0WlcLng0u8#@>74A$TZ*?HsrfU$8ZK9Hrw>~BA3k|Df zE+rmf(LUqy7uDRHbkM2RR*4=P#{DCeeCQNlK$PZNI92u2Oe%G#bnboc?qYgmp;i-zf>KgZ+JgsK^nX|R} zJ6_yvhC}$y)LQZ8u7oPjaZA%>x?BS#qwgE}-K4%(>~_b>hxk}j!b3It-gMea%y|P> zr})z@*kdzc^PAXLcpFU0@0HC24BWMlH9!9bbAPlT%&PKTan53#mBgOCGoDw(&Cw3w zX^Ef3&2`Gd!{)<@#E1}(Vbr_9_tPpr{}4uSar9Mpqz6wlEpw$U4NY$`3b5(9;)hemFBGt}2(rGrzGrk0U0gU#!)pmidsLVgZS5Y*|Ibq6;(C z!Oc0OPX%ROZgE*alNB$NW3mhg*}M(q}T=L6abUmkA4qI^a(|kKt1R*8Gx#$XRZ-Y!2rK51K`eCynKWsCs@KxQ&81hM4?=_}DY@jJ7{eSr)2^_sD1Ck?&JFd1o{XuZ)c 
zJGppvL6*JJW#!gb8VS?5@@KTm)3PSG74BITRFLMrC)NA9oHOdFWw@yKn11h)*s~{~ zIk^7jLc$R)!Kr@RJYRIC%vS^Je)6{TPdX|pX;)Jd_V#wuujEqG>7G2oie1%*qUv(v ztIw(mxr&{Mra}0C9ukesiUuTLGPuSePxMCmp7ffhVzUozX}DQ_#sIPw>DdO9b|v}KT#>FJJ<5?}O+UAkqd z)Y}!io`utIC0;irkekPRsVdDXowMqx&7)Jl*g9-mU*BM7>@}WJYOBa4xvS(_5w1^t z%DeyrxZ9a_5U{J>Cr~mUs$Y6^Zm!(TT?Ht6 zDD;$U&d~Y6?}j4(A_T!tH1`kxsR;k8D%$eu3qF6dHcw)9p1!@zf#T5b^3I&8*MRCD z_2vHxXi|v8+$ws?v(l^8ju!vi_sE`Bae@A{eOVARg18Eu-~oSdApc3t{OUTb36RAK zZdNrQXLfq}RMhb?xy{#C_dD#)W8~3fSAO$h(T>$`{q`|^02c8NZU3Y0`rpOP8b(E| zcn%ztSsgjg3u3hVtuNO4PMBDjG8Gu!x_i&sHov;&F5&y<8Rz+V5HaI%DUmvbX&EK~ z5wo~&0h^PvxEcm(7EG4sC|P^HN+JSgKYnQ9M3j~L@B)z8h5p6)kILqMDvTB3o`hG$ zyw@iW^LY$Vg@uPJV?Zw6`?K55bRmkm^;QBY-tG;=Q=nnmYuCUgdPMG6Vv94}X@Xba z1#Oj*^SmG|56_=1ffTY-4-0<{B`g%9_8s)@%LVAFv^tC6T|&+iqe!K&|58pIQQ8h9 zt^$$oBdBl96Sh$RjfO77$ruoR^BpDlEHMGON{#Z0z_5!dRBqen-5w}>&7TIjVt^w4 zT940HHY)1qQmn{1RUEuXI^W=od$G|f=Y;lQvhI;dDqW{L8?U%Ac?=oNs#vrsdoU(^ zHa8rY3UC9DE#h+ZYy~{aX8kf9HyMVbtjJoVi|jx&)%=!3IPiRSEo!2 zLuXrUtuJ8Ac z3ku$q@y^#T$;o-Sxae{dW1P9Rcj~MvX1xtJIX+I*^IyNs!?Rg3i8<g-^~o%|O3{k6}s{T@NarO16iphr)=n}*SBvON_mo_^6|eIgmK#7{Y-YVn^} zKWp%MQb*U{nCi&2)*soU6yR#a3hPIpBU{I~nyiSc>1J-mzCuj*r;i_j1Bk;UefcuK*rwB#dR+)Wmz32xL!Y@Ch&j)krq|K3~Cx6?E6<) zHhj=fJeIiG5F;W#m66RZ@04^PeIAWdTA;TtncF*D7n$t6GDF$gAX7F0d3yTv3q^8C z8-6rblXFk6rjxTXGM6pyK zD9urGaB!3?wrXL;8_5Xg?D9LgsL{sQjiQ%=gYXT+e1hLF9__#6Dy-cSCwVUp$h%xJ zHsxlg*NV8yoChDXy@-qZ;Y}_jcDYS_j%DWNZW4O$GgQ@JAnr;f`vUHQ`5^fF$48iJ z?CQ||hQ>w?Ipj{2xjTs$o`GDW_&}2sZqZfP&mP|YxR6jA*4I0$Wd2f~Jpm+rDV^>h zjk&$G$szA_oR*e$aiT56z2CDAmQy*Tssg^KatV*d%)0Lj+RHFReD80*xxNJ2>)!$k zorzBUp2W3+@2|9ck3K6Wxk;Hf0eqkLawM-ZTquKHU%;utP>$&7={f=?7}n#upVb>=`l#Q>_q!u)Z5!d9eO|Yolz&zQD+LQ#>4A6LM%hKPBaag8NKU?K3W4qcAj} zWEIBwjQWaPhu_*`=XBM#?0fQTo8lyc?8`Pg!PayAw-(;_`fqO_TU9G9S7>_+l2x)wbI1p6KmR=kS*&asw*vPOKLo3{y8lWdL8 z>V@4!=dW?~=FQqK+HVU00f&r(Q^{lYY@1#tx-3D)wtF>&e|x2ZhVS|F=jD6Y3vB^j{m04A~(%<;iwCX|I z7ly8`E`$wwOZ4#8I=*Zj+=B~jB*_Oqk0gMyfq3!s>BC|M)jjrQ&G3gb)k=BMLmCS@ 
zpfUj9_>RSM8WJ|YB@%q*FB;_AX@dv4ooB;ws$_N-Sac~y%es)>^5B_x%mJ0va{TAD z_$f?TKYZyk(_DjRtE(QvPo-KU{MILg(6Jq6@4O~cYsY^j`WgkJizxGr&ytdkS6St9 zf>?MhYzI4=-hy9Xm$~)PLR`vTixP05(=YJJ!?0yQyx-5dJI8Xb$&PpbyjuF=1@pbd z@n09FIy0Yr_;4iAcVoF02K@lQ(&*UEyOz9wv|xgZ%XTO||1Y=kX(n>_|nVFe+WQsFI1%Ij8uU>sh%(x`(=g+G+@96sKN3<;LDn14HnK>9d zGdsJ-4dP#FJb;{o{8x=~&BwRE(MWJwWx#tNixwv>l2(MS@ zW~?BgY$_MN6mIvZ7XVHNZe#c6=qdj}P$pd|;RC}0xw%|x{H_59xqS=__iLYWlvfy) z%gf8tbICj%7%+i@sV@1wpTX3+M%wUjr0N%V7N}1+)g02lYGB~?QWa1ywFsy~MbeMr zv1`|^TR9JyhHIYw5J1fLq^hZNbSqz*Og%XaZ(nH zI;xv4r2t2T79`Ax57N|RL;FOD8x=W>#^^hN9nOLu?ev`PpBVr%H^{Ye{V_B6@qEa4 zuO%xup@c<6%|NdGDX(%j3(5kQ&`c}z?u@49kF&%qu)Eaj$@49WWQW#^LccI*)qZcX z@<|bqDZLQb+lVjLh~$0e)F5o=yVOAez2CaUY=}%BK%HfXMJV2d8SWoh@Ea%(u=)L! zDz?$J2|xWSkw#Kdvgx%{2-<%$M#XpC0S>Oy7A@Q#zNkBl|NQ8hP;B1J2~#+Z$Go?= zgU>R05*7x!l;D80JUj}BzHmiTX6Jb(CVvEn+G+!<8BEVn2{!z#%Z8pu;U&0T)BK*> zg-!x+9t^{PZJ&Gd$-Vh)c6!jQm}|tCpTAB^3>n{kHt~!lzly*;E|cxF;WgKYa{&PX zOIAF+8{7l%I1bpY=Gm{0o)4a#hHE79dpMLl9O8+ta9AX_)xQwXQtAO8qM@t%OR@tt z9}db@)k1Jsw!nHf7r9QY!gP4T3b2C-9*qPCDyF(kkpf@%>Fd|o=XE4sMu(0xy>~wI z^$E7`{nmvyH&!sx`2$-;E)$i7#DqrGt+@k-#SMSJHxk@exbv})Ew>+bpA2ywLE?C#tl{oE#PpalBW7O z>clg@R#vLAIu!Knkg62NhE54+#X&Znu%np&|MyzJ q{~`YT+sVKN2APU~4Nv2?X=tiA@5@|eS%!;CQ5V&(sAZiq3H%Rh@+H0i literal 0 HcmV?d00001 diff --git a/docs/optimization/improve-parallelism/dataflow.md b/docs/optimization/improve-parallelism/dataflow.md new file mode 100644 index 0000000000..701d69a36d --- /dev/null +++ b/docs/optimization/improve-parallelism/dataflow.md @@ -0,0 +1,102 @@ +### Enabling dataflow parallelism + +This guide teaches what data parallelism is and how it can improve the execution time of Concrete circuits. + +Dataflow parallelism is a great feature, especially when the circuit is doing a lot of scalar operations. + +Without dataflow parallelism, circuit is executed operation by operation, like an imperative language. 
If the operations themselves are not tensorized, loop parallelism cannot be used, and the entire execution happens in a single thread. Dataflow parallelism changes this by analyzing the operations in the circuit and their dependencies to determine which ones can run in parallel. It then distributes those independent tasks across different threads. + +For example: + +```python +import time + +import numpy as np +from concrete import fhe + +def f(x, y, z): + # normally, you'd use fhe.array to construct a concrete tensor + # but for this example, we just create a simple numpy array + # so the matrix multiplication is performed element by element, on scalars + a = np.array([[x, y], [z, 2]]) + b = np.array([[1, x], [z, y]]) + return fhe.array(a @ b) + +inputset = fhe.inputset(fhe.uint3, fhe.uint3, fhe.uint3) + +for dataflow_parallelize in [False, True]: + compiler = fhe.Compiler(f, {"x": "encrypted", "y": "encrypted", "z": "encrypted"}) + circuit = compiler.compile(inputset, dataflow_parallelize=dataflow_parallelize) + + circuit.keygen() + for sample in inputset[:3]: # warmup + circuit.encrypt_run_decrypt(*sample) + + timings = [] + for sample in inputset[3:13]: + start = time.time() + result = circuit.encrypt_run_decrypt(*sample) + end = time.time() + + assert np.array_equal(result, f(*sample)) + timings.append(end - start) + + if not dataflow_parallelize: + print(f"without dataflow parallelize -> {np.mean(timings):.03f}s") + else: + print(f" with dataflow parallelize -> {np.mean(timings):.03f}s") +``` + +prints: + +``` +without dataflow parallelize -> 0.609s + with dataflow parallelize -> 0.414s +``` + +The reason becomes clear from the generated MLIR: + +``` +// this is the generated MLIR for the circuit +// without dataflow, every single line would be executed one after the other + +module { + func.func @main(%arg0: !FHE.eint<7>, %arg1: !FHE.eint<7>, %arg2: !FHE.eint<7>) -> tensor<2x2x!FHE.eint<7>> { + + // but if you look closely, you can see that this
multiplication + %c1_i2 = arith.constant 1 : i2 + %0 = "FHE.mul_eint_int"(%arg0, %c1_i2) : (!FHE.eint<7>, i2) -> !FHE.eint<7> + + // is completely independent of this one, so dataflow makes them run in parallel + %1 = "FHE.mul_eint"(%arg1, %arg2) : (!FHE.eint<7>, !FHE.eint<7>) -> !FHE.eint<7> + + // however, this addition needs the first two operations + // so dataflow waits until both are done before performing this one + %2 = "FHE.add_eint"(%0, %1) : (!FHE.eint<7>, !FHE.eint<7>) -> !FHE.eint<7> + + // this multiplication is also independent of the first three operations + // so, with dataflow, it starts running in parallel as soon as execution begins + %3 = "FHE.mul_eint"(%arg0, %arg0) : (!FHE.eint<7>, !FHE.eint<7>) -> !FHE.eint<7> + + // similar logic can be applied to the remaining operations... + %4 = "FHE.mul_eint"(%arg1, %arg1) : (!FHE.eint<7>, !FHE.eint<7>) -> !FHE.eint<7> + %5 = "FHE.add_eint"(%3, %4) : (!FHE.eint<7>, !FHE.eint<7>) -> !FHE.eint<7> + %6 = "FHE.mul_eint_int"(%arg2, %c1_i2) : (!FHE.eint<7>, i2) -> !FHE.eint<7> + %c2_i3 = arith.constant 2 : i3 + %7 = "FHE.mul_eint_int"(%arg2, %c2_i3) : (!FHE.eint<7>, i3) -> !FHE.eint<7> + %8 = "FHE.add_eint"(%6, %7) : (!FHE.eint<7>, !FHE.eint<7>) -> !FHE.eint<7> + %9 = "FHE.mul_eint"(%arg2, %arg0) : (!FHE.eint<7>, !FHE.eint<7>) -> !FHE.eint<7> + %10 = "FHE.mul_eint_int"(%arg1, %c2_i3) : (!FHE.eint<7>, i3) -> !FHE.eint<7> + %11 = "FHE.add_eint"(%9, %10) : (!FHE.eint<7>, !FHE.eint<7>) -> !FHE.eint<7> + %from_elements = tensor.from_elements %2, %5, %8, %11 : tensor<2x2x!FHE.eint<7>> + return %from_elements : tensor<2x2x!FHE.eint<7>> + + } +} +``` + +To summarize, dataflow analyzes the circuit to determine which operations can run at the same time and tries to run as many of them in parallel as possible.
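The scheduling idea described above can be sketched in plain Python. This is a toy model under stated assumptions, not Concrete's implementation. The operation names mirror the MLIR above, and plain integer arithmetic stands in for the encrypted operations. `DEPS`, `waves` and `run` are hypothetical helpers. For simplicity the sketch runs operations in "waves". A real dataflow runtime, such as the HPX-based one Concrete uses, typically starts each task as soon as its own inputs are ready instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dependency table for the MLIR above (constants omitted).
# Block arguments %arg0..%arg2 are inputs, so they are always available.
DEPS = {
    "%0": [], "%1": [], "%2": ["%0", "%1"],
    "%3": [], "%4": [], "%5": ["%3", "%4"],
    "%6": [], "%7": [], "%8": ["%6", "%7"],
    "%9": [], "%10": [], "%11": ["%9", "%10"],
}


def waves(deps):
    """Group operations into waves; each wave only depends on earlier waves."""
    done, schedule, remaining = set(), [], dict(deps)
    while remaining:
        ready = [op for op, needs in remaining.items() if all(n in done for n in needs)]
        if not ready:
            raise ValueError("dependency cycle")
        ready.sort(key=lambda op: int(op[1:]))  # "%10" -> 10
        schedule.append(ready)
        done.update(ready)
        for op in ready:
            del remaining[op]
    return schedule


def run(x, y, z):
    """Evaluate [[x, y], [z, 2]] @ [[1, x], [z, y]] wave by wave on a thread pool."""
    ops = {
        "%0": lambda v: x * 1, "%1": lambda v: y * z, "%2": lambda v: v["%0"] + v["%1"],
        "%3": lambda v: x * x, "%4": lambda v: y * y, "%5": lambda v: v["%3"] + v["%4"],
        "%6": lambda v: z * 1, "%7": lambda v: z * 2, "%8": lambda v: v["%6"] + v["%7"],
        "%9": lambda v: z * x, "%10": lambda v: y * 2, "%11": lambda v: v["%9"] + v["%10"],
    }
    values = {}
    with ThreadPoolExecutor() as pool:
        for wave in waves(DEPS):
            # Operations within a wave are independent, so submit them all at once,
            # then record their results only after every one of them has finished.
            futures = {op: pool.submit(ops[op], values) for op in wave}
            values.update({op: fut.result() for op, fut in futures.items()})
    return [[values["%2"], values["%5"]], [values["%8"], values["%11"]]]


print(waves(DEPS))
print(run(1, 2, 3))
```

With inputs `1, 2, 3`, the first wave contains the eight multiplications and the second wave the four additions. The result matches the plain matrix product `[[7, 5], [9, 7]]`.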
+ +{% hint style="warning" %} +When the circuit is tensorized, dataflow might slow execution down. The tensor operations already use multiple threads, so adding dataflow on top makes HPX (the dataflow parallelism runtime) and OpenMP (the loop parallelism runtime) compete for CPU cores. Benchmark both configurations before deciding whether to enable dataflow. +{% endhint %} diff --git a/docs/optimization/improve-parallelism/self.md b/docs/optimization/improve-parallelism/self.md new file mode 100644 index 0000000000..c5a6bd4fd2 --- /dev/null +++ b/docs/optimization/improve-parallelism/self.md @@ -0,0 +1,11 @@ +## Improve parallelism + +This guide teaches the different options for parallelism in Concrete and how to utilize them to improve the execution time of Concrete circuits. + +Modern CPUs have multiple cores, and utilizing all of them is a great way to boost performance. + +There are two kinds of parallelism in Concrete: +- Loop parallelism to make tensor operations parallel, achieved by using [OpenMP](https://www.openmp.org/) +- Dataflow parallelism to make independent operations parallel, achieved by using [HPX](https://hpx.stellar-group.org/) + +Loop parallelism is enabled by default, as it's supported on all platforms. Dataflow parallelism, however, is only supported on Linux, so it is not enabled by default. diff --git a/docs/optimization/improve-parallelism/tensorization.md b/docs/optimization/improve-parallelism/tensorization.md new file mode 100644 index 0000000000..9a7f9ce1ee --- /dev/null +++ b/docs/optimization/improve-parallelism/tensorization.md @@ -0,0 +1,55 @@ +### Tensorizing operations + +This guide teaches what tensorization is and how it can improve the execution time of Concrete circuits. + +Tensors should be used instead of scalars when possible to maximize loop parallelism.
+ +For example: + +```python +import time + +import numpy as np +from concrete import fhe + +inputset = fhe.inputset(fhe.uint6, fhe.uint6, fhe.uint6) +for tensorize in [False, True]: + def f(x, y, z): + return ( + np.sum(fhe.array([x, y, z]) ** 2) + if tensorize + else (x ** 2) + (y ** 2) + (z ** 2) + ) + + compiler = fhe.Compiler(f, {"x": "encrypted", "y": "encrypted", "z": "encrypted"}) + circuit = compiler.compile(inputset) + + circuit.keygen() + for sample in inputset[:3]: # warmup + circuit.encrypt_run_decrypt(*sample) + + timings = [] + for sample in inputset[3:13]: + start = time.time() + result = circuit.encrypt_run_decrypt(*sample) + end = time.time() + + assert np.array_equal(result, f(*sample)) + timings.append(end - start) + + if not tensorize: + print(f"without tensorization -> {np.mean(timings):.03f}s") + else: + print(f" with tensorization -> {np.mean(timings):.03f}s") +``` + +prints: + +``` +without tensorization -> 0.214s + with tensorization -> 0.118s +``` + +{% hint style="info" %} +Enabling dataflow effectively lets the runtime do this for you: independent operations such as `x ** 2`, `y ** 2`, and `z ** 2` are run in parallel, so it would also help in this specific case. +{% endhint %} diff --git a/docs/optimization/optimize-cryptographic-parameters/composition.md b/docs/optimization/optimize-cryptographic-parameters/composition.md new file mode 100644 index 0000000000..172ff80e80 --- /dev/null +++ b/docs/optimization/optimize-cryptographic-parameters/composition.md @@ -0,0 +1,65 @@ +### Specifying composition when using modules + +This guide explains how to optimize cryptographic parameters by specifying composition when using [modules](../../compilation/composing_functions_with_modules.md). + +When using [modules](../../compilation/composing_functions_with_modules.md), make sure to specify [composition](../../compilation/composing_functions_with_modules.md#optimizing-runtimes-with-composition-policies) so that the compiler can select better parameters based on how the functions in the module will be used.
+ +For example: + +```python +import numpy as np +from concrete import fhe + + +@fhe.module() +class PowerWithoutComposition: + @fhe.function({"x": "encrypted"}) + def square(x): + return x ** 2 + + @fhe.function({"x": "encrypted"}) + def cube(x): + return x ** 3 + +without_composition = PowerWithoutComposition.compile( + { + "square": fhe.inputset(fhe.uint2), + "cube": fhe.inputset(fhe.uint4), + } +) +print(f"without composition -> {int(without_composition.complexity):>10_} complexity") + + +@fhe.module() +class PowerWithComposition: + @fhe.function({"x": "encrypted"}) + def square(x): + return x ** 2 + + @fhe.function({"x": "encrypted"}) + def cube(x): + return x ** 3 + + composition = fhe.Wired( + [ + fhe.Wire(fhe.Output(square, 0), fhe.Input(cube, 0)) + ] + ) + +with_composition = PowerWithComposition.compile( + { + "square": fhe.inputset(fhe.uint2), + "cube": fhe.inputset(fhe.uint4), + } +) +print(f" with composition -> {int(with_composition.complexity):>10_} complexity") +``` + +prints: + +``` +without composition -> 185_863_835 complexity + with composition -> 135_871_612 complexity +``` + +which means specifying composition reduced the complexity of computing `cube(square(x))` by ~27%. diff --git a/docs/optimization/optimize-cryptographic-parameters/p-error.md b/docs/optimization/optimize-cryptographic-parameters/p-error.md new file mode 100644 index 0000000000..7f5e014656 --- /dev/null +++ b/docs/optimization/optimize-cryptographic-parameters/p-error.md @@ -0,0 +1,31 @@ +### Adjusting table lookup error probability + +This guide teaches how setting the `p_error` configuration option can affect the performance of Concrete circuits. + +Adjusting table lookup error probability is discussed extensively in the [Table lookup exactness](../../core-features/table_lookups_advanced.md#table-lookup-exactness) section. The idea is to sacrifice exactness to gain performance.
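Lowering exactness per table lookup also compounds across a circuit. As a rough, back-of-the-envelope sketch, assuming each table lookup fails independently with probability `p_error` (a simplification; the helper below is hypothetical and not part of Concrete), the chance that at least one of `k` lookups is wrong is `1 - (1 - p_error) ** k`:

```python
# Hypothetical helper: models each table lookup as failing independently
# with probability p_error, which is a simplifying assumption.
def probability_of_any_error(p_error: float, num_table_lookups: int) -> float:
    """Probability that at least one of `num_table_lookups` lookups is wrong."""
    return 1 - (1 - p_error) ** num_table_lookups


# 100 table lookups is an arbitrary, illustrative circuit size
for p_error in [1 / 1_000_000, 1 / 10_000, 1 / 100]:
    chance = probability_of_any_error(p_error, 100)
    print(f"p_error={p_error:.6f} -> {chance:.4%} chance of at least one wrong lookup")
```

This is why a `p_error` that looks tiny for a single lookup can still matter for circuits with many lookups.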
+ +For example: + +```python +import numpy as np +from concrete import fhe + +def f(x, y): + return (x // 2) * (y // 3) + +inputset = fhe.inputset(fhe.uint4, fhe.uint4) +for p_error in [(1 / 1_000_000), (1 / 100_000), (1 / 10_000), (1 / 1_000), (1 / 100)]: + compiler = fhe.Compiler(f, {"x": "encrypted", "y": "encrypted"}) + circuit = compiler.compile(inputset, p_error=p_error) + print(f"p_error of {p_error:.6f} -> {int(circuit.complexity):_} complexity") +``` + +prints: + +``` +p_error of 0.000001 -> 294_773_524 complexity +p_error of 0.000010 -> 286_577_520 complexity +p_error of 0.000100 -> 275_887_080 complexity +p_error of 0.001000 -> 265_196_640 complexity +p_error of 0.010000 -> 184_144_972 complexity +``` diff --git a/docs/optimization/optimize-cryptographic-parameters/self.md b/docs/optimization/optimize-cryptographic-parameters/self.md new file mode 100644 index 0000000000..f19f57501e --- /dev/null +++ b/docs/optimization/optimize-cryptographic-parameters/self.md @@ -0,0 +1,5 @@ +## Optimize cryptographic parameters + +This guide teaches how to help the Concrete Optimizer select more performant parameters to improve the execution time of Concrete circuits. + +The idea is to obtain better cryptographic parameters (especially for table lookups) without changing the operations within the circuit. diff --git a/docs/optimization/optimize-table-lookups/approximate.md b/docs/optimization/optimize-table-lookups/approximate.md new file mode 100644 index 0000000000..bf2be52933 --- /dev/null +++ b/docs/optimization/optimize-table-lookups/approximate.md @@ -0,0 +1,36 @@ +### Activating approximate mode for rounding + +This guide teaches how to improve the execution time of Concrete circuits by using approximate mode for rounding.
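As a reminder of what exact rounding computes, here is a cleartext sketch in plain Python. The helper names are hypothetical, and the tie-breaking rule (round half up) is an assumption; see the rounding documentation for the precise semantics, including overflow behavior. Approximate mode relaxes this computation further, so its results may deviate from this exact model.

```python
def round_bit_pattern_model(x: int, lsbs_to_remove: int) -> int:
    """Cleartext model of rounding away the `lsbs_to_remove` least significant bits.

    Assumes round-half-up: x goes to the nearest multiple of 2**lsbs_to_remove.
    """
    if lsbs_to_remove == 0:
        return x
    half = 1 << (lsbs_to_remove - 1)
    return ((x + half) >> lsbs_to_remove) << lsbs_to_remove


def truncate_bit_pattern_model(x: int, lsbs_to_remove: int) -> int:
    """Cleartext model of truncation: the removed bits are simply zeroed."""
    return (x >> lsbs_to_remove) << lsbs_to_remove


# Compare rounding and truncation when removing the 2 least significant bits
for x in [5, 6, 7, 8]:
    print(x, round_bit_pattern_model(x, 2), truncate_bit_pattern_model(x, 2))
```

In both cases the subsequent table lookup only needs to see the remaining high bits, which is where the performance gain comes from.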
+ +You can enable [approximate mode](../../core-features/rounding.md#exactness) to gain even more performance when using rounding by sacrificing some more exactness: + +```python +import numpy as np +from concrete import fhe + +inputset = fhe.inputset(fhe.uint10) +for lsbs_to_remove in range(0, 10): + def f(x): + return fhe.round_bit_pattern(x, lsbs_to_remove, exactness=fhe.Exactness.APPROXIMATE) // 2 + + compiler = fhe.Compiler(f, {"x": "encrypted"}) + circuit = compiler.compile(inputset) + + print(f"{lsbs_to_remove=} -> {int(circuit.complexity):>13_} complexity") + +``` + +prints: + +``` +lsbs_to_remove=0 -> 9_134_406_574 complexity +lsbs_to_remove=1 -> 5_548_275_712 complexity +lsbs_to_remove=2 -> 2_430_793_927 complexity +lsbs_to_remove=3 -> 1_058_638_119 complexity +lsbs_to_remove=4 -> 409_952_712 complexity +lsbs_to_remove=5 -> 172_138_947 complexity +lsbs_to_remove=6 -> 99_198_195 complexity +lsbs_to_remove=7 -> 71_644_380 complexity +lsbs_to_remove=8 -> 55_860_516 complexity +lsbs_to_remove=9 -> 50_978_148 complexity +``` diff --git a/docs/optimization/optimize-table-lookups/bit-extraction.md b/docs/optimization/optimize-table-lookups/bit-extraction.md new file mode 100644 index 0000000000..b4e4f59acf --- /dev/null +++ b/docs/optimization/optimize-table-lookups/bit-extraction.md @@ -0,0 +1,38 @@ +### Utilizing bit extraction + +This guide teaches how to improve the execution time of Concrete circuits by using bit extraction. + +[Bit extraction](../../core-features/bit_extraction.md) is a cheap way to extract certain bits of encrypted values. It can be very useful for improving the performance of circuits. 
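In cleartext terms, bit `i` of `x` is `(x >> i) & 1`, with index 0 being the least significant bit (consistent with the `1 - fhe.bits(x)[0]` parity check in the next example). The plain-Python sketch below, using a hypothetical `bit_model` helper, checks that this parity trick agrees with `x % 2 == 0` for every 6-bit input:

```python
def bit_model(x: int, i: int) -> int:
    """Cleartext model of extracting bit i (0 = least significant) of x."""
    return (x >> i) & 1


# x % 2 == 0 depends only on the least significant bit, so "1 - bit 0"
# gives the same answer for every 6-bit input.
assert all((1 - bit_model(x, 0)) == int(x % 2 == 0) for x in range(2 ** 6))

# Bits of 0b101101 (45), least significant first
print([bit_model(0b101101, i) for i in range(6)])
```

A full table lookup for `x % 2` has to process all 6 input bits, whereas the parity only needs the lowest one, which is why bit extraction is so much cheaper here.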
+ +For example: + +```python +import numpy as np +from concrete import fhe + +inputset = fhe.inputset(fhe.uint6) +for bit_extraction in [False, True]: + def is_even(x): + return ( + x % 2 == 0 + if not bit_extraction + else 1 - fhe.bits(x)[0] + ) + + compiler = fhe.Compiler(is_even, {"x": "encrypted"}) + circuit = compiler.compile(inputset) + + if not bit_extraction: + print(f"without bit extraction -> {int(circuit.complexity):>11_} complexity") + else: + print(f" with bit extraction -> {int(circuit.complexity):>11_} complexity") +``` + +prints: + +``` +without bit extraction -> 230_210_706 complexity + with bit extraction -> 29_506_014 complexity +``` + +That's an almost 8x reduction in circuit complexity! diff --git a/docs/optimization/optimize-table-lookups/reducing-amount.md b/docs/optimization/optimize-table-lookups/reducing-amount.md new file mode 100644 index 0000000000..0f6eaa5d19 --- /dev/null +++ b/docs/optimization/optimize-table-lookups/reducing-amount.md @@ -0,0 +1,94 @@ +### Reducing the amount of table lookups + +This guide teaches how to improve the execution time of Concrete circuits by reducing the number of table lookups. + +Reducing the number of table lookups is probably the most involved technique in this section, as it's not automated. The idea is to use mathematical properties of operations to reduce the number of table lookups needed to achieve the result. + +A good example is adding big integers in bitmap representation (one bit per tensor element, least significant bit first). Here is the basic implementation: + +```python +def add_bitmaps(x, y): + result = fhe.zeros((N,)) + carry = 0 + + addition = x + y + for i in range(N): + addition_and_carry = addition[i] + carry + carry = addition_and_carry >> 1 + result[i] = addition_and_carry % 2 + + return result +``` + +There are two table lookups within the loop body, one for `>>` and one for `%`.
+ +This implementation is not optimal, though: the same output can be achieved with just a single table lookup per loop iteration: + +```python +def add_bitmaps(x, y): + result = fhe.zeros((N,)) + carry = 0 + + addition = x + y + for i in range(N): + addition_and_carry = addition[i] + carry + carry = addition_and_carry >> 1 + result[i] = addition_and_carry - (carry * 2) + + return result +``` + +This works because, once `carry = addition_and_carry >> 1` is known, `addition_and_carry % 2` is mathematically equivalent to `addition_and_carry - (carry * 2)`, and the subtraction and multiplication by a constant don't require a table lookup. The optimized operations achieve the same output with fewer table lookups! + +Here is the full code example and some numbers for this optimization: + +```python +import numpy as np +from concrete import fhe + +N = 32 + +def add_bitmaps_naive(x, y): + result = fhe.zeros((N,)) + carry = 0 + + addition = x + y + for i in range(N): + addition_and_carry = addition[i] + carry + carry = addition_and_carry >= 2 + result[i] = addition_and_carry % 2 + + return result + +def add_bitmaps_optimized(x, y): + result = fhe.zeros((N,)) + carry = 0 + + addition = x + y + for i in range(N): + addition_and_carry = addition[i] + carry + carry = addition_and_carry >> 1 + result[i] = addition_and_carry - (carry * 2) + + return result + +inputset = fhe.inputset(fhe.tensor[fhe.uint1, N], fhe.tensor[fhe.uint1, N]) +for (name, implementation) in [("naive", add_bitmaps_naive), ("optimized", add_bitmaps_optimized)]: + compiler = fhe.Compiler(implementation, {"x": "encrypted", "y": "encrypted"}) + circuit = compiler.compile(inputset) + + print( + f"{name:>9} implementation " + f"-> {int(circuit.programmable_bootstrap_count)} table lookups " + f"-> {int(circuit.complexity):_} complexity" + ) +``` + +prints: + +``` + naive implementation -> 63 table lookups -> 2_427_170_697 complexity +optimized implementation -> 32 table lookups -> 1_224_206_208 complexity +``` + +which is about half as many table lookups and roughly half the complexity for the same operation!
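The identity behind this optimization can be checked in cleartext. The plain-Python sketch below uses lists as stand-ins for the encrypted tensors and a smaller, illustrative `N`; it assumes the least-significant-bit-first layout implied by the carry flowing from index `i` to `i + 1`. It verifies that `a % 2 == a - 2 * (a >> 1)` for every possible `addition_and_carry` value and that both loop bodies reproduce ordinary integer addition modulo `2 ** N`:

```python
import random

N = 8  # illustrative width; the example above uses N = 32


def to_bits(value, n):
    """Bitmap with element i holding bit i of value (least significant first)."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def add_bitmaps_naive(x, y):
    result, carry = [0] * len(x), 0
    for i in range(len(x)):
        addition_and_carry = x[i] + y[i] + carry
        carry = addition_and_carry >> 1     # would be a table lookup in FHE
        result[i] = addition_and_carry % 2  # would be a second table lookup
    return result


def add_bitmaps_optimized(x, y):
    result, carry = [0] * len(x), 0
    for i in range(len(x)):
        addition_and_carry = x[i] + y[i] + carry
        carry = addition_and_carry >> 1               # the only table lookup
        result[i] = addition_and_carry - (carry * 2)  # subtraction and constant multiplication only
    return result


# addition_and_carry is always in 0..3, so checking those values is exhaustive
assert all(a % 2 == a - 2 * (a >> 1) for a in range(4))

rng = random.Random(0)
for _ in range(1000):
    a, b = rng.randrange(2 ** N), rng.randrange(2 ** N)
    x, y = to_bits(a, N), to_bits(b, N)
    expected = (a + b) % (2 ** N)  # the final carry is dropped
    assert from_bits(add_bitmaps_naive(x, y)) == expected
    assert from_bits(add_bitmaps_optimized(x, y)) == expected

print("both implementations agree with integer addition")
```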
diff --git a/docs/optimization/optimize-table-lookups/round-truncate.md b/docs/optimization/optimize-table-lookups/round-truncate.md new file mode 100644 index 0000000000..b4f4e68056 --- /dev/null +++ b/docs/optimization/optimize-table-lookups/round-truncate.md @@ -0,0 +1,37 @@ +### Using round/truncate bit pattern before table lookups + +This guide teaches how to improve the execution time of Concrete circuits by using some special operations that reduce the bit width of the input of the table lookup. + +There are two extensions which can reduce the bit width of the table lookup input, [fhe.round_bit_pattern(...)](../../core-features/rounding.md) and [fhe.truncate_bit_pattern(...)](../../core-features/truncating.md), which can improve performance by sacrificing exactness. + +For example the following code: + +```python +import numpy as np +from concrete import fhe + +inputset = fhe.inputset(fhe.uint10) +for lsbs_to_remove in range(0, 10): + def f(x): + return fhe.round_bit_pattern(x, lsbs_to_remove) // 2 + + compiler = fhe.Compiler(f, {"x": "encrypted"}) + circuit = compiler.compile(inputset) + + print(f"{lsbs_to_remove=} -> {int(circuit.complexity):>13_} complexity") +``` + +prints: + +``` +lsbs_to_remove=0 -> 9_134_406_574 complexity +lsbs_to_remove=1 -> 3_209_430_092 complexity +lsbs_to_remove=2 -> 1_536_476_735 complexity +lsbs_to_remove=3 -> 1_588_749_586 complexity +lsbs_to_remove=4 -> 848_133_081 complexity +lsbs_to_remove=5 -> 525_987_801 complexity +lsbs_to_remove=6 -> 358_276_023 complexity +lsbs_to_remove=7 -> 373_311_341 complexity +lsbs_to_remove=8 -> 400_596_351 complexity +lsbs_to_remove=9 -> 438_681_996 complexity +``` diff --git a/docs/optimization/optimize-table-lookups/self.md b/docs/optimization/optimize-table-lookups/self.md new file mode 100644 index 0000000000..3fc8bb0f66 --- /dev/null +++ b/docs/optimization/optimize-table-lookups/self.md @@ -0,0 +1,78 @@ +## Optimize table lookups + +This guide teaches how costly table lookups are, and how 
to optimize them to improve the execution time of Concrete circuits. + +The most costly operation in Concrete is the table lookup operation, so one of the primary goals of optimizing performance is to reduce the amount of table lookups. + +Furthermore, the bit width of the input of the table lookup plays a major role in performance. + +```python +import time + +import numpy as np +import matplotlib.pyplot as plt +from concrete import fhe + +def f(x): + return x // 2 + +bit_widths = list(range(2, 9)) +complexities = [] +timings = [] + +for bit_width in bit_widths: + inputset = fhe.inputset(lambda _: np.random.randint(0, 2 ** bit_width)) + + compiler = fhe.Compiler(f, {"x": "encrypted"}) + circuit = compiler.compile(inputset) + + circuit.keygen() + for sample in inputset[:3]: # warmup + circuit.encrypt_run_decrypt(*sample) + + current_timings = [] + for sample in inputset[3:13]: + start = time.time() + result = circuit.encrypt_run_decrypt(*sample) + end = time.time() + + assert np.array_equal(result, f(*sample)) + current_timings.append(end - start) + + complexities.append(int(circuit.complexity)) + timings.append(float(np.mean(current_timings))) + + print(f"{bit_width} bits -> {complexities[-1]:>13_} complexity -> {timings[-1]:.06f}s") + +figure, complexity_axis = plt.subplots() + +color = "tab:red" +complexity_axis.set_xlabel("bit width") +complexity_axis.set_ylabel("complexity", color=color) +complexity_axis.plot(bit_widths, complexities, color=color) +complexity_axis.tick_params(axis="y", labelcolor=color) + +timing_axis = complexity_axis.twinx() + +color = 'tab:blue' +timing_axis.set_ylabel('execution time', color=color) +timing_axis.plot(bit_widths, timings, color=color) +timing_axis.tick_params(axis='y', labelcolor=color) + +figure.tight_layout() +plt.show() +``` + +The code above prints: +``` +2 bits -> 29_944_416 complexity -> 0.019826s +3 bits -> 42_154_798 complexity -> 0.020093s +4 bits -> 61_979_934 complexity -> 0.021961s +5 bits -> 99_198_195 
complexity -> 0.029475s +6 bits -> 230_210_706 complexity -> 0.062841s +7 bits -> 535_706_740 complexity -> 0.139669s +8 bits -> 1_217_510_420 complexity -> 0.318838s +``` + +And displays: +![Complexity and execution time per input bit width](../../_static/compilation/performance_tips/complexity_and_timing_per_bit_width.png) diff --git a/docs/optimization/optimize-table-lookups/strategies.md b/docs/optimization/optimize-table-lookups/strategies.md new file mode 100644 index 0000000000..492743cf31 --- /dev/null +++ b/docs/optimization/optimize-table-lookups/strategies.md @@ -0,0 +1,98 @@ +### Changing the implementation strategy of complex operations + +This guide teaches how to improve the execution time of Concrete circuits by using different implementation strategies for complex operations. + +Concrete provides multiple implementation strategies for these complex operations: + +- [comparisons (<,<=,==,!=,>=,>)](../../core-features/comparisons.md) +- [bitwise operations (<<,&,|,^,>>)](../../core-features/bitwise.md) +- [minimum and maximum operations](../../core-features/minmax.md) +- [multivariate extension](../../core-features/extensions.md#fhemultivariatefunction) + +{% hint style="info" %} +The default strategy is the one that doesn't increase the input bit width, even if it's less performant than the others. If you don't care about the input bit widths (e.g., if the inputs are only used in this operation), you should definitely change the default strategy. +{% endhint %} + +Choosing the right strategy can lead to big speedups, so if you are not sure which one to use, compile with different strategies and compare the complexity.
+ +For example, the following code: + +```python +import numpy as np +from concrete import fhe + +def f(x, y): + return x & y + +inputset = fhe.inputset(fhe.uint3, fhe.uint4) +strategies = [ + fhe.BitwiseStrategy.ONE_TLU_PROMOTED, + fhe.BitwiseStrategy.THREE_TLU_CASTED, + fhe.BitwiseStrategy.TWO_TLU_BIGGER_PROMOTED_SMALLER_CASTED, + fhe.BitwiseStrategy.TWO_TLU_BIGGER_CASTED_SMALLER_PROMOTED, + fhe.BitwiseStrategy.CHUNKED, +] + +for strategy in strategies: + compiler = fhe.Compiler(f, {"x": "encrypted", "y": "encrypted"}) + circuit = compiler.compile(inputset, bitwise_strategy_preference=strategy) + print( + f"{strategy:>55} " + f"-> {circuit.programmable_bootstrap_count:>2} TLUs " + f"-> {int(circuit.complexity):>12_} complexity" + ) +``` + +prints: + +``` + BitwiseStrategy.ONE_TLU_PROMOTED -> 1 TLUs -> 535_706_740 complexity + BitwiseStrategy.THREE_TLU_CASTED -> 3 TLUs -> 599_489_229 complexity + BitwiseStrategy.TWO_TLU_BIGGER_PROMOTED_SMALLER_CASTED -> 2 TLUs -> 522_239_955 complexity + BitwiseStrategy.TWO_TLU_BIGGER_CASTED_SMALLER_PROMOTED -> 2 TLUs -> 519_246_216 complexity + BitwiseStrategy.CHUNKED -> 6 TLUs -> 358_905_521 complexity +``` + +or: + +```python +import numpy as np +from concrete import fhe + +def f(x, y): + return x == y + +inputset = fhe.inputset(fhe.uint4, fhe.uint7) +strategies = [ + fhe.ComparisonStrategy.ONE_TLU_PROMOTED, + fhe.ComparisonStrategy.THREE_TLU_CASTED, + fhe.ComparisonStrategy.TWO_TLU_BIGGER_PROMOTED_SMALLER_CASTED, + fhe.ComparisonStrategy.TWO_TLU_BIGGER_CASTED_SMALLER_PROMOTED, + fhe.ComparisonStrategy.THREE_TLU_BIGGER_CLIPPED_SMALLER_CASTED, + fhe.ComparisonStrategy.TWO_TLU_BIGGER_CLIPPED_SMALLER_PROMOTED, + fhe.ComparisonStrategy.CHUNKED, +] + +for strategy in strategies: + compiler = fhe.Compiler(f, {"x": "encrypted", "y": "encrypted"}) + circuit = compiler.compile(inputset, comparison_strategy_preference=strategy) + print( + f"{strategy:>58} " + f"-> {circuit.programmable_bootstrap_count:>2} TLUs " + f"-> 
{int(circuit.complexity):>13_} complexity" + ) +``` + +prints: + +``` + ComparisonStrategy.ONE_TLU_PROMOTED -> 1 TLUs -> 1_217_510_420 complexity + ComparisonStrategy.THREE_TLU_CASTED -> 3 TLUs -> 751_172_128 complexity + ComparisonStrategy.TWO_TLU_BIGGER_PROMOTED_SMALLER_CASTED -> 2 TLUs -> 1_043_702_103 complexity + ComparisonStrategy.TWO_TLU_BIGGER_CASTED_SMALLER_PROMOTED -> 2 TLUs -> 1_898_305_707 complexity +ComparisonStrategy.THREE_TLU_BIGGER_CLIPPED_SMALLER_CASTED -> 3 TLUs -> 751_172_128 complexity +ComparisonStrategy.TWO_TLU_BIGGER_CLIPPED_SMALLER_PROMOTED -> 2 TLUs -> 682_694_770 complexity + ComparisonStrategy.CHUNKED -> 3 TLUs -> 751_172_128 complexity +``` + +As you can see, the choice of strategy can affect performance a lot, and the number of TLUs alone doesn't determine the complexity. Make sure to select the appropriate strategy for your use case if you want to optimize performance. diff --git a/docs/optimization/self.md b/docs/optimization/self.md new file mode 100644 index 0000000000..171e935cbf --- /dev/null +++ b/docs/optimization/self.md @@ -0,0 +1,8 @@ +# Optimization + +This guide explains in depth how to optimize Concrete circuits. + +It's split into 3 sections: +- [Improve parallelism](./improve-parallelism/self.md): to show how to make circuits utilize more cores. +- [Optimize table lookups](./optimize-table-lookups/self.md): to show how to optimize the most expensive operation in Concrete. +- [Optimize cryptographic parameters](./optimize-cryptographic-parameters/self.md): to show how to make Concrete select more performant parameters. diff --git a/docs/optimization/summary.md b/docs/optimization/summary.md new file mode 100644 index 0000000000..cc7f8f7235 --- /dev/null +++ b/docs/optimization/summary.md @@ -0,0 +1,15 @@ +# Performance + +This document shows some basic things you can do to improve the performance of your circuit. + +Here are some quick tips to reduce the execution time of your circuit: + +- Reduce the number of [table lookups](../core-features/table_lookups.md) in the circuit.
+- Try different implementation strategies for [complex operations](../core-features/non_linear_operations.md#comparisons). +- Utilize [rounding](../core-features/rounding.md) and [truncating](../core-features/truncating.md) if your application doesn't require precise execution. +- Use tensors as much as possible in your circuits. +- Enable dataflow parallelization (only supported on Linux) by setting `dataflow_parallelize=True` in the [configuration](../guides/configure.md). +- Tweak the `p_error` configuration option until you find the exactness vs. performance tradeoff that suits your application. +- Specify composition when using [modules](../compilation/composing_functions_with_modules.md#optimizing-runtimes-with-composition-policies). + +Refer to the full [Optimization Guide](../optimization/self.md) for detailed examples of each of these tips, and more!