Performance: Cache translations to reduce file read and parse operations #2939
base: main
Conversation
That's an interesting idea, but we'll have to think about the implications - increased memory size and extra complexity. @salochara has been investigating another big opportunity for improving the performance of i18n translations by moving away from yaml to json files. The jump in performance would be much larger than 10% without any cost to memory usage. I think we should look at the json vs yaml thing first, as I believe it will have a higher impact. Then we could come back to your idea to explore further improvements.
@thdaraujo I agree that investigating the json vs yaml performance improvement is a great idea, but this translation cache is at a higher level than …

This cache does not increase memory size by any significant amount. The 2.2MB number mentioned above is an extreme case of someone using every single available … Just to be clear, this PR does not load the entire translation directory into memory. It is a cache that warms as the application makes calls to the … If you were to use all …
Ah okay, that makes more sense then! I will think about it, thanks for adding more context! Interestingly, it looks like i18n already allows a cache backend to be plugged in for caching translations:
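The i18n docs describe that pluggable cache store as anything responding to `#fetch` and `#write`. As a rough sketch of how small that contract is (the class name and internals below are hypothetical, not from i18n itself):

```ruby
# Hypothetical minimal store satisfying the i18n cache-store contract:
# it only needs to respond to #fetch and #write.
class MinimalCacheStore
  def initialize
    @data = {}
  end

  # Return the cached value, or compute and store it via the block on a miss.
  def fetch(key, options = {})
    return @data[key] if @data.key?(key)

    if block_given?
      value = yield
      write(key, value)
      value
    end
  end

  def write(key, value, options = {})
    @data[key] = value
  end
end
```

In a Rails app one would typically hand i18n an `ActiveSupport::Cache` store instead of rolling one by hand; the sketch just shows how small the required surface is.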
I'm thinking about the pros and cons of faker keeping a translation cache versus just utilizing i18n's built-in cache backend. According to their docs, we would need to pass a cache store that only needs to respond to `#fetch` and `#write`.

We should also be careful with thread-safety, given that faker could be running in a threaded web server. Do you think it's worth it to introduce a mutex, or to store the cache in a thread-local variable? Probably not a big deal for this specific feature, as I think the only potential issue would be cache misses.
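A mutex-guarded variant is cheap to sketch. Everything below is hypothetical (the class and method names are mine, not the PR's); it only illustrates that serializing access turns the worst-case race into a recomputation rather than corruption:

```ruby
# Hypothetical thread-safe translation cache: a Mutex serializes access to
# the underlying Hash, so the worst a race can cause is an extra cache miss
# (the block simply runs again and produces the same value).
class ThreadSafeTranslationCache
  def initialize
    @store = {}
    @mutex = Mutex.new
  end

  # Fetch the value for +key+, computing it with the block on a miss.
  def fetch(key)
    @mutex.synchronize do
      @store.fetch(key) { |k| @store[k] = yield }
    end
  end
end
```

On MRI the GVL makes individual Hash reads and writes atomic in practice, but the mutex is the portable choice if JRuby or TruffleRuby are in scope.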
```diff
@@ -161,8 +163,9 @@ def parse(key)
       # locale is specified
       def translate(*args, **opts)
         opts[:locale] ||= Faker::Config.locale
+        translate_key = args.to_s + opts.sort.join
```
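The key construction on the added line can be checked in isolation. This standalone snippet (the helper name is mine, not the PR's) shows that `opts.sort.join` makes the key independent of option ordering:

```ruby
# Mirror of the PR's key expression, extracted into a helper for testing.
def translate_key(args, opts)
  args.to_s + opts.sort.join
end

# The same options in a different order produce the identical cache key,
# because Hash#sort orders the [key, value] pairs deterministically.
key_a = translate_key(['name.first_name'], { locale: :en, count: 2 })
key_b = translate_key(['name.first_name'], { count: 2, locale: :en })
key_a == key_b  # => true
```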
Maybe `translate_key` should be hashed?
It does sound reasonable to hash the `translate_key`, but there is no reason to compute this digest because the `translate_key` (as provided in this PR) is a valid string and can be used for lookups. The extra time taken on every method call to compute even an `md5` digest erases all performance gains for this PR.

Another interesting fact here is that `i18n` computes the `sha256` digest for all cache keys if you enable their default translation cache. This suggests that implementing the `i18n` default cache would also erase the performance gains for this PR.
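The claim is easy to sanity-check: the concatenated string already works as a Hash key, and any digest is strictly additional per-call work. A rough illustration (the variable names are mine; nothing here is from the PR):

```ruby
require 'digest'

# The plain concatenated key is a valid Hash key as-is.
key = 'name.first_name' + { locale: :en }.sort.join

cache = {}
cache[key] = 'Alice'                    # plain string key: no extra step

# An i18n-style hashed key adds one SHA-256 computation on every call
# before the lookup can even happen.
hashed = Digest::SHA256.hexdigest(key)
cache[hashed] = 'Alice'
```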
Motivation / Background

This Pull Request has been created because the `Faker` library is reading from disk and parsing the `yml` translation files on nearly every method call. These redundant "file read and YAML parse" operations can be efficiently cached in memory, making the entire library 10.6% faster, full modules up to 18% faster, and individual methods up to 30% faster.

Additional Information
The cache key combines the `args` string with a deterministically serialized (sorted) `opts` hash. (This allows the cache to generate the correct lookup key regardless of the order of the provided `opts` keys.) The result of `I18n.translate` is then either retrieved from memory, or cached for future lookup and returned.

This cache speeds up the operation of the entire `Faker` library by 10.6%, but it comes with the slight tradeoff of increasing memory size as the cache is warmed (as you use Faker methods within your program). Fortunately, the ENTIRE `Faker` English translation directory is only 2.2MB, while all `Faker` translations combined are 7.1MB. Allocating anywhere from 2MB to 7MB of extra memory for the `Faker` library to run 10.6% faster is a great tradeoff today, with system memory typically measured in thousands of megabytes.

Performance Benchmark
You can see from the benchmark below that, after caching the redundant "file read and YAML parse" operations, the `Faker` modules perform up to 18% faster. Even the popular `Faker::Lorem` receives a 14.8% performance increase (averaged across all methods within that module). When combined, the entire `Faker` library benefits from a 10.6% speed improvement. Here is a list of the top 50 improved module times from caching the translations:

Checklist
Before submitting the PR, make sure the following are checked:
[Fix #issue-number]