Document whether fetch/4 is an atomic operation #393

Open · 1player opened this issue Dec 3, 2024 · 4 comments

1player commented Dec 3, 2024

Use case: I want to fetch a key from Cachex, or run an expensive operation to generate it. In a multi-node cluster, I'd like to do the operation once, which means that the key should be locked until the fallback function returns.

#190 clarified that get_and_update/4 is atomic and uses transactions, but it is not clear to me whether fetch/4 is. If it is not atomic, I wonder about its usefulness, given that get_and_update/4 is atomic and therefore more reliable in many scenarios.

So this issue is to ask whether my assumptions are correct, and to suggest that perhaps the documentation should be a bit clearer about this.
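
For illustration, here is a minimal sketch of the pattern I have in mind; the cache name, the "report" key and the :generated_report value are just placeholders, and the sleep stands in for the expensive operation:

# start a cache (the name here is a placeholder)
Cachex.start(:cache)

# fetch-or-compute: if "report" is missing, run the fallback and commit
# its result; ideally this runs only once, even with concurrent callers
Cachex.fetch(:cache, "report", fn _key ->
  :timer.sleep(1000)              # stand-in for the expensive generation
  {:commit, :generated_report}    # commit the computed value to the cache
end)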

whitfin (Owner) commented Dec 3, 2024

@1player it is not atomic, otherwise it would be documented as such.

I'm not sure why you would say it's not useful for that reason; why would you want to block your entire cache on an operation that could take (in some cases) multiple seconds?

You are correct that a naive implementation of fetch/4 could be built using get_and_update/4. That would make fetch/4 redundant (as you point out), so the actual fetch/4 implementation is much more efficient: it avoids locking the cache table while still ensuring that there are no overlapping calls on the same key.
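
For the sake of illustration, such a naive version built on get_and_update/4 might look roughly like this; the module and function names are just placeholders, not the actual Cachex internals:

defmodule NaiveFetch do
  # naive fetch-or-compute: the whole call runs inside a transaction,
  # so the key is effectively held for the full duration of the fallback,
  # which is exactly the locking that fetch/4 avoids
  def fetch(cache, key, fallback) do
    Cachex.get_and_update(cache, key, fn
      nil -> fallback.(key)   # miss: compute inside the transaction
      value -> value          # hit: write the existing value back unchanged
    end)
  end
end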

This behaviour is documented here (which is also available in the published documentation).

Does this answer your question? What would you change in the documentation?

whitfin added the discuss label Dec 3, 2024
1player (Author) commented Dec 4, 2024

Thank you for clarifying!

it is not atomic, otherwise it would be documented as such.

I assumed the lack of any mention of atomicity was just an oversight, rather than an implicit statement that it is not atomic. I think it would be more useful to state this explicitly in the fetch/4 docs, and to direct users to get_and_update/4 if they need stronger guarantees.

The cache warmer example you linked is quite useful and clearly answers my question, but I didn't come across it because I have no need for cache warming, so I never thought to look there for answers :)

1player (Author) commented Dec 4, 2024

Apologies, now I am confused. You say:

the fetch/4 implementation is much more efficient by avoiding locking the cache table while still ensuring that there are no overlapping calls on the same key

So if no overlapping calls are allowed, it's basically "atomic", i.e. it operates as if that key is locked while a fetch operation is executing, which is exactly what I need. That's what I understand from your comment.

Since I'm apparently too slow, and you haven't confirmed explicitly: can I use fetch/4 on a particular key and have the guarantee there will be NO overlapping calls in the cluster? I just want to compute an expensive operation once.

whitfin (Owner) commented Dec 4, 2024

No problem! Let me try to be clearer; it is a bit awkward to explain in words, so it might be easier with examples.

If you do something like this:

Cachex.fetch(:cache, :my_key, fn -> 
  :timer.sleep(5000)
  :ok
end)

Other writes to keys can still be running in the background, and that includes the key :my_key. So during that 5s sleep, you could do something like this and it would work:

Cachex.put(:cache, :my_key, :my_value)

In this case, after the 5s your fetch/4 would realise that the key has been set in the meantime and simply return :my_value (discarding the computed value), because :my_value was technically the last value in (based on call time). So it's not "atomic"; the key is still both readable and writeable.
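
Put together as something you could paste into iex (the timings are arbitrary, and :computed_value is just a placeholder):

# start a cache for the example
Cachex.start(:cache)

# kick off the slow fetch in the background
task =
  Task.async(fn ->
    Cachex.fetch(:cache, :my_key, fn _key ->
      :timer.sleep(5000)
      {:commit, :computed_value}
    end)
  end)

# while the fallback is still sleeping, another write lands on the same key
:timer.sleep(100)
Cachex.put(:cache, :my_key, :my_value)

# the fetch resolves with the value written in the meantime,
# discarding the computed value (as described above)
Task.await(task, 10_000)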

I just want to compute an expensive operation once

Yep! That's where fetch/4 is far superior to get_and_update/4. The fetch/4 implementation specifically has a mechanism to achieve what you're describing. Since we keep comparing with get_and_update/4, I'll tweak the documented example to use it, in case that's clearer:

# start a new cache
Cachex.start(:cache)

# via get_and_update/4: every spawned caller runs the handler itself
for _ <- 1..10 do
  spawn(fn ->
    Cachex.get_and_update(:cache, "key1", fn value ->
      IO.puts("Running get_and_update/4 handler")
      case value do
        nil -> :timer.sleep(1000)   # simulate the expensive work on a miss
        value -> value              # keep the existing value on a hit
      end
    end)
  end)
end

# via fetch/4: overlapping calls on the same key wait for the first one
for _ <- 1..10 do
  spawn(fn ->
    Cachex.fetch(:cache, "key2", fn _key ->
      IO.puts("Running fetch/4 handler")
      :timer.sleep(1000)   # simulate the expensive work
    end)
  end)
end

If you run this, you'll see that the function you provide to get_and_update/4 is executed 10 times. In contrast, the function you provide to fetch/4 is executed only a single time. If Cachex knows you already have a fetch/4 running for a key, it will queue up any additional calls to fetch/4 on that key to wait and resolve with the value of the first. So it's not "atomic", but it does have measures to make sure that you don't (e.g.) spawn 100 calls to a database when you only need 1.
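
If you want to see the queueing more directly, you can collect the results of the concurrent calls instead of just spawning them; the "key3" key and the :expensive_result atom are placeholders:

Cachex.start(:cache)

results =
  1..10
  |> Enum.map(fn _ ->
    Task.async(fn ->
      Cachex.fetch(:cache, "key3", fn _key ->
        IO.puts("Running fetch/4 handler")
        :timer.sleep(1000)
        {:commit, :expensive_result}
      end)
    end)
  end)
  |> Enum.map(&Task.await(&1, 5000))

# the handler prints only once, and every caller resolves
# with the same :expensive_result
IO.inspect(results)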

Does this make more sense? I'm happy to clarify anything if it's still confusing.
