Document whether fetch/4 is an atomic operation #393

Open · 1player opened this issue Dec 3, 2024 · 4 comments

1player commented Dec 3, 2024

Use case: I want to fetch a key from Cachex, or run an expensive operation to generate it. In a multi-node cluster, I'd like to do the operation once, which means that the key should be locked until the fallback function returns.

#190 clarified that get_and_update/4 is atomic and uses transactions, but it is not clear to me whether fetch/4 is. If it is not atomic, I wonder about its usefulness, given that get_and_update/4 is atomic and therefore more reliable in many scenarios.

So this issue is to ask whether my assumptions are correct, and to suggest that perhaps the documentation should be a bit clearer about this.
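
For illustration, here is a minimal sketch of the pattern I have in mind; the cache name, the "report" key and the :generated_report value are just placeholders, and the sleep stands in for the expensive operation:

# start a cache (the name here is a placeholder)
Cachex.start(:cache)

# fetch-or-compute: if "report" is missing, run the fallback and commit
# its result; ideally this runs only once, even with concurrent callers
Cachex.fetch(:cache, "report", fn _key ->
  :timer.sleep(1000)              # stand-in for the expensive generation
  {:commit, :generated_report}    # commit the computed value to the cache
end)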

whitfin (Owner) commented Dec 3, 2024

@1player it is not atomic, otherwise it would be documented as such.

I'm not sure why you would say it's not useful for that reason; why would you want to block your entire cache on an operation that could take (in some cases) multiple seconds?

You are correct that a naive implementation of fetch/4 could be built using get_and_update/4. That would make fetch/4 redundant (as you point out), so the actual fetch/4 implementation is much more efficient: it avoids locking the cache table while still ensuring that there are no overlapping calls on the same key.
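
For the sake of illustration, such a naive version built on get_and_update/4 might look roughly like this; the module and function names are just placeholders, not the actual Cachex internals:

defmodule NaiveFetch do
  # naive fetch-or-compute: the whole call runs inside a transaction,
  # so the key is effectively held for the full duration of the fallback,
  # which is exactly the locking that fetch/4 avoids
  def fetch(cache, key, fallback) do
    Cachex.get_and_update(cache, key, fn
      nil -> fallback.(key)   # miss: compute inside the transaction
      value -> value          # hit: write the existing value back unchanged
    end)
  end
end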

This behaviour is documented here (which is also available in the published documentation).

Does this answer your question? What would you change in the documentation?

whitfin added the discuss label Dec 3, 2024
1player (Author) commented Dec 4, 2024

Thank you for clarifying!

it is not atomic, otherwise it would be documented as such.

I assumed the lack of any mention of atomicity was just an oversight, rather than an implicit statement that it is not atomic. I think it would be more useful to state this explicitly in the fetch/4 docs, and to direct users to get_and_update/4 if they need stronger guarantees.

The cache warmer example you linked is quite useful and clearly answers my question, but I didn't come across it because I have no need for cache warming, so I never thought to look there for answers :)

1player (Author) commented Dec 4, 2024

Apologies, now I am confused. You say:

the fetch/4 implementation is much more efficient by avoiding locking the cache table while still ensuring that there are no overlapping calls on the same key

So if no overlapping calls are allowed, it's basically "atomic", i.e. it operates as if that key is locked while a fetch operation is executing, which is exactly what I need. That's what I understand from your comment.

Since I'm apparently too slow, and you haven't confirmed explicitly: can I use fetch/4 on a particular key and have the guarantee there will be NO overlapping calls in the cluster? I just want to compute an expensive operation once.

whitfin (Owner) commented Dec 4, 2024

No problem! Let me try to be clearer; it is a bit awkward to explain in words, so it might be easier with examples.

If you do something like this:

Cachex.fetch(:cache, :my_key, fn -> 
  :timer.sleep(5000)
  :ok
end)

Other writes to keys can still be running in the background, and that includes the key :my_key. So during that 5s sleep, you could do something like this and it would work:

Cachex.put(:cache, :my_key, :my_value)

In this case, after the 5s your fetch/4 would realise that the key has been set in the meantime and simply return :my_value (discarding the computed value), because :my_value was technically the last value in (based on call time). So it's not "atomic"; the key is still both readable and writeable.
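
Put together as something you could paste into iex (the timings are arbitrary, and :computed_value is just a placeholder):

# start a cache for the example
Cachex.start(:cache)

# kick off the slow fetch in the background
task =
  Task.async(fn ->
    Cachex.fetch(:cache, :my_key, fn _key ->
      :timer.sleep(5000)
      {:commit, :computed_value}
    end)
  end)

# while the fallback is still sleeping, another write lands on the same key
:timer.sleep(100)
Cachex.put(:cache, :my_key, :my_value)

# the fetch resolves with the value written in the meantime,
# discarding the computed value (as described above)
Task.await(task, 10_000)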

I just want to compute an expensive operation once

Yep! That's where fetch/4 is far superior to get_and_update/4. The fetch/4 implementation specifically has a mechanism to achieve what you're describing. Since we keep comparing with get_and_update/4, I'll tweak the documented example to use it, in case that's clearer:

# start a new cache
Cachex.start(:cache)

# via get_and_update/4: every spawned caller runs the handler itself
for _ <- 1..10 do
  spawn(fn ->
    Cachex.get_and_update(:cache, "key1", fn value ->
      IO.puts("Running get_and_update/4 handler")
      case value do
        nil -> :timer.sleep(1000)   # simulate the expensive work on a miss
        value -> value              # keep the existing value on a hit
      end
    end)
  end)
end

# via fetch/4: overlapping calls on the same key wait for the first one
for _ <- 1..10 do
  spawn(fn ->
    Cachex.fetch(:cache, "key2", fn _key ->
      IO.puts("Running fetch/4 handler")
      :timer.sleep(1000)   # simulate the expensive work
    end)
  end)
end

If you run this, you'll see that the function you provide to get_and_update/4 is executed 10 times. In contrast, the function you provide to fetch/4 is executed only a single time. If Cachex knows you already have a fetch/4 running for a key, it will queue up any additional calls to fetch/4 on that key to wait and resolve with the value of the first. So it's not "atomic", but it does have measures to make sure that you don't (e.g.) spawn 100 calls to a database when you only need 1.
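
If you want to see the queueing more directly, you can collect the results of the concurrent calls instead of just spawning them; the "key3" key and the :expensive_result atom are placeholders:

Cachex.start(:cache)

results =
  1..10
  |> Enum.map(fn _ ->
    Task.async(fn ->
      Cachex.fetch(:cache, "key3", fn _key ->
        IO.puts("Running fetch/4 handler")
        :timer.sleep(1000)
        {:commit, :expensive_result}
      end)
    end)
  end)
  |> Enum.map(&Task.await(&1, 5000))

# the handler prints only once, and every caller resolves
# with the same :expensive_result
IO.inspect(results)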

Does this make more sense? I'm happy to clarify anything if it's still confusing.
