Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support dynamic file naming #44

Open
iangreenleaf opened this issue May 14, 2020 · 4 comments
Open

Support dynamic file naming #44

iangreenleaf opened this issue May 14, 2020 · 4 comments

Comments

@iangreenleaf
Copy link

As I've been building out a CDN backed by Arc/Waffle I've been pondering the support for file naming. The way it works currently feels restrictive to me, as if it's really only built for a couple use cases and getting anything else to work involves crowbar-ing it open.

I don't have a super concrete proposal for how to change this, but I wanted to write up my thoughts while they're fresh and hopefully start a discussion.

First, I want to define a couple axioms that help me think about the problem clearly:

Axiom 1: File contents are available to the Waffle definition at upload time, but not when retrieving URLs.

Axiom 2: Scopes are available at both upload and retrieval, and should not change for URL retrieval to be successful.

Axiom 3: Waffle definitions must be able to consistently retrieve a URL for any version of a file using only some identifying information and the scope that was passed while uploading.

Axiom 4: .store should return any identifying information not in the scope, so that by saving the return value of that call and the scope, the caller may reliably retrieve URLs in the future.

Easy Case: Simple file naming

If you just want to do some simple file names based on version and original upload name, that's well supported currently:

def filename(version, {file, id}) do
  # Don't use anything from `file` other than `file.file_name` or you'll be in trouble!
  file_name = Path.basename(file.file_name, Path.extname(file.file_name))
  "#{file_name}-#{version}"
end

.store returns the original file name, which can be used to retrieve URLs, satisfying our axioms:

Avatar.store({%Plug.Upload{}, user}) # => {:ok, "selfie.png"}
Avatar.url({"selfie.png", user}, :thumb)

Medium Case: UUIDs

Say we want to generate a single UUID per upload and incorporate that into the filename. We can use filename to build the string but we cannot generate the UUID there, since it is also called at retrieval:

def filename(version, {_file, id}) do
  "#{id}-#{version}"
end

So we have to generate the UUID outside the definition and pass it in via scope:

uuid = UUID.uuid4()
Avatar.store({%Plug.Upload{}, uuid}) # => {:ok, "selfie.png"}

If we aren't using the original filename at all, we end up in the silly situation mentioned here where the return value is totally irrelevant and the scope is wholly responsible for URL generation:

Avatar.url({"", uuid}, :thumb)

This all works okay, but it makes the calls to Waffle unintuitive. In an ideal world, we'd be able to define the UUID generation inside Waffle and not worry about passing it in the scope. The return value of .store could provide the UUID to us for storage and later retrieval:

Avatar.store({%Plug.Upload{}, user}) # => {:ok, "8ef90ad2276eaa7910"}

Avatar.url({"8ef90ad2276eaa7910", user}, :thumb)

Hard Case: Cache buster

Say we want to generate bulletproof cache-busting filenames for every version of our file. The best way to do this is to hash the file contents and include the hash in the URL. That way identical files will always generate identical file names, and any change to the file contents will generate a new URL.

If we can't do the hashing inside the definition, though, we will have to redefine so much logic we might as well not be using Waffle. But if we try to do it inside the definition, we're in trouble:

def filename(version, {file, scope}) do
  "#{scope.id}-#{uh_oh(where_are_my_file_contents)}"
end

We can't do the hashing inside filename because the file contents won't be available at retrieval (nor would it be performant even if they were). And even if we could, the file that's passed in is the original file, not the version being named.

If we are able to do the hashing in the definition, we now face a new problem: what do we return from .store? Remember Axiom 4: we need to return everything necessary to retrieve URLs. Since our URLs are now different for every version in a non-recreatable way, we need to return all the generated information for every version. One possibility is to return the full storage path:

Avatar.store({%Plug.Upload{}, user})
  # => {:ok, %{thumb: "/user/12/selfie-thumb-8ac330ef1.jpg", original: "/user/12/selfie-original-d9951efca.jpg"}}

Avatar.url({%{thumb: "/user/12/selfie-thumb-8ac330ef1.jpg", original: "/user/12/selfie-original-d9951efca.jpg"}, user}, :thumb)

Of course, if we do this for every definition, it unnecessarily complicates the API for all the Easy Case people who don't have dynamic file names and just want to store "selfie.png" in a string field and not worry about a big data structure.

I'm not sure what the right answer is. I do think this final use case is worth supporting, but it's hard to do so in a way that doesn't box people in but is still pleasant to work with.

@iangreenleaf
Copy link
Author

iangreenleaf commented May 14, 2020

FWIW, the solution I'm leaning towards at the moment, in broad strokes, is to have a "context" that gets passed around the definition calls that can contain additional information about the file and versions. We can allow the definition to hook in at one or more places in the upload lifecycle to set information in this context:

defmodule Avatar do
  use Arc.Definition
  def context_after_processing(context_before_processing, original_file, versions = %{thumb: %FileObj{}, original: ...}) do
    {
      UUID.uuid4(),
      %{thumb: hash_file(versions[:thumb]), ...}
    }
  end

  def filename(version, {file, scope}, context = {uuid, hashes}) do
    "#{file.file_name}-#{hashes[version]}"
  end

  def storage_dir(version, {file, scope}, context = {uuid, _}) do
    "users/#{scope.id}/#{uuid}"
  end
end

The definition can use any data format for the context, and whatever is set at the end of the upload process is returned by .store, and the caller is responsible for storing the data however they please.

If the context hooks are not overridden, the default context is simply the original file name, which retains the simple API for simple use cases. The definition functions can ignore the context var if it's not needed. Since .store returns the context, it will return "selfie.png" just like it did before and people can continue using it like normal.

@harrygr
Copy link

harrygr commented Aug 20, 2021

In your medium case it seems like it won't quite work as you'll lose the file extension. It seems the filename passed to .url is only used to get the file extension. I've found I have to do something like this:

uid = UUID.uuid4()
{:ok, filename} = Avatar.store({%Plug.Upload{}, uuid})

saved_filename = "#{uuid}#{Path.extname(filename)}"

# persist the filename somewhere (like the DB)

url = Avatar.url({saved_filename, uid})

If you just use an empty string in the tuple passed to .url() you lose the file extension.

I agree this API is far from intuitive. I'm not sure how to improve it though.

@iangreenleaf
Copy link
Author

@harrygr that's a good point! In my case I was doing file type conversion so I always know what filetype the stored version is going to have, but that won't be the case for everyone. I think your solution makes sense, and it highlights the shortcomings of the current API, because you have to combine information from inside and outside the store method to get a usable identifier.

@harrygr
Copy link

harrygr commented Aug 20, 2021

What would be nice is to have less reliance on scope and simply return to full path to the image. Of course then we have the problem of managing the transformations. store returns a single string whereas there'd often be multiple different files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants