Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema-dump all user-created functions, not just 'public' #140

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

smudge
Copy link

@smudge smudge commented Dec 14, 2023

EDIT: see this comment for more recent changes to this PR!

Original Description:

For an application relying on schema.rb, this gem will dump only functions in the hardcoded public schema. This means that an application with its functions/tables in other pg schemas (and/or an application with a nondefault schema search path) may be able to create a function with create_function that does not survive the round-trip to schema.rb, in turn breaking the app after db:schema:load.

One workaround is to use structure.sql (which relies on pg_dump) to dump the functions in the other schemas. However, because the postgres search_path is accessible from within a query, this PR proposes using that value in this gem's query to dump all functions within the current search_path (and--see the last bullet below--actually matches the way structure.sql gets produced!). This allows the end-developer to continue using schema.rb while inserting functions into multiple PG schemas, and is backwards compatible with the prior = 'public' check (assuming public is in the search path, which it is by default).

Alternatives considered:

  • Filter functions not by namespace, but by owner, and only dump functions owned by the current user.
  • Look up grants and only dump (non-information-schema) functions granted to the current user (or to the PUBLIC role).
  • Use pg_dump (or Rails' code that executes pg_dump) and extract functions. However, the Rails implementation for structure.sql actually uses search_path to construct the --schema= argument for the command, which is how I came full circle back to using search_path in this PR. It actually matches what Rails does in its Ruby logic for producing structure.sql:
    args += search_path.split(",").map do |part|
      "--schema=#{part.strip}"
    end
    (source)

Copy link
Owner

@teoljungberg teoljungberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review, holidays and vacations came between.

Thank you for contributing!

lib/fx/adapters/postgres/functions.rb Outdated Show resolved Hide resolved
Comment on lines +47 to +48
"name" => schema_aware_name(function),
"definition" => schema_aware_definition(function)
Copy link
Author

@smudge smudge Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nudge @teoljungberg! I'm also a scenic user and so I dug into how that gem works, and ended up proposing this change on their end (which so far seems to be well-received, though I cannot say whether it will ultimately merge): scenic-views/scenic#401

In resolving some discrepancies within that gem, I ended up relying on the way that a core Rails contribution enabled schema-handling for enums: rails/rails#45740

I think it would make a ton of sense for both fx and scenic to behave similar to this core enum approach, which is roughly as follows:

  1. Query for all user-created structures that are accessible to the current PG user (regardless of search_path)
  2. If the structure's schema does not match current_schema(), include the fully-qualified name in schema.rb
  3. If the structure's schema does match current_schema(), do not include the schema in the generated schema.rb name.

Points 2 and 3 there are really crucial. This allows a schema.rb to be shared across contexts where the "default" search path might vary (and is entirely dependent on external/config factors -- it is not guaranteed to be public, and some PAAS providers do not even place this in the end-developer's control), while still allowing developers to explicitly specify non-default schemas and have those survive the round-trip to schema.rb.

@@ -53,6 +61,25 @@ def functions_from_postgres
def to_fx_function(result)
Fx::Function.new(result)
end

def schema_aware_name(function)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered moving both of these "schema-aware" mappers to Fx::Function, but it also complicates the tests for that class in ways that didn't feel necessary. (I figured I would leave it up to you @teoljungberg, as to whether that class should be a "dumb" wrapper or a "smart" translator.)

end
end

def schema_aware_definition(function)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason this method is necessary is because PG itself is not schema-aware when returning the function definition, and will, for example, include the fully-qualified name (public.something) even if current_schema() matches the function's schema.

This is probably the correct behavior on PG's part, but it means that for schema.rb purposes, we should strip it. (See here for the reasoning -- this how Rails handles enum names in a way that enables a single schema.rb to be valid & shareable in many contexts that may not be configured to have the same default PG schema.)

@smudge smudge changed the title Schema-dump all functions within search_path, not just 'public' Schema-dump all user-created functions, not just 'public' Jan 19, 2024
@smudge smudge requested a review from teoljungberg January 19, 2024 23:15
FROM pg_proc pp
JOIN pg_namespace pn
ON pn.oid = pp.pronamespace
LEFT JOIN pg_depend pd
ON pd.objid = pp.oid AND pd.deptype = 'e'
LEFT JOIN pg_aggregate pa
ON pa.aggfnoid = pp.oid
WHERE pn.nspname = 'public' AND pd.objid IS NULL
WHERE pn.nspname NOT IN ('pg_catalog', 'information_schema')
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some strategies go as far as ignoring everything in pg_*. I'm considering making that change still.

(Unfortunately I have not found any other way to isolate user-created functions, and there are many examples that take a similar approach to what I have done here.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smudge I think you can use this (I stupidly branched and created a PR before I saw this one)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants