PostgreSQL Materialized view for NFT collection stats #1

Open
grandsmarquis opened this issue Mar 16, 2022 · 3 comments

@grandsmarquis
Contributor

grandsmarquis commented Mar 16, 2022

This materialized view's goal is to summarize NFT collection stats based on their on-chain activity (total transfers, transfers over the last day/hour, and number of owners). The view is based on the following 3 tables:


CREATE TABLE collection (
    "address" varchar(42) NOT NULL,
    "name" varchar NULL,
    CONSTRAINT data_unique_collection UNIQUE ("address"),
    PRIMARY KEY ("address")
);


CREATE TABLE nft_transfer (
    "collection" varchar(42) NOT NULL,
    "block_number" BIGINT NOT NULL,
    "transaction_hash" varchar(66) NOT NULL,
    "transaction_index" NUMERIC NULL,
    "tx_to" varchar(42) NULL,
    "tx_from" varchar(42) NULL,
    "log_index" BIGINT NOT NULL,
    "transfer_index" BIGINT NOT NULL,
    "timestamp" timestamp NULL,
    "to" varchar(42) NULL,
    "from" varchar(42) NULL,
    "amount" NUMERIC NULL,
    "token_id" NUMERIC NULL,
    "gas_price" NUMERIC NULL,
    "created_at" TIMESTAMP DEFAULT NOW(),
    "trade_currency" varchar NULL,
	"trade_price" numeric NULL,
	"trade_marketplace" int2 NULL,
    CONSTRAINT unique_nft_transfer UNIQUE (
        "transaction_hash",
        "log_index",
        "transfer_index"
    )
);

CREATE TABLE nft (
    "address" varchar(42),
    "token_id" NUMERIC,
    "owner" varchar NULL,
    CONSTRAINT data_unique_nft UNIQUE ("address", "token_id"),
    PRIMARY KEY ("address", "token_id")
);

We currently have a materialized view that does this, but it is lacking optimization:

create materialized view nft_collection_stats as
SELECT
    c."address",
    c."name",
    coalesce(T.transfers_total, 0) as transfers_total,
    coalesce(H.transfers_hourly, 0) as transfers_hourly,
    coalesce(H.receivers_hourly, 0) as receivers_hourly,
    coalesce(H.senders_hourly, 0) as senders_hourly,
    coalesce(D.transfers_daily, 0) as transfers_daily,
    coalesce(D.receivers_daily, 0) as receivers_daily,
    coalesce(D.senders_daily, 0) as senders_daily,
    N."supply",
    N."owners"
FROM
    collection c
    LEFT JOIN (
        SELECT
            collection,
            SUM(1) as transfers_total
        FROM
            nft_transfer nt
        GROUP BY
            collection
    ) T ON T.collection = c.address
    LEFT JOIN (
        SELECT
            collection,
            SUM(1) as transfers_daily,
            COUNT(distinct("to")) as receivers_daily,
            COUNT(distinct("from")) as senders_daily
        FROM
            nft_transfer nt
        where
            nt."timestamp" > (CURRENT_DATE - '1 day' :: interval)
        GROUP BY
            collection
    ) D ON D.collection = c.address
    LEFT JOIN (
        SELECT
            collection,
            SUM(1) as transfers_hourly,
            COUNT(distinct("to")) as receivers_hourly,
            COUNT(distinct("from")) as senders_hourly
        FROM
            nft_transfer nt
        where
            nt."timestamp" > (CURRENT_DATE - '1 hour' :: interval)
        GROUP BY
            collection
    ) H ON H.collection = c.address
    LEFT JOIN (
        SELECT
            address,
            SUM(1) as supply,
            COUNT(distinct("owner")) as owners
        FROM
            nft
        GROUP BY
            address
    ) N ON N.address = c.address
where
    c."type" = 721
    or c."type" = 1155;

CREATE UNIQUE INDEX nft_collection_stats_idx ON nft_collection_stats (address);

We are looking for someone able to rewrite this query in a more optimized way first. A second step would be to add trade/price information: as you can see, the nft_transfer table contains trade_price, so we could compute the floor price during the last and current hour/day.
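
As a rough starting point, one untested idea would be to collapse the three nft_transfer subqueries into a single scan with aggregate FILTER clauses (PostgreSQL 9.4+), and to derive a naive floor price from MIN(trade_price) over the same window. The exact time-window semantics here are an assumption to be refined:

-- Untested sketch: one scan of nft_transfer instead of three.
-- This would replace the T/D/H subqueries in the view above.
SELECT
    collection,
    COUNT(*) AS transfers_total,
    COUNT(*) FILTER (WHERE "timestamp" > NOW() - interval '1 hour') AS transfers_hourly,
    COUNT(DISTINCT "to") FILTER (WHERE "timestamp" > NOW() - interval '1 hour') AS receivers_hourly,
    COUNT(DISTINCT "from") FILTER (WHERE "timestamp" > NOW() - interval '1 hour') AS senders_hourly,
    COUNT(*) FILTER (WHERE "timestamp" > NOW() - interval '1 day') AS transfers_daily,
    COUNT(DISTINCT "to") FILTER (WHERE "timestamp" > NOW() - interval '1 day') AS receivers_daily,
    COUNT(DISTINCT "from") FILTER (WHERE "timestamp" > NOW() - interval '1 day') AS senders_daily,
    -- naive daily floor price; trade_price is NULL for non-trades, which MIN ignores
    MIN(trade_price) FILTER (WHERE "timestamp" > NOW() - interval '1 day') AS floor_price_daily
FROM nft_transfer
GROUP BY collection;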

Comment here if you're interested in this issue and how you'd start.

@georgeroman

Here are some ideas for improvement I can think of from a quick look:

  • To be as lean as possible, I would store addresses/hashes as bytea instead of varchar.
  • Some fields could be cached so that aggregation can be skipped for them (eg. token and transfer count per collection). Those would need to be kept in sync as events are ingested, but that's not much of an overhead compared to the speed-up for the queries. Getting the token or transfer count for very large collections (eg. ENS on mainnet) will be very expensive unless some data is cached.
  • Use count(*) instead of sum(1).
  • Have an additional index on nft_transfer so that the hourly/daily stats can be efficiently retrieved via an index-only scan:
    CREATE INDEX nft_transfer_collection_timestamp_index ON nft_transfer (collection, timestamp DESC NULLS LAST) INCLUDE ("to", "from");

Looking forward to getting the indexer open-sourced btw :)

@grandsmarquis
Contributor Author

Thanks for your input @georgeroman. I omitted some details to keep the discussion focused on the materialized view.

  • To be as lean as possible, I would store addresses/hashes as bytea instead of varchar.

I agree, but working with bytea is really annoying as it adds a layer of conversion to present it in a good format for the API/frontend. Same issue as with Dune Analytics, IMO. Or are there some ways to magically render it as a normal address string?

Some fields could be cached so that aggregation can be skipped for them (eg. token and transfer count per collection). Those would need to be kept in sync as events are ingested, but that's not much of an overhead compared to the speed-up for the queries. Getting the token or transfer count for very large collections (eg. ENS on mainnet) will be very expensive unless some data is cached.

This could be a really interesting optimization. Would we achieve this with a trigger on nft_transfer inserts, or is there another way to do it?

Use count(*) instead of sum(1).

Thank you, will do!

Have an additional index on nft_transfer so that the hourly/daily stats can be efficiently retrieved via an index-only scan:
CREATE INDEX nft_transfer_collection_timestamp_index ON nft_transfer (collection, timestamp DESC NULLS LAST) INCLUDE ("to", "from");

We already have some indexes, eg:

CREATE INDEX nft_transfer_collection_idx ON nft_transfer (collection);

CREATE INDEX nft_transfer_to_idx ON nft_transfer ("to");

CREATE INDEX nft_transfer_to_blocknumber ON nft_transfer ("block_number");

CREATE INDEX nft_transfer_to_log_index ON nft_transfer ("log_index");

CREATE INDEX nft_transfer_from_idx ON nft_transfer ("from");

CREATE INDEX nft_transfer_trade_marketplace ON nft_transfer ("trade_marketplace");

CREATE INDEX nft_transfer_timestamp ON nft_transfer (timestamp);

Would you recommend spending time cleaning them up and replacing them with more targeted definitions?

We were actually looking at Reservoir Protocol to power a marketplace, as it was recommended to us by Bailey from Defiance ;)

@georgeroman

I agree, but working with bytea is really annoying as it adds a layer of conversion to present it in a good format for the API/frontend. Same issue as with Dune Analytics, IMO. Or are there some ways to magically render it as a normal address string?

I think the storage gains would be worthwhile (using bytea instead of varchar/text would basically halve the storage you need for those columns). There might also be some minor performance improvements from having the table rows as lean as possible. As for converting, you can have that done in SQL (or maybe directly in the code):

SELECT '0x' || encode('\xea674fdde714fd979de3edf0f56aa9716b898ec8'::bytea, 'hex');
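
And the inverse (a sketch), for turning a '0x'-prefixed string coming from the API back into bytea, eg. when binding query parameters:

SELECT decode(substring('0xea674fdde714fd979de3edf0f56aa9716b898ec8' from 3), 'hex');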

This could be a really interesting optimization. Would we achieve this with a trigger on nft_transfer inserts, or is there another way to do it?

Having triggers is probably the easiest way to do it, although there are more efficient ways (eg. WITH x AS (INSERT INTO nft_transfer ... ON CONFLICT DO NOTHING RETURNING ...) UPDATE ... FROM x WHERE ...).
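
Fleshed out, that pattern could look something like this (just a sketch: collection_stats is a hypothetical cache table, and the column list and $n parameters are illustrative):

-- Assumes a hypothetical cache table, eg.:
--   CREATE TABLE collection_stats (collection varchar(42) PRIMARY KEY, transfers_total bigint NOT NULL DEFAULT 0);
-- and that a row for the collection already exists.
WITH inserted AS (
    INSERT INTO nft_transfer (collection, block_number, transaction_hash, log_index, transfer_index, "timestamp", "to", "from")
    VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    ON CONFLICT DO NOTHING
    RETURNING collection
)
UPDATE collection_stats cs
SET transfers_total = cs.transfers_total + 1
FROM inserted
WHERE cs.collection = inserted.collection;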

Would you recommend spending time cleaning them up and replacing them with more targeted definitions?

Yes, I would remove all indexes that are not actually used and replace them with ones that fit the current queries. Indexes slow down writes, so it's not worthwhile keeping anything that isn't being used. You can always create them on demand as they are needed.
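
To find candidates for removal, you can check the statistics collector (assuming stats haven't been reset recently):

-- Indexes that have never been used for an index scan since stats were last reset
SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname, indexrelname;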

We were actually looking at Reservoir Protocol to power a marketplace, as it was recommended to us by Bailey from Defiance ;)

Nice, would be glad to help if you need any assistance :)
