-
I have 700 GeoJSON files whose contents I need to insert into a Postgres database. Here's how I create the connection pool:

async fn db_connect() -> Result<PgPool, sqlx::Error> {
    let db_url = env::var("DATABASE_URL").expect("Missing DATABASE_URL env var");
    let connect_options = PgConnectOptions::from_str(&db_url)?
        .ssl_mode(sqlx::postgres::PgSslMode::Prefer)
        .log_statements(LevelFilter::Trace);
    let pool = PgPoolOptions::new()
        .max_connections(20)
        .acquire_timeout(Duration::from_secs(2))
        .connect_with(connect_options)
        .await?;
    Ok(pool)
}

The pool is stored in my AppState struct, which is shared with every insert task. The inserts are dispatched like this:

async fn load_collections(state: &AppState, table_name: &str, paths: Vec<PathBuf>) -> Result<()> {
    let collections = load_geojson_files(paths).await?;
    let mut tasks = Vec::with_capacity(collections.len());
    for coll in collections {
        tasks.push(populate_collection(state, table_name, coll)); // sqlx call happens in here
    }
    let results = join_all(tasks).await;
    println!("{:?}", results);
    Ok(())
}

The query in populate_collection (shown in a reply below) is a chunked INSERT. Everything works fine on my local machine, with no network latency. (It ALSO works when I run the queries sequentially!) However, when I instead try to send the data to a Supabase server, the first 50-ish queries run fine, but then they start intermittently failing with the error PoolTimedOut. Is this approach, where I create all the Futures at once and then join_all them, fundamentally flawed?

UPDATE: I succeeded in getting the concurrency limited by switching to a JoinSet:

async fn load_collections(
    state: &Arc<AppState>,
    table_name: &str,
    paths: Vec<PathBuf>,
) -> Result<()> {
    let collections = load_geojson_files(paths).await?;
    let mut join_set = JoinSet::new();
    let mut results = Vec::new();
    let concurrency_limit = 20;
    for coll in collections {
        let state = state.clone();
        let table_name = table_name.to_string();
        join_set.spawn(async move { populate_collection(&state, &table_name, coll).await });
        if join_set.len() >= concurrency_limit {
            if let Some(result) = join_set.join_next().await {
                results.push(result.unwrap());
            }
        }
    }
    // Await the completion of all tasks
    while let Some(result) = join_set.join_next().await {
        results.push(result.unwrap());
    }
    Ok(())
}
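
For comparison (this is only a sketch, not code from the project), the same cap on in-flight inserts can be written with futures' buffer_unordered instead of a JoinSet. AppState, NamedFeatureCollection, load_geojson_files, and populate_collection below are stub stand-ins for the real items in this post, and the limit of 10 is arbitrary:

use std::path::PathBuf;
use std::sync::Arc;

use anyhow::Result;
use futures::stream::{self, StreamExt};

// Stub stand-ins for the items defined elsewhere in this post.
struct AppState { /* pool: sqlx::PgPool, … */ }
struct NamedFeatureCollection;

async fn load_geojson_files(_paths: Vec<PathBuf>) -> Result<Vec<NamedFeatureCollection>> {
    Ok(Vec::new())
}

async fn populate_collection(
    _state: &AppState,
    _table_name: &str,
    _coll: NamedFeatureCollection,
) -> Result<()> {
    Ok(())
}

async fn load_collections(state: &Arc<AppState>, table_name: &str, paths: Vec<PathBuf>) -> Result<()> {
    let collections = load_geojson_files(paths).await?;

    // At most `concurrency_limit` futures are polled at a time, so the pool
    // never has more than that many acquirers waiting on it.
    let concurrency_limit = 10;
    let results: Vec<Result<()>> = stream::iter(collections)
        .map(|coll| populate_collection(state, table_name, coll))
        .buffer_unordered(concurrency_limit)
        .collect()
        .await;

    // Propagate the first failure instead of discarding errors.
    results.into_iter().collect::<Result<Vec<()>>>()?;
    Ok(())
}

Because nothing is spawned here, the futures can borrow state and table_name directly, so no Arc clone or to_string per task is needed.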
-
I've got a workaround which I'm really not happy with. If I just try again, it seems to work the next time through. 😬 With a 2sec acquisition timeout, I'm seeing around 20 retries needed to successfully upload everything. But it DOES eventually work. Again, this isn't needed when I do them sequentially with a 2sec timeout. I'd love some insight into how I can fix the underlying problem!

async fn populate_collection(
    state: &Arc<AppState>,
    table_name: &str,
    coll: NamedFeatureCollection,
) -> Result<()> {
    let metro_year = format!("{} {:?}", coll.utp_code, coll.year);
    let chunk_count = 500;
    'outer: for (i, (names, features)) in coll.chunks(chunk_count).enumerate() {
        loop {
            let count = names.len();
            // …
            let query = format!(
                "INSERT […]"
            );
            let result = sqlx::query(query.as_str())
                .bind([…])
                .execute(&state.pool)
                .await;
            match result {
                Ok(_) => continue 'outer,
                Err(e) => match e {
                    sqlx::Error::PoolTimedOut => println!(
                        "!!! Failed to insert {} for {}. Trying again…",
                        table_name, metro_year
                    ),
                    _ => return Err(anyhow!(e)),
                },
            }
        }
    }
    debug!("Finished inserting {} for {}", table_name, metro_year);
    Ok(())
}
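
A variation worth trying on top of this workaround is to back off between retries instead of immediately asking the pool for another connection, so the waiters already queued on the pool get a chance to drain. The helper below is only a sketch: retry_pool_timeouts is a made-up wrapper, not a sqlx API, and it assumes tokio and anyhow are in use.

use std::time::Duration;

use anyhow::{anyhow, Result};

// Hypothetical helper: retry a fallible async operation, but only when the
// failure is sqlx::Error::PoolTimedOut, sleeping with exponential backoff
// between attempts.
async fn retry_pool_timeouts<F, Fut, T>(mut op: F, max_attempts: u32) -> Result<T>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, sqlx::Error>>,
{
    let mut delay = Duration::from_millis(250);
    for attempt in 1..=max_attempts {
        match op().await {
            Ok(value) => return Ok(value),
            // Back off before retrying so other waiters can drain the pool.
            Err(sqlx::Error::PoolTimedOut) if attempt < max_attempts => {
                tokio::time::sleep(delay).await;
                delay *= 2;
            }
            // Out of attempts, or a different error: give up.
            Err(e) => return Err(anyhow!(e)),
        }
    }
    Err(anyhow!("retry_pool_timeouts called with max_attempts == 0"))
}

The execute call in the chunk loop would then be wrapped in a closure passed to this helper, e.g. retry_pool_timeouts(|| sqlx::query(query.as_str()).execute(&state.pool), 5).await? (binds omitted for brevity).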
-
I implemented a deadpool pool, instead of the built-in PgPool, as suggested by this comment. That worked, but when I used the same concurrency limit of 20, I still got timeouts. The strange thing? It was exactly 6 connection acquisition timeouts every time! This time, though, the error said "max connections exceeded". Supabase says that you get 60 connections, so this is strange.

Still, I tried lowering the number of connections to 14 et… voilà. No more connection exceeded errors. And then I reverted all my deadpool changes and just set the connection limit to 14, and now PgPool is working without any "pool timed out" errors.

So it would appear that I've found the solution. I don't understand WHY it works, but it does. Maybe it's something to do with the fact that I'm not using the Supabase pooler, although that is supposed to give me 200 connections, so 🤷.

For anyone who wants the deadpool version anyway, here it is:

pool.rs

// Adapted from performance-service by tsunyoku
// https://github.com/osuAkatsuki/performance-service/blob/9d40594d7645d38d1bde167fdadedb89cb4b4772/src/models/pool.rs
// Used under MIT license
use deadpool::managed::{Manager, Metrics, RecycleResult};
use sqlx::postgres::PgConnectOptions;
use sqlx::{ConnectOptions, Connection, Error as SqlxError, PgConnection};

#[derive(Clone, Debug)]
pub struct DbPool {
    options: PgConnectOptions,
}

impl DbPool {
    pub fn new(options: PgConnectOptions, max_size: usize) -> anyhow::Result<Pool> {
        Ok(Pool::builder(Self { options }).max_size(max_size).build()?)
    }
}

impl Manager for DbPool {
    type Type = PgConnection;
    type Error = SqlxError;

    async fn create(&self) -> Result<PgConnection, SqlxError> {
        self.options.connect().await
    }

    async fn recycle(&self, obj: &mut Self::Type, _: &Metrics) -> RecycleResult<SqlxError> {
        Ok(obj.ping().await?)
    }
}

pub type Pool = deadpool::managed::Pool<DbPool>;

macro_rules! get_conn {
    ($input:expr) => {
        $input.get().await?.deref_mut()
    };
}
pub(crate) use get_conn;

And then to use it, it's just:

main.rs

use sqlx::query;
use std::ops::DerefMut;

mod pool;
use pool::{DbPool, Pool, get_conn};

const CONNECTION_COUNT: usize = 14;

// …

async fn main() -> Result<()> {
    let pool = DbPool::new(connect_options, CONNECTION_COUNT)?;
    query!("SELECT 1+1 AS res").execute(get_conn!(pool)).await?;
}
-
I believe this is a bug. It's manifesting in
-
It might sound obvious, but it could also be that the machine doesn't have enough resources when running the program under such a high load.