
Proposal: port forwarding via SSH tunnel #8367

Open
nicolai86 opened this issue Aug 22, 2016 · 67 comments

Comments

@nicolai86
Contributor

proposal: port forwarding via SSH tunnel

I'd like to start adding port forwarding via SSH tunnels to terraform.

This is useful when you want to use Terraform with systems that are only accessible via a jump host, i.e. internal company systems.

Right now Terraform already ships with a number of providers that might need to talk to internal systems (e.g. postgres, mysql, influxdb, …).

The status quo is to create an SSH tunnel beforehand or, in cases where the
entire infrastructure is created from scratch, to split the Terraform configuration into multiple stages with glue code outside of Terraform.
E.g. one might set up a private cluster with a jump host, open an
SSH tunnel via bash, and then run a different Terraform configuration that uses the newly created
tunnel to access private systems, all wrapped in a single setup.sh script.

Assuming that the SSH tunnel is required for all resources of a given provider,
I suggest adding connection settings to the terraform providers as well, like this:

provider "consul" {
    address = "localhost:80"
    datacenter = "nyc1"

    # run "ssh -L localhost:80:demo.consul.io:80 private-user@private.jump-host.io" for any resources of this provider
    connection {
        user = "private-user"
        host = "private.jump-host.io"

        forward {
            remote_host = "demo.consul.io"
            remote_port = 80
            local_port = 80
        }
    }
}

# Access a key in Consul; consul is only available via SSH tunnel
resource "consul_keys" "app" {
    key {
        name = "ami"
        path = "service/app/launch_ami"
        default = "ami-1234"
    }
}
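For illustration, here is how the proposed forward block maps onto a plain OpenSSH command, as a minimal Go sketch (Go being the language Terraform is written in). The Forward type and sshArgs function are hypothetical and not part of Terraform; only the -N and -L flags come from OpenSSH itself.

```go
package main

import "fmt"

// Forward mirrors the fields of the proposed connection/forward blocks.
// Hypothetical type for illustration; it is not part of Terraform.
type Forward struct {
	User       string // SSH user on the jump host
	JumpHost   string // host reachable from wherever Terraform runs
	LocalPort  int    // port to listen on locally
	RemoteHost string // target host, resolved from the jump host's side
	RemotePort int    // target port on RemoteHost
}

// sshArgs builds the OpenSSH arguments for an equivalent local forward:
//
//	ssh -N -L localhost:<local_port>:<remote_host>:<remote_port> <user>@<host>
//
// -N tells ssh not to run a remote command, since only the tunnel is needed.
func sshArgs(f Forward) []string {
	spec := fmt.Sprintf("localhost:%d:%s:%d", f.LocalPort, f.RemoteHost, f.RemotePort)
	return []string{"-N", "-L", spec, f.User + "@" + f.JumpHost}
}
```

The resulting slice could be passed to exec.Command("ssh", args...) from os/exec. Note that on most Unix-like systems binding local port 80 requires elevated privileges.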

Looking forward to any feedback before I head off and start adding something like this to Terraform… ;)

Related: #4442, #4775

@jen20
Contributor

jen20 commented Aug 22, 2016

Hi @nicolai86! I'm certainly not opposed to this, though I'm not sure exactly what it would look like. Going to cc @phinze or @mitchellh here for a second opinion on this.

@jbardin
Member

jbardin commented Aug 22, 2016

This sounds reasonable to me. My only comment so far would be to name it something like local_forward to align it with the actual type of forwarding being done (-L), and leave room in case we find a need for remote_forward (-R) later on.

@apparentlymart
Contributor

apparentlymart commented Aug 22, 2016

This is an interesting approach. I have some feedback, but really just exploring the idea:


Given that the connection doesn't really "belong to" the provider, I wonder if we should hoist it out to the top level, and add some interpolation variables for it like this:

provider "consul" {
    # expands to the local listen address and port that the "connection" created
    address = "${connection.consul_tunnel.local_address}"
    datacenter = "nyc1"
}

connection "consul_tunnel" {
    type = "ssh"
    user = "private-user"
    host = "private.jump-host.io"

    forward {
        remote_host = "demo.consul.io"
        remote_port = 80
        local_port = 80
    }
}

Presumably for real use the user would sometimes need to provide some credentials in the connection block (either a private key or a password), so the ability to interpolate from variables would be useful to avoid hard-coding those credentials in the config.


It could also be nice to make the local port optional and have Terraform just allocate any arbitrary open port and expose it via the interpolation variable, so the user doesn't need to think about what port is likely to be open on all machines where Terraform might be run.


Wondering if maybe it would be more intuitive to invert the nesting, so that the forwarder is the primary object and it takes a connection as part of its configuration, similar to how resources and provisioners work:

port_forward "consul_tunnel" {
    remote_host = "demo.consul.io"
    remote_port = 80

    connection {
        # Now the connection block is the same as in other contexts, as long
        # as the selected connection type supports port forwarding.
        type = "ssh"
        user = "private-user"
        host = "private.jump-host.io"
    }
}

provider "consul" {
    address = "${port_forward.consul_tunnel.local_address}"
}

@nicolai86
Contributor Author

I think exposing the port forwarding as a primitive is a good idea in terms of reuse between multiple resources, and it might also help with code reuse given that the connection attribute already exists on resources. I'm also hoping for a clean integration into the execution graph.

It seems that the general theme is "this is a worthwhile addition" and the questions are mostly minor details. Since I have no idea at all about the terraform core internals I'll take a deep dive and report back in a couple of days…

@apparentlymart
Contributor

@nicolai86 I would suggest giving @phinze and/or @mitchellh a chance to respond since they know Terraform (and its roadmap) the best and are likely to give more detailed feedback. Of course, that doesn't mean you can't dig in and start learning about Terraform core. 😀

@nicolai86
Contributor Author

Don't worry, I just want to start learning about Terraform core internals. Did it sound like I was going off to build it already? 😅

@apparentlymart
Contributor

Reflecting on this a while later...

At work we took the approach of running Terraform on a host within the network it's being deployed to, and running it with an automation tool.

This has been working out really well for us:

  • The problem being described here is moot, because there's no bastion wall between Terraform and the services it's trying to configure.
  • The hole we had to poke to allow our build system to trigger a deployment is small: it's just triggering a job in the automation tool we use via a very well-defined API. The only thing possible to do remotely (via a secure channel) is to tell the Terraform "deploy worker" machine to deploy the Terraform configuration at the HEAD of our git repo, and so we don't need to expose SSH access to these Terraform machines, bastion or otherwise.
  • It encourages other good practices around running Terraform in a very specific environment that's managed by configuration management, which prevents weird little issues caused by running Terraform on different OSes and different machines.
  • Terraform can obtain auth credentials it needs from a credential store within the environment, so the credentials never need to appear on any machine outside of the walls of the target network nor be known directly by any human operator. (We currently do this with a home-grown wrapper script that sets environment variables, rather than with Terraform itself.)

So with all of that said, while it'd be great to have a feature like what was proposed here in the long run so that Terraform can be flexible to run in a variety of different environments, in the short term I'd wholeheartedly recommend that folks consider this alternative approach which has worked out very well for us.

AFAIK such a setup is not possible with Atlas today, in which case I would also suggest that it would be a great feature to be able to use the Atlas UI to control "agents" running within a private network over a secure channel, as an alternative to running Terraform on HashiCorp-run infrastructure. That would enable the above configuration with Atlas as the orchestration tool.

@cakeface

I think running Terraform on a server within the VPC is a nice work around for this problem but it has a bootstrapping issue. Where does this server come from initially? Terraform. It means admitting that you have to split your infrastructure management and cannot stand the entire thing up with one run of Terraform.

I also have multiple VPCs that are managed from one Terraform source repository. Applying changes now involves connecting to multiple Terraform nodes and running the updates there, and the code has to be split up too.

All of that is possible, and I can even automate with Fabric or Bash but I don't like adding more tools when Terraform is supposed to be the tool. Also I'm layering scripted automation on top of my very nice declarative automation which just makes me feel a little gross.

For now, I've added the SSH tunnel step to a shell wrapper around plan and apply.

@apparentlymart
Contributor

Yes, it is the case that we had to bootstrap the environment from outside and that there is one Terraform config that requires custom effort to apply because it affects the deploy workers themselves. A temporary extra machine booted manually from the same AMI as the deploy workers addresses that problem, but I certainly won't claim that this is super convenient. It's just a compromise that we tolerate because we apply this configuration relatively infrequently compared to the others that deal with our applications themselves.

@hingstarne

hingstarne commented Mar 13, 2017

Hi guys,
just wanted to add my 5 cents and try to revive this topic.
From my perspective, moving the tunnel out of the provider looks smart, but it has a severe disadvantage.
With a remote-exec or a file copy, the SSH connection is closed afterwards, so there is nothing to clean up.
It simply exits, even if Terraform crashes. Suppose you implemented a tunnel this way:

port_forward "consul_tunnel" {
    remote_host = "demo.consul.io"
    remote_port = 80

    connection {
        # Now the connection block is the same as in other contexts, as long
        # as the selected connection type supports port forwarding.
        type = "ssh"
        user = "private-user"
        host = "private.jump-host.io"
    }
}

Then you would need a destructor in the code that is reliably triggered.
Following this logic, extending the existing connection block and adding it to certain providers or resources would be the safer route to go.

@PLaRoche

Any progress on this? We have to open an SSH tunnel every time we run terraform as it manages our RDS instances that are private only.

@matelang

This is a major blocker for us as well.

@rata

rata commented Mar 29, 2017

What we are considering as a workaround (which of course won't help everyone) is to use a Kubernetes job to run terraform plan/apply.

As it runs in the cluster, it has access to the private resources, and it's easy for everyone to run (using a web interface for Kubernetes) without needing manually set-up tunnels, credentials for them, and so on. The idea is to use a remote tfstate on S3 (or something else).

I'll post an update if we have the time to go further down this path. But, of course, it will only help people who also run Kubernetes clusters :)

@automaticgiant
Contributor

Right now I mostly just want to be able to provision a VM with Docker and forward docker.sock so that Terraform can deploy containers onto it without having to set up a TCP listener (because I won't want one later anyway).

@fquffio

fquffio commented Jul 1, 2017

Any progress on this? It's almost a year now… The mentioned Terraform gurus were asked for an opinion but didn't reply. Is this issue abandoned?

Bastion hosts are quite common, and relying on external scripts to create an SSH tunnel before Terraform can operate sucks: it makes the whole process more complicated, since there are more steps you must remember; it makes your project far more difficult to maintain if you have multiple resources that require such a feature (Redis, MySQL, ElasticSearch, Consul, …); and it can be very dangerous if you're working with multiple environments (it's rather easy to launch terraform apply on dev while your tunnel still points to the production database, and vice versa). I really can't see why this issue is considered such a low priority.

@apparentlymart
Contributor

apparentlymart commented Jul 1, 2017

Hi @fquffio!

Before I respond I should explain that at the time of my last comments I was an outside open source contributor, but in the meantime I've become a HashiCorp employee working on Terraform.

It is not that this issue is considered low priority, but rather that there are many issues that are all considered important. There remains design work to do to figure out exactly how this will work, and then non-trivial implementation work to get it actually done.

Believe me that I really want to see this feature too, and we'll get there. We're working through the feature request backlog as fast as we can while also keeping up with bug fixes, etc. I understand the frustration and I can only ask for continued patience.

At this time, my hope is to move forward with a configuration structure somewhat like the following, taken from my comment above:

port_forward "consul_tunnel" {
    target_host = "demo.consul.io"
    target_port = 80

    connection {
        # Now the connection block is the same as in other contexts, as long
        # as the selected connection type supports port forwarding.
        type = "ssh"
        user = "private-user"
        host = "private.jump-host.io"
    }
}

provider "consul" {
    address = "${port_forward.consul_tunnel.local_address}"
}

It'll take a little more prototyping to figure out the details of this, such as how we can wire the connection creation and shutdown into the graph, whether the existing connection mechanism can be extended to support tunnels in this way, etc. We'll have more to say here when we are able to complete that prototyping.

@spanktar

spanktar commented Aug 4, 2017

I'm also interested in this and suggest something along these lines: using a connection block inside the provider:

provider "consul" {
  address = "${aws_route53_record.elb_consul.fqdn}"
  datacenter = "dc1"

  connection {
    type = "tunnel"
    host = "${aws_instance.bastion_1.public_ip}"
    port = "8500"
    private_key = "${file("${var.local_ssh_key_path}")}"
    user = "${var.ssh_user}"
  } 
}

@ekristen

While I like this approach and think it is sensible for the long term, I have to wonder whether it would be easier to add bastion support, as it exists today for aws_instance and other resources, to resources like postgres_database, etc., so that people can start using it today.

Either way, I'm a big +1 for supporting bastion hosts on more resources.

@madmod

madmod commented Aug 24, 2017

+1. I think this same pattern could be good for supporting VPN access to resources. Having the SSH tunnel be a resource that depends on other resources (like the bastion instance, for example) would solve any ordering issues on the first run.

@cloudvant

@apparentlymart it is time to fix this. You have been dancing around the issue for too long. Either fix it or close it, but you have kept us waiting too long.

@nbering

nbering commented Aug 29, 2017

@vmendoza That comment seems a little out of line for a free open source project. If you feel so strongly about it... dig in and write some code.

@kwerle

kwerle commented Sep 1, 2017

I would also request that if/when this is implemented there be a remote_command portion. I specifically want to forward a port to a service that I want to launch as I make the connection.

ssh -L my_port:localhost:target_port host some_service_providing_access_on_target_port

@stefansundin
Contributor

Hello everyone. I decided to try to tackle this myself by building a custom provider, and I'm happy to say that I'm quite pleased with the result. It works by declaring a data source (basically, a local port that you want forwarded somewhere via SSH).

While I am sure there are many things that can be improved, what is great about my solution is that it is usable right now.

I'd like to invite everyone who is having this issue to try it out. Here's the repository: https://github.com/stefansundin/terraform-provider-ssh

Please be careful and do not use in production quite yet. If it breaks something you can keep both pieces. :)

As always, suggestions for improvements are welcome! Thanks all!

@jaysonsantos

Hey @WhyNotHugo, that looks like a race condition where the DB tried to connect before the tunnel had opened the port.
Would you mind opening an issue with a minimal Terraform file to test?
Thank you!

@seanamos

For those still struggling with this, I'd like to point out a module that has a clever solution for this: https://github.com/flaupretre/terraform-ssh-tunnel
I've been using it with great success; it works with plan/apply/destroy etc. and cleans up after itself.

Since we no longer use SSH and use AWS SSM instead, I was also able to adapt the module fairly easily to use the AWS CLI to start SSM sessions that forward ports.

@shake76

shake76 commented Aug 15, 2022

Hey guys, I just ran into the same issue when running tf plan in Terraform Cloud against my private EKS endpoint. I'm wondering whether there is any workaround for an SSH tunnel using the HTTPS_PROXY env variable.

@pjanuario

@seanamos do you have any example of the AWS SSM usage?

@UXabre

UXabre commented Sep 6, 2022

SSH tunneling doesn't work for many azurerm resources, since they no longer allow using the URI and instead accept only the resource ID. The only "workaround" is setting up the entire infrastructure on a jump host, which is very involved. I also second adding a way to "jump" via SSH tunnels, or any other means of automatically running Terraform on a jump host.

@littlejo

littlejo commented Sep 9, 2022

@pjanuario You can find my version: https://github.com/littlejo/terraform-ssm-tunnel

@boris-yakimov

This comment was marked as off-topic.

@corkupine

Any news on the prioritization of this? Glad it's staying open and remaining under consideration at least.

@omarismail
Contributor

Hey @corkupine , the Terraform team is researching this problem, and would love to chat with you (and other folks) about it.

Mind reaching out to me ([email protected]) and we can schedule a time to chat live?

@idelsink

I'm also interested in this! I'm currently trying to manage a private Kubernetes cluster inside GCP and I want to use an IAP proxy to connect to that cluster.

So far I'm able to install the gcloud SDK in the Terraform Cloud runner (using a custom external data source script, because the terraform-google-gcloud module cannot be used in Terraform Cloud; see terraform-google-modules/terraform-google-gcloud#94) and start a SOCKS5 tunnel (using the gcloud iap command) to my private Kubernetes cluster using the flaupretre/terraform-ssh-tunnel module.

I am able to connect to my cluster during the plan phase, but I think the background process is stopped during the apply phase. On my local machine it works as expected, so for now I'm assuming that the plan and apply phases on TF Cloud run on separate instances? (I'm still very much in an experimentation phase.)

@sdemjanenko

sdemjanenko commented Apr 30, 2024

@idelsink I was able to do something like this in Terraform Cloud for AWS SSM. The Terraform provider lifecycle means we have to use a keepalive data source as well as several depends_on statements: https://github.com/ComplyCo/terraform-provider-aws-ssm-tunnels. The benefit of AWS SSM is that their websocket server is built in Go, so including it in the provider code was fairly straightforward.

@omarismail I would love for providers to have lifecycle hooks, so that my provider could be triggered to start before the provider that uses a tunnel and stay up until that provider is done with the tunnel. Currently I have to use a keepalive data source to prevent Terraform from terminating my provider, which maintains the tunnel.

@apparentlymart
Contributor

apparentlymart commented May 29, 2024

Thanks for sharing that SSM Tunnels example, @sdemjanenko.

I've been researching and prototyping in this area as part of my work in #35078, and although I was using traditional SSH tunnels as my main motivating example it seems like SSM tunnels are similar enough in concept that they could also be exposed using the "ephemeral resources" concept I discussed over there.

Either your provider or the official hashicorp/aws provider could offer an ephemeral resource type that negotiates an SSM tunnel using a randomly-selected local port number, and then exposes the address for the local listen socket as an attribute so it could be used in a provider configuration, provisioner configuration, or any other context where ephemeral values are accepted:

resource "aws_instance" "example" {
  # ...
}

# INVALID: This is a hypothetical new concept of "ephemeral resources"
# that we're currently considering as part of a solution to this issue.
ephemeral "aws_ssm_tunnel" "mysql_server" {
  instance_id = aws_instance.example.id
  remote_port = 3306
}

provider "mysql" {
  # The local_endpoint attribute would be something like 127.0.0.1:33453,
  # where 33453 is a randomly-selected TCP port number that serves
  # as the local end of the tunnel.
  endpoint = ephemeral.aws_ssm_tunnel.mysql_server.local_endpoint

  # ...
}

In the prototype Terraform would notice that the MySQL provider instance refers to ephemeral.aws_ssm_tunnel.mysql_server and so would "open" an instance of that ephemeral resource before instantiating the provider, and would wait until the MySQL provider's work is all done before "closing" that instance. That directly addresses the problem described in the documentation for the "aws-ssm-tunnels" provider of Terraform not understanding that other parts of the configuration are directly using the tunnel.

The MySQL provider would really be connecting to a listen socket belonging to the hashicorp/aws plugin process, which would then in turn forward to the remote MySQL server, in a similar way as would be true for the stubby prototype SSH tunnels ephemeral resource type I implemented in the aforementioned draft PR.

@seanamos

@apparentlymart I like the idea. My initial thinking was that having a first class resource/concept for a "connection" in Terraform would be the way to go, however, a more generic "ephemeral" resource might neatly solve an even broader set of cases.

Would an ephemeral resource run during both plan and apply phases? Currently, all the community workarounds to this issue have all sorts of plan/apply shortcomings. For example, with the data source approach (which we currently use):

terraform plan -out=plan # works, connection will open
terraform apply plan # will fail, connection only opened during plan

terraform apply # works

@apparentlymart
Contributor

Yes, the general concept of "ephemeral" is to describe anything that exists only temporarily in memory during one Terraform phase, and ephemeral resources in particular represent "objects" (in Terraform's usual fuzzy sense of that word) that ought to be opened when needed and closed soon after they are no longer needed, including repeating that process separately for the plan and apply phases.

As a provider developer one analogy that might resonate is the provider instance itself: those are also "ephemeral" despite us not having used that term to describe them before now. Terraform re-instantiates each provider again in each phase (plan vs. apply) and the provider instance has no "memory" preserved between phases. Ephemeral resources are a similar concept except that the "open" and "close" behaviors are decided by the provider itself, rather than by Terraform Core.

An ephemeral resource type representing a tunnel would therefore implement "open" by opening the connection to the remote service and the local listen socket, and "close" by closing both of those, and those two actions would be repeated for both the plan and apply phases.

@seanamos

Thanks for the in-depth explanation. After taking a closer look at your PR, I can say I'm quite excited for this to land in TF 👍. We can finally get rid of our various flaky hacks and workarounds.

@tomalok

tomalok commented May 29, 2024

ephemeral "aws_ssm_tunnel" "mysql_server" {
  # This tunnel is used only for provisioning, so is only needed
  # during the apply phase.
  count = terraform.applying ? 1 : 0

  instance_id = aws_instance.example.id
  remote_port = 3306
}

Note that you might need to have ephemeral resources existing in order to successfully plan what's going to happen during apply...

That is, the tunnel needs to be set up so that you can reach your MySQL instance to pull the current state (refresh, iirc) to determine the deltas for potentially making modifications to the MySQL schema, etc.

@sdemjanenko

@apparentlymart I like this idea. It would simplify my provider (and make the dev experience way better) + increase reliability dramatically.

@apparentlymart
Contributor

@tomalok indeed, sorry: I was adapting a few different examples I'd already written for other purposes and forgot to remove the count = terraform.applying ? 1 : 0 part, which wasn't relevant here.

Making the ephemeral resource only get opened and closed during the apply phase would be valid for a tunnel to an SSH server used for provisioners, but would not be valid for a provider configuration since (as you say) a provider needs to be instantiated during the plan phase too.

I've edited the earlier comment to remove the declaration that didn't belong. If you're curious you can see in @tomalok's comment what I originally shared, which was incorrect.

@dm3ch

dm3ch commented Nov 21, 2024

Created a feature request for adding an ephemeral resource for SSM tunnels to the AWS provider, since ephemeral resources are going to be GA in TF 1.10.

If you're interested please vote for it - hashicorp/terraform-provider-aws#40249
