Equinix migration to new data centres #2666

Closed
sxa opened this issue Jul 11, 2022 · 24 comments

Comments

@sxa
Member

sxa commented Jul 11, 2022

Equinix wish to move us to their new servers and decommission the older instances that were previously hosted at Packet data centres. The scope covers all of the servers that we have hosted with them other than the Ampere Altra systems, which are hosted by the "Works On ARM" project. One system has already been migrated, so a second project is already set up in the Equinix web UI and that part is sorted out. The next stage is to start creating more new machines in the new project and start migrating workloads over. We will have some overlap time to run systems in parallel until the new ones are verified as working.

The machines currently in scope with Equinix are as follows:

| Hostname | Type (core/RAM) | Usage | Replacement required |
| --- | --- | --- | --- |
| ansible.adoptopenjdk.net | t1.small.x86 (4*C2550/8) | Old ansible server | Already done |
| docker-packet-amd-1 | c3.medium.x86 (24*7402P/64) | DockerStatic+build host | Yes |
| docker-packet-intel-1 | m2.xlarge.x86 (56*5120/384) | DockerStatic+build host | Yes |
| docker-packet-armv8-1 | c3.large.arm (32*8180/128) | eMag for tests | Yes (Maybe TC?) |
| esxi-host | m1.xlarge.x86 (16*2650/256) | ESX: Solaris/x64 VMs | Yes |
| test-packet-u1604-x64-1 | t1.small.x86 (4*C2550/8) | Test system | Virtualise? Offline - FS perm issue |
| test-packet-u1604-x64-2 | t1.small.x86 (4*C2550/8) | Test system (Actually 20.04) | Virtualise? |
| test-packet-u1604-x64-3 | t1.small.x86 (4*C2550/8) | Test system | Inaccessible? Virtualise? |
| test-packet-x64-win2012 | t1.small.x86 (4*C2550/8) | Test system | No? Capacity elsewhere |

NOTE: Once migration has occurred we will need to raise an issue in Eclipse Infrastructure's GitLab to get the x64 and Solaris machines added to the TC Jenkins instance.

@sxa sxa added the ansible label Jul 11, 2022
@sxa sxa added this to the 2022-07 (July) milestone Jul 11, 2022
@sxa sxa self-assigned this Jul 11, 2022
@sxa sxa moved this to In Progress in Adoptium July 2022 Plan Jul 11, 2022
@sxa
Member Author

sxa commented Jul 11, 2022

Initial thoughts:

  • docker-packet-amd-1 and docker-packet-intel-1: Provide direct replacements for both as docker hosts
  • ESXi/Solaris and the test-packet-u1604 ones: Two replacement servers running some hypervisor for running "real" VMs (inc. RHEL) with redundancy
  • eMag - replace - possibly use in place of the Altra for TC? Is there benefit to having a non-Altra system for general work?
  • Decommission windows server since we have good capacity elsewhere.

@sxa
Member Author

sxa commented Jul 15, 2022

Note that at the moment the machines we use for our Linux/x64 performance runs are among those bare metal machines. If we switch to virtualised systems we will need to consider whether they are still suitable.

@sxa
Member Author

sxa commented Sep 28, 2022

@vielmetti do these requirements look feasible? It sounds like the eMag system will be fully decommissioned, so that one can be ignored in the above comments.

@vielmetti

@sxa Yeah, these all look quite sensible, I don't see any problems. I think it's a good idea to virtualize the t1.small equivalents as opposed to replacing them with hardware one for one.

@sxa
Member Author

sxa commented Sep 28, 2022

@vielmetti Great! Are you ok for me to start creating more on-demand machines in our t1.replacement project now and start moving over? Is there anything I need to do for the Altras, which I understand will also be physically moving, or will we just wait for more from you on that?

@vielmetti

@sxa You should be able to start putting new on-demand machines in the new project and start moving over. (Of course it's OK to temporarily have both old and new systems going.)

The physical move of the Works on Arm Altras is being coordinated separately, I don't have a date for that yet.

@sxa
Member Author

sxa commented Oct 1, 2022

@vielmetti Any preference on which locations we should use for the new ones? Most of our existing ones are in AM or DA with one in NY - a similar geographical split would likely still be useful.

@vielmetti

@sxa You should be able to stay in the same metros (moving AMS1 resources to AM, and EWR1 resources to NY, and DFW2 resources to DA).

You may find our "capacity dashboard" useful for planning or to get a quick overview of availability by plan and metro:

https://metal.equinix.com/developers/capacity-dashboard/
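
For anyone scripting the move, the facility-to-metro mapping given above can be captured in a small lookup. This is only a sketch: the function name `facility_to_metro` is made up for illustration, and it covers only the three pairs named in this thread.

```shell
#!/bin/sh
# Map a legacy facility code to its metro code.
# Only the three pairs mentioned in this thread are covered;
# anything else is reported as unknown with a non-zero status.
facility_to_metro() {
    case "$1" in
        AMS1) echo "AM" ;;
        EWR1) echo "NY" ;;
        DFW2) echo "DA" ;;
        *)    echo "unknown"; return 1 ;;
    esac
}

# Print the mapping for each facility named in the thread.
for facility in AMS1 EWR1 DFW2; do
    echo "$facility -> $(facility_to_metro "$facility")"
done
```

Returning a non-zero status for unrecognised codes lets a calling script stop rather than guess a metro.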

@Haroon-Khel
Contributor

Haroon-Khel commented Oct 13, 2022

I've got 2 x64 dockerhost machines up and 1 ESXi7 host up.

There's a bit of a problem with the ESXi host. I created a Solaris 10 VM and gave it the IP 145.40.115.43, out of the 145.40.115.40/29 network provided by the ESXi host, but I can't ssh into it. I have enabled PermitRootLogin in /etc/ssh/ssh_config yet I still can't get in. The IP of the machine is pingable.

Upon first trying to ssh in

hkhel@hkhel-mac ~ % ssh [email protected]
Unable to negotiate with 145.40.115.43 port 22: no matching key exchange method found. Their offer: gss-group1-sha1-toWM5Slw5Ew8Mqkay+al2g==,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1

Then I installed OpenSSH onto the machine

hkhel@hkhel-mac ~ % ssh [email protected]
The authenticity of host '145.40.115.43 (145.40.115.43)' can't be established.
ED25519 key fingerprint is SHA256:WsCOUDSHXWgBIaonGzT1rPuOpqeiEAEoPsTcfD6PYrk.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '145.40.115.43' (ED25519) to the list of known hosts.
ssh_dispatch_run_fatal: Connection to 145.40.115.43 port 22: incorrect signature

While I'm still googling, any ideas @sxa @vielmetti?

@sxa
Member Author

sxa commented Oct 13, 2022

Your local ssh client (on the machine you're connecting from) likely does not support the algorithms which the Solaris 10 default install does.

Three options:

  1. Start up a docker container running an older OS (ubuntu:20.04 should be fine) and connect from that. When I do that I don't get the failure and I get a password prompt.
  2. Run with ssh -o HostKeyAlgorithms=ssh-rsa
  3. Adjust the server configuration to accept RSA by default: try HostKeyAlgorithms ssh-rsa in the sshd_config file on the server (the ssh server may need restarting for this to take effect).
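
As a rough sketch of option 2, the helper below prints (rather than runs) an ssh command line that re-enables the older algorithms on the client side. The function name `legacy_ssh_cmd` and the user name `someuser` are placeholders, and whether these two options are enough for this particular Solaris 10 host is an assumption. The key exchange method is one of those listed in the server's offer in the error output earlier in this thread.

```shell
#!/bin/sh
# Print an ssh command line that re-enables legacy algorithms
# for a single connection. The command is printed, not executed.
# Usage: legacy_ssh_cmd USER HOST
legacy_ssh_cmd() {
    # The "+" prefix appends to the client's default algorithm
    # list instead of replacing it.
    printf 'ssh -o HostKeyAlgorithms=+ssh-rsa -o KexAlgorithms=+diffie-hellman-group14-sha1 %s@%s\n' "$1" "$2"
}

# "someuser" is a placeholder login name.
legacy_ssh_cmd someuser 145.40.115.43
```

Passing the options per connection keeps the weaker algorithms disabled for every other host.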

@sxa sxa pinned this issue Oct 14, 2022
@Haroon-Khel
Contributor

Haroon-Khel commented Oct 17, 2022

@Haroon-Khel
Contributor

@Haroon-Khel
Contributor

Copied from #2792 (comment)

A second ESXi server has been created as the first one ran out of assignable IP addresses. On this second server, we have the 2 TCK machines, Solaris and Ubuntu. EF are currently setting them up with an ETA of before the end of the month.
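
To illustrate how quickly a block like the 145.40.115.40/29 network mentioned earlier fills up, here is a small sketch of the arithmetic. The function names are made up for illustration, and the "usable" figure only subtracts the network and broadcast addresses; a provider may reserve further addresses (a gateway, for example), which is not accounted for here.

```shell
#!/bin/sh
# Total number of addresses in an IPv4 block, given its prefix length.
block_size() {
    echo $(( 1 << (32 - $1) ))
}

# Addresses left after removing the network and broadcast
# addresses (meaningful for prefixes up to /30).
usable_hosts() {
    echo $(( $(block_size "$1") - 2 ))
}

echo "A /29 holds $(block_size 29) addresses, $(usable_hosts 29) usable"
```

With only a handful of assignable addresses per /29, a host running several VMs can exhaust its block after a few guests.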

@sxa sxa moved this to In Progress in Adoptium November 2022 Plan Nov 23, 2022
@Haroon-Khel
Contributor

Those static docker containers have had https://ci.adoptopenjdk.net/view/Test_grinder/job/AQA_Test_Pipeline/ run on them to completion. They're good to go.

Repository owner moved this from In Progress to Done in Adoptium November 2022 Plan Dec 2, 2022
@vielmetti

@Haroon-Khel - does this mean that all of the old infrastructure has been (or is ready to be) decommissioned? I still see those old systems online. cc @sxa

@Haroon-Khel
Contributor

Haroon-Khel commented Dec 2, 2022

@vielmetti The old infrastructure is ready to be decommissioned. Replacements have been made for all of it.

@sxa
Member Author

sxa commented Dec 2, 2022

@vielmetti I haven't shut them down yet - do you want me to do that and deprovision in your UI? Happy to do that although if possible I'd quite like to keep the old ESXi server around for a few days more, but not critical if that's a problem.

@vielmetti

@sxa If you could deprovision in the UI that would be great, thanks.

I know it's nearly weekend already for you - so if anything you can do that's super easy could be done sooner, and then anything harder like the ESXi could wait to early next week.

@sxa
Member Author

sxa commented Dec 2, 2022

> @sxa If you could deprovision in the UI that would be great, thanks.
>
> I know it's nearly weekend already for you - so if anything you can do that's super easy could be done sooner, and then anything harder like the ESXi could wait to early next week.

All shut down except for the ESXi server and test-packet-ubuntu1604-x64-3 - both are awaiting https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/2276 but they could go too if necessary. I'll aim to deprovision them too.

@sxa
Member Author

sxa commented Dec 2, 2022

All apart from the two I mentioned (and the ARM systems) are now deleted. I've typed delete into your UI so many times I feel like a Cyberman :-)

@sxa
Member Author

sxa commented Jan 25, 2023

The old ESXi server is now decommissioned too, so this is fully complete.
