More fix spring auth #277

geekingfrog · 2024-04-19T07:39:59Z

A few fixes for some old tests.
mix test test/teiserver/protocols/spring/spring_auth_test.exs now passes all 16 tests, versus 7 failures out of 16 on master.

There are still some async errors during test teardown, but they don't actually fail any tests. More info in af60475

This seems to be an alternative way to set up the SQL sandbox; it's what is autogenerated in a brand new Phoenix project.

When disconnecting, a bunch of additional logic runs, like storing some statistics and publishing events. This races against the teardown of the SQL sandbox, leading to the confusing "cannot find ownership process" message. By manually disconnecting we avoid this problem.

It turns out the server doesn't send the ignore list to the client when it modifies anything. This seems to conform to the spec at: https://springrts.com/dl/LobbyProtocol/ProtocolDescription.html#IGNORELIST:server The test needs to manually send the message to get the ignore list.

Users need to be verified after a rename; otherwise login will fail and won't be broadcast to other players.

This assumes the server does the correct thing (after all, this is what's currently running live) and that the tests drifted away.

BAR has custom commands for this, which were added after the initial tests. Updated the tests to reflect these changes and make them pass.

geekingfrog · 2024-04-20T18:36:53Z

@L-e-x-o-n if you have some time to have a look.

AdamChlupacek · 2024-04-22T06:26:59Z

lib/teiserver/libs/test_lib.ex

@@ -92,6 +92,12 @@ defmodule Teiserver.TeiserverTestLib do

  @spec auth_setup(nil | Map.t()) :: %{socket: port(), user: Map.t(), pid: pid()}
  def auth_setup(user \\ nil) do
+    # Remember to call Teiserver.Client.disconnect(user.id) at the end


I think this is just a workaround for the issue where a teiserver instance is shared across all tests. I do not think it is wise to share a teiserver instance and, with every test, figure out how to correctly clean it up. We should instead ensure teiserver boots and shuts down on each test that requires connections to teiserver.

L-e-x-o-n · 2024-04-22T15:24:08Z

> @L-e-x-o-n if you have some time to have a look.

I am getting 16 tests, 4 failures.
Starting with

2024-04-22 17:16:09.818 [error] Postgrex.Protocol (#PID<0.529.0>) disconnected: ** (DBConnection.ConnectionError) owner #PID<0.2214.0> exited

Client #PID<0.2216.0> is still using a connection from owner at location:

but there is probably something broken with my local build because GH action test workflow only gets 1 auth test error:

132) test RENAMEACCOUNT (Teiserver.SpringAuthTest)
Error:      test/teiserver/protocols/spring/spring_auth_test.exs:450
     Assertion with == failed
     code:  assert reply == "SERVERMSG Invalid characters in name (only a-z, A-Z, 0-9, [, ] allowed)\n"
     left:  "SERVERMSG Invalid characters in name (only a-z, A-Z, 0-9, [, ] allowed)\nSAIDPRIVATE Coordinator Invalid characters in name (only a-z, A-Z, 0-9, [, ] allowed)\n"
     right: "SERVERMSG Invalid characters in name (only a-z, A-Z, 0-9, [, ] allowed)\n"
     stacktrace:
       test/teiserver/protocols/spring/spring_auth_test.exs:467: (test)

geekingfrog · 2024-04-22T18:50:09Z

> I am getting 16 tests, 4 failures. Starting with

Have you tried dropping the test db? psql -U teiserver_test postgres -c "drop database teiserver_test"
Some other tests aren't correctly set up with the SQL sandbox, so they leave some state in place.
I saw for example that some tests were setting up some "new" user by name, but since the name already existed, the id expected in the test and the one the server was using were different.

There may be some more flakiness around though, as shown by the GitHub action, but I'd rather focus on running tests for just this file for now, or at least for the "fixed" files.

> We should instead ensure teiserver boots and shuts down on each test that requires connections to teiserver.

That seems very excessive, and not the default for phoenix tests. Ensuring disconnection is really to prevent stdout being inundated by errors irrelevant to the tests themselves.

AdamChlupacek · 2024-04-22T19:00:59Z

I agree that shutting the whole server down is excessive, however those error logs are proof that there is some code still executing after the test finishes. I would just like to have some way to sandbox the whole test so that we know nothing else is running. I guess this is a topic of its own.

All the tests do pass fine on my machine otherwise.

geekingfrog · 2024-04-22T19:28:03Z

> I agree that shutting the whole server down is excessive, however those error logs are proof that there is some code still executing after the test finishes. I would just like to have some way to sandbox the whole test so that we know nothing else is running. I guess this is a topic of its own.

Yeah, the teardown code is running after the tests because there are a few on_exit(fn) hooks. The entire test suite goes something like this:

start server
setup test1 (may be many of these as these can be nested)
run test1
teardown test1

setup test2
run test2
teardown test2

...
shutdown erlang VM
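
The per-test part of that sequence can be reproduced with a small standalone ExUnit script. This is only a sketch: the `:lifecycle_log` Agent and the `LifecycleSketchTest` module are hypothetical names used for illustration and are not part of teiserver. It records the order in which `setup`, the test body, and the `on_exit` callback run. Note that ExUnit runs `on_exit` callbacks in a separate process, after the test process has exited, which is likely relevant to the ownership errors discussed here.

```elixir
# Standalone sketch (run with `elixir lifecycle_sketch.exs`).
ExUnit.start(autorun: false)

# Hypothetical event log, only for this illustration.
{:ok, _} = Agent.start_link(fn -> [] end, name: :lifecycle_log)

defmodule LifecycleSketchTest do
  use ExUnit.Case, async: false

  # Prepend the event; the list is reversed when read back.
  defp log(event), do: Agent.update(:lifecycle_log, &[event | &1])

  setup do
    log(:setup)
    # Registered here, but executed only after the test process has exited.
    on_exit(fn -> log(:teardown) end)
    :ok
  end

  test "single test" do
    log(:run)
  end
end

# Run the suite explicitly and check that nothing failed.
%{failures: 0} = ExUnit.run()

events = Agent.get(:lifecycle_log, &Enum.reverse/1)
IO.inspect(events)
```

Anything the server does on behalf of a test after that teardown step (for example, work triggered by a TCP disconnect) falls outside this sequence, which is where the stray error logs come from.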

When starting, teiserver does a bunch of DB queries. I think it seeds the DB with some data if not there, and warms a bunch of caches as well. I suspect this may indeed be problematic with many tests.
I do not yet 100% understand how the sandbox works, and how the connection sharing plays with async/sync tests. It would definitely be nice to get to the bottom of it.

AdamChlupacek · 2024-04-22T20:09:10Z

> When starting, teiserver does a bunch of DB queries. I think it seeds the DB with some data if not there, and warms a bunch of caches as well. I suspect this may indeed be problematic with many tests. I do not yet 100% understand how the sandbox works, and how the connection sharing plays with async/sync tests. It would definitely be nice to get to the bottom of it.

As far as I understand the sandboxing in terms of Ecto, it just records the transactions made against the DB and then reverses them. I did not find any other layer of sandboxing.

I think what is happening is that we completely escape the context of the test via the TCP connections that are made to the running teiserver. Most of these tests don't only test how the messages are handled but also the L4 TCP layer. These tests are at the level of rancher, so no Phoenix is included here afaik.

Overall I think we should remove the on_exit here, since it's not really contributing to the test. The core issue of the extraneous error logs should be handled in a different PR, be it by including the on_exit as part of auth_setup or by changing how we test the functionality so that it does not escape the context via TCP.

L-e-x-o-n · 2024-04-22T20:53:46Z

> > I am getting 16 tests, 4 failures. Starting with
>
> Have you tried dropping the test db? psql -U teiserver_test postgres -c "drop database teiserver_test" Some other tests aren't correctly set up with the SQL sandbox, so they leave some state in place. I saw for example that some tests were setting up some "new" user by name, but since the name already existed, the id expected in the test and the one the server was using were different.
>
> There may be some more flakiness around though, as shown by the GitHub action, but I'd rather focus on running tests for just this file for now, or at least for the "fixed" files.
>
> > We should instead ensure teiserver boots and shuts down on each test that requires connections to teiserver.
>
> That seems very excessive, and not the default for phoenix tests. Ensuring disconnection is really to prevent stdout being inundated by errors irrelevant to the tests themselves.

Good point, I used MIX_ENV=test mix ecto.reset instead and it worked. All 16 tests pass now :)

geekingfrog · 2024-04-27T06:24:14Z

> Overall I think we should remove the on_exit here, since it's not really contributing to the test. The core issue of the extraneous error logs should be handled in a different PR, be it by including the on_exit as part of auth_setup or by changing how we test the functionality so that it does not escape the context via TCP.

I amended the commit 627af28
It turns out I can indeed simply put the disconnect inside an on_exit callback in the setup function. This way the test code doesn't need to care about that and it's cleaner.
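
The shape of that change can be sketched as a standalone ExUnit script. The names here are assumptions for illustration only: `FakeClient` is a hypothetical in-memory stand-in for `Teiserver.Client`, and this `auth_setup/1` is a simplified analogue of the real helper in `Teiserver.TeiserverTestLib`, not its actual implementation.

```elixir
# Standalone sketch (run with `elixir auth_setup_sketch.exs`).
ExUnit.start(autorun: false)

defmodule FakeClient do
  # Hypothetical stand-in for Teiserver.Client: tracks connected user ids.
  use Agent

  def start_link(_opts \\ []),
    do: Agent.start_link(fn -> MapSet.new() end, name: __MODULE__)

  def connect(id), do: Agent.update(__MODULE__, &MapSet.put(&1, id))
  def disconnect(id), do: Agent.update(__MODULE__, &MapSet.delete(&1, id))
  def connected, do: Agent.get(__MODULE__, &MapSet.to_list/1)
end

{:ok, _} = FakeClient.start_link()

defmodule AuthSetupSketchTest do
  use ExUnit.Case, async: false

  # Simplified analogue of auth_setup/1: it connects the user and registers
  # the matching disconnect itself, so individual tests need no cleanup code.
  defp auth_setup(user_id) do
    FakeClient.connect(user_id)
    on_exit(fn -> FakeClient.disconnect(user_id) end)
    %{user_id: user_id}
  end

  setup do
    # The returned map is merged into the test context.
    auth_setup(42)
  end

  test "user is connected during the test", %{user_id: id} do
    assert id in FakeClient.connected()
  end
end

%{failures: 0} = ExUnit.run()

# After the run, the on_exit callback has disconnected the user again.
IO.inspect(FakeClient.connected())
```

Because `on_exit` must be called from the test process, it works here when the helper is invoked from `setup`. A helper living in a separate module would need to call `ExUnit.Callbacks.on_exit/1` explicitly or import it.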

geekingfrog · 2024-05-10T09:07:34Z

Is there anything left to do on this PR? I'd like to get it merged soon-ish if possible.

geekingfrog added 8 commits April 19, 2024 08:40

Setup SQL sandbox

256d09c

This seems to be an alternative way; it's what is autogenerated in a brand new Phoenix project.

Correctly set moderator role

b5bf1fd

Need to verify users after rename

12ac135

Otherwise login will fail and won't be broadcast to other players.

Give correct role to bot to manage battle in test

35ccef1

Fix join battle response assertions

81de0e8

This assumes the server does the correct thing (after all, this is what's currently running live) and that the tests drifted away.

Update friendship relationship tests

cb89206

BAR has custom commands for this, which were added after the initial tests. Updated the tests to reflect these changes and make them pass.

geekingfrog force-pushed the more-fix-spring-auth branch from af60475 to 627af28 Compare April 19, 2024 07:40

AdamChlupacek reviewed Apr 22, 2024

Ensure disconnection of all user at the end of tests

a96004a

geekingfrog force-pushed the more-fix-spring-auth branch from 627af28 to a96004a Compare April 27, 2024 06:23

L-e-x-o-n approved these changes May 10, 2024

StanczakDominik added the needs testing Needs unit tests or testing on integration server label May 28, 2024

geekingfrog mentioned this pull request Jun 2, 2024

Improve lobby restrictions #303

Merged

StanczakDominik approved these changes Jun 2, 2024

StanczakDominik merged commit e342b87 into beyond-all-reason:master Jun 2, 2024
1 check failed

geekingfrog deleted the more-fix-spring-auth branch June 3, 2024 15:41
