Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More fix spring auth #277

Merged

Conversation

geekingfrog
Copy link
Contributor

@geekingfrog geekingfrog commented Apr 19, 2024

A few fixes for some old tests.
mix test test/teiserver/protocols/spring/spring_auth_test.exs now returns 16 tests passed out of 16 vs 7 failures out of 16 tests on master.

Although there are still some async errors regarding tests teardown, it doesn't actually fail any tests. More info in af60475

This seems to be an alternative way, that's what's autogenerated on a
brand new phoenix project.
When disconnecting, there's a bunch of additional logic run, like
storing some statistic and publishing events. This is racy against the
teardown of the SQL sandbox, leading to the confusing
"cannot find ownership process" message.
By manually disconnecting we avoid this problem.
Turn out, the server doesn't send the ignore list to the client when it
modifies anything. This seems to be conform to the spec at:
https://springrts.com/dl/LobbyProtocol/ProtocolDescription.html#IGNORELIST:server
The test needs to manually send the message to get the ignore list.
Otherwise login will fail and won't be broadcasted to other players.
Assuming the server does the correct thing, after all, this is what's
currently running live and that the tests drifted away.
BAR is custom commands for this, that have been added after the initial
tests. Updated the tests to reflect these changes and make them pass.
@geekingfrog geekingfrog force-pushed the more-fix-spring-auth branch from af60475 to 627af28 Compare April 19, 2024 07:40
@geekingfrog
Copy link
Contributor Author

@L-e-x-o-n if you have some time to have a look.

@@ -92,6 +92,12 @@ defmodule Teiserver.TeiserverTestLib do

@spec auth_setup(nil | Map.t()) :: %{socket: port(), user: Map.t(), pid: pid()}
def auth_setup(user \\ nil) do
# Remember to call Teiserver.Client.disconnect(user.id) at the end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just a workaround to issue where teiserver instance is shared across all tests. I do not think it is wise to share a teiserver instance and with every tests figureout how to correctly clean it up. We should instead ensure teiserver boots and shuts down on each test that requires connections to teiserver.

@L-e-x-o-n
Copy link
Collaborator

@L-e-x-o-n if you have some time to have a look.

I am getting 16 tests, 4 failures.
Starting with

2024-04-22 17:16:09.818 [error] Postgrex.Protocol (#PID<0.529.0>) disconnected: ** (DBConnection.ConnectionError) owner #PID<0.2214.0> exited

Client #PID<0.2216.0> is still using a connection from owner at location:

but there is probably something broken with my local build because GH action test workflow only gets 1 auth test error:

132) test RENAMEACCOUNT (Teiserver.SpringAuthTest)
Error:      test/teiserver/protocols/spring/spring_auth_test.exs:450
     Assertion with == failed
     code:  assert reply == "SERVERMSG Invalid characters in name (only a-z, A-Z, 0-9, [, ] allowed)\n"
     left:  "SERVERMSG Invalid characters in name (only a-z, A-Z, 0-9, [, ] allowed)\nSAIDPRIVATE Coordinator Invalid characters in name (only a-z, A-Z, 0-9, [, ] allowed)\n"
     right: "SERVERMSG Invalid characters in name (only a-z, A-Z, 0-9, [, ] allowed)\n"
     stacktrace:
       test/teiserver/protocols/spring/spring_auth_test.exs:467: (test)

@geekingfrog
Copy link
Contributor Author

I am getting 16 tests, 4 failures. Starting with

Have you tried dropping the test db? psql -U teiserver_test postgres -c "drop database teiserver_test"
Because some other tests aren't correctly setup with the sql sandbox, so they are leaving some state in place.
I saw for example that some tests were setting up some "new" user by name, but since the name was already existing, the id expected in the test and what the server was using was different.

There may be some more flakiness around though, as shown from the github action, but I'd rather focus on running tests for just this file for now, or at least for the "fixed" files.

We should instead ensure teiserver boots and shuts down on each test that requires connections to teiserver.

That seems very excessive, and not the default for phoenix tests. Ensuring disconnection is really to prevent stdout being inundated by errors irrelevant to the tests themselves.

@AdamChlupacek
Copy link
Contributor

AdamChlupacek commented Apr 22, 2024

I agree that shutting the whole server down is excessive, however those error logs are prove that there is some code still executing after the test finishes. I would just like to have some way how to sand box the whole test so know nothing else is running, i guess this is a topic on its own.

All the tests do pass fine on my machine otherwise.

@geekingfrog
Copy link
Contributor Author

I agree that shutting the whole server down is excessive, however those error logs are prove that there is some code still executing after the test finishes. I would just like to have some way how to sand box the whole test so know nothing else is running, i guess this is a topic on its own.

Yeah, the teardown code is running after the tests because there are a few on_exit(fn) hooks. The entire test suite goes something like that:

start server
setup test1 (may be many of these as these can be nested)
run test1
teardown test1

setup test2
run test2
teardown test2

...
shutdown erlang VM

When starting, teiserver does a bunch of DB query, I think it seeds the DB with some data if not there, and warm a bunch of caches as well. I suspect this may be problematic with many tests indeed.
I do not yet 100% understand how the sandbox work, and how the connection sharing play with async/sync tests. It would be nice to get to the bottom of it definitely.

@AdamChlupacek
Copy link
Contributor

When starting, teiserver does a bunch of DB query, I think it seeds the DB with some data if not there, and warm a bunch of caches as well. I suspect this may be problematic with many tests indeed. I do not yet 100% understand how the sandbox work, and how the connection sharing play with async/sync tests. It would be nice to get to the bottom of it definitely.

As far as i understand the sandboxing in terms of ecto, it just records the transactions made agains the DB and then reverses them. I did not find any other layer of sandboxing.

I think what is happening is that we completely escape the context of the test via the TCP connections that are made to the running teiserver. Since most of these test dont only test how the messages are handled but as well the L4 TCP. These test are on level of rancher so no phoenix is include here afaik.

Overall i think we should either remove the on_exit here since its not really contributing to the test, and the core issue of the extraneous error logs should be handled in a different PR, be it by including the on_exit as part of of auth_setup or changing how we test the functionality not to escape the context via TCP.

@L-e-x-o-n
Copy link
Collaborator

I am getting 16 tests, 4 failures. Starting with

Have you tried dropping the test db? psql -U teiserver_test postgres -c "drop database teiserver_test" Because some other tests aren't correctly setup with the sql sandbox, so they are leaving some state in place. I saw for example that some tests were setting up some "new" user by name, but since the name was already existing, the id expected in the test and what the server was using was different.

There may be some more flakiness around though, as shown from the github action, but I'd rather focus on running tests for just this file for now, or at least for the "fixed" files.

We should instead ensure teiserver boots and shuts down on each test that requires connections to teiserver.

That seems very excessive, and not the default for phoenix tests. Ensuring disconnection is really to prevent stdout being inundated by errors irrelevant to the tests themselves.

Good point, used MIX_ENV=test mix ecto.reset instead and it worked. All 16 test pass now :)

@geekingfrog geekingfrog force-pushed the more-fix-spring-auth branch from 627af28 to a96004a Compare April 27, 2024 06:23
@geekingfrog
Copy link
Contributor Author

Overall i think we should either remove the on_exit here since its not really contributing to the test, and the core issue of the extraneous error logs should be handled in a different PR, be it by including the on_exit as part of of auth_setup or changing how we test the functionality not to escape the context via TCP.

I amended the commit 627af28
Turn out indeed, I can simply put the disconnect inside an on_exit callback in the setup function. This way the test code doesn't need to care about that and it's cleaner.

@geekingfrog
Copy link
Contributor Author

Is there anything anymore to this PR? I'd like to get it merged soon-ish if possible.

@StanczakDominik StanczakDominik added the needs testing Needs unit tests or testing on integration server label May 28, 2024
@StanczakDominik StanczakDominik merged commit e342b87 into beyond-all-reason:master Jun 2, 2024
1 check failed
@geekingfrog geekingfrog deleted the more-fix-spring-auth branch June 3, 2024 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs testing Needs unit tests or testing on integration server
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants