Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flash unittests sometimes crashes #2425

Closed
Geod24 opened this issue Aug 27, 2021 · 11 comments · Fixed by #2964
Closed

Flash unittests sometimes crashes #2425

Geod24 opened this issue Aug 27, 2021 · 11 comments · Fixed by #2964
Assignees
Labels
prio-high Will have an effect in the next three sprints type-bug Things don't work as they were intended to, or the way they were intended to work doesn't make sense

Comments

@Geod24
Copy link
Collaborator

Geod24 commented Aug 27, 2021

Unittesting agora.test.Flash#### FATAL ERROR: Connection with peer closed
This node was started at source/agora/test/Flash.d:399
This most likely means that the node crashed due to an uncaught exception
If not, please file a bug at https://github.com/Geod24/localrest/
Full error: object.Exception@submodules/localrest/source/geod24/LocalRest.d-mixin-1179(1190): Connection with peer closed
----------------
submodules/localrest/source/geod24/LocalRest.d-mixin-1179:1190 @trusted geod24.LocalRest.Response geod24.LocalRest.RemoteAPI!(agora.test.Flash.TestFlashAPI, agora.test.Base.Serializer).RemoteAPI.confirmChannelUpdate(agora.crypto.Key.PublicKey, agora.crypto.Key.PublicKey, geod24.bitblob.BitBlob!(64uL).BitBlob, uint).__lambda6() [0x10f28d091]
submodules/localrest/source/geod24/LocalRest.d-mixin-1179:1185 @safe agora.flash.Types.Result!(bool).Result geod24.LocalRest.RemoteAPI!(agora.test.Flash.TestFlashAPI, agora.test.Base.Serializer).RemoteAPI.confirmChannelUpdate(agora.crypto.Key.PublicKey, agora.crypto.Key.PublicKey, geod24.bitblob.BitBlob!(64uL).BitBlob, uint) [0x10f28cb33]
submodules/localrest/source/geod24/LocalRest.d-mixin-1179:1180 _DThn16_6geod249LocalRest__T9RemoteAPITC5agora4test5Flash12TestFlashAPITSQBhQBe4Base10SerializerZQCr20confirmChannelUpdateMFNfSQDj6crypto3Key9PublicKeyQzSQFq7bitblob__T7BitBlobVmi64ZQokZSQFr5flash5Types__T6ResultTbZQk [0x10f28e505]
source/agora/flash/UpdateSigner.d:421 agora.flash.Types.Result!(agora.flash.Types.UpdatePair).Result agora.flash.UpdateSigner.UpdateSigner.collectSignatures(agora.flash.api.FlashAPI.FlashAPI, uint, agora.consensus.data.Transaction.Output[], agora.flash.Types.PrivateNonce, agora.flash.Types.PublicNonce, geod24.bitblob.BitBlob!(64uL).BitBlob, agora.consensus.data.UTXO.UTXO) [0x10f35f8a4]
source/agora/flash/Channel.d:994 void agora.flash.Channel.Channel.checkProposeUpdate() [0x10ee73ca1]
source/agora/flash/Channel.d:602 void agora.flash.Channel.Channel.eventLoop() [0x10f382f55]
source/agora/flash/Channel.d:533 void agora.flash.Channel.Channel.start(in ref agora.flash.Types.PrivateNonce, in ref agora.flash.Types.PublicNonce) [0x10f38127f]
source/agora/flash/Channel.d:191 void agora.flash.Channel.Channel.__ctor(agora.flash.Config.FlashConfig, agora.flash.Config.ChannelConfig, in ref agora.crypto.Key.KeyPair, agora.flash.Types.PrivateNonce, agora.flash.Types.PublicNonce, agora.flash.api.FlashAPI.FlashAPI, agora.script.Engine.Engine, agora.common.Task.ITaskManager, void delegate(agora.consensus.data.Transaction.Transaction), void delegate(in ref agora.crypto.Key.PublicKey, in ref geod24.bitblob.BitBlob!(64uL).BitBlob, in ref geod24.bitblob.BitBlob!(64uL).BitBlob, in agora.common.Amount.Amount, in agora.common.Types.Height, in ref agora.flash.OnionPacket.OnionPacket), void delegate(agora.crypto.Key.PublicKey, geod24.bitblob.BitBlob!(64uL).BitBlob, agora.flash.Types.ChannelState, agora.flash.ErrorCode.ErrorCode) @safe, void delegate(agora.crypto.Key.PublicKey, geod24.bitblob.BitBlob!(64uL).BitBlob, geod24.bitblob.BitBlob!(64uL).BitBlob, agora.flash.ErrorCode.ErrorCode) @safe, void delegate(agora.crypto.Key.PublicKey, in geod24.bitblob.BitBlob!(64uL).BitBlob[], in geod24.bitblob.BitBlob!(64uL).BitBlob[]), agora.flash.api.FlashListenerAPI.FeeUTXOs delegate(agora.crypto.Key.PublicKey, ulong), agora.common.ManagedDatabase.ManagedDatabase).__lambda16() [0x10f380bd3]
/Users/runner/hostedtoolcache/dc/ldc2-2c6a6680/x64/ldc2-2c6a6680-osx-x86_64/bin/../import/core/internal/hash.d:193 void geod24.LocalRest.Timer.run() [0x10f95556b]
core/thread/context.d:46 void core.thread.context.Callable.opCall() [0x10ff4f87c]
core/thread/fiber.d:1141 void core.thread.fiber.Fiber.run() [0x10ff4fa08]
core/thread/fiber.d:198 fiber_entryPoint [0x10ff4f945]
./ci/run.sh: line 21: 15907 Abort trap: 6           ./build/agora-unittests
@Geod24 Geod24 added the type-bug Things don't work as they were intended to, or the way they were intended to work doesn't make sense label Aug 27, 2021
@Geod24 Geod24 added this to the 6. Stabilization milestone Aug 27, 2021
@Geod24
Copy link
Collaborator Author

Geod24 commented Aug 27, 2021

RemoteAPI!TestFlashAPI api = RemoteAPI!TestFlashAPI.spawn!FlashNodeImpl(
conf, this.agora_registry, agora_address, storage,
&this.flash_registry, &this.listener_registry,
10.seconds); // timeout from main thread

@Geod24 Geod24 changed the title Flash unittests ometimes crashes Flash unittests sometimes crashes Aug 27, 2021
@Geod24
Copy link
Collaborator Author

Geod24 commented Sep 30, 2021

Also seeing:

network.listener.waitUntilChannelState(chan_id,
ChannelState.Closed);

Module tests failed: agora.test.Flash
object.Exception@submodules/localrest/source/geod24/LocalRest.d-mixin-1179(1203): Request timed-out
----------------
/home/runner/work/agora/agora/submodules/localrest/source/geod24/LocalRest.d-mixin-1179:1203 @safe agora.flash.ErrorCode.ErrorCode geod24.LocalRest.RemoteAPI!(agora.test.Flash.TestFlashListenerAPI, agora.test.Base.Serializer).RemoteAPI.waitUntilChannelState(geod24.bitblob.BitBlob!(64uL).BitBlob, agora.flash.Types.ChannelState, agora.crypto.Key.PublicKey) [0x55f09ed6d61e]
/home/runner/work/agora/agora/source/agora/test/Flash.d:1937 void agora.test.Flash.__unittest_L1845_C1() [0x55f09eb801be]
/home/runner/work/agora/agora/source/agora/test/Flash.d [0x55f09ed2fdfc]
/home/runner/work/agora/agora/source/agora/test/Base.d:275 bool agora.test.Base.customModuleUnitTester().runTest(agora.test.Base.customModuleUnitTester().ModTest) [0x55f09ed44162]
/home/runner/work/agora/agora/source/agora/test/Base.d:302 core.runtime.UnitTestResult agora.test.Base.customModuleUnitTester() [0x55f09eb01366]
runtime.d:607 runModuleUnitTests [0x55f09f950e20]
dmain2.d:551 void rt.dmain2._d_run_main2(char[][], ulong, extern (C) int function(char[][])*).runAll() [0x55f09f96ade2]
dmain2.d:525 void rt.dmain2._d_run_main2(char[][], ulong, extern (C) int function(char[][])*).tryExec(scope void delegate()) [0x55f09f96ad62]
dmain2.d:584 _d_run_main2 [0x55f09f96ac19]
dmain2.d:384 _d_run_main [0x55f09f96a8d3]
/opt/hostedtoolcache/dc/ldc2-1.26.0/x64/ldc2-1.26.0-linux-x86_64/bin/../import/core/internal/entrypoint.d:42 main [0x55f09ed4d5d4]
??:? __libc_start_main [0x7f0499c7e0b2]
??:? _start [0x55f09e5e625d]

@Geod24
Copy link
Collaborator Author

Geod24 commented Sep 30, 2021

Regarding the crash, it doesn't seem to be specific to a single test, I just got it at a very different line:

Unittesting agora.test.Fee#### FATAL ERROR: Connection with peer closed
This node was started at source/agora/test/Flash.d:1951
This most likely means that the node crashed due to an uncaught exception
./ci/run.sh: line 21: 12129 Abort trap: 6           ./build/agora-unittests
If not, please file a bug at https://github.com/Geod24/localrest/
+ ./build/agora-unittests
Full error: object.Exception@submodules/localrest/source/geod24/LocalRest.d-mixin-1179(1190): Connection with peer closed
----------------
/Users/runner/work/agora/agora/submodules/localrest/source/geod24/LocalRest.d-mixin-1179:1190 @trusted geod24.LocalRest.Response geod24.LocalRest.RemoteAPI!(agora.test.Flash.TestFlashAPI, agora.test.Base.Serializer).RemoteAPI.requestSettleSig(agora.crypto.Key.PublicKey, agora.crypto.Key.PublicKey, geod24.bitblob.BitBlob!(64uL).BitBlob, uint).__lambda6() [0x109715131]
/Users/runner/work/agora/agora/submodules/localrest/source/geod24/LocalRest.d-mixin-1179:1185 @safe agora.flash.Types.Result!(agora.crypto.Schnorr.Signature).Result geod24.LocalRest.RemoteAPI!(agora.test.Flash.TestFlashAPI, agora.test.Base.Serializer).RemoteAPI.requestSettleSig(agora.crypto.Key.PublicKey, agora.crypto.Key.PublicKey, geod24.bitblob.BitBlob!(64uL).BitBlob, uint) [0x1096d096f]
/Users/runner/work/agora/agora/submodules/localrest/source/geod24/LocalRest.d-mixin-1179:1180 _DThn16_6geod249LocalRest__T9RemoteAPITC5agora4test5Flash12TestFlashAPITSQBhQBe4Base10SerializerZQCr16requestSettleSigMFNfSQDf6crypto3Key9PublicKeyQzSQFm7bitblob__T7BitBlobVmi64ZQokZSQFn5flash5Types__T6ResultTSQGoQDj7Schnorr9SignatureZQBi [0x109717658]
/Users/runner/work/agora/agora/source/agora/flash/UpdateSigner.d:335 agora.flash.Types.Result!(agora.flash.Types.UpdatePair).Result agora.flash.UpdateSigner.UpdateSigner.collectSignatures(agora.flash.api.FlashAPI.FlashAPI, uint, agora.consensus.data.Transaction.Output[], agora.flash.Types.PrivateNonce, agora.flash.Types.PublicNonce, geod24.bitblob.BitBlob!(64uL).BitBlob, agora.consensus.data.UTXO.UTXO) [0x1098cfa08]
/Users/runner/work/agora/agora/source/agora/flash/Channel.d:987 void agora.flash.Channel.Channel.checkProposeUpdate() [0x1092d6972]
/Users/runner/work/agora/agora/source/agora/flash/Channel.d:594 void agora.flash.Channel.Channel.eventLoop() [0x1098f9b28]
/Users/runner/work/agora/agora/source/agora/flash/Channel.d:524 void agora.flash.Channel.Channel.start().__lambda1() [0x1098f8608]
/Users/runner/hostedtoolcache/dc/ldc2-1.26.0/x64/ldc2-1.26.0-osx-x86_64/bin/../import/std/array.d:3392 void geod24.LocalRest.Timer.run() [0x109eb8b3b]
context.d:46 void core.thread.context.Callable.opCall() [0x10a41e44e]
fiber.d:1129 void core.thread.fiber.Fiber.run() [0x10a41e5dd]
fiber.d:186 fiber_entryPoint [0x10a41e512]

Obviously my previous message and the original report are likely to be different bugs, but I don't want to multiply the issues as addressing one will likely require a deep dive into Flash (and find a few others).

@Geod24 Geod24 added the prio-high Will have an effect in the next three sprints label Jan 6, 2022
@Geod24
Copy link
Collaborator Author

Geod24 commented Jan 6, 2022

Raising the priority because of how often this happens.

@linked0
Copy link
Contributor

linked0 commented Jan 26, 2022

This is the reason the test fails often.
A funder like Alice publishes update tx, of which hash is like 0xb8ee9c for example, by calling postTransaction after the externalization of a certain height like the height 3. But, in some cases, the first validator is often before the externalization of the height of 3 so it rejects the update tx with this log.

2022-01-21T02:34:20.2359449Z 2022-01-21 02:34:13,530 [32mInfo [0m [boa1xrval5rzmma29zh4aqgv3mvcarhwa0w8rgthy3l9vaj3fywf9894ycmjkm8] - Rejected tx: Inputs (2): 0xd7db...823e:0x4e6b...48c6, 0xe2e8...975f:0xac4f...1e7c
2022-01-21T02:34:20.2360154Z Outputs (2): boa1xza007gl...9g93(59,499,999.9,348,1)<Payment>, boa1xqqqqqqq...jq8m(0.0,01)<Payment>, txHash: 0xb8ee9cb0f8c83f072c212296c1596113e796b628c3bbbd2965289ae567728746a0e2161ddaf4f8297a7a0e3fee3a44d96958f7c35bd174e4ef4c47e3b0e2cde7, Reason: Transaction: Input ref not in UTXO.
<omitted>
2022-01-21T02:34:20.2364038Z 2022-01-21 02:34:13,591 [32mInfo [0m [agora.consensus.state.Ledger] - Beginning externalization of block #3

Alice has already sent the update tx with the following log.

2022-01-21T02:34:20.3736764Z [32mInfo [0m [agora.flash.Node] - Block #3 is externalized...
2022-01-21T02:34:20.3737234Z [32mInfo [0m [agora.flash.Channel] - Alice: Update tx externalized(0x8d0d0b) on height 3
2022-01-21T02:34:20.3737693Z [32mInfo [0m [agora.flash.Channel] - Alice: Publishing update tx 6: 0xb8ee9c

@linked0
Copy link
Contributor

linked0 commented Feb 4, 2022

Found one more case where a test fails on CI as follows. I will find out why the public key of the output referenced by a transaction is PublicKey.init which means its string representation is boa1xqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqa9jq8m

2022-02-04T02:17:14.2093470Z Module tests failed: agora.test.Flash
2022-02-04T02:17:14.2094510Z core.exception.AssertError@source/agora/utils/TxBuilder.d(275): Address not found in Well-Known keypairs: boa1xqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqa9jq8m
2022-02-04T02:17:14.2095400Z ----------------
2022-02-04T02:17:14.2096120Z source/agora/test/Base.d:176 nothrow void agora.test.Base.testAssertHandler(immutable(char)[], ulong, immutable(char)[]) [0x10368be42]
2022-02-04T02:17:14.2096820Z core/exception.d:891 onAssertErrorMsg [0x10448b0db]
2022-02-04T02:17:14.2097450Z core/exception.d:761 _d_assert_msg [0x10448b615]
2022-02-04T02:17:14.2098370Z source/agora/utils/TxBuilder.d:275 nothrow @safe agora.script.Lock.Unlock agora.utils.TxBuilder.TxBuilder.keyUnlocker(in ref agora.consensus.data.Transaction.Transaction, in ref agora.utils.TxBuilder.OutputRef) [0x1035add2d]
2022-02-04T02:17:14.2099090Z source/agora/utils/TxBuilder.d:557 nothrow @safe void agora.utils.TxBuilder.TxBuilder.minFees().__lambda2(agora.utils.TxBuilder.OutputRef) [0x1035bc7a5]
2022-02-04T02:17:14.2100710Z /Users/runner/hostedtoolcache/dc/ldc2-1.28.1/x64/ldc2-1.28.1-osx-x86_64/bin/../import/std/algorithm/iteration.d:1028 nothrow @safe std.typecons.Flag!("each").Flag std.algorithm.iteration.each!(agora.utils.TxBuilder.TxBuilder.minFees().__lambda2(agora.utils.TxBuilder.OutputRef)).each!(agora.utils.TxBuilder.OutputRef[]).each(ref agora.utils.TxBuilder.OutputRef[]) [0x1035b1a36]
2022-02-04T02:17:14.2101570Z source/agora/utils/TxBuilder.d:555 nothrow @safe agora.common.Amount.Amount agora.utils.TxBuilder.TxBuilder.minFees() [0x1035b0c9b]
2022-02-04T02:17:14.2102240Z source/agora/utils/TxBuilder.d:358 nothrow @safe agora.consensus.data.Transaction.Transaction agora.utils.TxBuilder.TxBuilder.sign(in agora.consensus.data.Transaction.OutputType, uint, agora.common.Amount.Amount) [0x1035b005d]
2022-02-04T02:17:14.2103000Z source/agora/test/Base.d:1132 void agora.test.Base.TestAPIManager.addBlock!(std.range.iota!(int, ubyte).iota(int, ubyte).Result).addBlock(std.range.iota!(int, ubyte).iota(int, ubyte).Result, bool, immutable(char)[], int) [0x1035b58d2]
2022-02-04T02:17:14.2103970Z source/agora/test/Base.d:1115 void agora.test.Base.TestAPIManager.addBlock(bool, immutable(char)[], int) [0x10362bf57]
2022-02-04T02:17:14.2104400Z source/agora/test/Flash.d:807 void agora.test.Flash.__unittest_L729_C1().__lambda18!(int).__lambda18(int) [0x103671ea4]

@linked0
Copy link
Contributor

linked0 commented Feb 4, 2022

As @hewison-chris commented, we might need to modify the logic of the disabling publishing. Will analyze the code more for that.

@omerfirmak
Copy link
Contributor

Found one more case where a test fails on CI as follows. I will find out why the public key of the output referenced by a transaction is PublicKey.init which means its string representation is boa1xqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqa9jq8m

2022-02-04T02:17:14.2093470Z Module tests failed: agora.test.Flash
2022-02-04T02:17:14.2094510Z core.exception.AssertError@source/agora/utils/TxBuilder.d(275): Address not found in Well-Known keypairs: boa1xqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqa9jq8m
2022-02-04T02:17:14.2095400Z ----------------
2022-02-04T02:17:14.2096120Z source/agora/test/Base.d:176 nothrow void agora.test.Base.testAssertHandler(immutable(char)[], ulong, immutable(char)[]) [0x10368be42]
2022-02-04T02:17:14.2096820Z core/exception.d:891 onAssertErrorMsg [0x10448b0db]
2022-02-04T02:17:14.2097450Z core/exception.d:761 _d_assert_msg [0x10448b615]
2022-02-04T02:17:14.2098370Z source/agora/utils/TxBuilder.d:275 nothrow @safe agora.script.Lock.Unlock agora.utils.TxBuilder.TxBuilder.keyUnlocker(in ref agora.consensus.data.Transaction.Transaction, in ref agora.utils.TxBuilder.OutputRef) [0x1035add2d]
2022-02-04T02:17:14.2099090Z source/agora/utils/TxBuilder.d:557 nothrow @safe void agora.utils.TxBuilder.TxBuilder.minFees().__lambda2(agora.utils.TxBuilder.OutputRef) [0x1035bc7a5]
2022-02-04T02:17:14.2100710Z /Users/runner/hostedtoolcache/dc/ldc2-1.28.1/x64/ldc2-1.28.1-osx-x86_64/bin/../import/std/algorithm/iteration.d:1028 nothrow @safe std.typecons.Flag!("each").Flag std.algorithm.iteration.each!(agora.utils.TxBuilder.TxBuilder.minFees().__lambda2(agora.utils.TxBuilder.OutputRef)).each!(agora.utils.TxBuilder.OutputRef[]).each(ref agora.utils.TxBuilder.OutputRef[]) [0x1035b1a36]
2022-02-04T02:17:14.2101570Z source/agora/utils/TxBuilder.d:555 nothrow @safe agora.common.Amount.Amount agora.utils.TxBuilder.TxBuilder.minFees() [0x1035b0c9b]
2022-02-04T02:17:14.2102240Z source/agora/utils/TxBuilder.d:358 nothrow @safe agora.consensus.data.Transaction.Transaction agora.utils.TxBuilder.TxBuilder.sign(in agora.consensus.data.Transaction.OutputType, uint, agora.common.Amount.Amount) [0x1035b005d]
2022-02-04T02:17:14.2103000Z source/agora/test/Base.d:1132 void agora.test.Base.TestAPIManager.addBlock!(std.range.iota!(int, ubyte).iota(int, ubyte).Result).addBlock(std.range.iota!(int, ubyte).iota(int, ubyte).Result, bool, immutable(char)[], int) [0x1035b58d2]
2022-02-04T02:17:14.2103970Z source/agora/test/Base.d:1115 void agora.test.Base.TestAPIManager.addBlock(bool, immutable(char)[], int) [0x10362bf57]
2022-02-04T02:17:14.2104400Z source/agora/test/Flash.d:807 void agora.test.Flash.__unittest_L729_C1().__lambda18!(int).__lambda18(int) [0x103671ea4]

That looks more like a general issue than a Flash specifc one.

@linked0
Copy link
Contributor

linked0 commented Feb 4, 2022

That looks more like a general issue than a Flash specifc one.

Right, it does. But we need to know why it happens. I saw the representation of the public key some times in the Flash test.

@linked0
Copy link
Contributor

linked0 commented Feb 4, 2022

That looks more like a general issue than a Flash specifc one.

Right, it does. But we need to know why it happens. I saw the representation of the public key some times in the Flash test.

This could be another timing issue.

@linked0
Copy link
Contributor

linked0 commented Feb 16, 2022

This issue seems to be solved by handling the timing issues. So closing.

@linked0 linked0 closed this as completed Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
prio-high Will have an effect in the next three sprints type-bug Things don't work as they were intended to, or the way they were intended to work doesn't make sense
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants