Reorg handling may be broken in 0.20 #3940

victorkirov · 2024-09-11T07:11:55Z

There was a reorg on Testnet around block 2904360. Ord 0.20 detected it and started the rollback, but the rollback has been stuck for over 2 hours, whereas it used to be close to instant. This is happening on multiple instances of the indexer, not just one, so it's not an outlier.

[2024-09-11T05:51:03Z INFO  ord::index] 5 block deep reorg detected at height 2904360
[2024-09-11T05:51:03Z INFO  ord::index::reorg] rolling back database after reorg of depth 5 at height 2904360

Could something have broken in a new version of redb?


victorkirov · 2024-09-11T07:21:40Z

It also looks like whatever state the process is in is blocking shutdown. After the first SIGTERM, it logs that it's shutting down gracefully and that you can force shutdown by pressing ctrl-c again. A second SIGTERM usually kills the process at this point, but now it's not responding. A SIGKILL also does nothing, though I don't think SIGKILLs are currently handled.

raphjaph · 2024-09-13T15:56:56Z

Hey, thanks for opening this issue! Unfortunately our testnet instance was still running v19 so I can't check this myself. I've just updated the server though. What does it say on the /status page of the server?

SIGKILL should always work. Could you maybe provide a ps or top output? You can also send it to me privately at raphjaph AT protonmail.com.

raphjaph · 2024-09-15T18:39:08Z

I just deliberately put testnet ord into recovery mode and SIGKILL worked for me. Weird that it doesn't work for you. I'll try simulating reorgs on regtest next and see what happens.

victorkirov · 2024-09-16T06:17:09Z

Unfortunately, I already reverted the instance back to 0.19. I'll start a new 0.20 one and try to simulate a reorg, but it may take some time.

As for SIGKILL, I just tried sending a SIGKILL to a normally running instance and it's ignored completely 👀 Maybe this is due to it running inside a container, but that shouldn't affect the signal, as I'm executing it inside a bash terminal in the container (kill -9 1).

raphjaph · 2024-09-17T06:35:11Z

On regtest it seems to recover without a problem.
I did cargo run env on master, then bitcoin-cli -datadir=env generatetoaddress 10 <ADDRESS>, then bitcoin-cli -datadir=env invalidateblock <BLOCK_HASH>, and then bitcoin-cli -datadir=env generatetoaddress 10 <ADDRESS>.

[2024-09-17T06:31:58Z INFO  ord::index::updater] Committing at block height 265, 2 outputs traversed, 3 in map, 0 cached
[2024-09-17T06:32:17Z INFO  ord::index] 6 block deep reorg detected at height 265
[2024-09-17T06:32:17Z INFO  ord::index::reorg] rolling back database after reorg of depth 6 at height 265
[2024-09-17T06:32:17Z INFO  ord::index::reorg] successfully rolled back database to height 250
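
For anyone wanting to repeat the regtest steps described above, here is a minimal sketch that only assembles and prints the three bitcoin-cli invocations (a dry run; it executes nothing). The function name reorg_commands and its parameters are hypothetical, and <ADDRESS> and <BLOCK_HASH> remain placeholders that you would fill in yourself.

```rust
/// Build the `bitcoin-cli` argument lists for a simulated regtest reorg:
/// mine `first` blocks, invalidate one block, then mine `second` blocks.
/// `reorg_commands` is a hypothetical helper, not part of ord or Bitcoin Core.
fn reorg_commands(
    datadir: &str,
    address: &str,
    block_hash: &str,
    first: u32,
    second: u32,
) -> Vec<Vec<String>> {
    let base = format!("-datadir={datadir}");
    vec![
        vec![base.clone(), "generatetoaddress".into(), first.to_string(), address.into()],
        vec![base.clone(), "invalidateblock".into(), block_hash.into()],
        vec![base, "generatetoaddress".into(), second.to_string(), address.into()],
    ]
}

fn main() {
    // Dry run: print the commands instead of executing them.
    for args in reorg_commands("env", "<ADDRESS>", "<BLOCK_HASH>", 10, 10) {
        println!("bitcoin-cli {}", args.join(" "));
    }
}
```

Changing the two counts lets the same helper express other variants of the scenario; which block to invalidate is still up to the caller.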

victorkirov · 2024-09-17T07:25:06Z

Hmm, I guess there may have been something else wrong. It's just really strange that it happened across multiple instances. I'll try upgrading again and see if it happens. I'll close this ticket for now and reopen it if I can reproduce the problem and get more info.

Thanks for looking into it 🙌

dcorral · 2024-10-03T04:45:41Z

A mainnet reorg is crashing on my side as well. Could we reopen this to see if we can get to the root of the issue? 🙏

raphjaph · 2024-10-03T09:14:31Z

What block was the reorg at and do you have any log outputs?

dcorral · 2024-10-04T17:06:47Z

I have no logs, sorry, but the block was 863888. I'll report back if anything comes up.

dcorral · 2024-10-06T13:19:01Z

Got the following panic while testing on regtest:

generate 101 blocks
invalidate block 99
generate 3 blocks

thread '<unnamed>' panicked at src/index/reorg.rs:66:76:
called `Option::unwrap()` on a `None` value
stack backtrace:
0: 0x5969b7ebadb6 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h410d4c66be4e37f9
1: 0x5969b7eeb140 - core::fmt::write::he40921d4802ce2ac
2: 0x5969b7eb697f - std::io::Write::write_fmt::h5de5a4e7037c9b20
3: 0x5969b7ebab94 - std::sys_common::backtrace::print::h11c067a88e3bdb22
4: 0x5969b7ebc417 - std::panicking::default_hook::{{closure}}::h8c832ecb03fde8ea
5: 0x5969b7ebc179 - std::panicking::default_hook::h1633e272b4150cf3
6: 0x5969b7ebc8a8 - std::panicking::rust_panic_with_hook::hb164d19c0c1e71d4
7: 0x5969b7ebc749 - std::panicking::begin_panic_handler::{{closure}}::h0369088c533c20e9
8: 0x5969b7ebb2b6 - std::sys_common::backtrace::__rust_end_short_backtrace::hc11d910daf35ac2e
9: 0x5969b7ebc4d4 - rust_begin_unwind
10: 0x5969b720ebe5 - core::panicking::panic_fmt::ha6effc2775a0749c
11: 0x5969b720eca3 - core::panicking::panic::h44790a89027c670f
12: 0x5969b720eb36 - core::option::unwrap_failed::hcb3a256a9f1ca882
13: 0x5969b73e2dbe - ord::index::reorg::Reorg::handle_reorg::h5ff6b80ae2a95596
14: 0x5969b726160b - ord::index::Index::update::h7a56ebe30e595534
15: 0x5969b7817375 - std::sys_common::backtrace::__rust_begin_short_backtrace::hbf31dcab6cd6a497
16: 0x5969b77686cf - core::ops::function::FnOnce::call_once{{vtable.shim}}::h6189ab0ff9d9cd46
17: 0x5969b7ec1bc5 - std::sys::pal::unix::thread::Thread::new::thread_start::h3631815ad38387d6
18: 0x7bdadcfb439d -
19: 0x7bdadd03949c -
20: 0x0 -

I don't see this every time I follow the steps above, though.

ord version: 0.20.0
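
The panic message above is what Rust prints when unwrap() is called on an Option that holds None. As a rough illustration of that failure class only (this is not the code at src/index/reorg.rs:66, which is not shown in this thread), the sketch below uses a hypothetical HashIndex type and hash_at method to show how a missing entry can be surfaced as an error instead of a panic.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an index that stores block hashes by height.
/// It is not ord's actual data structure.
struct HashIndex {
    hashes: BTreeMap<u32, String>,
}

impl HashIndex {
    /// Return an error instead of panicking when no hash is stored.
    fn hash_at(&self, height: u32) -> Result<&str, String> {
        self.hashes
            .get(&height)
            .map(String::as_str)
            .ok_or_else(|| format!("no block hash stored for height {height}"))
    }
}

fn main() {
    let mut hashes = BTreeMap::new();
    hashes.insert(99, "aaaa".to_string());
    let index = HashIndex { hashes };

    // `index.hashes.get(&100).unwrap()` would panic here, because
    // nothing is stored at height 100.
    match index.hash_at(100) {
        Ok(hash) => println!("hash at 100: {hash}"),
        Err(err) => eprintln!("recoverable error: {err}"),
    }
}
```

Returning a Result lets the caller decide whether to retry, log, or abort, which may matter if the missing value only shows up intermittently.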

gus4rs · 2024-10-09T15:37:47Z

Version 0.20.1 dies after a reorg, doesn't respond to the graceful shutdown signal, and must be killed abruptly, which corrupts the database. This happened to me on testnet3.

raphjaph added the bug label Sep 13, 2024

raphjaph self-assigned this Sep 13, 2024

victorkirov closed this as completed Sep 17, 2024

victorkirov reopened this Oct 3, 2024
