Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error recreating stream "MY": sequence numbers for cache load did not match, 16429458 vs 16429457 #2993

Closed
yadvlz opened this issue Apr 1, 2022 · 22 comments
Assignees

Comments

@yadvlz
Copy link

yadvlz commented Apr 1, 2022

Got WRN "Error recreating stream "MY": sequence numbers for cache load did not match, 16429458 vs 16429457" after restarting JetStream (2.7.4).

Two questions:

  1. Is there any way to restore this stream?
  2. Where is sequence number stored and how can I change it?
@derekcollison
Copy link
Member

This on single server or a cluster? Is the replication factor if clustered 1 or like an R3?

@yadvlz
Copy link
Author

yadvlz commented Apr 1, 2022

Single server.

@derekcollison
Copy link
Member

Any special about the storage directory? Is it a direct attached disk? Network? Where is the server running?

Is it possible to try the nightly server and see if that is able to recreate?

@derekcollison
Copy link
Member

Can you show me whole startup log with -D flag?

@yadvlz
Copy link
Author

yadvlz commented Apr 1, 2022

Any special about the storage directory? Is it a direct attached disk? Network? Where is the server running?

Is it possible to try the nightly server and see if that is able to recreate?

This is a VM with directly attached disk. There's no NFS/iSCSI.

Can you show me whole startup log with -D flag?

[33882] 2022/04/01 21:30:39.923608 [INF] Starting nats-server
[33882] 2022/04/01 21:30:39.923731 [INF] Version: 2.7.4
[33882] 2022/04/01 21:30:39.923743 [INF] Git: [a86b84a]
[33882] 2022/04/01 21:30:39.923749 [DBG] Go build: go1.17.8
[33882] 2022/04/01 21:30:39.923755 [INF] Name: 10.193.68.253_4222
[33882] 2022/04/01 21:30:39.923776 [INF] Node: I4NRt60q
[33882] 2022/04/01 21:30:39.923782 [INF] ID: NB5RZYDEHRYZIHKZDZ26N3CIZPDXD2QO2XNEYUNAU4QSSVNTGUBKVIC2
[33882] 2022/04/01 21:30:39.923860 [WRN] Plaintext passwords detected, use nkeys or bcrypt
[33882] 2022/04/01 21:30:39.923903 [INF] Using configuration file: /opt/nats-server/conf/nats.conf
[33882] 2022/04/01 21:30:39.924799 [INF] Starting http monitor on 0.0.0.0:8222
[33882] 2022/04/01 21:30:39.924902 [INF] Starting JetStream
[33882] 2022/04/01 21:30:39.925264 [INF] _ ___ _____ ___ _____ ___ ___ _ __ __
[33882] 2022/04/01 21:30:39.925276 [INF] _ | | | / | | _ \ __| /\ | / |
[33882] 2022/04/01 21:30:39.925280 [INF] | || | | | | _ \ | | | / | / _ | |/| |
[33882] 2022/04/01 21:30:39.925284 [INF] _
/|
| || |/ || ||___// __| |_|
[33882] 2022/04/01 21:30:39.925288 [INF]
[33882] 2022/04/01 21:30:39.925292 [INF] https://docs.nats.io/jetstream
[33882] 2022/04/01 21:30:39.925296 [INF]
[33882] 2022/04/01 21:30:39.925300 [INF] ---------------- JETSTREAM ----------------
[33882] 2022/04/01 21:30:39.925337 [INF] Max Memory: 953.67 MB
[33882] 2022/04/01 21:30:39.925344 [INF] Max Storage: 279.40 GB
[33882] 2022/04/01 21:30:39.925348 [INF] Store Directory: "/storage/nats-server/data/jetstream"
[33882] 2022/04/01 21:30:39.925352 [INF] -------------------------------------------
[33882] 2022/04/01 21:30:39.925549 [DBG] Exports:
[33882] 2022/04/01 21:30:39.925562 [DBG] $JS.API.>
[33882] 2022/04/01 21:30:39.925610 [DBG] Enabled JetStream for account "USERS"
[33882] 2022/04/01 21:30:39.925619 [DBG] Max Memory: -1 B
[33882] 2022/04/01 21:30:39.925624 [DBG] Max Storage: -1 B
[33882] 2022/04/01 21:30:39.925647 [DBG] Recovering JetStream state for account "USERS"
[33882] 2022/04/01 21:30:59.503928 [DBG] JETSTREAM - JetStream connection closed: Client Closed
[33882] 2022/04/01 21:30:59.504040 [DBG] JETSTREAM - JetStream connection closed: Client Closed
[33882] 2022/04/01 21:30:59.504082 [WRN] Error recreating stream "MY": sequence numbers for cache load did not match, 16429458 vs 16429457
[33882] 2022/04/01 21:30:59.507332 [INF] Restored 3,072 messages for stream 'USERS > CORE'
[33882] 2022/04/01 21:30:59.507970 [INF] Restored 3,127 messages for stream 'USERS > LOGS'
[33882] 2022/04/01 21:30:59.516982 [INF] Restored 118,325 messages for stream 'USERS > MH'
[33882] 2022/04/01 21:30:59.517952 [INF] Restored 0 messages for stream 'USERS > REFBOOKS'
[33882] 2022/04/01 21:30:59.521253 [INF] Restored 4,425 messages for stream 'USERS > SOVA'
[33882] 2022/04/01 21:30:59.522235 [INF] Restored 0 messages for stream 'USERS > WAREHOUSE'
[33882] 2022/04/01 21:30:59.704605 [INF] Recovering 28565 consumers for stream - 'USERS > CORE'
[33882] 2022/04/01 21:31:08.071054 [INF] Recovering 23041 consumers for stream - 'USERS > REFBOOKS'
[33882] 2022/04/01 21:31:14.593277 [INF] Recovering 28542 consumers for stream - 'USERS > SOVA'
[33882] 2022/04/01 21:31:23.134638 [INF] Recovering 56934 consumers for stream - 'USERS > WAREHOUSE'
[33882] 2022/04/01 21:31:42.268125 [DBG] JetStream state for account "USERS" recovered
[33882] 2022/04/01 21:31:42.268328 [INF] Listening for leafnode connections on 10.10.10.1:4224
[33882] 2022/04/01 21:31:42.268606 [INF] Listening for client connections on 0.0.0.0:4222
[33882] 2022/04/01 21:31:42.268619 [DBG] Get non local IPs for "0.0.0.0"
[33882] 2022/04/01 21:31:42.269012 [DBG] ip=10.10.10.1
[33882] 2022/04/01 21:31:42.269042 [INF] Server is ready

@derekcollison
Copy link
Member

Be interested if you could try the 2.8.0 beta from the top of main branch. If not and you are open to it you can send me just the MY stream as a gzipped tarball from the storage directory and I will try to figure out what happened.

@yadvlz
Copy link
Author

yadvlz commented Apr 1, 2022

Should I build a blob by myself of I can download it somewhere?
MY stream is about 20GB :) I can make a tarball and put it to webserver.

@derekcollison
Copy link
Member

The system needs to recover a stream in order to use the system to create a compressed backup etc.

So we will need to work with the directory for now. If you could tar it up and gzip or bzip2 it and send me an email with the link that would be great. We will see what we can do.

@yadvlz
Copy link
Author

yadvlz commented Apr 1, 2022

OK, give me some time.

@derekcollison
Copy link
Member

Anything abnormal about the previous server shutdown? How was the server shutdown?

@yadvlz
Copy link
Author

yadvlz commented Apr 1, 2022

Nats-server was shutted down correctly. But I saw in logs some messages about max opened files. There was 65k limit and I extend it to 1M.

@derekcollison
Copy link
Member

Ah.. Good to know.. Thanks.

@yadvlz
Copy link
Author

yadvlz commented Apr 1, 2022

Check your email please.

@derekcollison
Copy link
Member

Which email?

@yadvlz
Copy link
Author

yadvlz commented Apr 1, 2022

derek@nats.io

@derekcollison
Copy link
Member

Ok not showing up yet but will keep and eye out.

@yadvlz
Copy link
Author

yadvlz commented Apr 1, 2022

Maybe another email?

@derekcollison
Copy link
Member

I see it. Thanks. Might not be til tomorrow for me to take a look. What is the time sensitivity?

@yadvlz
Copy link
Author

yadvlz commented Apr 1, 2022

It's a critical prod, but I'll wait. Thanks.

@derekcollison derekcollison self-assigned this Apr 2, 2022
@derekcollison
Copy link
Member

This seems to be an issue of the system running out of file descriptors and corrupting the msg blks sequence numbers.
Will keep this open for now until we can learn more.

@Kowiste
Copy link

Kowiste commented Mar 27, 2023

I'm having the same problem, I just begin a few days ago to test nats.
Here I just create a public repository with the docker compose and everything ready to fail, just in case you want to take a look.

When I start the container, and later the program inside cmd folder everything looks good, I can see the message arriving to the consumer but in the logs appear this:

[ERR] Received an error looking up message for consumer: malformed or corrupt message

if I re-run the program the log in the server is

2023-03-27 13:46:21 [1] 2023/03/27 11:46:21.263561 [ERR] JetStream failed to store a msg on stream '$G > hello': malformed or corrupt message
2023-03-27 13:46:53 [1] 2023/03/27 11:46:53.398778 [ERR] Received an error looking up message for consumer: malformed or corrupt message
2023-03-27 13:47:05 [1] 2023/03/27 11:47:05.673953 [ERR] Received an error looking up message for consumer: malformed or corrupt message
2023-03-27 13:47:05 [1] 2023/03/27 11:47:05.783734 [ERR] Received an error looking up message for consumer: malformed or corrupt message
2023-03-27 13:47:05 [1] 2023/03/27 11:47:05.947694 [ERR] Received an error looking up message for consumer: malformed or corrupt message
2023-03-27 13:47:06 [1] 2023/03/27 11:47:06.694625 [ERR] Received an error looking up message for consumer: malformed or corrupt message
2023-03-27 13:47:07 [1] 2023/03/27 11:47:07.415628 [ERR] Received an error looking up message for consumer: malformed or corrupt message

@wallyqs
Copy link
Member

wallyqs commented Apr 3, 2023

Thanks @Kowiste for sharing how to reproduce, that issue has now been fixed here #4013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants