No such ledger exists on Bookies but ledgermetadata exist #4287

Open
hamadodene opened this issue Apr 15, 2024 · 9 comments

hamadodene commented Apr 15, 2024

BUG REPORT

Describe the bug

We have noticed strange behavior in our production Bookkeeper cluster: we are currently unable to read the data of some ledgers that Bookkeeper should have created and that should therefore exist. Looking up the metadata of such a ledger with the Bookkeeper CLI works:

./bookkeeper shell ledgermetadata -ledger 15543
24-04-14-12-30-18    ledgerID: 15543
24-04-14-12-30-18    LedgerMetadata{formatVersion=3, ensembleSize=2, writeQuorumSize=2, ackQuorumSize=2, state=CLOSED, length=13417, lastEntryId=78, digestType=CRC32C, password=base64:, ensembles={0=[mn1-bookie2:1823, mn1-bookie3:1822]}, customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, pulsar/managed-ledger=base64:aWRjNzE3Ny9ucy9wZXJzaXN0ZW50L2V2ZW50cw==, application=base64:cHVsc2Fy}}
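The customMetadata values in the output above are base64-encoded. As a minimal sketch, they can be decoded with a few lines of Python. The dictionary below is copied by hand from the output, and `decode_custom_metadata` is a hypothetical helper, not part of any BookKeeper tooling:

```python
import base64

# Values copied from the ledgermetadata output, without the "base64:" prefix.
custom_metadata = {
    "component": "bWFuYWdlZC1sZWRnZXI=",
    "pulsar/managed-ledger": "aWRjNzE3Ny9ucy9wZXJzaXN0ZW50L2V2ZW50cw==",
    "application": "cHVsc2Fy",
}


def decode_custom_metadata(encoded):
    """Return a new dict with every base64 value decoded to UTF-8 text."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in encoded.items()
    }


decoded = decode_custom_metadata(custom_metadata)
for key, value in decoded.items():
    print(f"{key} = {value}")
```

The decoded values identify the ledger as belonging to the Pulsar managed ledger `idc7177/ns/persistent/events`.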

However, when we try to read the ledger using the CLI:
./bookkeeper  readledger -ledgerid 15543
24-04-14-12-30-49       BookKeeper metadata driver manager initialized
24-04-14-12-30-49       Initialize zookeeper metadata driver at metadata service uri zk+hierarchical://mn1-clustermanager1.dna:1281;mn1-clustermanager2.dna:1281;mn1-clustermanager3.dna:1281/ledgers : zkServers = mn1-clustermanager1.dna:1281,mn1-clustermanager2.dna:1281,mn1-clustermanager3.dna:1281, ledgersRootPath = /ledgers.
24-04-14-12-30-49       Client environment:zookeeper.version=3.8.0-5a02a05eddb59aee6ac762f7ea82e92a68eb9c0f, built on 2022-02-25 08:49 UTC
24-04-14-12-30-49       Client environment:host.name=mn1-bookie2
24-04-14-12-30-49       Client environment:java.version=19.0.2
24-04-14-12-30-49       Client environment:java.vendor=Eclipse Adoptium
24-04-14-12-30-49       Client environment:java.home=/usr/java/jdk-19.0.2+7
24-04-14-12-30-49       Client environment:java.class.path=/data/mn1/bookie/./.code/commons-codec-1.15.jar:/data/mn1/bookie/./.code/geoip2-4.0.1.jar:/data/mn1/bookie/./.code/pulsar-functions-runtime-3.0.3.jar:/data/mn1/bookie/./.code/lombok-1.18.30.jar:/data/mn1/bookie/./.code/zookeeper-jute-3.8.0.jar:/data/mn1/bookie/./.code/jetty-util-9.4.54.v20240208.jar:/data/mn1/bookie/./.code/graal-sdk-22.0.0.jar:/data/mn1/bookie/./.code/netty-resolver-dns-native-macos-4.1.107.Final-osx-x86_64.jar:/data/mn1/bookie/./.code/commons-collections-3.2.2.jar:/data/mn1/bookie/./.code/javax.activation-api-1.2.0.jar:/data/mn1/bookie/./.code/hadoop-hdfs-client-2.10.2.jar:/data/mn1/bookie/./.code/ha-api-3.1.12.jar:/data/mn1/bookie/./.code/jaxb-api-2.3.1.jar:/data/mn1/bookie/./.code/batik-codec-1.17.jar:/data/mn1/bookie/./.code/netty-transport-sctp-4.1.107.Final.jar:/data/mn1/bookie/./.code/pngj-2.1.0.jar:/data/mn1/bookie/./.code/netty-tcnative-boringssl-static-2.0.65.Final-windows-x86_64.jar:/data/mn1/bookie/./.code/hbase-client-2.3.7.jar:/data/mn1/bookie/./.code/commons-digester-2.1.jar:/data/mn1/bookie/./.code/pulsar-metadata-3.0.3.jar:/data/mn1/bookie/./.code/batik-constants-1.17.jar:/data/mn1/bookie/./.code/simpleclient_tracer_common-0.16.0.jar:/data/mn1/bookie/./.code/RoaringBitmap-0.9.44.jar:/data/mn1/bookie/./.code/netty-transport-classes-epoll-4.1.107.Final.jar:/data/mn1/bookie/./.code/org.apache.oltu.oauth2.client-1.0.2.jar:/data/mn1/bookie/./.code/rocksdbjni-7.9.2.jar:/data/mn1/bookie/./.code/netty-tcnative-boringssl-static-2.0.65.Final-osx-x86_64.jar:/data/mn1/bookie/./.code/tomcat-util-scan-9.0.84.jar:/data/mn1/bookie/./.code/netty-tcnative-boringssl-static-2.0.65.Final-linux-x86_64.jar:/data/mn1/bookie/./.code/jakarta.ws.rs-api-2.1.6.jar:/data/mn1/bookie/./.code/eclipse-collections-7.1.1.jar:/data/mn1/bookie/./.code/jetty-servlets-9.4.54.v20240208.jar:/data/mn1/bookie/./.code/stax-ex-1.8.3.jar:/data/mn1/bookie/./.code/batik-anim-1.17.jar:/data/mn1/bookie/./.code/async-http-cl
ient-netty-utils-2.12.1.jar:/data/mn1/bookie/./.code/hbase-metrics-2.3.7.jar:/data/mn1/bookie/./.code/jsoup-1.15.3.jar:/data/mn1/bookie/./.code/netty-resolver-dns-native-macos-4.1.107.Final-osx-aarch_64.jar:/data/mn1/bookie/./.code/saaj-impl-1.5.2.jar:/data/mn1/bookie/./.code/hbase-shaded-netty-2.2.1.jar:/data/mn1/bookie/./.code/oshi-core-java11-6.4.0.jar:/data/mn1/bookie/./.code/netty-codec-http2-4.1.107.Final.jar:/data/mn1/bookie/./.code/core-3.0.1.jar:/data/mn1/bookie/./.code/jackson-databind-2.16.2.jar:/data/mn1/bookie/./.code/bookkeeper-common-4.16.4.jar:/data/mn1/bookie/./.code/jersey-entity-filtering-2.40.jar:/data/mn1/bookie/./.code/batik-util-1.17.jar:/data/mn1/bookie/./.code/bcpg-jdk18on-1.75.jar:/data/mn1/bookie/./.code/hbase-hadoop-compat-2.3.7.jar:/data/mn1/bookie/./.code/j2objc-annotations-1.3.jar:/data/mn1/bookie/./.code/javax.websocket-client-api-1.0.jar:/data/mn1/bookie/./.code/commons-lang-2.6.jar:/data/mn1/bookie/./.code/scripting-api-29.04.jar:/data/mn1/bookie/./.code/netty-tcnative-boringssl-static-2.0.65.Final-linux-aarch_64.jar:/data/mn1/bookie/./.code/winzipaes-1.0.1.jar:/data/mn1/bookie/./.code/jetty-alpn-server-9.4.54.v20240208.jar:/data/mn1/bookie/./.code/protobuf-java-util-3.21.8.jar:/data/mn1/bookie/./.code/avatica-metrics-1.22.0.jar:/data/mn1/bookie/./.code/magnews.cache-29.04.jar:/data/mn1/bookie/./.code/jetty-servlet-9.4.54.v20240208.jar:/data/mn1/bookie/./.code/hbase-hadoop2-compat-2.3.7.jar:/data/mn1/bookie/./.code/js-scriptengine-22.0.0.jar:/data/mn1/bookie/./.code/herddb-collections-0.29.0.jar:/data/mn1/bookie/./.code/pulsar-functions-worker-3.0.3.jar:/data/mn1/bookie/./.code/juel-impl-2.2.5.jar:/data/mn1/bookie/./.code/magnews.inbox.litmus-29.04.jar:/data/mn1/bookie/./.code/jakarta.mail-api-1.6.5.jar:/data/mn1/bookie/./.code/unbescape-1.1.3.RELEASE.jar:/data/mn1/bookie/./.code/xml-apis-1.4.01.jar:/data/mn1/bookie/./.code/asm-9.6.jar:/data/mn1/bookie/./.code/poi-ooxml-5.2.5.jar:/data/mn1/bookie/./.code/txw2-2.3.3.jar:/data/mn1/boo
kie/./.code/metrics-jmx-4.1.12.1.jar:/data/mn1/bookie/./.code/validation-api-1.1.0.Final.jar:/data/mn1/bookie/./.code/codahale-metrics-provider-4.16.4.jar:/data/mn1/bookie/./.code/slf4j-api-2.0.4.jar:/data/mn1/bookie/./.code/jfreechart-1.0.12.jar:/data/mn1/bookie/./.code/memory-0.8.3.jar:/data/mn1/bookie/./.code/failsafe-2.4.4.jar:/data/mn1/bookie/./.code/netty-incubator-transport-classes-io_uring-0.0.21.Final.jar:/data/mn1/bookie/./.code/pulsar-functions-secrets-3.0.3.jar:/data/mn1/bookie/./.code/javax.servlet-api-4.0.1.jar:/data/mn1/bookie/./.code/magnews-backend-json-29.04.jar:/data/mn1/bookie/./.code/magnews.inbox.core-29.04.jar:/data/mn1/bookie/./.code/gmbal-4.0.1.jar:/data/mn1/bookie/./.code/tomcat-catalina-9.0.84.jar:/data/mn1/bookie/./.code/netty-transport-udt-4.1.107.Final.jar:/data/mn1/bookie/./.code/managed-ledger-3.0.3.jar:/data/mn1/bookie/./.code/zookeeper-3.8.0.jar:/data/mn1/bookie/./.code/magnews.majordodo-29.04.jar:/data/mn1/bookie/./.code/herddb-net-0.29.0.jar:/data/mn1/bookie/./.code/jakarta.jws-api-2.1.0.jar:/data/mn1/bookie/./.code/hbase-metrics-api-2.3.7.jar:/data/mn1/bookie/./.code/imageio-jpeg-3.9.4.jar:/data/mn1/bookie/./.code/jetty-server-9.4.54.v20240208.jar:/data/mn1/bookie/./.code/scrimage-core-4.1.1.jar:/data/mn1/bookie/./.code/audience-annotations-0.5.0.jar:/data/mn1/bookie/./.code/batik-xml-1.17.jar:/data/mn1/bookie/./.code/vertx-core-4.3.5.jar:/data/mn1/bookie/./.code/jcommon-1.0.15.jar:/data/mn1/bookie/./.code/magnews-xml-29.04.jar:/data/mn1/bookie/./.code/magnews.backup-29.04.jar:/data/mn1/bookie/./.code/jsr181-api-1.0-MR1.jar:/data/mn1/bookie/./.code/magnews.bookie-29.04.jar:/data/mn1/bookie/./.code/jtidy-r938.jar:/data/mn1/bookie/./.code/pulsar-package-filesystem-storage-3.0.3.jar:/data/mn1/bookie/./.code/jetty-continuation-9.4.54.v20240208.jar:/data/mn1/bookie/./.code/bookkeeper-stats-api-4.16.4.jar:/data/mn1/bookie/./.code/jna-platform-jpms-5.12.1.jar:/data/mn1/bookie/./.code/hbase-shaded-miscellaneous-3.3.0.jar:/data/mn1/bookie
/./.code/hk2-api-2.6.1.jar:/data/mn1/bookie/./.code/simpleclient_servlet-0.16.0.jar:/data/mn1/bookie/./.code/jna-jpms-5.12.1.jar:/data/mn1/bookie/./.code/magnews.linkchecker-29.04.jar:/data/mn1/bookie/./.code/jakarta.xml.bind-api-2.3.3.jar:/data/mn1/bookie/./.code/pulsar-common-3.0.3.jar:/data/mn1/bookie/./.code/magnews.elasticsearch-29.04.jar:/data/mn1/bookie/./.code/jersey-hk2-2.40.jar:/data/mn1/bookie/./.code/jctools-core-2.1.2.jar:/data/mn1/bookie/./.code/eclipse-collections-api-7.1.1.jar:/data/mn1/bookie/./.code/httpclient-4.5.13.jar:/data/mn1/bookie/./.code/magnews.tomcat.embedded-29.04.jar:/data/mn1/bookie/./.code/giotto.importer-29.04.jar:/data/mn1/bookie/./.code/elsa-3.0.0-M5.jar:/data/mn1/bookie/./.code/sshd-sftp-2.12.0.jar:/data/mn1/bookie/./.code/netty-tcnative-classes-2.0.65.Final.jar:/data/mn1/bookie/./.code/commons-lang3-3.11.jar:/data/mn1/bookie/./.code/jackson-core-asl-1.9.13.jar:/data/mn1/bookie/./.code/jetty-util-ajax-9.4.54.v20240208.jar:/data/mn1/bookie/./.code/netty-codec-mqtt-4.1.107.Final.jar:/data/mn1/bookie/./.code/tomcat-jasper-9.0.84.jar:/data/mn1/bookie/./.code/pfl-basic-4.1.0.jar:/data/mn1/bookie/./.code/jackson-module-jsonSchema-2.16.2.jar:/data/mn1/bookie/./.code/magnews.spf-29.04.jar:/data/mn1/bookie/./.code/commons-collections4-4.4.jar:/data/mn1/bookie/./.code/pulsar-client-original-3.0.3.jar:/data/mn1/bookie/./.code/pulsar-client-admin-api-3.0.3.jar:/data/mn1/bookie/./.code/blazingcache-core-3.3.0.jar:/data/mn1/bookie/./.code/jackson-jaxrs-base-2.16.2.jar:/data/mn1/bookie/./.code/subethasmtp-5.2.7.jar:/data/mn1/bookie/./.code/jasperreports-mn-4.0.0.jar:/data/mn1/bookie/./.code/stringtemplate-3.2.1.jar:/data/mn1/bookie/./.code/jersey-common-2.40.jar:/data/mn1/bookie/./.code/netty-transport-rxtx-4.1.107.Final.jar:/data/mn1/bookie/./.code/junit-4.13.2.jar:/data/mn1/bookie/./.code/avro-1.11.3.jar:/data/mn1/bookie/./.code/herddb-mock-0.29.0.jar:/data/mn1/bookie/./.code/conscrypt-openjdk-uber-2.5.2.jar:/data/mn1/bookie/./.code/jersey-con
tainer-servlet-2.40.jar:/data/mn1/bookie/./.code/httpcore-4.4.13.jar:/data/mn1/bookie/./.code/sac-1.3.jar:/data/mn1/bookie/./.code/simpleclient_jetty-0.16.0.jar:/data/mn1/bookie/./.code/majordodo-client-0.17.0.jar:/data/mn1/bookie/./.code/jackson-datatype-jsr310-2.16.2.jar:/data/mn1/bookie/./.code/httpcore5-5.1.3.jar:/data/mn1/bookie/./.code/com-diennea-security-29.04.jar:/data/mn1/bookie/./.code/commons-compress-1.25.0.jar:/data/mn1/bookie/./.code/maxmind-db-3.0.0.jar:/data/mn1/bookie/./.code/common-image-3.9.4.jar:/data/mn1/bookie/./.code/hbase-protocol-shaded-2.3.7.jar:/data/mn1/bookie/./.code/perfmark-api-0.26.0.jar:/data/mn1/bookie/./.code/netty-transport-native-kqueue-4.1.107.Final-osx-x86_64.jar:/data/mn1/bookie/./.code/rxjava-3.0.1.jar:/data/mn1/bookie/./.code/jetty-client-9.4.54.v20240208.jar:/data/mn1/bookie/./.code/magnews.dsnparser-29.04.jar:/data/mn1/bookie/./.code/magnews.rss-29.04.jar:/data/mn1/bookie/./.code/magnews-util-12.13.0.jar:/data/mn1/bookie/./.code/netty-resolver-dns-classes-macos-4.1.107.Final.jar:/data/mn1/bookie/./.code/curator-client-5.1.0.jar:/data/mn1/bookie/./.code/tomcat-jsp-api-9.0.84.jar:/data/mn1/bookie/./.code/bookkeeper-server-4.16.4.jar:/data/mn1/bookie/./.code/pdfbox-2.0.28.jar:/data/mn1/bookie/./.code/stax2-api-4.2.2.jar:/data/mn1/bookie/./.code/typetools-0.5.0.jar:/data/mn1/bookie/./.code/httpcore5-h2-5.1.3.jar:/data/mn1/bookie/./.code/jackson-dataformat-yaml-2.16.2.jar:/data/mn1/bookie/./.code/customtables-29.04.jar:/data/mn1/bookie/./.code/jakarta.inject-2.6.1.jar:/data/mn1/bookie/./.code/netty-codec-haproxy-4.1.107.Final.jar:/data/mn1/bookie/./.code/rome-utils-2.1.0.jar:/data/mn1/bookie/./.code/itext-2.1.7.jar:/data/mn1/bookie/./.code/netty-codec-dns-4.1.107.Final.jar:/data/mn1/bookie/./.code/commons-io-2.15.1.jar:/data/mn1/bookie/./.code/ecj-3.26.0.jar:/data/mn1/bookie/./.code/barcode4j-2.1.jar:/data/mn1/bookie/./.code/bcutil-jdk18on-1.75.jar:/data/mn1/bookie/./.code/jna-5.12.1.jar:/data/mn1/bookie/./.code/netty-tcnative
-boringssl-static-2.0.65.Final.jar:/data/mn1/bookie/./.code/netty-reactive-streams-2.0.6.jar:/data/mn1/bookie/./.code/openpdf-1.3.30.jar:/data/mn1/bookie/./.code/batik-i18n-1.17.jar:/data/mn1/bookie/./.code/curator-framework-5.1.0.jar:/data/mn1/bookie/./.code/jers
24-04-14-12-30-49       Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
24-04-14-12-30-49       Client environment:java.io.tmpdir=/tmp
24-04-14-12-30-49       Client environment:java.compiler=<NA>
24-04-14-12-30-49       Client environment:os.name=Linux
24-04-14-12-30-49       Client environment:os.arch=amd64
24-04-14-12-30-49       Client environment:os.version=5.4.17-2136.321.4.1.el8uek.x86_64
24-04-14-12-30-49       Client environment:user.name=magnews
24-04-14-12-30-49       Client environment:user.home=/home/magnews
24-04-14-12-30-49       Client environment:user.dir=/data/mn1/bookie
24-04-14-12-30-49       Client environment:os.memory.free=153MB
24-04-14-12-30-49       Client environment:os.memory.max=2906MB
24-04-14-12-30-49       Client environment:os.memory.total=186MB
24-04-14-12-30-49       Initiating client connection, connectString=mn1-clustermanager1.dna:1281,mn1-clustermanager2.dna:1281,mn1-clustermanager3.dna:1281 sessionTimeout=10000 watcher=org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase@323b36e0
24-04-14-12-30-49       Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
24-04-14-12-30-49       jute.maxbuffer value is 1048575 Bytes
24-04-14-12-30-49       zookeeper.request.timeout value is 0. feature enabled=false
24-04-14-12-30-49       Opening socket connection to server mn1-clustermanager1.dna/10.200.86.86:1281.
24-04-14-12-30-49       SASL config status: Will not attempt to authenticate using SASL (unknown error)
24-04-14-12-30-49       Socket connection established, initiating session, client: /10.200.86.156:41888, server: mn1-clustermanager1.dna/10.200.86.86:1281
24-04-14-12-30-49       Session establishment complete on server mn1-clustermanager1.dna/10.200.86.86:1281, session id = 0x103c4496f7e0075, negotiated timeout = 10000
24-04-14-12-30-49       ZooKeeper client is connected now.
24-04-14-12-30-49       Failed to initialize DNS Resolver org.apache.bookkeeper.net.ScriptBasedMapping, used default subnet resolver because No network topology script is found when using script based DNS resolver.
24-04-14-12-30-49       Initialize rackaware ensemble placement policy @ <Bookie:10.200.86.156:0> @ /default-rack : org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy$DefaultResolver.
24-04-14-12-30-49       Not weighted
24-04-14-12-30-49       Weighted ledger placement is not enabled
24-04-14-12-30-49       Update BookieInfoCache (writable bookie) mn1-bookie2:1823 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=bookie, port=1823, host=mn1-bookie2, protocol=bookie-rpc, auth=[], extensions=[]}]}
24-04-14-12-30-49       Update BookieInfoCache (writable bookie) mn1-bookie1:1822 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=bookie, port=1822, host=mn1-bookie1, protocol=bookie-rpc, auth=[], extensions=[]}]}
24-04-14-12-30-49       Update BookieInfoCache (writable bookie) mn1-bookie3:1822 -> BookieServiceInfo{properties={}, endpoints=[EndpointInfo{id=bookie, port=1822, host=mn1-bookie3, protocol=bookie-rpc, auth=[], extensions=[]}]}
24-04-14-12-30-49       Adding a new node: /default-rack/mn1-bookie2:1823
24-04-14-12-30-49       Adding a new node: /default-rack/mn1-bookie1:1822
24-04-14-12-30-49       Adding a new node: /default-rack/mn1-bookie3:1822
24-04-14-12-30-49       Successfully connected to bookie: mn1-bookie2:1823 [id: 0xa0ee35b2, L:/10.200.86.156:37280 - R:mn1-bookie2/10.200.86.156:1823]
24-04-14-12-30-49       connection [id: 0xa0ee35b2, L:/10.200.86.156:37280 - R:mn1-bookie2/10.200.86.156:1823] authenticated as BookKeeperPrincipal{ANONYMOUS}
24-04-14-12-30-49       Successfully connected to bookie: mn1-bookie3:1822 [id: 0xe9a3189a, L:/10.200.86.156:38104 - R:mn1-bookie3/10.200.86.157:1822]
24-04-14-12-30-49       connection [id: 0xe9a3189a, L:/10.200.86.156:38104 - R:mn1-bookie3/10.200.86.157:1822] authenticated as BookKeeperPrincipal{ANONYMOUS}
24-04-14-12-30-49       Read of ledger entry failed: L15543 E0-E0, Sent to [mn1-bookie2:1823, mn1-bookie3:1822], Heard from [] : bitset = {}, Error = 'No such ledger exists on Bookies'. First unread entry is (-1, rc = null)
24-04-14-12-30-49       Error reading entry 0 from ledger 15543
24-04-14-12-30-49       org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists on Bookies
org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists on Bookies
        at org.apache.bookkeeper.client.SyncCallbackUtils.finish(SyncCallbackUtils.java:83)
        at org.apache.bookkeeper.client.SyncCallbackUtils$SyncReadCallback.readComplete(SyncCallbackUtils.java:229)
        at org.apache.bookkeeper.client.LedgerHandle$4.onFailure(LedgerHandle.java:818)
        at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38)
        at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
        at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
        at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:107)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:1589)

24-04-14-12-30-49       Closing the per channel bookie client for mn1-bookie2:1823
24-04-14-12-30-49       Closing the per channel bookie client for mn1-bookie3:1822
24-04-14-12-30-49       Disconnected from bookie channel [id: 0xa0ee35b2, L:/10.200.86.156:37280 ! R:mn1-bookie2/10.200.86.156:1823]
24-04-14-12-30-49       Disconnected from bookie channel [id: 0xe9a3189a, L:/10.200.86.156:38104 ! R:mn1-bookie3/10.200.86.157:1822]
24-04-14-12-30-49       The mainWorkerPool did not shutdown cleanly
24-04-14-12-30-49       An exception was thrown while closing send thread for session 0x103c4496f7e0075.
24-04-14-12-30-50       Session: 0x103c4496f7e0075 closed
24-04-14-12-30-50       Got an exception
24-04-14-12-30-50       com.google.common.util.concurrent.UncheckedExecutionException: org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists on Bookies
24-04-14-12-30-50       EventThread shut down for session: 0x103c4496f7e0075
com.google.common.util.concurrent.UncheckedExecutionException: org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists on Bookies
        at org.apache.bookkeeper.tools.cli.commands.bookie.ReadLedgerCommand.apply(ReadLedgerCommand.java:138)
        at org.apache.bookkeeper.bookie.BookieShell$ReadLedgerEntriesCmd.runCmd(BookieShell.java:641)
        at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:248)
        at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:2349)
        at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:2446)
Caused by: java.lang.RuntimeException: org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists on Bookies
        at org.apache.bookkeeper.client.BookKeeperAdmin$LedgerEntriesIterator.hasNext(BookKeeperAdmin.java:457)
        at org.apache.bookkeeper.tools.cli.commands.bookie.ReadLedgerCommand.readledger(ReadLedgerCommand.java:173)
        at org.apache.bookkeeper.tools.cli.commands.bookie.ReadLedgerCommand.apply(ReadLedgerCommand.java:136)
        ... 4 more
Caused by: org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsException: No such ledger exists on Bookies
        at org.apache.bookkeeper.client.SyncCallbackUtils.finish(SyncCallbackUtils.java:83)
        at org.apache.bookkeeper.client.SyncCallbackUtils$SyncReadCallback.readComplete(SyncCallbackUtils.java:229)
        at org.apache.bookkeeper.client.LedgerHandle$4.onFailure(LedgerHandle.java:818)
        at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38)
        at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
        at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
        at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:107)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:1589)
		

Note:

  • We have not performed any maintenance on Bookkeeper's storage.
  • The creation of the ledger was requested by Apache Pulsar.

We also checked the entry log files, and the ledger really does not seem to exist there.

Furthermore, when Apache Pulsar created that ledger, it did not report any errors while writing. But when we try to read the ledger, Bookkeeper responds with "No such ledger exists on Bookies."

Do you have any information on what the problem might be or how we can debug this issue?

To Reproduce

We were unable to reproduce the issue.

Expected behavior

Given that the metadata exists, I expect the ledger to actually exist on Bookkeeper as well. We have not performed any ledger deletions on Bookkeeper.

Pulsar version: 3.0.3
Bookkeeper version: 4.16.4

@lhotari
Member

lhotari commented Apr 15, 2024

@hamadodene It would be helpful to share Pulsar version & Bookkeeper version & possible customized Ensemble size (E), write quorum (Qw) and ack quorum (Qa) size.

@hamadodene
Author

@lhotari Yes, we run Pulsar 3.0.3 and Bookkeeper 4.16.4.
For E, Qw and Qa we use ensembleSize=2, writeQuorumSize=2, ackQuorumSize=2.
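With E=2, Qw=2 and Qa=2, every entry should be written to both bookies of the ensemble, and a write should only be confirmed once both have acknowledged it. The sketch below illustrates this, assuming round-robin striping (entry i goes to Qw consecutive bookies starting at index i mod E). `write_set` is a hypothetical helper for illustration, not a BookKeeper API:

```python
def write_set(entry_id, ensemble, write_quorum):
    """Bookies expected to hold entry_id, assuming round-robin striping."""
    size = len(ensemble)
    start = entry_id % size
    return [ensemble[(start + i) % size] for i in range(write_quorum)]


# Ensemble taken from the ledger metadata shared in this issue.
ensemble = ["mn1-bookie2:1823", "mn1-bookie3:1822"]

# First entry, second entry, and lastEntryId from the metadata.
for entry_id in (0, 1, 78):
    print(entry_id, write_set(entry_id, ensemble, write_quorum=2))
```

Under that assumption both bookies are in the write set of every entry, so a "No such ledger exists on Bookies" answer from both of them means neither holds entry 0, even though each write would have needed two acknowledgements.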

@lhotari
Member

lhotari commented Apr 16, 2024

@hamadodene I noticed this in the output that you shared:

jute.maxbuffer value is 1048575 Bytes

In Pulsar, the default is -Djute.maxbuffer=10485760.

When you run Bookkeeper, do you use bin/pulsar bookie to start it?

This might not be relevant in this context, but I'm wondering whether large ZNodes combined with a low jute.maxbuffer value could result in inconsistencies.
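To put the two jute.maxbuffer values side by side, here is a small arithmetic sketch. The numbers are the ones quoted in this thread; the variable names are made up for illustration:

```python
MIB = 1024 * 1024

observed = 1048575          # value printed in the readledger log above
pulsar_default = 10485760   # value quoted for Pulsar's scripts

# Compare both settings against round MiB boundaries.
observed_is_one_mib_minus_one = observed == MIB - 1
pulsar_is_ten_mib = pulsar_default == 10 * MIB

print(f"observed:       {observed} bytes ({observed / MIB:.2f} MiB)")
print(f"Pulsar default: {pulsar_default} bytes ({pulsar_default / MIB:.2f} MiB)")
print(f"ratio:          {pulsar_default / observed:.1f}x")
```

The observed value is one byte short of 1 MiB, while the Pulsar setting is exactly 10 MiB, so the CLI in the report runs with roughly a tenth of the buffer that Pulsar's scripts configure.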

@lhotari
Member

lhotari commented Apr 16, 2024

When running Bookkeeper with Pulsar's bin/pulsar bookie script, one of the main differences is that Bookkeeper will use org.apache.pulsar.metadata.bookkeeper.PulsarMetadataBookieDriver and org.apache.pulsar.metadata.bookkeeper.PulsarMetadataClientDriver from the Pulsar code base for metadata operations.

@lhotari
Member

lhotari commented Apr 16, 2024

@hamadodene do you use offloading? I found issue apache/pulsar#21737 which could be related in that case.

@lhotari
Member

lhotari commented Apr 16, 2024

also apache/pulsar#15464

@hamadodene
Author

hamadodene commented Apr 16, 2024

@lhotari
We don't use offload. We have our own service that wraps Bookkeeper (we create an org.apache.bookkeeper.server.EmbeddedServer). We don't use the two classes you mentioned earlier, but we configure the metadataServiceUri of Bookkeeper as zk+hierarchical and the ZNode /ledgers/LAYOUT indicates hierarchical.

We recently forced the metadataServiceUri to be hierarchical; previously, we were using zk+null, which then used the Bookkeeper default. Therefore, the layout on the ZNode was Flat, probably due to defaults from older versions.

This caused problems because, during the update, the ZNodes for Pulsar ledgers were written with the hierarchical layout, while other nodes were written with the flat layout. Perhaps this caused the inconsistencies.

However, Bookkeeper seemed to write without errors (at least it wrote the ZNodes); perhaps the missing ledgers in the logs are those written before we fixed the layout?

The update was made from Pulsar 2.9.5 to 3.0.3 and Bookkeeper 4.14.4 to 4.16.4.

@eolivelli
Contributor

zk+null is the safest default because it automatically adapts to the existing layout.

I suggest using it and letting the clients discover the layout automatically.
If it is a new cluster, formatting it with zk+null makes the layout hierarchical.

@dmercuriali
Contributor

dmercuriali commented Apr 30, 2024

@eolivelli our system is pretty old. In the znode /ledgers/LAYOUT we had Flat.
We use the same BK cluster for pulsar and for some other parts of our application. Our code defaulted to zk+null.

After the pulsar upgrade we noticed that the ledgers for pulsar topics were created with the hierarchical layout (while the ledgers created directly by us were still created with the flat layout). This might be a problem with pulsar: maybe it forces the layout instead of relying on the cluster default.

But the strange thing @hamadodene is reporting is that pulsar was (apparently) able to publish messages on the topics but could not read them back, because bk was throwing BKException$BKNoSuchLedgerExistsException: No such ledger exists on Bookies.
We then forced the hierarchical layout on the bk cluster, but bk still could not read the pulsar ledgers. Looking in the bk logs, we found no entries for the ledgers "created" before the layout switch.

Is it possible that bk was creating the znode for the ledger (with hierarchical layout), and then silently failed to actually write because of the conflicting layout?
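To check under which layout the metadata ZNode of a given ledger was stored, it can help to compute both candidate paths and look them up in ZooKeeper. The sketch below is based on an assumption about the two layouts that should be verified against the BookKeeper version in use: the flat layout stores a ledger as `L` plus the 10-digit zero-padded id directly under the root, and the hierarchical layout splits the same 10 digits into 2-4-4 levels. Ids that do not fit in 10 digits are not covered. Both functions are hypothetical helpers:

```python
def flat_path(ledger_id, root="/ledgers"):
    """Candidate ZNode path under the flat layout (assumed format)."""
    return f"{root}/L{ledger_id:010d}"


def hierarchical_path(ledger_id, root="/ledgers"):
    """Candidate ZNode path under the hierarchical layout (assumed 2-4-4 split)."""
    digits = f"{ledger_id:010d}"
    return f"{root}/{digits[0:2]}/{digits[2:6]}/L{digits[6:10]}"


ledger_id = 15543  # the ledger from this report
print("flat:        ", flat_path(ledger_id))
print("hierarchical:", hierarchical_path(ledger_id))
```

If only one of the two paths exists in ZooKeeper, that indicates which layout the client used when it created the ledger metadata.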
