[Java] In read-only mode can't get data from blob only if there is just one checkpoint with one entry #12503

Open
TheSmithSoftware opened this issue Apr 3, 2024 · 14 comments


@TheSmithSoftware

Note: Please use Issues only for bug reports. For questions, discussions, feature requests, etc. post to dev group: https://groups.google.com/forum/#!forum/rocksdb or https://www.facebook.com/groups/rocksdb.dev

Expected behavior

When I call rocksDB.get in read-only mode with blob files enabled (options.setEnableBlobFiles(true)) and there is only one checkpoint with one entry, I expect to get the stored data.

Actual behavior

When I call rocksDB.get in read-only mode with blob files enabled (options.setEnableBlobFiles(true)) and there is only one checkpoint with one entry, I get null.

Steps to reproduce the behavior

Here is minimal code to reproduce the bug:
Github

@adamretter
Collaborator

@TheSmithSoftware can you provide minimal reproducible example code please?

@Smith1123

Smith1123 commented Apr 4, 2024

There is a link to the repo where you can find the code; the repo is the code itself. I'm sorry for answering from a different account: I have temporarily locked myself out of my own account, but @TheSmithSoftware is also me.
If you have any questions regarding the example code, feel free to ask :)

@dfa1
Contributor

dfa1 commented Apr 4, 2024

@TheSmithSoftware @adamretter I'm able to reproduce the issue on both Linux and Windows 10 with JDK 17 and RocksDB 9.0.0, 6.29.5, and some intermediate versions such as 7.0.0.

@dfa1
Contributor

dfa1 commented Apr 4, 2024

@adamretter please have a look:

package com.sixgroup;

import org.rocksdb.Checkpoint;
import org.rocksdb.CompressionType;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Minimal {

	public static void main(String[] args) throws Exception {
		final int minBlobSize = 1000;
		final byte[] messageKey = "id".getBytes(StandardCharsets.UTF_8);
		final byte[] message = "a".repeat(minBlobSize).getBytes(StandardCharsets.UTF_8);

		final Path mainPath = Files.createTempDirectory("rocksdb-issues-12503-");
		final Path checkpointPath = mainPath.resolve("checkpoint");
		try (Options options = new Options().setCreateIfMissing(true).setEnableBlobFiles(true).setMinBlobSize(minBlobSize).setBlobCompressionType(CompressionType.ZLIB_COMPRESSION)) {
			System.out.println(mainPath);
			try (RocksDB rocks = RocksDB.open(options, mainPath.toString())) {
				rocks.put(messageKey, message);
				try (Checkpoint checkpoint = Checkpoint.create(rocks)) {
					checkpoint.createCheckpoint(checkpointPath.toString());
				}
			}
			try (RocksDB rocks = RocksDB.open(options, checkpointPath.toString())) {
				byte[] read = rocks.get(messageKey);
				System.out.println("read with RockDB.open on checkpoint: " + (read != null));
			}
			try (RocksDB rocks = RocksDB.openReadOnly(options, checkpointPath.toString())) {
				byte[] read = rocks.get(messageKey);
				System.out.println("read with RockDB.openReadOnly on checkpoint: " + (read != null));
			}
			try (RocksDB rocks = RocksDB.open(options, mainPath.toString())) {
				byte[] read = rocks.get(messageKey);
				System.out.println("read with RockDB.open on main: " + (read != null));
			}
			try (RocksDB rocks = RocksDB.openReadOnly(options, mainPath.toString())) {
				byte[] read = rocks.get(messageKey);
				System.out.println("read with RockDB.openReadOnly on main: " + (read != null));
			}
		}
	}
}

On my Linux machine the output is:

read with RockDB.open on checkpoint: true
read with RockDB.openReadOnly on checkpoint: false
read with RockDB.open on main: true
read with RockDB.openReadOnly on main: false

Apparently blobs work only with open, on both the main DB and the checkpoint. If the database is written to again, it then becomes possible to read the blob even with openReadOnly. /cc @TheSmithSoftware please jump in if I'm missing something! ;)

@adamretter
Collaborator

@TheSmithSoftware @Smith1123 @dfa1 Okay, thanks for the code. My colleague @rhubner is going to pick this up.

@rhubner
Contributor

rhubner commented Apr 8, 2024

Hello @TheSmithSoftware, @Smith1123, @dfa1

@dfa1, I tried your example and I can confirm it behaves the same on my PC. I didn't see any obvious errors in the JNI code, so I wrote a small C++ test, and there everything works properly. So the error must be somewhere in the JNI layer. I will dig deeper. I have a slight suspicion about PinnableSlice, but as you can see, it works in the C++ code.

TEST_F(BlobTest, ReadOnlyWithBlob) {
  const int min_blob_size = 1000;
  // const int blob_size = min_blob_size + 10;
  const int blob_size = min_blob_size;
  const auto db_path = "c:\\tmp\\";
  const auto checkpoint_path = "c:\\tmp\\checkpoint";

  Options options = CurrentOptions();
  options.create_if_missing = true;
  options.enable_blob_files = true;
  options.min_blob_size = min_blob_size;

  DB* db2 = nullptr;

  ASSERT_OK(DB::Open(options, db_path, &db2));
  ASSERT_OK(db2->Put(WriteOptions(), Slice("key"), Slice("value")));

  std::string read_result;
  Status readStatus = db2->Get(ReadOptions(), Slice("key"), &read_result);
  EXPECT_EQ(std::string("value"), read_result);

  auto big_value = std::make_unique<char[]>(blob_size);
  for (int i = 0; i < blob_size; i++) {
    big_value[i] = 'a';
  }
  ASSERT_OK(db2->Put(WriteOptions(), Slice("key2"),
                     Slice(big_value.get(), blob_size)));
  ASSERT_OK(db2->Get(ReadOptions(), Slice("key2"), &read_result));
  ASSERT_EQ(std::string(big_value.get(), blob_size), read_result);

  Checkpoint* checkpoint;
  ASSERT_OK(Checkpoint::Create(db2, &checkpoint));
  ASSERT_OK(checkpoint->CreateCheckpoint(checkpoint_path));

  delete checkpoint;

  db2->Close();
  delete db2;

  ASSERT_OK(DB::OpenForReadOnly(options, checkpoint_path, &db2));

  //  ASSERT_OK(db2->Get(ReadOptions(), Slice("key2"), &read_result));
  //  ASSERT_EQ(std::string(big_value.get(), blob_size), read_result);
  //  ASSERT_OK(db2->Get(ReadOptions(), Slice("key2"), &read_result));

  PinnableSlice result_slice;
  ASSERT_OK(db2->Get(ReadOptions(), db2->DefaultColumnFamily(), Slice("key2"),
                     &result_slice));
  ASSERT_EQ(Slice(big_value.get(), blob_size), result_slice);

  db2->Close();
  delete db2;
}

cc: @adamretter

@rhubner
Contributor

rhubner commented Apr 8, 2024

After some debugging which didn't bring any results, I tried a different approach: an iterator.

try (RocksDB rocks = RocksDB.openReadOnly(options, checkpointPath.toString());
     RocksIterator it = rocks.newIterator()) {
    it.seekToFirst();
    System.out.println("It isValid : " + it.isValid());
    byte[] keyFromIt = it.key();
    byte[] valueFromIt = it.value();
    System.out.println("Key from it : " + (keyFromIt != null));
    System.out.println("value from it : " + (valueFromIt != null));
    System.out.println("key value from it: " + new String(keyFromIt));
    System.out.println("value value from it: " + new String(valueFromIt));
}

This produces the following result on my PC:

It isValid : true
Key from it : true
value from it : true
key value from it: some_key
value value from it: aaaaaaaaaaaa

This makes me assume that the data is there, just not accessible with the Get operation. But why? 🤔

@dfa1
Contributor

dfa1 commented Apr 8, 2024

@rhubner thanks for the updates! The data is there: opening the same checkpoint in read-write mode works, and the checkpoint contains the blob file.
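
One way to double-check that a checkpoint directory really contains a blob file is to list the files ending in `.blob`, the extension RocksDB uses for blob files. The following is a minimal sketch using only the JDK; the class and method names are hypothetical and it does not touch the RocksDB API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of RocksDB: it only inspects file names.
final class BlobFileLister {

    // Returns the sorted names of the regular files in dir that end in ".blob".
    static List<String> listBlobFiles(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(".blob"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the checkpoint directory as the first argument (defaults to the current directory).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("blob files in " + dir + ": " + listBlobFiles(dir));
    }
}
```

An empty list for the checkpoint directory would point to a missing blob file; a non-empty list supports the observation that the data is present and only the read path fails.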

@Smith1123

Sorry for not mentioning it before, but I was already aware that it works with an iterator.

@Smith1123

ASSERT_OK(db2->Put(WriteOptions(), Slice("key"), Slice("value")));

Have you checked that the code above actually writes into a blob? In my experience, if the data is not big enough, RocksDB doesn't actually use a blob.
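
To make the size question concrete: with min_blob_size set to 1000, the 5-byte "value" stays inline in the SST file, while a 1000-byte value is eligible for a blob file once the data is flushed (RocksDB stores values smaller than min_blob_size alongside the keys). The sketch below only mirrors that comparison; the class and method names are hypothetical and it does not call RocksDB:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the RocksDB API: it only mirrors the size rule.
final class BlobThreshold {

    // Values smaller than minBlobSize stay inline in the SST file;
    // values at or above the threshold are eligible for a blob file.
    static boolean eligibleForBlobFile(byte[] value, long minBlobSize) {
        return value.length >= minBlobSize;
    }

    public static void main(String[] args) {
        final long minBlobSize = 1000;
        final byte[] small = "value".getBytes(StandardCharsets.UTF_8);
        final byte[] large = "a".repeat(1000).getBytes(StandardCharsets.UTF_8);
        System.out.println("small eligible: " + eligibleForBlobFile(small, minBlobSize));
        System.out.println("large eligible: " + eligibleForBlobFile(large, minBlobSize));
    }
}
```

So in the C++ test only the second Put (key2, 1000 bytes) can exercise the blob path; the first one cannot.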

@dfa1
Contributor

dfa1 commented Apr 23, 2024

@rhubner thanks for the feedback!

Maybe this is useful: the next read fails only if the checkpoint is created on the main instance!

Proof:

package com.sixgroup;

import org.rocksdb.Checkpoint;
import org.rocksdb.CompressionType;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Minimal {

	public static void main(String[] args) throws Exception {
		final int minBlobSize = 1000;
		final byte[] messageKey = "id".getBytes(StandardCharsets.UTF_8);
		final byte[] message = "a".repeat(minBlobSize).getBytes(StandardCharsets.UTF_8);

		final Path mainPath = Files.createTempDirectory("rocksdb-issues-12503-");
		final Path checkpointPath = mainPath.resolve("checkpoint");
		try (Options options = new Options().setCreateIfMissing(true).setEnableBlobFiles(true).setMinBlobSize(minBlobSize).setBlobCompressionType(CompressionType.ZLIB_COMPRESSION)) {
			System.out.println(mainPath);
			try (RocksDB rocks = RocksDB.open(options, mainPath.toString())) {
				rocks.put(messageKey, message);
				try (Checkpoint checkpoint = Checkpoint.create(rocks)) {
					checkpoint.createCheckpoint(checkpointPath.toString());
				}
			}
			try (RocksDB rocks = RocksDB.openReadOnly(options, mainPath.toString())) {
				byte[] read = rocks.get(messageKey);
				System.out.println("read with RockDB.openReadOnly on main: " + (read != null));

			}
			try (RocksDB rocks = RocksDB.openReadOnly(options, checkpointPath.toString())) {
				byte[] read = rocks.get(messageKey);
				System.out.println("read with RockDB.openReadOnly on checkpoint: " + (read != null));
			}
		}
	}
}

The output on my machine is:

/tmp/rocksdb-issues-12503-13352246346231585106
read with RockDB.openReadOnly on main: true
read with RockDB.openReadOnly on checkpoint: true

So this just confirms the bug. But now, if I move the checkpoint creation from the first block to the second:

public class Minimal {

	public static void main(String[] args) throws Exception {
		final int minBlobSize = 1000;
		final byte[] messageKey = "id".getBytes(StandardCharsets.UTF_8);
		final byte[] message = "a".repeat(minBlobSize).getBytes(StandardCharsets.UTF_8);

		final Path mainPath = Files.createTempDirectory("rocksdb-issues-12503-");
		final Path checkpointPath = mainPath.resolve("checkpoint");
		try (Options options = new Options().setCreateIfMissing(true).setEnableBlobFiles(true).setMinBlobSize(minBlobSize).setBlobCompressionType(CompressionType.ZLIB_COMPRESSION)) {
			System.out.println(mainPath);
			try (RocksDB rocks = RocksDB.open(options, mainPath.toString())) {
				rocks.put(messageKey, message);
				
			}
			try (RocksDB rocks = RocksDB.openReadOnly(options, mainPath.toString())) {
				byte[] read = rocks.get(messageKey);
				System.out.println("read with RockDB.openReadOnly on main: " + (read != null));
				try (Checkpoint checkpoint = Checkpoint.create(rocks)) {
					checkpoint.createCheckpoint(checkpointPath.toString());
				}
			}
			try (RocksDB rocks = RocksDB.openReadOnly(options, checkpointPath.toString())) {
				byte[] read = rocks.get(messageKey);
				System.out.println("read with RockDB.openReadOnly on checkpoint: " + (read != null));
			}

		}
	}
}

The output is:

/tmp/rocksdb-issues-12503-13352246346231585106
read with RockDB.openReadOnly on main: true
read with RockDB.openReadOnly on checkpoint: true

Basically, creating the checkpoint from a read-only instance makes the bug disappear. /cc @adamretter @Smith1123 @TheSmithSoftware

NB: in case you're wondering, the problem is really the checkpoint operation. The following code behaves correctly:

public class Minimal {

	public static void main(String[] args) throws Exception {
		final int minBlobSize = 1000;
		final byte[] messageKey = "id".getBytes(StandardCharsets.UTF_8);
		final byte[] message = "a".repeat(minBlobSize).getBytes(StandardCharsets.UTF_8);

		final Path mainPath = Files.createTempDirectory("rocksdb-issues-12503-");
		final Path checkpointPath = mainPath.resolve("checkpoint");
		try (Options options = new Options().setCreateIfMissing(true).setEnableBlobFiles(true).setMinBlobSize(minBlobSize).setBlobCompressionType(CompressionType.ZLIB_COMPRESSION)) {
			System.out.println(mainPath);
			try (RocksDB rocks = RocksDB.open(options, mainPath.toString())) {
				rocks.put(messageKey, message);

			}
			try (RocksDB rocks = RocksDB.openReadOnly(options, mainPath.toString())) {
				byte[] read = rocks.get(messageKey);
				System.out.println("read with RockDB.openReadOnly on main: " + (read != null));
			}
		}
	}
}

NB: all tests are done with RocksDB 9.0.0 on a Linux machine (debian stable).

@rhubner
Contributor

rhubner commented Apr 30, 2024

Hello @dfa1,

Thanks for your minimalistic example. I think in your first console output /tmp/rocksdb-issues-12503-13352246346231585106 is an error as it says true when it is supposed to be false. At least when I run it, I'm getting false on both gets.

Last time when I debugged this issue, I wrote a small C++ test where I wasn't able to replicate the same behaviour in C++ as we have in Java. But I made a mistake in the test and now I'm able to replicate it. So at least we are progressing. I think the problem is somewhere around Options, this is where my C++ test previously deviated from Java JNI code.

Radek

cc: @adamretter

@rhubner
Contributor

rhubner commented May 1, 2024

Hello @dfa1,

I wrote a small C++ test where I can replicate the issue. So the problem is not in the Java code but in the C++ code, and I think it's related to Options.

If I instantiate Options with auto options = ROCKSDB_NAMESPACE::Options(); it doesn't work. But if I use the helper from the RocksDB testing framework, auto options = CurrentOptions(); everything works as expected. I also dumped both sets of options to the console, and the only place where they differ (apart from pointer addresses) is Options.fs. The working one uses LegacyFileSystem, while the non-working default one uses, in my case, WinFS (I'm developing on Windows).
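
The comparison of two option dumps described above can be sketched as a small diff over "name: value" lines that ignores settings where both sides are just pointer addresses. This is only an illustration: the one-setting-per-line dump format is an assumption, the class and method names are hypothetical, and the sample lines in main are made up rather than real RocksDB output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: diffs two option dumps given as "name: value" lines (assumed format).
final class OptionsDumpDiff {

    // Parses "name: value" lines into a map; lines without ": " are ignored.
    static Map<String, String> parse(String dump) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            int sep = line.indexOf(": ");
            if (sep > 0) {
                result.put(line.substring(0, sep).trim(), line.substring(sep + 2).trim());
            }
        }
        return result;
    }

    // True for values that look like a hexadecimal pointer address, e.g. 0x7f00aa10.
    static boolean isPointer(String value) {
        return value.matches("0x[0-9a-fA-F]+");
    }

    // Returns the settings whose values differ, skipping pairs that are both pointer addresses.
    static Map<String, String> diff(String leftDump, String rightDump) {
        Map<String, String> left = parse(leftDump);
        Map<String, String> right = parse(rightDump);
        Set<String> names = new TreeSet<>(left.keySet());
        names.addAll(right.keySet());
        Map<String, String> differences = new TreeMap<>();
        for (String name : names) {
            String l = left.get(name);
            String r = right.get(name);
            if (Objects.equals(l, r)) {
                continue;
            }
            if (l != null && r != null && isPointer(l) && isPointer(r)) {
                continue; // addresses differ between runs and carry no meaning
            }
            differences.put(name, l + " vs " + r);
        }
        return differences;
    }

    public static void main(String[] args) {
        // Made-up sample dumps, for illustration only.
        String working = "Options.create_if_missing: 1\nOptions.env: 0x7f00aa10\nOptions.fs: LegacyFileSystem";
        String failing = "Options.create_if_missing: 1\nOptions.env: 0x7f00bb20\nOptions.fs: WinFS";
        System.out.println(diff(working, failing));
    }
}
```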

@pdillinger Is there a particular way we should create instances of Options? Are the defaults OK? Do you think that different FileSystem implementations can change the behaviour?

#include <cstring>
#include "db/db_test_util.h"

namespace ROCKSDB_NAMESPACE {

class BlobTest : public DBTestBase {
 public:
  BlobTest() : DBTestBase("blob_test", /*env_do_fsync=*/false) {}

};

TEST_F(BlobTest, BlobSnapshotError) {

  const int blob_size = 1000;
  //auto options = CurrentOptions(); // Everything works when we create options with this method.
  auto options = ROCKSDB_NAMESPACE::Options();
  options.create_if_missing = true;
  options.enable_blob_files = true;
  options.min_blob_size = blob_size;

  std::string path = "c:\\tmp\\";
  std::string checkpointPath = path + "\\checkpoint";

  auto big_value = std::make_unique<char[]>(blob_size);
  for (int i = 0; i < blob_size; i++) {
    big_value[i] = 'a';
  }

  auto value = Slice(big_value.get(), blob_size);
  auto key = Slice("some_key");

  { // Create DB, Write data and create checkpoint.
    DB* db = nullptr;

    ASSERT_OK(rocksdb::DB::Open(options, path, &db));
    ASSERT_OK(db->Put(rocksdb::WriteOptions(),key, value ));

    PinnableSlice result_slice;
    ASSERT_OK(db->Get(rocksdb::ReadOptions(), db->DefaultColumnFamily(), key,
                      &result_slice));  //Verify data are in DB
    result_slice.Reset();

    Checkpoint* checkpoint;
    ASSERT_OK(Checkpoint::Create(db, &checkpoint));
    ASSERT_OK(checkpoint->CreateCheckpoint(checkpointPath));
    delete checkpoint;

    ASSERT_OK(db->Close());
    delete db;
  }

  { // Open checkpoint as read only
    DB* db = nullptr;
    ASSERT_OK(rocksdb::DB::OpenForReadOnly(options, checkpointPath, &db, true));
    PinnableSlice result_slice;
    ASSERT_OK(db->Get(rocksdb::ReadOptions(), db->DefaultColumnFamily(), key,
                       &result_slice));
    result_slice.Reset();
    db->Close();
    delete db;

  }
}
}

int main(int argc, char** argv) {
  ROCKSDB_NAMESPACE::port::InstallStackTraceHandler();
  ::testing::InitGoogleTest(&argc, argv);
  RegisterCustomObjects(argc, argv);
  return RUN_ALL_TESTS();
}

cc: @adamretter @alanpaxton

@dfa1
Contributor

dfa1 commented May 11, 2024

Perhaps @ltamasi or @ajkr could help with the C++ part?

facebook-github-bot pushed a commit that referenced this issue May 21, 2024
Summary:
While I was trying to understand issue #12503, I found this minor problem. Please have a look adamretter rhubner

Pull Request resolved: #12575

Reviewed By: ajkr

Differential Revision: D57596055

Pulled By: cbi42

fbshipit-source-id: ee0860bdfbee9364cd30c23957b72a04da6acd45
konstantinvilin pushed a commit to konstantinvilin/rocksdb that referenced this issue May 22, 2024