Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Buffer Manager] Add StorageRegion class to read and write pages to disk #2621

Open
wants to merge 30 commits into
base: master
Choose a base branch
from

Conversation

nikriek
Copy link
Contributor

@nikriek nikriek commented Oct 17, 2023

This PR depends on #2618.

We add the StorageRegion class that allows reading and writing pages to disk using standard posix IO.

@nikriek nikriek added the FullCI Run all CI tests (slow, but required for merge) label Oct 17, 2023
src/lib/storage/buffer/persistence_manager.hpp Outdated Show resolved Hide resolved
src/lib/storage/buffer/persistence_manager.hpp Outdated Show resolved Hide resolved
src/lib/storage/buffer/persistence_manager.hpp Outdated Show resolved Hide resolved
src/lib/storage/buffer/persistence_manager.hpp Outdated Show resolved Hide resolved
src/lib/storage/buffer/persistence_manager.hpp Outdated Show resolved Hide resolved
src/test/lib/storage/buffer/persistence_manager_test.cpp Outdated Show resolved Hide resolved
src/test/lib/storage/buffer/persistence_manager_test.cpp Outdated Show resolved Hide resolved
src/test/lib/storage/buffer/persistence_manager_test.cpp Outdated Show resolved Hide resolved
src/test/lib/storage/buffer/persistence_manager_test.cpp Outdated Show resolved Hide resolved
src/test/lib/storage/buffer/persistence_manager_test.cpp Outdated Show resolved Hide resolved
@nikriek nikriek changed the title [Buffer Manager] Add Persistence Manager class to read and write pages to disk [Buffer Manager] Add StorageRegion class to read and write pages to disk Nov 2, 2023
@nikriek nikriek marked this pull request as ready for review November 3, 2023 07:53
@nikriek nikriek requested a review from mweisgut November 5, 2023 21:03
@@ -95,7 +95,7 @@ struct PageID {

static_assert(sizeof(PageID) == 8, "PageID must be 64 bit");

std::ostream& operator<<(std::ostream& os, const PageID& page_id) {
inline std::ostream& operator<<(std::ostream& os, const PageID& page_id) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why inline?

@@ -0,0 +1,166 @@
#include "storage_region.hpp"
#ifdef __linux__
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment?


namespace hyrise {

inline void DebugAssertPageAlignment(const std::byte* data) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why inline?


inline void DebugAssertPageAlignment(const std::byte* data) {
DebugAssert(reinterpret_cast<std::uintptr_t>(data) % 512 == 0,
"Data buffer is not properly aligned to 512 byte boundary as required for direct IO: " +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be "inproperly aligned to 512 byte"?

static StorageRegion::Mode find_mode_or_fail(const std::filesystem::path& file_name) {
if (std::filesystem::is_directory(file_name)) {
return StorageRegion::Mode::FILE_PER_PAGE_SIZE;
} else if (std::filesystem::is_block_file(file_name)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess tidy will complain to avoid else after return.

Two stand-alone if's with a trailing fail?

if (fcntl(fd, F_NOCACHE, 1) == -1) {
const auto error = errno;
close(fd);
Fail("Error while setting F_NOCACHE on __APPLE__: " + strerror(error));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on macOS? Looks like a placeholder that wasn't replaced.

const auto handle = _file_handles[static_cast<uint64_t>(page_id.size_type())];
const auto byte_count = page_id.byte_count();
const auto offset = handle.offset + byte_count * page_id.index();
DebugAssertPageAlignment(data);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't really matter, but I would check for asserts on the input parameters in the beginning of the method.

std::array<StorageRegion::FileHandle, PAGE_SIZE_TYPES_COUNT> StorageRegion::_open_file_handles_in_directory(
const std::filesystem::path& path) {
DebugAssert(std::filesystem::is_directory(path), "StorageRegion path must be a directory");
auto array = std::array<StorageRegion::FileHandle, PAGE_SIZE_TYPES_COUNT>{};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto file_handles = ...?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array can be more descriptive, see Martin's comment

std::to_string(page_size_type_index) + ".bin");
array[page_size_type_index] = {_open_file_descriptor(file_name), file_name};
// We initialize the file with 32 MiB.
std::filesystem::resize_file(file_name, 1UL << 25);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When/how is this ever resized? Let's say I run SF 1000.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we always use something like 300 MB for the StorageRegion?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When/how is this ever resized? Let's say I run SF 1000.

if you use pwrite to write data to a position beyond the current end of the file, the OS will extend the file size to include the new data.


const auto fd = _open_file_descriptor(path);

// ioctl with the BLKGETSIZE64 allows us to get the size of the block device on Linux
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing periods.

int StorageRegion::_open_file_descriptor(const std::filesystem::path& file_name) {
// Partially taken from
// DuckDB: https://github.com/duckdb/duckdb/blob/60ed227816669be497fa4ba53e593d3899479c43/src/common/local_file_system.cpp
// LevelDB: https://github.com/google/leveldb/commit/296de8d5b8e4e57bd1e46c981114dfbe58a8c4fa
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather use a permalink to the file instead of the commit?

// LevelDB: https://github.com/google/leveldb/commit/296de8d5b8e4e57bd1e46c981114dfbe58a8c4fa

// On Linux, we use O_DIRECT to bypass the page cache. This is not supported on Mac,
// but we use the fcntl(F_NOCACHE) call below. Other nan that, files are opened in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// but we use the fcntl(F_NOCACHE) call below. Other nan that, files are opened in
// but we use the fcntl(F_NOCACHE) call below. Other than that, files are opened in

namespace hyrise {

/**
* StorageRegion manages read and write access of pages to an SSD. StorageRegion can either use a block device directly by assigning equal-sized regions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* StorageRegion manages read and write access of pages to an SSD. StorageRegion can either use a block device directly by assigning equal-sized regions
* StorageRegion manages read and write access of pages to a storage device. StorageRegion can either use a block device directly by assigning equal-sized regions

std::array<StorageRegion::FileHandle, PAGE_SIZE_TYPES_COUNT> StorageRegion::_open_file_handles_in_directory(
const std::filesystem::path& path) {
DebugAssert(std::filesystem::is_directory(path), "StorageRegion path must be a directory");
auto array = std::array<StorageRegion::FileHandle, PAGE_SIZE_TYPES_COUNT>{};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array can be more descriptive, see Martin's comment


std::array<StorageRegion::FileHandle, PAGE_SIZE_TYPES_COUNT> StorageRegion::_open_file_handles_block(
const std::filesystem::path& path) {
DebugAssert(std::filesystem::is_block_file(path), "StorageRegion path must be a directory");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DebugAssert(std::filesystem::is_block_file(path), "StorageRegion path must be a directory");
DebugAssert(std::filesystem::is_block_file(path), "StorageRegion path must be a block device");

std::atomic_uint64_t _total_bytes_written = 0;
std::atomic_uint64_t _total_bytes_read = 0;

// Open file haswndles for the given path. If the path is a directory,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to Martin's comment below, this comment sounds like multiple types of paths are possible here.

}

TEST_F(StorageRegionTest, WriteFailsWithUnalignedData) {
if constexpr (!HYRISE_DEBUG) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here about why this test case is only executed in Debug build would help.

}

TEST_F(StorageRegionTest, ReadFailsWithUnalignedData) {
if constexpr (!HYRISE_DEBUG) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here about why this test case is only executed in debug build would help.

std::memset(write_page.data.data(), 0x1, page_id.byte_count());
storage_region->write_page(page_id, write_page.data.data());
storage_region->read_page(page_id, read_page1.data.data());
EXPECT_EQ(std::memcmp(read_page1.data.data(), read_page1.data.data(), page_id.byte_count()), 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rhs and lhs are the buffer pointers. Better compare write_page and read_page1?

auto read_page2 = Page{};
std::memset(write_page.data.data(), 0x2, page_id.byte_count());
storage_region->write_page(page_id, write_page.data.data());
EXPECT_NE(std::memcmp(read_page1.data.data(), read_page2.data.data(), page_id.byte_count()), 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read_page2 is default initialized here. read_page(page_id, read_page2) is missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FullCI Run all CI tests (slow, but required for merge)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants