-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhance: resource estimate improvement #32964
base: master
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: chyezh The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@chyezh ut workflow job failed, comment |
@chyezh E2e jenkins job failed, comment |
d774a06
to
5c8d83c
Compare
@chyezh E2e jenkins job failed, comment |
@chyezh ut workflow job failed, comment |
@chyezh E2e jenkins job failed, comment |
46989a0
to
595b74c
Compare
@chyezh E2e jenkins job failed, comment |
/run-cpu-e2e |
rerun ut |
@chyezh E2e jenkins job failed, comment |
/run-cpu-e2e |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #32964 +/- ##
========================================
Coverage 82.07% 82.08%
========================================
Files 1007 999 -8
Lines 127647 128008 +361
========================================
+ Hits 104767 105074 +307
- Misses 18903 18960 +57
+ Partials 3977 3974 -3
|
f37ae6f
to
dd77db4
Compare
@chyezh ut workflow job failed, comment |
@chyezh E2e jenkins job failed, comment |
Signed-off-by: chyezh <chyezh@outlook.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Signed-off-by: chyezh <chyezh@outlook.com>
rerun ut |
1 similar comment
rerun ut |
@chyezh E2e jenkins job failed, comment |
/run-cpu-e2e |
rerun ut |
1 similar comment
rerun ut |
Signed-off-by: chyezh <chyezh@outlook.com>
@chyezh E2e jenkins job failed, comment |
/run-cpu-e2e |
/lgtm |
@chyezh E2e jenkins job failed, comment |
/run-cpu-e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm generally for the cpp part.
throw SegcoreError( | ||
ErrorCode::UnistdError, | ||
fmt::format("write index to fd error: {}", strerror(errno))); | ||
throw; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw;
will rethrow the current exception.
write_disk_duration_sum += | ||
(std::chrono::system_clock::now() - start_write_file); | ||
AssertInfo( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why ignore this exception here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File instance in File.h
is already handing the exception.
So exception handling can be ignored here.
milvus::ResourceUsage | ||
GetResourceUsage() const override { | ||
auto mem_size = stats_.mem_size.load() + deleted_record_.mem_size(); | ||
return milvus::ResourceUsage{mem_size, 0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also consider the intermediate index for growing index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it will be a TODO after incoming growing mmap index
.
ResourceUsage | ||
StringIndexMarisa::GetResourceUsage() const { | ||
// TODO: we cannot estimate the memory usage of marisa trie. | ||
return resource_usage_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
die we enable mmap for marisa trie?
if so, the memory usage can be set to 0
VectorDiskAnnIndex<T>::GetResourceUsage() const { | ||
// TODO: we can't get the memory usage of vector index now. | ||
auto disk_size = file_manager_->GetLocalFileSize(); | ||
return ResourceUsage{0, disk_size}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this need to double check with @liliu-z
i believe there is a way to estimate memory utilizations
ResourceUsage | ||
VectorMemIndex<T>::GetResourceUsage() const { | ||
// TODO: we can't get the memory usage of vector index now. | ||
return resource_usage_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if mmap is not enbaled, resource usage should be all memory usage. but I dind't see any calculation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will use file size as mem size if disable mmap
@@ -179,12 +167,6 @@ type promMetricsObserver struct { | |||
nodeID string | |||
label SegmentLabel // never updates | |||
|
|||
DiskCacheLoadTotal prometheus.Counter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we remove these metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already exported in cache monitor.go
repeated metrics can be removed.
// ResourceUsageEstimate returns the estimated resource usage of the segment | ||
ResourceUsageEstimate() ResourceUsage | ||
|
||
// ResourceUsageEstimate returns the estimated resource usage of the segment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change the variable name,
ResourceUsageEstimateofLoad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it
return pk.Size() + 8 | ||
}) | ||
return SegmentResourceUsage{ | ||
Predict: ResourceUsage{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do you get predict value before you load it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
original segment loader logic is used in prediction.
@@ -180,7 +180,7 @@ func NewManager() *Manager { | |||
if segment == nil { | |||
return 0 | |||
} | |||
return int64(segment.ResourceUsageEstimate().DiskSize) | |||
return int64(segment.ResourceUsageEstimate().GetInuseOrPredictDiskUsage()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for cache, if weight is set.
once we use index as alternative, or add new delete, the weight will not be correct.
@@ -1540,7 +1540,7 @@ func (loader *segmentLoader) checkSegmentSize(ctx context.Context, segmentLoadIn | |||
zap.Float64("diskUsage(MB)", toMB(usage.DiskSize)), | |||
zap.Float64("memoryLoadFactor", factor.memoryUsageFactor), | |||
) | |||
mmapFieldCount += usage.MmapFieldCount | |||
mmapFieldCount += mmapFieldCount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does we use mmapFieldCount?
and mmapFieldCount += mmapFieldCount seems to be a bug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only used in logging, record how many fields to be mmap.
} | ||
|
||
type LazyScavenger[K comparable] struct { | ||
capacity int64 | ||
size int64 | ||
size *atomic.Int64 | ||
weight func(K) int64 | ||
weights map[K]int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not maintain weights here.
we should cache sizable object
issue: #32963
Add Exception to file class to avoid exception ignore.
Add GetResourceUsage method for index, column and segment.
Use bin log size but not mem size in CU of search and query.
Use in-use disk size in disk cache weight if segment is loaded.
Enable cache metric itself, remove redundant metric at search/query path.