feat: Adds functionality to clear out bad shard list #25398
base: master-1.x
This PR adds a test and a new method to clear out the bad shards list. The method returns the shards that it cleared out along with their errors. This is the first part of the feature for adding a load-shards command to influxd-ctl. Closes influxdata/feature-requests#591
0a8b33e to cb3abb4
@devanbenz Is there a companion PR that adds APIs to access this functionality?
How is this controlled externally?
tsdb/store.go
Outdated
// were removed from the cache.
func (s *Store) ClearBadShardList() map[uint64]error {
	if s.badShards.shardErrors == nil {
		badShards := make(map[uint64]error)
I believe that this needs to be protected by s.badShards.mu throughout this method. s.badShards.shardErrors should never be nil, so we probably want to do two things:
- Log a warning
- Set s.badShards.shardErrors to an empty map.
Then we can proceed with the clone and clear code below, which will be the same in all cases.
ClearBadShardList should use GetBadShardList so that all the mutex and clone logic is localized to a single method.
That would produce:
func (s *Store) ClearBadShardList() map[uint64]error {
	badShards := s.GetBadShardList()
	clear(s.badShards.shardErrors)
	return badShards
}

// GetBadShardList is exposed as a method for test purposes
func (s *Store) GetBadShardList() map[uint64]error {
	s.badShards.mu.Lock()
	defer s.badShards.mu.Unlock()
	if s.badShards.shardErrors == nil {
		s.Logger.Warn("badShards was nil")
		s.badShards.shardErrors = make(map[uint64]error)
	}
	shardList := maps.Clone(s.badShards.shardErrors)
	return shardList
}
So it's likely I would still need to use a mutex in ClearBadShardList() to call clear() on the original map?
Yes, you must use the mutex around the clear
tsdb/store.go
Outdated

// GetBadShardList is exposed as a method for test purposes
func (s *Store) GetBadShardList() map[uint64]error {
	return s.badShards.shardErrors
This needs to be protected by the mutex here, and the return value cloned for safety.
Yes, I'm currently working on the plutonium code for that.
https://github.com/influxdata/plutonium/pull/4193/files This is the WIP PR that consumes this code.
Suggestions in comments.
tsdb/store.go
Outdated
	}
	badShards := maps.Clone(s.badShards.shardErrors)
	clear(s.badShards.shardErrors)
	s.badShards.mu.Unlock()
I would use defer immediately after the Lock call, both as future-proofing against other code paths being added and because it is idiomatic Go.
tsdb/store.go
Outdated
	return s.badShards.shardErrors
	s.badShards.mu.Lock()
	shardList := maps.Clone(s.badShards.shardErrors)
	s.badShards.mu.Unlock()
As suggested above, I would use defer immediately after the Lock call.
tsdb/store_test.go
Outdated
if len(badShards) != 1 {
	t.Fatalf("expected 1 shard, got %d", len(badShards))
}
Use require.Len(t, badShards, 1)
tsdb/store_test.go
Outdated
}

// Check that bad shard list has been cleared
require.Equal(t, 0, len(s.Store.GetBadShardList()))
Even though this uses require and works, require.Empty(t, s.GetBadShardList()) is better because it will output what was in the slice or map if it isn't actually empty. This makes debugging issues simpler.
tsdb/store_test.go
Outdated
require.EqualError(t, err2, fmt.Errorf("not attempting to open shard %d; opening shard previously failed with: %w", shId, expErr).Error())

// Check that bad shard list now has a bad shard in it
require.Equal(t, 1, len(s.Store.GetBadShardList()))
Use require.Len(t, s.GetBadShardList(), 1) so the contents of the map will be printed out if there isn't exactly 1 item in the list.
tsdb/store_test.go
Outdated
require.NoError(t, err, "opening temp shard")
defer require.NoError(t, sh.Close(), "closing temporary shard")

require.Equal(t, 0, len(s.Store.GetBadShardList()))
Use require.Empty
tsdb/store_test.go
Outdated
if len(badShards) != 0 {
	t.Fatalf("expected no shards, got %d", len(badShards))
}
Use require.Empty
tsdb/store_test.go
Outdated
}

// Check that bad shard list has been cleared
require.Equal(t, 0, len(s.Store.GetBadShardList()))
Use require.Empty
for _, idx := range indexes {
	func() {
		s := MustOpenStore(t, idx)
		defer require.NoErrorf(t, s.Close(), "closing store with index type: %s", idx)
Nice error checking!
Isn't s.Close called on the defer line?
- A deferred function’s arguments are evaluated when the defer statement is evaluated.
tsdb/store_test.go
Outdated
s.SetShardOpenErrorForTest(sh.ID(), expErr)
err2 := s.OpenShard(sh.Shard, false)
require.Error(t, err2, "no error opening bad shard")
require.True(t, errors.Is(err2, tsdb.ErrPreviousShardFail{}), "exp: ErrPreviousShardFail, got: %v", err2)
Use require.ErrorIs(t, err2, tsdb.ErrPreviousShardFail{})
Some questions around defer semantics
for _, idx := range indexes {
	func() {
		s := MustOpenStore(t, idx)
		defer require.NoErrorf(t, s.Close(), "closing store with index type: %s", idx)
Doesn't this close the store immediately, because arguments to deferred functions are evaluated at the defer line? See here
Hm, interesting. This construct is used throughout the tests (outside of the ones I worked on).
I'll go through this and see what's going on.
Take a look at the Go playground link I shared for a demonstration.
sh := tsdb.NewTempShard(idx)
err := s.OpenShard(sh.Shard, false)
require.NoError(t, err, "opening temp shard")
defer require.NoError(t, sh.Close(), "closing temporary shard")
As above, when is sh.Close called?
Maybe a defer is not necessary here, since the shard is never used after opening.
This PR adds a test and a new method to clear out the bad shards list.
The method returns the shards that it cleared out
along with their errors. This is the first part of the feature
for adding a load-shards command to influxd-ctl.
companion PR: https://github.com/influxdata/plutonium/pull/4193
Closes https://github.com/influxdata/feature-requests/issues/591