Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Adds functionality to clear out bad shard list #25398

Open
wants to merge 8 commits into
base: master-1.x
Choose a base branch
from

Conversation

devanbenz
Copy link

@devanbenz devanbenz commented Sep 25, 2024

This PR adds test and new method to clear out the bad shards list
the method will return the values of the shards that it cleared out
along with the errors. This is the first part in the feature
for adding a load-shards command to influxd-ctl.

companion PR: https://github.com/influxdata/plutonium/pull/4193

Closes https://github.com/influxdata/feature-requests/issues/591

@devanbenz devanbenz changed the title feat: WIP working on deleting badShards feat: Adds functionality to clear out bad shard list Sep 27, 2024
This PR adds test and new method to clear out the bad shards list
the method will return the values of the shards that it cleared out
along with the errors. This is the first part in the feature
for adding a load-shards command to influxd-ctl.

Closes influxdata/feature-requests#591
@devanbenz devanbenz force-pushed the feat/591-load-failed-shards-cmd branch from 0a8b33e to cb3abb4 Compare September 27, 2024 15:11
@devanbenz devanbenz marked this pull request as ready for review September 27, 2024 15:11
@devanbenz devanbenz requested review from gwossum and davidby-influx and removed request for gwossum October 16, 2024 19:42
@gwossum
Copy link
Member

gwossum commented Oct 16, 2024

@devanbenz Is there a companion PR that adds APIs to access this functionality?

Copy link
Contributor

@davidby-influx davidby-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this controlled externally?

tsdb/store.go Outdated
// were removed from the cache.
func (s *Store) ClearBadShardList() map[uint64]error {
if s.badShards.shardErrors == nil {
badShards := make(map[uint64]error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this needs to be protected by the s.badShards.mu throughout this method.

s.badShards.shardErrors should never be nil, so we probably want to do two things:

  1. Log a warning
  2. Set s.badShards.ShardErrors to an empty map.

Then we can proceed with the clone and clear code below which will be the same in all cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClearBadShardList should use GetBadShardList so that all the mutex and clone logic is localized to a single method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would produce:

func (s *Store) ClearBadShardList() map[uint64]error {
	badShards := s.GetBadShardList()
	clear(s.badShards.shardErrors)

	return badShards
}

// GetBadShardList is exposed as a method for test purposes
func (s *Store) GetBadShardList() map[uint64]error {
	s.badShards.mu.Lock()
	defer s.badShards.mu.Unlock()

	if s.badShards.shardErrors == nil {
		s.Logger.Warn("badShards was nil")
		s.badShards.shardErrors = make(map[uint64]error)
	}

	shardList := maps.Clone(s.badShards.shardErrors)

	return shardList
}

So it's likely I would still need to use a mutex in ClearBadShardList() to call the clear() method on the original map?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you must use the mutex around the clear

tsdb/store.go Outdated

// GetBadShardList is exposed as a method for test purposes
func (s *Store) GetBadShardList() map[uint64]error {
return s.badShards.shardErrors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be protected by the mutex here, and the return value cloned for safety.

@devanbenz
Copy link
Author

@devanbenz Is there a companion PR that adds APIs to access this functionality?

Yes I'm currently working on the plutonium code for that

@devanbenz
Copy link
Author

https://github.com/influxdata/plutonium/pull/4193/files This is the WIP PR that consumes the following code.

Copy link
Contributor

@davidby-influx davidby-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions in comments.

tsdb/store.go Outdated
}
badShards := maps.Clone(s.badShards.shardErrors)
clear(s.badShards.shardErrors)
s.badShards.mu.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use defer immediately after the Lock call as future-proofing against other code paths being added, as well as idiomatic Go.

tsdb/store.go Outdated
return s.badShards.shardErrors
s.badShards.mu.Lock()
shardList := maps.Clone(s.badShards.shardErrors)
s.badShards.mu.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested above, I would use defer immediately after the Lock call.

Comment on lines 289 to 291
if len(badShards) != 1 {
t.Fatalf("expected 1 shard, got %d", len(badShards))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use require.Len(t, badShards, 1)

}

// Check that bad shard list has been cleared
require.Equal(t, 0, len(s.Store.GetBadShardList()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this uses require and works, require.Empty(t, s.GetBadShardList) is better because it will output what was in the slice / map if it isn't actually empty. This makes debugging issues simpler.

require.EqualError(t, err2, fmt.Errorf("not attempting to open shard %d; opening shard previously failed with: %w", shId, expErr).Error())

// Check that bad shard list now has a bad shard in it
require.Equal(t, 1, len(s.Store.GetBadShardList()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use require.Len(t, s.GetBadShardList, 1) so the contents of map will be printed out if there isn't exactly 1 item in the list.

require.NoError(t, err, "opening temp shard")
defer require.NoError(t, sh.Close(), "closing temporary shard")

require.Equal(t, 0, len(s.Store.GetBadShardList()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use require.Empty

Comment on lines 323 to 325
if len(badShards) != 0 {
t.Fatalf("expected no shards, got %d", len(badShards))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use require.Empty

}

// Check that bad shard list has been cleared
require.Equal(t, 0, len(s.Store.GetBadShardList()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use require.Empty

for _, idx := range indexes {
func() {
s := MustOpenStore(t, idx)
defer require.NoErrorf(t, s.Close(), "closing store with index type: %s", idx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice error checking!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't s.Close called on the defer line?

  1. A deferred function’s arguments are evaluated when the defer statement is evaluated.

https://go.dev/blog/defer-panic-and-recover

s.SetShardOpenErrorForTest(sh.ID(), expErr)
err2 := s.OpenShard(sh.Shard, false)
require.Error(t, err2, "no error opening bad shard")
require.True(t, errors.Is(err2, tsdb.ErrPreviousShardFail{}), "exp: ErrPreviousShardFail, got: %v", err2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use require.ErrorIs(t, err2, tsdb.ErrPerviousShardFail)

Copy link
Contributor

@davidby-influx davidby-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions around defer semantics

for _, idx := range indexes {
func() {
s := MustOpenStore(t, idx)
defer require.NoErrorf(t, s.Close(), "closing store with index type: %s", idx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this close the store immediately, because arguments to deferred functions are evaluated at the defer line? See here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm interesting - this similar construct is used through out the tests (outside of the ones I worked on)

I'll go through this and see what's going on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at the Go playground link I shared for a demonstration.

for _, idx := range indexes {
func() {
s := MustOpenStore(t, idx)
defer require.NoErrorf(t, s.Close(), "closing store with index type: %s", idx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't s.Close called on the defer line?

  1. A deferred function’s arguments are evaluated when the defer statement is evaluated.

https://go.dev/blog/defer-panic-and-recover

sh := tsdb.NewTempShard(idx)
err := s.OpenShard(sh.Shard, false)
require.NoError(t, err, "opening temp shard")
defer require.NoError(t, sh.Close(), "closing temporary shard")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, when is sh.Close called?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a defer is not necessary here, since the shard is never used after opening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants