Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Adds functionality to clear out bad shard list #25398

Open
wants to merge 8 commits into
base: master-1.x
Choose a base branch
from
29 changes: 29 additions & 0 deletions tsdb/store.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ import (
"errors"
"fmt"
"io"
"maps"
"os"
"path/filepath"
"runtime"
Expand Down Expand Up @@ -672,6 +673,34 @@ func (s *Store) Shard(id uint64) *Shard {
return sh
}

// ClearBadShardList will remove all shards from the badShards cache
// this will allow for lazy loading of bad shards if/when they are no
// longer in a "bad" state. This method will return any shards that
// were removed from the cache.
func (s *Store) ClearBadShardList() map[uint64]error {
s.badShards.mu.Lock()
defer s.badShards.mu.Unlock()

if s.badShards.shardErrors == nil {
s.Logger.Warn("badShards was nil")
s.badShards.shardErrors = make(map[uint64]error)
}
badShards := maps.Clone(s.badShards.shardErrors)
clear(s.badShards.shardErrors)

return badShards
}

// GetBadShardList is exposed as a method for test purposes
func (s *Store) GetBadShardList() map[uint64]error {
s.badShards.mu.Lock()
defer s.badShards.mu.Unlock()

shardList := maps.Clone(s.badShards.shardErrors)

return shardList
}

type ErrPreviousShardFail struct {
error
}
Expand Down
64 changes: 64 additions & 0 deletions tsdb/store_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -262,6 +262,70 @@ func TestStore_BadShard(t *testing.T) {
}
}

func TestStore_BadShardClear(t *testing.T) {
const errStr = "a shard open error"
indexes := tsdb.RegisteredIndexes()
for _, idx := range indexes {
func() {
s := MustOpenStore(t, idx)
defer require.NoErrorf(t, s.Close(), "closing store with index type: %s", idx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this close the store immediately, because arguments to deferred functions are evaluated at the defer line? See here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm interesting - this similar construct is used through out the tests (outside of the ones I worked on)

I'll go through this and see what's going on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at the Go playground link I shared for a demonstration.


sh := tsdb.NewTempShard(idx)
shId := sh.ID()
err := s.OpenShard(sh.Shard, false)
require.NoError(t, err, "opening temp shard")
defer require.NoError(t, sh.Close(), "closing temporary shard")

expErr := errors.New(errStr)
s.SetShardOpenErrorForTest(sh.ID(), expErr)
err2 := s.OpenShard(sh.Shard, false)
require.Error(t, err2, "no error opening bad shard")
require.ErrorIs(t, err2, tsdb.ErrPreviousShardFail{})
require.EqualError(t, err2, fmt.Errorf("not attempting to open shard %d; opening shard previously failed with: %w", shId, expErr).Error())

require.Equal(t, 1, len(s.Store.GetBadShardList()))

badShards := s.ClearBadShardList()
require.Len(t, badShards, 1)

// Check that bad shard list has been cleared
require.Empty(t, s.Store.GetBadShardList())

s.SetShardOpenErrorForTest(sh.ID(), expErr)
err2 = s.OpenShard(sh.Shard, false)
require.Error(t, err2, "no error opening bad shard")
require.ErrorIs(t, err2, tsdb.ErrPreviousShardFail{})
require.EqualError(t, err2, fmt.Errorf("not attempting to open shard %d; opening shard previously failed with: %w", shId, expErr).Error())

// Check that bad shard list now has a bad shard in it
require.Len(t, s.Store.GetBadShardList(), 1)
}()
}
}

func TestStore_BadShardClearNoBadShards(t *testing.T) {
indexes := tsdb.RegisteredIndexes()
for _, idx := range indexes {
func() {
s := MustOpenStore(t, idx)
defer require.NoErrorf(t, s.Close(), "closing store with index type: %s", idx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice error checking!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't s.Close called on the defer line?

  1. A deferred function’s arguments are evaluated when the defer statement is evaluated.

https://go.dev/blog/defer-panic-and-recover


sh := tsdb.NewTempShard(idx)
err := s.OpenShard(sh.Shard, false)
require.NoError(t, err, "opening temp shard")
defer require.NoError(t, sh.Close(), "closing temporary shard")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, when is sh.Close called?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a defer is not necessary here, since the shard is never used after opening.


require.Empty(t, s.Store.GetBadShardList())

badShards := s.ClearBadShardList()
require.Empty(t, badShards)

// Check that bad shard list has been cleared
require.Empty(t, s.Store.GetBadShardList())
}()
}
}

func TestStore_CreateMixedShards(t *testing.T) {
t.Parallel()

Expand Down