Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crl provider: Static and FileWatcher provider implementations #6670

Merged
merged 63 commits into from Oct 31, 2023

Conversation

erm-g
Copy link
Contributor

@erm-g erm-g commented Sep 29, 2023

Part of CRL Provider effort. The functionality in this PR includes a built in Static and FileWatcher provider implementations. It also gives users a tool to implement their own providers

RELEASE NOTES: N/A

@erm-g erm-g changed the title CRL Provider: Static and FileWatcher provider implementations crl provider: Static and FileWatcher provider implementations Oct 1, 2023
@erm-g erm-g added Type: Feature New features or improvements in behavior Type: Security A bug or other problem affecting security labels Oct 1, 2023
@erm-g erm-g added this to the 1.59 Release milestone Oct 1, 2023
Copy link
Contributor

@gtcooke94 gtcooke94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good! Have a few questions/comments

security/advancedtls/advancedtls_test.go Outdated Show resolved Hide resolved
@@ -594,14 +619,14 @@ func (s) TestClientServerHandshake(t *testing.T) {
// server custom check fails
{
desc: "Client sets peer cert, reload root function with verifyFuncGood; Server sets bad custom check; mutualTLS",
clientCert: []tls.Certificate{cs.ClientCert1},
clientGetRoot: getRootCAsForClient,
clientCert: []tls.Certificate{cs.ClientCert3},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name the certs for what they are rather than just being numbered?

e.g. if ClientCert3 is a cert that is revoked, just name is ClientRevokedCert or something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to leave it as is for this PR since it fits the current pattern - security/advancedtls/internal/testutils/testutils.go
Definitely something to address during 'Unifying API' effort (will be a separate PR)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should continue to add bad code that we plan to refactor later just to keep with a bad existing pattern - let's go ahead and name these variables with something meaningful rather than making more work for ourselves down the road.

In fact, I'd argue the current pattern isn't much of a pattern - it's just lazy variable naming. For there to be a pattern there would need to be some meaning to Cert1, Cert2, etc

clientVerifyFunc: clientVerifyFuncGood,
clientVType: CertVerification,
clientExpectHandshakeError: true,
serverMutualTLS: true,
serverCert: []tls.Certificate{cs.ServerCert1},
serverGetRoot: getRootCAsForServer,
serverCert: []tls.Certificate{cs.ServerCert3},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about variable naming

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to leave it as is for this PR since it fits the current pattern - security/advancedtls/internal/testutils/testutils.go
Definitely something to address during 'Unifying API' effort (will be a separate PR)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other thread

@@ -594,14 +619,14 @@ func (s) TestClientServerHandshake(t *testing.T) {
// server custom check fails
{
desc: "Client sets peer cert, reload root function with verifyFuncGood; Server sets bad custom check; mutualTLS",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you changed this test case and it's not a new one - just curious as to why you changed it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good catch, I was experimenting with some chains and accidentally submitted (it doesn't influence the outputs). Will revert it back

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -704,6 +729,37 @@ func (s) TestClientServerHandshake(t *testing.T) {
Cache: cache,
},
},
// Client: set valid credentials with the revocation config
// Server: set valid credentials with the revocation config
// Expected Behavior: success, because non of the certificate chains sent in the connection are revoked
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: non -> none

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You fixed the wrong non :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

failCounter := 0
for _, file := range files {
filePath := fmt.Sprintf("%s/%s", p.opts.CRLDirectory, file.Name())
err := p.addCRL(filePath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need some thread safety around these updates, or we need to use a backing map type that is thread safe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

for _, tt := range tests {
t.Run(tt.desc, func(t *testing.T) {
for _, c := range tt.certs {
crl, err := p.CRL(c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also check that the correct CRL is returned, rather than just that a non-nil value is returned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point but I'd prefer to keep it very simple here and do detailed checks on integration test level

// and the validation of Options configuration. The configurations include empty
// one, non existing CRLDirectory, invalid RefreshDuration, and the correct one.
func TestFileWatcherCRLProviderConfig(t *testing.T) {
if _, err := MakeFileWatcherCRLProvider(Options{}); err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can split the if cases here into separate tests, or make this a Table Test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to have 3 separate test methods - to check config, CRL api, and directory scanning. I think config one can also be fit to Table test pattern (like the other 2) but I think it's a little bit easier to read it in the current format

// and the validation of Options configuration. The configurations include empty
// one, non existing CRLDirectory, invalid RefreshDuration, and the correct one.
func TestFileWatcherCRLProviderConfig(t *testing.T) {
if _, err := MakeFileWatcherCRLProvider(Options{}); err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And let's check that the errors being returned are the expected errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at how tests are organized for credentials/tls/certprovider/pemfile/watcher_test.go I think it's an overkill for the config tests.

// intermediate chains, as well as a chain without CRL for issuer, and check
// that it’s correctly processed. Additionally, we also check if number of
// invocations of custom callback is correct.
func TestFileWatcherCRLProvider(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test verifying the async refreshing goroutine works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet, it'll require playing with the timers so I'm planning to have a separate PR for that (like e2e test)

@@ -136,9 +135,13 @@ func (o *FileWatcherOptions) validate() error {
return fmt.Errorf("advancedtls: CRLDirectory %v is not readable: %v", o.CRLDirectory, err)
}
// Checks related to RefreshDuration.
if o.RefreshDuration < time.Minute {
if o.RefreshDuration <= 0 {
grpclogLogger.Warningf("RefreshDuration is not set or negative: provided value %v, default value %v will be used.", o.RefreshDuration, defaultCRLRefreshDuration)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably not a warning to use the default value...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@erm-g erm-g assigned dfawley and unassigned erm-g Oct 26, 2023
Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except the last nit. And also investigating the race test failure.

@@ -136,7 +136,7 @@ func (o *FileWatcherOptions) validate() error {
}
// Checks related to RefreshDuration.
if o.RefreshDuration <= 0 {
grpclogLogger.Warningf("RefreshDuration is not set or negative: provided value %v, default value %v will be used.", o.RefreshDuration, defaultCRLRefreshDuration)
grpclogLogger.Infof("RefreshDuration is not set or negative: provided value %v, default value %v will be used.", o.RefreshDuration, defaultCRLRefreshDuration)
Copy link
Member

@dfawley dfawley Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't think this necessarily needs to be logged at all? Especially in the case where you're just using the default, anything being logged would be spam. Maybe if you had a negative number? But who would even do such a thing? IMO:

if dur == 0 {
	// use default, silently
} else if dur < 1m {
	// use 1m and info or warning that we're using 1m instead of the illegal value they used
}

See https://github.com/grpc/grpc-go/blob/master/Documentation/log_levels.md for some guidance we try to follow regarding this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@dfawley dfawley assigned erm-g and unassigned dfawley Oct 26, 2023
@erm-g erm-g assigned dfawley and unassigned erm-g Oct 27, 2023
@erm-g erm-g requested a review from dfawley October 27, 2023 00:24
const nonCRLFilesUnderCRLDirectory = 5
nonCRLFilesSet := make(map[string]struct{})
//var callbackMutex sync.Mutex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -191,9 +197,11 @@ func (s) TestFileWatcherCRLProvider(t *testing.T) {
}
})
}
nonCRLMutex.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does calling p.Close here, before the check, not remove the need for the mutex? If not, why not? I would think that would eliminate the concurrency concerns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a very good point - current p.Close impl waits for gouroutines to end. I tested locally and -race is green

@dfawley dfawley assigned erm-g and unassigned dfawley Oct 27, 2023
@@ -141,7 +141,6 @@ func (s) TestFileWatcherCRLProvider(t *testing.T) {
t.Parallel()
const nonCRLFilesUnderCRLDirectory = 5
nonCRLFilesSet := make(map[string]struct{})
//var callbackMutex sync.Mutex
var nonCRLMutex sync.Mutex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So can you delete this now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there is no need to protect this local to the test map anymore. We do the read once provider's goroutine is done

Comment on lines 267 to 268
p.scanMutex.Lock()
defer p.scanMutex.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything similar here? We really shouldn't need mutexes while testing conditions, or else they might be flaky if we lose a race.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm not sure - I need the same instance of Provider for all the table tests, and I need to make the check cmp.Diff(len(p.crls), tt.expectedEntries) before the next test case scan, that's why I use mutex from p.ScanCRLDirectory

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, you're taking the mutex from the thing you're testing.

Can you call p.CRL() instead of directly accessing p.crls then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's doable - I can provide a list of filenames to be scanned ([]string{"2.crl", "3.crl"} and list of CRLs to be found in the map by calling p.CRL()
Some downside - the current approach ensures the map contains 'exactly x' entries (len(p.crls)) but the new one will be about 'at least x' entries. Is that ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, CRL() takes a parameter and you'd need to call it multiple times.

This is OK, then, though I'd prefer stricter validation, i.e. checking the contents of the map, or at least the keys, and not just the length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the refactoring to the stricter validation and I think it looks better now (plus no more calling mutexes inside tests)

Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gtcooke94 might want to take another pass after all the changes, but LGTM for me.

Comment on lines 267 to 268
p.scanMutex.Lock()
defer p.scanMutex.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, CRL() takes a parameter and you'd need to call it multiple times.

This is OK, then, though I'd prefer stricter validation, i.e. checking the contents of the map, or at least the keys, and not just the length.

// are not independent from each other.
func (s) TestFileWatcherCRLProviderDirectoryScan(t *testing.T) {
sourcePath := testdata.Path("crl")
targetPath := createTmpDir(t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we remove this directory once the tests are finished? Or are all the Go versions we support recent enough for us to use https://godoc.corp.google.com/pkg/testing#T.TempDir?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, added the 'remove dir' approach.

@erm-g erm-g merged commit b82468a into grpc:master Oct 31, 2023
14 checks passed
arvindbr8 pushed a commit to arvindbr8/grpc-go that referenced this pull request Nov 7, 2023
)

* rename certificateListExt to CRL

* CRLProvider file

* Add CRLProvider to RevocationConfig

* Beginning refactor of CRL handling

* Shell of StaticCRLProvider

* basic static crl provider test

* use loadCRL helper

* refactor of CRL loading

* Table tests

* Table tests

* Add tests with Static CRL provider

* New certs to be used for CRL tests. Added test for passing and failing connections based on CRL check outcomes

* Main functionality of File Watcher (Directory) CRL provider

* Refactor async go routine, validate() func, add unit tests

* Custom error callback, related unit tests

* Error callback test improvement

* Comments for StaticCRLProvider

* Comments for public API

* go mod tidy

* Comments for tests

* Fix vet errors

* Change Static provider behavior to match C Core, address other PR comments

* Data race fix

* Test helper fn change

* Address PR comments

* Address PR comments (part 2)

* Migration from context to channel for controlling crl reloading goroutine

* Align in-memory CRL updates during directory scan to C++ behavior

* Improve comments for ScanCRLDirectory

* Base test case for Scan CRL Directory file manipulations

* full set of cases for CRL directory content manipulation

* Add comment for table test structure

* Fix for go.mod and go.sum

* Empty directoru workaround

* Delete deprecated crl functionality

* Restoring deprecated crl files

* Fit to grpctest.Tester pattern

* Update readme for crl provider tests

* Address PR comments

* Revert "Restoring deprecated crl files"

This reverts commit 5643760.

* Revert "Resolve conflicts with upstream - deletion of deprecated crl"

This reverts commit e013064, reversing
changes made to 21f4301.

Revert deletion

* Update link for gRFC proposal

* Address PR comments

* Address PR comments part 1

* Address PR comments part 2

* Address PR comments part 3

* Fix for go.mod and go.sum

* Fix comment typo

* Fix for gRFC tag

* Add more details to CRL api  godoc comments.

* Address PR comments

* Address PR comments

* Delete crl_deprecated.go and crl_deprecated_test.go

* Delete testdate/crl/provider/filewatcher directory and .gitignore under it

* Race test fix

* Address PR comments

* Address PR comments

* Refactor directory reloader test from checking size of crl map to querying individual entries approach

* Add extra case for RefreshDuration config test

* Update cpmment for table test structure

* Unexport scan scanCRLDirectory, drop related mutex, update the comments

* Update API comments, clear tmp dir after the tests

---------

Co-authored-by: Gregory Cooke <gregorycooke@google.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: Feature New features or improvements in behavior Type: Security A bug or other problem affecting security
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants