Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(pam/integration-tests): Improve reliability of PAM integration tests and actually run them in race mode #560

Merged
merged 31 commits into from
Oct 2, 2024

Conversation

3v1n0
Copy link
Collaborator

@3v1n0 3v1n0 commented Sep 30, 2024

Since we rely a lot on sleeping on the integration tests, due to the fact we can't yet go with charmbracelet/vhs#257, we need to be able to tune this value depending on the context we're running the tests.

In general sanitizers slows down the runtime quite a bit (especially the thread one), so make possible to use dynamic sleep times values depending on variables that we adjust depending on the context we're running in.
Tests can now slowed up/down using AUTHD_TESTS_SLEEP_MULTIPLIER variable too (this can be used for helping local testing too, e.g. personally I can reliably run the tests faster in my machine - with no failures - using AUTHD_TESTS_SLEEP_MULTIPLIER=0.3; but also it helps to quickly "fix" slower machines).

Tune the tape files so that we don't miss some changing contents (as in the MFA/QrCode tests) and make the example broker to be a bit lazier.

Then, I noticed that we were not actually running the integration tests in -race mode, so fix this. Indeed this implies slower tests, but at least now we're fully checking for races both the daemon and the client.

This worked fine in various builds both in my fork and in a private repo fork I did where the builders are way slower than the public ones.


@adombeck please check if this improves the results in your local environment too, where you had failures

UDENG-4793

@3v1n0 3v1n0 requested a review from a team as a code owner September 30, 2024 00:33
@codecov-commenter
Copy link

codecov-commenter commented Sep 30, 2024

Codecov Report

Attention: Patch coverage is 72.41379% with 16 lines in your changes missing coverage. Please review.

Project coverage is 84.72%. Comparing base (1ba9bce) to head (db2664e).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
examplebroker/broker.go 65.95% 14 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #560      +/-   ##
==========================================
- Coverage   84.79%   84.72%   -0.07%     
==========================================
  Files          79       79              
  Lines        6963     6987      +24     
  Branches       75       75              
==========================================
+ Hits         5904     5920      +16     
- Misses        739      746       +7     
- Partials      320      321       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@denisonbarbosa denisonbarbosa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big one, although the review is quite small. I have some nitpicks, but nothing blocking so I'll leave my approval already. Nice work! Hopefully our tests get more stable now!

It's so frustrating to see how much we need to work around because we don't have the "wait for output" feature in vhs yet, but well... It's not something we can control, right? 🥲

examplebroker/broker.go Outdated Show resolved Hide resolved
pam/integration-tests/cli_test.go Outdated Show resolved Hide resolved
@adombeck
Copy link
Contributor

@adombeck please check if this improves the results in your local environment too, where you had failures

when I run it without setting AUTHD_TESTS_SLEEP_MULTIPLIER it's still flaky. It passed in the first run but in a second run it failed with the usual golden file mismatch error:

    --- FAIL: TestNativeAuthenticate/Authenticate_user_with_mfa (28.09s)
        native_test.go:126: 
            	Error Trace:	/home/adombeck/projects/authd/pam/integration-tests/native_test.go:126
            	Error:      	Not equal: [...]
            	Test:       	TestNativeAuthenticate/Authenticate_user_with_mfa
            	Messages:   	Output of tape "mfa_auth" does not match golden file
[...]
FAIL
FAIL	github.com/ubuntu/authd/pam/integration-tests	152.044s

I then tried running it with AUTHD_TESTS_SLEEP_MULTIPLIER=0.3. After about 3 minutes, it started using up all the 42 GiB of RAM on my system + the 64 GiB of swap and eventually caused my system to freeze.

I'll now try with AUTHD_TESTS_SLEEP_MULTIPLIER=1.5 and report back.

@3v1n0
Copy link
Collaborator Author

3v1n0 commented Sep 30, 2024

FAIL: TestNativeAuthenticate/Authenticate_user_with_mfa (28.09s)

Do you have the generated golden file from that log?

It should be in /tmp/authd-test-artifacts/ if you didn't preserve the log diff itself

@adombeck
Copy link
Contributor

I'll now try with AUTHD_TESTS_SLEEP_MULTIPLIER=1.5 and report back.

It failed:

--- FAIL: TestNativeAuthenticate (4.47s)
    native_test.go:20: Compiling PAM Wrapper library at /tmp/TestNativeAuthenticate2034516298/002/pam_authd_exectestnativeauthenticate.so
    native_test.go:20: Running compiler command: /usr/bin/cc -o /tmp/TestNativeAuthenticate2034516298/002/pam_authd_exectestnativeauthenticate.so ./pam/go-exec/module.c -Wall -Werror -g3 -O0 -DAUTHD_TEST_MODULE=1 -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/gio-unix-2.0 -pthread -Wl,--as-needed -Wl,--allow-shlib-undefined -shared -fPIC -Wl,--unresolved-symbols=report-all -Wl,-soname,pam_authd_exectestnativeauthenticate -lpam -lgio-2.0 -lgobject-2.0 -lglib-2.0
    native_test.go:20: Compiling Exec client at /tmp/TestNativeAuthenticate2034516298/003/authd-pam
    native_test.go:20: go build -C pam -gcflags=-dwarflocationlists=true
    --- FAIL: TestNativeAuthenticate/Authenticate_user_and_add_it_to_local_group (18.95s)
        native_test.go:128: 
            	Error Trace:	/home/adombeck/projects/authd/internal/users/localgroups/testutils/gpasswd.go:136
            	            				/home/adombeck/projects/authd/pam/integration-tests/native_test.go:128
            	Error:      	Not equal: 
            	            	expected: "--add user-local-groups localgroup"
            	            	actual  : "--add user-local-groups localgroup\n--add user-local-groups localgroup"
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1 +1,2 @@
            	            	 --add user-local-groups localgroup
            	            	+--add user-local-groups localgroup
            	Test:       	TestNativeAuthenticate/Authenticate_user_and_add_it_to_local_group
            	Messages:   	IsAuthenticated should return the expected combined data, but did not
--- FAIL: TestCLIAuthenticate (4.82s)
    cli_test.go:27: Compiling PAM Wrapper library at /tmp/TestCLIAuthenticate1888500761/002/pam_authd_exectestcliauthenticate.so
    cli_test.go:27: Running compiler command: /usr/bin/cc -o /tmp/TestCLIAuthenticate1888500761/002/pam_authd_exectestcliauthenticate.so ./pam/go-exec/module.c -Wall -Werror -g3 -O0 -DAUTHD_TEST_MODULE=1 -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/gio-unix-2.0 -pthread -Wl,--as-needed -Wl,--allow-shlib-undefined -shared -fPIC -Wl,--unresolved-symbols=report-all -Wl,-soname,pam_authd_exectestcliauthenticate -lpam -lgio-2.0 -lgobject-2.0 -lglib-2.0
    cli_test.go:27: Compiling Exec client at /tmp/TestCLIAuthenticate1888500761/003/authd-pam
    cli_test.go:27: go build -C pam -gcflags=-dwarflocationlists=true
    --- FAIL: TestCLIAuthenticate/Authenticate_user_and_add_it_to_local_group (15.59s)
        cli_test.go:112: 
            	Error Trace:	/home/adombeck/projects/authd/internal/users/localgroups/testutils/gpasswd.go:136
            	            				/home/adombeck/projects/authd/pam/integration-tests/cli_test.go:112
            	Error:      	Not equal: 
            	            	expected: "--add user-local-groups localgroup"
            	            	actual  : "--add user-local-groups localgroup\n--add user-local-groups localgroup"
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1 +1,2 @@
            	            	 --add user-local-groups localgroup
            	            	+--add user-local-groups localgroup
            	Test:       	TestCLIAuthenticate/Authenticate_user_and_add_it_to_local_group
            	Messages:   	IsAuthenticated should return the expected combined data, but did not

Do you have the generated golden file from that log?

It should be in /tmp/authd-test-artifacts/ if you didn't preserve the log diff itself

No, I don't have the golden files from that run. It's not in /tmp either because my system froze afterwards (when I ran with UTHD_TESTS_SLEEP_MULTIPLIER=0.3) and I had to reboot. I can try to reproduce the error and provide you the complete log if that helps.

@adombeck
Copy link
Contributor

I can try to reproduce the error and provide you the complete log if that helps.

Here is another error from running without AUTHD_TESTS_SLEEP_MULTIPLIER:

    --- FAIL: TestNativeAuthenticate/Exit_if_user_is_not_pre-checked_on_ssh_service (2.98s)
        native_test.go:122: 
            	Error Trace:	/home/adombeck/projects/authd/pam/integration-tests/native_test.go:122
            	Error:      	Received unexpected error:
            	            	exit status 1
            	Test:       	TestNativeAuthenticate/Exit_if_user_is_not_pre-checked_on_ssh_service
            	Messages:   	Failed to run tape "local_ssh": exit status 1: File: /tmp/TestNativeAuthenticateExit_if_user_is_not_pre-checked_on_ssh_service2432860924/001/local_ssh
            	            	Host your GIF on vhs.charm.sh: vhs publish <file>.gif
            	            	could not launch browser: [launcher] Failed to get the debug url: Failed to open /dev/null                             
            	            	[280538:280538:0930/164144.200144:FATAL:zygote_host_impl_linux.cc(207)] Check failed: . : No such file or directory (2)
            	            	[0930/164144.206417:ERROR:ptracer.cc(605)] ptrace: Input/output error (5)                                              
            	            	[0930/164144.330607:ERROR:elf_dynamic_array_reader.h(64)] tag not found                                                
            	            	                                                                                                                       
            	            	recording failed

Full test logs: gotest.log

@3v1n0
Copy link
Collaborator Author

3v1n0 commented Sep 30, 2024

Ok the Authenticate_user_and_add_it_to_local_group issues are not due to integration tests but I think we had race due to the fact we may try to rely on filesystem content that may not have been yet updated or vice versa, 6d3afdd should help with it.

@3v1n0
Copy link
Collaborator Author

3v1n0 commented Sep 30, 2024

--- FAIL: TestNativeAuthenticate/Exit_if_user_is_not_pre-checked_on_ssh_service (2.98s)

Not our bug: vhs (or the called chrome/chromium) doesn't seem to start, so not something we can fix.

@adombeck
Copy link
Contributor

Another failure:

    --- FAIL: TestNativeAuthenticate/Authenticate_user_on_ssh_service_with_custom_name_and_connection_env (3.51s)
        native_test.go:122: 
            	Error Trace:	/home/adombeck/projects/authd/pam/integration-tests/native_test.go:122
            	Error:      	Received unexpected error:
            	            	exit status 2
            	Test:       	TestNativeAuthenticate/Authenticate_user_on_ssh_service_with_custom_name_and_connection_env
            	Messages:   	Failed to run tape "simple_ssh_auth": exit status 2: File: /tmp/TestNativeAuthenticateAuthenticate_user_on_ssh_service_with_custom_name_and_connection_env3717920695/001/simple_ssh_auth
            	            	Host your GIF on vhs.charm.sh: vhs publish <file>.gif
            	            	Output .txt simple_ssh_auth.txt
            	            	Output .gif simple_ssh_auth.gif
            	            	Set Width 800
            	            	Set Height 500
            	            	Set FontFamily Monospace
            	            	Set FontSize 13
            	            	Set Padding 0
            	            	Set Margin 0
            	            	Set Shell bash
            	            	panic: write tcp 127.0.0.1:39266->127.0.0.1:36905: use of closed network connection
            	            	
            	            	goroutine 1 [running]:
            	            	github.com/go-rod/rod/lib/utils.init.func2({0xf65600?, 0xc0003dabe0?})
            	            		/home/adombeck/go/pkg/mod/github.com/go-rod/[email protected]/lib/utils/utils.go:69 +0x1d
            	            	main.(*VHS).Start.New.(*Browser).WithPanic.genE.func1({0xc00004bb40?, 0xc0003da960?, 0x14f9f00?})
            	            		/home/adombeck/go/pkg/mod/github.com/go-rod/[email protected]/must.go:36 +0x62
            	            	github.com/go-rod/rod.(*Page).MustSetViewport(0xc00023e160, 0x100006ff1f0?, 0x724b27282f88?, 0x10?, 0xb8?)
            	            		/home/adombeck/go/pkg/mod/github.com/go-rod/[email protected]/must.go:313 +0xe2
            	            	main.(*VHS).Setup(0xc00070e7e0)
            	            		/home/adombeck/go/pkg/mod/github.com/charmbracelet/[email protected]/vhs.go:166 +0x7f
            	            	main.Evaluate({0x14ff778, 0xc0003e5400}, {0xc000264000, 0x19a}, {0x14f95c0, 0xc00008a058}, {0xc0006ffc30, 0x1, 0x0?})
            	            		/home/adombeck/go/pkg/mod/github.com/charmbracelet/[email protected]/evaluator.go:79 +0x8f1
            	            	main.init.func6(0x1c0d4c0, {0xc0005de110, 0x1, 0x101e816?})
            	            		/home/adombeck/go/pkg/mod/github.com/charmbracelet/[email protected]/main.go:99 +0x44e
            	            	github.com/spf13/cobra.(*Command).execute(0x1c0d4c0, {0xc000036050, 0x1, 0x1})
            	            		/home/adombeck/go/pkg/mod/github.com/spf13/[email protected]/command.go:983 +0xaca
            	            	github.com/spf13/cobra.(*Command).ExecuteC(0x1c0d4c0)
            	            		/home/adombeck/go/pkg/mod/github.com/spf13/[email protected]/command.go:1115 +0x3ff
            	            	github.com/spf13/cobra.(*Command).Execute(...)
            	            		/home/adombeck/go/pkg/mod/github.com/spf13/[email protected]/command.go:1039
            	            	github.com/spf13/cobra.(*Command).ExecuteContext(...)
            	            		/home/adombeck/go/pkg/mod/github.com/spf13/[email protected]/command.go:1032
            	            	main.main()
            	            		/home/adombeck/go/pkg/mod/github.com/charmbracelet/[email protected]/main.go:250 +0xc9

Full test logs: gotest.log

@3v1n0
Copy link
Collaborator Author

3v1n0 commented Sep 30, 2024

@adombeck looks like another vhs upstream issue, please report it there.

@3v1n0 3v1n0 force-pushed the race-integration-tests branch 2 times, most recently from 1f884b2 to db2664e Compare September 30, 2024 18:28
@3v1n0
Copy link
Collaborator Author

3v1n0 commented Oct 1, 2024

@adombeck, @denisonbarbosa: If there are no further blockers I'd proceed merging this (later tomorrow/today) so that I'm not blocked in further cleanups branch I've + other tests.

This seems more stable in CI and locally to me, but if there further failures I'm happy to analyze them (in general they require just some extra sleeps here and there, sadly)

We repeat the same for all the files and it's hard to define specific
values, so avoid repeating the same for each file but instead generate
the contents programmatically so that it can be changed per test.

This generalization makes adding new tests easier and will permit to
change the tape data before running it.
We were doing this manually anyways, so just do it forever so that we
don't forget when the function is re-used
Depending on the testing conditions we may need to change the sleeping
time, since VHS doesn't yet support a wait-for feature, it's just better
to make possible to define the sleep programmatically so that we can
change it at once without having to refactor all the tape files
We were not waiting enough to show the view again, causing the result
arriving too early and not being printed in a new line as expected
This makes the test more reliable when we are in a slower context
… change

We need the broker to do some work so let's wait more
…izer

Instead of relying on cgo, we can just get from go if asan support is
enabled, so rely on the build parameter instead of calling the C code.

This can't be done everywhere though, as the value isn't set when
running the gdm model tests and so we still need to keep around
pam_test.IsAddressSanitizerActive() to be used there
If the thread sanitizer is enabled we may take different paths, so
enable the detection of it as we do with ASAN
We are running integration tests in race mode, but we don't actually run
the binaries compiled with thread sanitizer in such case since it was
not detected.

Fix this by adding the -race flag if the tests are running in such mode
The integration tests relying on sleeping, and so they become quite
unreliable when the environment is not optimal.

To prevent unexpected failures, make possible to define a sleep
multiplier via an environment variable and define some default values
depending whether we're building with address or thread sanitizers,
since these builds are slower
Tests may take longer time, so avoid issues in slower builders
While -dwarflocationlists is enabled by default, we may want to disable
further optimizations in the PAM cflags build
…own in tests

We have various MFA based tests where we wait for a new method selection
for being shown, but we used too long sleeps in the tape files compared
to the one of the broker, so some stages were not reflected by the
golden files.

Increase the sleep time on the broker side so that the client has time
to show the items
We are always accessing to the example users map to check if the user
is tracked, so just lock the map all the times and not just on specific
cases.

Fixes the race:

WARNING: DATA RACE
Read at 0x00c000253680 by goroutine 335:
	runtime.mapaccess2_faststr()
		/opt/hostedtoolcache/go/1.23.0/x64/src/runtime/map_faststr.go:117 +0x0
	github.com/ubuntu/authd/examplebroker.(*Broker).handleIsAuthenticated()
		/home/runner/work/authd/authd/examplebroker/broker.go:679 +0x1cbd
	github.com/ubuntu/authd/examplebroker.(*Broker).IsAuthenticated()
		/home/runner/work/authd/authd/examplebroker/broker.go:530 +0x412
	github.com/ubuntu/authd/examplebroker.(*Bus).IsAuthenticated()
		/home/runner/work/authd/authd/examplebroker/dbus.go:106 +0x91
	runtime.call64()
		/opt/hostedtoolcache/go/1.23.0/x64/src/runtime/asm_amd64.s:777 +0x42
	reflect.Value.Call()
		/opt/hostedtoolcache/go/1.23.0/x64/src/reflect/value.go:365 +0xb5
	github.com/godbus/dbus/v5.exportedMethod.Call()
		/home/runner/go/pkg/mod/github.com/godbus/dbus/[email protected]/default_handler.go:128 +0x239
	github.com/godbus/dbus/v5.(*exportedMethod).Call()
		<autogenerated>:1 +0x84
	github.com/godbus/dbus/v5.(*Conn).handleCall()
		/home/runner/go/pkg/mod/github.com/godbus/dbus/[email protected]/export.go:193 +0x6f2
	github.com/godbus/dbus/v5.(*Conn).inWorker.gowrap1()
		/home/runner/go/pkg/mod/github.com/godbus/dbus/[email protected]/conn.go:435 +0x44

Previous write at 0x00c000253680 by goroutine 350:
	runtime.mapassign_faststr()
		/opt/hostedtoolcache/go/1.23.0/x64/src/runtime/map_faststr.go:223 +0x0
	github.com/ubuntu/authd/examplebroker.(*Broker).handleIsAuthenticated()
		/home/runner/work/authd/authd/examplebroker/broker.go:655 +0x128b
	github.com/ubuntu/authd/examplebroker.(*Broker).IsAuthenticated()
		/home/runner/work/authd/authd/examplebroker/broker.go:530 +0x412
	github.com/ubuntu/authd/examplebroker.(*Bus).IsAuthenticated()
		/home/runner/work/authd/authd/examplebroker/dbus.go:106 +0x91
	runtime.call64()
		/opt/hostedtoolcache/go/1.23.0/x64/src/runtime/asm_amd64.s:777 +0x42
	reflect.Value.Call()
		/opt/hostedtoolcache/go/1.23.0/x64/src/reflect/value.go:365 +0xb5
	github.com/godbus/dbus/v5.exportedMethod.Call()
		/home/runner/go/pkg/mod/github.com/godbus/dbus/[email protected]/default_handler.go:128 +0x239
	github.com/godbus/dbus/v5.(*exportedMethod).Call()
		<autogenerated>:1 +0x84
	github.com/godbus/dbus/v5.(*Conn).handleCall()
		/home/runner/go/pkg/mod/github.com/godbus/dbus/[email protected]/export.go:193 +0x6f2
	github.com/godbus/dbus/v5.(*Conn).inWorker.gowrap1()
		/home/runner/go/pkg/mod/github.com/godbus/dbus/[email protected]/conn.go:435 +0x44

Goroutine 335 (running) created at:
	github.com/godbus/dbus/v5.(*Conn).inWorker()
		/home/runner/go/pkg/mod/github.com/godbus/dbus/[email protected]/conn.go:435 +0x46a
	github.com/godbus/dbus/v5.(*Conn).Auth.gowrap1()
		/home/runner/go/pkg/mod/github.com/godbus/dbus/[email protected]/auth.go:118 +0x33

Goroutine 350 (finished) created at:
	github.com/godbus/dbus/v5.(*Conn).inWorker()
		/home/runner/go/pkg/mod/github.com/godbus/dbus/[email protected]/conn.go:435 +0x46a
	github.com/godbus/dbus/v5.(*Conn).Auth.gowrap1()
		/home/runner/go/pkg/mod/github.com/godbus/dbus/[email protected]/auth.go:118 +0x33
Permit IsAuthenticated being called concurrently by reading the user
info just once, instead of making subsequent calls being blocked waiting
for the user info being read
… duration

Add a sleep variable matching the default example broker sleep value so
that we don't have to guess it in the tape files but it's clearly
related
We are relying on local groups content in order to call gpasswd, but
in case there are two concurrent writes, the gpasswd file content we're
reading may not be updated, causing us to call --add or --remove twice
for the same content.

So use a mutex to avoid that we're trying to read from the filesystem
before that the current write operation has been completed.
We were sleeping after typing and before actually submitting the new
password, leading to potential failures.

Do the other way around instead since it's after we hit Enter that we
could have slowdowns
@3v1n0 3v1n0 merged commit e304624 into ubuntu:main Oct 2, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants