
tests: "bind: address already in use"
Closed, Resolved (Public)


I have problems running the libssh tests on Ubuntu 18.04. Oddly enough, this happens on my VM but not on my bare-metal Ubuntu 18.04 workstation.
The error at the end (torture.c:232) happens when the test case cannot find the pidfile for the sshd daemon. It looks like there's a bug hiding in the sshd spawning code.

aris@vm1804:~/libssh/build/tests/client$ ctest -R auth -V
UpdateCTestConfiguration  from :/home/aris/libssh/build/tests/client/DartConfiguration.tcl
UpdateCTestConfiguration  from :/home/aris/libssh/build/tests/client/DartConfiguration.tcl
Test project /home/aris/libssh/build/tests/client
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 5
    Start 5: torture_auth

5: Test command: /home/aris/libssh/build/tests/client/torture_auth
5: Environment variables: 
5:  LD_PRELOAD=/usr/lib/
5:  NSS_WRAPPER_PASSWD=/home/aris/libssh/build/tests/etc/passwd
5:  NSS_WRAPPER_SHADOW=/home/aris/libssh/build/tests/etc/shadow
5:  NSS_WRAPPER_GROUP=/home/aris/libssh/build/tests/etc/group
5:  PAM_WRAPPER_SERVICE_DIR=/home/aris/libssh/build/tests/etc/pam.d
5: Test timeout computed to be: 9.99988e+06
5: [==========] Running 19 test(s).
5: OK: SSH-2.0-OpenSSH_7.6p1 Ubuntu-4ubuntu0.3
5: [ RUN      ] torture_auth_none
5: [       OK ] torture_auth_none
5: [ RUN      ] torture_auth_none_nonblocking
5: [       OK ] torture_auth_none_nonblocking
5: [ RUN      ] torture_auth_password
5: [       OK ] torture_auth_password
5: [ RUN      ] torture_auth_password_nonblocking
5: [       OK ] torture_auth_password_nonblocking
5: [ RUN      ] torture_auth_kbdint
5: [       OK ] torture_auth_kbdint
5: [ RUN      ] torture_auth_kbdint_nonblocking
5: [       OK ] torture_auth_kbdint_nonblocking
5: [ RUN      ] torture_auth_autopubkey
5: [       OK ] torture_auth_autopubkey
5: [ RUN      ] torture_auth_autopubkey_nonblocking
5: [       OK ] torture_auth_autopubkey_nonblocking
5: [ RUN      ] torture_auth_agent
5: Agent pid 105096
5: Identity added: /home/aris/libssh/build/tests/home/bob/.ssh/id_rsa (/home/aris/libssh/build/tests/home/bob/.ssh/id_rsa)
5: Could not run the test - check test fixtures
5: [  ERROR   ] torture_auth_agent
5: [ RUN      ] torture_auth_agent_nonblocking
5: bind: Address already in use
5: unix_listener: cannot bind to path: /tmp/test_socket_wrapper_X8sI3D/agent.sock
5: Identity added: /home/aris/libssh/build/tests/home/bob/.ssh/id_rsa (/home/aris/libssh/build/tests/home/bob/.ssh/id_rsa)
5: Could not run the test - check test fixtures
5: [  ERROR   ] torture_auth_agent_nonblocking
5: [ RUN      ] torture_auth_cert
5: [       OK ] torture_auth_cert
5: [ RUN      ] torture_auth_agent_cert
5: bind: Address already in use
5: unix_listener: cannot bind to path: /tmp/test_socket_wrapper_X8sI3D/agent.sock
5: Identity added: /home/aris/libssh/build/tests/home/bob/.ssh/id_rsa (/home/aris/libssh/build/tests/home/bob/.ssh/id_rsa)
5: All identities removed.
5: Identity added: /home/aris/libssh/build/tests/home/bob/.ssh_cert/id_rsa (/home/aris/libssh/build/tests/home/bob/.ssh_cert/id_rsa)
5: Certificate added: /home/aris/libssh/build/tests/home/bob/.ssh_cert/ (torture_auth_carlos)
5: Could not run the test - check test fixtures
5: [  ERROR   ] torture_auth_agent_cert
5: [ RUN      ] torture_auth_agent_cert_nonblocking
5: bind: Address already in use
5: unix_listener: cannot bind to path: /tmp/test_socket_wrapper_X8sI3D/agent.sock
5: Identity added: /home/aris/libssh/build/tests/home/bob/.ssh/id_rsa (/home/aris/libssh/build/tests/home/bob/.ssh/id_rsa)
5: All identities removed.
5: Identity added: /home/aris/libssh/build/tests/home/bob/.ssh_cert/id_rsa (/home/aris/libssh/build/tests/home/bob/.ssh_cert/id_rsa)
5: Certificate added: /home/aris/libssh/build/tests/home/bob/.ssh_cert/ (torture_auth_carlos)
5: Could not run the test - check test fixtures
5: [  ERROR   ] torture_auth_agent_cert_nonblocking
5: [ RUN      ] torture_auth_pubkey_types
5: [       OK ] torture_auth_pubkey_types
5: [ RUN      ] torture_auth_pubkey_types_nonblocking
5: [       OK ] torture_auth_pubkey_types_nonblocking
5: [ RUN      ] torture_auth_pubkey_types_ecdsa
5: [       OK ] torture_auth_pubkey_types_ecdsa
5: [ RUN      ] torture_auth_pubkey_types_ecdsa_nonblocking
5: [       OK ] torture_auth_pubkey_types_ecdsa_nonblocking
5: [ RUN      ] torture_auth_pubkey_types_ed25519
5: [       OK ] torture_auth_pubkey_types_ed25519
5: [ RUN      ] torture_auth_pubkey_types_ed25519_nonblocking
5: [       OK ] torture_auth_pubkey_types_ed25519_nonblocking
5: [  ERROR   ] --- 0xffffffffffffffff == 0xffffffffffffffff
5: [   LINE   ] --- /home/aris/libssh/tests/torture.c:232: error: Failure!
5: [  ERROR   ] tests
5: [==========] 19 test(s) run.
5: [  PASSED  ] 15 test(s).
1/1 Test #5: torture_auth .....................***Failed    0.68 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.68 sec

The following tests FAILED:
	  5 - torture_auth (Failed)
Errors while running CTest

Event Timeline

aris created this task. Oct 16 2019, 10:15 PM
aris claimed this task. Oct 20 2019, 12:58 AM

I found the cause of the problem. The snprintf call that builds the sshd configuration file was hitting the maximum size of its buffer, so the output was truncated, which effectively removed the pidfile setting from the sshd configuration.
I'll push the commit to my review branch tomorrow.

diff --git a/tests/torture.c b/tests/torture.c
index ac4d8d16..155f0f83 100644
--- a/tests/torture.c
+++ b/tests/torture.c
@@ -581,7 +581,7 @@ static void torture_setup_create_sshd_config(void **state, bool pam)
     char rsa_hostkey[1024];
     char ecdsa_hostkey[1024];
     char trusted_ca_pubkey[1024];
-    char sshd_config[2048];
+    char sshd_config[4096];
     char sshd_path[1024];
     const char *additional_config = NULL;
     struct stat sb;
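
For illustration only, here is a minimal sketch of the failure mode described above. It is not the libssh code: `build_sshd_config` and `config_has_pidfile` are hypothetical helpers, and the two-line configuration and the paths are made up. It shows how an undersized buffer silently drops the tail of the generated configuration, and how checking the return value of snprintf would surface the truncation instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from tests/torture.c): format a tiny sshd
 * configuration into buf. Returns 0 on success, -1 if snprintf failed
 * or the output did not fit into the buffer.
 */
static int build_sshd_config(char *buf, size_t size,
                             const char *hostkey, const char *pidfile)
{
    int rc;

    assert(buf != NULL && size > 0);

    rc = snprintf(buf, size,
                  "HostKey %s\n"
                  "PidFile %s\n",
                  hostkey, pidfile);
    /* snprintf returns the length it needed; >= size means truncation. */
    if (rc < 0 || (size_t)rc >= size) {
        return -1;
    }

    return 0;
}

/* Returns 1 if the generated configuration contains a PidFile directive. */
static int config_has_pidfile(const char *config)
{
    return strstr(config, "PidFile ") != NULL;
}
```

With a 24-byte buffer and the host key path `/tmp/ssh_host_rsa_key`, the helper returns -1 and the PidFile line never makes it into the buffer; with a 128-byte buffer it returns 0 and the line is present. Enlarging the buffer, as the diff does, removes the symptom; a return-value check like this one would additionally make any future overflow fail loudly.
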
asn added a subscriber: asn. Oct 25 2019, 11:18 AM

Oh, great catch! :-)

aris added a comment. Nov 4 2019, 10:47 AM

Could you please review my patches from ? There are fixes to this issue but also a testcase for T191.

Jakuje added a subscriber: Jakuje. Nov 4 2019, 12:33 PM

@aris your commits are missing Sign-off. I added my review. I also see many failed pkd tests in the last CI run with Fedora. Are they related to your changes?

If you can reliably reproduce your issues on Ubuntu, would it make sense to add CI targets to make sure we do not regress in future?

aris added a comment. Nov 4 2019, 1:52 PM

Hi @Jakuje, thanks for your review. All your comments are directly actionable, so I'll fix them asap.
The two pkd tests that failed are related to my changes and to bug T191, which I discovered this way, so I think we should merge it and acknowledge that there are unfixed bugs in our tree.
A few CI targets have two more failing tests. I didn't manage to reproduce them, and I'm not sure whether they existed before.
I'll see what I need to do to make Ubuntu one of the CI targets. It makes total sense, because Ubuntu breaks every time I want to catch up on libssh dev :)

@aris I added the Ubuntu CI image, where I can successfully build libssh now:

Can you update your branch to address my comments so that it is mergeable (sign-off, leftovers from merges)?

I just tried your branch and it is still not working for me, but I would like to avoid working on the same thing and wasting time on issues you have already solved.

Update: With the last fix, your branch now passes for me locally in a container with OpenSSL. But the commits still need some love before we can merge them. Will you have some time to work on that in the coming days so we can add the Ubuntu targets?

aris added a comment. Nov 21 2019, 4:23 PM

Hi @Jakuje, I have updated the branch and I think I have addressed all your concerns. I opened a separate ticket, T200, for the remaining problems. I haven't tried the new CI image, as I don't yet understand how to run the GitLab CI tests locally.

Thanks. There are just a few nits I pointed out in the comments. The changes generally look good to me now and ready to merge (after the freeze -- see the email).

I added the Ubuntu target to the .gitlab-ci.yaml and pulled your changes into my branch to see where we stand. At the moment, I also see the rekey test failing, but this is Ubuntu 18.04 (LTS), which might be different from your platform:

The following tests FAILED:
	 39 - torture_rekey (Failed)
	 43 - torture_proxycommand (Failed)

I will check whether it is something I can simply solve; otherwise I will file a new issue for it.

aris added a comment. Nov 21 2019, 4:57 PM

Thanks for your comments. I saw the rekey test fail in other pipelines too. I think we have a probabilistic bug there, unrelated to this issue.

You are probably right. I was not able to reproduce it locally, even with repeated runs in the same container image with the same code.

The log from the CI is not very helpful. It looks like it fails to start the sshd server for whatever reason. But even if it is some probabilistic bug, we should figure out why it fails, either by improving logging to get more helpful logs or by trying to reproduce it more systematically.

I forgot to attach the link to the pipeline, for the record:

FYI, I think you might also have been hitting this issue, which I recently fixed in master: It would be nice if you could check whether the original issue got addressed.

Jakuje closed this task as Resolved. Dec 16 2019, 3:55 PM

Closing, as these symptoms really look the same. If not, please reopen.

Through T200 I will add some Ubuntu targets so that this is also checked in CI.