
Regression introduced in commit e0e3636c60dd28fd358e47da2be132702ff8edc9 breaks libnetconf2
Closed, Resolved (Public)

Description

libnetconf2 uses libssh to handle the SSH protocol when talking to NETCONF servers. Someone reported that while libssh-0.9.2 worked fine, upgrading to libssh-0.9.3 prevents [netopeer2-cli](https://github.com/CESNET/Netopeer2/tree/legacy/cli) from connecting to any NETCONF server over SSH (note: use the legacy branches of that software stack when trying to reproduce).

I've run a git bisection, tracing this down to commit e0e3636c60. The last commit which works is 36bdcb85. I've also tried current master (12d5c136), and that one also does not work with netopeer2-cli. What puzzles me is that the commit appears to be reasonable -- the return value of ssh_buffer_get_len is uint32_t, so putting it into a size_t sounds OK. Here's how libnetconf2 behaves when we hit that problem:

```
Breakpoint 1, nc_recv_client_hello_io (session=0x61100000ff40) at /home/jkt/work/cesnet/gerrit/CzechLight/cla-sysrepo/submodules/dependencies/libnetconf2/src/session.c:1261
1261            ERR("Server's <hello> timeout elapsed.");
(gdb) bt
#0  nc_recv_client_hello_io (session=0x61100000ff40) at /home/jkt/work/cesnet/gerrit/CzechLight/cla-sysrepo/submodules/dependencies/libnetconf2/src/session.c:1261
#1  0x00007ffff662ebe6 in nc_handshake_io (session=0x61100000ff40) at /home/jkt/work/cesnet/gerrit/CzechLight/cla-sysrepo/submodules/dependencies/libnetconf2/src/session.c:1350
#2  0x00007ffff665ea73 in nc_connect_ssh (host=0x602000009850 "line-qc8b", port=830, ctx=0x60d000002e90)
    at /home/jkt/work/cesnet/gerrit/CzechLight/cla-sysrepo/submodules/dependencies/libnetconf2/src/session_client_ssh.c:1757
#3  0x0000555555570f81 in cmd_connect_listen_ssh (cmd=0x7fffffffd2c0, is_connect=1)
    at /home/jkt/work/cesnet/gerrit/CzechLight/cla-sysrepo/submodules/dependencies/Netopeer2/cli/commands.c:1533
#4  0x0000555555578eeb in cmd_connect_listen (arg=0x606000003f20 "connect --ssh --login dwdm --host line-qc8b --port 830", is_connect=1)
    at /home/jkt/work/cesnet/gerrit/CzechLight/cla-sysrepo/submodules/dependencies/Netopeer2/cli/commands.c:2687
#5  0x0000555555579117 in cmd_connect (arg=0x606000003f20 "connect --ssh --login dwdm --host line-qc8b --port 830", UNUSED_tmp_config_file=0x7fffffffd5d0)
    at /home/jkt/work/cesnet/gerrit/CzechLight/cla-sysrepo/submodules/dependencies/Netopeer2/cli/commands.c:2722
#6  0x0000555555568875 in main () at /home/jkt/work/cesnet/gerrit/CzechLight/cla-sysrepo/submodules/dependencies/Netopeer2/cli/main.c:200
```

I went looking through the code a bit more, and I *think* that libnetconf2 uses libssh correctly; the timeout for this function appears to be high enough (60 seconds...), while the poll seems to return immediately.

Can you please help me investigate this further?

Event Timeline

jktjkt created this task. Jan 22 2020, 7:14 PM
Jakuje added a subscriber: Jakuje. Jan 22 2020, 8:34 PM

This is a follow-up to the mailing list thread [1], where we addressed one issue, but the second one got lost and slipped off my radar.

Having had a better look, this commit does indeed change behavior, even though that is not well documented in the doxygen comments. ssh_handle_packets_termination() can return SSH_AGAIN, which is now propagated as the return value of ssh_channel_poll_timeout(), whereas previously it was overwritten with SSH_OK.
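
The effect can be sketched in isolation. This is a minimal illustration under stated assumptions, not libssh's actual code: `demo_handle_packets`, `demo_poll_before` and `demo_poll_after` are hypothetical names, and the stub always reports "try again" to mimic a wait that ended without new data. Only the numeric values of the return codes are taken from libssh's public `SSH_OK`, `SSH_ERROR` and `SSH_AGAIN` constants.

```c
#include <stddef.h>
#include <stdint.h>

/* Values match libssh's public SSH_OK, SSH_ERROR and SSH_AGAIN constants. */
enum { DEMO_OK = 0, DEMO_ERROR = -1, DEMO_AGAIN = -2 };

/* Hypothetical stand-in for ssh_handle_packets_termination():
 * pretend the wait ended without completing, so it reports DEMO_AGAIN. */
static int demo_handle_packets(void)
{
    return DEMO_AGAIN;
}

/* "Before" shape: the buffer length is stored in rc itself, so an earlier
 * DEMO_AGAIN is overwritten; an empty buffer yields 0, i.e. DEMO_OK. */
int demo_poll_before(uint32_t buffered_len)
{
    int rc = demo_handle_packets();
    if (rc == DEMO_ERROR) {
        return DEMO_ERROR;
    }
    rc = (int)buffered_len;
    return rc;
}

/* "After" shape: the length lives in its own size_t, so rc keeps whatever
 * demo_handle_packets() returned when the buffer is empty. */
int demo_poll_after(uint32_t buffered_len)
{
    int rc = demo_handle_packets();
    size_t len;
    if (rc == DEMO_ERROR) {
        return DEMO_ERROR;
    }
    len = buffered_len;
    if (len > 0) {
        return (int)len;
    }
    return rc;
}
```

With an empty buffer, `demo_poll_before(0)` returns 0 while `demo_poll_after(0)` returns -2. A caller that treats every negative return value as a failure would therefore behave differently after the change, even though the type change itself looks harmless.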

I will try to check this further, but a simple reproducer would be very helpful. I have not yet been able to find the relevant call in your code.

[1] https://www.libssh.org/archive/libssh/2020-01/0000002.html

asn added a comment. Jan 23 2020, 11:05 AM

Thank you very much for tracking this down. Could you please test the latest stable-0.9 branch which includes a fix for this?