OpenSSH pts fun

So… I came across a whole new bug today that had me scratching my head for quite a while. One of my servers at work started hanging during ssh logins completely out of the blue. Logging in with the extra-verbose flags ‘-v -v’ didn’t give any clues, and I just assumed something had happened with our network auth system.

More investigation showed I could still execute commands via ssh, but any login attempt (even using public keys) would hang until I manually killed the process. I managed to run ‘tail /var/log/secure’ and saw that login attempts were failing with this error:
openpty: No such file or directory
session_pty_req: session 0 alloc failed

Googling for that turned up all sorts of posts about the move to the 2.6 kernel (which changed how devpts was handled), broken dev file systems and other equally unhelpful suggestions. Finally I just went ahead and ran strace on the process and saw:
open("/dev/ptmx", O_RDWR) = -1 EIO (Input/output error)
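
For anyone wanting to check for the same failure without reaching for strace, here is a minimal sketch in Python that tries to allocate a pseudo-terminal much the way a login session would. The function name `try_openpty` is made up for illustration, and it relies on the standard library's `os.openpty()` rather than opening /dev/ptmx directly, so the exact error it reports may differ from the strace line above.

```python
import errno
import os


def try_openpty():
    """Try to allocate a pseudo-terminal pair; return (ok, detail)."""
    try:
        master_fd, slave_fd = os.openpty()
    except OSError as exc:
        # On an affected box this is where an error such as EIO or ENOENT
        # would be expected to surface (an assumption, not verified here).
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, f"openpty failed: {name} ({exc.strerror})"
    try:
        try:
            tty = os.ttyname(slave_fd)
        except OSError:
            tty = "a pty (name unavailable)"
        return True, f"allocated {tty}"
    finally:
        # Always release the pair so the check itself does not hold a pty.
        os.close(slave_fd)
        os.close(master_fd)


if __name__ == "__main__":
    ok, detail = try_openpty()
    print(("OK: " if ok else "FAIL: ") + detail)
```

Since plain command execution over ssh still worked in this case, something like this could be run remotely as a one-off command to see whether pty allocation is what is broken.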

I did some more digging, spoke with some folks around the office, and discovered it was an old(-ish) kernel bug where the ptmx device no longer returns the correct number for the next pseudo-terminal and instead simply returns ‘0’. I was able to confirm this by killing the process attached to pty0, after which I could log in. However, once I did, my own session blocked any other logins…

The other ‘fix’ for the issue was to kill all sessions attached to a pty, then umount devpts and mount it again. However, it seemed most prudent to just reboot into a newer kernel where the bug is (hopefully) fixed.
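
Before killing sessions for the remount route, it helps to know which processes actually hold a pty open. The following is a rough sketch, assuming a Linux /proc layout with per-process `fd` directories. The function name `pts_holders` and its `proc_root` parameter are hypothetical, and it only lists the holders; the kill and the umount/mount steps are left to be done by hand.

```python
import os


def pts_holders(proc_root="/proc"):
    """Map PID -> sorted list of /dev/pts/* paths that process has open."""
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to look
        found = set()
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were scanning
            if target.startswith("/dev/pts/"):
                found.add(target)
        if found:
            holders[int(entry)] = sorted(found)
    return holders


if __name__ == "__main__":
    for pid, ptys in sorted(pts_holders().items()):
        print(pid, ", ".join(ptys))
```

Run as root so that other users' processes are visible; as an unprivileged user it will only see your own sessions.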

[tags]OpenSSH,RHEL,Kernel[/tags]

4 thoughts on “OpenSSH pts fun”

  1. I found this while searching for help with the same symptoms. However, rebooting does not fix the problem. The problem showed up on a system which had been running for at least several days and had not been updated or had any configuration changes that I remember just before it began failing. I had ssh’d in without problems just five minutes before.

  2. Experienced this same problem. The machine had recently been built with the latest RHEL 4ES and updated to the latest fixes. I had been working on the machine all day with multiple logins, and all of a sudden I could no longer establish any new logins and had the same messages in the secure log file that you found. I tried your fix of unmounting and remounting /dev/pts and it’s working now. I guess it’s possible Red Hat may have provided new updates with the old bug??

  3. We have been struggling with this bug since we upgraded to 2.6 (RHEL). We’ve had an open support ticket with Red Hat since March 2006 but have gotten nothing other than an admission last month that there was a bug, though they didn’t know where. We can reproduce the problem, and the only fix is a reboot. We believe it is caused by how pseudo-terminals are released or re-used when a parent process is killed abnormally (clicking the window button instead of ‘exit’) and then the child is likewise terminated. Although we can reproduce the failure (“tty problem”), the behavior is erratic and seems to depend upon which pts are in use when the tty problem is forced. Our 2 or 3 line (depending on the programmer) program is written in Lahey-Fujitsu Fortran F95.
    …Kristi
