Instability in RDM and RDConnection



Is anyone else experiencing stability issues with RDM today?
I downloaded and installed the latest version to see if that would help, but it has not. For some reason RDM has been acting very strangely today. I use RDM to manage several dozen Windows servers, both on site and on Azure, and I typically have 5-10 remote connections active at any one time. Historically RDM has been a life saver; today, not so much.
When I try to close a connection to a server, RDM seems to hang, and it is very difficult to get it to shut down. The issue seems to be associated with closing an RDP connection. Many times I have had to log out and back on to clear RDM from the task list. At first I thought it might be related to a Microsoft update, as I noticed that most of the servers wanted to reboot after installing updates. In the end I did manage to get the servers updated and rebooted, but the problem has persisted on the fully patched and rebooted servers. I have been forced to manage the servers via Remote Desktop Connection, but even then RDC behaves oddly when terminating a session. Hence my suspicion that the issue might be related to RDP itself.

Any insights or suggestions?

Thanks All

All Comments (3)


Hello

When you say Remote Desktop Connection, do you mean one of Microsoft's RDP clients or perhaps RDCMan (Remote Desktop Connection Manager)?

It sounds like the RDP client is getting hung up on disconnect. We do see those kinds of issues, and RDM is impacted because multiple connections run in the same process, so if the underlying RDP client hangs, it hangs the whole process. mstsc.exe is one process for one connection, so if it gets hung up on disconnect the process can still terminate and it still seems somewhat "normal". I ask about RDCMan because it has a similar process model to RDM.

If I'm right, we typically see that in a couple of situations. The first is when you're using any kind of tunnel or VPN to access the servers. If that's the case, disabling the UDP transport might help, or at least provide a useful data point. Another scenario is resource redirection of a local device or drive. Most recently we had a case where I believe a misbehaving printer or fax driver, combined with printer redirection in the RDP session, caused similar symptoms. This is especially likely if you've added new devices or updated drivers recently. You can try temporarily disabling redirected devices (in the "Local Resources" tab of the session) such as drives, printers, smartcards and so on.
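For testing with the plain Microsoft client outside RDM, the same idea can be sketched as a generated .rdp file with device redirection switched off. This is only an illustration: the helper name `build_rdp_text` and the host `server01.example.com` are made up here, and the property names are the commonly documented .rdp file settings, so check them against your client version before relying on them. It does not touch the UDP transport setting.

```python
def build_rdp_text(host, redirect_devices=False):
    """Return the text of a minimal .rdp file for `host`.

    Hypothetical helper for illustration only. With redirect_devices=False,
    printer, smartcard and drive redirection are all disabled.
    """
    flag = 1 if redirect_devices else 0
    lines = [
        f"full address:s:{host}",
        f"redirectprinters:i:{flag}",
        f"redirectsmartcards:i:{flag}",
        # An empty value redirects no drives; "*" redirects all of them.
        "drivestoredirect:s:" + ("*" if redirect_devices else ""),
    ]
    return "\n".join(lines) + "\n"


# Placeholder host name; substitute one of your own servers.
rdp_text = build_rdp_text("server01.example.com")
print(rdp_text)
```

If the hang disappears with everything disabled, re-enabling one redirection at a time should narrow down which device is involved.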

Finally, if you can reproduce the issue, download Process Explorer and capture a mini dump while RDM is hung (follow the instructions for Process Explorer here). Don't send me the file or post it in public; instead, let me know that you have it and the support team will provide you with a secure upload link. A mini dump often won't provide a clear cause, but in cases like this it can usually yield a "smoking gun".

Sorry for the inconvenience

Kind regards,

Richard Markievicz


Hi Richard,
Thanks for the reply. Yes, as a fallback I was using the Remote Desktop Connection client from Microsoft. I'm guessing that since it only handles one connection at a time, it made the issue I was having yesterday more tolerable than Remote Desktop Manager trying to maintain 10 simultaneous sessions. I think I now better understand the process model: all RDP sessions flow through a single session manager, so if one session hangs, they all might appear to hang. Props for sussing out something else that was in play: my connection to the facility that hosts the servers I manage goes over a branch office VPN. So yes, if the underlying network was misbehaving, that too could help explain the trouble I was having. Most of the servers in question also needed to be updated, so God only knows what OS-level code and drivers were being changed. This morning my RDM sessions have returned to normal, so we're back to smooth sailing. I'm going to play around with disabling UDP as a transport option; having a sketchy VPN drop a UDP packet here and there seems like a real possibility. Good call on that.

Once again thanks for your help and thanks to your team for RDM!!

Keith


Hi Keith

I'm glad things seem to be working better for you. I'll provide some further technical background in case it helps if this issue arises again.

The RDP protocol is "extensible" by a feature called virtual channels. Actually Microsoft extend the core protocol using this feature in their own clients and servers, and that's how they provide features like UDP transport, device redirection and so on. What we've seen in the past is one or more of these virtual channels getting "stuck" on disconnect and preventing the session being torn down gracefully.

If it happens, and it's a core virtual channel (not a third-party one), it's either a bug on Microsoft's side or a faulty device driver interacting badly with the virtual channel itself. Microsoft themselves don't hit these kinds of bugs because, as we discussed, the Remote Desktop Connection (mstsc) application is one session per process, so even if the disconnection can't happen cleanly the process can still terminate.

RDM is different; RDP is (by default) provided by Microsoft's RDP ActiveX control. It shares a lot of code with mstsc, but the architecture enables multiple sessions. Since it's a user control, it must be created and destroyed (at session end) on the main application UI thread. So if it hangs at that point, it hangs the application UI thread and we are frozen. Microsoft don't hit these bugs themselves because they don't use their ActiveX control in any of their first-party clients (the exception here is RDCMan, which was formerly SysInternals and could be considered "second party" I guess).
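The difference between the two process models can be illustrated with a toy simulation. This is not RDM or mstsc code: `close_session` is a hypothetical stand-in for session teardown, a sleep plays the role of a stuck virtual channel, and worker threads loosely approximate the one-process-per-session design.

```python
import threading
import time


def close_session(hang_for):
    """Stand-in for tearing down one session; the sleep simulates a stuck channel."""
    time.sleep(hang_for)


def shared_thread_teardown(sessions):
    """Close every session in turn on one thread, like a single UI thread."""
    start = time.monotonic()
    finished = {}
    for name, hang_for in sessions:
        close_session(hang_for)
        finished[name] = time.monotonic() - start
    return finished


def isolated_teardown(sessions):
    """Close each session on its own worker, so a stuck one delays only itself."""
    start = time.monotonic()
    finished = {}
    lock = threading.Lock()

    def worker(name, hang_for):
        close_session(hang_for)
        with lock:
            finished[name] = time.monotonic() - start

    threads = [threading.Thread(target=worker, args=s) for s in sessions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished


# Made-up session names; the first one "hangs" for half a second on close.
sessions = [("stuck-server", 0.5), ("healthy-a", 0.0), ("healthy-b", 0.0)]
shared = shared_thread_teardown(sessions)
isolated = isolated_teardown(sessions)
print("shared thread:", shared)
print("isolated:     ", isolated)
```

On the shared thread, the healthy sessions cannot finish until the stuck one does; with isolated workers they finish almost immediately. A real hang never returns at all, which is why the whole application appears frozen.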

Indeed, this has come up in the past in relation to UDP over a VPN or Gateway, which is why we have the knowledge base article on disabling UDP. It should make connections more robust, but possibly at the expense of slightly higher latency. It really depends on your network and workflow (full screen video would be affected much more than server administration, for example). As I wrote, the other candidate is device redirection, which can usually only be solved by process of elimination (or we can usually get some relevant clues from a mini dump of the frozen process).

I hope this provides some good insight. Don't hesitate to post back with further issues or questions.

Kind regards,

Richard Markievicz