I've got a collection of web sites that need to send time-sensitive messages to host machines all over my metro area, each on its own generally dynamic IP. Until now, I've been doing this the way of the script kiddie:
- Each host machine runs an (S)FTP or HTTP(S) server and has the corresponding port opened up by its gateway.
- Each host machine runs a program that watches a certain folder and automatically opens, prints, or exec()s any new file of a given extension that shows up (see the sketch after this list). Dynamic IP addresses are accommodated with a dynamic DNS service.
- Each web site does cURL or fsockopen or whatever and communicates directly with its recipient as needed.
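
For concreteness, the watcher piece boils down to roughly this (a simplified Python sketch, not the actual agent code; the folder path, extension, and print command are placeholders):

```python
import os
import subprocess
import time

WATCH_DIR = "/var/spool/incoming"   # placeholder drop folder
EXTENSION = ".msg"                  # placeholder trigger extension

def watch(poll_seconds=2.0):
    seen = set(os.listdir(WATCH_DIR))
    while True:
        time.sleep(poll_seconds)
        current = set(os.listdir(WATCH_DIR))
        for name in sorted(current - seen):
            if name.endswith(EXTENSION):
                path = os.path.join(WATCH_DIR, name)
                # "opens, prints, or exec()s" -- here, send it to the printer
                subprocess.Popen(["lp", path])
        seen = current

if __name__ == "__main__":
    watch()
```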
This approach has been surprisingly reliable; however, obvious issues have come up, and the situation needs to be addressed.
As stated, these messages are time-sensitive, and failures need to be detected within minutes of submission by end users. What I'm doing is building a messaging protocol. It will run on a machine and connection under my control. As far as the service is concerned, there is no distinction between web site and host machine -- there is only one device sending a message to another device.
So that's where I'm at right now. I've got a skeleton server and a skeleton client. They can negotiate high-quality authentication and encryption. The (TCP) connection is persistent and asynchronous, and can handle delimited (i.e., read until \r\n or whatever) as well as length-prefixed (i.e., read exactly n bytes) messages. Unless somebody gives me a better idea, I think I'll handle messages as byte arrays.
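
For illustration, here is roughly what those two read styles look like over blocking Python sockets (a simplified sketch rather than my actual async code; the helper names are made up):

```python
import socket
import struct

def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return buf

def recv_length_prefixed(sock: socket.socket) -> bytes:
    """Length-prefixed style: 4-byte big-endian length, then the payload."""
    (length,) = struct.unpack("!I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)

def recv_delimited(sock: socket.socket, delim: bytes = b"\r\n") -> bytes:
    """Delimited style: read until the delimiter, return the body."""
    buf = b""
    while not buf.endswith(delim):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return buf[:-len(delim)]
```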
So I'm looking for suggestions on how to model the protocol itself -- at the application level. I'll mostly be transferring XML and DLM type files, as well as control messages for things like "handshake" and "is so-and-so online?" and so forth. Is there anything really stupid in my train of thought? Or anything I should read about before I get started? Stuff like that -- please and thanks.
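
To make "byte arrays" concrete, here is one possible envelope I've been toying with -- purely illustrative, and the message-type names are invented:

```python
import struct

def pack_message(msg_type, payload=b""):
    # [1-byte type length][type][4-byte payload length][payload]
    t = msg_type.encode("ascii")
    return struct.pack("!B", len(t)) + t + struct.pack("!I", len(payload)) + payload

def unpack_message(frame):
    tlen = frame[0]
    msg_type = frame[1:1 + tlen].decode("ascii")
    (plen,) = struct.unpack("!I", frame[1 + tlen:5 + tlen])
    return msg_type, frame[5 + tlen:5 + tlen + plen]

# e.g. a presence query: pack_message("PING", b"host-42")
# and a file transfer:   pack_message("FILE", xml_bytes)
```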
Update:
@mrdenny's is the approach I ultimately went with, so he gets the answer. @Henrik's ZeroMQ suggestion applied as well, but I basically had that coded already, and swapping my code for a third-party framework didn't really help with designing the application layer. In the end, I discovered just how incredibly versatile HTTP can be, and there is really no need for a roll-your-own protocol. Simply have the web sites serve content-type application/json (or xml if necessary) in addition to the text/html they were already serving, and have recipients make outbound web requests instead of listening for and responding to filesystem updates. This removes all of the "script kiddie" overhead described above, works much more reliably, enables much better error handling, is easy to build, and more.
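
For anyone curious, the recipient side now boils down to a polling loop like this (a rough sketch; the endpoint URL, JSON shape, and poll interval are placeholders):

```python
import json
import time
import urllib.request

QUEUE_URL = "https://example.com/messages/pending"  # hypothetical endpoint

def poll_once():
    req = urllib.request.Request(QUEUE_URL, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        messages = json.load(resp)  # expects a JSON array of messages
    for msg in messages:
        handle(msg)

def handle(msg):
    # open, print, exec(), etc. -- whatever the old folder watcher did
    print("got message:", msg.get("id"))

if __name__ == "__main__":
    while True:
        poll_once()
        time.sleep(30)  # outbound-only polling; no inbound ports required
```

Because the recipients only ever make outbound requests, there are no open inbound ports, no port forwarding, and no dynamic DNS to maintain.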
