First off, I took a bit of a hiatus so forgive the long absence. I am back now, and ready to share some interesting things I've seen during my "sabbatical". I just completed O365 hybridization with a customer’s Exchange 2013 CU17 environment, and during my testing phase is where I started experiencing this problem...
I could send emails FROM Exchange Online to Exchange On-Prem, but I couldn't send emails from On-Prem to Exchange Online. The first thing I did was enable Verbose logging on my send connector to O365, and tried to recreate the issue. I sent my test emails, and I could see they moved to the "Sent Items" in Outlook as expected, however, the emails were just sitting in the Exchange Message Queue on-prem.
When I manually re-tried the messages in the queue, I received the following error for each message.
Error:"451 4.4.0 Primary target IP address responded with: “421 4.2.1 Unable to connect.” Attempted failover to alternate host, but that did not succeed. Either there are no alternate hosts, or delivery failed to all alternate hosts."
At this point, I verified that my send connector was configured to use the O365 smarthost name as the delivery smarthost: %customerdomain%.mail.protection.outlook.com, and I confirmed that I could telnet from the Exchange Servers to that smarthost on port 25. This confirms that the firewall was open, and that the server could communicate on port 25 to the smarthost. SO...what the heck is up with that 451 4.4.0 error message I received about not being able to connect?!?!
Now that I sent a few test messages, and confirmed my send connector and firewall were good, I reviewed my verbose send connector logs. In this case, the logs were in the Default location (yours may be different) of “C:\program files\Microsoft\Exchange Server\V15\TransportRoles\Logs\Hub\ProtocolLog\SmtpSend”
I opened the most recent log file, and what do you know, I've got nothing but a list of failed SMTP attempts:
*IP addresses, certificate details and hostnames obfuscated for privacy*
2017-10-18T14:07:11.365Z,Outbound to Office 365,08D50F79EFBB5A31,0,,184.108.40.206:25,*,,attempting to connect
2017-10-18T14:07:11.412Z,Outbound to Office 365,08D50F79EFBB5A31,1,10.2.8.206:56905,220.127.116.11:25,+,,
2017-10-18T14:07:11.490Z,Outbound to Office 365,08D50F79EFBB5A31,2,10.2.8.206:56905,18.104.22.168:25,<,"220 SN1NAM01FT023.mail.protection.outlook.com Microsoft ESMTP MAIL Service ready at Wed, 18 Oct 2017 14:07:11 +0000",
2017-10-18T14:07:11.490Z,Outbound to Office 365,08D50F79EFBB5A31,3,10.2.8.206:56905,22.214.171.124:25,>,EHLO xxx.therealcontoso.com,
2017-10-18T14:07:11.537Z,Outbound to Office 365,08D50F79EFBB5A31,4,10.2.8.206:56905,126.96.36.199:25,<,250-SN1NAM01FT023.mail.protection.outlook.com Hello [xxx.xxx.155.65],
2017-10-18T14:07:11.537Z,Outbound to Office 365,08D50F79EFBB5A31,5,10.2.8.206:56905,188.8.131.52:25,<,250-SIZE 157286400,
2017-10-18T14:07:11.537Z,Outbound to Office 365,08D50F79EFBB5A31,6,10.2.8.206:56905,184.108.40.206:25,<,250-PIPELINING,
2017-10-18T14:07:11.537Z,Outbound to Office 365,08D50F79EFBB5A31,7,10.2.8.206:56905,220.127.116.11:25,<,250-DSN,
2017-10-18T14:07:11.537Z,Outbound to Office 365,08D50F79EFBB5A31,8,10.2.8.206:56905,18.104.22.168:25,<,250-ENHANCEDSTATUSCODES,
2017-10-18T14:07:11.537Z,Outbound to Office 365,08D50F79EFBB5A31,9,10.2.8.206:56905,22.214.171.124:25,<,250-STARTTLS,
2017-10-18T14:07:11.537Z,Outbound to Office 365,08D50F79EFBB5A31,10,10.2.8.206:56905,126.96.36.199:25,<,250-8BITMIME,
2017-10-18T14:07:11.537Z,Outbound to Office 365,08D50F79EFBB5A31,11,10.2.8.206:56905,188.8.131.52:25,<,250-BINARYMIME,
2017-10-18T14:07:11.537Z,Outbound to Office 365,08D50F79EFBB5A31,12,10.2.8.206:56905,184.108.40.206:25,<,250 CHUNKING,
2017-10-18T14:07:11.537Z,Outbound to Office 365,08D50F79EFBB5A31,13,10.2.8.206:56905,220.127.116.11:25,>,STARTTLS,
2017-10-18T14:07:11.568Z,Outbound to Office 365,08D50F79EFBB5A31,14,10.2.8.206:56905,18.104.22.168:25,<,220 2.0.0 SMTP server ready,
2017-10-18T14:07:11.568Z,Outbound to Office 365,08D50F79EFBB5A31,15,10.2.8.206:56905,22.214.171.124:25,*,,Sending certificate
2017-10-18T14:07:11.568Z,Outbound to Office 365,08D50F79EFBB5A31,16,10.2.8.206:56905,126.96.36.199:25,*,"CN=xxx.therealcontoso.com, OU=Domain Control Validated",Certificate subject
2017-10-18T14:07:11.568Z,Outbound to Office 365,XXXXXXXXXXXXXXXX,17,10.2.8.206:56905,188.8.131.52:25,*,"CN=Starfield Secure Certificate Authority - G2, OU=http://certs.starfieldtech.com/repository/, O=""Starfield Technologies, Inc."", L=Scottsdale, S=Arizona, C=US",Certificate issuer name
2017-10-18T14:07:11.568Z,Outbound to Office 365,XXXXXXXXXXXXXXXX,18,10.2.8.206:56905,184.108.40.206:25,*,XXXXXXXXXXXXXXXX,Certificate serial number
2017-10-18T14:07:11.568Z,Outbound to Office 365,XXXXXXXXXXXXXXXX,19,10.2.8.206:56905,220.127.116.11:25,*,XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX,Certificate thumbprint
2017-10-18T14:07:11.568Z,Outbound to Office 365,08D50F79EFBB5A31,20,10.2.8.206:56905,18.104.22.168:25,*,xxx.therealcontoso.com,;www.therealcontoso.com,;AutoDiscover.therealcontoso.com,;Certificate alternate names
2017-10-18T14:07:11.583Z,Outbound to Office 365,08D50F79EFBB5A31,21,10.2.8.206:56905,22.214.171.124:25,*,,TLS negotiation failed with error UnknownCredentials
2017-10-18T14:07:11.583Z,Outbound to Office 365,08D50F79EFBB5A31,22,10.2.8.206:56905,126.96.36.199:25,-,,Local
Right towards the end of the log is where I saw it. "TLS Negotiation failed.....". This naturally led me to my transport certificates being used by Exchange. I reviewed the certificates, verified that the thumbprint was accurate, and valid. I then looked at the System event logs on the Exchange server and found this error persisting right around the same exact time as when I sent my test message:
Error: A fatal error occurred when attempting to access the SSL server credential private key. The error code returned from the cryptographic module is 0x8009030d. The internal error state is 10001.
Regardless of these errors, everything appeared to be configured correctly. Quite simply, I couldn't identify where the issue was - everything looked 100% correct. So I sounded the alarms, and contacted our friends over at Microsoft. After a few hours of working with them, we found the issue!!
Open up the certificates Snap-In using the MMC console. (Refer to this link if you're not sure how to do this). Open the "personal certificates" node, and find your Exchange transport certificate in the list. Once you find it, right click on it, select All Tasks and select "Manage Private Keys" as depicted below. A security permissions window will appear. In my case, System and the built-in administrators group was already in the list with full control permissions. However, NETWORK SERVICE was not listed at all. THIS was the issue all along!! Adding in NETWORK SERVICE with READ permissions is exactly what I needed to do. This allows Exchange to manipulate/read the private key associated with the certificate defined during the hybridization wizard. Once I added this permission, I checked my message queue, forced a retry and the messages were sent out without any issues.
WOW - That was a really strange one!!
I hope this helps you!!