HTTPS is end-to-end encryption of data on top of HTTP, used to prevent network sniffing (eavesdropping on data packets). In this tutorial, we will cover four questions to build a better understanding of HTTPS. The questions are:
Web applications usually run over IP networks, which are vulnerable to network sniffing. Plain HTTP transmits data packets in plain text, so anyone sniffing the network can read confidential information in the packets, such as passwords or session tokens. Here are some examples of how plain-text traffic can be sniffed.
On free public Wi-Fi, a Wi-Fi adapter in monitor mode can capture all packets travelling to and from the access point, regardless of their destination. If the traffic is transmitted over HTTP, the data sent over Wi-Fi is in plain text, and session tokens and passwords can be stolen. One famous example is Firesheep, a Firefox extension that sniffed session tokens used by websites such as Facebook. This prompted Facebook to change its default protocol from HTTP to HTTPS.
Our network packets usually travel through switches and routers around the globe to reach their destination. Any one of them, if compromised, could expose our network traffic to a sniffer. A network tap is an example of a device used to sniff network traffic.
The Internet architecture relies on DNS to map domain names to IP addresses and on ARP to map IP addresses to MAC addresses. Neither protocol was designed with security in mind. Common attacks such as DNS cache poisoning or ARP poisoning can redirect your traffic through an attacker's machine for monitoring.
All in all, the Internet architecture that we rely on for network transmission is very vulnerable to network sniffing. If we use HTTP, which transmits packets in plain text, no confidentiality can be guaranteed for our web application. Therefore, we need HTTPS as end-to-end encryption to secure our network packets.
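To make the plain-text risk concrete, here is a minimal sketch of what a sniffer sees when a login form is sent over HTTP. The host name, the form fields and the `sniff_password` helper are hypothetical and exist only for illustration; a real capture would come from a tool such as a packet analyzer rather than from a Python variable.

```python
from typing import Optional

# Hypothetical login form values, used only for illustration.
username = "alice"
password = "s3cret"

# Build the bytes an HTTP client would put on the wire for a form POST.
body = f"username={username}&password={password}"
request = (
    "POST /login HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
    f"{body}"
).encode("ascii")


def sniff_password(packet: bytes) -> Optional[str]:
    """Play the sniffer: search captured bytes for a password form field."""
    marker = b"password="
    start = packet.find(marker)
    if start == -1:
        return None  # nothing readable, e.g. the payload is encrypted
    end = packet.find(b"&", start)
    if end == -1:
        end = len(packet)
    return packet[start + len(marker):end].decode("ascii")


print(sniff_password(request))
```

Because the request is plain text, a simple byte search is enough to recover the password. The same search on an encrypted payload finds no such field.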
As mentioned above, our network is not secure, so how can HTTPS help? HTTPS is built on top of HTTP with the addition of SSL/TLS to encrypt the plain-text messages. The purpose of this encryption is to make sure that only the client and the server, which hold the required keys, can decrypt the messages. A sniffer may still capture the packets but cannot decrypt them.
There are three main families of public-key algorithms used in HTTPS, namely RSA, Diffie-Hellman and elliptic-curve cryptography. They are more thoroughly explained in the Introduction to Cryptography section. These algorithms prevent sniffers from decrypting packets without knowing the keys, because the best attacks known at the moment run in sub-exponential time for RSA and Diffie-Hellman, and in exponential time for elliptic curves. Therefore, an attack is believed to be computationally infeasible when the key size is large enough (e.g. 2048 bits for RSA), though this is not mathematically proven.
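As an illustration of how a client and a server can agree on a key over a sniffed network, here is a toy Diffie-Hellman exchange. The numbers are deliberately tiny and chosen only for readability; the names `alice_private` and `bob_private` are illustrative, and real deployments use parameters of 2048 bits or more.

```python
# Public parameters: a sniffer can see both of these.
p = 23  # prime modulus (toy size)
g = 5   # generator

# Private values: never sent over the network.
alice_private = 4
bob_private = 3

# Public values: exchanged in the clear, so a sniffer sees them too.
alice_public = pow(g, alice_private, p)
bob_public = pow(g, bob_private, p)

# Each side combines its own private value with the other's public value.
shared_alice = pow(bob_public, alice_private, p)
shared_bob = pow(alice_public, bob_private, p)

print(alice_public, bob_public)   # 4 10
print(shared_alice, shared_bob)   # 18 18
```

Both sides arrive at the same secret without ever transmitting it. A sniffer who sees only `p`, `g` and the two public values has to solve a discrete logarithm to recover it, which is easy at this toy size and believed infeasible at realistic sizes.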
Thus, by using HTTPS, we can be sure that even though our network packets are transmitted over an insecure network, sniffers cannot understand the content of our encrypted packets.
Besides providing secure network traffic, HTTPS also provides server validation through the Certificate Authority (CA) architecture. A detailed explanation of CAs is here. In short, a CA issues the server a digital certificate carrying a signature that only the CA can produce. When the server sends its digital certificate to the client, the client's browser checks the signature against the CA certificates it already trusts, to confirm that the server is indeed the intended server. To obtain such a digital certificate, the server operator needs to apply to a CA, and the CA will verify the applicant before issuing the certificate.
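The client side of this validation can be sketched with Python's standard `ssl` module. This is an illustration of the idea rather than how any particular browser is implemented; the helper name `fetch_peer_certificate` is an assumption made for this example, and calling it requires network access.

```python
import socket
import ssl

# The default context loads the system's trusted CA certificates and
# turns on both certificate verification and host name checking.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True


def fetch_peer_certificate(hostname: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect over TLS and return the server's validated certificate.

    The handshake raises ssl.SSLCertVerificationError if the certificate
    does not chain to a trusted CA or does not match the host name.
    """
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()
```

For example, `fetch_peer_certificate("example.com")` would return a dictionary describing the certificate only if validation succeeds.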
In order to set up HTTPS on your server, you would need to have:
However, most CAs are not free of charge. One initiative that provides free domain validation certificates is Let's Encrypt.
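Once a certificate and its private key are available, wiring them into a server can look roughly like the sketch below, which uses only Python's standard library. The function name, the default port and the file paths in the usage note are assumptions for illustration, and a production site would normally sit behind a dedicated web server instead.

```python
import http.server
import ssl


def build_https_server(certfile: str, keyfile: str,
                       host: str = "0.0.0.0", port: int = 4443) -> http.server.HTTPServer:
    """Return an HTTPS server that serves files from the current directory."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Load the CA-issued certificate (chain) and the matching private key.
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)

    server = http.server.HTTPServer((host, port), http.server.SimpleHTTPRequestHandler)
    # Wrap the listening socket so every connection starts with a TLS handshake.
    server.socket = context.wrap_socket(server.socket, server_side=True)
    return server
```

A hypothetical invocation would be `build_https_server("fullchain.pem", "privkey.pem").serve_forever()`, where the two file names stand for wherever your CA tooling stores the certificate chain and the private key.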
Up to this point, it seems that nothing could go wrong with HTTPS. However, in real life, there can be weaknesses in the implementation of HTTPS.
As mentioned above, a Certificate Authority (CA) signs the digital certificate provided by the server to prove the identity of the server. However, in real-world implementations, the CA does not sign the digital certificate directly; for efficiency, it signs a fixed-length hash digest of the certificate. This introduces the possibility of two different digital certificates having the same hash digest. Thus, if attackers manage to forge a fake digital certificate with the same hash digest as a valid one, the CA's signature on the valid certificate also fits the fake one. The browser would then trust the attacker's server and, if the forged certificate is a CA certificate, every server certificate signed by the attacker. This loophole affects the SHA-1 hashing algorithm, which has been phased out of HTTPS certificates since 2016. In 2017, Google and CWI Amsterdam announced the first practical SHA-1 collision, and the report is here. A more detailed explanation of this problem is found here.
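The two properties behind this weakness, a fixed-length digest and the resulting possibility of collisions, can be seen in a small sketch using `hashlib`. The "certificates" here are placeholder byte strings, not real X.509 data, and the collision search runs against a digest truncated to 16 bits so that it finishes instantly. It does not break SHA-1 or SHA-256; the helper names are made up for this example.

```python
import hashlib


def digest_bits(algorithm: str, data: bytes) -> int:
    """Return the digest length in bits for the given algorithm and input."""
    return len(hashlib.new(algorithm, data).digest()) * 8


short_cert = b"CN=example.com"
long_cert = b"CN=example.com" + b"x" * 10_000

# The digest length does not depend on the input length.
for name in ("sha1", "sha256"):
    print(name, digest_bits(name, short_cert), digest_bits(name, long_cert))


def find_truncated_collision(prefix_bytes: int = 2):
    """Find two different messages whose SHA-256 digests share a short prefix."""
    seen = {}
    counter = 0
    while True:
        message = f"certificate-{counter}".encode("ascii")
        tag = hashlib.sha256(message).digest()[:prefix_bytes]
        if tag in seen:
            return seen[tag], message
        seen[tag] = message
        counter += 1


first, second = find_truncated_collision()
print(first != second)
```

Since there are more possible inputs than possible digests, collisions must exist for any hash function. A hash is considered safe for signing only while finding one remains computationally infeasible, which is no longer the case for SHA-1.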
During the cipher suite negotiation of HTTPS, the client and the server tell each other, in plain text, which cipher suites they support, and the most secure one supported by both is chosen. However, this can be exploited by a man-in-the-middle attack to downgrade HTTPS. The man in the middle can forge the negotiation messages sent to both server and client to indicate that the strongest option supported is only 512-bit Diffie-Hellman, tricking the server and the client into using 512-bit Diffie-Hellman. This downgrade is known as the Logjam attack. With sufficient resources, 512-bit Diffie-Hellman can currently be broken without knowing the keys. A more detailed description can be found here.
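The downgrade can be illustrated with a toy simulation of the negotiation. The suite names, the strength ranking and the `negotiate` and `tamper` functions are all hypothetical simplifications. Real TLS handshakes carry far more state, and modern versions authenticate the negotiation so that this kind of tampering is detected.

```python
# Hypothetical cipher suites ranked by strength (higher is stronger).
STRENGTH = {
    "DHE_EXPORT_512": 1,
    "DHE_1024": 2,
    "DHE_2048": 3,
    "ECDHE_P256": 4,
}


def negotiate(client_offer, server_supported):
    """Pick the strongest suite that both sides claim to support."""
    common = [suite for suite in client_offer if suite in server_supported]
    if not common:
        raise ValueError("no common cipher suite")
    return max(common, key=STRENGTH.__getitem__)


def tamper(client_offer):
    """Man in the middle: rewrite the plain-text offer to its weakest entry."""
    return [min(client_offer, key=STRENGTH.__getitem__)]


client_offer = ["ECDHE_P256", "DHE_2048", "DHE_1024", "DHE_EXPORT_512"]
server_supported = set(STRENGTH)

honest = negotiate(client_offer, server_supported)
attacked = negotiate(tamper(client_offer), server_supported)

print(honest)    # ECDHE_P256
print(attacked)  # DHE_EXPORT_512
```

The simplest defence in this model is for the server to remove weak suites from `server_supported` altogether, so that a tampered offer leads to a failed handshake instead of a weak one.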
HTTPS provides security to a web application. If the web application requires secure network traffic (e.g. online banking), HTTPS should be implemented. However, servers often need to pay for the digital certificate used by HTTPS, although free options such as Let's Encrypt exist. Also, the additional layer of encryption and decryption adds overhead to network traffic (though the impact is not significant). If secure network traffic is not required (e.g. a university home page), HTTPS may not be necessary.