In order to achieve reliability on weak mobile connections as well as speed when dealing with large files (such as photos, large videos and files up to 1.5 GB), MTProto uses an original approach. This document is intended to clarify certain details of our setup and to address some points that are difficult to understand at first glance.
Detailed protocol documentation is available here. Please note that MTProto supports two layers: client-server encryption that is used in Telegram cloud chats and end-to-end encryption that is used in Telegram Secret Chats. See below for more information.
If you have any comments, hit us up on Twitter.
Server-client encryption is used in Telegram cloud chats. Here's a brief overview of the setup:
Each plaintext message to be encrypted in MTProto always contains the following data to be checked upon decryption in order to make the system robust against known problems with the components:
Telegram's end-to-end encrypted Secret Chats use an additional layer of encryption on top of the one described above.
End-to-end encryption is used in Telegram Secret Chats. You can read more about it here: Secret Chats, End-to-End encryption. Here's a brief overview of the setup:
Please see these articles for details:
While other ways of achieving the same cryptographic goals undoubtedly exist, we feel that the present solution is both robust and successful at our secondary task of beating unencrypted messengers in terms of delivery time and stability.
We prefer to use well-known algorithms, created in the days when bandwidth and processing power were both much scarcer. This has valuable side effects for modern-day mobile development, provided one takes care of the known drawbacks.
The weak spots of such algorithms are also well known, and have been exploited for decades. We use these algorithms in such a combination that, to the best of our knowledge, prevents any known attack from succeeding. That said, we would be grateful to see any evidence to the contrary (so far absent) and would update our system accordingly.
If you have any comments, we would be happy to hear them at email@example.com.
You are also welcome to join in our competition — we are offering $300,000 to the first person to break Telegram encryption. Check out the contest announcement.
All Telegram apps ensure that msg_key is equal to SHA-1 of the decrypted message. It is important that the plaintext always contains message length, server salt, session_id and other data not known to the attacker.
It is crucial that AES decryption keys depend both on msg_key, and on auth_key, known only to the parties involved in the exchange.
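The dependency described above can be illustrated with a small sketch. This is a deliberately simplified stand-in and not the actual MTProto key derivation function: the name `derive_message_key`, the slices of auth_key, and the truncation of SHA-1 to a 16-byte msg_key are all assumptions made for the example. It only shows that a per-message key built from both msg_key and auth_key changes whenever either input changes.

```python
import hashlib
import os


def derive_message_key(auth_key: bytes, msg_key: bytes) -> bytes:
    """Toy KDF (hypothetical): NOT the real MTProto derivation."""
    # Both the secret auth_key and the per-message msg_key feed the hash,
    # so changing either input yields an unrelated AES key.
    part_a = hashlib.sha1(msg_key + auth_key[:32]).digest()
    part_b = hashlib.sha1(auth_key[32:64] + msg_key).digest()
    return (part_a + part_b)[:32]  # 256 bits of key material


# Shared secret known only to the two parties (random here for the demo).
auth_key = os.urandom(256)

# Assumption: msg_key is a 16-byte slice of SHA-1(plaintext).
msg_key_1 = hashlib.sha1(b"first message").digest()[4:20]
msg_key_2 = hashlib.sha1(b"second message").digest()[4:20]

key_1 = derive_message_key(auth_key, msg_key_1)
key_2 = derive_message_key(auth_key, msg_key_2)
```

Because the attacker does not know auth_key, they cannot predict how the AES key changes when msg_key changes, even in this toy version.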
We do none of the above, strictly speaking. For message authentication, we compute SHA-1(AES(…,encrypted_message)) upon message receipt and compare this value with the msg_key received with the encrypted message.
Using encrypt-then-MAC, e.g. involving GCM (Galois Counter Mode), would enable the receiving party to detect unauthorized or modified ciphertexts, thus eliminating the need to decrypt them in case of tampering.
In MTProto, the clients and the server authenticate messages by ensuring that SHA-1(plaintext) = msg_key and that the plaintext always contains message length, server salt, session_id and other data not known to a potential attacker before accepting any message. These security checks, performed on the client before any message is accepted, ensure that invalid or tampered-with messages will always be safely (and silently) discarded.
This way we arrive at the same result. The difference is that the security check is performed before decryption in Encrypt-then-MAC and after decryption in MTProto – but in either case before a message is accepted. AES encryption / decryption on devices currently in use is comparable in speed with the additional HMAC computation required for the encrypt-then-MAC approach.
Even though it is possible for a well-funded attacker to find collisions in SHA-1, crafting a collision does not break the MTProto encryption scheme due to the way SHA-1 is used in the protocol (see here and here). We do not use SHA-1 in any areas where collisions are important, such as digital signatures or file identification.
We use SHA-1 as a component in the KDF and for a security check after transmission. Even if an attacker could create messages with a chosen SHA-1 (a far greater achievement than merely finding a collision), it would be of no avail. This is because we use the SHA-1 of the plaintext, while MTProto passes the ciphertext, encrypted with a key unknown to the attacker. To threaten this setup, a collision in SHA-1(ciphertext) is not enough; you need a collision in SHA-1(AES_Decrypt(key_unknown_to_attacker, ciphertext)).
At the same time, SHA-1 is computed considerably faster than SHA-256 and other suitable and well-studied algorithms. This edge in speed is very important on older mobile devices, especially since Telegram can send full-quality photos, as well as large videos and other files of up to 1.5 GB each. We stick with SHA-1 for the time being, reserving the option to switch to a different hash function in a future revision of the protocol, as the computational power of both potential adversaries and user devices grows over time.
It is important that the plaintext always contains message length, server salt, session_id and other data not known to the attacker. It is crucial that AES decryption keys depend both on msg_key, and on auth_key, known only to the parties involved in the exchange.
Yes, we use IGE, but it is not broken in our implementation. The fact that we do not use IGE as a MAC, together with other properties of our system, makes the known attacks on IGE irrelevant.
Adaptive attacks are even theoretically impossible in MTProto, because in order to be encrypted the message must be fully formed first, since the key is dependent on the message content. As for non-adaptive CPAs, IGE is secure against them, as is CBC.
Various secrets (nonce, server_nonce, new_nonce) exchanged during key generation guarantee that the DH-key can only be obtained by the instance that initiated the exchange.
Notice that new_nonce is transferred explicitly only once, inside an RSA-encrypted message from the client to the server.
Keys for end-to-end encrypted secret chats are generated by a new instance of DH key exchange, so they are known only to the parties involved and not to the server. To establish the identities of these parties and to ensure that no MitM is in place, it is recommended to compare identicons, generated from hashes of the DH secret chat keys (key visualizations).
Keys for end-to-end encrypted calls are generated using the Diffie-Hellman key exchange. Users who are on a call can ensure that there is no MitM by comparing key visualizations.
To make key verification practical in the context of a voice call, Telegram uses a three-message modification of the standard DH key exchange for calls:
The idea is that Alice commits to a specific value of a (and of g_a), but does not reveal g_a to Bob (or Eve) until the very last step. Bob has to choose his value of b and g_b without knowing the true value of g_a. If Eve is performing a Man-in-the-Middle attack, she cannot change a depending on the value of g_b received from Bob and she also can't tune her value of b depending on g_a. As a result, Eve only gets one shot at injecting her parameters — and she must fire this shot with her eyes closed.
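The commit-then-reveal idea can be sketched as follows. This is a toy model under stated assumptions: the group parameters `P` and `G` are small illustrative values and not the ones Telegram uses, SHA-256 as the commitment hash is an assumption, and the helper names `commit` and `bob_accepts` are hypothetical.

```python
import hashlib
import secrets

# Toy parameters for illustration only; a real deployment would use a
# vetted 2048-bit group. 2**127 - 1 is a Mersenne prime.
P = 2**127 - 1
G = 3


def commit(value: int) -> bytes:
    """Hash commitment to a public DH value (SHA-256 is an assumption)."""
    return hashlib.sha256(value.to_bytes(16, "big")).digest()


def bob_accepts(revealed_g_a: int, commitment: bytes) -> bool:
    """Bob checks that the revealed g_a matches the earlier commitment."""
    return secrets.compare_digest(commit(revealed_g_a), commitment)


# Message 1: Alice picks a, computes g_a, and sends only a commitment to it.
a = secrets.randbelow(P - 3) + 2
g_a = pow(G, a, P)
g_a_commitment = commit(g_a)

# Message 2: Bob picks b and sends g_b without having seen g_a.
b = secrets.randbelow(P - 3) + 2
g_b = pow(G, b, P)

# Message 3: Alice reveals g_a; Bob verifies it before deriving the key.
if not bob_accepts(g_a, g_a_commitment):
    raise ValueError("revealed g_a does not match the commitment")

key_alice = pow(g_b, a, P)
key_bob = pow(g_a, b, P)
```

Since the commitment is fixed before Bob's g_b is sent, anyone in Alice's position (including Eve) cannot adapt a after seeing g_b without failing Bob's check.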
Thanks to this modification, it becomes possible to prevent eavesdropping (MitM attacks on DH) with a probability of more than 0.9999999999 by using just over 33 bits of entropy in the visualization. These bits are presented to the users in the form of four emoji. We have selected a pool of 333 emoji that all look quite different from one another and can be easily described in simple words in any language.
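The figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that the four emoji are chosen uniformly and independently from the pool of 333, and that the attacker gets a single blind attempt.

```python
import math

POOL_SIZE = 333   # emoji in the pool (from the text)
EMOJI_SHOWN = 4   # emoji shown to each user (from the text)

# Number of distinct four-emoji visualizations.
combinations = POOL_SIZE ** EMOJI_SHOWN

# Entropy carried by one visualization, in bits.
entropy_bits = math.log2(combinations)

# Chance that one blind attempt produces matching emoji on both sides,
# and the complementary chance that the attack is noticed.
p_undetected = 1 / combinations
p_detected = 1 - p_undetected
```

With these inputs, `entropy_bits` comes out between 33 and 34, and `p_detected` exceeds 0.9999999999, in line with the numbers stated in the text.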
You can read more about key verification for Telegram calls here.
Telegram's Secret Chats support Perfect Forward Secrecy. You can read more about it here.
By definition, the known-plaintext attack (KPA) is an attack model for cryptanalysis where the attacker has samples of both the plaintext and its encrypted version (ciphertext).
AES IGE that is used in MTProto is robust against KPAs (see this, if you wonder how one can securely use IGE). On top of that, the plaintext in MTProto always contains server_salt and session_id.
By definition, a chosen-plaintext attack (CPA) is an attack model for cryptanalysis which presumes that the attacker has the capability to choose arbitrary plaintexts to be encrypted and obtain the corresponding ciphertexts.
MTProto uses AES in IGE mode (see this, if you wonder how one can securely use IGE), which is secure against non-adaptive CPAs. IGE is known not to be secure against blockwise-adaptive CPA, but MTProto fixes this in the following manner:
Each plaintext message to be encrypted always contains the following to be checked upon decryption:
On top of this, in order to replace the plaintext, you would also need to use the right AES key and iv, both dependent on the auth_key. This makes MTProto robust against a CPA.
By definition, a chosen-ciphertext attack (CCA) is an attack model for cryptanalysis in which the cryptanalyst gathers information, at least in part, by choosing a ciphertext and obtaining its decryption under an unknown key. In the attack, an adversary has a chance to enter one or more known ciphertexts into the system and obtain the resulting plaintexts. From these pieces of information the adversary can attempt to recover the hidden secret key used for decryption.
Each time a message is decrypted in MTProto, a check is performed to see whether the msg_key is equal to the SHA-1 of the decrypted data. The plaintext (decrypted data) also always contains message length, server salt and sequence number. This negates known CCAs.
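A minimal sketch of this post-decryption check is shown below. It covers only the comparison step, not the AES decryption itself. The function names are hypothetical, and the choice of the low-order 128 bits of the SHA-1 digest as msg_key is an assumption of the sketch.

```python
import hashlib
import hmac


def compute_msg_key(plaintext: bytes) -> bytes:
    # Assumption: msg_key is the low-order 128 bits of SHA-1(plaintext),
    # taken here as the last 16 bytes of the 20-byte digest.
    return hashlib.sha1(plaintext).digest()[4:20]


def accept_decrypted(plaintext: bytes, received_msg_key: bytes) -> bool:
    """Return True only if the decrypted data matches the received msg_key."""
    expected = compute_msg_key(plaintext)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, received_msg_key)


original = b"length|salt|seq_no|hello"   # placeholder layout, not the real one
sent_msg_key = compute_msg_key(original)

ok = accept_decrypted(original, sent_msg_key)
tampered_ok = accept_decrypted(original + b"!", sent_msg_key)
```

A message whose decrypted data does not hash to the received msg_key is simply dropped, so the receiver never acts on garbage produced by a modified ciphertext.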
Properties like IND-CCA are convenient for theoretical definitions and scientific inquiry, but they are not directly related to the actual security of communication. There are cases when IND-CCA compliance can be critical, but in the case of MTProto the deviation from this property is a minor issue and does not affect message security. Namely, under certain circumstances a ciphertext can be modified so that it will be accepted and decrypted to the same plaintext as the original unmodified ciphertext. It is impossible for the attacker to tamper with or decipher the plaintext.
The gist, for non-technical readers, is this: Under certain circumstances somebody can take an encrypted message after it was sent (without knowing what was inside), change some symbols in the ciphertext (without being able to alter the actual message inside), and pass it on to you. After decryption, you will receive the same message that was sent and only you and the sender will know what was in it.
To put this case into familiar terms:
A postal worker can write 'Haha' (using invisible ink!) on the outside of a sealed package that they deliver to you. It doesn't stop the package from being delivered, it doesn't allow them to change the contents of the package, and it doesn't allow them to see what was inside.
Replay attacks are prevented because each plaintext to be encrypted contains the server salt, a unique message id and a sequence number.
This means that each message can only be sent once.
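One way a receiver could enforce this is sketched below. The `ReplayGuard` class and its fields are hypothetical and only illustrate the principle: a message is accepted once if its salt is the expected one and its message id has not been seen before. A real client would also bound the memory used and check the sequence number.

```python
class ReplayGuard:
    """Hypothetical receiver-side filter; names are illustrative only."""

    def __init__(self, server_salt: bytes) -> None:
        self.server_salt = server_salt
        self.seen_msg_ids = set()

    def accept(self, salt: bytes, msg_id: int) -> bool:
        # Reject messages carrying a salt that is not currently valid.
        if salt != self.server_salt:
            return False
        # Reject any message id that was already processed (a replay).
        if msg_id in self.seen_msg_ids:
            return False
        self.seen_msg_ids.add(msg_id)
        return True


guard = ReplayGuard(server_salt=b"salt1234")
first = guard.accept(b"salt1234", 1001)    # fresh message
replayed = guard.accept(b"salt1234", 1001) # same message sent again
wrong_salt = guard.accept(b"badsalt!", 1002)
```

Because the salt and message id sit inside the encrypted plaintext, an attacker cannot alter them to make a captured message look fresh.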
Telegram has two modes of communication — ordinary chats using client-server encryption and Secret Chats using end-to-end encryption.
Client-server communication is protected from MitM attacks during DH key generation by means of a server RSA public key embedded into client software. After that, if both clients trust the server software, the Secret Chats between them are protected by the server from MitM attacks.
The interface offers a way of comparing Secret Chat keys for users who do not trust the server. Visualizations of the key are presented in the form of identicons (example here). By comparing key visualizations, users can make sure no MitM attack has taken place.
Earlier versions of Telegram used a 128-bit fingerprint to create the key visualization. It was theoretically possible to spoof it, provided a man-in-the-middle attacker was prepared to spend hundreds of billions of dollars to spoof one secret chat. It would've also taken such a secret chat an entire month to be created instead of mere seconds, which would've certainly been hard to ignore.
Currently, the fingerprint uses an additional 160 bits from the SHA-256 of the key, yielding a total of 288 fingerprint bits, which makes the already infeasible attacks completely impossible.
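The bit count can be illustrated with a short sketch. The text only states that 160 bits come from the SHA-256 of the key; taking them as the first 20 bytes of the digest, and taking the original 128 bits as the first 16 bytes of SHA-1 of the key, are assumptions made here. The function name `key_fingerprint` is hypothetical.

```python
import hashlib
import os


def key_fingerprint(key: bytes) -> bytes:
    # Assumption: the legacy 128 bits are the first 16 bytes of SHA-1(key).
    legacy_128 = hashlib.sha1(key).digest()[:16]
    # From the text: an additional 160 bits are taken from SHA-256(key);
    # using the first 20 bytes of the digest is an assumption.
    extra_160 = hashlib.sha256(key).digest()[:20]
    return legacy_128 + extra_160


secret_chat_key = os.urandom(256)  # random stand-in for a DH-derived key
fingerprint = key_fingerprint(secret_chat_key)
fingerprint_bits = len(fingerprint) * 8
```

Here `fingerprint_bits` is 128 + 160 = 288, matching the total given in the text.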
By definition, length extension attacks are a type of attack when certain types of hashes are misused as message authentication codes, allowing for inclusion of extra information.
A message in MTProto consists of an msg_key, essentially equal to SHA-1 of the plaintext with some additional parameters, followed by the ciphertext. The attacker cannot append extra bytes to the end and recompute the SHA-1, since the SHA-1 is computed from the plaintext, not the ciphertext, and the attacker has no way to obtain the ciphertext corresponding to the extra plaintext bytes she may want to add.
Apart from that, changing the msg_key would also change the AES decryption key for the message in a way unpredictable for the attacker, so even the original prefix would decrypt to garbage, which would be immediately detected since the app performs a security check to ensure that the SHA-1 of the plaintext matches the msg_key received.