This document describes the Nearby Share protocol as understood by me (@grishka) from Chromium sources (that are way too convoluted) and decompiled Google Services Framework apk (that, despite being a decompilation of obfuscated code, is still more helpful than Chromium sources).
The protocol is peer-to-peer, end-to-end encrypted. Overall, it appears like it was mostly designed to run over an unreliable medium like UDP, but I've only observed it over TCP.
Only the WiFi LAN protocol is described here because that's what I reverse engineered, but the data running over other mediums (Bluetooth, WiFi Direct, ...) is most probably largely the same.
If you want to build your own Nearby Share thing, you will need:
- An implementation of multicast DNS (most modern OSes have it built-in)
- A cryptography library that is capable of ECDH key exchange (on the P-256 curve), AES-CBC, HMAC, and SHA256. OpenSSL will do but is definitely overkill.
- A Protobuf library
- These Protobuf files I collected from the Chromium sources so you don't have to
It is also very helpful to read logcat on your Android device if you're having any trouble. The logging of the Android implementation of Nearby Share is very verbose.
Since a file transfer is unidirectional, the peers are assigned roles:
- The receiving side is the server. It listens on a TCP port and advertises an MDNS service.
- The sending side is the client. It discovers the MDNS service and connects to the server's TCP port.
To become visible in the sheet on Android devices, a server advertises an MDNS service. The domain is empty. The type is `_FC9F5ED42C8A._tcp.`. The port is an arbitrary TCP port on which the server accepts incoming connections.
The name is the following 10 bytes encoded in URL-safe base64:
- `0x23`, Google calls this "PCP" but I have no idea what it is
- 4-byte endpoint ID
- 3-byte service ID: `0xFC, 0x9F, 0x5E`
- 2 zero bytes that serve an unknown purpose
The endpoint ID is 4 random alphanumeric characters. It identifies devices to each other. Android uses it in its logs quite extensively.
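The name layout above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and stripping the `=` padding from the base64 output is an assumption that the text above does not confirm.

```python
import base64
import secrets
import string

PCP = 0x23                               # first byte of the name
SERVICE_ID = bytes([0xFC, 0x9F, 0x5E])   # 3-byte service ID


def random_endpoint_id() -> bytes:
    """Return 4 random alphanumeric ASCII characters."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(4)).encode("ascii")


def mdns_service_name(endpoint_id: bytes) -> str:
    """Build the 10-byte name and encode it in URL-safe base64."""
    if len(endpoint_id) != 4:
        raise ValueError("endpoint ID must be exactly 4 bytes")
    raw = bytes([PCP]) + endpoint_id + SERVICE_ID + b"\x00\x00"
    # Assumption: padding characters are dropped from the encoded name.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
```

Calling `mdns_service_name(random_endpoint_id())` yields a 14-character string that can be used as the MDNS instance name.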
The service also needs to have a TXT record with key `n` and the value of the following encoded in URL-safe base64 ("endpoint info"):
- 1 byte: bit field
  - 3 bits: version, set to 0
  - 1 bit: visibility, 0 = visible
  - 3 bits: device type. Android uses this to pick an icon. 0 = unknown, 1 = phone, 2 = tablet, 3 = laptop
  - 1 bit: reserved, set to 0
- 16 bytes of unknown purpose. I set them to random.
- User-visible device name in UTF-8, prefixed with 1-byte length.
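As a rough sketch, the endpoint info could be assembled like this. The function names are hypothetical, and two things are assumptions rather than facts from the description: that the bit field is packed from the most significant bit down in the order listed, and that base64 padding is stripped.

```python
import base64
import os


def build_endpoint_info(name: str, device_type: int, hidden: bool = False) -> bytes:
    """Pack the bit field, 16 random bytes and the length-prefixed name."""
    name_bytes = name.encode("utf-8")
    if len(name_bytes) > 255:
        raise ValueError("device name must fit a 1-byte length prefix")
    if not 0 <= device_type <= 7:
        raise ValueError("device type is a 3-bit value")
    version = 0
    # Assumption: fields are laid out MSB first in the order listed:
    # vvv (version) | h (visibility) | ttt (device type) | r (reserved)
    bit_field = (version << 5) | (int(hidden) << 4) | (device_type << 1)
    return bytes([bit_field]) + os.urandom(16) + bytes([len(name_bytes)]) + name_bytes


def txt_record_value(endpoint_info: bytes) -> str:
    """Encode endpoint info for the `n` TXT record key."""
    # Assumption: padding characters are dropped.
    return base64.urlsafe_b64encode(endpoint_info).decode("ascii").rstrip("=")
```

The raw bytes returned by `build_endpoint_info` are the same structure that goes, without base64, into the connection request.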
Android does not advertise the MDNS service all the time regardless of the visibility setting. It waits for a BLE advertisement with the following parameters:
- Service UUID = `fe 2c`
- Service data = `fc 12 8e 01 42 00 00 00 00 00 00 00 00 00 [10 random bytes]`
This can't be sent from macOS because there's no API I could find that would allow setting the service data. As far as I can tell, the Android side is hardcoded to look for that prefix in the service data so there really is no way to make it work on macOS. Android sends these BLE advertisements periodically while searching for Nearby Share targets; these are also what makes the "device nearby is sharing" notification pop up.
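For platforms that do allow setting service data, the advertisement payload described above could be built like this. It only constructs the bytes; the actual BLE advertising call is platform-specific and not shown, and the names are made up for illustration.

```python
import os

# Service UUID bytes exactly as listed above.
BLE_SERVICE_UUID = bytes.fromhex("fe2c")

# Fixed 14-byte prefix: fc 12 8e 01 42 followed by nine zero bytes.
BLE_SERVICE_DATA_PREFIX = bytes.fromhex("fc128e0142") + bytes(9)


def ble_service_data() -> bytes:
    """Return the prefix followed by 10 random bytes (24 bytes in total)."""
    return BLE_SERVICE_DATA_PREFIX + os.urandom(10)
```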
The service ID (FC9F...) comes from SHA256("NearbySharing") = `fc9f5ed42c8a5e9e94684076ef3bf938a809c60ad354992b0435aebbdc58b97b`.
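A quick way to reproduce these constants instead of hardcoding them, assuming the MDNS type is simply the first 6 bytes of that hash in uppercase hex (which is what the values above suggest):

```python
import hashlib


def service_hash(service_name: str = "NearbySharing") -> bytes:
    """SHA-256 of the service name."""
    return hashlib.sha256(service_name.encode("ascii")).digest()


def mdns_service_type(service_name: str = "NearbySharing") -> str:
    """First 6 hash bytes as uppercase hex, wrapped into an MDNS type."""
    return "_" + service_hash(service_name)[:6].hex().upper() + "._tcp."


def service_id_bytes(service_name: str = "NearbySharing") -> bytes:
    """First 3 hash bytes, as used in the MDNS service name."""
    return service_hash(service_name)[:3]
```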
sequenceDiagram
Client-->>Server: (Connects to advertised TCP port)
Client->>Server: Connection request
Client->>Server: UKEY2 ClientInit
Server->>Client: UKEY2 ServerInit
Client->>Server: UKEY2 ClientFinish
Server->>Client: Connection response
Client->>Server: Connection response
Note over Server, Client: All following packets are encrypted
Server->>Client: Paired key encryption
Client->>Server: Paired key encryption
Server->>Client: Paired key result
Client->>Server: Paired key result
Client->>Server: Introduction (transfer metadata)
Note over Server: Asks the user
Server->>Client: Response (accept/reject)
Client->>Server: File chunks
Client-->>Server:
Client-->>Server:
Client->>Server: Disconnection
Client-->Server: (Close TCP connection)
From the Google point of view, the "Nearby connections" part is a separate universal transport layer, over which the "Share" runs. This may explain some bizarre arrangements where you have protobuf inside protobuf inside encrypted protobuf inside protobuf.
There are three types of packets that can appear directly on the wire:
- Offline frames. These are the basic unit of the nearby protocol. They are used to control the connection.
- UKEY2 messages. These are used for the encryption key exchange (UKEY2 is Google's bespoke algorithm for that).
- Secure messages. These are used exclusively after the initial negotiation and carry other packets inside them in the encrypted form.
Keep the protobuf files open to follow along.
Each protobuf message sent over the TCP connection is prefixed with 4-byte big-endian (MSB first) length.
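A minimal sketch of this framing, working on any file-like binary stream (for a real connection, something like `sock.makefile("rwb")`). The `MAX_FRAME_SIZE` limit is not part of the protocol description; it is a hypothetical sanity check against garbage length prefixes.

```python
import io
import struct

MAX_FRAME_SIZE = 5 * 1024 * 1024  # hypothetical limit, not from the protocol


def write_frame(stream, payload: bytes) -> None:
    """Write a 4-byte big-endian length followed by the serialized message."""
    stream.write(struct.pack(">I", len(payload)) + payload)


def _read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes or raise if the stream ends early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("connection closed mid-frame")
        buf += chunk
    return bytes(buf)


def read_frame(stream) -> bytes:
    """Read one length-prefixed message and return its serialized bytes."""
    (length,) = struct.unpack(">I", _read_exact(stream, 4))
    if length > MAX_FRAME_SIZE:
        raise ValueError(f"frame too large: {length} bytes")
    return _read_exact(stream, length)


# Round trip through an in-memory stream standing in for the TCP socket.
demo = io.BytesIO()
write_frame(demo, b"hello")
demo.seek(0)
assert read_frame(demo) == b"hello"
```

The bytes returned by `read_frame` are what you hand to the protobuf parser; the length prefix itself is never part of the message.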
(described from the server/recipient point of view)
After the client connects to the server, it sends two packets: a "connection request" and a "UKEY2 client init".
The connection request is a subtype of "offline frame".
The client wants to connect to the server and tells it about itself. The only field of interest here is `endpointInfo`. It contains the device type and name. It has the same format as the `n` TXT record described above, just without the base64 encoding.
The UKEY2 client init is a subtype of "UKEY2 message". Google's UKEY2 reference implementation is open source and comes with documentation. Please refer to that repo for details on the key exchange.
This is the initial step of the key exchange for end-to-end encryption. Upon receiving this, the server generates an elliptic-curve (ECDH) key pair and sends its public key in a "server init". The server also needs to remember the raw serialized client init message for the final key derivation step. The outgoing server init message is also needed for the next step. The bytes include the entire protobuf message but do not include the 4-byte length prefix.
After receiving the server init, the client completes the key derivation and sends a "client finish", containing its public key.
Upon receiving the client finish, the server completes the key derivation. This step is described in detail in the Google readme.
The result of the key exchange is two values: the authentication string and the next protocol secret.
The next protocol secret is further processed to obtain the two 32-byte AES and two 32-byte HMAC keys used for encryption and authentication of further communications (relevant Chromium code and this as well):
Derive two 32-byte "device to device" keys using HKDF-SHA256:
- D2D client key, using the next protocol secret for input key material, `82AA55A0D397F88346CA1CEE8D3909B95F13FA7DEB1D4AB38376B8256DA85510` for salt, and the string `client` for info
- D2D server key, using the same parameters, except info is `server`
Next, derive the four keys you will use for the actual encryption. All four use the same value of salt, which is SHA256("SecureMessage"), or `BF9D2A53C63616D75DB0A7165B91C1EF73E537F2427405FA23610A4BE657642E`. These keys are from the server POV; if you're the client, they need to be swapped around (decrypt/receive should use the server key and vice versa).
- Decrypt key: IKM = D2D client key, info = `ENC:2`
- Receive HMAC key: IKM = D2D client key, info = `SIG:1`
- Encrypt key: IKM = D2D server key, info = `ENC:2`
- Send HMAC key: IKM = D2D server key, info = `SIG:1`
The key exchange is now complete.
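The whole derivation can be sketched with nothing but the Python standard library. `hkdf_sha256` is a plain RFC 5869 implementation; `SessionKeys` and `derive_session_keys` are names invented for this example, and the `is_server` flag implements the "swap for the client" rule described above.

```python
import hashlib
import hmac
from typing import NamedTuple


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


D2D_SALT = bytes.fromhex(
    "82AA55A0D397F88346CA1CEE8D3909B95F13FA7DEB1D4AB38376B8256DA85510")
SECURE_MESSAGE_SALT = bytes.fromhex(
    "BF9D2A53C63616D75DB0A7165B91C1EF73E537F2427405FA23610A4BE657642E")


class SessionKeys(NamedTuple):
    decrypt_key: bytes
    receive_hmac_key: bytes
    encrypt_key: bytes
    send_hmac_key: bytes


def derive_session_keys(next_protocol_secret: bytes, is_server: bool) -> SessionKeys:
    """Derive the two AES keys and two HMAC keys for one side of the connection."""
    d2d_client = hkdf_sha256(next_protocol_secret, D2D_SALT, b"client")
    d2d_server = hkdf_sha256(next_protocol_secret, D2D_SALT, b"server")
    # The server receives with the client key and sends with the server key;
    # the client does the opposite.
    incoming, outgoing = (d2d_client, d2d_server) if is_server else (d2d_server, d2d_client)
    return SessionKeys(
        decrypt_key=hkdf_sha256(incoming, SECURE_MESSAGE_SALT, b"ENC:2"),
        receive_hmac_key=hkdf_sha256(incoming, SECURE_MESSAGE_SALT, b"SIG:1"),
        encrypt_key=hkdf_sha256(outgoing, SECURE_MESSAGE_SALT, b"ENC:2"),
        send_hmac_key=hkdf_sha256(outgoing, SECURE_MESSAGE_SALT, b"SIG:1"),
    )
```

A handy self-check while debugging: the server's encrypt key must equal the client's decrypt key for the same secret, and likewise for the HMAC keys.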
The authentication string is used for out-of-band key verification. Nearby Share doesn't use the algorithm specified by UKEY2. Instead, a 4-digit PIN code is generated using this algorithm.
After the key exchange is complete, the server and client send each other one last plaintext message: a connection response. It's a subtype of offline frame saying that the other party has accepted the connection. All the following communication is encrypted and wrapped in the payload layer.
The message type on the wire is always the "secure message". A secure message has two fields: "header and body", and the signature.
Header and body is a serialized `HeaderAndBody` message. Inside, there are two fields, that are (what a surprise!) header and body. The body contains the encrypted payload. The header contains the encryption scheme (must be set to `AES256_CBC`), the signature scheme (must be set to `HMAC_SHA256`), the IV (initialization vector) for AES-CBC consisting of 16 random bytes, and the public metadata. Public metadata is needed because the protocol is extensible af. It contains two fields with constant values: version that is always 1 and type that is always `DEVICE_TO_DEVICE_MESSAGE`.
The signature is an HMAC-SHA256 of the serialized header-and-body field using one of the keys derived above: the send HMAC key for outgoing messages and the receive HMAC key for incoming ones.
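Signing and checking that signature is a few lines with the standard library. This sketch assumes you already have the raw serialized header-and-body bytes exactly as they appear inside the secure message; the function names are made up.

```python
import hashlib
import hmac


def sign_header_and_body(header_and_body: bytes, send_hmac_key: bytes) -> bytes:
    """Compute the signature for an outgoing secure message."""
    return hmac.new(send_hmac_key, header_and_body, hashlib.sha256).digest()


def verify_header_and_body(header_and_body: bytes, signature: bytes,
                           receive_hmac_key: bytes) -> bool:
    """Check the signature of an incoming secure message in constant time."""
    expected = hmac.new(receive_hmac_key, header_and_body, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)
```

Verifying before attempting decryption makes it easier to tell a key derivation bug apart from an AES or padding bug.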
The body inside header-and-body is encrypted using AES-CBC with PKCS7 padding. After decryption it should be a valid device to device message (see securegcm.proto). If it isn't, you did something wrong. Go back and debug your key exchange code. Cryptography is messy, don't worry, no one gets it right on the first try ¯\_(ツ)_/¯
The device to device message contains a sequence number and a message. The message is always a serialized offline frame. The sequence number is 1 for the first message, and is incremented with each message. Client and server have their own independent sequence numbers.
This layer allows the transfer of arbitrarily large payloads in chunks. Payloads come in two types: bytes and files. All negotiation uses bytes payloads with protobuf messages inside. The file payloads are used for actual files.
Payload transfer frames are wrapped into offline frames of type `PAYLOAD_TRANSFER`. These are then encrypted as described above. Meaning of the important payload transfer frame fields is as follows:
- header: the metadata
  - id: the payload identifier within the connection. Allows transferring multiple payloads in parallel. You use it to keep track of multiple transfers, associate buffers and files to it, etc.
  - type: either `BYTES` or `FILE`
  - totalSize: self-explanatory
- chunk: the data itself
  - offset: the offset at which this chunk needs to be written into the buffer or file
  - flags: if `LAST_CHUNK` (bit 0) is set, the transfer is complete and this is the last chunk
  - body: the data bytes themselves
Android does this thing where it sends 2 payload transfer frames in succession for each negotiation message: the first contains the entire message, the second contains 0 bytes but has the `LAST_CHUNK` flag set. I replicated this behavior in NearDrop.
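Reassembling bytes payloads on the receiving side could look roughly like this. The class and its method are hypothetical; the arguments correspond to the header and chunk fields listed above, already pulled out of the parsed protobuf.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

LAST_CHUNK = 0x1  # bit 0 of the chunk flags


@dataclass
class _Incoming:
    total_size: int
    data: bytearray = field(default_factory=bytearray)


class BytesPayloadAssembler:
    """Collects chunks per payload ID and returns the payload once complete."""

    def __init__(self) -> None:
        self._incoming: Dict[int, _Incoming] = {}

    def on_chunk(self, payload_id: int, total_size: int, offset: int,
                 flags: int, body: bytes) -> Optional[bytes]:
        entry = self._incoming.setdefault(payload_id, _Incoming(total_size))
        end = offset + len(body)
        if end > entry.total_size:
            raise ValueError("chunk extends past the announced total size")
        if len(entry.data) < end:
            # Grow the buffer so the chunk can be written at its offset.
            entry.data.extend(bytes(end - len(entry.data)))
        entry.data[offset:end] = body
        if flags & LAST_CHUNK:
            del self._incoming[payload_id]
            return bytes(entry.data)
        return None
```

With Android's two-frame pattern, the first call returns `None` and the second (empty, `LAST_CHUNK`) call returns the full message. For file payloads you would write to a file at `offset` instead of buffering in memory.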
The client and the server send each other a "paired key encryption" frame, wrapped into the payload layer. Presumably, this is used for all the visibility-restriction phone number stuff. Also presumably getting the data contained within involves talking to Google servers. I set `secretIDHash` to 6 random bytes and `signedData` to 72 random bytes in the ones I send. It works fine.
After that, the client and the server send each other a "paired key result" frame. Both have `status` set to `UNABLE`. Whatever.
These and following protobuf messages are specific to Nearby Share and are defined here.
After the successful exchange of the meaningless paired key encryption frames, the client sends an "introduction" frame to the server. This contains the list of files that the client is about to send to the server. The fields should be self-explanatory. The `payload_id` will be used in the payload layer for transferring that particular file.
At this point, Android shows that the connection was successful and displays the PIN code. The server would prompt the user to accept the transfer.
To accept the transfer, the server sends a "response" frame with `status` set to `ACCEPT`. The client will then start sending file chunks over the payload layer. You did it 🎉
To reject the transfer, do the same but set `status` to `REJECT`. There are also other status codes, like `NOT_ENOUGH_SPACE`, that result in Android showing a different error.
Android sends offline frames of type `KEEP_ALIVE` every 10 seconds and expects the server to do the same. If you don't, it will terminate the connection after a while thinking your app crashed or something. This especially comes into play when sending large files. No, TCP's built-in acknowledgements are not enough. There are so many abstraction layers that whoever came up with this forgot about them.
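One simple way to keep track of when a keep-alive is due is sketched below. `KeepAliveTimer` is a made-up helper: poll it from your I/O loop and send a `KEEP_ALIVE` offline frame (encrypted like everything else at this stage) whenever it returns `True`. The optional `now` arguments exist only to make it testable.

```python
import time
from typing import Optional

KEEP_ALIVE_INTERVAL = 10.0  # seconds, matching what Android does


class KeepAliveTimer:
    """Tells the caller when the next keep-alive frame should go out."""

    def __init__(self, now: Optional[float] = None) -> None:
        self._last_sent = time.monotonic() if now is None else now

    def should_send(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if now - self._last_sent >= KEEP_ALIVE_INTERVAL:
            self._last_sent = now
            return True
        return False
```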
Nearby Connections, the underlying layer of Nearby Share, supports running over different "mediums" as Google calls them. Wi-Fi LAN is one of them. Bluetooth, BLE, Wi-Fi Direct, to name a few, are others.
The server is in charge of choosing the medium. The client specifies its supported mediums in its "connection request" packet. The server then intersects that with its own set of supported mediums. After the transfer is accepted, the server may ask the client for a "bandwidth upgrade" by sending a corresponding packet with its chosen medium and authentication credentials, if any.
It is still not clear how the actual medium switch occurs.