After I came back from NDSS, my senior told me there was a paper at USENIX Security 20 about a GPU cache side-channel attack, and my former supervisor praised this kind of idea in the group meeting. I am also interested in the hardware ML security topic, so I read the paper and wrote this essay to record some ideas.

The paper is Cache Telepathy: Leveraging Shared Resource Attacks to Learn DNN Architectures. Essentially, it establishes a mapping from DNN hyperparameters to matrix-multiplication parameters, then uses a cache side channel to extract the latter and thereby recover the former. I will introduce it in detail in the following parts.


Honestly, I am always curious how the authors found this idea, i.e., the connection between the DNN architecture and the matrix function parameters. First, DNN inference relies on GEMM (General Matrix Multiply), which is used to accelerate matrix operations. Inside the kernel, the DNN architecture determines the number of GEMM calls and the dimensions of the matrices those calls operate on. These GEMM operations, in turn, can be observed through cache side-channel attacks, so by exploiting this connection the attacker can obtain the matrix parameters and reverse engineer the architecture. Although recovering the architecture means recovering plenty of parameters, Cache Telepathy uses this connection to sharply reduce the search space.
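To make this connection concrete, here is a small sketch of my own (not the paper's code): a sequential network whose forward pass issues one GEMM per layer, so both the number of calls and each call's dimensions leak the depth and widths of the network. The layer sizes below are made up.

```python
# Sketch (my illustration): a fully connected layer computed as one GEMM.
# The GEMM dimensions directly expose the layer's hyperparameters:
# here k = number of input neurons, m = number of output neurons.

def gemm(A, B):
    """Plain matrix multiply: (m x k) @ (k x n) -> (m x n)."""
    m, k, n = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

# Hypothetical layer: 3 input neurons, 2 output neurons, batch of 4 samples.
W = [[0.1, 0.2, 0.3],              # weight matrix: 2 x 3
     [0.4, 0.5, 0.6]]
X = [[1.0] * 4, [2.0] * 4, [3.0] * 4]  # inputs: 3 x 4, one column per sample

Y = gemm(W, X)                     # one GEMM call per FC layer
# Observing this call's dimensions (m=2, k=3) reveals the layer's width.
print(len(Y), len(Y[0]))           # 2 4
```

In a deeper network the output `Y` of one such call becomes the input of the next, so counting GEMM calls bounds the number of layers.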


Briefly, the steps of this paper are as follows:

  • use a cache side-channel attack to monitor matrix multiplications and obtain the matrix parameters
  • reverse engineer the DNN architecture from the mapping between DNN hyperparameters and matrix parameters
  • prune the possible values of the remaining undiscovered hyperparameters to generate a reduced search space of candidate DNN architectures
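The first step deserves a sketch. Real BLAS libraries compute GEMM in blocks, so the attacker does not see exact dimensions; probing the cache lines of the GEMM loops only reveals how many iterations each loop level ran. The sketch below is my own simplification (block sizes and function names are invented, not the paper's): iteration counts plus known block sizes bound each matrix dimension, which is exactly why the last step still needs pruning.

```python
# Hypothetical tiling parameters of the victim's BLAS library (made up here;
# in practice the attacker learns them from the library binary).
BLOCK_M, BLOCK_N, BLOCK_K = 4, 8, 16

def infer_dims(iters_m, iters_n, iters_k):
    """Given observed loop-iteration counts from cache probes,
    return (low, high) bounds for each matrix dimension."""
    def bounds(iters, block):
        # 'iters' full blocks cover at most iters*block elements, and the
        # last block must be non-empty, so at least (iters-1)*block + 1.
        return ((iters - 1) * block + 1, iters * block)
    return (bounds(iters_m, BLOCK_M),
            bounds(iters_n, BLOCK_N),
            bounds(iters_k, BLOCK_K))

# e.g. 3 iterations of the m-loop with block size 4 means 9 <= m <= 12
print(infer_dims(3, 2, 5))  # ((9, 12), (9, 16), (65, 80))
```

The gap between the low and high bounds is one source of the residual search space that the third step enumerates.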

Mapping DNN Hyperparameters to Matrix Parameters

Fully Connected (FC) Layer
In a fully-connected layer, each neuron computes a weighted sum of values from all the neurons in the previous layer, followed by a non-linear transformation.
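As a tiny illustration of that sentence (my own, not from the paper), one neuron's output is a weighted sum over all previous-layer activations followed by a non-linearity; I use ReLU here, though the paper does not fix a particular one:

```python
def fc_neuron(weights, inputs, bias=0.0):
    # Weighted sum over every neuron in the previous layer...
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    # ...followed by a non-linear transformation (ReLU as an example).
    return max(0.0, s)

print(fc_neuron([0.5, -1.0, 2.0], [1.0, 1.0, 1.0]))  # 1.5
```

Stacking all neurons of the layer row-by-row turns this into exactly the weight-matrix-times-input GEMM that the attack observes.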
Convolutional Layer
Each neuron is connected to only a spatial region of neurons in the previous layer. By unrolling each receptive field into a matrix column, the whole convolution operation is converted into a single matrix multiplication.
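This unrolling is commonly called im2col; the sketch below is my own minimal version (single channel, stride 1, no padding), assumed to be equivalent in spirit to what the victim's BLAS-backed framework does:

```python
def im2col(image, k):
    """image: H x W list of lists; k: kernel size.
    Returns a (k*k) x P matrix with one column per output position."""
    H, W = len(image), len(image[0])
    patches = []
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            # Flatten one k x k receptive field into a single vector.
            patches.append([image[i + di][j + dj]
                            for di in range(k) for dj in range(k)])
    # Transpose so each patch becomes a column.
    return [list(col) for col in zip(*patches)]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
M = im2col(img, 2)        # 4 x 4: four 2x2 patches, each a column
# A flattened 1 x (k*k) kernel times M computes the whole convolution
# as one GEMM, which is what the cache attack gets to observe.
print(len(M), len(M[0]))  # 4 4
```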

Based on the above analysis, it is possible to map DNN hyperparameters to matrix operation parameters by assuming all layers are sequentially connected.
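To close the loop, here is an illustrative inversion of that mapping (my own sketch; the variable names are mine, and the example dimensions are hypothetical): once a layer's GEMM dimensions are known, the hyperparameters fall out almost directly, with the kernel size remaining as one of the values the attack must still search over.

```python
def fc_hyperparams(m, k):
    # FC layer as an (m x k) weight matrix times a (k x batch) input:
    # k = input neurons, m = output neurons.
    return {"input_neurons": k, "output_neurons": m}

def conv_hyperparams(m, k, kernel_size):
    # Conv layer as GEMM after im2col: m = output channels,
    # k = input_channels * kernel_size^2. The kernel size itself is not
    # directly observed, so candidate values must be enumerated.
    return {"output_channels": m,
            "input_channels": k // (kernel_size * kernel_size)}

print(fc_hyperparams(4096, 9216))
print(conv_hyperparams(64, 576, 3))  # {'output_channels': 64, 'input_channels': 64}
```

Trying each plausible kernel size against the recovered `k` and keeping only those that divide it evenly is one simple way the search space gets pruned.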