The need to repeatedly move large feature maps off- and on-chip during inference with convolutional neural networks (CNNs) imposes high costs in energy and time. In this work we explore an improved method for compressing all feature maps of pre-trained CNNs to below a specified limit. This is achieved by means of learned projections trained via end-to-end finetuning, which can then be folded and fused into the pre-trained network. We also introduce a new ‘ceiling compression’ framework in which to evaluate such techniques, with a view to the future goal of performing inference fully on-chip.
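To make the idea of a foldable learned projection concrete, the following is a minimal PyTorch sketch under assumed layer names and channel sizes (it is not the paper's exact architecture): a learned 1x1 projection compresses a feature map to fewer channels before it would leave the chip, a second 1x1 projection expands it again, and after finetuning the expansion can be folded into the weights of the following convolution because both are linear maps.

```python
import torch
import torch.nn as nn

class ProjectedBlock(nn.Module):
    """Hypothetical wrapper: a compress/expand projection pair around a pre-trained conv."""
    def __init__(self, next_conv: nn.Conv2d, c_full: int, c_small: int):
        super().__init__()
        self.down = nn.Conv2d(c_full, c_small, kernel_size=1, bias=False)  # learned compression
        self.up = nn.Conv2d(c_small, c_full, kernel_size=1, bias=False)    # learned expansion
        self.next_conv = next_conv                                         # frozen pre-trained layer

    def forward(self, x):
        # self.down(x) is the reduced feature map that would be stored off-chip.
        return self.next_conv(self.up(self.down(x)))

    def fold(self) -> nn.Conv2d:
        # Fuse the expansion into the following conv:
        # W'[o, s, :, :] = sum_c W[o, c, :, :] * U[c, s]
        W = self.next_conv.weight.data                   # (out, c_full, k, k)
        U = self.up.weight.data.squeeze(-1).squeeze(-1)  # (c_full, c_small)
        fused = nn.Conv2d(U.shape[1], W.shape[0],
                          kernel_size=self.next_conv.kernel_size,
                          stride=self.next_conv.stride,
                          padding=self.next_conv.padding,
                          bias=self.next_conv.bias is not None)
        fused.weight.data = torch.einsum('ockh,cs->oskh', W, U)
        if self.next_conv.bias is not None:
            fused.bias.data = self.next_conv.bias.data.clone()
        return fused  # consumes the compressed c_small-channel feature map directly
```

In this sketch, finetuning the `down`/`up` pair end-to-end with the rest of the (frozen or lightly tuned) network recovers accuracy, and `fold()` then removes the expansion at inference time so only the compressed feature map ever needs to be moved.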