The irony is that my ATmega setup currently works, but the 21st-century STM32 does not.
I discovered that with all pins floating (STM32 in Standby mode: 2 μA) the RFM95 draws 0.39 mA.
The fundamental problem is that in STOP mode (GPIOs keep their last setting, all clocks stopped except the wakeup timer) the STM32 draws 0.7 mA with the GPIOs in their default state.
With all pins programmed as analog inputs (the most power-efficient state according to the STM manual) it draws 0.018 mA in the same STOP mode.
So I am currently trading reduced leakage through the RFM pins for a less efficient STM32 GPIO setting. Pulling NSS low drops the current from 0.7 mA to 0.4 mA, so the SPI configuration is probably the root cause, but I have not found a configuration for the STM32 setup that draws less than 0.4 mA.
Perhaps additional external pull-up resistors are indeed the only solution.