export - Simple example of NetGraph running in MXNet?

There is a great example of exporting a NetGraph (trained in Mathematica) into MXNet's format.

However, I think there is a bug in that answer, because the exported file refuses to load into MXNet.

I've cleaned up the old answer (which is broken) into something that works:

<< MXNetLink`;
<< NeuralNetworks`;
<< GeneralUtilities`;
net = NetChain[{ConvolutionLayer[20, {5, 5}], ElementwiseLayer[Ramp], 
   PoolingLayer[{2, 2}, {2, 2}], ConvolutionLayer[50, {5, 5}],
   ElementwiseLayer[Ramp], PoolingLayer[{2, 2}, {2, 2}],
   FlattenLayer[], DotPlusLayer[500], ElementwiseLayer[Ramp], 
   DotPlusLayer[10], SoftmaxLayer[]}, 
  "Output" -> NetDecoder[{"Class", Range[0, 9]}], 
  "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}]]
resource = ResourceObject["MNIST"];
trainingData = ResourceData[resource, "TrainingData"];
testData = ResourceData[resource, "TestData"];
trained = 
 NetTrain[net, trainingData, ValidationSet -> testData,
  MaxTrainingRounds -> 3]
jsonPath = "~/MNIST-symbol.json";
paraPath = "~/MNIST-0000.params";
Export[jsonPath, ToMXJSON[trained][[1]], "String"];
f[str_] := 
  If[StringFreeQ[str, "Arrays"], str, 
   StringReplace[
    StringSplit[str, ".Arrays."] /. {a_, b_} :> 
      StringJoin[{"arg:", a, "_", b}], {"Weights" -> "weight", 
     "Biases" -> "bias"}]];

plan = ToMXPlan[trained];
NDArrayExport[paraPath, 
 NDArrayCreate /@ KeyMap[f, plan["ArgumentArrays"]]]
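To make the renaming done by `f` easier to follow, here is a minimal Python sketch of the same string transformation. The function name `rename_key` and the sample keys are hypothetical and only for illustration; the real keys are whatever `plan["ArgumentArrays"]` contains. The sketch assumes each key contains at most one `.Arrays.` separator and returns anything else unchanged.

```python
def rename_key(key: str) -> str:
    """Mirror the Wolfram Language helper f: turn 'layer.Arrays.Weights'
    into 'arg:layer_weight' and leave keys without 'Arrays' untouched."""
    if "Arrays" not in key:
        return key
    parts = key.split(".Arrays.")
    if len(parts) != 2:
        # Assumption: keys that do not split into exactly two parts
        # are passed through unchanged.
        return key
    layer, array = parts
    joined = "arg:" + layer + "_" + array
    # Same substitutions as the StringReplace rules in f.
    return joined.replace("Weights", "weight").replace("Biases", "bias")


# Hypothetical keys, for illustration only.
for k in ["1.Arrays.Weights", "10.Arrays.Biases", "Input"]:
    print(k, "->", rename_key(k))
```

The `arg:` prefix matters because, as far as I understand it, MXNet's checkpoint loader uses the part before the colon to decide whether an array is a regular argument or an auxiliary state.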

We can load the exported files back into Mathematica without trouble, but when we try to use them with MXNet itself, we hit various snags loading the symbol and params into a module so that we can actually run the network:

# Python code that loads and runs the model
import mxnet as mx
sym, arg_params, aux_params = mx.model.load_checkpoint('MNIST', 0)
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')
batch_size = 2

iter = mx.io.NDArrayIter(mnist['data'][:6], mnist['target'][:6], batch_size)
mod = mx.mod.Module(symbol=sym, context=mx.cpu(), data_names=['Input'], label_names=None)
mod.bind(for_training=False, data_shapes = [('Input', (batch_size, 1, 28, 28))])
mod.set_params(arg_params, aux_params)
print(mod.predict(iter))
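One way to narrow down where loading goes wrong is to compare the names stored in the params file with the names the symbol expects. The following is a minimal sketch in plain Python under stated assumptions: the helper names and the sample lists are hypothetical, and in practice the saved keys would come from something like `mx.nd.load(...)` on the params file and the symbol arguments from `sym.list_arguments()`.

```python
def split_param_keys(saved_keys):
    """Separate 'arg:name' and 'aux:name' keys, roughly the way a
    checkpoint loader would; anything else is collected in 'other'."""
    arg_names, aux_names, other = [], [], []
    for key in saved_keys:
        prefix, sep, name = key.partition(":")
        if sep and prefix == "arg":
            arg_names.append(name)
        elif sep and prefix == "aux":
            aux_names.append(name)
        else:
            other.append(key)
    return arg_names, aux_names, other


def missing_arguments(symbol_args, arg_names, data_names=("Input",)):
    """Symbol arguments that are neither data inputs nor in the params file."""
    known = set(arg_names) | set(data_names)
    return [a for a in symbol_args if a not in known]


# Hypothetical names, for illustration only.
saved = ["arg:1_weight", "arg:1_bias", "Input"]
symbol_args = ["Input", "1_weight", "1_bias", "4_weight"]

args, aux, other = split_param_keys(saved)
print("unprefixed keys:", other)
print("missing from params:", missing_arguments(symbol_args, args))
```

Any name reported as missing, or any key saved without an `arg:` or `aux:` prefix, is a likely candidate for the mismatch between the exported files and what the module expects.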