I have some nicely formatted training data in HDF5 files, i.e. the images and labels are serialized within the H5 file as groups and datasets.
AFAIK, NetTrain only works in an out-of-core mode on lists of paths, each wrapped in File. Is there any way to get NetTrain to work off of training data stored in an H5 file efficiently (with parallelized batch loading)?
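For clarity, the out-of-core pattern I mean looks like the following sketch (hypothetical file paths and placeholder labels, assuming net has an "Image" encoder):

(* out-of-core training data: one File per example, paired with its label *)
files = File /@ FileNames["*.png", "trainingImages"];
labels = RandomInteger[1, Length[files]];  (* placeholder labels *)
NetTrain[net, Thread[files -> labels]]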
Update from comment:
@AlexeyGolyshev suggested using @TaliesinBeynon's undocumented answer, which no longer seems to work in v11.3:
{input, output} = RandomReal[1, {2, 1000, 2}];  (* 1000 random input/output pairs *)
<< GeneralUtilities`;                           (* provides ExportStructuredHDF5 *)
fn = FileNameJoin[{CreateDirectory[], "test.h5"}];
ExportStructuredHDF5[fn, <|"Input" -> input, "Output" -> output|>];
NetTrain[LinearLayer["Input" -> 2, "Output" -> 2], File[fn]]
Answer
An example of how to pack images into an HDF5 file and train a neural network on it:
SeedRandom[0];
X = Table[RandomImage[1, {32, 32}, ColorSpace -> "RGB"], 10];  (* 10 random RGB images *)
Y = RandomInteger[1, 10];                                      (* 10 binary class labels *)
enc = NetEncoder[{"Image", {32, 32}, ColorSpace -> "RGB"}];    (* image -> 3x32x32 real array *)
Export["data.h5",
  {
    "Datasets" ->
      {
        "Input" -> Flatten /@ enc@X,  (* each image flattened to a vector of length 3*32*32 *)
        "Output" -> N@Y               (* labels stored as reals *)
      },
    "DataFormat" -> {Automatic, Automatic}
  },
  "Rules"
];
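As a quick sanity check, the layout of the exported file can be inspected; assuming the export above succeeded, the dataset names and the stored input dimensions should look like this:

(* list the dataset names, then check the stored input dimensions *)
Import["data.h5"]                                     (* {"/Input", "/Output"} *)
Dimensions@Import["data.h5", {"Datasets", "/Input"}]  (* {10, 3072} *)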
net = NetChain[{
    ReshapeLayer[{3, 32, 32}],   (* unflatten the stored vector back into an image tensor *)
    ConvolutionLayer[32, {3, 3}],
    2,                           (* shorthand for LinearLayer[2] *)
    SoftmaxLayer[]
  },
  "Input" -> 3*32*32,
  "Output" -> NetDecoder[{"Class", {0, 1}}]
]
netT = NetTrain[net, File["data.h5"]]
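To apply the trained net to a new image, encode it with the same NetEncoder and flatten it to match the net's flat input (a usage sketch):

(* classify a fresh random image; the "Class" decoder returns 0 or 1 *)
img = RandomImage[1, {32, 32}, ColorSpace -> "RGB"];
netT[Flatten@enc@img]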