I have some nicely formatted training data in HDF5 files, i.e. the images and labels are serialized within the H5 file as groups and datasets.
AFAIK, NetTrain only works in an out-of-core mode on lists of paths, each wrapped in File. Is there any way to get NetTrain to work off of training data stored in an H5 file efficiently (with parallelized batch loading)?
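For clarity, the out-of-core pattern I mean looks like the following sketch (hypothetical file paths and placeholder labels, assuming net has an "Image" encoder):

(* out-of-core training data: one File per example, paired with its label *)
files = File /@ FileNames["*.png", "trainingImages"];
labels = RandomInteger[1, Length[files]];  (* placeholder labels *)
NetTrain[net, Thread[files -> labels]]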
Update from comment:
@AlexeyGolyshev suggested using @TaliesinBeynon's undocumented answer, which no longer seems to work in v11.3:
{input, output} = RandomReal[1, {2, 1000, 2}];  (* 1000 random input/output pairs *)
<< GeneralUtilities`;                           (* provides ExportStructuredHDF5 *)
fn = FileNameJoin[{CreateDirectory[], "test.h5"}];
ExportStructuredHDF5[fn, <|"Input" -> input, "Output" -> output|>];
NetTrain[LinearLayer["Input" -> 2, "Output" -> 2], File[fn]]
Answer
An example of how to pack images into an HDF5 file and train a neural network on it:
SeedRandom[0];
X = Table[RandomImage[1, {32, 32}, ColorSpace -> "RGB"], 10];  (* 10 random RGB images *)
Y = RandomInteger[1, 10];                                      (* 10 binary class labels *)
enc = NetEncoder[{"Image", {32, 32}, ColorSpace -> "RGB"}];    (* image -> 3x32x32 real array *)
Export["data.h5",
  {
    "Datasets" ->
      {
        "Input" -> Flatten /@ enc@X,  (* each image flattened to a vector of length 3*32*32 *)
        "Output" -> N@Y               (* labels stored as reals *)
      },
    "DataFormat" -> {Automatic, Automatic}
  },
  "Rules"
];
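As a quick sanity check, the layout of the exported file can be inspected; assuming the export above succeeded, the dataset names and the stored input dimensions should look like this:

(* list the dataset names, then check the stored input dimensions *)
Import["data.h5"]                                     (* {"/Input", "/Output"} *)
Dimensions@Import["data.h5", {"Datasets", "/Input"}]  (* {10, 3072} *)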
net = NetChain[{
    ReshapeLayer[{3, 32, 32}],   (* unflatten the stored vector back into an image tensor *)
    ConvolutionLayer[32, {3, 3}],
    2,                           (* shorthand for LinearLayer[2] *)
    SoftmaxLayer[]
  },
  "Input" -> 3*32*32,
  "Output" -> NetDecoder[{"Class", {0, 1}}]
]
netT = NetTrain[net, File["data.h5"]]
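To apply the trained net to a new image, encode it with the same NetEncoder and flatten it to match the net's flat input (a usage sketch):

(* classify a fresh random image; the "Class" decoder returns 0 or 1 *)
img = RandomImage[1, {32, 32}, ColorSpace -> "RGB"];
netT[Flatten@enc@img]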