machine learning - How to train Sequence-to-sequence autoencoder using LSTM?

According Keras blog,I find the Seq2Seq auto-encoder. But it didn't give any example only code.

To build a LSTM-based autoencoder, first use a LSTM encoder to turn your input sequences into a single vector that contains information about the entire sequence, then repeat this vector n times (where n is the number of timesteps in the output sequence), and run a LSTM decoder to turn this constant sequence into the target sequence.

So I try to make similar thing in Mathematica using MNIST database.

resource = ResourceObject["MNIST"];
trainingData = ResourceData[resource, "TrainingData"];
testData = ResourceData[resource, "TestData"];
trainingSubset = Select[trainingData, Last[#] <= 4 &];
testSubset = Select[testData, Last[#] <= 4 &];
trainingImages = Keys[trainingSubset];
meanImage = Image[Mean@Map[ImageData, trainingImages]];

This is normal DNN auto-encoder of MNIST.

encoder = NetChain[{FlattenLayer[], 100, Ramp, 50, Ramp, 16}];
decoder = NetChain[{32, 50, Ramp, 100, Ramp, 784, ReshapeLayer[{1, 28, 28}]}];
net = NetGraph[{encoder, decoder, MeanSquaredLossLayer[]}, {1 -> 2 -> NetPort["Output"], 2 -> NetPort[3, "Input"], NetPort["Input"] -> NetPort[3, "Target"]}, 
               "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale", "MeanImage" -> meanImage}], 
               "Output" -> NetDecoder[{"Image", "Grayscale"}]]
{lossplot1, trained} = NetTrain[net, <|"Input" -> trainingImages|>, "Loss", 
                                {"LossEvolutionPlot", "TrainedNet"}, 
                                BatchSize -> 256, MaxTrainingRounds -> 10];

We could explore the visulization of it.

reconstructor = Take[trained, {NetPort["Input"], NetPort["Output"]}];
BlockRandom[
 Grid[{#, ImageAdd[reconstructor[#], meanImage] & /@ #} &@
   RandomSample[trainingImages, 10]], RandomSeeding -> 1234]

encoder = Take[trained, {NetPort["Input"], 1}];
testImages = Keys[testSubset];

coords = DimensionReduce[encoder[testImages], 2, Method -> "TSNE"];
labels = Values[testSubset];
ListPlot[Table[Extract[coords, Position[labels, i]], {i, 0, 4}], 
 PlotLegends -> PointLegend[96, Range[0, 4]], Axes -> None, 
 PlotStyle -> Map[ColorData[96], Range[1, 5]], AspectRatio -> 1, 
 ImageSize -> Small]

But using autoencoder based LSTM,the result isn't good.

encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[16], 

                    SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28], 
                    ReshapeLayer[{1, 28, 28}]}];
net = NetGraph[{encoder, decoder, MeanSquaredLossLayer[]}, {1 -> 2 -> NetPort["Output"], 2 -> NetPort[3, "Input"], NetPort["Input"] -> NetPort[3, "Target"]}, 
                "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale", "MeanImage" -> meanImage}], 
                "Output" -> NetDecoder[{"Image", "Grayscale"}]]
{lossplot2, trained} = NetTrain[net, <|"Input" -> trainingImages|>, "Loss", 
                                {"LossEvolutionPlot", "TrainedNet"}, 
                                BatchSize -> 256, MaxTrainingRounds -> 10];

reconstructor = Take[trained, {NetPort["Input"], NetPort["Output"]}];
BlockRandom[Grid[{#, ImageAdd[reconstructor[#], meanImage] & /@ #} &@
   RandomSample[trainingImages, 10]], RandomSeeding -> 1234]

encoder = Take[trained, {NetPort["Input"], 1}];
testImages = Keys[testSubset];
coords = DimensionReduce[encoder[testImages], 2, Method -> "TSNE"];
labels = Values[testSubset];

ListPlot[Table[Extract[coords, Position[labels, i]], {i, 0, 4}], 
 PlotLegends -> PointLegend[96, Range[0, 4]], Axes -> None, 
 PlotStyle -> Map[ColorData[96], Range[1, 5]], AspectRatio -> 1, 
 ImageSize -> Small]

The loss of LSTM(purple color) fall down slowly

Show[{lossplot1, lossplot2 /. Hue[__] -> Hue[0.75]}]

How to improve the result?

PS:Using LSTM to predict digits number in MNIST got a good result even data is image instead of temporal time-series in this case

Using LSTM,the dimensions of embedding vector after LSTM is also 16,it can be think a presentation or encoder of the image.

why in the above case,same embedding vector size can't get a pretty good result of auto-encoder?

lstmnet = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[16], 
                    SequenceLastLayer[], LinearLayer[5], SoftmaxLayer[]}, 
                    "Output" -> NetDecoder[{"Class", Range[0, 4]}], 
                    "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}]];

trained = NetTrain[lstmnet, trainingSubset, ValidationSet -> testSubset, 
                   BatchSize -> 256, MaxTrainingRounds -> 5]
ClassifierMeasurements[trained, testSubset]["Accuracy"]
(*0.965947*)

Using CNN(lenet version)

lenet = NetChain[{ConvolutionLayer[20, 5], Ramp, PoolingLayer[2, 2], 
                  ConvolutionLayer[50, 5], Ramp, PoolingLayer[2, 2], FlattenLayer[],
                  500, Ramp, 5, SoftmaxLayer[]}, 

                  "Output" -> NetDecoder[{"Class", Range[0, 4]}], 
                  "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}]];
trained = NetTrain[lenet, trainingSubset, ValidationSet -> testSubset,
                   BatchSize -> 256,MaxTrainingRounds -> 5]
ClassifierMeasurements[trained, testSubset]["Accuracy"]
(*0.997081*)

Answer

I compare many net structure using LSTM.

Then finding multi-stacks LSTM will get a more accurate result.But it's not as effective as enlarge the embedding size.

Note:All the networks use this code as basic model,batch size 256,epochs 10

net = NetGraph[{encoder, decoder, MeanSquaredLossLayer[]}, 
               {1 -> 2 -> NetPort["Output"], 2 -> NetPort[3, "Input"], NetPort["Input"] -> NetPort[3, "Target"]}, 
               "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale", "MeanImage" -> meanImage}], 
               "Output" -> NetDecoder[{"Image", "Grayscale"}]]

Baseline network:

encoder = NetChain[{FlattenLayer[], 100, Ramp, 50, Ramp, 16}];
decoder = NetChain[{50, Ramp, 100, Ramp, 784, ReshapeLayer[{1, 28, 28}]}];

Using single LSTM(state size 16):

encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[16], 
                    SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28], 
                    ReshapeLayer[{1, 28, 28}]}];

Using single LSTM(state size 64):

encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[64], 
                    SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28], 
                    ReshapeLayer[{1, 28, 28}]}];

Using double LSTM(state size 16):

encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[16], 
                    LongShortTermMemoryLayer[16], SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28], 
                    LongShortTermMemoryLayer[28], ReshapeLayer[{1, 28, 28}]}];

Using double LSTM(state size 64):

encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[64], 
    LongShortTermMemoryLayer[64], SequenceLastLayer[]}];

decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28], 
    LongShortTermMemoryLayer[28], ReshapeLayer[{1, 28, 28}]}];

Using triple LSTM(state size 64):

encoder = NetChain[{ReshapeLayer[{28, 28}], 
                    LongShortTermMemoryLayer[64], 
                    LongShortTermMemoryLayer[64], 
                    LongShortTermMemoryLayer[64], SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], 
                    LongShortTermMemoryLayer[28], 

                    LongShortTermMemoryLayer[28], 
                    LongShortTermMemoryLayer[28], ReshapeLayer[{1, 28, 28}]}];

Using triple LSTM(state size 64) and DNN:

Contrary to my expectations, it didn't get a better result compare to the above net :)

encoder = NetChain[{ReshapeLayer[{28, 28}], 
                    LongShortTermMemoryLayer[64], 
                    LongShortTermMemoryLayer[64], 
                    LongShortTermMemoryLayer[64], 
                    SequenceLastLayer[], 32, ElementwiseLayer["SELU"], 16}];

decoder = NetChain[{32, ElementwiseLayer["SELU"], 64, ReplicateLayer[28], 
                    LongShortTermMemoryLayer[28], 
                    LongShortTermMemoryLayer[28], 
                    LongShortTermMemoryLayer[28], ReshapeLayer[{1, 28, 28}]}];

Using quadruple LSTM(state size 64) - Best result of LSTM family.

encoder = NetChain[{ReshapeLayer[{28, 28}], 
                    LongShortTermMemoryLayer[64], 
                    LongShortTermMemoryLayer[64], 
                    LongShortTermMemoryLayer[64], 

                    LongShortTermMemoryLayer[64], SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], 
                    LongShortTermMemoryLayer[28], 
                    LongShortTermMemoryLayer[28], 
                    LongShortTermMemoryLayer[28], 
                    LongShortTermMemoryLayer[28], ReshapeLayer[{1, 28, 28}]}];

Visualize the effect of auto-encoder of Bast result(using quadruple LSTM(state size 64)),the effect of clustering is pretty good.

Extreme Test:

Enlarge the number of LSTM-staks is not an efficency way, it will make the net hard to train. In other words, Increasing the number of LSTM-stacks may not be effective.

Using quadruple LSTM(state size 16):

encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[16], 
    LongShortTermMemoryLayer[16], LongShortTermMemoryLayer[16], 
    LongShortTermMemoryLayer[16], LongShortTermMemoryLayer[16], 
    SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28], 
    LongShortTermMemoryLayer[28], LongShortTermMemoryLayer[28], 

    LongShortTermMemoryLayer[28], LongShortTermMemoryLayer[28], 
    ReshapeLayer[{1, 28, 28}]}];

Using octuple LSTM(state size 16),it is very hard to train because too many parameters.

encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[16], 
    LongShortTermMemoryLayer[16], LongShortTermMemoryLayer[16], 
    LongShortTermMemoryLayer[16], LongShortTermMemoryLayer[16], 
    LongShortTermMemoryLayer[16], LongShortTermMemoryLayer[16], 
    LongShortTermMemoryLayer[16], SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28], 

    LongShortTermMemoryLayer[28], LongShortTermMemoryLayer[28], 
    LongShortTermMemoryLayer[28], LongShortTermMemoryLayer[28], 
    LongShortTermMemoryLayer[28], LongShortTermMemoryLayer[28], 
    LongShortTermMemoryLayer[28], ReshapeLayer[{1, 28, 28}]}];

Finally,Come back to this question.

We can simulate the process the auto-encoder of DNN, not only the result is not bad but also it can learn the embedding size(16) if using the structure of Best result of quadruple LSTM(state size 64)

encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[64], 
    LongShortTermMemoryLayer[64], LongShortTermMemoryLayer[32], 

    LongShortTermMemoryLayer[16], SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[16], 
    LongShortTermMemoryLayer[32], LongShortTermMemoryLayer[28], 
    LongShortTermMemoryLayer[28], ReshapeLayer[{1, 28, 28}]}];

We can see the result is not bad.

The loss plot refer the answer from Adding legends when using Show

dynamic - How can I make a clickable ArrayPlot that returns input?

I would like to create a dynamic ArrayPlot so that the rectangles, when clicked, provide the input. Can I use ArrayPlot for this? Or is there something else I should have to use? Answer ArrayPlot is much more than just a simple array like Grid : it represents a ranged 2D dataset, and its visualization can be finetuned by options like DataReversed and DataRange . These features make it quite complicated to reproduce the same layout and order with Grid . Here I offer AnnotatedArrayPlot which comes in handy when your dataset is more than just a flat 2D array. The dynamic interface allows highlighting individual cells and possibly interacting with them. AnnotatedArrayPlot works the same way as ArrayPlot and accepts the same options plus Enabled , HighlightCoordinates , HighlightStyle and HighlightElementFunction . data = {{Missing["HasSomeMoreData"], GrayLevel[ 1], {RGBColor[0, 1, 1], RGBColor[0, 0, 1], GrayLevel[1]}, RGBColor[0, 1, 0]}, {GrayLevel[0], GrayLevel...

Blog

Search This Blog

machine learning - How to train Sequence-to-sequence autoencoder using LSTM?

Comments

Post a Comment

Popular posts from this blog

plotting - How to draw lines between specified dots on ListPlot?

equation solving - Invert and fit implicitly defined curve

dynamic - How can I make a clickable ArrayPlot that returns input?