
machine learning - How to train Sequence-to-sequence autoencoder using LSTM?


On the Keras blog I found a description of a Seq2Seq autoencoder, but it gives only code and no worked example.





To build a LSTM-based autoencoder, first use a LSTM encoder to turn your input sequences into a single vector that contains information about the entire sequence, then repeat this vector n times (where n is the number of timesteps in the output sequence), and run a LSTM decoder to turn this constant sequence into the target sequence.
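The three stages in this description can be checked on array shapes alone. Below is a minimal sketch, assuming a sequence of 28 timesteps with 28 features each and an embedding size of 16; the names enc and dec are only illustrative, and the nets are randomly initialized rather than trained.

```mathematica
(* Encoder: run an LSTM over the sequence and keep only its last state *)
enc = NetInitialize@NetChain[
    {LongShortTermMemoryLayer[16], SequenceLastLayer[]},
    "Input" -> {28, 28}];

(* Decoder: repeat the embedding 28 times, then let an LSTM
   turn this constant sequence into 28 output vectors of length 28 *)
dec = NetInitialize@NetChain[
    {ReplicateLayer[28], LongShortTermMemoryLayer[28]},
    "Input" -> 16];

sequence = RandomReal[1, {28, 28}];  (* stand-in for one input sequence *)
embedding = enc[sequence];

Dimensions[embedding]       (* {16} *)
Dimensions[dec[embedding]]  (* {28, 28} *)
```

The whole sequence has to pass through the single 16-number vector, which is the bottleneck the rest of this post is about.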




So I tried to build something similar in Mathematica using the MNIST database.


resource = ResourceObject["MNIST"];
trainingData = ResourceData[resource, "TrainingData"];
testData = ResourceData[resource, "TestData"];
trainingSubset = Select[trainingData, Last[#] <= 4 &];
testSubset = Select[testData, Last[#] <= 4 &];
trainingImages = Keys[trainingSubset];
meanImage = Image[Mean@Map[ImageData, trainingImages]];


This is an ordinary fully connected (DNN) autoencoder for MNIST.


encoder = NetChain[{FlattenLayer[], 100, Ramp, 50, Ramp, 16}];
decoder = NetChain[{32, 50, Ramp, 100, Ramp, 784, ReshapeLayer[{1, 28, 28}]}];
net = NetGraph[{encoder, decoder, MeanSquaredLossLayer[]},
  {1 -> 2 -> NetPort["Output"], 2 -> NetPort[3, "Input"],
   NetPort["Input"] -> NetPort[3, "Target"]},
  "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale", "MeanImage" -> meanImage}],
  "Output" -> NetDecoder[{"Image", "Grayscale"}]]
{lossplot1, trained} = NetTrain[net, <|"Input" -> trainingImages|>, "Loss",
  {"LossEvolutionPlot", "TrainedNet"},
  BatchSize -> 256, MaxTrainingRounds -> 10];




We can visualize its reconstructions and its embedding.


reconstructor = Take[trained, {NetPort["Input"], NetPort["Output"]}];
BlockRandom[
Grid[{#, ImageAdd[reconstructor[#], meanImage] & /@ #} &@
RandomSample[trainingImages, 10]], RandomSeeding -> 1234]



encoder = Take[trained, {NetPort["Input"], 1}];
testImages = Keys[testSubset];

coords = DimensionReduce[encoder[testImages], 2, Method -> "TSNE"];
labels = Values[testSubset];
ListPlot[Table[Extract[coords, Position[labels, i]], {i, 0, 4}],
PlotLegends -> PointLegend[96, Range[0, 4]], Axes -> None,
PlotStyle -> Map[ColorData[96], Range[1, 5]], AspectRatio -> 1,
ImageSize -> Small]



But with an LSTM-based autoencoder, the result isn't good.


encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[16],
  SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28],
  ReshapeLayer[{1, 28, 28}]}];
net = NetGraph[{encoder, decoder, MeanSquaredLossLayer[]},
  {1 -> 2 -> NetPort["Output"], 2 -> NetPort[3, "Input"],
   NetPort["Input"] -> NetPort[3, "Target"]},
  "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale", "MeanImage" -> meanImage}],
  "Output" -> NetDecoder[{"Image", "Grayscale"}]]
{lossplot2, trained} = NetTrain[net, <|"Input" -> trainingImages|>, "Loss",
  {"LossEvolutionPlot", "TrainedNet"},
  BatchSize -> 256, MaxTrainingRounds -> 10];




reconstructor = Take[trained, {NetPort["Input"], NetPort["Output"]}];
BlockRandom[Grid[{#, ImageAdd[reconstructor[#], meanImage] & /@ #} &@
RandomSample[trainingImages, 10]], RandomSeeding -> 1234]



encoder = Take[trained, {NetPort["Input"], 1}];
testImages = Keys[testSubset];
coords = DimensionReduce[encoder[testImages], 2, Method -> "TSNE"];
labels = Values[testSubset];

ListPlot[Table[Extract[coords, Position[labels, i]], {i, 0, 4}],
PlotLegends -> PointLegend[96, Range[0, 4]], Axes -> None,
PlotStyle -> Map[ColorData[96], Range[1, 5]], AspectRatio -> 1,
ImageSize -> Small]



The loss of the LSTM autoencoder (purple curve) falls much more slowly:


Show[{lossplot1, lossplot2 /. Hue[__] -> Hue[0.75]}]




How can I improve the result?


PS: Using an LSTM to classify the MNIST digits gives a good result, even though the data here are images rather than time series.


With the LSTM classifier, the vector that comes out of the LSTM also has dimension 16, so it can be regarded as a representation (an encoding) of the image.


Why, then, does the same embedding size not give a good autoencoder result in the case above?


lstmnet = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[16], 
SequenceLastLayer[], LinearLayer[5], SoftmaxLayer[]},
"Output" -> NetDecoder[{"Class", Range[0, 4]}],
"Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}]];

trained = NetTrain[lstmnet, trainingSubset, ValidationSet -> testSubset,
BatchSize -> 256, MaxTrainingRounds -> 5]
ClassifierMeasurements[trained, testSubset]["Accuracy"]
(*0.965947*)



Using a CNN (LeNet version):


lenet = NetChain[{ConvolutionLayer[20, 5], Ramp, PoolingLayer[2, 2],
  ConvolutionLayer[50, 5], Ramp, PoolingLayer[2, 2], FlattenLayer[],
  500, Ramp, 5, SoftmaxLayer[]},
  "Output" -> NetDecoder[{"Class", Range[0, 4]}],
  "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}]];
trained = NetTrain[lenet, trainingSubset, ValidationSet -> testSubset,
  BatchSize -> 256, MaxTrainingRounds -> 5]
ClassifierMeasurements[trained, testSubset]["Accuracy"]
(*0.997081*)

Answer



I compared many network structures using LSTM layers.


Stacking several LSTM layers gives a more accurate result, but it is not as effective as enlarging the embedding size.


Note: all the networks use the code below as the basic model, with batch size 256 and 10 epochs.



net = NetGraph[{encoder, decoder, MeanSquaredLossLayer[]}, 
{1 -> 2 -> NetPort["Output"], 2 -> NetPort[3, "Input"], NetPort["Input"] -> NetPort[3, "Target"]},
"Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale", "MeanImage" -> meanImage}],
"Output" -> NetDecoder[{"Image", "Grayscale"}]]
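Since every comparison reuses this graph with a different encoder and decoder, it can be convenient to wrap it in a function. This is only a sketch: the name buildAutoencoder is made up here, and a uniform gray image stands in for the MNIST meanImage so that the snippet runs on its own.

```mathematica
(* Wrap the basic model so that each encoder/decoder pair can be plugged in *)
buildAutoencoder[encoder_, decoder_, meanImage_] := NetGraph[
  {encoder, decoder, MeanSquaredLossLayer[]},
  {1 -> 2 -> NetPort["Output"], 2 -> NetPort[3, "Input"],
   NetPort["Input"] -> NetPort[3, "Target"]},
  "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale", "MeanImage" -> meanImage}],
  "Output" -> NetDecoder[{"Image", "Grayscale"}]]

(* Example: the single-LSTM pair with state size 16.
   The gray image is a placeholder for the real mean image. *)
demoNet = buildAutoencoder[
  NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[16],
    SequenceLastLayer[]}],
  NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28],
    ReshapeLayer[{1, 28, 28}]}],
  ConstantImage[0.5, {28, 28}]]
```

With the real data, the third argument would be the meanImage computed from the training images.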

Baseline network:


encoder = NetChain[{FlattenLayer[], 100, Ramp, 50, Ramp, 16}];
decoder = NetChain[{50, Ramp, 100, Ramp, 784, ReshapeLayer[{1, 28, 28}]}];

Using single LSTM (state size 16):



encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[16], 
SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28],
ReshapeLayer[{1, 28, 28}]}];

Using single LSTM (state size 64):


encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[64], 
SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28],
ReshapeLayer[{1, 28, 28}]}];


Using double LSTM (state size 16):


encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[16], 
LongShortTermMemoryLayer[16], SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28],
LongShortTermMemoryLayer[28], ReshapeLayer[{1, 28, 28}]}];

Using double LSTM (state size 64):


encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[64],
  LongShortTermMemoryLayer[64], SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28], ReshapeLayer[{1, 28, 28}]}];

Using triple LSTM (state size 64):


encoder = NetChain[{ReshapeLayer[{28, 28}],
  LongShortTermMemoryLayer[64],
  LongShortTermMemoryLayer[64],
  LongShortTermMemoryLayer[64], SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28],
  LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28], ReshapeLayer[{1, 28, 28}]}];

Using triple LSTM (state size 64) and DNN:


Contrary to my expectations, this did not give a better result than the net above :)


encoder = NetChain[{ReshapeLayer[{28, 28}],
  LongShortTermMemoryLayer[64],
  LongShortTermMemoryLayer[64],
  LongShortTermMemoryLayer[64],
  SequenceLastLayer[], 32, ElementwiseLayer["SELU"], 16}];
decoder = NetChain[{32, ElementwiseLayer["SELU"], 64, ReplicateLayer[28],
  LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28], ReshapeLayer[{1, 28, 28}]}];

Using quadruple LSTM (state size 64), the best result of the LSTM family:


encoder = NetChain[{ReshapeLayer[{28, 28}],
  LongShortTermMemoryLayer[64],
  LongShortTermMemoryLayer[64],
  LongShortTermMemoryLayer[64],
  LongShortTermMemoryLayer[64], SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28],
  LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28], ReshapeLayer[{1, 28, 28}]}];



Visualizing the autoencoder with the best result (quadruple LSTM, state size 64) shows that its embedding clusters the digits quite well.




Extreme test:


Increasing the number of stacked LSTM layers is not an efficient approach: it makes the net hard to train, so adding more layers may not help.


Using quintuple LSTM (state size 16):


encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[16],
  LongShortTermMemoryLayer[16], LongShortTermMemoryLayer[16],
  LongShortTermMemoryLayer[16], LongShortTermMemoryLayer[16],
  SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28], LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28], LongShortTermMemoryLayer[28],
  ReshapeLayer[{1, 28, 28}]}];

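Using octuple LSTM (state size 16). It is very hard to train. Each 16-unit layer has only a few thousand weights, so the difficulty probably comes from the depth of the stack rather than from the number of parameters.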


encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[16],
  LongShortTermMemoryLayer[16], LongShortTermMemoryLayer[16],
  LongShortTermMemoryLayer[16], LongShortTermMemoryLayer[16],
  LongShortTermMemoryLayer[16], LongShortTermMemoryLayer[16],
  LongShortTermMemoryLayer[16], SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28], LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28], LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28], LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28], ReshapeLayer[{1, 28, 28}]}];
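As a sanity check on the sizes involved, the trainable weights can be counted by hand. This is a rough sketch, assuming the usual LSTM layout of four gates, each with input weights, state weights and biases (no peephole connections); the helper names lstmParams, linearParams and stackParams are made up for illustration.

```mathematica
(* One LSTM layer with m inputs and state size n:
   4 gates x (n*m input weights + n*n state weights + n biases) *)
lstmParams[m_Integer, n_Integer] := 4 (n m + n^2 + n)

(* One fully connected layer from m inputs to n outputs *)
linearParams[m_Integer, n_Integer] := n m + n

(* Total over a stack; sizes = {inputSize, layer1Size, layer2Size, ...} *)
stackParams[f_, sizes_List] := Total[f @@@ Partition[sizes, 2, 1]]

stackParams[lstmParams, {28, 16}]                          (* single LSTM encoder: 2880 *)
stackParams[lstmParams, Join[{28}, ConstantArray[16, 8]]]  (* octuple LSTM encoder: 17664 *)
stackParams[linearParams, {784, 100, 50, 16}]              (* baseline DNN encoder: 84366 *)
```

By this count the octuple encoder has 17664 weights against 84366 for the baseline dense encoder, which suggests that depth matters more than raw parameter count for the training difficulty.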



Finally, coming back to the question.


We can mimic the DNN autoencoder by tapering the layer sizes. Starting from the structure of the best result (quadruple LSTM, state size 64), the net below narrows the encoder step by step so that the embedding size is still 16:


encoder = NetChain[{ReshapeLayer[{28, 28}], LongShortTermMemoryLayer[64],
  LongShortTermMemoryLayer[64], LongShortTermMemoryLayer[32],
  LongShortTermMemoryLayer[16], SequenceLastLayer[]}];
decoder = NetChain[{ReplicateLayer[28], LongShortTermMemoryLayer[16],
  LongShortTermMemoryLayer[32], LongShortTermMemoryLayer[28],
  LongShortTermMemoryLayer[28], ReshapeLayer[{1, 28, 28}]}];

We can see the result is not bad.




The loss plot follows the answer to "Adding legends when using Show".

