I have the following list:
x={"A", "A", "A", "E", "D", "D", "D", "C", "B", "E", "E", "E", "D", \
"B", "A", "D", "B", "E", "C", "A", "D", "A", "A", "A", "A", "C", "C", \
"C", "D", "D", "E"}
I want to build the first-order Markov transition probability matrix. To do so, I started by writing:
Partition[x, 2, 1] // Sort // Counts
This will give:
<|{"A", "A"} -> 5, {"A", "C"} -> 1, {"A", "D"} -> 2, {"A", "E"} ->
1, {"B", "A"} -> 1, {"B", "E"} -> 2, {"C", "A"} -> 1, {"C", "B"} ->
1, {"C", "C"} -> 2, {"C", "D"} -> 1, {"D", "A"} -> 1, {"D", "B"} ->
2, {"D", "C"} -> 1, {"D", "D"} -> 3, {"D", "E"} -> 1, {"E", "C"} ->
1, {"E", "D"} -> 2, {"E", "E"} -> 2|>
The above shows the frequency of each state transition (A to A, A to B, A to C, A to D, A to E, and so on for the other letters). How can I show this result as a matrix?
Answer
You can use SparseArray with additive assembly as follows:
x = RandomChoice[Alphabet["English", "IndexCharacters"], 1000000];
data = Flatten[ToCharacterCode[x]] - (ToCharacterCode["A"][[1]] - 1); // AbsoluteTiming // First
A = With[{spopt = SystemOptions["SparseArrayOptions"]},
   Internal`WithLocalSettings[
    (*switch to additive assembly*)
    SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> Total}],
    (*assemble matrix*)
    SparseArray[
     Partition[data, 2, 1] -> 1,
     Max[data] {1, 1}
     ],
    (*reset "SparseArrayOptions" to previous value*)
    SetSystemOptions[spopt]]
   ]; // AbsoluteTiming // First
0.739454
0.114682
As the timings suggest, it is worthwhile to avoid strings in the first place.
Formerly, I used LetterNumber, but ToCharacterCode is much, much faster.
It is "TreatRepeatedEntries" -> Total which enables the summing of entries, so Counts is not needed anymore.
Developer`ToPackedArray might speed things up a bit if x is very long.
The other hocus-pocus is for making things bulletproof against aborts (i.e., the options are reset even if the computation is interrupted). See also (37566) and (136017).
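For the short list in the question, a simpler variant may be enough. The following is a minimal sketch, not the method above: it keeps Counts and passes the merged rules to SparseArray, so no system option has to be changed. The names states, idx, counts and probs are hypothetical, and it assumes the states should be ordered alphabetically. It also normalizes each row, since the question asks for transition probabilities rather than raw frequencies.

```wolfram
(* the list from the question *)
x = {"A", "A", "A", "E", "D", "D", "D", "C", "B", "E", "E", "E", "D",
   "B", "A", "D", "B", "E", "C", "A", "D", "A", "A", "A", "A", "C",
   "C", "C", "D", "D", "E"};

(* hypothetical helper names: states, idx, counts, probs *)
states = Union[x];  (* sorted distinct states: {"A", "B", "C", "D", "E"} *)
idx = Lookup[AssociationThread[states -> Range[Length[states]]], x];

(* Counts has already merged duplicate pairs, so every position occurs once *)
counts = Normal[
   SparseArray[Normal[Counts[Partition[idx, 2, 1]]],
    Length[states] {1, 1}]];

(* divide each row by its total; Max[..., 1] avoids 0/0 for a state
   that never occurs as a source *)
probs = #/Max[Total[#], 1] & /@ counts;

MatrixForm[probs, TableHeadings -> {states, states}]
```

Rows correspond to the "from" state and columns to the "to" state. With the data above, counts[[1]] is {5, 0, 1, 2, 1}, which matches the A entries of the association in the question, and probs[[1]] is {5/9, 0, 1/9, 2/9, 1/9}.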