
string manipulation - Possible bug in Export with CharacterEncoding -> "PrintableASCII"


In the Documentation we read (emphasis is mine):




By default, the Wolfram System uses the character encoding "PrintableASCII" when saving notebooks and packages. This means that when special characters are written out to files or external programs, they are represented purely as sequences of ordinary characters. This uniform representation is crucial in allowing special characters in the Wolfram Language to be used in a way that does not depend on the details of particular computer systems.


When creating packages and notebooks, special characters are always written out using full names.


ExportString["Lamé \[LongRightArrow] αβ+", "Package"]


"(* Created by Wolfram Mathematica 10.0 : www.wolfram.com *)
\"Lam\\[EAcute] \\[LongRightArrow] \\[Alpha]\\[Beta]+\"
"


In InputForm, all special characters are written out fully when using "PrintableASCII".


ToString["Lamé \[LongRightArrow] αβ+", InputForm, 
CharacterEncoding -> "PrintableASCII"]


"\"Lam\\[EAcute] \\[LongRightArrow] \\[Alpha]\\[Beta]+\""


In the examples cited above, the character é is converted into Mathematica's corresponding platform-independent representation, \[EAcute]. This is expected, since this character does not satisfy PrintableASCIIQ:


PrintableASCIIQ@"é"

ToCharacterCode@"é"
ToCharacterCode["é", "ASCII"]


False

{233}

None


Hence I expect that Exporting this character with CharacterEncoding -> "PrintableASCII" will also give me \[EAcute]:


Export["test1.txt", "é", "Text", 
CharacterEncoding -> "PrintableASCII"] // SystemOpen

Export["test2.txt", "é", "String",
CharacterEncoding -> "PrintableASCII"] // SystemOpen

The first input produces a file containing e', the second – a file with é.


Is this behavior correct? How can I export strings using only "PrintableASCII", with all special characters written out by their full names (as in FullForm), the way it happens when I Export as "Package"?





Further considerations


Wrapping the string in InputForm doesn't work:


Export["test3.txt", InputForm@"é", "String", 
CharacterEncoding -> "PrintableASCII"] // SystemOpen

produces a file containing "é" literally. It is worth noting that for ExportString, wrapping in InputForm works:


ExportString[InputForm@"é", "String", CharacterEncoding -> "PrintableASCII"]


"\"\\[EAcute]\""


Wrapping in FullForm works, but is unacceptable because of the added quotation marks:


Export["test4.txt", FullForm@"é", "String", 
CharacterEncoding -> "PrintableASCII"] // SystemOpen

produces a file containing "\[EAcute]" literally.


Exporting as "Package" doesn't serve as a workaround because of the header added and also because strings are exported with the quotation marks.


The same happens with CharacterEncoding -> None.



Answer



I'm afraid support didn't answer your question correctly. However, "String" is one of the most confusing formats we have, so I am not totally surprised.



If you look at the "Background and Context" on ref/format/String, you'll see



  • Arbitrary binary data represented as a Wolfram Language string.

  • Used for importing or exporting entire raw binary data.


Note in particular that it is used for binary data, and binary data in general does not use character encodings. The intended use case is something along the lines of Import["foo.png", "String"], when you wish to operate directly on the bytes of the PNG rather than the image it contains. In the import case, each byte gets mapped to the corresponding code point in the range 0-255. In the export case, we have to do something if a special character with a code point above 255 appears, so it is converted to its long name. You can think of this as akin to ToString[expr, CharacterEncoding->"ISO8859-1"]. In short, "String" is very different from "Text", which attempts to interpret textual data.
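To make the comparison concrete, here is a minimal sketch that puts the "String" export next to the ToString analogy, using the test string from the question. The last two lines are an assumption rather than documented behavior: they rely on ToString, whose default form is OutputForm and so adds no quotation marks, writing every non-ASCII character by its long name when given CharacterEncoding -> "PrintableASCII". The file name test5.txt is hypothetical. It is worth evaluating each line and inspecting the result before relying on it.

```
s = "Lamé \[LongRightArrow] αβ+";

(* "String" export: per the description above, code points up to 255 pass
   through and characters above 255 are written as long names *)
ExportString[s, "String"]

(* The analogy drawn above *)
ToString[s, CharacterEncoding -> "ISO8859-1"]

(* Assumption: every non-ASCII character becomes its long name, without quotes *)
ascii = ToString[s, CharacterEncoding -> "PrintableASCII"]

(* If ascii really is pure ASCII, writing it out should involve no further conversion *)
Export["test5.txt", ascii, "Text"]
```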


In a world where we have ByteArray, we probably wouldn't have had the "String" format, as ByteArray would be the natural data structure. We don't yet have such a format, but it is certainly on our roadmap. For now, you can Export a list of bytes using the "Byte" format, and you can Import with the "Byte" format and then pass the result to ByteArray to convert it from a list of bytes to a proper ByteArray object.
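The round trip through the "Byte" format might look like the sketch below. It assumes a version of the Wolfram Language that has ByteArray, and it reuses the foo.png example from above. The output name copy.png is hypothetical.

```
(* Read the raw file contents as a list of integers in the range 0-255 *)
bytes = Import["foo.png", "Byte"];

(* Convert the list into a proper ByteArray object *)
ba = ByteArray[bytes];

(* Normal turns the ByteArray back into a list of bytes for export *)
Export["copy.png", Normal[ba], "Byte"]
```

Passing Normal[ba] rather than ba itself keeps the sketch independent of whether Export accepts a ByteArray directly for the "Byte" format.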

