In the Documentation we read (emphasis is mine):
By default, the Wolfram System uses the character encoding "PrintableASCII" when saving notebooks and packages. This means that when special characters are written out to files or external programs, they are represented purely as sequences of ordinary characters. This uniform representation is crucial in allowing special characters in the Wolfram Language to be used in a way that does not depend on the details of particular computer systems.
When creating packages and notebooks, special characters are always written out using full names.
ExportString["Lamé \[LongRightArrow] αβ+", "Package"]
"(* Created by Wolfram Mathematica 10.0 : www.wolfram.com *)
\"Lam\\[EAcute] \\[LongRightArrow] \\[Alpha]\\[Beta]+\"
"
In InputForm, all special characters are written out fully when using "PrintableASCII".
ToString["Lamé \[LongRightArrow] αβ+", InputForm, CharacterEncoding -> "PrintableASCII"]
"\"Lam\\[EAcute] \\[LongRightArrow] \\[Alpha]\\[Beta]+\""
In the examples cited above, the character é is converted into Mathematica's corresponding platform-independent representation, \[EAcute]. This is expected, since this character does not satisfy PrintableASCIIQ:
PrintableASCIIQ@"é"
ToCharacterCode@"é"
ToCharacterCode["é", "ASCII"]
False
{233}
None
Hence I expect that Exporting this character with CharacterEncoding -> "PrintableASCII" will also give me \[EAcute]:
Export["test1.txt", "é", "Text",
CharacterEncoding -> "PrintableASCII"] // SystemOpen
Export["test2.txt", "é", "String",
CharacterEncoding -> "PrintableASCII"] // SystemOpen
The first input produces a file containing e', the second – a file with é.
Is this behavior correct? How can I export strings using only "PrintableASCII", with all special characters written in their FullForm, as happens when I Export as "Package"?
Further considerations
Wrapping the string by InputForm doesn't work:
Export["test3.txt", InputForm@"é", "String",
CharacterEncoding -> "PrintableASCII"] // SystemOpen
produces a file containing "é" literally. It is worth noting that wrapping in InputForm does work with ExportString:
ExportString[InputForm@"é", "String", CharacterEncoding -> "PrintableASCII"]
"\"\\[EAcute]\""
Wrapping by FullForm works but is unacceptable due to the quotation marks added:
Export["test4.txt", FullForm@"é", "String",
CharacterEncoding -> "PrintableASCII"] // SystemOpen
produces a file containing "\[EAcute]" literally.
Exporting as "Package" doesn't serve as a workaround because of the header added and also because strings are exported with the quotation marks.
The same happens with CharacterEncoding -> None.
Answer
I'm afraid support didn't answer your question correctly. However, "String" is one of the most confusing formats we have, so I am not totally surprised.
If you look at the "Background and Context" on ref/format/String, you'll see
- Arbitrary binary data represented as a Wolfram Language string.
- Used for importing or exporting entire raw binary data.
Note in particular that it is used for binary data--binary data in general does not use character encodings. The intended use case is something along the lines of Import["foo.png", "String"], when you wish to operate directly on the bytes of the PNG rather than the image it contains. In the import case, each byte gets mapped to the corresponding code point in the range 0-255. In the export case, we have to do something if a special character > 255 appears, so it is converted to its long name. You can think of this as akin to ToString[expr, CharacterEncoding->"ISO8859-1"]. In short, "String" is very different from "Text", which attempts to interpret textual data.
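Since "String" passes code points up to 255 through as single bytes, one possible way to get long names into the file anyway is to do the conversion before exporting. The sketch below is only an illustration under that assumption: `toLongNames` is a hypothetical helper, not a built-in, and the file name is arbitrary. It relies on the documented behavior of ToString with InputForm and "PrintableASCII" quoted in the question, then drops the surrounding quotation marks.

```mathematica
(* Hypothetical helper, not a built-in: render the string in InputForm with
   long names, then strip the opening and closing quotation marks. *)
toLongNames[s_String] := StringTake[
  ToString[s, InputForm, CharacterEncoding -> "PrintableASCII"],
  {2, -2}]

ascii = toLongNames["Lamé \[LongRightArrow] αβ+"];

(* Every character is now printable ASCII, so the "String" exporter
   has nothing left to reinterpret. *)
Export["test5.txt", ascii, "String"] // SystemOpen
```

One caveat with this approach: InputForm also escapes any quotation marks, backslashes, and newlines inside the string, so it is only a drop-in fix for text that does not contain those characters.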
In a world where we have ByteArray, we probably wouldn't have had the "String" format, as ByteArray would be the natural data structure. We don't yet have a ByteArray-based format, but it is certainly on our roadmap. For now, you can Export a list of bytes using the "Byte" format, and you can Import the "Byte" format and then pass the result to ByteArray to convert it from a list of bytes to a proper ByteArray object.
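The round trip through the "Byte" format described above might look roughly like this. It is a minimal sketch: the byte values and the file name are made up for illustration.

```mathematica
(* Illustrative byte values in the range 0-255; 233 is the ISO 8859-1 code for é. *)
bytes = {76, 97, 109, 233};

(* Write the integers to disk as raw bytes. *)
Export["test6.bin", bytes, "Byte"];

(* Importing as "Byte" gives back a plain list of integers ... *)
imported = Import["test6.bin", "Byte"];

(* ... which ByteArray packs into a ByteArray object. *)
ba = ByteArray[imported];

(* Normal turns a ByteArray back into a list, so the round trip can be checked. *)
Normal[ba] === bytes
```

No character encoding is involved at any step here, which is the point of working with bytes rather than with the "Text" format.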