Skip to main content

import - How can I read compressed .Z file automatically by Mathematica?



When I read .Z file by something like


jpldatstring = Import["ftp://cddis.gsfc.nasa.gov/products/jpl/2003/jplg0040.04i.Z"]

Unfortunately, it turned out that "Import::infer: "Cannot infer format of file !(\"jplg0040.04i.Z\")"". Indeed, the form “.Z” isn’t included in the ImportFormats. Mathematica can read other types of compressed file included in $ImportFormats quite well such as “GZIP”,”ZIP”, “TAR”.


The .Z files I deal with can be uncompressed by certain software. But there are thousands of them added up to several GBytes , so an automatic way is needed. How can it be done in Mathematica?


The compressed .Z files can be uncompressed by 'uncompress' order under Linux, so is there any way to code Linux order into Mathematica program?


The dat example can be downloaded from ftp://cddis.gsfc.nasa.gov/products/jpl/2003/


The commit of george2079 shows a way out:


dat = Import[ "!uncompress -c /mnt/data/home/huangjp/magnphy/jpl/jplg0040.04i.Z", "GZIP"];


It returns a dir which contains the uncompressed file,again. which means I have to Imoport dat Is there anyway to read dat into mathematica directly? The trick of coding linux order into mathematica used isn't found in document. where could I learn more about this kind of skill?



Answer



This is an unfortunate oversight on the part of Wolfram and would be of general interest, so I wrote a LibraryFunction in C (compiled using CreateLibrary) that will decompress data compressed using the Unix compress command (i.e. .Z files, compressed in the LZW format).


Included are functions to extend Import and ImportString for the LZW format. You can then do:


importLZW["ftp://cddis.gsfc.nasa.gov/products/jpl/2003/comb2003_midnight.eop.Z", "Table"]

or:


Import["ftp://cddis.gsfc.nasa.gov/products/jpl/2003/comb2003_midnight.eop.Z", {"LZW", "Table"}]

which give the same result.



Here is the notebook (with an attempt at formatting for this site):


LZW Package


Mark Adler, 22 August 2015


Start Package


BeginPackage["LZW`"]
unlzw::usage="unlzw[str] decompresses str, assuming that str is the result of compressing data using the Unix compress command. Unix compress uses the LZW (Lempel\[Dash]Ziv\[Dash]Welch) algorithm. Files compressed using the Unix compress command will have the suffix \".Z\". unlzw[] returns the decompressed data or NULL is the data was not compressed using the Unix compress command, or if the compressed data was corrupted.";
importLZW::usage="importLZW[] can be used in place of Import[], extending its functionality by detecting and, if necessary, decompressing data that was compressed with the Unix compress command. Files compressed using the Unix compress command will have the suffix \".Z\".";
importStringLZW::usage="importStringLZW[] can be used in place of ImportString[], extending its functionality by detecting and, if necessary, decompressing data that was compressed with the Unix compress command. Files compressed using the Unix compress command will have the suffix \".Z\".";
Begin["`Private`"]


Compile unlzw C source code


Needs["CCompilerDriver`"]

unlzw() is adapted from the pigz source code. This decompressor is 20% faster than Unix uncompress.


unlzw$source="
/*
unlzw version 1.4, 22 August 2015

Copyright (C) 2014, 2015 Mark Adler


This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.

Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software

in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.

Mark Adler
madler@alumni.caltech.edu
*/


/* Version history:
1.0 28 Sep 2014 First version
1.1 1 Oct 2014 Cast before shift of bit buffer for portability
Use fastest 32-bit type for bit buffer, uint_fast32_t
Use uint_least16_t in case a 16-bit type is not available
1.2 3 Oct 2014 Clean up comments, consolidate return values
1.3 20 Aug 2015 Assure no out-of-bounds access on invalid input
1.4 22 Aug 2015 Return uncompressed data so far on error conditions
Be more permissive on where the input is allowed to end
*/


#include
#include

/* Type for accumulating bits. 23 bits of the register are used to accumulate
up to 16-bit symbols. */
typedef uint_fast32_t bits_t;

/* Double size_t variable n, saturating at the maximum size_t value. */
#define DOUBLE(n) \\

do { \\
size_t was = n; \\
n <<= 1; \\
if (n < was) \\
n = (size_t)0 - 1; \\
} while (0)

/* Decompress compressed data generated by the Unix compress utility (LZW
compression, files with suffix .Z). Decompress in[0..inlen-1] to an
allocated buffer (*out)[0..*outlen-1]. The length of the uncompressed data

in the allocated buffer is returned in *outlen. unlzw() returns zero on
success, negative if the compressed data is invalid, or 1 if out of memory.
The negative return values are -1 for an invalid header, -2 if the first
code is not a literal or if an invalid code is detected, and -3 if the
stream ended in the middle of a code. -1 means that the data was not
produced by Unix compress, -2 generally means random or corrupted data, and
-3 generally means prematurely terminated data. If the decompression
results in a proper zero-length output, then unlzw() returns zero, *outlen
is zero, and *out is NULL. On error, any decompressed data up to that point
is returned using *out and *outlen. */

static int unlzw(unsigned const char *in, size_t inlen,
unsigned char **out, size_t *outlen)
{
unsigned bits; /* current number of bits per code (9..16) */
unsigned mask; /* mask for current bits codes = (1< bits_t buf; /* bit buffer -- holds up to 23 bits */
unsigned left; /* bits left in buf (0..7 after code pulled) */
size_t next; /* index of next input byte in in[] */
size_t mark; /* index where last change in bits began */
unsigned code; /* code, table traversal index */

unsigned max; /* maximum bits per code for this stream */
unsigned flags; /* compress flags, then block compress flag */
unsigned end; /* last valid entry in prefix/suffix tables */
unsigned prev; /* previous code */
unsigned final; /* last character written for previous code */
unsigned stack; /* next position for reversed string */
unsigned char *put; /* allocated output buffer */
size_t size; /* size of put[] allocation */
size_t have; /* number of bytes of data in put[] */
int ret = 0; /* return code */

/* memory for unlzw() -- the first 256 entries of prefix[] and suffix[] are
never used, so could have offset the index but it's faster to waste a
little memory */
uint_least16_t prefix[65536]; /* index to LZW prefix string */
unsigned char suffix[65536]; /* one-character LZW suffix */
unsigned char match[65280 + 2]; /* buffer for reversed match */

/* initialize output for error returns */
*out = NULL;
*outlen = 0;


/* process the header */
if (inlen < 3 || in[0] != 0x1f || in[1] != 0x9d)
return -1; /* invalid header */
flags = in[2];
if (flags & 0x60)
return -1; /* invalid header */
max = flags & 0x1f;
if (max < 9 || max > 16)
return -1; /* invalid header */

if (max == 9) /* 9 doesn't really mean 9 */
max = 10;
flags &= 0x80; /* true if block compress */

/* clear table, start at nine bits per symbol */
bits = 9;
mask = 0x1ff;
end = flags ? 256 : 255;

/* set up: get the first 9-bit code, which is the first decompressed byte,

but don't create a table entry until the next code */
if (inlen == 3)
return 0; /* zero-length input is ok */
buf = in[3];
if (inlen == 4)
return -3; /* a partial code is not ok */
buf += in[4] << 8;
final = prev = buf & mask; /* code */
buf >>= bits;
left = 16 - bits;

if (prev > 255)
return -2; /* first code must be a literal */

/* we have output -- allocate and set up an output buffer four times the
size of the input (Unix compress usually compresses less than 4:1, so
this will avoid a reallocation most of the time) */
size = inlen;
DOUBLE(size);
DOUBLE(size);
put = malloc(size);

if (put == NULL)
return 1;
put[0] = final; /* first decompressed byte */
have = 1;

/* decode codes */
mark = 3; /* start of compressed data */
next = 5; /* consumed five bytes so far */
stack = 0; /* empty stack */
while (next < inlen) {

/* if the table will be full after this, increment the code size */
if (end >= mask && bits < max) {
/* flush unused input bits and bytes to next 8*bits bit boundary
(this is a vestigial aspect of the compressed data format
derived from an implementation that made use of a special VAX
machine instruction!) */
{
unsigned rem = (next - mark) % bits;
if (rem) {
rem = bits - rem;

if (rem >= inlen - next)
break;
next += rem;
}
}
buf = 0;
left = 0;

/* mark this new location for computing the next flush */
mark = next;


/* increment the number of bits per symbol */
bits++;
mask <<= 1;
mask++;
}

/* get a code of bits bits */
buf += (bits_t)(in[next++]) << left;
left += 8;

if (left < bits) {
if (next == inlen) {
ret = -3; /* partial code (not ok) */
break;
}
buf += (bits_t)(in[next++]) << left;
left += 8;
}
code = buf & mask;
buf >>= bits;

left -= bits;

/* process clear code (256) */
if (code == 256 && flags) {
/* flush unused input bits and bytes to next 8*bits bit boundary */
{
unsigned rem = (next - mark) % bits;
if (rem) {
rem = bits - rem;
if (rem > inlen - next)

break;
next += rem;
}
}
buf = 0;
left = 0;

/* mark this new location for computing the next flush */
mark = next;


/* go back to nine bits per symbol */
bits = 9; /* initialize bits and mask */
mask = 0x1ff;
end = 255; /* empty table */
continue; /* get next code */
}

/* process LZW code */
{
unsigned temp = code; /* save the current code */


/* special code to reuse last match */
if (code > end) {
/* Be picky on the allowed code here, and make sure that the
code we drop through (prev) will be a valid index so that
random input does not cause an exception. */
if (code != end + 1 || prev > end) {
ret = -2; /* invalid LZW code */
break;
}

match[stack++] = final;
code = prev;
}

/* walk through linked list to generate output in reverse order */
while (code >= 256) {
match[stack++] = suffix[code];
code = prefix[code];
}
match[stack++] = code;

final = code;

/* link new table entry */
if (end < mask) {
end++;
prefix[end] = prev;
suffix[end] = final;
}

/* set previous code for next iteration */

prev = temp;
}

/* make room for the stack in the output */
if (stack > size - have) {
if (have + stack + 1 < have) {
ret = 1;
break;
}
do {

DOUBLE(size);
} while (stack > size - have);
{
unsigned char *mem = realloc(put, size);
if (mem == NULL) {
ret = 1;
break;
}
put = mem;
}

}

/* write output in forward order */
do {
put[have++] = match[--stack];
} while (stack);

/* stack is now empty (zero) for the next code */
}


/* return the decompressed data, first reducing the allocated memory */
{
unsigned char *mem = realloc(put, have);
if (mem != NULL)
put = mem;
}
*out = put;
*outlen = have;
return ret;
}


#include \"mathlink.h\"
#include \"WolframLibrary.h\"

DLLEXPORT mint WolframLibrary_getVersion( ) {
return WolframLibraryVersion;
}

DLLEXPORT int WolframLibrary_initialize(WolframLibraryData libData) {
return LIBRARY_NO_ERROR;

}

DLLEXPORT void WolframLibrary_uninitialize( ) {
return;
}

static void report(WolframLibraryData libData, const char *name,
const char *err, const char *text) {
MLINK link = libData->getMathLink(libData);
/* name::err = \"`1`\" */

MLPutFunction(link, \"EvaluatePacket\", 1);
MLPutFunction(link, \"Set\", 2);
MLPutFunction(link, \"MessageName\", 2);
MLPutSymbol(link, name);
MLPutString(link, err);
MLPutString(link, \"`1`\");
libData->processMathLink(link);
if (MLNextPacket(link) == RETURNPKT)
MLNewPacket(link);
/* Message[name::err, text] */

MLPutFunction(link, \"EvaluatePacket\", 1);
MLPutFunction(link, \"Message\", 2);
MLPutFunction(link, \"MessageName\", 2);
MLPutSymbol(link, name);
MLPutString(link, err);
MLPutString(link, text);
libData->processMathLink(link);
if (MLNextPacket(link) == RETURNPKT)
MLNewPacket(link);
}


DLLEXPORT mint munlzw(WolframLibraryData libData, MLINK link) {
const unsigned char *str = NULL;
int len; /* int type used by MLGetByteString() */
unsigned char *out;
size_t outlen;
int ret;
const char *errmsg[] = {
\"Prematurely terminated compress stream\", /* -3 */
\"Corrupted compress stream\", /* -2 */

\"Not a Unix compress (.Z) stream\", /* -1 */
\"Unexpected return code\", /* < -3 or > 1 */
\"Out of memory\" /* 1 */
};

if (MLTestHead(link, \"List\", &len) && len == 1 &&
MLGetByteString(link, &str, &len, 0)) {
MLNewPacket(link);
ret = unlzw(str, len, &out, &outlen);
MLReleaseByteString(link, str, len);

if (ret == 0) {
MLPutByteString(link, out, outlen);
free(out);
}
else {
free(out);
MLPutSymbol(link, \"Null\");
report(libData, \"unlzw\",
ret < 0 ? \"invalid\" : \"memerr\",
errmsg[ret < -3 || ret > 1 ? 3 : ret + 3]);

}
}
else {
MLClearError(link);
MLNewPacket(link);
MLPutSymbol(link, \"Null\");
report(libData, \"unlzw\", \"arg\",
\"String expected at position 1 as the only argument\");
}
MLClearError(link);

return LIBRARY_NO_ERROR;
}";

If[Head[lib]===String,LibraryUnload[lib]]
lib=CreateLibrary[unlzw$source,"unlzw","ShellOutputFunction"->Print]
unlzw=LibraryFunctionLoad[lib,"munlzw",LinkObject,LinkObject]

Provide Import functions that use unlzw


If compressed, decompress the provided string str (possibly with nested compression) and pass on to ImportString. If not, pass on the string as is. This function can be used in place of ImportString, cleanly extending its functionality:


importStringLZW[str_String,opts___]:=

Module[{got},
If[(got=Quiet@unlzw@str)=!=Null,
importStringLZW[got,opts],
ImportString[str,opts]]
]

If compressed, decompress the data from the provided channel (possibly with nested compression) and pass on to ImportString. If not, pass on the data as is. This function can be used in place of Import, cleanly extending its functionality:


importLZW[channel_,opts___]:=
importStringLZW[Import[channel,"String"],opts]


End the Package


End[]


LZW`Private`

EndPackage[]

$Context



Global`

Extend Import with LZW compression format


Change the DownValue of Import to recognize the explicit compression format "LZW" and call importLZW with the "LZW" stripped out. Do the same for ImportString, calling importStringLZW above. While it would be possible to extend Import and ImportString to automatically detect LZW compression, that could introduce inefficiencies in their operation for non-LZW data, in part because it would force always reading the compressed data into memory. Both importLZW and Import[_, "LZW"] will automatically detect LZW compression and decompress as needed.


ImportExport'RegisterImport is not suitable for this application, due to its expectations for a "low-level" function that returns a set of rules. For a compression format that then passes another format for further import conversions, changing the DownValue as is done here allows for completely transparent decompression with the automatic detection of the uncompressed data format and the specification of elements therein, e.g. for the TAR format, which was often compressed with Unix compress.


Begin["System`ConvertersDump`"];
Unprotect@Import;
DownValues[Import]={HoldPattern[Import[channel_,"LZW"|{"LZW"},opts___?OptionQ]]:>importLZW[channel,opts],
HoldPattern[Import[channel_,{"LZW",elts__},opts___?OptionQ]]:>importLZW[channel,{elts},opts]}~Join~DownValues[Import];

Protect@Import;
Unprotect@ImportString;
DownValues[ImportString]={HoldPattern[ImportString[str_String,"LZW"|{"LZW"},opts___?OptionQ]]:>importStringLZW[str,opts],
HoldPattern[ImportString[str_String,{"LZW",elts__},opts___?OptionQ]]:>importStringLZW[str,{elts},opts]}~Join~DownValues[ImportString];
Protect@ImportString;
End[]


System`ConvertersDump`


$Context


Global`

Comments

Popular posts from this blog

plotting - Plot 4D data with color as 4th dimension

I have a list of 4D data (x position, y position, amplitude, wavelength). I want to plot x, y, and amplitude on a 3D plot and have the color of the points correspond to the wavelength. I have seen many examples using functions to define color but my wavelength cannot be expressed by an analytic function. Is there a simple way to do this? Answer Here a another possible way to visualize 4D data: data = Flatten[Table[{x, y, x^2 + y^2, Sin[x - y]}, {x, -Pi, Pi,Pi/10}, {y,-Pi,Pi, Pi/10}], 1]; You can use the function Point along with VertexColors . Now the points are places using the first three elements and the color is determined by the fourth. In this case I used Hue, but you can use whatever you prefer. Graphics3D[ Point[data[[All, 1 ;; 3]], VertexColors -> Hue /@ data[[All, 4]]], Axes -> True, BoxRatios -> {1, 1, 1/GoldenRatio}]

plotting - Mathematica: 3D plot based on combined 2D graphs

I have several sigmoidal fits to 3 different datasets, with mean fit predictions plus the 95% confidence limits (not symmetrical around the mean) and the actual data. I would now like to show these different 2D plots projected in 3D as in but then using proper perspective. In the link here they give some solutions to combine the plots using isometric perspective, but I would like to use proper 3 point perspective. Any thoughts? Also any way to show the mean points per time point for each series plus or minus the standard error on the mean would be cool too, either using points+vertical bars, or using spheres plus tubes. Below are some test data and the fit function I am using. Note that I am working on a logit(proportion) scale and that the final vertical scale is Log10(percentage). (* some test data *) data = Table[Null, {i, 4}]; data[[1]] = {{1, -5.8}, {2, -5.4}, {3, -0.8}, {4, -0.2}, {5, 4.6}, {1, -6.4}, {2, -5.6}, {3, -0.7}, {4, 0.04}, {5, 1.0}, {1, -6.8}, {2, -4.7}, {3, -1....

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...