import - How can I read compressed .Z file automatically by Mathematica?

When I read .Z file by something like

jpldatstring = Import["ftp://cddis.gsfc.nasa.gov/products/jpl/2003/jplg0040.04i.Z"]

Unfortunately, it turned out that "Import::infer: "Cannot infer format of file !(\"jplg0040.04i.Z\")"". Indeed, the form “.Z” isn’t included in the ImportFormats. Mathematica can read other types of compressed file included in $ImportFormats quite well such as “GZIP”,”ZIP”, “TAR”.

The .Z files I deal with can be uncompressed by certain software. But there are thousands of them added up to several GBytes , so an automatic way is needed. How can it be done in Mathematica?

The compressed .Z files can be uncompressed by 'uncompress' order under Linux, so is there any way to code Linux order into Mathematica program?

The dat example can be downloaded from ftp://cddis.gsfc.nasa.gov/products/jpl/2003/

The commit of george2079 shows a way out:

dat = Import[ "!uncompress -c /mnt/data/home/huangjp/magnphy/jpl/jplg0040.04i.Z", "GZIP"];

It returns a dir which contains the uncompressed file,again. which means I have to Imoport dat Is there anyway to read dat into mathematica directly? The trick of coding linux order into mathematica used isn't found in document. where could I learn more about this kind of skill?

Answer

This is an unfortunate oversight on the part of Wolfram and would be of general interest, so I wrote a LibraryFunction in C (compiled using CreateLibrary) that will decompress data compressed using the Unix compress command (i.e. .Z files, compressed in the LZW format).

Included are functions to extend Import and ImportString for the LZW format. You can then do:

importLZW["ftp://cddis.gsfc.nasa.gov/products/jpl/2003/comb2003_midnight.eop.Z", "Table"]

or:

Import["ftp://cddis.gsfc.nasa.gov/products/jpl/2003/comb2003_midnight.eop.Z", {"LZW", "Table"}]

which give the same result.

Here is the notebook (with an attempt at formatting for this site):

LZW Package

Mark Adler, 22 August 2015

Start Package

BeginPackage["LZW`"]
unlzw::usage="unlzw[str] decompresses str, assuming that str is the result of compressing data using the Unix compress command. Unix compress uses the LZW (Lempel\[Dash]Ziv\[Dash]Welch) algorithm. Files compressed using the Unix compress command will have the suffix \".Z\". unlzw[] returns the decompressed data or NULL is the data was not compressed using the Unix compress command, or if the compressed data was corrupted.";
importLZW::usage="importLZW[] can be used in place of Import[], extending its functionality by detecting and, if necessary, decompressing data that was compressed with the Unix compress command. Files compressed using the Unix compress command will have the suffix \".Z\".";
importStringLZW::usage="importStringLZW[] can be used in place of ImportString[], extending its functionality by detecting and, if necessary, decompressing data that was compressed with the Unix compress command. Files compressed using the Unix compress command will have the suffix \".Z\".";
Begin["`Private`"]

Compile unlzw C source code

Needs["CCompilerDriver`"]

unlzw() is adapted from the pigz source code. This decompressor is 20% faster than Unix uncompress.

unlzw$source="
/*
  unlzw version 1.4, 22 August 2015

  Copyright (C) 2014, 2015 Mark Adler


  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software

     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Mark Adler
  madler@alumni.caltech.edu
*/


/* Version history:
   1.0  28 Sep 2014  First version
   1.1   1 Oct 2014  Cast before shift of bit buffer for portability
                     Use fastest 32-bit type for bit buffer, uint_fast32_t
                     Use uint_least16_t in case a 16-bit type is not available
   1.2   3 Oct 2014  Clean up comments, consolidate return values
   1.3  20 Aug 2015  Assure no out-of-bounds access on invalid input
   1.4  22 Aug 2015  Return uncompressed data so far on error conditions
                     Be more permissive on where the input is allowed to end
 */


#include 
#include 

/* Type for accumulating bits.  23 bits of the register are used to accumulate
   up to 16-bit symbols. */
typedef uint_fast32_t bits_t;

/* Double size_t variable n, saturating at the maximum size_t value. */
#define DOUBLE(n) \\

    do { \\
        size_t was = n; \\
        n <<= 1; \\
        if (n < was) \\
            n = (size_t)0 - 1; \\
    } while (0)

/* Decompress compressed data generated by the Unix compress utility (LZW
   compression, files with suffix .Z).  Decompress in[0..inlen-1] to an
   allocated buffer (*out)[0..*outlen-1].  The length of the uncompressed data

   in the allocated buffer is returned in *outlen.  unlzw() returns zero on
   success, negative if the compressed data is invalid, or 1 if out of memory.
   The negative return values are -1 for an invalid header, -2 if the first
   code is not a literal or if an invalid code is detected, and -3 if the
   stream ended in the middle of a code.  -1 means that the data was not
   produced by Unix compress, -2 generally means random or corrupted data, and
   -3 generally means prematurely terminated data.  If the decompression
   results in a proper zero-length output, then unlzw() returns zero, *outlen
   is zero, and *out is NULL.  On error, any decompressed data up to that point
   is returned using *out and *outlen. */

static int unlzw(unsigned const char *in, size_t inlen,
                 unsigned char **out, size_t *outlen)
{
    unsigned bits;              /* current number of bits per code (9..16) */
    unsigned mask;              /* mask for current bits codes = (1<    bits_t buf;                 /* bit buffer -- holds up to 23 bits */
    unsigned left;              /* bits left in buf (0..7 after code pulled) */
    size_t next;                /* index of next input byte in in[] */
    size_t mark;                /* index where last change in bits began */
    unsigned code;              /* code, table traversal index */

    unsigned max;               /* maximum bits per code for this stream */
    unsigned flags;             /* compress flags, then block compress flag */
    unsigned end;               /* last valid entry in prefix/suffix tables */
    unsigned prev;              /* previous code */
    unsigned final;             /* last character written for previous code */
    unsigned stack;             /* next position for reversed string */
    unsigned char *put;         /* allocated output buffer */
    size_t size;                /* size of put[] allocation */
    size_t have;                /* number of bytes of data in put[] */
    int ret = 0;                /* return code */

    /* memory for unlzw() -- the first 256 entries of prefix[] and suffix[] are
       never used, so could have offset the index but it's faster to waste a
       little memory */
    uint_least16_t prefix[65536];       /* index to LZW prefix string */
    unsigned char suffix[65536];        /* one-character LZW suffix */
    unsigned char match[65280 + 2];     /* buffer for reversed match */

    /* initialize output for error returns */
    *out = NULL;
    *outlen = 0;


    /* process the header */
    if (inlen < 3 || in[0] != 0x1f || in[1] != 0x9d)
        return -1;                          /* invalid header */
    flags = in[2];
    if (flags & 0x60)
        return -1;                          /* invalid header */
    max = flags & 0x1f;
    if (max < 9 || max > 16)
        return -1;                          /* invalid header */

    if (max == 9)                           /* 9 doesn't really mean 9 */
        max = 10;
    flags &= 0x80;                          /* true if block compress */

    /* clear table, start at nine bits per symbol */
    bits = 9;
    mask = 0x1ff;
    end = flags ? 256 : 255;

    /* set up: get the first 9-bit code, which is the first decompressed byte,

       but don't create a table entry until the next code */
    if (inlen == 3)
        return 0;                           /* zero-length input is ok */
    buf = in[3];
    if (inlen == 4)
        return -3;                          /* a partial code is not ok */
    buf += in[4] << 8;
    final = prev = buf & mask;              /* code */
    buf >>= bits;
    left = 16 - bits;

    if (prev > 255)
        return -2;                          /* first code must be a literal */

    /* we have output -- allocate and set up an output buffer four times the
       size of the input (Unix compress usually compresses less than 4:1, so
       this will avoid a reallocation most of the time) */
    size = inlen;
    DOUBLE(size);
    DOUBLE(size);
    put = malloc(size);

    if (put == NULL)
        return 1;
    put[0] = final;                         /* first decompressed byte */
    have = 1;

    /* decode codes */
    mark = 3;                               /* start of compressed data */
    next = 5;                               /* consumed five bytes so far */
    stack = 0;                              /* empty stack */
    while (next < inlen) {

        /* if the table will be full after this, increment the code size */
        if (end >= mask && bits < max) {
            /* flush unused input bits and bytes to next 8*bits bit boundary
               (this is a vestigial aspect of the compressed data format
               derived from an implementation that made use of a special VAX
               machine instruction!) */
            {
                unsigned rem = (next - mark) % bits;
                if (rem) {
                    rem = bits - rem;

                    if (rem >= inlen - next)
                        break;
                    next += rem;
                }
            }
            buf = 0;
            left = 0;

            /* mark this new location for computing the next flush */
            mark = next;


            /* increment the number of bits per symbol */
            bits++;
            mask <<= 1;
            mask++;
        }

        /* get a code of bits bits */
        buf += (bits_t)(in[next++]) << left;
        left += 8;

        if (left < bits) {
            if (next == inlen) {
                ret = -3;               /* partial code (not ok) */
                break;
            }
            buf += (bits_t)(in[next++]) << left;
            left += 8;
        }
        code = buf & mask;
        buf >>= bits;

        left -= bits;

        /* process clear code (256) */
        if (code == 256 && flags) {
            /* flush unused input bits and bytes to next 8*bits bit boundary */
            {
                unsigned rem = (next - mark) % bits;
                if (rem) {
                    rem = bits - rem;
                    if (rem > inlen - next)

                        break;
                    next += rem;
                }
            }
            buf = 0;
            left = 0;

            /* mark this new location for computing the next flush */
            mark = next;


            /* go back to nine bits per symbol */
            bits = 9;                       /* initialize bits and mask */
            mask = 0x1ff;
            end = 255;                      /* empty table */
            continue;                       /* get next code */
        }

        /* process LZW code */
        {
            unsigned temp = code;           /* save the current code */


            /* special code to reuse last match */
            if (code > end) {
                /* Be picky on the allowed code here, and make sure that the
                   code we drop through (prev) will be a valid index so that
                   random input does not cause an exception. */
                if (code != end + 1 || prev > end) {
                    ret = -2;               /* invalid LZW code */
                    break;
                }

                match[stack++] = final;
                code = prev;
            }

            /* walk through linked list to generate output in reverse order */
            while (code >= 256) {
                match[stack++] = suffix[code];
                code = prefix[code];
            }
            match[stack++] = code;

            final = code;

            /* link new table entry */
            if (end < mask) {
                end++;
                prefix[end] = prev;
                suffix[end] = final;
            }

            /* set previous code for next iteration */

            prev = temp;
        }

        /* make room for the stack in the output */
        if (stack > size - have) {
            if (have + stack + 1 < have) {
                ret = 1;
                break;
            }
            do {

                DOUBLE(size);
            } while (stack > size - have);
            {
                unsigned char *mem = realloc(put, size);
                if (mem == NULL) {
                    ret = 1;
                    break;
                }
                put = mem;
            }

        }

        /* write output in forward order */
        do {
            put[have++] = match[--stack];
        } while (stack);

        /* stack is now empty (zero) for the next code */
    }


    /* return the decompressed data, first reducing the allocated memory */
    {
        unsigned char *mem = realloc(put, have);
        if (mem != NULL)
            put = mem;
    }
    *out = put;
    *outlen = have;
    return ret;
}


#include \"mathlink.h\"
#include \"WolframLibrary.h\"

DLLEXPORT mint WolframLibrary_getVersion( ) {
   return WolframLibraryVersion;
}

DLLEXPORT int WolframLibrary_initialize(WolframLibraryData libData) {
   return LIBRARY_NO_ERROR;

}

DLLEXPORT void WolframLibrary_uninitialize( ) {
   return;
}

static void report(WolframLibraryData libData, const char *name,
                   const char *err, const char *text) {
   MLINK link = libData->getMathLink(libData);
      /* name::err = \"`1`\" */

   MLPutFunction(link, \"EvaluatePacket\", 1);
   MLPutFunction(link, \"Set\", 2);
   MLPutFunction(link, \"MessageName\", 2);
   MLPutSymbol(link, name);
   MLPutString(link, err);
   MLPutString(link, \"`1`\");
   libData->processMathLink(link);
   if (MLNextPacket(link) == RETURNPKT)
      MLNewPacket(link);
      /* Message[name::err, text] */

   MLPutFunction(link, \"EvaluatePacket\", 1);
   MLPutFunction(link, \"Message\", 2);
   MLPutFunction(link, \"MessageName\", 2);
   MLPutSymbol(link, name);
   MLPutString(link, err);
   MLPutString(link, text);
   libData->processMathLink(link);
   if (MLNextPacket(link) == RETURNPKT)
       MLNewPacket(link);
}


DLLEXPORT mint munlzw(WolframLibraryData libData, MLINK link) {
   const unsigned char *str = NULL;
   int len;       /* int type used by MLGetByteString() */
   unsigned char *out;
   size_t outlen;
   int ret;
   const char *errmsg[] = {
      \"Prematurely terminated compress stream\",  /* -3 */
      \"Corrupted compress stream\",               /* -2 */

      \"Not a Unix compress (.Z) stream\",         /* -1 */
      \"Unexpected return code\",                  /* < -3 or > 1 */
      \"Out of memory\"                            /* 1 */
   };

   if (MLTestHead(link, \"List\", &len) && len == 1 &&
       MLGetByteString(link, &str, &len, 0)) {
      MLNewPacket(link);
      ret = unlzw(str, len, &out, &outlen);
      MLReleaseByteString(link, str, len);

      if (ret == 0) {
         MLPutByteString(link, out, outlen);
         free(out);
      }
      else {
         free(out);
         MLPutSymbol(link, \"Null\");
         report(libData, \"unlzw\",
                ret < 0 ? \"invalid\" : \"memerr\",
                errmsg[ret < -3 || ret > 1 ? 3 : ret + 3]);

      }
   }
   else {
      MLClearError(link);
      MLNewPacket(link);
      MLPutSymbol(link, \"Null\");
      report(libData, \"unlzw\", \"arg\",
             \"String expected at position 1 as the only argument\");
   }
   MLClearError(link);

   return LIBRARY_NO_ERROR;
}";

If[Head[lib]===String,LibraryUnload[lib]]
lib=CreateLibrary[unlzw$source,"unlzw","ShellOutputFunction"->Print]
unlzw=LibraryFunctionLoad[lib,"munlzw",LinkObject,LinkObject]

Provide Import functions that use unlzw

If compressed, decompress the provided string str (possibly with nested compression) and pass on to ImportString. If not, pass on the string as is. This function can be used in place of ImportString, cleanly extending its functionality:

importStringLZW[str_String,opts___]:=

  Module[{got},
    If[(got=Quiet@unlzw@str)=!=Null,
      importStringLZW[got,opts],
      ImportString[str,opts]]
  ]

If compressed, decompress the data from the provided channel (possibly with nested compression) and pass on to ImportString. If not, pass on the data as is. This function can be used in place of Import, cleanly extending its functionality:

importLZW[channel_,opts___]:=
  importStringLZW[Import[channel,"String"],opts]

End the Package

End[]

LZW`Private`

EndPackage[]

$Context

Global`

Extend Import with LZW compression format

Change the DownValue of Import to recognize the explicit compression format "LZW" and call importLZW with the "LZW" stripped out. Do the same for ImportString, calling importStringLZW above. While it would be possible to extend Import and ImportString to automatically detect LZW compression, that could introduce inefficiencies in their operation for non-LZW data, in part because it would force always reading the compressed data into memory. Both importLZW and Import[_, "LZW"] will automatically detect LZW compression and decompress as needed.

ImportExport'RegisterImport is not suitable for this application, due to its expectations for a "low-level" function that returns a set of rules. For a compression format that then passes another format for further import conversions, changing the DownValue as is done here allows for completely transparent decompression with the automatic detection of the uncompressed data format and the specification of elements therein, e.g. for the TAR format, which was often compressed with Unix compress.

Begin["System`ConvertersDump`"];
Unprotect@Import;
DownValues[Import]={HoldPattern[Import[channel_,"LZW"|{"LZW"},opts___?OptionQ]]:>importLZW[channel,opts],
HoldPattern[Import[channel_,{"LZW",elts__},opts___?OptionQ]]:>importLZW[channel,{elts},opts]}~Join~DownValues[Import];

Protect@Import;
Unprotect@ImportString;
DownValues[ImportString]={HoldPattern[ImportString[str_String,"LZW"|{"LZW"},opts___?OptionQ]]:>importStringLZW[str,opts],
HoldPattern[ImportString[str_String,{"LZW",elts__},opts___?OptionQ]]:>importStringLZW[str,{elts},opts]}~Join~DownValues[ImportString];
Protect@ImportString;
End[]

System`ConvertersDump`

$Context

Global`

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...

Blog

Search This Blog