I have an unformatted binary file generated using the Compaq Visual Fortran compiler (big endian).
Here's what the little bit of documentation states about it:
The binary file is written in a general format consisting of data arrays, headed by a descriptor record:
- An 8-character keyword which identifies the data in the block.
- A 4-byte signed integer defining the number of elements in the block.
- A 4-character keyword defining the type of data. (
INTE,REAL,LOGI,DOUB, orCHAR)
The header items are read in as a single record. The data follows the descriptor on a new record. Numerical arrays are divided into block of up to 1000 items. The physical record size is the same as the block size.
Additional keyword info:
SEQHDR- 1 item -INTE- Sequence header, with data value. If number is present it is an encoded integer corresponding to the time the file was created.MINISTEP- 1 item -INTE- Ministep number is essentially the data number (ex: psi on day 1)PARAMS- n items -REAL- Vector parameter at ministep value.
Attempts to read such data into Mathematica including Import
data=Import["file", "Binary", ByteOrdering -> +1];
data = FromCharacterCode[data]
and OpenRead
OpenRead["file", BinaryFormat -> True]

show me some identifiable text, but no useful numerical values.
A file in question is available here.
Is Mathematica able to parse this file type, and if so, what is the best way?
Answer
The file appears to be a Unified Summary File from the Schlumberger Eclipse Reservoir Simulator. This file format uses Compaq Visual Fortran variable length record encoding.
Mathematica does not offer any built-in functionality to read this file format, so we will have to parse it ourselves.
We start by defining a convenience function to read big-endian binary data from a file:
read[s_, t_] := BinaryRead[s, t, ByteOrdering -> +1]
Logical records in Eclipse files come in two parts: the header and the data. The following function reads the header:
readEclHeader[s_] :=
read[
s
, {"Integer32"
, Sequence@@ConstantArray["Character8", 8]
, "Integer32"
, Sequence@@ConstantArray["Character8", 4]
, "Integer32"
}
] /. {EndOfFile, ___} :> EndOfFile
The CVF leading and trailing record lengths are skipped, leaving the record type keyword, the number of data elements, and the type of the data elements. Each element type requires special handling:
readEclData[s_, "INTE", n_] := readEclElements[s, "Integer32", 4, n]
readEclData[s_, "REAL", n_] := readEclElements[s, "Real32", 4, n]
readEclData[_, t_, _] := (Message[readEclData::unknowntype, t]; Abort[])
This code only handles the integer (INTE) and real data types (REAL), although it would be easy to extend this to handle the other types as well. readEclElements is used in each case to read the required number of data elements -- which may span multiple variable records:
readEclElements[s_, t_, b_, n_] :=
Module[{len, next, r}
, len[] := read[s, "Integer32"]
; next[] := (If[r == 0, len[]; r = len[]]; r -= b; read[s, t])
; r = len[]
; (len[]; #) &@ Table[next[], {n}]
]
These helper functions are used to read a complete header/data pair:
readEclRecord[s_] :=
readEclHeader[s] /. {_, k__String, n_, t__String, _} :>
{StringJoin[k], readEclData[s, StringJoin[t], n]}
All that remains is to open the file, read all of the records, and close the file:
readEclFile[filename_] :=
Module[{s = OpenRead[filename, BinaryFormat -> True], r}
, r = Reap[
While[readEclRecord[s] /. {EndOfFile -> False, d_ :> (Sow[d]; True)}]
][[2, 1]]
; Close[s]
; r
]
Here is readEclFile in action, reading the supplied data file (assuming that file is in the same directory as the notebook):
$file = FileNameJoin[{NotebookDirectory[], "INITIAL-TEST.UNSMRY"}];
readEclFile[$file] // Column
(*
{SEQHDR ,{-1163229266}}
{MINISTEP,{0}}
{PARAMS ,{0.,0.,0.,0.,0.,0.,0.,4085.81,4085.81,0.,0.,0.}}
{MINISTEP,{1}}
{PARAMS ,{1.,0.00273785,3348.6,3468.9,0.,0.,0.,3694.18,3662.5,0.,0.,0.}}
{MINISTEP,{2}}
{PARAMS ,{4.,0.0109514,3348.6,3468.9,0.,0.,0.,3561.9,3519.26,0.,0.,0.}}
{MINISTEP,{3}}
{PARAMS ,{11.5,0.0314853,3348.6,3468.9,0.,0.,0.,3422.25,3369.69,0.,0.,0.}}
{MINISTEP,{4}}
{PARAMS ,{19.,0.0520192,3348.6,3468.9,0.,0.,0.,3343.98,3286.4,0.,0.,0.}}
{SEQHDR ,{-1163229208}}
{MINISTEP,{5}}
{PARAMS ,{37.,0.1013,6419.3,6882.3,0.,0.,0.,2591.91,2425.78,0.,0.,0.}}
...
{SEQHDR ,{-1163228692}}
{MINISTEP,{30}}
{PARAMS ,{616.,1.68652,1826.6,2386.1,0.,0.,0.,2616.22,2432.4,0.,0.,0.}}
*)
I do not know the time encoding used in the SEQHDR records.
Disclaimer: I have no affiliation with Schlumberger.
Comments
Post a Comment