Excel corruption writing DIF files

2 1 1 1

When Excel writes a file in the DIF format, SAS is unable to read the file.  Here’s a valid DIF that has 24 columns, and 446,213 rows:

TABLE
0,1
""
VECTORS
0,24      
""
TUPLES
0,446213  

Note that “Tuples” in mathematical formulas are equivalent to “Rows”. A VECTOR is like a dimension, or field. In the case of Excel, it refers to columns. So far so good. However here is how such a file would be saved by Excel 2010:

TABLE
0,1
"EXCEL"
VECTORS
0,446213
""
TUPLES
0,24

Excel has no problem reading the above file format, as it ignores tuples/vectors internally. However SAS cannot handle this not-so-standard deviation.

Below is VBA code that after saving a DIF file “fixes” the header by opening the file in binary mode and corrects the issue. Note FName contains both path and filename:

    'Fixup TUPLES and VECTORS

Dim filenum As Integer
Dim strData As String
Dim VectorIdx As Integer
Dim TuplesIdx As Integer
Dim VectorStr As String
Dim TuplesStr As String
Const CharsToProcess = 60
Dim outStr
Dim CRLF As String
Dim DoubleQuote As String
Dim Fname As String

CRLF = Chr(13) & Chr(10)
DoubleQuote = Chr(34)

Fname = saveString

filenum = FreeFile
Open Fname For Binary Access Read As filenum

strData = String$(CharsToProcess, " ")
Get #filenum, , strData
Close #filenum

VectorIdx = InStr(strData, "VECTORS")
TuplesIdx = InStr(strData, "TUPLES")
VectorStr = Mid(strData, VectorIdx + 9, 14) 'overly generous portion of chars
TuplesStr = Mid(strData, TuplesIdx + 8, 14)

If InStr(TuplesStr, Chr(13)) > 0 Then 'trim CR LF
  TuplesStr = Left(TuplesStr, InStr(TuplesStr, Chr(13)) - 1)
End If

If InStr(VectorStr, Chr(13)) > 0 Then 'trim CR LF
  VectorStr = Left(VectorStr, InStr(VectorStr, Chr(13)) - 1)
End If

outStr = "VECTORS" & CRLF & TuplesStr & CRLF & DoubleQuote & DoubleQuote & CRLF & "TUPLES" & CRLF & VectorStr

filenum = FreeFile
Open Fname For Binary Access Write As filenum
Put #filenum, VectorIdx, outStr
Close #filenum

 

Share this entry

One Response

Leave a Reply

Your email address will not be published. Required fields are marked *

Table of Contents

Categories

Categories