Disclaimer: I am by no means an expert on TIFF (or anything else). This blog (series) is me sharing my recent look into TIFF errors. Please feel free to comment, point out errors, suggest better fixes, etc. At the end of the day, we’re all in this together and here to learn from each other!
Following the potentially worrying but not life-threatening error Tag 270 out of sequence, today’s blog post covers a bigger headache: StripOffsets inconsistent with StripByteCounts. The message is followed by two numbers in the form <len1> != <len2>, for example: StripOffsets inconsistent with StripByteCounts: 1 != 14
What’s the message?
In our workflows the first tool to check TIFF files is JHOVE – if an error occurs, I crosscheck with other tools. The message in the title of this blog post – “StripOffsets inconsistent with StripByteCounts” – is taken from JHOVE. The corresponding message with DPF Manager is “STRIPS-0005 IFD1 Inconsistent strip lengths, The cardinality of striptOffsets and StripsBytesCount must match“.
One observation I made was that JHOVE reports the file as well-formed, but not valid. That may put some people’s minds at ease and lead them to dismiss the problem as not too serious.
While JHOVE and DPF Manager have validation routines that detect the error, exiftool can be used to dig a bit deeper into the tags in question.
| Tool | Version | Message / check |
|---|---|---|
| jhove | jhove 1.22.1 with TIFF-hul 1.9.1 | TIFF-HUL-28: StripOffsets inconsistent with StripByteCounts: <len1> != <len2> |
| DPF Manager | Version 3.5.1 | STRIPS-0005 IFD1 Inconsistent strip lengths, The cardinality of striptOffsets and StripsBytesCount must match |
| exiftool | Version 11.91 | Extract values from tags and compare number of values: |
exiftool -stripoffsets -b filename | wc -w
exiftool -stripbytecounts -b filename | wc -w
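If you prefer a script over `wc -w`, the same comparison can be sketched in Python. This is only a minimal sketch: the function names `count_values` and `compare_counts` are my own, and the input is assumed to be the whitespace-separated text that `exiftool -b` prints for each tag. The values in the example calls are made up for illustration.

```python
def count_values(tag_output: str) -> int:
    """Count whitespace-separated values, the same way `wc -w` counts words."""
    return len(tag_output.split())


def compare_counts(offsets_output: str, byte_counts_output: str) -> str:
    """Summarise whether both tags carry the same number of values."""
    n_offsets = count_values(offsets_output)
    n_counts = count_values(byte_counts_output)
    if n_offsets == n_counts:
        return f"consistent: {n_offsets} == {n_counts}"
    return f"inconsistent: {n_offsets} != {n_counts}"


# Made-up values, not taken from a real file.
print(compare_counts("8 1032", "1024 1024"))
print(compare_counts("8", "1024 1024 512"))
```

Feeding it the captured output of the two exiftool calls gives you the same pair of numbers that JHOVE prints after the error message.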
What’s the problem?
To understand this message, we first have to understand what strips are. For effective memory usage and random access, many formats group information together in things like chunks, blocks or other forms. Within TIFF, two forms of bitmapped data groups exist: strips, and tiles, which were introduced in the TIFF 6.0 extension section. Strips contain one or several consecutive rows of data. Strips within a file share all basic TIFF characteristics such as compression, color or bits per pixel.
For each strip, three values need to be defined:
- RowsPerStrip (Tag 278)
Exactly what it says on the tin – how many rows are present in each strip. The number is the same for every strip except possibly the last one, which will not be padded with dummy zeros. So, if your ImageHeight is 200 and your RowsPerStrip value is 64, you have 3 strips with 64 rows each and 1 strip with 8 rows.
- StripOffsets (Tag 273)
While RowsPerStrip only contains one value, StripOffsets contains an array with (at least) one value for each strip. These values are vital for the TIFF reader, as they tell the program where to find the starting byte (offset) for each strip. Life would be easier if TIFF wrote all strips consecutively and you could recalculate the offsets if one or more had gone missing – but this isn’t the case. To quote the spec: “Never assume that strip N+1 follows strip N on disk.“
In other words: not knowing the offset of a strip is a bit like knowing that your friend lives in a blue house in LA and setting off on foot to find it.
- StripByteCounts (Tag 279):
Like StripOffsets, StripByteCounts contains an array with (at least) one value for each strip. This value describes the number of bytes in each strip. Despite the fixed number of rows, the number of bytes is not necessarily the same for each strip, for two reasons: first, the last strip may have fewer rows; second, while strips with the same number of rows have the same byte length uncompressed, compression can change this.
The above mentioned “at least one value per strip” for the arrays in StripOffsets and StripByteCounts depends on the planar configuration of the TIFF file. Without going into too much detail:
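- Planar Configuration = Chunky -> 1D array, meaning one value per strip
- Planar Configuration = Planar -> 2D array, meaning one value per strip for each sample plane (so SamplesPerPixel * number of strips values in total, e.g. three times as many for RGB)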
Either way, the number of values in the StripOffsets and StripByteCounts arrays should always be the same.
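To make the arithmetic concrete, here is a small Python sketch of how many strips, and how many array entries, a reader would expect. The function names are my own; the formulas follow my reading of the TIFF 6.0 spec, which gives the number of strips as the rounded-up quotient of image height and RowsPerStrip, and the number of entries as SamplesPerPixel times that for planar files.

```python
def strips_per_image(image_height: int, rows_per_strip: int) -> int:
    """Number of strips: image_height / rows_per_strip, rounded up."""
    return (image_height + rows_per_strip - 1) // rows_per_strip


def rows_in_last_strip(image_height: int, rows_per_strip: int) -> int:
    """The last strip holds the leftover rows (it is not padded)."""
    remainder = image_height % rows_per_strip
    return remainder if remainder else rows_per_strip


def expected_entries(image_height: int, rows_per_strip: int,
                     planar_config: int = 1, samples_per_pixel: int = 1) -> int:
    """Expected number of values in StripOffsets and in StripByteCounts."""
    strips = strips_per_image(image_height, rows_per_strip)
    if planar_config == 2:  # planar: one set of strips per sample plane
        return strips * samples_per_pixel
    return strips  # chunky: one entry per strip


# The example from the RowsPerStrip section: 200 rows, 64 rows per strip.
print(strips_per_image(200, 64))    # 4
print(rows_in_last_strip(200, 64))  # 8
```

Whatever number comes out, it has to be the number of values in both arrays.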
So what if it isn’t? Enter the error message in this blog’s title:
StripOffsets inconsistent with StripByteCounts
By now you should be able to understand that this message is not good news. Let’s look at a real-life example:
We can extract the values for the ImageHeight and the three strip-related tags with exiftool.
exiftool -ImageHeight -StripOffsets -RowsPerStrip -StripByteCounts filename
For the sample file above, I get the following result:
Image Height : 7032
Strip Offsets : 716
Rows Per Strip : 512
Strip Byte Counts : (Binary data 81 bytes, use -b option to extract)
We now already know that each strip has 512 rows. There is only one Offset value, making us believe that there is only one strip. To compare that against the number of StripByteCounts we need to extract the binary data for that tag:
exiftool -b -StripByteCounts filename
23604 43028 42027 47615 57651 57171 64208 64755 64987 70643 70452 73899 40458 763
StripByteCounts returns 14 values. The first observation is that this is in line with the JHOVE error message, which was “StripOffsets inconsistent with StripByteCounts: 1 != 14“.
The second observation is that we can also tell something isn’t right by comparing ImageHeight and RowsPerStrip with the number of strips implied by StripOffsets:
7032 = 13 * 512 + 1 * 376 -> 14 strips
Since only one offset is given, the reader has no chance to find the remaining 13 strips of image data.
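For those curious where the two numbers come from: every tag entry in a TIFF image file directory (IFD) stores a value count next to the tag number, so the counts can be read without touching the image data. The following Python sketch does that for the first IFD of a classic (non-BigTIFF) file. The function name `first_ifd_counts` is my own, I am not claiming this is how JHOVE does it internally, and the demo bytes are hand-built to mimic the 1 vs. 14 situation rather than taken from my sample file.

```python
import struct

STRIP_OFFSETS = 273
STRIP_BYTE_COUNTS = 279


def first_ifd_counts(data: bytes) -> dict:
    """Map tag number -> value count for each entry in the first IFD."""
    if data[:2] == b"II":
        endian = "<"  # little-endian
    elif data[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("missing TIFF byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (magic number is not 42)")
    (n_entries,) = struct.unpack_from(endian + "H", data, ifd_offset)
    counts = {}
    for i in range(n_entries):
        # Each entry is 12 bytes: tag, field type, value count, value/offset.
        tag, _type, count, _value = struct.unpack_from(
            endian + "HHI4s", data, ifd_offset + 2 + 12 * i
        )
        counts[tag] = count
    return counts


def _entry(tag: int, field_type: int, count: int, value: int) -> bytes:
    """Build one little-endian IFD entry (demo helper only)."""
    return struct.pack("<HHII", tag, field_type, count, value)


# Hand-built demo: header, then an IFD with two entries (field type 4 = LONG).
demo = (
    b"II" + struct.pack("<HI", 42, 8)
    + struct.pack("<H", 2)
    + _entry(STRIP_OFFSETS, 4, 1, 716)
    + _entry(STRIP_BYTE_COUNTS, 4, 14, 1000)
    + struct.pack("<I", 0)  # no further IFD
)
counts = first_ifd_counts(demo)
print(counts[STRIP_OFFSETS], "!=", counts[STRIP_BYTE_COUNTS])
```

On a real file you would pass in the file's bytes, e.g. `first_ifd_counts(open("filename", "rb").read())`. Multi-page files have further IFDs, which this sketch ignores.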
How can I fix it?
This is where it gets really bad … there isn’t really a way to fix these files.
If the stars align perfectly and you have an uncompressed TIFF which actually stored all strips consecutively, you could, in theory, try to recalculate the offsets based on the StripByteCounts values. But, as I pointed out in the “What’s the problem” section, the specification doesn’t require the strips to be written in that way.
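Purely to illustrate that "stars align" scenario, here is what the recalculation would look like in Python. It is a sketch resting on a big assumption: that the strips sit back to back, in order, starting at the one offset we still have. The function names are my own, and the result is a guess to be checked against the rendered image, not a repair.

```python
from itertools import accumulate


def guess_contiguous_offsets(first_offset: int, byte_counts: list) -> list:
    """Guess strip offsets, ASSUMING strips are stored back to back, in order.

    The TIFF spec does not guarantee this layout.
    """
    # Offset of strip i = first offset + bytes of all strips before it.
    return list(accumulate(byte_counts[:-1], initial=first_offset))


def fits_in_file(offsets: list, byte_counts: list, file_size: int) -> bool:
    """Cheap plausibility check: no guessed strip may run past the file end."""
    return all(off + n <= file_size for off, n in zip(offsets, byte_counts))


# The one surviving offset and the 14 byte counts from the sample file.
byte_counts = [23604, 43028, 42027, 47615, 57651, 57171, 64208,
               64755, 64987, 70643, 70452, 73899, 40458, 763]
offsets = guess_contiguous_offsets(716, byte_counts)
print(len(offsets), offsets[:3])
```

If the guessed strips would run past the end of the file, the assumption is clearly wrong. If they fit, that still proves nothing, and only decoding the strips and looking at the image can tell you more.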
How do I rate the issue?
As described in the last blog, I use the following easy scale to rate issues:
- Critical – indicating a severe issue, data loss, unreadable files
- Middle – violations of standard / expected structure which should be fixed
- Low – issues which can be neglected / fixes postponed to a later date
In this case there is no way around it: The problem is critical, as the content cannot be rendered correctly. Since we can’t fix it, the only solution is to receive an intact file from the data producer. And unfortunately we know that all too often that is just not possible.