Notepad defuses the BOM.

Posted: 17-Jul-2008 19:37

Yesterday, I converted a rather large collection of asp files which had no actual asp content aside from some includes of other plain html asp files, into a rather large collection of html files with includes of other html files.

Pretty simple operation, one that happens all the time in my world for reasons which are unimportant.

However, this afternoon I spent nearly 2 hours trying to work out what the hell had gone wrong with the new files' layout in Firefox and IE (Opera, no trouble). There seemed to be random margins or spacing in places where there shouldn't be any, even though the files were essentially identical aside from the changed link URLs and file extensions.

I spent ages twiddling in Firebug looking for the damn margin that shouldn't have been, using LiveHeaders to see if any CSS file was getting 404'd, correcting various syntax problems in the source html, putting back mistakes in the source html syntax I had already fixed just to see if there was some combination of errors producing a good result.

Nothing I did helped.  Until I had the bright idea to "View Source" of a badly laid out page in Notepad, and bingo, I saw the problem right away: UTF-8 Byte-Order-Marks (well, "This is UTF-8" markers really) were present in the included files.

The Byte-Order-Mark for UTF-8 is actually the "Zero Width No-Break Space" character (U+FEFF, stored in UTF-8 as the three bytes EF BB BF), so in my UTF-8 editor (jEdit, on my Ubuntu workstation) I didn't see these at all; they were zero width spaces, completely invisible. Notepad, however, didn't know how to display this and just substituted the empty-box character.
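For anyone who wants to see the invisible bytes for themselves, here is a minimal sketch. It assumes PHP 7 or later (for the `\u{...}` escape), and the `$sample` string is made up for illustration rather than taken from the real files.

```php
<?php
// U+FEFF written with PHP 7's code point escape; PHP stores it as UTF-8 bytes.
$bom = "\u{FEFF}";

// Hypothetical sample: an include file's contents saved with a leading BOM.
$sample = $bom . "<div>header</div>";

// Dump the first three bytes as hex so the invisible character shows up.
echo bin2hex(substr($sample, 0, 3)), "\n"; // efbbbf
echo strlen($bom), "\n";                   // 3 (bytes, not characters)
```

A hex dump like this is roughly what Notepad did by accident: it makes a character visible that a UTF-8 aware editor quietly swallows.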

Similarly, in Firefox and IE they were invisible, but even though they are zero width, that non-breaking bit forced the following divs to drop down a line.  Opera either ignored the character, or treated it as a breaking (and thus unimportant) space.

So anyway, a teensy bit of PHP to remove the byte order mark (I have glossed over why PHP is involved here, it's complicated and unimportant):

   function strip_bom($string)
   {
     // The UTF-8 BOM is the three bytes EF BB BF at the very start of the string.
     if(substr($string, 0, 3) === "\xEF\xBB\xBF")
       return substr($string, 3);
     return $string;
   }

and all was well.
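As a rough illustration of how such a function might be wired in, the sketch below strips the BOM at the moment an include file is read. The `read_include()` helper and the temporary demo file are hypothetical, not how the actual site does it; `strip_bom()` is repeated only so the example runs on its own.

```php
<?php
// Same check as strip_bom() above, repeated so this sketch is self-contained.
function strip_bom($string)
{
    if (substr($string, 0, 3) === "\xEF\xBB\xBF") {
        return substr($string, 3);
    }
    return $string;
}

// Hypothetical helper: return a file's contents without a leading BOM.
function read_include($path)
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        return ''; // unreadable file: treated as empty in this sketch
    }
    return strip_bom($contents);
}

// Demo: a temporary file stands in for a BOM-prefixed include.
$tmp = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($tmp, "\xEF\xBB\xBF<div>menu</div>");
echo read_include($tmp), "\n"; // <div>menu</div>
unlink($tmp);
```

Stripping at read time leaves the files on disk untouched, which is handy if someone keeps re-saving them with a BOM.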

I suspect that the BOMs originated from a designer's use of Dreamweaver to edit some of the files before they were passed on to me to convert. I can't complain too much though, at least they are using UTF-8 now.
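If files keep arriving like this, a quick scan can flag the affected ones before they reach a browser. This is only a sketch: `find_bom_files()` is a made-up name, and the assumption that only `.html` files matter is mine.

```php
<?php
// Hypothetical helper: list files under $dir with the given extension
// whose first three bytes are the UTF-8 BOM (EF BB BF).
function find_bom_files($dir, $extension = 'html')
{
    $found = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile() || strtolower($file->getExtension()) !== $extension) {
            continue;
        }
        // Only the first three bytes matter, so don't read the whole file.
        $head = file_get_contents($file->getPathname(), false, null, 0, 3);
        if ($head === "\xEF\xBB\xBF") {
            $found[] = $file->getPathname();
        }
    }
    sort($found);
    return $found;
}

// Usage: php find_bom.php /path/to/site (defaults to the current directory).
$root = isset($argv[1]) ? $argv[1] : '.';
foreach (find_bom_files($root) as $path) {
    echo $path, "\n";
}
```

It only reports; pairing it with a strip-and-rewrite step is possible, but I'd want a backup of the files first.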

James Sleeman
New Zealand

PHP Programmer Extraordinaire