Xml &#xA0; (non-breaking space) and JavaScript string comparison

by Jason Haley 12. September 2006 16:14

It is hard for me to believe I have not run into this interesting item earlier ... &#xA0; = &#160; != &#32;

Say you have an Xml file like this:

<?xml version="1.0" encoding="utf-8"?>

<Books>

    <Book>

        <Title>Why Software Sucks...and What You Can Do About It</Title>

        <Author>David S Platt</Author>

    </Book>

    <Book>

        <Title>Cascading Style Sheets</Title>

        <Author>Eric A Meyer</Author>

    </Book>

</Books>

And you want to transform that file using a stylesheet like this:

<?xml version="1.0" encoding="UTF-8" ?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 

    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

 

    <xsl:template match="/">

        <MyBooks>

            <xsl:for-each select="//Book">

                <Book>

                    <xsl:value-of select="Author"/>&#xA0;:&#xA0;<xsl:value-of select="Title"/>

                </Book>

            </xsl:for-each>

        </MyBooks>

    </xsl:template>

</xsl:stylesheet>

You will end up with an xml doc like this:

<?xml version="1.0" encoding="utf-8"?>

<MyBooks>

    <Book>David S Platt : Why Software Sucks...and What You Can Do About It</Book>

    <Book>Eric A Meyer : Cascading Style Sheets</Book>

</MyBooks>

So far so good, right? Everything looks fine until you start to do a string comparison of the new MyBooks/Book text against something like a string in JavaScript. Here is some JavaScript that shows unexpected results until you look at the character codes of the strings.

// Simple function to check for item in an array
function IndexOf(list, item)
{
    for (var i=0; i<list.length;i++)
    {
        if (list[i] == item)
        {
            return i;
        }
    }
    return -1;
}

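// Items originally generated from Xml/Xslt Transform
// (the non-breaking spaces around the colon are written as \u00A0 so they are visible here)
var myBooks = "David S Platt\u00A0:\u00A0Why Software Sucks...and What You Can Do About It,Eric A Meyer\u00A0:\u00A0Cascading Style Sheets";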

// Items created in some other manner
var notPrintedYet = "David S Platt : Why Software Sucks...and What You Can Do About It,Adam Machanic : Expert SQL Server 2005 Development";

// Split the books into arrays
var books = myBooks.split(",");
var notPrinted = notPrintedYet.split(",");

// Write out the books, checking to make sure they are not in the notPrinted list first
for (var i = 0; i<books.length; i++)
{
    if (IndexOf(notPrinted, books[i]) == -1)
    {
        document.write(books[i]);
        document.write("<br>");
    }
}

The results show both Dave Platt's book and Eric Meyer's book, even though Platt's book is in the notPrinted list and should have been skipped.

Here is some script that will build a table comparing the Char codes and highlight the differences:

// The problem book is the first one, so see what the differences are in the Chars
var item1 = books[0];
var item2 = notPrinted[0];

// Build tables
document.write("<table cellpadding='2' cellspacing='0'>");
for (var i = 0; i < item1.length; i++)
{
    // Highlight any differences
    var styleAttribute;
    if (item1.charCodeAt(i) != item2.charCodeAt(i))
        styleAttribute = " style='background:yellow'";
    else
        styleAttribute = "";

    // One row per character: char and code from item1, then char and code from item2
    document.write("<tr>");
    document.write("<td" + styleAttribute + " class='letter'>'");
    document.write(item1.charAt(i));
    document.write("'</td>");
    document.write("<td" + styleAttribute + ">");
    document.write(item1.charCodeAt(i));
    document.write("</td>");
    document.write("<td" + styleAttribute + " class='letter'>'");
    document.write(item2.charAt(i));
    document.write("'</td>");
    document.write("<td" + styleAttribute + ">");
    document.write(item2.charCodeAt(i));
    document.write("</td>");
    document.write("</tr>");
}
document.write("</table>");

Here is the part of the comparison that is interesting:

The string coming from the Xml/Xslt transform that used &#xA0; for whitespace has a character code of 160 in JavaScript at those positions. So watch which character code you use for whitespace in your Xml: &#xA0; and &#160; are the same character (a non-breaking space), and neither is equal to a normal space (&#32;).
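If you only need the position of the mismatch rather than a full table, something like the following might do. This is just a sketch: firstDifference is a made-up helper name, and the sample strings are shortened versions of the book titles above with the non-breaking spaces written as \u00A0.

```javascript
// Hypothetical helper (not part of the original script):
// returns the index of the first position where the two strings differ,
// or -1 if they are identical.
function firstDifference(a, b)
{
    var len = Math.max(a.length, b.length);
    for (var i = 0; i < len; i++)
    {
        // charCodeAt returns NaN past the end of a string, and NaN !== NaN,
        // so a length mismatch is reported at the end of the shorter string
        if (a.charCodeAt(i) !== b.charCodeAt(i))
        {
            return i;
        }
    }
    return -1;
}

// Shortened sample data: one string with non-breaking spaces, one with normal spaces
var fromXslt = "David S Platt\u00A0:\u00A0Why Software Sucks";
var typedByHand = "David S Platt : Why Software Sucks";

var diffAt = firstDifference(fromXslt, typedByHand);
var codeFromXslt = fromXslt.charCodeAt(diffAt);       // 160 (non-breaking space)
var codeTypedByHand = typedByHand.charCodeAt(diffAt); // 32 (normal space)
```

Here diffAt points at the character right after "Platt", which is where the stylesheet emitted the first &#xA0;.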

The solution for me was to just use &#32; instead of &#xA0; in the stylesheet.
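If changing the stylesheet is not an option, another approach (not what I did, just a sketch) is to normalize the strings on the JavaScript side before comparing them. The normalizeSpaces name is made up for illustration, and it assumes the only difference you care about is character code 160 versus 32.

```javascript
// Hypothetical helper: replaces every non-breaking space (char code 160)
// with a normal space (char code 32) so the strings can be compared with ==
function normalizeSpaces(text)
{
    return text.replace(/\u00A0/g, " ");
}

// Shortened sample data, with the non-breaking spaces written as \u00A0
var transformed = "Eric A Meyer\u00A0:\u00A0Cascading Style Sheets";
var expected = "Eric A Meyer : Cascading Style Sheets";

var equalBefore = (transformed == expected);                 // false
var equalAfter = (normalizeSpaces(transformed) == expected); // true
```

The same call could be applied to books[i] before passing it to IndexOf in the script above.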

Some resources with more detail than pictures: 

http://www.dpawson.co.uk/xsl/characters.html

http://www.w3.org/TR/2000/WD-xml-2e-20000814.html#charsets

 

 
