string.sub produces incorrect characters

Seeing some odd behavior with console output and display.newText rendering when using string.sub to produce the data. I have data coming from a Google Docs form that I suspect might contain non-UTF-8 text. When I print the text to the console with print() or show it with display.newText, it matches what is in the Google Doc. If I use string.sub to iterate through the string, it outputs incorrect data:

In the following code, the variable answer is populated from a JSON file produced from a Google Docs spreadsheet:

[lua]
print( "answer", answer )

local correctLable = ""

for count = 1, string.len( answer ) do
  -- string.sub indexes bytes, not characters, so a multi-byte character is split here
  correctLable = correctLable .. string.sub( answer, count, count )
  if count < string.len( answer ) then
    correctLable = correctLable .. " "
  end
end

print( correctLable )
[/lua]

The output from the first print statement to the Mac terminal is correct; the output from the second is incorrect:

2014-06-30 17:04:52.209 Corona Simulator[5908:507] answer Gilligan’s Island

2014-06-30 17:04:52.209 Corona Simulator[5908:507] G i l l i g a n â € ™ s   I s l a n d
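
One way to see what the loop is actually handing to string.sub is to print the numeric value of every byte with string.byte. This is a diagnostic sketch that was not part of the original post; the string is hard-coded with Lua decimal escapes (\226\128\153 standing in for the apostrophe) instead of being loaded from the JSON file:

```lua
-- Diagnostic sketch: print the numeric value of every byte in the string.
-- Hard-coded here; in the original code, answer comes from the JSON file.
local answer = "Gilligan\226\128\153s Island"

local bytes = {}
for i = 1, string.len( answer ) do
  bytes[#bytes + 1] = string.byte( answer, i )
end

print( table.concat( bytes, " " ) )
--> 71 105 108 108 105 103 97 110 226 128 153 115 32 73 115 108 97 110 100
```

The three values 226 128 153 (hex E2 80 99) sit where the apostrophe should be, which lines up with the three stray characters in the log above.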

I suspect it’s my ignorance of text encoding between the Google Doc and the JSON file, but I’m out of answers.

Thanks for any assistance or pointers.

Here’s the actual text, not sure if this post will re-encode it or not:

Gilligan’s Island

The ’ in Gilligan’s is a smart quote (U+2019). In UTF-8 it is encoded as three bytes (E2 80 99), which is why three stray characters (â € ™) show up in its place. Lua’s string functions operate at the byte level, so string.len() counts each of those bytes and string.sub( answer, count, count ) returns them one at a time, splitting the character apart. Is there a reason this is an issue for you? Corona handles UTF-8 characters correctly.

Rob
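
If the goal is to space out characters rather than bytes, the loop has to step over whole UTF-8 sequences. Corona is built on Lua 5.1, which has no utf8 library, so a common workaround is a string.gmatch pattern that matches one byte that is not a continuation byte, followed by any continuation bytes (values 128 to 191). A minimal sketch, assuming the text is valid UTF-8; spaceOutCharacters is a hypothetical helper name:

```lua
-- Sketch: insert a space between UTF-8 characters (not bytes).
-- Assumes valid UTF-8 input. The pattern matches a lead byte (anything
-- outside 128-191) followed by zero or more continuation bytes (128-191).
local function spaceOutCharacters( text )
  local chars = {}
  for ch in string.gmatch( text, "[^\128-\191][\128-\191]*" ) do
    chars[#chars + 1] = ch
  end
  return table.concat( chars, " " )
end

local answer = "Gilligan\226\128\153s Island"  -- U+2019 encoded as bytes E2 80 99
print( string.len( answer ) )          --> 19 (bytes, not characters)
print( spaceOutCharacters( answer ) )  --> G i l l i g a n ’ s   I s l a n d
```

Collecting the pieces in a table and joining them with table.concat also avoids rebuilding the string with .. on every iteration, as the original loop does.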
