How to identify trash characters from text file

kilopop · September 28, 2016, 8:45am

I'm reading from a text file and printing each line. In the simulator on macOS, every line is reported as one character longer than it should be.

    local filePath = system.pathForFile( textFile, system.ResourceDirectory )
    local file = io.open( filePath, "rb" )
    for lines in file:lines() do
        print("Lines = ", lines, #lines)
    end

This is what is sent to the simulator output window. It prints lines and then starts #lines on a new line (print must be treating the carriage return as a line break). The number on that new line is also 1 too many.

    Sep 28 09:17:01.994 Lines = <p2_1
    6
    Sep 28 09:17:01.994 Lines = Some of the critters can be GIGANTIC!
    38
    Sep 28 09:17:01.994 Lines =
    1
    Sep 28 09:17:01.994 Lines = <p2_2
    6

Opening the text in TextMate shows a <CR> at the end of every line. This is because the text file was edited in Windows and saved with Windows line endings.

Question: how can I identify these trash characters in Corona, or is there a program or text encoding that can remove them?
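For anyone trying to spot such characters, here is a minimal sketch that makes them visible before printing. It assumes only the standard Lua string library; `showControlChars` is a hypothetical helper name, not something provided by Corona.

```lua
-- Hypothetical helper (not part of Corona or Lua): replaces each control
-- character with its decimal byte value in angle brackets, so a trailing
-- carriage return shows up as <13> instead of silently breaking the line.
local function showControlChars(s)
    -- %c matches control characters. gsub returns two values, so the
    -- parentheses keep only the resulting string.
    return (s:gsub("%c", function(c)
        return string.format("<%d>", c:byte())
    end))
end

print(showControlChars("<p2_1\r"))  -- prints <p2_1<13>
```

Printing `showControlChars(lines)` instead of `lines` inside the loop should show exactly which byte is inflating the count.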

rob · September 28, 2016, 4:47pm

Since you are reading in a line at a time, the carriage-return character should be the last character on the line. You can use something like:

    if lines:ends("\r") then
        lines = lines:sub(1, lines:len() - 1)
    end

(not tested)

\r is the escape sequence that represents the CR (^M, ASCII 13) character. If you're dealing with UTF-8 strings, it would be wise to use the UTF-8 library instead, which means dropping the object-style call in favour of something more like:

    if utf8.ends( lines, "\r" ) then
        lines = utf8.sub( lines, 1, utf8.len( lines ) - 1 )
    end

Rob

kilopop · September 28, 2016, 9:19pm

Many thanks, Rob. Just to follow up: lines:ends("\r") didn't work.

utf8.ends responds with: “Attempt to call field ‘ends’ (a nil value)”. The plugin has been registered and required. I can’t find anything about an ends function online. Does it exist?

rob · September 28, 2016, 9:54pm

The UTF-8 plugin might not have an ends function; I thought it did. You can still use other functions, like utf8.sub( lines, utf8.len(lines), utf8.len(lines)), to check the last character.
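As a sketch of the same last-character check using only the standard string library: a carriage return is the single byte 13, and every byte of a multi-byte UTF-8 sequence is 128 or higher, so a byte-wise test should not misfire on UTF-8 text. `stripTrailingCR` is a hypothetical helper name.

```lua
-- Hypothetical helper name; uses only the standard string library.
local function stripTrailingCR(line)
    -- Byte 13 is the carriage return; a negative index counts from the end.
    -- On an empty string, byte(-1) yields nil, so the test is simply false.
    if line:byte(-1) == 13 then
        return line:sub(1, -2)  -- everything except the last byte
    end
    return line
end

print(#stripTrailingCR("<p2_1\r"))  -- prints 5
```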

kilopop · September 28, 2016, 11:02pm

Thanks, this is what ended up identifying that carriage return:

    if utf8.sub( lines, utf8.len(lines), utf8.len(lines)) == "\r" then
        lines = utf8.sub( lines, 1, (utf8.len( lines ))-1 )
    end
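For comparison, a self-contained sketch of the whole read loop using a Lua pattern rather than the UTF-8 plugin. The assumptions here: `io.tmpfile()` stands in for the real file opened via `system.pathForFile`, the sample text is copied from the output above, and `cleaned` is just an illustrative table name.

```lua
-- Stand-in for the real resource file: a temporary file with Windows
-- (CRLF) line endings. In a Corona project the file would come from
-- io.open( filePath, "rb" ) instead.
local file = assert(io.tmpfile())
file:write("<p2_1\r\nSome of the critters can be GIGANTIC!\r\n\r\n")
file:seek("set", 0)

local cleaned = {}
for line in file:lines() do
    -- "\r$" matches a carriage return only at the very end of the line.
    -- The parentheses drop gsub's second return value (the match count).
    cleaned[#cleaned + 1] = (line:gsub("\r$", ""))
end
file:close()

for i = 1, #cleaned do
    print("Lines = ", cleaned[i], #cleaned[i])
end
```

With the carriage returns stripped, the reported lengths should drop to 5, 37 and 0.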

rob · September 29, 2016, 1:43pm

That works! Glad you got it working!

Rob
