Splitting a string - looping with regular expressions

I am horrible with regular expressions.  I thought this would be on the easier side, but when I sat down to do it, I couldn’t get it to work. I got frustrated, decided I didn’t need it after all, threw it away, and did something different.

One of my beta testers, however, thought that how I presented the text didn’t look good, so I am considering going back to this again.

Here is what I need to do. Take a string of text that I retrieve from an XML file and parse it into separate strings.  Here is an example of the string:

LINE 1\n|Line 2 will be a long line of text.\n\n|LINE 3\n|Line 4 is another line of text that will appear here.

I want to break that into 4 strings, using the | character as the delimiter (without including it in the final strings) and keeping the line breaks. I will then create a separate text element from each string, so that I can center lines 1 and 3 and left-align lines 2 and 4.

It will not be consistently 4 strings.  It could be anything from 2-8, most likely. And I can’t break the string up in the XML file (to keep with the node formatting that was already structured).

I use the Text Wrapper component to determine the largest font size that I can use for any given text block, so I will use that string as a whole to get the correct font size, then use the broken up strings as separate text elements with that correct font size, positioned one after the other.

Any suggestions on how to parse such a string, especially not knowing from the outset in the code how many strings you will be creating?

Appreciate any help that anyone can give.

What is the difficulty that you are facing?

 

I was having trouble parsing it.  Again, I am not good with the regular expressions and when I threw looping in, I was getting confused.  :slight_smile:

I might just sit down and try again, but figured I would ask if someone had already done it before and could just point me to a code sample that addresses that situation where you are breaking something into sub-strings when you don’t know ahead of time how many parts you are breaking it into.

Can you give a real example of the input and output that you want?

Basically, I am trying to make something format like a screenplay, so one string could be:

DAVID\n|I need help because I really suck at regular expressions.\n\n|MIKE\n|Is this really important? Damn you and your scope creep!

And then another could just be:

MIKE\n|Really? Do you really need to do this now?

In the first example, I want it to output 4 strings:

string[1]: “DAVID\n”

string[2]: “I need help because I really suck at regular expressions.\n\n”

string[3]: “MIKE\n”

string[4]: “Is this really important? Damn you and your scope creep!”

In the second example, it would output 2 strings:

string[1]: “MIKE\n”

string[2]: “Really? Do you really need to do this now?”

And this is not at all based on actual conversations.   :slight_smile:

Once I have that broken up, I can go through the pieces, build text components out of them, and position them accordingly. I am fairly certain I know how to do that part.

I chose the | character as the character to use to determine where to break up the string because I am going to run the original string through the Text Wrapper component to judge the largest font size that I can use for the designated area.  Since it has a very narrow character width, I should be able to trust that when it gets removed from the strings, it won’t affect sizing in any way.  Plus, I would never actually use that character ever in a string normally, so it’s a safe one to assume can be stripped.

(Sorry, fixed an error with the \n characters as I thought I had too many, but I did not.)

    local split = function (str, delim, maxNb)
        -- Eliminate bad cases...
        if string.find(str, delim) == nil then
            return { str }
        end
        if maxNb == nil or maxNb < 1 then
            maxNb = 0    -- No limit
        end
        local result = {}
        -- Capture everything up to the delimiter, plus the position just after it
        local pat = "(.-)" .. delim .. "()"
        local nb = 0
        local lastPos
        -- string.gmatch replaces string.gfind, which was removed after Lua 5.1
        for part, pos in string.gmatch(str, pat) do
            nb = nb + 1
            result[nb] = part
            lastPos = pos
            if nb == maxNb then break end
        end
        -- Handle the last field
        if nb ~= maxNb then
            result[nb + 1] = string.sub(str, lastPos)
        end
        return result
    end

    local s = "DAVID\n|I need help because I really suck at regular expressions.\n\n|MIKE\n|Is this really important? Damn you and your scope creep!"
    local phrases = split(s, "|")
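As a rough sketch of what could come next: once the pieces are in a table, the number of parts never has to be known in advance, because `#phrases` gives the count at run time. The example below hard-codes the table that the split is expected to produce for the second example string, and assumes that odd entries are character names (centered) and even entries are dialogue (left-aligned). The helper `alignmentFor` and the `layout` table are hypothetical names for illustration, not part of any library.

```lua
-- The pieces expected from splitting "MIKE\n|Really? Do you really need to do this now?"
local phrases = { "MIKE\n", "Really? Do you really need to do this now?" }

-- Hypothetical helper (assumption): odd entries are names, even entries are dialogue
local function alignmentFor(index)
    if index % 2 == 1 then
        return "center"
    end
    return "left"
end

-- Build one layout record per piece, however many pieces there are
local layout = {}
for i = 1, #phrases do
    layout[i] = { text = phrases[i], align = alignmentFor(i) }
end

for i, item in ipairs(layout) do
    print(i, item.align, item.text)
end
```

Each record in `layout` could then be turned into its own text element and positioned below the previous one.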

Damn, really?  Wow, you are fast!  Or I am just really bad at Lua.  :slight_smile:

Thanks so much, Renato.  Will try this out.  So appreciative of your time.

Wow. That was fast. Thanks for sharing this useful function.

I was just about to share the following, which is what I managed to come up with in the same time. I was going to say: here are the string functions you need; put them into some form of recursive loop and you’ll get what you need. But then Renato happened!!!  :slight_smile: I am humbled!

    local jsonLine = "LINE 1\n|Line 2 will be a long line of text.\n\n|LINE 3\n|Line 4 is another line of text that will appear here."
    local parsedStrings = {}
    local separatorLocation = string.find(jsonLine, "|")
    -- now we know where the first separator is
    print(separatorLocation)
    parsedStrings[1] = string.sub(jsonLine, 1, (separatorLocation - 1))
    print(parsedStrings[1])
    local jsonLineLength = string.len(jsonLine)
    jsonLine = string.sub(jsonLine, (separatorLocation + 1), jsonLineLength)
    print(jsonLine)
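For anyone who wants to finish this approach, here is a minimal sketch of the same find-and-sub idea wrapped in a plain loop instead of recursion. The function name `splitOnPipe` is made up for this example. It passes `true` as the fourth argument to `string.find` so that the separator is treated as plain text rather than a pattern.

```lua
-- Hypothetical helper: splits text on "|" using only string.find and string.sub
local function splitOnPipe(text)
    local parts = {}
    local startPos = 1
    while true do
        -- The fourth argument (plain = true) turns off pattern matching
        local sepPos = string.find(text, "|", startPos, true)
        if not sepPos then
            -- No more separators: the rest of the string is the last part
            parts[#parts + 1] = string.sub(text, startPos)
            break
        end
        -- Keep everything before the separator, then continue just after it
        parts[#parts + 1] = string.sub(text, startPos, sepPos - 1)
        startPos = sepPos + 1
    end
    return parts
end

local parsedStrings = splitOnPipe("LINE 1\n|Line 2 will be a long line of text.\n\n|LINE 3\n|Line 4 is another line of text that will appear here.")
for i = 1, #parsedStrings do
    print(i, parsedStrings[i])
end
```

The loop stops by itself when no separator is left, so it works the same for 2 parts or 8.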

I know, seriously!  Took me more time to write the message than it did Renato to create code.   :slight_smile:

I didn’t create the code, I already had that in one of my libs due to some prior need. I got it from here: http://lua-users.org/wiki/SplitJoin

Either way, Renato, thank you so much for providing that code.  I think I looked at that site as well, but didn’t put 2 and 2 together.  So glad you knew it was what I needed.

And ksan, thank you as well for even trying to work on some code for me.  Didn’t see it until after I had started implementing the code Renato provided, but definitely appreciate it.

It did work for me.  Took me a little time to get it implemented (had trouble with finding/removing line breaks and doing some other matching, but I figured it out), and it is working great. 
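The exact fix for the line breaks is not shown here, so the following is only a guess at one way to do it: strip the trailing line breaks from a piece and count how many there were (for example, to decide how much vertical space to leave after it). `stripTrailingNewlines` is a hypothetical helper name.

```lua
-- Hypothetical helper: returns the text without trailing "\n" characters,
-- plus the number of line breaks that were removed
local function stripTrailingNewlines(part)
    -- "(.-)" is lazy, so "\n*$" takes every newline at the very end
    local body = string.match(part, "^(.-)\n*$")
    return body, #part - #body
end

local name, nameBreaks = stripTrailingNewlines("DAVID\n")
local speech, speechBreaks = stripTrailingNewlines("I need help because I really suck at regular expressions.\n\n")

print(name, nameBreaks)
print(speech, speechBreaks)
```

Line breaks in the middle of a piece are left alone; only the ones at the end are removed.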

Again, thank you both for your help.

Super!!! Sounds like you have a fun project. Look forward to seeing it when you release. Good luck!!!

On our side, we usually use the string.gmatch function to achieve this. I don’t know which one is faster, the split or gmatch. Here is some sample code (you can replace the ‘|’ with a comma or any other character that you use to split your sentence):

    local strToSplit = "DAVID\n|I need help because I really suck at regular expressions.\n\n|MIKE\n|Is this really important? Damn you and your scope creep!"

    for phrase in string.gmatch(strToSplit, "[^|]+") do
        print(phrase)
    end
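One difference worth knowing about, shown as a small sketch: the pattern `[^|]+` matches runs of one or more non-pipe characters, so an empty field between two adjacent separators is silently skipped. The helper name `collect` is made up for this example.

```lua
-- Hypothetical helper: gathers the gmatch results into a table
local function collect(str)
    local parts = {}
    for phrase in string.gmatch(str, "[^|]+") do
        parts[#parts + 1] = phrase
    end
    return parts
end

local normal = collect("MIKE\n|Really? Do you really need to do this now?")
local withEmpty = collect("A||B")

print(#normal)     -- 2
print(#withEmpty)  -- 2, the empty field between the two pipes is dropped
```

For the screenplay strings in this thread that should not matter, since every separator has text on both sides.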

Thanks nmichaud.  I did end up going with Renato’s code, and it works fast enough for my app.  Would definitely recommend it for someone with a similar need.
