Binary Search

werdinteractive · August 3, 2010, 12:57pm

I need to speed up the amount of time it takes to find a word in a large dictionary array. I have found some examples on google, but I can not figure out where to plug in the array and the word that I am searching for. It would be really helpful if someone could either help explain this to me or something similar.

This is what I am currently doing.

MatchFound = false

for o=1,#DictionaryArray do
if(DiplayWordText.text == DictionaryArray[o])then
MatchFound = true;
end
end [import]uid: 4871 topic_id: 1582 reply_id: 301582[/import]

Lourenco · August 3, 2010, 1:23pm

I would do something like this :

function wordExists (pWord)  
  
 local filePath = system.pathForFile( "dictionary.file", system.ResourceDirectory )  
 local w  
  
 local file = io.open( filePath, "r" )  
  
 if file then  
  
 for r = 1, numberOfRecordsInDictionaryFile do  
 w = file:read ("\*l")  
  
 if (w == pWord) then -- word exists  
 io.close (file)  
 return true  
 end  
  
 if (w \> pWord) then -- sure word does not exist (pre-requisite : entries in your dictionary file are sorted in ascending order)  
 io.close (file)  
 return false  
 end  
  
 end  
  
 -- read file complete but did not find word  
 io.close (file)  
 return false  
  
 else  
  
 return false  
  
 end  
  
end

Hope this helps. [import]uid: 6661 topic_id: 1582 reply_id: 4494[/import]

carlos_m_icaza · August 3, 2010, 2:33pm

Out of curiosity, how many words are you searching?

100, 1,000, 10,000+

And are you doing the match while typing or idle?
C [import]uid: 24 topic_id: 1582 reply_id: 4500[/import]

werdinteractive · August 3, 2010, 6:06pm

I am frequently confirming a match on submission in a game and am searching well over 200,000 words in ascending order.

If I could get some help finding a way faster than the way that I am doing it I would be very happy, but the way that was suggested is not a binary search and looks like it would take longer than the way that I am doing it now. Though, I am thankful for the suggestion. I am glad to see someone try to help me figure this out.

I built an array in a separate lua file with all the words and required it at the beginning of my document. There is no lag on start up and it takes between .5 seconds to 1.5 seconds to preform on the device. The lag is minimal, but I am required to reduce it with a binary search or something more efficient than just running through the array item by item.
[import]uid: 4871 topic_id: 1582 reply_id: 4507[/import]

carlos_m_icaza · August 3, 2010, 6:51pm

have you tried using sql lite? and indexing the words so that you can get to them faster?

that’s one thought…

create the file
index the file that contains the words using sql lite

then
search for a word in the file. since it is index, sql lite would probably be a lost faster…

or

are the words saved fixed length on the file? meaning each word saved takes max 20 characters even with spacing?

then you know the length of the file

choose the word you want to search - seek to the middle of the file since you know the offset based on fixed word length.
proceed to the right half of the file if the word is greater/proceed to the left is the word is smaller.

or in your case, you are loading all 200,000 words in a table, which will eat up memory,

since you know the size of the array,

choose word to compare
pick the middle of the array
compare
choose left or right of the array depending if word is bigger/smaller
pick the half of the right/left
iterate until word matches.

C [import]uid: 24 topic_id: 1582 reply_id: 4510[/import]

carlos_m_icaza · August 3, 2010, 6:54pm

at one point, it is better to load a fixed set of data to compare against as the smaller the number can cause n2 performance.

choose a minimum of 50 words or 20 words when doing your recursive calls then do a linear search. a binary search on data that is almost sorted on recursive calls can actually slow down to n2, n (linear) beats n2.

c. [import]uid: 24 topic_id: 1582 reply_id: 4511[/import]

werdinteractive · August 3, 2010, 7:13pm

Thanks that helps out quite a bit. I have been search how to do the binary search and have had no luck figuring out what it was or how it worked. It seems very complicated for something that is doing more than just a simple linear search. Hopefully I can find some readable code now that I understand what it is supposed to be doing. I am going to look up SQL lite also. [import]uid: 4871 topic_id: 1582 reply_id: 4512[/import]

werdinteractive · August 3, 2010, 7:56pm

How do I divide a word into letters? They are in ABC order, so I am just going to iterate backward if the first letter is after M. It looks like I cannot do the search type I was looking for without restructuring all of the words by size. [import]uid: 4871 topic_id: 1582 reply_id: 4515[/import]

carlos_m_icaza · August 4, 2010, 8:50am

Here is another thought

you could break down your giant table into 26 tables, one table for each letter of the alphabet.

tableA = { } – all the words that begin with a
tableB = {} – all the words that begin with b

then when you get your word to compare, check the first letter and only compare that against the corresponding table.

if the firstLetter = A then search on tableA
else if firstLetter = B then search on tableB

and depending on the number of words on each table, you could save each table to a file and load that file specifically and still would be much faster than doing a linear search on 200,000 words.

Carlos

[import]uid: 24 topic_id: 1582 reply_id: 4533[/import]

werdinteractive · August 6, 2010, 10:23am

Think you so much. I got everything working fine. After you told me to use SQL I started looking it up and found and even faster way in the forums. I made every word = true and checked if the word I was looking for was a nil value in the table. It is instantaneous and simple. [import]uid: 4871 topic_id: 1582 reply_id: 4644[/import]

carlos_m_icaza · August 9, 2010, 1:35pm

Your wilcome

C [import]uid: 24 topic_id: 1582 reply_id: 4744[/import]

Binary Search

then search for a word in the file. since it is index, sql lite would probably be a lost faster…

then
search for a word in the file. since it is index, sql lite would probably be a lost faster…