Dictionaries for Scrabble game

Hi guys,

I was thinking of making a Scrabble game. I like these types of games and wanted to make my own version.

The main problem is dictionary files for different languages. Are they available online or is there something ready? I looked for something but I’m not satisfied. Also I would like to remove bad words or offensive words…

Has anyone already dealt with this?

I have an English word list as a MySQL database if that’s any good to you? I collated it for an old boggle-like Facebook game I built years ago, so I believe it contains 4-8 letter words (possibly 3-8 actually). I’m fairly sure it’s free of “bad words” but I can’t remember whether it’s US or UK English. I’ll dig it out for you tomorrow if it sounds helpful.

Multilingual would definitely be a complication though. You could potentially write a script that loops through this database and passes each word through a translation API to build further lists, but the result isn’t necessarily going to be accurate.

Yes, thank you so much, it would be a great start!

I do not really like the translation idea either. However, on the net I see many games of this kind that work in different languages, so I think there must be something ready-made somewhere. I doubt that each of the apps I’ve seen has a team creating dictionaries from scratch.

Sorry for coming back a little late with this, but here’s the MySQL table. It looks like it contains words of 2 or more letters, and they’re indexed by first letter for speedy search results. It contains just short of 114,000 words and I have a feeling they were sourced from open Scrabble-allowed lists, though unfortunately it does still seem to contain profanity!
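For stripping the profanity out, here is a minimal sketch. It is in Python rather than PHP or Corona’s Lua, purely for illustration, and it assumes you already have a separate blocklist of words to remove. The function name `filter_words` and the sample words are hypothetical.

```python
def filter_words(words, blocklist):
    """Return the words that are not in the blocklist (case-insensitive)."""
    # Normalise the blocklist once so each lookup is a cheap set membership test.
    blocked = {b.strip().lower() for b in blocklist if b.strip()}
    return [w for w in words if w.strip().lower() not in blocked]


if __name__ == "__main__":
    sample_words = ["apple", "darn", "Zebra", "heck"]   # made-up sample data
    sample_blocklist = ["DARN", "heck"]                  # made-up blocklist
    print(filter_words(sample_words, sample_blocklist))
```

An exact-match filter like this only removes the listed words themselves, so the result is only as good as the blocklist you feed it.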

Hope this is useful.

No problem and thanks again!

I wanted to ask you if you’ve ever used this file locally.

I’m asking to find out about any performance issues when loading the file.

If you keep it as a MySQL database it should perform just fine. There’s an index on the letter column and MySQL is more than capable of churning through this number of records in almost no time, subject to server spec and how your queries are constructed of course.
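To illustrate the kind of indexed lookup I mean, here is a rough sketch. It uses Python’s built-in sqlite3 module as a stand-in for MySQL so it runs without a server, and the table name `words`, the `word` column, and the index name are assumptions rather than the real schema of the table above.

```python
import sqlite3


def build_db(words):
    """Create an in-memory table with an index on the first-letter column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (letter TEXT NOT NULL, word TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_words_letter ON words (letter)")
    rows = [(w[0], w) for w in (x.strip().lower() for x in words) if w]
    conn.executemany("INSERT INTO words (letter, word) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def word_exists(conn, candidate):
    """Check a candidate word, narrowing the search by its first letter."""
    c = candidate.strip().lower()
    if not c:
        return False
    row = conn.execute(
        "SELECT 1 FROM words WHERE letter = ? AND word = ? LIMIT 1",
        (c[0], c),
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    db = build_db(["apple", "corona", "cobalt"])
    print(word_exists(db, "Corona"), word_exists(db, "corgi"))
```

The same query shape should carry over to MySQL, with the usual caveat that the real column names may differ.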

That said, I’ve only used it with PHP. To date I’ve not used databases with Corona.

If you split it into files, I’d recommend one file for each starting letter, so that you only need to load and iterate over the words that begin with the same letter as the word you’re looking for.

For even more performance you could split into letter count based files too. E.g. if the user types “Corona” and you need to check that this is in the dictionary, you could load in the “c_6.txt” file, which would only contain the 6 letter words beginning with c, and iterate those to check for “Corona”. This is effectively just how database indexes work and should perform very well.
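A minimal sketch of that bucketing idea, in Python for illustration only. It keeps the buckets in memory instead of writing files, and each key (for example "c_6") corresponds to the file name without the ".txt" extension. The function names are hypothetical.

```python
from collections import defaultdict


def bucket_words(words):
    """Group words under keys like 'c_6' (first letter, then word length)."""
    buckets = defaultdict(set)
    for word in words:
        w = word.strip().lower()
        if w:
            buckets[f"{w[0]}_{len(w)}"].add(w)
    return buckets


def is_word(buckets, candidate):
    """Look only in the one bucket the candidate could possibly be in."""
    c = candidate.strip().lower()
    if not c:
        return False
    return c in buckets.get(f"{c[0]}_{len(c)}", set())


if __name__ == "__main__":
    buckets = bucket_words(["corona", "Cobalt", "cat", "zebra"])
    print(sorted(buckets["c_6"]))
    print(is_word(buckets, "Corona"))
```

Lowercasing everything up front means a typed "Corona" still lands in the "c_6" bucket.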

I had thought of splitting the file into one per letter, but not of the rest.

Thanks for the tips!

I would recommend using a local word list. I created a prototype word game some years back with ~172k English words in a simple .txt file. From what I remember, there was no noticeable delay or lag of any kind when going through the dictionary.
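As a rough sketch of the plain .txt approach (in Python for illustration, not the code from my prototype), assuming one word per line: load the file once into a set and every lookup after that is a single membership test. The file name "words.txt" and the function names are made up for the example.

```python
import os
import tempfile


def load_words(path):
    """Read one word per line into a set, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def demo():
    """Write a tiny sample list to a temporary file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "words.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("apple\nCorona\n\nzebra\n")
        return load_words(path)


if __name__ == "__main__":
    words = demo()
    print("corona" in words)
```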

Also, as you already pointed out, figuring out foreign languages can be a real hassle. Finding dictionaries is easy enough, as is removing profanity, but then you’d have to actually understand the typical word structure of each language in order to know which letters to provide the player with and in what quantities, how to award points in each language, how to set up score requirements, etc. It just gets very difficult without a proficient speaker on the development team, and you’d need extensive testing.
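For a very rough first pass at the letter quantities, you could count how often each letter appears in the language’s word list and scale that to your tile count. The sketch below is in Python for illustration; the function name and the default of 100 tiles are assumptions, and a raw frequency count is only a starting point that still needs a native speaker and playtesting.

```python
from collections import Counter


def tile_counts(words, total_tiles=100):
    """Estimate tiles per letter, proportional to letter frequency in the list."""
    counts = Counter(ch for w in words for ch in w.lower() if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return {}
    # Every letter that occurs gets at least one tile; because of rounding,
    # the counts may not add up to exactly total_tiles.
    return {ch: max(1, round(total_tiles * n / total)) for ch, n in counts.items()}


if __name__ == "__main__":
    print(tile_counts(["apple", "zebra", "corona"], total_tiles=20))
```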

I’d recommend just focusing on developing a word game in your own language at first. If the game finds traction, then look into translations.
