Coronium GS - The Coronium Game Server

Hi,

I’ve taken a look at the AppWarp client code, and it gave me the hint I needed to solve the issue (I hope it’s not premature celebration, but I really hope this is it, as I’ve spent nearly 3 days on this).

The method client:receive('*l') returns a third value called partial, which the client code did not handle. When the connection is slow, the message takes time to arrive, and a call to receive can time out before the whole line is in; in that case it returns only a partial result. Because that partial result arrives together with a “timeout” error, I always thought it was a network failure, while in fact the connection was OK, simply very slow…

In order to handle that properly I’ve modified the beginning of the tick() internal method on CoroniumGSClient as follows:

    local function tick()
      local input, output = socketlib.select( { self.socket }, { self.socket }, TIMEOUT )
      for _, client in ipairs( input ) do
        local data, err, partial = client:receive('*l')
        if partial and #partial > 0 then
          -- Timed out mid-line: keep the fragment until the rest arrives
          if not self.partialData then
            self.partialData = partial
          else
            self.partialData = self.partialData .. partial
          end
        elseif data and not err then
          --== Incoming data
          if self.partialData then
            -- Prepend the fragments collected on earlier ticks
            data = self.partialData .. data
            self.partialData = nil
          end
          local data = json.decode( data )
          if type( data ) ~= "table" then
            data = {}
          end
          -- (the rest of tick() is unchanged)
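The buffering idea can also be tried without a network. The following is only a sketch under assumptions: `newLineAssembler` and `fakeClient` are hypothetical names, not part of Coronium GS, and the fake client merely imitates the way LuaSocket's `receive` returns `nil, "timeout", partial` when a line is still incomplete.

```lua
-- Hypothetical helper: collects fragments until a complete line is available.
local function newLineAssembler()
  local pending = nil
  return function(data, err, partial)
    if data and not err then
      -- Complete line: prepend anything buffered from earlier timeouts.
      local line = (pending or "") .. data
      pending = nil
      return line
    end
    if partial and #partial > 0 then
      pending = (pending or "") .. partial
    end
    return nil
  end
end

-- Fake client (not a real socket): replays canned receive() results.
local function fakeClient(results)
  local i = 0
  return {
    receive = function(self, pattern)
      i = i + 1
      local r = results[i]
      return r[1], r[2], r[3]
    end
  }
end

local client = fakeClient({
  { nil, "timeout", '{"cmd":' },   -- first fragment, timed out
  { nil, "timeout", '"play",' },   -- second fragment, timed out
  { '"turn":3}', nil, nil },       -- end of the line finally arrives
})

local feed = newLineAssembler()
local lines = {}
for _ = 1, 3 do
  local line = feed(client:receive('*l'))
  if line then lines[#lines + 1] = line end
end
print(lines[1])
```

Only the third call yields a line, and it contains all three fragments in order, so the JSON decoder never sees a truncated message.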

A question on server robustness: what can I do to recover when the server runs into an exception in production? Can it be restarted automatically?

I’m using both Coronium.GS and Coronium.IO.

Thanks.

Still looking for a way to recover the instance. Lately the service seems to be exiting for no good reason, perhaps from idling too long? Where can I check the relevant log for that? How can I automatically restart the service?
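One possible approach to the restart question is a small supervisor loop that relaunches the server with an increasing delay. This is a sketch, not something Coronium GS ships: `supervise`, the launch command `luvit server.lua`, and the log file name are all assumptions, and in practice an operating system process manager would usually do this job.

```lua
-- Hypothetical supervisor: `run` starts the server and returns true on a clean
-- exit or false on a crash; `sleep` waits the given number of seconds.
local function supervise(run, sleep, maxRestarts)
  local restarts, delay = 0, 1
  while true do
    if run() then return restarts, "clean" end
    if restarts >= maxRestarts then return restarts, "gave up" end
    restarts = restarts + 1
    sleep(delay)
    delay = math.min(delay * 2, 60)  -- back off, capped at one minute
  end
end

-- Possible real wiring (assumed command and log path, adjust as needed):
-- local function run()
--   local ok = os.execute("luvit server.lua >> gs.log 2>&1")
--   return ok == true or ok == 0  -- Lua 5.2+ returns true, Lua 5.1 returns 0
-- end
-- local function sleep(s) os.execute("sleep " .. s) end
-- supervise(run, sleep, 10)

-- Dry run with stubs: the "server" crashes twice, then exits cleanly.
local outcomes = { false, false, true }
local calls, waits = 0, {}
local restarts, reason = supervise(
  function() calls = calls + 1; return outcomes[calls] end,
  function(s) waits[#waits + 1] = s end,
  5)
print(restarts, reason)
```

Capping the number of restarts keeps a server that crashes immediately on startup from looping forever.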

Hi,

Yes, GS was specifically retired because I was no longer confident in the engine it runs on (luvit.io). It’s very finicky, and I found that trying to trap random errors was futile. Things like leaving connections open in the terminal overnight will most likely end in an error condition at some point, taking down the whole thing. Not cool.

I do know that there have been improvements to the luvit/luvi engine, but I have not been compelled to rewrite that project at this time.

The actual server and client source is available to hack on or study here: https://github.com/develephant/coronium-game-server

You can view archived documentation here: http://coronium.io/gs/

Cheers.

I think most of the dependency is on the Luvit net class, but you’ve already implemented the equivalent of it on the client side. Do you think it would be easy for me to switch the server to the alternative and implement it on top of the coronium.io package? Would it improve robustness, in your opinion?

Hi,

There is actually more going on on the server side than may be obvious (event handling especially). Luvit is an event-driven loop similar to Node.js (using libuv), a very different architecture from Coronium Cloud, which follows a more standard web server model (though it does run on an event loop as well, via Nginx).

Coronium GS is a real time socket connection, Coronium Cloud is not. This fact alone will cause you nothing but headaches.

If you would like to see a more robust socket server solution, please leave a note in the feedback section here: http://feedback.coronalabs.com/

Cheers.

Hi, has anyone been able to go live in production with Coronium GS?

It seems to run OK on a good Wi-Fi connection, but I’ve been struggling with cellular connections. Lots of mobile users have unstable connections, and Coronium GS has very little functionality to handle this properly.

I’ve tried simply increasing the timeout values on the client, but this does not solve the issue, and despite the transport being TCP, critical messages to the client are getting lost.

If someone has managed to ship a production game with this engine, I’d appreciate any help on how to properly set up the system to get robust communication.

@rune7

When the cellular connection is not stable and you say you are losing critical messages to the client, what do you mean? Was a message dropped? Was the client dropped? Or something else?

Could you explain more about the situation, and what kind of handling you expect Coronium GS to do?

Hi,  

I mean that messages sent to the client are not received. The client still has a connection to the server, but the message is lost. Instead, we sometimes get a timeout error from the socket.select method. This is despite setting the timeout to 10 seconds (vs. the default of 0).

@rune7

I don’t have a project live with Coronium GS yet, but I intend to use it for my next project, so I am interested in finding out whether the server has any problems.

Please bear in mind that I am not an expert on it, but I do have prior networking experience (I was an engineer on a VoIP server).

So in your case… I still don’t clearly get the whole picture of the problem you are seeing

  1. The client still has the connection to the server… how do you know?

  2. If the connection is still there, why is the message to the client lost?

  3. You get a timeout error sometimes. Does it mean the connection is lost or what does it mean to you?

Hi joe528, 

I’m sure you have far more knowledge than me about networking. Regarding your questions:

  1. I do not see any disconnect error on the server or the client. It’s mostly timeout messages. Also, some of the messages we lose do not bring the game to a complete halt (there are only a few critical points where the message asks the user for input, and without it the game halts), so in that case we see a timeout and then the next message arrives.

  2. I do not know. It does not make sense, as the TCP protocol should guarantee we get it. I know for sure it’s being sent by the server. I don’t really know what happens on the device until it reaches the app client code. I increased the timeout significantly to avoid losing packets, but it only partially helps. However, the messages that get lost are often very short, so it’s probably not that.

  3. Again, I’m not an expert on sockets, so I only know that I get a timeout error on socket.select. Why? I can’t tell from the message itself; it’s just a plain “timeout”. I can say for sure that I do not get this at all on a local LAN, and so far I have not seen it on good Wi-Fi connections either.

@rune7

I tried Coronium GS for a while, but that was months ago, so I may not be totally correct. Here are some ideas for consideration:

Coronium GS is a very basic server. It only handles very basic things, which is good because it provides the flexibility for all kinds of games/apps to use it.

Hence, it totally depends on your protocol and handling.

I am not sure what you are trying to achieve and what kind of protocols/flows you are trying to implement. My guess is that Coronium GS cannot do much to solve your problem because it is just a basic framework.

For unstable connections, the error handling has to be done at the application level. You might need to implement state machines for each connection handled by the server and for each client. You cannot assume a message sent will be delivered to a client, even though TCP guarantees delivery while connected, because the client might be disconnected from the server at the network level and a timeout has happened. Later the client might reconnect to the server after the network resumes (your application should know this and send the message again).
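A minimal sketch of the "send the message again after a reconnect" idea, assuming the server stamps every outgoing message with a sequence number and the client reports the last number it processed when it reconnects. `newSession`, `stamp`, and `replayAfter` are hypothetical names for illustration and are not part of Coronium GS.

```lua
-- Hypothetical per-client session kept by the server.
local function newSession(maxHistory)
  local self = { nextSeq = 1, history = {}, maxHistory = maxHistory or 50 }

  -- Record an outgoing message and return it stamped with a sequence number.
  function self.stamp(payload)
    local msg = { seq = self.nextSeq, payload = payload }
    self.nextSeq = self.nextSeq + 1
    table.insert(self.history, msg)
    -- Keep only the most recent messages so memory stays bounded.
    if #self.history > self.maxHistory then table.remove(self.history, 1) end
    return msg
  end

  -- After a reconnect the client reports the last seq it processed;
  -- everything newer is returned so the server can send it again.
  function self.replayAfter(lastSeenSeq)
    local missed = {}
    for _, msg in ipairs(self.history) do
      if msg.seq > lastSeenSeq then missed[#missed + 1] = msg end
    end
    return missed
  end

  return self
end

local session = newSession()
session.stamp("deal cards")
session.stamp("your turn")
session.stamp("turn timer started")

-- The client reconnects and says it only saw message 1.
local missed = session.replayAfter(1)
for _, msg in ipairs(missed) do print(msg.seq, msg.payload) end
```

The client side would also need to ignore any sequence number it has already processed, so that a replay never applies the same message twice.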

PS. I am not sure whether Coronium GS applies some additional timeout on top of the socket timeout. You would have to look into the server code (there is not a lot of code to dig through, because the whole framework is very basic). As far as I can remember, there was some timeout in the Coronium server that I wasn’t expecting. Maybe it closes a connection if no message is sent for a certain time. It seems there is an option to turn on “ping” in the server configuration.

Hi joe528,

Yes, I’m quite familiar with the server code by now, as I modified it significantly for the game’s purposes. However, I did not touch the basic socket code, as I thought there was not much to add there. If I understand correctly, I cannot assume delivery even on TCP? So, should I switch to UDP and implement checks and retries myself? I read a bit about that and understood it’s much harder to reach a stable system, especially compared with the effort that has gone into TCP.

The only thing I can think of to counter these issues is to implement another acknowledgement layer monitored by the server. Basically: send a message, expect a reply within ~1 sec; if none arrives, send again, then again after a longer wait, and so forth. This is essentially what TCP already does.

Or is there something better to try?
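The acknowledgement layer described above could be sketched roughly as follows. Everything here is an assumption for illustration: `newOutbox` is a hypothetical helper, `send` stands in for whatever function writes to the client, and time is passed in explicitly so the logic can be exercised without a real clock.

```lua
-- Hypothetical outbox: resends unacknowledged messages with a doubling delay.
local function newOutbox(send, firstDelay, maxAttempts)
  local pending = {}   -- id -> { payload, attempts, delay, dueAt }
  local outbox = {}

  function outbox.post(id, payload, now)
    send(id, payload)
    pending[id] = { payload = payload, attempts = 1,
                    delay = firstDelay, dueAt = now + firstDelay }
  end

  -- Call when the client acknowledges message `id`.
  function outbox.ack(id) pending[id] = nil end

  -- Call periodically; resends overdue messages and returns the ids
  -- that have used up all their attempts.
  function outbox.update(now)
    local failed = {}
    for id, entry in pairs(pending) do
      if now >= entry.dueAt then
        if entry.attempts >= maxAttempts then
          failed[#failed + 1] = id
        else
          send(id, entry.payload)
          entry.attempts = entry.attempts + 1
          entry.delay = entry.delay * 2
          entry.dueAt = now + entry.delay
        end
      end
    end
    for _, id in ipairs(failed) do pending[id] = nil end
    return failed
  end

  return outbox
end

-- Dry run: message 1 is acknowledged after one resend, message 2 never is.
local sent = {}
local outbox = newOutbox(function(id) sent[#sent + 1] = id end, 1, 3)
outbox.post(1, "your turn", 0)
outbox.update(1)            -- no ack yet: resend
outbox.ack(1)
outbox.post(2, "pick a card", 10)
outbox.update(11)           -- resend, next due at 13
outbox.update(13)           -- resend, next due at 17
local failed = outbox.update(17)
print(#sent, failed[1])
```

Because a resend can arrive after the original, the receiver should treat a message id it has already handled as a duplicate and simply acknowledge it again.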

@rune7

At the socket level, TCP delivery is guaranteed as long as the connection stays up. You don’t need to switch to UDP to implement checks and retries.

What you need to do is implement protocols and controls at the application level. The approach varies a lot, and there are many different kinds of implementations. Sometimes it gets quite complicated, so maybe I can’t answer your question after all.

Hi,

@joe528 - First off, thanks for jumping in, it’s very appreciated.

@rune7 - I’m not sure what type of game you’re making, but some ideas:

The main sticking point that has come up with the GS server is being aware of how fast you’re pushing packets. You can’t use GS for a “true” real-time streaming game. It was built with real-time “turn-based” use cases in mind. You can implement your own “throttle”, and that may help. The problem arises when the mobile clients can’t consume the packets fast enough, so messages start backing up on the server, which exacerbates the problem.
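One way such a throttle might look, as a sketch only: `newThrottle` is a hypothetical helper, `send` stands in for the real function that pushes a packet to a client, and the current time is passed in by the caller (for example from the server's timer).

```lua
-- Hypothetical throttle: queues messages and releases at most one per interval.
local function newThrottle(send, minInterval)
  local queue, lastSent = {}, nil
  local throttle = {}

  function throttle.push(msg) queue[#queue + 1] = msg end

  -- Call on every tick; returns true when a message was actually sent.
  function throttle.update(now)
    if #queue == 0 then return false end
    if lastSent and now - lastSent < minInterval then return false end
    send(table.remove(queue, 1))   -- oldest message first
    lastSent = now
    return true
  end

  function throttle.backlog() return #queue end

  return throttle
end

-- Dry run: three messages pushed at once, released one second apart.
local delivered = {}
local throttle = newThrottle(function(m) delivered[#delivered + 1] = m end, 1)
throttle.push("a"); throttle.push("b"); throttle.push("c")
throttle.update(0)     -- sends "a"
throttle.update(0.5)   -- too soon, nothing sent
throttle.update(1)     -- sends "b"
throttle.update(1.2)   -- too soon, nothing sent
throttle.update(2)     -- sends "c"
print(table.concat(delivered, ","), throttle.backlog())
```

Watching `backlog()` grow is also a cheap signal that a client is not keeping up.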

 

As far as editing the server code goes, though it is simple, there is some cross-pollination between some of the components. As I’m sure you’re also aware, GS runs on Luvit, not straight Lua, so you may find some resources here as well. The maintainer is fairly quick at responding.

All that being said, there are 3 games in the app stores right now that use GS that I know of. I’m not sure what kind of volume they are dealing with, but they were running last I heard.

And don’t forget to check some of the other options available to you:

http://appwarp.shephertz.com/

https://www.photonengine.com/en/Realtime

https://github.com/Overtorment/NoobHub

Cheers.

Hi joe528, develephant,

Thanks for the input.

develephant, if you can point me to the developers, I’ll try to reach out to them.

Regarding your points:

Yes, the game is turn-based (although some actions are made in parallel). There is no time pressure on the server, as each turn takes ~1-2 minutes and involves around 8-10 server messages at most. Messages are also spaced at least 1 second apart.

The problem seems to be related to the size of the message, which at times depends on the number of players in the game (the game is for 3-8 players). I think that when the connection is not good, either part of the message does not get through, or the message is lost somewhere on the way to the app code, so the server is not aware that it was lost.

Moving to another backend at this stage will push back the project timetable significantly. I’d like to explore other options before considering another solution.

@rune7

Here are my two cents. Although mobile networks are unstable, I think Coronium GS can still handle real-time streaming for games. It’s really up to the server and client to handle the instability, unless the instability rules out such a game on a mobile network altogether. For example, you would never choose a mobile network to play StarCraft. If you try, you can still play, but all players will feel the lag, and eventually you will be disconnected because your network just can’t keep up with the game.

Therefore, if your game is not that real-time, all other clients should know when a client is disconnected. The other clients should wait for that player to reconnect, showing a dialog saying they are waiting for someone who is not responding, and if the disconnection lasts too long, you have to drop the player anyway.

To handle this seamlessly, the logic has to be implemented in the client code (sometimes the server gets involved too).
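The wait-then-drop behaviour described above could be tracked with something like the sketch below. The names (`newPresence`, `heard`, `status`) and the thresholds are hypothetical; the idea is simply to record when each player was last heard from and classify them by how long they have been silent.

```lua
-- Hypothetical presence tracker: classifies players by how long they have
-- been silent. Times are in seconds and supplied by the caller.
local function newPresence(graceSeconds, dropSeconds)
  local lastSeen = {}
  local presence = {}

  -- Call whenever any message (or ping) arrives from the player.
  function presence.heard(playerId, now) lastSeen[playerId] = now end

  function presence.status(playerId, now)
    local seen = lastSeen[playerId]
    if not seen then return "unknown" end
    local silent = now - seen
    if silent >= dropSeconds then return "dropped" end   -- remove from game
    if silent >= graceSeconds then return "waiting" end  -- show the dialog
    return "online"
  end

  return presence
end

-- Dry run: show "waiting" after 10 s of silence, drop after 60 s.
local presence = newPresence(10, 60)
presence.heard("p1", 0)
print(presence.status("p1", 5), presence.status("p1", 30), presence.status("p1", 90))
```

The "waiting" state is the window in which the other clients would show the dialog; a reconnect that calls `heard` again brings the player back to "online".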

Hello everybody,

I am going to give Coronium GS a try.

The game will be a 2-player game where players have to find words on a 4x4 letter board.

The challenge is that it will be a real-time game where both players play on the same board. When a player finds a word, the opponent will not score any points for playing that word.

The game will be for iOS and Android (French and English).

I am looking for beta-testers. If you are interested, let me know.