How do I determine if network.download fails?

Hi,

If I call network.download to “GET” a file, I can’t seem to determine if it works or fails … at least in one important case.

Here are two examples, showing (much of) the event structure I get back.  In the first case, I asked for foo.fum.com/mobile_data.csv (which exists); in the second case, I asked for foo.badname.com/mobile_data.csv (which does *NOT* exist; the domain itself does not exist).

The fields in ‘event’ (with some fields with matching data deleted for space):

Good:

    __unnamed__ = {
       ["responseHeaders"] = {
          ["Connection"] = "Keep-Alive";
          ["Content-Type"] = "text/csv";
          ["Etag"] = ""4f8ed-577bf29922da6"";
          ["Last-Modified"] = "Mon, 08 Oct 2018 22:30:03 GMT";
          ["Accept-Ranges"] = "bytes";
          ["Keep-Alive"] = "timeout=120, max=100";
          ["Content-Length"] = "325869";
          ["Server"] = "Apache/2.4.34 (Amazon)";
       };
       ["bytesEstimated"] = 325869;
       ["name"] = "networkRequest";
       ["bytesTransferred"] = 325869;
       ["status"] = 200;
       ["url"] = "http://foo.fum.com/mobile_data.csv";
       ["isError"] = false;
       ["requestId"] = "userdata: 0x600000e5d338";
    };

Bad:

    __unnamed__ = {
       ["responseHeaders"] = {
          ["Connection"] = "close";
          ["Content-Type"] = "text/html";
          ["Expires"] = "Tue, 09 Oct 2018 00:13:45 GMT";
          ["Cache-Control"] = "no-cache";
          ["Vary"] = "Accept-Encoding";
          ["Transfer-Encoding"] = "Identity";
          ["Content-Encoding"] = "gzip";
          ["Server"] = "nginx";
       };
       ["bytesEstimated"] = 322;
       ["name"] = "networkRequest";
       ["bytesTransferred"] = 322;
       ["status"] = 200;
       ["url"] = "http://foo.badname.com/mobile_data.csv";
       ["isError"] = false;
       ["requestId"] = "userdata: 0x618000e42578";
    };

A file *is* downloaded in both cases.  The bad file has:

   <html><head><meta http-equiv="refresh"
   content="0;url=http://dnserrorassist.att.net/search/?q=http://foo.badname.com/mobile_data.csv&t=0"/>
   </head><body><script>window.location="http://dnserrorassist.att.net/search/?q="+
   escape(window.location)+"&r="+escape(document.referrer)+"&t=0";</script></body></html>

 

The good file has several thousand lines of .csv data.

 

Obviously, the HTTP status can’t be used to differentiate: both requests got back 200.

 

I had expected to get status 404 for the bad case :slight_smile:

 

(Yes, I could check the resulting file for size, or to see if it has “dnserrorassist” in it, but those are both kludges.)

 

thanks,

Stan

Do you control the servers? The second server does not seem to provide “Content-Length” as a header value.

@agramonte no, the URL is relatively free form, and not controlled, unfortunately.

I don’t agree with the kludge idea.  If the server is not properly reporting an error status, you must fall back on the info you have.

So, if you’re expecting a large payload, you might be able to use the bytesTransferred field as a hint that something went wrong.

It isn’t great, but as one additional piece of information to check, it might do the trick.
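
A minimal sketch of that kind of check, assuming the payload is expected to be CSV and that any real file is larger than some app-specific threshold. isSuspiciousDownload and minBytes are hypothetical names, not part of the Corona API. The event fields used (bytesTransferred, responseHeaders) are the ones visible in the dump in the original post.

```lua
-- Hypothetical helper: flags a "successful" download whose metadata
-- does not look like the CSV payload we expected.
-- `event` is the table passed to the network.download listener.
local function isSuspiciousDownload(event, minBytes)
    -- Too few bytes for a real data file (minBytes is an app-specific guess).
    if (event.bytesTransferred or 0) < minBytes then
        return true, "payload too small"
    end

    -- Header names can differ in case between servers, so compare lowercased.
    for name, value in pairs(event.responseHeaders or {}) do
        if name:lower() == "content-type"
            and value:lower():find("text/html", 1, true) then
            return true, "got HTML instead of data"
        end
    end

    return false
end
```

Inside the listener, a call such as isSuspiciousDownload(event, 10 * 1024) would flag the 322-byte “Bad” response from the original post, while the 325869-byte “Good” response would pass.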

What really puzzles me here is how you’re getting a bad file back. :(    That seems like the server is screwing up and doesn’t know it.

What is happening is that your internet service provider (I’m guessing AT&T) is seeing that the domain name you are requesting does not exist. Their DNS server is forwarding your request to a search engine to help you find valid results. See the screenshot of what happens when I go to the page in my browser. 

That request is being redirected to a valid URL. A web page downloads successfully and you get a 200 result code which is perfectly valid.

My question to you is this: you control your code, so why are you attempting to hit a web server that doesn’t exist? If you’re trying to test handling a URL that doesn’t exist, you should use a valid domain name and let the web server return a 404 code to you when it can’t find the page. But since you’re trying to hit a domain name that doesn’t exist, your ISP is sending you to a valid page.

What is the use case for allowing your users to go to domain names that don’t exist?

Rob

@Rob asks: What is the use case for allowing your users to go to domain names that don’t exist?

Well, I haven’t found a Corona API like network.is_domain_valid, so I have no way other than saying “try to fetch this file”.

Obviously, network.is_domain_valid is not a good function name … it’s too open to interpretation :slight_smile:

E.g.: what does “valid” mean: valid syntax?  a domain that resolves to an IP address?  A domain with a valid MX record?

Perhaps: network.is_domain_reachable (domain_name, protocol) where protocol is one of “http”, “https”, “telnet”, “ftp”? :slight_smile:

SS

@roaminggamer I’ll probably add a file size check, since for my application, it’s likely that any valid file will have at least 100 lines of text.
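
A rough sketch of such a content check, assuming the downloaded file is plain text and that real data never starts with a markup tag. looksLikeData and minLines are hypothetical names chosen for illustration.

```lua
-- Hypothetical helper: reads a downloaded file and decides whether it
-- plausibly holds the expected data. `path` is an absolute file path.
local function looksLikeData(path, minLines)
    local file = io.open(path, "r")
    if not file then
        return false, "file missing"
    end
    local text = file:read("*a")
    file:close()

    -- An ISP "search assist" page starts with an HTML tag, not CSV data.
    if text:match("^%s*<") then
        return false, "content is markup"
    end

    -- Count newline characters; a final line without "\n" counts too.
    local _, lines = text:gsub("\n", "\n")
    if #text > 0 and text:sub(-1) ~= "\n" then
        lines = lines + 1
    end
    if lines < minLines then
        return false, "too few lines"
    end
    return true
end
```

In Corona, the path would typically come from system.pathForFile() with the same filename and base directory that were passed to network.download.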

Getting the wrong file was puzzling me too; that’s why I added code to dump the entire ‘event’ result, and then looked at the resulting small file (data shown in OP).

I don’t know (yet) what other ISPs would provide as the result of a bad domain name (or bad filename on a valid server).

Since those are likely to be major cellular players (AT&T, Sprint, etc.) as opposed to my home’s ISP, I may well see something different in the error cases.

thanks.

Stan

I guess I wasn’t clear in my question.

Why are you letting users enter domain names? It seems to me that if you’re wanting to download a specific CSV file, it would have to be on a server you know about and that wouldn’t be a user choice. If it’s a domain name that you know about, it should resolve. Once you have a domain name that resolves, then you can use network.* to make HTTP and HTTPS requests. If for some reason there isn’t a web server there or it fails, .isError will be true. If there is a valid web server to respond to the request, you will get a success (.isError will be false) and then you can use the HTTP status code to determine if your URL endpoint was valid. That’s when you would get a 404.
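
The two-stage check described above could be sketched like this. classifyResponse and onDownload are hypothetical names; the network.download call uses the URL and filename from the original post purely as an illustration.

```lua
-- Hypothetical helper: maps a networkRequest event to a coarse outcome.
local function classifyResponse(event)
    if event.isError then
        -- No HTTP conversation took place (no connection, timeout, ...).
        return "network_error"
    end
    if event.status < 200 or event.status >= 300 then
        -- A web server answered, but not with success (e.g. 404).
        return "http_error"
    end
    return "ok"
end

-- Listener wiring for network.download. The guard lets the file load in
-- plain Lua too, where the Corona `network` and `system` globals are absent.
if network and system then
    local function onDownload(event)
        print(event.url, classifyResponse(event))
    end
    network.download("http://foo.fum.com/mobile_data.csv", "GET", onDownload,
        "mobile_data.csv", system.TemporaryDirectory)
end
```

Note that the rewritten-DNS case from the original post still comes out as "ok" here, because the ISP’s page is served with status 200. That is why a check on the payload itself remains useful.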

Currently, you’re failing before you even talk to a web server.  system.canOpenURL() might give you more information to determine if the URL is valid or not, but I don’t know if it will do a DNS lookup on the domain name or not. If you really need your end user to enter a valid domain name, you need to do a DNS lookup to make sure it’s valid. You, of course, could do this at the native level, or you can use network.request() to talk to a REST-based API.  See: https://www.google.com/search?q=rest+based+dns+lookup&oq=rest+based+dns+lookup&aqs=chrome…69i57.6471j0j7&sourceid=chrome&ie=UTF-8
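
As one possible shape for the REST approach, here is a sketch that assumes Google Public DNS’s JSON endpoint (https://dns.google/resolve), which is not named anywhere in this thread and is only one of several such services. It assumes the reply is a JSON object with a numeric Status field (0 for NOERROR, 3 for NXDOMAIN) and an Answer array when records exist. domainResolves and onLookup are hypothetical names.

```lua
-- Hypothetical helper: interprets a decoded DNS-over-HTTPS JSON reply.
-- Assumes the reply shape of Google Public DNS (Status 0 = NOERROR,
-- 3 = NXDOMAIN; resolved records are listed under Answer).
local function domainResolves(reply)
    if type(reply) ~= "table" or reply.Status ~= 0 then
        return false
    end
    return type(reply.Answer) == "table" and #reply.Answer > 0
end

-- Corona wiring; skipped in plain Lua where `network` does not exist.
if network then
    local json = require("json")
    local function onLookup(event)
        if event.isError then
            print("lookup failed: no network")
            return
        end
        -- json.decode returns nil for malformed input, which
        -- domainResolves treats as "does not resolve".
        local reply = json.decode(event.response)
        print("resolves:", domainResolves(reply))
    end
    network.request("https://dns.google/resolve?name=foo.badname.com&type=A",
        "GET", onLookup)
end
```

Because the lookup is answered by the remote service rather than the ISP’s resolver, it should not be subject to the same search-page rewriting, though that is worth verifying on the networks you care about.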

Rob
