I'm trying to download an image from a url. The process I wrote works for everyone except for ONE content provider that we're dealing with.
When I access their JPGs via Firefox, everything looks kosher (happy Passover, btw). However, when I use my process I either:
A) get a 404 or
B) in the debugger, when I set a breakpoint at the URL line (URL url = new URL(str);) and then step past the connection, I DO get a file, but it's not a .jpg. It's some HTML that they're producing, with generic links and stuff. I don't see a redirect code, though! It comes back as 200.
Here's my code...
URL url = new URL(urlString);
URLConnection uc = url.openConnection();
String val = uc.getHeaderField(0); // the status line, e.g. "HTTP/1.1 200 OK"
String contType = uc.getContentType();
System.out.println("FOUND OBJECT OF TYPE:" + contType);
InputStream is = null;
if(!val.contains("200")){
    //problem
}
else{
    is = uc.getInputStream();
}
Has anyone seen anything of this nature? I'm thinking maybe it's some mime type issue, but that's just a total guess... I'm completely stumped.
-
Have you tried using Wireshark to see exactly what packets are going back and forth? This is often the fastest way to see what is different. That is:
- First run Wireshark while using Firefox to get the image, and then
- Run Wireshark while your code fetches the same image.
Then compare and contrast the packets in both directions and I almost guarantee that you'll see something different in the HTTP headers or some other part of the traffic that will explain the problem.
-
Maybe the site is just using some kind of protection to prevent others from hotlinking their images or to disallow mass downloads.
They usually check either the HTTP Referer header (it must point to their own domain) or the User-Agent header (it must look like a browser, not a download manager). Set both and try again.
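A minimal sketch of setting both headers with HttpURLConnection. The User-Agent string, the example.com URLs, and the class and method names are placeholders I made up for illustration; the provider may expect different values.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class BrowserLikeRequest {
    // Hypothetical browser-style User-Agent; any current browser string could be used instead.
    static final String USER_AGENT =
            "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0";

    // Prepares (but does not yet send) a request that carries both headers.
    static HttpURLConnection open(String imageUrl, String referer) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(imageUrl).toURL().openConnection();
        conn.setRequestProperty("User-Agent", USER_AGENT);
        // The header name is spelled "Referer" in HTTP.
        conn.setRequestProperty("Referer", referer);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URLs; nothing is sent until the response is actually read.
        HttpURLConnection conn =
                open("http://example.com/images/photo.jpg", "http://example.com/");
        System.out.println("User-Agent: " + conn.getRequestProperty("User-Agent"));
        System.out.println("Referer: " + conn.getRequestProperty("Referer"));
    }
}
```

The request only goes out once you call something like getResponseCode() or getInputStream() on the returned connection.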
-
if(!val.contains("200")) // ...
First of all, I would suggest using the HttpURLConnection class, which provides the getResponseCode() method.
Searching the returned text for "200" implies
- performance issues, and
- inconsistency (binary files can contain some '200')
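As a rough sketch of that suggestion, the check could compare the numeric status code instead of matching text. The Content-Type test is my own addition, since the question mentions getting HTML back with a 200; the class and method names are made up for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

public class StatusCheck {
    // True only for a 200 response whose Content-Type claims to be an image.
    static boolean looksLikeImage(int code, String contentType) {
        return code == HttpURLConnection.HTTP_OK
                && contentType != null
                && contentType.toLowerCase().startsWith("image/");
    }

    // Returns the body stream for a good image response, or null otherwise.
    static InputStream openImage(String urlString) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(urlString).toURL().openConnection();
        int code = conn.getResponseCode();        // numeric status, e.g. 200 or 404
        String type = conn.getContentType();      // may be null if the header is missing
        System.out.println("Status " + code + ", Content-Type " + type);
        if (!looksLikeImage(code, type)) {
            conn.disconnect();
            return null;
        }
        return conn.getInputStream();
    }

    public static void main(String[] args) throws IOException {
        // Pass the image URL as the first argument; without one, nothing is fetched.
        if (args.length > 0) {
            try (InputStream in = openImage(args[0])) {
                System.out.println(in != null ? "Got an image" : "Not an image");
            }
        }
    }
}
```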
-
All good guesses, but the "right" answer reward, I think, has to go to ivan_pertrovich_ivanovich_harkovich_rostropovitch_o'neil because using HttpURLConnection I was able to see that, in fact, before getting a 404, I'm first getting a 301. So, now, it's just a matter of finding out from these people what they're expecting in the header, which would make them less inclined to redirect me.
Thanks for the suggestion.
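For anyone who wants to see the intermediate responses themselves, here is a sketch of one way to do it: turn off automatic redirect following and print each hop. This is not necessarily how I traced it; the class name, method names, and hop limit are just for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class RedirectTrace {
    // Status codes that normally carry a Location header pointing elsewhere.
    static boolean isRedirect(int code) {
        return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
    }

    // Prints the status of every hop, following Location headers by hand.
    static void trace(String start, int maxHops) throws IOException {
        URI uri = URI.create(start);
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // keep the 301 visible
            int code = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            System.out.println(code + " " + uri
                    + (location != null ? " -> " + location : ""));
            conn.disconnect();
            if (!isRedirect(code) || location == null) {
                return; // final response reached
            }
            uri = uri.resolve(location); // Location may be relative
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the image URL as the first argument; without one, nothing is fetched.
        if (args.length > 0) {
            trace(args[0], 5);
        }
    }
}
```

With the default settings HttpURLConnection follows the redirect silently, which is why only the final 404 shows up.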
ivan_ivanovich_ivanoff : Well, you guessed right, my real name is not Ivan Ivanovich Ivanoff, but you should know, there are really people who are named this way ;) (Though my first name is Ivan really)... The middle name in Russia is patronymic ( http://en.wikipedia.org/wiki/Patronymic )
Dr.Dredel : I am well familiar with the nature of middle names in Russian (my own being Grigoryavich) but I didn't know it was known as "patronymic"... how can I grade your answer even MORE highly?!