

Automating downloading/decompressing files from another site




Posted by phaedarus, 06-13-2009, 05:16 AM
Hi, we have a database that requires updating once a day. To do this, we must manually log into another service with our account username and password, click on an access subscription link and then select a zip file containing .csv lists using a form.

Posted by mwatkins, 06-13-2009, 12:16 PM
Basically, you have to act as a web client to the remote site. You may be able to do this with command line tools such as curl or wget; depending on the remote site and the authentication method it uses, you may have to do quite a bit of non-trivial coding. I would examine curl/wget first. They may not get you all the way there, but they will give you some insight. Check the man pages.

If you can't make that work fully, you'll likely get all the way there with web testing tools. These are tools that pretend to be a "browser", allowing scripts to take actions on web sites and test for reactions. Automated testing is their principal purpose, but these suites work well for automating repetitive tasks too.

http://twill.idyll.org/python-api.html

The above link shows an interactive session; this sort of ability is invaluable while trying to figure out how you'll craft a solution. I would highly recommend downloading and trying twill (and some of the other web testing tools if twill does not fully meet your needs).

http://webunit.sourceforge.net/ - look at this sample session (it is Python code, but that doesn't matter; I think you'll get an appreciation for what is happening): http://webunit.sourceforge.net/session_example.py

Don't let the fact that it is Python stop you from looking at this. Consider it just another application with a specific type of control file layout.
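As a rough illustration of the wget route for a site with a form-based login, the sketch below logs in, keeps the session cookie, and then fetches the archive. The URLs and the form field names (`username`, `password`) are hypothetical; the real ones have to be read from the remote site's login form, and the site may need extra fields or headers.

```shell
#!/bin/sh
# Sketch only: LOGIN_URL, FILE_URL and the form field names are hypothetical.
LOGIN_URL="https://example.com/login"
FILE_URL="https://example.com/subscription/lists.zip"
COOKIES="${TMPDIR:-/tmp}/session_cookies.txt"

fetch_with_wget() {
    # Step 1: post the login form and save the session cookie.
    # Values containing characters such as & or = must be URL-encoded first.
    wget --quiet --save-cookies "$COOKIES" --keep-session-cookies \
        --post-data "username=$1&password=$2" -O /dev/null "$LOGIN_URL" || return 1
    # Step 2: reuse the saved cookie to download the archive.
    wget --quiet --load-cookies "$COOKIES" -O lists.zip "$FILE_URL"
}

# Runs only when called as: sh fetch.sh run USER PASS
if [ "${1:-}" = "run" ]; then
    fetch_with_wget "$2" "$3"
fi
```

If the second request returns the login page instead of a zip file, the site probably expects something more than a cookie, which is the point where a web testing tool becomes the easier option.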

Posted by UNIXy, 06-13-2009, 01:10 PM
I did this a long time ago with Lynx (more specifically, with the flags -cmd_log and -cmd_script). It should still work, although Lynx has gone through some drastic source code changes since then.

First, run the command line browser with -cmd_log, for example to record the actions from browsing http://www.unixy.net. Perform the actions that you would normally do. Lynx will record all of them as macros inside the file auto_login_download.chat. Once done, exit from Lynx.

Then schedule a cron job to run at 6am, or whichever time is convenient, that calls a csv_download.sh script. That script replays the recording by passing the chat file to Lynx with -cmd_script.

Best
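The commands from this post did not survive in the archive. The following is only a sketch of what the record and replay steps might look like, built from the -cmd_log and -cmd_script flags and the file name mentioned above. The paths, the cron line, and the script layout are assumptions, and the installed Lynx build has to support these flags.

```shell
#!/bin/sh
# Sketch only: paths are assumptions; the chat file name comes from the post.
CHAT_FILE="$HOME/auto_login_download.chat"
START_URL="http://www.unixy.net"

# Record an interactive session: keystrokes are logged to CHAT_FILE.
record_session() {
    lynx -cmd_log="$CHAT_FILE" "$START_URL"
}

# Replay the recorded keystrokes with no user at the keyboard.
# Output is discarded so that cron does not mail the rendered pages.
replay_session() {
    lynx -cmd_script="$CHAT_FILE" "$START_URL" > /dev/null 2>&1
}

# Assumed crontab entry for a daily 6am run (script path is hypothetical):
# 0 6 * * * /home/youruser/csv_download.sh replay

case "${1:-}" in
    record) record_session ;;
    replay) replay_session ;;
esac
```

Because the replay is a blind keystroke recording, it only works while the remote pages keep the same layout and the same link is chosen every time.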

Posted by mwatkins, 06-13-2009, 04:49 PM
Ok, I learned something new today - never knew Lynx had the playback functionality.

Posted by phaedarus, 06-13-2009, 05:54 PM
Thanks for the replies. I was afraid it would get a little messy. I'll have a further look and see what I can come up with.

Posted by phaedarus, 06-13-2009, 07:35 PM
UNIXy, I neglected to mention that we're on a shared hosting package that does not offer shell access (only cPanel). Would I still be able to do what you described under such circumstances?

Posted by mwatkins, 06-13-2009, 10:37 PM
That is a rather significant detail to leave out. Personally, for software development even of the scope you are faced with, I would not remain at a host that offers no shell access. It just makes getting on with the job too hard, and there are many good hosts out there that will give you all the tools you need to do this. You will almost certainly be frustrated at every turn if all you have is cPanel access.

If you have access to a local Linux/Unix machine or another account, you can record your command script via Lynx as UNIXy has suggested, then upload the files and execute them via a cron job. However, Lynx is essentially a command line, non-GUI web browser, one which most users employ from within a login shell. If your host provides no shell access then it is less likely, though still possible, that they will have a copy of Lynx installed for you to call from a script.

I should point out that if there is any logic in your actions (i.e. you are not selecting the same file over and over), simply recording a script, for Lynx or any other solution, is not going to be satisfactory.

You should also explore wget and curl. Both are command line tools that are routinely included in scripts to download files, and both can pass username and password info to remote sites using HTTP basic, HTTP digest and in some cases form-based authentication. Your task might be very simple if the remote application supports HTTP basic, HTTP digest, or NTLM authentication. curl can post to forms and pass along specific header info and authentication; it really is a very useful tool for web app interaction. You may need to pass some header fields as well, and curl offers several convenient ways of passing form data. curl and/or wget are somewhat more likely to be available on a host than Lynx.

Last edited by mwatkins; 06-13-2009 at 10:40 PM.
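To make the curl options in this post concrete, here is a minimal sketch covering both cases: HTTP basic/digest authentication and a form-based login with a session cookie, followed by unpacking the archive. The URLs, form field names, and working directory are hypothetical and would need to match the real remote site.

```shell
#!/bin/sh
# Sketch only: URLs, form field names and paths are hypothetical.
LOGIN_URL="https://example.com/login"
FILE_URL="https://example.com/subscription/lists.zip"
WORK_DIR="${TMPDIR:-/tmp}/csv_update"

# Case 1: the site uses HTTP basic or digest authentication.
# --anyauth lets curl use whichever method the server asks for.
download_http_auth() {
    curl --fail --silent --anyauth --user "$1:$2" \
        --output "$WORK_DIR/lists.zip" "$FILE_URL"
}

# Case 2: the site uses a login form and a session cookie.
# --data-urlencode sends a POST and encodes special characters.
download_form_auth() {
    curl --fail --silent --cookie-jar "$WORK_DIR/cookies.txt" \
        --data-urlencode "username=$1" --data-urlencode "password=$2" \
        --output /dev/null "$LOGIN_URL" || return 1
    curl --fail --silent --cookie "$WORK_DIR/cookies.txt" \
        --output "$WORK_DIR/lists.zip" "$FILE_URL"
}

# Unpack the .csv lists, overwriting the previous day's copies.
unpack_lists() {
    unzip -o -q "$WORK_DIR/lists.zip" -d "$WORK_DIR/csv"
}

# Runs only when called as: sh update.sh run USER PASS
if [ "${1:-}" = "run" ]; then
    mkdir -p "$WORK_DIR" && download_form_auth "$2" "$3" && unpack_lists
fi
```

Unlike a recorded Lynx session, a script like this can hold logic, for example building the file name from the current date before downloading.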

Posted by Plutost, 06-14-2009, 01:16 AM
Why don't you run a cron job instead? Will it help in updating the script, or do you need to run it manually?

Posted by UNIXy, 06-14-2009, 01:53 AM
Have you asked your provider for jailshell access? Shared hosts will enable shell access for a one-time fee. If they still refuse to enable the shell, run the "recording" part of the command elsewhere and simply preserve the chat file (there's a Windows port of Lynx, so you can probably record the chat file on your workstation). Then upload the chat file (ex: auto_login_download.chat) into your home directory on the existing cPanel account. From cPanel, set up the cron job. -cmd_script will now point to /home/cpanel_account/auto_login_download.chat.

If your provider is so uncooperative, it might be time to re-evaluate the contract. A web host should be an enabler, especially when you intend to do no harm.

Regards


