Author Topic: Example of C source code for getting web page.  (Read 8461 times)

Offline koaftder

  • Hero Member
  • *****
  • Join Date: Apr 2004
  • Posts: 2116
    • Show all replies
    • http://koft.net
Re: Example of C source code for getting web page.
« on: January 21, 2006, 03:05:10 AM »
Quote

AmigaEd wrote:
Hello,
I'm trying to learn C and I am wondering if someone out there might have a very simple example of some C source code that will grab a web page and display it or even just save it as a file.

I've looked at a few programs on aminet, but I can't seem to make sense out of them.

Thank you,
AmigaEd


This isn't too bad, actually. http://www.google.com/search?hl=en&q=software+hut&btnG=Google+Search Learn your TCP library. All you have to do is send one simple string, something like "GET / HTTP/1.0" followed by a blank line.

then the web server simply spits back the result.

so it goes like this:

create socket
set sock addr
set sock port
open socket
send socket ( request_string )
receive result

then write the input buffer to standard output or to a file, whatever

obviously the libs are different for every OS

check out the RFC for HTTP for more detail on what you can do with it: http://www.faqs.org/rfcs/rfc2616.html

sorry I can't help you with the TCP stuff on Amiga, as I have never used the socket library on AmigaOS.
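
The steps listed above can be sketched in C roughly as follows. This is only a sketch under some assumptions: it uses the POSIX/BSD socket calls found on Linux (not an AmigaOS socket library), it accepts only a numeric IPv4 address, and `fetch_to_stream` is a made-up helper name, not something from the thread.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send `request` to ip:port and copy the raw reply
 * (headers and body) to `out`. Returns 0 on success, -1 on any error. */
int fetch_to_stream(const char *ip, unsigned short port,
                    const char *request, FILE *out)
{
    struct sockaddr_in addr;
    char buf[4096];
    size_t left = strlen(request);
    ssize_t n;
    int fd;

    /* set sock addr, set sock port */
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                      /* not a dotted-quad address */

    /* create socket */
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    /* open socket (connect to the server) */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        close(fd);
        return -1;
    }

    /* send the request; send() may accept only part of it per call */
    while (left > 0) {
        n = send(fd, request, left, 0);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        request += n;
        left -= (size_t)n;
    }

    /* receive the result until the server closes the connection */
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0)
        fwrite(buf, 1, (size_t)n, out);

    close(fd);
    return n == 0 ? 0 : -1;
}
```

Calling `fetch_to_stream("68.90.68.66", 80, "GET / HTTP/1.0\r\n\r\n", stdout)` would print the reply, and passing a `FILE *` from `fopen()` instead of `stdout` would save it to a file.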


 

Offline koaftder

Re: Example of C source code for getting web page.
« Reply #1 on: January 21, 2006, 03:53:38 AM »
I've written a simple program that does what you are talking about, for Linux. There's still a small problem with it though: the compiler is telling me that it doesn't know the size of socket_detials....

Code:

#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
/* <netinet/in.h> is not included, so struct sockaddr_in is an incomplete
   type here; that is what the "unknown size" compiler error is about. */

int main ( void ) {
        int socket_handle ;
        struct sockaddr_in socket_detials ;
        char * input_buffer;
        char * httpget = "GET HTTP 1.1 / \r\r" ;

        input_buffer = malloc(20000);

        socket_handle = socket ( AF_INET, SOCK_STREAM, 0) ;
        socket_detials.sin_family = AF_INET ;
        socket_detials.sin_addr.s_addr=inet_addr("68.90.68.66");
        socket_detials.sin_port = htons(80);

        connect (socket_handle,(struct sockaddr*)&socket_detials, sizeof ( struct sockaddr));
        send ( socket_handle , httpget, strlen(httpget), 0 ) ;
        recv ( socket_handle , input_buffer , 20000, 0 ) ;
        printf ( "%s\n", input_buffer ) ;

        return 0 ;
}


You may want to check out the simple socket library from http://mysite.verizon.net/astronaut/ssl/

It supports a lot of OSes and I seem to remember it saying it supported Amiga... Socket programming with training wheels.
 

Offline koaftder

Re: Example of C source code for getting web page.
« Reply #2 on: January 21, 2006, 04:04:28 AM »
Quote

AmigaEd wrote:
Quote
by koaftder on 2006/1/20 22:05:10
This isn't too bad, actually. http://www.google.com/search?hl=en&q=software+hut&btnG=Google+Search Learn your TCP library. All you have to do is send one simple string, something like "GET / HTTP/1.0" followed by a blank line.


Hi koaftder,
The link you posted seems to take me to a bunch of links to software hut on google.

Can you please post the link again or point me to the correct site.

Thank you,
AmigaEd





Sorry ): I must have hit paste when I typed that. The only link I meant to post in that comment was http://www.faqs.org/rfcs/rfc2616.html for the HTTP protocol specification. I was on the hunt for ZIP RAM; I figure if I can't work out which order they need to be piled up in the sockets, I'll just fill up every socket.
 

Offline koaftder

Re: Example of C source code for getting web page.
« Reply #3 on: January 21, 2006, 04:40:49 AM »
ok, finally got my code to work
Code:

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <strings.h>
#include <stdio.h>
#include <stdlib.h>

int main ( void ) {
        int socket_handle ;
        struct sockaddr_in socket_detials ;
        char * input_buffer;
        /* Note: this request line is still malformed; the expected form is "GET / HTTP/1.1" */
        char * httpget = "GET HTTP 1.1 / \x0D\x0A\n\x0D\x0A\n" ;

        input_buffer = malloc(20000);

        socket_handle = socket ( AF_INET, SOCK_STREAM, 0) ;
        socket_detials.sin_family = AF_INET ;
        socket_detials.sin_addr.s_addr=inet_addr("68.90.68.66");
        socket_detials.sin_port = htons(80);
        bzero ( &(socket_detials.sin_zero), 8 ) ;

        if ( connect (socket_handle,(struct sockaddr*)&socket_detials, sizeof ( struct sockaddr)) == -1 ){
                printf ( "Couldnt connect to server\n" ) ;
        }
        /* send() and recv() return ssize_t, so cast for the %d conversion */
        printf ( "Sending %d bytes\n",  (int) send ( socket_handle , httpget, strlen(httpget), 0 ) ) ;
        printf ( "Received %d bytes\n", (int) recv ( socket_handle , input_buffer , 20000, 0 ) ) ;
        printf ( "%s\n", input_buffer ) ;

        return 0 ;
}


and when i run it i get :

Code:

koft@macdev:~$ ./socket
Sending 21 bytes
Received 658 bytes
HTTP/1.1 400 Bad Request
Date: Sat, 21 Jan 2006 04:37:48 GMT
Server: Apache/1.3.34 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.1 FrontPage/5.0.2.2635 mod_ssl/2.8.25 OpenSSL/0.9.7a
Connection: close
Content-Type: text/html; charset=iso-8859-1

400 Bad Request
Bad Request
Your browser sent a request that this server could not understand.
The request line contained invalid characters following the protocol string.
Apache/1.3.34 Server at cpanel1.betterbox.net Port 80

koft@macdev:~$


Tip: when things aren't working, use Ethereal to view your traffic. I spent some time watching the program hang because I wasn't sending the right stuff after the GET. (I tried 0d0a0d0a but that didn't do it....) I wasn't even sure it actually sent the packet to the server, or to the right address or port, until I fired up Ethereal and saw for sure what was really going on.
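
Short of a packet capture, a quick local check is to dump the request bytes as hex before handing them to send(). The helper below is a minimal sketch; `hex_dump` is a made-up name for illustration and is not part of any socket library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the bytes of `data` into `out` as two-digit
 * hex values separated by spaces. Returns the number of characters
 * written, or -1 if `out` is too small. */
int hex_dump(const char *data, size_t len, char *out, size_t outsize)
{
    size_t i, pos = 0;

    if (outsize == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < len; i++) {
        /* each byte needs at most 3 characters plus the terminator */
        if (pos + 4 > outsize)
            return -1;
        pos += (size_t)sprintf(out + pos, i ? " %02x" : "%02x",
                               (unsigned char)data[i]);
    }
    return (int)pos;
}
```

Dumping the terminator used in the program above, "\x0D\x0A\n\x0D\x0A\n", gives 0d 0a 0a 0d 0a 0a, which shows the extra line feeds at a glance; a plain "\r\n\r\n" gives 0d 0a 0d 0a.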
 

Offline koaftder

Re: Example of C source code for getting web page.
« Reply #4 on: January 21, 2006, 05:17:21 AM »
Quote

patrik wrote:
@koaftder:

The reason why you are getting a "400 Bad Request" response is because you are specifying that your client is an HTTP/1.1 client, which requires you to supply the "Host: something.com" header-line, which is optional in HTTP/1.0, but required for virtual hosts to work, so it is definitely recommended to supply it anyhow.

With a simple client, there is no advantage in telling the server that your client supports HTTP/1.1 instead of HTTP/1.0, rather disadvantages as then the server is allowed to send you dynamic pages as chunks using the so called "chunked transfer-coding".


/Patrik


ok, so I will just capture what Firefox sends out when I go to amiga.org. Here is the fix  :-)

Code:

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <strings.h>
#include <stdio.h>
#include <stdlib.h>

int main ( void ) {
        int socket_handle ;
        struct sockaddr_in socket_detials ;
        char * input_buffer;
        /* request headers as captured from Firefox */
        char * httpget =

          "GET / HTTP/1.1\r\n"
          "Host: www.amiga.org\r\n"
          "User-Agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.7.10) Gecko/20050825 Firefox/1.0.6 (Ubuntu package 1.0.6)\r\n"
          "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n"
          "Accept-Language: en-us,en;q=0.5\r\n"
          "Accept-Encoding: gzip,deflate\r\n"
          "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n"
          "Keep-Alive: 300\r\n"
          "Connection: keep-alive\r\n"
          "Referer: http://www.amiga.org/gallery/index.php?n=896=33\r\n"
          "Cookie: PHPSESSID=442105507b7dca6d4042a641fc132c8f; AO_Session=442105507b7dca6d4042a641fc132c8f\r\n"
          "Cache-Control: max-age=0\r\n"
          "\r\n";

        input_buffer = malloc(20000);

        socket_handle = socket ( AF_INET, SOCK_STREAM, 0) ;
        socket_detials.sin_family = AF_INET ;
        socket_detials.sin_addr.s_addr=inet_addr("68.90.68.66");
        socket_detials.sin_port = htons(80);
        bzero ( &(socket_detials.sin_zero), 8 ) ;

        if ( connect (socket_handle,(struct sockaddr*)&socket_detials, sizeof ( struct sockaddr)) == -1 ){
                printf ( "Couldnt connect to server\n" ) ;
        }
        /* send() and recv() return ssize_t, so cast for the %d conversion */
        printf ( "Sending %d bytes\n",  (int) send ( socket_handle , httpget, strlen(httpget), 0 ) ) ;
        printf ( "Received %d bytes\n", (int) recv ( socket_handle , input_buffer , 20000, 0 ) ) ;
        printf ( "%s\n", input_buffer ) ;

        return 0 ;
}


and it returns the following now:

Code:

koft@macdev:~$ ./socket
Sending 612 bytes
Received 1460 bytes
HTTP/1.1 200 OK
Date: Sat, 21 Jan 2006 05:03:13 GMT
Server: Apache/1.3.34 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.1 FrontPage/5.0.2.2635 mod_ssl/2.8.25 OpenSSL/0.9.7a
X-Powered-By: PHP/4.4.1
Set-Cookie: PHPSESSID=442105507b7dca6d4042a641fc132c8f; path=/
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: private, no-cache
Pragma: no-cache
Set-Cookie: AO_Session=442105507b7dca6d4042a641fc132c8f; expires=Saturday, 28-Jan-06 05:03:14 GMT; path=/
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=ISO-8859-1

d19

koft@macdev:~$


I broke up one of those lines because it was annoying.
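
The "d19" line after the headers in the output above belongs to the chunked transfer coding announced by "Transfer-Encoding: chunked": each chunk starts with its length in hexadecimal (0xd19 is 3353 bytes), and a length of 0 marks the last chunk. A minimal sketch of parsing such a line follows; `parse_chunk_size` is a made-up name for illustration, and a real client would also have to read the chunk data and the CRLF that follows it.

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical helper: parse a chunk-size line such as "d19\r\n".
 * Returns the chunk length in bytes, or -1 if the line is malformed. */
long parse_chunk_size(const char *line)
{
    long size = 0;
    int digits = 0;

    for (; isxdigit((unsigned char)*line); line++, digits++) {
        int c = tolower((unsigned char)*line);
        int v = (c >= 'a') ? c - 'a' + 10 : c - '0';

        if (size > (LONG_MAX - v) / 16)
            return -1;                  /* value would overflow */
        size = size * 16 + v;
    }
    if (digits == 0)
        return -1;                      /* no hex digits at all */
    /* the size may be followed by CR, a ";extension", or end of string */
    if (*line != '\r' && *line != ';' && *line != '\0')
        return -1;
    return size;
}
```

Requesting with HTTP/1.0, as patrik suggests, avoids having to do this at all, since chunked coding is an HTTP/1.1 feature.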

Writing network code can be a lot of fun, though it can be a major undertaking if you have to write something more than some hacks for a hobby project.

I have a few books about socket programming for Windows and Linux. They cover a lot of material but don't seem to dig into some details I'd like to fill in without having to experiment for years and write mountains of code. Multithreaded socket programming comes to mind; most books seem to skirt around the subject. If I'm writing a server daemon for something, I'm going to need to handle a lot of simultaneous connections. Do I use blocking or non-blocking sockets? Do I spawn off a thread for each socket, and make it a blocking socket? Should I spawn off a thread for every 10 sockets and do non-blocking I/O? Should I fork the daemon 5 times and load balance the connections across the processes? How do I effectively deal with resource starvation caused by jerks who write scripts that keep opening hundreds of connections and letting them hang? Those are the topics I'd like to see covered in a book: performance strategies and security issues. I really, really don't have time to read all the socket code for Apache.
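
One small building block that applies to most of those designs is waiting on a socket with a timeout, so that a connection which is opened and then left hanging cannot hold a handler forever. The sketch below uses POSIX poll(); `wait_readable` is a made-up helper name, and this does not settle the larger thread-versus-process questions.

```c
#include <poll.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds for fd to
 * become readable. Returns 1 if data (or end of file) can be read,
 * 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p;
    int rc;

    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;

    rc = poll(&p, 1, timeout_ms);
    if (rc <= 0)
        return rc;                      /* 0 = timed out, -1 = poll failed */
    return (p.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}
```

A server loop could call this before each recv() and close any connection for which it returns 0, which puts an upper bound on how long an idle client ties up a slot.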
 

Offline koaftder

Re: Example of C source code for getting web page.
« Reply #5 on: January 21, 2006, 06:36:49 AM »
@patrik

You are right about the extraneous stuff I had put in there. I just wanted to demonstrate connecting and retrieving some stuff; no need to confuse people.

That document you pointed out is a wonderful read.
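
Without the extraneous browser headers, the request comes down to the request line, a Host header, and a blank line. A minimal sketch of building it for any host and path is below; `build_get_request` is a made-up helper name, and it does no escaping or validation of its arguments.

```c
#include <stdio.h>

/* Hypothetical helper: format a minimal HTTP/1.0 GET request into buf.
 * Returns the request length, or -1 if buf is too small. */
int build_get_request(char *buf, size_t bufsize,
                      const char *host, const char *path)
{
    int n = snprintf(buf, bufsize,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "\r\n",
                     path, host);

    if (n < 0 || (size_t)n >= bufsize)
        return -1;                      /* output error or truncated */
    return n;
}
```

For example, `build_get_request(req, sizeof req, "www.amiga.org", "/")` fills `req` with the three-line request and returns its length, ready to pass to send().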

@AmigaEd

Sorry to point you off in the wrong direction. I grabbed the package and sure enough, it doesn't support Amiga ): It's a really nice, easy lib to work with. It supported DOS/Windows/OS2/Unix/Linux/VMS, etc. I guess I just thought it ran on Amiga because the guy who wrote it is a NASA geek and everybody seems to mention how much the Amiga was used in that organisation.
 

Offline koaftder

Re: Example of C source code for getting web page.
« Reply #6 on: January 21, 2006, 08:42:27 PM »
Quote

Piru wrote:
@koaftder

Bugs:

- You don't check if malloc() fails, but just crash if it does.
- You don't check if socket() fails, but just continue instead.
- bzero ( &(socket_detials.sin_zero), 8 ) ; is wrong. It assumes knowledge of the struct sockaddr_in, which can be different between platforms. Typically it is 8 though, but there is no guarantee of this.
- You don't bail out if connect() fails, but just continue.
- You don't check if send() succeeds.
- You don't check how much data you manage to recv().
- You limit the recv size to 20000 bytes. If more data would be available you just truncate input.
- There is no guarantee single recv() will get all the input at once. You might get just the header for the 1st call, or part of the header. You should call recv() till -1 (error) or 0 (eof) is returned.
- You printf %s the input buffer, even though it is not '\0' terminated.
- Sending fixed cookies will not work. Esp PHPSESSID will just fail once the session id has expired.


haha, I'm surprised you took the time to list all the irresponsible things I did. Since you took the time to review it, I'll take the time to fix it. Here it is:

Code:

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main ( void ) {
        int socket_handle ;
        struct sockaddr_in socket_detials ;
        char * input_buffer ;
        ssize_t bytes_received ;
        ssize_t bytes_sent ;
        size_t bytes_left ;
        const char * phttpget ;
        const char * httpget =
                "GET / HTTP/1.0\r\n"
                "Host: www.amiga.org\r\n"
                "\r\n";

        phttpget = httpget ;
        bytes_left = strlen ( httpget ) ;

        input_buffer = malloc(1024);
        if ( input_buffer == NULL ) {
                printf ( "Sorry, couldnt allocate memory for input buffer\n" );
                return -1 ;
        }

        /* zero the whole struct instead of assuming the size of sin_zero */
        memset ( &socket_detials , 0 , sizeof socket_detials );

        socket_handle = socket ( AF_INET, SOCK_STREAM, 0) ;
        if ( socket_handle == -1 ) {
                printf ( "Could not create socket\n" ) ;
                return -1 ;
        }
        socket_detials.sin_family = AF_INET ;
        socket_detials.sin_addr.s_addr=inet_addr("68.90.68.66");
        socket_detials.sin_port = htons(80);

        if ( connect (socket_handle,(struct sockaddr*)&socket_detials, sizeof socket_detials) == -1 ){
                printf ( "Couldnt connect to server\n" ) ;
                return -1 ;
        }

        printf ( "Attempting to send %lu bytes to server\n" , (unsigned long) bytes_left );
        /* send() may accept only part of the request, so loop until all of it is sent */
        while ( bytes_left > 0 ) {
                bytes_sent = send ( socket_handle , phttpget, bytes_left, 0 ) ;
                if ( bytes_sent == -1 ) {
                        printf ( "An error occured sending data\n" );
                        return -1 ;
                }
                phttpget += bytes_sent ;
                bytes_left -= (size_t) bytes_sent ;
        }

        /* keep reading until the server closes the connection ( recv returns 0 ) */
        for (;;) {
                bytes_received = recv ( socket_handle , input_buffer , 1023, 0 ) ;
                if ( bytes_received == -1 ) {
                        printf ( "An error occured during the receive procedure \n" ) ;
                        return -1 ;
                }
                if ( bytes_received == 0 )
                        break ;
                /* terminate what was received so printf %s is safe */
                input_buffer[bytes_received] = 0 ;
                printf ( "%s" , input_buffer ) ;
        }

        printf ( "\nFinished receiving data\n" ) ;
        close ( socket_handle ) ;
        free ( input_buffer ) ;
        return 0 ;
}
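
If the whole response is wanted in memory rather than printed piece by piece, the fixed-size buffer can be swapped for one that grows as data arrives, which addresses the truncation and termination points in the list above. This is only a sketch: `recv_all` is a made-up helper name, and it assumes the server closes the connection when it is done, as an HTTP/1.0 server normally does.

```c
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read from a connected socket until end of file
 * into a heap buffer that doubles in size when full. Returns a
 * NUL-terminated buffer the caller must free() and stores the byte
 * count in *len; returns NULL on error. */
char *recv_all(int fd, size_t *len)
{
    size_t cap = 1024, used = 0;
    char *buf = malloc(cap);
    ssize_t n;

    if (buf == NULL)
        return NULL;
    for (;;) {
        /* always keep one byte spare for the terminator */
        if (used + 1 >= cap) {
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        n = recv(fd, buf + used, cap - used - 1, 0);
        if (n == -1) {
            free(buf);
            return NULL;
        }
        if (n == 0)
            break;                      /* peer closed the connection */
        used += (size_t)n;
    }
    buf[used] = '\0';
    *len = used;
    return buf;
}
```

The returned length matters because a response body can contain zero bytes, in which case the terminator alone would not tell how much data was received.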