Author Topic: Example of C source code for getting web page.  (Read 8432 times)

Offline Piru

  • \' union select name,pwd--
  • Hero Member
  • *****
  • Join Date: Aug 2002
  • Posts: 6946
    • http://www.iki.fi/sintonen/
Re: Example of C source code for getting web page.
« on: January 21, 2006, 03:08:49 PM »
@koaftder

Bugs:

- You don't check if malloc() fails, but just crash if it does.
- You don't check if socket() fails, but just continue instead.
- bzero ( &(socket_detials.sin_zero), 8 ) ; is wrong. It assumes knowledge of the layout of struct sockaddr_in, which can differ between platforms. Typically sin_zero is 8 bytes, but there is no guarantee of this.
- You don't bail out if connect() fails, but just continue.
- You don't check if send() succeeds.
- You don't check how much data you manage to recv().
- You limit the recv size to 20000 bytes. If more data is available, you just truncate the input.
- There is no guarantee that a single recv() will get all the input at once. You might get just the header from the first call, or only part of the header. You should call recv() until it returns -1 (error) or 0 (EOF).
- You printf %s the input buffer, even though it is not '\0' terminated.
- Sending fixed cookies will not work. In particular, PHPSESSID will just fail once the session id has expired.
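
As a rough sketch of the send() point above: send() may also write fewer bytes than requested, so one option is to loop until everything is out. The helper name send_all is made up for illustration, and this assumes a blocking socket and does not retry on EINTR.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the original code): keep calling send()
   until all len bytes have been written. Returns 0 on success, -1 on
   error. Assumes a blocking socket; EINTR is treated as an error. */
int send_all(int s, const char *buf, size_t len)
{
  size_t sent = 0;

  while (sent < len)
  {
    ssize_t n = send(s, buf + sent, len - sent, 0);
    if (n <= 0)
    {
      /* -1 is an error; 0 would mean no progress, so give up as well */
      return -1;
    }
    sent += (size_t) n;
  }

  return 0;
}
```

The caller would use it as: if (send_all(s, req, strlen(req)) == 0) { /* go on to recv() */ }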
 

Offline Piru

Re: Example of C source code for getting web page.
« Reply #1 on: January 21, 2006, 05:31:13 PM »
Code:

;/*
gcc -noixemul -Wall -O2 httpget.c -o httpget
quit
*/

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h> /* inet_addr() */
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#ifndef STDOUT_FILENO
#define STDOUT_FILENO 1
#endif


#include <proto/exec.h>
#ifdef __SASC
#include <proto/socket.h>
#endif
#include <clib/alib_protos.h>

#define RECV_BUFSIZE 16384

struct MinList *http_get(const char *host, int port, const char *path);
struct MinList *dorecv(int s);
void dumplist(struct MinList *list);
void freelist(struct MinList *list);

struct datanode
{
  struct MinNode node;
  int            len;
};

int main(void)
{
  struct MinList *res;

  res = http_get("www.amiga.org", 80, "/");
  if (res)
  {
    dumplist(res);
    freelist(res);
  }
  else
  {
    fprintf(stderr, "http_get failed\n");
  }

  return 0;
}

/*
   FUNCTION

   http_get - HTTP GET a location off a web server

   struct MinList *http_get(const char *host, int port, const char *path)
   

   INPUT

   The http-request must be split into valid components for this function.

   host: hostname
   port: port number
   path: the path of object to http get. spaces and special chars should
         be encoded to %<hex>

   RESULT

   struct MinList *

     NULL if error, else list filled with 'struct datanode' nodes. Note
     that the output includes the full header returned by the server, and
     it's left for the caller to parse it (separate header and actual
     data).

   NOTE

   This function blocks, and it can potentially take hours to complete;
   for example if the file is long, or the connection is very slow.
*/

struct MinList *http_get(const char *host, int port, const char *path)
{
  struct MinList *list = NULL;
  int s;

  if (host && host[0] && port > 0 && path)
  {
    struct sockaddr_in saddr;
    struct hostent *he;

    bzero(&saddr, sizeof(saddr));

    he = gethostbyname(host);
    if (he)
    {
      memcpy(&saddr.sin_addr, he->h_addr, he->h_length);
      saddr.sin_family = he->h_addrtype;
    }
    else
    {
      saddr.sin_addr.s_addr = inet_addr(host);
      saddr.sin_family = AF_INET;
    }

    if (saddr.sin_addr.s_addr != INADDR_NONE)
    {
      s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
      if (s != -1)
      {
        saddr.sin_port = htons(port);

        if (connect(s, (struct sockaddr *) &saddr, sizeof(saddr)) != -1)
        {
          const char *fmt =
            "GET %s HTTP/1.0\r\n"
            "Host: %s\r\n"
            "User-Agent: httpget_test_app/1.0\r\n"
            "\r\n";
          char *req;

          if (path[0] == '\0')
          {
            path = "/";
          }

          req = malloc(strlen(fmt) +
                       strlen(path) - 2 +
                       strlen(host) - 2 + 1);
          if (req)
          {
            int reqlen;

            sprintf(req, fmt, path, host);
            reqlen = strlen(req);

            if (send(s, req, reqlen, 0) == reqlen)
            {
              list = dorecv(s);
            }

            free(req);
          }

        }

        /* close the socket even if connect() failed, so it doesn't leak */
        close(s);
      }
    }
  }

  return list;
}

struct MinList *dorecv(int s)
{
  struct MinList *list = NULL;
  UBYTE *buf;

  buf = malloc(RECV_BUFSIZE);
  if (buf)
  {
    int ok = 0;

    for (;;)
    {
      int actual;

      actual = recv(s, buf, RECV_BUFSIZE, 0);
      if (actual == -1)
      {
        /* error */
        break;
      }
      else if (actual == 0)
      {
        /* eof */
        ok = 1;
        break;
      }
      else
      {
        struct datanode *node;

        if (!list)
        {
          list = malloc(sizeof(*list));
          if (!list)
          {
            break;
          }
          NewList((struct List *) list);
        }

        node = malloc(sizeof(*node) + actual);
        if (!node)
        {
          break;
        }
        node->len = actual;
        memcpy(node + 1, buf, actual);

        AddTail((struct List *) list, (struct Node *) node);
      }
    }

    if (!ok)
    {
      freelist(list);
      list = NULL;
    }

    free(buf);
  }

  return list;
}

void dumplist(struct MinList *list)
{
  if (list)
  {
    struct datanode *node;

    fflush(stdout);

    for (node = (APTR) list->mlh_Head;
         node->node.mln_Succ;
         node = (APTR) node->node.mln_Succ)
    {
      write(STDOUT_FILENO, node + 1, node->len);
    }

    fflush(stdout);
  }
}

void freelist(struct MinList *list)
{
  if (list)
  {
    struct datanode *node, *nextnode;

    for (node = (APTR) list->mlh_Head;
         (nextnode = (APTR) node->node.mln_Succ);
         node = nextnode)
    {
      free(node);
    }

    free(list);
  }
}
 

Offline Piru

Re: Example of C source code for getting web page.
« Reply #2 on: January 21, 2006, 05:39:25 PM »
Oh, I used exec lists there.

If this thing must be portable you can use dlist instead.
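
For what it's worth, here is a minimal portable sketch that avoids lists altogether. This is not dlist, just a plain growable buffer built on realloc(); the names growbuf, growbuf_append and growbuf_free are invented for this example. Each chunk returned by recv() would be appended, and the buffer is kept '\0' terminated so it can be handed to string functions if the data is text.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer (illustration only, not dlist). */
struct growbuf
{
  char  *data; /* NULL until the first append */
  size_t len;  /* bytes stored, not counting the terminator */
  size_t cap;  /* bytes allocated */
};

/* Append n bytes and keep the buffer '\0' terminated.
   Returns 0 on success, -1 if realloc() fails (old data stays valid). */
int growbuf_append(struct growbuf *gb, const void *chunk, size_t n)
{
  if (gb->len + n + 1 > gb->cap)
  {
    size_t newcap = gb->cap ? gb->cap : 4096;
    char *p;

    while (newcap < gb->len + n + 1)
    {
      newcap *= 2;
    }

    p = realloc(gb->data, newcap);
    if (!p)
    {
      return -1;
    }
    gb->data = p;
    gb->cap = newcap;
  }

  memcpy(gb->data + gb->len, chunk, n);
  gb->len += n;
  gb->data[gb->len] = '\0';

  return 0;
}

void growbuf_free(struct growbuf *gb)
{
  free(gb->data);
  gb->data = NULL;
  gb->len = 0;
  gb->cap = 0;
}
```

Start with struct growbuf gb = { NULL, 0, 0 }; and call growbuf_append(&gb, buf, actual) after each successful recv(). The trade-off against the list version is one large contiguous allocation instead of many small ones.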
 

Offline Piru

Re: Example of C source code for getting web page.
« Reply #3 on: January 21, 2006, 06:09:49 PM »
@ChaosLord

-W is the warning option. -Wall means "enable all standard warnings".

-noixemul means 'do not use ixemul.library' (this is strictly a gcc thing).

BTW: For this program, m68k gcc users might need to drop the -noixemul option, that is, use ixemul.library. Also, depending on the gcc version, -lamiga might be required to get the thing to link.
 

Offline Piru

Re: Example of C source code for getting web page.
« Reply #4 on: January 21, 2006, 06:30:48 PM »
Note: It builds with SAS/C as well now (at least with MiamiSDK).

sc resopt incdir Path:to/MiamiSDK/netinclude/ link httpget.c lib Path:to/MiamiSDK/netlib/miami.lib

Anyway, if someone has any more questions about the src, just ask. I'm happy to explain it.
 

Offline Piru

Re: Example of C source code for getting web page.
« Reply #5 on: January 22, 2006, 04:09:00 AM »
@AmigaEd

I believe it's an include conflict between sys/time.h and devices/timer.h. I didn't see it because the MorphOS includes account for the problem.

Anyway, clib/alib_protos.h isn't really required here, as it only declares one of the prototypes (NewList()). As you noticed, it works without it.
 

Offline Piru

Re: Example of C source code for getting web page.
« Reply #6 on: January 22, 2006, 04:36:03 PM »
@uncharted

I wouldn't really call it that good. Here are the obvious bugs I could find in it:

- The HTTP GET is broken (missing carriage returns, missing "Host:" breaking vhosts etc).

- getservbyname and getprotobyname can return NULL. The program doesn't check for this, but reads the zero page instead.

- It leaks sockets in error conditions. Eventually (after 64 attempts) you would run out, and socket() would just return an error.

- The code will crash or hang if recv() returns an error.

- The GUI is dead while the program is getting the data. (Ok, this is just an example so not offloading the network code to separate thread isn't that bad... :-))
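
A minimal sketch of the NULL check for getservbyname(), assuming a fallback port is acceptable when the services database has no entry. The helper name lookup_port is made up for illustration.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: look up a service port by name.
   Returns the port in host byte order, or 'fallback' if
   getservbyname() returns NULL (no such service, or no database). */
int lookup_port(const char *service, const char *proto, int fallback)
{
  struct servent *se = getservbyname(service, proto);

  if (!se)
  {
    return fallback;
  }

  /* s_port is stored in network byte order */
  return ntohs((unsigned short) se->s_port);
}
```

For example, lookup_port("http", "tcp", 80) would be passed through htons() again before being stored in sin_port. The same kind of check applies to the result of getprotobyname().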
 

Offline Piru

Re: Example of C source code for getting web page.
« Reply #7 on: March 18, 2012, 08:40:57 PM »
@kvasir

The code is fairly good. However, this code here has a bug:
Code:
for (;;)
 {
  bytes_received = recv ( socket_handle , input_buffer , 1023, 0 ) ;
  if ( bytes_received == -1 )
  {
   printf ( "An error occured during the receive procedure \n" ) ;
   return bailout() ; // return 0; to return bailout();
  }
  if ( bytes_received == 0 )
  break ;
  pinput_buffer = input_buffer + bytes_received ;
  *pinput_buffer = 0 ;
  printf ( "%s" , input_buffer ) ;
 }
It doesn't handle the response arriving in multiple parts correctly.
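
One way to cope with a response that arrives in pieces (a sketch only, under the assumption that the chunks are first accumulated into a single buffer) is to search the accumulated data for the blank line that ends the header, since that boundary can fall anywhere, including across two recv() calls. The function name find_body is invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given everything received so far, return the
   offset of the first body byte (just past the "\r\n\r\n" that ends
   the header), or -1 if the end of the header has not arrived yet. */
long find_body(const char *buf, size_t len)
{
  size_t i;

  for (i = 0; i + 4 <= len; i++)
  {
    if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
    {
      return (long) (i + 4);
    }
  }

  return -1;
}
```

Calling this on the whole accumulated buffer after each recv(), rather than on the latest chunk alone, is what makes it independent of how the data was split.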

Another thing to look out for is inadvertently adding ; after blocks. It's a bad habit you should unlearn.

Getting HTTP GET right with custom code is probably one of the most difficult tasks. Unless it is for exercise, you really should use libcurl.
« Last Edit: March 18, 2012, 08:44:26 PM by Piru »