Author Topic: My C homework (Read 19329 times)

falemagn · « **on:** February 24, 2007, 02:50:39 PM »

Quote

Piru wrote:
@Karlos
Quote
A hash map satisfies the second point nicely. That would be trivial to implement in C++/STL but it requires some thinking about to do from first principles in C. But then, that is the aim of the exercise.
Certainly you don't want to be iterating over the entire list of known words for each word read.

True. There's IMO a better data structure for this particular case: a binary tree. Hash tables are boring... ;-)

Or even better, a trie.

falemagn · « **Reply #1 on:** February 24, 2007, 03:10:05 PM »

Quote

Of course, it depends on your hash function. If it takes longer to compute than it does to iterate n levels down the tree (or trie), chances are it will be slower.

Unless you factor in cache locality, which may very well have an inpact, a hash map will always be slower than a trie as you not only have to computer the hash, but also still compare the strings.

falemagn · « **Reply #2 on:** February 26, 2007, 01:11:53 PM »

Quote

Also...
What you may want to try at a more advanced stage is to rewrite it so that it loads a large portion of the file (or all of it) into memory and then processes it from there.

That's unnecessary, as she's using buffered I/O file access which basically does what you're suggesting she should do.

One thing I would suggest, instead, is to use ctype.h's isspace() function (or macro, depending on its implementation).

falemagn · « **Reply #3 on:** February 26, 2007, 02:14:18 PM »

Quote

Karlos wrote:
@falemagn

Even though the stdio buffers the file, the overhead of repeatedly calling fgetc() can make a difference compared to iterating over an array of characters that you effectively have direct access to. You might notice the benefit on an 030 IDE system like Mel's.

The standard allows getc() to be implemented as a macro, and it often is. Getc() is otherwise equivalent to fgetc().

falemagn · « **Reply #4 on:** February 26, 2007, 02:15:35 PM »

Quote

Karlos wrote:
Quote

falemagn wrote:

One thing I would suggest, instead, is to use ctype.h's isspace() function (or macro, depending on its implementation).

Actually, I think the idea of differentiating "word" characters from "non-word" characters would be more effective. You'd probably want to use isalnum() for that one as it only returns non-zero for letters or numbers.

I agree with you, but I was merely suggesting a way to do without the 'case' construct she used.

falemagn · « **Reply #5 on:** February 27, 2007, 04:59:50 PM »

Mel,

it looks good, but you don't really need the notinaword variable, you can recode the countwords function like this:

Code: [Select]


int countwords(FILE* file, int* lengths)
{
    int numwords = 0;
    int length   = 0;
    int character;

    while ((character = getc(file)) != EOF) {
        if (isalnum(character)) {
            /* character is in a word so increment the length */
            length++;
        } 
        else
        if (length > 0) {
            /* if we just got here then update our counters */
            numwords++;

            /* fill appropriate length bucket */
            if (length < 4) {
                /* for lengths < 4 fill bucket 0 */
                lengths[0]++;
            }
            else
            if (length < 11) {
                /* for lengths 4...10 fill buckets 1-7*/
                lengths[length - 3]++;
            }
            else {
                /* for lengths > 10 fill bucket 8*/
                lengths[8]++;
            }
            
            length = 0;
        }
    }
    
    return numwords;
}

Notice also the use of getc instead of fgetc: it's likely to make the code faster, for the implementation can chose to implement it as an inline function or macro accessing the buffering data structures directly.

falemagn · « **Reply #6 on:** February 28, 2007, 08:45:35 AM »

Quote

mel_zoom wrote:
Hmm.

I hit a snag. It seems isalnum() only returns non zero for the characters 0...9 a...z A...Z

That's because the locale is set to the default "C". Libc's setlocale() function will change that behaviour, provided the underlying implementation supports it. AmigaOS' locale.library provides equivalent functionalities, but then it's not ANSI C anymore.

falemagn · « **Reply #7 on:** February 28, 2007, 08:49:08 AM »

Quote

AmireX wrote:
Not even faster, but more compact :-)
Code: [Select]
/* fill a range bucket based on the word length*/ length = min(max(length-3,0),8); lengths[length]++;
with
Code: [Select]
#define min(a, b) (((a) < (b)) ? (a) : (b)) #define max(a, b) (((a) > (b)) ? (a) : (b))
Regards, ReX

If she's using gcc, then the following might be a better choice:

Code: [Select]


    /* fill appropriate length bucket */
    switch (length)
    {
        case 1 ... 3:  lengths[0]++;        break;
        case 4 ... 10: lengths[length-3]++; break;
        default:       lengths[8]++;
    }

falemagn · « **Reply #8 on:** March 02, 2007, 07:06:30 AM »

@ mel_zoom

Piru said all that I had in mind to reply myself, so... just wanted to say that I wish my coworkers were as skilled as a newbee like you is. ;-)

falemagn · « **Reply #9 on:** March 02, 2007, 11:01:17 AM »

Quote

mel_zoom wrote:

I thought the look of perplexed frustration would be apt for when I start getting into more realistic C code :-)

Except that you don't look either perplexed nor frustrated, in that picture. ;-)

falemagn · « **Reply #10 on:** March 07, 2007, 07:16:50 AM »

@ Mel

Since you're never traversing the list backwards, you don't really need the "prev" pointer. Besides, you mispelled "prev" with "last" in the uniqueword structure definition. ;-)

About the loop issue: the "trick" is to not add the new word while within the loop, but do it always outside, trying to unify the cases in which it's the last word in the chain or one in between. You can state the loop's invariant property that when it finishes looping, the "last" pointer always points to the word after which you need to insert yours, then factor the loop in such a way as to make that the case.

In other words, consider the loop as the implementation of a "find" function that returns the pointer to the element in the list which is the upper bound of all elements that are less than the one you want to add.

If you need code hints, I can give them to you, but I'm sure you'll prefer to work that out by yourself. ;-)

falemagn · « **Reply #11 on:** March 07, 2007, 12:11:23 PM »

Quote

mel_zoom wrote:
falemagn:

Thanks for the hints. The code there was put together but not yet compiled so I expect there to be those sorts of bugs.

I guessed as much, but still apart from that mistake the code as it is will work just fine. It's kind of amazing how you are so good at such stuff... the dream girl of any geek, I'd say. ;-)

Quote

I thought about having a search that ran from both ends towards the middle that could terminate once the pointers "cross over". Thats why I thought about having prev/next.

That may sound a clever idea, but you'd not get better performances than the one-way search unless you were thinking of parallelizing the algorithm... which would be quite an overengineering. :-) In both cases, the average compexity would be O(n/2).

Quote

As you can tell I am still some time from making it work :-)

There's still another improvement you could make, namely structure the code in such a way as to allocate the new uniqueword only in one place rather than in three like you do now. But we'll keep this for later... ;-)

Author Topic: My C homework (Read 19329 times)

falemagn

Re: Mel's C homework

falemagn

Re: Mel's C homework

falemagn

Re: Mel's C homework

falemagn

Re: Mel's C homework

falemagn

Re: Mel's C homework

falemagn

Re: Mel's C homework

falemagn

Re: Mel's C homework

falemagn

Re: Mel's C homework

falemagn

Re: Mel's C homework

falemagn

Re: Mel's C homework

falemagn

Re: Mel's C homework

falemagn

Re: Mel's C homework