Enharmonics
Homework Statement
Write a C program to run on ocelot to find the total count of words and, optionally, the longest and/or shortest words in a string input by the user or coming from a file. If there is no filename, the user is prompted to enter the string. You must use getopt to parse the command line. The string is not input on the command line.
Usage: countwords [-l] [-s] [filename]
- The l flag means to find the longest word in the string.
- The s option means to find the shortest word in the string.
- You may have both or one of the flags.
- Output should be well formatted and easy to read.
Homework Equations
N/A
The Attempt at a Solution
My code so far:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
// Declares getopt, optarg and optind on POSIX systems
#include <unistd.h>
// Function Prototype for the strcatc function,
// which concatenates a char to a string
void strcatc(char* str, char c);
int main(int argc, char **argv)
{
extern char *optarg;
extern int optind;
int c, err = 0;
// The flags for longest/shortest
// options
int lflag = 0, sflag = 0;
// Stores the number of words in a file/user-provided string
int wordCount = 0;
// Will be used to count the number of letters in each word
int letterCount = 0;
// Stores the size of the C-strings used
const int STRING_SIZE = 5000;
// TRUE/FALSE boolean value stand-ins
const int TRUE = 1;
const int FALSE = 0;
// Type alias for the bool type
typedef int bool;
// Holds the user-provided string
char userString[STRING_SIZE];
// Holds the current line in the file/user-provided string.
char currentLine[STRING_SIZE];
// Holds the current word we're processing in the file
// or user-provided string, which will be derived while
// processing the current line.
char currentWord[STRING_SIZE];
// Holds the current character in the user-provided string.
char currentChar;
// These will point to the char arrays holding the longest and shortest
// words in a string (where applicable)
char longestWord[STRING_SIZE], shortestWord[STRING_SIZE];
static char usage[] = "Usage: %s [-l] [-s] [filename]\n";
while ((c = getopt(argc, argv, "ls")) != -1)
switch (c)
{
case 'l':
lflag = 1;
break;
case 's':
sflag = 1;
break;
case '?':
err = 1;
break;
}
if (err)
{
// Generic error message
printf("ERROR: Invalid input.\n");
fprintf(stderr, usage, argv[0]);
exit(1);
}
// CODE FOR THE WORD COUNTER STARTS HERE
// First, check whether the user provided a filename. We do this using
// optind. We know we've reached the final getopt argument at this point,
// so optind at this point represents the index position of filename (if it exists).
char fileDirectory[STRING_SIZE];
strcpy(fileDirectory, argv[optind]);
// FILE pointer that will be used to attempt to open the file
// the user provided, if indeed they did provide one
FILE *inFile;
// Represents the mode the file will be opened in
char *mode = "r";
// This will be used to iterate through each
// line of the file, which will be stored
// in a char array
int i;
// This "bool" (an alias for an int defined earlier,
// since C has no native bool type) variable indicates whether the current
// word being processed in the file/user-provided string
// is the FIRST WORD
bool firstWord;
// Here we actually attempt to open the file
inFile = fopen(fileDirectory, mode);
// Check whether the file was opened successfully; If it
// wasn't, that means the user didn't provide a file path,
// misspelled it, the file didn't exist, etc.
if (inFile == NULL)
{
// Represents the index of
// userString, and will be used
// to iterate through it
int strIndex;
// Set the firstWord "bool" variable
// to true before we begin to process
// the user-provided string
firstWord = TRUE;
// If there is no file to access, prompt the user
// to enter a string
printf("Please enter a string: ");
// Scanf takes user's input from stdin
scanf("%[^\n]s", userString);
for (strIndex = 0; strIndex < strlen(userString); strIndex++)
{
// If the current character isn't whitespace and
// is an alphanumeric character, increase
// the letterCount of the current word
// and append the character to currentWord
if (!isspace(userString[strIndex]) && isalnum(userString[strIndex]))
{
letterCount++;
strcatc(currentWord, userString[strIndex]);
}
// Otherwise, we've reached the end of a word,
// so increase wordCount
else
{
wordCount++;
// Check whether the current word is
// the first word
if (firstWord)
{
// If it is, assume it is both
// the longest and shortest word
// (this will be changed as we iterate
// through the line, of course)
strcpy(longestWord, currentWord);
strcpy(shortestWord, currentWord);
// At this point, we set the firstWord
// variable to 0. Because this variable
// is not modified anywhere else in the
// loop, we will only ever enter this if
// branch when we are processing the first
// word in the string
firstWord = FALSE;
}
// If not, check whether the currentWord is
// longer or shorter than the current
// longest and shortest words, respectively
else if (letterCount > strlen(longestWord))
{
strcpy(longestWord, currentWord);
}
// Note that here I use <= when comparing
// letterCount to the length of shortestWord.
// This is because shortestWord is initialized
// to a single blank space, so the shortest
// possible words ("I", "A", etc as explained earlier)
// would not be copied into shortestWord if I just
// used <
else if (letterCount <= strlen(shortestWord))
{
strcpy(shortestWord, currentWord);
}
// Now that we're done comparing the
// letterCount of the currentWord to
// the current longest/shortest words,
// reset letterCount so we can start
// counting the number of letters in the
// NEXT word from 0
letterCount = 0;
// We also reset currentWord, since we are
// moving on to the next word in the string
strcpy(currentWord, "");
}
}
}
// Otherwise, process the file
else
{
// As we did earlier, set the "bool"
// variable to true. We do this outside
// the while loop below so that it is only
// ever set to true BEFORE we begin processing
// the file (further explanation below)
firstWord = TRUE;
// While loop that iterates until the end
// of the file provided
while (!feof(inFile))
{
// Extract a line of text from the file
fgets(currentLine, STRING_SIZE, inFile);
// The rest of this follows exactly
// same algorithm as I did in the user-provided
// string.
for (i = 0; i < strlen(currentLine); i++)
{
if (!isspace(currentLine[i]) && isalnum(currentLine[i]))
{
letterCount++;
strcatc(currentWord, currentLine[i]);
}
else
{
wordCount++;
if (firstWord)
{
strcpy(longestWord, currentWord);
strcpy(shortestWord, currentWord);
firstWord = FALSE;
}
else if (letterCount > strlen(longestWord))
{
strcpy(longestWord, currentWord);
}
else if (letterCount <= strlen(shortestWord))
{
strcpy(shortestWord, currentWord);
}
letterCount = 0;
strcpy(currentWord, "");
}
}
}
}
// If the number of words isn't zero (that is,
// if the string/file wasn't empty), increment
// wordCount by one
// The reason this is necessary is that my algorithm
// doesn't actually count the words themselves - it
// counts the spaces BETWEEN words. Because every two
// words are separated by a single space, that means
// that the number of spaces in a sentence is equal
// to (number of words - 1), hence this adjustment
if (wordCount != 0)
{
wordCount++;
}
// Output the total number of words
printf("The total number of words is: %d\n", wordCount);
// If lflag is set, output
// the longest word to the user
if (lflag)
{
printf("The longest word is: %s\n", longestWord);
}
// If sflag is set, output the
// shortest word to the user
if (sflag)
{
printf("The shortest word is: %s\n", shortestWord);
}
fclose(inFile);
}
// Auxiliary method used to concatenate a character to a char array ("String")
void strcatc(char *str, char c)
{
// Iterate through the memory locations that
// make up the char array until we reach the
// final spot
for (; *str; str++);
// Add the char to the end of the already-existing
// C-string
*str++ = c;
// Add the "null character" to the end of the string
// to accommodate C-string formats (null-terminated character arrays)
*str++ = '\0';
}
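For comparison with the separator-counting approach described in the comments above, here is a minimal sketch that counts word starts instead, so repeated spaces, punctuation, or a trailing newline would not inflate the total. The helper name count_words is hypothetical and not part of the program above. It treats a word as a maximal run of alphanumeric characters, matching the isalnum test used in the loops.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not in the original program): counts words as
   maximal runs of alphanumeric characters in a null-terminated string. */
int count_words(const char *text)
{
    int count = 0;
    int in_word = 0;   /* 1 while the scan is inside a word */

    assert(text != NULL);
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (isalnum((unsigned char)text[i])) {
            if (!in_word) {
                /* First character of a new word */
                count++;
                in_word = 1;
            }
        } else {
            /* Any non-alphanumeric character ends the current word */
            in_word = 0;
        }
    }
    return count;
}
```

With the test sentence "This is a test" this returns 4 without needing a final adjustment of the count.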
So my problem this time is a weird one. When I run the program on Ocelot (I'm not totally sure what it is; it's required for the course I'm taking, and it basically works like a Unix command line that you access through PuTTY), it works perfectly when I provide a filename. If I type in the commands
make
./countwords -l -s infile
where infile is a file containing the sentence "This is a test", I get the correct output: the number of words is 4, the shortest "word" is "a", and the longest is "This" (even though it's technically tied with "test").
However, when I try to run it without providing a filename, as in
./countwords -l -s
I immediately get a segmentation fault. I have no idea what's causing it. At first, I thought it might have something to do with the line
Code:
char *mode = "r";
since I'm assigning a value to the pointer without setting aside memory space for it, but even if I change it to, for example,
Code:
char mode[2] = "r";
I still get the segmentation fault.
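As a side note on that line: both forms are valid ways to hand a mode to fopen. A string literal has static storage duration, so char *mode = "r" points at memory that already exists for the life of the program. A minimal sketch of the two forms, using hypothetical helper names (mode_from_literal and mode_into_array) that are not in the program:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to a string literal. This is safe
   because the literal has static storage duration and outlives the call. */
const char *mode_from_literal(void)
{
    const char *mode = "r";   /* points at the literal's own storage */
    return mode;
}

/* Hypothetical helper: builds the same mode in a local array and copies it
   into caller-provided storage of at least 2 bytes. */
void mode_into_array(char *out)
{
    char mode[2] = "r";       /* holds 'r' followed by '\0' */

    assert(out != NULL);
    memcpy(out, mode, sizeof mode);
}
```

Either string can be passed to fopen, so this line is unlikely to be the source of the crash.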
I don't know what else could be wrong. The fact that the program works perfectly when I provide a filename and only breaks down when I fail to do so tells me that the problem lies in that area (the filename).
My hunch is that it may have something to do with these lines:
Code:
char fileDirectory[STRING_SIZE];
strcpy(fileDirectory, argv[optind]);
Specifically the second one. When I provide a filename, argv[optind] is the C-string that follows the option list (that is, the filename in ./countwords -l -s filename).
When I don't provide a filename, it is... something else, and maybe that something isn't a string, or is otherwise incompatible with the strcpy function? That's all I can think of off the top of my head.
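That hunch can be checked directly. The C standard guarantees that argv[argc] is a null pointer, and when no operand follows the options, getopt leaves optind equal to argc. In that case argv[optind] is NULL, and passing it to strcpy is undefined behavior (typically a segmentation fault). A minimal sketch of a guard, assuming two hypothetical helpers (filename_operand and open_input) that are not part of the original program:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns the filename operand, or NULL when none
   was given. first_operand is the value of optind after the getopt loop. */
const char *filename_operand(int argc, char **argv, int first_operand)
{
    assert(argv != NULL && first_operand <= argc);

    /* With no operand, first_operand == argc and argv[argc] is a null
       pointer, which must not be passed to strcpy or fopen. */
    if (first_operand < argc)
        return argv[first_operand];
    return NULL;
}

/* Hypothetical helper: opens the named file for reading, or falls back
   to stdin when no filename was supplied. Returns NULL if fopen fails. */
FILE *open_input(const char *filename)
{
    if (filename == NULL)
        return stdin;
    return fopen(filename, "r");
}
```

In main, this would replace the unconditional strcpy: call filename_operand(argc, argv, optind) after the getopt loop and prompt for a string when it returns NULL. The final fclose(inFile) would need the same kind of guard, since calling fclose on a null pointer is also undefined behavior.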