创建一个空哈希表

loading...


-1

我正在尝试创建一个结构的空哈希表:

struct htab
{
   size_t capacity;
   size_t size;
   struct pair *data;
};

数据是结构对值的链表列表 . 这些链表包含标记(虚拟值)作为第一个元素 .

struct pair
{
   uint32_t hkey;
   char *key;
   void *value;
   struct pair *next;
};

所以我把它写成容量为4,大小为0.我怎样才能将'data'数组的所有单元格初始化为0?

struct htab *htab_new()
{
   struct htab *newtable = 
   malloc(sizeof(struct htab));
   if (newtable == NULL)
   {
       errx(1, "Not enough memory!");
   }
   newtable->capacity = 4;
   newtable->size = 0;
   newtable->data = calloc(// ??);
   return newtable;
}

另外,我怎么测试这实际上是否有效?

loading...

2回答

  • 0

    在评论中,OP提到他们从一个例子中学得更好 . (这不是一个答案本身,只是一个例子 . 请考虑这是一个扩展的评论,而不是答案 . )

    让我们看一个简单的,现实世界的例子吧;但有些东西不能只是复制粘贴并呈现为自己的作品 .

    假设我们需要一个文本或标记的哈希表,比方说

    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>
    
    struct hashentry {
        struct hashentry   *next;
        size_t              hash;
        unsigned char       data[];
    };
    
    struct hashtable {
        size_t              size;
        struct hashentry  **slot;
    };
    

    表本身是一个指针数组,并通过链接解决哈希冲突 . 请注意,我实际上只使用键,而不是键值对;这是为了避免将此示例代码复制粘贴并呈现为某人自己的工作 . 我写这篇文章是为了帮助新程序员理解,而不是让作弊者作为他们的家庭作业提交 . (我不是指OP,请注意 . 这些问题通常是通过网络搜索找到的,我会为整个小组编写这些答案,而不仅仅是提问者 . )

    表初始化为特定大小:

    static inline void hashtable_init(struct hashtable *const ht, const size_t size)
    {
        size_t  i;
    
        if (!ht) {
            fprintf(stderr, "hashtable_init(): No hashtable specified (ht == NULL).\n");
            exit(EXIT_FAILURE);
        } else
        if (size < 1) {
            fprintf(stderr, "hashtable_init(): Invalid hashtable size (size == %zu).\n", size);
            exit(EXIT_FAILURE);
        }
    
        /* Allocate an array of pointers. */
        ht->slot = calloc(size, sizeof ht->slot[0]);
        if (!ht->slot) {
            fprintf(stderr, "hashtable_init(): Failed to allocate an array of %zu slots.\n", size);
            exit(EXIT_FAILURE);
        }
        ht->size = size;
    
        /* Mark each slot unused. (On all current architectures, this is unnecessary,
           as calloc() does initialize the pointers to NULL, but let's do it anyway.) */
        for (i = 0; i < size; i++)
            ht->slot[i] = NULL;
    }
    

    对于哈希函数,我喜欢用于文本字符串的DJB2 Xor变体 . 它不是特别好(会有碰撞),但它非常简单快速:

    static inline size_t  hash_data(const char *data, const size_t datalen)
    {
        const char *const ends = data + datalen;
        size_t            hash = 5381;
    
        while (data < ends)
            hash = (33 * hash) ^ (unsigned char)(*(data++));
    
        return hash;
    }
    

    请注意,我使用 size_t 作为哈希的类型 . 你可以使用你想要的任何类型,但在大多数架构中,它与指针的大小相同,即 .

    要向哈希表添加数据条目:

    static inline void hashtable_add(struct hashtable *ht, const char *data, const size_t datalen)
    {
        struct hashentry *entry;
        size_t            hash, slotnum;
    
        if (!ht) {
            fprintf(stderr, "hashtable_add(): No hashtable specified (ht == NULL).\n");
            exit(EXIT_FAILURE);
        } else
        if (ht->size < 1) {
            fprintf(stderr, "hashtable_add(): Hashtable has zero size.\n");
            exit(EXIT_FAILURE);
        } else
        if (!data && datalen > 0) {
            fprintf(stderr, "hashtable_add(): data is NULL, but datalen == %zu.\n", datalen);
            exit(EXIT_FAILURE);
        }
    
        /* Allocate memory for the entry, including the data, and the string-terminating nul '\0'. */
        entry = malloc(sizeof (struct hashentry) + datalen + 1);
        if (!entry) {
            fprintf(stderr, "hashtable_add(): Out of memory (datalen = %zu).\n", datalen);        
            exit(EXIT_FAILURE);
        }
    
        /* Copy the data, if any. */
        if (datalen > 0)
            memcpy(entry->data, data, datalen);
    
        /* Ensure the data is terminated with a nul, '\0'. */
        entry->data[datalen] = '\0';
    
        /* Compute the hash. */
        hash = hash_data(data, datalen);
        entry->hash = hash;
    
        /* The slot number is the hash modulo hash table size. */
        slotnum = hash % ht->size;
    
        /* Prepend entry to the corresponding slot chain. */
        entry->next = ht->slot[slotnum];
        ht->slot[slotnum] = entry;
    }
    

    当我最初编写如上所述的代码时,我总是将其编写为测试程序,并对其进行测试 . (这在技术上属于_4921范式 . )

    在这种情况下,我们可以简单地将插槽数作为命令行参数,并从标准输入读取每一行作为要添加到散列表的数据 .

    因为标准C没有实现 getline() ,所以我们最好使用 fgets() ,而使用固定大小的行缓冲区 . 如果我们宣布

    #ifndef  MAX_LINE_LEN
    #define  MAX_LINE_LEN  16383
    #endif
    

    我们有一个预处理器宏 MAX_LINE_LEN ,默认为16383,但可以在编译时使用编译器选项覆盖 . (使用GCC,Intel CC和clang,您可以使用例如 -DMAX_LINE_LEN=8191 将其减半 . )

    main() 中,如果参数计数不正确,或者 -h--help 是第一个参数,我喜欢打印用法:

    int main(int argc, char *argv[])
    {
        char              buffer[MAX_LINE_LEN + 1];
        char             *line;
        size_t            size, i;
        struct hashtable  table;
        char              dummy;
    
        if (argc != 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
            fprintf(stderr, "\n");
            fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
            fprintf(stderr, "       %s ENTRIES < DATA-FILE > DOT-FILE\n", argv[0]);
            fprintf(stderr, "\n");
            fprintf(stderr, "This program reads lines from DATA-FILE, adding them to\n");
            fprintf(stderr, "a hash table with ENTRIES slots and hash chaining.\n");
            fprintf(stderr, "When all input lines have been read, the contents of the\n");
            fprintf(stderr, "hash table slots will be output as a Graphviz DOT format graph.\n");
            fprintf(stderr, "\n");
            return EXIT_SUCCESS;
        }
    

    接下来,我们可以尝试将第一个命令行参数解析为 size_t size; . 我喜欢使用"sentinel"字符来检测参数在值之后是否有垃圾(除了空白):

    if (sscanf(argv[1], "%zu %c", &size, &dummy) != 1 || size < 1) {
            fprintf(stderr, "%s: Invalid number of hash table entries.\n", argv[1]);
            return EXIT_FAILURE;
        }
        hashtable_init(&table, size);
    

    下一部分是从标准输入中读取每一行,并将它们添加到哈希表中 .

    while (1) {
    
            line = fgets(buffer, sizeof buffer, stdin);
            if (!line)
                break;
    
            /* Skip leading ASCII whitespace. */
            line += strspn(line, "\t\n\v\f\r ");
    
            /* Find out the remaining length of the line. */
            size = strlen(line);
    
            /* Ignore trailing ASCII whitespace. */
            while (size > 0 && (line[size-1] == '\t' || line[size-1] == '\n' ||
                                line[size-1] == '\v' || line[size-1] == '\f' ||
                                line[size-1] == '\r' || line[size-1] == ' '))
                size--;
    
            /* Ignore empty lines. */
            if (size < 1)
                continue;
    
            /* Add line to the hash table. */
            hashtable_add(&table, line, size);
        }
    
        /* Check if fgets() failed due to an error, and not EOF. */
        if (ferror(stdin) || !feof(stdin)) {
            fprintf(stderr, "Error reading from standard input.\n");
            return EXIT_FAILURE;
        }
    

    此时,我们有 tablesize 个插槽 . 我编写测试程序来编写纯文本输出(如果简单)或Graphviz DOT格式输出(如果结构像图形) . 在这种情况下,图形输出格式听起来更好 .

    /* Print the hash table contents as a directed graph, with slots as boxes. */
        printf("digraph {\n");
    
        for (i = 0; i < table.size; i++) {
            struct hashentry *next = table.slot[i];
    
            /* The slot box. */
            printf("    \"%zu\" [ shape=\"box\", label=\"%zu\" ];\n", i, i);
    
            if (next) {
    
                /* The edge from the slot box to the entry oval. */
                printf("    \"%zu\" -> \"%p\";\n", i, (void *)next);
    
                while (next) {
                    struct hashentry *curr = next;
    
                    /* Each entry oval; text shown is the value read from the file. */
                    printf("    \"%p\" [ shape=\"oval\", label=\"%s\" ];\n", (void *)curr, curr->data);
    
                    next = next->next;
    
                    /* The edge to the next oval, if any. */
                    if (next)
                        printf("    \"%p\" -> \"%p\";\n", (void *)curr, (void *)next);
                }
            } 
        }
    
        printf("}\n");
        return EXIT_SUCCESS;
    }
    

    而已 . 如果使用 10 作为命令行参数编译并运行上述程序,并使用feed

    one
    two
    three
    four
    five
    six
    seven
    eight
    nine
    ten
    

    对于它的标准输入,它将输出

    digraph {
        "0" [ shape="box", label="0" ];
        "1" [ shape="box", label="1" ];
        "1" -> "0xb460c0";
        "0xb460c0" [ shape="oval", label="three" ];
        "0xb460c0" -> "0xb46080";
        "0xb46080" [ shape="oval", label="one" ];
        "2" [ shape="box", label="2" ];
        "3" [ shape="box", label="3" ];
        "3" -> "0xb46180";
        "0xb46180" [ shape="oval", label="nine" ];
        "0xb46180" -> "0xb460a0";
        "0xb460a0" [ shape="oval", label="two" ];
        "4" [ shape="box", label="4" ];
        "4" -> "0xb46160";
        "0xb46160" [ shape="oval", label="eight" ];
        "0xb46160" -> "0xb46140";
        "0xb46140" [ shape="oval", label="seven" ];
        "5" [ shape="box", label="5" ];
        "5" -> "0xb46100";
        "0xb46100" [ shape="oval", label="five" ];
        "6" [ shape="box", label="6" ];
        "6" -> "0xb461a0";
        "0xb461a0" [ shape="oval", label="ten" ];
        "7" [ shape="box", label="7" ];
        "7" -> "0xb46120";
        "0xb46120" [ shape="oval", label="six" ];
        "0xb46120" -> "0xb460e0";
        "0xb460e0" [ shape="oval", label="four" ];
        "8" [ shape="box", label="8" ];
        "9" [ shape="box", label="9" ];
    }
    

    喂给Graphviz dot 会生成一个漂亮的图表:

    Hash table graph

    如果要查看数据字符串上方的实际哈希值,请更改为

    /* Each entry oval; text shown is the value read from the file. */
                    printf("    \"%p\" [ shape=oval, label=\"%zu:\\n%s\" ];\n", (void *)curr, curr->hash, curr->data);
    

    正如我所说,DJB2 Xor哈希并不是特别好,对于上面的输入,你需要至少43个哈希表槽来避免链接 .


  • 0

    这些链接列表包含作为第一个元素的标记(虚拟值) .

    newtable->data = calloc(capacity, sizeof(*newtable->data));
    // but you should check the result of calloc as well, just as you did for malloc!
    

    你会发现钥匙已经分配了一个值实际使用的插槽吗?然后,您将无法使用空指针作为键 . 如果只使用 next 指针,那么为什么要使用结构呢?你可以有一个指针数组,然后sentinel将是一个空指针:

    struct pair** data;
    

    有趣的是,有了这个,你就不需要像上面给出的那样改变对calloc的调用( sizeof(*data) 现在将是指针的大小......) .

    旁注:见calloc and pointers ......

评论

暂时没有评论!