Friday 20 January 2017

Performance benchmarks in radix tree test suite

Currently we have tests to check performance of tagged or normal iteration, in terms of speed or time of iteration. Let's see how these work.

static void benchmark_size(unsigned long size, unsigned long step, int order)
{
        RADIX_TREE(tree, GFP_KERNEL);
        long long normal, tagged;
        unsigned long index;

        for (index = 0 ; index < size ; index += step) {
                item_insert_order(&tree, index, order);
                radix_tree_tag_set(&tree, index, 0); 
        }

        tagged = benchmark_iter(&tree, true);
        normal = benchmark_iter(&tree, false);

        printf("Size %ld, step %6ld, order %d tagged %10lld ns, normal %10lld ns\n",
                size, step, order, tagged, normal);

        item_kill_tree(&tree);
        rcu_barrier();
}

We start with an arbitrary size of the radix tree say 210 or 220. A step of 128 means that starting from index 0 till the max index i.e. size, every 128th key index will be used to insert a tagged item. Order denotes the number of indices covered by a particular key. So a given key covers 2order indices around it. So if the order is non zero, the step size passed in will be 2order times it.

So we insert an item at every index which is an integral multiple of 'step', between 0 to size, and tag it with TAG 0, then execute tagged followed by normal iteration.

static long long benchmark_iter(struct radix_tree_root *root, bool tagged)
{
        volatile unsigned long sink = 0;
        struct radix_tree_iter iter;
        struct timespec start, finish;
        long long nsec;
        int l, loops = 1;
        void **slot;

#ifdef BENCHMARK
again:
#endif
        clock_gettime(CLOCK_MONOTONIC, &start);
        for (l = 0; l < loops; l++) {
                if (tagged) {
                        radix_tree_for_each_tagged(slot, root, &iter, 0, 0)
                                sink ^= (unsigned long)slot;
                } else {
                        radix_tree_for_each_slot(slot, root, &iter, 0)
                                sink ^= (unsigned long)slot;
                }
        }
        clock_gettime(CLOCK_MONOTONIC, &finish);

        nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC +
               (finish.tv_nsec - start.tv_nsec);

#ifdef BENCHMARK
        if (loops == 1 && nsec * 5 < NSEC_PER_SEC) {
                loops = NSEC_PER_SEC / nsec / 4 + 1;
                goto again;
        }
#endif

        nsec /= loops;
        return nsec;
}

By default these tests are executed with RADIX_TREE_MAP_SHIFT value of 3, but if we want to get performance comparable to in kernel performance, we can compile using BENCHMARK=1 which sets RADIX_TREE_MAP_SHIFT to 6.

We track the time elapsed for the iteration in nanoseconds in variable nsec. If RADIX_TREE_MAP_SHIFT is 6, i.e. benchmark is 1, we may get nsec too small (less than 0.2th fraction of NSEC_PER_SEC), in which case we want to measure time elapsed for multiple iterations and take the average. The number of iterations is decided by how small nsec initially is. 

No comments:

Post a Comment