Performance considerations of Haskell FFI / C?

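If I use Haskell as a library that is called from my C program, what is the performance impact of making calls into it? For example, say I have a problem with a world data set of 20kB, and I want to run something like: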

// Go through my 1000 actors and have them make a decision based on
// HaskellCode() function, which is compiled Haskell I'm accessing through
// the FFI.  As an argument, send in the SAME 20kB of data to EACH of these
// function calls, and some actor specific data
// The 20kB constant data defines the environment and the actor specific
// data could be their personality or state
for (i = 0; i < 1000; i++)
    actor[i].decision = HaskellCode(world_data, actor[i].personality);  // world_data: the shared 20kB blob

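What happens here: is it possible to keep that 20kB of data as a global immutable reference that the Haskell code can access, or must I create a copy of the data on every call?

The concern is that this data could be larger, much larger. I also hope to write algorithms that work on much larger data sets, using the same pattern of immutable data shared across many calls into the Haskell code.

Also, I would like to parallelize this, like dispatch_apply() in GCD or Parallel.ForEach(..) in C#. My rationale for parallelizing outside of Haskell is that I know I will always be running many separate function calls, i.e. 1000 actors, so fine-grained parallelism inside the Haskell function is no better than managing it at the C level. Is running FFI Haskell instances "thread safe", and how do I achieve that: do I need to initialize a Haskell instance every time I kick off a parallel run? (That seems slow if I must...) How do I achieve this with good performance?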

4 Answers

9
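"what is the performance impact of making calls into it"

Assuming you start the Haskell runtime only once (one hs_init call at startup), on my machine a function call from C into Haskell, passing an Int back and forth across the boundary, takes about 80,000 cycles (31,000 ns on my Core 2). This was determined experimentally via the rdtsc time-stamp counter.

"is it possible to keep that 20kB of data as a global immutable reference that the Haskell code can access"

Yes, that is certainly possible. If the data really is immutable, then you get the same result whether you:
- thread the data back and forth across the language boundary by marshalling;
- pass a reference to the data back and forth;
- or cache it in an IORef on the Haskell side.
Which strategy is best? It depends on the data type. The most idiomatic way is to pass a reference to the C data back and forth, treating it as a ByteString or Vector on the Haskell side.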
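
As a minimal sketch of the pass-a-reference option, assuming the environment is a plain byte buffer: the exported function below wraps the caller's pointer in a ByteString with unsafePackCStringLen, which does not copy the data. The names decide and decidePure and the byte-summing rule are hypothetical placeholders, and main only stands in for the C caller.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafePackCStringLen)
import Foreign.C.Types (CChar, CInt)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (Ptr)

-- Hypothetical decision rule: sum of the environment bytes plus the
-- actor's personality value. It stands in for the real logic.
decidePure :: B.ByteString -> CInt -> CInt
decidePure env personality =
  B.foldl' (\acc w -> acc + fromIntegral w) personality env

-- Exported entry point. The ByteString is a view onto the caller's buffer,
-- so the C side must keep it alive and unmodified for the whole call.
decide :: Ptr CChar -> CInt -> CInt -> IO CInt
decide buf len personality = do
  env <- unsafePackCStringLen (buf, fromIntegral len)
  -- Force the result before returning so no thunk outlives the buffer.
  return $! decidePure env personality

foreign export ccall decide :: Ptr CChar -> CInt -> CInt -> IO CInt

-- Stand-in for the C caller: a 4-byte "environment" and personality 10.
main :: IO ()
main =
  withArrayLen [1, 2, 3, 4] $ \n buf -> do
    d <- decide buf (fromIntegral n) 10
    print d  -- prints 20
```

Because the view is only valid while the buffer lives, the sketch forces the result with $! before returning. If the Haskell side needed to hold on to the data between calls, copying it once (for example with packCStringLen) would be the safer choice.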

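"I would like to parallelize this"

I'd strongly recommend inverting the control and doing the parallelization from the Haskell runtime. It will be much more robust, as that path has been very heavily tested.

Regarding thread safety, it is apparently safe to make parallel calls to foreign exported functions running in the same runtime, though I'm fairly sure no one has tried this in order to gain parallelism. Each call in acquires a capability, which is essentially a lock, so multiple calls may block, reducing your chances for parallelism. In the multicore case (e.g. -N4 or so) your results may differ, since multiple capabilities are available, but this is almost certainly a bad way to improve performance.

Again, making many parallel function calls from Haskell via forkIO is a better documented, better tested path, with less overhead than doing the work on the C side, and probably less code in the end.

Just make one call into your Haskell function, which in turn does the parallel work via many Haskell threads. Easy!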
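
A rough sketch of that forkIO fan-out, using only base: one lightweight thread per actor, each writing its result into its own MVar, with all threads sharing the same immutable environment. The names decideOne and decideAllParallel and the toy decision rule are assumptions for illustration, not part of the original answer.

```haskell
module Main (main) where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- Hypothetical decision rule standing in for the real per-actor logic.
decideOne :: [Int] -> Int -> Int
decideOne env personality = sum env + personality

-- Fork one Haskell thread per actor, all sharing the same immutable
-- environment, then collect the results in actor order.
decideAllParallel :: [Int] -> [Int] -> IO [Int]
decideAllParallel env personalities = do
  slots <- mapM spawn personalities
  mapM takeMVar slots
  where
    spawn :: Int -> IO (MVar Int)
    spawn p = do
      slot <- newEmptyMVar
      -- ($!) makes the worker thread evaluate the decision itself,
      -- instead of handing an unevaluated thunk back to the collector.
      _ <- forkIO (putMVar slot $! decideOne env p)
      return slot

main :: IO ()
main = do
  decisions <- decideAllParallel [1 .. 100] [0 .. 999]
  print (take 3 decisions)  -- prints [5050,5051,5052]
```

To actually use several cores, the program has to be linked with -threaded and run with +RTS -N (or -N4 and so on). With one capability the threads still run, just interleaved on a single core.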
Answered 2024-04-20T09:45:12+08:00

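I use a mix of C and Haskell threads in one of my applications and haven't noticed much of a performance hit when switching between the two. So I put together a simple benchmark... which comes out considerably faster/cheaper than Don's figure (roughly 220 to 240 ns per call versus 31,000 ns). This measures 10 million iterations on a 2.66GHz i7: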

$ ./foo
IO  : 2381952795 nanoseconds total, 238.195279 nanoseconds per, 160000000 value
Pure: 2188546976 nanoseconds total, 218.854698 nanoseconds per, 160000000 value

Compiled with GHC 7.0.3 / x86_64 and gcc-4.2.1 on OSX 10.6:

ghc -no-hs-main -lstdc++ -O2 -optc-O2 -o foo ForeignExportCost.hs Driver.cpp

Haskell:

{-# LANGUAGE ForeignFunctionInterface #-}

module ForeignExportCost where

import Foreign.C.Types

foreign export ccall simpleFunction :: CInt -> CInt
simpleFunction i = i * i

foreign export ccall simpleFunctionIO :: CInt -> IO CInt
simpleFunctionIO i = return (i * i)

An OSX C++ app (Driver.cpp) to drive it, which should be easy to adapt to Windows or Linux:

#include <stdio.h>
#include <mach/mach_time.h>
#include <mach/kern_return.h>
#include <HsFFI.h>
#include "ForeignExportCost_stub.h"

static const int s_loop = 10000000;

int main(int argc, char** argv) {
    hs_init(&argc, &argv);

    struct mach_timebase_info timebase_info = { };
    kern_return_t err;
    err = mach_timebase_info(&timebase_info);
    if (err != KERN_SUCCESS) {
        fprintf(stderr, "error: %x\n", err);
        return err;
    }

    // timing a function in IO
    uint64_t start = mach_absolute_time();
    HsInt32 val = 0;
    for (int i = 0; i < s_loop; ++i) {
        val += simpleFunctionIO(4);
    }

    // in nanoseconds per http://developer.apple.com/library/mac/#qa/qa1398/_index.html
    uint64_t duration = (mach_absolute_time() - start) * timebase_info.numer / timebase_info.denom;
    double duration_per = static_cast<double>(duration) / s_loop;
    printf("IO  : %llu nanoseconds total, %f nanoseconds per, %d value\n", static_cast<unsigned long long>(duration), duration_per, val);

    // run the loop again with a pure function
    start = mach_absolute_time();
    val = 0;
    for (int i = 0; i < s_loop; ++i) {
        val += simpleFunction(4);
    }

    duration = (mach_absolute_time() - start) * timebase_info.numer / timebase_info.denom;
    duration_per = static_cast<double>(duration) / s_loop;
    printf("Pure: %llu nanoseconds total, %f nanoseconds per, %d value\n", static_cast<unsigned long long>(duration), duration_per, val);

    hs_exit();
}

Answered 2024-04-20T09:45:12+08:00

3

Haskell can peek into that 20k blob if you pass the pointer.
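
For illustration, a small sketch of what peeking through a passed pointer can look like with Foreign.Storable. It assumes, purely as an example, that the blob is a flat array of 32-bit cells; lookupCell is a hypothetical name, and main stands in for the C caller by allocating a temporary array.

```haskell
module Main (main) where

import Data.Int (Int32)
import Foreign.Marshal.Array (withArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, peekElemOff)

-- Hypothetical layout: the blob is a flat array of 32-bit cells.
-- Reads cell i directly from the caller's memory; nothing is copied.
lookupCell :: Ptr Int32 -> Int -> IO Int32
lookupCell blob i = peekElemOff blob i

main :: IO ()
main =
  withArray [10, 20, 30, 40 :: Int32] $ \blob -> do
    x <- lookupCell blob 2              -- element offset 2
    y <- peekByteOff blob 8 :: IO Int32 -- same cell, addressed in bytes
    print (x, y)  -- prints (30,30)
```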

Answered 2024-04-20T09:45:12+08:00
20
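Disclaimer: I have no experience with the FFI.

But it seems to me that if you want to reuse the 20 kB of data so that you are not passing it on every call, you could simply have a function that takes a list of "personalities" and returns a list of "decisions".

So if you have a function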
```
f :: LotsaData -> Personality -> Decision
f env p = ...
```
then why not make a helper function
```
helper :: LotsaData -> [Personality] -> [Decision]
helper env ps = map (f env) ps
```
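and call that instead? With this approach, though, if you want to parallelize you would need to do it on the Haskell side, with parallel lists and a parallel map.

I defer to the experts to explain whether and how C arrays can easily be marshalled into Haskell lists (or a similar structure).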
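
One possible shape for such a batch call, sketched with peekArray and pokeArray from Foreign.Marshal.Array: the C arrays are copied into Haskell lists once per call, and the decisions are written into an output array supplied by the caller. The element type CInt, the name decideBatch and the toy rule inside f are assumptions; main only simulates the C side.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CInt)
import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray, withArrayLen)
import Foreign.Ptr (Ptr)

-- Hypothetical stand-in for f :: LotsaData -> Personality -> Decision.
f :: [CInt] -> CInt -> CInt
f env p = sum env + p

-- Batch entry point: marshal the environment into a list once, map over
-- every personality, and write the decisions to the caller's output array
-- (which must have room for n elements).
decideBatch :: Ptr CInt -> CInt -> Ptr CInt -> CInt -> Ptr CInt -> IO ()
decideBatch envPtr envLen persPtr n outPtr = do
  env <- peekArray (fromIntegral envLen) envPtr
  ps  <- peekArray (fromIntegral n) persPtr
  pokeArray outPtr (map (f env) ps)

foreign export ccall decideBatch
  :: Ptr CInt -> CInt -> Ptr CInt -> CInt -> Ptr CInt -> IO ()

-- Stand-in for the C caller.
main :: IO ()
main =
  withArrayLen [1, 2, 3] $ \envLen envPtr ->
    withArrayLen [10, 20] $ \n persPtr ->
      allocaArray n $ \outPtr -> do
        decideBatch envPtr (fromIntegral envLen) persPtr (fromIntegral n) outPtr
        out <- peekArray n outPtr
        print out  -- prints [16,26]
```

peekArray copies, so this costs one pass over the environment per batch rather than per actor. For large data a non-copying view, such as a ByteString or storable Vector over the same pointer, would avoid even that copy.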
Answered 2024-04-20T09:45:12+08:00
