Best fit from a set of curves to data points

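I have a set of curves F={f1, f2, f3,..., fN}, each of which is defined by a set of points, i.e. I don't have the explicit form of the functions. So I have a set of N tables like this: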

#f1: x  y
1.2  0.5
0.6  5.6
0.3  1.2
...

#f2: x  y
0.3  0.1
1.2  4.1
0.8  2.2
...

#fN: x  y
0.7  0.3
0.3  1.1
0.1  0.4
...

I also have a set of observed/measured data points O=[p1, p2, p3,..., pM], where each point has x, y coordinates and a given weight in [0, 1], so it looks like:

#O: x  y  w
0.2  1.6  0.5
0.3  0.7  0.3
0.1  0.9  0.8
...

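Since N ~ 10000 (I have a lot of functions), what I'm looking for is an efficient (more precisely: fast) way to find the curve that best fits my observed, weighted points O.

I know how to find the best fit in Python when I have the explicit form of the function (scipy.optimize.curve_fit), but how can I do it when the functions are defined as tables?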

3 Answers

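Here is one possible solution. It combines some of the comments on the original post with @elyase's solution above. @elyase provided a way to interpolate between the points of each function. Given that, and taking the best fit to be the one with the smallest weighted sum of squares, I think the following is what you want: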
```
import sys

# Here a model is an interpolated function as per @elyase's solution above;
# model.get_y(x) is assumed to return the interpolated y value at x.
def best_fit(models, data):
    # data is an iterable of (x, y, weight) tuples
    min_score = sys.float_info.max
    best_model = None
    for model in models:
        score = 0.0
        for x, y, w in data:
            # weighted squared residual of this observation
            score += w * (y - model.get_y(x)) ** 2
        if score < min_score:
            min_score = score  # remember the best score seen so far
            best_model = model
    return best_model
```
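The loop above computes the weighted sum of squares S(f) = sum_j w_j * (y_j - f(x_j))**2 one point at a time. A minimal NumPy sketch of the same criterion follows, assuming each curve is kept as the raw (x, y) table from the question and that piecewise-linear interpolation with np.interp is an acceptable stand-in for the spline model. The function name best_fit_index and the toy data are illustrative only.

```python
import numpy as np


def best_fit_index(tables, obs):
    """Return (index, scores) for the curve minimising
    S(f) = sum_j w_j * (y_j - f(x_j))**2 over the observations.

    tables: list of (x, y) array pairs, one per curve; rows may be unsorted
    obs:    array of shape (M, 3) with columns x, y, w
    """
    x_obs, y_obs, w_obs = obs[:, 0], obs[:, 1], obs[:, 2]
    scores = np.empty(len(tables))
    for i, (x, y) in enumerate(tables):
        order = np.argsort(x)  # np.interp expects increasing x
        # piecewise-linear interpolation; outside the table's x range
        # np.interp returns the first/last y value (no extrapolation)
        f_at_obs = np.interp(x_obs, x[order], y[order])
        scores[i] = np.sum(w_obs * (y_obs - f_at_obs) ** 2)
    return int(np.argmin(scores)), scores


# Tiny made-up example: curve 0 is y = x, curve 1 is y = x**2 with shuffled rows.
tables = [
    (np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0, 2.0])),
    (np.array([2.0, 0.0, 1.0]), np.array([4.0, 0.0, 1.0])),
]
obs = np.array([[0.5, 0.5, 1.0],
                [1.5, 1.5, 0.5]])
best, scores = best_fit_index(tables, obs)
print(best, scores)
```

Only the loop over the N curves stays in Python; the work over the M observations is vectorized. Note that np.interp clamps to the end values instead of extrapolating, which may or may not suit curves whose tables do not cover every observed x.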
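You mention that you need a "fast" solution. Based on your reply above, doing the above for each set of data comes to roughly 2 million iterations in total. Even in Python this should not take more than a few seconds. Is that fast enough?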

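If not, things get much more complicated. For example, you could try storing the models (what you call functions above) in sorted order, such that model1 > model2 if model1(x) > model2(x) holds for all x (given the interpolation above). This only defines a partial order, but if your models have the right properties it may be enough to be very useful. Given that ordering, you could do something similar to a binary search. Alternatively, you could use branch and bound, where the bound is given by the distance between the first value in the data and the first value in the function. Depending on the nature of your functions and data, this may or may not help. If you only need an answer that is close to, but not necessarily, the optimum, you could also consider approximate solutions. In short, to go beyond the trivial answer above, I think we need to know more about your time constraints, your data and your models.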
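The branch-and-bound idea can be illustrated with a simpler relative, early abandoning. This is a swapped-in technique, not the ordering or first-value bound described above. Because every term w_j * (y_j - f(x_j))**2 is non-negative, a curve's partial sum is a lower bound on its total, so summation can stop as soon as it reaches the best score found so far, and the winner is the same as with the brute-force search. The sketch assumes every curve has already been evaluated at the observed x values (for instance by interpolation); the function name is hypothetical and the random data only serve as a check.

```python
import numpy as np


def best_fit_early_abandon(curves_at_obs, y_obs, w_obs):
    """Exact weighted least-squares best fit with early abandoning.

    curves_at_obs: (N, M) array, curve i already evaluated at the M observed x values
    y_obs, w_obs:  (M,) arrays of observed y values and weights
    """
    # Visit heavily weighted points first so poor curves exceed the bound sooner.
    order = np.argsort(-w_obs)
    vals, y, w = curves_at_obs[:, order], y_obs[order], w_obs[order]
    best_i, best_score = -1, np.inf
    for i in range(vals.shape[0]):
        score = 0.0
        for j in range(vals.shape[1]):
            score += w[j] * (y[j] - vals[i, j]) ** 2
            # All terms are >= 0, so the partial sum is a lower bound on the
            # final score: once it reaches the best so far, curve i cannot win.
            if score >= best_score:
                break
        else:  # loop finished without breaking: strictly better than the best so far
            best_i, best_score = i, score
    return best_i, best_score


rng = np.random.default_rng(0)
curves_at_obs = rng.random((1000, 50))
y_obs = rng.random(50)
w_obs = rng.random(50)
best_i, best_score = best_fit_early_abandon(curves_at_obs, y_obs, w_obs)

# Brute-force check: the full weighted sums pick the same curve.
full = (w_obs * (y_obs - curves_at_obs) ** 2).sum(axis=1)
print(best_i, int(np.argmin(full)))
```

How much this saves depends on the data. In pure Python the per-element loop can easily be slower than one vectorized full sum, so the idea mainly pays off in compiled code or when M is large.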
Answered 2024-05-06T17:05:12+08:00
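You need two elements to get a fit: the data (which you already have) and a model space (linear models, Gaussian processes, support vector regression). In your case the model has an additional constraint, namely that some data points should be weighted differently from others. Maybe something like this works for you: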
```
import numpy as np
from scipy.interpolate import UnivariateSpline

temp = np.asarray([10, 9.6, 9.3, 9.0, 8.7])
height = np.asarray([129, 145, 167, 190, 213])
# fit a smoothing spline temp(height); height must be increasing
f = UnivariateSpline(height, temp)
```
Now you can evaluate f anywhere:
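```
import matplotlib.pyplot as plt

test_points = np.arange(120, 213, 5)
# original data as circles, spline values at the test points as crosses
plt.plot(height, temp, 'o', test_points, f(test_points), 'x')
plt.show()
```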
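The tables in the question list their rows in no particular x order, while UnivariateSpline expects increasing x. Also, with the default smoothing factor s the spline may not pass exactly through the tabulated points. A minimal sketch for turning one table into a callable, assuming the x values within a table are distinct; table_to_function is a hypothetical helper name, and the data are the three rows of f1 shown in the question.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def table_to_function(x, y, k=1):
    """Turn one curve table (rows in any order) into a callable.

    s=0 forces the spline through every tabulated point; k=1 gives
    piecewise-linear interpolation (k=3 gives a cubic spline and needs >= 4 rows).
    Assumes the x values in the table are distinct.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)  # UnivariateSpline requires increasing x
    return UnivariateSpline(x[order], y[order], k=k, s=0)


# The rows of table f1 shown in the question (further rows are elided there).
f1 = table_to_function([1.2, 0.6, 0.3], [0.5, 5.6, 1.2])
vals = f1(np.array([0.3, 0.45, 1.2]))
print(vals)
```

With k=1 and s=0 this is plain linear interpolation between neighbouring rows. Outside the table's x range UnivariateSpline extrapolates by default (ext=0), which may be undesirable for curves that are only known on a limited interval.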
Answered 2024-05-06T17:05:12+08:00

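Here is the approach I would suggest:

1. Put all the functions in one NumPy array.
2. Compute the squared distance between each point in the test data and each point in each function (you could also compute the exact distance, but sqrt is expensive).
3. Compute the error as the weighted sum of the distances (or modify it to your liking).
4. Find the minimum error.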

For example:

```
import numpy as np

# define an array of N=3 functions, all sampled at the same x values
funcs = np.array([
    [[0, 1, 2, 3, 4, 5],  # x1
     [0, 1, 2, 1, 0, 0]], # y1
    [[0, 1, 2, 3, 4, 5],  # x2
     [0, 0, 0, 1, 2, 3]], # y2
    [[0, 1, 2, 3, 4, 5],  # x3
     [5, 4, 3, 2, 1, 0]]  # y3
    ], dtype=float)

# define the test data and weights with the same
# dimensions as the function array
data = np.array([
    [[0, 1, 2, 3, 4, 5],  # x
     [0, 1, 2, 2, 1, 0]]  # y
    ], dtype=float)

weight = np.array([
    [0.1, 0.2, 0.3, 0, 0, 0]  # w
    ])

# squared distance between each data point and the matching point of each
# function (data broadcasts over the N functions; sums dx**2 + dy**2)
dist = ((funcs - data) ** 2).sum(axis=1)

# compute the weighted error for every function
err = (dist * weight).sum(axis=1)

print("Errors:", err)
print("Best fit:", np.argmin(err))
```
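The example above relies on every function being sampled at exactly the same x values as the data, which the question's tables are not. One way to follow the second step of the list literally, without interpolating, is to compute all pairwise squared distances and let each observed point count only its nearest tabulated point. A sketch under that assumption; nearest_point_errors is a hypothetical name and the toy data are made up.

```python
import numpy as np


def nearest_point_errors(curves, obs):
    """Weighted error of each curve, using the nearest tabulated point.

    curves: list of arrays of shape (K_i, 2) holding the (x, y) rows of curve i
    obs:    array of shape (M, 3) with columns x, y, w
    Returns an array with one error per curve.
    """
    pts, w = obs[:, :2], obs[:, 2]
    errors = np.empty(len(curves))
    for i, c in enumerate(curves):
        # (M, K_i) squared distances between every data point and every curve point
        d2 = ((pts[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        # each data point only counts its closest point on the curve
        errors[i] = (w * d2.min(axis=1)).sum()
    return errors


curves = [
    np.array([[0, 0], [1, 1], [2, 2]], dtype=float),  # points on y = x
    np.array([[0, 2], [1, 2], [2, 2]], dtype=float),  # points on y = 2
]
obs = np.array([[0.1, 0.0, 1.0],
                [2.0, 2.0, 0.5]])
errors = nearest_point_errors(curves, obs)
print("Errors:", errors, "Best fit:", int(np.argmin(errors)))
```

This measures the distance to the nearest sample, not to the curve between samples, so sparsely tabulated curves are penalized. It also mixes x and y units in one distance, so rescaling the axes may change which curve wins.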

Answered 2024-05-06T17:05:12+08:00
