What is the dask-ml equivalent of scikit-learn's GroupShuffleSplit?

I would like to split my data so that no person has observations in both the training and the test set. In scikit-learn I would do something like this, using GroupShuffleSplit:

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.array([0.1, 0.2, 2.2, 2.4, 2.3, 4.55, 5.8, 0.001])
y = np.array(["a", "b", "b", "b", "c", "c", "c", "a"])
groups = np.array([1, 1, 2, 2, 3, 3, 4, 4])

gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train, test = next(gss.split(X, y, groups=groups))

X_train, y_train = X[train], y[train]
X_test,  y_test  = X[test],  y[test]

How can I do this with Dask or Dask-ML?
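For reference, here is a minimal sketch of what such a split has to do, written with NumPy only: shuffle the unique group ids, reserve a fraction of them for testing, and turn that choice into boolean row masks. The helper name `group_shuffle_masks` and its parameters are made up for illustration and are not part of scikit-learn or dask-ml; rounding the number of test groups up is also an assumption of this sketch.

```python
import numpy as np

def group_shuffle_masks(groups, test_size=0.2, seed=0):
    """Hypothetical helper: boolean (train_mask, test_mask) with no shared group."""
    rng = np.random.default_rng(seed)
    unique_groups = np.unique(groups)
    # Assumption: round the number of test groups up, keeping at least one.
    n_test = max(1, int(np.ceil(test_size * len(unique_groups))))
    # Shuffle the group ids (not the rows) and take the first n_test as test groups.
    test_groups = rng.permutation(unique_groups)[:n_test]
    # A row goes to the test set exactly when its group was selected.
    test_mask = np.isin(groups, test_groups)
    return ~test_mask, test_mask

X = np.array([0.1, 0.2, 2.2, 2.4, 2.3, 4.55, 5.8, 0.001])
y = np.array(["a", "b", "b", "b", "c", "c", "c", "a"])
groups = np.array([1, 1, 2, 2, 3, 3, 4, 4])

train_mask, test_mask = group_shuffle_masks(groups, test_size=0.2, seed=0)
X_train, y_train = X[train_mask], y[train_mask]
X_test, y_test = X[test_mask], y[test_mask]
```

Only the list of unique group ids has to fit in memory, so the same idea should carry over to Dask collections: I believe `dask.array.isin` can build the mask lazily for a Dask array, and `Series.isin` can do the same for a Dask DataFrame column. Note that boolean indexing on a Dask array leaves the chunk sizes unknown until they are computed. I have not tested this against dask-ml itself.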
