How do I apply an inner join to two dataframes of 100k rows each? My machine has 8 GB of RAM and I am using Dask, but it still hangs. What is the correct solution?
import pandas as pd
import numpy as np
import dask.dataframe as dd
import time
start=time.time()
SData = dd.read_csv("KD_111.csv")
TData = dd.read_csv("KD_111_T.csv")
# Build a composite join key; assumes all four columns are strings
SData["Unique"] = SData["OrderDate"] + SData["Region"] + SData["Rep"] + SData["Item"]
TData["Unique"] = TData["OrderDate"] + TData["Region"] + TData["Rep"] + TData["Item"]
SData=SData.set_index("Unique")
TData=TData.set_index("Unique")
#Data1=SData.groupby(SData.index)
#Data2=TData.groupby(TData.index)
Data = dd.merge(SData, TData, left_index=True, right_index=True, how="inner")
#print(Data.columns)
Data1=Data.loc[:,:"Total_x"]
Data2=Data.loc[:,"OrderDate_y":]
print(Data1.compute())
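One thing worth noting: 100k rows fits comfortably in 8 GB of RAM, so plain pandas may be enough here, and merging on the key columns directly avoids both the concatenated "Unique" column and the expensive `set_index` shuffle that Dask performs. Below is a minimal sketch with hypothetical sample data standing in for `KD_111.csv` and `KD_111_T.csv`; the column names are taken from the code above.

```python
import pandas as pd

# Hypothetical rows standing in for KD_111.csv (left) and KD_111_T.csv (right);
# only the column names come from the original code.
left = pd.DataFrame({
    "OrderDate": ["2024-01-01", "2024-01-02"],
    "Region": ["East", "West"],
    "Rep": ["Jones", "Smith"],
    "Item": ["Pen", "Desk"],
    "Total": [10.0, 20.0],
})
right = pd.DataFrame({
    "OrderDate": ["2024-01-01", "2024-01-03"],
    "Region": ["East", "North"],
    "Rep": ["Jones", "Kivell"],
    "Item": ["Pen", "Binder"],
    "Total": [15.0, 30.0],
})

# Inner join on the key columns directly -- no concatenated key,
# no set_index, no shuffle.
merged = pd.merge(
    left, right,
    on=["OrderDate", "Region", "Rep", "Item"],
    how="inner",
    suffixes=("_x", "_y"),
)
print(merged)  # one matching row, with Total_x and Total_y columns
```

If the full data really does not fit in memory, the same `on=[...]` form works with `dd.merge` and is typically cheaper than building an index first.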