Improving the performance of a double loop

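I'm working on an algorithm that recommends restaurants to customers. The recommendations are based on a few filters, but mainly on comparing the reviews people have left for restaurants. (I'll spare you the details.)

To calculate the Pearson correlation (a number that indicates how well two users match each other), I have to check where both users left a review for the same restaurant. To increase the number of matches, I also count a match on the restaurant's price range. I'll try to explain; this is my Restaurant class: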

public class Restaurant
{
    public Guid Id { get; set; }
    public int PriceRange { get; set; }
}

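This is a simplified version, but it is enough for my example. The price range is an integer from 1 to 5 that indicates how expensive the restaurant is.

This is the loop I use to check whether the two users left a review for the same restaurant, or for restaurants with the same price range: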

//List<Review> user1Reviews is a list of all reviews from the first user
//List<Review> user2Reviews is a list of all reviews from the second user
Dictionary<Review, Review> shared_items = new Dictionary<Review, Review>();
    foreach (var review1 in user1Reviews)
        foreach (var review2 in user2Reviews)
            if (review1.Restaurant.Id == review2.Restaurant.Id || 
                review1.Restaurant.PriceRange == review2.Restaurant.PriceRange)
                if (!shared_items.ContainsKey(review1))
                    shared_items.Add(review1, review2);

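Now for my actual question. As you can see, I loop over the second list for every review the first user has left. Is there a way to improve the performance of these loops? I tried using a HashSet and its Contains() method, but I need to include more criteria (namely the price range), and I couldn't figure out how to fit that into a HashSet.

I hope this isn't too confusing, and thanks in advance for any help!

Edit: After testing both LINQ and the for loops, I have concluded that the for loops are twice as fast as LINQ. Thanks for your help!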

2 Answers

You could try replacing the inner loop with a LINQ query that uses the outer loop's condition:

foreach (var review1 in user1Reviews)
{
    var review2 = user2Reviews.FirstOrDefault(r2 => r2.Restaurant.Id == review1.Restaurant.Id ||
                                            r2.Restaurant.PriceRange == review1.Restaurant.PriceRange);
    if (review2 != null)
    {
        if (!shared_items.ContainsKey(review1))
            shared_items.Add(review1, review2);
    }
}

If there can be multiple matches, you should use Where instead and handle the resulting list.
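As a rough illustration of the Where variant, the sketch below collects every matching review instead of only the first one. The minimal Review class and the AllMatches.Find helper are assumptions made for this example; the question does not show the real Review type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Restaurant
{
    public Guid Id { get; set; }
    public int PriceRange { get; set; }
}

// Hypothetical minimal Review type, assumed for illustration only.
public class Review
{
    public Restaurant Restaurant { get; set; }
}

public static class AllMatches
{
    // Maps each review of user 1 to all reviews of user 2 that are for the
    // same restaurant or for a restaurant in the same price range.
    public static Dictionary<Review, List<Review>> Find(
        List<Review> user1Reviews, List<Review> user2Reviews)
    {
        var matches = new Dictionary<Review, List<Review>>();
        foreach (var review1 in user1Reviews)
        {
            var found = user2Reviews
                .Where(r2 => r2.Restaurant.Id == review1.Restaurant.Id ||
                             r2.Restaurant.PriceRange == review1.Restaurant.PriceRange)
                .ToList();

            // Only keep reviews that have at least one counterpart.
            if (found.Count > 0)
                matches[review1] = found;
        }
        return matches;
    }
}
```

This still scans user2Reviews once per review of user 1, so it changes what is collected, not the amount of work done.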

I'm not sure it will be any faster, though, because you still have to check every user1 review against the user2 reviews.

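However, if you write a custom comparer for your Restaurant class, you can use the Intersect overload that takes a comparer to get the restaurants both users have reviewed:

var commonRestaurants = user1Reviews.Select(r => r.Restaurant)
                                    .Intersect(user2Reviews.Select(r => r.Restaurant), new RestaurantComparer());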

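The RestaurantComparer would look like this (note that it treats two restaurants as equal only when both Id and PriceRange match, so it does not pick up price-range-only matches):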

// Custom comparer for the Restaurant class
class RestaurantComparer : IEqualityComparer<Restaurant>
{
    // Restaurants are equal if their ids and price ranges are equal.
    public bool Equals(Restaurant x, Restaurant y)
    {
        //Check whether the compared objects reference the same data.
        if (Object.ReferenceEquals(x, y)) return true;

        //Check whether any of the compared objects is null.
        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
            return false;

        //Check whether the properties are equal.
        return x.Id == y.Id && x.PriceRange == y.PriceRange;
    }

    // If Equals() returns true for a pair of objects 
    // then GetHashCode() must return the same value for these objects.

    public int GetHashCode(Restaurant restaurant)
    {
        //Check whether the object is null
        if (Object.ReferenceEquals(restaurant, null)) return 0;

        //Get hash code for the Id field.
        int hashId = restaurant.Id.GetHashCode();

        //Get hash code for the PriceRange field.
        int hashPriceRange = restaurant.PriceRange.GetHashCode();

        //Calculate the hash code for the restaurant.
        return hashId ^ hashPriceRange;
    }
}

Answered 2024-05-01T00:36:52+08:00

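You basically need a fast way to look up a review by Id or PriceRange. Normally you would use a fast hash-based lookup structure such as Dictionary<TKey, TValue> for a single key, or with a composite key if the match operation were "and". Unfortunately yours is "or", so a Dictionary doesn't work.

Well, not quite. A single dictionary doesn't work, but you can use two dictionaries, and since dictionary lookups are O(1), the matching pass is still O(N) once the dictionaries are built in O(M) (rather than O(N * M) as with the inner loop or naive LINQ).

Since the keys are not unique, you can use lookups instead of dictionaries and keep the same efficiency: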
```
var lookup1 = user2Reviews.ToLookup(r => r.Restaurant.Id);
var lookup2 = user2Reviews.ToLookup(r => r.Restaurant.PriceRange);
foreach (var review1 in user1Reviews)
{
    var review2 = lookup1[review1.Restaurant.Id].FirstOrDefault() ??
                  lookup2[review1.Restaurant.PriceRange].FirstOrDefault();
    if (review2 != null)
    {
        // do something
    }
}
```
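For anyone who wants to try this end to end, here is a minimal self-contained sketch of the two-lookup idea that fills the same kind of shared-items dictionary as the question. The bare-bones Review class and the ReviewMatcher.FindSharedItems name are assumptions for illustration; adapt them to your real types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Restaurant
{
    public Guid Id { get; set; }
    public int PriceRange { get; set; }
}

// Hypothetical minimal Review type; the question does not show the real one.
public class Review
{
    public Restaurant Restaurant { get; set; }
}

public static class ReviewMatcher
{
    // Pairs each review of user 1 with one review of user 2: a review of the
    // same restaurant if there is one, otherwise one in the same price range.
    public static Dictionary<Review, Review> FindSharedItems(
        IEnumerable<Review> user1Reviews, IEnumerable<Review> user2Reviews)
    {
        // Materialize once so the source is not enumerated twice.
        var second = user2Reviews.ToList();
        var byId = second.ToLookup(r => r.Restaurant.Id);
        var byPriceRange = second.ToLookup(r => r.Restaurant.PriceRange);

        var sharedItems = new Dictionary<Review, Review>();
        foreach (var review1 in user1Reviews)
        {
            // A lookup returns an empty sequence for a missing key,
            // so FirstOrDefault() yields null when nothing matches.
            var review2 = byId[review1.Restaurant.Id].FirstOrDefault()
                          ?? byPriceRange[review1.Restaurant.PriceRange].FirstOrDefault();

            if (review2 != null && !sharedItems.ContainsKey(review1))
                sharedItems.Add(review1, review2);
        }
        return sharedItems;
    }
}
```

One behavioural difference from the nested loop: this version prefers a same-restaurant match over a same-price-range match, whereas the nested loop takes whichever matching review comes first in user2Reviews. If that ordering matters to you, this sketch would need adjusting.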
Answered 2024-05-01T00:36:52+08:00
