Home Articles

使用vb.net在大型excel文件上执行SQL查询的最佳方法是什么?

Asked
Viewed 686 times
4

Setup Environment:

我正在使用带有.NET framework 4的vb.net开发Excel 2010应用程序级外接程序 .

My goal:

  • 让用户输入多个名称进行搜索

  • 使用名称列表在LARGE电子表格上执行SQL查询(30,000行)

  • 返回记录集并粘贴到新工作表中

表现是我的首要任务 . 我想通过利用.NET框架了解最快的方法 .

在我的代码中使用ADO连接对象可以工作,但这个过程需要很长时间(5 - 8秒) .

This is the SQL query I'm using on the table named wells:

SELECT * 
    FROM wells 
    WHERE padgroup in 

    (SELECT padgroup 
     FROM wells 
     WHERE name LIKE 'TOMCHUCK 21-30'
             OR name LIKE 'FEDERAL 41-25PH')

Here's a portion of what the table looks like:

Excel Table

I'm using this code right now to create an ADO Connection Object to retrieve my results:

'Create Recordset Object
    rsCon = CreateObject("ADODB.Connection")
    rsData = CreateObject("ADODB.Recordset")

    rsCon.Open(szConnect)
    rsData.Open(mySQLQueryToExecute, rsCon, 0, 1, 1)

    'Check to make sure data is received, then copy the data
    If Not rsData.EOF Then

        TargetRange.Cells(1, 1).CopyFromRecordset(rsData)

    Else

        MsgBox("No records returned from : " & SourceFile, vbCritical)

    End If

    'Clean up the Recordset object
    rsData.Close()
    rsData = Nothing
    rsCon.Close()
    rsCon = Nothing

根据我所知,Excel电子表格以Open XML格式存储,.NET框架包括解析XML的本机支持 .

After researching it, I came across a few different options:

有人可以提供一个指针,指出最好的方法吗?我真的很感激 .

Additional Notes:

  • 所有查询都需要能够在不连接到在线数据库的情况下执行

  • 我只需要访问一次电子表格就可以从行中提取原始数据

现在我只是将电子表格作为项目资源嵌入 .

然后,在运行时我创建文件,运行查询,将结果存储在内存中,并删除文件 .

'Create temp file path in the commonapplicationdata folder
    Dim excelsheetpath As StringBuilder

    excelsheetpath = New StringBuilder(Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData))

    excelsheetpath.Append("\MasterList.xlsm")

    'Save resources into temp location in HD
    System.IO.File.WriteAllBytes(excelsheetpath.ToString, My.Resources.MasterList)

    'Now call the function to use ADO to get records from the MasterList.xlsm file just created
    GetData(excelsheetpath.ToString, "Sheet1", "A1:S40000", True, False)

    'Store the results in-memory and display by adding to a datagridview control (in a custom task pane)

    'Delete the spreadsheet
    System.IO.File.Delete(excelsheetpath.ToString())

3 Answers

  • 2

    你正在以错误的方式做VSTO;)不要在Excel中使用SQL . 如果您需要速度,请利用VSTO和本机Excel API . 您可以跳过ADODB / OLEDB图层的开销,直接进入Excel对象模型,在Excel中使用超快速自动过滤器, SpecialCells 方法仅将可见单元格放入多区域范围,并使用 Value 方法快速复制范围到数组 .

    以下是VSTO 2010自定义工作簿示例,可快速搜索包含"aba","cat"或"zon"的a list of 58k words for words .

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Linq;
    using System.Text;
    using System.Windows.Forms;
    using System.Xml.Linq;
    using Microsoft.Office.Tools.Excel;
    using Microsoft.VisualStudio.Tools.Applications.Runtime;
    using Excel = Microsoft.Office.Interop.Excel;
    using Office = Microsoft.Office.Core;
    
    namespace ExcelWorkbook1
    {
        public partial class ThisWorkbook
        {
            private void ThisWorkbook_Startup(object sender, System.EventArgs e)
            {
                const int Sheet1 = 1; // you can use Linq to find a sheet by name if needed
                const int ColumnB = 2;
                List<List<object>> results = Query(Sheet1, ColumnB, "aba", "cat", "zon");
    
                foreach (List<object> record in results)
                {
                    System.Diagnostics.Debug.Print("{0,-10} {1,30} {2}", record[0], record[1], record[2]);
                }
            }
    
            private void ThisWorkbook_Shutdown(object sender, System.EventArgs e)
            {
            }
    
            /// <summary>
            ///     Removes any existing Excel autofilters from the worksheet
            /// </summary>
            private void ClearFilter(Microsoft.Office.Interop.Excel._Worksheet worksheet)
            {
                if (worksheet.AutoFilter != null)
                {
                    worksheet.Cells.AutoFilter();
                }
            }
    
            /// <summary>
            ///     Applies an Excel Autofilter to the worksheet for search for an array of substring predicates
            /// </summary>
            private void ApplyFilter(Microsoft.Office.Interop.Excel._Worksheet worksheet, int column, params string[] predicates)
            {
                string[] criteria = new string[predicates.Length];
                int i = 0;
    
                ClearFilter(worksheet);
    
                foreach (string value in predicates)
                {
                    criteria[i++] = String.Concat("=*", value, "*");
                }
    
                worksheet.Cells.AutoFilter(column, criteria, Excel.XlAutoFilterOperator.xlOr); 
            }
    
            /// <summary>
            ///     Returns a list of rows that are hits on a search for an array of substrings in Column B of Sheet1
            /// </summary>
            private List<List<object>> Query(int sheetIndex, int columnIndex, params string[] words)
            {
                Microsoft.Office.Interop.Excel._Worksheet worksheet;
                Excel.Range range;
                List<List<object>> records = new List<List<object>>();
                List<object> record;
                object[,] cells;
                object value;
                int row, column, rows, columns;
                bool hit;
    
                try
                {
                    worksheet = (Microsoft.Office.Interop.Excel._Worksheet)Globals.ThisWorkbook.Sheets[sheetIndex];
                    if (null == worksheet)
                    {
                        return null;
                    }
    
                    // apply the autofilter
                    ApplyFilter(worksheet, columnIndex, words);
    
                    // get the 
                    range = worksheet.Range["$A:$C"].SpecialCells(Excel.XlCellType.xlCellTypeVisible);
                    foreach (Excel.Range subrange in range.Areas)
                    {
                        // copy the cells to a multidimensional array for perfomance
                        cells = subrange.Value;
    
                        // transform the multidimensional array to a List
                        for (row = cells.GetLowerBound(0), rows = cells.GetUpperBound(0); row <= rows; row++)
                        {
                            record = new List<object>();
                            hit = false;
    
                            for (column = cells.GetLowerBound(1), columns = cells.GetUpperBound(1); column <= columns; column++)
                            {
                                value = cells[row, column];
                                hit = hit || (null != value);
    
                                if (hit)
                                {
                                    record.Add(cells[row, column]);
                                }
                            }
    
                            if (hit)
                            {
                                records.Add(record);
                            }
                        }
                    }
                }
                catch { }
                finally
                {
                    // use GC.Collect over Marshal.ReleaseComObject() to release all RCWs per http://stackoverflow.com/a/17131389/1995977 and more
                    cells = null;
                    GC.Collect();
                    GC.WaitForPendingFinalizers();
                }
    
                return records;
            }
    
            #region VSTO Designer generated code
    
            /// <summary>
            /// Required method for Designer support - do not modify
            /// the contents of this method with the code editor.
            /// </summary>
            private void InternalStartup()
            {
                this.Startup += new System.EventHandler(ThisWorkbook_Startup);
                this.Shutdown += new System.EventHandler(ThisWorkbook_Shutdown);
            }
    
            #endregion
    
        }
    }
    
  • 0

    Excel 2010文件不是完全XML . 获取XLSX(或XMSM)文件,并使用.zip扩展名重命名该文件 . 然后将其解压缩到一个新文件夹 . 子文件夹中的文件将是XML文件,是的,但实际的XLSX文件是一个zip文件,其中包含包含XML文件的文件夹集合 .

    我认为,您最好的选择是使用ACE驱动程序(不再支持JET)并通过ODBC访问它 . 如果这还不够快,您可以在特定时间提取摘录并将其上传到可以运行查询的数据库;查询应该更快,但可能会过时 .

  • 0

    My Solution:

    我尝试了三种不同的方法:

    • 带SQL的ADO连接对象(最慢)

    • VSTO和Excel的自动过滤器(可靠)

    • LINQ to XML (fastest)

    LINQ to XML提供了最佳性能 . 我将我的表转换为XML文件:

    XML Table

    然后,在我的代码中,我使用StringReader引入XMLwellData文件(将其保存为项目资源) .

    'welldoc will be the file to do queries on using LINQ to XML
        Dim stream As System.IO.StringReader
        stream = New StringReader(My.Resources.XMLwellData)
    
        welldoc = XDocument.Load(stream)
    
        'clean up stream now that it's no longer needed
        stream.Close()
        stream.Dispose()
    
    
        '***** later in the code perform my query on XML file *********
    
            Dim query = _
         From well In welldoc.<wellList>.<well> _
         Where well.<name>.Value Like "TOMCHUCK 21-30" _
         Select well
    
         For Each well in query
    
            MessageBox.Show(well.<padgroup>.value)
    
         Next
    

    做我想做的事情简直太简单了,最棒的是它很快 .

    感谢您的所有帮助和建议 . 这对我来说很有意义 .

    Alternative Method using Excel's Autofilter

    如果您尝试使用其他答案中建议的代码,则只会过滤两个值:

    worksheet.Cells.AutoFilter(column, criteria, Excel.XlAutoFilterOperator.xlOr);
    

    因此,对于filter with multiple criteria using Excel's Auotfilter,您必须将参数作为数组传递并过滤xlFilterValues .

    Dim wrkbk As Excel.Workbook
        Dim wrksht As Excel.Worksheet
        Dim myRange As Excel.Range
        Dim cell As Excel.Range
    
        'You would add all of your wellnames to search to this List
        Dim wellNames As New List(Of String)
    
        wrksht = wrkbk.Sheets(1)
    
        'In my excel file there is a Named Range which includes the all the information
        myRange = wrksht.Range("WellLookUpTable")
    
        'Notice, for `Criteria1:=` you MUST convert the List to an array
        With wrksht.Range("WellLookUpTable")
            .AutoFilter(Field:=2, Criteria1:=wellNames.ToArray, Operator:=Excel.XlAutoFilterOperator.xlFilterValues)
        End With
    
        myRange = wrksht.Range("A2", wrksht.Range("A2").End(Excel.XlDirection.xlDown)).Cells.SpecialCells(Excel.XlCellType.xlCellTypeVisible)
    
        For Each cell In myRange
    
            'column 11 is padgroup
            MessageBox.Show(cell.Offset(0, 11).Value)
    
        Next
    

Related