
Swift: Calculate the MD5 checksum of a large file


I am creating MD5 checksums for large video files. I am currently using this code:

import Foundation
import CommonCrypto

extension NSData {
    func MD5() -> NSString {
        let digestLength = Int(CC_MD5_DIGEST_LENGTH)
        let md5Buffer = UnsafeMutablePointer<CUnsignedChar>.allocate(capacity: digestLength)
        defer { md5Buffer.deallocate() }  // release the digest buffer when done

        // One-shot hash: the complete data must already be in memory.
        CC_MD5(bytes, CC_LONG(length), md5Buffer)
        let output = NSMutableString(capacity: digestLength * 2)
        for i in 0..<digestLength {
            output.appendFormat("%02x", md5Buffer[i])
        }

        return output
    }
}

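But this requires the whole file to be loaded into memory as an NSData buffer, which is not ideal for large video files. Is there a way in Swift to compute the MD5 checksum while reading the file as a stream, so that the memory footprint stays minimal?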

2 Answers

  • Answer 1

    You can compute the MD5 checksum in chunks, as shown in "Is there a MD5 library that doesn't require the whole input at the same time?".

    Here is a possible implementation in Swift:

    import Foundation
    import CommonCrypto

    func md5File(url: URL) -> Data? {

        let bufferSize = 1024 * 1024

        do {
            // Open file for reading:
            let file = try FileHandle(forReadingFrom: url)
            defer {
                file.closeFile()
            }

            // Create and initialize MD5 context:
            var context = CC_MD5_CTX()
            CC_MD5_Init(&context)

            // Read up to `bufferSize` bytes, until EOF is reached, and update MD5 context:
            while autoreleasepool(invoking: {
                let data = file.readData(ofLength: bufferSize)
                if data.count > 0 {
                    data.withUnsafeBytes {
                        // $0 is an UnsafeRawBufferPointer (Swift 5 API)
                        _ = CC_MD5_Update(&context, $0.baseAddress, numericCast(data.count))
                    }
                    return true // Continue
                } else {
                    return false // End of file
                }
            }) { }

            // Compute the MD5 digest:
            var digest = Data(count: Int(CC_MD5_DIGEST_LENGTH))
            digest.withUnsafeMutableBytes {
                _ = CC_MD5_Final($0.bindMemory(to: UInt8.self).baseAddress, &context)
            }

            return digest

        } catch {
            print("Cannot open file:", error.localizedDescription)
            return nil
        }
    }
    

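    The autorelease pool is needed to release the memory returned by file.readData() after each chunk. Without it, the chunks would accumulate until the loop ends, and the entire (possibly very large) file would end up in memory. Thanks to Abhi Beckert for noticing this and providing the implementation.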

    If you need the digest as a hex-encoded string, change the return type to String? and replace

    return digest
    

    with

    let hexDigest = digest.map { String(format: "%02hhx", $0) }.joined()
    return hexDigest
    
  • Answer 2

    A solution for SHA256 hashing (based on Martin R's answer):

    import Foundation
    import CommonCrypto

    func sha256(url: URL) -> Data? {
        do {
            let bufferSize = 1024 * 1024
            // Open file for reading:
            let file = try FileHandle(forReadingFrom: url)
            defer {
                file.closeFile()
            }

            // Create and initialize SHA256 context:
            var context = CC_SHA256_CTX()
            CC_SHA256_Init(&context)

            // Read up to `bufferSize` bytes, until EOF is reached, and update SHA256 context:
            while autoreleasepool(invoking: {
                // Read up to `bufferSize` bytes
                let data = file.readData(ofLength: bufferSize)
                if data.count > 0 {
                    data.withUnsafeBytes {
                        // $0 is an UnsafeRawBufferPointer (Swift 5 API)
                        _ = CC_SHA256_Update(&context, $0.baseAddress, numericCast(data.count))
                    }
                    // Continue
                    return true
                } else {
                    // End of file
                    return false
                }
            }) { }

            // Compute the SHA256 digest:
            var digest = Data(count: Int(CC_SHA256_DIGEST_LENGTH))
            digest.withUnsafeMutableBytes {
                _ = CC_SHA256_Final($0.bindMemory(to: UInt8.self).baseAddress, &context)
            }

            return digest
        } catch {
            print(error)
            return nil
        }
    }
    

    Usage, given a previously created instance of type URL named fileURL:

    if let digestData = sha256(url: fileURL) {
        let calculatedHash = digestData.map { String(format: "%02hhx", $0) }.joined()
        DDLogDebug(calculatedHash)  // DDLogDebug comes from CocoaLumberjack; print(calculatedHash) needs no dependency
    }
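Both answers convert the digest to hex with String(format: "%02hhx", $0), which parses a format string once per byte. That cost is negligible for a 16- or 32-byte digest. As an alternative that avoids Foundation formatting, a small helper might look like the sketch below. The property name hexString is hypothetical and not part of Foundation. Because the helper is an extension on any sequence of UInt8, it would apply to the Data returned by md5File(url:) or sha256(url:).

```swift
import Foundation

extension Sequence where Element == UInt8 {
    /// Lowercase hexadecimal encoding, two characters per byte (0x0f -> "0f").
    /// Hypothetical helper name; not an existing Foundation API.
    var hexString: String {
        let digits = Array("0123456789abcdef")
        var result = ""
        result.reserveCapacity(underestimatedCount * 2)
        for byte in self {
            result.append(digits[Int(byte >> 4)])     // high nibble
            result.append(digits[Int(byte & 0x0f)])   // low nibble
        }
        return result
    }
}
```

With this helper, `md5File(url: fileURL)?.hexString` would yield the same string as the map/joined version.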
    
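The CC_MD5_* functions are marked deprecated as of macOS 10.15 / iOS 13, because MD5 is cryptographically broken, though it remains usable as a plain checksum. CryptoKit on those systems offers the same incremental init/update/finalize pattern. The code below is a minimal sketch of the answers' chunked loop, rewritten on top of CryptoKit's HashFunction protocol so that one function covers both Insecure.MD5 and SHA256. It makes some assumptions that are not in the answers:

- The name fileDigest and the 1 MiB default buffer are illustrative choices.
- The `import Crypto` branch assumes the swift-crypto package is available where CryptoKit is not.
- The autoreleasepool shim is only a no-op fallback for platforms without the Objective-C runtime.

```swift
import Foundation
#if canImport(CryptoKit)
import CryptoKit   // Apple platforms: macOS 10.15+, iOS 13+
#else
import Crypto      // assumption: swift-crypto package, which mirrors the CryptoKit API
#endif

#if !canImport(ObjectiveC)
// Autorelease pools only exist with the Objective-C runtime; elsewhere just run the body.
func autoreleasepool<Result>(invoking body: () throws -> Result) rethrows -> Result {
    try body()
}
#endif

/// Hashes the file at `url` with any CryptoKit hash function (e.g. Insecure.MD5, SHA256),
/// reading at most `bufferSize` bytes at a time. Returns nil if the file cannot be opened.
@available(macOS 10.15, iOS 13.0, tvOS 13.0, watchOS 6.0, *)
func fileDigest<H: HashFunction>(url: URL, using _: H.Type,
                                 bufferSize: Int = 1024 * 1024) -> H.Digest? {
    guard let file = try? FileHandle(forReadingFrom: url) else { return nil }
    defer { file.closeFile() }

    var hasher = H()   // incremental hash state, the counterpart of CC_MD5_CTX
    // The pool releases each chunk before the next read, as in the CommonCrypto version.
    while autoreleasepool(invoking: { () -> Bool in
        let chunk = file.readData(ofLength: bufferSize)
        guard !chunk.isEmpty else { return false }   // end of file
        hasher.update(data: chunk)
        return true
    }) {}
    return hasher.finalize()
}
```

The result of `fileDigest(url: fileURL, using: Insecure.MD5.self)` is a sequence of UInt8. The hex conversion from the answers, `digest.map { String(format: "%02hhx", $0) }.joined()`, therefore works on it unchanged.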
