
UnicodeError when uploading a dataframe to_csv file buffer to Google Cloud Storage

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 26612: Body (''') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

I create a CSV file with the df.to_csv method and an explicit encoding. Even so, I get the error above when uploading the file to Google Cloud Storage.

json_data = [
    {
      "Critics": "Like many of the landmark films of the 1980s, I had watched Mani Ratnam\u2019s breakout hit \u201cMouna Ragam\u201d at a time when I hadn\u2019t yet crossed paths with ideas like feminism and gender equality. Though I retained fragments of the film somewhere in the recesses of my mind, the details had long since escaped conscious reflection.\n\n\n\nWith the 30th anniversary of the film rolling by earlier this month, I returned to the film expectant and hesitant by turns. Playing on my mind was the question of whether the film could stand up to renewed scrutiny with so much having been written and said about the institution of marriage in the intervening decades.  \n\n\n\nFor the uninitiated, \u201cMouna Ragam\u201d tells the tale of an arranged marriage that is hobbled from the start by the asymmetry of feeling between the two spouses. While Chandrakumar (played by Mohan) or CK, as he\u2019s called, wants to be an enthusiastic and caring husband, his wife Divya (Revathi) enters the marriage reluctantly, only out of the guilt of causing her father\u2019s heart attack by her initial refusal of the match. Burdened by the emotional remnants of a past relationship, Divya finds herself unwilling and unable to love her husband, but is eventually won over.\n\n\n\nExpectedly, there are elements of the story that don\u2019t sit well if we read them according to today\u2019s standards. So let\u2019s get done with them first. When Divya first objects to her marriage, she does so with principled arguments against the idea of arranged marriage (asking at one point if she is being sold off to the lowest bidder in terms of dowry). She wants to study further, she declares instead. 
But as the movie progresses, we find out that these vehement declarations were a cover for feelings she still holds for a dead lover (Karthik), a Robin-Hoodesque strongman, who participates in violence in the name of certain un-enunciated principles.\n\n\n\nThat Divya is only unwilling to participate in her current marriage because of a previous relationship is something of a letdown, in an otherwise interesting premise. That the flashback of that relationship shows it starting off with the \u201cdon\u2019t take no for an answer\u201d method that seems standard courtship protocol for Tamil \u201cheroes\u201d adds to this disappointment. Though, to be fair, it is far less aggressive and offensive than some of the examples we see today.\n\n\n\nDrawing as it does from a refusal to move on from that first relationship, what initially seems like a spirited resistance to arranged marriage from Divya gradually become reduced to a naive stubbornness. Indeed, quite late in the film, when Divya has started to come around, but CK hasn\u2019t realised it yet, he accuses her of being childish, and declares that such childishness can only be tolerated up to a point.\n\n\n\nTake that easy crutch of the past lover out of the film, however, and there are quite a few engaging touches to the film that make you see why this film established Mani Ratnam\u2019s reputation as a writer and director. Against the enduring mythologising of the good Tamil woman as quiet, self-sacrificing, and self-effacing, Divya is a breath of fresh air. She has a spark of irreverence, an insistent recognition of her own desires, and a fairly clear-headed understanding of her own strengths and short-comings. And Revathi plays her with an exuberance and energy that is quite a treat to watch. 
Her no-nonsense declaration of why she would not make a good wife to CK when they first meet, for instance, is one of the highlights of the film.\n\n\n\nAnd Mani Ratnam ensures that the film stays focused on Divya and her particular experience, and this is what separates \"Mouna Ragam\" from other similar films like the Bhagyaraj-starrer \"Andha Ezhu Naatkal\", which focuses more on grand pronouncements about marriage.\n\n\n\nCK is the other strong pillar of the film. Starting off with a dignified forbearance at Divya\u2019s resistance to him, he gradually veers into passive-aggressive territory as the marriage seems less and less likely to work. Thus, when the tables turn, he throws back at Divya some of the very lines she first uses to disabuse him that their marriage is anything more than an empty performance (the reference to the thaali as nothing more than a yellow-dyed string around her neck, for instance).\n\n\n\nConsidering how often marriage gets framed onscreen in passive-aggressive terms, this could have been a major self-goal, but the film holds CK back from going too far down that road, so that it adds texture to his character without ruining him in the process.\n\n\n\nBut what\u2019s most interesting about the film is its willingness to engage with the D-word (a taboo in its time) \u2013 divorce.\n\n\n\nThe first time CK and Divya go out in the city together after they are married, he tells her he wants to buy her gift, and she says that the only thing she wants is a divorce. A day later, gift-wrapped on the coffee table, is a package containing anklets and divorce papers. \u201cChoose what you want,\u201d declares CK. And Divya holds the anklets for a few seconds before resolutely signing the papers. 
Seven days after they\u2019re married, the couple is filing for a mutual consent divorce.\n\n\n\nThis may not seem like a big deal now, but the Censor Board reportedly wanted to give the film an \u2018A\u2019 certificate because it dealt with divorce thus.\n\n\n\nIt isn\u2019t possible to talk about \u201cMouna Ragam\u201d without mentioning two other high-water marks in the film. The first, of course, is the incomparable Ilaiyaraaja whose wonderful songs stay with you long after the film fades from memory, whether it\u2019s the exuberant \u201cOh Oh Megam Vanthatho\u201d or the more haunting \u201cMandram Vandha\u201d, which many years later got repackaged as the signature tune for \u201cCheeni Kum\u201d. \n\n\n\nThe second is cinematographer PC Sreeram, who manages to give the house that CK and Divya live in the sense of a space inhabited by both, but never shared, as each misses the moment when the other opens up to the possibility of commonality.\n\n\n\nYes, \u201cMouna Ragam\u201d is a mainstream film, and in that sense holds on to many of the prejudices of its time. But it holds to them loosely enough that, even three decades later, it offers up possibilities that you can think, imagine and ultimately enjoy. Considering that Mani Rathnam\u2019s last venture into relationship territory was the eminently forgettable \u201cOK Kanmani\u201d (though I enjoyed the effervescent Nithya Menen, that\u2019s just a fanboy aside), \u201cMouna Ragam\u201d seems leagues ahead of its time. And for that, \u201cMouna Ragam\u201d deserves its reputation as a classic. \n\n",
    }
]
df = DataFrame(json_data)

This is the snippet I use to create the CSV file:

buffer = StringIO()
df.to_csv(buffer, encoding="utf-8")

The code that uploads the file to GCS looks like this:

blob = Blob(blob_path, bucket)
blob.upload_from_file(buffer, content_type=content_type)

1 Answer

  • 1
    blob = Blob(blob_path, bucket)
    blob.upload_from_file(buffer, content_type=content_type)
    

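    The problem is in the upload_from_file method. Internally, it reads the file and uploads it to the GCS blob in chunks. Because buffer is a StringIO, those chunks are str rather than bytes.

    A function in the HTTP library (it matches _encode in Python's standard http.client module) then tries to encode that str body as latin-1: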

    def _encode(data, name='data'):
        """Call data.encode("latin-1") but show a better error message."""
        try:
            return data.encode("latin-1")
        except UnicodeEncodeError as err:
            raise UnicodeEncodeError(
                err.encoding,
                err.object,
                err.start,
                err.end,
                "%s (%.20r) is not valid Latin-1. Use %s.encode('utf-8') "
                "if you want to send it encoded in UTF-8." %
                (name.title(), data[err.start:err.end], name)) from None
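
The failure can be reproduced without GCS at all. The sketch below uses a short fragment of the sample text (chosen here for illustration) and shows that the right single quotation mark U+2019 has no latin-1 representation, while UTF-8 encodes it without trouble:

```python
# A short fragment of the review text containing U+2019 (right single quote).
text = "Mani Ratnam\u2019s breakout hit"

try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    # Same failure the HTTP layer hits when it is handed a str body.
    latin1_ok = False

# Encoding to UTF-8 works and yields bytes that can be sent as-is.
utf8_bytes = text.encode("utf-8")
```

So the body has to reach the HTTP layer as bytes rather than as str.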
    

    So I replaced upload_from_file with the upload_from_string method:

    csv_data = df.to_csv(encoding="utf-8")
    blob = Blob(blob_path, bucket)
    blob.upload_from_string(csv_data, content_type=content_type)
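
If keeping upload_from_file is preferable, one possible alternative is to give it a binary buffer instead of a StringIO. This is a minimal sketch, not tested against a real bucket. The helper name csv_text_to_bytes_buffer and the literal CSV text are hypothetical stand-ins, and the upload call is left as a comment because it needs credentials:

```python
from io import BytesIO


def csv_text_to_bytes_buffer(csv_text: str) -> BytesIO:
    """Encode CSV text as UTF-8 and wrap it in a binary buffer (hypothetical helper)."""
    buffer = BytesIO(csv_text.encode("utf-8"))
    buffer.seek(0)  # already at 0 for a fresh BytesIO; kept explicit for clarity
    return buffer


# Stand-in for df.to_csv(), which returns a str when no path is given.
csv_text = ",Critics\n0,Mani Ratnam\u2019s breakout hit\n"
buffer = csv_text_to_bytes_buffer(csv_text)

# With a configured bucket, the upload would then look like this (assumption):
# blob = Blob(blob_path, bucket)
# blob.upload_from_file(buffer, content_type="text/csv")
```

Note that passing encoding="utf-8" to to_csv has no effect when the target is a StringIO, because a text buffer stores str, not encoded bytes.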
    
