Split a string at uppercase letters

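What is the pythonic way to split a string before the occurrences of a given set of characters?

For example, I want to split 'TheLongAndWindingRoad' at any occurrence of an uppercase letter (possibly except the first), and obtain ['The', 'Long', 'And', 'Winding', 'Road'].

Edit: It should also split single occurrences, i.e. from 'ABC' I'd like to obtain ['A', 'B', 'C'].

12 Answers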

  • 20

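    An alternative way without using regex or enumerate:

    word = 'TheLongAndWindingRoad'
    chars = list(word)
    
    # Prefix every uppercase letter except the first character with a space.
    # Indexing by position avoids mis-handling repeated letters.
    for i in range(1, len(chars)):
        if chars[i].isupper():
            chars[i] = ' ' + chars[i]
    
    fin_list = ''.join(chars).split(' ')
    

    I think it's clearer and simpler without chaining too many methods or using a long list comprehension that is hard to read.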

  • 17
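    import re
    # re.split with a capturing group also returns empty strings; filter them out.
    # list() is needed on Python 3, where filter returns an iterator.
    list(filter(None, re.split("([A-Z][^A-Z]*)", "TheLongAndWindingRoad")))
    

    or

    [s for s in re.split("([A-Z][^A-Z]*)", "TheLongAndWindingRoad") if s]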
    
  • 1

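    Here is another regex solution. The problem can be rephrased as "how do I insert a space before each uppercase letter, before doing the split":

    >>> import re
    >>> s = "TheLongAndWindingRoad ABC A123B45"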
    >>> re.sub( r"([A-Z])", r" \1", s).split()
    ['The', 'Long', 'And', 'Winding', 'Road', 'A', 'B', 'C', 'A123', 'B45']
    

    This has the advantage of preserving all non-whitespace characters, which most other solutions do not.
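To make the "preserves all non-whitespace characters" point concrete, here is a small comparison. The input 'theLongRoad' is just an illustrative string (not from the question); it starts with a lowercase run, which a findall-style pattern anchored on a capital would silently drop.

```python
import re

s = "theLongRoad"  # illustrative input with a lowercase prefix

# findall only returns text that starts at an uppercase letter
dropped = re.findall(r"[A-Z][^A-Z]*", s)

# inserting a space before each capital and splitting keeps everything
kept = re.sub(r"([A-Z])", r" \1", s).split()

print(dropped)  # ['Long', 'Road']
print(kept)     # ['the', 'Long', 'Road']
```

The trade-off is that the sub/split approach also splits on any whitespace already present in the input.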

  • 0
    src = 'TheLongAndWindingRoad'
    glue = ' '
    
    result = ''.join(glue + x if x.isupper() else x for x in src).strip(glue).split(glue)
    
  • 4

    Another one without regex, with the ability to keep contiguous uppercase characters together if wanted

    def split_on_uppercase(s, keep_contiguous=False):
        """Split a string before its uppercase characters.
    
        Args:
            s (str): string
            keep_contiguous (bool): flag to indicate we want to
                                    keep contiguous uppercase chars together
    
        Returns:
            list of str: the parts of s, in order
        """
    
        string_length = len(s)
        is_lower_around = (lambda: s[i-1].islower() or 
                           string_length > (i + 1) and s[i + 1].islower())
    
        start = 0
        parts = []
        for i in range(1, string_length):
            if s[i].isupper() and (not keep_contiguous or is_lower_around()):
                parts.append(s[start: i])
                start = i
        parts.append(s[start:])
    
        return parts
    
    >>> split_on_uppercase('theLongWindingRoad')
    ['the', 'Long', 'Winding', 'Road']
    >>> split_on_uppercase('TheLongWindingRoad')
    ['The', 'Long', 'Winding', 'Road']
    >>> split_on_uppercase('TheLongWINDINGRoadT', True)
    ['The', 'Long', 'WINDING', 'Road', 'T']
    >>> split_on_uppercase('ABC')
    ['A', 'B', 'C']
    >>> split_on_uppercase('ABCD', True)
    ['ABCD']
    >>> split_on_uppercase('')
    ['']
    >>> split_on_uppercase('hello world')
    ['hello world']
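For comparison, roughly the same "keep contiguous capitals" behaviour can be sketched with a single regex. This is only an approximation under one assumption: the input consists of ASCII letters only. Unlike the function above, findall drops anything the pattern does not match (digits, spaces, punctuation). The names CONTIGUOUS and split_keep_contiguous are made up for this example.

```python
import re

# 1) a run of capitals that is followed by a capitalised word,
# 2) an optional capital followed by lowercase letters,
# 3) any remaining run of capitals.
CONTIGUOUS = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+")

def split_keep_contiguous(s):
    """Return letter-only parts, keeping runs of capitals together."""
    return CONTIGUOUS.findall(s)

print(split_keep_contiguous('TheLongWINDINGRoadT'))
# ['The', 'Long', 'WINDING', 'Road', 'T']
```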
    
  • 4

    Replace every uppercase letter 'L' in the given string with a space plus that letter 'L'.

    def splitAtUpperCase(text):
        result = ""
        for char in text:
            if char.isupper():
                result += " " + char
            else:
                result += char
        return result.split()
    

    In the case of the given example:

    print(splitAtUpperCase('TheLongAndWindingRoad')) 
    ['The', 'Long', 'And', 'Winding', 'Road']
    

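    You can also use a for loop with if statements. Note that this version returns a space-separated string rather than a list, and keeps runs of capitals together:

    def splitAtUpperCase(s):
        # Walk backwards so inserted spaces never shift the indices still to visit.
        for i in range(len(s)-1)[::-1]:
            if s[i].isupper() and s[i+1].islower():
                s = s[:i]+' '+s[i:]
            elif s[i].isupper() and s[i-1].islower():
                s = s[:i]+' '+s[i:]
        return ' '.join(s.split())
    print(splitAtUpperCase('TheLongAndWindingRoad'))
    
    The Long And Winding Road
    

    Thanks.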

  • 3

    Unfortunately, Python versions before 3.7 cannot split on a zero-width match. But you can use re.findall instead:

    >>> import re
    >>> re.findall('[A-Z][^A-Z]*', 'TheLongAndWindingRoad')
    ['The', 'Long', 'And', 'Winding', 'Road']
    >>> re.findall('[A-Z][^A-Z]*', 'ABC')
    ['A', 'B', 'C']
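Since Python 3.7, re.split does accept patterns that can match an empty string, so a zero-width split is possible on current interpreters. A minimal sketch, assuming Python 3.7 or newer (the helper name split_before_uppercase is made up for this example):

```python
import re

def split_before_uppercase(s):
    """Split s before every uppercase ASCII letter except one at position 0.

    Requires Python 3.7+, where re.split handles empty matches.
    """
    # (?=[A-Z]) matches the empty position before a capital;
    # (?<!^) excludes the very start of the string.
    return re.split(r"(?<!^)(?=[A-Z])", s)

print(split_before_uppercase('TheLongAndWindingRoad'))
# ['The', 'Long', 'And', 'Winding', 'Road']
print(split_before_uppercase('ABC'))
# ['A', 'B', 'C']
```

Unlike the findall pattern, this keeps a leading lowercase run, so 'theLongRoad' gives ['the', 'Long', 'Road'].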
    
  • 0
    >>> import re
    >>> re.findall('[A-Z][a-z]*', 'TheLongAndWindingRoad')
    ['The', 'Long', 'And', 'Winding', 'Road']
    
    >>> re.findall('[A-Z][a-z]*', 'SplitAString')
    ['Split', 'A', 'String']
    
    >>> re.findall('[A-Z][a-z]*', 'ABC')
    ['A', 'B', 'C']
    

    If you want "It'sATest" to split into ["It's", 'A', 'Test'], change the regex to "[A-Z][a-z']*"
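A caveat worth illustrating: a character class such as [a-z] only keeps what it lists, so every other character disappears from the result. The inputs below are examples chosen for illustration; 'A123B45' is borrowed from another answer in this thread.

```python
import re

# Digits are not in [a-z], so they are silently dropped.
only_letters = re.findall(r"[A-Z][a-z]*", "A123B45")

# Adding the apostrophe to the class keeps "It's" together.
with_apostrophe = re.findall(r"[A-Z][a-z']*", "It'sATest")

# [^A-Z] keeps everything up to the next capital instead.
up_to_next_capital = re.findall(r"[A-Z][^A-Z]*", "A123B45")

print(only_letters)        # ['A', 'B']
print(with_apostrophe)     # ["It's", 'A', 'Test']
print(up_to_next_capital)  # ['A123', 'B45']
```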

  • 0

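    An alternative solution (if you dislike explicit regexes):

    s = 'TheLongAndWindingRoad'
    
    # Positions of all uppercase letters
    pos = [i for i,e in enumerate(s) if e.isupper()]
    
    parts = []
    for j in range(len(pos)):
        try:
            parts.append(s[pos[j]:pos[j+1]])
        except IndexError:
            # Last uppercase letter: take the rest of the string
            parts.append(s[pos[j]:])
    
    print(parts)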
    
  • 101

    An alternative way using enumerate and isupper()

    Code:

    strs = 'TheLongAndWindingRoad'
    ind = 0  # start index of the current part
    new_lst = []
    # Start at the second character so the first one never triggers a split.
    for index, val in enumerate(strs[1:], 1):
        if val.isupper():
            new_lst.append(strs[ind:index])
            ind = index
    if ind < len(strs):
        new_lst.append(strs[ind:])
    print(new_lst)
    

    Output:

    ['The', 'Long', 'And', 'Winding', 'Road']
    
  • -1

    This can be done with the more_itertools.split_before tool.

    import more_itertools as mit
    
    iterable = "TheLongAndWindingRoad"
    [ "".join(i) for i in mit.split_before(iterable, lambda s: s.isupper())]
    # ['The', 'Long', 'And', 'Winding', 'Road']
    

    It should also split single occurrences, i.e. from 'ABC' I'd like to obtain ['A', 'B', 'C'].

    iterable = "ABC"
    [ "".join(i) for i in mit.split_before(iterable, lambda s: s.isupper())]
    # ['A', 'B', 'C']
    

    more_itertools is a third-party package with 60 useful tools, including implementations of all the original itertools recipes, which saves you from implementing them by hand.
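If adding a dependency is not an option, the idea behind split_before can be approximated with a short generator. This is a hand-written sketch for illustration, not the library's actual implementation; the name split_before_sketch is made up.

```python
def split_before_sketch(iterable, pred):
    """Yield lists of items, starting a new list just before each item
    for which pred(item) is true."""
    buf = []
    for item in iterable:
        # Close the current group before an item that satisfies pred,
        # but never emit an empty group at the very start.
        if pred(item) and buf:
            yield buf
            buf = []
        buf.append(item)
    if buf:
        yield buf

parts = ["".join(p) for p in split_before_sketch("TheLongAndWindingRoad", str.isupper)]
print(parts)  # ['The', 'Long', 'And', 'Winding', 'Road']
```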

  • 2

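    A variation on @ChristopheD's solution

    s = 'TheLongAndWindingRoad'
    
    # Appending 'A' adds a sentinel position at len(s), so no IndexError handling is needed.
    pos = [i for i,e in enumerate(s+'A') if e.isupper()]
    parts = [s[pos[j]:pos[j+1]] for j in range(len(pos)-1)]
    
    print(parts)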
    
