
Sorting a line with a bunch of numbers


I have a line:

string 2 2 3 3 1 4

where columns 2, 4 and 6 represent IDs (assume each ID number is unique) and columns 3, 5 and 7 represent some data associated with the corresponding ID.

How can I rearrange the line so that it is sorted by ID?

string 1 4 2 2 3 3

Note: unlike in the example, a line can contain any number of IDs.

Using a shell script, I was thinking of something like

while read n    
do
   echo $(echo $n | sort -k (... stuck here) )
done < infile

4 Answers

  • 1

    As a bash script, this can be done with:

    Code:

    #!/usr/bin/env bash
    
    # send field pairs as separate lines
    function emit_line() {
        while [ $# -gt 0 ] ; do
            echo "$1" "$2"
            shift; shift
        done
    }
    
    # break the line into pieces and send to sort
    function sort_line() {
        echo $1
        shift
        # -n compares the IDs numerically, so 10 sorts after 9
        emit_line $* | sort -n
    }
    
    # loop through the lines in the file and sort by key-value pairs
    while read n; do
       echo $(sort_line $n)
    done < infile
    

    File infile:

    string 2 2 3 3 1 4
    string 2 2 0 3 4 4 1 7
    string 2 2 0 3 2 1
    

    Output:

    string 1 4 2 2 3 3
    string 0 3 1 7 2 2 4 4
    string 0 3 2 1 2 2
    

    Update:

    Borrowing from grail's version to drop the (much slower) external sort:

    function sort_line() {
        line="$1"
        shift
    
        while [ $# -gt 0 ] ; do
            data[$1]=$2
            shift; shift
        done
    
        # indexed-array keys expand in ascending numeric order
        for i in ${!data[@]}; do
            line="$line $i ${data[i]}"
        done
        unset data
        echo $line
    }
    
    while read n; do
       sort_line $n
    done < infile
    
  • 2

    Another bash alternative, which does not depend on how many IDs there are:

    #!/usr/bin/env bash
    
    x='string 2 2 3 3 1 4'
    out="${x%% *}" 
    
    in=($x)
    
    for (( i = 1; i < ${#in[*]}; i += 2 ))
    do
      new[${in[i]}]=${in[i+1]}
    done
    
    for i in ${!new[@]}
    do
      out="$out $i ${new[i]}"
    done
    
    echo $out
    

    You could put a loop around the lot if you want to read from a file.
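
For anyone who would rather not write that loop in bash, here is a minimal Python sketch of the same idea: index each line's data by ID, then walk the IDs in ascending order, once per input line. The names `reorder_by_id` and `reorder_stream` are made up for this illustration, and `io.StringIO` only stands in for a real file such as `infile`. It assumes IDs are integers and unique within a line, as the question states.

```python
import io


def reorder_by_id(line):
    """Return the line with its (ID, data) pairs ordered by numeric ID."""
    label, *fields = line.split()
    # Index the data by integer ID, much like the bash array new[] above.
    # A trailing field without a partner is ignored.
    by_id = {int(fields[i]): fields[i + 1] for i in range(0, len(fields) - 1, 2)}
    parts = [label]
    for key in sorted(by_id):
        parts += [str(key), by_id[key]]
    return " ".join(parts)


def reorder_stream(stream):
    """Apply reorder_by_id to every non-blank line of a file-like object."""
    return [reorder_by_id(line) for line in stream if line.strip()]


# io.StringIO stands in for an opened file (an assumption for this demo).
sample = io.StringIO("string 2 2 3 3 1 4\nstring 2 2 0 3 4 4 1 7\n")
for result in reorder_stream(sample):
    print(result)
```

To process an actual file, passing `open("infile")` to `reorder_stream` should behave the same way, since the function only iterates over lines.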

  • 2

    I'll add a gawk solution to your long list of options.

    This is a standalone script:

    #!/usr/bin/env gawk -f
    
    {
        line=$1
    
        # Collect the tuples into values of an array,
        for (i=2;i<NF;i+=2) a[i]=$i FS $(i+1)
    
        # This sorts the array "a" by value, numerically, ascending.
        # asort() reindexes the result from 1 and returns the element count.
        n=asort(a, a, "@val_num_asc")
    
        # And this for loop gathers the result.
        for (i=1; i<=n; i++) line=line FS a[i]
    
        # Finally, print the line,
        print line
    
        # and clear the array for the next round.
        delete a
    }
    

    It works by copying the tuples into an array, sorting the array, and then reassembling the sorted tuples in a for loop that appends the array elements to the output line.

    Note that this is gawk-only (not traditional awk) because it uses asort().

    $ cat infile
    string 2 2 3 3 1 4
    other 5 1 20 9 3 7
    $ ./sorttuples infile
    string 1 4 2 2 3 3
    other 3 7 5 1 20 9
    
  • 1

    You can use Python. This function splits the columns into a list of tuples, which can then be sorted. itertools.chain is then used to reassemble the key-value pairs.

    Code:

    import itertools as it
    
    def sort_line(line):
        # split the line on white space
        x = line.split()
    
        # make a tuple of key value pairs
        as_tuples = [tuple(x[i:i+2]) for i in range(1, len(x), 2)]
    
        # sort the tuples, and flatten them with chain
        sorted_kv = list(it.chain(*sorted(as_tuples)))
    
        # join the results back into a string
        return ' '.join([x[0]] + sorted_kv)
    

    Test Code:

    data = [
        "string 2 2 3 3 1 4",
        "string 2 2 0 3 4 4 1 7",
    ]
    
    for line in data:
        print(sort_line(line))
    

    Results:

    string 1 4 2 2 3 3
    string 0 3 1 7 2 2 4 4
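
One thing to watch with `sort_line` above: the tuples hold strings, so `sorted` compares IDs character by character and an ID of `10` would land before `9`. If the IDs can have more than one digit, a numeric sort key is probably what you want. A minimal sketch, assuming every ID parses as an integer (`sort_line_numeric` is a name invented here):

```python
def sort_line_numeric(line):
    """Sort the (ID, data) pairs of a line by the integer value of the ID."""
    label, *fields = line.split()
    # Pair each ID with the field that follows it.
    pairs = list(zip(fields[0::2], fields[1::2]))
    # Only the ID is converted, so the data fields may be any text.
    pairs.sort(key=lambda pair: int(pair[0]))
    return " ".join([label] + [field for pair in pairs for field in pair])


print(sort_line_numeric("string 10 a 9 b 2 c"))
```

This prints `string 2 c 9 b 10 a`, whereas plain string sorting would put `10` first.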
    
