首页 文章

在linux,bash的单独列中回显2个变量

提问于
浏览
-2

我试图在各自的列下打印2个变量(Phaster_positions和GBKPositions) . 我希望每个变量都打印在由制表符分隔的列下 . 这是我获得的:

Phaster_positions             GBKPositions  Phaster_positions  GBKPositions
371860-418565                 247..381
2947108-2988239               378..1781
4663633-4680174               1884..2987
5756724-5793879               3008..3103
5794433-5829445               3128..4405
6867447-6901202               4479..5081
5102..6229
6253..8670
complement(8742..9269)
complement(9583..10563)
complement(10560..12458)
complement(12455..13402)
complement(13973..15541)
complement(15881..16051)
16440..16814
complement(16858..18234)
complement(18254..18628)
complement(18710..20266)
complement(20317..22452)
complement(22888..23454)
complement(23474..25552)
complement(25557..26504)
26735..27631
complement(27655..29334)
29603..30559
complement(30534..31982)
complement(32016..33389)
complement(33391..34734)
complement(34736..35692)
complement(35761..36267)
36431..37459
37519..38688

我想要:

Phaster_positions   GBKPositions

371860-418565       247..381
2947108-2988239     378.1781
4663633-4680174     etc
5756724-5793879     etc
5794433-5829445     etc
6867447-6901202     etc

我的剧本:

#!/bin/bash

printf "Phaster_positions\n\n">gbk31.txt
printf "GBKPositions\n\n">gbk32.txt

PhasterPositions=`awk '$2~/[0-9]Kb/{print ($5)}' CP000155.phaster`
GBKPositions=`awk '$1~/CDS/{print ($2)}' CP000155.gbk`

echo -e "$PhasterPositions">>gbk31.txt
echo -e "$GBKPositions">>gbk32.txt

joined=`paste gbk31.txt gbk32.txt | column -s $'\t' -t`
echo -e "$joined">> gbkfinal.txt

第一个变量的源文件:

gi|00000000|ref|NC_000000|  Hahella chejuensis KCTC 2396, complete genome. .7215267, gc%: 53.87%
                                  REGION         REGION_LENGTH            COMPLETENESS(score)           SPECIFIC_KEYWORD                             REGION_POSITION          TRNA_NUM                 TOTAL_PROTEIN_NUM       PHAGE_HIT_PROTEIN_NUM            HYPOTHETICAL_PROTEIN_NUM         PHAGE+HYPO_PROTEIN_PERCENTAGE    BACTERIAL_PROTEIN_NUM            ATT_SITE_SHOWUP                  PHAGE_SPECIES_NUM                MOST_COMMON_PHAGE_NAME(hit_genes_count)    FIRST_MOST_COMMON_PHAGE_NUM      FIRST_MOST_COMMON_PHAGE_PERCENTAGE   GC_PERCENTAGE                 
                                 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                  1              46.7Kb                   questionable(80)              head,terminase,tail,capsid,recombinase       371860-418565            0                        38                      27                               8                                92.1%                            3                                yes                              10                               PHAGE_Pseudo_phi3_NC_030940(17),PHAGE_Aeromo_phiO18P_NC_009542(15),PHAGE_Haemop_HP1_NC_001697(11),PHAGE_Pasteu_F108_NC_008193(9),PHAGE_Vibrio_8_NC_022747(8),PHAGE_Vibrio_K139_NC_003313(8),PHAGE_Haemop_HP2_NC_003315(7),PHAGE_Phormi_MIS_PhV1A_NC_029032(3),PHAGE_Ralsto_RSY1_NC_025115(3),PHAGE_Burkho_KS14_NC_015273(2),PHAGE_Entero_186_NC_001317(2),PHAGE_Entero_N15_NC_001901(1),PHAGE_Salmon_SEN1_NC_029003(1),PHAGE_Mannhe_vB_MhM_587AP1_NC_028898(1),PHAGE_Salmon_RE_2010_NC_019488(1),PHAGE_Vibrio_vB_VpaM_MAR_NC_019722(1),PHAGE_Klebsi_phiKO2_NC_005857(1),PHAGE_Burkho_KS5_NC_015265(1),PHAGE_Pseudo_YuA_NC_010116(1),PHAGE_Vibrio_VP882_NC_009016(1),PHAGE_Mannhe_phiMHaA1_NC_008201(1),PHAGE_Pseudo_MP1412_NC_018282(1),PHAGE_Stenot_Smp131_NC_023588(1),PHAGE_Pseudo_JG004_NC_019450(1),PHAGE_Bdello_phi1422_NC_019525(1),PHAGE_Salmon_Fels_2_NC_010463(1),PHAGE_Bacill_G_NC_023719(1),PHAGE_Pseudo_phiCTX_NC_003278(1),PHAGE_Psychr_Psymv2_NC_023734(1),PHAGE_Entero_fiAA91_ss_NC_022750(1),PHAGE_Escher_pro483_NC_028943(1),PHAGE_Burkho_KL3_NC_015266(1)   16                               44.73%                           55.35%                        
                                  2              41.1Kb                   intact(120)                   integrase,head,recombinase,capsid,tail       2947108-2988239          1                        53                      23                               28                               96.2%                            2                                yes                              18                               PHAGE_Pseudo_phi2_NC_030931(10),PHAGE_Entero_lambda_NC_001416(4),PHAGE_Pseudo_F10_NC_007805(4),PHAGE_Escher_vB_EcoM_ECO1230_10_NC_027995(3),PHAGE_Entero_N15_NC_001901(3),PHAGE_Burkho_AH2_NC_018283(3),PHAGE_Shewan_1/44_NC_025463(2),PHAGE_Achrom_phiAxp_2_NC_029106(2),PHAGE_Vibrio_VvAW1_NC_020488(2),PHAGE_Burkho_BcepNazgul_NC_005091(2),PHAGE_Entero_Arya_NC_031048(2),PHAGE_Entero_mEp460_NC_019716(2),PHAGE_Entero_HK630_NC_019723(2),PHAGE_Vibrio_X29_NC_024369(2),PHAGE_Escher_vB_EcoM_ep3_NC_025430(1),PHAGE_Salmon_phiSG_JL2_NC_010807(1),PHAGE_Rueger_DSS3_P1_NC_025428(1),PHAGE_Shigel_SfIV_NC_022749(1),PHAGE_Klebsi_phiKO2_NC_005857(1),PHAGE_Shigel_Ss_VASD_NC_028685(1),PHAGE_Entero_SfV_NC_003444(1),PHAGE_Marino_P12026_NC_018269(1),PHAGE_Entero_HK629_NC_019711(1),PHAGE_Entero_phi80_NC_021190(1),PHAGE_Entero_BP_4795_NC_004813(1),PHAGE_Burkho_BcepIL02_NC_012743(1),PHAGE_Vibrio_VP882_NC_009016(1),PHAGE_Entero_VT2phi_272_NC_028656(1),PHAGE_Phage_Gifsy_1_NC_010392(1),PHAGE_Bdello_phi1422_NC_019525(1),PHAGE_Vibrio_VpKK5_NC_026610(1),PHAGE_Pectob_ZF40_NC_019522(1),PHAGE_Ralsto_RS138_NC_029107(1),PHAGE_Entero_mEp237_NC_019704(1),PHAGE_Salico_CGphi29_NC_020844(1),PHAGE_Entero_HK225_NC_019717(1),PHAGE_Bacill_Slash_NC_022774(1),PHAGE_Rhodob_RcapNL_NC_020489(1),PHAGE_Pseudo_F116_NC_006552(1),PHAGE_Escher_80001_NC_027387(1),PHAGE_Salmon_FSLSP088_NC_021780(1),PHAGE_Pseudo_PPpW_3_NC_023006(1),PHAGE_Vibrio_vB_VpaM_MAR_NC_019722(1),PHAGE_Synech_S_CBS1_NC_016164(1),PHAGE_Burkho_KS14_NC_015273(1),PHAGE_Stenot_S1_NC_011589(1),PHAGE_Escher_TL_2011c_NC_019442(1),PHAGE_Entero_186_NC_001317(1),PHAGE_Entero_cdtI_NC_009514(1),PHAGE_Burkho_DC1_NC_018452(1),PHAGE_Bacter_Lily_NC_028841(1),PHAGE_Burkho_BcepMigl_NC_019917(1),PHAGE_Salmon_iEPS5_NC_021783(1),PHAGE_Erwini_vB_EamP_L1_NC_019510(1),PHAGE_Escher_P13374_NC_018846(1),PHAGE_Vibrio_SIO_2_NC_016567(1)   4                                18.86%                           53.18%                        
                                  3              16.5Kb                   intact(110)                   tail,head,capsid,terminase                   4663633-4680174          0                        17                      12                               5                                100%                             0                                no                               10                               PHAGE_Salmon_ST64B_NC_004313(3),PHAGE_Entero_phiP27_NC_003356(3),PHAGE_Burkho_phi6442_NC_009235(3),PHAGE_Burkho_phiE125_NC_003309(3),PHAGE_Burkho_phi1026b_NC_005284(3),PHAGE_Entero_SfV_NC_003444(2),PHAGE_Entero_HK140_NC_019710(2),PHAGE_Salmon_118970_sal3_NC_031940(2),PHAGE_Strept_phiSASD1_NC_014229(2),PHAGE_Salmon_118970_sal3_NC_031940(2),PHAGE_Idioma_1N2_2_NC_025439(1),PHAGE_Shigel_SfIV_NC_022749(1),PHAGE_Entero_mEp235_NC_019708(1),PHAGE_Mannhe_vB_MhS_1152AP2_NC_028956(1),PHAGE_Vibrio_12B8_NC_021073(1),PHAGE_Mycoba_Lockley_NC_011021(1),PHAGE_Entero_HK022_NC_002166(1),PHAGE_Entero_mEp390_NC_019721(1),PHAGE_Entero_BP_4795_NC_004813(1),PHAGE_Marino_P12026_NC_018269(1),PHAGE_Colwel_9A_NC_018088(1),PHAGE_Vibrio_VpKK5_NC_026610(1),PHAGE_Clostr_phiCD6356_NC_015262(1),PHAGE_Entero_HK542_NC_019769(1),PHAGE_Entero_IME_EFm5_NC_028826(1),PHAGE_Geobac_E2_NC_009552(1),PHAGE_Entero_IME_EFm1_NC_024356(1),PHAGE_Burkho_KS9_NC_013055(1),PHAGE_Pseudo_Pq0_NC_029100(1),PHAGE_Rhizob_vB_RleS_L338C_NC_023502(1),PHAGE_Entero_SfI_NC_027339(1),PHAGE_Geobac_GBK2_NC_023612(1),PHAGE_Shigel_SfII_NC_021857(1),PHAGE_Rhodoc_REQ1_NC_016655(1),PHAGE_Burkho_Bcep176_NC_007497(1),PHAGE_Entero_mEpX2_NC_019705(1),PHAGE_Mycoba_MOOREtheMARYer_NC_028791(1)   3                                17.64%                           58.49%                        
                                  4              37.1Kb                   questionable(90)              tail,virion,capsid,portal,terminase          5756724-5793879          0                        30                      22                               4                                86.6%                            4                                no                               15                               PHAGE_Pseudo_JBD93_NC_030918(5),PHAGE_Pseudo_M6_NC_007809(5),PHAGE_Pseudo_YuA_NC_010116(4),PHAGE_Pseudo_PAE1_NC_028980(4),PHAGE_Pseudo_JBD24_NC_020203(4),PHAGE_Pseudo_vB_PaeS_PAO1_Ab30_NC_026601(3),PHAGE_Vibrio_vB_VpaM_MAR_NC_019722(3),PHAGE_Synech_S_CBS1_NC_016164(3),PHAGE_Vibrio_VHML_NC_004456(3),PHAGE_Vibrio_VP58.5_NC_027981(3),PHAGE_Pseudo_MP1412_NC_018282(3),PHAGE_Pseudo_DMS3_NC_008717(2),PHAGE_Stenot_vB_SmaS_DLP_2_NC_029019(2),PHAGE_Synech_S_CBS3_NC_015465(2),PHAGE_Pseudo_PaMx11_NC_028770(2),PHAGE_Rhizob_RR1_A_NC_021560(2),PHAGE_Pseudo_MP38_NC_011611(2),PHAGE_Pseudo_vB_PaeS_PAO1_Ab18_NC_026594(2),PHAGE_Pseudo_PaMx28_NC_028931(2),PHAGE_Vibrio_SIO_2_NC_016567(2),PHAGE_Rueger_DSS3_P1_NC_025428(1),PHAGE_Klebsi_phiKO2_NC_005857(1),PHAGE_Cellul_phi18:3_NC_021794(1),PHAGE_Shewan_1/44_NC_025463(1),PHAGE_Achrom_phiAxp_2_NC_029106(1),PHAGE_Pseudo_vB_Pae_Kakheti25_NC_017864(1),PHAGE_Vibrio_12A10_NC_029067(1),PHAGE_Pseudo_vB_PaeS_PM105_NC_028667(1),PHAGE_Pseudo_vB_PaeS_SCH_Ab26_NC_024381(1),PHAGE_Cellul_phi46:3_NC_021792(1),PHAGE_Vibrio_12B3_NC_021067(1),PHAGE_Ralsto_RS138_NC_029107(1),PHAGE_Salmon_SSU5_NC_018843(1),PHAGE_Vibrio_12B12_NC_021070(1),PHAGE_Cellul_phi39:1_NC_021804(1),PHAGE_Pseudo_phiMK_NC_031110(1),PHAGE_Pseudo_73_NC_007806(1),PHAGE_Pseudo_PaMx74_NC_028809(1),PHAGE_Pseudo_MP22_NC_009818(1),PHAGE_Rhizob_vB_RleS_L338C_NC_023502(1),PHAGE_Pseudo_PaMx42_NC_028879(1),PHAGE_Burkho_phi6442_NC_009235(1),PHAGE_Stenot_S1_NC_011589(1),PHAGE_Pseudo_B3_NC_006548(1),PHAGE_Pseudo_D3112_NC_005178(1),PHAGE_Bacter_Lily_NC_028841(1),PHAGE_Burkho_phiE125_NC_003309(1),PHAGE_Vibrio_X29_NC_024369(1),PHAGE_Burkho_AH2_NC_018283(1)   3                                16.66%                           56.85%                        
                                  5              35Kb                     incomplete(50)                capsid,integrase                             5794433-5829445          0                        18                      10                               3                                72.2%                            5                                yes                              9                                PHAGE_Entero_JenP1_NC_029028(2),PHAGE_Entero_CAjan_NC_028776(2),PHAGE_Entero_JenP2_NC_028997(2),PHAGE_Psychr_pOW20_A_NC_020841(1),PHAGE_Idioma_1N2_2_NC_025439(1),PHAGE_Burkho_BcepGomr_NC_009447(1),PHAGE_Strept_MM1_NC_003050(1),PHAGE_Strept_EJ_1_NC_005294(1),PHAGE_Mycoba_Milly_NC_026598(1),PHAGE_Entero_JenK1_NC_029021(1),PHAGE_Mycoba_Cheetobro_NC_028979(1),PHAGE_Strept_phiARI0746_NC_031907(1),PHAGE_Salico_CGphi29_NC_020844(1),PHAGE_Gordon_Wizard_NC_030913(1),PHAGE_Entero_phiFL3A_NC_013648(1),PHAGE_Mycoba_Phelemich_NC_022063(1),PHAGE_Deep_s_D6E_NC_019544(1),PHAGE_Verruc_P8625_NC_029047(1),PHAGE_Pseudo_PPpW_3_NC_023006(1),PHAGE_Bacill_TP21_L_NC_011645(1),PHAGE_Aurant_AmM_1_NC_027334(1),PHAGE_Bacill_BM5_NC_029069(1),PHAGE_Burkho_phiE12_2_NC_009236(1),PHAGE_Bacill_phi105_NC_004167(1),PHAGE_Bacill_BMBtp2_NC_019912(1),PHAGE_Escher_slur01_NC_028831(1),PHAGE_Mycoba_ZoeJ_NC_024147(1),PHAGE_Mycoba_Acadian_NC_023701(1),PHAGE_Thermo_THSA_485A_NC_018264(1),PHAGE_Entero_phiFL1A_NC_013646(1),PHAGE_Lactob_Lj771_NC_010179(1),PHAGE_Mycoba_Baee_NC_028742(1)   2                                11.11%                           49.25%                        
                                  6              33.7Kb                   questionable(80)              recombinase,capsid,terminase,tail,head       6867447-6901202          0                        37                      26                               7                                89.1%                            4                                yes                              7                                PHAGE_Pseudo_phi3_NC_030940(19),PHAGE_Aeromo_phiO18P_NC_009542(17),PHAGE_Haemop_HP1_NC_001697(10),PHAGE_Pasteu_F108_NC_008193(9),PHAGE_Vibrio_8_NC_022747(9),PHAGE_Vibrio_K139_NC_003313(9),PHAGE_Haemop_HP2_NC_003315(8),PHAGE_Ralsto_RSY1_NC_025115(3),PHAGE_Burkho_KS14_NC_015273(2),PHAGE_Burkho_KS5_NC_015265(2),PHAGE_Salmon_Fels_2_NC_010463(2),PHAGE_Ralsto_RSA1_NC_009382(1),PHAGE_Phormi_MIS_PhV1A_NC_029032(1),PHAGE_Entero_N15_NC_001901(1),PHAGE_Salmon_RE_2010_NC_019488(1),PHAGE_Vibrio_vB_VpaM_MAR_NC_019722(1),PHAGE_Halomo_phiHAP_1_NC_010342(1),PHAGE_Klebsi_phiKO2_NC_005857(1),PHAGE_Vibrio_VP882_NC_009016(1),PHAGE_Bdello_phi1422_NC_019525(1),PHAGE_Entero_186_NC_001317(1),PHAGE_Pseudo_phiCTX_NC_003278(1),PHAGE_Entero_fiAA91_ss_NC_022750(1),PHAGE_Haemop_SuMu_NC_019455(1),PHAGE_Burkho_KL3_NC_015266(1)   18                               51.35%                           55.42%

第二个变量的源文件(这是一个非常大的文件):

source          1..7215267
                     /organism="Hahella chejuensis KCTC 2396"
                     /mol_type="genomic DNA"
                     /strain="KCTC 2396"
                     /db_xref="taxon:349521"
     gene            247..381
                     /locus_tag="HCH_00001"
     CDS             247..381
                     /locus_tag="HCH_00001"
                     /codon_start=1
                     /transl_table=11
                     /product="hypothetical protein"
                     /protein_id="ABC26924.1"
                     /translation="MGFGHRVLFSLKNINIRFSLYIESRRLKFAQKKSKHVRILEVWK
                     "
     gene            378..1781
                     /gene="dnaA"
                     /locus_tag="HCH_00002"
     CDS             378..1781
                     /gene="dnaA"
                     /locus_tag="HCH_00002"
                     /note="TIGRFAMsMatches:TIGR00362"
                     /codon_start=1
                     /transl_table=11
                     /product="chromosomal replication initiator protein DnaA"
                     /protein_id="ABC26925.1"
                     /translation="MTSELWHQCLGYLEDELPAQQFNTWLRPLQAKGSEEELLLFAPN
                     RFVLDWVNEKYIGRINEILSELTSQKAPRISLKIGSITGNSKGQQASKDSAVGATRTT
                     APSRPVIADVAPSGERNVTVEGAIKHESYLNPTFTFETFVEGKSNQLARAAAMQVADN
                     PGSAYNPLFLYGGVGLGKTHLMQAVGNAIFKKNPNAKILYLHSERFVADMVKALQLNA
                     FNEFKRLYRSVDALLIDDIQFFARKERSQEEFFHTFNALLEGGQQMILTCDRYPKEID
                     HMEERLKSRFGWGLTVMVEPPELETRVAILMKKAEQANVHLSSESAFFIAQKIRSNVR
                     ELEGALKLVIANAHFTGQEITPAFIRECLKDLLALHEKQVSIDNIQRTVAEYYKIRIA
                     DILSKRRTRSITRPRQMAMALAKELTNHSLPEIGEAFGGRDHTTVLHACKVMIELQQS
                     DPTLRDDYQNFMRMLTS"
     gene            1884..2987
                     /gene="dnaN"
                     /locus_tag="HCH_00003"
     CDS             1884..2987
                     /gene="dnaN"
                     /locus_tag="HCH_00003"
                     /EC_number="2.7.7.7"
                     /note="TIGRFAMsMatches:TIGR00663"
                     /codon_start=1
                     /transl_table=11
                     /product="DNA polymerase III, beta subunit"
                     /protein_id="ABC26926.1"
                     /translation="MKLTITREALVTSLQMISGVVEKRQTMPVLANVLLDARDGKLVI
                     TGTNMEVELVAEISDVNIEHESRITVPAKKFTDICRALPEGAAIGIELKDGRLNVRYG
                     SSHFILSTLPAEHFPNVEEEPESVKVTLPQRELKRLIDATAFAMAQQDVRYYLNGMLM
                     ELDEQGLRTVATDGHRLALANVSLQTGVSEKRQPIVPRKGILELGRLLNDTDESCTLV
                     FGDNHVRASVGHFTFTSKLIDGKFPDYQRVIPRSGDKVMLADRVLLKGVLSRASILSH
                     ESIRGVRLQFEEGLLKVFANNPDQEEAEDSLEVEYPHEALQIGFNVGYLIDVLNALDD
                     EQVKVTLSNANSSALVEGVDTRDAVYVVMPMRL"
     gene            3008..3103
                     /locus_tag="HCH_00004"
     CDS             3008..3103
                     /locus_tag="HCH_00004"
                     /codon_start=1
                     /transl_table=11
                     /product="hypothetical protein"
                     /protein_id="ABC26927.1"
                     /translation="MNLFELERSRRVARSGMTLGKDVSPLNADRV"
     gene            3128..4405
                     /gene="aarF"
                     /locus_tag="HCH_00005"
     CDS             3128..4405
                     /gene="aarF"
                     /locus_tag="HCH_00005"
                     /note="Predicted unusual protein kinase; COG0661"
                     /codon_start=1
                     /transl_table=11
                     /product="ABC1 family protein kinase"
                     /protein_id="ABC26928.1"
                     /translation="MGKIVNAVKGAARIGQTAAVISKVGLGWLKGNRAPAPRLLRQTF
                     EELGATYIKLGQFIASSPTFFPADYVEEFQLCLDKTKPLPYSQIEKILKEEFKRPLQS
                     IYSHIDTKPLASASIAQVHAARLVTGEDVVIKVQKPGVRNVLLTDLNFLYVAARVVEY
                     LAPKLSWTSLSGIVEEIQRTMMEECDFYQEAANLKEFREFLVSSGNDQAVVPTVYEQA
                     STMRVLTMERFYGVPLTDLETIRKYCSDPEKTLITAMNTWFASLTQCDFFHADVHAGN
                     LMVLEDGRIGFIDFGIVGRIGAGTWQAVSDFITAIMMGNFHGMADAMSRIGITKSQLS
                     VDDLAADIADVYKKMDAMTPDMPPIYYDQQTGDDEVNNILMDLVRIGEQHGLHFPREF
                     ALLLKQFLYFDRYVHVLAPELDMFMDERLSLIQ"

3 回答

  • 0

    您需要分离并命名要对数据执行的所有操作 . 然后找到等效的UNIX命令 . 大多数UNIX工具在行方面工作,而不是在列方面工作,因此学会在行中思考是有益的;)并且请将所有工具放在一起 - 而不是将潜在的大数据分配给变量 .

    您想从两个文件中提取一个列( awk ),然后将两个提取的列一起粘贴到输出( paste )中,由 \t 分隔(默认情况下粘贴使用 \t ),前面是 Headers 行 . 您可以创建两个中间文件,也可以使用shell替换 .

    paste\
      <( <CP000155.phaster awk '$2~/[0-9]Kb/{print ($5)}' )\
      <( <CP000155.gbk awk '$1~/CDS/{print ($2)}' ) |
    (echo -e 'Phaster_positions\tGBKPositions'; cat) \
    > gbk3.txt
    

    编辑:查看源数据,这可能会产生所需的文件格式,但不是正确的数据 . 你需要确保来自第一个 awk 的第三行与来自第二个 awk 的第三行完全对应 . 您的数据可能需要使用唯一标识符 join 进行组合 .

  • 0

    我没有开始非常彻底地浏览这些文件,基本上从代码中取出部分并将它们添加到单个脚本中并添加了散列和输出:

    $ awk -v OFS="\t" '        # tab as output delimiter
    NR==FNR && $2~/[0-9]Kb/ {  # process the first file (with a condition)
        a[++i]=$5              # hash $5 to a
        next                   # process next record
    }
    $1~/CDS/ {                 # process the second file (with a condition)
        b[++j]=$2              # hash $2 to b
    }
    END {
        print "Phaster_positions","GBKPositions"
        if(i>=j)               # was there more is or js
            n=i                # take the bigger value and use it...
        else 
            n=j
        for(i=1;i<=n;i++)      # ... here
            print a[i],b[i]    # output side by side
    }' first second
    

    输出:

    Phaster_positions       GBKPositions
    371860-418565   247..381
    2947108-2988239 378..1781
    4663633-4680174 1884..2987
    5756724-5793879 3008..3103
    5794433-5829445 3128..4405
    6867447-6901202
    

    这有任何意义吗?它将文件1匹配存储到 a 哈希,文件2匹配 b . 如果有大量数据,则可能会耗尽内存 . 如果是这种情况,请报告回来,我们将为您安排另一个解决方案 .

    Update

    这个只存储文件1到 a 并在输出文件2时清空它:

    awk '
    BEGIN {
        OFS="\t"               # the output field separator
        print "Phaster_positions","GBKPositions"  # output the header
    }
    NR==FNR && $2~/[0-9]Kb/ {  # process the first file (with a condition)
        a[++i]=$5              # hash $5 to a
        next                   # process next record
    }
    $1~/CDS/ {                 # process the second file (with a condition)
        print ((++j in a)?a[j]:"") OFS $2  # output from a if exists and $2
        delete a[j]            # delete after output
    }
    END {
        for(j=1;j<=i;j++)      # stupid loop
            if(j in a)         # if there are any left in a
            print a[j] OFS     # output them
    }' first second
    

    没有经过实战考验, END 中的 for 循环是愚蠢的 .

  • -1
    printf "Phaster_positions\tGBKPositions\n\n">gbk3.txt
    
    PhasterPositions=`awk '$2~/[0-9]Kb/{print ($5)}' CP000155.phaster`
    GBKPositions=`awk '$1~/CDS/{print ($2)}' CP000155.gbk`
    
    printf "$PhasterPositions\t$GBKPositions">>gbk3.txt
    

    看看这是否有效

相关问题