
How do I continue processing a file when I reach a null String?


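I am trying to read a file containing DNA sequences. In my program I want to read every subsequence of length 4 of the DNA and store it in my HashMap to count how often each subsequence occurs. For example, if I have the sequence CCACACCACACCCACACACCCAC and I want every subsequence of length 4, the first 3 subsequences would be:
CCAC, CACA, ACAC
To do this I have to iterate over the string multiple times. Here is my implementation: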

try
    {
        String file = sc.nextLine();
        BufferedReader reader = new BufferedReader(new FileReader(file + ".fasta")); 

        Map<String, Integer> frequency = new HashMap<>(); 

        String line = reader.readLine();

        while(line != null)
        {
            System.out.println("Processing Line: " + line);
            String [] kmer = line.split("");

            for(String nucleotide : kmer)
            {
                System.out.print(nucleotide);
                int sequence = nucleotide.length(); 
                for(int i = 0; i < sequence; i++)
                {
                    String subsequence = nucleotide.substring(i, i+5); 
                    if(frequency.containsKey(subsequence))
                    {
                        frequency.put(subsequence, frequency.get(subsequence) +1);
                    }
                    else
                    {
                        frequency.put(subsequence, 1);
                    }
                }
            }
            System.out.println();
            line = reader.readLine();
        }
        System.out.println(frequency);            
    }
    catch(StringIndexOutOfBoundsException e)
    {
        System.out.println();
    }

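The problem occurs when I reach the end of the string: because of the error, the program does not continue processing. How can I get around that?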

4 Answers

  • 0
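    • You can read each line directly and extract the 4-character substrings from it, without splitting the line every time you read one.

    The error occurs because, as the program loops over the characters, there may be fewer than 4 characters left to extract, and that is what throws. For example, suppose you have the line CCACACC and cut it into groups of 4: the first group, CCAC, is complete, but the second group, ACC, is not. So when the line nucleotide.substring(i, i+5) in your code runs near the end, there may be no complete group left to extract, and the program throws. Note also that to extract 4 characters you need to add 4, not 5.

    A workaround is therefore to put the extraction line in a try block, as in the edited code below. Replace the loop with the code below.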

    String line;
    while((line = reader.readLine()) != null)
    {
        for(int i = 0; i < line.length(); i++)
        {
            String subsequence;
            // put the extract operation in a try block
            // to avoid crashing
            try
            {
                subsequence = line.substring(i, i+4);
            }
            catch(StringIndexOutOfBoundsException e)
            {
                // fewer than 4 characters are left on this line,
                // so stop scanning it and move on to the next line
                break;
            }

            if(frequency.containsKey(subsequence))
            {
                frequency.put(subsequence, frequency.get(subsequence) +1);
            }
            else
            {
                frequency.put(subsequence, 1);
            }
        }
    }
    
  • -1

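    You are calling substring(i, i+5). Near the end of the string, i+5 is out of range. Suppose your string is "ABCDEFGH", with length 8: your loop runs from i = 0 to i = 7. When i reaches 4, substring(4, 9) cannot be computed and an exception is thrown.

    Try this, which keeps i+5 within bounds (for substrings of length 4, use substring(i, i+4) with the condition i <= sequence - 4):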

    for(int i = 0; i < sequence - 4; i++)
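
To make the bound concrete, here is a minimal sketch of the same idea for substrings of length 4, applied to the sequence from the question. The class name `KmerWindowDemo` and the method `countKmers` are made up for this illustration, and using `Map.merge` in place of the `containsKey`/`put` pair is my own substitution, not something from the original code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, named only for this illustration.
class KmerWindowDemo {

    // Counts every overlapping substring of length k in a single line.
    static Map<String, Integer> countKmers(String line, int k) {
        // LinkedHashMap keeps first-seen order, which makes the printout easier to read.
        Map<String, Integer> frequency = new LinkedHashMap<>();
        // The last valid start index is line.length() - k, so
        // substring(i, i + k) never reaches past the end of the line.
        for (int i = 0; i + k <= line.length(); i++) {
            frequency.merge(line.substring(i, i + k), 1, Integer::sum);
        }
        return frequency;
    }

    public static void main(String[] args) {
        // Example sequence taken from the question.
        System.out.println(countKmers("CCACACCACACCCACACACCCAC", 4));
    }
}
```

Because the loop condition already rules out incomplete windows, no try/catch is needed, and a line shorter than 4 characters simply contributes nothing.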
    
  • 0

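    The problem description is not clear at all, but I am guessing that your input file ends with an empty line.

    Try removing the last newline in the input file, or check for an empty line in the while loop: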

    while (line != null && !line.isEmpty())
    
  • 0

    Going by the title of your post... try changing the condition of the while loop. Instead of the current:

    String line = reader.readLine();
    while(line != null) {
        // ...... your code .....
    }
    

    Use this code:

    String line;
    while((line = reader.readLine()) != null) {
        // If file line is blank then skip to next file line.
        if (line.trim().equals("")) {
            continue;
        }
        // ...... your code .....
    }
    

    This takes care of blank lines in the file.
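
As a small runnable sketch of that loop, the example below reads from an in-memory string instead of a file so it can be tried without any input data. The class name `BlankLineDemo`, the method `nonBlankLines`, and the sample text are assumptions made for this illustration. Unlike a condition such as `line != null && !line.isEmpty()`, which ends the loop at the first empty line, `continue` skips the blank line and keeps reading.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named only for this illustration.
class BlankLineDemo {

    // Returns the trimmed, non-blank lines of the given text.
    static List<String> nonBlankLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Skip blank lines instead of stopping at them.
                if (line.trim().isEmpty()) {
                    continue;
                }
                lines.add(line.trim());
            }
        } catch (IOException ex) {
            // A StringReader should not fail here; rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(ex);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample with an empty line and a whitespace-only line in the middle.
        System.out.println(nonBlankLines("CCACACC\n\n   \nACACCCA\n"));
    }
}
```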

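    Now, about the StringIndexOutOfBoundsException you are getting. I believe that by now you basically know why it is thrown, so you need to decide what you want to do about it. If you want to split a string into chunks of a specific length, and the number of characters on a given file line is not evenly divisible by that length, then there are obviously a few options:

    • Ignore the remaining characters at the end of the file line. This is the easy solution, but although I know little about DNA, I am fairly sure it is not a viable route.

    • Add the remaining DNA sequence (even though it is short) to the Map. Again, I know nothing about DNA, so I cannot say whether this is a viable solution. Maybe it is; I simply don't know.

    • Add the remaining short DNA sequence to the beginning of the next input file line, then split that line into 4-character chunks. Keep doing this until the end of the file, and if the final DNA sequence turns out to be short, add it to the Map (or not).

    There may of course be other options; whatever they are, the choice is yours to make. To help you along, though, here is code covering the three options I mentioned:

    Ignore the remaining characters: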

    Map<String, Integer> frequency = new HashMap<>();
    String subsequence;
    String line;
    try (BufferedReader reader = new BufferedReader(new FileReader("DNA.txt"))) {
        while ((line = reader.readLine()) != null) {
            // If file line is blank then skip to next file line.
            if (line.trim().equals("")) {
                continue;
            }
    
            for (int i = 0; i < line.length(); i += 4) {
                // Get out of loop - Don't want to deal with remaining Chars
                if ((i + 4) > line.length()) {
                       break;
                }
    
                subsequence = line.substring(i, i + 4);
                if (frequency.containsKey(subsequence)) {
                    frequency.put(subsequence, frequency.get(subsequence) + 1);
                }
                else {
                    frequency.put(subsequence, 1);
                }
            }
        }
    }
    catch (IOException ex) {
        ex.printStackTrace();
    }
    

    Add the remaining DNA sequence (even though it is short) to the Map:

    Map<String, Integer> frequency = new HashMap<>();
    String subsequence;
    String line;
    try (BufferedReader reader = new BufferedReader(new FileReader("DNA.txt"))) {
        while ((line = reader.readLine()) != null) {
            // If file line is blank then skip to next file line.
            if (line.trim().equals("")) {
                continue;
            }
    
            String lineRemaining = "";
    
            for (int i = 0; i < line.length(); i += 4) {
                // Fewer than 4 Chars left - keep them as the remainder
                if ((i + 4) > line.length()) {
                    lineRemaining = line.substring(i);
                    break;
                }
    
                subsequence = line.substring(i, i + 4);
                if (frequency.containsKey(subsequence)) {
                    frequency.put(subsequence, frequency.get(subsequence) + 1);
                }
                else {
                    frequency.put(subsequence, 1);
                }
            }
            if (lineRemaining.length() > 0) {
                subsequence = lineRemaining;
                if (frequency.containsKey(subsequence)) {
                    frequency.put(subsequence, frequency.get(subsequence) + 1);
                }
                else {
                    frequency.put(subsequence, 1);
                }
            }
        }
    }
    catch (IOException ex) {
        ex.printStackTrace();
    }
    

    Add the remaining short DNA sequence to the beginning of the next incoming file line:

    Map<String, Integer> frequency = new HashMap<>();
    String lineRemaining = "";
    String subsequence;
    String line;
    try (BufferedReader reader = new BufferedReader(new FileReader("DNA.txt"))) {
        while ((line = reader.readLine()) != null) {
            // If file line is blank then skip to next file line.
            if (line.trim().equals("")) {
                continue;
            }
            // Add remaining portion of last line to new line.
            if (lineRemaining.length() > 0) {
                line = lineRemaining + line;
                lineRemaining = "";
            }
    
            for (int i = 0; i < line.length(); i += 4) {
                // Fewer than 4 Chars left - carry them over to the next line
                if ((i + 4) > line.length()) {
                    lineRemaining = line.substring(i);
                    break;
                }
    
                subsequence = line.substring(i, i + 4);
                if (frequency.containsKey(subsequence)) {
                    frequency.put(subsequence, frequency.get(subsequence) + 1);
                }
                else {
                    frequency.put(subsequence, 1);
                }
            }
        }
        // If any Chars remaining at end of file then
        // add to MAP
        if (lineRemaining.length() > 0) {
            // merge() increments an existing count instead of overwriting it
            frequency.merge(lineRemaining, 1, Integer::sum);
        }
    }
    catch (IOException ex) {
        ex.printStackTrace();
    }
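
The three variants above cut each line into non-overlapping 4-character chunks. If what you actually want is the overlapping subsequences from the question (CCAC, CACA, ACAC, ...), the carry-over idea can be adapted by keeping only the last k - 1 characters of each line. The following is only a sketch under a few assumptions: the class name `CrossLineKmerDemo` and the method `countKmers` are invented for this example, it reads from an in-memory string, not a file, and it assumes the usual FASTA convention that header lines start with `>`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, named only for this illustration.
class CrossLineKmerDemo {

    // Counts overlapping substrings of length k, including those that span a line break.
    static Map<String, Integer> countKmers(String fastaText, int k) {
        Map<String, Integer> frequency = new HashMap<>();
        String carry = ""; // last k - 1 characters of the previous sequence line
        try (BufferedReader reader = new BufferedReader(new StringReader(fastaText))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                if (line.startsWith(">")) {
                    carry = ""; // assumed FASTA header: do not join across records
                    continue;
                }
                String window = carry + line;
                for (int i = 0; i + k <= window.length(); i++) {
                    frequency.merge(window.substring(i, i + k), 1, Integer::sum);
                }
                // carry is shorter than k, so no window is ever counted twice
                carry = window.substring(Math.max(0, window.length() - (k - 1)));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return frequency;
    }

    public static void main(String[] args) {
        // The question's sequence, split over two lines with a made-up header.
        String sample = ">seq1\nCCACACCACAC\n\nCCACACACCCAC\n";
        System.out.println(countKmers(sample, 4));
    }
}
```

With this approach the counts should not depend on where the file happens to wrap its lines, which is usually what you want when one sequence is spread across several lines.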
    
