x86 NASM: Converting a string to an integer

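This is a simple problem, but it's making my head spin. I need to convert a string of characters (the input is a negative decimal number) into an unsigned integer. The `rdi` register holds the string to be converted, and the `rax` register will hold the result.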

```
    xor rsi, rsi
    xor rax, rax
    xor dl, dl
    xor rdx, rdx
convert:
    mov dl, [rdi+rsi]    ;+rsi causes segmentation fault

    cmp dl, "-"
    jz  increment

    cmp dl, "."
    jz  dtoi_end

    sub dl, "0"

    mov rdx, 10
    mul rdx

    add rax, dl          ;invalid combination

    inc rsi
    jmp convert

increment:
    inc rsi
    jmp convert

convert_end:
    ret
```

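I need to iterate over each character, and I'm trying to do that with the `rsi` register as an index. But every time I try this, I get a segmentation fault.
I also get an "invalid combination" error on the `add`. I know this is because the registers are different sizes, but I don't know how else to add the converted ASCII value back into `rax`.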

A similar question helped me understand the process better, but I've hit a wall: Convert string to int. x86 32 bit Assembler using Nasm

1 Answer

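> I need to iterate over each character, and I'm trying to do that with the `rsi` register as an index. But every time I try this, I get a segmentation fault.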

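Based on the code you've shown, and on your statement that `RDI` holds the address of the start of the string, I can see a couple of different reasons why that load might be causing a segmentation fault.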

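Perhaps the problem is that `RDI` contains an 8-character ASCII string (passed by value), rather than the address of the memory location that contains the string (passed by reference)?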
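To make the by-value case concrete, here is a small C model, assuming little-endian x86-64 loading order. The helper `pack_by_value` is hypothetical and only shows what `RDI` would contain if the characters themselves were placed in the register. For `"-123"`, that value is 0x3332312D. The loop would then treat that number as an address, and it is very likely not mapped in your process.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models what RDI would hold if (up to) the first 8
   characters of a string were passed *by value* in the register.
   x86-64 is little-endian, so the first character lands in the lowest byte. */
uint64_t pack_by_value(const char *s)
{
    uint64_t v = 0;
    size_t n = strlen(s);
    if (n > 8)
        n = 8;                      /* a 64-bit register holds at most 8 bytes */
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}
```

For example, `pack_by_value("-123")` yields 0x3332312D ('-' = 0x2D, '1' = 0x31, and so on). A pointer to the string would instead be whatever address the string was stored at.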

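Another, more likely, possibility is that the loop works fine for the first few iterations, but then starts reading past the end of the string because you never terminate it correctly. The code you've shown has no `dtoi_end` label, and nothing ever jumps to the `convert_end` label. Are these supposed to be the same label? What happens if I pass in the string `"-2"`? When does your loop terminate? As far as I can tell, it doesn't!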
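The following C sketch models only the control flow of the question's loop. The function name `question_loop_exit` is hypothetical. A `'-'` is skipped, a `'.'` is the only way out (the jump to `dtoi_end`), and every other byte just advances the index. The C version adds a length bound so it can report the overrun safely; the assembly has no such bound.

```c
#include <assert.h>
#include <stddef.h>

/* Models the exit logic of the question's loop over the n bytes of s.
   Returns the index of the '.' that ends the loop, or -1 if the loop
   would step past the last byte (where the real code keeps reading). */
long question_loop_exit(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {  /* bound added for the model only */
        if (s[i] == '.')
            return (long)i;           /* the only exit: jz dtoi_end */
        /* '-' and digits both end with inc rsi / jmp convert */
    }
    return -1;                        /* "-2" ends up here: no '.', no exit */
}
```

With `"-2"` there is no `'.'`, so the loop has no exit. Even a terminating NUL would be treated as a digit (`sub dl, "0"` simply wraps around) and the loop would carry on into whatever memory follows.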

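You need some way to indicate that the entire string has been processed. There are two common approaches. One is to put a sentinel terminator at the end of the string, as C does with the ASCII NUL character. Inside the loop, you check whether the character being processed is 0 (NUL) and, if so, break out of the loop. The other is to pass the length of the string as an additional parameter to the function, as Pascal does with counted-length strings. Then, inside the loop, you test whether you've processed enough characters and, if so, break out of the loop.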
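Here is a C sketch of both termination strategies, with the same skip-`'-'` and stop-at-`'.'` rules as the question. The function names `parse_until_nul` and `parse_counted` are hypothetical, and both assume the remaining characters are decimal digits (no validation).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel approach (C style): the loop ends at the NUL terminator. */
uint64_t parse_until_nul(const char *s)
{
    uint64_t result = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '-') continue;      /* skip the sign, as in the question */
        if (s[i] == '.') break;         /* stop at a decimal point */
        result = result * 10 + (uint64_t)(s[i] - '0');
    }
    return result;
}

/* Counted-length approach (Pascal style): the caller passes the length,
   so bytes after the first len are never read. */
uint64_t parse_counted(const char *s, size_t len)
{
    uint64_t result = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == '-') continue;
        if (s[i] == '.') break;
        result = result * 10 + (uint64_t)(s[i] - '0');
    }
    return result;
}
```

In assembly, the first version becomes a `cmp dl, 0` / `je` check inside the loop. The second becomes a comparison of the index register against a length register at the top of the loop.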

I'll try not to be too preachy about this, but you should have been able to detect this problem yourself by using a debugger. Step through the code line by line, watching the values of the variables and registers, and make sure you understand what is happening. That is basically what I did when analyzing your code, except that I used my head as the debugger, stepping through the code in my own mind. It is much easier (and less error-prone) to let the computer do it, though, and that's why debuggers were invented. If you haven't stepped through your code line by line in a debugger, you haven't yet worked hard enough to solve the problem yourself. In fact, single-stepping through every function you write is a good habit, because (A) you've already written it, and (B) it will help you find bugs.

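> I also get an "invalid combination" error on the `add`. I know this is because the registers are different sizes, but I don't know how else to add the converted ASCII value back into `rax`.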

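You have to make the sizes match. You could do `add al, dl`, but then you would be limiting the result to an 8-bit BYTE, which is probably not what you want. So you need to extend `dl` to a 64-bit QWORD, the same size as `rax`. The obvious way to do this is with the `MOVZX` instruction, which performs zero extension. In other words, it widens the value to the larger size and fills the upper bits with 0s. That is what you want for unsigned values. For signed values you would need sign extension (i.e., copying the sign bit into the upper bits), and for that you would use the `MOVSX` instruction.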

In code:
```
movzx  rdx, dl
add    rax, rdx
```
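The difference between the two extensions can be modeled in C with casts. The function names below are hypothetical. The sign-extending cast relies on the usual two's-complement behavior of `int8_t`, which mainstream compilers provide. `add_digit` mirrors the `movzx rdx, dl` / `add rax, rdx` pair above.

```c
#include <assert.h>
#include <stdint.h>

/* MOVZX rdx, dl: bits 63-8 of the result are all 0. */
uint64_t zero_extend8(uint8_t dl) { return (uint64_t)dl; }

/* MOVSX rdx, dl: bits 63-8 become copies of bit 7 (the sign bit). */
uint64_t sign_extend8(uint8_t dl) { return (uint64_t)(int64_t)(int8_t)dl; }

/* movzx rdx, dl / add rax, rdx: widen the digit, then add at full width. */
uint64_t add_digit(uint64_t rax, uint8_t dl) { return rax + zero_extend8(dl); }
```

For a digit value (0 to 9) the two extensions agree. They differ only when bit 7 is set. For example, the byte 0xF6 zero-extends to 246 but sign-extends to 0xFFFFFFFFFFFFFFF6, i.e. -10.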
Note that, as a commenter pointed out, `DL` is just the lowest 8 bits of the `RDX` register:
```
| 63 - 32 | 31 - 16 | 15 - 8 | 7 - 0 |
--------------------------------------
                    |   DH   |   DL  |
--------------------------------------
          |           EDX            |
--------------------------------------
|                 RDX                |
```
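So having both `xor dl, dl` and `xor rdx, rdx` is redundant: the latter already includes the former. Also, every time you modify `dl` you are modifying the lowest 8 bits of `rdx`, and vice versa. In your loop, `mov rdx, 10` overwrites the digit you just computed in `dl`, and `mul rdx` then writes the high half of the product into `rdx` as well, so you will get wrong results. Hint, hint: this is something else you would have caught by single-stepping in a debugger (even if you might not have understood why!).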
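To see the aliasing problem outside a debugger, here is a C model that treats `RDX` as a 64-bit integer and `DL` as its low byte. The helpers `get_dl`, `set_dl`, and `dl_after_question_sequence` are hypothetical. The last one replays the question's instructions for a single character.

```c
#include <assert.h>
#include <stdint.h>

/* Reading DL: the low byte of RDX. */
uint8_t get_dl(uint64_t rdx) { return (uint8_t)rdx; }

/* Writing DL (mov dl, ... / sub dl, ...): only bits 7-0 change;
   bits 63-8 keep their old contents. */
uint64_t set_dl(uint64_t rdx, uint8_t v) { return (rdx & ~(uint64_t)0xFF) | v; }

/* Replays the question's loop body for one character c and returns
   what DL holds right before the add. */
uint8_t dl_after_question_sequence(uint64_t rdx, char c)
{
    rdx = set_dl(rdx, (uint8_t)c);                      /* mov dl, [rdi+rsi] */
    rdx = set_dl(rdx, (uint8_t)(get_dl(rdx) - '0'));    /* sub dl, "0": the digit */
    rdx = 10;                                           /* mov rdx, 10: digit gone */
    /* mul rdx would then also overwrite RDX with the product's high half */
    return get_dl(rdx);
}
```

Whatever the input character, this sketch returns 10 rather than the digit, because the full-width write to `RDX` replaces `DL` too.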

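And you don't even need `xor rdx, rdx`! `xor edx, edx` does the same job more efficiently. On x86-64, writing a 32-bit register automatically zeroes the upper 32 bits of the full 64-bit register, and the 32-bit form has a shorter encoding because it needs no REX.W prefix.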
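The write rules that make `xor edx, edx` sufficient can be modeled in C as follows (the function names are hypothetical). A 32-bit write replaces the whole 64-bit register, zeroing bits 63-32. An 8-bit write merges into the existing value.

```c
#include <assert.h>
#include <stdint.h>

/* Writing EDX: the 32-bit result is zero-extended into all of RDX. */
uint64_t write_edx(uint64_t rdx, uint32_t v)
{
    (void)rdx;                 /* the old upper half is discarded */
    return (uint64_t)v;
}

/* Writing DL: bits 63-8 of RDX are left unchanged. */
uint64_t write_dl(uint64_t rdx, uint8_t v)
{
    return (rdx & ~(uint64_t)0xFF) | v;
}

/* xor edx, edx: EDX ^ EDX is 0, and the 32-bit write clears bits 63-32 too. */
uint64_t xor_edx_edx(uint64_t rdx)
{
    uint32_t edx = (uint32_t)rdx;
    return write_edx(rdx, edx ^ edx);
}
```

Starting from an all-ones `RDX`, `xor_edx_edx` leaves 0. The 8-bit equivalent of `xor dl, dl` (`write_dl(rdx, 0)`) leaves 0xFFFFFFFFFFFFFF00, which is why clearing only `dl` is not enough.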

Just for fun, here is one possible implementation:
```
; Parameters: RDI == address of start of character string
;             RCX == number of characters in string
; Clobbers:   RDX, RSI
; Returns:    result is in RAX

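    xor   eax, eax       ; RAX = 0: the result accumulator must start at zero
    xor   esi, esi       ; RSI = 0: index of the current character

convert:
    ; See if we've done enough characters by comparing our current index
    ; against the length of the string.
    cmp   rsi, rcx
    jae   convert_end    ; unsigned compare: index and length are never negative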

    ; Get the next character from the string.
    mov   dl, BYTE [rdi + rsi]

    cmp   dl, "-"
    je    increment

    cmp   dl, "."
    je    convert_end

    ; Efficient way to multiply by 10.
    ; (Faster and less difficult to write than the MUL instruction.)
    add   rax, rax
    lea   rax, [4 * rax + rax]

    sub   dl, "0"
    movzx rdx, dl
    add   rax, rdx

    ; (fall through to increment---no reason for redundant instructions!)

increment:
    inc   rsi            ; increment index/counter
    jmp   convert        ; keep looping

convert_end:
    ret
```
(Warning: this logic is untested! I simply rewrote your existing code in a more optimized way and removed the bugs.)
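One way to check the routine's logic is to mirror it in C. The sketch below is a hypothetical model, not a replacement for stepping through the assembly. `s` stands for `RDI`, `len` for `RCX`, `rsi` for the index, and `rax` for the result. It shows why `add rax, rax` followed by `lea rax, [4 * rax + rax]` multiplies by 10: it doubles the value, then multiplies the doubled value by 5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C model of the answer's routine (assumes the input contains only
   '-', '.', and decimal digits). */
uint64_t convert_model(const char *s, size_t len)
{
    uint64_t rax = 0;                            /* RAX must start at 0 */
    for (size_t rsi = 0; rsi < len; rsi++) {     /* exit once RSI >= RCX */
        uint8_t dl = (uint8_t)s[rsi];            /* mov dl, BYTE [rdi + rsi] */
        if (dl == '-') continue;                 /* je increment */
        if (dl == '.') break;                    /* je convert_end */
        rax += rax;                              /* add rax, rax: 2x */
        rax = 4 * rax + rax;                     /* lea: 5 * 2x = 10x */
        dl -= '0';                               /* sub dl, "0" */
        rax += (uint64_t)dl;                     /* movzx rdx, dl / add rax, rdx */
    }
    return rax;
}
```

For example, `convert_model("-42.7", 5)` returns 42: it skips the sign, accumulates 4 and then 42, and stops at the decimal point.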
Answered 2024-04-20T02:17:33+08:00
