Multiple/unnecessary movzx/movsxd with char -> byte -> IntPtr casts #38794

@Sergio0694

Description

I found a couple of places where the JIT seems to emit unnecessary movzx/movsxd instructions when doing unchecked conversions from char to byte and then back up to IntPtr. It's basically just doing the same operation again.

Consider this simple example (sharplab.io sample here):

public static unsafe IntPtr Convert(char c)
{
    return unchecked((IntPtr)(void*)(byte)c);
}

I get the following:

;  V00 arg0         [V00,T00] (  3,  3   )  ushort  ->  rcx        
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )    long  ->  zero-ref    "Inlining Arg"
;  V03 tmp2         [V03,T01] (  2,  4   )    long  ->  rax         "NewObj constructor temp"
;* V04 tmp3         [V04    ] (  0,  0   )    long  ->  zero-ref    "Inlining Arg"
;
; Lcl frame size = 0

G_M51760_IG01:
						;; bbWeight=1    PerfScore 0.00
G_M51760_IG02:
       0FB7C1               movzx    rax, cx
       0FB6C0               movzx    rax, al
						;; bbWeight=1    PerfScore 0.50
G_M51760_IG03:
       C3                   ret      
						;; bbWeight=1    PerfScore 1.00

; Total bytes of code 7, prolog size 0, PerfScore 2.20

That second movzx rax, al is not actually needed, right?
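To make the semantics of the cast chain concrete, here is a minimal sketch. It assumes that going through `long` instead of `void*` yields the same value (this avoids needing unsafe code); `CastChainSketch`, `ConvertCast` and `ConvertMasked` are hypothetical names, not part of the original sample. The narrowing to byte keeps only the low 8 bits of the char, so the result is simply `c & 0xFF`, which suggests that a single zero-extension from the low byte of the argument would be enough.

```csharp
using System;

public static class CastChainSketch
{
    // Mirrors the cast chain from Convert: char -> byte -> native integer.
    // Assumption: (long) stands in for the (void*) step to stay in safe code.
    public static IntPtr ConvertCast(char c) => unchecked((IntPtr)(long)(byte)c);

    // The same value written as a single mask of the low 8 bits.
    public static IntPtr ConvertMasked(char c) => (IntPtr)(long)(c & 0xFF);
}
```

Both methods should agree for every possible char, since the high 8 bits never survive the cast to byte.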

Or, this other related case (sharplab.io sample here):

// Assume this has 256 items in it (one per possible byte value)
private static ReadOnlySpan<byte> SomeTable => new byte[] { 1, 2, 3 };

[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public static byte Lookup(char c)
{
    ref byte r0 = ref MemoryMarshal.GetReference(SomeTable);

    return Unsafe.Add(ref r0, unchecked((byte)c));
}

Gives the following:

;  V00 arg0         [V00,T00] (  3,  3   )  ushort  ->  rcx        
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )  struct (16) zero-ref    "struct address for call/obj"
;* V03 tmp2         [V03    ] (  0,  0   )  struct (16) zero-ref    "NewObj constructor temp"
;* V04 tmp3         [V04    ] (  0,  0   )  struct ( 8) zero-ref    "NewObj constructor temp"
;* V05 tmp4         [V05    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg"
;* V06 tmp5         [V06    ] (  0,  0   )   byref  ->  zero-ref    "Inlining Arg"
;* V07 tmp6         [V07    ] (  0,  0   )     int  ->  zero-ref    "Inlining Arg"
;* V08 tmp7         [V08    ] (  0,  0   )   byref  ->  zero-ref    V02._pointer(offs=0x00) P-INDEP "field V02._pointer (fldOffset=0x0)"
;* V09 tmp8         [V09    ] (  0,  0   )     int  ->  zero-ref    V02._length(offs=0x08) P-INDEP "field V02._length (fldOffset=0x8)"
;* V10 tmp9         [V10    ] (  0,  0   )   byref  ->  zero-ref    V03._pointer(offs=0x00) P-INDEP "field V03._pointer (fldOffset=0x0)"
;* V11 tmp10        [V11    ] (  0,  0   )     int  ->  zero-ref    V03._length(offs=0x08) P-INDEP "field V03._length (fldOffset=0x8)"
;* V12 tmp11        [V12    ] (  0,  0   )   byref  ->  zero-ref    V04._value(offs=0x00) P-INDEP "field V04._value (fldOffset=0x0)"
;* V13 tmp12        [V13    ] (  0,  0   )   byref  ->  zero-ref    V05._pointer(offs=0x00) P-INDEP "field V05._pointer (fldOffset=0x0)"
;* V14 tmp13        [V14    ] (  0,  0   )     int  ->  zero-ref    V05._length(offs=0x08) P-INDEP "field V05._length (fldOffset=0x8)"
;
; Lcl frame size = 0

G_M14832_IG01:
						;; bbWeight=1    PerfScore 0.00
G_M14832_IG02:
       0FB7C1               movzx    rax, cx
       0FB6C0               movzx    rax, al
       4863C0               movsxd   rax, eax
       48BA7CB76ABD22020000 mov      rdx, 0x222BD6AB77C
       0FB60410             movzx    rax, byte  ptr [rax+rdx]
						;; bbWeight=1    PerfScore 3.00
G_M14832_IG03:
       C3                   ret      
						;; bbWeight=1    PerfScore 1.00

; Total bytes of code 24, prolog size 0, PerfScore 6.40

Here we have both the unnecessary movzx and an extra movsxd rax, eax (presumably because the Unsafe.Add overload being picked is the int one?). I think the JIT should be able to remove that as well: the value has just been zero-extended from a byte, so it can never be negative, and it is converted to a native integer right after that, so the sign extension isn't necessary in this situation.

Thankfully, the extra movsxd can be worked around by manually casting the index to IntPtr. I couldn't remove that duplicate movzx, though.
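A minimal sketch of that workaround follows. The exact cast used is not shown in this report, so the `(IntPtr)(uint)(byte)c` form, the `LookupSketch` class, and the 256-entry `Table` are assumptions made for illustration. The point is only that the index is handed to the `Unsafe.Add(ref T, IntPtr)` overload rather than the `int` one; whether a given JIT version then drops the movsxd would need to be checked in the disassembly.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class LookupSketch
{
    // Hypothetical 256-entry table, so every byte index is in range.
    private static readonly byte[] Table = BuildTable();

    private static byte[] BuildTable()
    {
        var table = new byte[256];
        for (int i = 0; i < table.Length; i++) table[i] = (byte)(255 - i);
        return table;
    }

    // Original shape: the index is passed to the Unsafe.Add(ref T, int) overload.
    public static byte LookupInt(char c)
    {
        ReadOnlySpan<byte> span = Table;
        ref byte r0 = ref MemoryMarshal.GetReference(span);
        return Unsafe.Add(ref r0, (int)unchecked((byte)c));
    }

    // Workaround shape: the index is widened to IntPtr by hand,
    // so the Unsafe.Add(ref T, IntPtr) overload is used instead.
    public static byte LookupNative(char c)
    {
        ReadOnlySpan<byte> span = Table;
        ref byte r0 = ref MemoryMarshal.GetReference(span);
        return Unsafe.Add(ref r0, (IntPtr)(uint)unchecked((byte)c));
    }
}
```

Both methods read the same element for every input; only the overload chosen for the offset differs.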

Configuration

  • Tested on both .NET 5 Preview 6 with disasmo, and on sharplab.io on .NET Core 3.x
  • Locally I have Windows 10 Pro x64 19041.x
  • x64

Regression?

From what I can see on sharplab.io, this happens on .NET Core 3.x too, and even way back on .NET Framework x64.

category:cq
theme:basic-cq
skill-level:intermediate
cost:medium

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
