Undocumented Corner

By Robert R. Collins, September 01, 1996

How does your program know which Intel processor is the current system CPU? Robert looks at the options, including Intel's PUSHF/POPF technique.

Detecting Intel Processors

Knowing the generation of a system CPU

Robert is a design verification manager at Texas Instruments' Microprocessor Design Center. Robert can be reached via e-mail at [email protected].

The debate about the correct way to detect different generations of Intel microprocessors has raged for years. In one corner are programmers who traditionally used a series of PUSHF/POPF instructions to detect the FLAGs differences between processors. In the other corner, it always seemed I stood alone, arguing that this technique is flawed. The debate subsided somewhat in 1989, when Intel published an algorithm that relied upon PUSHF/POPF for microprocessor identification. But even while the naysayers said, "See, even Intel does it our way," I stood in my little corner saying "Sure, but it's wrong."

The truth is, neither algorithm is fail-safe. Intel's PUSHF/POPF method can misdiagnose which processor family is running and is not guaranteed to operate outside of real mode. My technique should always run in v86 mode, but sometimes doesn't because of shortcomings in the design of many v86 memory managers, like EMM386 from Microsoft.

Intel's Algorithm

All current-generation Intel x86 processors have an instruction called CPUID that reads CPU identification information. This information can be used by software to dynamically take advantage of processor-specific programming techniques. Before CPUID, you needed to write an algorithm to detect differences between different generations of processors. This algorithm would serve much of the same purpose as executing the CPUID instruction. Intel didn't invent the algorithm; the company borrowed one that was in wide distribution on the Internet, and published it in the i486 Microprocessor Programmer's Reference Manual (Intel Corp. 1990), claiming "Copyright Intel Corporation." Oddly, the original algorithm was published in two halves, in opposite ends of the manual. Section 22.10 contained the algorithm to detect the differences between 8086 through 80386. Figure 3-23 contained the algorithm to detect the difference between the 80386 and 80486. The latest edition of this manual removes the code fragments, referring you to "AP-485, Intel Processor Identification With the CPUID Instruction," Order Number 241618 (ftp://ftp.intel.com/pub/IAL/software_specs/ap48504f.pdf).

AP-485 includes the following comment:

Please understand that the code sequences have been validated by Intel to detect CPU_ID, math coprocessor function, and initialize accordingly. Any other approach may produce unpredictable results in future processors.

It's ironic that Intel claims that "any other approach may produce unpredictable results," since its algorithm is prone to failures that yield unpredictable results (as I'll demonstrate in this article). For more information on CPUID, see the text box "Pentium Detection," by Robert Moote (which accompanied the article "Processor-Detection Schemes," by Richard C. Leinecker, DDJ, June 1993).

The Intel algorithm relies on a series of PUSHF/POPF instructions to set and clear various FLAGs bits. Each generation of processor has a slightly different behavior which may be detected by this approach. This algorithm makes no attempt to detect the 80186/88 series of processors. In this regard, the algorithm is incomplete.

The 8086/88 is distinguished from the 80286 by attempting to clear bits 12-15 of the FLAGs register. The 8086/88 will always set these bits, regardless of what values are popped into them (see Listing One). The 286 treats these bits differently. In real mode, these bits are always cleared by the 286; in protected mode, they are used for IOPL (I/O Privilege Level) and NT (Nested Task). To continue the detection code, you need to set bits 12-15 in the FLAGs register, and see if they are cleared by the processor. If they are, then a 286 has been detected (see Listing Two).
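The two tests above can be sketched as a small Python model rather than as assembly, so the logic can be run anywhere. This is only an illustration under stated assumptions: `flags_after_popf` and `classify_16bit` are hypothetical names, and the per-CPU rules encode just the real-mode behavior described in the text (plus the assumption that a 386 or later reads bit 15 back as zero).

```python
HIGH_BITS = 0xF000  # FLAGS bits 12-15


def flags_after_popf(cpu, value):
    """Model the FLAGS image a PUSHF reports after POPF loads `value`."""
    value &= 0xFFFF
    if cpu == "8086":
        return value | HIGH_BITS   # bits 12-15 always read back as set
    if cpu == "286":
        return value & 0x0FFF      # real mode: bits 12-15 always cleared
    return value & 0x7FFF          # assumption: 386+ latches 12-14, bit 15 is 0


def classify_16bit(cpu, original_flags=0x0002):
    """Return 0 (8086/88), 2 (80286), or 3 (at least a 386)."""
    # Listing One: try to clear bits 12-15 and see whether they stay set.
    readback = flags_after_popf(cpu, original_flags & 0x0FFF)
    if (readback & HIGH_BITS) == HIGH_BITS:
        return 0
    # Listing Two: try to set bits 12-15 and see whether they stay clear.
    readback = flags_after_popf(cpu, original_flags | HIGH_BITS)
    if (readback & HIGH_BITS) == 0:
        return 2
    return 3


if __name__ == "__main__":
    for cpu in ("8086", "286", "386"):
        print(cpu, classify_16bit(cpu))
```

The model makes the order of the tests visible: the "clear" test must come first, because a 286 would also pass a naive "bits are not all set" check.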

If you get beyond this point in the algorithm, you know you have at least a 386. Therefore, it is safe to use 32-bit instructions, like PUSHFD/POPFD. This will be necessary in detecting the difference between a 386 and 486. These processors are distinguished from each other by attempting to set the AC flag in the EFLAGs register. This flag was introduced in the 486. The 386 never sets this bit, and always clears it when it is set by POPFD. Therefore, to detect the difference between these processor generations, the algorithm attempts to set this bit, to see if it is latched or cleared by the processor (see Listing Three).

At this point in the algorithm, you're almost home. To detect the difference between the 486 and the Pentium, you attempt to set another new EFLAGs bit (bit 21) called the "ID flag." This flag has only one purpose: to indicate the presence of the CPUID instruction. This bit was first introduced on the Pentium, but later retrofitted into the 486. If the CPUID instruction exists on either processor, it may be executed to return the processor-identification information. 486s without the CPUID instruction will not be able to toggle this bit. Therefore, it is safe to execute a sequence of instructions on either processor that detects the processor's ability to toggle this bit (see Listing Four).
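The AC and ID tests form a short ladder, which the following Python sketch models. It is not the Intel code: `round_trip`, `can_toggle`, and `eflags_ladder` are hypothetical names, and `writable_mask` is an assumed stand-in for which of the two bits a given processor actually latches.

```python
AC_FLAG = 1 << 18  # 40000h, introduced on the 486
ID_FLAG = 1 << 21  # 200000h, present only where CPUID exists


def round_trip(writable_mask, value):
    """Model POPFD followed by PUSHFD: unsupported bits read back as 0."""
    keep = ~(AC_FLAG | ID_FLAG) | writable_mask
    return value & keep & 0xFFFFFFFF


def can_toggle(writable_mask, bit):
    """True if flipping `bit` survives the modeled round trip."""
    original = round_trip(writable_mask, 0x00000002)
    readback = round_trip(writable_mask, original ^ bit)
    return (readback ^ original) != 0


def eflags_ladder(writable_mask):
    """Walk the same decisions as Listings Three and Four."""
    if not can_toggle(writable_mask, AC_FLAG):
        return "386"
    if not can_toggle(writable_mask, ID_FLAG):
        return "486 without CPUID"
    return "CPUID available"


if __name__ == "__main__":
    for mask in (0, AC_FLAG, AC_FLAG | ID_FLAG):
        print(hex(mask), eflags_ladder(mask))
```

Note that the ladder never names the Pentium directly. A toggling ID bit only says that CPUID can be executed; the family comes from CPUID itself.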

Once the algorithm gets to this point, you can execute the CPUID instruction to obtain the processor identification. This instruction can be run in any processor mode, at any privilege level. On the Pentium and 486, the CPUID instruction has two levels:

Level 0 returns a vendor ID string in EBX:EDX:ECX, which says "GenuineIntel" when printed as ASCII text.
Level 1 returns the processor identification signature, the same signature that appears in the EDX register after a processor RESET (see Listing Five).
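To make the two levels concrete, here is a minimal Python sketch of how the returned registers are interpreted. The function names are hypothetical, and the register values are written out by hand rather than read from a processor. The family field matches the shift-and-mask at the end of Listing Five; the model and stepping fields are the adjacent 4-bit fields of the signature.

```python
import struct


def vendor_string(ebx, edx, ecx):
    """Join the level 0 registers in EBX:EDX:ECX order, low byte first."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def decode_signature(eax):
    """Split a level 1 signature into family, model, and stepping."""
    return {
        "family": (eax >> 8) & 0xF,   # same as: shr eax, 8 / and eax, 0fh
        "model": (eax >> 4) & 0xF,
        "stepping": eax & 0xF,
    }


# Hand-written register images that spell "GenuineIntel".
EBX, EDX, ECX = 0x756E6547, 0x49656E69, 0x6C65746E

if __name__ == "__main__":
    print(vendor_string(EBX, EDX, ECX))
    print(decode_signature(0x0543))
```

The odd register order (EBX, then EDX, then ECX) is why Listing Five stores the three registers at offsets 0, 4, and 8 in that sequence.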

The complete Intel algorithm is available in AP-485, or via anonymous FTP at ftp://ftp.intel.com/pub/IAL/tools_utils_demos/cpuid3.zip.

The Caveats

In spite of Intel's claim, this algorithm is far from perfect. For one thing, it fails to detect the 80186/88 series of processors. Even though this processor wasn't adopted by many PC manufacturers, it was used in some computers, primarily notebook computers. The 80186/88 processor contains most of the new instructions and CPU-generated exceptions contained in the 80286. These instructions include PUSHA/POPA, PUSH immed, SHL reg, immed, and the invalid opcode exception. The only 80286 instructions and exceptions not implemented in the 80186/88 are those specifically used for protected mode. Failure to detect this processor could prohibit the use of some software that can take advantage of these new instructions and exceptions.

This algorithm is only designed to run in real mode, not in a virtual-8086 DOS box running under Windows. This limitation is even mentioned in the 486 manual. It results from the fact that PUSHF and POPF are sensitive to the I/O Privilege Level (IOPL) while running in virtual-8086 mode. (DOS boxes, running under Windows, run in virtual-8086 mode, a special form of protected mode.) If IOPL is not equal to three, then a general-protection fault occurs while attempting to execute these instructions. The operating system then intervenes to emulate the instruction as it sees fit. Therefore, there is no guarantee that the operating system will mimic the real-mode behavior of the specific processor under test. In reality, this may not be as big a problem as it sounds. Windows sets IOPL equal to three for DOS boxes. This renders these instructions transparent to the operating system, and they execute without generating a fault.

Not all operating systems with a DOS-compatibility box follow the example set by Windows. OS/2 Warp uses a special form of virtual-8086 mode, called Virtual Mode Extensions (VME). Running in VME affords the protection advantages of running at IOPL=2 without incurring the faults generated by PUSHF/POPF used in this algorithm. (See http://www.x86.org/vme1 for a discussion on VME.) To accommodate this behavior, Intel modified the algorithms of PUSHF/POPF to allow them to run in VME without faulting to the host operating system. When IOPL<3, PUSHF always pushes an IOPL value of three onto the stack. This doesn't cause any problems for the Intel algorithm, as none of the detection code depends upon setting or clearing these two bits alone.
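Why the forced IOPL value is harmless can be checked with a small Python model. This is a sketch under loud assumptions: `vme_pushf_image` is a hypothetical name, bit 15 is assumed to read as zero, and because the text says nothing about how NT behaves here, the model simply tries both possibilities.

```python
IOPL_MASK = 0x3000  # FLAGS bits 12-13
NT_FLAG = 0x4000    # FLAGS bit 14
HIGH_BITS = 0xF000  # FLAGS bits 12-15


def vme_pushf_image(popped_value, nt_latched):
    """Assumed model: the IOPL field always reads 3, bit 15 reads 0."""
    image = (popped_value & 0x0FFF) | IOPL_MASK
    if nt_latched and (popped_value & NT_FLAG):
        image |= NT_FLAG
    return image


def classify_under_vme(nt_latched):
    """Run the Listing One and Listing Two tests against the model."""
    # Listing One: bits 12-15 all set would mean an 8086/88.
    if (vme_pushf_image(0x0002, nt_latched) & HIGH_BITS) == HIGH_BITS:
        return 0
    # Listing Two: bits 12-15 all clear would mean an 80286.
    if (vme_pushf_image(0xF002, nt_latched) & HIGH_BITS) == 0:
        return 2
    return 3


if __name__ == "__main__":
    print(classify_under_vme(True), classify_under_vme(False))
```

In this model both tests look at all four bits together, so an IOPL field stuck at three can never make the result look like an 8086 (all four set) or a 286 (all four clear).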

Should the CPUID instruction ever return a maximum level with the high bit set (for example, 80000001h), the Intel algorithm will fail. In Listing Five, the instruction above the designated <- symbol is a conditional jump based on a signed comparison, so such a value is treated as negative and the level 1 query is skipped. This is a common programming error which can easily be fixed in the Intel algorithm.
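The difference between the two kinds of comparison can be shown with a few lines of Python that mimic 32-bit register arithmetic. The function names are hypothetical, and replacing the signed jump with an unsigned one (JB) is an assumption about what the fix would look like, not something taken from Intel's code.

```python
def as_signed32(value):
    """Interpret a 32-bit register image as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def level1_allowed_signed(max_level):
    """Mirror `cmp eax, 1` / `jl end_cpuid_type` as written in Listing Five."""
    return not (as_signed32(max_level) < 1)


def level1_allowed_unsigned(max_level):
    """Mirror `cmp eax, 1` / `jb end_cpuid_type`, the assumed correction."""
    return not ((max_level & 0xFFFFFFFF) < 1)


if __name__ == "__main__":
    for level in (0, 1, 2, 0x80000001):
        print(hex(level),
              level1_allowed_signed(level),
              level1_allowed_unsigned(level))
```

For every value below 80000000h the two versions agree, which is why the error goes unnoticed on the processors available today.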

This algorithm relies on undocumented processor behavior to detect the differences between early generations of Intel processors. The use of such programming tricks violates Intel's own recommendations. Consider the following guidelines set forth in various Intel manuals:

Reserved Bits and Software Compatibility

Software should not try to identify features by exploiting programming tricks, undocumented features, or otherwise deviating from the guidelines presented in this application note.

When bits are marked as reserved, it is essential for compatibility with future processors that software treat these bits as having a future, though unknown, effect. The behavior of reserved bits should be regarded as not only undefined, but unpredictable. Software should follow these guidelines in dealing with reserved bits:

Do not use undocumented features of a processor to identify steppings or features.
Do not depend on the states of any reserved bits when testing the values of registers which contain such bits. Mask out the reserved bits before testing.
Do not depend on the states of any reserved bits when storing to memory or to a register.
Do not depend on the ability to retain information written into any reserved bits.
When loading a register, always load the reserved bits with the values indicated in the documentation, if any, or reload them with values previously read from the same register.

These guidelines were quoted from a combination of two sources: Pentium Pro Family Developer's Manual, Volume 3: Operating System Writer's Manual (1996), section 1.3.2 and AP-485 Application Note: Intel Processor Identification With the CPUID Instruction. Very similar guidelines also appear in the 80386 High Performance Microprocessor with Integrated Memory Management Unit (1985), section 2.3.10; i486 Microprocessor (1989), section 2.1.6; and Pentium Processor Family Developer's Manual, Volume 3 (1995), section 1.3.2.

These are strong guidelines set forth in Intel's documentation, and the irony of Intel's algorithm is that it violates each and every one of them. Detecting the difference between 8086/88 and 80286, and between 80286 and 80386, completely depends upon setting and clearing reserved bits in the FLAGs register, and then depends on the state of those bits when they are stored to a resultant register. Detecting the difference between 386 and 486, and between 486 and Pentium, depends upon setting an EFLAGs bit that is undefined on the previous-generation processor, then depends on that processor to clear the undefined bit. To abide by Intel's guidelines, the behavior of these undocumented FLAGs bits must be documented in their respective manuals, but it isn't. None of these differences are documented in any of the processors' respective data sheets. Processor behavior often isn't documented until many years after release. The 8086 FLAGs behavior was first described in the 386 programmer's reference manual in 1988 (nearly ten years after the 8086's introduction). The 80286 FLAGs behavior wasn't described until the Pentium manuals were introduced in 1993 (ten years after the 80286 introduction, and four years after Intel introduced this algorithm in the 486 manuals).

Even though Intel's algorithm violates all of its own guidelines, the company is partially exonerated by the Pentium programmer's reference manual, where Intel says that it's acceptable to use this algorithm to detect the differences in these processors. However, the Pentium manual doesn't change the prohibitions set forth in the 386 or 486 manuals; those prohibitions still exist. The following excerpt was taken from the Pentium Programmer's Reference Manual, chapter 5:

The setting of the flags stored by the PUSHF instruction, by interrupts, and by exceptions is different on the 32-bit processors than that stored by the 8086, and Intel 286 processors in bits 12 and 13 (IOPL), 14 (NT), and 15 (reserved). These differences can be used to distinguish what type of processor is present in a system while an application is running.

My biggest objection to this algorithm is that it's prone to failure on all processors newer than a 386. When it fails, the algorithm incorrectly determines that a 386 processor is installed in the system. The failure is caused when an interrupt occurs precisely where the <- appears in Listing Three. When this occurs, the AC flag is cleared (in real mode), and the algorithm fails to detect the correct processor type. The AC flag has always behaved in this manner, but the behavior wasn't documented until the 1994 edition of the Pentium Programmer's Reference Manual (chapter 25, description of INT instruction). There are a few ways to demonstrate this failure (assuming you're running on a 486 or later processor). You can put an HLT instruction or an INT instruction at the point designated by the <-, or run the algorithm in a loop. Eventually, a timer-tick interrupt will occur at this point. Inserting an HLT instruction will force the processor to wait for an interrupt before continuing. When the interrupt occurs, the AC flag will be cleared during its invocation. Listing Six presents source code to demonstrate this behavior.
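The failure window can be modeled in a few lines of Python. This sketch is not Listing Six. It assumes a 486-class processor on which AC is writable, and it encodes the interrupt's effect exactly as described above: an interrupt taken between POPFD and PUSHFD leaves AC cleared. The name `detect_with_interrupt` is hypothetical.

```python
AC_FLAG = 1 << 18  # 40000h


def detect_with_interrupt(interrupt_at_marker):
    """Model Listing Three on a CPU that supports AC; return 3 or 4."""
    original = 0x00000002            # pushfd / pop eax / mov ecx, eax
    eflags = original ^ AC_FLAG      # xor eax, 40000h / push eax / popfd
    if interrupt_at_marker:
        # Real-mode interrupt at the marker: AC comes back cleared.
        eflags &= ~AC_FLAG
    readback = eflags                # pushfd / pop eax
    # xor eax, ecx / jz end_cpu_type: zero means "could not toggle AC".
    return 3 if (readback ^ original) == 0 else 4


if __name__ == "__main__":
    print("no interrupt:", detect_with_interrupt(False))
    print("interrupt at marker:", detect_with_interrupt(True))
```

The window is only one instruction boundary wide, which explains why the misdetection is rare in practice and why running the test in a loop, or placing an HLT at the marker, is needed to observe it reliably.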

Conclusion

The Intel algorithm isn't nearly as bad as it sounds. It has a few bugs that can easily be fixed. Intel's intentions were noble, but their implementation was flawed (see http://www.x86.org for an updated version of this algorithm). In spite of its drawbacks, the reasons this algorithm is in such widespread use are simple:

It's conveniently available and published by Intel.
It works most of the time, even in v86 mode.

The biggest drawbacks are that it's not guaranteed to work outside of real mode, and it depends upon undocumented processor behavior. It would be nice if an algorithm existed that could get the actual stepping information of processors that don't support the CPUID instruction, without relying on undocumented processor behavior. In my next column, I'll present such an algorithm, discuss its strengths and weaknesses, and compare the two algorithms under real operating conditions.

Listing One


    pushf                ; push original FLAGS
    pop     ax           ; get original FLAGS
    mov     cx, ax       ; save original FLAGS
    and     ax, 0fffh    ; clear bits 12-15 in FLAGS
    push    ax           ; save new FLAGS value on stack
    popf                 ; replace current FLAGS value
    pushf                ; get new FLAGS
    pop     ax           ; store new FLAGS in AX
    and     ax, 0f000h   ; if bits 12-15 are set, then
    cmp     ax, 0f000h   ;   processor is an 8086/8088
    mov     _cpu_type, 0 ; turn on 8086/8088 flag
    je      end_cpu_type ; jump if processor is 8086/8088



Listing Two

    or      cx, 0f000h   ; try to set bits 12-15
    push    cx           ; save new FLAGS value on stack
    popf                 ; replace current FLAGS value
    pushf                ; get new FLAGS
    pop     ax           ; store new FLAGS in AX
    and     ax, 0f000h   ; if bits 12-15 are clear
    mov     _cpu_type, 2 ; processor=80286, turn on 80286 flag
    jz      end_cpu_type ; if no bits set, processor is 80286




Listing Three

    pushfd               ; push original EFLAGS
    pop     eax          ; get original EFLAGS
    mov     ecx, eax     ; save original EFLAGS
    xor     eax, 40000h  ; flip AC bit in EFLAGS
    push    eax          ; save new EFLAGS value on stack
    popfd                ; replace current EFLAGS value
                         ; <-
    pushfd               ; get new EFLAGS
    pop     eax          ; store new EFLAGS in EAX
    xor     eax, ecx     ; can't toggle AC bit, processor=80386
    mov     _cpu_type, 3 ; turn on 80386 processor flag
    jz      end_cpu_type ; jump if 80386 processor
    push    ecx
    popfd                ; restore AC bit in EFLAGS first




Listing Four

    mov     _cpu_type, 4    ; turn on 80486 processor flag
    mov     eax, ecx        ; get original EFLAGS
    xor     eax, 200000h    ; flip ID bit in EFLAGS
    push    eax             ; save new EFLAGS value on stack
    popfd                   ; replace current EFLAGS value
    pushfd                  ; get new EFLAGS
    pop     eax             ; store new EFLAGS in EAX
    xor     eax, ecx        ; can't toggle ID bit,
    je      end_cpu_type    ; processor=80486





Listing Five

    mov      _cpuid_flag, 1  ; flag indicating use of CPUID inst.
    push     ebx             ; save registers
    push     esi
    push     edi
    mov      eax, 0          ; set up for CPUID instruction
    CPU_ID                   ; get and save vendor ID

    mov      dword ptr _vendor_id, ebx
    mov      dword ptr _vendor_id[+4], edx
    mov      dword ptr _vendor_id[+8], ecx

    mov      si, ds
    mov      es, si

    mov      si, offset _vendor_id
    mov      di, offset intel_id
    mov      cx, 12          ; should be length intel_id
    cld                      ; clear direction flag (compare forward)
    repe     cmpsb           ; compare vendor ID to "GenuineIntel"
    jne      end_cpuid_type  ; if not equal, not an Intel processor

    mov      _intel_CPU, 1   ; indicate an Intel processor
    cmp      eax, 1          ; make sure 1 is valid input for CPUID
    jl       end_cpuid_type  ; if not, jump to end
                             ; <-
    mov      eax, 1
    CPU_ID                   ; get family/model/stepping/features
    mov     _cpu_signature, eax
    mov     _features_ebx, ebx
    mov     _features_edx, edx
    mov     _features_ecx, ecx

    shr     eax, 8          ; isolate family
    and     eax, 0fh
    mov     _cpu_type, al   ; set _cpu_type with family




Listing Six

    TITLE   intel
    DOSSEG
    .model  small
    .stack  100h
;----- Include file section -----
    includelib  \masm\lib\miscutil.lib
    includelib  \masm\lib\videofns.lib
; ----- External declarations -----
    extrn   _get_fpu_type:     proc
    extrn   _get_cpu_type:     proc
    extrn   Set_cursor:        proc
    extrn   Get_cursor:        proc

    extrn   HEX32OUT:          proc
    extrn   CLS:               proc

    extrn   _cpu_type:         byte
    extrn   _fpu_type:         byte
    extrn   _cpuid_flag:       byte
    extrn   _intel_CPU:        byte
    extrn   _vendor_id:        byte
    extrn   _cpu_signature:    dword
    extrn   _features_ecx:     dword
    extrn   _features_edx:     dword
    extrn   _features_ebx:     dword
;------ Local variables & Equates ------
    KBD_ReadFn          equ     0       ; function to read keyboard
    KBD_StatusFn        equ     1       ; function to read keyboard status
; ------ Misc data variables ------
    .data
    PSeriesMsg  label   byte
                db      "P6:     "
    P6Buffer    db      "         ",0dh,0ah
                db      "P5:     "
    P5Buffer    db      "         ",0dh,0ah
                db      "P4:     "
    P4Buffer    db      "         ",0dh,0ah
                db      "P3:     "
    P3Buffer    db      "         ",0dh,0ah
                db      "P2:     "
    P2Buffer    db      "         ",0dh,0ah
                db      "P1:     "
    P1Buffer    db      "         ",0dh,0ah
                db      "P0:     "
    P0Buffer    db      "         ",0dh,0ah,24h

    P6Count     dd      0
    P5Count     dd      0
    P4Count     dd      0
    P3Count     dd      0
    P2Count     dd      0
    P1Count     dd      0
    P0Count     dd      0

    CPUTbl1     dw      offset  P6Count
                dw      offset  P5Count
                dw      offset  P4Count
                dw      offset  P3Count
                dw      offset  P2Count
                dw      offset  P1Count
                dw      offset  P0Count

    CPUTbl2     dw      offset  P6Buffer
                dw      offset  P5Buffer
                dw      offset  P4Buffer
                dw      offset  P3Buffer
                dw      offset  P2Buffer
                dw      offset  P1Buffer
                dw      offset  P0Buffer

    CPUID_Buffer db     "        $"
;------------------------------------------------------------------------
        .code
        .8086
start:  mov     ax, @data
        mov     ds, ax      ; set segment register
        mov     es, ax      ; set segment register
        and     sp, not 3   ; align stack to avoid AC fault
        call    CLS         ; clear screen
        call    Get_cursor
        mov     ah,9
        mov     dx,offset PSeriesMsg        ; get message buffer address
        int     21h
        mov     P6Buffer[8],'$'     ; make ASCII$ string
        mov     P5Buffer[8],'$'     ; make ASCII$ string
        mov     P4Buffer[8],'$'     ; make ASCII$ string
        mov     P3Buffer[8],'$'     ; make ASCII$ string
        mov     P2Buffer[8],'$'     ; make ASCII$ string
        mov     P1Buffer[8],'$'     ; make ASCII$ string
        mov     P0Buffer[8],'$'     ; make ASCII$ string
@GetCPUID:
        call    _get_cpu_type       ; determine processor type
        call    print
        mov     _cpu_type,0         ; clear it...for later
        mov     ah,KBD_StatusFn     ; get keyboard status
        int     16h                 ; read keyboard status
        jz      @GetCPUID
        mov     ah,KBD_ReadFn       ; read keyboard function
        int     16h                 ; get key
        mov     ax, 4c00h           ; terminate program
        int     21h
;------------------------------------------------------------------------
print  proc     near
       xor      bx,bx
       mov      bl,_cpu_type    ; get CPUID
       shl      bx,1            ; *2
       mov      si,CPUTbl1[bx]  ; get pointer to variable
       add      word ptr [si],1 ; adjust CPUID counter
       adc      word ptr [si][2],0
       mov      dx,608h         ; get initial row/col pointer
       sub      dh,byte ptr _cpu_type
       call     Set_cursor      ; set cursor position
       mov      si,CPUTbl1[bx]
       mov      di,CPUTbl2[bx]  ; get buffer pointer
       call     HEX32OUT        ; do buffer
       mov      ah,9            ; print it
       mov      dx,CPUTbl2[bx]  ; get buffer address
       int      21h
       ret
print  endp
       end      start
