# Zynq UltraScale+ RFSoC **Data Sheet: Overview**

DS889 (v1.5) July 23, 2018

**Advance Product Specification**

# **General Description**

The Zynq® UltraScale+™ RFSoC family integrates key subsystems for multiband, multi-mode cellular radios and cable infrastructure (DOCSIS) into an SoC platform that contains a feature-rich 64-bit quad-core ARM® Cortex<sup>™</sup>-A53 and dual-core ARM Cortex-R5 based processing system. Combining the processing system with UltraScale<sup>™</sup> architecture programmable logic and RF-ADCs, RF-DACs, and soft-decision FECs, the Zynq UltraScale+ RFSoC family is capable of implementing a complete software-defined radio including direct RF sampling data converters, enabling CPRI™ and gigabit Ethernet-to-RF on a single, highly programmable SoC.

Zynq UltraScale+ RFSoCs integrate up to 16 channels of RF-ADCs and RF-DACs. The RF-ADCs can sample input frequencies up to 4GHz at 4.096GSPS with excellent noise spectral density. The RF-DACs generate output carrier frequencies up to 4GHz using the 2nd Nyquist zone with excellent noise spectral density at an update rate of 6.554GSPS. The RF data converters also include power-efficient digital down converters (DDCs) and digital up converters (DUCs) that include programmable interpolation and decimation, an NCO, and a complex mixer. The DDCs and DUCs can also support dual-band operation.

The soft-decision FEC (SD-FEC) is a highly flexible forward error correction engine capable of operating in Turbo decoding mode for wireless applications such as LTE, and in LDPC encode/decode mode for 5G wireless, backhaul, and DOCSIS 3.1 cable modems.

# Key Components of the Zynq UltraScale+ RFSoC

Figure 1: Zynq UltraScale+ RFSoC

© Copyright 2017–2018 Xilinx, Inc. Xilinx, the Xilinx logo, Artix, ISE, Kintex, Spartan, UltraScale, Virtex, Vivado, Zynq, and other designated brands included herein are trademarks of Xilinx in the United States and other countries.
AMBA, AMBA Designer, ARM, ARM1176JZ-S, CoreSight, Cortex, and PrimeCell are trademarks of ARM in the EU and other countries. CPRI is a trademark of Siemens AG. PCI, PCIe, and PCI Express are trademarks of PCI-SIG and used under license. All other trademarks are the property of their respective owners.

# **Summary of Features**

# **RF Data Converter Subsystem Overview**

Most Zynq UltraScale+ RFSoCs include an RF data converter subsystem, which contains multiple radio frequency analog-to-digital converters (RF-ADCs) and multiple radio frequency digital-to-analog converters (RF-DACs). The high-precision, high-speed, power-efficient RF-ADCs and RF-DACs can be individually configured for real data or can be configured in pairs for real and imaginary I/Q data. The 12-bit RF-ADCs support sample rates up to 2.058GSPS or 4.096GSPS, depending on the selected device. The 14-bit RF-DACs support sample rates up to 6.554GSPS.

# Soft Decision Forward Error Correction (SD-FEC) Overview

Some Zynq UltraScale+ RFSoCs include highly flexible soft-decision FEC blocks for decoding and encoding data as a means to control errors in data transmission over unreliable or noisy communication channels. The SD-FEC blocks support low-density parity check (LDPC) decode/encode and Turbo decode for use in 5G wireless, backhaul, DOCSIS, and LTE applications.

### **Processing System Overview**

Zynq UltraScale+ RFSoCs feature a quad-core ARM Cortex-A53 (APU) with a dual-core ARM Cortex-R5 (RPU) processing system (PS). To support the processors' functionality, a number of peripherals with dedicated functions are included in the PS. For interfacing to external memories for data or configuration storage, the PS includes a multi-protocol dynamic memory controller, a DMA controller, a NAND controller, an SD/eMMC controller, and a Quad SPI controller. In addition to interfacing to external memories, the APU also includes a Level-1 (L1) and Level-2 (L2) cache hierarchy; the RPU includes an L1 cache and tightly coupled memory (TCM) subsystem.
Each has access to a 256KB on-chip memory.

For high-speed interfacing, the PS includes 4 channels of transmit (TX) and receive (RX) pairs of transceivers, called PS-GTR transceivers, supporting data rates of up to 6.0Gb/s. These transceivers can interface to the high-speed peripheral blocks to support PCIe® Gen2 Root Complex or Endpoint in x1, x2, or x4 configurations; Serial-ATA (SATA) at 1.5Gb/s, 3.0Gb/s, or 6.0Gb/s data rates; and up to two lanes of DisplayPort at 1.62Gb/s, 2.7Gb/s, or 5.4Gb/s data rates. The PS-GTR transceivers can also interface to components over USB 3.0 and Serial Gigabit Media Independent Interface (SGMII).

For general connectivity, the PS includes: a pair of USB 2.0 controllers, which can be configured as host, device, or On-The-Go (OTG); an I2C controller; a UART; and a CAN2.0B controller that conforms to ISO11898-1. There are also four triple speed Ethernet MACs and 128 bits of GPIO, of which 78 bits are available through the MIO and 96 through the EMIO.

High-bandwidth connectivity based on the ARM AMBA® AXI4 protocol connects the processing units with the peripherals and provides the interface between the PS and the programmable logic (PL).

# I/O, Transceiver, PCIe, 100G Ethernet, and 150G Interlaken

Data is transported on and off chip through a combination of the high-performance parallel SelectIO™ interface and high-speed serial transceiver connectivity. I/O blocks provide support for cutting-edge memory interface and network protocols through flexible I/O standard and voltage support. The serial transceivers in the UltraScale architecture-based devices transfer data at up to 32.75Gb/s, enabling 25G+ backplane designs with dramatically lower power per bit than previous generation transceivers. The GTY transceivers support the required data rates for PCIe Gen3 and Gen4 (rev 0.5), and integrated blocks for PCIe enable Zynq UltraScale+ RFSoCs to support up to Gen4 x8 and Gen3 x16 Endpoint and Root Port designs.
Integrated blocks for 150Gb/s Interlaken and 100Gb/s Ethernet (100G MAC/PCS) extend the capabilities of UltraScale™ devices, enabling simple, reliable support for Nx100G switch and bridge applications.

# **Clocks and Memory Interfaces**

Zynq UltraScale+ RFSoCs contain powerful clock management circuitry, including clock synthesis, buffering, and routing components that together provide a highly capable framework to meet design requirements. The clock network allows for extremely flexible distribution of clocks to minimize the skew, power consumption, and delay associated with clock signals. The clock management technology is tightly integrated with dedicated memory interface circuitry to enable support for high-performance external memories, including DDR4. In addition to parallel memory interfaces, Zynq UltraScale+ RFSoCs support serial memories, such as hybrid memory cube (HMC).

# Routing, Logic, Storage, and Signal Processing

Configurable logic blocks (CLBs) containing 6-input look-up tables (LUTs) and flip-flops, DSP slices with 27x18 multipliers, 36Kb block RAMs with built-in FIFO and ECC support, and 4Kx72 UltraRAM blocks are all connected with an abundance of high-performance, low-latency interconnect. In addition to logical functions, the CLB provides shift register, multiplexer, and carry logic functionality as well as the ability to configure the LUTs as distributed memory to complement the highly capable and configurable block RAMs. The DSP slice, with its 96-bit-wide XOR functionality, 27-bit pre-adder, and 30-bit A input, performs numerous independent functions including multiply accumulate, multiply add, and pattern detect.

### Configuration, Encryption, and System Monitoring

Zynq UltraScale+ RFSoCs are booted via the configuration security unit (CSU), which supports secure boot via the 256-bit AES-GCM and SHA-3/384 blocks. The cryptographic engines in the CSU can be used in the RFSoC after boot for user encryption.
The System Monitor enables the monitoring of the physical environment via on-chip temperature and supply sensors and can also monitor up to 17 external analog inputs.

# **Zynq UltraScale+ RFSoC Feature Summary**

Table 1: Zynq UltraScale+ RFSoC Feature Summary

| | XCZU21DR | XCZU25DR | XCZU27DR | XCZU28DR | XCZU29DR |
|---|---|---|---|---|---|
| 12-bit, 4.096GSPS RF-ADC w/ DDC | 0 | 8 | 8 | 8 | 0 |
| 12-bit, 2.058GSPS RF-ADC w/ DDC | 0 | 0 | 0 | 0 | 16 |
| 14-bit, 6.554GSPS RF-DAC w/ DUC | 0 | 8 | 8 | 8 | 16 |
| SD-FEC | 8 | 0 | 0 | 8 | 0 |
| Application Processing Unit | Quad-core ARM Cortex-A53 MPCore with CoreSight; NEON and Single/Double Precision Floating Point; 32KB/32KB L1 Cache | | | | |
| Real-Time Processing Unit | Dual-core ARM Cortex-R5 with CoreSight; Single/Double Precision Floating Point; 32KB/32KB L1 Cache, and TCM | | | | |
| Embedded and External Memory | 256KB On-Chip Memory w/ECC; External DDR4; DDR3; DDR3L; LPDDR4; LPDDR3; External Quad-SPI; NAND | | | | |
| General Connectivity | 214 PS I/O; UART; CAN; USB 2.0; I2C; SPI; 32b GPIO; Real Time Clock; Watchdog Timers; Triple Timer Counters | | | | |
| High-Speed Connectivity | 4 PS-GTR; PCIe® Gen1/2; Serial ATA 3.1; DisplayPort 1.2a; USB 3.0; SGMII | | | | |
| System Logic Cells | 930,300 | 678,318 | 930,300 | 930,300 | 930,300 |
| CLB Flip-Flops | 850,560 | 620,176 | 850,560 | 850,560 | 850,560 |
| CLB LUTs | 425,280 | 310,088 | 425,280 | 425,280 | 425,280 |
| Distributed RAM (Mb) | 13.0 | 9.6 | 13.0 | 13.0 | 13.0 |
| Block RAM Blocks | 1,080 | 792 | 1,080 | 1,080 | 1,080 |
| Block RAM (Mb) | 38.0 | 27.8 | 38.0 | 38.0 | 38.0 |
| UltraRAM Blocks | 80 | 48 | 80 | 80 | 80 |
| UltraRAM (Mb) | 22.5 | 13.5 | 22.5 | 22.5 | 22.5 |
| DSP Slices | 4,272 | 3,145 | 4,272 | 4,272 | 4,272 |
| CMTs | 8 | 6 | 8 | 8 | 8 |
| Maximum HP I/O | 208 | 299 | 299 | 299 | 312 |
| Maximum HD I/O | 72 | 48 | 48 | 48 | 96 |
| System Monitor | 1 | 1 | 1 | 1 | 1 |
| GTY Transceivers | 16 | 8 | 16 | 16 | 16 |
| Transceivers Fractional PLLs | 8 | 4 | 8 | 8 | 8 |
| PCIe Gen3 x16 and Gen4 x8 | 2 | 1 | 2 | 2 | 2 |
| 150G Interlaken | 1 | 1 | 1 | 1 | 1 |
| 100G Ethernet w/ RS-FEC | 2 | 1 | 2 | 2 | 2 |

The processing system rows (Application Processing Unit through High-Speed Connectivity) apply to all five devices.

Table 2: Zynq UltraScale+ RFSoC Device-Package Combinations and Maximum I/Os

| Package | Dimensions (mm) | XCZU21DR | XCZU25DR | XCZU27DR | XCZU28DR | XCZU29DR |
|---|---|---|---|---|---|---|
| FFVD1156 | 35x35 | 214, 72, 208<br>4, 16, 0, 0 | | | | |
| FFVE1156 | 35x35 | | 214, 48, 104<br>4, 8, 8, 8 | 214, 48, 104<br>4, 8, 8, 8 | 214, 48, 104<br>4, 8, 8, 8 | |
| FSVE1156 | 35x35 | | 214, 48, 104<br>4, 8, 8, 8 | 214, 48, 104<br>4, 8, 8, 8 | 214, 48, 104<br>4, 8, 8, 8 | |
| FFVG1517 | 40x40 | | 214, 48, 299<br>4, 8, 8, 8 | 214, 48, 299<br>4, 16, 8, 8 | 214, 48, 299<br>4, 16, 8, 8 | |
| FSVG1517 | 40x40 | | 214, 48, 299<br>4, 8, 8, 8 | 214, 48, 299<br>4, 16, 8, 8 | 214, 48, 299<br>4, 16, 8, 8 | |
| FFVF1760 | 42.5x42.5 | | | | | 214, 96, 312<br>4, 16, 16, 16 |
| FSVF1760 | 42.5x42.5 | | | | | 214, 96, 312<br>4, 16, 16, 16 |

Each populated cell lists PSIO, HDIO, HPIO on the first line and PS-GTR, GTY, RF-ADC, RF-DAC on the second line.

# **RF Data Converter Subsystem**

The RF data converter subsystem comprises RF-ADCs and RF-DACs.
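The sample rates quoted in this data sheet can be related to usable frequency ranges with standard sampling arithmetic. The Python sketch below is illustrative only and is not a Xilinx API: it assumes ideal Nyquist-zone boundaries at multiples of Fs/2 and models the 48-bit NCO as a generic phase accumulator (the actual register format is not given in this document). The helper names `nyquist_zone`, `nco_tuning_word`, and `decimated_rate` are hypothetical.

```python
"""Back-of-the-envelope RF converter arithmetic (illustrative sketch)."""

ADC_FS_HZ = 4.096e9  # RF-ADC sample rate quoted in the data sheet
DAC_FS_HZ = 6.554e9  # RF-DAC update rate quoted in the data sheet
NCO_BITS = 48        # NCO width quoted in the feature lists


def nyquist_zone(freq_hz: float, fs_hz: float) -> int:
    """Return the 1-based Nyquist zone; zone n spans [(n-1)*Fs/2, n*Fs/2)."""
    if freq_hz < 0 or fs_hz <= 0:
        raise ValueError("frequency must be >= 0 and sample rate must be > 0")
    return int(freq_hz // (fs_hz / 2)) + 1


def nco_tuning_word(freq_hz: float, fs_hz: float, bits: int = NCO_BITS) -> int:
    """Phase increment per sample for a generic phase-accumulator NCO.

    Assumed model: word = round(f / Fs * 2**bits), wrapped to the accumulator width.
    """
    modulus = 1 << bits
    return round(freq_hz / fs_hz * modulus) % modulus


def decimated_rate(fs_hz: float, factor: int) -> float:
    """Output sample rate after decimation by one of the listed factors."""
    if factor not in (1, 2, 4, 8):
        raise ValueError("decimation factor must be 1, 2, 4, or 8")
    return fs_hz / factor
```

Under these assumptions, `nyquist_zone(4.0e9, DAC_FS_HZ)` evaluates to 2, which agrees with the statement that a 4GHz carrier is generated in the 2nd Nyquist zone at 6.554GSPS, and `nco_tuning_word(1.024e9, ADC_FS_HZ)` gives the Fs/4 phase increment `1 << 46`.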
#### **RF-ADC Features**

- Tile oriented
  - Four RF-ADCs and one PLL per tile
  - 12-bit resolution
  - Implemented as either 4 channels of 2.058GSPS, or 2 channels of 4.096GSPS (device dependent)
- Decimation filters
  - 1x, 2x, 4x, 8x
  - Full bandwidth data-rate support
  - 80% pass band, 89dB stop-band attenuation
- Mixer
  - Full complex mixers
  - 48-bit NCO per RF-ADC
  - Fixed Fs/4, Fs/2 low-power mode
- Single/multiband flexibility
  - 2x bands per 2.058GSPS RF-ADC pair
  - Can be configured for real or imaginary (I/Q) inputs
- Signal amplitude threshold
  - Two programmable flags per RF-ADC
- Quadrature modulator correction
  - Gain/phase/offset correction per RF-ADC pair
- Multi-chip synchronization
- Flexible interconnect logic interface
  - N words x frequency selection

#### **RF-DAC Features**

- Tile oriented
  - Four RF-DACs and one PLL per tile
  - 14-bit resolution
  - Sampling speed 6.554GSPS per RF-DAC
  - 4GHz full power output bandwidth
- Interpolation
  - 1x, 2x, 4x, 8x
  - Full bandwidth data-rate support
  - 80% pass band, 89dB stop-band attenuation
- Mixing
  - Full complex mixers
  - 48-bit NCO per RF-DAC
  - Fixed Fs/4, Fs/2 low-power mode
- 1st/2nd Nyquist zone RF-DAC operation support
- Single/multiband flexibility
  - 2x bands per RF-DAC pair
  - Can be configured for real or imaginary (I/Q) outputs
- Quadrature modulator correction
  - Gain/phase/offset correction per RF-DAC pair
- sinx/x correction
- Sample delay correction
- Multi-chip synchronization
- Flexible interconnect logic interface
  - N words x frequency selection

# **Soft Decision Forward Error Correction (SD-FEC)**

The SD-FEC is a highly flexible soft-decision FEC decoder and LDPC encoder with the following features.

# LDPC Decoding/Encoding

- Highly configurable codes
  - A range of Quasi-Cyclic codes can be configured over an AXI4-Lite interface
  - Code parameter memory can be shared across up to 128 codes
  - Codes can be selected on a block-by-block basis
  - Encoder can re-use suitable decoder codes
- Normalized min-sum decoding algorithm
  - Normalization factor programmable (from 0.0625 to 1 in steps of 0.0625) for layers
- Number of iterations between 1 and 63
  - Specified for each codeword
- Early termination
  - Specified for each codeword to be none, one, or both of the following:
    - Parity check passes
    - No change in hard information or parity bits since last iteration
- Soft or hard outputs
  - Specified for each codeword to include information and optional parity
- 6-bit soft log likelihood ratio (LLR) input and 8-bit output (8-bit interface, 2 fractional bits, with external saturation before input to symmetric range -7.75 to +7.75)
- In- or out-of-order execution of blocks, with user specified ID field to identify blocks

### **Turbo Decoding**

- Max, Max Scale (scale factor is programmable as a multiple of 0.0625), or Max Star
- Number of iterations between 1 and 63
  - Specified for each block via streaming control interface
- Early termination
  - Specified for each codeword to be none, one, or both of the following:
    - No change in hard decision since last iteration
    - CRC pass
- Soft or hard outputs
  - Specified for each codeword to include systematic and optionally parity 0 and parity 1
- 8-bit soft LLR on input and output (8-bit interface, 2 fractional bits, with external saturation before input to symmetric range -31.75 to +31.75)

#### **Interfaces**

- Separate clocks on each interface to ease integration
- Wide data interfaces on input and output with configurable support for 1, 2, or 4 lanes
- Ability to specify number of LLR values on each lane on either a block-by-block basis, or transfer basis
- Separate inputs to specify control parameters and receive status output on a block-by-block basis

# **Processing System**

# **Application Processing Unit (APU)**

The key features of the APU include:

- 64-bit quad-core ARM Cortex-A53 MPCores. Features associated with each core include:
  - ARM v8-A Architecture
  - Operating target frequency: up to 1.5GHz
  - Single and double precision floating point: 4 SP / 2 DP FLOPs
  - NEON advanced SIMD support with single and double precision floating point instructions
  - A64 instruction set in 64-bit operating mode, A32/T32 instruction set in 32-bit operating mode
  - Level 1 cache (separate instruction and data, 32KB each for each Cortex-A53 CPU)
    - 2-way set-associative Instruction Cache with parity support
    - 4-way set-associative Data Cache with ECC support
  - Integrated memory management unit (MMU) per processor core
  - TrustZone for secure mode operation
  - Virtualization support
- Ability to operate in single processor, symmetric quad processor, and asymmetric quad-processor modes
- Integrated 16-way set-associative 1MB Unified Level 2 cache with ECC support
- Interrupts and Timers
  - Generic interrupt controller (GIC-400)
  - ARM generic timers (4 timers per CPU)
  - One watchdog timer (WDT)
  - One global timer
  - Two triple timers/counters (TTC)
- CoreSight debug and trace support
  - Embedded trace macrocell (ETM) for instruction trace
  - Cross trigger interface (CTI) enabling hardware breakpoints and triggers
- ACP interface to PL for I/O coherency and Level 2 cache allocation
- ACE interface to PL for full coherency
- Power island gating on each processor core
- Optional eFUSE disable per core

### Real-Time Processing Unit (RPU)

- Dual-core ARM Cortex-R5 MPCores. Features associated with each core include:
  - ARM v7-R architecture (32-bit)
  - Operating target frequency: up to 600MHz
  - A32/T32 instruction set support
  - 4-way set-associative Level 1 caches (separate instruction and data, 32KB each) with ECC support
  - Integrated memory protection unit (MPU) per processor
  - 128KB tightly coupled memory (TCM) with ECC support
  - TCMs can be combined to become 256KB in lock-step mode
- Ability to operate in single-processor or dual-processor modes (split and lock-step)
- Dedicated SWDT and two triple timer counters (TTC)
- CoreSight debug and trace support
  - Embedded trace macrocell (ETM) for instruction and trace
  - Cross trigger interface (CTI) enabling hardware breakpoints and triggers
- Optional eFUSE disable

# Full-Power Domain DMA (FPD-DMA) and Low-Power Domain DMA (LPD-DMA)

- Two general-purpose DMA controllers, one in the full-power domain (FPD-DMA) and one in the low-power domain (LPD-DMA)
- Eight independent channels per DMA
- Multiple transfer types:
  - Memory-to-memory
  - Memory-to-peripheral
  - Peripheral-to-memory
  - Scatter-gather
- 8 peripheral interfaces per DMA
- TrustZone per DMA for optional secure operation

# Xilinx Memory Protection Unit (XMPU)

- Region based memory protection unit
- Up to 16 regions
- Each region supports address alignment of 1MB or 4KB
- Regions can overlap; the higher region number has priority
- Each region can be independently enabled or disabled
- Each region has a start and end address

# **Dynamic Memory Controller (DDRC)**

- DDR3, DDR3L, DDR4, LPDDR3, LPDDR4
- Target data rate: up to 2400Mb/s DDR4 operation in -1 speed grade
- 32-bit and 64-bit bus width support for DDR4, DDR3, DDR3L, or LPDDR3 memories, and 32-bit bus width support for LPDDR4 memory
- ECC support (using extra bits)
- Up to a total DRAM capacity of 32GB
- Low power modes
  - Active/precharge power down
  - Self-refresh, including clean exit from self-refresh after a controller power cycle
- Enhanced DDR training by allowing software to measure read/write eye and make delay adjustments dynamically
- Independent performance monitors for read path and write path
- Integration of PHY debug access port (DAP) into JTAG for testing

The DDR memory controller is multi-ported and enables the PS and the PL to have shared access to a common memory. The DDR controller features six AXI slave ports for this purpose:

- Two 128-bit AXI ports from the ARM Cortex-A53 CPU(s), RPU (ARM Cortex-R5 and LPD peripherals), high speed peripherals (USB3, PCIe & SATA), and High Performance Ports (HP0 & HP1) from the PL through the cache coherent interconnect (CCI)
- One 64-bit port is dedicated for the ARM Cortex-R5 CPU(s)
- One 128-bit AXI port from the DisplayPort and HP2 port from the PL
- One 128-bit AXI port from HP3 and HP4 ports from the PL
- One 128-bit AXI port from general DMA and HP5 from the PL

### **High-Speed Connectivity Peripherals**

#### **PCIe**

- Compliant with the PCI Express Base Specification 2.1
- Fully compliant with PCI Express transaction ordering rules
- Lane width: x1, x2, or x4 at Gen1 or Gen2 rates
- 1 virtual channel
- Full duplex PCIe port
- Endpoint and single PCIe link Root Port
  - Root Port supports enhanced configuration access mechanism (ECAM), Cfg transaction generation
  - Root Port support for INTx and MSI
  - Endpoint support for MSI or MSI-X
    - 1 physical function, no SR-IOV
    - No relaxed or ID ordering
    - Fully configurable BARs
    - INTx not recommended, but can be generated
    - Endpoint to support configurable target/slave apertures with address translation and interrupt capability

#### SATA

- Compliant with SATA 3.1 specification
- SATA host port supports up to 2 external devices
- Compliant with advanced host controller interface (AHCI) ver. 1.3
- 1.5Gb/s, 3.0Gb/s, and 6.0Gb/s data rates
- Power management features: supports partial and slumber modes

#### **USB 3.0**

- Two USB controllers (configurable as USB 2.0 or USB 3.0)
- Up to 5.0Gb/s data rate
- Host and device modes
- Super speed, high speed, full speed, and low speed
- Up to 12 endpoints
- The USB host controller registers and data structures are compliant with the Intel xHCI specification
- 64-bit AXI master port with built-in DMA
- Power management features: hibernation mode

#### DisplayPort Controller

- 4K display processing with DisplayPort output
  - Maximum resolution of 4K x 2K-30 (30Hz pixel rate)
  - DisplayPort AUX channel, and hot-plug detect (HPD) on the output
  - RGB, YCbCr 4:2:0, 4:2:2, 4:4:4 with 6, 8, 10, and 12b/c
- Y-only, xvYCC, RGB 4:4:4, YCbCr 4:4:4, YCbCr 4:2:2, and YCbCr 4:2:0 video format with 6, 8, 10, and 12 bits per color component
  - 256-color palette
- Multiple frame buffer formats
  - 1, 2, 4, 8 bits per pixel (bpp) via a palette
  - 16, 24, 32bpp
  - Graphics formats such as RGBA8888, RGB555, etc.
- Accepts streaming video from the PL or dedicated DMA controller
- Enables alpha blending of graphics and chroma keying
- Audio support
  - A single stream carries up to 8 LPCM channels at 192kHz with 24-bit resolution
  - Supports compressed formats including DRA, Dolby MAT, and DTS HD
  - Multi-stream transport can extend the number of audio channels
  - Audio copy protection
  - 2-channel streaming or input from the PL
  - Multi-channel non-streaming audio from a memory audio frame buffer
- Includes a system time clock (STC) compliant with ISO/IEC 13818-1
- Boot-time display using minimum resources

# Platform Management Unit (PMU)

- Performs system initialization during boot
- Acts as a delegate to the application and real-time processors during sleep state
- Initiates power-up and restart after the wake-up request
- Maintains the system power state at all times
- Manages the sequence of low-level events required for power-up, power-down, reset, clock gating, and power gating of islands and domains
- Provides error management (error handling and reporting)
- Provides safety check functions (e.g., memory scrubbing)

The PMU includes the following blocks:

- Platform management processor
- Fixed ROM for boot-up of the device
- 128KB RAM with ECC for optional user/firmware code
- Local and global registers to manage power-down, power-up, reset, clock gating, and power gating requests
- Interrupt controller with 16 interrupts from other modules and the inter-processor communication interface (IPI)
- GPI and GPO interfaces to and from PS I/O and PL
- JTAG interface for PMU debug
- Optional user-defined firmware

### **Configuration Security Unit (CSU)**

- Triple redundant secure processor block (SPB) with built-in ECC
- Crypto interface block
  - 256-bit AES-GCM
  - SHA-3/384
  - 4096-bit RSA
- Key management unit
- Built-in DMA
- PCAP interface
- Supports ROM validation during pre-configuration stage
- Loads first stage boot loader (FSBL) into OCM in either secure or non-secure boot modes
- Supports voltage, temperature, and frequency monitoring after configuration

# Xilinx Peripheral Protection Unit (XPPU)

- Provides peripheral protection support
- Up to 20 masters simultaneously
- Multiple aperture sizes
- Access control for a specified set of address apertures on a per master basis
- 64KB peripheral apertures and controls access on per peripheral basis

# I/O Peripherals

The IOP unit contains the data communication peripherals. Key features of the IOP include:

#### Triple-Speed Gigabit Ethernet

- Compatible with IEEE Std 802.3 and supports 10/100/1000Mb/s transfer rates (full and half duplex)
- Supports jumbo frames
- Built-in scatter-gather DMA capability
- Statistics counter registers for RMON/MIB
- Multiple I/O types (1.8, 2.5, 3.3V) on RGMII interface with external PHY
- GMII interface to PL to support interfaces such as TBI, SGMII, and RGMII v2.0
- Automatic pad and cyclic redundancy check (CRC) generation on transmitted frames
- Transmit and receive IP, TCP, and UDP checksum offload
- MDIO interface for physical layer management
- Full duplex flow control with recognition of incoming pause frames and hardware generation of transmitted pause frames
- 802.1Q VLAN tagging with recognition of incoming VLAN and priority tagged frames
- Supports IEEE Std 1588 v2

#### SD/SDIO 3.0 Controller

In addition to secure digital (SD) devices, this controller also supports eMMC 4.51.
- Host mode support only
- Built-in DMA
- 1/4-Bit SD Specification, version 3.0
- 1/4/8-Bit eMMC Specification, version 4.51
- Supports primary boot from SD Card and eMMC (Managed NAND)
- High speed, default speed, and low-speed support
  - 1 and 4-bit data interface support
  - Low speed clock 0-400kHz
  - Default speed 0-25MHz
  - High speed clock 0-50MHz
- High-speed Interface
  - SD UHS-1: 208MHz
  - eMMC HS200: 200MHz
- Memory, I/O, and SD cards
- Power control modes
- Data FIFO interface up to 512B

#### **UART**

- Programmable baud rate generator
- 6, 7, or 8 data bits
- 1, 1.5, or 2 stop bits
- Odd, even, space, mark, or no parity
- Parity, framing, and overrun error detection
- Line break generation and detection
- Automatic echo, local loopback, and remote loopback channel modes
- Modem control signals: CTS, RTS, DSR, DTR, RI, and DCD (from EMIO only)

#### SPI

- Full-duplex operation offers simultaneous receive and transmit
- 128B deep read and write FIFO
- Master or slave SPI mode
- Up to 3 chip select lines
- Multi-master environment
- Identifies an error condition if more than one master is detected
- Selectable master clock reference
- Software can poll for status or be interrupt driven

#### **I2C**

- 128-bit buffer size
- Both normal (100kHz) and fast bus data rates (400kHz)
- Master or slave mode
- Normal or extended addressing
- I2C bus hold for slow host service

#### **GPIO**

- Up to 128 GPIO bits
  - Up to 78 bits from MIO and 96 bits from EMIO
- Each GPIO bit can be dynamically programmed as input or output
- Independent reset values for each bit of all registers
- Interrupt request generation for each GPIO signal
- Single channel (bit) write capability for all control registers, including data output register, direction control register, and interrupt clear register
- Read back in output mode

#### CAN

- Conforms to the ISO 11898-1, CAN 2.0A, and CAN 2.0B standards
- Both standard (11-bit identifier) and extended (29-bit identifier) frames
- Bit rates up to 1Mb/s
- Transmit and receive message FIFO with a depth of 64 messages
- Watermark interrupts for TXFIFO and RXFIFO
- Automatic re-transmission on errors or arbitration loss in normal mode
- Acceptance filtering with 4 acceptance filters
- Sleep mode with automatic wake-up
- Snoop mode
- 16-bit timestamping for receive messages
- Both internally generated reference clock and external reference clock input from MIO
- Guaranteed clock sampling edge between 80% and 83% at 24MHz reference clock input
- Optional eFUSE disable per port

#### **USB 2.0**

- Two USB controllers (configurable as USB 2.0 or USB 3.0)
- Host, device, and On-The-Go (OTG) modes
- High speed, full speed, and low speed
- Up to 12 endpoints
- 8-bit ULPI external PHY interface
- The USB host controller registers and data structures are compliant with the Intel xHCI specification
- 64-bit AXI master port with built-in DMA
- Power management features: hibernation mode

### **Static Memory Interfaces**

The static memory interfaces support external static memories.
- ONFI 3.1 NAND flash support with up to 24-bit ECC
- 1-bit SPI, 2-bit SPI, 4-bit SPI (Quad-SPI), or two Quad-SPI (8-bit) serial NOR flash
- 8-bit eMMC interface supporting managed NAND flash

#### NAND ONFI 3.1 Flash Controller

- ONFI 3.1 compliant
- Supports chip select reduction per ONFI 3.1 spec
- SLC NAND for boot/configuration and data storage
- ECC options based on SLC NAND
  - 1, 4, or 8 bits per 512+spare bytes
  - 24 bits per 1024+spare bytes
- Maximum throughput as follows
  - Asynchronous mode (SDR) 24.3MB/s
  - Synchronous mode (NV-DDR) 112MB/s (for 100MHz flash clock)
- 8-bit SDR NAND interface
- 2 chip selects
- Programmable access timing
- 1.8V and 3.3V I/O
- Built-in DMA for improved performance

### **Quad-SPI Controller**

- 4 bytes (32-bit) and 3 bytes (24-bit) address width
- Maximum SPI clock in master mode: 150MHz
- Single, Dual-Parallel, and Dual-Stacked mode
- 32-bit AXI Linear Address Mapping Interface for read operation
- Up to 2 chip select signals
- Write protection signal
- Hold signals
- 4-bit bidirectional I/O signals
- x1/x2/x4 read speed required
- x1 write speed required only
- 64-byte entry FIFO depth to improve QSPI read efficiency
- Built-in DMA for improved performance

### Interconnect

All the blocks are connected to each other and to the PL through a multi-layered ARM Advanced Microprocessor Bus Architecture (AMBA) AXI interconnect. The interconnect is non-blocking and supports multiple simultaneous master-slave transactions.

The interconnect is designed with latency sensitive masters, such as the ARM CPU, having the shortest paths to memory, and bandwidth critical masters, such as the potential PL masters, having high throughput connections to the slaves with which they need to communicate.

Traffic through the interconnect can be regulated through the Quality of Service (QoS) block in the interconnect.
The QoS feature is used to regulate traffic generated by the CPU, DMA controller, and a combined entity representing the masters in the IOP.

# **PS Interfaces**

PS interfaces include external interfaces going off-chip or signals going from PS to PL.

#### **PS External Interfaces**

The Zynq UltraScale+ RFSoC's external interfaces use dedicated pins that cannot be assigned as PL pins. These include:

- Clock, reset, boot mode, and voltage reference
- Up to 78 dedicated multiplexed I/O (MIO) pins, software-configurable to connect to any of the internal I/O peripherals and static memory controllers
- 32-bit or 64-bit DDR4/DDR3/DDR3L/LPDDR3 memories with optional ECC
- 32-bit LPDDR4 memory with optional ECC
- 4 channels (TX and RX pair) for transceivers

#### **MIO Overview**

The IOP peripherals communicate to external devices through a shared pool of up to 78 dedicated multiplexed I/O (MIO) pins. Each peripheral can be assigned one of several pre-defined groups of pins, enabling a flexible assignment of multiple devices simultaneously. Although 78 pins are not enough for simultaneous use of all the I/O peripherals, most IOP interface signals are available to the PL, allowing use of standard PL I/O pins when powered up and properly configured. Extended multiplexed I/O (EMIO) allows unmapped PS peripherals to access PL I/O.

Port mappings can appear in multiple locations. For example, there are up to 12 possible port mappings for CAN pins. The PS Configuration Wizard (PCW) tool aids in peripheral and static memory pin mapping.
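The MIO budgeting described above can be sketched as a simple pin-count check. This is an illustrative model only: the 78-pin pool comes from the text and the EMIO eligibility flags follow Table 3, but the per-peripheral pin counts and the `plan_mio` helper are hypothetical placeholders, not values or tools from this data sheet. Real pin groups are chosen with the PS Configuration Wizard.

```python
MIO_PINS = 78  # size of the shared MIO pool stated in the text

# Hypothetical pin counts, for illustration only (not data sheet values).
ASSUMED_PINS = {
    "quad_spi": 13, "nand": 17, "usb0": 12, "sdio0": 10,
    "gige0": 12, "gige1": 12, "uart0": 2, "i2c0": 2, "can0": 2, "spi0": 6,
}

# Whether a peripheral can be routed through EMIO, per Table 3.
EMIO_CAPABLE = {
    "quad_spi": False, "nand": False, "usb0": False, "sdio0": True,
    "gige0": True, "gige1": True, "uart0": True, "i2c0": True,
    "can0": True, "spi0": True,
}


def plan_mio(requested, pins=ASSUMED_PINS, budget=MIO_PINS):
    """Greedy split of peripherals between MIO and EMIO.

    MIO-only peripherals are placed first; the rest take MIO pins while
    they fit and otherwise fall back to EMIO. Returns (mio, emio, used).
    """
    mio, emio, used = [], [], 0
    # Stable sort: peripherals that cannot use EMIO (False) come first.
    for name in sorted(requested, key=lambda p: EMIO_CAPABLE[p]):
        need = pins[name]
        if used + need <= budget:
            mio.append(name)
            used += need
        elif EMIO_CAPABLE[name]:
            emio.append(name)
        else:
            raise ValueError(f"{name} needs MIO pins but only {budget - used} remain")
    return mio, emio, used
```

With the assumed pin counts, requesting all ten peripherals fills the 78-pin pool and pushes `i2c0`, `can0`, and `spi0` to EMIO. A greedy pass is enough for a first feasibility check; it does not model the pre-defined pin groups that constrain real assignments.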
Table 3: MIO Peripheral Interface Mapping

| Peripheral Interface | MIO | EMIO |
|---|---|---|
| Quad-SPI<br>NAND | Yes | No |
| USB2.0: 0,1 | Yes: External PHY | No |
| SDIO 0,1 | Yes | Yes |
| SPI: 0,1<br>I2C: 0,1<br>CAN: 0,1<br>GPIO | Yes<br>CAN: External PHY<br>GPIO: Up to 78 bits | Yes<br>CAN: External PHY<br>GPIO: Up to 96 bits |
| GigE: 0,1,2,3 | RGMII v2.0:<br>External PHY | Supports GMII, RGMII v2.0 (HSTL), RGMII v1.3, MII, SGMII, and 1000BASE-X in Programmable Logic |
| UART: 0,1 | Simple UART:<br>Only two pins (TX and RX) | <ul> <li>Full UART (TX, RX, DTR, DCD, DSR, RI, RTS, and CTS) requires either:</li> <li>Two Processing System (PS) pins (RX and TX) through MIO and six additional Programmable Logic (PL) pins, or</li> <li>Eight Programmable Logic (PL) pins</li> </ul> |
| Debug Trace Ports | Yes: Up to 16 trace bits | Yes: Up to 32 trace bits |
| Processor JTAG | Yes | Yes |

#### Transceiver (PS-GTR)

The four PS-GTR transceivers, which reside in the full-power domain (FPD), support data rates of up to 6.0Gb/s. Not all protocols can be pinned out at the same time. At any given time, four differential pairs can be pinned out using the transceivers. This is user programmable via the high-speed I/O multiplexer (HS-MIO).
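The constraint that only four differential pairs can be pinned out at once can be modelled as a lane-assignment check. The Python sketch below encodes the lane options from the HS-MIO peripheral interface mapping (Table 4 in this document); the `assign_lanes` helper and its backtracking search are illustrative assumptions and are not part of any Xilinx tool or register interface.

```python
from typing import Dict, FrozenSet, List, Optional

# Lanes on which each high-speed port can appear, per the HS-MIO mapping table.
LANE_OPTIONS = {
    "PCIe0": (0,), "PCIe1": (1,), "PCIe2": (2,), "PCIe3": (3,),
    "SATA0": (0, 2), "SATA1": (1, 3),
    "DP0": (1, 3), "DP1": (0, 2),
    "USB0": (0, 1, 2), "USB1": (3,),
    "SGMII0": (0,), "SGMII1": (1,), "SGMII2": (2,), "SGMII3": (3,),
}


def assign_lanes(ports: List[str]) -> Optional[Dict[str, int]]:
    """Return a port-to-lane mapping with no lane used twice, or None if impossible."""

    def solve(index: int, used: FrozenSet[int]) -> Optional[Dict[str, int]]:
        if index == len(ports):
            return {}
        port = ports[index]
        for lane in LANE_OPTIONS[port]:
            if lane in used:
                continue
            rest = solve(index + 1, used | {lane})
            if rest is not None:
                return {port: lane, **rest}
        return None  # every candidate lane for this port is taken: backtrack

    return solve(0, frozenset())
```

For example, `assign_lanes(["PCIe0", "PCIe1", "USB0", "SATA1"])` finds a valid placement, whereas `assign_lanes(["SGMII0", "PCIe0"])` returns `None` because both ports are only available on lane 0.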
- A Quad transceiver PS-GTR (TX/RX pair) able to support the following standards simultaneously
  - x1, x2, or x4 lane of PCIe at Gen1 (2.5Gb/s) or Gen2 (5.0Gb/s) rates
  - 1 or 2 lanes of DisplayPort (TX only) at 1.62Gb/s, 2.7Gb/s, or 5.4Gb/s
  - 1 or 2 SATA channels at 1.5Gb/s, 3.0Gb/s, or 6.0Gb/s
  - 1 or 2 USB3.0 channels at 5.0Gb/s
  - 1-4 Ethernet SGMII channels at 1.25Gb/s
- Provides flexible host-programmable multiplexing function for connecting the transceiver resources to the PS masters (DisplayPort, PCIe, Serial-ATA, USB3.0, and GigE).

#### **HS-MIO**

The function of the HS-MIO is to multiplex access from the high-speed PS peripheral to the differential pair on the PS-GTR transceiver as defined in the configuration registers. Up to 4 channels of the transceiver are available for use by the high-speed interfaces in the PS.

Table 4: HS-MIO Peripheral Interface Mapping

| Peripheral Interface | Lane0 | Lane1 | Lane2 | Lane3 |
|---|---|---|---|---|
| PCIe (x1, x2 or x4) | PCIe0 | PCIe1 | PCIe2 | PCIe3 |
| SATA (1 or 2 channels) | SATA0 | SATA1 | SATA0 | SATA1 |
| DisplayPort (TX only) | DP1 | DP0 | DP1 | DP0 |
| USB0 | USB0 | USB0 | USB0 | - |
| USB1 | - | - | - | USB1 |
| SGMII0 | SGMII0 | - | - | - |
| SGMII1 | - | SGMII1 | - | - |
| SGMII2 | - | - | SGMII2 | - |
| SGMII3 | - | - | - | SGMII3 |

#### **PS-PL Interface**

The PS-PL interface includes:

- AMBA AXI4 interfaces for primary data communication
  - Six 128-bit/64-bit/32-bit High Performance (HP) Slave AXI interfaces from PL to PS.
    - Four 128-bit/64-bit/32-bit HP AXI interfaces from PL to PS DDR.
    - Two 128-bit/64-bit/32-bit high-performance coherent (HPC) ports from PL to cache coherent interconnect (CCI).
  - Two 128-bit/64-bit/32-bit HP Master AXI interfaces from PS to PL.
  - One 128-bit/64-bit/32-bit interface from PL to RPU in PS (PL\_LPD) for low latency access to OCM.
  - One 128-bit/64-bit/32-bit AXI interface from RPU in PS to PL (LPD\_PL) for low latency access to PL.
  - One 128-bit AXI interface (ACP port) for I/O coherent access from PL to Cortex-A53 cache memory. This interface provides coherency in hardware for Cortex-A53 cache memory.
  - One 128-bit AXI interface (ACE port) for fully coherent access from PL to Cortex-A53. This interface provides coherency in hardware for Cortex-A53 cache memory and the PL.
- Clocks and resets
  - Four PS clock outputs to the PL with start/stop control.
  - Four PS reset outputs to the PL.

#### **High-Performance AXI Ports**

The high-performance AXI4 ports provide access from the PL to DDR and high-speed interconnect in the PS. The six dedicated AXI memory ports from the PL to the PS are configurable as either 128-bit, 64-bit, or 32-bit interfaces. These interfaces connect the PL to the memory interconnect via a FIFO interface. Two of the AXI interfaces support I/O coherent access to the APU caches.

Each high-performance AXI port has these characteristics:

- Reduced latency between PL and processing system memory
- 1KB deep FIFO
- Configurable either as 128-bit, 64-bit, or 32-bit AXI interfaces
- Multiple AXI command issuing to DDR

#### Accelerator Coherency Port (ACP)

The Zynq UltraScale+ RFSoC accelerator coherency port (ACP) is a 64-bit AXI slave interface that provides connectivity between the APU and a potential accelerator function in the PL. The ACP directly connects the PL to the snoop control unit (SCU) of the ARM Cortex-A53 processors, enabling cache-coherent access to CPU data in the L2 cache. The ACP provides a low latency path between the PS and a PL-based accelerator when compared with a legacy cache flushing and loading scheme. The ACP only snoops access in the CPU L2 cache, providing coherency in hardware. It does not support coherency on the PL side. So this interface is ideal for a DMA or an accelerator in the PL that only requires coherency on the CPU cache memories.
For example, if a MicroBlaze™ processor in the PL is attached to the ACP interface, the cache of MicroBlaze processor will not be coherent with Cortex-A53 caches. ### AXI Coherency Extension (ACE) The Zynq UltraScale+ RFSoC AXI coherency extension (ACE) is a 64-bit AXI4 slave interface that provides connectivity between the APU and a potential accelerator function in the PL. The ACE directly connects the PL to the snoop control unit (SCU) of the ARM Cortex-A53 processors, enabling cache-coherent access to Cache Coherent Interconnect (CCI). The ACE provides a low-latency path between the PS and a PL-based accelerator when compared with a legacy cache flushing and loading scheme. The ACE snoops accesses to the CCI and the PL side, thus, providing full coherency in hardware. This interface can be used to hook up a cached interface in the PL to the PS as caches on both the Cortex-A53 memories and the PL master are snooped thus providing full coherency. For example, if a MicroBlaze processor in the PL is hooked up using an ACE interface, then Cortex-A53 and MicroBlaze processor caches will be coherent with each other. # Input/Output All Zynq UltraScale+ RFSoCs have I/O pins for communicating to external components. In addition, in the PS, there are another 78 I/Os that the I/O peripherals use to communicate to external components, referred to as multiplexed I/O (MIO). If more than 78 pins are required by the I/O peripherals, the I/O pins in the PL can be used to extend the RFSoC interfacing capability, referred to as extended MIO (EMIO). The number of I/O pins in the programmable logic varies depending on device and package. Each I/O is configurable and can comply with a large number of I/O standards. The I/Os are classed as high-performance (HP) or high-density (HD). The HP I/Os are optimized for highest performance operation, from 1.0V to 1.8V. The HD I/Os are reduced-feature I/Os organized in banks of 24, providing voltage support from 1.2V to 3.3V. 
All I/O pins are organized in banks, with 52 HP or 24 HD pins per bank. Each bank has one common $V_{CCO}$ output buffer power supply, which also powers certain input buffers. Some single-ended input buffers require an internally generated or an externally applied reference voltage ($V_{REF}$). $V_{REF}$ pins can be driven directly from the PCB or internally generated using the internal $V_{REF}$ generator circuitry present in each bank.

# I/O Electrical Characteristics

Single-ended outputs use a conventional CMOS push/pull output structure driving High towards $V_{CCO}$ or Low towards ground, and can be put into a high-Z state. The system designer can specify the slew rate and the output strength. The input is always active but is usually ignored while the output is active. Each pin can optionally have a weak pull-up or a weak pull-down resistor.

Most signal pin pairs can be configured as differential input pairs or output pairs. Differential input pin pairs can optionally be terminated with a $100\Omega$ internal resistor. All UltraScale devices support differential standards beyond LVDS, including RSDS, BLVDS, differential SSTL, and differential HSTL. Each of the I/Os supports memory I/O standards, such as single-ended and differential HSTL as well as single-ended and differential SSTL. Zynq UltraScale+ RFSoCs also support MIPI with a dedicated D-PHY in the I/O bank.

### 3-State Digitally Controlled Impedance and Low Power I/O Features

The 3-state Digitally Controlled Impedance (T\_DCI) can control the output drive impedance (series termination) or can provide parallel termination of an input signal to $V_{CCO}$ or split (Thevenin) termination to $V_{CCO}/2$. This allows users to eliminate off-chip termination for signals using T\_DCI. In addition to board space savings, the termination automatically turns off when in output mode or when 3-stated, saving considerable power compared to off-chip termination.
The I/Os also have low power modes for IBUF and IDELAY to provide further power savings, especially when used to implement memory interfaces. # I/O Logic #### **Input and Output Delay** All inputs and outputs can be configured as either combinatorial or registered. Double data rate (DDR) is supported by all inputs and outputs. Any input or output can be individually delayed by up to 1,250ps of delay with a resolution of 5–15ps. Such delays are implemented as IDELAY and ODELAY. The number of delay steps can be set by configuration and can also be incremented or decremented while in use. The IDELAY and ODELAY can be cascaded together to double the amount of delay in a single direction. #### **ISERDES** and **OSERDES** Many applications combine high-speed, bit-serial I/O with slower parallel operation inside the device. This requires a serializer and deserializer (SerDes) inside the I/O logic. Each I/O pin possesses an IOSERDES (ISERDES and OSERDES) capable of performing serial-to-parallel or parallel-to-serial conversions with programmable widths of 2, 4, or 8 bits. These I/O logic features enable high-performance interfaces, such as Gigabit Ethernet/1000BaseX/SGMII, to be moved from the transceivers to the SelectIO interface. # **High-Speed Serial Transceivers** Serial data transmission between devices on the same PCB, over backplanes, and across even longer distances is becoming increasingly important for scaling to 100Gb/s and 400Gb/s line cards. Specialized dedicated on-chip circuitry and differential I/O capable of coping with the signal integrity issues are required at these high data rates. Two types of transceivers are used in Zynq UltraScale+ RFSoCs: GTY in the PL and PS-GTR in the PS. Transceivers are arranged in groups of four, known as a transceiver Quad. Each serial transceiver is a combined transmitter and receiver. Table 5 compares the available transceivers. 
Table 5: Transceiver Information

| | Zynq UltraScale+ RFSoCs | |
|---|---|---|
| Type | PS-GTR | GTY |
| Qty | 4 | 8–16 |
| Max. Data Rate | 6.0Gb/s | 32.75Gb/s |
| Min. Data Rate | 1.25Gb/s | 0.5Gb/s |
| Key Apps | <ul><li>PCIe Gen2</li><li>USB</li><li>Ethernet</li></ul> | <ul><li>100G+ Optics</li><li>Chip-to-Chip</li><li>25G+ Backplane</li><li>HMC</li></ul> |

The following information in this section pertains to the GTY only.

The serial transmitter and receiver are independent circuits that use an advanced phase-locked loop (PLL) architecture to multiply the reference frequency input by certain programmable numbers between 4 and 25 to become the bit-serial data clock. Each transceiver has a large number of user-definable features and parameters. All of these can be defined during device configuration, and many can also be modified during operation.

#### **Transmitter**

The transmitter is fundamentally a parallel-to-serial converter with a conversion ratio of 16, 20, 32, 40, 64, 80, 128, or 160. This allows the designer to trade off datapath width against timing margin in high-performance designs. These transmitter outputs drive the PC board with a single-channel differential output signal. TXOUTCLK is the appropriately divided serial data clock and can be used directly to register the parallel data coming from the internal logic. The incoming parallel data is fed through an optional FIFO and has additional hardware support for the 8B/10B, 64B/66B, or 64B/67B encoding schemes to provide a sufficient number of transitions. The bit-serial output signal drives two package pins with differential signals.
This output signal pair has programmable signal swing as well as programmable pre- and post-emphasis to compensate for PC board losses and other interconnect characteristics. For shorter channels, the swing can be reduced to reduce power consumption.

#### Receiver

The receiver is fundamentally a serial-to-parallel converter, changing the incoming bit-serial differential signal into a parallel stream of words, each 16, 20, 32, 40, 64, 80, 128, or 160 bits wide. This allows the designer to trade off internal datapath width against logic timing margin. The receiver takes the incoming differential data stream, feeds it through programmable DC automatic gain control, linear and decision feedback equalizers (to compensate for PC board, cable, optical and other interconnect characteristics), and uses the reference clock input to initiate clock recognition. There is no need for a separate clock line. The data pattern uses non-return-to-zero (NRZ) encoding and optionally ensures sufficient data transitions by using the selected encoding scheme. Parallel data is then transferred into the device logic using the RXUSRCLK clock. For short channels, the transceivers offer a special low-power mode (LPM) to reduce power consumption by approximately 30%. The receiver DC automatic gain control and linear and decision feedback equalizers can optionally "auto-adapt" to automatically learn and compensate for different interconnect characteristics. This enables even more margin for 10G+ and 25G+ backplanes.

# **Out-of-Band Signaling**

The transceivers provide out-of-band (OOB) signaling, often used to send low-speed signals from the transmitter to the receiver while high-speed serial data transmission is not active. This is typically done when the link is in a powered-down state or has not yet been initialized. This benefits PCIe, SATA/SAS, and QPI applications.
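As a rough illustration of the GTY trade-offs described in this section, the sketch below computes the payload rate left after line encoding and the parallel word rate for a given conversion ratio. The function names are hypothetical, and the example line rates are illustrative values rather than settings taken from this data sheet. Which ratio and encoding combinations a device actually supports is not covered here.

```python
# Payload bits carried per transmitted bit for each encoding scheme
# named in the Transmitter section.
ENCODING_EFFICIENCY = {
    "8B/10B": 8 / 10,
    "64B/66B": 64 / 66,
    "64B/67B": 64 / 67,
}

# Conversion ratios listed for the transmitter and receiver.
CONVERSION_RATIOS = (16, 20, 32, 40, 64, 80, 128, 160)


def payload_rate_gbps(line_rate_gbps, encoding):
    """Line rate minus encoding overhead, in Gb/s."""
    return line_rate_gbps * ENCODING_EFFICIENCY[encoding]


def parallel_word_rate_mhz(line_rate_gbps, ratio):
    """Rate at which parallel words cross the serial/parallel boundary."""
    if ratio not in CONVERSION_RATIOS:
        raise ValueError(f"unsupported conversion ratio: {ratio}")
    return line_rate_gbps * 1000.0 / ratio


if __name__ == "__main__":
    # A wider datapath lowers the word rate the fabric has to meet.
    for ratio in (40, 80, 160):
        print(ratio, parallel_word_rate_mhz(25.78125, ratio))
    print(payload_rate_gbps(10.3125, "64B/66B"))
```

Doubling the conversion ratio halves the rate at which the internal logic must accept words, which is the width-versus-timing-margin trade-off the text refers to.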
# **Integrated Interface Blocks for PCI Express Designs** The UltraScale architecture includes integrated blocks for PCIe technology that can be configured as an Endpoint or Root Port. UltraScale devices are compliant to the PCI Express Base Specification Revision 3.0. UltraScale+ devices are compliant to the PCI Express Base Specification Revision 3.1 for Gen3 and lower data rates, and compatible with the PCI Express Base Specification Revision 4.0 (rev 0.5) for Gen4 data rates. The Root Port can be used to build the basis for a compatible Root Complex, to allow custom chip-to-chip communication via the PCI Express protocol, and to attach ASSP Endpoint devices, such as Ethernet Controllers or Fibre Channel HBAs, to the RFSoC. This block is highly configurable to system design requirements and can operate up to the maximum lane widths and data rates listed in Table 6. Table 6: PCIe Maximum Configurations | | Zynq UltraScale+ RFSoCs | |----------------|-------------------------| | Gen1 (2.5Gb/s) | x16 | | Gen2 (5Gb/s) | x16 | | Gen3 (8Gb/s) | x16 | | Gen4 (16Gb/s) | x8 | For high-performance applications, advanced buffering techniques of the block offer a flexible maximum payload size of up to 1,024 bytes. The integrated block interfaces to the integrated high-speed transceivers for serial connectivity and to block RAMs for data buffering. Combined, these elements implement the Physical Layer, Data Link Layer, and Transaction Layer of the PCI Express protocol. Xilinx provides a light-weight, configurable, easy-to-use LogiCORE™ IP wrapper that ties the various building blocks (the integrated block for PCIe, the transceivers, block RAM, and clocking resources) into an Endpoint or Root Port solution. The system designer has control over many configurable parameters: link width and speed, maximum payload size, logic interface speeds, reference clock frequency, and base address register decoding and filtering. 
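To put the maximum configurations in Table 6 in perspective, the following sketch estimates raw per-direction bandwidth for each generation. The lane widths and line rates come from Table 6. The encoding efficiencies (8b/10b for Gen1 and Gen2, 128b/130b for Gen3 and Gen4) come from the PCI Express specifications rather than from this data sheet, and the function name is hypothetical. Packet and protocol overhead is ignored, so real throughput is lower.

```python
# generation: (per-lane line rate in Gb/s, encoding efficiency, max lanes)
PCIE_MAX_CONFIG = {
    "Gen1": (2.5, 8 / 10, 16),
    "Gen2": (5.0, 8 / 10, 16),
    "Gen3": (8.0, 128 / 130, 16),
    "Gen4": (16.0, 128 / 130, 8),
}


def raw_bandwidth_gbps(generation, lanes=None):
    """Per-direction bandwidth after line encoding, before packet overhead."""
    rate, efficiency, max_lanes = PCIE_MAX_CONFIG[generation]
    if lanes is None:
        lanes = max_lanes
    if not 1 <= lanes <= max_lanes:
        raise ValueError(f"{generation} supports at most x{max_lanes}")
    return rate * efficiency * lanes


if __name__ == "__main__":
    for gen in PCIE_MAX_CONFIG:
        print(gen, round(raw_bandwidth_gbps(gen), 2), "Gb/s")
```

Under these assumptions, Gen4 x8 and Gen3 x16 give the same raw bandwidth, which is consistent with the narrower maximum width listed for Gen4.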
# **Integrated Block for Interlaken** Interlaken is a scalable chip-to-chip interconnect protocol designed to enable transmission speeds from 10Gb/s to 150Gb/s. The integrated block for Interlaken in the Zynq UltraScale+ RFSoC is compliant to revision 1.2 of the Interlaken specification with data striping and de-striping across 1 to 12 lanes. Permitted configurations are: 1 to 12 lanes at up to 12.5Gb/s and 1 to 6 lanes at up to 25.78125Gb/s, enabling flexible support for up to 150Gb/s per integrated block. # **Integrated Block for 100G Ethernet** Compliant to the IEEE Std 802.3ba, the 100G Ethernet integrated blocks provide low latency 100Gb/s Ethernet ports with a wide range of user customization and statistics gathering. With support for 10 x 10.3125Gb/s (CAUI) and 4 x 25.78125Gb/s (CAUI-4) configurations, the integrated block includes both the 100G MAC and PCS logic with support for IEEE Std 1588v2 1-step and 2-step hardware timestamping. The 100G Ethernet blocks contain a Reed Solomon Forward Error Correction (RS-FEC) block, compliant to IEEE Std 802.3bj, that can be used with the Ethernet block or stand alone in user applications. These families also support OTN mapping mode in which the PCS can be operated without using the MAC. # **Clock Management** The clock generation and distribution components are located adjacent to the columns that contain the memory interface and input and output circuitry. This tight coupling of clocking and I/O provides low-latency clocking to the I/O for memory interfaces and other I/O protocols. Within every clock management tile (CMT) resides one mixed-mode clock manager (MMCM), two PLLs, clock distribution buffers and routing, and dedicated circuitry for implementing external memory interfaces. ### **Mixed-Mode Clock Manager** The mixed-mode clock manager (MMCM) can serve as a frequency synthesizer for a wide range of frequencies and as a jitter filter for incoming clocks. 
At the center of the MMCM is a voltage-controlled oscillator (VCO), which speeds up and slows down depending on the input voltage it receives from the phase frequency detector (PFD). There are three sets of programmable frequency dividers (D, M, and O) that are programmable by configuration and during normal operation via the Dynamic Reconfiguration Port (DRP). The pre-divider D reduces the input frequency and feeds one input of the phase/frequency comparator. The feedback divider M acts as a multiplier because it divides the VCO output frequency before feeding the other input of the phase comparator. D and M must be chosen appropriately to keep the VCO within its specified frequency range. The VCO has eight equally-spaced output phases (0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°). Each phase can be selected to drive one of the output dividers, and each divider is programmable by configuration to divide by any integer from 1 to 128. The MMCM has three input-jitter filter options: low bandwidth, high bandwidth, or optimized mode. Low-Bandwidth mode has the best jitter attenuation. High-Bandwidth mode has the best phase offset. Optimized mode allows the tools to find the best setting. The MMCM can have a fractional counter in either the feedback path (acting as a multiplier) or in one output path. Fractional counters allow non-integer increments of 1/8 and can thus increase frequency synthesis capabilities by a factor of 8. The MMCM can also provide fixed or dynamic phase shift in small increments that depend on the VCO frequency. At 1,600MHz, the phase-shift timing increment is 11.2ps. #### **PLL** With fewer features than the MMCM, the two PLLs in a clock management tile are primarily present to provide the necessary clocks to the dedicated memory interface circuitry. The circuit at the center of the PLLs is similar to the MMCM, with PFD feeding a VCO and programmable M, D, and O counters. 
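The D, M, and O divider relationship described for the MMCM and PLL can be sketched as follows. This is a minimal model, not a replacement for the clocking tools: the function names are hypothetical, the output divider is restricted to integers for simplicity, and the VCO range is left as a caller-supplied parameter because it is device-specific and not given here. The 1/56-of-a-VCO-period phase-shift step is an assumption inferred from the 11.2ps figure quoted at 1,600MHz.

```python
from fractions import Fraction


def mmcm_frequencies(f_in_mhz, m, d, o, vco_range_mhz=None):
    """Return (f_vco, f_out) in MHz.

    f_vco = f_in * M / D and f_out = f_vco / O.
    M may be fractional in steps of 1/8; D and O are integers here.
    vco_range_mhz is an optional (min, max) pair from the device data sheet.
    """
    m = Fraction(m)
    if (m * 8).denominator != 1:
        raise ValueError("M must be a multiple of 1/8")
    if not 1 <= o <= 128:
        raise ValueError("output divider must be an integer from 1 to 128")
    f_vco = Fraction(f_in_mhz) * m / d
    if vco_range_mhz is not None:
        low, high = vco_range_mhz
        if not low <= f_vco <= high:
            raise ValueError(f"VCO frequency {float(f_vco)} MHz out of range")
    return float(f_vco), float(f_vco / o)


def phase_shift_step_ps(f_vco_mhz):
    """Phase-shift increment, assuming 1/56 of the VCO period per step."""
    vco_period_ps = 1e6 / f_vco_mhz
    return vco_period_ps / 56


if __name__ == "__main__":
    f_vco, f_out = mmcm_frequencies(100, 16, 1, 8)
    print(f_vco, f_out, round(phase_shift_step_ps(f_vco), 1))
```

Using `Fraction` keeps the 1/8-step check exact, so a fractional multiplier such as 16.125 is accepted while 16.1 is rejected.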
There are two divided outputs to the device fabric per PLL as well as one clock plus one enable signal to the memory interface circuitry. Zynq UltraScale+ RFSoCs are equipped with five additional PLLs in the PS for independently configuring the four primary clock domains with the PS: the APU, the RPU, the DDR controller, and the I/O peripherals. # **Clock Distribution** Clocks are distributed throughout the programmable logic via buffers that drive a number of vertical and horizontal tracks. There are 24 horizontal clock routes per clock region and 24 vertical clock routes per clock region with 24 additional vertical clock routes adjacent to the MMCM and PLL. Within a clock region, clock signals are routed to the device logic (CLBs, etc.) via 16 gateable leaf clocks. Several types of clock buffers are available. The BUFGCE and BUFCE\_LEAF buffers provide clock gating at the global and leaf levels, respectively. BUFGCTRL provides glitchless clock muxing and gating capability. BUFGCE\_DIV has clock gating capability and can divide a clock by 1 to 8. BUFG\_GT performs clock division from 1 to 8 for the transceiver clocks. Clocks can be transferred from the PS to the PL using dedicated buffers. # **Memory Interfaces** Memory interface data rates continue to increase, driving the need for dedicated circuitry that enables high performance, reliable interfacing to current and next-generation memory technologies. Every UltraScale device includes dedicated physical interfaces (PHY) blocks located between the CMT and I/O columns that support implementation of high-performance PHY blocks to external memories such as DDR4, DDR3, QDRII+, and RLDRAM3. The PHY blocks in each I/O bank generate the address/control and data bus signaling protocols as well as the precision clock/data alignment required to reliably communicate with a variety of high-performance memory standards. Multiple I/O banks can be used to create wider memory interfaces. 
As well as external parallel memory interfaces, Zynq UltraScale+ RFSoCs can communicate to external serial memories, such as Hybrid Memory Cube (HMC), via the high-speed serial transceivers. All transceivers in the UltraScale architecture support the HMC protocol, up to 15Gb/s line rates. # **Block RAM** Zynq UltraScale+ RFSoCs contain 36Kb block RAMs, each with two completely independent ports that share only the stored data. Each block RAM can be configured as one 36Kb RAM or two independent 18Kb RAMs. Each memory access, read or write, is controlled by the clock. Connections in every block RAM column enable signals to be cascaded between vertically adjacent block RAMs, providing an easy method to create large, fast memory arrays, and FIFOs with greatly reduced power consumption. All inputs, data, address, clock enables, and write enables are registered. The input address is always clocked (unless address latching is turned off), retaining data until the next operation. An optional output data pipeline register allows higher clock rates at the cost of an extra cycle of latency. During a write operation, the data output can reflect either the previously stored data or the newly written data, or it can remain unchanged. Block RAM sites that remain unused in the user design are automatically powered down to reduce total power consumption. There is an additional pin on every block RAM to control the dynamic power gating feature. # **Programmable Data Width** Each port can be configured as $32K \times 1$ ; $16K \times 2$ ; $8K \times 4$ ; $4K \times 9$ (or 8); $2K \times 18$ (or 16); $1K \times 36$ (or 32); or $512 \times 72$ (or 64). Whether configured as block RAM or FIFO, the two ports can have different aspect ratios without any constraints. Each block RAM can be divided into two completely independent 18Kb block RAMs that can each be configured to any aspect ratio from $16K \times 1$ to $512 \times 36$ . 
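The aspect ratios listed for the 36Kb block RAM can be checked with a few lines of arithmetic. The sketch below is only a consistency check on the figures in the text. The names are hypothetical, and the reading that the narrower width in parentheses is the data-only width (without parity bits) is an assumption.

```python
K = 1024

# (depth, width including parity bits, data-only width)
BRAM36_CONFIGS = [
    (32 * K, 1, 1),
    (16 * K, 2, 2),
    (8 * K, 4, 4),
    (4 * K, 9, 8),
    (2 * K, 18, 16),
    (1 * K, 36, 32),
    (512, 72, 64),
]


def capacity_bits(depth, width):
    """Total number of bits addressed by a depth x width configuration."""
    return depth * width


if __name__ == "__main__":
    for depth, full_width, data_width in BRAM36_CONFIGS:
        total_kb = capacity_bits(depth, full_width) // K
        data_kb = capacity_bits(depth, data_width) // K
        print(f"{depth} x {full_width}: {total_kb}Kb total, {data_kb}Kb data")
```

Every configuration addresses 32Kb of data, and the configurations of width 9 and above reach the full 36Kb once the extra bit per byte is counted.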
Everything described previously for the full 36Kb block RAM also applies to each of the smaller 18Kb block RAMs. Only in simple dual-port (SDP) mode can data widths of greater than 18-bits (18Kb RAM) or 36-bits (36Kb RAM) be accessed. In this mode, one port is dedicated to read operation, the other to write operation. In SDP mode, one side (read or write) can be variable, while the other is fixed to 32/36 or 64/72. Both sides of the dual-port 36Kb RAM can be of variable width. ### **Error Detection and Correction** Each 64-bit-wide block RAM can generate, store, and utilize eight additional Hamming code bits and perform single-bit error correction and double-bit error detection (ECC) during the read process. The ECC logic can also be used when writing to or reading from external 64- to 72-bit-wide memories. ### **FIFO Controller** Each block RAM can be configured as a 36Kb FIFO or an 18Kb FIFO. The built-in FIFO controller for single-clock (synchronous) or dual-clock (asynchronous or multirate) operation increments the internal addresses and provides four handshaking flags: full, empty, programmable full, and programmable empty. The programmable flags allow the user to specify the FIFO counter values that make these flags go active. The FIFO width and depth are programmable with support for different read port and write port widths on a single FIFO. A dedicated cascade path allows for easy creation of deeper FIFOs. ### **UltraRAM** UltraRAM is a high-density, dual-port, synchronous memory block available in Zynq UltraScale+ RFSoCs. Both of the ports share the same clock and can address all of the 4K x 72 bits. Each port can independently read from or write to the memory array. UltraRAM supports two types of write enable schemes. The first mode is consistent with the block RAM byte write enable mode. The second mode allows gating the data and parity byte writes separately. UltraRAM blocks can be connected together to create larger memory arrays. 
Dedicated routing in the UltraRAM column enables the entire column height to be connected together. If additional density is required, all the UltraRAMs can be connected together with a few logic resources to create single instances of RAM approximately 22Mb in size. This makes UltraRAM an ideal solution for replacing external memories such as SRAM. Cascadable anywhere from 288Kb to 22Mb, UltraRAM provides the flexibility to fulfill many different memory requirements. #### **Error Detection and Correction** Each 64-bit-wide UltraRAM can generate, store and utilize eight additional Hamming code bits and perform single-bit error correction and double-bit error detection (ECC) during the read process. # **Configurable Logic Block** Every configurable logic block (CLB) contains 8 LUTs and 16 flip-flops. The LUTs can be configured as either one 6-input LUT with one output, or as two 5-input LUTs with separate outputs but common inputs. Each LUT can optionally be registered in a flip-flop. In addition to the LUTs and flip-flops, the CLB contains arithmetic carry logic and multiplexers to create wider logic functions. Each CLB contains one slice. There are two types of slices: SLICEL and SLICEM. LUTs in the SLICEM can be configured as 64-bit RAM, as 32-bit shift registers (SRL32), or as two SRL16s. CLBs in the UltraScale architecture have increased routing and connectivity compared to CLBs in previous-generation Xilinx devices. They also have additional control signals to enable superior register packing, resulting in overall higher device utilization. ### Interconnect Various length vertical and horizontal routing resources in the UltraScale architecture that span 1, 2, 4, 5, 12, or 16 CLBs ensure that all signals can be transported from source to destination with ease, providing support for the next generation of wide data buses to be routed across even the highest capacity devices while simultaneously improving quality of results and software run time. 
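The two LUT modes described in the Configurable Logic Block section (one 6-input function, or two 5-input functions sharing inputs) can be modeled with a 64-bit truth table. The sketch below follows the O6/O5 output arrangement of the Xilinx LUT6\_2 primitive as an assumption, since this data sheet only names the two modes. All function names are hypothetical.

```python
def lut6_2(init, inputs):
    """Model a 6-input LUT with two outputs.

    init   : 64-bit truth table
    inputs : 6-bit input vector, bit 5 = I5 ... bit 0 = I0
    Returns (O6, O5). O6 indexes the full table; O5 indexes the lower
    32 entries using I4..I0 only.
    """
    o6 = (init >> (inputs & 0x3F)) & 1
    o5 = (init >> (inputs & 0x1F)) & 1
    return o6, o5


def pack_dual_lut5(f_upper, f_lower):
    """Build a 64-bit INIT holding two 5-input functions.

    With I5 tied high, O6 returns f_upper and O5 returns f_lower.
    """
    init = 0
    for a in range(32):
        init |= (f_lower(a) & 1) << a
        init |= (f_upper(a) & 1) << (a + 32)
    return init


def and5(a):
    """5-input AND: 1 only when all five input bits are set."""
    return int(a == 0x1F)


def parity5(a):
    """5-input XOR: 1 when an odd number of input bits are set."""
    return bin(a).count("1") & 1


if __name__ == "__main__":
    init = pack_dual_lut5(and5, parity5)
    for a in (0b00000, 0b00111, 0b11111):
        print(format(a, "05b"), lut6_2(init, a | 0x20))
```

In dual mode the two functions share the five low inputs, which is why the text describes the outputs as separate and the inputs as common.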
# **Digital Signal Processing** DSP applications use many binary multipliers and accumulators, best implemented in dedicated DSP slices. All Zynq UltraScale+ RFSoCs have many dedicated, low-power DSP slices, combining high speed with small size while retaining system design flexibility. Each DSP slice fundamentally consists of a dedicated 27 × 18 bit twos complement multiplier and a 48-bit accumulator. The multiplier can be dynamically bypassed, and two 48-bit inputs can feed a single-instruction-multiple-data (SIMD) arithmetic unit (dual 24-bit add/subtract/accumulate or quad 12-bit add/subtract/accumulate), or a logic unit that can generate any one of ten different logic functions of the two operands. The DSP includes an additional pre-adder, typically used in symmetrical filters. This pre-adder improves performance in densely packed designs and reduces the DSP slice count by up to 50%. The 96-bit-wide XOR function, programmable to 12, 24, 48, or 96-bit widths, enables performance improvements when implementing forward error correction and cyclic redundancy checking algorithms. The DSP also includes a 48-bit-wide pattern detector that can be used for convergent or symmetric rounding. The pattern detector is also capable of implementing 96-bit-wide logic functions when used in conjunction with the logic unit. The DSP slice provides extensive pipelining and extension capabilities that enhance the speed and efficiency of many applications beyond digital signal processing, such as wide dynamic bus shifters, memory address generators, wide bus multiplexers, and memory-mapped I/O register files. The accumulator can also be used as a synchronous up/down counter. # **System Monitor** System Monitor is used to enhance the overall safety, security, and reliability of the system by monitoring the physical environment via on-chip power supply and temperature sensors and external channels to the ADC. 
Zynq UltraScale+ RFSoCs contain an additional System Monitor block in the PS. See Table 7.

Table 7: Key System Monitor Features

| | Zynq UltraScale+ RFSoC PL | Zynq UltraScale+ RFSoC PS |
|---|---|---|
| ADC | 10-bit 200kSPS | 10-bit 1MSPS |
| Interfaces | JTAG, I2C, DRP, PMBus | APB |

In the System Monitor in the PL, sensor outputs and up to 17 user-allocated external analog inputs are digitized using a 10-bit 200 kilo-sample-per-second (kSPS) ADC, and the measurements are stored in registers that can be accessed via internal DRP, JTAG, PMBus, or I2C interfaces. The I2C interface and PMBus allow the on-chip monitoring to be easily accessed by the System Manager/Host before and after device configuration.

The System Monitor in the RFSoC PS uses a 10-bit, 1 mega-sample-per-second (MSPS) ADC to digitize the sensor outputs. The measurements are stored in registers and are accessed via the Advanced Peripheral Bus (APB) interface by the processors and the platform management unit (PMU) in the PS.

# **Booting RFSoCs**

Zynq UltraScale+ RFSoCs use a multi-stage boot process that supports both a non-secure and a secure boot. The PS is the master of the boot and configuration process. For a secure boot, the AES-GCM, SHA-3/384 decryption/authentication, and 4096-bit RSA blocks decrypt and authenticate the image.

Upon reset, the device mode pins are read to determine the primary boot device to be used: NAND, Quad-SPI, SD, eMMC, or JTAG. JTAG can only be used as a non-secure boot source and is intended for debugging purposes. One of the CPUs, Cortex-A53 or Cortex-R5, executes code out of on-chip ROM and copies the first stage boot loader (FSBL) from the boot device to the on-chip memory (OCM).

After copying the FSBL to OCM, the processor executes the FSBL. Xilinx supplies example FSBLs or users can create their own.
The FSBL initiates the boot of the PS and can load and configure the PL, or configuration of the PL can be deferred to a later stage. The FSBL typically loads either a user application or an optional second stage boot loader (SSBL) such as U-Boot. Users obtain example SSBL from Xilinx or a third party, or they can create their own SSBL. The SSBL continues the boot process by loading code from any of the primary boot devices or from other sources such as USB, Ethernet, etc. If the FSBL did not configure the PL, the SSBL can do so, or again, the configuration can be deferred to a later stage. The static memory interface controller (NAND, eMMC, or Quad-SPI) is configured using default settings. To improve device configuration speed, these settings can be modified by information provided in the boot image header. The ROM boot image is not user readable or executable after boot. # **Packaging** Zynq UltraScale+ RFSoCs are available in high-performance, organic flip-chip and lidless flip-chip packages supporting different quantities of I/Os, transceivers, RF-ADCs and RF-DACs. Decoupling capacitors are mounted on the package substrate to optimize signal integrity under simultaneous switching of outputs (SSO) conditions. Always refer to the specific device data sheet for performance specifications by package and speed grade. # **Ordering Information** Table 8 shows the speed grade, temperature ranges, and operating voltages available in the Zynq UltraScale+ RFSoC device family. 
Table 8: Speed Grade, Temperature Range, and Operating Voltages

Each cell lists a speed grade and temperature grade with the V<sub>CCINT</sub> operating voltage in parentheses.

| Device<br>Family | XC Devices | Extended (E)<br>0°C to +100°C | Extended (E)<br>0°C to +110°C | Industrial (I)<br>-40°C to +100°C |
|---------------------|----------------|-------------|-----------------------------------------|--------------------------------------|
| Zynq<br>UltraScale+ | RFSoCs Devices | -2E (0.85V) | | -2I (0.85V) |
| | | | -2LE<sup>(1)(2)</sup> (0.85V or 0.72V) | -2LI (0.72V)<sup>(3)</sup> |
| | | -1E (0.85V) | | -1I (0.85V) |
| | | | | -1LI<sup>(2)</sup> (0.85V or 0.72V) |

#### Notes:

1. In -2LE speed/temperature grade, devices can operate for a limited time with junction temperature of 110°C. Timing parameters adhere to the same speed file at 110°C as they do below 110°C, regardless of operating voltage (nominal at 0.85V or low voltage at 0.72V). Operation at 110°C Tj is limited to 1% of the device lifetime and can occur sequentially or at regular intervals as long as the total time does not exceed 1% of device lifetime.
2. When operating the PL at low voltage (0.72V), the PS operates at nominal voltage (0.85V).
3. In -2LI speed/temperature grade, devices can operate for a limited time with junction temperature of 110°C. Timing parameters adhere to the same speed file at 110°C as they do below 110°C. Operation at 110°C Tj is limited to 5% of the device lifetime and can occur sequentially or at regular intervals as long as the total time does not exceed 5% of device lifetime.

The ordering information shown in Figure 2 applies to all packages in the Zynq UltraScale+ RFSoC family.
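As a worked illustration of Table 8 and its notes, the following Python sketch encodes the speed grades as a lookup table and computes the cumulative time allowed at a 110°C junction temperature. The names `SPEED_GRADES`, `HIGH_TEMP_LIFETIME_PERCENT`, and `high_temp_budget_hours` are hypothetical, and the 100,000-hour lifetime in the usage example is an assumed figure for the arithmetic only; it is not a specification from this data sheet.

```python
# Speed grades from Table 8: temperature range in °C and allowed VCCINT values in volts.
SPEED_GRADES = {
    "-2E":  {"temp_range_c": (0, 100),   "vccint_v": (0.85,)},
    "-2I":  {"temp_range_c": (-40, 100), "vccint_v": (0.85,)},
    "-2LE": {"temp_range_c": (0, 110),   "vccint_v": (0.85, 0.72)},
    "-2LI": {"temp_range_c": (-40, 100), "vccint_v": (0.72,)},
    "-1E":  {"temp_range_c": (0, 100),   "vccint_v": (0.85,)},
    "-1I":  {"temp_range_c": (-40, 100), "vccint_v": (0.85,)},
    "-1LI": {"temp_range_c": (-40, 100), "vccint_v": (0.85, 0.72)},
}

# Notes 1 and 3: percentage of device lifetime permitted at a 110°C junction temperature.
HIGH_TEMP_LIFETIME_PERCENT = {"-2LE": 1, "-2LI": 5}


def high_temp_budget_hours(grade: str, lifetime_hours: float) -> float:
    """Cumulative hours allowed at 110°C Tj for a grade, given an assumed lifetime."""
    percent = HIGH_TEMP_LIFETIME_PERCENT.get(grade)
    if percent is None:
        raise ValueError(f"{grade} has no 110°C allowance in the Table 8 notes")
    return lifetime_hours * percent / 100


if __name__ == "__main__":
    assumed_lifetime_hours = 100_000  # assumption for illustration only
    for grade in ("-2LE", "-2LI"):
        hours = high_temp_budget_hours(grade, assumed_lifetime_hours)
        print(grade, hours)
```

The budget is cumulative: per the notes, the time at 110°C can be spent sequentially or at regular intervals as long as the total stays within the stated percentage.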
Figure 2: Zynq UltraScale+ RFSoC Ordering Information

# **Revision History**

The following table shows the revision history for this document:

| Date | Version | Description of Revisions |
|------------|---------|--------------------------|
| 07/23/2018 | 1.5 | Updated Figure 1. |
| 05/17/2018 | 1.4 | Updated General Description, RF Data Converter Subsystem Overview, Table 1, RF-ADC Features, Table 8 (removed -3E, added -2LI, and note 3), and Figure 2. |
| 01/23/2018 | 1.3 | Added lidless flip-chip packaging to Packaging. |
| 12/19/2017 | 1.2 | Updated RF-ADC/DAC rates throughout document: General Description, RF Data Converter Subsystem Overview, Table 1, RF-ADC Features, and RF-DAC Features. |
| 11/15/2017 | 1.1 | Updated Table 2 with FSVE1156, FSVG1517, and FSVF1760 packages. Updated Figure 2 with Lidless Stiffener information. Updated Application Processing Unit (APU) and Real-Time Processing Unit (RPU). |
| 10/03/2017 | 1.0 | Initial Xilinx release. |

### Disclaimer

The information disclosed to you hereunder (the "Materials") is provided solely for the selection and use of Xilinx products.
To the maximum extent permitted by applicable law: (1) Materials are made available "AS IS" and with all faults, Xilinx hereby DISCLAIMS ALL WARRANTIES AND CONDITIONS, EXPRESS, IMPLIED, OR STATUTORY, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE; and (2) Xilinx shall not be liable (whether in contract or tort, including negligence, or under any other theory of liability) for any loss or damage of any kind or nature related to, arising under, or in connection with, the Materials (including your use of the Materials), including for any direct, indirect, special, incidental, or consequential loss or damage (including loss of data, profits, goodwill, or any type of loss or damage suffered as a result of any action brought by a third party) even if such damage or loss was reasonably foreseeable or Xilinx had been advised of the possibility of the same. Xilinx assumes no obligation to correct any errors contained in the Materials or to notify you of updates to the Materials or to product specifications. You may not reproduce, modify, distribute, or publicly display the Materials without prior written consent. Certain products are subject to the terms and conditions of Xilinx's limited warranty, please refer to Xilinx's Terms of Sale which can be viewed at <a href="http://www.xilinx.com/legal.htm#tos">http://www.xilinx.com/legal.htm#tos</a>; IP cores may be subject to warranty and support terms contained in a license issued to you by Xilinx. Xilinx products are not designed or intended to be fail-safe or for use in any application requiring fail-safe performance; you assume sole risk and liability for use of Xilinx products in such critical applications, please refer to Xilinx's Terms of Sale which can be viewed at <a href="http://www.xilinx.com/legal.htm#tos">http://www.xilinx.com/legal.htm#tos</a>. This document contains preliminary information and is subject to change without notice. 
Information provided herein relates to products and/or services not yet available for sale, and provided solely for information purposes and are not intended, or to be construed, as an offer for sale or an attempted commercialization of the products and/or services referred to herein.

### **Automotive Applications Disclaimer**

AUTOMOTIVE PRODUCTS (IDENTIFIED AS "XA" IN THE PART NUMBER) ARE NOT WARRANTED FOR USE IN THE DEPLOYMENT OF AIRBAGS OR FOR USE IN APPLICATIONS THAT AFFECT CONTROL OF A VEHICLE ("SAFETY APPLICATION") UNLESS THERE IS A SAFETY CONCEPT OR REDUNDANCY FEATURE CONSISTENT WITH THE ISO 26262 AUTOMOTIVE SAFETY STANDARD ("SAFETY DESIGN"). CUSTOMER SHALL, PRIOR TO USING OR DISTRIBUTING ANY SYSTEMS THAT INCORPORATE PRODUCTS, THOROUGHLY TEST SUCH SYSTEMS FOR SAFETY PURPOSES. USE OF PRODUCTS IN A SAFETY APPLICATION WITHOUT A SAFETY DESIGN IS FULLY AT THE RISK OF CUSTOMER, SUBJECT ONLY TO APPLICABLE LAWS AND REGULATIONS GOVERNING LIMITATIONS ON PRODUCT LIABILITY.