Ray C. C. Cheung
Latest
- MLFormer: a high performance MPC linear inference framework for transformers
- Randomized tensor decomposition using parallel reconfigurable systems
- A LoRaWAN-BLE Based AIoT Smart Farm Management and Control System
- An AIoT LoRaWAN Control System With Compression and Image Recovery Algorithm (CIRA) for Extreme Weather
- An Efficient FPGA-based Depthwise Separable Convolutional Neural Network Accelerator with Hardware Pruning
- Building a Learner-Centric Citywide Digital Literacy Ecosystem: Train-the-Trainer, Community-Based Learning, and Gifted Education - A Guide for Educators, Policymakers, and Stakeholders
- Efficient Blind Hyperspectral Unmixing Framework Based on CUR Decomposition (CUR-HU)
- Efficient Key-Switching for Word-Type FHE and GPU Acceleration
- Enhanced Black-Scholes Option Pricing: Bit-Width Optimization with Automatic Differentiation and Lagrange Multipliers
- EOG Signal Processor: An SVM-based Multiclass Classifier to Detect Eye Movements
- Gradient-Congruity Guided Federated Sparse Training
- HTCNN: High-Throughput Batch CNN Inference with Homomorphic Encryption for Edge Computing
- MSCA: A Multi-Grained Sparse Convolution Accelerator for DNN Training
- PQNTRU: Acceleration of NTRU-based Schemes via Customized Post-Quantum Processor
- ProgramGalois: A Programmable Generator of Radix-4 Discrete Galois Transformation Architecture for Lattice-Based Cryptography
- REALISE-IoT: RISC-V-Based Efficient and Lightweight Public-Key System for IoT Applications
- Revisiting Keccak and Dilithium Implementations on ARMv7-M
- RO-SVD: A Reconfigurable Hardware Copyright Protection Framework for AIGC Applications
- Yet Another Improvement of Plantard Arithmetic for Faster Kyber on Low-End 32-bit IoT Devices
- A Platform for Adaptive Interference Mitigation and Intent Analysis Using OpenLANE
- A Versatility-Performance Balanced Hardware Architecture for Scene Text Detection
- Algorithm-Hardware Co-Design of Split-Radix Discrete Galois Transformation for KyberKEM
- Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer
- CO-Detector: Towards Complex Object Detection with Cross-Part Feature Learning in Remote Sensing
- Design of a Hippocampal Cognitive Prosthesis Chip
- Efficient and Automatic Breast Cancer Early Diagnosis System Based on the Hierarchical Extreme Learning Machine
- Efficient Multiple Channels EEG Signal Classification Based on Hierarchical Extreme Learning Machine
- High-performance and Configurable SW/HW Co-design of Post-quantum Signature CRYSTALS-Dilithium
- Homomorphic Encryption-Based System Design for Secure Data Processing
- Image Super-Resolution and FPGA Hardware Design
- In-Network Aggregation with Transport Transparency for Distributed Training
- MUREN: MUltistage Recursive Enhanced Network for Coal-Fired Power Plant Detection
- Yet another Improvement of Plantard Arithmetic for Faster Kyber on Low-end 32-bit IoT Devices
- A High-Performance FPGA Accelerator for CUR Decomposition
- A Versatility-Performance Balanced Hardware Architecture for Scene Text Detection
- Comp-TCAM: An Adaptable Composite Ternary Content-Addressable Memory on FPGAs
- High Throughput Hardware/Software Heterogeneous System for RRPN-Based Scene Text Detection
- Improved Plantard Arithmetic for Lattice-based Cryptography
- Machine Learning Based Hardware Architecture for DOA Measurement From Mice EEG
- Melting Glacier: A 37-Year (1984-2020) High-Resolution Glacier-Cover Record of Mt. Kilimanjaro
- Message from the General Chair and Program Co-Chairs
- PipeNTT: A Pipelined Number Theoretic Transform Architecture
- Preface
- Reconfigurable content-addressable memory (CAM) on FPGAs: A tutorial and survey
- A survey of breakthrough in blockchain technology: Adoptions, applications, challenges and future research
- A systematic review of blockchain scalability: Issues, solutions, analysis and future research
- Accelerated Updating Mechanisms for FPGA-Based Ternary Content-Addressable Memory
- Aero-Hydroponic Agriculture IoT System
- An Efficient Parallel Processor for Dense Tensor Computation
- An FPGA-based MobileNet Accelerator Considering Network Structure Characteristics
- Design of a Battery Carrying Barge for Enhancing Autonomous Sailboat's Endurance Capacity
- Efficient High-Performance FPGA-Redis Hybrid NoSQL Caching System for Blockchain Scalability
- Elastic Net Constraint-Based Tensor Model for High-Order Graph Matching
- LoRaWAN-based Camera with (CIRA) Compression and Image Recovery Algorithm
- On the Suitability of Read only Memory for FPGA-Based CAM Emulation Using Partial Reconfiguration
- A Highly Parallel Constant-Time Almost-Inverse Algorithm
- Binary convolutional neural network acceleration framework for rapid system prototyping
- Compact Code-Based Signature for Reconfigurable Devices With Side Channel Resilience
- Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers
- NetReduce: RDMA-Compatible In-Network Reduction for Distributed DNN Training Acceleration
- RPE-TCAM: Reconfigurable Power-Efficient Ternary Content-Addressable Memory on FPGAs
- A high performance hardware architecture for non-negative tensor factorization
- A robust background initialization algorithm with superpixel motion detection
- Accurate and Compact Convolutional Neural Networks with Trained Binarization
- An Efficient Application Specific Instruction Set Processor (ASIP) for Tensor Computation
- Bank-selective Strategy for Gate-based Ternary Content-addressable Memory on FPGAs
- D-TCAM: A High-Performance Distributed RAM Based TCAM Architecture on FPGAs
- Feature Selection Based on Tensor Decomposition and Object Proposal for Night-Time Multiclass Vehicle Detection
- High performance hardware architecture for singular spectrum analysis of Hankel tensors
- High Performance Power-Efficient Gate-Based CAM for Reconfigurable Computing
- Optimized Polynomial Multiplier Over Commutative Rings on FPGAs: A Case Study on BIKE
- Reconfigurable RISC-V Secure Processor And SoC Integration
- A fast inter CU decision algorithm for HEVC
- A Robust Background Initialization Algorithm with Superpixel Motion Detection
- ASIC Implementation of a Nonlinear Dynamical Model for Hippocampal Prosthesis
- Dynamic Virtual Page-Based Flash Translation Layer With Novel Hot Data Identification and Adaptive Parallelism Management
- FFT-Based McLaughlin's Montgomery Exponentiation without Conditional Selections
- High-Speed Discrete Gaussian Sampler With Heterodyne Chaotic Laser Inputs
- Lightweight Secure Processor Prototype on FPGA
- Spectral arithmetic in Montgomery modular multiplication
- A Bias-Bounded Digital True Random Number Generator Architecture
- A Fully Pipelined Hardware Architecture for Intra Prediction of HEVC
- A low power V-band LC VCO with high Q varactor technique in 40 nm CMOS process
- Area-Time Efficient Architecture of FFT-Based Montgomery Multiplication
- Area-Time Efficient Computation of Niederreiter Encryption on QC-MDPC Codes for Embedded Hardware
- Compact Constant Weight Coding Engines for the Code-Based Cryptography
- Fast HEVC intra coding decision based on statistical cost and corner detection
- High DC gain and wide output swing class-C inverter
- Toward Practical Code-Based Signature: Implementing Fast and Compact QC-LDGM Signature Scheme on Embedded Hardware
- An FPGA-Based High-Performance Neural Ensemble Spiking Activity Simulator Utilizing Generalized Volterra Kernel and Complexity Analysis
- FPGA-Based High-Performance Collision Detection: An Enabling Technique for Image-Guided Robotic Surgery
- Parameter Space for the Architecture of FFT-Based Montgomery Modular Multiplication
- A Fast CU Size Decision Algorithm for the HEVC Intra Encoder
- An Application Specific Instruction Set Processor (ASIP) for Adaptive Filters in Neural Prosthetics
- Architecture Support for Task Out-of-Order Execution in MPSoCs
- Configurable Architectures for Multi-Mode Floating Point Adders
- Efficient Pairing Computation on Huff Curves
- Fast and Generic Inversion Architectures Over GF(2^m) Using Modified Itoh-Tsujii Algorithms
- High-Speed Polynomial Multiplication Architecture for Ring-LWE and SHE Cryptosystems
- Z-TCAM: An SRAM-based Architecture for TCAM
- A complementary architecture for high-speed true random number generator
- A low-power inverter-based ΣΔ analog-to-digital converter for audio applications
- A perfectly current matched charge pump with wide dynamic range for ultra low voltage applications
- An FPGA based scalable architecture of a stochastic state point process filter (SSPPF) to track the nonlinear dynamics underlying neural spiking
- Big data genome sequencing on Zynq based clusters (abstract only)
- Configurable Architecture for Double/Two-Parallel Single Precision Floating Point Division
- Design Exploration of Geometric Biclustering for Microarray Data Analysis in Data Mining
- E-TCAM: An Efficient SRAM-Based Architecture for TCAM
- GPU-based biclustering for microarray data analysis in neurocomputing
- High-speed Polynomial Multiplication Architecture for Ring-LWE and SHE Cryptosystems
- Laguerre-volterra model and architecture for MIMO system identification and output prediction
- Novel RNS Parameter Selection for Fast Modular Multiplication
- Series Expansion based Efficient Architectures for Double Precision Floating Point Division
- Time-efficient computation of digit serial Montgomery multiplication
- Trade-offs between the sensitivity and the speed of the FPGA-based sequence aligner
- Unified Architecture for Double/Two-Parallel Single Precision Floating Point Adder
- VLSI architecture of a high-performance neural spiking activity simulator based on generalized Volterra kernel
- Zero collision attack and its countermeasures on Residue Number System multipliers
- A ΣΔ modulator using gain-Boost Class-C Inverter for Audio Applications
- A 0.8-V 230-µW 98-dB DR Inverter-Based ΣΔ Modulator for Audio Applications
- A customizable Stochastic State Point Process Filter (SSPPF) for neural spiking activity
- A Flexible and Customizable Architecture for the Relaxation Labeling Algorithm
- A memory-based NFA regular expression match engine for signature-based intrusion detection
- A reconfigurable architecture for real-time prediction of neural activity
- A scalable RNS Montgomery multiplier over F_{2^m}
- Area-efficient architectures for double precision multiplier on FPGA, with run-time-reconfigurable dual single precision support
- Binding Hardware IPs to Specific FPGA Device via Inter-twining the PUF Response with the FSM of Sequential Circuits
- Design Automation Framework for Reconfigurable Interconnection Networks
- Design space explorations of Hybrid-Partitioned TCAM (HP-TCAM)
- Fast simulation of Digital Spiking Silicon Neuron model employing reconfigurable dataflow computing
- FPGA IP protection by binding Finite State Machine to Physical Unclonable Function
- Genome sequencing using mapreduce on FPGA with multiple hardware accelerators (abstract only)
- HEALPIX DCT technique for compressing PCA-based illumination adjustable images
- Noise filtering and occurrence identification of mouse ultrasonic vocalization call
- Parallel architecture for DNA sequence inexact matching with Burrows-Wheeler Transform
- Real-Time Prediction of Neuronal Population Spiking Activity Using FPGA
- VLSI Implementation of Double-Precision Floating-Point Multiplier Using Karatsuba Technique
- A dual mode FPGA design for the hippocampal prosthesis
- An FPGA-based acceleration platform for auction algorithm
- Area-Efficient Architectures for Large Integer and Quadruple Precision Floating Point Multipliers
- Area-Efficient FPGA Implementation of Quadruple Precision Floating Point Multiplier
- Faster Pairing Coprocessor Architecture
- FPGA Implementation of SRAM-based Ternary Content Addressable Memory
- GPU-Based Biclustering for Neural Information Processing
- High Performance Reconfigurable Architecture for Double Precision Floating Point Division
- Hypergraph based geometric biclustering algorithm
- Low complexity and hardware-friendly spectral modular multiplication
- Reconfigurable Computing: Architectures, Tools and Applications - 8th International Symposium, ARC 2012, Hong Kong, China, March 19-23, 2012. Proceedings
- Subthreshold CMOS voltage reference circuit with body bias compensation for process variation
- A hardware-based computational platform for Generalized Laguerre-Volterra MIMO model for neural activities
- A High Speed Pairing Coprocessor Using RNS and Lazy Reduction
- FPGA Architecture of Generalized Laguerre-Volterra MIMO Model for Neural Population Activities
- FPGA Architecture of Generalized Laguerre-Volterra MIMO Model for Neural Population Spiking Activities
- FPGA Implementation of Pairings Using Residue Number System and Lazy Reduction
- High-Performance and Scalable System Architecture for the Real-Time Estimation of Generalized Laguerre-Volterra MIMO Model From Neural Population Spiking Activity
- Hydrate: Hybrid Reconfigurable Architecture Expressions
- Rapid single-chip secure processor prototyping on the OpenSPARC FPGA platform
- Counter Embedded Memory architecture for trusted computing platform
- Reconfigurable Number Theoretic Transform architectures for cryptographic applications
- A High-Performance Hardware Architecture for Spectral Hash Algorithm
- Hierarchical Segmentation for Hardware Function Evaluation
- Hardware Implementation Trade-Offs of Polynomial Approximations and Interpolations
- A Flexible Architecture for Precise Gamma Correction
- Automatic Accuracy-Guaranteed Bit-Width Optimization for Fixed and Floating-Point Systems
- Hardware Generation of Arbitrary Random Number Distributions From Uniform Distributions Via the Inversion Method
- Instrumented Multi-Stage Word-Length Optimization
- Accuracy-Guaranteed Bit-Width Optimization
- Inversion-based hardware gaussian random number generator: A case study of function evaluation via hierarchical segmentation
- Automating custom-precision function evaluation for embedded processors
- Customizable elliptic curve cryptosystems
- Reconfigurable Acceleration for Monte Carlo Based Financial Simulation
- Reconfigurable Elliptic Curve Cryptosystems on a Chip
- Ziggurat-based Hardware Gaussian Random Number Generator
- A scalable hardware architecture for prime number validation
- A System on Chip Design Framework for Prime Number Validation Using Reconfigurable Hardware
- Customising Hardware Designs for Elliptic Curve Cryptography
- An FPGA-based re-configurable 24-bit 96kHz sigma-delta audio DAC