Ray C. C. Cheung

Latest

SR-TCUR: Scalable and robust tubal CUR decomposition for large-scale multidimensional tensors
ConBOOM: A Configurable CPU Microarchitecture for Speculative Covert Channel Mitigation
A Speculative Loop Pipeline Framework with Accurate Path Modeling for High-Level Synthesis
CuFDFB: Fast and Private Computation on Non-Linear Functions Using FHE
Efficient CUR decomposition for interpretable low-rank approximations and imaging applications
FastViT: Real-Time Linear Attention Accelerator for Dense Predictions of Vision Transformer (ViT)
High-Radix/Mixed-Radix NTT Multiplication Algorithm/Architecture Co-Design Over Fermat Modulus
IncTSVD: Incremental Tensor Singular Value Decomposition of Multidimensional Streaming Data
MLFormer: a high performance MPC linear inference framework for transformers
PQNTRU: Acceleration of NTRU-Based Schemes via Customized Post-Quantum Processor
Randomized tensor decomposition using parallel reconfigurable systems
RVSLH: Acceleration of Postquantum Standard SLH-DSA With Customized RISC-V Processor
VeloFHE: GPU Acceleration for FHEW and TFHE Bootstrapping
A LoRaWAN-BLE Based AIoT Smart Farm Management and Control System
An AIoT LoRaWAN Control System With Compression and Image Recovery Algorithm (CIRA) for Extreme Weather
An Efficient FPGA-based Depthwise Separable Convolutional Neural Network Accelerator with Hardware Pruning
Building a Learner-Centric Citywide Digital Literacy Ecosystem: Train-the-Trainer, Community-Based Learning, and Gifted Education - A Guide for Educators, Policymakers, and Stakeholders
Efficient Blind Hyperspectral Unmixing Framework Based on CUR Decomposition (CUR-HU)
Efficient Key-Switching for Word-Type FHE and GPU Acceleration
Enhanced Black-Scholes Option Pricing: Bit-Width Optimization with Automatic Differentiation and Lagrange Multipliers
EOG Signal Processor: An SVM-based Multiclass Classifier to Detect Eye Movements
Gradient-Congruity Guided Federated Sparse Training
HTCNN: High-Throughput Batch CNN Inference with Homomorphic Encryption for Edge Computing
MSCA: A Multi-Grained Sparse Convolution Accelerator for DNN Training
PQNTRU: Acceleration of NTRU-based Schemes via Customized Post-Quantum Processor
ProgramGalois: A Programmable Generator of Radix-4 Discrete Galois Transformation Architecture for Lattice-Based Cryptography
REALISE-IoT: RISC-V-Based Efficient and Lightweight Public-Key System for IoT Applications
Revisiting Keccak and Dilithium Implementations on ARMv7-M
RO-SVD: A Reconfigurable Hardware Copyright Protection Framework for AIGC Applications
Yet Another Improvement of Plantard Arithmetic for Faster Kyber on Low-End 32-bit IoT Devices
A Platform for Adaptive Interference Mitigation and Intent Analysis Using OpenLANE
A Versatility-Performance Balanced Hardware Architecture for Scene Text Detection
Algorithm-Hardware Co-Design of Split-Radix Discrete Galois Transformation for KyberKEM
Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer
CO-Detector: Towards Complex Object Detection with Cross-Part Feature Learning in Remote Sensing
Design of a Hippocampal Cognitive Prosthesis Chip
Efficient and Automatic Breast Cancer Early Diagnosis System Based on the Hierarchical Extreme Learning Machine
Efficient Multiple Channels EEG Signal Classification Based on Hierarchical Extreme Learning Machine
High-performance and Configurable SW/HW Co-design of Post-quantum Signature CRYSTALS-Dilithium
Homomorphic Encryption-Based System Design for Secure Data Processing
Image Super-Resolution and FPGA Hardware Design
In-Network Aggregation with Transport Transparency for Distributed Training
MUREN: MUltistage Recursive Enhanced Network for Coal-Fired Power Plant Detection
Yet another Improvement of Plantard Arithmetic for Faster Kyber on Low-end 32-bit IoT Devices
A High-Performance FPGA Accelerator for CUR Decomposition
A Versatility-Performance Balanced Hardware Architecture for Scene Text Detection
Comp-TCAM: An Adaptable Composite Ternary Content-Addressable Memory on FPGAs
High Throughput Hardware/Software Heterogeneous System for RRPN-Based Scene Text Detection
Improved Plantard Arithmetic for Lattice-based Cryptography
Machine Learning Based Hardware Architecture for DOA Measurement From Mice EEG
Melting Glacier: A 37-Year (1984-2020) High-Resolution Glacier-Cover Record of Mt. Kilimanjaro
Message from the General Chair and Program Co-Chairs
PipeNTT: A Pipelined Number Theoretic Transform Architecture
Preface
Reconfigurable content-addressable memory (CAM) on FPGAs: A tutorial and survey
A survey of breakthrough in blockchain technology: Adoptions, applications, challenges and future research
A systematic review of blockchain scalability: Issues, solutions, analysis and future research
Accelerated Updating Mechanisms for FPGA-Based Ternary Content-Addressable Memory
Aero-Hydroponic Agriculture IoT System
An Efficient Parallel Processor for Dense Tensor Computation
An FPGA-based MobileNet Accelerator Considering Network Structure Characteristics
Design of a Battery Carrying Barge for Enhancing Autonomous Sailboat's Endurance Capacity
Efficient High-Performance FPGA-Redis Hybrid NoSQL Caching System for Blockchain Scalability
Elastic Net Constraint-Based Tensor Model for High-Order Graph Matching
LoRaWAN-based Camera with (CIRA) Compression and Image Recovery Algorithm
On the Suitability of Read only Memory for FPGA-Based CAM Emulation Using Partial Reconfiguration
A Highly Parallel Constant-Time Almost-Inverse Algorithm
Binary convolutional neural network acceleration framework for rapid system prototyping
Compact Code-Based Signature for Reconfigurable Devices With Side Channel Resilience
Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers
NetReduce: RDMA-Compatible In-Network Reduction for Distributed DNN Training Acceleration
RPE-TCAM: Reconfigurable Power-Efficient Ternary Content-Addressable Memory on FPGAs
A high performance hardware architecture for non-negative tensor factorization
A robust background initialization algorithm with superpixel motion detection
Accurate and Compact Convolutional Neural Networks with Trained Binarization
An Efficient Application Specific Instruction Set Processor (ASIP) for Tensor Computation
Bank-selective Strategy for Gate-based Ternary Content-addressable Memory on FPGAs
D-TCAM: A High-Performance Distributed RAM Based TCAM Architecture on FPGAs
Feature Selection Based on Tensor Decomposition and Object Proposal for Night-Time Multiclass Vehicle Detection
High performance hardware architecture for singular spectrum analysis of Hankel tensors
High Performance Power-Efficient Gate-Based CAM for Reconfigurable Computing
Optimized Polynomial Multiplier Over Commutative Rings on FPGAs: A Case Study on BIKE
Reconfigurable RISC-V Secure Processor And SoC Integration
A fast inter CU decision algorithm for HEVC
A Robust Background Initialization Algorithm with Superpixel Motion Detection
ASIC Implementation of a Nonlinear Dynamical Model for Hippocampal Prosthesis
Dynamic Virtual Page-Based Flash Translation Layer With Novel Hot Data Identification and Adaptive Parallelism Management
FFT-Based McLaughlin's Montgomery Exponentiation without Conditional Selections
High-Speed Discrete Gaussian Sampler With Heterodyne Chaotic Laser Inputs
Lightweight Secure Processor Prototype on FPGA
Spectral arithmetic in Montgomery modular multiplication
A Bias-Bounded Digital True Random Number Generator Architecture
A Fully Pipelined Hardware Architecture for Intra Prediction of HEVC
A low power V-band LC VCO with high Q varactor technique in 40 nm CMOS process
Area-Time Efficient Architecture of FFT-Based Montgomery Multiplication
Area-Time Efficient Computation of Niederreiter Encryption on QC-MDPC Codes for Embedded Hardware
Compact Constant Weight Coding Engines for the Code-Based Cryptography
Fast HEVC intra coding decision based on statistical cost and corner detection
High DC gain and wide output swing class-C inverter
Toward Practical Code-Based Signature: Implementing Fast and Compact QC-LDGM Signature Scheme on Embedded Hardware
An FPGA-Based High-Performance Neural Ensemble Spiking Activity Simulator Utilizing Generalized Volterra Kernel and Complexity Analysis
FPGA-Based High-Performance Collision Detection: An Enabling Technique for Image-Guided Robotic Surgery
Parameter Space for the Architecture of FFT-Based Montgomery Modular Multiplication
A Fast CU Size Decision Algorithm for the HEVC Intra Encoder
An Application Specific Instruction Set Processor (ASIP) for Adaptive Filters in Neural Prosthetics
Architecture Support for Task Out-of-Order Execution in MPSoCs
Configurable Architectures for Multi-Mode Floating Point Adders
Efficient Pairing Computation on Huff Curves
Fast and Generic Inversion Architectures Over GF(2^m) Using Modified Itoh-Tsujii Algorithms
High-Speed Polynomial Multiplication Architecture for Ring-LWE and SHE Cryptosystems
Z-TCAM: An SRAM-based Architecture for TCAM
A complementary architecture for high-speed true random number generator
A low-power inverter-based ΣΔ analog-to-digital converter for audio applications
A perfectly current matched charge pump with wide dynamic range for ultra low voltage applications
An FPGA based scalable architecture of a stochastic state point process filter (SSPPF) to track the nonlinear dynamics underlying neural spiking
Big data genome sequencing on Zynq based clusters (abstract only)
Configurable Architecture for Double/Two-Parallel Single Precision Floating Point Division
Design Exploration of Geometric Biclustering for Microarray Data Analysis in Data Mining
E-TCAM: An Efficient SRAM-Based Architecture for TCAM
GPU-based biclustering for microarray data analysis in neurocomputing
High-speed Polynomial Multiplication Architecture for Ring-LWE and SHE Cryptosystems
Laguerre-Volterra model and architecture for MIMO system identification and output prediction
Novel RNS Parameter Selection for Fast Modular Multiplication
Series Expansion based Efficient Architectures for Double Precision Floating Point Division
Time-efficient computation of digit serial Montgomery multiplication
Trade-offs between the sensitivity and the speed of the FPGA-based sequence aligner
Unified Architecture for Double/Two-Parallel Single Precision Floating Point Adder
VLSI architecture of a high-performance neural spiking activity simulator based on generalized Volterra kernel
Zero collision attack and its countermeasures on Residue Number System multipliers
A ΣΔ modulator using gain-Boost Class-C Inverter for Audio Applications
A 0.8-V 230-µW 98-dB DR Inverter-Based ΣΔ Modulator for Audio Applications
A customizable Stochastic State Point Process Filter (SSPPF) for neural spiking activity
A Flexible and Customizable Architecture for the Relaxation Labeling Algorithm
A memory-based NFA regular expression match engine for signature-based intrusion detection
A reconfigurable architecture for real-time prediction of neural activity
A scalable RNS Montgomery multiplier over F_{2^m}
Area-efficient architectures for double precision multiplier on FPGA, with run-time-reconfigurable dual single precision support
Binding Hardware IPs to Specific FPGA Device via Inter-twining the PUF Response with the FSM of Sequential Circuits
Design Automation Framework for Reconfigurable Interconnection Networks
Design space explorations of Hybrid-Partitioned TCAM (HP-TCAM)
Fast simulation of Digital Spiking Silicon Neuron model employing reconfigurable dataflow computing
FPGA IP protection by binding Finite State Machine to Physical Unclonable Function
Genome sequencing using MapReduce on FPGA with multiple hardware accelerators (abstract only)
HEALPix DCT technique for compressing PCA-based illumination adjustable images
Noise filtering and occurrence identification of mouse ultrasonic vocalization call
Parallel architecture for DNA sequence inexact matching with Burrows-Wheeler Transform
Real-Time Prediction of Neuronal Population Spiking Activity Using FPGA
VLSI Implementation of Double-Precision Floating-Point Multiplier Using Karatsuba Technique
A dual mode FPGA design for the hippocampal prosthesis
An FPGA-based acceleration platform for auction algorithm
Area-Efficient Architectures for Large Integer and Quadruple Precision Floating Point Multipliers
Area-Efficient FPGA Implementation of Quadruple Precision Floating Point Multiplier
Faster Pairing Coprocessor Architecture
FPGA Implementation of SRAM-based Ternary Content Addressable Memory
GPU-Based Biclustering for Neural Information Processing
High Performance Reconfigurable Architecture for Double Precision Floating Point Division
Hypergraph based geometric biclustering algorithm
Low complexity and hardware-friendly spectral modular multiplication
Reconfigurable Computing: Architectures, Tools and Applications - 8th International Symposium, ARC 2012, Hong Kong, China, March 19-23, 2012. Proceedings
Subthreshold CMOS voltage reference circuit with body bias compensation for process variation
A hardware-based computational platform for Generalized Laguerre-Volterra MIMO model for neural activities
A High Speed Pairing Coprocessor Using RNS and Lazy Reduction
FPGA Architecture of Generalized Laguerre-Volterra MIMO Model for Neural Population Activities
FPGA Architecture of Generalized Laguerre-Volterra MIMO Model for Neural Population Spiking Activities
FPGA Implementation of Pairings Using Residue Number System and Lazy Reduction
High-Performance and Scalable System Architecture for the Real-Time Estimation of Generalized Laguerre-Volterra MIMO Model From Neural Population Spiking Activity
Hydrate: Hybrid Reconfigurable Architecture Expressions
Rapid single-chip secure processor prototyping on the OpenSPARC FPGA platform
Counter Embedded Memory architecture for trusted computing platform
Reconfigurable Number Theoretic Transform architectures for cryptographic applications
A High-Performance Hardware Architecture for Spectral Hash Algorithm
Hierarchical Segmentation for Hardware Function Evaluation
Hardware Implementation Trade-Offs of Polynomial Approximations and Interpolations
A Flexible Architecture for Precise Gamma Correction
Automatic Accuracy-Guaranteed Bit-Width Optimization for Fixed and Floating-Point Systems
Hardware Generation of Arbitrary Random Number Distributions From Uniform Distributions Via the Inversion Method
Instrumented Multi-Stage Word-Length Optimization
Accuracy-Guaranteed Bit-Width Optimization
Inversion-based hardware gaussian random number generator: A case study of function evaluation via hierarchical segmentation
Automating custom-precision function evaluation for embedded processors
Customizable elliptic curve cryptosystems
Reconfigurable Acceleration for Monte Carlo Based Financial Simulation
Reconfigurable Elliptic Curve Cryptosystems on a Chip
Ziggurat-based Hardware Gaussian Random Number Generator
A scalable hardware architecture for prime number validation
A System on Chip Design Framework for Prime Number Validation Using Reconfigurable Hardware
Customising Hardware Designs for Elliptic Curve Cryptography
An FPGA-based re-configurable 24-bit 96kHz sigma-delta audio DAC