Person:
Kung, H.


Search Results

Now showing 1 - 10 of 53
  • Publication
    Millimeter-Wave Field Experiments with Many Antenna Configurations for Indoor Multipath Environments
    (IEEE, 2017-12) Comiter, Marcus; Crouse, Michael; Kung, H.; Tarng, Jenn-Hwan; Tsai, Zuo-Min; Wu, Wei-Ting; Lee, Ta-Sung; Chang, M. C. Frank; Kuan, Yen-Cheng
    Next-generation wireless networks, such as 5G networks, will use millimeter waves (mmWaves) operating at 28 GHz, 38 GHz, 60 GHz, or higher frequencies to deliver unprecedentedly high data rates, e.g., 10 gigabits per second. Due to high attenuation at these higher frequencies, directional antennas are commonly suggested for mmWave communication. It is therefore important to study how different antenna configurations at the transmitter and receiver affect received power and data throughput. In this paper, we describe field experiments with mmWave antennas for indoor multipath environments and report measurement results on a multitude of antenna configurations. Specifically, we examine four different mmWave systems, operating at two different frequencies (38 and 60 GHz), using a number of different antennas (horn antennas, omnidirectional antennas, and phased arrays). For each system, we systematically collect performance measurements (e.g., received power) and use these to examine the effects of beam misalignment on signal quality, the presence of multipath effects, and susceptibility to blockage. We capture interesting phenomena, including a multipath scenario in which a single receiver antenna receives two copies of a signal transmitted from the same transmitter antenna over multiple paths. From these field experiments, we discuss lessons learned, draw several conclusions, and consider their applicability to the design of future mmWave networks.
  • Publication
    Distributed Deep Neural Networks Over the Cloud, the Edge and End Devices
    (IEEE, 2017-06) Teerapittayanon, Surat; McDanel, Bradley; Kung, H.
    We propose distributed deep neural networks (DDNNs) over distributed computing hierarchies consisting of the cloud, the edge (fog), and end devices. While able to accommodate inference of a deep neural network (DNN) in the cloud, a DDNN also allows fast and localized inference using shallow portions of the neural network at the edge and on end devices. When supported by a scalable distributed computing hierarchy, a DDNN can scale up in neural network size and scale out in geographical span. Due to their distributed nature, DDNNs enhance sensor fusion, system fault tolerance, and data privacy for DNN applications. In implementing a DDNN, we map sections of a DNN onto a distributed computing hierarchy. By jointly training these sections, we minimize communication and resource usage for devices and maximize the usefulness of extracted features which are utilized in the cloud. The resulting system has built-in support for automatic sensor fusion and fault tolerance. As a proof of concept, we show that a DDNN can exploit the geographical diversity of sensors to improve object recognition accuracy and reduce communication cost. In our experiment, compared with the traditional method of offloading raw sensor data to be processed in the cloud, a DDNN locally processes most sensor data on end devices while achieving high accuracy, and reduces the communication cost by a factor of more than 20.
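    To make the exit-or-offload logic concrete, here is a minimal sketch, not the authors' implementation: a hypothetical `device_model` runs the shallow layers and returns both a local softmax output and intermediate features, and the sample is escalated to a hypothetical `cloud_model` only when the local output is uncertain.
    ```python
    # Hypothetical sketch of DDNN-style exit-or-offload inference.
    import numpy as np

    def normalized_entropy(probs):
        """Entropy of a softmax output, scaled to [0, 1]; low means confident."""
        probs = np.clip(probs, 1e-12, 1.0)
        return float(-np.sum(probs * np.log(probs)) / np.log(len(probs)))

    def ddnn_infer(x, device_model, cloud_model, threshold=0.5):
        # Shallow layers on the end device produce a local classification
        # plus an intermediate feature map to forward if needed.
        local_probs, features = device_model(x)
        if normalized_entropy(local_probs) < threshold:
            return np.argmax(local_probs), "device"   # fast, local exit
        # Otherwise transmit only the compact features (not raw sensor
        # data) up the hierarchy for the deeper layers to finish.
        cloud_probs = cloud_model(features)
        return np.argmax(cloud_probs), "cloud"
    ```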
  • Publication
    Language Modeling by Clustering with Word Embeddings for Text Readability Assessment
    (ACM, 2017) Cha, Miriam; Gwon, Youngjune; Kung, H.
    We present a clustering-based language model using word embeddings for text readability prediction. Presumably, a Euclidean semantic-space hypothesis holds true for word embeddings trained by observing word co-occurrences. We argue that clustering with word embeddings in this metric space should yield feature representations in a higher semantic space appropriate for text regression. Also, by representing features as histograms, our approach can naturally handle documents of varying lengths. An empirical evaluation using the Common Core Standards corpus reveals that the features formed by our clustering-based language model significantly improve the previously known results on the same corpus for readability prediction. We also evaluate the task of sentence matching based on semantic relatedness using the Wiki-SimpleWiki corpus and find that our features lead to superior matching performance.
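    As an illustration of the histogram-of-clusters representation described above, here is a hedged sketch using scikit-learn; the cluster count, the pretrained `embeddings` dictionary, and the SVR regressor are illustrative assumptions rather than the paper's exact configuration.
    ```python
    # Illustrative sketch: cluster word embeddings, represent each document
    # as a normalized histogram of its words' cluster assignments, regress.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVR

    def doc_histogram(doc_tokens, embeddings, kmeans):
        """Map a document to a normalized histogram over embedding clusters."""
        vecs = np.array([embeddings[w] for w in doc_tokens if w in embeddings])
        labels = kmeans.predict(vecs)
        hist = np.bincount(labels, minlength=kmeans.n_clusters).astype(float)
        return hist / hist.sum()   # normalization handles varying document lengths

    # embeddings: dict mapping word -> vector (e.g., pretrained word2vec)
    # kmeans = KMeans(n_clusters=256).fit(np.array(list(embeddings.values())))
    # X = np.stack([doc_histogram(d, embeddings, kmeans) for d in train_docs])
    # regressor = SVR().fit(X, train_readability_scores)   # text regression
    ```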
  • Publication
    Blind Signal Classification via Sparse Coding
    (2016) Gwon, Youngjune; Dastangoo, Siamak; Kung, H.; Fossa, Carl
    We propose a novel RF signal classification method based on sparse coding, an unsupervised learning method popular in computer vision. In particular, we employ a convolutional sparse coder that can extract high-level features of an unknown received signal by maximal similarity matching against an over-complete dictionary of filter patterns. Such a dictionary can be either generated or learned in an unsupervised fashion from measured signal examples conveying no ground-truth labels. The computed sparse code is then applied to train SVM classifiers for discriminating RF signals. As a result, the proposed approach can achieve blind signal classification that requires no prior knowledge (e.g., MCS, pulse shaping) about the signals present in an arbitrary RF channel. Since modulated RF signals undergo pulse shaping to aid matched-filter detection, our method exploits variability in relative similarity against the dictionary atoms as the key discriminating factor for classification. Our experimental results indicate that we can blindly separate different classes of digitally modulated signals with a recall of 0.703 and a false-alarm rate of 0.246 at 20 dB SNR. Provided a small labeled dataset for supervised classifier training, we could improve the classification performance to a recall of 0.878 and a false-alarm rate of 0.141.
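    The shape of this pipeline can be sketched as follows; this is an illustrative simplification (full convolutional sparse coding and dictionary learning are omitted), and the `dictionary` of 1-D filter patterns is an assumed input.
    ```python
    # Sketch: correlate the unknown signal with each dictionary atom
    # (maximal similarity matching), pool into a feature vector, classify.
    import numpy as np
    from sklearn.svm import SVC

    def sparse_similarity_features(signal, dictionary):
        """Max absolute cross-correlation against each atom."""
        # Assumes each atom is no longer than the signal.
        return np.array([np.max(np.abs(np.correlate(signal, atom, mode="valid")))
                         for atom in dictionary])

    # dictionary: list of 1-D filter patterns, generated or learned without labels
    # X = np.stack([sparse_similarity_features(s, dictionary) for s in signals])
    # clf = SVC().fit(X, labels)   # supervised SVM stage on unsupervised features
    ```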
  • Publication
    Sparse-coded net model and applications
    (IEEE, 2016-09) Gwon, Youngjune; Cha, Miriam; Campbell, William; Kung, H.; Dagli, Charlie K.
    As an unsupervised learning method, sparse coding can discover high-level representations for an input in a large variety of learning problems. Under semi-supervised settings, sparse coding is used to extract features for a supervised task such as classification. While sparse representations learned from unlabeled data independently of the supervised task perform well, we argue that sparse coding should also be built as a holistic learning unit that optimizes the supervised task objectives more explicitly. In this paper, we propose the sparse-coded net, a feedforward model that integrates sparse coding and task-driven output layers, and describe its training methods in detail. After pretraining a sparse-coded net via semi-supervised learning, we optimize its task-specific performance with a novel backpropagation algorithm that can traverse nonlinear feature pooling operators to update the dictionary. Thus, the sparse-coded net can be applied to supervised dictionary learning. We evaluate the sparse-coded net on classification problems in sound, image, and text data. The results confirm a significant improvement over semi-supervised learning as well as superior classification performance against deep stacked-autoencoder neural networks and GMM-SVM pipelines in small- to medium-scale settings.
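    A structural sketch of the forward pass (sparse coding layer, nonlinear pooling, task layer) is shown below; ISTA is one standard way to compute the sparse code, not necessarily the solver used in the paper, and the backward pass through the pooling operator is only summarized in the comments.
    ```python
    # Structural sketch of a sparse-coded net's forward pass.
    import numpy as np

    def soft_threshold(v, lam):
        return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

    def sparse_code(x, D, lam=0.1, steps=50, lr=0.01):
        """ISTA iterations for z ~ argmin ||x - Dz||^2 + lam * ||z||_1."""
        z = np.zeros(D.shape[1])
        for _ in range(steps):
            z = soft_threshold(z - lr * (D.T @ (D @ z - x)), lr * lam)
        return z

    def forward(patches, D, W):
        # Encode each patch, max-pool the codes, then apply the task layer W.
        Z = np.stack([sparse_code(p, D) for p in patches])
        pooled = Z.max(axis=0)    # nonlinear feature pooling
        return W @ pooled         # task-driven output (e.g., class scores)

    # Training idea: gradients w.r.t. W are standard backprop; the paper's
    # novelty is routing the task gradient through the pooling operator
    # (here, the max selections) back to the dictionary D.
    ```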
  • Publication
    BranchyNet: Fast inference via early exiting from deep neural networks
    (IEEE, 2017) Teerapittayanon, Surat; McDanel, Bradley; Kung, H.
    Deep neural networks are state-of-the-art methods for many learning tasks due to their ability to extract increasingly better features at each network layer. However, the improved performance of additional layers in a deep network comes at the cost of added latency and energy usage in feedforward inference. As networks continue to get deeper and larger, these costs become more prohibitive for real-time and energy-sensitive applications. To address this issue, we present BranchyNet, a novel deep network architecture that is augmented with additional side-branch classifiers. The architecture allows prediction results for a large portion of test samples to exit the network early via these branches when samples can already be inferred with high confidence. BranchyNet exploits the observation that features learned at an early layer of a network may often be sufficient for the classification of many data points. For more difficult samples, which are expected less frequently, BranchyNet will use further or all network layers to provide the best likelihood of correct prediction. We study the BranchyNet architecture using several well-known networks (LeNet, AlexNet, ResNet) and datasets (MNIST, CIFAR10) and show that it can both improve accuracy and significantly reduce the inference time of the network.
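    To illustrate the early-exit rule the abstract describes, here is a minimal sketch; the entropy-threshold test mirrors the described exit criterion, while the `stages`, `branches`, and threshold values are placeholder assumptions rather than the paper's trained networks.
    ```python
    # Sketch of BranchyNet-style early-exit inference.
    import numpy as np

    def entropy(probs):
        probs = np.clip(probs, 1e-12, 1.0)
        return float(-np.sum(probs * np.log(probs)))

    def branchy_infer(x, stages, branches, thresholds):
        """stages: layer blocks; branches[i]: side classifier after stages[i];
        the last stage acts as the network's final classifier."""
        h = x
        for stage, branch, t in zip(stages[:-1], branches, thresholds):
            h = stage(h)
            probs = branch(h)          # side-branch softmax output
            if entropy(probs) < t:     # confident enough: exit early
                return np.argmax(probs)
        return np.argmax(stages[-1](h))  # fall through to the full network
    ```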
  • Publication
    Language Recognition via Sparse Coding
    (2017-09-29) Gwon, Youngjune; Campbell, William M.; Sturim, Douglas E.; Kung, H.
    Spoken language recognition requires a series of signal processing steps and learning algorithms to model the distinguishing characteristics of different languages. In this paper, we present a sparse discriminative feature learning framework for language recognition. We use sparse coding, an unsupervised method, to compute efficient representations for spectral features from a speech utterance while learning basis vectors for language models. In contrast to existing approaches to sparse-representation classification, we introduce a maximum a posteriori (MAP) adaptation scheme based on online learning that further optimizes the discriminative quality of sparse-coded speech features. We empirically validate the effectiveness of our approach using the NIST LRE 2015 dataset.
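    By analogy with classical MAP adaptation of GMM means, a MAP-style online update for a dictionary atom might look like the sketch below; this is an assumed form for illustration only, and the paper's exact adaptation rule may differ. `r` is a relevance factor weighting the prior.
    ```python
    # Illustrative MAP-style atom update (assumed form, not the paper's rule).
    import numpy as np

    def map_adapt_atom(atom, assigned_frames, r=16.0):
        """Shrink the data mean toward the prior atom; more data, less shrinkage."""
        n = len(assigned_frames)
        if n == 0:
            return atom
        data_mean = np.mean(assigned_frames, axis=0)
        alpha = n / (n + r)            # adaptation coefficient, as in GMM MAP
        adapted = alpha * data_mean + (1.0 - alpha) * atom
        return adapted / np.linalg.norm(adapted)   # keep atoms unit-norm
    ```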
  • Publication
    Nested Buddy System: A New Block Address Allocation Scheme for ISPs and IaaS Providers
    (IEEE, 2016) Crouse, Michael; Kung, H.
    We propose a novel block address allocation method, called the nested buddy system, which can make use of areas wasted in the classical buddy system due to internal fragmentation. While achieving high utilization of the address space, our new scheme supports efficient address matching for routers in packet forwarding and for network middleboxes in packet filtering. Specifically, the scheme uses just one prefix rule for each allocated address block in a packet routing/filtering table. We show by analysis and simulation that the increased address utilization can lead to a significant reduction in the probability of denial of service under bursty address allocation requests. In contrast, the classical buddy system requires the aggregation of many requests over time to smooth out demand, resulting in service delays undesirable to end users. Our solution is applicable to ISPs serving mobile users carrying many network-connected IoT devices and to IaaS providers in the cloud serving tenants with dynamically varying demands for network addresses.
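    For context, the classical buddy allocation that the nested scheme builds on can be sketched as follows: each request is rounded up to a power-of-two block, so every allocation corresponds to a single CIDR-style prefix rule, and the rounding remainder is the internal fragmentation the nested variant reclaims. The code is illustrative only.
    ```python
    # Sketch of classical buddy allocation for address blocks.
    def buddy_block(base_addr, request_size, addr_bits=32):
        """Round a request up to a power-of-two block; return its prefix length
        and the internally fragmented remainder."""
        size = 1
        while size < request_size:     # next power of two >= request
            size <<= 1
        prefix_len = addr_bits - (size.bit_length() - 1)
        # A real allocator must also align base_addr to a multiple of `size`.
        return base_addr, prefix_len, size - request_size

    # Example: 1000 addresses -> a 1024-address block (a /22 in IPv4),
    # leaving 24 unused addresses that a nested buddy system could reuse.
    ```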
  • Publication
    Lambda means clustering: Automatic parameter search and distributed computing implementation
    (2016) Comiter, Marcus; Cha, Miriam; Kung, H.; Teerapittayanon, Surat
    Recent advances in clustering have shown that ensuring a minimum separation between cluster centroids leads to higher-quality clusters than those found by methods that explicitly set the number of clusters to be found, such as k-means. One such algorithm is DP-means, which sets a distance parameter λ for the minimum separation. However, without knowing either the true number of clusters or the underlying true distribution, setting λ itself can be difficult, and poor choices of λ will negatively impact cluster quality. As a general solution for finding λ, in this paper we present λ-means, a clustering algorithm capable of deriving an optimal value for λ automatically. We contribute both a theoretically motivated cluster-based version of λ-means and a faster conflict-based version of λ-means. We demonstrate that λ-means discovers the true underlying value of λ asymptotically when run on datasets generated by a Dirichlet process, and achieves competitive performance on a real-world test dataset. Further, we demonstrate that when run on both parallel multicore computers and distributed cluster computers in the cloud, cluster-based λ-means achieves near-perfect speedup, and, while being a more efficient algorithm, conflict-based λ-means achieves speedups only a factor of two away from the maximum possible.
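    The DP-means step that λ-means automates is sketched below; λ is compared against squared distances here, and the automatic λ search (the paper's contribution) is not shown. This is a serial illustration, not the parallel implementations evaluated in the paper.
    ```python
    # Sketch of DP-means: points farther than lam from every centroid
    # seed a new cluster, so lam enforces the minimum separation.
    import numpy as np

    def dp_means(X, lam, iters=20):
        """Serial DP-means; lam is a squared-distance threshold for new clusters."""
        centroids = [X[0].copy()]
        for _ in range(iters):
            labels = []
            for x in X:
                d2 = [float(np.sum((x - c) ** 2)) for c in centroids]
                k = int(np.argmin(d2))
                if d2[k] > lam:                  # too far from every centroid:
                    centroids.append(x.copy())   # open a new cluster
                    k = len(centroids) - 1
                labels.append(k)
            labels = np.array(labels)
            centroids = [X[labels == k].mean(axis=0) if np.any(labels == k)
                         else centroids[k] for k in range(len(centroids))]
        return np.array(centroids), labels
    ```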
  • Publication
    Performance Gains in Conjugate Gradient Computation with Linearly Connected GPU Multiprocessors
    (USENIX Association, 2012) Tarsa, Stephen; Lin, Tsung-Han; Kung, H.
    Conjugate gradient is an important iterative method used for solving least squares problems. It is compute-bound and generally involves only simple matrix computations. One would expect that we could fully parallelize such computation on the GPU architecture with multiple Stream Multiprocessors (SMs), each consisting of many SIMD processing units. While implementing a conjugate gradient method for compressive sensing signal reconstruction, we have noticed that large speed-up due to parallel processing is actually infeasible due to the high I/O cost between SMs and GPU global memory. WE have found that if SMs were linearly connected, we could gain a 15x speedup by loop unrolling. We conclude that adding these relatively inexpensive neighbor connections for SMs can significantly enhance the applicability of GPUs to a large class of similar matrix computations.