Person: Tarsa, Stephen
Last Name: Tarsa
First Name: Stephen
Search Results
Now showing 1 - 10 of 12
Publication: Performance Gains in Conjugate Gradient Computation with Linearly Connected GPU Multiprocessors (USENIX Association, 2012)
Tarsa, Stephen; Lin, Tsung-Han; Kung, H.
Conjugate gradient is an important iterative method used for solving least squares problems. It is compute-bound and generally involves only simple matrix computations. One would expect that we could fully parallelize such computation on the GPU architecture with multiple Stream Multiprocessors (SMs), each consisting of many SIMD processing units. While implementing a conjugate gradient method for compressive sensing signal reconstruction, we have noticed that a large speed-up due to parallel processing is actually infeasible due to the high I/O cost between SMs and GPU global memory. We have found that if SMs were linearly connected, we could gain a 15x speedup by loop unrolling. We conclude that adding these relatively inexpensive neighbor connections for SMs can significantly enhance the applicability of GPUs to a large class of similar matrix computations.

Publication: Achieving High Throughput Ground-to-UAV Transport via Parallel Links (IEEE, 2011)
Lin, Chit-Kwan; Kung, H.; Lin, Tsung-Han; Tarsa, Stephen; Vlah, Dario
Wireless data transfer under high mobility, as found in unmanned aerial vehicle (UAV) applications, is a challenge due to varying channel quality and extended link outages. We present FlowCode, an easily deployable link-layer solution utilizing multiple transmitters and receivers for the purpose of supporting existing transport protocols such as TCP in these scenarios. By using multiple transmitters and receivers and by exploiting the resulting antenna beam diversity and parallel transmission effects, FlowCode increases throughput and reception range. In emulation, we show that TCP over FlowCode gives greater goodput over a larger portion of the flight path, compared to an enhanced TCP protocol using the standard 802.11 MAC.
In the process, we make a strong case for using trace-modulated emulation when developing distributed protocols for complex wireless environments.

Publication: Hierarchical Sparse Coding for Wireless Link Prediction in an Airborne Scenario (IEEE, 2013)
Tarsa, Stephen; Kung, H.
We build a data-driven hierarchical inference model to predict wireless link quality between a mobile unmanned aerial vehicle (UAV) and ground nodes. Clustering, sparse feature extraction, and non-linear pooling are combined to improve Support Vector Machine (SVM) classification when a limited training set does not comprehensively characterize data variations. Our approach first learns two layers of dictionaries by clustering packet reception data. These dictionaries are used to perform sparse feature extraction, which expresses link state vectors first in terms of a few prominent local patterns, or features, and then in terms of co-occurring features along the flight path. In order to tolerate artifacts like small positional shifts in field-collected data, we pool large magnitude features among overlapping shifted patches within windows. Together, these techniques transform raw link measurements into stable feature vectors that capture environmental effects driven by radio range limitations, antenna pattern variations, line-of-sight occlusions, etc. Link outage prediction is implemented by an SVM that assigns a common label to feature vectors immediately preceding gaps of successive packet losses. Predictions are then fed to an adaptive link layer protocol that adjusts forward error correction rates, or queues packets during outages to prevent TCP timeout. In our harsh target environment, links are unstable and temporary outages common, so baseline TCP connections achieve only minimal throughput.
However, connections under our predictive protocol temporarily hold packets that would otherwise be lost on unavailable links, and react quickly when the UAV link is restored, increasing overall channel utilization.

Publication: Parallelization Primitives for Dynamic Sparse Computations (2013)
Lin, Tsung-Han; Tarsa, Stephen; Kung, H.
We characterize a general class of algorithms common in machine learning, scientific computing, and signal processing, whose computational dependencies are both sparse and dynamically defined throughout execution. Existing parallel computing runtimes, like MapReduce and GraphLab, are a poor fit for this class because they assume statically defined dependencies for resource allocation and scheduling decisions. As a result, changing load characteristics and straggling compute units degrade performance significantly. However, we show that the sparsity of computational dependencies and these algorithms’ natural error tolerance can be exploited to implement a flexible execution model with large efficiency gains, using two simple primitives: selective push-pull and statistical barriers. With reconstruction for compressive time-lapse MRI as a motivating application, we deploy a large Orthogonal Matching Pursuit (OMP) computation on Amazon’s EC2 cluster to demonstrate a 19x speedup over current static execution models.

Publication: Measuring diversity on a low-altitude UAV in a ground-to-air wireless 802.11 mesh network (IEEE, 2010)
Kung, H.; Lin, Chit-Kwan; Lin, Tsung-Han; Tarsa, Stephen; Vlah, Dario
We consider the problem of mitigating a highly varying wireless channel between a transmitting ground node and receivers on a small, low-altitude unmanned aerial vehicle (UAV) in an 802.11 wireless mesh network. One approach is to use multiple transmitter and receiver nodes that exploit the channel's spatial/temporal diversity and that cooperate to improve overall packet reception.
We present a series of measurement results from a real-world testbed that characterize the resulting wireless channel. We show that the correlation between receiver nodes on the airplane is poor at small time scales so receiver diversity can be exploited. Our measurements suggest that using several receiver nodes simultaneously can boost packet delivery rates substantially. Lastly, we show that similar results apply to transmitter selection diversity as well.

Publication: A location-dependent runs-and-gaps model for predicting TCP performance over a UAV wireless channel (IEEE, 2010)
Kung, H.; Lin, Chit-Kwan; Lin, Tsung-Han; Tarsa, Stephen; Vlah, Dario; Hague, Daniel; Muccio, Michael; Poland, Brendon; Suter, Bruce
In this paper, we use a finite-state model to predict the performance of the Transmission Control Protocol (TCP) over a varying wireless channel between an unmanned aerial vehicle (UAV) and ground nodes. As a UAV traverses its flight path, the wireless channel may experience periods of significant packet loss, successful packet delivery, and intermittent reception. By capturing packet run-length and gap-length statistics at various locations on the flight path, this location-dependent model can predict TCP throughput in spite of dynamically changing channel characteristics. We train the model by using packet traces from flight tests in the field and validate it by comparing TCP throughput distributions for model-generated traces against those for actual traces randomly sampled from field data. Our modeling methodology is general and can be applied to any UAV flight path.

Publication: FlowCode: Multi-site data exchange over wireless ad-hoc networks using network coding (IEEE, 2009)
Kung, H.; Lin, Chit-Kwan; Lin, Tsung-Han; Tarsa, Stephen; Vlah, Dario
We present FlowCode, a system that exploits network coding at the granularity of traffic flows to facilitate fault-tolerant data exchange in wireless mesh networks.
Applications include multi-site data replication in ad-hoc environments such as mesh networks or wireless data centers. By coupling an operand-driven transmission mechanism with a layered network topology, FlowCode allows us to realize the gains of network coding in application systems without a global scheduler. We analyze the resulting gains through modeling and simulation and validate our results on an outdoor testbed of 12 wireless devices. Results indicate that in high loss environments, FlowCode provides the most significant gains from improved fault tolerance over redundant paths.

Publication: Workload Prediction for Adaptive Power Scaling Using Deep Learning (IEEE, 2014)
Tarsa, Stephen; Kumar, Amit; Kung, H.
We apply hierarchical sparse coding, a form of deep learning, to model user-driven workloads based on on-chip hardware performance counters. We then predict periods of low instruction throughput, during which frequency and voltage can be scaled to reclaim power. Using a multi-layer coding structure, our method progressively codes counter values in terms of a few prominent features learned from data, and passes them to a Support Vector Machine (SVM) classifier where they act as signatures for predicting future workload states. We show that prediction accuracy and look-ahead range improve significantly over linear regression modeling, giving more time to adjust power management settings. Our method relies on learning and feature extraction algorithms that can discover and exploit hidden statistical invariances specific to workloads. We argue that, in addition to achieving superior prediction performance, our method is fast enough for practical use.
To our knowledge, we are the first to use deep learning at the instruction level for workload prediction and on-chip power adaptation.

Publication: Taming Wireless Fluctuations by Predictive Queuing Using a Sparse-Coding Link-State Model (Association for Computing Machinery, 2015)
Tarsa, Stephen; Comiter, Marcus; Crouse, Michael; McDanel, Bradley; Kung, H.
We introduce State-Informed Link-Layer Queuing (SILQ), a system that models, predicts, and avoids packet delivery failures due to temporary wireless outages in everyday scenarios. By stabilizing connections in adverse link conditions, SILQ boosts throughput and reduces performance variation for network applications, for example by preventing unnecessary TCP timeouts caused by dead zones, elevators, and subway tunnels. SILQ makes predictions in real time by actively probing links, matching measurements to an overcomplete dictionary of patterns learned offline, and classifying the resulting sparse feature vectors to identify those that precede outages. We use a clustering method called sparse coding to build our data-driven link model, and show that it produces more variation-tolerant predictions than traditional loss-rate, location-based, or Markov chain techniques. We present extensive data collection and field-validation of SILQ in airborne, indoor, and urban scenarios of practical interest. We show how offline unsupervised learning discovers link-state patterns that are stable across diverse networks and signal-propagation environments. Using these canonical primitives, we train outage predictors for 802.11 (Wi-Fi) and 3G cellular networks to demonstrate TCP throughput gains of 4x with off-the-shelf mobile devices.
SILQ addresses delivery failures solely at the link layer, requires no new hardware, and upholds the end-to-end design principle, enabling easy integration across applications, devices, and networks.

Publication: Machine Learning for Machines: Data-Driven Performance Tuning at Runtime Using Sparse Coding (2015-01-21)
Tarsa, Stephen; Seltzer, Margo; Lu, Yue; Gortler, Steven
We develop methods for adjusting device configurations to runtime conditions based on system-state predictions. Our approach statistically models performance data collected by either actively probing conditions such as wireless link quality, or leveraging existing infrastructure such as hardware performance counters. By predicting future runtime characteristics, we enable on-the-fly changes to wireless transmission schedule, voltage and frequency in circuits, and data placement in storage systems. In highly variable everyday use-cases, we demonstrate large performance gains not by designing new protocols or system configurations, but by more judiciously using those that exist. This thesis presents a state-modeling framework based on sparse feature representation. It is applied in diverse application scenarios to data representing:
1. Packet loss over diverse wireless links
2. Circuit performance counters collected during user-driven workloads
3. Access pattern statistics measured from data-center storage systems
Our framework uses unsupervised clustering to discover latent statistical structure in large datasets. We exploit this stable structure to reduce overfitting in supervised learning models like Support Vector Machine (SVM) classifiers and Classification and Regression Trees (CART) trained on small datasets. As a result, we can capture transient predictive statistics that change based on wireless environment, circuit workload, and storage application.
Given the magnitude of performance improvements and the potential economic opportunity, we hope that this work becomes the foundation for a broad investigation into on-platform data-driven device optimization, dubbed Machine Learning for Machines (MLM).