Person:

Mitzenmacher, Michael

Loading...
Profile Picture

Email Address

AA Acceptance Date

Birth Date

Research Projects

Organizational Units

Job Title

Last Name

Mitzenmacher

First Name

Michael

Name

Mitzenmacher, Michael

Search Results

Now showing 1 - 10 of 19
  • Publication

    Estimating and Comparing Entropy across Written Natural Languages Using PPM Compression

    (2002) Behr, Jr., Frederic H.; Fossum, Victoria; Mitzenmacher, Michael; Xiao, David

    Previous work on estimating the entropy of written natural language has focused primarily on English. We expand this work by considering other natural languages, inluding Arabic, Chinese, French, Greek, Japanese, Korean, Russian, and Spanish. We present the results of PPM compression on machine-generated and human-generated translations of texts into various languages. Under the assumption that languages are equally expressive, and that PPM compression does well across languages, one would expect that translated documents would compress to approximately the same size. We verify this empirically on a novel corpus of translated documents. We suggest as an application of this finding using the size of compressed natural language texts as a mean of automatically testing translation quality.

  • Publication

    Estimating Resemblance of MIDI Documents

    (2000) Mitzenmacher, Michael; Owen, Sean

    Search engines often employ techniques for determining syntactic similarity of web pages. Such a tool allows them to avoid returning multiple copies of essentially the same page when a user makes a query. Here we describe our experience extending these techniques to MIDI music files. The music domain requires modification to cope with problems introduced in the musical setting, such as polyphony. Our experience suggests that when used properly these techniques prove useful for determining duplicates and clustering databases in the musical setting as well.

  • Publication

    The Asymptotics of Selecting the Shortest of Two, Improved

    (1999) Mitzenmacher, Michael; Vocking, Berhold

    We investigate variations of a novel, recently proposed load balancing scheme based on small amounts of choice. The static (hashing) setting is modeled as a balls-and-bins process. The balls are sequentially placed into bins, with each ball selecting d bins randomly and going to the bin with the fewest balls. A similar dynamic setting is modeled as a scenario where tasks arrive as a Poisson process at a bank of FIFO servers and queue at one for service. Tasks probe a small random sample of servers in the bank and queue at the server with the fewest tasks. Recently is has been shown that breaking ties in a fixed, asymmetric fashion, actually improves performance, whereas in all previous analyses, ties were broken randomly. We demonstrate the nature of this improvement using fluid limit models, suggest further improvements, and verify and quantify the improvement using fluid limit models, suggest further improvements, and verify and quantify the improvement through simulations.

  • Publication

    Bounds and Improvements for BiBa Signature Schemes

    (2002) Mitzenmacher, Michael; Perrig, Adrian

    This paper analyzes and improves the recently proposed bins and balls signature (BiBa [23]), a new approach for designing signatures from one-way functions without trapdoors. We first construct a general framework for signature schemes based on the balls and bins paradigm and propose several new related signature algorithms. The framework also allows us to obtain upper bounds on the security of such signatures. Several of our signature algorithms approach the upper bound. We then show that by changing the framework in a novel manner we can boost the efficiency and security of our signature schemes. We call the resulting mechanism Powerball signatures. Powerball signatures offer greater security and efficiency than previous signature schemes based on one-way functions without trapdoors.

  • Publication

    Worst-Case and Average-Case Floating Codes for Flash Memory

    (2009) Finucane, Hilary; Mitzenmacher, Michael

    Flash memory is used in portable electronic devices like cell phones, mp3 players, digital cameras, and PDAs. The question of how to code information in flash memory is a difficult one whose answer can have effects on the speed and lifespan of a chip of flash memory. The main factor to be taken into account when designing codes for flash memory is the difference in cost between decreasing and increasing the state of a cell; decreasing the state of a cell requires first the resetting of an entire section of cells. Codes that focus on maximizing the number of updates between these reset operations are called floating codes. Worst-case floating codes focus on maximizing the lower bound on the number of updates between resets, and average-case floating codes focus on maximizing the average number of updates between resets. In this thesis, we review the work done on both worst-case and average-case floating codes. We also introduce a new code which performs as well as the previous best worst-case floating code but is significantly simpler, and a second code which performs asymptotically better.

  • Publication

    On the Hardness of Finding Optimal Multiple Preset Dictionaries

    (2000) Mitzenmacher, Michael

    Preset dictionaries for Huffman codes are used effectively in fax transmission and JPEG encoding. A natural extension is to allow multiple preset dictionaries instead of just one. We show, however, that finding optimal multiple preset dictionaries for Huffman and LZ77-based compression schemes is NP-hard.

  • Publication

    Towards Compressing Web Graphs

    (2000) Adler, Micah; Mitzenmacher, Michael

    In this paper, we consider the problem of compressing graphs of the link structure of the World Wide Web. We provide efficient algorithms for such compression that are motivated by recently proposed random graph models for describing the Web. The algorithms are based on reducing the compression problem to the problem of finding a minimum spanning tree in a directed graph related to the original link graph. The performance of the algorithms on graphs generated by the random graph models suggests that by taking advantage of the link structure of the Web, one may achieve significantly better compression than natural Huffman-based schemes. We also provide hardness results demonstrating limitations on natural extensions of our approach.

  • Publication

    Building a Better Bloom Filter

    (2005) Kirsch, Adam; Mitzenmacher, Michael

    A technique from the hashing literature is to use two hash functions h1(x) and h2(x) to simulate additional hash functions of the form gi(x) = h1(x) + ih2(x). We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for randomness in practice.

  • Publication

    Cleaning up the record on the maximal information coefficient and equitability

    (Proceedings of the National Academy of Sciences, 2014) Reshef, David; Reshef, Yakir; Mitzenmacher, Michael; Sabeti, Pardis
  • Publication

    Concatenated Codes for Deletion Channels

    (2003) Chen, Johnny; Mitzenmacher, Michael; Ng, Chaki; Varnica, Nedeljko

    We design concatenated codes suitable for the deletion channel. The inner code is a combination of a single deletion correcting Varshamov-Tenengolts block code and a marker code. The outer code is a low-density parity-check (LDPC) code. The inner decoder detects the synchronizing points in the receive symbol sequence and feeds the outer LPDC decoder with soft information. Even using regular LPC codes as outer odes our results are promising. Our simulation results demonstrate that bit error rates of 10^-6 can be obtained at rate 0.21 when the probability of deletion is 8%.