Martin, Milo

Collection

Researcher profiles

View all metadata

Search Results

Now showing 1 - 10 of 44

Improved Sequence-Based Speculation Techniques for Implementing Memory Consistency
(2008-05-27) Blundell, Colin; Martin, Milo M.K.; Wenisch, Tom
This work presents BMW, a new design for speculative implementations of memory consistency models in shared-memory multiprocessors. BMW obtains the same performance as prior proposals, but achieves this performance while avoiding several undesirable attributes of prior proposals: non-scalable structures, per-word valid bits in the data cache, modifications to the cache coherence protocol, and global arbitration. BMW uses a read and write bit per cache block and a standard invalidation-based cache coherence protocol to perform conflict detection while speculating. While speculating, stores to block not in the cache are placed into a coalescing store buffer until those misses return. Stores are written speculatively to the primary cache, and non-speculative state is maintained by cleaning dirty blocks before being written speculatively. Speculative blocks are invalidated on abort and marked as non-speculative on commit. This organization allows for fast, local commits while avoiding a non-scalable store queue.
Bandwidth Adaptive Snooping
(2002-02-02) Martin, Milo; Sorin, Daniel J; Hill, Mark D; Wood, David A
This paper advocates that cache coherence protocols use a bandwidth adaptive approach to adjust to varied system configurations (e.g., number of processors) and workload behaviors. We propose Bandwidth Adaptive Snooping Hybrid (BASH), a hybrid protocol that ranges from behaving like snooping (by broadcasting requests) when excess bandwidth is available to behaving like a directory protocol (by unicasting requests) when bandwidth is limited. BASH adapts dynamically by probabilistically deciding to broadcast or unicast on a per request basis using a local estimate of recent interconnection network utilization. Simulations of a microbenchmark and commercial and scientific workloads show that BASH robustly performs as well or better than the best of snooping and directory protocols as available bandwidth is varied. By mixing broadcasts and unicasts, BASH outperforms both snooping and directory protocols in the mid-range where a static choice of either is inefficient.
NoSQ: Store-Load Communication without a Store Queue
(2006-12-01) Sha, Tingting; Martin, Milo; Roth, Amir
This paper presents NoSQ (short for No Store Queue), a microarchitecture that performs store-load communication without a store queue and without executing stores in the out-of-order engine. NoSQ implements store-load communication using speculative memory bypassing (SMB), the dynamic short-circuiting of DEF-store-load-USE chains to DEF-USE chains. Whereas previous proposals used SMB as an opportunistic complement to conventional store queue-based forwarding, NoSQ uses SMB as a store queue replacement. NoSQ relies on two supporting mechanisms. The first is an advanced store-load bypassing predictor that for a given dynamic load can predict whether that load will bypass and the identity of the communicating store. The second is an efficient verification mechanism for both bypassed and non-bypassed loads using in-order load re-execution with an SMB-aware store vulnerability window (SVW) filter. The primary benefit of NoSQ is a simple, fast datapath that does not contain store-load forwarding hardware; all loads get their values either from the data cache or from the register file. Experiments show that this simpler design - despite being more speculative - slightly outperforms a conventional store-queue based design on most benchmarks (by 2% on average).
SoftBound: Highly Compatible and Complete Spatial Memory Safety for C
(2009-01-01) Nagarakatte, Santosh; Martin, Milo; Zhao, Jianzhou; Zdancewic, Stephan A
The serious bugs and security vulnerabilities facilitated by C/C++’s lack of bounds checking are well known. Yet, C and C++ remain in widespread use. Unfortunately, C’s arbitrary pointer arithmetic, conflation of pointers and arrays, and programmer-visible memory layout make retrofitting C/C++ with spatial safety guarantees extremely challenging. Existing approaches suffer from incompleteness, have high runtime overhead, or require non-trivial changes to the C source code. Thus far, these deficiencies have prevented widespread adoption of such techniques. This paper proposes SoftBound, a compile time transformation for enforcing complete spatial safety of C. SoftBound records base and bound information for every pointer as disjoint metadata. This decoupling enables SoftBound to provide complete spatial safety while requiring no changes to C source code. Moreover, SoftBound performs metadata manipulation only when loading or storing pointer values. A formal proof shows this is sufficient to provide complete spatial safety even in the presence of wild casts. SoftBound’s full checking mode provides complete spatial violation detection. To further reduce overheads, SoftBound has a store-only checking mode that successfully detects all the security vulnerabilities in a test suite while adding 15% or less overhead to half of the benchmarks.
Improving Multiple-CMP Systems Using Token Coherence
(2005-02-12) Marty, Michael R; Bingham, Jesse D; Hill, Mark D; Hu, Alan J; Martin, Milo M.K.; Wood, David A
Improvements in semiconductor technology now enable Chip Multiprocessors (CMPs). As many future computer systems will use one or more CMPs and support shared memory, such systems will have caches that must be kept coherent. Coherence is a particular challenge for Multiple-CMP (M-CMP) systems. One approach is to use a hierarchical protocol that explicitly separates the intra-CMP coherence protocol from the inter-CMP protocol, but couples them hierarchically to maintain coherence. However, hierarchical protocols are complex, leading to subtle, difficult-to-verify race conditions. Furthermore, most previous hierarchical protocols use directories at one or both levels, incurring indirections—and thus extra latency—for sharing misses, which are common in commercial workloads. In contrast, this paper exploits the separation of correctness substrate and performance policy in the recently-proposed token coherence protocol to develop the first M-CMP coherence protocol that is flat for correctness, but hierarchical for performance. Via model checking studies, we show that flat correctness eases verification. Via simulation with micro-benchmarks, we make new protocol variants more robust under contention. Finally, via simulation with commercial workloads on a commercial operating system, we show that new protocol variants can be 10-50% faster than a hierarchical directory protocol.
Evaluation of Few-View Reconstruction Parameters for Illicit Substance Detection using Fast-Neutron Transmission Spectroscopy
(1996-06-01) Fink, C. L; Humm, P. G; Martin, Milo; Micklich, B. J
We have evaluated the performance of an illicit substance detection system that performs image reconstruction using the Maximum Likelihood algebraic reconstruction algorithm, a few number of projections, and relatively coarse projection and pixel resolution. This evaluation was done using receiver operator curves and simulated data from the fast-neutron transmission spectroscopy system operated in a mode to detect explosives in luggage. The results show that increasing the number of projection angles is more important than increasing the projection resolution, the reconstructed pixel resolution, or the number of iterations in the Maximum Likelihood algorithm. A 100% detection efficiency with essentially no false positives is possible for a square block of RDX explosive, a projection resolution of 2 cm, a reconstructed pixel size of 2x2 cm, and five projection angles. For rectangular shaped explosives more angles are required to obtain the same system performance.
Exploiting Dead Value Information
(1997) Martin, Milo; Roth, Amir; Fischer, Charles N
We describe Dead Value Information (DVI) and introduce three new optimizations which exploit it. DVI provides assertions that certain register values are dead, meaning they will not be read before being overwritten. The processor can use DVI to track dead registers and dynamically eliminate unnecessary save and restore instructions from the execution stream at procedure calls and context switches. Our results indicate that dynamic saves and restore instances can be reduced by 46% for procedure calls and by 51% for context switches. In addition, save/restore elimination for procedure calls can improve overall performance by up to 5%. DVI also allows the processor manage physical registers to efficiently, reducing the size requirements of the physical register file. When the system clock rate is proportional to the register file cycle time, this optimization can improve performance. All of these optimizations can be supported with only a few new instructions and minimal additional hardware structures.
RETCON: Transactional Repair Without Replay
(2009-11-25) Blundell, Colin; Martin, Milo; Raghavan, Arun
Over the past decade, there has been a surge of academic and industrial interest in optimistic concurrency, i.e., the speculative parallel execution of code regions (transactions or critical sections) with the semantics of isolation from one another. This work analyzes bottlenecks to the scalability of workloads that use optimistic concurrency. We find that one common source of performance degradation is updates to auxiliary program data in otherwise non-conflicting operations, e.g. reference count updates on shared object reads and hashtable size field increments on inserts of different elements. To eliminate the performance impact of conflicts on such auxiliary data, this work proposes RETCON, a hardware mechanism that tracks the relationship between input and output values symbolically and uses this symbolic information to transparently repair the output state of a transaction at commit. RETCON is inspired by instruction replay-based mechanisms but exploits simplifying properties of the nature of computations on auxiliary data to perform repair without replay. Our experiments show that RETCON provides significant speedups for workloads that exhibit conflicts on auxiliary data, including transforming a transactionalized version of the reference python interpreter from a workload that exhibits no scaling to one that exhibits near-linear scaling on 32 cores.
Formalizing the LLVM Intermediate Representation for Verified Program Transformations
(2012-01-01) Nagarakatte, Santosh; Martin, Milo; Zhao, Jianzhou; Zdancewic, Stephan A
This paper presents Vellvm (verified LLVM), a framework for reasoning about programs expressed in LLVM's intermediate representation and transformations that operate on it. Vellvm provides a mechanized formal semantics of LLVM's intermediate representation, its type system, and properties of its SSA form. The framework is built using the Coq interactive theorem prover. It includes multiple operational semantics and proves relations among them to facilitate different reasoning styles and proof techniques. To validate Vellvm's design, we extract an interpreter from the Coq formal semantics that can execute programs from LLVM test suite and thus be compared against LLVM reference implementations. To demonstrate Vellvm's practicality, we formalize and verify a previously proposed transformation that hardens C programs against spatial memory safety violations. Vellvm's tools allow us to extract a new, verified implementation of the transformation pass that plugs into the real LLVM infrastructure; its performance is competitive with the non-verified, ad-hoc original.
Watchdog: Hardware for Safe and Secure Manual Memory Management and Full Memory Safety
(2012-06-01) Martin, Milo; Nagarakatte, Santosh; Zdancewic, Stephan A
Languages such as C and C++ use unsafe manual memory management, allowing simple bugs (i.e., accesses to an object after deallocation) to become the root cause of exploitable security vulnerabilities. This paper proposes Watchdog, a hardware-based approach for ensuring safe and secure manual memory management. Inspired by prior software-only proposals, Watchdog generates a unique identifier for each memory allocation, associates these identifiers with pointers, and checks to ensure that the identifier is still valid on every memory access. This use of identifiers and checks enables Watchdog to detect errors even in the presence of reallocations. Watchdog stores these pointer identifiers in a disjoint shadow space to provide comprehensive protection and ensure compatibility with existing code. To streamline the implementation and reduce runtime overhead: Watchdog (1) uses micro-ops to access metadata and perform checks, (2) eliminates metadata copies among registers via modified register renaming, and (3) uses a dedicated metadata cache to reduce checking overhead. Furthermore, this paper extends Watchdog’s mechanisms to detect bounds errors, thereby providing full hardware-enforced memory safety at low overheads.

Martin, Milo

Email Address

ORCID

Disciplines

relationships.isProjectOf

relationships.isOrgUnitOf

Position

Introduction

Research Interests

Collection

Filters

Settings

Sort By

Results per page

Search Results

Penn's Heritage